Hi community,

  After running the TPC-DS test suite for several days on a session cluster,
we found a resource leak in OrcInputFormat, which was reported in
FLINK-15239. The problem comes from a third-party dependency that creates
a new internal thread (pool) and never releases it. As a result, the user
class loader referenced by these threads, along with all classes it has
loaded, can never be garbage collected. For the JM (AM), whose metaspace
size is currently not limited, this leads to continuous metaspace growth.
For the TM, whose metaspace size is limited, it eventually results in a
metaspace OOM. I am not sure whether other connectors/input formats have a
similar problem.
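
  To illustrate the mechanism, here is a minimal sketch (not the actual ORC
code path). It assumes the library creates a plain java.util.concurrent
pool while the user class loader is the caller's context class loader. The
class and method names are made up for illustration.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; not part of Flink or ORC.
class LeakedPoolDemo {

    /**
     * Creates a pool while userLoader is the caller's context class loader
     * and returns the context class loader seen by the pool's worker thread.
     */
    public static ClassLoader loaderPinnedByPoolThread(ClassLoader userLoader) {
        Thread current = Thread.currentThread();
        ClassLoader original = current.getContextClassLoader();
        current.setContextClassLoader(userLoader);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // The worker thread is created by the submitting thread and
            // inherits that thread's context class loader.
            Callable<ClassLoader> task =
                    () -> Thread.currentThread().getContextClassLoader();
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            current.setContextClassLoader(original);
            // A leaking library would skip this call, so the worker thread
            // (and the class loader it references) would stay alive.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        ClassLoader userLoader =
                new URLClassLoader(new URL[0], LeakedPoolDemo.class.getClassLoader());
        ClassLoader pinned = loaderPinnedByPoolThread(userLoader);
        System.out.println("worker references user loader: " + (pinned == userLoader));
    }
}
```

  As long as such a worker thread is alive, it keeps a strong reference to
the user class loader, so neither the loader nor its classes can be
unloaded.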
  In general, it is hard for Flink to restrict the behavior of third-party
dependencies, especially the dependencies of connectors. However, it would
be better if we could provide some mechanism, such as stronger isolation or
test facilities, to find potential problems. For example, we could run jobs
on a cluster and automatically check whether the user class loader can be
garbage collected, whether any threads have leaked, whether any shutdown
hooks have been registered, and so on.
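
  A rough sketch of what two of these checks could look like, using only
JDK APIs. The class and method names are hypothetical, this is not an
existing Flink facility, and the GC check is best effort because
System.gc() is only a hint to the JVM. Shutdown hooks are not covered,
since the JDK has no public API to list them.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; names are illustrative only.
class ThreadLeakCheck {

    /** Snapshot of all live threads, taken before the job runs. */
    public static Set<Thread> liveThreads() {
        return new HashSet<>(Thread.getAllStackTraces().keySet());
    }

    /**
     * Threads that were not in the snapshot, are still alive, and have the
     * user class loader as their context class loader.
     */
    public static List<Thread> leakedThreads(Set<Thread> before, ClassLoader userLoader) {
        List<Thread> leaked = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (!before.contains(t)
                    && t.isAlive()
                    && t.getContextClassLoader() == userLoader) {
                leaked.add(t);
            }
        }
        return leaked;
    }

    /**
     * Best-effort check: requests GC a few times and reports whether the
     * weakly referenced class loader has been cleared.
     */
    public static boolean isCollected(WeakReference<ClassLoader> ref) {
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc(); // only a hint; the JVM may ignore it
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }
}
```

  A test harness could call liveThreads() before submitting a job, keep
only a WeakReference to the user class loader, and after the job finishes
report any leakedThreads(...) and whether isCollected(...) returns true.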
  What do you think? Should we treat this as a problem at all?

Best,
Yingjie
