[ https://issues.apache.org/jira/browse/SPARK-22625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268461#comment-16268461 ]
Tolstopyatov Vsevolod commented on SPARK-22625:
-----------------------------------------------

If you agree this is the problem, I can work on a patch in a week or so.

> Properly cleanup inheritable thread-locals
> ------------------------------------------
>
> Key: SPARK-22625
> URL: https://issues.apache.org/jira/browse/SPARK-22625
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.2.0
> Reporter: Tolstopyatov Vsevolod
> Labels: leak
>
> A memory leak is present due to inherited thread-locals; SPARK-20558 did not fix it properly.
> Our production application has the following logic: one thread reads from HDFS while another creates a Spark context, processes HDFS files, and then closes the context on a regular schedule.
> Depending on which thread starts first, the SparkContext thread-local may or may not be inherited by the HDFS daemon thread (DataStreamer), causing a memory leak when the streamer is created after the Spark context. Memory consumption increases every time a new Spark context is created; related YourKit paths: https://screencast.com/t/tgFBYMEpW
> The problem is more general and is not specific to HDFS.
> Proper fix: register all cloned properties (in `localProperties#childValue`) in a ConcurrentHashMap and forcefully clear all of them in `SparkContext#close` (see the sketch after this quoted description).
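Rough sketch of the idea, illustrative only and not the actual SparkContext code: the object name `LocalPropertiesRegistry` and the method `clearAll` are made up for this comment. Every `Properties` clone handed out by `childValue` is registered in a ConcurrentHashMap-backed set, and all of them are cleared when the context is closed.

```scala
import java.util.Properties
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for SparkContext's localProperties handling;
// names and structure are illustrative, not the real Spark source.
object LocalPropertiesRegistry {

  // Set backed by a ConcurrentHashMap: holds every Properties object that
  // childValue hands out to a child thread, so they can be cleared later.
  private val clonedProps = ConcurrentHashMap.newKeySet[Properties]()

  val localProperties: InheritableThreadLocal[Properties] =
    new InheritableThreadLocal[Properties] {
      override def childValue(parent: Properties): Properties = {
        // Clone the parent's properties for the child thread, but also
        // register the clone so it can be cleared when the context closes.
        val child = new Properties()
        if (parent != null) child.putAll(parent)
        clonedProps.add(child)
        child
      }
      override def initialValue(): Properties = new Properties()
    }

  // Would be called from SparkContext#stop(): forcefully clear every
  // registered clone so long-lived daemon threads (e.g. the HDFS
  // DataStreamer) no longer pin SparkContext-related values.
  def clearAll(): Unit = {
    clonedProps.forEach(props => props.clear())
    clonedProps.clear()
  }
}
```

In a real patch the registry would live inside SparkContext itself and `clearAll()` would run from `SparkContext#stop()`; a weak-reference set might be preferable so the registry does not itself keep alive the Properties of threads that have already finished.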