choojoyq opened a new pull request #25439: [SPARK-28709][DSTREAMS] - Fix 
StreamingContext leak through Streaming…
URL: https://github.com/apache/spark/pull/25439
 
 
   ## What changes were proposed in this pull request?
   
   In my application, Spark Streaming is restarted programmatically by stopping 
the StreamingContext (without stopping the SparkContext) and then creating and 
starting a new one. I use this for automatic detection of Kafka topic/partition 
changes and for automatic failover in case of non-fatal exceptions.
   
   However, I noticed that after multiple restarts the driver fails with an OOM 
error. While investigating a heap dump, I found that the StreamingContext object 
isn't reclaimed by GC after it is stopped.
   
   There are several places that hold a reference to it:
   
   1. StreamingTab registers StreamingJobProgressListener, which holds a 
reference to the StreamingContext, directly in the LiveListenerBus shared queue 
via the ssc.sc.addSparkListener(listener) method invocation. However, this 
listener isn't unregistered in the stop method.
   2. The JSON handlers (/streaming/json and /streaming/batch/json) aren't 
unregistered from SparkUI, although they hold a reference to 
StreamingJobProgressListener. Essentially the same issue affects all pages; I 
assume that renderJsonHandler should be added to the pageToHandlers cache in 
attachPage so that it is unregistered as well in detachPage.
   3. SparkUI holds a reference to StreamingJobProgressListener in the 
corresponding local variable, which isn't cleared after the StreamingContext 
is stopped.
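
The retention pattern behind items 1 and 2 can be illustrated with a small, self-contained toy model. This is only a sketch and does not use Spark: `Bus`, `Ui`, `Listener`, and `StreamingCtx` are hypothetical stand-ins for LiveListenerBus, SparkUI, StreamingJobProgressListener, and StreamingContext, and the handler paths are simplified. It assumes the fix amounts to removing on stop everything that was registered on start, with the UI caching every handler it creates for a page so that detaching the page can remove all of them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ListenerLeakSketch {
    /** Hypothetical stand-in for the long-lived listener bus shared across restarts. */
    static class Bus {
        private final List<Object> listeners = new ArrayList<>();
        void add(Object listener) { listeners.add(listener); }
        void remove(Object listener) { listeners.remove(listener); }
        int size() { return listeners.size(); }
    }

    /** Hypothetical stand-in for the UI; it outlives every streaming context. */
    static class Ui {
        // Cache of all handlers created per page, including the JSON one,
        // so that detachPage can remove every handler it attached.
        private final Map<String, List<String>> pageToHandlers = new HashMap<>();
        private final List<String> attached = new ArrayList<>();

        void attachPage(String page) {
            List<String> handlers = new ArrayList<>();
            handlers.add("/" + page);
            handlers.add("/" + page + "/json");
            attached.addAll(handlers);
            pageToHandlers.put(page, handlers);
        }

        void detachPage(String page) {
            List<String> handlers = pageToHandlers.remove(page);
            if (handlers != null) {
                attached.removeAll(handlers);
            }
        }

        int handlerCount() { return attached.size(); }
    }

    /** The listener keeps its owning context strongly reachable. */
    static class Listener {
        final StreamingCtx owner;
        Listener(StreamingCtx owner) { this.owner = owner; }
    }

    /** Hypothetical stand-in for a restartable streaming context. */
    static class StreamingCtx {
        private final Bus bus;
        private final Ui ui;
        private final Listener listener;

        StreamingCtx(Bus bus, Ui ui) {
            this.bus = bus;
            this.ui = ui;
            this.listener = new Listener(this);
        }

        void start() {
            bus.add(listener);
            ui.attachPage("streaming");
        }

        // Undo every registration made in start(); otherwise the shared bus
        // and UI keep this context reachable after it has been stopped.
        void stop() {
            bus.remove(listener);
            ui.detachPage("streaming");
        }
    }

    public static void main(String[] args) {
        Bus bus = new Bus();
        Ui ui = new Ui();
        for (int i = 0; i < 3; i++) {
            StreamingCtx ssc = new StreamingCtx(bus, ui);
            ssc.start();
            ssc.stop();
        }
        System.out.println(bus.size() + " listeners, " + ui.handlerCount() + " handlers");
    }
}
```

In this model, three start/stop cycles leave no listeners and no handlers behind. If the two calls in `stop()` were omitted, the bus and the UI would accumulate one listener and two handlers per restart, each keeping a stopped context reachable.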
   
   ## How was this patch tested?
   
   Added tests to existing test suites.
   After I applied these changes via reflection in my app, the OOM on the 
driver side was gone.
   

