Frens Jan Rumph created SPARK-54487:
---------------------------------------

             Summary: First MicroBatchExecution never released
                 Key: SPARK-54487
                 URL: https://issues.apache.org/jira/browse/SPARK-54487
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 4.0.1
            Reporter: Frens Jan Rumph


{{MicroBatchExecution#runActivatedStream}} seems to retain a reference to the 
first {{MicroBatchExecutionContext}} in the {{execCtx}} variable, causing 
shuffles of the first micro batch to be retained forever.

The 'stream execution thread for XYZ' thread drives the trigger execution by 
setting the first context to be executed and then calling 
{{triggerExecutor.execute(...)}}. This causes {{execCtx}} to be a GC root for 
that first batch, which prevents {{ContextCleaner}} from cleaning up its 
shuffle dependencies, because that cleanup is driven by JVM garbage collection.
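
The retention mechanism described above can be illustrated with a minimal, 
generic JVM sketch. This is not Spark code: {{BatchContext}}, 
{{RetentionSketch}} and both method names are hypothetical stand-ins, and the 
only assumption taken from Spark is that the cleaner tracks objects through 
weak references and a reference queue. An object still referenced from a live 
local variable is never cleared, so a GC-driven cleanup task for it is never 
enqueued.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class RetentionSketch {
    // Hypothetical stand-in for a per-batch context that owns shuffle state.
    static final class BatchContext {
        final int batchId;
        BatchContext(int batchId) { this.batchId = batchId; }
    }

    // A strong local reference (like a long-lived variable on the stream
    // thread's stack) keeps the object reachable, so the weak reference
    // cannot be cleared no matter how often GC runs.
    public static boolean survivesWhileHeld() {
        BatchContext first = new BatchContext(0);
        WeakReference<BatchContext> ref = new WeakReference<>(first);
        System.gc();
        boolean alive = ref.get() != null;
        // Keep 'first' strongly reachable up to this point.
        Reference.reachabilityFence(first);
        return alive;
    }

    // Without a strong reference, the weak reference is normally cleared and
    // enqueued after a GC cycle; a cleaner thread would react to that event.
    // System.gc() is only a hint, so this polls with a bounded number of retries.
    public static boolean enqueuedAfterRelease() throws InterruptedException {
        ReferenceQueue<BatchContext> queue = new ReferenceQueue<>();
        WeakReference<BatchContext> ref =
            new WeakReference<>(new BatchContext(1), queue);
        for (int i = 0; i < 50; i++) {
            System.gc();
            if (queue.remove(100) == ref) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("held context survives GC: " + survivesWhileHeld());
        System.out.println("released context enqueued: " + enqueuedAfterRelease());
    }
}
```

In this sketch the first method plays the role of the retained first context, 
while the second shows what would be expected for later batches whose contexts 
are no longer referenced.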

I have a heap dump available wherein after hours of streaming, two 
{{CleanShuffle}} objects are retained for shuffles 0 and 1. I can provide more 
details based on this dump if need be.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

