Frens Jan Rumph created SPARK-54487:
---------------------------------------
Summary: First MicroBatchExecution never released
Key: SPARK-54487
URL: https://issues.apache.org/jira/browse/SPARK-54487
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 4.0.1
Reporter: Frens Jan Rumph
{{MicroBatchExecution#runActivatedStream}} seems to retain a reference to the
first {{MicroBatchExecutionContext}} in the {{execCtx}} variable, causing
shuffles of the first micro batch to be retained forever.
The 'stream execution thread for XYZ' thread drives trigger execution by
setting the first context to be executed and then calling
{{triggerExecutor.execute(...)}}. As a result, {{execCtx}} acts as a GC root
for that first batch for as long as the stream runs, which prevents
{{ContextCleaner}} from cleaning up its shuffle dependencies, because that
cleanup is driven by JVM garbage collection.
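The mechanism can be illustrated outside Spark with a minimal sketch. This is not Spark's code: {{BatchContext}}, {{RetainedContextSketch}} and {{isCollected}} are hypothetical names, and a plain {{java.lang.ref.WeakReference}} stands in for whatever GC-driven tracking the cleaner uses. It only shows that a referent cannot be collected while a live variable still points at it.

```scala
import java.lang.ref.WeakReference

// Hypothetical stand-in for a per-batch execution context.
final class BatchContext(val batchId: Long)

object RetainedContextSketch {
  // Ask the JVM to collect garbage, then report whether the referent is gone.
  // System.gc() is only a hint, so a `false` result is not proof of a leak.
  def isCollected(ref: WeakReference[BatchContext]): Boolean = {
    System.gc()
    Thread.sleep(50)
    ref.get() == null
  }

  def main(args: Array[String]): Unit = {
    // Plays the role of the long-lived variable that still points at batch 0.
    var firstCtx: BatchContext = new BatchContext(0L)
    val tracker = new WeakReference[BatchContext](firstCtx)

    // The strong reference is still live here, so batch 0 cannot be collected.
    val stillHeld = !isCollected(tracker)
    println(s"batch ${firstCtx.batchId} retained while referenced: $stillHeld")

    // Dropping the strong reference makes the context eligible for collection.
    firstCtx = null
    println(s"collected after release (GC-dependent): ${isCollected(tracker)}")
  }
}
```

The first check is deterministic because the variable is read again afterwards. The second depends on the collector actually running, so its result may vary between JVMs.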
I have a heap dump, taken after hours of streaming, in which two
{{CleanShuffle}} objects are still retained, for shuffles 0 and 1. I can
provide more details based on this dump if need be.