[ 
https://issues.apache.org/jira/browse/SPARK-40849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40849:
------------------------------------

    Assignee:     (was: Apache Spark)

> Async log purge
> ---------------
>
>                 Key: SPARK-40849
>                 URL: https://issues.apache.org/jira/browse/SPARK-40849
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>    Affects Versions: 3.4.0
>            Reporter: Boyang Jerry Peng
>            Priority: Major
>
> Old entries in both the offset log and the commit log should be purged 
> asynchronously.
>  
> For every micro-batch, older entries in both the offset log and the commit 
> log are deleted so that the two logs do not grow without bound.  See the 
> purge logic here:
>  
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L539]
>
> The time spent performing these log purges is grouped with the “walCommit” 
> execution time in the StreamingProgressListener metrics.  Around two thirds 
> of the “walCommit” execution time is spent on these purge operations, so 
> making them asynchronous will also reduce latency.  Also, we do not 
> necessarily need to purge on every micro-batch.  When the purges run 
> asynchronously, they do not block micro-batch execution, and we don’t need 
> to start another purge until the current one has finished.  The purges can 
> happen essentially in the background.  We just have to synchronize the 
> purges with the offset WAL commits and completion commits so that the 
> offset log and commit log are never modified concurrently.
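
The scheme described above can be sketched as follows. This is a minimal
Python illustration of the idea, not Spark's Scala implementation:
MetadataLog, AsyncLogPurger, maybe_purge and min_batches_to_retain are
hypothetical names invented for this example. It shows the three properties
the description asks for: the purge runs on a background thread, a new purge
is skipped while one is still running, and purges share a lock with log
writes so the logs are never modified concurrently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class MetadataLog:
    """Toy in-memory stand-in for an offset log or commit log (hypothetical)."""

    def __init__(self):
        self.entries = {}

    def add(self, batch_id, value):
        self.entries[batch_id] = value

    def purge(self, threshold_batch_id):
        # Delete every entry whose batch id is strictly below the threshold.
        for batch_id in [b for b in self.entries if b < threshold_batch_id]:
            del self.entries[batch_id]


class AsyncLogPurger:
    """Runs purges on one background thread, at most one at a time."""

    def __init__(self, offset_log, commit_log, min_batches_to_retain):
        self.offset_log = offset_log
        self.commit_log = commit_log
        self.min_batches_to_retain = min_batches_to_retain  # hypothetical knob
        self.log_lock = threading.Lock()          # serializes all log mutations
        self._in_flight = threading.Semaphore(1)  # free only when no purge runs
        self._executor = ThreadPoolExecutor(max_workers=1)

    def write_offsets(self, batch_id, offsets):
        with self.log_lock:
            self.offset_log.add(batch_id, offsets)

    def write_commit(self, batch_id):
        with self.log_lock:
            self.commit_log.add(batch_id, "committed")

    def maybe_purge(self, current_batch_id):
        """Start a background purge; return its Future, or None if skipped."""
        threshold = current_batch_id - self.min_batches_to_retain
        if threshold <= 0:
            return None  # nothing is old enough to delete yet
        if not self._in_flight.acquire(blocking=False):
            return None  # previous purge still running, so skip this batch
        try:
            return self._executor.submit(self._purge, threshold)
        except RuntimeError:
            self._in_flight.release()  # executor was already shut down
            raise

    def _purge(self, threshold):
        try:
            with self.log_lock:
                self.offset_log.purge(threshold)
                self.commit_log.purge(threshold)
        finally:
            self._in_flight.release()  # allow the next purge to start

    def close(self):
        self._executor.shutdown(wait=True)


if __name__ == "__main__":
    offset_log, commit_log = MetadataLog(), MetadataLog()
    purger = AsyncLogPurger(offset_log, commit_log, min_batches_to_retain=2)
    for batch_id in range(5):
        purger.write_offsets(batch_id, {"offset": batch_id})
        purger.write_commit(batch_id)
        purger.maybe_purge(batch_id)  # returns immediately
    purger.close()  # waits for any purge still in flight
    print(sorted(offset_log.entries))
```

Because a purge request is dropped while another is in flight, the logs may
briefly hold more entries than the retention setting; a later micro-batch
picks them up. A single coarse lock is the simplest way to rule out
concurrent modification in this sketch, and a real implementation might
choose finer-grained coordination.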



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
