HeartSaVioR opened a new pull request #31495:
URL: https://github.com/apache/spark/pull/31495


   ### What changes were proposed in this pull request?
   
   This PR proposes to optimize the WAL commit phase via the following changes:
   
   * cache the offset log to avoid an FS get operation per batch
   * use an FS exists operation instead of an FS list operation on purge (2 operations per batch)
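
To make the two changes concrete, here is a minimal, self-contained sketch of the idea in Python. It is not the code in this PR: `FakeFileSystem`, `CachedMetadataLog`, `purge_one`, and the cache size of 2 used as a default are hypothetical names and values chosen for illustration, and the in-memory file system only counts calls so the effect is visible.

```python
from collections import OrderedDict


class FakeFileSystem:
    """In-memory stand-in for a remote file system that counts each call."""

    def __init__(self):
        self.files = {}
        self.calls = {"get": 0, "put": 0, "exists": 0, "delete": 0}

    def put(self, path, data):
        self.calls["put"] += 1
        self.files[path] = data

    def get(self, path):
        self.calls["get"] += 1
        return self.files.get(path)

    def exists(self, path):
        self.calls["exists"] += 1
        return path in self.files

    def delete(self, path):
        self.calls["delete"] += 1
        self.files.pop(path, None)


class CachedMetadataLog:
    """Hypothetical metadata log with a small bounded cache of recent batches."""

    def __init__(self, fs, root, cache_size=2):
        self.fs = fs
        self.root = root
        self.cache_size = cache_size
        self.cache = OrderedDict()  # batch_id -> metadata, oldest first

    def _path(self, batch_id):
        return f"{self.root}/{batch_id}"

    def add(self, batch_id, metadata):
        # Write through to the file system, then remember the entry in memory.
        self.fs.put(self._path(batch_id), metadata)
        self.cache[batch_id] = metadata
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the oldest entry

    def get(self, batch_id):
        # Serve recent batches from memory; fall back to the file system.
        if batch_id in self.cache:
            return self.cache[batch_id]
        return self.fs.get(self._path(batch_id))

    def purge_one(self, batch_id):
        # Check the single file that may have expired instead of listing the directory.
        path = self._path(batch_id)
        if self.fs.exists(path):
            self.fs.delete(path)


fs = FakeFileSystem()
log = CachedMetadataLog(fs, "offsets")
for batch_id in range(5):
    log.add(batch_id, f"offsets-{batch_id}")
    log.get(batch_id)            # answered from the cache, no FS get
    log.purge_one(batch_id - 2)  # one exists call per batch, no FS list

print(fs.calls)
```

In this sketch the cache is a write-through `OrderedDict`, so a read of the batch that was just written never reaches the file system, and the purge touches exactly one path per batch. Whether a fixed size of 2 is sufficient depends on how many recent batches the caller reads back, which is an assumption here.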
   
   ### Why are the changes needed?
   
   The WAL commit phase has inefficiencies that can be easily optimized by using a small amount of driver memory.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Manually tested with debug logging. (Verified that the cache is used, that the cache size stays at 2, and that a single exists call is used instead of a list call.)
   
   Experiment on AWS S3 + S3Guard:
   
   > before the patch
   
   > after the patch
   
   Experiment on Azure:
   
   > before the patch
   
   > after the patch
   
   
   

