Burak Yavuz created SPARK-11419:
-----------------------------------

             Summary: WriteAheadLog recovery improvements for when closeFileAfterWrite is enabled
                 Key: SPARK-11419
                 URL: https://issues.apache.org/jira/browse/SPARK-11419
             Project: Spark
          Issue Type: Improvement
          Components: Streaming
            Reporter: Burak Yavuz


Support for closing WriteAheadLog files after each write was just merged in. 
Closing the file after every write is a very expensive operation because it 
creates many small files on S3. It's not necessary to enable it on HDFS anyway.

However, when you have many small files on S3, recovery takes a very long time. 
We can parallelize the recovery process to speed it up.
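
As a rough illustration of the idea (not Spark's implementation), the sketch 
below reads many small log segments concurrently with a thread pool. Local 
files stand in for S3 objects, and the names read_segment and recover_parallel 
are hypothetical. Treating each non-empty line as one record is likewise an 
assumption made only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def read_segment(path):
    """Read one log segment and return its records.

    Assumption for this sketch: a record is one non-empty line of text.
    """
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def recover_parallel(segment_paths, max_workers=8):
    """Read all segments concurrently and return records in segment order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the recovered
        # records keep the order of the segments even though the reads overlap.
        per_segment = list(pool.map(read_segment, segment_paths))
    return [record for records in per_segment for record in records]
```

Threads fit here because reading many small remote files is dominated by 
per-request latency rather than CPU work, so overlapping the requests is where 
the time is saved.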

In addition, files start stacking up pretty quickly, and deletes may not be 
able to keep up, so we should add support for parallelizing deletes as well.
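
A minimal sketch of concurrent cleanup, again using local files in place of S3 
objects and hypothetical names (delete_segment, delete_parallel). It assumes 
the caller has already decided which segments are old enough to remove. It is 
an illustration only, not how Spark implements deletion.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def delete_segment(path):
    """Delete one segment; return True if it was removed, False if missing."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # Another cleanup pass may already have removed it; not an error here.
        return False


def delete_parallel(expired_paths, max_workers=8):
    """Delete expired segments concurrently and return how many were removed."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(delete_segment, expired_paths))
    return sum(results)
```

Tolerating a missing file keeps the cleanup idempotent, so a retried or 
overlapping pass does not fail on segments that are already gone.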



