[jira] [Created] (SPARK-24295) Purge Structured streaming FileStreamSinkLog metadata compact file data.

Iqbal Singh (JIRA) Wed, 16 May 2018 04:54:12 -0700

Iqbal Singh created SPARK-24295:
-----------------------------------

             Summary: Purge Structured streaming FileStreamSinkLog metadata 
compact file data.
                 Key: SPARK-24295
                 URL: https://issues.apache.org/jira/browse/SPARK-24295
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 2.3.0
            Reporter: Iqbal Singh



FileStreamSinkLog metadata logs are concatenated into a single compact file 
after each defined compact interval.
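
The compaction behaviour described above can be illustrated with a toy model. 
This is only a sketch, not Spark's code: the function names, the 
"(batch_id + 1) % compact_interval == 0" rule, and the fixed number of files 
per batch are assumptions made for illustration. It shows that each compact 
file carries every entry written so far, so its size grows linearly with the 
number of batches.

```python
def is_compaction_batch(batch_id: int, compact_interval: int) -> bool:
    # Assumed rule for this sketch: with 0-indexed batch ids, every
    # compact_interval-th batch is a compaction batch.
    return (batch_id + 1) % compact_interval == 0


def simulate_sink_log(num_batches: int, files_per_batch: int,
                      compact_interval: int) -> dict:
    """Toy model: map each batch id to the number of entries in its log file."""
    log_sizes = {}
    total_entries = 0
    for batch_id in range(num_batches):
        total_entries += files_per_batch
        if is_compaction_batch(batch_id, compact_interval):
            # The compact file holds every entry written so far.
            log_sizes[batch_id] = total_entries
        else:
            # A regular log file holds only this batch's entries.
            log_sizes[batch_id] = files_per_batch
    return log_sizes


if __name__ == "__main__":
    sizes = simulate_sink_log(num_batches=30, files_per_batch=5,
                              compact_interval=10)
    for batch_id in (9, 19, 29):
        print(batch_id, sizes[batch_id])
```

In this model nothing is ever dropped from the compact file, which is why a 
long-running job ends up with a very large file.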

For long-running jobs, the compact file can grow to tens of GB, which slows 
down reads of the sink output, because Spark defaults to the 
"_spark_metadata" dir (the FileStreamSinkLog dir) for the read.

We need functionality to purge data from the compact file so that its size 
stays bounded.
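
One possible shape for such a purge is a retention filter applied at 
compaction time. The sketch below is hypothetical: "SinkEntry", 
"compact_with_retention" and the "retention_ms" parameter are names invented 
for illustration and are not part of Spark 2.3.0.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SinkEntry:
    # Hypothetical stand-in for one file entry in the sink metadata log.
    path: str
    modification_time_ms: int


def compact_with_retention(entries: List[SinkEntry], now_ms: int,
                           retention_ms: int) -> List[SinkEntry]:
    """Keep only entries modified within the last retention_ms milliseconds."""
    cutoff = now_ms - retention_ms
    return [e for e in entries if e.modification_time_ms >= cutoff]


if __name__ == "__main__":
    entries = [
        SinkEntry("part-0000", 1_000),
        SinkEntry("part-0001", 5_000),
        SinkEntry("part-0002", 9_000),
    ]
    kept = compact_with_retention(entries, now_ms=10_000, retention_ms=6_000)
    print([e.path for e in kept])
```

A trade-off under this sketch is that files whose entries are purged would no 
longer be listed when the output is read through the metadata log, so the 
retention period would have to match how long the data needs to stay readable.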

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
