[jira] [Resolved] (SPARK-18156) CLONE - StreamExecution should discard unneeded metadata

Sean Owen (JIRA) Fri, 28 Oct 2016 02:59:39 -0700

     [ https://issues.apache.org/jira/browse/SPARK-18156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sean Owen resolved SPARK-18156.
-------------------------------
    Resolution: Invalid

> CLONE - StreamExecution should discard unneeded metadata
> --------------------------------------------------------
>
>                 Key: SPARK-18156
>                 URL: https://issues.apache.org/jira/browse/SPARK-18156
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Streaming
>            Reporter: Sunil Kumar
>            Assignee: Frederick Reiss
>             Fix For: 2.1.0, 2.0.1
>
>
> StreamExecution maintains a write-ahead log of batch metadata so that 
> previously in-flight batches can be replayed if the driver is restarted. 
> StreamExecution does not garbage-collect or compact this log in any way.
> Since the log is implemented with HDFSMetadataLog, these files will consume 
> memory on the HDFS NameNode. Specifically, each log file will consume about 
> 300 bytes of NameNode memory (150 bytes for the inode and 150 bytes for the 
> block of file contents; see 
> [https://www.cloudera.com/documentation/enterprise/latest/topics/admin_nn_memory_config.html]).
> An application with a 100 ms batch interval runs 864,000 batches per day, so 
> at one log file per batch it will increase the NameNode's heap usage by about 
> 250MB per day.
> There is also the matter of recovery. StreamExecution reads its entire log 
> when restarting. This read operation will be very expensive if the log 
> contains millions of entries spread over millions of files.
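
The heap-growth figure in the description can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope estimate: the function name is made up for illustration, and it assumes exactly one log file per batch and the 300 bytes per file quoted above.

```python
# Rough estimate of NameNode heap growth caused by the metadata log.
# Assumptions (not measured): one log file per batch, and 150 bytes for the
# inode plus 150 bytes for the single block, as quoted in the description.
BYTES_PER_LOG_FILE = 150 + 150
MS_PER_DAY = 24 * 60 * 60 * 1000


def namenode_bytes_per_day(batch_interval_ms, bytes_per_file=BYTES_PER_LOG_FILE):
    """Return the estimated NameNode bytes consumed per day (hypothetical helper)."""
    files_per_day = MS_PER_DAY // batch_interval_ms  # one file per batch
    return files_per_day * bytes_per_file


if __name__ == "__main__":
    per_day = namenode_bytes_per_day(100)
    print(f"{per_day / 1e6:.1f} MB/day")  # prints "259.2 MB/day"
```

With a 100 ms interval this gives 864,000 files and roughly 259 MB (about 247 MiB) per day, which matches the "about 250MB" in the description.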

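One way to picture the garbage collection the description asks for is a purge of log files below a batch-id threshold. The following is a minimal sketch, not Spark's implementation: `purge_before` is a hypothetical helper, a local directory stands in for HDFS, and it assumes each log file is named by its numeric batch id. Deciding which batches are no longer needed for recovery is left to the caller.

```python
import os
import tempfile


def purge_before(log_dir, threshold_batch_id):
    """Delete log files whose numeric name is below threshold_batch_id.

    Hypothetical helper: assumes one file per batch, named by batch id.
    Returns the sorted list of batch ids that were removed.
    """
    removed = []
    for name in os.listdir(log_dir):
        # Skip anything that is not a plain batch-id file name.
        if name.isdecimal() and int(name) < threshold_batch_id:
            os.remove(os.path.join(log_dir, name))
            removed.append(int(name))
    return sorted(removed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_dir:
        # Simulate ten batches, one metadata file each.
        for batch_id in range(10):
            with open(os.path.join(log_dir, str(batch_id)), "w") as f:
                f.write("{}")
        purge_before(log_dir, 7)
        print(sorted(os.listdir(log_dir)))  # prints ['7', '8', '9']
```

Keeping only a bounded tail of the log would address both concerns in the description: fewer NameNode objects, and fewer files to read on restart.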


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

