[ 
https://issues.apache.org/jira/browse/SPARK-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frederick Reiss updated SPARK-17475:
------------------------------------
    Description: 
When HDFSMetadataLog uses a log directory on a filesystem other than HDFS (e.g. 
NFS or the driver node's local filesystem), the class leaves orphan checksum 
(CRC) files in the log directory. The files have names that follow the pattern 
"..[long UUID hex string].tmp.crc". These files exist because HDFSMetadataLog 
renames the temporary files without renaming the corresponding checksum 
files. There is one CRC file per batch, so the directory fills up quite quickly.

I'm not certain, but this problem might also occur on certain versions of the 
HDFS APIs.

  was:
When HDFSMetadataLog uses a log directory on a filesystem other than HDFS (i.e. 
NFS or the driver node's local filesystem), the class leaves orphan checksum 
(CRC) files in the log directory. The files have names that follow the pattern 
"..[long UUID hex string].tmp.crc". These files exist HDFSMetaDataLog renames 
other temporary files without renaming the corresponding checksum files. There 
is one CRC file per batch, so the directory fills up quite quickly.

I'm not certain, but this problem might also occur on certain versions of the 
HDFS APIs.


> HDFSMetadataLog should not leak CRC files
> -----------------------------------------
>
>                 Key: SPARK-17475
>                 URL: https://issues.apache.org/jira/browse/SPARK-17475
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Streaming
>            Reporter: Frederick Reiss
>
> When HDFSMetadataLog uses a log directory on a filesystem other than HDFS 
> (e.g. NFS or the driver node's local filesystem), the class leaves orphan 
> checksum (CRC) files in the log directory. The files have names that follow 
> the pattern "..[long UUID hex string].tmp.crc". These files exist because 
> HDFSMetadataLog renames the temporary files without renaming the 
> corresponding checksum files. There is one CRC file per batch, so the 
> directory fills up quite quickly.
> I'm not certain, but this problem might also occur on certain versions of the 
> HDFS APIs.
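
The leak described above can be reproduced in miniature outside Spark. The
following is only a rough sketch in Python against the local filesystem, not
the HDFSMetadataLog code itself. It assumes the checksum sidecar of a file
"name" is called ".name.crc" (which matches the "..[UUID].tmp.crc" pattern
reported above for a temporary file named ".[UUID].tmp"). The helper names
checksum_name, simulate_leaky_commit and find_orphan_crc_files are
hypothetical and exist only for this illustration.

```python
import os
import tempfile


def checksum_name(file_name):
    # Assumed sidecar naming convention: "." + name + ".crc".
    return "." + file_name + ".crc"


def simulate_leaky_commit(log_dir, batch_id, temp_id):
    """Write a temp file and its checksum, then rename only the temp file."""
    temp_name = "." + temp_id + ".tmp"
    with open(os.path.join(log_dir, temp_name), "w") as f:
        f.write("metadata for batch %d" % batch_id)
    with open(os.path.join(log_dir, checksum_name(temp_name)), "w") as f:
        f.write("crc")
    # The data file is moved to its final name; the checksum is left behind.
    os.rename(os.path.join(log_dir, temp_name),
              os.path.join(log_dir, str(batch_id)))


def find_orphan_crc_files(log_dir):
    """Return sorted names of .crc files whose data file no longer exists."""
    names = set(os.listdir(log_dir))
    orphans = []
    for name in names:
        if name.startswith(".") and name.endswith(".crc"):
            # Strip the leading "." and trailing ".crc" to get the data file.
            data_name = name[1:-len(".crc")]
            if data_name and data_name not in names:
                orphans.append(name)
    return sorted(orphans)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_dir:
        simulate_leaky_commit(log_dir, 0, "aaaa")
        simulate_leaky_commit(log_dir, 1, "bbbb")
        print(find_orphan_crc_files(log_dir))
```

Each simulated batch leaves one orphan behind, which is the one-CRC-file-per-
batch growth reported in the description. A check like find_orphan_crc_files
could serve as a stopgap for spotting leaked files in a local log directory.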



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
