[ https://issues.apache.org/jira/browse/SPARK-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adam Wang updated SPARK-8605:
-----------------------------
Comment: was deleted
(was: How about changing FileInputDStream.defaultFilter(path) to fix this bug?)
> Exclude files in StreamingContext.textFileStream(directory)
> -----------------------------------------------------------
>
> Key: SPARK-8605
> URL: https://issues.apache.org/jira/browse/SPARK-8605
> Project: Spark
> Issue Type: Improvement
> Components: DStreams
> Reporter: Noel Vo
> Labels: streaming, streaming_api
>
> Currently, Spark Streaming can monitor a directory and process newly added
> files. This causes a bug when large files are copied into the directory.
> For example, in HDFS, a file that is still being copied is named
> file_name._COPYING_. Spark picks up that file and starts processing it.
> However, once the copy finishes, the file is renamed to file_name, which
> causes a FileDoesNotExist error. It would be great if we could exclude
> files in the directory using a regex.
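
The kind of regex-based exclusion requested above can be sketched as a plain
predicate over file names. This is only an illustration, not Spark code: the
helper name make_exclude_filter, the two patterns, and the hdfs://nn/in/ paths
are all assumptions made for the example. For reference, the Scala API does
offer a StreamingContext.fileStream overload that takes a filter of type
Path => Boolean, and a predicate like this one could presumably be adapted to
that shape.

```python
import re
from typing import Callable, Iterable


def make_exclude_filter(patterns: Iterable[str]) -> Callable[[str], bool]:
    """Return a predicate that accepts a path unless its file name
    matches any of the given regex patterns (hypothetical helper)."""
    compiled = [re.compile(p) for p in patterns]

    def accept(path: str) -> bool:
        # Match against the last path component only, not the directories.
        name = path.rstrip("/").rsplit("/", 1)[-1]
        return not any(rx.search(name) for rx in compiled)

    return accept


# Assumed patterns: in-flight HDFS copies and hidden (dot-prefixed) files.
accept = make_exclude_filter([r"\._COPYING_$", r"^\."])

# Hypothetical directory listing used only for this example.
listing = [
    "hdfs://nn/in/file_name._COPYING_",
    "hdfs://nn/in/file_name",
    "hdfs://nn/in/.hidden",
]
accepted = [p for p in listing if accept(p)]
print(accepted)
```

With such a predicate, a file would only become eligible for processing after
the copy completes and it is renamed to its final name.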