[
https://issues.apache.org/jira/browse/SPARK-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760854#comment-15760854
]
Adam Wang commented on SPARK-8605:
----------------------------------
How about changing defaultFilter() to fix this bug?
> Exclude files in StreamingContext.textFileStream(directory)
> ------------------------------------------------------------
>
> Key: SPARK-8605
> URL: https://issues.apache.org/jira/browse/SPARK-8605
> Project: Spark
> Issue Type: Improvement
> Components: DStreams
> Reporter: Noel Vo
> Labels: streaming, streaming_api
>
> Currently, Spark Streaming can monitor a directory and process newly added
> files. This causes a bug when the files copied into the directory are large.
> For example, in HDFS, a file that is still being copied is named
> file_name._COPYING_. Spark picks up that file and processes it. When the
> copy finishes, however, the file is renamed to file_name, which causes a
> FileDoesNotExist error. It would be great if we could exclude files in the
> directory using a regex.
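A minimal sketch of the filtering logic being discussed follows. It assumes that Spark's FileInputDStream.defaultFilter is, to our understanding, equivalent to `!path.getName().startsWith(".")`. The sketch keeps that rule and adds a regex exclusion for in-progress HDFS copies. The names `accept_file` and `IN_PROGRESS` are hypothetical and are not part of any Spark API. The code is plain Python so that it runs without Spark.

```python
import re

# Hypothetical exclusion pattern: an in-progress HDFS copy is named
# "<name>._COPYING_" and is renamed to "<name>" when the copy completes.
IN_PROGRESS = re.compile(r"\._COPYING_$")


def accept_file(name: str, exclude: "re.Pattern[str]" = IN_PROGRESS) -> bool:
    """Return True if a newly seen file name should be processed."""
    # Keep the assumed default behaviour: skip hidden files (leading ".").
    if name.startswith("."):
        return False
    # Additionally skip names matching the exclusion regex.
    return exclude.search(name) is None


files = ["data.csv", "big.log._COPYING_", ".hidden", "big.log"]
print([f for f in files if accept_file(f)])  # ['data.csv', 'big.log']
```

In Scala, a predicate like this could be passed as the `filter: Path => Boolean` argument of `StreamingContext.fileStream`. As far as we know, `textFileStream(directory)` does not expose that argument and always uses the default filter. That is why the choice is between changing defaultFilter() and adding a regex parameter.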