[ 
https://issues.apache.org/jira/browse/SPARK-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760858#comment-15760858
 ] 

Adam Wang commented on SPARK-8605:
----------------------------------

How about changing FileInputDStream.defaultFilter(path) to fix this bug?
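
A minimal sketch of the kind of filter this could be, under some assumptions: the class name InProgressFileFilter and the exact pattern are hypothetical, and the predicate works on a plain file name so that it runs without Spark or Hadoop on the classpath. It rejects hidden files (a leading dot) and names that still carry the ._COPYING_ suffix described in the issue.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spark: decides from the file name alone
// whether a file in the monitored directory is ready to be processed.
final class InProgressFileFilter {
    // Assumed exclusion rule: names starting with "." or ending in "._COPYING_".
    private static final Pattern EXCLUDED =
            Pattern.compile("^\\..*|.*\\._COPYING_$");

    // Returns true when the file should be picked up by the stream.
    static boolean accept(String fileName) {
        return !EXCLUDED.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(accept("file_name"));           // finished copy
        System.out.println(accept("file_name._COPYING_")); // copy in progress
        System.out.println(accept(".hidden"));             // hidden file
    }
}
```

Until the default changes, the same check could probably be applied today by wrapping it as a filter on the path's name (path.getName()) and passing it to the fileStream overload that takes a filter argument, instead of calling textFileStream.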

> Exclude files in StreamingContext. textFileStream(directory)
> ------------------------------------------------------------
>
>                 Key: SPARK-8605
>                 URL: https://issues.apache.org/jira/browse/SPARK-8605
>             Project: Spark
>          Issue Type: Improvement
>          Components: DStreams
>            Reporter: Noel Vo
>              Labels: streaming, streaming_api
>
> Currently, Spark Streaming can monitor a directory and process newly added
> files. This causes a bug if the files copied to the directory are big. For
> example, in HDFS, while a file is being copied, its name is
> file_name._COPYING_. Spark will pick up the file and process it. However, when
> the copy finishes, the file is renamed to file_name. This causes a
> FileDoesNotExist error. It would be great if we could exclude files in the
> directory using a regex.



