Jack Hu created SPARK-6061:
------------------------------

             Summary: File source dstream can not include the old file which timestamp is before the system time
                 Key: SPARK-6061
                 URL: https://issues.apache.org/jira/browse/SPARK-6061
             Project: Spark
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.2.1
            Reporter: Jack Hu

The file source DStream (StreamingContext.fileStream) has a parameter named "newFilesOnly" that controls whether old files are included. It worked fine in 1.1.0 but is broken in 1.2.1: older files are always ignored no matter what value is set.

Here is simple code to reproduce the issue:
https://gist.github.com/jhu-chang/1ee5b0788c7479414eeb

The reason is that "modTimeIgnoreThreshold" in FileInputDStream::findNewFiles is set to a time close to the system time (the Spark Streaming clock time), so files older than this time are ignored.
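
To illustrate the reported behavior without a Spark cluster, here is a minimal Python model of the threshold check. It is a sketch, not the Spark source: the function names, the "remember window" parameter, and the max() formula are assumptions chosen to match the description above (a threshold that stays close to the clock time even when newFilesOnly is false).

```python
# Simplified model of the file-selection check described in this report.
# NOT the Spark implementation: names and the threshold formula are assumptions.

def mod_time_ignore_threshold(now_ms, new_files_only, remember_window_ms):
    """Return the modification time at or below which files are ignored."""
    # Assumed initial threshold: "now" when only new files are wanted, else 0.
    initial = now_ms if new_files_only else 0
    # Assumed cause of the bug: the trailing edge of the remember window
    # keeps the threshold close to the clock time in both cases.
    return max(initial, now_ms - remember_window_ms)


def find_new_files(files, now_ms, new_files_only, remember_window_ms):
    """Select the names of files whose modification time exceeds the threshold."""
    threshold = mod_time_ignore_threshold(now_ms, new_files_only, remember_window_ms)
    return sorted(name for name, mtime in files.items() if mtime > threshold)


# Hypothetical directory listing: file name -> modification time in ms.
files = {"old.txt": 100_000, "fresh.txt": 990_000}
now_ms = 1_000_000
remember_window_ms = 60_000

# Even with new_files_only=False, old.txt falls below the threshold.
selected = find_new_files(files, now_ms, False, remember_window_ms)
print(selected)
```

In this model the threshold is 940000 for both values of new_files_only, so old.txt is never selected. The behavior expected by the reporter corresponds to a threshold of 0 when new_files_only is false.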