[ https://issues.apache.org/jira/browse/SPARK-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386775#comment-14386775 ]
Emre Sevinç commented on SPARK-3276:
------------------------------------

Are there any plans to make the private val {{FileInputDStream.MIN_REMEMBER_DURATION}} configurable via some API? It appears to be hard-coded to 1 minute in https://github.com/apache/spark/blob/branch-1.2/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala#L325, and as a result files older than 1 minute are not processed.

> Provide a API to specify whether the old files need to be ignored in file input text DStream
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-3276
>                 URL: https://issues.apache.org/jira/browse/SPARK-3276
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.2.0
>            Reporter: Jack Hu
>            Priority: Minor
>
> Currently, StreamingContext has only one API, textFileStream, for creating a text file DStream, and it always ignores old files. Sometimes the old files are still useful.
> An API is needed to let the user choose whether the old files are ignored or not.
> The API currently in StreamingContext:
>   def textFileStream(directory: String): DStream[String] = {
>     fileStream[LongWritable, Text, TextInputFormat](directory).map(_._2.toString)
>   }
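
To make the effect of the 1-minute minimum concrete, here is a minimal Spark-free sketch of how a file's modification time might be checked against a remember window. It is a simplified model based on my reading of the linked branch-1.2 source, not the actual implementation: the object name, method names, and parameters are all hypothetical, and the real class also applies a path filter and tracks recently selected files.

```scala
// Simplified model of the file selection rule; names are illustrative only
// and do not correspond to Spark's private API.
object RememberWindowSketch {
  // Assumption: mirrors the hard-coded 1-minute minimum discussed above.
  val MinRememberDurationMs: Long = 60 * 1000L

  // Number of whole batches needed to cover the minimum remember duration.
  def batchesToRemember(batchMs: Long): Int =
    math.ceil(MinRememberDurationMs.toDouble / batchMs).toInt

  // Remember window in milliseconds, always a whole number of batches.
  def rememberWindowMs(batchMs: Long): Long =
    batchMs * batchesToRemember(batchMs)

  // A file is selected only if its modification time is newer than the
  // ignore threshold and not later than the current batch time.
  def isSelected(
      modTimeMs: Long,
      batchTimeMs: Long,
      batchMs: Long,
      newFilesOnly: Boolean,
      startTimeMs: Long): Boolean = {
    // With newFilesOnly, nothing modified before stream start is accepted.
    val initialThreshold = if (newFilesOnly) startTimeMs else 0L
    // The trailing edge of the remember window also acts as a lower bound.
    val threshold =
      math.max(initialThreshold, batchTimeMs - rememberWindowMs(batchMs))
    modTimeMs > threshold && modTimeMs <= batchTimeMs
  }
}
```

Under this model, a file modified two minutes before the current batch is rejected even when newFilesOnly is false, because the trailing edge of the window (one minute for a 10-second batch interval) becomes the effective threshold. That is the behaviour a configurable minimum would relax.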