[jira] [Commented] (SPARK-3276) Provide a API to specify whether the old files need to be ignored in file input text DStream

2015-03-30 Thread Emre Sevinç (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386775#comment-14386775
 ] 

Emre Sevinç commented on SPARK-3276:


Are there any plans to make the private val {{FileInputDStream.MIN_REMEMBER_DURATION}} 
configurable via some API?

It appears to be hard-coded to 1 minute in 
https://github.com/apache/spark/blob/branch-1.2/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala#L325,
 which causes files older than 1 minute not to be processed.
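
To make the effect concrete, here is a simplified model of such a cutoff. It is only a sketch and not the Spark source: the object name, the helper {{isSelected}} and the constant name are hypothetical, and the real FileInputDStream logic may differ in detail (for example in how the batch interval and newFilesOnly are taken into account). It only shows why a fixed 1 minute window excludes older files and how a configurable window would change that.

```scala
object RememberWindowModel {
  // Assumed value, taken from the comment above: a fixed one-minute window.
  val MinRememberDurationMs: Long = 60 * 1000L

  /** Hypothetical helper: a file is selected only if it was modified
    * after (batchTimeMs - rememberWindowMs). */
  def isSelected(fileModTimeMs: Long,
                 batchTimeMs: Long,
                 rememberWindowMs: Long = MinRememberDurationMs): Boolean =
    fileModTimeMs > batchTimeMs - rememberWindowMs

  def main(args: Array[String]): Unit = {
    val now = System.currentTimeMillis()
    val thirtySecondsOld = now - 30 * 1000L
    val fiveMinutesOld = now - 5 * 60 * 1000L

    // Inside the one-minute window: selected.
    println(isSelected(thirtySecondsOld, now))
    // Outside the one-minute window: skipped.
    println(isSelected(fiveMinutesOld, now))
    // With a (hypothetical) configurable ten-minute window: selected.
    println(isSelected(fiveMinutesOld, now, rememberWindowMs = 10 * 60 * 1000L))
  }
}
```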

> Provide a API to specify whether the old files need to be ignored in file 
> input text DStream
>
> Key: SPARK-3276
> URL: https://issues.apache.org/jira/browse/SPARK-3276
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 1.2.0
> Reporter: Jack Hu
> Priority: Minor
>
> Currently, StreamingContext has only one API, textFileStream, for creating 
> a text file DStream, and it always ignores old files. Sometimes the old 
> files are still useful.
> An API is needed that lets the user choose whether old files are ignored.
> The API currently in StreamingContext:
> def textFileStream(directory: String): DStream[String] = {
>   fileStream[LongWritable, Text, TextInputFormat](directory).map(_._2.toString)
> }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3276) Provide a API to specify whether the old files need to be ignored in file input text DStream

2015-01-20 Thread Jack Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283662#comment-14283662
 ] 

Jack Hu commented on SPARK-3276:


In some cases, old files (older than the current Spark system time) are
needed: if you have a fixed list in HDFS that you want to correlate with the
input stream, you need to load it from the file system.

As for the newFilesOnly option, it breaks on Spark 1.2 (it works on 1.1).
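
For illustration, one possible way to correlate a fixed list with the input stream is to load the list once as an RDD and join it against each batch. This is only a sketch under assumptions: the object name, the helpers {{parse}} and {{correlate}}, the comma-separated "key,value" line format and the two command-line arguments are all hypothetical. It loads the list with a plain batch read, so it does not depend on the file stream picking up old files.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._ // pair RDD functions on Spark 1.2 and earlier
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object CorrelateWithFixedList {
  // Assumed line format "key,value"; lines without a comma are dropped.
  def parse(line: String): Option[(String, String)] =
    line.split(",", 2) match {
      case Array(k, v) => Some((k.trim, v.trim))
      case _           => None
    }

  // Join every batch of the stream against the fixed reference RDD by key.
  def correlate(events: DStream[(String, String)],
                reference: RDD[(String, String)]): DStream[(String, (String, String))] =
    events.transform(rdd => rdd.join(reference))

  def main(args: Array[String]): Unit = {
    // Hypothetical arguments: path of the fixed list, directory to watch.
    val Array(referencePath, incomingDir) = args
    val conf = new SparkConf().setAppName("CorrelateWithFixedList").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // The fixed list is read once, as a normal batch RDD.
    val reference = ssc.sparkContext
      .textFile(referencePath)
      .flatMap(line => parse(line).toList)
      .cache()

    // New files arriving in the directory form the stream.
    val events = ssc.textFileStream(incomingDir).flatMap(line => parse(line).toList)

    correlate(events, reference).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```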










[jira] [Commented] (SPARK-3276) Provide a API to specify whether the old files need to be ignored in file input text DStream

2015-01-14 Thread Jem Tucker (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276641#comment-14276641
 ] 

Jem Tucker commented on SPARK-3276:
---

This can be achieved using ssc.fileStream[LongWritable, Text, 
TextInputFormat](directory: String, filter: Path => Boolean, newFilesOnly: 
Boolean).

If newFilesOnly is set to false, all files already in the directory will be 
streamed in the first batch. Is this what you meant?
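
A minimal sketch of that call, assuming a Spark 1.x build with the Hadoop new-API TextInputFormat on the classpath. The object name, the helper {{textFileStreamIncludingOld}} and the directory argument are hypothetical; the filter shown simply skips hidden files.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object OldFilesTextStream {
  // Same shape as textFileStream, but the caller decides whether files
  // already present in the directory are read (newFilesOnly = false).
  def textFileStreamIncludingOld(ssc: StreamingContext,
                                 directory: String,
                                 newFilesOnly: Boolean): DStream[String] =
    ssc.fileStream[LongWritable, Text, TextInputFormat](
      directory,
      (path: Path) => !path.getName.startsWith("."), // skip hidden files
      newFilesOnly
    ).map(_._2.toString)

  def main(args: Array[String]): Unit = {
    // Hypothetical argument: the directory to watch.
    val directory = args(0)
    val conf = new SparkConf().setAppName("OldFilesTextStream").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    val lines = textFileStreamIncludingOld(ssc, directory, newFilesOnly = false)
    lines.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Whether files already in the directory actually show up in the first batch depends on the Spark version and on how old the files are, so this is worth checking against the version in use.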

> Provide a API to specify whether the old files need to be ignored in file 
> input text DStream
>
> Key: SPARK-3276
> URL: https://issues.apache.org/jira/browse/SPARK-3276
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.0.2
> Reporter: Jack Hu
>
> Currently, StreamingContext has only one API, textFileStream, for creating 
> a text file DStream, and it always ignores old files. Sometimes the old 
> files are still useful.
> An API is needed that lets the user choose whether old files are ignored.
> The API currently in StreamingContext:
> def textFileStream(directory: String): DStream[String] = {
>   fileStream[LongWritable, Text, TextInputFormat](directory).map(_._2.toString)
> }






[jira] [Commented] (SPARK-3276) Provide a API to specify whether the old files need to be ignored in file input text DStream

2014-08-28 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113704#comment-14113704
 ] 

Sean Owen commented on SPARK-3276:
--

Given the nature of a stream processing framework, when would you want to keep 
reprocessing all old data? That is something you can do, but it doesn't require 
Spark Streaming.



