[ https://issues.apache.org/jira/browse/SPARK-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568019#comment-14568019 ]

Tathagata Das edited comment on SPARK-1312 at 6/1/15 9:16 PM:
--------------------------------------------------------------

I have verified this in the current version of Spark, and I am closing this JIRA.


was (Author: tdas):
I have verified this and I am closing this JIRA.

> Batch should read based on the batch interval provided in the StreamingContext
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-1312
>                 URL: https://issues.apache.org/jira/browse/SPARK-1312
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 0.9.0
>            Reporter: Sanjay Awatramani
>            Assignee: Tathagata Das
>            Priority: Critical
>              Labels: sliding, streaming, window
>
> This problem primarily affects sliding window operations in spark streaming.
> Consider the following scenario:
> - A DStream is created from any source (I've checked with file and socket sources).
> - No actions are applied to this DStream.
> - A sliding window operation is applied to this DStream, and an action is 
> applied to the windowed stream.
> In this case, Spark will not even read the input stream in batches that do 
> not fall on a slide-interval boundary. Put another way, it won't read the 
> input when it doesn't have to apply the window function. This happens 
> because all transformations in Spark are lazy.
> How to fix or work around this (see line 3 of the snippet below):
> JavaStreamingContext stcObj = new JavaStreamingContext(confObj, new Duration(1 * 60 * 1000));
> JavaDStream<String> inputStream = stcObj.textFileStream("/Input");
> inputStream.print(); // This is the workaround: an action on the pre-window stream
> JavaDStream<String> objWindow = inputStream.window(new Duration(windowLen * 60 * 1000), new Duration(slideInt * 60 * 1000));
> objWindow.dstream().saveAsTextFiles("/Output", "");
> The "Window operations" example in the streaming guide implies that Spark 
> will read the stream in every batch, which is not happening because the 
> transformations are lazy.
> In most uses of sliding windows, no action is applied to the pre-window 
> stream, so my expectation was that Streaming would read every batch as long 
> as an action is applied to the windowed stream.
> As per Tathagata,
> "Ideally every batch should read based on the batch interval provided in the 
> StreamingContext."
> Refer to the original thread at 
> http://apache-spark-user-list.1001560.n3.nabble.com/Sliding-Window-operations-do-not-work-as-documented-tp2999.html
> for more details, including Tathagata's conclusion.
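The root cause described above — transformations only run when an action forces them — can be illustrated outside Spark with plain Java streams. This is an analogy, not Spark code: intermediate stream operations play the role of DStream transformations, and a terminal operation plays the role of an action like print() or saveAsTextFiles().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyDemo {
    public static void main(String[] args) {
        List<Integer> read = new ArrayList<>();

        // Intermediate operations (peek, map) are lazy: declaring them
        // reads nothing, just like an un-actioned DStream transformation.
        Stream<Integer> mapped = Stream.of(1, 2, 3)
                .peek(read::add)       // records each element actually consumed
                .map(x -> x * 10);

        // Nothing has been consumed yet.
        System.out.println("elements read before terminal op: " + read.size());

        // The terminal operation (the "action") forces evaluation.
        List<Integer> out = mapped.collect(Collectors.toList());
        System.out.println("elements read after terminal op: " + read.size());
    }
}
```

The same principle explains the workaround in the snippet above: adding inputStream.print() attaches an action to the pre-window stream, forcing Spark to read the input in every batch rather than only on slide boundaries.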



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
