Thanks Tathagata, so can I say the RDD size (from the stream) is the window
size, and the overlap between 2 adjacent RDDs is the sliding size?

But I still don't understand what the batch size is, and why we need it,
since data processing is done RDD by RDD, right?

And does Spark chop the data into RDDs at the very beginning? Do you allow
event-by-event processing, for example filtering?
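
To be concrete, here is a rough sketch of what I mean by record-level
filtering (the socket source, host, and port are just placeholders I'm
assuming for illustration):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Assumed setup for illustration: 2-second batch interval, lines read from a socket.
val conf = new SparkConf().setAppName("FilterExample")
val ssc = new StreamingContext(conf, Seconds(2))
val lines = ssc.socketTextStream("localhost", 9999)

// The predicate is written per record; is it applied event by event as data
// arrives, or only after each 2-second batch has been formed into an RDD?
val errors = lines.filter(_.contains("ERROR"))
errors.print()

ssc.start()
ssc.awaitTermination()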




On Wed, Jul 16, 2014 at 6:47 PM, Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> I guess this is better explained in the streaming programming guide's
> <http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations>
> window operation subsection.
>
> For completeness' sake, it's worth mentioning the following. Window
> operations can be applied to other windowed DStreams as well. So the
> correct thing to say is that the window and slide durations of a window
> operation must be multiples of the "sliding interval" of the parent DStream.
> For a simple, non-windowed DStream, this sliding interval is the same as the
> batch interval.
>
> // say the batch interval is 2 seconds
> inputstream                                    // moves every batch interval, i.e. every 2 seconds
> inputstream.window(Seconds(3))                 // not allowed, must be a multiple of 2 seconds
> inputstream.window(Seconds(4))                 // allowed, moves every 2 seconds (therefore its sliding interval is 2 seconds)
> inputstream.window(Seconds(10), Seconds(4))    // allowed, moves every 4 seconds (therefore its sliding interval is 4 seconds)
> inputstream.window(Seconds(10), Seconds(4)).window(Seconds(6))    // not allowed, as the window interval must be a multiple of the parent's sliding interval, which is 4 seconds
> inputstream.window(Seconds(10), Seconds(4)).window(Seconds(8))    // allowed
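>
> For reference, a minimal end-to-end sketch might look like the following
> (I'm assuming a socket text stream on localhost:9999 purely for
> illustration):
>
> import org.apache.spark.SparkConf
> import org.apache.spark.streaming.{Seconds, StreamingContext}
> import org.apache.spark.streaming.StreamingContext._
>
> val conf = new SparkConf().setAppName("WindowExample")
> val ssc = new StreamingContext(conf, Seconds(2))     // batch interval = 2 seconds
> val lines = ssc.socketTextStream("localhost", 9999)
>
> // 10-second window sliding every 4 seconds; both are multiples of the
> // 2-second batch interval, so this is allowed
> val counts = lines.flatMap(_.split(" "))
>   .map(word => (word, 1))
>   .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(10), Seconds(4))
> counts.print()
>
> ssc.start()
> ssc.awaitTermination()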
>
> Hopefully that made sense :)
>
> TD
>
>
>
>
> On Wed, Jul 16, 2014 at 12:41 PM, Walrus theCat <walrusthe...@gmail.com>
> wrote:
>
>> I did not!
>>
>>
>> On Wed, Jul 16, 2014 at 12:31 PM, aaronjosephs <aa...@placeiq.com> wrote:
>>
>>> The only other thing to keep in mind is that the window duration and slide
>>> duration have to be multiples of the batch duration; IDK if you made that
>>> fully clear
>>>
>>>
>>>
>>>
>>
>>
>
