If you read the streaming programming guide, you'll notice that Spark does not do "real" streaming but "emulates" it with a so-called mini-batching approach. Let's say you want to work with a continuous stream of incoming events from a computing centre:
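To make the mini-batching idea concrete, here is a minimal plain-Python sketch — not Spark code — that simulates what a streaming job does: events are grouped into per-second mini-batches, and a windowed count is computed over the last few batches at a fixed slide interval. All names, intervals, and sample events are made up for illustration.

```python
from collections import Counter

# Hypothetical sample events as (timestamp_in_seconds, level) pairs.
events = [
    (0, "WARN"), (1, "INFO"), (2, "WARN"), (3, "WARN"),
    (4, "INFO"), (5, "WARN"), (6, "INFO"), (7, "WARN"),
]

BATCH_INTERVAL = 1   # seconds: one "mini-batch" (one RDD) per second
WINDOW_SIZE = 4      # seconds: aggregate over the last 4 seconds of batches
SLIDE_INTERVAL = 2   # seconds: emit a windowed result every 2 seconds

# 1. Mini-batching: bucket events into per-interval batches, analogous to
#    the per-second RDDs a DStream would produce.
batches = {}
for t, level in events:
    batches.setdefault(t // BATCH_INTERVAL, []).append(level)

# 2. Windowed reduce: count WARN messages across the batches that fall
#    inside the window ending at `window_end`.
window_batches = WINDOW_SIZE // BATCH_INTERVAL

def warn_count(window_end):
    msgs = [m for b in range(window_end - window_batches, window_end)
            for m in batches.get(b, [])]
    return Counter(msgs)["WARN"]

# 3. Sliding: evaluate the window every SLIDE_INTERVAL batches. With a
#    4 s window and a 2 s slide, consecutive windows overlap by 2 s.
results = {end: warn_count(end)
           for end in range(window_batches, len(batches) + 1, SLIDE_INTERVAL)}
print(results)  # → {4: 3, 6: 3, 8: 2}
```

Setting `SLIDE_INTERVAL` equal to `WINDOW_SIZE` would instead give back-to-back, non-overlapping windows.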
Batch interval: That's the basic "heartbeat" of your streaming application. If you set this to 1 second, Spark will create an RDD every second containing the events of that second. That's your "mini-batch" of data.

Windowing: That's a way to do aggregations on your streaming data. Say you want a summary of how many warnings your system produced in the last hour; you would then use a windowed reduce with a window size of 1 hour.

Sliding interval: This tells Spark how often to perform your windowed operation. If you set this to 1 hour as well, you would aggregate your data stream into consecutive, non-overlapping 1-hour windows. You could also tell Spark to create your 1-hour aggregation only twice a day by setting the sliding interval to 12 hours, or to create a 1-hour aggregation every 30 minutes - in that case each data window overlaps the previous one, of course.

I recommend carefully reading the programming guide - it explains these concepts pretty well:
https://spark.apache.org/docs/latest/streaming-programming-guide.html

Regards,
Jeff

2015-02-26 18:51 GMT+01:00 Hafiz Mujadid <hafizmujadi...@gmail.com>:

> Can somebody explain the difference between
> batchinterval, windowinterval and window sliding interval with example.
> If there is any real time use case of using these parameters?
>
> Thanks
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-batchinterval-windowinterval-and-window-sliding-interval-difference-tp21829.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.