You can't really use Spark batches as the basis for any kind of reliable
time aggregation: the time at which a batch is processed generally has
nothing to do with the time of the event.

You need to filter / aggregate by the time interval you care about in your
own code (see the sketch below), or use a data store that can do the
aggregation for you.
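
For example, here is a minimal sketch (Scala, Spark 1.x DStream API) of
keying each event by the start of its own 5-minute bucket, derived from
the event timestamp rather than the batch time, so the buckets land on
5:00, 5:05, ... no matter when the job starts. The "timestampMillis,value"
message format, the socket source (standing in for
KafkaUtils.createDirectStream), and the console output (standing in for
the Cassandra write) are all assumptions to keep it self-contained:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FixedWindowAgg {
  val WindowMs = 5 * 60 * 1000L  // fixed 5-minute buckets: 5:00, 5:05, ...

  // Floor an event timestamp to the start of its bucket.
  def bucketOf(eventTimeMs: Long): Long =
    eventTimeMs - (eventTimeMs % WindowMs)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("fixed-window-agg")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Stand-in for a Kafka direct stream: a DStream[String] of
    // "timestampMillis,value" records read from a socket.
    val lines = ssc.socketTextStream("localhost", 9999)

    lines
      .map { msg =>
        val Array(ts, value) = msg.split(",", 2)
        (bucketOf(ts.toLong), value.toDouble)   // key by event-time bucket
      }
      .reduceByKey(_ + _)                       // partial sum per batch
      .foreachRDD { rdd =>
        // Merge these partial sums into the store, keyed by bucket start.
        // Printing here; with Cassandra, a counter column (or a
        // read-modify-write) lets partials from different batches
        // accumulate into the same bucket row.
        rdd.collect().foreach { case (bucket, sum) =>
          println(s"bucket=$bucket partialSum=$sum")
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}

Since each batch only produces a partial aggregate per bucket, the store
has to merge partials across batches; that is the reason for the counter
/ read-modify-write note in the foreachRDD comment above.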



On Fri, Jan 15, 2016 at 9:13 PM, ffarozan <ffaro...@gmail.com> wrote:

> I am implementing aggregation using Spark Streaming and Kafka. My batch and
> window sizes are the same, and the aggregated data is persisted in Cassandra.
>
> I want to aggregate for fixed time windows - 5:00, 5:05, 5:10, ...
>
> But we cannot control when the streaming job runs; we only get to specify
> the batch interval.
>
> So the problem is: let's say the streaming job starts at 5:02; then I will
> get results at 5:07, 5:12, etc., which is not what I want.
>
> Any suggestions?
>
> thanks,
> Firdousi
>
