RE: Timed aggregation in Spark

2016-05-23 Thread Ewan Leith
To: Ofir Kerker <ofir.ker...@gmail.com> Cc: user@spark.apache.org Subject: Re: Timed aggregation in Spark

I don't think this solves the problem. Here are the issues: 1) How do we push the entire data to Vertica? Opening a connection per record would be too costly. 2) If a key doesn't come again, how do w
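The first concern above (a connection per record being too costly) is usually addressed by opening one connection per partition and writing in fixed-size batches, the pattern Spark's `foreachPartition` enables. A minimal pure-Python sketch of that batching idea; `FakeConnection`, `write_partition`, and `batch_size` are illustrative stand-ins, not a real Vertica or Spark API:

```python
# Sketch of the "one connection per partition, batched writes" pattern.
# FakeConnection stands in for a real DB client (e.g. a Vertica/JDBC
# connection) and just records how many round trips were made.

class FakeConnection:
    def __init__(self):
        self.batches = []

    def executemany(self, rows):
        self.batches.append(list(rows))   # one round trip per batch

    def close(self):
        pass

def write_partition(records, batch_size=500):
    """Open a single connection for the whole partition and flush
    records in fixed-size batches instead of one write per record."""
    conn = FakeConnection()
    buf = []
    for rec in records:
        buf.append(rec)
        if len(buf) >= batch_size:
            conn.executemany(buf)
            buf = []
    if buf:                               # flush the final partial batch
        conn.executemany(buf)
    conn.close()
    return conn

conn = write_partition(range(1200), batch_size=500)
# 1200 records -> 3 round trips (500 + 500 + 200), not 1200
```

The same shape carries over to Spark: inside `foreachPartition`, the connection is created once per partition on the executor, so it never needs to be serialized with the closure.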

Re: Timed aggregation in Spark

2016-05-23 Thread Nikhil Goyal
> From: Nikhil Goyal <nownik...@gmail.com>
> Sent: Monday, May 23, 2016 23:28
> Subject: Timed aggregation in Spark
> To: <user@spark.apache.org>
>
> Hi all,
>
> I want to aggregate my data for 5-10 min and then flush the aggregated
> data to some database like vertic

Re: Timed aggregation in Spark

2016-05-23 Thread Ofir Kerker
Yes, check out mapWithState: https://databricks.com/blog/2016/02/01/faster-stateful-stream-processing-in-apache-spark-streaming.html

From: Nikhil Goyal <nownik...@gmail.com>
Sent: Monday, May 23, 2016 23:28
Subject: Timed aggregation in Spark
To:
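mapWithState (described in the linked post) keeps per-key state across micro-batches and, unlike updateStateByKey, supports an idle timeout that evicts and hands back keys that stop arriving. A toy pure-Python sketch of that idea (the class, method names, and `timeout_s` parameter are illustrative, not Spark's actual Scala API):

```python
import time

class KeyedState:
    """Toy per-key state store with an idle timeout, mimicking the
    idea behind Spark's mapWithState (not its actual API)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.state = {}  # key -> (running_sum, last_seen_timestamp)

    def update(self, key, value, now=None):
        """Fold a new value into the key's running sum."""
        now = time.time() if now is None else now
        total, _ = self.state.get(key, (0, now))
        self.state[key] = (total + value, now)
        return self.state[key][0]

    def evict_idle(self, now=None):
        """Return and drop keys not updated within the timeout,
        analogous to mapWithState's StateSpec timeout behavior."""
        now = time.time() if now is None else now
        idle = {k: total for k, (total, seen) in self.state.items()
                if now - seen > self.timeout_s}
        for k in idle:
            del self.state[k]
        return idle

store = KeyedState(timeout_s=300)      # 5-minute idle timeout
store.update("user1", 10, now=0)
store.update("user1", 5, now=60)
store.update("user2", 7, now=100)
expired = store.evict_idle(now=400)    # "user1" idle for 340s -> evicted
```

The eviction hook is the relevant part for this thread: it gives a natural point to flush a key's final aggregate to the database when the key goes quiet.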

Timed aggregation in Spark

2016-05-23 Thread Nikhil Goyal
Hi all, I want to aggregate my data for 5-10 min and then flush the aggregated data to some database like Vertica. updateStateByKey is not exactly helpful in this scenario, as I can't flush all the records at once, nor can I clear the state. I wanted to know if anyone else has faced a similar
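The requirement above (accumulate per-key aggregates for a fixed interval, then flush everything at once and clear the state) is essentially a tumbling window. A minimal pure-Python sketch of that accumulate-flush-reset cycle; the class and its parameters are hypothetical, not Spark's API:

```python
from collections import defaultdict

class TumblingAggregator:
    """Accumulate per-key sums for a fixed window, then flush the whole
    window in one call and clear the state -- illustrative only."""

    def __init__(self, window_s, flush_fn):
        self.window_s = window_s
        self.flush_fn = flush_fn       # e.g. one bulk write to the DB
        self.window_start = 0
        self.sums = defaultdict(int)

    def add(self, key, value, now):
        if now - self.window_start >= self.window_s:
            self.flush(now)            # window boundary crossed
        self.sums[key] += value

    def flush(self, now):
        if self.sums:
            self.flush_fn(dict(self.sums))  # flush ALL keys at once
        self.sums.clear()                   # then clear the state
        self.window_start = now

flushed = []
agg = TumblingAggregator(window_s=300, flush_fn=flushed.append)
agg.add("a", 1, now=0)
agg.add("a", 2, now=10)
agg.add("b", 4, now=20)
agg.add("a", 5, now=310)   # crosses the 5-minute boundary -> flush
```

This is the behavior updateStateByKey makes awkward (no bulk flush, no state reset); the thread's suggested mapWithState, or a windowed DStream written out per batch, gets closer to it in actual Spark Streaming.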