To: Ofir Kerker <ofir.ker...@gmail.com>
Cc: user@spark.apache.org
Subject: Re: Timed aggregation in Spark
I don't think this solves the problem. Here are the issues:
1) How do we push the entire dataset to Vertica? Opening a connection per
record would be too costly.
2) If a key doesn't come again, how do we flush its state?
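For issue 1, the usual pattern (my sketch, not something proposed in the thread) is to write inside foreachRDD/foreachPartition, so you open one JDBC connection per partition rather than per record, and batch the inserts. The connection URL, credentials, table name, and the `aggregated` DStream below are all hypothetical placeholders:

```scala
import java.sql.DriverManager

// Assumes aggregated is a DStream[(String, Long)] of (key, total) pairs.
aggregated.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // One connection per partition, not per record.
    val conn = DriverManager.getConnection(
      "jdbc:vertica://host:5433/db", "user", "password") // hypothetical URL
    try {
      val stmt = conn.prepareStatement(
        "INSERT INTO counts (key, total) VALUES (?, ?)") // hypothetical table
      // Push rows in batches of 1000 instead of one round trip per record.
      records.grouped(1000).foreach { batch =>
        batch.foreach { case (k, v) =>
          stmt.setString(1, k)
          stmt.setLong(2, v)
          stmt.addBatch()
        }
        stmt.executeBatch()
      }
    } finally {
      conn.close()
    }
  }
}
```

The batch size is a tuning knob; the point is only that the connection and statement are amortized across a whole partition.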
> From: Nikhil Goyal <nownik...@gmail.com>
> Sent: Monday, May 23, 2016 23:28
> Subject: Timed aggregation in Spark
> To: <user@spark.apache.org>
>
>
>
> Hi all,
>
> I want to aggregate my data for 5-10 min and then flush the aggregated
> data to some database like Vertica.
Yes, check out mapWithState:
https://databricks.com/blog/2016/02/01/faster-stateful-stream-processing-in-apache-spark-streaming.html
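A minimal sketch of how mapWithState could fit here, under my own assumptions (the `events` DStream, key/value types, and the `updateSum` function are all illustrative, not from the thread): `StateSpec.timeout` makes Spark invoke the update function one final time for keys that go idle, which is exactly the "key doesn't come again" case.

```scala
import org.apache.spark.streaming.{Minutes, State, StateSpec}

// Pure per-key combine step, kept separate from the Spark API.
def combine(current: Option[Long], incoming: Long): Long =
  current.getOrElse(0L) + incoming

// Hypothetical update function: keeps a running sum per key.
// When a key times out, state.isTimingOut() is true and this is
// the last chance to emit its final total.
def updateSum(key: String, value: Option[Long],
              state: State[Long]): Option[(String, Long)] = {
  if (state.isTimingOut()) {
    Some((key, state.get())) // final flush for an expired key
  } else {
    val total = combine(state.getOption(), value.getOrElse(0L))
    state.update(total)
    None // read the running totals via stateSnapshots() instead
  }
}

// events is assumed to be a DStream[(String, Long)].
val stateful = events.mapWithState(
  StateSpec.function(updateSum _).timeout(Minutes(10)))
```

Calling `stateful.stateSnapshots()` periodically gives a DStream of all current (key, total) pairs, which is one way to get everything out at once for a bulk write.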
From: Nikhil Goyal <nownik...@gmail.com>
Sent: Monday, May 23, 2016 23:28
Subject: Timed aggregation in Spark
To: <user@spark.apache.org>
Hi all,
I want to aggregate my data for 5-10 min and then flush the aggregated data
to some database like Vertica. updateStateByKey is not exactly helpful in
this scenario, as I can't flush all the records at once, nor can I clear
the state. I wanted to know if anyone else has faced a similar problem.
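If the goal really is "aggregate for a fixed interval, then flush everything and start over," a stateless windowed reduce may be simpler than keyed state. This is my own suggestion, not something raised in the thread; `events` and the print placeholder are illustrative:

```scala
import org.apache.spark.streaming.Minutes

// Assumes events is a DStream[(String, Long)]. With slide == window
// length, windows don't overlap, so each emitted RDD is one complete,
// finished 10-minute aggregate and there is no state left to clear.
val perWindow = events.reduceByKeyAndWindow(
  (a: Long, b: Long) => a + b, // associative combine within the window
  Minutes(10),                 // window length
  Minutes(10))                 // slide interval == window length

perWindow.foreachRDD { rdd =>
  rdd.foreachPartition { rows =>
    // Replace this with a batched database write (see the JDBC pattern
    // discussed earlier in the thread for issue 1).
    rows.foreach { case (k, v) => println(s"$k -> $v") }
  }
}
```

The trade-off versus mapWithState is that totals reset at every window boundary, which is fine only if each flush is meant to be independent.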