RE: Timed aggregation in Spark

2016-05-23 Thread Ewan Leith

Re: Timed aggregation in Spark

2016-05-23 Thread Nikhil Goyal
I don't think this is solving the problem. Here are the issues:
1) How do we push the entire dataset to Vertica? Opening a connection per record will be too costly.
2) If a key doesn't come again, how do we push it to Vertica?
3) How do we schedule the dumping of data to avoid loading too much data in memory?
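
A common way around issue 1 is to write once per partition rather than once per record, inside foreachRDD / foreachPartition. Below is a minimal Scala sketch, assuming the aggregates arrive as a DStream[(String, Long)] and that plain JDBC is used against Vertica; the JDBC URL, the "aggregates" table and its columns are hypothetical placeholders, and a Vertica JDBC driver would need to be on the classpath.

import java.sql.DriverManager
import org.apache.spark.streaming.dstream.DStream

object VerticaSink {
  // Hypothetical connection string.
  val jdbcUrl = "jdbc:vertica://vertica-host:5433/mydb?user=dbadmin&password=secret"

  def writeToVertica(aggregates: DStream[(String, Long)]): Unit = {
    aggregates.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        // One connection and one batched INSERT per partition,
        // instead of a connection per record.
        val conn = DriverManager.getConnection(jdbcUrl)
        val stmt = conn.prepareStatement("INSERT INTO aggregates (key, total) VALUES (?, ?)")
        try {
          partition.foreach { case (key, total) =>
            stmt.setString(1, key)
            stmt.setLong(2, total)
            stmt.addBatch()
          }
          stmt.executeBatch()
        } finally {
          stmt.close()
          conn.close()
        }
      }
    }
  }
}

A connection pool, Vertica's bulk COPY, or the Vertica Spark connector would be more efficient at volume, but the per-partition pattern is the main point.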

Re: Timed aggregation in Spark

2016-05-23 Thread Ofir Kerker
Yes, check out mapWithState: https://databricks.com/blog/2016/02/01/faster-stateful-stream-processing-in-apache-spark-streaming.html
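
For reference, here is a minimal Scala sketch of the mapWithState approach described in that post (available since Spark 1.6), keeping a running sum per key. The socket source, checkpoint path, batch interval, and 5-minute timeout are placeholder choices. The timeout covers the "key doesn't come again" case: the mapping function is called one final time with isTimingOut() == true before the state is dropped, and stateSnapshots() exposes the full state each batch so the dump to Vertica can be scheduled independently of individual records.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object TimedAggregation {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TimedAggregation").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/timed-agg-checkpoint") // mapWithState requires checkpointing

    // Placeholder source; in practice this would be Kafka or similar,
    // parsed into (key, value) pairs.
    val events = ssc.socketTextStream("localhost", 9999)
      .map(_.split(","))
      .map(parts => (parts(0), parts(1).toLong))

    // Running sum per key. When a key times out, this is invoked once more
    // with an empty value and state.isTimingOut() == true, so the final
    // aggregate can still be emitted before Spark drops the state.
    def updateSum(key: String, value: Option[Long], state: State[Long]): (String, Long) = {
      val newSum = state.getOption().getOrElse(0L) + value.getOrElse(0L)
      if (!state.isTimingOut()) state.update(newSum)
      (key, newSum)
    }

    val spec = StateSpec.function(updateSum _).timeout(Seconds(300))
    val aggregated = events.mapWithState(spec)

    // Full (key, sum) state every batch; this is the stream to write to Vertica,
    // e.g. with the per-partition writer sketched above.
    aggregated.stateSnapshots().print()

    ssc.start()
    ssc.awaitTermination()
  }
}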