Re: Timed aggregation in Spark

2016-05-23 Thread Ofir Kerker
Yes, check out mapWithState: https://databricks.com/blog/2016/02/01/faster-stateful-stream-processing-in-apache-spark-streaming.html From: Nikhil Goyal Sent: Monday, May 23, 2016 23:28 Subject: Timed aggregation in Spark To:

Re: mapWithState not compacting removed state

2016-04-07 Thread Ofir Kerker
Hi Iain, Did you manage to solve this issue? It looks like we have a similar issue with processing time increasing every micro-batch but only after 30 batches. Thanks. On Thu, Mar 3, 2016 at 4:45 PM Iain Cundy wrote: > Hi All > > > > I’m aggregating data using

Re: Spark Streaming application code change and stateful transformations

2015-09-16 Thread Ofir Kerker
Cody Koeninger <c...@koeninger.org> wrote: > Solution 2 sounds better to me. You aren't always going to have graceful > shutdowns. > > On Mon, Sep 14, 2015 at 1:49 PM, Ofir Kerker <ofir.ker...@gmail.com> > wrote: > >> Hi, >> My Spark Streaming application c

Spark Streaming application code change and stateful transformations

2015-09-14 Thread Ofir Kerker
Hi, My Spark Streaming application consumes messages (events) from Kafka every 10 seconds using the direct stream approach, aggregates these messages into hourly aggregations (to answer analytics questions like: "How many users from Paris visited page X between 8PM and 9PM"), and saves the data to
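
For readers skimming the thread, the hourly aggregation described in this message can be pictured roughly as follows. This is only a plain-Python sketch of the idea, not Spark code and not the poster's implementation: the event layout `(timestamp, city, page, user_id)`, the helper names `hour_bucket` and `update_state`, and the sample data are all assumptions made for illustration. Each micro-batch is folded into state keyed by (hour, city, page), so the count of distinct users per hour survives across batches.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_bucket(ts_seconds):
    """Truncate a Unix timestamp (seconds) to the start of its UTC hour."""
    return ts_seconds - ts_seconds % 3600


def update_state(state, batch):
    """Fold one micro-batch of (timestamp, city, page, user_id) events into state.

    State maps (hour_start, city, page) to the set of distinct users seen so far.
    """
    for ts, city, page, user in batch:
        state[(hour_bucket(ts), city, page)].add(user)
    return state


# Hypothetical sample data: two micro-batches starting at 8PM UTC.
state = defaultdict(set)
base = int(datetime(2015, 9, 14, 20, 0, tzinfo=timezone.utc).timestamp())
batches = [
    [(base + 5, "Paris", "X", "u1"), (base + 8, "Paris", "X", "u2")],
    [(base + 15, "Paris", "X", "u1"), (base + 3700, "Paris", "X", "u3")],
]
for batch in batches:
    update_state(state, batch)

# Distinct users from Paris who visited page X in each hour.
visitors_8pm = len(state[(base, "Paris", "X")])
visitors_9pm = len(state[(base + 3600, "Paris", "X")])
```

In a real streaming job this state would live inside a stateful transformation and be checkpointed, which is where the code-change question raised in the thread comes from.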