Spark Streaming, external windowing?

2014-07-16 Thread Sargun Dhillon
Does anyone here have a way to do Spark Streaming with external timing for windows? Right now, it relies on the wall clock of the driver to determine how long each batch read lasts. We have Kafka and HDFS ingress into our Spark Streaming pipeline, where the events are annotated

Stateful RDDs?

2014-07-10 Thread Sargun Dhillon
So, one portion of our Spark Streaming application requires some state. Our application takes a bunch of application events (e.g. user_session_started, user_session_ended, etc.), calculates metrics from them, and writes them to a serving layer (see: Lambda Architecture). Two related