Does anyone here have a way to do Spark Streaming with external timing
for windows? Right now, Spark Streaming relies on the wall clock of the
driver to determine how long each batch read lasts.
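
To make "external timing" concrete, here is a minimal plain-Python sketch (not Spark code) of assigning events to tumbling windows by a timestamp carried on the event itself, rather than by when the driver happened to read them. The field names `timestamp` and `name`, the 60-second window length, and the helper names are all assumptions made up for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical window length, chosen only for illustration


def window_start(event_time, window_seconds=WINDOW_SECONDS):
    """Return the start (epoch seconds) of the tumbling window containing event_time."""
    return event_time - (event_time % window_seconds)


def group_by_event_time(events, window_seconds=WINDOW_SECONDS):
    """Bucket events by their own timestamp, ignoring arrival order and wall clock."""
    windows = defaultdict(list)
    for event in events:
        key = window_start(event["timestamp"], window_seconds)
        windows[key].append(event["name"])
    return dict(windows)


# Hypothetical events; the third one arrives out of order but still
# lands in the window its own timestamp belongs to.
events = [
    {"name": "user_session_started", "timestamp": 120},
    {"name": "user_session_ended", "timestamp": 185},
    {"name": "user_session_started", "timestamp": 130},
]

result = group_by_event_time(events)
```

With wall-clock batching, the late event would be counted in whichever batch it arrived in; here it is grouped with the other event from the window starting at 120.
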
We have Kafka and HDFS ingress into our Spark Streaming pipeline,
where the events are annotated
So, one portion of our Spark Streaming application requires some
state. Our application takes a bunch of application events (e.g.
user_session_started, user_session_ended), calculates
metrics from these, and writes them to a serving layer (see: Lambda
Architecture). Two related