Does anyone here have a way to do Spark Streaming with external timing
for windows? Right now, Spark Streaming relies on the wall clock of the
driver to determine how long each batch lasts.

We have Kafka and HDFS ingress into our Spark Streaming pipeline,
and each event is annotated with the timestamp at which it actually
happened (in real time). We would like our windows to be based on
those timestamps rather than on the driver's clock.
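
To make concrete what we mean, here is a rough plain-Python sketch (not
Spark code) of the bucketing we are after. The function name, the
10-second window, and the (timestamp, payload) event shape are all made
up for illustration; our real events look different.

```python
from collections import defaultdict


def bucket_by_event_time(events, window_seconds):
    """Group (timestamp, payload) pairs into tumbling windows.

    The timestamp is the event's own epoch-seconds value, not the time
    at which the event arrived. Keys of the result are window starts.
    """
    windows = defaultdict(list)
    for timestamp, payload in events:
        # Floor the event timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start].append(payload)
    return dict(windows)


# Hypothetical events, arriving out of order relative to their timestamps.
events = [(100, "a"), (112, "b"), (104, "c"), (121, "d")]
result = bucket_by_event_time(events, 10)
```

The point is that event "c" lands in the same window as "a" even though
it arrived after "b", which is what driver-clock batching cannot give us.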

Does anyone have any ideas how to do this?
