Does anyone here have a way to drive Spark Streaming windows from an external clock? Right now, Spark Streaming relies on the driver's wall clock to determine how long each batch read lasts.
We have Kafka and HDFS ingress into our Spark Streaming pipeline, and each event is annotated with the timestamp at which it actually occurred. We would like to define our windows by those event timestamps rather than by driver time. Does anyone have any ideas on how to do this?
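To make the goal concrete, here is a minimal sketch in plain Python (not Spark API) of the bucketing we are after. The function names are hypothetical, and it assumes epoch-millisecond timestamps and fixed, non-overlapping windows; it only illustrates the intent, not a solution.

```python
from collections import defaultdict


def window_start(event_ts_ms, window_ms):
    """Return the start of the fixed window that contains the event timestamp."""
    return event_ts_ms - (event_ts_ms % window_ms)


def group_by_event_window(events, window_ms):
    """Group (timestamp_ms, payload) pairs by event-time window.

    Arrival order and the clock of the process doing the grouping
    play no part in the result.
    """
    windows = defaultdict(list)
    for ts, payload in events:
        windows[window_start(ts, window_ms)].append(payload)
    return dict(windows)


# Events arrive out of order; "b" happened before "c" but is read after it.
events = [(1000, "a"), (61000, "c"), (59999, "b")]
windows = group_by_event_window(events, 60000)
# "a" and "b" land in the window starting at 0, "c" in the one starting at 60000.
```

Within a single batch, a key like this could presumably feed a keyed aggregation, but events for the same window that arrive in different batches would still need to be combined somehow, which is the part we are unsure about.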