Hi,

I am interested in building an application that uses sliding windows based not on the time an item was received, but on either

* a timestamp embedded in the data, or
* a count (e.g., every 10 items, look at the last 100 items).
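To make the first option concrete, here is a minimal, Spark-independent sketch of bucketing items by an embedded timestamp rather than by arrival time. The function name `time_windows` and the `(timestamp, value)` tuple layout are my own assumptions for illustration, not an existing API:

```python
def time_windows(items, window_ms=1000):
    """Bucket (timestamp_ms, value) pairs into fixed-size windows keyed
    by the embedded timestamp, ignoring when each item was received."""
    buckets = {}
    for ts, value in items:
        # Integer division maps each timestamp onto its window index.
        buckets.setdefault(ts // window_ms, []).append(value)
    # Return windows in timestamp order.
    return [buckets[k] for k in sorted(buckets)]

# Items arriving out of order still land in the right windows:
events = [(0, "a"), (1500, "c"), (500, "b")]
print(time_windows(events, window_ms=1000))  # [['a', 'b'], ['c']]
```

Because the grouping depends only on the embedded timestamp, the same function works for data read from HDFS, where there is no arrival time at all.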
I also want to do this on stream data received from Kafka, as well as on HDFS data (where the notion of "time received" clearly does not exist). I found <http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-windowing-Driven-by-absolutely-time-td1733.html#a1843> as a pointer on how to use an embedded timestamp, but does anyone have a suggestion on how to use item count as the window size constraint?

Thanks
Tobias
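For the count-based case, here is a minimal sketch of what I have in mind, written as a plain generator over an iterable rather than against any Spark API (the name `count_windows` and its parameters are made up for illustration): every `slide` items, it yields a snapshot of the last `window_size` items.

```python
from collections import deque

def count_windows(items, window_size=100, slide=10):
    """Yield a list of the most recent `window_size` items
    every `slide` items, independent of any timestamps."""
    buf = deque(maxlen=window_size)  # old items fall off automatically
    for i, item in enumerate(items, start=1):
        buf.append(item)
        if i % slide == 0:
            yield list(buf)

# Small example: every 3 items, look at the last 5 items.
for window in count_windows(range(10), window_size=5, slide=3):
    print(window)
# [0, 1, 2]
# [1, 2, 3, 4, 5]
# [4, 5, 6, 7, 8]
```

What I am unsure about is how to express this kind of count-driven trigger on a Spark DStream, since `window()` and `reduceByKeyAndWindow()` take durations, not item counts.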