Hi,

I am interested in building an application that uses sliding windows based not on the time an item was received, but on either

* a timestamp embedded in the data, or
* a count (e.g., every 10 items, look at the last 100 items).
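To make the first option concrete, here is a minimal, Spark-independent sketch of bucketing items by an embedded timestamp rather than by arrival time. The function name `time_windows` and the `(timestamp, value)` tuple layout are my own assumptions for illustration, not an existing API:

```python
def time_windows(items, window_ms=1000):
    """Bucket (timestamp_ms, value) pairs into fixed-size windows keyed
    by the embedded timestamp, ignoring when each item was received."""
    buckets = {}
    for ts, value in items:
        # Integer division maps each timestamp onto its window index.
        buckets.setdefault(ts // window_ms, []).append(value)
    # Return windows in timestamp order.
    return [buckets[k] for k in sorted(buckets)]

# Items arriving out of order still land in the right windows:
events = [(0, "a"), (1500, "c"), (500, "b")]
print(time_windows(events, window_ms=1000))  # [['a', 'b'], ['c']]
```

Because the grouping depends only on the embedded timestamp, the same function works for data read from HDFS, where there is no arrival time at all.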
I also want to do this on stream data received from Kafka, as well as on HDFS data (where the notion of "time received" clearly does not exist). I found <http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-windowing-Driven-by-absolutely-time-td1733.html#a1843> as a pointer on how to use an embedded timestamp, but does anyone have a suggestion on how to use item count as the window size constraint?

Thanks
Tobias
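For the count-based case, here is a minimal sketch of what I have in mind, written as a plain generator over an iterable rather than against any Spark API (the name `count_windows` and its parameters are made up for illustration): every `slide` items, it yields a snapshot of the last `window_size` items.

```python
from collections import deque

def count_windows(items, window_size=100, slide=10):
    """Yield a list of the most recent `window_size` items
    every `slide` items, independent of any timestamps."""
    buf = deque(maxlen=window_size)  # old items fall off automatically
    for i, item in enumerate(items, start=1):
        buf.append(item)
        if i % slide == 0:
            yield list(buf)

# Small example: every 3 items, look at the last 5 items.
for window in count_windows(range(10), window_size=5, slide=3):
    print(window)
# [0, 1, 2]
# [1, 2, 3, 4, 5]
# [4, 5, 6, 7, 8]
```

What I am unsure about is how to express this kind of count-driven trigger on a Spark DStream, since `window()` and `reduceByKeyAndWindow()` take durations, not item counts.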