Hi,

I have a question regarding Spark Structured Streaming:
will non-time-based window operations like the lag function be supported at
some point, or is this not on the table due to technical difficulties?

That is, will something like the following be possible in the future:

from pyspark.sql import Window
from pyspark.sql.functions import lag
w = Window.partitionBy('uid').orderBy('timestamp')
df = df.withColumn('lag', lag(df['col_A']).over(w))

This would be useful in streaming applications where we look for "patterns"
based on multiple events (rows) occurring in a particular order.
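For illustration, here is a minimal batch-mode sketch of that kind of pattern
matching (the data and column names are made up): it flags rows where an 'A'
event is directly followed by a 'B' event for the same uid.

from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import col, lag

# Made-up example data: (uid, timestamp, event)
spark = SparkSession.builder.getOrCreate()
events = spark.createDataFrame(
    [(1, 10, 'A'), (1, 20, 'B'), (2, 10, 'B')],
    ['uid', 'timestamp', 'event'])

# Pull the previous event per uid, then keep A -> B transitions.
w = Window.partitionBy('uid').orderBy('timestamp')
matches = (events
           .withColumn('prev_event', lag(col('event')).over(w))
           .filter((col('prev_event') == 'A') & (col('event') == 'B')))
matches.show()

This works in batch mode today; the question is whether the same window/lag
construct will ever run on a streaming DataFrame.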
A different and more general way to achieve the same functionality would be
to use a self-join, but for that a streaming-ready uid would be needed that
is monotonically increasing and (!) can be generated concurrently.
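For example, assuming each row already carried a consecutive per-uid sequence
number 'seq' (exactly the piece that is missing in streaming), the self-join
sketch could look like this:

from pyspark.sql.functions import col

# Hypothetical: 'seq' is assumed to be a consecutive per-uid sequence.
# Shift it by one so each row joins to its predecessor within the uid.
prev = df.select(col('uid').alias('p_uid'),
                 (col('seq') + 1).alias('p_seq'),
                 col('col_A').alias('lag_col_A'))
lagged = df.join(prev,
                 (df['uid'] == prev['p_uid']) & (df['seq'] == prev['p_seq']),
                 'left')

Note that monotonically_increasing_id() is unique but not consecutive, so by
itself it cannot stand in for 'seq' here.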

Thanks a lot & best,
Michael
