Hi all! I am learning Spark SQL and window functions. The behavior of the last() window function was unexpected for me in one case(for a person without any previous experience in the window functions).
I define my window specification as follows: Window.partitionBy('transportType, 'route).orderBy('eventTime). So, I have neither rowsBetween nor rangeBetween boundaries. In this scenario, I expect to get the latest event (by time) in a group if I apply the last('eventTime) window function over this window specification. However, this does not happen. Looking at the code, I was able to figure out that if there are no range/rows boundaries, the UnspecifiedFrame is assigned. Later, in ResolveWindowFrame for the last() function, Spark assigns a default window frame. The default frame depends on the presence of any order specification (if one has an order specification, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). That's why the last() window function does work I as expected in my case. There is a very helpful comment in SpecifiedWindowFrame. I wish I could find it in the documentation. That's why I have 2 questions: - Did I miss the place in the documentation where this behavior is described? If no, would it be appropriate from my side to try to find where this can be done? - Would it be appropriate/useful to add some window function examples to spark/examples? There are no such so far Sincerely, Anton Okolnychyi