westonpace commented on issue #34201:
URL: https://github.com/apache/arrow/issues/34201#issuecomment-1454307957

   Yes, #32991 is a blocker.  After that we will still need some work to 
support window functions.  There was an effort to do this in #32813 but that 
was abandoned and probably more complex then we have the resources to implement:
   
   We will need to add a window node which (ignoring `over` for the moment) 
mostly boils down to defining a kernel for window functions.  A naive 
implementation would simply accumulate all of the data and pass it to the 
kernel all at once but I don't think this is practical.  I've been thinking 
about how we might do this in a more reasonable fashion and it shouldn't be 
nightmarish.
   
   Many window functions (e.g. cumulative sum, rank, percent_rank) simply 
require ordered input (and knowledge of the total # of rows) and don't "reach" 
ahead or behind.  So this would be very similar to `ExecuteScalarExpression` 
except the kernels would be executed in a serial fashion (easily doable once 
#32991 is done) and maintain a state after each call (e.g. the current 
cumulative sum).
   
   Sadly, lag & lead are rather peculiar window functions.  I believe, under 
the hood, it makes more sense to translate something like `lag(col) over ()` 
into `first_value(col) over (rows 1 preceding)`.  These behave the same except 
for the boundaries and we could finesse that.  That will require support for 
`over` (e.g. actual "windows" and not just ordered execution of scalar-ish 
functions).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to