yashmayya opened a new pull request, #14249:
URL: https://github.com/apache/pinot/pull/14249

   - Fixes https://github.com/apache/pinot/issues/11406.
   - Background reading - 
https://www.postgresql.org/docs/current/tutorial-window.html, 
https://www.postgresql.org/docs/current/functions-window.html, 
https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS
 (this one is most relevant to this PR).
   - Currently, Pinot's window function implementations have limited or even 
incorrect support for window frame bounds. For instance, `FIRST_VALUE` / 
`LAST_VALUE` assume that the window frame is always `ROWS BETWEEN UNBOUNDED 
PRECEDING AND UNBOUNDED FOLLOWING` even though the default window frame as per 
standard SQL is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. 
Furthermore, support for defining the lower bound explicitly as `UNBOUNDED 
PRECEDING` / `CURRENT ROW` / `n FOLLOWING` / `n PRECEDING` and the upper bound 
as `UNBOUNDED FOLLOWING` / `CURRENT ROW` / `n FOLLOWING` / `n PRECEDING` does 
not exist.
   - This patch adds support for any custom bounds (offset based or otherwise) 
for `ROWS` window frames, and also adds support for `UNBOUNDED PRECEDING` / 
`CURRENT ROW` / `UNBOUNDED FOLLOWING` bounds for `RANGE` window frames. There 
are a ton of edge cases to be handled here but this patch attempts to add test 
cases to cover most of these scenarios.
   - Note that Calcite (and hence Pinot) only supports `ROWS` and `RANGE` based 
window frame bounds, whereas Postgres also supports `GROUPS`.
   - The planner side changes (mainly literal extraction) are built over 
https://github.com/apache/pinot/pull/14233/.
   - Apart from the need to add support for offset bounds for `RANGE` based 
window frames, another important future enhancement is to optimize the 
performance of `ROWS` based window frames for aggregate window functions where 
both the lower and upper bounds are offset based / current row. Since the 
changes in this patch are built over the existing framework for window 
functions where a "merger" is used to merge values for aggregate window 
functions, it isn't possible to use a sliding window based algorithm to 
efficiently compute aggregates for windows. This will require more significant 
changes to the framework but is critical to ensure performant computations 
especially for larger windows. Optimizations have been added in this patch to 
ensure that aggregation window functions over window frames with `UNBOUNDED 
PRECEDING` lower bound or `UNBOUNDED FOLLOWING` upper bound are computed 
efficiently.
   - Note that all the changes here only affect the aggregate window functions 
(`SUM`, `COUNT`, `MIN`, `MAX` etc.) and `FIRST_VALUE` / `LAST_VALUE`. The other 
window functions currently supported by Pinot (`LAG`, `LEAD`, `RANK`, 
`DENSE_RANK`, `ROW_NUMBER`) don't support custom window frame bounds and 
Calcite ensures that during query planning.
   - Calcite also does some other validation for window frame bounds like 
making sure lower bound isn't `UNBOUNDED FOLLOWING` / upper bound isn't 
`UNBOUNDED PRECEDING`, lower bound isn't `UNBOUNDED FOLLOWING` if upper bound 
is `UNBOUNDED PRECEDING` and vice versa etc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to