ozankabak commented on PR #3570: URL: https://github.com/apache/arrow-datafusion/pull/3570#issuecomment-1260839211
Hello again! We have resolved almost all the points you mentioned. We also identified some challenging corner cases regarding NULL treatment and updated our code to handle them, along with new unit tests covering these cases. Three main discussion points remain:

*PostgreSQL compatibility of new unit tests*

Expected values for all the new unit tests were sanity-checked against PostgreSQL. To remain in sync with PostgreSQL, we also think it would be a good idea to add psql-parity tests for the NULL-treatment cases. If you agree, let's make this the subject of another PR. We can open an issue to track it (and will be happy to work on a PR to resolve it).

*Changes to `ScalarValue`*

At this point, we are not really adding any new functionality or complexity to this type. We just tidied up and moved some `ScalarValue`-only code that was scattered across multiple modules into `scalar.rs`. For example, the coercions you saw were already part of the code here: https://github.com/apache/arrow-datafusion/blob/87faf868f2276b84e63cad6721ca08bd79ed9cb8/datafusion/physical-expr/src/aggregate/sum.rs#L261

We agree that, in general, there should be no coercions at this stage -- any necessary conversions should be handled earlier in the code path. However, we don't want to extend the scope of this PR to remove already-existing coercions from the codebase. If there are no objections, we are happy to open an issue for this and help resolve it in the near future, along with similar issues like [#3511](https://github.com/apache/arrow-datafusion/issues/3511). A teaser: I have done some preliminary tests, and it seems removing all the coercions will not be a tough task at all.
However, removing this kind of code always calls for more careful testing than a preliminary check, so it deserves a careful study in a separate PR 🙂

*Bisect*

We considered using the [`partition_point`](https://doc.rust-lang.org/std/primitive.slice.html#method.partition_point) function. This requires converting `&[ArrayRef]` to `&[&[ScalarValue]]`, where each inner `&[ScalarValue]` corresponds to a row (because we are searching for the insertion point of a target `&[ScalarValue]` among rows). Unless we are missing something, this conversion would incur unnecessary memory usage and CPU time, since it copies every row up front. In contrast, our implementation keeps the data in its original columnar format and materializes a row from an index only when a comparison is performed. Since a binary search performs *log(n)* comparisons, we need only *log(n)* row conversions instead of *n*. If you have any suggestions for adding bisect functionality without changing the columnar orientation of the data, we can create an issue for it and help resolve it in a subsequent PR.

Thanks again for all the reviews. Excited to see this merged and used!
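To make the trade-off concrete, here is a minimal, self-contained sketch of the columnar bisect idea. It deliberately simplifies: plain `Vec<i64>` columns stand in for Arrow `ArrayRef`s, a `Vec<i64>` stands in for a row of `ScalarValue`s, and the names `row_at` and `bisect_left` are hypothetical, not DataFusion APIs. The point it illustrates is that row materialization happens lazily, once per comparison, so a search over *n* rows does only *log(n)* conversions.

```rust
/// Materialize row `idx` from columnar data. In the real implementation this
/// would build a `Vec<ScalarValue>` from `&[ArrayRef]`; here it just gathers
/// one `i64` per column. Called only inside comparisons, never up front.
fn row_at(columns: &[Vec<i64>], idx: usize) -> Vec<i64> {
    columns.iter().map(|col| col[idx]).collect()
}

/// Find the insertion point of `target` among the rows encoded by `columns`,
/// assuming the rows are sorted lexicographically. Performs O(log n)
/// comparisons, hence O(log n) row materializations instead of O(n).
fn bisect_left(columns: &[Vec<i64>], target: &[i64]) -> usize {
    let n = columns.first().map_or(0, |c| c.len());
    let (mut lo, mut hi) = (0, n);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        // Lazy row conversion: only the probed row is materialized.
        if row_at(columns, mid).as_slice() < target {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    lo
}

fn main() {
    // Two columns, five rows: (1,10), (1,20), (2,5), (2,15), (3,0),
    // sorted lexicographically by (col0, col1).
    let columns = vec![vec![1, 1, 2, 2, 3], vec![10, 20, 5, 15, 0]];
    assert_eq!(bisect_left(&columns, &[2, 10]), 3);
    assert_eq!(bisect_left(&columns, &[0, 0]), 0);
    assert_eq!(bisect_left(&columns, &[4, 0]), 5);
    println!("ok");
}
```

The contrast with `partition_point` is that the latter needs a `&[T]` of rows to search over, which forces materializing all *n* rows before the search even starts.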
