ianmcook commented on code in PR #12460: URL: https://github.com/apache/arrow/pull/12460#discussion_r846110288
##########
docs/source/python/api/compute.rst:
##########
@@ -45,6 +45,21 @@ Aggregations
    tdigest
    variance
 
+Cumulative Functions
+--------------------
+
+Cumulative functions are vector functions that perform a running total on its
+input and outputs an array containing the corresponding intermediate running values.

Review Comment:

>Is this what's expected

Yes, I think so. I am worried that users will _think_ this function does something like this:

```python
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> t = pa.table({'x':[1, 2, 3, 4]})
>>> pc.cumulative_sum(t, ['x'])
pyarrow.Table
x: int64
----
x: [[1,3,6,10]]
```

That's what `pandas.DataFrame.cumsum` does, so users of PyArrow will expect that `pyarrow.compute.cumulative_sum` does the same. But it doesn't. This is less of an obvious problem in PyArrow itself, but users of APIs that create ExecPlans might think it works this way.

(P.S. There is currently no way for Arrow C++ compute functions to do what my example here shows, because we can't deterministically preserve row order. Later, if we implement window functions, we will get this capability.)
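
To make the row-order point concrete, here is a minimal plain-Python sketch. It does not use any Arrow API: `running_total` is a hypothetical helper standing in for a cumulative sum over a single column, and the "shuffled" list is just an assumed example of rows arriving in a different order than they were written.

```python
from itertools import accumulate


def running_total(values):
    """Hypothetical stand-in for a cumulative sum over one column."""
    return list(accumulate(values))


# The same four rows, delivered in two different orders (as could happen
# when rows are processed without a guaranteed ordering).
rows_in_original_order = [1, 2, 3, 4]
rows_in_shuffled_order = [3, 1, 4, 2]

in_order = running_total(rows_in_original_order)
shuffled = running_total(rows_in_shuffled_order)

print(in_order)  # [1, 3, 6, 10]
print(shuffled)  # [3, 4, 8, 10]
```

Only the final total agrees between the two runs. Every intermediate value depends on the order in which rows are seen, which is why a table-level cumulative sum only makes sense once row order is well defined.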