ianmcook commented on code in PR #12460:
URL: https://github.com/apache/arrow/pull/12460#discussion_r846110288


##########
docs/source/python/api/compute.rst:
##########
@@ -45,6 +45,21 @@ Aggregations
    tdigest
    variance
 
+Cumulative Functions
+--------------------
+
+Cumulative functions are vector functions that perform a running total on its
+input and outputs an array containing the corresponding intermediate running values.

Review Comment:
   >Is this what's expected
   
   Yes I think so.
   
   I am worried that users will _think_ this function does something like this:
   ```python
   >>> import pyarrow as pa
   >>> import pyarrow.compute as pc
   >>> t = pa.table({'x':[1, 2, 3, 4]})
   >>> pc.cumulative_sum(t, ['x'])
   pyarrow.Table
   x: int64
   ----
   x: [[1,3,6,10]]
   ```
   That's what `pandas.DataFrame.cumsum` does, so PyArrow users will expect `pyarrow.compute.cumulative_sum` to do the same. But it doesn't.
   
   This is less obviously a problem in PyArrow itself, but users of APIs that create ExecPlans might think it works this way.
   
   (P.S. There is currently no way for Arrow C++ compute functions to do what my example here shows, because we can't deterministically preserve row order. Later, if we implement window functions, we will get this capability.)


