A. Coady created ARROW-18433:
--------------------------------

             Summary: Optimize aggregate functions to work with batches.
                 Key: ARROW-18433
                 URL: https://issues.apache.org/jira/browse/ARROW-18433
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++, Python
    Affects Versions: 10.0.1
            Reporter: A. Coady


Most compute functions already work with the dataset API and don't need to load 
entire columns into memory. Aggregate functions which are associative could work 
the same way, one batch at a time: `min`, `max`, `any`, `all`, `sum`, `product`. 
Even `unique` and `value_counts` could be computed incrementally.
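
To illustrate why associativity is enough, here is a minimal sketch in plain 
Python. Lists stand in for record batches, and `aggregate_batches` is a 
hypothetical helper, not an existing Arrow function. With pyarrow the batches 
could presumably come from `Dataset.to_batches()` and the per-batch reductions 
from `pyarrow.compute`, but that wiring is an assumption and is not shown here.

```python
import math
from collections import Counter


def aggregate_batches(batches):
    """Fold per-batch partial results into whole-column aggregates.

    Hypothetical helper: `batches` is any iterable of non-nested lists,
    standing in for the batches a dataset scan would yield.
    """
    result = {
        "min": None, "max": None, "sum": 0, "product": 1,
        "any": False, "all": True, "value_counts": Counter(),
    }
    for batch in batches:  # only one batch is held at a time
        if not batch:
            continue
        lo, hi = min(batch), max(batch)
        result["min"] = lo if result["min"] is None else min(result["min"], lo)
        result["max"] = hi if result["max"] is None else max(result["max"], hi)
        result["sum"] += sum(batch)
        result["product"] *= math.prod(batch)
        result["any"] = result["any"] or any(batch)
        result["all"] = result["all"] and all(batch)
        # Counts merge by addition, so `value_counts` is batchable too,
        # and `unique` is simply its set of keys.
        result["value_counts"].update(batch)
    result["unique"] = list(result["value_counts"])
    return result


# A generator, so batches are produced lazily, one at a time.
batches = (b for b in [[3, 1, 4], [1, 5], [9, 2, 6]])
result = aggregate_batches(batches)
```

The only state kept between batches is the running partial result, which is 
small for the scalar aggregates and proportional to the number of distinct 
values for `unique` and `value_counts`.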

A couple of implementation ideas:
 * expand the dataset API to support expressions which return scalars
 * add a `BatchedArray` type which is like a `ChunkedArray` but with lazy 
loading
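
As a rough sketch of the second idea, a lazily loaded array might look like 
the following in plain Python. Everything here is hypothetical: the 
`BatchedArray` class, its `chunks` and `reduce` methods, and the `load_chunks` 
callable are illustrative names only, and lists stand in for Arrow arrays.

```python
import functools
import operator


class BatchedArray:
    """Hypothetical lazy counterpart to a chunked array.

    Chunks are not stored; they are produced on demand by calling
    `load_chunks`, which must return a fresh iterable each time.
    """

    def __init__(self, load_chunks):
        self._load_chunks = load_chunks

    def chunks(self):
        # Each call restarts the scan, so no chunk is retained.
        return iter(self._load_chunks())

    def reduce(self, per_chunk, combine):
        # Reduce each chunk, then combine the partial results.
        # Raises TypeError if every chunk is empty.
        partials = (per_chunk(c) for c in self.chunks() if c)
        return functools.reduce(combine, partials)


loaded = []  # records which chunks were actually read


def load_chunks():
    for i, chunk in enumerate([[3, 1, 4], [1, 5], [9, 2, 6]]):
        loaded.append(i)
        yield chunk


arr = BatchedArray(load_chunks)
nothing_loaded_yet = loaded == []  # constructing the array reads nothing

total = arr.reduce(sum, operator.add)
smallest = arr.reduce(min, min)
```

Each aggregate triggers its own pass over the chunks, which is the trade-off 
of not caching them; a real design would have to decide whether and when to 
retain loaded data.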



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
