A. Coady created ARROW-18433:
--------------------------------

Summary: Optimize aggregate functions to work with batches.
Key: ARROW-18433
URL: https://issues.apache.org/jira/browse/ARROW-18433
Project: Apache Arrow
Issue Type: New Feature
Components: C++, Python
Affects Versions: 10.0.1
Reporter: A. Coady

Most compute functions work with the dataset API and don't load whole columns. Aggregate functions that are associative could work the same way: `min`, `max`, `any`, `all`, `sum`, `product`, and even `unique` and `value_counts`.

A couple of implementation ideas:

* expand the dataset API to support expressions which return scalars
* add a `BatchedArray` type which is like a `ChunkedArray` but with lazy loading