[ https://issues.apache.org/jira/browse/ARROW-12044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ben Kietzman updated ARROW-12044:
---------------------------------
    Labels: query-engine  (was: )

> [C++][Compute] Add support for imperfect grouping for use in radix partitioning
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-12044
>                 URL: https://issues.apache.org/jira/browse/ARROW-12044
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Ben Kietzman
>            Priority: Major
>              Labels: query-engine
>
> ARROW-11591 adds Grouper for identifying groups based on multiple key columns.
> For a large number of groups, it is beneficial to do a first-pass
> partitioning on the key columns so that each worker thread handles only a
> subset of the query's groups. This is usually accomplished by computing only
> the hashes of the keys (not full group identity) and pushing slices of the
> input batches to workers based on those hashes.
> This would probably make sense as a member function of Grouper, maybe
> Grouper::ConsumeImperfect.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
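The hash-only first pass described in the issue can be sketched as follows. This is a minimal sketch, written as standalone C++ rather than against Arrow's actual Grouper API: the key columns are modeled as `std::vector`s, `PartitionByKeyHash`, `PartitionedRows` and `CombineHashes` are hypothetical names (not a proposed signature for `Grouper::ConsumeImperfect`), and `std::hash` stands in for whatever hash Arrow would use. Each row's keys are hashed and combined, the low bits of the hash select a partition (radix partitioning), and a counting pass turns that into contiguous per-partition row-index ranges that could be used to slice or take rows for each worker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical result of an "imperfect" grouping pass: row indices reordered
// so each partition's rows are contiguous, plus offsets delimiting partitions.
struct PartitionedRows {
  std::vector<uint32_t> row_ids;  // row indices grouped by partition
  std::vector<uint32_t> offsets;  // partition p spans [offsets[p], offsets[p+1])
};

// Boost-style 64-bit hash combining (an assumption; Arrow has its own hashing).
inline uint64_t CombineHashes(uint64_t seed, uint64_t h) {
  return seed ^ (h + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2));
}

// Hypothetical stand-in for a ConsumeImperfect-like pass over two key columns.
// num_partitions must be a power of two so the low bits of the hash can serve
// directly as the partition id (radix partitioning).
PartitionedRows PartitionByKeyHash(const std::vector<int64_t>& key0,
                                   const std::vector<std::string>& key1,
                                   uint32_t num_partitions) {
  assert(num_partitions != 0 && (num_partitions & (num_partitions - 1)) == 0);
  assert(key0.size() == key1.size());
  const std::size_t n = key0.size();
  const uint64_t mask = num_partitions - 1;
  std::vector<uint32_t> partition_of(n);
  std::vector<uint32_t> counts(num_partitions, 0);

  // Pass 1: hash only the keys (no group identity) and build a histogram.
  for (std::size_t i = 0; i < n; ++i) {
    uint64_t h = std::hash<int64_t>{}(key0[i]);
    h = CombineHashes(h, std::hash<std::string>{}(key1[i]));
    partition_of[i] = static_cast<uint32_t>(h & mask);
    ++counts[partition_of[i]];
  }

  // Exclusive prefix sum turns per-partition counts into start offsets.
  PartitionedRows out;
  out.offsets.assign(num_partitions + 1, 0);
  for (uint32_t p = 0; p < num_partitions; ++p) {
    out.offsets[p + 1] = out.offsets[p] + counts[p];
  }

  // Pass 2: scatter row indices into their partition's range; rows keep
  // their original relative order within a partition (stable).
  out.row_ids.resize(n);
  std::vector<uint32_t> cursor(out.offsets.begin(), out.offsets.end() - 1);
  for (std::size_t i = 0; i < n; ++i) {
    out.row_ids[cursor[partition_of[i]]++] = static_cast<uint32_t>(i);
  }
  return out;
}
```

Rows with equal keys always land in the same partition, but distinct keys may also share one, which is what makes the grouping "imperfect". Under this sketch's assumptions, each worker would then run an exact Grouper over only its partition's rows, so hash collisions between partitions affect load balance rather than correctness.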