[ https://issues.apache.org/jira/browse/ARROW-11606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Lamb reassigned ARROW-11606:
-----------------------------------

    Assignee: Andy Grove

> [Rust] [DataFusion] Need guidance on HashAggregateExec reconstruction
> ---------------------------------------------------------------------
>
>                 Key: ARROW-11606
>                 URL: https://issues.apache.org/jira/browse/ARROW-11606
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust - DataFusion
>            Reporter: Andy Grove
>            Assignee: Andy Grove
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> We have run into an issue in the Ballista project where we are reconstructing
> the Final and Partial HashAggregateExec operators [1] for distributed
> execution and we need some guidance.
>
> The Partial HashAggregateExec gets created OK and executes correctly.
> However, when we create the Final HashAggregateExec, it is not finding the
> expected schema in the input operator. The partial exec outputs field names
> ending with "[sum]" and "[count]" and so on but the final aggregate doesn't
> seem to be looking for those names.
>
> It is also worth noting that the Final and Partial executors are not
> connected directly in this usage.
>
> The Partial exec is executed and output streamed to disk.
> The Final exec then runs against the output from the Partial exec.
>
> We may need to make changes in DataFusion to allow other crates to support
> this kind of use case?
>
> [1] https://github.com/ballista-compute/ballista/pull/491

--
This message was sent by Atlassian Jira
(v8.3.4#803005)