[ https://issues.apache.org/jira/browse/ARROW-11606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-11606:
-----------------------------------
    Labels: pull-request-available  (was: )

> [Rust] [DataFusion] Need guidance on HashAggregateExec reconstruction
> ---------------------------------------------------------------------
>
>                 Key: ARROW-11606
>                 URL: https://issues.apache.org/jira/browse/ARROW-11606
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust - DataFusion
>            Reporter: Andy Grove
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have run into an issue in the Ballista project where we are reconstructing 
> the Final and Partial HashAggregateExec operators [1] for distributed 
> execution and we need some guidance.
> The Partial HashAggregateExec gets created OK and executes correctly.
> However, when we create the Final HashAggregateExec, it is not finding the 
> expected schema in the input operator. The partial exec outputs field names 
> ending with "[sum]" and "[count]" and so on but the final aggregate doesn't 
> seem to be looking for those names.
> It is also worth noting that the Final and Partial executors are not 
> connected directly in this usage.
> The Partial exec is executed and output streamed to disk.
> The Final exec then runs against the output from the Partial exec.
> We may need to make changes in DataFusion so that other crates can support 
> this kind of use case.
>  [1] https://github.com/ballista-compute/ballista/pull/491
>  
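
The naming mismatch described in the issue can be illustrated with a small standalone sketch. It does not use DataFusion or Arrow. The helper functions, the column names, and the exact `name[state]` format are assumptions made for illustration, based only on the issue's description of field names ending in "[sum]" and "[count]". It shows one way a by-name lookup could miss the columns written by the partial stage; it is not a statement of how the final aggregate actually resolves its inputs.

```rust
/// Build the name of one intermediate state column, assuming a
/// `<aggregate name>[<state>]` convention (an assumption, see above).
fn state_field_name(aggregate: &str, state: &str) -> String {
    format!("{}[{}]", aggregate, state)
}

/// Hypothetical stand-in for the schema written to disk by the partial stage.
fn partial_output_schema() -> Vec<String> {
    vec![
        "c1".to_string(),                      // group-by column
        state_field_name("SUM(c2)", "sum"),   // SUM keeps a running sum
        state_field_name("AVG(c3)", "sum"),   // AVG keeps a sum ...
        state_field_name("AVG(c3)", "count"), // ... and a count
    ]
}

/// Look a column up by exact name, returning its position if present.
fn index_of(schema: &[String], name: &str) -> Option<usize> {
    schema.iter().position(|field| field == name)
}
```

In this sketch, `index_of(&partial_output_schema(), "SUM(c2)")` finds nothing, while a lookup for `"SUM(c2)[sum]"` succeeds. A final stage that reads the partial output from disk would therefore need either to know the suffixed names or to address the state columns by position.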



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
