[ 
https://issues.apache.org/jira/browse/ARROW-11606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17283766#comment-17283766
 ] 

Andy Grove commented on ARROW-11606:
------------------------------------

I understand the issue better now.

In the DataFusion planner, the aggregate expressions are compiled against the 
schema of the input to the partial aggregate. These compiled expressions are 
then used to construct both the partial and final aggregates.

In other words, the expressions for the Final aggregate are not compiled 
against its input schema, but against the input schema of the Partial 
aggregate.
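
To illustrate the mismatch, here is a minimal, self-contained sketch. It does 
not use the DataFusion API: Schema, CompiledSum, compile_sum and 
partial_state_field are hypothetical stand-ins, and the "SUM(c2)[sum]" field 
name is only an assumed example of the "[sum]" suffix naming described in the 
issue. It shows an expression that compiles against the Partial aggregate's 
input schema but fails to compile against the Partial aggregate's output 
schema, which is what the Final aggregate actually reads.

```rust
// Hypothetical stand-ins for illustration only; these are NOT DataFusion types.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<String>,
}

impl Schema {
    fn new(names: &[&str]) -> Self {
        Schema {
            fields: names.iter().map(|n| n.to_string()).collect(),
        }
    }

    // Look up a column by name, returning its position if present.
    fn index_of(&self, name: &str) -> Option<usize> {
        self.fields.iter().position(|f| f == name)
    }
}

// A "compiled" SUM aggregate: its argument column was resolved against a schema.
#[derive(Debug, Clone, PartialEq)]
struct CompiledSum {
    column: String,
}

// Compile SUM(column) against a schema; fails if the column is missing.
fn compile_sum(column: &str, schema: &Schema) -> Result<CompiledSum, String> {
    match schema.index_of(column) {
        Some(_) => Ok(CompiledSum {
            column: column.to_string(),
        }),
        None => Err(format!(
            "column '{}' not found in schema {:?}",
            column, schema.fields
        )),
    }
}

// Assumed naming of the intermediate state field emitted by the partial stage.
fn partial_state_field(agg: &CompiledSum) -> String {
    format!("SUM({})[sum]", agg.column)
}

fn main() {
    // Schema of the input to the Partial aggregate.
    let partial_input = Schema::new(&["c1", "c2"]);

    // The planner compiles the aggregate once, against the partial input.
    let agg = compile_sum("c2", &partial_input).expect("c2 exists in partial input");

    // Schema produced by the Partial aggregate: group column plus state field.
    let state = partial_state_field(&agg);
    let partial_output = Schema::new(&["c1", state.as_str()]);

    // Recompiling the same expression against the Final aggregate's own input
    // fails, because "c2" no longer exists there under that name.
    match compile_sum("c2", &partial_output) {
        Ok(_) => println!("compiled against partial output"),
        Err(e) => println!("cannot compile against partial output: {}", e),
    }
}
```

This is why a serde layer that only has the Final aggregate's input at hand 
cannot rebuild the expressions from that schema alone; in this sketch it would 
also need the Partial aggregate's input schema.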

This feels a little unnatural when implementing serde, but I will think about 
it more and see how I can work around it.

> [Rust] [DataFusion] Need guidance on HashAggregateExec reconstruction
> ---------------------------------------------------------------------
>
>                 Key: ARROW-11606
>                 URL: https://issues.apache.org/jira/browse/ARROW-11606
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust - DataFusion
>            Reporter: Andy Grove
>            Priority: Major
>
> We have run into an issue in the Ballista project where we are reconstructing 
> the Final and Partial HashAggregateExec operators [1] for distributed 
> execution and we need some guidance.
> The Partial HashAggregateExec gets created OK and executes correctly.
> However, when we create the Final HashAggregateExec, it is not finding the 
> expected schema in the input operator. The partial exec outputs field names 
> ending with "[sum]" and "[count]" and so on but the final aggregate doesn't 
> seem to be looking for those names.
> It is also worth noting that the Final and Partial executors are not 
> connected directly in this usage.
> The Partial exec is executed and output streamed to disk.
> The Final exec then runs against the output from the Partial exec.
> Do we need to make changes in DataFusion so that other crates can support 
> this kind of use case?
>  [1] https://github.com/ballista-compute/ballista/pull/491
>  



