Re: [C++][Acero] can Acero support distributed computation?

2023-07-07 Thread Sasha Krassovsky
> Note that there are a few aggregates that cannot be distributed (e.g. median). You can distribute/parallelize based on the group-by key, but yes, in general these so-called holistic aggregates (i.e. aggregates that must look at the entire partition) can’t be distributed within a key without a
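
A minimal sketch of the point about median, in plain Python rather than Acero's API. The function name `median_by_key` and the sample rows are hypothetical. It assumes rows are `(key, value)` pairs and shows two things: partitioning by group-by key gives the same answer as a single-node run, while splitting one key's values across nodes and combining per-node medians does not.

```python
from collections import defaultdict
from statistics import median


def median_by_key(rows):
    """Group (key, value) rows by key and take the median of each group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: median(values) for key, values in groups.items()}


rows = [("x", 1), ("x", 2), ("x", 3), ("x", 4), ("x", 100), ("y", 5), ("y", 7)]

# Partitioned by key: each node sees every row for the keys it owns,
# so the per-node results can simply be unioned.
node0 = [r for r in rows if r[0] == "x"]
node1 = [r for r in rows if r[0] == "y"]
by_key = {**median_by_key(node0), **median_by_key(node1)}

# Split within a key: combining the per-node medians is not the true median.
median_of_medians = median([median([1, 2, 3]), median([4, 100])])
true_median = median([1, 2, 3, 4, 100])
```

Here `by_key` matches `median_by_key(rows)`, while `median_of_medians` differs from `true_median`, which is why a holistic aggregate needs all values for a key in one place.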

Re: [C++][Acero] can Acero support distributed computation?

2023-07-07 Thread Weston Pace
aggregate kernels already maintain such intermediate state internally, I wonder if it is possible to have some APIs in aggregate kernels to retrieve this intermediate state to enable such use scenarios. Thanks. Jiangtao From: Sasha Krassovsk

Re: [C++][Acero] can Acero support distributed computation?

2023-07-07 Thread Jiangtao Peng
7, 2023 at 2:21 PM To: user@arrow.apache.org Subject: Re: [C++][Acero] can Acero support distributed computation? Yes, what you’ve said is correct for Mean. But my point earlier is that there should only be a few such special cases. A simple case would be e.g. Max, where Aggregate outputs Max
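
The Mean versus Max distinction can be sketched as follows. This is plain Python, not Acero's kernel API. `MeanState` and the `pre_aggregate_*` / `post_aggregate_*` names are hypothetical, chosen to mirror the Pre-Aggregation and Post-Aggregation wording used in this thread. The sketch assumes every node holds at least one value.

```python
from dataclasses import dataclass


@dataclass
class MeanState:
    """Intermediate state for Mean: a running sum and a row count."""
    total: float = 0.0
    count: int = 0


def pre_aggregate_mean(values):
    """Runs on each node: reduce local values to a (sum, count) state."""
    state = MeanState()
    for v in values:
        state.total += v
        state.count += 1
    return state


def post_aggregate_mean(states):
    """Runs on one node: merge the partial states, then finalize."""
    total = sum(s.total for s in states)
    count = sum(s.count for s in states)
    return total / count


def pre_aggregate_max(values):
    """For Max, the local result is already a valid partial state."""
    return max(values)


def post_aggregate_max(partials):
    """Merging Max partials is just Max again."""
    return max(partials)


node_a, node_b = [1, 2, 3], [10]
mean = post_aggregate_mean([pre_aggregate_mean(node_a), pre_aggregate_mean(node_b)])
biggest = post_aggregate_max([pre_aggregate_max(node_a), pre_aggregate_max(node_b)])
```

Averaging the two local means here would give 6.0 instead of the correct 4.0, which is why Mean has to ship its sum and count, whereas Max can reuse its ordinary output as the partial result.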

Re: [C++][Acero] can Acero support distributed computation?

2023-07-07 Thread Sasha Krassovsky
s how to implement Pre-Aggregation and Post-Aggregation using Acero. Best, Jiangtao From: Sasha Krassovsky Date: Friday, July 7, 2023 at 1:25 PM To: user@arrow.apache.org Subject: Re: [C++][Acero] can Acero support distributed computation?

Re: [C++][Acero] can Acero support distributed computation?

2023-07-07 Thread Jiangtao Peng
and Post-Aggregation using Acero. Best, Jiangtao From: Sasha Krassovsky Date: Friday, July 7, 2023 at 1:25 PM To: user@arrow.apache.org Subject: Re: [C++][Acero] can Acero support distributed computation? Can you clarify what you mean by “data flow”? Each machine will be executing the same

Re: [C++][Acero] can Acero support distributed computation?

2023-07-06 Thread Sasha Krassovsky
about data flow of “compute” and “merge” on different nodes? Best, Jiangtao From: Sasha Krassovsky Date: Friday, July 7, 2023 at 11:07 AM To: user@arrow.apache.org Subject: Re: [C++][Acero] can Acero support distributed computation? Distribut

Re: [C++][Acero] can Acero support distributed computation?

2023-07-06 Thread Jiangtao Peng
be appreciated. Thanks, Jiangtao From: Sasha Krassovsky Date: Friday, July 7, 2023 at 10:12 AM To: user@arrow.apache.org Subject: Re: [C++][Acero] can Acero support distributed computation? Hi Jiangtao, Acero doesn’t support any distributed computation on its own. However, to get some simple

Re: [C++][Acero] can Acero support distributed computation?

2023-07-06 Thread Sasha Krassovsky
thod on aggregation kernel? Any other tips would be appreciated. Thanks, Jiangtao From: Sasha Krassovsky Date: Friday, July 7, 2023 at 10:12 AM To: user@arrow.apache.org Subject: Re: [C++][Acero] can Acero support distributed computation? Hi

Re: [C++][Acero] can Acero support distributed computation?

2023-07-06 Thread Jiangtao Peng
Krassovsky Date: Friday, July 7, 2023 at 10:12 AM To: user@arrow.apache.org Subject: Re: [C++][Acero] can Acero support distributed computation? Hi Jiangtao, Acero doesn’t support any distributed computation on its own. However, to get some simple distributed computation going it would

Re: [C++][Acero] can Acero support distributed computation?

2023-07-06 Thread Sasha Krassovsky
Hi Jiangtao, Acero doesn’t support any distributed computation on its own. However, to get some simple distributed computation going it would be sufficient to add a Shuffle node. For example, for Aggregation, the Shuffle would assign a range of hashes to each node, and then each node would
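
A rough sketch of the hash-range idea, in plain Python rather than Acero (which, as noted above, has no Shuffle node of its own). All names here (`owner_of`, `shuffle`, `distributed_sum`) are hypothetical. The excerpt is cut off before it says what each node does after the shuffle; the sketch assumes each node simply aggregates the keys that fall in its hash range, using a sum as the example aggregate and CRC32 as a stand-in hash.

```python
import zlib
from collections import defaultdict


def owner_of(key, num_nodes):
    """Map a group-by key to the node that owns its range of hash values."""
    h = zlib.crc32(key.encode("utf-8"))  # stable hash in [0, 2**32)
    return h * num_nodes // 2**32        # equal-width hash ranges


def shuffle(rows, num_nodes):
    """Route local (key, value) rows into one outbox per destination node."""
    outboxes = [[] for _ in range(num_nodes)]
    for key, value in rows:
        outboxes[owner_of(key, num_nodes)].append((key, value))
    return outboxes


def aggregate_sum(rows):
    """Ordinary single-node group-by sum."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def distributed_sum(partitions, num_nodes):
    """Shuffle every node's local rows, then aggregate on the owning nodes."""
    inboxes = [[] for _ in range(num_nodes)]
    for local_rows in partitions:
        for node, batch in enumerate(shuffle(local_rows, num_nodes)):
            inboxes[node].extend(batch)
    return [aggregate_sum(inbox) for inbox in inboxes]


partitions = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
per_node = distributed_sum(partitions, num_nodes=3)
```

Because every row for a given key lands on the same node, the per-node dictionaries in `per_node` never share a key, so the final result is just their union.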

[C++][Acero] can Acero support distributed computation?

2023-07-06 Thread Jiangtao Peng
Hi there, I’ve recently been learning the Acero streaming execution engine, and I’m wondering whether Acero supports distributed computing. I have read the code for the aggregation node and kernels; the aggregation kernels seem to hide the details of the intermediate aggregation state. If I use multiple nodes with Acero execution