The partial aggregate seems to be working now, with one interface extension and one bug fix in the Phoenix project. Will do some code cleanup and create a pull request soon.
There is still a hack in the Drill project that I made to force 2-phase aggregation. I'll try to fix that.

Jacques, I have one question though: how can I verify that there is more than one slice and that the shuffle happens?

Thanks,
Maryann

On Mon, Oct 5, 2015 at 2:03 PM, James Taylor <[email protected]> wrote:

> Maryann,
> I believe Jacques mentioned that a little bit of refactoring is required
> for a merge sort to occur - there's something that does that, but it's not
> expected to be used in this context currently.
>
> IMHO, there's more of a clear value in getting the aggregation to use
> Phoenix first, so I'd recommend going down that road as Jacques mentioned
> above if possible. Once that's working, we can circle back to the partial
> sort.
>
> Thoughts?
> James
>
> On Mon, Oct 5, 2015 at 10:40 AM, Maryann Xue <[email protected]>
> wrote:
>
>> I actually tried implementing partial sort with
>> https://github.com/jacques-n/drill/pull/4, which I figured might be a
>> little easier to start with than partial aggregation. But I found that even
>> though the code worked (returned the right results), the Drill side sort
>> turned out to be an ordinary sort instead of a merge which it should have
>> been. Any idea of how to fix that?
>>
>> Thanks,
>> Maryann
>>
>> On Mon, Oct 5, 2015 at 12:52 PM, Jacques Nadeau <[email protected]>
>> wrote:
>>
>>> Right now this type of work is done here:
>>>
>>> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrule.java
>>>
>>> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/AggPruleBase.java
>>>
>>> With Distribution Trait application here:
>>>
>>> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTraitDef.java
>>>
>>> To me, the easiest way to solve the Phoenix issue is by providing a rule
>>> that matches HashAgg and StreamAgg but requires Phoenix convention as
>>> input. It would replace everywhere but would only be plannable when it is
>>> the first phase of aggregation.
>>>
>>> Thoughts?
>>>
>>> --
>>> Jacques Nadeau
>>> CTO and Co-Founder, Dremio
>>>
>>> On Thu, Oct 1, 2015 at 2:30 PM, Julian Hyde <[email protected]> wrote:
>>>
>>>> Phoenix is able to perform quite a few relational operations on the
>>>> region server: scan, filter, project, aggregate, sort (optionally with
>>>> limit). However, the sort and aggregate are necessarily "local". They
>>>> can only deal with data on that region server, and there needs to be a
>>>> further operation to combine the results from the region servers.
>>>>
>>>> The question is how to plan such queries. I think the answer is an
>>>> AggregateExchangeTransposeRule.
>>>>
>>>> The rule would spot an Aggregate on a data source that is split into
>>>> multiple locations (partitions) and split it into a partial Aggregate
>>>> that computes sub-totals and a summarizing Aggregate that combines
>>>> those totals.
>>>>
>>>> How does the planner know that the Aggregate needs to be split? Since
>>>> the data's distribution has changed, there would need to be an
>>>> Exchange operator.
>>>> It is the Exchange operator that triggers the rule to fire.
>>>>
>>>> There are some special cases. If the data is sorted as well as
>>>> partitioned (say because the local aggregate uses a sort-based
>>>> algorithm) we could maybe use a more efficient plan. And if the
>>>> partition key is the same as the aggregation key we don't need a
>>>> summarizing Aggregate, just a Union.
>>>>
>>>> It turns out not to be very Phoenix-specific. In the Drill-on-Phoenix
>>>> scenario, once the Aggregate has been pushed through the Exchange
>>>> (i.e. onto the drill-bit residing on the region server) we can then
>>>> push the DrillAggregate across the drill-to-phoenix membrane and make
>>>> it into a PhoenixServerAggregate that executes in the region server.
>>>>
>>>> Related issues:
>>>> * https://issues.apache.org/jira/browse/DRILL-3840
>>>> * https://issues.apache.org/jira/browse/CALCITE-751
>>>>
>>>> Julian
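
For anyone following the thread, here is a rough, language-neutral sketch (in Python) of the split discussed above: a partial aggregate per partition, a summarizing aggregate that combines the sub-totals, the Union-only special case when the partition key equals the aggregation key, and a merge of already-sorted partial results instead of a full re-sort. All function names here are hypothetical and for illustration only; none of this is Drill, Calcite, or Phoenix API, and the real planner rules work on relational operators rather than in-memory lists.

```python
import heapq
from collections import defaultdict


def partial_aggregate(rows):
    """Phase 1 (hypothetical): runs locally on one partition.

    rows is an iterable of (key, value) pairs; returns {key: (sum, count)}.
    """
    acc = defaultdict(lambda: [0, 0])
    for key, value in rows:
        acc[key][0] += value
        acc[key][1] += 1
    return {key: (s, c) for key, (s, c) in acc.items()}


def summarizing_aggregate(partials):
    """Phase 2 (hypothetical): combines sub-totals after the exchange."""
    total = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            total[key][0] += s
            total[key][1] += c
    return {key: (s, c) for key, (s, c) in total.items()}


def union_partials(partials):
    """Special case: partition key == aggregation key.

    Each key lives in exactly one partition, so the partial results are
    already final and a plain union is enough.
    """
    result = {}
    for partial in partials:
        for key, sub_total in partial.items():
            if key in result:
                raise ValueError("key %r appears in more than one partition" % (key,))
            result[key] = sub_total
    return result


def merge_sorted_partials(sorted_runs):
    """Combine locally sorted runs with a k-way merge, not a full sort."""
    return heapq.merge(*sorted_runs)


if __name__ == "__main__":
    region_a = [("x", 1), ("y", 2), ("x", 3)]
    region_b = [("x", 10), ("z", 5)]
    partials = [partial_aggregate(region_a), partial_aggregate(region_b)]
    print(summarizing_aggregate(partials))
    print(list(merge_sorted_partials([[1, 4, 9], [2, 3, 10]])))
```

The sketch carries (sum, count) pairs so that something like AVG could be derived in the final phase; whether the actual rule represents sub-totals this way is an assumption on my part.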
