Jinfeng,

Yes, you are right about increased plan exploration time. But an implementation could put bounds on the search space.
This is what we are planning to do in Hive.

On 2/11/15, 2:21 PM, "Jinfeng Ni" <[email protected]> wrote:

>Drill currently does query planning in two phases: 1) logical planning,
>which handles join order, logical filter/project push down etc., and 2)
>physical planning, which makes decisions between different physical
>operators (different join / aggregation methods), filter/project push down
>(storage-specific rules), and inserts EXCHANGE. Part of the reason to split
>planning into two phases is that when the two phases are merged together,
>the planning time increases significantly (since the planner needs to
>enumerate different join orders, multiplied by different choices of
>EXCHANGE).
>
>The new rules that you are proposing seem to want to build the plan in one
>single logical planning phase. I'm not sure how it will impact the overall
>planning time.
>
>
>
>On Wed, Feb 11, 2015 at 1:38 PM, Jinfeng Ni <[email protected]> wrote:
>
>> I think it's a good proposal to put Exchange/Distribution into Calcite
>> library.
>>
>> Makes sense to me. +1
>>
>>
>>
>> On Wed, Feb 11, 2015 at 12:45 PM, Julian Hyde <[email protected]> wrote:
>>
>>> Drill guys: What do you think of the proposal?
>>>
>>> On Feb 11, 2015, at 11:34 AM, Ashutosh Chauhan <[email protected]>
>>> wrote:
>>>
>>> Overall proposal sounds good to me. +1
>>>
>>> On Tue, Feb 10, 2015 at 3:35 PM, Julian Hyde <[email protected]> wrote:
>>>
>>> I've had some discussions about adding an Exchange operator and
>>> Distribution trait to Hive's cost-based optimizer, which uses Calcite.
>>> Ashutosh has logged a bug [
>>> https://issues.apache.org/jira/browse/CALCITE-594 ] and pull request
>>> containing a proof-of-concept [
>>> https://github.com/apache/incubator-calcite/pull/52/files ].
>>>
>>> I know that Drill has a Distribution trait and several sub-classes of
>>> Exchange operator (DrillDistributionTrait, ExchangePrel,
>>> BroadcastExchangePrel, HashToMergeExchangePrel, HashToRandomExchangePrel,
>>> OrderedPartitionExchangePrel and SimpleMergeExchangePrel, in
>>> https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical
>>> )
>>>
>>> I propose to create a Distribution trait and Exchange operator base class
>>> in Calcite, with the goal that both Drill and Hive would use them. (I am
>>> adopting Drill terminology -- Distribution rather than Partition, Exchange
>>> rather than Shuffle -- but I am pretty sure that the concepts are the
>>> same.)
>>>
>>> public abstract class Exchange extends SingleRel {
>>>   public final RelDistribution distribution;
>>>
>>>   protected Exchange(RelCluster cluster, RelTraitSet traitSet, RelNode
>>>       input, RelDistribution distribution) {
>>>     super(cluster, traitSet, input);
>>>     this.distribution = distribution;
>>>   }
>>> }
>>>
>>> public interface RelDistribution extends RelMultipleTrait {
>>>   enum DistributionType {
>>>     SINGLETON,
>>>     HASH_DISTRIBUTED,
>>>     RANGE_DISTRIBUTED,
>>>     RANDOM_DISTRIBUTED,
>>>     ROUND_ROBIN_DISTRIBUTED,
>>>     BROADCAST_DISTRIBUTED
>>>   }
>>>
>>>   public DistributionType getType();
>>>   public ImmutableIntList getFields();
>>> }
>>>
>>> Calcite would not contain any particular exchange algorithms. However,
>>> since it is common to combine sort and exchange, I would create a base
>>> class for it:
>>>
>>> public abstract class SortExchange extends Exchange {
>>>   public final Collation collation;
>>>
>>>   ...
>>> }
>>>
>>> The physical operators would remain in Drill/Hive and would likely be
>>> fully specified by the distribution and collation; they would not need any
>>> additional attributes.
>>> We would not be able to port DrillDistributionTraitDef.convert
>>> directly -- it would create a LogicalExchange (analogous to how
>>> RelCollationTraitDef.convert creates a LogicalSort) and then Drill rules
>>> would need to kick in to convert that to HashToRandomExchangePrel etc.
>>>
>>> I do not think that RelDistribution needs to be a "multiple" trait
>>> (compare with RelCollation extends RelMultipleTrait, which allows a
>>> RelNode to have more than one sort-order) but I may be wrong.
>>>
>>> The advantages of making Exchange a first-class operator and Distribution
>>> a trait are clear. We will be able to build a library of rules (e.g.
>>> FilterExchangePushRule, ExchangeRemoveRule), a RelMdDistribution metadata
>>> interface, and start working on stats and cost model.
>>>
>>> Drill and Hive stakeholders, please let me know what you think of this
>>> plan.
>>>
>>> Julian
>>>
>>
>>
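
To make the role of the proposed Distribution trait a little more concrete, here is a minimal standalone Java sketch of how a planner might decide whether an Exchange is needed between two operators. It is not Calcite, Drill or Hive code: the class DistributionSketch, the Distribution value class, and the satisfies and needsExchange methods are hypothetical names introduced for illustration only, and the matching rule (same type, and for hash or range distributions the same key fields in the same order) is an assumption rather than anything stated in the proposal. Only the DistributionType values are taken from the quoted message.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Standalone illustration; none of these types are Calcite classes. */
final class DistributionSketch {
  private DistributionSketch() {}

  /** Same values as the DistributionType enum in the quoted proposal. */
  enum DistributionType {
    SINGLETON,
    HASH_DISTRIBUTED,
    RANGE_DISTRIBUTED,
    RANDOM_DISTRIBUTED,
    ROUND_ROBIN_DISTRIBUTED,
    BROADCAST_DISTRIBUTED
  }

  /** Hypothetical stand-in for RelDistribution: a type plus key field ordinals. */
  static final class Distribution {
    final DistributionType type;
    final List<Integer> fields;

    Distribution(DistributionType type, Integer... fields) {
      this.type = type;
      this.fields = Collections.unmodifiableList(Arrays.asList(fields.clone()));
    }

    /**
     * Assumed rule: the types must match, and for hash or range
     * distributions the key fields must match in the same order.
     */
    boolean satisfies(Distribution required) {
      if (type != required.type) {
        return false;
      }
      switch (type) {
        case HASH_DISTRIBUTED:
        case RANGE_DISTRIBUTED:
          return fields.equals(required.fields);
        default:
          return true;
      }
    }
  }

  /** An Exchange is needed when the input does not already satisfy the requirement. */
  static boolean needsExchange(Distribution actual, Distribution required) {
    return !actual.satisfies(required);
  }
}
```

Under these assumptions, an input hash-distributed on field 0 feeding a join that requires hash distribution on field 0 needs no Exchange, while the same input feeding an operator that requires SINGLETON does. A rule such as the ExchangeRemoveRule mentioned in the proposal could presumably rely on a check of this kind, though the real rule would work on RelNode traits rather than on these toy classes.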
