Jinfeng,

  Yes, you are right about increased plan exploration time.
But an implementation could put bounds on the search space.


This is what we are planning to do in Hive.
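
As a rough illustration of why such bounds matter, here is a
back-of-the-envelope model. It is only a sketch, not Hive, Drill, or Calcite
code. It assumes left-deep join orders only (n! of them for n tables), a fixed
number of exchange strategies chosen independently for each join, and a simple
cap ("budget") on the number of alternatives explored. The class and method
names are hypothetical.

```java
/** Hypothetical model of planner search-space size; not real planner code. */
public class SearchSpaceSketch {

  /** Number of left-deep join orders for the given number of tables: n!. */
  static long joinOrders(int tables) {
    long result = 1;
    for (int i = 2; i <= tables; i++) {
      result *= i; // overflows for large inputs; fine for this small sketch
    }
    return result;
  }

  /**
   * Assumption: each of the (tables - 1) joins independently picks one of
   * exchangeChoices strategies, so the choices multiply with the join orders.
   */
  static long combinedPlans(int tables, int exchangeChoices) {
    long perOrder = 1;
    for (int i = 1; i < tables; i++) {
      perOrder *= exchangeChoices;
    }
    return joinOrders(tables) * perOrder;
  }

  /** A bounded search explores at most budget alternatives. */
  static long boundedPlans(int tables, int exchangeChoices, long budget) {
    return Math.min(combinedPlans(tables, exchangeChoices), budget);
  }

  public static void main(String[] args) {
    for (int n = 2; n <= 8; n += 2) {
      System.out.println(n + " tables: " + joinOrders(n) + " join orders, "
          + combinedPlans(n, 3) + " plans with 3 exchange choices per join, "
          + boundedPlans(n, 3, 10_000) + " explored under a budget of 10000");
    }
  }
}
```

The point is only that the combined space grows much faster than the join
orders alone, while a budget keeps the explored portion fixed.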


On 2/11/15, 2:21 PM, "Jinfeng Ni" <[email protected]> wrote:

>Drill currently does query planning in two phases: 1) logical planning,
>which handles join order, logical filter/project push down etc., and 2)
>physical planning, which decides between different physical
>operators (different join / aggregation methods), applies filter/project
>push down (storage-specific rules), and inserts EXCHANGE.  Part of the reason
>for two phases is that when the two phases are merged together, the planning
>time increases significantly (since the planner needs to enumerate
>different join orders, multiplied by different choices of EXCHANGE).
>
>The new rules that you are proposing seem to build the plan in one
>single logical planning phase.  I'm not sure how that will impact the overall
>planning time.
>
>
>
>On Wed, Feb 11, 2015 at 1:38 PM, Jinfeng Ni <[email protected]> wrote:
>
>> I think it's a good proposal to put Exchange/Distribution into the Calcite
>> library.
>>
>> Makes sense to me.  +1
>>
>>
>>
>> On Wed, Feb 11, 2015 at 12:45 PM, Julian Hyde <[email protected]> wrote:
>>
>>> Drill guys: What do you think of the proposal?
>>>
>>> On Feb 11, 2015, at 11:34 AM, Ashutosh Chauhan <[email protected]>
>>> wrote:
>>>
>>> Overall proposal sounds good to me. +1
>>>
>>> On Tue, Feb 10, 2015 at 3:35 PM, Julian Hyde <[email protected]> wrote:
>>>
>>> I've had some discussions about adding an Exchange operator and
>>> Distribution trait to Hive's cost-based optimizer, which uses Calcite.
>>> Ashutosh has logged a bug [
>>> https://issues.apache.org/jira/browse/CALCITE-594 ] and pull request
>>> containing a proof-of-concept [
>>> https://github.com/apache/incubator-calcite/pull/52/files ].
>>>
>>> I know that Drill has a Distribution trait and several sub-classes of
>>> Exchange operator (DrillDistributionTrait, ExchangePrel,
>>> BroadcastExchangePrel, HashToMergeExchangePrel, HashToRandomExchangePrel,
>>> OrderedPartitionExchangePrel and SimpleMergeExchangePrel, in
>>> https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical
>>> ).
>>>
>>> I propose to create a Distribution trait and Exchange operator base class
>>> in Calcite, with the goal that both Drill and Hive would use them. (I am
>>> adopting Drill terminology -- Distribution rather than Partition, Exchange
>>> rather than Shuffle -- but I am pretty sure that the concepts are the
>>> same.)
>>>
>>> public abstract class Exchange extends SingleRel {
>>>  public final RelDistribution distribution;
>>>
>>>  protected Exchange(RelOptCluster cluster, RelTraitSet traitSet,
>>>      RelNode input, RelDistribution distribution) {
>>>    super(cluster, traitSet, input);
>>>    this.distribution = distribution;
>>>  }
>>> }
>>>
>>> public interface RelDistribution extends RelMultipleTrait {
>>>  enum DistributionType {
>>>    SINGLETON,
>>>    HASH_DISTRIBUTED,
>>>    RANGE_DISTRIBUTED,
>>>    RANDOM_DISTRIBUTED,
>>>    ROUND_ROBIN_DISTRIBUTED,
>>>    BROADCAST_DISTRIBUTED
>>>  }
>>>
>>>  public DistributionType getType();
>>>  public ImmutableIntList getFields();
>>> }
>>>
>>> Calcite would not contain any particular exchange algorithms. However,
>>> since it is common to combine sort and exchange, I would create a base
>>> class for it:
>>>
>>> public abstract class SortExchange extends Exchange {
>>>  public final RelCollation collation;
>>>
>>>  ...
>>> }
>>>
>>> The physical operators would remain in Drill/Hive and would likely be fully
>>> specified by the distribution and collation; they would not need any
>>> additional attributes. We would not be able to port
>>> DrillDistributionTraitDef.convert directly -- it would create a
>>> LogicalExchange (analogous to how RelCollationTraitDef.convert creates a
>>> LogicalSort) and then Drill rules would need to kick in to convert that to
>>> HashToRandomExchangePrel etc.
>>>
>>> I do not think that RelDistribution needs to be a "multiple" trait (compare
>>> with RelCollation extends RelMultipleTrait, which allows a RelNode to have
>>> more than one sort-order) but I may be wrong.
>>>
>>> The advantages of making Exchange a first-class operator and Distribution
>>> a trait are clear. We will be able to build a library of rules (e.g.
>>> FilterExchangePushRule, ExchangeRemoveRule), a RelMdDistribution metadata
>>> interface, and start working on stats and cost model.
>>>
>>> Drill and Hive stakeholders, please let me know what you think of this
>>> plan.
>>>
>>> Julian
>>>
>>
>>
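
To make the quoted proposal a little more concrete, here is a minimal sketch
of the kind of check a rule such as ExchangeRemoveRule might rely on: an
exchange is redundant when its input is already distributed the way the
exchange would distribute it. This is a simplified stand-alone model, not
Calcite's or Drill's classes. Distribution, satisfies and exchangeIsRedundant
are hypothetical names, and "same type and same key fields" is an assumed,
deliberately strict rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-alone model of a distribution trait; not Calcite code. */
public class DistributionSketch {

  /** Mirrors the enum values listed in the proposal. */
  enum DistributionType {
    SINGLETON, HASH_DISTRIBUTED, RANGE_DISTRIBUTED,
    RANDOM_DISTRIBUTED, ROUND_ROBIN_DISTRIBUTED, BROADCAST_DISTRIBUTED
  }

  /** A distribution: its type plus the ordinals of the key fields, if any. */
  static final class Distribution {
    final DistributionType type;
    final List<Integer> fields;

    Distribution(DistributionType type, List<Integer> fields) {
      this.type = type;
      this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
    }

    /** Assumed rule: satisfied only by the same type on the same key fields. */
    boolean satisfies(Distribution required) {
      return type == required.type && fields.equals(required.fields);
    }
  }

  /** An exchange adds nothing if its input already has the target distribution. */
  static boolean exchangeIsRedundant(Distribution input, Distribution target) {
    return input.satisfies(target);
  }

  public static void main(String[] args) {
    Distribution hashOn0 =
        new Distribution(DistributionType.HASH_DISTRIBUTED, Arrays.asList(0));
    Distribution hashOn1 =
        new Distribution(DistributionType.HASH_DISTRIBUTED, Arrays.asList(1));
    Distribution singleton =
        new Distribution(DistributionType.SINGLETON, Collections.<Integer>emptyList());

    System.out.println(exchangeIsRedundant(hashOn0, hashOn0));   // same keys
    System.out.println(exchangeIsRedundant(hashOn0, hashOn1));   // different keys
    System.out.println(exchangeIsRedundant(singleton, hashOn0)); // different type
  }
}
```

A real implementation would probably accept weaker matches as well; the strict
equality here just keeps the sketch short.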
