Re: Adding Exchange operator and Distribution trait

John Pullokkaran Wed, 11 Feb 2015 14:26:01 -0800

Jinfeng,

  Yes, you are right about increased plan exploration time.
But an implementation could put bounds on the search space.



This is what we are planning to do in Hive.
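
As a rough illustration of the planning-time concern in the quoted message below, here is a back-of-the-envelope sketch. It is a toy model, not how Calcite, Drill or Hive actually count plans: it assumes left-deep join orders only (n! of them), a fixed number of exchange choices per join, and no memoization or pruning. The class and method names (SearchSpace, joinOrders, exchangeChoices, singlePhase, twoPhase) are hypothetical.

```java
final class SearchSpace {
    // Left-deep join orders for `tables` inputs: tables! (assumed model).
    static long joinOrders(int tables) {
        long result = 1;
        for (int i = 2; i <= tables; i++) {
            result *= i;
        }
        return result;
    }

    // Exchange assignments for one join order: perJoin^(tables - 1),
    // assuming each of the (tables - 1) joins picks independently.
    static long exchangeChoices(int tables, int perJoin) {
        long result = 1;
        for (int i = 1; i < tables; i++) {
            result *= perJoin;
        }
        return result;
    }

    // Merged planning: every join order combined with every exchange assignment.
    static long singlePhase(int tables, int perJoin) {
        return joinOrders(tables) * exchangeChoices(tables, perJoin);
    }

    // Two-phase planning: pick a join order first, then choose exchanges
    // for that single order only.
    static long twoPhase(int tables, int perJoin) {
        return joinOrders(tables) + exchangeChoices(tables, perJoin);
    }

    public static void main(String[] args) {
        System.out.println("single phase: " + singlePhase(4, 3));
        System.out.println("two phases:   " + twoPhase(4, 3));
    }
}
```

Under these assumptions, 4 tables with 3 exchange choices per join give 648 combinations in a merged phase versus 51 when the phases are separate, which is why bounding the search matters.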


On 2/11/15, 2:21 PM, "Jinfeng Ni" <[email protected]> wrote:

>Drill currently does query planning in two phases: 1) logical planning,
>which handles join order, logical filter/project push-down, etc., and 2)
>physical planning, which chooses between different physical operators
>(different join/aggregation methods), applies storage-specific
>filter/project push-down rules, and inserts EXCHANGE. Part of the reason
>for splitting planning into two phases is that when they are merged, the
>planning time increases significantly (since the planner needs to enumerate
>different join orders, multiplied by the different choices of EXCHANGE).
>
>The new rules that you are proposing seem to aim at building the plan in a
>single logical planning phase.  I'm not sure how that will impact the overall
>planning time.
>
>
>
>On Wed, Feb 11, 2015 at 1:38 PM, Jinfeng Ni <[email protected]> wrote:
>
>> I think it's a good proposal to put Exchange/Distribution into Calcite
>> library.
>>
>> Makes sense to me.  +1
>>
>>
>>
>> On Wed, Feb 11, 2015 at 12:45 PM, Julian Hyde <[email protected]> wrote:
>>
>>> Drill guys: What do you think of the proposal?
>>>
>>> On Feb 11, 2015, at 11:34 AM, Ashutosh Chauhan <[email protected]>
>>> wrote:
>>>
>>> Overall proposal sounds good to me. +1
>>>
>>> On Tue, Feb 10, 2015 at 3:35 PM, Julian Hyde <[email protected]> wrote:
>>>
>>> I've had some discussions about adding an Exchange operator and
>>> Distribution trait to Hive's cost-based optimizer, which uses Calcite.
>>> Ashutosh has logged a bug [
>>> https://issues.apache.org/jira/browse/CALCITE-594 ] and pull request
>>> containing a proof-of-concept [
>>> https://github.com/apache/incubator-calcite/pull/52/files ].
>>>
>>> I know that Drill has a Distribution trait and several sub-classes of
>>> Exchange operator (DrillDistributionTrait, ExchangePrel,
>>> BroadcastExchangePrel, HashToMergeExchangePrel, HashToRandomExchangePrel,
>>> OrderedPartitionExchangePrel and SimpleMergeExchangePrel, in
>>> https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical
>>> ).
>>>
>>> I propose to create a Distribution trait and Exchange operator base class
>>> in Calcite, with the goal that both Drill and Hive would use them. (I am
>>> adopting Drill terminology -- Distribution rather than Partition, Exchange
>>> rather than Shuffle -- but I am pretty sure that the concepts are the
>>> same.)
>>>
>>> public abstract class Exchange extends SingleRel {
>>>  public final RelDistribution distribution;
>>>
>>>  protected Exchange(RelOptCluster cluster, RelTraitSet traitSet,
>>>      RelNode input, RelDistribution distribution) {
>>>    super(cluster, traitSet, input);
>>>    this.distribution = distribution;
>>>  }
>>> }
>>>
>>> public interface RelDistribution extends RelMultipleTrait {
>>>  enum DistributionType {
>>>    SINGLETON,
>>>    HASH_DISTRIBUTED,
>>>    RANGE_DISTRIBUTED,
>>>    RANDOM_DISTRIBUTED,
>>>    ROUND_ROBIN_DISTRIBUTED,
>>>    BROADCAST_DISTRIBUTED
>>>  }
>>>
>>>  public DistributionType getType();
>>>  public ImmutableIntList getFields();
>>> }
>>>
>>> Calcite would not contain any particular exchange algorithms. However,
>>> since it is common to combine sort and exchange, I would create a base
>>> class for it:
>>>
>>> public abstract class SortExchange extends Exchange {
>>>  public final RelCollation collation;
>>>
>>>  ...
>>> }
>>>
>>> The physical operators would remain in Drill/Hive and would likely be fully
>>> specified by the distribution and collation; they would not need any
>>> additional attributes. We would not be able to port
>>> DrillDistributionTraitDef.convert directly -- it would create a
>>> LogicalExchange (analogous to how RelCollationTraitDef.convert creates a
>>> LogicalSort) and then Drill rules would need to kick in to convert that to
>>> HashToRandomExchangePrel etc.
>>>
>>> I do not think that RelDistribution needs to be a "multiple" trait (compare
>>> with RelCollation extends RelMultipleTrait, which allows a RelNode to have
>>> more than one sort-order) but I may be wrong.
>>>
>>> The advantages of making Exchange a first-class operator and Distribution a
>>> trait are clear. We will be able to build a library of rules (e.g.
>>> FilterExchangePushRule, ExchangeRemoveRule), a RelMdDistribution metadata
>>> interface, and start working on stats and cost model.
>>>
>>> Drill and Hive stakeholders, please let me know what you think of this
>>> plan.
>>>
>>> Julian
>>>
>>
>>
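
The quoted proposal says a trait-def convert step would create a LogicalExchange when a node's distribution does not match the required one. A minimal, self-contained sketch of that idea follows. It does not use Calcite classes: DistributionSketch, Distribution, Node, Exchange and convert are hypothetical stand-ins, and the "satisfies" rule (exact match of type and fields) is a deliberately strict assumption rather than the semantics any of the projects would necessarily adopt.

```java
import java.util.Arrays;
import java.util.List;

final class DistributionSketch {
    // Distribution types as listed in the proposal.
    enum DistributionType {
        SINGLETON, HASH_DISTRIBUTED, RANGE_DISTRIBUTED,
        RANDOM_DISTRIBUTED, ROUND_ROBIN_DISTRIBUTED, BROADCAST_DISTRIBUTED
    }

    // Hypothetical stand-in for the proposed RelDistribution trait.
    static final class Distribution {
        final DistributionType type;
        final List<Integer> fields;

        Distribution(DistributionType type, Integer... fields) {
            this.type = type;
            this.fields = Arrays.asList(fields.clone());
        }

        // Assumed rule: satisfied only when type and key fields match exactly.
        boolean satisfies(Distribution required) {
            return type == required.type && fields.equals(required.fields);
        }
    }

    // Hypothetical plan node carrying its distribution as a trait.
    static class Node {
        final String name;
        final Node input;
        final Distribution distribution;

        Node(String name, Node input, Distribution distribution) {
            this.name = name;
            this.input = input;
            this.distribution = distribution;
        }
    }

    // Logical exchange: redistributes its input to the target distribution.
    static final class Exchange extends Node {
        Exchange(Node input, Distribution target) {
            super("Exchange", input, target);
        }
    }

    // Analogue of a trait-def convert: wrap the node in an Exchange
    // only when its current distribution does not satisfy the requirement.
    static Node convert(Node node, Distribution required) {
        return node.distribution.satisfies(required)
            ? node
            : new Exchange(node, required);
    }

    public static void main(String[] args) {
        Distribution random = new Distribution(DistributionType.RANDOM_DISTRIBUTED);
        Distribution hashOnCol0 = new Distribution(DistributionType.HASH_DISTRIBUTED, 0);
        Node scan = new Node("Scan", null, random);
        Node result = convert(scan, hashOnCol0);
        System.out.println(result.name + " over " + result.input.name);
    }
}
```

In this sketch an engine-specific rule would then replace the logical Exchange with a physical operator, which is the role the proposal leaves to Drill and Hive.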
