Aman,

RelDistribution will be an interface, and there’s no reason why Drill shouldn’t 
have its own values or even sub-classes, as long as RelDistributionTraitDef is 
able to canonize them. So you could, for instance, sub-class "Hash[1, 3]" and 
specify which hash function is being used.
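
To make that concrete, here is a rough, self-contained sketch of the idea. It does not use the real Calcite classes: Distribution, HashDistribution, EngineHashDistribution and DistributionCanonizer are hypothetical stand-ins for RelDistribution, a Hash value, an engine-specific sub-class and the canonizing role of RelDistributionTraitDef, and the "murmur3" hash-function name is only an illustration. The point is that canonization can work for engine sub-classes as long as they define equals and hashCode over everything that distinguishes them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the proposed RelDistribution interface. */
interface Distribution {
  enum Type { SINGLETON, HASH_DISTRIBUTED }
  Type getType();
  List<Integer> getFields();
}

/** Plain Hash[fields] value, as the shared library might provide it. */
class HashDistribution implements Distribution {
  private final List<Integer> fields;

  HashDistribution(List<Integer> fields) {
    this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
  }

  public Type getType() { return Type.HASH_DISTRIBUTED; }
  public List<Integer> getFields() { return fields; }

  // Equality includes the concrete class, so a sub-class never equals the base value.
  @Override public boolean equals(Object o) {
    return o != null && o.getClass() == getClass()
        && fields.equals(((HashDistribution) o).fields);
  }
  @Override public int hashCode() { return Objects.hash(getClass(), fields); }
  @Override public String toString() { return "Hash" + fields; }
}

/** Engine-specific sub-class that also records which hash function is used. */
class EngineHashDistribution extends HashDistribution {
  private final String hashFunction; // assumption: identified by name

  EngineHashDistribution(List<Integer> fields, String hashFunction) {
    super(fields);
    this.hashFunction = Objects.requireNonNull(hashFunction);
  }

  @Override public boolean equals(Object o) {
    return super.equals(o)
        && hashFunction.equals(((EngineHashDistribution) o).hashFunction);
  }
  @Override public int hashCode() { return 31 * super.hashCode() + hashFunction.hashCode(); }
  @Override public String toString() { return super.toString() + "(" + hashFunction + ")"; }
}

/** Stand-in for the trait def's canonization: equal values map to one instance. */
class DistributionCanonizer {
  private final Map<Distribution, Distribution> interned = new ConcurrentHashMap<>();

  Distribution canonize(Distribution d) {
    Distribution previous = interned.putIfAbsent(d, d);
    return previous == null ? d : previous;
  }
}
```

With this, two equal EngineHashDistribution values canonize to the same instance, while the plain Hash[1, 3] stays a distinct trait value.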

I’ve addressed the comment about logical exchange already: you can go straight 
to physical.


On Feb 11, 2015, at 2:34 PM, Aman Sinha <[email protected]> wrote:

> I am neutral on this for now until we give it more thought.  The reason
> is that Calcite is not aware of the execution engine's capabilities
> and configuration parameters for distribution (e.g., Drill has a few
> parameters, including simple true/false flags, that determine whether
> or not an Exchange node is even inserted in the plan and, if it is,
> what type of Exchange it is).  In that sense, if the logical plan
> produced by Calcite contains a LogicalExchange, it is possible that Drill
> may not be able to use it directly while building the physical plan.
> 
> I do however see the benefits in terms of trait propagation, combining
> distribution and collation traits and consolidating the subsumption logic
> in some base class such that it is useful for other consumers of Calcite.
> 
> Aman
> 
> On Wed, Feb 11, 2015 at 2:21 PM, Jinfeng Ni <[email protected]> wrote:
> 
>> Drill currently does query planning in two phases: 1) logical planning,
>> which handles join order, logical filter/project push-down, etc., and 2)
>> physical planning, which chooses between different physical
>> operators (different join/aggregation methods), applies storage-specific
>> filter/project push-down rules, and inserts EXCHANGE.  Part of the reason
>> for using two phases is that when they are merged, the planning
>> time increases significantly (since the planner needs to enumerate
>> different join orders, multiplied by different choices of EXCHANGE).
>> 
>> The new rules that you are proposing seem intended to build the plan in a
>> single logical planning phase.  I'm not sure how that will impact the overall
>> planning time.
>> 
>> 
>> 
>> On Wed, Feb 11, 2015 at 1:38 PM, Jinfeng Ni <[email protected]> wrote:
>> 
>>> I think it's a good proposal to put Exchange/Distribution into Calcite
>>> library.
>>> 
>>> Makes sense to me.  +1
>>> 
>>> 
>>> 
>>> On Wed, Feb 11, 2015 at 12:45 PM, Julian Hyde <[email protected]> wrote:
>>> 
>>>> Drill guys: What do you think of the proposal?
>>>> 
>>>> On Feb 11, 2015, at 11:34 AM, Ashutosh Chauhan <[email protected]>
>>>> wrote:
>>>> 
>>>> Overall proposal sounds good to me. +1
>>>> 
>>>> On Tue, Feb 10, 2015 at 3:35 PM, Julian Hyde <[email protected]> wrote:
>>>> 
>>>> I've had some discussions about adding an Exchange operator and
>>>> Distribution trait to Hive's cost-based optimizer, which uses Calcite.
>>>> Ashutosh has logged a bug [
>>>> https://issues.apache.org/jira/browse/CALCITE-594 ] and pull request
>>>> containing a proof-of-concept [
>>>> https://github.com/apache/incubator-calcite/pull/52/files ].
>>>> 
>>>> I know that Drill has a Distribution trait and several sub-classes of
>>>> Exchange operator (DrillDistributionTrait, ExchangePrel,
>>>> BroadcastExchangePrel, HashToMergeExchangePrel, HashToRandomExchangePrel,
>>>> OrderedPartitionExchangePrel and SimpleMergeExchangePrel, in
>>>> https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical
>>>> )
>>>> 
>>>> I propose to create a Distribution trait and Exchange operator base class
>>>> in Calcite, with the goal that both Drill and Hive would use them. (I am
>>>> adopting Drill terminology -- Distribution rather than Partition, Exchange
>>>> rather than Shuffle -- but I am pretty sure that the concepts are the
>>>> same.)
>>>> 
>>>> public abstract class Exchange extends SingleRel {
>>>>   public final RelDistribution distribution;
>>>> 
>>>>   protected Exchange(RelOptCluster cluster, RelTraitSet traitSet,
>>>>       RelNode input, RelDistribution distribution) {
>>>>     super(cluster, traitSet, input);
>>>>     this.distribution = distribution;
>>>>   }
>>>> }
>>>> 
>>>> public interface RelDistribution extends RelMultipleTrait {
>>>>   enum DistributionType {
>>>>     SINGLETON,
>>>>     HASH_DISTRIBUTED,
>>>>     RANGE_DISTRIBUTED,
>>>>     RANDOM_DISTRIBUTED,
>>>>     ROUND_ROBIN_DISTRIBUTED,
>>>>     BROADCAST_DISTRIBUTED
>>>>   }
>>>> 
>>>>   public DistributionType getType();
>>>>   public ImmutableIntList getFields();
>>>> }
>>>> 
>>>> Calcite would not contain any particular exchange algorithms. However,
>>>> since it is common to combine sort and exchange, I would create a base
>>>> class for it:
>>>> 
>>>> public abstract class SortExchange extends Exchange {
>>>>   public final RelCollation collation;
>>>> 
>>>>   ...
>>>> }
>>>> 
>>>> The physical operators would remain in Drill/Hive and would likely be
>>>> fully specified by the distribution and collation; they would not need any
>>>> additional attributes. We would not be able to port
>>>> DrillDistributionTraitDef.convert directly -- it would create a
>>>> LogicalExchange (analogous to how RelCollationTraitDef.convert creates a
>>>> LogicalSort) and then Drill rules would need to kick in to convert that to
>>>> HashToRandomExchangePrel etc.
>>>> 
>>>> I do not think that RelDistribution needs to be a "multiple" trait
>>>> (compare with RelCollation extends RelMultipleTrait, which allows a
>>>> RelNode to have more than one sort-order) but I may be wrong.
>>>> 
>>>> The advantages of making Exchange a first-class operator and Distribution
>>>> a trait are clear. We will be able to build a library of rules (e.g.
>>>> FilterExchangePushRule, ExchangeRemoveRule), a RelMdDistribution metadata
>>>> interface, and start working on stats and cost model.
>>>> 
>>>> Drill and Hive stakeholders, please let me know what you think of this
>>>> plan.
>>>> 
>>>> Julian
>>>> 
>>> 
>>> 
>> 
