> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillFilterRelBase.java,
> >  line 58
> > <https://reviews.apache.org/r/20741/diff/1/?file=569043#file569043line58>
> >
> >     I think we should probably add a cost settings/config object.  Could 
> > hang off PlannerSettings or used to initialize our RelOptCostFactory.  I'd 
> > like to avoid magic numbers being spread throughout the code so we can 
> > change them in a local place and evaluate the impacts

Pls see my response to Jinfeng's comments on the same issue. I have modified 
the formula to not use the hardcoded value. The other part about externalizing 
the cost factors through PlannerSettings needs some thought..we should only 
externalize selected factors.    


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java,
> >  line 60
> > <https://reviews.apache.org/r/20741/diff/1/?file=569057#file569057line60>
> >
> >     As above, let's primarily do this check in the rule rather than waiting 
> > for exceptions.  Exceptions are expensive.  Additionally, we have warning 
> > messages that are misleading because an alternative plan may avoid an 
> > equijoin.

Agreed.  I have added a new JoinPruleBase class that performs the checks for 
cartesian join and non-equijoins.  Both HashJoinPrule and MergeJoinPrule extend 
from this class. If the conditional checks fail, the onMatch() will return 
without creating the transform request.  


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrule.java,
> >  line 72
> > <https://reviews.apache.org/r/20741/diff/1/?file=569056#file569056line72>
> >
> >     why did you disable single grouping key?

That was just for testing.. I have uncommented it. 


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamAggPrule.java,
> >  line 80
> > <https://reviews.apache.org/r/20741/diff/1/?file=569068#file569068line80>
> >
> >     same as hash agg, why removed?

That was just for testing.. I have uncommented it. 


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamAggPrule.java,
> >  line 84
> > <https://reviews.apache.org/r/20741/diff/1/?file=569068#file569068line84>
> >
> >     maybe note that this is incomplete and thus commented out.

done. 


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamAggPrule.java,
> >  line 149
> > <https://reviews.apache.org/r/20741/diff/1/?file=569068#file569068line149>
> >
> >     I think this is the same as StreamAgg, can we move to shared location?

Yes.  I created an abstract base class AggPruleBase and moved this utility 
function there.  Both HashAggr and StreamingAggr extend from this base class.  


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrule.java,
> >  line 39
> > <https://reviews.apache.org/r/20741/diff/1/?file=569056#file569056line39>
> >
> >     Can you add a matches which checks the aggregates and fails to match on 
> > the case where there is a distinct aggregate?

Done. onMatch() will return without creating a transform request if any of the 
aggregate exprs contains a distinct.  


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java,
> >  line 63
> > <https://reviews.apache.org/r/20741/diff/1/?file=569044#file569044line63>
> >
> >     Let's try to include this magic number also in the centralized cost 
> > config so we can explore its impact.  As above, ideally we would pull this 
> > from an understanding of the expressions.

Added a constant DrillCostBase.projectCpuCost with a factor of 4 * baseCpuCost 
to model the cost of projecting expressions and columns.  Ideally, we would 
want to analyze the expression to determine cost accurately, but for now I have 
changed the formala to be:   
   cpuCost = numProjectedExpressions * rowCount * projectCpuCost.


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrel.java,
> >  line 59
> > <https://reviews.apache.org/r/20741/diff/1/?file=569055#file569055line59>
> >
> >     Ideally, this shouldn't be hit.  Rule should fail to match.

Done (see below for HashAggPrule).


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java,
> >  line 88
> > <https://reviews.apache.org/r/20741/diff/1/?file=569057#file569057line88>
> >
> >     please centralize magic number.

Done.


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ScanPrel.java,
> >  line 75
> > <https://reviews.apache.org/r/20741/diff/1/?file=569064#file569064line75>
> >
> >     I could be wrong but I thought the third number was disk io.  Why would 
> > this be 0?  Also, we should probably delegate this down to groupscan as 
> > different scan types have different costs.  For example, Parquet is 
> > substantially more CPU intensive.  CSV, more disk intensive.

After discussing this, kept the formula as-is and added the following comments: 
    // Even though scan is reading from disk, in the currently generated plans 
all plans will
    // need to read the same amount of data, so keeping the disk io cost 0 is 
ok for now.  
    // In the future we might consider alternative scans that go against 
projections or 
    // different compression schemes etc that affect the amount of data read. 
Such alternatives
    // would affect both cpu and io cost. 


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashToMergeExchange.java,
> >  line 87
> > <https://reviews.apache.org/r/20741/diff/1/?file=569041#file569041line87>
> >
> >     This looks like old, unused code.  Do we need this?

All exchanges implement this method which is defined in the Exchange interface 
class but not getting used.  After discussion, I took it out from all the 
exchanges and the interface. 


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrule.java,
> >  line 80
> > <https://reviews.apache.org/r/20741/diff/1/?file=569062#file569062line80>
> >
> >     This brings to mind that we should probably have settings to enable 
> > disable various types of exchanges.  E.g. supports_broadcast_exchange

We have talked about this...it's not part of these costing changes but will 
address as part of a separate JIRA. 


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrule.java,
> >  line 57
> > <https://reviews.apache.org/r/20741/diff/1/?file=569056#file569056line57>
> >
> >     probably add note here about multi phase aggregate

You mean for the hash aggregate rule in general, not for this particular 
statement, right ?  this statement (line 57) is part of the following 
conditional block which is about non-grouped (plain) aggregation, not 
necessarily multi-phase aggrs. I will add a general comment about multi-phase 
for this rule.   
      if (aggregate.getGroupSet().isEmpty()) {
        toDist = DrillDistributionTrait.SINGLETON;
        traits = 
call.getPlanner().emptyTraitSet().plus(Prel.DRILL_PHYSICAL).plus(toDist);
        createTransformRequest(call, aggregate, input, traits);


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTraitDef.java,
> >  line 75
> > <https://reviews.apache.org/r/20741/diff/1/?file=569053#file569053line75>
> >
> >     How does OrderedPartitionExchange destroy orderedness?  Same for 
> > broadcast?

OrderedPartitionExchange and BroadcastExchange both create a RandomReceiver 
instance, not a MergingReceiver.  So one would have to add a Sort enforcer 
after the exchange to provide sortedness. I suppose your point is that one 
could theoretically create a BroadcastExchange with a MergingReceiver and 
preserve sortedness. Currently, we don't do that though. 


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashToMergeExchangePrel.java,
> >  line 88
> > <https://reviews.apache.org/r/20741/diff/1/?file=569059#file569059line88>
> >
> >     Can we update to new utility method?  Also, are we sure this is true.  
> > I thought hash exchange supported all selection vectors.

Modified to use the utility method in PrelUtil. The modified code is the same 
as is present in HashToRandomExchange. 


- Aman


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20741/#review41681
-----------------------------------------------------------


On April 26, 2014, 1:54 a.m., Aman Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20741/
> -----------------------------------------------------------
> 
> (Updated April 26, 2014, 1:54 a.m.)
> 
> 
> Review request for drill, Jacques Nadeau and Jinfeng Ni.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> 1. Added DrillCostBase that includes distribution (network) cost.
> 2. Added cost formulas for computeSelfCost() for various Exchange operators, 
> joins, aggregations etc. 
> 3. Added Prels for HashJoin, HashAggregate and generate plans for those.  
> 4. Added exchange Prels: BroadcastExchangePrel and a new type called 
> HashToMergeExchangePrel. 
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java 
> 7e3b63d 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashToMergeExchange.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableConfig.java
>  98892c0 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillFilterRelBase.java
>  955729b 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java
>  cf3d188 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillScanRelBase.java
>  b370352 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillScreenRelBase.java
>  51ed442 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillCostBase.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelOptCost.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelOptCostFactory.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java
>  1492a28 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/BroadcastExchangePrel.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTrait.java
>  b75fb40 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTraitDef.java
>  c2ebb7a 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/FilterPrel.java
>  0fc3abd 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrel.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrule.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrule.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashToMergeExchangePrel.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashToRandomExchangePrel.java
>  e5c9661 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java
>  978a531 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrule.java
>  8298e50 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
>  e6e99c0 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ScanPrel.java
>  a945129 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SingleMergeExchangePrel.java
>  d9431cc 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SortPrel.java
>  344be4e 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamAggPrel.java
>  c2880da 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamAggPrule.java
>  a561a61 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/TopNPrel.java
>  e981a45 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/UnionExchangePrel.java
>  f89cbaa 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java
>  8892a8f 
>   exec/java-exec/src/test/java/org/apache/drill/TestTpchSingleMode.java 
> 1ccb65c 
>   
> exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoin.java
>  ba067e2 
>   exec/java-exec/src/test/resources/join/hj_exchanges.json PRE-CREATION 
>   exec/java-exec/src/test/resources/queries/tpch/01.sql f33416f 
> 
> Diff: https://reviews.apache.org/r/20741/diff/
> 
> 
> Testing
> -------
> 
> Existing unit tests.  Some TPCH tests show memory leaks but that is not 
> directly related to the costing changes...however those leaks would have to 
> be resolved before enabling the new plans.  However, I wanted to get this 
> review request out to get feedback.  
> 
> 
> Thanks,
> 
> Aman Sinha
> 
>

Reply via email to