> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillFilterRelBase.java, > > line 58 > > <https://reviews.apache.org/r/20741/diff/1/?file=569043#file569043line58> > > > > I think we should probably add a cost settings/config object. Could > > hang off PlannerSettings or used to initialize our RelOptCostFactory. I'd > > like to avoid magic numbers being spread throughout the code so we can > > change them in a local place and evaluate the impacts
Pls see my response to Jinfeng's comments on the same issue. I have modified the formula to not use the hardcoded value. The other part about externalizing the cost factors through PlannerSettings needs some thought..we should only externalize selected factors. > On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java, > > line 60 > > <https://reviews.apache.org/r/20741/diff/1/?file=569057#file569057line60> > > > > As above, let's primarily do this check in the rule rather than waiting > > for exceptions. Exceptions are expensive. Additionally, we have warning > > messages that are misleading because an alternative plan may avoid an > > equijoin. Agreed. I have added a new JoinPruleBase class that performs the checks for cartesian join and non-equijoins. Both HashJoinPrule and MergeJoinPrule extend from this class. If the conditional checks fail, the onMatch() will return without creating the transform request. > On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrule.java, > > line 72 > > <https://reviews.apache.org/r/20741/diff/1/?file=569056#file569056line72> > > > > why did you disable single grouping key? That was just for testing.. I have uncommented it. > On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamAggPrule.java, > > line 80 > > <https://reviews.apache.org/r/20741/diff/1/?file=569068#file569068line80> > > > > same as hash agg, why removed? That was just for testing.. I have uncommented it. > On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamAggPrule.java, > > line 84 > > <https://reviews.apache.org/r/20741/diff/1/?file=569068#file569068line84> > > > > maybe note that this is incomplete and thus commented out. done. > On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamAggPrule.java, > > line 149 > > <https://reviews.apache.org/r/20741/diff/1/?file=569068#file569068line149> > > > > I think this is the same as StreamAgg, can we move to shared location? Yes. I created an abstract base class AggPruleBase and moved this utility function there. Both HashAggr and StreamingAggr extend from this base class. > On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrule.java, > > line 39 > > <https://reviews.apache.org/r/20741/diff/1/?file=569056#file569056line39> > > > > Can you add a matches which checks the aggregates and fails to match on > > the case where there is a distinct aggregate? Done. onMatch() will return without creating a transform request if any of the aggregate exprs contains a distinct. > On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java, > > line 63 > > <https://reviews.apache.org/r/20741/diff/1/?file=569044#file569044line63> > > > > Let's try to include this magic number also in the centralized cost > > config so we can explore its impact. As above, ideally we would pull this > > from an understanding of the expressions. Added a constant DrillCostBase.projectCpuCost with a factor of 4 * baseCpuCost to model the cost of projecting expressions and columns. Ideally, we would want to analyze the expression to determine cost accurately, but for now I have changed the formala to be: cpuCost = numProjectedExpressions * rowCount * projectCpuCost. > On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrel.java, > > line 59 > > <https://reviews.apache.org/r/20741/diff/1/?file=569055#file569055line59> > > > > Ideally, this shouldn't be hit. Rule should fail to match. Done (see below for HashAggPrule). > On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java, > > line 88 > > <https://reviews.apache.org/r/20741/diff/1/?file=569057#file569057line88> > > > > please centralize magic number. Done. > On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ScanPrel.java, > > line 75 > > <https://reviews.apache.org/r/20741/diff/1/?file=569064#file569064line75> > > > > I could be wrong but I thought the third number was disk io. Why would > > this be 0? Also, we should probably delegate this down to groupscan as > > different scan types have different costs. For example, Parquet is > > substantially more CPU intensive. CSV, more disk intensive. After discussing this, kept the formula as-is and added the following comments: // Even though scan is reading from disk, in the currently generated plans all plans will // need to read the same amount of data, so keeping the disk io cost 0 is ok for now. // In the future we might consider alternative scans that go against projections or // different compression schemes etc that affect the amount of data read. Such alternatives // would affect both cpu and io cost. > On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashToMergeExchange.java, > > line 87 > > <https://reviews.apache.org/r/20741/diff/1/?file=569041#file569041line87> > > > > This looks like old, unused code. Do we need this? All exchanges implement this method which is defined in the Exchange interface class but not getting used. After discussion, I took it out from all the exchanges and the interface. > On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrule.java, > > line 80 > > <https://reviews.apache.org/r/20741/diff/1/?file=569062#file569062line80> > > > > This brings to mind that we should probably have settings to enable > > disable various types of exchanges. E.g. supports_broadcast_exchange We have talked about this...it's not part of these costing changes but will address as part of a separate JIRA. > On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrule.java, > > line 57 > > <https://reviews.apache.org/r/20741/diff/1/?file=569056#file569056line57> > > > > probably add note here about multi phase aggregate You mean for the hash aggregate rule in general, not for this particular statement, right ? this statement (line 57) is part of the following conditional block which is about non-grouped (plain) aggregation, not necessarily multi-phase aggrs. I will add a general comment about multi-phase for this rule. if (aggregate.getGroupSet().isEmpty()) { toDist = DrillDistributionTrait.SINGLETON; traits = call.getPlanner().emptyTraitSet().plus(Prel.DRILL_PHYSICAL).plus(toDist); createTransformRequest(call, aggregate, input, traits); > On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTraitDef.java, > > line 75 > > <https://reviews.apache.org/r/20741/diff/1/?file=569053#file569053line75> > > > > How does OrderedPartitionExchange destroy orderedness? Same for > > broadcast? OrderedPartitionExchange and BroadcastExchange both create a RandomReceiver instance, not a MergingReceiver. So one would have to add a Sort enforcer after the exchange to provide sortedness. I suppose your point is that one could theoretically create a BroadcastExchange with a MergingReceiver and preserve sortedness. Currently, we don't do that though. > On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashToMergeExchangePrel.java, > > line 88 > > <https://reviews.apache.org/r/20741/diff/1/?file=569059#file569059line88> > > > > Can we update to new utility method? Also, are we sure this is true. > > I thought hash exchange supported all selection vectors. Modified to use the utility method in PrelUtil. The modified code is the same as is present in HashToRandomExchange. - Aman ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/20741/#review41681 ----------------------------------------------------------- On April 26, 2014, 1:54 a.m., Aman Sinha wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/20741/ > ----------------------------------------------------------- > > (Updated April 26, 2014, 1:54 a.m.) > > > Review request for drill, Jacques Nadeau and Jinfeng Ni. > > > Repository: drill-git > > > Description > ------- > > 1. Added DrillCostBase that includes distribution (network) cost. > 2. Added cost formulas for computeSelfCost() for various Exchange operators, > joins, aggregations etc. > 3. Added Prels for HashJoin, HashAggregate and generate plans for those. > 4. Added exchange Prels: BroadcastExchangePrel and a new type called > HashToMergeExchangePrel. > > > Diffs > ----- > > exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java > 7e3b63d > > exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashToMergeExchange.java > PRE-CREATION > > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableConfig.java > 98892c0 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillFilterRelBase.java > 955729b > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java > cf3d188 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillScanRelBase.java > b370352 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillScreenRelBase.java > 51ed442 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillCostBase.java > PRE-CREATION > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelOptCost.java > PRE-CREATION > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelOptCostFactory.java > PRE-CREATION > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java > 1492a28 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/BroadcastExchangePrel.java > PRE-CREATION > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTrait.java > b75fb40 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTraitDef.java > c2ebb7a > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/FilterPrel.java > 0fc3abd > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrel.java > PRE-CREATION > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrule.java > PRE-CREATION > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java > PRE-CREATION > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrule.java > PRE-CREATION > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashToMergeExchangePrel.java > PRE-CREATION > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashToRandomExchangePrel.java > e5c9661 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java > 978a531 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrule.java > 8298e50 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java > e6e99c0 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ScanPrel.java > a945129 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SingleMergeExchangePrel.java > d9431cc > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SortPrel.java > 344be4e > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamAggPrel.java > c2880da > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamAggPrule.java > a561a61 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/TopNPrel.java > e981a45 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/UnionExchangePrel.java > f89cbaa > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java > 8892a8f > exec/java-exec/src/test/java/org/apache/drill/TestTpchSingleMode.java > 1ccb65c > > exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoin.java > ba067e2 > exec/java-exec/src/test/resources/join/hj_exchanges.json PRE-CREATION > exec/java-exec/src/test/resources/queries/tpch/01.sql f33416f > > Diff: https://reviews.apache.org/r/20741/diff/ > > > Testing > ------- > > Existing unit tests. Some TPCH tests show memory leaks but that is not > directly related to the costing changes...however those leaks would have to > be resolved before enabling the new plans. However, I wanted to get this > review request out to get feedback. > > > Thanks, > > Aman Sinha > >
