> On May 9, 2015, 5:57 a.m., Aman Sinha wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java, > > line 53 > > <https://reviews.apache.org/r/34006/diff/2/?file=954198#file954198line53> > > > > It is not clear why Scan should have a 'distinct' row count of 10% ? > > What does 'distinct' row count mean for scan ? (since we are not > > considering filter push-down or partition pruning here). > > Jinfeng Ni wrote: > The distinct row count is for one set of column. It's more like column > cardinality. For instance, GB c1, c2, c3. The ImmutableBitSet groupKey > indicates the column set. However, since Drill does not have any column > cardinality, we simply use row count to estimate. > > As a matter of fact, the current VolcanoPlanner uses the default > RelMetadaProvider, which will return 10% of row count for AggregateRel. In > this sense, this DistinctRowCount MetadataProvider is using the same way for > estimation.
Returning 10% as the distinct row count for AggregateRel seems reasonable since it is doing group-by. I suppose for ScanRel it will be called for individual columns, not a set of columns... in other words if I want the NDV(c1), NDV(c2), NDV(c3), I will call this method 3 times with different columns in the groupKey. > On May 9, 2015, 5:57 a.m., Aman Sinha wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java, > > line 529 > > <https://reviews.apache.org/r/34006/diff/2/?file=954207#file954207line529> > > > > Is the criteria for using Lopt optimizer (in terms of number of tables > > above a certain threshold) applied internally ? We should have a Drill > > specific setting for it beyond just a true/false setting. > > Jinfeng Ni wrote: > I think it probably makes sense to avoid LOPT planner for single table > query. For any query with JOIN, since the current planer does not enable > SwapJoin rule in the logical planning phase, it may not find the optimal > plan, and rely on a post-planing method to swap join based on rowcount. In > that sense, I feel LOPT planner might be a better choice even for 2 or 3 > tables join. > > For single table query, since there is no join, it seems no difference > between LOPT / the current planner. That's why I did not add a threashold. > > Jinfeng Ni wrote: > Another reason that I did not add a threashold option is we have added a > new option to swtich between the existing planner and new planner. User could > simply turn on/off that option, to completely switch to one of them. Having > both the switch option and threashold option probably will cause more > confusing, IMHO. Ok, let's see how the TPC-DS testing goes and see if more control is needed. - Aman ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34006/#review83134 ----------------------------------------------------------- On May 9, 2015, 12:20 a.m., Jinfeng Ni wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/34006/ > ----------------------------------------------------------- > > (Updated May 9, 2015, 12:20 a.m.) > > > Review request for drill and Aman Sinha. > > > Repository: drill-git > > > Description > ------- > > Drill current use VolcanoPlanner in join planning. This planner has two known > issues: > > 1. The search space is increased exponentially with increased # of tables > joined. If query has more than > 10 tables join, the planning time itself > could be minutes, if not longer. > > 2. Drill did not enable a rule to swap both sides of join, due to the search > space problem. We only do a swap join afterwards. See DRILL-2236. This means > the join order chosen by Drill's VolcanoPlanner might not be optimal. > > To address the above two issues, we are going to provide another planner for > the purpose of join ordering planning. This planner will use a different > optimization rules, and the search space is not increased exponentially with > # of table. > > The main logic of this new planner: > 1) Let VolcanoPlanner do all the rule transformations same as the current > planner's logical planning, except for the join permutation rule. > 2) After that, pass to HepPlanner with Calcite LOPT optimization rule, to let > it do the join ordering. Feed with the HepPlanner with Drill's > RelMetaDataProvider, to leverage the statistics (rowcount) available in > Drill's table/files. > 3) Continue with the same physical planning as before. > > With the limited statistics available in Drill, the new planner seems to > produce better query plan than the current, for several TPCH queries. > > Preliminary performance results show this planner run faster than the > existing one, and the join plan seems to be same or better than the plan > chosen by the existing planner. > > Will update more in detail about the comparison. > > > Diffs > ----- > > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java > 5ab416c > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java > 42ef6ac > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillDefaultRelMetadataProvider.java > PRE-CREATION > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java > PRE-CREATION > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterRel.java > dbd08f4 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillJoinRel.java > dcccdb0 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillProjectRel.java > 6e132aa > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjIntoScan.java > 2981de8 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRelFactories.java > PRE-CREATION > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java > 53e1bff > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java > 7d8dd97 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java > 3c78c08 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java > eda1b5f > > exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java > 4d8b034 > > Diff: https://reviews.apache.org/r/34006/diff/ > > > Testing > ------- > > Unit test / Regression suite. > > > Thanks, > > Jinfeng Ni > >
