> On May 8, 2015, 10:57 p.m., Aman Sinha wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java, > > line 124 > > <https://reviews.apache.org/r/34006/diff/2/?file=954195#file954195line124> > > > > Since the main difference between this and other costing function is > > the multiplication factor, can you consolidate the two and just provide > > thin wrappers ?
There is some difference between the two costing function. But I'll try to put the common code in one place. > On May 8, 2015, 10:57 p.m., Aman Sinha wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java, > > line 53 > > <https://reviews.apache.org/r/34006/diff/2/?file=954198#file954198line53> > > > > It is not clear why Scan should have a 'distinct' row count of 10% ? > > What does 'distinct' row count mean for scan ? (since we are not > > considering filter push-down or partition pruning here). The distinct row count is for one set of column. It's more like column cardinality. For instance, GB c1, c2, c3. The ImmutableBitSet groupKey indicates the column set. However, since Drill does not have any column cardinality, we simply use row count to estimate. As a matter of fact, the current VolcanoPlanner uses the default RelMetadaProvider, which will return 10% of row count for AggregateRel. In this sense, this DistinctRowCount MetadataProvider is using the same way for estimation. > On May 8, 2015, 10:57 p.m., Aman Sinha wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRelFactories.java, > > line 37 > > <https://reviews.apache.org/r/34006/diff/2/?file=954203#file954203line37> > > > > This currently has factories for creating logical project, filter and > > join. Why only these 3 and not other logical rels ? This is because LOPT only requires three : project, filter, join. We certainly could create more factories. But I'm not inclined to create more class, if they are not used at all in this patch. I'll file another JIRA to create more factories for futher use. > On May 8, 2015, 10:57 p.m., Aman Sinha wrote: > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java, > > line 529 > > <https://reviews.apache.org/r/34006/diff/2/?file=954207#file954207line529> > > > > Is the criteria for using Lopt optimizer (in terms of number of tables > > above a certain threshold) applied internally ? We should have a Drill > > specific setting for it beyond just a true/false setting. I think it probably makes sense to avoid LOPT planner for single table query. For any query with JOIN, since the current planer does not enable SwapJoin rule in the logical planning phase, it may not find the optimal plan, and rely on a post-planing method to swap join based on rowcount. In that sense, I feel LOPT planner might be a better choice even for 2 or 3 tables join. For single table query, since there is no join, it seems no difference between LOPT / the current planner. That's why I did not add a threashold. - Jinfeng ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34006/#review83134 ----------------------------------------------------------- On May 8, 2015, 5:20 p.m., Jinfeng Ni wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/34006/ > ----------------------------------------------------------- > > (Updated May 8, 2015, 5:20 p.m.) > > > Review request for drill and Aman Sinha. > > > Repository: drill-git > > > Description > ------- > > Drill current use VolcanoPlanner in join planning. This planner has two known > issues: > > 1. The search space is increased exponentially with increased # of tables > joined. If query has more than > 10 tables join, the planning time itself > could be minutes, if not longer. > > 2. Drill did not enable a rule to swap both sides of join, due to the search > space problem. We only do a swap join afterwards. See DRILL-2236. This means > the join order chosen by Drill's VolcanoPlanner might not be optimal. > > To address the above two issues, we are going to provide another planner for > the purpose of join ordering planning. This planner will use a different > optimization rules, and the search space is not increased exponentially with > # of table. > > The main logic of this new planner: > 1) Let VolcanoPlanner do all the rule transformations same as the current > planner's logical planning, except for the join permutation rule. > 2) After that, pass to HepPlanner with Calcite LOPT optimization rule, to let > it do the join ordering. Feed with the HepPlanner with Drill's > RelMetaDataProvider, to leverage the statistics (rowcount) available in > Drill's table/files. > 3) Continue with the same physical planning as before. > > With the limited statistics available in Drill, the new planner seems to > produce better query plan than the current, for several TPCH queries. > > Preliminary performance results show this planner run faster than the > existing one, and the join plan seems to be same or better than the plan > chosen by the existing planner. > > Will update more in detail about the comparison. > > > Diffs > ----- > > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java > 5ab416c > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java > 42ef6ac > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillDefaultRelMetadataProvider.java > PRE-CREATION > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java > PRE-CREATION > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterRel.java > dbd08f4 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillJoinRel.java > dcccdb0 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillProjectRel.java > 6e132aa > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjIntoScan.java > 2981de8 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRelFactories.java > PRE-CREATION > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java > 53e1bff > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java > 7d8dd97 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java > 3c78c08 > > exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java > eda1b5f > > exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java > 4d8b034 > > Diff: https://reviews.apache.org/r/34006/diff/ > > > Testing > ------- > > Unit test / Regression suite. > > > Thanks, > > Jinfeng Ni > >
