[ https://issues.apache.org/jira/browse/PHOENIX-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chenglei updated PHOENIX-5148: ------------------------------ Attachment: PHOENIX-5148-4.x-HBase-1.4.patch > Improve OrderPreservingTracker to optimize OrderBy/GroupBy for ClientScanPlan > and ClientAggregatePlan > ----------------------------------------------------------------------------------------------------- > > Key: PHOENIX-5148 > URL: https://issues.apache.org/jira/browse/PHOENIX-5148 > Project: Phoenix > Issue Type: Improvement > Affects Versions: 4.14.1 > Reporter: chenglei > Assignee: chenglei > Priority: Major > Fix For: 4.15.0 > > Attachments: PHOENIX-5148-4.x-HBase-1.4.patch > > > Given a table > {code:java} > create table test ( > pk1 varchar not null , > pk2 varchar not null, > pk3 varchar not null, > v1 varchar, > v2 varchar, > CONSTRAINT TEST_PK PRIMARY KEY ( > pk1, > pk2, > pk3 )) > {code} > Consider following three cases: > *1. OrderBy of ClientScanPlan* > for sql: > {code:java} > select v1 from (select v1,v2,pk3 from test t where pk1 = '6' order by > t.v2,t.pk3,t.v1 limit 10) a order by v2,pk3 > {code} > Obviously, the outer {{OrderBy}} "order by v2,pk3" should be compiled out > because it matchs the inner query {{OrderBy}} "order by t.v2,t.pk3,t.v1" , > but unfortunately it is not compiled out. > *2. GroupBy of ClientAggregatePlan* > for sql : > {code:java} > select v1 from (select v1,pk2,pk1 from test t where pk1 = '6' order by > t.pk2,t.v1,t.pk1 limit 10) a group by pk2,v1 > {code} > Obviously, the outer {{GroupBy}} "group by pk2,v1" should be orderPreserving > because it matchs the inner query {{OrderBy}} "order by t.pk2,t.v1,t.pk1" , > but unfortunately the {{isOrderPreserving()}} of outer {{GroupBy}} return > false. > *3. OrderBy of SortMergeJoinPlan(from PHOENIX-4618)* > for sql: > {code:java} > SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b{code} > The result of the sort-merge-join is sorted on (T1.a, T1.b) and (T2.a, T2.b) > at the same time. > Thus, both 1) > {code:java} > SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T1.a, > T1.b{code} > and 2) > {code:java} > SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T2.a, > T2.b{code} > should avoid doing an extra order-by after the sort-merge-join operation. > > All the above three cases are caused by the same problem that the > {{OrderPreservingTracker}} relies solely on row keys for inferring alignment > between the target {{Expression}} and the source sortedness. > For following {{QueryCompiler.compileSingleQuery}}, because the inner query > of above cases has {{OrderBy}}, so in line 534, local variable > {{isInRowKeyOrder}} is false: > {code:java} > 518 protected QueryPlan compileSingleQuery(StatementContext context, > SelectStatement select, List<Object> binds, boolean asSubquery, boolean > allowPageFilter) throws SQLException{ > 519 SelectStatement innerSelect = select.getInnerSelectStatement(); > 520 if (innerSelect == null) { > 521 return compileSingleFlatQuery(context, select, binds, > asSubquery, allowPageFilter, null, null, true); > 522 } > 523 > 524 QueryPlan innerPlan = compileSubquery(innerSelect, false); > 525 TupleProjector tupleProjector = new > TupleProjector(innerPlan.getProjector()); > 526 innerPlan = new TupleProjectionPlan(innerPlan, tupleProjector, > null); > 527 > 528 // Replace the original resolver and table with those having > compiled type info. > 529 TableRef tableRef = context.getResolver().getTables().get(0); > 530 ColumnResolver resolver = > FromCompiler.getResolverForCompiledDerivedTable(statement.getConnection(), > tableRef, innerPlan.getProjector()); > 531 context.setResolver(resolver); > 532 tableRef = resolver.getTables().get(0); > 533 context.setCurrentTable(tableRef); > 534 boolean isInRowKeyOrder = innerPlan.getGroupBy() == > GroupBy.EMPTY_GROUP_BY && innerPlan.getOrderBy() == OrderBy.EMPTY_ORDER_BY; > 535 > 536 return compileSingleFlatQuery(context, select, binds, > asSubquery, allowPageFilter, innerPlan, tupleProjector, isInRowKeyOrder); > 537 } > {code} > Because the {{OrderPreservingTracker}} relies solely on row keys, so in > following line 170 of {{OrderByCompiler.compile}} , when {{isInRowKeyOrder}} > is false, the {{OrderPreservingTracker.isOrderPreserving()}} checking is > skipped: > {code:java} > 169 // If we're ordering by the order returned by the scan, we don't need > an order by > 170 if (isInRowKeyOrder && tracker.isOrderPreserving()) { > 171 if (tracker.isReverse()) { > 172 // Don't use reverse scan if: > 173 // 1) we're using a skip scan, as our skip scan doesn't > support this yet. > 174 // 2) we have the FORWARD_SCAN hint set to choose to > keep loading of column > 175 // families on demand versus doing a reverse scan > 176 // REV_ROW_KEY_ORDER_BY scan would not take effect for a > projected table, so don't return it for such table types. > 177 if > (context.getConnection().getQueryServices().getProps().getBoolean(QueryServices.USE_REVERSE_SCAN_ATTRIB, > QueryServicesOptions.DEFAULT_USE_REVERSE_SCAN) > 178 && !context.getScanRanges().useSkipScanFilter() > 179 && context.getCurrentTable().getTable().getType() > != PTableType.PROJECTED > 180 && context.getCurrentTable().getTable().getType() > != PTableType.SUBQUERY > 181 && > !statement.getHint().hasHint(Hint.FORWARD_SCAN)) { > 182 return OrderBy.REV_ROW_KEY_ORDER_BY; > 183 } > 184 } else { > 185 return OrderBy.FWD_ROW_KEY_ORDER_BY; > 186 } > 187 } > {code} > > What we can do to fix this problem are : > - refactor the {{OrderPreservingTracker}} to make it inferring alignment > between the target {{Expression}} and the source sortedness based on > {{OrderByExpression}} , not the row keys, so we can remove the variable > {{isInRowKeyOrder}}. > - add a new method {{getActualOutputOrderBys}} to the {{QueryPlan}} to get > the output OrderBys of the query result , eg. for the above third > {{SortMergeJoinPlan}} example ,{{getActualOutputOrderBys}} returns (T1.a, > T1.b) and (T2.a, T2.b) . > Besides above three cases, a implicit semantic optimization could be > implemented also: > for following sql , even the literal {{OrderBy}} of inner query is "order by > t.pk1,t.pk2" . By {{getActualOutputOrderBys}} of inner {{QueryPlan}}, we can > infer that the actual {{OrderBy}} of inner query is "order by > t.pk1,t.pk2,t.pk3" , so the outer OrderBy "order by pk1,pk2,pk3" could be > compiled out: > {code} > select pk1 from (select pk3,pk2,pk1 from test t where v1 = '6' order by > t.pk1,t.pk2 limit 10) a where pk3 > '8' order by pk1,pk2,pk3 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)