[ https://issues.apache.org/jira/browse/PHOENIX-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chenglei updated PHOENIX-5148: ------------------------------ Description: Given a table {code:java} create table test ( pk1 varchar not null , pk2 varchar not null, pk3 varchar not null, v1 varchar, v2 varchar, CONSTRAINT TEST_PK PRIMARY KEY ( pk1, pk2, pk3 )) {code} Consider following three cases: *1. OrderBy of ClientScanPlan* for sql: {code:java} select v1 from (select v1,v2,pk3 from test t where pk1 = '6' order by t.v2,t.pk3,t.v1 limit 10) a order by v2,pk3 {code} Obviously, the outer {{OrderBy}} "order by v2,pk3" should be compiled out because it matchs the inner query {{OrderBy}} "order by t.v2,t.pk3,t.v1" , but unfortunately it is not compiled out. *2. GroupBy of ClientAggregatePlan* for sql : {code:java} select v1 from (select v1,pk2,pk1 from test t where pk1 = '6' order by t.pk2,t.v1,t.pk1 limit 10) a group by pk2,v1 {code} Obviously, the outer {{GroupBy}} "group by pk2,v1" should be orderPreserving because it matchs the inner query {{OrderBy}} "order by t.pk2,t.v1,t.pk1" , but unfortunately the {{isOrderPreserving()}} of outer {{GroupBy}} return false. *3. OrderBy of SortMergeJoinPlan(from PHOENIX-4618)* for sql: {code:java} SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b{code} The result of the sort-merge-join is sorted on (T1.a, T1.b) and (T2.a, T2.b) at the same time. Thus, both 1) {code:java} SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T1.a, T1.b{code} and 2) {code:java} SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T2.a, T2.b{code} should avoid doing an extra order-by after the sort-merge-join operation. All the above three cases are caused by the same problem that the {{OrderPreservingTracker}} relies solely on row keys for inferring alignment between the target {{OrderByExpression}} and the source sortedness. For following {{QueryCompiler.compileSingleQuery}}, because the inner query of above cases has {{OrderBy}}, so in line 534, local variable {{isInRowKeyOrder}} is false: {code:java} 518 protected QueryPlan compileSingleQuery(StatementContext context, SelectStatement select, List<Object> binds, boolean asSubquery, boolean allowPageFilter) throws SQLException{ 519 SelectStatement innerSelect = select.getInnerSelectStatement(); 520 if (innerSelect == null) { 521 return compileSingleFlatQuery(context, select, binds, asSubquery, allowPageFilter, null, null, true); 522 } 523 524 QueryPlan innerPlan = compileSubquery(innerSelect, false); 525 TupleProjector tupleProjector = new TupleProjector(innerPlan.getProjector()); 526 innerPlan = new TupleProjectionPlan(innerPlan, tupleProjector, null); 527 528 // Replace the original resolver and table with those having compiled type info. 529 TableRef tableRef = context.getResolver().getTables().get(0); 530 ColumnResolver resolver = FromCompiler.getResolverForCompiledDerivedTable(statement.getConnection(), tableRef, innerPlan.getProjector()); 531 context.setResolver(resolver); 532 tableRef = resolver.getTables().get(0); 533 context.setCurrentTable(tableRef); 534 boolean isInRowKeyOrder = innerPlan.getGroupBy() == GroupBy.EMPTY_GROUP_BY && innerPlan.getOrderBy() == OrderBy.EMPTY_ORDER_BY; 535 536 return compileSingleFlatQuery(context, select, binds, asSubquery, allowPageFilter, innerPlan, tupleProjector, isInRowKeyOrder); 537 } {code} Because the {{OrderPreservingTracker}} relies solely on row keys, so in following {{OrderByCompiler.compile}} when {{isInRowKeyOrder}} is false, the {{OrderPreservingTracker.isOrderPreserving()}} is skipped: {code:java} 169 // If we're ordering by the order returned by the scan, we don't need an order by 170 if (isInRowKeyOrder && tracker.isOrderPreserving()) { 171 if (tracker.isReverse()) { 172 // Don't use reverse scan if: 173 // 1) we're using a skip scan, as our skip scan doesn't support this yet. 174 // 2) we have the FORWARD_SCAN hint set to choose to keep loading of column 175 // families on demand versus doing a reverse scan 176 // REV_ROW_KEY_ORDER_BY scan would not take effect for a projected table, so don't return it for such table types. 177 if (context.getConnection().getQueryServices().getProps().getBoolean(QueryServices.USE_REVERSE_SCAN_ATTRIB, QueryServicesOptions.DEFAULT_USE_REVERSE_SCAN) 178 && !context.getScanRanges().useSkipScanFilter() 179 && context.getCurrentTable().getTable().getType() != PTableType.PROJECTED 180 && context.getCurrentTable().getTable().getType() != PTableType.SUBQUERY 181 && !statement.getHint().hasHint(Hint.FORWARD_SCAN)) { 182 return OrderBy.REV_ROW_KEY_ORDER_BY; 183 } 184 } else { 185 return OrderBy.FWD_ROW_KEY_ORDER_BY; 186 } 187 } {code} was: Given a table {code:java} create table test ( pk1 varchar not null , pk2 varchar not null, pk3 varchar not null, v1 varchar, v2 varchar, CONSTRAINT TEST_PK PRIMARY KEY ( pk1, pk2, pk3 )) {code} Consider following three cases: *1. OrderBy of ClientScanPlan* for sql: {code:java} select v1 from (select v1,v2,pk3 from test t where pk1 = '6' order by t.v2,t.pk3,t.v1 limit 10) a order by v2,pk3 {code} Obviously, the outer {{OrderBy}} "order by v2,pk3" should be compiled out because it matchs the inner query {{OrderBy}} "order by t.v2,t.pk3,t.v1" , but unfortunately it is not compiled out. *2. GroupBy of ClientAggregatePlan* for sql : {code:java} select v1 from (select v1,pk2,pk1 from test t where pk1 = '6' order by t.pk2,t.v1,t.pk1 limit 10) a group by pk2,v1 {code} Obviously, the outer {{GroupBy}} "group by pk2,v1" should be orderPreserving because it matchs the inner query {{OrderBy}} "order by t.pk2,t.v1,t.pk1" , but unfortunately the {{isOrderPreserving()}} of outer {{GroupBy}} return false. *3. OrderBy of SortMergeJoinPlan(from PHOENIX-4618)* for sql: {code:java} SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b{code} The result of the sort-merge-join is sorted on (T1.a, T1.b) and (T2.a, T2.b) at the same time. Thus, both 1) {code:java} SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T1.a, T1.b{code} and 2) {code:java} SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T2.a, T2.b{code} should avoid doing an extra order-by after the sort-merge-join operation. All the above three cases are caused by the same problem that the {{OrderPreservingTracker}} relies solely on primary keys for inferring alignment between the target {{OrderByExpression}} and the source sortedness. For following {{QueryCompiler.compileSingleQuery}}, because the inner query of above cases has {{OrderBy}}, so in line 534, local variable {{isInRowKeyOrder}} is false: {code:java} 518 protected QueryPlan compileSingleQuery(StatementContext context, SelectStatement select, List<Object> binds, boolean asSubquery, boolean allowPageFilter) throws SQLException{ 519 SelectStatement innerSelect = select.getInnerSelectStatement(); 520 if (innerSelect == null) { 521 return compileSingleFlatQuery(context, select, binds, asSubquery, allowPageFilter, null, null, true); 522 } 523 524 QueryPlan innerPlan = compileSubquery(innerSelect, false); 525 TupleProjector tupleProjector = new TupleProjector(innerPlan.getProjector()); 526 innerPlan = new TupleProjectionPlan(innerPlan, tupleProjector, null); 527 528 // Replace the original resolver and table with those having compiled type info. 529 TableRef tableRef = context.getResolver().getTables().get(0); 530 ColumnResolver resolver = FromCompiler.getResolverForCompiledDerivedTable(statement.getConnection(), tableRef, innerPlan.getProjector()); 531 context.setResolver(resolver); 532 tableRef = resolver.getTables().get(0); 533 context.setCurrentTable(tableRef); 534 boolean isInRowKeyOrder = innerPlan.getGroupBy() == GroupBy.EMPTY_GROUP_BY && innerPlan.getOrderBy() == OrderBy.EMPTY_ORDER_BY; 535 536 return compileSingleFlatQuery(context, select, binds, asSubquery, allowPageFilter, innerPlan, tupleProjector, isInRowKeyOrder); 537 } {code} > Improve OrderPreservingTracker to optimize OrderBy/GroupBy for ClientScanPlan > and ClientAggregatePlan > ----------------------------------------------------------------------------------------------------- > > Key: PHOENIX-5148 > URL: https://issues.apache.org/jira/browse/PHOENIX-5148 > Project: Phoenix > Issue Type: Improvement > Affects Versions: 4.14.1 > Reporter: chenglei > Priority: Major > > Given a table > {code:java} > create table test ( > pk1 varchar not null , > pk2 varchar not null, > pk3 varchar not null, > v1 varchar, > v2 varchar, > CONSTRAINT TEST_PK PRIMARY KEY ( > pk1, > pk2, > pk3 )) > {code} > Consider following three cases: > *1. OrderBy of ClientScanPlan* > for sql: > {code:java} > select v1 from (select v1,v2,pk3 from test t where pk1 = '6' order by > t.v2,t.pk3,t.v1 limit 10) a order by v2,pk3 > {code} > Obviously, the outer {{OrderBy}} "order by v2,pk3" should be compiled out > because it matchs the inner query {{OrderBy}} "order by t.v2,t.pk3,t.v1" , > but unfortunately it is not compiled out. > *2. GroupBy of ClientAggregatePlan* > for sql : > {code:java} > select v1 from (select v1,pk2,pk1 from test t where pk1 = '6' order by > t.pk2,t.v1,t.pk1 limit 10) a group by pk2,v1 > {code} > Obviously, the outer {{GroupBy}} "group by pk2,v1" should be orderPreserving > because it matchs the inner query {{OrderBy}} "order by t.pk2,t.v1,t.pk1" , > but unfortunately the {{isOrderPreserving()}} of outer {{GroupBy}} return > false. > *3. OrderBy of SortMergeJoinPlan(from PHOENIX-4618)* > for sql: > {code:java} > SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b{code} > The result of the sort-merge-join is sorted on (T1.a, T1.b) and (T2.a, T2.b) > at the same time. > Thus, both 1) > {code:java} > SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T1.a, > T1.b{code} > and 2) > {code:java} > SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T2.a, > T2.b{code} > should avoid doing an extra order-by after the sort-merge-join operation. > > All the above three cases are caused by the same problem that the > {{OrderPreservingTracker}} relies solely on row keys for inferring alignment > between the target {{OrderByExpression}} and the source sortedness. > For following {{QueryCompiler.compileSingleQuery}}, because the inner query > of above cases has {{OrderBy}}, so in line 534, local variable > {{isInRowKeyOrder}} is false: > {code:java} > 518 protected QueryPlan compileSingleQuery(StatementContext context, > SelectStatement select, List<Object> binds, boolean asSubquery, boolean > allowPageFilter) throws SQLException{ > 519 SelectStatement innerSelect = select.getInnerSelectStatement(); > 520 if (innerSelect == null) { > 521 return compileSingleFlatQuery(context, select, binds, > asSubquery, allowPageFilter, null, null, true); > 522 } > 523 > 524 QueryPlan innerPlan = compileSubquery(innerSelect, false); > 525 TupleProjector tupleProjector = new > TupleProjector(innerPlan.getProjector()); > 526 innerPlan = new TupleProjectionPlan(innerPlan, tupleProjector, > null); > 527 > 528 // Replace the original resolver and table with those having > compiled type info. > 529 TableRef tableRef = context.getResolver().getTables().get(0); > 530 ColumnResolver resolver = > FromCompiler.getResolverForCompiledDerivedTable(statement.getConnection(), > tableRef, innerPlan.getProjector()); > 531 context.setResolver(resolver); > 532 tableRef = resolver.getTables().get(0); > 533 context.setCurrentTable(tableRef); > 534 boolean isInRowKeyOrder = innerPlan.getGroupBy() == > GroupBy.EMPTY_GROUP_BY && innerPlan.getOrderBy() == OrderBy.EMPTY_ORDER_BY; > 535 > 536 return compileSingleFlatQuery(context, select, binds, > asSubquery, allowPageFilter, innerPlan, tupleProjector, isInRowKeyOrder); > 537 } > {code} > Because the {{OrderPreservingTracker}} relies solely on row keys, so in > following {{OrderByCompiler.compile}} when {{isInRowKeyOrder}} is false, the > {{OrderPreservingTracker.isOrderPreserving()}} is skipped: > {code:java} > 169 // If we're ordering by the order returned by the scan, we don't need > an order by > 170 if (isInRowKeyOrder && tracker.isOrderPreserving()) { > 171 if (tracker.isReverse()) { > 172 // Don't use reverse scan if: > 173 // 1) we're using a skip scan, as our skip scan doesn't > support this yet. > 174 // 2) we have the FORWARD_SCAN hint set to choose to > keep loading of column > 175 // families on demand versus doing a reverse scan > 176 // REV_ROW_KEY_ORDER_BY scan would not take effect for a > projected table, so don't return it for such table types. > 177 if > (context.getConnection().getQueryServices().getProps().getBoolean(QueryServices.USE_REVERSE_SCAN_ATTRIB, > QueryServicesOptions.DEFAULT_USE_REVERSE_SCAN) > 178 && !context.getScanRanges().useSkipScanFilter() > 179 && context.getCurrentTable().getTable().getType() > != PTableType.PROJECTED > 180 && context.getCurrentTable().getTable().getType() > != PTableType.SUBQUERY > 181 && > !statement.getHint().hasHint(Hint.FORWARD_SCAN)) { > 182 return OrderBy.REV_ROW_KEY_ORDER_BY; > 183 } > 184 } else { > 185 return OrderBy.FWD_ROW_KEY_ORDER_BY; > 186 } > 187 } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)