[ 
https://issues.apache.org/jira/browse/PHOENIX-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-5148:
------------------------------
    Attachment: PHOENIX-5148_v3-4.x-HBase-1.2.patch

> Improve OrderPreservingTracker to optimize OrderBy/GroupBy for ClientScanPlan 
> and ClientAggregatePlan
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5148
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5148
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.14.1
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Major
>             Fix For: 4.15.0
>
>         Attachments: PHOENIX-5148-4.x-HBase-1.4.patch, 
> PHOENIX-5148_v2-4.x-HBase-1.4.patch, PHOENIX-5148_v3-4.x-HBase-1.2.patch, 
> PHOENIX-5148_v3-4.x-HBase-1.4.patch, PHOENIX-5148_v3-master.patch
>
>          Time Spent: 9h
>  Remaining Estimate: 0h
>
> Given a table
> {code:java}
>   create table test ( 
>        pk1 varchar not null , 
>        pk2 varchar not null, 
>        pk3 varchar not null,
>         v1 varchar, 
>         v2 varchar, 
>         CONSTRAINT TEST_PK PRIMARY KEY ( 
>               pk1,
>               pk2,
>               pk3 ))
> {code}
> Consider following three cases:
> *1. OrderBy of ClientScanPlan*
> for sql:
> {code:java}
> select v1 from (select v1,v2,pk3 from test t where pk1 = '6' order by 
> t.v2,t.pk3,t.v1 limit 10) a order by v2,pk3
> {code}
> Obviously, the outer {{OrderBy}} "order by v2,pk3" should be compiled out 
> because it matchs the inner query {{OrderBy}} "order by t.v2,t.pk3,t.v1" , 
> but unfortunately it is not compiled out.
> *2. GroupBy of ClientAggregatePlan*
> for sql :
> {code:java}
> select v1 from (select v1,pk2,pk1 from test t where pk1 = '6' order by 
> t.pk2,t.v1,t.pk1 limit 10) a group by pk2,v1
> {code}
> Obviously, the outer {{GroupBy}} "group by pk2,v1" should be orderPreserving 
> because it matchs the inner query {{OrderBy}} "order by t.pk2,t.v1,t.pk1" , 
> but unfortunately the {{isOrderPreserving()}} of outer {{GroupBy}} return 
> false.
> *3. OrderBy of SortMergeJoinPlan(from PHOENIX-4618)*
> for sql:
> {code:java}
> SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b{code}
> The result of the sort-merge-join is sorted on (T1.a, T1.b) and (T2.a, T2.b) 
> at the same time.
>  Thus, both 1)
> {code:java}
> SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T1.a, 
> T1.b{code}
> and 2)
> {code:java}
> SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T2.a, 
> T2.b{code}
> should avoid doing an extra order-by after the sort-merge-join operation.
>  
> All the above three cases are caused by the same problem that the 
> {{OrderPreservingTracker}} relies solely on row keys for inferring alignment 
> between the target {{Expression}} and the source sortedness.
> For following {{QueryCompiler.compileSingleQuery}}, because the inner query 
> of above cases has {{OrderBy}}, so in line 534, local variable 
> {{isInRowKeyOrder}} is false:
> {code:java}
> 518   protected QueryPlan compileSingleQuery(StatementContext context, 
> SelectStatement select, List<Object> binds, boolean asSubquery, boolean 
> allowPageFilter) throws SQLException{
> 519           SelectStatement innerSelect = select.getInnerSelectStatement();
> 520           if (innerSelect == null) {
> 521                return compileSingleFlatQuery(context, select, binds, 
> asSubquery, allowPageFilter, null, null, true);
> 522           }
> 523  
> 524           QueryPlan innerPlan = compileSubquery(innerSelect, false);
> 525           TupleProjector tupleProjector = new 
> TupleProjector(innerPlan.getProjector());
> 526           innerPlan = new TupleProjectionPlan(innerPlan, tupleProjector, 
> null);
> 527
> 528           // Replace the original resolver and table with those having 
> compiled type info.
> 529           TableRef tableRef = context.getResolver().getTables().get(0);
> 530            ColumnResolver resolver = 
> FromCompiler.getResolverForCompiledDerivedTable(statement.getConnection(), 
> tableRef, innerPlan.getProjector());
> 531            context.setResolver(resolver);
> 532           tableRef = resolver.getTables().get(0);
> 533           context.setCurrentTable(tableRef);
> 534           boolean isInRowKeyOrder = innerPlan.getGroupBy() == 
> GroupBy.EMPTY_GROUP_BY && innerPlan.getOrderBy() == OrderBy.EMPTY_ORDER_BY;
> 535
> 536           return compileSingleFlatQuery(context, select, binds, 
> asSubquery, allowPageFilter, innerPlan, tupleProjector, isInRowKeyOrder);
> 537       }
> {code}
> Because the {{OrderPreservingTracker}} relies solely on row keys, so in 
> following line 170 of {{OrderByCompiler.compile}} , when {{isInRowKeyOrder}} 
> is false, the {{OrderPreservingTracker.isOrderPreserving()}} checking is 
> skipped:
> {code:java}
> 169     // If we're ordering by the order returned by the scan, we don't need 
> an order by
> 170      if (isInRowKeyOrder && tracker.isOrderPreserving()) {
> 171             if (tracker.isReverse()) {
> 172                  // Don't use reverse scan if:
> 173                  // 1) we're using a skip scan, as our skip scan doesn't 
> support this yet.
> 174                  // 2) we have the FORWARD_SCAN hint set to choose to 
> keep loading of column
> 175                  //    families on demand versus doing a reverse scan
> 176                  // REV_ROW_KEY_ORDER_BY scan would not take effect for a 
> projected table, so don't return it for such table types.
> 177                if 
> (context.getConnection().getQueryServices().getProps().getBoolean(QueryServices.USE_REVERSE_SCAN_ATTRIB,
>  QueryServicesOptions.DEFAULT_USE_REVERSE_SCAN)
> 178                        && !context.getScanRanges().useSkipScanFilter()
> 179                        && context.getCurrentTable().getTable().getType() 
> != PTableType.PROJECTED
> 180                        && context.getCurrentTable().getTable().getType() 
> != PTableType.SUBQUERY
> 181                        && 
> !statement.getHint().hasHint(Hint.FORWARD_SCAN)) {
> 182                    return OrderBy.REV_ROW_KEY_ORDER_BY;
> 183               }
> 184            } else {
> 185                 return OrderBy.FWD_ROW_KEY_ORDER_BY;
> 186           }
> 187        }
> {code}
>  
> What we can do to fix this problem are :
> - refactor the {{OrderPreservingTracker}}  to make it inferring alignment 
> between the target {{Expression}} and the source sortedness based on 
> {{OrderByExpression}} , not the row keys, so we can remove the variable 
> {{isInRowKeyOrder}}.
> - add a new method {{getActualOutputOrderBys}} to the {{QueryPlan}} to get 
> the output OrderBys of the query result , eg. for the above third 
> {{SortMergeJoinPlan}} example ,{{getActualOutputOrderBys}} returns (T1.a, 
> T1.b) and (T2.a, T2.b) .
> Besides above three cases, a implicit semantic optimization could be 
> implemented also:
> for following sql , even the literal {{OrderBy}} of inner query is "order by 
> t.pk1,t.pk2" .  By {{getActualOutputOrderBys}} of inner {{QueryPlan}}, we can 
> infer that the actual {{OrderBy}} of inner query is "order by 
> t.pk1,t.pk2,t.pk3" , so the outer OrderBy "order by pk1,pk2,pk3"  could be 
> compiled out:
> {code}
> select pk1 from (select pk3,pk2,pk1 from test t where v1 = '6' order by 
> t.pk1,t.pk2 limit 10) a where pk3 > '8' order by pk1,pk2,pk3
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to