[
https://issues.apache.org/jira/browse/PHOENIX-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
chenglei updated PHOENIX-5148:
------------------------------
Description:
Given a table
{code:java}
create table test (
pk1 varchar not null ,
pk2 varchar not null,
pk3 varchar not null,
v1 varchar,
v2 varchar,
CONSTRAINT TEST_PK PRIMARY KEY (
pk1,
pk2,
pk3 ))
{code}
Consider following three cases:
*1. OrderBy of ClientScanPlan*
for sql:
{code:java}
select v1 from (select v1,v2,pk3 from test t where pk1 = '6' order by
t.v2,t.pk3,t.v1 limit 10) a order by v2,pk3
{code}
Obviously, the outer {{OrderBy}} "order by v2,pk3" should be compiled out
because it matchs the inner query {{OrderBy}} "order by t.v2,t.pk3,t.v1" , but
unfortunately it is not compiled out.
*2. GroupBy of ClientAggregatePlan*
for sql :
{code:java}
select v1 from (select v1,pk2,pk1 from test t where pk1 = '6' order by
t.pk2,t.v1,t.pk1 limit 10) a group by pk2,v1
{code}
Obviously, the outer {{GroupBy}} "group by pk2,v1" should be orderPreserving
because it matchs the inner query {{OrderBy}} "order by t.pk2,t.v1,t.pk1" , but
unfortunately the {{isOrderPreserving()}} of outer {{GroupBy}} return false.
*3. OrderBy of SortMergeJoinPlan(from PHOENIX-4618)*
for sql:
{code:java}
SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b{code}
The result of the sort-merge-join is sorted on (T1.a, T1.b) and (T2.a, T2.b) at
the same time.
Thus, both 1)
{code:java}
SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T1.a,
T1.b{code}
and 2)
{code:java}
SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T2.a,
T2.b{code}
should avoid doing an extra order-by after the sort-merge-join operation.
All the above three cases are caused by the same problem that the
{{OrderPreservingTracker}} relies solely on row keys for inferring alignment
between the target {{Expression}} and the source sortedness.
For following {{QueryCompiler.compileSingleQuery}}, because the inner query of
above cases has {{OrderBy}}, so in line 534, local variable {{isInRowKeyOrder}}
is false:
{code:java}
518 protected QueryPlan compileSingleQuery(StatementContext context,
SelectStatement select, List<Object> binds, boolean asSubquery, boolean
allowPageFilter) throws SQLException{
519 SelectStatement innerSelect = select.getInnerSelectStatement();
520 if (innerSelect == null) {
521 return compileSingleFlatQuery(context, select, binds,
asSubquery, allowPageFilter, null, null, true);
522 }
523
524 QueryPlan innerPlan = compileSubquery(innerSelect, false);
525 TupleProjector tupleProjector = new
TupleProjector(innerPlan.getProjector());
526 innerPlan = new TupleProjectionPlan(innerPlan, tupleProjector,
null);
527
528 // Replace the original resolver and table with those having
compiled type info.
529 TableRef tableRef = context.getResolver().getTables().get(0);
530 ColumnResolver resolver =
FromCompiler.getResolverForCompiledDerivedTable(statement.getConnection(),
tableRef, innerPlan.getProjector());
531 context.setResolver(resolver);
532 tableRef = resolver.getTables().get(0);
533 context.setCurrentTable(tableRef);
534 boolean isInRowKeyOrder = innerPlan.getGroupBy() ==
GroupBy.EMPTY_GROUP_BY && innerPlan.getOrderBy() == OrderBy.EMPTY_ORDER_BY;
535
536 return compileSingleFlatQuery(context, select, binds, asSubquery,
allowPageFilter, innerPlan, tupleProjector, isInRowKeyOrder);
537 }
{code}
Because the {{OrderPreservingTracker}} relies solely on row keys, so in
following line 170 of {{OrderByCompiler.compile}} , when {{isInRowKeyOrder}} is
false, the {{OrderPreservingTracker.isOrderPreserving()}} checking is skipped:
{code:java}
169 // If we're ordering by the order returned by the scan, we don't need
an order by
170 if (isInRowKeyOrder && tracker.isOrderPreserving()) {
171 if (tracker.isReverse()) {
172 // Don't use reverse scan if:
173 // 1) we're using a skip scan, as our skip scan doesn't
support this yet.
174 // 2) we have the FORWARD_SCAN hint set to choose to keep
loading of column
175 // families on demand versus doing a reverse scan
176 // REV_ROW_KEY_ORDER_BY scan would not take effect for a
projected table, so don't return it for such table types.
177 if
(context.getConnection().getQueryServices().getProps().getBoolean(QueryServices.USE_REVERSE_SCAN_ATTRIB,
QueryServicesOptions.DEFAULT_USE_REVERSE_SCAN)
178 && !context.getScanRanges().useSkipScanFilter()
179 && context.getCurrentTable().getTable().getType() !=
PTableType.PROJECTED
180 && context.getCurrentTable().getTable().getType() !=
PTableType.SUBQUERY
181 && !statement.getHint().hasHint(Hint.FORWARD_SCAN)) {
182 return OrderBy.REV_ROW_KEY_ORDER_BY;
183 }
184 } else {
185 return OrderBy.FWD_ROW_KEY_ORDER_BY;
186 }
187 }
{code}
What we can do to fix this problem are :
- refactor the {{OrderPreservingTracker}} to make it inferring alignment
between the target {{Expression}} and the source sortedness based on
{{OrderByExpression}} , not the row keys, so we can remove the variable
{{isInRowKeyOrder}}.
- add a new method {{getActualOutputOrderBys}} to the {{QueryPlan}} to get the
output OrderBys of the query result , eg. for the above third
{{SortMergeJoinPlan}} example ,{{getActualOutputOrderBys}} returns (T1.a, T1.b)
and (T2.a, T2.b) .
With this Jira ,
was:
Given a table
{code:java}
create table test (
pk1 varchar not null ,
pk2 varchar not null,
pk3 varchar not null,
v1 varchar,
v2 varchar,
CONSTRAINT TEST_PK PRIMARY KEY (
pk1,
pk2,
pk3 ))
{code}
Consider following three cases:
*1. OrderBy of ClientScanPlan*
for sql:
{code:java}
select v1 from (select v1,v2,pk3 from test t where pk1 = '6' order by
t.v2,t.pk3,t.v1 limit 10) a order by v2,pk3
{code}
Obviously, the outer {{OrderBy}} "order by v2,pk3" should be compiled out
because it matchs the inner query {{OrderBy}} "order by t.v2,t.pk3,t.v1" , but
unfortunately it is not compiled out.
*2. GroupBy of ClientAggregatePlan*
for sql :
{code:java}
select v1 from (select v1,pk2,pk1 from test t where pk1 = '6' order by
t.pk2,t.v1,t.pk1 limit 10) a group by pk2,v1
{code}
Obviously, the outer {{GroupBy}} "group by pk2,v1" should be orderPreserving
because it matchs the inner query {{OrderBy}} "order by t.pk2,t.v1,t.pk1" , but
unfortunately the {{isOrderPreserving()}} of outer {{GroupBy}} return false.
*3. OrderBy of SortMergeJoinPlan(from PHOENIX-4618)*
for sql:
{code:java}
SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b{code}
The result of the sort-merge-join is sorted on (T1.a, T1.b) and (T2.a, T2.b) at
the same time.
Thus, both 1)
{code:java}
SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T1.a,
T1.b{code}
and 2)
{code:java}
SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T2.a,
T2.b{code}
should avoid doing an extra order-by after the sort-merge-join operation.
All the above three cases are caused by the same problem that the
{{OrderPreservingTracker}} relies solely on row keys for inferring alignment
between the target {{OrderByExpression}} and the source sortedness.
For following {{QueryCompiler.compileSingleQuery}}, because the inner query of
above cases has {{OrderBy}}, so in line 534, local variable {{isInRowKeyOrder}}
is false:
{code:java}
518 protected QueryPlan compileSingleQuery(StatementContext context,
SelectStatement select, List<Object> binds, boolean asSubquery, boolean
allowPageFilter) throws SQLException{
519 SelectStatement innerSelect = select.getInnerSelectStatement();
520 if (innerSelect == null) {
521 return compileSingleFlatQuery(context, select, binds,
asSubquery, allowPageFilter, null, null, true);
522 }
523
524 QueryPlan innerPlan = compileSubquery(innerSelect, false);
525 TupleProjector tupleProjector = new
TupleProjector(innerPlan.getProjector());
526 innerPlan = new TupleProjectionPlan(innerPlan, tupleProjector,
null);
527
528 // Replace the original resolver and table with those having
compiled type info.
529 TableRef tableRef = context.getResolver().getTables().get(0);
530 ColumnResolver resolver =
FromCompiler.getResolverForCompiledDerivedTable(statement.getConnection(),
tableRef, innerPlan.getProjector());
531 context.setResolver(resolver);
532 tableRef = resolver.getTables().get(0);
533 context.setCurrentTable(tableRef);
534 boolean isInRowKeyOrder = innerPlan.getGroupBy() ==
GroupBy.EMPTY_GROUP_BY && innerPlan.getOrderBy() == OrderBy.EMPTY_ORDER_BY;
535
536 return compileSingleFlatQuery(context, select, binds, asSubquery,
allowPageFilter, innerPlan, tupleProjector, isInRowKeyOrder);
537 }
{code}
Because the {{OrderPreservingTracker}} relies solely on row keys, so in
following line 170 of {{OrderByCompiler.compile}} , when {{isInRowKeyOrder}} is
false, the {{OrderPreservingTracker.isOrderPreserving()}} checking is skipped:
{code:java}
169 // If we're ordering by the order returned by the scan, we don't need
an order by
170 if (isInRowKeyOrder && tracker.isOrderPreserving()) {
171 if (tracker.isReverse()) {
172 // Don't use reverse scan if:
173 // 1) we're using a skip scan, as our skip scan doesn't
support this yet.
174 // 2) we have the FORWARD_SCAN hint set to choose to keep
loading of column
175 // families on demand versus doing a reverse scan
176 // REV_ROW_KEY_ORDER_BY scan would not take effect for a
projected table, so don't return it for such table types.
177 if
(context.getConnection().getQueryServices().getProps().getBoolean(QueryServices.USE_REVERSE_SCAN_ATTRIB,
QueryServicesOptions.DEFAULT_USE_REVERSE_SCAN)
178 && !context.getScanRanges().useSkipScanFilter()
179 && context.getCurrentTable().getTable().getType() !=
PTableType.PROJECTED
180 && context.getCurrentTable().getTable().getType() !=
PTableType.SUBQUERY
181 && !statement.getHint().hasHint(Hint.FORWARD_SCAN)) {
182 return OrderBy.REV_ROW_KEY_ORDER_BY;
183 }
184 } else {
185 return OrderBy.FWD_ROW_KEY_ORDER_BY;
186 }
187 }
{code}
> Improve OrderPreservingTracker to optimize OrderBy/GroupBy for ClientScanPlan
> and ClientAggregatePlan
> -----------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-5148
> URL: https://issues.apache.org/jira/browse/PHOENIX-5148
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.14.1
> Reporter: chenglei
> Assignee: chenglei
> Priority: Major
>
> Given a table
> {code:java}
> create table test (
> pk1 varchar not null ,
> pk2 varchar not null,
> pk3 varchar not null,
> v1 varchar,
> v2 varchar,
> CONSTRAINT TEST_PK PRIMARY KEY (
> pk1,
> pk2,
> pk3 ))
> {code}
> Consider following three cases:
> *1. OrderBy of ClientScanPlan*
> for sql:
> {code:java}
> select v1 from (select v1,v2,pk3 from test t where pk1 = '6' order by
> t.v2,t.pk3,t.v1 limit 10) a order by v2,pk3
> {code}
> Obviously, the outer {{OrderBy}} "order by v2,pk3" should be compiled out
> because it matchs the inner query {{OrderBy}} "order by t.v2,t.pk3,t.v1" ,
> but unfortunately it is not compiled out.
> *2. GroupBy of ClientAggregatePlan*
> for sql :
> {code:java}
> select v1 from (select v1,pk2,pk1 from test t where pk1 = '6' order by
> t.pk2,t.v1,t.pk1 limit 10) a group by pk2,v1
> {code}
> Obviously, the outer {{GroupBy}} "group by pk2,v1" should be orderPreserving
> because it matchs the inner query {{OrderBy}} "order by t.pk2,t.v1,t.pk1" ,
> but unfortunately the {{isOrderPreserving()}} of outer {{GroupBy}} return
> false.
> *3. OrderBy of SortMergeJoinPlan(from PHOENIX-4618)*
> for sql:
> {code:java}
> SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b{code}
> The result of the sort-merge-join is sorted on (T1.a, T1.b) and (T2.a, T2.b)
> at the same time.
> Thus, both 1)
> {code:java}
> SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T1.a,
> T1.b{code}
> and 2)
> {code:java}
> SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T2.a,
> T2.b{code}
> should avoid doing an extra order-by after the sort-merge-join operation.
>
> All the above three cases are caused by the same problem that the
> {{OrderPreservingTracker}} relies solely on row keys for inferring alignment
> between the target {{Expression}} and the source sortedness.
> For following {{QueryCompiler.compileSingleQuery}}, because the inner query
> of above cases has {{OrderBy}}, so in line 534, local variable
> {{isInRowKeyOrder}} is false:
> {code:java}
> 518 protected QueryPlan compileSingleQuery(StatementContext context,
> SelectStatement select, List<Object> binds, boolean asSubquery, boolean
> allowPageFilter) throws SQLException{
> 519 SelectStatement innerSelect = select.getInnerSelectStatement();
> 520 if (innerSelect == null) {
> 521 return compileSingleFlatQuery(context, select, binds,
> asSubquery, allowPageFilter, null, null, true);
> 522 }
> 523
> 524 QueryPlan innerPlan = compileSubquery(innerSelect, false);
> 525 TupleProjector tupleProjector = new
> TupleProjector(innerPlan.getProjector());
> 526 innerPlan = new TupleProjectionPlan(innerPlan, tupleProjector,
> null);
> 527
> 528 // Replace the original resolver and table with those having
> compiled type info.
> 529 TableRef tableRef = context.getResolver().getTables().get(0);
> 530 ColumnResolver resolver =
> FromCompiler.getResolverForCompiledDerivedTable(statement.getConnection(),
> tableRef, innerPlan.getProjector());
> 531 context.setResolver(resolver);
> 532 tableRef = resolver.getTables().get(0);
> 533 context.setCurrentTable(tableRef);
> 534 boolean isInRowKeyOrder = innerPlan.getGroupBy() ==
> GroupBy.EMPTY_GROUP_BY && innerPlan.getOrderBy() == OrderBy.EMPTY_ORDER_BY;
> 535
> 536 return compileSingleFlatQuery(context, select, binds,
> asSubquery, allowPageFilter, innerPlan, tupleProjector, isInRowKeyOrder);
> 537 }
> {code}
> Because the {{OrderPreservingTracker}} relies solely on row keys, so in
> following line 170 of {{OrderByCompiler.compile}} , when {{isInRowKeyOrder}}
> is false, the {{OrderPreservingTracker.isOrderPreserving()}} checking is
> skipped:
> {code:java}
> 169 // If we're ordering by the order returned by the scan, we don't need
> an order by
> 170 if (isInRowKeyOrder && tracker.isOrderPreserving()) {
> 171 if (tracker.isReverse()) {
> 172 // Don't use reverse scan if:
> 173 // 1) we're using a skip scan, as our skip scan doesn't
> support this yet.
> 174 // 2) we have the FORWARD_SCAN hint set to choose to
> keep loading of column
> 175 // families on demand versus doing a reverse scan
> 176 // REV_ROW_KEY_ORDER_BY scan would not take effect for a
> projected table, so don't return it for such table types.
> 177 if
> (context.getConnection().getQueryServices().getProps().getBoolean(QueryServices.USE_REVERSE_SCAN_ATTRIB,
> QueryServicesOptions.DEFAULT_USE_REVERSE_SCAN)
> 178 && !context.getScanRanges().useSkipScanFilter()
> 179 && context.getCurrentTable().getTable().getType()
> != PTableType.PROJECTED
> 180 && context.getCurrentTable().getTable().getType()
> != PTableType.SUBQUERY
> 181 &&
> !statement.getHint().hasHint(Hint.FORWARD_SCAN)) {
> 182 return OrderBy.REV_ROW_KEY_ORDER_BY;
> 183 }
> 184 } else {
> 185 return OrderBy.FWD_ROW_KEY_ORDER_BY;
> 186 }
> 187 }
> {code}
>
> What we can do to fix this problem are :
> - refactor the {{OrderPreservingTracker}} to make it inferring alignment
> between the target {{Expression}} and the source sortedness based on
> {{OrderByExpression}} , not the row keys, so we can remove the variable
> {{isInRowKeyOrder}}.
> - add a new method {{getActualOutputOrderBys}} to the {{QueryPlan}} to get
> the output OrderBys of the query result , eg. for the above third
> {{SortMergeJoinPlan}} example ,{{getActualOutputOrderBys}} returns (T1.a,
> T1.b) and (T2.a, T2.b) .
> With this Jira ,
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)