[jira] [Updated] (PHOENIX-7397) Optimize ClientAggregatePlan/ClientScanPlan when inner query plan is UnionPlan

chenglei (Jira) Thu, 29 Aug 2024 09:09:42 -0700


     [ 
https://issues.apache.org/jira/browse/PHOENIX-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


chenglei updated PHOENIX-7397:
------------------------------
    Description: 
A common use case of union all is to union multiple homogeneous tables and then 
further aggregate or sort , just as following sql statements：
{code:java}
select A, B,C, sum(...),sum(....)
from (
select a1 as A, b1 as B,c1 as C, ....
from T1
union all
select a2,b2,c2,.....
from T2
union all
select a3,b3,c3, ....
)
group by A,B,C
having....
{code}


{code:java}
select A, B,C, sum(...),sum(...)
from (
select a1 as A, b1 as B,c1 as C, sum(...), sum(...),....
from T1
group by a1,b1,c1
union all
select a2,b2,c2,.sum(...),sum(...)....
from T2
group by a2,b2,c2
union all
select a3,b3,c3, sum(...),sum(...)....
group by a3,b3,c3
)
group by A,B,C
having....
{code}

{code:java}
select A, B,C...
from (
select a1 as A, b1 as B,c1 as C, ....
from T1
union all
select a2,b2,c2,.....
from T2
union all
select a3,b3,c3, ....
)
where ......
order by A,B,C
{code}
For situations similar to these three SQL queries, if each subquery of union 
all is actually ordered by A, B, and C, then {{UnionPlan}} only needs to 
perform a simple merge on the subquery results to ensure the overall order of 
the union all result. However, for the current implementation of Phoenix, the 
outer {{ClientAggregatePlan}} or {{ClientScanPlan}} does not know this 
information within {{UnionPlan}} and will still perform heavyweight processing 
based on the assumption that the overall result of union all is completely 
unordered.


  was:
A common use case of union all is to union multiple homogeneous tables and then 
further aggregate or sort , just as following sql statements：
{code:java}
select A, B,C, sum(...),sum(....)
from (
select a1 as A, b1 as B,c1 as C, ....
from T1
union all
select a2,b2,c2,.....
from T2
union all
select a3,b3,c3, ....
)
group by A,B,C
having....
{code}


{code:java}
select A, B,C, sum(...),sum(...)
from (
select a1 as A, b1 as B,c1 as C, sum(...), sum(...),....
from T1
group by a1,b1,c1
union all
select a2,b2,c2,.sum(...),sum(...)....
from T2
group by a2,b2,c2
union all
select a3,b3,c3, sum(...),sum(...)....
group by a3,b3,c3
)
group by A,B,C
having....
{code}

{code:java}
select A, B,C...
from (
select a1 as A, b1 as B,c1 as C, ....
from T1
union all
select a2,b2,c2,.....
from T2
union all
select a3,b3,c3, ....
)
where ......
order by A,B,C
{code}
For situations similar to these three SQL queries, if each subquery of union 
all is actually ordered by A, B, and C, then UnionPlan only needs to perform a 
simple merge on the subquery results to ensure the overall order of the union 
all result. However, for the current implementation of Phoenix, the outer 
ClientAggregatePlan or ClientScanPlan does not know this information within 
UnionPlan and will still perform heavyweight processing based on the assumption 
that the overall result of union all is unordered.



> Optimize ClientAggregatePlan/ClientScanPlan when inner query plan is UnionPlan
> ------------------------------------------------------------------------------
>
>                 Key: PHOENIX-7397
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7397
>             Project: Phoenix
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 5.2.0, 5.1.3
>            Reporter: chenglei
>            Priority: Major
>
> A common use case of union all is to union multiple homogeneous tables and 
> then further aggregate or sort , just as following sql statements：
> {code:java}
> select A, B,C, sum(...),sum(....)
> from (
> select a1 as A, b1 as B,c1 as C, ....
> from T1
> union all
> select a2,b2,c2,.....
> from T2
> union all
> select a3,b3,c3, ....
> )
> group by A,B,C
> having....
> {code}
> {code:java}
> select A, B,C, sum(...),sum(...)
> from (
> select a1 as A, b1 as B,c1 as C, sum(...), sum(...),....
> from T1
> group by a1,b1,c1
> union all
> select a2,b2,c2,.sum(...),sum(...)....
> from T2
> group by a2,b2,c2
> union all
> select a3,b3,c3, sum(...),sum(...)....
> group by a3,b3,c3
> )
> group by A,B,C
> having....
> {code}
> {code:java}
> select A, B,C...
> from (
> select a1 as A, b1 as B,c1 as C, ....
> from T1
> union all
> select a2,b2,c2,.....
> from T2
> union all
> select a3,b3,c3, ....
> )
> where ......
> order by A,B,C
> {code}
> For situations similar to these three SQL queries, if each subquery of union 
> all is actually ordered by A, B, and C, then {{UnionPlan}} only needs to 
> perform a simple merge on the subquery results to ensure the overall order of 
> the union all result. However, for the current implementation of Phoenix, the 
> outer {{ClientAggregatePlan}} or {{ClientScanPlan}} does not know this 
> information within {{UnionPlan}} and will still perform heavyweight 
> processing based on the assumption that the overall result of union all is 
> completely unordered.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (PHOENIX-7397) Optimize ClientAggregatePlan/ClientScanPlan when inner query plan is UnionPlan

Reply via email to