[jira] [Commented] (KYLIN-6044) Subqueries containing JOIN may lead to incorrect query results

Guoliang Sun (Jira) Wed, 19 Feb 2025 00:50:21 -0800


    [ 
https://issues.apache.org/jira/browse/KYLIN-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17928350#comment-17928350
 ]


Guoliang Sun commented on KYLIN-6044:
-------------------------------------

h3. Dev Design

As mentioned in the RootCause, using detailed indexes is a negative 
optimization. During debugging, it was found that even when detailed indexes 
were used to answer the query, the final result was still incorrect. Analyzing 
the SQL:  

1. The innermost subquery calculates the sum of the `D_YEARMONTHNUM` column and 
the `D_YEARMONTHNUM` column in the `SSB.DATES` table, and appends a column `DT` 
with a constant value of 1.  
2. The results of the first two subqueries are then LEFT JOINed on the 
condition `a.DT = b.DT`. Since the `DT` column is always 1 in both subqueries, 
this JOIN will return one row containing the values of `acct_bal_CURR` and 
`acct_id_CURR`.  
3. The outer query performs a `SELECT 1` on the result of the previous step, 
replacing each row with the constant value 1.  
4. Since the result of the LEFT JOIN in the inner query contains only one row 
(assuming the `SSB.DATES` table is not empty), the final `SELECT count(1)` will 
also return 1, indicating only one row exists.  

Thus, the result of this SQL query should always be 1. However, when hitting 
the detailed index, the result returned is the direct row count of the detailed 
index, which does not match the expected result. This discrepancy occurs 
because the query logic loses two layers of Aggregation operators during 
computation. After removing the `Sum`, the corresponding upper-level `Count` is 
also removed. Subsequently, the operation that should have performed a table 
scan and JOIN instead returns precomputed table data, causing the `Count` value 
to become the number of rows after joining the two precomputed tables. This 
data is incorrect.  

Even if the user sees that the query is answered using detailed indexes, the 
result may still be wrong. Notably, in the new version, the query logic plan 
`OlapProjectRel($f0=[1], ctx=[])` is pushed to the lowest-level `ProjectRel` 
due to a series of optimization processes.  

This is caused by the default abstract relational optimization rules registered 
in Calcite. The difference between the new and old versions is that the new 
version adds two optimization rules: *PROJECT_AGGREGATE_MERGE* and 
{*}AGGREGATE_MERGE{*}. While *AGGREGATE_MERGE* does not affect the query logic 
plan of the example SQL, *PROJECT_AGGREGATE_MERGE* removes unused aggregation 
operators, impacting Kylin's aggregation index matching logic.  

After removing the *PROJECT_AGGREGATE_MERGE* rule, the following query logic 
plan after the CBO phase retains the `DUMMY=[COUNT()]` node, allowing proper 
information collection and hitting the aggregation index.  
{code:java}
OlapToEnumerableConverter
  OlapLimitRel(ctx=[], fetch=[500])
    OlapAggregateRel(group-set=[[]], groups=[null], EXPR$0=[COUNT()], ctx=[])
      OlapJoinRel(condition=[=($0, $1)], joinType=[full], ctx=[])
        OlapProjectRel($f3=['1995-07-07':VARCHAR(10)], ctx=[])
          OlapAggregateRel(group-set=[[]], groups=[null], DUMMY=[COUNT()], 
ctx=[])
            OlapAggregateRel(group-set=[[0]], groups=[null], 
ACCT_BAL=[SUM($1)], ctx=[])
              OlapFilterRel(condition=[=($0, 1995-07-06)], ctx=[])
                OlapProjectRel(D_DATEKEY=[$0], D_YEAR=[$4], ctx=[])
                  OlapTableScan(table=[[SSB, DATES]], ctx=[], fields=[[0, 1, 2, 
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
        OlapProjectRel($f3=['1995-07-07':VARCHAR(10)], ctx=[])
          OlapAggregateRel(group-set=[[]], groups=[null], DUMMY=[COUNT()], 
ctx=[])
            OlapAggregateRel(group-set=[[0]], groups=[null], ACCT_ID=[SUM($1)], 
ctx=[])
              OlapFilterRel(condition=[=($0, 1995-07-06)], ctx=[])
                OlapProjectRel(D_DATEKEY=[$0], D_YEARMONTHNUM=[$5], ctx=[])
                  OlapTableScan(table=[[SSB, DATES]], ctx=[], fields=[[0, 1, 2, 
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]) {code}
The final design approach is to remove the *PROJECT_AGGREGATE_MERGE* rule in 
the code. Verification work covered CI and Step query test cases, all of which 
passed successfully.

> Subqueries containing JOIN may lead to incorrect query results
> --------------------------------------------------------------
>
>                 Key: KYLIN-6044
>                 URL: https://issues.apache.org/jira/browse/KYLIN-6044
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: 5.0.0
>            Reporter: Guoliang Sun
>            Priority: Major
>
> It was discovered that a specific SQL query hits inconsistent indexes in a 
> particular version compared to earlier versions. Further simplification of 
> the SQL revealed that this issue might lead to incorrect query results.  
>  
> {code:java}
> SELECT count(1)
> FROM (
>     SELECT 1
>     FROM (
>         SELECT sum(D_YEAR) as acct_bal_CURR
>             ,1 AS DT
>         FROM SSB.DATES
>         ) a
>     left JOIN (
>         SELECT sum(D_YEARMONTHNUM) acct_id_CURR
>             ,1 AS DT
>         FROM SSB.DATES 
>         ) b ON a.DT = b.DT
>     )
> LIMIT 500{code}
> The model contains two SUM metrics from the SQL, and the correct query result 
> should be 1.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (KYLIN-6044) Subqueries containing JOIN may lead to incorrect query results

Reply via email to