zhong.zhu created KYLIN-5742:
--------------------------------
Summary: When the GROUP BY clause repeats the columns used in GROUPING SETS,
the result of the Grouping Sets query is inconsistent with SparkSQL
Key: KYLIN-5742
URL: https://issues.apache.org/jira/browse/KYLIN-5742
Project: Kylin
Issue Type: Bug
Affects Versions: 5.0-beta
Reporter: zhong.zhu
Fix For: 5.0.0
Attachments: image-2023-12-11-14-54-38-652.png,
image-2023-12-11-14-55-46-222.png, image-2023-12-11-14-57-32-037.png,
image-2023-12-11-14-57-56-771.png
{code:sql}
-- sql1
select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
FROM SSB.LINEORDER as LINEORDER
INNER JOIN SSB.CUSTOMER as CUSTOMER
ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
where C_NATION = 'CHINA' and C_CITY = 'CHINA 0'
group by
GROUPING SETS ((),(C_NAME,C_CITY),(C_NATION,C_REGION))
order by C_NAME;
-- sql2
select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
FROM SSB.LINEORDER as LINEORDER
INNER JOIN SSB.CUSTOMER as CUSTOMER
ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
where C_NATION = 'CHINA' and C_CITY = 'CHINA 0'
group by
C_NAME,C_CITY,C_NATION,C_REGION,
GROUPING SETS ((),(C_NAME,C_CITY),(C_NATION,C_REGION))
order by C_NAME;
-- sql3
select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
FROM SSB.LINEORDER as LINEORDER
INNER JOIN SSB.CUSTOMER as CUSTOMER
ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
where C_NATION = 'CHINA' and C_CITY = 'CHINA 0'
group by
C_NAME,C_CITY,C_NATION,C_REGION
GROUPING SETS ((),(C_NAME,C_CITY),(C_NATION,C_REGION))
order by C_NAME
{code}
In spark-sql, the query results of sql1 and sql3 are consistent, as follows:
!image-2023-12-11-14-54-38-652.png!
In spark-sql, the query result of sql2 is as follows:
!image-2023-12-11-14-55-46-222.png!
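Under the SQL-standard semantics that spark-sql appears to follow here, the plain GROUP BY columns of sql2 are combined with each grouping set, so every grouping set effectively becomes (C_NAME,C_CITY,C_NATION,C_REGION) and each group is returned three times. A sketch of the presumed expansion (my own illustration, not verified against the engines):
{code:sql}
-- Sketch only: assuming the standard cross-product expansion, sql2 should be
-- equivalent to this explicit GROUPING SETS form, where the plain GROUP BY
-- columns are merged into every grouping set (three identical sets, hence
-- duplicated groups in the result).
select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
FROM SSB.LINEORDER as LINEORDER
INNER JOIN SSB.CUSTOMER as CUSTOMER
ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
where C_NATION = 'CHINA' and C_CITY = 'CHINA 0'
group by
GROUPING SETS (
  (C_NAME,C_CITY,C_NATION,C_REGION),
  (C_NAME,C_CITY,C_NATION,C_REGION),
  (C_NAME,C_CITY,C_NATION,C_REGION))
order by C_NAME;
{code}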
In KYLIN, the query result of sql1 is as follows, which is consistent with the
spark-sql result of sql1:
!image-2023-12-11-14-57-32-037.png!
In KYLIN, the query result of sql2 is as follows, which is inconsistent with
the spark-sql result of sql2:
!image-2023-12-11-14-57-56-771.png!
In KYLIN, the syntax of sql3 is not supported.
Hive does not support a comma before GROUPING SETS, that is, sql2 is not
supported in Hive; in Hive, the query results of sql1 and sql3 are consistent
with spark-sql.
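As an engine-independent baseline for comparing Kylin, spark-sql and Hive, the standard semantics of sql2 can also be written without GROUPING SETS at all, as a UNION ALL of one plain GROUP BY query per (expanded) grouping set. This is a sketch under the same expansion assumption as above; if that expansion holds, each (C_NAME, C_CITY, C_NATION, C_REGION) group should appear three times:
{code:sql}
-- Sketch: sql2 under the standard expansion, expressed as a UNION ALL of
-- identical GROUP BY queries (one per grouping set).
select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
FROM SSB.LINEORDER as LINEORDER
INNER JOIN SSB.CUSTOMER as CUSTOMER
ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
where C_NATION = 'CHINA' and C_CITY = 'CHINA 0'
group by C_NAME,C_CITY,C_NATION,C_REGION
union all
select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
FROM SSB.LINEORDER as LINEORDER
INNER JOIN SSB.CUSTOMER as CUSTOMER
ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
where C_NATION = 'CHINA' and C_CITY = 'CHINA 0'
group by C_NAME,C_CITY,C_NATION,C_REGION
union all
select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
FROM SSB.LINEORDER as LINEORDER
INNER JOIN SSB.CUSTOMER as CUSTOMER
ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
where C_NATION = 'CHINA' and C_CITY = 'CHINA 0'
group by C_NAME,C_CITY,C_NATION,C_REGION
order by C_NAME;
{code}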