zhong.zhu created KYLIN-5742:
--------------------------------
Summary: When the GROUP BY clause repeats the columns used in GROUPING SETS,
the result of the Grouping Sets query is inconsistent with SparkSQL
Key: KYLIN-5742
URL: https://issues.apache.org/jira/browse/KYLIN-5742
Project: Kylin
Issue Type: Bug
Affects Versions: 5.0-beta
Reporter: zhong.zhu
Fix For: 5.0.0
Attachments: image-2023-12-11-14-54-38-652.png,
image-2023-12-11-14-55-46-222.png, image-2023-12-11-14-57-32-037.png,
image-2023-12-11-14-57-56-771.png
{code:sql}
-- sql1
select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
FROM SSB.LINEORDER as LINEORDER
INNER JOIN SSB.CUSTOMER as CUSTOMER
ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
where C_NATION = 'CHINA' and C_CITY = 'CHINA 0'
group by
GROUPING SETS ((),(C_NAME,C_CITY),(C_NATION,C_REGION))
order by C_NAME;
-- sql2
select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
FROM SSB.LINEORDER as LINEORDER
INNER JOIN SSB.CUSTOMER as CUSTOMER
ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
where C_NATION = 'CHINA' and C_CITY = 'CHINA 0'
group by
C_NAME,C_CITY,C_NATION,C_REGION,
GROUPING SETS ((),(C_NAME,C_CITY),(C_NATION,C_REGION))
order by C_NAME;
-- sql3
select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
FROM SSB.LINEORDER as LINEORDER
INNER JOIN SSB.CUSTOMER as CUSTOMER
ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
where C_NATION = 'CHINA' and C_CITY = 'CHINA 0'
group by
C_NAME,C_CITY,C_NATION,C_REGION
GROUPING SETS ((),(C_NAME,C_CITY),(C_NATION,C_REGION))
order by C_NAME
{code}
In spark-sql, the query results of sql1 and sql3 are consistent, as follows:
!image-2023-12-11-14-54-38-652.png!
In spark-sql, the query result of sql2 is as follows:
!image-2023-12-11-14-55-46-222.png!
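Under the SQL-standard semantics that spark-sql appears to follow here, the plain GROUP BY columns of sql2 are combined with each grouping set, so every grouping set effectively becomes (C_NAME,C_CITY,C_NATION,C_REGION) and each group is returned three times. A sketch of the presumed expansion (my own illustration, not verified against the engines):
{code:sql}
-- Sketch only: assuming the standard cross-product expansion, sql2 should be
-- equivalent to this explicit GROUPING SETS form, where the plain GROUP BY
-- columns are merged into every grouping set (three identical sets, hence
-- duplicated groups in the result).
select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
FROM SSB.LINEORDER as LINEORDER
INNER JOIN SSB.CUSTOMER as CUSTOMER
ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
where C_NATION = 'CHINA' and C_CITY = 'CHINA 0'
group by
GROUPING SETS (
  (C_NAME,C_CITY,C_NATION,C_REGION),
  (C_NAME,C_CITY,C_NATION,C_REGION),
  (C_NAME,C_CITY,C_NATION,C_REGION))
order by C_NAME;
{code}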
In KYLIN, the query result of sql1 is as follows, which is consistent with the
spark-sql result of sql1:
!image-2023-12-11-14-57-32-037.png!
In KYLIN, the query result of sql2 is as follows, which is inconsistent with
the spark-sql result of sql2:
!image-2023-12-11-14-57-56-771.png!
In KYLIN, the syntax of sql3 is not supported.
Hive does not support a comma before GROUPING SETS, that is, sql2 is not
supported in Hive; in Hive, the query results of sql1 and sql3 are consistent
with spark-sql.
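As an engine-independent baseline for comparing Kylin, spark-sql and Hive, the standard semantics of sql2 can also be written without GROUPING SETS at all, as a UNION ALL of one plain GROUP BY query per (expanded) grouping set. This is a sketch under the same expansion assumption as above; if that expansion holds, each (C_NAME, C_CITY, C_NATION, C_REGION) group should appear three times:
{code:sql}
-- Sketch: sql2 under the standard expansion, expressed as a UNION ALL of
-- identical GROUP BY queries (one per grouping set).
select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
FROM SSB.LINEORDER as LINEORDER
INNER JOIN SSB.CUSTOMER as CUSTOMER
ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
where C_NATION = 'CHINA' and C_CITY = 'CHINA 0'
group by C_NAME,C_CITY,C_NATION,C_REGION
union all
select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
FROM SSB.LINEORDER as LINEORDER
INNER JOIN SSB.CUSTOMER as CUSTOMER
ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
where C_NATION = 'CHINA' and C_CITY = 'CHINA 0'
group by C_NAME,C_CITY,C_NATION,C_REGION
union all
select C_NAME,C_CITY,C_NATION,C_REGION,count(*)
FROM SSB.LINEORDER as LINEORDER
INNER JOIN SSB.CUSTOMER as CUSTOMER
ON LINEORDER.LO_CUSTKEY = CUSTOMER.C_CUSTKEY
where C_NATION = 'CHINA' and C_CITY = 'CHINA 0'
group by C_NAME,C_CITY,C_NATION,C_REGION
order by C_NAME;
{code}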