Lino Rosa created CALCITE-7138:
----------------------------------
Summary: RelToSqlConverterTest drops subquery boundary and
inflates COUNT results
Key: CALCITE-7138
URL: https://issues.apache.org/jira/browse/CALCITE-7138
Project: Calcite
Issue Type: Bug
Reporter: Lino Rosa
Take this input query:
{code:java}
SELECT
COUNT(CASE WHEN has_salesman = 1 AND has_high_salary = 1 THEN 1 END) AS
depts_with_salesman_and_high_salary,
COUNT(CASE WHEN has_salesman = 1 THEN 1 END) AS depts_with_salesman,
COUNT(CASE WHEN has_high_salary = 1 THEN 1 END) AS depts_with_high_salary
FROM (
SELECT
MAX(CASE WHEN "position_title" = 'SALESMAN' THEN 1 ELSE 0 END) AS
has_salesman,
MAX(CASE WHEN "salary" > 1250 THEN 1 ELSE 0 END) AS has_high_salary
FROM "foodmart"."employee"
GROUP BY "department_id"
) AS dept_flags;{code}
After going through `SqlToRelConverter`, the resulting SQL will have the GROUP
BY from the subquery inlined into a single query:
{code:java}
SELECT
COUNT(
CASE
WHEN MAX(CASE WHEN "position_title" = 'SALESMAN' THEN 1 ELSE 0 END)
= 1
AND MAX(CASE WHEN CAST("salary" AS DECIMAL(14, 4)) > 1250.0000
THEN 1 ELSE 0 END) = 1
THEN 1
ELSE NULL
END
) AS "DEPTS_WITH_SALESMAN_AND_HIGH_SALARY",
COUNT(
CASE
WHEN MAX(CASE WHEN "position_title" = 'SALESMAN' THEN 1 ELSE 0 END)
= 1
THEN 1
ELSE NULL
END
) AS "DEPTS_WITH_SALESMAN",
COUNT(
CASE
WHEN MAX(CASE WHEN CAST("salary" AS DECIMAL(14, 4)) > 1250.0000
THEN 1 ELSE 0 END) = 1
THEN 1
ELSE NULL
END
) AS "DEPTS_WITH_HIGH_SALARY"
FROM "foodmart"."employee"
GROUP BY "department_id";
{code}
This is not equivalent to the first query.
The first query counts departments, while the second—due to the GROUP BY
department_id and how COUNT(...) is used—counts rows (employees) inside each
qualifying department.
For example, for this dataset:
||department_id||position_title||salary||
|10|SALESMAN|1000|
|10|CLERK|3000|
|20|MANAGER|2000|
|30|SALESMAN|1100|
The first query results in:
||depts_with_salesman_and_high_salary||depts_with_salesman||depts_with_high_salary||
|1|2|2|
While the second query returns {*}one row per department{*}, and the numbers
are row counts, not department counts:
||depts_with_salesman_and_high_salary||depts_with_salesman||depts_with_high_salary||
|2|2|2|
|0|0|1|
|0|1|0|
--
This message was sent by Atlassian Jira
(v8.20.10#820010)