[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074232#comment-18074232
]
Zhen Chen edited comment on CALCITE-7463 at 4/17/26 12:19 PM:
--------------------------------------------------------------
Sorry for the delayed reply. Please see the case below. You can also try it at this
[link|https://onecompiler.com/postgresql/44kmk22jr].
CREATE TABLE t (
id INTEGER,
name VARCHAR(10)
);
INSERT INTO t VALUES (1, 'a');
INSERT INTO t VALUES (1, 'b');
INSERT INTO t VALUES (1, 'c');
INSERT INTO t VALUES (2, 'd');
INSERT INTO t VALUES (2, 'e');
INSERT INTO t VALUES (3, 'f');
INSERT INTO t VALUES (3, 'g');
INSERT INTO t VALUES (3, 'h');
INSERT INTO t VALUES (3, 'i');
(SELECT id FROM t limit 2)
UNION
(SELECT id FROM t limit 2);
-- id
-- ----
-- 1
-- (1 row)
select distinct id from t limit 2;
-- id
-- ----
-- 3
-- 2
-- (2 rows)
1. UNION with LIMIT in subqueries:
Each subquery's {{LIMIT}} executes first → may get duplicate rows
{{UNION}} removes duplicates afterward → final result may have fewer rows
2. DISTINCT with LIMIT:
{{DISTINCT}} executes first → gets unique values
{{LIMIT}} applies to distinct results → stable row count
The order in which the operations execute leads to different outcomes. Note that
unless the rows of table t are returned in a strictly defined order, the result
of the first SQL query is unstable and may contain 1 to 3 rows, while the second
SQL query always returns 2 rows.
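The row counts above can be checked by brute force. The following is a minimal sketch in Python rather than SQL, assuming that a LIMIT 2 without ORDER BY may return any two rows and that the two UNION branches pick their rows independently; the function names are made up for illustration.

```python
from itertools import combinations

# The id column of table t from the example above.
ids = [1, 1, 1, 2, 2, 3, 3, 3, 3]

# Assumption: without ORDER BY, LIMIT 2 may return any 2 rows.
# Every possible pick is modelled as a pair of row indexes.
picks = list(combinations(range(len(ids)), 2))


def union_of_limits():
    """Possible row counts of (SELECT id ... LIMIT 2) UNION (SELECT id ... LIMIT 2)."""
    sizes = set()
    for left in picks:
        for right in picks:
            # UNION removes duplicates across both limited inputs.
            rows = {ids[i] for i in left} | {ids[i] for i in right}
            sizes.add(len(rows))
    return sizes


def distinct_then_limit():
    """Possible row counts of SELECT DISTINCT id ... LIMIT 2."""
    distinct = sorted(set(ids))
    # DISTINCT runs first, then LIMIT 2 keeps any 2 of the distinct values.
    return {len(choice) for choice in combinations(distinct, 2)}


print(sorted(union_of_limits()))      # [1, 2, 3]
print(sorted(distinct_then_limit()))  # [2]
```

Under these assumptions the UNION form can yield 1, 2, or 3 rows, while the DISTINCT form always yields 2.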
If we accept this nondeterminism, that is, if we accept that "random" +
"random" = "random" holds, then consider the following example:
select id from t limit 10 * random();
select id from t limit 10 * random() * random();
Are these two SQL statements equivalent?
If the equation above holds, then I believe these two SQL statements are equivalent.
Is my conclusion wrong?
If "random" + "random" = "random" does not hold, can the
rewriting of UNION be considered wrong?
Is it reasonable to stack "uncertainty" on top of "uncertainty"?
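One way to look at the question is to compare the distributions of the two limit expressions. The sketch below is a Python simulation, not SQL, and assumes random() is uniform on [0, 1); the helper names are hypothetical.

```python
import random


def mean_limit(draw, trials=200_000, seed=0):
    """Average value of a random limit expression over many trials."""
    rng = random.Random(seed)
    return sum(draw(rng) for _ in range(trials)) / trials


def one_random(rng):
    # Models: limit 10 * random()
    return 10 * rng.random()


def two_randoms(rng):
    # Models: limit 10 * random() * random()
    return 10 * rng.random() * rng.random()


print(mean_limit(one_random))   # close to 5
print(mean_limit(two_randoms))  # close to 2.5
```

Both expressions are nondeterministic, but their expected values differ (5 versus 2.5 for independent uniform draws), so the two results do not follow the same distribution.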
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> -------------------------------------------------------
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.41.0
> Reporter: Zhen Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{LIMIT}}.
> Specifically, the rule incorrectly collapses:
> {code:sql}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:sql}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is *not semantically equivalent*.
> *Reproduction*
> SQL
> {code:sql}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> *Plan Before*
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
>     LogicalProject(MGR=[$3], COMM=[$6])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
>     LogicalProject(MGR=[$3], COMM=[$6])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
>     LogicalProject(MGR=[$3], COMM=[$6])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of the UNION contains
> a LogicalSort (that is, ORDER BY, LIMIT, or OFFSET).
>
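The expected behavior amounts to a guard that walks each UNION input and refuses the rewrite if a sort node is found. The sketch below models this on a toy plan tree in Python; the Node class and both functions are hypothetical stand-ins and are not Calcite's RelNode API or the actual fix.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a plan node; not Calcite's RelNode."""
    kind: str
    inputs: List["Node"] = field(default_factory=list)


def contains_sort(node: Node) -> bool:
    """True if this node or any descendant is a sort (ORDER BY, LIMIT, or OFFSET)."""
    return node.kind == "LogicalSort" or any(contains_sort(i) for i in node.inputs)


def may_rewrite_union(union: Node) -> bool:
    """Guard: allow the rewrite only if no UNION input contains a sort."""
    return not any(contains_sort(i) for i in union.inputs)


def branch(with_limit: bool) -> Node:
    """Build Project(TableScan), optionally wrapped in a sort as in the plan above."""
    project = Node("LogicalProject", [Node("LogicalTableScan")])
    return Node("LogicalSort", [project]) if with_limit else project


print(may_rewrite_union(Node("LogicalUnion", [branch(True), branch(True)])))    # False
print(may_rewrite_union(Node("LogicalUnion", [branch(False), branch(False)])))  # True
```

With this guard, the plan shown under "Plan Before" would be left unchanged, because both UNION inputs contain a LogicalSort.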