[jira] [Comment Edited] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

Zhen Chen (Jira) Fri, 17 Apr 2026 05:20:06 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074232#comment-18074232
 ]


Zhen Chen edited comment on CALCITE-7463 at 4/17/26 12:19 PM:
--------------------------------------------------------------

Sorry for the delayed reply. Please see the case below. You can also run it at this 
[link|https://onecompiler.com/postgresql/44kmk22jr].
CREATE TABLE t (
id INTEGER,
name VARCHAR(10)
);

INSERT INTO t VALUES (1, 'a');
INSERT INTO t VALUES (1, 'b');
INSERT INTO t VALUES (1, 'c');
INSERT INTO t VALUES (2, 'd');
INSERT INTO t VALUES (2, 'e');
INSERT INTO t VALUES (3, 'f');
INSERT INTO t VALUES (3, 'g');
INSERT INTO t VALUES (3, 'h');
INSERT INTO t VALUES (3, 'i');

(SELECT id FROM t limit 2)
UNION
(SELECT id FROM t limit 2);
--  id 
-- ----
--   1
-- (1 row)

select distinct id from t limit 2;
--  id 
-- ----
--   3
--   2
-- (2 rows)
1. UNION with LIMIT in subqueries:

  Each subquery's {{LIMIT}} executes first → may get duplicate rows

  {{UNION}} removes duplicates afterward → final result may have fewer rows

2. DISTINCT with LIMIT:

  {{DISTINCT}} executes first → gets unique values

  {{LIMIT}} applies to distinct results → stable row count

The order in which the operations execute changes the outcome. Note that if 
the rows of table t are not returned in a fixed order, the result of the first 
query is unstable and may contain 1 to 3 rows, while the second query always 
returns exactly 2 rows.
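
The row counts above can be checked by brute force. The following is only a minimal sketch in Python, not anything from Calcite or PostgreSQL: it assumes that a LIMIT 2 without ORDER BY may return any 2 of the 9 rows of t, and that the two UNION branches pick their rows independently. The variable names are made up for illustration.

```python
from itertools import combinations

# The id column of table t from the example (the name column does not matter here).
ids = [1, 1, 1, 2, 2, 3, 3, 3, 3]

# Assumption: without ORDER BY, LIMIT 2 may return any 2 of the 9 rows.
# Rows are identified by position so that duplicate ids remain separate rows.
picks = list(combinations(range(len(ids)), 2))

# Query 1: (SELECT id FROM t LIMIT 2) UNION (SELECT id FROM t LIMIT 2)
# Each branch picks its 2 rows independently; UNION then removes duplicate ids.
union_sizes = {
    len({ids[i] for i in left} | {ids[i] for i in right})
    for left in picks
    for right in picks
}

# Query 2: SELECT DISTINCT id FROM t LIMIT 2
# DISTINCT runs first, then LIMIT 2 keeps any 2 of the distinct ids.
distinct_ids = sorted(set(ids))
distinct_sizes = {len(pick) for pick in combinations(distinct_ids, 2)}

print(sorted(union_sizes))     # [1, 2, 3]
print(sorted(distinct_sizes))  # [2]
```

Under these assumptions the UNION query can return 1, 2 or 3 rows, while the DISTINCT query always returns 2 rows, so the two queries do not have the same set of possible results.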

If we accept this nondeterminism, that is, if we accept that "random" + 
"random" = "random", consider the following example:
select id from t limit 10 * random();
select id from t limit 10 * random() * random();
Are these two SQL statements equivalent?

If the above equation holds, I believe these two SQL statements are equivalent. 
Is my conclusion wrong?

If the logic that "random" + "random" = "random" is incorrect, should the 
UNION rewrite then be considered wrong?

Is it reasonable to add "uncertainty" on top of "uncertainty"?
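
One way to look at the two limit expressions is to compare how they are distributed. The sketch below is only an illustration in Python. It assumes random() is uniform on [0, 1), as it is in PostgreSQL, and it ignores how the database turns a fractional limit into a row count. The sample size and seed are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable
N = 200_000     # arbitrary sample size

# Limit expression of the first statement: 10 * random()
single = [10 * random.random() for _ in range(N)]

# Limit expression of the second statement: 10 * random() * random()
product = [10 * random.random() * random.random() for _ in range(N)]

mean_single = sum(single) / N
mean_product = sum(product) / N

# Analytically the expected values are 10 * 1/2 = 5 and 10 * 1/2 * 1/2 = 2.5.
print(mean_single, mean_product)
```

Both expressions are random, but the sampled means should come out close to 5 and 2.5 respectively, so they do not follow the same distribution.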


> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> -------------------------------------------------------
>
>                 Key: CALCITE-7463
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7463
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.41.0
>            Reporter: Zhen Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{LIMIT}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is *not semantically equivalent*.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
>     LogicalProject(MGR=[$3], COMM=[$6])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
>     LogicalProject(MGR=[$3], COMM=[$6])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
>     LogicalProject(MGR=[$3], COMM=[$6])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of the UNION contains 
> a LogicalSort (which represents ORDER BY, LIMIT, and OFFSET). 
>  
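
The expected behavior described above amounts to a guard that runs before the rewrite. The following is a rough sketch in Python over a toy plan tree. It is not Calcite code: Node, contains_sort, may_rewrite_union and branch are hypothetical names, and it assumes that "contains" means a Sort anywhere in the input's subtree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy stand-in for a plan node (hypothetical, not Calcite's RelNode)."""
    kind: str                                   # "Union", "Sort", "Project", "TableScan"
    inputs: List["Node"] = field(default_factory=list)
    fetch: Optional[int] = None                 # LIMIT value, only used when kind == "Sort"


def contains_sort(node: Node) -> bool:
    """Return True if the subtree rooted at node contains a Sort node."""
    return node.kind == "Sort" or any(contains_sort(child) for child in node.inputs)


def may_rewrite_union(union: Node) -> bool:
    """Guard: allow the rewrite only if no input of the union contains a Sort."""
    return not any(contains_sort(child) for child in union.inputs)


def branch(limit: Optional[int]) -> Node:
    """Build Project over TableScan, optionally wrapped in Sort(fetch=limit)."""
    project = Node("Project", [Node("TableScan")])
    return project if limit is None else Node("Sort", [project], fetch=limit)


with_limit = Node("Union", [branch(2), branch(2)])
without_limit = Node("Union", [branch(None), branch(None)])

print(may_rewrite_union(with_limit))     # False: the rewrite is skipped
print(may_rewrite_union(without_limit))  # True: the rewrite may proceed
```

The first tree mirrors the "Plan Before" shape from the description, with a Sort(fetch=2) under each union input, so the guard rejects it.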



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
