[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074386#comment-18074386
]
Steven Phillips commented on CALCITE-7463:
------------------------------------------
That's actually not quite what I'm saying. The fact that it probably does it
anyway is not the foundation of my argument as to why it's correct. My argument
is basically:
Limit without order by means: return two rows, it doesn't matter which ones or
what method is used to choose them.
Therefore, choosing the two rows for the subqueries such that they return the
exact same two rows in each instance is a valid optimizer choice, as it meets
the requirements of the request.
With the random() function, there is a different meaning. It doesn't mean give
me any value, I don't care what or how you decided. It explicitly means, give
me a value with random probability over a uniform distribution. So, while it
could be any value, I do care how it's done, or at least I expect the result to
be generated in a way that is (pseudo)random.
Another analogy would be the ANY aggregate function, which some engines
support. It's arbitrary, but not random, so it would be valid to transform the
query:
{code:sql}
select any(x) from t
{code}
to
{code:sql}
select min(x) from t
{code}
whereas:
{code}
select x from t order by random() limit 1
{code}
would not be valid to transform this way, since the random() function has the
expectation of (pseudo)randomness.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> -------------------------------------------------------
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.41.0
> Reporter: Zhen Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)