[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074386#comment-18074386
 ] 

Steven Phillips commented on CALCITE-7463:
------------------------------------------

That's actually not quite what I'm saying. The fact that it probably does it 
anyway is not the foundation of my argument as to why it's correct. My argument 
is basically:

Limit without order by means: return two rows, it doesn't matter which ones or 
what method is used to choose them.
Therefore, choosing the two rows for the subqueries such that they return the 
exact same two rows in each instance is a valid optimizer choice, as it meets 
the requirements of the request.

With the random() function, there is a different meaning. It doesn't mean give 
me any value, I don't care what or how you decided. It explicitly means, give 
me a value with random probability over a uniform distribution. So, while it 
could be any value, I do care how it's done, or at least I expect the result to 
be generated in a way that is (pseudo)random.

Another analogy would be the ANY aggregate function, which some engines 
support. It's arbitrary, but not random, so it would be valid to transform the 
query:

{code:sql}
select any(x) from t
{code}

to 

{code:sql}
select min(x) from t
{code}

whereas:

{code}
select x from t order by random() limit 1
{code}

would not be valid to transform this way, since the random() function has the 
expectation of (pseudo)randomness. 

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> -------------------------------------------------------
>
>                 Key: CALCITE-7463
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7463
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.41.0
>            Reporter: Zhen Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
>     LogicalProject(MGR=[$3], COMM=[$6])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
>     LogicalProject(MGR=[$3], COMM=[$6])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
>     LogicalProject(MGR=[$3], COMM=[$6])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to