[ 
https://issues.apache.org/jira/browse/CALCITE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17745319#comment-17745319
 ] 

Will Noble edited comment on CALCITE-5866 at 7/21/23 12:17 AM:
---------------------------------------------------------------

> IMO the `ORDER BY` clause in a sub-query is useless, because it not affects 
> the final result. Please correct it if I am wrong.

Most dialects allow sorting in sub-queries. This is a departure from the 
relational model, but MSSQL is the only popular SQL dialect I've encountered 
that does not work this was. Because of this, it is not generally possible to 
sort on something that is not also selected.

Here's the kind of query that motivated this bug in the first place:

{code:sql}
SELECT "JOB"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", COUNT("ENAME")
{code}

While exploring solutions for CALCITE-5808 I came up with this that leverages 
MSSQL's sort-by-ordinal ability:

{code:sql}
SELECT "JOB"
FROM (SELECT "JOB", COUNT("ENAME") AS "$f1"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", 2) AS "t0"
{code}

Of course, this is a problem because it has an {{ORDER BY}} clause in a 
sub-query, which will be ignored. So this query has no dependable order for the 
results. Using sort-by-ordinal, it would actually be impossible to write a 
single query that sorts on anything that is not a column in the final output. 
In order to achieve that, you must sort by expression. In this case, the top 
example works fine.

But MSSQL does a bad enough job at matching expressions in the {{ORDER BY}} 
list to expressions in the {{SELECT}} list that we cannot depend on 
sort-by-expression in the general case. Something like {{SELECT <crazy long 
expression> ORDER BY <same exact crazy long expression>}} will fail in the 
general case with the error message ["A constant expression was encountered in 
the ORDER BY 
list"|https://stackoverflow.com/questions/49377988/a-constant-expression-was-encountered-in-the-order-by-list-position-1]
 even for expressions that are not constant. I just encountered this error 
trying to run this query (I know it's complicated but it's hard to say for sure 
where the problem is so I'm reproducing it almost as-is from my real-world 
example):

{code:sql}
SELECT ((CONVERT(VARCHAR(19),
                CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
              
               ,120) + '.' + RIGHT('000000' + CAST(DATEPART(MCS, CAST(
                CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
              
                AS DATETIME2)) AS VARCHAR(6)), 6))) AS 
`users.created_microsecond`
FROM `USERS` AS `users`
GROUP BY ((CONVERT(VARCHAR(19),
                CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
              
               ,120) + '.' + RIGHT('000000' + CAST(DATEPART(MCS, CAST(
                CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
              
                AS DATETIME2)) AS VARCHAR(6)), 6)))
ORDER BY ((CONVERT(VARCHAR(19),
                CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
              
               ,120) + '.' + RIGHT('000000' + CAST(DATEPART(MCS, CAST(
                CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
              
                AS DATETIME2)) AS VARCHAR(6)), 6))) DESC
FETCH NEXT 1000 ROWS ONLY
{code}

As you can see, the {{ORDER BY}} expression is the same as the one in the 
{{SELECT}} and {{GROUP BY}} lists, but MSSQL thinks it's a constant expression.

Taken together, this means:

# We cannot generally rely on sorting by ordinal (or alias) because we need to 
insert sub-queries for sort expressions that aren't also selected, and we can't 
sort sub-queries.
# We cannot generally rely on sorting by expression because MSSQL does a bad 
job of recognizing when expressions are constant or not. This part is very 
poorly documented, because clearly sorting by expression works in some 
circumstances (and is in fact the only way to achieve certain results, like 
sorting by an un-selected expression), but not others.

I'm not sure how we can reconcile these in a reliable way. Perhaps the only way 
is to file bugs against SQL Server.


was (Author: wnoble):
This is actually a weirder problem than it seems because I've encountered some 
severely under-documented discrepancies in what SQL Server 2017 will accept, 
and it seems like MSSQL is actually incompatible with particular edge-cases 
that other databases can handle.

Here's the kind of query that motivated this bug in the first place:

{code:sql}
SELECT "JOB"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", COUNT("ENAME")
{code}

While exploring solutions for CALCITE-5808 I came up with this that leverages 
MSSQL's sort-by-ordinal ability:

{code:sql}
SELECT "JOB"
FROM (SELECT "JOB", COUNT("ENAME") AS "$f1"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", 2) AS "t0"
{code}

Of course, this is a problem because it has an {{ORDER BY}} clause in a 
sub-query, which will be ignored. So this query has no dependable order for the 
results. Using sort-by-ordinal, it would actually be impossible to write a 
single query that sorts on anything that is not a column in the final output. 
In order to achieve that, you must sort by expression. In this case, the top 
example works fine.

But MSSQL does a bad enough job at matching expressions in the {{ORDER BY}} 
list to expressions in the {{SELECT}} list that we cannot depend on 
sort-by-expression in the general case. Something like {{SELECT <crazy long 
expression> ORDER BY <same exact crazy long expression>}} will fail in the 
general case with the error message ["A constant expression was encountered in 
the ORDER BY 
list"|https://stackoverflow.com/questions/49377988/a-constant-expression-was-encountered-in-the-order-by-list-position-1]
 even for expressions that are not constant. I just encountered this error 
trying to run this query (I know it's complicated but it's hard to say for sure 
where the problem is so I'm reproducing it almost as-is from my real-world 
example):

{code:sql}
SELECT ((CONVERT(VARCHAR(19),
                CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
              
               ,120) + '.' + RIGHT('000000' + CAST(DATEPART(MCS, CAST(
                CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
              
                AS DATETIME2)) AS VARCHAR(6)), 6))) AS 
`users.created_microsecond`
FROM `USERS` AS `users`
GROUP BY ((CONVERT(VARCHAR(19),
                CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
              
               ,120) + '.' + RIGHT('000000' + CAST(DATEPART(MCS, CAST(
                CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
              
                AS DATETIME2)) AS VARCHAR(6)), 6)))
ORDER BY ((CONVERT(VARCHAR(19),
                CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
              
               ,120) + '.' + RIGHT('000000' + CAST(DATEPART(MCS, CAST(
                CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
              
                AS DATETIME2)) AS VARCHAR(6)), 6))) DESC
FETCH NEXT 1000 ROWS ONLY
{code}

As you can see, the {{ORDER BY}} expression is the same as the one in the 
{{SELECT}} and {{GROUP BY}} lists, but MSSQL thinks it's a constant expression.

Taken together, this means:

# We cannot generally rely on sorting by ordinal (or alias) because we need to 
insert sub-queries for sort expressions that aren't also selected, and we can't 
sort sub-queries.
# We cannot generally rely on sorting by expression because MSSQL does a bad 
job of recognizing when expressions are constant or not. This part is very 
poorly documented, because clearly sorting by expression works in some 
circumstances (and is in fact the only way to achieve certain results, like 
sorting by an un-selected expression), but not others.

I'm not sure how we can reconcile these in a reliable way. Perhaps the only way 
is to file bugs against SQL Server.

> Not all dialects support sorting in sub-queries
> -----------------------------------------------
>
>                 Key: CALCITE-5866
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5866
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Will Noble
>            Priority: Minor
>
> The rel-to-sql converter inserts subqueries in certain situations, such as 
> when sorting by ordinal (see CALCITE-5768). Certain dialects, such as MSSQL 
> (SQL Server) ignore the {{ORDER BY}} clause in a subquery. We may need a new 
> dialect-level setting like {{canSortInSubQuery}} that dictates whether 
> Calcite can safely insert sub-queries with {{ORDER BY}} clauses in them, or 
> whether it must do everything in it's power to avoid that (such as refusing 
> to sort by ordinal in cases where not all sort expressions are included in 
> the {{SELECT}} list).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to