[ https://issues.apache.org/jira/browse/CALCITE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17745319#comment-17745319 ]
Will Noble edited comment on CALCITE-5866 at 7/21/23 12:17 AM: --------------------------------------------------------------- > IMO the `ORDER BY` clause in a sub-query is useless, because it not affects > the final result. Please correct it if I am wrong. Most dialects allow sorting in sub-queries. This is a departure from the relational model, but MSSQL is the only popular SQL dialect I've encountered that does not work this was. Because of this, it is not generally possible to sort on something that is not also selected. Here's the kind of query that motivated this bug in the first place: {code:sql} SELECT "JOB" FROM "scott"."EMP" GROUP BY "JOB" ORDER BY "JOB", COUNT("ENAME") {code} While exploring solutions for CALCITE-5808 I came up with this that leverages MSSQL's sort-by-ordinal ability: {code:sql} SELECT "JOB" FROM (SELECT "JOB", COUNT("ENAME") AS "$f1" FROM "scott"."EMP" GROUP BY "JOB" ORDER BY "JOB", 2) AS "t0" {code} Of course, this is a problem because it has an {{ORDER BY}} clause in a sub-query, which will be ignored. So this query has no dependable order for the results. Using sort-by-ordinal, it would actually be impossible to write a single query that sorts on anything that is not a column in the final output. In order to achieve that, you must sort by expression. In this case, the top example works fine. But MSSQL does a bad enough job at matching expressions in the {{ORDER BY}} list to expressions in the {{SELECT}} list that we cannot depend on sort-by-expression in the general case. Something like {{SELECT <crazy long expression> ORDER BY <same exact crazy long expression>}} will fail in the general case with the error message ["A constant expression was encountered in the ORDER BY list"|https://stackoverflow.com/questions/49377988/a-constant-expression-was-encountered-in-the-order-by-list-position-1] even for expressions that are not constant. I just encountered this error trying to run this query (I know it's complicated but it's hard to say for sure where the problem is so I'm reproducing it almost as-is from my real-world example): {code:sql} SELECT ((CONVERT(VARCHAR(19), CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE users.name END AS DATETIME2) ,120) + '.' + RIGHT('000000' + CAST(DATEPART(MCS, CAST( CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE users.name END AS DATETIME2) AS DATETIME2)) AS VARCHAR(6)), 6))) AS `users.created_microsecond` FROM `USERS` AS `users` GROUP BY ((CONVERT(VARCHAR(19), CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE users.name END AS DATETIME2) ,120) + '.' + RIGHT('000000' + CAST(DATEPART(MCS, CAST( CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE users.name END AS DATETIME2) AS DATETIME2)) AS VARCHAR(6)), 6))) ORDER BY ((CONVERT(VARCHAR(19), CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE users.name END AS DATETIME2) ,120) + '.' + RIGHT('000000' + CAST(DATEPART(MCS, CAST( CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE users.name END AS DATETIME2) AS DATETIME2)) AS VARCHAR(6)), 6))) DESC FETCH NEXT 1000 ROWS ONLY {code} As you can see, the {{ORDER BY}} expression is the same as the one in the {{SELECT}} and {{GROUP BY}} lists, but MSSQL thinks it's a constant expression. Taken together, this means: # We cannot generally rely on sorting by ordinal (or alias) because we need to insert sub-queries for sort expressions that aren't also selected, and we can't sort sub-queries. # We cannot generally rely on sorting by expression because MSSQL does a bad job of recognizing when expressions are constant or not. This part is very poorly documented, because clearly sorting by expression works in some circumstances (and is in fact the only way to achieve certain results, like sorting by an un-selected expression), but not others. I'm not sure how we can reconcile these in a reliable way. Perhaps the only way is to file bugs against SQL Server. was (Author: wnoble): This is actually a weirder problem than it seems because I've encountered some severely under-documented discrepancies in what SQL Server 2017 will accept, and it seems like MSSQL is actually incompatible with particular edge-cases that other databases can handle. Here's the kind of query that motivated this bug in the first place: {code:sql} SELECT "JOB" FROM "scott"."EMP" GROUP BY "JOB" ORDER BY "JOB", COUNT("ENAME") {code} While exploring solutions for CALCITE-5808 I came up with this that leverages MSSQL's sort-by-ordinal ability: {code:sql} SELECT "JOB" FROM (SELECT "JOB", COUNT("ENAME") AS "$f1" FROM "scott"."EMP" GROUP BY "JOB" ORDER BY "JOB", 2) AS "t0" {code} Of course, this is a problem because it has an {{ORDER BY}} clause in a sub-query, which will be ignored. So this query has no dependable order for the results. Using sort-by-ordinal, it would actually be impossible to write a single query that sorts on anything that is not a column in the final output. In order to achieve that, you must sort by expression. In this case, the top example works fine. But MSSQL does a bad enough job at matching expressions in the {{ORDER BY}} list to expressions in the {{SELECT}} list that we cannot depend on sort-by-expression in the general case. Something like {{SELECT <crazy long expression> ORDER BY <same exact crazy long expression>}} will fail in the general case with the error message ["A constant expression was encountered in the ORDER BY list"|https://stackoverflow.com/questions/49377988/a-constant-expression-was-encountered-in-the-order-by-list-position-1] even for expressions that are not constant. I just encountered this error trying to run this query (I know it's complicated but it's hard to say for sure where the problem is so I'm reproducing it almost as-is from my real-world example): {code:sql} SELECT ((CONVERT(VARCHAR(19), CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE users.name END AS DATETIME2) ,120) + '.' + RIGHT('000000' + CAST(DATEPART(MCS, CAST( CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE users.name END AS DATETIME2) AS DATETIME2)) AS VARCHAR(6)), 6))) AS `users.created_microsecond` FROM `USERS` AS `users` GROUP BY ((CONVERT(VARCHAR(19), CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE users.name END AS DATETIME2) ,120) + '.' + RIGHT('000000' + CAST(DATEPART(MCS, CAST( CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE users.name END AS DATETIME2) AS DATETIME2)) AS VARCHAR(6)), 6))) ORDER BY ((CONVERT(VARCHAR(19), CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE users.name END AS DATETIME2) ,120) + '.' + RIGHT('000000' + CAST(DATEPART(MCS, CAST( CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE users.name END AS DATETIME2) AS DATETIME2)) AS VARCHAR(6)), 6))) DESC FETCH NEXT 1000 ROWS ONLY {code} As you can see, the {{ORDER BY}} expression is the same as the one in the {{SELECT}} and {{GROUP BY}} lists, but MSSQL thinks it's a constant expression. Taken together, this means: # We cannot generally rely on sorting by ordinal (or alias) because we need to insert sub-queries for sort expressions that aren't also selected, and we can't sort sub-queries. # We cannot generally rely on sorting by expression because MSSQL does a bad job of recognizing when expressions are constant or not. This part is very poorly documented, because clearly sorting by expression works in some circumstances (and is in fact the only way to achieve certain results, like sorting by an un-selected expression), but not others. I'm not sure how we can reconcile these in a reliable way. Perhaps the only way is to file bugs against SQL Server. > Not all dialects support sorting in sub-queries > ----------------------------------------------- > > Key: CALCITE-5866 > URL: https://issues.apache.org/jira/browse/CALCITE-5866 > Project: Calcite > Issue Type: Bug > Reporter: Will Noble > Priority: Minor > > The rel-to-sql converter inserts subqueries in certain situations, such as > when sorting by ordinal (see CALCITE-5768). Certain dialects, such as MSSQL > (SQL Server) ignore the {{ORDER BY}} clause in a subquery. We may need a new > dialect-level setting like {{canSortInSubQuery}} that dictates whether > Calcite can safely insert sub-queries with {{ORDER BY}} clauses in them, or > whether it must do everything in it's power to avoid that (such as refusing > to sort by ordinal in cases where not all sort expressions are included in > the {{SELECT}} list). -- This message was sent by Atlassian Jira (v8.20.10#820010)