[ https://issues.apache.org/jira/browse/CALCITE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735913#comment-17735913 ]
Will Noble commented on CALCITE-5775: ------------------------------------- Filed CALCITE-5793. Calcite already is so inclined to group and order by ordinals for BQ. It just doesn't work for null direction emulation since that requires adding an addition {{IS NULL}} operator, and the only operators you can apply to sorting ordinals is {{DESC}} or {{NULLS FIRST}} / {{LAST}}, so it has to expand it into the full expression in that case. This could potentially be mitigated by introducing a subquery with the {{IS NULL}} expression in the select list. This may be worth considering. I believe that previous bug you're talking about is CALCITE-5767. I'm not sure if that's related to the claims that you think are red herrings, but the problems outlined in this bug mean that only a single null direction (the default for whatever dialect is being unparsed) can possibly work. The user does not have the ability to use the non-default null direction if it has to be emulated with an extra {{IS NULL}} sort. > Null direction emulation broken for complex expressions on some dialects > ------------------------------------------------------------------------ > > Key: CALCITE-5775 > URL: https://issues.apache.org/jira/browse/CALCITE-5775 > Project: Calcite > Issue Type: Bug > Components: core > Reporter: Will Noble > Priority: Minor > Labels: pull-request-available > > This is a problem in BigQuery, and may be a problem in other dialects as > well. Consider the following piece of valid BQ SQL: > {code:sql} > SELECT REPEAT(first_name, 2), > COUNT(id) > FROM looker_test.users > GROUP BY REPEAT(first_name, 2) > ORDER BY 1 > {code} > Now consider a version where the {{ORDER BY}} clause is changed to this: > {code:sql} > ORDER BY REPEAT(first_name, 2) > {code} > This is logically the same query, because the expression in the {{ORDER}} > clause is the same as the one in the {{SELECT}} / {{GROUP}} clauses. BigQuery > is sophisticated enough to match the select to the group expression in both > queries, but cannot match either with the order expression. It gives this > error: _ORDER BY clause expression references column first_name which is > neither grouped nor aggregated_. > So, when sorting by complex expressions in BQ, Calcite relies on either: > * No null direction emulation required. > * sorting by alias or ordinal, which is a problem with current null direction > emulation because it adds an extra complex sort expression, or > * having a query that just happens to also have the underlying field in the > {{GROUP BY}} clause by itself, which seems to actually happen pretty often in > my testing, but obviously shouldn't be a constraint. > As I wrote that, I realized this may be easily fixable for BQ since it added > support for {{NULLS FIRST}} / {{LAST}} in 2020 and it seems Calcite has not > caught up yet. Consider this rel node: > {code} > LogicalSort(sort0=[$0], dir0=[ASC]) > LogicalAggregate(group=[{0}], cent=[COUNT($1)]) > LogicalProject($f0=[CASE(IS NULL($4), 0, 1)], MGR=[$3]) > JdbcTableScan(table=[[JDBC_SCOTT, EMP]]) > {code} > Calcite would convert it to this in BigQuery due to null direction emulation: > {code:sql} > SELECT CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END AS `$f0`, COUNT(MGR) AS > cent > FROM SCOTT.EMP > GROUP BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END > ORDER BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END IS NULL, 1 > {code} > Which of course triggers the problem described above. This may be a problem > for MSSQL as well since it doesn't support {{NULLS LAST}}. The fix for BQ, at > least, may be to just support {{NULLS LAST}} and sort by ordinal. -- This message was sent by Atlassian Jira (v8.20.10#820010)