[ 
https://issues.apache.org/jira/browse/CALCITE-5808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5808:
--------------------------------
    Description: 
While exploring solutions to CALCITE-5724 I produced [this draft 
commit|https://github.com/apache/calcite/commit/d644ecee44ffd7927a62be96329cd5456f545de2]
 to explore what the Rel-to-Sql conversion process would look like if it were 
heavily biased toward using ordinals in the {{GROUP BY}} and {{ORDER BY}} 
clauses. It works for most common queries, but I gave up with basically 2 
problems yet to be solved:

# Grouping with {{ROLLUP}}, {{CUBE}}, or {{GROUPING SETS}} does not allow 
ordinals (at least [according to this presto 
ticket|https://github.com/prestodb/presto/issues/9522]; I didn't double-check 
any standards or dialects), so my draft solution would have to distinguish 
between these cases and a simple group-by, and only use ordinals in the simple 
case.
# Window functions with an {{ORDER BY}} clause also use ordinals in my draft. 
It seems likely that ordinals are also disallowed in this context, but I'm not 
sure. Looking at the results in 
{{RelToSqlConverterTest.testConvertWindowToSql()}}, it certainly looks like the 
query is incorrect even if ordinals could hypothetically go in a window's 
{{ORDER BY}}.

So, I've decided to stop working on this for now, but wanted to preserve my 
draft change and start a discussion on this ticket. Here are some thoughts:

# It seems like we can reach a happy middle ground on sorting by using ordinals 
whenever the dialect allows *and* the expression returned by {{field()}} is 
anything other than a {{SqlIdentifier}} (i.e. a named column reference). 
Existing behavior as of writing is to only use ordinals when the expression is 
a {{SqlCall}}, but we should use it for literals as well. That would solve 
CALCITE-5724, but still use named references for "simple" collations, which 
seems to be the case for all window functions.
# When it comes to grouping, things are more complicated. As of writing, 
Calcite tends to group by expressions, but some dialects (e.g. BigQuery) can 
get easily confused by this, even when the expression in the {{SELECT}} list 
perfectly matches that in the {{GROUP BY}} clause. With our Calcite 
integration, we need customized cleanup logic to rewrite the {{GROUP BY}} 
clauses in terms of aliases / ordinals in the {{SELECT}} list whenever 
possible, as a band-aid for this problem. We want to get rid of this band-aid 
and upstream a proper solution, and I thought using ordinals could be it, but 
the problems with {{GROUPING SETS}} et al still needs to be solved, and Calcite 
seems to have poor support for grouping by ordinals in general (see the changes 
I had to make to {{RelToSqlConverter.generateGroupList}} in my draft) which 
should be improved.

  was:
While exploring solutions to CALCITE-5724 I produced [this draft 
commit|https://github.com/apache/calcite/commit/d644ecee44ffd7927a62be96329cd5456f545de2]
 to explore what the Rel-to-Sql conversion process would look like if it were 
heavily biased toward using ordinals in the {{GROUP BY}} and {{ORDER BY}} 
clauses. It works for most common queries, but I gave up with basically 2 
problems yet to be solved:

# Grouping with {{ROLLUP}}, {{CUBE}}, or {{GROUPING SETS}} does not allow 
ordinals (at least [according to this presto 
ticket|https://github.com/prestodb/presto/issues/9522]; I didn't double-check 
any standards or dialects), so my draft solution would have to distinguish 
between these cases and a simple group-by, and only use ordinals in the simple 
case.
# Window functions with an {{ORDER BY}} clause also use ordinals in my draft. 
It seems likely that ordinals are also disallowed in this context, but I'm not 
sure. Looking at the results in 
{{RelToSqlConverterTest.testConvertWindowToSql()}}, it certainly looks like the 
query is incorrect even if ordinals can hypothetically go in a window's {{ORDER 
BY}}.

So, I've decided to stop working on this for now, but wanted to preserve my 
draft change and start a discussion on this ticket. Here are some thoughts:

# It seems like we can reach a happy middle ground on sorting by using ordinals 
whenever the dialect allows *and* the expression returned by {{field()}} is 
anything other than a {{SqlIdentifier}} (i.e. a named column reference). 
Existing behavior as of writing is to only use ordinals when the expression is 
a {{SqlCall}}, but we should use it for literals as well. That would solve 
CALCITE-5724, but still use named references for "simple" collations, which 
seems to be the case for all window functions.
# When it comes to grouping, things are more complicated. As of writing, 
Calcite tends to group by expressions, but some dialects (e.g. BigQuery) can 
get easily confused by this, even when the expression in the {{SELECT}} list 
perfectly matches that in the {{GROUP BY}} clause. With our Calcite 
integration, we need customized cleanup logic to rewrite the {{GROUP BY}} 
clauses in terms of aliases / ordinals in the {{SELECT}} list whenever 
possible, as a band-aid for this problem. We want to get rid of this band-aid 
and upstream a proper solution, and I thought using ordinals could be it, but 
the problems with {{GROUPING SETS}} et al still needs to be solved, and Calcite 
seems to have poor support for grouping by ordinals in general (see the changes 
I had to make to {{RelToSqlConverter.generateGroupList}} in my draft) which 
should be improved.


> Rel-to-Sql conversion should better support grouping or sorting by references 
> or ordinals
> -----------------------------------------------------------------------------------------
>
>                 Key: CALCITE-5808
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5808
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Will Noble
>            Priority: Major
>
> While exploring solutions to CALCITE-5724 I produced [this draft 
> commit|https://github.com/apache/calcite/commit/d644ecee44ffd7927a62be96329cd5456f545de2]
>  to explore what the Rel-to-Sql conversion process would look like if it were 
> heavily biased toward using ordinals in the {{GROUP BY}} and {{ORDER BY}} 
> clauses. It works for most common queries, but I gave up with basically 2 
> problems yet to be solved:
> # Grouping with {{ROLLUP}}, {{CUBE}}, or {{GROUPING SETS}} does not allow 
> ordinals (at least [according to this presto 
> ticket|https://github.com/prestodb/presto/issues/9522]; I didn't double-check 
> any standards or dialects), so my draft solution would have to distinguish 
> between these cases and a simple group-by, and only use ordinals in the 
> simple case.
> # Window functions with an {{ORDER BY}} clause also use ordinals in my draft. 
> It seems likely that ordinals are also disallowed in this context, but I'm 
> not sure. Looking at the results in 
> {{RelToSqlConverterTest.testConvertWindowToSql()}}, it certainly looks like 
> the query is incorrect even if ordinals could hypothetically go in a window's 
> {{ORDER BY}}.
> So, I've decided to stop working on this for now, but wanted to preserve my 
> draft change and start a discussion on this ticket. Here are some thoughts:
> # It seems like we can reach a happy middle ground on sorting by using 
> ordinals whenever the dialect allows *and* the expression returned by 
> {{field()}} is anything other than a {{SqlIdentifier}} (i.e. a named column 
> reference). Existing behavior as of writing is to only use ordinals when the 
> expression is a {{SqlCall}}, but we should use it for literals as well. That 
> would solve CALCITE-5724, but still use named references for "simple" 
> collations, which seems to be the case for all window functions.
> # When it comes to grouping, things are more complicated. As of writing, 
> Calcite tends to group by expressions, but some dialects (e.g. BigQuery) can 
> get easily confused by this, even when the expression in the {{SELECT}} list 
> perfectly matches that in the {{GROUP BY}} clause. With our Calcite 
> integration, we need customized cleanup logic to rewrite the {{GROUP BY}} 
> clauses in terms of aliases / ordinals in the {{SELECT}} list whenever 
> possible, as a band-aid for this problem. We want to get rid of this band-aid 
> and upstream a proper solution, and I thought using ordinals could be it, but 
> the problems with {{GROUPING SETS}} et al still needs to be solved, and 
> Calcite seems to have poor support for grouping by ordinals in general (see 
> the changes I had to make to {{RelToSqlConverter.generateGroupList}} in my 
> draft) which should be improved.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to