[jira] [Created] (CALCITE-6229) Upgrade json-path version to 2.9.0

2024-01-25 Thread Will Noble (Jira)
Will Noble created CALCITE-6229:
---

 Summary: Upgrade json-path version to 2.9.0
 Key: CALCITE-6229
 URL: https://issues.apache.org/jira/browse/CALCITE-6229
 Project: Calcite
  Issue Type: Bug
Reporter: Will Noble


There is a CVE: https://osv.dev/vulnerability/GHSA-pfh2-hfmq-phg5

Probably not critical for most users since it would be strange for unsanitized 
user input to make it into the JSON parser through Calcite, but should be easy 
to patch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5164) Planner#parser can't parse TIMESTAMP() function

2023-11-22 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788914#comment-17788914
 ] 

Will Noble commented on CALCITE-5164:
-

If I'm not mistaken, I believe this ticket is now obsolete since CALCITE-5508 
is resolved.

> Planner#parser can't parse TIMESTAMP() function
> ---
>
> Key: CALCITE-5164
> URL: https://issues.apache.org/jira/browse/CALCITE-5164
> Project: Calcite
>  Issue Type: Bug
>  Components: babel
>Affects Versions: 1.30.0
>Reporter: Jiajun Xie
>Assignee: Jiajun Xie
>Priority: Minor
>
> Both core and babel will parse fail.
> {code:java}
>   FrameworkConfig coreConfig = Frameworks.newConfigBuilder().build();
>   Planner corePlanner = Frameworks.getPlanner(coreConfig);
>   corePlanner.parse("SELECT TIMESTAMP('2022-05-21 08:00:00'");
>   // Caused by: org.apache.calcite.sql.parser.babel.ParseException: 
> Incorrect syntax near the keyword 'TIMESTAMP' at line 1, column 8.
>   FrameworkConfig babelConfig = Frameworks.newConfigBuilder()
>   .parserConfig(SqlParser.Config.DEFAULT.withParserFactory(
>   SqlBabelParserImpl.FACTORY))
>   .build();
>   Planner babelPlanner = Frameworks.getPlanner(babelConfig);
>   babelPlanner.parse("SELECT TIMESTAMP('2022-05-21 08:00:00'");
>   // Caused by: org.apache.calcite.sql.parser.babel.ParseException: 
> Incorrect syntax near the keyword 'TIMESTAMP' at line 1, column 8
> {code}
> Here are some databases that support TIMESTAMP function.
>  - MySQL: 
> [https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_timestamp]
> {code:java}
> select timestamp('2022-05-21 08:00:00')
> // result
> timestamp('2022-05-21 08:00:00') 
> 2022-05-21 08:00:00
> {code}
>  - Derby: [https://docs.oracle.com/javadb/10.6.2.1/ref/rreftimestampfunc.html]
> Also, here are some databases that not support TIMESTAMP function:
>  - Oracle: 
> [https://docs.oracle.com/cd/B19306_01/server.102/b14225/ch4datetime.htm]
> {code:java}
> select timestamp('2022-05-21 08:00:00')
> //ORA-00923: FROM keyword not found where expected
> {code}
>  - SQL Server: 
> [https://docs.microsoft.com/en-us/sql/t-sql/functions/date-and-time-data-types-and-functions-transact-sql?view=sql-server-ver15]
> {code:java}
> select timestamp('2022-05-21 08:00:00')
> // Msg 195 Level 15 State 10 Line 1
> // 'timestamp' is not a recognized built-in function name.{code}
> Is it necessary for us to support it in babel module?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-6117) Converting SAFE_CAST from RexCall to SqlCall fails to add the type as an argument

2023-11-13 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-6117:

Summary: Converting SAFE_CAST from RexCall to SqlCall fails to add the type 
as an argument  (was: Unparsing SAFE_CAST is broken)

> Converting SAFE_CAST from RexCall to SqlCall fails to add the type as an 
> argument
> -
>
> Key: CALCITE-6117
> URL: https://issues.apache.org/jira/browse/CALCITE-6117
> Project: Calcite
>  Issue Type: Bug
>Reporter: Will Noble
>Priority: Trivial
>  Labels: pull-request-available
>
> BigQuery's {{SAFE_CAST}} function operates much like {{CAST}}. Currently, the 
> unparsing logic is broken because it does not use the [special logic for 
> {{CAST}}|https://github.com/apache/calcite/blob/94dc1673da824174f9271677ead73cfae1aeb29b/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L816]
>  in {{SqlImplementor.callToSql}} that turns the rex call's return type into 
> the sql call's second argument.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6117) Unparsing SAFE_CAST is broken

2023-11-13 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17785720#comment-17785720
 ] 

Will Noble commented on CALCITE-6117:
-

It will trigger [this 
assertion|https://github.com/apache/calcite/blob/3843ede3d28783235780c0f81f725dc7e64a7828/core/src/main/java/org/apache/calcite/sql/fun/SqlCastFunction.java#L254]
 because the sql call only has 1 argument.

> Unparsing SAFE_CAST is broken
> -
>
> Key: CALCITE-6117
> URL: https://issues.apache.org/jira/browse/CALCITE-6117
> Project: Calcite
>  Issue Type: Bug
>Reporter: Will Noble
>Priority: Trivial
>  Labels: pull-request-available
>
> BigQuery's {{SAFE_CAST}} function operates much like {{CAST}}. Currently, the 
> unparsing logic is broken because it does not use the [special logic for 
> {{CAST}}|https://github.com/apache/calcite/blob/94dc1673da824174f9271677ead73cfae1aeb29b/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L816]
>  in {{SqlImplementor.callToSql}} that turns the rex call's return type into 
> the sql call's second argument.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6117) Unparsing SAFE_CAST is broken

2023-11-13 Thread Will Noble (Jira)
Will Noble created CALCITE-6117:
---

 Summary: Unparsing SAFE_CAST is broken
 Key: CALCITE-6117
 URL: https://issues.apache.org/jira/browse/CALCITE-6117
 Project: Calcite
  Issue Type: Bug
Reporter: Will Noble


BigQuery's {{SAFE_CAST}} function operates much like {{CAST}}. Currently, the 
unparsing logic is broken because it does not use the [special logic for 
{{CAST}}|https://github.com/apache/calcite/blob/94dc1673da824174f9271677ead73cfae1aeb29b/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L816]
 in {{SqlImplementor.callToSql}} that turns the rex call's return type into the 
sql call's second argument.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-6062) Allow conversion of un-zoned timestamp strings to timestamp with local time zone

2023-10-19 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-6062:

Description: 
[This 
function|https://github.com/apache/calcite/blob/ce88348960e95e1c13da850b4422b2f49f022d93/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L4333]
 that implements conversion from strings to {{TIMESTAMP WITH LOCAL TIME ZONE}} 
uses {{TimestampWithTimeZoneString}}, which seems to have been a subtle mistake.

That function is only used to implement {{CAST}} expressions from strings to TS 
w/ LTZ. Those expressions should allow for strings that do not have a time 
zone, e.g. {{CAST('2023-10-19 12:00:00' AS TIMESTAMP WITH LOCAL TIME ZONE)}}, 
but it currently cannot because the {{TimestampWithTimeZoneString}} class 
always expects a string with a time zone on the end.

In the case that no zone is included in the string, we should assume that the 
string expresses a timestamp in the data context time zone by default.

  was:
[This 
function|https://github.com/apache/calcite/blob/ce88348960e95e1c13da850b4422b2f49f022d93/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L4333]
 that implements conversion from strings to {{TIMESTAMP WITH LOCAL TIME ZONE}} 
uses {{TimestampWithTimeZoneString}}, which seems to have been a subtle mistake.

That function is only used to implement {{CAST}} expressions from strings to TS 
w/ LTZ. Those expressions should allow for strings that do not have a time 
zone, e.g. {{CAST('2023-10-19 12:00:00' AS TIMESTAMP WITH LOCAL TIME ZONE)}}, 
but it currently cannot because the {{TimestampWithTimeZoneString}} class 
always expects a time zone string.

In the case that no zone is included in the string, we should assume that the 
string expresses a timestamp in the data context time zone by defaul.


> Allow conversion of un-zoned timestamp strings to timestamp with local time 
> zone
> 
>
> Key: CALCITE-6062
> URL: https://issues.apache.org/jira/browse/CALCITE-6062
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Will Noble
>Priority: Minor
>
> [This 
> function|https://github.com/apache/calcite/blob/ce88348960e95e1c13da850b4422b2f49f022d93/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L4333]
>  that implements conversion from strings to {{TIMESTAMP WITH LOCAL TIME 
> ZONE}} uses {{TimestampWithTimeZoneString}}, which seems to have been a 
> subtle mistake.
> That function is only used to implement {{CAST}} expressions from strings to 
> TS w/ LTZ. Those expressions should allow for strings that do not have a time 
> zone, e.g. {{CAST('2023-10-19 12:00:00' AS TIMESTAMP WITH LOCAL TIME ZONE)}}, 
> but it currently cannot because the {{TimestampWithTimeZoneString}} class 
> always expects a string with a time zone on the end.
> In the case that no zone is included in the string, we should assume that the 
> string expresses a timestamp in the data context time zone by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6062) Allow conversion of un-zoned timestamp strings to timestamp with local time zone

2023-10-19 Thread Will Noble (Jira)
Will Noble created CALCITE-6062:
---

 Summary: Allow conversion of un-zoned timestamp strings to 
timestamp with local time zone
 Key: CALCITE-6062
 URL: https://issues.apache.org/jira/browse/CALCITE-6062
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Will Noble


[This 
function|https://github.com/apache/calcite/blob/ce88348960e95e1c13da850b4422b2f49f022d93/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L4333]
 that implements conversion from strings to {{TIMESTAMP WITH LOCAL TIME ZONE}} 
uses {{TimestampWithTimeZoneString}}, which seems to have been a subtle mistake.

That function is only used to implement {{CAST}} expressions from strings to TS 
w/ LTZ. Those expressions should allow for strings that do not have a time 
zone, e.g. {{CAST('2023-10-19 12:00:00' AS TIMESTAMP WITH LOCAL TIME ZONE)}}, 
but it currently cannot because the {{TimestampWithTimeZoneString}} class 
always expects a time zone string.

In the case that no zone is included in the string, we should assume that the 
string expresses a timestamp in the data context time zone by defaul.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6055) Customize handling of DateTimeTypeName based on type system

2023-10-19 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1438#comment-1438
 ] 

Will Noble commented on CALCITE-6055:
-

My bad. Added more context to the description. Let me know if anything is still 
unclear or if you think the title could still use tweaking.

> Customize handling of DateTimeTypeName based on type system
> ---
>
> Key: CALCITE-6055
> URL: https://issues.apache.org/jira/browse/CALCITE-6055
> Project: Calcite
>  Issue Type: Bug
>Reporter: Will Noble
>Priority: Minor
>  Labels: pull-request-available
>
> *Background*:
> In ISO SQL, a {{TIMESTAMP}} comprises the parameters year, month, day, hour, 
> minute, and second, henceforth referred to as clock-calendar parameters. This 
> does not define an unambiguous instant in time, since timestamps in different 
> time zones can have different parameters at the same instant. It is analogous 
> to a BigQuery {{DATETIME}}. BigQuery chose to give its native {{TIMESTAMP}} 
> type a different definition: “an absolute point in time, independent of any 
> time zone or convention such as Daylight Savings Time.”
> ISO SQL has 2 other {{TIMESTAMP}} types. The {{TIMESTAMP WITH TIME ZONE}} 
> simply attaches a time zone to the existing {{TIMESTAMP}} type, making it 
> _both_ a set of clock-calendar parameters _and_ an unambiguous instant in 
> time. The final type -- {{TIMESTAMP WITH LOCAL TIME ZONE}} -- is more subtle. 
> According to Oracle, it “is normalized to the database time zone, and the 
> time zone offset is not stored as part of the column data. When users 
> retrieve the data, Oracle returns it in the users' local session time zone.” 
> This act of conversion between the local time zone and the database time zone 
> means that users interacting with this kind of timestamp will observe the 
> same instant in time even though they see different clock-calendar 
> parameters. However, the local time zone is not considered “part of the 
> data”. Thus, it has the semantics of an un-zoned, absolute point in time, 
> analogous to a BigQuery {{TIMESTAMP}}.
> This creates the confusing situation where a BigQuery {{DATETIME}} is 
> actually an ISO {{TIMESTAMP}}, and a BigQuery {{TIMESTAMP}} is actually an 
> ISO {{TIMESTAMP WITH LOCAL TIME ZONE}}.
> *Problem*:
> This is very similar to CALCITE-5424, which only applied to literals (e.g. 
> {{TIMESTAMP '2023-10-17 12:00:00'}}). We need a similar solution for [the 
> {{DateTimeTypeName}} 
> construction|https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/core/src/main/codegen/templates/Parser.jj#L6019]
>  which is used in other contexts (e.g. {{CAST('2023-10-17 12:00:00' AS 
> TIMESTAMP)}}). Seems like we can use a very similar approach: introduce a 
> {{DATETIME}} type name, and use {{SqlUserDefinedTypeNameSpec}} for both that 
> and the {{TIMESTAMP}} type names. During validation and sql-to-rel 
> conversion, we can look up the type names in the UDT map. If no mapping is 
> found for {{DATETIME}} (which would occur for any dialect besides BigQuery), 
> an error would occur during validation, because that type does not exist in 
> that dialect. If no mapping is found for {{TIMESTAMP}}, we would fall back on 
> the canonical {{TIMESTAMP}} type, preserving the existing behavior for all 
> other dialects.
> *In summary*:
> {{CAST('2023-10-17 12:00:00' AS TIMESTAMP)}} should produce a {{TIMESTAMP 
> WITH LOCAL TIME ZONE}} for BigQuery, but a regular {{TIMESTAMP}} for all 
> other dialects.
> {{CAST('2023-10-17 12:00:00' AS DATETIME)}} should produce a {{TIMESTAMP}} 
> for BigQuery, but throw a validation error for all other dialects.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-6055) Customize handling of DateTimeTypeName based on type system

2023-10-19 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-6055:

Summary: Customize handling of DateTimeTypeName based on type system  (was: 
Customize handling of type name based on type system)

> Customize handling of DateTimeTypeName based on type system
> ---
>
> Key: CALCITE-6055
> URL: https://issues.apache.org/jira/browse/CALCITE-6055
> Project: Calcite
>  Issue Type: Bug
>Reporter: Will Noble
>Priority: Minor
>  Labels: pull-request-available
>
> *Background*:
> In ISO SQL, a {{TIMESTAMP}} comprises the parameters year, month, day, hour, 
> minute, and second, henceforth referred to as clock-calendar parameters. This 
> does not define an unambiguous instant in time, since timestamps in different 
> time zones can have different parameters at the same instant. It is analogous 
> to a BigQuery {{DATETIME}}. BigQuery chose to give its native {{TIMESTAMP}} 
> type a different definition: “an absolute point in time, independent of any 
> time zone or convention such as Daylight Savings Time.”
> ISO SQL has 2 other {{TIMESTAMP}} types. The {{TIMESTAMP WITH TIME ZONE}} 
> simply attaches a time zone to the existing {{TIMESTAMP}} type, making it 
> _both_ a set of clock-calendar parameters _and_ an unambiguous instant in 
> time. The final type -- {{TIMESTAMP WITH LOCAL TIME ZONE}} -- is more subtle. 
> According to Oracle, it “is normalized to the database time zone, and the 
> time zone offset is not stored as part of the column data. When users 
> retrieve the data, Oracle returns it in the users' local session time zone.” 
> This act of conversion between the local time zone and the database time zone 
> means that users interacting with this kind of timestamp will observe the 
> same instant in time even though they see different clock-calendar 
> parameters. However, the local time zone is not considered “part of the 
> data”. Thus, it has the semantics of an un-zoned, absolute point in time, 
> analogous to a BigQuery {{TIMESTAMP}}.
> This creates the confusing situation where a BigQuery {{DATETIME}} is 
> actually an ISO {{TIMESTAMP}}, and a BigQuery {{TIMESTAMP}} is actually an 
> ISO {{TIMESTAMP WITH LOCAL TIME ZONE}}.
> *Problem*:
> This is very similar to CALCITE-5424, which only applied to literals (e.g. 
> {{TIMESTAMP '2023-10-17 12:00:00'}}). We need a similar solution for [the 
> {{DateTimeTypeName}} 
> construction|https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/core/src/main/codegen/templates/Parser.jj#L6019]
>  which is used in other contexts (e.g. {{CAST('2023-10-17 12:00:00' AS 
> TIMESTAMP)}}). Seems like we can use a very similar approach: introduce a 
> {{DATETIME}} type name, and use {{SqlUserDefinedTypeNameSpec}} for both that 
> and the {{TIMESTAMP}} type names. During validation and sql-to-rel 
> conversion, we can look up the type names in the UDT map. If no mapping is 
> found for {{DATETIME}} (which would occur for any dialect besides BigQuery), 
> an error would occur during validation, because that type does not exist in 
> that dialect. If no mapping is found for {{TIMESTAMP}}, we would fall back on 
> the canonical {{TIMESTAMP}} type, preserving the existing behavior for all 
> other dialects.
> *In summary*:
> {{CAST('2023-10-17 12:00:00' AS TIMESTAMP)}} should produce a {{TIMESTAMP 
> WITH LOCAL TIME ZONE}} for BigQuery, but a regular {{TIMESTAMP}} for all 
> other dialects.
> {{CAST('2023-10-17 12:00:00' AS DATETIME)}} should produce a {{TIMESTAMP}} 
> for BigQuery, but throw a validation error for all other dialects.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-6055) Customize handling of type name based on type system

2023-10-19 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-6055:

Description: 
*Background*:

In ISO SQL, a {{TIMESTAMP}} comprises the parameters year, month, day, hour, 
minute, and second, henceforth referred to as clock-calendar parameters. This 
does not define an unambiguous instant in time, since timestamps in different 
time zones can have different parameters at the same instant. It is analogous 
to a BigQuery {{DATETIME}}. BigQuery chose to give its native {{TIMESTAMP}} 
type a different definition: “an absolute point in time, independent of any 
time zone or convention such as Daylight Savings Time.”

ISO SQL has 2 other {{TIMESTAMP}} types. The {{TIMESTAMP WITH TIME ZONE}} 
simply attaches a time zone to the existing {{TIMESTAMP}} type, making it 
_both_ a set of clock-calendar parameters _and_ an unambiguous instant in time. 
The final type -- {{TIMESTAMP WITH LOCAL TIME ZONE}} -- is more subtle. 
According to Oracle, it “is normalized to the database time zone, and the time 
zone offset is not stored as part of the column data. When users retrieve the 
data, Oracle returns it in the users' local session time zone.” This act of 
conversion between the local time zone and the database time zone means that 
users interacting with this kind of timestamp will observe the same instant in 
time even though they see different clock-calendar parameters. However, the 
local time zone is not considered “part of the data”. Thus, it has the 
semantics of an un-zoned, absolute point in time, analogous to a BigQuery 
{{TIMESTAMP}}.

This creates the confusing situation where a BigQuery {{DATETIME}} is actually 
an ISO {{TIMESTAMP}}, and a BigQuery {{TIMESTAMP}} is actually an ISO 
{{TIMESTAMP WITH LOCAL TIME ZONE}}.

*Problem*:

This is very similar to CALCITE-5424, which only applied to literals (e.g. 
{{TIMESTAMP '2023-10-17 12:00:00'}}). We need a similar solution for [the 
{{DateTimeTypeName}} 
construction|https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/core/src/main/codegen/templates/Parser.jj#L6019]
 which is used in other contexts (e.g. {{CAST('2023-10-17 12:00:00' AS 
TIMESTAMP)}}). Seems like we can use a very similar approach: introduce a 
{{DATETIME}} type name, and use {{SqlUserDefinedTypeNameSpec}} for both that 
and the {{TIMESTAMP}} type names. During validation and sql-to-rel conversion, 
we can look up the type names in the UDT map. If no mapping is found for 
{{DATETIME}} (which would occur for any dialect besides BigQuery), an error 
would occur during validation, because that type does not exist in that 
dialect. If no mapping is found for {{TIMESTAMP}}, we would fall back on the 
canonical {{TIMESTAMP}} type, preserving the existing behavior for all other 
dialects.

*In summary*:

{{CAST('2023-10-17 12:00:00' AS TIMESTAMP)}} should produce a {{TIMESTAMP WITH 
LOCAL TIME ZONE}} for BigQuery, but a regular {{TIMESTAMP}} for all other 
dialects.
{{CAST('2023-10-17 12:00:00' AS DATETIME)}} should produce a {{TIMESTAMP}} for 
BigQuery, but throw a validation error for all other dialects.

  was:
*Background*:

In ISO SQL, a {{TIMESTAMP}} comprises the parameters year, month, day, hour, 
minute, and second, henceforth referred to as clock-calendar parameters. This 
does not define an unambiguous instant in time, since timestamps in different 
time zones can have different parameters at the same instant. It is analogous 
to a BigQuery {{DATETIME}}. BigQuery chose to give its native {{TIMESTAMP}} 
type a different definition: “an absolute point in time, independent of any 
time zone or convention such as Daylight Savings Time.”

ISO SQL has 2 other {{TIMESTAMP}} types. The {{TIMESTAMP WITH TIME ZONE}} 
simply attaches a time zone to the existing {{TIMESTAMP}} type, making it both 
a set of clock-calendar parameters _and_ an unambiguous instant in time. The 
final type -- {{TIMESTAMP WITH LOCAL TIME ZONE}} -- is more subtle. According 
to Oracle, it “is normalized to the database time zone, and the time zone 
offset is not stored as part of the column data. When users retrieve the data, 
Oracle returns it in the users' local session time zone.” This act of 
conversion between the local time zone and the database time zone means that 
users interacting with this kind of timestamp will observe the same instant in 
time even though they see different clock-calendar parameters. However, the 
local time zone is not considered “part of the data”. Thus, it has the 
semantics of an un-zoned, absolute point in time, analogous to a BigQuery 
{{TIMESTAMP}}.

This creates the confusing situation where a BigQuery {{DATETIME}} is actually 
an ISO {{TIMESTAMP}}, and a BigQuery {{TIMESTAMP}} is actually an ISO 
{{TIMESTAMP WITH LOCAL TIME ZONE}}.

*Problem*:

This is very similar to CALCITE-5424, which only applied to literals (e.g. 

[jira] [Updated] (CALCITE-6055) Customize handling of type name based on type system

2023-10-19 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-6055:

Description: 
*Background*:

In ISO SQL, a {{TIMESTAMP}} comprises the parameters year, month, day, hour, 
minute, and second, henceforth referred to as clock-calendar parameters. This 
does not define an unambiguous instant in time, since timestamps in different 
time zones can have different parameters at the same instant. It is analogous 
to a BigQuery {{DATETIME}}. BigQuery chose to give its native {{TIMESTAMP}} 
type a different definition: “an absolute point in time, independent of any 
time zone or convention such as Daylight Savings Time.”

ISO SQL has 2 other {{TIMESTAMP}} types. The {{TIMESTAMP WITH TIME ZONE}} 
simply attaches a time zone to the existing {{TIMESTAMP}} type, making it both 
a set of clock-calendar parameters _and_ an unambiguous instant in time. The 
final type -- {{TIMESTAMP WITH LOCAL TIME ZONE}} -- is more subtle. According 
to Oracle, it “is normalized to the database time zone, and the time zone 
offset is not stored as part of the column data. When users retrieve the data, 
Oracle returns it in the users' local session time zone.” This act of 
conversion between the local time zone and the database time zone means that 
users interacting with this kind of timestamp will observe the same instant in 
time even though they see different clock-calendar parameters. However, the 
local time zone is not considered “part of the data”. Thus, it has the 
semantics of an un-zoned, absolute point in time, analogous to a BigQuery 
{{TIMESTAMP}}.

This creates the confusing situation where a BigQuery {{DATETIME}} is actually 
an ISO {{TIMESTAMP}}, and a BigQuery {{TIMESTAMP}} is actually an ISO 
{{TIMESTAMP WITH LOCAL TIME ZONE}}.

*Problem*:

This is very similar to CALCITE-5424, which only applied to literals (e.g. 
{{TIMESTAMP '2023-10-17 12:00:00'}}). We need a similar solution for [the 
{{DateTimeTypeName}} 
construction|https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/core/src/main/codegen/templates/Parser.jj#L6019]
 which is used in other contexts (e.g. {{CAST('2023-10-17 12:00:00' AS 
TIMESTAMP)}}). Seems like we can use a very similar approach: introduce a 
{{DATETIME}} type name, and use {{SqlUserDefinedTypeNameSpec}} for both that 
and the {{TIMESTAMP}} type names. During validation and sql-to-rel conversion, 
we can look up the type names in the UDT map. If no mapping is found for 
{{DATETIME}} (which would occur for any dialect besides BigQuery), an error 
would occur during validation, because that type does not exist in that 
dialect. If no mapping is found for {{TIMESTAMP}}, we would fall back on the 
canonical {{TIMESTAMP}} type, preserving the existing behavior for all other 
dialects.

*In summary*:

{{CAST('2023-10-17 12:00:00' AS TIMESTAMP)}} should produce a {{TIMESTAMP WITH 
LOCAL TIME ZONE}} for BigQuery, but a regular {{TIMESTAMP}} for all other 
dialects.
{{CAST('2023-10-17 12:00:00' AS DATETIME)}} should produce a {{TIMESTAMP}} for 
BigQuery, but throw a validation error for all other dialects.

  was:
Background:

In ISO SQL, a {{TIMESTAMP}} comprises the parameters year, month, day, hour, 
minute, and second, henceforth referred to as clock-calendar parameters. This 
does not define an unambiguous instant in time, since timestamps in different 
time zones can have different parameters at the same instant. It is analogous 
to a BigQuery {{DATETIME}}. BigQuery chose to give its native {{TIMESTAMP}} 
type a different definition: “an absolute point in time, independent of any 
time zone or convention such as Daylight Savings Time.”

ISO SQL has 2 other {{TIMESTAMP}} types. The {{TIMESTAMP WITH TIME ZONE}} 
simply attaches a time zone to the existing {{TIMESTAMP}} type, making it both 
a set of clock-calendar parameters and an unambiguous instant in time. The 
final type -- {{TIMESTAMP WITH LOCAL TIME ZONE}} -- is more subtle. According 
to Oracle, it “is normalized to the database time zone, and the time zone 
offset is not stored as part of the column data. When users retrieve the data, 
Oracle returns it in the users' local session time zone.” This act of 
conversion between the local time zone and the database time zone means that 
users interacting with this kind of timestamp will observe the same instant in 
time even though they see different clock-calendar parameters. However, the 
local time zone is not considered “part of the data”. Thus, it has the 
semantics of an un-zoned, absolute point in time, analogous to a BigQuery 
{{TIMESTAMP}}.

This creates the confusing situation where a BigQuery {{DATETIME}} is actually 
an ISO {{TIMESTAMP}}, and a BigQuery {{TIMESTAMP}} is actually an ISO 
{{TIMESTAMP WITH LOCAL TIME ZONE}}.

Problem:

This is very similar to CALCITE-5424, which only applied to literals (e.g. 
{{TIMESTAMP 

[jira] [Updated] (CALCITE-6055) Customize handling of type name based on type system

2023-10-19 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-6055:

Description: 
Background:

In ISO SQL, a {{TIMESTAMP}} comprises the parameters year, month, day, hour, 
minute, and second, henceforth referred to as clock-calendar parameters. This 
does not define an unambiguous instant in time, since timestamps in different 
time zones can have different parameters at the same instant. It is analogous 
to a BigQuery {{DATETIME}}. BigQuery chose to give its native {{TIMESTAMP}} 
type a different definition: “an absolute point in time, independent of any 
time zone or convention such as Daylight Savings Time.”

ISO SQL has 2 other {{TIMESTAMP}} types. The {{TIMESTAMP WITH TIME ZONE}} 
simply attaches a time zone to the existing {{TIMESTAMP}} type, making it both 
a set of clock-calendar parameters and an unambiguous instant in time. The 
final type -- {{TIMESTAMP WITH LOCAL TIME ZONE}} -- is more subtle. According 
to Oracle, it “is normalized to the database time zone, and the time zone 
offset is not stored as part of the column data. When users retrieve the data, 
Oracle returns it in the users' local session time zone.” This act of 
conversion between the local time zone and the database time zone means that 
users interacting with this kind of timestamp will observe the same instant in 
time even though they see different clock-calendar parameters. However, the 
local time zone is not considered “part of the data”. Thus, it has the 
semantics of an un-zoned, absolute point in time, analogous to a BigQuery 
{{TIMESTAMP}}.

This creates the confusing situation where a BigQuery {{DATETIME}} is actually 
an ISO {{TIMESTAMP}}, and a BigQuery {{TIMESTAMP}} is actually an ISO 
{{TIMESTAMP WITH LOCAL TIME ZONE}}.

Problem:

This is very similar to CALCITE-5424, which only applied to literals (e.g. 
{{TIMESTAMP '2023-10-17 12:00:00'}}). We need a similar solution for [the 
{{DateTimeTypeName}} 
construction|https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/core/src/main/codegen/templates/Parser.jj#L6019]
 which is used in other contexts (e.g. {{CAST('2023-10-17 12:00:00' AS 
TIMESTAMP)}}). Seems like we can use a very similar approach: introduce a 
{{DATETIME}} type name, and use {{SqlUserDefinedTypeNameSpec}} for both that 
and the {{TIMESTAMP}} type names. During validation and sql-to-rel conversion, 
we can look up the type names in the UDT map. If no mapping is found for 
{{DATETIME}} (which would occur for any dialect besides BigQuery), an error 
would occur during validation, because that type does not exist in that 
dialect. If no mapping is found for {{TIMESTAMP}}, we would fall back on the 
canonical {{TIMESTAMP}} type, preserving the existing behavior for all other 
dialects.

In summary:

{{CAST('2023-10-17 12:00:00' AS TIMESTAMP)}} should produce a {{TIMESTAMP WITH 
LOCAL TIME ZONE}} for BigQuery, but a regular {{TIMESTAMP}} for all other 
dialects.
{{CAST('2023-10-17 12:00:00' AS DATETIME)}} should produce a {{TIMESTAMP}} for 
BigQuery, but throw a validation error for all other dialects.

  was:
Background:

In ISO SQL, a {{TIMESTAMP}} comprises the parameters year, month, day, hour, 
minute, and second, henceforth referred to as clock-calendar parameters. This 
does not define an unambiguous instant in time, since timestamps in different 
time zones can have different parameters at the same instant. It is analogous 
to a BigQuery {{DATETIME}}. BigQuery chose to give its native {{TIMESTAMP}} 
type a different definition: “an absolute point in time, independent of any 
time zone or convention such as Daylight Savings Time.”

ISO SQL has 2 other {{TIMESTAMP}} types. The {{TIMESTAMP WITH TIME ZONE}} 
simply attaches a time zone to the existing {{TIMESTAMP}} type, making it both 
a set of clock-calendar parameters and an unambiguous instant in time. The 
final type -- {{TIMESTAMP WITH LOCAL TIME ZONE}} -- is more subtle. According 
to Oracle, it “is normalized to the database time zone, and the time zone 
offset is not stored as part of the column data. When users retrieve the data, 
Oracle returns it in the users' local session time zone.” This act of 
conversion between the local time zone and the database time zone means that 
users interacting with this kind of timestamp will observe the same instant in 
time even though they see different clock-calendar parameters. However, the 
local time zone is not considered “part of the data”. Thus, it has the 
semantics of an un-zoned, absolute point in time, analogous to a BigQuery 
{{TIMESTAMP}}.

This creates the confusing situation where a BigQuery {{DATETIME}} is actually 
an ISO {{TIMESTAMP}}, and a BigQuery {{TIMESTAMP}} is actually an ISO 
{{TIMESTAMP WITH LOCAL TIME ZONE}}.

Problem:

This is very similar to CALCITE-5424, which only applied to literals (e.g. 
{{TIMESTAMP '2023-10-17 

[jira] [Updated] (CALCITE-6055) Customize handling of type name based on type system

2023-10-19 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-6055:

Description: 
Background:

In ISO SQL, a {{TIMESTAMP}} comprises the parameters year, month, day, hour, 
minute, and second, henceforth referred to as clock-calendar parameters. This 
does not define an unambiguous instant in time, since timestamps in different 
time zones can have different parameters at the same instant. It is analogous 
to a BigQuery {{DATETIME}}. BigQuery chose to give its native {{TIMESTAMP}} 
type a different definition: “an absolute point in time, independent of any 
time zone or convention such as Daylight Savings Time.”

ISO SQL has 2 other {{TIMESTAMP}} types. The {{TIMESTAMP WITH TIME ZONE}} 
simply attaches a time zone to the existing {{TIMESTAMP}} type, making it both 
a set of clock-calendar parameters and an unambiguous instant in time. The 
final type -- {{TIMESTAMP WITH LOCAL TIME ZONE}} -- is more subtle. According 
to Oracle, it “is normalized to the database time zone, and the time zone 
offset is not stored as part of the column data. When users retrieve the data, 
Oracle returns it in the users' local session time zone.” This act of 
conversion between the local time zone and the database time zone means that 
users interacting with this kind of timestamp will observe the same instant in 
time even though they see different clock-calendar parameters. However, the 
local time zone is not considered “part of the data”. Thus, it has the 
semantics of an un-zoned, absolute point in time, analogous to a BigQuery 
{{TIMESTAMP}}.

This creates the confusing situation where a BigQuery {{DATETIME}} is actually 
an ISO {{TIMESTAMP}}, and a BigQuery {{TIMESTAMP}} is actually an ISO 
{{TIMESTAMP WITH LOCAL TIME ZONE}}.

Problem:

This is very similar to CALCITE-5424, which only applied to literals (e.g. 
{{TIMESTAMP '2023-10-17 12:00:00'}}). We need a similar solution for [the 
{{DateTimeTypeName}} 
construction|https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/core/src/main/codegen/templates/Parser.jj#L6019]
 which is used in other contexts (e.g. {{CAST('2023-10-17 12:00:00' AS 
TIMESTAMP)}}). Seems like we can use a very similar approach: introduce a 
{{DATETIME}} type name, and use {{SqlUserDefinedTypeNameSpec}} for both that 
and the {{TIMESTAMP}} type names. During validation and sql-to-rel conversion, 
we can look up the type names in the UDT map. If no mapping is found for 
{{DATETIME}} (which would occur for any dialect besides BigQuery), an error 
would occur during validation, because that type does not exist in that 
dialect. If no mapping is found for {{TIMESTAMP}}, we would fall back on the 
canonical {{TIMESTAMP}} type, preserving the existing behavior for all other 
dialects.

  was:
Background:

In ISO SQL, a {{TIMESTAMP}} comprises the parameters year, month, day, hour, 
minute, and second, henceforth referred to as clock-calendar parameters. This 
does not define an unambiguous instant in time, since timestamps in different 
time zones can have different parameters at the same instant. It is analogous 
to a BigQuery {{DATETIME}}. BigQuery chose to give its native {{TIMESTAMP}} 
type a different definition: “an absolute point in time, independent of any 
time zone or convention such as Daylight Savings Time.”

ISO SQL has 2 other {{TIMESTAMP}} types. The {{TIMESTAMP WITH TIME ZONE}} 
simply attaches a time zone to the existing {{TIMESTAMP}} type, making it both 
a set of clock-calendar parameters and an unambiguous instant in time. The 
final type -- {{TIMESTAMP WITH LOCAL TIME ZONE}} -- is more subtle. According 
to Oracle, it “is normalized to the database time zone, and the time zone 
offset is not stored as part of the column data. When users retrieve the data, 
Oracle returns it in the users' local session time zone.” This act of 
conversion between the local time zone and the database time zone means that 
users interacting with this kind of timestamp will observe the same instant in 
time even though they see different clock-calendar parameters. However, the 
local time zone is not considered “part of the data”. Thus, it has the 
semantics of an un-zoned, absolute point in time, analogous to a BigQuery 
{{TIMESTAMP}}.

This creates the confusing situation where a BigQuery {{DATETIME}} is actually 
an ISO {{TIMESTAMP}}, and a BigQuery {{TIMESTAMP}} is actually an ISO 
{{TIMESTAMP WITH LOCAL TIME ZONE}}.

Problem:

This is very similar to CALCITE-5424, which only applied to literals (e.g. 
{{TIMESTAMP '2023-10-17 12:00:00'}}). We need a similar solution for [the 
{{DateTimeTypeName}} 
construction|https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/core/src/main/codegen/templates/Parser.jj#L6019]
 which is used in other contexts (e.g. {{CAST('2023-10-17 12:00:00' AS 
TIMESTAMP)}}). Seems like we can use 

[jira] [Updated] (CALCITE-6055) Customize handling of type name based on type system

2023-10-19 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-6055:

Description: 
Background:

In ISO SQL, a {{TIMESTAMP}} comprises the parameters year, month, day, hour, 
minute, and second, henceforth referred to as clock-calendar parameters. This 
does not define an unambiguous instant in time, since timestamps in different 
time zones can have different parameters at the same instant. It is analogous 
to a BigQuery {{DATETIME}}. BigQuery chose to give its native {{TIMESTAMP}} 
type a different definition: “an absolute point in time, independent of any 
time zone or convention such as Daylight Savings Time.”

ISO SQL has 2 other {{TIMESTAMP}} types. The {{TIMESTAMP WITH TIME ZONE}} 
simply attaches a time zone to the existing {{TIMESTAMP}} type, making it both 
a set of clock-calendar parameters and an unambiguous instant in time. The 
final type -- {{TIMESTAMP WITH LOCAL TIME ZONE}} -- is more subtle. According 
to Oracle, it “is normalized to the database time zone, and the time zone 
offset is not stored as part of the column data. When users retrieve the data, 
Oracle returns it in the users' local session time zone.” This act of 
conversion between the local time zone and the database time zone means that 
users interacting with this kind of timestamp will observe the same instant in 
time even though they see different clock-calendar parameters. However, the 
local time zone is not considered “part of the data”. Thus, it has the 
semantics of an un-zoned, absolute point in time, analogous to a BigQuery 
{{TIMESTAMP}}.

This creates the confusing situation where a BigQuery {{DATETIME}} is actually 
an ISO {{TIMESTAMP}}, and a BigQuery {{TIMESTAMP}} is actually an ISO 
{{TIMESTAMP WITH LOCAL TIME ZONE}}.

Problem:

This is very similar to CALCITE-5424, which only applied to literals (e.g. 
{{TIMESTAMP '2023-10-17 12:00:00'}}). We need a similar solution for [the 
{{DateTimeTypeName}} 
construction|https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/core/src/main/codegen/templates/Parser.jj#L6019]
 which is used in other contexts (e.g. {{CAST('2023-10-17 12:00:00' AS 
TIMESTAMP)}}). Seems like we can use a very similar approach: introduce a 
{{SqlUnknownTypeNameSpec}} that's resolved during validation.

  was:This is very similar to CALCITE-5424, which only applied to literals 
(e.g. {{TIMESTAMP '2023-10-17 12:00:00'}}). We need a similar solution for [the 
{{DateTimeTypeName}} 
construction|https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/core/src/main/codegen/templates/Parser.jj#L6019]
 which is used in other contexts (e.g. {{CAST('2023-10-17 12:00:00' AS 
TIMESTAMP)}}). Seems like we can use a very similar approach: introduce a 
{{SqlUnknownTypeNameSpec}} that's resolved during validation.


> Customize handling of type name based on type system
> 
>
> Key: CALCITE-6055
> URL: https://issues.apache.org/jira/browse/CALCITE-6055
> Project: Calcite
>  Issue Type: Bug
>Reporter: Will Noble
>Priority: Minor
>  Labels: pull-request-available
>
> Background:
> In ISO SQL, a {{TIMESTAMP}} comprises the parameters year, month, day, hour, 
> minute, and second, henceforth referred to as clock-calendar parameters. This 
> does not define an unambiguous instant in time, since timestamps in different 
> time zones can have different parameters at the same instant. It is analogous 
> to a BigQuery {{DATETIME}}. BigQuery chose to give its native {{TIMESTAMP}} 
> type a different definition: “an absolute point in time, independent of any 
> time zone or convention such as Daylight Savings Time.”
> ISO SQL has 2 other {{TIMESTAMP}} types. The {{TIMESTAMP WITH TIME ZONE}} 
> simply attaches a time zone to the existing {{TIMESTAMP}} type, making it 
> both a set of clock-calendar parameters and an unambiguous instant in time. 
> The final type -- {{TIMESTAMP WITH LOCAL TIME ZONE}} -- is more subtle. 
> According to Oracle, it “is normalized to the database time zone, and the 
> time zone offset is not stored as part of the column data. When users 
> retrieve the data, Oracle returns it in the users' local session time zone.” 
> This act of conversion between the local time zone and the database time zone 
> means that users interacting with this kind of timestamp will observe the 
> same instant in time even though they see different clock-calendar 
> parameters. However, the local time zone is not considered “part of the 
> data”. Thus, it has the semantics of an un-zoned, absolute point in time, 
> analogous to a BigQuery {{TIMESTAMP}}.
> This creates the confusing situation where a BigQuery {{DATETIME}} is 
> actually an ISO {{TIMESTAMP}}, and a BigQuery {{TIMESTAMP}} is actually an 
> 

[jira] [Commented] (CALCITE-5526) Handle unparsing of literals based on type system

2023-10-18 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776924#comment-17776924
 ] 

Will Noble commented on CALCITE-5526:
-

Let me know if you think the two basic tests in [this draft 
PR|https://github.com/apache/calcite/pull/3476] fall short of illuminating the 
issue.

The {{UdtTest}} case passes and shows the happy path of scenario #2 above. The 
{{SqlUserDefinedTypeNameSpec}} nodes are converted to normal {{RelDataType}} 
objects during sql-to-rel conversion. Relational expressions are 
dialect-agnostic by nature. Losing the context that those types were originally 
UDTs is fine, even desirable.

The {{SqlParserTest}} is the base class for a few tests like 
{{CoreSqlParserTest}} and {{ServerParserTest}}. These tests involve parsing and 
then unparsing SQL without converting to rel nodes, so it exemplifies the 
complexity of scenario #1. With the draft implementation as currently written, 
{{CoreSqlParserTest}} passes, but {{ServerParserTest}} does not, because the 
former uses backtick identifier quoting and the latter uses double-quotes. 
While these tests function by simply parsing then unparsing, perhaps this does 
not represent a realistic use-case for Calcite, and could be revisited as a 
testing strategy. Otherwise, the implementation could be tweaked in various 
ways as described in that PR.

> Handle unparsing of literals based on type system
> -
>
> Key: CALCITE-5526
> URL: https://issues.apache.org/jira/browse/CALCITE-5526
> Project: Calcite
>  Issue Type: Task
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>
> CALCITE-5424 dealt with parsing date/time literals via a custom type system, 
> however there is still no way to unparse them via the same type system. This 
> is handled in 
> [SqlDialect.unparseDateTimeLiteral|https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/java/org/apache/calcite/sql/SqlDialect.java#L468]
>  which simply calls the literal object's {{toString()}} method.
> The literal object (a subclass of {{SqlAbstractDateTimeLiteral}}) should not 
> be able to determine it's own unparsed representation by itself since it 
> could be unparsed in any dialect. Since the current system for resolving 
> custom types relies on access to the catalog, it appears we'll need to 
> introduce to {{SqlDialect}} a dependency on {{CalciteCatalogReader}} so it 
> can use the same mappings for unparsing as it does for parsing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-5526) Handle unparsing of literals based on type system

2023-10-17 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776449#comment-17776449
 ] 

Will Noble edited comment on CALCITE-5526 at 10/18/23 1:50 AM:
---

There are actually 2 different scenarios:

1. The program parses a query using a UDT map, and ends up with 
{{SqlUnknownLiteral}} and {{SqlUserDefinedTypeNameSpec}} nodes in the AST. The 
program never bothers converting that AST into a relational expression, and 
simply unparses it in a specific dialect. I'm not sure how relevant this 
scenario is to anybody, or if it even makes a ton a sense. It's not 
particularly relevant to us. It would also be the annoying scenario to support; 
we would have to consult the type mapping for these {{SqlUnknownLiteral}} and 
{{SqlUserDefinedTypeNameSpec}} nodes during unparsing.

2. The program parses the query, converts to {{RelNode}} form, does whatever, 
then converts back to {{SqlNode}} form, and *then* unparses in a specific 
dialect. This is the scenario we really care about. In this scenario, all those 
"unknown" and "user-defined" nodes have disappeared; replaced by their 
canonical types. This would be easy to support: we just hardcode the mapping in 
{{SqlDialect.unparseDateTimeLiteral}} and {{SqlDialect.getCastSpec}} for each 
particular dialect.

I'm inclined to move forward on scenario #2. Seems like they're actually fairly 
independent issues. The former only deals with those unknown/user-defined SQL 
nodes, and the latter only deals with "canonical" nodes like 
{{SqlBasicTypeNameSpec}}.

The reason I question whether scenario #1 even makes sense is that it implies 
programs will always want to unparse a query in the same context (schema and 
possibly dialect) that it was parsed. This may not be true, and if it's not, it 
becomes complicated for the program to get it right every time. In scenario #2, 
all that context has already been lost anyway.


was (Author: wnoble):
There are actually 2 different scenarios:

1. The program parses a query using a UDT map, and ends up with 
{{SqlUnknownLiteral}} and {{SqlUserDefinedTypeNameSpec}} nodes in the AST. The 
program never bothers converting that AST into a relational expression, and 
simply unparses it in a specific dialect. I'm not sure how relevant this 
scenario is to anybody, or if it even makes a ton a sense. It's not 
particularly relevant to us. It would also be the annoying scenario to support; 
we would have to consult the type mapping for these {{SqlUnknownLiteral}} and 
{{SqlUserDefinedTypeNameSpec}} nodes during unparsing.

2. The program parses the query, converts to {{RelNode}} form, does whatever, 
then converts back to {{SqlNode}} form, and *then* unparses in a specific 
dialect. This is the scenario we really care about. In this scenario, all those 
"unknown" and "user-defined" nodes have disappeared; replaced by their 
canonical types. This would be easy to support: we just hardcode the mapping in 
{{SqlDialect.unparseDateTimeLiteral}} and {{SqlDialect.getCastSpec}} for each 
particular dialect.

I'm inclined to move forward on scenario #2. Seems like they're actually fairly 
independent issues. The former only deals with those unknown/user-defined SQL 
nodes, and the latter only deals with "canonical" nodes like 
{{SqlBasicTypeNameSpec}}.

> Handle unparsing of literals based on type system
> -
>
> Key: CALCITE-5526
> URL: https://issues.apache.org/jira/browse/CALCITE-5526
> Project: Calcite
>  Issue Type: Task
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>
> CALCITE-5424 dealt with parsing date/time literals via a custom type system, 
> however there is still no way to unparse them via the same type system. This 
> is handled in 
> [SqlDialect.unparseDateTimeLiteral|https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/java/org/apache/calcite/sql/SqlDialect.java#L468]
>  which simply calls the literal object's {{toString()}} method.
> The literal object (a subclass of {{SqlAbstractDateTimeLiteral}}) should not 
> be able to determine it's own unparsed representation by itself since it 
> could be unparsed in any dialect. Since the current system for resolving 
> custom types relies on access to the catalog, it appears we'll need to 
> introduce to {{SqlDialect}} a dependency on {{CalciteCatalogReader}} so it 
> can use the same mappings for unparsing as it does for parsing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-5526) Handle unparsing of literals based on type system

2023-10-17 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776449#comment-17776449
 ] 

Will Noble edited comment on CALCITE-5526 at 10/18/23 1:42 AM:
---

There are actually 2 different scenarios:

1. The program parses a query using a UDT map, and ends up with 
{{SqlUnknownLiteral}} and {{SqlUserDefinedTypeNameSpec}} nodes in the AST. The 
program never bothers converting that AST into a relational expression, and 
simply unparses it in a specific dialect. I'm not sure how relevant this 
scenario is to anybody, or if it even makes a ton a sense. It's not 
particularly relevant to us. It would also be the annoying scenario to support; 
we would have to consult the type mapping for these {{SqlUnknownLiteral}} and 
{{SqlUserDefinedTypeNameSpec}} nodes during unparsing.

2. The program parses the query, converts to {{RelNode}} form, does whatever, 
then converts back to {{SqlNode}} form, and *then* unparses in a specific 
dialect. This is the scenario we really care about. In this scenario, all those 
"unknown" and "user-defined" nodes have disappeared; replaced by their 
canonical types. This would be easy to support: we just hardcode the mapping in 
{{SqlDialect.unparseDateTimeLiteral}} and {{SqlDialect.getCastSpec}} for each 
particular dialect.

I'm inclined to move forward on scenario #2. Seems like they're actually fairly 
independent issues. The former only deals with those unknown/user-defined SQL 
nodes, and the latter only deals with "canonical" nodes like 
{{SqlBasicTypeNameSpec}}.


was (Author: wnoble):
There are actually 2 different scenarios:

1. The program parses a query using a UDT map, and ends up with 
{{SqlUnknownLiteral}} and {{SqlUserDefinedTypeNameSpec}} nodes in the AST. The 
program never bothers converting that AST into a relational expression, and 
simply unparses it in a specific dialect. I'm not sure how relevant this 
scenario is to anybody, or if it even makes a ton a sense. It's not 
particularly relevant to us. It would also be the annoying scenario to support; 
we would have to consult the type mapping for these {{SqlUnknownLiteral}} and 
{{SqlUserDefinedTypeNameSpec}} nodes during unparsing.

2. The program parses, converts to {{RelNode}} form, does whatever, then 
converts back to {{SqlNode}} form, and *then* unparses in a specific dialect. 
This is the scenario we really care about. In this scenario, all those 
"unknown" and "user-defined" nodes have disappeared; replaced by their 
canonical types. This would be easy to support: we just hardcode the mapping in 
{{SqlDialect.unparseDateTimeLiteral}} and {{SqlDialect.getCastSpec}} for each 
particular dialect.

I'm inclined to move forward on scenario #2. Seems like they're actually fairly 
independent issues. The former only deals with those unknown/user-defined SQL 
nodes, and the latter only deals with "canonical" nodes like 
{{SqlBasicTypeNameSpec}}.

> Handle unparsing of literals based on type system
> -
>
> Key: CALCITE-5526
> URL: https://issues.apache.org/jira/browse/CALCITE-5526
> Project: Calcite
>  Issue Type: Task
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>
> CALCITE-5424 dealt with parsing date/time literals via a custom type system, 
> however there is still no way to unparse them via the same type system. This 
> is handled in 
> [SqlDialect.unparseDateTimeLiteral|https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/java/org/apache/calcite/sql/SqlDialect.java#L468]
>  which simply calls the literal object's {{toString()}} method.
> The literal object (a subclass of {{SqlAbstractDateTimeLiteral}}) should not 
> be able to determine it's own unparsed representation by itself since it 
> could be unparsed in any dialect. Since the current system for resolving 
> custom types relies on access to the catalog, it appears we'll need to 
> introduce to {{SqlDialect}} a dependency on {{CalciteCatalogReader}} so it 
> can use the same mappings for unparsing as it does for parsing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-5526) Handle unparsing of literals based on type system

2023-10-17 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776449#comment-17776449
 ] 

Will Noble edited comment on CALCITE-5526 at 10/18/23 1:41 AM:
---

There are actually 2 different scenarios:

1. The program parses a query using a UDT map, and ends up with 
{{SqlUnknownLiteral}} and {{SqlUserDefinedTypeNameSpec}} nodes in the AST. The 
program never bothers converting that AST into a relational expression, and 
simply unparses it in a specific dialect. I'm not sure how relevant this 
scenario is to anybody, or if it even makes a ton a sense. It's not 
particularly relevant to us. It would also be the annoying scenario to support; 
we would have to consult the type mapping for these {{SqlUnknownLiteral}} and 
{{SqlUserDefinedTypeNameSpec}} nodes during unparsing.

2. The program parses, converts to {{RelNode}} form, does whatever, then 
converts back to {{SqlNode}} form, and *then* unparses in a specific dialect. 
This is the scenario we really care about. In this scenario, all those 
"unknown" and "user-defined" nodes have disappeared; replaced by their 
canonical types. This would be easy to support: we just hardcode the mapping in 
{{SqlDialect.unparseDateTimeLiteral}} and {{SqlDialect.getCastSpec}} for each 
particular dialect.

I'm inclined to move forward on scenario #2. Seems like they're actually fairly 
independent issues. The former only deals with those unknown/user-defined SQL 
nodes, and the latter only deals with "canonical" nodes like 
{{SqlBasicTypeNameSpec}}.


was (Author: wnoble):
There are actually 2 different scenarios:

1. The program parses a query using a UDT map, and ends up with 
{{SqlUnknownLiteral}} and {{SqlUserDefinedTypeNameSpec}} nodes in the AST. The 
program never bothers converting that AST into a relational expression, and 
simply unparses it in a specific dialect. I'm not sure how relevant this 
scenario is to anybody. It's not particularly relevant to us. This would be the 
annoying scenario to support; we would have to consult the type mapping for 
these {{SqlUnknownLiteral}} and {{SqlUserDefinedTypeNameSpec}} nodes during 
unparsing.

2. The program parses, converts to {{RelNode}} form, does whatever, then 
converts back to {{SqlNode}} form, and *then* unparses in a specific dialect. 
This is the scenario we really care about. In this scenario, all those 
"unknown" and "user-defined" nodes have disappeared; replaced by their 
canonical types. This would be easy to support: we just hardcode the mapping in 
{{SqlDialect.unparseDateTimeLiteral}} and {{SqlDialect.getCastSpec}} for each 
particular dialect.

I'm inclined to move forward on scenario #2. Seems like they're actually fairly 
independent issues. The former only deals with those unknown/user-defined SQL 
nodes, and the latter only deals with "canonical" nodes like 
{{SqlBasicTypeNameSpec}}.

> Handle unparsing of literals based on type system
> -
>
> Key: CALCITE-5526
> URL: https://issues.apache.org/jira/browse/CALCITE-5526
> Project: Calcite
>  Issue Type: Task
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>
> CALCITE-5424 dealt with parsing date/time literals via a custom type system, 
> however there is still no way to unparse them via the same type system. This 
> is handled in 
> [SqlDialect.unparseDateTimeLiteral|https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/java/org/apache/calcite/sql/SqlDialect.java#L468]
>  which simply calls the literal object's {{toString()}} method.
> The literal object (a subclass of {{SqlAbstractDateTimeLiteral}}) should not 
> be able to determine it's own unparsed representation by itself since it 
> could be unparsed in any dialect. Since the current system for resolving 
> custom types relies on access to the catalog, it appears we'll need to 
> introduce to {{SqlDialect}} a dependency on {{CalciteCatalogReader}} so it 
> can use the same mappings for unparsing as it does for parsing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-5526) Handle unparsing of literals based on type system

2023-10-17 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776449#comment-17776449
 ] 

Will Noble edited comment on CALCITE-5526 at 10/18/23 1:37 AM:
---

There are actually 2 different scenarios:

1. The program parses a query using a UDT map, and ends up with 
{{SqlUnknownLiteral}} and {{SqlUserDefinedTypeNameSpec}} nodes in the AST. The 
program never bothers converting that AST into a relational expression, and 
simply unparses it in a specific dialect. I'm not sure how relevant this 
scenario is to anybody. It's not particularly relevant to us. This would be the 
annoying scenario to support; we would have to consult the type mapping for 
these {{SqlUnknownLiteral}} and {{SqlUserDefinedTypeNameSpec}} nodes during 
unparsing.

2. The program parses, converts to {{RelNode}} form, does whatever, then 
converts back to {{SqlNode}} form, and *then* unparses in a specific dialect. 
This is the scenario we really care about. In this scenario, all those 
"unknown" and "user-defined" nodes have disappeared; replaced by their 
canonical types. This would be easy to support: we just hardcode the mapping in 
{{SqlDialect.unparseDateTimeLiteral}} and {{SqlDialect.getCastSpec}} for each 
particular dialect.

I'm inclined to move forward on scenario #2. Seems like they're actually fairly 
independent issues. The former only deals with those unknown/user-defined SQL 
nodes, and the latter only deals with "canonical" nodes like 
{{SqlBasicTypeNameSpec}}.


was (Author: wnoble):
There are actually 2 different scenarios:

1. The program parses a query using a UDT map, and ends up with 
{{SqlUnknownLiteral}} and {{SqlUserDefinedTypeNameSpec}} nodes in the AST. The 
program never bothers converting that AST into a relational expression, and 
simply unparses it in a specific dialect. I'm not sure how relevant this 
scenario is to anybody. It's not particularly relevant to us. This would be the 
annoying scenario to support; we would have to consult the type mapping for 
these {{SqlUnknownLiteral}} and {{SqlUserDefinedTypeNameSpec}} nodes during 
unparsing.

2. The program parses, converts to {{RelNode}} form, does whatever, then 
converts back to {{SqlNode}} form, and *then* unparses in a specific dialect. 
This is the scenario we really care about. In this scenario, all those 
"unknown" and "user-defined" nodes have disappeared; replaced by their 
canonical types. This would be easy to support: we just hardcode the mapping in 
{{SqlDialect.unparseDateTimeLiteral}} and {{SqlDialect.getCastSpec}}.

I'm inclined to move forward on scenario #2. Seems like they're actually fairly 
independent issues. The former only deals with those unknown/user-defined SQL 
nodes, and the latter only deals with "canonical" nodes like 
{{SqlBasicTypeNameSpec}}.

> Handle unparsing of literals based on type system
> -
>
> Key: CALCITE-5526
> URL: https://issues.apache.org/jira/browse/CALCITE-5526
> Project: Calcite
>  Issue Type: Task
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>
> CALCITE-5424 dealt with parsing date/time literals via a custom type system, 
> however there is still no way to unparse them via the same type system. This 
> is handled in 
> [SqlDialect.unparseDateTimeLiteral|https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/java/org/apache/calcite/sql/SqlDialect.java#L468]
>  which simply calls the literal object's {{toString()}} method.
> The literal object (a subclass of {{SqlAbstractDateTimeLiteral}}) should not 
> be able to determine it's own unparsed representation by itself since it 
> could be unparsed in any dialect. Since the current system for resolving 
> custom types relies on access to the catalog, it appears we'll need to 
> introduce to {{SqlDialect}} a dependency on {{CalciteCatalogReader}} so it 
> can use the same mappings for unparsing as it does for parsing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-5526) Handle unparsing of literals based on type system

2023-10-17 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776449#comment-17776449
 ] 

Will Noble edited comment on CALCITE-5526 at 10/18/23 1:36 AM:
---

There are actually 2 different scenarios:

1. The program parses a query using a UDT map, and ends up with 
{{SqlUnknownLiteral}} and {{SqlUserDefinedTypeNameSpec}} nodes in the AST. The 
program never bothers converting that AST into a relational expression, and 
simply unparses it in a specific dialect. I'm not sure how relevant this 
scenario is to anybody. It's not particularly relevant to us. This would be the 
annoying scenario to support; we would have to consult the type mapping for 
these {{SqlUnknownLiteral}} and {{SqlUserDefinedTypeNameSpec}} nodes during 
unparsing.

2. The program parses, converts to {{RelNode}} form, does whatever, then 
converts back to {{SqlNode}} form, and *then* unparses in a specific dialect. 
This is the scenario we really care about. In this scenario, all those 
"unknown" and "user-defined" nodes have disappeared; replaced by their 
canonical types. This would be easy to support: we just hardcode the mapping in 
{{SqlDialect.unparseDateTimeLiteral}} and {{SqlDialect.getCastSpec}}.

I'm inclined to move forward on scenario #2. Seems like they're actually fairly 
independent issues. The former only deals with those unknown/user-defined SQL 
nodes, and the latter only deals with "canonical" nodes like 
{{SqlBasicTypeNameSpec}}.


was (Author: wnoble):
There are actually 2 different scenarios:

1. The program parses a query using a UDT map, and ends up with 
{{SqlUnknownLiteral}} and {{SqlUserDefinedTypeNameSpec}} nodes in the AST. The 
program never bothers converting that AST into a relational expression, and 
simply unparses it in a specific dialect. I'm not sure how relevant this 
scenario is to anybody. It's not particularly relevant to us. This would be the 
annoying scenario to support; we would have to consult the type mapping for 
these {{SqlUnknownLiteral}} and {{SqlUserDefinedTypeNameSpec}} nodes during 
unparsing.
1. The program parses, converts to {{RelNode}} form, does whatever, then 
converts back to {{SqlNode}} form, and *then* unparses in a specific dialect. 
This is the scenario we really care about. In this scenario, all those 
"unknown" and "user-defined" nodes have disappeared; replaced by their 
canonical types. This would be easy to support: we just hardcode the mapping in 
{{SqlDialect.unparseDateTimeLiteral}} and {{SqlDialect.getCastSpec}}.

I'm inclined to move forward on scenario #2. Seems like they're actually fairly 
independent issues. The former only deals with those unknown/user-defined SQL 
nodes, and the latter only deals with "canonical" nodes like 
{{SqlBasicTypeNameSpec}}.

> Handle unparsing of literals based on type system
> -
>
> Key: CALCITE-5526
> URL: https://issues.apache.org/jira/browse/CALCITE-5526
> Project: Calcite
>  Issue Type: Task
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>
> CALCITE-5424 dealt with parsing date/time literals via a custom type system, 
> however there is still no way to unparse them via the same type system. This 
> is handled in 
> [SqlDialect.unparseDateTimeLiteral|https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/java/org/apache/calcite/sql/SqlDialect.java#L468]
>  which simply calls the literal object's {{toString()}} method.
> The literal object (a subclass of {{SqlAbstractDateTimeLiteral}}) should not 
> be able to determine it's own unparsed representation by itself since it 
> could be unparsed in any dialect. Since the current system for resolving 
> custom types relies on access to the catalog, it appears we'll need to 
> introduce to {{SqlDialect}} a dependency on {{CalciteCatalogReader}} so it 
> can use the same mappings for unparsing as it does for parsing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5526) Handle unparsing of literals based on type system

2023-10-17 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776449#comment-17776449
 ] 

Will Noble commented on CALCITE-5526:
-

There are actually 2 different scenarios:

1. The program parses a query using a UDT map, and ends up with 
{{SqlUnknownLiteral}} and {{SqlUserDefinedTypeNameSpec}} nodes in the AST. The 
program never bothers converting that AST into a relational expression, and 
simply unparses it in a specific dialect. I'm not sure how relevant this 
scenario is to anybody. It's not particularly relevant to us. This would be the 
annoying scenario to support; we would have to consult the type mapping for 
these {{SqlUnknownLiteral}} and {{SqlUserDefinedTypeNameSpec}} nodes during 
unparsing.
1. The program parses, converts to {{RelNode}} form, does whatever, then 
converts back to {{SqlNode}} form, and *then* unparses in a specific dialect. 
This is the scenario we really care about. In this scenario, all those 
"unknown" and "user-defined" nodes have disappeared; replaced by their 
canonical types. This would be easy to support: we just hardcode the mapping in 
{{SqlDialect.unparseDateTimeLiteral}} and {{SqlDialect.getCastSpec}}.

I'm inclined to move forward on scenario #2. Seems like they're actually fairly 
independent issues. The former only deals with those unknown/user-defined SQL 
nodes, and the latter only deals with "canonical" nodes like 
{{SqlBasicTypeNameSpec}}.

> Handle unparsing of literals based on type system
> -
>
> Key: CALCITE-5526
> URL: https://issues.apache.org/jira/browse/CALCITE-5526
> Project: Calcite
>  Issue Type: Task
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>
> CALCITE-5424 dealt with parsing date/time literals via a custom type system, 
> however there is still no way to unparse them via the same type system. This 
> is handled in 
> [SqlDialect.unparseDateTimeLiteral|https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/java/org/apache/calcite/sql/SqlDialect.java#L468]
>  which simply calls the literal object's {{toString()}} method.
> The literal object (a subclass of {{SqlAbstractDateTimeLiteral}}) should not 
> be able to determine it's own unparsed representation by itself since it 
> could be unparsed in any dialect. Since the current system for resolving 
> custom types relies on access to the catalog, it appears we'll need to 
> introduce to {{SqlDialect}} a dependency on {{CalciteCatalogReader}} so it 
> can use the same mappings for unparsing as it does for parsing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6055) Customize handling of type name based on type system

2023-10-17 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776433#comment-17776433
 ] 

Will Noble commented on CALCITE-6055:
-

Actually, this can be handled fine using the existing 
{{SqlUserDefinedTypeNameSpec}}. We had come up with a fix for this in our local 
branch, but it was still erroneously using {{SqlBasicTypeNameSpec}} for the 
original {{TIMESTAMP}} type name, leading to the complex problem described in 
CALCITE-5636. I should be able to obsolete that ticket in short order as I 
address this one.

> Customize handling of type name based on type system
> 
>
> Key: CALCITE-6055
> URL: https://issues.apache.org/jira/browse/CALCITE-6055
> Project: Calcite
>  Issue Type: Bug
>Reporter: Will Noble
>Priority: Minor
>
> This is very similar to CALCITE-5424, which only applied to literals (e.g. 
> {{TIMESTAMP '2023-10-17 12:00:00'}}). We need a similar solution for [the 
> {{DateTimeTypeName}} 
> construction|https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/core/src/main/codegen/templates/Parser.jj#L6019]
>  which is used in other contexts (e.g. {{CAST('2023-10-17 12:00:00' AS 
> TIMESTAMP)}}). Seems like we can use a very similar approach: introduce a 
> {{SqlUnknownTypeNameSpec}} that's resolved during validation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-6055) Customize handling of type name based on type system

2023-10-17 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-6055:

Description: This is very similar to CALCITE-5424, which only applied to 
literals (e.g. {{TIMESTAMP '2023-10-17 12:00:00'}}). We need a similar solution 
for [the {{DateTimeTypeName}} 
construction|https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/core/src/main/codegen/templates/Parser.jj#L6019]
 which is used in other contexts (e.g. {{CAST('2023-10-17 12:00:00' AS 
TIMESTAMP)}}). Seems like we can use a very similar approach: introduce a 
{{SqlUnknownTypeNameSpec}} that's resolved during validation.  (was: This is 
very similar to CALCITE-5424, which only applied to literals (e.g. {{TIMESTAMP 
'2023-10-17 12:00:00'}}). We need a similar solution for [the 
{{DateTimeTypeName}} 
construction|https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/core/src/main/codegen/templates/Parser.jj#L6019]
 which is used in other contexts (e.g. {{CAST '2023-10-17 12:00:00' AS 
TIMESTAMP}}). Seems like we can use a very similar approach: introduce a 
{{SqlUnknownTypeNameSpec}} that's resolved during validation.)

> Customize handling of type name based on type system
> 
>
> Key: CALCITE-6055
> URL: https://issues.apache.org/jira/browse/CALCITE-6055
> Project: Calcite
>  Issue Type: Bug
>Reporter: Will Noble
>Priority: Minor
>
> This is very similar to CALCITE-5424, which only applied to literals (e.g. 
> {{TIMESTAMP '2023-10-17 12:00:00'}}). We need a similar solution for [the 
> {{DateTimeTypeName}} 
> construction|https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/core/src/main/codegen/templates/Parser.jj#L6019]
>  which is used in other contexts (e.g. {{CAST('2023-10-17 12:00:00' AS 
> TIMESTAMP)}}). Seems like we can use a very similar approach: introduce a 
> {{SqlUnknownTypeNameSpec}} that's resolved during validation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6055) Customize handling of type name based on type system

2023-10-17 Thread Will Noble (Jira)
Will Noble created CALCITE-6055:
---

 Summary: Customize handling of type name based on type system
 Key: CALCITE-6055
 URL: https://issues.apache.org/jira/browse/CALCITE-6055
 Project: Calcite
  Issue Type: Bug
Reporter: Will Noble


This is very similar to CALCITE-5424, which only applied to literals (e.g. 
{{TIMESTAMP '2023-10-17 12:00:00'}}). We need a similar solution for [the 
{{DateTimeTypeName}} 
construction|https://github.com/apache/calcite/blob/6f79436c178beec639e559d9152c237bbf8ec3e8/core/src/main/codegen/templates/Parser.jj#L6019]
 which is used in other contexts (e.g. {{CAST '2023-10-17 12:00:00' AS 
TIMESTAMP}}). Seems like we can use a very similar approach: introduce a 
{{SqlUnknownTypeNameSpec}} that's resolved during validation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6054) LAST_DAY should support timestamps with local time zone

2023-10-17 Thread Will Noble (Jira)
Will Noble created CALCITE-6054:
---

 Summary: LAST_DAY should support timestamps with local time zone
 Key: CALCITE-6054
 URL: https://issues.apache.org/jira/browse/CALCITE-6054
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Will Noble


The [implementation for 
{{LAST_DAY}}|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L2497]
 currently throws an error when it encounters a {{TIMESTAMP WITH LOCAL TIME 
ZONE}}. Rather, it should adjust the value of such a timestamp based on the 
data context time zone, then treat it like a regular {{TIMESTAMP}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6045) CURRENT_TIMESTAMP has incorrect return type

2023-10-16 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775930#comment-17775930
 ] 

Will Noble commented on CALCITE-6045:
-

This would seem to be a case where the standard SQL function 
{{CURRENT_TIMESTAMP}} (meant to return a {{{}TIMESTAMP WITH TIME ZONE}}) has a 
name collision with the BigQuery-specific function {{CURRENT_TIMESTAMP}} (mean 
to return a {{{}TIMESTAMP WITH LOCAL TIME ZONE}} a.k.a. BigQuery-specific 
{{TIMESTAMP}}). Is there a standard procedure for handling function name 
collisions between standard SQL and particular dialects?

> CURRENT_TIMESTAMP has incorrect return type
> ---
>
> Key: CALCITE-6045
> URL: https://issues.apache.org/jira/browse/CALCITE-6045
> Project: Calcite
>  Issue Type: Bug
>Reporter: Tanner Clary
>Priority: Major
>
> When trying to work on CALCITE-6021, I noticed that {{CURRENT_TIMESTAMP}} 
> currently returns type {{TIMESTAMP}} when it should be 
> {{TIMESTAMP_WITH_LOCAL_TIME_ZONE}}.
> After modifying it, I noticed function was returning the time from (UTC - 
> System TZ) hours ago. For example, I am in {{America/Los_Angeles}} and if I 
> called the function at {{2023-10-10 13:28:00 America/Los_Angeles}}, it would 
> return {{2023-10-10 06:28:00 America/Los_Angeles}}. 
> I think this is because the DataContext {{CURRENT_TIMESTAMP}} variable, which 
> is meant to represent milliseconds since epoch UTC, actually has the timezone 
> offset applied in {{CalciteConnectionImpl#DataContextImpl}} 
> [here|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/jdbc/CalciteConnectionImpl.java#L442].
>  To be clear: it is meant to represent millis since epoch UTC, but instead it 
> is millis since epoch [system tz], as I understand it. 
> Additionally, I believe the {{getString()}} method for timestamps in 
> AvaticaResultSet should behave similarly to 
> [{{SqlFunctions#timestampWithLocalTimezoneToString()}}|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L4021]
>  when dealing with a {{TIMESTAMP WITH LOCAL TIME ZONE}}. Right now, it does 
> not take the timezone into consideration so although it represents the 
> accurate instant in time, it displays differently than 
> {{CAST(CURRENT_TIMESTAMP AS VARCHAR)}}.
> For example, {{SELECT CURRENT_TIMESTAMP, CAST(CURRENT_TIMESTAMP AS 
> VARCHAR)}}, with the correct return type, returns something like:
> {{2023-10-10 13:28:00 |  2023-10-10 06:28:00.000 America/Los_Angeles}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-5636) Consult type mappings when coercing types

2023-09-06 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762525#comment-17762525
 ] 

Will Noble edited comment on CALCITE-5636 at 9/7/23 12:58 AM:
--

Finally getting back to this. Hopefully this is a much better description of 
the problem. It occurs when using BigQuery's catalog type map (where 
{{TIMESTAMP}} maps to {{TIMESTAMP WITH LOCAL TIME ZONE}}) and 
{{SqlToRelConverter}} encounters a query like this:
{code:sql}
SELECT TIMESTAMP_ADD(DATE(1900, 1, 1), INTERVAL (1) DAY) AS `ts`
{code}
The parsed {{SqlNode}} has validated type {{TIMESTAMP(0) NOT NULL}}, and it 
ends up converting into the following relational expression:
{code:java}
LogicalProject(ts=[+(CAST(DATE(1900, 1, 1)):TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) 
NOT NULL, *(8640:INTERVAL DAY, 1))])
{code}
which of course has converted type {{TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) NOT 
NULL}}, so the type of the {{SqlNode}} does not match the type of the 
{{{}RelNode{}}}, and it triggers [this 
exception|https://github.com/apache/calcite/blob/50c3edfc3d6630528ab51fe836bd50df82cc7db8/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L489].

Btw, despite the fact that a BQ {{TIMESTAMP}} is really a timestamp w/ ltz, the 
{{TIMESTAMP_ADD()}} function actually returns a {{DATETIME}} object (ISO 
timestamp) when the first argument is a {{DATE}}, so the validated type of the 
{{SqlNode}} is actually correct, as verified by manually running the query 
against BQ. My original PR was mistaken in believing that it should be 
timestamp w/ ltz, as the name seems to imply. Whoops. There may actually be a 
couple bugs at play here.

Anyway, the reason for the discrepancy can be seen in 
{{SqlValidatorImpl.validateSelectList()}}. This invokes {{expandSelectItem()}} 
[here|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L4638],
 which eventually tries to derive the type of the {{TIMESTAMP_ADD()}} call 
[here|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L6555],
 which must validate the operands in order to derive the type of the function 
call, and in so doing coerces the type of the first operand by casting it to an 
ISO timestamp. All of this results in a value in {{expandedSelectItems}} that 
looks like this:

{code:sql}
TIMESTAMP_ADD(CAST(DATE(1900, 1, 1) AS TIMESTAMP(0)), INTERVAL 1 DAY) AS `ts`
{code}

[A few lines 
down|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L4659]
 in {{validateSelectList()}}, we invoke {{validateExpr()}} on that expanded 
node, and this eventually tries to derive the type of of that {{CAST}} 
expression, which results in visiting the inserted {{TIMESTAMP(0)}} expression 
(which is a {{SqlDataTypeSpec}}) 
[here|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L6657],
 and in so doing switches it from {{TIMESTAMP}} to {{TIMESTAMP WITH LOCAL TIME 
ZONE}}, because it thinks that the {{CAST}} expression was written by the user, 
and must therefore be translated according to the catalog reader.

I'm still thinking about how to handle this. My [original 
PR|https://github.com/apache/calcite/pull/3144] tried to handle it by 
consulting the type map when the {{CAST}} is inserted during coercion, but what 
we probably actually want is for the type map to be consulted during validation 
IFF the {{SqlDataTypeSpec}} was actually written by the user, as opposed to 
being inserted during expansion. Perhaps we should introduce a boolean flag to 
{{SqlDataTypeSpec}} that indicates that the type is final?




was (Author: wnoble):
Finally getting back to this. Hopefully this is a much better description of 
the problem. It occurs when using BigQuery's catalog type map (where 
{{TIMESTAMP}} maps to {{TIMESTAMP WITH LOCAL TIME ZONE}}) and 
{{SqlToRelConverter}} encounters a query like this:
{code:sql}
SELECT TIMESTAMP_ADD(DATE(1900, 1, 1), INTERVAL (1) DAY) AS `ts`
{code}
The parsed {{SqlNode}} has validated type {{TIMESTAMP(0) NOT NULL}}, and it 
ends up converting into the following relational expression:
{code:java}
LogicalProject(ts=[+(CAST(DATE(1900, 1, 1)):TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) 
NOT NULL, *(8640:INTERVAL DAY, 1))])
{code}
which of course has converted type {{TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) NOT 
NULL}}, so the type of the {{SqlNode}} does not match the type of the 
{{{}RelNode{}}}, and it triggers [this 
exception|https://github.com/apache/calcite/blob/50c3edfc3d6630528ab51fe836bd50df82cc7db8/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L489].

[jira] [Comment Edited] (CALCITE-5636) Consult type mappings when coercing types

2023-09-06 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762525#comment-17762525
 ] 

Will Noble edited comment on CALCITE-5636 at 9/7/23 12:54 AM:
--

Finally getting back to this. Hopefully this is a much better description of 
the problem. It occurs when using BigQuery's catalog type map (where 
{{TIMESTAMP}} maps to {{TIMESTAMP WITH LOCAL TIME ZONE}}) and 
{{SqlToRelConverter}} encounters a query like this:
{code:sql}
SELECT TIMESTAMP_ADD(DATE(1900, 1, 1), INTERVAL (1) DAY) AS `ts`
{code}
The parsed {{SqlNode}} has validated type {{TIMESTAMP(0) NOT NULL}}, and it 
ends up converting into the following relational expression:
{code:java}
LogicalProject(ts=[+(CAST(DATE(1900, 1, 1)):TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) 
NOT NULL, *(8640:INTERVAL DAY, 1))])
{code}
which of course has converted type {{TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) NOT 
NULL}}, so the type of the {{SqlNode}} does not match the type of the 
{{{}RelNode{}}}, and it triggers [this 
exception|https://github.com/apache/calcite/blob/50c3edfc3d6630528ab51fe836bd50df82cc7db8/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L489].

Btw, despite the fact that a BQ {{TIMESTAMP}} is really a timestamp w/ ltz, the 
{{TIMESTAMP_ADD()}} function actually returns a {{DATETIME}} object (ISO 
timestamp) when the first argument is a {{DATE}}, so the validated type of the 
{{SqlNode}} is actually correct, as verified by manually running the query 
against BQ. My original PR was mistaken in believing that it should be 
timestamp w/ ltz, as the name seems to imply. Whoops. There may actually be a 
couple bugs at play here.

Anyway, the reason for the discrepancy can be seen in 
{{SqlValidatorImpl.validateSelectList()}}. This invokes {{expandSelectItem()}} 
[here|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L4638],
 which eventually tries to derive the type of the {{TIMESTAMP_ADD()}} call 
[here|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L6555],
 which must validate the operands in order to derive the type of the function 
call, and in so doing coerces the type of the first operand by casting it to an 
ISO timestamp. All of this results in a value in {{expandedSelectItems}} that 
looks like this:

{code:sql}
TIMESTAMP_ADD(CAST(DATE(1900, 1, 1) AS TIMESTAMP(0)), INTERVAL 1 DAY) AS `ts`
{code}

[A few lines 
down|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L4659]
 in {{validateSelectList()}}, we invoke {{validateExpr()}} on that expanded 
node, and this eventually tries to derive the type of of that {{CAST}} 
expression, which results in visiting the inserted {{TIMESTAMP(0)}} expression 
(which is a {{SqlDataTypeSpec}}) 
[here|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L6657],
 and in so doing switches it from {{TIMESTAMP}} to {{TIMESTAMP WITH LOCAL TIME 
ZONE}}, because it thinks that the {{CAST}} expression was written by the user, 
and must therefore be translated according to the catalog reader.

I'm still thinking about how to handle this.




was (Author: wnoble):
Finally getting back to this. Hopefully this is a much better description of 
the problem. It occurs when using BigQuery's catalog type map (where 
{{TIMESTAMP}} maps to {{TIMESTAMP WITH LOCAL TIME ZONE}}) and 
{{SqlToRelConverter}} encounters a query like this:
{code:sql}
SELECT TIMESTAMP_ADD(DATE(1900, 1, 1), INTERVAL (1) DAY) AS `ts`
{code}
The parsed {{SqlNode}} has validated type {{TIMESTAMP(0) NOT NULL}}, and it 
ends up converting into the following relational expression:
{code:java}
LogicalProject(ts=[+(CAST(DATE(1900, 1, 1)):TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) 
NOT NULL, *(8640:INTERVAL DAY, 1))])
{code}
which of course has converted type {{TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) NOT 
NULL}}, so the type of the {{SqlNode}} does not match the type of the 
{{{}RelNode{}}}, and it triggers [this 
exception|https://github.com/apache/calcite/blob/50c3edfc3d6630528ab51fe836bd50df82cc7db8/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L489].

Btw, despite the fact that a BQ {{TIMESTAMP}} is really a timestamp w/ ltz, the 
{{TIMESTAMP_ADD()}} function actually returns a {{DATETIME}} object (ISO 
timestamp) when the first argument is a {{DATE}}, so the validated type of the 
{{SqlNode}} is actually correct, as verified by manually running the query 
against BQ. My original PR was mistaken in believing that it should be 
timestamp w/ ltz, as the name seems to imply. Whoops. There may actually be a 

[jira] [Comment Edited] (CALCITE-5636) Consult type mappings when coercing types

2023-09-06 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762525#comment-17762525
 ] 

Will Noble edited comment on CALCITE-5636 at 9/7/23 12:52 AM:
--

Finally getting back to this. Hopefully this is a much better description of 
the problem. It occurs when using BigQuery's catalog type map (where 
{{TIMESTAMP}} maps to {{TIMESTAMP WITH LOCAL TIME ZONE}}) and 
{{SqlToRelConverter}} encounters a query like this:
{code:sql}
SELECT TIMESTAMP_ADD(DATE(1900, 1, 1), INTERVAL (1) DAY) AS `ts`
{code}
The parsed {{SqlNode}} has validated type {{TIMESTAMP(0) NOT NULL}}, and it 
ends up converting into the following relational expression:
{code:java}
LogicalProject(ts=[+(CAST(DATE(1900, 1, 1)):TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) 
NOT NULL, *(8640:INTERVAL DAY, 1))])
{code}
which of course has converted type {{TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) NOT 
NULL}}, so the type of the {{SqlNode}} does not match the type of the 
{{{}RelNode{}}}, and it triggers [this 
exception|https://github.com/apache/calcite/blob/50c3edfc3d6630528ab51fe836bd50df82cc7db8/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L489].

Btw, despite the fact that a BQ {{TIMESTAMP}} is really a timestamp w/ ltz, the 
{{TIMESTAMP_ADD()}} function actually returns a {{DATETIME}} object (ISO 
timestamp) when the first argument is a {{DATE}}, so the validated type of the 
{{SqlNode}} is actually correct, as verified by manually running the query 
against BQ. My original PR was mistaken in believing that it should be 
timestamp w/ ltz, as the name seems to imply. Whoops. There may actually be a 
couple bugs at play here.

Anyway, the reason for the discrepancy can be seen in 
{{SqlValidatorImpl.validateSelectList()}}. This invokes {{expandSelectItem()}} 
[here|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L4638],
 which eventually tries to derive the type of the {{TIMESTAMP_ADD()}} call 
[here|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L6555],
 which must validate the operands in order to derive the type of the function 
call, and in so doing coerces the type of the first operand by casting it to an 
ISO timestamp. All of this results in a value in {{expandedSelectItems}} that 
looks like this:

{code:sql}
TIMESTAMP_ADD(CAST(DATE(1900, 1, 1) AS TIMESTAMP(0)), INTERVAL 1 DAY) AS `ts`
{code}

[A few lines 
down|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L4659]
 in {{validateSelectList()}}, we invoke {{validateExpr()}} on that expanded 
node, and this eventually tries to derive the type of of that {{CAST}} 
expression, which results in visiting the inserted {{SqlDataTypeSpec}} 
[here|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L6657],
 and in so doing switches it from {{TIMESTAMP}} to {{TIMESTAMP WITH LOCAL TIME 
ZONE}}, because it thinks that the {{CAST}} expression was written by the user, 
and must therefore be translated according to the catalog reader.

I'm still thinking about how to handle this.




was (Author: wnoble):
Finally getting back to this. Hopefully this is a much better description of 
the problem. It occurs when using BigQuery's catalog type map (where 
{{TIMESTAMP}} maps to {{TIMESTAMP WITH LOCAL TIME ZONE}}) and 
{{SqlToRelConverter}} encounters a query like this:
{code:sql}
SELECT TIMESTAMP_ADD(DATE(1900, 1, 1), INTERVAL (1) DAY) AS `ts`
{code}
The parsed {{SqlNode}} has validated type {{TIMESTAMP(0) NOT NULL}}, and it 
ends up converting into the following relational expression:
{code:java}
LogicalProject(ts=[+(CAST(DATE(1900, 1, 1)):TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) 
NOT NULL, *(8640:INTERVAL DAY, 1))])
{code}
which of course has converted type {{TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) NOT 
NULL}}, so the type of the {{SqlNode}} does not match the type of the 
{{{}RelNode{}}}, and it triggers [this 
exception|https://github.com/apache/calcite/blob/50c3edfc3d6630528ab51fe836bd50df82cc7db8/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L489].

Btw, despite the fact that a BQ {{TIMESTAMP}} is really a timestamp w/ ltz, the 
{{TIMESTAMP_ADD()}} function actually returns a {{DATETIME}} object (ISO 
timestamp) when the first argument is a {{DATE}}, so the validated type of the 
{{SqlNode}} is actually correct, as verified by manually running the query 
against BQ. My original PR was mistaken in believing that it should be 
timestamp w/ ltz, as the name seems to imply. Whoops. There may actually be a 
couple bugs at play here.

The reason for 

[jira] [Comment Edited] (CALCITE-5636) Consult type mappings when coercing types

2023-09-06 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762525#comment-17762525
 ] 

Will Noble edited comment on CALCITE-5636 at 9/7/23 12:48 AM:
--

Finally getting back to this. Hopefully this is a much better description of 
the problem. It occurs when using BigQuery's catalog type map (where 
{{TIMESTAMP}} maps to {{TIMESTAMP WITH LOCAL TIME ZONE}}) and 
{{SqlToRelConverter}} encounters a query like this:
{code:sql}
SELECT TIMESTAMP_ADD(DATE(1900, 1, 1), INTERVAL (1) DAY) AS `ts`
{code}
The parsed {{SqlNode}} has validated type {{TIMESTAMP(0) NOT NULL}}, and it 
ends up converting into the following relational expression:
{code:java}
LogicalProject(ts=[+(CAST(DATE(1900, 1, 1)):TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) 
NOT NULL, *(8640:INTERVAL DAY, 1))])
{code}
which of course has converted type {{TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) NOT 
NULL}}, so the type of the {{SqlNode}} does not match the type of the 
{{{}RelNode{}}}, and it triggers [this 
exception|https://github.com/apache/calcite/blob/50c3edfc3d6630528ab51fe836bd50df82cc7db8/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L489].

Btw, despite the fact that a BQ {{TIMESTAMP}} is really a timestamp w/ ltz, the 
{{TIMESTAMP_ADD()}} function actually returns a {{DATETIME}} object (ISO 
timestamp) when the first argument is a {{DATE}}, so the validated type of the 
{{SqlNode}} is actually correct, as verified by manually running the query 
against BQ. My original PR was mistaken in believing that it should be 
timestamp w/ ltz, as the name seems to imply. Whoops. There may actually be a 
couple bugs at play here.

The reason for the discrepancy can be seen in 
{{SqlValidatorImpl.validateSelectList()}}. This invokes {{expandSelectItem()}} 
[here|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L4638],
 which eventually tries to derive the type of the {{TIMESTAMP_ADD()}} call 
[here|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L6555],
 which must validate the operands in order to derive the type of the function 
call, and in so doing coerces the type of the first operand by casting it to an 
ISO timestamp. All of this results in a value in {{expandedSelectItems}} that 
looks like this:

{code:sql}
TIMESTAMP_ADD(CAST(DATE(1900, 1, 1) AS TIMESTAMP(0)), INTERVAL 1 DAY) AS `ts`
{code}

[A few lines 
down|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L4659]
 in {{validateSelectList()}}, we invoke {{validateExpr()}} on that expanded 
node, and this eventually tries to derive the type of of that {{CAST}} 
expression, which results in visiting the inserted {{SqlDataTypeSpec}} 
[here|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L6657],
 and in so doing switches it from {{TIMESTAMP}} to {{TIMESTAMP WITH LOCAL TIME 
ZONE}}, because it thinks that the {{CAST}} expression was written by the user, 
and must therefore be translated according to the catalog reader.

I'm still thinking about how to handle this.




was (Author: wnoble):
Finally getting back to this. Hopefully this is a much better description of 
the problem. It occurs when using BigQuery's catalog type map (where 
{{TIMESTAMP}} maps to {{TIMESTAMP WITH LOCAL TIME ZONE}}) and 
{{SqlToRelConverter}} encounters a query like this:
{code:sql}
SELECT TIMESTAMP_ADD(DATE(1900, 1, 1), INTERVAL (1) DAY) AS `ts`
{code}
The parsed {{SqlNode}} has validated type {{TIMESTAMP(0) NOT NULL}}, and it 
ends up converting into the following relational expression:
{code:java}
LogicalProject(ts=[+(CAST(DATE(1900, 1, 1)):TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) 
NOT NULL, *(8640:INTERVAL DAY, 1))])
{code}
which of course has converted type {{TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) NOT 
NULL}}, so the type of the {{SqlNode}} does not match the type of the 
{{{}RelNode{}}}, and it triggers [this 
exception|https://github.com/apache/calcite/blob/50c3edfc3d6630528ab51fe836bd50df82cc7db8/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L489].

Btw, despite the fact that a BQ {{TIMESTAMP}} is really a timestamp w/ ltz, the 
{{TIMESTAMP_ADD()}} function actually returns a {{DATETIME}} object (ISO 
timestamp) when the first argument is a {{DATE}}, so the validated type of the 
{{SqlNode}} is actually correct, as verified by manually running the query 
against BQ. My original PR was mistaken in believing that it should be 
timestamp w/ ltz, as the name seems to imply. Whoops.

The reason for the discrepancy can be seen in 

[jira] [Comment Edited] (CALCITE-5636) Consult type mappings when coercing types

2023-09-06 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762525#comment-17762525
 ] 

Will Noble edited comment on CALCITE-5636 at 9/7/23 12:45 AM:
--

Finally getting back to this. Hopefully this is a much better description of 
the problem. It occurs when using BigQuery's catalog type map (where 
{{TIMESTAMP}} maps to {{TIMESTAMP WITH LOCAL TIME ZONE}}) and 
{{SqlToRelConverter}} encounters a query like this:
{code:sql}
SELECT TIMESTAMP_ADD(DATE(1900, 1, 1), INTERVAL (1) DAY) AS `ts`
{code}
The parsed {{SqlNode}} has validated type {{TIMESTAMP(0) NOT NULL}}, and it 
ends up converting into the following relational expression:
{code:java}
LogicalProject(ts=[+(CAST(DATE(1900, 1, 1)):TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) 
NOT NULL, *(8640:INTERVAL DAY, 1))])
{code}
which of course has converted type {{TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) NOT 
NULL}}, so the type of the {{SqlNode}} does not match the type of the 
{{{}RelNode{}}}, and it triggers [this 
exception|https://github.com/apache/calcite/blob/50c3edfc3d6630528ab51fe836bd50df82cc7db8/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L489].

Btw, despite the fact that a BQ {{TIMESTAMP}} is really a timestamp w/ ltz, the 
{{TIMESTAMP_ADD()}} function actually returns a {{DATETIME}} object (ISO 
timestamp) when the first argument is a {{DATE}}, so the validated type of the 
{{SqlNode}} is actually correct, as verified by manually running the query 
against BQ. My original PR was mistaken in believing that it should be 
timestamp w/ ltz, as the name seems to imply. Whoops.

The reason for the discrepancy can be seen in 
{{SqlValidatorImpl.validateSelectList()}}. This invokes {{expandSelectItem()}} 
[here|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L4638],
 which eventually tries to derive the type of the {{TIMESTAMP_ADD()}} call 
[here|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L6555],
 which must validate the operands in order to derive the type of the function 
call, and in so doing coerces the type of the first operand by casting it to an 
ISO timestamp. All of this results in a value in {{expandedSelectItems}} that 
looks like this:

{code:sql}
TIMESTAMP_ADD(CAST(DATE(1900, 1, 1) AS TIMESTAMP(0)), INTERVAL 1 DAY) AS `ts`
{code}

[A few lines 
down|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L4659]
 in {{validateSelectList()}}, we invoke {{validateExpr()}} on that expanded 
node, and this eventually tries to derive the type of of that {{CAST}} 
expression, which results in visiting the inserted {{SqlDataTypeSpec}} 
[here|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L6657],
 and in so doing switches it from {{TIMESTAMP}} to {{TIMESTAMP WITH LOCAL TIME 
ZONE}}, because it thinks that the {{CAST}} expression was written by the user, 
and must therefore be translated according to the catalog reader.

I'm still thinking about how to handle this.




was (Author: wnoble):
Finally getting back to this. Hopefully this is a much better description of 
the problem. It occurs when using BigQuery's catalog type map (where 
{{TIMESTAMP}} maps to {{TIMESTAMP WITH LOCAL TIME ZONE}}) and 
{{SqlToRelConverter}} encounters a query like this:
{code:sql}
SELECT TIMESTAMP_ADD(DATE(1900, 1, 1), INTERVAL (1) DAY) AS `ts`
{code}
The parsed {{SqlNode}} has validated type {{TIMESTAMP(0) NOT NULL}}, and it 
ends up converting into the following relational expression:
{code:java}
LogicalProject(ts=[+(CAST(DATE(1900, 1, 1)):TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) 
NOT NULL, *(8640:INTERVAL DAY, 1))])
{code}
which of course has converted type {{TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) NOT 
NULL}}, so the type of the {{SqlNode}} does not match the type of the 
{{{}RelNode{}}}, and it triggers [this 
exception|https://github.com/apache/calcite/blob/50c3edfc3d6630528ab51fe836bd50df82cc7db8/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L489].

Btw, despite the fact that a BQ {{TIMESTAMP}} is really a timestamp w/ ltz, the 
{{TIMESTAMP_ADD()}} function actually returns a {{DATETIME}} object (ISO 
timestamp) when the first argument is a {{DATE}}, so the validated type of the 
{{SqlNode}} is actually correct, as verified by manually running the query 
against BQ. My original PR was mistaken in believing that it should be 
timestamp w/ ltz, as the name seems to imply. Whoops.

The reason for the discrepancy can be seen in 
{{SqlValidatorImpl.validateSelectList()}}. This invokes 

[jira] [Commented] (CALCITE-5636) Consult type mappings when coercing types

2023-09-06 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762525#comment-17762525
 ] 

Will Noble commented on CALCITE-5636:
-

Finally getting back to this. Hopefully this is a much better description of 
the problem. It occurs when using BigQuery's catalog type map (where 
{{TIMESTAMP}} maps to {{TIMESTAMP WITH LOCAL TIME ZONE}}) and 
{{SqlToRelConverter}} encounters a query like this:
{code:sql}
SELECT TIMESTAMP_ADD(DATE(1900, 1, 1), INTERVAL (1) DAY) AS `ts`
{code}
The parsed {{SqlNode}} has validated type {{TIMESTAMP(0) NOT NULL}}, and it 
ends up converting into the following relational expression:
{code:java}
LogicalProject(ts=[+(CAST(DATE(1900, 1, 1)):TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) 
NOT NULL, *(8640:INTERVAL DAY, 1))])
{code}
which of course has converted type {{TIMESTAMP_WITH_LOCAL_TIME_ZONE(0) NOT 
NULL}}, so the type of the {{SqlNode}} does not match the type of the 
{{{}RelNode{}}}, and it triggers [this 
exception|https://github.com/apache/calcite/blob/50c3edfc3d6630528ab51fe836bd50df82cc7db8/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L489].

Btw, despite the fact that a BQ {{TIMESTAMP}} is really a timestamp w/ ltz, the 
{{TIMESTAMP_ADD()}} function actually returns a {{DATETIME}} object (ISO 
timestamp) when the first argument is a {{DATE}}, so the validated type of the 
{{SqlNode}} is actually correct, as verified by manually running the query 
against BQ. My original PR was mistaken in believing that it should be 
timestamp w/ ltz, as the name seems to imply. Whoops.

The reason for the discrepancy can be seen in 
{{SqlValidatorImpl.validateSelectList()}}. This invokes {{expandSelectItem()}} 
[here|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L4638],
 which eventually tries to derive the type of the {{TIMESTAMP_ADD()}} call 
[here|https://github.com/apache/calcite/blob/bcf6bd8577b25c563b1c597c70704594a18ca1a3/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L6555],
 which must validate the operands in order to derive the type of the function 
call, and in so doing coerces the type of the first operand by casting it to an 
ISO timestamp. All of this results in a value in {{expandedSelectItems}} that 
looks like this:

{code:sql}
TIMESTAMP_ADD(CAST(DATE(1900, 1, 1) AS TIMESTAMP(0)), INTERVAL 1 DAY) AS `ts`
{code}

A few lines down,



> Consult type mappings when coercing types
> -
>
> Key: CALCITE-5636
> URL: https://issues.apache.org/jira/browse/CALCITE-5636
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Will Noble
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is another offshoot of CALCITE-5346.
> In 
> [{{AbstractTypeCoercion.implicitCast}}|https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/main/java/org/apache/calcite/sql/validate/implicit/AbstractTypeCoercion.java#L713],
>  Calcite may insert {{CAST}} invocations disregarding the root schema type 
> map. This happens during the validation phase, so it's easy enough to read 
> the type map directly in the {{AbstractTypeCoercion}} utility, although it's 
> hard to say if this is an optimal solution. It may make sense to incorporate 
> the type map into the {{RelDataTypeSystem}} which is more widely available in 
> non-validation phases of SQL processing pipelines.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5866) Not all dialects support sorting in sub-queries

2023-07-28 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748764#comment-17748764
 ] 

Will Noble commented on CALCITE-5866:
-

The main thing for us to decide is what the behavior should be for a Calcite 
relational expression, then we can handle each dialect as necessary. If this 
was already hashed out somewhere linkable I'd be curious to know what was said.

# We could decide that any sort or aggregate node effectively de-stabilizes any 
prior sorts, meaning the ordering is lost but the limit and offset are kept. I 
believe that would be consistent with standard SQL databases. When translating 
stable-sorting dialects to rel nodes, we can use a rule that pushes sorts to 
the outermost layer possible, similar to the example I gave in my last comment.
# The canonical behavior of Calcite's rel nodes *could* depart from what a 
"standard SQL" dialect expects, and keep prior sorts stable. When translating 
standard SQL dialects into rel nodes, we can use a rule that removes 
unrestricted inner sorts, so that translating the resulting rel node into a 
stable-sorting dialect produces the same results.

No matter which is chosen, both styles should be accommodable in any dialect. I 
have a feeling these rules I'm describing may already exist.

> Not all dialects support sorting in sub-queries
> ---
>
> Key: CALCITE-5866
> URL: https://issues.apache.org/jira/browse/CALCITE-5866
> Project: Calcite
>  Issue Type: Bug
>Reporter: Will Noble
>Priority: Minor
>
> The rel-to-sql converter inserts subqueries in certain situations, such as 
> when sorting by ordinal (see CALCITE-5768). Certain dialects, such as MSSQL 
> (SQL Server) ignore the {{ORDER BY}} clause in a subquery. We may need a new 
> dialect-level setting like {{canSortInSubQuery}} that dictates whether 
> Calcite can safely insert sub-queries with {{ORDER BY}} clauses in them, or 
> whether it must do everything in it's power to avoid that (such as refusing 
> to sort by ordinal in cases where not all sort expressions are included in 
> the {{SELECT}} list).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-5866) Not all dialects support sorting in sub-queries

2023-07-20 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745321#comment-17745321
 ] 

Will Noble edited comment on CALCITE-5866 at 7/21/23 12:40 AM:
---

Actually, I think we could come up with a general solution using aliases. 
Working on top of the simple examples in my last comment, this could work:

{code:sql}
SELECT "JOB"
FROM (SELECT "JOB", COUNT("ENAME") AS "$f1"
FROM "scott"."EMP"
GROUP BY "JOB") AS "t0"
ORDER BY "JOB", "$f1"
{code}

(I have not actually tested this, but I believe it could work). The trick is 
that, for queries that do not support ordering in sub-queries, we must try to 
move the {{ORDER BY}} clause to the outer query when we insert a sub-query, 
make sure the sub-query has aliases, then sort by alias in the outer query. 
This may require complex logic to make sure it doesn't interfere with other 
sorts, but I can't yet see a reason why it wouldn't work. Is there anything in 
the rel-to-sql converter that already works kinda like this?


was (Author: wnoble):
Actually, I think we could come up with a general solution using aliases. 
Working on top of the simple examples in my last comment, this could work:

{code:sql}
SELECT "JOB"
FROM (SELECT "JOB", COUNT("ENAME") AS "$f1"
FROM "scott"."EMP"
GROUP BY "JOB") AS "t0"
ORDER BY "JOB", "$f1"
{code}

(I have not actually tested this, but I believe it could work). The trick is 
that, for queries that do not support ordering in sub-queries, we must try to 
move the {{ORDER BY}} clause to the outer query when we insert a sub-query, 
make sure the sub-query has aliases, then sort by alias in the outer query.

> Not all dialects support sorting in sub-queries
> ---
>
> Key: CALCITE-5866
> URL: https://issues.apache.org/jira/browse/CALCITE-5866
> Project: Calcite
>  Issue Type: Bug
>Reporter: Will Noble
>Priority: Minor
>
> The rel-to-sql converter inserts subqueries in certain situations, such as 
> when sorting by ordinal (see CALCITE-5768). Certain dialects, such as MSSQL 
> (SQL Server) ignore the {{ORDER BY}} clause in a subquery. We may need a new 
> dialect-level setting like {{canSortInSubQuery}} that dictates whether 
> Calcite can safely insert sub-queries with {{ORDER BY}} clauses in them, or 
> whether it must do everything in it's power to avoid that (such as refusing 
> to sort by ordinal in cases where not all sort expressions are included in 
> the {{SELECT}} list).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5866) Not all dialects support sorting in sub-queries

2023-07-20 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745321#comment-17745321
 ] 

Will Noble commented on CALCITE-5866:
-

Actually, I think we could come up with a general solution using aliases. 
Working on top of the simple examples in my last comment, this could work:

{code:sql}
SELECT "JOB"
FROM (SELECT "JOB", COUNT("ENAME") AS "$f1"
FROM "scott"."EMP"
GROUP BY "JOB") AS "t0"
ORDER BY "JOB", "$f1"
{code}

(I have not actually tested this, but I believe it could work). The trick is 
that, for queries that do not support ordering in sub-queries, we must try to 
move the {{ORDER BY}} clause to the outer query when we insert a sub-query, 
make sure the sub-query has aliases, then sort by alias in the outer query.

> Not all dialects support sorting in sub-queries
> ---
>
> Key: CALCITE-5866
> URL: https://issues.apache.org/jira/browse/CALCITE-5866
> Project: Calcite
>  Issue Type: Bug
>Reporter: Will Noble
>Priority: Minor
>
> The rel-to-sql converter inserts subqueries in certain situations, such as 
> when sorting by ordinal (see CALCITE-5768). Certain dialects, such as MSSQL 
> (SQL Server) ignore the {{ORDER BY}} clause in a subquery. We may need a new 
> dialect-level setting like {{canSortInSubQuery}} that dictates whether 
> Calcite can safely insert sub-queries with {{ORDER BY}} clauses in them, or 
> whether it must do everything in it's power to avoid that (such as refusing 
> to sort by ordinal in cases where not all sort expressions are included in 
> the {{SELECT}} list).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-5866) Not all dialects support sorting in sub-queries

2023-07-20 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745319#comment-17745319
 ] 

Will Noble edited comment on CALCITE-5866 at 7/21/23 12:34 AM:
---

> IMO the `ORDER BY` clause in a sub-query is useless, because it not affects 
> the final result. Please correct it if I am wrong.

Most dialects allow sorting in sub-queries. The ordering of the rows in the 
sub-query will remain stable as they pass through later stages of processing 
(unless they are re-ordered in a later stage). This is a departure from the 
relational model, but MSSQL is the only popular SQL dialect I've encountered 
that does not work this way. Because of this, it is not generally possible to 
sort on something that is not also selected.

Here's the kind of query that motivated this bug in the first place:

{code:sql}
SELECT "JOB"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", COUNT("ENAME")
{code}

While exploring solutions for CALCITE-5808 I came up with this that leverages 
MSSQL's sort-by-ordinal ability:

{code:sql}
SELECT "JOB"
FROM (SELECT "JOB", COUNT("ENAME") AS "$f1"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", 2) AS "t0"
{code}

Of course, this is a problem because it has an {{ORDER BY}} clause in a 
sub-query, which will be ignored. So this query has no dependable order for the 
results. Using sort-by-ordinal (or even sort-by-alias), it would actually be 
impossible to write a single query that sorts on anything that is not a column 
in the final output. In order to achieve that, you must sort by expression. In 
this case, the top example works fine.

But MSSQL does a bad enough job at matching expressions in the {{ORDER BY}} 
list to expressions in the {{SELECT}} list that we cannot depend on 
sort-by-expression in the general case. Something like {{SELECT  ORDER BY }} will fail in the 
general case with the error message ["A constant expression was encountered in 
the ORDER BY 
list"|https://stackoverflow.com/questions/49377988/a-constant-expression-was-encountered-in-the-order-by-list-position-1]
 even for expressions that are not constant. I just encountered this error 
trying to run this query (I know it's complicated but it's hard to say for sure 
where the problem is so I'm reproducing it almost as-is from my real-world 
example):

{code:sql}
SELECT ((CONVERT(VARCHAR(19),
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
   ,120) + '.' + RIGHT('00' + CAST(DATEPART(MCS, CAST(
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
AS DATETIME2)) AS VARCHAR(6)), 6))) AS 
`users.created_microsecond`
FROM `USERS` AS `users`
GROUP BY ((CONVERT(VARCHAR(19),
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
   ,120) + '.' + RIGHT('00' + CAST(DATEPART(MCS, CAST(
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
AS DATETIME2)) AS VARCHAR(6)), 6)))
ORDER BY ((CONVERT(VARCHAR(19),
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
   ,120) + '.' + RIGHT('00' + CAST(DATEPART(MCS, CAST(
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
AS DATETIME2)) AS VARCHAR(6)), 6))) DESC
FETCH NEXT 1000 ROWS ONLY
{code}

As you can see, the {{ORDER BY}} expression is the same as the one in the 
{{SELECT}} and {{GROUP BY}} lists, but MSSQL thinks it's a constant expression.

Taken together, this means:

# We cannot generally rely on sorting by ordinal (or alias) because we need to 
insert sub-queries for sort expressions that aren't also selected, and we can't 
sort sub-queries.
# We cannot generally rely on sorting by expression because MSSQL does a bad 
job of recognizing when expressions are constant or not. This part is very 
poorly documented, because clearly sorting by expression works in some 
circumstances (and is in fact the only way to achieve certain results, like 
sorting by an un-selected expression), but not others.

I'm not sure how we can reconcile these in a reliable way. Perhaps the only way 
is to file bugs against SQL Server.


was (Author: wnoble):
> IMO the `ORDER BY` clause in a sub-query is useless, because it not affects 
> the final result. Please correct it if I am wrong.

Most dialects allow sorting in sub-queries. This is a departure from the 
relational model, but MSSQL is the only popular SQL dialect I've encountered 
that does not work this way. Because of this, it is not generally possible to 
sort on something that is not also selected.

Here's the kind of 

[jira] [Comment Edited] (CALCITE-5866) Not all dialects support sorting in sub-queries

2023-07-20 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745319#comment-17745319
 ] 

Will Noble edited comment on CALCITE-5866 at 7/21/23 12:21 AM:
---

> IMO the `ORDER BY` clause in a sub-query is useless, because it not affects 
> the final result. Please correct it if I am wrong.

Most dialects allow sorting in sub-queries. This is a departure from the 
relational model, but MSSQL is the only popular SQL dialect I've encountered 
that does not work this way. Because of this, it is not generally possible to 
sort on something that is not also selected.

Here's the kind of query that motivated this bug in the first place:

{code:sql}
SELECT "JOB"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", COUNT("ENAME")
{code}

While exploring solutions for CALCITE-5808 I came up with this that leverages 
MSSQL's sort-by-ordinal ability:

{code:sql}
SELECT "JOB"
FROM (SELECT "JOB", COUNT("ENAME") AS "$f1"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", 2) AS "t0"
{code}

Of course, this is a problem because it has an {{ORDER BY}} clause in a 
sub-query, which will be ignored. So this query has no dependable order for the 
results. Using sort-by-ordinal (or even sort-by-alias), it would actually be 
impossible to write a single query that sorts on anything that is not a column 
in the final output. In order to achieve that, you must sort by expression. In 
this case, the top example works fine.

But MSSQL does a bad enough job at matching expressions in the {{ORDER BY}} 
list to expressions in the {{SELECT}} list that we cannot depend on 
sort-by-expression in the general case. Something like {{SELECT  ORDER BY }} will fail in the 
general case with the error message ["A constant expression was encountered in 
the ORDER BY 
list"|https://stackoverflow.com/questions/49377988/a-constant-expression-was-encountered-in-the-order-by-list-position-1]
 even for expressions that are not constant. I just encountered this error 
trying to run this query (I know it's complicated but it's hard to say for sure 
where the problem is so I'm reproducing it almost as-is from my real-world 
example):

{code:sql}
SELECT ((CONVERT(VARCHAR(19),
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
   ,120) + '.' + RIGHT('00' + CAST(DATEPART(MCS, CAST(
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
AS DATETIME2)) AS VARCHAR(6)), 6))) AS 
`users.created_microsecond`
FROM `USERS` AS `users`
GROUP BY ((CONVERT(VARCHAR(19),
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
   ,120) + '.' + RIGHT('00' + CAST(DATEPART(MCS, CAST(
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
AS DATETIME2)) AS VARCHAR(6)), 6)))
ORDER BY ((CONVERT(VARCHAR(19),
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
   ,120) + '.' + RIGHT('00' + CAST(DATEPART(MCS, CAST(
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
AS DATETIME2)) AS VARCHAR(6)), 6))) DESC
FETCH NEXT 1000 ROWS ONLY
{code}

As you can see, the {{ORDER BY}} expression is the same as the one in the 
{{SELECT}} and {{GROUP BY}} lists, but MSSQL thinks it's a constant expression.

Taken together, this means:

# We cannot generally rely on sorting by ordinal (or alias) because we need to 
insert sub-queries for sort expressions that aren't also selected, and we can't 
sort sub-queries.
# We cannot generally rely on sorting by expression because MSSQL does a bad 
job of recognizing when expressions are constant or not. This part is very 
poorly documented, because clearly sorting by expression works in some 
circumstances (and is in fact the only way to achieve certain results, like 
sorting by an un-selected expression), but not others.

I'm not sure how we can reconcile these in a reliable way. Perhaps the only way 
is to file bugs against SQL Server.


was (Author: wnoble):
> IMO the `ORDER BY` clause in a sub-query is useless, because it not affects 
> the final result. Please correct it if I am wrong.

Most dialects allow sorting in sub-queries. This is a departure from the 
relational model, but MSSQL is the only popular SQL dialect I've encountered 
that does not work this was. Because of this, it is not generally possible to 
sort on something that is not also selected.

Here's the kind of query that motivated this bug in the first place:

{code:sql}
SELECT "JOB"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", COUNT("ENAME")
{code}

While 

[jira] [Comment Edited] (CALCITE-5866) Not all dialects support sorting in sub-queries

2023-07-20 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745319#comment-17745319
 ] 

Will Noble edited comment on CALCITE-5866 at 7/21/23 12:18 AM:
---

> IMO the `ORDER BY` clause in a sub-query is useless, because it not affects 
> the final result. Please correct it if I am wrong.

Most dialects allow sorting in sub-queries. This is a departure from the 
relational model, but MSSQL is the only popular SQL dialect I've encountered 
that does not work this was. Because of this, it is not generally possible to 
sort on something that is not also selected.

Here's the kind of query that motivated this bug in the first place:

{code:sql}
SELECT "JOB"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", COUNT("ENAME")
{code}

While exploring solutions for CALCITE-5808 I came up with this that leverages 
MSSQL's sort-by-ordinal ability:

{code:sql}
SELECT "JOB"
FROM (SELECT "JOB", COUNT("ENAME") AS "$f1"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", 2) AS "t0"
{code}

Of course, this is a problem because it has an {{ORDER BY}} clause in a 
sub-query, which will be ignored. So this query has no dependable order for the 
results. Using sort-by-ordinal (or even sort-by-alias), it would actually be 
impossible to write a single query that sorts on anything that is not a column 
in the final output. In order to achieve that, you must sort by expression. In 
this case, the top example works fine.

But MSSQL does a bad enough job at matching expressions in the {{ORDER BY}} 
list to expressions in the {{SELECT}} list that we cannot depend on 
sort-by-expression in the general case. Something like {{SELECT  ORDER BY }} will fail in the 
general case with the error message ["A constant expression was encountered in 
the ORDER BY 
list"|https://stackoverflow.com/questions/49377988/a-constant-expression-was-encountered-in-the-order-by-list-position-1]
 even for expressions that are not constant. I just encountered this error 
trying to run this query (I know it's complicated but it's hard to say for sure 
where the problem is so I'm reproducing it almost as-is from my real-world 
example):

{code:sql}
SELECT ((CONVERT(VARCHAR(19),
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
   ,120) + '.' + RIGHT('00' + CAST(DATEPART(MCS, CAST(
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
AS DATETIME2)) AS VARCHAR(6)), 6))) AS 
`users.created_microsecond`
FROM `USERS` AS `users`
GROUP BY ((CONVERT(VARCHAR(19),
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
   ,120) + '.' + RIGHT('00' + CAST(DATEPART(MCS, CAST(
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
AS DATETIME2)) AS VARCHAR(6)), 6)))
ORDER BY ((CONVERT(VARCHAR(19),
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
   ,120) + '.' + RIGHT('00' + CAST(DATEPART(MCS, CAST(
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
AS DATETIME2)) AS VARCHAR(6)), 6))) DESC
FETCH NEXT 1000 ROWS ONLY
{code}

As you can see, the {{ORDER BY}} expression is the same as the one in the 
{{SELECT}} and {{GROUP BY}} lists, but MSSQL thinks it's a constant expression.

Taken together, this means:

# We cannot generally rely on sorting by ordinal (or alias) because we need to 
insert sub-queries for sort expressions that aren't also selected, and we can't 
sort sub-queries.
# We cannot generally rely on sorting by expression because MSSQL does a bad 
job of recognizing when expressions are constant or not. This part is very 
poorly documented, because clearly sorting by expression works in some 
circumstances (and is in fact the only way to achieve certain results, like 
sorting by an un-selected expression), but not others.

I'm not sure how we can reconcile these in a reliable way. Perhaps the only way 
is to file bugs against SQL Server.


was (Author: wnoble):
> IMO the `ORDER BY` clause in a sub-query is useless, because it not affects 
> the final result. Please correct it if I am wrong.

Most dialects allow sorting in sub-queries. This is a departure from the 
relational model, but MSSQL is the only popular SQL dialect I've encountered 
that does not work this was. Because of this, it is not generally possible to 
sort on something that is not also selected.

Here's the kind of query that motivated this bug in the first place:

{code:sql}
SELECT "JOB"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", COUNT("ENAME")
{code}

While 

[jira] [Comment Edited] (CALCITE-5866) Not all dialects support sorting in sub-queries

2023-07-20 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745319#comment-17745319
 ] 

Will Noble edited comment on CALCITE-5866 at 7/21/23 12:17 AM:
---

> IMO the `ORDER BY` clause in a sub-query is useless, because it not affects 
> the final result. Please correct it if I am wrong.

Most dialects allow sorting in sub-queries. This is a departure from the 
relational model, but MSSQL is the only popular SQL dialect I've encountered 
that does not work this was. Because of this, it is not generally possible to 
sort on something that is not also selected.

Here's the kind of query that motivated this bug in the first place:

{code:sql}
SELECT "JOB"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", COUNT("ENAME")
{code}

While exploring solutions for CALCITE-5808 I came up with this that leverages 
MSSQL's sort-by-ordinal ability:

{code:sql}
SELECT "JOB"
FROM (SELECT "JOB", COUNT("ENAME") AS "$f1"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", 2) AS "t0"
{code}

Of course, this is a problem because it has an {{ORDER BY}} clause in a 
sub-query, which will be ignored. So this query has no dependable order for the 
results. Using sort-by-ordinal, it would actually be impossible to write a 
single query that sorts on anything that is not a column in the final output. 
In order to achieve that, you must sort by expression. In this case, the top 
example works fine.

But MSSQL does a bad enough job at matching expressions in the {{ORDER BY}} 
list to expressions in the {{SELECT}} list that we cannot depend on 
sort-by-expression in the general case. Something like {{SELECT  ORDER BY }} will fail in the 
general case with the error message ["A constant expression was encountered in 
the ORDER BY 
list"|https://stackoverflow.com/questions/49377988/a-constant-expression-was-encountered-in-the-order-by-list-position-1]
 even for expressions that are not constant. I just encountered this error 
trying to run this query (I know it's complicated but it's hard to say for sure 
where the problem is so I'm reproducing it almost as-is from my real-world 
example):

{code:sql}
SELECT ((CONVERT(VARCHAR(19),
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
   ,120) + '.' + RIGHT('00' + CAST(DATEPART(MCS, CAST(
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
AS DATETIME2)) AS VARCHAR(6)), 6))) AS 
`users.created_microsecond`
FROM `USERS` AS `users`
GROUP BY ((CONVERT(VARCHAR(19),
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
   ,120) + '.' + RIGHT('00' + CAST(DATEPART(MCS, CAST(
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
AS DATETIME2)) AS VARCHAR(6)), 6)))
ORDER BY ((CONVERT(VARCHAR(19),
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
   ,120) + '.' + RIGHT('00' + CAST(DATEPART(MCS, CAST(
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
AS DATETIME2)) AS VARCHAR(6)), 6))) DESC
FETCH NEXT 1000 ROWS ONLY
{code}

As you can see, the {{ORDER BY}} expression is the same as the one in the 
{{SELECT}} and {{GROUP BY}} lists, but MSSQL thinks it's a constant expression.

Taken together, this means:

# We cannot generally rely on sorting by ordinal (or alias) because we need to 
insert sub-queries for sort expressions that aren't also selected, and we can't 
sort sub-queries.
# We cannot generally rely on sorting by expression because MSSQL does a bad 
job of recognizing when expressions are constant or not. This part is very 
poorly documented, because clearly sorting by expression works in some 
circumstances (and is in fact the only way to achieve certain results, like 
sorting by an un-selected expression), but not others.

I'm not sure how we can reconcile these in a reliable way. Perhaps the only way 
is to file bugs against SQL Server.


was (Author: wnoble):
This is actually a weirder problem than it seems because I've encountered some 
severely under-documented discrepancies in what SQL Server 2017 will accept, 
and it seems like MSSQL is actually incompatible with particular edge-cases 
that other databases can handle.

Here's the kind of query that motivated this bug in the first place:

{code:sql}
SELECT "JOB"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", COUNT("ENAME")
{code}

While exploring solutions for CALCITE-5808 I came up with this that leverages 
MSSQL's sort-by-ordinal ability:

{code:sql}
SELECT "JOB"
FROM (SELECT "JOB", COUNT("ENAME") AS 

[jira] [Commented] (CALCITE-5866) Not all dialects support sorting in sub-queries

2023-07-20 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745319#comment-17745319
 ] 

Will Noble commented on CALCITE-5866:
-

This is actually a weirder problem than it seems because I've encountered some 
severely under-documented discrepancies in what SQL Server 2017 will accept, 
and it seems like MSSQL is actually incompatible with particular edge-cases 
that other databases can handle.

Here's the kind of query that motivated this bug in the first place:

{code:sql}
SELECT "JOB"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", COUNT("ENAME")
{code}

While exploring solutions for CALCITE-5808 I came up with this that leverages 
MSSQL's sort-by-ordinal ability:

{code:sql}
SELECT "JOB"
FROM (SELECT "JOB", COUNT("ENAME") AS "$f1"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", 2) AS "t0"
{code}

Of course, this is a problem because it has an {{ORDER BY}} clause in a 
sub-query, which will be ignored. So this query has no dependable order for the 
results. Using sort-by-ordinal, it would actually be impossible to write a 
single query that sorts on anything that is not a column in the final output. 
In order to achieve that, you must sort by expression. In this case, the top 
example works fine.

But MSSQL does a bad enough job at matching expressions in the {{ORDER BY}} 
list to expressions in the {{SELECT}} list that we cannot depend on 
sort-by-expression in the general case. Something like {{SELECT  ORDER BY }} will fail in the 
general case with the error message ["A constant expression was encountered in 
the ORDER BY 
list"|https://stackoverflow.com/questions/49377988/a-constant-expression-was-encountered-in-the-order-by-list-position-1]
 even for expressions that are not constant. I just encountered this error 
trying to run this query (I know it's complicated but it's hard to say for sure 
where the problem is so I'm reproducing it almost as-is from my real-world 
example):

{code:sql}
SELECT ((CONVERT(VARCHAR(19),
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
   ,120) + '.' + RIGHT('00' + CAST(DATEPART(MCS, CAST(
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
AS DATETIME2)) AS VARCHAR(6)), 6))) AS 
`users.created_microsecond`
FROM `USERS` AS `users`
GROUP BY ((CONVERT(VARCHAR(19),
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
   ,120) + '.' + RIGHT('00' + CAST(DATEPART(MCS, CAST(
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
AS DATETIME2)) AS VARCHAR(6)), 6)))
ORDER BY ((CONVERT(VARCHAR(19),
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
   ,120) + '.' + RIGHT('00' + CAST(DATEPART(MCS, CAST(
CAST(CASE WHEN 1 > 0 THEN '2016-02-01 00:30:00.567891' ELSE 
users.name END AS DATETIME2)
  
AS DATETIME2)) AS VARCHAR(6)), 6))) DESC
FETCH NEXT 1000 ROWS ONLY
{code}

As you can see, the {{ORDER BY}} expression is the same as the one in the 
{{SELECT}} and {{GROUP BY}} lists, but MSSQL thinks it's a constant expression.

Taken together, this means:

# We cannot generally rely on sorting by ordinal (or alias) because we need to 
insert sub-queries for sort expressions that aren't also selected, and we can't 
sort sub-queries.
# We cannot generally rely on sorting by expression because MSSQL does a bad 
job of recognizing when expressions are constant or not. This part is very 
poorly documented, because clearly sorting by expression works in some 
circumstances (and is in fact the only way to achieve certain results, like 
sorting by an un-selected expression), but not others.

I'm not sure how we can reconcile these in a reliable way. Perhaps the only way 
is to file bugs against SQL Server.

> Not all dialects support sorting in sub-queries
> ---
>
> Key: CALCITE-5866
> URL: https://issues.apache.org/jira/browse/CALCITE-5866
> Project: Calcite
>  Issue Type: Bug
>Reporter: Will Noble
>Priority: Minor
>
> The rel-to-sql converter inserts subqueries in certain situations, such as 
> when sorting by ordinal (see CALCITE-5768). Certain dialects, such as MSSQL 
> (SQL Server) ignore the {{ORDER BY}} clause in a subquery. We may need a new 
> dialect-level setting like {{canSortInSubQuery}} that dictates whether 
> Calcite can safely insert sub-queries with {{ORDER BY}} clauses in them, or 
> whether it must do everything in it's power to avoid that (such as refusing 
> to sort by ordinal in 

[jira] [Created] (CALCITE-5866) Not all dialects support sorting in sub-queries

2023-07-19 Thread Will Noble (Jira)
Will Noble created CALCITE-5866:
---

 Summary: Not all dialects support sorting in sub-queries
 Key: CALCITE-5866
 URL: https://issues.apache.org/jira/browse/CALCITE-5866
 Project: Calcite
  Issue Type: Bug
Reporter: Will Noble


The rel-to-sql converter inserts subqueries in certain situations, such as when 
sorting by ordinal (see CALCITE-5768). Certain dialects, such as MSSQL (SQL 
Server) ignore the {{ORDER BY}} clause in a subquery. We may need a new 
dialect-level setting like {{canSortInSubQuery}} that dictates whether Calcite 
can safely insert sub-queries with {{ORDER BY}} clauses in them, or whether it 
must do everything in it's power to avoid that (such as refusing to sort by 
ordinal in cases where not all sort expressions are included in the {{SELECT}} 
list).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5808) Rel-to-Sql conversion should better support grouping or sorting by references or ordinals

2023-06-28 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5808:

Description: 
While exploring solutions to CALCITE-5724 I produced [this draft 
commit|https://github.com/apache/calcite/commit/d644ecee44ffd7927a62be96329cd5456f545de2]
 to explore what the Rel-to-Sql conversion process would look like if it were 
heavily biased toward using ordinals in the {{GROUP BY}} and {{ORDER BY}} 
clauses. It works for most common queries, but I gave up with basically 2 
problems yet to be solved:

# Grouping with {{ROLLUP}}, {{CUBE}}, or {{GROUPING SETS}} does not allow 
ordinals (at least [according to this presto 
ticket|https://github.com/prestodb/presto/issues/9522]; I didn't double-check 
any standards or dialects), so my draft solution would have to distinguish 
between these cases and a simple group-by, and only use ordinals in the simple 
case.
# Window functions with an {{ORDER BY}} clause also use ordinals in my draft. 
It seems likely that ordinals are also disallowed in this context, but I'm not 
sure. Looking at the results in 
{{RelToSqlConverterTest.testConvertWindowToSql()}}, it certainly looks like the 
query is incorrect even if ordinals could hypothetically go in a window's 
{{ORDER BY}}.

So, I've decided to stop working on this for now, but wanted to preserve my 
draft change and start a discussion on this ticket. Here are some thoughts:

# It seems like we can reach a happy middle ground on sorting by using ordinals 
whenever the dialect allows *and* the expression returned by {{field()}} is 
anything other than a {{SqlIdentifier}} (i.e. a named column reference). 
Existing behavior as of writing is to only use ordinals when the expression is 
a {{SqlCall}}, but we should use it for literals as well. That would solve 
CALCITE-5724, but still use named references for "simple" collations, which 
seems to be the case for all window functions.
# When it comes to grouping, things are more complicated. As of writing, 
Calcite tends to group by expressions, but some dialects (e.g. BigQuery) can 
get easily confused by this, even when the expression in the {{SELECT}} list 
perfectly matches that in the {{GROUP BY}} clause. With our Calcite 
integration, we need customized cleanup logic to rewrite the {{GROUP BY}} 
clauses in terms of aliases / ordinals in the {{SELECT}} list whenever 
possible, as a band-aid for this problem. We want to get rid of this band-aid 
and upstream a proper solution, and I thought using ordinals could be it, but 
the problems with {{GROUPING SETS}} et al still needs to be solved, and Calcite 
seems to have poor support for grouping by ordinals in general (see the changes 
I had to make to {{RelToSqlConverter.generateGroupList}} in my draft) which 
should be improved.

  was:
While exploring solutions to CALCITE-5724 I produced [this draft 
commit|https://github.com/apache/calcite/commit/d644ecee44ffd7927a62be96329cd5456f545de2]
 to explore what the Rel-to-Sql conversion process would look like if it were 
heavily biased toward using ordinals in the {{GROUP BY}} and {{ORDER BY}} 
clauses. It works for most common queries, but I gave up with basically 2 
problems yet to be solved:

# Grouping with {{ROLLUP}}, {{CUBE}}, or {{GROUPING SETS}} does not allow 
ordinals (at least [according to this presto 
ticket|https://github.com/prestodb/presto/issues/9522]; I didn't double-check 
any standards or dialects), so my draft solution would have to distinguish 
between these cases and a simple group-by, and only use ordinals in the simple 
case.
# Window functions with an {{ORDER BY}} clause also use ordinals in my draft. 
It seems likely that ordinals are also disallowed in this context, but I'm not 
sure. Looking at the results in 
{{RelToSqlConverterTest.testConvertWindowToSql()}}, it certainly looks like the 
query is incorrect even if ordinals can hypothetically go in a window's {{ORDER 
BY}}.

So, I've decided to stop working on this for now, but wanted to preserve my 
draft change and start a discussion on this ticket. Here are some thoughts:

# It seems like we can reach a happy middle ground on sorting by using ordinals 
whenever the dialect allows *and* the expression returned by {{field()}} is 
anything other than a {{SqlIdentifier}} (i.e. a named column reference). 
Existing behavior as of writing is to only use ordinals when the expression is 
a {{SqlCall}}, but we should use it for literals as well. That would solve 
CALCITE-5724, but still use named references for "simple" collations, which 
seems to be the case for all window functions.
# When it comes to grouping, things are more complicated. As of writing, 
Calcite tends to group by expressions, but some dialects (e.g. BigQuery) can 
get easily confused by this, even when the expression in the {{SELECT}} list 
perfectly matches that in the {{GROUP BY}} clause. With our Calcite 

[jira] [Updated] (CALCITE-5808) Rel-to-Sql conversion should better support grouping or sorting by references or ordinals

2023-06-28 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5808:

Description: 
While exploring solutions to CALCITE-5724 I produced [this draft 
commit|https://github.com/apache/calcite/commit/d644ecee44ffd7927a62be96329cd5456f545de2]
 to explore what the Rel-to-Sql conversion process would look like if it were 
heavily biased toward using ordinals in the {{GROUP BY}} and {{ORDER BY}} 
clauses. It works for most common queries, but I gave up with basically 2 
problems yet to be solved:

# Grouping with {{ROLLUP}}, {{CUBE}}, or {{GROUPING SETS}} does not allow 
ordinals (at least [according to this presto 
ticket|https://github.com/prestodb/presto/issues/9522]; I didn't double-check 
any standards or dialects), so my draft solution would have to distinguish 
between these cases and a simple group-by, and only use ordinals in the simple 
case.
# Window functions with an {{ORDER BY}} clause also use ordinals in my draft. 
It seems likely that ordinals are also disallowed in this context, but I'm not 
sure. Looking at the results in 
{{RelToSqlConverterTest.testConvertWindowToSql()}}, it certainly looks like the 
query is incorrect even if ordinals can hypothetically go in a window's {{ORDER 
BY}}.

So, I've decided to stop working on this for now, but wanted to preserve my 
draft change and start a discussion on this ticket. Here are some thoughts:

# It seems like we can reach a happy middle ground on sorting by using ordinals 
whenever the dialect allows *and* the expression returned by {{field()}} is 
anything other than a {{SqlIdentifier}} (i.e. a named column reference). 
Existing behavior as of writing is to only use ordinals when the expression is 
a {{SqlCall}}, but we should use it for literals as well. That would solve 
CALCITE-5724, but still use named references for "simple" collations, which 
seems to be the case for all window functions.
# When it comes to grouping, things are more complicated. As of writing, 
Calcite tends to group by expressions, but some dialects (e.g. BigQuery) can 
get easily confused by this, even when the expression in the {{SELECT}} list 
perfectly matches that in the {{GROUP BY}} clause. With our Calcite 
integration, we need customized cleanup logic to rewrite the {{GROUP BY}} 
clauses in terms of aliases / ordinals in the {{SELECT}} list whenever 
possible, as a band-aid for this problem. We want to get rid of this band-aid 
and upstream a proper solution, and I thought using ordinals could be it, but 
the problems with {{GROUPING SETS}} et al still needs to be solved, and Calcite 
seems to have poor support for grouping by ordinals in general (see the changes 
I had to make to {{RelToSqlConverter.generateGroupList}} in my draft) which 
should be improved.

  was:
While exploring solutions to CALCITE-5724 I produced [this draft 
PR|https://github.com/apache/calcite/commit/d644ecee44ffd7927a62be96329cd5456f545de2]
 to explore what the Rel-to-Sql conversion process would look like if it were 
heavily biased toward using ordinals in the {{GROUP BY}} and {{ORDER BY}} 
clauses. It works for most common queries, but I gave up with basically 2 
problems yet to be solved:

# Grouping with {{ROLLUP}}, {{CUBE}}, or {{GROUPING SETS}} does not allow 
ordinals (at least [according to this presto 
ticket|https://github.com/prestodb/presto/issues/9522]; I didn't double-check 
any standards or dialects), so my draft solution would have to distinguish 
between these cases and a simple group-by, and only use ordinals in the simple 
case.
# Window functions with an {{ORDER BY}} clause also use ordinals in my draft. 
It seems likely that ordinals are also disallowed in this context, but I'm not 
sure. Looking at the results in 
{{RelToSqlConverterTest.testConvertWindowToSql()}}, it certainly looks like the 
query is incorrect even if ordinals can hypothetically go in a window's {{ORDER 
BY}}.

So, I've decided to stop working on this for now, but wanted to preserve my 
draft change and start a discussion on this ticket. Here are some thoughts:

# It seems like we can reach a happy middle ground on sorting by using ordinals 
whenever the dialect allows *and* the expression returned by {{field()}} is 
anything other than a {{SqlIdentifier}} (i.e. a named column reference). 
Existing behavior as of writing is to only use ordinals when the expression is 
a {{SqlCall}}, but we should use it for literals as well. That would solve 
CALCITE-5724, but still use named references for "simple" collations, which 
seems to be the case for all window functions.
# When it comes to grouping, things are more complicated. As of writing, 
Calcite tends to group by expressions, but some dialects (e.g. BigQuery) can 
get easily confused by this, even when the expression in the {{SELECT}} list 
perfectly matches that in the {{GROUP BY}} clause. With our Calcite 
integration, 

[jira] [Created] (CALCITE-5808) Rel-to-Sql conversion should better support grouping or sorting by references or ordinals

2023-06-28 Thread Will Noble (Jira)
Will Noble created CALCITE-5808:
---

 Summary: Rel-to-Sql conversion should better support grouping or 
sorting by references or ordinals
 Key: CALCITE-5808
 URL: https://issues.apache.org/jira/browse/CALCITE-5808
 Project: Calcite
  Issue Type: Improvement
Reporter: Will Noble


While exploring solutions to CALCITE-5724 I produced [this draft 
PR|https://github.com/apache/calcite/commit/d644ecee44ffd7927a62be96329cd5456f545de2]
 to explore what the Rel-to-Sql conversion process would look like if it were 
heavily biased toward using ordinals in the {{GROUP BY}} and {{ORDER BY}} 
clauses. It works for most common queries, but I gave up with basically 2 
problems yet to be solved:

# Grouping with {{ROLLUP}}, {{CUBE}}, or {{GROUPING SETS}} does not allow 
ordinals (at least [according to this presto 
ticket|https://github.com/prestodb/presto/issues/9522]; I didn't double-check 
any standards or dialects), so my draft solution would have to distinguish 
between these cases and a simple group-by, and only use ordinals in the simple 
case.
# Window functions with an {{ORDER BY}} clause also use ordinals in my draft. 
It seems likely that ordinals are also disallowed in this context, but I'm not 
sure. Looking at the results in 
{{RelToSqlConverterTest.testConvertWindowToSql()}}, it certainly looks like the 
query is incorrect even if ordinals can hypothetically go in a window's {{ORDER 
BY}}.

So, I've decided to stop working on this for now, but wanted to preserve my 
draft change and start a discussion on this ticket. Here are some thoughts:

# It seems like we can reach a happy middle ground on sorting by using ordinals 
whenever the dialect allows *and* the expression returned by {{field()}} is 
anything other than a {{SqlIdentifier}} (i.e. a named column reference). 
Existing behavior as of writing is to only use ordinals when the expression is 
a {{SqlCall}}, but we should use it for literals as well. That would solve 
CALCITE-5724, but still use named references for "simple" collations, which 
seems to be the case for all window functions.
# When it comes to grouping, things are more complicated. As of writing, 
Calcite tends to group by expressions, but some dialects (e.g. BigQuery) can 
get easily confused by this, even when the expression in the {{SELECT}} list 
perfectly matches that in the {{GROUP BY}} clause. With our Calcite 
integration, we need customized cleanup logic to rewrite the {{GROUP BY}} 
clauses in terms of aliases / ordinals in the {{SELECT}} list whenever 
possible, as a band-aid for this problem. We want to get rid of this band-aid 
and upstream a proper solution, and I thought using ordinals could be it, but 
the problems with {{GROUPING SETS}} et al still needs to be solved, and Calcite 
seems to have poor support for grouping by ordinals in general (see the changes 
I had to make to {{RelToSqlConverter.generateGroupList}} in my draft) which 
should be improved.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-5724) Generated SQL uses literal values in ORDER BY clauses

2023-06-26 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737418#comment-17737418
 ] 

Will Noble edited comment on CALCITE-5724 at 6/27/23 2:44 AM:
--

I think there's a better way. Between CALCITE-5044 and CALCITE-5510, the 
behavior of {{SqlImplementor.Context.orderField}} went from simply copying the 
expression from the select list, to this complex behavior:

# If the dialect is sort-by-ordinal and the expression is a literal integer, 
convert it to a literal string, and sort by that string expression.
# If the dialect is sort-by-ordinal and the expression is a {{SqlBasicCall}}, 
sort by ordinal.
# Otherwise, sort by expression.

It seems like a much better state of affairs would be this:

# If the dialect is sort-by-ordinal, sort by ordinal.
# Otherwise, sort by expression.

Biasing ourselves more consistently toward sorting by ordinal like this causes 
the rel-to-sql converter to generate more sub-queries, the logic for which was 
recently fixed in CALCITE-5768. I have a hunch that bug was stopping someone 
from implementing this seemingly-more-obvious solution in the first place.


was (Author: wnoble):
I think there's a better way. Between CALCITE-5044 and CALCITE-5510, the 
behavior of {{SqlImplementor.Context.orderField}} went from simply copying the 
expression from the select list, to this complex behavior:

# If the dialect is sort-by-ordinal and the expression is a literal integer, 
convert it to a literal string.
# If the dialect is sort-by-ordinal and the expression is a {{SqlBasicCall}}, 
sort by ordinal.
# Otherwise, sort by expression.

It seems like a much better state of affairs would be this:

# If the dialect is sort-by-ordinal, sort by ordinal.
# Otherwise, sort by expression.

Biasing ourselves more consistently toward sorting by ordinal like this causes 
the rel-to-sql converter to generate more sub-queries, the logic for which was 
recently fixed in CALCITE-5768. I have a hunch that bug was stopping someone 
from implementing this seemingly-more-obvious solution in the first place.

> Generated SQL uses literal values in ORDER BY clauses
> -
>
> Key: CALCITE-5724
> URL: https://issues.apache.org/jira/browse/CALCITE-5724
> Project: Calcite
>  Issue Type: Bug
>Reporter: Joey Moore
>Assignee: Joey Moore
>Priority: Major
>  Labels: pull-request-available
>
> Current behavior in the SqlImplementor will generate SqlCharStringLiterals in 
> ORDER BY fields when there is a Literal value in the SELECT clause. This 
> happens in languages with isSortByOrdinal(). This leads to errors in dialects 
> in which cannot have literal values in ORDER BY clauses such as BigQuery. 
> Proposed fix is to use ordinals in all cases where a literal value is present 
> in the SELECT clause.
> Example of current implementation:
> {code:java}
> select 3.14159265 as pi 
> from \"product\"
> order by 1;
> {code}
> Will return 
> {code:java}
> SELECT 3.14159265 AS \"PI\"
> FROM \"foodmart\".\"product\"
> ORDER BY '3.14159265'{code}
> Proposed implementation will return :
> {code:java}
> SELECT 3.14159265 AS \"PI\"
> FROM \"foodmart\".\"product\"
> ORDER BY 1{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5724) Generated SQL uses literal values in ORDER BY clauses

2023-06-26 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737418#comment-17737418
 ] 

Will Noble commented on CALCITE-5724:
-

I think there's a better way. Between CALCITE-5044 and CALCITE-5510, the 
behavior of {{SqlImplementor.Context.orderField}} went from simply copying the 
expression from the select list, to this complex behavior:

# If the dialect is sort-by-ordinal and the expression is a literal integer, 
convert it to a literal string.
# If the dialect is sort-by-ordinal and the expression is a {{SqlBasicCall}}, 
sort by ordinal.
# Otherwise, sort by expression.

It seems like a much better state of affairs would be this:

# If the dialect is sort-by-ordinal, sort by ordinal.
# Otherwise, sort by expression.

Biasing ourselves more consistently toward sorting by ordinal like this causes 
the rel-to-sql converter to generate more sub-queries, the logic for which was 
recently fixed in CALCITE-5768. I have a hunch that bug was stopping someone 
from implementing this seemingly-more-obvious solution in the first place.

> Generated SQL uses literal values in ORDER BY clauses
> -
>
> Key: CALCITE-5724
> URL: https://issues.apache.org/jira/browse/CALCITE-5724
> Project: Calcite
>  Issue Type: Bug
>Reporter: Joey Moore
>Assignee: Joey Moore
>Priority: Major
>  Labels: pull-request-available
>
> Current behavior in the SqlImplementor will generate SqlCharStringLiterals in 
> ORDER BY fields when there is a Literal value in the SELECT clause. This 
> happens in languages with isSortByOrdinal(). This leads to errors in dialects 
> in which cannot have literal values in ORDER BY clauses such as BigQuery. 
> Proposed fix is to use ordinals in all cases where a literal value is present 
> in the SELECT clause.
> Example of current implementation:
> {code:java}
> select 3.14159265 as pi 
> from \"product\"
> order by 1;
> {code}
> Will return 
> {code:java}
> SELECT 3.14159265 AS \"PI\"
> FROM \"foodmart\".\"product\"
> ORDER BY '3.14159265'{code}
> Proposed implementation will return :
> {code:java}
> SELECT 3.14159265 AS \"PI\"
> FROM \"foodmart\".\"product\"
> ORDER BY 1{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5775) Null direction emulation broken for complex expressions on some dialects

2023-06-21 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17735913#comment-17735913
 ] 

Will Noble commented on CALCITE-5775:
-

Filed CALCITE-5793.

Calcite already is so inclined to group and order by ordinals for BQ. It just 
doesn't work for null direction emulation since that requires adding an 
addition {{IS NULL}} operator, and the only operators you can apply to sorting 
ordinals is {{DESC}} or {{NULLS FIRST}} / {{LAST}}, so it has to expand it into 
the full expression in that case. This could potentially be mitigated by 
introducing a subquery with the {{IS NULL}} expression in the select list. This 
may be worth considering.

I believe that previous bug you're talking about is CALCITE-5767. I'm not sure 
if that's related to the claims that you think are red herrings, but the 
problems outlined in this bug mean that only a single null direction (the 
default for whatever dialect is being unparsed) can possibly work. The user 
does not have the ability to use the non-default null direction if it has to be 
emulated with an extra {{IS NULL}} sort.

> Null direction emulation broken for complex expressions on some dialects
> 
>
> Key: CALCITE-5775
> URL: https://issues.apache.org/jira/browse/CALCITE-5775
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Will Noble
>Priority: Minor
>  Labels: pull-request-available
>
> This is a problem in BigQuery, and may be a problem in other dialects as 
> well. Consider the following piece of valid BQ SQL:
> {code:sql}
> SELECT REPEAT(first_name, 2),
>COUNT(id)
> FROM looker_test.users
> GROUP BY REPEAT(first_name, 2)
> ORDER BY 1
> {code}
> Now consider a version where the {{ORDER BY}} clause is changed to this:
> {code:sql}
> ORDER BY REPEAT(first_name, 2)
> {code}
> This is logically the same query, because the expression in the {{ORDER}} 
> clause is the same as the one in the {{SELECT}} / {{GROUP}} clauses. BigQuery 
> is sophisticated enough to match the select to the group expression in both 
> queries, but cannot match either with the order expression. It gives this 
> error: _ORDER BY clause expression references column first_name which is 
> neither grouped nor aggregated_.
> So, when sorting by complex expressions in BQ, Calcite relies on either:
> * No null direction emulation required. 
> * sorting by alias or ordinal, which is a problem with current null direction 
> emulation because it adds an extra complex sort expression, or
> * having a query that just happens to also have the underlying field in the 
> {{GROUP BY}} clause by itself, which seems to actually happen pretty often in 
> my testing, but obviously shouldn't be a constraint.
> As I wrote that, I realized this may be easily fixable for BQ since it added 
> support for {{NULLS FIRST}} / {{LAST}} in 2020 and it seems Calcite has not 
> caught up yet. Consider this rel node:
> {code}
> LogicalSort(sort0=[$0], dir0=[ASC])
>   LogicalAggregate(group=[{0}], cent=[COUNT($1)])
> LogicalProject($f0=[CASE(IS NULL($4), 0, 1)], MGR=[$3])
>   JdbcTableScan(table=[[JDBC_SCOTT, EMP]])
> {code}
> Calcite would convert it to this in BigQuery due to null direction emulation:
> {code:sql}
> SELECT CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END AS `$f0`, COUNT(MGR) AS 
> cent
> FROM SCOTT.EMP
> GROUP BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END
> ORDER BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END IS NULL, 1
> {code}
> Which of course triggers the problem described above. This may be a problem 
> for MSSQL as well since it doesn't support {{NULLS LAST}}. The fix for BQ, at 
> least, may be to just support {{NULLS LAST}} and sort by ordinal.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5793) Use NULLS FIRST / LAST when unparsing in BigQuery

2023-06-21 Thread Will Noble (Jira)
Will Noble created CALCITE-5793:
---

 Summary: Use NULLS FIRST / LAST when unparsing in BigQuery
 Key: CALCITE-5793
 URL: https://issues.apache.org/jira/browse/CALCITE-5793
 Project: Calcite
  Issue Type: Improvement
Reporter: Will Noble


BigQuery added support for {{NULLS FIRST}} and {{NULLS LAST}} in 2020. Calcite 
should use it instead of emulating null direction. CALCITE-5775 has context on 
how this was discovered.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5775) Null direction emulation broken for complex expressions on some dialects

2023-06-13 Thread Will Noble (Jira)
Will Noble created CALCITE-5775:
---

 Summary: Null direction emulation broken for complex expressions 
on some dialects
 Key: CALCITE-5775
 URL: https://issues.apache.org/jira/browse/CALCITE-5775
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Will Noble


This is a problem in BigQuery, and may be a problem in other dialects as well. 
Consider the following piece of valid BQ SQL:

{code:sql}
SELECT REPEAT(first_name, 2),
   COUNT(id)
FROM looker_test.users
GROUP BY REPEAT(first_name, 2)
ORDER BY 1
{code}

Now consider a version where the {{ORDER BY}} clause is changed to this:

{code:sql}
ORDER BY REPEAT(first_name, 2)
{code}

This is logically the same query, because the expression in the {{ORDER}} 
clause is the same as the one in the {{SELECT}} / {{GROUP}} clauses. BigQuery 
is sophisticated enough to match the select to the group expression in both 
queries, but cannot match either with the order expression. It gives this 
error: _ORDER BY clause expression references column first_name which is 
neither grouped nor aggregated_.

So, when sorting by complex expressions in BQ, Calcite relies on either:
* No null direction emulation required. 
* sorting by alias or ordinal, which is a problem with current null direction 
emulation because it adds an extra complex sort expression, or
* having a query that just happens to also have the underlying field in the 
{{GROUP BY}} clause by itself, which seems to actually happen pretty often in 
my testing, but obviously shouldn't be a constraint.

As I wrote that, I realized this may be easily fixable for BQ since it added 
support for {{NULLS FIRST}} / {{LAST}} in 2020 and it seems Calcite has not 
caught up yet. Consider this rel node:

{code}
LogicalSort(sort0=[$0], dir0=[ASC])
  LogicalAggregate(group=[{0}], cent=[COUNT($1)])
LogicalProject($f0=[CASE(IS NULL($4), 0, 1)], MGR=[$3])
  JdbcTableScan(table=[[JDBC_SCOTT, EMP]])
{code}

Calcite would convert it to this in BigQuery due to null direction emulation:

{code:sql}
SELECT CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END AS `$f0`, COUNT(MGR) AS cent
FROM SCOTT.EMP
GROUP BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END
ORDER BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END IS NULL, 1
{code}

Which of course triggers the problem described above. This may be a problem for 
MSSQL as well since it doesn't support {{NULLS LAST}}. The fix for BQ, at 
least, may be to just support {{NULLS LAST}} and sort by ordinal.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-5767) Calcite's MSSQL dialect should not give GROUPING special treatment when emulating NULL direction

2023-06-08 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17730760#comment-17730760
 ] 

Will Noble edited comment on CALCITE-5767 at 6/8/23 10:42 PM:
--

I updated the description and summary, but kept the verbose prose description 
because it's really a complicated problem with seemingly no ideal solution. 
Really, there are 4 potential "problems" that could be blamed for this. The 
first 3 are MSSQL's seemingly non-standard design decisions outlined in the 
ticket description (each of these is not necessarily something that needs 
changing, but if any one of these were different, there would be no problem). 
The fourth potential target for blame is the fact that {{RelBuilder}} uses a 
"default" null direction by default instead of {{NullDirection.UNSPECIFIED}} 
which, if used, would also eliminate this weird issue, and allow us to return 
{{null}} from {{emulateNullDirection}} and not have to include an effectively 
constant expression in the {{ORDER BY}} list. So, in a sense the problem lies 
with {{RelBuilder}} because it's forcing us to produce a seemingly less 
efficient and slightly more confusing SQL query.

But, changing the default null direction in {{RelBuilder}} from NULLS-high to 
unspecified would be a major breaking change, so I'm assuming that's a 
non-starter.


was (Author: wnoble):
I updated the description and summary, but kept the verbose prose description 
because it's really a complicated problem with seemingly no ideal solution. 
Really, there are 4 potential "problems" that could be blamed for this. The 
first 3 are MSSQL's seemingly non-standard design decisions outlined in the 
ticket description (each of these is not necessarily something that needs 
changing, but if any one of these were different, there would be no problem). 
The fourth potential target for blame is the fact that {{RelBuilder}} uses a 
"default" null direction by default instead of {{NullDirection.UNSPECIFIED}} 
which, if used, would also eliminate this weird issue, and allow us to return 
{{null}} from {{emulateNullDirection}} and not have to include an effectively 
constant expression in the {{ORDER BY}} list.

But, changing the default null direction in {{RelBuilder}} from NULLS-high to 
unspecified would be a major breaking change, so I'm assuming that's a 
non-starter.

> Calcite's MSSQL dialect should not give GROUPING special treatment when 
> emulating NULL direction
> 
>
> Key: CALCITE-5767
> URL: https://issues.apache.org/jira/browse/CALCITE-5767
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>  Labels: pull-request-available
>
> {{MssqlSqlDialect.emulateNullDirection}} has [special logic for {{GROUPING}} 
> calls|https://github.com/apache/calcite/blob/814ae6ec09e72544ba010f2591e06020c55b162b/core/src/main/java/org/apache/calcite/sql/dialect/MssqlSqlDialect.java#L95].
>  This seems to be an optimization attempt since {{GROUPING}} is known to 
> never return {{{}NULL{}}}, and therefore needs no null direction emulation. 
> However, this causes problems because:
> 1. MSSQL does not have the same default null direction as Calcite's 
> {{RelBuilder}} (and most other dialects).
> 2. MSSQL does not support the common {{NULLS FIRST}} or {{NULLS LAST}} syntax.
> 3. MSSQL does not allow sorting on the same field twice, even though there is 
> no theoretical issue with this.
> Each of these properties must be present for the problem to occur, so it's a 
> bit niche and specific to MSSQL. Seems like the best solution is to simply 
> eliminate the special-case treatment for {{GROUPING}} in 
> {{{}emulateNullDirection{}}}, which is currently creating problems due to 
> property #3.
> {*}More in-depth explanation{*}:
> In {{{}RelBuilder.collation{}}}, we use the "default null direction" to 
> insert rex nodes as sorting expressions, but this is only the default null 
> direction for NULLS-high dialects, i.e. *not* MSSQL. This is a problem 
> because MSSQL has special-case logic for emulating null direction of GROUPING 
> calls, whereby it effectively duplicates the expression. Really, 
> {{MssqlSqlDialect.emulateNullDirection}} probably should've been returning 
> {{null}} instead, signalling to callers that no null-direction emulation is 
> necessary because {{GROUPING}} never returns {{{}NULL{}}}, but this causes 
> another problem due to property #2 above when the null direction is 
> non-default as is caused simply by using {{RelBuilder.collation}} as 
> described above (it should be noted that this method takes rex nodes instead 
> of {{RelFieldCollation}} object, so there is no way to specify null 
> 

[jira] [Commented] (CALCITE-5767) Calcite's MSSQL dialect should not give GROUPING special treatment when emulating NULL direction

2023-06-08 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17730760#comment-17730760
 ] 

Will Noble commented on CALCITE-5767:
-

I updated the description and summary, but kept the verbose prose description 
because it's really a complicated problem with seemingly no ideal solution. 
Really, there are 4 potential "problems" that could be blamed for this. The 
first 3 are MSSQL's seemingly non-standard design decisions outlined in the 
ticket description (each of these is not necessarily something that needs 
changing, but if any one of these were different, there would be no problem). 
The fourth potential target for blame is the fact that {{RelBuilder}} uses a 
"default" null direction by default instead of {{NullDirection.UNSPECIFIED}} 
which, if used, would also eliminate this weird issue, and allow us to return 
{{null}} from {{emulateNullDirection}} and not have to include an effectively 
constant expression in the {{ORDER BY}} list.

But, changing the default null direction in {{RelBuilder}} from NULLS-high to 
unspecified would be a major breaking change, so I'm assuming that's a 
non-starter.

> Calcite's MSSQL dialect should not give GROUPING special treatment when 
> emulating NULL direction
> 
>
> Key: CALCITE-5767
> URL: https://issues.apache.org/jira/browse/CALCITE-5767
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>  Labels: pull-request-available
>
> {{MssqlSqlDialect.emulateNullDirection}} has [special logic for {{GROUPING}} 
> calls|https://github.com/apache/calcite/blob/814ae6ec09e72544ba010f2591e06020c55b162b/core/src/main/java/org/apache/calcite/sql/dialect/MssqlSqlDialect.java#L95].
>  This seems to be an optimization attempt since {{GROUPING}} is known to 
> never return {{{}NULL{}}}, and therefore needs no null direction emulation. 
> However, this causes problems because:
> 1. MSSQL does not have the same default null direction as Calcite's 
> {{RelBuilder}} (and most other dialects).
> 2. MSSQL does not support the common {{NULLS FIRST}} or {{NULLS LAST}} syntax.
> 3. MSSQL does not allow sorting on the same field twice, even though there is 
> no theoretical issue with this.
> Each of these properties must be present for the problem to occur, so it's a 
> bit niche and specific to MSSQL. Seems like the best solution is to simply 
> eliminate the special-case treatment for {{GROUPING}} in 
> {{{}emulateNullDirection{}}}, which is currently creating problems due to 
> property #3.
> {*}More in-depth explanation{*}:
> In {{{}RelBuilder.collation{}}}, we use the "default null direction" to 
> insert rex nodes as sorting expressions, but this is only the default null 
> direction for NULLS-high dialects, i.e. *not* MSSQL. This is a problem 
> because MSSQL has special-case logic for emulating null direction of GROUPING 
> calls, whereby it effectively duplicates the expression. Really, 
> {{MssqlSqlDialect.emulateNullDirection}} probably should've been returning 
> {{null}} instead, signalling to callers that no null-direction emulation is 
> necessary because {{GROUPING}} never returns {{{}NULL{}}}, but this causes 
> another problem due to property #2 above when the null direction is 
> non-default as is caused simply by using {{RelBuilder.collation}} as 
> described above (it should be noted that this method takes rex nodes instead 
> of {{RelFieldCollation}} object, so there is no way to specify null 
> direction) because the non-default null direction is not expanded into a 
> {{CASE}} expression (MSSQL does not support {{NULLS FIRST}} or {{LAST}} 
> syntax).
> Here's a test illustrating the problem:
> Input SQL (default dialect)
> {code:xml}
> select "product_class_id", "brand_name", GROUPING("brand_name")
> from "product"
> group by GROUPING SETS (("product_class_id", "brand_name"), 
> ("product_class_id"))
> order by 3, 2, 1
> {code}
> Current behavior for unparsing as MSSQL (incorrect because it orders by the 
> same column twice; {{GROUPING([brand_name])}} and {{{}3{}}}, which will fail 
> if you try to actually run this against a real MSSQL database, even though it 
> seems like it shouldn't):
> {code:xml}
> SELECT [product_class_id], [brand_name], GROUPING([brand_name])
> FROM [foodmart].[product]
> GROUP BY GROUPING SETS(([product_class_id], [brand_name]), [product_class_id])
> ORDER BY
>  GROUPING([brand_name]),
>  3,
>  CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
>  [brand_name],
>  CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
>  [product_class_id]
> {code}
> Behavior where {{MssqlSqlDialect.emulateNullDirection}} simply 

[jira] [Updated] (CALCITE-5767) Calcite's MSSQL dialect should not give GROUPING special treatment when emulating NULL direction

2023-06-08 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5767:

Description: 
{{MssqlSqlDialect.emulateNullDirection}} has [special logic for {{GROUPING}} 
calls|https://github.com/apache/calcite/blob/814ae6ec09e72544ba010f2591e06020c55b162b/core/src/main/java/org/apache/calcite/sql/dialect/MssqlSqlDialect.java#L95].
 This seems to be an optimization attempt since {{GROUPING}} is known to never 
return {{{}NULL{}}}, and therefore needs no null direction emulation. However, 
this causes problems because:

1. MSSQL does not have the same default null direction as Calcite's 
{{RelBuilder}} (and most other dialects).
2. MSSQL does not support the common {{NULLS FIRST}} or {{NULLS LAST}} syntax.
3. MSSQL does not allow sorting on the same field twice, even though there is 
no theoretical issue with this.

Each of these properties must be present for the problem to occur, so it's a 
bit niche and specific to MSSQL. Seems like the best solution is to simply 
eliminate the special-case treatment for {{GROUPING}} in 
{{{}emulateNullDirection{}}}, which is currently creating problems due to 
property #3.

{*}More in-depth explanation{*}:

In {{{}RelBuilder.collation{}}}, we use the "default null direction" to insert 
rex nodes as sorting expressions, but this is only the default null direction 
for NULLS-high dialects, i.e. *not* MSSQL. This is a problem because MSSQL has 
special-case logic for emulating null direction of GROUPING calls, whereby it 
effectively duplicates the expression. Really, 
{{MssqlSqlDialect.emulateNullDirection}} probably should've been returning 
{{null}} instead, signalling to callers that no null-direction emulation is 
necessary because {{GROUPING}} never returns {{{}NULL{}}}, but this causes 
another problem due to property #2 above when the null direction is non-default 
as is caused simply by using {{RelBuilder.collation}} as described above (it 
should be noted that this method takes rex nodes instead of 
{{RelFieldCollation}} object, so there is no way to specify null direction) 
because the non-default null direction is not expanded into a {{CASE}} 
expression (MSSQL does not support {{NULLS FIRST}} or {{LAST}} syntax).

Here's a test illustrating the problem:

Input SQL (default dialect)
{code:xml}
select "product_class_id", "brand_name", GROUPING("brand_name")
from "product"
group by GROUPING SETS (("product_class_id", "brand_name"), 
("product_class_id"))
order by 3, 2, 1
{code}
Current behavior for unparsing as MSSQL (incorrect because it orders by the 
same column twice; {{GROUPING([brand_name])}} and {{{}3{}}}, which will fail if 
you try to actually run this against a real MSSQL database, even though it 
seems like it shouldn't):
{code:xml}
SELECT [product_class_id], [brand_name], GROUPING([brand_name])
FROM [foodmart].[product]
GROUP BY GROUPING SETS(([product_class_id], [brand_name]), [product_class_id])
ORDER BY
 GROUPING([brand_name]),
 3,
 CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
 [brand_name],
 CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
 [product_class_id]
{code}
Behavior where {{MssqlSqlDialect.emulateNullDirection}} simply returns {{null}} 
for {{GROUPING}} expressions (incorrect because it uses {{NULLS LAST}} syntax):
{code:xml}
...
ORDER BY
 3 NULLS LAST,
 CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
 [brand_name],
 CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
 [product_class_id]
{code}
Acceptable behavior (although the first {{{}ORDER BY{}}}-clause is effectively 
ordering by a constant, this will at least run and produce the correct results):
{code:xml}
...
ORDER BY
 CASE WHEN GROUPING([brand_name]) IS NULL THEN 0 ELSE 1 END,
 3,
 CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
 [brand_name],
 CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
 [product_class_id]
{code}

  was:
In {{{}RelBuilder.collation{}}}, we use the "default null direction" to insert 
rex nodes as sorting expressions, but this is only the default null direction 
for NULLS-high dialects, i.e. *not* MSSQL. This is a problem because MSSQL has 
special-case logic for emulating null direction of GROUPING calls, whereby it 
effectively duplicates the expression. Really, 
{{MssqlSqlDialect.emulateNullDirection}} probably should've been returning 
{{null}} instead, signalling to callers that no null-direction emulation is 
necessary because {{GROUPING}} never returns {{{}NULL{}}}, but this causes 
another problem when the null direction is non-default as is caused simply by 
using {{RelBuilder.collation}} as described above (it should be noted that this 
method takes rex nodes instead of {{RelFieldCollation}} object, so there is no 
way to specify null direction) because the non-default 

[jira] [Updated] (CALCITE-5767) Calcite's MSSQL dialect should not give GROUPING special treatment when emulating NULL direction

2023-06-08 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5767:

Summary: Calcite's MSSQL dialect should not give GROUPING special treatment 
when emulating NULL direction  (was: MSSQL fails to unparse properly when 
sorting by GROUPING expression)

> Calcite's MSSQL dialect should not give GROUPING special treatment when 
> emulating NULL direction
> 
>
> Key: CALCITE-5767
> URL: https://issues.apache.org/jira/browse/CALCITE-5767
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>  Labels: pull-request-available
>
> In {{{}RelBuilder.collation{}}}, we use the "default null direction" to 
> insert rex nodes as sorting expressions, but this is only the default null 
> direction for NULLS-high dialects, i.e. *not* MSSQL. This is a problem 
> because MSSQL has special-case logic for emulating null direction of GROUPING 
> calls, whereby it effectively duplicates the expression. Really, 
> {{MssqlSqlDialect.emulateNullDirection}} probably should've been returning 
> {{null}} instead, signalling to callers that no null-direction emulation is 
> necessary because {{GROUPING}} never returns {{{}NULL{}}}, but this causes 
> another problem when the null direction is non-default as is caused simply by 
> using {{RelBuilder.collation}} as described above (it should be noted that 
> this method takes rex nodes instead of {{RelFieldCollation}} object, so there 
> is no way to specify null direction) because the non-default null direction 
> is not expanded into a {{CASE}} expression (MSSQL does not support {{NULLS 
> FIRST}} or {{LAST}} syntax).
> Here's a test illustrating the problem:
> Input SQL (default dialect)
> {code:xml}
> select "product_class_id", "brand_name", GROUPING("brand_name")
> from "product"
> group by GROUPING SETS (("product_class_id", "brand_name"), 
> ("product_class_id"))
> order by 3, 2, 1
> {code}
> Current behavior for unparsing as MSSQL (incorrect because it orders by the 
> same column twice; {{GROUPING([brand_name])}} and {{3}}, which will fail if 
> you try to actually run this against a real MSSQL database, even though it 
> seems like it shouldn't):
> {code:xml}
> SELECT [product_class_id], [brand_name], GROUPING([brand_name])
> FROM [foodmart].[product]
> GROUP BY GROUPING SETS(([product_class_id], [brand_name]), [product_class_id])
> ORDER BY
>  GROUPING([brand_name]),
>  3,
>  CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
>  [brand_name],
>  CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
>  [product_class_id]
> {code}
> Behavior where {{MssqlSqlDialect.emulateNullDirection}} simply returns 
> {{null}} for {{GROUPING}} expressions (incorrect because it uses {{NULLS 
> LAST}} syntax):
> {code:xml}
> ...
> ORDER BY
>  3 NULLS LAST,
>  CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
>  [brand_name],
>  CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
>  [product_class_id]
> {code}
> Acceptable behavior (although the first {{ORDER BY}}-clause is effectively 
> ordering by a constant, this will at least run and produce the correct 
> results):
> {code:xml}
> ...
> ORDER BY
>  CASE WHEN GROUPING([brand_name]) IS NULL THEN 0 ELSE 1 END,
>  3,
>  CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
>  [brand_name],
>  CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
>  [product_class_id]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5768) Rel-to-sql conversion fails to insert a subquery for some sort-by-ordinal queries

2023-06-08 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5768:

Description: 
Currently, when given a rel expression like this:

{code}
LogicalProject(JOB=[$0])
  LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
LogicalAggregate(group=[{2}], agg#0=[COUNT($1)])
  LogicalTableScan(table=[[scott, EMP]])
{code}

Calcite will convert to this invalid ({{ORDER BY 2}} despite only one 
{{SELECT}} expression) SQL:

{code:sql}
SELECT "JOB"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", 2
{code}

When it should convert to this SQL with a sub-query:

{code:sql}
SELECT "JOB"
FROM (SELECT "JOB", COUNT("ENAME") AS "$f1"
FROM "scott"."EMP"
GROUP BY "JOB"
ORDER BY "JOB", 2) AS "t0"
{code}

{{hasSortByOrdinal}} has an [improper early 
return|https://github.com/apache/calcite/blob/c4042a34ef054b89cec1c47fefcbc8689bad55be/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L1902]
 that's causing incorrect results. It should only return early if the result is 
{{true}}. Otherwise, it should continue searching for ordinals.

  was:{{hasSortByOrdinal}} has an [improper early 
return|https://github.com/apache/calcite/blob/c4042a34ef054b89cec1c47fefcbc8689bad55be/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L1902]
 that's causing incorrect results. It should only return early if the result is 
{{true}}. Otherwise, it should continue searching for ordinals.


> Rel-to-sql conversion fails to insert a subquery for some sort-by-ordinal 
> queries
> -
>
> Key: CALCITE-5768
> URL: https://issues.apache.org/jira/browse/CALCITE-5768
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Major
>  Labels: pull-request-available
>
> Currently, when given a rel expression like this:
> {code}
> LogicalProject(JOB=[$0])
>   LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
> LogicalAggregate(group=[{2}], agg#0=[COUNT($1)])
>   LogicalTableScan(table=[[scott, EMP]])
> {code}
> Calcite will convert to this invalid ({{ORDER BY 2}} despite only one 
> {{SELECT}} expression) SQL:
> {code:sql}
> SELECT "JOB"
> FROM "scott"."EMP"
> GROUP BY "JOB"
> ORDER BY "JOB", 2
> {code}
> When it should convert to this SQL with a sub-query:
> {code:sql}
> SELECT "JOB"
> FROM (SELECT "JOB", COUNT("ENAME") AS "$f1"
> FROM "scott"."EMP"
> GROUP BY "JOB"
> ORDER BY "JOB", 2) AS "t0"
> {code}
> {{hasSortByOrdinal}} has an [improper early 
> return|https://github.com/apache/calcite/blob/c4042a34ef054b89cec1c47fefcbc8689bad55be/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L1902]
>  that's causing incorrect results. It should only return early if the result 
> is {{true}}. Otherwise, it should continue searching for ordinals.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5768) Rel-to-sql conversion fails to insert a subquery for some sort-by-ordinal queries

2023-06-08 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5768:

Summary: Rel-to-sql conversion fails to insert a subquery for some 
sort-by-ordinal queries  (was: Rel-to-sql conversion fails to insert a subquery)

> Rel-to-sql conversion fails to insert a subquery for some sort-by-ordinal 
> queries
> -
>
> Key: CALCITE-5768
> URL: https://issues.apache.org/jira/browse/CALCITE-5768
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Major
>  Labels: pull-request-available
>
> {{hasSortByOrdinal}} has an [improper early 
> return|https://github.com/apache/calcite/blob/c4042a34ef054b89cec1c47fefcbc8689bad55be/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L1902]
>  that's causing incorrect results. It should only return early if the result 
> is {{true}}. Otherwise, it should continue searching for ordinals.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5768) Rel-to-sql conversion fails to insert a subquery

2023-06-08 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5768:

Summary: Rel-to-sql conversion fails to insert a subquery  (was: 
SqlImplementor.hasSortByOrdinal is broken)

> Rel-to-sql conversion fails to insert a subquery
> 
>
> Key: CALCITE-5768
> URL: https://issues.apache.org/jira/browse/CALCITE-5768
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Major
>  Labels: pull-request-available
>
> {{hasSortByOrdinal}} has an [improper early 
> return|https://github.com/apache/calcite/blob/c4042a34ef054b89cec1c47fefcbc8689bad55be/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L1902]
>  that's causing incorrect results. It should only return early if the result 
> is {{true}}. Otherwise, it should continue searching for ordinals.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5768) SqlImplementor.hasSortByOrdinal is broken

2023-06-07 Thread Will Noble (Jira)
Will Noble created CALCITE-5768:
---

 Summary: SqlImplementor.hasSortByOrdinal is broken
 Key: CALCITE-5768
 URL: https://issues.apache.org/jira/browse/CALCITE-5768
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Will Noble


{{hasSortByOrdinal}} has an [improper early 
return|https://github.com/apache/calcite/blob/c4042a34ef054b89cec1c47fefcbc8689bad55be/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L1902]
 that's causing incorrect results. It should only return early if the result is 
{{true}}. Otherwise, it should continue searching for ordinals.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-5768) SqlImplementor.hasSortByOrdinal is broken

2023-06-07 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble reassigned CALCITE-5768:
---

Assignee: Will Noble

> SqlImplementor.hasSortByOrdinal is broken
> -
>
> Key: CALCITE-5768
> URL: https://issues.apache.org/jira/browse/CALCITE-5768
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Major
>
> {{hasSortByOrdinal}} has an [improper early 
> return|https://github.com/apache/calcite/blob/c4042a34ef054b89cec1c47fefcbc8689bad55be/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L1902]
>  that's causing incorrect results. It should only return early if the result 
> is {{true}}. Otherwise, it should continue searching for ordinals.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5767) MSSQL fails to unparse properly when sorting by GROUPING expression

2023-06-07 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5767:

Description: 
In {{{}RelBuilder.collation{}}}, we use the "default null direction" to insert 
rex nodes as sorting expressions, but this is only the default null direction 
for NULLS-high dialects, i.e. *not* MSSQL. This is a problem because MSSQL has 
special-case logic for emulating null direction of GROUPING calls, whereby it 
effectively duplicates the expression. Really, 
{{MssqlSqlDialect.emulateNullDirection}} probably should've been returning 
{{null}} instead, signalling to callers that no null-direction emulation is 
necessary because {{GROUPING}} never returns {{{}NULL{}}}, but this causes 
another problem when the null direction is non-default as is caused simply by 
using {{RelBuilder.collation}} as described above (it should be noted that this 
method takes rex nodes instead of {{RelFieldCollation}} object, so there is no 
way to specify null direction) because the non-default null direction is not 
expanded into a {{CASE}} expression (MSSQL does not support {{NULLS FIRST}} or 
{{LAST}} syntax).

Here's a test illustrating the problem:

Input SQL (default dialect)
{code:xml}
select "product_class_id", "brand_name", GROUPING("brand_name")
from "product"
group by GROUPING SETS (("product_class_id", "brand_name"), 
("product_class_id"))
order by 3, 2, 1
{code}
Current behavior for unparsing as MSSQL (incorrect because it orders by the 
same column twice; {{GROUPING([brand_name])}} and {{3}}, which will fail if you 
try to actually run this against a real MSSQL database, even though it seems 
like it shouldn't):
{code:xml}
SELECT [product_class_id], [brand_name], GROUPING([brand_name])
FROM [foodmart].[product]
GROUP BY GROUPING SETS(([product_class_id], [brand_name]), [product_class_id])
ORDER BY
 GROUPING([brand_name]),
 3,
 CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
 [brand_name],
 CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
 [product_class_id]
{code}
Behavior where {{MssqlSqlDialect.emulateNullDirection}} simply returns {{null}} 
for {{GROUPING}} expressions (incorrect because it uses {{NULLS LAST}} syntax):
{code:xml}
...
ORDER BY
 3 NULLS LAST,
 CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
 [brand_name],
 CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
 [product_class_id]
{code}
Acceptable behavior (although the first {{ORDER BY}}-clause is effectively 
ordering by a constant, this will at least run and produce the correct results):
{code:xml}
...
ORDER BY
 CASE WHEN GROUPING([brand_name]) IS NULL THEN 0 ELSE 1 END,
 3,
 CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
 [brand_name],
 CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
 [product_class_id]
{code}

  was:
In {{{}RelBuilder.collation{}}}, we use the "default null direction" to insert 
rex nodes as sorting expressions, but this is only the default null direction 
for NULLS-high dialects, i.e. *not* MSSQL. This is a problem because MSSQL has 
special-case logic for emulating null direction of GROUPING calls, whereby it 
effectively duplicates the expression. Really, 
{{MssqlSqlDialect.emulateNullDirection}} probably should've been returning 
{{null}} instead, signalling to callers that no null-direction emulation is 
necessary because {{GROUPING}} never returns {{{}NULL{}}}, but this causes 
another problem when the null direction is non-default as is caused simply by 
using {{RelBuilder.collation}} as described above (it should be noted that this 
method takes rex nodes instead of {{RelFieldCollation}} object, so there is no 
way to specify null direction) because the non-default null direction is not 
expanded into a {{CASE}} expression (MSSQL does not support {{NULLS FIRST}} or 
{{LAST}} syntax).

Here's a test illustrating the problem:

Input SQL (default dialect)
{code:xml}
select "product_class_id", "brand_name", GROUPING("brand_name")
from "product"
group by GROUPING SETS (("product_class_id", "brand_name"), 
("product_class_id"))
order by 3, 2, 1
{code}
Current behavior for unparsing as MSSQL (incorrect because it orders by the 
same column twice; {{GROUPING([brand_name])}} and {{3}}, which will fail if you 
try to actually run this against a real MSSQL database, even though it seems 
like it shouldn't):
{code:xml}
SELECT [product_class_id], [brand_name], GROUPING([brand_name])
FROM [foodmart].[product]
GROUP BY GROUPING SETS(([product_class_id], [brand_name]), [product_class_id])
ORDER BY
 GROUPING([brand_name]),
 3,
 CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
 [brand_name],
 CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
 [product_class_id]
{code}
Behavior where 

[jira] [Updated] (CALCITE-5767) MSSQL fails to unparse properly when sorting by GROUPING expression

2023-06-07 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5767:

Description: 
In {{{}RelBuilder.collation{}}}, we use the "default null direction" to insert 
rex nodes as sorting expressions, but this is only the default null direction 
for NULLS-high dialects, i.e. *not* MSSQL. This is a problem because MSSQL has 
special-case logic for emulating null direction of GROUPING calls, whereby it 
effectively duplicates the expression. Really, 
{{MssqlSqlDialect.emulateNullDirection}} probably should've been returning 
{{null}} instead, signalling to callers that no null-direction emulation is 
necessary because {{GROUPING}} never returns {{{}NULL{}}}, but this causes 
another problem when the null direction is non-default as is caused simply by 
using {{RelBuilder.collation}} as described above (it should be noted that this 
method takes rex nodes instead of {{RelFieldCollation}} object, so there is no 
way to specify null direction) because the non-default null direction is not 
expanded into a {{CASE}} expression (MSSQL does not support {{NULLS FIRST}} or 
{{LAST}} syntax).

Here's a test illustrating the problem:

Input SQL (default dialect)
{code:xml}
select "product_class_id", "brand_name", GROUPING("brand_name")
from "product"
group by GROUPING SETS (("product_class_id", "brand_name"), 
("product_class_id"))
order by 3, 2, 1
{code}
Current behavior for unparsing as MSSQL (incorrect because it orders by the 
same column twice; {{GROUPING([brand_name])}} and {{3}}, which will fail if you 
try to actually run this against a real MSSQL database, even though it seems 
like it shouldn't):
{code:xml}
SELECT [product_class_id], [brand_name], GROUPING([brand_name])
FROM [foodmart].[product]
GROUP BY GROUPING SETS(([product_class_id], [brand_name]), [product_class_id])
ORDER BY
 GROUPING([brand_name]),
 3,
 CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
 [brand_name],
 CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
 [product_class_id]
{code}
Behavior where {{MssqlSqlDialect.emulateNullDirection}} simply returns {{null}} 
for {{GROUPING}} expressions (incorrect because it uses {{NULLS LAST}} syntax):
{code:xml}
...
ORDER BY
 3 NULLS LAST,
 3,
 CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
 [brand_name],
 CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
 [product_class_id]
{code}
Acceptable behavior (although the first {{ORDER BY}}-clause is effectively 
ordering by a constant, this will at least run and produce the correct results):
{code:xml}
...
ORDER BY
 CASE WHEN GROUPING([brand_name]) IS NULL THEN 0 ELSE 1 END,
 3,
 CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
 [brand_name],
 CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
 [product_class_id]
{code}

  was:
In {{{}RelBuilder.collation{}}}, we use the "default null direction" to insert 
rex nodes as sorting expressions, but this is only the default null direction 
for NULLS-high dialects, i.e. *not* MSSQL. This is a problem because MSSQL has 
special-case logic for emulating null direction of GROUPING calls, whereby it 
effectively duplicates the expression. Really, 
{{MssqlSqlDialect.emulateNullDirection}} probably should've been returning 
{{null}} instead, signalling to callers that no null-direction emulation is 
necessary because {{GROUPING}} never returns {{{}NULL{}}}, but this causes 
another problem when the null direction is non-default as is caused simply by 
using {{RelBuilder.collation}} as described above (it should be noted that this 
method takes rex nodes instead of {{RelFieldCollation}} object, so there is no 
way to specify null direction) because the non-default null direction is not 
expanded into a {{CASE}} expression (MSSQL does not support {{NULLS FIRST}} or 
{{LAST}} syntax).

Here's a test illustrating the problem:

Input SQL (default dialect)
{code:xml}
select "product_class_id", "brand_name", GROUPING("brand_name")
from "product"
group by GROUPING SETS (("product_class_id", "brand_name"), 
("product_class_id"))
order by 3, 2, 1
{code}
Current behavior for unparsing as MSSQL (incorrect because it orders by the 
same column twice; {{GROUPING([brand_name])}} and {{3}}):
{code:xml}
SELECT [product_class_id], [brand_name], GROUPING([brand_name])
FROM [foodmart].[product]
GROUP BY GROUPING SETS(([product_class_id], [brand_name]), [product_class_id])
ORDER BY
 GROUPING([brand_name]),
 3,
 CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
 [brand_name],
 CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
 [product_class_id]
{code}
Behavior where {{MssqlSqlDialect.emulateNullDirection}} simply returns {{null}} 
for {{GROUPING}} expressions (incorrect because it uses {{NULLS 

[jira] [Created] (CALCITE-5767) MSSQL fails to unparse properly when sorting by GROUPING expression

2023-06-07 Thread Will Noble (Jira)
Will Noble created CALCITE-5767:
---

 Summary: MSSQL fails to unparse properly when sorting by GROUPING 
expression
 Key: CALCITE-5767
 URL: https://issues.apache.org/jira/browse/CALCITE-5767
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Will Noble


In {{{}RelBuilder.collation{}}}, we use the "default null direction" to insert 
rex nodes as sorting expressions, but this is only the default null direction 
for NULLS-high dialects, i.e. *not* MSSQL. This is a problem because MSSQL has 
special-case logic for emulating null direction of GROUPING calls, whereby it 
effectively duplicates the expression. Really, 
{{MssqlSqlDialect.emulateNullDirection}} probably should've been returning 
{{null}} instead, signalling to callers that no null-direction emulation is 
necessary because {{GROUPING}} never returns {{{}NULL{}}}, but this causes 
another problem when the null direction is non-default as is caused simply by 
using {{RelBuilder.collation}} as described above (it should be noted that this 
method takes rex nodes instead of {{RelFieldCollation}} object, so there is no 
way to specify null direction) because the non-default null direction is not 
expanded into a {{CASE}} expression (MSSQL does not support {{NULLS FIRST}} or 
{{LAST}} syntax).

Here's a test illustrating the problem:

Input SQL (default dialect)
{code:xml}
select "product_class_id", "brand_name", GROUPING("brand_name")
from "product"
group by GROUPING SETS (("product_class_id", "brand_name"), 
("product_class_id"))
order by 3, 2, 1
{code}
Current behavior for unparsing as MSSQL (incorrect because it orders by the 
same column twice; {{GROUPING([brand_name])}} and {{3}}):
{code:xml}
SELECT [product_class_id], [brand_name], GROUPING([brand_name])
FROM [foodmart].[product]
GROUP BY GROUPING SETS(([product_class_id], [brand_name]), [product_class_id])
ORDER BY
 GROUPING([brand_name]),
 3,
 CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
 [brand_name],
 CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
 [product_class_id]
{code}
Behavior where {{MssqlSqlDialect.emulateNullDirection}} simply returns {{null}} 
for {{GROUPING}} expressions (incorrect because it uses {{NULLS LAST}} syntax):
{code:xml}
...
ORDER BY
 3 NULLS LAST,
 3,
 CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
 [brand_name],
 CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
 [product_class_id]
{code}
Acceptable behavior:
{code:xml}
...
ORDER BY
 CASE WHEN GROUPING([brand_name]) IS NULL THEN 0 ELSE 1 END,
 3,
 CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
 [brand_name],
 CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
 [product_class_id]
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-5767) MSSQL fails to unparse properly when sorting by GROUPING expression

2023-06-07 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble reassigned CALCITE-5767:
---

Assignee: Will Noble

> MSSQL fails to unparse properly when sorting by GROUPING expression
> ---
>
> Key: CALCITE-5767
> URL: https://issues.apache.org/jira/browse/CALCITE-5767
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>
> In {{{}RelBuilder.collation{}}}, we use the "default null direction" to 
> insert rex nodes as sorting expressions, but this is only the default null 
> direction for NULLS-high dialects, i.e. *not* MSSQL. This is a problem 
> because MSSQL has special-case logic for emulating null direction of GROUPING 
> calls, whereby it effectively duplicates the expression. Really, 
> {{MssqlSqlDialect.emulateNullDirection}} probably should've been returning 
> {{null}} instead, signalling to callers that no null-direction emulation is 
> necessary because {{GROUPING}} never returns {{{}NULL{}}}, but this causes 
> another problem when the null direction is non-default as is caused simply by 
> using {{RelBuilder.collation}} as described above (it should be noted that 
> this method takes rex nodes instead of {{RelFieldCollation}} object, so there 
> is no way to specify null direction) because the non-default null direction 
> is not expanded into a {{CASE}} expression (MSSQL does not support {{NULLS 
> FIRST}} or {{LAST}} syntax).
> Here's a test illustrating the problem:
> Input SQL (default dialect)
> {code:xml}
> select "product_class_id", "brand_name", GROUPING("brand_name")
> from "product"
> group by GROUPING SETS (("product_class_id", "brand_name"), 
> ("product_class_id"))
> order by 3, 2, 1
> {code}
> Current behavior for unparsing as MSSQL (incorrect because it orders by the 
> same column twice; {{GROUPING([brand_name])}} and {{3}}):
> {code:xml}
> SELECT [product_class_id], [brand_name], GROUPING([brand_name])
> FROM [foodmart].[product]
> GROUP BY GROUPING SETS(([product_class_id], [brand_name]), [product_class_id])
> ORDER BY
>  GROUPING([brand_name]),
>  3,
>  CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
>  [brand_name],
>  CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
>  [product_class_id]
> {code}
> Behavior where {{MssqlSqlDialect.emulateNullDirection}} simply returns 
> {{null}} for {{GROUPING}} expressions (incorrect because it uses {{NULLS 
> LAST}} syntax):
> {code:xml}
> ...
> ORDER BY
>  3 NULLS LAST,
>  3,
>  CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
>  [brand_name],
>  CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
>  [product_class_id]
> {code}
> Acceptable behavior:
> {code:xml}
> ...
> ORDER BY
>  CASE WHEN GROUPING([brand_name]) IS NULL THEN 0 ELSE 1 END,
>  3,
>  CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
>  [brand_name],
>  CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
>  [product_class_id]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5636) Consult type mappings when coercing types

2023-04-12 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5636:

Summary: Consult type mappings when coercing types  (was: Consult type 
mappings when coercing types.)

> Consult type mappings when coercing types
> -
>
> Key: CALCITE-5636
> URL: https://issues.apache.org/jira/browse/CALCITE-5636
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Will Noble
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is another offshoot of CALCITE-5346.
> In 
> [{{AbstractTypeCoercion.implicitCast}}|https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/main/java/org/apache/calcite/sql/validate/implicit/AbstractTypeCoercion.java#L713],
>  Calcite may insert {{CAST}} invocations disregarding the root schema type 
> map. This happens during the validation phase, so it's easy enough to read 
> the type map directly in the {{AbstractTypeCoercion}} utility, although it's 
> hard to say if this is an optimal solution. It may make sense to incorporate 
> the type map into the {{RelDataTypeSystem}} which is more widely available in 
> non-validation phases of SQL processing pipelines.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-3959) Implement INSTR function

2023-04-10 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17710342#comment-17710342
 ] 

Will Noble commented on CALCITE-3959:
-

- BigQuery and Oracle have identical functions called {{INSTR}} (takes string, 
substring, position, occurence)
- MySQL has a function called {{INSTR}} that's similar to BigQuery's {{STRPOS}} 
or Postgres' {{POSITION}} (takes only string and substring; equivalent to the 
special case of BQ's {{INSTR}} where position=1 and occurence=1).

I'm not totally clear on how to look up "standard" SQL functions (i.e. to 
divide functions between {{SqlStdOperatorTable}} and {{SqlLibraryOperators}}) 
so I'll just assume that Postgres is basically standard SQL.

So, here we have a situation where there's a standard SQL function (called 
{{POSITION}} in Postgres) that's equivalent to a special case of a more general 
library-specific function (called {{INSTR}} in BigQuery). Does it generally 
make sense to de-sugar the standard function into the library-specific function 
in this case? It seems like this would add a little complexity to the 
[{{unparseCall}}|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/BigQuerySqlDialect.java#L152]
 method of virtually every {{SqlDialect}}, introducing the possibility of a 
runtime exception if {{occurence}} is ever anything besides a literal {{1}} 
(although this runtime exception may be unavoidable if parsing in BQ and 
unparsing in Postgres, for example).

Seems like it should work fine, just curious how desirable it is for code 
health.

> Implement INSTR function
> 
>
> Key: CALCITE-3959
> URL: https://issues.apache.org/jira/browse/CALCITE-3959
> Project: Calcite
>  Issue Type: Wish
>Reporter: xzh_dz
>Assignee: Joey Moore
>Priority: Major
>
> [BiqQuery|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#instr]
>  and 
> [Oracle|https://docs.oracle.com/cd/B13789_01/server.101/b10759/functions058.htm]
>  both support functionally identical INSTR(source_value, search_value[, 
> position[, occurrence]]) functions which accepts 2 (character strings or 
> binary strings), 1 optional int representing position, and 1 optional int 
> representing occurrence.
> Occurrence and position are assigned a default value of 1 if not specified.
> The function returns the 1-based position of the nth occurrence of the 2nd 
> operand in the 1st operand where n is defined by the 4th operand. The 
> function begins searching at the 1-based position specified in the 3rd 
> operand.
> The function also supports negative position values, with -1 indicating the 
> last character, and will search backwards from the position specified in that 
> case. 
> Returns 0 if:
>  * No match is found.
>  * If occurrence is greater than the number of matches found.
>  * If position is greater than the length of source_value.
> Returns NULL if:
>  * Any input argument is NULL.
> Returns an error if:
>  * position is 0.
>  * occurrence is 0 or negative.
> EXAMPLE: {{INSTR("abc", "bc")}} would return 2.
> EXAMPLE: {{INSTR("abcabc", "bc", 3)}} would return 5.
> EXAMPLE: {{INSTR("abcabc", "bc", -1, 1)}} would return 5.
> EXAMPLE: {{INSTR("abcabc", "bc", -1, 2)}} would return 2.
>  
> MySQL also has an  [INSTR|https://www.w3schools.com/sql/func_mysql_instr.asp] 
> function, the functionality of which is a subset of the INSTR present in BQ 
> and Oracle. MySQL INSTR only takes 2 parameters and returns the first 
> occurrence of the search value in the source value. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5636) Consult type mappings when coercing types.

2023-04-07 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709784#comment-17709784
 ] 

Will Noble commented on CALCITE-5636:
-

Whoops closed the other as a duplicate. Think that was some race condition.

> Consult type mappings when coercing types.
> --
>
> Key: CALCITE-5636
> URL: https://issues.apache.org/jira/browse/CALCITE-5636
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Will Noble
>Priority: Major
>
> This is another offshoot of CALCITE-5346.
> In 
> [{{AbstractTypeCoercion.implicitCast}}|https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/main/java/org/apache/calcite/sql/validate/implicit/AbstractTypeCoercion.java#L713],
>  Calcite may insert {{CAST}} invocations disregarding the root schema type 
> map. This happens during the validation phase, so it's easy enough to read 
> the type map directly in the {{AbstractTypeCoercion}} utility, although it's 
> hard to say if this is an optimal solution. It may make sense to incorporate 
> the type map into the {{RelDataTypeSystem}} which is more widely available in 
> non-validation phases of SQL processing pipelines.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (CALCITE-5635) Consult type mappings when coercing types.

2023-04-07 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble closed CALCITE-5635.
---
Resolution: Duplicate

> Consult type mappings when coercing types.
> --
>
> Key: CALCITE-5635
> URL: https://issues.apache.org/jira/browse/CALCITE-5635
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Will Noble
>Priority: Major
>
> This is another offshoot of CALCITE-5346.
> In 
> [{{AbstractTypeCoercion.implicitCast}}|https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/main/java/org/apache/calcite/sql/validate/implicit/AbstractTypeCoercion.java#L713],
>  Calcite may insert {{CAST}} invocations disregarding the root schema type 
> map. This happens during the validation phase, so it's easy enough to read 
> the type map directly in the {{AbstractTypeCoercion}} utility, although it's 
> hard to say if this is an optimal solution. It may make sense to incorporate 
> the type map into the {{RelDataTypeSystem}} which is more widely available in 
> non-validation phases of SQL processing pipelines.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5636) Consult type mappings when coercing types.

2023-04-06 Thread Will Noble (Jira)
Will Noble created CALCITE-5636:
---

 Summary: Consult type mappings when coercing types.
 Key: CALCITE-5636
 URL: https://issues.apache.org/jira/browse/CALCITE-5636
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Will Noble


This is another offshoot of CALCITE-5346.

In 
[{{AbstractTypeCoercion.implicitCast}}|https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/main/java/org/apache/calcite/sql/validate/implicit/AbstractTypeCoercion.java#L713],
 Calcite may insert {{CAST}} invocations disregarding the root schema type map. 
This happens during the validation phase, so it's easy enough to read the type 
map directly in the {{AbstractTypeCoercion}} utility, although it's hard to say 
if this is an optimal solution. It may make sense to incorporate the type map 
into the {{RelDataTypeSystem}} which is more widely available in non-validation 
phases of SQL processing pipelines.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5635) Consult type mappings when coercing types.

2023-04-06 Thread Will Noble (Jira)
Will Noble created CALCITE-5635:
---

 Summary: Consult type mappings when coercing types.
 Key: CALCITE-5635
 URL: https://issues.apache.org/jira/browse/CALCITE-5635
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Will Noble


This is another offshoot of CALCITE-5346.

In 
[{{AbstractTypeCoercion.implicitCast}}|https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/main/java/org/apache/calcite/sql/validate/implicit/AbstractTypeCoercion.java#L713],
 Calcite may insert {{CAST}} invocations disregarding the root schema type map. 
This happens during the validation phase, so it's easy enough to read the type 
map directly in the {{AbstractTypeCoercion}} utility, although it's hard to say 
if this is an optimal solution. It may make sense to incorporate the type map 
into the {{RelDataTypeSystem}} which is more widely available in non-validation 
phases of SQL processing pipelines.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5552) Returned timestamp is incorrect after the 100th row

2023-03-07 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697659#comment-17697659
 ] 

Will Noble commented on CALCITE-5552:
-

Sounds like this is similar in nature to, though probably distinct from, 
CALCITE-5488 (just my initial hunch)

> Returned timestamp is incorrect after the 100th row
> ---
>
> Key: CALCITE-5552
> URL: https://issues.apache.org/jira/browse/CALCITE-5552
> Project: Calcite
>  Issue Type: Bug
>  Components: avatica
>Affects Versions: avatica-1.23.0
>Reporter: Magnus Mogren
>Priority: Critical
> Attachments: TSTAMPS.csv
>
>
> When fetching data that contains timestamps the returned timestamp after row 
> 100 is incorrect.
> This can be reproduced using the CSV adapter (calcite-csv) using the 
> TSTAMPS.csv file attached. It contains 101 timestamps with the value 
> 1900-01-01 00:00:00. The first 100 is returned correctly, but number 101 in 
> the result has the value 1899-12-31 23:00:00 instead.
> Marking this bug as critical since not beeing able to trust the values 
> returned by calcite is as bad as it gets in my opinion.
> I do not know if the bug is in calcite or avatica.
> I have created a project that reproduces the issue. You can find that here: 
> [nytro77/calcite-timestamp-bug: Showcase bug in Calcite or Avatica that 
> causes faulty timestamps to be returned 
> (github.com)|https://github.com/nytro77/calcite-timestamp-bug]
> It starts an AvaticaServer that serves the attached file as a table using 
> calcite-csv and a unit tests that connects to the server and fetches the data.
> Run it with 
> {code:java}
> ./mvnw test{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-5446) Support TIMESTAMP WITH LOCAL TIME ZONE type in JDBC driver

2023-02-22 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble reassigned CALCITE-5446:
---

Assignee: Will Noble  (was: Oliver Lee)

> Support TIMESTAMP WITH LOCAL TIME ZONE type in JDBC driver
> --
>
> Key: CALCITE-5446
> URL: https://issues.apache.org/jira/browse/CALCITE-5446
> Project: Calcite
>  Issue Type: Bug
>  Components: avatica
>Reporter: Julian Hyde
>Assignee: Will Noble
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Support TIMESTAMP WITH LOCAL TIME ZONE type in JDBC driver.
> Currently, Calcite types with {{SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE}} 
> are mapped to JDBC type  (OTHER). As a result, the ResultSet.toString() 
> method just converts the (signed) {{long}} value to a string.
> Here is what 
> [Progress|https://docs.progress.com/en-US/bundle/datadirect-connect-jdbc-51/page/TIMESTAMP-WITH-LOCAL-TIME-ZONE-Data-Type.html]
>  says:
> {quote}
> The Oracle {{TIMESTAMP WITH LOCAL TIME ZONE}} data type is mapped to the JDBC 
> {{TIMESTAMP}} data type.
> When retrieving {{TIMESTAMP WITH LOCAL TIME ZONE}} columns, the value 
> returned to the user is converted to the time zone specified by the 
> {{TIME_ZONE}} session parameter.
> When setting {{TIMESTAMP WITH LOCAL TIME ZONE}} columns:
> * Using a timestamp (using {{PreparedStatement.setTimestamp}}, for example), 
> the value set is converted to the time zone specified by the {{TIME_ZONE}} 
> session parameter.
> * Using a string (using {{PreparedStatement.setString()}}, for example), the 
> string is passed as-is to the server. The supplied string must be in the 
> format specified by the {{NLS_TIMESTAMP_TZ_FORMAT}} session parameter. If 
> not, the Oracle server generates an error when it attempts to convert the 
> string to the {{TIMESTAMP WITH LOCAL TIME ZONE}} type.
> {quote}
> We should definitely map to {{java.sql.Types.TIMESTAMP}}.
> I'm not sure whether Calcite has a {{TIME_ZONE}} session parameter. But 
> values should be shifted from/to the desired time zone on the way in/out.
> And maybe there should also be a Calcite server time zone that is the time 
> zone in which {{TIMESTAMP WITH LOCAL TIME ZONE}} values are represented. By 
> default this should be UTC, and using a time zone with a variable offset from 
> UTC (e.g. "Pacific/Los_Angeles") would be a bad idea.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5536) Clean up some of the magic numbers in AvaticaResultSetConversionsTest

2023-02-16 Thread Will Noble (Jira)
Will Noble created CALCITE-5536:
---

 Summary: Clean up some of the magic numbers in 
AvaticaResultSetConversionsTest
 Key: CALCITE-5536
 URL: https://issues.apache.org/jira/browse/CALCITE-5536
 Project: Calcite
  Issue Type: Improvement
  Components: avatica
Reporter: Will Noble


Retroactively filing this bug for some  [cleanup I 
did|https://github.com/apache/calcite-avatica/pull/208] in preparation for the 
fix to CALCITE-5446. This cleanup makes the fix and accompanied test 
adjustments easier to review.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5535) Time values are not properly truncated

2023-02-16 Thread Will Noble (Jira)
Will Noble created CALCITE-5535:
---

 Summary: Time values are not properly truncated
 Key: CALCITE-5535
 URL: https://issues.apache.org/jira/browse/CALCITE-5535
 Project: Calcite
  Issue Type: Bug
  Components: avatica
Reporter: Will Noble


Avatica [uses the binary modulo 
operator|https://github.com/apache/calcite-avatica/blob/810acf80771310431d7ef576f3404299ebb8eaf2/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L1164]
 to truncate time values by the number of milliseconds in a day, but this 
operator can result in negative numbers. We should use {{Math.floorMod()}} 
instead. There are also places where numbers are not truncated at all, such as 
when [converting to 
string|https://github.com/apache/calcite-avatica/blob/810acf80771310431d7ef576f3404299ebb8eaf2/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L262].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5526) Handle unparsing of literals based on type system

2023-02-15 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5526:

Description: 
CALCITE-5424 dealt with parsing date/time literals via a custom type system, 
however there is still no way to unparse them via the same type system. This is 
handled in 
[SqlDialect.unparseDateTimeLiteral|https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/java/org/apache/calcite/sql/SqlDialect.java#L468]
 which simply calls the literal object's {{toString()}} method.

The literal object (a subclass of {{SqlAbstractDateTimeLiteral}}) should not be 
able to determine it's own unparsed representation by itself since it could be 
unparsed in any dialect. Since the current system for resolving custom types 
relies on access to the catalog, it appears we'll need to introduce to 
{{SqlDialect}} a dependency on {{CalciteCatalogReader}} so it can use the same 
mappings for unparsing as it does for parsing.

  was:
[CALCITE-5424|https://issues.apache.org/jira/browse/CALCITE-5424] dealt with 
parsing date/time literals via a custom type system, however there is still no 
way to unparse them via the same type system. This is handled in 
[SqlDialect.unparseDateTimeLiteral|https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/java/org/apache/calcite/sql/SqlDialect.java#L468]
 which simply calls the literal object's {{toString()}} method.

The literal object (a subclass of {{SqlAbstractDateTimeLiteral}}) should not be 
able to determine it's own unparsed representation by itself since it could be 
unparsed in any dialect. Since the current system for resolving custom types 
relies on access to the catalog, it appears we'll need to introduce to 
{{SqlDialect}} a dependency on {{CalciteCatalogReader}} so it can use the same 
mappings for unparsing as it does for parsing.


> Handle unparsing of literals based on type system
> -
>
> Key: CALCITE-5526
> URL: https://issues.apache.org/jira/browse/CALCITE-5526
> Project: Calcite
>  Issue Type: Task
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>
> CALCITE-5424 dealt with parsing date/time literals via a custom type system, 
> however there is still no way to unparse them via the same type system. This 
> is handled in 
> [SqlDialect.unparseDateTimeLiteral|https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/java/org/apache/calcite/sql/SqlDialect.java#L468]
>  which simply calls the literal object's {{toString()}} method.
> The literal object (a subclass of {{SqlAbstractDateTimeLiteral}}) should not 
> be able to determine it's own unparsed representation by itself since it 
> could be unparsed in any dialect. Since the current system for resolving 
> custom types relies on access to the catalog, it appears we'll need to 
> introduce to {{SqlDialect}} a dependency on {{CalciteCatalogReader}} so it 
> can use the same mappings for unparsing as it does for parsing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-5526) Handle unparsing of literals based on type system

2023-02-15 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble reassigned CALCITE-5526:
---

Assignee: Will Noble

> Handle unparsing of literals based on type system
> -
>
> Key: CALCITE-5526
> URL: https://issues.apache.org/jira/browse/CALCITE-5526
> Project: Calcite
>  Issue Type: Task
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>
> [CALCITE-5424|https://issues.apache.org/jira/browse/CALCITE-5424] dealt with 
> parsing date/time literals via a custom type system, however there is still 
> no way to unparse them via the same type system. This is handled in 
> [SqlDialect.unparseDateTimeLiteral|https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/java/org/apache/calcite/sql/SqlDialect.java#L468]
>  which simply calls the literal object's {{toString()}} method.
> The literal object (a subclass of {{SqlAbstractDateTimeLiteral}}) should not 
> be able to determine it's own unparsed representation by itself since it 
> could be unparsed in any dialect. Since the current system for resolving 
> custom types relies on access to the catalog, it appears we'll need to 
> introduce to {{SqlDialect}} a dependency on {{CalciteCatalogReader}} so it 
> can use the same mappings for unparsing as it does for parsing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5526) Handle unparsing of literals based on type system

2023-02-14 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688638#comment-17688638
 ] 

Will Noble commented on CALCITE-5526:
-

I'm having trouble visualizing your supposition. My understanding is that 
{{SqlUnknownLiteral}} was used because of the separate parsing and validation 
steps, but I'm not aware of a second step during unparsing that would allow us 
to invert the parsing logic directly. Where do you think this intermediate 
"unknown" literal should be created? And are you suggesting I introduce the 
dependency on {{CalciteCatalogReader}} in some other place besides the 
{{{}SqlDialect{}}}?

> Handle unparsing of literals based on type system
> -
>
> Key: CALCITE-5526
> URL: https://issues.apache.org/jira/browse/CALCITE-5526
> Project: Calcite
>  Issue Type: Task
>Reporter: Will Noble
>Priority: Minor
>
> [CALCITE-5424|https://issues.apache.org/jira/browse/CALCITE-5424] dealt with 
> parsing date/time literals via a custom type system, however there is still 
> no way to unparse them via the same type system. This is handled in 
> [SqlDialect.unparseDateTimeLiteral|https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/java/org/apache/calcite/sql/SqlDialect.java#L468]
>  which simply calls the literal object's {{toString()}} method.
> The literal object (a subclass of {{SqlAbstractDateTimeLiteral}}) should not 
> be able to determine it's own unparsed representation by itself since it 
> could be unparsed in any dialect. Since the current system for resolving 
> custom types relies on access to the catalog, it appears we'll need to 
> introduce to {{SqlDialect}} a dependency on {{CalciteCatalogReader}} so it 
> can use the same mappings for unparsing as it does for parsing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5446) Support TIMESTAMP WITH LOCAL TIME ZONE type in JDBC driver

2023-02-14 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688627#comment-17688627
 ] 

Will Noble commented on CALCITE-5446:
-

Adding my thoughts on the proper semantics for these timestamps types from 
CALCITE-5500 which was marked as a duplicate of this:

{{TIMESTAMP}}:
* {{getString()}} -> Always returns the same string no matter the client's time 
zone
* {{getTimestamp()}} -> Subtracts the zone offset from the instant, so 
rendering the instant to a string in the client's time zone produces the same 
result as {{getString()}}

{{TIMESTAMP WITH LOCAL TIME ZONE}}:
* {{getTimestamp()}} -> Always returns the same instant no matter the time zone
* {{getString()}} -> Adds the zone offset to the instant before rending it to a 
string, so rendering the result of {{getTimestamp()}} in the client's time zone 
produces the same result as this.

> Support TIMESTAMP WITH LOCAL TIME ZONE type in JDBC driver
> --
>
> Key: CALCITE-5446
> URL: https://issues.apache.org/jira/browse/CALCITE-5446
> Project: Calcite
>  Issue Type: Bug
>  Components: avatica
>Reporter: Julian Hyde
>Assignee: Oliver Lee
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Support TIMESTAMP WITH LOCAL TIME ZONE type in JDBC driver.
> Currently, Calcite types with {{SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE}} 
> are mapped to JDBC type  (OTHER). As a result, the ResultSet.toString() 
> method just converts the (signed) {{long}} value to a string.
> Here is what 
> [Progress|https://docs.progress.com/en-US/bundle/datadirect-connect-jdbc-51/page/TIMESTAMP-WITH-LOCAL-TIME-ZONE-Data-Type.html]
>  says:
> {quote}
> The Oracle {{TIMESTAMP WITH LOCAL TIME ZONE}} data type is mapped to the JDBC 
> {{TIMESTAMP}} data type.
> When retrieving {{TIMESTAMP WITH LOCAL TIME ZONE}} columns, the value 
> returned to the user is converted to the time zone specified by the 
> {{TIME_ZONE}} session parameter.
> When setting {{TIMESTAMP WITH LOCAL TIME ZONE}} columns:
> * Using a timestamp (using {{PreparedStatement.setTimestamp}}, for example), 
> the value set is converted to the time zone specified by the {{TIME_ZONE}} 
> session parameter.
> * Using a string (using {{PreparedStatement.setString()}}, for example), the 
> string is passed as-is to the server. The supplied string must be in the 
> format specified by the {{NLS_TIMESTAMP_TZ_FORMAT}} session parameter. If 
> not, the Oracle server generates an error when it attempts to convert the 
> string to the {{TIMESTAMP WITH LOCAL TIME ZONE}} type.
> {quote}
> We should definitely map to {{java.sql.Types.TIMESTAMP}}.
> I'm not sure whether Calcite has a {{TIME_ZONE}} session parameter. But 
> values should be shifted from/to the desired time zone on the way in/out.
> And maybe there should also be a Calcite server time zone that is the time 
> zone in which {{TIMESTAMP WITH LOCAL TIME ZONE}} values are represented. By 
> default this should be UTC, and using a time zone with a variable offset from 
> UTC (e.g. "Pacific/Los_Angeles") would be a bad idea.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5526) Handle unparsing of literals based on type system

2023-02-13 Thread Will Noble (Jira)
Will Noble created CALCITE-5526:
---

 Summary: Handle unparsing of literals based on type system
 Key: CALCITE-5526
 URL: https://issues.apache.org/jira/browse/CALCITE-5526
 Project: Calcite
  Issue Type: Task
Reporter: Will Noble


[CALCITE-5424|https://issues.apache.org/jira/browse/CALCITE-5424] dealt with 
parsing date/time literals via a custom type system, however there is still no 
way to unparse them via the same type system. This is handled in 
[SqlDialect.unparseDateTimeLiteral|https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/java/org/apache/calcite/sql/SqlDialect.java#L468]
 which simply calls the literal object's {{toString()}} method.

The literal object (a subclass of {{SqlAbstractDateTimeLiteral}}) should not be 
able to determine it's own unparsed representation by itself since it could be 
unparsed in any dialect. Since the current system for resolving custom types 
relies on access to the catalog, it appears we'll need to introduce to 
{{SqlDialect}} a dependency on {{CalciteCatalogReader}} so it can use the same 
mappings for unparsing as it does for parsing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CALCITE-5488) Avatica double-adjusts timestamps when calling Array.getResultSet()

2023-02-13 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680016#comment-17680016
 ] 

Will Noble edited comment on CALCITE-5488 at 2/13/23 7:26 PM:
--

This is actually a pretty narrow bug. I'll step through the problem at a higher 
level than in the ticket description to illuminate what I see as the solution:

Problem:
1. We get a {{ResultSet}} that contains an array that contains timestamps.
2. We properly adjust all the timestamps in the array when we call 
{{{}getArray(){}}}.
3. We decide to convert that array to a "sub-{{{}ResultSet{}}}" by calling 
{{java.sql.Array.getResultSet()}} on the array, causing the timestamps to be 
double-adjusted.

It's not so much that any of your 4 example constraints were violated. Rather, 
it's that the process of of getting an array and then converting it to a result 
set involves an incorrect series of operations on the primitive ({{{}long{}}}) 
internal representation of the result set before it's abstracted into a 
{{java.sql.Timestamp}} object.

My original proposal was to set the local calendar of any {{ResultSet}} created 
by {{java.sql.Array.getResultSet()}} to UTC. By static analysis of the code, I 
can verify that this will *only* affect this particular narrow case: getting 
time/date/timestamp values out of an array result set, as well as the return 
value of {{{}AvaticaResultSet.getLocalCalendar(){}}}, which is a public method 
but is not part of the JDBC spec and is not used anywhere in the Avatica 
codebase. However, this would create a problematic edge-case where an explicit 
calendar is provided to get the individual timestamps out of the array's result 
set; they would be incorrect.

The only real solution I see now is to "de-adjust" the data in the result set 
during {{ArrayFactoryImpl.create()}} (in between steps 2 and 3 of the problem 
above), effectively undoing the first adjustment and making it so that the 
second adjustment (whether implicit or explicit) is correct.


was (Author: wnoble):
This is actually a pretty narrow bug. I'll step through the problem at a higher 
level than in the ticket description to illuminate what I see as the two 
possible solutions:

Problem:
1. We get a {{ResultSet}} that contains an array that contains timestamps.
2. We properly adjust all the timestamps in the array when we call 
{{getArray()}}.
3. We decide to convert that array to a "sub-{{ResultSet}}" by calling 
{{java.sql.Array.getResultSet()}} on the array, causing the timestamps to be 
double-adjusted.

It's not so much that any of your 4 example constraints were violated. Rather, 
it's that the process of of getting an array and then converting it to a result 
set involves an incorrect series of operations on the primitive ({{long}}) 
internal representation of the result set before it's abstracted into a 
{{java.sql.Timestamp}} object.

My original proposal was to set the local calendar of any {{ResultSet}} created 
by {{java.sql.Array.getResultSet()}} to UTC. By static analysis of the code, I 
can verify that this will *only* affect this particular narrow case: getting 
time/date/timestamp values out of an array result set, as well as the return 
value of {{AvaticaResultSet.getLocalCalendar()}}, which is a public method but 
is not part of the JDBC spec and is not used anywhere in the Avatica codebase. 
However, this would create a problematic edge-case where an explicit calendar 
is provided to get the individual timestamps out of the array's result set; 
they would be incorrect.

The only real solution I see now is to "de-adjust" the data in the result set 
during {{ArrayFactoryImpl.create()}} (in between steps 2 and 3 of the problem 
above), effectively undoing the first adjustment and making it so that the 
second adjustment (whether implicit or explicit) is correct.

> Avatica double-adjusts timestamps when calling Array.getResultSet()
> ---
>
> Key: CALCITE-5488
> URL: https://issues.apache.org/jira/browse/CALCITE-5488
> Project: Calcite
>  Issue Type: Bug
>  Components: avatica
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>
> If you go into {{AvaticaResultSetConversionTest}} and delete the 
> [timeZone|https://github.com/apache/calcite-avatica/blob/dbe9b1d8c2e53474eb40cfaf5721aceca3bdb57f/core/src/test/java/org/apache/calcite/avatica/AvaticaResultSetConversionsTest.java#L1121]
>  property for the connection, which is currently set to "GMT", several 
> {{getString()}} and {{getArray()}} tests start to fail. The {{getString()}} 
> failures are self-explanatory -- the expected value has become incorrect 
> because of the time zone change -- but the {{getArray()}} failures are 
> tricky. They're double-adjusting the timestamps:
> * Once when 

[jira] [Updated] (CALCITE-5511) Substitute type names in error messages when type aliases are present

2023-02-13 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5511:

Description: 
The type aliases covered in CALCITE-5346 may not be included in error messages 
properly. There should be a setting to do type unmapping when rendering error 
messages. Type unmapping will require access to a catalog.

For example, in 
[SqlCastFunction.checkOperandTypes|https://github.com/apache/calcite/blob/d9a81b88ad561e7e4cedae93e805e0d7a53a7f1a/core/src/main/java/org/apache/calcite/sql/fun/SqlCastFunction.java#L161]
 it throws an error based on the canonical name of the {{RelDataType}} which 
may be confusing to end-users.

  was:The type aliases covered in CALCITE-5346 may not be included in error 
messages properly. Perhaps there should be a setting to do type unmapping when 
rendering error messages.


> Substitute type names in error messages when type aliases are present
> -
>
> Key: CALCITE-5511
> URL: https://issues.apache.org/jira/browse/CALCITE-5511
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Will Noble
>Assignee: Oliver Lee
>Priority: Major
>
> The type aliases covered in CALCITE-5346 may not be included in error 
> messages properly. There should be a setting to do type unmapping when 
> rendering error messages. Type unmapping will require access to a catalog.
> For example, in 
> [SqlCastFunction.checkOperandTypes|https://github.com/apache/calcite/blob/d9a81b88ad561e7e4cedae93e805e0d7a53a7f1a/core/src/main/java/org/apache/calcite/sql/fun/SqlCastFunction.java#L161]
>  it throws an error based on the canonical name of the {{RelDataType}} which 
> may be confusing to end-users.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5511) Substitute type names in error messages when type aliases are present

2023-01-31 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5511:

Summary: Substitute type names in error messages when type aliases are 
present  (was: Think about type aliases in error messages)

> Substitute type names in error messages when type aliases are present
> -
>
> Key: CALCITE-5511
> URL: https://issues.apache.org/jira/browse/CALCITE-5511
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Will Noble
>Priority: Major
>
> The type aliases covered in CALCITE-5346 may not be included in error 
> messages properly. Perhaps there should be a setting to do type unmapping 
> when rendering error messages.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5511) Think about type aliases in error messages

2023-01-31 Thread Will Noble (Jira)
Will Noble created CALCITE-5511:
---

 Summary: Think about type aliases in error messages
 Key: CALCITE-5511
 URL: https://issues.apache.org/jira/browse/CALCITE-5511
 Project: Calcite
  Issue Type: Improvement
Reporter: Will Noble


The type aliases covered in CALCITE-5346 may not be included in error messages 
properly. Perhaps there should be a setting to do type unmapping when rendering 
error messages.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5509) Allow automatic type coercion for string literals in BigQuery functions

2023-01-30 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5509:

Summary: Allow automatic type coercion for string literals in BigQuery 
functions  (was: Allow automatic type coercion for strings in BigQuery 
functions)

> Allow automatic type coercion for string literals in BigQuery functions
> ---
>
> Key: CALCITE-5509
> URL: https://issues.apache.org/jira/browse/CALCITE-5509
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Will Noble
>Priority: Minor
>
> BigQuery allows string literals to be automatically coerced into 
> {{{}TIMESTAMP{}}}, {{{}DATETIME{}}}, or {{DATE}} literals. See in the 
> [docs|https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical]:
> {quote}String literals in canonical date format also implicitly coerce to 
> DATE type when used where a DATE-type expression is expected...
> String literals with the canonical datetime format implicitly coerce to a 
> datetime literal when used where a datetime expression is expected...
> String literals with the canonical timestamp format, including those with 
> time zone names, implicitly coerce to a timestamp literal when used where a 
> timestamp expression is expected.
> {quote}
> We should make sure that the BQ library functions respect this type coercion 
> in general. This only applies to literals; not string values in general. For 
> example, {{'2023-01-30'}} can be coerced to a timestamp, datetime, or date, 
> but {{SUBSTRING('x2023-01-30', 2)}} cannot.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5509) Allow automatic type coercion for strings in BigQuery functions

2023-01-30 Thread Will Noble (Jira)
Will Noble created CALCITE-5509:
---

 Summary: Allow automatic type coercion for strings in BigQuery 
functions
 Key: CALCITE-5509
 URL: https://issues.apache.org/jira/browse/CALCITE-5509
 Project: Calcite
  Issue Type: Improvement
Reporter: Will Noble


BigQuery allows string literals to be automatically coerced into 
{{{}TIMESTAMP{}}}, {{{}DATETIME{}}}, or {{DATE}} literals. See in the 
[docs|https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical]:
{quote}String literals in canonical date format also implicitly coerce to DATE 
type when used where a DATE-type expression is expected...

String literals with the canonical datetime format implicitly coerce to a 
datetime literal when used where a datetime expression is expected...

String literals with the canonical timestamp format, including those with time 
zone names, implicitly coerce to a timestamp literal when used where a 
timestamp expression is expected.
{quote}

We should make sure that the BQ library functions respect this type coercion in 
general. This only applies to literals; not string values in general. For 
example, {{'2023-01-30'}} can be coerced to a timestamp, datetime, or date, but 
{{SUBSTRING('x2023-01-30', 2)}} cannot.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5446) Support TIMESTAMP WITH LOCAL TIME ZONE type in JDBC driver

2023-01-25 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680852#comment-17680852
 ] 

Will Noble commented on CALCITE-5446:
-

{quote}values should be shifted from/to the desired time zone on the way in/out
{quote}
This must be done in Avatica. See CALCITE-5500. I'd like to do a bit of testing 
against my local Avatica branch with a draft fix before signing off on that PR.

> Support TIMESTAMP WITH LOCAL TIME ZONE type in JDBC driver
> --
>
> Key: CALCITE-5446
> URL: https://issues.apache.org/jira/browse/CALCITE-5446
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Oliver Lee
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Support TIMESTAMP WITH LOCAL TIME ZONE type in JDBC driver.
> Currently, Calcite types with {{SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE}} 
> are mapped to JDBC type  (OTHER). As a result, the ResultSet.toString() 
> method just converts the (signed) {{long}} value to a string.
> Here is what 
> [Progress|https://docs.progress.com/en-US/bundle/datadirect-connect-jdbc-51/page/TIMESTAMP-WITH-LOCAL-TIME-ZONE-Data-Type.html]
>  says:
> {quote}
> The Oracle {{TIMESTAMP WITH LOCAL TIME ZONE}} data type is mapped to the JDBC 
> {{TIMESTAMP}} data type.
> When retrieving {{TIMESTAMP WITH LOCAL TIME ZONE}} columns, the value 
> returned to the user is converted to the time zone specified by the 
> {{TIME_ZONE}} session parameter.
> When setting {{TIMESTAMP WITH LOCAL TIME ZONE}} columns:
> * Using a timestamp (using {{PreparedStatement.setTimestamp}}, for example), 
> the value set is converted to the time zone specified by the {{TIME_ZONE}} 
> session parameter.
> * Using a string (using {{PreparedStatement.setString()}}, for example), the 
> string is passed as-is to the server. The supplied string must be in the 
> format specified by the {{NLS_TIMESTAMP_TZ_FORMAT}} session parameter. If 
> not, the Oracle server generates an error when it attempts to convert the 
> string to the {{TIMESTAMP WITH LOCAL TIME ZONE}} type.
> {quote}
> We should definitely map to {{java.sql.Types.TIMESTAMP}}.
> I'm not sure whether Calcite has a {{TIME_ZONE}} session parameter. But 
> values should be shifted from/to the desired time zone on the way in/out.
> And maybe there should also be a Calcite server time zone that is the time 
> zone in which {{TIMESTAMP WITH LOCAL TIME ZONE}} values are represented. By 
> default this should be UTC, and using a time zone with a variable offset from 
> UTC (e.g. "Pacific/Los_Angeles") would be a bad idea.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5308) Add support to Avatica for microsecond-precision timestamps

2023-01-25 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5308:

Description: Avatica [serializes 
timestamps|https://github.com/apache/calcite-avatica/blob/9dfc5cad3efe72e9b14513b15a988ba86ab5a4ea/core/src/main/java/org/apache/calcite/avatica/remote/TypedValue.java#L447]
 as the integer number of milliseconds since epoch. We would like support for 
microsecond precision.  (was: Avatica [serializes 
timestamps|https://github.com/apache/calcite-avatica/blob/9dfc5cad3efe72e9b14513b15a988ba86ab5a4ea/core/src/main/java/org/apache/calcite/avatica/remote/TypedValue.java#L447]
 as the integer number of milliseconds since epoch. We would like support for 
microsecond precision, although we only need it for 
{{TIMESTAMP_WITH_LOCAL_TIME_ZONE}}, which is currently lacking proper support 
(The recommended corresponding Java type as per [the 
spec|https://download.oracle.com/otn-pub/jcp/jdbc-4_2-mrel2-spec/jdbc4.2-fr-spec.pdf?AuthParam=1664936014_d36468601dd3a115d7d95c975043ff3a]
 is {{java.time.OffsetDateTime}}).)

> Add support to Avatica for microsecond-precision timestamps
> ---
>
> Key: CALCITE-5308
> URL: https://issues.apache.org/jira/browse/CALCITE-5308
> Project: Calcite
>  Issue Type: Improvement
>  Components: avatica
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Avatica [serializes 
> timestamps|https://github.com/apache/calcite-avatica/blob/9dfc5cad3efe72e9b14513b15a988ba86ab5a4ea/core/src/main/java/org/apache/calcite/avatica/remote/TypedValue.java#L447]
>  as the integer number of milliseconds since epoch. We would like support for 
> microsecond precision.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5308) Add support to Avatica for microsecond-precision timestamps

2023-01-25 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5308:

Summary: Add support to Avatica for microsecond-precision timestamps  (was: 
Add support to Avatica for microsecond-precision timestamps (with timezone))

> Add support to Avatica for microsecond-precision timestamps
> ---
>
> Key: CALCITE-5308
> URL: https://issues.apache.org/jira/browse/CALCITE-5308
> Project: Calcite
>  Issue Type: Improvement
>  Components: avatica
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Avatica [serializes 
> timestamps|https://github.com/apache/calcite-avatica/blob/9dfc5cad3efe72e9b14513b15a988ba86ab5a4ea/core/src/main/java/org/apache/calcite/avatica/remote/TypedValue.java#L447]
>  as the integer number of milliseconds since epoch. We would like support for 
> microsecond precision, although we only need it for 
> {{TIMESTAMP_WITH_LOCAL_TIME_ZONE}}, which is currently lacking proper support 
> (The recommended corresponding Java type as per [the 
> spec|https://download.oracle.com/otn-pub/jcp/jdbc-4_2-mrel2-spec/jdbc4.2-fr-spec.pdf?AuthParam=1664936014_d36468601dd3a115d7d95c975043ff3a]
>  is {{java.time.OffsetDateTime}}).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5308) Add support to Avatica for microsecond-precision timestamps (with timezone)

2023-01-25 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680851#comment-17680851
 ] 

Will Noble commented on CALCITE-5308:
-

I split this into a separate bug for {{TIMESTAMP WITH LOCAL TIME ZONE}}: 
CALCITE-5500

Now, this bug will just track extended precision for both {{TIMESTAMP}} and 
{{TIMESTAMP WITH LOCAL TIME ZONE}}.

> Add support to Avatica for microsecond-precision timestamps (with timezone)
> ---
>
> Key: CALCITE-5308
> URL: https://issues.apache.org/jira/browse/CALCITE-5308
> Project: Calcite
>  Issue Type: Improvement
>  Components: avatica
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Avatica [serializes 
> timestamps|https://github.com/apache/calcite-avatica/blob/9dfc5cad3efe72e9b14513b15a988ba86ab5a4ea/core/src/main/java/org/apache/calcite/avatica/remote/TypedValue.java#L447]
>  as the integer number of milliseconds since epoch. We would like support for 
> microsecond precision, although we only need it for 
> {{TIMESTAMP_WITH_LOCAL_TIME_ZONE}}, which is currently lacking proper support 
> (The recommended corresponding Java type as per [the 
> spec|https://download.oracle.com/otn-pub/jcp/jdbc-4_2-mrel2-spec/jdbc4.2-fr-spec.pdf?AuthParam=1664936014_d36468601dd3a115d7d95c975043ff3a]
>  is {{java.time.OffsetDateTime}}).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5500) Add support in Avatica for TIMESTAMP WITH LOCAL TIME ZONE

2023-01-25 Thread Will Noble (Jira)
Will Noble created CALCITE-5500:
---

 Summary: Add support in Avatica for TIMESTAMP WITH LOCAL TIME ZONE
 Key: CALCITE-5500
 URL: https://issues.apache.org/jira/browse/CALCITE-5500
 Project: Calcite
  Issue Type: Improvement
  Components: avatica
Reporter: Will Noble


This is closely related to CALCITE-5446 but tracks the changes required in 
Avatica. This is where time zone conversion happens when the client invokes a 
getter method on a {{ResultSet}}. Here are the intended semantics for 
{{TIMESTAMP}} and {{TIMESTAMP WITH LOCAL TIME ZONE}}:

{{TIMESTAMP}}:
* {{getString()}} -> Always returns the same string no matter the client's time 
zone
* {{getTimestamp()}} -> Subtracts the zone offset from the instant, so 
rendering the instant to a string in the client's time zone produces the same 
result as {{getString()}}

{{TIMESTAMP WITH LOCAL TIME ZONE}}:
* {{getTimestamp()}} -> Always returns the same instant no matter the time zone
* {{getString()}} -> Adds the zone offset to the instant before rending it to a 
string, so rendering the result of {{getTimestamp()}} in the client's time zone 
produces the same result as this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5492) Adjusting dates based on time zone doesn't always make sense

2023-01-23 Thread Will Noble (Jira)
Will Noble created CALCITE-5492:
---

 Summary: Adjusting dates based on time zone doesn't always make 
sense
 Key: CALCITE-5492
 URL: https://issues.apache.org/jira/browse/CALCITE-5492
 Project: Calcite
  Issue Type: Improvement
  Components: avatica
Reporter: Will Noble


The JDBC API allows you to provide a {{Calendar}} argument to 
[ResultSet.getDate()|https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getDate(int,%20java.util.Calendar)]
 "to use in constructing the date" and Avatica currently uses it to adjust the 
millisecond value of the {{java.sql.Date}} object that comes out of a result 
set according to the time zone of the calendar, similar to how it adjust 
timestamps and time-of-day values.

This might make sense if Calcite always represented date objects with 
millisecond precision, but since it often represents them as an integer number 
of days since 1970-01-01, adjusting based on time zone does not make sense in 
general. This has become an issue in the fix for 
[CALCITE-5488|https://issues.apache.org/jira/browse/CALCITE-5488] because date 
values may have to be "un-adjusted", but information may have already been lost 
due to truncating division to convert milliseconds to days.

I'm not yet sure what the best way to proceed might be, but I'm wondering what 
the intended semantics here are. Is a date meant to have millisecond resolution 
(in which case representing it as an integer number of days does not make 
sense), or is it meant to have only day resolution (in which case applying a 
time zone does not make sense)?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5492) Adjusting dates based on time zone doesn't always make sense

2023-01-23 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5492:

Priority: Minor  (was: Major)

> Adjusting dates based on time zone doesn't always make sense
> 
>
> Key: CALCITE-5492
> URL: https://issues.apache.org/jira/browse/CALCITE-5492
> Project: Calcite
>  Issue Type: Improvement
>  Components: avatica
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>
> The JDBC API allows you to provide a {{Calendar}} argument to 
> [ResultSet.getDate()|https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getDate(int,%20java.util.Calendar)]
>  "to use in constructing the date" and Avatica currently uses it to adjust 
> the millisecond value of the {{java.sql.Date}} object that comes out of a 
> result set according to the time zone of the calendar, similar to how it 
> adjust timestamps and time-of-day values.
> This might make sense if Calcite always represented date objects with 
> millisecond precision, but since it often represents them as an integer 
> number of days since 1970-01-01, adjusting based on time zone does not make 
> sense in general. This has become an issue in the fix for 
> [CALCITE-5488|https://issues.apache.org/jira/browse/CALCITE-5488] because 
> date values may have to be "un-adjusted", but information may have already 
> been lost due to truncating division to convert milliseconds to days.
> I'm not yet sure what the best way to proceed might be, but I'm wondering 
> what the intended semantics here are. Is a date meant to have millisecond 
> resolution (in which case representing it as an integer number of days does 
> not make sense), or is it meant to have only day resolution (in which case 
> applying a time zone does not make sense)?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-5492) Adjusting dates based on time zone doesn't always make sense

2023-01-23 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble reassigned CALCITE-5492:
---

Assignee: Will Noble

> Adjusting dates based on time zone doesn't always make sense
> 
>
> Key: CALCITE-5492
> URL: https://issues.apache.org/jira/browse/CALCITE-5492
> Project: Calcite
>  Issue Type: Improvement
>  Components: avatica
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Major
>
> The JDBC API allows you to provide a {{Calendar}} argument to 
> [ResultSet.getDate()|https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getDate(int,%20java.util.Calendar)]
>  "to use in constructing the date" and Avatica currently uses it to adjust 
> the millisecond value of the {{java.sql.Date}} object that comes out of a 
> result set according to the time zone of the calendar, similar to how it 
> adjust timestamps and time-of-day values.
> This might make sense if Calcite always represented date objects with 
> millisecond precision, but since it often represents them as an integer 
> number of days since 1970-01-01, adjusting based on time zone does not make 
> sense in general. This has become an issue in the fix for 
> [CALCITE-5488|https://issues.apache.org/jira/browse/CALCITE-5488] because 
> date values may have to be "un-adjusted", but information may have already 
> been lost due to truncating division to convert milliseconds to days.
> I'm not yet sure what the best way to proceed might be, but I'm wondering 
> what the intended semantics here are. Is a date meant to have millisecond 
> resolution (in which case representing it as an integer number of days does 
> not make sense), or is it meant to have only day resolution (in which case 
> applying a time zone does not make sense)?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5488) Avatica double-adjusts timestamps when calling Array.getResultSet()

2023-01-23 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680016#comment-17680016
 ] 

Will Noble commented on CALCITE-5488:
-

This is actually a pretty narrow bug. I'll step through the problem at a higher 
level than in the ticket description to illuminate what I see as the two 
possible solutions:

Problem:
1. We get a {{ResultSet}} that contains an array that contains timestamps.
2. We properly adjust all the timestamps in the array when we call 
{{getArray()}}.
3. We decide to convert that array to a "sub-{{ResultSet}}" by calling 
{{java.sql.Array.getResultSet()}} on the array, causing the timestamps to be 
double-adjusted.

It's not so much that any of your 4 example constraints were violated. Rather, 
it's that the process of of getting an array and then converting it to a result 
set involves an incorrect series of operations on the primitive ({{long}}) 
internal representation of the result set before it's abstracted into a 
{{java.sql.Timestamp}} object.

My original proposal was to set the local calendar of any {{ResultSet}} created 
by {{java.sql.Array.getResultSet()}} to UTC. By static analysis of the code, I 
can verify that this will *only* affect this particular narrow case: getting 
time/date/timestamp values out of an array result set, as well as the return 
value of {{AvaticaResultSet.getLocalCalendar()}}, which is a public method but 
is not part of the JDBC spec and is not used anywhere in the Avatica codebase. 
However, this would create a problematic edge-case where an explicit calendar 
is provided to get the individual timestamps out of the array's result set; 
they would be incorrect.

The only real solution I see now is to "de-adjust" the data in the result set 
during {{ArrayFactoryImpl.create()}} (in between steps 2 and 3 of the problem 
above), effectively undoing the first adjustment and making it so that the 
second adjustment (whether implicit or explicit) is correct.

> Avatica double-adjusts timestamps when calling Array.getResultSet()
> ---
>
> Key: CALCITE-5488
> URL: https://issues.apache.org/jira/browse/CALCITE-5488
> Project: Calcite
>  Issue Type: Bug
>  Components: avatica
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>
> If you go into {{AvaticaResultSetConversionTest}} and delete the 
> [timeZone|https://github.com/apache/calcite-avatica/blob/dbe9b1d8c2e53474eb40cfaf5721aceca3bdb57f/core/src/test/java/org/apache/calcite/avatica/AvaticaResultSetConversionsTest.java#L1121]
>  property for the connection, which is currently set to "GMT", several 
> {{getString()}} and {{getArray()}} tests start to fail. The {{getString()}} 
> failures are self-explanatory -- the expected value has become incorrect 
> because of the time zone change -- but the {{getArray()}} failures are 
> tricky. They're double-adjusting the timestamps:
> * Once when they call 
> [getObject()|https://github.com/apache/calcite-avatica/blob/810acf80771310431d7ef576f3404299ebb8eaf2/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L1432]
>  (which invokes {{getTimestamp()}} with the connection's default calendar, 
> which was previously GMT but is now the system default).
> * Then, in 
> [ArrayImpl.equalContents()|https://github.com/apache/calcite-avatica/blob/820edf6f653607afb5a2a280a32f315aff1f64cb/core/src/main/java/org/apache/calcite/avatica/util/ArrayImpl.java#L233]
>  the arrays are converted to result sets for traversal via 
> [Array.getResultSet()|https://docs.oracle.com/javase/8/docs/api/java/sql/Array.html#getResultSet--]
>  , which uses the same time zone as the "factory" result set and eventually 
> invokes {{getObject()}} again, thus applying the time zone offset twice.
> This is a bug in the implementation of {{Array.getResultSet()}} that only 
> manifests when the array contains timestamps and when the connection default 
> time zone is anything besides GMT, and this is not currently covered by 
> tests. It's easy to cause the failure, however, by changing the connection 
> default time zone for {{AvaticaResultSetConversionTest}} to anything besides 
> GMT or UTC.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5488) Avatica double-adjusts timestamps when calling Array.getResultSet()

2023-01-20 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5488:

Summary: Avatica double-adjusts timestamps when calling 
Array.getResultSet()  (was: Calcite double-adjusts timestamps when calling 
Array.getResultSet())

> Avatica double-adjusts timestamps when calling Array.getResultSet()
> ---
>
> Key: CALCITE-5488
> URL: https://issues.apache.org/jira/browse/CALCITE-5488
> Project: Calcite
>  Issue Type: Bug
>  Components: avatica
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>
> If you go into {{AvaticaResultSetConversionTest}} and delete the 
> [timeZone|https://github.com/apache/calcite-avatica/blob/dbe9b1d8c2e53474eb40cfaf5721aceca3bdb57f/core/src/test/java/org/apache/calcite/avatica/AvaticaResultSetConversionsTest.java#L1121]
>  property for the connection, which is currently set to "GMT", several 
> {{getString()}} and {{getArray()}} tests start to fail. The {{getString()}} 
> failures are self-explanatory -- the expected value has become incorrect 
> because of the time zone change -- but the {{getArray()}} failures are 
> tricky. They're double-adjusting the timestamps:
> * Once when they call 
> [getObject()|https://github.com/apache/calcite-avatica/blob/810acf80771310431d7ef576f3404299ebb8eaf2/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L1432]
>  (which invokes {{getTimestamp()}} with the connection's default calendar, 
> which was previously GMT but is now the system default).
> * Then, in 
> [ArrayImpl.equalContents()|https://github.com/apache/calcite-avatica/blob/820edf6f653607afb5a2a280a32f315aff1f64cb/core/src/main/java/org/apache/calcite/avatica/util/ArrayImpl.java#L233]
>  the arrays are converted to result sets for traversal via 
> [Array.getResultSet()|https://docs.oracle.com/javase/8/docs/api/java/sql/Array.html#getResultSet--]
>  , which uses the same time zone as the "factory" result set and eventually 
> invokes {{getObject()}} again, thus applying the time zone offset twice.
> This is a bug in the implementation of {{Array.getResultSet()}} that only 
> manifests when the array contains timestamps and when the connection default 
> time zone is anything besides GMT, and this is not currently covered by 
> tests. It's easy to cause the failure, however, by changing the connection 
> default time zone for {{AvaticaResultSetConversionTest}} to anything besides 
> GMT or UTC.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-5488) Calcite double-adjusts timestamps when calling Array.getResultSet()

2023-01-20 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble updated CALCITE-5488:

Component/s: avatica

> Calcite double-adjusts timestamps when calling Array.getResultSet()
> ---
>
> Key: CALCITE-5488
> URL: https://issues.apache.org/jira/browse/CALCITE-5488
> Project: Calcite
>  Issue Type: Bug
>  Components: avatica
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>
> If you go into {{AvaticaResultSetConversionTest}} and delete the 
> [timeZone|https://github.com/apache/calcite-avatica/blob/dbe9b1d8c2e53474eb40cfaf5721aceca3bdb57f/core/src/test/java/org/apache/calcite/avatica/AvaticaResultSetConversionsTest.java#L1121]
>  property for the connection, which is currently set to "GMT", several 
> {{getString()}} and {{getArray()}} tests start to fail. The {{getString()}} 
> failures are self-explanatory -- the expected value has become incorrect 
> because of the time zone change -- but the {{getArray()}} failures are 
> tricky. They're double-adjusting the timestamps:
> * Once when they call 
> [getObject()|https://github.com/apache/calcite-avatica/blob/810acf80771310431d7ef576f3404299ebb8eaf2/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L1432]
>  (which invokes {{getTimestamp()}} with the connection's default calendar, 
> which was previously GMT but is now the system default).
> * Then, in 
> [ArrayImpl.equalContents()|https://github.com/apache/calcite-avatica/blob/820edf6f653607afb5a2a280a32f315aff1f64cb/core/src/main/java/org/apache/calcite/avatica/util/ArrayImpl.java#L233]
>  the arrays are converted to result sets for traversal via 
> [Array.getResultSet()|https://docs.oracle.com/javase/8/docs/api/java/sql/Array.html#getResultSet--]
>  , which uses the same time zone as the "factory" result set and eventually 
> invokes {{getObject()}} again, thus applying the time zone offset twice.
> This is a bug in the implementation of {{Array.getResultSet()}} that only 
> manifests when the array contains timestamps and when the connection default 
> time zone is anything besides GMT, and this is not currently covered by 
> tests. It's easy to cause the failure, however, by changing the connection 
> default time zone for {{AvaticaResultSetConversionTest}} to anything besides 
> GMT or UTC.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5488) Calcite double-adjusts timestamps when calling Array.getResultSet()

2023-01-19 Thread Will Noble (Jira)
Will Noble created CALCITE-5488:
---

 Summary: Calcite double-adjusts timestamps when calling 
Array.getResultSet()
 Key: CALCITE-5488
 URL: https://issues.apache.org/jira/browse/CALCITE-5488
 Project: Calcite
  Issue Type: Bug
Reporter: Will Noble


If you go into {{AvaticaResultSetConversionTest}} and delete the 
[timeZone|https://github.com/apache/calcite-avatica/blob/dbe9b1d8c2e53474eb40cfaf5721aceca3bdb57f/core/src/test/java/org/apache/calcite/avatica/AvaticaResultSetConversionsTest.java#L1121]
 property for the connection, which is currently set to "GMT", several 
{{getString()}} and {{getArray()}} tests start to fail. The {{getString()}} 
failures are self-explanatory -- the expected value has become incorrect 
because of the time zone change -- but the {{getArray()}} failures are tricky. 
They're double-adjusting the timestamps:
* Once when they call 
[getObject()|https://github.com/apache/calcite-avatica/blob/810acf80771310431d7ef576f3404299ebb8eaf2/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L1432]
 (which invokes {{getTimestamp()}} with the connection's default calendar, 
which was previously GMT but is now the system default).
* Then, in 
[ArrayImpl.equalContents()|https://github.com/apache/calcite-avatica/blob/820edf6f653607afb5a2a280a32f315aff1f64cb/core/src/main/java/org/apache/calcite/avatica/util/ArrayImpl.java#L233]
 the arrays are converted to result sets for traversal via 
[Array.getResultSet()|https://docs.oracle.com/javase/8/docs/api/java/sql/Array.html#getResultSet--]
 , which uses the same time zone as the "factory" result set and eventually 
invokes {{getObject()}} again, thus applying the time zone offset twice.

This is a bug in the implementation of {{Array.getResultSet()}} that only 
manifests when the array contains timestamps and when the connection default 
time zone is anything besides GMT, and this is not currently covered by tests. 
It's easy to cause the failure, however, by changing the connection default 
time zone for {{AvaticaResultSetConversionTest}} to anything besides GMT or UTC.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (CALCITE-5488) Calcite double-adjusts timestamps when calling Array.getResultSet()

2023-01-19 Thread Will Noble (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Noble reassigned CALCITE-5488:
---

Assignee: Will Noble

> Calcite double-adjusts timestamps when calling Array.getResultSet()
> ---
>
> Key: CALCITE-5488
> URL: https://issues.apache.org/jira/browse/CALCITE-5488
> Project: Calcite
>  Issue Type: Bug
>Reporter: Will Noble
>Assignee: Will Noble
>Priority: Minor
>
> If you go into {{AvaticaResultSetConversionTest}} and delete the 
> [timeZone|https://github.com/apache/calcite-avatica/blob/dbe9b1d8c2e53474eb40cfaf5721aceca3bdb57f/core/src/test/java/org/apache/calcite/avatica/AvaticaResultSetConversionsTest.java#L1121]
>  property for the connection, which is currently set to "GMT", several 
> {{getString()}} and {{getArray()}} tests start to fail. The {{getString()}} 
> failures are self-explanatory -- the expected value has become incorrect 
> because of the time zone change -- but the {{getArray()}} failures are 
> tricky. They're double-adjusting the timestamps:
> * Once when they call 
> [getObject()|https://github.com/apache/calcite-avatica/blob/810acf80771310431d7ef576f3404299ebb8eaf2/core/src/main/java/org/apache/calcite/avatica/util/AbstractCursor.java#L1432]
>  (which invokes {{getTimestamp()}} with the connection's default calendar, 
> which was previously GMT but is now the system default).
> * Then, in 
> [ArrayImpl.equalContents()|https://github.com/apache/calcite-avatica/blob/820edf6f653607afb5a2a280a32f315aff1f64cb/core/src/main/java/org/apache/calcite/avatica/util/ArrayImpl.java#L233]
>  the arrays are converted to result sets for traversal via 
> [Array.getResultSet()|https://docs.oracle.com/javase/8/docs/api/java/sql/Array.html#getResultSet--]
>  , which uses the same time zone as the "factory" result set and eventually 
> invokes {{getObject()}} again, thus applying the time zone offset twice.
> This is a bug in the implementation of {{Array.getResultSet()}} that only 
> manifests when the array contains timestamps and when the connection default 
> time zone is anything besides GMT, and this is not currently covered by 
> tests. It's easy to cause the failure, however, by changing the connection 
> default time zone for {{AvaticaResultSetConversionTest}} to anything besides 
> GMT or UTC.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5487) Avatica Result Sets do not properly handle TIMESTAMP WITH LOCAL TIME ZONE

2023-01-19 Thread Will Noble (Jira)
Will Noble created CALCITE-5487:
---

 Summary: Avatica Result Sets do not properly handle TIMESTAMP WITH 
LOCAL TIME ZONE
 Key: CALCITE-5487
 URL: https://issues.apache.org/jira/browse/CALCITE-5487
 Project: Calcite
  Issue Type: Bug
  Components: avatica
Reporter: Will Noble


With the addition of proper support for {{TIMESTAMP WITH LOCAL TIME ZONE}} to 
Calcite being done as part of CALCITE-5346, Avatica needs to support the proper 
semantics for this type. Whereas a regular {{TIMESTAMP}} always has the same 
string representation but can have different instants depending on the calendar 
(time zone) passed to the result set getter, a {{TIMESTAMP WITH LOCAL TIME 
ZONE}} does the inverse: it always represents the same instant, but could have 
a different string representation depending on the calendar passed to the 
result set getter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5424) Customize handling of literals based on type system

2023-01-18 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678434#comment-17678434
 ] 

Will Noble commented on CALCITE-5424:
-

Offline, Julian expressed concern about moving the BigQuery Quidem tests to the 
core parser, citing merge conflicts and confusion. I think it's worth it, 
because we need the core parser to properly support the BQ dialect.

To my understanding, success-condition testing in {{CoreQuidemTest}} implies 
that the more-permissive Babel parser will also succeed in those cases, but 
success-condition testing in {{BabelQuidemTest}} does not imply that it will 
work with the core parser. On the other hand, failure-condition testing has 
inverted implications.

So, I think it makes most sense to have all necessary success conditions tested 
in the core parser to ensure they're successful in both parsers for posterity.

> Customize handling of literals based on type system
> ---
>
> Key: CALCITE-5424
> URL: https://issues.apache.org/jira/browse/CALCITE-5424
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Julian Hyde
>Priority: Major
>
> Currently if you write {{TIMESTAMP '1969-07-20 22:56:00'}}  it will be 
> converted to a value of type {{{}TIMESTAMP{}}}. But if someone has set a 
> custom type mapping where the name {{TIMESTAMP}} corresponds to the 
> {{TIMESTAMP WITH LOCAL TIME ZONE}} type then the literal value should be of 
> that type.
> Custom type mappings are specified by defining user-defined types in the 
> schema; see the method 
> [Schema.getType|https://calcite.apache.org/javadocAggregate/org/apache/calcite/schema/Schema.html#getType(java.lang.String)].
> The contents of the string are currently handled by the parser. But in order 
> to implement this change, the parsing of that string will need to be deferred 
> to validation time (when the actual type is known, not just its name). This 
> applies to literals {{TIMESTAMP}}, {{DATE}}, {{TIME}}, {{ARRAY}}, 
> {{INTERVAL}}. The parser currently emits subclasses of {{SqlLiteral}}, namely 
> {{SqlTimestampLiteral}}, {{SqlDateLiteral}}, {{SqlTimeLiteral}}, 
> {{SqlArrayLiteral}}, {{SqlIntervalLiteral}}. But after this change, the 
> parser will emit a new subclass, {{SqlTaggedLiteral}} instead of all of these.
> Also as part of this change, add BigQuery-compatible literals:
>  * {{NUMERIC '0'}}
>  * {{BIGNUMERIC '0'}}
>  * {{DATETIME '2014-09-27 12:30:00.45'}}
>  * {{JSON 'json_formatted_data'}}
>  
> And a {{TIMESTAMP WITH LOCAL TIME ZONE}} literal:
>  * {{TIMESTAMP WITH LOCAL TIME ZONE '1969-07-21 02:56:00'}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5445) Avatica should display Measures as the proper DATA_TYPE

2022-12-19 Thread Will Noble (Jira)
Will Noble created CALCITE-5445:
---

 Summary: Avatica should display Measures as the proper DATA_TYPE
 Key: CALCITE-5445
 URL: https://issues.apache.org/jira/browse/CALCITE-5445
 Project: Calcite
  Issue Type: Bug
  Components: avatica
Reporter: Will Noble
Assignee: Julian Hyde


Filing this for tracking purposes. According to Julian:

For a measure of type {{FLOAT}} we would return {{DATA_TYPE = 
java.sql.Types.FLOAT}} and {{TYPE_NAME = "Measure"}}. We should not be 
returning {{DATA_TYPE = java.sql.Types.OTHER}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5180) Implement BigQuery functions for DATE, TIME, TIMESTAMP, DATETIME

2022-11-22 Thread Will Noble (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637515#comment-17637515
 ] 

Will Noble commented on CALCITE-5180:
-

This is the first I'm hearing of extreme BQ compatibility. Grepping 
case-insensitively for {{EBQC}} and {{'extreme.*compatibility'}} both turned up 
nothing in Calcite. Is this an existing setting?

I'm assuming you're going with Oracle's definitions of these types, in which 
case they have a handy guide to [picking which type to 
use|https://docs.oracle.com/cd/E11882_01/server.112/e10729/ch4datetime.htm#NLSPG244].
 Here's my take:

* {{TIMESTAMP WITH TIME ZONE}} is clearly not appropriate because BigQuery 
timestamps have no time zone.
* {{TIMESTAMP WITH LOCAL TIME ZONE}} "is appropriate when the original time 
zone is of no interest, but the relative times of events are important and 
daylight saving adjustment does not have to be accurate. The time zone 
conversion that this data type performs to and from the database time zone is 
asymmetrical, due to the daylight saving adjustment. Because this data type 
does not preserve the time zone information, it does not distinguish values 
near the adjustment in fall, whether they are daylight saving time or standard 
time. This confusion between distinct instants can cause an application to 
behave unexpectedly, especially if the adjustment takes place during the normal 
working hours of a user." -- this also doesn't seem appropriate to me.
* {{TIMESTAMP}} seems most appropriate to me. In Oracle, this is "the time of 
an event without the time zone", and according to the [BigQuery 
docs|https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions#timezone_definitions],
 a {{TIMESTAMP}} "represents an absolute point in time, independent of any time 
zone". Seems like a perfect fit. It's also very straightforward.

I believe we should add a new datatype to represent {{DATETIME}}. Neither it, 
not BQ's {{TIMESTAMP}}, map naturally to any of the time zone- related datetime 
types in the Oracle docs.



> Implement BigQuery functions for DATE, TIME, TIMESTAMP, DATETIME
> 
>
> Key: CALCITE-5180
> URL: https://issues.apache.org/jira/browse/CALCITE-5180
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Will Noble
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Implement missing BigQuery functions for DATE, TIME, TIMESTAMP, DATETIME.
> Functions include:
> * CURRENT_DATE, CURRENT_DATETIME, CURRENT_TIME with timezone;
> * EXTRACT with time unit values WEEK(weekday), DATE, TIME;
> * various DATE, TIME, TIMESTAMP, DATETIME constructor functions;
> * DATE_ADD, DATETIME_ADD, TIME_ADD, TIMESTAMP_ADD;
> * DATE_SUB, DATETIME_SUB, TIME_SUB, TIMESTAMP_SUB;
> * DATE_DIFF, DATETIME_DIFF, TIME_DIFF, TIMESTAMP_DIFF;
> * DATE_TRUNC, DATETIME_TRUNC, TIME_TRUNC, TIMESTAMP_TRUNC;
> * LAST_DAY;
> * FORMAT_DATE, FORMAT_DATETIME, FORMAT_TIME, FORMAT_TIMESTAMP; 
> * PARSE_DATE, PARSE_DATETIME, PARSE_TIME, PARSE_TIMESTAMP. 
> The following functions are already implemented: CURRENT_DATETME, 
> DATE(string), DATE_FROM_UNIX_DATE, TIMESTAMP_MICROS, TIMESTAMP_MILLIS, 
> TIMESTAMP_SECONDS, UNIX_MICROS, UNIX_MILLIS, UNIX_SECONDS, UNIX_DATE.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


  1   2   >