[jira] [Commented] (CALCITE-6367) Add timezone support for FORMAT clause in CAST (enabled in BigQuery)
[ https://issues.apache.org/jira/browse/CALCITE-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837465#comment-17837465 ]

Julian Hyde commented on CALCITE-6367:
--------------------------------------

[~jerin_john] Can you add links to any related cases (not just mention them in comments). Links are bi-directional and help us prevent duplicate work.

> Add timezone support for FORMAT clause in CAST (enabled in BigQuery)
> --------------------------------------------------------------------
>
>                 Key: CALCITE-6367
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6367
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Jerin John
>            Priority: Minor
>
> This issue is a followup on CALCITE-6269 that fixes some of Calcite's
> existing format elements implementation to be aligned to BQ functionality.
> Two major formats that might require a bit more rework is adding support for
> the TZH/TZM elements along with time zone areas as described below:
> * [Parsing timestamp
> literals|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string]
> with timezones as used by BQ does not seem to be supported yet (format
> element TZR is unimplemented, BQ has TZH, TZM for hour and minute offsets)
> (eg: {{cast('2020.06.03 00:00:53+00' as timestamp format '.MM.DD
> HH:MI:SSTZH')}}
> * BQ format [timezone as string
> |https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string]
> can take an additional argument {{{}AT TIME ZONE 'Asia/Kolkata'{}}}, which
> would require additional parser changes and time zone parameter to be plumbed
> in to the cast operator call.
> One important thing to consider, is that the {{SimpleDateFormat}} class which
> currently stores the datetime object in {{{}CAST{}}}, may not fully support
> timezone features as described and might warrant a broader refactoring of
> this code to use timezone compatible data types.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
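For readers following the SimpleDateFormat concern in the description, here is a minimal sketch of how the example literal could be parsed with java.time while keeping the offset. It is only an illustration, not Calcite's implementation. The class name TzhParseSketch, the hand-written mapping of the BigQuery elements to a DateTimeFormatter pattern (TZH mapped to "x"), and the four-digit year at the start of the pattern are all assumptions made for this example; the year element is not visible in the format string quoted above.

```java
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class TzhParseSketch {
  // Assumed java.time counterpart of "<year>.MM.DD HH:MI:SS" followed by TZH.
  // The pattern letter "x" matches an hour offset such as "+00" or "-08".
  static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("uuuu.MM.dd HH:mm:ssx");

  // Parses the literal and keeps the offset, instead of dropping it.
  static OffsetDateTime parse(String literal) {
    return OffsetDateTime.parse(literal, FORMAT);
  }

  // Roughly what an AT TIME ZONE argument would ask for: same instant, named zone.
  static ZonedDateTime atZone(OffsetDateTime ts, String zoneId) {
    return ts.atZoneSameInstant(ZoneId.of(zoneId));
  }

  public static void main(String[] args) {
    OffsetDateTime ts = parse("2020.06.03 00:00:53+00");
    System.out.println(ts);                          // 2020-06-03T00:00:53Z
    System.out.println(atZone(ts, "Asia/Kolkata"));  // 2020-06-03T05:30:53+05:30[Asia/Kolkata]
  }
}
```

The point of the sketch is only that an offset-aware type carries the parsed TZH value through to the result; how this would be wired into the CAST operator is left open in the issue.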
[jira] [Commented] (CALCITE-6363) Introduce a rule to derive more filters from inner join condition
[ https://issues.apache.org/jira/browse/CALCITE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837462#comment-17837462 ]

Julian Hyde commented on CALCITE-6363:
--------------------------------------

Thanks for finding JoinConditionPushRule (CALCITE-5073). This case seems to be a generalization of that. Maybe the logic should be added to JoinConditionPushRule. What's your opinion, [~libenchao]? I would like to see some test cases for left and right joins. It is possible to move conditions across outer joins, in some cases. I don't believe that all the changes to RexNormalize and RexUtil are necessary.

> Introduce a rule to derive more filters from inner join condition
> -----------------------------------------------------------------
>
>                 Key: CALCITE-6363
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6363
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>            Reporter: ruanhui
>            Priority: Minor
>              Labels: pull-request-available
>
> Sometimes we can infer more predicates from an inner join. For example, in the
> query
> SELECT * FROM ta INNER JOIN tb ON ta.x = tb.y WHERE ta.x > 10
> we can infer the condition tb.y > 10 and push it down to the table tb.
> In this way, it is possible to reduce the amount of data involved in the join.
> To achieve this, here is my idea.
> The core data structure is two Multimaps:
> predicateMap: a map from an inputRef to its predicates, such as: $1 ->
> [$1 > 10, $1 < 20, $1 = $2]
> equivalenceMap: a map from an inputRef to its equivalent values or
> inputRefs, such as: $1 -> [$2, 1]
> The filter derivation is divided into 4 steps:
> 1. construct the predicate map and equivalence map by traversing all
> conjunctions in the condition
> 2. search the maps and rewrite predicates with equivalent inputRefs or literals
> 2.1 find all inputRefs that are equivalent to the current inputRef, and then
> rewrite all predicates involving equivalent inputRefs using inputRef, for
> example if we have inputRef $1 = equivInputRef $2, then we can rewrite
> {$2 = 10} to {$1 = 10}.
> 2.2 find all predicates involving the current inputRef. If any predicate refers
> to another inputRef, rewrite the predicate with the literal/constant
> equivalent to that inputRef, such as: if we have {$1 > $2} and
> {$2 = 10} then we can infer the new condition {$1 > 10}.
> 2.3 derive new predicates based on the equivalence relation in
> equivalenceMultimap
> 3. compose all original predicates and derived predicates
> 4. simplify the expression, for example by range merging:
> {$1 > 10 AND $1 > 20} => {$1 > 20}, {$1 > $2 AND $1 > $2} => {$1 > $2}
> Anyone interested in this, please feel free to comment on this issue.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
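Steps 1 to 3 of the proposal can be sketched in plain Java as follows. This is a simplified illustration under stated assumptions, not the code from the pull request and not Calcite's RexNode API: inputRefs are modelled as integer indexes, predicates as a small hypothetical Pred class, and the equivalence map as a union-find array. Step 4 (simplification) is left out.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TransitiveFilterSketch {
  /** Hypothetical conjunct "$left op literal" or "$left op $rightRef". */
  static final class Pred {
    final int left;
    final String op;
    final int rightRef;    // index of the other inputRef, or -1 for a literal
    final String literal;  // literal text, or null when rightRef >= 0

    private Pred(int left, String op, int rightRef, String literal) {
      this.left = left;
      this.op = op;
      this.rightRef = rightRef;
      this.literal = literal;
    }

    static Pred lit(int left, String op, String literal) {
      return new Pred(left, op, -1, literal);
    }

    static Pred ref(int left, String op, int rightRef) {
      return new Pred(left, op, rightRef, null);
    }

    @Override public String toString() {
      return "$" + left + " " + op + " " + (rightRef >= 0 ? "$" + rightRef : literal);
    }
  }

  // Union-find lookup with path halving.
  static int find(int[] parent, int x) {
    while (parent[x] != x) {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
    return x;
  }

  /** Returns the original conjuncts plus the predicates derived from equalities. */
  static Set<String> derive(List<Pred> conjuncts, int fieldCount) {
    // Step 1: build equivalence classes from "$a = $b" conjuncts.
    int[] parent = new int[fieldCount];
    for (int i = 0; i < fieldCount; i++) {
      parent[i] = i;
    }
    for (Pred p : conjuncts) {
      if (p.rightRef >= 0 && p.op.equals("=")) {
        parent[find(parent, p.left)] = find(parent, p.rightRef);
      }
    }
    // Steps 2 and 3: keep every original conjunct, and copy each
    // inputRef-vs-literal predicate to all equivalent inputRefs.
    Set<String> result = new LinkedHashSet<>();
    for (Pred p : conjuncts) {
      result.add(p.toString());
      if (p.rightRef >= 0) {
        continue; // only literal comparisons are propagated in this sketch
      }
      for (int other = 0; other < fieldCount; other++) {
        if (other != p.left && find(parent, other) == find(parent, p.left)) {
          result.add(Pred.lit(other, p.op, p.literal).toString());
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // ta.x is $0 and tb.y is $1: ON ta.x = tb.y WHERE ta.x > 10
    System.out.println(derive(Arrays.asList(Pred.ref(0, "=", 1), Pred.lit(0, ">", "10")), 2));
    // prints [$0 = $1, $0 > 10, $1 > 10]
  }
}
```

Because the derived predicate $1 > 10 references only the right input, it is the kind of condition that could then be pushed below the join. A real rule would also have to apply step 4 so that re-running it does not keep producing the same predicates.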
[jira] [Commented] (CALCITE-6363) Introduce a rule to derive more filters from inner join condition
[ https://issues.apache.org/jira/browse/CALCITE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837461#comment-17837461 ] Julian Hyde commented on CALCITE-6363: -- On the mailing list [~jamesstarr] replied: {quote}The keyword I think you want is transitive filter pushdown. The reduce expression rule handles some of the trivial cases outlined as examples. Also, you will need to simplify the pushed down filters after they are extracted to prevent infinite loops. Ideally, for the equivalenceMap, an arbitrary subtree that only references a single side of the join could be used. Example 1: SELECT * FROM t1, t2 WHERE subString(t1.zip, 0, 6) = subString(t2.zip, 0, 6) AND subString(t1.zip, 0, 6) IN () {quote}
[jira] [Commented] (CALCITE-6367) Add timezone support for FORMAT clause in CAST (enabled in BigQuery)
[ https://issues.apache.org/jira/browse/CALCITE-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837414#comment-17837414 ] Jerin John commented on CALCITE-6367: - [~mbudiu] thanks for sharing that info, you're right we need to reuse the same pattern to make timezones available in the CAST operator as well
[jira] [Updated] (CALCITE-6367) Add timezone support for FORMAT clause in CAST (enabled in BigQuery)
[ https://issues.apache.org/jira/browse/CALCITE-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerin John updated CALCITE-6367: Priority: Minor (was: Major)
[jira] [Updated] (CALCITE-6367) Add timezone support for FORMAT clause in CAST (enabled in BigQuery)
[ https://issues.apache.org/jira/browse/CALCITE-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerin John updated CALCITE-6367: Description: This issue is a followup on CALCITE-6269 that fixes some of Calcite's existing format elements implementation to be aligned to BQ functionality. Two major formats that might require a bit more rework is adding support for the TZH/TZM elements along with time zone areas as described below: * [Parsing timestamp literals|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string] with timezones as used by BQ does not seem to be supported yet (format element TZR is unimplemented, BQ has TZH, TZM for hour and minute offsets) (eg: {{cast('2020.06.03 00:00:53+00' as timestamp format '.MM.DD HH:MI:SSTZH')}} * BQ format [timezone as string |https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string] can take an additional argument {{{}AT TIME ZONE 'Asia/Kolkata'{}}}, which would require additional parser changes and time zone parameter to be plumbed in to the cast operator call. One important thing to consider, is that the {{SimpleDateFormat}} class which currently stores the datetime object in {{{}CAST{}}}, may not fully support timezone features as described and might warrant a broader refactoring of this code to use timezone compatible data types. was: This issue is a followup on CALCITE-6269 that fixes some of Calcite's existing format elements implementation to be aligned to BQ functionality. 
Two major formats that might require a bit more rework is adding support for the TZH/TZM elements along with time zone areas as described below: * [Parsing timestamp literals|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string] with timezones as used by BQ does not seem to be supported yet (format element TZR is unimplemented, BQ has TZH, TZM for hour and minute offsets) (eg: {{cast('2020.06.03 00:00:53+00' as timestamp format '.MM.DD HH:MI:SSTZH')}} * BQ format [timezone as string |https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string] can take an additional argument {{{}AT TIME ZONE 'Asia/Kolkata'{}}}, which would require additional parser changes and time zone parameter to be plumbed in to the cast operator call. One important thing to consider, is that the {{SimpleDateFormat}} class which currently stores the datetime object, may not fully support timezone features as described and might warrant a broader refactoring of this code.
[jira] [Commented] (CALCITE-6269) Fix missing/broken BigQuery date-time format elements
[ https://issues.apache.org/jira/browse/CALCITE-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837361#comment-17837361 ]

Jerin John commented on CALCITE-6269:
-------------------------------------

Hi committers, I've raised this [PR|https://github.com/apache/calcite/pull/3761], which should fix most of the missing use cases listed above.

A couple of points to note. As observed and highlighted in a comment on CALCITE-6315, the {{SimpleDateFormat}} class which currently stores the datetime object only supports parsing up to 3 decimal places for values like fractional seconds. This prevents us from producing a meaningful value for FFn elements with n > 3 (e.g. FF4, FF5, .. FF9), as precision above 3 is not available in the parsed object. My temporary fix is to pad zeroes to the right; the existing implementation padded to the left and returned an incorrect answer, as seen in that comment.

According to the BQ [docs|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_year_as_string], the {{}} format supports 4 or more digits as a year value. This turned out to be incorrect: BQ Studio threw an error similar to the one Calcite would throw, i.e. incorrect literal supplied as year.

Additionally, I have moved the last two features previously mentioned in the description, regarding support for time zone format elements, to a new ticket, CALCITE-6367, as they require more rework than fixing existing elements or adding similar ones.

I would appreciate a review of the PR, and comments on how to handle these unfixed cases and time zone support later.

> Fix missing/broken BigQuery date-time format elements > - > > Key: CALCITE-6269 > URL: https://issues.apache.org/jira/browse/CALCITE-6269 > Project: Calcite > Issue Type: Bug >Reporter: Jerin John >Assignee: Jerin John >Priority: Minor > Labels: pull-request-available > > Calcite has the > [FormatModels|https://github.com/apache/calcite/blob/2dadcd1a0e235f5fe1b29c9c32014035971fd45e/core/src/main/java/org/apache/calcite/util/format/FormatModels.java#L115] > class which is missing support for or has incorrect implementation for the > following DATE-TIME format elements: > * [YYY / > Y|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_year_as_string] > - last three or 1 digits of year > * > [MONTH|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_month_as_string] > formats to "Jan" instead of "JANUARY" > * > [S|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_second_as_string] > - seconds of the day (5 digits), only SS is available that gives seconds of > the minute. > * [FFn > (n=1/2)|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_second_as_string] > - always returns seconds with precision 3 (=FF3); also BQ supports n=1-9, > calcite format models supports n=1-6, should we expand this range? > * > [AM/PM|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_meridian_as_string] > - Meridian formats not available -- This message was sent by Atlassian Jira (v8.20.10#820010)
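The difference between padding fractional seconds on the right and on the left, as discussed in the comment above, can be shown with a small sketch. It assumes the parsed value only carries milliseconds (0 to 999), and that FF1 and FF2 truncate rather than round. The class and method names are hypothetical, and this is not the code from the pull request.

```java
import java.util.Locale;

class FractionalSecondsSketch {
  /** Formats a millisecond-of-second value (0..999) as n fractional digits, 1 <= n <= 9. */
  static String formatFraction(int millis, int n) {
    if (millis < 0 || millis > 999 || n < 1 || n > 9) {
      throw new IllegalArgumentException("millis=" + millis + ", n=" + n);
    }
    // Three significant digits are all that the parsed object provides.
    String digits = String.format(Locale.ROOT, "%03d", millis);
    if (n <= 3) {
      return digits.substring(0, n); // FF1/FF2: truncate (assumed behaviour)
    }
    StringBuilder sb = new StringBuilder(digits);
    while (sb.length() < n) {
      sb.append('0'); // pad on the right so the value of the fraction is unchanged
    }
    return sb.toString();
  }

  /** Left padding, shown only for contrast: it changes the value of the fraction. */
  static String leftPadded(int millis, int n) {
    return String.format(Locale.ROOT, "%0" + n + "d", millis);
  }

  public static void main(String[] args) {
    System.out.println(formatFraction(123, 5)); // 12300, i.e. .12300 seconds
    System.out.println(leftPadded(123, 5));     // 00123, i.e. .00123 seconds (wrong)
  }
}
```

Right padding cannot recover digits beyond the third, so FF4 to FF9 would always end in zeroes until the underlying type carries higher precision.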
[jira] [Updated] (CALCITE-6269) Fix missing/broken BigQuery date-time format elements
[ https://issues.apache.org/jira/browse/CALCITE-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerin John updated CALCITE-6269: Description: Calcite has the [FormatModels|https://github.com/apache/calcite/blob/2dadcd1a0e235f5fe1b29c9c32014035971fd45e/core/src/main/java/org/apache/calcite/util/format/FormatModels.java#L115] class which is missing support for or has incorrect implementation for the following DATE-TIME format elements: * [YYY / Y|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_year_as_string] - last three or 1 digits of year * [MONTH|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_month_as_string] formats to "Jan" instead of "JANUARY" * [S|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_second_as_string] - seconds of the day (5 digits), only SS is available that gives seconds of the minute. * [FFn (n=1/2)|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_second_as_string] - always returns seconds with precision 3 (=FF3); also BQ supports n=1-9, calcite format models supports n=1-6, should we expand this range? 
* [AM/PM|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_meridian_as_string] - Meridian formats not available was: Calcite has the [FormatModels|https://github.com/apache/calcite/blob/2dadcd1a0e235f5fe1b29c9c32014035971fd45e/core/src/main/java/org/apache/calcite/util/format/FormatModels.java#L115] class which is missing support for or has incorrect implementation for the following DATE-TIME format elements: * [YYY / Y|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_year_as_string] - last three or 1 digits of year * [|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_year_as_string] - supports four or more digits in the year, Calcite using [DateString|https://github.com/apache/calcite/blob/3326475c766267d521330006cc80730c4e456191/core/src/main/java/org/apache/calcite/util/DateString.java] util throws: {{java.lang.IllegalArgumentException: Year out of range: [12018]}} * [MONTH|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_month_as_string] formats to "Jan" instead of "JANUARY" * [S|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_second_as_string] - seconds of the day (5 digits), only SS is available that gives seconds of the minute. * [FFn (n=1/2)|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_second_as_string] - always returns seconds with precision 3 (=FF3); also BQ supports n=1-9, calcite format models supports n=1-6, should we expand this range? 
* [AM/PM|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_meridian_as_string] - Meridian formats not available
[jira] [Commented] (CALCITE-6366) Code generated by EnumUtils#convert should throw an exception if the target type is overflowed
[ https://issues.apache.org/jira/browse/CALCITE-6366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837356#comment-17837356 ] Mihai Budiu commented on CALCITE-6366: -- I have submitted (at least a partial) fix in https://github.com/apache/calcite/pull/3589 The issue addressed is [CALCITE-6169] > Code generated by EnumUtils#convert should throw an exception if the target > type is overflowed > -- > > Key: CALCITE-6366 > URL: https://issues.apache.org/jira/browse/CALCITE-6366 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Ruben Q L >Priority: Major > > Code generated by EnumUtils#convert should throw an exception if the target > type is overflowed (consider using Expressions#convertChecked) -- This message was sent by Atlassian Jira (v8.20.10#820010)
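To illustrate the behaviour the issue asks for, the sketch below contrasts a plain Java narrowing cast, which wraps silently, with a checked conversion that throws on overflow. It is plain Java written for this illustration, not the code that EnumUtils#convert generates and not the linked pull request; the class and method names are hypothetical.

```java
class CheckedNarrowingSketch {
  // A plain cast keeps only the low 32 bits, so out-of-range values wrap.
  static int uncheckedToInt(long value) {
    return (int) value;
  }

  // Math.toIntExact throws ArithmeticException when the value does not fit in an int.
  static int checkedToInt(long value) {
    return Math.toIntExact(value);
  }

  // There is no Math.toByteExact, so the range check is written by hand.
  static byte checkedToByte(long value) {
    if (value < Byte.MIN_VALUE || value > Byte.MAX_VALUE) {
      throw new ArithmeticException("byte overflow: " + value);
    }
    return (byte) value;
  }

  public static void main(String[] args) {
    long tooBig = Integer.MAX_VALUE + 1L;
    System.out.println(uncheckedToInt(tooBig)); // -2147483648
    try {
      checkedToInt(tooBig);
    } catch (ArithmeticException e) {
      System.out.println("overflow detected");
    }
  }
}
```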
[jira] [Updated] (CALCITE-6269) Fix missing/broken BigQuery date-time format elements
[ https://issues.apache.org/jira/browse/CALCITE-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated CALCITE-6269: Labels: pull-request-available (was: )
[jira] [Commented] (CALCITE-6367) Add timezone support for FORMAT clause in CAST (enabled in BigQuery)
[ https://issues.apache.org/jira/browse/CALCITE-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837353#comment-17837353 ] Mihai Budiu commented on CALCITE-6367: -- Partial support for date time/times with timezone has been merged. https://issues.apache.org/jira/browse/CALCITE-6138 https://github.com/apache/calcite/pull/3569 Classes to store these data types have existed in Calcite for a while.
[jira] [Updated] (CALCITE-6269) Fix missing/broken BigQuery date-time format elements
[ https://issues.apache.org/jira/browse/CALCITE-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerin John updated CALCITE-6269: Description: Calcite has the [FormatModels|https://github.com/apache/calcite/blob/2dadcd1a0e235f5fe1b29c9c32014035971fd45e/core/src/main/java/org/apache/calcite/util/format/FormatModels.java#L115] class which is missing support for or has incorrect implementation for the following DATE-TIME format elements: * [YYY / Y|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_year_as_string] - last three or 1 digits of year * [|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_year_as_string] - supports four or more digits in the year, Calcite using [DateString|https://github.com/apache/calcite/blob/3326475c766267d521330006cc80730c4e456191/core/src/main/java/org/apache/calcite/util/DateString.java] util throws: {{java.lang.IllegalArgumentException: Year out of range: [12018]}} * [MONTH|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_month_as_string] formats to "Jan" instead of "JANUARY" * [S|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_second_as_string] - seconds of the day (5 digits), only SS is available that gives seconds of the minute. * [FFn (n=1/2)|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_second_as_string] - always returns seconds with precision 3 (=FF3); also BQ supports n=1-9, calcite format models supports n=1-6, should we expand this range? 
* [AM/PM|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_meridian_as_string] - Meridian formats not available was: Calcite has the [FormatModels|https://github.com/apache/calcite/blob/2dadcd1a0e235f5fe1b29c9c32014035971fd45e/core/src/main/java/org/apache/calcite/util/format/FormatModels.java#L115] class which is missing support for or has incorrect implementation for the following DATE-TIME format elements: * [YYY / Y|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_year_as_string] - last three or 1 digits of year * [|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_year_as_string] - supports four or more digits in the year, Calcite using [DateString|https://github.com/apache/calcite/blob/3326475c766267d521330006cc80730c4e456191/core/src/main/java/org/apache/calcite/util/DateString.java] util throws: {{java.lang.IllegalArgumentException: Year out of range: [12018]}} * [MONTH|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_month_as_string] formats to "Jan" instead of "JANUARY" * [S|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_second_as_string] - seconds of the day (5 digits), only SS is available that gives seconds of the minute. * [FFn (n=1/2)|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_second_as_string] - always returns seconds with precision 3 (=FF3); also BQ supports n=1-9, calcite format models supports n=1-6, should we expand this range? 
* [AM/PM|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_meridian_as_string] - Meridian formats not available * [Parsing timestamp literals|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string] with timezones as used by BQ does not seem to be supported yet (format element TZR is unimplemented, BQ has TZH, TZM for hour and minute offsets) (eg: {{cast('2020.06.03 00:00:53+00' as timestamp format '.MM.DD HH:MI:SSTZH')}} * BQ format [timezone as string |https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string] can take an additional argument {{{}AT TIME ZONE 'Asia/Kolkata'{}}}, which would require additional parser changes and time zone parameter to be plumbed in to the cast operator call. > Fix missing/broken BigQuery date-time format elements > - > > Key: CALCITE-6269 > URL: https://issues.apache.org/jira/browse/CALCITE-6269 > Project: Calcite > Issue Type: Bug >Reporter: Jerin John >Assignee: Jerin John >Priority: Minor > > Calcite has the > [FormatModels|https://github.com/apache/calcite/blob/2dadcd1a0e235f5fe1b29c9c32014035971fd45e/core/src/main/java/org/apache/calcite/util/format/FormatModels.java#L115] > class which is missing support for or has incorrect implementation for the > following DATE-TIME format elements: > * [YYY / > Y|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_year_as_string] > - last three or 1 digits of year >
[jira] [Created] (CALCITE-6367) Add timezone support for FORMAT clause in CAST (enabled in BigQuery)
Jerin John created CALCITE-6367: --- Summary: Add timezone support for FORMAT clause in CAST (enabled in BigQuery) Key: CALCITE-6367 URL: https://issues.apache.org/jira/browse/CALCITE-6367 Project: Calcite Issue Type: Bug Reporter: Jerin John This issue is a follow-up to CALCITE-6269, which fixes some of Calcite's existing format element implementations to align them with BigQuery (BQ) functionality. Two areas that might require more rework are support for the TZH/TZM elements and for time zone areas, as described below: * [Parsing timestamp literals|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string] with time zones as used by BQ does not seem to be supported yet (format element TZR is unimplemented; BQ has TZH and TZM for hour and minute offsets), e.g. {{cast('2020.06.03 00:00:53+00' as timestamp format '.MM.DD HH:MI:SSTZH')}} * BQ format [timezone as string |https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string] can take an additional argument {{AT TIME ZONE 'Asia/Kolkata'}}, which would require additional parser changes and a time zone parameter to be plumbed into the cast operator call. One important thing to consider is that the {{SimpleDateFormat}} class, which currently stores the datetime object, may not fully support the time zone features described above, and this might warrant a broader refactoring of the code. -- This message was sent by Atlassian Jira (v8.20.10#820010)
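To make the time zone concern in CALCITE-6367 more concrete, here is a minimal sketch of how a literal with an hour offset could be parsed and then shifted to a named zone using java.time rather than {{SimpleDateFormat}}. This is not Calcite's implementation. The class and method names are hypothetical, and the Java pattern (including the four-digit year field and the 24-hour clock) is an assumption chosen to match the sample literal, not a mapping taken from the issue.

```java
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, for illustration only; not part of Calcite.
final class TzFormatSketch {
  // Assumed java.time counterpart of the BigQuery-style format in the issue:
  // 'x' accepts an hour-only offset such as "+00" (minutes are optional).
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("uuuu.MM.dd HH:mm:ssx");

  /** Parses a timestamp literal that ends with an hour offset (TZH-like). */
  static OffsetDateTime parseWithOffset(String literal) {
    return OffsetDateTime.parse(literal, FORMAT);
  }

  /** Re-expresses the same instant in a named zone, like AT TIME ZONE. */
  static ZonedDateTime atTimeZone(OffsetDateTime timestamp, String zoneId) {
    return timestamp.atZoneSameInstant(ZoneId.of(zoneId));
  }

  public static void main(String[] args) {
    OffsetDateTime ts = parseWithOffset("2020.06.03 00:00:53+00");
    System.out.println(ts);
    System.out.println(atTimeZone(ts, "Asia/Kolkata"));
  }
}
```

Unlike a {{SimpleDateFormat}}-based value, the parsed object here carries its offset explicitly, which is the kind of time-zone-aware data type the issue suggests may be needed.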
[jira] [Comment Edited] (CALCITE-6265) Type coercion is failing for numeric values in prepared statements
[ https://issues.apache.org/jira/browse/CALCITE-6265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836560#comment-17836560 ] Ruben Q L edited comment on CALCITE-6265 at 4/15/24 3:10 PM: - I have [opened a PR|https://github.com/apache/calcite/pull/3758] to fix the problems (partially reverting some of the original changes, partially making some adjustments): - It seems the original issue can be solved more easily in RexToLixTranslator, by simply converting to Number and then to storageType for numeric types. - All the bind* tests that were added in the first PR still work, plus a new test that I added to illustrate the issues that I found (with setBigDecimal). - The problems that I detected in my downstream project are also fixed. - The only thing missing from the original solution (I didn't have the time to dig deeper) is the "check for overflow and throw". I have kept the auxiliary method that generates this dynamic code (Expressions#convertChecked), and left a TODO in EnumUtils#convert for future work. I think that this was not part of the strict scope of the current ticket's description, so IMHO it would be acceptable to open a separate ticket and work on that in the future, adding more thorough tests in this regard (and not just the one JdbcTest#bindOverflowingTinyIntParameter that was originally added, which I disabled on my branch). UPDATE: created CALCITE-6366 for this purpose. I'd appreciate some feedback on whether I can move on and merge my proposal. was (Author: rubenql): I have [opened a PR|https://github.com/apache/calcite/pull/3758] to fix the problems (partially reverting some of the original changes, partially making some adjustments): - It seems the original issue can be solved more easily in RexToLixTranslator, by simply converting to Number and then to storageType for numeric types. 
- All the bind* tests that were added in the first PR still work, plus a new test that I added to illustrate the issues that I found (with setBigDecimal). - The problems that I detected in my downstream project are also fixed. - The only thing missing from the original solution (I didn't have the time to dig deeper) is the "check for overflow and throw". I have kept the auxiliary method that generates this dynamic code (Expressions#convertChecked), and left a TODO in EnumUtils#convert for future work. I think that this was not part of the strict scope of the current ticket's description, so IMHO it would be acceptable to open a separate ticket and work on that in the future, adding more thorough tests in this regard (and not just the one JdbcTest#bindOverflowingTinyIntParameter that was originally added, which I disabled on my branch). I'd appreciate some feedback on whether I can move on and merge my proposal. > Type coercion is failing for numeric values in prepared statements > -- > > Key: CALCITE-6265 > URL: https://issues.apache.org/jira/browse/CALCITE-6265 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Tim Nieradzik >Assignee: Ruben Q L >Priority: Major > Labels: pull-request-available > Fix For: 1.37.0 > > > Given a column of type {{INT}}. When providing a {{short}} value as a > placeholder in a prepared statement, a {{ClassCastException}} is thrown. > h2. Test case > {{final String sql =}} > {{ "select \"empid\" from \"hr\".\"emps\" where \"empid\" in (?, ?)";}}{{ > CalciteAssert.hr()}} > {{ .query(sql)}} > {{ .consumesPreparedStatement(p -> {}} > {{ p.setShort(1, (short) 100);}} > {{ p.setShort(2, (short) 110);}} > {{ })}} > {{ .returnsUnordered("empid=100", "empid=110");}} > h2. 
Stack trace > {{java.lang.ClassCastException: class java.lang.Short cannot be cast to class > java.lang.Integer (java.lang.Short and java.lang.Integer are in module > java.base of loader 'bootstrap')}} > {{ at Baz$1$1.moveNext(Unknown Source)}} > {{ at > org.apache.calcite.linq4j.Linq4j$EnumeratorIterator.<init>(Linq4j.java:679)}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
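The failure in the CALCITE-6265 stack trace, and the "convert to Number first" idea from the comment, can be reproduced in plain Java. The sketch below is not the code Calcite generates and does not use RexToLixTranslator; the class and method names are hypothetical and only illustrate why a boxed Short cannot be cast to Integer while going through Number works.

```java
// Hypothetical illustration; not the code that Calcite generates.
final class NumericBindSketch {
  /** Mimics a direct cast of a bound parameter to the column's boxed type. */
  static int castDirectly(Object boundValue) {
    return (Integer) boundValue; // fails if boundValue is a Short
  }

  /** Goes through Number first, then narrows or widens to the target type. */
  static int viaNumber(Object boundValue) {
    return ((Number) boundValue).intValue();
  }

  public static void main(String[] args) {
    Object bound = Short.valueOf((short) 100); // what setShort would supply
    System.out.println(viaNumber(bound));
    try {
      castDirectly(bound);
    } catch (ClassCastException e) {
      System.out.println("direct cast failed: " + e.getMessage());
    }
  }
}
```

Note that intValue() silently truncates values that do not fit the target type, which is the overflow gap the comment defers to a separate ticket.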
[jira] [Created] (CALCITE-6366) Code generated by EnumUtils#convert should throw an exception if the target type is overflowed
Ruben Q L created CALCITE-6366: -- Summary: Code generated by EnumUtils#convert should throw an exception if the target type is overflowed Key: CALCITE-6366 URL: https://issues.apache.org/jira/browse/CALCITE-6366 Project: Calcite Issue Type: Improvement Components: core Reporter: Ruben Q L Code generated by EnumUtils#convert should throw an exception if the target type is overflowed (consider using Expressions#convertChecked) -- This message was sent by Atlassian Jira (v8.20.10#820010)
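As a rough picture of what "throw an exception if the target type is overflowed" means for CALCITE-6366, the following sketch contrasts a plain narrowing cast with a range-checked one. It is an assumption about the desired behavior only: it does not show how Expressions#convertChecked or EnumUtils#convert work, and all names are hypothetical.

```java
// Hypothetical illustration of checked narrowing; not Calcite code.
final class CheckedConvertSketch {
  /** Plain narrowing cast: out-of-range values wrap around silently. */
  static byte toByteUnchecked(long value) {
    return (byte) value;
  }

  /** Checked narrowing: rejects values outside the TINYINT (byte) range. */
  static byte toByteChecked(long value) {
    if (value < Byte.MIN_VALUE || value > Byte.MAX_VALUE) {
      throw new ArithmeticException("value " + value + " overflows byte");
    }
    return (byte) value;
  }

  /** For int targets the JDK already offers a checked conversion. */
  static int toIntChecked(long value) {
    return Math.toIntExact(value); // throws ArithmeticException on overflow
  }

  public static void main(String[] args) {
    System.out.println(toByteUnchecked(128)); // wraps to -128
    System.out.println(toByteChecked(127));
    try {
      toByteChecked(128);
    } catch (ArithmeticException e) {
      System.out.println(e.getMessage());
    }
  }
}
```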
[jira] [Commented] (CALCITE-6352) The map_contains_key function may return true when the key and mapkeytype types are different.
[ https://issues.apache.org/jira/browse/CALCITE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837300#comment-17837300 ] Caican Cai commented on CALCITE-6352: - I think solving this problem takes two steps. The first step is to determine the type-checking rules that Spark applies to map_contains_key. The second step is to handle those special cases accordingly. > The map_contains_key function may return true when the key and mapkeytype > types are different. > -- > > Key: CALCITE-6352 > URL: https://issues.apache.org/jira/browse/CALCITE-6352 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.36.0 >Reporter: Caican Cai >Assignee: Caican Cai >Priority: Critical > Fix For: 1.37.0 > > > > {code:java} > scala> val df = spark.sql("select map_contains_key(map(1, 'a', 2, 'b'), > 2.0)") > val df: org.apache.spark.sql.DataFrame = [map_contains_key(map(1, a, 2, b), > 2.0): boolean] > scala> df.show() > +--+ > |map_contains_key(map(1, a, 2, b), 2.0)| > +--+ > | true| > +--+ > {code} > Calcite returns false > -- This message was sent by Atlassian Jira (v8.20.10#820010)
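The difference reported in CALCITE-6352 resembles what happens in plain Java when a map with integer keys is probed with a decimal key. The sketch below is only an analogy: it does not reproduce Spark's coercion rules (which the comment says still need to be determined) or Calcite's implementation, and the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; neither Spark's nor Calcite's actual logic.
final class MapContainsKeySketch {
  /** Strict lookup: Integer 2 and Double 2.0 are different keys. */
  static boolean containsKeyStrict(Map<?, ?> map, Object key) {
    return map.containsKey(key);
  }

  /** Lookup after coercing both sides to a common numeric type. */
  static boolean containsKeyCoerced(Map<? extends Number, ?> map, Number key) {
    BigDecimal wanted = new BigDecimal(key.toString());
    for (Number candidate : map.keySet()) {
      // compareTo ignores scale, so 2 and 2.0 compare as equal.
      if (new BigDecimal(candidate.toString()).compareTo(wanted) == 0) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Map<Integer, String> map = new LinkedHashMap<>();
    map.put(1, "a");
    map.put(2, "b");
    System.out.println(containsKeyStrict(map, 2.0));
    System.out.println(containsKeyCoerced(map, 2.0));
  }
}
```

The strict variant corresponds to the "false" result and the coerced variant to the "true" result, under the assumption that a common-type comparison is what produces the latter. The sketch assumes finite numeric keys; NaN or infinite doubles would need separate handling.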
[jira] [Commented] (CALCITE-6365) Support for RETURNING clause of JSON_QUERY
[ https://issues.apache.org/jira/browse/CALCITE-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837223#comment-17837223 ] Alessandro Solimando commented on CALCITE-6365: --- It would be nice to add at least a simple example to the description. > Support for RETURNING clause of JSON_QUERY > -- > > Key: CALCITE-6365 > URL: https://issues.apache.org/jira/browse/CALCITE-6365 > Project: Calcite > Issue Type: New Feature >Reporter: Dawid Wysakowicz >Priority: Major > > The SQL standard says {{JSON_QUERY}} should support a {{RETURNING}} clause similar > to {{JSON_VALUE}}. Calcite already supports the clause for JSON_VALUE, but > not for JSON_QUERY. > {code} > <JSON query> ::= > JSON_QUERY <left paren> > <JSON API common syntax> > [ <JSON output clause> ] > [ <JSON query wrapper behavior> WRAPPER ] > [ <JSON query quotes behavior> QUOTES [ ON SCALAR STRING ] ] > [ <JSON query empty behavior> ON EMPTY ] > [ <JSON query error behavior> ON ERROR ] > <right paren> > > <JSON output clause> ::= > RETURNING <data type> > [ FORMAT <JSON representation> ] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-6365) Support for RETURNING clause of JSON_QUERY
Dawid Wysakowicz created CALCITE-6365: - Summary: Support for RETURNING clause of JSON_QUERY Key: CALCITE-6365 URL: https://issues.apache.org/jira/browse/CALCITE-6365 Project: Calcite Issue Type: New Feature Reporter: Dawid Wysakowicz The SQL standard says {{JSON_QUERY}} should support a {{RETURNING}} clause similar to {{JSON_VALUE}}. Calcite already supports the clause for JSON_VALUE, but not for JSON_QUERY. {code} <JSON query> ::= JSON_QUERY <left paren> <JSON API common syntax> [ <JSON output clause> ] [ <JSON query wrapper behavior> WRAPPER ] [ <JSON query quotes behavior> QUOTES [ ON SCALAR STRING ] ] [ <JSON query empty behavior> ON EMPTY ] [ <JSON query error behavior> ON ERROR ] <right paren> <JSON output clause> ::= RETURNING <data type> [ FORMAT <JSON representation> ] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (CALCITE-6244) Improve `Expressions#constant` to allow passing models with non-public fields
[ https://issues.apache.org/jira/browse/CALCITE-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wegdan Ghazi updated CALCITE-6244: -- Fix Version/s: 1.37.0 > Improve `Expressions#constant` to allow passing models with non-public fields > - > > Key: CALCITE-6244 > URL: https://issues.apache.org/jira/browse/CALCITE-6244 > Project: Calcite > Issue Type: Improvement > Components: linq4j >Reporter: Wegdan Ghazi >Assignee: Wegdan Ghazi >Priority: Minor > Fix For: 1.37.0 > > > To use > [Expressions#constant|https://github.com/apache/calcite/blob/e17098d47f3c31e4d90cc17e6e1da1175bf49ae4/linq4j/src/main/java/org/apache/calcite/linq4j/tree/Expressions.java#L540] > with complex models, it's required to pass a model with public fields, as > can be seen in this > [test|https://github.com/apache/calcite/blob/e17098d47f3c31e4d90cc17e6e1da1175bf49ae4/linq4j/src/test/java/org/apache/calcite/linq4j/test/ExpressionTest.java#L865]. > i.e. to successfully pass an instance of `{{{}Employee{}}}`, it must be > defined as follows: > {code:java} > public static class Employee { > public final int empno; > public final String name; > public final int deptno; public Employee(int empno, String name, int > deptno) { > this.empno = empno; > this.name = name; > this.deptno = deptno; > } public String toString() { > return "Employee(name: " + name + ", deptno:" + deptno + ")"; > } @Override public int hashCode() { > final int prime = 31; > int result = 1; > result = prime * result + deptno; > result = prime * result + empno; > result = prime * result + ((name == null) ? 
0 : name.hashCode()); > return result; > } @Override public boolean equals(Object obj) { > if (this == obj) { > return true; > } > if (obj == null) { > return false; > } > if (getClass() != obj.getClass()) { > return false; > } > Employee other = (Employee) obj; > if (deptno != other.deptno) { > return false; > } > if (empno != other.empno) { > return false; > } > if (name == null) { > if (other.name != null) { > return false; > } > } else if (!name.equals(other.name)) { > return false; > } > return true; > } > } {code} > This makes it difficult to use generated classes e.g. Java records or > immutables, or even encapsulated POJOs to pass through Linq4j. > This is caused by the logic to > [explore|https://github.com/apache/calcite/blob/e17098d47f3c31e4d90cc17e6e1da1175bf49ae4/linq4j/src/main/java/org/apache/calcite/linq4j/tree/ConstantExpression.java#L299] > and > [create|https://github.com/apache/calcite/blob/e17098d47f3c31e4d90cc17e6e1da1175bf49ae4/linq4j/src/main/java/org/apache/calcite/linq4j/tree/ConstantExpression.java#L216] > the model constructor; which depends on: > {code:java} > value.getClass().getFields() {code} > which only accesses public fields. > {*}Proposed solution{*}: Access fields using reflection, by accessing their > getter methods. -- This message was sent by Atlassian Jira (v8.20.10#820010)
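The limitation described in CALCITE-6244 and the proposed getter-based approach can be sketched in isolation as follows. This is not the ConstantExpression code and not the eventual fix; the class names, the "get" prefix convention, and the returned map are assumptions made purely for illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not Calcite's ConstantExpression logic.
final class GetterReflectionSketch {
  /** Encapsulated POJO: private fields, public getters. */
  public static final class Employee {
    private final int empno;
    private final String name;

    public Employee(int empno, String name) {
      this.empno = empno;
      this.name = name;
    }

    public int getEmpno() {
      return empno;
    }

    public String getName() {
      return name;
    }
  }

  /** Reads property values through public no-argument getXxx() methods. */
  static Map<String, Object> readViaGetters(Object value) {
    Map<String, Object> properties = new TreeMap<>();
    for (Method m : value.getClass().getMethods()) {
      String n = m.getName();
      if (n.startsWith("get") && n.length() > 3 && !n.equals("getClass")
          && m.getParameterCount() == 0 && !Modifier.isStatic(m.getModifiers())) {
        String property = Character.toLowerCase(n.charAt(3)) + n.substring(4);
        try {
          properties.put(property, m.invoke(value));
        } catch (ReflectiveOperationException e) {
          throw new IllegalStateException("cannot read " + property, e);
        }
      }
    }
    return properties;
  }

  public static void main(String[] args) {
    Employee e = new Employee(100, "Bill");
    // getFields() only returns public fields, so nothing is visible here.
    System.out.println(e.getClass().getFields().length);
    System.out.println(readViaGetters(e));
  }
}
```

Java records expose accessors without the "get" prefix, so a real implementation would likely need a different discovery rule for them; that detail is left out of this sketch.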
[jira] [Commented] (CALCITE-6364) HttpClient SPNEGO support is deprecated
[ https://issues.apache.org/jira/browse/CALCITE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837098#comment-17837098 ] Istvan Toth commented on CALCITE-6364: -- Found this link in the HttpClient ticket > HttpClient SPNEGO support is deprecated > --- > > Key: CALCITE-6364 > URL: https://issues.apache.org/jira/browse/CALCITE-6364 > Project: Calcite > Issue Type: Bug > Components: avatica >Reporter: Istvan Toth >Priority: Critical > > The Avatica Java client depends on Apache HttpClient's Kerberos/SPNEGO > implementation. > According to HTTPCLIENT-1625, that implementation is not secure and is > deprecated in newer versions. > Unfortunately, HTTPCLIENT-1625 is scant on details, and since the reason > given for deprecation is the lack of time to fix it, it is likely not a > trivial fix. > Avatica also depends heavily on httpclient, and replacing it > would be a big job. > While Avatica in theory has a configurable HTTP client implementation, the > only non-httpclient implementation is more of a POC, and does not support ANY > authentication methods. > I can see these options: > 1. Find another HTTP client library, and use it in Avatica > 2. Copy the SPNEGO auth code from httpclient, and fix it in Avatica > 3. Fix the SPNEGO auth code in httpclient. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-6364) HttpClient SPNEGO support is deprecated
Istvan Toth created CALCITE-6364: Summary: HttpClient SPNEGO support is deprecated Key: CALCITE-6364 URL: https://issues.apache.org/jira/browse/CALCITE-6364 Project: Calcite Issue Type: Bug Components: avatica Reporter: Istvan Toth The Avatica Java client depends on Apache HttpClient's Kerberos/SPNEGO implementation. According to HTTPCLIENT-1625, that implementation is not secure and is deprecated in newer versions. Unfortunately, HTTPCLIENT-1625 is scant on details, and since the reason given for deprecation is the lack of time to fix it, it is likely not a trivial fix. Avatica also depends heavily on httpclient, and replacing it would be a big job. While Avatica in theory has a configurable HTTP client implementation, the only non-httpclient implementation is more of a POC, and does not support ANY authentication methods. I can see these options: 1. Find another HTTP client library, and use it in Avatica 2. Copy the SPNEGO auth code from httpclient, and fix it in Avatica 3. Fix the SPNEGO auth code in httpclient. -- This message was sent by Atlassian Jira (v8.20.10#820010)