[GitHub] [calcite] wnob commented on a diff in pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

2022-11-18 Thread GitBox


wnob commented on code in PR #2973:
URL: https://github.com/apache/calcite/pull/2973#discussion_r1026992374


##
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##
@@ -149,6 +159,40 @@ public class SqlFunctions {
 
   private static final Pattern PATTERN_0_STAR_E = Pattern.compile("0*E");
 
+  // Date formatter for BigQuery's timestamp literals:
+  // 
https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical#timestamp_literals
+  private static final DateTimeFormatter BIG_QUERY_TIMESTAMP_LITERAL_FORMATTER 
=
+  new DateTimeFormatterBuilder()
+  // Unlike ISO 8601, BQ only supports years between 1 - ,
+  // but can support single-digit month and day parts.
+  .appendValue(ChronoField.YEAR, 4)
+  .appendLiteral('-')
+  .appendValue(ChronoField.MONTH_OF_YEAR, 1, 2, SignStyle.NOT_NEGATIVE)
+  .appendLiteral('-')
+  .appendValue(ChronoField.DAY_OF_MONTH, 1, 2, SignStyle.NOT_NEGATIVE)
+  // Everything after the date is optional. Optional sections can be 
nested.
+  .optionalStart()
+  // BQ accepts either a literal 'T' or a space to separate the date 
from the time,
+  // so make the 'T' optional but pad with 1 space if it's omitted.
+  .padNext(1, ' ')
+  .optionalStart()
+  .appendLiteral('T')
+  .optionalEnd()
+  // Unlike ISO 8601, BQ can support single-digit hour, minute, and 
second parts.
+  .appendValue(ChronoField.HOUR_OF_DAY, 1, 2, SignStyle.NOT_NEGATIVE)
+  .appendLiteral(':')
+  .appendValue(ChronoField.MINUTE_OF_HOUR, 1, 2, 
SignStyle.NOT_NEGATIVE)
+  .appendLiteral(':')
+  .appendValue(ChronoField.SECOND_OF_MINUTE, 1, 2, 
SignStyle.NOT_NEGATIVE)
+  // ISO 8601 supports up to nanosecond precision, but BQ only up to 
microsecond.
+  .optionalStart()
+  .appendFraction(ChronoField.MICRO_OF_SECOND, 0, 6, true)
+  .optionalEnd()
+  .optionalStart()
+  .parseLenient()
+  .appendOffsetId()
+  .toFormatter(Locale.ROOT);
+

Review Comment:
   Yes, it would appear that JDK8 had a different implementation for 
`DateTimeFormatter` that cannot handle offsets (or at least not in the same way 
as later versions). I'm inclined to switch to regex-based conversion, which 
will require more logic but be more widely compatible. Thoughts?



##
site/_docs/reference.md:
##
@@ -2596,7 +2596,16 @@ semantics.
 | p | CONVERT_TIMEZONE(tz1, tz2, datetime)   | Converts the timezone 
of *datetime* from *tz1* to *tz2*
 | b | CURRENT_DATETIME([timezone])   | Returns the current 
time as a TIMESTAMP from *timezone*
 | m | DAYNAME(datetime)  | Returns the name, in 
the connection's locale, of the weekday in *datetime*; for example, it returns 
'星期日' for both DATE '2020-02-10' and TIMESTAMP '2020-02-10 10:10:10'
-| b | DATE(string)   | Equivalent to 
`CAST(string AS DATE)`
+| b | DATE(integer, integer, integer)| Returns a date object 
given year, month, and day
+| b | DATE(timestamp)| Extracts the UTC date 
from a timestamp
+| b | DATE(timestamp, string)| Extracts the date from 
a timestamp, in a given time zone
+| b | TIMESTAMP(string)  | Equivalent to 
`CAST(string AS TIMESTAMP)`
+| b | TIMESTAMP(string, string)  | Equivalent to 
`CAST(string AS TIMESTAMP)`, converted to a given time zone
+| b | TIMESTAMP(date)| Convert a UTC date to a 
timestamp (at midnight)
+| b | TIMESTAMP(date, string)| Convert a date to a 
timestamp (at midnight), in a given time zone
+| b | TIME(integer, integer, integer)| Returns a time object 
given hour, minute, and second
+| b | TIME(timestamp)| Extracts the UTC time 
from a timestamp

Review Comment:
   Reworded for clarity.



##
babel/src/main/codegen/includes/parserImpls.ftl:
##
@@ -42,6 +42,46 @@ SqlNode DateFunctionCall() :
 }
 }
 
+SqlNode TimestampFunctionCall() :
+{
+final SqlFunctionCategory funcType = 
SqlFunctionCategory.USER_DEFINED_FUNCTION;

Review Comment:
   Ya this was just copied from the existing `DateFunctionCall`. I assumed this 
vaguely meant library function since there is no obvious "library function" 
value in the `SqlFunctionCategory` enum. Perhaps these should all be in the 
`TIMEDATE` category. I suppose since this is the babel parser, there's no real 
concept of a library function.
   
   BTW, is my assessment of babel in my above comment with Oliver correct? 
Should we be putting these productions in the core parser instead (or in 
addition to here)?



##
babel/src/test/resources/sql/big-query.iq:
##
@@ -1051,29 +1051,98 @@ select u

[GitHub] [calcite] wnob commented on a diff in pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

2022-11-17 Thread GitBox


wnob commented on code in PR #2973:
URL: https://github.com/apache/calcite/pull/2973#discussion_r1025778875


##
babel/src/main/codegen/includes/parserImpls.ftl:
##
@@ -42,6 +42,26 @@ SqlNode DateFunctionCall() :
 }
 }
 
+SqlNode TimestampFunctionCall() :

Review Comment:
   Here's my understanding: Babel is the "universal" parser ¹ that does not 
understand the semantics of individual dialects (hence the name). 
Dialect-specific semantic validation is a separate step that occurs after 
dialect-agnostic parsing.
   
   This is a [FreeMarker template file][1]. It does not contain any 
interpolation or tags, so running this through FreeMarker would simply remove 
the license comment at the top of the file. It's processed by [FMPP][2] in 
`config.fmpp` [here][3], which is in-turn referenced in the [JavaCC][4] grammar 
[here][5], as well as [here][6] via the `builtinFunctionCallMethods` list where 
the function name was added above in `config.fmpp`.
   
   Together, these 2 additions to the grammar indicate that there is a 
*production* called `TimestampFunctionCall` which consists of the 
[`` token][7] followed by a [`FunctionParameterList`][8] production, 
which consists of left-parenthesis, comma-separated arguments, 
right-parenthesis. The parser does nothing to validate how many arguments there 
are or what types they are.
   
   So yes, long story short, you're right that we didn't have to modify the 
`DateFunctionCall` production because it's just a function call either way, and 
I just added this new production with a different name token but which is 
otherwise identical to the `DATE` one.
   
   ¹ I've heard Julian make reference to other parsers being potentially usable 
with Calcite but I don't know of any examples.
   
   [1]: https://freemarker.apache.org/docs/dgui_template_overallstructure.html
   [2]: https://fmpp.sourceforge.net/index.html
   [3]: 
https://github.com/apache/calcite/blob/a505b25eacc473c6ec0ef8abd40c1ccae86297b6/babel/src/main/codegen/config.fmpp#L566
   [4]: https://javacc.github.io/javacc/
   [5]: 
https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/codegen/templates/Parser.jj#L1163
   [6]: 
https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/codegen/templates/Parser.jj#L5992
   [7]: 
https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/codegen/templates/Parser.jj#L7922
   [8]: 
https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/codegen/templates/Parser.jj#L948
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@calcite.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org