[GitHub] [calcite] wnob commented on pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

2022-11-18 Thread GitBox


wnob commented on PR #2973:
URL: https://github.com/apache/calcite/pull/2973#issuecomment-1320707981

   @tjbanghart
   
   > Wouldn't `DATETIME` map to Calcite's interpretation of a `TIMESTAMP` due 
to the lack of TZ info?
   
   Neither has tz info. I included a little table in the design doc that shows 
similarities between types in BQ and Calcite, and BQ `TIMESTAMP` maps pretty 
well to Calcite's `TIMESTAMP`. There isn't really a good existing type for 
`DATETIME` in Calcite. I think we'll have to implement one.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@calcite.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [calcite] wnob commented on a diff in pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

2022-11-18 Thread GitBox


wnob commented on code in PR #2973:
URL: https://github.com/apache/calcite/pull/2973#discussion_r1026992374


##
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##
@@ -149,6 +159,40 @@ public class SqlFunctions {
 
   private static final Pattern PATTERN_0_STAR_E = Pattern.compile("0*E");
 
+  // Date formatter for BigQuery's timestamp literals:
+  // 
https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical#timestamp_literals
+  private static final DateTimeFormatter BIG_QUERY_TIMESTAMP_LITERAL_FORMATTER 
=
+  new DateTimeFormatterBuilder()
+  // Unlike ISO 8601, BQ only supports years between 1 - ,
+  // but can support single-digit month and day parts.
+  .appendValue(ChronoField.YEAR, 4)
+  .appendLiteral('-')
+  .appendValue(ChronoField.MONTH_OF_YEAR, 1, 2, SignStyle.NOT_NEGATIVE)
+  .appendLiteral('-')
+  .appendValue(ChronoField.DAY_OF_MONTH, 1, 2, SignStyle.NOT_NEGATIVE)
+  // Everything after the date is optional. Optional sections can be 
nested.
+  .optionalStart()
+  // BQ accepts either a literal 'T' or a space to separate the date 
from the time,
+  // so make the 'T' optional but pad with 1 space if it's omitted.
+  .padNext(1, ' ')
+  .optionalStart()
+  .appendLiteral('T')
+  .optionalEnd()
+  // Unlike ISO 8601, BQ can support single-digit hour, minute, and 
second parts.
+  .appendValue(ChronoField.HOUR_OF_DAY, 1, 2, SignStyle.NOT_NEGATIVE)
+  .appendLiteral(':')
+  .appendValue(ChronoField.MINUTE_OF_HOUR, 1, 2, 
SignStyle.NOT_NEGATIVE)
+  .appendLiteral(':')
+  .appendValue(ChronoField.SECOND_OF_MINUTE, 1, 2, 
SignStyle.NOT_NEGATIVE)
+  // ISO 8601 supports up to nanosecond precision, but BQ only up to 
microsecond.
+  .optionalStart()
+  .appendFraction(ChronoField.MICRO_OF_SECOND, 0, 6, true)
+  .optionalEnd()
+  .optionalStart()
+  .parseLenient()
+  .appendOffsetId()
+  .toFormatter(Locale.ROOT);
+

Review Comment:
   Yes, it would appear that JDK8 had a different implementation for 
`DateTimeFormatter` that cannot handle offsets (or at least not in the same way 
as later versions). I'm inclined to switch to regex-based conversion, which 
will require more logic but be more widely compatible. Thoughts?



##
site/_docs/reference.md:
##
@@ -2596,7 +2596,16 @@ semantics.
 | p | CONVERT_TIMEZONE(tz1, tz2, datetime)   | Converts the timezone 
of *datetime* from *tz1* to *tz2*
 | b | CURRENT_DATETIME([timezone])   | Returns the current 
time as a TIMESTAMP from *timezone*
 | m | DAYNAME(datetime)  | Returns the name, in 
the connection's locale, of the weekday in *datetime*; for example, it returns 
'星期日' for both DATE '2020-02-10' and TIMESTAMP '2020-02-10 10:10:10'
-| b | DATE(string)   | Equivalent to 
`CAST(string AS DATE)`
+| b | DATE(integer, integer, integer)| Returns a date object 
given year, month, and day
+| b | DATE(timestamp)| Extracts the UTC date 
from a timestamp
+| b | DATE(timestamp, string)| Extracts the date from 
a timestamp, in a given time zone
+| b | TIMESTAMP(string)  | Equivalent to 
`CAST(string AS TIMESTAMP)`
+| b | TIMESTAMP(string, string)  | Equivalent to 
`CAST(string AS TIMESTAMP)`, converted to a given time zone
+| b | TIMESTAMP(date)| Convert a UTC date to a 
timestamp (at midnight)
+| b | TIMESTAMP(date, string)| Convert a date to a 
timestamp (at midnight), in a given time zone
+| b | TIME(integer, integer, integer)| Returns a time object 
given hour, minute, and second
+| b | TIME(timestamp)| Extracts the UTC time 
from a timestamp

Review Comment:
   Reworded for clarity.



##
babel/src/main/codegen/includes/parserImpls.ftl:
##
@@ -42,6 +42,46 @@ SqlNode DateFunctionCall() :
 }
 }
 
+SqlNode TimestampFunctionCall() :
+{
+final SqlFunctionCategory funcType = 
SqlFunctionCategory.USER_DEFINED_FUNCTION;

Review Comment:
   Ya this was just copied from the existing `DateFunctionCall`. I assumed this 
vaguely meant library function since there is no obvious "library function" 
value in the `SqlFunctionCategory` enum. Perhaps these should all be in the 
`TIMEDATE` category. I suppose since this is the babel parser, there's no real 
concept of a library function.
   
   BTW, is my assessment of babel in my above comment with Oliver correct? 
Should we be putting these productions in the core parser instead (or in 
addition to here)?



##
babel/src/test/resources/sql/big-query.iq:
##
@@ -1051,29 +1051,98 @@ select 

[GitHub] [calcite] sdreynolds opened a new pull request, #2975: [CALCITE-5391] Preserve Project field names

2022-11-18 Thread GitBox


sdreynolds opened a new pull request, #2975:
URL: https://github.com/apache/calcite/pull/2975

   ## Why:
   When rewriting the query for a semi join, the
   `JoinOnUniqueToSemiJoinRule` pushes a new project via `RelBuilder`. This 
project is *almost* a good copy of the original `Project` -- it is just missing 
the field names.
   
   ## How:
   This change uses `RelDataType` of the `Project` to get the field names for 
the new `Project` pushed on the `Relbuilder` stack. This is similar to what is 
done in `SemiJoinRule#perform`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@calcite.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [calcite] sdreynolds commented on a diff in pull request #2848: [CALCITE-5201] Improve SemiJoinRule to match Join's right input which is unique for Join keys

2022-11-18 Thread GitBox


sdreynolds commented on code in PR #2848:
URL: https://github.com/apache/calcite/pull/2848#discussion_r1026959819


##
core/src/main/java/org/apache/calcite/rel/rules/SemiJoinRule.java:
##
@@ -232,6 +234,84 @@ default JoinToSemiJoinRuleConfig 
withOperandFor(Class joinClass,
 }
   }
 
+  /**
+   * SemiJoinRule that matches a Project on top of a Join with a RelNode
+   * which is unique for Join's right keys.
+   *
+   * @see CoreRules#JOIN_ON_UNIQUE_TO_SEMI_JOIN */
+  public static class JoinOnUniqueToSemiJoinRule extends SemiJoinRule {
+
+/** Creates a JoinOnUniqueToSemiJoinRule. */
+protected JoinOnUniqueToSemiJoinRule(JoinOnUniqueToSemiJoinRuleConfig 
config) {
+  super(config);
+}
+
+@Override public boolean matches(RelOptRuleCall call) {
+  final Project project = call.rel(0);
+  final Join join = call.rel(1);
+  final RelNode left = call.rel(2);
+
+  final ImmutableBitSet bits =
+  RelOptUtil.InputFinder.bits(project.getProjects(), null);
+  final ImmutableBitSet rightBits =
+  ImmutableBitSet.range(left.getRowType().getFieldCount(),
+  join.getRowType().getFieldCount());
+  return !bits.intersects(rightBits);
+}
+
+@Override public void onMatch(RelOptRuleCall call) {
+  final Project project = call.rel(0);
+  final Join join = call.rel(1);
+  final RelNode left = call.rel(2);
+  final RelNode right = call.rel(3);
+
+  final JoinInfo joinInfo = join.analyzeCondition();
+  final RelOptCluster cluster = join.getCluster();
+  final RelMetadataQuery mq = cluster.getMetadataQuery();
+  final Boolean unique = mq.areColumnsUnique(right, joinInfo.rightSet());
+  if (unique != null && unique) {
+final RelBuilder builder = call.builder();
+switch (join.getJoinType()) {
+case INNER:
+  builder.push(left);
+  builder.push(right);
+  builder.join(JoinRelType.SEMI, join.getCondition());
+  break;
+case LEFT:
+  builder.push(left);
+  break;
+default:
+  throw new AssertionError(join.getJoinType());
+}
+builder.project(project.getProjects());

Review Comment:
   :doh: this doesn't carry through the field aliases like it does in the other 
[rule](https://github.com/libenchao/calcite/blob/65e07f71c3e98115f114ee05986daa6baab28b96/core/src/main/java/org/apache/calcite/rel/rules/SemiJoinRule.java#L127)
   ```java
   builder.project(project.getProjects(), project.getRowType().getFieldNames());
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@calcite.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [calcite] julianhyde commented on a diff in pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

2022-11-18 Thread GitBox


julianhyde commented on code in PR #2973:
URL: https://github.com/apache/calcite/pull/2973#discussion_r1026775499


##
site/_docs/reference.md:
##
@@ -2596,7 +2596,16 @@ semantics.
 | p | CONVERT_TIMEZONE(tz1, tz2, datetime)   | Converts the timezone 
of *datetime* from *tz1* to *tz2*
 | b | CURRENT_DATETIME([timezone])   | Returns the current 
time as a TIMESTAMP from *timezone*
 | m | DAYNAME(datetime)  | Returns the name, in 
the connection's locale, of the weekday in *datetime*; for example, it returns 
'星期日' for both DATE '2020-02-10' and TIMESTAMP '2020-02-10 10:10:10'
-| b | DATE(string)   | Equivalent to 
`CAST(string AS DATE)`
+| b | DATE(integer, integer, integer)| Returns a date object 
given year, month, and day
+| b | DATE(timestamp)| Extracts the UTC date 
from a timestamp
+| b | DATE(timestamp, string)| Extracts the date from 
a timestamp, in a given time zone
+| b | TIMESTAMP(string)  | Equivalent to 
`CAST(string AS TIMESTAMP)`
+| b | TIMESTAMP(string, string)  | Equivalent to 
`CAST(string AS TIMESTAMP)`, converted to a given time zone
+| b | TIMESTAMP(date)| Convert a UTC date to a 
timestamp (at midnight)
+| b | TIMESTAMP(date, string)| Convert a date to a 
timestamp (at midnight), in a given time zone
+| b | TIME(integer, integer, integer)| Returns a time object 
given hour, minute, and second
+| b | TIME(timestamp)| Extracts the UTC time 
from a timestamp

Review Comment:
   A Calcite TIMESTAMP value does not have a UTC time. Even though you're 
implementing for BQ, you must use Calcite terminology here.



##
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##
@@ -149,6 +159,40 @@ public class SqlFunctions {
 
   private static final Pattern PATTERN_0_STAR_E = Pattern.compile("0*E");
 
+  // Date formatter for BigQuery's timestamp literals:
+  // 
https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical#timestamp_literals
+  private static final DateTimeFormatter BIG_QUERY_TIMESTAMP_LITERAL_FORMATTER 
=
+  new DateTimeFormatterBuilder()
+  // Unlike ISO 8601, BQ only supports years between 1 - ,
+  // but can support single-digit month and day parts.
+  .appendValue(ChronoField.YEAR, 4)
+  .appendLiteral('-')
+  .appendValue(ChronoField.MONTH_OF_YEAR, 1, 2, SignStyle.NOT_NEGATIVE)
+  .appendLiteral('-')
+  .appendValue(ChronoField.DAY_OF_MONTH, 1, 2, SignStyle.NOT_NEGATIVE)
+  // Everything after the date is optional. Optional sections can be 
nested.
+  .optionalStart()
+  // BQ accepts either a literal 'T' or a space to separate the date 
from the time,
+  // so make the 'T' optional but pad with 1 space if it's omitted.
+  .padNext(1, ' ')
+  .optionalStart()
+  .appendLiteral('T')
+  .optionalEnd()
+  // Unlike ISO 8601, BQ can support single-digit hour, minute, and 
second parts.
+  .appendValue(ChronoField.HOUR_OF_DAY, 1, 2, SignStyle.NOT_NEGATIVE)
+  .appendLiteral(':')
+  .appendValue(ChronoField.MINUTE_OF_HOUR, 1, 2, 
SignStyle.NOT_NEGATIVE)
+  .appendLiteral(':')
+  .appendValue(ChronoField.SECOND_OF_MINUTE, 1, 2, 
SignStyle.NOT_NEGATIVE)
+  // ISO 8601 supports up to nanosecond precision, but BQ only up to 
microsecond.
+  .optionalStart()
+  .appendFraction(ChronoField.MICRO_OF_SECOND, 0, 6, true)
+  .optionalEnd()
+  .optionalStart()
+  .parseLenient()
+  .appendOffsetId()
+  .toFormatter(Locale.ROOT);
+

Review Comment:
   is this the code that Will said was non-JDK8 compliant? 



##
site/_docs/reference.md:
##
@@ -2596,7 +2596,16 @@ semantics.
 | p | CONVERT_TIMEZONE(tz1, tz2, datetime)   | Converts the timezone 
of *datetime* from *tz1* to *tz2*
 | b | CURRENT_DATETIME([timezone])   | Returns the current 
time as a TIMESTAMP from *timezone*
 | m | DAYNAME(datetime)  | Returns the name, in 
the connection's locale, of the weekday in *datetime*; for example, it returns 
'星期日' for both DATE '2020-02-10' and TIMESTAMP '2020-02-10 10:10:10'
-| b | DATE(string)   | Equivalent to 
`CAST(string AS DATE)`
+| b | DATE(integer, integer, integer)| Returns a date object 
given year, month, and day

Review Comment:
   this list should be alphabetically sorted



##
core/src/main/java/org/apache/calcite/sql/fun/SqlDateFunction.java:
##
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license