[
https://issues.apache.org/jira/browse/SPARK-57250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086541#comment-18086541
]
Uroš Bojanić commented on SPARK-57250:
--------------------------------------
Issue resolved by pull request 56306
https://github.com/apache/spark/pull/56306
> Construct sub-microsecond timestamp typed literals with precision derived
> from fractional digits
> ------------------------------------------------------------------------------------------------
>
> Key: SPARK-57250
> URL: https://issues.apache.org/jira/browse/SPARK-57250
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Labels: pull-request-available
>
> h2. What
> Make the SQL typed-literal constructor in {{AstBuilder.visitTypeConstructor}}
> produce
> nanosecond-capable timestamp literals when the literal string carries 7-9
> fractional-second
> digits, instead of truncating to a microsecond {{Long}}. Following the ANSI
> SQL rule, the
> fractional-second precision {{p}} of the literal is *derived from the number
> of digits in the
> fractional part of the string* - there is no precision argument in the
> literal syntax.
> Concretely, for a typed literal {{[TYPE] '<value>'}}:
> * 0-6 fractional digits -> unchanged: a microsecond literal of
> {{TimestampType}} /
> {{TimestampNTZType}} (current behavior; aligned with SPARK-57163).
> * 7-9 fractional digits -> a nanos literal:
> {{Literal(TimestampNTZNanos(epochMicros, nanosWithinMicro),
> TimestampNTZNanosType(p))}}
> or the LTZ counterpart, where {{p}} is the fractional digit count.
> * more than 9 fractional digits -> a user-facing error (precision exceeds the
> maximum of 9).
> Parent: SPARK-56822. Builds on SPARK-56876 (logical types), SPARK-56965
> (parser names the
> types), SPARK-56981 (physical row storage), SPARK-57032 (string parsing for
> 7-9 digits), and
> SPARK-57033 (java.time / composite conversion). Complements SPARK-57163 (p<=6
> -> micro mapping)
> and the test-side SPARK-57165 (LiteralGenerator).
> h2. Why
> The parser can already *name* {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}}
> types
> (SPARK-56965), but a typed *literal value* with sub-microsecond precision is
> still silently
> narrowed to micros. Today {{visitTypeConstructor}} routes {{TIMESTAMP}} /
> {{TIMESTAMP_NTZ}} /
> {{TIMESTAMP_LTZ}} through {{stringToTimestamp}} /
> {{stringToTimestampWithoutTimeZone}}, which
> return epoch-microsecond {{Long}}s, so {{TIMESTAMP_NTZ '2020-01-01
> 00:00:00.123456789'}} loses
> digits 7-9. Typed literals are the most direct way for users to introduce
> constant nanos
> values in SQL (filters, INSERT ... VALUES, constant folding), so this
> unblocks the minimum
> user-visible SQL surface for the preview feature.
> h2. ANSI SQL behavior to follow
> Per ISO/IEC 9075-2 (SQL/Foundation):
> * Grammar: {{<timestamp literal> ::= TIMESTAMP <timestamp string>}}, where
> the fractional part
> is {{<seconds fraction> ::= <unsigned integer>}} (no fixed digit limit, and
> *no precision
> token in the literal*).
> * Subclause 5.3, Syntax Rule 27: the declared type of a {{<timestamp
> literal>}} is
> {{TIMESTAMP(P)}}, "where P is the number of digits in <seconds fraction>".
> WITH vs WITHOUT
> TIME ZONE is determined by whether the string contains a {{<time zone
> interval>}} (offset).
> * Subclause 6.1, Syntax Rules 34 and 36: default {{<timestamp precision>}} is
> 6; the maximum is
> implementation-defined and "not less than 6". Spark caps it at 9
> (nanoseconds).
> This matches Spark's existing literal grammar, which has no precision
> argument:
> {{literalType singleStringLitWithoutMarker}} with {{literalType}} one of
> {{TIMESTAMP | TIMESTAMP_NTZ | TIMESTAMP_LTZ}}. So no grammar change is
> required; only the
> value-construction side of {{visitTypeConstructor}} changes.
> h2. Scope
> {{sql/catalyst/.../parser/AstBuilder.scala}} ({{visitTypeConstructor}}):
> * Determine the fractional digit count of the literal string and pick micro
> vs nanos accordingly.
> * For 7-9 digits, build {{TimestampNTZNanos}} / {{TimestampLTZNanos}} values
> via the
> SPARK-57032 nanos parse helpers and wrap them in {{Literal(...,
> TimestampNTZNanosType(p))}} /
> {{TimestampLTZNanosType(p)}}.
> * Keep the existing NTZ/LTZ resolution for the bare {{TIMESTAMP}} keyword:
> when the string
> contains a time-zone offset, resolve to the LTZ (WITH TIME ZONE) variant;
> otherwise NTZ. The
> explicit {{TIMESTAMP_NTZ}} / {{TIMESTAMP_LTZ}} keywords pin the variant.
> * Validate precision: more than 9 fractional digits raises the existing
> user-facing
> invalid-precision / cannot-parse error.
> * Gate behind {{spark.sql.timestampNanosTypes.enabled}}. When the flag is
> off, retain current
> microsecond behavior (parse and narrow at 6 digits) exactly as today.
> * Reuse {{convertSpecialTimestamp}} / {{convertSpecialTimestampNTZ}} for
> special values (these
> remain micro).
> h2. Out of scope
> * Custom-format / pattern-based literal parsing (covered by the nanos
> TimestampFormatter,
> SPARK-57162).
> * Casts and type coercion for nanos types (separate cast-matrix and coercion
> subtasks).
> * {{LiteralGenerator}} test support (SPARK-57165).
> * Any new literal syntax with an explicit precision token (e.g.
> {{TIMESTAMP_NTZ(9) '...'}}) -
> not ANSI and not in scope.
> h2. Design notes
> * Precision is *implied*, never declared, in the literal - count the digits
> after the decimal
> point in the seconds field.
> * {{TimestampNTZNanosType}} / {{TimestampLTZNanosType}} require precision in
> [7, 9]; literals
> with 0-6 fractional digits therefore cannot be nanos types and stay micro
> by construction.
> * Reuse the {{TimestampNanosVal}} normalization invariant
> ({{nanosWithinMicro}} in [0, 999]).
> * No rounding of sub-precision digits is needed here because {{p}} equals the
> exact digit count;
> truncation/rounding rules belong to cast/format paths.
> h2. How was this patch tested
> * New cases in the parser/expression literal suites: parse round-trip for
> {{TIMESTAMP_NTZ}} /
> {{TIMESTAMP_LTZ}} / {{TIMESTAMP}} literals with 7, 8, and 9 fractional
> digits; assert the
> resulting {{Literal}} dataType is {{TimestampNTZNanosType(p)}} /
> {{TimestampLTZNanosType(p)}}
> and the composite value matches a java.time oracle (SPARK-57033 helpers).
> * Boundary cases: {{nanosWithinMicro}} 0 and 999, pre-epoch instants (e.g.
> 1582 cutover),
> 9999-12-31, and exactly 6 digits (must stay micro).
> * Negative: more than 9 fractional digits raises the expected error; behavior
> with the preview
> flag disabled is unchanged.
> * TIMESTAMP-keyword NTZ-vs-LTZ resolution based on presence of a time-zone
> offset.
> h2. Does this PR introduce any user-facing change
> Yes, but gated. When {{spark.sql.timestampNanosTypes.enabled}} is true, typed
> timestamp literals
> with 7-9 fractional digits resolve to nanosecond-capable types instead of
> being narrowed to
> microseconds, e.g.:
> {code:sql}
> SELECT TIMESTAMP_NTZ '2020-01-01 00:00:00.123456789'; --
> TimestampNTZNanosType(9)
> {code}
> When the flag is false (production default), behavior is unchanged.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]