Max Gekk created SPARK-57250:
--------------------------------

             Summary: Construct sub-microsecond timestamp typed literals with precision derived from fractional digits
                 Key: SPARK-57250
                 URL: https://issues.apache.org/jira/browse/SPARK-57250
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk


h2. What

Make the SQL typed-literal constructor in {{AstBuilder.visitTypeConstructor}} produce
nanosecond-capable timestamp literals when the literal string carries 7-9 fractional-second
digits, instead of truncating to a microsecond {{Long}}. Following the ANSI SQL rule, the
fractional-second precision {{p}} of the literal is *derived from the number of digits in the
fractional part of the string* - there is no precision argument in the literal syntax.

Concretely, for a typed literal {{[TYPE] '<value>'}}:
* 0-6 fractional digits -> unchanged: a microsecond literal of {{TimestampType}} /
  {{TimestampNTZType}} (current behavior; aligned with SPARK-57163).
* 7-9 fractional digits -> a nanos literal:
  {{Literal(TimestampNTZNanos(epochMicros, nanosWithinMicro), TimestampNTZNanosType(p))}}
  or the LTZ counterpart, where {{p}} is the fractional digit count.
* more than 9 fractional digits -> a user-facing error (precision exceeds the maximum of 9).
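
The three-way rule above can be sketched in a few lines. This is only an illustration in Python, not the Scala code in {{AstBuilder}}: the helper names {{fractional_digits}} and {{classify}} and the regular expression are assumptions made for this example, and Spark's real timestamp string parsing accepts more input forms than the pattern below.

{code:python}
import re

MAX_PRECISION = 9         # cap stated in this ticket
MICROS_MAX_PRECISION = 6  # 0-6 digits stay on the microsecond types

# Hypothetical pattern: the digits after the decimal point of the seconds field.
_FRACTION = re.compile(r":\d{1,2}\.(\d+)")


def fractional_digits(value: str) -> int:
    """Return the number of fractional-second digits in a timestamp string."""
    match = _FRACTION.search(value)
    return len(match.group(1)) if match else 0


def classify(value: str):
    """Return ("micros", p) or ("nanos", p); raise if p exceeds the maximum."""
    p = fractional_digits(value)
    if p > MAX_PRECISION:
        raise ValueError(f"fractional precision {p} exceeds the maximum of {MAX_PRECISION}")
    kind = "micros" if p <= MICROS_MAX_PRECISION else "nanos"
    return (kind, p)
{code}

Under these assumptions, {{classify('2020-01-01 00:00:00.123456789')}} yields {{("nanos", 9)}}, a string with exactly 6 fractional digits stays {{"micros"}}, and 10 digits raise an error. Trailing zeros count as digits, so {{.1230000}} is precision 7.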

Parent: SPARK-56822. Builds on SPARK-56876 (logical types), SPARK-56965 (parser names the
types), SPARK-56981 (physical row storage), SPARK-57032 (string parsing for 7-9 digits), and
SPARK-57033 (java.time / composite conversion). Complements SPARK-57163 (p<=6 -> micro mapping)
and the test-side SPARK-57165 (LiteralGenerator).

h2. Why

The parser can already *name* {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}} types
(SPARK-56965), but a typed *literal value* with sub-microsecond precision is still silently
narrowed to micros. Today {{visitTypeConstructor}} routes {{TIMESTAMP}} / {{TIMESTAMP_NTZ}} /
{{TIMESTAMP_LTZ}} through {{stringToTimestamp}} / {{stringToTimestampWithoutTimeZone}}, which
return epoch-microsecond {{Long}}s, so {{TIMESTAMP_NTZ '2020-01-01 00:00:00.123456789'}} loses
digits 7-9. Typed literals are the most direct way for users to introduce constant nanos
values in SQL (filters, INSERT ... VALUES, constant folding), so this unblocks the minimum
user-visible SQL surface for the preview feature.

h2. ANSI SQL behavior to follow

Per ISO/IEC 9075-2 (SQL/Foundation):
* Grammar: {{<timestamp literal> ::= TIMESTAMP <timestamp string>}}, where the fractional part
  is {{<seconds fraction> ::= <unsigned integer>}} (no fixed digit limit, and *no precision
  token in the literal*).
* Subclause 5.3, Syntax Rule 27: the declared type of a {{<timestamp literal>}} is
  {{TIMESTAMP(P)}}, "where P is the number of digits in <seconds fraction>". WITH vs WITHOUT
  TIME ZONE is determined by whether the string contains a {{<time zone interval>}} (offset).
* Subclause 6.1, Syntax Rules 34 and 36: default {{<timestamp precision>}} is 6; the maximum is
  implementation-defined and "not less than 6". Spark caps it at 9 (nanoseconds).

This matches Spark's existing literal grammar, which has no precision argument:
{{literalType singleStringLitWithoutMarker}} with {{literalType}} one of
{{TIMESTAMP | TIMESTAMP_NTZ | TIMESTAMP_LTZ}}. So no grammar change is required; only the
value-construction side of {{visitTypeConstructor}} changes.

h2. Scope

{{sql/catalyst/.../parser/AstBuilder.scala}} ({{visitTypeConstructor}}):
* Determine the fractional digit count of the literal string and pick micro vs nanos accordingly.
* For 7-9 digits, build {{TimestampNTZNanos}} / {{TimestampLTZNanos}} values via the
  SPARK-57032 nanos parse helpers and wrap them in {{Literal(..., TimestampNTZNanosType(p))}} /
  {{TimestampLTZNanosType(p)}}.
* Keep the existing NTZ/LTZ resolution for the bare {{TIMESTAMP}} keyword: when the string
  contains a time-zone offset, resolve to the LTZ (WITH TIME ZONE) variant; otherwise NTZ. The
  explicit {{TIMESTAMP_NTZ}} / {{TIMESTAMP_LTZ}} keywords pin the variant.
* Validate precision: more than 9 fractional digits raises the existing user-facing
  invalid-precision / cannot-parse error.
* Gate behind {{spark.sql.timestampNanosTypes.enabled}}. When the flag is off, retain current
  microsecond behavior (parse and narrow at 6 digits) exactly as today.
* Reuse {{convertSpecialTimestamp}} / {{convertSpecialTimestampNTZ}} for special values (these
  remain micro).
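
As a rough illustration of how the scope items combine, the following Python sketch picks the resulting type name from the keyword, the presence of an offset, the fractional digit count {{p}}, and the preview flag. It follows the wording of the bullets above rather than the actual Scala code: {{resolve_literal_type}}, {{has_zone_offset}}, and the offset pattern are hypothetical. Types are returned as plain strings, and special values and value construction are left out.

{code:python}
import re

# Hypothetical pattern: a trailing "Z" or numeric offset that follows a time field.
# Requiring a preceding time field keeps the "-01" in a bare date from matching.
_OFFSET = re.compile(r"\d:\d{2}(?::\d{2}(?:\.\d+)?)?\s*(?:Z|[+-]\d{1,2}(?::?\d{2})?)$")


def has_zone_offset(value: str) -> bool:
    return _OFFSET.search(value.strip()) is not None


def resolve_literal_type(keyword: str, value: str, p: int, nanos_enabled: bool) -> str:
    """Pick the literal's type name; p is the fractional digit count of value."""
    if keyword == "TIMESTAMP_NTZ":
        ltz = False                    # explicit keyword pins the variant
    elif keyword == "TIMESTAMP_LTZ":
        ltz = True
    elif keyword == "TIMESTAMP":
        ltz = has_zone_offset(value)   # bare keyword: offset present -> LTZ
    else:
        raise ValueError(f"unsupported literal type: {keyword}")

    if not nanos_enabled or p <= 6:
        # Flag off, or 0-6 digits: stay on the microsecond types.
        return "TimestampType" if ltz else "TimestampNTZType"
    if p > 9:
        raise ValueError(f"fractional precision {p} exceeds the maximum of 9")
    return f"TimestampLTZNanosType({p})" if ltz else f"TimestampNTZNanosType({p})"
{code}

For example, with the flag on, {{resolve_literal_type("TIMESTAMP", "2020-01-01 00:00:00.1234567+01:00", 7, True)}} gives {{TimestampLTZNanosType(7)}}. The same call with the flag off falls back to {{TimestampType}}.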

h2. Out of scope

* Custom-format / pattern-based literal parsing (covered by the nanos TimestampFormatter,
  SPARK-57162).
* Casts and type coercion for nanos types (separate cast-matrix and coercion subtasks).
* {{LiteralGenerator}} test support (SPARK-57165).
* Any new literal syntax with an explicit precision token (e.g. {{TIMESTAMP_NTZ(9) '...'}}) -
  not ANSI and not in scope.

h2. Design notes

* Precision is *implied*, never declared, in the literal - count the digits after the decimal
  point in the seconds field.
* {{TimestampNTZNanosType}} / {{TimestampLTZNanosType}} require precision in [7, 9]; literals
  with 0-6 fractional digits therefore cannot be nanos types and stay micro by construction.
* Reuse the {{TimestampNanosVal}} normalization invariant ({{nanosWithinMicro}} in [0, 999]).
* No rounding of sub-precision digits is needed here because {{p}} equals the exact digit count;
  truncation/rounding rules belong to cast/format paths.
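
To make the {{nanosWithinMicro}} invariant concrete, here is a small Python sketch of splitting an NTZ string into the {{(epochMicros, nanosWithinMicro)}} pair. It is an assumption-laden illustration, not the SPARK-57032 helpers: {{split_ntz_nanos}} is a hypothetical name, only the {{yyyy-MM-dd HH:mm:ss[.fraction]}} form is handled, and the wall-clock fields are interpreted as UTC. Floor division keeps the remainder in [0, 999] for pre-epoch instants as well.

{code:python}
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_ntz_nanos(value: str):
    """Return (epoch_micros, nanos_within_micro) for 'yyyy-MM-dd HH:mm:ss[.fraction]'."""
    head, _, fraction = value.partition(".")
    if len(fraction) > 9:
        raise ValueError("more than 9 fractional digits")
    # Whole seconds since the epoch, treating the wall-clock fields as UTC.
    dt = datetime.strptime(head, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    epoch_seconds = (dt - EPOCH) // timedelta(seconds=1)
    # Right-pad the fraction to 9 digits so it reads as nanoseconds of the second.
    nanos_of_second = int(fraction.ljust(9, "0"))
    total_nanos = epoch_seconds * 1_000_000_000 + nanos_of_second
    # divmod floors toward negative infinity, so the remainder is always in [0, 999].
    return divmod(total_nanos, 1_000)
{code}

With this sketch, {{'2020-01-01 00:00:00.123456789'}} splits into {{(1577836800123456, 789)}}. The pre-epoch value {{'1969-12-31 23:59:59.999999999'}} splits into {{(-1, 999)}}, not {{(0, -1)}}, because the remainder is never negative.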

h2. How was this patch tested

* New cases in the parser/expression literal suites: parse round-trip for {{TIMESTAMP_NTZ}} /
  {{TIMESTAMP_LTZ}} / {{TIMESTAMP}} literals with 7, 8, and 9 fractional digits; assert the
  resulting {{Literal}} dataType is {{TimestampNTZNanosType(p)}} / {{TimestampLTZNanosType(p)}}
  and the composite value matches a java.time oracle (SPARK-57033 helpers).
* Boundary cases: {{nanosWithinMicro}} 0 and 999, pre-epoch instants (e.g. 1582 cutover),
  9999-12-31, and exactly 6 digits (must stay micro).
* Negative: more than 9 fractional digits raises the expected error; behavior with the preview
  flag disabled is unchanged.
* TIMESTAMP-keyword NTZ-vs-LTZ resolution based on presence of a time-zone offset.

h2. Does this PR introduce any user-facing change

Yes, but gated. When {{spark.sql.timestampNanosTypes.enabled}} is true, typed timestamp literals
with 7-9 fractional digits resolve to nanosecond-capable types instead of being narrowed to
microseconds, e.g.:

{code:sql}
SELECT TIMESTAMP_NTZ '2020-01-01 00:00:00.123456789';  -- TimestampNTZNanosType(9)
{code}

When the flag is false (production default), behavior is unchanged.


