Max Gekk created SPARK-57211:
--------------------------------

             Summary: Cast strings to nanosecond-precision timestamps
                 Key: SPARK-57211
                 URL: https://issues.apache.org/jira/browse/SPARK-57211
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk
            Assignee: Max Gekk


h2. What

Wire {{Cast}} to support casting {{StringType}} to the nanosecond-capable
timestamp types {{TimestampNTZNanosType(p)}} and {{TimestampLTZNanosType(p)}}
with fractional-seconds precision {{p}} in {{[7, 9]}}, on both the interpreted
and codegen paths and across all eval modes (LEGACY, ANSI, TRY):

* {{CAST(<string> AS TIMESTAMP_NTZ(p))}}
* {{CAST(<string> AS TIMESTAMP_LTZ(p))}}

h2. Why

This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond precision).

The logical types, the {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}} SQL syntax,
the physical row value {{TimestampNanosVal}}, and (via SPARK-57032) the
string-to-nanos parse helpers all exist, but {{Cast}} has zero arms for the
nanos types. As a result {{CAST(s AS TIMESTAMP_NTZ(9))}} fails type-check with
{{CAST_WITHOUT_SUGGESTION}} even when the preview flag
{{spark.sql.timestampNanosTypes.enabled}} is on. String ingestion is the most
common entry point for these types and unblocks typed literals, filters, and
CTAS once coercion lands.

h2. Approach

Reuse the parse entry points added in SPARK-57032 (on {{SparkDateTimeUtils}},
inherited by {{DateTimeUtils}}), which already return a normalized
{{TimestampNanosVal}} and apply per-precision truncation, so no separate
normalization module is required for the string path:

* {{stringToTimestampLTZNanos(s, precision, zoneId)}} /
  {{stringToTimestampLTZNanosAnsi(s, precision, zoneId, context)}}
* {{stringToTimestampNTZNanos(s, precision, allowTimeZone = true)}} /
  {{stringToTimestampNTZNanosAnsi(s, precision, context)}}

Changes in {{Cast.scala}}:
* Add {{StringType -> TimestampNTZNanosType(p) / TimestampLTZNanosType(p)}} arms
  to {{canCast}} and {{canAnsiCast}} (try-cast is covered: {{canTryCast}}
  delegates to {{canAnsiCast}}, and {{canUseLegacyCastForTryCast}} already
  matches {{(StringType, DatetimeType)}}, which the nanos types extend).
* Add {{(StringType, TimestampLTZNanosType)}} to {{Cast.needsTimeZone}}
  (NTZ string is zone-independent, mirroring micro NTZ).
* Add interpreted {{castToTimestampLTZNanos}} / {{castToTimestampNTZNanos}} and
  matching codegen, dispatched from {{castInternal}} / {{nullSafeCastFunction}}
  with the precision taken from the target type. The cast result is a
  {{TimestampNanosVal}} (or null in legacy/try mode).
* NTZ cast adopts {{allowTimeZone = true}} to match the existing micro
  {{TIMESTAMP_NTZ}} string cast (resolves the TODO(SPARK-57032) left on
  {{stringToTimestampNTZNanosAnsi}}).

Existing preview gating is unchanged: {{Cast.checkInputDataTypes}} already calls
{{TypeUtils.failUnsupportedDataType}}, which throws {{FEATURE_NOT_ENABLED}} when
the flag is off.

h2. Tests

* {{CastSuiteBase}}: success cases for both types over {{p}} in {{[7, 9]}} and a
  7-9 digit fractional corpus; LTZ parameterized over time zones, NTZ
  zone-independent. Expected values are {{TimestampNanosVal}}.
* {{CastWithAnsiOnSuite}}: malformed-input parse errors (DateTimeException).
* {{CastWithAnsiOffSuite}} / {{TryCastSuite}}: malformed input returns NULL.
* A flag-off guard asserting {{FEATURE_NOT_ENABLED}}.

h2. Out of scope (follow-ups under task 3)

* Other cast pairs: micro {{TimestampType}} / {{TimestampNTZType}} <-> nanos,
  {{DateType}} <-> nanos, numeric <-> nanos (need the shared 3a normalization
  helpers).
* Reverse direction nanos -> {{StringType}} and the nanos-aware
  {{TimestampFormatter}} (SPARK-57162).
* Type coercion / widening and {{AnyTimestampType}} extension.

h2. Acceptance criteria

* With the preview flag enabled, {{CAST(<string> AS TIMESTAMP_NTZ(p))}} and
  {{CAST(<string> AS TIMESTAMP_LTZ(p))}} for {{p}} in {{[7, 9]}} produce correct
  nanosecond values in LEGACY, ANSI, and TRY modes (interpreted and codegen).
* ANSI mode throws on malformed input; LEGACY/TRY return NULL.
* Existing microsecond timestamp string casts are unchanged.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to