Max Gekk created SPARK-57211:
--------------------------------
Summary: Cast strings to nanosecond-precision timestamps
Key: SPARK-57211
URL: https://issues.apache.org/jira/browse/SPARK-57211
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
Assignee: Max Gekk
h2. What
Wire {{Cast}} to support casting {{StringType}} to the nanosecond-capable
timestamp types {{TimestampNTZNanosType(p)}} and {{TimestampLTZNanosType(p)}}
with fractional-seconds precision {{p}} in {{[7, 9]}}, on both the interpreted
and codegen paths and across all eval modes (LEGACY, ANSI, TRY):
* {{CAST(<string> AS TIMESTAMP_NTZ(p))}}
* {{CAST(<string> AS TIMESTAMP_LTZ(p))}}
h2. Why
This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond precision).
The logical types, the {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}} SQL syntax,
the physical row value {{TimestampNanosVal}}, and (via SPARK-57032) the
string-to-nanos parse helpers all exist, but {{Cast}} has no match arms for the
nanos types. As a result, {{CAST(s AS TIMESTAMP_NTZ(9))}} fails type-check with
{{CAST_WITHOUT_SUGGESTION}} even when the preview flag
{{spark.sql.timestampNanosTypes.enabled}} is on. Casting from strings is the most
common entry point for these types, and it unblocks typed literals, filters, and
CTAS once coercion lands.
h2. Approach
Reuse the parse entry points added in SPARK-57032 (on {{SparkDateTimeUtils}},
inherited by {{DateTimeUtils}}), which already return a normalized
{{TimestampNanosVal}} and apply per-precision truncation, so no separate
normalization module is required for the string path:
* {{stringToTimestampLTZNanos(s, precision, zoneId)}} /
{{stringToTimestampLTZNanosAnsi(s, precision, zoneId, context)}}
* {{stringToTimestampNTZNanos(s, precision, allowTimeZone = true)}} /
{{stringToTimestampNTZNanosAnsi(s, precision, context)}}
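The per-precision truncation mentioned above can be illustrated with a small standalone model. This is only a sketch and not the SPARK-57032 helpers: the function name {{parse_ntz_nanos}}, the accepted input pattern, and the choice to return a single epoch-nanoseconds integer (rather than a {{TimestampNanosVal}}) are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical, simplified input pattern: date, time, optional 1-9 fraction digits.
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?$")
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_ntz_nanos(s, precision):
    """Toy model: return epoch nanoseconds truncated to `precision`
    fractional digits, or None if the string does not parse."""
    if not 7 <= precision <= 9:
        raise ValueError("precision must be in [7, 9]")
    m = _TS.match(s.strip())
    if m is None:
        return None
    date_part, time_part, frac = m.group(1), m.group(2), m.group(3) or ""
    try:
        wall = datetime.strptime(date_part + " " + time_part, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Pattern matched but the fields are out of range (e.g. month 13).
        return None
    # Pad the fraction to 9 digits, keep the first `precision` digits,
    # and zero the rest. Digits beyond the precision are dropped, not rounded.
    kept = frac.ljust(9, "0")[:precision].ljust(9, "0")
    # NTZ: treat the wall-clock value as if it were UTC (no session zone).
    seconds = (wall.replace(tzinfo=timezone.utc) - _EPOCH) // timedelta(seconds=1)
    return seconds * 1_000_000_000 + int(kept)
```

For example, under this model the input {{1970-01-01 00:00:01.123456789}} keeps all nine digits at {{p = 9}} but loses the last two at {{p = 7}}; a string that does not match returns {{None}}, which corresponds to the non-ANSI behavior.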
Changes in {{Cast.scala}}:
* Add {{StringType -> TimestampNTZNanosType(p) / TimestampLTZNanosType(p)}} arms
to {{canCast}} and {{canAnsiCast}} (try-cast is covered: {{canTryCast}}
delegates to {{canAnsiCast}}, and {{canUseLegacyCastForTryCast}} already
matches {{(StringType, DatetimeType)}}, which the nanos types extend).
* Add {{(StringType, TimestampLTZNanosType)}} to {{Cast.needsTimeZone}}
(the NTZ string cast is zone-independent, mirroring the micro NTZ cast).
* Add interpreted {{castToTimestampLTZNanos}} / {{castToTimestampNTZNanos}} and
matching codegen, dispatched from {{castInternal}} / {{nullSafeCastFunction}}
with the precision taken from the target type. The cast result is a
{{TimestampNanosVal}} (or null in legacy/try mode).
* The NTZ cast passes {{allowTimeZone = true}} to match the existing micro
{{TIMESTAMP_NTZ}} string cast (this resolves the TODO(SPARK-57032) left on
{{stringToTimestampNTZNanosAnsi}}).
Existing preview gating is unchanged: {{Cast.checkInputDataTypes}} already calls
{{TypeUtils.failUnsupportedDataType}}, which throws {{FEATURE_NOT_ENABLED}} when
the flag is off.
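The intended interaction between the preview flag and the three eval modes can be summarized in a short standalone model. It is a sketch under stated assumptions, not the {{Cast.scala}} change: {{cast_string}}, {{toy_parse}}, and the two exception classes are hypothetical names, and the model folds the analysis-time flag check and the per-row evaluation into one function for brevity.

```python
from enum import Enum


class EvalMode(Enum):
    LEGACY = "LEGACY"
    ANSI = "ANSI"
    TRY = "TRY"


class FeatureNotEnabled(Exception):
    """Hypothetical stand-in for the FEATURE_NOT_ENABLED error."""


class CastParseError(ValueError):
    """Hypothetical stand-in for the ANSI-mode parse failure."""


def toy_parse(s):
    # Stand-in parser: accepts only decimal digits, returns None otherwise.
    # A real parser would return a nanosecond timestamp value.
    return int(s) if s.isdigit() else None


def cast_string(s, mode, flag_enabled, parse=toy_parse):
    # Gate first: with the preview flag off, the cast is rejected outright.
    if not flag_enabled:
        raise FeatureNotEnabled("FEATURE_NOT_ENABLED")
    result = parse(s)
    if result is None:
        # Malformed input: ANSI raises, LEGACY and TRY yield NULL (None).
        if mode is EvalMode.ANSI:
            raise CastParseError("cannot parse %r as a timestamp" % s)
        return None
    return result
```

The point of the model is that only the failure branch differs between modes; well-formed input takes the same path in all three.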
h2. Tests
* {{CastSuiteBase}}: success cases for both types over {{p}} in {{[7, 9]}} and a
7-9 digit fractional corpus; LTZ parameterized over time zones, NTZ
zone-independent. Expected values are {{TimestampNanosVal}}.
* {{CastWithAnsiOnSuite}}: malformed input raises a parse error ({{DateTimeException}}).
* {{CastWithAnsiOffSuite}} / {{TryCastSuite}}: malformed input returns NULL.
* A flag-off guard asserting {{FEATURE_NOT_ENABLED}}.
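Why the LTZ cases are parameterized over time zones while the NTZ cases are not can be seen in a small standalone model. This is a sketch only: the function names are hypothetical, fixed UTC offsets stand in for real zone IDs, and whole epoch seconds stand in for the nanosecond value.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_FMT = "%Y-%m-%d %H:%M:%S"


def ltz_epoch_seconds(s, session_zone):
    # LTZ: a zone-less string is interpreted in the session time zone,
    # so the resulting instant depends on that zone.
    local = datetime.strptime(s, _FMT).replace(tzinfo=session_zone)
    return (local - _EPOCH) // timedelta(seconds=1)


def ntz_epoch_seconds(s):
    # NTZ: the wall-clock fields are encoded as if they were UTC,
    # so no session zone is consulted.
    wall = datetime.strptime(s, _FMT).replace(tzinfo=timezone.utc)
    return (wall - _EPOCH) // timedelta(seconds=1)


utc = timezone.utc
plus8 = timezone(timedelta(hours=8))    # stand-in for a UTC+8 zone
minus8 = timezone(timedelta(hours=-8))  # stand-in for a UTC-8 zone

text = "2024-01-01 00:00:00"
ltz_values = {name: ltz_epoch_seconds(text, z)
              for name, z in [("utc", utc), ("plus8", plus8), ("minus8", minus8)]}
ntz_value = ntz_epoch_seconds(text)
```

In this model the three LTZ results differ by the zone offsets, while the NTZ result is a single value that equals the LTZ result for UTC.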
h2. Out of scope (follow-ups under task 3)
* Other cast pairs: micro {{TimestampType}} / {{TimestampNTZType}} <-> nanos,
{{DateType}} <-> nanos, numeric <-> nanos (need the shared 3a normalization
helpers).
* Reverse direction nanos -> {{StringType}} and the nanos-aware
{{TimestampFormatter}} (SPARK-57162).
* Type coercion / widening and {{AnyTimestampType}} extension.
h2. Acceptance criteria
* With the preview flag enabled, {{CAST(<string> AS TIMESTAMP_NTZ(p))}} and
{{CAST(<string> AS TIMESTAMP_LTZ(p))}} for {{p}} in {{[7, 9]}} produce correct
nanosecond values in LEGACY, ANSI, and TRY modes (interpreted and codegen).
* ANSI mode throws on malformed input; LEGACY/TRY return NULL.
* Existing microsecond timestamp string casts are unchanged.