[
https://issues.apache.org/jira/browse/SPARK-57211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085785#comment-18085785
]
Uroš Bojanić commented on SPARK-57211:
--------------------------------------
Resolved in https://github.com/apache/spark/pull/56288.
> Cast strings to nanosecond-precision timestamps
> -----------------------------------------------
>
> Key: SPARK-57211
> URL: https://issues.apache.org/jira/browse/SPARK-57211
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Labels: pull-request-available
> Attachments: cast_strings_to_nanos_timestamps_4f1e8aee.plan.md
>
>
> h2. What
> Wire {{Cast}} to support casting {{StringType}} to the nanosecond-capable
> timestamp types {{TimestampNTZNanosType(p)}} and {{TimestampLTZNanosType(p)}}
> with fractional-seconds precision {{p}} in {{[7, 9]}}, on both the interpreted
> and codegen paths and across all eval modes (LEGACY, ANSI, TRY):
> * {{CAST(<string> AS TIMESTAMP_NTZ(p))}}
> * {{CAST(<string> AS TIMESTAMP_LTZ(p))}}
> h2. Why
> This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond
> precision).
> The logical types, the {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}} SQL syntax,
> the physical row value {{TimestampNanosVal}}, and (via SPARK-57032) the
> string-to-nanos parse helpers all exist, but {{Cast}} has zero arms for the
> nanos types. As a result {{CAST(s AS TIMESTAMP_NTZ(9))}} fails type-check with
> {{CAST_WITHOUT_SUGGESTION}} even when the preview flag
> {{spark.sql.timestampNanosTypes.enabled}} is on. String ingestion is the most
> common entry point for these types and unblocks typed literals, filters, and
> CTAS once coercion lands.
> h2. Approach
> Reuse the parse entry points added in SPARK-57032 (on {{SparkDateTimeUtils}},
> inherited by {{DateTimeUtils}}), which already return a normalized
> {{TimestampNanosVal}} and apply per-precision truncation, so no separate
> normalization module is required for the string path:
> * {{stringToTimestampLTZNanos(s, precision, zoneId)}} /
> {{stringToTimestampLTZNanosAnsi(s, precision, zoneId, context)}}
> * {{stringToTimestampNTZNanos(s, precision, allowTimeZone = true)}} /
> {{stringToTimestampNTZNanosAnsi(s, precision, context)}}
> Changes in {{Cast.scala}}:
> * Add {{StringType -> TimestampNTZNanosType(p) / TimestampLTZNanosType(p)}}
> arms
> to {{canCast}} and {{canAnsiCast}} (try-cast is covered: {{canTryCast}}
> delegates to {{canAnsiCast}}, and {{canUseLegacyCastForTryCast}} already
> matches {{(StringType, DatetimeType)}}, which the nanos types extend).
> * Add {{(StringType, TimestampLTZNanosType)}} to {{Cast.needsTimeZone}}
> (NTZ string is zone-independent, mirroring micro NTZ).
> * Add interpreted {{castToTimestampLTZNanos}} / {{castToTimestampNTZNanos}}
> and
> matching codegen, dispatched from {{castInternal}} /
> {{nullSafeCastFunction}}
> with the precision taken from the target type. The cast result is a
> {{TimestampNanosVal}} (or null in legacy/try mode).
> * NTZ cast adopts {{allowTimeZone = true}} to match the existing micro
> {{TIMESTAMP_NTZ}} string cast (resolves the TODO(SPARK-57032) left on
> {{stringToTimestampNTZNanosAnsi}}).
> Existing preview gating is unchanged: {{Cast.checkInputDataTypes}} already
> calls
> {{TypeUtils.failUnsupportedDataType}}, which throws {{FEATURE_NOT_ENABLED}}
> when
> the flag is off.
> h2. Tests
> * {{CastSuiteBase}}: success cases for both types over {{p}} in {{[7, 9]}}
> and a
> 7-9 digit fractional corpus; LTZ parameterized over time zones, NTZ
> zone-independent. Expected values are {{TimestampNanosVal}}.
> * {{CastWithAnsiOnSuite}}: malformed-input parse errors (DateTimeException).
> * {{CastWithAnsiOffSuite}} / {{TryCastSuite}}: malformed input returns NULL.
> * A flag-off guard asserting {{FEATURE_NOT_ENABLED}}.
> h2. Out of scope (follow-ups under task 3)
> * Other cast pairs: micro {{TimestampType}} / {{TimestampNTZType}} <-> nanos,
> {{DateType}} <-> nanos, numeric <-> nanos (need the shared 3a normalization
> helpers).
> * Reverse direction nanos -> {{StringType}} and the nanos-aware
> {{TimestampFormatter}} (SPARK-57162).
> * Type coercion / widening and {{AnyTimestampType}} extension.
> h2. Acceptance criteria
> * With the preview flag enabled, {{CAST(<string> AS TIMESTAMP_NTZ(p))}} and
> {{CAST(<string> AS TIMESTAMP_LTZ(p))}} for {{p}} in {{[7, 9]}} produce
> correct
> nanosecond values in LEGACY, ANSI, and TRY modes (interpreted and codegen).
> * ANSI mode throws on malformed input; LEGACY/TRY return NULL.
> * Existing microsecond timestamp string casts are unchanged.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]