[
https://issues.apache.org/jira/browse/SPARK-57285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Max Gekk reassigned SPARK-57285:
--------------------------------
Assignee: Max Gekk
> Route nanosecond timestamp cast-to-string through the Types Framework in both
> interpreted and codegen paths
> -----------------------------------------------------------------------------------------------------------
>
> Key: SPARK-57285
> URL: https://issues.apache.org/jira/browse/SPARK-57285
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
>
> h2. Background
> SPARK-57256 implemented {{CAST(TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) AS
> STRING)}} for p in [7, 9]. The formatting currently lives in {{ToStringBase}}
> (alongside the microsecond timestamp types): the interpreted path explicitly
> bypasses {{TypeApiOps}}, and the codegen path inlines
> {{TimestampFormatter.formatNanos}} / {{formatWithoutTimeZoneNanos}}. The
> framework was bypassed because {{TypeApiOps.format(v)}} takes no time zone
> and so cannot render LTZ in the session time zone; it still deliberately
> raises {{UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING}} for zone-less callers.
> This leaves nanosecond cast-to-string as a one-off integration outside the
> framework, which is inconsistent with the SPIP direction of wiring the new
> types through the centralized {{TypeOps}} / {{TypeApiOps}} (see SPARK-57101 /
> SPARK-57207).
> h2. Goal
> Make the Types Framework the single integration point for nanosecond
> timestamp cast-to-string, for both the interpreted and codegen paths, while
> producing the same output as SPARK-57256 (zone-aware LTZ, zone-independent
> NTZ, precision flooring, trailing-zero trimming).
> h2. Proposed approach
> * Interpreted path: extend the framework formatting hook with the session
> zone (e.g. an optional {{zoneId}} parameter on {{format}} / {{formatUTF8}}),
> and implement zone-aware formatting in {{TimestampNTZNanosTypeApiOps}} /
> {{TimestampLTZNanosTypeApiOps}} using the sql/api {{TimestampFormatter}}
> ({{formatWithoutTimeZoneNanos}} for NTZ, {{formatNanos}} with {{zoneId}} for
> LTZ). Thread {{ToStringBase}}'s {{zoneId}} into the dispatch, then remove the
> {{castToStringDefault}} nanos cases and the current {{TypeApiOps}} bypass.
> * Codegen path: {{TypeApiOps}} has no codegen hook today (each type is
> hand-written in {{ToStringBase.castToStringCode}}). Add a framework codegen
> hook (a method that emits the format snippet), or have {{castToStringCode}}
> emit a runtime call into the ops reference object passing the {{zoneId}}
> literal; then drop the inlined {{formatNanos}} cases.
> * Zone-less callers: reconcile {{format()}} / {{toSQLValue()}} (EXPLAIN,
> SQL-literal rendering). NTZ needs no zone and can format directly; LTZ
> without a session zone either keeps raising
> {{UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING}} or falls back to a
> documented default zone. Update {{TimestampNanosTypeOpsSuite}} accordingly.
> h2. Out of scope
> * The microsecond timestamp types ({{TIMESTAMP}} / {{TIMESTAMP_NTZ}}), which
> remain handled inline in {{ToStringBase}}.
> * Any change to the rendered string output: this is a refactor with no
> user-facing behavior change.
> h2. Testing
> Existing {{CastWithAnsiOnSuite}} / {{CastWithAnsiOffSuite}},
> {{ToPrettyStringSuite}}, {{TimestampNanosRowSuite}}, and the {{cast.sql}}
> golden files must stay green unchanged; add framework-level coverage for the
> new zone-aware {{format}} hook in both eval modes.
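
The interpreted-path idea in the proposed approach (an optional zone on the format hook, NTZ formatting directly, LTZ requiring a session zone) can be sketched roughly as follows. This is a minimal Java illustration built only on java.time, not Spark code: FormatOps, ntz, ltz, fraction, and the (epochSeconds, nanoOfSecond) value representation are hypothetical names and assumptions, and UnsupportedOperationException merely stands in for UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Optional;

final class NanosFormatSketch {
    private static final DateTimeFormatter SECONDS =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss", Locale.ROOT);

    /** Hypothetical stand-in for a format hook that accepts an optional zone. */
    interface FormatOps {
        String format(long epochSeconds, int nanoOfSecond, Optional<ZoneId> zoneId);
    }

    /** Floors the fraction to `precision` digits (7..9), then trims trailing zeros. */
    static String fraction(int nanoOfSecond, int precision) {
        if (precision < 7 || precision > 9) {
            throw new IllegalArgumentException("precision must be in [7, 9]");
        }
        int divisor = 1;
        for (int i = precision; i < 9; i++) divisor *= 10;
        // nanoOfSecond is non-negative, so integer division floors.
        int floored = nanoOfSecond / divisor;
        if (floored == 0) return "";
        String digits = String.format(Locale.ROOT, "%0" + precision + "d", floored);
        int end = digits.length();
        while (end > 0 && digits.charAt(end - 1) == '0') end--;
        return "." + digits.substring(0, end);
    }

    /** NTZ: the zone argument is ignored, so zone-less callers can format directly. */
    static FormatOps ntz(int precision) {
        return (sec, nanos, zone) -> {
            LocalDateTime ldt = LocalDateTime.ofEpochSecond(sec, 0, ZoneOffset.UTC);
            return SECONDS.format(ldt) + fraction(nanos, precision);
        };
    }

    /** LTZ: renders in the supplied zone and raises when none is given. */
    static FormatOps ltz(int precision) {
        return (sec, nanos, zone) -> {
            ZoneId z = zone.orElseThrow(() ->
                new UnsupportedOperationException("LTZ needs a session time zone"));
            LocalDateTime ldt = LocalDateTime.ofInstant(Instant.ofEpochSecond(sec), z);
            return SECONDS.format(ldt) + fraction(nanos, precision);
        };
    }

    public static void main(String[] args) {
        ZoneId session = ZoneOffset.ofHours(2);
        // Precision 7 floors the last two nanosecond digits.
        System.out.println(ntz(7).format(0L, 123_456_789, Optional.empty()));
        // Trailing zeros are trimmed and the wall clock shifts with the zone.
        System.out.println(ltz(9).format(0L, 120_000_000, Optional.of(session)));
    }
}
```

With these assumptions, the two calls in main print 1970-01-01 00:00:00.1234567 and 1970-01-01 02:00:00.12. Passing the zone as an Optional keeps one hook signature for both types; whether Spark would use an optional parameter, an overload, or a separate method is left open by the ticket.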