Max Gekk created SPARK-57285:
--------------------------------

             Summary: Route nanosecond timestamp cast-to-string through the 
Types Framework in both interpreted and codegen paths
                 Key: SPARK-57285
                 URL: https://issues.apache.org/jira/browse/SPARK-57285
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk


h2. Background

SPARK-57256 implemented {{CAST(TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) AS STRING)}} 
for p in [7, 9]. The formatting currently lives in {{ToStringBase}} (alongside 
the microsecond timestamp types): the interpreted path explicitly bypasses 
{{TypeApiOps}}, and the codegen path inlines {{TimestampFormatter.formatNanos}} 
/ {{formatWithoutTimeZoneNanos}}. This was done because the Types Framework hook 
{{TypeApiOps.format(v)}} takes no time zone and so cannot render LTZ values in 
the session time zone; for zone-less callers it deliberately still raises 
{{UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING}}.

This leaves nanosecond cast-to-string as a one-off integration outside the 
framework, which is inconsistent with the SPIP direction of wiring the new 
types through the centralized {{TypeOps}} / {{TypeApiOps}} (see SPARK-57101 / 
SPARK-57207).

h2. Goal

Make the Types Framework the single integration point for nanosecond timestamp 
cast-to-string, for both the interpreted and codegen paths, while producing the 
same output as SPARK-57256 (zone-aware LTZ, zone-independent NTZ, precision 
flooring, trailing-zero trimming).
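
To make the four output properties concrete, here is a minimal standalone sketch in plain {{java.time}}. It is not the Spark implementation: the class and method names are hypothetical, the (epoch second, nano-of-second) value representation is an assumption, and "precision flooring" is read here as dropping fractional digits beyond p, with "trailing-zero trimming" applied afterwards.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical illustration only; names and value layout are assumptions.
final class NanosToStringSketch {
    private static final DateTimeFormatter SECONDS =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    // Floors nanoOfSecond (0..999999999) to `precision` fractional digits
    // (7..9), then trims trailing zeros. Returns "" if nothing is left.
    static String fraction(int nanoOfSecond, int precision) {
        int unit = 1;
        for (int i = precision; i < 9; i++) {
            unit *= 10;
        }
        int floored = nanoOfSecond - (nanoOfSecond % unit);
        if (floored == 0) {
            return "";
        }
        String digits = String.format(Locale.ROOT, "%09d", floored);
        int end = digits.length();
        while (digits.charAt(end - 1) == '0') {
            end--;
        }
        return "." + digits.substring(0, end);
    }

    // LTZ: the instant is rendered in the supplied (session) time zone.
    static String formatLtz(long epochSecond, int nanoOfSecond, int precision, ZoneId zone) {
        LocalDateTime local =
            LocalDateTime.ofInstant(Instant.ofEpochSecond(epochSecond, nanoOfSecond), zone);
        return SECONDS.format(local) + fraction(local.getNano(), precision);
    }

    // NTZ: the value is a wall-clock reading, so no zone is involved.
    static String formatNtz(long epochSecond, int nanoOfSecond, int precision) {
        LocalDateTime local =
            LocalDateTime.ofEpochSecond(epochSecond, nanoOfSecond, ZoneOffset.UTC);
        return SECONDS.format(local) + fraction(nanoOfSecond, precision);
    }
}
```

In this sketch the same stored value renders differently for LTZ depending on the zone passed in, while the NTZ rendering never changes; the refactor is expected to preserve exactly the strings SPARK-57256 already produces, whatever their precise rules are.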

h2. Proposed approach

* Interpreted path: extend the framework formatting hook with the session zone 
(e.g. an optional {{zoneId}} parameter on {{format}} / {{formatUTF8}}), and 
implement zone-aware formatting in {{TimestampNTZNanosTypeApiOps}} / 
{{TimestampLTZNanosTypeApiOps}} using the sql/api {{TimestampFormatter}} 
({{formatWithoutTimeZoneNanos}} for NTZ, {{formatNanos}} with {{zoneId}} for 
LTZ). Thread {{ToStringBase}}'s {{zoneId}} into the dispatch, then remove the 
{{castToStringDefault}} nanos cases and the current {{TypeApiOps}} bypass.
* Codegen path: {{TypeApiOps}} has no codegen hook today (each type is 
hand-written in {{ToStringBase.castToStringCode}}). Add a framework codegen 
hook (a method that emits the format snippet), or have {{castToStringCode}} 
emit a runtime call into the ops reference object passing the {{zoneId}} 
literal; then drop the inlined {{formatNanos}} cases.
* Zone-less callers: reconcile {{format()}} / {{toSQLValue()}} (EXPLAIN, 
SQL-literal rendering). NTZ needs no zone and can format directly; LTZ without 
a session zone either keeps raising the error or falls back to a documented 
default. Update {{TimestampNanosTypeOpsSuite}} accordingly.
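
One possible shape for the zone-aware hook and the zone-less fallback is sketched below in plain Java. Everything here is hypothetical: the interface and class names, the {{Optional<ZoneId>}} parameter, the (epoch second, nano-of-second) value layout, and the exception type are illustrative stand-ins rather than the actual {{TypeApiOps}} API, and precision handling and trailing-zero trimming are omitted for brevity.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Optional;

// Hypothetical stand-in for a framework formatting hook that accepts
// an optional session zone.
interface NanosFormatOpsSketch {
    DateTimeFormatter FMT = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSSSSSSSS");

    String format(long epochSecond, int nanoOfSecond, Optional<ZoneId> zoneId);
}

// NTZ ignores the zone, so zone-less callers can format directly.
final class NtzNanosOpsSketch implements NanosFormatOpsSketch {
    @Override
    public String format(long epochSecond, int nanoOfSecond, Optional<ZoneId> zoneId) {
        return FMT.format(LocalDateTime.ofEpochSecond(epochSecond, nanoOfSecond, ZoneOffset.UTC));
    }
}

// LTZ needs the session zone; without one it keeps raising, mirroring the
// "keeps raising" option above (the exception type here is an assumption).
final class LtzNanosOpsSketch implements NanosFormatOpsSketch {
    @Override
    public String format(long epochSecond, int nanoOfSecond, Optional<ZoneId> zoneId) {
        ZoneId zone = zoneId.orElseThrow(() ->
            new UnsupportedOperationException("UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING"));
        Instant instant = Instant.ofEpochSecond(epochSecond, nanoOfSecond);
        return FMT.format(LocalDateTime.ofInstant(instant, zone));
    }
}
```

With this shape, the cast path would always pass the session zone, while EXPLAIN or SQL-literal rendering would pass no zone and succeed only for NTZ. Choosing the "documented default" option instead would replace the {{orElseThrow}} with a fixed fallback zone.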

h2. Out of scope

* The microsecond timestamp types ({{TIMESTAMP}} / {{TIMESTAMP_NTZ}}), which 
remain handled inline in {{ToStringBase}}.
* Any change to the rendered string output: this is a refactor with no 
user-facing behavior change.

h2. Testing

The existing {{CastWithAnsiOnSuite}} / {{CastWithAnsiOffSuite}}, 
{{ToPrettyStringSuite}}, {{TimestampNanosRowSuite}}, and the {{cast.sql}} 
golden files must keep passing without modification. Add framework-level 
coverage for the new zone-aware {{format}} hook in both eval modes.


