[ https://issues.apache.org/jira/browse/SPARK-57256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uroš Bojanić resolved SPARK-57256.
----------------------------------
    Fix Version/s: 4.3.0
       Resolution: Fixed

Issue resolved by pull request 56317
[https://github.com/apache/spark/pull/56317]

> Cast nanosecond-precision timestamps to string
> ----------------------------------------------
>
>                 Key: SPARK-57256
>                 URL: https://issues.apache.org/jira/browse/SPARK-57256
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.3.0
>
>
> h2. What
> Implement casting of the nanosecond-precision timestamp types 
> {{TIMESTAMP_NTZ(p)}} ({{TimestampNTZNanosType}}) and {{TIMESTAMP_LTZ(p)}} 
> ({{TimestampLTZNanosType}}), with {{p}} in [7, 9], to {{STRING}}.
> The cast is implemented in {{ToStringBase}} (mixed into {{Cast}}). Because 
> {{ToPrettyString}} shares the same base, this change also fixes it (and 
> therefore {{Dataset.show()}}) for these types.
> The change wires the SPARK-57162 formatter methods into the existing 
> cast-to-string paths (interpreted and codegen):
> * {{TimestampLTZNanosType(p)}} -> {{TimestampFormatter.formatNanos(v, p)}} 
> (renders in the session time zone).
> * {{TimestampNTZNanosType(p)}} -> 
> {{TimestampFormatter.formatWithoutTimeZoneNanos(v, p)}} (zone-independent, 
> UTC wall-clock grid).
> The fractional-second precision {{p}} is taken from the source type. Digits 
> beyond the {{p}}-th fractional digit are floored (dropped), and trailing zeros 
> are trimmed. This is consistent with the microsecond cast path (both use 
> {{FractionTimestampFormatter}}).
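
The floor-then-trim rule for the fractional part can be sketched in Python as below. This is a hypothetical helper ({{format_fraction}} is not a Spark API) and not the {{FractionTimestampFormatter}} implementation. It makes two assumptions. First, the fraction arrives as nanoseconds-of-second. Second, an all-zero fraction drops the decimal point entirely, mirroring how the microsecond path renders whole seconds.

{code:python}
def format_fraction(nanos_of_second: int, p: int) -> str:
    """Hypothetical sketch: render the fractional seconds for precision p in [7, 9].

    Digits beyond position p are floored away, then trailing zeros are trimmed.
    Returns "" when no significant digit remains (assumed: no decimal point).
    """
    if not 7 <= p <= 9:
        raise ValueError("precision must be in [7, 9]")
    if not 0 <= nanos_of_second < 1_000_000_000:
        raise ValueError("nanos_of_second out of range")
    # Floor to p digits by dropping the (9 - p) least-significant nanosecond digits.
    truncated = nanos_of_second // 10 ** (9 - p)
    # Left-pad to p digits, then trim trailing zeros.
    digits = f"{truncated:0{p}d}".rstrip("0")
    return "." + digits if digits else ""
{code}

For example, with {{p}} = 7 the value .123456789 becomes ".1234567", and at {{p}} = 9 the value .120000000 becomes ".12". At {{p}} = 7, 99 ns floors to nothing, so no fraction is printed.
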
> {{Cast.needsTimeZone}} is extended so that {{TimestampLTZNanosType -> 
> StringType}} resolves the session time zone (mirroring {{TimestampType -> 
> StringType}}); the NTZ variant does not need a time zone.
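
The Python sketch below illustrates why only the LTZ variant needs the session time zone. It renders only the whole-seconds part. Zone offsets are whole seconds, so the fractional digits do not depend on the zone.

The sketch makes several simplifications. Type names are plain strings. The session zone is modeled as a fixed UTC offset, whereas real session zones are region IDs with DST rules. {{render_seconds}} is hypothetical and not part of Spark.

{code:python}
from datetime import datetime, timedelta, timezone

# Hypothetical sketch, not Spark's TimestampFormatter.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _format(dt: datetime) -> str:
    # Zero-padded "yyyy-MM-dd HH:mm:ss"; avoids strftime's platform-dependent %Y.
    return (f"{dt.year:04d}-{dt.month:02d}-{dt.day:02d} "
            f"{dt.hour:02d}:{dt.minute:02d}:{dt.second:02d}")


def render_seconds(type_name: str, epoch_micros: int, session_zone: timezone) -> str:
    """Render the whole-seconds part of a nanosecond timestamp value."""
    # Floor division keeps pre-epoch values on the correct second.
    instant = EPOCH + timedelta(seconds=epoch_micros // 1_000_000)
    if type_name == "TIMESTAMP_LTZ":
        # LTZ stores an instant, which is shown in the session time zone.
        return _format(instant.astimezone(session_zone))
    if type_name == "TIMESTAMP_NTZ":
        # NTZ is read on the UTC wall-clock grid; the session zone is ignored.
        return _format(instant)
    raise ValueError(f"unsupported type: {type_name}")
{code}

Take a UTC-8 session zone and the stored value 1577836800123456. The LTZ rendering shifts to 2019-12-31 16:00:00, while the NTZ rendering stays 2020-01-01 00:00:00. That is why {{Cast.needsTimeZone}} only has to report a time zone for the LTZ cast.
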
> h2. Why
> Today {{Cast}} permits these casts at analysis time (the generic {{(_, 
> StringType)}} rule), but at runtime the nanosecond types have no dedicated 
> case in {{ToStringBase}} and fall through to the default 
> {{String.valueOf(...)}} branch, producing the internal form 
> {{TimestampNanosVal(epochMicros, nanosWithinMicro)}} instead of a proper SQL 
> timestamp string. Producing a correct textual representation is a 
> prerequisite for nanosecond support in expressions, {{SHOW}}/pretty output, 
> and downstream text-based sinks.
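
The Python sketch below shows where the internal form comes from. It splits a UTC wall-clock time plus nanoseconds-of-second into the two fields named above. The helper {{to_nanos_val}} is hypothetical. It assumes {{nanosWithinMicro}} always lies in [0, 999] (a floor split), including for values before the epoch.

{code:python}
import calendar
from datetime import datetime
from typing import Tuple


def to_nanos_val(wall_clock_utc: datetime, nanos_of_second: int) -> Tuple[int, int]:
    """Hypothetical sketch: return (epochMicros, nanosWithinMicro).

    wall_clock_utc is a naive UTC wall-clock time with microsecond == 0;
    the whole sub-second part is passed in nanos_of_second.
    """
    if wall_clock_utc.microsecond != 0:
        raise ValueError("pass the fraction via nanos_of_second")
    if not 0 <= nanos_of_second < 1_000_000_000:
        raise ValueError("nanos_of_second out of range")
    seconds = calendar.timegm(wall_clock_utc.timetuple())
    epoch_nanos = seconds * 1_000_000_000 + nanos_of_second
    # divmod floors, so nanosWithinMicro stays in [0, 999] even before the epoch.
    epoch_micros, nanos_within_micro = divmod(epoch_nanos, 1_000)
    return epoch_micros, nanos_within_micro
{code}

For 2020-01-01 00:00:00.123456789, this returns (1577836800123456, 789). The whole microseconds go into {{epochMicros}}, and only the last three fractional digits go into {{nanosWithinMicro}}. As a result, printing the raw pair does not read as a timestamp.
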
> h2. Example
> With {{spark.sql.timestampNanosTypes.enabled=true}}:
> {code:sql}
> -- ts: a TIMESTAMP_NTZ(9) column of a hypothetical table t, holding
> --     2020-01-01 00:00:00.123456789
> SELECT CAST(ts AS STRING) FROM t;
> --   before: TimestampNanosVal(1577836800123456, 789)
> --   after:  2020-01-01 00:00:00.123456789
> {code}
> h2. Behavior change
> User-facing only when {{spark.sql.timestampNanosTypes.enabled=true}}; these 
> types are not available otherwise. Casting to string never fails, so ANSI and 
> non-ANSI modes behave identically.
> h2. Dependency
> Builds on SPARK-57162 (nanosecond-aware {{TimestampFormatter}}), which 
> provides {{formatNanos}} / {{formatWithoutTimeZoneNanos}}.
> h2. How tested
> New cases in {{CastSuiteBase}} (run with ANSI mode both on and off; 
> {{checkEvaluation}} exercises the interpreted and codegen paths) cover:
> * precision 7, 8, and 9;
> * trailing-zero trimming;
> * {{nanosWithinMicro}} values 0 and 999;
> * the LTZ time-zone shift under a non-UTC session zone vs. NTZ remaining 
> unshifted;
> * pre-epoch and year-9999 boundaries;
> * null input.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
