Max Gekk created SPARK-57256:
--------------------------------
Summary: Cast nanosecond-precision timestamps to string
Key: SPARK-57256
URL: https://issues.apache.org/jira/browse/SPARK-57256
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
h2. What
Implement casting of the nanosecond-precision timestamp types
{{TIMESTAMP_NTZ(p)}} ({{TimestampNTZNanosType}}) and {{TIMESTAMP_LTZ(p)}}
({{TimestampLTZNanosType}}), {{p}} in [7, 9], to {{STRING}}.
Casting is implemented in {{ToStringBase}} (mixed into {{Cast}}), so this
change also fixes {{ToPrettyString}} (and therefore {{Dataset.show()}}) for
these types via the shared base.
The change wires the SPARK-57162 formatter methods into the existing
cast-to-string paths (interpreted and codegen):
* {{TimestampLTZNanosType(p)}} -> {{TimestampFormatter.formatNanos(v, p)}}
(renders in the session time zone).
* {{TimestampNTZNanosType(p)}} ->
{{TimestampFormatter.formatWithoutTimeZoneNanos(v, p)}} (zone-independent, UTC
wall-clock grid).
The fractional-second precision {{p}} is taken from the source type; sub-{{p}}
digits are floored and trailing zeros are trimmed, consistent with the
microsecond cast path (both use {{FractionTimestampFormatter}}).
{{Cast.needsTimeZone}} is extended so that {{TimestampLTZNanosType ->
StringType}} resolves the session time zone (mirroring {{TimestampType ->
StringType}}); the NTZ variant does not need a time zone.
h2. Why
Today {{Cast}} permits these casts at analysis time (the generic {{(_,
StringType)}} rule), but at runtime the nanosecond types have no dedicated case
in {{ToStringBase}} and fall through to the default {{String.valueOf(...)}}
branch, producing the internal form {{TimestampNanosVal(epochMicros,
nanosWithinMicro)}} instead of a proper SQL timestamp string. Producing a
correct textual representation is a prerequisite for nanosecond support in
expressions, {{SHOW}}/pretty output, and downstream text-based sinks.
h2. Example
With {{spark.sql.timestampNanosTypes.enabled=true}}:
{code:sql}
SELECT CAST(ts AS STRING);
-- TIMESTAMP_NTZ(9) value 2020-01-01 00:00:00.123456789
-- before: TimestampNanosVal(1577836800000000, 789)
-- after: 2020-01-01 00:00:00.123456789
{code}
h2. Behavior change
User-facing only when {{spark.sql.timestampNanosTypes.enabled=true}}; these
types are not available otherwise. Casting to string never fails, so ANSI and
non-ANSI modes behave identically.
h2. Dependency
Builds on SPARK-57162 (nanosecond-aware {{TimestampFormatter}}), which provides
{{formatNanos}} / {{formatWithoutTimeZoneNanos}}.
h2. How tested
New cases in {{CastSuiteBase}} (run under both ANSI on/off; {{checkEvaluation}}
exercises the interpreted and codegen paths): precision 7/8/9, trailing-zero
trimming, {{nanosWithinMicro}} 0 and 999, LTZ time-zone shift under a non-UTC
session zone vs. NTZ remaining unshifted, pre-epoch and year-9999 boundaries,
and null input.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]