Max Gekk created SPARK-57257:
--------------------------------
Summary: Support nanosecond-precision timestamps in Hive results
Key: SPARK-57257
URL: https://issues.apache.org/jira/browse/SPARK-57257
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
h2. What
Modify {{HiveResult}} to support the nanosecond-precision timestamp types
{{TIMESTAMP_LTZ(p)}} ({{TimestampLTZNanosType}}) and {{TIMESTAMP_NTZ(p)}}
({{TimestampNTZNanosType}}), where {{p}} is in [7, 9].
Add cases to {{HiveResult.toHiveStringDefault}} mirroring the existing
microsecond timestamp cases:
* {{(i: Instant, _: TimestampLTZNanosType)}} -> render in the session time zone.
* {{(l: LocalDateTime, _: TimestampNTZNanosType)}} -> render zone-independently.
Both are rendered with the nanosecond-aware {{TimestampFormatter}} (SPARK-57162)
at the column's fractional-second precision {{p}}. Digits beyond {{p}} are
floored away and trailing zeros are trimmed, consistent with casting these
types to string.
{{getTimeFormatters}} already constructs a {{FractionTimestampFormatter}} via
{{TimestampFormatter.getFractionFormatter}}, which now exposes {{formatNanos}}
/ {{formatWithoutTimeZoneNanos}}.
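A minimal sketch of the rendering rules above, written in plain Scala on {{java.time}}. It is not Spark's {{TimestampFormatter}}: {{NanosRenderingSketch}}, {{fraction}}, {{renderNtz}} and {{renderLtz}} are hypothetical names, and the base pattern {{yyyy-MM-dd HH:mm:ss}} is an assumption. It floors the nano-of-second to {{p}} digits, trims trailing zeros, converts an LTZ instant to the session time zone, and renders an NTZ value as-is.

{code:scala}
import java.time.{Instant, LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter

// Hypothetical helper, NOT Spark's TimestampFormatter: it only models the
// rendering rules for a fractional-second precision p in [7, 9].
object NanosRenderingSketch {
  // Assumed base pattern for the date/time part.
  private val base = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Keeps the first p of the 9 nano-of-second digits (which floors the
  // value), then trims trailing zeros; an all-zero fraction renders as "".
  def fraction(nanoOfSecond: Int, p: Int): String = {
    require(p >= 7 && p <= 9, s"precision $p out of range [7, 9]")
    val digits = f"$nanoOfSecond%09d".take(p)
    val trimmed = digits.reverse.dropWhile(_ == '0').reverse
    if (trimmed.isEmpty) "" else "." + trimmed
  }

  // NTZ: zone-independent, so the wall-clock fields are rendered unchanged.
  def renderNtz(ldt: LocalDateTime, p: Int): String =
    base.format(ldt) + fraction(ldt.getNano, p)

  // LTZ: the instant is first converted to the session time zone.
  def renderLtz(i: Instant, p: Int, sessionZone: ZoneId): String =
    renderNtz(i.atZone(sessionZone).toLocalDateTime, p)
}
{code}

For example, {{renderLtz(Instant.parse("2020-01-01T00:00:00.123456789Z"), 7, ZoneId.of("UTC"))}} yields {{2020-01-01 00:00:00.1234567}}: the last two nanosecond digits are floored away rather than rounded.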
h2. Why
Before this change, formatting a nanosecond timestamp column through
{{HiveResult}} (e.g. end-to-end SQL / golden-file tests, the {{spark-sql}} CLI,
or Thrift server output) matches none of the existing cases and fails with a
{{MatchError}}, analogous to the {{TimeType}} issue fixed in SPARK-51517:
{code}
scala.MatchError: (2020-01-01T00:00:00.123456789Z, TimestampLTZNanosType(9)) (of class scala.Tuple2)
{code}
The existing cases in {{HiveResult.scala}} match only the microsecond
{{TimestampType}} / {{TimestampNTZType}}, so the parameterized nanos types are
not handled.
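The failure mode can be reproduced outside Spark with a small model. The types below are stand-ins, not Spark's {{DataType}} classes, and {{MatchSketch.before}} / {{MatchSketch.after}} are hypothetical. The sketch only shows why a tuple pattern match with no case for the parameterized type throws {{scala.MatchError}}, and why matching on the type's class ({{_: TimestampLTZNanosType}}) rather than a fixed instance covers every precision {{p}}.

{code:scala}
import java.time.Instant

// Stand-ins for Spark's DataType hierarchy, for illustration only.
sealed trait DataType
case object TimestampType extends DataType
final case class TimestampLTZNanosType(precision: Int) extends DataType

object MatchSketch {
  // Before: only the microsecond type has a case, so a (value, nanos type)
  // tuple falls through the match and throws scala.MatchError.
  def before(a: (Any, DataType)): String = a match {
    case (i: Instant, TimestampType) => s"micros: $i"
  }

  // After: a case mirroring the microsecond one; the type pattern accepts
  // TimestampLTZNanosType for any precision.
  def after(a: (Any, DataType)): String = a match {
    case (i: Instant, TimestampType)            => s"micros: $i"
    case (i: Instant, _: TimestampLTZNanosType) => s"nanos: $i"
  }
}
{code}

Calling {{MatchSketch.before((Instant.EPOCH, TimestampLTZNanosType(9)))}} throws {{scala.MatchError}}, while {{MatchSketch.after}} on the same tuple returns {{nanos: 1970-01-01T00:00:00Z}}.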
h2. Does this PR introduce any user-facing change?
It fixes the error above. After the change, nanosecond timestamp values are
rendered as proper strings in Hive results. These types are only reachable
when {{spark.sql.timestampNanosTypes.enabled=true}}.
h2. Dependency
Builds on SPARK-57162 (nanosecond-aware {{TimestampFormatter}}).
h2. How tested
* New cases in {{HiveResultSuite}} covering {{TIMESTAMP_LTZ(p)}} /
{{TIMESTAMP_NTZ(p)}} for {{p}} in [7, 9]: precision-driven fraction width,
trailing-zero trimming, {{nanosWithinMicro}} 0 and 999, LTZ session-zone
rendering vs. zone-independent NTZ, and nested (array/map/struct) values.
* A golden-file end-to-end test, analogous to the {{time.sql}} test added in
SPARK-51517, disabled in {{ThriftServerQueryTestSuite}} if needed.