[ https://issues.apache.org/jira/browse/SPARK-57162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Max Gekk resolved SPARK-57162.
------------------------------
Fix Version/s: 4.3.0
Resolution: Fixed
Issue resolved by pull request 56295
[https://github.com/apache/spark/pull/56295]
> Add nanosecond-aware TimestampFormatter for parsing and formatting TimestampNanosVal with precision
> ---------------------------------------------------------------------------------------------------
>
> Key: SPARK-57162
> URL: https://issues.apache.org/jira/browse/SPARK-57162
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.3.0
>
>
> h2. What
> Extend the {{TimestampFormatter}} family so it can parse a string into
> {{org.apache.spark.unsafe.types.TimestampNanosVal}} ({{epochMicros: Long}} +
> {{nanosWithinMicro: Short}} in [0, 999]) and format a {{TimestampNanosVal}}
> back to a string with a target fractional precision {{p}} in [7, 9].
> Parent: SPARK-56822. Builds on SPARK-57032 (raw string parsing for nanosecond
> fractional precision), which covers only
> {{SparkDateTimeUtils.parseTimestampString}}, not the pattern-based / format
> (write) side used by datasources.
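
The two-field representation described above can be sketched with plain java.time. This is only an illustration under stated assumptions: {{MicrosAndNanos}} and both helper methods are hypothetical stand-ins, not Spark's {{TimestampNanosVal}} or any method added by the pull request.

```java
import java.time.Instant;

final class NanosSplitSketch {
    // Hypothetical stand-in for the (epochMicros, nanosWithinMicro) pair; not Spark's class.
    record MicrosAndNanos(long epochMicros, short nanosWithinMicro) {}

    // Splits an Instant into whole epoch microseconds plus the leftover 0..999 nanoseconds.
    static MicrosAndNanos fromInstant(Instant instant) {
        // Instant.getNano() is always in [0, 999_999_999], also before the epoch,
        // so the remainder below is never negative.
        int nanoOfSecond = instant.getNano();
        // Simplification: multiplyExact can throw for instants very close to the
        // Long.MIN_VALUE microsecond boundary even when the final sum would fit.
        long micros = Math.addExact(
            Math.multiplyExact(instant.getEpochSecond(), 1_000_000L),
            nanoOfSecond / 1_000);
        return new MicrosAndNanos(micros, (short) (nanoOfSecond % 1_000));
    }

    // Rebuilds the Instant; floorDiv/floorMod keep pre-epoch values consistent.
    static Instant toInstant(MicrosAndNanos v) {
        long seconds = Math.floorDiv(v.epochMicros(), 1_000_000L);
        long microOfSecond = Math.floorMod(v.epochMicros(), 1_000_000L);
        return Instant.ofEpochSecond(seconds, microOfSecond * 1_000 + v.nanosWithinMicro());
    }

    public static void main(String[] args) {
        Instant lastNanoBeforeEpoch = Instant.parse("1969-12-31T23:59:59.999999999Z");
        System.out.println(fromInstant(lastNanoBeforeEpoch));
    }
}
```

The pre-epoch case is the one worth checking by hand: the last nanosecond before the epoch maps to {{epochMicros}} = -1 with 999 nanoseconds left over, which is why the remainder can stay non-negative.
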
> h2. Why
> Today {{TimestampFormatter}} is microsecond-only: every {{parse}} /
> {{parseWithoutTimeZone}} returns a {{Long}} of epoch microseconds, and every
> {{format}} overload consumes microseconds.
> {{Iso8601TimestampFormatter.extractMicros}} reads
> {{ChronoField.MICRO_OF_SECOND}}, discarding the 7th-9th fractional digits,
> and the legacy {{FAST_DATE_FORMAT}} path caps at millisecond/microsecond
> resolution. There is no API that yields or consumes {{TimestampNanosVal}}.
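
The digit loss described above is easy to reproduce outside Spark. The sketch below uses only java.time; the class and method names are hypothetical, and it does not reproduce the body of {{extractMicros}}, only the effect of reading {{MICRO_OF_SECOND}} versus {{NANO_OF_SECOND}} from the same parsed value.

```java
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

final class FractionFieldsSketch {
    // Reads the fraction the way a microsecond-only formatter would:
    // digits 7-9 are dropped by the field itself.
    static int microOfSecond(String text) {
        TemporalAccessor parsed = DateTimeFormatter.ISO_LOCAL_DATE_TIME.parse(text);
        return parsed.get(ChronoField.MICRO_OF_SECOND);
    }

    // Recovers the dropped digits as the remainder of NANO_OF_SECOND within one microsecond.
    static int nanosWithinMicro(String text) {
        TemporalAccessor parsed = DateTimeFormatter.ISO_LOCAL_DATE_TIME.parse(text);
        return parsed.get(ChronoField.NANO_OF_SECOND) % 1_000;
    }

    public static void main(String[] args) {
        String text = "2024-01-01T00:00:00.123456789";
        System.out.println(microOfSecond(text) + " micros, " + nanosWithinMicro(text) + " nanos left over");
    }
}
```
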
> The JSON and CSV datasources (and other text-based paths) drive all timestamp
> parsing and formatting through {{TimestampFormatter}} with user-supplied
> {{timestampFormat}} patterns, so they cannot round-trip 7-9 digit fractions
> until the formatter is nanos-aware. This ticket is the foundational unblocker
> for nanosecond support in those datasources.
> h2. Scope
> {{sql/api/.../util/TimestampFormatter.scala}}
> * Add nanos-aware parse methods returning {{TimestampNanosVal}} (LTZ and NTZ /
> without-time-zone variants), and {{Optional}} counterparts mirroring
> {{parseOptional}} / {{parseWithoutTimeZoneOptional}}.
> * Add format methods accepting {{TimestampNanosVal}} plus the target
> precision {{p}}, with defined truncation/rounding of sub-precision digits.
> * Cover the implementations: {{Iso8601TimestampFormatter}} (extend
> {{extractMicros}} to also capture the {{NANO_OF_SECOND}} remainder),
> {{DefaultTimestampFormatter}} (delegate to the SPARK-57032 nanos parse), and
> the legacy {{LegacyFastTimestampFormatter}} (define behavior or explicitly
> reject nanos in LEGACY mode).
> * Support fraction patterns up to 9 digits ({{[.SSSSSSS]}} ..
> {{[.SSSSSSSSS]}}) in both parse and format ({{DateTimeFormatterHelper}}
> already appends {{NANO_OF_SECOND}} 0..9).
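
For the last bullet, a variable-width fraction of 0 to 9 digits can be expressed with the standard {{DateTimeFormatterBuilder.appendFraction}} call. The sketch below is an assumption about the general shape only: the date-time pattern, the class name and the helper methods are illustrative, and Spark's {{DateTimeFormatterHelper}} may configure its builder differently.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

final class FractionPatternSketch {
    // Seconds followed by an optional fraction of 0..9 digits, including the decimal point.
    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
        .appendPattern("yyyy-MM-dd HH:mm:ss")
        .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
        .toFormatter();

    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, FORMATTER);
    }

    // With a minimum width of 0, trailing zeros (and an all-zero fraction) are not printed.
    static String format(LocalDateTime value) {
        return FORMATTER.format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(parse("2024-01-01 00:00:00.1234567")));
    }
}
```

Because the fraction is variable-width, a formatter like this accepts 7, 8 and 9 digit inputs alike; emitting exactly {{p}} digits on format would need a fixed width instead.
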
> h2. Out of scope
> * JSON/CSV converter and schema-inference wiring (separate sub-tasks; they
> depend on this).
> * Raw string parsing already handled by SPARK-57032.
> * Datasource option additions.
> h2. Design notes
> * Precision {{p}} controls how many fractional digits are emitted on format
> and how sub-precision input is handled on parse (truncate vs round); document
> and test the chosen rule.
> * Reuse the existing {{TimestampNanosVal}} normalization invariant
> ({{nanosWithinMicro}} in [0, 999]); carry overflow into {{epochMicros}}.
> * Keep all existing microsecond methods unchanged (additive API).
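
The first two notes interact: rounding at {{p}} = 7 or 8 can push {{nanosWithinMicro}} to 1000, which then has to carry into {{epochMicros}}. The sketch below shows both candidate rules side by side. The ticket does not say which rule was chosen, and every name here is hypothetical rather than taken from the patch.

```java
final class PrecisionSketch {
    // Hypothetical stand-in for the (epochMicros, nanosWithinMicro) pair; not Spark's class.
    record MicrosAndNanos(long epochMicros, short nanosWithinMicro) {}

    // Restores the [0, 999] invariant by carrying whole microseconds into epochMicros.
    static MicrosAndNanos normalize(long epochMicros, long nanos) {
        long carry = Math.floorDiv(nanos, 1_000L);
        short remainder = (short) Math.floorMod(nanos, 1_000L);
        return new MicrosAndNanos(Math.addExact(epochMicros, carry), remainder);
    }

    // Size in nanoseconds of the last kept digit: 100 for p = 7, 10 for p = 8, 1 for p = 9.
    static int unit(int p) {
        switch (p) {
            case 7: return 100;
            case 8: return 10;
            case 9: return 1;
            default: throw new IllegalArgumentException("precision must be in [7, 9]: " + p);
        }
    }

    // Candidate rule 1: drop the sub-precision digits (moves toward earlier instants).
    static MicrosAndNanos truncate(MicrosAndNanos v, int p) {
        int u = unit(p);
        return normalize(v.epochMicros(), (v.nanosWithinMicro() / u) * u);
    }

    // Candidate rule 2: round half up, which may carry into epochMicros.
    static MicrosAndNanos roundHalfUp(MicrosAndNanos v, int p) {
        int u = unit(p);
        long rounded = ((v.nanosWithinMicro() + u / 2) / u) * (long) u;
        return normalize(v.epochMicros(), rounded);
    }

    public static void main(String[] args) {
        MicrosAndNanos v = new MicrosAndNanos(5L, (short) 999);
        System.out.println(truncate(v, 7) + " vs " + roundHalfUp(v, 7));
    }
}
```
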
> h2. How was this patch tested
> * {{TimestampFormatterSuite}} (or new cases): parse/format round-trip for
> {{p}} in [7, 9] across ISO default and custom patterns; boundary values
> ({{nanosWithinMicro}} 0 and 999, pre-epoch instants, {{Long}} micro
> boundaries); LEGACY-mode behavior; truncation/rounding rule.
> h2. Does this PR introduce any user-facing change
> No. The formatter API is additive, and its callers gate its use behind
> {{spark.sql.timestampNanosTypes.enabled}}.