[ 
https://issues.apache.org/jira/browse/SPARK-57162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-57162.
------------------------------
    Fix Version/s: 4.3.0
       Resolution: Fixed

Issue resolved by pull request 56295
[https://github.com/apache/spark/pull/56295]

> Add nanosecond-aware TimestampFormatter for parsing and formatting 
> TimestampNanosVal with precision
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-57162
>                 URL: https://issues.apache.org/jira/browse/SPARK-57162
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.3.0
>
>
> h2. What
> Extend the {{TimestampFormatter}} family so it can parse a string into
> {{org.apache.spark.unsafe.types.TimestampNanosVal}} ({{epochMicros: Long}} +
> {{nanosWithinMicro: Short}} in [0, 999]) and format a {{TimestampNanosVal}}
> back to a string with a target fractional precision {{p}} in [7, 9].
> Parent: SPARK-56822. Builds on SPARK-57032 (raw string parsing for nanosecond
> fractional precision), which covers only
> {{SparkDateTimeUtils.parseTimestampString}}, not the pattern-based / format
> (write) side used by datasources.
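
The two-field representation described above can be illustrated with a
minimal Java sketch. NanosPairSketch and its methods are hypothetical
stand-ins invented for this example; they are not Spark's TimestampNanosVal
or its API. The sketch only assumes the layout stated in the ticket (epoch
microseconds plus a 0..999 nanosecond remainder) and shows that a pre-epoch
instant still yields a non-negative remainder.

```java
import java.time.Instant;

// Hypothetical stand-in for the (epochMicros, nanosWithinMicro) pair.
// This is NOT Spark's TimestampNanosVal; names and methods are illustrative.
final class NanosPairSketch {
    final long epochMicros;       // microseconds since 1970-01-01T00:00:00Z
    final short nanosWithinMicro; // extra nanoseconds, always in [0, 999]

    NanosPairSketch(long epochMicros, int nanosWithinMicro) {
        if (nanosWithinMicro < 0 || nanosWithinMicro > 999) {
            throw new IllegalArgumentException(
                "nanosWithinMicro out of [0, 999]: " + nanosWithinMicro);
        }
        this.epochMicros = epochMicros;
        this.nanosWithinMicro = (short) nanosWithinMicro;
    }

    // Instant.getNano() is always in [0, 999_999_999], also before the epoch,
    // so plain division and remainder are enough here.
    static NanosPairSketch fromInstant(Instant instant) {
        int nanoOfSecond = instant.getNano();
        long micros = Math.addExact(
            Math.multiplyExact(instant.getEpochSecond(), 1_000_000L),
            nanoOfSecond / 1_000);
        return new NanosPairSketch(micros, nanoOfSecond % 1_000);
    }

    // Floor division keeps the micro-of-second part non-negative for
    // negative epochMicros values.
    Instant toInstant() {
        long seconds = Math.floorDiv(epochMicros, 1_000_000L);
        long microOfSecond = Math.floorMod(epochMicros, 1_000_000L);
        return Instant.ofEpochSecond(seconds, microOfSecond * 1_000L + nanosWithinMicro);
    }

    public static void main(String[] args) {
        NanosPairSketch v = fromInstant(Instant.parse("1969-12-31T23:59:59.999999999Z"));
        System.out.println(v.epochMicros + " micros + " + v.nanosWithinMicro + " ns");
        System.out.println(v.toInstant());
    }
}
```

One nanosecond before the epoch maps to epochMicros = -1 with
nanosWithinMicro = 999, which is the kind of boundary value the test plan in
the ticket calls out.
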
> h2. Why
> Today {{TimestampFormatter}} is microsecond-only: every {{parse}} /
> {{parseWithoutTimeZone}} returns a {{Long}} of epoch microseconds, and every
> {{format}} overload consumes microseconds.
> {{Iso8601TimestampFormatter.extractMicros}} reads
> {{ChronoField.MICRO_OF_SECOND}}, discarding the 7th-9th fractional digits,
> and the legacy {{FAST_DATE_FORMAT}} path caps at millisecond/microsecond
> resolution. There is no API that yields or consumes {{TimestampNanosVal}}.
> The JSON and CSV datasources (and other text-based paths) drive all timestamp
> parsing and formatting through {{TimestampFormatter}} with user-supplied
> {{timestampFormat}} patterns, so they cannot round-trip 7-9 digit fractions
> until the formatter is nanos-aware. This ticket is the foundational unblocker
> for nanosecond support in those datasources.
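
The loss of the 7th-9th fractional digits described above can be reproduced
with plain java.time, independent of Spark. The class and method names below
are made up for illustration, and the fixed nine-digit pattern is an
assumption of the example rather than anything taken from the Spark
formatters.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoField;

// Illustrative only: shows what a microsecond-only read keeps and what it drops.
final class FractionLossSketch {
    static final DateTimeFormatter NINE_DIGITS =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSSSSSSSS");

    // What a microsecond-only extraction sees: the first six fractional digits.
    static int microOfSecond(String text) {
        return LocalDateTime.parse(text, NINE_DIGITS).get(ChronoField.MICRO_OF_SECOND);
    }

    // The part that is dropped: nano-of-second modulo 1000, always in [0, 999].
    static int nanosWithinMicro(String text) {
        return LocalDateTime.parse(text, NINE_DIGITS).get(ChronoField.NANO_OF_SECOND) % 1_000;
    }

    public static void main(String[] args) {
        String text = "2024-01-01 00:00:00.123456789";
        System.out.println("micro-of-second:    " + microOfSecond(text));
        System.out.println("nanos within micro: " + nanosWithinMicro(text));
    }
}
```

For the input above, the microsecond field carries 123456 and the remainder
789 is only visible through NANO_OF_SECOND, which is the extra value the
ticket proposes capturing.
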
> h2. Scope
> {{sql/api/.../util/TimestampFormatter.scala}}
> * Add nanos-aware parse methods returning {{TimestampNanosVal}} (LTZ and NTZ /
> without-time-zone variants), and {{Optional}} counterparts mirroring
> {{parseOptional}} / {{parseWithoutTimeZoneOptional}}.
> * Add format methods accepting {{TimestampNanosVal}} plus the target precision
> {{p}}, with defined truncation/rounding of sub-precision digits.
> * Cover the implementations: {{Iso8601TimestampFormatter}} (extend
> {{extractMicros}} to also capture {{NANO_OF_SECOND}} remainder),
> {{DefaultTimestampFormatter}} (delegate to the SPARK-57032 nanos parse), and
> the legacy {{LegacyFastTimestampFormatter}} (define behavior or explicitly
> reject nanos in LEGACY mode).
> * Support fraction patterns up to 9 digits ({{[.SSSSSSS]}} .. {{[.SSSSSSSSS]}})
> in both parse and format ({{DateTimeFormatterHelper}} already appends
> {{NANO_OF_SECOND}} 0..9).
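
As a rough illustration of the fraction handling in the last Scope item, the
sketch below uses java.time's DateTimeFormatterBuilder.appendFraction to
parse an optional fraction of up to nine digits and to write a fixed
precision p. It is a standalone example: FractionPatternSketch, PARSER and
writer are hypothetical names, the "uuuu-MM-dd HH:mm:ss" prefix is an
assumed pattern, and nothing here reproduces Spark's DateTimeFormatterHelper
or the API added by the pull request.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

// Illustrative only: variable-width parsing and fixed-width writing of fractions.
final class FractionPatternSketch {
    // Accepts an optional fraction of 0 to 9 digits after the seconds.
    static final DateTimeFormatter PARSER = new DateTimeFormatterBuilder()
        .appendPattern("uuuu-MM-dd HH:mm:ss")
        .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
        .toFormatter();

    // Writes exactly p fractional digits. appendFraction does not round:
    // digits beyond the maximum width are dropped, shorter values are zero-padded.
    static DateTimeFormatter writer(int p) {
        if (p < 7 || p > 9) {
            throw new IllegalArgumentException("precision must be in [7, 9]: " + p);
        }
        return new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd HH:mm:ss")
            .appendFraction(ChronoField.NANO_OF_SECOND, p, p, true)
            .toFormatter();
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.parse("2024-01-01 00:00:00.123456789", PARSER);
        for (int p = 7; p <= 9; p++) {
            System.out.println("p=" + p + " -> " + writer(p).format(t));
        }
    }
}
```

Because appendFraction truncates on output, a writer built this way
implements only the "truncate" option; a rounding rule would have to adjust
the value before formatting.
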
> h2. Out of scope
> * JSON/CSV converter and schema-inference wiring (separate sub-tasks; they
> depend on this).
> * Raw string parsing already handled by SPARK-57032.
> * Datasource option additions.
> h2. Design notes
> * Precision {{p}} controls how many fractional digits are emitted on format
> and how sub-precision input is handled on parse (truncate vs round) - document
> and test the chosen rule.
> * Reuse the existing {{TimestampNanosVal}} normalization invariant
> (nanosWithinMicro in [0, 999]); carry overflow into {{epochMicros}}.
> * Keep all existing microsecond methods unchanged (additive API).
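
The design notes above leave the truncate-vs-round rule open, so the
following Java sketch shows both options side by side without implying which
one the patch chose. All names are hypothetical, the pair is modelled as a
plain long[] {epochMicros, nanosWithinMicro} rather than TimestampNanosVal,
and round-half-up is just one assumed rounding mode.

```java
// Illustrative only: normalization with carry, plus truncation and rounding
// of the nanosecond remainder to a precision p in [7, 9].
final class NanosPrecisionSketch {
    // Brings nanos into [0, 999] and carries the overflow (or borrow) into
    // epochMicros. Returns {epochMicros, nanosWithinMicro}.
    static long[] normalize(long epochMicros, long nanos) {
        long carry = Math.floorDiv(nanos, 1_000L);
        long remainder = Math.floorMod(nanos, 1_000L);
        return new long[] { Math.addExact(epochMicros, carry), remainder };
    }

    // Size of one unit at precision p, in nanoseconds: 7 -> 100, 8 -> 10, 9 -> 1.
    static long step(int p) {
        if (p < 7 || p > 9) {
            throw new IllegalArgumentException("precision must be in [7, 9]: " + p);
        }
        long step = 1;
        for (int i = p; i < 9; i++) {
            step *= 10;
        }
        return step;
    }

    // Option 1: drop the digits below precision p.
    static long[] truncate(long epochMicros, int nanosWithinMicro, int p) {
        long step = step(p);
        return new long[] { epochMicros, nanosWithinMicro - nanosWithinMicro % step };
    }

    // Option 2: round half up; a result of 1000 carries into epochMicros.
    static long[] roundHalfUp(long epochMicros, int nanosWithinMicro, int p) {
        long step = step(p);
        long rounded = (nanosWithinMicro + step / 2) / step * step;
        return normalize(epochMicros, rounded);
    }

    public static void main(String[] args) {
        long[] t = truncate(5L, 950, 7);
        long[] r = roundHalfUp(5L, 950, 7);
        System.out.println("truncate: " + t[0] + " micros + " + t[1] + " ns");
        System.out.println("round:    " + r[0] + " micros + " + r[1] + " ns");
    }
}
```

With epochMicros = 5 and nanosWithinMicro = 950 at p = 7, truncation keeps
(5, 900) while rounding produces (6, 0), which is the carry case the
normalization invariant has to absorb.
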
> h2. How was this patch tested
> * {{TimestampFormatterSuite}} (or new cases): parse/format round-trip for p in
> [7, 9] across ISO default and custom patterns; boundary values
> (nanosWithinMicro 0 and 999, pre-epoch instants, Long micro boundaries);
> LEGACY-mode behavior; truncation/rounding rule.
> h2. Does this PR introduce any user-facing change
> No. The formatter API is additive, and its callers gate its use behind
> {{spark.sql.timestampNanosTypes.enabled}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
