[ 
https://issues.apache.org/jira/browse/SPARK-57034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-57034.
------------------------------
    Fix Version/s: 4.3.0
       Resolution: Fixed

Issue resolved by pull request 56149
[https://github.com/apache/spark/pull/56149]

> Add TimestampNanosTestUtils and RandomDataGenerator support for nanosecond 
> timestamps
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-57034
>                 URL: https://issues.apache.org/jira/browse/SPARK-57034
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Tests
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Assignee: Stevo Mitric
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.3.0
>
>
> h2. Summary
> Introduce shared *test infrastructure* for nanosecond-capable timestamps: a 
> {{TimestampNanosTestUtils}} helper (parallel to {{DateTimeTestUtils}}) and 
> {{RandomDataGenerator}} support for {{TimestampNTZNanosType(p)}} / 
> {{TimestampLTZNanosType(p)}} with a fixed edge-case corpus and seeded random 
> values at *nanosecond* precision.
> The deliverable is *test code only* — no user-facing API.
> h2. Background
> Spark datetime tests rely on shared helpers today:
> * {{DateTimeTestUtils}} — fixed {{LocalDateTime}} / micro {{Long}} builders, 
> time zones, Julian/Gregorian edge handling
> * {{RandomDataGenerator.forType}} — random values per {{DataType}}; 
> {{TimestampType}} → {{Instant}}, {{TimestampNTZType}} → {{LocalDateTime}} via 
> {{uniformMicrosRand}} (micro precision only)
> * {{specialTs}} corpus in {{RandomDataGenerator}} and {{CastSuiteBase}} — 
> epoch, 1582 cutover, 0001, 9999 (no sub-micro fractional digits)
> Nanosecond row/unsafe tests 
> ([SPARK-56981|https://github.com/apache/spark/pull/56059]) use hand-written 
> {{TimestampNTZNanos(epochMicros, nanos)}} literals. Upcoming work (casts, 
> coercion, hash, Parquet, benchmarks, expression parity) needs reusable fixed 
> values, random generators, and {{java.time}}-based oracles — without 
> duplicating boilerplate in every suite.
> Sub-task SPARK-57033 provides {{java.time}} ↔ composite conversion; this 
> ticket consumes those helpers in test utilities and generators.
> h2. Scope
> h3. 1. {{TimestampNanosTestUtils}} (new, {{sql/catalyst/src/test/.../util/}})
> Add an object modeled on {{DateTimeTestUtils}}:
> * Readable fixed-value builders, e.g. {{timestampNTZ(year, month, day, hour, 
> minute, sec, nanosWithinMicro)}} returning {{TimestampNTZNanos}}; LTZ variant 
> with {{ZoneId}} where needed
> * Convenience wrappers producing {{LocalDateTime}} / {{Instant}} for the same 
> instants (delegate to SPARK-57033 conversion helpers)
> * Shared constants: default test zone IDs (reuse {{DateTimeTestUtils.UTC}}, 
> {{PST}}, etc.)
> * Optional {{gridTest}} / precision loop helpers for *p* in \[7, 9\] (mirror 
> patterns in existing datetime suites)
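> A minimal sketch of the fixed-value builders, assuming a hypothetical 
> {{NanosComposite(epochMicros, nanosWithinMicro)}} as a stand-in for {{TimestampNTZNanos}}. 
> The object name {{TimestampNanosTestUtilsSketch}} and the extra {{micros}} parameter (which the 
> example signature above does not have) are illustrative only. The java.time conversion is 
> inlined here, whereas the real helpers would delegate to SPARK-57033.
```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}

object TimestampNanosTestUtilsSketch {
  // Hypothetical stand-in for TimestampNTZNanos / TimestampLTZNanos: microseconds
  // since the epoch plus the nanoseconds below microsecond precision.
  final case class NanosComposite(epochMicros: Long, nanosWithinMicro: Int) {
    require(nanosWithinMicro >= 0 && nanosWithinMicro <= 999,
      s"nanosWithinMicro not normalized: $nanosWithinMicro")
  }

  // Precisions exercised by precision loops (p in [7, 9]).
  val nanosPrecisions: Seq[Int] = 7 to 9

  // Readable wall-clock builder; the fraction of a second is
  // micros * 1000 + nanosWithinMicro nanoseconds.
  def localDateTime(
      year: Int, month: Int, day: Int,
      hour: Int = 0, minute: Int = 0, sec: Int = 0,
      micros: Int = 0, nanosWithinMicro: Int = 0): LocalDateTime = {
    require(micros >= 0 && micros <= 999999, s"micros out of range: $micros")
    require(nanosWithinMicro >= 0 && nanosWithinMicro <= 999,
      s"nanosWithinMicro out of range: $nanosWithinMicro")
    LocalDateTime.of(year, month, day, hour, minute, sec, micros * 1000 + nanosWithinMicro)
  }

  // Splits (epoch seconds, nano-of-second) into the composite. nano is always in
  // [0, 999999999], so nanosWithinMicro lands in [0, 999] even before the epoch.
  private def fromEpoch(seconds: Long, nano: Int): NanosComposite =
    NanosComposite(
      Math.addExact(Math.multiplyExact(seconds, 1000000L), (nano / 1000).toLong),
      nano % 1000)

  // NTZ builder: the wall-clock value is read at UTC.
  def timestampNTZ(
      year: Int, month: Int, day: Int,
      hour: Int = 0, minute: Int = 0, sec: Int = 0,
      micros: Int = 0, nanosWithinMicro: Int = 0): NanosComposite = {
    val ldt = localDateTime(year, month, day, hour, minute, sec, micros, nanosWithinMicro)
    fromEpoch(ldt.toEpochSecond(ZoneOffset.UTC), ldt.getNano)
  }

  // LTZ convenience wrapper: the wall-clock value is interpreted in `zone`.
  def instant(
      zone: ZoneId, year: Int, month: Int, day: Int,
      hour: Int = 0, minute: Int = 0, sec: Int = 0,
      micros: Int = 0, nanosWithinMicro: Int = 0): Instant =
    localDateTime(year, month, day, hour, minute, sec, micros, nanosWithinMicro)
      .atZone(zone).toInstant

  // LTZ builder returning the composite for the same instant.
  def timestampLTZ(
      zone: ZoneId, year: Int, month: Int, day: Int,
      hour: Int = 0, minute: Int = 0, sec: Int = 0,
      micros: Int = 0, nanosWithinMicro: Int = 0): NanosComposite = {
    val i = instant(zone, year, month, day, hour, minute, sec, micros, nanosWithinMicro)
    fromEpoch(i.getEpochSecond, i.getNano)
  }
}
```
> For example, {{timestampNTZ(1969, 12, 31, 23, 59, 59, 999999, 999)}} yields 
> {{NanosComposite(-1, 999)}}: the pre-epoch value stays normalized because the sub-micro 
> part is taken from the always non-negative nano-of-second.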
> h3. 2. Edge-case corpus ({{specialNanosTs}})
> Define a shared sequence of nanosecond timestamp strings and/or {{java.time}} 
> values, extending the micro {{specialTs}} set with 7–9 fractional digits, 
> e.g.:
> * {{1970-01-01 00:00:00.000000001}} ({{nanosWithinMicro = 1}})
> * {{1582-10-15 23:59:59.123456789}}
> * {{9999-12-31 23:59:59.999999999}} ({{nanosWithinMicro = 999}})
> * Existing corpus dates (0001, epoch, 9999) with {{nanosWithinMicro}} in {0, 
> 1, 999}
> Expose from {{TimestampNanosTestUtils}} for reuse in {{CastSuiteBase}}, 
> Parquet fixtures, and benchmarks.
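> One way to assemble the corpus is sketched below, assuming SQL-style literals with exactly 
> nine fractional digits. The object name {{SpecialNanosTsSketch}}, the formatter pattern, and 
> the exact base values are illustrative choices, not taken from {{RandomDataGenerator}} or 
> {{CastSuiteBase}}.
```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object SpecialNanosTsSketch {
  // Hypothetical literal format: "yyyy-MM-dd HH:mm:ss" plus nine fractional digits.
  // 'uuuu' (proleptic year) avoids era handling for year 0001.
  private val formatter = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSSSSSSSS")

  // Explicit nanosecond edge cases listed in the ticket.
  private val explicitValues = Seq(
    "1970-01-01 00:00:00.000000001",
    "1582-10-15 23:59:59.123456789",
    "9999-12-31 23:59:59.999999999")

  // Existing micro corpus dates (0001, epoch, 9999 max micros).
  private val baseValues = Seq(
    LocalDateTime.of(1, 1, 1, 0, 0, 0),
    LocalDateTime.of(1970, 1, 1, 0, 0, 0),
    LocalDateTime.of(9999, 12, 31, 23, 59, 59, 999999000))

  // Each base date combined with nanosWithinMicro in {0, 1, 999}; duplicates of
  // the explicit values are dropped.
  val specialNanosTs: Seq[String] = (explicitValues ++ (for {
    base <- baseValues
    n <- Seq(0, 1, 999)
  } yield base.plusNanos(n).format(formatter))).distinct

  def parse(s: String): LocalDateTime = LocalDateTime.parse(s, formatter)
}
```
> Keeping the corpus as strings lets cast suites use the entries as literals directly, while 
> {{parse}} gives generators and Parquet fixtures the {{java.time}} form.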
> h3. 3. Extend {{RandomDataGenerator.forType}}
> Add cases for {{TimestampNTZNanosType(_)}} and {{TimestampLTZNanosType(_)}}:
> * *Uniform random:* {{uniformMicrosRand}} for {{epochMicros}} + 
> {{rand.nextInt(1000)}} for {{nanosWithinMicro}} (always normalized)
> * *External representation:* return {{LocalDateTime}} (NTZ) / {{Instant}} 
> (LTZ) — same convention as micro timestamp generators; not raw 
> {{TimestampNTZNanos}} pairs in caller code
> * *Special values:* mix in {{specialNanosTs}} corpus entries
> * *{{validJulianDatetime}}:* reuse existing flag and Proleptic-Gregorian 
> shift logic from micro generator
> * *Nullable:* honor {{nullable}} parameter (null fraction)
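> A standalone sketch of the generator logic, assuming hypothetical names 
> ({{NanosRandomGeneratorSketch}}, {{ntzGenerator}}, {{ltzGenerator}}) rather than the actual 
> {{RandomDataGenerator.forType}} wiring. The micros range, the null probability, and the 
> special-value probability are assumptions, and the {{validJulianDatetime}} shift logic is 
> omitted.
```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}
import scala.util.Random

object NanosRandomGeneratorSketch {
  // Assumed range: 0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999 (UTC) in
  // microseconds. The real uniformMicrosRand may use different bounds.
  val MinMicros: Long = LocalDateTime.of(1, 1, 1, 0, 0).toEpochSecond(ZoneOffset.UTC) * 1000000L
  val MaxMicros: Long =
    LocalDateTime.of(9999, 12, 31, 23, 59, 59).toEpochSecond(ZoneOffset.UTC) * 1000000L + 999999L

  // Assumed probabilities; not taken from RandomDataGenerator.
  val NullProbability = 0.1
  val SpecialProbability = 0.1

  // Rebuilds a wall-clock value from (epochMicros, nanosWithinMicro). floorDiv and
  // floorMod keep the nano-of-second non-negative for pre-epoch values.
  private def toLocalDateTime(epochMicros: Long, nanosWithinMicro: Int): LocalDateTime = {
    val seconds = Math.floorDiv(epochMicros, 1000000L)
    val microOfSecond = Math.floorMod(epochMicros, 1000000L)
    LocalDateTime.ofEpochSecond(
      seconds, (microOfSecond * 1000L + nanosWithinMicro).toInt, ZoneOffset.UTC)
  }

  // NTZ generator: nulls (if nullable), corpus entries, or uniform micros plus a
  // uniform nanosWithinMicro in [0, 999], so every generated value is normalized.
  def ntzGenerator(
      rand: Random,
      nullable: Boolean,
      special: Seq[LocalDateTime]): () => LocalDateTime = () => {
    if (nullable && rand.nextDouble() < NullProbability) null
    else if (special.nonEmpty && rand.nextDouble() < SpecialProbability) {
      special(rand.nextInt(special.length))
    } else {
      toLocalDateTime(rand.between(MinMicros, MaxMicros + 1), rand.nextInt(1000))
    }
  }

  // LTZ generator: the same values read as UTC instants.
  def ltzGenerator(
      rand: Random,
      nullable: Boolean,
      special: Seq[LocalDateTime]): () => Instant = {
    val ntz = ntzGenerator(rand, nullable, special)
    () => {
      val v = ntz()
      if (v == null) null else v.toInstant(ZoneOffset.UTC)
    }
  }
}
```
> Drawing {{nanosWithinMicro}} separately from the micros keeps the existing micro-precision 
> distribution intact and guarantees normalization without a post-processing step.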
> h3. 4. Unit tests for the infrastructure itself
> New suite (e.g. {{TimestampNanosTestUtilsSuite}}):
> * Fixed builders produce normalized values ({{nanosWithinMicro}} in \[0, 
> 999\])
> * {{RandomDataGenerator.forType(TimestampNTZNanosType(9))}} returns non-null 
> {{LocalDateTime}} with varying nano-of-second
> * Seeded roundtrip smoke test: {{Random(42)}}, e.g. 1000 iterations — 
> generate → convert to composite → convert back → {{equals}} on {{java.time}} 
> (uses SPARK-57033 helpers)
> * {{specialNanosTs}} entries are parseable / convertible without exception
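> The seeded roundtrip smoke test could look roughly like this. It is a self-contained sketch: 
> {{Composite}}, {{toComposite}} and {{fromComposite}} are hypothetical stand-ins for 
> {{TimestampNTZNanos}} and the SPARK-57033 converters, and values are drawn directly from 
> {{Random(42)}} instead of through {{RandomDataGenerator}}.
```scala
import java.time.{LocalDateTime, ZoneOffset}
import scala.util.Random

object NanosRoundTripSketch {
  // Hypothetical composite; the real suite would use TimestampNTZNanos.
  final case class Composite(epochMicros: Long, nanosWithinMicro: Int)

  // Hypothetical stand-ins for the SPARK-57033 converters (NTZ, read at UTC).
  def toComposite(ldt: LocalDateTime): Composite = {
    val nano = ldt.getNano
    Composite(
      Math.addExact(
        Math.multiplyExact(ldt.toEpochSecond(ZoneOffset.UTC), 1000000L), (nano / 1000).toLong),
      nano % 1000)
  }

  def fromComposite(c: Composite): LocalDateTime = {
    val seconds = Math.floorDiv(c.epochMicros, 1000000L)
    val microOfSecond = Math.floorMod(c.epochMicros, 1000000L)
    LocalDateTime.ofEpochSecond(
      seconds, (microOfSecond * 1000L + c.nanosWithinMicro).toInt, ZoneOffset.UTC)
  }

  // Draws `iterations` seeded values over years 0001 to 9999 with a random
  // nano-of-second, round-trips each one, and returns the values that failed.
  def roundTripFailures(seed: Long, iterations: Int): Seq[LocalDateTime] = {
    val rand = new Random(seed)
    val minSec = LocalDateTime.of(1, 1, 1, 0, 0).toEpochSecond(ZoneOffset.UTC)
    val maxSec = LocalDateTime.of(9999, 12, 31, 23, 59, 59).toEpochSecond(ZoneOffset.UTC)
    val values = Seq.fill(iterations) {
      LocalDateTime.ofEpochSecond(
        rand.between(minSec, maxSec + 1), rand.nextInt(1000000000), ZoneOffset.UTC)
    }
    values.filterNot { ldt =>
      val c = toComposite(ldt)
      c.nanosWithinMicro >= 0 && c.nanosWithinMicro <= 999 && fromComposite(c) == ldt
    }
  }
}
```
> Returning the failing values rather than a boolean lets the test report exactly which 
> inputs broke the roundtrip, e.g. {{assert(roundTripFailures(42, 1000).isEmpty)}}.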
> h2. Implementation notes
> * Keep all new code under {{src/test}} — no production dependency from main 
> sources on test utils.
> * Prefer {{LocalDateTime}} / {{Instant}} as the external type in 
> {{RandomDataGenerator}} to match micro timestamp conventions and SPARK-57033 
> converters.
> * Do not change behavior of existing {{RandomDataGenerator}} cases for 
> {{TimestampType}} / {{TimestampNTZType}}.
> * Consider a single shared {{specialNanosTs}} object referenced from 
> {{RandomDataGenerator}} and optionally from {{CastSuiteBase}} in a follow-up 
> (avoid large unrelated refactors in this ticket; exporting from 
> {{TimestampNanosTestUtils}} is sufficient).
> h2. Acceptance criteria
> * {{TimestampNanosTestUtils}} provides fixed builders and {{specialNanosTs}} 
> corpus usable from other test suites.
> * {{RandomDataGenerator.forType(TimestampNTZNanosType(9))}} and 
> {{TimestampLTZNanosType(9)}} return {{Some(generator)}} producing 
> {{LocalDateTime}} / {{Instant}} with nanosecond variation.
> * {{TimestampNanosTestUtilsSuite}} (or equivalent) passes; seeded random 
> roundtrip smoke test passes.
> * Existing {{RandomDataGenerator}} and datetime test suites show no 
> regressions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
