[
https://issues.apache.org/jira/browse/SPARK-57034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Max Gekk resolved SPARK-57034.
------------------------------
Fix Version/s: 4.3.0
Resolution: Fixed
Issue resolved by pull request 56149
[https://github.com/apache/spark/pull/56149]
> Add TimestampNanosTestUtils and RandomDataGenerator support for nanosecond
> timestamps
> -------------------------------------------------------------------------------------
>
> Key: SPARK-57034
> URL: https://issues.apache.org/jira/browse/SPARK-57034
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Tests
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Assignee: Stevo Mitric
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.3.0
>
>
> h2. Summary
> Introduce shared *test infrastructure* for nanosecond-capable timestamps: a
> {{TimestampNanosTestUtils}} helper (parallel to {{DateTimeTestUtils}}) and
> {{RandomDataGenerator}} support for {{TimestampNTZNanosType(p)}} /
> {{TimestampLTZNanosType(p)}} with a fixed edge-case corpus and seeded random
> values at *nanosecond* precision.
> The deliverable is *test code only* — no user-facing API.
> h2. Background
> Spark datetime tests rely on shared helpers today:
> * {{DateTimeTestUtils}} — fixed {{LocalDateTime}} / micro {{Long}} builders,
> time zones, Julian/Gregorian edge handling
> * {{RandomDataGenerator.forType}} — random values per {{DataType}};
> {{TimestampType}} → {{Instant}}, {{TimestampNTZType}} → {{LocalDateTime}} via
> {{uniformMicrosRand}} (micro precision only)
> * {{specialTs}} corpus in {{RandomDataGenerator}} and {{CastSuiteBase}} —
> epoch, 1582 cutover, 0001, 9999 (no sub-micro fractional digits)
> Nanosecond row/unsafe tests
> ([SPARK-56981|https://github.com/apache/spark/pull/56059]) use hand-written
> {{TimestampNTZNanos(epochMicros, nanos)}} literals. Upcoming work (casts,
> coercion, hash, Parquet, benchmarks, expression parity) needs reusable fixed
> values, random generators, and {{java.time}}-based oracles — without
> duplicating boilerplate in every suite.
> Sub-task SPARK-57033 provides {{java.time}} ↔ composite conversion; this
> ticket consumes those helpers in test utilities and generators.
> h2. Scope
> h3. 1. {{TimestampNanosTestUtils}} (new, {{sql/catalyst/src/test/.../util/}})
> Add an object modeled on {{DateTimeTestUtils}}:
> * Readable fixed-value builders, e.g. {{timestampNTZ(year, month, day, hour,
> minute, sec, nanosWithinMicro)}} returning {{TimestampNTZNanos}}; LTZ variant
> with {{ZoneId}} where needed
> * Convenience wrappers producing {{LocalDateTime}} / {{Instant}} for the same
> instants (delegate to SPARK-57033 conversion helpers)
> * Shared constants: default test zone IDs (reuse {{DateTimeTestUtils.UTC}},
> {{PST}}, etc.)
> * Optional {{gridTest}} / precision loop helpers for *p* in \[7, 9\] (mirror
> patterns in existing datetime suites)
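A minimal sketch of what such a fixed-value builder could look like, using only `java.time`. The `NanosTs` case class and the `FixedBuilders` object are hypothetical stand-ins invented for illustration; the real `TimestampNTZNanos` class and the SPARK-57033 conversion helpers may have different names and signatures. The parameter list follows the example in the ticket, so microseconds within the second are assumed to be zero.

```scala
import java.time.{LocalDateTime, ZoneOffset}

// Hypothetical stand-in for Spark's composite value (not the real class).
final case class NanosTs(epochMicros: Long, nanosWithinMicro: Int)

object FixedBuilders {
  private val MicrosPerSecond = 1000000L

  // Builds a normalized composite value from wall-clock fields.
  // The fields are interpreted at UTC, as is usual for NTZ values.
  def timestampNTZ(
      year: Int, month: Int, day: Int,
      hour: Int, minute: Int, sec: Int,
      nanosWithinMicro: Int): NanosTs = {
    require(nanosWithinMicro >= 0 && nanosWithinMicro <= 999,
      s"nanosWithinMicro out of range: $nanosWithinMicro")
    val epochSec = LocalDateTime.of(year, month, day, hour, minute, sec)
      .toEpochSecond(ZoneOffset.UTC)
    NanosTs(Math.multiplyExact(epochSec, MicrosPerSecond), nanosWithinMicro)
  }

  // Convenience wrapper: the same instant as a LocalDateTime.
  def toLocalDateTime(ts: NanosTs): LocalDateTime = {
    val sec = Math.floorDiv(ts.epochMicros, MicrosPerSecond)
    val microOfSec = Math.floorMod(ts.epochMicros, MicrosPerSecond)
    val nanoOfSec = (microOfSec * 1000L + ts.nanosWithinMicro).toInt
    LocalDateTime.ofEpochSecond(sec, nanoOfSec, ZoneOffset.UTC)
  }
}
```

For example, `FixedBuilders.timestampNTZ(1970, 1, 1, 0, 0, 0, 1)` yields `NanosTs(0, 1)` under these assumptions.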
> h3. 2. Edge-case corpus ({{specialNanosTs}})
> Define a shared sequence of nanosecond timestamp strings and/or {{java.time}}
> values, extending the micro {{specialTs}} set with 7–9 fractional digits,
> e.g.:
> * {{1970-01-01 00:00:00.000000001}} ({{nanosWithinMicro = 1}})
> * {{1582-10-15 23:59:59.123456789}}
> * {{9999-12-31 23:59:59.999999999}} ({{nanosWithinMicro = 999}})
> * Existing corpus dates (0001, epoch, 9999) with {{nanosWithinMicro}} in {0,
> 1, 999}
> Expose from {{TimestampNanosTestUtils}} for reuse in {{CastSuiteBase}},
> Parquet fixtures, and benchmarks.
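One possible shape for the corpus, sketched as a sequence of strings plus a parser. The object name `SpecialNanosCorpus`, the formatter pattern, and the `toComposite` helper are assumptions made for this example; only the three timestamp strings come from the ticket.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical holder for the corpus; names are illustrative only.
object SpecialNanosCorpus {
  // Exactly nine fractional digits, matching the strings below.
  private val fmt = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSSSSSSSS")

  val specialNanosTs: Seq[String] = Seq(
    "1970-01-01 00:00:00.000000001",
    "1582-10-15 23:59:59.123456789",
    "9999-12-31 23:59:59.999999999")

  def parse(s: String): LocalDateTime = LocalDateTime.parse(s, fmt)

  // Splits a LocalDateTime (read at UTC) into (epochMicros, nanosWithinMicro).
  // getNano is never negative, so the split is floor-based even before 1970.
  def toComposite(ldt: LocalDateTime): (Long, Int) = {
    val sec = ldt.toEpochSecond(ZoneOffset.UTC)
    val epochMicros = Math.addExact(
      Math.multiplyExact(sec, 1000000L), (ldt.getNano / 1000).toLong)
    (epochMicros, ldt.getNano % 1000)
  }
}
```

Mapping each entry through `parse` and `toComposite` gives `nanosWithinMicro` values of 1, 789, and 999 respectively.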
> h3. 3. Extend {{RandomDataGenerator.forType}}
> Add cases for {{TimestampNTZNanosType(_)}} and {{TimestampLTZNanosType(_)}}:
> * *Uniform random:* {{uniformMicrosRand}} for {{epochMicros}} +
> {{rand.nextInt(1000)}} for {{nanosWithinMicro}} (always normalized)
> * *External representation:* return {{LocalDateTime}} (NTZ) / {{Instant}}
> (LTZ) — same convention as micro timestamp generators; not raw
> {{TimestampNTZNanos}} pairs in caller code
> * *Special values:* mix in {{specialNanosTs}} corpus entries
> * *{{validJulianDatetime}}:* reuse the existing flag and proleptic Gregorian
> shift logic from the micro generator
> * *Nullable:* honor {{nullable}} parameter (null fraction)
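The uniform-random and external-representation bullets above can be sketched as follows. This is not the actual `RandomDataGenerator` code: `NanosRandomSketch`, its bounds, and the modulo-based stand-in for `uniformMicrosRand` are assumptions for illustration, and the special-value mixing, `validJulianDatetime` handling, and null fraction are left out.

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}
import scala.util.Random

// Hypothetical sketch; not the real RandomDataGenerator implementation.
object NanosRandomSketch {
  private val MicrosPerSecond = 1000000L

  // Assumed range: 0001-01-01 00:00:00 .. 9999-12-31 23:59:59.999999 (UTC).
  private val minMicros =
    LocalDateTime.of(1, 1, 1, 0, 0, 0).toEpochSecond(ZoneOffset.UTC) * MicrosPerSecond
  private val maxMicros =
    LocalDateTime.of(9999, 12, 31, 23, 59, 59).toEpochSecond(ZoneOffset.UTC) *
      MicrosPerSecond + 999999L

  // Stand-in for uniformMicrosRand plus rand.nextInt(1000).
  // The modulo introduces a slight bias, which is acceptable for a sketch.
  def randomComposite(rand: Random): (Long, Int) = {
    val span = maxMicros - minMicros + 1
    val epochMicros = minMicros + Math.floorMod(rand.nextLong(), span)
    (epochMicros, rand.nextInt(1000)) // nanosWithinMicro is always in [0, 999]
  }

  // External representation for the NTZ type.
  def toLocalDateTime(epochMicros: Long, nanosWithinMicro: Int): LocalDateTime =
    LocalDateTime.ofEpochSecond(
      Math.floorDiv(epochMicros, MicrosPerSecond),
      (Math.floorMod(epochMicros, MicrosPerSecond) * 1000L + nanosWithinMicro).toInt,
      ZoneOffset.UTC)

  // External representation for the LTZ type.
  def toInstant(epochMicros: Long, nanosWithinMicro: Int): Instant =
    Instant.ofEpochSecond(
      Math.floorDiv(epochMicros, MicrosPerSecond),
      Math.floorMod(epochMicros, MicrosPerSecond) * 1000L + nanosWithinMicro)
}
```

Returning `java.time` values keeps the composite pair an internal detail of the generator, which is the convention the ticket asks for.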
> h3. 4. Unit tests for the infrastructure itself
> New suite (e.g. {{TimestampNanosTestUtilsSuite}}):
> * Fixed builders produce normalized values ({{nanosWithinMicro}} in \[0,
> 999\])
> * {{RandomDataGenerator.forType(TimestampNTZNanosType(9))}} returns non-null
> {{LocalDateTime}} with varying nano-of-second
> * Seeded roundtrip smoke test: {{Random(42)}}, e.g. 1000 iterations —
> generate → convert to composite → convert back → {{equals}} on {{java.time}}
> (uses SPARK-57033 helpers)
> * {{specialNanosTs}} entries are parseable / convertible without exception
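The seeded roundtrip check might look roughly like this. The conversion functions here are local, hypothetical stand-ins for the SPARK-57033 helpers, and the generated range (about 1901 to 2038) is chosen only to keep the sketch short while still covering pre-epoch values.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import scala.util.Random

// Hypothetical smoke test; the conversions stand in for SPARK-57033 helpers.
object RoundtripSmoke {
  private val MicrosPerSecond = 1000000L

  def toComposite(ldt: LocalDateTime): (Long, Int) = {
    val sec = ldt.toEpochSecond(ZoneOffset.UTC)
    (sec * MicrosPerSecond + ldt.getNano / 1000, ldt.getNano % 1000)
  }

  def fromComposite(epochMicros: Long, nanosWithinMicro: Int): LocalDateTime =
    LocalDateTime.ofEpochSecond(
      Math.floorDiv(epochMicros, MicrosPerSecond),
      (Math.floorMod(epochMicros, MicrosPerSecond) * 1000L + nanosWithinMicro).toInt,
      ZoneOffset.UTC)

  // generate -> composite -> back -> equals, with a fixed seed for reproducibility
  def run(seed: Long, iterations: Int): Boolean = {
    val rand = new Random(seed)
    (1 to iterations).forall { _ =>
      val sec = rand.nextInt().toLong        // includes negative (pre-1970) seconds
      val nano = rand.nextInt(1000000000)    // full nanosecond-of-second range
      val ldt = LocalDateTime.ofEpochSecond(sec, nano, ZoneOffset.UTC)
      val (micros, nanosWithinMicro) = toComposite(ldt)
      nanosWithinMicro >= 0 && nanosWithinMicro <= 999 &&
        fromComposite(micros, nanosWithinMicro) == ldt
    }
  }
}
```

Calling `RoundtripSmoke.run(42L, 1000)` mirrors the `Random(42)`, 1000-iteration setup described in the bullet above.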
> h2. Implementation notes
> * Keep all new code under {{src/test}} — no production dependency from main
> sources on test utils.
> * Prefer {{LocalDateTime}} / {{Instant}} as the external type in
> {{RandomDataGenerator}} to match micro timestamp conventions and SPARK-57033
> converters.
> * Do not change behavior of existing {{RandomDataGenerator}} cases for
> {{TimestampType}} / {{TimestampNTZType}}.
> * Consider a single shared {{specialNanosTs}} object referenced from
> {{RandomDataGenerator}} and optionally from {{CastSuiteBase}} in a follow-up
> (avoid large unrelated refactors in this ticket; exporting from
> {{TimestampNanosTestUtils}} is sufficient).
> h2. Acceptance criteria
> * {{TimestampNanosTestUtils}} provides fixed builders and {{specialNanosTs}}
> corpus usable from other test suites.
> * {{RandomDataGenerator.forType(TimestampNTZNanosType(9))}} and
> {{TimestampLTZNanosType(9)}} return {{Some(generator)}} producing
> {{LocalDateTime}} / {{Instant}} with nanosecond variation.
> * {{TimestampNanosTestUtilsSuite}} (or equivalent) passes; seeded random
> roundtrip smoke test passes.
> * Existing {{RandomDataGenerator}} and datetime test suites show no
> regressions.