Max Gekk created SPARK-57034:
--------------------------------
Summary: Add TimestampNanosTestUtils and RandomDataGenerator
support for nanosecond timestamps
Key: SPARK-57034
URL: https://issues.apache.org/jira/browse/SPARK-57034
Project: Spark
Issue Type: Sub-task
Components: SQL, Tests
Affects Versions: 4.2.0
Reporter: Max Gekk
h2. Summary
Introduce shared *test infrastructure* for nanosecond-capable timestamps: a
{{TimestampNanosTestUtils}} helper (parallel to {{DateTimeTestUtils}}) and
{{RandomDataGenerator}} support for {{TimestampNTZNanosType(p)}} /
{{TimestampLTZNanosType(p)}} with a fixed edge-case corpus and seeded random
values at *nanosecond* precision.
The deliverable is *test code only* — no user-facing API.
h2. Background
Spark datetime tests rely on shared helpers today:
* {{DateTimeTestUtils}} — fixed {{LocalDateTime}} / micro {{Long}} builders,
time zones, Julian/Gregorian edge handling
* {{RandomDataGenerator.forType}} — random values per {{DataType}};
{{TimestampType}} → {{Instant}}, {{TimestampNTZType}} → {{LocalDateTime}} via
{{uniformMicrosRand}} (micro precision only)
* {{specialTs}} corpus in {{RandomDataGenerator}} and {{CastSuiteBase}} —
epoch, 1582 cutover, 0001, 9999 (no sub-micro fractional digits)
Nanosecond row/unsafe tests
([SPARK-56981|https://github.com/apache/spark/pull/56059]) use hand-written
{{TimestampNTZNanos(epochMicros, nanos)}} literals. Upcoming work (casts,
coercion, hash, Parquet, benchmarks, expression parity) needs reusable fixed
values, random generators, and {{java.time}}-based oracles, without
duplicating boilerplate in every suite.
Sub-task *1b* provides {{java.time}} ↔ composite conversion; this ticket
consumes those helpers in test utilities and generators.
h2. Scope
h3. 1. {{TimestampNanosTestUtils}} (new, {{sql/catalyst/src/test/.../util/}})
Add an object modeled on {{DateTimeTestUtils}}:
* Readable fixed-value builders, e.g. {{timestampNTZ(year, month, day, hour,
minute, sec, nanosWithinMicro)}} returning {{TimestampNTZNanos}}; LTZ variant
with {{ZoneId}} where needed
* Convenience wrappers producing {{LocalDateTime}} / {{Instant}} for the same
instants (delegate to *1b* conversion helpers)
* Shared constants: default test zone IDs (reuse {{DateTimeTestUtils.UTC}},
{{PST}}, etc.)
* Optional {{gridTest}} / precision loop helpers for *p* in \[7, 9\] (mirror
patterns in existing datetime suites)
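A minimal sketch of what such a fixed-value builder could look like. The case class below is a hypothetical stand-in for Spark's {{TimestampNTZNanos}} (the real class and its field types may differ), {{NanosBuilderSketch}} and {{toLocalDateTime}} are illustrative names, and the real utilities would delegate the {{java.time}} conversion to the *1b* helpers instead of inlining it. The parameter list follows the one named above, so the builder only sets the sub-microsecond part of the fraction.

```scala
import java.time.{LocalDateTime, ZoneOffset}

// Hypothetical stand-in for Spark's composite value; the real class may differ.
final case class TimestampNTZNanos(epochMicros: Long, nanosWithinMicro: Int)

object NanosBuilderSketch {
  private val MicrosPerSecond = 1000000L

  // Builds a composite value from readable calendar fields plus the
  // 0..999 nanoseconds that do not fit into a microsecond.
  def timestampNTZ(
      year: Int, month: Int, day: Int,
      hour: Int = 0, minute: Int = 0, sec: Int = 0,
      nanosWithinMicro: Int = 0): TimestampNTZNanos = {
    require(nanosWithinMicro >= 0 && nanosWithinMicro <= 999,
      s"nanosWithinMicro out of range: $nanosWithinMicro")
    // NTZ values are treated as UTC wall-clock time when computing the epoch offset.
    val epochSeconds =
      LocalDateTime.of(year, month, day, hour, minute, sec).toEpochSecond(ZoneOffset.UTC)
    TimestampNTZNanos(Math.multiplyExact(epochSeconds, MicrosPerSecond), nanosWithinMicro)
  }

  // Convenience wrapper: the same value as a java.time.LocalDateTime.
  def toLocalDateTime(ts: TimestampNTZNanos): LocalDateTime = {
    // floorDiv/floorMod keep the fraction non-negative for pre-epoch values.
    val seconds = Math.floorDiv(ts.epochMicros, MicrosPerSecond)
    val nanoOfSecond =
      Math.floorMod(ts.epochMicros, MicrosPerSecond) * 1000L + ts.nanosWithinMicro
    LocalDateTime.ofEpochSecond(seconds, nanoOfSecond.toInt, ZoneOffset.UTC)
  }
}
```

With this shape, a suite could write {{timestampNTZ(1970, 1, 1, nanosWithinMicro = 1)}} instead of spelling out the raw micros and nanos pair by hand.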
h3. 2. Edge-case corpus ({{specialNanosTs}})
Define a shared sequence of nanosecond timestamp strings and/or {{java.time}}
values, extending the micro {{specialTs}} set with 7–9 fractional digits, e.g.:
* {{1970-01-01 00:00:00.000000001}} ({{nanosWithinMicro = 1}})
* {{1582-10-15 23:59:59.123456789}}
* {{9999-12-31 23:59:59.999999999}} ({{nanosWithinMicro = 999}})
* Existing corpus dates (0001, epoch, 9999) with {{nanosWithinMicro}} in {0, 1,
999}
Expose from {{TimestampNanosTestUtils}} for reuse in {{CastSuiteBase}}, Parquet
fixtures, and benchmarks.
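One possible shape for the corpus, sketched with plain {{java.time}} only. The object name, the formatter pattern, the {{parse}} and {{split}} helpers, and the exact base dates used for the cross product are assumptions made for illustration; the three explicit strings are the examples listed above.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

object SpecialNanosCorpusSketch {
  // Pattern with exactly nine fractional digits, matching the strings below.
  private val fmt = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSSSSSSSS")

  // Illustrative base instants; the actual micro corpus may use other times of day.
  private val baseDates = Seq(
    "0001-01-01 00:00:00",
    "1970-01-01 00:00:00",
    "9999-12-31 23:59:59")

  // Explicit examples from the ticket plus base dates x nanosWithinMicro in {0, 1, 999}.
  val specialNanosTs: Seq[String] = (Seq(
    "1970-01-01 00:00:00.000000001",
    "1582-10-15 23:59:59.123456789",
    "9999-12-31 23:59:59.999999999") ++
    (for (base <- baseDates; n <- Seq(0, 1, 999)) yield "%s.%09d".format(base, n))).distinct

  def parse(s: String): LocalDateTime = LocalDateTime.parse(s, fmt)

  // Splits a wall-clock value into (epochMicros, nanosWithinMicro), treating it as UTC.
  def split(ldt: LocalDateTime): (Long, Int) = {
    val epochMicros = Math.addExact(
      Math.multiplyExact(ldt.toEpochSecond(ZoneOffset.UTC), 1000000L),
      (ldt.getNano / 1000).toLong)
    (epochMicros, ldt.getNano % 1000)
  }
}
```

Keeping the corpus as strings leaves each consumer free to parse it into the representation it needs, for example cast inputs as text and generator inputs as {{java.time}} values.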
h3. 3. Extend {{RandomDataGenerator.forType}}
Add cases for {{TimestampNTZNanosType(_)}} and {{TimestampLTZNanosType(_)}}:
* *Uniform random:* draw {{epochMicros}} with {{uniformMicrosRand}} and
{{nanosWithinMicro}} with {{rand.nextInt(1000)}}, so the pair is always normalized
* *External representation:* return {{LocalDateTime}} (NTZ) / {{Instant}} (LTZ)
— same convention as micro timestamp generators; not raw {{TimestampNTZNanos}}
pairs in caller code
* *Special values:* mix in {{specialNanosTs}} corpus entries
* *{{validJulianDatetime}}:* reuse the existing flag and proleptic Gregorian
shift logic from the micro generator
* *Nullable:* honor {{nullable}} parameter (null fraction)
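The generator behavior described in the list above can be sketched without Spark on the classpath. Everything here is an assumption made for illustration: {{uniformMicros}} only stands in for the existing {{uniformMicrosRand}}, the year range and the 10% null and special-value fractions are placeholders, nulls are modeled as {{None}}, and the {{validJulianDatetime}} shift is left out.

```scala
import java.time.Instant
import scala.util.Random

object NanosRandomSketch {
  private val MicrosPerSecond = 1000000L
  // Assumed value range for this sketch: years 0001 through 9999 in UTC.
  private val minMicros =
    Instant.parse("0001-01-01T00:00:00Z").getEpochSecond * MicrosPerSecond
  private val maxMicros =
    Instant.parse("9999-12-31T23:59:59Z").getEpochSecond * MicrosPerSecond + 999999L

  // Stand-in for uniformMicrosRand: a roughly uniform Long in [minMicros, maxMicros].
  private def uniformMicros(rand: Random): Long =
    math.min(maxMicros, minMicros + (rand.nextDouble() * (maxMicros - minMicros)).toLong)

  // Draws the two components independently, so the pair is normalized by construction.
  def randomInstant(rand: Random): Instant = {
    val epochMicros = uniformMicros(rand)
    val nanosWithinMicro = rand.nextInt(1000) // 0..999
    val seconds = Math.floorDiv(epochMicros, MicrosPerSecond)
    val nanoOfSecond =
      Math.floorMod(epochMicros, MicrosPerSecond) * 1000L + nanosWithinMicro
    Instant.ofEpochSecond(seconds, nanoOfSecond)
  }

  // Mixes nulls (None), special corpus values, and uniform random values.
  // The 0.1 thresholds are placeholder fractions, not Spark's actual settings.
  def generator(rand: Random, special: Seq[Instant], nullable: Boolean): () => Option[Instant] =
    () => {
      val roll = rand.nextDouble()
      if (nullable && roll < 0.1) None
      else if (special.nonEmpty && roll < 0.2) Some(special(rand.nextInt(special.length)))
      else Some(randomInstant(rand))
    }
}
```

The NTZ case would follow the same pattern and hand back a {{LocalDateTime}}, for example via {{LocalDateTime.ofInstant(instant, ZoneOffset.UTC)}}.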
h3. 4. Unit tests for the infrastructure itself
New suite (e.g. {{TimestampNanosTestUtilsSuite}}):
* Fixed builders produce normalized values ({{nanosWithinMicro}} in \[0, 999\])
* {{RandomDataGenerator.forType(TimestampNTZNanosType(9))}} returns non-null
{{LocalDateTime}} with varying nano-of-second
* Seeded roundtrip smoke test: with {{Random(42)}} and e.g. 1000 iterations,
generate a value, convert it to the composite form, convert it back, and check
{{equals}} on the {{java.time}} value (uses *1b* helpers)
* {{specialNanosTs}} entries are parseable / convertible without exception
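The seeded roundtrip check could look roughly like the following. It is a sketch under stated assumptions: {{toComposite}} and {{fromComposite}} are hypothetical stand-ins for the *1b* conversion helpers, the composite is modeled as a plain {{(Long, Int)}} pair, and the random source is a simplified local generator instead of {{RandomDataGenerator}}.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import scala.util.Random

object RoundtripSmokeSketch {
  private val MicrosPerSecond = 1000000L

  // Hypothetical stand-in for the 1b helper: LocalDateTime -> (epochMicros, nanosWithinMicro).
  def toComposite(ldt: LocalDateTime): (Long, Int) = {
    val epochMicros = Math.addExact(
      Math.multiplyExact(ldt.toEpochSecond(ZoneOffset.UTC), MicrosPerSecond),
      (ldt.getNano / 1000).toLong)
    (epochMicros, ldt.getNano % 1000)
  }

  // Hypothetical stand-in for the inverse 1b helper.
  def fromComposite(epochMicros: Long, nanosWithinMicro: Int): LocalDateTime = {
    val seconds = Math.floorDiv(epochMicros, MicrosPerSecond)
    val nanoOfSecond =
      Math.floorMod(epochMicros, MicrosPerSecond) * 1000L + nanosWithinMicro
    LocalDateTime.ofEpochSecond(seconds, nanoOfSecond.toInt, ZoneOffset.UTC)
  }

  // Assumed range for the sketch: years 0001 through 9999.
  private val minSecond = LocalDateTime.of(1, 1, 1, 0, 0, 0).toEpochSecond(ZoneOffset.UTC)
  private val maxSecond =
    LocalDateTime.of(9999, 12, 31, 23, 59, 59).toEpochSecond(ZoneOffset.UTC)

  def randomLocalDateTime(rand: Random): LocalDateTime = {
    val second = minSecond + (rand.nextDouble() * (maxSecond - minSecond)).toLong
    LocalDateTime.ofEpochSecond(second, rand.nextInt(1000000000), ZoneOffset.UTC)
  }

  // Generates, converts to the composite, converts back, and counts values that differ.
  def countMismatches(seed: Long, iterations: Int): Int = {
    val rand = new Random(seed)
    (1 to iterations).count { _ =>
      val original = randomLocalDateTime(rand)
      val (epochMicros, nanosWithinMicro) = toComposite(original)
      fromComposite(epochMicros, nanosWithinMicro) != original
    }
  }
}
```

A suite would then assert that {{countMismatches(42L, 1000)}} is zero. Fixing the seed keeps any failure reproducible.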
h2. Out of scope
* Production conversion or parsing logic (sub-tasks *1a*, *1b*, *1c*, *1d*)
* {{CatalystTypeConverters}} / Dataset encoder wiring (*1b*)
* Cast, hash, ordering, Parquet implementation tests (those suites *consume*
this infra in later tickets)
* SQL golden files ({{SQLQueryTestSuite}})
* Benchmark classes (may reuse {{RandomDataGenerator}} / {{specialNanosTs}}
after this lands)
* ScalaCheck / property-based framework introduction
h2. Implementation notes
* Keep all new code under {{src/test}} — no production dependency from main
sources on test utils.
* Prefer {{LocalDateTime}} / {{Instant}} as the external type in
{{RandomDataGenerator}} to match micro timestamp conventions and *1b*
converters.
* Do not change behavior of existing {{RandomDataGenerator}} cases for
{{TimestampType}} / {{TimestampNTZType}}.
* Consider a single shared {{specialNanosTs}} object referenced from
{{RandomDataGenerator}} and optionally from {{CastSuiteBase}} in a follow-up
(avoid large unrelated refactors in this ticket; exporting from
{{TimestampNanosTestUtils}} is sufficient).
h2. Acceptance criteria
* {{TimestampNanosTestUtils}} provides fixed builders and {{specialNanosTs}}
corpus usable from other test suites.
* {{RandomDataGenerator.forType(TimestampNTZNanosType(9))}} and
{{TimestampLTZNanosType(9)}} return {{Some(generator)}} producing
{{LocalDateTime}} / {{Instant}} with nanosecond variation.
* {{TimestampNanosTestUtilsSuite}} (or equivalent) passes; seeded random
roundtrip smoke test passes.
* Existing {{RandomDataGenerator}} and datetime test suites show no regressions.