[PR] [WIP][SPARK-56981][SQL] Add physical representation and UnsafeRow support for nanosecond timestamps [spark]

via GitHub Fri, 22 May 2026 01:14:58 -0700


MaxGekk opened a new pull request, #56059:
URL: https://github.com/apache/spark/pull/56059


   ### What changes were proposed in this pull request?
   This PR implements the **physical row layer** for nanosecond-capable 
timestamp types, as defined in [SPARK-56822 SPIP: Timestamps with nanosecond 
precision](https://issues.apache.org/jira/browse/SPARK-56822).
   Logical types `TimestampNTZNanosType(p)` and `TimestampLTZNanosType(p)` (p 
in [7, 9]) were added in #55952, but they still map to 
`UninitializedPhysicalType`, so values cannot be stored in or read from 
`InternalRow` / `UnsafeRow`. This change wires up the minimum physical 
infrastructure that downstream work (casts, Parquet, expressions) can depend on.
   
   #### SPIP internal representation
   Per the SPIP, a value is a composite of:
   
   - **Epoch microseconds** (long, 8 bytes) — same proleptic-Gregorian epoch 
microsecond count as existing microsecond timestamps
   - **Nanoseconds within that microsecond** (short, 0-999) — sub-micro 
fractional part, not a full nanosecond offset from epoch
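
The composite above can be illustrated with a minimal Java sketch. The class name `NanosTimestampSketch` and its methods are hypothetical and are not the `TimestampNTZNanos` / `TimestampLTZNanos` classes added by this PR; the sketch only assumes the two fields and the [0, 999] range stated here. Floor division keeps `nanosWithinMicro` non-negative for instants before the epoch.

```java
/** Hypothetical illustration of the (epochMicros, nanosWithinMicro) composite. */
final class NanosTimestampSketch {
    final long epochMicros;        // microseconds since the epoch
    final short nanosWithinMicro;  // sub-microsecond part, always in [0, 999]

    NanosTimestampSketch(long epochMicros, short nanosWithinMicro) {
        if (nanosWithinMicro < 0 || nanosWithinMicro > 999) {
            throw new IllegalArgumentException(
                "nanosWithinMicro must be in [0, 999]: " + nanosWithinMicro);
        }
        this.epochMicros = epochMicros;
        this.nanosWithinMicro = nanosWithinMicro;
    }

    /** Splits a nanosecond count since the epoch using floor semantics. */
    static NanosTimestampSketch fromEpochNanos(long epochNanos) {
        long micros = Math.floorDiv(epochNanos, 1000L);
        short nanos = (short) Math.floorMod(epochNanos, 1000L);
        return new NanosTimestampSketch(micros, nanos);
    }

    /** Recombines the parts; throws ArithmeticException if the result overflows a long. */
    long toEpochNanos() {
        return Math.addExact(Math.multiplyExact(epochMicros, 1000L), (long) nanosWithinMicro);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof NanosTimestampSketch)) {
            return false;
        }
        NanosTimestampSketch that = (NanosTimestampSketch) other;
        return epochMicros == that.epochMicros && nanosWithinMicro == that.nanosWithinMicro;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(epochMicros) + nanosWithinMicro;
    }
}
```

For example, `fromEpochNanos(-1L)` yields `epochMicros = -1` and `nanosWithinMicro = 999`, so the second field never needs a sign.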
   
   The logical `defaultSize` of the types remains **10 bytes**. In `UnsafeRow`, 
values use the same variable-length pattern as `CalendarInterval`: an 8-byte 
field slot (offset + size) points to a **16-byte** aligned payload (epochMicros + 
nanosWithinMicro with padding), so in-place updates remain possible.
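
As a rough sketch of that layout, the Java snippet below packs an offset and a size into one 8-byte slot (offset in the upper 32 bits, size in the lower 32 bits, the convention `UnsafeRow` uses for variable-length fields) and writes a 16-byte payload into a plain `ByteBuffer`. The class `NanosPayloadSketch`, the use of `ByteBuffer` instead of Spark's `Platform` memory access, and the exact placement of the short inside the second 8-byte word are assumptions made for illustration; the PR only states that the payload holds epochMicros plus nanosWithinMicro with padding.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of a fixed 16-byte payload behind an 8-byte slot. */
final class NanosPayloadSketch {
    static final int PAYLOAD_SIZE = 16;

    /** Packs offset (upper 32 bits) and size (lower 32 bits) into one long. */
    static long packOffsetAndSize(int offset, int size) {
        return ((long) offset << 32) | (size & 0xFFFFFFFFL);
    }

    static int offsetOf(long slot) {
        return (int) (slot >> 32);
    }

    static int sizeOf(long slot) {
        return (int) slot;
    }

    /** Writes the payload at a fixed offset; rewriting it is an in-place update. */
    static void write(ByteBuffer buf, int offset, long epochMicros, short nanosWithinMicro) {
        buf.putLong(offset, epochMicros);           // bytes 0..7: epoch microseconds
        buf.putLong(offset + 8, 0L);                // bytes 8..15: zeroed, including padding
        buf.putShort(offset + 8, nanosWithinMicro); // assumed position of the short
    }

    static long readEpochMicros(ByteBuffer buf, int offset) {
        return buf.getLong(offset);
    }

    static short readNanosWithinMicro(ByteBuffer buf, int offset) {
        return buf.getShort(offset + 8);
    }
}
```

Because the payload size never changes, a later `write` at the same offset overwrites the value without touching the slot, which is the property that makes in-place updates possible.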
   
   #### Implementation summary
   - **Unsafe value types** (distinct classes for NTZ vs LTZ semantics at the 
Java layer; shared byte layout):
     - `TimestampNTZNanos` - physical value for `TIMESTAMP_NTZ(p)`
     - `TimestampLTZNanos` - physical value for `TIMESTAMP_LTZ(p)`
   - **Physical types:** `PhysicalTimestampNTZNanosType`, 
`PhysicalTimestampLTZNanosType` registered in `PhysicalDataType.applyDefault`
   - **Row access:** specialized getters/setters on `InternalRow`, `UnsafeRow`, 
`UnsafeArrayData`, codegen (`CodeGenerator`, `InterpretedUnsafeProjection`, 
`SpecializedGettersReader`), and literal validation
   - **Columnar:** `ColumnVector` stubs throw 
`SparkUnsupportedOperationException` until columnar support is added
   
   ### Why are the changes needed?
   Without a concrete physical type and `UnsafeRow` accessors, any code path 
that materializes rows for nanosecond timestamps fails or falls through to 
`UninitializedPhysicalType`. This change unblocks the remaining sub-tasks.
   
   ### Does this PR introduce _any_ user-facing change?
   No. Logical types exist but are not yet exposed through SQL; behavior of 
`TimestampType`, `TimestampNTZType`, and microsecond storage is unchanged.
   
   ### How was this patch tested?
   - `unsafe/testOnly *TimestampNanosSuite` — unsafe value equality, hashCode, 
validation (`nanosWithinMicro` ∈ [0, 999])
   - `catalyst/testOnly *TimestampNanosRowSuite` — `GenericInternalRow` and 
`UnsafeRow` roundtrips (NTZ + LTZ, null/non-null), codegen and interpreted 
unsafe projection, literal validation
   - `DataTypeSuite` — `PhysicalDataType` is not `UninitializedPhysicalType` 
for p in {7, 8, 9}; `defaultSize` remains 10
   
   ### Was this patch authored or co-authored using generative AI tooling?
   Generated-by: Cursor Auto


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

