[jira] [Updated] (SPARK-57101) Register nanosecond timestamp types in the Types Framework (server-side)

ASF GitHub Bot (Jira) Fri, 29 May 2026 02:37:05 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-57101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated SPARK-57101:
-----------------------------------
    Labels: pull-request-available  (was: )

> Register nanosecond timestamp types in the Types Framework (server-side)
> ------------------------------------------------------------------------
>
>                 Key: SPARK-57101
>                 URL: https://issues.apache.org/jira/browse/SPARK-57101
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Major
>              Labels: pull-request-available
>
> h3. Summary
> Register TimestampNTZNanosType(p) and TimestampLTZNanosType(p) (p in [7, 9]) 
> in the Spark SQL Types Framework (SPARK-53504) for server-side (catalyst) 
> operations. Logical types and the physical row layer already exist 
> (SPARK-56876, SPARK-56981); today these types are wired only through legacy 
> dispatch in PhysicalDataType, Literal, InternalRow, and codegen. This issue 
> centralizes that wiring behind TypeOps when spark.sql.types.framework.enabled 
> is true.
> This issue covers physical representation, literals, row accessors, and 
> codegen class selection only. java.time conversion, Dataset encoders, Connect 
> proto, Arrow, and cast formatting are out of scope and will be handled in 
> follow-up issues after SPARK-57033 and related work land.
> h3. Background
> * Parent SPIP: SPARK-56822 (Timestamps with nanosecond precision)
> * Types Framework: SPARK-53504; reference implementation is TimeTypeOps / 
> TimeTypeApiOps
> * Merged foundation:
> ** SPARK-56876 — logical types TimestampNTZNanosType / TimestampLTZNanosType
> ** SPARK-56981 — physical value TimestampNanosVal, 
> PhysicalTimestampNTZNanosType / PhysicalTimestampLTZNanosType, InternalRow 
> and UnsafeRow accessors (PR #56059)
> * Internal representation: epochMicros (long) + nanosWithinMicro (short, 
> 0–999), stored as TimestampNanosVal in rows
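The two-field layout described above can be illustrated with a minimal, self-contained sketch. `NanosParts` and its methods are hypothetical stand-ins, not the real `TimestampNanosVal` API. The sketch assumes floor semantics so that the sub-microsecond part stays in 0 to 999 for instants before the epoch.

```scala
// Hypothetical stand-in for the physical value; NOT the real TimestampNanosVal API.
final case class NanosParts(epochMicros: Long, nanosWithinMicro: Short) {
  require(nanosWithinMicro >= 0 && nanosWithinMicro <= 999,
    s"nanosWithinMicro must be in [0, 999], got $nanosWithinMicro")

  // Recombine both fields into nanoseconds since the epoch; throws on Long overflow.
  def toEpochNanos: Long =
    Math.addExact(Math.multiplyExact(epochMicros, NanosParts.NanosPerMicro), nanosWithinMicro.toLong)
}

object NanosParts {
  val NanosPerMicro: Long = 1000L

  // Assumption: floor division, so the remainder is never negative.
  def fromEpochNanos(epochNanos: Long): NanosParts =
    NanosParts(
      Math.floorDiv(epochNanos, NanosPerMicro),
      Math.floorMod(epochNanos, NanosPerMicro).toShort)
}
```

With floor semantics, one nanosecond before the epoch splits into `epochMicros = -1` and `nanosWithinMicro = 999`, and recombining the parts returns the original value.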
> h3. What to do
> *Add TypeOps implementations (sql/catalyst)*
> * Create TimestampNTZNanosTypeOps and TimestampLTZNanosTypeOps (shared base 
> for common logic), following the TimeTypeOps pattern.
> * Register both in TypeOps.apply() — single registration point alongside 
> TimeType.
> *Implement TypeOps methods using existing SPARK-56981 behavior:*
> || Method || Behavior ||
> | getPhysicalType | PhysicalTimestampNTZNanosType or 
> PhysicalTimestampLTZNanosType |
> | getJavaClass | classOf[TimestampNanosVal] |
> | getRowWriter | setTimestampNTZNanos / setTimestampLTZNanos on InternalRow |
> | getDefaultLiteral | Literal.create(TimestampNanosVal.ZERO, type) |
> | getJavaLiteral | Java literal for codegen (e.g. TimestampNanosVal.ZERO or 
> fromParts) |
> | getMutableValue | Mutable holder for TimestampNanosVal in 
> SpecificInternalRow (new MutableTimestampNanos or equivalent; avoid 
> unnecessary MutableAny fallback) |
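For the getMutableValue row in the table above, a dedicated holder might look roughly like the sketch below. Every name here is a hypothetical stand-in: `StubNanosVal` replaces the real physical value and `MutableValue` only imitates the shape of a mutable column holder. The point is that a typed field avoids a cast on every read, which a generic `Any`-based holder would need.

```scala
// Hypothetical stand-ins; these are NOT the real Spark classes.
final case class StubNanosVal(epochMicros: Long, nanosWithinMicro: Short)

// Minimal shape of a mutable column holder: a null flag plus a value slot.
abstract class MutableValue {
  var isNull: Boolean = true
  def boxed: Any
  def update(v: Any): Unit
  def copy(): MutableValue
}

// Typed holder: `value` is statically a StubNanosVal, so reads need no cast.
final class MutableTimestampNanos extends MutableValue {
  var value: StubNanosVal = StubNanosVal(0L, 0.toShort)

  def boxed: Any = if (isNull) null else value

  def update(v: Any): Unit = {
    isNull = false
    value = v.asInstanceOf[StubNanosVal]
  }

  // Copies the null flag and the (immutable) value into a fresh holder.
  def copy(): MutableTimestampNanos = {
    val c = new MutableTimestampNanos
    c.isNull = isNull
    c.value = value
    c
  }
}
```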
> *Add minimal TypeApiOps stubs (sql/api)*
> * Create TimestampNTZNanosTypeApiOps and TimestampLTZNanosTypeApiOps 
> registered in TypeApiOps.apply().
> * TimestampNTZNanosTypeOps / TimestampLTZNanosTypeOps extend the 
> corresponding ApiOps class and TypeOps (same pattern as TimeTypeOps extends 
> TimeTypeApiOps).
> * format / formatUTF8 / toSQLValue: interim implementation acceptable (e.g. 
> epoch-micros-based display or TimestampNanosVal.toString) until dedicated 
> fractional-seconds-precision (FSP) formatters exist in a follow-up issue.
> * getEncoder: not in scope for this issue.
> *Integration points (automatic when TypeOps returns Some)*
> These call sites already delegate to TypeOps(dt).map(...).getOrElse(legacy); 
> no per-call-site edits should be required beyond registration:
> * PhysicalDataType.apply
> * Literal.default
> * InternalRow.getWriter
> * CodeGenerator / EncoderUtils Java class for codegen
> * SpecificInternalRow mutable column values
> *Feature flag*
> * All registration is gated by spark.sql.types.framework.enabled (same as 
> TimeType).
> * When the flag is false, behavior must remain identical to current legacy 
> paths.
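The flag-gated registration and the `TypeOps(dt).map(...).getOrElse(legacy)` delegation described above can be sketched in a self-contained form. All types below are simplified, hypothetical stand-ins that only borrow the names from this issue. `frameworkEnabled` stands in for spark.sql.types.framework.enabled, and physical types are modeled as plain strings.

```scala
// Hypothetical stand-ins; these are NOT the real Spark classes.
sealed trait DataType
case object IntType extends DataType
final case class TimestampNTZNanosType(precision: Int) extends DataType {
  require(precision >= 7 && precision <= 9, s"precision must be in [7, 9], got $precision")
}
final case class TimestampLTZNanosType(precision: Int) extends DataType {
  require(precision >= 7 && precision <= 9, s"precision must be in [7, 9], got $precision")
}

trait TypeOps { def physicalTypeName: String }

object TypeOps {
  // Stand-in for the spark.sql.types.framework.enabled flag.
  @volatile var frameworkEnabled: Boolean = false

  private object NtzOps extends TypeOps { val physicalTypeName = "PhysicalTimestampNTZNanosType" }
  private object LtzOps extends TypeOps { val physicalTypeName = "PhysicalTimestampLTZNanosType" }

  // Single registration point: returns None when the flag is off or the type is unregistered.
  def apply(dt: DataType): Option[TypeOps] =
    if (!frameworkEnabled) None
    else dt match {
      case _: TimestampNTZNanosType => Some(NtzOps)
      case _: TimestampLTZNanosType => Some(LtzOps)
      case _ => None
    }
}

object PhysicalTypeLookup {
  // Legacy dispatch, kept unchanged as the fallback path.
  def legacy(dt: DataType): String = dt match {
    case _: TimestampNTZNanosType => "PhysicalTimestampNTZNanosType"
    case _: TimestampLTZNanosType => "PhysicalTimestampLTZNanosType"
    case IntType => "PhysicalIntegerType"
  }

  // Call sites delegate to TypeOps first and fall back to legacy otherwise.
  def apply(dt: DataType): String =
    TypeOps(dt).map(_.physicalTypeName).getOrElse(legacy(dt))
}
```

In this sketch the call site never inspects the flag itself. Turning the flag off makes `TypeOps.apply` return `None`, so every lookup takes the legacy branch, which is one way to keep flag-off behavior identical to the current paths.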
> h3. Tests
> * With spark.sql.types.framework.enabled=true:
> ** PhysicalDataType(TimestampNTZNanosType(9)) and LTZ variant return the 
> correct physical types (not UninitializedPhysicalType).
> ** Literal.default matches TimestampNanosVal.ZERO for both nanos types.
> ** InternalRow.getWriter roundtrip: set and read via accessor for NTZ and LTZ.
> ** SpecificInternalRow update/read for nanos columns.
> * With the flag false: regression tests confirm no behavior change relative 
> to the legacy paths on master.
> * Framework-on vs framework-off equivalence tests for the operations above.
> h3. Acceptance criteria
> * TypeOps(TimestampNTZNanosType(p)) and TypeOps(TimestampLTZNanosType(p)) 
> return non-empty ops when spark.sql.types.framework.enabled=true, for p in 
> {7, 8, 9}.
> * Listed integration points use TypeOps implementations and match legacy 
> behavior.
> * spark.sql.types.framework.enabled=false preserves current behavior.
> * No change to UnsafeRow layout, TimestampNanosRowValues, or microsecond 
> TimestampType / TimestampNTZType behavior.
> h3. Out of scope
> * CatalystTypeConverters and java.time roundtrip (SPARK-57033)
> * SerializerBuildHelper / DeserializerBuildHelper and RowEncoder encoders
> * ConnectTypeOps and Connect proto literals
> * Arrow type mapping and ArrowFieldWriter
> * PySpark conversion (EvaluatePython)
> * Cast matrix, Parquet read/write, ColumnVector / vectorized Parquet
> * Physical ordering, compare, and hash for nanos types
> * Removing legacy branches from PhysicalDataType.applyDefault (optional 
> cleanup in a later issue)
> h3. Depends on
> * SPARK-56981 (physical row layer and TimestampNanosVal)
> h3. References
> * SPARK-56822 — parent SPIP
> * SPARK-53504 — Types Framework
> * Precedent: org.apache.spark.sql.catalyst.types.ops.TimeTypeOps
> * Physical value: org.apache.spark.unsafe.types.TimestampNanosVal



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

