[ https://issues.apache.org/jira/browse/SPARK-57101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-57101:
-----------------------------------
Labels: pull-request-available (was: )
> Register nanosecond timestamp types in the Types Framework (server-side)
> ------------------------------------------------------------------------
>
> Key: SPARK-57101
> URL: https://issues.apache.org/jira/browse/SPARK-57101
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Priority: Major
> Labels: pull-request-available
>
> h3. Summary
> Register TimestampNTZNanosType(p) and TimestampLTZNanosType(p), for p in
> {7, 8, 9}, in the Spark SQL Types Framework (SPARK-53504) for server-side (catalyst)
> operations. Logical types and the physical row layer already exist
> (SPARK-56876, SPARK-56981); today these types are wired only through legacy
> dispatch in PhysicalDataType, Literal, InternalRow, and codegen. This issue
> centralizes that wiring behind TypeOps when spark.sql.types.framework.enabled
> is true.
> This issue covers physical representation, literals, row accessors, and
> codegen class selection only. java.time conversion, Dataset encoders, Connect
> proto, Arrow, and cast formatting are out of scope and will be handled in
> follow-up issues after SPARK-57033 and related work land.
> h3. Background
> * Parent SPIP: SPARK-56822 (Timestamps with nanosecond precision)
> * Types Framework: SPARK-53504; reference implementation is TimeTypeOps /
> TimeTypeApiOps
> * Merged foundation:
> ** SPARK-56876 — logical types TimestampNTZNanosType / TimestampLTZNanosType
> ** SPARK-56981 — physical value TimestampNanosVal,
> PhysicalTimestampNTZNanosType / PhysicalTimestampLTZNanosType, InternalRow
> and UnsafeRow accessors (PR #56059)
> * Internal representation: epochMicros (long) + nanosWithinMicro (short,
> 0–999), stored as TimestampNanosVal in rows
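To make the two-field layout concrete, here is a minimal sketch in Java. The class NanosTimestampSketch and its methods fromEpochNanos / toEpochNanos are hypothetical names invented for illustration; they are not the API of the real TimestampNanosVal. The sketch assumes that instants before the epoch are split with floor division, so that nanosWithinMicro stays within 0 to 999 as the issue requires.

```java
// Hypothetical stand-in for TimestampNanosVal; all names here are illustrative only.
final class NanosTimestampSketch {
    final long epochMicros;        // whole microseconds since the epoch
    final short nanosWithinMicro;  // remaining nanoseconds, always in [0, 999]

    NanosTimestampSketch(long epochMicros, short nanosWithinMicro) {
        if (nanosWithinMicro < 0 || nanosWithinMicro > 999) {
            throw new IllegalArgumentException(
                "nanosWithinMicro out of range: " + nanosWithinMicro);
        }
        this.epochMicros = epochMicros;
        this.nanosWithinMicro = nanosWithinMicro;
    }

    // Splits a nanosecond count into the two fields. floorDiv/floorMod keep the
    // remainder non-negative for instants before the epoch (an assumption of this sketch).
    static NanosTimestampSketch fromEpochNanos(long epochNanos) {
        long micros = Math.floorDiv(epochNanos, 1000L);
        short nanos = (short) Math.floorMod(epochNanos, 1000L);
        return new NanosTimestampSketch(micros, nanos);
    }

    // Recombines the two fields; throws ArithmeticException on long overflow.
    long toEpochNanos() {
        return Math.addExact(Math.multiplyExact(epochMicros, 1000L), (long) nanosWithinMicro);
    }
}
```

Under these assumptions, an input of -1 ns splits into epochMicros = -1 and nanosWithinMicro = 999, and recombining gives -1 ns back.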
> h3. What to do
> *Add TypeOps implementations (sql/catalyst)*
> * Create TimestampNTZNanosTypeOps and TimestampLTZNanosTypeOps (shared base
> for common logic), following the TimeTypeOps pattern.
> * Register both in TypeOps.apply() — single registration point alongside
> TimeType.
> *Implement TypeOps methods using the existing SPARK-56981 behavior:*
> || Method || Behavior ||
> | getPhysicalType | PhysicalTimestampNTZNanosType or PhysicalTimestampLTZNanosType |
> | getJavaClass | classOf[TimestampNanosVal] |
> | getRowWriter | setTimestampNTZNanos / setTimestampLTZNanos on InternalRow |
> | getDefaultLiteral | Literal.create(TimestampNanosVal.ZERO, type) |
> | getJavaLiteral | Java literal for codegen (e.g. TimestampNanosVal.ZERO or fromParts) |
> | getMutableValue | Mutable holder for TimestampNanosVal in SpecificInternalRow (new MutableTimestampNanos or equivalent; avoid unnecessary MutableAny fallback) |
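The "shared base for common logic" idea can be sketched roughly as follows. This is a simplified Java model, not the real Scala TypeOps trait: TypeOpsSketch, NanosValSketch, NanosTypeOpsBase, NtzNanosOpsSketch, and LtzNanosOpsSketch are hypothetical names, physical types are represented as plain strings, and only three of the methods from the table are modeled. The precision check in the base class is an assumption drawn from the p in {7, 8, 9} range stated in the issue.

```java
// Hypothetical, simplified model of the ops contract; the real TypeOps trait
// lives in sql/catalyst and its signatures are not reproduced here.
interface TypeOpsSketch {
    String physicalTypeName();     // stands in for getPhysicalType
    Class<?> javaClass();          // stands in for getJavaClass
    Object defaultLiteralValue();  // stands in for getDefaultLiteral
}

// Placeholder for TimestampNanosVal with a shared ZERO instance.
final class NanosValSketch {
    static final NanosValSketch ZERO = new NanosValSketch(0L, (short) 0);
    final long epochMicros;
    final short nanosWithinMicro;

    NanosValSketch(long epochMicros, short nanosWithinMicro) {
        this.epochMicros = epochMicros;
        this.nanosWithinMicro = nanosWithinMicro;
    }
}

// Logic common to the NTZ and LTZ variants: same value class, same default value.
abstract class NanosTypeOpsBase implements TypeOpsSketch {
    final int precision;

    NanosTypeOpsBase(int precision) {
        if (precision < 7 || precision > 9) {
            throw new IllegalArgumentException("precision must be 7, 8, or 9: " + precision);
        }
        this.precision = precision;
    }

    @Override public Class<?> javaClass() { return NanosValSketch.class; }
    @Override public Object defaultLiteralValue() { return NanosValSketch.ZERO; }
}

// The variants differ only in which physical type they report.
final class NtzNanosOpsSketch extends NanosTypeOpsBase {
    NtzNanosOpsSketch(int precision) { super(precision); }
    @Override public String physicalTypeName() { return "PhysicalTimestampNTZNanosType"; }
}

final class LtzNanosOpsSketch extends NanosTypeOpsBase {
    LtzNanosOpsSketch(int precision) { super(precision); }
    @Override public String physicalTypeName() { return "PhysicalTimestampLTZNanosType"; }
}
```

Keeping the value class and default literal in the base leaves each concrete class with a single override, which is one plausible reading of the TimeTypeOps pattern the issue refers to.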
> *Add minimal TypeApiOps stubs (sql/api)*
> * Create TimestampNTZNanosTypeApiOps and TimestampLTZNanosTypeApiOps
> registered in TypeApiOps.apply().
> * TimestampNTZNanosTypeOps / TimestampLTZNanosTypeOps extend the
> corresponding ApiOps class and TypeOps (same pattern as TimeTypeOps extends
> TimeTypeApiOps).
> * format / formatUTF8 / toSQLValue: an interim implementation is acceptable
> (e.g. epoch-micros-based display or TimestampNanosVal.toString) until dedicated
> fractional-seconds-precision (FSP) formatters exist in a follow-up issue.
> * getEncoder: not in scope for this issue.
> *Integration points (automatic when TypeOps returns Some)*
> These call sites already delegate to TypeOps(dt).map(...).getOrElse(legacy);
> no per-call-site edits should be required beyond registration:
> * PhysicalDataType.apply
> * Literal.default
> * InternalRow.getWriter
> * CodeGenerator / EncoderUtils Java class for codegen
> * SpecificInternalRow mutable column values
> *Feature flag*
> * All registration is gated by spark.sql.types.framework.enabled (same as
> TimeType).
> * When the flag is false, behavior must remain identical to current legacy
> paths.
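The delegation pattern TypeOps(dt).map(...).getOrElse(legacy) and the flag gating can be modeled together in a few lines. The sketch below uses Java's Optional in place of Scala's Option; FrameworkSketch, Ops, lookup, legacyPhysicalType, and physicalType are hypothetical names, data types are identified by plain strings, and the flag is passed as a boolean parameter rather than read from spark.sql.types.framework.enabled.

```java
import java.util.Optional;

// Hypothetical model of "registration gated by a flag, with legacy fallback".
final class FrameworkSketch {
    interface Ops { String physicalTypeName(); }

    private static final Ops NTZ_OPS = () -> "PhysicalTimestampNTZNanosType";
    private static final Ops LTZ_OPS = () -> "PhysicalTimestampLTZNanosType";

    // Single registration point: returns empty when the flag is off or the type
    // is not registered, so callers fall through to the legacy path.
    static Optional<Ops> lookup(String dataType, boolean frameworkEnabled) {
        if (!frameworkEnabled) {
            return Optional.empty();
        }
        switch (dataType) {
            case "TimestampNTZNanosType": return Optional.of(NTZ_OPS);
            case "TimestampLTZNanosType": return Optional.of(LTZ_OPS);
            default: return Optional.empty();
        }
    }

    // Stand-in for the existing legacy dispatch, which already knows both nanos types.
    static String legacyPhysicalType(String dataType) {
        switch (dataType) {
            case "TimestampNTZNanosType": return "PhysicalTimestampNTZNanosType";
            case "TimestampLTZNanosType": return "PhysicalTimestampLTZNanosType";
            default: return "UninitializedPhysicalType";
        }
    }

    // A call site in the style of PhysicalDataType.apply: prefer the registered
    // ops, otherwise use the legacy branch. The call site itself never changes.
    static String physicalType(String dataType, boolean frameworkEnabled) {
        return lookup(dataType, frameworkEnabled)
            .map(Ops::physicalTypeName)
            .orElseGet(() -> legacyPhysicalType(dataType));
    }
}
```

In this model, physicalType returns the same answer for a nanos type whether the flag is on or off, which is the equivalence the issue asks the tests to confirm; only the path taken differs.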
> h3. Tests
> * With spark.sql.types.framework.enabled=true:
> ** PhysicalDataType(TimestampNTZNanosType(9)) and LTZ variant return the
> correct physical types (not UninitializedPhysicalType).
> ** Literal.default matches TimestampNanosVal.ZERO for both nanos types.
> ** InternalRow.getWriter roundtrip: set and read via accessor for NTZ and LTZ.
> ** SpecificInternalRow update/read for nanos columns.
> * With the flag false: regression tests confirm no behavior change relative
> to the legacy paths on master.
> * Framework-on vs framework-off equivalence tests for the operations above.
> h3. Acceptance criteria
> * TypeOps(TimestampNTZNanosType(p)) and TypeOps(TimestampLTZNanosType(p))
> return non-empty ops when spark.sql.types.framework.enabled=true, for p in
> {7, 8, 9}.
> * Listed integration points use TypeOps implementations and match legacy
> behavior.
> * spark.sql.types.framework.enabled=false preserves current behavior.
> * No change to UnsafeRow layout, TimestampNanosRowValues, or microsecond
> TimestampType / TimestampNTZType behavior.
> h3. Out of scope
> * CatalystTypeConverters and java.time roundtrip (SPARK-57033)
> * SerializerBuildHelper / DeserializerBuildHelper and RowEncoder encoders
> * ConnectTypeOps and Connect proto literals
> * Arrow type mapping and ArrowFieldWriter
> * PySpark conversion (EvaluatePython)
> * Cast matrix, Parquet read/write, ColumnVector / vectorized Parquet
> * Physical ordering, compare, and hash for nanos types
> * Removing legacy branches from PhysicalDataType.applyDefault (optional
> cleanup in a later issue)
> h3. Depends on
> * SPARK-56981 (physical row layer and TimestampNanosVal)
> h3. References
> * SPARK-56822 — parent SPIP
> * SPARK-53504 — Types Framework
> * Precedent: org.apache.spark.sql.catalyst.types.ops.TimeTypeOps
> * Physical value: org.apache.spark.unsafe.types.TimestampNanosVal