Max Gekk created SPARK-57207:
--------------------------------
Summary: Register nanosecond timestamp types in the Types Framework via TypeOps overrides
Key: SPARK-57207
URL: https://issues.apache.org/jira/browse/SPARK-57207
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
Assignee: Max Gekk
### Summary
Register TimestampNTZNanosType(p) and TimestampLTZNanosType(p) (p in [7, 9])
in the Spark SQL Types Framework (SPARK-53504) by adding TypeOps (server-side,
catalyst) and TypeApiOps (client-side, sql/api) implementations. The logical
types and the physical row layer already exist (SPARK-56876, SPARK-56981);
this issue centralizes the wiring behind TypeOps when
spark.sql.types.framework.enabled is true.
This is split out of SPARK-57101 / PR #56199 so that the timestamp-nano type
registration is reviewed independently of the abstract Types Framework method
additions. It deliberately only *overrides existing* TypeOps / TypeApiOps
methods; no new framework methods are introduced.
### What changes
Add TypeOps implementations (sql/catalyst):
- TimestampNanosTypeOps shared trait with TimestampNTZNanosTypeOps /
  TimestampLTZNanosTypeOps, following the TimeTypeOps pattern.
- Register both in TypeOps.apply() alongside TimeType.
- Overridden methods (all already on TypeOps): getPhysicalType, getJavaClass,
  getRowWriter, getDefaultLiteral, getJavaLiteral, getMutableValue,
  toCatalystImpl, toScala, toScalaImpl.
Add TypeApiOps stubs (sql/api):
- TimestampNanosTypeApiOps base with TimestampNTZNanosTypeApiOps /
  TimestampLTZNanosTypeApiOps, registered in TypeApiOps.apply().
- format / toSQLValue: interim implementation (TimestampNanosVal.toString with
  NTZ/LTZ prefix) until dedicated fractional-second formatters land.
- getEncoder: reports the type as unsupported
  (UNSUPPORTED_DATA_TYPE_FOR_ENCODER), matching the legacy RowEncoder
  fallback; encoders are out of scope (SPARK-57033).
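
The shape of these stubs can be sketched in a self-contained way. This is an
illustration under assumptions, not the code in the patch: NanosValue stands
in for TimestampNanosVal (its real fields and toString output are not
described here), the TIMESTAMP_NTZ / TIMESTAMP_LTZ spellings of the NTZ/LTZ
prefix are assumed, applying the prefix only in toSQLValue is assumed, and
UnsupportedEncoderError is a hypothetical exception that only carries the
error condition name.

```scala
// Illustrative stand-in for TimestampNanosVal; the field is an assumption
// and toString is a placeholder rendering, not the real output format.
final case class NanosValue(epochNanos: Long) {
  override def toString: String = s"${epochNanos}ns"
}

// Hypothetical exception that carries only the error condition name.
final class UnsupportedEncoderError(val condition: String, typeName: String)
  extends RuntimeException(s"[$condition] $typeName")

// Shared base: both variants behave identically except for the prefix.
trait NanosApiOpsSketch {
  protected def prefix: String // assumed literal prefix
  def precision: Int

  // Interim formatting: delegate to the value's own toString.
  def format(v: NanosValue): String = v.toString

  // SQL literal form: prefix followed by the quoted interim string.
  def toSQLValue(v: NanosValue): String = s"$prefix '${format(v)}'"

  // Encoders are out of scope, so the type is reported as unsupported.
  def getEncoder: Nothing =
    throw new UnsupportedEncoderError(
      "UNSUPPORTED_DATA_TYPE_FOR_ENCODER", s"$prefix($precision)")
}

final case class NtzNanosApiOpsSketch(precision: Int) extends NanosApiOpsSketch {
  protected val prefix = "TIMESTAMP_NTZ"
}

final case class LtzNanosApiOpsSketch(precision: Int) extends NanosApiOpsSketch {
  protected val prefix = "TIMESTAMP_LTZ"
}
```

Keeping format and toSQLValue in the shared base means a later switch to
dedicated fractional-second formatters would touch one place only.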
Mutable holder:
- Add MutableTimestampNanos to SpecificInternalRow to avoid the MutableAny
  fallback.
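
To show what a dedicated holder buys over a generic one, here is a minimal
sketch. All names are hypothetical: MutableValueSketch loosely models a
mutable row column, NanosValStandIn stands in for TimestampNanosVal with an
assumed single field, and nothing here claims to match the actual
MutableTimestampNanos.

```scala
// Stand-in for TimestampNanosVal; the single field is an assumption.
final case class NanosValStandIn(epochNanos: Long)

// Minimal holder contract, loosely modelled on a mutable row column.
abstract class MutableValueSketch {
  var isNull: Boolean = true
  def boxed: Any
  def update(v: Any): Unit
  def copy(): MutableValueSketch
}

// Generic fallback: the value is stored as Any, so every reader must cast.
final class MutableAnySketch extends MutableValueSketch {
  var value: Any = null
  def boxed: Any = if (isNull) null else value
  def update(v: Any): Unit = { isNull = false; value = v }
  def copy(): MutableAnySketch = {
    val c = new MutableAnySketch
    c.isNull = isNull
    c.value = value
    c
  }
}

// Dedicated holder: the field is statically typed, so a wrong-typed value
// fails at update time instead of at some later read.
final class MutableNanosSketch extends MutableValueSketch {
  var value: NanosValStandIn = null
  def boxed: Any = if (isNull) null else value
  def update(v: Any): Unit = {
    isNull = false
    value = v.asInstanceOf[NanosValStandIn]
  }
  def copy(): MutableNanosSketch = {
    val c = new MutableNanosSketch
    c.isNull = isNull
    c.value = value
    c
  }
}
```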
Feature flag:
- All registration is gated by spark.sql.types.framework.enabled (same as
  TimeType).
- When the flag is false, behavior remains identical to the current legacy
  paths.
### Integration points (automatic when TypeOps returns Some)
These call sites already delegate to TypeOps(dt).map(...).getOrElse(legacy);
no per-call-site edits are required beyond registration:
PhysicalDataType.apply, Literal.default, InternalRow.getWriter / getAccessor,
CodeGenerator Java class for codegen, and SpecificInternalRow mutable column
values.
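
The delegation pattern and the feature-flag gate can be modelled in a few
lines. The sketch below is self-contained and uses hypothetical stand-ins
throughout (SketchType, OpsSketch, CallSiteSketch, and a plain variable in
place of the spark.sql.types.framework.enabled conf lookup); it only shows why
registration alone is enough for call sites written in this style.

```scala
// Simplified stand-ins; none of these are the real Spark classes.
sealed trait SketchType
case object IntSketchType extends SketchType
final case class NtzNanosSketchType(precision: Int) extends SketchType
final case class LtzNanosSketchType(precision: Int) extends SketchType

// Hypothetical ops interface with a single method, for illustration only.
trait OpsSketch {
  def javaClassName: String
}

object OpsSketch {
  // Stand-in for reading spark.sql.types.framework.enabled.
  @volatile var frameworkEnabled: Boolean = true

  private object NanosOps extends OpsSketch {
    val javaClassName = "NanosHolder" // placeholder name
  }

  // Registration point: returns Some only for registered types,
  // and only while the framework flag is on.
  def apply(dt: SketchType): Option[OpsSketch] =
    if (!frameworkEnabled) None
    else dt match {
      case NtzNanosSketchType(_) | LtzNanosSketchType(_) => Some(NanosOps)
      case _ => None
    }
}

object CallSiteSketch {
  // Pre-existing behavior, untouched by registration.
  private def legacyJavaClassName(dt: SketchType): String = dt match {
    case IntSketchType => "int"
    case _ => "Object"
  }

  // Call site in the map(...).getOrElse(legacy) style.
  def javaClassName(dt: SketchType): String =
    OpsSketch(dt).map(_.javaClassName).getOrElse(legacyJavaClassName(dt))
}
```

With the flag on, CallSiteSketch.javaClassName(NtzNanosSketchType(9)) takes
the ops path; with the flag off, or for an unregistered type, the same call
falls through to the legacy branch without any change at the call site.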
### Tests
New TimestampNanosTypeOpsSuite, for p in {7, 8, 9} over NTZ and LTZ:
- TypeOps / TypeApiOps are registered when the framework is enabled.
- PhysicalDataType, Literal.default value, and codegen Java class are correct.
- InternalRow and SpecificInternalRow set/read roundtrips.
- SpecificInternalRow uses the dedicated MutableTimestampNanos holder.
- getEncoder reports UNSUPPORTED_DATA_TYPE_FOR_ENCODER.
- toSQLValue uses the NTZ/LTZ literal prefix.
- Framework-disabled fallback produces identical results.
### Out of scope
- New abstract Types Framework methods for codegen (kept in SPARK-57101 / PR
  #56199).
- CatalystTypeConverters / java.time roundtrip (SPARK-57033), encoders,
  Connect proto, Arrow, PySpark conversion, cast/Parquet/ColumnVector, and
  physical ordering/compare/hash.
### Depends on
- SPARK-56981 (physical row layer and TimestampNanosVal)
### References
- SPARK-56822 - parent SPIP
- SPARK-53504 - Types Framework
- Precedent: org.apache.spark.sql.catalyst.types.ops.TimeTypeOps