[
https://issues.apache.org/jira/browse/SPARK-56981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-56981:
-----------------------------------
Labels: pull-request-available (was: )
> Add physical representation and UnsafeRow support for nanosecond-capable
> timestamp types
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-56981
> URL: https://issues.apache.org/jira/browse/SPARK-56981
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Labels: pull-request-available
>
> h3. Summary
> [PR #55952|https://github.com/apache/spark/pull/55952] / SPARK-56876 added
> _logical_ types {{TimestampNTZNanosType(p)}} and {{TimestampLTZNanosType(p)}}
> (p ∈ [7, 9]) and JSON metadata. They still map to
> {{UninitializedPhysicalType}} in {{PhysicalDataType.apply}}, so the engine
> cannot store or access values in {{InternalRow}} / {{UnsafeRow}}.
> This issue delivers the _minimum_ physical layer aligned with the merged SPIP
> model: *epoch microseconds (8 bytes) + nanoseconds within the microsecond
> (0–999, 2 bytes)* — see {{defaultSize = 10}} on the logical types. One shared
> unsafe value representation at the row layer is fine for both NTZ and LTZ
> nanos types; semantic differences stay in logical/SQL layers.
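To make the composite concrete, here is a minimal sketch of splitting a `java.time.Instant` into epoch microseconds plus nanoseconds within the microsecond, and packing the pair into 10 bytes. `NanosLayoutSketch` and its methods are hypothetical names, not Spark APIs, and the big-endian byte order is only `ByteBuffer`'s default, not something the issue specifies.

```java
import java.nio.ByteBuffer;
import java.time.Instant;

// Hypothetical helper for illustration only; not part of Spark.
final class NanosLayoutSketch {
  // Epoch microseconds. Instant.getNano() is always non-negative, so this
  // rounds toward negative infinity for instants before the epoch.
  static long epochMicros(Instant t) {
    return Math.addExact(
        Math.multiplyExact(t.getEpochSecond(), 1_000_000L), t.getNano() / 1_000);
  }

  // Nanoseconds within the microsecond, always in [0, 999].
  static short nanosInMicro(Instant t) {
    return (short) (t.getNano() % 1_000);
  }

  // 8 bytes of epoch micros followed by 2 bytes of nanos-in-micro.
  static byte[] encode(Instant t) {
    return ByteBuffer.allocate(10)
        .putLong(epochMicros(t))
        .putShort(nanosInMicro(t))
        .array();
  }

  static Instant decode(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    long micros = buf.getLong();
    short nanos = buf.getShort();
    long seconds = Math.floorDiv(micros, 1_000_000L);
    long nanoAdjustment = Math.floorMod(micros, 1_000_000L) * 1_000L + nanos;
    return Instant.ofEpochSecond(seconds, nanoAdjustment);
  }
}
```

For `2024-01-02T03:04:05.123456789Z` the nanos-in-micro component is 789, and `decode(encode(t))` returns the original instant, including for instants before the epoch.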
> This is the *unblocker* for downstream work (cast, Parquet, expressions). It
> is intentionally small: no SQL parser, no SQLConf preview, no casts, no
> Parquet, no {{TypeOps}} / Types Framework requirement.
> _Ordering / compare / hash_ for these types is *out of scope* and will be
> tracked in a separate follow-up issue.
> h3. What to do
> *common/unsafe*
> * Add {{org.apache.spark.unsafe.types.TimestampNTZNanos}} (name as
> implemented): immutable value with {{long}} epoch micros + {{short}}
> nanos-in-micro ∈ [0, 999]; {{equals}} / {{hashCode}}.
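A rough sketch of what such a value class could look like, assuming a plain constructor that rejects out-of-range nanos. The class name `TimestampNanosValueSketch`, the accessor names, and the choice of `IllegalArgumentException` are illustrative assumptions; the actual class in `common/unsafe` may differ.

```java
// Hypothetical sketch; the real class would live in org.apache.spark.unsafe.types.
final class TimestampNanosValueSketch {
  private final long epochMicros;    // microseconds since 1970-01-01T00:00:00Z
  private final short nanosInMicro;  // nanoseconds within the microsecond, 0..999

  TimestampNanosValueSketch(long epochMicros, short nanosInMicro) {
    if (nanosInMicro < 0 || nanosInMicro > 999) {
      throw new IllegalArgumentException(
          "nanosInMicro must be in [0, 999], got " + nanosInMicro);
    }
    this.epochMicros = epochMicros;
    this.nanosInMicro = nanosInMicro;
  }

  long epochMicros() { return epochMicros; }

  short nanosInMicro() { return nanosInMicro; }

  @Override
  public boolean equals(Object other) {
    if (this == other) return true;
    if (!(other instanceof TimestampNanosValueSketch)) return false;
    TimestampNanosValueSketch that = (TimestampNanosValueSketch) other;
    return epochMicros == that.epochMicros && nanosInMicro == that.nanosInMicro;
  }

  @Override
  public int hashCode() {
    // Combines both fields so equal values hash equally.
    return 31 * Long.hashCode(epochMicros) + Short.hashCode(nanosInMicro);
  }
}
```

The fields are final and there are no setters, so instances are immutable. Ordering is deliberately left out, matching the scope stated in the summary.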
> *PhysicalDataType*
> * Add {{PhysicalTimestampNanosType}} with {{InternalType}} = the unsafe value
> class.
> * Register {{TimestampNTZNanosType}} and {{TimestampLTZNanosType}} in
> {{PhysicalDataType.applyDefault}} (no {{UninitializedPhysicalType}}
> fall-through).
> *InternalRow*
> * Add get/set accessors on {{GenericInternalRow}} (and wiring in
> {{InternalRow}} accessor dispatch) for the new physical type.
> *UnsafeRow*
> * Store values using the same pattern as {{PhysicalCalendarIntervalType}}
> (variable-length field: offset and size in the field's 8-byte slot, plus a
> fixed-size payload in the variable-length region), since the 10-byte value
> does not fit in a single 8-byte word.
> * Implement read and write on {{UnsafeRow}}; update
> {{UnsafeRow.isFixedLength}} / size estimation if needed.
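The storage pattern can be illustrated with a plain `ByteBuffer` standing in for a one-field row. This is a simplified model, not Spark's `UnsafeRow`: the class `NanosRowLayoutSketch`, the little-endian order, and the choice to pad the 10-byte value to 16 bytes (a multiple of 8) are all assumptions made for the sketch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for a one-field row; this is not Spark's UnsafeRow.
final class NanosRowLayoutSketch {
  static final int NULL_BITS_BYTES = 8;  // one 8-byte word of null-tracking bits
  static final int SLOT_BYTES = 8;       // one 8-byte slot per field
  static final int PAYLOAD_BYTES = 16;   // 10 value bytes padded to a multiple of 8 (assumption)

  static ByteBuffer write(long epochMicros, short nanosInMicro) {
    ByteBuffer row = ByteBuffer
        .allocate(NULL_BITS_BYTES + SLOT_BYTES + PAYLOAD_BYTES)
        .order(ByteOrder.LITTLE_ENDIAN);
    int offset = NULL_BITS_BYTES + SLOT_BYTES;  // payload follows the fixed-width region
    row.putLong(0, 0L);                         // null bits: field 0 is not null
    // The slot holds the payload offset in the high 32 bits and its size in the low 32 bits.
    row.putLong(NULL_BITS_BYTES, ((long) offset << 32) | PAYLOAD_BYTES);
    row.putLong(offset, epochMicros);
    row.putShort(offset + 8, nanosInMicro);     // the remaining 6 bytes stay zero as padding
    return row;
  }

  static long readEpochMicros(ByteBuffer row) {
    int offset = (int) (row.getLong(NULL_BITS_BYTES) >>> 32);
    return row.getLong(offset);
  }

  static short readNanosInMicro(ByteBuffer row) {
    int offset = (int) (row.getLong(NULL_BITS_BYTES) >>> 32);
    return row.getShort(offset + 8);
  }
}
```

Because the payload size is constant, the slot value is the same for every non-null row with this schema, which is what makes an in-place update of such a field feasible.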
> *Codegen / getters*
> * Add a read path for {{PhysicalTimestampNanosType}} to
> {{SpecializedGettersReader}} and {{CodeGenerator}}; include a write path only
> if roundtrip tests or projection writers require it.
> *Literals*
> * Extend {{Literal}} validation in {{literals.scala}} to accept the unsafe
> value type for the nanos timestamp physical type.
> h3. Tests
> * {{DataTypeSuite}}: {{PhysicalDataType(TimestampNTZNanosType(p))}} and the
> LTZ variant are not {{UninitializedPhysicalType}}; {{defaultSize}} remains 10.
> * New or extended suite: {{InternalRow}} set/get roundtrip for non-null and
> null.
> * {{UnsafeRow}} write/read roundtrip for a struct with nanos timestamp
> column(s).
> * Regression: microsecond {{TimestampType}} / {{TimestampNTZType}} unchanged.
> h3. Acceptance criteria
> * {{PhysicalDataType.apply}} returns a concrete physical type for
> {{TimestampNTZNanosType}} and {{TimestampLTZNanosType}} for all valid p ∈ [7,
> 9].
> * Values can be written to and read from {{UnsafeRow}} and
> {{GenericInternalRow}} without falling through to the uninitialized physical
> type or hitting generic unsupported-physical-type failures in tests.
> * Codegen and interpreted getters can read a bound column of this physical
> type in a minimal projection test.
> * No change to behavior of {{TimestampType}}, {{TimestampNTZType}}, or
> existing microsecond storage.
> * Downstream issues (parser, SQLConf, cast, Parquet) can depend on this issue
> and assume the SPIP composite row layout.
> h3. References
> * Precedent: {{PhysicalCalendarIntervalType}} + {{CalendarInterval}} unsafe
> type