[ 
https://issues.apache.org/jira/browse/SPARK-56822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-56822:
-----------------------------
    Labels: SPIP  (was: )

> SPIP: Timestamps with nanosecond precision
> ------------------------------------------
>
>                 Key: SPARK-56822
>                 URL: https://issues.apache.org/jira/browse/SPARK-56822
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>              Labels: SPIP
>
> h1. Q1. What are you trying to do? Articulate your objectives using 
> absolutely no jargon.
> Add nanosecond-capable timestamps with an explicit fractional precision *n* 
> in SQL and APIs, while keeping a simple binary value: epoch microseconds 
> (64-bit) + nanoseconds within that microsecond (0–999, stored in 16 bits).
> h3. SQL surface
> Support TIMESTAMP_NTZ\(n\), TIMESTAMP_LTZ\(n\), and TIMESTAMP\(n\) in the 
> parser (including equivalent spellings: WITHOUT TIME ZONE, WITH LOCAL TIME 
> ZONE, etc.) so precision is first-class in the grammar, not a side 
> convention. The parameter n is optional, and its valid range is [0, 9]. It 
> defines how many decimal digits of the fractional second are part of the 
> type. For example: n = 6 -> microseconds, n = 9 -> nanoseconds.
> h3. Scala / Java APIs
> Introduce parameterized catalyst types, e.g. TimestampNTZNanosType\(n\) for 
> TIMESTAMP_NTZ\(n\) and TimestampNanosType\(n\) for TIMESTAMP_LTZ\(n\).
> h3. In-memory value
> The internal representation of both types is long micros since the epoch + 
> short (or 16-bit) nanos-in-micro in [0, 999]. Both NTZ and LTZ have the same 
> representation; time zone only affects interpretation for LTZ, not the pair 
> layout.
>  
> h1. Q2. What problem is this proposal NOT designed to solve?
>  * *Precision below nanosecond-target range.* This SPIP only covers 
> high-precision parameterized timestamps where 7 <= n <= 9. Support/changes 
> for n < 7 are explicitly out of scope for this proposal.
>  * *Subtraction of timestamps.* The result should be a day-time interval but 
> the data type has microsecond precision at the moment.
>  * *Changing existing time zone semantics.* It does not redefine NTZ/LTZ 
> semantics, session time zone behavior, or SQL time zone rules.
>  * *A full rework of all connectors/storage formats.* It does not promise 
> end-to-end nanosecond support in every external system; connector-specific 
> follow-ups may still be needed.
> h1. Q3. How is it done today, and what are the limits of current practice?
> Today Spark SQL supports two built-in timestamp types, both at microsecond 
> precision:
>  * TimestampType (TIMESTAMP WITH LOCAL TIME ZONE)
>  * TimestampNTZType (TIMESTAMP WITHOUT TIME ZONE)
> So Spark’s native timestamp model stops at 6 fractional digits.
> For nanosecond data (for example Parquet TIMESTAMP(NANOS, ...)), current 
> behavior is limited:
>  * Default behavior: Spark rejects it with an analysis error (for example: 
> Illegal Parquet type: INT64 (TIMESTAMP(NANOS,true))).
>  * Legacy fallback (spark.sql.legacy.parquet.nanosAsLong=true): Spark reads 
> it as LongType (raw integer), which drops timestamp semantics:
>  ** no timestamp/date function behavior
>  ** no time zone semantics
>  ** no timestamp type safety
> In practice this means users either:
>  * down-convert data to microseconds before Spark, or
>  * keep nanosecond values as raw integers and do manual conversion logic, or
>  * switch to another engine for nanosecond timestamp workloads.
> This is increasingly painful because nanosecond timestamps are common in data 
> produced by systems like Pandas/PyArrow, Trino, ClickHouse, and DuckDB, and 
> in domains such as market data, IoT telemetry, and tracing/observability 
> pipelines.
>  
> h1. Q4. What is new in your approach and why do you think it will be 
> successful?
> h2. What is new
>  * *Precision parameterization for timestamp types*
> Add TIMESTAMP_NTZ\(n\), TIMESTAMP_LTZ\(n\), and TIMESTAMP\(n\) (including 
> alias forms) with explicit fractional precision, focused on 7 <= n <= 9.
>  * *New catalyst/API types for nanos-capable timestamps*
> Introduce parameterized nanos-aware types instead of overloading existing 
> microsecond-only types.
>   ** For TIMESTAMP_NTZ\(n\): case class TimestampNTZNanosType(precision: Int)
>   ** For TIMESTAMP_LTZ\(n\): case class TimestampLTZNanosType(precision: Int)
>  * *Precise internal value model without changing core epoch semantics.* 
> Represent each value as:
>  ** epoch microseconds (long)
>  ** nanoseconds within the microsecond (0..999) This preserves existing Spark 
> timestamp foundations while adding sub-micro precision.
>  * *End-to-end engine integration, not just parser support*
> The change is wired through parser/type system, expression evaluation, 
> codegen, unsafe/container paths, and file-format handling paths needed for 
> practical use.
> h2. Why this should succeed
>  # Low semantic risk: extends existing timestamp families instead of 
> redefining them.
>  # Backward compatible path: existing microsecond types and behavior remain 
> unchanged.
>  # Incremental implementation: narrow scope (7..9) and clear boundaries make 
> rollout testable.
>  # Real interoperability value: directly addresses common nanos data sources 
> where Spark currently fails or degrades to LongType.
>  # Operationally practical: keeps compact representation and reuses current 
> execution architecture, so performance/regression risk is manageable.
> h1. Q5. Who cares? If you are successful, what difference will it make?
> h2. Who cares
>  * Data engineering teams ingesting Parquet/warehouse data produced with 
> nanosecond timestamps.
>  * Platform teams running mixed ecosystems (Spark + 
> Trino/ClickHouse/DuckDB/Pandas/PyArrow).
>  * Domain teams where sub-micro timing matters (market data, telemetry, 
> observability, IoT, CDC/event streams).
>  * Connector and table-format maintainers who currently need compatibility 
> workarounds.
> h2. What difference success makes
>  # Interop works by default. Spark can read/use high-precision timestamp data 
> as timestamp types, instead of failing or forcing LongType fallback.
>  # No more semantic loss workarounds. Users avoid manual bigint-to-timestamp 
> conversions, custom UDF glue, and loss of timezone/type semantics.
>  # Correctness for high-frequency data. Distinct events within the same 
> microsecond stay distinct; ordering and time-window logic become more 
> reliable for nanos data.
>  # Lower migration friction to Spark. Teams can bring existing nanos datasets 
> into Spark pipelines without pre-normalization to micros.
>  # Cleaner long-term type story. Spark gets an explicit precision model for 
> timestamps \(n\) rather than implicit microsecond-only behavior, making 
> schema contracts clearer across SQL and APIs.
> h1. Q6. What are the risks?
> |Risk|Mitigation|
> |*User confusion* from more timestamp spellings (TIMESTAMP\(n\), 
> TIMESTAMP_NTZ\(n\), TIMESTAMP_LTZ\(n\), plus WITHOUT TIME ZONE / WITH LOCAL 
> TIME ZONE variants).|Keep today’s types as the default; require explicit n 
> for the new behavior; document a small “cheat sheet” mapping SQL spelling -> 
> semantics + precision.|
> |{*}Wrong results in shared datetime code paths{*}: functions/rules that 
> implicitly assume microsecond-only timestamps (AnyTimestampType-style 
> plumbing, codegen, optimizer rewrites) may mishandle nanosecond-capable 
> values.|Systematic audit of shared abstractions (casts, comparisons, 
> intervals, extract, codegen paths) plus focused regression tests on n ∈ 
> [7,9].|
> |*Performance regressions* (wider projections, more branches in 
> codegen/vectorized paths).|Benchmark common scans/joins/aggregations; keep 
> fast paths for existing microsecond timestamps.|
> |*Range / overflow issues tied to widening.* Bugs where code accidentally 
> converts to a single epochNanos long during promotion or builtins.|Document 
> the representable range for the chosen internal representation; enforce 
> bounds at cast boundaries; tests for edge instants.|
> |*Interop / external formats (Parquet/Iceberg/etc.):* external encodings may 
> use epoch nanoseconds in int64, while Spark uses (micros, nanosWithinMicro). 
> Conversion bugs are likely near boundaries and for pushdown 
> predicates.|Conversions must be explicitly specified (including acceptable 
> rounding/truncation) and covered by tests for the supported read/write paths 
> you ship.|
>  
> h1. Q7. How long will it take?
> This estimate covers shipping of parameterized nanosecond-capable timestamps 
> (7 <= n <= 9) with feature parity to existing TimestampType / 
> TimestampNTZType, including parser, core type system, Parquet nanos support, 
> encoders/converters, datetime utilities, cast matrix (interpreted + codegen), 
> expression updates, type coercion, literals, and testing.
> Implementation should integrate with the Types Framework (SPARK-53504): 
> register the new types through the centralized TypeOps / TypeApiOps (and 
> storage/client Ops as applicable) instead of scattering one-off integration 
> across dozens of files.
> h3. Engineering estimate (person-weeks)
> |Area|pw|
> |Core type system (parameterization, physical dispatch for p<=6 vs p>=7, 
> equality/hash/order + catalyst plumbing)|2|
> |SQL parser / AST (TIMESTAMP…(p) + aligned spellings)|1|
> |Parquet nanos read/write (incl. vectorized + non-vectorized, rebasing, 
> legacy migration hooks, preview flags)|4|
> |Encoders / converters (framework-aligned)|1.5|
> |Datetime utils (nano parsing/conversion/arithmetic helpers)|1|
> |Cast matrix (interpreted + codegen, precision rules)|2|
> |Expression parity work (datetime builtins impacted)|5|
> |Type coercion / widening (TypeCoercion / AnsiTypeCoercion)|3|
> |Literals|1|
> |Testing (unit + Parquet + overflow/range + ANSI vs non-ANSI)|2|
> |Types Framework wiring + tests for the new type|1.5|
> |Total|25.0|
> Calendar time (rule of thumb) ~25 person-weeks is about:
>  * ~6 months for one senior engineer (calendar), or
>  * ~7–9 weeks wall time for two engineers mostly dedicated,
> plus ~2–4 extra weeks calendar buffer for OSS review/CI churn and rebases.
>  
> h1. Q8. What are the mid-term and final “exams” to check for success?
> h3. Mid-term “exams”. Users can declare, load, and analyze nanosecond-capable 
> timestamps without hacks:
>  # *SQL types are real, not experimental stubs.*
> Users can declare schemas with TIMESTAMP…(p) / TIMESTAMP_NTZ(p) / 
> TIMESTAMP_LTZ(p) for p ∈ [7,9] (including equivalent WITHOUT TIME ZONE / WITH 
> LOCAL TIME ZONE spellings), and those types round-trip through CREATE TABLE / 
> CAST / DESCRIBE / explain plans in a predictable way.
>  # *Nanosecond timestamps stop degrading into “just BIGINT”.*
> Users can read common nanosecond Parquet inputs as timestamps, without being 
> forced into LongType / legacy escape hatches for the supported paths you ship 
> in this milestone.
>  # *Everyday analytics works on the new types.*
> For the shipped surface area, users can run typical workflows end-to-end: 
> filters/joins/group-by keys, casts, timestamp arithmetic, 
> extract/truncate-style operations.
>  # *Defaults unchanged for existing users.*
> Existing microsecond-first workloads behave as today unless users explicitly 
> opt into p ∈ [7,9] or ingest nanosecond-native sources that trigger the new 
> behavior.
>  
> h3. Final “exams”. Workflows are consistently usable, documented, and 
> migration-ready at the level users expect from Spark timestamps today:
>  # *Parity exam vs existing Spark timestamps (user-visible).*
> For TimestampType / TimestampNTZType, users already expect a broad set of 
> behaviors. For p ∈ [7,9], the shipped release must meet the same practical 
> usability standard on the supported operations list (no “half the functions 
> silently downgrade precision or fail inconsistently”).
>  # *Interop exam.*
> Users can run realistic pipelines: read nanosecond-rich Parquet, transform, 
> and write/publish results without mandatory pre-processing outside Spark - 
> within the explicitly documented guarantees (lossless round-trip everywhere 
> is not required, but what is guaranteed must be testable true).
>  # *Migration exam.*
> There is a published migration path away from workflows that today rely on 
> spark.sql.legacy.parquet.nanosAsLong (including what changes in schema types 
> and what users must do).
>  # *Documentation exam.*
> Public docs answer: what p means, 7–9 scope, range/overflow behavior, ANSI vs 
> non-ANSI, and known limitations - written so support/engineering can deflect 
> repeated confusion.
> h1. Appendix A. Proposed API Changes
> h2. A.1 SQL language
> Extend fractional-second precision (FSP) on timestamp families
> |*Surface*|*Proposed syntax*|*Role*|
> |NTZ nanos-capable|TIMESTAMP_NTZ\(n\)|Explicit NTZ with fractional precision 
> {*}n{*}.|
> |Alias (NTZ)|TIMESTAMP\(n\) WITHOUT TIME ZONE|Same NTZ meaning as 
> TIMESTAMP_NTZ\(n\) (aligned with SQL-style spelling).|
> |LTZ nanos-capable|TIMESTAMP_LTZ\(n\)|Explicit LTZ / “instant” timeline with 
> n.|
> |Alias (LTZ)|TIMESTAMP\(n\) WITH LOCAL TIME ZONE|Same LTZ meaning as 
> TIMESTAMP_LTZ\(n\).|
> |Session default timestamp form|TIMESTAMP\(n\)|Where Spark today resolves 
> TIMESTAMP to LTZ or NTZ per configuration, n attaches FSP to that choice.|
> Backward compatibility
>  * Unparameterized forms keep today’s meaning:
>  ** TIMESTAMP_NTZ -> existing TimestampNTZType (microsecond semantics).
>  ** TIMESTAMP_LTZ / TIMESTAMP WITHOUT TIME ZONE / WITH LOCAL TIME ZONE 
> combinations -> existing TimestampType / TimestampNTZType behavior unchanged 
> when \(n\) is omitted.
>  * New behavior appears only when \(n\) is present (and, per SPIP scope, 
> especially 7 <= n <= 9), mapping to the new catalyst types below - not silent 
> widening of old types.
> h2. A.2 Scala / Java DataType APIs
> New case classes:
> |Type|Purpose|Notes|
> |TimestampNTZNanosType(n: Int)|SQL TIMESTAMP_NTZ\(n\) (nanosecond-capable 
> NTZ).|Present on this branch: sql / typeName like TIMESTAMP_NTZ(n; n in 0..9 
> today in code - SPIP may narrow product scope to 7..9 while keeping storage 
> general.|
> |TimestampNanosType(n: Int)|SQL TIMESTAMP_LTZ\(n\) (nanosecond-capable 
> LTZ).|Proposed companion to NTZ; mirrors the same FSP parameter and 
> external/Java mapping pattern.|
> Backward compatibility
>  * Existing singleton types TimestampNTZType and TimestampType remain the 
> defaults for unparameterized SQL and for older serialized schemas that do not 
> carry n.
>  * New types are additive; code that pattern-matches only on TimestampType / 
> TimestampNTZType continues to compile but must be reviewed for parity paths 
> when TimestampNanosType / TimestampNTZNanosType appear.
> h2. A.3 Compatibility summary
> |Direction|Expectation|
> |Old → new|Opt-in via TIMESTAMP_* \(n\) / new DataTypes; no change for legacy 
> microsecond types unless users choose nanosecond-capable types.|
> |New → old|Downgrades may truncate/round per FSP rules; must be documented 
> and tested (ANSI throw vs non-ANSI null where applicable).|
> |Cross-version|Schemas with nanosecond-capable types require a Spark version 
> that understands them; older engines must reject or require migration tools - 
> not silently coerce.|
> h1. Appendix B. Design Sketch.
> A value for nanosecond-capable NTZ (and the same pair for LTZ):
>  * {*}epochMicros{*}: Long - signed epoch microseconds (same grid as 
> TimestampType / TimestampNTZType today).
>  * {*}nanosOfMicro{*}: Short in [0, 999] - remaining nanoseconds inside that 
> microsecond bucket.
> {*}Invariant{*}: the pair is always normalized so *nanosOfMicro* stays in 
> range; excess carries into *epochMicros* with Math.addExact / floor-div where 
> needed.
> h4. Why this split (vs a single Long epoch-nanos counter):
>  * {*}Range{*}: *epochMicros* as Long keeps the calendar reach in the same 
> ballpark as today’s microsecond timestamps. A single INT64 epoch-nanoseconds 
> field has a much smaller representable year range; Spark can avoid that 
> user-visible cliff for the micros timeline part.
>  * *Interop/conversion cost:* most existing Spark datetime math is already 
> microsecond-grained; incremental changes upgrade paths by composing micro ops 
> + cheap nano remainder, instead of forcing every operator to immediately 
> normalize to nanoseconds-since-epoch.
>  * *Deterministic normalization:* *nanosOfMicro* is a small bounded 
> correction term - easy to audit in casts/parsers vs unconstrained nano 
> arithmetic everywhere.
> TimestampNTZNanosType (and similar TimestampLTZNanosType) is the schema-level 
> description of TIMESTAMP_NTZ(p). The values for this type are not “a struct 
> of two children in the row” at runtime; the companion internalStructType 
> exists for metadata only (historical/compat hooks). Execution uses the 
> normalized logical pair epoch micros + nanoseconds within that micro via 
> org.apache.spark.unsafe.types.TimestampNTZNanos.
> {code:java}
> /**
>  * Timestamp without time zone with fractional-second precision up to
>  * nanoseconds (9 decimal digits)
>  */
> @Unstable
> case class TimestampNTZNanosType(precision: Int) extends DatetimeType {
>   if (precision < 7 || precision > 9)
> {     throw DataTypeErrors.unsupportedTimestampPrecisionError(precision)   }
>  
>   /**
>    * Default size used by Spark for row-size estimation.
>    * Nanosecond-capable NTZ values are represented logically as epoch
>    * microseconds (`Long`, 8 bytes) plus nanoseconds within that micro
>    * (`Short`, 2 bytes). Size estimation sums those fixed logical
>    * parts (10 bytes). Physical encoding in 
> [[org.apache.spark.sql.catalyst.expressions.UnsafeRow]]
>    * uses a separate layout (one fixed pointer word plus a variable-length 
> payload).
>    */
>   override def defaultSize: Int = 10
>  
>   override def typeName: String = s"timestamp_ntz($precision)"
>   override def sql: String = s"TIMESTAMP_NTZ($precision)"
>   private[spark] override def asNullable: TimestampNTZNanosType = this
> }
> {code}
>  
> h1. Appendix C. Rejected Designs
> This SPIP converged on a logical pair: epoch microseconds (Long) + 
> nanoseconds within that micro (0..999, typically Short) with a single 
> normalization rule. Other encodings were considered and set aside for the 
> reasons below.
> h2. C.1 Epoch nanoseconds in a single Long
>  * {*}Reference{*}: related direction in the broader nanosecond-timestamp 
> SPIP / design thread ([Google Doc: SPIP: Support NanoSecond Timestamp 
> Types|https://docs.google.com/document/d/1Q5u1whAO_KcT6d4dFFaIMy_S3RoQEo4Znwz2U-nbhls/edit?tab=t.0#heading=h.xk16mmomv6il]).
>  * {*}Idea{*}: store the instant as signed 64-bit nanoseconds since Unix 
> epoch.
>  * *Why rejected* as the primary internal model for Spark’s timestamp 
> execution:
>  ** Much smaller representable calendar range in INT64 nanoseconds than INT64 
> microseconds (a well-known “range cliff” vs today’s microsecond timestamps).
>  ** Higher integration cost with the existing engine, which is overwhelmingly 
> organized around microsecond-grained datetime rules; everything would either 
> convert constantly or risk drift.
>  ** Not compatible with the SQL standard
> h2. C.2 Seconds + nanos of second
>  * *Idea:* decompose time into whole seconds and 0..999_999_999 nanos within 
> the second.
>  * *Why rejected:*
>  ** Mismatches Spark’s native microsecond “currency” (long micros) used 
> across most of Catalyst; forces pervasive rescaling and increases 
> codegen/interpreted drift risk.
>  ** Second-bucket arithmetic is awkward for interval APIs and 
> microsecond-legacy behavior (many paths think in micros, not “seconds + 
> intra-second nanos”).
> h2. C.3 Days + nanos within the day
>  * {*}Idea{*}: pack calendar date as day index and time-of-day as nanoseconds 
> within the day.
>  * *Why rejected:*
>  ** Not aligned with how Spark’s datetime system reasons about instants 
> (epoch micros + zone rules, legacy rebasing, Julian/Gregorian transitions, 
> etc.).
>  ** Day-boundary corner cases (leap seconds not with standings, DST is less 
> relevant to NTZ/LTZ storage but the model still composes poorly with “micros 
> since epoch” execution).
> h2. C.4 Nanos-from-epoch + Byte “high extension”
>  * {*}Idea{*}: keep nanosecond resolution but add an extra byte to extend 
> effective range beyond pure INT64 nanos limits.
>  * *Why rejected:*
>  ** Non-standard, harder to explain, and complicates every operator (suddenly 
> values are a 2-part variable-precision integer in the hot path).
>  ** Interoperability pain: external systems and file formats won’t natively 
> match the “byte extension” scheme; you still convert at boundaries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to