This new design makes sense to me. As I understand it, we add 2 more bytes (a Short) to store nanosOfMicro, and everything else stays the same as the current timestamp types: the same value range, with higher precision.
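
To check my understanding of the split model, here is a minimal sketch in Python. It is not Spark code: the class name `NanoTimestamp`, its methods, and the helper `years_covered` are hypothetical and only illustrate the proposed (epochMicros, nanosOfMicro) contract, assuming floor division so that the remainder stays in [0, 999] for pre-epoch values as well.

```python
from dataclasses import dataclass

NANOS_PER_MICRO = 1_000
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
SECONDS_PER_YEAR = 31_556_952  # average Gregorian year (365.2425 days)


@dataclass(frozen=True, order=True)
class NanoTimestamp:
    """Hypothetical logical value: signed 64-bit micros plus a 0..999 remainder."""
    epoch_micros: int
    nanos_of_micro: int

    def __post_init__(self):
        if not INT64_MIN <= self.epoch_micros <= INT64_MAX:
            raise ValueError("epoch_micros must fit in a signed 64-bit Long")
        if not 0 <= self.nanos_of_micro <= 999:
            raise ValueError("nanos_of_micro must be in [0, 999]")

    @staticmethod
    def from_epoch_nanos(epoch_nanos: int) -> "NanoTimestamp":
        # divmod floors, so the remainder is non-negative even before the epoch.
        micros, rem = divmod(epoch_nanos, NANOS_PER_MICRO)
        return NanoTimestamp(micros, rem)

    def to_epoch_nanos(self) -> int:
        # Python ints are unbounded; a Long-based engine could overflow here.
        return self.epoch_micros * NANOS_PER_MICRO + self.nanos_of_micro


def years_covered(units_per_second: int) -> float:
    """Approximate years on each side of the epoch that one signed Long can hold."""
    return INT64_MAX / units_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(NanoTimestamp.from_epoch_nanos(-1))   # one nanosecond before the epoch
    print(years_covered(1_000_000_000))         # single Long of epoch-nanos
    print(years_covered(1_000_000))             # single Long of epoch-micros
```

With the field order above, comparing the pairs lexicographically gives chronological order. The last two calls show the range point from the proposal: a single Long of epoch-nanos covers roughly 292 years on each side of the epoch, while a Long of epoch-micros covers roughly 292 thousand.
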
On Thu, May 7, 2026 at 5:16 PM Max Gekk <[email protected]> wrote:
> Hi Spark devs,
>
> I’d like to share a proposal for nano-second-capable timestamp support
> and ask for your feedback.
>
> Here is the SPIP:
> https://docs.google.com/document/d/1DeW15QueI4PdRyPm6C6jsTZFmIjbXX2j4h-Ja5W_fsg/edit?usp=sharing
>
> My proposal uses a logical split representation:
> - epochMicros: Long
> - nanosOfMicro: Short in [0, 999]
>
> This applies to both NTZ and LTZ nano-capable types; timezone
> semantics remain unchanged and are handled at interpretation
> boundaries (as today).
>
> Why this approach? I believe this is the most practical path for Spark
> because it:
> 0. Conforms to the SQL standard.
> 1. Preserves Spark’s existing microsecond approach. Most
>    Catalyst/runtime datetime logic already uses micros. The split model
>    extends it rather than replacing it.
> 2. Avoids INT64 epoch-nanos range cliff as the primary engine model. A
>    single Long epoch-nanos representation constrains calendar range much
>    more aggressively than Long micros.
> 3. Keeps migration risk lower. Existing microsecond behavior remains
>    default; nano precision is opt-in via parameterized types/syntax.
> 4. Allows efficient implementation paths. Internals can still choose
>    compact physical encodings (row/vector/file boundaries), while keeping
>    one canonical logical contract.
>
> Related SPIPs considered. I reviewed and compared against these two drafts:
> - SPIP: Support NanoSecond Timestamps:
>   https://docs.google.com/document/d/1wjFsBdlV2YK75x7UOk2HhDOqWVA0yC7iEiqOMnNnxlA/edit?tab=t.0#heading=h.4kibaxwtx2xo
> - SPIP: Support NanoSecond Timestamp Types:
>   https://docs.google.com/document/d/1Q5u1whAO_KcT6d4dFFaIMy_S3RoQEo4Znwz2U-nbhls/edit?tab=t.0#heading=h.xk16mmomv6il
>
> Those drafts are valuable and informed this design. The key difference
> is that I prioritize micros-first engine continuity with a bounded
> nano remainder, instead of making epoch-nanos the primary internal
> semantic unit.
>
> In short: I think epochMicros + nanosOfMicro is a better fit for
> Spark’s current architecture and compatibility constraints, while
> still delivering practical nanosecond support.
>
> Thanks in advance for your feedback.
>
> Best regards,
> Max Gekk
