I don't have a personal opinion on which representation is technically better, but our goal here should be to standardize existing practice, not come up with a novel representation, IMHO.

Regards

Antoine.


On 18/11/2025 at 23:45, Felipe Oliveira Carvalho wrote:
One reason to avoid 128-bit integers is the requirement for 128-bit
operations that they create. Many high-resolution time representations
split the value into two integers in a way that is useful for many
time-related operations.

Picosecond resolution can be achieved by splitting into a (seconds:
i64, picoseconds: i64) pair: the number of picoseconds in a day fits in
57 bits, and an i64 of seconds can represent far more than 10,000 years.

This removes the need for 128-bit division (e.g., by the number of
picoseconds in a day) to do anything interesting with a picosecond
timestamp. This layout could be a Canonical Extension Type proposal, with
the seconds field being one of the existing timestamp types, allowing very
cheap casts from the extension type to the second-precision timestamp.

--
Felipe

On Tue, Nov 18, 2025 at 6:22 PM Curt Hagenlocher <[email protected]>
wrote:

For both Duration and Timestamp, this would require adding a new field
to the FlatBuffers spec. That should be okay, right?

A 128-bit timestamp would be useful at nanosecond scale as well;
there are databases like Snowflake that support a precision and scale
for timestamps which force either truncation of precision or clipping
of range when represented as Arrow.
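
For a sense of scale (a back-of-the-envelope C++ sketch; plain integer
arithmetic, no Arrow APIs): an int64 of nanoseconds spans roughly 292
years, which is why epoch-based nanosecond timestamps run out around the
year 2262, while an int64 of picoseconds spans only about 106 days.

    #include <cstdint>
    #include <cstdio>

    int main() {
        // Seconds representable at each precision before an int64 overflows.
        int64_t max_s_at_ns = INT64_MAX / 1000000000LL;     // ~9.2e9 s
        int64_t max_s_at_ps = INT64_MAX / 1000000000000LL;  // ~9.2e6 s
        std::printf("ns range: ~%lld years; ps range: ~%lld days\n",
                    (long long)(max_s_at_ns / 31556952LL),  // avg year length
                    (long long)(max_s_at_ps / 86400LL));
        return 0;
    }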

Would there be any reason to have (or not have) a canonical
LogicalType for these in Parquet as well?

On Fri, Nov 7, 2025 at 1:29 PM Tim Swena <[email protected]> wrote:

Hello,

Per the process described at
https://arrow.apache.org/docs/format/Changing.html#discussion-and-voting-process,
I am starting a discussion thread for the following spec change proposal:


1. Add a new time unit: PICOSECOND, which is unsupported in the existing
   64-bit timestamp-related types.

2. Add support for bitWidth=128 to the timestamp data type, which
   supports all units, including PICOSECOND.

3. Add support for bitWidth=128 to the duration data type, which
   supports all units, including PICOSECOND.
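
As a rough illustration of what these 128-bit values entail (assuming the
GCC/Clang __int128 extension; nothing here is Arrow API), extracting the
date from a flat 128-bit picosecond timestamp requires 128-bit division:

    #include <cstdint>
    #include <cstdio>

    using i128 = __int128;

    int main() {
        const i128 kPicosPerSecond = 1000000000000LL;       // 1e12
        const i128 kPicosPerDay = kPicosPerSecond * 86400;  // 8.64e16

        // 2025-11-18T23:45:00.000000000001Z as picoseconds since the epoch;
        // the value exceeds the int64 range, hence the 128-bit type.
        i128 ts = (i128)1763509500 * kPicosPerSecond + 1;

        int64_t days = (int64_t)(ts / kPicosPerDay);          // 128-bit division
        int64_t picos_of_day = (int64_t)(ts % kPicosPerDay);  // fits in int64
        std::printf("day %lld, %lld ps into the day\n",
                    (long long)days, (long long)picos_of_day);
        return 0;
    }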

This is motivated by some currently experimental changes in BigQuery to
support picosecond precision timestamps (source:
https://docs.cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#picostimestampprecision),
but from what I can tell such timestamps already have some support in IBM
Db2 (source:
https://www.ibm.com/docs/en/db2-for-zos/13.0.0?topic=jdbc-dbtimestamp-class)
and Trino (source:
https://trino.io/docs/current/language/types.html).
Note that reference implementation(s) are still very much a work in
progress (https://github.com/apache/arrow/pull/48018 for a start in C++),
but I figured it would be useful to kick off the conversation before
diving much further into implementation.

Inspired by other discussions, I've created a draft of a more formal RFC
document here: Arrow-RFC: timestamp128 and duration128 data types with
support for picosecond units
https://docs.google.com/document/d/1-S0qvYTIEGlLnNkkgyWSHfnIvU4xpFqDQuMNTojaj9A/edit?tab=t.0#heading=h.as1aixu509k7

Tim Sweña (Swast)
Team Lead, BigQuery DataFrames
Google Cloud Platform
Chicago, IL, USA


