Yeah, I think that'd make sense. I've started a draft at
https://docs.google.com/document/d/1zDvKU26W8HS7aFplNTrIqNT1cLRiRcLAdCWiF8e4d_s/edit?tab=t.0#heading=h.as1aixu509k7
and will start a new discussion thread once I've made some progress on
reference implementations.

Tim Sweña (Swast)
Team Lead, BigQuery DataFrames
Google Cloud Platform
Chicago, IL, USA


On Tue, Nov 25, 2025 at 9:41 AM Dewey Dunnington <[email protected]>
wrote:

> Since the representation of these as two integers is quite different than
> the existing timestamp/duration representation, should these be canonical
> extension types rather than a change to the Flatbuffers spec?
>
> On Fri, Nov 21, 2025 at 9:57 AM Tim Swena via dev <[email protected]>
> wrote:
>
>> Correction: I looked deeper into the BigQuery and Trino implementations,
>> and both are using 2 separate integers as Felipe is proposing. I think
>> it's worth updating the proposal to reflect this layout. Thanks, folks!
>>
>>
>> On Fri, Nov 21, 2025 at 9:37 AM Tim Swena <[email protected]> wrote:
>>
>> > > Would there be any reason to have (or not have) a canonical
>> > > LogicalType for these in Parquet as well?
>> >
>> > I think it would be appropriate to add this to Parquet as well. I assume
>> > there's a different process / mailing list for that?
>> >
>> > > our goal here should be to standardize existing practice, not come up
>> > > with a novel representation, IMHO.
>> >
>> > BigQuery is using 128 bits, which is why I went with this proposal.
>> >
>> > Trino is using 96 bits (
>> > https://github.com/trinodb/trino/blob/eef66628759d7244c176f62be45f3d9f0e5a1a5d/core/trino-spi/src/main/java/io/trino/spi/type/LongTimestampType.java
>> > ), but it doesn't seem to me that this would be much more efficient
>> > than 128 bits.
>> >
>> >
>> > On Wed, Nov 19, 2025 at 3:35 AM Antoine Pitrou <[email protected]> wrote:
>> >
>> >>
>> >> I don't have a personal opinion on which representation is technically
>> >> better, but our goal here should be to standardize existing practice,
>> >> not come up with a novel representation, IMHO.
>> >>
>> >> Regards
>> >>
>> >> Antoine.
>> >>
>> >>
>> >> On 18/11/2025 at 23:45, Felipe Oliveira Carvalho wrote:
>> >> > One reason to avoid 128-bit integers is the requirement for 128-bit
>> >> > operations that it creates. Many high-resolution time representations
>> >> > split the value into two integers in a way that is useful for many
>> >> > time-related operations.
>> >> >
>> >> > Picosecond resolution can be achieved by splitting the value into a
>> >> > (seconds: i64, picoseconds: i64) pair: the number of picoseconds in a
>> >> > day fits in 57 bits, and the seconds field can represent far more than
>> >> > 10K years.
>> >> >
>> >> > This removes the need for a 128-bit division by 86400 to do anything
>> >> > interesting with the picosecond timestamp. This layout could be a
>> >> > Canonical Extension Type proposal, with the seconds field being one of
>> >> > the existing timestamp types, allowing for very cheap casts from the
>> >> > extension type to a timestamp with second precision.
>> >> >
>> >> > --
>> >> > Felipe
>> >> >
>> >> > On Tue, Nov 18, 2025 at 6:22 PM Curt Hagenlocher <[email protected]>
>> >> > wrote:
>> >> >
>> >> >> For both Duration and Timestamp, this would require adding a new
>> >> >> field to the FlatBuffers spec. That should be okay, right?
>> >> >>
>> >> >> A 128-bit timestamp would be useful at a nanosecond scale as well;
>> >> >> there are databases like Snowflake which support a precision and
>> >> >> scale for timestamps that force either truncation of precision or
>> >> >> clipping of range when representing as Arrow.
>> >> >>
>> >> >> Would there be any reason to have (or not have) a canonical
>> >> >> LogicalType for these in Parquet as well?
>> >> >>
>> >> >> On Fri, Nov 7, 2025 at 1:29 PM Tim Swena <[email protected]> wrote:
>> >> >>>
>> >> >>> Hello,
>> >> >>>
>> >> >>> Per the process described at
>> >> >>> https://arrow.apache.org/docs/format/Changing.html#discussion-and-voting-process
>> >> >>> I am starting a discussion thread for the following spec change
>> >> >>> proposal:
>> >> >>>
>> >> >>>
>> >> >>>     1. Add a new time unit: PICOSECOND, which is unsupported in the
>> >> >>>        existing 64-bit timestamp-related types.
>> >> >>>     2. Add support for bitWidth=128 to the timestamp data type, which
>> >> >>>        supports all units, including PICOSECOND.
>> >> >>>     3. Add support for bitWidth=128 to the duration data type, which
>> >> >>>        supports all units, including PICOSECOND.
>> >> >>>
>> >> >>> This is motivated by some currently experimental changes in BigQuery
>> >> >>> to support picosecond precision timestamps (source
>> >> >>> <https://docs.cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1?content_ref=read%20api%20will%20return%20full%20precision%20picosecond%20value%20the%20value%20will%20be%20encoded%20as%20a%20string%20which%20conforms%20to%20iso%208601%20format#picostimestampprecision>
>> >> >>> ), but from what I can tell such timestamps already have some support
>> >> >>> in IBM Db2 (source
>> >> >>> <https://www.ibm.com/docs/en/db2-for-zos/13.0.0?topic=jdbc-dbtimestamp-class&content_ref=the+com+ibm+db2+jcc+dbtimestamp+class+can+be+used+to+create+timestamp+objects+with+a+precision+of+up+to+picoseconds+and+time+zone+information>
>> >> >>> ) and Trino (source
>> >> >>> <https://trino.io/docs/current/language/types.html?content_ref=heading+calendar+date+and+time+of+day+without+a+time+zone+with+pdigits+of+precision+for+the+fraction+of+seconds+a+precision+of+up+to+12+picoseconds+is+supported>
>> >> >>> ).
>> >> >>> Note that reference implementation(s) are still very much a
>> >> >>> work-in-progress (https://github.com/apache/arrow/pull/48018 for a
>> >> >>> start in C++), but I figured it would be useful to kick off the
>> >> >>> conversation before diving much further into implementation.
>> >> >>>
>> >> >>> Inspired by other discussions, I've created a draft of a more formal
>> >> >>> RFC document here: Arrow-RFC: timestamp128 and duration128 data types
>> >> >>> with support for picosecond units
>> >> >>> <https://docs.google.com/document/d/1-S0qvYTIEGlLnNkkgyWSHfnIvU4xpFqDQuMNTojaj9A/edit?tab=t.0#heading=h.as1aixu509k7>
>> >> >>>
>> >> >>
>> >> >
>> >>
>> >>
>>
>
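
As a rough illustration of the (seconds: i64, picoseconds: i64) layout discussed in the quoted thread, the Python sketch below splits a single wide picosecond count into the two fields and back. The function names (split_picos, join_picos, to_seconds_timestamp) are hypothetical and are not taken from the Arrow proposal or from any implementation. It also assumes the fractional field holds the picoseconds within one second, which is only one possible reading of the layout. Python integers are arbitrary precision, so the 128-bit value is modelled as a plain int.

```python
PS_PER_SECOND = 10**12
SECONDS_PER_DAY = 86_400
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def split_picos(total_ps: int) -> tuple[int, int]:
    """Split a picosecond count since the epoch into (seconds, picoseconds)."""
    # divmod floors, so the fractional field is always in [0, 10**12),
    # including for instants before the epoch.
    seconds, picos = divmod(total_ps, PS_PER_SECOND)
    if not INT64_MIN <= seconds <= INT64_MAX:
        raise OverflowError("seconds field does not fit in an int64")
    return seconds, picos


def join_picos(seconds: int, picos: int) -> int:
    """Inverse of split_picos: rebuild the single wide integer."""
    return seconds * PS_PER_SECOND + picos


def to_seconds_timestamp(pair: tuple[int, int]) -> int:
    """Cast to second precision by dropping the fractional field."""
    return pair[0]


# Bit budgets for the quantities mentioned in the thread.
bits_ps_per_second = (PS_PER_SECOND - 1).bit_length()
bits_ps_per_day = (SECONDS_PER_DAY * PS_PER_SECOND - 1).bit_length()

# Range, in days either side of the epoch, of a single int64 picosecond count.
int64_ps_range_days = INT64_MAX / PS_PER_SECOND / SECONDS_PER_DAY
```

By this arithmetic the picoseconds within one second need 40 bits and the picoseconds within one day need 57 bits, both well inside an i64, while a single int64 picosecond count covers only about 107 days on either side of the epoch. The cast to second precision touches only the first field, so no wide division is involved.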
