Since the representation of these as two integers is quite different from the existing timestamp/duration representation, should these be canonical extension types rather than a change to the FlatBuffers spec?
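To make that concrete, here is a minimal sketch (Python with pyarrow, purely for illustration; the type name, field names, and example values are hypothetical and not part of the proposal) of the two-integer layout Felipe describes below, expressed as an extension type over struct<seconds: int64, picoseconds: int64>:

import pyarrow as pa


class PicoTimestampType(pa.ExtensionType):
    """Hypothetical extension type: (seconds since epoch, picoseconds within second)."""

    def __init__(self):
        storage = pa.struct([
            ("seconds", pa.int64()),      # seconds since the UNIX epoch
            ("picoseconds", pa.int64()),  # 0 <= picoseconds < 10**12
        ])
        super().__init__(storage, "example.timestamp_pico")

    def __arrow_ext_serialize__(self):
        return b""  # no parameters to serialize in this sketch

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        return cls()


ty = PicoTimestampType()
storage = pa.array(
    [{"seconds": 1_763_700_000, "picoseconds": 123_456_789_012}],
    type=ty.storage_type,
)
arr = pa.ExtensionArray.from_storage(ty, storage)

# Dropping to seconds precision is just a field access plus an int64 cast;
# no 128-bit arithmetic is needed.
seconds = arr.storage.field("seconds").cast(pa.timestamp("s"))

An extension type along these lines would need no FlatBuffers change at all, whereas the timestamp128/duration128 route would.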
On Fri, Nov 21, 2025 at 9:57 AM Tim Swena via dev <[email protected]> wrote:

> Correction: I looked deeper into the BigQuery and Trino implementations,
> and both are using 2 separate integers as Felipe is proposing. I think it's
> worth updating the proposal to reflect this layout. Thanks, folks!
>
> Tim Sweña (Swast)
> Team Lead, BigQuery DataFrames
> Google Cloud Platform
> Chicago, IL, USA
>
>
> On Fri, Nov 21, 2025 at 9:37 AM Tim Swena <[email protected]> wrote:
>
> > > Would there be any reason to have (or not have) a canonical LogicalType
> > > for these in Parquet as well?
> >
> > I think it would be appropriate to add this to Parquet as well. I assume
> > there's a different process / mailing list for that?
> >
> > > our goal here should be to standardize existing practice, not come up
> > > with a novel representation, IMHO.
> >
> > BigQuery is using 128 bits, which is why I went with this proposal.
> >
> > Trino is using 96 bits (
> > https://github.com/trinodb/trino/blob/eef66628759d7244c176f62be45f3d9f0e5a1a5d/core/trino-spi/src/main/java/io/trino/spi/type/LongTimestampType.java
> > ), but it doesn't seem to me that would be much more efficient compared to 128.
> >
> > Tim Sweña (Swast)
> > Team Lead, BigQuery DataFrames
> > Google Cloud Platform
> > Chicago, IL, USA
> >
> >
> > On Wed, Nov 19, 2025 at 3:35 AM Antoine Pitrou <[email protected]> wrote:
> >
> >> I don't have a personal opinion on which representation is technically
> >> better, but our goal here should be to standardize existing practice,
> >> not come up with a novel representation, IMHO.
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >> On 18/11/2025 at 23:45, Felipe Oliveira Carvalho wrote:
> >> > One reason to avoid 128-bit integers is the requirement for 128-bit
> >> > operations that it creates. Many high-resolution time representations
> >> > split the value into two integers in a way that is useful for many
> >> > time-related operations.
> >> >
> >> > The picosecond resolution can be achieved by splitting into a
> >> > (seconds: i64, picoseconds: i64) pair where the number of picoseconds
> >> > in a day can fit in 53 bits and the number of seconds can represent
> >> > much more than 10K years in number of seconds.
> >> >
> >> > This removes the need for a 128-bit division by 86400 to do anything
> >> > interesting with the picosecond timestamp. This layout could be a
> >> > Canonical Extension Type proposal, with the seconds timestamp field
> >> > being one of the existing timestamp types, allowing for very cheap
> >> > casts from the extension type to the timestamp with precision in
> >> > seconds.
> >> >
> >> > --
> >> > Felipe
> >> >
> >> > On Tue, Nov 18, 2025 at 6:22 PM Curt Hagenlocher <[email protected]>
> >> > wrote:
> >> >
> >> >> For both Duration and Timestamp, this would require adding a new field
> >> >> to the FlatBuffers spec. That should be okay, right?
> >> >>
> >> >> A 128-bit timestamp would be useful at a nanosecond scale as well;
> >> >> there are databases like Snowflake which support a precision and scale
> >> >> for timestamps that force either truncation of precision or clipping
> >> >> of range when representing as Arrow.
> >> >>
> >> >> Would there be any reason to have (or not have) a canonical
> >> >> LogicalType for these in Parquet as well?
> >> >>
> >> >> On Fri, Nov 7, 2025 at 1:29 PM Tim Swena <[email protected]> wrote:
> >> >>>
> >> >>> Hello,
> >> >>>
> >> >>> Per the process described at
> >> >>> https://arrow.apache.org/docs/format/Changing.html#discussion-and-voting-process
> >> >>> I am starting a discussion thread for the following spec change proposal:
> >> >>>
> >> >>> 1. Add a new time unit: PICOSECOND, which is unsupported in the
> >> >>> existing 64-bit timestamp-related types.
> >> >>>
> >> >>> 2. Add support for bitWidth=128 to the timestamp data type, which
> >> >>> supports all units, including PICOSECOND.
> >> >>>
> >> >>> 3. Add support for bitWidth=128 to the duration data type, which
> >> >>> supports all units, including PICOSECOND.
> >> >>>
> >> >>> This is motivated by some currently experimental changes in BigQuery to
> >> >>> support picosecond precision timestamps (source:
> >> >>> https://docs.cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1?content_ref=read%20api%20will%20return%20full%20precision%20picosecond%20value%20the%20value%20will%20be%20encoded%20as%20a%20string%20which%20conforms%20to%20iso%208601%20format#picostimestampprecision
> >> >>> ), but from what I can tell such timestamps already have some support
> >> >>> in IBM Db2 (source:
> >> >>> https://www.ibm.com/docs/en/db2-for-zos/13.0.0?topic=jdbc-dbtimestamp-class&content_ref=the+com+ibm+db2+jcc+dbtimestamp+class+can+be+used+to+create+timestamp+objects+with+a+precision+of+up+to+picoseconds+and+time+zone+information
> >> >>> ) and Trino (source:
> >> >>> https://trino.io/docs/current/language/types.html?content_ref=heading+calendar+date+and+time+of+day+without+a+time+zone+with+pdigits+of+precision+for+the+fraction+of+seconds+a+precision+of+up+to+12+picoseconds+is+supported
> >> >>> ).
> >> >>>
> >> >>> Note that the reference implementation(s) are still very much a work in
> >> >>> progress (https://github.com/apache/arrow/pull/48018 for a start in C++),
> >> >>> but I figured it would be useful to kick off the conversation before
> >> >>> diving too much further into implementation.
> >> >>>
> >> >>> Inspired by other discussions, I've created a draft of a more formal RFC
> >> >>> document here: Arrow-RFC: timestamp128 and duration128 data types with
> >> >>> support for picosecond units
> >> >>> https://docs.google.com/document/d/1-S0qvYTIEGlLnNkkgyWSHfnIvU4xpFqDQuMNTojaj9A/edit?tab=t.0#heading=h.as1aixu509k7
> >> >>>
> >> >>> Tim Sweña (Swast)
> >> >>> Team Lead, BigQuery DataFrames
> >> >>> Google Cloud Platform
> >> >>> Chicago, IL, USA
