> The Arrow format supports 64-bit durations of seconds, milliseconds,
> microseconds and nanoseconds. It would make sense for Parquet to
> roundtrip these types IMHO.

Introducing parameterization would definitely add complexity and increase the number of logical types. Also, Parquet currently doesn’t support seconds as a time unit. Given that, it might be more practical to start with a concrete type like NanoDuration, and add other *Unit*Duration types later if needed.

Following up on the related discussion about int128 vs FLBA(16) (see: https://lists.apache.org/thread/7zfwc3o53btd2xbdb8bqf8lxsrk76cxr), there seems to be a general preference for sticking with FLBA(16) rather than introducing a new int128 type.

If that’s the direction, should we define NanoDuration directly on the widest size, FLBA(16), or allow the annotation on both int64 and FLBA(16)? My inclination is to go with the wider type from the start to keep things simpler, but I’d love to hear others’ thoughts on this.

Best Regards,
Yun Zou

On Fri, Jul 11, 2025 at 9:23 AM Antoine Pitrou <[email protected]> wrote:

> On Thu, 10 Jul 2025 17:18:34 -0700
> yun zou <[email protected]> wrote:
> > > I think the point was raised previously that hard-coded names were
> > > preferred but I don't recall if that was when we were still calling this
> > > DayTime?
> >
> > I believe the main concern around naming is focused on whether to
> > use *"Duration"* or *"Interval"*, rather than the inclusion of *"Nano"* in
> > the type name.
> >
> > As for the parameterized time unit, the primary issue seems to be that
> > the *physical type size would vary depending on the unit*; for example,
> > using int32 for milliseconds and int64 for microseconds. However, it sounds
> > like the proposal is to use the same physical type, the unit is just used to
> > indicate the type name.
>
> Certainly the latter, IMHO.
>
> > > I do think it's reasonable to parameterize `TimeUnit`
> > > for consistency and future proofing but for now we should say it only
> > > supports Nanoseconds
> >
> > It feels a bit odd to introduce a parameter when it currently only supports
> > a single value.
> > An alternative could be to start with a concrete type like *NanoDuration*,
> > and if future requirements arise, we can consider adding new logical types
> > such as *MicroDuration*, etc.
> > The disadvantage is that the number of logical type definitions will
> > increase along with the units we want to support, but I doubt there will
> > be a lot.
>
> The Arrow format supports 64-bit durations of seconds, milliseconds,
> microseconds and nanoseconds. It would make sense for Parquet to
> roundtrip these types IMHO.
>
> Regards
>
> Antoine.
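
P.S. To make the int64 vs 16-byte FIXED_LEN_BYTE_ARRAY trade-off a bit more concrete, here is a rough Python sketch. It is only an illustration, not a proposal for the spec: the helper names are hypothetical, and the big-endian two's-complement layout is an assumption on my part (it mirrors how DECIMAL is stored in a fixed-length byte array), not something that has been decided for NanoDuration.

```python
# Hypothetical helpers for illustration only; none of this is in the Parquet spec.
NANOS_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, good enough for an estimate

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def int64_nanos_range_years() -> float:
    """Approximate span in years, on each side of zero, of int64 nanoseconds."""
    return INT64_MAX / NANOS_PER_SECOND / SECONDS_PER_YEAR


def fits_int64(nanos: int) -> bool:
    """True if a nanosecond count is representable as a signed 64-bit integer."""
    return INT64_MIN <= nanos <= INT64_MAX


def encode_fixed16(nanos: int) -> bytes:
    """Encode a nanosecond count as 16 bytes of two's complement.

    Assumption: big-endian byte order, mirroring the DECIMAL convention.
    """
    return nanos.to_bytes(16, byteorder="big", signed=True)


def decode_fixed16(raw: bytes) -> int:
    """Inverse of encode_fixed16."""
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    return int.from_bytes(raw, byteorder="big", signed=True)


if __name__ == "__main__":
    print(f"int64 nanoseconds cover roughly +/- {int64_nanos_range_years():.1f} years")

    # 300 years (of 365 days) in nanoseconds does not fit in int64 ...
    three_centuries = 300 * 365 * 24 * 3600 * NANOS_PER_SECOND
    print("fits in int64:", fits_int64(three_centuries))

    # ... but round-trips through the 16-byte encoding.
    print("round-trips:", decode_fixed16(encode_fixed16(three_centuries)) == three_centuries)
```

So an int64-backed NanoDuration tops out at a bit under three centuries in either direction, while the 16-byte form has far more headroom at the cost of twice the storage per value before encoding and compression.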
