Hi Micah,

Thanks a lot for the feedback!

> The second type, I think, can just be called Duration and be parameterized
> with a single enum that contains nanoseconds (which allows it to expand to
> support other granularities if needed)

The parameterized approach was also mentioned in the last discussion. It seems
that most people prefer a simpler type with a fixed representation that can
cover the majority of use cases. I recall people pointed out that Parquet has a
type using a parameterized representation, but it turned out to be overly
complex and didn’t work out well in practice, though I don’t have detailed
insight into that myself.

Maybe @Russell Spitzer <russell.spit...@snowflake.com> has better insight into
the parameterized representation story?

Best Regards,
Yun

On Tue, Jul 8, 2025 at 10:03 AM Micah Kornfield <emkornfi...@gmail.com> wrote:

> > I'm in favor of the CalendarDuration and TimeDuration
>
> I'm bikeshedding now but:
>
> YearMonthInterval I think works for the first type, the language and type
> lines up with ANSI SQL and an Arrow type so I think there is little
> ambiguity.
>
> The second type, I think, can just be called Duration and be parameterized
> with a single enum that contains nanoseconds (which allows it to expand to
> support other granularities if needed). Thoughts? IIUC based on all the
> discussion Day Time Interval as proposed aligns with Arrow's definition
> <https://github.com/apache/arrow/blob/main/format/Schema.fbs#L423> [1] of
> Duration?
>
> > I don't have a problem with FLBA(10) but I would hope
> > we could do some better encoding tricks with an Int128
>
> Yes, hopefully we can get some better integer encodings in place that
> would apply across the board.
>
> Cheers,
> Micah
>
> [1] https://github.com/apache/arrow/blob/main/format/Schema.fbs#L423
>
> On Tue, Jul 8, 2025 at 9:28 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
>
> > I'm in favor of the CalendarDuration and TimeDuration types as better names
> > for what we are trying to express here. I also think going forward with
> > Int64 for now probably makes sense with us also doing some work to start
> > getting an official int128 in as well. I don't have a problem with FLBA(10)
> > but I would hope we could do some better encoding tricks with an Int128.
> > I'm relatively a novice in this area so take that with a grain of salt.
> >
> > On Mon, Jul 7, 2025 at 10:39 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >
> > > > However, the reverse is not guaranteed: a MonthDayNano value cannot
> > > > reliably be converted back into a DayTimeInterval. This is because
> > > > there's no way to determine whether the calendar component is used
> > > > without looking into the data, which introduces ambiguity. This
> > > > ambiguity can negatively impact interoperability across different
> > > > engines and systems.
> > >
> > > Ultimately, this is something that systems will need to deal with at some
> > > point but this can be delayed until someone has the bandwidth to have a
> > > formal proposal for persisting MonthDayNano in parquet (and it would still
> > > be up to the consuming system on how to do the translation so I'm not
> > > clear that defining the translation is strictly necessary).
> > >
> > > > Regarding whether we should use FLBA(16) or INT128, while INT128 does
> > > > have a natural fitting for ordering, I think one concern I had is if
> > > > that type will only be used by the Day Time Interval.
> > >
> > > I think there are a few use-cases that have at least been mentioned where
> > > it would be useful to have int128:
> > >
> > > 1. A replacement for int96 timestamp that can handle the full range of
> > >    ANSI SQL Nanoseconds.
> > > 2. Picoseconds has at least been mentioned in passing and that would
> > >    require int128.
> > >
> > > If we don't model it as a 128 we should minimize the range to reflect what
> > > ANSI SQL requires (i.e. FLBA(10) I believe). We should probably allow the
> > > logical type to annotate both int64 and FLBA(10), since int64 is a common
> > > representation for nanoseconds (this is similar to what we already do for
> > > Decimal values).
> > >
> > > > Regarding the name for DayTimeInterval, if we all agree that "Duration"
> > > > provides better clarity, I'm fully on board with using that instead.
> > >
> > > +1, IIUC I think this addresses the majority of concerns. If others in the
> > > community want to define a parquet representation for MonthDayNanos arrow
> > > interval that would be welcome as well. I think the main question then
> > > becomes on Arrow side if we want to define the new type or deal with the
> > > unlikely case of overflow for the duration type.
> > >
> > > On Mon, Jul 7, 2025 at 4:38 PM yun zou <yunzou.colost...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Thanks all for the valuable feedback!
> > > >
> > > > Regarding the MonthDayNano type, one important point that may not be
> > > > explicitly stated is the lack of true interoperability between
> > > > YearMonthInterval, DayTimeInterval, and MonthDayNano.
> > > >
> > > > While YearMonthInterval and DayTimeInterval are not directly
> > > > interoperable with each other, they can both be converted into
> > > > MonthDayNano by setting certain components to zero.
> > > > However, the reverse is not guaranteed: a MonthDayNano value cannot
> > > > reliably be converted back into a DayTimeInterval. This is because
> > > > there's no way to determine whether the calendar component is used
> > > > without looking into the data, which introduces ambiguity. This
> > > > ambiguity can negatively impact interoperability across different
> > > > engines and systems.
> > > >
> > > > > Doesn't capture semantics for engines that treat day as a calendar
> > > > > type.
> > > >
> > > > I don't actually see the above as a drawback of introducing two
> > > > separate interval types, since when the day is used as a calendar type,
> > > > it can be mapped to the MonthDayNano type.
> > > > In fact, I believe all three types are necessary to fully support the
> > > > range of use cases.
> > > > What’s important is that we clearly define the interoperability rules
> > > > between them to ensure consistent behavior across systems.
> > > >
> > > > > While I understand the desire to be able to represent all values
> > > > > allowable in ANSI SQL, I really don't understand why our types should
> > > > > not be allowed to represent any values *outside* of the range allowed
> > > > > in ANSI SQL.
> > > >
> > > > I completely agree: if there are valid use cases beyond ANSI SQL, we
> > > > should absolutely support them. It makes sense to leave range validation
> > > > to the engine or client implementation, as they are best suited to
> > > > handle their own specific requirements.
> > > >
> > > > Regarding whether we should use FLBA(16) or INT128, while INT128 does
> > > > have a natural fitting for ordering, I think one concern I had is if
> > > > that type will only be used by the Day Time Interval.
> > > >
> > > > Regarding the name for DayTimeInterval, if we all agree that "Duration"
> > > > provides better clarity, I'm fully on board with using that instead.
> > > >
> > > > Best Regards,
> > > > Yun
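
For reference, the interoperability point raised in the thread (both narrower types widen into MonthDayNano by zeroing components, but going back requires looking at the data) could be sketched roughly as below. This is only an illustration, not a proposed spec: the class names mirror the type names used in the discussion, while the field layout and the helper names `from_year_month`, `from_day_time`, and `to_day_time` are assumptions made up for the example. It also assumes DayTimeInterval is a fixed duration held as a single nanosecond count, as the thread suggests when comparing it to Arrow's Duration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class YearMonthInterval:
    months: int  # calendar months (hypothetical layout)


@dataclass(frozen=True)
class DayTimeInterval:
    nanos: int  # fixed-length duration in nanoseconds (hypothetical layout)


@dataclass(frozen=True)
class MonthDayNano:
    months: int  # calendar component
    days: int    # calendar component
    nanos: int   # fixed-length component


def from_year_month(v: YearMonthInterval) -> MonthDayNano:
    # Widening is always possible: zero the unused components.
    return MonthDayNano(months=v.months, days=0, nanos=0)


def from_day_time(v: DayTimeInterval) -> MonthDayNano:
    # Widening is always possible: zero the calendar components.
    return MonthDayNano(months=0, days=0, nanos=v.nanos)


def to_day_time(v: MonthDayNano) -> Optional[DayTimeInterval]:
    # Narrowing depends on the value itself: if any calendar component
    # is set, there is no unambiguous fixed-length equivalent.
    if v.months != 0 or v.days != 0:
        return None
    return DayTimeInterval(nanos=v.nanos)
```

The narrowing helper returns `None` instead of guessing a day length, which is one way to surface the ambiguity; an engine could equally choose to raise an error or apply its own translation rule.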