Hi Micah,

Thanks a lot for the feedback!

> The second type, I think, can just be called Duration and be parameterized
> with a single enum that contains nanoseconds (which allows it to expand to
> support other granularities if needed)

The parameterized approach was also mentioned in the last discussion. It
seems that most people prefer a simpler type with a fixed representation that
can cover the majority of use cases.
I recall people pointed out that Parquet has a type using a parameterized
representation, but it turned out to be overly complex and didn't work out well
in practice, though I don't have detailed insight into that myself.
Maybe @Russell Spitzer <russell.spit...@snowflake.com> has better insight
into the parameterized representation story?
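The Python sketch below makes the trade-off concrete. The names `TimeUnit`, `ParameterizedDuration` and `FixedDuration` are hypothetical and not part of the Parquet format. The `MICROS` member only stands in for a granularity that might be added later.

```python
from dataclasses import dataclass
from enum import Enum


class TimeUnit(Enum):
    # Hypothetical unit parameter. NANOS is the only unit proposed so far;
    # MICROS stands in for a granularity that might be added later.
    NANOS = "nanos"
    MICROS = "micros"


# Scale factor from each unit to nanoseconds.
NANOS_PER_UNIT = {TimeUnit.NANOS: 1, TimeUnit.MICROS: 1_000}


@dataclass(frozen=True)
class ParameterizedDuration:
    value: int       # raw stored integer
    unit: TimeUnit   # carried in the logical type annotation

    def to_nanos(self) -> int:
        # Every reader must consult the unit before interpreting the value.
        return self.value * NANOS_PER_UNIT[self.unit]


@dataclass(frozen=True)
class FixedDuration:
    nanos: int  # always nanoseconds; nothing to dispatch on

    def to_nanos(self) -> int:
        return self.nanos


print(ParameterizedDuration(3, TimeUnit.MICROS).to_nanos())  # → 3000
print(FixedDuration(3_000).to_nanos())                       # → 3000
```

The parameterized form keeps the door open for other units, but every reader must dispatch on the unit. The fixed form is simpler but commits to nanoseconds.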

Best Regards,
Yun


On Tue, Jul 8, 2025 at 10:03 AM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> >
> > I'm in favor of the CalendarDuration and TimeDuration
>
>
> I'm bikeshedding now but:
>
> YearMonthInterval I think works for the first type, the language and type
> lines up with ANSI SQL and an Arrow type so I think there is little
> ambiguity.
>
> The second type, I think, can just be called Duration and be parameterized
> with a single enum that contains nanoseconds (which allows it to expand to
> support other granularities if needed).  Thoughts?  IIUC based on all the
> discussion Day Time Interval as proposed aligns with Arrow's definition
> <https://github.com/apache/arrow/blob/main/format/Schema.fbs#L423> [1] of
> Duration?
>
> > I don't have a problem with FLBA(10) but I would hope
> > we could do some better encoding tricks with an Int128
>
>
> Yes, hopefully we can get some better integer encodings in place that
> would apply across the board.
>
> Cheers,
> Micah
>
> [1] https://github.com/apache/arrow/blob/main/format/Schema.fbs#L423
>
> On Tue, Jul 8, 2025 at 9:28 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
>
> > I'm in favor of the CalendarDuration and TimeDuration types as better names
> > for what we are trying to express here. I also think going forward with
> > Int64 for now probably makes sense, with us also doing some work to start
> > getting an official int128 in as well. I don't have a problem with FLBA(10),
> > but I would hope we could do some better encoding tricks with an Int128.
> > I'm relatively new to this area, so take that with a grain of salt.
> >
> > On Mon, Jul 7, 2025 at 10:39 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >
> > > > However, the reverse is not guaranteed: a MonthDayNano value cannot
> > > > reliably be converted back into a DayTimeInterval. This is because
> > > > there's no way to determine whether the calendar component is used
> > > > without looking into the data, which introduces ambiguity. This
> > > > ambiguity can negatively impact interoperability across different
> > > > engines and systems.
> > >
> > >
> > > Ultimately, this is something that systems will need to deal with at some
> > > point, but this can be delayed until someone has the bandwidth to write a
> > > formal proposal for persisting MonthDayNano in Parquet (and it would still
> > > be up to the consuming system how to do the translation, so I'm not clear
> > > that defining the translation is strictly necessary).
> > >
> > >
> > > > Regarding whether we should use FLBA(16) or INT128, while INT128 does
> > > > have a natural fit for ordering, one concern I had is whether that type
> > > > will only be used by the Day Time Interval.
> > >
> > >
> > > I think there are a few use cases that have at least been mentioned where
> > > it would be useful to have int128:
> > >
> > > 1.  A replacement for the int96 timestamp that can handle the full range of
> > > ANSI SQL nanoseconds.
> > > 2.  Picoseconds have at least been mentioned in passing, and they would
> > > require int128.
> > >
> > > If we don't model it as a 128-bit integer, we should minimize the range to
> > > reflect what ANSI SQL requires (i.e. FLBA(10), I believe). We should
> > > probably allow the logical type to annotate both int64 and FLBA(10), since
> > > int64 is a common representation for nanoseconds (this is similar to what
> > > we already do for Decimal values).
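The FLBA(10) estimate can be checked with a little arithmetic. This sketch assumes the ANSI SQL day-time interval's day field has at most 9 digits of leading precision. That maximum is an assumption: check the standard for the exact limit. It also assumes the value is stored as a signed two's-complement nanosecond count. `signed_bytes_needed` is a hypothetical helper.

```python
NANOS_PER_DAY = 86_400 * 10**9


def signed_bytes_needed(max_abs: int) -> int:
    """Bytes needed to hold +/-max_abs as two's complement."""
    bits = max_abs.bit_length() + 1  # one extra bit for the sign
    return (bits + 7) // 8


# Assumption: a day field with up to 9 digits of leading precision.
MAX_DAY_DIGITS = 9
max_days = 10**MAX_DAY_DIGITS - 1
# Largest value: 999999999 days plus 23:59:59.999999999, in nanoseconds.
max_nanos = (max_days + 1) * NANOS_PER_DAY - 1

print(signed_bytes_needed(max_nanos))  # → 10
print((2**63 - 1) // NANOS_PER_DAY)    # → 106751 (whole days that fit in int64)
```

Under that assumption, the extreme value needs 78 bits including the sign, which rounds up to 10 bytes. A plain int64 of nanoseconds tops out at roughly 106,751 days, about 292 years.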
> > >
> > > > Regarding the name for DayTimeInterval, if we all agree that "Duration"
> > > > provides better clarity, I'm fully on board with using that instead.
> > >
> > >
> > > +1, IIUC I think this addresses the majority of concerns. If others in the
> > > community want to define a Parquet representation for the MonthDayNano
> > > Arrow interval, that would be welcome as well. I think the main question
> > > then becomes whether, on the Arrow side, we want to define a new type or
> > > deal with the unlikely case of overflow for the duration type.
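One way to picture the Arrow-side option of handling overflow: a reader decodes the wider Parquet value, then narrows it into the 64-bit signed integer that Arrow's Duration uses. If the value does not fit, the reader raises an error. The big-endian two's-complement layout mirrors how Parquet stores DECIMAL in FIXED_LEN_BYTE_ARRAY, but applying it to a Duration type is an assumption. `decode_flba_duration` and `to_int64_nanos` are hypothetical names.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
FLBA_WIDTH = 10  # hypothetical FLBA(10) physical type for Duration


def decode_flba_duration(raw: bytes) -> int:
    # Assumption: big-endian two's complement, as Parquet uses for DECIMAL on FLBA.
    if len(raw) != FLBA_WIDTH:
        raise ValueError(f"expected {FLBA_WIDTH} bytes, got {len(raw)}")
    return int.from_bytes(raw, byteorder="big", signed=True)


def to_int64_nanos(nanos: int) -> int:
    # Arrow's Duration stores a 64-bit signed integer, so wider values cannot
    # be represented; surface that instead of silently wrapping around.
    if not INT64_MIN <= nanos <= INT64_MAX:
        raise OverflowError(f"{nanos} ns does not fit in a 64-bit duration")
    return nanos


one_day = (86_400 * 10**9).to_bytes(FLBA_WIDTH, byteorder="big", signed=True)
print(to_int64_nanos(decode_flba_duration(one_day)))  # → 86400000000000
```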
> > >
> > >
> > >
> > > On Mon, Jul 7, 2025 at 4:38 PM yun zou <yunzou.colost...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Thanks all for the valuable feedback!
> > > >
> > > > Regarding the MonthDayNano type, one important point that may not have
> > > > been explicitly stated is the lack of true interoperability between
> > > > YearMonthInterval, DayTimeInterval, and MonthDayNano.
> > > >
> > > > While YearMonthInterval and DayTimeInterval are not directly
> > > > interoperable with each other, they can both be converted into
> > > > MonthDayNano by setting certain components to zero.
> > > > However, the reverse is not guaranteed: a MonthDayNano value cannot
> > > > reliably be converted back into a DayTimeInterval. This is because
> > > > there's no way to determine whether the calendar component is used
> > > > without looking into the data, which introduces ambiguity. This
> > > > ambiguity can negatively impact interoperability across different
> > > > engines and systems.
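The asymmetry can be sketched in a few lines of Python. `MonthDayNano` here only models the three fields of Arrow's MONTH_DAY_NANO interval (months, days, nanoseconds). The conversion functions are hypothetical and not an agreed specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MonthDayNano:
    # Field layout modeled on Arrow's MONTH_DAY_NANO interval.
    months: int
    days: int
    nanoseconds: int


def from_year_month(months: int) -> MonthDayNano:
    # YearMonthInterval carries only a month count; zero the day and time parts.
    return MonthDayNano(months, 0, 0)


def from_duration(nanos: int) -> MonthDayNano:
    # A fixed-length duration has no calendar parts; zero months and days.
    return MonthDayNano(0, 0, nanos)


def to_duration(value: MonthDayNano) -> Optional[int]:
    # Only safe when no calendar component is present. A non-zero `days` may
    # mean a calendar day (23 or 25 hours across a DST change), so it is not
    # silently treated as 24 hours. The decision depends on each value's
    # contents, not on the type alone.
    if value.months != 0 or value.days != 0:
        return None
    return value.nanoseconds
```

This sketch puts the whole duration in the nanoseconds field. Splitting it into days plus nanoseconds would reintroduce the calendar-day ambiguity on the way back.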
> > > >
> > > > > Doesn't capture semantics for engines that treat day as a calendar type.
> > > > I don't actually see the above as a drawback of introducing two separate
> > > > interval types, since when the day is used as a calendar type, it can be
> > > > mapped to the MonthDayNano type.
> > > > In fact, I believe all three types are necessary to fully support the
> > > > range of use cases. What's important is that we clearly define the
> > > > interoperability rules between them to ensure consistent behavior across
> > > > systems.
> > > >
> > > > > While I understand the desire to be able to represent all values
> > > > > allowable in ANSI SQL, I really don't understand why our types should
> > > > > not be allowed to represent any values *outside* of the range allowed
> > > > > in ANSI SQL.
> > > > I completely agree: if there are valid use cases beyond ANSI SQL, we
> > > > should absolutely support them. It makes sense to leave range validation
> > > > to the engine or client implementation, as they are best suited to handle
> > > > their own specific requirements.
> > > >
> > > > Regarding whether we should use FLBA(16) or INT128, while INT128 does
> > > > have a natural fit for ordering, one concern I had is whether that type
> > > > will only be used by the Day Time Interval.
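The ordering point can be illustrated with a small sketch. Parquet's default column order for an unannotated FIXED_LEN_BYTE_ARRAY compares bytes as unsigned values. A signed 128-bit value stored big-endian in FLBA(16) would therefore sort negative values after positive ones. That is avoided only if the logical type defines signed ordering (as DECIMAL does) or the encoding flips the sign bit. The helper names are hypothetical, and the example only demonstrates the byte-ordering issue.

```python
def encode_be(value: int, width: int = 16) -> bytes:
    # Big-endian two's complement, the layout Parquet uses for DECIMAL on FLBA.
    return value.to_bytes(width, byteorder="big", signed=True)


def order_preserving_key(value: int, width: int = 16) -> bytes:
    # Flipping the sign bit makes unsigned byte-wise comparison match signed order.
    raw = bytearray(encode_be(value, width))
    raw[0] ^= 0x80
    return bytes(raw)


values = [5, -1, 0, -5, 1]
print(sorted(values, key=encode_be))             # → [0, 1, 5, -5, -1]
print(sorted(values, key=order_preserving_key))  # → [-5, -1, 0, 1, 5]
```

A native INT128 physical type would carry signed ordering directly, which is the "natural fit" referred to above.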
> > > >
> > > > Regarding the name for DayTimeInterval, if we all agree that "Duration"
> > > > provides better clarity, I'm fully on board with using that instead.
> > > >
> > > > Best Regards,
> > > > Yun
> > > >
> > >
> >
>
