I'm in favor of the CalendarDuration and TimeDuration types as better names
for what we are trying to express here. I also think going forward with
Int64 for now probably makes sense, alongside some work to start getting an
official int128 in as well. I don't have a problem with FLBA(10), but I
would hope we could do some better encoding tricks with an Int128. I'm
relatively new to this area, so take that with a grain of salt.
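
To make the Int64 versus FLBA(10) trade-off concrete, here is a rough Python
sketch. The helper names are hypothetical, and the big-endian two's-complement
layout is an assumption borrowed from how Parquet already stores DECIMAL in a
fixed-length byte array; nothing in this thread has settled on it.

```python
# Sketch only: helper names are hypothetical, and the byte layout is an
# assumption (big-endian two's complement, as Parquet DECIMAL uses for FLBA).

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
NANOS_PER_DAY = 86_400 * 10**9  # fixed 24-hour days, no calendar semantics

# Largest whole number of 24-hour days an int64 nanosecond count can hold.
INT64_MAX_DAYS = INT64_MAX // NANOS_PER_DAY


def fits_int64(nanos: int) -> bool:
    """Return True if a nanosecond duration fits in a signed 64-bit integer."""
    return INT64_MIN <= nanos <= INT64_MAX


def encode_flba10(nanos: int) -> bytes:
    """Encode a nanosecond duration as 10 bytes, big-endian two's complement.

    Raises OverflowError if the value needs more than 80 bits.
    """
    return nanos.to_bytes(10, byteorder="big", signed=True)


def decode_flba10(raw: bytes) -> int:
    """Decode a 10-byte big-endian two's-complement nanosecond duration."""
    if len(raw) != 10:
        raise ValueError("expected exactly 10 bytes")
    return int.from_bytes(raw, byteorder="big", signed=True)
```

A value just past the int64 range, such as `INT64_MAX + 1` nanoseconds, fails
`fits_int64` but still round-trips through the 10-byte form.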

On Mon, Jul 7, 2025 at 10:39 PM Micah Kornfield <[email protected]>
wrote:

> >
> > However, the reverse is not guaranteed: a MonthDayNano value cannot
> > reliably be converted back
> > into a DayTimeInterval. This is because there's no way to determine
> whether
> > the calendar component
> > is used without looking into the data, which introduces ambiguity. This
> > ambiguity can negatively impact
> > interoperability across different engines and systems.
>
>
> Ultimately, this is something that systems will need to deal with at some
> point but this can be delayed until someone has the bandwidth to have a formal
> proposal for persisting MonthDayNano in parquet (and it would still be up
> to the consuming system on how to do the translation so I'm not clear that
> defining the translation is strictly necessary).
>
>
> > Regarding whether we should use FLBA(16) or INT128, while INT128 does
> have
> > a natural
> > fitting for ordering, I think one concern I had is if that type will only
> > be used by the Day Time Interval.
>
>
> I think there are a few use-cases that have at least been mentioned where
> it would be useful to have int128:
>
> 1.  A replacement for int96 timestamp that can handle the full range of
> ANSI SQL Nanoseconds.
> 2.  Picoseconds has at least been mentioned in passing and that would
> require int128.
>
> If we don't model it as an int128 we should minimize the range to reflect what
> ANSI SQL requires (i.e. FLBA(10) I believe). We should probably allow the
> logical type to annotate both int64 and FLBA(10), since int64 is a common
> representation for nanoseconds (this is similar to what we already do for
> Decimal values).
>
> > Regarding the name for DayTimeInterval, if we all agree that "Duration"
> > provides better clarity,
> > I'm fully on board with using that instead.
>
>
> +1, IIUC I think this addresses the majority of concerns.  If others in the
> community want to define a parquet representation for MonthDayNanos arrow
> interval that would be welcome as well. I think the main question then
> becomes on the Arrow side if we want to define the new type or deal with the
> unlikely case of overflow for the duration type.
>
>
>
> On Mon, Jul 7, 2025 at 4:38 PM yun zou <[email protected]> wrote:
>
> > Hi,
> >
> > Thanks all for the valuable feedback!
> >
> > Regarding the MonthDayNano type, one important point that may not be
> > explicitly stated
> > is the lack of true interoperability between YearMonthInterval,
> > DayTimeInterval, and MonthDayNano.
> >
> > While YearMonthInterval and DayTimeInterval are not directly
> interoperable
> > with each other,
> > they can both be converted into MonthDayNano by setting certain
> components
> > to zero.
> > However, the reverse is not guaranteed: a MonthDayNano value cannot
> > reliably be converted back
> > into a DayTimeInterval. This is because there's no way to determine
> whether
> > the calendar component
> > is used without looking into the data, which introduces ambiguity. This
> > ambiguity can negatively impact
> > interoperability across different engines and systems.
> >
> > > Doesn't capture semantics for engines that treat day as a calendar
> type.
> > I don't actually see the above as a drawback of introducing two separate
> > interval types,
> > since when the day is used as a calendar type, it can be mapped to the
> > MonthDayNano type.
> > In fact, I believe all three types are necessary to fully support the
> range
> > of use cases.
> > What’s important is that we clearly define the interoperability rules
> > between them to ensure
> > consistent behavior across systems.
> >
> > > While I understand the desire to be able to represent all values
> > > allowable in ANSI SQL, I really don't understand why our types should
> > > not be allowed to represent any values *outside* of the range allowed
> > > in ANSI SQL.
> > I completely agree—if there are valid use cases beyond ANSI SQL, we
> should
> > absolutely support them. It makes sense to leave range validation to the
> > engine or
> > client implementation, as they are best suited to handle their own
> specific
> > requirements.
> >
> > Regarding whether we should use FLBA(16) or INT128, while INT128 does
> have
> > a natural
> > fitting for ordering, I think one concern I had is if that type will only
> > be used by the Day Time Interval.
> >
> > Regarding the name for DayTimeInterval, if we all agree that "Duration"
> > provides better clarity,
> > I'm fully on board with using that instead.
> >
> > Best Regards,
> > Yun
> >
>

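The one-way conversion described in the quoted messages (YearMonthInterval and
DayTimeInterval widen into MonthDayNano, but not reliably back) can be sketched
roughly as below. All class and function names are hypothetical, and treating
the day field as a fixed 24 hours is an assumption that is itself part of the
ambiguity being discussed.

```python
# Sketch only: these names are illustrative and are not Arrow or Parquet APIs.
from dataclasses import dataclass
from typing import Optional

NANOS_PER_DAY = 86_400 * 10**9  # assumption: a day is a fixed 24 hours


@dataclass(frozen=True)
class MonthDayNano:
    months: int
    days: int
    nanos: int


def from_year_month(years: int, months: int) -> MonthDayNano:
    """Widen a year-month interval; the day and nanosecond parts are zero."""
    return MonthDayNano(months=years * 12 + months, days=0, nanos=0)


def from_day_time(days: int, nanos: int) -> MonthDayNano:
    """Widen a day-time interval; the month part is zero."""
    return MonthDayNano(months=0, days=days, nanos=nanos)


def to_day_time_nanos(value: MonthDayNano) -> Optional[int]:
    """Try to narrow back to a pure elapsed-time duration in nanoseconds.

    Returns None when the month component is set. The check needs the data
    itself: the type alone does not say whether calendar parts are in use.
    """
    if value.months != 0:
        return None
    return value.days * NANOS_PER_DAY + value.nanos
```

The narrowing step has to inspect every value, which is the interoperability
cost raised above: a reader cannot decide from the schema alone whether a
MonthDayNano column is safe to treat as a duration.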