That is a great suggestion Wes, thank you.

I wonder if we could get away with a 128 bit representation that is the
concatenation of the two existing interval types (YearMonth)(DayTime). Or
maybe even define a `struct` type with those fields that is used by
DataFusion.

Basically, given our reading of the Arrow spec[1], it is currently not
possible to precisely represent an interval that has both monthly and
sub-montly granularity.

As Dmtry says, if you have an interval seemingly simple like  1 month, 1 day

Using IntervalUnit(YEAR_MONTH) can't represent the 1 day
Using IntervalUnit(DAY_TIME) can't represent the month as different months
have different numbers of days

[1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L249-L260


On Wed, Feb 17, 2021 at 5:01 PM Wes McKinney <wesmck...@gmail.com> wrote:

> On Wed, Feb 17, 2021 at 3:46 PM <t...@dmtry.me> wrote:
> >
> > > It's unclear to me that this needs to be introduced into the top-level
> >
> > Similar thing to columnar format, How to store interval like 1 month 1
> day 1 hour? It’s not possible to do it without converting 1 month to 30
> days, which is a bad way.
> >
>
> Presumably you can represent a complex interval in a fixed number of
> bytes, and then embed the data in a FixedSizeBinary type. You can
> adorn this type with extension type metadata so that DataFusion can
> then apply Interval semantics to it. This could also serve as an
> interim strategy for you to proceed with implementation while
> proposing a top-level type to the Arrow format (which may or may not
> be accepting) so you aren't blocked on acceptance of changes into
> Schema.fbs.
>
> > > On 17 Feb 2021, at 21:02, Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > It's unclear to me that this needs to be introduced into the top-level
> > > columnar format without more analysis — have you considered
> > > implementing this for DataFusion as an extension type for the time
> > > being?
> > >
> > > On Wed, Feb 17, 2021 at 11:59 AM t...@dmtry.me <mailto:t...@dmtry.me>
> <t...@dmtry.me <mailto:t...@dmtry.me>> wrote:
> > >>
> > >> Hi,
> > >>
> > >> For now, There are only two types of IntervalUnit inside Arrow:
> > >>
> > >> - YearMonth - month stored as int32
> > >> - DayTime - days as int32 and time in milliseconds  as in32. Total
> (64 bites)
> > >>
> > >> Since DF is using Arrow, It’s not possible to store “Complex”
> intervals such 1 MONTH 1 DAY 1 HOUR.
> > >> I think, the best way to understand the problem will be to read a
> comment from DF codebase:
> https://github.com/apache/arrow/blob/bca7d2fe84ccd8fc1129cb4d85448eb0779c52c3/rust/datafusion/src/sql/planner.rs#L1148
> > >>
> > >>        // Interval is tricky thing
> > >>        // 1 day is not 24 hours because timezones, 1 year != 365/364!
> 30 days != 1 month
> > >>        // The true way to store and calculate intervals is to store
> it as it defined
> > >>        // Due the fact that Arrow supports only two types YearMonth
> (month) and DayTime (day, time)
> > >>        // It's not possible to store complex intervals
> > >>        // It's possible to do select (NOW() + INTERVAL '1 year') +
> INTERVAL '1 day'; as workaround
> > >>        if result_month != 0 && (result_days != 0 || result_millis !=
> 0) {
> > >>            return Err(DataFusionError::NotImplemented(format!(
> > >>                "DF does not support intervals that have both a
> Year/Month part as well as Days/Hours/Mins/Seconds: {:?}. Hint: try
> breaking the interval into two parts, one with Year/Month and the other
> with Days/Hours/Mins/Seconds - e.g. (NOW() + INTERVAL '1 year') + INTERVAL
> '1 day'",
> > >>                value
> > >>            )));
> > >>        }
> > >>
> > >>
> > >>
> > >> I prepared a PR https://github.com/apache/arrow/pull/9516/files <
> https://github.com/apache/arrow/pull/9516/files> <
> https://github.com/apache/arrow/pull/9516/files <
> https://github.com/apache/arrow/pull/9516/files>> that introduce a new
> type for IntervalUnit called Complex, that store both YearMonth and DayTime
> to support complex interval.
> > >> I didn’t find any page/documentation on how to do RFC in Arrow
> protocol, so can anyone point me to it or PR with email will be enough?
> > >>
> > >> Thanks.
> >
>

Reply via email to