That is a great suggestion, Wes, thank you. I wonder if we could get away with a 128-bit representation that is the concatenation of the two existing interval types (YearMonth)(DayTime). Or maybe we could even define a `struct` type with those fields for DataFusion to use.
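
To make the idea concrete, here is a minimal sketch of such a struct and a fixed-width packing for it. Everything in it is an assumption for illustration: `ComplexInterval`, its field names, the little-endian field order, and the zero padding are hypothetical and are not part of Arrow or DataFusion. The three int32 fields only need 96 bits, so the sketch pads to 16 bytes to get the 128-bit width, which would also fit a FixedSizeBinary(16) value.

```rust
/// Hypothetical combined interval: the YearMonth field followed by the
/// two DayTime fields. This is an illustration, not an Arrow type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ComplexInterval {
    pub months: i32, // same meaning as IntervalUnit(YEAR_MONTH)
    pub days: i32,   // first field of IntervalUnit(DAY_TIME)
    pub millis: i32, // second field of IntervalUnit(DAY_TIME)
}

impl ComplexInterval {
    /// Packed width in bytes: 96 bits of payload padded to 128 bits.
    pub const BYTE_WIDTH: usize = 16;

    /// Pack as little-endian months | days | millis | 4 zero bytes.
    /// The layout is an assumption made for this sketch.
    pub fn to_bytes(self) -> [u8; ComplexInterval::BYTE_WIDTH] {
        let mut out = [0u8; ComplexInterval::BYTE_WIDTH];
        out[0..4].copy_from_slice(&self.months.to_le_bytes());
        out[4..8].copy_from_slice(&self.days.to_le_bytes());
        out[8..12].copy_from_slice(&self.millis.to_le_bytes());
        out
    }

    /// Inverse of `to_bytes`; the 4 padding bytes are ignored.
    pub fn from_bytes(bytes: [u8; ComplexInterval::BYTE_WIDTH]) -> Self {
        // Read one little-endian i32 starting at `start`.
        let field = |start: usize| {
            let mut buf = [0u8; 4];
            buf.copy_from_slice(&bytes[start..start + 4]);
            i32::from_le_bytes(buf)
        };
        ComplexInterval {
            months: field(0),
            days: field(4),
            millis: field(8),
        }
    }
}
```

With this layout, `ComplexInterval { months: 1, days: 1, millis: 3_600_000 }` keeps 1 month, 1 day and 1 hour as separate fields, so nothing has to be converted to a fixed number of days.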

Basically, given our reading of the Arrow spec [1], it is currently not possible to precisely represent an interval that has both monthly and sub-monthly granularity.

As Dmtry says, take a seemingly simple interval like 1 month, 1 day:

Using IntervalUnit(YEAR_MONTH) can't represent the 1 day
Using IntervalUnit(DAY_TIME) can't represent the month, as different months have different numbers of days

[1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L249-L260

On Wed, Feb 17, 2021 at 5:01 PM Wes McKinney <wesmck...@gmail.com> wrote:

> On Wed, Feb 17, 2021 at 3:46 PM <t...@dmtry.me> wrote:
> >
> > > It's unclear to me that this needs to be introduced into the top-level
> >
> > Similar thing to columnar format: how to store an interval like 1 month 1 day 1 hour? It’s not possible to do it without converting 1 month to 30 days, which is a bad way.
> >
>
> Presumably you can represent a complex interval in a fixed number of bytes, and then embed the data in a FixedSizeBinary type. You can adorn this type with extension type metadata so that DataFusion can then apply Interval semantics to it. This could also serve as an interim strategy for you to proceed with implementation while proposing a top-level type to the Arrow format (which may or may not be accepted) so you aren't blocked on acceptance of changes into Schema.fbs.
>
> > > On 17 Feb 2021, at 21:02, Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > It's unclear to me that this needs to be introduced into the top-level columnar format without more analysis; have you considered implementing this for DataFusion as an extension type for the time being?
> > >
> > > On Wed, Feb 17, 2021 at 11:59 AM t...@dmtry.me <t...@dmtry.me> wrote:
> > >>
> > >> Hi,
> > >>
> > >> For now, there are only two types of IntervalUnit inside Arrow:
> > >>
> > >> - YearMonth - months stored as int32
> > >> - DayTime - days as int32 and time in milliseconds as int32. Total (64 bits)
> > >>
> > >> Since DF is using Arrow, it’s not possible to store “Complex” intervals such as 1 MONTH 1 DAY 1 HOUR.
> > >> I think the best way to understand the problem is to read a comment from the DF codebase:
> > >> https://github.com/apache/arrow/blob/bca7d2fe84ccd8fc1129cb4d85448eb0779c52c3/rust/datafusion/src/sql/planner.rs#L1148
> > >>
> > >>     // Interval is tricky thing
> > >>     // 1 day is not 24 hours because timezones, 1 year != 365/364! 30 days != 1 month
> > >>     // The true way to store and calculate intervals is to store it as it defined
> > >>     // Due the fact that Arrow supports only two types YearMonth (month) and DayTime (day, time)
> > >>     // It's not possible to store complex intervals
> > >>     // It's possible to do select (NOW() + INTERVAL '1 year') + INTERVAL '1 day'; as workaround
> > >>     if result_month != 0 && (result_days != 0 || result_millis != 0) {
> > >>         return Err(DataFusionError::NotImplemented(format!(
> > >>             "DF does not support intervals that have both a Year/Month part as well as Days/Hours/Mins/Seconds: {:?}. Hint: try breaking the interval into two parts, one with Year/Month and the other with Days/Hours/Mins/Seconds - e.g. (NOW() + INTERVAL '1 year') + INTERVAL '1 day'",
> > >>             value
> > >>         )));
> > >>     }
> > >>
> > >> I prepared a PR https://github.com/apache/arrow/pull/9516/files that introduces a new type for IntervalUnit called Complex, which stores both YearMonth and DayTime to support complex intervals.
> > >> I didn’t find any page/documentation on how to do an RFC in the Arrow protocol, so can anyone point me to it, or will a PR with an email be enough?
> > >>
> > >> Thanks.