On Wed, Feb 17, 2021 at 3:46 PM <t...@dmtry.me> wrote:
>
> > It's unclear to me that this needs to be introduced into the top-level
>
> The same question applies to the columnar format: how do you store an
> interval like 1 month 1 day 1 hour? It's not possible without converting
> 1 month to 30 days, which is a bad approach.
>
Presumably you can represent a complex interval in a fixed number of bytes, and then embed the data in a FixedSizeBinary type. You can adorn this type with extension type metadata so that DataFusion can then apply Interval semantics to it. This could also serve as an interim strategy for you to proceed with implementation while proposing a top-level type to the Arrow format (which may or may not be accepted), so you aren't blocked on acceptance of changes into Schema.fbs.

>
> On 17 Feb 2021, at 21:02, Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > It's unclear to me that this needs to be introduced into the top-level
> > columnar format without more analysis; have you considered
> > implementing this for DataFusion as an extension type for the time
> > being?
> >
> > On Wed, Feb 17, 2021 at 11:59 AM t...@dmtry.me <t...@dmtry.me> wrote:
> >>
> >> Hi,
> >>
> >> Currently there are only two IntervalUnit variants in Arrow:
> >>
> >> - YearMonth - months stored as int32
> >> - DayTime - days as int32 and time in milliseconds as int32 (64 bits
> >>   in total)
> >>
> >> Since DF uses Arrow, it's not possible to store "complex" intervals
> >> such as 1 MONTH 1 DAY 1 HOUR.
> >> I think the best way to understand the problem is to read a comment
> >> from the DF codebase:
> >> https://github.com/apache/arrow/blob/bca7d2fe84ccd8fc1129cb4d85448eb0779c52c3/rust/datafusion/src/sql/planner.rs#L1148
> >>
> >>     // Interval is tricky thing
> >>     // 1 day is not 24 hours because timezones, 1 year != 365/364! 30 days != 1 month
> >>     // The true way to store and calculate intervals is to store it as it defined
> >>     // Due the fact that Arrow supports only two types YearMonth (month) and DayTime (day, time)
> >>     // It's not possible to store complex intervals
> >>     // It's possible to do select (NOW() + INTERVAL '1 year') + INTERVAL '1 day'; as workaround
> >>     if result_month != 0 && (result_days != 0 || result_millis != 0) {
> >>         return Err(DataFusionError::NotImplemented(format!(
> >>             "DF does not support intervals that have both a Year/Month part as well as Days/Hours/Mins/Seconds: {:?}. Hint: try breaking the interval into two parts, one with Year/Month and the other with Days/Hours/Mins/Seconds - e.g. (NOW() + INTERVAL '1 year') + INTERVAL '1 day'",
> >>             value
> >>         )));
> >>     }
> >>
> >> I prepared a PR, https://github.com/apache/arrow/pull/9516/files, that
> >> introduces a new IntervalUnit variant called Complex, which stores both
> >> the YearMonth and DayTime parts to support complex intervals.
> >> I couldn't find any page or documentation on how to propose an RFC for
> >> the Arrow format. Can anyone point me to it, or is a PR plus this email
> >> enough?
> >>
> >> Thanks.
>
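
To make the FixedSizeBinary suggestion in the reply above a little more concrete, here is a minimal Rust sketch of packing a month/day/millisecond interval into a fixed number of bytes. Everything in it is an assumption for illustration only: the names ComplexInterval, ENCODED_LEN, encode and decode are hypothetical, and the 12-byte little-endian layout (three int32 fields) is one possible choice, not something defined by Arrow, DataFusion, or the PR. It uses only the Rust standard library; attaching the extension type metadata to an Arrow field is deliberately left out.

```rust
// Hypothetical sketch: none of these names exist in Arrow or DataFusion.
// Assumed layout: 12 bytes = months (i32) | days (i32) | millis (i32),
// each field little-endian, so no component is converted into another.

/// Width in bytes of one encoded value, i.e. the byte width a
/// FixedSizeBinary column would be declared with under this layout.
pub const ENCODED_LEN: usize = 12;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ComplexInterval {
    pub months: i32,
    pub days: i32,
    pub millis: i32,
}

impl ComplexInterval {
    /// Pack the three components into a fixed-size byte array.
    pub fn encode(&self) -> [u8; ENCODED_LEN] {
        let mut buf = [0u8; ENCODED_LEN];
        buf[0..4].copy_from_slice(&self.months.to_le_bytes());
        buf[4..8].copy_from_slice(&self.days.to_le_bytes());
        buf[8..12].copy_from_slice(&self.millis.to_le_bytes());
        buf
    }

    /// Unpack a value; returns None if the slice has the wrong length.
    pub fn decode(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != ENCODED_LEN {
            return None;
        }
        // Read the i-th 4-byte little-endian field.
        let field = |i: usize| {
            let mut raw = [0u8; 4];
            raw.copy_from_slice(&bytes[i * 4..i * 4 + 4]);
            i32::from_le_bytes(raw)
        };
        Some(Self {
            months: field(0),
            days: field(1),
            millis: field(2),
        })
    }
}
```

With this layout, an interval such as 1 MONTH 1 DAY 1 HOUR would be `ComplexInterval { months: 1, days: 1, millis: 3_600_000 }`, and `decode(&iv.encode())` gives back the same value with the month, day and time parts kept separate.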