I think the `to_timestamp_*` sounds like a good idea to me

On Sat, Apr 17, 2021 at 3:00 PM Evan Chan <e...@urbanlogiq.com> wrote:

> Andrew and others,
>
> Thanks for your input.
>
> >
> > 3. For functions (e.g. date_trunc(...)), I think the infrastructure for
> > multiple type signatures exists, we just need implementations for
> different
> > resolutions
>
> Sure, we can add more type support in date_trunc() and other functions.
>
> > 4. For converting back and forth from different resolutions in SQL Server
> >
> > For 3, given there is seemingly no SQL standard way to do this, I suggest
> > adding a new, non-postgres function like `convert_resolution` that
> performs
> > the desired conversion. This function could be implemented today as a
> user
> > defined function (UDF), and/or included as a built in function in
> > DataFusion in the future.
>
> What do folks think about adding functions like the following to Arrow?
>
> * to_timestamp_millis(STRING) -> converts string timestamps to timestamps
> with millis resolution
> * to_timestamp_millis(Timestamp(xxx)) -> converts or truncates other
> timestamp column types to millis resolution
> * to_timestamp_micros(…)
> * to_timestamp_seconds(…)
>
> -Evan
>
> >
> > Andrew
> >
> > [1]  https://github.com/apache/arrow/pull/10005#discussion_r612551640 <
> https://github.com/apache/arrow/pull/10005#discussion_r612551640>
> >
> >
> >
> > On Thu, Apr 15, 2021 at 4:41 PM Evan Chan <e...@urbanlogiq.com <mailto:
> e...@urbanlogiq.com>> wrote:
> >
> >> Hi folks,
> >>
> >> So currently Arrow Rust/DataFusion supports four types of Timestamp
> >> arrays, with Nano, Micro, Millisecond and Second resolution.  However,
> the
> >> best supported by far are Nanos.  For example, in DataFusion, the
> following
> >> only works for Nanos and not the other resolutions:
> >> * CAST(x as TIMESTAMP)  -> Nanos only
> >> * date_trunc()  -> nanos only
> >> * filtering a timestamp array, eg my column >
> >> to_timestamp(‘2020-06-30T12:00Z’)
> >>
> >> In the broader SQL world, in general there seems to be only a single
> >> Timestamp type in most databases, though in many cases there is a
> variable
> >> resolution.
> >> PostGres’s TIMESTAMP type is microsecond based:
> >> https://www.postgresql.org/docs/9.1/datatype-datetime.html <
> >> https://www.postgresql.org/docs/9.1/datatype-datetime.html <
> https://www.postgresql.org/docs/9.1/datatype-datetime.html>>
> >>
> >> For UrbanLogiq, in some cases we would like to standardize on
> millisecond
> >> resolution.  Nanoseconds yields only ~300 years of span for i64, which
> >> isn’t enough for some applications.  At minimum, we’d like to get the
> >> following working:
> >> * Either something like CAST(x AS TIMESTAMP(Milliseconds)) or
> >> date_trunc(‘milliseconds’, x) , which can cast different types of
> timestamp
> >> arrays to Timestamp(Milliseconds, None)
> >> * filtering that can compare, ideally, different types of timestamp
> >> columns to a to_timestamp(….)
> >>
> >> The last problem is easy to solve as the coercion logic can be fixed to
> >> address that, but solving the first problem is not as straightforward.
> >> - There isn’t any universal SQL standard for a type that supports
> >> different timestamp resolutions
> >> - The most non-intrusive way I can think of is:
> >>    - CAST(x AS TIMESTAMP)   -> Nanos
> >>    - CAST(x AS TIMESTAMP(Micros/Millis/Seconds)) ->.
> >> Micros/Millis/Seconds arrays
> >> - Functions are designed/work best with a single output type, and
> >> date_trunc() is designed to output nanos only.  Fixing this would not be
> >> trivial, it would probably require changing the signature of
> return_type()
> >> so that return types can be determined from argument values, not just
> >> argument types
> >>
> >> Basically the larger question for the Arrow/DataFusion community is how
> do
> >> we want to deal with supporting different timestamp types.  The ideal
> would
> >> be that different functions work on all the timestamp types, but it’ll
> take
> >> a long time to get there I fear.  Some possible directions:
> >>
> >> - Continue current support for nanos as the “first class” citizen, but
> add
> >> support for casting to different timestamp resolutions, and coercion to
> >> nanos to work with different functions like date_trunc().
> >>    - This means Arrow would not be usable in some cases.  Converting
> from
> >> micros to nanos loses year range (for 64-bits anyways)
> >>    - There is a performance penalty for the coercion
> >> - Switch to different basis for the “base” timestamp type, like PostGres
> >> did with micros
> >> - Use more than 64 bits to represent nanos
> >> - Add support for functions to work with different timestamp types
> >>    - For example, cast() produces different timestamp resolutions
> >>    - Other date functions like date_trunc() can input from different
> >> resolutions
> >>    - Signatures and return type calculation would be more complex
> >> - Switch to a universal timestamp type which supports different
> >> resolutions, as some SQL databases support
> >>
> >> Thanks for your input,
> >> Evan
>
>

Reply via email to