I also am not sure there is a good case for a new built-in type since it
introduces a good deal of complexity, particularly when there is the
extension type option. We’ve been living with 64-bit nanoseconds in pandas
for a decade, for example (and without the option for lower resolutions!!),
and while it does arise as a limitation from time to time, the use cases
are so specialized that it has never made sense to do anything about it.
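
For reference, the range that a signed 64-bit nanosecond count covers can be
checked with a few lines of standard-library Python. This is only a sketch:
it assumes a Unix-epoch origin, uses helper names made up for illustration,
and truncates to microseconds because `datetime` cannot hold nanoseconds. It
is not pandas or Arrow code.

```python
from datetime import datetime, timedelta, timezone

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)
# Assumption: nanoseconds are counted from the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ns_to_datetime(ns):
    """Convert epoch nanoseconds to a datetime, flooring to microseconds."""
    return EPOCH + timedelta(microseconds=ns // 1000)


earliest = ns_to_datetime(INT64_MIN)
latest = ns_to_datetime(INT64_MAX)
print(earliest.isoformat(), latest.isoformat())
```

The bounds fall in the years 1677 and 2262, which is the window being
discussed in this thread.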

On Tue, Aug 4, 2020 at 11:26 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> I think a stronger case needs to be made for adding a new builtin type to
> support this.  Can you provide concrete use cases?  Why can't dates outside
> the range representable by int64 be truncated (even at nanosecond precision,
> the 64-bit max value is over 200 years in the future)?  It seems that in
> most cases nanosecond-level values outside the range representable by
> 64 bits are sentinel values.
>
> FWIW, Parquet had an int96 type that was used for timestamps but it has
> been deprecated [1] in favor of int64 nanos.
>
> -Micah
>
> [1] https://issues.apache.org/jira/browse/PARQUET-323
>
> On Tue, Aug 4, 2020 at 8:52 PM Fan Liya <liya.fa...@gmail.com> wrote:
>
> > Hi Ji,
> >
> > This sounds like a universal requirement, as 64 bits are not sufficient to
> > hold nanosecond precision over a wide date range.
> >
> > For the extension type, we have two choices:
> > 1. Extending struct(int64, int32), which represents the design of SoA
> > (Struct of Arrays).
> > 2. Extending fixed width binary(12), which represents the design of AoS
> > (Array of Structs)
> >
> > Given the universal requirement, I'd prefer a new type.
> >
> > Best,
> > Liya Fan
> >
> >
> > On Wed, Aug 5, 2020 at 11:18 AM Ji Liu <tianc...@apache.org> wrote:
> >
> > > Hi all,
> > >
> > > > Currently the Arrow Timestamp type supports different TimeUnits (seconds,
> > > > milliseconds, microseconds, nanoseconds) with int64 storage. In most
> > > > cases this is enough, but if the timestamp range of an external system
> > > > exceeds int64_t::max, it is impossible to convert directly to Arrow
> > > > Timestamp. Consider the following use case:
> > > >
> > > > A timestamp in another system stored as int64 + int32 (milliseconds and
> > > > nanoseconds) can represent values from 0000-00-00 to 9999-12-31
> > > > 23:59:59.999999999. How should we convert a type like this? One option
> > > > is to create an extension type with struct(int64, int32) for storage.
> > > >
> > > > Besides ExtensionType, are we considering extending our Timestamp to a
> > > > wider range, or maybe adding a new type for cases like the above?
> > >
> > >
> > > Thanks,
> > > Ji Liu
> > >
> >
>
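
The two extension-type storage options raised in the thread, struct(int64,
int32) and fixed-width binary(12), can be sketched in plain Python. The
layout below is an assumption made for illustration only: a little-endian
int64 of milliseconds since the Unix epoch followed by an int32 holding the
nanoseconds within that millisecond. The helper names are hypothetical and
no Arrow API is used.

```python
import struct

# Hypothetical 12-byte record: "<" disables padding, q = int64, i = int32.
LAYOUT = struct.Struct("<qi")


def pack_aos(values):
    """AoS: one 12-byte record per timestamp, like fixed-width binary(12)."""
    return [LAYOUT.pack(ms, ns) for ms, ns in values]


def split_soa(values):
    """SoA: two parallel columns, like struct(int64, int32)."""
    millis = [ms for ms, _ in values]
    nanos = [ns for _, ns in values]
    return millis, nanos


# 253402300799999 ms after the epoch is 9999-12-31 23:59:59.999 UTC.
values = [(0, 0), (253402300799999, 999999)]
records = pack_aos(values)
millis, nanos = split_soa(values)
roundtrip = [LAYOUT.unpack(r) for r in records]

# The same instant expressed as a single nanosecond count overflows int64.
total_ns = values[1][0] * 1_000_000 + values[1][1]
print(LAYOUT.size, total_ns > 2**63 - 1)
```

Either layout carries the same 12 bytes per value; the difference is whether
the two fields sit next to each other in one buffer or in two separate
columns.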
