I am also not sure there is a good case for a new built-in type, since it introduces a good deal of complexity, particularly when the extension type option exists. We’ve been living with 64-bit nanoseconds in pandas for a decade, for example (and without the option for lower resolutions!!), and while it does come up as a limitation from time to time, the use cases are so specialized that it has never made sense to do anything about it.
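For a sense of scale, here is a small plain-Python sketch of how far an int64 tick count reaches from the Unix epoch in each TimeUnit, plus the concrete window for int64 nanoseconds. The helper names (`span_years`, `int64_nanos_bounds`) are illustrative only, not pandas or Arrow APIs, and the bounds are rounded toward the epoch to whole microseconds because Python's `datetime` has no nanosecond field.

```python
from datetime import datetime, timedelta

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
EPOCH = datetime(1970, 1, 1)          # naive UTC epoch
SECONDS_PER_YEAR = 365.2425 * 86400   # mean Gregorian year

# Ticks per second for each Arrow TimeUnit
TICKS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def span_years(unit):
    """Approximate years an int64 tick count covers on each side of the epoch."""
    return INT64_MAX / TICKS_PER_SECOND[unit] / SECONDS_PER_YEAR


def int64_nanos_bounds():
    """Earliest and latest int64-nanosecond instants, rounded toward the epoch
    to whole microseconds (the finest resolution datetime supports)."""
    earliest_us = -((-INT64_MIN) // 1000)  # truncate toward zero to stay in range
    latest_us = INT64_MAX // 1000
    return (EPOCH + timedelta(microseconds=earliest_us),
            EPOCH + timedelta(microseconds=latest_us))


for unit in TICKS_PER_SECOND:
    print(f"{unit:>2}: about {span_years(unit):,.0f} years either side of 1970")
print(int64_nanos_bounds())
```

The nanosecond window runs from 1677-09-21 to 2262-04-11, which is where the "over 200 years in the future" figure below comes from; each coarser unit widens the window by a factor of 1,000.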
On Tue, Aug 4, 2020 at 11:26 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> I think a stronger case needs to be made for adding a new builtin type to
> support this. Can you provide concrete use-cases? Why can't dates outside
> of the range representable by int64 be truncated (even for nano precision,
> the 64-bit max value is over 200 years in the future)? It seems like in
> most cases, nanosecond-level values outside the range representable by
> 64 bits are sentinel values.
>
> FWIW, Parquet had an int96 type that was used for timestamps, but it has
> been deprecated [1] in favor of int64 nanos.
>
> -Micah
>
> [1] https://issues.apache.org/jira/browse/PARQUET-323
>
> On Tue, Aug 4, 2020 at 8:52 PM Fan Liya <liya.fa...@gmail.com> wrote:
>
> > Hi Ji,
> >
> > This sounds like a universal requirement, as 64 bits are not sufficient
> > to hold nanosecond precision over such a wide range.
> >
> > For the extension type, we have two choices:
> > 1. Extending struct(int64, int32), which represents the design of SoA
> > (Struct of Arrays).
> > 2. Extending fixed-width binary(12), which represents the design of AoS
> > (Array of Structs).
> >
> > Given the universal requirement, I'd prefer a new type.
> >
> > Best,
> > Liya Fan
> >
> > On Wed, Aug 5, 2020 at 11:18 AM Ji Liu <tianc...@apache.org> wrote:
> >
> > > Hi all,
> > >
> > > Currently the Arrow Timestamp type supports different TimeUnits
> > > (seconds, milliseconds, microseconds, nanoseconds), with int64 storage.
> > > In most cases this is enough, but if the timestamp value range of an
> > > external system exceeds int64_t::max, it's impossible to convert
> > > directly to an Arrow Timestamp. Consider the following use case:
> > >
> > > A timestamp in another system, stored as int64 + int32 (milliseconds
> > > and nanoseconds), can represent data from 0000-00-00 to 9999-12-31
> > > 23:59:59.999999999. If we want to convert a type like this, how should
> > > we do it? One could probably create an extension type with
> > > struct(int64, int32) storage.
> > >
> > > Besides ExtensionType, are we considering extending our Timestamp type
> > > to cover a wider range, or perhaps adding a new type for cases like
> > > the above?
> > >
> > > Thanks,
> > > Ji Liu
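A minimal plain-Python sketch of the int64 + int32 representation discussed in this thread, not using the Arrow libraries. It assumes the int64 holds milliseconds since the Unix epoch and the int32 holds the nanosecond remainder within that millisecond (0 to 999,999); `WideTimestamp`, `epoch_nanos`, `to_aos`, `to_soa` and the other names are hypothetical. It also contrasts the two layouts Liya Fan describes: one packed 12-byte record per value (AoS, like fixed-width binary(12)) versus two parallel columns (SoA, like struct(int64, int32)).

```python
import struct
from dataclasses import dataclass
from datetime import datetime, timedelta

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
NANOS_PER_MILLI = 1_000_000
EPOCH = datetime(1970, 1, 1)  # naive UTC epoch


def epoch_nanos(dt, extra_nanos=0):
    """Hypothetical helper: nanoseconds since the epoch for a naive UTC
    datetime, plus 0-999 nanoseconds that datetime cannot represent."""
    if not 0 <= extra_nanos < 1000:
        raise ValueError("extra_nanos must be in [0, 999]")
    return (dt - EPOCH) // timedelta(microseconds=1) * 1000 + extra_nanos


@dataclass(frozen=True)
class WideTimestamp:
    """Hypothetical (int64 millis, int32 nanos-within-millisecond) timestamp."""
    millis: int
    nanos: int

    def __post_init__(self):
        if not 0 <= self.nanos < NANOS_PER_MILLI:
            raise ValueError("nanos must be in [0, 999_999]")
        if not INT64_MIN <= self.millis <= INT64_MAX:
            raise OverflowError("millis does not fit in int64")

    @classmethod
    def from_epoch_nanos(cls, total_nanos):
        # divmod floors, so nanos stays non-negative even before 1970
        millis, nanos = divmod(total_nanos, NANOS_PER_MILLI)
        return cls(millis, nanos)

    def to_epoch_nanos(self):
        return self.millis * NANOS_PER_MILLI + self.nanos


# AoS: one packed 12-byte record per value, like fixed-width binary(12)
_RECORD = struct.Struct("<qi")  # little-endian int64 + int32, no padding


def to_aos(values):
    return b"".join(_RECORD.pack(v.millis, v.nanos) for v in values)


def from_aos(buf):
    return [WideTimestamp(m, n) for m, n in _RECORD.iter_unpack(buf)]


# SoA: two parallel columns, like struct(int64, int32)
def to_soa(values):
    return [v.millis for v in values], [v.nanos for v in values]


def from_soa(millis, nanos):
    return [WideTimestamp(m, n) for m, n in zip(millis, nanos)]


latest = WideTimestamp.from_epoch_nanos(
    epoch_nanos(datetime(9999, 12, 31, 23, 59, 59, 999999), 999))
print(latest, latest.to_epoch_nanos() > INT64_MAX)
```

Flooring with `divmod` keeps the nanosecond field non-negative for instants before 1970, so every instant has exactly one encoding. The SoA form lets each field be scanned as its own contiguous column, while the AoS form keeps each value's 12 bytes together.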