+1, let's be cautious about adding these kinds of things.

On Wed, Aug 5, 2020 at 5:49 AM Wes McKinney <wesmck...@gmail.com> wrote:
> I also am not sure there is a good case for a new built-in type, since it
> introduces a good deal of complexity, particularly when there is the
> extension type option. We’ve been living with 64-bit nanoseconds in pandas
> for a decade, for example (and without the option for lower resolutions!!),
> and while it does arise as a limitation from time to time, the use cases
> are so specialized that it has never made sense to do anything about it.
>
> On Tue, Aug 4, 2020 at 11:26 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > I think a stronger case needs to be made for adding a new builtin type
> > to support this. Can you provide concrete use cases? Why can't dates
> > outside the range representable by int64 be truncated (even at
> > nanosecond precision, the 64-bit max value is over 200 years in the
> > future)? It seems that in most cases, nanosecond-level values outside
> > the range representable by 64 bits are sentinel values.
> >
> > FWIW, Parquet had an int96 type that was used for timestamps, but it has
> > been deprecated [1] in favor of int64 nanos.
> >
> > -Micah
> >
> > [1] https://issues.apache.org/jira/browse/PARQUET-323
> >
> > On Tue, Aug 4, 2020 at 8:52 PM Fan Liya <liya.fa...@gmail.com> wrote:
> >
> > > Hi Ji,
> > >
> > > This sounds like a universal requirement, as 64 bits are not
> > > sufficient to hold the full range at nanosecond precision.
> > >
> > > For the extension type, we have two choices:
> > > 1. Extending struct(int64, int32), which represents the design of SoA
> > > (Struct of Arrays).
> > > 2. Extending fixed width binary(12), which represents the design of AoS
> > > (Array of Structs).
> > >
> > > Given the universal requirement, I'd prefer a new type.
> > >
> > > Best,
> > > Liya Fan
> > >
> > >
> > > On Wed, Aug 5, 2020 at 11:18 AM Ji Liu <tianc...@apache.org> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Currently the Arrow Timestamp type supports different TimeUnits
> > > > (seconds, milliseconds, microseconds, nanoseconds) with int64
> > > > storage. In most cases this is enough, but if the timestamp range
> > > > of an external system exceeds int64_t::max, it is impossible to
> > > > convert directly to an Arrow Timestamp. Consider the following use
> > > > case:
> > > >
> > > > A timestamp in another system stored as int64 + int32 (milliseconds
> > > > and nanoseconds) can represent values from 0000-00-00 to 9999-12-31
> > > > 23:59:59.999999999. If we want to convert a type like this, what
> > > > should we do? One could probably create an extension type with
> > > > struct(int64, int32) for storage.
> > > >
> > > > Besides ExtensionType, are we considering extending our Timestamp
> > > > for a wider range, or maybe a new type for cases like the above?
> > > >
> > > >
> > > > Thanks,
> > > > Ji Liu
> > > >
> > >
> >
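
For reference, the range arithmetic discussed in this thread can be checked with a short script. This is only a rough sketch using the Python standard library, not Arrow code. The helper names (years_representable, max_ns_instant, pack_split, unpack_split) are made up for illustration, and the split layout is an assumption: int64 milliseconds since the Unix epoch plus int32 nanoseconds within that millisecond, packed little-endian into 12 bytes in the spirit of the fixed width binary(12) option. The external system's actual layout is not specified in the thread.

```python
import struct
from datetime import datetime, timedelta, timezone

INT64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 365.2425 * 24 * 3600  # mean Gregorian year
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def years_representable(unit):
    """Approximate years after the epoch that an int64 count of `unit` covers."""
    return INT64_MAX / UNITS_PER_SECOND[unit] / SECONDS_PER_YEAR


def max_ns_instant():
    """Largest int64-nanosecond instant, truncated to microseconds
    because datetime has no finer resolution."""
    return EPOCH + timedelta(microseconds=INT64_MAX // 1000)


def pack_split(millis, nanos_in_milli):
    """Assumed layout: int64 millis + int32 sub-millisecond nanos, 12 bytes."""
    if not 0 <= nanos_in_milli < 10**6:
        raise ValueError("nanos_in_milli must be in [0, 1_000_000)")
    return struct.pack("<qi", millis, nanos_in_milli)


def unpack_split(buf):
    """Inverse of pack_split: returns (millis, nanos_in_milli)."""
    return struct.unpack("<qi", buf)


if __name__ == "__main__":
    for unit in UNITS_PER_SECOND:
        print(unit, "covers about", round(years_representable(unit)), "years")
    print("int64 ns upper bound:", max_ns_instant().isoformat())

    # A year-9999 value fits the split layout but not a single int64 of nanos.
    last = datetime(9999, 12, 31, 23, 59, 59, 999000, tzinfo=timezone.utc)
    millis = (last - EPOCH) // timedelta(milliseconds=1)
    buf = pack_split(millis, 999_999)
    total_nanos = millis * 10**6 + 999_999
    print(len(buf), "bytes; exceeds int64:", total_nanos > INT64_MAX)
```

With nanosecond units the int64 range ends in the year 2262, which matches the "over 200 years in the future" remark, while microseconds and coarser units already cover year 9999. Only the combination of nanosecond precision and a 0000 to 9999 range needs more than 64 bits.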