Consistency with Parquet is a plus. Parquet supports timestamp millis and micros (no nanos):
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#datetime-types
Currently Arrow timestamps have a timezone field:
https://github.com/apache/arrow/blob/master/format/Message.fbs#L67

Wes: regarding your suggestion, do we want to change timestamp as follows?
- remove the "timezone" field and say it's UTC
- add a unit field (MICROS | MILLIS)

On Fri, Sep 30, 2016 at 12:20 PM, Donald Foss <donald.f...@gmail.com> wrote:

> +1 for nano or milli, or something else?
>
> TL;DR;
>
> epochMilli++
>
> —
>
> Wes, the hierarchy is eminently reasonable, so +1 from me for that.
> Regarding your aside, I am also a fan of the
> http://speleotrove.com/decimal/decarith.html specification, though I must
> admit I am biased simply because it addresses the Rexx Lost Digits condition.
>
> The most commonly used timestamps I see are stored as epoch milliseconds,
> or epochMillis. It may not be canonical, however there are many billions
> of devices and software applications utilizing it.
>
> To support extremely fine grained DateTime representations, particularly
> in common scientific applications, I’m for _epochNano_, with logical
> casting to work with existing datasets that are in epochMilli instead. We
> can deal with the rollover in 300k years.
>
> While I personally would prefer assigning 0 as 2000-01-01T00:00:00.00Z, I
> doubt it will ever happen. No, I’m not a millennial.
>
> My only concern is for use of 64-bit logical DateTime at the small Physics
> level. For that use case, UT2 is more appropriate; measurements are
> frequently in fractions of nanoseconds. Perhaps there could be a way to
> logically cast a signed int96, which is supported by Parquet.
>
> Timestamp [logical type]
>   extends FixedDecimal [logical type] (int64)
>   extends FixedWidth [physical type] byteArray[8]
>
> Timestamp96 [logical type]
>   extends FixedDecimal [logical type] (int96)
>   extends FixedWidth [physical type] byteArray[12]
>
> —
>
> Although inappurtenant to this specific discussion, I would like to see a
> standardized DateTime specification that uses a signed int64 as the decimal
> epochSecond and an unsigned int96 as the fractional representation of a
> second.
>
> TimestampHiggs [logical type]
>   extends FixedDecimal [logical type] [(int64), (uint96)] :: join()ing of 2
>     columns, the fixed decimal epochSecond and the fractional second as
>     (n/2^96).
>   extends FixedWidth [physical type] byteArray[8], byteArray[12]
>
> —Donald
>
>
> On Sep 29, 2016, at 7:07 PM, Jacques Nadeau <jacq...@apache.org> wrote:
> >
> > +1
> >
> > On Thu, Sep 29, 2016 at 3:19 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> >
> >> hello,
> >>
> >> For the current iteration of Arrow, can we agree to support int64 UNIX
> >> timestamps with a particular resolution (second through nanosecond),
> >> as these are reasonably common representations? We can look to expand
> >> later if it is needed.
> >>
> >> Thanks
> >> Wes
> >>
> >> On Mon, Aug 15, 2016 at 4:12 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> >>> Bumping this discussion. As part of finalizing a v1 Arrow spec (for
> >>> purposes of moving data between systems, at minimum) we should propose
> >>> timestamp metadata and physical memory representation that maximizes
> >>> interoperability with other systems. It seems like a fixed decimal
> >>> would meet this requirement as UNIX-like timestamps at some resolution
> >>> could pass unmodified with appropriate metadata.
> >>>
> >>> We will also need decimal types in Arrow (at least to accommodate
> >>> common database representations and file formats like Parquet), so
> >>> this seems like a reasonable potential hierarchy of types:
> >>>
> >>> Timestamp [logical type]
> >>>   extends FixedDecimal [logical type]
> >>>   extends FixedWidth [physical type]
> >>>
> >>> I did a bit of internet searching but did not find a canonical
> >>> reference or implementation of fixed decimals; that would be helpful.
> >>>
> >>> As an aside: for floating decimal numbers for numerical data we could
> >>> utilize an implementation like http://www.bytereef.org/mpdecimal/
> >>> which implements the spec described at
> >>> http://speleotrove.com/decimal/decarith.html
> >>>
> >>> Thanks
> >>> Wes
> >>>
> >>> On Thu, Jul 14, 2016 at 8:18 AM, Alex Samuel <a...@alexsamuel.net> wrote:
> >>>> Hi all,
> >>>>
> >>>> May I suggest that instead of fixed-point decimals, you consider a more
> >>>> general fixed-denominator rational representation, for times and other
> >>>> purposes? Powers of ten are convenient for humans, but powers of two
> >>>> more efficient. For some applications, the efficiency of bit operations
> >>>> over divmod is more useful than an exact representation of integral
> >>>> nanoseconds.
> >>>>
> >>>> std::chrono takes this approach. I'll also humbly point you at my own
> >>>> date/time library, https://github.com/alexhsamuel/cron (incomplete but
> >>>> basically working), which may provide ideas or useful code. It was
> >>>> intended for precisely this sort of application.
> >>>>
> >>>> Regards,
> >>>> Alex
> >>>>
> >>>>
> >>>> On Thu, Jul 14, 2016 at 10:27 AM Uwe Korn <uw...@xhochy.com> wrote:
> >>>>
> >>>>> I agree that having a Decimal type for timestamps is a nice
> >>>>> definition. Having your time encoded as seconds or nanoseconds should
> >>>>> be the same as having a scale of the respective amount.
> >>>>> But I would rather avoid having a separate decimal physical type.
> >>>>> Therefore I'd prefer the Parquet approach where decimal is only a
> >>>>> logical type and backed by either a bytearray, int32 or int64.
> >>>>>
> >>>>> Thus a more general timestamp could look like:
> >>>>>
> >>>>> * Decimals are logical types, physical types are the same as defined in
> >>>>>   Parquet [1]
> >>>>> * Base unit for timestamps is seconds, you can get milliseconds and
> >>>>>   nanoseconds by using a different scale. (Note that seconds and so on
> >>>>>   are all powers of ten, thus matching the specification of decimal
> >>>>>   scale really well.)
> >>>>> * Timestamp is just another logical type that is referring to Decimal
> >>>>>   (and optionally may have a timezone) and signalling that we have a
> >>>>>   Time and not just a "simple" decimal.
> >>>>> * For a first iteration, I would assume no timezone or UTC but not
> >>>>>   include a metadata field. Once we're sure the implementation works,
> >>>>>   we can add metadata about it.
> >>>>>
> >>>>> Timedeltas could be addressed in a similar way, just without the need
> >>>>> for a timezone.
> >>>>>
> >>>>> For my usages, I don't have the use-case for a larger than int64
> >>>>> timestamp and would like to have it exactly as such in my computation,
> >>>>> thus my preference for the Parquet way.
> >>>>>
> >>>>> Uwe
> >>>>>
> >>>>> [1]
> >>>>> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
> >>>>>
> >>>>> On 13.07.16 03:06, Julian Hyde wrote:
> >>>>>> I'm talking about a fixed decimal type, not floating decimal. (Oracle
> >>>>>> numbers are floating decimal. They have a few nice properties, but
> >>>>>> they are variable width and can get quite large. I've seen one or two
> >>>>>> systems that started with binary floating point numbers, which are
> >>>>>> much worse for business computing, and then changed to Java
> >>>>>> BigDecimal, which gives the right answer but is horribly inefficient.)
> >>>>>>
> >>>>>> A fixed decimal type has virtually zero computational overhead. It
> >>>>>> just has a piece of metadata saying something like "every value in
> >>>>>> this field is multiplied by 1 million" and leaves it to the client
> >>>>>> program to do that multiplying.
> >>>>>>
> >>>>>> My advice is to create a good fixed decimal type and lean on it
> >>>>>> heavily.
> >>>>>>
> >>>>>> Julian
> >>>>>>
> >>>>>
> >>

-- Julien
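
A minimal Python sketch of the idea discussed in this thread: an int64 count of ticks since the UNIX epoch, always UTC, with the resolution carried as metadata (the "every value in this field is multiplied by N" scale of a fixed decimal). The names `TimeUnit`, `Timestamp`, `cast`, and `years_covered` are hypothetical and are not Arrow or Parquet APIs. Flooring when casting to a coarser unit and raising on int64 overflow are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


class TimeUnit(Enum):
    """Ticks per second for each resolution (hypothetical names)."""
    SECOND = 1
    MILLI = 1_000
    MICRO = 1_000_000
    NANO = 1_000_000_000


@dataclass(frozen=True)
class Timestamp:
    """An int64 tick count since the UNIX epoch, interpreted as UTC."""
    value: int
    unit: TimeUnit

    def cast(self, target: TimeUnit) -> "Timestamp":
        """Rescale to another unit without touching floating point."""
        src, dst = self.unit.value, target.value
        if dst >= src:
            # Finer resolution: exact multiplication by a power of ten.
            scaled = self.value * (dst // src)
        else:
            # Coarser resolution: floor division (assumed rounding rule).
            scaled = self.value // (src // dst)
        if not INT64_MIN <= scaled <= INT64_MAX:
            raise OverflowError(f"{scaled} does not fit in int64 at {target.name}")
        return Timestamp(scaled, target)


def years_covered(unit: TimeUnit) -> float:
    """Approximate years after the epoch representable in int64 at this unit."""
    seconds = INT64_MAX / unit.value
    return seconds / (365.25 * 24 * 3600)
```

Under these assumptions, `Timestamp(1475263200123, TimeUnit.MILLI).cast(TimeUnit.NANO)` is an exact integer rescale, while `years_covered(TimeUnit.NANO)` comes out near 292 years on either side of the epoch and `years_covered(TimeUnit.MICRO)` near 292,000 years, which is worth keeping in mind when choosing a default resolution.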