Consistency with Parquet is a plus.
Parquet supports timestamp millis and micros (no nanos)
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#datetime-types
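
As a rough check on what the choice of unit costs in range, here is a back-of-the-envelope calculation of how many years a signed int64 count can span on either side of the epoch at each resolution. The helper name and the 365.25-day year are my own assumptions for illustration, not anything taken from Parquet or Arrow.

```python
# Back-of-the-envelope sketch: years representable by a signed int64
# tick count on either side of the epoch. Names here are illustrative.
INT64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumed average year length

TICKS_PER_SECOND = {
    "seconds": 1,
    "millis": 10**3,
    "micros": 10**6,
    "nanos": 10**9,
}


def years_of_range(unit: str) -> float:
    """Approximate number of years an int64 can count in the given unit."""
    return INT64_MAX / TICKS_PER_SECOND[unit] / SECONDS_PER_YEAR


if __name__ == "__main__":
    for unit in TICKS_PER_SECOND:
        print(f"{unit:>8}: about {years_of_range(unit):,.0f} years")
```

Under these assumptions, nanoseconds cover roughly 292 years either side of 1970, and microseconds roughly 292 thousand years.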

currently Arrow timestamps have a timezone field.
https://github.com/apache/arrow/blob/master/format/Message.fbs#L67
Wes: regarding your suggestion do we want to change timestamp as follows?
- remove "timezone" field and say it's UTC
- add unit field (MICROS | MILLIS)
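
To make the suggestion concrete, here is a minimal sketch of what such a timestamp could look like: an int64 count since the UNIX epoch, always interpreted as UTC, plus a unit. `TimeUnit`, `Timestamp` and `to_unit` are hypothetical names for illustration only, not the Arrow metadata definition.

```python
from dataclasses import dataclass
from enum import Enum


class TimeUnit(Enum):
    # Value is the number of ticks per second (hypothetical encoding).
    MILLIS = 1_000
    MICROS = 1_000_000


@dataclass(frozen=True)
class Timestamp:
    value: int      # ticks since 1970-01-01T00:00:00Z, assumed to fit in int64
    unit: TimeUnit  # resolution of `value`; the timezone is implicitly UTC

    def to_unit(self, target: TimeUnit) -> "Timestamp":
        """Convert to another unit without going through floating point."""
        if target.value >= self.unit.value:
            # Coarser to finer is exact: multiply by the ratio.
            ratio = target.value // self.unit.value
            return Timestamp(self.value * ratio, target)
        # Finer to coarser loses precision: floor toward negative infinity.
        ratio = self.unit.value // target.value
        return Timestamp(self.value // ratio, target)


if __name__ == "__main__":
    ts = Timestamp(1_475_262_000_123, TimeUnit.MILLIS)
    print(ts.to_unit(TimeUnit.MICROS))
```

With only two units the conversion is a single multiply or floor-divide by 1000, which is one argument for keeping the unit in the metadata rather than in the data.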



On Fri, Sep 30, 2016 at 12:20 PM, Donald Foss <donald.f...@gmail.com> wrote:

> +1 for nano or milli, or something else?
>
> TL;DR;
>
> epochMilli++
>
> —
>
> Wes, the hierarchy is eminently reasonable, so +1 from me for that.
> Regarding your aside, I am also a fan of the
> http://speleotrove.com/decimal/decarith.html
> specification, though I must admit I am biased simply because it addresses
> the Rexx Lost Digits condition.
>
> The most commonly used timestamps I see are stored as epoch milliseconds,
> or epochMillis.  It may not be canonical, however there are many billions
> of devices and software applications utilizing it.
>
> To support extremely fine grained DateTime representations, particularly
> in common scientific applications, I’m for _epochNano_, with logical
> casting to work with existing datasets that are in epochMilli instead.  We
> can deal with the rollover in 300k years.
>
> While I personally would prefer assigning 0 as 2000-01-01T00:00:00.00Z, I
> doubt it will ever happen. No, I’m not a millennial.
>
> My only concern is for use of 64-bit logical DateTime at the small Physics
> level.  For that use case, UT2 is more appropriate; measurements are
> frequently in fractions of nanoseconds.  Perhaps there could be a way to
> logically cast a signed int96, which is supported by Parquet.
>
> Timestamp [logical type]
> extends FixedDecimal [logical type] (int64)
> extends FixedWidth [physical type] byteArray[8]
>
> Timestamp96 [logical type]
> extends FixedDecimal [logical type] (int96)
> extends FixedWidth [physical type] byteArray[12]
>
> —
>
> Although inappurtenant to this specific discussion, I would like to see a
> standardized DateTime specification that uses a signed int64 as the decimal
> epochSecond and an unsigned int96 as the fractional representation of a
> second.
>
> TimestampHiggs [logical type]
> extends FixedDecimal [logical type] [(int64), (uint96)] :: join()ing of 2
> columns, the fixed decimal epochSecond and the fractional second as
> (n/2^96).
> extends FixedWidth [physical type] byteArray[8], byteArray[12]
>
> —Donald
>
> > On Sep 29, 2016, at 7:07 PM, Jacques Nadeau <jacq...@apache.org> wrote:
> >
> > +1
> >
> > On Thu, Sep 29, 2016 at 3:19 PM, Wes McKinney <wesmck...@gmail.com>
> wrote:
> >
> >> hello,
> >>
> >> For the current iteration of Arrow, can we agree to support int64 UNIX
> >> timestamps with a particular resolution (second through nanosecond),
> >> as these are reasonably common representations? We can look to expand
> >> later if it is needed.
> >>
> >> Thanks
> >> Wes
> >>
> >> On Mon, Aug 15, 2016 at 4:12 AM, Wes McKinney <wesmck...@gmail.com>
> wrote:
> >>> Bumping this discussion. As part of finalizing a v1 Arrow spec (for
> >>> purposes of moving data between systems, at minimum) we should propose
> >>> timestamp metadata and physical memory representation that maximizes
> >>> interoperability with other systems. It seems like a fixed decimal
> >>> would meet this requirement as UNIX-like timestamps at some resolution
> >>> could pass unmodified with appropriate metadata.
> >>>
> >>> We will also need decimal types in Arrow (at least to accommodate
> >>> common database representations and file formats like Parquet), so
> >>> this seems like a reasonable potential hierarchy of types:
> >>>
> >>> Timestamp [logical type]
> >>> extends FixedDecimal [logical type]
> >>> extends FixedWidth [physical type]
> >>>
> >>> I did a bit of internet searching but did not find a canonical
> >>> reference or implementation of fixed decimals; that would be helpful.
> >>>
> >>> As an aside: for floating decimal numbers for numerical data we could
> >>> utilize an implementation like http://www.bytereef.org/mpdecimal/
> >>> which implements the spec described at
> >>> http://speleotrove.com/decimal/decarith.html
> >>>
> >>> Thanks
> >>> Wes
> >>>
> >>> On Thu, Jul 14, 2016 at 8:18 AM, Alex Samuel <a...@alexsamuel.net>
> >> wrote:
> >>>> Hi all,
> >>>>
> >>>> May I suggest that instead of fixed-point decimals, you consider a
> more
> >>>> general fixed-denominator rational representation, for times and other
> >>>> purposes? Powers of ten are convenient for humans, but powers of two
> >> more
> >>>> efficient. For some applications, the efficiency of bit operations
> over
> >>>> divmod is more useful than an exact representation of integral
> >> nanoseconds.
> >>>>
> >>>> std::chrono takes this approach. I'll also humbly point you at my own
> >>>> date/time library, https://github.com/alexhsamuel/cron (incomplete
> but
> >>>> basically working), which may provide ideas or useful code. It was
> >> intended
> >>>> for precisely this sort of application.
> >>>>
> >>>> Regards,
> >>>> Alex
> >>>>
> >>>>
> >>>> On Thu, Jul 14, 2016 at 10:27 AM Uwe Korn <uw...@xhochy.com> wrote:
> >>>>
> >>>>> I agree that having a Decimal type for timestamps is a nice
> >>>>> definition. Having your time encoded as seconds or nanoseconds should
> >> be
> >>>>> the same as having a scale of the respective amount. But I would
> rather
> >>>>> avoid having a separate decimal physical type. Therefore I'd prefer
> the
> >>>>> parquet approach where decimal is only a logical type and backed by
> >>>>> either a bytearray, int32 or int64.
> >>>>>
> >>>>> Thus a more general timestamp could look like:
> >>>>>
> >>>>> * Decimals are logical types, physical types are the same as defined
> in
> >>>>> Parquet [1]
> >>>>> * Base unit for timestamps is seconds, you can get milliseconds and
> >>>>> nanoseconds by using a different scale. (Note that seconds and so on
> >>>>> are all powers of ten, thus matching the specification of decimal
> scale
> >>>>> really well).
> >>>>> * Timestamp is just another logical type that is referring to Decimal
> >>>>> (and optionally may have a timezone) and signalling that we have a
> Time
> >>>>> and not just a "simple" decimal.
> >>>>> * For a first iteration, I would assume no timezone or UTC but not
> >>>>> include a metadata field. Once we're sure the implementation works,
> we
> >>>>> can add metadata about it.
> >>>>>
> >>>>> Timedeltas could be addressed in a similar way, just without the need
> >>>>> for a timezone.
> >>>>>
> >>>>> For my usages, I don't have the use-case for a larger than int64
> >>>>> timestamp and would like to have it exactly as such in my
> computation,
> >>>>> thus my preference for the Parquet way.
> >>>>>
> >>>>> Uwe
> >>>>>
> >>>>> [1]
> >>>>>
> >>>>> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
> >>>>>
> >>>>> On 13.07.16 03:06, Julian Hyde wrote:
> >>>>>> I'm talking about a fixed decimal type, not floating decimal.
> (Oracle
> >>>>>> numbers are floating decimal. They have a few nice properties, but
> >>>>>> they are variable width and can get quite large. I've seen one or
> two
> >>>>>> systems that started with binary floating point numbers, which are
> >>>>>> much worse for business computing, and then changed to Java BigDecimal,
> >>>>>> which gives the right answer but is horribly inefficient.)
> >>>>>>
> >>>>>> A fixed decimal type has virtually zero computational overhead. It
> >>>>>> just has a piece of metadata saying something like "every value in
> >>>>>> this field is multiplied by 1 million" and leaves it to the client
> >>>>>> program to do that multiplying.
> >>>>>>
> >>>>>> My advice is to create a good fixed decimal type and lean on it
> >> heavily.
> >>>>>>
> >>>>>> Julian
> >>>>>>
> >>>>>
> >>>>>
> >>
>
>
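
For reference, the fixed decimal idea discussed in the quoted thread (plain integers plus one piece of metadata giving the scale) can be sketched roughly as follows. This assumes the Parquet-style convention that the logical value is the unscaled integer times 10^-scale; `FixedDecimalColumn` and `rescale` are hypothetical names, not part of any existing library.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class FixedDecimalColumn:
    unscaled: List[int]  # physical values, assumed to fit in int64
    scale: int           # logical value = unscaled * 10**(-scale)

    def value(self, i: int) -> Decimal:
        """Apply the scale on the client side, only when a value is read."""
        return Decimal(self.unscaled[i]).scaleb(-self.scale)


def rescale(col: FixedDecimalColumn, new_scale: int) -> FixedDecimalColumn:
    """Move to a finer scale exactly, e.g. millis (3) to nanos (9)."""
    if new_scale < col.scale:
        raise ValueError("coarser scales would drop digits; not handled here")
    factor = 10 ** (new_scale - col.scale)
    return FixedDecimalColumn([v * factor for v in col.unscaled], new_scale)


if __name__ == "__main__":
    # Seconds as the base unit: scale 3 means the integers are milliseconds.
    millis = FixedDecimalColumn([1_475_262_000_123], scale=3)
    print(millis.value(0))
    print(rescale(millis, 9).unscaled[0])
```

The stored integers are never touched by the scale, which is why a UNIX-style timestamp at any power-of-ten resolution could pass through unmodified with the right metadata.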


-- 
Julien
