In SQL, date-time values have no timezone, and they are not implicitly UTC. It is up to the user to supply a timezone. Sounds like what you are proposing is a moment in time (similar to Unix time, and what Joda calls an “instant”). That’s fine, but be aware that you are diverging from SQL.
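
To make the distinction concrete, here is a minimal Python sketch (mine, not taken from the Arrow or Parquet code). It treats a zone-less `datetime` as a stand-in for a SQL date-time value and shows that the same value maps to different instants depending on the offset the user supplies. The helper name `to_epoch_millis` and the fixed offsets are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone

# A SQL-style date-time value: wall-clock fields only, no zone attached.
local = datetime(2016, 10, 3, 16, 32, 0)

def to_epoch_millis(naive, utc_offset_hours):
    """Hypothetical helper: read a zone-less value at a fixed UTC offset
    and return the resulting instant as milliseconds since the Unix epoch."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    instant = naive.replace(tzinfo=zone)  # attach the user-supplied zone
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (instant - epoch) // timedelta(milliseconds=1)

# The same wall-clock value gives two different instants.
as_utc = to_epoch_millis(local, 0)
as_pacific = to_epoch_millis(local, -7)
print(as_pacific - as_utc)  # difference in milliseconds between the two readings
```

If Arrow timestamps are defined as UTC instants, only the second kind of value (the int64 count) is stored, and the zone-less SQL reading has to be resolved before the data is written.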

> On Oct 3, 2016, at 4:32 PM, Julien Le Dem <jul...@dremio.com> wrote:
>
> Here is a PR for the change in timestamp:
> https://github.com/apache/arrow/pull/156
>
> We should also clarify Date:
> https://issues.apache.org/jira/browse/ARROW-316
>
> On Mon, Oct 3, 2016 at 3:23 PM, Julien Le Dem <jul...@dremio.com> wrote:
>
>> I created a JIRA for the Timestamp type if you want to comment in it:
>> https://issues.apache.org/jira/browse/ARROW-315
>>
>> On Mon, Oct 3, 2016 at 3:16 PM, Julien Le Dem <jul...@dremio.com> wrote:
>>
>>> consistency with Parquet a +
>>> Parquet supports timestamp millis and micros (no nanos)
>>> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#datetime-types
>>>
>>> currently Arrow timestamps have a timezone field.
>>> https://github.com/apache/arrow/blob/master/format/Message.fbs#L67
>>> Wes: regarding your suggestion do we want to change timestamp as follows?
>>> - remove "timestamp" field and say it's UTC
>>> - add unit field (MICROS | MILLIS)
>>>
>>> On Fri, Sep 30, 2016 at 12:20 PM, Donald Foss <donald.f...@gmail.com> wrote:
>>>
>>>> +1 for nano or milli, or something else?
>>>>
>>>> TL;DR;
>>>>
>>>> epochMilli++
>>>>
>>>> —
>>>>
>>>> Wes, the hierarchy is eminently reasonable, so +1 from me for that.
>>>> Regarding your aside, I am also a fan of the
>>>> http://speleotrove.com/decimal/decarith.html specification, though I
>>>> must admit I am biased simply because it addresses the Rexx Lost Digits
>>>> condition.
>>>>
>>>> The most commonly used timestamps I see are stored as epoch
>>>> milliseconds, or epochMillis. It may not be canonical, however there are
>>>> many billions of devices and software applications utilizing it.
>>>>
>>>> To support extremely fine grained DateTime representations, particularly
>>>> in common scientific applications, I’m for _epochNano_, with logical
>>>> casting to work with existing datasets that are in epochMilli instead. We
>>>> can deal with the rollover in 300k years.
>>>>
>>>> While I personally would prefer assigning 0 as 2000-01-01T00:00:00.00Z,
>>>> I doubt it will ever happen. No, I’m not a millennial.
>>>>
>>>> My only concern is for use of 64-bit logical DateTime at the small
>>>> Physics level. For that use case, UT2 is more appropriate; measurements
>>>> are frequently in fractions of nanoseconds. Perhaps there could be a way
>>>> to logically cast a signed int96, which is supported by Parquet.
>>>>
>>>> Timestamp [logical type]
>>>>   extends FixedDecimal [logical type] (int64)
>>>>     extends FixedWidth [physical type] byteArray[8]
>>>>
>>>> Timestamp96 [logical type]
>>>>   extends FixedDecimal [logical type] (int96)
>>>>     extends FixedWidth [physical type] byteArray[12]
>>>>
>>>> —
>>>>
>>>> Although inappurtenant to this specific discussion, I would like to see
>>>> a standardized DateTime specification that uses a signed int64 as the
>>>> decimal epochSecond and an unsigned int96 as the fractional representation
>>>> of a second.
>>>>
>>>> TimestampHiggs [logical type]
>>>>   extends FixedDecimal [logical type] [(int64), (uint96)] :: join()ing of
>>>>     2 columns, the fixed decimal epochSecond and the fractional second as
>>>>     (n/2^96).
>>>>     extends FixedWidth [physical type] byteArray[8], byteArray[12]
>>>>
>>>> —Donald
>>>>
>>>>> On Sep 29, 2016, at 7:07 PM, Jacques Nadeau <jacq...@apache.org> wrote:
>>>>>
>>>>> +1
>>>>>
>>>>> On Thu, Sep 29, 2016 at 3:19 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>
>>>>>> hello,
>>>>>>
>>>>>> For the current iteration of Arrow, can we agree to support int64 UNIX
>>>>>> timestamps with a particular resolution (second through nanosecond),
>>>>>> as these are reasonably common representations? We can look to expand
>>>>>> later if it is needed.
>>>>>>
>>>>>> Thanks
>>>>>> Wes
>>>>>>
>>>>>> On Mon, Aug 15, 2016 at 4:12 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>> Bumping this discussion. As part of finalizing a v1 Arrow spec (for
>>>>>>> purposes of moving data between systems, at minimum) we should propose
>>>>>>> timestamp metadata and physical memory representation that maximizes
>>>>>>> interoperability with other systems. It seems like a fixed decimal
>>>>>>> would meet this requirement as UNIX-like timestamps at some resolution
>>>>>>> could pass unmodified with appropriate metadata.
>>>>>>>
>>>>>>> We will also need decimal types in Arrow (at least to accommodate
>>>>>>> common database representations and file formats like Parquet), so
>>>>>>> this seems like a reasonable potential hierarchy of types:
>>>>>>>
>>>>>>> Timestamp [logical type]
>>>>>>>   extends FixedDecimal [logical type]
>>>>>>>     extends FixedWidth [physical type]
>>>>>>>
>>>>>>> I did a bit of internet searching but did not find a canonical
>>>>>>> reference or implementation of fixed decimals; that would be helpful.
>>>>>>>
>>>>>>> As an aside: for floating decimal numbers for numerical data we could
>>>>>>> utilize an implementation like http://www.bytereef.org/mpdecimal/
>>>>>>> which implements the spec described at
>>>>>>> http://speleotrove.com/decimal/decarith.html
>>>>>>>
>>>>>>> Thanks
>>>>>>> Wes
>>>>>>>
>>>>>>> On Thu, Jul 14, 2016 at 8:18 AM, Alex Samuel <a...@alexsamuel.net> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> May I suggest that instead of fixed-point decimals, you consider a more
>>>>>>>> general fixed-denominator rational representation, for times and other
>>>>>>>> purposes? Powers of ten are convenient for humans, but powers of two more
>>>>>>>> efficient. For some applications, the efficiency of bit operations over
>>>>>>>> divmod is more useful than an exact representation of integral nanoseconds.
>>>>>>>>
>>>>>>>> std::chrono takes this approach. I'll also humbly point you at my own
>>>>>>>> date/time library, https://github.com/alexhsamuel/cron (incomplete but
>>>>>>>> basically working), which may provide ideas or useful code. It was intended
>>>>>>>> for precisely this sort of application.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Alex
>>>>>>>>
>>>>>>>> On Thu, Jul 14, 2016 at 10:27 AM Uwe Korn <uw...@xhochy.com> wrote:
>>>>>>>>
>>>>>>>>> I agree with that having a Decimal type for timestamps is a nice
>>>>>>>>> definition. Having your time encoded as seconds or nanoseconds should be
>>>>>>>>> the same as having a scale of the respective amount. But I would rather
>>>>>>>>> avoid having a separate decimal physical type. Therefore I'd prefer the
>>>>>>>>> parquet approach where decimal is only a logical type and backed by
>>>>>>>>> either a bytearray, int32 or int64.
>>>>>>>>>
>>>>>>>>> Thus a more general timestamp could look like:
>>>>>>>>>
>>>>>>>>> * Decimals are logical types, physical types are the same as defined in
>>>>>>>>> Parquet [1]
>>>>>>>>> * Base unit for timestamps is seconds, you can get milliseconds and
>>>>>>>>> nanoseconds by using a different scale. (Note that seconds and so on
>>>>>>>>> are all powers of ten, thus matching the specification of decimal scale
>>>>>>>>> really good).
>>>>>>>>> * Timestamp is just another logical type that is referring to Decimal
>>>>>>>>> (and optionally may have a timezone) and signalling that we have a Time
>>>>>>>>> and not just a "simple" decimal.
>>>>>>>>> * For a first iteration, I would assume no timezone or UTC but not
>>>>>>>>> include a metadata field. Once we're sure the implementation works, we
>>>>>>>>> can add metadata about it.
>>>>>>>>>
>>>>>>>>> Timedeltas could be addressed in a similar way, just without the need
>>>>>>>>> for a timezone.
>>>>>>>>>
>>>>>>>>> For my usages, I don't have the use-case for a larger than int64
>>>>>>>>> timestamp and would like to have it exactly as such in my computation,
>>>>>>>>> thus my preference for the Parquet way.
>>>>>>>>>
>>>>>>>>> Uwe
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
>>>>>>>>>
>>>>>>>>> On 13.07.16 03:06, Julian Hyde wrote:
>>>>>>>>>> I'm talking about a fixed decimal type, not floating decimal. (Oracle
>>>>>>>>>> numbers are floating decimal. They have a few nice properties, but
>>>>>>>>>> they are variable width and can get quite large. I've seen one or two
>>>>>>>>>> systems that started with binary floating point numbers, which are
>>>>>>>>>> much worse for business computing, and then change to Java BigDecimal,
>>>>>>>>>> which gives the right answer but are horribly inefficient.)
>>>>>>>>>>
>>>>>>>>>> A fixed decimal type has virtually zero computational overhead. It
>>>>>>>>>> just has a piece of metadata saying something like "every value in
>>>>>>>>>> this field is multiplied by 1 million" and leaves it to the client
>>>>>>>>>> program to do that multiplying.
>>>>>>>>>>
>>>>>>>>>> My advice is to create a good fixed decimal type and lean on it heavily.
>>>>>>>>>>
>>>>>>>>>> Julian
>>>
>>> --
>>> Julien
>>
>> --
>> Julien
>
> --
> Julien