In SQL, date-time values have no timezone, and they are not implicitly UTC. It
is up to the user to supply a timezone. It sounds like what you are proposing is
a moment in time (similar to Unix time, and what Joda calls an “instant”). That’s
fine, but be aware that you are diverging from SQL.
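
To make the distinction concrete, here is a minimal Python sketch (not Arrow
or SQL code). The sample date, the fixed -07:00 offset and the helper name
epoch_millis are assumptions chosen only for illustration. It shows that the
same zoneless wall-clock fields map to different instants depending on which
timezone the user supplies:

```python
from datetime import datetime, timedelta, timezone

# A SQL-style date-time: wall-clock fields only, no zone attached.
local = datetime(2016, 10, 3, 16, 32, 0)
assert local.tzinfo is None

# The user supplies the zone; the same fields then name different moments.
pacific = timezone(timedelta(hours=-7))  # fixed offset, illustrative only
as_utc = local.replace(tzinfo=timezone.utc)
as_pacific = local.replace(tzinfo=pacific)


def epoch_millis(dt):
    """Milliseconds since 1970-01-01T00:00:00Z for a zone-aware datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (dt - epoch) // timedelta(milliseconds=1)


# An instant is just a count of units since the epoch, independent of zone.
utc_ms = epoch_millis(as_utc)
pacific_ms = epoch_millis(as_pacific)

# The two readings of the same wall-clock value are seven hours apart.
print(pacific_ms - utc_ms)
```

An int64 column of such counts carries instants; a SQL date-time column
carries only the wall-clock fields and leaves the zone to the reader.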

> On Oct 3, 2016, at 4:32 PM, Julien Le Dem <jul...@dremio.com> wrote:
> 
> Here is a PR for the change in timestamp:
> https://github.com/apache/arrow/pull/156
> 
> We should also clarify Date:
> https://issues.apache.org/jira/browse/ARROW-316
> 
> On Mon, Oct 3, 2016 at 3:23 PM, Julien Le Dem <jul...@dremio.com> wrote:
> 
>> I created a JIRA for the Timestamp type if you want to comment in it:
>> https://issues.apache.org/jira/browse/ARROW-315
>> 
>> On Mon, Oct 3, 2016 at 3:16 PM, Julien Le Dem <jul...@dremio.com> wrote:
>> 
>>> consistency with Parquet is a +
>>> Parquet supports timestamp millis and micros (no nanos)
>>> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#datetime-types
>>> 
>>> currently Arrow timestamps have a timezone field.
>>> https://github.com/apache/arrow/blob/master/format/Message.fbs#L67
>>> Wes: regarding your suggestion do we want to change timestamp as follows?
>>> - remove "timezone" field and say it's UTC
>>> - add unit field (MICROS | MILLIS)
>>> 
>>> 
>>> 
>>> On Fri, Sep 30, 2016 at 12:20 PM, Donald Foss <donald.f...@gmail.com>
>>> wrote:
>>> 
>>>> +1 for nano or milli, or something else?
>>>> 
>>>> TL;DR;
>>>> 
>>>> epochMilli++
>>>> 
>>>> —
>>>> 
>>>> Wes, the hierarchy is eminently reasonable, so +1 from me for that.
>>>> Regarding your aside, I am also a fan of the
>>>> http://speleotrove.com/decimal/decarith.html specification, though I
>>>> must admit I am biased simply because it addresses the Rexx Lost Digits
>>>> condition.
>>>> 
>>>> The most commonly used timestamps I see are stored as epoch
>>>> milliseconds, or epochMillis.  It may not be canonical, however there are
>>>> many billions of devices and software applications utilizing it.
>>>> 
>>>> To support extremely fine grained DateTime representations, particularly
>>>> in common scientific applications, I’m for _epochNano_, with logical
>>>> casting to work with existing datasets that are in epochMilli instead.  We
>>>> can deal with the rollover in about 292 years (a signed int64 of nanoseconds
>>>> spans roughly ±292 years around the epoch).
>>>> 
>>>> While I personally would prefer assigning 0 as 2000-01-01T00:00:00.00Z,
>>>> I doubt it will ever happen. No, I’m not a millennial.
>>>> 
>>>> My only concern is for use of 64-bit logical DateTime at the small
>>>> Physics level.  For that use case, UT2 is more appropriate; measurements
>>>> are frequently in fractions of nanoseconds.  Perhaps there could be a way
>>>> to logically cast a signed int96, which is supported by Parquet.
>>>> 
>>>> Timestamp [logical type]
>>>> extends FixedDecimal [logical type] (int64)
>>>> extends FixedWidth [physical type] byteArray[8]
>>>> 
>>>> Timestamp96 [logical type]
>>>> extends FixedDecimal [logical type] (int96)
>>>> extends FixedWidth [physical type] byteArray[12]
>>>> 
>>>> —
>>>> 
>>>> Although inappurtenant to this specific discussion, I would like to see
>>>> a standardized DateTime specification that uses a signed int64 as the
>>>> decimal epochSecond and an unsigned int96 as the fractional representation
>>>> of a second.
>>>> 
>>>> TimestampHiggs [logical type]
>>>> extends FixedDecimal [logical type] [(int64), (uint96)] :: join()ing of
>>>> 2 columns, the fixed decimal epochSecond and the fractional second as
>>>> (n/2^96).
>>>> extends FixedWidth [physical type] byteArray[8], byteArray[12]
>>>> 
>>>> —Donald
>>>> 
>>>>> On Sep 29, 2016, at 7:07 PM, Jacques Nadeau <jacq...@apache.org>
>>>> wrote:
>>>>> 
>>>>> +1
>>>>> 
>>>>> On Thu, Sep 29, 2016 at 3:19 PM, Wes McKinney <wesmck...@gmail.com>
>>>> wrote:
>>>>> 
>>>>>> hello,
>>>>>> 
>>>>>> For the current iteration of Arrow, can we agree to support int64 UNIX
>>>>>> timestamps with a particular resolution (second through nanosecond),
>>>>>> as these are reasonably common representations? We can look to expand
>>>>>> later if it is needed.
>>>>>> 
>>>>>> Thanks
>>>>>> Wes
>>>>>> 
>>>>>> On Mon, Aug 15, 2016 at 4:12 AM, Wes McKinney <wesmck...@gmail.com>
>>>> wrote:
>>>>>>> Bumping this discussion. As part of finalizing a v1 Arrow spec (for
>>>>>>> purposes of moving data between systems, at minimum) we should propose
>>>>>>> timestamp metadata and physical memory representation that maximizes
>>>>>>> interoperability with other systems. It seems like a fixed decimal
>>>>>>> would meet this requirement as UNIX-like timestamps at some resolution
>>>>>>> could pass unmodified with appropriate metadata.
>>>>>>> 
>>>>>>> We will also need decimal types in Arrow (at least to accommodate
>>>>>>> common database representations and file formats like Parquet), so
>>>>>>> this seems like a reasonable potential hierarchy of types:
>>>>>>> 
>>>>>>> Timestamp [logical type]
>>>>>>> extends FixedDecimal [logical type]
>>>>>>> extends FixedWidth [physical type]
>>>>>>> 
>>>>>>> I did a bit of internet searching but did not find a canonical
>>>>>>> reference or implementation of fixed decimals; that would be helpful.
>>>>>>> 
>>>>>>> As an aside: for floating decimal numbers for numerical data we could
>>>>>>> utilize an implementation like http://www.bytereef.org/mpdecimal/
>>>>>>> which implements the spec described at
>>>>>>> http://speleotrove.com/decimal/decarith.html
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Wes
>>>>>>> 
>>>>>>> On Thu, Jul 14, 2016 at 8:18 AM, Alex Samuel <a...@alexsamuel.net>
>>>>>> wrote:
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> May I suggest that instead of fixed-point decimals, you consider a more
>>>>>>>> general fixed-denominator rational representation, for times and other
>>>>>>>> purposes? Powers of ten are convenient for humans, but powers of two more
>>>>>>>> efficient. For some applications, the efficiency of bit operations over
>>>>>>>> divmod is more useful than an exact representation of integral nanoseconds.
>>>>>>>> 
>>>>>>>> std::chrono takes this approach. I'll also humbly point you at my own
>>>>>>>> date/time library, https://github.com/alexhsamuel/cron (incomplete but
>>>>>>>> basically working), which may provide ideas or useful code. It was intended
>>>>>>>> for precisely this sort of application.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Alex
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Thu, Jul 14, 2016 at 10:27 AM Uwe Korn <uw...@xhochy.com> wrote:
>>>>>>>> 
>>>>>>>>> I agree that having a Decimal type for timestamps is a nice
>>>>>>>>> definition. Having your time encoded as seconds or nanoseconds should be
>>>>>>>>> the same as having a scale of the respective amount. But I would rather
>>>>>>>>> avoid having a separate decimal physical type. Therefore I'd prefer the
>>>>>>>>> Parquet approach where decimal is only a logical type and backed by
>>>>>>>>> either a bytearray, int32 or int64.
>>>>>>>>> 
>>>>>>>>> Thus a more general timestamp could look like:
>>>>>>>>> 
>>>>>>>>> * Decimals are logical types, physical types are the same as defined in
>>>>>>>>> Parquet [1]
>>>>>>>>> * Base unit for timestamps is seconds, you can get milliseconds and
>>>>>>>>> nanoseconds by using a different scale. (Note that milliseconds and so
>>>>>>>>> on are all power-of-ten fractions of a second, thus matching the
>>>>>>>>> specification of decimal scale really well.)
>>>>>>>>> * Timestamp is just another logical type that is referring to Decimal
>>>>>>>>> (and optionally may have a timezone) and signalling that we have a Time
>>>>>>>>> and not just a "simple" decimal.
>>>>>>>>> * For a first iteration, I would assume no timezone or UTC but not
>>>>>>>>> include a metadata field. Once we're sure the implementation works, we
>>>>>>>>> can add metadata about it.
>>>>>>>>> 
>>>>>>>>> Timedeltas could be addressed in a similar way, just without the need
>>>>>>>>> for a timezone.
>>>>>>>>> 
>>>>>>>>> For my usages, I don't have the use-case for a larger than int64
>>>>>>>>> timestamp and would like to have it exactly as such in my computation,
>>>>>>>>> thus my preference for the Parquet way.
>>>>>>>>> 
>>>>>>>>> Uwe
>>>>>>>>> 
>>>>>>>>> [1]
>>>>>>>>> 
>>>>>>>>> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
>>>>>>>>> 
>>>>>>>>> On 13.07.16 03:06, Julian Hyde wrote:
>>>>>>>>>> I'm talking about a fixed decimal type, not floating decimal. (Oracle
>>>>>>>>>> numbers are floating decimal. They have a few nice properties, but
>>>>>>>>>> they are variable width and can get quite large. I've seen one or two
>>>>>>>>>> systems that started with binary floating point numbers, which are
>>>>>>>>>> much worse for business computing, and then changed to Java BigDecimal,
>>>>>>>>>> which gives the right answer but is horribly inefficient.)
>>>>>>>>>> 
>>>>>>>>>> A fixed decimal type has virtually zero computational overhead. It
>>>>>>>>>> just has a piece of metadata saying something like "every value in
>>>>>>>>>> this field is multiplied by 1 million" and leaves it to the client
>>>>>>>>>> program to do that multiplying.
>>>>>>>>>> 
>>>>>>>>>> My advice is to create a good fixed decimal type and lean on it heavily.
>>>>>>>>>> 
>>>>>>>>>> Julian
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Julien
>>> 
>> 
>> 
>> 
>> --
>> Julien
>> 
> 
> 
> 
> -- 
> Julien
