Timezone calculations are not simple ( e.g., “2017-03-11 23:30:00-PST” + INTERVAL ‘3’ HOURS --> need to know about daylight savings time, etc.)
Linux does have a timezone. The actual implementation is quite complex – it keeps an elaborate “database” under /usr/share/zoneinfo , (which needs to be updated periodically, e.g. by running “yum update tzdata”). Is Java’s TZ support (https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html) equivalent to Linux ? - Boaz On 3/16/17, 4:48 PM, "Paul Rogers" <[email protected]> wrote: Thanks all for the explanations! Did a bit of poking around. See DRILL-5360. For the Timestamp type: * Literals are claimed to be in UTC (have not yet tested) * Value vectors store Timestamps in server local time * Drill clients get the Timestamp in server local time * JDBC clients try to convert server local time to UTC, but use the client timezone to do so. The result is that clients must know the server timezone, but Drill does not provide this info. Drill clients must convert from server timezone to UTC to get the UTC value of a timestamp. JDBC clients must convert from the “UTC” given from JDBC to true UTC by subtracting the difference between server and client timezone offsets. I suspect, as Jinfeng points out, that much of the confusion comes from the conflicting use of the term “timestamp” in the SQL 2011 standard [1] and standard Linux/Java practice. In Linux and Java, a “timestamp” is ms since the Unix epoch, UTC. (That is, the UTC timestamp is implied, so all machines anywhere agree on what a time means.) The SQL TIMESTAMP is what most databases call a DATETIME: a combination of a date and time that are “free floating”: there is no implied time zone. “3 PM” is just that, it does not imply “3 PM in Paris.” SQL provides a TIMESTAMP WITH TIME ZONE, but that also differs from Linux practice: it is not a UTC time but rather a DATETIME with a associated timezone. The Drill Timestamp is neither of these. It like a TIMESTAMP WITH TIMEZONE where the timezone is the server local timezone. But, Drill does not specify that timezone, so the client “just has to know.” Unlike the Linux timestamp, the client & server don’t agree ahead of time by convention; instead every server can have its own Timestamp timezone and the client must figure out the corresponding UTC or client local time. What we have can work with clever adjustment programming. But it would be better (for wider adoption) to provide a cleaner, more deterministic API. Unfortunately, we probably can’t fix the existing Timestamp as there is probably already code that tries (like JDBC) to work around the current behavior. Instead, we should add the SQL TIMESTAMP WITH TIMEZONE. Or add a non-standard “LinuxTimezone” (or “TimezoneUTC”) that sores times in an agreed-upon UTC format. Until then, tread carefully. - Paul [1] http://standards.iso.org/ittf/PubliclyAvailableStandards/c053681_ISO_IEC_9075-1_2011.zip > On Mar 16, 2017, at 4:25 PM, Jinfeng Ni <[email protected]> wrote: > > My understanding is TIME/TIMESTAMP in Drill is TIME/TIMESTAMP without > timezone. TimeStampTZ is for TIMESTAMP with timezone, which Drill > probably does not fully support. > > SQL standards has DATE, TIME WITHOUT TIME ZONE, TIMESTAMP WITHOUT > TIME ZONE, TIME WITH TIME ZONE, or TIMESTAMP WITH TIME ZONE. > Time/Timestamp without t/z should be interpreted as local time. > > Here is some descriptions in SQL 2011 : Sec 4.6.2. > > " > > A datetime data type that specifies WITH TIME ZONE is a data type that > is datetime with time zone, while a datetime data type that specifies > WITHOUT TIME ZONE is a data type that is datetime without time zone. > > The surface of the earth is divided into zones, called time zones, in > which every correct clock tells the same time, known as local time. > Local time is equal to UTC (Coordinated Universal Time) plus the time > zone dis- placement, which is an interval value that ranges between > INTERVAL '–14:00' HOUR TO MINUTE and INTERVAL '+14:00' HOUR TO MINUTE. > > A datetime value, of data type TIME WITHOUT TIME ZONE or TIMESTAMP > WITHOUT TIME ZONE, may represent a local time, whereas a datetime > value of data type TIME WITH TIME ZONE or TIMESTAMP WITH TIME ZONE > represents UTC. > > " > > On Tue, Mar 14, 2017 at 11:56 AM, Julian Hyde <[email protected]> wrote: >> I don’t think 4713 BC comes from the SQL standard. That is a Postgres thing. >> >> I believe that the standard says you should support timestamp precision up to 9 (i.e. nanoseconds). 2 ^ 64 nanoseconds is 584 years. So, it’s not possible to cram all of the timestamp values we’d like into a 64 bit integer if you are going to support such a large time range. >> >> Julian >> >>> On Mar 14, 2017, at 11:29 AM, Boaz Ben-Zvi <[email protected]> wrote: >>> >>> Thanks for the detailed research, Paul, >>> >>> INTERVAL by the (SQL-99 ?) standard should be either a SECOND-DAY interval or a MONTH-YEAR interval. This is a result of the inconsistent “number of days in a month”. >>> >>> DATE expressed in days, starting at 4713-BC sound like the standard as well (at least same as Postgres). (should be implemented as 4 byte) >>> >>> TIME is only within a 24 hour period, so why should it care about 2001 ? Probably the documentation should be fixed. (should be implemented as 4 byte) >>> >>> TIMESTAMP should be an 8 byte type (maybe starting at 4713-BC as well, to match DATE (and Postgres :-)). >>> >>> Thanks, >>> >>> — Boaz >>> >>> >>> On Mar 13, 2017, at 3:46 PM, Paul Rogers <[email protected]<mailto:[email protected]>> wrote: >>> >>> Thanks Parth! >>> >>> The date and time definitions are the “classic” ones, but conflict with the Drill documentation: >>> >>> http://drill.apache.org/docs/supported-data-types/ >>> >>> DATE Years, months, and days in YYYY-MM-DD format since 4713 BC >>> >>> TIME 24-hour based time before or after January 1, 2001 in hours, minutes, seconds format: HH:mm:ss >>> >>> Which is correct? >>> >>> If the documentation is wrong, we can file a JIRA to correct it. (It may not even be wrong, since one can convert from one to the other easily, it may just be misleading…) >>> >>> Also note that, according to C++, DATE and TIME and TIMESTAMP are exactly the same, but the TIME as as 32-bit number, could only hold about 2 years due to limited range. >>> >>> Also, according to SQL, DATE has no time zone, it is just a date. That is, 2016-03-13 is the same date in PST or GMT. If DATE were seconds since the UTC epoch, dates would be different in different time zones. So, I assume we use the Unix epoch, but without an implied UTC time zone as is usual for Linux and Windows timestamps? >>> >>> How does a TIMESTAMP differ from a DATE? Perhaps a TIMESTAMP is based on the epoch UTC while DATE has no implied time zone? >>> >>> Again, the documentation differs: >>> >>> INTERVAL (Internally, INTERVAL is represented as INTERVALDAY or INTERVALYEAR.) A day-time or year-month interval >>> >>> TIMESTAMP JDBC timestamp in year, month, date hour, minute, second, and optional milliseconds format: yyyy-MM-dd HH:mm:ss.SSS >>> >>> So, sounds like we have an INTERVALDAY and INTERVAL year, but do we or do we not have an INTERVAL? >>> >>> If anyone knows, please let me know, else I need to do some poking around... >>> >>> Thanks, >>> >>> - Paul >>> >>> On Mar 13, 2017, at 2:44 PM, Parth Chandra <[email protected]<mailto:[email protected]>> wrote: >>> >>> Paul asked this and I'm posting here so someone who knows better can >>> correct me if I'm wrong ( This is from my notes when I was young) >>> >>> DATE : Int64 : Milliseconds from Unix Epoch : 1/1/1970 00:00:00 >>> TIME : Int32 : Milliseconds from midnight on 1/1/1970 >>> TimeStampTZ : Int64 + Int32 : (Milliseconds from epoch + Index into list of >>> TimeZones) >>> TimeStamp : Int64 : Milliseconds from epoch >>> Interval : Int32 + Int32 + Int32 : Month + Days + Milliseconds >>> Interval Day : Int32 + Int32 : Days + Milliseconds >>> Interval Year : Int32 : Month >>> >>> A slightly readable version of these can be found in the C++ client :). >>> $drill_src/contrib/native/client/src/include/drill/recordbatch.hpp which >>> has a bunch of 'Holder' structs for the date-time types. >>> >>> HTH >>> >>> Parth >>> >>> >>
