Github user paul-rogers commented on the issue:
https://github.com/apache/drill/pull/937
The original description talks about data with local times. The TPC-H data
has no TZ. Now, maybe we made one up in creating the Parquet files, but the
original date just has dates without a tz.
The fundamental issue is that if we have a tz-less date, 1994-08-12, say,
then this *cannot* be converted to a UTC timestamp. Which of the 23+ time zones
would we use? How would the client and server agree on the arbitrary tz? This
is like saying that I have a measurement in miles, but we can store distances
only in km, so I'll take my length of 5 miles and store it as 5 km, remembering
that I'm using km as an alias for miles. Does not make sense.
Your example uses `timestamp` constants. A timestamp is defined with a
timezone, and so it fits Drill's model well. But, TPC dates don't have a
timezone. See [the TPC-H
spec](http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.2.pdf)
which says:
> Date is a value whose external representation can be expressed as
YYYY-MM-DD, where all characters are numeric. A date must be able to express
any day within at least 14 consecutive years. There is no requirement specific
to the internal representation of a date.
That is, TPC-H dates are not midnight on some date in some timezone, they
are just dates. The cannot be converted to UTC. And so, they should not be
subject to time zone shifting as tzs shift.
My point here is that Hive (according to the docs) implements functions
correctly: using tz-less dates. Drill tries to convert to a (fake) UTC and use
time-based functions on that data. This is, at best, a hack, and at worst,
leads to great complexity and incorrect results.
That said, if all we have is km, and we can't do the miles-to-km conversion
correctly, then we do need a way to know that a particular km value is actually
miles. Similarly, using the current implementation, how will we know that a
particular arbitrary-local-time-encoded-as-fake-UTC value really is local time
vs. being an actual UTC time?
All that said, if you fix makes the current implementation work better,
then it is a good improvement.
In the interests of moving ahead, let's table the basic discussion and just
look at this one fix.
---