Hi!

We already have an old rambling thread on fixing Phoenix time zone
handling, but I am starting another one.
Please if you have interest in fixing date handling then take the time to
read it, I promise it is much less messy than the old approach.


*Background:*
The SQL standard defines two groups of Date types.
One is the WITHOUT TIME ZONE types, which are analogous to the
java.time.Local* types, the other is the WITH TIME ZONE types, which are
analogous to java.time.Instant.
Types without qualifiers are interpreted as WITHOUT TIME ZONE types.


*Current state of Phoenix*
Phoenix follows this in most regards, as it stores temporal types as UTC
epoch values, interprets string literals as UTC, and formats them as UTC.
Phoenix date functions also operate in UTC, which is consistent with the
parse/format code. This results in the behaviour required for WITHOUT TIME
ZONE types.
However, this breaks down when we begin to use the java.sql. temporal
types, where Phoenix expects and returns the raw UTC epoch timestamp values.

a brief illustration, for selects, but the situation is the same for upsert
with PreparedStatment.setTimestamp() and friends:


*INSERT INTO X  (DATE) VALUES ('2000-01-01 10:10:10') *stores  the UTC
epoch for '2000-01-01 10:10:10', regardless of the client timezone.


*SELECT * from DATE; rs.getString("DATE") *returns '2000-01-01 10:10:10' ,
regardless of the client timezone.

So far, so good.


*SELECT * from DATE; rs.getTimestamp("DATE").toString()*returns the UTC
epoch timestamp interpreted in the local timezone, i.e a different string
for every TZ.

This is the root of all of our problems.

This problem is built into the ancient JDBC API, which has no real concept
of WITH TIMEZONE and WITHOUT TIMEZONE types.

*My previous attempt*

Previously, I have changed Phoenix to treat all Temporal data as WITH
TIMEZONE types, which is not wrong, but
1.) Goes against the SQL standard with defines unqualified temporal types
as WITHOUT timezone
2.) Goes against several assumptions built into the Phoenix codebase, and
requires pushing the client TZ to the server side, and
- either using ThreadLocals per connection to store this information,
- or adding the TZ as a parameter to many/most methods in the parser and
compiler code
- or some even deeper refactoring, that I haven't figured out yet.

Implementing real WITH TIMEZONE types in Phoenix in the future is not off
the table, but is a much more involved task.
(As evidenced by the time I have already wasted on it)

*The new proposal*

Instead of re-imagining Phoenix to treat all types as WITH TIMEZONE types,
we can keep the current interpretation as WITHOUT TIMEZONE types, and  fix
the get / functions very easily.
The  code change required for this is only a few hundred lines (without
tests).

The solution is as simple as applying the Timezone offset as a displacement
in PreparedStatement and Resultset when setting or returning
java.sql.Date/Time/Timestamp.

This is what most DBs do, and also how the SQL standard defines
conversion between WITH TIME ZONE and WITHOUT TIME ZONE types. In this
case, the java.sql epoch based types transiting the JDBC interface are
treated as WITH TIMEZONE types.

This ensures that non timezone-aware clients get timestamps that, if
interpreted in their local timezone, resolve to the same timezone-less date
that the parser and formatter would return, and any java.sql.temporal types
set in PreparedStatement will be interpreted as the local time in the
client TZ.

The only other change in functionality is the CURRENT_DATE() and
CURRENT_TIME() functions, which now apply the displacement to the generated
values.

As usual, we hide this change behind a property, and default to the old
behaviour to maintain backwards (bug) compatibility.


*Complications*
The SQL standard really talks about a Timezone Offset, and is not talking
about DST.
Many databases take DST into account, as does my patch.
Without that the internal representation would not be really UTC, and the
date functions (i.e. dayofweek()) wouldn't work.

However, the standard DST problems also apply, with one hour each year not
existing, and one hour existing twice.

The proper way to handle WITHOUT TIMEZONE types from JDBC is to use
strings, or use java.time.Local* functions with get/setObject methods.
Phoenix doesn't yet have those, I plan to add them in the near future
(PHOENIX-6881).

When using ROW_TIMESTAMP the displacement rules can easily cause problems
if the user is not expecting it, and expects the java.sql.Timestamp value
to be applied directly.
i.e when executing *preparedStmt.setTimestamp('ROW_TS_COL', new
java.sql.Timestamp(new java.sql.Date()));* from multiple timezones, there
is no guarantee that the chronologically last insert will have the latest
timestamp.
This was already the case when setting the column via a String.
This requires extra attention, and should be mentioned in the documentation.

*Feedback*

Please check out the new PR at https://github.com/apache/phoenix/pull/1567,
as I said it is very small.

Any and all comments and questions are welcome, either with the general
approach or with the specific implementation.

I am also collecting some related issues in PHOENIX-6882, those will be
also needed to get Phoenix date handling into a better place.

*References/further reading*

https://issues.apache.org/jira/browse/PHOENIX-5066
https://issues.apache.org/jira/browse/PHOENIX-6882
https://github.com/julianhyde/sqlline/issues/440
https://stackoverflow.com/questions/14070572/is-java-sql-timestamp-timezone-specific
https://stackoverflow.com/questions/44574205/does-the-timezone-matter-when-parsing-a-timestamp
https://medium.com/@williampuk/sql-timestamp-and-jdbc-timestamp-deep-dive-7ae0ea91e237
Also search for "displacement" in the SQL11 standard. (No public link
available)

best regards
Istvan

Reply via email to