Hi!

I got one review from Richard on Github - Thank you!

However, even though this is a small (in line count) change, and is off by
default, it has very high visibility and fundamentally changes the
date/time semantics of the JDBC interface (even for things like query
browsers / sqlline) so I'd much prefer to get some eyes on it from
the SFDC side before commiting.

The actual fix is in the *DateUtil#Apply*Displacement* methods, which are
called from *PhoenixResultSet* and *PhoenixPreparedStatements*,
and from the* CURRENT_DATE/TIME()* functions. The rest is tests, and some
(really unrelated) refactoring in PhoenixStatement to avoid re-parsing the
date configuration parameters.

best regards
Istvan

On Mon, Feb 27, 2023 at 2:08 PM Istvan Toth <st...@apache.org> wrote:

> Hi!
>
> We already have an old rambling thread on fixing Phoenix time zone
> handling, but I am starting another one.
> Please if you have interest in fixing date handling then take the time to
> read it, I promise it is much less messy than the old approach.
>
>
> *Background:*
> The SQL standard defines two groups of Date types.
> One is the WITHOUT TIME ZONE types, which are analogous to the
> java.time.Local* types, the other is the WITH TIME ZONE types, which are
> analogous to java.time.Instant.
> Types without qualifiers are interpreted as WITHOUT TIME ZONE types.
>
>
> *Current state of Phoenix*
> Phoenix follows this in most regards, as it stores temporal types as UTC
> epoch values, interprets string literals as UTC, and formats them as UTC.
> Phoenix date functions also operate in UTC, which is consistent with the
> parse/format code. This results in the behaviour required for WITHOUT TIME
> ZONE types.
> However, this breaks down when we begin to use the java.sql. temporal
> types, where Phoenix expects and returns the raw UTC epoch timestamp values.
>
> a brief illustration, for selects, but the situation is the same for
> upsert with PreparedStatment.setTimestamp() and friends:
>
>
> *INSERT INTO X  (DATE) VALUES ('2000-01-01 10:10:10') *stores  the UTC
> epoch for '2000-01-01 10:10:10', regardless of the client timezone.
>
>
> *SELECT * from DATE; rs.getString("DATE") *returns '2000-01-01 10:10:10'
> , regardless of the client timezone.
>
> So far, so good.
>
>
> *SELECT * from DATE; rs.getTimestamp("DATE").toString()*returns the UTC
> epoch timestamp interpreted in the local timezone, i.e a different string
> for every TZ.
>
> This is the root of all of our problems.
>
> This problem is built into the ancient JDBC API, which has no real concept
> of WITH TIMEZONE and WITHOUT TIMEZONE types.
>
> *My previous attempt*
>
> Previously, I have changed Phoenix to treat all Temporal data as WITH
> TIMEZONE types, which is not wrong, but
> 1.) Goes against the SQL standard with defines unqualified temporal types
> as WITHOUT timezone
> 2.) Goes against several assumptions built into the Phoenix codebase, and
> requires pushing the client TZ to the server side, and
> - either using ThreadLocals per connection to store this information,
> - or adding the TZ as a parameter to many/most methods in the parser and
> compiler code
> - or some even deeper refactoring, that I haven't figured out yet.
>
> Implementing real WITH TIMEZONE types in Phoenix in the future is not off
> the table, but is a much more involved task.
> (As evidenced by the time I have already wasted on it)
>
> *The new proposal*
>
> Instead of re-imagining Phoenix to treat all types as WITH TIMEZONE types,
> we can keep the current interpretation as WITHOUT TIMEZONE types, and  fix
> the get / functions very easily.
> The  code change required for this is only a few hundred lines (without
> tests).
>
> The solution is as simple as applying the Timezone offset as a
> displacement in PreparedStatement and Resultset when setting or returning
> java.sql.Date/Time/Timestamp.
>
> This is what most DBs do, and also how the SQL standard defines
> conversion between WITH TIME ZONE and WITHOUT TIME ZONE types. In this
> case, the java.sql epoch based types transiting the JDBC interface are
> treated as WITH TIMEZONE types.
>
> This ensures that non timezone-aware clients get timestamps that, if
> interpreted in their local timezone, resolve to the same timezone-less date
> that the parser and formatter would return, and any java.sql.temporal types
> set in PreparedStatement will be interpreted as the local time in the
> client TZ.
>
> The only other change in functionality is the CURRENT_DATE() and
> CURRENT_TIME() functions, which now apply the displacement to the generated
> values.
>
> As usual, we hide this change behind a property, and default to the old
> behaviour to maintain backwards (bug) compatibility.
>
>
> *Complications*
> The SQL standard really talks about a Timezone Offset, and is not talking
> about DST.
> Many databases take DST into account, as does my patch.
> Without that the internal representation would not be really UTC, and the
> date functions (i.e. dayofweek()) wouldn't work.
>
> However, the standard DST problems also apply, with one hour each year not
> existing, and one hour existing twice.
>
> The proper way to handle WITHOUT TIMEZONE types from JDBC is to use
> strings, or use java.time.Local* functions with get/setObject methods.
> Phoenix doesn't yet have those, I plan to add them in the near future
> (PHOENIX-6881).
>
> When using ROW_TIMESTAMP the displacement rules can easily cause problems
> if the user is not expecting it, and expects the java.sql.Timestamp value
> to be applied directly.
> i.e when executing *preparedStmt.setTimestamp('ROW_TS_COL', new
> java.sql.Timestamp(new java.sql.Date()));* from multiple timezones, there
> is no guarantee that the chronologically last insert will have the latest
> timestamp.
> This was already the case when setting the column via a String.
> This requires extra attention, and should be mentioned in the
> documentation.
>
> *Feedback*
>
> Please check out the new PR at https://github.com/apache/phoenix/pull/1567,
> as I said it is very small.
>
> Any and all comments and questions are welcome, either with the general
> approach or with the specific implementation.
>
> I am also collecting some related issues in PHOENIX-6882, those will be
> also needed to get Phoenix date handling into a better place.
>
> *References/further reading*
>
> https://issues.apache.org/jira/browse/PHOENIX-5066
> https://issues.apache.org/jira/browse/PHOENIX-6882
> https://github.com/julianhyde/sqlline/issues/440
>
> https://stackoverflow.com/questions/14070572/is-java-sql-timestamp-timezone-specific
>
> https://stackoverflow.com/questions/44574205/does-the-timezone-matter-when-parsing-a-timestamp
>
> https://medium.com/@williampuk/sql-timestamp-and-jdbc-timestamp-deep-dive-7ae0ea91e237
> Also search for "displacement" in the SQL11 standard. (No public link
> available)
>
> best regards
> Istvan
>
>

Reply via email to