Hi! I got one review from Richard on Github - Thank you!
However, even though this is a small (in line count) change, and is off by default, it has very high visibility and fundamentally changes the date/time semantics of the JDBC interface (even for things like query browsers / sqlline) so I'd much prefer to get some eyes on it from the SFDC side before commiting. The actual fix is in the *DateUtil#Apply*Displacement* methods, which are called from *PhoenixResultSet* and *PhoenixPreparedStatements*, and from the* CURRENT_DATE/TIME()* functions. The rest is tests, and some (really unrelated) refactoring in PhoenixStatement to avoid re-parsing the date configuration parameters. best regards Istvan On Mon, Feb 27, 2023 at 2:08 PM Istvan Toth <st...@apache.org> wrote: > Hi! > > We already have an old rambling thread on fixing Phoenix time zone > handling, but I am starting another one. > Please if you have interest in fixing date handling then take the time to > read it, I promise it is much less messy than the old approach. > > > *Background:* > The SQL standard defines two groups of Date types. > One is the WITHOUT TIME ZONE types, which are analogous to the > java.time.Local* types, the other is the WITH TIME ZONE types, which are > analogous to java.time.Instant. > Types without qualifiers are interpreted as WITHOUT TIME ZONE types. > > > *Current state of Phoenix* > Phoenix follows this in most regards, as it stores temporal types as UTC > epoch values, interprets string literals as UTC, and formats them as UTC. > Phoenix date functions also operate in UTC, which is consistent with the > parse/format code. This results in the behaviour required for WITHOUT TIME > ZONE types. > However, this breaks down when we begin to use the java.sql. temporal > types, where Phoenix expects and returns the raw UTC epoch timestamp values. > > a brief illustration, for selects, but the situation is the same for > upsert with PreparedStatment.setTimestamp() and friends: > > > *INSERT INTO X (DATE) VALUES ('2000-01-01 10:10:10') *stores the UTC > epoch for '2000-01-01 10:10:10', regardless of the client timezone. > > > *SELECT * from DATE; rs.getString("DATE") *returns '2000-01-01 10:10:10' > , regardless of the client timezone. > > So far, so good. > > > *SELECT * from DATE; rs.getTimestamp("DATE").toString()*returns the UTC > epoch timestamp interpreted in the local timezone, i.e a different string > for every TZ. > > This is the root of all of our problems. > > This problem is built into the ancient JDBC API, which has no real concept > of WITH TIMEZONE and WITHOUT TIMEZONE types. > > *My previous attempt* > > Previously, I have changed Phoenix to treat all Temporal data as WITH > TIMEZONE types, which is not wrong, but > 1.) Goes against the SQL standard with defines unqualified temporal types > as WITHOUT timezone > 2.) Goes against several assumptions built into the Phoenix codebase, and > requires pushing the client TZ to the server side, and > - either using ThreadLocals per connection to store this information, > - or adding the TZ as a parameter to many/most methods in the parser and > compiler code > - or some even deeper refactoring, that I haven't figured out yet. > > Implementing real WITH TIMEZONE types in Phoenix in the future is not off > the table, but is a much more involved task. > (As evidenced by the time I have already wasted on it) > > *The new proposal* > > Instead of re-imagining Phoenix to treat all types as WITH TIMEZONE types, > we can keep the current interpretation as WITHOUT TIMEZONE types, and fix > the get / functions very easily. > The code change required for this is only a few hundred lines (without > tests). > > The solution is as simple as applying the Timezone offset as a > displacement in PreparedStatement and Resultset when setting or returning > java.sql.Date/Time/Timestamp. > > This is what most DBs do, and also how the SQL standard defines > conversion between WITH TIME ZONE and WITHOUT TIME ZONE types. In this > case, the java.sql epoch based types transiting the JDBC interface are > treated as WITH TIMEZONE types. > > This ensures that non timezone-aware clients get timestamps that, if > interpreted in their local timezone, resolve to the same timezone-less date > that the parser and formatter would return, and any java.sql.temporal types > set in PreparedStatement will be interpreted as the local time in the > client TZ. > > The only other change in functionality is the CURRENT_DATE() and > CURRENT_TIME() functions, which now apply the displacement to the generated > values. > > As usual, we hide this change behind a property, and default to the old > behaviour to maintain backwards (bug) compatibility. > > > *Complications* > The SQL standard really talks about a Timezone Offset, and is not talking > about DST. > Many databases take DST into account, as does my patch. > Without that the internal representation would not be really UTC, and the > date functions (i.e. dayofweek()) wouldn't work. > > However, the standard DST problems also apply, with one hour each year not > existing, and one hour existing twice. > > The proper way to handle WITHOUT TIMEZONE types from JDBC is to use > strings, or use java.time.Local* functions with get/setObject methods. > Phoenix doesn't yet have those, I plan to add them in the near future > (PHOENIX-6881). > > When using ROW_TIMESTAMP the displacement rules can easily cause problems > if the user is not expecting it, and expects the java.sql.Timestamp value > to be applied directly. > i.e when executing *preparedStmt.setTimestamp('ROW_TS_COL', new > java.sql.Timestamp(new java.sql.Date()));* from multiple timezones, there > is no guarantee that the chronologically last insert will have the latest > timestamp. > This was already the case when setting the column via a String. > This requires extra attention, and should be mentioned in the > documentation. > > *Feedback* > > Please check out the new PR at https://github.com/apache/phoenix/pull/1567, > as I said it is very small. > > Any and all comments and questions are welcome, either with the general > approach or with the specific implementation. > > I am also collecting some related issues in PHOENIX-6882, those will be > also needed to get Phoenix date handling into a better place. > > *References/further reading* > > https://issues.apache.org/jira/browse/PHOENIX-5066 > https://issues.apache.org/jira/browse/PHOENIX-6882 > https://github.com/julianhyde/sqlline/issues/440 > > https://stackoverflow.com/questions/14070572/is-java-sql-timestamp-timezone-specific > > https://stackoverflow.com/questions/44574205/does-the-timezone-matter-when-parsing-a-timestamp > > https://medium.com/@williampuk/sql-timestamp-and-jdbc-timestamp-deep-dive-7ae0ea91e237 > Also search for "displacement" in the SQL11 standard. (No public link > available) > > best regards > Istvan > >