[ https://issues.apache.org/jira/browse/PHOENIX-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435871#comment-17435871 ]
Istvan Toth commented on PHOENIX-5066:
--------------------------------------

Spent about a day on this, and I think I have a plan.

I was wrong when I earlier wrote that we don't have the infrastructure for pushing the client time zone to the coprocessors. We can use the scan attributes for this, just as we use them for just about everything else.

It is also rather obvious that we must have a compatibility setting to maintain the current behaviour, as all existing applications are written to work with it. To facilitate a gradual migration to the new behaviour, we also have to support both the legacy and the JDBC-compliant mode in the same cluster, which means it must be switchable at the connection level.

There are three pieces of information that we must push deep into the code, down to the type system, expressions and coprocessors:
* whether we use the legacy or the new date handling
* the date/time format string
* the time zone

(The format string plus time zone can also be represented as a pair of parsers and formatters.)

This means that a lot of core methods in the Function and Type classes, like org.apache.phoenix.schema.types.PDataType.toBytes(Object, byte[], int), and every method in their caller chain must be made aware of the above information. That is a rather huge change in terms of lines touched, and it also has some performance impact, as we'd be schlepping another object (actually just a pointer) through a lot of code paths and only end up using it in a few cases.

Richard's current patch uses a ThreadLocal variable to avoid having to carry this information around (as suggested by me), but now I have doubts whether that is the right solution. On the client side we'd have to re-initialize it at every relevant JDBC entry point method (Connection.execute* ?), as it may not be called from the same thread where the connection was created. On the server side I believe that a scanner invocation doesn't switch threads during execution, so setting the ThreadLocal variable before starting the scanner may be enough.

Please share your thoughts on the above. Is adding the new parameters to the execution call path acceptable? Do you think ThreadLocals are a good idea, or that they would even work? Do you have some other, better suggestion?
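To make the proposal a bit more concrete, here is a minimal illustrative sketch of the two mechanisms mentioned above: pushing the client time zone to the server via a scan attribute, and holding the per-request state in a ThreadLocal. The attribute name "_ClientTimeZone" and the DateTimeContext holder are made-up names for this sketch only; they are not taken from Richard's patch or from the Phoenix code base.

{code:java}
// Illustrative sketch only; "_ClientTimeZone" and DateTimeContext are made-up names.
import java.time.ZoneId;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class DateTimeContextSketch {

    static final String TZ_ATTRIB = "_ClientTimeZone";

    // Client side: attach the connection-level time zone to the scan,
    // the same way other per-query state reaches the coprocessors.
    static void attachTimeZone(Scan scan, ZoneId clientZone) {
        scan.setAttribute(TZ_ATTRIB, Bytes.toBytes(clientZone.getId()));
    }

    // Server side: read the attribute back before the scanner starts and
    // seed the per-thread context with it. This relies on the scanner not
    // switching threads during execution.
    static void initFromScan(Scan scan) {
        byte[] tz = scan.getAttribute(TZ_ATTRIB);
        if (tz != null) {
            CONTEXT.get().zone = ZoneId.of(Bytes.toString(tz));
        }
    }

    // The ThreadLocal alternative to threading a context object through every
    // PDataType/Function call. On the client it would have to be re-seeded at
    // each relevant JDBC entry point, because the calling thread may not be
    // the one that created the connection.
    static final ThreadLocal<DateTimeContext> CONTEXT =
            ThreadLocal.withInitial(DateTimeContext::new);

    static class DateTimeContext {
        boolean jdbcCompliant;                 // legacy vs. new date handling
        String dateFormat;                     // date/time format string
        ZoneId zone = ZoneId.systemDefault();  // client time zone
    }
}
{code}

The attraction of the scan-attribute route is that it reuses existing plumbing instead of adding a parameter to every method signature; the open question is whether the ThreadLocal re-seeding at the JDBC entry points is reliable enough to be worth it.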

> The TimeZone is incorrectly used during writing or reading data
> ----------------------------------------------------------------
>
> Key: PHOENIX-5066
> URL: https://issues.apache.org/jira/browse/PHOENIX-5066
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 5.0.0, 4.14.1
> Reporter: Jaanai Zhang
> Assignee: Istvan Toth
> Priority: Critical
> Fix For: 4.17.0, 5.2.0, 4.16.2
> Attachments: DateTest.java, PHOENIX-5066.4x.v1.patch, PHOENIX-5066.4x.v2.patch, PHOENIX-5066.4x.v3.patch, PHOENIX-5066.master.v1.patch, PHOENIX-5066.master.v2.patch, PHOENIX-5066.master.v3.patch, PHOENIX-5066.master.v4.patch, PHOENIX-5066.master.v5.patch, PHOENIX-5066.master.v6.patch
> Time Spent: 20m
> Remaining Estimate: 0h
>
> We have two ways to write data when using the JDBC API:
> #1. Use the _executeUpdate_ method to execute an upsert SQL string.
> #2. Use the _prepareStatement_ method to set objects and execute.
> The _string_ data needs to be converted to a new object based on the table's schema information. We use date formatters to convert the string data to objects for the Date/Time/Timestamp types when writing data, and the same formatters are used when reading data as well.
>
> *Uses default timezone test*
> Writing 3 records in the different ways:
> {code:java}
> UPSERT INTO date_test VALUES (1,'2018-12-10 15:40:47','2018-12-10 15:40:47','2018-12-10 15:40:47')
> UPSERT INTO date_test VALUES (2,to_date('2018-12-10 15:40:47'),to_time('2018-12-10 15:40:47'),to_timestamp('2018-12-10 15:40:47'))
> stmt.setInt(1, 3); stmt.setDate(2, date); stmt.setTime(3, time); stmt.setTimestamp(4, ts);
> {code}
> Reading the table with the getObject (getDate/getTime/getTimestamp) methods:
> {code:java}
> 1 | 2018-12-10 | 23:45:07 | 2018-12-10 23:45:07.0
> 2 | 2018-12-10 | 23:45:07 | 2018-12-10 23:45:07.0
> 3 | 2018-12-10 | 15:45:07 | 2018-12-10 15:45:07.66
> {code}
> Reading the table with the getString method:
> {code:java}
> 1 | 2018-12-10 15:45:07.000 | 2018-12-10 15:45:07.000 | 2018-12-10 15:45:07.000
> 2 | 2018-12-10 15:45:07.000 | 2018-12-10 15:45:07.000 | 2018-12-10 15:45:07.000
> 3 | 2018-12-10 07:45:07.660 | 2018-12-10 07:45:07.660 | 2018-12-10 07:45:07.660
> {code}
> *Uses GMT+8 test*
> Writing 3 records in the different ways:
> {code:java}
> UPSERT INTO date_test VALUES (1,'2018-12-10 15:40:47','2018-12-10 15:40:47','2018-12-10 15:40:47')
> UPSERT INTO date_test VALUES (2,to_date('2018-12-10 15:40:47'),to_time('2018-12-10 15:40:47'),to_timestamp('2018-12-10 15:40:47'))
> stmt.setInt(1, 3); stmt.setDate(2, date); stmt.setTime(3, time); stmt.setTimestamp(4, ts);
> {code}
> Reading the table with the getObject (getDate/getTime/getTimestamp) methods:
> {code:java}
> 1 | 2018-12-10 | 23:40:47 | 2018-12-10 23:40:47.0
> 2 | 2018-12-10 | 15:40:47 | 2018-12-10 15:40:47.0
> 3 | 2018-12-10 | 15:40:47 | 2018-12-10 15:40:47.106
> {code}
> Reading the table with the getString method:
> {code:java}
> 1 | 2018-12-10 23:40:47.000 | 2018-12-10 23:40:47.000 | 2018-12-10 23:40:47.000
> 2 | 2018-12-10 15:40:47.000 | 2018-12-10 15:40:47.000 | 2018-12-10 15:40:47.000
> 3 | 2018-12-10 15:40:47.106 | 2018-12-10 15:40:47.106 | 2018-12-10 15:40:47.106
> {code}
>
> We have a historical problem: in #1 we parse the string into Date/Time/Timestamp objects with the time zone applied, which means the actual data is changed when it is stored in the HBase table.
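For convenience, a self-contained JDBC sketch of the reproduction described in the quoted report follows. This is not the attached DateTest.java; the table name DATE_TEST, the column layout and the JDBC URL are placeholders, and only standard JDBC calls plus Phoenix's UPSERT/TO_DATE/TO_TIME/TO_TIMESTAMP syntax are assumed.

{code:java}
// Illustrative reproduction sketch; table name, columns and JDBC URL are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.Timestamp;

public class DateTimeZoneRepro {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            conn.setAutoCommit(true);
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("CREATE TABLE IF NOT EXISTS DATE_TEST "
                        + "(ID INTEGER PRIMARY KEY, D DATE, T TIME, TS TIMESTAMP)");
                // #1: string literals are parsed on the client, applying the
                // configured date format time zone before the value is stored.
                stmt.executeUpdate("UPSERT INTO DATE_TEST VALUES "
                        + "(1, '2018-12-10 15:40:47', '2018-12-10 15:40:47', '2018-12-10 15:40:47')");
                stmt.executeUpdate("UPSERT INTO DATE_TEST VALUES "
                        + "(2, TO_DATE('2018-12-10 15:40:47'), TO_TIME('2018-12-10 15:40:47'), "
                        + "TO_TIMESTAMP('2018-12-10 15:40:47'))");
            }
            // #2: java.sql objects are written as-is (epoch millis), without parsing.
            Timestamp ts = Timestamp.valueOf("2018-12-10 15:40:47");
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO DATE_TEST VALUES (?, ?, ?, ?)")) {
                ps.setInt(1, 3);
                ps.setDate(2, new java.sql.Date(ts.getTime()));
                ps.setTime(3, new java.sql.Time(ts.getTime()));
                ps.setTimestamp(4, ts);
                ps.executeUpdate();
            }
            // The two read paths disagree for rows 1 and 2 when the client is not in GMT.
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT ID, TS FROM DATE_TEST")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1) + " | "
                            + rs.getTimestamp(2) + " | " + rs.getString(2));
                }
            }
        }
    }
}
{code}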