[ https://issues.apache.org/jira/browse/HIVE-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541313#comment-17541313 ]
Peter Vary commented on HIVE-26233:
-----------------------------------

I checked the specific TS (9999-12-31-23:59:59.999) generated with Hive2. The TS is converted to UTC and the resulting NanoTime has been written to Parquet. Since the local TZ was EST, the actual date written to Parquet is 10000-01-01-04:59:59.999.

Since the TS was written with old Parquet/Hive, we correctly want to use the legacy timestamp conversion:
{code}
if (legacyConversion) {
  try {
    // Interpret the printed timestamp in the source timezone
    DateFormat formatter = getLegacyDateFormatter();
    formatter.setTimeZone(TimeZone.getTimeZone(fromZone));
    java.util.Date date = formatter.parse(ts.toString());
    // Set the formatter to use a different timezone
    formatter.setTimeZone(TimeZone.getTimeZone(toZone));
    Timestamp result = Timestamp.valueOf(formatter.format(date));
    // The legacy format has no sub-second part, so carry the nanos over
    result.setNanos(ts.getNanos());
    return result;
  } catch (ParseException e) {
    throw new RuntimeException(e);
  }
}
{code}
This fails because {{toString()}} prints {{\+10000-01-01 04:59:59.999}} - notice the {{\+}} sign at the beginning of the String. Then we try to parse it and we fail. So we have to either print it without the {{\+}} or parse it correctly.

After my proposed changes we write it out without a {{\+}} sign, and we can read it back correctly. This way - after applying the conversion - we are able to get back the original (9999-12-31-23:59:59.999) TS.

> Problems reading back PARQUET timestamps above 10000 years
> ----------------------------------------------------------
>
>                 Key: HIVE-26233
>                 URL: https://issues.apache.org/jira/browse/HIVE-26233
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: backwards-compatibility, pull-request-available, timestamp
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is
> appended to the timestamp if the year exceeds 4 digits.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
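
The failure described in the comment can be reproduced with the JDK alone. The sketch below is illustrative only and does not use Hive's classes: the {{TimestampSignDemo}} class, its two formatters and the {{convertOrNull}} helper are hypothetical names, and the {{yyyy-MM-dd HH:mm:ss}} pattern is an assumption about what the legacy formatter looks like. It contrasts {{SignStyle.EXCEEDS_PAD}} (which prints a {{+}} once the year needs more than 4 digits) with {{SignStyle.NORMAL}} (which does not), then runs both strings through a parse-in-one-zone, format-in-another conversion.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class TimestampSignDemo {
    // Hypothetical stand-ins for a print formatter; NOT Hive's actual PRINT_FORMATTER.
    static final DateTimeFormatter WITH_SIGN = build(SignStyle.EXCEEDS_PAD);
    static final DateTimeFormatter WITHOUT_SIGN = build(SignStyle.NORMAL);

    static DateTimeFormatter build(SignStyle yearSign) {
        return new DateTimeFormatterBuilder()
            // Year is padded to 4 digits; the sign style decides what happens beyond that.
            .appendValue(ChronoField.YEAR, 4, 10, yearSign)
            .appendPattern("-MM-dd HH:mm:ss")
            .toFormatter(Locale.ROOT);
    }

    // Same shape as the legacy conversion: parse in fromZone, format in toZone.
    // Returns null when the text cannot be parsed. The pattern is an assumption.
    static String convertOrNull(String text, String fromZone, String toZone) {
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        formatter.setTimeZone(TimeZone.getTimeZone(fromZone));
        try {
            Date date = formatter.parse(text);
            formatter.setTimeZone(TimeZone.getTimeZone(toZone));
            return formatter.format(date);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(10000, 1, 1, 4, 59, 59);
        String signed = WITH_SIGN.format(ts);      // +10000-01-01 04:59:59
        String unsigned = WITHOUT_SIGN.format(ts); // 10000-01-01 04:59:59
        System.out.println(signed + " -> " + convertOrNull(signed, "UTC", "GMT-05:00"));
        System.out.println(unsigned + " -> " + convertOrNull(unsigned, "UTC", "GMT-05:00"));
    }
}
```

A fixed {{GMT-05:00}} offset stands in for EST so the result does not depend on daylight-saving rules. With the unsigned string the conversion yields 9999-12-31 23:59:59, matching the original value up to the nanos that the real code restores separately.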