[ https://issues.apache.org/jira/browse/HIVE-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541313#comment-17541313 ]

Peter Vary commented on HIVE-26233:
-----------------------------------

I checked the specific TS (9999-12-31 23:59:59.999) generated with Hive2.
The TS was converted to UTC and the resulting NanoTime was written to 
Parquet. Since the local TZ was EST, the actual value written to Parquet is 
10000-01-01 04:59:59.999.
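
To make that arithmetic concrete, here is a minimal java.time sketch of the write-time conversion (using America/New_York as a stand-in for the EST session zone; this is an illustration, not the Hive2 code path itself):
{code}
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class Hive2WriteSketch {
  public static void main(String[] args) {
    // The timestamp as the user sees it in the local (EST) writer time zone.
    LocalDateTime local = LocalDateTime.of(9999, 12, 31, 23, 59, 59, 999_000_000);

    // Hive2 normalized the value to UTC before encoding it as a Parquet NanoTime,
    // which pushes this value over the year-10000 boundary.
    ZonedDateTime utc = local.atZone(ZoneId.of("America/New_York"))
        .withZoneSameInstant(ZoneId.of("UTC"));

    System.out.println(utc.toLocalDateTime()); // prints +10000-01-01T04:59:59.999
  }
}
{code}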

Since the TS was written with old Parquet/Hive, we correctly want to use the 
legacy timestamp conversion:
{code}
    if (legacyConversion) {
      try {
        DateFormat formatter = getLegacyDateFormatter();
        formatter.setTimeZone(TimeZone.getTimeZone(fromZone));
        // Parse the printed timestamp string in the source time zone
        java.util.Date date = formatter.parse(ts.toString());
        // Set the formatter to use a different timezone
        formatter.setTimeZone(TimeZone.getTimeZone(toZone));
        Timestamp result = Timestamp.valueOf(formatter.format(date));
        result.setNanos(ts.getNanos());
        return result;
      } catch (ParseException e) {
        throw new RuntimeException(e);
      }
    }
{code}

This fails because {{toString()}} prints {{\+10000-01-01 04:59:59.999}} 
- notice the {{\+}} sign at the beginning of the String. Then we try to parse 
it and we fail. So we have to either print it without the {{\+}} or parse it 
correctly. After my proposed changes we write it out without a {{\+}} sign, and 
we can read it back correctly. This way - after applying the conversion - we 
are able to get back the original (9999-12-31 23:59:59.999) TS.
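
As a sketch of what the formatter change boils down to (this is not the actual {{PRINT_FORMATTER}} definition from the patch, just an illustration of the sign-style difference for the year field):
{code}
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;

public class SignStyleSketch {
  public static void main(String[] args) {
    LocalDateTime ldt = LocalDateTime.of(10000, 1, 1, 4, 59, 59, 999_000_000);

    // EXCEEDS_PAD (the ISO default for years) prepends '+' once the year needs
    // more than 4 digits - this is the string the legacy parser chokes on.
    DateTimeFormatter withSign = new DateTimeFormatterBuilder()
        .appendValue(ChronoField.YEAR, 4, 10, SignStyle.EXCEEDS_PAD)
        .appendPattern("-MM-dd HH:mm:ss.SSS")
        .toFormatter();

    // NORMAL only emits a sign for negative years, so 5-digit years round-trip.
    DateTimeFormatter noSign = new DateTimeFormatterBuilder()
        .appendValue(ChronoField.YEAR, 1, 10, SignStyle.NORMAL)
        .appendPattern("-MM-dd HH:mm:ss.SSS")
        .toFormatter();

    System.out.println(withSign.format(ldt)); // +10000-01-01 04:59:59.999
    System.out.println(noSign.format(ldt));   // 10000-01-01 04:59:59.999
  }
}
{code}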

> Problems reading back PARQUET timestamps above 10000 years
> ----------------------------------------------------------
>
>                 Key: HIVE-26233
>                 URL: https://issues.apache.org/jira/browse/HIVE-26233
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: backwards-compatibility, pull-request-available, 
> timestamp
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
