[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years
[ https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=775108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-775108 ]

ASF GitHub Bot logged work on HIVE-26233:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 26/May/22 15:15
Start Date: 26/May/22 15:15
Worklog Time Spent: 10m
Work Description: pvary merged PR #3295:
URL: https://github.com/apache/hive/pull/3295

Issue Time Tracking
-------------------

Worklog Id: (was: 775108)
Time Spent: 2h 20m (was: 2h 10m)

> Problems reading back PARQUET timestamps above 10000 years
> ----------------------------------------------------------
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
> Issue Type: Bug
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: backwards-compatibility, pull-request-available, timestamp
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is
> appended to the timestamp if the year exceeds 4 digits.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
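
The `+` sign mentioned in the issue description can be reproduced with plain `java.time`, independent of Hive. The sketch below is not the Hive formatter; `YearSignDemo` and `build` are hypothetical names, and it assumes the sign comes from `SignStyle.EXCEEDS_PAD` on the year field (the thread does not show the existing `PRINT_FORMATTER` definition). It builds the same `yyyy-MM-dd HH:mm:ss` layout twice and varies only the year's sign style.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;

public class YearSignDemo {
  // Builds a "yyyy-MM-dd HH:mm:ss" style formatter; only the year's SignStyle varies.
  public static DateTimeFormatter build(SignStyle yearStyle) {
    return new DateTimeFormatterBuilder()
        .appendValue(ChronoField.YEAR, 4, 10, yearStyle).appendLiteral('-')
        .appendValue(ChronoField.MONTH_OF_YEAR, 2).appendLiteral('-')
        .appendValue(ChronoField.DAY_OF_MONTH, 2).appendLiteral(' ')
        .appendValue(ChronoField.HOUR_OF_DAY, 2).appendLiteral(':')
        .appendValue(ChronoField.MINUTE_OF_HOUR, 2).appendLiteral(':')
        .appendValue(ChronoField.SECOND_OF_MINUTE, 2)
        .toFormatter();
  }

  public static void main(String[] args) {
    LocalDateTime fiveDigitYear = LocalDateTime.of(10000, 1, 1, 0, 0, 0);
    // EXCEEDS_PAD adds a sign once the year is wider than the 4-digit pad width.
    System.out.println(build(SignStyle.EXCEEDS_PAD).format(fiveDigitYear)); // +10000-01-01 00:00:00
    // NORMAL only prints a sign for negative values.
    System.out.println(build(SignStyle.NORMAL).format(fiveDigitYear));      // 10000-01-01 00:00:00
  }
}
```

For four-digit years both styles print the same text, so a change of this kind would only affect values that were already outside the supported range.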
[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years
[ https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774820&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774820 ]

ASF GitHub Bot logged work on HIVE-26233:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 21:26
Start Date: 25/May/22 21:26
Worklog Time Spent: 10m
Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r882128671

## ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampsHive2Compatibility.java: ##
@@ -79,6 +79,18 @@ void testWriteHive2ReadHive4UsingLegacyConversion(String timestampString) {
     assertEquals(timestampString, ts.toString());
   }
+  /**
+   * Tests that timestamps written using Hive2 APIs are read correctly by Hive4 APIs when legacy conversion is on.
+   */
+  @ParameterizedTest(name = "{0}")
+  @MethodSource("generateTimestamps")
+  void testWriteHive2ReadHive4UsingLegacyConversionWithZone(String timestampString) {
+    String zoneId = "US/Pacific";
+    NanoTime nt = writeHive2(timestampString);

Review Comment:
Since there is no parameter in `writeHive2` for specifying the timezone, I think you will need to call `TimeZone.setDefault()` explicitly, otherwise it will not work.

Issue Time Tracking
-------------------

Worklog Id: (was: 774820)
Time Spent: 2h 10m (was: 2h)
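
The review comment above suggests calling `TimeZone.setDefault()` around code that has no timezone parameter. One common way to do that in a test without leaking the change into other tests is to save and restore the previous default. This is a minimal sketch of that pattern only; `DefaultZoneScope` and `withDefaultZone` are hypothetical names and are not part of the Hive test class.

```java
import java.util.TimeZone;
import java.util.function.Supplier;

public class DefaultZoneScope {
  // Runs body with the JVM default time zone set to zoneId, then restores the old default.
  public static <T> T withDefaultZone(String zoneId, Supplier<T> body) {
    TimeZone previous = TimeZone.getDefault();
    TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
    try {
      return body.get();
    } finally {
      // Restore even if body throws, so later tests see the original zone.
      TimeZone.setDefault(previous);
    }
  }

  public static void main(String[] args) {
    String inside = withDefaultZone("US/Pacific", () -> TimeZone.getDefault().getID());
    System.out.println("default zone inside the scope: " + inside);
    System.out.println("default zone afterwards: " + TimeZone.getDefault().getID());
  }
}
```

Because the default zone is JVM-wide state, a helper like this is only safe when tests that depend on it do not run concurrently in the same JVM.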
[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years
[ https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774593&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774593 ]

ASF GitHub Bot logged work on HIVE-26233:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 14:42
Start Date: 25/May/22 14:42
Worklog Time Spent: 10m
Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881743843

## ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java: ##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
+  @Test
+  public void testTimestamp() {

Review Comment:
`TestParquetTimestampsHive2Compatibility` also uses multiple timezones, so I think it should be fine, and adding another one should be trivial.

Issue Time Tracking
-------------------

Worklog Id: (was: 774593)
Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years
[ https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774587&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774587 ]

ASF GitHub Bot logged work on HIVE-26233:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 14:40
Start Date: 25/May/22 14:40
Worklog Time Spent: 10m
Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881741459

## ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java: ##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
+  @Test
+  public void testTimestamp() {

Review Comment:
How about something like the following:
```java
  private static Stream generateTimestamps() {
    return Stream.concat(Stream.of("-12-31 23:59:59.999"), Stream.generate(new Supplier() {
```

Issue Time Tracking
-------------------

Worklog Id: (was: 774587)
Time Spent: 1h 50m (was: 1h 40m)
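
The snippet quoted in the review comment above is cut off in the notification, so the following is only a self-contained sketch of the general idea it points at: prepend one fixed edge-case string to a stream of generated timestamp strings that feeds a parameterized test. The class name `TimestampSource`, the extra parameters, the value ranges and the edge-case literal used in `main` are all assumptions for illustration and are not taken from the Hive test.

```java
import java.util.Locale;
import java.util.Random;
import java.util.stream.Stream;

public class TimestampSource {
  // Returns the given edge case followed by randomCount pseudo-random timestamp strings.
  public static Stream<String> generateTimestamps(String edgeCase, int randomCount, long seed) {
    Random rnd = new Random(seed);
    Stream<String> random = Stream.generate(() -> String.format(Locale.ROOT,
            "%04d-%02d-%02d %02d:%02d:%02d",
            1 + rnd.nextInt(9999),   // year 0001..9999
            1 + rnd.nextInt(12),     // month
            1 + rnd.nextInt(28),     // day, capped at 28 so every month is valid
            rnd.nextInt(24), rnd.nextInt(60), rnd.nextInt(60)))
        .limit(randomCount);         // Stream.generate is infinite without a limit
    return Stream.concat(Stream.of(edgeCase), random);
  }

  public static void main(String[] args) {
    // Hypothetical edge case chosen for illustration only.
    generateTimestamps("9999-12-31 23:59:59.999", 3, 42L).forEach(System.out::println);
  }
}
```

With JUnit 5, a method of this shape (without parameters) could be referenced from `@MethodSource`, which is how the quoted test class appears to be wired.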
[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years
[ https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774581&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774581 ]

ASF GitHub Bot logged work on HIVE-26233:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 14:18
Start Date: 25/May/22 14:18
Worklog Time Spent: 10m
Work Description: pvary commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881712501

## ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java: ##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
+  @Test
+  public void testTimestamp() {

Review Comment:
I have never seen `TestParquetTimestampsHive2Compatibility`, so it is good that you highlighted it here. OTOH that is a parameterized test, without TZ, so I think it would be a full rewrite to get this to work for this specific TZ and TS.

Issue Time Tracking
-------------------

Worklog Id: (was: 774581)
Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years
[ https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774576 ]

ASF GitHub Bot logged work on HIVE-26233:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 14:09
Start Date: 25/May/22 14:09
Worklog Time Spent: 10m
Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881702228

## ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java: ##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
+  @Test
+  public void testTimestamp() {

Review Comment:
Or rather enrich the tests inside `TestParquetTimestampsHive2Compatibility` to account for this edge case.

Issue Time Tracking
-------------------

Worklog Id: (was: 774576)
Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years
[ https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774574 ]

ASF GitHub Bot logged work on HIVE-26233:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 14:07
Start Date: 25/May/22 14:07
Worklog Time Spent: 10m
Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881700144

## ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java: ##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
+  @Test
+  public void testTimestamp() {

Review Comment:
I was thinking to move it inside the existing class, not create a new one.

Issue Time Tracking
-------------------

Worklog Id: (was: 774574)
Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years
[ https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774535 ]

ASF GitHub Bot logged work on HIVE-26233:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 12:52
Start Date: 25/May/22 12:52
Worklog Time Spent: 10m
Work Description: pvary commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881615549

## common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java: ##
@@ -128,6 +138,10 @@ public String toString() {
     return localDateTime.format(PRINT_FORMATTER);
   }
+  public String toStingWithLenientFormatter() {
+    return localDateTime.format(PRINT_LENIENT_FORMATTER);
+  }
+

Review Comment:
Good idea with the `format()` method.

Issue Time Tracking
-------------------

Worklog Id: (was: 774535)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years
[ https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774534 ]

ASF GitHub Bot logged work on HIVE-26233:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 12:52
Start Date: 25/May/22 12:52
Worklog Time Spent: 10m
Work Description: pvary commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881615152

## ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java: ##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
+  @Test
+  public void testTimestamp() {

Review Comment:
I think we should not create a separate test class for a single test. Added a comment to make clear what we are testing here.

Issue Time Tracking
-------------------

Worklog Id: (was: 774534)
Time Spent: 1h (was: 50m)
[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years
[ https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774533 ]

ASF GitHub Bot logged work on HIVE-26233:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 12:51
Start Date: 25/May/22 12:51
Worklog Time Spent: 10m
Work Description: pvary commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881614470

## common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java: ##
@@ -101,6 +101,16 @@ public class Timestamp implements Comparable {
       // Fractional Part (Optional)
       .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true).optionalEnd().toFormatter();
+  private static final DateTimeFormatter PRINT_LENIENT_FORMATTER = new DateTimeFormatterBuilder()
+      // Date and Time Parts
+      .appendValue(YEAR, 4, 10, SignStyle.NORMAL).appendLiteral('-').appendValue(MONTH_OF_YEAR, 2, 2, SignStyle.NORMAL)
+      .appendLiteral('-').appendValue(DAY_OF_MONTH, 2, 2, SignStyle.NORMAL)
+      .appendLiteral(" ").appendValue(HOUR_OF_DAY, 2, 2, SignStyle.NORMAL).appendLiteral(':')
+      .appendValue(MINUTE_OF_HOUR, 2, 2, SignStyle.NORMAL).appendLiteral(':')
+      .appendValue(SECOND_OF_MINUTE, 2, 2, SignStyle.NORMAL)
+      // Fractional Part (Optional)
+      .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true).optionalEnd().toFormatter();
+

Review Comment:
Moved to TimestampTZUtil, and renamed.

Issue Time Tracking
-------------------

Worklog Id: (was: 774533)
Time Spent: 50m (was: 40m)
[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years
[ https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774483&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774483 ]

ASF GitHub Bot logged work on HIVE-26233:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 11:12
Start Date: 25/May/22 11:12
Worklog Time Spent: 10m
Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881514185

## common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java: ##
@@ -101,6 +101,16 @@ public class Timestamp implements Comparable {
       // Fractional Part (Optional)
       .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true).optionalEnd().toFormatter();
+  private static final DateTimeFormatter PRINT_LENIENT_FORMATTER = new DateTimeFormatterBuilder()
+      // Date and Time Parts
+      .appendValue(YEAR, 4, 10, SignStyle.NORMAL).appendLiteral('-').appendValue(MONTH_OF_YEAR, 2, 2, SignStyle.NORMAL)
+      .appendLiteral('-').appendValue(DAY_OF_MONTH, 2, 2, SignStyle.NORMAL)
+      .appendLiteral(" ").appendValue(HOUR_OF_DAY, 2, 2, SignStyle.NORMAL).appendLiteral(':')
+      .appendValue(MINUTE_OF_HOUR, 2, 2, SignStyle.NORMAL).appendLiteral(':')
+      .appendValue(SECOND_OF_MINUTE, 2, 2, SignStyle.NORMAL)
+      // Fractional Part (Optional)
+      .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true).optionalEnd().toFormatter();
+

Review Comment:
Maybe it would be better to move this formatter to `TimestampTZUtil`, since it should be used strictly for `LEGACY` purposes and there is another formatter there as well.

## ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java: ##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
+  @Test
+  public void testTimestamp() {

Review Comment:
What do you think of refactoring and moving the test to `TestParquetTimestampsHive2Compatibility`, which is exactly about compatibility with Hive 2?

## common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java: ##
@@ -101,6 +101,16 @@ public class Timestamp implements Comparable {
       // Fractional Part (Optional)
       .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true).optionalEnd().toFormatter();
+  private static final DateTimeFormatter PRINT_LENIENT_FORMATTER = new DateTimeFormatterBuilder()
+      // Date and Time Parts
+      .appendValue(YEAR, 4, 10, SignStyle.NORMAL).appendLiteral('-').appendValue(MONTH_OF_YEAR, 2, 2, SignStyle.NORMAL)
+      .appendLiteral('-').appendValue(DAY_OF_MONTH, 2, 2, SignStyle.NORMAL)
+      .appendLiteral(" ").appendValue(HOUR_OF_DAY, 2, 2, SignStyle.NORMAL).appendLiteral(':')
+      .appendValue(MINUTE_OF_HOUR, 2, 2, SignStyle.NORMAL).appendLiteral(':')
+      .appendValue(SECOND_OF_MINUTE, 2, 2, SignStyle.NORMAL)
+      // Fractional Part (Optional)
+      .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true).optionalEnd().toFormatter();
+

Review Comment:
Also, using `LENIENT` in the name is a bit misleading, since it implies that `DateTimeFormatterBuilder#parseLenient` is in use, which is not the case here.

## common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java: ##
@@ -128,6 +138,10 @@ public String toString() {
     return localDateTime.format(PRINT_FORMATTER);
   }
+  public String toStingWithLenientFormatter() {
+    return localDateTime.format(PRINT_LENIENT_FORMATTER);
+  }
+

Review Comment:
The use of `Lenient` is a bit misleading, as I wrote previously. Also there is a small typo in the method name: `toSting` vs `toString`. Instead of adding a new method we could use `Timestamp#format`, passing in the desired formatter. With the right naming for the formatter parameter it would make the intention clearer.

Issue Time Tracking
-------------------

Worklog Id: (was: 774483)
Time Spent: 40m (was: 0.5h)
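
The last review comment above prefers a single `format` method that takes the formatter as an argument over a dedicated `toString...` variant per formatter. The sketch below shows that shape on a stand-in class; `SimpleTimestamp` is hypothetical and is not Hive's `Timestamp`, whose actual `format` signature is not shown in this thread.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class SimpleTimestamp {
  private final LocalDateTime localDateTime;

  public SimpleTimestamp(LocalDateTime localDateTime) {
    this.localDateTime = localDateTime;
  }

  // Default rendering; callers that need something else pass their own formatter.
  @Override
  public String toString() {
    return localDateTime.format(DateTimeFormatter.ISO_LOCAL_DATE_TIME);
  }

  // One general entry point instead of one method per special-purpose formatter.
  public String format(DateTimeFormatter formatter) {
    return localDateTime.format(formatter);
  }

  public static void main(String[] args) {
    SimpleTimestamp ts = new SimpleTimestamp(LocalDateTime.of(2022, 5, 25, 11, 12, 0));
    System.out.println(ts.format(DateTimeFormatter.ISO_LOCAL_DATE));               // 2022-05-25
    System.out.println(ts.format(DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm"))); // 25/05/2022 11:12
  }
}
```

With this shape the special-purpose formatter can live next to the code that needs it, and its name at the call site documents the intent.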
[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years
[ https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774462&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774462 ]

ASF GitHub Bot logged work on HIVE-26233:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 10:00
Start Date: 25/May/22 10:00
Worklog Time Spent: 10m
Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881466988

## ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java: ##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
+  @Test

Review Comment:
What I wrote above is not true. Reverting the changes about the proleptic calendar in `NanoTimeUtils` does not fix this test. I don't know why the test was passing before, but I suspect a "stale" workspace.

Issue Time Tracking
-------------------

Worklog Id: (was: 774462)
Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years
[ https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=773249&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-773249 ]

ASF GitHub Bot logged work on HIVE-26233:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 22/May/22 20:48
Start Date: 22/May/22 20:48
Worklog Time Spent: 10m
Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r878923538

## ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java: ##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
+  @Test

Review Comment:
The test passes successfully if you exclude all the changes in this PR and revert some lines in `NanoTimeUtils`, as I mentioned in the JIRA case. This makes me think that we need to treat the problem differently, since it does not appear to be related to the formatter.

Issue Time Tracking
-------------------

Worklog Id: (was: 773249)
Time Spent: 20m (was: 10m)
[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years
[ https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=771309&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-771309 ]

ASF GitHub Bot logged work on HIVE-26233:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/May/22 12:48
Start Date: 17/May/22 12:48
Worklog Time Spent: 10m
Work Description: pvary opened a new pull request, #3295:
URL: https://github.com/apache/hive/pull/3295

### What changes were proposed in this pull request?
Timestamps where the year has more than 4 digits will not have an extra + sign in front of them.

### Why are the changes needed?
So that these big timestamps can be read and rewritten to conform to the original year limit.

### Does this PR introduce _any_ user-facing change?
Timestamps where the year has more than 4 digits will not have an extra + sign in front of them. Since such values were not supported even before this change, I think it is ok to have this in.

### How was this patch tested?
Added unit test

Issue Time Tracking
-------------------

Worklog Id: (was: 771309)
Remaining Estimate: 0h
Time Spent: 10m