[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=775108=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-775108
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 26/May/22 15:15
Start Date: 26/May/22 15:15
Worklog Time Spent: 10m 
  Work Description: pvary merged PR #3295:
URL: https://github.com/apache/hive/pull/3295




Issue Time Tracking
---

Worklog Id: (was: 775108)
Time Spent: 2h 20m  (was: 2h 10m)

> Problems reading back PARQUET timestamps above 10000 years
> ----------------------------------------------------------
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
> Issue Type: Bug
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: backwards-compatibility, pull-request-available, timestamp
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration
> from Hive2 to Hive3 some might appear because of time zone (TZ) issues. We should
> be able to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER so that no {{+}} sign is
> appended to the timestamp if the year exceeds 4 digits.
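The `+` comes from the sign handling in `java.time`. The sketch below assumes that the existing `Timestamp.PRINT_FORMATTER` pads the year with `SignStyle.EXCEEDS_PAD`, as `DateTimeFormatter.ISO_LOCAL_DATE` does; this thread does not show that detail. It compares that style with `SignStyle.NORMAL`, the style used by the year field of the formatter added in PR #3295, and then parses the sign-less output back. The class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;

public class YearSignDemo {
  // Builds a "yyyy-MM-dd HH:mm:ss[.fffffffff]" formatter whose year field
  // (4 to 10 digits) uses the given sign style.
  static DateTimeFormatter formatter(SignStyle yearSign) {
    return new DateTimeFormatterBuilder()
        .appendValue(ChronoField.YEAR, 4, 10, yearSign).appendLiteral('-')
        .appendValue(ChronoField.MONTH_OF_YEAR, 2).appendLiteral('-')
        .appendValue(ChronoField.DAY_OF_MONTH, 2).appendLiteral(' ')
        .appendValue(ChronoField.HOUR_OF_DAY, 2).appendLiteral(':')
        .appendValue(ChronoField.MINUTE_OF_HOUR, 2).appendLiteral(':')
        .appendValue(ChronoField.SECOND_OF_MINUTE, 2)
        .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true).optionalEnd()
        .toFormatter();
  }

  public static void main(String[] args) {
    LocalDateTime big = LocalDateTime.of(10000, 1, 1, 0, 0);

    // EXCEEDS_PAD prints '+' once the year is wider than the 4-digit pad.
    System.out.println(big.format(formatter(SignStyle.EXCEEDS_PAD))); // +10000-01-01 00:00:00

    // NORMAL prints a sign only for negative years.
    String printed = big.format(formatter(SignStyle.NORMAL));
    System.out.println(printed); // 10000-01-01 00:00:00

    // The sign-less text parses back to the same value with the same formatter.
    System.out.println(LocalDateTime.parse(printed, formatter(SignStyle.NORMAL)).equals(big)); // true
  }
}
```

Years up to 9999 print identically with both styles; only wider years differ.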



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774820=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774820
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 21:26
Start Date: 25/May/22 21:26
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r882128671


##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampsHive2Compatibility.java:
##
@@ -79,6 +79,18 @@ void testWriteHive2ReadHive4UsingLegacyConversion(String timestampString) {
     assertEquals(timestampString, ts.toString());
   }
 
+  /**
+   * Tests that timestamps written using Hive2 APIs are read correctly by Hive4 APIs when legacy conversion is on.
+   */
+  @ParameterizedTest(name = "{0}")
+  @MethodSource("generateTimestamps")
+  void testWriteHive2ReadHive4UsingLegacyConversionWithZone(String timestampString) {
+    String zoneId = "US/Pacific";
+    NanoTime nt = writeHive2(timestampString);

Review Comment:
   Since there is no parameter in `writeHive2` for specifying the timezone, I think you will need to call `TimeZone.setDefault()` explicitly; otherwise it will not work.
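A minimal sketch of that approach, assuming (as the comment implies) that `writeHive2` picks up the JVM default time zone. The default is swapped only for the duration of the action and restored in `finally`, so other tests in the same JVM are not affected. The helper `withDefaultTimeZone` is hypothetical and is not the code merged in PR #3295.

```java
import java.util.TimeZone;
import java.util.function.Supplier;

public class DefaultZoneHelper {
  // Runs the action with the JVM default time zone temporarily set to zoneId,
  // and restores the previous default afterwards, even if the action throws.
  // Note: TimeZone.getTimeZone silently falls back to GMT for unknown IDs.
  static <T> T withDefaultTimeZone(String zoneId, Supplier<T> action) {
    TimeZone previous = TimeZone.getDefault();
    TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
    try {
      return action.get();
    } finally {
      TimeZone.setDefault(previous);
    }
  }

  public static void main(String[] args) {
    String inside = withDefaultTimeZone("US/Pacific", () -> TimeZone.getDefault().getID());
    System.out.println(inside);                        // US/Pacific
    System.out.println(TimeZone.getDefault().getID()); // the original default again
  }
}
```

In a JUnit test the same save/restore pair could equally live in setup and teardown methods.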





Issue Time Tracking
---

Worklog Id: (was: 774820)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774593=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774593
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 14:42
Start Date: 25/May/22 14:42
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881743843


##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java:
##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
 
+  @Test
+  public void testTimestamp() {

Review Comment:
   `TestParquetTimestampsHive2Compatibility` also uses multiple timezones, so I think it should be fine, and adding another one should be trivial.





Issue Time Tracking
---

Worklog Id: (was: 774593)
Time Spent: 2h  (was: 1h 50m)



[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774587=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774587
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 14:40
Start Date: 25/May/22 14:40
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881741459


##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java:
##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
 
+  @Test
+  public void testTimestamp() {

Review Comment:
   How about something like the following:
   
   ```java
     private static Stream<String> generateTimestamps() {
       return Stream.concat(Stream.of("-12-31 23:59:59.999"),
           Stream.generate(new Supplier<String>() {
   ```
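The suggestion above is cut off in the message, and its edge-case value is only partly preserved. A self-contained sketch of the same idea (one fixed edge case followed by randomly generated timestamps) could look like this. The edge value, seed, ranges and count are hypothetical choices for illustration, not a reconstruction of the elided value; a lambda replaces the anonymous `Supplier`, and `limit` is required because `Stream.generate` is infinite.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Random;
import java.util.stream.Stream;

public class TimestampSource {
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

  // Hypothetical edge case (a 5-digit year), for illustration only.
  static final String EDGE_CASE = "10000-01-01 00:00:00.000";

  // Returns the edge case first, then `count` pseudo-random timestamps.
  // A fixed seed keeps the generated values reproducible across runs.
  static Stream<String> generateTimestamps(long seed, int count) {
    Random random = new Random(seed);
    Stream<String> randomValues = Stream.generate(() -> LocalDateTime.of(
            1 + random.nextInt(9999),          // year 1..9999
            1 + random.nextInt(12),            // month 1..12
            1 + random.nextInt(28),            // day 1..28 is valid in every month
            random.nextInt(24),                // hour
            random.nextInt(60),                // minute
            random.nextInt(60),                // second
            random.nextInt(1000) * 1_000_000)  // whole milliseconds, as nanos
        .format(FORMAT))
        .limit(count);
    return Stream.concat(Stream.of(EDGE_CASE), randomValues);
  }

  public static void main(String[] args) {
    generateTimestamps(42L, 5).forEach(System.out::println);
  }
}
```

With JUnit 5, a method like this could feed `@MethodSource`, which accepts a `Stream` of arguments.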





Issue Time Tracking
---

Worklog Id: (was: 774587)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774581=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774581
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 14:18
Start Date: 25/May/22 14:18
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881712501


##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java:
##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
 
+  @Test
+  public void testTimestamp() {

Review Comment:
   I have never seen `TestParquetTimestampsHive2Compatibility`, so it is good that you highlighted it here. OTOH, that is a parameterized test without a TZ, so I think it would take a full rewrite to make it work for this specific TZ and TS.





Issue Time Tracking
---

Worklog Id: (was: 774581)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774576=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774576
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 14:09
Start Date: 25/May/22 14:09
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881702228


##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java:
##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
 
+  @Test
+  public void testTimestamp() {

Review Comment:
   Or rather enrich the tests inside `TestParquetTimestampsHive2Compatibility` 
to account for this edge case.





Issue Time Tracking
---

Worklog Id: (was: 774576)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774574=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774574
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 14:07
Start Date: 25/May/22 14:07
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881700144


##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java:
##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
 
+  @Test
+  public void testTimestamp() {

Review Comment:
   I was thinking to move it inside the existing class, not create a new one.





Issue Time Tracking
---

Worklog Id: (was: 774574)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774535=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774535
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 12:52
Start Date: 25/May/22 12:52
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881615549


##
common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java:
##
@@ -128,6 +138,10 @@ public String toString() {
     return localDateTime.format(PRINT_FORMATTER);
   }
 
+  public String toStingWithLenientFormatter() {
+    return localDateTime.format(PRINT_LENIENT_FORMATTER);
+  }
+

Review Comment:
   Good idea with the `format()` method





Issue Time Tracking
---

Worklog Id: (was: 774535)
Time Spent: 1h 10m  (was: 1h)



[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774534=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774534
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 12:52
Start Date: 25/May/22 12:52
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881615152


##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java:
##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
 
+  @Test
+  public void testTimestamp() {

Review Comment:
   I think for a single test we should not create a separate test class.
   Added a comment to make clear what we are testing here.





Issue Time Tracking
---

Worklog Id: (was: 774534)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774533=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774533
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 12:51
Start Date: 25/May/22 12:51
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881614470


##
common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java:
##
@@ -101,6 +101,16 @@ public class Timestamp implements Comparable<Timestamp> {
   // Fractional Part (Optional)
   .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true).optionalEnd().toFormatter();
 
+  private static final DateTimeFormatter PRINT_LENIENT_FORMATTER = new DateTimeFormatterBuilder()
+  // Date and Time Parts
+  .appendValue(YEAR, 4, 10, SignStyle.NORMAL).appendLiteral('-').appendValue(MONTH_OF_YEAR, 2, 2, SignStyle.NORMAL)
+  .appendLiteral('-').appendValue(DAY_OF_MONTH, 2, 2, SignStyle.NORMAL)
+  .appendLiteral(" ").appendValue(HOUR_OF_DAY, 2, 2, SignStyle.NORMAL).appendLiteral(':')
+  .appendValue(MINUTE_OF_HOUR, 2, 2, SignStyle.NORMAL).appendLiteral(':')
+  .appendValue(SECOND_OF_MINUTE, 2, 2, SignStyle.NORMAL)
+  // Fractional Part (Optional)
+  .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true).optionalEnd().toFormatter();
+

Review Comment:
   Moved to TimestampTZUtil, and renamed





Issue Time Tracking
---

Worklog Id: (was: 774533)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774483=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774483
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 11:12
Start Date: 25/May/22 11:12
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881514185


##
common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java:
##
@@ -101,6 +101,16 @@ public class Timestamp implements Comparable<Timestamp> {
   // Fractional Part (Optional)
   .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true).optionalEnd().toFormatter();
 
+  private static final DateTimeFormatter PRINT_LENIENT_FORMATTER = new DateTimeFormatterBuilder()
+  // Date and Time Parts
+  .appendValue(YEAR, 4, 10, SignStyle.NORMAL).appendLiteral('-').appendValue(MONTH_OF_YEAR, 2, 2, SignStyle.NORMAL)
+  .appendLiteral('-').appendValue(DAY_OF_MONTH, 2, 2, SignStyle.NORMAL)
+  .appendLiteral(" ").appendValue(HOUR_OF_DAY, 2, 2, SignStyle.NORMAL).appendLiteral(':')
+  .appendValue(MINUTE_OF_HOUR, 2, 2, SignStyle.NORMAL).appendLiteral(':')
+  .appendValue(SECOND_OF_MINUTE, 2, 2, SignStyle.NORMAL)
+  // Fractional Part (Optional)
+  .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true).optionalEnd().toFormatter();
+

Review Comment:
   Maybe it would be better to move this formatter to `TimestampTZUtil`, since it should be used strictly for `LEGACY` purposes and there is another formatter there as well.



##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java:
##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
 
+  @Test
+  public void testTimestamp() {

Review Comment:
   What do you think of refactoring and moving the test into `TestParquetTimestampsHive2Compatibility`, which is exactly about compatibility with Hive 2?



##
common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java:
##
@@ -101,6 +101,16 @@ public class Timestamp implements Comparable<Timestamp> {
   // Fractional Part (Optional)
   .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true).optionalEnd().toFormatter();
 
+  private static final DateTimeFormatter PRINT_LENIENT_FORMATTER = new DateTimeFormatterBuilder()
+  // Date and Time Parts
+  .appendValue(YEAR, 4, 10, SignStyle.NORMAL).appendLiteral('-').appendValue(MONTH_OF_YEAR, 2, 2, SignStyle.NORMAL)
+  .appendLiteral('-').appendValue(DAY_OF_MONTH, 2, 2, SignStyle.NORMAL)
+  .appendLiteral(" ").appendValue(HOUR_OF_DAY, 2, 2, SignStyle.NORMAL).appendLiteral(':')
+  .appendValue(MINUTE_OF_HOUR, 2, 2, SignStyle.NORMAL).appendLiteral(':')
+  .appendValue(SECOND_OF_MINUTE, 2, 2, SignStyle.NORMAL)
+  // Fractional Part (Optional)
+  .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true).optionalEnd().toFormatter();
+

Review Comment:
   Also, using `LENIENT` in the name is a bit misleading, since it implies that `DateTimeFormatterBuilder#parseLenient` is in use, which is not the case here.



##
common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java:
##
@@ -128,6 +138,10 @@ public String toString() {
     return localDateTime.format(PRINT_FORMATTER);
   }
 
+  public String toStingWithLenientFormatter() {
+    return localDateTime.format(PRINT_LENIENT_FORMATTER);
+  }
+

Review Comment:
   The use of `Lenient` is a bit misleading, as I wrote previously. Also, there is a small typo in the method name: `toSting` vs `toString`.
   
   Instead of adding a new method, we could use `Timestamp#format`, passing in the desired formatter. With the right name for the formatter, the intention would be clearer.
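A minimal sketch of that suggestion, using a hypothetical stand-in class rather than Hive's actual `Timestamp`: the value exposes a single `format(DateTimeFormatter)` method, and the caller passes a formatter whose name states its intent. The formatter name `DATE_WITHOUT_PLUS_SIGN` is also hypothetical; the PR renamed its formatter, but the new name does not appear in this thread.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;

public class FormatWithNamedFormatter {
  // Hypothetical stand-in for a timestamp type that wraps a LocalDateTime.
  static final class SimpleTimestamp {
    private final LocalDateTime localDateTime;

    SimpleTimestamp(LocalDateTime localDateTime) {
      this.localDateTime = localDateTime;
    }

    // A single entry point: the caller chooses the formatter, so no
    // toStringWithX() method is needed for each variant.
    String format(DateTimeFormatter formatter) {
      return localDateTime.format(formatter);
    }
  }

  // The name describes the behavior (no '+' for years wider than 4 digits)
  // instead of suggesting lenient parsing.
  static final DateTimeFormatter DATE_WITHOUT_PLUS_SIGN = new DateTimeFormatterBuilder()
      .appendValue(ChronoField.YEAR, 4, 10, SignStyle.NORMAL).appendLiteral('-')
      .appendValue(ChronoField.MONTH_OF_YEAR, 2).appendLiteral('-')
      .appendValue(ChronoField.DAY_OF_MONTH, 2)
      .toFormatter();

  public static void main(String[] args) {
    SimpleTimestamp ts = new SimpleTimestamp(LocalDateTime.of(12345, 6, 7, 8, 9));
    System.out.println(ts.format(DateTimeFormatter.ISO_LOCAL_DATE)); // +12345-06-07
    System.out.println(ts.format(DATE_WITHOUT_PLUS_SIGN));           // 12345-06-07
  }
}
```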





Issue Time Tracking
---

Worklog Id: (was: 774483)
Time Spent: 40m  (was: 0.5h)


[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774462
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 10:00
Start Date: 25/May/22 10:00
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881466988


##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java:
##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
 
+  @Test

Review Comment:
   What I wrote above is not true. Reverting the changes about the proleptic calendar in `NanoTimeUtils` does not fix this test. I don't know why the test was passing before, but I suspect a "stale" workspace.
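For context on the "proleptic calendar" mentioned above: `java.util.GregorianCalendar` is, by default, a hybrid Julian/Gregorian calendar (Julian before October 1582), while `java.time` applies the Gregorian rules to all dates. The same wall-clock date can therefore map to different epoch days. The sketch only illustrates that general difference; it does not reproduce what `NanoTimeUtils` does, and the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class CalendarDemo {
  // Epoch day of <year>-01-01 in UTC, computed with GregorianCalendar.
  // With proleptic == true the Julian cutover is moved to the far past,
  // which turns GregorianCalendar into a pure Gregorian calendar.
  static long epochDayViaGregorianCalendar(int year, boolean proleptic) {
    GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    if (proleptic) {
      cal.setGregorianChange(new Date(Long.MIN_VALUE));
    }
    cal.clear();
    cal.set(year, Calendar.JANUARY, 1);
    return Math.floorDiv(cal.getTimeInMillis(), 86_400_000L);
  }

  public static void main(String[] args) {
    long javaTime = LocalDate.of(1000, 1, 1).toEpochDay();
    System.out.println(javaTime == epochDayViaGregorianCalendar(1000, false)); // false
    System.out.println(javaTime == epochDayViaGregorianCalendar(1000, true));  // true
  }
}
```

For dates after the 1582 cutover both approaches agree, so the difference only shows up for old timestamps.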





Issue Time Tracking
---

Worklog Id: (was: 774462)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=773249=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-773249
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 22/May/22 20:48
Start Date: 22/May/22 20:48
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r878923538


##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java:
##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
     verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
 
+  @Test

Review Comment:
   The test passes if you exclude all the changes in this PR and revert some lines in `NanoTimeUtils`, as I mentioned in the JIRA case. This makes me think that we need to treat the problem differently, since it does not appear to be related to the formatter.





Issue Time Tracking
---

Worklog Id: (was: 773249)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=771309=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-771309
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 17/May/22 12:48
Start Date: 17/May/22 12:48
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request, #3295:
URL: https://github.com/apache/hive/pull/3295

   ### What changes were proposed in this pull request?
   Timestamps whose year has more than 4 digits will no longer have an extra + sign in front of them.
   
   ### Why are the changes needed?
   So that these large timestamps can be read and rewritten to conform to the original year limit.
   
   ### Does this PR introduce _any_ user-facing change?
   Timestamps whose year has more than 4 digits will no longer have an extra + sign in front of them. Since such timestamps were not supported even before this change, I think it is OK to include this.
   
   ### How was this patch tested?
   Added unit test
   




Issue Time Tracking
---

Worklog Id: (was: 771309)
Remaining Estimate: 0h
Time Spent: 10m
