zabetak commented on code in PR #5241:
URL: https://github.com/apache/hive/pull/5241#discussion_r1601339480


##########
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java:
##########
@@ -119,7 +128,44 @@ public static Timestamp getTimestamp(NanoTime nt, ZoneId 
targetZone, boolean leg
     calendar.set(Calendar.SECOND, seconds);
 
     Timestamp ts = Timestamp.ofEpochMilli(calendar.getTimeInMillis(), (int) 
nanos);
-    ts = TimestampTZUtil.convertTimestampToZone(ts, ZoneOffset.UTC, 
targetZone, legacyConversion);
+    ts = TimestampTZUtil.convertTimestampToZone(ts, ZoneOffset.UTC, 
targetZone, legacyConversion)
+        .minusDays(leapYearDateAdjustment);
+
     return ts;
   }
+
+  private static int julianLeapYearAdjustment(JulianDate jDateTime) {
+    Set<Integer> validDaysOfMonth = new HashSet<>();
+    validDaysOfMonth.add(1);
+    validDaysOfMonth.add(2);
+    validDaysOfMonth.add(27);
+    validDaysOfMonth.add(28);
+    validDaysOfMonth.add(29);
+    LocalDateTime localDateTime;
+    try {
+      localDateTime = jDateTime.toLocalDateTime();
+    } catch (DateTimeException e) {
+      if (e.getMessage().contains(ErrorMsg.NOT_LEAP_YEAR.getMsg())) {
+        return 2;
+      } else {
+        throw e;
+      }
+    }
+
+    int year = localDateTime.getYear();
+    int dayOfMonth = localDateTime.getDayOfMonth();
+    boolean isJulianLeapYear = (year % 4 == 0 && year % 400 != 0 && year % 100 
== 0);
+
+    if (year < 1582 && isJulianLeapYear && 
validDaysOfMonth.contains(dayOfMonth)) {
+      switch (localDateTime.getMonth()) {
+      case FEBRUARY:
+        return -2;
+      case MARCH:
+        return 2;
+      default:
+        return 0;
+      }
+    }
+    return 0;
+  }

Review Comment:
   This logic is complex and pretty expensive in terms of performance for 
something that will be done for every single timestamp value (especially the 
try catch block).
   
   According to the simple test below the problematic Julian days seem to be 
rather few and they are known in advance:
   ```java
       int start = JulianDate.of(LocalDate.of(0, 1, 1)).getInteger();
       int end = JulianDate.of(LocalDate.of(9999, 12, 31)).getInteger();
       System.out.println("Start Julian day " + start);
       System.out.println("End Julian day " + end);
       for (int i = start; i <= end; i++) {
         JulianDate date = new JulianDate(i, 0.0);
         try {
           date.toLocalDateTime();
         } catch (Exception e) {
           System.out.println("Problem converting Julian day " + i);
         }
       }
   ```
   ```
   Start Julian day 1721057
   End Julian day 5373483
   Problem converting Julian day 1757642
   Problem converting Julian day 1794167
   Problem converting Julian day 1830692
   Problem converting Julian day 1903742
   Problem converting Julian day 1940267
   Problem converting Julian day 1976792
   Problem converting Julian day 2049842
   Problem converting Julian day 2086367
   Problem converting Julian day 2122892
   Problem converting Julian day 2195942
   Problem converting Julian day 2232467
   Problem converting Julian day 2268992
   ```
   Wouldn't be easier and more efficient to do the shifting based on the 12 
dates printed above? Am I missing more problematic dates?



##########
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java:
##########
@@ -100,12 +107,14 @@ public static Timestamp getTimestamp(NanoTime nt, ZoneId 
targetZone, boolean leg
       julianDay--;
     }
 
-    JulianDate jDateTime;
-    jDateTime = JulianDate.of((double) julianDay);
+    JulianDate jDateTime = JulianDate.of((double) julianDay);
+    LocalDateTime localDateTime;
+    int leapYearDateAdjustment = legacyConversion ?  
julianLeapYearAdjustment(jDateTime) : 0;
+    localDateTime = jDateTime.add(leapYearDateAdjustment).toLocalDateTime();
     Calendar calendar = getGMTCalendar();
-    calendar.set(Calendar.YEAR, jDateTime.toLocalDateTime().getYear());

Review Comment:
   The fact that the exception originates from `JulianDate` class makes me 
believe that this is a bug/problem to be addressed in the `jodd` library. I 
also think that this exception is mainly a result of HIVE-25054 which upgraded 
the jodd library.



##########
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java:
##########
@@ -100,12 +107,14 @@ public static Timestamp getTimestamp(NanoTime nt, ZoneId 
targetZone, boolean leg
       julianDay--;
     }
 
-    JulianDate jDateTime;
-    jDateTime = JulianDate.of((double) julianDay);
+    JulianDate jDateTime = JulianDate.of((double) julianDay);
+    LocalDateTime localDateTime;
+    int leapYearDateAdjustment = legacyConversion ?  
julianLeapYearAdjustment(jDateTime) : 0;

Review Comment:
   Are we sure that this leap year problem affects only the `legacyConversion`? 
What is preventing us from getting a problematic julian day when 
`legacyConversion` is `false`?



##########
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java:
##########
@@ -119,7 +128,44 @@ public static Timestamp getTimestamp(NanoTime nt, ZoneId 
targetZone, boolean leg
     calendar.set(Calendar.SECOND, seconds);
 
     Timestamp ts = Timestamp.ofEpochMilli(calendar.getTimeInMillis(), (int) 
nanos);
-    ts = TimestampTZUtil.convertTimestampToZone(ts, ZoneOffset.UTC, 
targetZone, legacyConversion);
+    ts = TimestampTZUtil.convertTimestampToZone(ts, ZoneOffset.UTC, 
targetZone, legacyConversion)
+        .minusDays(leapYearDateAdjustment);
+
     return ts;
   }
+
+  private static int julianLeapYearAdjustment(JulianDate jDateTime) {
+    Set<Integer> validDaysOfMonth = new HashSet<>();
+    validDaysOfMonth.add(1);
+    validDaysOfMonth.add(2);
+    validDaysOfMonth.add(27);
+    validDaysOfMonth.add(28);
+    validDaysOfMonth.add(29);
+    LocalDateTime localDateTime;
+    try {
+      localDateTime = jDateTime.toLocalDateTime();
+    } catch (DateTimeException e) {
+      if (e.getMessage().contains(ErrorMsg.NOT_LEAP_YEAR.getMsg())) {
+        return 2;
+      } else {
+        throw e;
+      }
+    }
+
+    int year = localDateTime.getYear();
+    int dayOfMonth = localDateTime.getDayOfMonth();
+    boolean isJulianLeapYear = (year % 4 == 0 && year % 400 != 0 && year % 100 
== 0);
+
+    if (year < 1582 && isJulianLeapYear && 
validDaysOfMonth.contains(dayOfMonth)) {
+      switch (localDateTime.getMonth()) {
+      case FEBRUARY:
+        return -2;
+      case MARCH:
+        return 2;
+      default:
+        return 0;
+      }
+    }
+    return 0;
+  }

Review Comment:
   Moreover it kinda feels like this kind of shifting should be done inside the 
`JulianDate` class. I haven't fully formulated an opinion yet but I raised 
https://github.com/oblac/jodd-util/issues/21 to see what the jodd authors have 
to say about this use case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to