zabetak commented on code in PR #5241:
URL: https://github.com/apache/hive/pull/5241#discussion_r1601339480
##########
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java:
##########
@@ -119,7 +128,44 @@ public static Timestamp getTimestamp(NanoTime nt, ZoneId
targetZone, boolean leg
calendar.set(Calendar.SECOND, seconds);
Timestamp ts = Timestamp.ofEpochMilli(calendar.getTimeInMillis(), (int)
nanos);
- ts = TimestampTZUtil.convertTimestampToZone(ts, ZoneOffset.UTC,
targetZone, legacyConversion);
+ ts = TimestampTZUtil.convertTimestampToZone(ts, ZoneOffset.UTC,
targetZone, legacyConversion)
+ .minusDays(leapYearDateAdjustment);
+
return ts;
}
+
+ private static int julianLeapYearAdjustment(JulianDate jDateTime) {
+ Set<Integer> validDaysOfMonth = new HashSet<>();
+ validDaysOfMonth.add(1);
+ validDaysOfMonth.add(2);
+ validDaysOfMonth.add(27);
+ validDaysOfMonth.add(28);
+ validDaysOfMonth.add(29);
+ LocalDateTime localDateTime;
+ try {
+ localDateTime = jDateTime.toLocalDateTime();
+ } catch (DateTimeException e) {
+ if (e.getMessage().contains(ErrorMsg.NOT_LEAP_YEAR.getMsg())) {
+ return 2;
+ } else {
+ throw e;
+ }
+ }
+
+ int year = localDateTime.getYear();
+ int dayOfMonth = localDateTime.getDayOfMonth();
+ boolean isJulianLeapYear = (year % 4 == 0 && year % 400 != 0 && year % 100
== 0);
+
+ if (year < 1582 && isJulianLeapYear &&
validDaysOfMonth.contains(dayOfMonth)) {
+ switch (localDateTime.getMonth()) {
+ case FEBRUARY:
+ return -2;
+ case MARCH:
+ return 2;
+ default:
+ return 0;
+ }
+ }
+ return 0;
+ }
Review Comment:
This logic is complex and pretty expensive in terms of performance for
something that will be done for every single timestamp value (especially the
try catch block).
According to the simple test below the problematic Julian days seem to be
rather few and they are known in advance:
```java
int start = JulianDate.of(LocalDate.of(0, 1, 1)).getInteger();
int end = JulianDate.of(LocalDate.of(9999, 12, 31)).getInteger();
System.out.println("Start Julian day " + start);
System.out.println("End Julian day " + end);
for (int i = start; i <= end; i++) {
JulianDate date = new JulianDate(i, 0.0);
try {
date.toLocalDateTime();
} catch (Exception e) {
System.out.println("Problem converting Julian day " + i);
}
}
```
```
Start Julian day 1721057
End Julian day 5373483
Problem converting Julian day 1757642
Problem converting Julian day 1794167
Problem converting Julian day 1830692
Problem converting Julian day 1903742
Problem converting Julian day 1940267
Problem converting Julian day 1976792
Problem converting Julian day 2049842
Problem converting Julian day 2086367
Problem converting Julian day 2122892
Problem converting Julian day 2195942
Problem converting Julian day 2232467
Problem converting Julian day 2268992
```
Wouldn't be easier and more efficient to do the shifting based on the 12
dates printed above? Am I missing more problematic dates?
##########
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java:
##########
@@ -100,12 +107,14 @@ public static Timestamp getTimestamp(NanoTime nt, ZoneId
targetZone, boolean leg
julianDay--;
}
- JulianDate jDateTime;
- jDateTime = JulianDate.of((double) julianDay);
+ JulianDate jDateTime = JulianDate.of((double) julianDay);
+ LocalDateTime localDateTime;
+ int leapYearDateAdjustment = legacyConversion ?
julianLeapYearAdjustment(jDateTime) : 0;
+ localDateTime = jDateTime.add(leapYearDateAdjustment).toLocalDateTime();
Calendar calendar = getGMTCalendar();
- calendar.set(Calendar.YEAR, jDateTime.toLocalDateTime().getYear());
Review Comment:
The fact that the exception originates from `JulianDate` class makes me
believe that this is a bug/problem to be addressed in the `jodd` library. I
also think that this exception is mainly a result of HIVE-25054 which upgraded
the jodd library.
##########
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java:
##########
@@ -100,12 +107,14 @@ public static Timestamp getTimestamp(NanoTime nt, ZoneId
targetZone, boolean leg
julianDay--;
}
- JulianDate jDateTime;
- jDateTime = JulianDate.of((double) julianDay);
+ JulianDate jDateTime = JulianDate.of((double) julianDay);
+ LocalDateTime localDateTime;
+ int leapYearDateAdjustment = legacyConversion ?
julianLeapYearAdjustment(jDateTime) : 0;
Review Comment:
Are we sure that this leap year problem affects only the `legacyConversion`?
What is preventing us from getting a problematic julian day when
`legacyConversion` is `false`?
##########
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java:
##########
@@ -119,7 +128,44 @@ public static Timestamp getTimestamp(NanoTime nt, ZoneId
targetZone, boolean leg
calendar.set(Calendar.SECOND, seconds);
Timestamp ts = Timestamp.ofEpochMilli(calendar.getTimeInMillis(), (int)
nanos);
- ts = TimestampTZUtil.convertTimestampToZone(ts, ZoneOffset.UTC,
targetZone, legacyConversion);
+ ts = TimestampTZUtil.convertTimestampToZone(ts, ZoneOffset.UTC,
targetZone, legacyConversion)
+ .minusDays(leapYearDateAdjustment);
+
return ts;
}
+
+ private static int julianLeapYearAdjustment(JulianDate jDateTime) {
+ Set<Integer> validDaysOfMonth = new HashSet<>();
+ validDaysOfMonth.add(1);
+ validDaysOfMonth.add(2);
+ validDaysOfMonth.add(27);
+ validDaysOfMonth.add(28);
+ validDaysOfMonth.add(29);
+ LocalDateTime localDateTime;
+ try {
+ localDateTime = jDateTime.toLocalDateTime();
+ } catch (DateTimeException e) {
+ if (e.getMessage().contains(ErrorMsg.NOT_LEAP_YEAR.getMsg())) {
+ return 2;
+ } else {
+ throw e;
+ }
+ }
+
+ int year = localDateTime.getYear();
+ int dayOfMonth = localDateTime.getDayOfMonth();
+ boolean isJulianLeapYear = (year % 4 == 0 && year % 400 != 0 && year % 100
== 0);
+
+ if (year < 1582 && isJulianLeapYear &&
validDaysOfMonth.contains(dayOfMonth)) {
+ switch (localDateTime.getMonth()) {
+ case FEBRUARY:
+ return -2;
+ case MARCH:
+ return 2;
+ default:
+ return 0;
+ }
+ }
+ return 0;
+ }
Review Comment:
Moreover it kinda feels like this kind of shifting should be done inside the
`JulianDate` class. I haven't fully formulated an opinion yet but I raised
https://github.com/oblac/jodd-util/issues/21 to see what the jodd authors have
to say about this use case.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]