SourabhBadhya commented on code in PR #5779:
URL: https://github.com/apache/hive/pull/5779#discussion_r2347406022


##########
common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java:
##########
@@ -237,6 +249,18 @@ public static Timestamp ofEpochMilli(long epochMilli, int nanos) {
         .withNano(nanos));
   }
 
+  public static Timestamp ofEpochMicro(long epochMicro) {
+    int nanos = (int) ((epochMicro % 1000000) * 1000);

Review Comment:
   Use Math.toIntExact() for long-to-int casts: it throws ArithmeticException on overflow, whereas a plain (int) cast silently truncates the value.
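A minimal sketch of the suggested change, with a hypothetical helper name so the behavior can be exercised in isolation (Math.toIntExact fails loudly on overflow, while a plain (int) cast truncates silently):

```java
public class EpochMicroDemo {

    // Sketch of the reviewer's suggestion: compute the sub-second nanos from
    // epoch micros via Math.toIntExact so an out-of-range intermediate value
    // raises ArithmeticException instead of being truncated by (int).
    static int microRemainderToNanos(long epochMicro) {
        return Math.toIntExact((epochMicro % 1_000_000L) * 1_000L);
    }

    public static void main(String[] args) {
        // 1_234_567 micros -> remainder 234_567 micros -> 234_567_000 nanos
        System.out.println(microRemainderToNanos(1_234_567L));
    }
}
```

As in the PR's code, negative epochMicro values would need separate normalization; the sketch only covers the cast itself.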



##########
ql/src/test/results/clientpositive/llap/avro_timestamp.q.out:
##########
@@ -118,6 +134,14 @@ POSTHOOK: Input: default@avro_timestamp@p1=2/p2=2014-09-26 07%3A08%3A09.123
 1214-02-11 07:08:09.123        {"baz":"0981-12-16 07:08:09.123"}       ["0011-09-05 07:08:09.123"]     2       2014-09-26 07:08:09.123
 0847-02-11 07:08:09.123        {"baz":"0921-12-16 07:08:09.123"}       ["0011-09-05 07:08:09.123"]     2       2014-09-26 07:08:09.123
 0600-02-11 07:08:09.123        {"baz":"0981-12-16 07:08:09.123"}       ["0039-09-05 07:08:09.123"]     2       2014-09-26 07:08:09.123
+2012-02-21 07:08:09.123456     {"bar":"1998-05-07 07:08:09.123456","foo":"1980-12-16 07:08:09.123456"} ["2011-09-04 07:08:09.123456","2011-09-05 07:08:09.123456"]     2       2014-09-26 07:08:09.123
+2014-02-11 07:08:09.123456     {"baz":"1981-12-16 07:08:09.123456"}    ["2011-09-05 07:08:09.123456"]  2       2014-09-26 07:08:09.123
+1947-02-11 07:08:09.123456     {"baz":"1921-12-16 07:08:09.123456"}    ["2011-09-05 07:08:09.123456"]  2       2014-09-26 07:08:09.123
+8200-02-11 07:08:09.123456     {"baz":"6981-12-16 07:08:09.123456"}    ["1039-09-05 07:08:09.123456"]  2       2014-09-26 07:08:09.123
+1412-02-21 07:08:09.123456     {"bar":"0998-05-07 07:08:09.123456","foo":"0980-12-16 07:08:09.123456"} ["0011-09-04 07:08:09.123456","0011-09-05 07:08:09.123456"]     2       2014-09-26 07:08:09.123
+1214-02-11 07:08:09.123456     {"baz":"0981-12-16 07:08:09.123456"}    ["0011-09-05 07:08:09.123456"]  2       2014-09-26 07:08:09.123
+0847-02-11 07:08:09.123456     {"baz":"0921-12-16 07:08:09.123456"}    ["0011-09-05 07:08:09.123456"]  2       2014-09-26 07:08:09.123
+0600-02-11 07:08:09.123456     {"baz":"0981-12-16 07:08:09.123456"}    ["0039-09-05 07:08:09.123456"]  2       2014-09-26 07:08:09.123

Review Comment:
   I don't see a test for the union type. Please add one here.



##########
ql/src/test/queries/clientpositive/avro_timestamp_micros.q:
##########
@@ -0,0 +1,9 @@
+CREATE EXTERNAL TABLE micros_table(`dt` timestamp)
+STORED AS AVRO;
+
+INSERT INTO micros_table VALUES
+(cast('2024-08-09 14:08:26.326107' as timestamp)),
+('2012-02-21 07:08:09.123'),
+('1014-02-11 07:15:11.12345');
+
+SELECT * FROM micros_table;

Review Comment:
   nit: Add a newline at the end of the file.



##########
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java:
##########
@@ -220,12 +223,60 @@ private List<Object> workerBase(List<Object> objectRow, Schema fileSchema, List<
      Object datum = record.get(columnName);
      Schema datumSchema = record.getSchema().getField(columnName).schema();
      Schema.Field field = AvroSerdeUtils.isNullableType(fileSchema)?AvroSerdeUtils.getOtherTypeFromNullableType(fileSchema).getField(columnName):fileSchema.getField(columnName);
+      if (field != null) {
+        // This single call handles all cases: direct, union, map, and array.
+        logicalType = findNestedLogicalType(field, field.schema());
+      }
       objectRow.add(worker(datum, field == null ? null : field.schema(), 
datumSchema, columnType));
     }
 
     return objectRow;
   }
 
+  /**
+   * Recursively traverses a schema to find the first LogicalType.
+   * This handles direct logical types, and logical types nested within
+   * UNIONS, MAPS, or ARRAYS.
+   *
+   * @param field The field being inspected; used as a fallback source of the "logicalType" property.
+   * @param schema The schema to inspect.
+   * @return The found LogicalType, or null if none is found.
+   */
+  private LogicalType findNestedLogicalType(Schema.Field field, Schema schema) {
+    // Base Case 1: The schema is null, so we can't proceed.
+    if (schema == null) {
+      return null;
+    }
+
+    // Base Case 2: The schema itself has the logical type. We found it.
+    if (schema.getLogicalType() != null) {
+      return schema.getLogicalType();
+    }
+
+    // Recursive Step: The logical type is deeper. Check the container type.
+    switch (schema.getType()) {
+      case UNION:
+        // Find the first non-null branch and search within it.
+        return schema.getTypes().stream()
+                .filter(s -> s.getType() != Schema.Type.NULL)
+                .map(s -> findNestedLogicalType(field, s)) // Recurse on the branch
+                .filter(java.util.Objects::nonNull)
+                .findFirst()
+                .orElse(null);
+      case MAP:
+        // Search within the map's value schema.
+        return findNestedLogicalType(field, schema.getValueType());
+      case ARRAY:
+        // Search within the array's element schema.
+        return findNestedLogicalType(field, schema.getElementType());
+      default:
+        // This type (e.g., STRING, INT) doesn't contain a nested schema.
+        return field.getProp("logicalType") != null
+                  ? new LogicalType(field.getProp("logicalType"))
+                  : null;
+    }

Review Comment:
   I see tests for the map and array cases but not for the union type. Please add tests for UNION as well. If it's already covered, feel free to point to the source here.
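To illustrate what a UNION test needs to exercise, here is a self-contained sketch that mirrors the recursion above using a minimal stand-in for Avro's Schema (the MiniSchema type and its fields are hypothetical, for illustration only; a real test would build schemas with Avro's SchemaBuilder):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class NestedLogicalTypeDemo {

    enum Type { NULL, LONG, STRING, UNION, MAP, ARRAY }

    // Minimal stand-in for org.apache.avro.Schema: a type, an optional
    // logical-type name, and child schemas (union branches, map value,
    // or array element).
    static final class MiniSchema {
        final Type type;
        final String logicalType;
        final List<MiniSchema> children;

        MiniSchema(Type type, String logicalType, MiniSchema... children) {
            this.type = type;
            this.logicalType = logicalType;
            this.children = Arrays.asList(children);
        }
    }

    // Mirrors the recursion in the diff: direct hit first, then descend
    // into UNION branches (skipping NULL), MAP values, and ARRAY elements.
    static String findNestedLogicalType(MiniSchema schema) {
        if (schema == null) {
            return null;
        }
        if (schema.logicalType != null) {
            return schema.logicalType;
        }
        switch (schema.type) {
            case UNION:
                return schema.children.stream()
                        .filter(s -> s.type != Type.NULL)
                        .map(NestedLogicalTypeDemo::findNestedLogicalType)
                        .filter(Objects::nonNull)
                        .findFirst()
                        .orElse(null);
            case MAP:
            case ARRAY:
                return findNestedLogicalType(schema.children.get(0));
            default:
                return null;
        }
    }

    public static void main(String[] args) {
        // Models ["null", {"type":"long","logicalType":"timestamp-micros"}]
        MiniSchema union = new MiniSchema(Type.UNION, null,
                new MiniSchema(Type.NULL, null),
                new MiniSchema(Type.LONG, "timestamp-micros"));
        System.out.println(findNestedLogicalType(union)); // timestamp-micros
    }
}
```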



##########
storage-api/src/java/org/apache/hadoop/hive/common/type/CalendarUtils.java:
##########
@@ -180,6 +239,39 @@ public static long convertTimeToHybrid(long proleptic) {
     return hybrid;
   }
 
+  /**
+   * Convert epoch microseconds from the proleptic Gregorian calendar to the
+   * hybrid Julian/Gregorian.
+   * @param prolepticMicros Microseconds of epoch in the proleptic Gregorian
+   * @return Microseconds of epoch in the hybrid Julian/Gregorian
+   */
+  public static long convertTimeToHybridMicros(long prolepticMicros) {
+    long hybridMicros = prolepticMicros;
+    if (prolepticMicros < SWITCHOVER_MICROS) {
+      // Convert microseconds to Instant
+      Instant instant = Instant.ofEpochSecond(
+              prolepticMicros / 1_000_000L,
+              (prolepticMicros % 1_000_000L) * 1_000L
+      );
+
+      // Format using proleptic formatter
+      String dateStr = PROLEPTIC_TIME_FORMAT_MICROS.format(instant.atZone(ZoneOffset.UTC));
+
+      // Parse using hybrid formatter
+      try {
+        Instant hybridInstant = LocalDateTime.parse(dateStr, HYBRID_TIME_FORMAT_MICROS)
+                .atZone(ZoneOffset.UTC)
+                .toInstant();
+
+        hybridMicros = hybridInstant.getEpochSecond() * 1_000_000L +
+                hybridInstant.getNano() / 1000L;
+      } catch (DateTimeParseException e) {
+        throw new IllegalArgumentException("Can't parse " + dateStr, e);
+      }
+    }
+    return hybridMicros;
+  }

Review Comment:
   Much of this code duplicates convertTimeToProlepticMicros(). Consider extracting a shared helper and having the proleptic and hybrid variants call it with their respective formatters.
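A sketch of the extraction the comment suggests, with a stand-in formatter (FMT here is hypothetical; the real code would pass PROLEPTIC_TIME_FORMAT_MICROS and HYBRID_TIME_FORMAT_MICROS, which differ in their chronology):

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class ConvertMicrosDemo {

    // Stand-in for CalendarUtils' formatters; in the real class the two
    // formatters differ in calendar (proleptic Gregorian vs. hybrid).
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS");

    // Shared helper: format the instant with one calendar's formatter,
    // reparse with the other's. Both convertTimeToHybridMicros and
    // convertTimeToProlepticMicros could delegate here.
    static long convertMicros(long micros, DateTimeFormatter from, DateTimeFormatter to) {
        // floorDiv/floorMod normalize negative micros into a valid
        // (second, nano) pair.
        Instant instant = Instant.ofEpochSecond(
                Math.floorDiv(micros, 1_000_000L),
                Math.floorMod(micros, 1_000_000L) * 1_000L);
        String dateStr = from.format(instant.atZone(ZoneOffset.UTC));
        try {
            Instant converted = LocalDateTime.parse(dateStr, to)
                    .atZone(ZoneOffset.UTC)
                    .toInstant();
            return converted.getEpochSecond() * 1_000_000L + converted.getNano() / 1_000L;
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("Can't parse " + dateStr, e);
        }
    }

    public static void main(String[] args) {
        // With identical formatters the round trip is the identity.
        System.out.println(convertMicros(123_456_789L, FMT, FMT)); // 123456789
    }
}
```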



##########
common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java:
##########
@@ -163,10 +163,22 @@ public long toEpochMilli() {
     return localDateTime.toInstant(ZoneOffset.UTC).toEpochMilli();
   }
 
+  public long toEpochMicro() {
+    return localDateTime.toEpochSecond(ZoneOffset.UTC) * 1_000_000
+            + localDateTime.getNano() / 1000;
+  }
+
   public long toEpochMilli(ZoneId id) {
     return localDateTime.atZone(id).toInstant().toEpochMilli();
   }
 
+  public long toEpochMicro(ZoneId id) {
+    return localDateTime.atZone(id)
+            .toInstant()
+            .getEpochSecond() * 1_000_000L +
+            localDateTime.getNano() / 1000;
+  }
+

Review Comment:
   No point in adding this if it's never used.
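For reference, the micro-precision round trip these accessors aim at can be sketched with plain java.time (the method names merely mirror the PR's; this is not the Hive implementation):

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class EpochMicroRoundTrip {

    // Epoch micros from a LocalDateTime interpreted as UTC: whole seconds
    // scaled to micros plus the sub-second nanos truncated to micros.
    static long toEpochMicro(LocalDateTime ldt) {
        return ldt.toEpochSecond(ZoneOffset.UTC) * 1_000_000L + ldt.getNano() / 1_000L;
    }

    // Inverse: split micros into a (second, nano) pair; floorDiv/floorMod
    // keep the nano component non-negative for pre-epoch values.
    static LocalDateTime ofEpochMicro(long micros) {
        return LocalDateTime.ofEpochSecond(
                Math.floorDiv(micros, 1_000_000L),
                (int) (Math.floorMod(micros, 1_000_000L) * 1_000L),
                ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2024, 8, 9, 14, 8, 26, 326_107_000);
        // Micro-precision values survive the round trip unchanged.
        System.out.println(ofEpochMicro(toEpochMicro(t)).equals(t)); // true
    }
}
```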



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

