zabetak commented on a change in pull request #2282:
URL: https://github.com/apache/hive/pull/2282#discussion_r633480041



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
##########
@@ -523,10 +532,30 @@ private static MessageType getRequestedPrunedSchema(
           configuration, HiveConf.ConfVars.HIVE_PARQUET_DATE_PROLEPTIC_GREGORIAN_DEFAULT)));
     }
 
-    String legacyConversion = ConfVars.HIVE_PARQUET_TIMESTAMP_LEGACY_CONVERSION_ENABLED.varname;
-    if (!metadata.containsKey(legacyConversion)) {
-      metadata.put(legacyConversion, String.valueOf(HiveConf.getBoolVar(
-          configuration, HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_LEGACY_CONVERSION_ENABLED)));
+    if (!metadata.containsKey(DataWritableWriteSupport.WRITER_ZONE_CONVERSION_LEGACY)) {
+      final String legacyConversion;
+      if (keyValueMetaData.containsKey(DataWritableWriteSupport.WRITER_ZONE_CONVERSION_LEGACY)) {
+        // If there is meta about the legacy conversion then the file should be read in the same way it was written.
+        legacyConversion = keyValueMetaData.get(DataWritableWriteSupport.WRITER_ZONE_CONVERSION_LEGACY);
+      } else if (keyValueMetaData.containsKey(DataWritableWriteSupport.WRITER_TIMEZONE)) {
+        // If there is no meta about the legacy conversion but there is meta about the timezone then we can infer the
+        // file was written with the new rules.
+        legacyConversion = "false";
+      } else {

Review comment:
       This `if` block makes life a bit easier for users on versions in (3.1.2, 3.2.0), since it automatically determines the appropriate conversion. It looks a bit odd though, so we could possibly remove it and require users on these versions to set the respective property accordingly. I would prefer keeping the code more uniform over trying to cover edge cases.
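
       To make the discussion concrete, here is a minimal, self-contained sketch of the inference this block performs, with one possible final fallback. The key values and names below are stand-ins for illustration (the real constants live in `DataWritableWriteSupport`), and the fallback to a session-level default is my assumption, not necessarily what the patch does:

```java
import java.util.Map;

public final class LegacyConversionSketch {

  // Hypothetical key values; the real constants are defined in DataWritableWriteSupport.
  static final String WRITER_ZONE_CONVERSION_LEGACY = "writer.zone.conversion.legacy";
  static final String WRITER_TIMEZONE = "writer.time.zone";

  /** Decide how a file's timestamps should be converted on read, given its metadata. */
  static String inferLegacyConversion(Map<String, String> keyValueMetaData, boolean sessionDefault) {
    if (keyValueMetaData.containsKey(WRITER_ZONE_CONVERSION_LEGACY)) {
      // The writer recorded which conversion it used; read the file the same way it was written.
      return keyValueMetaData.get(WRITER_ZONE_CONVERSION_LEGACY);
    }
    if (keyValueMetaData.containsKey(WRITER_TIMEZONE)) {
      // A writer timezone without the legacy marker implies the file was written with the new rules.
      return "false";
    }
    // No writer metadata at all (e.g. files from the versions in (3.1.2, 3.2.0) mentioned above):
    // fall back to a user-controlled default, which is what requiring the property would amount to.
    return String.valueOf(sessionDefault);
  }
}
```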

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
##########
@@ -536,7 +542,8 @@ public void write(Object value) {
         Long int64value = ParquetTimestampUtils.getInt64(ts, timeUnit);
         recordConsumer.addLong(int64value);

Review comment:
       The fact that we do not perform/control legacy conversion when we store timestamps as INT64 can create problems if we end up comparing timestamps stored as INT96 with timestamps stored as INT64. Shall we try to make the new property (`hive.parquet.timestamp.write.legacy.conversion.enabled`) also affect the INT64 storage type?
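
       For illustration, a rough sketch of what letting the write-side flag drive the INT64 path as well could look like. `legacyZoneConvert` is an invented placeholder for whatever conversion routine the INT96 path applies, and the micros encoding is only an approximation of what `ParquetTimestampUtils.getInt64` does; a real patch would reuse Hive's existing utilities:

```java
import java.sql.Timestamp;
import java.util.TimeZone;
import java.util.concurrent.TimeUnit;

public final class Int64LegacySketch {

  // Invented placeholder for the legacy zone conversion used on the INT96 path.
  static Timestamp legacyZoneConvert(Timestamp ts, TimeZone writerZone) {
    return new Timestamp(ts.getTime() - writerZone.getOffset(ts.getTime()));
  }

  /** Encode a timestamp as INT64 micros, optionally applying the legacy conversion first. */
  static long toInt64Micros(Timestamp ts, TimeZone writerZone, boolean legacyConversion) {
    Timestamp adjusted = legacyConversion ? legacyZoneConvert(ts, writerZone) : ts;
    // Roughly a MICROS encoding: whole milliseconds plus the sub-millisecond part of the nanos.
    return TimeUnit.MILLISECONDS.toMicros(adjusted.getTime())
        + (adjusted.getNanos() % 1_000_000) / 1_000;
  }
}
```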




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to