[GitHub] [hudi] fengjian428 commented on a diff in pull request #7542: [HUDI-5469] Hive doesn't respect the space at the end of partition path, so remove it to avoid dupl…

2022-12-28 Thread GitBox


fengjian428 commented on code in PR #7542:
URL: https://github.com/apache/hudi/pull/7542#discussion_r1058768721


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java:
##
@@ -131,6 +131,9 @@ public static String getRecordPartitionPath(GenericRecord record, List p
       } else {
         if (encodePartitionPath) {
           fieldVal = PartitionPathEncodeUtils.escapePathName(fieldVal);
+        } else {
+          // Hive doesn't respect the space at the end, so remove it to avoid duplicate keys error
+          fieldVal = fieldVal.trim();

Review Comment:
   In my case, the user's data contained some bad partition values with a
   space at the end (the space shouldn't be there). As a result, the data was
   split across two HDFS paths, and those two paths could not be synced to Hive.
   I made this fix to avoid requiring an upstream change and to avoid Hive sync errors.
   If the user really wants two HDFS paths and two partitions in Hive, they
   should encode the partition path.
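
   To illustrate the situation described above, here is a minimal sketch. It
   is not Hudi code: the class and the `distinctPaths` helper are hypothetical,
   and it only assumes that the raw field value is used as the partition path
   when encoding is disabled. It shows two raw values that differ only by a
   trailing space producing two paths without `trim()` and one path with it.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TrailingSpacePartitionDemo {

  // Hypothetical helper (not part of Hudi): collects the distinct partition
  // paths that a list of raw field values would map to.
  static Set<String> distinctPaths(List<String> fieldVals, boolean trim) {
    Set<String> paths = new LinkedHashSet<>();
    for (String fieldVal : fieldVals) {
      // Mirrors the added line in the diff: fieldVal = fieldVal.trim();
      paths.add(trim ? fieldVal.trim() : fieldVal);
    }
    return paths;
  }

  public static void main(String[] args) {
    // Same logical partition value, once with a stray trailing space.
    List<String> raw = Arrays.asList("2022-12-28", "2022-12-28 ");

    System.out.println(distinctPaths(raw, false).size()); // prints 2
    System.out.println(distinctPaths(raw, true).size());  // prints 1
  }
}
```

   Note that `String.trim()` also removes leading whitespace and other
   characters at or below U+0020, so it is slightly broader than "the space at
   the end".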



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] fengjian428 commented on a diff in pull request #7542: [HUDI-5469] Hive doesn't respect the space at the end of partition path, so remove it to avoid dupl…

2022-12-23 Thread GitBox


fengjian428 commented on code in PR #7542:
URL: https://github.com/apache/hudi/pull/7542#discussion_r1056143997


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java:
##
@@ -131,6 +131,9 @@ public static String getRecordPartitionPath(GenericRecord record, List p
       } else {
         if (encodePartitionPath) {
           fieldVal = PartitionPathEncodeUtils.escapePathName(fieldVal);
+        } else {
+          // Hive doesn't respect the space at the end, so remove it to avoid duplicate keys error
+          fieldVal = fieldVal.trim();
         }

Review Comment:
   I haven't tested it against other engines besides HMS. Maybe we can add a
   configuration for it and infer the value from HMS? WDYT?
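
   A rough sketch of what such a switch could look like, purely for
   discussion. Everything here is an assumption: the config key is made up and
   is not an existing Hudi option, and "infer from HMS" is read as "default to
   trimming only when the table is synced to HMS".

```java
import java.util.Properties;

public class TrimPartitionPathConfigSketch {

  // Hypothetical key for illustration only; not an existing Hudi configuration.
  static final String TRIM_KEY = "example.keygen.partitionpath.trim";

  // Returns the partition path value to use for a raw field value.
  // An explicit setting wins; otherwise the default follows syncsToHms.
  static String resolve(String fieldVal, Properties props, boolean syncsToHms) {
    String configured = props.getProperty(TRIM_KEY, String.valueOf(syncsToHms));
    boolean trim = Boolean.parseBoolean(configured);
    // Same call as in the diff; non-HMS setups keep the value untouched.
    return trim ? fieldVal.trim() : fieldVal;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // No explicit setting: behavior is inferred from whether HMS sync is used.
    System.out.println("[" + resolve("2022-12-23 ", props, true) + "]");
    System.out.println("[" + resolve("2022-12-23 ", props, false) + "]");
  }
}
```

   With a default like this, engines that have not been tested would keep the
   current behavior unless the user opts in.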






[GitHub] [hudi] fengjian428 commented on a diff in pull request #7542: [HUDI-5469] Hive doesn't respect the space at the end of partition path, so remove it to avoid dupl…

2022-12-23 Thread GitBox


fengjian428 commented on code in PR #7542:
URL: https://github.com/apache/hudi/pull/7542#discussion_r1056143997


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java:
##
@@ -131,6 +131,9 @@ public static String getRecordPartitionPath(GenericRecord record, List p
       } else {
         if (encodePartitionPath) {
           fieldVal = PartitionPathEncodeUtils.escapePathName(fieldVal);
+        } else {
+          // Hive doesn't respect the space at the end, so remove it to avoid duplicate keys error
+          fieldVal = fieldVal.trim();
         }

Review Comment:
   Haven't tested it against other engines besides HMS. 






[GitHub] [hudi] fengjian428 commented on a diff in pull request #7542: [HUDI-5469] Hive doesn't respect the space at the end of partition path, so remove it to avoid dupl…

2022-12-23 Thread GitBox


fengjian428 commented on code in PR #7542:
URL: https://github.com/apache/hudi/pull/7542#discussion_r1056143997


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java:
##
@@ -131,6 +131,9 @@ public static String getRecordPartitionPath(GenericRecord record, List p
       } else {
         if (encodePartitionPath) {
           fieldVal = PartitionPathEncodeUtils.escapePathName(fieldVal);
+        } else {
+          // Hive doesn't respect the space at the end, so remove it to avoid duplicate keys error
+          fieldVal = fieldVal.trim();
         }

Review Comment:
   I haven't tested it against other engines besides HMS.


