[I] When the user's primary key data contains commas, BucketIdentifier cannot be used [hudi]

via GitHub Sat, 29 Nov 2025 22:38:30 -0800


hudi-bot opened a new issue, #15866:
URL: https://github.com/apache/hudi/issues/15866


   In a Hudi table that uses composite primary keys together with a bucket 
index, BucketIdentifier splits the recordKey using commas as a delimiter. If 
the user's primary key data itself contains commas, the split produces 
malformed key-value pairs, which can cause exceptions.
   {code:java}
   // BucketIdentifier.java
   private static List<String> getHashKeysUsingIndexFields(String recordKey, List<String> indexKeyFields) {
     Map<String, String> recordKeyPairs = Arrays.stream(recordKey.split(","))
         .map(p -> p.split(":"))
         .collect(Collectors.toMap(p -> p[0], p -> p[1]));
     return indexKeyFields.stream()
         .map(recordKeyPairs::get).collect(Collectors.toList());
   }
   {code}
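
To make the failure concrete, here is a minimal standalone sketch that copies the parsing logic quoted above into a hypothetical class (`CommaKeyRepro` is not part of Hudi) and feeds it a made-up record key whose `name` value contains a comma. It assumes the `field:value,field:value` record key layout implied by the snippet; the field names and values are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical standalone class; only the method body mirrors the quoted snippet.
public class CommaKeyRepro {

  // Same parsing logic as the BucketIdentifier snippet quoted in this issue.
  static List<String> getHashKeysUsingIndexFields(String recordKey, List<String> indexKeyFields) {
    Map<String, String> recordKeyPairs = Arrays.stream(recordKey.split(","))
        .map(p -> p.split(":"))
        .collect(Collectors.toMap(p -> p[0], p -> p[1]));
    return indexKeyFields.stream()
        .map(recordKeyPairs::get).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> fields = Arrays.asList("id", "name");

    // No comma in the data: the key splits into two well-formed pairs.
    System.out.println(getHashKeysUsingIndexFields("id:1,name:john", fields));

    // Comma inside the "name" value: split(",") yields a third fragment "john"
    // with no ':' in it, so p[1] is out of bounds for that fragment.
    try {
      getHashKeysUsingIndexFields("id:1,name:smith,john", fields);
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("parse failed: " + e);
    }
  }
}
```

In this sketch the second call fails because the fragment after the embedded comma has no field name, so indexing `p[1]` throws. The exact exception seen in a real job may differ depending on the key contents.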
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-5982
   - Type: Bug
   - Affects version(s):
     - 0.12.0
   
   
   ---
   
   
   ## Comments
   
   25/Mar/23 10:52;tangshangwen;My idea is that if the user's primary key data 
contains ",", we can replace it with __commas__ when generating the recordKey. 
When the user wants to retrieve the real primary key data from the recordKey, 
they can replace __commas__ with ",".;;;
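
The replacement idea in the comment above could be sketched roughly as follows. This is only an illustration of the proposal, not code from Hudi: the class `CommaEscapeSketch`, the helpers `escape` and `unescape`, and the sample key are all hypothetical, and the placeholder string is taken from the comment.

```java
// Hypothetical sketch of the proposed escaping; not part of Hudi.
public class CommaEscapeSketch {

  // Placeholder named in the comment above.
  static final String COMMA_PLACEHOLDER = "__commas__";

  // Applied to each key value before it is written into the recordKey.
  static String escape(String value) {
    return value.replace(",", COMMA_PLACEHOLDER);
  }

  // Applied when the original key value is read back out of the recordKey.
  static String unescape(String value) {
    return value.replace(COMMA_PLACEHOLDER, ",");
  }

  public static void main(String[] args) {
    String rawName = "smith,john";

    // The escaped value no longer contains the ',' delimiter,
    // so splitting the recordKey on ',' yields exactly two pairs.
    String recordKey = "id:1,name:" + escape(rawName);
    System.out.println(recordKey);
    System.out.println(recordKey.split(",").length);

    // Reversing the replacement recovers the original value.
    System.out.println(unescape(escape(rawName)));
  }
}
```

One limitation of this approach: a value that already contains the literal text `__commas__` would not round-trip, since `unescape` would turn it into a comma. The sketch also leaves colons inside values untouched, and the quoted parser splits on ":" as well.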
   
   ---
   
   25/Mar/23 14:38;codope;Is it common to have commas in a primary key field 
name? In my opinion, it should be fixed upstream.;;;


