hudi-bot opened a new issue, #15866:
URL: https://github.com/apache/hudi/issues/15866
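In a Hudi table that uses a composite primary key together with the bucket index,
BucketIdentifier parses the recordKey (of the form `field1:value1,field2:value2`)
by splitting it on commas. If a primary key value itself contains a comma, the
split can produce a fragment with no `:` in it, and parsing fails with an
exception (for example, an ArrayIndexOutOfBoundsException on `p[1]`).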
```java
// BucketIdentifier.java
private static List<String> getHashKeysUsingIndexFields(String recordKey,
                                                        List<String> indexKeyFields) {
  Map<String, String> recordKeyPairs = Arrays.stream(recordKey.split(","))
      .map(p -> p.split(":"))
      .collect(Collectors.toMap(p -> p[0], p -> p[1]));
  return indexKeyFields.stream()
      .map(recordKeyPairs::get).collect(Collectors.toList());
}
```
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-5982
- Type: Bug
- Affects version(s): 0.12.0
---
## Comments
25/Mar/23 10:52;tangshangwen;My idea is that if the user's primary key data
contains ",", we can replace it with `__commas__` when generating the recordKey.
When the user wants to retrieve the real primary key data from the recordKey,
they can replace `__commas__` with ",".
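The placeholder idea can be sketched as a pair of encode/decode helpers applied to each value. This is a hypothetical illustration, not Hudi code: `CommaPlaceholderCodec`, `buildRecordKey` and `parseRecordKey` are made-up names, and the token is assumed to be the literal string `__commas__`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper illustrating the placeholder proposal; not part of Hudi.
class CommaPlaceholderCodec {

    // Assumed placeholder token for ',' inside a key value.
    static final String COMMA_TOKEN = "__commas__";

    // Applied to each field value when the composite recordKey is generated.
    static String encode(String value) {
        return value.replace(",", COMMA_TOKEN);
    }

    // Applied when the user wants the real value back from the recordKey.
    static String decode(String value) {
        return value.replace(COMMA_TOKEN, ",");
    }

    // Builds "field1:value1,field2:value2" from encoded values.
    static String buildRecordKey(List<String> fields, List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(fields.get(i)).append(':').append(encode(values.get(i)));
        }
        return sb.toString();
    }

    // Same split as BucketIdentifier; encoded values contain no ',', so every
    // fragment is a field:value pair. ':' inside a value is still NOT handled.
    static Map<String, String> parseRecordKey(String recordKey) {
        return Arrays.stream(recordKey.split(","))
                .map(p -> p.split(":"))
                .collect(Collectors.toMap(p -> p[0], p -> decode(p[1])));
    }

    public static void main(String[] args) {
        String key = buildRecordKey(Arrays.asList("id", "name"), Arrays.asList("1", "smith,bob"));
        System.out.println(key);                              // id:1,name:smith__commas__bob
        System.out.println(parseRecordKey(key).get("name"));  // smith,bob
    }
}
```

The trade-off is that the mapping is not reversible for every input: a value that already contains `__commas__` decodes to a "," that was never there.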
---
25/Mar/23 14:38;codope;Is it common to have commas in a primary key field
name? In my opinion, it should be fixed upstream.