hudi-agent commented on code in PR #18814:
URL: https://github.com/apache/hudi/pull/18814#discussion_r3286345086
##########
hudi-aws/src/main/java/org/apache/hudi/aws/transaction/lock/DynamoDBBasedImplicitPartitionKeyLockProvider.java:
##########
@@ -43,26 +43,33 @@ public class DynamoDBBasedImplicitPartitionKeyLockProvider
extends DynamoDBBased
protected static final Logger LOG =
LoggerFactory.getLogger(DynamoDBBasedImplicitPartitionKeyLockProvider.class);
private final String hudiTableBasePath;
-
+
public DynamoDBBasedImplicitPartitionKeyLockProvider(final LockConfiguration
lockConfiguration, final StorageConfiguration<?> conf) {
this(lockConfiguration, conf, null);
}
public DynamoDBBasedImplicitPartitionKeyLockProvider(
final LockConfiguration lockConfiguration, final StorageConfiguration<?>
conf, DynamoDbClient dynamoDB) {
super(lockConfiguration, conf, dynamoDB);
- hudiTableBasePath =
s3aToS3(lockConfiguration.getConfig().getString(HoodieCommonConfig.BASE_PATH.key()));
+ hudiTableBasePath = normalizeBasePathForLocking(
+
lockConfiguration.getConfig().getString(HoodieCommonConfig.BASE_PATH.key()));
+ }
+
+ /**
+ * Compute the DynamoDB partition key for a given Hudi table base path.
Exposed as a static
+ * helper so that the formula is testable without standing up a DynamoDB
client.
+ */
+ public static String derivePartitionKey(String hudiTableBasePath) {
+ String normalized = normalizeBasePathForLocking(hudiTableBasePath);
+ String partitionKey = HashID.generateXXHashAsString(normalized,
HashID.Size.BITS_64);
+ LOG.info(String.format("The DynamoDB partition key of the lock provider
for the base path %s (normalized: %s) is %s",
+ hudiTableBasePath, normalized, partitionKey));
+ return partitionKey;
}
Review Comment:
🤖 I think this will break the existing integration test
`ITTestDynamoDBBasedLockProvider.testAcquireLock` (lines 193-195). That test
asserts `getPartitionKey()` equals `HashID.generateXXHashAsString(SCHEME_S3 +
URI_NO_CLOUD_PROVIDER_PREFIX, ...)` where `URI_NO_CLOUD_PROVIDER_PREFIX` has no
trailing slash — but with this change the implementation will hash
`"...my_table/"` instead. Could you update that assertion to use the normalized
form (or `derivePartitionKey(...)`) so the IT keeps passing?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java:
##########
@@ -759,6 +759,31 @@ public static String s3aToS3(String s3aUrl) {
return s3aUrl.replaceFirst("(?i)^s3a://", "s3://");
}
+ /**
+ * Canonicalize a Hudi table base path for use as the input to implicit
lock-key derivation.
+ *
+ * <p>Implicit lock providers (DynamoDB and Zookeeper variants) hash this
string to choose the
+ * lock row / znode for a table. Two callers writing to the same table must
produce the same
+ * hash, so any benign formatting drift in the basePath has to be eliminated
before hashing.
+ * This method trims surrounding whitespace, normalizes s3a:// to s3://,
then forces exactly
+ * one trailing slash. Inner double slashes are intentionally preserved.
+ */
Review Comment:
🤖 Worth thinking about the rollout: the canonical form now always has a
trailing slash, but most existing deployments today probably pass
`s3://bucket/table` (no trailing slash) and have running locks keyed off
`hash("s3://bucket/table")`. After this change those same writers will key off
`hash("s3://bucket/table/")` instead — so during a rolling upgrade an
old-binary writer and a new-binary writer for the same table will take
*different* locks, which is exactly the bug this PR is trying to prevent.
@yihua could you weigh in on whether this needs a coordinated-restart note /
release-notes callout, or whether we should preserve the old hash as a fallback?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-aws/src/main/java/org/apache/hudi/aws/transaction/lock/DynamoDBBasedImplicitPartitionKeyLockProvider.java:
##########
@@ -43,26 +43,33 @@ public class DynamoDBBasedImplicitPartitionKeyLockProvider
extends DynamoDBBased
protected static final Logger LOG =
LoggerFactory.getLogger(DynamoDBBasedImplicitPartitionKeyLockProvider.class);
private final String hudiTableBasePath;
-
+
public DynamoDBBasedImplicitPartitionKeyLockProvider(final LockConfiguration
lockConfiguration, final StorageConfiguration<?> conf) {
this(lockConfiguration, conf, null);
}
public DynamoDBBasedImplicitPartitionKeyLockProvider(
final LockConfiguration lockConfiguration, final StorageConfiguration<?>
conf, DynamoDbClient dynamoDB) {
super(lockConfiguration, conf, dynamoDB);
- hudiTableBasePath =
s3aToS3(lockConfiguration.getConfig().getString(HoodieCommonConfig.BASE_PATH.key()));
+ hudiTableBasePath = normalizeBasePathForLocking(
+
lockConfiguration.getConfig().getString(HoodieCommonConfig.BASE_PATH.key()));
+ }
+
+ /**
+ * Compute the DynamoDB partition key for a given Hudi table base path.
Exposed as a static
+ * helper so that the formula is testable without standing up a DynamoDB
client.
+ */
+ public static String derivePartitionKey(String hudiTableBasePath) {
+ String normalized = normalizeBasePathForLocking(hudiTableBasePath);
+ String partitionKey = HashID.generateXXHashAsString(normalized,
HashID.Size.BITS_64);
+ LOG.info(String.format("The DynamoDB partition key of the lock provider
for the base path %s (normalized: %s) is %s",
Review Comment:
🤖 nit: could you switch this to SLF4J parameterized logging (`LOG.info("...
{} ... {} ... {}", hudiTableBasePath, normalized, partitionKey)`) rather than
`String.format`? The ZK provider already uses `{}` placeholders, and this
avoids the string-building cost when INFO is disabled.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]