Davis-Zhang-Onehouse opened a new pull request, #18814:
URL: https://github.com/apache/hudi/pull/18814

   ### Describe the issue this Pull Request addresses
   
   The implicit-key lock providers — 
`DynamoDBBasedImplicitPartitionKeyLockProvider` and 
`ZookeeperBasedImplicitBasePathLockProvider` — hash the hudi table base path to 
derive a DynamoDB partition key / Zookeeper znode for the table-level lock. The 
hash function (XXH64) is avalanche: any byte-level difference in the input 
produces a completely different output. Today the only normalization applied 
before hashing is `s3aToS3` (`s3a://` → `s3://`). Trailing slashes, repeated 
trailing slashes, and surrounding whitespace pass through unchanged.
   
   When two writers for the same hudi table disagree on those benign formatting 
details — for example, one engine supplies `"s3://bucket/table"` while another 
supplies `"s3://bucket/table/"` — they end up acquiring **different** lock rows 
/ znodes and lose mutual exclusion, even though they are targeting the same 
table. Two writers that should serialize can then write concurrently and 
corrupt the hudi timeline.
   
   ### Summary and Changelog
   
   Introduce `FSUtils.normalizeBasePathForLocking(String)` as the single source 
of truth for canonicalization before hashing:
   
   1. Reject null / empty / whitespace-only basePath.
   2. `trim()` surrounding whitespace.
   3. Apply existing `s3aToS3` (case-insensitive `s3a://` → `s3://`).
   4. Strip all trailing `/` then add exactly one.
   
   Wire it into both implicit-key lock providers:
   
   - `DynamoDBBasedImplicitPartitionKeyLockProvider` — also gains a small 
public static `derivePartitionKey(String)` so the formula is testable without a 
DynamoDB client.
   - `ZookeeperBasedImplicitBasePathLockProvider` — `getLockBasePath(String)` 
now normalizes before hashing.
   
   Both providers also log the normalized form alongside the original input, 
making it easier to diagnose any future drift.
   
   **Deliberately not normalized:**
   
   - **Inner consecutive slashes** (`s3://b//x` vs `s3://b/x`) — these can 
resolve to a legitimate S3 key; rewriting them could collide unrelated tables.
   - **Scheme case beyond `s3a`** (e.g. `S3://`) — never observed in practice 
and out of scope.
   - **URL encoding** — Hudi does not re-encode paths internally.
   
   ### Impact
   
   - **Behavior:** Implicit-key lock providers will produce the same lock for a 
hudi table regardless of trailing-slash, multi-slash, surrounding whitespace, 
or `s3a`/`s3` scheme variations in the base path supplied to the 
`LockConfiguration`. No public API signature change.
   - **Performance:** Negligible — one extra `trim()` and a short backward scan 
over the path string per `getDynamoDBPartitionKey` / `getLockBasePath` call.
   - **Compatibility (rollout caveat):** Locks taken under the previous 
(no-trailing-slash or whitespace-sensitive) form effectively orphan at deploy 
time, since the new code looks at a different lock row / znode for the same 
logical table. Operators upgrading should coordinate the deploy across all 
writers that share a hudi table, or briefly quiesce writers while rolling out. 
Stale DynamoDB rows can be left to expire naturally (heartbeat-based) or 
cleaned up manually.
   
   ### Risk Level
   
   medium
   
   The lock-key derivation changes for an existing canonical input form 
(trailing-slash callers will see the same hash as before, since `"s3://b/x/"` → 
strip → `"s3://b/x"` → add `/` → `"s3://b/x/"`). The risk is concentrated on 
callers who previously supplied the basePath without a trailing slash — those 
locks will key under a new row / znode after this change, which is the entire 
point of the fix but does mean any in-flight lock under the old key is dropped 
at deploy time.
   
   Mitigation:
   - Existing `TestZookeeperBasedLockProvider` continues to pass with no 
changes.
   - New tests cover the invariant directly: trailing slash, multi-slash, 
whitespace, and `s3a`/`s3` variants all produce the same key.
   - Local Maven runs: `TestFSUtils` (13 cases), 
`TestDynamoDBBasedImplicitPartitionKeyLockProvider` (4 cases), 
`TestZookeeperBasedImplicitBasePathLockProvider` (4 cases) all pass.
   
   ### Documentation Update
   
   none — no user-facing config or website change. The new helper is internal.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to