smengcl opened a new pull request, #10324: URL: https://github.com/apache/ozone/pull/10324
## What changes were proposed in this pull request?

Fix a race in `CoderUtil.getEmptyChunk()` that can cause EC writes to fail with `ArrayIndexOutOfBoundsException` during parity encoding.

### Problem

`CoderUtil.resetBuffer(byte[] buffer, int offset, int len)` gets a shared zero-filled buffer from `getEmptyChunk(len)` and then calls:

```java
System.arraycopy(empty, 0, buffer, offset, len);
```

The old `getEmptyChunk()` implementation did three things. It checked `emptyChunk.length` before entering the `synchronized` block. It unconditionally replaced the shared static buffer inside the lock. It returned the shared static field after leaving the lock. Together, these allowed a smaller concurrent caller to shrink the shared cached buffer after a larger caller had grown it, so a caller could get back an array shorter than the `len` it requested.

An interleaving that reproduces the issue:

1. `emptyChunk` starts as `byte[4096]`.
1. Thread A calls `getEmptyChunk(4097)`, sees that 4096 < 4097, and is preempted before entering the `synchronized` block.
1. Thread B calls `getEmptyChunk(8194)`, enters the `synchronized` block, sets `emptyChunk = byte[8194]`, and leaves the lock.
1. Thread A resumes, enters the `synchronized` block, and unconditionally sets `emptyChunk = byte[4097]`.
1. Thread B reads and returns the shared static `emptyChunk`, which is now `byte[4097]`.
1. `System.arraycopy(..., len=8194)` throws `ArrayIndexOutOfBoundsException`.

This is a TOCTOU-style race on the shared `emptyChunk` cache.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15341

## How was this patch tested?

- TBD
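For illustration only, here is a minimal sketch of one way to make the cache race-free. It is not necessarily the change made in this PR. It assumes a hypothetical `EmptyChunkCache` class standing in for `CoderUtil`, a `volatile` field, and an initial 4096-byte buffer matching the interleaving described in the Problem section. The idea is to read the shared field once into a local, re-check it under the lock, only ever grow the cache, and return the local reference instead of re-reading the shared field.

```java
/**
 * Hypothetical stand-in for the zero-buffer cache in CoderUtil.
 * This is an illustrative sketch, not the actual Ozone implementation.
 */
final class EmptyChunkCache {
  // Shared zero-filled buffer (assumed volatile here so readers outside
  // the lock see a safely published array). Callers must never write to it.
  private static volatile byte[] emptyChunk = new byte[4096];

  private EmptyChunkCache() { }

  /** Returns a shared, zero-filled array whose length is at least len. */
  static byte[] getEmptyChunk(int len) {
    byte[] chunk = emptyChunk;            // read the shared field exactly once
    if (chunk.length < len) {
      synchronized (EmptyChunkCache.class) {
        chunk = emptyChunk;               // re-check: another thread may have grown it
        if (chunk.length < len) {
          chunk = new byte[len];          // only grow, never shrink
          emptyChunk = chunk;
        }
      }
    }
    // Return the local reference; a later write to the field by another
    // thread cannot change what this caller receives.
    return chunk;
  }

  /** Zeroes buffer[offset, offset + len) by copying from the shared empty chunk. */
  static byte[] resetBuffer(byte[] buffer, int offset, int len) {
    byte[] empty = getEmptyChunk(len);    // guaranteed empty.length >= len
    System.arraycopy(empty, 0, buffer, offset, len);
    return buffer;
  }
}
```

Because all writes to the field happen under the lock and only replace a smaller array with a larger one, a small caller such as `getEmptyChunk(4097)` can no longer undo a larger caller's resize. Returning the local `chunk` also closes the window in which the field could change between the lock release and the return.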
