balodesecurity opened a new pull request, #8290:
URL: https://github.com/apache/hadoop/pull/8290
## Problem
When an application runs on a DataNode host with short-circuit reads enabled,
and a custom `URLClassLoader` whose classpath contains JARs stored in HDFS is
set as the thread context ClassLoader, the reading thread (for example, the main
thread) can hang indefinitely.
**Deadlock chain:**
1. Thread T enters `DfsClientShmManager.EndpointShmManager.allocSlot()`,
sets `loading = true`, releases the lock, and calls `requestNewShm()`
2. `requestNewShm()` creates a `DfsClientShm`, whose constructor
(`ShortCircuitShm`) calls `POSIX.mmap()` — triggering the `NativeIO.POSIX`
class static initializer
3. The static initializer calls `new Configuration()`, which loads XML
resources via the thread's **context ClassLoader**
4. If the context ClassLoader is a `URLClassLoader` backed by remote HDFS
JARs, resolving those JARs triggers an HDFS read
5. That read re-enters `allocSlot()` on **the same thread T**. Because the
lock was released in step 1, T acquires it, sees `loading == true`, and calls
`finishedLoading.awaitUninterruptibly()`
6. Thread T is now parked waiting for a condition that **it itself** must
signal → **indefinite hang**
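The chain above comes down to a thread waiting on a condition that only it can signal. Below is a heavily simplified, hypothetical model of that pattern. The class `SelfWaitModel`, its fields and the `reenter` flag are illustrative only and are not Hadoop code. It replaces the real `awaitUninterruptibly()` with `awaitNanos()` and a timeout, so the program terminates instead of hanging. A `false` result means the nested call gave up after the timeout.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model of the allocSlot() loading protocol.
// Not Hadoop code: names and structure are illustrative only.
class SelfWaitModel {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition finishedLoading = lock.newCondition();
  private final long timeoutMs;
  private boolean loading = false;

  SelfWaitModel(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  /**
   * Returns true if a "slot" was obtained. When reenter is true, the simulated
   * shm setup calls allocSlot() again on the same thread (steps 2-5), and the
   * outer call reports the nested call's outcome.
   */
  boolean allocSlot(boolean reenter) {
    lock.lock();
    try {
      if (loading) {
        // The real code waits with awaitUninterruptibly(); a timeout is used
        // here so the model gives up instead of hanging forever.
        long remaining = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (loading) {
          if (remaining <= 0) {
            return false; // timed out: the only thread that could signal is this one
          }
          try {
            remaining = finishedLoading.awaitNanos(remaining);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
          }
        }
        return true;
      }
      loading = true; // step 1
    } finally {
      lock.unlock(); // released before the slow shm setup
    }
    boolean ok = true;
    try {
      if (reenter) {
        // Steps 2-5: setup triggers a read that re-enters on the same thread.
        ok = allocSlot(false);
      }
    } finally {
      lock.lock();
      try {
        loading = false;
        finishedLoading.signalAll();
      } finally {
        lock.unlock();
      }
    }
    return ok;
  }
}
```

`ReentrantLock` does not prevent this re-entry, because the outer call has already fully released the lock before the nested call arrives. The hang comes from the condition wait, not from the lock itself.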
## Fix
Track which thread set `loading = true` via a new `loadingThread` field in
`EndpointShmManager`. When `allocSlot()` detects that `loading == true` and the
current thread **is** the loading thread, it returns `null` immediately instead
of waiting. The caller then falls back transparently to a normal
(non-short-circuit) read.
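Here is a minimal sketch of the guard. It assumes the simplified loading protocol modeled here and is not the real `EndpointShmManager` or the actual patch. The `GuardedShmManager` and `Slot` types, the `reenter` flag and the `nestedCallReturnedNull` hook are all hypothetical. The owner check runs before waiting. A re-entrant call therefore gets `null`, while calls from other threads still block until loading finishes.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the loadingThread guard; not the actual Hadoop patch.
class GuardedShmManager {
  static final class Slot {} // stand-in for a shared-memory slot

  private final ReentrantLock lock = new ReentrantLock();
  private final Condition finishedLoading = lock.newCondition();
  private boolean loading = false;
  private Thread loadingThread = null; // thread that set loading = true

  boolean nestedCallReturnedNull = false; // observation hook for the demo

  /** Returns a slot, or null to signal "fall back to a normal read". */
  Slot allocSlot(boolean reenter) {
    lock.lock();
    try {
      while (loading) {
        if (loadingThread == Thread.currentThread()) {
          return null; // re-entrant call: waiting would never end
        }
        finishedLoading.awaitUninterruptibly(); // other threads still wait
      }
      loading = true;
      loadingThread = Thread.currentThread();
    } finally {
      lock.unlock();
    }
    try {
      if (reenter) {
        // Simulates shm setup re-entering allocSlot() on the same thread.
        nestedCallReturnedNull = (allocSlot(false) == null);
      }
      return new Slot();
    } finally {
      lock.lock();
      try {
        loading = false;
        loadingThread = null; // cleared together with loading
        finishedLoading.signalAll();
      } finally {
        lock.unlock();
      }
    }
  }
}
```

A plain `==` identity check is enough in this sketch because `loadingThread` is only read and written while the lock is held. It is also cleared in a `finally` block, so a failed load cannot leave a stale owner behind.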
Changes:
- `DfsClientShmManager.java`: add a `loadingThread` field; set and clear it
alongside `loading`; detect re-entrant calls and return `null` for them
- `TestDfsClientShmManager.java`: regression test that injects the
re-entrant state via reflection and verifies `null` is returned within a
10-second timeout (would hang indefinitely before this fix)
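The regression test's approach can be sketched against a stand-in class. It first injects the re-entrant state via reflection, then bounds the call with a timeout. `Endpoint` and `ReentrancyRegressionSketch` are hypothetical. The real test targets `DfsClientShmManager` internals, and its exact code is not shown here. Running the call on a daemon worker lets `Future.get` enforce the timeout, and a pre-fix hang cannot keep the JVM alive.

```java
import java.lang.reflect.Field;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for EndpointShmManager with the guard applied.
class Endpoint {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition finishedLoading = lock.newCondition();
  private boolean loading = false;
  private Thread loadingThread = null;

  Object allocSlot() {
    lock.lock();
    try {
      while (loading) {
        if (loadingThread == Thread.currentThread()) {
          return null; // re-entrant: fall back to a normal read
        }
        finishedLoading.awaitUninterruptibly();
      }
      return new Object(); // stand-in for a real slot
    } finally {
      lock.unlock();
    }
  }
}

class ReentrancyRegressionSketch {
  /** Marks the worker thread as "mid-load" via reflection, then calls allocSlot(). */
  static boolean returnsNullWithin(long timeoutSeconds) {
    Endpoint endpoint = new Endpoint();
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r);
      t.setDaemon(true); // a pre-fix hang must not keep the JVM alive
      return t;
    });
    try {
      Future<Object> result = executor.submit(() -> {
        Field loading = Endpoint.class.getDeclaredField("loading");
        Field loadingThread = Endpoint.class.getDeclaredField("loadingThread");
        loading.setAccessible(true);
        loadingThread.setAccessible(true);
        loading.setBoolean(endpoint, true);
        loadingThread.set(endpoint, Thread.currentThread()); // same thread calls below
        return endpoint.allocSlot();
      });
      try {
        return result.get(timeoutSeconds, TimeUnit.SECONDS) == null;
      } catch (TimeoutException e) {
        return false; // still parked: the pre-fix behaviour
      } catch (InterruptedException | ExecutionException e) {
        throw new IllegalStateException(e);
      }
    } finally {
      executor.shutdownNow();
    }
  }
}
```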
## Test
```
mvn test -pl hadoop-hdfs-project/hadoop-hdfs-client \
-Dtest=TestDfsClientShmManager
```
The test requires native Unix domain socket support (`libhadoop`) and
auto-skips without it (matching the pattern used throughout the shortcircuit
test suite).
🤖 Generated with [Claude Code](https://claude.com/claude-code)