AliRana30 opened a new issue, #49061:
URL: https://github.com/apache/arrow/issues/49061

   ### Describe the enhancement requested
   
   ## Problem
   
   Arrow has 30+ documented deadlock risks across the codebase, but no centralized framework for detecting or preventing them.
   
   **Affected files:**
   - `cpp/src/arrow/util/thread_pool.h:271,292,329,443` - Nested parallelism can deadlock
   - `cpp/src/arrow/filesystem/s3fs.cc:2098,2118` - Future callbacks can deadlock (GH-41862)
   - `cpp/src/arrow/filesystem/filesystem.cc:645` - Blocking `Close()` can cause deadlocks
   - `cpp/src/arrow/util/async_generator.h:1052,1746,1878` - Lock-ordering issues
   - `cpp/src/arrow/flight/sql/odbc/odbc_impl/flight_sql_driver.cc:43` - **Abseil deadlock detection explicitly disabled**
   - `cpp/src/arrow/dataset/dataset_writer.cc:118` - Queue overflow can cause deadlock
   - `cpp/src/arrow/acero/asof_join_node.cc:767,1417` - Pause/backpressure deadlock
   - `cpp/src/arrow/csv/reader_test.cc:130-132` - Destructor can deadlock during cleanup
   
   **Related:** #48714
   
   ## Proposed Solution
   
   Implement a comprehensive deadlock detection and prevention framework:
   
   1. **Runtime Detection:** Build a lock-ordering validator that tracks mutex acquisition across threads, plus a cycle detector for resource dependencies.
   2. **Enable Abseil Detection:** Re-enable Abseil's mutex deadlock detection (`absl::SetMutexDeadlockDetectionMode`), currently disabled in the Flight SQL ODBC driver, and fix the underlying issues that made the workaround necessary.
   3. **Timeout Mechanisms:** Add configurable timeouts to all blocking operations and flag suspiciously long waits as potential deadlocks.
   4. **Thread Tracking:** Track thread state and log lock acquisitions in debug builds, integrated with the `ThreadPool`/`Executor` infrastructure.
   5. **Static Analysis:** Detect nested parallelism patterns and other potential deadlock scenarios at compile time.
   6. **Prevention Policies:** Enforce lock-ordering policies and add automatic deadlock avoidance to executor scheduling.
   7. **Testing Infrastructure:** Create automated stress tests that exercise concurrent operations, and add chaos-engineering tests that inject random delays and contention.
   8. **CI Integration:** Run deadlock detection in nightly builds with ThreadSanitizer (TSAN), which reports lock-order inversions as potential deadlocks, and add performance regression tests that also catch newly introduced deadlock risks.
   9. **Documentation:** Document safe parallelism patterns, write a lock hierarchy guide, and provide examples of anti-patterns to avoid.
   
   **Benefits:** Prevent production deadlocks, enable faster debugging, support safe parallelism optimizations, and improve enterprise reliability.
   
   ### Component(s)
   
   C++

