jihoonson opened a new pull request #10736: URL: https://github.com/apache/druid/pull/10736
### Description `AbstractBatchIndexTask.tryTimeChunkLock()` is called `isReady()` which is called in the Overlord to check if the task is ready for execution. Since the Overlord only checks the result of `isReady()` but does nothing with locks by itself, locking individual interval can lead to deadlock. Imagine that there are two tasks that want to work on overlapping intervals. One task could lock some of intervals but failed for others for some reason. Later, another task could lock those intervals where the first task failed to lock. Now they would wait for each other to release locks they possess. This PR changes to lock an umbrella interval of the input intervals instead of locking them individually. This is an easy fix, but I'm not sure if it's the best. Maybe the Overlord should take care of those tasks who are not ready yet and release any locks they acquired during `isReady()`. This approach will work better when the input intervals are not consecutive (the approach locking an umbrella interval will unnecessarily lock some intervals even though the task doesn't need them). We can get back to this approach when this turns out to be an issue. <hr> This PR has: - [x] been self-reviewed. - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.) - [x] added documentation for new or modified features or behaviors. - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links. - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml) - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader. - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met. - [ ] added integration tests. - [ ] been tested in a test Druid cluster. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org