[ https://issues.apache.org/jira/browse/ARROW-16498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-16498:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C++] Fix potential deadlock in arrow::compute::TaskScheduler
> -------------------------------------------------------------
>
>                 Key: ARROW-16498
>                 URL: https://issues.apache.org/jira/browse/ARROW-16498
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> An extremely simplified version of the task scheduler's ScheduleMore method
> looks something like:
> {noformat}
> void ScheduleMore(int num_to_schedule) {
>   tasks_that_need_running_.fetch_add(num_to_schedule);
>   if (!weak_lock.lock()) {
>     // If someone else is scheduling then return early
>     return;
>   }
>   auto tasks = PickTasks();
>   weak_lock.unlock();
> }
> {noformat}
> It is possible for one thread to hold the lock and find 0 tasks. But then,
> before it gives up the lock, another thread adds tasks and fails to acquire
> the lock. Neither thread schedules anything even though there are tasks
> to run. This can lead to deadlock.
> The proposed PR changes the logic to (still extremely simplified):
> {noformat}
> void ScheduleMore(int num_to_schedule) {
>   tasks_that_need_running_.fetch_add(num_to_schedule);
>   tasks_added_recently.store(true);
>   if (!weak_lock.lock()) {
>     // If someone else is scheduling then return early
>     return;
>   }
>   auto tasks = PickTasks();
>   // compare_exchange_strong takes the expected value by reference
>   bool expected = true;
>   if (tasks_added_recently.compare_exchange_strong(expected, false)) {
>     if (tasks.empty()) {
>       ScheduleMore();
>     }
>   }
>   weak_lock.unlock();
> }
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
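
The interleaving described in the issue can be reproduced deterministically on a single thread. The following is only a minimal sketch, not Arrow's actual scheduler: `MiniScheduler`, `after_pick`, `RunInterleaving`, and the `recheck` switch are hypothetical names introduced for illustration, and an atomic counter stands in for `PickTasks()`. The hook plays the role of the "other thread" that adds tasks while the lock is held. In this sketch the flag is rechecked in a loop after the lock is released, which is one possible arrangement and may differ from what the PR does.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical, heavily simplified scheduler; not Arrow's implementation.
class MiniScheduler {
 public:
  explicit MiniScheduler(bool recheck) : recheck_(recheck) {}

  // Test hook: runs once while the weak lock is held, right after picking.
  std::function<void()> after_pick;

  void ScheduleMore(int num_to_schedule) {
    assert(num_to_schedule >= 0);
    pending_.fetch_add(num_to_schedule);
    tasks_added_recently_.store(true);
    for (;;) {
      bool unlocked = false;
      if (!busy_.compare_exchange_strong(unlocked, true)) {
        return;  // Someone else is scheduling, so return early.
      }
      int picked = pending_.exchange(0);  // Stand-in for PickTasks().
      scheduled_.fetch_add(picked);
      if (after_pick) {
        auto hook = std::move(after_pick);
        after_pick = nullptr;
        hook();  // Simulates another thread adding tasks right now.
      }
      busy_.store(false);  // Release the weak lock.

      // Clear the flag; retry only if tasks were added and we found none.
      bool expected = true;
      bool added = tasks_added_recently_.compare_exchange_strong(expected, false);
      if (!(recheck_ && added && picked == 0)) {
        return;
      }
    }
  }

  int scheduled() const { return scheduled_.load(); }
  int pending() const { return pending_.load(); }

 private:
  const bool recheck_;
  std::atomic<int> pending_{0};
  std::atomic<int> scheduled_{0};
  std::atomic<bool> busy_{false};
  std::atomic<bool> tasks_added_recently_{false};
};

// Replays the problematic interleaving and returns how many tasks got scheduled.
int RunInterleaving(bool recheck) {
  MiniScheduler scheduler(recheck);
  // While the first caller holds the lock with nothing to pick,
  // a second caller adds 3 tasks and fails to take the lock.
  scheduler.after_pick = [&scheduler] { scheduler.ScheduleMore(3); };
  scheduler.ScheduleMore(0);
  return scheduler.scheduled();
}
```

With `recheck` set to `false` the three tasks stay in the pending counter and nothing is scheduled, which mirrors the stall the issue describes. With `recheck` set to `true` the first caller notices the flag, loops once more, and schedules all three.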