[ 
https://issues.apache.org/jira/browse/ARROW-16498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-16498:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C++] Fix potential deadlock in arrow::compute::TaskScheduler
> -------------------------------------------------------------
>
>                 Key: ARROW-16498
>                 URL: https://issues.apache.org/jira/browse/ARROW-16498
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> An extremely simplified version of the task scheduler's ScheduleMore method 
> it looks something like:
> {noformat}
> void ScheduleMore(int num_to_schedule) {
>   tasks_that_need_running_.fetch_add(num_to_schedule);
>   if (!weak_lock.lock()) {
>     // If someone else is scheduling then return early
>     return;
>   }
>   auto tasks = PickTasks();
>   weak_lock.unlock();
> }
> {noformat}
> It is possible for one thread to have the lock, and find 0 tasks.  But then, 
> before it gives up the lock, another thread adds tasks and fails to acquire 
> the lock.  Neither thread will schedule anything even though there are tasks 
> to run.  This can lead to deadlock.
> The proposed PR changes the logic to (still extremely simplified):
> {noformat}
> void ScheduleMore(int num_to_schedule) {
>   tasks_that_need_running_.fetch_add(num_to_schedule);
>   tasks_added_recently.store(true);
>   if (!weak_lock.lock()) {
>     // If someone else is scheduling then return early
>     return;
>   }
>   auto tasks = PickTasks();
>   if (tasks_added_recently.compare_exchange_strong(true, false)) {
>     if (tasks.empty()) {
>       ScheduleMore();
>     }
>   }
>   weak_lock.unlock();
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to