Wes McKinney created ARROW-8667:
-----------------------------------

             Summary: [C++] Add multi-consumer Scheduler API to sit one layer 
above ThreadPool
                 Key: ARROW-8667
                 URL: https://issues.apache.org/jira/browse/ARROW-8667
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++
            Reporter: Wes McKinney
             Fix For: 1.0.0


I believe we should define an abstraction to allow for custom resource 
allocation strategies (round robin, even time, etc.) to be devised for 
situations where there are different thread pool consumers that are working 
independently of each other.

Consider the classic nested parallelism scenario:

* Task A in thread 1 may issue N subtasks that run in parallel
* Task B in thread 2 may issue K subtasks

With our current ThreadPool abstraction, it is easy to conceive of scenarios 
where Task A and Task B trample each other.

One approach to remedy this problem is to have an API like so:

{code}
// Inform the scheduler that you want to submit tasks that are "your tasks"
int consumer_id = scheduler->NewConsumer();

for (...) {
  Future<T> fut = scheduler->Submit(consumer_id, DoWork, ...);
}

scheduler->FinishConsumer(consumer_id);
{code}

The idea is that the scheduler would maintain a separate task queue for each 
consumer and could, for example, track consumer-specific metrics of interest 
to determine how tasks are allocated.
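
To make the per-consumer queue idea concrete, here is a minimal single-threaded sketch. It is not an existing Arrow class: the name RoundRobinScheduler and the RunNext() method are hypothetical, tasks are plain std::function<void()> rather than returning a Future<T>, and no worker threads are involved. It only illustrates how separate queues let the scheduler alternate between consumers.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch, not part of Arrow: one FIFO queue per consumer,
// served in round-robin order by consumer id.
class RoundRobinScheduler {
 public:
  using Task = std::function<void()>;

  // Register a new consumer and return its id.
  int NewConsumer() {
    int id = next_id_++;
    queues_[id];  // creates an empty queue for this consumer
    return id;
  }

  // Queue a task on behalf of a registered consumer.
  void Submit(int consumer_id, Task task) {
    auto it = queues_.find(consumer_id);
    assert(it != queues_.end() && "unknown or finished consumer");
    it->second.push_back(std::move(task));
  }

  // Unregister a consumer. In this sketch, tasks still pending are dropped.
  void FinishConsumer(int consumer_id) { queues_.erase(consumer_id); }

  // Run one task from the next consumer after the last one served that has
  // pending work. Returns false if no consumer has anything queued.
  bool RunNext() {
    if (queues_.empty()) return false;
    auto it = queues_.upper_bound(last_served_);
    for (std::size_t i = 0; i < queues_.size(); ++i, ++it) {
      if (it == queues_.end()) it = queues_.begin();  // wrap around
      if (!it->second.empty()) {
        Task task = std::move(it->second.front());
        it->second.pop_front();
        last_served_ = it->first;
        task();
        return true;
      }
    }
    return false;
  }

 private:
  std::map<int, std::deque<Task>> queues_;
  int next_id_ = 0;
  int last_served_ = -1;
};

// Usage: consumer A submits three subtasks, consumer B submits two.
// Returns the order in which the subtasks ran.
std::string InterleaveDemo() {
  RoundRobinScheduler scheduler;
  std::string order;
  int a = scheduler.NewConsumer();
  int b = scheduler.NewConsumer();
  for (int i = 0; i < 3; ++i) scheduler.Submit(a, [&order] { order += 'A'; });
  for (int i = 0; i < 2; ++i) scheduler.Submit(b, [&order] { order += 'B'; });
  while (scheduler.RunNext()) {
  }
  scheduler.FinishConsumer(a);
  scheduler.FinishConsumer(b);
  return order;
}
```

With a single shared FIFO queue, all of A's subtasks would run before any of B's; with one queue per consumer the two are interleaved.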

The scheduler could have different logic to control tasks being assigned to 
worker threads:

* Round-robin
* Even-time allocation (run fewer tasks from consumers with "slow" tasks and 
more tasks from consumers with "fast" tasks, though there are some nuances 
here, such as not starving a consumer that has been running a lot of "slow" 
tasks when a "fast" consumer shows up)
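
As a rough illustration of the even-time idea, the sketch below picks the consumer that has accumulated the least run time so far. All names here (ConsumerStats, PickEvenTime, RecordRun, InitialBusySeconds) are hypothetical and not part of Arrow. Starting a newly arrived consumer at the current minimum accumulated time is one possible way to handle the starvation nuance mentioned above; it is an assumption of this sketch, not something the proposal prescribes.

```cpp
#include <cassert>
#include <map>

// Hypothetical per-consumer bookkeeping for an even-time policy.
struct ConsumerStats {
  int pending_tasks;    // tasks queued but not yet finished
  double busy_seconds;  // total run time charged to this consumer
};

const int kNoConsumer = -1;

// Pick the consumer with pending work that has used the least time so far.
// Returns kNoConsumer if no consumer has pending tasks.
int PickEvenTime(const std::map<int, ConsumerStats>& consumers) {
  int best = kNoConsumer;
  double best_time = 0.0;
  for (const auto& entry : consumers) {
    const ConsumerStats& stats = entry.second;
    if (stats.pending_tasks == 0) continue;
    if (best == kNoConsumer || stats.busy_seconds < best_time) {
      best = entry.first;
      best_time = stats.busy_seconds;
    }
  }
  return best;
}

// Charge a finished task's run time to its consumer.
void RecordRun(ConsumerStats* stats, double seconds) {
  assert(seconds >= 0.0);
  assert(stats->pending_tasks > 0);
  stats->pending_tasks -= 1;
  stats->busy_seconds += seconds;
}

// Starting value of busy_seconds for a consumer that joins late: the current
// minimum among existing consumers (0 if there are none). Without this, a
// newcomer starting at zero would win every pick until it caught up, and
// long-running consumers would be starved in the meantime.
double InitialBusySeconds(const std::map<int, ConsumerStats>& consumers) {
  bool found = false;
  double min_time = 0.0;
  for (const auto& entry : consumers) {
    if (!found || entry.second.busy_seconds < min_time) {
      min_time = entry.second.busy_seconds;
      found = true;
    }
  }
  return min_time;
}
```

A consumer whose tasks are slow accumulates time quickly and is therefore picked less often, which yields fewer tasks for "slow" consumers and more for "fast" ones.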



