Thanks. I think TaskGroup almost suits my needs. I might need an extra layer
around it.
Here is my use case. When converting a record batch to R data structures,
all R allocation has to happen on the main thread, but filling the vectors
can then (for some of them) be done in a task that runs on a different thread.
Not all of them: filling R character vectors, for example, needs the main
thread.
So I was thinking of doing something like this pseudocode:
    auto n = num_columns();
    auto serial = TaskGroup::MakeSerial();
    auto threaded = TaskGroup::MakeThreaded(...);
    for (int i = 0; i < n; i++) {
      - allocate column i
      if ( <can run in parallel> ) {
        threaded.AddTask(...)
      } else {
        serial.AddTask(...)
      }
    }
    - start threaded tasks
    - start serial tasks
    - combine
I guess that just means I need some way to hold the tasks before they go in the
task groups.
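
To make that "holding" layer concrete, here is a minimal sketch using only the
C++ standard library. It is not Arrow code: DeferredTasks, Add and Run are
hypothetical names, and std::async stands in for whatever thread pool or
TaskGroup would really be used. Tasks are only queued while the columns are
being allocated; nothing runs until Run() is called.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Arrow): collects tasks first, runs them later.
class DeferredTasks {
 public:
  using Task = std::function<bool()>;  // returns true on success

  // Queue a task in the parallel or the serial list; nothing runs yet.
  void Add(bool can_run_in_parallel, Task task) {
    assert(task && "task must be callable");
    (can_run_in_parallel ? parallel_ : serial_).push_back(std::move(task));
  }

  // Start the parallel tasks, run the serial ones on the calling thread,
  // then wait for everything and combine the results.
  bool Run() {
    std::vector<std::future<bool>> pending;
    pending.reserve(parallel_.size());
    for (auto& task : parallel_) {
      // Assumption: std::async replaces a real thread pool in this sketch.
      pending.push_back(std::async(std::launch::async, std::move(task)));
    }
    bool ok = true;
    for (auto& task : serial_) ok = task() && ok;    // calling (main) thread
    for (auto& result : pending) ok = result.get() && ok;
    parallel_.clear();
    serial_.clear();
    return ok;
  }

 private:
  std::vector<Task> parallel_;
  std::vector<Task> serial_;
};
```

In the loop above, each column would be allocated on the main thread and its
filling step passed to Add() with the "can run in parallel" flag; a single
Run() after the loop then does the start/start/combine steps.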
> On 3 Jan 2019, at 14:36, Antoine Pitrou <[email protected]> wrote:
>
>
> Hi Romain,
>
> No, it's better if you use the CPU thread pool directly (or through
> TaskGroup, if that suits your execution model better).
>
> Regards
>
> Antoine.
>
>
> On 03/01/2019 at 14:29, Romain Francois wrote:
>> Hello,
>>
>> Are the functions in parallel.h the de facto model for parallelisation in
>> Arrow?
>> https://github.com/apache/arrow/blob/42cf69abfc1368c9884f4581811e2e7900c98fcd/cpp/src/arrow/util/parallel.h
>>
>> Just wondering whether things like Intel TBB were considered. IIRC, managing
>> threads manually can be expensive, and tasks are usually cheaper.
>>
>> Romain
>>