Antoine/Wes, thanks for the input.  I will focus on the CSV reader, the
minimal async needed to get I/O off the thread pool, and support for a
nested task group.  The point is to tackle one small thing at a time.
I'll avoid any scheduler work for now but may look at that in the
future.

As for your feedback, I think #3 (adding items to the end of the
thread pool) could also be mitigated if a promise executed its
callbacks directly (instead of submitting them as new tasks).  There
is a bit of a "max recursion" case that has to be guarded against
(similar to what Antoine mentioned) but it could be handled.  I may
experiment with that some.  The Tokio article you posted also talked
about this (keeping a spot open for the last thing scheduled and
running that if possible).
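
To make the idea concrete, here is a rough sketch in Python (not Arrow
code).  SimpleFuture, MAX_INLINE_DEPTH and run_chain are hypothetical
names made up for illustration, and the limit of 8 is arbitrary.  A
finished future runs its callbacks on the current thread, and only
falls back to submitting a new task once the inline nesting gets too
deep:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_INLINE_DEPTH = 8          # hypothetical limit, chosen arbitrarily
_inline = threading.local()   # per-thread count of nested inline callbacks


class SimpleFuture:
    """Toy future whose callbacks run inline when it is safe to do so."""

    def __init__(self, executor):
        self._executor = executor
        self._lock = threading.Lock()
        self._done = False
        self._value = None
        self._callbacks = []

    def add_callback(self, fn):
        with self._lock:
            if not self._done:
                self._callbacks.append(fn)
                return
        self._run(fn)  # already finished: run right away

    def mark_finished(self, value):
        with self._lock:
            self._done = True
            self._value = value
            callbacks, self._callbacks = self._callbacks, []
        for fn in callbacks:
            self._run(fn)

    def _run(self, fn):
        depth = getattr(_inline, "depth", 0)
        if depth >= MAX_INLINE_DEPTH:
            # Too deeply nested: go through the pool so the stack unwinds.
            self._executor.submit(fn, self._value)
            return
        _inline.depth = depth + 1
        try:
            fn(self._value)  # run directly, no new task
        finally:
            _inline.depth = depth


def run_chain(length):
    """Chain `length` futures; each callback finishes the next future."""
    threads, result, done = [], [], threading.Event()
    with ThreadPoolExecutor(max_workers=1) as pool:
        futures = [SimpleFuture(pool) for _ in range(length)]
        for i in range(length - 1):
            def step(value, nxt=futures[i + 1]):
                threads.append(threading.get_ident())
                nxt.mark_finished(value + 1)
            futures[i].add_callback(step)

        def last(value):
            result.append(value)
            done.set()

        futures[-1].add_callback(last)
        futures[0].mark_finished(0)
        done.wait(timeout=5)
    return result[0], threads
```

With the limit assumed here, run_chain(20) runs its first 8 callbacks
on the calling thread and hands the rest to the pool, so the stack
depth stays bounded however long the chain is.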

#5 sounds pretty straightforward but I think you'd want a wide variety
of test cases to make sure you're improving things overall.  You could
saturate a thread pool with just a single workload.  The CSV reader,
for example, will grow to occupy as many threads as there are available
(assuming there are enough columns).  There are a lot of things to
balance here: cache cohesion, I/O vs. CPU workload, and fairness.  It
may not be obvious what exactly to aim for.
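
As a toy illustration of a single workload filling the pool (again
Python, not the actual CSV reader), the sketch below submits one task
per column and records how many pool threads are busy at once.
peak_threads_used and convert_column are hypothetical names, and the
"work" is simulated by waiting on an event so that the tasks are
guaranteed to overlap:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def peak_threads_used(num_columns, pool_size):
    """Return the peak number of pool threads busy at the same time."""
    lock = threading.Lock()
    all_busy = threading.Event()
    state = {"active": 0, "peak": 0}
    expected = min(num_columns, pool_size)

    def convert_column(index):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
            if state["peak"] >= expected:
                all_busy.set()
        # Simulated work: hold the thread until the pool is as full as
        # this workload can make it.
        all_busy.wait(timeout=5)
        with lock:
            state["active"] -= 1

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # One task per column, as a stand-in for per-column conversion.
        for index in range(num_columns):
            pool.submit(convert_column, index)
    return state["peak"]
```

In this model the peak is min(num_columns, pool_size), so any second
workload sharing the pool has to wait whenever there are at least as
many columns as threads.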


On Mon, Sep 28, 2020 at 2:32 AM Antoine Pitrou <anto...@python.org> wrote:
>
> On 28/09/2020 at 11:38, Antoine Pitrou wrote:
> >
> > Hi Weston,
> >
> > On 25/09/2020 at 23:21, Weston Pace wrote:
> >>
> >> * The current thread pool implementation deadlocks when used in a
> >> "nested" case, an asynchronous solution can work around this
> >
> > If required it may be possible to hack around this.  For example, AFAIR
> > TBB has a simple heuristic to enable reentrant calls into the thread
> > pool until a hardcoded recursion level.
>
> Closely related: "TaskGroup::Finish should execute tasks"
> https://issues.apache.org/jira/browse/ARROW-10014
>
> Regards
>
> Antoine.
