Weston Pace created ARROW-13004:
-----------------------------------

             Summary: [C++] Allow the creation of future "chains" to better 
control parallelism
                 Key: ARROW-13004
                 URL: https://issues.apache.org/jira/browse/ARROW-13004
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


This is a bit tricky to explain.  ShouldSchedule::Always works well for 
AddCallback but falls short for Transfer and Then.  An example may explain best.

Consider three operators, Source, Transform, and Sink.  They are setup as...
{code:java}
source_fut = source(); // 1
transform_fut = source_fut.Then(Transform(), ScheduleAlways); // 2
sink_fut = transform_fut.Then(Consume()); // 3
{code}
The intent is to run Transform + Consume as a single thread task on each item 
generated by source().  This is what happens if source() is slow.  If source() 
is fast (let's pretend it's always finished) then this is not what happens.

Line 2 causes a new thread task to be launched (since source_fut is finished).  
It is possible that new thread task can mark transform_fut finished before line 
3 is executed by the original thread.  This causes Consume() and Transform() to 
run on separate threads.

The solution (at least as best I can come up with) is unfortunately a little 
complex (though the complexity can be hidden in future/async_generator 
internals).  Basically, it is worth waiting to schedule until the future chain 
has had a chance to finish connecting the pressure.  This means a future 
created with ScheduleAlways is created in an "unconsumed" mode.  Any callbacks 
that would normally be launched will not be launched until the future switches 
to "consumed".  Future.Wait(), VisitAsyncGenerator, CollectAsyncGenerator, and 
some of the async_generator operators would cause the future to be "consumed".  
The "consume" signal will need to propagate backwards up the chain so futures 
will need to keep a reference to their antecedent future.

This work meshes well with some other improvements I have been considering, in 
particular, splitting future/promise and restricting futures to a single 
callback.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to