[ https://issues.apache.org/jira/browse/ARROW-18431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645419#comment-17645419 ]

Weston Pace commented on ARROW-18431:
-------------------------------------

Are you able to provide some more information on the structure of the plan?
Can you print the plan? What is your input? Are you reading from files,
in-memory tables, or some kind of record batch reader?

My guess would be some kind of mutex deadlock caused by callbacks firing more
quickly than we'd expect. It should be fixable if we can find a way to
reproduce it semi-reliably.

> Acero's Execution Plan never finishes.
> --------------------------------------
>
>                 Key: ARROW-18431
>                 URL: https://issues.apache.org/jira/browse/ARROW-18431
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 10.0.0
>            Reporter: Pau Garcia Rodriguez
>            Assignee: Weston Pace
>            Priority: Major
>
> We have observed that sometimes an execution plan with a small input never 
> finishes (the future returned by the ExecPlan::finished() method is never 
> marked as finished), even though the generator in the sink node is exhausted 
> and has returned nullopt.
> This issue seems to happen at random: the same plan with the same input
> sometimes works (the plan is marked finished) and sometimes doesn't. Since
> the ExecPlanImpl destructor forces the executing thread to wait for the plan
> to finish (when the plan has not yet finished), we enter a deadlock waiting
> for a plan that never finishes.
> Since this has only happened with small inputs, and not deterministically,
> we believe the issue might be in the ExecPlan::StartProducing method.
> Our hypothesis is that after the plan starts producing on each node, each
> node schedules its tasks and they finish immediately (due to the small
> input), and somehow the callback that marks the finished_ future as
> finished is never executed.
>  
> {code:java}
> Status StartProducing() {
>   ...
>   Future<> scheduler_finished =
>       util::AsyncTaskScheduler::Make([this](util::AsyncTaskScheduler* async_scheduler) {
>         ...
>       });
>   // Forward the scheduler's final status to the plan's finished_ future.
>   scheduler_finished.AddCallback([this](const Status& st) {
>     finished_.MarkFinished(st);
>   });
>   ...
> }{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
