[ 
https://issues.apache.org/jira/browse/ARROW-9753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202396#comment-17202396
 ] 

Jorge commented on ARROW-9753:
------------------------------

Isn't it possible to replace {{Arc<Mutex<dyn RecordBatchReader>>}} with {{Box<dyn 
RecordBatchReader>}}?  Maybe this is not a good idea for other reasons (e.g. we 
can't share  on batches), but reading the current code, the RecordBatchIterator 
is always created inside the thread, and what needs to be Arc + Send + Sync is 
the ExecutionPlan itself, which crosses thread spawn boundaries 
(see e.g. MergeExec::execute).
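
To illustrate the idea, here is a minimal sketch using only the standard library. The names {{Plan}}, {{BatchReader}}, {{RangePlan}}, {{RangeReader}} and {{collect_all}} are hypothetical stand-ins, not the actual DataFusion definitions. Only the plan is shared across the spawn boundary (Arc + Send + Sync); each reader is created inside its thread and owned there, so it needs neither a Mutex nor Send.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for a record batch.
type Batch = Vec<i32>;

// Hypothetical stand-in for RecordBatchReader: needs `&mut self` to advance.
trait BatchReader {
    fn next_batch(&mut self) -> Option<Batch>;
}

// Hypothetical stand-in for ExecutionPlan: this is what crosses threads.
trait Plan: Send + Sync {
    fn partitions(&self) -> usize;
    // Returns an owned reader; the reader itself is not required to be Send.
    fn execute(&self, partition: usize) -> Box<dyn BatchReader>;
}

struct RangePlan {
    partitions: usize,
    batches_per_partition: usize,
}

struct RangeReader {
    partition: usize,
    remaining: usize,
}

impl BatchReader for RangeReader {
    fn next_batch(&mut self) -> Option<Batch> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        // Each batch just carries its partition number twice.
        Some(vec![self.partition as i32; 2])
    }
}

impl Plan for RangePlan {
    fn partitions(&self) -> usize {
        self.partitions
    }

    fn execute(&self, partition: usize) -> Box<dyn BatchReader> {
        Box::new(RangeReader {
            partition,
            remaining: self.batches_per_partition,
        })
    }
}

// Spawns one thread per partition; only the Arc'd plan is moved into each thread.
fn collect_all(plan: Arc<dyn Plan>) -> Vec<Batch> {
    let handles: Vec<_> = (0..plan.partitions())
        .map(|p| {
            let plan = Arc::clone(&plan);
            thread::spawn(move || {
                // The reader is created and consumed inside the thread.
                let mut reader = plan.execute(p);
                let mut out = Vec::new();
                while let Some(batch) = reader.next_batch() {
                    out.push(batch);
                }
                out
            })
        })
        .collect();

    // Join in partition order and flatten the per-partition results.
    handles
        .into_iter()
        .flat_map(|h| h.join().expect("worker panicked"))
        .collect()
}
```

With this shape the reader can be mutated freely through {{&mut self}}, since exactly one thread ever owns it.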

I did a quick POC locally, and I was able to compile and have the tests run 
with the change above.

To run threads on an iterator, I think that we need scoped threads (à la 
crossbeam) or some mechanism to create the threads inside the iteration (which 
IMO needs a scheduler).
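
As a rough sketch of the scoped-thread option: the example below uses {{std::thread::scope}} from the standard library (available since Rust 1.63) in place of crossbeam, and the helper name {{drain_in_parallel}} is hypothetical. Scoped threads are guaranteed to finish before the scope returns, so they may borrow iterators that are not {{'static}}, which a plain {{thread::spawn}} would reject.

```rust
use std::thread;

// Hypothetical helper: drains each borrowed iterator on its own scoped thread
// and returns the collected items, one Vec per iterator, in input order.
fn drain_in_parallel<I>(readers: &mut [I]) -> Vec<Vec<I::Item>>
where
    I: Iterator + Send,
    I::Item: Send,
{
    thread::scope(|s| {
        // Each thread gets a mutable borrow of exactly one iterator.
        let handles: Vec<_> = readers
            .iter_mut()
            .map(|r| s.spawn(move || r.collect::<Vec<_>>()))
            .collect();

        // All threads are joined before the scope ends.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
```

The iterators here may themselves borrow local data (for example slices of a local Vec), which is the case that cannot be expressed with non-scoped threads without extra Arc wrapping.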

This SO question is quite good in this respect: 
[https://stackoverflow.com/a/45327907/931303]

> [Rust] [DataFusion] Remove the use of Mutex in ExecutionPlan trait
> ------------------------------------------------------------------
>
>                 Key: ARROW-9753
>                 URL: https://issues.apache.org/jira/browse/ARROW-9753
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust, Rust - DataFusion
>            Reporter: Andy Grove
>            Assignee: Andy Grove
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.0.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The ExecutionPlan trait should not return Arc<Mutex<RecordBatchIterator>> but 
> just Arc<RecordBatchIterator> since most operators do not need to be mutable. 
> Those that do can use interior mutability.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
