jorgecarleitao opened a new pull request #9086:
URL: https://github.com/apache/arrow/pull/9086


   The motivation behind this PR is that Tokio does not cope well with blocking 
(e.g. CPU-intensive) operations, because a task that blocks prevents the 
scheduler from switching to other tasks on that thread. Because of this, tokio 
recommends (throughout its documentation, but most notably 
[here](https://docs.rs/tokio/1.0.1/tokio/index.html#cpu-bound-tasks-and-blocking-code))
 using `spawn_blocking` or `rayon` to handle blocking tasks, such as IO and 
CPU-bound tasks.
   
   This PR is just an experiment / proposal / idea of how we could handle this 
within tokio. Specifically, it uses `spawn_blocking` to run a blocking 
operation on a thread from tokio's "blocking-dedicated" thread pool, thereby 
avoiding starving the "async-dedicated" thread pool.
   
   I do not expect this code to make much difference in performance, as 
`filter` blocks for far less time than e.g. a group by. However, I think that 
this could address performance issues when we have multiple stages (as one 
stage currently blocks the whole thread because we perform blocking ops inside 
`async` code).
   
   @andygrove @alamb @Dandandan , I have been looking at DataFusion's code and 
tokio's documentation, and I hypothesize that this would be one way to follow 
tokio's recommendations for our use-case, but I would really like to get your 
opinions.



