edrevo commented on pull request #9523:
URL: https://github.com/apache/arrow/pull/9523#issuecomment-786767430


   I've fiddled around with the code and I'm now convinced the root cause is 
what I mentioned in 
https://github.com/apache/arrow/pull/9523#issuecomment-786237494
   
   It would probably be a good idea to audit every use of crossbeam in the 
project, because crossbeam's channels block the calling thread, and blocking 
tokio's worker threads is exactly what async code must avoid. See 
https://tokio.rs/tokio/tutorial/channels#tokios-channel-primitives, which notes 
that crossbeam is meant for use outside of async contexts:
   
   >There are also channels for use outside of asynchronous Rust, such as 
std::sync::mpsc and crossbeam::channel. **These channels wait for messages by 
blocking the thread, which is not allowed in asynchronous code.**
   
   (emphasis is mine)
   
   I have a local change that uses tokio's mpsc channel instead, and with it 
there's no deadlock. I haven't done any formal measurements, but with that 
change the repartition of the 100 GB runs with flat memory usage and uses about 
1.5x as many CPUs as it did before. I can also see the output files growing 
faster. This is probably because tokio's threads are no longer blocked, so 
tokio's scheduler can fit more work onto the same number of threads.
   
   An alternative to tokio's mpsc would be the async_channel crate, but I have 
never used it, so I don't know which is the better option here.
   
   I can open a PR with these changes if you want; just let me know whether you 
prefer tokio's mpsc or the async_channel crate.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.