alamb commented on issue #5882:
URL: https://github.com/apache/arrow-rs/issues/5882#issuecomment-2337775365

   > If we identify the CPU intensive sections of Datafusion and use 
`spawn_blocking` combined with a channel we could move all the blocking tasks 
to that separate threadpool and use the default tokio threadpool for IO.
   
   If we did this, I think it is important to run performance tests: by default 
tokio allows up to 512 threads in its blocking thread pool, and if we are not 
careful, launching CPU-bound work on them will oversubscribe the machine (more 
runnable threads than CPUs), which will hurt performance.
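
   To make the oversubscription concern concrete, here is a minimal std-only 
sketch (no tokio) of a CPU pool capped at the number of cores, with results 
handed back over a channel. The name `run_on_cpu_pool` is hypothetical and is 
not part of DataFusion or tokio; it only illustrates the idea of bounding 
CPU-bound workers.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Runs `work` on every input using a fixed pool of worker threads sized to
/// the number of available cores, and returns the results in input order.
/// Hypothetical helper for illustration only.
fn run_on_cpu_pool<T, R, F>(inputs: Vec<T>, work: F) -> Vec<R>
where
    T: Send + 'static,
    R: Send + 'static,
    F: Fn(T) -> R + Send + Sync + 'static,
{
    // Cap the pool at the core count so CPU-bound jobs cannot oversubscribe.
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let total = inputs.len();

    // Jobs flow in through one channel; results flow back through another.
    let (job_tx, job_rx) = mpsc::channel::<(usize, T)>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (res_tx, res_rx) = mpsc::channel::<(usize, R)>();
    let work = Arc::new(work);

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            let work = Arc::clone(&work);
            thread::spawn(move || loop {
                // The lock is released at the end of this statement, so it is
                // not held while the job itself runs.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok((index, input)) => {
                        let _ = res_tx.send((index, (*work)(input)));
                    }
                    Err(_) => break, // job queue closed: no more work
                }
            })
        })
        .collect();
    drop(res_tx); // only the workers hold result senders now

    for (index, input) in inputs.into_iter().enumerate() {
        job_tx.send((index, input)).unwrap();
    }
    drop(job_tx); // closing the queue lets idle workers exit

    let mut slots: Vec<Option<R>> = (0..total).map(|_| None).collect();
    for (index, result) in res_rx {
        slots[index] = Some(result);
    }
    for handle in handles {
        handle.join().unwrap();
    }
    slots
        .into_iter()
        .map(|slot| slot.expect("every job reports a result"))
        .collect()
}
```

   For example, `run_on_cpu_pool(vec![1u64, 2, 3, 4], |x| x * x)` returns the 
squares in input order. In tokio itself, the corresponding knob is 
`tokio::runtime::Builder::max_blocking_threads`, which could be set close to 
the core count if `spawn_blocking` were used for CPU-bound work.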
   
   
   
   > This is what we do at InfluxData and it works reasonably well. You have to 
be slightly careful so that you don't miss some IO calls or that you don't hand 
IO handles (e.g. sockets, or HTTP connections wrapping them) from the IO 
runtime to the CPU runtime.
   
   It would be really helpful to document / write a blog post about how this 
works -- I think it would be widely read and appreciated. @ion-elgreco, any 
interest / chance that you or someone else on the Delta Lake team would be able 
to? I would be happy to collaborate.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

