adriangb commented on issue #5882:
URL: https://github.com/apache/arrow-rs/issues/5882#issuecomment-2293618784

   Indeed, I have encountered this. As @tustvold says, it's not just object 
storage: any other IO will be impacted, and currently there's nothing to stop 
you from shooting yourself in the foot. It's also hard to detect when you do.
   
   I don't have any suggestions currently for the right APIs to do this. 
Frankly, I'm not 100% confident I've addressed the problem completely in my use 
case, as it seems tricky to get right. I feel like it would be hard to 
completely eliminate the foot gun from an API perspective, so maybe the best we 
can do is provide APIs to work around it and to detect when it's happening.
   
   I wonder if we could do something like have DataFusion's CPU runtime be some 
specialized runtime that requires a trait bound on futures? No idea if that's 
possible. But the idea is basically to prohibit running async work on the CPU 
runtime unless it has explicitly opted into the CPU runtime.
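
   To make the trait-bound idea concrete, here is a rough, std-only sketch. 
Everything in it (`CpuSafe`, `CpuTask`, `run_on_cpu`) is a hypothetical name 
made up for illustration, not an existing DataFusion or Tokio API, and the 
tiny single-future executor only stands in for a real CPU runtime. Note that 
the bound cannot prove a future does no IO; it only forces an explicit opt-in.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical marker trait: futures that have opted in to the CPU runtime.
pub trait CpuSafe: Future {}

/// Hypothetical opt-in wrapper. Wrapping is a promise by the caller that the
/// future does no IO; nothing here can verify that promise.
pub struct CpuTask<F>(Pin<Box<F>>);

impl<F: Future> CpuTask<F> {
    pub fn new(fut: F) -> Self {
        CpuTask(Box::pin(fut))
    }
}

impl<F: Future> Future for CpuTask<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // `CpuTask` is `Unpin` because the inner future is boxed and pinned.
        self.get_mut().0.as_mut().poll(cx)
    }
}

// Only the wrapper implements the marker, so plain futures are rejected.
impl<F: Future> CpuSafe for CpuTask<F> {}

/// Waker that unparks the thread driving the future.
struct CpuWaker(Thread);

impl Wake for CpuWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Stand-in for "spawn on the CPU runtime": it drives one future to
/// completion on the current thread and accepts only `CpuSafe` futures.
pub fn run_on_cpu<F: CpuSafe>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(CpuWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}
```

   With this, `run_on_cpu(CpuTask::new(async { 1 + 2 }))` compiles, while 
`run_on_cpu(async { 1 + 2 })` is rejected at compile time because a bare 
`async` block does not implement `CpuSafe`.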
   
   It would also be nice to have documentation on how to check when this 
happens (using Tokio metrics to detect stalled async tasks?). I'm not 
completely sure how to do this but if there were a "guide to debugging your 
DataFusion app for CPU blocking IO" that would be immensely helpful not just to 
DataFusion users but to async Rust users in general.
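
   As one possible starting point for such a guide, here is a rough sketch of 
detecting blocking by timing each `poll` call. This is a different technique 
from Tokio's runtime metrics, swapped in because it needs only the standard 
library. `SlowPollDetector` and `block_on` are hypothetical names for 
illustration; in a real app the wrapped future would be spawned on whatever 
runtime you already use.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::{Duration, Instant};

/// Hypothetical wrapper that counts polls taking longer than `threshold`.
/// A long poll means the task held its worker thread without yielding,
/// which is what blocking IO or heavy CPU work inside a task looks like.
pub struct SlowPollDetector<F> {
    inner: Pin<Box<F>>,
    threshold: Duration,
    slow_polls: Arc<AtomicUsize>,
}

impl<F: Future> SlowPollDetector<F> {
    /// Returns the wrapped future plus a shared counter of slow polls.
    pub fn new(fut: F, threshold: Duration) -> (Self, Arc<AtomicUsize>) {
        let slow_polls = Arc::new(AtomicUsize::new(0));
        let detector = SlowPollDetector {
            inner: Box::pin(fut),
            threshold,
            slow_polls: Arc::clone(&slow_polls),
        };
        (detector, slow_polls)
    }
}

impl<F: Future> Future for SlowPollDetector<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut();
        let start = Instant::now();
        let result = this.inner.as_mut().poll(cx);
        if start.elapsed() > this.threshold {
            this.slow_polls.fetch_add(1, Ordering::Relaxed);
        }
        result
    }
}

/// Waker that unparks the thread driving the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor, included only to keep the sketch runnable.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}
```

   The threshold is a judgment call: a few milliseconds would flag most 
blocking calls, but it would also flag legitimately heavy CPU work, so the 
counter is a hint about where to look rather than a verdict.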

