Tom-Newton commented on issue #15233: URL: https://github.com/apache/arrow/issues/15233#issuecomment-2502104233
We just discovered this problem and I'm quite keen to get it fixed. In our case we are using the `AzureFileSystem`, but it implements `background_writes` in a very similar way to the `S3FileSystem`, so it's no surprise that it suffers from the same problem. I think the problem is exactly what @westonpace was describing, but to be precise about how it manifests in this case:

1. With a large number of files and `use_threads=true`, `CopyFiles` submits a large number of copy operations to the IO thread pool, enough to occupy all the IO threads. https://github.com/apache/arrow/blob/main/cpp/src/arrow/filesystem/filesystem.cc#L648-L650
2. With `background_writes=true`, each `Write` (https://github.com/apache/arrow/tree/main/cpp/src/arrow/filesystem/util_internal.cc#L56) to Azure or S3 actually just writes to an in-memory buffer and submits a task to the IO thread pool to do the actual upload.
3. After all the write calls, the `OutputStream` is destructed, which calls `Flush()`. `Flush()` blocks, waiting for all the tasks submitted by calls to `Write` to complete their actual uploads. Unfortunately, this is a deadlock, because there are no free threads in the IO thread pool to actually do any of the uploads.

If we compare to @westonpace's rules:

> This yields roughly the following rules:
> - The user thread (the python top-level thread) should block on a top-level future
> - CPU threads should never block (outside of minor blocking on mutex guards to sequence a tiny critical section)
> - I/O threads should only block on OS calls. They should never block waiting for other tasks.

The third rule is violated, because the IO threads spawned by the copy are blocked waiting for the background-write IO tasks to complete. A toy reproduction of this pattern is sketched at the end of this comment.

A few vague ideas on solutions, though none of them seem particularly attractive:

1. The `OutputStream` destructor does not synchronously `Flush`. Instead it returns a future for the `Flush` (see the second sketch at the end of this comment).
   - This is what @westonpace described.
   - I fear this could require changes to the `FileSystem` interface, particularly around the destructor calling `Flush`. Maybe there are other options though.
2. Try setting priorities when submitting tasks.
   - `CopyFiles` spawns its tasks with lower priority, so they cannot starve the tasks that actually do the uploads.
   - If lower-priority tasks can be preempted, then this seems like a perfect solution. If they cannot, there will still be rare race conditions where all the IO threads fill up with copy operations and cause the deadlock.
3. Use a different thread pool, e.g. the CPU thread pool, for `CopyFiles`.
   - It sort of makes sense to treat uploads with `background_writes=true` as CPU work, because each `Write` is basically just writing to an in-memory buffer, which is probably more CPU than IO.
   - Using the same thread pool at both levels is good for controlling concurrency.
   - For downloads it is definitely wrong to use the CPU pool, because the blocking IO would run directly on the CPU threads.
4. Limit how many tasks `CopyFiles` can run in parallel to slightly fewer than the IO thread pool's capacity (see the semaphore sketch at the end of this comment).
   - There will still be lots of wasted threads.
5. Disable background writes if there are more files than IO threads.
   - If there is one really huge file among lots of small ones, the large file will be uploaded synchronously, which will be bad for performance.
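To make the failure mode concrete, here is a minimal, self-contained C++ sketch of the pattern. None of this is Arrow code: `FixedPool`, `kIoThreads`, and the task bodies are made up for illustration, standing in for the IO thread pool, the `CopyFiles` fan-out, and the background writes respectively.

```cpp
// Toy reproduction (NOT Arrow code) of the deadlock described above: outer
// "copy" tasks fill a fixed-size pool, then each blocks waiting on an inner
// "upload" task submitted to the same pool, which can never be scheduled.
#include <condition_variable>
#include <functional>
#include <future>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class FixedPool {
 public:
  explicit FixedPool(size_t n) {
    for (size_t i = 0; i < n; ++i) {
      workers_.emplace_back([this] {
        while (true) {
          std::function<void()> task;
          {
            std::unique_lock<std::mutex> lock(mu_);
            cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
            if (stop_ && queue_.empty()) return;
            task = std::move(queue_.front());
            queue_.pop();
          }
          task();
        }
      });
    }
  }
  ~FixedPool() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    for (auto& w : workers_) w.join();
  }
  std::future<void> Submit(std::function<void()> fn) {
    // shared_ptr wrapper keeps the move-only packaged_task copyable for std::function.
    auto task = std::make_shared<std::packaged_task<void()>>(std::move(fn));
    std::future<void> fut = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.push([task] { (*task)(); });
    }
    cv_.notify_one();
    return fut;
  }

 private:
  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> queue_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool stop_ = false;
};

int main() {
  constexpr size_t kIoThreads = 4;
  FixedPool io_pool(kIoThreads);

  // Step 1: CopyFiles-style fan-out fills every IO thread with a copy task.
  std::vector<std::future<void>> copies;
  for (size_t i = 0; i < kIoThreads; ++i) {
    copies.push_back(io_pool.Submit([&io_pool, i] {
      // Step 2: each Write() buffers and submits the real upload to the SAME pool.
      std::future<void> upload =
          io_pool.Submit([i] { std::cout << "upload " << i << " done\n"; });
      // Step 3: the destructor's Flush() blocks on the upload -- but every pool
      // thread is stuck right here, so no thread is free to run it. Deadlock.
      upload.wait();
    }));
  }
  for (auto& c : copies) c.wait();  // never returns
  std::cout << "finished (never printed)\n";
}
```

With `kIoThreads` copy tasks all parked in `upload.wait()`, the queued upload tasks can never run and `main` hangs, which is the same shape as the `Flush()` deadlock above.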
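For idea 1, here is a rough sketch of the same toy model with the blocking removed. This fragment reuses `FixedPool`, `io_pool`, and `kIoThreads` from the sketch above and replaces the loop in `main`: the copy task hands the pending upload back to the caller as a future instead of blocking on it, so only the top-level user thread ever waits, which is what the first of @westonpace's rules permits.

```cpp
// Idea 1 in the toy model: expose pending uploads as futures instead of
// blocking inside the pool task (stand-in for Flush() returning a future).
std::mutex uploads_mu;
std::vector<std::future<void>> uploads;

std::vector<std::future<void>> copies;
for (size_t i = 0; i < kIoThreads; ++i) {
  copies.push_back(io_pool.Submit([&, i] {
    std::future<void> upload =
        io_pool.Submit([i] { std::cout << "upload " << i << " done\n"; });
    // Instead of Flush() blocking here, hand the pending work to the caller.
    std::lock_guard<std::mutex> lock(uploads_mu);
    uploads.push_back(std::move(upload));
  }));
}
for (auto& c : copies) c.wait();   // returns quickly: copy tasks never block
for (auto& u : uploads) u.wait();  // only the user thread blocks, which is allowed
```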
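And a rough sketch of idea 4, using a C++20 `std::counting_semaphore` to cap in-flight copies. This one is standalone; raw threads stand in for pool tasks just to show the throttling mechanism, and the `kIoThreads - 1` split is an arbitrary illustrative choice, not a tuned value.

```cpp
// Idea 4 sketch: throttle copy submissions so background uploads always have
// at least one free thread. Requires C++20 (<semaphore>, std::jthread).
#include <iostream>
#include <semaphore>
#include <thread>
#include <vector>

int main() {
  constexpr int kIoThreads = 4;
  // Keep at least one slot free so background uploads can always make progress.
  constexpr int kMaxConcurrentCopies = kIoThreads - 1;
  std::counting_semaphore<kMaxConcurrentCopies> slots(kMaxConcurrentCopies);

  std::vector<std::jthread> copies;
  for (int i = 0; i < 16; ++i) {
    slots.acquire();  // blocks the submitter once the cap is reached
    copies.emplace_back([&slots, i] {
      std::cout << "copy " << i << " running\n";  // the copy itself
      slots.release();  // free the slot once this copy (and its flush) is done
    });
  }
}  // ~jthread joins all copies
```

The caveat in the list still applies: any threads held back this way sit idle whenever there are no background writes to run.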
