OliLay commented on issue #45749: URL: https://github.com/apache/arrow/issues/45749#issuecomment-2717994328
> > The fact that it seems to take a lot of CPU time makes this explanation unlikely (some lock contention would not result in such CPU usage).

Yes, further investigation also showed that my initial assumption was wrong. Debugging a bit further, it seems that the actual call into the AWS SDK for `HeadObject` (in `ObjectInputFile::Init`) is what takes so long. Digging deeper, I found that setting [maxConnections](https://github.com/apache/arrow/blob/9b36c709a52caabd3579d006bb92379e1b263e52/cpp/src/arrow/filesystem/s3fs.cc#L1178) triggers this behavior. For some reason, the AWS SDK code [interacting with curl](https://github.com/aws/aws-sdk-cpp/blob/main/src/aws-cpp-sdk-core/source/http/curl/CurlHandleContainer.cpp) seems to be problematic in that case. Since Azure also uses curl as its underlying HTTP library, curl may be the common denominator. Switching to the [AWS SDK CRT HTTP](https://github.com/aws/aws-sdk-cpp/blob/109022594cb4bcd589ee0e8a12a0d11e42b9f558/docs/CMake_Parameters.md#use_crt_http_client) client also resolves the issue. I'm not sure whether this is a deeper problem inside curl or a problem with how curl is used.