OliLay commented on issue #45749:
URL: https://github.com/apache/arrow/issues/45749#issuecomment-2717994328

   > 
   > The fact that it seems to take a lot of CPU time makes this explanation 
unlikely (some lock contention would not result in such CPU usage).
   
   Yes, further investigation also revealed that my initial assumption was not 
correct.
   In fact, after debugging a bit further, it seems that the actual call to 
`HeadObject` (in `ObjectInputFile::Init`) into the AWS SDK is what takes so long.
   Further debugging showed that setting 
[maxConnections](https://github.com/apache/arrow/blob/9b36c709a52caabd3579d006bb92379e1b263e52/cpp/src/arrow/filesystem/s3fs.cc#L1178)
 triggers that behavior. For some reason, the AWS SDK code 
[interacting with curl](https://github.com/aws/aws-sdk-cpp/blob/main/src/aws-cpp-sdk-core/source/http/curl/CurlHandleContainer.cpp)
 seems to be problematic in that case. As Azure also uses curl as its underlying 
library, this may be the common denominator. 
   Using the [AWS SDK CRT 
HTTP](https://github.com/aws/aws-sdk-cpp/blob/109022594cb4bcd589ee0e8a12a0d11e42b9f558/docs/CMake_Parameters.md#use_crt_http_client)
 client also resolves the issue. I am not sure whether this is a deeper problem 
inside curl or a problem with how curl is used.
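
For anyone who wants to reproduce the "CPU time vs. waiting" distinction from the quoted remark, here is a minimal sketch of a timing harness. It is not taken from Arrow or the AWS SDK. `MeasureCall`, `BlockingCall` and `BusyCall` are hypothetical names made up for illustration, and the two stand-in calls only simulate a blocked call and a CPU-bound call. The idea is that a call stuck on lock contention should show high wall time but low CPU time, whereas a call that spins should show both being high.

```cpp
#include <chrono>
#include <ctime>
#include <functional>
#include <thread>

// Wall-clock and CPU time consumed while running one callable.
struct Timing {
  double wall_seconds;
  double cpu_seconds;
};

// Runs fn once and reports elapsed wall time and process CPU time.
// Assumption: std::clock() reports CPU time of the whole process, which
// holds on POSIX systems; on Windows it reports wall time instead.
Timing MeasureCall(const std::function<void()>& fn) {
  const auto wall_start = std::chrono::steady_clock::now();
  const std::clock_t cpu_start = std::clock();
  fn();
  const std::clock_t cpu_end = std::clock();
  const auto wall_end = std::chrono::steady_clock::now();

  Timing t;
  t.wall_seconds =
      std::chrono::duration<double>(wall_end - wall_start).count();
  t.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
  return t;
}

// Hypothetical stand-in for a call that blocks (e.g. waiting on a lock).
void BlockingCall() {
  std::this_thread::sleep_for(std::chrono::milliseconds(200));
}

// Hypothetical stand-in for a call that burns CPU for the same duration.
void BusyCall() {
  const auto deadline =
      std::chrono::steady_clock::now() + std::chrono::milliseconds(200);
  while (std::chrono::steady_clock::now() < deadline) {
    // Spin until the deadline passes.
  }
}
```

Wrapping the slow call (here, the file open that ends up in `HeadObject`) in `MeasureCall` and comparing `cpu_seconds` to `wall_seconds` gives a rough hint which of the two cases applies. Since `std::clock()` counts all threads of the process, other busy threads will inflate the CPU figure.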
   


