westonpace commented on issue #36007:
URL: https://github.com/apache/arrow/issues/36007#issuecomment-1589766322

   Are you accessing a public file?  My guess is that you're running into the 
default AWS credentials checker.  By default, even if the file is public, the 
AWS SDK tries to work out which credentials you are using before making any 
request.
   
   Unfortunately, if you don't have a credentials file stored somewhere that it 
expects (e.g. `~/.aws/credentials`), it starts trying some unpleasant 
alternatives.  The main problematic one is a request to `169.254.169.254`, a 
special link-local IP address at which EC2 instances serve their instance 
metadata (e.g. in case your EC2 instance has credentials assigned to it).  
Depending on your system's networking configuration, it may take some time for 
this request to fail.
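To see how long that lookup takes to fail on a given machine, here is a minimal sketch using only the Python standard library. The helper name `probe_metadata_endpoint` and the 1-second timeout are assumptions chosen for illustration; this only opens a TCP connection and is not how the AWS SDK itself performs the check.

```python
import socket
import time


def probe_metadata_endpoint(host="169.254.169.254", port=80, timeout=1.0):
    """Try a TCP connection to the EC2 metadata address.

    Returns (reachable, seconds_elapsed). The function name and defaults
    are hypothetical; they are not part of any AWS or Arrow API.
    """
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        reachable = False
    return reachable, time.monotonic() - start


if __name__ == "__main__":
    reachable, elapsed = probe_metadata_endpoint()
    print(f"reachable={reachable} after {elapsed:.3f}s")
```

Outside EC2 the connection normally fails; whether it fails immediately or only after the full timeout depends on local routing, which is the delay described above.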
   
   So, for example, if I delete my credentials file, turn on AWS trace logging, 
and run your test, then the first option takes 13 seconds and my logs are 
filled with messages like this:
   
   ```
   [INFO] 2023-06-13 17:28:26.917 ProcessCredentialsProvider [140191537006528] Failed to find credential process's profile: default
   [TRACE] 2023-06-13 17:28:26.917 FileSystemUtils [140191537006528] Checking HOME for the home directory.
   [DEBUG] 2023-06-13 17:28:26.917 FileSystemUtils [140191537006528] Environment value for variable HOME is /home/pace
   [DEBUG] 2023-06-13 17:28:26.917 FileSystemUtils [140191537006528] Home directory is missing the final / appending one to normalize
   [DEBUG] 2023-06-13 17:28:26.917 FileSystemUtils [140191537006528] Final Home Directory is /home/pace/
   [DEBUG] 2023-06-13 17:28:26.917 SSOCredentialsProvider [140191537006528] Loading token from: /home/pace/.aws/sso/cache/da39a3ee5e6b4b0d3255bfef95601890afd80709.json
   [DEBUG] 2023-06-13 17:28:26.917 SSOCredentialsProvider [140191537006528] Preparing to load token from: /home/pace/.aws/sso/cache/da39a3ee5e6b4b0d3255bfef95601890afd80709.json
   [INFO] 2023-06-13 17:28:26.917 SSOCredentialsProvider [140191537006528] Unable to open token file on path: /home/pace/.aws/sso/cache/da39a3ee5e6b4b0d3255bfef95601890afd80709.json
   [TRACE] 2023-06-13 17:28:26.917 SSOCredentialsProvider [140191537006528] Access token for SSO not available
   [DEBUG] 2023-06-13 17:28:26.917 InstanceProfileCredentialsProvider [140191537006528] Checking if latest credential pull has expired.
   [INFO] 2023-06-13 17:28:26.917 InstanceProfileCredentialsProvider [140191537006528] Credentials have expired attempting to re-pull from EC2 Metadata Service.
   [TRACE] 2023-06-13 17:28:26.917 EC2MetadataClient [140191537006528] Getting default credentials for ec2 instance from http://169.254.169.254
   [TRACE] 2023-06-13 17:28:26.917 EC2MetadataClient [140191537006528] Retrieving credentials from http://169.254.169.254/latest/meta-data/iam/security-credentials
   [TRACE] 2023-06-13 17:28:26.917 CurlHttpClient [140191537006528] Making request to http://169.254.169.254/latest/meta-data/iam/security-credentials
   [TRACE] 2023-06-13 17:28:26.917 CurlHttpClient [140191537006528] Including headers:
   [TRACE] 2023-06-13 17:28:26.917 CurlHttpClient [140191537006528] host: 169.254.169.254
   [TRACE] 2023-06-13 17:28:26.917 CurlHttpClient [140191537006528] user-agent: aws-sdk-cpp/1.10.13 Linux/5.19.0-43-generic x86_64 GCC/10.4.0
   [DEBUG] 2023-06-13 17:28:26.917 CurlHandleContainer [140191537006528] Attempting to acquire curl connection.
   [INFO] 2023-06-13 17:28:26.917 CurlHandleContainer [140191537006528] Connection has been released. Continuing.
   [DEBUG] 2023-06-13 17:28:26.917 CurlHandleContainer [140191537006528] Returning connection handle 0x563be150dc20
   [DEBUG] 2023-06-13 17:28:26.917 CurlHttpClient [140191537006528] Obtained connection handle 0x563be150dc20
   [DEBUG] 2023-06-13 17:28:26.918 CURL [140191537006528] (Text)   Trying 169.254.169.254:80...
   [DEBUG] 2023-06-13 17:28:27.919 CURL [140191537006528] (Text) After 1000ms connect time, move on!
   [DEBUG] 2023-06-13 17:28:27.919 CURL [140191537006528] (Text) connect to 169.254.169.254 port 80 failed: Connection timed out
   [DEBUG] 2023-06-13 17:28:27.919 CURL [140191537006528] (Text) Connection timeout after 1001 ms
   [DEBUG] 2023-06-13 17:28:27.919 CURL [140191537006528] (Text) Closing connection 0
   [ERROR] 2023-06-13 17:28:27.919 CurlHttpClient [140191537006528] Curl returned error code 28 - Timeout was reached
   ```
   
   There is a way to specify anonymous credentials when creating an S3 
filesystem (which should prevent the S3 filesystem from trying to resolve 
credentials again on each request), but I don't know enough about the R 
library to know how to plumb that through to something like `read_parquet`.
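For comparison, here is a rough sketch of the anonymous-credentials approach in Python rather than R, using pyarrow's `S3FileSystem(anonymous=True)`. The wrapper name `read_public_parquet`, the default region, and the example path are assumptions for illustration; the equivalent R plumbing is not shown and may differ.

```python
def read_public_parquet(path, region="us-east-1"):
    """Read a Parquet file from a public bucket with anonymous credentials.

    `path` is "bucket/key" without an "s3://" prefix. The helper name and
    the default region are hypothetical, not part of the Arrow API.
    """
    # Imported lazily so the module loads even where pyarrow is absent.
    import pyarrow.parquet as pq
    from pyarrow import fs

    # anonymous=True asks Arrow not to resolve credentials for this filesystem.
    s3 = fs.S3FileSystem(anonymous=True, region=region)
    return pq.read_table(path, filesystem=s3)


if __name__ == "__main__":
    # Hypothetical bucket and key; substitute a real public object.
    table = read_public_parquet("some-public-bucket/path/to/file.parquet")
    print(table.num_rows)
```

Passing the region explicitly is a design choice made here to keep the sketch self-contained; the bucket's actual region has to be supplied by the caller.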


