westonpace commented on issue #36587: URL: https://github.com/apache/arrow/issues/36587#issuecomment-1632748371
I am a little confused, then. Whenever the S3 filesystem runs an operation, the AWS SDK attempts to determine credentials for that action. Typically it does this by looking in the user's config file (e.g. ~/.aws/config). If that file is not found, it attempts to contact a special IP address, available on EC2 machines, that tells the machine what its configuration is. That attempt can be very slow, depending on the machine's network configuration (sometimes it spends minutes waiting for a timeout).

Setting the environment variable `AWS_EC2_METADATA_DISABLED` to `true` disables that check, but this should only affect your connection if you are on an EC2 machine to begin with. So I do not understand how setting that variable to true can cause connection issues to S3.

Can you add these lines to the **top** of your script (they must come before you import any other pyarrow module)? They enable additional debugging information that might help us understand what is happening:

```
# Initialize S3 with trace-level logging before any other pyarrow import
import pyarrow._s3fs
pyarrow._s3fs.initialize_s3(pyarrow._s3fs.S3LogLevel.Trace)
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
