Ingo Müller created ASTERIXDB-2944:
--------------------------------------

             Summary: "SdkClientException: Timeout waiting for connection from 
pool" when using Parquet on S3 at large scale
                 Key: ASTERIXDB-2944
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2944
             Project: Apache AsterixDB
          Issue Type: Bug
          Components: EXT - External data
    Affects Versions: 0.9.8
            Reporter: Ingo Müller


I am running complex queries against Parquet files on S3 (about 17GB) on a 
large machine ({{m5d.24xlarge}} on EC2, which has 96 vCPUs) and get errors 
like the following:

{{java.io.InterruptedIOException: getFileStatus on 
s3a://bucket/folder/file.parquet: com.amazonaws.SdkClientException: Unable to 
execute HTTP request: Timeout waiting for connection from pool}}

{{java.io.InterruptedIOException: Reopen at position 15899845068 on 
s3a://bucket/folder/file.parquet: com.amazonaws.SdkClientException: Unable to 
execute HTTP request: Timeout waiting for connection from pool}}

This seems to originate from the AWS SDK, where this error [may apparently 
occur|https://github.com/aws/aws-sdk-java/issues/269] if (1) the S3Object is 
not closed properly, or (2) too many requests are being made to the bucket. The 
last time I tried, I found the request limit to S3 to be on the order of 6k/s; 
is it possible that my workload reaches that limit?
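
To make cause (1) concrete, here is a minimal, self-contained sketch of how a bounded connection pool runs dry when handles are not returned. It is only a model and not the AWS SDK or AsterixDB code: the {{Pool}} class, the {{run}} helper, the pool size of 15 and the 5 ms wait are all hypothetical, and 96 requests merely mirrors the vCPU count mentioned above.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolExhaustionSketch {
    // Hypothetical stand-in for an HTTP connection pool with a fixed number of slots.
    static final class Pool {
        private final Semaphore slots;

        Pool(int size) {
            slots = new Semaphore(size);
        }

        // Waits up to timeoutMs for a free slot; false models
        // "Timeout waiting for connection from pool".
        boolean acquire(long timeoutMs) {
            try {
                return slots.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }

        void release() {
            slots.release();
        }
    }

    // Issues `requests` sequential reads against a pool of `poolSize` slots and
    // returns how many of them timed out while waiting for a slot.
    static int run(int poolSize, int requests, boolean closeAfterUse) {
        Pool pool = new Pool(poolSize);
        int timeouts = 0;
        for (int i = 0; i < requests; i++) {
            if (!pool.acquire(5)) {
                timeouts++;
                continue;
            }
            // A reader that never closes its object never gives the slot back.
            if (closeAfterUse) {
                pool.release();
            }
        }
        return timeouts;
    }

    public static void main(String[] args) {
        System.out.println("leaked handles: " + run(15, 96, false) + " timeouts");
        System.out.println("closed handles: " + run(15, 96, true) + " timeouts");
    }
}
```

In this model every request after the first 15 times out when handles leak, and none do when each handle is released. Cause (2) would look different: slots are returned, but more requests are in flight at once than the pool has slots.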

Let me know what kind of information you need to get to the bottom of the 
problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
