Can you try with the current master? I just merged a fix for a file leak
that could cause this.

This also looks like it is happening in a task, not on the driver. Do your
task logs show warnings about files being closed in a finalizer? We added a
finalizer that logs when open files are leaked so we can clean them up.
That's how we caught the leaks in the scan path.
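
For reference, the pattern is roughly the following (a simplified sketch,
not the actual Iceberg code; the class name and log message here are
illustrative):

    // Illustrative only: wraps a stream and logs from finalize() when the
    // stream is garbage-collected without close() ever being called.
    class LeakLoggingInputStream extends java.io.FilterInputStream {
      private final String location;
      private boolean closed = false;

      LeakLoggingInputStream(java.io.InputStream in, String location) {
        super(in);
        this.location = location;
      }

      @Override
      public void close() throws java.io.IOException {
        closed = true;
        super.close();
      }

      @Override
      protected void finalize() throws Throwable {
        if (!closed) {
          // this is the kind of message to grep the task logs for
          System.err.println("Unclosed input stream created for: " + location);
          close(); // reclaim the leaked file handle or connection
        }
        super.finalize();
      }
    }

If messages like that show up in your executor logs, something is opening
files without closing them.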

On Fri, Apr 5, 2019 at 4:28 PM Sandeep Sagar <sandeep.sa...@meltwater.com>
wrote:

> Hi all,
>
> I need some help understanding why I am running into
>
>      com.amazonaws.SdkClientException: Unable to execute HTTP request:
> Timeout waiting for connection from pool
>
>
>
> I have written a simple program that just saves a dataset and reads it
> back. This works when writing to local disk.
>
> When I *changed it to use S3*, I hit this issue.
>
> The save to S3 is OK; it is the read back that fails.
>
> Is it possible that the S3ObjectInputStream is not being closed somewhere,
> leading to all connections in the pool being exhausted?
>
>
>
> I am using the latest build of Iceberg.
>
> Regards
>
> Sandeep
>
> Stack trace:
>
> 2019-04-05 15:31:55,224 INFO main org.apache.iceberg.TableScan - Scanning table s3a://tahoe-dev-today/ snapshot 7179336048327305337 created at 2019-04-05 15:30:46.616 with filter true
>
> 2019-04-05 15:37:53,816 ERROR Executor task launch worker for task 8 org.apache.spark.executor.Executor - Exception in task 0.0 in stage 1.0 (TID 8)
> org.apache.iceberg.exceptions.RuntimeIOException: Failed to get status for file: s3a://tahoe-dev-today/data/company_type=test-2/00000-0-b1255a21-99f3-4005-9b29-999bf1862e34.parquet
>        at org.apache.iceberg.hadoop.HadoopInputFile.lazyStat(HadoopInputFile.java:108)
>        at org.apache.iceberg.hadoop.HadoopInputFile.getStat(HadoopInputFile.java:136)
>        at org.apache.iceberg.parquet.ParquetIO.file(ParquetIO.java:57)
>        at org.apache.iceberg.parquet.ParquetReader$ReadConf.newReader(ParquetReader.java:163)
>        at org.apache.iceberg.parquet.ParquetReader$ReadConf.<init>(ParquetReader.java:81)
>        at org.apache.iceberg.parquet.ParquetReader.init(ParquetReader.java:174)
>        at org.apache.iceberg.parquet.ParquetReader.iterator(ParquetReader.java:185)
>        at org.apache.iceberg.spark.source.Reader$TaskDataReader.open(Reader.java:442)
>        at org.apache.iceberg.spark.source.Reader$TaskDataReader.open(Reader.java:382)
>        at org.apache.iceberg.spark.source.Reader$TaskDataReader.<init>(Reader.java:317)
>        at org.apache.iceberg.spark.source.Reader$ReadTask.createPartitionReader(Reader.java:266)
>        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD.compute(DataSourceRDD.scala:41)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>        at org.apache.spark.scheduler.Task.run(Task.scala:121)
>        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:405)
>        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>        at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.InterruptedIOException: getFileStatus on s3a://tahoe-dev-today/data/company_type=test-2/00000-0-b1255a21-99f3-4005-9b29-999bf1862e34.parquet: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
>        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:141)
>        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:117)
>        at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1844)
>        at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1808)
>        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1748)
>        at org.apache.iceberg.hadoop.HadoopInputFile.lazyStat(HadoopInputFile.java:106)
>        ... 30 more
> Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
>        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1113)
>        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1063)
>        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
>        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
>        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
>        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
>        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
>        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
>        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4247)
>        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4194)
>        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1253)
>        at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1038)
>        at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1826)
>        ... 33 more
> Caused by: com.amazonaws.thirdparty.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
>        at com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:286)
>        at com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:263)
>        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:498)
>        at com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke(ClientConnectionRequestFactory.java:70)
>        at com.amazonaws.http.conn.$Proxy8.get(Unknown Source)
>        at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190)
>        at com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
>        at com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
>        at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>        at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
>        at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
>        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1235)
>        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1055)
>        ... 44 more
>
> 2019-04-05 15:37:53,842 ERROR task-result-getter-3 org.apache.spark.scheduler.TaskSetManager - Task 0 in stage 1.0 failed 1 times; aborting job
>
> 2019-04-05 15:37:53,870 INFO pool-2-thread-1 org.spark_project.jetty.server.AbstractConnector - Stopped Spark@25e8e59{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
>
>
>
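
On the S3ObjectInputStream question: yes, that is a plausible cause. Each
open S3ObjectInputStream holds an HTTP connection from the SDK's pool until
it is closed (or aborted), so leaked streams will eventually starve the
pool and produce exactly this timeout. A minimal sketch of the safe read
pattern with the AWS SDK (the bucket and key here are placeholders):

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.S3Object;
    import com.amazonaws.services.s3.model.S3ObjectInputStream;

    public class S3ReadExample {
      public static void main(String[] args) throws Exception {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // try-with-resources guarantees close(), which releases the
        // underlying HTTP connection back to the pool
        try (S3Object object = s3.getObject("my-bucket", "path/to/file.parquet");
             S3ObjectInputStream in = object.getObjectContent()) {
          byte[] buffer = new byte[8192];
          while (in.read(buffer) != -1) {
            // consume the stream; an abandoned stream keeps its pooled
            // connection checked out until it is finalized
          }
        }
      }
    }

Raising fs.s3a.connection.maximum can hide the symptom for a while, but if
streams are leaking, a bigger pool only delays the timeout.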


-- 
Ryan Blue
Software Engineer
Netflix
