SandeepSinghGahir commented on issue #10340:
URL: https://github.com/apache/iceberg/issues/10340#issuecomment-2550591724
Hi @amogh-jahagirdar,
This issue isn't resolved yet. After the Glue 5.0 release, I tested the job with Iceberg 1.7.0 and I'm still seeing the same error, just with different logging. The stack trace is below, followed by a sketch of the kind of job setup this was hit with. Any help in resolving this issue would be greatly appreciated.
```
ERROR 2024-12-07T02:04:22,219 814843 com.amazonaws.services.glueexceptionanalysis.GlueExceptionAnalysisListener [spark-listener-group-shared] [Glue Exception Analysis] {"Event":"GlueExceptionAnalysisStageFailed","Timestamp":1733537062218,"Failure Reason":"org.apache.spark.shuffle.FetchFailedException: Error in reading FileSegmentManagedBuffer[file=/tmp/blockmgr-e22f16fc-d99e-4692-aa4b-66a91/0c/shuffle_11_118332_0.data,offset=288812863,length=188651]","Stack Trace":[{"Declaring Class":"org.apache.spark.errors.SparkCoreErrors$","Method Name":"fetchFailedError","File Name":"SparkCoreErrors.scala","Line Number":437},{"Declaring Class":"org.apache.spark.storage.ShuffleBlockFetcherIterator","Method Name":"throwFetchFailedException","File Name":"ShuffleBlockFetcherIterator.scala","Line Number":1304},{"Declaring Class":"org.apache.spark.storage.ShuffleBlockFetcherIterator","Method Name":"next","File Name":"ShuffleBlockFetcherIterator.scala","Line Number":957},{"Declaring Class":"org.apache.spark.storage.Shuffl
ERROR 2024-12-07T02:04:25,531 818155 org.apache.spark.scheduler.TaskSchedulerImpl [dispatcher-CoarseGrainedScheduler] Lost executor 95 on 172.34.30.9: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.apache.spark.ExecutorDeadException: [INTERNAL_ERROR_NETWORK] The relative remote executor(Id: 95), which maintains the block data to fetch is dead.
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:145)
    at org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:173)
    at org.apache.spark.network.shuffle.RetryingBlockTransferor.lambda$initiateRetry$0(RetryingBlockTransferor.java:206)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.bas
)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.apache.spark.ExecutorDeadException: [INTERNAL_ERROR_NETWORK] The relative remote executor(Id: 95), which maintains the block data to fetch is dead.
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:145)
    at org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:173)
    at org.apache.spark.network.shuffle.RetryingBlockTransferor.lambda$initiateRetry$0(RetryingBlockTransferor.java:206)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.ba
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.apache.spark.ExecutorDeadException: [INTERNAL_ERROR_NETWORK] The relative remote executor(Id: 95), which maintains the block data to fetch is dead.
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:145)
    at org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:173)
    at org.apache.spark.network.shuffle.RetryingBlockTransferor.start(RetryingBlockTransferor.java:152)
    at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:155)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:403)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.send$1
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.apache.spark.ExecutorDeadException: [INTERNAL_ERROR_NETWORK] The relative remote executor(Id: 95), which maintains the block data to fetch is dead.
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:145)
    at org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:173)
    at org.apache.spark.network.shuffle.RetryingBlockTransferor.lambda$initiateRetry$0(RetryingBlockTransferor.java:206)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.bas
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.apache.spark.ExecutorDeadException: [INTERNAL_ERROR_NETWORK] The relative remote executor(Id: 95), which maintains the block data to fetch is dead.
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:145)
    at org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:173)
    at org.apache.spark.network.shuffle.RetryingBlockTransferor.lambda$initiateRetry$0(RetryingBlockTransferor.java:206)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.bas
)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.apache.spark.ExecutorDeadException: [INTERNAL_ERROR_NETWORK] The relative remote executor(Id: 95), which maintains the block data to fetch is dead.
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:145)
    at org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:173)
    at org.apache.spark.network.shuffle.RetryingBlockTransferor.lambda$initiateRetry$0(RetryingBlockTransferor.java:206)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.ba
ERROR 2024-12-07T02:04:28,291 820915 com.amazonaws.services.glueexceptionanalysis.GlueExceptionAnalysisListener [spark-listener-group-shared] [Glue Exception Analysis] {"Event":"GlueExceptionAnalysisTaskFailed","Timestamp":1733537068290,"Failure Reason":"Connection pool shut down","Stack Trace":[{"Declaring Class":"software.amazon.awssdk.thirdparty.org.apache.http.util.Asserts","Method Name":"check","File Name":"Asserts.java","Line Number":34},{"Declaring Class":"software.amazon.awssdk.thirdparty.org.apache.http.impl.conn.PoolingHttpClientConnectionManager","Method Name":"requestConnection","File Name":"PoolingHttpClientConnectionManager.java","Line Number":269},{"Declaring Class":"software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$DelegatingHttpClientConnectionManager","Method Name":"requestConnection","File Name":"ClientConnectionManagerFactory.java","Line Number":75},{"Declaring Class":"software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$Instrumented
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.apache.spark.ExecutorDeadException: [INTERNAL_ERROR_NETWORK] The relative remote executor(Id: 95), which maintains the block data to fetch is dead.
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:145)
    at org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:173)
    at org.apache.spark.network.shuffle.RetryingBlockTransferor.lambda$initiateRetry$0(RetryingBlockTransferor.java:206)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base
ERROR 2024-12-07T02:04:28,401 821025 org.apache.spark.scheduler.TaskSetManager [task-result-getter-3] Task 47 in stage 51.3 failed 4 times; aborting job
ERROR 2024-12-07T02:04:28,408 821032 com.amazonaws.services.glueexceptionanalysis.GlueExceptionAnalysisListener [spark-listener-group-shared] [Glue Exception Analysis] {"Event":"GlueExceptionAnalysisJobFailed","Timestamp":1733537068406,"Failure Reason":"JobFailed(org.apache.spark.SparkException: Job aborted due to stage failure: Task 47 in stage 51.3 failed 4 times, most recent failure: Lost task 47.3 in stage 51.3 (TID 172048) (172.36.175.193 executor 46): java.lang.IllegalStateException: Connection pool shut down","Stack Trace":[{"Declaring Class":"org.apache.spark.SparkException: Job aborted due to stage failure: Task 47 in stage 51.3 failed 4 times, most recent failure: Lost task 47.3 in stage 51.3 (TID 172048) (172.36.175.193 executor 46): java.lang.IllegalStateException: Connection pool shut down","Method Name":"TopLevelFailedReason","File Name":"TopLevelFailedReason","Line Number":-1},{"Declaring Class":"software.amazon.awssdk.thirdparty.org.apache.http.util.Asserts","Method Name":"check","File Name":"
```
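
For context, here is a minimal sketch of the kind of Glue 5.0 / Iceberg 1.7.0 job setup this was hit with. The catalog name, warehouse bucket, database, table, and key column below are placeholders, not the real job's settings:

```python
# Minimal sketch (assumed setup, not the actual job): an Iceberg 1.7.0 table in the
# Glue catalog, accessed through S3FileIO from a Glue 5.0 Spark job. All names here
# (catalog, warehouse bucket, database, tables, key column) are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-glue-connection-pool-repro")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.glue_catalog.warehouse",
            "s3://my-warehouse-bucket/warehouse")
    .getOrCreate()
)

# A shuffle-heavy read/transform/write of the general shape during which the
# FetchFailedException and "Connection pool shut down" errors in the trace
# above appear.
df = spark.table("glue_catalog.my_db.my_source_table")
(
    df.repartition(200, "some_key")  # placeholder partitioning column
      .writeTo("glue_catalog.my_db.my_target_table")
      .createOrReplace()
)
```

The failure isn't tied to this exact snippet; it's just the general shape of job where the error keeps showing up.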