[ 
https://issues.apache.org/jira/browse/SPARK-36912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Spark101 updated SPARK-36912:
-----------------------------
    Attachment: environment.pdf

> Get Result time for task takes a very long time and times out
> -------------------------------------------------------------
>
>                 Key: SPARK-36912
>                 URL: https://issues.apache.org/jira/browse/SPARK-36912
>             Project: Spark
>          Issue Type: Question
>          Components: Block Manager
>    Affects Versions: 3.0.3
>            Reporter: Spark101
>            Priority: Major
>         Attachments: Stage-result.pdf, Storage-result.pdf, environment.pdf, 
> executors.pdf, thread-dump-exec3.pdf, threadDump-exc2.pdf
>
>
> We use Spark on Kubernetes to run batch jobs that analyze flows and produce 
> insights. The flows are read from a time-series database. We have 3 executor 
> instances with 5g memory each, plus the driver (5g memory). We observe the 
> following warning, followed by timeout errors, after which the job fails. We 
> have been stuck on this for some time and are really hoping to get some help 
> from this forum:
> 2021-10-02T16:07:09.459ZGMT  WARN dispatcher-CoarseGrainedScheduler TaskSetManager - Stage 52 contains a task of very large size (2842 KiB). The maximum recommended task size is 1000 KiB.
> 2021-10-02T16:08:19.151ZGMT ERROR task-result-getter-0 RetryingBlockFetcher - Exception while beginning fetch of 1 outstanding blocks 
> java.io.IOException: Failed to connect to /192.168.7.99:34259
>       at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
>       at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
>       at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:122)
>       at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
>       at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:121)
>       at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:143)
>       at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:103)
>       at org.apache.spark.storage.BlockManager.fetchRemoteManagedBuffer(BlockManager.scala:1010)
>       at org.apache.spark.storage.BlockManager.$anonfun$getRemoteBlock$8(BlockManager.scala:954)
>       at scala.Option.orElse(Option.scala:447)
>       at org.apache.spark.storage.BlockManager.getRemoteBlock(BlockManager.scala:954)
>       at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:1092)
>       at org.apache.spark.scheduler.TaskResultGetter$$anon$3.$anonfun$run$1(TaskResultGetter.scala:88)
>       at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>       at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1934)
>       at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:63)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: /192.168.7.99:34259
> Caused by: java.net.ConnectException: Connection timed out
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
>       at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
>       at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
>       at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
>       at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
>       at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
>       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
>       at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>       at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>       at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>       at java.lang.Thread.run(Thread.java:748)
> 2021-10-02T16:08:19.151ZGMT ERROR task-result-getter-2 RetryingBlockFetcher - Exception while beginning fetch of 1 outstanding blocks 
> java.io.IOException: Failed to connect to /192.168.6.167:42405
>       at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
>       at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
>       at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:122)
>       at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
>       at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:121)
>       at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:143)
>       at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:103)
>       at org.apache.spark.storage.BlockManager.fetchRemoteManagedBuffer(BlockManager.scala:1010)
>       at org.apache.spark.storage.BlockManager.$anonfun$getRemoteBlock$8(BlockManager.scala:954)
>       at scala.Option.orElse(Option.scala:447)
>       at org.apache.spark.storage.BlockManager.getRemoteBlock(BlockManager.scala:954)
>       at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:1092)
>       at org.apache.spark.scheduler.TaskResultGetter$$anon$3.$anonfun$run$1(TaskResultGetter.scala:88)
>       at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>       at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1934)
>       at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:63)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: /192.168.6.167:42
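
Not part of the original report, but a hedged sketch of settings that are commonly tried for these two symptoms (an oversized serialized task and result-fetch connection timeouts). The property names below (spark.network.timeout, spark.rpc.askTimeout, spark.driver.maxResultSize) are real Spark configuration keys; the values, the application class, and the jar name are illustrative assumptions, not settings verified against this cluster:

```shell
# Hedged sketch: give result fetches more headroom than the 120s default
# network timeout, and raise the cap on serialized results pulled back to
# the driver. Class name and jar are placeholders for the reporter's job.
spark-submit \
  --conf spark.network.timeout=300s \
  --conf spark.rpc.askTimeout=300s \
  --conf spark.driver.maxResultSize=2g \
  --class com.example.FlowAnalysis \
  flow-analysis.jar
```

The "task of very large size" warning itself usually indicates large objects captured in the task closure; moving such data into a broadcast variable, rather than only raising timeouts, is the typical remedy for that part of the problem.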



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
