-dev (this is appropriate for user@)

Probably https://issues.apache.org/jira/browse/SPARK-10141 or https://issues.apache.org/jira/browse/SPARK-11334, but those aren't resolved. Feel free to jump in.

On Mon, Aug 15, 2016 at 8:13 PM, Rachana Srivastava <rachana.srivast...@markmonitor.com> wrote:

> *Summary:*
>
> I am running Spark 1.5 on CDH5.5.1. Under extreme load intermittently I am getting this connection failure exception and later negative executor in the Spark UI.
>
> *Exception:*
>
> TRACE: org.apache.hadoop.hbase.ipc.AbstractRpcClient - Call: Multi, callTime: 76ms
> INFO : org.apache.spark.network.client.TransportClientFactory - Found inactive connection to xxxx/xxx.xxx.xxx.xxxx, creating a new one.
> ERROR: org.apache.spark.network.shuffle.RetryingBlockFetcher - Exception while beginning fetch of 1 outstanding blocks (after 1 retries)
> java.io.IOException: Failed to connect to xxxx/xxx.xxx.xxx.xxxx
>     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:193)
>     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
>     at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:88)
>     at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
>     at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
>     at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused: xxxx/xxx.xxx.xxx.xxxx
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
>     at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>     ... 1 more
>
> *Related Defects*:
>
> https://issues.apache.org/jira/browse/SPARK-2319
> https://issues.apache.org/jira/browse/SPARK-9591