Hi, community
      I am facing a strange problem:
      all executors does not respond, and then all of them failed with the
ExecutorLostFailure.
      when I look into yarn logs, there are full of such exception

15/09/14 04:35:33 ERROR shuffle.RetryingBlockFetcher: Exception while
beginning fetch of 1 outstanding blocks (after 3 retries)
java.io.IOException: Failed to connect to host/ip:port
        at
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:193)
        at
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
        at
org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:88)
        at
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
        at
org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
        at
org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
        at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: host/ip:port
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
        at
io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
        at
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
        at
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
        at
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        ... 1 more


      The strange thing is that, if I reduce the input size, the problems
just disappeared. I have found a similar issue in the mail-archive(
http://mail-archives.us.apache.org/mod_mbox/spark-user/201502.mbox/%3CCAOHP_tHRtuxDfWF0qmYDauPDhZ1=MAm5thdTfgAhXDN=7kq...@mail.gmail.com%3E),
however I didn't see the solution. So I am wondering if anyone could help
with that?

      My env is:
      hdp 2.2.6
      spark(1.4.1)
      mode: yarn-client
      spark-conf:
      spark.driver.extraJavaOptions -Dhdp.version=2.2.6.0-2800
      spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.6.0-2800
      spark.executor.memory 6g
      spark.storage.memoryFraction 0.3
      spark.dynamicAllocation.enabled true
      spark.shuffle.service.enabled true

Reply via email to