Ah, hell. I am using Spark 1.2.0, and my job was submitted to use 8 cores... the magic number in the bug.
--
Xi Shen
about.me/davidshen <http://about.me/davidshen>

On Thu, Mar 26, 2015 at 5:48 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> What's your Spark version? Not quite sure, but you could be hitting this
> issue:
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-4516
>
> On 26 Mar 2015 11:01, "Xi Shen" <davidshe...@gmail.com> wrote:
>
>> Hi,
>>
>> My environment is Windows 64-bit, Spark + YARN. I had a job that takes a
>> long time. It started well, but it ended with the exception below:
>>
>> 15/03/25 12:39:09 WARN server.TransportChannelHandler: Exception in
>> connection from
>> headnode0.xshe3539-hadoop-sydney.q10.internal.cloudapp.net/100.72.68.34:58507
>> java.io.IOException: An existing connection was forcibly closed by the
>> remote host
>>     at sun.nio.ch.SocketDispatcher.read0(Native Method)
>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>     at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>>     at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
>>     at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
>>     at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:225)
>>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
>>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>>     at java.lang.Thread.run(Thread.java:745)
>> 15/03/25 12:39:09 ERROR executor.CoarseGrainedExecutorBackend: Driver
>> Disassociated [akka.tcp://
>> sparkexecu...@workernode0.xshe3539-hadoop-sydney.q10.internal.cloudapp.net:65469]
>> -> [akka.tcp://
>> sparkdri...@headnode0.xshe3539-hadoop-sydney.q10.internal.cloudapp.net:58467]
>> disassociated! Shutting down.
>> 15/03/25 12:39:09 WARN remote.ReliableDeliverySupervisor: Association
>> with remote system [akka.tcp://
>> sparkdri...@headnode0.xshe3539-hadoop-sydney.q10.internal.cloudapp.net:58467]
>> has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>>
>> Interestingly, the job is shown as Succeeded in the RM. I checked the
>> application log; it is miles long, and this is the only exception I found.
>> It is not very useful for pinpointing the problem.
>>
>> Any idea what the cause could be?
>>
>> Thanks,
>> Xi Shen
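[Editor's note for readers hitting the same symptom: this is a hedged sketch of a mitigation that was sometimes suggested for Spark 1.2.x, assuming the Netty-based block transfer service (which became the default in 1.2.0) is indeed involved, as the stack trace and the linked JIRA suggest. The `spark.shuffle.blockTransferService` property existed in Spark 1.2 but was removed in later releases, and `your-job.jar` plus the master/cores values are placeholders, so verify everything against the configuration docs for your exact version before relying on it.]

```shell
# Hypothetical workaround sketch for Spark 1.2.x on YARN (verify against
# your version's docs): fall back from the netty block transfer service
# to the older nio implementation at submit time.
spark-submit \
  --master yarn-cluster \
  --executor-cores 8 \
  --conf spark.shuffle.blockTransferService=nio \
  your-job.jar
```

The same property could equally go into conf/spark-defaults.conf; upgrading to a release containing the SPARK-4516 fix is of course the cleaner long-term option.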