Re: Connection closed error while running Terasort
Can you look a bit deeper into the executor logs? It may be that the executors hit a GC overhead limit (or something similar), which then led to the connection failures.
Thanks
Best Regards

On Tue, Sep 1, 2015 at 5:43 AM, Suman Somasundar
<suman.somasun...@oracle.com> wrote:
> Hi,
>
> I am getting the following error while trying to run a 10GB terasort
> under Yarn with 8 nodes.
> [...]
Connection closed error while running Terasort
Hi,

I am getting the following error while trying to run a 10GB terasort under Yarn with 8 nodes.

The command is:

spark-submit --class com.github.ehiggs.spark.terasort.TeraSort \
  --master yarn-cluster --num-executors 10 --executor-memory 32g \
  spark-terasort-master/target/spark-terasort-1.0-SNAPSHOT-jar-with-dependencies.jar \
  hdfs://hadoop-solaris-a:8020/user/hadoop/terasort/input-10 \
  hdfs://hadoop-solaris-a:8020/user/hadoop/terasort/output-10

What might be causing this error?

15/08/31 17:09:48 ERROR server.TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=1867783019052, chunkIndex=0}, buffer=FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/hadoop/appcache/application_1441064487503_0001/blockmgr-c3c8dbb3-9ae2-4e45-b537-fd0beeff98b5/3e/shuffle_1_9_0.data, offset=0, length=1059423784}} to /199.199.35.5:52486; closing connection
java.io.IOException: Broken pipe
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
        at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:443)
        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:575)
        at org.apache.spark.network.buffer.LazyFileRegion.transferTo(LazyFileRegion.java:96)
        at org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:89)
        at io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:237)
        at io.netty.channel.nio.AbstractNioByteChannel.doWrite(AbstractNioByteChannel.java:233)
        at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:264)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:707)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.forceFlush(AbstractNioChannel.java:321)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:519)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        at java.lang.Thread.run(Thread.java:745)
15/08/31 17:10:48 ERROR server.TransportChannelHandler: Connection to hadoop-solaris-c/199.199.35.4:48540 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.
15/08/31 17:10:48 ERROR client.TransportResponseHandler: Still have 1 requests outstanding when connection from hadoop-solaris-c/199.199.35.4:48540 is closed
15/08/31 17:10:48 INFO shuffle.RetryingBlockFetcher: Retrying fetch (3/3) for 1 outstanding blocks after 5000 ms
15/08/31 17:10:49 ERROR server.TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=1867783019053, chunkIndex=0}, buffer=FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/hadoop/appcache/application_1441064487503_0001/blockmgr-c3c8dbb3-9ae2-4e45-b537-fd0beeff98b5/1b/shuffle_1_6_0.data, offset=0, length=1052128440}} to /199.199.35.6:45201; closing connection
java.nio.channels.ClosedChannelException
15/08/31 17:10:53 INFO client.TransportClientFactory: Found inactive connection to hadoop-solaris-c/199.199.35.4:48540, creating a new one.
15/08/31 17:11:31 ERROR server.TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=1867783019054, chunkIndex=0}, buffer=FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/hadoop/appcache/application_1441064487503_0001/blockmgr-c3c8dbb3-9ae2-4e45-b537-fd0beeff98b5/1b/shuffle_1_6_0.data, offset=0, length=1052128440}} to /199.199.35.10:55082; closing connection
java.nio.channels.ClosedChannelException
15/08/31 17:11:31 ERROR server.TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=1867783019055, chunkIndex=0}, buffer=FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/hadoop/appcache/application_1441064487503_0001/blockmgr-c3c8dbb3-9ae2-4e45-b537-fd0beeff98b5/3e/shuffle_1_9_0.data, offset=0, length=1059423784}} to /199.199.35.7:54328; closing connection
java.nio.channels.ClosedChannelException
15/08/31 17:11:53 ERROR server.TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=1867783019056, chunkIndex=0}, buffer=FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/hadoop/appcache/application_1441064487503_0001/blockmgr-c3c8dbb3-9ae2-4e45-b537-fd0beeff98b5/3e/shuffle_1_9_0.data, offset=0, length=1059423784}} to /199.199.35.5:50573; closing connection
java.nio.channels.ClosedChannelException
15/08/31 17:12:54 ERROR