[ https://issues.apache.org/jira/browse/SPARK-28726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907131#comment-16907131 ]
angerszhu commented on SPARK-28726:
-----------------------------------

[~ajithshetty] This also happens with higher timeouts.

> Spark with DynamicAllocation always got connect rest by peers
> -------------------------------------------------------------
>
>                 Key: SPARK-28726
>                 URL: https://issues.apache.org/jira/browse/SPARK-28726
>             Project: Spark
>          Issue Type: Wish
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: angerszhu
>            Priority: Major
>
> When using Spark with dynamic allocation, we set the idle timeout to 5s.
> We always get a Netty 'Connection reset by peer' exception.
>
> I suspect that the 5s idle timeout is too small: by the time the
> BlockManager makes a Netty I/O call, the executor has already been removed
> because of the timeout, but the driver's BlockManager has not been notified
> in time.
> {code:java}
> 19/08/14 00:00:46 WARN org.apache.spark.network.server.TransportChannelHandler: "Exception in connection from /host:port"
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>     at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>     at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>     at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>     at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMasterEndpoint: "Error trying to remove broadcast 67 from block manager BlockManagerId(967, host, port, None)"
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>     at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>     at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>     at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>     at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 INFO org.apache.spark.ContextCleaner: "Cleaned accumulator 162174"
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMaster: "Failed to remove shuffle 22 - Connection reset by peer"
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39){code}

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
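
For reference, the setup described in the issue (dynamic allocation with a 5s idle timeout) could look roughly like the sketch below. The issue does not name the exact properties used, so this assumes the "idle time" refers to spark.dynamicAllocation.executorIdleTimeout and that the external shuffle service is enabled; to_submit_args is a hypothetical helper that only renders the settings as spark-submit arguments.

```python
# Assumed settings; the issue only states "idle time 5s" with dynamic allocation.
DYNAMIC_ALLOCATION_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    # Assumption: the external shuffle service is on, as is usual with dynamic allocation in 2.4.
    "spark.shuffle.service.enabled": "true",
    # Assumption: this is the "idle time" from the report (the Spark default is 60s).
    "spark.dynamicAllocation.executorIdleTimeout": "5s",
}


def to_submit_args(conf):
    """Hypothetical helper: render settings as spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(["spark-submit"] + to_submit_args(DYNAMIC_ALLOCATION_CONF)))
```

Raising the value of spark.dynamicAllocation.executorIdleTimeout in this map is the change the comment above reports as not making the exceptions go away.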