[ https://issues.apache.org/jira/browse/SPARK-28726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908711#comment-16908711 ]
angerszhu edited comment on SPARK-28726 at 8/16/19 5:04 AM:
------------------------------------------------------------

[~hyukjin.kwon] We run the Spark Thrift Server (SQL) with dynamic allocation enabled and the executor idle timeout set to 5s.

spark.dynamicAllocation.enabled true
spark.dynamicAllocation.executorIdleTimeout 5s
spark.dynamicAllocation.initialExecutors 1
spark.dynamicAllocation.maxExecutors 40
spark.dynamicAllocation.schedulerBacklogTimeout 2s
spark.reducer.maxSizeInFlight 24m
spark.shuffle.consolidateFiles true
spark.core.connection.ack.wait.timeout 300s
spark.kryoserializer.buffer.max 1024m
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.port.maxRetries 100

was (Author: angerszhuuu):
[~hyukjin.kwon] We run the Spark Thrift Server (SQL) with dynamic allocation enabled and the executor idle timeout set to 5s.

spark.dynamicAllocation.enabled true
spark.dynamicAllocation.executorIdleTimeout 5s
spark.dynamicAllocation.initialExecutors 1
spark.dynamicAllocation.maxExecutors 40
spark.dynamicAllocation.schedulerBacklogTimeout 2s

> Spark with DynamicAllocation always got connect rest by peers
> -------------------------------------------------------------
>
>                 Key: SPARK-28726
>                 URL: https://issues.apache.org/jira/browse/SPARK-28726
>             Project: Spark
>          Issue Type: Wish
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: angerszhu
>            Priority: Major
>
> When using Spark with dynamic allocation, we set the executor idle timeout to 5s.
> We always get the Netty exception 'Connection reset by peer'.
>
> I suspect that the 5s idle timeout is too small: by the time the BlockManager
> makes a Netty I/O call, the executor has already been removed because of the
> timeout.
> But the driver's BlockManager is not notified in time.
> {code:java}
> 19/08/14 00:00:46 WARN org.apache.spark.network.server.TransportChannelHandler: "Exception in connection from /host:port"
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>         at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>         at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>         at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMasterEndpoint: "Error trying to remove broadcast 67 from block manager BlockManagerId(967, host, port, None)"
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>         at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>         at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>         at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 INFO org.apache.spark.ContextCleaner: "Cleaned accumulator 162174"
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMaster: "Failed to remove shuffle 22 - Connection reset by peer"
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
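
The reporter's suspicion is that a 5s executor idle timeout is too aggressive. As a rough illustration only, the settings quoted in the comment can be checked with a small script that parses a spark-defaults.conf style listing and flags a short `spark.dynamicAllocation.executorIdleTimeout`. This is a sketch under stated assumptions, not a fix for the issue: the helper names (`parse_conf`, `to_seconds`, `idle_timeout_warning`) are hypothetical, the 60s threshold is chosen because it is Spark's documented default for that setting, and a bare number is assumed to mean seconds.

```python
import re

# Settings copied from the comment above, in spark-defaults.conf style.
CONF_TEXT = """
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.executorIdleTimeout 5s
spark.dynamicAllocation.initialExecutors 1
spark.dynamicAllocation.maxExecutors 40
spark.dynamicAllocation.schedulerBacklogTimeout 2s
"""

# Time suffixes handled by this sketch (a subset of what Spark accepts).
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "min": 60, "h": 3600, "d": 86400}


def parse_conf(text):
    """Parse 'key value' lines into a dict, skipping blanks and # comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        conf[key] = value.strip()
    return conf


def to_seconds(value):
    """Convert a time string such as '5s' or '2min' to seconds.

    Assumption: a bare number is interpreted as seconds.
    """
    match = re.fullmatch(r"(\d+)\s*(ms|s|m|min|h|d)?", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_SECONDS[unit or "s"]


def idle_timeout_warning(conf, minimum_seconds=60):
    """Return a warning string if the idle timeout is below the threshold.

    Returns None when dynamic allocation is off or the timeout is large enough.
    """
    if conf.get("spark.dynamicAllocation.enabled", "false").lower() != "true":
        return None
    idle = to_seconds(conf.get("spark.dynamicAllocation.executorIdleTimeout", "60s"))
    if idle < minimum_seconds:
        return (f"executorIdleTimeout is {idle:g}s, "
                f"below the {minimum_seconds}s threshold")
    return None


print(idle_timeout_warning(parse_conf(CONF_TEXT)))
```

Whether raising the timeout actually removes the 'Connection reset by peer' warnings is exactly what the issue leaves open; the script only makes the suspected setting easy to spot.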