[ 
https://issues.apache.org/jira/browse/SPARK-28726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-28726:
------------------------------
    Description: 
When use Spark with dynamic allocation, we set idle time to 5s

We always got exception about neety 'Connect reset by peers'

 

I suspect that it's because we set idle time 5s is too small, it will cause 
when Blockmanager call netty io, the executor has been remove because of 
timeout.

But not timely notify driver's BlocakManager
{code:java}

19/08/14 00:00:46 WARN org.apache.spark.network.server.TransportChannelHandler: 
"Exception in connection from /host:port"
java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
 at sun.nio.ch.IOUtil.read(IOUtil.java:192)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
 at 
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
 at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
 at 
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
 at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
 at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
 at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
 at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
 at 
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
 at 
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
--
19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMasterEndpoint: 
"Error trying to remove broadcast 67 from block manager BlockManagerId(967, 
host, port, None)"
java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
 at sun.nio.ch.IOUtil.read(IOUtil.java:192)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
 at 
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
 at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
 at 
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
 at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
 at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
 at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
 at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
 at 
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
 at 
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
--
19/08/14 00:00:46 INFO org.apache.spark.ContextCleaner: "Cleaned accumulator 
162174"
19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMaster: "Failed to 
remove shuffle 22 - Connection reset by peer"
java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39){code}

> Spark with DynamicAllocation always got connect rest by peers
> -------------------------------------------------------------
>
>                 Key: SPARK-28726
>                 URL: https://issues.apache.org/jira/browse/SPARK-28726
>             Project: Spark
>          Issue Type: Wish
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: angerszhu
>            Priority: Major
>
> When use Spark with dynamic allocation, we set idle time to 5s
> We always got exception about neety 'Connect reset by peers'
>  
> I suspect that it's because we set idle time 5s is too small, it will cause 
> when Blockmanager call netty io, the executor has been remove because of 
> timeout.
> But not timely notify driver's BlocakManager
> {code:java}
> 19/08/14 00:00:46 WARN 
> org.apache.spark.network.server.TransportChannelHandler: "Exception in 
> connection from /host:port"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>  at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>  at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>  at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>  at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMasterEndpoint: 
> "Error trying to remove broadcast 67 from block manager BlockManagerId(967, 
> host, port, None)"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>  at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>  at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>  at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>  at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 INFO org.apache.spark.ContextCleaner: "Cleaned accumulator 
> 162174"
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMaster: "Failed 
> to remove shuffle 22 - Connection reset by peer"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39){code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to