Re: Connection reset by peer : failed to remove cache rdd

2021-09-02 Thread Harsh Sharma



On 2021/09/02 06:00:26, Harsh Sharma  wrote: 
> Please Find reply : 
> Do you know when in your application lifecycle it happens? Spark SQL or
> > Structured Streaming? 
> 
> ans :its Spark SQL
> 
> Do you use broadcast variables ?
> 
> ans : yes we are using broadcast variables
>  or are the errors
>  coming from broadcast joins perhaps? 
not sure about this

> 
> On 2021/08/30 13:32:19, Jacek Laskowski  wrote: 
> > Hi,
> > 
> > No idea what might be going on here, but I'd not worry much about it and
> > simply monitor disk usage as some broadcast blocks might have left over.
> > 
> > Do you know when in your application lifecycle it happens? Spark SQL or
> > Structured Streaming? Do you use broadcast variables or are the errors
> > coming from broadcast joins perhaps?
> > 
> > Pozdrawiam,
> > Jacek Laskowski
> > 
> > https://about.me/JacekLaskowski
> > "The Internals Of" Online Books <https://books.japila.pl/>
> > Follow me on https://twitter.com/jaceklaskowski
> > 
> > <https://twitter.com/jaceklaskowski>
> > 
> > 
> > On Mon, Aug 30, 2021 at 3:26 PM Harsh Sharma 
> > wrote:
> > 
> > > We are facing issue in production where we are getting frequent
> > >
> > > Still have 1 request outstanding when connection with the hostname was
> > > closed
> > >
> > > connection reset by peer : errors as well as warnings  : failed to remove
> > > cache rdd or failed  to remove broadcast variable.
> > >
> > > Please help us how to mitigate this  :
> > >
> > > Executor memory : 12g
> > >
> > > Network timeout :   60
> > >
> > > Heartbeat interval : 25
> > >
> > >
> > >
> > > [Stage 284:>(199 + 1) / 200][Stage 292:>  (1 + 3)
> > > / 200]
> > > [Stage 284:>(199 + 1) / 200][Stage 292:>  (2 + 3)
> > > / 200]
> > > [Stage 292:>  (2 + 4)
> > > / 200][14/06/21 10:46:17,006 WARN
> > > shuffle-server-4](TransportChannelHandler) Exception in connection from
> > > 
> > > java.io.IOException: Connection reset by peer
> > > at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > > at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> > > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> > > at
> > > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
> > > at
> > > io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
> > > at
> > > io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
> > > at
> > > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
> > > at
> > > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> > > at
> > > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > > at
> > > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> > > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> > > at
> > > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> > > at java.lang.Thread.run(Thread.java:748)
> > > [14/06/21 10:46:17,010 ERROR shuffle-server-4](TransportResponseHandler)
> > > Still have 1 requests outstanding when connection from  is 
> > > closed
> > > [14/06/21 10:46:17,012 ERROR Spark Context Cleaner](ContextCleaner) Error
> > > cleaning broadcast 159
> > > java.io.IOException: Connection reset by peer
> > > at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > > at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> > > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> > > at
> > > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
> > > at
> > >

Spark Phoenix Connection Exception while loading from Phoenix tables

2021-09-02 Thread Harsh Sharma
[01/09/21 11:55:51,861 WARN  pool-1-thread-1](Client) Exception encountered 
while connecting to the server : java.lang.NullPointerException 
[01/09/21 11:55:51,862 WARN  pool-1-thread-1](Client) Exception encountered 
while connecting to the server : java.lang.NullPointerException 

[01/09/21 11:55:51,862 WARN  pool-1-thread-1](RetryInvocationHandler) Exception 
while invoking class 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo
 over server 
Not retrying because failovers (15) exceeded maximum allowed (15) 
java.io.IOException: Failed on local exception: java.io.IOException: 
java.lang.NullPointerException; Host Details : local host is:xx1 
destination host is: xx2
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy19.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305



-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Connection reset by peer : failed to remove cache rdd

2021-09-02 Thread Harsh Sharma
Please Find reply : 
Do you know when in your application lifecycle it happens? Spark SQL or
> Structured Streaming? 

ans :its Spark SQL

Do you use broadcast variables ?

ans : yes we are using broadcast variables
 or are the errors
 coming from broadcast joins perhaps? 
ans :we are not using Boardcast join

On 2021/08/30 13:32:19, Jacek Laskowski  wrote: 
> Hi,
> 
> No idea what might be going on here, but I'd not worry much about it and
> simply monitor disk usage as some broadcast blocks might have left over.
> 
> Do you know when in your application lifecycle it happens? Spark SQL or
> Structured Streaming? Do you use broadcast variables or are the errors
> coming from broadcast joins perhaps?
> 
> Pozdrawiam,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
> 
> <https://twitter.com/jaceklaskowski>
> 
> 
> On Mon, Aug 30, 2021 at 3:26 PM Harsh Sharma 
> wrote:
> 
> > We are facing issue in production where we are getting frequent
> >
> > Still have 1 request outstanding when connection with the hostname was
> > closed
> >
> > connection reset by peer : errors as well as warnings  : failed to remove
> > cache rdd or failed  to remove broadcast variable.
> >
> > Please help us how to mitigate this  :
> >
> > Executor memory : 12g
> >
> > Network timeout :   60
> >
> > Heartbeat interval : 25
> >
> >
> >
> > [Stage 284:>(199 + 1) / 200][Stage 292:>  (1 + 3)
> > / 200]
> > [Stage 284:>(199 + 1) / 200][Stage 292:>  (2 + 3)
> > / 200]
> > [Stage 292:>  (2 + 4)
> > / 200][14/06/21 10:46:17,006 WARN
> > shuffle-server-4](TransportChannelHandler) Exception in connection from
> > 
> > java.io.IOException: Connection reset by peer
> > at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> > at
> > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
> > at
> > io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
> > at
> > io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
> > at
> > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> > at
> > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> > at java.lang.Thread.run(Thread.java:748)
> > [14/06/21 10:46:17,010 ERROR shuffle-server-4](TransportResponseHandler)
> > Still have 1 requests outstanding when connection from  is closed
> > [14/06/21 10:46:17,012 ERROR Spark Context Cleaner](ContextCleaner) Error
> > cleaning broadcast 159
> > java.io.IOException: Connection reset by peer
> > at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> > at
> > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
> > at
> > io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
> > at
> > io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
> > at
> > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > at

Re: Connection reset by peer : failed to remove cache rdd

2021-09-01 Thread Harsh Sharma



On 2021/08/30 13:32:19, Jacek Laskowski  wrote: 
> Hi,
> 
> No idea what might be going on here, but I'd not worry much about it and
> simply monitor disk usage as some broadcast blocks might have left over.
> 
> Do you know when in your application lifecycle it happens? Spark SQL or
> Structured Streaming? Do you use broadcast variables or are the errors
> coming from broadcast joins perhaps?
> 
> Pozdrawiam,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
> 
> <https://twitter.com/jaceklaskowski>
> 
> 
> On Mon, Aug 30, 2021 at 3:26 PM Harsh Sharma 
> wrote:
> 
> > We are facing issue in production where we are getting frequent
> >
> > Still have 1 request outstanding when connection with the hostname was
> > closed
> >
> > connection reset by peer : errors as well as warnings  : failed to remove
> > cache rdd or failed  to remove broadcast variable.
> >
> > Please help us how to mitigate this  :
> >
> > Executor memory : 12g
> >
> > Network timeout :   60
> >
> > Heartbeat interval : 25
> >
> >
> >
> > [Stage 284:>(199 + 1) / 200][Stage 292:>  (1 + 3)
> > / 200]
> > [Stage 284:>(199 + 1) / 200][Stage 292:>  (2 + 3)
> > / 200]
> > [Stage 292:>  (2 + 4)
> > / 200][14/06/21 10:46:17,006 WARN
> > shuffle-server-4](TransportChannelHandler) Exception in connection from
> > 
> > java.io.IOException: Connection reset by peer
> > at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> > at
> > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
> > at
> > io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
> > at
> > io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
> > at
> > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> > at
> > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> > at java.lang.Thread.run(Thread.java:748)
> > [14/06/21 10:46:17,010 ERROR shuffle-server-4](TransportResponseHandler)
> > Still have 1 requests outstanding when connection from  is closed
> > [14/06/21 10:46:17,012 ERROR Spark Context Cleaner](ContextCleaner) Error
> > cleaning broadcast 159
> > java.io.IOException: Connection reset by peer
> > at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> > at
> > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
> > at
> > io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
> > at
> > io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
> > at
> > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > at
> > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> > at
> > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventEx

Connection reset by peer : failed to remove cache rdd

2021-08-30 Thread Harsh Sharma
We are facing issue in production where we are getting frequent

Still have 1 request outstanding when connection with the hostname was closed

connection reset by peer : errors as well as warnings  : failed to remove cache 
rdd or failed  to remove broadcast variable.

Please help us how to mitigate this  :

Executor memory : 12g

Network timeout :   60

Heartbeat interval : 25

 

[Stage 284:>(199 + 1) / 200][Stage 292:>  (1 + 3) / 200]
[Stage 284:>(199 + 1) / 200][Stage 292:>  (2 + 3) / 200]
[Stage 292:>  (2 + 4) / 
200][14/06/21 10:46:17,006 WARN  shuffle-server-4](TransportChannelHandler) 
Exception in connection from 
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
at 
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at 
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
[14/06/21 10:46:17,010 ERROR shuffle-server-4](TransportResponseHandler) Still 
have 1 requests outstanding when connection from  is closed
[14/06/21 10:46:17,012 ERROR Spark Context Cleaner](ContextCleaner) Error 
cleaning broadcast 159
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
at 
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at 
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
[14/06/21 10:46:17,012 WARN  
block-manager-ask-thread-pool-69](BlockManagerMaster) Failed to remove 
broadcast 159 with removeFromMaster = true - Connection reset by peer
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
at 
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at 
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)

-
To unsubscribe 

Spark Issues while upgrade to 2.4 from 1.6 in Parcels

2021-08-11 Thread Harsh Sharma
hi Team , 

we are upgrading our cloudera parcels  to 6.X from 5.x , hence e have upgraded 
version of park from 1.6 to 2.4 . While executing a spark program we are 
getting the below error  : 
Please help us how to resolve in cloudera parcels. There are suggestion to 
install spark gateway roles  if that is the case guide us how to do that .
Failed to find client configuration. If this host is managed by Cloudera 
Manager, please install Spark Gateway Role on this host to run Spark Jobs. 
Otherwise, please configure the Spark dependencies correctly.

PS : we are not using cloudera manager

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Cloudera Parcel : spark issues after upgrade 1.6 to 2.4

2021-07-30 Thread Harsh Sharma
hi Team , 

we are upgrading our cloudera parcels  to 6.X from 5.x , hence e have upgraded 
version of park from 1.6 to 2.4 . While executing a spark program we are 
getting the below error  : 
Please help us how to resolve in cloudera parcels. There are suggestion to 
install spark gateway roles  if that is the case guide us how to do that .
Failed to find client configuration. If this host is managed by Cloudera 
Manager, please install Spark Gateway Role on this host to run Spark Jobs. 
Otherwise, please configure the Spark dependencies correctly.

PS : we are not using cloudera manager






-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Connection Reset by Peer : failed to remove cached rdd

2021-07-30 Thread Harsh Sharma
[Stage 284:>(199 + 1) / 200][Stage 292:>  (1 + 3) / 200]
[Stage 284:>(199 + 1) / 200][Stage 292:>  (2 + 3) / 200]
[Stage 292:>  (2 + 4) / 
200][14/06/21 10:46:17,006 WARN  shuffle-server-4](TransportChannelHandler) 
Exception in connection from 
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
at 
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at 
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
[14/06/21 10:46:17,010 ERROR shuffle-server-4](TransportResponseHandler) Still 
have 1 requests outstanding when connection from  is closed 
[14/06/21 10:46:17,012 ERROR Spark Context Cleaner](ContextCleaner) Error 
cleaning broadcast 159 
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
at 
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at 
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
[14/06/21 10:46:17,012 WARN  
block-manager-ask-thread-pool-69](BlockManagerMaster) Failed to remove 
broadcast 159 with removeFromMaster = true - Connection reset by peer 
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
at 
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at 
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)

On 2021/07/29 12:46:01, 
Big data developer need help relat to spark gateway roles in 2.0
  wrote: 
> Hi Team ,
> 
> We are facing issue in production where we are getting frequent
> 
> Still have 1 request outstanding when connection with the hostname was closed
> 
> connection reset by peer : errors as well as warnings : failed to remove cache
> rdd or failed to remove broadcast variable.
> 
> Please help us how