Re: Re-post: java.io.IOException: Too many open files
Hello!

My expectation is that your code could skip step 2 under load with timeouts.

Regards,
--
Ilya Kasnacheev

2018-07-04 11:42 GMT+03:00 胡海麟:
> 1. client writes to ignite and timeout (client setting is 15ms)
> 2. client resets the connection since it seems dead (timeout).
> 3. server catches the connection reset and throw the exception
> [...]
Re: Re-post: java.io.IOException: Too many open files
Hi,

After some more tests, I believe this is what happens:

1. The client writes to Ignite and times out (the client timeout is 15 ms).
2. The client resets the connection because it appears dead (timeout).
3. The server catches the connection reset and throws the exception.

So this is just the normal behavior for a connection timeout.

I guess "Too many open files" is caused by a very large number of timeouts and reconnects happening at high frequency.
I tried setting nofile to 491403 to avoid the problem.

The 500 microseconds value was only for testing. In general we use millisecond-level settings.

Thanks.

On Tue, Jul 3, 2018 at 9:57 PM, ilya.kasnacheev wrote:
> [...]
> I also recommend changing '500 microseconds' to '500 milliseconds' because
> there's not much you can expect to happen in half a millisecond or
> one-2000th of a second. Especially when over a network.
Re: Re-post: java.io.IOException: Too many open files
Hello!

I have tried to reproduce your case, but I don't observe any growth in the number of open file descriptors on the Ignite side.

I think the problem here is on the Go side. Please make sure to always close a connection if you open it. If your program terminates right afterwards this is less critical, but if you create connections in a loop it can become a problem.

Also, socket:[2322160] is not necessarily a UNIX socket; most often it is a TCP socket.

I also recommend changing '500 microseconds' to '500 milliseconds', because there's not much you can expect to happen in half a millisecond (one 2000th of a second), especially over a network.

Regards,

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Re: Re-post: java.io.IOException: Too many open files
Hi,

If the dial times out, client is nil, so client.Close() can't be called.

Thanks.
Re: Re-post: java.io.IOException: Too many open files
Hi,

Thank you for attaching the code sample and configuration. Most likely this information will be enough to reproduce the issue.

Meanwhile, could you please try adding `defer client.Close()` to your code:

    package main

    import (
        "fmt"
        "time"

        // Import path is an assumption based on the attached radix.go.
        "github.com/mediocregopher/radix.v2/redis"
    )

    func main() {
        client, err := redis.DialTimeout("tcp", "10.1.14.221:11211", 500*time.Microsecond)
        if err != nil {
            panic(err)
        }
        // Registered only after a successful dial, so client is never nil here.
        defer client.Close()

        foo, err := client.Cmd("GET", "foo").Str()
        if err != nil {
            panic(err)
        }
        fmt.Println("foo: ", foo)
    }

I think it should help to fix the "Connection reset by peer" exceptions on Ignite nodes.

Best Regards,
Roman
Re: Re-post: java.io.IOException: Too many open files
Hi,

Sorry, I have no knowledge of Maven. Here are my config file and the sample code of the client.

I reproduced "Connection reset by peer" by adjusting the timeout setting, but Ignite's file descriptor count didn't increase.

Before Ignite was halted by "Too many open files", there was a CLOSE_WAIT spike in the network metrics (see network.png). I'm still looking for the reason for that.

Thanks.

[Attached config: Spring beans XML (schema http://www.springframework.org/schema/beans/spring-beans.xsd); body not preserved]
[Attached: radix.go, binary data]
Re: Re-post: java.io.IOException: Too many open files
Hi,

I checked the logs and saw that there are a lot of "Connection reset by peer" exceptions, which may be a cause of "Too many open files". It seems that the clients connect to the Ignite server node via the REST API.

Could you please share a small reproducer (a Maven project on GitHub) and the server node configuration as well? I will try to reproduce the issue in my environment. I think we need to analyze how the client app interacts with the server node.

Thanks.

Best Regards,
Roman
Re: Re-post: java.io.IOException: Too many open files
Hi,

Could you please attach the full Ignite logs? An Ignite node should not hold a lot of socket descriptors, but some network issues can cause their number to grow.

Another possible cause: if you have one node with 30 caches and Ignite Persistence is enabled, then it is expected behavior that more than 30K files are open: 30 x 1024 (the default partition count). Is that your case?

Best Regards,
Roman
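The estimate in this message is simple arithmetic, sketched below. The one-file-per-partition assumption is taken from the message itself; `partitionFiles` is a hypothetical helper, not part of any Ignite API.

```go
package main

import "fmt"

// partitionFiles estimates how many partition files a node may hold open,
// assuming one file per partition per cache as described in the message.
func partitionFiles(caches, partitionsPerCache int) int {
	return caches * partitionsPerCache
}

func main() {
	// 30 caches with the default of 1024 partitions each.
	fmt.Println(partitionFiles(30, 1024)) // 30720
}
```

That total is already close to a limit of 32768 before any client sockets are counted.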
Re: Re-post: java.io.IOException: Too many open files
I set it to 32768, and it was exhausted. I have many clients connecting to Ignite, but not that many. I'm afraid that setting it higher would just buy me a little more time and would not be a solution.

On Mon, Jun 25, 2018 at 5:20 AM, David Harvey wrote:
> You must increase the Linux NOFILE ulimit when running Ignite. The
> documentation describes how to do this.
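When tuning this limit it can help to confirm what a process actually runs with. The sketch below reads the soft and hard NOFILE limits of the current process on a Unix-like system using Go's `syscall` package. It reports the limits of the Go program itself, not of the Ignite JVM, and `openFileLimits` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimits returns the soft and hard RLIMIT_NOFILE values of the
// current process. The soft limit is the one that triggers
// "Too many open files"; the hard limit is the ceiling it can be raised to.
func openFileLimits() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	return uint64(lim.Cur), uint64(lim.Max), nil
}

func main() {
	soft, hard, err := openFileLimits()
	if err != nil {
		panic(err)
	}
	fmt.Printf("NOFILE soft=%d hard=%d\n", soft, hard)
}
```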
Re: Re-post: java.io.IOException: Too many open files
You must increase the Linux NOFILE ulimit when running Ignite. The documentation describes how to do this.

On Sun, Jun 24, 2018, 12:47 PM 胡海麟 wrote:
> Hi,
>
> Re-post message 'cause I failed to post my logs pasted.
>
> I have got repeated Too many open files exceptions since sometime.
> [...]
Re-post: java.io.IOException: Too many open files
Hi,

I am re-posting this message because my pasted logs did not come through the first time.

For some time now I have been getting repeated "Too many open files" exceptions.

[11:26:24,493][SEVERE][grid-nio-worker-tcp-rest-1-#57][GridTcpRestProtocol] Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=0 lim=8192 cap=8192], super=AbstractNioClientWorker [idx=1, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-1, igniteInstanceName=null, finished=false, hashCode=1611196193, interrupted=false, runner=grid-nio-worker-tcp-rest-1-#57]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/10.1.14.11:11211, rmtAddr=/10.1.252.184:40680, createTime=1529666783471, closeTime=0, bytesSent=5, bytesRcvd=1074, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1529666783481, lastSndTime=1529666783481, lastRcvTime=1529666783481, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [marsh=JdkMarshaller [clsFilter=o.a.i.i.IgniteKernal$5@331b0c4a], routerClient=false], directMode=false]], accepted=true]]]
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at org.apache.ignite.internal.util.nio.GridNioServer$ByteBufferNioClientWorker.processRead(GridNioServer.java:1085)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2339)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2110)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1764)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)

[11:26:24,493][WARNING][grid-nio-worker-tcp-rest-1-#57][GridTcpRestProtocol] Closing NIO session because of unhandled exception [cls=class o.a.i.i.util.nio.GridNioException, msg=Connection reset by peer]

[11:26:24,493][WARNING][grid-nio-worker-tcp-rest-1-#57][GridTcpRestProtocol] Closed client session due to exception [ses=GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=0 lim=8192 cap=8192], super=AbstractNioClientWorker [idx=1, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-1, igniteInstanceName=null, finished=false, hashCode=1611196193, interrupted=false, runner=grid-nio-worker-tcp-rest-1-#57]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/10.1.14.11:11211, rmtAddr=/10.1.252.184:40680, createTime=1529666783471, closeTime=1529666784488, bytesSent=5, bytesRcvd=1074, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1529666783481, lastSndTime=1529666783481, lastRcvTime=1529666783481, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [marsh=JdkMarshaller [clsFilter=o.a.i.i.IgniteKernal$5@331b0c4a], routerClient=false], directMode=false]], accepted=true]], msg=Connection reset by peer]

[11:26:24,513][SEVERE][grid-nio-worker-tcp-rest-1-#57][GridTcpRestProtocol] Caught unhandled exception in NIO worker thread (restart the node).
java.lang.NullPointerException
    at sun.nio.ch.EPollArrayWrapper.isEventsHighKilled(EPollArrayWrapper.java:174)
    at sun.nio.ch.EPollArrayWrapper.setUpdateEvents(EPollArrayWrapper.java:190)
    at sun.nio.ch.EPollArrayWrapper.add(EPollArrayWrapper.java:239)
    at sun.nio.ch.EPollSelectorImpl.implRegister(EPollSelectorImpl.java:178)
    at sun.nio.ch.SelectorImpl.register(SelectorImpl.java:132)
    at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:212)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.register(GridNioServer.java:2545)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:1934)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1764)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)

[11:26:30,277][SEVERE][nio-acceptor-#55][GridTcpRestProtocol] Failed to accept remote connection (will wait for 2000ms).
class org.apache.ignite.IgniteCheckedException: Failed to accept connection: GridWorker [name=nio-acceptor, igniteInstanceName=null, finished=false, hashCode=1020662787,