[jira] [Commented] (GEODE-8999) When max-threads is specified for a cache server its reader threads may be reported as Stuck
[ https://issues.apache.org/jira/browse/GEODE-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294880#comment-17294880 ]

Darrel Schneider commented on GEODE-8999:
-----------------------------------------

On second thought it seems like this should work. When max-threads is set, ServerConnection.run should not be called until we have detected that the client socket has something on it to read. It then only does a single message and returns itself to the Selector, waiting to be run again.
Is it possible in the above stack that the client started to write a message and for some reason did not finish writing it? If that happened then the Selector would have detected a read event on the socket; asked the thread pool to execute the ServerConnection; and then been stuck in it trying to read the complete message.

> When max-threads is specified for a cache server its reader threads may be reported as Stuck
> --------------------------------------------------------------------------------------------
>
>                 Key: GEODE-8999
>                 URL: https://issues.apache.org/jira/browse/GEODE-8999
>             Project: Geode
>          Issue Type: Bug
>          Components: client/server, membership
>   Affects Versions: 1.14.0
>            Reporter: Bruce J Schuchardt
>            Priority: Major
>
> We noticed this report of a stuck thread in a test that enabled max-threads in a cache server:
> {noformat}
> [warn 2021/03/02 19:54:31.041 PST bridgep2_host2_17822 tid=0x1b] Thread <104> (0x68) that was executed at <02 Mar 2021 19:53:44 PST> has been stuck for <46.356 seconds> and number of thread monitor iteration <1>
> Thread Name state
> Executor Group
> Monitored metric
> Thread stack:
> sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> sun.nio.ch.IOUtil.read(IOUtil.java:192)
> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> org.apache.geode.internal.cache.tier.sockets.Message.readWrappedHeaders(Message.java:1237)
> org.apache.geode.internal.cache.tier.sockets.Message.fetchHeader(Message.java:859)
> org.apache.geode.internal.cache.tier.sockets.Message.readHeaderAndBody(Message.java:698)
> org.apache.geode.internal.cache.tier.sockets.Message.receive(Message.java:1213)
> org.apache.geode.internal.cache.tier.sockets.Message.receive(Message.java:1229)
> org.apache.geode.internal.cache.tier.sockets.BaseCommand.readRequest(BaseCommand.java:816)
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:777)
> org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:73)
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1185)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:710)
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$$Lambda$351/1357226696.invoke(Unknown Source)
> org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:120)
> org.apache.geode.logging.internal.executors.LoggingThreadFactory$$Lambda$88/1800187767.run(Unknown Source)
> java.lang.Thread.run(Thread.java:748)
> {noformat}
> The cache server should suspend thread monitoring before reading from a socket and resume monitoring afterward. An example of this can be found in org.apache.geode.internal.tcp.Connection.java.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
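
The issue description proposes suspending thread monitoring before a socket read and resuming it afterward. A minimal sketch of that idea follows. It does not use Geode's actual monitoring classes: `MonitorSketch`, its methods, and `monitoredRead` are hypothetical names made up for illustration, and timestamps are passed in explicitly so the behavior is easy to check.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a stuck-thread monitor; not Geode's actual API.
final class MonitorSketch {
  private static final class Entry {
    volatile long startNanos;
    volatile boolean suspended;

    Entry(long startNanos) {
      this.startNanos = startNanos;
    }
  }

  private final Map<Long, Entry> entries = new ConcurrentHashMap<>();
  private final long limitNanos;

  MonitorSketch(long limitNanos) {
    this.limitNanos = limitNanos;
  }

  void startTask(long threadId, long nowNanos) {
    entries.put(threadId, new Entry(nowNanos));
  }

  void endTask(long threadId) {
    entries.remove(threadId);
  }

  // While suspended, time spent waiting on the socket is not counted.
  void suspend(long threadId) {
    Entry e = entries.get(threadId);
    if (e != null) {
      e.suspended = true;
    }
  }

  // Resuming restarts the clock so only work done after the read is measured.
  void resume(long threadId, long nowNanos) {
    Entry e = entries.get(threadId);
    if (e != null) {
      e.startNanos = nowNanos;
      e.suspended = false;
    }
  }

  boolean isStuck(long threadId, long nowNanos) {
    Entry e = entries.get(threadId);
    return e != null && !e.suspended && nowNanos - e.startNanos > limitNanos;
  }

  // Wraps a possibly blocking read so the monitor ignores the wait.
  static int monitoredRead(MonitorSketch monitor, ReadableByteChannel channel,
      ByteBuffer buffer) throws IOException {
    long id = Thread.currentThread().getId();
    monitor.suspend(id);
    try {
      return channel.read(buffer);
    } finally {
      monitor.resume(id, System.nanoTime());
    }
  }

  public static void main(String[] args) throws IOException {
    MonitorSketch monitor = new MonitorSketch(1_000_000L);
    long id = Thread.currentThread().getId();
    monitor.startTask(id, System.nanoTime());
    // An in-memory channel stands in for the client socket.
    ReadableByteChannel channel =
        Channels.newChannel(new ByteArrayInputStream(new byte[] {1, 2, 3, 4}));
    int n = monitoredRead(monitor, channel, ByteBuffer.allocate(4));
    System.out.println("read " + n + " bytes, stuck=" + monitor.isStuck(id, System.nanoTime()));
    monitor.endTask(id);
  }
}
```

With this shape, a reader thread that waits on a client which never finishes writing its message is not reported as stuck, while a thread that stalls after the read completes still is.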
[jira] [Commented] (GEODE-8999) When max-threads is specified for a cache server its reader threads may be reported as Stuck
[ https://issues.apache.org/jira/browse/GEODE-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294872#comment-17294872 ]

Darrel Schneider commented on GEODE-8999:
-----------------------------------------

I think this bug should be fixed by no longer monitoring these threads. We have another ticket to improve the monitoring by including all server connection threads (see GEODE-8761).
For this ticket (which may have been around since thread monitoring was added), I think we should just change initializeServerConnectionThreadPool to pass null instead of getThreadMonitorObj() when it creates the selector thread pool.
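
To illustrate the "pass null instead of a monitor" idea in general terms, here is a minimal sketch of a thread pool factory that treats a null monitor as "do not monitor these workers". It is not Geode's code: `OptionalMonitorPoolSketch`, `TaskMonitor`, `newPool`, and `startsObserved` are hypothetical names, and the hooks used are the standard `beforeExecute`/`afterExecute` methods of `java.util.concurrent.ThreadPoolExecutor`.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class OptionalMonitorPoolSketch {
  // Hypothetical callback interface; Geode's real monitor type differs.
  interface TaskMonitor {
    void taskStarted(Thread worker);

    void taskFinished(Thread worker);
  }

  // A null monitor means the pool's workers are simply not monitored.
  static ThreadPoolExecutor newPool(int maxThreads, TaskMonitor monitor) {
    return new ThreadPoolExecutor(maxThreads, maxThreads, 60L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>()) {
      @Override
      protected void beforeExecute(Thread worker, Runnable task) {
        super.beforeExecute(worker, task);
        if (monitor != null) {
          monitor.taskStarted(worker);
        }
      }

      @Override
      protected void afterExecute(Runnable task, Throwable failure) {
        if (monitor != null) {
          monitor.taskFinished(Thread.currentThread());
        }
        super.afterExecute(task, failure);
      }
    };
  }

  // Runs one task on a pool and returns how many start callbacks fired.
  static int startsObserved(boolean monitored) {
    AtomicInteger starts = new AtomicInteger();
    TaskMonitor counting = new TaskMonitor() {
      @Override
      public void taskStarted(Thread worker) {
        starts.incrementAndGet();
      }

      @Override
      public void taskFinished(Thread worker) {}
    };
    ThreadPoolExecutor pool = newPool(2, monitored ? counting : null);
    pool.execute(() -> {});
    pool.shutdown();
    try {
      pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return starts.get();
  }

  public static void main(String[] args) {
    System.out.println("monitored starts: " + startsObserved(true));
    System.out.println("unmonitored starts: " + startsObserved(false));
  }
}
```

The trade-off in this sketch is that unmonitored workers can never produce a false "stuck" report, but a genuinely hung worker in that pool would go unnoticed as well.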