[ https://issues.apache.org/jira/browse/GEODE-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294880#comment-17294880 ]

Darrel Schneider commented on GEODE-8999:
-----------------------------------------

On second thought, it seems like this should work. When max-threads is set,
ServerConnection.run should not be called until we have detected that the
client socket has something on it to read. It then processes only a single
message and returns itself to the Selector to wait until it is run again. Is
it possible, in the reported stack, that the client started to write a message
and for some reason did not finish writing it? If that happened, the Selector
would have detected a read event on the socket and asked the thread pool to
execute the ServerConnection, and the pool thread would then have been stuck
in it trying to read the complete message.
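To illustrate the hypothesis in this comment: a reader that loops until a complete fixed-size header has arrived stays inside read() for as long as the peer leaves the header unfinished, even though the channel was readable when the thread was dispatched. The following is a minimal, self-contained demonstration that uses a java.nio Pipe in place of the client socket. HEADER_LENGTH (17 here) and readHeaderFully are hypothetical stand-ins chosen for illustration, not Geode's Message code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.util.concurrent.atomic.AtomicBoolean;

public class PartialHeaderDemo {
    // Hypothetical fixed header size, for illustration only.
    static final int HEADER_LENGTH = 17;

    // Blocking loop that keeps reading until the whole header has arrived,
    // similar in spirit to reading a header from a blocking SocketChannel.
    static ByteBuffer readHeaderFully(Pipe.SourceChannel in) throws IOException {
        ByteBuffer header = ByteBuffer.allocate(HEADER_LENGTH);
        while (header.hasRemaining()) {
            if (in.read(header) < 0) {
                throw new IOException("peer closed mid-header");
            }
        }
        header.flip();
        return header;
    }

    public static boolean run() throws Exception {
        Pipe pipe = Pipe.open(); // both ends start in blocking mode
        AtomicBoolean done = new AtomicBoolean(false);
        Thread reader = new Thread(() -> {
            try {
                readHeaderFully(pipe.source());
                done.set(true);
            } catch (IOException e) {
                // ignored in this sketch
            }
        }, "reader");
        reader.start();

        // The "client" writes only part of the header: enough to make the
        // channel readable, not enough to complete the header.
        pipe.sink().write(ByteBuffer.wrap(new byte[5]));
        Thread.sleep(300);
        boolean blockedAfterPartialWrite = reader.isAlive() && !done.get();

        // Once the remaining bytes arrive, the read completes.
        pipe.sink().write(ByteBuffer.wrap(new byte[HEADER_LENGTH - 5]));
        reader.join(5000);

        pipe.sink().close();
        pipe.source().close();
        return blockedAfterPartialWrite && done.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run() ? "reader blocked until header completed" : "unexpected");
    }
}
```

Running main should print a line confirming that the reader stayed blocked after the partial write and finished only once the rest of the header arrived. If a real client never sent the remainder, the second write would never happen, which would be consistent with a ServerConnection thread sitting in SocketChannelImpl.read for tens of seconds.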


> When max-threads is specified for a cache server its reader threads may be 
> reported as Stuck
> --------------------------------------------------------------------------------------------
>
>                 Key: GEODE-8999
>                 URL: https://issues.apache.org/jira/browse/GEODE-8999
>             Project: Geode
>          Issue Type: Bug
>          Components: client/server, membership
>    Affects Versions: 1.14.0
>            Reporter: Bruce J Schuchardt
>            Priority: Major
>
> We noticed this report of a stuck thread in a test that enabled max-threads 
> in a cache server:
> {noformat}
> [warn 2021/03/02 19:54:31.041 PST bridgep2_host2_17822 <ThreadsMonitor> 
> tid=0x1b] Thread <104> (0x68) that was executed at <02 Mar 2021 19:53:44 PST> 
> has been stuck for <46.356 seconds> and number of thread monitor iteration <1>
> Thread Name <ServerConnection on port 26188 Thread 5> state <RUNNABLE>
> Executor Group <PooledExecutorWithDMStats>
> Monitored metric <ResourceManagerStats.numThreadsStuck>
> Thread stack:
> sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> sun.nio.ch.IOUtil.read(IOUtil.java:192)
> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> org.apache.geode.internal.cache.tier.sockets.Message.readWrappedHeaders(Message.java:1237)
> org.apache.geode.internal.cache.tier.sockets.Message.fetchHeader(Message.java:859)
> org.apache.geode.internal.cache.tier.sockets.Message.readHeaderAndBody(Message.java:698)
> org.apache.geode.internal.cache.tier.sockets.Message.receive(Message.java:1213)
> org.apache.geode.internal.cache.tier.sockets.Message.receive(Message.java:1229)
> org.apache.geode.internal.cache.tier.sockets.BaseCommand.readRequest(BaseCommand.java:816)
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:777)
> org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:73)
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1185)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:710)
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$$Lambda$351/1357226696.invoke(Unknown Source)
> org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:120)
> org.apache.geode.logging.internal.executors.LoggingThreadFactory$$Lambda$88/1800187767.run(Unknown Source)
> java.lang.Thread.run(Thread.java:748)
> {noformat}
> The cache server should suspend thread monitoring before reading from a 
> socket and resume monitoring afterward. An example of this can be found in 
> org.apache.geode.internal.tcp.Connection.java.
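The remedy proposed in the description (suspend thread monitoring around the socket read, then resume it) can be sketched as a try/finally wrapper. Everything here is hypothetical and simplified: MonitoredTask, isStuck, and readUnmonitored are illustrative names, and the sketch does not reproduce Geode's thread-monitoring classes or the code in org.apache.geode.internal.tcp.Connection.

```java
import java.util.concurrent.Callable;

public class MonitoringSketch {
    // Hypothetical per-task state; not Geode's thread-monitoring API.
    static final class MonitoredTask {
        volatile long startMillis;
        volatile boolean suspended;

        MonitoredTask(long startMillis) { this.startMillis = startMillis; }

        void suspendMonitoring() { suspended = true; }

        void resumeMonitoring() {
            // Restart the clock before un-suspending so a monitor never sees
            // an unsuspended task paired with the old start time.
            startMillis = System.currentTimeMillis();
            suspended = false;
        }
    }

    // Hypothetical monitor check: a task is "stuck" if it has run longer than
    // the threshold and monitoring is not suspended.
    static boolean isStuck(MonitoredTask task, long nowMillis, long thresholdMillis) {
        return !task.suspended && nowMillis - task.startMillis > thresholdMillis;
    }

    // Suspend monitoring around a blocking read; finally guarantees that
    // monitoring resumes even if the read throws.
    static <T> T readUnmonitored(MonitoredTask task, Callable<T> blockingRead) throws Exception {
        task.suspendMonitoring();
        try {
            return blockingRead.call();
        } finally {
            task.resumeMonitoring();
        }
    }

    public static void main(String[] args) throws Exception {
        long threshold = 30_000;
        MonitoredTask task = new MonitoredTask(0L); // pretend it started long ago
        boolean stuckDuringRead = readUnmonitored(task,
            () -> isStuck(task, System.currentTimeMillis(), threshold));
        boolean stuckAfterRead = isStuck(task, System.currentTimeMillis(), threshold);
        System.out.println(stuckDuringRead + " " + stuckAfterRead); // prints "false false"
    }
}
```

Resetting the start time on resume means the time spent waiting for the client to send data is not charged to the message processing that follows, so only genuinely long-running work would be reported as stuck.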



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
