[ https://issues.apache.org/jira/browse/SSHD-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266436#comment-14266436 ]
Hugo Arès commented on SSHD-348:
--------------------------------

I managed to reproduce the issue, so I can give you the info you need.

I wrote a simple application that connects to Gerrit using the same libraries that Jenkins Gerrit-Trigger uses (gerrit-events, which uses jsch). The application simulates clients that stay connected, others that disconnect and reconnect every 10 seconds, and others that disconnect and reconnect when they have not received any events for 1 minute.

I took a full thread dump, and no threads are stuck other than the ones stuck in Window.waitForSpace.

Here is the info you asked for (I have 5 threads stuck in Window.waitForSpace and the values are the same for all of them):

remoteWindow.size 0
remoteWindow.waiting true
remoteWindow.closed false
commandExitFuture.result true
gracefulState.value CloseReceived
gracefulFuture.result null
state.value Graceful
closeFuture.result null
service.state.value Closed
service.closeFuture.result true
session.state.value Closed
session.closeFuture.result true
command.done (Are you sure about the name? I did not find it.)

I looked on the client side and there are no errors. Is it possible that the client disconnects, the server misses the disconnection event, and the server then continues to write to the client until the buffer is full?

> Some SSH threads get blocked in Object.wait() method forever
> ------------------------------------------------------------
>
>                 Key: SSHD-348
>                 URL: https://issues.apache.org/jira/browse/SSHD-348
>             Project: MINA SSHD
>          Issue Type: Bug
>    Affects Versions: 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0
>         Environment: Gerrit Code Review 2.9.1 (SSHD 0.12.0)
>                      Gerrit Code Review 2.9.2 (SSHD 0.13.0)
>                      Gerrit Code Review 2.9.3 (Downgraded to SSHD 0.9)
>            Reporter: David Ostrovsky
>            Assignee: Guillaume Nodet
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: 0001-Prepare-release-sshd-0.13.0-72f868e26.patch, diff
>
>
> This seems to be a regression that started in 0.10.1.
> In Gerrit we have SSH commands that return immediately and the so-called
> stream-events command, which keeps the connection open until the client
> disconnects, basically for days or weeks. This is used, for example, to
> inform a CI system (Jenkins) about events in Gerrit, such as a new patch set
> upload.
> In Gerrit we have a configurable "SSH-Stream-Worker" thread pool which is
> dedicated to the stream-events SSH command. The behaviour observed on the
> latest SSHD release is that after some time all threads are stuck. They do
> not get stuck at the same time, but one after another, until eventually all
> threads are stuck and Gerrit must be restarted, usually after one week. The
> stack dumps of all such stuck threads are the same, see below. If we had a
> patch, we could apply it to our production Gerrit instance and see whether it
> helps.
> {code}
> "SSH-Stream-Worker-10" cpu=95400.00 [reset 95400.00] ms elapsed=146444.30 [reset 146444.30] s allocated=5526700000 B (5.15 GB) [reset 5526700000 B (5.15 GB)] defined_classes=0
> io= file i/o: 15622752/0 B, net i/o: 0/0 B, files opened:0, socks opened:0 [reset file i/o: 15622752/0 B, net i/o: 0/0 B, files opened:0, socks opened:0 ]
> prio=10 tid=0x00007f54514df800 nid=0x1c71 / 7281 pthread-id=139999281374976 in Object.wait() [_thread_blocked (_at_safepoint), stack(0x00007f541f5f6000,0x00007f541f6f7000)] [0x00007f541f6f5000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(J)V(Native Method)
>         - waiting on <0x00007f553aa530d0> (a org.apache.sshd.common.channel.Window)
>         at java.lang.Object.wait()V(Object.java:503)
>         at org.apache.sshd.common.channel.Window.waitForSpace()I(Window.java:148)
>         - locked <0x00007f553aa530d0> (a org.apache.sshd.common.channel.Window)
>         at org.apache.sshd.common.channel.ChannelOutputStream.flush()V(ChannelOutputStream.java:116)
>         - locked <0x00007f553aa55060> (a org.apache.sshd.common.channel.ChannelOutputStream)
>         at org.apache.sshd.common.channel.ChannelOutputStream.write([BII)V(ChannelOutputStream.java:84)
>         - locked <0x00007f553aa55060> (a org.apache.sshd.common.channel.ChannelOutputStream)
>         at sun.nio.cs.StreamEncoder.writeBytes()V(StreamEncoder.java:221)
>         at sun.nio.cs.StreamEncoder.implFlushBuffer()V(StreamEncoder.java:291)
>         at sun.nio.cs.StreamEncoder.implFlush()V(StreamEncoder.java:295)
>         at sun.nio.cs.StreamEncoder.flush()V(StreamEncoder.java:141)
>         - locked <0x00007f553aa7e258> (a java.io.OutputStreamWriter)
>         at java.io.OutputStreamWriter.flush()V(OutputStreamWriter.java:229)
>         at java.io.BufferedWriter.flush()V(BufferedWriter.java:254)
>         - locked <0x00007f553aa7e258> (a java.io.OutputStreamWriter)
>         at java.io.PrintWriter.flush()V(PrintWriter.java:320)
>         - locked <0x00007f553aa7e210> (a java.io.BufferedWriter)
>         at java.io.PrintWriter.checkError()Z(PrintWriter.java:357)
>         at com.google.gerrit.sshd.commands.StreamEvents.writeEvents()V(StreamEvents.java:186)
>         at com.google.gerrit.sshd.commands.StreamEvents.access$100(Lcom/google/gerrit/sshd/commands/StreamEvents;)V(StreamEvents.java:43)
>         at com.google.gerrit.sshd.commands.StreamEvents$3.run()V(StreamEvents.java:82)
>         at java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;(Executors.java:471)
>         at java.util.concurrent.FutureTask.run()V(FutureTask.java:262)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)V(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V(ScheduledThreadPoolExecutor.java:292)
>         at com.google.gerrit.server.git.WorkQueue$Task.run()V(WorkQueue.java:364)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run()V(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run()V(Thread.java:812)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
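
For illustration only: the stuck threads above are parked in an untimed Object.wait() while remoteWindow.size is 0 and remoteWindow.closed is false, even though the session reports Closed. The sketch below shows, in isolation, how a waiter on a flow-control window can be released either by an explicit close notification or by a deadline. WindowSketch and all of its members are hypothetical names invented for this example. It is not the MINA SSHD Window class, and it is not claimed to be the fix that went into 0.14.0.

```java
import java.io.IOException;

/** Hypothetical stand-in for a flow-control window; NOT the MINA SSHD class. */
public class WindowSketch {
    private final Object lock = new Object();
    private int size;        // bytes the peer is still willing to accept
    private boolean closed;  // set once the channel is known to be gone

    public WindowSketch(int initialSize) {
        this.size = initialSize;
    }

    /** The peer granted more space (analogous to a window adjust message). */
    public void expand(int bytes) {
        synchronized (lock) {
            size += bytes;
            lock.notifyAll();
        }
    }

    /** The channel closed: wake every waiting writer so none stays parked. */
    public void close() {
        synchronized (lock) {
            closed = true;
            lock.notifyAll();
        }
    }

    /** Waits for space, but gives up on close or after maxWaitMillis. */
    public int waitForSpace(long maxWaitMillis) throws IOException, InterruptedException {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        synchronized (lock) {
            while (size == 0 && !closed) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    // Never call wait(0): that would block without a timeout.
                    throw new IOException("timed out waiting for window space");
                }
                lock.wait(remainingMillis);
            }
            if (closed) {
                throw new IOException("window closed");
            }
            return size;
        }
    }

    /** Runs one wait and reports how the writer was released. */
    public static String outcome(WindowSketch window, long maxWaitMillis) {
        try {
            return "space=" + window.waitForSpace(maxWaitMillis);
        } catch (IOException e) {
            return e.getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        // The peer never grants space: the writer is released by the deadline.
        System.out.println(outcome(new WindowSketch(0), 200));

        // The channel is closed by another thread: the writer is released by close().
        WindowSketch closing = new WindowSketch(0);
        new Thread(closing::close).start();
        System.out.println(outcome(closing, 5_000));
    }
}
```

In this sketch a writer can only stay blocked for as long as its deadline allows, and a close wakes it immediately. Whether the real hang comes from a missed close notification, as asked in the comment above, is still an open question here.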