[ https://issues.apache.org/jira/browse/SSHD-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135632#comment-14135632 ]
David Ostrovsky commented on SSHD-348:
--------------------------------------

Interestingly, we haven't observed this behavior in previous versions of SSHD (I assume that there were no client changes) with the same StreamEvents implementation.

Gerrit v2.8.0-v2.8.4 was based on SSHD 0.9.0 (Google storage bucket Maven repo) with one small change that suppressed verbose output during the handshake phase:

{code}
maven_jar(
  name = 'sshd',
  id = 'org.apache.sshd:sshd-core:0.9.0.201311081',
  sha1 = '38f7ac8602e70fa05fdc6147d204198e9cefe5bc',
  license = 'Apache2.0',
  deps = [':core'],
  exclude = EXCLUDE,
  repository = GERRIT,
)
{code}

Starting with Gerrit 2.9 we began to observe this behavior. That release uses SSHD 0.11.0 (Atlassian Maven repo: 0.11 with a cherry-picked patch that fixed the handshake failure after 250 attempts):

{code}
maven_jar(
  name = 'sshd',
  id = 'org.apache.sshd:sshd-core:0.11.1-atlassian-1',
  sha1 = '0de20bfa03ddeedc8eb54ab6e85e90e776ea18f8',
  license = 'Apache2.0',
  deps = [':core'],
  exclude = EXCLUDE,
  repository = ATLASSIAN,
)
{code}

Maybe it is worth downgrading SSHD to the org.apache.sshd:sshd-core:0.9.0.201311081 release, which is known to work fine with the unchanged StreamEvents implementation, to see if that fixes the problem?

> Some SSH threads get blocked in Object.wait() method forever
> ------------------------------------------------------------
>
>                 Key: SSHD-348
>                 URL: https://issues.apache.org/jira/browse/SSHD-348
>             Project: MINA SSHD
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>         Environment: Gerrit Code Review 2.9.1
>            Reporter: David Ostrovsky
>
> This seems to be a regression compared to previous versions (0.6-0 and later).
> In Gerrit we have SSH commands that return immediately and the so-called
> stream-events command, which keeps the connection open until the client
> disconnects, basically for days or weeks. This is used, for example, to inform
> a CI system (Jenkins) about events in Gerrit, such as a new patch set upload.
> In Gerrit we have a configurable "SSH-Stream-Worker" thread pool which is
> dedicated to the mentioned stream-events SSH command. The observed behaviour
> on the latest SSHD release is that after some time all threads are stuck. They
> don't get stuck at the same time, but one after another, until eventually all
> threads are stuck and Gerrit must be restarted, usually after one week. The
> stack dumps of all such stuck threads are the same, see below. If we had a
> patch, we could apply it to our production Gerrit instance and see whether it
> helps.
> {code}
> "SSH-Stream-Worker-10" cpu=95400.00 [reset 95400.00] ms elapsed=146444.30 [reset 146444.30] s allocated=5526700000 B (5.15 GB) [reset 5526700000 B (5.15 GB)] defined_classes=0
> io= file i/o: 15622752/0 B, net i/o: 0/0 B, files opened:0, socks opened:0 [reset file i/o: 15622752/0 B, net i/o: 0/0 B, files opened:0, socks opened:0 ]
> prio=10 tid=0x00007f54514df800 nid=0x1c71 / 7281 pthread-id=139999281374976 in Object.wait() [_thread_blocked (_at_safepoint), stack(0x00007f541f5f6000,0x00007f541f6f7000)] [0x00007f541f6f5000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(J)V(Native Method)
>         - waiting on <0x00007f553aa530d0> (a org.apache.sshd.common.channel.Window)
>         at java.lang.Object.wait()V(Object.java:503)
>         at org.apache.sshd.common.channel.Window.waitForSpace()I(Window.java:148)
>         - locked <0x00007f553aa530d0> (a org.apache.sshd.common.channel.Window)
>         at org.apache.sshd.common.channel.ChannelOutputStream.flush()V(ChannelOutputStream.java:116)
>         - locked <0x00007f553aa55060> (a org.apache.sshd.common.channel.ChannelOutputStream)
>         at org.apache.sshd.common.channel.ChannelOutputStream.write([BII)V(ChannelOutputStream.java:84)
>         - locked <0x00007f553aa55060> (a org.apache.sshd.common.channel.ChannelOutputStream)
>         at sun.nio.cs.StreamEncoder.writeBytes()V(StreamEncoder.java:221)
>         at sun.nio.cs.StreamEncoder.implFlushBuffer()V(StreamEncoder.java:291)
>         at sun.nio.cs.StreamEncoder.implFlush()V(StreamEncoder.java:295)
>         at sun.nio.cs.StreamEncoder.flush()V(StreamEncoder.java:141)
>         - locked <0x00007f553aa7e258> (a java.io.OutputStreamWriter)
>         at java.io.OutputStreamWriter.flush()V(OutputStreamWriter.java:229)
>         at java.io.BufferedWriter.flush()V(BufferedWriter.java:254)
>         - locked <0x00007f553aa7e258> (a java.io.OutputStreamWriter)
>         at java.io.PrintWriter.flush()V(PrintWriter.java:320)
>         - locked <0x00007f553aa7e210> (a java.io.BufferedWriter)
>         at java.io.PrintWriter.checkError()Z(PrintWriter.java:357)
>         at com.google.gerrit.sshd.commands.StreamEvents.writeEvents()V(StreamEvents.java:186)
>         at com.google.gerrit.sshd.commands.StreamEvents.access$100(Lcom/google/gerrit/sshd/commands/StreamEvents;)V(StreamEvents.java:43)
>         at com.google.gerrit.sshd.commands.StreamEvents$3.run()V(StreamEvents.java:82)
>         at java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;(Executors.java:471)
>         at java.util.concurrent.FutureTask.run()V(FutureTask.java:262)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)V(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V(ScheduledThreadPoolExecutor.java:292)
>         at com.google.gerrit.server.git.WorkQueue$Task.run()V(WorkQueue.java:364)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run()V(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run()V(Thread.java:812)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
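
For illustration only: the stack trace shows the worker parked in an untimed Object.wait() inside Window.waitForSpace(), so a thread whose peer never grants more window space is never released back to the pool. The sketch below shows the general idea of bounding such a wait with a deadline. It is not SSHD code and not a proposed patch; the class BoundedWindow, its methods, and the maxWaitMillis parameter are hypothetical names invented for this example.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for a flow-control window; NOT the SSHD Window class. */
class BoundedWindow {
    private final Object lock = new Object();
    private int size; // bytes the remote side currently allows us to send

    BoundedWindow(int initialSize) {
        this.size = initialSize;
    }

    /** Called when the peer grants more space; wakes up any waiting writers. */
    void expand(int bytes) {
        synchronized (lock) {
            size += bytes;
            lock.notifyAll();
        }
    }

    /**
     * Waits until some space is available and returns the available size.
     * Unlike an untimed wait(), this gives up once maxWaitMillis (an assumed,
     * caller-chosen limit) has elapsed, so the calling thread can be released.
     */
    int waitForSpace(long maxWaitMillis) throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        synchronized (lock) {
            // Loop to guard against spurious wakeups.
            while (size <= 0) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    throw new TimeoutException("no window space after " + maxWaitMillis + " ms");
                }
                // wait(0) means "wait forever", so never pass less than 1 ms.
                lock.wait(Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
            }
            return size;
        }
    }
}
```

With something along these lines, a writer facing a silent client would get a TimeoutException after the chosen limit and could close the channel, instead of occupying an SSH-Stream-Worker thread indefinitely. Whether a timeout is the right fix here, as opposed to finding why the window is never adjusted, is an open question.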