[ https://issues.apache.org/jira/browse/SSHD-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147548#comment-14147548 ]

Saša Živkov commented on SSHD-348:
----------------------------------

In our production Gerrit we are now running sshd 0.10.1 with the SSH
handshake failure fix (2aed686bdb2) cherry-picked.
After 2 days of Gerrit uptime, not a single SSH-Stream-Worker thread is
blocked in the waitForSpace method.

This means that the issue was introduced somewhere between 0.10.1 and 0.11,
since we know that both 0.11 and 0.12 have it.
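
For anyone who wants to repeat this check without reading thread dumps by
hand, here is a minimal sketch using the standard java.lang.management API.
It assumes the code runs inside the JVM being inspected. The class
BlockedStreamWorkers and its methods are made up for illustration and are not
part of Gerrit or MINA SSHD; only the thread name prefix and the method name
come from the dump in the issue description.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Gerrit or MINA SSHD.
class BlockedStreamWorkers {

    /** Returns true if any frame in the stack belongs to a method with the given name. */
    public static boolean hasFrame(StackTraceElement[] stack, String methodName) {
        for (StackTraceElement frame : stack) {
            if (frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Lists the names of threads in this JVM whose name starts with threadPrefix,
     * that are currently waiting, and that have a frame named methodName on their stack.
     */
    public static List<String> findBlocked(String threadPrefix, String methodName) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> blocked = new ArrayList<>();
        // false, false: skip monitor and synchronizer details, stack traces are enough
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || !info.getThreadName().startsWith(threadPrefix)) {
                continue;
            }
            Thread.State state = info.getThreadState();
            boolean waiting = state == Thread.State.WAITING
                    || state == Thread.State.TIMED_WAITING;
            if (waiting && hasFrame(info.getStackTrace(), methodName)) {
                blocked.add(info.getThreadName());
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        for (String name : findBlocked("SSH-Stream-Worker", "waitForSpace")) {
            System.out.println(name);
        }
    }
}
```

A single hit may just be a window that is momentarily full. A thread that
shows up in several runs taken minutes apart is the more likely candidate for
being stuck.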

> Some SSH threads get blocked in Object.wait() method forever
> ------------------------------------------------------------
>
>                 Key: SSHD-348
>                 URL: https://issues.apache.org/jira/browse/SSHD-348
>             Project: MINA SSHD
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>         Environment: Gerrit Code Review 2.9.1
>            Reporter: David Ostrovsky
>
> This seems to be a regression compared to previous versions (0.6-0 and later).
> In Gerrit we have SSH commands that return immediately and the so-called
> stream-events command, which keeps the connection open until the client
> disconnects, basically for days or weeks. This is used, for example, to
> inform a CI system (Jenkins) about events in Gerrit, such as a new patch set
> upload.
> In Gerrit we have a configurable "SSH-Stream-Worker" thread pool which is
> dedicated to the stream-events SSH command. The observed behaviour on the
> latest SSHD release is that after some time all threads are stuck. They
> don't get stuck at the same time, but one after another, until eventually
> all threads are stuck and Gerrit must be restarted, usually after one week.
> The stack dumps of all such stuck threads are the same, see below. If we had
> a patch, we could apply it to our production Gerrit instance and check
> whether it helps.
> {code}
> "SSH-Stream-Worker-10" cpu=95400.00 [reset 95400.00] ms elapsed=146444.30 [reset 146444.30] s allocated=5526700000 B (5.15 GB) [reset 5526700000 B (5.15 GB)] defined_classes=0
> io= file i/o: 15622752/0 B, net i/o: 0/0 B, files opened:0, socks opened:0  [reset file i/o: 15622752/0 B, net i/o: 0/0 B, files opened:0, socks opened:0 ]
> prio=10 tid=0x00007f54514df800 nid=0x1c71 / 7281  pthread-id=139999281374976 in Object.wait()  [_thread_blocked (_at_safepoint), stack(0x00007f541f5f6000,0x00007f541f6f7000)] [0x00007f541f6f5000]
>    java.lang.Thread.State: WAITING (on object monitor)
>       at java.lang.Object.wait(J)V(Native Method)
>       - waiting on <0x00007f553aa530d0> (a org.apache.sshd.common.channel.Window)
>       at java.lang.Object.wait()V(Object.java:503)
>       at org.apache.sshd.common.channel.Window.waitForSpace()I(Window.java:148)
>       - locked <0x00007f553aa530d0> (a org.apache.sshd.common.channel.Window)
>       at org.apache.sshd.common.channel.ChannelOutputStream.flush()V(ChannelOutputStream.java:116)
>       - locked <0x00007f553aa55060> (a org.apache.sshd.common.channel.ChannelOutputStream)
>       at org.apache.sshd.common.channel.ChannelOutputStream.write([BII)V(ChannelOutputStream.java:84)
>       - locked <0x00007f553aa55060> (a org.apache.sshd.common.channel.ChannelOutputStream)
>       at sun.nio.cs.StreamEncoder.writeBytes()V(StreamEncoder.java:221)
>       at sun.nio.cs.StreamEncoder.implFlushBuffer()V(StreamEncoder.java:291)
>       at sun.nio.cs.StreamEncoder.implFlush()V(StreamEncoder.java:295)
>       at sun.nio.cs.StreamEncoder.flush()V(StreamEncoder.java:141)
>       - locked <0x00007f553aa7e258> (a java.io.OutputStreamWriter)
>       at java.io.OutputStreamWriter.flush()V(OutputStreamWriter.java:229)
>       at java.io.BufferedWriter.flush()V(BufferedWriter.java:254)
>       - locked <0x00007f553aa7e258> (a java.io.OutputStreamWriter)
>       at java.io.PrintWriter.flush()V(PrintWriter.java:320)
>       - locked <0x00007f553aa7e210> (a java.io.BufferedWriter)
>       at java.io.PrintWriter.checkError()Z(PrintWriter.java:357)
>       at com.google.gerrit.sshd.commands.StreamEvents.writeEvents()V(StreamEvents.java:186)
>       at com.google.gerrit.sshd.commands.StreamEvents.access$100(Lcom/google/gerrit/sshd/commands/StreamEvents;)V(StreamEvents.java:43)
>       at com.google.gerrit.sshd.commands.StreamEvents$3.run()V(StreamEvents.java:82)
>       at java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;(Executors.java:471)
>       at java.util.concurrent.FutureTask.run()V(FutureTask.java:262)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)V(ScheduledThreadPoolExecutor.java:178)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V(ScheduledThreadPoolExecutor.java:292)
>       at com.google.gerrit.server.git.WorkQueue$Task.run()V(WorkQueue.java:364)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run()V(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run()V(Thread.java:812)
> {code}
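
The dump above shows waitForSpace sitting in the untimed Object.wait(), so a
thread whose window is never reopened has no way to give up. The following is
a minimal sketch of that pattern with a bounded wait instead. SketchWindow
and all of its methods are hypothetical; this is not MINA SSHD's Window class
and not a proposed patch, only an illustration of how a timeout turns a
silent hang into a visible error.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical flow-control window, for illustration only.
class SketchWindow {
    private long size;

    SketchWindow(long initialSize) {
        this.size = initialSize;
    }

    /** Called when the peer grants more space; wakes up any waiting writers. */
    public synchronized void expand(long increment) {
        size += increment;
        notifyAll();
    }

    /** Called by a writer after sending len bytes. */
    public synchronized void consume(long len) {
        if (len > size) {
            throw new IllegalStateException("not enough window space");
        }
        size -= len;
    }

    /**
     * Waits until the window has space and returns the available size.
     * Unlike an untimed wait(), this gives up after timeoutMillis.
     */
    public synchronized long waitForSpace(long timeoutMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (size == 0) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                throw new TimeoutException("no window space after " + timeoutMillis + " ms");
            }
            // wait(0) would block forever, so always wait at least 1 ms
            wait(Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
        }
        return size;
    }
}
```

With a timeout, a writer whose peer never reopens the window fails with an
exception that the caller can log and use to close the channel, instead of
holding a pool thread indefinitely.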



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
