thanks for taking a look.

> On Jul 11, 2018, at 1:08 AM, Greg Wilkins <gr...@webtide.com> wrote:
> 
> Steven,
> 
> The loop appears to be in code that we have added as a partial solution to 
> https://github.com/eclipse/jetty.project/issues/2233
> 
> Your analysis sounds plausible.  Is this reproducible for you or has it so 
> far been a once off?
> 
> regards

So far, one off.  I've subscribed to the issue and we'll certainly keep an eye 
to see if it happens again.

> 
> 
> On 11 July 2018 at 01:39, Steven Schlansker <stevenschlans...@gmail.com> 
> wrote:
> Hi everyone,
> 
> We've observed runaway Jetty threads in our production environment.
> The remaining threads seem to continue serving traffic just fine, but two of 
> them are totally stuck, seemingly in a spinloop.
> 
> We run Jetty 9.4.11.v20180605 on Java 9.0.4
> 
> You can see the difference by comparing a couple of our threads:
> 
> <Screen Shot 2018-07-10 at 3.41.18 PM.png>
> 
> [ The "hung" thread has an order of magnitude more CPU time than any other 
> worker thread ]
> 
> By sampling the stacks, I am able to notice a couple of hot spots:
> 
> "default-pool-4665" #4665 prio=5 os_prio=0 tid=0x00000000013c5000 nid=0x1258 
> runnable [0x00007ff9eb1ef000]
>    java.lang.Thread.State: RUNNABLE
>         at 
> org.eclipse.jetty.io.ssl.SslConnection$1.run(SslConnection.java:121)
>         at 
> org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onIncompleteFlush(SslConnection.java:545)
>         at 
> org.eclipse.jetty.io.AbstractEndPoint$2.onIncompleteFlush(AbstractEndPoint.java:54)
>         at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:332)
>         at 
> org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:380)
>         at 
> org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:804)
>         at 
> org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
>         at 
> org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:224)
>         at 
> org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:538)
>         at 
> org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:795)
> 
> "default-pool-4665" #4665 prio=5 os_prio=0 tid=0x00000000013c5000 nid=0x1258 
> runnable [0x00007ff9eb1ef000]
>    java.lang.Thread.State: RUNNABLE
>         at 
> java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss(java.base@9.0.4/ThreadLocal.java:443)
>         at 
> java.lang.ThreadLocal$ThreadLocalMap.getEntry(java.base@9.0.4/ThreadLocal.java:418)
>         at 
> java.lang.ThreadLocal$ThreadLocalMap.access$000(java.base@9.0.4/ThreadLocal.java:298)
>         at java.lang.ThreadLocal.get(java.base@9.0.4/ThreadLocal.java:163)
>         at 
> org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onIncompleteFlush(SslConnection.java:547)
>         at 
> org.eclipse.jetty.io.AbstractEndPoint$2.onIncompleteFlush(AbstractEndPoint.java:54)
>         at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:332)
>         at 
> org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:380)
>         at 
> org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:804)
>         at 
> org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
>         at 
> org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:224)
> 
> <Screen Shot 2018-07-10 at 4.29.06 PM.png>
> 
> In a series of dozens of samples, I never saw a stack that was outside 
> AEP.write, every stack trace I got looked like this.
> Note how 'ThreadLocal.get' is 1/3 of the time -- this must be a *hot* loop!
> 
> I attempted to up the logging level to DEBUG for the WriteFlusher and 
> SslConnection, but unfortunately Jetty
> uses this pattern to boost performance in WriteFlusher:
> 
>     private static final boolean DEBUG = LOG.isDebugEnabled(); // Easy for 
> the compiler to remove the code if DEBUG==false
> 
> which means my attempts to increase debugging without restarting were totally 
> ineffective.
> (Is the performance boost here worth the inability to change logging levels 
> at runtime?  Pretty annoying in this case! And the JIT ought to be able to 
> elide it all the same if it is actually disabled.)
> 
> I dumped the heap and inspected the state of the thread directly.
> I observe the following state, an arbitrary point-in-time:
> 
> WriteFlusher._state = IDLE
> HttpConnectionOverHTTP.state = CLOSED
> SslConnection$DecryptedEndPoint._state = OPEN
> SSLEngineImpl.connectionState = 2 ( DATA )
> SocketChannelImpl.closed = false
> 
> I verified (with lsof -i and netstat -a) that the socket is no longer known 
> to the kernel (or, at least, nobody is sitting on the 
> HttpConnectionOverHTTP._endPoint._local.port)
> 
> To me, it looks like the thread is stuck in a hot loop where it sees an 
> incomplete flush, "tries again".  But the connection is not open, so no data 
> is written,
> so it makes no forward progress but also seems to never exit.
> 
> I note that in DecryptedEndPoint.onIncompleteFlush, the loop preamble checks 
> `isOutputShutdown()` but then we never re-check for a closed socket inside 
> the __tryWriteAgain loop.
> Inside SslConnection$1.run, we call WriteFlusher.completeWrite(), which 
> checks things like:
> 
>         if (previous.getType() != StateType.PENDING)
>             return; // failure already handled.
> 
> If WriteFlusher state is IDLE, this returns immediately.  (I can't know what 
> the state was at the time, but I see a number of early exit paths in the 
> code).
> 
> So my current hypothesis is that the socket was asynchronously closed in the 
> middle of a retry, such that it neither ever succeeds nor fails, but just 
> burns CPU.
> 
> I'm not very familiar with this code, but hopefully someone here can help 
> guide us to understand what is going on.  I see that `try_again` is only set 
> in a //TODO block so I worry that we might have hit a bug or unfortunate race 
> condition.  In particular,
> 
> // Keep running complete write until
> __tryWriteAgain.set(Boolean.FALSE);
> do
> {
>     _runCompleteWrite.run();
> } while (Boolean.TRUE.equals(__tryWriteAgain.get()));
> 
> seems that it would possibly never terminate, since we never set 
> __tryWriteAgain inside the do/while, only in a finally outside.
> 
> Thanks for any guidance or suggestions!
> 
> Best,
> Steven
> 
> _______________________________________________
> jetty-users mailing list
> jetty-users@eclipse.org
> To change your delivery options, retrieve your password, or unsubscribe from 
> this list, visit
> https://dev.eclipse.org/mailman/listinfo/jetty-users
> 
> 
> 
> --
> Greg Wilkins <gr...@webtide.com> CTO http://webtide.com
> _______________________________________________
> jetty-users mailing list
> jetty-users@eclipse.org
> To change your delivery options, retrieve your password, or unsubscribe from 
> this list, visit
> https://dev.eclipse.org/mailman/listinfo/jetty-users

Attachment: signature.asc
Description: Message signed with OpenPGP using GPGMail

_______________________________________________
jetty-users mailing list
jetty-users@eclipse.org
To change your delivery options, retrieve your password, or unsubscribe from 
this list, visit
https://dev.eclipse.org/mailman/listinfo/jetty-users

Reply via email to