thanks for taking a look. > On Jul 11, 2018, at 1:08 AM, Greg Wilkins <gr...@webtide.com> wrote: > > Steven, > > The loop appears to be in code that we have added as a partial solution to > https://github.com/eclipse/jetty.project/issues/2233 > > Your analysis sounds plausible. Is this reproducible for you or has it so > far been a once off? > > regards
So far, one off. I've subscribed to the issue and we'll certainly keep an eye to see if it happens again. > > > On 11 July 2018 at 01:39, Steven Schlansker <stevenschlans...@gmail.com> > wrote: > Hi everyone, > > We've observed runaway Jetty threads in our production environment. > The remaining threads seem to continue serving traffic just fine, but two of > them are totally stuck, seemingly in a spinloop. > > We run Jetty 9.4.11.v20180605 on Java 9.0.4 > > You can see the difference by comparing a couple of our threads: > > <Screen Shot 2018-07-10 at 3.41.18 PM.png> > > [ The "hung" thread has an order of magnitude more CPU time than any other > worker thread ] > > By sampling the stacks, I am able to notice a couple of hot spots: > > "default-pool-4665" #4665 prio=5 os_prio=0 tid=0x00000000013c5000 nid=0x1258 > runnable [0x00007ff9eb1ef000] > java.lang.Thread.State: RUNNABLE > at > org.eclipse.jetty.io.ssl.SslConnection$1.run(SslConnection.java:121) > at > org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onIncompleteFlush(SslConnection.java:545) > at > org.eclipse.jetty.io.AbstractEndPoint$2.onIncompleteFlush(AbstractEndPoint.java:54) > at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:332) > at > org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:380) > at > org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:804) > at > org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241) > at > org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:224) > at > org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:538) > at > org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:795) > > "default-pool-4665" #4665 prio=5 os_prio=0 tid=0x00000000013c5000 nid=0x1258 > runnable [0x00007ff9eb1ef000] > java.lang.Thread.State: RUNNABLE > at > java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss(java.base@9.0.4/ThreadLocal.java:443) > at > java.lang.ThreadLocal$ThreadLocalMap.getEntry(java.base@9.0.4/ThreadLocal.java:418) > at > java.lang.ThreadLocal$ThreadLocalMap.access$000(java.base@9.0.4/ThreadLocal.java:298) > at java.lang.ThreadLocal.get(java.base@9.0.4/ThreadLocal.java:163) > at > org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onIncompleteFlush(SslConnection.java:547) > at > org.eclipse.jetty.io.AbstractEndPoint$2.onIncompleteFlush(AbstractEndPoint.java:54) > at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:332) > at > org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:380) > at > org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:804) > at > org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241) > at > org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:224) > > <Screen Shot 2018-07-10 at 4.29.06 PM.png> > > In a series of dozens of samples, I never saw a stack that was outside > AEP.write, every stack trace I got looked like this. > Note how 'ThreadLocal.get' is 1/3 of the time -- this must be a *hot* loop! > > I attempted to up the logging level to DEBUG for the WriteFlusher and > SslConnection, but unfortunately Jetty > uses this pattern to boost performance in WriteFlusher: > > private static final boolean DEBUG = LOG.isDebugEnabled(); // Easy for > the compiler to remove the code if DEBUG==false > > which means my attempts to increase debugging without restarting were totally > ineffective. > (Is the performance boost here worth the inability to change logging levels > at runtime? Pretty annoying in this case! And the JIT ought to be able to > elide it all the same if it is actually disabled.) > > I dumped the heap and inspected the state of the thread directly. > I observe the following state, an arbitrary point-in-time: > > WriteFlusher._state = IDLE > HttpConnectionOverHTTP.state = CLOSED > SslConnection$DecryptedEndPoint._state = OPEN > SSLEngineImpl.connectionState = 2 ( DATA ) > SocketChannelImpl.closed = false > > I verified (with lsof -i and netstat -a) that the socket is no longer known > to the kernel (or, at least, nobody is sitting on the > HttpConnectionOverHTTP._endPoint._local.port) > > To me, it looks like the thread is stuck in a hot loop where it sees an > incomplete flush, "tries again". But the connection is not open, so no data > is written, > so it makes no forward progress but also seems to never exit. > > I note that in DecryptedEndPoint.onIncompleteFlush, the loop preamble checks > `isOutputShutdown()` but then we never re-check for a closed socket inside > the __tryWriteAgain loop. > Inside SslConnection$1.run, we call WriteFlusher.completeWrite(), which > checks things like: > > if (previous.getType() != StateType.PENDING) > return; // failure already handled. > > If WriteFlusher state is IDLE, this returns immediately. (I can't know what > the state was at the time, but I see a number of early exit paths in the > code). > > So my current hypothesis is that the socket was asynchronously closed in the > middle of a retry, such that it neither ever succeeds nor fails, but just > burns CPU. > > I'm not very familiar with this code, but hopefully someone here can help > guide us to understand what is going on. I see that `try_again` is only set > in a //TODO block so I worry that we might have hit a bug or unfortunate race > condition. In particular, > > // Keep running complete write until > __tryWriteAgain.set(Boolean.FALSE); > do > { > _runCompleteWrite.run(); > } while (Boolean.TRUE.equals(__tryWriteAgain.get())); > > seems that it would possibly never terminate, since we never set > __tryWriteAgain inside the do/while, only in a finally outside. > > Thanks for any guidance or suggestions! > > Best, > Steven > > _______________________________________________ > jetty-users mailing list > jetty-users@eclipse.org > To change your delivery options, retrieve your password, or unsubscribe from > this list, visit > https://dev.eclipse.org/mailman/listinfo/jetty-users > > > > -- > Greg Wilkins <gr...@webtide.com> CTO http://webtide.com > _______________________________________________ > jetty-users mailing list > jetty-users@eclipse.org > To change your delivery options, retrieve your password, or unsubscribe from > this list, visit > https://dev.eclipse.org/mailman/listinfo/jetty-users
signature.asc
Description: Message signed with OpenPGP using GPGMail
_______________________________________________ jetty-users mailing list jetty-users@eclipse.org To change your delivery options, retrieve your password, or unsubscribe from this list, visit https://dev.eclipse.org/mailman/listinfo/jetty-users