Hi John,

If you see this stack trace in the threads that are producing messages, then your disruptor is full because the appender cannot write fast enough. Since you have a single appender, it might be worth switching to RollingRandomAccessFileAppender, which can batch writes to disk and reach a greater throughput if you are limited by I/O operations.

In your case I would probably run a profiling session with Java Mission Control, or any other sampling profiler you are used to, to see whether the slowness in the single appender thread is caused by I/O operations or by the processing/formatting of the messages themselves.
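For reference, switching is mostly a matter of changing the element name in log4j2.xml. A minimal sketch (the appender name, file names, pattern, and rollover policy below are placeholders, not taken from your configuration):

```xml
<Appenders>
  <!-- RollingRandomAccessFile uses an internal ByteBuffer and can batch
       writes to disk; with async loggers, immediateFlush="false" lets the
       background thread flush once per batch of events instead of per event. -->
  <RollingRandomAccessFile name="Rolling"
                           fileName="logs/app.log"
                           filePattern="logs/app-%d{yyyy-MM-dd}-%i.log.gz"
                           immediateFlush="false">
    <PatternLayout pattern="%d %p %c{1.} [%t] %m%n"/>
    <Policies>
      <SizeBasedTriggeringPolicy size="250 MB"/>
    </Policies>
  </RollingRandomAccessFile>
</Appenders>
```

Your existing RollingFileAppender's policies and layout should carry over unchanged; only the element name and, optionally, immediateFlush differ.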
Best regards,
Anthony

On Wed, Mar 27, 2019 at 16:23, <john.e.gr...@wellsfargo.com.invalid> wrote:

> Hi all,
>
> I'm using Log4j2 2.11.2 and am evaluating switching from custom async
> appenders to async loggers.
>
> I've been running some scalability tests with async loggers and sometimes
> get good results but mostly get terrible results at high loads. At high
> loads, I typically see many threads in this state:
>
> http-bio-7114-exec-406 - priority:5 - threadId:0x00007f2f2c0cc000 -
> nativeId:0x4b3e - nativeId (decimal):19262 - state:TIMED_WAITING
> stackTrace:
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
> at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
> at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
> at com.lmax.disruptor.RingBuffer.publishEvent(RingBuffer.java:465)
> at com.lmax.disruptor.dsl.Disruptor.publishEvent(Disruptor.java:331)
> at org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageInfo(AsyncLoggerDisruptor.java:222)
> at org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:181)
> at org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:166)
> at org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:161)
> at org.apache.logging.log4j.core.async.AsyncLogger.logMessage(AsyncLogger.java:131)
> at org.apache.logging.log4j.spi.ExtendedLoggerWrapper.logMessage(ExtendedLoggerWrapper.java:217)
> at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2170)
> at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2125)
> at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2108)
> at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2002)
> at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1974)
>
> Is this a sign that the log rate is too high?
>
> If I count the number of log messages per minute, they go from 70k (1x
> load) to 140k (2x) to 210k (3x) very briefly. That's over 3k messages per
> second. Then the log message volume drops way down to under 1k for a few
> minutes before rising to a few thousand per minute. There were only a few
> hundred log messages per minute when the thread dump was taken. CPU was
> 100% at the start of 3x (no surprise) then dropped to around 50% even
> though application throughput was almost 0. Context switching was higher
> than I've ever seen it (hundreds of thousands.) It's interesting that the
> CPU usage was as high as it was even though application throughput was so
> low. Could there have been a large backlog of log messages that were
> getting processed?
>
> The application did recover when I dropped the load to 1x. CPU usage and
> log volume went back to normal.
>
> Garbage collection logs reflect the fact that almost no work was being
> done.
>
> We're using
> -Dlog4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector.
>
> All of the log messages go to a single RollingFileAppender. Disk space
> was about 2/3 used.
>
> Another common thing that I see in the thread dumps when that happens is a
> thread like this blocking many others:
>
> "http-bio-7114-exec-101" #2171 daemon prio=5 os_prio=0
> tid=0x00007f2f5c083800 nid=0x1702 runnable [0x00007f2ea7e02000]
> java.lang.Thread.State: RUNNABLE
> at java.lang.Thread.setPriority0(Native Method)
> at java.lang.Thread.setPriority(Thread.java:1095)
> at java.lang.Thread.init(Thread.java:417)
> at java.lang.Thread.init(Thread.java:349)
> at java.lang.Thread.<init>(Thread.java:499)
> ...
>
> We have a custom multi-threaded async appender that can handle this load.
>
> Any suggestions?
>
> Thanks
>
> John