Hi all, I'm using Log4j2 2.11.2 and am evaluating switching from custom async appenders to async loggers.
I've been running scalability tests with async loggers. I sometimes get good results, but at high loads the results are usually terrible. At high loads I typically see many threads in this state:

http-bio-7114-exec-406 - priority:5 - threadId:0x00007f2f2c0cc000 - nativeId:0x4b3e - nativeId (decimal):19262 - state:TIMED_WAITING
stackTrace:
java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
    at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
    at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
    at com.lmax.disruptor.RingBuffer.publishEvent(RingBuffer.java:465)
    at com.lmax.disruptor.dsl.Disruptor.publishEvent(Disruptor.java:331)
    at org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageInfo(AsyncLoggerDisruptor.java:222)
    at org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:181)
    at org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:166)
    at org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:161)
    at org.apache.logging.log4j.core.async.AsyncLogger.logMessage(AsyncLogger.java:131)
    at org.apache.logging.log4j.spi.ExtendedLoggerWrapper.logMessage(ExtendedLoggerWrapper.java:217)
    at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2170)
    at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2125)
    at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2108)
    at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2002)
    at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1974)

Is this a sign that the log rate is too high? Counting log messages per minute, the volume goes from 70k (1x load) to 140k (2x) to 210k (3x) very briefly, which is over 3k messages per second. The volume then drops way down to under 1k per minute for a few minutes before rising back to a few thousand per minute. Only a few hundred messages per minute were being logged when the thread dump above was taken.

CPU was 100% at the start of the 3x run (no surprise), then dropped to around 50% even though application throughput was almost zero. Context switching was higher than I've ever seen it (hundreds of thousands). It's interesting that CPU usage stayed that high while application throughput was so low. Could a large backlog of log messages have been getting processed?

The application did recover when I dropped the load back to 1x: CPU usage and log volume returned to normal. The garbage collection logs confirm that almost no work was being done during the stall.

We're using -Dlog4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector, and all of the log messages go to a single RollingFileAppender. Disk space was about 2/3 used.

Another thing I commonly see in the thread dumps when this happens is a thread like this blocking many others:

"http-bio-7114-exec-101" #2171 daemon prio=5 os_prio=0 tid=0x00007f2f5c083800 nid=0x1702 runnable [0x00007f2ea7e02000]
   java.lang.Thread.State: RUNNABLE
    at java.lang.Thread.setPriority0(Native Method)
    at java.lang.Thread.setPriority(Thread.java:1095)
    at java.lang.Thread.init(Thread.java:417)
    at java.lang.Thread.init(Thread.java:349)
    at java.lang.Thread.<init>(Thread.java:499)
    ...

For what it's worth, our existing custom multi-threaded async appender can handle this load.
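In case it helps, these are the async logger tuning knobs I was planning to experiment with next. The property names are the ones documented on the async loggers page of the Log4j 2 manual; the values below are only examples I'm considering, not what we currently run:

    -Dlog4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector
    -Dlog4j2.asyncLoggerRingBufferSize=1048576    (number of slots in the disruptor ring buffer)
    -Dlog4j2.asyncQueueFullPolicy=Discard         (drop events instead of blocking callers when the buffer is full)
    -Dlog4j2.discardThreshold=INFO                (with Discard, events at this level and below are dropped while the buffer is full)
    -Dlog4j2.asyncLoggerWaitStrategy=Timeout      (consumer-side wait strategy: Block, Timeout, Sleep or Yield)

My understanding from the manual is that the default queue-full policy blocks the calling thread until space frees up, which seems consistent with the parkNanos stack above, while Discard trades dropped lower-level events for not stalling application threads. I haven't verified that behavior myself yet, so please correct me if that's wrong.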
Any suggestions?

Thanks,
John