Hi all,

I'm using Log4j2 2.11.2 and am evaluating switching from custom async appenders 
to async loggers.

I've been running scalability tests with async loggers. I sometimes get good 
results, but at high loads the results are mostly terrible, and I typically see 
many threads in this state:

http-bio-7114-exec-406 - priority:5 - threadId:0x00007f2f2c0cc000 - nativeId:0x4b3e - nativeId (decimal):19262 - state:TIMED_WAITING
stackTrace:
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
at com.lmax.disruptor.RingBuffer.publishEvent(RingBuffer.java:465)
at com.lmax.disruptor.dsl.Disruptor.publishEvent(Disruptor.java:331)
at org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageInfo(AsyncLoggerDisruptor.java:222)
at org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:181)
at org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:166)
at org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:161)
at org.apache.logging.log4j.core.async.AsyncLogger.logMessage(AsyncLogger.java:131)
at org.apache.logging.log4j.spi.ExtendedLoggerWrapper.logMessage(ExtendedLoggerWrapper.java:217)
at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2170)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2125)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2108)
at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2002)
at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1974)

Is this a sign that the log rate is too high?
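
If it helps, this is the in-process check I was planning to add so I can watch 
the ring buffer fill level while the test runs. It reads the RingBufferAdmin 
MBean that log4j-core registers over JMX; the ObjectName pattern and attribute 
names are from memory, so please treat them as assumptions:

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public final class RingBufferStats {

    /**
     * Prints the size and remaining capacity of each async logger ring buffer.
     * Meant to be called from inside the application (e.g. a diagnostic
     * servlet or a scheduled task) while the load test runs.
     */
    public static void dump() throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        // The ObjectName pattern and attribute names below are my recollection
        // of the RingBufferAdmin MBean registered by log4j-core -- adjust them
        // if the query comes back empty.
        ObjectName pattern = new ObjectName(
                "org.apache.logging.log4j2:type=*,component=AsyncLoggerRingBuffer");
        for (ObjectName name : mbs.queryNames(pattern, null)) {
            long size = (Long) mbs.getAttribute(name, "BufferSize");
            long free = (Long) mbs.getAttribute(name, "RemainingCapacity");
            System.out.println(name + ": " + free + " of " + size + " slots free");
        }
    }

    private RingBufferStats() {
    }
}

The plan was to call it every few seconds from a scheduled task while ramping 
up to 3x, to see whether the buffer is actually staying full.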

If I count the number of log messages per minute, they go from 70k (1x load) to 
140k (2x) to 210k (3x) very briefly; 210k per minute works out to about 3,500 
messages per second.  Then the log volume drops to under 1k per minute for a 
few minutes before climbing back to a few thousand per minute.  There were only 
a few hundred log messages per minute when the thread dump was taken.  CPU was 
100% at the start of 3x (no surprise), then dropped to around 50% even though 
application throughput was almost zero.  Context switching was higher than I've 
ever seen it (hundreds of thousands).  It's interesting that CPU usage stayed 
that high while application throughput was so low.  Could there have been a 
large backlog of log messages still being processed?

The application did recover when I dropped the load to 1x.  CPU usage and log 
volume went back to normal.

The garbage collection logs also reflect that almost no work was being done.

We're using 
-Dlog4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector.
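
For the next round I was planning to experiment with the ring buffer size and 
the queue-full policy. My understanding of the relevant properties (taken from 
the async logger documentation; I haven't verified the names or defaults 
against 2.11.2, so corrections welcome):

-Dlog4j2.asyncLoggerRingBufferSize=1048576
    (example value; a bigger ring buffer to absorb bursts, and the size must be
     a power of two)
-Dlog4j2.asyncQueueFullPolicy=Discard
    (the default policy makes the calling thread wait when the buffer is full,
     which looks like what the parked threads above are doing)
-Dlog4j2.discardThreshold=INFO
    (only used with Discard; events at this level and less severe are dropped
     while the buffer is full)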

All of the log messages go to a single RollingFileAppender.  Disk space was 
about 2/3 used.

Another thing I commonly see in the thread dumps when this happens is a thread 
like this one blocking many others:

"http-bio-7114-exec-101" #2171 daemon prio=5 os_prio=0 tid=0x00007f2f5c083800 
nid=0x1702 runnable [0x00007f2ea7e02000]
   java.lang.Thread.State: RUNNABLE
                at java.lang.Thread.setPriority0(Native Method)
                at java.lang.Thread.setPriority(Thread.java:1095)
                at java.lang.Thread.init(Thread.java:417)
                at java.lang.Thread.init(Thread.java:349)
                at java.lang.Thread.<init>(Thread.java:499)
              ...

We have a custom multi-threaded async appender that can handle this load.

Any suggestions?



Thanks

John
