Re: Garbage-free Log4j docs preview

Remko Popma Sun, 22 May 2016 08:40:54 -0700


On Sunday, May 22, 2016 at 11:08:37 PM UTC+9, Benedict Elliott Smith wrote:
>
> It's possible I'm missing some other aspect that insulates the application 
> thread from the disruptor, but from your message it seems to me that you 
> fall into the all-too-common category of people who do not fully understand 
> the semantics of the disruptor.  Application threads (i.e. disruptor 
> producers) very much *are* affected by the wait strategy.  Under normal 
> use, *if the disruptor consumer thread is employing a blocking strategy, 
> there will be futex call to unblock it when work is provided*.  i.e. 
> whenever the queue transitions from empty to non-empty, i.e. for most 
> messages with low-to-medium logging rates.
>
> In this situation the enqueueing of a message is also* not lock free*, as 
> while futexes can in theory be lock-free, the JVM uses a lock to 
> synchronise access to the semaphore that controls its runnable state, so 
> all threads attempting to wake it up concurrently will compete for this 
> lock.
>


Well... Looks like you were right and I was wrong. 

Interesting. Apart from the dent in my pride here, this is actually good 
news. It means we can do better!

So in the worst-case scenario Async Loggers/Disruptor with the blocking 
wait strategy give performance equivalent to AsyncAppender with a 
BlockingArrayQueue. We can expect Async Loggers with Disruptor to 
outperform AsyncAppender with BlockingArrayQueue when workloads are high or 
bursts of work keep the background thread busy for some time. At least this 
is my understanding now. Or did I miss something and are there cases where 
Async Loggers with Disruptor will perform worse than AsyncAppender with 
BlockingArrayQueue?

In terms of how to improve, short-term we should probably look at using 
the PhasedBackoffWaitStrategy as the default, although there will always be 
workloads that "miss the window". The trade-off is increased CPU. We need 
to tread carefully here because users reporting high CPU is why we changed 
from the sleeping wait strategy to blocking. (The default in Log4j 2.6 will 
be timeout blocking wait to deal with a deadlock issue in Solaris.)

Long-term we should probably explore other data structures.

Exciting times. Thanks for correcting me!


>
> On 22 May 2016 at 15:07, Benedict Elliott Smith <[email protected] 
> <javascript:>> wrote:
>
>> It's possible I'm missing some other aspect that insulates the 
>> application thread from the disruptor, but from your message it seems to me 
>> that you fall into the all-too-common category of people who do not fully 
>> understand the semantics of the disruptor.  Application threads (i.e. 
>> disruptor producers) very much *are* affected by the wait strategy.  
>> Under normal use, *if the disruptor consumer thread is employing a 
>> blocking strategy, there will be futex call to unblock it when work is 
>> provided*.  i.e. whenever the queue transitions from empty to non-empty, 
>> i.e. for most messages with low-to-medium logging rates.
>>
>> In this situation the enqueueing of a message is also* not lock free*, 
>> as while futexes can in theory be lock-free, the JVM uses a lock to 
>> synchronise access to the semaphore that controls its runnable state, so 
>> all threads attempting to wake it up concurrently will compete for this 
>> lock.
>>
>>
>>
>> On 22 May 2016 at 14:42, Remko Popma <[email protected] <javascript:>> 
>> wrote:
>>
>>>
>>>
>>> On Sunday, May 22, 2016 at 6:34:13 PM UTC+9, Benedict Elliott Smith 
>>> wrote:
>>>>
>>>> Hi Remko,
>>>>
>>>> I realise that I was looking at the old log4j code without realising 
>>>> it, in which there is always a futex involved.
>>>>
>>> Which class is this? Async Loggers in Log4j 2 have always been 
>>> lock-free. What may be confusing is that Log4j 2 also has an Async
>>> *Appender*, which does not use the Disruptor but uses a JDK 
>>> BlockingArrayQueue. This appender is very similar to how Logback and Log4j 
>>> 1 do asynchronous logging. Async*Appender* (in Log4j 2, Log4j 1 and 
>>> Logback, they all use a blocking queue) is sensitive to lock contention and 
>>> does not scale with more threads.
>>>  
>>>
>>>>
>>>> However the behaviour of the disruptor that backs the new code will 
>>>> depend on the wait strategy employed (and I'll note your default is a 
>>>> blocking strategy, for which my prior statement holds*).
>>>>
>>> Perhaps the key misunderstanding is here. I think I did not explain it 
>>> well in my previous message. Let me try again.
>>>
>>> Async Loggers use the Disruptor and are lock-free. This means that 
>>> multiple application threads can log concurrently and are not blocked by 
>>> each other. This is true for very low workloads, medium workloads and high 
>>> workloads. 
>>>
>>> Only if an application logs messages at a very high rate for a long 
>>> period of time, or uses a slow appender, the Disruptor ringbuffer can fill 
>>> up and application threads will have to wait until a slot in the ringbuffer 
>>> becomes available. For the 2.6 release I have added text to the 
>>> Trade-offs section 
>>> <http://home.apache.org/~rpopma/log4j/2.6/manual/async.html#Trade-offs> 
>>> to explain this last point, based on your feedback.
>>>
>>> Note that the Disruptor wait strategy is irrelevant to all this. 
>>> Application threads are *not *impacted by the wait strategy.
>>>
>>>
>>>> Here we obviously get into the territory of people needing to 
>>>> understand the disruptor as well as your logging framework, but since you 
>>>> have a whole section labelled "Trade-offs" in which you draw attention to 
>>>> the improved throughput and latency profile under "Benefits," under 
>>>> "Drawbacks" it might be nice to mention these possible confounders.  Or at 
>>>> least raise a bat signal to go and understand the tradeoffs for the 
>>>> disruptor (which is also regrettably poorly understood).  
>>>>
>>>> It might also be helpful to explicitly call out the configuration used 
>>>> for your benchmarks, 
>>>>
>>> Tests on the 2.6 performance page 
>>> <http://home.apache.org/~rpopma/log4j/2.6/performance.html> are mostly 
>>> done with JMH benchmarks where parameters are in the javadoc.
>>> Tests on the Async Loggers page are partly with the new ResponseTimeTest 
>>>
>>> <https://github.com/apache/logging-log4j2/blob/6ddc0250cb6ebbf5f05d3026a976bc8ca0a8f162/log4j-core/src/test/java/org/apache/logging/log4j/core/async/perftest/ResponseTimeTest.java>class
>>>  
>>> (params in the doc 
>>> <http://home.apache.org/~rpopma/log4j/2.6/manual/async.html#Latency>) 
>>> and partly still shows results from the older PerfTestDriver 
>>> <https://github.com/apache/logging-log4j2/blob/6ddc0250cb6ebbf5f05d3026a976bc8ca0a8f162/log4j-core/src/test/java/org/apache/logging/log4j/core/async/perftest/PerfTestDriver.java>
>>>  
>>> class. This last one has parameters in the code. You'll need to spend some 
>>> time to get familiar with it if you want to run it.
>>>
>>> and perhaps to run a comparison against some real software - github does 
>>>> not lack for applications using log4j!  Unfortunately I bet this would 
>>>> result in a much less sexy graph.
>>>>
>>> Log4j 2 has had lock-free Async Loggers from its 2.0-beta5 release 
>>> <http://logging.apache.org/log4j/2.x/changes-report.html#a2.0-beta5>, 
>>> three years ago. If there are any serious performance drawbacks like you 
>>> seem to think there are I hope I would have heard of them. Fortunately the 
>>> opposite is true. People report very positive experiences 
>>> <http://tech.finn.no/2014/07/01/log4j2-in-production-making-it-fly/>. 
>>> People who did their own measurements reported it makes their logging as 
>>> fast as if it was switched off 
>>> <http://blogs.mulesoft.com/dev/mule-dev/mule-3-6-asynchronous-logging/>.
>>>
>>>
>>>> Finally, I really think people should (in your wiki page) explicitly be 
>>>> *discouraged* from using this part of your framework *unless they know 
>>>> they need it*.  It's a fantastic piece of work for people who *do* need 
>>>> it.  
>>>>
>>> Again, I think you are laboring under a misapprehension. I cannot 
>>> imagine a scenario where lock-free logging would be a bad thing. If you 
>>> think there is, please show evidence.
>>>  
>>>
>>>> But 99.99% of people would be better off with a logger that just avoids 
>>>> blocking threads when one is already in the process of logging. 
>>>>
>>> Avoiding blocking is exactly what the Async Loggers are for. (So I 
>>> really think there is some misunderstanding somewhere.)
>>> Based on your feedback, I've updated 
>>> <http://home.apache.org/~rpopma/log4j/2.6/manual/async.html> the side 
>>> nav menu and page title for the 2.6 release to clarify that Async Loggers 
>>> are lock-free. I hope this will help avoid misunderstandings.
>>>  
>>>
>>>>
>>>> (*It looks like you silently ignore misconfiguration, and default to 
>>>> timeout wait strategy also, which is probably going to surprise some 
>>>> people)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 19 May 2016 at 00:34, Remko Popma <[email protected]> wrote:
>>>>
>>>>> Benedict, thanks for your feedback!
>>>>>
>>>>> With regards to filtering, global filters are already considered 
>>>>> before the event is enqueued. Filters configured on the Logger and 
>>>>> AppenderRef are applied prior to enqueueing with mixed Sync/Async 
>>>>> <http://logging.apache.org/log4j/2.x/manual/async.html#MixedSync-Async> 
>>>>> Loggers but when all loggers are async 
>>>>> <http://logging.apache.org/log4j/2.x/manual/async.html#AllAsync> 
>>>>> these filters are applied by the background thread. We can probably 
>>>>> improve 
>>>>> this, thanks for the suggestion!
>>>>>
>>>>> Your suggestion to run performance tests under lower loads is 
>>>>> reasonable and we will look at this for a future release.
>>>>> I did experiment with this a little a while back and did see larger 
>>>>> response time pauses. Perhaps others with more experience can chime in.
>>>>>
>>>>> My understanding of how the Disruptor works is that enqueueing the 
>>>>> event is lock-free, so applications threads doing the logging should not 
>>>>> experience any context switches or suffer latency from Futex calls caused 
>>>>> by logging (regardless of the workload). The background thread is another 
>>>>> story. Under moderate to low workloads the background thread may 
>>>>> (depending 
>>>>> on the wait policy) fall asleep and experience delays before waking up 
>>>>> when 
>>>>> work arrives. However, if there are enough cores to serve all threads I 
>>>>> don't see how such background thread delays can impact (cause response 
>>>>> time 
>>>>> pauses for) the application that is doing the logging. Please correct me 
>>>>> if 
>>>>> my understanding is incorrect.
>>>>>
>>>>> My thinking is that using async loggers is good for reactive 
>>>>> applications that need to respond quickly to external events. It is 
>>>>> especially useful if the application needs to deal with occasional bursts 
>>>>> of work (and accompanied bursts of logging). This is where async loggers 
>>>>> can deliver value even if the normal workload is low.
>>>>>
>>>>> Remko
>>>>>
>>>>> On Wednesday, May 18, 2016 at 2:56:18 AM UTC+9, Benedict Elliott Smith 
>>>>> wrote:
>>>>>>
>>>>>> Regrettably it seems impossible (or at least very annoying) to send 
>>>>>> to both lists simultaneously...
>>>>>>
>>>>>> On 17 May 2016 at 18:55, Benedict Elliott Smith <[email protected]> 
>>>>>> wrote:
>>>>>>
>>>>>>> Could I suggest that you run tests for latency and throughput 
>>>>>>> effects of using this in systems with only moderate-to-low numbers of 
>>>>>>> logging calls?  (i.e. the vast majority of systems!)
>>>>>>>
>>>>>>
>>>>>>> It's very likely that in such systems messages will not reach a 
>>>>>>> velocity to keep the Dispatcher running, and so logging calls may often 
>>>>>>> (or 
>>>>>>> always) involve a Futex call - which is not a trivial cost.  There will 
>>>>>>> almost certainly be systems out there with anti-ideal characteristics - 
>>>>>>> logging just often enough that these costs materially impact 
>>>>>>> throughput, 
>>>>>>> but not often enough that they suddenly disappear.  
>>>>>>>
>>>>>>> Even though a majority of systems *do not need this*, because it 
>>>>>>> "async" and the new hotness, and there are no advertised downsides, 
>>>>>>> everyone will try to use it.  It's up to those who know better to make 
>>>>>>> sure 
>>>>>>> these people are informed it isn't a free lunch.  Making sure all of 
>>>>>>> the 
>>>>>>> caveats are reported on the advertising page would be a great start IMO.
>>>>>>>
>>>>>>> Might I also separately suggest you consider filtering events prior 
>>>>>>> to placing them on the queue for processing by the dispatcher?  I've 
>>>>>>> only 
>>>>>>> briefly glanced at the code, but it seems not to be the case currently.
>>>>>>>
>>>>>>>>
>>>>>>>> On 17 May 2016 at 18:33, Martin Thompson <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Remko,
>>>>>>>>>
>>>>>>>>> I'd just like to say that it is a great service to the community 
>>>>>>>>> as a whole that someone is seriously looking at improving logging.
>>>>>>>>>
>>>>>>>>> If you keep it up you'll be putting folks like me out of a job :)
>>>>>>>>>
>>>>>>>>> Martin...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 17 May 2016 at 18:13, Remko Popma <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> First, my thanks to the many people who gave helpful advice and 
>>>>>>>>>> feedback on how to measure Log4j response time on this list some 
>>>>>>>>>> time ago.
>>>>>>>>>>
>>>>>>>>>> We're about to start the Log4j 2.6 release.
>>>>>>>>>> If anyone is interested, a preview of the garbage-free logging 
>>>>>>>>>> manual page is here: 
>>>>>>>>>> http://home.apache.org/~rpopma/log4j/2.6/manual/garbagefree.html
>>>>>>>>>> and a preview of the updated performance page is here: 
>>>>>>>>>> http://home.apache.org/~rpopma/log4j/2.6/performance.html
>>>>>>>>>>
>>>>>>>>>> Feedback welcome!
>>>>>>>>>>
>>>>>>>>>> Remko
>>>>>>>>>>
>>>>>>>>>> -- 
>>>>>>>>>> You received this message because you are subscribed to the 
>>>>>>>>>> Google Groups "mechanical-sympathy" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>>>> send an email to [email protected]
>>>>>>>>>> .
>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>>> Groups "mechanical-sympathy" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>>> send an email to [email protected].
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "mechanical-sympathy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected] 
>>> <javascript:>.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: Garbage-free Log4j docs preview

Reply via email to