Re: Garbage-free Log4j docs preview

Mikael Ståldal Mon, 23 May 2016 06:11:45 -0700

> But I don't understand why users should only configure async logging in
rare cases? Synchronous logging also involves holding a lock. Log4j 2 makes
an effort to release this lock as soon as
> possible (locking only while copying a byte array from a thread-local
buffer to the I/O device), but as Martin pointed out, in addition to the
lock contention, other things may cause response time
> pauses when writing to the I/O device.


There is also another reason for using async logging, to protect the app
from errors and unpredictable response time from a network based appender.
However, for that the ArrayBlockingQueue based AsyncAppender is sufficient.

On Mon, May 23, 2016 at 2:38 PM, Remko Popma <remko.po...@gmail.com> wrote:

>
>
> On Monday, May 23, 2016 at 2:47:05 AM UTC+9, Benedict Elliott Smith wrote:
>>
>> Your pride should be saved by your gracious recognition of the error :)
>>
>>
>> However this is the core of my point: you’re clearly an intelligent,
>> conscientious and talented developer, and yet you did not fully understand
>> these concerns.  You are writing core infrastructure many people will
>> deploy, and a majority will understand it much much less well.  So the very
>> best thing you can do is to educate them.  Doubly so if you needed
>> educating yourself!
>>
> ;-)
>
>>
>> I would make it very clear that users should only consider asynchronous
>> logging if they are having performance problems caused by their logging.
>> Only with sub-millisecond latency requirements should async be considered,
>> and below 100 micros you are entering busy-spinning territory.  Or if huge
>> message rates are being seen.  The best default is a blocking strategy, as
>> this has the closest performance profile to synchronous, and will prevent
>> the many users who deploy this without needing it from suffering any major
>> ill-effect.
>>
> I think I missed a step here. You made it plausible that under low
> workload, the Disruptor may not be better than ArrayBlockingQueue.
> Understood.
>
> But I don't understand why users should only configure async logging in
> rare cases? Synchronous logging also involves holding a lock. Log4j 2 makes
> an effort to release this lock as soon as possible (locking only while
> copying a byte array from a thread-local buffer to the I/O device), but as
> Martin pointed out, in addition to the lock contention, other things may
> cause response time pauses when writing to the I/O device.
>
> So which is "better": holding a lock to signal a
> java.util.concurrent.locks.Condition, versus holding a lock to write a byte
> array to the I/O device? The latter sounds like it has more risk of latency
> spikes. So I'm not sure I agree with your assessment that async logging has
> more risk of major ill effects.
>
> Under low loads the upside of Async Loggers may not be as high as the
> graph suggests but I don't see any real downsides either. Please let me
> know if I'm missing something.
>
>
>>
>> I would also do a performance comparison with a real application.
>> Cassandra uses log4j, and the community has an easily utilised performance
>> tool.  You could try both DEBUG and INFO level, and put up some graphs.
>> This would help users make a more realistic and informed decision.
>>
> I agree it may be interesting for users to see a case study of some sort
> with a real application. Can you point me to the tool you are referring to?
>
>
>>
>> This may all seem a little disappointing: to *reduce* the number of
>> people who might use your amazing new technology.  But there will be
>> plenty, who both need it and do not and don't read or heed the warnings.
>>
> That's okay. It is garbage-free logging that is new in the 2.6 release,
> not the Async Loggers. The Disruptor-based Async Loggers have been used in
> the wild for about three years now. The response has generally been
> positive so I'm not panicking just yet.
> I do intend to follow up though and I take your point that under low load
> this technology may not give as much benefit as under high load. (Again,
> please correct me if there is a bigger problem and I'm missing any other
> downsides/risks.)
>
>
>>
>> Regarding improvements: In my opinion, a better algorithm for this
>> majority, of low-to-moderate traffic that is not extremely latency
>> sensitive, is semi-synchronous.  By which I mean for the first log call in
>> a batch to take exclusive ownership of the output and to write its message
>> synchronously; any messages that arrive while it has ownership are passed
>> to the owner to persist, i.e. they are written asynchronously.  To ensure
>> any thread’s ownership is brief, it would impose a time-limit, after which
>> a dedicated consumer could be started if the queue is still not exhausted.
>> This would have better than synchronous behaviour for low-to-moderate
>> traffic and scale gracefully to high traffic.
>>
> This is really interesting. It reminds me of one of the co-operative
> algorithms I read about in the Art of Multiprocessor Programming (I tried
> to look it up but could not find it last night).
>
> It may be an idea to enhance the Disruptor for lower workloads: I guess
> the question is whether it is possible to incorporate a cheap check before
> calling waitStrategy.signalAllWhenBlocking() in MultiProducerSequence, and
> avoid calling that method if another thread is already signalling the
> consumer thread.
> ...and it turns out this already exists and is called
> LiteBlockingWaitStrategy. Good stuff! Perhaps all that is needed is combine
> the TimeoutBlockingWaitStrategy with this one and we can improve things
> nicely for lower workloads.
>
> Does anyone on this list have any experience with
> LiteBlockingWaitStrategy? I see it was used in projectreactor.io, would
> be interested to hear how that worked, either on list of off-list.
>
>>
>>
>> On Sunday, 22 May 2016 16:40:21 UTC+1, Remko Popma wrote:
>>>
>>>
>>>
>>> On Sunday, May 22, 2016 at 11:08:37 PM UTC+9, Benedict Elliott Smith
>>> wrote:
>>>>
>>>> It's possible I'm missing some other aspect that insulates the
>>>> application thread from the disruptor, but from your message it seems to me
>>>> that you fall into the all-too-common category of people who do not fully
>>>> understand the semantics of the disruptor.  Application threads (i.e.
>>>> disruptor producers) very much *are* affected by the wait strategy.
>>>> Under normal use, *if the disruptor consumer thread is employing a
>>>> blocking strategy, there will be futex call to unblock it when work is
>>>> provided*.  i.e. whenever the queue transitions from empty to
>>>> non-empty, i.e. for most messages with low-to-medium logging rates.
>>>>
>>>> In this situation the enqueueing of a message is also* not lock free*,
>>>> as while futexes can in theory be lock-free, the JVM uses a lock to
>>>> synchronise access to the semaphore that controls its runnable state, so
>>>> all threads attempting to wake it up concurrently will compete for this
>>>> lock.
>>>>
>>>
>>> Well... Looks like you were right and I was wrong.
>>>
>>> Interesting. Apart from the dent in my pride here, this is actually good
>>> news. It means we can do better!
>>>
>>> So in the worst-case scenario Async Loggers/Disruptor with the blocking
>>> wait strategy give performance equivalent to AsyncAppender with a
>>> BlockingArrayQueue. We can expect Async Loggers with Disruptor to
>>> outperform AsyncAppender with BlockingArrayQueue when workloads are high or
>>> bursts of work keep the background thread busy for some time. At least this
>>> is my understanding now. Or did I miss something and are there cases where
>>> Async Loggers with Disruptor will perform worse than AsyncAppender with
>>> BlockingArrayQueue?
>>>
>>> In terms of how to improve, short-term we should probably look at using
>>> the PhasedBackoffWaitStrategy as the default, although there will always be
>>> workloads that "miss the window". The trade-off is increased CPU. We need
>>> to tread carefully here because users reporting high CPU is why we changed
>>> from the sleeping wait strategy to blocking. (The default in Log4j 2.6 will
>>> be timeout blocking wait to deal with a deadlock issue in Solaris.)
>>>
>>> Long-term we should probably explore other data structures.
>>>
>>> Exciting times. Thanks for correcting me!
>>>
>>>
>>>>
>>>> On 22 May 2016 at 15:07, Benedict Elliott Smith <bene...@apache.org>
>>>> wrote:
>>>>
>>>>> It's possible I'm missing some other aspect that insulates the
>>>>> application thread from the disruptor, but from your message it seems to 
>>>>> me
>>>>> that you fall into the all-too-common category of people who do not fully
>>>>> understand the semantics of the disruptor.  Application threads (i.e.
>>>>> disruptor producers) very much *are* affected by the wait strategy.
>>>>> Under normal use, *if the disruptor consumer thread is employing a
>>>>> blocking strategy, there will be futex call to unblock it when work is
>>>>> provided*.  i.e. whenever the queue transitions from empty to
>>>>> non-empty, i.e. for most messages with low-to-medium logging rates.
>>>>>
>>>>> In this situation the enqueueing of a message is also* not lock free*,
>>>>> as while futexes can in theory be lock-free, the JVM uses a lock to
>>>>> synchronise access to the semaphore that controls its runnable state, so
>>>>> all threads attempting to wake it up concurrently will compete for this
>>>>> lock.
>>>>>
>>>>>
>>>>>
>>>>> On 22 May 2016 at 14:42, Remko Popma <remko...@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Sunday, May 22, 2016 at 6:34:13 PM UTC+9, Benedict Elliott Smith
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Remko,
>>>>>>>
>>>>>>> I realise that I was looking at the old log4j code without realising
>>>>>>> it, in which there is always a futex involved.
>>>>>>>
>>>>>> Which class is this? Async Loggers in Log4j 2 have always been
>>>>>> lock-free. What may be confusing is that Log4j 2 also has an Async
>>>>>> *Appender*, which does not use the Disruptor but uses a JDK
>>>>>> BlockingArrayQueue. This appender is very similar to how Logback and 
>>>>>> Log4j
>>>>>> 1 do asynchronous logging. Async*Appender* (in Log4j 2, Log4j 1 and
>>>>>> Logback, they all use a blocking queue) is sensitive to lock contention 
>>>>>> and
>>>>>> does not scale with more threads.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> However the behaviour of the disruptor that backs the new code will
>>>>>>> depend on the wait strategy employed (and I'll note your default is a
>>>>>>> blocking strategy, for which my prior statement holds*).
>>>>>>>
>>>>>> Perhaps the key misunderstanding is here. I think I did not explain
>>>>>> it well in my previous message. Let me try again.
>>>>>>
>>>>>> Async Loggers use the Disruptor and are lock-free. This means that
>>>>>> multiple application threads can log concurrently and are not blocked by
>>>>>> each other. This is true for very low workloads, medium workloads and 
>>>>>> high
>>>>>> workloads.
>>>>>>
>>>>>> Only if an application logs messages at a very high rate for a long
>>>>>> period of time, or uses a slow appender, the Disruptor ringbuffer can 
>>>>>> fill
>>>>>> up and application threads will have to wait until a slot in the 
>>>>>> ringbuffer
>>>>>> becomes available. For the 2.6 release I have added text to the
>>>>>> Trade-offs section
>>>>>> <http://home.apache.org/~rpopma/log4j/2.6/manual/async.html#Trade-offs>
>>>>>> to explain this last point, based on your feedback.
>>>>>>
>>>>>> Note that the Disruptor wait strategy is irrelevant to all this.
>>>>>> Application threads are *not *impacted by the wait strategy.
>>>>>>
>>>>>>
>>>>>>> Here we obviously get into the territory of people needing to
>>>>>>> understand the disruptor as well as your logging framework, but since 
>>>>>>> you
>>>>>>> have a whole section labelled "Trade-offs" in which you draw attention 
>>>>>>> to
>>>>>>> the improved throughput and latency profile under "Benefits," under
>>>>>>> "Drawbacks" it might be nice to mention these possible confounders.  Or 
>>>>>>> at
>>>>>>> least raise a bat signal to go and understand the tradeoffs for the
>>>>>>> disruptor (which is also regrettably poorly understood).
>>>>>>>
>>>>>>> It might also be helpful to explicitly call out the configuration
>>>>>>> used for your benchmarks,
>>>>>>>
>>>>>> Tests on the 2.6 performance page
>>>>>> <http://home.apache.org/~rpopma/log4j/2.6/performance.html> are
>>>>>> mostly done with JMH benchmarks where parameters are in the javadoc.
>>>>>> Tests on the Async Loggers page are partly with the new ResponseTimeTest
>>>>>>
>>>>>> <https://github.com/apache/logging-log4j2/blob/6ddc0250cb6ebbf5f05d3026a976bc8ca0a8f162/log4j-core/src/test/java/org/apache/logging/log4j/core/async/perftest/ResponseTimeTest.java>class
>>>>>> (params in the doc
>>>>>> <http://home.apache.org/~rpopma/log4j/2.6/manual/async.html#Latency>)
>>>>>> and partly still shows results from the older PerfTestDriver
>>>>>> <https://github.com/apache/logging-log4j2/blob/6ddc0250cb6ebbf5f05d3026a976bc8ca0a8f162/log4j-core/src/test/java/org/apache/logging/log4j/core/async/perftest/PerfTestDriver.java>
>>>>>> class. This last one has parameters in the code. You'll need to spend 
>>>>>> some
>>>>>> time to get familiar with it if you want to run it.
>>>>>>
>>>>>> and perhaps to run a comparison against some real software - github
>>>>>>> does not lack for applications using log4j!  Unfortunately I bet this 
>>>>>>> would
>>>>>>> result in a much less sexy graph.
>>>>>>>
>>>>>> Log4j 2 has had lock-free Async Loggers from its 2.0-beta5 release
>>>>>> <http://logging.apache.org/log4j/2.x/changes-report.html#a2.0-beta5>,
>>>>>> three years ago. If there are any serious performance drawbacks like you
>>>>>> seem to think there are I hope I would have heard of them. Fortunately 
>>>>>> the
>>>>>> opposite is true. People report very positive experiences
>>>>>> <http://tech.finn.no/2014/07/01/log4j2-in-production-making-it-fly/>.
>>>>>> People who did their own measurements reported it makes their logging as
>>>>>> fast as if it was switched off
>>>>>> <http://blogs.mulesoft.com/dev/mule-dev/mule-3-6-asynchronous-logging/>
>>>>>> .
>>>>>>
>>>>>>
>>>>>>> Finally, I really think people should (in your wiki page) explicitly
>>>>>>> be *discouraged* from using this part of your framework *unless
>>>>>>> they know they need it*.  It's a fantastic piece of work for people
>>>>>>> who *do* need it.
>>>>>>>
>>>>>> Again, I think you are laboring under a misapprehension. I cannot
>>>>>> imagine a scenario where lock-free logging would be a bad thing. If you
>>>>>> think there is, please show evidence.
>>>>>>
>>>>>>
>>>>>>> But 99.99% of people would be better off with a logger that just
>>>>>>> avoids blocking threads when one is already in the process of logging.
>>>>>>>
>>>>>> Avoiding blocking is exactly what the Async Loggers are for. (So I
>>>>>> really think there is some misunderstanding somewhere.)
>>>>>> Based on your feedback, I've updated
>>>>>> <http://home.apache.org/~rpopma/log4j/2.6/manual/async.html> the
>>>>>> side nav menu and page title for the 2.6 release to clarify that Async
>>>>>> Loggers are lock-free. I hope this will help avoid misunderstandings.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> (*It looks like you silently ignore misconfiguration, and default to
>>>>>>> timeout wait strategy also, which is probably going to surprise some 
>>>>>>> people)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 19 May 2016 at 00:34, Remko Popma <remko...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Benedict, thanks for your feedback!
>>>>>>>>
>>>>>>>> With regards to filtering, global filters are already considered
>>>>>>>> before the event is enqueued. Filters configured on the Logger and
>>>>>>>> AppenderRef are applied prior to enqueueing with mixed Sync/Async
>>>>>>>> <http://logging.apache.org/log4j/2.x/manual/async.html#MixedSync-Async>
>>>>>>>> Loggers but when all loggers are async
>>>>>>>> <http://logging.apache.org/log4j/2.x/manual/async.html#AllAsync>
>>>>>>>> these filters are applied by the background thread. We can probably 
>>>>>>>> improve
>>>>>>>> this, thanks for the suggestion!
>>>>>>>>
>>>>>>>> Your suggestion to run performance tests under lower loads is
>>>>>>>> reasonable and we will look at this for a future release.
>>>>>>>> I did experiment with this a little a while back and did see larger
>>>>>>>> response time pauses. Perhaps others with more experience can chime in.
>>>>>>>>
>>>>>>>> My understanding of how the Disruptor works is that enqueueing the
>>>>>>>> event is lock-free, so applications threads doing the logging should 
>>>>>>>> not
>>>>>>>> experience any context switches or suffer latency from Futex calls 
>>>>>>>> caused
>>>>>>>> by logging (regardless of the workload). The background thread is 
>>>>>>>> another
>>>>>>>> story. Under moderate to low workloads the background thread may 
>>>>>>>> (depending
>>>>>>>> on the wait policy) fall asleep and experience delays before waking up 
>>>>>>>> when
>>>>>>>> work arrives. However, if there are enough cores to serve all threads I
>>>>>>>> don't see how such background thread delays can impact (cause response 
>>>>>>>> time
>>>>>>>> pauses for) the application that is doing the logging. Please correct 
>>>>>>>> me if
>>>>>>>> my understanding is incorrect.
>>>>>>>>
>>>>>>>> My thinking is that using async loggers is good for reactive
>>>>>>>> applications that need to respond quickly to external events. It is
>>>>>>>> especially useful if the application needs to deal with occasional 
>>>>>>>> bursts
>>>>>>>> of work (and accompanied bursts of logging). This is where async 
>>>>>>>> loggers
>>>>>>>> can deliver value even if the normal workload is low.
>>>>>>>>
>>>>>>>> Remko
>>>>>>>>
>>>>>>>> On Wednesday, May 18, 2016 at 2:56:18 AM UTC+9, Benedict Elliott
>>>>>>>> Smith wrote:
>>>>>>>>>
>>>>>>>>> Regrettably it seems impossible (or at least very annoying) to
>>>>>>>>> send to both lists simultaneously...
>>>>>>>>>
>>>>>>>>> On 17 May 2016 at 18:55, Benedict Elliott Smith <
>>>>>>>>> bene...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Could I suggest that you run tests for latency and throughput
>>>>>>>>>> effects of using this in systems with only moderate-to-low numbers of
>>>>>>>>>> logging calls?  (i.e. the vast majority of systems!)
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> It's very likely that in such systems messages will not reach a
>>>>>>>>>> velocity to keep the Dispatcher running, and so logging calls may 
>>>>>>>>>> often (or
>>>>>>>>>> always) involve a Futex call - which is not a trivial cost.  There 
>>>>>>>>>> will
>>>>>>>>>> almost certainly be systems out there with anti-ideal 
>>>>>>>>>> characteristics -
>>>>>>>>>> logging just often enough that these costs materially impact 
>>>>>>>>>> throughput,
>>>>>>>>>> but not often enough that they suddenly disappear.
>>>>>>>>>>
>>>>>>>>>> Even though a majority of systems *do not need this*, because it
>>>>>>>>>> "async" and the new hotness, and there are no advertised downsides,
>>>>>>>>>> everyone will try to use it.  It's up to those who know better to 
>>>>>>>>>> make sure
>>>>>>>>>> these people are informed it isn't a free lunch.  Making sure all of 
>>>>>>>>>> the
>>>>>>>>>> caveats are reported on the advertising page would be a great start 
>>>>>>>>>> IMO.
>>>>>>>>>>
>>>>>>>>>> Might I also separately suggest you consider filtering events
>>>>>>>>>> prior to placing them on the queue for processing by the dispatcher? 
>>>>>>>>>>  I've
>>>>>>>>>> only briefly glanced at the code, but it seems not to be the case 
>>>>>>>>>> currently.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 17 May 2016 at 18:33, Martin Thompson <mjp...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Remko,
>>>>>>>>>>>>
>>>>>>>>>>>> I'd just like to say that it is a great service to the
>>>>>>>>>>>> community as a whole that someone is seriously looking at 
>>>>>>>>>>>> improving logging.
>>>>>>>>>>>>
>>>>>>>>>>>> If you keep it up you'll be putting folks like me out of a job
>>>>>>>>>>>> :)
>>>>>>>>>>>>
>>>>>>>>>>>> Martin...
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 17 May 2016 at 18:13, Remko Popma <remko...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> First, my thanks to the many people who gave helpful advice
>>>>>>>>>>>>> and feedback on how to measure Log4j response time on this list 
>>>>>>>>>>>>> some time
>>>>>>>>>>>>> ago.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We're about to start the Log4j 2.6 release.
>>>>>>>>>>>>> If anyone is interested, a preview of the garbage-free logging
>>>>>>>>>>>>> manual page is here:
>>>>>>>>>>>>> http://home.apache.org/~rpopma/log4j/2.6/manual/garbagefree.html
>>>>>>>>>>>>> and a preview of the updated performance page is here:
>>>>>>>>>>>>> http://home.apache.org/~rpopma/log4j/2.6/performance.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> Feedback welcome!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Remko
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>>> Google Groups "mechanical-sympathy" group.
>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>>>> it, send an email to
>>>>>>>>>>>>> mechanical-sympathy+unsubscr...@googlegroups.com.
>>>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>> Google Groups "mechanical-sympathy" group.
>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>>> it, send an email to
>>>>>>>>>>>> mechanical-sympathy+unsubscr...@googlegroups.com.
>>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "mechanical-sympathy" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to mechanical-sympathy+unsubscr...@googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>>>>
>>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: log4j-dev-unsubscr...@logging.apache.org
> For additional commands, e-mail: log4j-dev-h...@logging.apache.org
>



-- 
[image: MagineTV]

*Mikael Ståldal*
Senior software developer

*Magine TV*
mikael.stal...@magine.com
Grev Turegatan 3  | 114 46 Stockholm, Sweden  |   www.magine.com

Privileged and/or Confidential Information may be contained in this
message. If you are not the addressee indicated in this message
(or responsible for delivery of the message to such a person), you may not
copy or deliver this message to anyone. In such case,
you should destroy this message and kindly notify the sender by reply
email.

Re: Garbage-free Log4j docs preview

Reply via email to