> But I don't understand why users should only configure async logging in rare cases? Synchronous logging also involves holding a lock. Log4j 2 makes an effort to release this lock as soon as > possible (locking only while copying a byte array from a thread-local buffer to the I/O device), but as Martin pointed out, in addition to the lock contention, other things may cause response time > pauses when writing to the I/O device.
There is also another reason for using async logging, to protect the app from errors and unpredictable response time from a network based appender. However, for that the ArrayBlockingQueue based AsyncAppender is sufficient. On Mon, May 23, 2016 at 2:38 PM, Remko Popma <[email protected]> wrote: > > > On Monday, May 23, 2016 at 2:47:05 AM UTC+9, Benedict Elliott Smith wrote: >> >> Your pride should be saved by your gracious recognition of the error :) >> >> >> However this is the core of my point: you’re clearly an intelligent, >> conscientious and talented developer, and yet you did not fully understand >> these concerns. You are writing core infrastructure many people will >> deploy, and a majority will understand it much much less well. So the very >> best thing you can do is to educate them. Doubly so if you needed >> educating yourself! >> > ;-) > >> >> I would make it very clear that users should only consider asynchronous >> logging if they are having performance problems caused by their logging. >> Only with sub-millisecond latency requirements should async be considered, >> and below 100 micros you are entering busy-spinning territory. Or if huge >> message rates are being seen. The best default is a blocking strategy, as >> this has the closest performance profile to synchronous, and will prevent >> the many users who deploy this without needing it from suffering any major >> ill-effect. >> > I think I missed a step here. You made it plausible that under low > workload, the Disruptor may not be better than ArrayBlockingQueue. > Understood. > > But I don't understand why users should only configure async logging in > rare cases? Synchronous logging also involves holding a lock. Log4j 2 makes > an effort to release this lock as soon as possible (locking only while > copying a byte array from a thread-local buffer to the I/O device), but as > Martin pointed out, in addition to the lock contention, other things may > cause response time pauses when writing to the I/O device. > > So which is "better": holding a lock to signal a > java.util.concurrent.locks.Condition, versus holding a lock to write a byte > array to the I/O device? The latter sounds like it has more risk of latency > spikes. So I'm not sure I agree with your assessment that async logging has > more risk of major ill effects. > > Under low loads the upside of Async Loggers may not be as high as the > graph suggests but I don't see any real downsides either. Please let me > know if I'm missing something. > > >> >> I would also do a performance comparison with a real application. >> Cassandra uses log4j, and the community has an easily utilised performance >> tool. You could try both DEBUG and INFO level, and put up some graphs. >> This would help users make a more realistic and informed decision. >> > I agree it may be interesting for users to see a case study of some sort > with a real application. Can you point me to the tool you are referring to? > > >> >> This may all seem a little disappointing: to *reduce* the number of >> people who might use your amazing new technology. But there will be >> plenty, who both need it and do not and don't read or heed the warnings. >> > That's okay. It is garbage-free logging that is new in the 2.6 release, > not the Async Loggers. The Disruptor-based Async Loggers have been used in > the wild for about three years now. The response has generally been > positive so I'm not panicking just yet. > I do intend to follow up though and I take your point that under low load > this technology may not give as much benefit as under high load. (Again, > please correct me if there is a bigger problem and I'm missing any other > downsides/risks.) > > >> >> Regarding improvements: In my opinion, a better algorithm for this >> majority, of low-to-moderate traffic that is not extremely latency >> sensitive, is semi-synchronous. By which I mean for the first log call in >> a batch to take exclusive ownership of the output and to write its message >> synchronously; any messages that arrive while it has ownership are passed >> to the owner to persist, i.e. they are written asynchronously. To ensure >> any thread’s ownership is brief, it would impose a time-limit, after which >> a dedicated consumer could be started if the queue is still not exhausted. >> This would have better than synchronous behaviour for low-to-moderate >> traffic and scale gracefully to high traffic. >> > This is really interesting. It reminds me of one of the co-operative > algorithms I read about in the Art of Multiprocessor Programming (I tried > to look it up but could not find it last night). > > It may be an idea to enhance the Disruptor for lower workloads: I guess > the question is whether it is possible to incorporate a cheap check before > calling waitStrategy.signalAllWhenBlocking() in MultiProducerSequence, and > avoid calling that method if another thread is already signalling the > consumer thread. > ...and it turns out this already exists and is called > LiteBlockingWaitStrategy. Good stuff! Perhaps all that is needed is combine > the TimeoutBlockingWaitStrategy with this one and we can improve things > nicely for lower workloads. > > Does anyone on this list have any experience with > LiteBlockingWaitStrategy? I see it was used in projectreactor.io, would > be interested to hear how that worked, either on list of off-list. > >> >> >> On Sunday, 22 May 2016 16:40:21 UTC+1, Remko Popma wrote: >>> >>> >>> >>> On Sunday, May 22, 2016 at 11:08:37 PM UTC+9, Benedict Elliott Smith >>> wrote: >>>> >>>> It's possible I'm missing some other aspect that insulates the >>>> application thread from the disruptor, but from your message it seems to me >>>> that you fall into the all-too-common category of people who do not fully >>>> understand the semantics of the disruptor. Application threads (i.e. >>>> disruptor producers) very much *are* affected by the wait strategy. >>>> Under normal use, *if the disruptor consumer thread is employing a >>>> blocking strategy, there will be futex call to unblock it when work is >>>> provided*. i.e. whenever the queue transitions from empty to >>>> non-empty, i.e. for most messages with low-to-medium logging rates. >>>> >>>> In this situation the enqueueing of a message is also* not lock free*, >>>> as while futexes can in theory be lock-free, the JVM uses a lock to >>>> synchronise access to the semaphore that controls its runnable state, so >>>> all threads attempting to wake it up concurrently will compete for this >>>> lock. >>>> >>> >>> Well... Looks like you were right and I was wrong. >>> >>> Interesting. Apart from the dent in my pride here, this is actually good >>> news. It means we can do better! >>> >>> So in the worst-case scenario Async Loggers/Disruptor with the blocking >>> wait strategy give performance equivalent to AsyncAppender with a >>> BlockingArrayQueue. We can expect Async Loggers with Disruptor to >>> outperform AsyncAppender with BlockingArrayQueue when workloads are high or >>> bursts of work keep the background thread busy for some time. At least this >>> is my understanding now. Or did I miss something and are there cases where >>> Async Loggers with Disruptor will perform worse than AsyncAppender with >>> BlockingArrayQueue? >>> >>> In terms of how to improve, short-term we should probably look at using >>> the PhasedBackoffWaitStrategy as the default, although there will always be >>> workloads that "miss the window". The trade-off is increased CPU. We need >>> to tread carefully here because users reporting high CPU is why we changed >>> from the sleeping wait strategy to blocking. (The default in Log4j 2.6 will >>> be timeout blocking wait to deal with a deadlock issue in Solaris.) >>> >>> Long-term we should probably explore other data structures. >>> >>> Exciting times. Thanks for correcting me! >>> >>> >>>> >>>> On 22 May 2016 at 15:07, Benedict Elliott Smith <[email protected]> >>>> wrote: >>>> >>>>> It's possible I'm missing some other aspect that insulates the >>>>> application thread from the disruptor, but from your message it seems to >>>>> me >>>>> that you fall into the all-too-common category of people who do not fully >>>>> understand the semantics of the disruptor. Application threads (i.e. >>>>> disruptor producers) very much *are* affected by the wait strategy. >>>>> Under normal use, *if the disruptor consumer thread is employing a >>>>> blocking strategy, there will be futex call to unblock it when work is >>>>> provided*. i.e. whenever the queue transitions from empty to >>>>> non-empty, i.e. for most messages with low-to-medium logging rates. >>>>> >>>>> In this situation the enqueueing of a message is also* not lock free*, >>>>> as while futexes can in theory be lock-free, the JVM uses a lock to >>>>> synchronise access to the semaphore that controls its runnable state, so >>>>> all threads attempting to wake it up concurrently will compete for this >>>>> lock. >>>>> >>>>> >>>>> >>>>> On 22 May 2016 at 14:42, Remko Popma <[email protected]> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Sunday, May 22, 2016 at 6:34:13 PM UTC+9, Benedict Elliott Smith >>>>>> wrote: >>>>>>> >>>>>>> Hi Remko, >>>>>>> >>>>>>> I realise that I was looking at the old log4j code without realising >>>>>>> it, in which there is always a futex involved. >>>>>>> >>>>>> Which class is this? Async Loggers in Log4j 2 have always been >>>>>> lock-free. What may be confusing is that Log4j 2 also has an Async >>>>>> *Appender*, which does not use the Disruptor but uses a JDK >>>>>> BlockingArrayQueue. This appender is very similar to how Logback and >>>>>> Log4j >>>>>> 1 do asynchronous logging. Async*Appender* (in Log4j 2, Log4j 1 and >>>>>> Logback, they all use a blocking queue) is sensitive to lock contention >>>>>> and >>>>>> does not scale with more threads. >>>>>> >>>>>> >>>>>>> >>>>>>> However the behaviour of the disruptor that backs the new code will >>>>>>> depend on the wait strategy employed (and I'll note your default is a >>>>>>> blocking strategy, for which my prior statement holds*). >>>>>>> >>>>>> Perhaps the key misunderstanding is here. I think I did not explain >>>>>> it well in my previous message. Let me try again. >>>>>> >>>>>> Async Loggers use the Disruptor and are lock-free. This means that >>>>>> multiple application threads can log concurrently and are not blocked by >>>>>> each other. This is true for very low workloads, medium workloads and >>>>>> high >>>>>> workloads. >>>>>> >>>>>> Only if an application logs messages at a very high rate for a long >>>>>> period of time, or uses a slow appender, the Disruptor ringbuffer can >>>>>> fill >>>>>> up and application threads will have to wait until a slot in the >>>>>> ringbuffer >>>>>> becomes available. For the 2.6 release I have added text to the >>>>>> Trade-offs section >>>>>> <http://home.apache.org/~rpopma/log4j/2.6/manual/async.html#Trade-offs> >>>>>> to explain this last point, based on your feedback. >>>>>> >>>>>> Note that the Disruptor wait strategy is irrelevant to all this. >>>>>> Application threads are *not *impacted by the wait strategy. >>>>>> >>>>>> >>>>>>> Here we obviously get into the territory of people needing to >>>>>>> understand the disruptor as well as your logging framework, but since >>>>>>> you >>>>>>> have a whole section labelled "Trade-offs" in which you draw attention >>>>>>> to >>>>>>> the improved throughput and latency profile under "Benefits," under >>>>>>> "Drawbacks" it might be nice to mention these possible confounders. Or >>>>>>> at >>>>>>> least raise a bat signal to go and understand the tradeoffs for the >>>>>>> disruptor (which is also regrettably poorly understood). >>>>>>> >>>>>>> It might also be helpful to explicitly call out the configuration >>>>>>> used for your benchmarks, >>>>>>> >>>>>> Tests on the 2.6 performance page >>>>>> <http://home.apache.org/~rpopma/log4j/2.6/performance.html> are >>>>>> mostly done with JMH benchmarks where parameters are in the javadoc. >>>>>> Tests on the Async Loggers page are partly with the new ResponseTimeTest >>>>>> >>>>>> <https://github.com/apache/logging-log4j2/blob/6ddc0250cb6ebbf5f05d3026a976bc8ca0a8f162/log4j-core/src/test/java/org/apache/logging/log4j/core/async/perftest/ResponseTimeTest.java>class >>>>>> (params in the doc >>>>>> <http://home.apache.org/~rpopma/log4j/2.6/manual/async.html#Latency>) >>>>>> and partly still shows results from the older PerfTestDriver >>>>>> <https://github.com/apache/logging-log4j2/blob/6ddc0250cb6ebbf5f05d3026a976bc8ca0a8f162/log4j-core/src/test/java/org/apache/logging/log4j/core/async/perftest/PerfTestDriver.java> >>>>>> class. This last one has parameters in the code. You'll need to spend >>>>>> some >>>>>> time to get familiar with it if you want to run it. >>>>>> >>>>>> and perhaps to run a comparison against some real software - github >>>>>>> does not lack for applications using log4j! Unfortunately I bet this >>>>>>> would >>>>>>> result in a much less sexy graph. >>>>>>> >>>>>> Log4j 2 has had lock-free Async Loggers from its 2.0-beta5 release >>>>>> <http://logging.apache.org/log4j/2.x/changes-report.html#a2.0-beta5>, >>>>>> three years ago. If there are any serious performance drawbacks like you >>>>>> seem to think there are I hope I would have heard of them. Fortunately >>>>>> the >>>>>> opposite is true. People report very positive experiences >>>>>> <http://tech.finn.no/2014/07/01/log4j2-in-production-making-it-fly/>. >>>>>> People who did their own measurements reported it makes their logging as >>>>>> fast as if it was switched off >>>>>> <http://blogs.mulesoft.com/dev/mule-dev/mule-3-6-asynchronous-logging/> >>>>>> . >>>>>> >>>>>> >>>>>>> Finally, I really think people should (in your wiki page) explicitly >>>>>>> be *discouraged* from using this part of your framework *unless >>>>>>> they know they need it*. It's a fantastic piece of work for people >>>>>>> who *do* need it. >>>>>>> >>>>>> Again, I think you are laboring under a misapprehension. I cannot >>>>>> imagine a scenario where lock-free logging would be a bad thing. If you >>>>>> think there is, please show evidence. >>>>>> >>>>>> >>>>>>> But 99.99% of people would be better off with a logger that just >>>>>>> avoids blocking threads when one is already in the process of logging. >>>>>>> >>>>>> Avoiding blocking is exactly what the Async Loggers are for. (So I >>>>>> really think there is some misunderstanding somewhere.) >>>>>> Based on your feedback, I've updated >>>>>> <http://home.apache.org/~rpopma/log4j/2.6/manual/async.html> the >>>>>> side nav menu and page title for the 2.6 release to clarify that Async >>>>>> Loggers are lock-free. I hope this will help avoid misunderstandings. >>>>>> >>>>>> >>>>>>> >>>>>>> (*It looks like you silently ignore misconfiguration, and default to >>>>>>> timeout wait strategy also, which is probably going to surprise some >>>>>>> people) >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 19 May 2016 at 00:34, Remko Popma <[email protected]> wrote: >>>>>>> >>>>>>>> Benedict, thanks for your feedback! >>>>>>>> >>>>>>>> With regards to filtering, global filters are already considered >>>>>>>> before the event is enqueued. Filters configured on the Logger and >>>>>>>> AppenderRef are applied prior to enqueueing with mixed Sync/Async >>>>>>>> <http://logging.apache.org/log4j/2.x/manual/async.html#MixedSync-Async> >>>>>>>> Loggers but when all loggers are async >>>>>>>> <http://logging.apache.org/log4j/2.x/manual/async.html#AllAsync> >>>>>>>> these filters are applied by the background thread. We can probably >>>>>>>> improve >>>>>>>> this, thanks for the suggestion! >>>>>>>> >>>>>>>> Your suggestion to run performance tests under lower loads is >>>>>>>> reasonable and we will look at this for a future release. >>>>>>>> I did experiment with this a little a while back and did see larger >>>>>>>> response time pauses. Perhaps others with more experience can chime in. >>>>>>>> >>>>>>>> My understanding of how the Disruptor works is that enqueueing the >>>>>>>> event is lock-free, so applications threads doing the logging should >>>>>>>> not >>>>>>>> experience any context switches or suffer latency from Futex calls >>>>>>>> caused >>>>>>>> by logging (regardless of the workload). The background thread is >>>>>>>> another >>>>>>>> story. Under moderate to low workloads the background thread may >>>>>>>> (depending >>>>>>>> on the wait policy) fall asleep and experience delays before waking up >>>>>>>> when >>>>>>>> work arrives. However, if there are enough cores to serve all threads I >>>>>>>> don't see how such background thread delays can impact (cause response >>>>>>>> time >>>>>>>> pauses for) the application that is doing the logging. Please correct >>>>>>>> me if >>>>>>>> my understanding is incorrect. >>>>>>>> >>>>>>>> My thinking is that using async loggers is good for reactive >>>>>>>> applications that need to respond quickly to external events. It is >>>>>>>> especially useful if the application needs to deal with occasional >>>>>>>> bursts >>>>>>>> of work (and accompanied bursts of logging). This is where async >>>>>>>> loggers >>>>>>>> can deliver value even if the normal workload is low. >>>>>>>> >>>>>>>> Remko >>>>>>>> >>>>>>>> On Wednesday, May 18, 2016 at 2:56:18 AM UTC+9, Benedict Elliott >>>>>>>> Smith wrote: >>>>>>>>> >>>>>>>>> Regrettably it seems impossible (or at least very annoying) to >>>>>>>>> send to both lists simultaneously... >>>>>>>>> >>>>>>>>> On 17 May 2016 at 18:55, Benedict Elliott Smith < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> Could I suggest that you run tests for latency and throughput >>>>>>>>>> effects of using this in systems with only moderate-to-low numbers of >>>>>>>>>> logging calls? (i.e. the vast majority of systems!) >>>>>>>>>> >>>>>>>>> >>>>>>>>>> It's very likely that in such systems messages will not reach a >>>>>>>>>> velocity to keep the Dispatcher running, and so logging calls may >>>>>>>>>> often (or >>>>>>>>>> always) involve a Futex call - which is not a trivial cost. There >>>>>>>>>> will >>>>>>>>>> almost certainly be systems out there with anti-ideal >>>>>>>>>> characteristics - >>>>>>>>>> logging just often enough that these costs materially impact >>>>>>>>>> throughput, >>>>>>>>>> but not often enough that they suddenly disappear. >>>>>>>>>> >>>>>>>>>> Even though a majority of systems *do not need this*, because it >>>>>>>>>> "async" and the new hotness, and there are no advertised downsides, >>>>>>>>>> everyone will try to use it. It's up to those who know better to >>>>>>>>>> make sure >>>>>>>>>> these people are informed it isn't a free lunch. Making sure all of >>>>>>>>>> the >>>>>>>>>> caveats are reported on the advertising page would be a great start >>>>>>>>>> IMO. >>>>>>>>>> >>>>>>>>>> Might I also separately suggest you consider filtering events >>>>>>>>>> prior to placing them on the queue for processing by the dispatcher? >>>>>>>>>> I've >>>>>>>>>> only briefly glanced at the code, but it seems not to be the case >>>>>>>>>> currently. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 17 May 2016 at 18:33, Martin Thompson <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Remko, >>>>>>>>>>>> >>>>>>>>>>>> I'd just like to say that it is a great service to the >>>>>>>>>>>> community as a whole that someone is seriously looking at >>>>>>>>>>>> improving logging. >>>>>>>>>>>> >>>>>>>>>>>> If you keep it up you'll be putting folks like me out of a job >>>>>>>>>>>> :) >>>>>>>>>>>> >>>>>>>>>>>> Martin... >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 17 May 2016 at 18:13, Remko Popma <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi all, >>>>>>>>>>>>> >>>>>>>>>>>>> First, my thanks to the many people who gave helpful advice >>>>>>>>>>>>> and feedback on how to measure Log4j response time on this list >>>>>>>>>>>>> some time >>>>>>>>>>>>> ago. >>>>>>>>>>>>> >>>>>>>>>>>>> We're about to start the Log4j 2.6 release. >>>>>>>>>>>>> If anyone is interested, a preview of the garbage-free logging >>>>>>>>>>>>> manual page is here: >>>>>>>>>>>>> http://home.apache.org/~rpopma/log4j/2.6/manual/garbagefree.html >>>>>>>>>>>>> and a preview of the updated performance page is here: >>>>>>>>>>>>> http://home.apache.org/~rpopma/log4j/2.6/performance.html >>>>>>>>>>>>> >>>>>>>>>>>>> Feedback welcome! >>>>>>>>>>>>> >>>>>>>>>>>>> Remko >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> You received this message because you are subscribed to the >>>>>>>>>>>>> Google Groups "mechanical-sympathy" group. >>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from >>>>>>>>>>>>> it, send an email to >>>>>>>>>>>>> [email protected]. >>>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> You received this message because you are subscribed to the >>>>>>>>>>>> Google Groups "mechanical-sympathy" group. >>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from >>>>>>>>>>>> it, send an email to >>>>>>>>>>>> [email protected]. >>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>> -- >>>>>> You received this message because you are subscribed to the Google >>>>>> Groups "mechanical-sympathy" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>> send an email to [email protected]. >>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>> >>>>> >>>>> >>>> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > -- [image: MagineTV] *Mikael Ståldal* Senior software developer *Magine TV* [email protected] Grev Turegatan 3 | 114 46 Stockholm, Sweden | www.magine.com Privileged and/or Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such a person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email.
