While that is true, it does mean that we are locking higher up in the call stack than we need to be.
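For illustration, narrowing the lock scope might look like the following. This is a hypothetical sketch — the class and method names are invented, not log4j-core's actual code — showing the difference between synchronizing a whole method and locking only the section that mutates shared state:

```java
// Hypothetical sketch, not log4j-core code: moving the lock from the whole
// method down to the section that actually touches shared state.
public class LockScope {
    private final StringBuilder shared = new StringBuilder();

    // Before: the entire method is synchronized, so per-event formatting
    // (which touches no shared state) is serialized along with the append.
    public synchronized void appendWide(String event) {
        String formatted = "[" + event + "]";
        shared.append(formatted);
    }

    // After: format outside the lock; block only while mutating the buffer.
    public void appendNarrow(String event) {
        String formatted = "[" + event + "]";
        synchronized (shared) {
            shared.append(formatted);
        }
    }

    public String contents() {
        synchronized (shared) {
            return shared.toString();
        }
    }

    public static void main(String[] args) {
        LockScope s = new LockScope();
        s.appendNarrow("hello");
        System.out.println(s.contents()); // prints [hello]
    }
}
```

Under contention, the narrow version lets formatting proceed in parallel and only serializes the actual buffer write.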
Ralph

> On Aug 26, 2021, at 11:13 AM, Tim Perry <tim.v...@gmail.com> wrote:
>
> I’m fairly certain the JIT will optimize out the locking operations on the
> nested synchronized calls after a while. I’m not sure how soon into the
> performance tests that would happen.
>
> Tim
>
>> On Aug 26, 2021, at 10:55 AM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
>>
>> So are you continuing to look at this? Frankly, the work on the zero-GC
>> stuff made this very complicated. I am not sure I can even follow the
>> logic anymore. I also noticed that many of the methods dealing with byte
>> buffers are synchronized. I am not sure why, as only the method that moves
>> data really needs to be.
>>
>> Using multiple buffers would make sense if you are filling one from the
>> various patterns while it is unlocked and then only blocking when writing
>> to the output stream buffer.
>>
>> For what it’s worth, Ceki updated his performance page again yesterday to
>> change the wording, but I am pretty sure he didn’t change the Log4j2
>> configuration to use a 256K buffer to match Logback. And the page still
>> makes it look like he is testing the performance of asynchronous
>> processing when it is really testing the performance of the file
>> appenders with a blocking queue in front of them.
>>
>> Ralph
>>
>>> On Aug 26, 2021, at 7:29 AM, Carter Kozak <cko...@ckozak.net> wrote:
>>>
>>> I also tried that with similar results, which leads me to believe we're
>>> spending too much time copying between buffers.
>>>
>>> We've proven that log4j can get LogEvents to the appender very quickly
>>> and efficiently.
>>>
>>> Once we hit PatternLayout we write data to a StringBuilder.
>>> AbstractStringBuilder used to store a char array in Java 8; however,
>>> with JEP 254 (compact strings) it was updated (similarly to String) to
>>> use a byte array plus a byte representing the encoding.
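To make the compact-strings point concrete, here is a minimal sketch (plain JDK, no log4j code) of pulling a StringBuilder's value out as chars. On JDK 9+ the builder stores bytes, so this copy involves a widening conversion rather than the flat char[]-to-char[] copy JDK 8 could do:

```java
import java.nio.CharBuffer;

public class CompactStringCopy {
    public static void main(String[] args) {
        // On JDK 9+ (JEP 254), StringBuilder stores byte[] + a coder byte,
        // so getChars() must widen Latin-1 bytes to chars instead of doing
        // the flat char[] -> char[] copy that was possible on JDK 8.
        StringBuilder sb = new StringBuilder("formatted log line");
        char[] chars = new char[sb.length()];
        sb.getChars(0, sb.length(), chars, 0);
        CharBuffer cb = CharBuffer.wrap(chars);
        System.out.println(cb.remaining()); // 18 chars ready to encode
    }
}
```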
>>> We encode the StringBuilder value into a ByteBufferDestination
>>> (Heap)ByteBuffer in StringBuilderEncoder using the following steps:
>>> 1. Copy the StringBuilder value into a CharBuffer (this would have been
>>>    a direct char[]-to-char[] copy in Java 8, but requires conversion
>>>    from byte[]+coder now).
>>> 2. Encode data from the CharBuffer to the StringBuilderEncoder
>>>    ByteBuffer.
>>> 3. Copy the contents of the StringBuilderEncoder ByteBuffer to the
>>>    ByteBufferDestination ByteBuffer (which is also always a heap buffer
>>>    as far as I can tell).
>>> 4. Write the ByteBufferDestination to the
>>>    FileOutputStream/RandomAccessFile by extracting the heap byte array.
>>>
>>> We're copying bytes around a lot, and I think older JVMs would have
>>> optimized out a bit of the copying because the types exactly matched.
>>> It's unclear why we need to copy through two separate ByteBuffers from
>>> StringBuilderEncoder into ByteBufferDestination.
>>> I've generally had better I/O performance using direct byte buffers than
>>> heap buffers, although that experience largely comes from work on web
>>> servers; I haven't tested whether it holds true for file I/O. If we
>>> operate directly upon a FileChannel, we can write direct buffers
>>> straight to files. The DirectByteBuffer memory is guaranteed to be
>>> contiguous, so it should have better performance around copying and
>>> manipulation as well.
>>>
>>> -ck
>>>
>>>> On Thu, Aug 26, 2021, at 10:00, Volkan Yazıcı wrote:
>>>> Ralph, maybe a dumb idea, but... can't we simply write to /dev/null to
>>>> avoid the hardware's influence on the test results?
>>>>
>>>> On Wed, Aug 25, 2021 at 8:05 PM Ralph Goers
>>>> <ralph.go...@dslextreme.com> wrote:
>>>>
>>>>> Did you add the no-op appender to Ceki’s project? Or are you using our
>>>>> NullAppender? I have my doubts about using that NullAppender as it
>>>>> does nothing - not even render the Layout, so it may get completely
>>>>> optimized away.
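Carter's FileChannel suggestion above can be sketched roughly as follows. This is an illustration of the approach, not a proposed patch — the class and method names are invented, and only plain JDK NIO is used. Chars are encoded straight into a direct ByteBuffer that the channel can write without an extra copy:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectBufferWrite {
    // Encode chars straight into a direct ByteBuffer and hand it to a
    // FileChannel: one encode step instead of the CharBuffer -> encoder
    // ByteBuffer -> destination ByteBuffer -> byte[] chain described above.
    static void write(Path path, String line) throws IOException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        // Direct buffer: contiguous off-heap memory the channel can write
        // without first copying into a temporary native buffer.
        ByteBuffer buf = ByteBuffer.allocateDirect(8 * 1024);
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            encoder.encode(CharBuffer.wrap(line), buf, true);
            encoder.flush(buf);
            buf.flip();
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("direct-buffer", ".log");
        write(path, "formatted log line\n");
        System.out.println(Files.readString(path));
    }
}
```

A real appender would reuse the encoder and buffer across events and handle encoder overflow; this sketch only shows the single-copy write path.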
>>>>>
>>>>> I’d still like to know what kind of disks Remko did his original
>>>>> testing on. The page says he got 18.495 million messages per second.
>>>>> Each message would be about 130 characters long from what I can tell.
>>>>> That means the destination has to be able to support at least about
>>>>> 2.4 GB per second.
>>>>>
>>>>> In Ceki’s test the message is only 23 characters long (vs 100 in
>>>>> Remko’s). That means the message size is only going to be about 60
>>>>> characters long. For me the async test is writing at about 1700
>>>>> ops/ms, which equates to 102 MB/s, yet I am still seeing the queue
>>>>> backed up. So how we are writing must be terribly inefficient. Logback
>>>>> is getting about 2000 ops/ms, which is still only 120 MB/s.
>>>>>
>>>>> Your test below would be generating about 197 MB/s.
>>>>>
>>>>> One thing I did notice that made a big difference is that Ceki has
>>>>> Logback configured to use a buffer size of 256K while Log4j2 uses the
>>>>> default of 8K. Setting Log4j2 to 256K considerably improved the
>>>>> performance.
>>>>>
>>>>> Ralph
>>>>>
>>>>>> On Aug 25, 2021, at 8:42 AM, Carter Kozak <cko...@ckozak.net> wrote:
>>>>>>
>>>>>>> If we would move the appender performance aside, am I right to
>>>>>>> conclude that the entire complex async framework built atop
>>>>>>> Disruptor with all its whistles and bells is yielding a diminishing
>>>>>>> performance gain compared to a simple off-the-shelf blocking queue?
>>>>>>
>>>>>> I haven't seen any data that would give me such an impression -- the
>>>>>> file appender itself is slower than the thread producing events,
>>>>>> which appears to limit our throughput. Even then, log4j2 does much
>>>>>> better in my testing on a Linux workstation with 20+ producer threads
>>>>>> compared to Logback.
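For reference, the buffer-size change Ralph mentions above maps to the bufferSize attribute on the Log4j2 File appender (value in bytes; 262144 = 256 KiB, versus the 8 KiB default). A sketch of what the benchmark's appender config might look like — the appender name, file name, and pattern here are illustrative, not taken from Ceki's project:

```xml
<!-- Sketch: raise the File appender's buffer from the 8 KiB default
     to 256 KiB to match the Logback configuration. -->
<File name="File" fileName="async.log" bufferedIO="true"
      bufferSize="262144" immediateFlush="false">
  <PatternLayout pattern="%m%n"/>
</File>
```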
>>>>>> Any time the queue is constantly full, performance will suffer (and
>>>>>> this applies to standard blocking queues as well as the Disruptor;
>>>>>> however, the Disruptor will fare worse when the queue is full).
>>>>>> The results using a single thread are roughly reproducible on the
>>>>>> non-async test, so I think the problem is somewhere between the
>>>>>> layout and the appender/manager.
>>>>>>
>>>>>> Results running locally with no-op appender implementations:
>>>>>>
>>>>>> 1 thread:
>>>>>>
>>>>>> Benchmark                                       Mode  Cnt     Score     Error  Units
>>>>>> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt   4  3285.743 ± 427.963  ops/ms
>>>>>> AsyncWithFileAppenderBenchmark.logbackFile      thrpt   4  2383.532 ± 136.034  ops/ms
>>>>>>
>>>>>> 20 threads:
>>>>>>
>>>>>> Benchmark                                       Mode  Cnt     Score     Error  Units
>>>>>> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt   4  1177.746 ± 919.560  ops/ms
>>>>>> AsyncWithFileAppenderBenchmark.logbackFile      thrpt   4   602.614 ±  47.485  ops/ms