I’m fairly certain the JIT will optimize out the locking operations on the nested synchronized calls after a while. I’m not sure how soon into the performance tests that would happen.

Tim
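For reference, here is a minimal sketch of the nested-locking shape being discussed; the class and method names are hypothetical, not log4j2's actual code. Both methods lock the same monitor, so once the path is hot the JIT can, after inlining, prove the inner acquisition reentrant and eliminate it:

    // Hypothetical sketch: writeBytes() holds the monitor and calls
    // ensureCapacity(), which re-enters the same monitor. After inlining,
    // the JIT can prove the inner monitorenter is redundant and drop it.
    final class BufferedDestination {
        private final byte[] buffer = new byte[8192];
        private int position;

        synchronized void writeBytes(byte[] data) {
            ensureCapacity(data.length); // reentrant lock on 'this'
            System.arraycopy(data, 0, buffer, position, data.length);
            position += data.length;
        }

        private synchronized void ensureCapacity(int length) {
            if (position + length > buffer.length) {
                position = 0; // stand-in for draining to the output stream
            }
        }
    }

How soon that kicks in depends on compilation thresholds: C2 only compiles a method after on the order of ten thousand invocations, so short runs without warmup may never see the optimized path, while JMH warmup iterations normally get past it.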
> On Aug 26, 2021, at 10:55 AM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
>
> So are you continuing to look at this? Frankly, the work on zero-gc stuff made
> this very complicated. I am not sure I can even follow the logic anymore. I also
> noticed that many of the methods dealing with byte buffers are synchronized. I am
> not sure why, as only the method that moves data really needs to be.
>
> Using multiple buffers would make sense if you are filling one from the various
> patterns while it is unlocked and then only blocking when writing to the output
> stream buffer.
>
> For what it’s worth, Ceki updated his performance page again yesterday to change
> the wording, but I am pretty sure he didn’t change the log4j2 configuration to use
> a 256K buffer to match Logback. And the page still makes it look like he is
> testing the performance of asynchronous processing when it is really testing the
> performance of the file appenders with a blocking queue in front of them.
>
> Ralph
>
>> On Aug 26, 2021, at 7:29 AM, Carter Kozak <cko...@ckozak.net> wrote:
>>
>> I also tried that with similar results, which leads me to believe we're
>> spending too much time copying between buffers.
>>
>> We've proven that log4j can get LogEvents to the appender very quickly and
>> efficiently.
>>
>> Once we hit PatternLayout we write data to a StringBuilder. AbstractStringBuilder
>> used to store a char array in Java 8; however, with JEP 254 (compact strings) it
>> was updated (similarly to String) to use a byte array plus a byte representing
>> the encoding. We encode the StringBuilder value into a ByteBufferDestination
>> (Heap)ByteBuffer in StringBuilderEncoder using the following steps:
>> 1. Copy the StringBuilder value into a CharBuffer (this would have been a direct
>>    copy from char[] to char[] in Java 8, but requires conversion from
>>    byte[]+coder now).
>> 2. Encode data from the CharBuffer into the StringBuilderEncoder ByteBuffer.
>> 3. Copy the contents of the StringBuilderEncoder ByteBuffer into the
>>    ByteBufferDestination ByteBuffer (which is also always a heap buffer as far
>>    as I can tell).
>> 4. Write the ByteBufferDestination to the FileOutputStream/RandomAccessFile by
>>    extracting the heap byte array.
>>
>> We're copying bytes around a lot, and I think older JVMs would have optimized
>> out a bit of the copying because the types exactly matched. It's unclear why we
>> need to copy through two separate ByteBuffers from StringBuilderEncoder into
>> ByteBufferDestination.
>> I've generally had better I/O performance using direct byte buffers than heap
>> buffers, although that experience largely comes from work on web servers; I
>> haven't tested whether it holds true for file I/O. If we operate directly upon a
>> FileChannel, we can write direct buffers straight to files. DirectByteBuffer
>> memory is guaranteed to be contiguous, so it should have better performance
>> around copying and manipulation as well.
>>
>> -ck
>>
>>> On Thu, Aug 26, 2021, at 10:00, Volkan Yazıcı wrote:
>>> Ralph, maybe a dumb idea, but... can't we simply write to /dev/null to avoid
>>> the hardware's influence on the test results?
>>>
>>> On Wed, Aug 25, 2021 at 8:05 PM Ralph Goers <ralph.go...@dslextreme.com>
>>> wrote:
>>>
>>>> Did you add the no-op appender to Ceki’s project? Or are you using our
>>>> NullAppender? I have my doubts about using that NullAppender as it does
>>>> nothing, not even render the Layout, so it may get completely optimized away.
>>>>
>>>> I’d still like to know what kind of disks Remko did his original testing on.
>>>> The page says he got 18.495 million messages per second. Each message would be
>>>> about 130 characters long from what I can tell. That means the destination has
>>>> to be able to support at least about 2.4 GB per second.
>>>>
>>>> In Ceki’s test the message is only 23 characters long (vs 100 in Remko’s).
>>>> That means the full message is only going to be about 60 characters. For me
>>>> the async test is writing at about 1700 ops/ms, which equates to 102 MB/s, yet
>>>> I am still seeing the queue backed up. So how we are writing must be terribly
>>>> inefficient. Logback is getting about 2000 ops/ms, which is still only
>>>> 120 MB/s.
>>>>
>>>> Your test below would be generating about 197 MB/s.
>>>>
>>>> One thing I did notice that made a big difference is that Ceki has Logback
>>>> configured to use a buffer size of 256K while Log4j2 uses the default of 8K.
>>>> Setting Log4j2 to 256K considerably improved the performance.
>>>>
>>>> Ralph
>>>>
>>>>> On Aug 25, 2021, at 8:42 AM, Carter Kozak <cko...@ckozak.net> wrote:
>>>>>
>>>>>> If we would move the appender performance aside, am I right to conclude
>>>>>> that the entire complex async framework built atop Disruptor, with all its
>>>>>> whistles and bells, is yielding a diminishing performance gain compared to
>>>>>> a simple off-the-shelf blocking queue?
>>>>>
>>>>> I haven't seen any data that would give me such an impression -- the file
>>>>> appender itself is slower than the thread producing events, which appears
>>>>> to limit our throughput. Even then, log4j2 does much better than Logback in
>>>>> my testing on a Linux workstation with 20+ producer threads.
>>>>> Any time the queue is constantly full, performance will suffer (and this
>>>>> applies to standard blocking queues as well as to the Disruptor; however,
>>>>> the Disruptor will fare worse when the queue is full).
>>>>> The results using a single thread are roughly reproducible on the non-async
>>>>> test, so I think the problem is somewhere between the layout and the
>>>>> appender/manager.
>>>>>
>>>>> Results running locally with no-op appender implementations:
>>>>>
>>>>> 1 thread:
>>>>>
>>>>> Benchmark                                        Mode  Cnt     Score     Error  Units
>>>>> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt    4  3285.743 ± 427.963  ops/ms
>>>>> AsyncWithFileAppenderBenchmark.logbackFile      thrpt    4  2383.532 ± 136.034  ops/ms
>>>>>
>>>>> 20 threads:
>>>>>
>>>>> Benchmark                                        Mode  Cnt     Score     Error  Units
>>>>> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt    4  1177.746 ± 919.560  ops/ms
>>>>> AsyncWithFileAppenderBenchmark.logbackFile      thrpt    4   602.614 ±  47.485  ops/ms
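To make Carter's buffer-copy discussion above concrete, here is a minimal sketch of the FileChannel-plus-direct-buffer approach he describes. This is an illustration under stated assumptions, not log4j2's actual StringBuilderEncoder/ByteBufferDestination code: the class and method names are made up. The idea is to encode the formatted event straight into a single direct ByteBuffer and hand that to the channel, so the heap-to-heap copies in steps 2-4 collapse into one encode plus one channel write:

    // Hypothetical sketch (names invented): encode into a single direct
    // ByteBuffer and write it through a FileChannel, instead of copying
    // StringBuilder -> CharBuffer -> heap ByteBuffer -> heap ByteBuffer
    // -> OutputStream. Error handling is omitted for brevity.
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.CharsetEncoder;
    import java.nio.charset.CoderResult;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    final class DirectChannelDestination implements AutoCloseable {
        private final FileChannel channel;
        private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        private final ByteBuffer buffer = ByteBuffer.allocateDirect(256 * 1024);

        DirectChannelDestination(Path path) throws IOException {
            channel = FileChannel.open(path, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        }

        synchronized void write(CharSequence formattedEvent) throws IOException {
            CharBuffer chars = CharBuffer.wrap(formattedEvent);
            encoder.reset();
            CoderResult result;
            do {
                // Encode directly into the destination buffer; drain when full.
                result = encoder.encode(chars, buffer, true);
                if (result.isOverflow()) {
                    flush();
                }
            } while (result.isOverflow());
        }

        synchronized void flush() throws IOException {
            buffer.flip();
            while (buffer.hasRemaining()) {
                // A direct buffer is written as-is; a heap buffer would first
                // be copied into a temporary direct buffer inside the JDK.
                channel.write(buffer);
            }
            buffer.clear();
        }

        @Override
        public synchronized void close() throws IOException {
            flush();
            channel.close();
        }
    }

One supporting detail: when a heap ByteBuffer is written to a channel, the JDK internally copies it into a temporary direct buffer first, so writing a direct buffer removes that hidden copy on top of the explicit copies listed above.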
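On Ralph's buffer-size point: bufferedIO and bufferSize are documented attributes of the Log4j2 File appender (bufferSize defaults to 8192 bytes), so matching Logback's 256K buffer in the benchmark configuration should only take a fragment along these lines. The appender name, file path, and pattern here are invented for illustration:

    <!-- Illustrative fragment only; appender name and file path are made up. -->
    <Appenders>
      <!-- bufferSize defaults to 8192 bytes; 262144 matches Logback's 256K. -->
      <File name="File" fileName="logs/perf-test.log"
            bufferedIO="true" bufferSize="262144">
        <PatternLayout pattern="%d %p %c{1.} [%t] %m%n"/>
      </File>
    </Appenders>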