So are you continuing to look at this? Frankly, the work on the zero-GC stuff 
made this very complicated. I am not sure I can even follow the logic anymore. 
I also noticed that many of the methods dealing with byte buffers are 
synchronized. I am not sure why, as only the method that moves data really 
needs to be.

Using multiple buffers would make sense if you are filling one from the 
various patterns while it is unlocked and then only blocking when writing to 
the output stream buffer.
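
Something along these lines is what I have in mind -- just an untested sketch 
with made-up names, not the existing ByteBufferDestination API:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.ByteBuffer;

    // Sketch: appends only hold a short lock for the memory copy, and the slow
    // write to the output stream happens on the swapped-out buffer so other
    // threads can keep filling the fresh one.
    final class DoubleBufferedDestination {
        private final Object fillLock = new Object(); // guards current/spare
        private final Object ioLock = new Object();   // serializes stream writes
        private final OutputStream out;
        private ByteBuffer current = ByteBuffer.allocate(256 * 1024);
        private ByteBuffer spare = ByteBuffer.allocate(256 * 1024);

        DoubleBufferedDestination(final OutputStream out) {
            this.out = out;
        }

        // Assumes len is always smaller than the buffer capacity.
        void write(final byte[] data, final int off, final int len) throws IOException {
            ByteBuffer full = null;
            synchronized (fillLock) {
                if (current.remaining() < len) {
                    if (spare != null) {
                        full = current;     // swap the full buffer out...
                        current = spare;
                        spare = null;
                    } else {
                        drain(current);     // previous drain still in flight: block here
                    }
                }
                current.put(data, off, len);
            }
            if (full != null) {
                drain(full);                // ...and drain it without blocking writers
                synchronized (fillLock) {
                    spare = full;           // hand the emptied buffer back
                }
            }
        }

        private void drain(final ByteBuffer buf) throws IOException {
            synchronized (ioLock) {
                out.write(buf.array(), 0, buf.position());
                buf.clear();
            }
        }
    }

The point is just that the only place a writer blocks for the duration of the 
actual I/O is the fallback path where both buffers are full.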

For what it's worth, Ceki updated his performance page again yesterday to 
change the wording, but I am pretty sure he didn’t change the Log4j2 
configuration to use a 256K buffer to match Logback. And the page still makes 
it look like he is testing the performance of asynchronous processing when it 
is really testing the performance of the file appenders with a blocking queue 
in front of them.

Ralph

> On Aug 26, 2021, at 7:29 AM, Carter Kozak <cko...@ckozak.net> wrote:
> 
> I also tried that with similar results, which leads me to believe we're 
> spending
> too much time copying between buffers.
> 
> We've proven that log4j can get LogEvents to the appender very quickly and
> efficiently.
> 
> Once we hit PatternLayout we write data to a StringBuilder. 
> AbstractStringBuilder used to store a char array in Java 8, however with JEP 
> 254 (Compact Strings) it was updated (similarly to String) to use a byte 
> array plus a byte representing the encoding (the coder).
> We encode the StringBuilder value into a ByteBufferDestination 
> (Heap)ByteBuffer in StringBuilderEncoder using the following steps:
> 1. copy the StringBuilder value into a CharBuffer (this would have been a 
> direct copy from char[] to char[] in Java 8, but requires conversion from 
> byte[]+coder now)
> 2. encode data from the CharBuffer to the StringBuilderEncoder ByteBuffer
> 3. copy contents of the StringBuilderEncoder ByteBuffer to the 
> ByteBufferDestination ByteBuffer (which is also always a heap buffer as far 
> as I can tell)
> 4. Write the ByteBufferDestination to the FileOutputStream/RandomAccessFile 
> by extracting the heap byte-array.
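> 
> To make the copying concrete, here is roughly what those four steps look like 
> in code -- a simplified sketch of the flow, not the actual log4j internals 
> (buffer sizes and names are made up, overflow handling omitted):
> 
>     // imports: java.io.IOException, java.io.OutputStream, java.nio.ByteBuffer,
>     //          java.nio.CharBuffer, java.nio.charset.StandardCharsets
>     static void writeEvent(StringBuilder text, OutputStream out) throws IOException {
>         // 1. StringBuilder (byte[] + coder since JEP 254) -> CharBuffer (char[])
>         CharBuffer chars = CharBuffer.allocate(text.length());
>         text.getChars(0, text.length(), chars.array(), 0);
>         // position stays 0 and limit == length, so the buffer is ready to be read
>         // 2. encode the chars into the encoder's own heap ByteBuffer
>         ByteBuffer encoded = ByteBuffer.allocate(8 * 1024);
>         StandardCharsets.UTF_8.newEncoder().encode(chars, encoded, true);
>         encoded.flip();
>         // 3. copy the encoder's ByteBuffer into the destination's heap ByteBuffer
>         ByteBuffer destination = ByteBuffer.allocate(256 * 1024);
>         destination.put(encoded);
>         // 4. extract the heap array and hand it to the stream
>         out.write(destination.array(), 0, destination.position());
>     }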
> 
> We're copying bytes around a lot, and I think older JVMs would have optimized 
> out a bit of the copying because the types exactly matched.
> It's unclear why we need to copy through two separate ByteBuffers to get from 
> the StringBuilderEncoder into the ByteBufferDestination.
> I've generally had better I/O performance using direct byte buffers than heap 
> buffers, although that experience largely comes from work on webservers, and I 
> haven't tested whether it holds true for file I/O. If we operate directly upon 
> a FileChannel, we can write direct buffers directly to files. The 
> DirectByteBuffer memory is guaranteed to be contiguous, so it should have 
> better performance around copying and manipulation as well.
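> 
> A rough sketch of what I mean, assuming the manager held a FileChannel rather 
> than an OutputStream (the method and parameter names here are made up, not an 
> existing log4j API):
> 
>     // imports: java.io.IOException, java.nio.ByteBuffer, java.nio.channels.FileChannel
>     // The buffer would be allocated once, e.g. ByteBuffer.allocateDirect(256 * 1024).
>     static void drainToFile(FileChannel channel, ByteBuffer directBuffer) throws IOException {
>         directBuffer.flip();
>         while (directBuffer.hasRemaining()) {
>             channel.write(directBuffer);  // writes from native memory, no heap byte[] extraction
>         }
>         directBuffer.clear();             // reuse the same direct buffer for the next batch
>     }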
> 
> -ck
> 
> On Thu, Aug 26, 2021, at 10:00, Volkan Yazıcı wrote:
>> Ralph, maybe a dumb idea, but can't we simply write to /dev/null to avoid the
>> hardware's influence on the test results?
>> 
>> On Wed, Aug 25, 2021 at 8:05 PM Ralph Goers <ralph.go...@dslextreme.com>
>> wrote:
>> 
>>> Did you add the no-op appender to Ceki’s project? Or are you using our
>>> NullAppender? I have
>>> my doubts about using that NullAppender as it does nothing - not even
>>> render the Layout, so
>>> it may get completely optimized away.
>>> 
>>> I’d still like to know what kind of disks Remko did his original testing on.
>>> The page says he got 18.495 million messages per second. Each message would
>>> be about 130 characters long from what I can tell. That means the destination
>>> has to be able to support at least about 2.4 GB per second.
>>> 
>>> In Ceki’s test the message is only 23 characters long (vs 100 in Remko’s).
>>> That means the formatted log line is only going to be about 60 characters
>>> long. For me the async test is writing at about 1700 ops/ms, which equates
>>> to about 102 MB/s, yet I am still seeing the queue backed up. So how we are
>>> writing must be terribly inefficient. Logback is getting about 2000 ops/ms,
>>> which is still only 120 MB/s.
>>> 
>>> Your test below would be generating about 197 MB/s (roughly 3285 ops/ms × 60
>>> bytes per message).
>>> 
>>> One thing I did notice that made a big difference is that Ceki has Logback
>>> configured to
>>> use a buffer size of 256K while Log4j2 uses the default of 8K. Setting
>>> Log4j2 to 256K
>>> considerably improved the performance.
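>>> 
>>> For reference, the knob I mean is the bufferSize attribute on the file
>>> appender -- along these lines (writing this from memory, so double check the
>>> attribute names against the manual):
>>> 
>>>     <!-- buffer size in bytes: 256K instead of the 8192 default -->
>>>     <File name="FILE" fileName="target/test.log" bufferedIO="true" bufferSize="262144">
>>>       <PatternLayout pattern="%d %p %c{1.} [%t] %m%n"/>
>>>     </File>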
>>> 
>>> 
>>> Ralph
>>> 
>>>> On Aug 25, 2021, at 8:42 AM, Carter Kozak <cko...@ckozak.net> wrote:
>>>> 
>>>>> If we would move the appender performance aside, am I right to conclude
>>>>> that the entire complex async. framework built atop Disruptor with all its
>>>>> whistles and bells is yielding a diminishing performance gain compared to a
>>>>> simple off the shelf blocking queue?
>>>> 
>>>> I haven't seen any data that would give me such an impression -- the file
>>>> appender itself is slower than the thread producing events, which appears to
>>>> limit our throughput.
>>>> Even then, log4j2 does much better in my testing on a linux workstation with
>>>> 20+ producer threads compared to logback.
>>>> Any time the queue is constantly full, performance will suffer (and this
>>>> applies to standard blocking queues as well as disruptor, however disruptor
>>>> will fare worse when the queue is full).
>>>> The results using a single thread are roughly reproducible on the non-async
>>>> test, so I think the problem is somewhere between the layout and the
>>>> appender/manager.
>>>> 
>>>> Results running locally with no-op appender implementations:
>>>> 
>>>> 1 thread:
>>>> 
>>>> Benchmark                                        Mode  Cnt     Score     Error   Units
>>>> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt    4  3285.743 ± 427.963  ops/ms
>>>> AsyncWithFileAppenderBenchmark.logbackFile      thrpt    4  2383.532 ± 136.034  ops/ms
>>>> 
>>>> 20 threads:
>>>> 
>>>> Benchmark                                        Mode  Cnt     Score     Error   Units
>>>> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt    4  1177.746 ± 919.560  ops/ms
>>>> AsyncWithFileAppenderBenchmark.logbackFile      thrpt    4   602.614 ±  47.485  ops/ms
>>> 
>>> 
>>> 
>> 

