I’m fairly certain the JIT will eventually optimize out the locking operations on
the nested synchronized calls once the code is hot, through lock coarsening and
lock elision. I’m not sure how soon into the performance tests that would happen.
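
Something like the shape below is what I have in mind. A minimal sketch (class
and method names hypothetical, not log4j’s actual code) of nested synchronized
calls on the same monitor, which the JIT can collapse once the methods are hot
and inlined:

// Hypothetical sketch of nested locking on one monitor. After inlining,
// the JIT can coarsen the two acquisitions into one, and escape analysis
// can elide the lock entirely if the instance never escapes one thread.
final class BufferedDestination {
    private final byte[] buf = new byte[8192];
    private int pos;

    synchronized void write(byte[] data) {
        ensureCapacity(data.length); // nested acquisition of the same lock
        System.arraycopy(data, 0, buf, pos, data.length);
        pos += data.length;
    }

    private synchronized void ensureCapacity(int needed) {
        if (pos + needed > buf.length) {
            pos = 0; // flush omitted for brevity
        }
    }
}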

Tim


> On Aug 26, 2021, at 10:55 AM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
> 
> So are you continuing to look at this? Frankly, the work on the zero-gc
> stuff made this very complicated. I am not sure I can even follow the logic
> anymore. I also noticed that many of the methods dealing with byte buffers
> are synchronized. I am not sure why, as only the method that actually moves
> data really needs to be.
> 
> Using multiple buffers would make sense if you are filling one from the
> various patterns while it is unlocked and then only blocking when writing
> to the output stream buffer.
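> 
> A minimal sketch of what I mean (hypothetical names, not our actual classes):
> 
> import java.nio.ByteBuffer;
> 
> // Each producer formats into its own scratch buffer with no lock held,
> // and blocks only while copying the finished bytes into the shared
> // output buffer.
> final class DoubleBufferedAppender {
>     private final ThreadLocal<ByteBuffer> scratch =
>             ThreadLocal.withInitial(() -> ByteBuffer.allocate(8192));
>     private final ByteBuffer outputBuffer = ByteBuffer.allocate(256 * 1024);
> 
>     void append(byte[] formattedEvent) {
>         ByteBuffer buf = scratch.get();
>         buf.clear();
>         buf.put(formattedEvent);      // unlocked: filling from the patterns
>         buf.flip();
>         synchronized (outputBuffer) { // block only for the data move
>             outputBuffer.put(buf);    // flush to the stream omitted
>         }
>     }
> }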
> 
> For what it's worth, Ceki updated his performance page again yesterday to
> change the wording, but I am pretty sure he didn't change the log4j2
> configuration to use a 256K buffer to match Logback. And the page still
> makes it look like he is testing the performance of asynchronous processing
> when it is really testing the performance of the file appenders with a
> blocking queue in front of them.
> 
> Ralph
> 
>> On Aug 26, 2021, at 7:29 AM, Carter Kozak <cko...@ckozak.net> wrote:
>> 
>> I also tried that with similar results, which leads me to believe we're
>> spending too much time copying between buffers.
>> 
>> We've proven that log4j can get LogEvents to the appender very quickly and
>> efficiently.
>> 
>> Once we hit PatternLayout we write data to a StringBuilder.
>> AbstractStringBuilder used to store a char array in Java 8; however, with
>> JEP 254 (Compact Strings) it was updated (similarly to String) to use a
>> byte array plus a byte representing the encoding.
>> We encode the StringBuilder value into a ByteBufferDestination
>> (Heap)ByteBuffer in StringBuilderEncoder using the following steps:
>> 1. copy the StringBuilder value into a CharBuffer (in Java 8 this was a
>> direct char[]-to-char[] copy, but it now requires conversion from
>> byte[] + coder)
>> 2. encode data from the CharBuffer into the StringBuilderEncoder ByteBuffer
>> 3. copy the contents of the StringBuilderEncoder ByteBuffer into the
>> ByteBufferDestination ByteBuffer (which is also always a heap buffer as
>> far as I can tell)
>> 4. write the ByteBufferDestination to the FileOutputStream/RandomAccessFile
>> by extracting the heap byte array
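>> 
>> As a rough sketch of that chain (simplified names and generous buffer
>> sizing; not the exact log4j internals, which loop on encoder overflow):
>> 
>> import java.io.OutputStream;
>> import java.nio.ByteBuffer;
>> import java.nio.CharBuffer;
>> import java.nio.charset.CharsetEncoder;
>> import java.nio.charset.StandardCharsets;
>> 
>> final class EncodeSketch {
>>     // Mirrors the four copies described above.
>>     static void encode(StringBuilder text, OutputStream out) throws Exception {
>>         CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
>>         CharBuffer chars = CharBuffer.allocate(text.length());
>>         text.getChars(0, text.length(), chars.array(), 0); // 1. value -> CharBuffer
>>         ByteBuffer encoded = ByteBuffer.allocate(text.length() * 4);
>>         encoder.encode(chars, encoded, true);              // 2. chars -> encoder buffer
>>         encoded.flip();
>>         ByteBuffer destination = ByteBuffer.allocate(encoded.remaining());
>>         destination.put(encoded);                          // 3. buffer -> destination buffer
>>         out.write(destination.array(), 0, destination.position()); // 4. heap array -> stream
>>     }
>> }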
>> 
>> We're copying bytes around a lot, and I think older JVMs would have
>> optimized out some of the copying because the types matched exactly.
>> It's unclear why we need to copy through two separate ByteBuffers to get
>> from StringBuilderEncoder into ByteBufferDestination.
>> I've generally had better I/O performance using direct byte buffers than
>> heap buffers, although that experience largely comes from work on
>> webservers; I haven't tested whether it holds true for file I/O. If we
>> operate directly upon a FileChannel, we can write direct buffers straight
>> to files. DirectByteBuffer memory is guaranteed to be contiguous, so it
>> should also perform better for copying and manipulation.
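>> 
>> For example, a sketch of the FileChannel approach (not a drop-in
>> replacement for the manager code):
>> 
>> import java.nio.ByteBuffer;
>> import java.nio.channels.FileChannel;
>> import java.nio.charset.StandardCharsets;
>> import java.nio.file.Paths;
>> import java.nio.file.StandardOpenOption;
>> 
>> final class DirectWriteSketch {
>>     // A direct buffer can be handed straight to the channel: no heap
>>     // byte-array extraction, and the write copies from native memory.
>>     static void write(String line) throws Exception {
>>         try (FileChannel channel = FileChannel.open(Paths.get("app.log"),
>>                 StandardOpenOption.CREATE, StandardOpenOption.WRITE,
>>                 StandardOpenOption.APPEND)) {
>>             ByteBuffer direct = ByteBuffer.allocateDirect(256 * 1024);
>>             direct.put(line.getBytes(StandardCharsets.UTF_8));
>>             direct.flip();
>>             while (direct.hasRemaining()) {
>>                 channel.write(direct);
>>             }
>>         }
>>     }
>> }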
>> 
>> -ck
>> 
>>> On Thu, Aug 26, 2021, at 10:00, Volkan Yazıcı wrote:
>>> Ralph, maybe a dumb idea, but... can't we simply write to /dev/null to
>>> avoid the hardware's influence on the test results?
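>>> 
>>> Something like this in the benchmark's log4j2 config (a sketch; appender
>>> name and pattern are placeholders, and it only works on POSIX systems):
>>> 
>>> <!-- The OS discards writes to /dev/null, so disk speed drops out of the
>>>      measurement while the layout and appender still do their full work. -->
>>> <File name="FILE" fileName="/dev/null" bufferedIO="true">
>>>   <PatternLayout pattern="%d %p %c - %m%n"/>
>>> </File>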
>>> 
>>> On Wed, Aug 25, 2021 at 8:05 PM Ralph Goers <ralph.go...@dslextreme.com>
>>> wrote:
>>> 
>>>> Did you add the no-op appender to Ceki’s project? Or are you using our
>>>> NullAppender? I have my doubts about using that NullAppender, as it does
>>>> nothing - not even render the Layout - so it may get completely optimized
>>>> away.
>>>> 
>>>> I’d still like to know what kind of disks Remko did his original testing
>>>> on. The page says he got 18.495 million messages per second. Each message
>>>> would be about 130 characters long from what I can tell. That means the
>>>> destination has to be able to sustain at least about 2.4 GB per second.
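>>>> 
>>>> Back of the envelope: 18,495,000 events/s x ~130 bytes each ≈ 2.4 x 10^9
>>>> bytes/s, i.e. roughly 2.4 GB of sustained writes per second.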
>>>> 
>>>> In Ceki’s test the message is only 23 characters long (vs 100 in Remko’s).
>>>> That means each formatted message is only going to be about 60 bytes. For
>>>> me the async test is writing at about 1700 ops/ms, which equates to 102
>>>> MB/s, yet I am still seeing the queue backed up. So how we are writing
>>>> must be terribly inefficient. Logback is getting about 2000 ops/ms, which
>>>> is still only 120 MB/s.
>>>> 
>>>> Your test below would be generating about 197 MB/s.
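>>>> 
>>>> (Same arithmetic throughout: ops/ms x ~60 bytes/event = bytes/ms ≈ MB/s;
>>>> e.g. 3285.743 x 60 ≈ 197,000 bytes/ms ≈ 197 MB/s.)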
>>>> 
>>>> One thing I did notice that made a big difference is that Ceki has Logback
>>>> configured to use a buffer size of 256K while Log4j2 uses the default of
>>>> 8K. Setting Log4j2 to 256K considerably improved the performance.
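>>>> 
>>>> For anyone reproducing this, a sketch of the log4j2 side (file name and
>>>> pattern are placeholders; the FileAppender default bufferSize is 8192):
>>>> 
>>>> <File name="FILE" fileName="target/test.log" bufferedIO="true"
>>>>       bufferSize="262144">  <!-- 256K, matching the Logback config -->
>>>>   <PatternLayout pattern="%d %p %c - %m%n"/>
>>>> </File>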
>>>> 
>>>> 
>>>> Ralph
>>>> 
>>>>> On Aug 25, 2021, at 8:42 AM, Carter Kozak <cko...@ckozak.net> wrote:
>>>>> 
>>>>>> If we were to set the appender performance aside, am I right to conclude
>>>>>> that the entire complex async framework built atop Disruptor with all its
>>>>>> whistles and bells is yielding a diminishing performance gain compared to
>>>>>> a simple off-the-shelf blocking queue?
>>>>> 
>>>>> I haven't seen any data that would give me such an impression -- the file
>>>>> appender itself is slower than the thread producing events, which appears
>>>>> to limit our throughput. Even then, log4j2 does much better than logback
>>>>> in my testing on a Linux workstation with 20+ producer threads.
>>>>> Any time the queue is constantly full, performance will suffer (and this
>>>>> applies to standard blocking queues as well as disruptor, though disruptor
>>>>> will fare worse when the queue is full).
>>>>> The results using a single thread are roughly reproducible on the
>>>>> non-async test, so I think the problem is somewhere between the layout
>>>>> and the appender/manager.
>>>>> 
>>>>> Results running locally with no-op appender implementations:
>>>>> 
>>>>> 1 thread:
>>>>> 
>>>>> Benchmark                                        Mode  Cnt     Score     Error   Units
>>>>> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt    4  3285.743 ± 427.963  ops/ms
>>>>> AsyncWithFileAppenderBenchmark.logbackFile      thrpt    4  2383.532 ± 136.034  ops/ms
>>>>> 
>>>>> 20 threads:
>>>>> 
>>>>> Benchmark                                        Mode  Cnt     Score     Error   Units
>>>>> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt    4  1177.746 ± 919.560  ops/ms
>>>>> AsyncWithFileAppenderBenchmark.logbackFile      thrpt    4   602.614 ±  47.485  ops/ms
>>>> 
>>>> 
>>>> 
>>> 
> 
> 
