While that is true, it does mean that we are locking higher up in the call stack 
than we need to be.

Ralph

> On Aug 26, 2021, at 11:13 AM, Tim Perry <tim.v...@gmail.com> wrote:
> 
> I’m fairly certain the JIT will optimize out the locking operations on the 
> nested synchronized calls after a while. I’m not sure how soon into the 
> performance tests that would happen. 
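> 
> For example (a hypothetical sketch, not the actual Log4j code): nested 
> synchronized blocks on the same monitor are reentrant, and once the code is 
> hot the JIT can coarsen or elide the inner acquisitions:
> 
>     final class Destination {
>         private final Object lock = new Object();
> 
>         void writeBytes(byte[] data) {
>             synchronized (lock) {      // outer acquisition
>                 drain(data);
>             }
>         }
> 
>         private void drain(byte[] data) {
>             synchronized (lock) {      // reentrant; a hot JIT may coarsen or elide this
>                 // ... move data into the output buffer ...
>             }
>         }
>     }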
> 
> Tim
> 
> 
>> On Aug 26, 2021, at 10:55 AM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
>> 
>> So are you continuing to look at this? Frankly, the work on the zero-gc stuff 
>> made this very complicated. I am not sure I can even follow the logic anymore. 
>> I also noticed that many of the methods dealing with byte buffers are 
>> synchronized. I am not sure why, as only the method that actually moves the 
>> data needs to be.
>> 
>> Using multiple buffers would make sense if you are filling one from the 
>> various patterns
>> while it is unlocked and then only blocking when writing to the output 
>> stream buffer.
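>> 
>> Roughly what I have in mind (a hypothetical sketch, not the current code): 
>> encode into a per-thread buffer with no lock held, and synchronize only for 
>> the copy into the shared output buffer:
>> 
>>     import java.nio.ByteBuffer;
>>     import java.nio.CharBuffer;
>>     import java.nio.charset.CharsetEncoder;
>>     import java.nio.charset.StandardCharsets;
>> 
>>     // Hypothetical sketch: fill the scratch buffer unlocked, lock only the data move.
>>     final class DoubleBufferedWriter {
>>         private final ByteBuffer shared = ByteBuffer.allocate(256 * 1024);
>>         private final ThreadLocal<ByteBuffer> scratch =
>>                 ThreadLocal.withInitial(() -> ByteBuffer.allocate(8 * 1024));
>>         private final ThreadLocal<CharsetEncoder> encoders =
>>                 ThreadLocal.withInitial(StandardCharsets.UTF_8::newEncoder);
>> 
>>         void write(CharSequence formattedEvent) {
>>             ByteBuffer local = scratch.get();
>>             local.clear();
>>             // Encoding happens without any lock held (overflow handling omitted).
>>             encoders.get().reset().encode(CharBuffer.wrap(formattedEvent), local, true);
>>             local.flip();
>>             synchronized (shared) {      // only the copy into the shared buffer is serialized
>>                 shared.put(local);       // real code would drain 'shared' when it fills
>>             }
>>         }
>>     }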
>> 
>> For what it’s worth, Ceki updated his performance page again yesterday to 
>> change the wording, but I am pretty sure he didn’t change the log4j2 
>> configuration to use a 256K buffer to match Logback. And the page still makes 
>> it look like he is testing the performance of asynchronous processing when it 
>> is really testing the performance of the file appenders with a blocking queue 
>> in front of them.
>> 
>> Ralph
>> 
>>> On Aug 26, 2021, at 7:29 AM, Carter Kozak <cko...@ckozak.net> wrote:
>>> 
>>> I also tried that with similar results, which leads me to believe we're 
>>> spending
>>> too much time copying between buffers.
>>> 
>>> We've proven that log4j can get LogEvents to the appender very quickly and
>>> efficiently.
>>> 
>>> Once we hit PatternLayout we write data to a StringBuilder. 
>>> AbstractStringBuilder used to store a char array in Java 8; however, with 
>>> JEP 254 compact strings it was updated (similarly to String) to use a byte 
>>> array plus a byte representing the encoding.
>>> We encode the StringBuilder value into a ByteBufferDestination 
>>> (Heap)ByteBuffer in StringBuilderEncoder using the following steps (roughly 
>>> the chain sketched in code below):
>>> 1. Copy the StringBuilder value into a CharBuffer (this would have been a 
>>> direct copy from char[] to char[] in Java 8, but requires conversion from 
>>> byte[] + coder now).
>>> 2. Encode data from the CharBuffer into the StringBuilderEncoder ByteBuffer.
>>> 3. Copy the contents of the StringBuilderEncoder ByteBuffer into the 
>>> ByteBufferDestination ByteBuffer (which is also always a heap buffer as far 
>>> as I can tell).
>>> 4. Write the ByteBufferDestination to the FileOutputStream/RandomAccessFile 
>>> by extracting the heap byte array.
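>>> 
>>> The same chain in plain NIO terms (a simplified sketch; the real code reuses 
>>> thread-local buffers and handles overflow/underflow results):
>>> 
>>>     import java.nio.ByteBuffer;
>>>     import java.nio.CharBuffer;
>>>     import java.nio.charset.CharsetEncoder;
>>>     import java.nio.charset.StandardCharsets;
>>> 
>>>     final class CopyChainSketch {
>>>         static void encode(StringBuilder source, ByteBuffer destination) {
>>>             // 1. StringBuilder (byte[] + coder since JEP 254) -> CharBuffer
>>>             CharBuffer chars = CharBuffer.allocate(source.length());
>>>             chars.append(source);
>>>             chars.flip();
>>>             // 2. CharBuffer -> the encoder's own ByteBuffer (result checks omitted)
>>>             CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
>>>             ByteBuffer encoded = ByteBuffer.allocate(source.length() * 4);
>>>             encoder.encode(chars, encoded, true);
>>>             encoded.flip();
>>>             // 3. the encoder's ByteBuffer -> the ByteBufferDestination's heap ByteBuffer
>>>             destination.put(encoded);
>>>             // 4. the destination's backing byte[] is later handed to the
>>>             //    FileOutputStream / RandomAccessFile
>>>         }
>>>     }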
>>> 
>>> We're copying bytes around a lot, and I think older JVMs would have 
>>> optimized out a bit of the copying because the types exactly matched.
>>> It's unclear why we need to copy through two separate ByteBuffers to get from 
>>> the StringBuilderEncoder into the ByteBufferDestination.
>>> I've generally had better I/O performance using direct byte buffers than 
>>> heap buffers, although that experience largely comes from work on 
>>> webservers; I haven't tested whether it holds true for file I/O. If we operate 
>>> directly upon a FileChannel, we can write direct buffers directly to files. 
>>> The DirectByteBuffer memory is guaranteed to be contiguous, so it should 
>>> have better performance around copying and manipulation as well.
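>>> 
>>> A sketch of what I mean by operating directly on a FileChannel with a direct 
>>> buffer (hypothetical, not the current appender code):
>>> 
>>>     import java.io.IOException;
>>>     import java.nio.ByteBuffer;
>>>     import java.nio.CharBuffer;
>>>     import java.nio.channels.FileChannel;
>>>     import java.nio.charset.CharsetEncoder;
>>>     import java.nio.charset.StandardCharsets;
>>>     import java.nio.file.Path;
>>>     import java.nio.file.StandardOpenOption;
>>> 
>>>     // Encode straight into a direct ByteBuffer and hand it to a FileChannel,
>>>     // skipping the intermediate heap-to-heap copies.
>>>     final class DirectChannelSketch implements AutoCloseable {
>>>         private final FileChannel channel;
>>>         private final ByteBuffer buffer = ByteBuffer.allocateDirect(256 * 1024);
>>>         private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
>>> 
>>>         DirectChannelSketch(Path file) throws IOException {
>>>             this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
>>>                     StandardOpenOption.WRITE, StandardOpenOption.APPEND);
>>>         }
>>> 
>>>         synchronized void write(CharSequence formattedEvent) throws IOException {
>>>             CharBuffer chars = CharBuffer.wrap(formattedEvent);
>>>             encoder.reset();
>>>             // Drain to the file whenever the direct buffer fills up.
>>>             while (encoder.encode(chars, buffer, true).isOverflow()) {
>>>                 flush();
>>>             }
>>>         }
>>> 
>>>         synchronized void flush() throws IOException {
>>>             buffer.flip();
>>>             while (buffer.hasRemaining()) {
>>>                 channel.write(buffer);   // direct buffer -> file, no extra heap copy
>>>             }
>>>             buffer.clear();
>>>         }
>>> 
>>>         @Override
>>>         public synchronized void close() throws IOException {
>>>             flush();
>>>             channel.close();
>>>         }
>>>     }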
>>> 
>>> -ck
>>> 
>>>> On Thu, Aug 26, 2021, at 10:00, Volkan Yazıcı wrote:
>>>> Ralph, maybe a dumb idea, but... can't we simply write to /dev/null to avoid the
>>>> hardware's influence on the test results?
>>>> 
>>>> On Wed, Aug 25, 2021 at 8:05 PM Ralph Goers <ralph.go...@dslextreme.com>
>>>> wrote:
>>>> 
>>>>> Did you add the no-op appender to Ceki’s project? Or are you using our 
>>>>> NullAppender? I have my doubts about using that NullAppender as it does 
>>>>> nothing - not even render the Layout - so it may get completely optimized away.
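>>>>> 
>>>>> A benchmark-only appender that still renders the Layout but discards the 
>>>>> bytes would avoid that. Something along these lines (a hypothetical sketch, 
>>>>> not an existing class):
>>>>> 
>>>>>     import java.io.Serializable;
>>>>>     import org.apache.logging.log4j.core.Filter;
>>>>>     import org.apache.logging.log4j.core.Layout;
>>>>>     import org.apache.logging.log4j.core.LogEvent;
>>>>>     import org.apache.logging.log4j.core.appender.AbstractAppender;
>>>>>     import org.apache.logging.log4j.core.config.Property;
>>>>> 
>>>>>     // Renders the layout so the encoding cost is still measured, then
>>>>>     // discards the result while keeping a side effect so the JIT cannot
>>>>>     // eliminate the encode as dead code.
>>>>>     final class LayoutRenderingNullAppender extends AbstractAppender {
>>>>>         private volatile int sink;
>>>>> 
>>>>>         LayoutRenderingNullAppender(String name, Filter filter,
>>>>>                 Layout<? extends Serializable> layout) {
>>>>>             super(name, filter, layout, true, Property.EMPTY_ARRAY);
>>>>>         }
>>>>> 
>>>>>         @Override
>>>>>         public void append(LogEvent event) {
>>>>>             byte[] encoded = getLayout().toByteArray(event);
>>>>>             sink += encoded.length;
>>>>>         }
>>>>>     }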
>>>>> 
>>>>> I’d still like to know what kind of disks Remko did his original testing on. 
>>>>> The page says he got 18.495 million messages per second. Each message would 
>>>>> be about 130 characters long from what I can tell. That means the destination 
>>>>> has to be able to sustain at least about 2.4 GB per second.
>>>>> 
>>>>> In Ceki’s test the message is only 23 characters long (vs 100 in Remko’s). 
>>>>> That means the formatted log line is only going to be about 60 characters 
>>>>> long. For me the async test is writing at about 1700 ops/ms, which equates 
>>>>> to 102 MB/s, yet I am still seeing the queue backed up. So how we are 
>>>>> writing must be terribly inefficient. Logback is getting about 2000 ops/ms, 
>>>>> which is still only 120 MB/s.
>>>>> 
>>>>> Your test below would be generating about 197 MB/s.
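>>>>> 
>>>>> (As a sanity check on those numbers: 18,495,000 msgs/s x 130 bytes is about 
>>>>> 2.4 GB/s; 1,700,000 msgs/s x 60 bytes is about 102 MB/s; 2,000,000 x 60 is 
>>>>> about 120 MB/s; and 3,285,743 x 60 is about 197 MB/s.)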
>>>>> 
>>>>> One thing I did notice that made a big difference is that Ceki has Logback 
>>>>> configured to use a buffer size of 256K while Log4j2 uses the default of 8K. 
>>>>> Setting Log4j2 to 256K considerably improved the performance.
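>>>>> 
>>>>> For reference, something like this in the Log4j2 test configuration should 
>>>>> match Logback's 256K setting (a sketch; the exact appender and file name in 
>>>>> Ceki's project may differ):
>>>>> 
>>>>>     <!-- Hypothetical snippet: raise the file appender buffer to 256K -->
>>>>>     <File name="FILE" fileName="target/test.log"
>>>>>           bufferedIO="true" bufferSize="262144" immediateFlush="false">
>>>>>       <PatternLayout pattern="%d %p %c - %m%n"/>
>>>>>     </File>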
>>>>> 
>>>>> 
>>>>> Ralph
>>>>> 
>>>>>> On Aug 25, 2021, at 8:42 AM, Carter Kozak <cko...@ckozak.net> wrote:
>>>>>> 
>>>>>>> If we would move the appender performance aside, am I right to conclude
>>>>>>> that the entire complex async. framework built atop Disruptor with all its
>>>>>>> whistles and bells is yielding a diminishing performance gain compared to a
>>>>>>> simple off-the-shelf blocking queue?
>>>>>> 
>>>>>> I haven't seen any data that would give me such an impression -- the file 
>>>>>> appender itself is slower than the thread producing events, which appears 
>>>>>> to limit our throughput.
>>>>>> Even then, log4j2 does much better in my testing on a Linux workstation 
>>>>>> with 20+ producer threads compared to logback.
>>>>>> Any time the queue is constantly full, performance will suffer (and this 
>>>>>> applies to standard blocking queues as well as Disruptor; however, Disruptor 
>>>>>> will fare worse when the queue is full).
>>>>>> The results using a single thread are roughly reproducible on the non-async 
>>>>>> test, so I think the problem is somewhere between the layout and the 
>>>>>> appender/manager.
>>>>>> 
>>>>>> Results running locally with no-op appender implementations:
>>>>>> 
>>>>>> 1 thread:
>>>>>> 
>>>>>> Benchmark                                        Mode  Cnt     Score     Error   Units
>>>>>> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt    4  3285.743 ±  427.963  ops/ms
>>>>>> AsyncWithFileAppenderBenchmark.logbackFile      thrpt    4  2383.532 ±  136.034  ops/ms
>>>>>> 
>>>>>> 20 threads:
>>>>>> 
>>>>>> Benchmark                                        Mode  Cnt     Score     Error   Units
>>>>>> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt    4  1177.746 ±  919.560  ops/ms
>>>>>> AsyncWithFileAppenderBenchmark.logbackFile      thrpt    4   602.614 ±   47.485  ops/ms
>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
>> 
> 

