I see your point.

On Thu, Aug 26, 2021 at 11:21 AM Ralph Goers <ralph.go...@dslextreme.com> wrote:

> While that is true, it does mean that we are locking higher up in the call
> stack than we need to be.
>
> Ralph
>
> > On Aug 26, 2021, at 11:13 AM, Tim Perry <tim.v...@gmail.com> wrote:
> >
> > I’m fairly certain the JIT will optimize out the locking operations on
> > the nested synchronized calls after a while. I’m not sure how soon into
> > the performance tests that would happen.
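> >
> > Roughly the shape in question, as a minimal sketch (class and method
> > names are made up, not the actual log4j2 code):
> >
> >     import java.nio.ByteBuffer;
> >
> >     // Both methods lock the same monitor, so once this code is hot the
> >     // JIT can coarsen/elide the inner, redundant acquisition.
> >     class SyncSketch {
> >         private final ByteBuffer buffer = ByteBuffer.allocate(8192);
> >
> >         synchronized void append(byte[] data) {
> >             encode(data); // the lock is already held here
> >         }
> >
> >         // Toy example: ignores buffer overflow once the 8K fills up.
> >         private synchronized void encode(byte[] data) {
> >             buffer.put(data); // re-acquires the same monitor: nested locking
> >         }
> >     }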
> >
> > Tim
> >
> >
> >> On Aug 26, 2021, at 10:55 AM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
> >>
> >> So are you continuing to look at this? Frankly, the work on the zero-gc
> >> stuff made this very complicated. I am not sure I can even follow the
> >> logic anymore. I also noticed that many of the methods dealing with byte
> >> buffers are synchronized. I am not sure why, as only the method that
> >> actually moves data needs to be.
> >>
> >> Using multiple buffers would make sense if you are filling one from the
> >> various patterns while it is unlocked and then only blocking when writing
> >> to the output stream buffer.
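> >>
> >> Something like this minimal sketch of the idea (hypothetical names, not
> >> the actual OutputStreamManager code): each producer fills its own heap
> >> buffer with no lock held, and only the hand-off to the stream is
> >> synchronized.
> >>
> >>     import java.io.IOException;
> >>     import java.io.OutputStream;
> >>     import java.nio.ByteBuffer;
> >>
> >>     class DoubleBufferedWriter {
> >>         private final OutputStream out;
> >>
> >>         DoubleBufferedWriter(OutputStream out) { this.out = out; }
> >>
> >>         // "filled" is a heap buffer the caller populated without locking.
> >>         void drain(ByteBuffer filled) throws IOException {
> >>             filled.flip();
> >>             synchronized (this) { // block only while moving bytes out
> >>                 out.write(filled.array(), filled.position(), filled.remaining());
> >>             }
> >>             filled.clear();
> >>         }
> >>     }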
> >>
> >> For what it’s worth, Ceki updated his performance page again yesterday to
> >> change the wording, but I am pretty sure he didn’t change the log4j2
> >> configuration to use a 256K buffer to match Logback. And the page still
> >> makes it look like he is testing the performance of asynchronous
> >> processing when it is really testing the performance of the file
> >> appenders with a blocking queue in front of them.
> >>
> >> Ralph
> >>
> >>> On Aug 26, 2021, at 7:29 AM, Carter Kozak <cko...@ckozak.net> wrote:
> >>>
> >>> I also tried that with similar results, which leads me to believe we're
> >>> spending too much time copying between buffers.
> >>>
> >>> We've proven that log4j can get LogEvents to the appender very quickly
> >>> and efficiently.
> >>>
> >>> Once we hit PatternLayout we write data to a StringBuilder.
> >>> AbstractStringBuilder used to store a char array in Java 8; with JEP 254
> >>> (compact strings) it was updated, like String, to use a byte array plus
> >>> a byte that records the encoding.
> >>> We encode the StringBuilder value into a ByteBufferDestination
> >>> (heap) ByteBuffer in StringBuilderEncoder using the following steps:
> >>> 1. Copy the StringBuilder value into a CharBuffer (this would have been
> >>> a direct char[]-to-char[] copy in Java 8, but requires conversion from
> >>> byte[]+coder now).
> >>> 2. Encode data from the CharBuffer into the StringBuilderEncoder
> >>> ByteBuffer.
> >>> 3. Copy the contents of the StringBuilderEncoder ByteBuffer into the
> >>> ByteBufferDestination ByteBuffer (which is also always a heap buffer as
> >>> far as I can tell).
> >>> 4. Write the ByteBufferDestination to the
> >>> FileOutputStream/RandomAccessFile by extracting the heap byte array.
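> >>>
> >>> In simplified form the copy chain looks roughly like this (my sketch,
> >>> not the actual StringBuilderEncoder code; it assumes everything fits in
> >>> one 8K buffer, where the real code loops and drains):
> >>>
> >>>     import java.io.FileOutputStream;
> >>>     import java.io.IOException;
> >>>     import java.nio.ByteBuffer;
> >>>     import java.nio.CharBuffer;
> >>>     import java.nio.charset.CharsetEncoder;
> >>>     import java.nio.charset.StandardCharsets;
> >>>
> >>>     class CopyChainSketch {
> >>>         void write(StringBuilder text, FileOutputStream out) throws IOException {
> >>>             CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
> >>>             CharBuffer chars = CharBuffer.allocate(text.length());
> >>>             ByteBuffer encoderBuffer = ByteBuffer.allocate(8192);     // StringBuilderEncoder buffer
> >>>             ByteBuffer destinationBuffer = ByteBuffer.allocate(8192); // ByteBufferDestination buffer
> >>>
> >>>             text.getChars(0, text.length(), chars.array(), 0); // 1. StringBuilder -> CharBuffer
> >>>             encoder.encode(chars, encoderBuffer, true);        // 2. CharBuffer -> encoder buffer
> >>>             encoderBuffer.flip();
> >>>             destinationBuffer.put(encoderBuffer);              // 3. encoder buffer -> destination buffer
> >>>             destinationBuffer.flip();
> >>>             out.write(destinationBuffer.array(),               // 4. extract heap array and write
> >>>                     destinationBuffer.position(), destinationBuffer.remaining());
> >>>         }
> >>>     }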
> >>>
> >>> We're copying bytes around a lot, and I think older JVMs would have
> >>> optimized out a bit of the copying because the types matched exactly.
> >>> It's unclear why we need to copy through two separate ByteBuffers to get
> >>> from StringBuilderEncoder into ByteBufferDestination.
> >>> I've generally had better I/O performance using direct byte buffers than
> >>> heap buffers, although that experience largely comes from work on web
> >>> servers; I haven't tested whether it holds true for file I/O. If we
> >>> operate directly upon a FileChannel, we can write direct buffers
> >>> directly to files. The DirectByteBuffer memory is guaranteed to be
> >>> contiguous, so it should have better performance around copying and
> >>> manipulation as well.
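> >>>
> >>> A minimal sketch of that idea (mine, with hypothetical names, assuming
> >>> we controlled the whole write path): encode into a direct buffer and
> >>> hand it straight to a FileChannel.
> >>>
> >>>     import java.io.IOException;
> >>>     import java.nio.ByteBuffer;
> >>>     import java.nio.channels.FileChannel;
> >>>     import java.nio.file.Path;
> >>>     import java.nio.file.StandardOpenOption;
> >>>
> >>>     class DirectBufferWriter implements AutoCloseable {
> >>>         private final FileChannel channel;
> >>>         private final ByteBuffer buffer = ByteBuffer.allocateDirect(256 * 1024);
> >>>
> >>>         DirectBufferWriter(Path file) throws IOException {
> >>>             channel = FileChannel.open(file, StandardOpenOption.CREATE,
> >>>                     StandardOpenOption.WRITE, StandardOpenOption.APPEND);
> >>>         }
> >>>
> >>>         void write(byte[] encoded) throws IOException {
> >>>             if (buffer.remaining() < encoded.length) {
> >>>                 flush(); // drain before the buffer would overflow
> >>>             }
> >>>             buffer.put(encoded);
> >>>         }
> >>>
> >>>         void flush() throws IOException {
> >>>             buffer.flip();
> >>>             while (buffer.hasRemaining()) {
> >>>                 channel.write(buffer); // no intermediate heap copy in the JDK
> >>>             }
> >>>             buffer.clear();
> >>>         }
> >>>
> >>>         public void close() throws IOException {
> >>>             flush();
> >>>             channel.close();
> >>>         }
> >>>     }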
> >>>
> >>> -ck
> >>>
> >>>> On Thu, Aug 26, 2021, at 10:00, Volkan Yazıcı wrote:
> >>>> Ralph, maybe a dumb idea, but... can't we simply write to /dev/null to
> >>>> avoid the hardware's influence on the test results?
> >>>>
> >>>> On Wed, Aug 25, 2021 at 8:05 PM Ralph Goers <ralph.go...@dslextreme.com> wrote:
> >>>>
> >>>>> Did you add the no-op appender to Ceki’s project? Or are you using our
> >>>>> NullAppender? I have my doubts about using that NullAppender as it does
> >>>>> nothing - not even render the Layout - so it may get completely
> >>>>> optimized away.
> >>>>>
> >>>>> I’d still like to know what kind of disks Remko used for his original
> >>>>> testing. The page says he got 18.495 million messages per second. Each
> >>>>> message would be about 130 characters long from what I can tell. That
> >>>>> means the destination has to be able to support at least about 2.4 GB
> >>>>> per second.
> >>>>>
> >>>>> In Ceki’s test the message is only 23 characters long (vs 100 in
> >>>>> Remko’s). That means the formatted line is only going to be about 60
> >>>>> characters long. For me the async test is writing at about 1700 ops/ms,
> >>>>> which equates to 102 MB/s, yet I am still seeing the queue backed up.
> >>>>> So how we are writing must be terribly inefficient. Logback is getting
> >>>>> about 2000 ops/ms, which is still only 120 MB/s.
> >>>>>
> >>>>> Your test below would be generating about 197 MB/s.
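> >>>>>
> >>>>> Spelling that arithmetic out (the ~60-byte line size is my estimate
> >>>>> from above):
> >>>>>
> >>>>>     18.495M msgs/s x ~130 bytes ≈ 2.4 GB/s
> >>>>>     1700 ops/ms = 1.7M ops/s x ~60 bytes ≈ 102 MB/s
> >>>>>     2000 ops/ms = 2.0M ops/s x ~60 bytes ≈ 120 MB/s
> >>>>>     3286 ops/ms = 3.3M ops/s x ~60 bytes ≈ 197 MB/s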
> >>>>>
> >>>>> One thing I did notice that made a big difference is that Ceki has
> >>>>> Logback configured to use a buffer size of 256K while Log4j2 uses the
> >>>>> default of 8K. Setting Log4j2 to 256K considerably improved the
> >>>>> performance.
> >>>>>
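> >>>>> For anyone reproducing this, the knob is the bufferSize attribute on
> >>>>> the file appender. An illustrative log4j2 snippet (not taken from
> >>>>> Ceki’s repo; the file name is made up):
> >>>>>
> >>>>>     <File name="FILE" fileName="test.log" bufferedIO="true" bufferSize="262144">
> >>>>>       <PatternLayout pattern="%d %p %c - %m%n"/>
> >>>>>     </File>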
> >>>>>
> >>>>> Ralph
> >>>>>
> >>>>>> On Aug 25, 2021, at 8:42 AM, Carter Kozak <cko...@ckozak.net> wrote:
> >>>>>>
> >>>>>>> If we set the appender performance aside, am I right to conclude
> >>>>>>> that the entire complex async framework built atop Disruptor, with
> >>>>>>> all its bells and whistles, is yielding a diminishing performance
> >>>>>>> gain compared to a simple off-the-shelf blocking queue?
> >>>>>>
> >>>>>> I haven't seen any data that would give me such an impression -- the
> >>>>>> file appender itself is slower than the thread producing events, which
> >>>>>> appears to limit our throughput.
> >>>>>> Even then, log4j2 does much better than logback in my testing on a
> >>>>>> Linux workstation with 20+ producer threads.
> >>>>>> Any time the queue is constantly full, performance will suffer (this
> >>>>>> applies to standard blocking queues as well as to the Disruptor,
> >>>>>> though the Disruptor fares worse when the queue is full).
> >>>>>> The results using a single thread are roughly reproducible on the
> >>>>>> non-async test, so I think the problem is somewhere between the layout
> >>>>>> and the appender/manager.
> >>>>>>
> >>>>>> Results running locally with no-op appender implementations:
> >>>>>>
> >>>>>> 1 thread:
> >>>>>>
> >>>>>> Benchmark                                        Mode  Cnt     Score     Error   Units
> >>>>>> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt    4  3285.743 ± 427.963  ops/ms
> >>>>>> AsyncWithFileAppenderBenchmark.logbackFile      thrpt    4  2383.532 ± 136.034  ops/ms
> >>>>>>
> >>>>>> 20 threads:
> >>>>>>
> >>>>>> Benchmark                                        Mode  Cnt     Score     Error   Units
> >>>>>> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt    4  1177.746 ± 919.560  ops/ms
> >>>>>> AsyncWithFileAppenderBenchmark.logbackFile      thrpt    4   602.614 ±  47.485  ops/ms
> >>>>>
> >>>>
> >>
> >
>
