I also tried that with similar results, which leads me to believe we're spending
too much time copying between buffers.

We've proven that log4j can get LogEvents to the appender very quickly and
efficiently.

Once we hit PatternLayout we write data to a StringBuilder.
AbstractStringBuilder stored a char array in Java 8, but with JEP 254
(Compact Strings) it was updated, like String, to use a byte array plus a
byte representing the encoding.
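
For example (a minimal sketch, not log4j code), extracting the chars from a
StringBuilder now has to inflate the compact byte[] representation:

    // In Java 8 this was effectively a System.arraycopy between char arrays.
    // Since JEP 254 the builder stores byte[] + coder, so a LATIN1-coded
    // value is inflated char-by-char into the destination array.
    StringBuilder sb = new StringBuilder("some log message");
    char[] dst = new char[sb.length()];
    sb.getChars(0, sb.length(), dst, 0);
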
We encode the StringBuilder value into the ByteBufferDestination's (heap)
ByteBuffer in StringBuilderEncoder using the following steps (sketched in
code below):
1. Copy the StringBuilder value into a CharBuffer (in Java 8 this would have
been a direct char[]-to-char[] copy, but it now requires conversion from
byte[] + coder).
2. Encode the data from the CharBuffer into the StringBuilderEncoder's
ByteBuffer.
3. Copy the contents of the StringBuilderEncoder's ByteBuffer into the
ByteBufferDestination's ByteBuffer (which is also always a heap buffer as
far as I can tell).
4. Write the ByteBufferDestination to the FileOutputStream/RandomAccessFile
by extracting the heap byte array.
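
Roughly, the whole chain looks like the following sketch. The names and
buffer sizes are illustrative and overflow handling is omitted -- this is
not the actual StringBuilderEncoder code:

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.CharsetEncoder;
    import java.nio.charset.StandardCharsets;

    // Illustrative only: assumes a single event always fits in one 8 KiB buffer.
    final class CopyChainSketch {
        private final CharBuffer chars = CharBuffer.allocate(8192);
        private final ByteBuffer encoderBuffer = ByteBuffer.allocate(8192);
        private final ByteBuffer destinationBuffer = ByteBuffer.allocate(8192);
        private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();

        void write(StringBuilder text, FileOutputStream out) throws IOException {
            // 1. StringBuilder (byte[] + coder) -> char[] backing the CharBuffer
            text.getChars(0, text.length(), chars.array(), 0);
            chars.position(0).limit(text.length());

            // 2. encode chars -> bytes in the encoder-owned ByteBuffer
            encoderBuffer.clear();
            encoder.reset();
            encoder.encode(chars, encoderBuffer, true);
            encoderBuffer.flip();

            // 3. copy the encoded bytes into the destination's heap ByteBuffer
            destinationBuffer.clear();
            destinationBuffer.put(encoderBuffer);
            destinationBuffer.flip();

            // 4. extract the heap array and hand it to the output stream
            out.write(destinationBuffer.array(), destinationBuffer.arrayOffset(),
                    destinationBuffer.remaining());
        }
    }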

We're copying bytes around a lot, and I think older JVMs would have optimized
away some of the copying because the types matched exactly.
It's unclear why we need to copy through two separate ByteBuffers on the way
from StringBuilderEncoder into ByteBufferDestination.
I've generally had better I/O performance using direct byte buffers than heap
buffers, although that experience largely comes from work on web servers; I
haven't tested whether it holds true for file I/O. If we operate directly on
a FileChannel, we can write direct buffers straight to files. DirectByteBuffer
memory is guaranteed to be contiguous, so it should also perform better for
copying and manipulation.
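
A minimal sketch of that approach (the class name is hypothetical; sizing,
overflow, flushing, and thread-safety are all omitted):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    // Hypothetical wrapper, not an existing log4j class.
    final class DirectChannelSketch {
        private final ByteBuffer direct = ByteBuffer.allocateDirect(256 * 1024);
        private final FileChannel channel;

        DirectChannelSketch(Path file) throws IOException {
            channel = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        }

        void write(CharBuffer text) throws IOException {
            direct.clear();
            // Encode straight into the direct buffer: no intermediate heap
            // ByteBuffer, and no byte[] extraction before the write.
            StandardCharsets.UTF_8.newEncoder().encode(text, direct, true);
            direct.flip();
            while (direct.hasRemaining()) {
                channel.write(direct);
            }
        }
    }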

-ck

On Thu, Aug 26, 2021, at 10:00, Volkan Yazıcı wrote:
> Ralph, maybe a dumb idea but... Can't we simply write to /dev/null to
> avoid the hardware's influence on the test results?
> 
> On Wed, Aug 25, 2021 at 8:05 PM Ralph Goers <ralph.go...@dslextreme.com>
> wrote:
> 
> > Did you add the no-op appender to Ceki’s project? Or are you using our
> > NullAppender? I have
> > my doubts about using that NullAppender as it does nothing - not even
> > render the Layout, so
> > it may get completely optimized away.
> >
> > I’d still like to know what kind of disks Remko did his original testing
> > on. The page says he got 18.495 million messages per second. Each message
> > would be about 130 characters long from what I can tell. That means the
> > destination has to be able to support at least about 2.4 GB per second.
> >
> > In Ceki’s test the message is only 23 characters long (vs 100 in Remko’s).
> > That means the logged message size is only going to be about 60 characters.
> > For me the async test is writing at about 1700 ops/ms, which equates to
> > 102 MB/s, yet I am still seeing the queue backed up. So the way we are
> > writing must be terribly inefficient. Logback is getting about 2000 ops/ms,
> > which is still only 120 MB/s.
> >
> > Your test below would be generating about 197 MB/s.
> >
> > One thing I did notice that made a big difference is that Ceki has Logback
> > configured to
> > use a buffer size of 256K while Log4j2 uses the default of 8K. Setting
> > Log4j2 to 256K
> > considerably improved the performance.
> >
> >
> > Ralph
> >
> > > On Aug 25, 2021, at 8:42 AM, Carter Kozak <cko...@ckozak.net> wrote:
> > >
> > >> If we would move the appender performance aside, am I right to
> > >> conclude that the entire complex async. framework built atop
> > >> Disruptor with all its whistles and bells is yielding a diminishing
> > >> performance gain compared to a simple off the shelf blocking queue?
> > >
> > > I haven't seen any data that would give me such an impression -- the
> > > file appender itself is slower than the thread producing events, which
> > > appears to limit our throughput.
> > > Even then, log4j2 does much better than logback in my testing on a
> > > Linux workstation with 20+ producer threads.
> > > Any time the queue is constantly full, performance will suffer (this
> > > applies to standard blocking queues as well as the disruptor, though
> > > the disruptor fares worse when the queue is full).
> > > The results using a single thread are roughly reproducible on the
> > > non-async test, so I think the problem is somewhere between the layout
> > > and the appender/manager.
> > >
> > > Results running locally with no-op appender implementations:
> > >
> > > 1 thread:
> > >
> > > Benchmark                                        Mode  Cnt     Score    Error   Units
> > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt    4  3285.743 ± 427.963  ops/ms
> > > AsyncWithFileAppenderBenchmark.logbackFile      thrpt    4  2383.532 ± 136.034  ops/ms
> > >
> > > 20 threads:
> > >
> > > Benchmark                                        Mode  Cnt     Score    Error   Units
> > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt    4  1177.746 ± 919.560  ops/ms
> > > AsyncWithFileAppenderBenchmark.logbackFile      thrpt    4   602.614 ±  47.485  ops/ms
> >
> >
> >
> 
