Re: Cek's new log4j vs logback benchmark

2021-10-15 Thread Carter Kozak
Thanks :-) That works for me.

-ck

> On Oct 15, 2021, at 7:07 AM, Volkan Yazıcı  wrote:
> 
> Welcome back and great work Carter! I am inclined to merge the
> `release-2.x` changes to clear out the way for the 2.15.0 release. (I guess
> Ralph's changes will be the only blocker after that.) Porting those to
> `master` can be tackled in parallel.
> 
>> On Fri, Oct 15, 2021 at 1:03 PM Carter Kozak  wrote:
>> 
>> I’ve been on vacation with limited network access since Saturday, but I’ll
>> be back on Sunday. My PR is ready to merge into release-2.x, I haven’t done
>> so yet because I began the cherry-pick to master but was unable to complete
>> the relatively painful conflict resolution before my flight out. Hoping to
>> get that settled early next week.
>> 



Re: Cek's new log4j vs logback benchmark

2021-10-15 Thread Volkan Yazıcı
Welcome back and great work Carter! I am inclined to merge the
`release-2.x` changes to clear out the way for the 2.15.0 release. (I guess
Ralph's changes will be the only blocker after that.) Porting those to
`master` can be tackled in parallel.

On Fri, Oct 15, 2021 at 1:03 PM Carter Kozak  wrote:

> I’ve been on vacation with limited network access since Saturday, but I’ll
> be back on Sunday. My PR is ready to merge into release-2.x, I haven’t done
> so yet because I began the cherry-pick to master but was unable to complete
> the relatively painful conflict resolution before my flight out. Hoping to
> get that settled early next week.
>


Re: Cek's new log4j vs logback benchmark

2021-10-15 Thread Carter Kozak
I’ve been on vacation with limited network access since Saturday, but I’ll be 
back on Sunday. My PR is ready to merge into release-2.x, I haven’t done so yet 
because I began the cherry-pick to master but was unable to complete the 
relatively painful conflict resolution before my flight out. Hoping to get that 
settled early next week.


Re: Cek's new log4j vs logback benchmark

2021-10-05 Thread Remko Popma
I don't have access to those boxes any more.
I will be going on holiday from tomorrow without access to work email so it
will be difficult for me to find out.

One way is to assume they were new and look at what was the fastest
reported speed for 2015 SSDs.
I found this:
https://www.alphr.com/ssds/1000200/best-ssds-of-2015-whats-the-best-ssd-on-the-market/
and
https://www.alphr.com/samsung-850-pro-256gb/1000187/samsung-850-pro-256gb-review/
which mentions
"The 850 Pro’s AS SSD sequential read and write results of 527MB/sec and
502MB/sec are the best we’ve seen, and it’s the fastest we’ve seen in the
4K 64 read and write tests as well."
My guess is that this would be the ballpark for disks of 2015.

I hope that is useful.
Remko


On Mon, Oct 4, 2021 at 4:36 PM Ralph Goers 
wrote:

> I don’t believe Carter’s latest changes have been merged. But when I tried
> to build
> Carter’s branch one unit test kept failing. I believe the problem is in
> the test - some
> sort of timing issue.
>
> I have some documentation updates I need to make as well as my usual pass
> through
> open issues and PRs. I just finished a project at work that has been
> consuming a
> lot of my time so I should be able to work on that this week.
>
> It would be really nice to update our performance page as it is seriously
> out of date as
> the tests are all based on Log4j 2 2.6. I am reluctant to do that though
> since I cannot
> replicate the results Remko published there. Although Remko documented
> info on the
> CPU that was used the page doesn’t mention what type of disks were used
> and how
> fast they are. I’ve run the tests on my Mac with the SSD that comes with
> it and have
> never been able to verify the results.
>
> Ralph
>
> > On Oct 4, 2021, at 12:09 AM, Volkan Yazıcı  wrote:
> >
> > Gentlemen, what is the status of the progress on this front? Is it in a
> > "good enough" state for the 2.15.0 release?
> >
> > On Thu, Sep 23, 2021 at 6:29 PM Carter Kozak  wrote:
> >
> >> Thanks, Ralph!
> >>
> >> You're correct that the direct encoderes are required for GC-free
> logging,
> >> some of our consumers rely on that
> >> for niche (low latency) workloads where it's important to understand
> >> precisely where a pause may occur.
> >> GC performance has become much better over the last several years, and
> the
> >> throughput of the allocation-
> >> heavy workflow has increased significantly, however there are a few
> things
> >> we should keep in mind:
> >>
> >> 1. Java 8 performs better with direct encoders (although afaict the
> >> performance isn't any better than java 9+,
> >>string.getBytes just has several new optimizations
> >> 2. If this[1] change is merged into openjdk, the direct encoders may
> once
> >> again be the best-performing option,
> >>I'm planning to put together a local build for comparison later.
> >> 3. My testing has primarily targetted UTF-8 as it's the only charset we
> >> target at work, and the default charset
> >>on most modern servers. The jdk has a great deal of optimization
> which
> >> special cases UTF-8 for this reason,
> >>especially from string.getBytes. Other charsets are likely to perform
> >> differently.
> >>
> >> I'm in favor of changing the default value of "direct.encoders" based on
> >> the java version, for example
> >> Java 8 should continue to use them, while versions after compact-strings
> >> was introduced are better
> >> off without direct encoders until the jdk can be improved using
> something
> >> like the linked PR.
> >>
> >> 1. https://github.com/openjdk/jdk/pull/5621#issuecomment-925413767
> >>
> >> -ck
> >>
> >> On Thu, Sep 23, 2021, at 11:51, Ralph Goers wrote:
> >>> I ran Ceki’s benchmarks against 2.14.1, 2.15.0-SNAPSHOT and Carter’s
> >> branch. I ran them with 16 threads on my
> >>> MacBook Pro and ran each test twice since the tests show some
> >> variability from time to time.
> >>>
> >>> I can draw 2 conclusions from this.
> >>> 1. The improvements Carter has made to 2.15.0 have already made a
> >> noticeable improvement.
> >>> 2. His new changes show a definite improvement on top of the 2.15.0
> >> improvements when enable direct encoders is false.
> >>>
> >>> These results definitely make me wonder if we should just remove the
> >> direct encoders logic at least in master. From what
> >>> I can tell it makes things complicated and makes no noticeable impact
> on
> >> performance in 2.15.0 even before Carter’s
> >>> changes. Are they required for gc-free? Can we make direct.encoders =
> >> false the default if it isn’t now?
> >>>
> >>> I apologize if the performance results don’t look good in your email.
> >> They are formatted with a fixed-width font but it is
> >>> likely they will be messed up once they reach your mail client.
> >>>
> >>> Ralph
> >>>
> >>> 2.14.1 - enable direct encoders = false
> >>>
> >>> BenchmarkMode  CntScore
> >> Error  Units
> >>> AsyncWithFileAppenderBenchmark.log4j1File   thrp

Re: Cek's new log4j vs logback benchmark

2021-10-04 Thread Carter Kozak
Ralph, could you point me toward the test that was failing for you? I'd fixed 
up a few and thought everything was passing. The branch should be ready to 
merge unless there are test flakes, then I'll fix them up. I'm also planning to 
turn off direct encoders by default on Java 9 and newer in a separate change.
I've been busy with work and life things, and haven't been as proactive with 
this as I'd like.

-ck

On Mon, Oct 4, 2021, at 03:35, Ralph Goers wrote:
> I don’t believe Carter’s latest changes have been merged. But when I tried to 
> build 
> Carter’s branch one unit test kept failing. I believe the problem is in the 
> test - some 
> sort of timing issue.
> 
> I have some documentation updates I need to make as well as my usual pass 
> through 
> open issues and PRs. I just finished a project at work that has been 
> consuming a 
> lot of my time so I should be able to work on that this week.
> 
> It would be really nice to update our performance page as it is seriously out 
> of date as 
> the tests are all based on Log4j 2 2.6. I am reluctant to do that though 
> since I cannot 
> replicate the results Remko published there. Although Remko documented info 
> on the 
> CPU that was used the page doesn’t mention what type of disks were used and 
> how 
> fast they are. I’ve run the tests on my Mac with the SSD that comes with it 
> and have 
> never been able to verify the results.
> 
> Ralph
> 
> > On Oct 4, 2021, at 12:09 AM, Volkan Yazıcı  wrote:
> > 
> > Gentlemen, what is the status of the progress on this front? Is it in a
> > "good enough" state for the 2.15.0 release?
> > 
> > On Thu, Sep 23, 2021 at 6:29 PM Carter Kozak  wrote:
> > 
> >> Thanks, Ralph!
> >> 
> >> You're correct that the direct encoderes are required for GC-free logging,
> >> some of our consumers rely on that
> >> for niche (low latency) workloads where it's important to understand
> >> precisely where a pause may occur.
> >> GC performance has become much better over the last several years, and the
> >> throughput of the allocation-
> >> heavy workflow has increased significantly, however there are a few things
> >> we should keep in mind:
> >> 
> >> 1. Java 8 performs better with direct encoders (although afaict the
> >> performance isn't any better than java 9+,
> >>string.getBytes just has several new optimizations
> >> 2. If this[1] change is merged into openjdk, the direct encoders may once
> >> again be the best-performing option,
> >>I'm planning to put together a local build for comparison later.
> >> 3. My testing has primarily targetted UTF-8 as it's the only charset we
> >> target at work, and the default charset
> >>on most modern servers. The jdk has a great deal of optimization which
> >> special cases UTF-8 for this reason,
> >>especially from string.getBytes. Other charsets are likely to perform
> >> differently.
> >> 
> >> I'm in favor of changing the default value of "direct.encoders" based on
> >> the java version, for example
> >> Java 8 should continue to use them, while versions after compact-strings
> >> was introduced are better
> >> off without direct encoders until the jdk can be improved using something
> >> like the linked PR.
> >> 
> >> 1. https://github.com/openjdk/jdk/pull/5621#issuecomment-925413767
> >> 
> >> -ck
> >> 
> >> On Thu, Sep 23, 2021, at 11:51, Ralph Goers wrote:
> >>> I ran Ceki’s benchmarks against 2.14.1, 2.15.0-SNAPSHOT and Carter’s
> >> branch. I ran them with 16 threads on my
> >>> MacBook Pro and ran each test twice since the tests show some
> >> variability from time to time.
> >>> 
> >>> I can draw 2 conclusions from this.
> >>> 1. The improvements Carter has made to 2.15.0 have already made a
> >> noticeable improvement.
> >>> 2. His new changes show a definite improvement on top of the 2.15.0
> >> improvements when enable direct encoders is false.
> >>> 
> >>> These results definitely make me wonder if we should just remove the
> >> direct encoders logic at least in master. From what
> >>> I can tell it makes things complicated and makes no noticeable impact on
> >> performance in 2.15.0 even before Carter’s
> >>> changes. Are they required for gc-free? Can we make direct.encoders =
> >> false the default if it isn’t now?
> >>> 
> >>> I apologize if the performance results don’t look good in your email.
> >> They are formatted with a fixed-width font but it is
> >>> likely they will be messed up once they reach your mail client.
> >>> 
> >>> Ralph
> >>> 
> >>> 2.14.1 - enable direct encoders = false
> >>> 
> >>> BenchmarkMode  CntScore
> >> Error  Units
> >>> AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1221.995 ±
> >> 150.876  ops/ms
> >>> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1495.072 ±
> >> 41.715  ops/ms
> >>> AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1529.807 ±
> >> 55.876  ops/ms
> >>> FileAppenderBenchmark.log4j1Filethrpt  10   681.6

Re: Cek's new log4j vs logback benchmark

2021-10-04 Thread Ralph Goers
I don’t believe Carter’s latest changes have been merged. But when I tried to 
build 
Carter’s branch one unit test kept failing. I believe the problem is in the 
test - some 
sort of timing issue.

I have some documentation updates I need to make as well as my usual pass 
through 
open issues and PRs. I just finished a project at work that has been consuming 
a 
lot of my time so I should be able to work on that this week.

It would be really nice to update our performance page as it is seriously out 
of date as 
the tests are all based on Log4j 2 2.6. I am reluctant to do that though since 
I cannot 
replicate the results Remko published there. Although Remko documented info on 
the 
CPU that was used the page doesn’t mention what type of disks were used and how 
fast they are. I’ve run the tests on my Mac with the SSD that comes with it and 
have 
never been able to verify the results.

Ralph

> On Oct 4, 2021, at 12:09 AM, Volkan Yazıcı  wrote:
> 
> Gentlemen, what is the status of the progress on this front? Is it in a
> "good enough" state for the 2.15.0 release?
> 
> On Thu, Sep 23, 2021 at 6:29 PM Carter Kozak  wrote:
> 
>> Thanks, Ralph!
>> 
>> You're correct that the direct encoderes are required for GC-free logging,
>> some of our consumers rely on that
>> for niche (low latency) workloads where it's important to understand
>> precisely where a pause may occur.
>> GC performance has become much better over the last several years, and the
>> throughput of the allocation-
>> heavy workflow has increased significantly, however there are a few things
>> we should keep in mind:
>> 
>> 1. Java 8 performs better with direct encoders (although afaict the
>> performance isn't any better than java 9+,
>>string.getBytes just has several new optimizations
>> 2. If this[1] change is merged into openjdk, the direct encoders may once
>> again be the best-performing option,
>>I'm planning to put together a local build for comparison later.
>> 3. My testing has primarily targetted UTF-8 as it's the only charset we
>> target at work, and the default charset
>>on most modern servers. The jdk has a great deal of optimization which
>> special cases UTF-8 for this reason,
>>especially from string.getBytes. Other charsets are likely to perform
>> differently.
>> 
>> I'm in favor of changing the default value of "direct.encoders" based on
>> the java version, for example
>> Java 8 should continue to use them, while versions after compact-strings
>> was introduced are better
>> off without direct encoders until the jdk can be improved using something
>> like the linked PR.
>> 
>> 1. https://github.com/openjdk/jdk/pull/5621#issuecomment-925413767
>> 
>> -ck
>> 
>> On Thu, Sep 23, 2021, at 11:51, Ralph Goers wrote:
>>> I ran Ceki’s benchmarks against 2.14.1, 2.15.0-SNAPSHOT and Carter’s
>> branch. I ran them with 16 threads on my
>>> MacBook Pro and ran each test twice since the tests show some
>> variability from time to time.
>>> 
>>> I can draw 2 conclusions from this.
>>> 1. The improvements Carter has made to 2.15.0 have already made a
>> noticeable improvement.
>>> 2. His new changes show a definite improvement on top of the 2.15.0
>> improvements when enable direct encoders is false.
>>> 
>>> These results definitely make me wonder if we should just remove the
>> direct encoders logic at least in master. From what
>>> I can tell it makes things complicated and makes no noticeable impact on
>> performance in 2.15.0 even before Carter’s
>>> changes. Are they required for gc-free? Can we make direct.encoders =
>> false the default if it isn’t now?
>>> 
>>> I apologize if the performance results don’t look good in your email.
>> They are formatted with a fixed-width font but it is
>>> likely they will be messed up once they reach your mail client.
>>> 
>>> Ralph
>>> 
>>> 2.14.1 - enable direct encoders = false
>>> 
>>> BenchmarkMode  CntScore
>> Error  Units
>>> AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1221.995 ±
>> 150.876  ops/ms
>>> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1495.072 ±
>> 41.715  ops/ms
>>> AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1529.807 ±
>> 55.876  ops/ms
>>> FileAppenderBenchmark.log4j1Filethrpt  10   681.690 ±
>> 17.631  ops/ms
>>> FileAppenderBenchmark.log4j2Filethrpt  10  1134.543 ±
>> 35.065  ops/ms
>>> FileAppenderBenchmark.logbackFile   thrpt  10  1546.811 ±
>> 95.721  ops/ms
>>> 
>>> BenchmarkMode  CntScore
>> Error  Units
>>> AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1103.526 ±
>> 78.201  ops/ms
>>> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1402.669 ±
>> 144.374  ops/ms
>>> AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1475.666 ±
>> 87.885  ops/ms
>>> FileAppenderBenchmark.log4j1Filethrpt  10   582.841 ±
>> 57.156  ops/ms
>>> Fi

Re: Cek's new log4j vs logback benchmark

2021-10-04 Thread Volkan Yazıcı
Gentlemen, what is the status of the progress on this front? Is it in a
"good enough" state for the 2.15.0 release?

On Thu, Sep 23, 2021 at 6:29 PM Carter Kozak  wrote:

> Thanks, Ralph!
>
> You're correct that the direct encoderes are required for GC-free logging,
> some of our consumers rely on that
> for niche (low latency) workloads where it's important to understand
> precisely where a pause may occur.
> GC performance has become much better over the last several years, and the
> throughput of the allocation-
> heavy workflow has increased significantly, however there are a few things
> we should keep in mind:
>
> 1. Java 8 performs better with direct encoders (although afaict the
> performance isn't any better than java 9+,
> string.getBytes just has several new optimizations
> 2. If this[1] change is merged into openjdk, the direct encoders may once
> again be the best-performing option,
> I'm planning to put together a local build for comparison later.
> 3. My testing has primarily targetted UTF-8 as it's the only charset we
> target at work, and the default charset
> on most modern servers. The jdk has a great deal of optimization which
> special cases UTF-8 for this reason,
> especially from string.getBytes. Other charsets are likely to perform
> differently.
>
> I'm in favor of changing the default value of "direct.encoders" based on
> the java version, for example
> Java 8 should continue to use them, while versions after compact-strings
> was introduced are better
> off without direct encoders until the jdk can be improved using something
> like the linked PR.
>
> 1. https://github.com/openjdk/jdk/pull/5621#issuecomment-925413767
>
> -ck
>
> On Thu, Sep 23, 2021, at 11:51, Ralph Goers wrote:
> > I ran Ceki’s benchmarks against 2.14.1, 2.15.0-SNAPSHOT and Carter’s
> branch. I ran them with 16 threads on my
> > MacBook Pro and ran each test twice since the tests show some
> variability from time to time.
> >
> > I can draw 2 conclusions from this.
> > 1. The improvements Carter has made to 2.15.0 have already made a
> noticeable improvement.
> > 2. His new changes show a definite improvement on top of the 2.15.0
> improvements when enable direct encoders is false.
> >
> > These results definitely make me wonder if we should just remove the
> direct encoders logic at least in master. From what
> > I can tell it makes things complicated and makes no noticeable impact on
> performance in 2.15.0 even before Carter’s
> > changes. Are they required for gc-free? Can we make direct.encoders =
> false the default if it isn’t now?
> >
> > I apologize if the performance results don’t look good in your email.
> They are formatted with a fixed-width font but it is
> > likely they will be messed up once they reach your mail client.
> >
> > Ralph
> >
> > 2.14.1 - enable direct encoders = false
> >
> > BenchmarkMode  CntScore
> Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1221.995 ±
> 150.876  ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1495.072 ±
> 41.715  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1529.807 ±
> 55.876  ops/ms
> > FileAppenderBenchmark.log4j1Filethrpt  10   681.690 ±
> 17.631  ops/ms
> > FileAppenderBenchmark.log4j2Filethrpt  10  1134.543 ±
> 35.065  ops/ms
> > FileAppenderBenchmark.logbackFile   thrpt  10  1546.811 ±
> 95.721  ops/ms
> >
> > BenchmarkMode  CntScore
> Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1103.526 ±
> 78.201  ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1402.669 ±
> 144.374  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1475.666 ±
> 87.885  ops/ms
> > FileAppenderBenchmark.log4j1Filethrpt  10   582.841 ±
> 57.156  ops/ms
> > FileAppenderBenchmark.log4j2Filethrpt  10   860.587 ±
> 185.170  ops/ms
> > FileAppenderBenchmark.logbackFile   thrpt  10  1560.206 ±
> 323.229  ops/ms
> >
> > 2.14.1 - enable direct encoders = true
> >
> > BenchmarkMode  CntScore
> Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1250.808 ±
> 116.848  ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1564.926 ±
> 146.195  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1703.731 ±
> 183.304  ops/ms
> > FileAppenderBenchmark.log4j1Filethrpt  10  736.189 ±
>  40.798  ops/ms
> > FileAppenderBenchmark.log4j2Filethrpt  10  1070.958 ±
> 97.173  ops/ms
> > FileAppenderBenchmark.logbackFile   thrpt  10  1350.507 ±
> 225.104  ops/ms
> >
> > BenchmarkMode  CntScore
> Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1218.639 ±
> 141.343 

Re: Cek's new log4j vs logback benchmark

2021-09-23 Thread Matt Sicker
I don't think we should get rid of the Java 8 specific performance
options in the 2.x branch. In 3.x, we can simplify things to default
to appropriate options for Java 11+ and also removing the less useful
variants.

On Thu, Sep 23, 2021 at 11:29 AM Carter Kozak  wrote:
>
> Thanks, Ralph!
>
> You're correct that the direct encoderes are required for GC-free logging, 
> some of our consumers rely on that
> for niche (low latency) workloads where it's important to understand 
> precisely where a pause may occur.
> GC performance has become much better over the last several years, and the 
> throughput of the allocation-
> heavy workflow has increased significantly, however there are a few things we 
> should keep in mind:
>
> 1. Java 8 performs better with direct encoders (although afaict the 
> performance isn't any better than java 9+,
> string.getBytes just has several new optimizations
> 2. If this[1] change is merged into openjdk, the direct encoders may once 
> again be the best-performing option,
> I'm planning to put together a local build for comparison later.
> 3. My testing has primarily targetted UTF-8 as it's the only charset we 
> target at work, and the default charset
> on most modern servers. The jdk has a great deal of optimization which 
> special cases UTF-8 for this reason,
> especially from string.getBytes. Other charsets are likely to perform 
> differently.
>
> I'm in favor of changing the default value of "direct.encoders" based on the 
> java version, for example
> Java 8 should continue to use them, while versions after compact-strings was 
> introduced are better
> off without direct encoders until the jdk can be improved using something 
> like the linked PR.
>
> 1. https://github.com/openjdk/jdk/pull/5621#issuecomment-925413767
>
> -ck
>
> On Thu, Sep 23, 2021, at 11:51, Ralph Goers wrote:
> > I ran Ceki’s benchmarks against 2.14.1, 2.15.0-SNAPSHOT and Carter’s 
> > branch. I ran them with 16 threads on my
> > MacBook Pro and ran each test twice since the tests show some variability 
> > from time to time.
> >
> > I can draw 2 conclusions from this.
> > 1. The improvements Carter has made to 2.15.0 have already made a 
> > noticeable improvement.
> > 2. His new changes show a definite improvement on top of the 2.15.0 
> > improvements when enable direct encoders is false.
> >
> > These results definitely make me wonder if we should just remove the direct 
> > encoders logic at least in master. From what
> > I can tell it makes things complicated and makes no noticeable impact on 
> > performance in 2.15.0 even before Carter’s
> > changes. Are they required for gc-free? Can we make direct.encoders = false 
> > the default if it isn’t now?
> >
> > I apologize if the performance results don’t look good in your email. They 
> > are formatted with a fixed-width font but it is
> > likely they will be messed up once they reach your mail client.
> >
> > Ralph
> >
> > 2.14.1 - enable direct encoders = false
> >
> > BenchmarkMode  CntScore
> > Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1221.995 ± 
> > 150.876  ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1495.072 ±  
> > 41.715  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1529.807 ±  
> > 55.876  ops/ms
> > FileAppenderBenchmark.log4j1Filethrpt  10   681.690 ±  
> > 17.631  ops/ms
> > FileAppenderBenchmark.log4j2Filethrpt  10  1134.543 ±  
> > 35.065  ops/ms
> > FileAppenderBenchmark.logbackFile   thrpt  10  1546.811 ±  
> > 95.721  ops/ms
> >
> > BenchmarkMode  CntScore
> > Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1103.526 ±  
> > 78.201  ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1402.669 ± 
> > 144.374  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1475.666 ±  
> > 87.885  ops/ms
> > FileAppenderBenchmark.log4j1Filethrpt  10   582.841 ±  
> > 57.156  ops/ms
> > FileAppenderBenchmark.log4j2Filethrpt  10   860.587 ± 
> > 185.170  ops/ms
> > FileAppenderBenchmark.logbackFile   thrpt  10  1560.206 ± 
> > 323.229  ops/ms
> >
> > 2.14.1 - enable direct encoders = true
> >
> > BenchmarkMode  CntScore
> > Error  Units
> > AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1250.808 ± 
> > 116.848  ops/ms
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1564.926 ± 
> > 146.195  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1703.731 ± 
> > 183.304  ops/ms
> > FileAppenderBenchmark.log4j1Filethrpt  10  736.189 ±   
> > 40.798  ops/ms
> > FileAppenderBenchmark.log4j2Filethrpt  10  1070.958 ±  
> > 97.173  ops/ms
> > FileAppenderBenchmark.logbackFile 

Re: Cek's new log4j vs logback benchmark

2021-09-23 Thread Carter Kozak
Thanks, Ralph!

You're correct that the direct encoderes are required for GC-free logging, some 
of our consumers rely on that
for niche (low latency) workloads where it's important to understand precisely 
where a pause may occur.
GC performance has become much better over the last several years, and the 
throughput of the allocation-
heavy workflow has increased significantly, however there are a few things we 
should keep in mind:

1. Java 8 performs better with direct encoders (although afaict the performance 
isn't any better than java 9+,
string.getBytes just has several new optimizations
2. If this[1] change is merged into openjdk, the direct encoders may once again 
be the best-performing option,
I'm planning to put together a local build for comparison later.
3. My testing has primarily targetted UTF-8 as it's the only charset we target 
at work, and the default charset
on most modern servers. The jdk has a great deal of optimization which 
special cases UTF-8 for this reason,
especially from string.getBytes. Other charsets are likely to perform 
differently.

I'm in favor of changing the default value of "direct.encoders" based on the 
java version, for example
Java 8 should continue to use them, while versions after compact-strings was 
introduced are better
off without direct encoders until the jdk can be improved using something like 
the linked PR.

1. https://github.com/openjdk/jdk/pull/5621#issuecomment-925413767

-ck

On Thu, Sep 23, 2021, at 11:51, Ralph Goers wrote:
> I ran Ceki’s benchmarks against 2.14.1, 2.15.0-SNAPSHOT and Carter’s branch. 
> I ran them with 16 threads on my
> MacBook Pro and ran each test twice since the tests show some variability 
> from time to time. 
> 
> I can draw 2 conclusions from this.
> 1. The improvements Carter has made to 2.15.0 have already made a noticeable 
> improvement. 
> 2. His new changes show a definite improvement on top of the 2.15.0 
> improvements when enable direct encoders is false.
> 
> These results definitely make me wonder if we should just remove the direct 
> encoders logic at least in master. From what 
> I can tell it makes things complicated and makes no noticeable impact on 
> performance in 2.15.0 even before Carter’s 
> changes. Are they required for gc-free? Can we make direct.encoders = false 
> the default if it isn’t now?
> 
> I apologize if the performance results don’t look good in your email. They 
> are formatted with a fixed-width font but it is 
> likely they will be messed up once they reach your mail client.
> 
> Ralph
> 
> 2.14.1 - enable direct encoders = false
> 
> BenchmarkMode  CntScoreError  
> Units
> AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1221.995 ± 150.876 
>  ops/ms
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1495.072 ±  41.715 
>  ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1529.807 ±  55.876 
>  ops/ms
> FileAppenderBenchmark.log4j1Filethrpt  10   681.690 ±  17.631 
>  ops/ms
> FileAppenderBenchmark.log4j2Filethrpt  10  1134.543 ±  35.065 
>  ops/ms
> FileAppenderBenchmark.logbackFile   thrpt  10  1546.811 ±  95.721 
>  ops/ms
> 
> BenchmarkMode  CntScoreError  
> Units
> AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1103.526 ±  78.201 
>  ops/ms
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1402.669 ± 144.374 
>  ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1475.666 ±  87.885 
>  ops/ms
> FileAppenderBenchmark.log4j1Filethrpt  10   582.841 ±  57.156 
>  ops/ms
> FileAppenderBenchmark.log4j2Filethrpt  10   860.587 ± 185.170 
>  ops/ms
> FileAppenderBenchmark.logbackFile   thrpt  10  1560.206 ± 323.229 
>  ops/ms
> 
> 2.14.1 - enable direct encoders = true
> 
> BenchmarkMode  CntScoreError  
> Units
> AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1250.808 ± 116.848 
>  ops/ms
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1564.926 ± 146.195 
>  ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1703.731 ± 183.304 
>  ops/ms
> FileAppenderBenchmark.log4j1Filethrpt  10  736.189 ±   40.798 
>  ops/ms
> FileAppenderBenchmark.log4j2Filethrpt  10  1070.958 ±  97.173 
>  ops/ms
> FileAppenderBenchmark.logbackFile   thrpt  10  1350.507 ± 225.104 
>  ops/ms
> 
> BenchmarkMode  CntScoreError  
> Units
> AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1218.639 ± 141.343 
>  ops/ms
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1525.081 ± 112.058 
>  ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1724.925 ± 169.680 
>  ops/ms
> FileAppenderBenchmark.log4j1Filethrpt  10

Re: Cek's new log4j vs logback benchmark

2021-09-23 Thread Ralph Goers
I ran Ceki’s benchmarks against 2.14.1, 2.15.0-SNAPSHOT and Carter’s branch. I 
ran them with 16 threads on my
MacBook Pro and ran each test twice since the tests show some variability from 
time to time. 

I can draw 2 conclusions from this.
1. The improvements Carter has made to 2.15.0 have already made a noticeable 
improvement. 
2. His new changes show a definite improvement on top of the 2.15.0 
improvements when enable direct encoders is false.

These results definitely make me wonder if we should just remove the direct 
encoders logic at least in master. From what 
I can tell it makes things complicated and makes no noticeable impact on 
performance in 2.15.0 even before Carter’s 
changes. Are they required for gc-free? Can we make direct.encoders = false the 
default if it isn’t now?

I apologize if the performance results don’t look good in your email. They are 
formatted with a fixed-width font but it is 
likely they will be messed up once they reach your mail client.

Ralph

2.14.1 - enable direct encoders = false

BenchmarkMode  CntScoreError  
Units
AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1221.995 ± 150.876  
ops/ms
AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1495.072 ±  41.715  
ops/ms
AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1529.807 ±  55.876  
ops/ms
FileAppenderBenchmark.log4j1Filethrpt  10   681.690 ±  17.631  
ops/ms
FileAppenderBenchmark.log4j2Filethrpt  10  1134.543 ±  35.065  
ops/ms
FileAppenderBenchmark.logbackFile   thrpt  10  1546.811 ±  95.721  
ops/ms

BenchmarkMode  CntScoreError  
Units
AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1103.526 ±  78.201  
ops/ms
AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1402.669 ± 144.374  
ops/ms
AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1475.666 ±  87.885  
ops/ms
FileAppenderBenchmark.log4j1Filethrpt  10   582.841 ±  57.156  
ops/ms
FileAppenderBenchmark.log4j2Filethrpt  10   860.587 ± 185.170  
ops/ms
FileAppenderBenchmark.logbackFile   thrpt  10  1560.206 ± 323.229  
ops/ms

2.14.1 - enable direct encoders = true

BenchmarkMode  CntScoreError  
Units
AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1250.808 ± 116.848  
ops/ms
AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1564.926 ± 146.195  
ops/ms
AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1703.731 ± 183.304  
ops/ms
FileAppenderBenchmark.log4j1Filethrpt  10  736.189 ±   40.798  
ops/ms
FileAppenderBenchmark.log4j2Filethrpt  10  1070.958 ±  97.173  
ops/ms
FileAppenderBenchmark.logbackFile   thrpt  10  1350.507 ± 225.104  
ops/ms

BenchmarkMode  CntScoreError  
Units
AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1218.639 ± 141.343  
ops/ms
AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1525.081 ± 112.058  
ops/ms
AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1724.925 ± 169.680  
ops/ms
FileAppenderBenchmark.log4j1Filethrpt  10   680.453 ±  32.330  
ops/ms
FileAppenderBenchmark.log4j2Filethrpt  10  1054.517 ±  52.147  
ops/ms
FileAppenderBenchmark.logbackFile   thrpt  10  1353.180 ± 224.415  
ops/ms

2.15.0-SNAPSHOT - enable direct encoders = false

BenchmarkMode  CntScoreError  
Units
AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1121.838 ± 112.026  
ops/ms
AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1606.602 ± 217.776  
ops/ms
AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1607.875 ± 226.656  
ops/ms
FileAppenderBenchmark.log4j1Filethrpt  10  694.809 ±  33.380   
ops/ms
FileAppenderBenchmark.log4j2Filethrpt  10  1500.557 ± 402.779  
ops/ms
FileAppenderBenchmark.logbackFile   thrpt  10  1387.306 ± 213.081  
ops/ms

BenchmarkMode  CntScoreError  
Units
AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1199.591 ± 149.157 
ops/ms
AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt  10  1744.558 ± 190.732 
ops/ms
AsyncWithFileAppenderBenchmark.logbackFile  thrpt  10  1718.862 ± 142.328 
ops/ms
FileAppenderBenchmark.log4j1Filethrpt  10   604.785 ± 78.884  
ops/ms
FileAppenderBenchmark.log4j2Filethrpt  10  1604.844 ± 712.807 
ops/ms
FileAppenderBenchmark.logbackFile   thrpt  10  1578.756 ± 109.343 
ops/ms

2.15.0-SNAPSHOT - enable direct encoders = true

BenchmarkMode  CntScoreError  
Units
AsyncWithFileAppenderBenchmark.log4j1File   thrpt  10  1244.029 ±  61.842  
ops/ms
Asyn

Re: Cek's new log4j vs logback benchmark

2021-09-22 Thread Volkan Yazıcı
That's great news! Then I will be looking forward to your signal for
putting a ribbon onto 2.15.0.

I have followed the conversation with Claes Redestad from Oracle on Twitter
. My
conclusion was also that there apparently is no way to make CharsetEncoder
beat .toString().getBytes() in Java 9+, until a CE specialization gets
introduced similar to what has been done in .toString().getBytes().
Nevertheless, I was also (naively?) thinking about a similar branching
strategy like the one you proposed as a temporary solution: preJava9 ?
useCE() : useStringGetBytes(). Would you be able to get this done?

Another idea I thought of yesterday evening was to introduce our own
hand-written StringBuilder- or char[]-to-byte[] encoders for common cases,
i.e., ASCII and UTF-8. What do you think?


On Wed, Sep 22, 2021 at 3:48 AM Carter Kozak  wrote:

> Thanks, Volkan!
>
> Rerunning the benchmarks on my branch (specifically the PatternLayout
> benchmarks in log4j-perf) produced much better improvements than I had
> anticipated. Beyond that, the date format cache invalidation problem
> resulted in a substantial speedup. I agree that it would be helpful to get
> a release out the door once this is merged.
>
> Re getBytes vs CharsetEncoder, I don't want to use the unsafe hack I put
> together in my benchmark project, that was just for experimentation :-)
> Future java releases (or changes in minor patch releases) could cause it to
> fail in frightening ways. We may be better off recommending the getBytes
> approach for now on some java versions (possibly by changing our default on
> java 9+).
> Claes has a potential change[1] which appears to buy us a great deal of
> performance in future Javas (assuming it is merged) and we may be able to
> engage for additional encoding APIs, for example something like this could
> avoid allocations and additional buffers:
>
> /** Returns the number of characters encoded to destination, or -1 if more
> space is needed (equivalent to CoderResult.OVERFLOW) */
> int CharSetEncoder.encode(charsequence, inputStart, inputLimit, byte[]
> destination, int destOffset, int destLimit)
> (I haven't put a great deal of thought into this API and it's getting
> late, so pardon any terrible ideas!)
>
> 1. https://github.com/openjdk/jdk/pull/5621
>
> -ck
>
> On Tue, Sep 21, 2021, at 15:31, Volkan Yazıcı wrote:
> > First and foremost, fantastic job Carter!
> >
> > For #573, I see that Gary and Ralph have already shared some remarks. I
> > would appreciate it if we can get this merged and cut the ribbon for
> 2.15.0
> > release.
> >
> > Regarding `StringBuilder#toString().getBytes()`-vs-`CharsetEncoder`...
> That
> > is a tough one. In your hack branch there, I am not sure if using
> `Unsafe`
> > is a future-proof path forward. I was trying to wrap my mind around
> Daniel
> > Sun's fast-reflection 
> > (for the records, I couldn't) and was triggered by his ASM usage there. I
> > was curious if we could do something similar via ASM to hack
> > `CharsetEncoder`? (I am probably talking nonsense, I hope it anyway
> > triggers a practical idea for you.)
>


Re: Cek's new log4j vs logback benchmark

2021-09-21 Thread Carter Kozak
Thanks, Volkan!

Rerunning the benchmarks on my branch (specifically the PatternLayout 
benchmarks in log4j-perf) produced much better improvements than I had 
anticipated. Beyond that, the date format cache invalidation problem resulted 
in a substantial speedup. I agree that it would be helpful to get a release out 
the door once this is merged.

Re getBytes vs CharsetEncoder, I don't want to use the unsafe hack I put 
together in my benchmark project, that was just for experimentation :-) Future 
java releases (or changes in minor patch releases) could cause it to fail in 
frightening ways. We may be better off recommending the getBytes approach for 
now on some java versions (possibly by changing our default on java 9+).
Claes has a potential change[1] which appears to buy us a great deal of 
performance in future Javas (assuming it is merged) and we may be able to 
engage for additional encoding APIs, for example something like this could 
avoid allocations and additional buffers:

/** Returns the number of characters encoded to destination, or -1 if more 
space is needed (equivalent to CoderResult.OVERFLOW) */
int CharSetEncoder.encode(charsequence, inputStart, inputLimit, byte[] 
destination, int destOffset, int destLimit)
(I haven't put a great deal of thought into this API and it's getting late, so 
pardon any terrible ideas!)

1. https://github.com/openjdk/jdk/pull/5621

-ck

On Tue, Sep 21, 2021, at 15:31, Volkan Yazıcı wrote:
> First and foremost, fantastic job Carter!
> 
> For #573, I see that Gary and Ralph have already shared some remarks. I
> would appreciate it if we can get this merged and cut the ribbon for 2.15.0
> release.
> 
> Regarding `StringBuilder#toString().getBytes()`-vs-`CharsetEncoder`... That
> is a tough one. In your hack branch there, I am not sure if using `Unsafe`
> is a future-proof path forward. I was trying to wrap my mind around Daniel
> Sun's fast-reflection 
> (for the records, I couldn't) and was triggered by his ASM usage there. I
> was curious if we could do something similar via ASM to hack
> `CharsetEncoder`? (I am probably talking nonsense, I hope it anyway
> triggers a practical idea for you.)


Re: Cek's new log4j vs logback benchmark

2021-09-21 Thread Volkan Yazıcı
First and foremost, fantastic job Carter!

For #573, I see that Gary and Ralph have already shared some remarks. I
would appreciate it if we can get this merged and cut the ribbon for 2.15.0
release.

Regarding `StringBuilder#toString().getBytes()`-vs-`CharsetEncoder`... That
is a tough one. In your hack branch there, I am not sure if using `Unsafe`
is a future-proof path forward. I was trying to wrap my mind around Daniel
Sun's fast-reflection 
(for the records, I couldn't) and was triggered by his ASM usage there. I
was curious if we could do something similar via ASM to hack
`CharsetEncoder`? (I am probably talking nonsense, I hope it anyway
triggers a practical idea for you.)

On Mon, Sep 20, 2021 at 11:09 PM Carter Kozak  wrote:

> https://github.com/apache/logging-log4j2/pull/573 improves performance by
> double digits in my testing, unfortunately it's difficult and
> time-consuming to individually compare performance implications of each
> part of the change. I'd appreciate review/feedback on the change. When I
> apply this PR and benchmark with "-Dlog4j2.enable.direct.encoders=false"
> (results in StringBuilder.toString().getBytes(charset) rather than using
> CharsetEncoders) the performance in blocking mode matches logback, and
> async performance is much better.
>
> StringBuilder.toString.getBytes vs CharsetEncoder:
> I have some benchmarks here which illustrate the problem:
> https://github.com/carterkozak/stringbuilder-encoding-performance
> The CharsetEncoder itself doesn't seem to make a great deal of sense these
> days because Strings are no longer backed by char arrays, the conversion
> from bytes (in String/StringBuilder) into chars for the CharBuffer
> over-complicates the process. It could be feasible to add an API method to
> CharsetEncoder which takes a String/StringBuilder (charsequence +
> instanceof?), extracts the bytes+coder, and writes to an existing buffer
> without the CharBuffer conversion, but I haven't had a chance to sketch
> this out.
> I put together a quick and dirty hack showing a zero-gc fast-path for
> StringBuilder-to-bytes conversion, however it suffers from lack of
> intrinsics used in the String.getBytes(UTF_8) implementation:
> https://github.com/carterkozak/stringbuilder-encoding-performance/pull/3
>
> On Mon, Sep 20, 2021, at 15:34, Volkan Yazıcı wrote:
> > Carter, I am really curious about your ongoing crusade. Mind updating us
> on
> > it a bit, please? Where are we? What is your plan? How can we help you?
> >
> > On Wed, Sep 1, 2021 at 2:57 PM Carter Kozak  wrote:
> >
> > > Thanks, Volkan!
> > >
> > > I had not seen InstantFormatter, it does look helpful, however I think
> it
> > > may
> > > have a bug. It appears to only compare milliseconds of Instant values
> > > while FixedDateFormat has some patterns which support microsecond
> > > and nanosecond precision. Currently I think this will batch together
> all
> > > events within a millisecond and assign them the microsecond value
> > > of the first cached event.
> > > I think this is what we want:
> > > https://github.com/apache/logging-log4j2/pull/576
> > >
> > > Good idea reaching out to Claes Redestad, it would be helpful to have
> > > someone more familiar with modern jvm string internals than us take
> > > a look if he's interested!
> > >
> > > -ck
> > >
> > > On Wed, Sep 1, 2021, at 03:44, Volkan Yazıcı wrote:
> > > > Great work Carter!
> > > >
> > > > Have you seen `o.a.l.l.layout.template.json.util.InstantFormatter`,
> > > > particularly its `Formatter#isInstantMatching` methods used for
> > > > invalidating the cache? I was thinking of making it even smarter,
> e.g.,
> > > if
> > > > the pattern only has seconds, compare `Instant`s by their seconds. I
> > > aspire
> > > > to pull it to the core, replace access to all individual formatters
> with
> > > > this one, and mark the rest deprecated. Another idea I was thinking
> about
> > > > is enhancing these individual formatters to contain the precision
> they
> > > > require and use that in `isInstantMatching` methods.
> > > >
> > > > Regarding your unicode character problems, shall we try pinging Claes
> > > > Redestad (@cl4es), who has recently enhanced String.format() in JDK
> 17
> > > > , via Twitter?
> > > >
> > > > On Mon, Aug 30, 2021 at 11:32 PM Carter Kozak 
> wrote:
> > > >
> > > > > I've merged the fix for our FixedDateFormat caching bug which
> caused
> > > us to
> > > > > recompute the same millisecond-precision formatted timestamp
> > > unnecessarily
> > > > > each time our microsecond-precision clock incremented.
> > > > > https://issues.apache.org/jira/browse/LOG4J2-3153
> > > > >
> > > > > I've also been unwrapping a few layers of complexity, wrapping
> several
> > > > > layers of components with conditional logic makes it harder for the
> > > jit to
> > > > > optimize code, so we can improve things by using separate types
> based

Re: Cek's new log4j vs logback benchmark

2021-09-20 Thread Carter Kozak
https://github.com/apache/logging-log4j2/pull/573 improves performance by 
double digits in my testing, unfortunately it's difficult and time-consuming to 
individually compare performance implications of each part of the change. I'd 
appreciate review/feedback on the change. When I apply this PR and benchmark 
with "-Dlog4j2.enable.direct.encoders=false" (results in 
StringBuilder.toString().getBytes(charset) rather than using CharsetEncoders) 
the performance in blocking mode matches logback, and async performance is much 
better.

StringBuilder.toString.getBytes vs CharsetEncoder:
I have some benchmarks here which illustrate the problem: 
https://github.com/carterkozak/stringbuilder-encoding-performance
The CharsetEncoder itself doesn't seem to make a great deal of sense these days 
because Strings are no longer backed by char arrays, the conversion from bytes 
(in String/StringBuilder) into chars for the CharBuffer over-complicates the 
process. It could be feasible to add an API method to CharsetEncoder which 
takes a String/StringBuilder (charsequence + instanceof?), extracts the 
bytes+coder, and writes to an existing buffer without the CharBuffer 
conversion, but I haven't had a chance to sketch this out.
I put together a quick and dirty hack showing a zero-gc fast-path for 
StringBuilder-to-bytes conversion, however it suffers from lack of intrinsics 
used in the String.getBytes(UTF_8) implementation:
https://github.com/carterkozak/stringbuilder-encoding-performance/pull/3

On Mon, Sep 20, 2021, at 15:34, Volkan Yazıcı wrote:
> Carter, I am really curious about your ongoing crusade. Mind updating us on
> it a bit, please? Where are we? What is your plan? How can we help you?
> 
> On Wed, Sep 1, 2021 at 2:57 PM Carter Kozak  wrote:
> 
> > Thanks, Volkan!
> >
> > I had not seen InstantFormatter, it does look helpful, however I think it
> > may
> > have a bug. It appears to only compare milliseconds of Instant values
> > while FixedDateFormat has some patterns which support microsecond
> > and nanosecond precision. Currently I think this will batch together all
> > events within a millisecond and assign them the microsecond value
> > of the first cached event.
> > I think this is what we want:
> > https://github.com/apache/logging-log4j2/pull/576
> >
> > Good idea reaching out to Claes Redestad, it would be helpful to have
> > someone more familiar with modern jvm string internals than us take
> > a look if he's interested!
> >
> > -ck
> >
> > On Wed, Sep 1, 2021, at 03:44, Volkan Yazıcı wrote:
> > > Great work Carter!
> > >
> > > Have you seen `o.a.l.l.layout.template.json.util.InstantFormatter`,
> > > particularly its `Formatter#isInstantMatching` methods used for
> > > invalidating the cache? I was thinking of making it even smarter, e.g.,
> > if
> > > the pattern only has seconds, compare `Instant`s by their seconds. I
> > aspire
> > > to pull it to the core, replace access to all individual formatters with
> > > this one, and mark the rest deprecated. Another idea I was thinking about
> > > is enhancing these individual formatters to contain the precision they
> > > require and use that in `isInstantMatching` methods.
> > >
> > > Regarding your unicode character problems, shall we try pinging Claes
> > > Redestad (@cl4es), who has recently enhanced String.format() in JDK 17
> > > , via Twitter?
> > >
> > > On Mon, Aug 30, 2021 at 11:32 PM Carter Kozak  wrote:
> > >
> > > > I've merged the fix for our FixedDateFormat caching bug which caused
> > us to
> > > > recompute the same millisecond-precision formatted timestamp
> > unnecessarily
> > > > each time our microsecond-precision clock incremented.
> > > > https://issues.apache.org/jira/browse/LOG4J2-3153
> > > >
> > > > I've also been unwrapping a few layers of complexity, wrapping several
> > > > layers of components with conditional logic makes it harder for the
> > jit to
> > > > optimize code, so we can improve things by using separate types based
> > on
> > > > the requested features:
> > > > https://github.com/apache/logging-log4j2/pull/573
> > > > TODO: I'm not happy with the way I unwrapped PatternFormatter objects
> > in
> > > > this PR, I think it could work better as an optional wrapper around the
> > > > delegate LogEventPatternConverter (where the default FormattingInfo
> > returns
> > > > the delegate instance directly)
> > > > TODO: simplify MessagePatternConverter  a bit, the body is giant for
> > > > something that's generally relatively simple. The method is too large
> > for
> > > > me to read in a glance, so I imagine the jit will have a hard time
> > making
> > > > it fast as well. I don't really like the message-format feature which
> > > > allows lookups in the formatted message text because it leaks details
> > of
> > > > the framework implementation/configuration into consumers of logging
> > APIs
> > > > (which may not even be log4j-core), however I'm not sure how
> 

Re: Cek's new log4j vs logback benchmark

2021-09-20 Thread Volkan Yazıcı
Carter, I am really curious about your ongoing crusade. Mind updating us on
it a bit, please? Where are we? What is your plan? How can we help you?

On Wed, Sep 1, 2021 at 2:57 PM Carter Kozak  wrote:

> Thanks, Volkan!
>
> I had not seen InstantFormatter, it does look helpful, however I think it
> may
> have a bug. It appears to only compare milliseconds of Instant values
> while FixedDateFormat has some patterns which support microsecond
> and nanosecond precision. Currently I think this will batch together all
> events within a millisecond and assign them the microsecond value
> of the first cached event.
> I think this is what we want:
> https://github.com/apache/logging-log4j2/pull/576
>
> Good idea reaching out to Claes Redestad, it would be helpful to have
> someone more familiar with modern jvm string internals than us take
> a look if he's interested!
>
> -ck
>
> On Wed, Sep 1, 2021, at 03:44, Volkan Yazıcı wrote:
> > Great work Carter!
> >
> > Have you seen `o.a.l.l.layout.template.json.util.InstantFormatter`,
> > particularly its `Formatter#isInstantMatching` methods used for
> > invalidating the cache? I was thinking of making it even smarter, e.g.,
> if
> > the pattern only has seconds, compare `Instant`s by their seconds. I
> aspire
> > to pull it to the core, replace access to all individual formatters with
> > this one, and mark the rest deprecated. Another idea I was thinking about
> > is enhancing these individual formatters to contain the precision they
> > require and use that in `isInstantMatching` methods.
> >
> > Regarding your unicode character problems, shall we try pinging Claes
> > Redestad (@cl4es), who has recently enhanced String.format() in JDK 17
> > , via Twitter?
> >
> > On Mon, Aug 30, 2021 at 11:32 PM Carter Kozak  wrote:
> >
> > > I've merged the fix for our FixedDateFormat caching bug which caused
> us to
> > > recompute the same millisecond-precision formatted timestamp
> unnecessarily
> > > each time our microsecond-precision clock incremented.
> > > https://issues.apache.org/jira/browse/LOG4J2-3153
> > >
> > > I've also been unwrapping a few layers of complexity, wrapping several
> > > layers of components with conditional logic makes it harder for the
> jit to
> > > optimize code, so we can improve things by using separate types based
> on
> > > the requested features:
> > > https://github.com/apache/logging-log4j2/pull/573
> > > TODO: I'm not happy with the way I unwrapped PatternFormatter objects
> in
> > > this PR, I think it could work better as an optional wrapper around the
> > > delegate LogEventPatternConverter (where the default FormattingInfo
> returns
> > > the delegate instance directly)
> > > TODO: simplify MessagePatternConverter  a bit, the body is giant for
> > > something that's generally relatively simple. The method is too large
> for
> > > me to read in a glance, so I imagine the jit will have a hard time
> making
> > > it fast as well. I don't really like the message-format feature which
> > > allows lookups in the formatted message text because it leaks details
> of
> > > the framework implementation/configuration into consumers of logging
> APIs
> > > (which may not even be log4j-core), however I'm not sure how
> reasonable it
> > > would be to change the default to disallow lookups given I'm sure folks
> > > depend on that behavior.
> > >
> > > I'm not sure what to do about the CharsetEncoder vs
> > > string.getBytes(charset) issue. The CharsetEncoder does not require
> > > allocation and outperforms getBytes when I add a unicode character to
> the
> > > message. When the message contains only ascii characters, getBytes
> performs
> > > better. Using CharBuffer.wrap(StringBuilder) produces substantially
> worse
> > > performance -- it shouldn't be copying the buffer in memory, but I
> suppose
> > > the heap buffer is more efficient to deal with. I need to do more
> research
> > > in this area.
> > >
> > > Thoughts/ideas/concerns?
> > > -ck
> > >
> >
>


Re: Cek's new log4j vs logback benchmark

2021-09-01 Thread Carter Kozak
Thanks, Volkan!

I had not seen InstantFormatter, it does look helpful, however I think it may
have a bug. It appears to only compare milliseconds of Instant values
while FixedDateFormat has some patterns which support microsecond
and nanosecond precision. Currently I think this will batch together all
events within a millisecond and assign them the microsecond value
of the first cached event.
I think this is what we want: https://github.com/apache/logging-log4j2/pull/576

Good idea reaching out to Claes Redestad, it would be helpful to have
someone more familiar with modern jvm string internals than us take
a look if he's interested!

-ck

On Wed, Sep 1, 2021, at 03:44, Volkan Yazıcı wrote:
> Great work Carter!
> 
> Have you seen `o.a.l.l.layout.template.json.util.InstantFormatter`,
> particularly its `Formatter#isInstantMatching` methods used for
> invalidating the cache? I was thinking of making it even smarter, e.g., if
> the pattern only has seconds, compare `Instant`s by their seconds. I aspire
> to pull it to the core, replace access to all individual formatters with
> this one, and mark the rest deprecated. Another idea I was thinking about
> is enhancing these individual formatters to contain the precision they
> require and use that in `isInstantMatching` methods.
> 
> Regarding your unicode character problems, shall we try pinging Claes
> Redestad (@cl4es), who has recently enhanced String.format() in JDK 17
> , via Twitter?
> 
> On Mon, Aug 30, 2021 at 11:32 PM Carter Kozak  wrote:
> 
> > I've merged the fix for our FixedDateFormat caching bug which caused us to
> > recompute the same millisecond-precision formatted timestamp unnecessarily
> > each time our microsecond-precision clock incremented.
> > https://issues.apache.org/jira/browse/LOG4J2-3153
> >
> > I've also been unwrapping a few layers of complexity, wrapping several
> > layers of components with conditional logic makes it harder for the jit to
> > optimize code, so we can improve things by using separate types based on
> > the requested features:
> > https://github.com/apache/logging-log4j2/pull/573
> > TODO: I'm not happy with the way I unwrapped PatternFormatter objects in
> > this PR, I think it could work better as an optional wrapper around the
> > delegate LogEventPatternConverter (where the default FormattingInfo returns
> > the delegate instance directly)
> > TODO: simplify MessagePatternConverter  a bit, the body is giant for
> > something that's generally relatively simple. The method is too large for
> > me to read in a glance, so I imagine the jit will have a hard time making
> > it fast as well. I don't really like the message-format feature which
> > allows lookups in the formatted message text because it leaks details of
> > the framework implementation/configuration into consumers of logging APIs
> > (which may not even be log4j-core), however I'm not sure how reasonable it
> > would be to change the default to disallow lookups given I'm sure folks
> > depend on that behavior.
> >
> > I'm not sure what to do about the CharsetEncoder vs
> > string.getBytes(charset) issue. The CharsetEncoder does not require
> > allocation and outperforms getBytes when I add a unicode character to the
> > message. When the message contains only ascii characters, getBytes performs
> > better. Using CharBuffer.wrap(StringBuilder) produces substantially worse
> > performance -- it shouldn't be copying the buffer in memory, but I suppose
> > the heap buffer is more efficient to deal with. I need to do more research
> > in this area.
> >
> > Thoughts/ideas/concerns?
> > -ck
> >
> 


Re: Cek's new log4j vs logback benchmark

2021-09-01 Thread Volkan Yazıcı
Great work Carter!

Have you seen `o.a.l.l.layout.template.json.util.InstantFormatter`,
particularly its `Formatter#isInstantMatching` methods used for
invalidating the cache? I was thinking of making it even smarter, e.g., if
the pattern only has seconds, compare `Instant`s by their seconds. I aspire
to pull it to the core, replace access to all individual formatters with
this one, and mark the rest deprecated. Another idea I was thinking about
is enhancing these individual formatters to contain the precision they
require and use that in `isInstantMatching` methods.

Regarding your unicode character problems, shall we try pinging Claes
Redestad (@cl4es), who has recently enhanced String.format() in JDK 17
, via Twitter?

On Mon, Aug 30, 2021 at 11:32 PM Carter Kozak  wrote:

> I've merged the fix for our FixedDateFormat caching bug which caused us to
> recompute the same millisecond-precision formatted timestamp unnecessarily
> each time our microsecond-precision clock incremented.
> https://issues.apache.org/jira/browse/LOG4J2-3153
>
> I've also been unwrapping a few layers of complexity, wrapping several
> layers of components with conditional logic makes it harder for the jit to
> optimize code, so we can improve things by using separate types based on
> the requested features:
> https://github.com/apache/logging-log4j2/pull/573
> TODO: I'm not happy with the way I unwrapped PatternFormatter objects in
> this PR, I think it could work better as an optional wrapper around the
> delegate LogEventPatternConverter (where the default FormattingInfo returns
> the delegate instance directly)
> TODO: simplify MessagePatternConverter  a bit, the body is giant for
> something that's generally relatively simple. The method is too large for
> me to read in a glance, so I imagine the jit will have a hard time making
> it fast as well. I don't really like the message-format feature which
> allows lookups in the formatted message text because it leaks details of
> the framework implementation/configuration into consumers of logging APIs
> (which may not even be log4j-core), however I'm not sure how reasonable it
> would be to change the default to disallow lookups given I'm sure folks
> depend on that behavior.
>
> I'm not sure what to do about the CharsetEncoder vs
> string.getBytes(charset) issue. The CharsetEncoder does not require
> allocation and outperforms getBytes when I add a unicode character to the
> message. When the message contains only ascii characters, getBytes performs
> better. Using CharBuffer.wrap(StringBuilder) produces substantially worse
> performance -- it shouldn't be copying the buffer in memory, but I suppose
> the heap buffer is more efficient to deal with. I need to do more research
> in this area.
>
> Thoughts/ideas/concerns?
> -ck
>


Re: Cek's new log4j vs logback benchmark

2021-08-30 Thread Carter Kozak
I've merged the fix for our FixedDateFormat caching bug which caused us to 
recompute the same millisecond-precision formatted timestamp unnecessarily each 
time our microsecond-precision clock incremented. 
https://issues.apache.org/jira/browse/LOG4J2-3153

I've also been unwrapping a few layers of complexity, wrapping several layers 
of components with conditional logic makes it harder for the jit to optimize 
code, so we can improve things by using separate types based on the requested 
features:
https://github.com/apache/logging-log4j2/pull/573
TODO: I'm not happy with the way I unwrapped PatternFormatter objects in this 
PR, I think it could work better as an optional wrapper around the delegate 
LogEventPatternConverter (where the default FormattingInfo returns the delegate 
instance directly)
TODO: simplify MessagePatternConverter  a bit, the body is giant for something 
that's generally relatively simple. The method is too large for me to read in a 
glance, so I imagine the jit will have a hard time making it fast as well. I 
don't really like the message-format feature which allows lookups in the 
formatted message text because it leaks details of the framework 
implementation/configuration into consumers of logging APIs (which may not even 
be log4j-core), however I'm not sure how reasonable it would be to change the 
default to disallow lookups given I'm sure folks depend on that behavior.

I'm not sure what to do about the CharsetEncoder vs string.getBytes(charset) 
issue. The CharsetEncoder does not require allocation and outperforms getBytes 
when I add a unicode character to the message. When the message contains only 
ascii characters, getBytes performs better. Using 
CharBuffer.wrap(StringBuilder) produces substantially worse performance -- it 
shouldn't be copying the buffer in memory, but I suppose the heap buffer is 
more efficient to deal with. I need to do more research in this area.

Thoughts/ideas/concerns?
-ck


Re: Cek's new log4j vs logback benchmark

2021-08-29 Thread Carter Kozak
I've pushed some changes I've been using to validate performance here: 
https://github.com/carterkozak/logback-perf/pull/new/ckozak/sandbox

Using the linux perf-norm profiler, the direct (garbage free) encoders on my 
machine use about 5000 instructions per operation while the byte-array method 
uses about 3500 instructions per operation.

The allocations appear to be small and short-lived enough to fit nicely into 
TLAB, if I disable TLAB with '-XX:-UseTLAB', the gc-free implementation isn't 
quite as heavily impacted as the byte-array encoder.

Interestingly, performance suffers dramatically when I introduce unicode 
characters to the log line. It looks like the garbage-free implementation fares 
better than the byte-array implementation by about 5%. I need to dig in a bit 
more here and re-validate the benchmarking results.

On Sat, Aug 28, 2021, at 17:58, Remko Popma wrote:
> On Sat, Aug 28, 2021 at 3:16 PM Ron Grabowski
>  wrote:
> 
> >  Follow-up to "Formatting the date is expensive. The process of actually
> > formatting a value is reasonable". Is this still an issue from LOG4J2-930:
> > %m %ex%n: 1,755,573 msg/sec%d %m %ex%n: 1,194,184 msg/sec
> >
> 
> No, I believe that formatting the date is no longer the bottleneck.
> The analysis done in LOG4J2-930 led to
> https://issues.apache.org/jira/browse/LOG4J2-1097 which resulted in the
> FixedDateFormat.
> This format gives a good trade-off between speed and flexibility.
> The drawback is that it only works for some fixed formats, but those are
> the most widely used formats.
> I don't think that focusing on the date formatting and pregenerate
> formatted timestamps will be fruitful (but I could be wrong).
> 
> Avoiding the PreciseTime Instant and using System.currentTimeMillis when it
> is known that none of the downstream formatters require sub-millisecond
> precision may be a good optimization.
> 
> The message formatting (PatternLayout) as a whole is expensive somehow,
> there may be more to optimize there.
> 
> 
> > If so, isn't date rendering essentially a sequence we can generate ahead
> > of time similar to how a local ID generator asks for an allocation of keys
> > then uses that to quickly assign IDs to new objects? When its time to
> > render %d we can just grab it via an index:
> >
> > 1)At startup calculate the next 32k formatted dates. If
> > Clock.currentTimeMillis() were configured down to the second, 32000 seconds
> > would pre-allocate %d for the next 8 hours.
> > 2)Apply math to Clock.currentTimeMillis() to get an index into the buffer.
> > Seconds precision:
> > [10] = "2021-08-28 09:44:31,000"
> > [11] = "2021-08-28 09:44:32,000"[12] = "2021-08-28 09:44:33,000"[13] =
> > "2021-08-28 09:44:34,000"[14] = "2021-08-28 09:44:35,000"[15] = "2021-08-28
> > 09:44:36,000"...[31999] = "..."
> > 50ms precision:
> > [10] = "2021-08-28 09:44:31,050"[11] = "2021-08-28 09:44:31,075"[12] =
> > "2021-08-28 09:44:31,100"[13] = "2021-08-28 09:44:31,150"[14] = "2021-08-28
> > 09:44:31,175"[15] = "2021-08-28 09:44:31,200"...[31999] = "..."
> >
> > 3)Rendering %d{SEQ(DEFAULT,32000)} is just a index lookup into the
> > sequence of 32000 pre-calculated %d{DEFAULT} values without the cost of
> > formatting. I made up the "SEQ" notation, there's likely a better way to
> > express the feature. Everything can read from the buffer without locking.
> > 4)Have a background thread casually keep the sequence filled in a ring so
> > dates in the past are replaced with future dates so the structure consumes
> > a consistent amount of memory.
> > On Friday, August 27, 2021, 10:07:59 PM EDT, Carter Kozak <
> > cko...@ckozak.net> wrote:
> >
> >  Thanks, Remko. The default '%d' uses FixedDateFormat with
> > FixedFormat.DEFAULT. The FastDateFormat alternative does not support
> > microseconds, so it doesn't suffer from the same problem. I think I can
> > substantially reduce the frequency we re-format dates by checking
> > FixedFormat.secondFractionDigits to determine if we meed to compare
> > microseconds.
> >
> > On Fri, Aug 27, 2021, at 16:10, Remko Popma wrote:
> > > I remember looking at PatternLayout performance, I reported my findings
> > here, hopefully they’re still useful:
> > https://issues.apache.org/jira/browse/LOG4J2-930
> > >
> > > If %d is used in the pattern, does the FixedDateFormat get used?
> > >
> > >
> > >
> > >
> > > > On Aug 28, 2021, at 4:33, Ralph Goers 
> > wrote:
> > > >
> > > > All of that agrees with my observations as well.
> > > >
> > > > Ralph
> > > >
> > > >> On Aug 27, 2021, at 12:23 PM, Carter Kozak  wrote:
> > > >>
> > > >> I've identified a few things that seem impactful, but bear in mind
> > that I haven't begun to drill down into them yet. I plan to file individual
> > tickets and investigate in greater depth later on:
> > > >>
> > > >> 1. Formatting the date is expensive. The process of actually
> > formatting a value is reasonable, however using a precise clock appears to
> > cause cache misses even when the pattern results

Re: Cek's new log4j vs logback benchmark

2021-08-28 Thread Remko Popma
On Sat, Aug 28, 2021 at 3:16 PM Ron Grabowski
 wrote:

>  Follow-up to "Formatting the date is expensive. The process of actually
> formatting a value is reasonable". Is this still an issue from LOG4J2-930:
> %m %ex%n: 1,755,573 msg/sec%d %m %ex%n: 1,194,184 msg/sec
>

No, I believe that formatting the date is no longer the bottleneck.
The analysis done in LOG4J2-930 led to
https://issues.apache.org/jira/browse/LOG4J2-1097 which resulted in the
FixedDateFormat.
This format gives a good trade-off between speed and flexibility.
The drawback is that it only works for some fixed formats, but those are
the most widely used formats.
I don't think that focusing on the date formatting and pregenerate
formatted timestamps will be fruitful (but I could be wrong).

Avoiding the PreciseTime Instant and using System.currentTimeMillis when it
is known that none of the downstream formatters require sub-millisecond
precision may be a good optimization.

The message formatting (PatternLayout) as a whole is expensive somehow,
there may be more to optimize there.


> If so, isn't date rendering essentially a sequence we can generate ahead
> of time similar to how a local ID generator asks for an allocation of keys
> then uses that to quickly assign IDs to new objects? When its time to
> render %d we can just grab it via an index:
>
> 1)At startup calculate the next 32k formatted dates. If
> Clock.currentTimeMillis() were configured down to the second, 32000 seconds
> would pre-allocate %d for the next 8 hours.
> 2)Apply math to Clock.currentTimeMillis() to get an index into the buffer.
> Seconds precision:
> [10] = "2021-08-28 09:44:31,000"
> [11] = "2021-08-28 09:44:32,000"[12] = "2021-08-28 09:44:33,000"[13] =
> "2021-08-28 09:44:34,000"[14] = "2021-08-28 09:44:35,000"[15] = "2021-08-28
> 09:44:36,000"...[31999] = "..."
> 50ms precision:
> [10] = "2021-08-28 09:44:31,050"[11] = "2021-08-28 09:44:31,075"[12] =
> "2021-08-28 09:44:31,100"[13] = "2021-08-28 09:44:31,150"[14] = "2021-08-28
> 09:44:31,175"[15] = "2021-08-28 09:44:31,200"...[31999] = "..."
>
> 3)Rendering %d{SEQ(DEFAULT,32000)} is just a index lookup into the
> sequence of 32000 pre-calculated %d{DEFAULT} values without the cost of
> formatting. I made up the "SEQ" notation, there's likely a better way to
> express the feature. Everything can read from the buffer without locking.
> 4)Have a background thread casually keep the sequence filled in a ring so
> dates in the past are replaced with future dates so the structure consumes
> a consistent amount of memory.
> On Friday, August 27, 2021, 10:07:59 PM EDT, Carter Kozak <
> cko...@ckozak.net> wrote:
>
>  Thanks, Remko. The default '%d' uses FixedDateFormat with
> FixedFormat.DEFAULT. The FastDateFormat alternative does not support
> microseconds, so it doesn't suffer from the same problem. I think I can
> substantially reduce the frequency we re-format dates by checking
> FixedFormat.secondFractionDigits to determine if we meed to compare
> microseconds.
>
> On Fri, Aug 27, 2021, at 16:10, Remko Popma wrote:
> > I remember looking at PatternLayout performance, I reported my findings
> here, hopefully they’re still useful:
> https://issues.apache.org/jira/browse/LOG4J2-930
> >
> > If %d is used in the pattern, does the FixedDateFormat get used?
> >
> >
> >
> >
> > > On Aug 28, 2021, at 4:33, Ralph Goers 
> wrote:
> > >
> > > All of that agrees with my observations as well.
> > >
> > > Ralph
> > >
> > >> On Aug 27, 2021, at 12:23 PM, Carter Kozak  wrote:
> > >>
> > >> I've identified a few things that seem impactful, but bear in mind
> that I haven't begun to drill down into them yet. I plan to file individual
> tickets and investigate in greater depth later on:
> > >>
> > >> 1. Formatting the date is expensive. The process of actually
> formatting a value is reasonable, however using a precise clock appears to
> cause cache misses even when the pattern results in the same value
> (microseconds/nanoseconds aren't included in the result). Using the
> SystemMillisClock improves our throughput. I imagine some part of that
> improvement is the result of currentTimeMillis(): long rather than
> Clock.instant().
> > >>
> > >> 2. Setting 'log4j2.enable.direct.encoders' to false improves
> performance, otherwise we see a lot of time spent in the
> StringBuilderEncoder, but I haven't had time to investigate that yet.
> Without direct encoders, we are allocating a great deal more byte arrays,
> but they're used across a small space that C2 may be able to inline out.
> > >>
> > >> 3. Setting log4j2.formatMsgNoLookups=true (or using message pattern
> '%m{nolookup}' saves us some time scanning through produced messages --
> I've had this disabled at work for a while.
> > >>
> > >> 4. [in progress] If I implement the full layout directly in java
> (write directly to a stringbuilder based on the event in a custom Layout
> instead of using PatternLayout) combined with [1] and [2] above,
> performance is much better than logb

Re: Cek's new log4j vs logback benchmark

2021-08-28 Thread Ralph Goers
Have you pushed that up somewhere? I would like to test it. 

 Is it possible to have direct encoders disabled by default unless something 
else that indicates GC Free logging is desired is enabled? From what I can tell 
the CPU overhead of GC Free logging makes it not worth it for most Java apps. 
If the application itself isn’t GC Free there isn’t much point for Log4j to be.

When I tested I did see a benefit of using a buffer size larger than the 
default of 8K. But again, that probably is what you want for a typical logging 
scenario.

Ralph

> On Aug 28, 2021, at 11:36 AM, Carter Kozak  wrote:
> 
> I don’t particularly want to optimize for the benchmarks, in most cases by 
> pre-generating the formatted dates we’d end up doing substantially more work 
> because we don’t log an event every millisecond/microsecond in practice.
> I’ve pushed a relatively naive optimization which takes into account when 
> microseconds aren’t used to provide the same performance as java 8 (where 
> microseconds weren’t available), but the implementation can be improved to 
> cache dates much longer if only second granularity is used. I’m skeptical 
> that such improvements would be helpful though because it’s uncommon to 
> combine low-precision timestamps with very high rates of logging in my 
> experience.
> https://issues.apache.org/jira/browse/LOG4J2-3153 
> 
> https://github.com/apache/logging-log4j2/pull/572 
> 
> 
> In testing (with direct encoders disabled) this brings us most of the way to 
> the logbook results. When I use it with a custom hard-coded layout 
> implementation to produce the same output as the pattern, log4j2 performs 15% 
> faster than logbook (in a single-threaded benchmark, I haven’t revalidated 
> using the async benchmarks because we seem to have a clear lead across the 
> asynchronous parts).
> 
>> On Aug 28, 2021, at 4:36 AM, Dominik Psenner  wrote:
>> 
>> May a ringbuffer cache be a viable solution?
>> 
>> Knowing the resolution (ticks, microseconds, milliseconds, seconds,..), a
>> key could be generated to lookup and return a previously created formatted
>> string from a ringbuffer or create it on the fly and store it for later.
>> During bursts of log events, this could significantly improve performance.
>> The ringbuffer may even be very small, only depending on the requested
>> resolution.
>> 
>> Cheers
>> Dominik
>> --
>> Sent from my phone. Typos are a kind gift to anyone who happens to find
>> them.
>> 
>> On Sat, Aug 28, 2021, 09:40 Ralph Goers  wrote:
>> 
>>> If you look at log4j-perf you will see both a TimeFormatBenchmark and a
>>> ClocksBenchmark. The time
>>> benchmark doesn’t include java.time’s formatter but I am pretty sure I
>>> added it to find that it was still
>>> slower than our formatters. However, the TimeFormat benchmark tests using
>>> System.currentTimeMillis
>>> - it doesn’t use a Clock. So there really isn’t a perf test for all the
>>> various ways the Date Converter
>>> could be implemented.
>>> 
>>> Actually, having a background thread for the date is an interesting idea
>>> but I don’t think I would have the
>>> thread create dates that might be used. Instead, I would look to see if
>>> there is a way to request the date
>>> be formatted asynchronously so that when the DateConverter actually needs
>>> it it is already available. The
>>> trick to this would be to know that a date is going to be requested
>>> similar to how we check whether
>>> location information is required. Then the async thread could add the
>>> formatted time to the log event.
>>> The only downside to this is that it assumes the date formatting thread
>>> will actually run before the Date
>>> Converter needs the data.
>>> 
>>> The problem with pre-formatting is that we would have to determine the
>>> frequency with which log events
>>> arrive. It doesn’t do any good to format the next 32K milliseconds if
>>> logging occurs once per second. And
>>> it doesn’t do any good to format once per second just to have it suddenly
>>> start generating 10 events per
>>> second.
>>> 
>>> 
>>> Ralph
>>> 
 On Aug 27, 2021, at 11:16 PM, Ron Grabowski 
 
>>> wrote:
 
 Follow-up to "Formatting the date is expensive. The process of actually
>>> formatting a value is reasonable". Is this still an issue from LOG4J2-930:
 %m %ex%n: 1,755,573 msg/sec%d %m %ex%n: 1,194,184 msg/sec
 If so, isn't date rendering essentially a sequence we can generate ahead
>>> of time similar to how a local ID generator asks for an allocation of keys
>>> then uses that to quickly assign IDs to new objects? When its time to
>>> render %d we can just grab it via an index:
 
 1)At startup calculate the next 32k formatted dates. If
>>> Clock.currentTimeMillis() were configured down to the second, 32000 seconds
>>> would pre-allocate %d for the next 8 hours.
 2)Apply math to Clock.currentTimeMillis() 

Re: Cek's new log4j vs logback benchmark

2021-08-28 Thread Carter Kozak
I don’t particularly want to optimize for the benchmarks, in most cases by 
pre-generating the formatted dates we’d end up doing substantially more work 
because we don’t log an event every millisecond/microsecond in practice.
I’ve pushed a relatively naive optimization which takes into account when 
microseconds aren’t used to provide the same performance as java 8 (where 
microseconds weren’t available), but the implementation can be improved to 
cache dates much longer if only second granularity is used. I’m skeptical that 
such improvements would be helpful though because it’s uncommon to combine 
low-precision timestamps with very high rates of logging in my experience.
https://issues.apache.org/jira/browse/LOG4J2-3153 

https://github.com/apache/logging-log4j2/pull/572 


In testing (with direct encoders disabled) this brings us most of the way to 
the logbook results. When I use it with a custom hard-coded layout 
implementation to produce the same output as the pattern, log4j2 performs 15% 
faster than logbook (in a single-threaded benchmark, I haven’t revalidated 
using the async benchmarks because we seem to have a clear lead across the 
asynchronous parts).

> On Aug 28, 2021, at 4:36 AM, Dominik Psenner  wrote:
> 
> May a ringbuffer cache be a viable solution?
> 
> Knowing the resolution (ticks, microseconds, milliseconds, seconds,..), a
> key could be generated to lookup and return a previously created formatted
> string from a ringbuffer or create it on the fly and store it for later.
> During bursts of log events, this could significantly improve performance.
> The ringbuffer may even be very small, only depending on the requested
> resolution.
> 
> Cheers
> Dominik
> --
> Sent from my phone. Typos are a kind gift to anyone who happens to find
> them.
> 
> On Sat, Aug 28, 2021, 09:40 Ralph Goers  wrote:
> 
>> If you look at log4j-perf you will see both a TimeFormatBenchmark and a
>> ClocksBenchmark. The time
>> benchmark doesn’t include java.time’s formatter but I am pretty sure I
>> added it to find that it was still
>> slower than our formatters. However, the TimeFormat benchmark tests using
>> System.currentTimeMillis
>> - it doesn’t use a Clock. So there really isn’t a perf test for all the
>> various ways the Date Converter
>> could be implemented.
>> 
>> Actually, having a background thread for the date is an interesting idea
>> but I don’t think I would have the
>> thread create dates that might be used. Instead, I would look to see if
>> there is a way to request the date
>> be formatted asynchronously so that when the DateConverter actually needs
>> it it is already available. The
>> trick to this would be to know that a date is going to be requested
>> similar to how we check whether
>> location information is required. Then the async thread could add the
>> formatted time to the log event.
>> The only downside to this is that it assumes the date formatting thread
>> will actually run before the Date
>> Converter needs the data.
>> 
>> The problem with pre-formatting is that we would have to determine the
>> frequency with which log events
>> arrive. It doesn’t do any good to format the next 32K milliseconds if
>> logging occurs once per second. And
>> it doesn’t do any good to format once per second just to have it suddenly
>> start generating 10 events per
>> second.
>> 
>> 
>> Ralph
>> 
>>> On Aug 27, 2021, at 11:16 PM, Ron Grabowski 
>> wrote:
>>> 
>>> Follow-up to "Formatting the date is expensive. The process of actually
>> formatting a value is reasonable". Is this still an issue from LOG4J2-930:
>>> %m %ex%n: 1,755,573 msg/sec%d %m %ex%n: 1,194,184 msg/sec
>>> If so, isn't date rendering essentially a sequence we can generate ahead
>> of time similar to how a local ID generator asks for an allocation of keys
>> then uses that to quickly assign IDs to new objects? When its time to
>> render %d we can just grab it via an index:
>>> 
>>> 1)At startup calculate the next 32k formatted dates. If
>> Clock.currentTimeMillis() were configured down to the second, 32000 seconds
>> would pre-allocate %d for the next 8 hours.
>>> 2)Apply math to Clock.currentTimeMillis() to get an index into the
>> buffer. Seconds precision:
>>> [10] = "2021-08-28 09:44:31,000"
>>> [11] = "2021-08-28 09:44:32,000"[12] = "2021-08-28 09:44:33,000"[13] =
>> "2021-08-28 09:44:34,000"[14] = "2021-08-28 09:44:35,000"[15] = "2021-08-28
>> 09:44:36,000"...[31999] = "..."
>>> 50ms precision:
>>> [10] = "2021-08-28 09:44:31,050"[11] = "2021-08-28 09:44:31,075"[12] =
>> "2021-08-28 09:44:31,100"[13] = "2021-08-28 09:44:31,150"[14] = "2021-08-28
>> 09:44:31,175"[15] = "2021-08-28 09:44:31,200"...[31999] = "..."
>>> 
>>> 3)Rendering %d{SEQ(DEFAULT,32000)} is just a index lookup into the
>> sequence of 32000 pre-calculated %d{DEFAULT} values without the cost of
>> formatting. I made up the "SEQ" notation, there's lik

Re: Cek's new log4j vs logback benchmark

2021-08-28 Thread Dominik Psenner
May a ringbuffer cache be a viable solution?

Knowing the resolution (ticks, microseconds, milliseconds, seconds,..), a
key could be generated to lookup and return a previously created formatted
string from a ringbuffer or create it on the fly and store it for later.
During bursts of log events, this could significantly improve performance.
The ringbuffer may even be very small, only depending on the requested
resolution.

Cheers
Dominik
--
Sent from my phone. Typos are a kind gift to anyone who happens to find
them.

On Sat, Aug 28, 2021, 09:40 Ralph Goers  wrote:

> If you look at log4j-perf you will see both a TimeFormatBenchmark and a
> ClocksBenchmark. The time
> benchmark doesn’t include java.time’s formatter but I am pretty sure I
> added it to find that it was still
> slower than our formatters. However, the TimeFormat benchmark tests using
> System.currentTimeMillis
>  - it doesn’t use a Clock. So there really isn’t a perf test for all the
> various ways the Date Converter
> could be implemented.
>
> Actually, having a background thread for the date is an interesting idea
> but I don’t think I would have the
> thread create dates that might be used. Instead, I would look to see if
> there is a way to request the date
> be formatted asynchronously so that when the DateConverter actually needs
> it it is already available. The
> trick to this would be to know that a date is going to be requested
> similar to how we check whether
> location information is required. Then the async thread could add the
> formatted time to the log event.
> The only downside to this is that it assumes the date formatting thread
> will actually run before the Date
> Converter needs the data.
>
> The problem with pre-formatting is that we would have to determine the
> frequency with which log events
> arrive. It doesn’t do any good to format the next 32K milliseconds if
> logging occurs once per second. And
> it doesn’t do any good to format once per second just to have it suddenly
> start generating 10 events per
> second.
>
>
> Ralph
>
> > On Aug 27, 2021, at 11:16 PM, Ron Grabowski 
> wrote:
> >
> > Follow-up to "Formatting the date is expensive. The process of actually
> formatting a value is reasonable". Is this still an issue from LOG4J2-930:
> > %m %ex%n: 1,755,573 msg/sec%d %m %ex%n: 1,194,184 msg/sec
> > If so, isn't date rendering essentially a sequence we can generate ahead
> of time similar to how a local ID generator asks for an allocation of keys
> then uses that to quickly assign IDs to new objects? When its time to
> render %d we can just grab it via an index:
> >
> > 1)At startup calculate the next 32k formatted dates. If
> Clock.currentTimeMillis() were configured down to the second, 32000 seconds
> would pre-allocate %d for the next 8 hours.
> > 2)Apply math to Clock.currentTimeMillis() to get an index into the
> buffer. Seconds precision:
> > [10] = "2021-08-28 09:44:31,000"
> > [11] = "2021-08-28 09:44:32,000"[12] = "2021-08-28 09:44:33,000"[13] =
> "2021-08-28 09:44:34,000"[14] = "2021-08-28 09:44:35,000"[15] = "2021-08-28
> 09:44:36,000"...[31999] = "..."
> > 50ms precision:
> > [10] = "2021-08-28 09:44:31,050"[11] = "2021-08-28 09:44:31,075"[12] =
> "2021-08-28 09:44:31,100"[13] = "2021-08-28 09:44:31,150"[14] = "2021-08-28
> 09:44:31,175"[15] = "2021-08-28 09:44:31,200"...[31999] = "..."
> >
> > 3)Rendering %d{SEQ(DEFAULT,32000)} is just a index lookup into the
> sequence of 32000 pre-calculated %d{DEFAULT} values without the cost of
> formatting. I made up the "SEQ" notation, there's likely a better way to
> express the feature. Everything can read from the buffer without locking.
> > 4)Have a background thread casually keep the sequence filled in a ring
> so dates in the past are replaced with future dates so the structure
> consumes a consistent amount of memory.
> >On Friday, August 27, 2021, 10:07:59 PM EDT, Carter Kozak <
> cko...@ckozak.net> wrote:
> >
> > Thanks, Remko. The default '%d' uses FixedDateFormat with
> FixedFormat.DEFAULT. The FastDateFormat alternative does not support
> microseconds, so it doesn't suffer from the same problem. I think I can
> substantially reduce the frequency we re-format dates by checking
> FixedFormat.secondFractionDigits to determine if we meed to compare
> microseconds.
> >
> > On Fri, Aug 27, 2021, at 16:10, Remko Popma wrote:
> >> I remember looking at PatternLayout performance, I reported my findings
> here, hopefully they’re still useful:
> https://issues.apache.org/jira/browse/LOG4J2-930
> >>
> >> If %d is used in the pattern, does the FixedDateFormat get used?
> >>
> >>
> >>
> >>
> >>> On Aug 28, 2021, at 4:33, Ralph Goers 
> wrote:
> >>>
> >>> All of that agrees with my observations as well.
> >>>
> >>> Ralph
> >>>
>  On Aug 27, 2021, at 12:23 PM, Carter Kozak  wrote:
> 
>  I've identified a few things that seem impactful, but bear in mind
> that I haven't begun to drill down into them yet. I plan to file individual
> tickets and i

Re: Cek's new log4j vs logback benchmark

2021-08-28 Thread Ralph Goers
If you look at log4j-perf you will see both a TimeFormatBenchmark and a 
ClocksBenchmark. The time 
benchmark doesn’t include java.time’s formatter but I am pretty sure I added it 
to find that it was still 
slower than our formatters. However, the TimeFormat benchmark tests using 
System.currentTimeMillis
 - it doesn’t use a Clock. So there really isn’t a perf test for all the 
various ways the Date Converter 
could be implemented.

Actually, having a background thread for the date is an interesting idea but I 
don’t think I would have the
thread create dates that might be used. Instead, I would look to see if there 
is a way to request the date 
be formatted asynchronously so that when the DateConverter actually needs it it 
is already available. The 
trick to this would be to know that a date is going to be requested similar to 
how we check whether 
location information is required. Then the async thread could add the formatted 
time to the log event.
The only downside to this is that it assumes the date formatting thread will 
actually run before the Date 
Converter needs the data.

The problem with pre-formatting is that we would have to determine the 
frequency with which log events 
arrive. It doesn’t do any good to format the next 32K milliseconds if logging 
occurs once per second. And 
it doesn’t do any good to format once per second just to have it suddenly start 
generating 10 events per 
second.


Ralph

> On Aug 27, 2021, at 11:16 PM, Ron Grabowski  
> wrote:
> 
> Follow-up to "Formatting the date is expensive. The process of actually 
> formatting a value is reasonable". Is this still an issue from LOG4J2-930:
> %m %ex%n: 1,755,573 msg/sec%d %m %ex%n: 1,194,184 msg/sec
> If so, isn't date rendering essentially a sequence we can generate ahead of 
> time similar to how a local ID generator asks for an allocation of keys then 
> uses that to quickly assign IDs to new objects? When its time to render %d we 
> can just grab it via an index:
> 
> 1)At startup calculate the next 32k formatted dates. If 
> Clock.currentTimeMillis() were configured down to the second, 32000 seconds 
> would pre-allocate %d for the next 8 hours.
> 2)Apply math to Clock.currentTimeMillis() to get an index into the buffer. 
> Seconds precision:
> [10] = "2021-08-28 09:44:31,000"
> [11] = "2021-08-28 09:44:32,000"[12] = "2021-08-28 09:44:33,000"[13] = 
> "2021-08-28 09:44:34,000"[14] = "2021-08-28 09:44:35,000"[15] = "2021-08-28 
> 09:44:36,000"...[31999] = "..."
> 50ms precision:
> [10] = "2021-08-28 09:44:31,050"[11] = "2021-08-28 09:44:31,075"[12] = 
> "2021-08-28 09:44:31,100"[13] = "2021-08-28 09:44:31,150"[14] = "2021-08-28 
> 09:44:31,175"[15] = "2021-08-28 09:44:31,200"...[31999] = "..."
> 
> 3)Rendering %d{SEQ(DEFAULT,32000)} is just a index lookup into the sequence 
> of 32000 pre-calculated %d{DEFAULT} values without the cost of formatting. I 
> made up the "SEQ" notation, there's likely a better way to express the 
> feature. Everything can read from the buffer without locking.
> 4)Have a background thread casually keep the sequence filled in a ring so 
> dates in the past are replaced with future dates so the structure consumes a 
> consistent amount of memory.
>On Friday, August 27, 2021, 10:07:59 PM EDT, Carter Kozak 
>  wrote:  
> 
> Thanks, Remko. The default '%d' uses FixedDateFormat with 
> FixedFormat.DEFAULT. The FastDateFormat alternative does not support 
> microseconds, so it doesn't suffer from the same problem. I think I can 
> substantially reduce the frequency we re-format dates by checking 
> FixedFormat.secondFractionDigits to determine if we meed to compare 
> microseconds.
> 
> On Fri, Aug 27, 2021, at 16:10, Remko Popma wrote:
>> I remember looking at PatternLayout performance, I reported my findings 
>> here, hopefully they’re still useful: 
>> https://issues.apache.org/jira/browse/LOG4J2-930
>> 
>> If %d is used in the pattern, does the FixedDateFormat get used?
>> 
>> 
>> 
>> 
>>> On Aug 28, 2021, at 4:33, Ralph Goers  wrote:
>>> 
>>> All of that agrees with my observations as well.
>>> 
>>> Ralph
>>> 
 On Aug 27, 2021, at 12:23 PM, Carter Kozak  wrote:
 
 I've identified a few things that seem impactful, but bear in mind that I 
 haven't begun to drill down into them yet. I plan to file individual 
 tickets and investigate in greater depth later on:
 
 1. Formatting the date is expensive. The process of actually formatting a 
 value is reasonable, however using a precise clock appears to cause cache 
 misses even when the pattern results in the same value 
 (microseconds/nanoseconds aren't included in the result). Using the 
 SystemMillisClock improves our throughput. I imagine some part of that 
 improvement is the result of currentTimeMillis(): long rather than 
 Clock.instant().
 
 2. Setting 'log4j2.enable.direct.encoders' to false improves performance, 
 otherwise we see a lot of time spent in the StringBuilde

Re: Cek's new log4j vs logback benchmark

2021-08-27 Thread Ron Grabowski
 Follow-up to "Formatting the date is expensive. The process of actually 
formatting a value is reasonable". Is this still an issue from LOG4J2-930:
%m %ex%n: 1,755,573 msg/sec%d %m %ex%n: 1,194,184 msg/sec
If so, isn't date rendering essentially a sequence we can generate ahead of 
time similar to how a local ID generator asks for an allocation of keys then 
uses that to quickly assign IDs to new objects? When its time to render %d we 
can just grab it via an index:

1)At startup calculate the next 32k formatted dates. If 
Clock.currentTimeMillis() were configured down to the second, 32000 seconds 
would pre-allocate %d for the next 8 hours.
2)Apply math to Clock.currentTimeMillis() to get an index into the buffer. 
Seconds precision:
[10] = "2021-08-28 09:44:31,000"
[11] = "2021-08-28 09:44:32,000"[12] = "2021-08-28 09:44:33,000"[13] = 
"2021-08-28 09:44:34,000"[14] = "2021-08-28 09:44:35,000"[15] = "2021-08-28 
09:44:36,000"...[31999] = "..."
50ms precision:
[10] = "2021-08-28 09:44:31,050"[11] = "2021-08-28 09:44:31,075"[12] = 
"2021-08-28 09:44:31,100"[13] = "2021-08-28 09:44:31,150"[14] = "2021-08-28 
09:44:31,175"[15] = "2021-08-28 09:44:31,200"...[31999] = "..."

3)Rendering %d{SEQ(DEFAULT,32000)} is just a index lookup into the sequence of 
32000 pre-calculated %d{DEFAULT} values without the cost of formatting. I made 
up the "SEQ" notation, there's likely a better way to express the feature. 
Everything can read from the buffer without locking.
4)Have a background thread casually keep the sequence filled in a ring so dates 
in the past are replaced with future dates so the structure consumes a 
consistent amount of memory.
On Friday, August 27, 2021, 10:07:59 PM EDT, Carter Kozak 
 wrote:  
 
 Thanks, Remko. The default '%d' uses FixedDateFormat with FixedFormat.DEFAULT. 
The FastDateFormat alternative does not support microseconds, so it doesn't 
suffer from the same problem. I think I can substantially reduce the frequency 
we re-format dates by checking FixedFormat.secondFractionDigits to determine if 
we meed to compare microseconds.

On Fri, Aug 27, 2021, at 16:10, Remko Popma wrote:
> I remember looking at PatternLayout performance, I reported my findings here, 
> hopefully they’re still useful: 
> https://issues.apache.org/jira/browse/LOG4J2-930
> 
> If %d is used in the pattern, does the FixedDateFormat get used?
> 
> 
> 
> 
> > On Aug 28, 2021, at 4:33, Ralph Goers  wrote:
> > 
> > All of that agrees with my observations as well.
> > 
> > Ralph
> > 
> >> On Aug 27, 2021, at 12:23 PM, Carter Kozak  wrote:
> >> 
> >> I've identified a few things that seem impactful, but bear in mind that I 
> >> haven't begun to drill down into them yet. I plan to file individual 
> >> tickets and investigate in greater depth later on:
> >> 
> >> 1. Formatting the date is expensive. The process of actually formatting a 
> >> value is reasonable, however using a precise clock appears to cause cache 
> >> misses even when the pattern results in the same value 
> >> (microseconds/nanoseconds aren't included in the result). Using the 
> >> SystemMillisClock improves our throughput. I imagine some part of that 
> >> improvement is the result of currentTimeMillis(): long rather than 
> >> Clock.instant().
> >> 
> >> 2. Setting 'log4j2.enable.direct.encoders' to false improves performance, 
> >> otherwise we see a lot of time spent in the StringBuilderEncoder, but I 
> >> haven't had time to investigate that yet. Without direct encoders, we are 
> >> allocating a great deal more byte arrays, but they're used across a small 
> >> space that C2 may be able to inline out.
> >> 
> >> 3. Setting log4j2.formatMsgNoLookups=true (or using message pattern 
> >> '%m{nolookup}' saves us some time scanning through produced messages -- 
> >> I've had this disabled at work for a while.
> >> 
> >> 4. [in progress] If I implement the full layout directly in java (write 
> >> directly to a stringbuilder based on the event in a custom Layout instead 
> >> of using PatternLayout) combined with [1] and [2] above, performance is 
> >> much better than logback, however using PatternLayout with the steps above 
> >> we fall just a bit short. I still need to read through the patternlayout 
> >> and individual pattern converters to understand why, but there's 
> >> definitely some headroom we'll be able to reclaim hiding somewhere.
> >> 
> >> -ck
> >> 
> >>> On Fri, Aug 27, 2021, at 11:23, Carter Kozak wrote:
> >>> My pleasure, I enjoy digging into this sort of performance comparison, the
> >>> toughest piece right now is balancing investigation with active projects 
> >>> at
> >>> work. I completely agree about the importance of getting this resolved
> >>> quickly and I intend to continue investigating.
> >>> 
> >>> Re loom:
> >>> Currently we have a few places which reuse shared objects, potentially 
> >>> large
> >>> ones (thinking StringBuilder instances on 
> >>> ReusableLogEvent/StringLayout/etc)
> >>> which are currently boun

Re: Cek's new log4j vs logback benchmark

2021-08-27 Thread Carter Kozak
Thanks, Remko. The default '%d' uses FixedDateFormat with FixedFormat.DEFAULT. 
The FastDateFormat alternative does not support microseconds, so it doesn't 
suffer from the same problem. I think I can substantially reduce the frequency 
we re-format dates by checking FixedFormat.secondFractionDigits to determine if 
we meed to compare microseconds.

On Fri, Aug 27, 2021, at 16:10, Remko Popma wrote:
> I remember looking at PatternLayout performance, I reported my findings here, 
> hopefully they’re still useful: 
> https://issues.apache.org/jira/browse/LOG4J2-930
> 
> If %d is used in the pattern, does the FixedDateFormat get used?
> 
> 
> 
> 
> > On Aug 28, 2021, at 4:33, Ralph Goers  wrote:
> > 
> > All of that agrees with my observations as well.
> > 
> > Ralph
> > 
> >> On Aug 27, 2021, at 12:23 PM, Carter Kozak  wrote:
> >> 
> >> I've identified a few things that seem impactful, but bear in mind that I 
> >> haven't begun to drill down into them yet. I plan to file individual 
> >> tickets and investigate in greater depth later on:
> >> 
> >> 1. Formatting the date is expensive. The process of actually formatting a 
> >> value is reasonable, however using a precise clock appears to cause cache 
> >> misses even when the pattern results in the same value 
> >> (microseconds/nanoseconds aren't included in the result). Using the 
> >> SystemMillisClock improves our throughput. I imagine some part of that 
> >> improvement is the result of currentTimeMillis(): long rather than 
> >> Clock.instant().
> >> 
> >> 2. Setting 'log4j2.enable.direct.encoders' to false improves performance, 
> >> otherwise we see a lot of time spent in the StringBuilderEncoder, but I 
> >> haven't had time to investigate that yet. Without direct encoders, we are 
> >> allocating a great deal more byte arrays, but they're used across a small 
> >> space that C2 may be able to inline out.
> >> 
> >> 3. Setting log4j2.formatMsgNoLookups=true (or using message pattern 
> >> '%m{nolookup}' saves us some time scanning through produced messages -- 
> >> I've had this disabled at work for a while.
> >> 
> >> 4. [in progress] If I implement the full layout directly in java (write 
> >> directly to a stringbuilder based on the event in a custom Layout instead 
> >> of using PatternLayout) combined with [1] and [2] above, performance is 
> >> much better than logback, however using PatternLayout with the steps above 
> >> we fall just a bit short. I still need to read through the patternlayout 
> >> and individual pattern converters to understand why, but there's 
> >> definitely some headroom we'll be able to reclaim hiding somewhere.
> >> 
> >> -ck
> >> 
> >>> On Fri, Aug 27, 2021, at 11:23, Carter Kozak wrote:
> >>> My pleasure, I enjoy digging into this sort of performance comparison, the
> >>> toughest piece right now is balancing investigation with active projects 
> >>> at
> >>> work. I completely agree about the importance of getting this resolved
> >>> quickly and I intend to continue investigating.
> >>> 
> >>> Re loom:
> >>> Currently we have a few places which reuse shared objects, potentially 
> >>> large
> >>> ones (thinking StringBuilder instances on 
> >>> ReusableLogEvent/StringLayout/etc)
> >>> which are currently bounded to the number of application threads you have
> >>> (which is relatively small compared to the loom claims which are 
> >>> admittedly
> >>> large). More importantly, today we tend to reuse almost every thread in a
> >>> ThreadPoolExecutor of some kind where the model in loom involves creating
> >>> a new virtual thread for each task, meaning using a thread-local as a 
> >>> cache
> >>> will have a much lower hit rate as reuse will only occur within a single 
> >>> task,
> >>> and only if ThreadLocals are enabled. I hope that made sense sense.
> >>> 
> >>> -ck
> >>> 
> >>> On Fri, Aug 27, 2021, at 08:18, Volkan Yazıcı wrote:
>  Carter, thanks so much for your great effort and attention on this 
>  issue. *I
>  personally find this the uttermost important thing the project needs to
>  address with urgency right now.* Ceki is warming up for a new Logback 
>  (and
>  SLF4J?) release. I am pretty sure he will do his part in PR, particularly
>  in the context of benchmarking his new work with competitors. Once these
>  results are out – independent of whether they make sense or not –
>  "internet" will take this as the ultimate truth and it will require a
>  tremendous effort to change people's mind. In conclusion, please 
>  prioritize
>  this appropriately and keep us posted. Don't hesitate to let us/me know 
>  if
>  there is anything (drinks, coffee, lunch?) we can help with.
>  
>  Regarding your remarks about Loom, I don't think I follow you there. Our
>  thread-locals are stateless hence should be perfectly fine while running 
>  in
>  virtual threads too. (I am totally against blindly using TLs, but that is
>  another disc

Re: Cek's new log4j vs logback benchmark

2021-08-27 Thread Remko Popma
I remember looking at PatternLayout performance, I reported my findings here, 
hopefully they’re still useful: https://issues.apache.org/jira/browse/LOG4J2-930

If %d is used in the pattern, does the FixedDateFormat get used?




> On Aug 28, 2021, at 4:33, Ralph Goers  wrote:
> 
> All of that agrees with my observations as well.
> 
> Ralph
> 
>> On Aug 27, 2021, at 12:23 PM, Carter Kozak  wrote:
>> 
>> I've identified a few things that seem impactful, but bear in mind that I 
>> haven't begun to drill down into them yet. I plan to file individual tickets 
>> and investigate in greater depth later on:
>> 
>> 1. Formatting the date is expensive. The process of actually formatting a 
>> value is reasonable, however using a precise clock appears to cause cache 
>> misses even when the pattern results in the same value 
>> (microseconds/nanoseconds aren't included in the result). Using the 
>> SystemMillisClock improves our throughput. I imagine some part of that 
>> improvement is the result of currentTimeMillis(): long rather than 
>> Clock.instant().
>> 
>> 2. Setting 'log4j2.enable.direct.encoders' to false improves performance, 
>> otherwise we see a lot of time spent in the StringBuilderEncoder, but I 
>> haven't had time to investigate that yet. Without direct encoders, we are 
>> allocating a great deal more byte arrays, but they're used across a small 
>> space that C2 may be able to inline out.
>> 
>> 3. Setting log4j2.formatMsgNoLookups=true (or using message pattern 
>> '%m{nolookup}' saves us some time scanning through produced messages -- I've 
>> had this disabled at work for a while.
>> 
>> 4. [in progress] If I implement the full layout directly in java (write 
>> directly to a stringbuilder based on the event in a custom Layout instead of 
>> using PatternLayout) combined with [1] and [2] above, performance is much 
>> better than logback, however using PatternLayout with the steps above we 
>> fall just a bit short. I still need to read through the patternlayout and 
>> individual pattern converters to understand why, but there's definitely some 
>> headroom we'll be able to reclaim hiding somewhere.
>> 
>> -ck
>> 
>>> On Fri, Aug 27, 2021, at 11:23, Carter Kozak wrote:
>>> My pleasure, I enjoy digging into this sort of performance comparison, the
>>> toughest piece right now is balancing investigation with active projects at
>>> work. I completely agree about the importance of getting this resolved
>>> quickly and I intend to continue investigating.
>>> 
>>> Re loom:
>>> Currently we have a few places which reuse shared objects, potentially large
>>> ones (thinking StringBuilder instances on ReusableLogEvent/StringLayout/etc)
>>> which are currently bounded to the number of application threads you have
>>> (which is relatively small compared to the loom claims which are admittedly
>>> large). More importantly, today we tend to reuse almost every thread in a
>>> ThreadPoolExecutor of some kind where the model in loom involves creating
>>> a new virtual thread for each task, meaning using a thread-local as a cache
>>> will have a much lower hit rate as reuse will only occur within a single 
>>> task,
>>> and only if ThreadLocals are enabled. I hope that made sense sense.
>>> 
>>> -ck
>>> 
>>> On Fri, Aug 27, 2021, at 08:18, Volkan Yazıcı wrote:
 Carter, thanks so much for your great effort and attention on this issue. 
 *I
 personally find this the uttermost important thing the project needs to
 address with urgency right now.* Ceki is warming up for a new Logback (and
 SLF4J?) release. I am pretty sure he will do his part in PR, particularly
 in the context of benchmarking his new work with competitors. Once these
 results are out – independent of whether they make sense or not –
 "internet" will take this as the ultimate truth and it will require a
 tremendous effort to change people's mind. In conclusion, please prioritize
 this appropriately and keep us posted. Don't hesitate to let us/me know if
 there is anything (drinks, coffee, lunch?) we can help with.
 
 Regarding your remarks about Loom, I don't think I follow you there. Our
 thread-locals are stateless hence should be perfectly fine while running in
 virtual threads too. (I am totally against blindly using TLs, but that is
 another discussion. I would rather prefer a recycler concept similar to the
 one we have in JsonTemplateLayout. I think there was already a mailing list
 thread about this.)
 
 On Thu, Aug 26, 2021 at 8:37 PM Carter Kozak  wrote:
 
> Yes, I'm still looking into this.
> 
> I agree that the zero-garbage code is difficult to follow, and Loom
> virtual threads will have less reuse so our thread-locals will create more
> overhead than they're worth in many cases. Fingers crossed for Valhalla to
> land before Loom, but I haven't been following updates from either project
> very closely lately, and Loom see

Re: Cek's new log4j vs logback benchmark

2021-08-27 Thread Ralph Goers
All of that agrees with my observations as well.

Ralph

> On Aug 27, 2021, at 12:23 PM, Carter Kozak  wrote:
> 
> I've identified a few things that seem impactful, but bear in mind that I 
> haven't begun to drill down into them yet. I plan to file individual tickets 
> and investigate in greater depth later on:
> 
> 1. Formatting the date is expensive. The process of actually formatting a 
> value is reasonable, however using a precise clock appears to cause cache 
> misses even when the pattern results in the same value 
> (microseconds/nanoseconds aren't included in the result). Using the 
> SystemMillisClock improves our throughput. I imagine some part of that 
> improvement is the result of currentTimeMillis(): long rather than 
> Clock.instant().
> 
> 2. Setting 'log4j2.enable.direct.encoders' to false improves performance, 
> otherwise we see a lot of time spent in the StringBuilderEncoder, but I 
> haven't had time to investigate that yet. Without direct encoders, we are 
> allocating a great deal more byte arrays, but they're used across a small 
> space that C2 may be able to inline out.
> 
> 3. Setting log4j2.formatMsgNoLookups=true (or using message pattern 
> '%m{nolookup}' saves us some time scanning through produced messages -- I've 
> had this disabled at work for a while.
> 
> 4. [in progress] If I implement the full layout directly in java (write 
> directly to a stringbuilder based on the event in a custom Layout instead of 
> using PatternLayout) combined with [1] and [2] above, performance is much 
> better than logback, however using PatternLayout with the steps above we fall 
> just a bit short. I still need to read through the patternlayout and 
> individual pattern converters to understand why, but there's definitely some 
> headroom we'll be able to reclaim hiding somewhere.
> 
> -ck
> 
> On Fri, Aug 27, 2021, at 11:23, Carter Kozak wrote:
>> My pleasure, I enjoy digging into this sort of performance comparison, the
>> toughest piece right now is balancing investigation with active projects at
>> work. I completely agree about the importance of getting this resolved
>> quickly and I intend to continue investigating.
>> 
>> Re loom:
>> Currently we have a few places which reuse shared objects, potentially large
>> ones (thinking StringBuilder instances on ReusableLogEvent/StringLayout/etc)
>> which are currently bounded to the number of application threads you have
>> (which is relatively small compared to the loom claims which are admittedly
>> large). More importantly, today we tend to reuse almost every thread in a
>> ThreadPoolExecutor of some kind where the model in loom involves creating
>> a new virtual thread for each task, meaning using a thread-local as a cache
>> will have a much lower hit rate as reuse will only occur within a single 
>> task,
>> and only if ThreadLocals are enabled. I hope that made sense sense.
>> 
>> -ck
>> 
>> On Fri, Aug 27, 2021, at 08:18, Volkan Yazıcı wrote:
>>> Carter, thanks so much for your great effort and attention on this issue. *I
>>> personally find this the uttermost important thing the project needs to
>>> address with urgency right now.* Ceki is warming up for a new Logback (and
>>> SLF4J?) release. I am pretty sure he will do his part in PR, particularly
>>> in the context of benchmarking his new work with competitors. Once these
>>> results are out – independent of whether they make sense or not –
>>> "internet" will take this as the ultimate truth and it will require a
>>> tremendous effort to change people's mind. In conclusion, please prioritize
>>> this appropriately and keep us posted. Don't hesitate to let us/me know if
>>> there is anything (drinks, coffee, lunch?) we can help with.
>>> 
>>> Regarding your remarks about Loom, I don't think I follow you there. Our
>>> thread-locals are stateless hence should be perfectly fine while running in
>>> virtual threads too. (I am totally against blindly using TLs, but that is
>>> another discussion. I would rather prefer a recycler concept similar to the
>>> one we have in JsonTemplateLayout. I think there was already a mailing list
>>> thread about this.)
>>> 
>>> On Thu, Aug 26, 2021 at 8:37 PM Carter Kozak  wrote:
>>> 
 Yes, I'm still looking into this.
 
 I agree that the zero-garbage code is difficult to follow, and Loom
 virtual threads will have less reuse so our thread-locals will create more
 overhead than they're worth in many cases. Fingers crossed for Valhalla to
 land before Loom, but I haven't been following updates from either project
 very closely lately, and Loom seems farther along.
 
 I think the issue with multiple buffers these days is that we have optane
 and nvme devices which can be _nearly_ as fast as main memory, so copying
 contents can be problematic. In fully async mode (with some caveats around
 queue-full-policy configuration points and nested async appenders) we don't
 need locking because the append

Re: Cek's new log4j vs logback benchmark

2021-08-27 Thread Carter Kozak
I've identified a few things that seem impactful, but bear in mind that I 
haven't begun to drill down into them yet. I plan to file individual tickets 
and investigate in greater depth later on:

1. Formatting the date is expensive. The process of actually formatting a value 
is reasonable, however using a precise clock appears to cause cache misses even 
when the pattern results in the same value (microseconds/nanoseconds aren't 
included in the result). Using the SystemMillisClock improves our throughput. I 
imagine some part of that improvement is the result of currentTimeMillis(): 
long rather than Clock.instant().

2. Setting 'log4j2.enable.direct.encoders' to false improves performance, 
otherwise we see a lot of time spent in the StringBuilderEncoder, but I haven't 
had time to investigate that yet. Without direct encoders, we are allocating a 
great deal more byte arrays, but they're used across a small space that C2 may 
be able to inline out.

3. Setting log4j2.formatMsgNoLookups=true (or using message pattern 
'%m{nolookup}' saves us some time scanning through produced messages -- I've 
had this disabled at work for a while.

4. [in progress] If I implement the full layout directly in java (write 
directly to a stringbuilder based on the event in a custom Layout instead of 
using PatternLayout) combined with [1] and [2] above, performance is much 
better than logback, however using PatternLayout with the steps above we fall 
just a bit short. I still need to read through the patternlayout and individual 
pattern converters to understand why, but there's definitely some headroom 
we'll be able to reclaim hiding somewhere.

-ck

On Fri, Aug 27, 2021, at 11:23, Carter Kozak wrote:
> My pleasure, I enjoy digging into this sort of performance comparison, the
> toughest piece right now is balancing investigation with active projects at
> work. I completely agree about the importance of getting this resolved
> quickly and I intend to continue investigating.
> 
> Re loom:
> Currently we have a few places which reuse shared objects, potentially large
> ones (thinking StringBuilder instances on ReusableLogEvent/StringLayout/etc)
> which are currently bounded to the number of application threads you have
> (which is relatively small compared to the loom claims which are admittedly
> large). More importantly, today we tend to reuse almost every thread in a
> ThreadPoolExecutor of some kind where the model in loom involves creating
> a new virtual thread for each task, meaning using a thread-local as a cache
> will have a much lower hit rate as reuse will only occur within a single task,
> and only if ThreadLocals are enabled. I hope that made sense sense.
> 
> -ck
> 
> On Fri, Aug 27, 2021, at 08:18, Volkan Yazıcı wrote:
> > Carter, thanks so much for your great effort and attention on this issue. *I
> > personally find this the uttermost important thing the project needs to
> > address with urgency right now.* Ceki is warming up for a new Logback (and
> > SLF4J?) release. I am pretty sure he will do his part in PR, particularly
> > in the context of benchmarking his new work with competitors. Once these
> > results are out – independent of whether they make sense or not –
> > "internet" will take this as the ultimate truth and it will require a
> > tremendous effort to change people's mind. In conclusion, please prioritize
> > this appropriately and keep us posted. Don't hesitate to let us/me know if
> > there is anything (drinks, coffee, lunch?) we can help with.
> > 
> > Regarding your remarks about Loom, I don't think I follow you there. Our
> > thread-locals are stateless hence should be perfectly fine while running in
> > virtual threads too. (I am totally against blindly using TLs, but that is
> > another discussion. I would rather prefer a recycler concept similar to the
> > one we have in JsonTemplateLayout. I think there was already a mailing list
> > thread about this.)
> > 
> > On Thu, Aug 26, 2021 at 8:37 PM Carter Kozak  wrote:
> > 
> > > Yes, I'm still looking into this.
> > >
> > > I agree that the zero-garbage code is difficult to follow, and Loom
> > > virtual threads will have less reuse so our thread-locals will create more
> > > overhead than they're worth in many cases. Fingers crossed for Valhalla to
> > > land before Loom, but I haven't been following updates from either project
> > > very closely lately, and Loom seems farther along.
> > >
> > > I think the issue with multiple buffers these days is that we have optane
> > > and nvme devices which can be _nearly_ as fast as main memory, so copying
> > > contents can be problematic. In fully async mode (with some caveats around
> > > queue-full-policy configuration points and nested async appenders) we 
> > > don't
> > > need locking because the appender interactions are single-threaded.
> > >
> > > For what it's worth, in my testing I didn't see much difference between
> > > the 8 kB buffer and 256 kB buffer on a system with a relat

Re: Cek's new log4j vs logback benchmark

2021-08-27 Thread Carter Kozak
My pleasure, I enjoy digging into this sort of performance comparison, the
toughest piece right now is balancing investigation with active projects at
work. I completely agree about the importance of getting this resolved
quickly and I intend to continue investigating.

Re loom:
Currently we have a few places which reuse shared objects, potentially large
ones (thinking StringBuilder instances on ReusableLogEvent/StringLayout/etc)
which are currently bounded to the number of application threads you have
(which is relatively small compared to the loom claims which are admittedly
large). More importantly, today we tend to reuse almost every thread in a
ThreadPoolExecutor of some kind where the model in loom involves creating
a new virtual thread for each task, meaning using a thread-local as a cache
will have a much lower hit rate as reuse will only occur within a single task,
and only if ThreadLocals are enabled. I hope that made sense sense.

-ck

On Fri, Aug 27, 2021, at 08:18, Volkan Yazıcı wrote:
> Carter, thanks so much for your great effort and attention on this issue. *I
> personally find this the uttermost important thing the project needs to
> address with urgency right now.* Ceki is warming up for a new Logback (and
> SLF4J?) release. I am pretty sure he will do his part in PR, particularly
> in the context of benchmarking his new work with competitors. Once these
> results are out – independent of whether they make sense or not –
> "internet" will take this as the ultimate truth and it will require a
> tremendous effort to change people's mind. In conclusion, please prioritize
> this appropriately and keep us posted. Don't hesitate to let us/me know if
> there is anything (drinks, coffee, lunch?) we can help with.
> 
> Regarding your remarks about Loom, I don't think I follow you there. Our
> thread-locals are stateless hence should be perfectly fine while running in
> virtual threads too. (I am totally against blindly using TLs, but that is
> another discussion. I would rather prefer a recycler concept similar to the
> one we have in JsonTemplateLayout. I think there was already a mailing list
> thread about this.)
> 
> On Thu, Aug 26, 2021 at 8:37 PM Carter Kozak  wrote:
> 
> > Yes, I'm still looking into this.
> >
> > I agree that the zero-garbage code is difficult to follow, and Loom
> > virtual threads will have less reuse so our thread-locals will create more
> > overhead than they're worth in many cases. Fingers crossed for Valhalla to
> > land before Loom, but I haven't been following updates from either project
> > very closely lately, and Loom seems farther along.
> >
> > I think the issue with multiple buffers these days is that we have optane
> > and nvme devices which can be _nearly_ as fast as main memory, so copying
> > contents can be problematic. In fully async mode (with some caveats around
> > queue-full-policy configuration points and nested async appenders) we don't
> > need locking because the appender interactions are single-threaded.
> >
> > For what it's worth, in my testing I didn't see much difference between
> > the 8 kB buffer and 256 kB buffer on a system with a relatively fast nvme
> > drive, however I did discover and resolve LOG4J2-3150 (incorrect default
> > buffer size in the RandomAccessFileAppender).
> >
> > On Thu, Aug 26, 2021, at 14:21, Ralph Goers wrote:
> > > While that is true it does mean that we are locking higher up in the
> > call stack than we need to be.
> > >
> > > Ralph
> > >
> > > > On Aug 26, 2021, at 11:13 AM, Tim Perry  wrote:
> > > >
> > > > I’m fairly certain the JIT will optimize out the locking operations on
> > the nested synchronized calls after a while. I’m not sure how soon into the
> > performance tests that would happen.
> > > >
> > > > Tim
> > > >
> > > >
> > > >> On Aug 26, 2021, at 10:55 AM, Ralph Goers 
> > wrote:
> > > >>
> > > >> So are you continuing to look at this?  Frankly, the work on zero-gc
> > stuff made this
> > > >> very complicated. I am not sure I can even follow the logic anymore.
> > I also noticed
> > > >> that many of the methods dealing with byte buffers are synchronized.
> > I am not sure
> > > >> why as only the method that moves data really needs to be.
> > > >>
> > > >> Using multiple buffers would make sense if you are filling one from
> > the various patterns
> > > >> while it is unlocked and then only blocking when writing to the
> > output stream buffer.
> > > >>
> > > >> For what its worth Ceki updated his performance page again yesterday
> > to change
> > > >> the wording but I am pretty sure he didn’t change the log4j2
> > configuration to use a
> > > >> 256K buffer to match Logback.  And the page still makes it look like
> > he is testing
> > > >> the performance of asynchronous processing when it is really testing
> > the performance
> > > >> of the file appenders with a blocking queue in front of them.
> > > >>
> > > >> Ralph
> > > >>
> > > >>> On Aug 26, 2021, at 7:29 AM, Carter Kozak  wrote:
> 

Re: Cek's new log4j vs logback benchmark

2021-08-27 Thread Volkan Yazıcı
Carter, thanks so much for your great effort and attention on this issue. *I
personally find this the uttermost important thing the project needs to
address with urgency right now.* Ceki is warming up for a new Logback (and
SLF4J?) release. I am pretty sure he will do his part in PR, particularly
in the context of benchmarking his new work with competitors. Once these
results are out – independent of whether they make sense or not –
"internet" will take this as the ultimate truth and it will require a
tremendous effort to change people's mind. In conclusion, please prioritize
this appropriately and keep us posted. Don't hesitate to let us/me know if
there is anything (drinks, coffee, lunch?) we can help with.

Regarding your remarks about Loom, I don't think I follow you there. Our
thread-locals are stateless hence should be perfectly fine while running in
virtual threads too. (I am totally against blindly using TLs, but that is
another discussion. I would rather prefer a recycler concept similar to the
one we have in JsonTemplateLayout. I think there was already a mailing list
thread about this.)

On Thu, Aug 26, 2021 at 8:37 PM Carter Kozak  wrote:

> Yes, I'm still looking into this.
>
> I agree that the zero-garbage code is difficult to follow, and Loom
> virtual threads will have less reuse so our thread-locals will create more
> overhead than they're worth in many cases. Fingers crossed for Valhalla to
> land before Loom, but I haven't been following updates from either project
> very closely lately, and Loom seems farther along.
>
> I think the issue with multiple buffers these days is that we have optane
> and nvme devices which can be _nearly_ as fast as main memory, so copying
> contents can be problematic. In fully async mode (with some caveats around
> queue-full-policy configuration points and nested async appenders) we don't
> need locking because the appender interactions are single-threaded.
>
> For what it's worth, in my testing I didn't see much difference between
> the 8 kB buffer and 256 kB buffer on a system with a relatively fast nvme
> drive, however I did discover and resolve LOG4J2-3150 (incorrect default
> buffer size in the RandomAccessFileAppender).
>
> On Thu, Aug 26, 2021, at 14:21, Ralph Goers wrote:
> > While that is true it does mean that we are locking higher up in the
> call stack than we need to be.
> >
> > Ralph
> >
> > > On Aug 26, 2021, at 11:13 AM, Tim Perry  wrote:
> > >
> > > I’m fairly certain the JIT will optimize out the locking operations on
> the nested synchronized calls after a while. I’m not sure how soon into the
> performance tests that would happen.
> > >
> > > Tim
> > >
> > >
> > >> On Aug 26, 2021, at 10:55 AM, Ralph Goers 
> wrote:
> > >>
> > >> So are you continuing to look at this?  Frankly, the work on zero-gc
> stuff made this
> > >> very complicated. I am not sure I can even follow the logic anymore.
> I also noticed
> > >> that many of the methods dealing with byte buffers are synchronized.
> I am not sure
> > >> why as only the method that moves data really needs to be.
> > >>
> > >> Using multiple buffers would make sense if you are filling one from
> the various patterns
> > >> while it is unlocked and then only blocking when writing to the
> output stream buffer.
> > >>
> > >> For what its worth Ceki updated his performance page again yesterday
> to change
> > >> the wording but I am pretty sure he didn’t change the log4j2
> configuration to use a
> > >> 256K buffer to match Logback.  And the page still makes it look like
> he is testing
> > >> the performance of asynchronous processing when it is really testing
> the performance
> > >> of the file appenders with a blocking queue in front of them.
> > >>
> > >> Ralph
> > >>
> > >>> On Aug 26, 2021, at 7:29 AM, Carter Kozak  wrote:
> > >>>
> > >>> I also tried that with similar results, which leads me to believe
> we're spending
> > >>> too much time copying between buffers.
> > >>>
> > >>> We've proven that log4j can get LogEvents to the appender very
> quickly and
> > >>> efficiently.
> > >>>
> > >>> Once we hit PatternLayout we write data to a StringBuilder.
> AbstractStringBuilder used to store a char array in java 8, however with
> jep 254 compact strings it was updated (similarly to String) to use a byte
> array + byte representing the encoding.
> > >>> We encode the StringBuilder value into a ByteBufferDestination
> (Heap)ByteBuffer in StringBuilderEncoder using the following steps:
> > >>> 1. copy the StringBuilder value into a CharBuffer (this would have
> been a direct copy from char[] to char[] in java8, but requires conversion
> from byte[]+coder now)
> > >>> 2. encode data from the CharBuffer to the StringBuilderEncoder
> ByteBuffer
> > >>> 3. copy contents of the StringBuilderEncoder ByteBuffer to the
> ByteBufferDestination ByteBuffer (which is also always a heap buffer as far
> as I can tell)
> > >>> 4. Write the ByteBufferDestination to the
> FileOutputStream/RandomAccessFil

Re: Cek's new log4j vs logback benchmark

2021-08-26 Thread Carter Kozak
Yes, I'm still looking into this.

I agree that the zero-garbage code is difficult to follow, and Loom virtual 
threads will have less reuse so our thread-locals will create more overhead 
than they're worth in many cases. Fingers crossed for Valhalla to land before 
Loom, but I haven't been following updates from either project very closely 
lately, and Loom seems farther along.

I think the issue with multiple buffers these days is that we have optane and 
nvme devices which can be _nearly_ as fast as main memory, so copying contents 
can be problematic. In fully async mode (with some caveats around 
queue-full-policy configuration points and nested async appenders) we don't 
need locking because the appender interactions are single-threaded.

For what it's worth, in my testing I didn't see much difference between the 8 
kB buffer and 256 kB buffer on a system with a relatively fast nvme drive, 
however I did discover and resolve LOG4J2-3150 (incorrect default buffer size 
in the RandomAccessFileAppender).

On Thu, Aug 26, 2021, at 14:21, Ralph Goers wrote:
> While that is true it does mean that we are locking higher up in the call 
> stack than we need to be.
> 
> Ralph
> 
> > On Aug 26, 2021, at 11:13 AM, Tim Perry  wrote:
> > 
> > I’m fairly certain the JIT will optimize out the locking operations on the 
> > nested synchronized calls after a while. I’m not sure how soon into the 
> > performance tests that would happen. 
> > 
> > Tim
> > 
> > 
> >> On Aug 26, 2021, at 10:55 AM, Ralph Goers  
> >> wrote:
> >> 
> >> So are you continuing to look at this?  Frankly, the work on zero-gc 
> >> stuff made this 
> >> very complicated. I am not sure I can even follow the logic anymore. I 
> >> also noticed 
> >> that many of the methods dealing with byte buffers are synchronized. I am 
> >> not sure 
> >> why as only the method that moves data really needs to be.
> >> 
> >> Using multiple buffers would make sense if you are filling one from the 
> >> various patterns
> >> while it is unlocked and then only blocking when writing to the output 
> >> stream buffer.
> >> 
> >> For what its worth Ceki updated his performance page again yesterday to 
> >> change 
> >> the wording but I am pretty sure he didn’t change the log4j2 configuration 
> >> to use a 
> >> 256K buffer to match Logback.  And the page still makes it look like he is 
> >> testing 
> >> the performance of asynchronous processing when it is really testing the 
> >> performance 
> >> of the file appenders with a blocking queue in front of them.
> >> 
> >> Ralph
> >> 
> >>> On Aug 26, 2021, at 7:29 AM, Carter Kozak  wrote:
> >>> 
> >>> I also tried that with similar results, which leads me to believe we're 
> >>> spending
> >>> too much time copying between buffers.
> >>> 
> >>> We've proven that log4j can get LogEvents to the appender very quickly and
> >>> efficiently.
> >>> 
> >>> Once we hit PatternLayout we write data to a StringBuilder. 
> >>> AbstractStringBuilder used to store a char array in java 8, however with 
> >>> jep 254 compact strings it was updated (similarly to String) to use a 
> >>> byte array + byte representing the encoding.
> >>> We encode the StringBuilder value into a ByteBufferDestination 
> >>> (Heap)ByteBuffer in StringBuilderEncoder using the following steps:
> >>> 1. copy the StringBuilder value into a CharBuffer (this would have been a 
> >>> direct copy from char[] to char[] in java8, but requires conversion from 
> >>> byte[]+coder now)
> >>> 2. encode data from the CharBuffer to the StringBuilderEncoder ByteBuffer
> >>> 3. copy contents of the StringBuilderEncoder ByteBuffer to the 
> >>> ByteBufferDestination ByteBuffer (which is also always a heap buffer as 
> >>> far as I can tell)
> >>> 4. Write the ByteBufferDestination to the 
> >>> FileOutputStream/RandomAccessFile by extracting the heap byte-array.
> >>> 
> >>> We're copying bytes around a lot, and I think older JVMs would have 
> >>> optimized out a bit of the copying because the types exactly matched.
> >>> It's unclear why we need to copy through two separate ByteBuffers from 
> >>> StringBuilderEncoder into ByteBufferDestination
> >>> I've generally had better I/O performance using direct byte-buffers than 
> >>> heap buffers, although that experience largely comes from work on 
> >>> webservers, I haven't tested if it holds true for file I/O. If we operate 
> >>> directly upon a FileChannel, we can write direct buffers directly to 
> >>> files. The DirectByteBuffer memory is guaranteed to be contiguous, so it 
> >>> should have better performance around copying and manipulation as well.
> >>> 
> >>> -ck
> >>> 
>  On Thu, Aug 26, 2021, at 10:00, Volkan Yazıcı wrote:
>  Ralph, maybe a dumb idea but... Can't we simply write to /dev/null to 
>  avoid
>  hardware's influence in the test results?
>  
>  On Wed, Aug 25, 2021 at 8:05 PM Ralph Goers 
>  wrote:
>  
> > Did you add the no-op appender to Ceki’s project? Or are 

Re: Cek's new log4j vs logback benchmark

2021-08-26 Thread Tim Perry
I see your point.

On Thu, Aug 26, 2021 at 11:21 AM Ralph Goers 
wrote:

> While that is true it does mean that we are locking higher up in the call
> stack than we need to be.
>
> Ralph
>
> > On Aug 26, 2021, at 11:13 AM, Tim Perry  wrote:
> >
> > I’m fairly certain the JIT will optimize out the locking operations on
> the nested synchronized calls after a while. I’m not sure how soon into the
> performance tests that would happen.
> >
> > Tim
> >
> >
> >> On Aug 26, 2021, at 10:55 AM, Ralph Goers 
> wrote:
> >>
> >> So are you continuing to look at this?  Frankly, the work on zero-gc
> stuff made this
> >> very complicated. I am not sure I can even follow the logic anymore. I
> also noticed
> >> that many of the methods dealing with byte buffers are synchronized. I
> am not sure
> >> why as only the method that moves data really needs to be.
> >>
> >> Using multiple buffers would make sense if you are filling one from the
> various patterns
> >> while it is unlocked and then only blocking when writing to the output
> stream buffer.
> >>
> >> For what its worth Ceki updated his performance page again yesterday to
> change
> >> the wording but I am pretty sure he didn’t change the log4j2
> configuration to use a
> >> 256K buffer to match Logback.  And the page still makes it look like he
> is testing
> >> the performance of asynchronous processing when it is really testing
> the performance
> >> of the file appenders with a blocking queue in front of them.
> >>
> >> Ralph
> >>
> >>> On Aug 26, 2021, at 7:29 AM, Carter Kozak  wrote:
> >>>
> >>> I also tried that with similar results, which leads me to believe
> we're spending
> >>> too much time copying between buffers.
> >>>
> >>> We've proven that log4j can get LogEvents to the appender very quickly
> and
> >>> efficiently.
> >>>
> >>> Once we hit PatternLayout we write data to a StringBuilder.
> AbstractStringBuilder used to store a char array in java 8, however with
> jep 254 compact strings it was updated (similarly to String) to use a byte
> array + byte representing the encoding.
> >>> We encode the StringBuilder value into a ByteBufferDestination
> (Heap)ByteBuffer in StringBuilderEncoder using the following steps:
> >>> 1. copy the StringBuilder value into a CharBuffer (this would have
> been a direct copy from char[] to char[] in java8, but requires conversion
> from byte[]+coder now)
> >>> 2. encode data from the CharBuffer to the StringBuilderEncoder
> ByteBuffer
> >>> 3. copy contents of the StringBuilderEncoder ByteBuffer to the
> ByteBufferDestination ByteBuffer (which is also always a heap buffer as far
> as I can tell)
> >>> 4. Write the ByteBufferDestination to the
> FileOutputStream/RandomAccessFile by extracting the heap byte-array.
> >>>
> >>> We're copying bytes around a lot, and I think older JVMs would have
> optimized out a bit of the copying because the types exactly matched.
> >>> It's unclear why we need to copy through two separate ByteBuffers from
> StringBuilderEncoder into ByteBufferDestination
> >>> I've generally had better I/O performance using direct byte-buffers
> than heap buffers, although that experience largely comes from work on
> webservers, I haven't tested if it holds true for file I/O. If we operate
> directly upon a FileChannel, we can write direct buffers directly to files.
> The DirectByteBuffer memory is guaranteed to be contiguous, so it should
> have better performance around copying and manipulation as well.
> >>>
> >>> -ck
> >>>
>  On Thu, Aug 26, 2021, at 10:00, Volkan Yazıcı wrote:
>  Ralph, maybe a dumb idea but... Can't we simply write to /dev/null to
> avoid
>  hardware's influence in the test results?
> 
>  On Wed, Aug 25, 2021 at 8:05 PM Ralph Goers <
> ralph.go...@dslextreme.com>
>  wrote:
> 
> > Did you add the no-op appender to Ceki’s project? Or are you using
> our
> > NullAppender? I have
> > my doubts about using that NullAppender as it does nothing - not even
> > render the Layout, so
> > it may get completely optimized away.
> >
> > I’d still like to know what kind of disks Remko did his original
> testing.
> > The page says he got
> > 18.495 million messages per second. Each message would be about 130
> > characters long
> > from I can tell. That means the destination has to be able to support
> > about at least 2.4 GB
> > per second.
> >
> > In Ceki’s test the message is only 23 characters long (vs 100 in
> Remko’s).
> > That means the
> > message size is only going to be about 60 characters long. For me the
> > async test is writing
> > at about 1700 ops/ms which equates to 102 MBs, yet I am still seeing
> the
> > queue backed up.
> > So how we are writing must be terribly inefficient. Logback is
> getting
> > about 2000 ops/ms,
> > which is still only 120 MBs.
> >
> > Your test below would be generating about 197 MBs.
> >
> > One thing I did notice th

Re: Cek's new log4j vs logback benchmark

2021-08-26 Thread Ralph Goers
While that is true it does mean that we are locking higher up in the call stack 
than we need to be.

Ralph

> On Aug 26, 2021, at 11:13 AM, Tim Perry  wrote:
> 
> I’m fairly certain the JIT will optimize out the locking operations on the 
> nested synchronized calls after a while. I’m not sure how soon into the 
> performance tests that would happen. 
> 
> Tim
> 
> 
>> On Aug 26, 2021, at 10:55 AM, Ralph Goers  wrote:
>> 
>> So are you continuing to look at this?  Frankly, the work on zero-gc stuff 
>> made this 
>> very complicated. I am not sure I can even follow the logic anymore. I also 
>> noticed 
>> that many of the methods dealing with byte buffers are synchronized. I am 
>> not sure 
>> why as only the method that moves data really needs to be.
>> 
>> Using multiple buffers would make sense if you are filling one from the 
>> various patterns
>> while it is unlocked and then only blocking when writing to the output 
>> stream buffer.
>> 
>> For what its worth Ceki updated his performance page again yesterday to 
>> change 
>> the wording but I am pretty sure he didn’t change the log4j2 configuration 
>> to use a 
>> 256K buffer to match Logback.  And the page still makes it look like he is 
>> testing 
>> the performance of asynchronous processing when it is really testing the 
>> performance 
>> of the file appenders with a blocking queue in front of them.
>> 
>> Ralph
>> 
>>> On Aug 26, 2021, at 7:29 AM, Carter Kozak  wrote:
>>> 
>>> I also tried that with similar results, which leads me to believe we're 
>>> spending
>>> too much time copying between buffers.
>>> 
>>> We've proven that log4j can get LogEvents to the appender very quickly and
>>> efficiently.
>>> 
>>> Once we hit PatternLayout we write data to a StringBuilder. 
>>> AbstractStringBuilder used to store a char array in java 8, however with 
>>> jep 254 compact strings it was updated (similarly to String) to use a byte 
>>> array + byte representing the encoding.
>>> We encode the StringBuilder value into a ByteBufferDestination 
>>> (Heap)ByteBuffer in StringBuilderEncoder using the following steps:
>>> 1. copy the StringBuilder value into a CharBuffer (this would have been a 
>>> direct copy from char[] to char[] in java8, but requires conversion from 
>>> byte[]+coder now)
>>> 2. encode data from the CharBuffer to the StringBuilderEncoder ByteBuffer
>>> 3. copy contents of the StringBuilderEncoder ByteBuffer to the 
>>> ByteBufferDestination ByteBuffer (which is also always a heap buffer as far 
>>> as I can tell)
>>> 4. Write the ByteBufferDestination to the FileOutputStream/RandomAccessFile 
>>> by extracting the heap byte-array.
>>> 
>>> We're copying bytes around a lot, and I think older JVMs would have 
>>> optimized out a bit of the copying because the types exactly matched.
>>> It's unclear why we need to copy through two separate ByteBuffers from 
>>> StringBuilderEncoder into ByteBufferDestination
>>> I've generally had better I/O performance using direct byte-buffers than 
>>> heap buffers, although that experience largely comes from work on 
>>> webservers, I haven't tested if it holds true for file I/O. If we operate 
>>> directly upon a FileChannel, we can write direct buffers directly to files. 
>>> The DirectByteBuffer memory is guaranteed to be contiguous, so it should 
>>> have better performance around copying and manipulation as well.
>>> 
>>> -ck
>>> 
 On Thu, Aug 26, 2021, at 10:00, Volkan Yazıcı wrote:
 Ralph, maybe a dumb idea but... Can't we simply write to /dev/null to avoid
 hardware's influence in the test results?
 
 On Wed, Aug 25, 2021 at 8:05 PM Ralph Goers 
 wrote:
 
> Did you add the no-op appender to Ceki’s project? Or are you using our
> NullAppender? I have
> my doubts about using that NullAppender as it does nothing - not even
> render the Layout, so
> it may get completely optimized away.
> 
> I’d still like to know what kind of disks Remko did his original testing.
> The page says he got
> 18.495 million messages per second. Each message would be about 130
> characters long
> from I can tell. That means the destination has to be able to support
> about at least 2.4 GB
> per second.
> 
> In Ceki’s test the message is only 23 characters long (vs 100 in Remko’s).
> That means the
> message size is only going to be about 60 characters long. For me the
> async test is writing
> at about 1700 ops/ms which equates to 102 MBs, yet I am still seeing the
> queue backed up.
> So how we are writing must be terribly inefficient. Logback is getting
> about 2000 ops/ms,
> which is still only 120 MBs.
> 
> Your test below would be generating about 197 MBs.
> 
> One thing I did notice that made a big difference is that Ceki has Logback
> configured to
> use a buffer size of 256K while Log4j2 uses the default of 8K. Setting
> Log4j2 to 256K
> considerably im

Re: Cek's new log4j vs logback benchmark

2021-08-26 Thread Tim Perry
I’m fairly certain the JIT will optimize out the locking operations on the 
nested synchronized calls after a while. I’m not sure how soon into the 
performance tests that would happen. 

Tim


> On Aug 26, 2021, at 10:55 AM, Ralph Goers  wrote:
> 
> So are you continuing to look at this?  Frankly, the work on zero-gc stuff 
> made this 
> very complicated. I am not sure I can even follow the logic anymore. I also 
> noticed 
> that many of the methods dealing with byte buffers are synchronized. I am not 
> sure 
> why as only the method that moves data really needs to be.
> 
> Using multiple buffers would make sense if you are filling one from the 
> various patterns
> while it is unlocked and then only blocking when writing to the output stream 
> buffer.
> 
> For what its worth Ceki updated his performance page again yesterday to 
> change 
> the wording but I am pretty sure he didn’t change the log4j2 configuration to 
> use a 
> 256K buffer to match Logback.  And the page still makes it look like he is 
> testing 
> the performance of asynchronous processing when it is really testing the 
> performance 
> of the file appenders with a blocking queue in front of them.
> 
> Ralph
> 
>> On Aug 26, 2021, at 7:29 AM, Carter Kozak  wrote:
>> 
>> I also tried that with similar results, which leads me to believe we're 
>> spending
>> too much time copying between buffers.
>> 
>> We've proven that log4j can get LogEvents to the appender very quickly and
>> efficiently.
>> 
>> Once we hit PatternLayout we write data to a StringBuilder. 
>> AbstractStringBuilder used to store a char array in java 8, however with jep 
>> 254 compact strings it was updated (similarly to String) to use a byte array 
>> + byte representing the encoding.
>> We encode the StringBuilder value into a ByteBufferDestination 
>> (Heap)ByteBuffer in StringBuilderEncoder using the following steps:
>> 1. copy the StringBuilder value into a CharBuffer (this would have been a 
>> direct copy from char[] to char[] in java8, but requires conversion from 
>> byte[]+coder now)
>> 2. encode data from the CharBuffer to the StringBuilderEncoder ByteBuffer
>> 3. copy contents of the StringBuilderEncoder ByteBuffer to the 
>> ByteBufferDestination ByteBuffer (which is also always a heap buffer as far 
>> as I can tell)
>> 4. Write the ByteBufferDestination to the FileOutputStream/RandomAccessFile 
>> by extracting the heap byte-array.
>> 
>> We're copying bytes around a lot, and I think older JVMs would have 
>> optimized out a bit of the copying because the types exactly matched.
>> It's unclear why we need to copy through two separate ByteBuffers from 
>> StringBuilderEncoder into ByteBufferDestination
>> I've generally had better I/O performance using direct byte-buffers than 
>> heap buffers, although that experience largely comes from work on 
>> webservers, I haven't tested if it holds true for file I/O. If we operate 
>> directly upon a FileChannel, we can write direct buffers directly to files. 
>> The DirectByteBuffer memory is guaranteed to be contiguous, so it should 
>> have better performance around copying and manipulation as well.
>> 
>> -ck
>> 
>>> On Thu, Aug 26, 2021, at 10:00, Volkan Yazıcı wrote:
>>> Ralph, maybe a dumb idea but... Can't we simply write to /dev/null to avoid
>>> hardware's influence in the test results?
>>> 
>>> On Wed, Aug 25, 2021 at 8:05 PM Ralph Goers 
>>> wrote:
>>> 
 Did you add the no-op appender to Ceki’s project? Or are you using our
 NullAppender? I have
 my doubts about using that NullAppender as it does nothing - not even
 render the Layout, so
 it may get completely optimized away.
 
 I’d still like to know what kind of disks Remko did his original testing.
 The page says he got
 18.495 million messages per second. Each message would be about 130
 characters long
 from I can tell. That means the destination has to be able to support
 about at least 2.4 GB
 per second.
 
 In Ceki’s test the message is only 23 characters long (vs 100 in Remko’s).
 That means the
 message size is only going to be about 60 characters long. For me the
 async test is writing
 at about 1700 ops/ms which equates to 102 MBs, yet I am still seeing the
 queue backed up.
 So how we are writing must be terribly inefficient. Logback is getting
 about 2000 ops/ms,
 which is still only 120 MBs.
 
 Your test below would be generating about 197 MBs.
 
 One thing I did notice that made a big difference is that Ceki has Logback
 configured to
 use a buffer size of 256K while Log4j2 uses the default of 8K. Setting
 Log4j2 to 256K
 considerably improved the performance.
 
 
 Ralph
 
> On Aug 25, 2021, at 8:42 AM, Carter Kozak  wrote:
> 
>> If we would move the appender performance aside, am I right to conclude
>> that the entire complex async. framework built atop Disruptor with all
 i

Re: Cek's new log4j vs logback benchmark

2021-08-26 Thread Ralph Goers
So are you continuing to look at this?  Frankly, the work on zero-gc stuff made 
this 
very complicated. I am not sure I can even follow the logic anymore. I also 
noticed 
that many of the methods dealing with byte buffers are synchronized. I am not 
sure 
why as only the method that moves data really needs to be.

Using multiple buffers would make sense if you are filling one from the various 
patterns
while it is unlocked and then only blocking when writing to the output stream 
buffer.

For what its worth Ceki updated his performance page again yesterday to change 
the wording but I am pretty sure he didn’t change the log4j2 configuration to 
use a 
256K buffer to match Logback.  And the page still makes it look like he is 
testing 
the performance of asynchronous processing when it is really testing the 
performance 
of the file appenders with a blocking queue in front of them.

Ralph

> On Aug 26, 2021, at 7:29 AM, Carter Kozak  wrote:
> 
> I also tried that with similar results, which leads me to believe we're 
> spending
> too much time copying between buffers.
> 
> We've proven that log4j can get LogEvents to the appender very quickly and
> efficiently.
> 
> Once we hit PatternLayout we write data to a StringBuilder. 
> AbstractStringBuilder used to store a char array in java 8, however with jep 
> 254 compact strings it was updated (similarly to String) to use a byte array 
> + byte representing the encoding.
> We encode the StringBuilder value into a ByteBufferDestination 
> (Heap)ByteBuffer in StringBuilderEncoder using the following steps:
> 1. copy the StringBuilder value into a CharBuffer (this would have been a 
> direct copy from char[] to char[] in java8, but requires conversion from 
> byte[]+coder now)
> 2. encode data from the CharBuffer to the StringBuilderEncoder ByteBuffer
> 3. copy contents of the StringBuilderEncoder ByteBuffer to the 
> ByteBufferDestination ByteBuffer (which is also always a heap buffer as far 
> as I can tell)
> 4. Write the ByteBufferDestination to the FileOutputStream/RandomAccessFile 
> by extracting the heap byte-array.
> 
> We're copying bytes around a lot, and I think older JVMs would have optimized 
> out a bit of the copying because the types exactly matched.
> It's unclear why we need to copy through two separate ByteBuffers from 
> StringBuilderEncoder into ByteBufferDestination
> I've generally had better I/O performance using direct byte-buffers than heap 
> buffers, although that experience largely comes from work on webservers, I 
> haven't tested if it holds true for file I/O. If we operate directly upon a 
> FileChannel, we can write direct buffers directly to files. The 
> DirectByteBuffer memory is guaranteed to be contiguous, so it should have 
> better performance around copying and manipulation as well.
> 
> -ck
> 
> On Thu, Aug 26, 2021, at 10:00, Volkan Yazıcı wrote:
>> Ralph, maybe a dumb idea but... Can't we simply write to /dev/null to avoid
>> hardware's influence in the test results?
>> 
>> On Wed, Aug 25, 2021 at 8:05 PM Ralph Goers 
>> wrote:
>> 
>>> Did you add the no-op appender to Ceki’s project? Or are you using our
>>> NullAppender? I have
>>> my doubts about using that NullAppender as it does nothing - not even
>>> render the Layout, so
>>> it may get completely optimized away.
>>> 
>>> I’d still like to know what kind of disks Remko did his original testing.
>>> The page says he got
>>> 18.495 million messages per second. Each message would be about 130
>>> characters long
>>> from I can tell. That means the destination has to be able to support
>>> about at least 2.4 GB
>>> per second.
>>> 
>>> In Ceki’s test the message is only 23 characters long (vs 100 in Remko’s).
>>> That means the
>>> message size is only going to be about 60 characters long. For me the
>>> async test is writing
>>> at about 1700 ops/ms which equates to 102 MBs, yet I am still seeing the
>>> queue backed up.
>>> So how we are writing must be terribly inefficient. Logback is getting
>>> about 2000 ops/ms,
>>> which is still only 120 MBs.
>>> 
>>> Your test below would be generating about 197 MBs.
>>> 
>>> One thing I did notice that made a big difference is that Ceki has Logback
>>> configured to
>>> use a buffer size of 256K while Log4j2 uses the default of 8K. Setting
>>> Log4j2 to 256K
>>> considerably improved the performance.
>>> 
>>> 
>>> Ralph
>>> 
 On Aug 25, 2021, at 8:42 AM, Carter Kozak  wrote:
 
> If we would move the appender performance aside, am I right to conclude
> that the entire complex async. framework built atop Disruptor with all
>>> its
> whistles and bells is yielding a diminishing performance gain compared
>>> to a
> simple off the shelf blocking queue?
 
 I haven't seen any data that would give me such an impression -- the
>>> file appender
 itself is slower than the thread producing events, which appears to
>>> limit our throughput.
 Even then, log4j2 does much better in my testing on a

Re: Cek's new log4j vs logback benchmark

2021-08-26 Thread Carter Kozak
Sorry I missed replying to your question, Ralph!

> Did you add the no-op appender to Ceki’s project? Or are you using our 
> NullAppender? I have 
> my doubts about using that NullAppender as it does nothing - not even render 
> the Layout, so 
> it may get completely optimized away.

I added a no-op appender locally and ran the benchmark, and used the logback 
null-appender which also does not read event values. I agree that this test 
would be completely invalid using synchronous logging, however using the 
asynchronous approach I don't believe the JVM can optimize out the application 
thread pieces. There's definitely some ambiguity in the data -- I also tested 
using a wrapper around a file appender with a 1/1000 probability of writing a 
given event to the file appender and saw similar results, so I believe the test 
was at least somewhat valid.

-ck

On Thu, Aug 26, 2021, at 10:29, Carter Kozak wrote:
> I also tried that with similar results, which leads me to believe we're 
> spending
> too much time copying between buffers.
> 
> We've proven that log4j can get LogEvents to the appender very quickly and
> efficiently.
> 
> Once we hit PatternLayout we write data to a StringBuilder. 
> AbstractStringBuilder used to store a char array in java 8, however with jep 
> 254 compact strings it was updated (similarly to String) to use a byte array 
> + byte representing the encoding.
> We encode the StringBuilder value into a ByteBufferDestination 
> (Heap)ByteBuffer in StringBuilderEncoder using the following steps:
> 1. copy the StringBuilder value into a CharBuffer (this would have been a 
> direct copy from char[] to char[] in java8, but requires conversion from 
> byte[]+coder now)
> 2. encode data from the CharBuffer to the StringBuilderEncoder ByteBuffer
> 3. copy contents of the StringBuilderEncoder ByteBuffer to the 
> ByteBufferDestination ByteBuffer (which is also always a heap buffer as far 
> as I can tell)
> 4. Write the ByteBufferDestination to the FileOutputStream/RandomAccessFile 
> by extracting the heap byte-array.
> 
> We're copying bytes around a lot, and I think older JVMs would have optimized 
> out a bit of the copying because the types exactly matched.
> It's unclear why we need to copy through two separate ByteBuffers from 
> StringBuilderEncoder into ByteBufferDestination
> I've generally had better I/O performance using direct byte-buffers than heap 
> buffers, although that experience largely comes from work on webservers, I 
> haven't tested if it holds true for file I/O. If we operate directly upon a 
> FileChannel, we can write direct buffers directly to files. The 
> DirectByteBuffer memory is guaranteed to be contiguous, so it should have 
> better performance around copying and manipulation as well.
> 
> -ck
> 
> On Thu, Aug 26, 2021, at 10:00, Volkan Yazıcı wrote:
> > Ralph, maybe a dumb idea but... Can't we simply write to /dev/null to avoid
> > hardware's influence in the test results?
> > 
> > On Wed, Aug 25, 2021 at 8:05 PM Ralph Goers 
> > wrote:
> > 
> > > Did you add the no-op appender to Ceki’s project? Or are you using our
> > > NullAppender? I have
> > > my doubts about using that NullAppender as it does nothing - not even
> > > render the Layout, so
> > > it may get completely optimized away.
> > >
> > > I’d still like to know what kind of disks Remko did his original testing.
> > > The page says he got
> > > 18.495 million messages per second. Each message would be about 130
> > > characters long
> > > from I can tell. That means the destination has to be able to support
> > > about at least 2.4 GB
> > > per second.
> > >
> > > In Ceki’s test the message is only 23 characters long (vs 100 in Remko’s).
> > > That means the
> > > message size is only going to be about 60 characters long. For me the
> > > async test is writing
> > > at about 1700 ops/ms which equates to 102 MBs, yet I am still seeing the
> > > queue backed up.
> > > So how we are writing must be terribly inefficient. Logback is getting
> > > about 2000 ops/ms,
> > > which is still only 120 MBs.
> > >
> > > Your test below would be generating about 197 MBs.
> > >
> > > One thing I did notice that made a big difference is that Ceki has Logback
> > > configured to
> > > use a buffer size of 256K while Log4j2 uses the default of 8K. Setting
> > > Log4j2 to 256K
> > > considerably improved the performance.
> > >
> > >
> > > Ralph
> > >
> > > > On Aug 25, 2021, at 8:42 AM, Carter Kozak  wrote:
> > > >
> > > >> If we would move the appender performance aside, am I right to conclude
> > > >> that the entire complex async. framework built atop Disruptor with all
> > > its
> > > >> whistles and bells is yielding a diminishing performance gain compared
> > > to a
> > > >> simple off the shelf blocking queue?
> > > >
> > > > I haven't seen any data that would give me such an impression -- the
> > > file appender
> > > > itself is slower than the thread producing events, which appears to
> > > limit our throu

Re: Cek's new log4j vs logback benchmark

2021-08-26 Thread Carter Kozak
I also tried that with similar results, which leads me to believe we're spending
too much time copying between buffers.

We've proven that log4j can get LogEvents to the appender very quickly and
efficiently.

Once we hit PatternLayout we write data to a StringBuilder. 
AbstractStringBuilder used to store a char array in java 8, however with jep 
254 compact strings it was updated (similarly to String) to use a byte array + 
byte representing the encoding.
We encode the StringBuilder value into a ByteBufferDestination (Heap)ByteBuffer 
in StringBuilderEncoder using the following steps:
1. copy the StringBuilder value into a CharBuffer (this would have been a 
direct copy from char[] to char[] in java8, but requires conversion from 
byte[]+coder now)
2. encode data from the CharBuffer to the StringBuilderEncoder ByteBuffer
3. copy contents of the StringBuilderEncoder ByteBuffer to the 
ByteBufferDestination ByteBuffer (which is also always a heap buffer as far as 
I can tell)
4. Write the ByteBufferDestination to the FileOutputStream/RandomAccessFile by 
extracting the heap byte-array.

We're copying bytes around a lot, and I think older JVMs would have optimized 
out a bit of the copying because the types exactly matched.
It's unclear why we need to copy through two separate ByteBuffers from 
StringBuilderEncoder into ByteBufferDestination
I've generally had better I/O performance using direct byte-buffers than heap 
buffers, although that experience largely comes from work on webservers, I 
haven't tested if it holds true for file I/O. If we operate directly upon a 
FileChannel, we can write direct buffers directly to files. The 
DirectByteBuffer memory is guaranteed to be contiguous, so it should have 
better performance around copying and manipulation as well.

-ck

On Thu, Aug 26, 2021, at 10:00, Volkan Yazıcı wrote:
> Ralph, maybe a dumb idea but... Can't we simply write to /dev/null to avoid
> hardware's influence in the test results?
> 
> On Wed, Aug 25, 2021 at 8:05 PM Ralph Goers 
> wrote:
> 
> > Did you add the no-op appender to Ceki’s project? Or are you using our
> > NullAppender? I have
> > my doubts about using that NullAppender as it does nothing - not even
> > render the Layout, so
> > it may get completely optimized away.
> >
> > I’d still like to know what kind of disks Remko did his original testing.
> > The page says he got
> > 18.495 million messages per second. Each message would be about 130
> > characters long
> > from I can tell. That means the destination has to be able to support
> > about at least 2.4 GB
> > per second.
> >
> > In Ceki’s test the message is only 23 characters long (vs 100 in Remko’s).
> > That means the
> > message size is only going to be about 60 characters long. For me the
> > async test is writing
> > at about 1700 ops/ms which equates to 102 MBs, yet I am still seeing the
> > queue backed up.
> > So how we are writing must be terribly inefficient. Logback is getting
> > about 2000 ops/ms,
> > which is still only 120 MBs.
> >
> > Your test below would be generating about 197 MBs.
> >
> > One thing I did notice that made a big difference is that Ceki has Logback
> > configured to
> > use a buffer size of 256K while Log4j2 uses the default of 8K. Setting
> > Log4j2 to 256K
> > considerably improved the performance.
> >
> >
> > Ralph
> >
> > > On Aug 25, 2021, at 8:42 AM, Carter Kozak  wrote:
> > >
> > >> If we would move the appender performance aside, am I right to conclude
> > >> that the entire complex async. framework built atop Disruptor with all
> > its
> > >> whistles and bells is yielding a diminishing performance gain compared
> > to a
> > >> simple off the shelf blocking queue?
> > >
> > > I haven't seen any data that would give me such an impression -- the
> > file appender
> > > itself is slower than the thread producing events, which appears to
> > limit our throughput.
> > > Even then, log4j2 does much better in my testing on a linux workstation
> > with 20+ producer
> > > threads compared to logback.
> > > Any time the queue is constantly full, performance will suffer (and this
> > applies to standard
> > > blocking queues as well as disruptor, however disruptor will fare worse
> > when the queue
> > > is full).
> > > The results using a single thread are roughly reproducible on the
> > non-async test, so I think
> > > the problem is somewhere between the layout, and appender/manager.
> > >
> > > Results running locally with no-op appender implementations:
> > >
> > > 1 thread:
> > >
> > > BenchmarkMode  Cnt Score
> >  Error   Units
> > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt4  3285.743 ±
> > 427.963  ops/ms
> > > AsyncWithFileAppenderBenchmark.logbackFile  thrpt4  2383.532 ±
> > 136.034  ops/ms
> > >
> > > 20 threads:
> > >
> > > BenchmarkMode  Cnt Score
> >  Error   Units
> > > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrp

Re: Cek's new log4j vs logback benchmark

2021-08-26 Thread Volkan Yazıcı
Ralph, maybe a dumb idea but... Can't we simply write to /dev/null to avoid
hardware's influence in the test results?

On Wed, Aug 25, 2021 at 8:05 PM Ralph Goers 
wrote:

> Did you add the no-op appender to Ceki’s project? Or are you using our
> NullAppender? I have
> my doubts about using that NullAppender as it does nothing - not even
> render the Layout, so
> it may get completely optimized away.
>
> I’d still like to know what kind of disks Remko did his original testing.
> The page says he got
> 18.495 million messages per second. Each message would be about 130
> characters long
> from I can tell. That means the destination has to be able to support
> about at least 2.4 GB
> per second.
>
> In Ceki’s test the message is only 23 characters long (vs 100 in Remko’s).
> That means the
> message size is only going to be about 60 characters long. For me the
> async test is writing
> at about 1700 ops/ms which equates to 102 MBs, yet I am still seeing the
> queue backed up.
> So how we are writing must be terribly inefficient. Logback is getting
> about 2000 ops/ms,
> which is still only 120 MBs.
>
> Your test below would be generating about 197 MBs.
>
> One thing I did notice that made a big difference is that Ceki has Logback
> configured to
> use a buffer size of 256K while Log4j2 uses the default of 8K. Setting
> Log4j2 to 256K
> considerably improved the performance.
>
>
> Ralph
>
> > On Aug 25, 2021, at 8:42 AM, Carter Kozak  wrote:
> >
> >> If we would move the appender performance aside, am I right to conclude
> >> that the entire complex async. framework built atop Disruptor with all
> its
> >> whistles and bells is yielding a diminishing performance gain compared
> to a
> >> simple off the shelf blocking queue?
> >
> > I haven't seen any data that would give me such an impression -- the
> file appender
> > itself is slower than the thread producing events, which appears to
> limit our throughput.
> > Even then, log4j2 does much better in my testing on a linux workstation
> with 20+ producer
> > threads compared to logback.
> > Any time the queue is constantly full, performance will suffer (and this
> applies to standard
> > blocking queues as well as disruptor, however disruptor will fare worse
> when the queue
> > is full).
> > The results using a single thread are roughly reproducible on the
> non-async test, so I think
> > the problem is somewhere between the layout, and appender/manager.
> >
> > Results running locally with no-op appender implementations:
> >
> > 1 thread:
> >
> > BenchmarkMode  Cnt Score
>  Error   Units
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt4  3285.743 ±
> 427.963  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile  thrpt4  2383.532 ±
> 136.034  ops/ms
> >
> > 20 threads:
> >
> > BenchmarkMode  Cnt Score
>  Error   Units
> > AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt4  1177.746 ±
> 919.560  ops/ms
> > AsyncWithFileAppenderBenchmark.logbackFile  thrpt4   602.614 ±
> 47.485  ops/ms
>
>
>


Re: Cek's new log4j vs logback benchmark

2021-08-25 Thread Ralph Goers
Did you add the no-op appender to Ceki’s project? Or are you using our 
NullAppender? I have 
my doubts about using that NullAppender as it does nothing - not even render 
the Layout, so 
it may get completely optimized away.

I’d still like to know what kind of disks Remko did his original testing. The 
page says he got 
18.495 million messages per second. Each message would be about 130 characters 
long 
from I can tell. That means the destination has to be able to support about at 
least 2.4 GB 
per second. 

In Ceki’s test the message is only 23 characters long (vs 100 in Remko’s). That 
means the 
message size is only going to be about 60 characters long. For me the async 
test is writing 
at about 1700 ops/ms which equates to 102 MBs, yet I am still seeing the queue 
backed up. 
So how we are writing must be terribly inefficient. Logback is getting about 
2000 ops/ms, 
which is still only 120 MBs.

Your test below would be generating about 197 MBs.

One thing I did notice that made a big difference is that Ceki has Logback 
configured to 
use a buffer size of 256K while Log4j2 uses the default of 8K. Setting Log4j2 
to 256K 
considerably improved the performance.


Ralph

> On Aug 25, 2021, at 8:42 AM, Carter Kozak  wrote:
> 
>> If we would move the appender performance aside, am I right to conclude
>> that the entire complex async. framework built atop Disruptor with all its
>> whistles and bells is yielding a diminishing performance gain compared to a
>> simple off the shelf blocking queue?
> 
> I haven't seen any data that would give me such an impression -- the file 
> appender
> itself is slower than the thread producing events, which appears to limit our 
> throughput.
> Even then, log4j2 does much better in my testing on a linux workstation with 
> 20+ producer
> threads compared to logback.
> Any time the queue is constantly full, performance will suffer (and this 
> applies to standard
> blocking queues as well as disruptor, however disruptor will fare worse when 
> the queue
> is full).
> The results using a single thread are roughly reproducible on the non-async 
> test, so I think
> the problem is somewhere between the layout, and appender/manager.
> 
> Results running locally with no-op appender implementations:
> 
> 1 thread:
> 
> BenchmarkMode  Cnt Score 
> Error   Units
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt4  3285.743 ± 
> 427.963  ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile  thrpt4  2383.532 ± 
> 136.034  ops/ms
> 
> 20 threads:
> 
> BenchmarkMode  Cnt Score 
> Error   Units
> AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt4  1177.746 ± 
> 919.560  ops/ms
> AsyncWithFileAppenderBenchmark.logbackFile  thrpt4   602.614 ±  
> 47.485  ops/ms




Re: Cek's new log4j vs logback benchmark

2021-08-25 Thread Carter Kozak
> If we would move the appender performance aside, am I right to conclude
> that the entire complex async. framework built atop Disruptor with all its
> whistles and bells is yielding a diminishing performance gain compared to a
> simple off the shelf blocking queue?

I haven't seen any data that would give me such an impression -- the file 
appender
itself is slower than the thread producing events, which appears to limit our 
throughput.
Even then, log4j2 does much better in my testing on a linux workstation with 
20+ producer
threads compared to logback.
Any time the queue is constantly full, performance will suffer (and this 
applies to standard
blocking queues as well as disruptor, however disruptor will fare worse when 
the queue
is full).
The results using a single thread are roughly reproducible on the non-async 
test, so I think
the problem is somewhere between the layout, and appender/manager.

Results running locally with no-op appender implementations:

1 thread:

BenchmarkMode  Cnt Score Error  
 Units
AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt4  3285.743 ± 427.963  
ops/ms
AsyncWithFileAppenderBenchmark.logbackFile  thrpt4  2383.532 ± 136.034  
ops/ms

20 threads:

BenchmarkMode  Cnt Score Error  
 Units
AsyncWithFileAppenderBenchmark.log4j2AsyncFile  thrpt4  1177.746 ± 919.560  
ops/ms
AsyncWithFileAppenderBenchmark.logbackFile  thrpt4   602.614 ±  47.485  
ops/ms

Re: Cek's new log4j vs logback benchmark

2021-08-20 Thread Ralph Goers
Carter,

Thanks for following up. I knew I had to be missing something. I don’t know why 
I didn’t see the system property being set.

Ralph

> On Aug 20, 2021, at 5:19 PM, Carter Kozak  wrote:
> 
> The benchmark itself sets the system property to opt into 
> AsyncLoggerContextSelector:
> https://github.com/ceki/logback-perf/blob/5f6b10693959b6ecf1b82abddb052e89fe063e89/src/main/java/ch/qos/logback/perf/AsyncWithFileAppenderBenchmark.java#L61
>  
> 
> 
> There’s some discussion on 
> https://gist.github.com/carterkozak/891ea382a12782b772571059d62d501a 
> 
> 
> -ck
> 
>> On Aug 20, 2021, at 8:04 PM, Ralph Goers  wrote:
>> 
>> I don’t understand. His async configuration for Log4j 2 isn’t async. I 
>> didn’t see him set the system property. The log4j2 config file says
>> 
>> 
>> But he didn’t configure an AsyncLogger or AsyncRoot and there is no Async 
>> Appender configured.
>> 
>> Ralph
>> 
>>> On Aug 20, 2021, at 9:14 AM, Carter Kozak  wrote:
>>> 
>>> Benchmarks were using an unpublished version of logback that works 
>>> differently than the release version I tested against -- continuing the 
>>> conversation there, but I'll report back here once dust settles. Rerunning 
>>> the benchmarks with a logback snapshot from source shows that async logback 
>>> with one logging thread outperforms async log4j2 with 1 logging thread, 
>>> however log4j2 performs better with 20 threads. I still need to do a bit of 
>>> deeper investigation but will be busy with work for the next several hours.
>>> 
>>> On Fri, Aug 20, 2021, at 12:10, Ralph Goers wrote:
 Feel free to respond to his tweet. 
 
 Ralph
 
> On Aug 20, 2021, at 7:15 AM, Carter Kozak  wrote:
> 
> Thanks for flagging this! I've responded to the tweet, copying it here as 
> well for posterity:
> 
> Looking at the logback benchmark it appears that no bytes are being 
> written to target/test-output/logback-async-perf.log. Upon closer 
> inspection the logback asyncappender is in an started=false state, 
> rejecting all input events.
> https://twitter.com/carter_kozak/status/1428721705464238085?s=20
> 
> -ck
> 
> On Fri, Aug 20, 2021, at 01:13, Volkan Yazıcı wrote:
>> Hello,
>> 
>> Ceki has recently posted a Tweet stating that both log4j 1 and logback
>> performs better than log4j 2 in async mode:
>> 
>> https://twitter.com/ceki/status/1428461637917360131?s=19
>> https://github.com/ceki/logback-perf
>> 
>> I don't know much about how async wiring is done under the hood, yet, if
>> his claim is true, that is pretty concerning. Would anybody mind sparing
>> some time to investigate if the configuration he employs is tuned good
>> enough and the results are accurate, please?
>> 
>> Kind regards.
>> 
 
 
 
>> 
> 




Re: Cek's new log4j vs logback benchmark

2021-08-20 Thread Matt Sicker
Does it help that we have five different ways to do async logging?

Matt Sicker

> On Aug 20, 2021, at 19:19, Carter Kozak  wrote:
> 
> The benchmark itself sets the system property to opt into 
> AsyncLoggerContextSelector:
> https://github.com/ceki/logback-perf/blob/5f6b10693959b6ecf1b82abddb052e89fe063e89/src/main/java/ch/qos/logback/perf/AsyncWithFileAppenderBenchmark.java#L61
>  
> 
> 
> There’s some discussion on 
> https://gist.github.com/carterkozak/891ea382a12782b772571059d62d501a 
> 
> 
> -ck
> 
>> On Aug 20, 2021, at 8:04 PM, Ralph Goers  wrote:
>> 
>> I don’t understand. His async configuration for Log4j 2 isn’t async. I 
>> didn’t see him set the system property. The log4j2 config file says
>> 
>> 
>> But he didn’t configure an AsyncLogger or AsyncRoot and there is no Async 
>> Appender configured.
>> 
>> Ralph
>> 
 On Aug 20, 2021, at 9:14 AM, Carter Kozak  wrote:
>>> 
>>> Benchmarks were using an unpublished version of logback that works 
>>> differently than the release version I tested against -- continuing the 
>>> conversation there, but I'll report back here once dust settles. Rerunning 
>>> the benchmarks with a logback snapshot from source shows that async logback 
>>> with one logging thread outperforms async log4j2 with 1 logging thread, 
>>> however log4j2 performs better with 20 threads. I still need to do a bit of 
>>> deeper investigation but will be busy with work for the next several hours.
>>> 
>>> On Fri, Aug 20, 2021, at 12:10, Ralph Goers wrote:
 Feel free to respond to his tweet. 
 
 Ralph
 
> On Aug 20, 2021, at 7:15 AM, Carter Kozak  wrote:
> 
> Thanks for flagging this! I've responded to the tweet, copying it here as 
> well for posterity:
> 
> Looking at the logback benchmark it appears that no bytes are being 
> written to target/test-output/logback-async-perf.log. Upon closer 
> inspection the logback asyncappender is in an started=false state, 
> rejecting all input events.
> https://twitter.com/carter_kozak/status/1428721705464238085?s=20
> 
> -ck
> 
> On Fri, Aug 20, 2021, at 01:13, Volkan Yazıcı wrote:
>> Hello,
>> 
>> Ceki has recently posted a Tweet stating that both log4j 1 and logback
>> performs better than log4j 2 in async mode:
>> 
>> https://twitter.com/ceki/status/1428461637917360131?s=19
>> https://github.com/ceki/logback-perf
>> 
>> I don't know much about how async wiring is done under the hood, yet, if
>> his claim is true, that is pretty concerning. Would anybody mind sparing
>> some time to investigate if the configuration he employs is tuned good
>> enough and the results are accurate, please?
>> 
>> Kind regards.
>> 
 
 
 
>> 
> 


Re: Cek's new log4j vs logback benchmark

2021-08-20 Thread Carter Kozak
The benchmark itself sets the system property to opt into 
AsyncLoggerContextSelector:
https://github.com/ceki/logback-perf/blob/5f6b10693959b6ecf1b82abddb052e89fe063e89/src/main/java/ch/qos/logback/perf/AsyncWithFileAppenderBenchmark.java#L61
 


There’s some discussion on 
https://gist.github.com/carterkozak/891ea382a12782b772571059d62d501a 


-ck

> On Aug 20, 2021, at 8:04 PM, Ralph Goers  wrote:
> 
> I don’t understand. His async configuration for Log4j 2 isn’t async. I didn’t 
> see him set the system property. The log4j2 config file says
> 
> 
> But he didn’t configure an AsyncLogger or AsyncRoot and there is no Async 
> Appender configured.
> 
> Ralph
> 
>> On Aug 20, 2021, at 9:14 AM, Carter Kozak  wrote:
>> 
>> Benchmarks were using an unpublished version of logback that works 
>> differently than the release version I tested against -- continuing the 
>> conversation there, but I'll report back here once dust settles. Rerunning 
>> the benchmarks with a logback snapshot from source shows that async logback 
>> with one logging thread outperforms async log4j2 with 1 logging thread, 
>> however log4j2 performs better with 20 threads. I still need to do a bit of 
>> deeper investigation but will be busy with work for the next several hours.
>> 
>> On Fri, Aug 20, 2021, at 12:10, Ralph Goers wrote:
>>> Feel free to respond to his tweet. 
>>> 
>>> Ralph
>>> 
 On Aug 20, 2021, at 7:15 AM, Carter Kozak  wrote:
 
 Thanks for flagging this! I've responded to the tweet, copying it here as 
 well for posterity:
 
 Looking at the logback benchmark it appears that no bytes are being 
 written to target/test-output/logback-async-perf.log. Upon closer 
 inspection the logback asyncappender is in an started=false state, 
 rejecting all input events.
 https://twitter.com/carter_kozak/status/1428721705464238085?s=20
 
 -ck
 
 On Fri, Aug 20, 2021, at 01:13, Volkan Yazıcı wrote:
> Hello,
> 
> Ceki has recently posted a Tweet stating that both log4j 1 and logback
> performs better than log4j 2 in async mode:
> 
> https://twitter.com/ceki/status/1428461637917360131?s=19
> https://github.com/ceki/logback-perf
> 
> I don't know much about how async wiring is done under the hood, yet, if
> his claim is true, that is pretty concerning. Would anybody mind sparing
> some time to investigate if the configuration he employs is tuned good
> enough and the results are accurate, please?
> 
> Kind regards.
> 
>>> 
>>> 
>>> 
> 



Re: Cek's new log4j vs logback benchmark

2021-08-20 Thread Ralph Goers
I don’t understand. His async configuration for Log4j 2 isn’t async. I didn’t 
see him set the system property. The log4j2 config file says


But he didn’t configure an AsyncLogger or AsyncRoot and there is no Async 
Appender configured.

Ralph

> On Aug 20, 2021, at 9:14 AM, Carter Kozak  wrote:
> 
> Benchmarks were using an unpublished version of logback that works 
> differently than the release version I tested against -- continuing the 
> conversation there, but I'll report back here once dust settles. Rerunning 
> the benchmarks with a logback snapshot from source shows that async logback 
> with one logging thread outperforms async log4j2 with 1 logging thread, 
> however log4j2 performs better with 20 threads. I still need to do a bit of 
> deeper investigation but will be busy with work for the next several hours.
> 
> On Fri, Aug 20, 2021, at 12:10, Ralph Goers wrote:
>> Feel free to respond to his tweet. 
>> 
>> Ralph
>> 
>>> On Aug 20, 2021, at 7:15 AM, Carter Kozak  wrote:
>>> 
>>> Thanks for flagging this! I've responded to the tweet, copying it here as 
>>> well for posterity:
>>> 
>>> Looking at the logback benchmark it appears that no bytes are being written 
>>> to target/test-output/logback-async-perf.log. Upon closer inspection the 
>>> logback asyncappender is in an started=false state, rejecting all input 
>>> events.
>>> https://twitter.com/carter_kozak/status/1428721705464238085?s=20
>>> 
>>> -ck
>>> 
>>> On Fri, Aug 20, 2021, at 01:13, Volkan Yazıcı wrote:
 Hello,
 
 Ceki has recently posted a Tweet stating that both log4j 1 and logback
 performs better than log4j 2 in async mode:
 
 https://twitter.com/ceki/status/1428461637917360131?s=19
 https://github.com/ceki/logback-perf
 
 I don't know much about how async wiring is done under the hood, yet, if
 his claim is true, that is pretty concerning. Would anybody mind sparing
 some time to investigate if the configuration he employs is tuned good
 enough and the results are accurate, please?
 
 Kind regards.
 
>> 
>> 
>> 



Re: Cek's new log4j vs logback benchmark

2021-08-20 Thread Carter Kozak
Benchmarks were using an unpublished version of logback that works differently 
than the release version I tested against -- continuing the conversation there, 
but I'll report back here once dust settles. Rerunning the benchmarks with a 
logback snapshot from source shows that async logback with one logging thread 
outperforms async log4j2 with 1 logging thread, however log4j2 performs better 
with 20 threads. I still need to do a bit of deeper investigation but will be 
busy with work for the next several hours.

On Fri, Aug 20, 2021, at 12:10, Ralph Goers wrote:
> Feel free to respond to his tweet. 
> 
> Ralph
> 
> > On Aug 20, 2021, at 7:15 AM, Carter Kozak  wrote:
> > 
> > Thanks for flagging this! I've responded to the tweet, copying it here as 
> > well for posterity:
> > 
> > Looking at the logback benchmark it appears that no bytes are being written 
> > to target/test-output/logback-async-perf.log. Upon closer inspection the 
> > logback asyncappender is in an started=false state, rejecting all input 
> > events.
> > https://twitter.com/carter_kozak/status/1428721705464238085?s=20
> > 
> > -ck
> > 
> > On Fri, Aug 20, 2021, at 01:13, Volkan Yazıcı wrote:
> >> Hello,
> >> 
> >> Ceki has recently posted a Tweet stating that both log4j 1 and logback
> >> performs better than log4j 2 in async mode:
> >> 
> >> https://twitter.com/ceki/status/1428461637917360131?s=19
> >> https://github.com/ceki/logback-perf
> >> 
> >> I don't know much about how async wiring is done under the hood, yet, if
> >> his claim is true, that is pretty concerning. Would anybody mind sparing
> >> some time to investigate if the configuration he employs is tuned good
> >> enough and the results are accurate, please?
> >> 
> >> Kind regards.
> >> 
> 
> 
> 


Re: Cek's new log4j vs logback benchmark

2021-08-20 Thread Ralph Goers
Feel free to respond to his tweet. 

Ralph

> On Aug 20, 2021, at 7:15 AM, Carter Kozak  wrote:
> 
> Thanks for flagging this! I've responded to the tweet, copying it here as 
> well for posterity:
> 
> Looking at the logback benchmark it appears that no bytes are being written 
> to target/test-output/logback-async-perf.log. Upon closer inspection the 
> logback asyncappender is in an started=false state, rejecting all input 
> events.
> https://twitter.com/carter_kozak/status/1428721705464238085?s=20
> 
> -ck
> 
> On Fri, Aug 20, 2021, at 01:13, Volkan Yazıcı wrote:
>> Hello,
>> 
>> Ceki has recently posted a Tweet stating that both log4j 1 and logback
>> performs better than log4j 2 in async mode:
>> 
>> https://twitter.com/ceki/status/1428461637917360131?s=19
>> https://github.com/ceki/logback-perf
>> 
>> I don't know much about how async wiring is done under the hood, yet, if
>> his claim is true, that is pretty concerning. Would anybody mind sparing
>> some time to investigate if the configuration he employs is tuned good
>> enough and the results are accurate, please?
>> 
>> Kind regards.
>> 




Re: Cek's new log4j vs logback benchmark

2021-08-20 Thread Carter Kozak
Thanks for flagging this! I've responded to the tweet, copying it here as well 
for posterity:

Looking at the logback benchmark it appears that no bytes are being written to 
target/test-output/logback-async-perf.log. Upon closer inspection the logback 
asyncappender is in an started=false state, rejecting all input events.
https://twitter.com/carter_kozak/status/1428721705464238085?s=20

-ck

On Fri, Aug 20, 2021, at 01:13, Volkan Yazıcı wrote:
> Hello,
> 
> Ceki has recently posted a Tweet stating that both log4j 1 and logback
> performs better than log4j 2 in async mode:
> 
> https://twitter.com/ceki/status/1428461637917360131?s=19
> https://github.com/ceki/logback-perf
> 
> I don't know much about how async wiring is done under the hood, yet, if
> his claim is true, that is pretty concerning. Would anybody mind sparing
> some time to investigate if the configuration he employs is tuned good
> enough and the results are accurate, please?
> 
> Kind regards.
> 


Cek's new log4j vs logback benchmark

2021-08-19 Thread Volkan Yazıcı
Hello,

Ceki has recently posted a Tweet stating that both log4j 1 and logback
performs better than log4j 2 in async mode:

https://twitter.com/ceki/status/1428461637917360131?s=19
https://github.com/ceki/logback-perf

I don't know much about how async wiring is done under the hood, yet, if
his claim is true, that is pretty concerning. Would anybody mind sparing
some time to investigate if the configuration he employs is tuned good
enough and the results are accurate, please?

Kind regards.