[ https://issues.apache.org/jira/browse/LOG4J2-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077425#comment-14077425 ]

Remko Popma edited comment on LOG4J2-744 at 7/29/14 6:19 AM:
-------------------------------------------------------------

Thanks for providing the microbenchmarks. There is a lot there, and I'm still 
unpacking it, experimenting with various things, but so far this is what I've 
found:

*Clock performance varies wildly across systems*

I added some simplified benchmarks to focus on the clocks:
{code}
// Share one clock instance so the benchmark measures the call, not allocation.
static final Clock systemClock = new SystemClock();

// Baseline: the raw JDK call.
@GenerateMicroBenchmark
public long testSystemCurrentTimeMillis() {
    return System.currentTimeMillis();
}

// The same call routed through the Clock interface.
@GenerateMicroBenchmark
public long testSystemClock() {
    return systemClock.currentTimeMillis();
}
...
{code}
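
(For reference, {{SystemClock}} is essentially just a thin wrapper, so it
should cost no more than the static call plus an interface dispatch. A minimal
sketch, assuming only that {{Clock}} declares a single {{currentTimeMillis()}}
method:)
{code}
// The Clock abstraction and its trivial system-backed implementation.
public interface Clock {
    long currentTimeMillis();
}

public final class SystemClock implements Clock {
    // Delegates straight to the JDK call; the only extra cost is the
    // interface dispatch, which the JIT can usually inline away.
    @Override
    public long currentTimeMillis() {
        return System.currentTimeMillis();
    }
}
{code}
The Windows numbers below bear that out: {{testSystemClock}} and
{{testSystemCurrentTimeMillis}} are indistinguishable within the error bars.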

Java 1.7.0_55 64-bit, Windows 7 Enterprise, Intel i5-2500 dual CPU 3.30GHz with
hyperthreading on (4 virtual cores):
{code}
java -jar target/microbenchmarks.jar ".*ClockTest.*" -f 1 -wi 5 -i 5 -tu ns -bm sample -t 4

Benchmark                                  Mode    Samples      Mean  Mean error  Units
b.ClockTest.testBaseline                   sample   205018     9.892       0.399  ns/op
b.ClockTest.testCachedClock                sample   237702   147.085       1.211  ns/op
b.ClockTest.testCoarseCachedClock          sample   235897    10.037       0.374  ns/op
b.ClockTest.testSystemCurrentTimeMillis    sample   208701    14.626       1.074  ns/op
b.ClockTest.testFakeClock                  sample   235691     9.520       0.371  ns/op
b.ClockTest.testSystemClock                sample   239811    13.963       0.440  ns/op
{code}

Unix:
Java 1.7.0_05 64-bit, RHEL 4.4.7-4 (Linux 2.6.32-431.el6.x86_64), Intel Xeon 4
quad-core [email protected] CPUs (16 virtual cores):
{code}
java -jar target/microbenchmarks.jar ".*ClockTest.*" -f 1 -wi 5 -i 5 -tu ns -bm sample -t 4

Benchmark                                  Mode    Samples      Mean  Mean error  Units
b.ClockTest.testBaseline                   sample   212362   729.093      10.132  ns/op
b.ClockTest.testCachedClock                sample   152722   890.688       7.055  ns/op
b.ClockTest.testCoarseCachedClock          sample   336110   632.849       2.585  ns/op
b.ClockTest.testSystemCurrentTimeMillis    sample   258054  3174.306       9.986  ns/op
b.ClockTest.testFakeClock                  sample   336549  1038.334       5.581  ns/op
b.ClockTest.testSystemClock                sample   271259  3881.617      15.908  ns/op
{code}

Note the *massive* difference for {{System.currentTimeMillis()}}: 14 nanos on
Windows, 3.1 _*micros*_ on Linux. That's roughly 220x (3174.306 / 14.626 ≈ 217)!

I agree, it looks like CachedClock performs very badly if there is contention 
on the volatile field. (It does perform well on the Linux box where there are 
enough cores to spread the load, but on the Windows box, brr...)
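
For context, the cached-clock idea is roughly the following. A minimal sketch,
not the actual Log4j {{CachedClock}} (which is more elaborate), but the
volatile field is the shared point of contention either way:
{code}
// Minimal sketch of a cached clock: one updater thread writes a volatile
// field about once per millisecond; readers only load that field.
public final class SketchCachedClock implements Clock {

    // Every reader loads this field; every write forces other cores to
    // re-fetch the cache line it lives on.
    private volatile long millis = System.currentTimeMillis();

    public SketchCachedClock() {
        final Thread updater = new Thread(new Runnable() {
            @Override
            public void run() {
                while (true) {
                    millis = System.currentTimeMillis();
                    try {
                        Thread.sleep(1); // refresh roughly every millisecond
                    } catch (final InterruptedException ignored) {
                        return; // stop quietly when interrupted
                    }
                }
            }
        }, "sketch-clock-updater");
        updater.setDaemon(true); // don't keep the JVM alive
        updater.start();
    }

    @Override
    public long currentTimeMillis() {
        return millis; // volatile read; no system call on the hot path
    }
}
{code}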

*To instanceof or not, that is the question*

First, I am assuming that the vast majority of users will log a mixture of 
SimpleMessages and ParameterizedMessages, and that TimestampMessages are a 
relatively rare use case.
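
(For readers following along: a TimestampMessage simply carries its own event
time, so the log event doesn't need to consult a Clock at all. A minimal
sketch of such a message, with illustrative names, assuming only that the
interface exposes {{getTimestamp()}}:)
{code}
import org.apache.logging.log4j.message.Message;
import org.apache.logging.log4j.message.TimestampMessage;

// Sketch: a message whose event time was captured at the source,
// e.g. when replaying events that happened earlier.
public final class SketchTimestampedMessage implements Message, TimestampMessage {
    private final long timestamp;
    private final String text;

    public SketchTimestampedMessage(final long timestamp, final String text) {
        this.timestamp = timestamp;
        this.text = text;
    }

    @Override
    public long getTimestamp() { return timestamp; } // no Clock call needed

    @Override
    public String getFormattedMessage() { return text; }

    @Override
    public String getFormat() { return text; }

    @Override
    public Object[] getParameters() { return null; }

    @Override
    public Throwable getThrowable() { return null; }
}
{code}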

Here are my measurements:

Java 1.7.0_55 64-bit, Windows 7 Enterprise, Intel i5-2500 dual CPU 3.30GHz with
hyperthreading on (4 virtual cores):
{code}
java -jar target/microbenchmarks.jar ".*ClockBench.*" -f 1 -wi 5 -i 5 -tu ns -bm sample -t 4

Benchmark                                                     Mode    Samples     Mean  Mean error  Units
b.ClockBenchmark.testSystemClockSimpleMessageInstanceOf       sample   244944  105.317       6.183  ns/op
b.ClockBenchmark.testSystemClockSimpleMessageUseClock         sample   284839   86.625       2.427  ns/op
b.ClockBenchmark.testSystemClockTimestampMessageInstanceOf    sample   250178   61.812       6.151  ns/op
{code}

On Windows, my results seem to indicate that the overhead of {{msg instanceof 
TimestampMessage ? ((TimestampMessage) msg).getTimestamp() : 
clock.currentTimeMillis()}} is about 20 nanos. On the other hand, 
{{TimestampMessage.getTimestamp()}} is about 20 nanos faster than 
{{SystemClock.currentTimeMillis()}}.
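
Concretely, the two SimpleMessage variants measure code shapes like this (a
sketch, not the exact benchmark source; {{simpleMessage}} and {{systemClock}}
are assumed to be fields of the benchmark class):
{code}
// Variant 1: always ask the clock.
@GenerateMicroBenchmark
public long simpleMessageUseClock() {
    return systemClock.currentTimeMillis();
}

// Variant 2: type-check first. For a SimpleMessage the check fails, so
// this pays for instanceof + branch *on top of* the clock call.
@GenerateMicroBenchmark
public long simpleMessageInstanceOf() {
    final Message msg = simpleMessage;
    return msg instanceof TimestampMessage
            ? ((TimestampMessage) msg).getTimestamp()
            : systemClock.currentTimeMillis();
}
{code}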

So just looking at my Windows results, I would not want to make logging 20 
nanos slower for the majority of users in order to make logging 20 nanos faster 
for the fraction who log TimestampMessages.

However, the Linux results show a very different picture. The Linux benchmark 
results were _very_ noisy so I changed to 20 iterations, but results easily 
vary by several hundred nanoseconds between runs...

Java 1.7.0_05 64-bit, RHEL 4.4.7-4 (Linux 2.6.32-431.el6.x86_64), Intel Xeon 4
quad-core [email protected] CPUs (16 virtual cores):
{code}
java -jar target/microbenchmarks.jar ".*ClockBench.*" -f 1 -wi 5 -i 5 -tu ns -bm sample -t 4

Benchmark                                                     Mode    Samples      Mean  Mean error  Units
b.ClockBenchmark.testSystemClockSimpleMessageInstanceOf       sample   964501  3596.031      23.931  ns/op
b.ClockBenchmark.testSystemClockSimpleMessageUseClock         sample   777443  3207.301      28.671  ns/op
b.ClockBenchmark.testSystemClockTimestampMessageInstanceOf    sample   959701   720.979      12.551  ns/op
{code}

On the Linux box I tested on, the cost of calling {{System.currentTimeMillis()}}
completely dominates the cost of creating a log event. This is painful for
TimestampMessages, which don't need the clock at all. It looks like the cost of
{{instanceof + conditional}} is ~400 nanos, but in other tests I got the
opposite result, so I'm putting this down to noise.

Even if the results are noisy, there is clearly a 4-5x(!) improvement to be had
for TimestampMessages by avoiding calls to {{System.currentTimeMillis()}}
(720.979 vs. 3596.031 ns/op). The overhead of {{instanceof + conditional}} is
hard to determine.

So, to conclude, I'll put in the instanceof check.
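
For the record, the shape of that check in the event constructor would be
roughly the following (a sketch with illustrative names, not the actual
patch):
{code}
// Sketch: resolve the event timestamp once, when the event is created.
// A TimestampMessage supplies its own time; everything else asks the Clock.
public class SketchLogEvent {
    private final Message message;
    private final long timeMillis;

    public SketchLogEvent(final Message message, final Clock clock) {
        this.message = message;
        this.timeMillis = message instanceof TimestampMessage
                ? ((TimestampMessage) message).getTimestamp()
                : clock.currentTimeMillis();
    }

    public long getTimeMillis() {
        return timeMillis;
    }
}
{code}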



> Avoid unnecessary Clock calls when TimestampMessage is logged
> -------------------------------------------------------------
>
>                 Key: LOG4J2-744
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-744
>             Project: Log4j 2
>          Issue Type: Improvement
>            Reporter: Scott Harrington
>            Priority: Minor
>         Attachments: ClockBenchmark-jdk1.6.0_31.txt, 
> ClockBenchmark-jdk1.7.0_45.txt, ClockBenchmark.java, LOG4J2-744-test.patch, 
> LOG4J2-744.patch
>
>
> The TimestampMessage interface was introduced in LOG4J2-53 and revised for 
> AsyncLogger in LOG4J2-455.
> I've observed that Clock.currentTimeMillis is still called, which should not 
> be necessary.
> I have two patches, one which adds JUnit tests that demonstrate the 
> unnecessary Clock calls, and one which fixes the issue for both AsyncLogger 
> and "traditional" configurations.


