[
https://issues.apache.org/jira/browse/LOG4J2-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077425#comment-14077425
]
Remko Popma edited comment on LOG4J2-744 at 7/29/14 6:19 AM:
-------------------------------------------------------------
Thanks for providing the microbenchmarks. There is a lot there, and I'm still
unpacking it, experimenting with various things, but so far this is what I've
found:
*Clock performance varies wildly across systems*
I added some simplified benchmarks to focus on the clocks:
{code}
static final Clock systemClock = new SystemClock();

@GenerateMicroBenchmark
public long testSystemCurrentTimeMillis() {
    return System.currentTimeMillis();
}

@GenerateMicroBenchmark
public long testSystemClock() {
    return systemClock.currentTimeMillis();
}
...
{code}
java 1.7.0_55 64 bit, Windows 7 Enterprise, Intel i5-2500 dual CPU 3.30GHz with
hyperthreading on (4 virtual cores):
{code}
java -jar target/microbenchmarks.jar ".*ClockTest.*" -f 1 -wi 5 -i 5 -tu ns -bm sample -t 4

Benchmark                                 Mode    Samples     Mean  Mean error  Units
b.ClockTest.testBaseline                  sample   205018    9.892       0.399  ns/op
b.ClockTest.testCachedClock               sample   237702  147.085       1.211  ns/op
b.ClockTest.testCoarseCachedClock         sample   235897   10.037       0.374  ns/op
b.ClockTest.testSystemCurrentTimeMillis   sample   208701   14.626       1.074  ns/op
b.ClockTest.testFakeClock                 sample   235691    9.520       0.371  ns/op
b.ClockTest.testSystemClock               sample   239811   13.963       0.440  ns/op
{code}
Unix:
java 1.7.0_05 64 bit, RHEL 4.4.7-4 (Linux 2.6.32-431.el6.x86_64), Intel Xeon 4
quad core [email protected] CPUs (16 virtual cores):
{code}
java -jar target/microbenchmarks.jar ".*ClockTest.*" -f 1 -wi 5 -i 5 -tu ns -bm sample -t 4

Benchmark                                 Mode    Samples      Mean  Mean error  Units
b.ClockTest.testBaseline                  sample   212362   729.093      10.132  ns/op
b.ClockTest.testCachedClock               sample   152722   890.688       7.055  ns/op
b.ClockTest.testCoarseCachedClock         sample   336110   632.849       2.585  ns/op
b.ClockTest.testSystemCurrentTimeMillis   sample   258054  3174.306       9.986  ns/op
b.ClockTest.testFakeClock                 sample   336549  1038.334       5.581  ns/op
b.ClockTest.testSystemClock               sample   271259  3881.617      15.908  ns/op
{code}
Note the *massive* difference for {{System.currentTimeMillis()}}: about 14 nanos on
Windows versus 3.1 _*micros*_ on Linux. That's roughly a 220x difference!
I agree, it looks like CachedClock performs very badly if there is contention
on the volatile field. (It does perform well on the Linux box where there are
enough cores to spread the load, but on the Windows box, brr...)
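To make the contention point concrete: a cached clock boils down to roughly the sketch below. This is an illustration only (the class name, refresh interval, and update threshold are assumptions, not the actual CachedClock code); what matters is that many logging threads end up writing the same volatile field, so the cores fight over its cache line.
{code}
import java.util.concurrent.locks.LockSupport;

import org.apache.logging.log4j.core.util.Clock; // log4j-core's Clock; package may differ across 2.x versions

// Illustration only -- not the actual CachedClock source. A background thread
// refreshes a shared volatile timestamp, and callers occasionally refresh it
// themselves. Under heavy multi-threaded logging those writes all land on the
// same field, which is where the contention comes from.
public final class ContendedCachedClock implements Clock {

    private volatile long millis = System.currentTimeMillis();
    private short count = 0; // racy by design; the exact refresh cadence does not matter

    public ContendedCachedClock() {
        final Thread updater = new Thread(new Runnable() {
            @Override
            public void run() {
                while (true) {
                    millis = System.currentTimeMillis();
                    LockSupport.parkNanos(1000 * 1000); // assumed ~1ms refresh interval
                }
            }
        }, "ContendedCachedClock-updater");
        updater.setDaemon(true);
        updater.start();
    }

    @Override
    public long currentTimeMillis() {
        // Callers help keep the value fresh: every ~1024 calls they write the
        // shared volatile field themselves -- this is the contended write.
        if ((++count & 0x3FF) == 0) {
            millis = System.currentTimeMillis();
        }
        return millis;
    }
}
{code}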
*To instanceof or not, that is the question*
First, I am assuming that the vast majority of users will log a mixture of
SimpleMessages and ParameterizedMessages, and that TimestampMessages are a
relatively rare use case.
Here are my measurements:
java 1.7.0_55 64 bit, Windows 7 Enterprise, Intel i5-2500 dual CPU 3.30GHz with
hyperthreading on (4 virtual cores):
{code}
java -jar target/microbenchmarks.jar ".*ClockBench.*" -f 1 -wi 5 -i 5 -tu ns -bm sample -t 4

Benchmark                                                    Mode    Samples     Mean  Mean error  Units
b.ClockBenchmark.testSystemClockSimpleMessageInstanceOf      sample   244944  105.317       6.183  ns/op
b.ClockBenchmark.testSystemClockSimpleMessageUseClock        sample   284839   86.625       2.427  ns/op
b.ClockBenchmark.testSystemClockTimestampMessageInstanceOf   sample   250178   61.812       6.151  ns/op
{code}
On Windows, my results seem to indicate that the overhead of {{msg instanceof
TimestampMessage ? ((TimestampMessage) msg).getTimestamp() :
clock.currentTimeMillis()}} is about 20 nanos. On the other hand,
{{TimestampMessage.getTimestamp()}} is about 20 nanos faster than
{{SystemClock.currentTimeMillis()}}.
So just looking at my Windows results, I would not want to make logging 20
nanos slower for the majority of users in order to make logging 20 nanos faster
for the fraction who log TimestampMessages.
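(For reference, the expression being measured amounts to a small helper along the lines of the sketch below; the helper and its surroundings are hypothetical, not the actual Log4jLogEvent code.)
{code}
import org.apache.logging.log4j.core.util.Clock; // log4j-core's Clock; package may differ across 2.x versions
import org.apache.logging.log4j.message.Message;
import org.apache.logging.log4j.message.TimestampMessage;

// Hypothetical helper showing where the check would sit: use the message's own
// timestamp when it carries one, otherwise fall back to the configured Clock.
final class EventTimestamps {

    private EventTimestamps() {
    }

    static long resolve(final Message msg, final Clock clock) {
        return msg instanceof TimestampMessage
                ? ((TimestampMessage) msg).getTimestamp()
                : clock.currentTimeMillis();
    }
}
{code}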
However, the Linux results show a very different picture. The Linux benchmark
results were _very_ noisy so I changed to 20 iterations, but results easily
vary by several hundred nanoseconds between runs...
java 1.7.0_05 64 bit, RHEL 4.4.7-4 (Linux 2.6.32-431.el6.x86_64), Intel Xeon 4
quad core [email protected] CPUs (16 virtual cores):
{code}
java -jar target/microbenchmarks.jar ".*ClockBench.*" -f 1 -wi 5 -i 5 -tu ns -bm sample -t 4

Benchmark                                                    Mode    Samples      Mean  Mean error  Units
b.ClockBenchmark.testSystemClockSimpleMessageInstanceOf      sample   964501  3596.031      23.931  ns/op
b.ClockBenchmark.testSystemClockSimpleMessageUseClock        sample   777443  3207.301      28.671  ns/op
b.ClockBenchmark.testSystemClockTimestampMessageInstanceOf   sample   959701   720.979      12.551  ns/op
{code}
On the Linux box I tested on, the cost of calling {{System.currentTimeMillis()}}
completely dominates the cost of creating a log event. This is painful for
TimestampMessages, which don't need the clock call at all. It looks like the cost
of {{instanceof + conditional}} is ~400 nanos, but in other tests I got the
opposite result, so I'm putting this down to noise.
Even if the results are noisy, clearly there is a 4-5x(!) improvement to be had
by avoiding calls to System.currentTimeMillis(). The overhead of {{instanceof +
conditional}} is hard to determine.
So, to conclude, I'll put in the instanceof check.
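For context, the kind of message that benefits from this is one that already carries its own event time, roughly like the hypothetical example below (not an actual log4j class; the name and fields are made up):
{code}
import org.apache.logging.log4j.message.Message;
import org.apache.logging.log4j.message.TimestampMessage;

// Hypothetical message carrying its own event time, e.g. an event replayed from
// an external feed. With the instanceof check in place, logging it uses
// getTimestamp() instead of asking the Clock for the current time.
public class ReplayedEventMessage implements Message, TimestampMessage {

    private final long eventTimeMillis;
    private final String text;

    public ReplayedEventMessage(final long eventTimeMillis, final String text) {
        this.eventTimeMillis = eventTimeMillis;
        this.text = text;
    }

    @Override
    public long getTimestamp() {
        return eventTimeMillis; // the timestamp the log event should record
    }

    @Override
    public String getFormattedMessage() {
        return text;
    }

    @Override
    public String getFormat() {
        return text;
    }

    @Override
    public Object[] getParameters() {
        return null;
    }

    @Override
    public Throwable getThrowable() {
        return null;
    }
}
{code}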
> Avoid unnecessary Clock calls when TimestampMessage is logged
> -------------------------------------------------------------
>
> Key: LOG4J2-744
> URL: https://issues.apache.org/jira/browse/LOG4J2-744
> Project: Log4j 2
> Issue Type: Improvement
> Reporter: Scott Harrington
> Priority: Minor
> Attachments: ClockBenchmark-jdk1.6.0_31.txt,
> ClockBenchmark-jdk1.7.0_45.txt, ClockBenchmark.java, LOG4J2-744-test.patch,
> LOG4J2-744.patch
>
>
> The TimestampMessage interface was introduced in LOG4J2-53 and revised for
> AsyncLogger in LOG4J2-455.
> I've observed that Clock.currentTimeMillis is still called, which should not
> be necessary.
> I have two patches, one which adds JUnit tests that demonstrate the
> unnecessary Clock calls, and one which fixes the issue for both AsyncLogger
> and "traditional" configurations.