Re: JVM random performance

2017-08-01 Thread Gil Tene
Add -XX:+PrintGCTimeStamps. Also, run under time so we can see the total run
time...
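
For example, something like this, assuming the bench is launched with a plain
java command (the classpath and main class below are placeholders, not the
actual launch line):

  time java -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
      -cp <benchmark-classpath> <benchmark-main-class>

The time prefix gives the total wall-clock/user/sys time for the whole run,
and the GC lines get timestamps relative to JVM start so they can be lined up
against it.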




Re: JVM random performance

2017-08-01 Thread Roger Alsing
Does this tell anyone anything?
https://gist.github.com/rogeralsing/1e814f80321378ee132fa34aae77ef6d
https://gist.github.com/rogeralsing/85ce3feb409eb7710f713b184129cc0b

This is beyond my understanding of the JVM.

PS: no multi-socket or NUMA.

Regards
Roger





Re: JVM random performance

2017-08-01 Thread Kirk Pepperdine
Hi,

From my observations there appear to be some race conditions in the HotSpot
compilations that can affect hot/cold path decisions during warmup. If the
race falls in your favor, all is well; if not... Also, the memory layout of the
JVM will have some impact on what optimizations are applied: if you're in low
RAM you'll get different optimizations than if you're running in high RAM. I'd
suggest you run your benches in a highly controlled environment to start with,
and afterwards you can experiment to understand what environmental conditions
your bench may be sensitive to.
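
One cheap way to check for that (just a suggestion, I haven't tried it against
this particular bench; the classpath and main class below are placeholders) is
to capture the JIT's decisions on a fast run and on a slow run and diff the
logs:

  java -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining \
      -cp <benchmark-classpath> <benchmark-main-class> > run.log

If the 250M and 350M runs show different compilation or inlining decisions on
the hot path, that points at the warmup race rather than at GC.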

Kind regards,
Kirk





Re: JVM random performance

2017-08-01 Thread Georges Gomes
Are you benchmarking on a multi-socket/NUMA server?




Re: JVM random performance

2017-08-01 Thread Wojciech Kudla
It definitely makes sense to have a look at GC activity, but I would
suggest looking at safepoints from a broader perspective. Just use
-XX:+PrintGCApplicationStoppedTime to see what's going on. If it's
safepoints, you could get more details with safepoint statistics.
Also, benchmark runs in Java may appear nondeterministic simply because
compilation happens in background threads by default, and some runs may
exhibit a different runtime profile since the compilation threads receive
their time slices at different moments during the benchmark.
Are the results also jittery when run entirely in interpreted mode? It may
be worth experimenting with various compilation settings (i.e. disable
tiered compilation, employ different warmup strategies, play around with
compiler control).
Are you employing any sort of thread-to-CPU affinity?
Are you running on a multi-socket setup?
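
For example, on a JDK 8 HotSpot something along these lines (the classpath and
main class are placeholders for however the bench is normally launched):

  java -XX:+PrintGCApplicationStoppedTime \
      -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 \
      -cp <benchmark-classpath> <benchmark-main-class>

and, for the compilation experiments, separate runs with -Xint (pure
interpreter) and with -XX:-TieredCompilation, to see whether the bimodal
behaviour survives each of them.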




JVM random performance

2017-08-01 Thread Roger Alsing
Some context: I'm building an actor framework, similar to Akka but
polyglot/cross-platform.
For each platform we have the same benchmarks, one of which is an in-process
ping-pong benchmark.

On .NET and Go, we can spin up pairs of ping-pong actors equal to the
number of cores in the CPU, and no matter how many more pairs we spin up, the
total throughput remains roughly the same.
But on the JVM, if we do this, I can see that we max out at 100% CPU, as
expected; yet if I instead spin up a lot more pairs, e.g. 20 * core_count,
the total throughput triples.
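
In pseudo-Kotlin, the setup is roughly this shape (a simplified sketch rather
than the actual repo code; spawnPingPongPair is a hypothetical stand-in for the
real proto.actor spawn calls):

  import java.util.concurrent.CountDownLatch

  // Hypothetical stand-in: the real code spawns a ping actor and a pong actor
  // that exchange `messages` messages via their mailboxes, then signal completion.
  fun spawnPingPongPair(messages: Int, done: CountDownLatch) {
      done.countDown()
  }

  fun main() {
      val pairs = 20 * Runtime.getRuntime().availableProcessors() // vs. 1 * core count
      val done = CountDownLatch(pairs)
      val start = System.nanoTime()
      repeat(pairs) { spawnPingPongPair(1_000_000, done) }
      done.await()
      println("took ${(System.nanoTime() - start) / 1e9} s for $pairs pairs")
  }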

I suspect this is because the system runs in a more steady-state fashion in
the latter case: mailboxes are never completely drained, so actors don't have
to switch between processing and idle.
Would this be fair to assume?
This is why I believe this is a question for this specific forum.

Now to the real question: roughly 60-40 of the times the benchmark is started,
it runs steadily at 250 million msg/sec, and the other times it runs at 350
million msg/sec.
What I find strange is that each run is stable over time: if I don't stop the
benchmark, it will continue at the same pace.

If anyone is bored and would like to try it out, the repo is here:
https://github.com/AsynkronIT/protoactor-kotlin
and the actual benchmark 
here: 
https://github.com/AsynkronIT/protoactor-kotlin/blob/master/examples/src/main/kotlin/actor/proto/examples/inprocessbenchmark/InProcessBenchmark.kt

This behavior is consistent with or without various VM arguments.

I'm very interested to hear if anyone has any theories about what could cause
this behavior.

One factor that seems to be involved is GC, but not in the obvious way;
rather the reverse.
In the beginning, when the framework allocated more memory, it more often
ran at the high speed.
And the fewer allocations I've managed to make without touching the hot path,
the more the benchmark has started to toggle between these two numbers.

Thoughts?
