Re: Question about crossing NUMA boundary

2022-02-28 Thread Jacob Barrett
The NUMA settings really only affect transient, non-shared objects. These are 
objects that the allocating thread is most likely to be the one to access, and 
within a short enough time span that the thread has not been rescheduled onto a 
core in a different NUMA zone. Non-transient and shared objects may still have 
to cross NUMA boundaries because there is no way to know which NUMA zones a 
scheduled thread will need to access.
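
Before comparing runs it helps to confirm which NUMA setting the JVM actually 
picked up. The thread does not name the settings, so this is only a sketch that 
assumes HotSpot's -XX:+UseNUMA flag is one of them. The class name NumaFlagCheck 
is made up for illustration; the HotSpotDiagnosticMXBean call is standard on 
HotSpot JVMs and the code falls back to "unknown" elsewhere.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Geode: reports the effective value of the
// HotSpot UseNUMA flag for the JVM it runs in.
class NumaFlagCheck {

    // Returns "true" or "false" on HotSpot, or "unknown" if the flag or the
    // diagnostic bean is not available on this JVM.
    static String useNumaValue() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption("UseNUMA").getValue();
        } catch (RuntimeException e) {
            // Covers a missing bean (null), an unknown option, or a non-HotSpot JVM.
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseNUMA=" + useNumaValue());
    }
}
```

Running it once with -XX:+UseNUMA and once without should show whether the flag 
took effect on that JVM and GC combination.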

So if anything is going to show the effects, it would be a workload that 
generates lots of transient objects really fast. For that I would look to the 
geode-benchmark server-side get tests. They do local gets on a server without a 
client and can produce lots of transient objects, though not as many as they 
used to. Run them with and without the NUMA settings and see what that yields. 
I recall doing this a while ago for one benchmark, and non-NUMA performance 
wasn’t great compared to NUMA, but I was specifically looking at an issue 
regarding network buffer saturation in the kernel on 72-core machines.
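
For a quick first look before setting up geode-benchmark, a rough stand-in for 
"lots of transient objects really fast" could look like the sketch below. It is 
not one of the geode-benchmark tests: the class TransientAllocBench, the buffer 
size, and the iteration counts are all made up for illustration, and it only 
exercises thread-local allocation, not Geode gets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical micro-benchmark, not part of geode-benchmark.
// Example runs (HotSpot flags; numactl only if installed on the host):
//   java -XX:+UseParallelGC -XX:+UseNUMA TransientAllocBench.java
//   java -XX:+UseParallelGC -XX:-UseNUMA TransientAllocBench.java
//   numactl --cpunodebind=0 --membind=0 java TransientAllocBench.java
class TransientAllocBench {

    // Allocates `iterations` short-lived byte arrays on the calling thread and
    // returns a checksum so the allocations are not trivially dead code.
    static long churn(int iterations, int size) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] buf = new byte[size]; // transient: unreachable after this iteration
            buf[i % size] = 1;
            sum += buf[i % size] + buf.length;
        }
        return sum;
    }

    // Runs churn() on `threads` worker threads and returns elapsed nanoseconds.
    static long run(int threads, int iterations, int size) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(() -> churn(iterations, size)));
            }
            for (Future<Long> f : results) {
                f.get(); // wait for every worker to finish
            }
            return System.nanoTime() - start;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int threads = Runtime.getRuntime().availableProcessors();
        run(threads, 1_000_000, 256); // warm-up pass, result discarded
        long nanos = run(threads, 20_000_000, 256);
        System.out.printf("threads=%d elapsed=%d ms%n", threads, nanos / 1_000_000);
    }
}
```

Treat the numbers from something like this as a sanity check only. Whether they 
say anything about Geode itself would still need the real server-side get tests.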

-Jake

On Feb 28, 2022, at 12:39 PM, Alberto Gomez <alberto.go...@est.tech> wrote:

Hi,

We understand the recommendation is to fit the Geode JVM within one NUMA node 
for optimal performance, so if we're running on a system with multiple NUMA 
nodes and our JVM fits in the memory available on a single NUMA node, it is 
recommended to pin it there ([1]).

However, does anyone have any numbers comparing the performance of the 
same-sized Geode JVM when run on non-NUMA hardware vs. on NUMA hardware where 
the JVM is spread across multiple NUMA nodes?

Have you tried newer JDKs and GCs that have better NUMA awareness, to quantify 
whether the drop in performance could be reduced to acceptable levels?

Thanks!

Alberto

[1]: 
https://geode.apache.org/docs/guide/114/managing/monitor_tune/performance_on_vsphere.html



Question about crossing NUMA boundary

2022-02-28 Thread Alberto Gomez
Hi,

We understand the recommendation is to fit the Geode JVM within one NUMA node 
for optimal performance, so if we're running on a system with multiple NUMA 
nodes and our JVM fits in the memory available on a single NUMA node, it is 
recommended to pin it there ([1]).

However, does anyone have any numbers comparing the performance of the 
same-sized Geode JVM when run on non-NUMA hardware vs. on NUMA hardware where 
the JVM is spread across multiple NUMA nodes?

Have you tried newer JDKs and GCs that have better NUMA awareness, to quantify 
whether the drop in performance could be reduced to acceptable levels?

Thanks!

Alberto

[1]: 
https://geode.apache.org/docs/guide/114/managing/monitor_tune/performance_on_vsphere.html