The NUMA settings really only affect transient / non-shared objects. These are
objects where the thread that allocated them is most likely to be the one to
access them, and within a short enough time span that the thread is not
rescheduled onto a core in a different NUMA zone. Non-transient and shared
objects may have to cross NUMA boundaries because there is no way to know which
NUMA zones a scheduled thread will need to access.
So if anything is going to show the effects, it would be something that
generates lots of transient objects very quickly. For that I would look to the
geode-benchmarks get server-side tests. They do local gets on a server without
a client and can produce lots of transient objects, though not as many as they
used to. Run with and without the NUMA settings and see what it yields. I
recall doing this a while ago for one benchmark; non-NUMA performance wasn't
great compared to NUMA, but I was specifically looking at an issue regarding
network buffer saturation in the kernel on 72-core machines.
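A minimal way to run that comparison, assuming numactl and numastat are
installed and using a placeholder jar name (the actual geode-benchmarks
invocation differs), would be:

```shell
# Baseline: let the scheduler place threads and memory across nodes freely.
java -jar benchmark.jar

# Pinned: bind both CPU scheduling and memory allocation to node 0, so
# transient objects stay local to the threads that allocate them.
numactl --cpunodebind=0 --membind=0 java -jar benchmark.jar

# While a run is active, numastat shows how much of the process's memory
# landed on each node (allocations off node 0 indicate cross-node traffic).
numastat -p <jvm-pid>
```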
-Jake
On Feb 28, 2022, at 12:39 PM, Alberto Gomez <alberto.go...@est.tech> wrote:
Hi,
We understand the recommendation is to fit the Geode JVM within one NUMA node
for optimal performance, so if we're running on a system with multiple NUMA
nodes and our JVM can fit in the memory available on a single node, it is
recommended to pin it there ([1]).
However, does anyone have numbers comparing the performance of the same-sized
Geode JVM run on non-NUMA hardware vs. run on NUMA hardware where the JVM is
spread across multiple NUMA nodes?
Have you played with newer JDKs and GCs that have better NUMA awareness, to
quantify whether the drop in performance could be reduced to acceptable levels?
Thanks!
Alberto
[1]:
https://geode.apache.org/docs/guide/114/managing/monitor_tune/performance_on_vsphere.html