On 28/07/2021 12:17 am, Thomas Stuefe wrote:
On Mon, 26 Jul 2021 21:08:04 GMT, David Holmes <david.hol...@oracle.com> wrote:

Before looking at this, have you checked the startup performance impact?

Thanks,
David
-----

Hi David,

performance should not be a problem. The potentially costly part is the 
underlying hashmap, but we keep it operating at a very small load factor.
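
To illustrate what I mean by that, here is a sketch in the spirit of the real 
thing (made-up names, not the actual NMT code): a fixed-size, chained table 
keyed by the malloc'ed pointer, with far more buckets than expected entries, 
so chains stay at length 0 or 1:

```c++
// Minimal sketch (not the actual NMT code): a fixed-size, chained hash
// table keyed by the malloc'ed pointer. With ~32K buckets and an expected
// population of a few hundred entries, the load factor stays tiny and
// chains are almost always of length 0 or 1.
#include <cstddef>
#include <cstdint>
#include <cstdlib>

class PreinitLookupTable {          // hypothetical name
  static const size_t num_buckets = 32 * 1024;

  struct Node {
    const void* ptr;                // the malloc'ed address (the key)
    size_t      size;               // tracked allocation size (the payload)
    Node*       next;
  };

  Node* _buckets[num_buckets];

  static size_t index_for(const void* p) {
    // Drop the low alignment bits, then fold into the bucket range.
    uintptr_t v = (uintptr_t)p >> 4;
    return (size_t)(v % num_buckets);
  }

public:
  PreinitLookupTable() {
    for (size_t i = 0; i < num_buckets; i++) _buckets[i] = nullptr;
  }

  // O(1): prepend to the bucket's chain.
  void add(const void* p, size_t size) {
    Node* n = (Node*)::malloc(sizeof(Node));
    n->ptr = p;
    n->size = size;
    size_t i = index_for(p);
    n->next = _buckets[i];
    _buckets[i] = n;
  }

  // O(chain length): with a tiny load factor this is effectively O(1).
  bool lookup(const void* p, size_t* out_size) const {
    for (Node* n = _buckets[index_for(p)]; n != nullptr; n = n->next) {
      if (n->ptr == p) { *out_size = n->size; return true; }
    }
    return false;
  }
};
```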

More details:

Adding entries is O(1). Since during the pre-init phase almost only adds happen, 
startup time is not affected. Still, to make sure this is true, I did a bunch 
of tests:

- tested wall-clock time of a HelloWorld: no difference with and without the patch
- tested startup time in various ways: no differences
- repeated those tests with 25000 (!) VM arguments, which is the only way to 
influence the number of pre-init allocations: no difference (the VM gets slower, 
but equally so with and without the patch).

----

The expensive thing is lookup since we potentially need to walk a very full 
hashmap. Lookup affects post-init more than pre-init.

To get an idea of the cost of an overly full pre-init lookup table, I modified 
the VM to do a configurable number of pre-init test allocations, with the intent 
of artificially inflating the lookup table. Then, after NMT initialization, I 
measured the cost of lookup. The short story: I was not able to measure any 
difference, even with a million pre-init allocations. Of course, with more 
allocations the lookup table got fuller and the VM got slower, but the time 
increase was caused by the cost of the malloc calls themselves, not by the table 
lookup.

Finally, I did an isolated test of the lookup table itself, measuring pure add 
and retrieval cost with artificial values. There, I could see that add cost was 
constant (as expected), while lookup cost increased with table population. On my 
machine:

| lookup table entries | time per lookup |
| -------------------- | --------------- |
| 1000                 | 3 ns            |
| 1 million            | 240 ns          |

As you can see, once the lookup table population goes beyond about 1 million 
entries, lookup time starts to be noticeable over background noise. But with 
these numbers I am not worried. Standard lookup table population should be 
around *300-500*, with very long command lines resulting in table populations 
of *~1000*. We should never see 10000 entries, let alone millions of them.
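
For reference, the isolated test was conceptually no more than the following 
(a rough reconstruction, reusing the `PreinitLookupTable` sketch from above; 
the numbers in the table come from my real test, not from this sketch):

```c++
// Rough reconstruction of the isolated benchmark: fill the table with N
// artificial entries, then time a fixed number of lookups over those keys.
// Assumes the PreinitLookupTable sketch shown earlier in this mail.
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

static void benchmark_lookup(size_t population) {
  PreinitLookupTable* table = new PreinitLookupTable();
  std::vector<const void*> keys;
  keys.reserve(population);

  // Use distinct fake addresses as keys; the stored values do not matter.
  for (size_t i = 0; i < population; i++) {
    const void* p = (const void*)(uintptr_t)((i + 1) * 64);
    table->add(p, i);
    keys.push_back(p);
  }

  const size_t num_lookups = 1000 * 1000;
  size_t dummy = 0;
  auto t0 = std::chrono::steady_clock::now();
  for (size_t i = 0; i < num_lookups; i++) {
    size_t sz;
    if (table->lookup(keys[i % population], &sz)) dummy += sz;
  }
  auto t1 = std::chrono::steady_clock::now();

  double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
  printf("population %zu: %.1f ns/lookup (checksum %zu)\n",
         population, ns / num_lookups, dummy);
  delete table;
}
```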

Still, I added a jtreg test to verify the expected hash table population, to 
catch errors like an unforeseen mass of pre-init allocations (say, a leak or 
badly written code sneaking in), or a hash algorithm that suddenly stops 
distributing well.

Two more points:

1) I kept this code deliberately simple. If we are really worried about a 
degenerate lookup table, there are things we could do about that (see the 
sketch after this list):
  - we could automatically resize and rehash
  - we could, if we sense something going wrong, just stop filling the table and 
disable NMT, ending the NMT init phase prematurely at the cost of not being able 
to use NMT.
I had the latter implemented already but removed it again to keep complexity 
down, and because I saw no need for it.
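
The first option would be cheap to implement, since entries only carry a pointer 
key and a small payload; a sketch of what resize-and-rehash could look like 
(again using the made-up node layout from the sketch above, not the actual code):

```c++
// Sketch of automatic resize-and-rehash for a chained hash table (same
// node layout as in the earlier sketch). All nodes are re-linked into a
// bucket array of a new size; no entry data is copied.
#include <cstddef>
#include <cstdint>

struct Node {               // same layout as in the earlier sketch
  const void* ptr;
  size_t      size;
  Node*       next;
};

void rehash_into(Node** old_buckets, size_t old_count,
                 Node** new_buckets, size_t new_count) {
  for (size_t i = 0; i < new_count; i++) new_buckets[i] = nullptr;
  for (size_t i = 0; i < old_count; i++) {
    Node* n = old_buckets[i];
    while (n != nullptr) {
      Node* next = n->next;
      // Same hash function as before, just taken modulo the new bucket count.
      size_t j = (size_t)(((uintptr_t)n->ptr >> 4) % new_count);
      n->next = new_buckets[j];
      new_buckets[j] = n;
      n = next;
    }
  }
}
```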

2) In our proprietary production VM we have a system similar to NMT, but 
predating it. In that system we don't use malloc headers; instead we store all 
(millions of) malloc'ed pointers in a big hash map. It performs excellently on 
*all our libc variants*. It is so fast that we just leave it switched on all the 
time. This solution has been in production for more than 10 years, so I am 
confident the approach is viable. The proposed hashmap here, with a planned 
population of 300-1000, is really not much :)
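
In sketch form, header-less tracking boils down to hooking malloc/free and 
keeping sizes in a pointer-keyed map; here with std::unordered_map and a mutex 
purely to show the flow, which is not how our production VM implements it:

```c++
// Sketch only: header-less allocation tracking via a pointer-keyed map.
// Our production system uses its own hash map, not std::unordered_map;
// this just shows the malloc/free flow when no header is prepended.
#include <cstdlib>
#include <mutex>
#include <unordered_map>

static std::unordered_map<void*, size_t> g_live;   // ptr -> size
static std::mutex g_lock;
static size_t g_total = 0;                          // total live bytes

void* tracked_malloc(size_t size) {
  void* p = ::malloc(size);                         // no extra header bytes
  if (p != nullptr) {
    std::lock_guard<std::mutex> guard(g_lock);
    g_live[p] = size;
    g_total += size;
  }
  return p;
}

void tracked_free(void* p) {
  if (p == nullptr) return;
  {
    std::lock_guard<std::mutex> guard(g_lock);
    auto it = g_live.find(p);
    if (it != g_live.end()) {
      g_total -= it->second;
      g_live.erase(it);
    }
  }
  ::free(p);
}
```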

Thanks Thomas! I appreciate the detailed investigation.

Cheers,
David

-------------

PR: https://git.openjdk.java.net/jdk/pull/4874
