On Fri, 6 Dec 2024 16:30:47 GMT, Quan Anh Mai <[email protected]> wrote:
> Hi,
>
> This patch improves the performance of a typical `Arena::allocate` in several
> ways:
>
> - Delay the creation of the `NativeMemorySegmentImpl`. This avoids merging the
> instance with the one obtained from the call on the uncommon path, increasing
> the chance that the object is scalar replaced.
> - Split the allocation of over-aligned memory into a slow-path method.
> - Align the memory to 8 bytes, allowing faster zeroing.
> - Use a dedicated method to zero the just-allocated native memory, reducing
> code size and making it more straightforward.
> - Make `VM.pageAlignDirectMemory` a `Boolean` instead of a `boolean` so that a
> `false` value can be constant-folded.
>
> Please take a look and leave your reviews, thanks a lot.
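The following is a small Java model of the allocation-path changes listed above. It is meant to make them concrete and is a sketch under stated assumptions, not the JDK code.

It shows three things:
- A fast path for small alignments that rounds the size up to a multiple of 8 bytes.
- Zeroing done with 8-byte stores only, with no tail loop.
- A separate slow-path method for over-aligned requests.

The names `AllocSketch`, `rawMalloc` and `allocateOverAligned` are hypothetical, as is the 8-byte fast-path threshold. Native memory is simulated with direct `ByteBuffer`s, which needs Java 9+ for `alignmentOffset`. The model does not try to show scalar replacement or the `NativeMemorySegmentImpl` wrapper.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical model of an arena allocation path; not the JDK implementation.
 * Native memory is simulated with direct ByteBuffers (Java 9+).
 */
final class AllocSketch {
    // Assumed threshold: alignments up to this value are served by the fast path.
    static final int FAST_PATH_ALIGN = 8;

    /** Rounds n up to a multiple of alignment, which must be a power of two. */
    static int alignUp(int n, int alignment) {
        return (n + alignment - 1) & -alignment;
    }

    /** Hypothetical raw allocator that, like malloc, hands back uninitialized memory. */
    static ByteBuffer rawMalloc(int size) {
        ByteBuffer raw = ByteBuffer.allocateDirect(size);
        for (int i = 0; i < size; i++) {
            raw.put(i, (byte) 0xAB); // simulate leftover garbage
        }
        return raw;
    }

    /** Zeroes size bytes using only 8-byte stores; size must be a multiple of 8. */
    static void zero(ByteBuffer buf, int size) {
        for (int i = 0; i < size; i += 8) {
            buf.putLong(i, 0L);
        }
    }

    /** Fast path: size rounded up to 8 bytes, over-aligned requests sent elsewhere. */
    static ByteBuffer allocate(int byteSize, int byteAlignment) {
        if (byteAlignment > FAST_PATH_ALIGN) {
            return allocateOverAligned(byteSize, byteAlignment); // rare case, kept out of line
        }
        // A multiple-of-8 size means zero() needs no byte-by-byte tail loop.
        int rounded = alignUp(byteSize, FAST_PATH_ALIGN);
        ByteBuffer buf = rawMalloc(rounded);
        if (buf.alignmentOffset(0, FAST_PATH_ALIGN) != 0) {
            // The model assumes 8-byte-aligned raw memory; fall back if that does not hold.
            return allocateOverAligned(byteSize, FAST_PATH_ALIGN);
        }
        zero(buf, rounded);
        return buf;
    }

    /** Slow path: over-allocate, then start at the first aligned address (power-of-two alignment). */
    static ByteBuffer allocateOverAligned(int byteSize, int byteAlignment) {
        int rounded = alignUp(byteSize, FAST_PATH_ALIGN);
        ByteBuffer raw = rawMalloc(rounded + byteAlignment - 1);
        int misalignment = raw.alignmentOffset(0, byteAlignment);
        int start = misalignment == 0 ? 0 : byteAlignment - misalignment;
        raw.position(start);
        raw.limit(start + rounded);
        ByteBuffer buf = raw.slice(); // view of exactly `rounded` aligned bytes
        zero(buf, rounded);
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer small = allocate(5, 1);  // fast path, size rounded up to 8
        ByteBuffer wide = allocate(20, 64); // slow path, 64-byte aligned
        System.out.println(small.capacity() + " " + wide.capacity() + " "
                + wide.alignmentOffset(0, 64));
    }
}
```

Running `main` prints `8 24 0`. A 5-byte request becomes 8 bytes and a 20-byte request becomes 24 bytes. The over-aligned block starts on a 64-byte boundary. Keeping the over-aligned case in its own method leaves the common path short, which is the motivation given in the list above.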

The results with the modified `AllocTest`:

                                               Before           After
Benchmark                 (size)  Mode  Cnt    Score   Error    Score   Error  Units
AllocTest.alloc_confined       5  avgt   30   24.188 ± 0.305   17.221 ± 1.299  ns/op
AllocTest.alloc_confined      20  avgt   30   24.690 ± 0.168   19.571 ± 3.108  ns/op
AllocTest.alloc_confined     100  avgt   30   26.714 ± 0.061   17.819 ± 0.095  ns/op
AllocTest.alloc_confined     500  avgt   30   38.907 ± 0.113   19.716 ± 0.060  ns/op
AllocTest.alloc_confined    2000  avgt   30   60.056 ± 3.087   43.373 ± 0.564  ns/op
AllocTest.alloc_confined    8000  avgt   30  141.535 ± 1.546   75.110 ± 3.482  ns/op
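
The before/after difference is easier to compare as a ratio. The sketch below divides each "Before" score by the matching "After" score from the table above. The class and method names are illustrative only.

The sketch ignores the error columns. The size-20 ratio in particular should be read with its ±3.108 ns/op error in mind.

```java
/** Illustrative helper: speedup of alloc_confined per size, from the table above. */
final class SpeedupSketch {
    // Scores in ns/op copied from the before/after table.
    static final int[] SIZES = {5, 20, 100, 500, 2000, 8000};
    static final double[] BEFORE = {24.188, 24.690, 26.714, 38.907, 60.056, 141.535};
    static final double[] AFTER = {17.221, 19.571, 17.819, 19.716, 43.373, 75.110};

    /** How many times faster the new code is (higher is better). */
    static double speedup(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZES.length; i++) {
            System.out.printf("size %5d: %.2fx%n", SIZES[i], speedup(BEFORE[i], AFTER[i]));
        }
    }
}
```

With these numbers, the speedups range from about 1.26x at size 20 to about 1.97x at size 500.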

The overall `AllocTest` results:

Benchmark                     (size)  Mode  Cnt    Score   Error  Units
AllocTest.alloc_calloc_arena       5  avgt   30   19.604 ± 0.075  ns/op
AllocTest.alloc_calloc_arena      20  avgt   30   19.750 ± 0.105  ns/op
AllocTest.alloc_calloc_arena     100  avgt   30   20.335 ± 0.103  ns/op
AllocTest.alloc_calloc_arena     500  avgt   30   36.676 ± 0.403  ns/op
AllocTest.alloc_calloc_arena    2000  avgt   30   47.928 ± 2.754  ns/op
AllocTest.alloc_calloc_arena    8000  avgt   30   83.762 ± 1.829  ns/op
AllocTest.alloc_confined           5  avgt   30   17.221 ± 1.299  ns/op
AllocTest.alloc_confined          20  avgt   30   19.571 ± 3.108  ns/op
AllocTest.alloc_confined         100  avgt   30   17.819 ± 0.095  ns/op
AllocTest.alloc_confined         500  avgt   30   19.716 ± 0.060  ns/op
AllocTest.alloc_confined        2000  avgt   30   43.373 ± 0.564  ns/op
AllocTest.alloc_confined        8000  avgt   30   75.110 ± 3.482  ns/op
AllocTest.alloc_unsafe_arena       5  avgt   30   18.810 ± 0.074  ns/op
AllocTest.alloc_unsafe_arena      20  avgt   30   18.858 ± 0.068  ns/op
AllocTest.alloc_unsafe_arena     100  avgt   30   21.820 ± 0.077  ns/op
AllocTest.alloc_unsafe_arena     500  avgt   30   32.685 ± 0.062  ns/op
AllocTest.alloc_unsafe_arena    2000  avgt   30   61.172 ± 1.464  ns/op
AllocTest.alloc_unsafe_arena    8000  avgt   30  133.842 ± 0.337  ns/op
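
To read the overall table at a glance, the sketch below picks the variant with the lowest mean score at each size. The class and method names are illustrative only.

It compares means and ignores the error columns. At size 20, the `alloc_confined` score (19.571 ± 3.108 ns/op) overlaps both other variants, so that row is not a clear win for any of them.

```java
/** Illustrative helper: lowest mean score per size, from the overall table above. */
final class FastestVariant {
    static final String[] VARIANTS = {"alloc_calloc_arena", "alloc_confined", "alloc_unsafe_arena"};
    static final int[] SIZES = {5, 20, 100, 500, 2000, 8000};
    // Scores in ns/op, one row per variant, columns in SIZES order.
    static final double[][] SCORES = {
        {19.604, 19.750, 20.335, 36.676, 47.928, 83.762},
        {17.221, 19.571, 17.819, 19.716, 43.373, 75.110},
        {18.810, 18.858, 21.820, 32.685, 61.172, 133.842},
    };

    /** Returns the variant with the lowest mean score for SIZES[sizeIndex]. */
    static String fastest(int sizeIndex) {
        int best = 0;
        for (int v = 1; v < VARIANTS.length; v++) {
            if (SCORES[v][sizeIndex] < SCORES[best][sizeIndex]) {
                best = v;
            }
        }
        return VARIANTS[best];
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZES.length; i++) {
            System.out.println(SIZES[i] + ": " + fastest(i));
        }
    }
}
```

With the numbers above, it selects `alloc_confined` at every size except 20, where `alloc_unsafe_arena` has the lowest mean.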
-------------
PR Comment: https://git.openjdk.org/jdk/pull/22610#issuecomment-2523693086