On Tue, 28 Apr 2026 02:07:26 GMT, Quan Anh Mai <[email protected]> wrote:

>> ## Preliminaries
>> 
>> ### 1. The inlining heuristic NodeCountLimitCutoff
>> 
>> In general, we don't want to inline a call if the graph is already too 
>> large. However, it is hard to decide whether the graph is large when we are 
>> still constructing the graph.
>> 
>> - There are still more bytecodes that need parsing, and more nodes that need 
>> generating.
>> - It is hard (maybe impossible) to reliably determine whether a node is dead 
>> during parsing.
>> 
>> Due to the issues above, the heuristic depends on the number of generated 
>> nodes, which is an upper bound of the number of live nodes, and the 
>> threshold is pretty conservative.
>> 
>> ### 2. Inlining a method
>> 
>> To inline a method, C2 needs to generate the structure for the callee to 
>> reside in. This includes the map for the exception path, the map for the 
>> merge of all normal paths, their memory states, etc. My experiment shows 
>> that, inlining a call generates around 20 more nodes than if the call is 
>> inlined in the source code.
>> 
>>     private int v() {
>>         return this,v;
>>     }
>> 
>>     int test1() {
>>         return this.v();
>>     }
>> 
>>     int test2() {
>>         return this.v;
>>     }
>> 
>> This means that, inlining a call consumes the budget of 
>> `NodeCountInliningCutoff`, which may prevent other calls from being inlined, 
>> even if other heuristics say that inlining is preferable. However, in 
>> practice, it is rarely an issue, because there is a difference of 3 orders 
>> of magnitude between the extra nodes generated by inlining, and the default 
>> value of `NodeCountInliningCutoff` (16000).
>> 
>> ### 3. Foreign memory access API
>> 
>> The aforementioned property that `NodeCountInliningCutoff` is 3 orders of 
>> magnitude larger than the number of extra nodes generated when inlining a 
>> call is broken due to how the FMA API is implemented. A memory access such 
>> as `j.l.f.MemorySegment::get` results in a huge call tree that needs 
>> inlining:
>> 
>>     @ 8   jdk.internal.foreign.AbstractMemorySegmentImpl::get (12 bytes)   
>> force inline by annotation   callee changed to  
>> io.github.merykitty.BenchmarkDraft::test1 (14 bytes)    -> TypeProfile 
>> (9083/9083 counts) = jdk/internal/foreign/NativeMemorySegmentImpl
>>       @ 1   
>> jdk.internal.foreign.layout.ValueLayouts$AbstractValueLayout::varHandle (24 
>> bytes)   force inline by annotation
>>       @ 8   java.lang.invoke.VarHandleGuards::guard_LJ_I (84 bytes)   force 
>> inline by annotation
>>         @ 3   java.lang.invoke.VarHandle::checkAccessModeThenIsDirect (29 
>> bytes)   force inline by annot...
>
> Quan Anh Mai has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Add diagnostic flag, change initial value to NodeCountInliningCutoff

Overall, looks good.

src/hotspot/share/opto/bytecodeInfo.cpp line 34:

> 32: #include "jfr/jfrEvents.hpp"
> 33: #include "oops/objArrayKlass.hpp"
> 34: #include "opto/c2_globals.hpp"

Is it redundant?

src/hotspot/share/opto/bytecodeInfo.cpp line 401:

> 399:     // parsing in the caller, and node liveness is more easily 
> determined.
> 400:     if (C->over_inlining_cutoff()) {
> 401:       bool give_up = true;

I suggest to extract the logic into a helper method (e.g., 
`Compile::should_delay_over_inlining_cutoff()`).

src/hotspot/share/opto/compile.cpp line 50:

> 48: #include "opto/addnode.hpp"
> 49: #include "opto/block.hpp"
> 50: #include "opto/c2_globals.hpp"

Redundant?

test/micro/org/openjdk/bench/java/lang/foreign/FFMStructAccessTest.java line 33:

> 31: import org.openjdk.jmh.annotations.*;
> 32: 
> 33: // Credit: 
> https://gist.github.com/Spasi/eed94bd2228e637464c32786a52fbd0d, contributed 
> on panama-dev

Is it the right way to credit the contribution? I'd add Ioannis as a 
contributor to the PR instead.

test/micro/org/openjdk/bench/java/lang/foreign/FFMStructAccessTest.java line 41:

> 39: @BenchmarkMode(Mode.AverageTime)
> 40: @OutputTimeUnit(TimeUnit.NANOSECONDS)
> 41: public class FFMStructAccessTest {

Can you turn one of the microbenchmarks into a regression test?

-------------

PR Review: https://git.openjdk.org/jdk/pull/30874#pullrequestreview-4192714874
PR Review Comment: https://git.openjdk.org/jdk/pull/30874#discussion_r3157455336
PR Review Comment: https://git.openjdk.org/jdk/pull/30874#discussion_r3157436870
PR Review Comment: https://git.openjdk.org/jdk/pull/30874#discussion_r3157456043
PR Review Comment: https://git.openjdk.org/jdk/pull/30874#discussion_r3157445544
PR Review Comment: https://git.openjdk.org/jdk/pull/30874#discussion_r3157453070

Reply via email to