Re: RFR: 8382700: C2: Delay inlining instead of giving up when hit NodeCountInliningCutoff [v11]

Vladimir Kozlov Thu, 30 Apr 2026 08:47:50 -0700

On Wed, 29 Apr 2026 22:24:20 GMT, Quan Anh Mai <[email protected]> wrote:


>> ## Preliminaries
>> 
>> ### 1. The inlining heuristic NodeCountLimitCutoff
>> 
>> In general, we don't want to inline a call if the graph is already too 
>> large. However, it is hard to decide whether the graph is large when we are 
>> still constructing the graph.
>> 
>> - There are still more bytecodes that need parsing, and more nodes that need 
>> generating.
>> - It is hard (maybe impossible) to reliably determine whether a node is dead 
>> during parsing.
>> 
>> Due to the issues above, the heuristic depends on the number of generated 
>> nodes, which is an upper bound of the number of live nodes, and the 
>> threshold is pretty conservative.
>> 
>> ### 2. Inlining a method
>> 
>> To inline a method, C2 needs to generate the structure for the callee to 
>> reside in. This includes the map for the exception path, the map for the 
>> merge of all normal paths, their memory states, etc. My experiment shows 
>> that, inlining a call generates around 20 more nodes than if the call is 
>> inlined in the source code.
>> 
>>     private int v() {
>>         return this,v;
>>     }
>> 
>>     int test1() {
>>         return this.v();
>>     }
>> 
>>     int test2() {
>>         return this.v;
>>     }
>> 
>> This means that, inlining a call consumes the budget of 
>> `NodeCountInliningCutoff`, which may prevent other calls from being inlined, 
>> even if other heuristics say that inlining is preferable. However, in 
>> practice, it is rarely an issue, because there is a difference of 3 orders 
>> of magnitude between the extra nodes generated by inlining, and the default 
>> value of `NodeCountInliningCutoff` (16000).
>> 
>> ### 3. Foreign memory access API
>> 
>> The aforementioned property that `NodeCountInliningCutoff` is 3 orders of 
>> magnitude larger than the number of extra nodes generated when inlining a 
>> call is broken due to how the FMA API is implemented. A memory access such 
>> as `j.l.f.MemorySegment::get` results in a huge call tree that needs 
>> inlining:
>> 
>>     @ 8   jdk.internal.foreign.AbstractMemorySegmentImpl::get (12 bytes)   
>> force inline by annotation   callee changed to  
>> io.github.merykitty.BenchmarkDraft::test1 (14 bytes)    -> TypeProfile 
>> (9083/9083 counts) = jdk/internal/foreign/NativeMemorySegmentImpl
>>       @ 1   
>> jdk.internal.foreign.layout.ValueLayouts$AbstractValueLayout::varHandle (24 
>> bytes)   force inline by annotation
>>       @ 8   java.lang.invoke.VarHandleGuards::guard_LJ_I (84 bytes)   force 
>> inline by annotation
>>         @ 3   java.lang.invoke.VarHandle::checkAccessModeThenIsDirect (29 
>> bytes)   force inline by annot...
>
> Quan Anh Mai has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Remove credit in the file in favor of proper PR attribution

It would be nice if `ciMethod` has `hs_mh_call()` as it has `has_jsrs()`. So we 
could update limits and inline it.

Even with this implementation of delayed inlining we can skip needed inlining 
if we hit `max_node_limit` before we update it.

Next RFE if it gives us improvement. It may not - it could be just corner case.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30874#issuecomment-4353925179

Re: RFR: 8382700: C2: Delay inlining instead of giving up when hit NodeCountInliningCutoff [v11]

Reply via email to