On Wed, 29 Apr 2026 22:24:20 GMT, Quan Anh Mai <[email protected]> wrote:
>> ## Preliminaries
>>
>> ### 1. The inlining heuristic `NodeCountInliningCutoff`
>>
>> In general, we don't want to inline a call if the graph is already too
>> large. However, it is hard to decide whether the graph is large when we are
>> still constructing the graph.
>>
>> - There are still more bytecodes that need parsing, and more nodes that need
>> generating.
>> - It is hard (maybe impossible) to reliably determine whether a node is dead
>> during parsing.
>>
>> Due to the issues above, the heuristic depends on the number of generated
>> nodes, which is an upper bound of the number of live nodes, and the
>> threshold is pretty conservative.
>>
>> ### 2. Inlining a method
>>
>> To inline a method, C2 needs to generate the structure for the callee to
>> reside in. This includes the map for the exception path, the map for the
>> merge of all normal paths, their memory states, etc. My experiment shows
>> that inlining a call generates around 20 more nodes than if the callee is
>> manually inlined in the source code.
>>
>>     private int v() {
>>         return this.v;
>>     }
>>
>>     int test1() {
>>         return this.v();
>>     }
>>
>>     int test2() {
>>         return this.v;
>>     }
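For anyone who wants to poke at the comparison above, here is a minimal self-contained sketch. The class name `InlineCost`, the field value, and the iteration count are assumptions made for illustration; the quoted text does not say how the node counts were measured, so this only gets both methods compiled.

```java
public class InlineCost {
    // Hypothetical field value, chosen only for illustration.
    private int v = 42;

    // Trivial getter; the call to it from test1() is the inlining candidate.
    private int v() {
        return this.v;
    }

    // Reads the field through the call.
    int test1() {
        return this.v();
    }

    // Reads the field directly, i.e. the callee inlined by hand.
    int test2() {
        return this.v;
    }

    public static void main(String[] args) {
        InlineCost c = new InlineCost();
        long sum = 0;
        // Assumed iteration count, intended to make both methods hot
        // enough to be compiled.
        for (int i = 0; i < 200_000; i++) {
            sum += c.test1() + c.test2();
        }
        // Print the sum so the loop is not optimized away entirely.
        System.out.println(sum);
    }
}
```

Running it with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` should show the inlining decision for `v()` in `test1`.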
>>
>> This means that inlining a call consumes part of the
>> `NodeCountInliningCutoff` budget, which may prevent other calls from being
>> inlined even if other heuristics say that inlining is preferable. In
>> practice, however, this is rarely an issue, because the default value of
>> `NodeCountInliningCutoff` (16000) is about 3 orders of magnitude larger
>> than the number of extra nodes generated by inlining a call.
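The budget argument above can be made concrete with a little arithmetic. This is only a back-of-the-envelope sketch: the two constants are the figures quoted above (about 20 extra nodes per inlined call, a cutoff of 16000), and the class and method names are hypothetical, not anything in HotSpot.

```java
public class InliningBudget {
    // Cutoff value as quoted in the description above.
    static final int NODE_COUNT_INLINING_CUTOFF = 16_000;
    // Approximate extra nodes per inlined call, from the experiment above.
    static final int EXTRA_NODES_PER_INLINE = 20;

    // Number of inlined calls whose scaffolding alone would use up the
    // cutoff, ignoring the nodes of the callee bodies themselves.
    static int callsToExhaust(int cutoff, int extraNodesPerInline) {
        return cutoff / extraNodesPerInline;
    }

    public static void main(String[] args) {
        System.out.println(
            callsToExhaust(NODE_COUNT_INLINING_CUTOFF, EXTRA_NODES_PER_INLINE)); // prints 800
    }
}
```

So it would take on the order of 800 inlined calls for the per-call overhead by itself to reach the cutoff, which is why the overhead normally does not matter.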
>>
>> ### 3. Foreign memory access API
>>
>> The aforementioned property that `NodeCountInliningCutoff` is 3 orders of
>> magnitude larger than the number of extra nodes generated when inlining a
>> call is broken due to how the FMA API is implemented. A memory access such
>> as `j.l.f.MemorySegment::get` results in a huge call tree that needs
>> inlining:
>>
>> @ 8 jdk.internal.foreign.AbstractMemorySegmentImpl::get (12 bytes)
>>     force inline by annotation callee changed to
>>     io.github.merykitty.BenchmarkDraft::test1 (14 bytes) -> TypeProfile
>>     (9083/9083 counts) = jdk/internal/foreign/NativeMemorySegmentImpl
>> @ 1 jdk.internal.foreign.layout.ValueLayouts$AbstractValueLayout::varHandle (24 bytes)
>>     force inline by annotation
>> @ 8 java.lang.invoke.VarHandleGuards::guard_LJ_I (84 bytes)
>>     force inline by annotation
>> @ 3 java.lang.invoke.VarHandle::checkAccessModeThenIsDirect (29 bytes)
>>     force inline by annot...
>
> Quan Anh Mai has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Remove credit in the file in favor of proper PR attribution

It would be nice if `ciMethod` had `hs_mh_call()`, just as it has `has_jsrs()`,
so we could update the limits and inline it.

Even with this implementation of delayed inlining, we can still skip needed
inlining if we hit `max_node_limit` before we update it.

That could be the next RFE if it gives us an improvement. It may not; it could
be just a corner case.

-------------
PR Comment: https://git.openjdk.org/jdk/pull/30874#issuecomment-4353925179