On Wed, 22 Apr 2026 11:30:08 GMT, Maurizio Cimadamore <[email protected]>
wrote:
>> ## Preliminaries
>>
>> ### 1. The inlining heuristic NodeCountInliningCutoff
>>
>> In general, we don't want to inline a call if the graph is already too
>> large. However, it is hard to decide whether the graph is large when we are
>> still constructing the graph.
>>
>> - There are still more bytecodes that need parsing, and more nodes that need
>> generating.
>> - It is hard (maybe impossible) to reliably determine whether a node is dead
>> during parsing.
>>
>> Due to the issues above, the heuristic depends on the number of generated
>> nodes, which is an upper bound of the number of live nodes, and the
>> threshold is pretty conservative.
>>
>> ### 2. Inlining a method
>>
>> To inline a method, C2 needs to generate the structure for the callee to
>> reside in. This includes the map for the exception path, the map for the
>> merge of all normal paths, their memory states, etc. My experiment shows
>> that, inlining a call generates around 20 more nodes than if the call is
>> inlined in the source code.
>>
>> private int v() {
>> return this.v;
>> }
>>
>> int test1() {
>> return this.v();
>> }
>>
>> int test2() {
>> return this.v;
>> }
>>
>> This means that, inlining a call consumes the budget of
>> `NodeCountInliningCutoff`, which may prevent other calls from being inlined,
>> even if other heuristics say that inlining is preferable. However, in
>> practice, it is rarely an issue, because there is a difference of 3 orders
>> of magnitude between the extra nodes generated by inlining, and the default
>> value of `NodeCountInliningCutoff` (16000).
>>
>> ### 3. Foreign memory access API
>>
>> The aforementioned property that `NodeCountInliningCutoff` is 3 orders of
>> magnitude larger than the number of extra nodes generated when inlining a
>> call is broken due to how the FMA API is implemented. A memory access such
>> as `j.l.f.MemorySegment::get` results in a huge call tree that needs
>> inlining:
>>
>> @ 8 jdk.internal.foreign.AbstractMemorySegmentImpl::get (12 bytes)
>> force inline by annotation callee changed to
>> io.github.merykitty.BenchmarkDraft::test1 (14 bytes) -> TypeProfile
>> (9083/9083 counts) = jdk/internal/foreign/NativeMemorySegmentImpl
>> @ 1
>> jdk.internal.foreign.layout.ValueLayouts$AbstractValueLayout::varHandle (24
>> bytes) force inline by annotation
>> @ 8 java.lang.invoke.VarHandleGuards::guard_LJ_I (84 bytes) force
>> inline by annotation
>> @ 3 java.lang.invoke.VarHandle::checkAccessModeThenIsDirect (29
>> bytes) force inline by annot...
>
> Note: I'd still advise against using `copySegmentReinterpret` as a general
> strategy. That trick can, under ideal conditions, effectively remove all kinds
> of bound checks. That said, as shown here, under more realistic conditions,
> it just doesn't perform very well, while simpler and more idiomatic solutions
> basically reach parity with Unsafe (at least in this benchmark).
@mcimadamore @vnkozlov Thanks for taking a look. I have added the benchmark.
@iwanowww Thanks a lot for your comment.
> First, I'd like to clarify the root cause of the problem. It's not specific
> to FFM, but relates to how `MethodHandle`s and `VarHandle`s are implemented.
I would say it is intensified by how FFM is implemented. If you look at the
inline tree of `AbstractMemorySegmentImpl::get` above, most of the calls are
below `java.lang.invoke.VarHandleSegmentAsInts::get`. So, while it is often an
issue with `MethodHandle`s and `VarHandle`s, it is more severe with the FFM API.
> So, the question I have is what kind of performance testing besides
> microbenchmarks have you run on it? I'd like to get a better understanding
> how severe the risks of overinlining are and then decide how to better
> address the issue.
I am running some benchmarks that measure compiler performance, as well as
the familiar benchmark suites Renaissance, DaCapo, and SPECjbb.
As for the risk of overinlining, I think it is really small, because
normally we will hit `DesiredMethodLimit` first, which is based on the total
bytecode size and has a much smaller threshold. The reason we usually see
`NodeCountInliningCutoff` with the FFM API is that all of the methods in its
implementation are `@ForceInline`, and for `@ForceInline` methods
`ciMethod::code_size_for_inlining` returns 1, which means they hide themselves
from the `DesiredMethodLimit` heuristic, allowing us to reach
`NodeCountInliningCutoff`. In fact, if I remove the part of
`ciMethod::code_size_for_inlining` that returns 1 for `@ForceInline` methods,
we hit `DesiredMethodLimit`.
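
To make the interaction between the two limits concrete, here is a toy Java
model. It is only a sketch and not HotSpot code: the class and method names,
the per-callee sizes (30 bytecodes, 50 nodes) and the value 8000 for
`DesiredMethodLimit` are assumptions made for illustration, and 16000 is the
`NodeCountInliningCutoff` value quoted earlier in this thread. It accumulates
a bytecode budget and a node budget over a chain of force-inlined callees and
reports which limit is crossed first.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; every name and number here is hypothetical unless noted.
final class InlineBudgetSketch {
    static final int DESIRED_METHOD_LIMIT = 8000;        // assumed bytecode budget
    static final int NODE_COUNT_INLINING_CUTOFF = 16000; // value quoted in the thread

    record Callee(int bytecodeSize, int nodes, boolean forceInline) {}

    // Models the behaviour described above: a force-inlined method reports
    // size 1 when the special case is enabled, its real size otherwise.
    static int codeSizeForInlining(Callee c, boolean forceInlineCountsAsOne) {
        return (c.forceInline() && forceInlineCountsAsOne) ? 1 : c.bytecodeSize();
    }

    // Inlines callees one by one and returns the name of the first limit exceeded.
    static String firstLimitHit(List<Callee> callees, boolean forceInlineCountsAsOne) {
        int bytecodes = 0;
        int nodes = 0;
        for (Callee c : callees) {
            bytecodes += codeSizeForInlining(c, forceInlineCountsAsOne);
            nodes += c.nodes();
            if (bytecodes > DESIRED_METHOD_LIMIT) return "DesiredMethodLimit";
            if (nodes > NODE_COUNT_INLINING_CUTOFF) return "NodeCountInliningCutoff";
        }
        return "none";
    }

    // A chain of n identical force-inlined callees (30 bytecodes, 50 nodes each).
    static List<Callee> forceInlinedChain(int n) {
        List<Callee> chain = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            chain.add(new Callee(30, 50, true));
        }
        return chain;
    }

    public static void main(String[] args) {
        List<Callee> chain = forceInlinedChain(400);
        System.out.println("size 1 for force-inlined: " + firstLimitHit(chain, true));
        System.out.println("real bytecode size:       " + firstLimitHit(chain, false));
    }
}
```

With these made-up numbers, the chain reaches the node cutoff when
force-inlined callees count as size 1, and reaches the bytecode limit first
when they report their real size, which matches the observation above.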
-------------
PR Comment: https://git.openjdk.org/jdk/pull/30874#issuecomment-4302743800