On Thu, 11 Jun 2026 18:15:03 GMT, Andrew Haley <[email protected]> wrote:

>> Please use [this 
>> link](https://github.com/openjdk/jdk/pull/28541/changes?w=1) to view the 
>> files changed.
>> 
>> Profile counters scale very badly.
>> 
>> The overhead for profiled code isn't too bad with one thread, but as the 
>> thread count increases, things go wrong very quickly.
>> 
>> For example, here's a benchmark from the OpenJDK test suite, run at 
>> TieredLevel 3 with one thread, then three threads:
>> 
>> 
>> Benchmark                        (randomized)  Mode  Cnt    Score   Error  Units
>> InterfaceCalls.test2ndInt5Types         false  avgt    4   27.468 ± 2.631  ns/op
>> InterfaceCalls.test2ndInt5Types         false  avgt    4  240.010 ± 6.329  ns/op
>> 
>> 
>> This slowdown is caused by high memory contention on the profile counters. 
>> Not only is this slow, but it can also lose profile counts.
>> 
>> This patch is for C1 only. It'd be easy to randomize interpreter counters as 
>> well in another PR, if anyone thinks it's worth doing.
>> 
>> One other thing to note is that randomized profile counters degrade very 
>> badly with small decimation ratios. For example, a ratio of 2 
>> (`-XX:ProfileCaptureRatio=2`) with a single thread results in
>> 
>> 
>> Benchmark                        (randomized)  Mode  Cnt   Score   Error  Units
>> InterfaceCalls.test2ndInt5Types         false  avgt    4  80.147 ± 9.991  ns/op
>> 
>> 
>> The problem is that the branch prediction rate drops away very badly, 
>> leading to many mispredictions. It only really makes sense to use higher 
>> decimation ratios, e.g. 64.
>> 
>> ---------
>> - [x] I confirm that I make this contribution in accordance with the 
>> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
>
> Andrew Haley has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Work around Assembler bug.

A bit more stuff...

src/hotspot/share/c1/c1_LIRGenerator.cpp line 3216:

> 3214:                                  MethodData::invocation_counter_offset());
> 3215:     counters_base = LIR_OprFact::metadataConst
> 3216:                      (method ->method_data_or_null()

Suggestion:

                     (method->method_data_or_null()


Also, it is funky to see a `_or_null()` call whose result is immediately 
dereferenced.

src/hotspot/share/c1/c1_LIRGenerator.hpp line 282:

> 280:   LIR_Opr call_runtime(Value arg1, address entry, ValueType* result_type, CodeEmitInfo* info);
> 281:   LIR_Opr call_runtime(Value arg1, Value arg2, address entry, ValueType* result_type, CodeEmitInfo* info);
> 282:   LIR_Opr call_runtime(LIR_Opr arg1, LIR_Opr arg2, LIR_Opr arg3, LIR_Opr result,

Looks dead, unless I am missing something.

src/hotspot/share/runtime/flags/jvmFlagConstraintsCompiler.cpp line 421:

> 419:     JVMFlag::printError(verbose,
> 420:                         "ProfileCaptureRatio != 1 is not supported on this target\n");
> 421:     return JVMFlag::INVALID_FLAG;

`INVALID_FLAG` means:


    // there is no flag with the given name
    INVALID_FLAG,


I think you want `VIOLATES_CONSTRAINT` here.
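
To illustrate the distinction, here is a minimal standalone sketch, not the actual HotSpot code. `FlagError` is a hypothetical stand-in for a subset of `JVMFlag::Error`, and `check_profile_capture_ratio` and its `target_supported` parameter are made up for this example:

```cpp
#include <cstdio>

// Hypothetical stand-in for a subset of JVMFlag::Error (assumption, not HotSpot's enum).
enum class FlagError {
  SUCCESS,
  VIOLATES_CONSTRAINT,  // the flag exists, but the requested value is rejected
  INVALID_FLAG          // there is no flag with the given name
};

// Hypothetical constraint check: a target without support accepts only a ratio of 1.
inline FlagError check_profile_capture_ratio(int value, bool target_supported, bool verbose) {
  if (value != 1 && !target_supported) {
    if (verbose) {
      std::fprintf(stderr, "ProfileCaptureRatio != 1 is not supported on this target\n");
    }
    // The flag name is valid; only its value is unacceptable here.
    return FlagError::VIOLATES_CONSTRAINT;
  }
  return FlagError::SUCCESS;
}
```

The point is only that the flag itself is known, so the error should describe the rejected value rather than a missing flag.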

src/hotspot/share/runtime/javaThread.cpp line 429:

> 427:     int state;
> 428:     do {
> 429:       state = os::random();

Tidbit: `os::random()` seems to be Park-Miller `mod (2**31-1)`, so the highest 
bit is always unset. Does that have an impact on profiling? IOW, does any 
profiling code depend on all bits being uniformly distributed? 

Judging by this code, we _do_ care about the highest bits:


    auto threshold = (UCONST64(1) << 32) >> ratio_shift;
    __ cmpl(r_profile_rng, threshold);
    __ jccb(Assembler::aboveEqual, nope);
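
To make the concern concrete, here is a standalone sketch, not HotSpot code. It assumes the classic Park-Miller multiplier 16807 and, importantly, that the value being compared comes straight from such a generator, which the patch may not do. The names `park_miller_next`, `capture_threshold` and `fraction_below` are made up for illustration:

```cpp
#include <cstdint>

// Illustrative Park-Miller "minimal standard" step:
//   next = (16807 * seed) mod (2^31 - 1)
// The seed must be in [1, 2^31 - 2]; every output is then also in that range,
// so bit 31 of the result is never set.
inline uint32_t park_miller_next(uint32_t seed) {
  const uint64_t m = (UINT64_C(1) << 31) - 1;
  return static_cast<uint32_t>((UINT64_C(16807) * seed) % m);
}

// Threshold computed the same way as in the quoted snippet.
inline uint64_t capture_threshold(int ratio_shift) {
  return (UINT64_C(1) << 32) >> ratio_shift;
}

// Fraction of generator outputs that fall below the threshold, i.e. the
// fraction of events that would NOT take the aboveEqual branch.
inline double fraction_below(int ratio_shift, int samples, uint32_t seed = 1) {
  const uint64_t threshold = capture_threshold(ratio_shift);
  int below = 0;
  uint32_t s = seed;
  for (int i = 0; i < samples; i++) {
    s = park_miller_next(s);
    if (s < threshold) {
      below++;
    }
  }
  return static_cast<double>(below) / samples;
}
```

Under these assumptions, `ratio_shift = 1` gives a threshold of `2**31`, which every output is below, so `fraction_below(1, n)` is exactly 1.0 instead of the intended 0.5. More generally, the fraction below the threshold would be roughly twice the intended one.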

-------------

PR Review: https://git.openjdk.org/jdk/pull/28541#pullrequestreview-4498263754
PR Review Comment: https://git.openjdk.org/jdk/pull/28541#discussion_r3414314746
PR Review Comment: https://git.openjdk.org/jdk/pull/28541#discussion_r3414293614
PR Review Comment: https://git.openjdk.org/jdk/pull/28541#discussion_r3414302940
PR Review Comment: https://git.openjdk.org/jdk/pull/28541#discussion_r3414286477
