On Fri, 5 Jun 2026 17:50:34 GMT, Andrew Haley <[email protected]> wrote:
>> src/hotspot/share/c1/c1_LIRGenerator.cpp line 969:
>>
>>> 967:     LIR_Opr tmp = new_register(T_INT);
>>> 968:     LIR_Opr step = LIR_OprFact::intConst(DataLayout::counter_increment);
>>> 969:     __ increment_counter(step, tmp, md_reg, md->constant_encoding(), data_offset_reg);
>>
>> OK, so this covers C1 profiling path. The interpreter profiling path
>> (`InterpreterMacroAssembler::profile_taken_branch`) is still not covered? So
>> if we are mixing the subsampled counters from C1 and raw counters from
>> interpreter, does that skew the profiling results? In the long run this is
>> probably not a problem? But I am not sure we are actually profiling any
>> particular bci for long enough to mitigate this.
>>
>> What I am concerned about is the profile inversion. If there is a hot
>> branch, it would probably progress to C1 profiling, where it would be
>> subsampled, and would add up to profile only after some time. But the _cold_
>> branch that is running in interpreter would show up in profile right away. So
>> there is time window where cold branch is over-represented over hot branch
>> in profile. With large `ProfileCaptureRatio` this window can be
>> uncomfortably large and fairly close to triggering the compilation with
>> skewed profile?
>>
>> It is not much of the problem with receiver type profiling, where
>> interpreter and C1 are on the same subsampling footing.
>
>> What I am concerned about is the profile inversion. If there is a hot
>> branch, it would probably progress to C1 profiling, where it would be
>> subsampled, and would add up to profile only after some time.
>
> Some compilations will be delayed, and some will be advanced, with
> approximately equal probability.
>
>> But the _cold_ branch that is running in interpreter would show up in profile
>> right away. So there is time window where cold branch is over-represented
>> over hot branch in profile.
>
> Randomized profile counters add some noise to profile counts. The counters
> can trigger overflow earlier _or later_ than they would have done without
> randomization. I think the noise added to profile counters has something like
> a Poisson distribution, so on average the error is about √N, where N is the
> number of random samples. If you have a compilation threshold of 1024 it'll
> take on average 1024 sampling events to trigger compilation, regardless of
> `ProfileCaptureRatio`. Of those 1024 events, on average only 16 will actually
> increment the associated counter by 64. √16 = 4, so the number of counts when
> compilation is signalled is 1024, ± 256 (4 × 64) for one standard deviation.
>
>> With large `ProfileCaptureRatio` this window can be uncomfortably large and
>> fairly close to triggering the compilation with skewed profile?
>>
>> It is not much of the problem with receiver type profiling, where
>> interpreter and C1 are on the same subsampling footing.

The standard deviation of a Poisson distribution is the square root of its
mean. (But I am not a statistician. Having said that, I have run a few
simulations which seem to fit the above.)

With really low thresholds typical for the interpreter, if we used the same
`ProfileCaptureRatio` for interpreter and C1 it would surely lead to
compilations happening in a very different order.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28541#discussion_r3364434618
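
For anyone who wants to check the 1024 ± 256 estimate, here is a minimal simulation sketch. It is not the simulation referred to above and not HotSpot code. It assumes a simplified model in which each profiling event is captured independently with probability 1/ratio and a captured event adds `ratio` to the counter; the names `events_until_overflow` and `simulate` are hypothetical.

```python
import random
import statistics


def events_until_overflow(threshold, ratio, rng):
    """Count real events until a subsampled counter reaches `threshold`.

    Assumed model (not taken from HotSpot): each event is captured with
    probability 1/ratio, and a captured event adds `ratio` to the counter.
    """
    counter = 0
    events = 0
    capture_probability = 1.0 / ratio
    while counter < threshold:
        events += 1
        if rng.random() < capture_probability:
            counter += ratio
    return events


def simulate(threshold=1024, ratio=64, trials=2000, seed=1):
    """Return (mean, standard deviation) of the event count at overflow."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    samples = [events_until_overflow(threshold, ratio, rng)
               for _ in range(trials)]
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    mean, sd = simulate()
    print(f"mean events to overflow: {mean:.0f}, standard deviation: {sd:.0f}")
```

Under this model the event count at overflow is the waiting time for the 16th captured sample, so the mean should come out near 1024 and the standard deviation a little under 256. With `ratio=1` every event is captured and the spread collapses to zero.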
