On Mon, 18 May 2026 08:29:18 GMT, Andrew Dinn <[email protected]> wrote:
>> Ferenc Rakoczi has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Added AOT Code Cache related code + some cosmetic changes
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7738:
>
>> 7736: // so four calls with the appropriate parameters will produce the
>> 64-bit
>> 7737: // low32 * low32, low32 * high32, high32 * low32, high32 * high32
>> 7738: // values in the output register sequences.
>
> A little more detail would make it easier to understand this method and helpt
> to clarify what is happening in code where it is called
> Suggestion:
>
> // Calls to this function accept either the low 32 bis or high 20 bits
> // of each b_i packed into bs in ascending order. a_0 and a_1 are packed
> // into successive 64 bit elements of as. lane selects the low 32 or high
> // 20 bits of each a_j value. So four calls with the appropriate parameters
> // will produce the 64-bit low32 * low32, low32 * high20, high20 * low32,
> // high20 * high20 values in the output register sequences vs. The
> // 64-bit partial products are returned in vs in ascending order:
> // vs[0] = (b_0*a_0, b_1*a_0) . . . vs[3] = (b_2*a_1, b_3*a_1)
Accepted with minor changes.
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7820:
>
>> 7818: __ mov(c_ptr, sp);
>> 7819:
>> 7820: // Calculate limb mask
>
> Suggestion:
>
> // Calculate (52-bit) limb masks for both gpr and vector registers
Accepted.
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7826:
>
>> 7824: //Load input arrays and modulus
>> 7825: Register a_ptr = *common_regs++, mod_ptr = *common_regs++;
>> 7826: __ add(a_ptr, a, 24);
>
> Suggestion:
>
> // skip 3 limbs so a_ptr addresses trailing pair {a3, a4}
> __ add(a_ptr, a, 24);
Accepted.
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7838:
>
>> 7836: __ ldr(mod_4, mod_ptr);
>> 7837: __ ld1(a_vals, __ T2D, a_ptr);
>> 7838: __ ld2(b_lows, b_highs, __ T4S, b);
>
> Suggestion:
>
> // use an interleaved load to group low 32 bits and high 20 bits
> // of 4 successive b values into two vector registers
> // n.b. these are the same inputs as the ones in b_0 ... b4
> __ ld2(b_lows, b_highs, __ T4S, b);
Accepted.
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7854:
>
>> 7852: n = *common_regs++;
>> 7853:
>> 7854: VSeq<4> A(16);
>
> Suggestion:
>
> // vector sequences used to compute and combine partial products of
> // b_i * a_j for i = {0,1,2,3} j = {3,4}
> VSeq<4> A(16);
Accepted.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3288498738
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3288499475
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3288501682
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3288500145
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3288500841