On Tue, 18 Nov 2025 09:27:44 GMT, Hamlin Li <[email protected]> wrote:
>> Hi,
>>
>> This PR adds CMoveF/D on riscv, which enables vectorization of statements like
>> `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop.
>>
>> This PR is also a preparation for further vectorization in
>> https://github.com/openjdk/jdk/pull/28231.
>>
>> Previously it was https://github.com/openjdk/jdk/pull/25341, but at that time
>> C2 SLP had an issue with unsigned comparison, which is now fixed, so it's
>> good to continue the work.
>>
>> # Test
>> ## Jtreg
>>
>> in progress...
>>
>> ## Performance
>>
>> Column name meanings:
>> * p: with patch
>> * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned
>> on
>> * m: without patch
>> * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally`
>> turned on
>>
>> #### Average improvement
>>
>> NOTE: On its own, this PR brings a performance benefit in the cases
>> `CMoveF+CmpF`, `CMoveD+CmpD`, `CMoveF+CmpI`, `CMoveD+CmpL`. The data below
>> is based on fully implementing the vectorization of
>> `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by
>> https://github.com/openjdk/jdk/pull/28231.
>>
>> For details, check the performance data in
>> https://github.com/openjdk/jdk/pull/25341 on riscv.
>>
>> Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v)
>> -- | -- | -- | --
>> 1.022782609 | 2.198717391 | 2.162673913 | 2.199
>
> Hamlin Li has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   replace assert with log_warning

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1590:

> 1588:   // jump if cmp1 < cmp2 or either is NaN
> 1589:   // not jump (i.e. move src to dst) if cmp1 >= cmp2
> 1590:   float_blt(cmp1, cmp2, no_set);

I compared this with the existing `MacroAssembler::cmov_cmp_fp_ge` [1] and I witnessed some difference in the case of `NaN` handling. In `MacroAssembler::cmov_cmp_fp_ge`, we set the `is_unordered` param to true when calling `float_blt` or `double_blt`, which is not the case here. I assume we need similar handling here as well, right?

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L1338

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1636:

> 1634:   // jump if cmp1 <= cmp2 or either is NaN
> 1635:   // not jump (i.e. move src to dst) if cmp1 > cmp2
> 1636:   float_ble(cmp1, cmp2, no_set);

Same question here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2548424215
PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2548424568
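
To make the `NaN` question above concrete, here is a minimal scalar model of the "branch over the move" pattern. It is only a sketch, not HotSpot code: `branch_lt`, `cmov_ge`, `reference_ge` and `check_nan_cases` are hypothetical names, and the only thing taken from the review is that `float_blt` accepts an `is_unordered` flag which `cmov_cmp_fp_ge` sets to true. The model assumes the flag means "also take the branch when either operand is NaN".

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical scalar model, NOT the HotSpot implementation.
static const float kNaN = std::numeric_limits<float>::quiet_NaN();

// Models "branch if a < b". Assumption: is_unordered decides whether
// the branch is also taken when either operand is NaN.
static bool branch_lt(float a, float b, bool is_unordered) {
    if (std::isnan(a) || std::isnan(b)) {
        return is_unordered;
    }
    return a < b;
}

// Models dst = (cmp1 >= cmp2) ? src : dst, written as
// "jump over the move when the inverse condition holds".
static long cmov_ge(float cmp1, float cmp2, long dst, long src,
                    bool is_unordered) {
    if (branch_lt(cmp1, cmp2, is_unordered)) {
        return dst;  // branch taken: the move is skipped
    }
    return src;      // fall through: move src to dst
}

// Reference semantics: with a NaN operand, cmp1 >= cmp2 is false,
// so dst must be kept.
static long reference_ge(float cmp1, float cmp2, long dst, long src) {
    return (cmp1 >= cmp2) ? src : dst;
}

// Exercises the ordered and NaN cases of the model.
static void check_nan_cases() {
    // Ordered inputs: the flag makes no difference.
    assert(cmov_ge(2.0f, 1.0f, 10, 20, false) == reference_ge(2.0f, 1.0f, 10, 20));
    assert(cmov_ge(1.0f, 2.0f, 10, 20, true) == reference_ge(1.0f, 2.0f, 10, 20));
    // NaN input: only the unordered branch matches the reference.
    assert(cmov_ge(kNaN, 1.0f, 10, 20, true) == reference_ge(kNaN, 1.0f, 10, 20));
    assert(cmov_ge(kNaN, 1.0f, 10, 20, false) != reference_ge(kNaN, 1.0f, 10, 20));
}
```

Under that assumption, the quoted comment "jump if cmp1 < cmp2 or either is NaN" describes the `is_unordered = true` behaviour; with the flag left at false, a NaN operand would fall through to the move in this model.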
