On Wed, 8 Apr 2026 02:12:29 GMT, Xiaohong Gong <[email protected]> wrote:
>> Duplicate `ptrue`(`MaskAll`) instructions are generated with different >> predicate registers on SVE when multiple `VectorMask.not()` operations >> exist. This increases the predicate register pressure and reduces >> performance, especially after loop is unrolled. >> >> Root cause: the matcher clones `MaskAll` for each `not` pattern (i.e. >> `(XorVMask (MaskAll m1))`), but SVE has no match rule for that alone. And >> the cloned `MaskAll` nodes are not shared with each other. >> >> Since SVE has rules for the `andNot` pattern: >> >> match(Set pd (AndVMask pn (XorVMask pm (MaskAll m1)))); >> >> `MaskAll` node should be cloned only when it is part of the `andNot` pattern >> instead. >> >> A second issue: `AndVMask`, `OrVMask`, and `XorVMask` are not in the >> matcher's commutative vector op list, so their operands are never swapped. >> As a result, the `andNot` rule does not match when the `XorVMask` operands >> appear in the opposite order (e.g. `(XorVMask (MaskAll m1) pm)`). >> >> This patch fixes both issues by 1) limiting when `MaskAll` is cloned and 2) >> adding the three binary mask bitwise IRs to the commutative op list. >> >> Following is the performance result of the new added JMH tested on V1 and >> Grace(V2) machines respecitively: >> >> V1 (SVE machine with 256-bit vector length): >> >> Benchmark Mode Threads >> Samples Unit size Before After Gain >> MaskLogicOperationsBenchmark.byteMaskAndNot thrpt 1 >> 30 ops/ms 256 54465.231 74374.960 1.365 >> MaskLogicOperationsBenchmark.byteMaskAndNot thrpt 1 >> 30 ops/ms 512 29156.881 39601.358 1.358 >> MaskLogicOperationsBenchmark.byteMaskAndNot thrpt 1 >> 30 ops/ms 1024 15169.894 20272.379 1.336 >> MaskLogicOperationsBenchmark.intMaskAndNot thrpt 1 >> 30 ops/ms 256 15408.510 19808.722 1.285 >> MaskLogicOperationsBenchmark.intMaskAndNot thrpt 1 >> 30 ops/ms 512 7906.952 10297.837 1.302 >> MaskLogicOperationsBenchmark.intMaskAndNot thrpt 1 >> 30 ops/ms 1024 3767.122 5097.853 1.353 >> MaskLogicOperationsBenchmark.longMaskAndNot thrpt 1 >> 30 ops/ms 256 7762.614 10534.290 1.357 >> MaskLogicOperationsBenchmark.longMaskAndNot thrpt 1 >> 30 ops/ms 512 3976.759 5123.445 1.288 >> MaskLogicOperationsBenchmark.longMaskAndNot thrpt 1 >> 30 ops/... > > Xiaohong Gong has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains four additional > commits since the last revision: > > - Do not clone maskAll if two inputs of AndVMask are both mask-not pattern > - Merge branch 'jdk:master' into JDK-8378737 > - Add comment > - 8378737: AArch64: Fix SVE match rule issues for VectorMask.andNot() Hi, could anyone please help take a look at this PR? Thanks a lot in advance! ------------- PR Comment: https://git.openjdk.org/jdk/pull/30013#issuecomment-4242720193
