Re: RFR: 8378737: AArch64: Fix SVE match rule issues for VectorMask.andNot() [v2]

Jaromir Hamala Fri, 03 Apr 2026 01:50:39 -0700

On Thu, 19 Mar 2026 06:33:02 GMT, Xiaohong Gong <[email protected]> wrote:


>> Duplicate `ptrue`(`MaskAll`) instructions are generated with different 
>> predicate registers on SVE when multiple `VectorMask.not()` operations 
>> exist. This increases the predicate register pressure and reduces 
>> performance, especially after loop is unrolled.
>> 
>> Root cause: the matcher clones `MaskAll` for each `not` pattern (i.e. 
>> `(XorVMask (MaskAll m1))`), but SVE has no match rule for that alone. And 
>> the cloned `MaskAll` nodes are not shared with each other.
>> 
>> Since SVE has rules for the `andNot` pattern:
>> 
>>   match(Set pd (AndVMask pn (XorVMask pm (MaskAll m1))));
>> 
>> `MaskAll` node should be cloned only when it is part of the `andNot` pattern 
>> instead.
>> 
>> A second issue: `AndVMask`, `OrVMask`, and `XorVMask` are not in the 
>> matcher's commutative vector op list, so their operands are never swapped. 
>> As a result, the `andNot` rule does not match when the `XorVMask` operands 
>> appear in the opposite order (e.g. `(XorVMask (MaskAll m1) pm)`).
>> 
>> This patch fixes both issues by 1) limiting when `MaskAll` is cloned and 2) 
>> adding the three binary mask bitwise IRs to the commutative op list.
>> 
>> Following is the performance result of the new added JMH tested on V1 and 
>> Grace(V2) machines respecitively:
>> 
>> V1 (SVE machine with 256-bit vector length):
>> 
>> Benchmark                                                     Mode  Threads 
>> Samples Unit   size  Before     After     Gain
>> MaskLogicOperationsBenchmark.byteMaskAndNot                   thrpt 1       
>> 30      ops/ms  256 54465.231  74374.960  1.365
>> MaskLogicOperationsBenchmark.byteMaskAndNot                   thrpt 1       
>> 30      ops/ms  512 29156.881  39601.358  1.358
>> MaskLogicOperationsBenchmark.byteMaskAndNot                   thrpt 1       
>> 30      ops/ms 1024 15169.894  20272.379  1.336
>> MaskLogicOperationsBenchmark.intMaskAndNot                    thrpt 1       
>> 30      ops/ms  256 15408.510  19808.722  1.285
>> MaskLogicOperationsBenchmark.intMaskAndNot                    thrpt 1       
>> 30      ops/ms  512  7906.952  10297.837  1.302
>> MaskLogicOperationsBenchmark.intMaskAndNot                    thrpt 1       
>> 30      ops/ms 1024  3767.122   5097.853  1.353
>> MaskLogicOperationsBenchmark.longMaskAndNot                   thrpt 1       
>> 30      ops/ms  256  7762.614  10534.290  1.357
>> MaskLogicOperationsBenchmark.longMaskAndNot                   thrpt 1       
>> 30      ops/ms  512  3976.759   5123.445  1.288
>> MaskLogicOperationsBenchmark.longMaskAndNot                   thrpt 1       
>> 30      ops/...
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Add comment

I've been toying in this area on x86, except the problem on x86 is opposite: It 
does not clone the `MaskAll` node enough: 
1. there is a missing rule for long - only nodes with int constant are cloned
2. `XorVMask` is not treated as commutative, so even the existing rule does not 
always match

I have 
[WIP](https://github.com/openjdk/jdk/compare/master...jerrinot:jdk:jh_knot_kxor),
 is it worth sending a PR?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30013#issuecomment-4182582388

Re: RFR: 8378737: AArch64: Fix SVE match rule issues for VectorMask.andNot() [v2]

Reply via email to