Re: RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v17]

Jatin Bhateja Wed, 08 Apr 2026 03:22:29 -0700

On Tue, 7 Apr 2026 16:52:48 GMT, Sandhya Viswanathan <[email protected]> 
wrote:


>> Jatin Bhateja has updated the pull request with a new target base due to a 
>> merge or a rebase. The pull request now contains 20 commits:
>> 
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8303762
>>  - Review comments resolutions
>>  - Review resolutions
>>  - Review comments resolution
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8303762
>>  - Review comments resolutions
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8303762
>>  - Review comments resolutions
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8303762
>>  - Update callGenerator.hpp copyright year
>>  - ... and 10 more: https://git.openjdk.org/jdk/compare/0b803bd3...bde0c216
>
> src/hotspot/cpu/x86/x86.ad line 3418:
> 
>> 3416:                          ((size_in_bits != 512) && !VM_Version::supports_avx512vl()))) {
>> 3417:         return false;
>> 3418:       }
> 
> This can be simplified to:
> 
>       if (UseAVX > 2 && !VM_Version::supports_avx512vlbw()) {
>         return false;
>       }
> 
> As the platforms supporting bw also support vl.

The checks were kept separate because AVX512VL is not needed when size_in_bits
is 512.
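
To make the difference concrete, here is a minimal standalone sketch, not the
x86.ad code. The CpuFeatures struct and both function names are hypothetical,
and since the first half of the original condition is not visible in the quoted
hunk, the sketch assumes it tests AVX512BW on the UseAVX > 2 path.

```cpp
// Hypothetical model of the two CPU feature flags involved;
// this is not HotSpot's VM_Version.
struct CpuFeatures {
  bool avx512bw;
  bool avx512vl;
};

// Split check: AVX512VL is required only for vectors narrower than 512 bits.
// Assumption: the part of the condition not shown in the quoted hunk
// tests AVX512BW.
bool supported_split(const CpuFeatures& cpu, int size_in_bits) {
  if (!cpu.avx512bw) return false;
  if (size_in_bits != 512 && !cpu.avx512vl) return false;
  return true;
}

// Merged check: requires both features for every vector size.
bool supported_merged(const CpuFeatures& cpu, int /*size_in_bits*/) {
  return cpu.avx512bw && cpu.avx512vl;
}
```

The two functions disagree only for a target that has BW without VL and a
512-bit vector. If every platform with BW also has VL, as the review comment
states, they are equivalent in practice.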

> src/hotspot/cpu/x86/x86.ad line 25418:
> 
>> 25416: %}
>> 25417: 
>> 25418: instruct vector_slice_const_origin_16B_reg(vec dst, vec src1, vec src2, immI origin)
> 
> The instruct rules with same register profile can be merged, so overall only 
> 3 rules are needed:
> 
> 1) With dst, src1, src2, origin profile
>     The following three rules can be merged into 1:
>         vector_slice_const_origin_16B_reg
>         vector_slice_const_origin_GT16B_index16_reg
>         vector_slice_const_origin_GT16B_index_multiple4_reg_evex
>    With predicate:
>         predicate((Matcher::vector_length_in_bytes(n) == 16) ||
>                   ((n->in(2)->get_int() & 0x3) == 0));
> 
> 2) With dst, src1, src2, origin and TEMP dst
>     The following two rules can be merged into 1:
>        vector_slice_const_origin_GT16B_reg
>        vector_slice_const_origin_GT16B_index_LT16_OR_GT48_reg_evex
>    With predicate:
>         predicate(((n->in(2)->get_int() & 0x3) != 0) &&
>                   ((Matcher::vector_length_in_bytes(n) == 32) ||
>                    (Matcher::vector_length_in_bytes(n) == 64 &&
>                     (n->in(2)->get_int() < 16 || n->in(2)->get_int() > 48))));
> 3) With dst, src1, src2, origin, xtmp with TEMP dst
>     vector_slice_const_origin_GT16B_index_GT16_AND_LT48_reg_evex
>     With predicate:
>     predicate(((n->in(2)->get_int() & 0x3) != 0) &&
>               Matcher::vector_length_in_bytes(n) == 64 &&
>               n->in(2)->get_int() > 16 && n->in(2)->get_int() < 48);

Hi @sviswa7, I agree that we should generate an optimal number of matcher
patterns; thanks for the suggested changes. Note that the predicates will now
seep into the instruction encoding block, which will now have a mixture of AVX
and EVEX macro assembly routines. I have shared a couple of nodes but not all,
in view of maintainability and to avoid unnecessary complexity.

Please let me know if there are other comments.

Best Regards

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r3050632763
PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r3050632332
