On Tue, 4 Oct 2022 09:07:42 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:

>>> You can use `kmovwl` instead which will relax the avx512bw constraint, 
>>> however, you will need avx512vl for `evcvtps2ph`. Thanks.
>> 
>> Yes, in general all AVX512VL targets support AVX512BW, but cloud instances 
>> allow custom feature sets to be enabled. Regarding k0, as per section 
>> "15.6.1.1" of the SDM, the expectation is that k0 can appear as a source or 
>> destination in a regular non-predication context. k0 should always contain 
>> an all-true mask, so it should not be modified for subsequent usages, i.e. it 
>> should not be the destination of a mask-manipulating instruction. Your 
>> suggestion is to use it as a source, but that may not work either. Changing 
>> the existing sequence to use kmovw and replacing AVX512BW with AVX512VL will 
>> again mean introducing an additional predication check for this pattern.
>
> Ah I get it, the encoding of k0 is treated specially in predicated 
> instructions to refer to an all-set mask, but the register itself may not 
> actually contain that value. So usage in `kshiftrw` may fail. In that case I 
> think we can generate an all-set mask on the fly using `kxnorw(ktmp, ktmp, 
> ktmp)` to save a GPR on this occasion. Thanks.

Hi @merykitty, I am seeing a performance regression with the `kxnorw` 
instruction, so I have updated the PR to use `kmovwl` instead. Thanks.
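
For readers following the thread, here is a minimal sketch of the two ways of producing an all-set 16-bit mask discussed above. It is a plain C++ model of the mask values only, not the HotSpot assembler code from the PR; the helper names simply mirror the instruction mnemonics, and the shift count of 8 is an arbitrary value chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Models the 16-bit value held in an opmask register (the "w" forms).
using Mask16 = std::uint16_t;

// KXNORW: bitwise XNOR of two masks. XNOR of a register with itself
// yields all ones regardless of the register's previous contents.
Mask16 kxnorw(Mask16 a, Mask16 b) {
    return static_cast<Mask16>(~(a ^ b));
}

// KMOVW from a GPR: copies the low 16 bits of a 32-bit register.
// This route needs a general-purpose register holding the constant.
Mask16 kmovw(std::uint32_t gpr) {
    return static_cast<Mask16>(gpr & 0xFFFFu);
}

// KSHIFTRW: logical right shift of a 16-bit mask by an immediate.
// Counts above 15 produce an all-zero mask.
Mask16 kshiftrw(Mask16 src, unsigned imm8) {
    return imm8 > 15 ? Mask16{0} : static_cast<Mask16>(src >> imm8);
}
```

In this model, `kxnorw(ktmp, ktmp)` and `kmovw(0xFFFF)` give the same mask, so the choice between them comes down to the cost of the instruction versus the cost of occupying a GPR. Shifting either result with `kshiftrw(mask, 8)` leaves the low 8 bits set.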

-------------

PR: https://git.openjdk.org/jdk/pull/9781
