On Tue, 4 Oct 2022 06:49:53 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

>> @merykitty Thanks for the suggestion. I will update the instruct to use
>> kmovwl. I will also experiment with kshiftrw and let you know.
>
>> You can use `kmovwl` instead which will relax the avx512bw constraint,
>> however, you will need avx512vl for `evcvtps2ph`. Thanks.
>
> Yes, in general all AVX512VL targets support AVX512BW, but cloud instances
> give freedom to enable custom features. Regarding K0, as per section
> "15.6.1.1" of SDM, expectation is that K0 can appear in source and
> destination of regular non predication context, k0 should always contain all
> true mask so it should be unmodifiable for subsequent usages i.e. should not
> be present as destination of a mask manipulating instruction. Your suggestion
> is to have that in source but it may not work either. Changing existing
> sequence to use kmovw and replace AVX512BW with AVX512VL will again mean
> introducing an additional predication check for this pattern.

Ah, I get it: the encoding of k0 is treated specially in predicated
instructions to refer to an all-set mask, but the register itself may not
actually contain that value, so using it as a source of `kshiftrw` may fail.
In that case I think we can generate an all-set mask on the fly using
`kxnorw(ktmp, ktmp)`, which saves a GPR in this case. Thanks.

-------------

PR: https://git.openjdk.org/jdk/pull/9781