On Tue, 4 Oct 2022 09:07:42 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:

>>> You can use `kmovwl` instead which will relax the avx512bw constraint,
>>> however, you will need avx512vl for `evcvtps2ph`. Thanks.
>>
>> Yes, in general all AVX512VL targets support AVX512BW, but cloud instances
>> give freedom to enable custom features. Regarding K0, as per section
>> "15.6.1.1" of SDM, expectation is that K0 can appear in source and
>> destination of regular non predication context, k0 should always contain all
>> true mask so it should be unmodifiable for subsequent usages i.e. should not
>> be present as destination of a mask manipulating instruction. Your
>> suggestion is to have that in source but it may not work either. Changing
>> existing sequence to use kmovw and replace AVX512BW with AVX512VL will again
>> mean introducing an additional predication check for this pattern.
>
> Ah I get it, the encoding of k0 is treated specially in predicated
> instructions to refer to an all-set mask, but the register itself may not
> actually contain that value. So usage in `kshiftrw` may fail. In that case I
> think we can generate an all-set mask on the fly using `kxnorw(ktmp, ktmp,
> ktmp)` to save a GPR in this occasion. Thanks.

Hi @merykitty, I am seeing performance regression with kxnorw instruction. So I have updated the PR with kmovwl. Thanks.

-------------

PR: https://git.openjdk.org/jdk/pull/9781