On Wed, 20 Apr 2022 02:44:39 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:
>>> The blend should be with the intended-to-store vector, so that masked lanes
>>> contain the need-to-store elements and unmasked lanes contain the loaded
>>> elements, which would be stored back, which results in unchanged values.
>>
>> It may not work if memory is beyond legal accessible address space of the
>> process, a corner case could be a page boundary. Thus re-composing the
>> intermediated vector which partially contains actual updates but effectively
>> perform full vector write to destination address may not work in all
>> scenarios.
>
> Thanks for the comment! So how about adding the check for the valid array
> range like the masked vector load?
> Codes like:
>
>     public final
>     void intoArray(byte[] a, int offset,
>                    VectorMask<Byte> m) {
>         if (m.allTrue()) {
>             intoArray(a, offset);
>         } else {
>             ByteSpecies vsp = vspecies();
>             if (offset >= 0 && offset <= (a.length - vsp.length())) { // a full range check
>                 intoArray0(a, offset, m, /* usePred */ false); // can be vectorized by load+blend_store
>             } else {
>                 checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
>                 intoArray0(a, offset, m, /* usePred */ true); // only be vectorized by the predicated store
>             }
>         }
>     }

Thanks, this looks OK, since the out-of-range case will not be intrinsified if the target does not support predicated vector stores.

-------------

PR: https://git.openjdk.java.net/jdk/pull/8035
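
For illustration only, here is a scalar sketch of the two paths discussed above. It does not use the Vector API and is not the JDK's `intoArray0`; the class name, `LANES`, `blendStore`, and `predicatedStore` are hypothetical names made up for this example. It assumes a fixed 4-lane vector and models the mask as a `boolean[]`. The point it tries to show is that the load+blend+store path touches every lane, so it is only taken after a full range check, while the fallback touches only the lanes whose mask bit is set.

```java
import java.util.Arrays;

class MaskedStoreSketch {
    // Hypothetical vector length; a real species would report its own length.
    static final int LANES = 4;

    // Emulates load + blend + full-width store. Reads and writes all LANES
    // elements, so the caller must guarantee offset + LANES <= a.length.
    static void blendStore(byte[] a, int offset, byte[] v, boolean[] m) {
        byte[] loaded = new byte[LANES];
        System.arraycopy(a, offset, loaded, 0, LANES);   // full-width load
        for (int i = 0; i < LANES; i++) {
            if (m[i]) {
                loaded[i] = v[i];                        // blend in the masked lanes
            }
        }
        System.arraycopy(loaded, 0, a, offset, LANES);   // full-width store
    }

    // Emulates a predicated store: only lanes with a set mask bit touch memory.
    static void predicatedStore(byte[] a, int offset, byte[] v, boolean[] m) {
        for (int i = 0; i < LANES; i++) {
            if (m[i]) {
                a[offset + i] = v[i];   // throws if a set lane is out of bounds
            }
        }
    }

    // Mirrors the shape of the proposed check: full range check first,
    // predicated path otherwise.
    static void intoArray(byte[] a, int offset, byte[] v, boolean[] m) {
        if (offset >= 0 && offset <= a.length - LANES) {
            blendStore(a, offset, v, m);
        } else {
            predicatedStore(a, offset, v, m);
        }
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4, 5, 6};
        byte[] v = {10, 20, 30, 40};

        // Fully in range: the blend path leaves unmasked lanes unchanged.
        intoArray(a, 0, v, new boolean[] {true, false, true, false});
        System.out.println(Arrays.toString(a)); // [10, 2, 30, 4, 5, 6]

        // Tail of the array: only the two set lanes are in bounds,
        // so the predicated path is used and nothing past the end is touched.
        intoArray(a, 4, v, new boolean[] {true, true, false, false});
        System.out.println(Arrays.toString(a)); // [10, 2, 30, 4, 10, 20]
    }
}
```

In this sketch the second call would fail on the blend path, because a full-width access at offset 4 runs past the end of a 6-element array. That is the situation the range check is meant to exclude.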