On Wed, 20 Apr 2022 02:44:39 GMT, Xiaohong Gong <[email protected]> wrote:
>>> The blend should be with the vector to be stored, so that masked lanes
>>> contain the elements to store and unmasked lanes contain the loaded
>>> elements, which are then stored back and so leave those values unchanged.
>>
>> This may not work if part of the memory lies outside the legally accessible
>> address space of the process; a page boundary is one such corner case.
>> Recomposing an intermediate vector that only partially contains actual
>> updates, and then performing a full vector write to the destination address,
>> may therefore not work in all scenarios.
>
> Thanks for the comment! So how about adding a check for the valid array
> range, as is done for the masked vector load?
> Something like:
>
> public final
> void intoArray(byte[] a, int offset,
>                VectorMask<Byte> m) {
>     if (m.allTrue()) {
>         intoArray(a, offset);
>     } else {
>         ByteSpecies vsp = vspecies();
>         if (offset >= 0 && offset <= (a.length - vsp.length())) { // a full range check
>             intoArray0(a, offset, m, /* usePred */ false);
>             // can be vectorized by load+blend_store
>         } else {
>             checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
>             intoArray0(a, offset, m, /* usePred */ true);
>             // only be vectorized by the predicated store
>         }
>     }
> }
Thanks, this looks OK, since the out-of-range case will not be intrinsified if
the target does not support predicated vector stores.
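
For illustration, the two paths discussed in this thread can be modelled in
scalar Java roughly as follows. This is only a sketch under stated assumptions,
not the JDK implementation: LANES, blendStore, predicatedStore and the
boolean[] mask are hypothetical stand-ins for the species length, the
load+blend+store sequence, the predicated store and VectorMask<Byte>.

```java
import java.util.Arrays;

// Scalar model of a masked store; all names here are hypothetical.
final class MaskedStoreSketch {
    static final int LANES = 4; // assumed vector length, for illustration only

    // Models load + blend + full-width store. The whole range
    // [offset, offset + LANES) is read and written, so it must be in bounds.
    static void blendStore(byte[] a, int offset, byte[] v, boolean[] m) {
        byte[] loaded = Arrays.copyOfRange(a, offset, offset + LANES);
        for (int i = 0; i < LANES; i++) {
            if (m[i]) {
                loaded[i] = v[i]; // set lanes take the new value
            }
        }
        System.arraycopy(loaded, 0, a, offset, LANES); // unset lanes are rewritten unchanged
    }

    // Models a predicated store: only lanes with a set mask bit touch memory.
    static void predicatedStore(byte[] a, int offset, byte[] v, boolean[] m) {
        for (int i = 0; i < LANES; i++) {
            if (m[i]) {
                a[offset + i] = v[i];
            }
        }
    }

    static void intoArray(byte[] a, int offset, byte[] v, boolean[] m) {
        if (offset >= 0 && offset <= a.length - LANES) {
            // Full range check passed: the full-width access is safe.
            blendStore(a, offset, v, m);
        } else {
            // Only the set lanes need to be in range; check them first.
            for (int i = 0; i < LANES; i++) {
                int idx = offset + i;
                if (m[i] && (idx < 0 || idx >= a.length)) {
                    throw new IndexOutOfBoundsException("lane " + i + " at index " + idx);
                }
            }
            predicatedStore(a, offset, v, m);
        }
    }
}
```

In this model a store at the tail of the array with only in-range lanes set
takes the second branch and never touches memory past a.length, which is the
case the blend-based path cannot handle safely.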
-------------
PR: https://git.openjdk.java.net/jdk/pull/8035