On Thu, 31 Mar 2022 03:53:15 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:

>> Yeah, maybe I misunderstood what you mean. So maybe the masked store
>> `(store(src, m))` could be implemented with:
>>
>> 1) v1 = load
>> 2) v2 = blend(v1, src, m)
>> 3) store(v2)
>>
>> Let's record this in JBS and fix it with a follow-up patch. Thanks!
>
> The optimization for masked store is recorded in:
> https://bugs.openjdk.java.net/browse/JDK-8284050
> The blend should be with the intended-to-store vector, so that masked lanes
> contain the need-to-store elements and unmasked lanes contain the loaded
> elements, which would be stored back, which results in unchanged values.

This may not work if part of the full-vector memory range lies outside the
legally accessible address space of the process; one corner case is an access
that straddles a page boundary. Re-composing an intermediate vector that only
partially contains actual updates, and then effectively performing a full-vector
write to the destination address, may therefore not work in all scenarios.

-------------

PR: https://git.openjdk.java.net/jdk/pull/8035
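
To make the concern concrete, here is a minimal scalar sketch in plain Java. It is not the HotSpot implementation and does not use the Vector API. The class name, the helper names `maskedStore` and `blendStore`, and the 4-lane width are hypothetical, and a Java array bound stands in for the edge of accessible memory (for example, a page boundary). The sketch only shows that the load + blend + store sequence touches every lane, while a true masked store touches only the lanes selected by the mask.

```java
class MaskedStoreSketch {
    // Hypothetical vector width, chosen only for illustration.
    static final int LANES = 4;

    // True masked store: only lanes whose mask bit is set are written,
    // so unselected lanes are never accessed.
    static void maskedStore(int[] dst, int offset, int[] src, boolean[] m) {
        for (int i = 0; i < LANES; i++) {
            if (m[i]) {
                dst[offset + i] = src[i];
            }
        }
    }

    // Emulation via load + blend + store: every lane is read and written,
    // including lanes the mask does not select.
    static void blendStore(int[] dst, int offset, int[] src, boolean[] m) {
        int[] v1 = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            v1[i] = dst[offset + i];           // 1) full-width load
        }
        int[] v2 = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            v2[i] = m[i] ? src[i] : v1[i];     // 2) blend(v1, src, m)
        }
        for (int i = 0; i < LANES; i++) {
            dst[offset + i] = v2[i];           // 3) full-width store
        }
    }

    public static void main(String[] args) {
        int[] src = {10, 11, 12, 13};
        boolean[] tailMask = {true, true, false, false};

        // Only two elements remain after offset 4; the mask covers exactly those.
        int[] dst = new int[6];
        maskedStore(dst, 4, src, tailMask);    // touches dst[4] and dst[5] only

        try {
            blendStore(new int[6], 4, src, tailMask);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The full-width load reaches past the end of the array, which is
            // the array-level analogue of touching an inaccessible page.
            System.out.println("blendStore accessed lanes outside the array");
        }
    }
}
```

When the whole vector-wide range is accessible, both helpers leave the destination in the same state; they differ only when unselected lanes fall outside the accessible range.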