On Wed, 1 Feb 2023 14:20:05 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractMask.java >> line 236: >> >>> 234: } else if (offset >= limit) { >>> 235: return vectorSpecies().maskAll(false); >>> 236: } else if (limit - offset >= length()) { >> >> Can you move this else if check at the top, this is the most general case >> appearing in the loop and hence two extra uncommon trap jumps before it for >> special cases may penalize this. > > Your patch re-organized the code to take care of following cases :- > 1. offset < 0 : Still falling over to old code. > 2. offset >= limit : Special case optimized > 3. (limit - offset) >= length : Most general case. > 4. Partial in range check : New intrinsic, existing java side > implementation in checkIndex0 is also composed of intrinsified calls, > infrequently taken paths will anyways lead to uncommon traps. > > Have you tried introducing just case 3 (first) and then case 2 in existing > indexInRange implementation. > Thanks for the review @jatin-bhateja ! > > > Have you tried introducing just case 3 (first) and then case 2 in existing > > indexInRange implementation. > > Yes, I tried with this way as well, together with the performance testing > compared with current version. The result is: > > 1. For cases that the array size is aligned with the vector size, the > performance is almost the same with each other. > 2. For cases that the array size is not aligned (which needs the masked > operation), the performance with current version can improve about 5%~6% > compared with the way as you suggested. But the improvement is limited to the > cases with smaller vector size, whose tail loop size is bigger. > > So, adding the new intrinsic has no side effect for me, except for the more > complex java side code and one additional intrinsic. Another benefit is the code change will be simpler, that almost all the files except the `AbstractMask.java` do not need to be changed. I will make the modification and rerun the benchmarks to decide whether it's necessary to add the new intrinsics. Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/12064