Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

Emanuel Peter Mon, 08 Jan 2024 02:37:25 -0800

On Mon, 8 Jan 2024 06:06:20 GMT, Jatin Bhateja <[email protected]> wrote:


>> You are using `VectorMask<Integer> pred = VectorMask.fromLong(ispecies, 
>> maskctr++);`.
>> That basically systematically iterates over all masks, which is nice for a 
>> correctness test.
>> But that would use different density inside one test run, right? The average 
>> over the loop is still at `50%`, correct?
>> 
>> I was thinking more a run where the percentage over the whole loop is lower 
>> than maybe `1%`. That would get us to a point where maybe the branch 
>> prediction of non-vectorized code might be faster, what do you think?
> 
> An imperative loop for compression will check each mask bit to select 
> compressible lane. Therefore mask with low or high density of set bits should 
> show similar performance.

Yes, IF it is vectorized, then there is no difference between high and low 
density. My concern was more whether vectorization is preferable to the scalar 
alternative in the low-density case, where branch prediction is more stable.
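
To make the scalar alternative concrete, here is a minimal sketch of what I have in mind. It is not code from the PR or its benchmark: the class name, the `randomMask` helper, the `1%` density and the seed are all assumptions for illustration. With a sparse mask the branch in `compress` is almost never taken, which is the case where I would expect the predictor to do well.

```java
import java.util.Random;

// Illustrative sketch only; names and parameters are hypothetical, not from the PR.
class ScalarCompressSketch {

    // Scalar compress: copy the selected elements of src to the front of dst
    // and return how many were copied.
    static int compress(int[] src, boolean[] mask, int[] dst) {
        int out = 0;
        for (int i = 0; i < src.length; i++) {
            if (mask[i]) {              // rarely taken when the mask is sparse
                dst[out++] = src[i];
            }
        }
        return out;
    }

    // Hypothetical helper: each lane is set independently with probability `density`.
    static boolean[] randomMask(int length, double density, long seed) {
        Random rnd = new Random(seed);
        boolean[] mask = new boolean[length];
        for (int i = 0; i < length; i++) {
            mask[i] = rnd.nextDouble() < density;
        }
        return mask;
    }

    public static void main(String[] args) {
        int n = 1 << 16;
        int[] src = new int[n];
        for (int i = 0; i < n; i++) {
            src[i] = i;
        }
        boolean[] sparse = randomMask(n, 0.01, 42L);  // assumed ~1% density
        int[] dst = new int[n];
        int kept = compress(src, sparse, dst);
        System.out.println("kept " + kept + " of " + n + " elements");
    }
}
```

A benchmark run over a mask like `sparse`, next to the vectorized `compress` path, would show whether the scalar loop wins at low density.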

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1444257535
