> 
> I think for a 512-bit vector, vgf2p8affineqb is better than the
> original codegen, but for a 128/256-bit vector, shouldn't vpcmpgtb be
> better than vgf2p8affineqb?

Yes, it's better, but I don't see it in the loop bodies of
any of my test cases, only in prologues/epilogues.

Okay, that is probably because they all used a 512-bit operand size.

I'll add test cases for the other sizes.

-Andi
