> > I think for a 512-bit vector, vgf2p8affineqb is better than the
> > original codegen, but for a 128/256-bit vector, shouldn't vpcmpgtb be
> > better than vgf2p8affineqb?
Yes, it's better, but I don't see it in the loop bodies for any of my test cases, only in prologues/epilogues. That is probably because they all used a 512-bit operand size. I'll add test cases for the other sizes.

-Andi