https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45215
--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 19 Jul 2024, pinskia at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45215
>
> Andrew Pinski <pinskia at gcc dot gnu.org> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |pinskia at gcc dot gnu.org
>           Component|tree-optimization           |middle-end
>              Status|NEW                         |ASSIGNED
>
> --- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>   _1 = t_3(D) & 256;
>   if (_1 != 0)
>     goto <bb 4>; [1.04%]
>   else
>     goto <bb 3>; [98.96%]
>
>   <bb 3> [local count: 1062574912]:
>
>   <bb 4> [local count: 1073741824]:
>   # _2 = PHI <-26(2), 0(3)>
>
> So the trick here is that 256 is `0x1<<8`, so we want to shift that bit
> up to the sign bit, arithmetic-shift it back down to get 0/-1, and then
> AND with -26.

So .BIT_SPLAT (t_3(D), 8) & -26.  There's nothing special in x86 to help
.BIT_SPLAT though, and back-to-back shifts might be throughput constrained.
I think x86 can do a -1 vs. 0 set from the flags of the AND, though.  I'm
not sure whether two shifts and an AND are a good way to recover an optimal
non-branch insn sequence later.