https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45215

--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 19 Jul 2024, pinskia at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45215
> 
> Andrew Pinski <pinskia at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |pinskia at gcc dot gnu.org
>           Component|tree-optimization           |middle-end
>              Status|NEW                         |ASSIGNED
> 
> --- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>   _1 = t_3(D) & 256;
>   if (_1 != 0)
>     goto <bb 4>; [1.04%]
>   else
>     goto <bb 3>; [98.96%]
> 
>   <bb 3> [local count: 1062574912]:
> 
>   <bb 4> [local count: 1073741824]:
>   # _2 = PHI <-26(2), 0(3)>
> 
> 
> So the trick here is that 256 is `0x1<<8`, so we want to shift that bit up to
> the sign bit, then arithmetic-shift it down to get 0/-1, and then AND with -26.

So that is .BIT_SPLAT (t_3(D), 8) & -26.  There's nothing special on x86 to
help .BIT_SPLAT though, and back-to-back shifts might be throughput
constrained.  I think x86 can set -1 vs. 0 from the flags of the AND though.
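
For illustration, a rough C sketch of the forms being discussed, assuming a
32-bit value.  The names bit_splat, select_branchy, select_shifts and
select_flags are made up for this example; bit_splat only stands in for the
.BIT_SPLAT idea and is not an existing GCC internal function.  The sketch also
relies on GCC's documented behavior for unsigned-to-signed conversion and for
right shifts of negative values, which ISO C leaves implementation-defined.

```c
#include <stdint.h>

/* Hypothetical stand-in for .BIT_SPLAT: replicate bit `bit` of x into
   every bit of the result, giving 0 or -1.  Valid for bit in 0..31.  */
static inline int32_t bit_splat(uint32_t x, unsigned bit)
{
    /* Move the selected bit up to the sign position, then shift it back
       down arithmetically (assumes GCC's arithmetic >> on signed).  */
    return (int32_t)(x << (31 - bit)) >> 31;
}

/* The source-level form behind the PHI <-26(2), 0(3)>.  */
static inline int32_t select_branchy(uint32_t t)
{
    return (t & 256u) ? -26 : 0;
}

/* Two shifts and an AND.  */
static inline int32_t select_shifts(uint32_t t)
{
    return bit_splat(t, 8) & -26;
}

/* Build the 0/-1 mask from the result of the comparison instead,
   roughly the "set from flags" idea: negate a 0/1 truth value.  */
static inline int32_t select_flags(uint32_t t)
{
    return -(int32_t)((t & 256u) != 0) & -26;
}
```

All three functions should return -26 when bit 8 of t is set and 0 otherwise;
which one yields the best code is of course target dependent.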

I'm not sure whether two shifts and an AND are a good form from which to
recover an optimal branch-free insn sequence later.
