https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125905

            Bug ID: 125905
           Summary: Improve if-conversion when true/false values are
                    closely related
           Product: gcc
           Version: 17.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: law at gcc dot gnu.org
  Target Milestone: ---

Consider this code:


unsigned long
cond_mask_0 (bool flag, unsigned long mask, unsigned long target)
{
  return flag ? target | mask : target & ~mask;
}

Compiled with -march=rv64gcbv_zicond, we get something like this:

        or      a5,a1,a2        # 37    [c=4 l=4]  *iordi3/0
        andn    a1,a2,a1        # 38    [c=4 l=4]  and_notdi3
        czero.eqz       a5,a5,a0        # 41    [c=4 l=4]  *czero.eqz.didi
        czero.nez       a0,a1,a0        # 40    [c=4 l=4]  *czero.nez.didi
        add     a0,a5,a0        # 42    [c=4 l=4]  *adddi3/0
        ret             # 52    [c=0 l=4]  simple_return

That's not bad.  But we can do better.  The trick is to realize that the two
values we're selecting across are closely related.

We essentially have the following, with c = flag, x = mask and y = target:

result = c ? x | y : ~x & y;

And we know that x | (~x & y) == x | y

So a bit of substitution:

result = c ? x | (~x & y) : ~x & y;

And factoring

t = (~x & y);
result = c ? x | t : t;
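
As a sanity check on the substitution and factoring above, here is a minimal C sketch that compares the original select against the factored one. The names cond_mask_ref, cond_mask_factored and check_factored are hypothetical and exist only for this illustration; nothing here is taken from GCC itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Original form from the report: x is the mask, y is the target.  */
unsigned long
cond_mask_ref (bool c, unsigned long x, unsigned long y)
{
  return c ? (x | y) : (~x & y);
}

/* Factored form: t is computed unconditionally and x is ORed in
   only when c is true.  Relies on x | (~x & y) == x | y.  */
unsigned long
cond_mask_factored (bool c, unsigned long x, unsigned long y)
{
  unsigned long t = ~x & y;
  return c ? (x | t) : t;
}

/* Hypothetical helper: compare both forms over all 4-bit operand
   pairs.  The operations are bitwise, so small operands already
   exercise every per-bit combination.  */
void
check_factored (void)
{
  for (unsigned long x = 0; x < 16; x++)
    for (unsigned long y = 0; y < 16; y++)
      {
        assert (cond_mask_factored (true, x, y) == cond_mask_ref (true, x, y));
        assert (cond_mask_factored (false, x, y) == cond_mask_ref (false, x, y));
      }
}
```

Calling check_factored () from a test driver should complete without tripping an assertion.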

So it's just a conditional ior:

  andn t0, a2, a1
  czero.eqz t1, a1, a0
  add a0, t0, t1

So 5->3 instructions for the select.  Probably not any faster on a 2+ wide
core, but still worth doing.
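
To make the shorter sequence concrete, the sketch below models it in C one operation at a time. It assumes the Zicond semantics of czero.eqz (the result is zero when the condition operand is zero, otherwise the source operand) and assumes the conditional operand is the mask. The helper names czero_eqz, cond_mask_seq and check_seq are hypothetical. The final add can stand in for an ior because the mask bits have already been cleared from the other operand, so no carries can occur.

```c
#include <assert.h>
#include <stdbool.h>

/* Reference: the function from the report.  */
unsigned long
cond_mask_0 (bool flag, unsigned long mask, unsigned long target)
{
  return flag ? target | mask : target & ~mask;
}

/* Hypothetical model of czero.eqz: 0 when cond == 0, else val.  */
unsigned long
czero_eqz (unsigned long val, unsigned long cond)
{
  return cond == 0 ? 0 : val;
}

/* Hypothetical model of the three-instruction sequence.  */
unsigned long
cond_mask_seq (bool flag, unsigned long mask, unsigned long target)
{
  unsigned long t0 = target & ~mask;          /* andn: clear the mask bits */
  unsigned long t1 = czero_eqz (mask, flag);  /* keep mask only if flag set */
  return t0 + t1;                             /* disjoint bits, so add == ior */
}

/* Hypothetical helper: compare against the reference on small operands
   plus an all-ones mask.  */
void
check_seq (void)
{
  for (unsigned long m = 0; m < 16; m++)
    for (unsigned long t = 0; t < 16; t++)
      {
        assert (cond_mask_seq (true, m, t) == cond_mask_0 (true, m, t));
        assert (cond_mask_seq (false, m, t) == cond_mask_0 (false, m, t));
      }
  assert (cond_mask_seq (true, ~0UL, 0x1234UL) == cond_mask_0 (true, ~0UL, 0x1234UL));
  assert (cond_mask_seq (false, ~0UL, 0x1234UL) == cond_mask_0 (false, ~0UL, 0x1234UL));
}
```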

I haven't thought a ton about implementation details.  I did confirm that we
see the form we want in noce_try_cond_arith:

Breakpoint 1, noce_try_cond_arith (if_info=0x7fffffffe3d0) at /home/jlaw/test/gcc/gcc/ifcvt.cc:3232
3232      rtx cond = if_info->cond;
(ior:DI (reg:DI 147 [ mask ])
    (reg:DI 148 [ target ]))
$8 = void
(and:DI (not:DI (reg:DI 147 [ mask ]))
    (reg:DI 148 [ target ]))
$9 = void

We could try to optimize/canonicalize the arms in here or its caller.  I would
expect that if it's canonicalized into a conditional IOR the right things will
just happen in ifcvt.

Note this happens in if-conversion *after* reload.  So the most natural place
to clean some of this up (combine/simplify-rtx) isn't applicable.
