On Fri, Mar 13, 2020 at 03:21:18AM +0000, Yangfei (Felix) wrote:
> > On Wed, Mar 04, 2020 at 08:39:36AM +0000, Yangfei (Felix) wrote:
> > >   This is a simple fix for PR94026.
> > >   With this fix, combine will try make an extraction if we are in
> > >   an equality comparison and this is an AND with a constant which
> > >   is power of two minus one.  Shift here should be an constant.
> > >   For example, combine will transform
> > >   (compare (and (lshiftrt x 8) 6) 0) to
> > >   (compare (zero_extract (x 2 9)) 0).
> > 
> > Why is that a good thing?
> 
> The reported test case is reduced from spec2017 541.leela_r.  I have pasted 
> original code snippet on the bugzilla.  
> We found other compilers like aocc/llvm can catch this pattern and simplify 
> it.  

That wasn't my question, let me rephrase: why would writing it as
zero_extract (instead of as a more canonical form) be wanted?

The aarch64 backend only has zero_extract formulations for most of the
bitfield instructions.  If you fix that problem, all of this should go
away?  Like, the testcase in the PR starts with

Trying 7 -> 8:
    7: r99:SI=r103:SI>>r104:SI#0
      REG_DEAD r104:SI
      REG_DEAD r103:SI
    8: r100:SI=r99:SI&0x6
      REG_DEAD r99:SI
Failed to match this instruction:
(set (reg:SI 100)
    (and:SI (ashiftrt:SI (reg:SI 103)
            (subreg:QI (reg:SI 104) 0))
        (const_int 6 [0x6])))

and that should match already (that's an ubfm (ubfx))?


> > (There should be thorough tests on many archs, showing it helps on average,
> > and it doesn't regress anything.  I can do that for you, but not right now).
> 
> I only have aarch64 & x86_64 linux available and have tested this patch with 
> spec17 on both platforms.  
> No obvious improvement & regression witnessed.  This is expected as only one 
> instruction is reduced here.  

What should be tested is what new combinations are done, and which are
*no longer* done.

> > In general, we should have *fewer* zero_extract, not more.

Some reasons for that:

1) All those can be expressed with simpler operations as well;
2) Most very similar expressions cannot be expressed as zero_extract,
although many architectures can handle (some of) those just fine;
3) The optimizers do not handle zero_extract very well at all (this
includes simplify-rtx, to start with).
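
To make point 1 concrete, here is a minimal C sketch of what a
zero_extract computes, written with nothing but a shift and an AND.
The function names are made up for this illustration (they are not GCC
code), and the position is assumed to count from the least significant
bit, as it does when BITS_BIG_ENDIAN is false.  The last two functions
are the two forms of the comparison from the quoted patch description.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the value of (zero_extract x size pos) on a
   32-bit operand, with pos counted from the least significant bit.  */
static uint32_t zero_extract32(uint32_t x, unsigned size, unsigned pos)
{
    /* Keep the shift counts in range so the C code is well defined.  */
    assert(size >= 1 && size <= 31 && pos + size <= 32);
    return (x >> pos) & ((UINT32_C(1) << size) - 1);
}

/* (compare (and (lshiftrt x 8) 6) 0): tests bits 9 and 10 of x.  */
static int test_and_form(uint32_t x)
{
    return ((x >> 8) & 6) != 0;
}

/* (compare (zero_extract x 2 9) 0): tests the same two bits.  */
static int test_extract_form(uint32_t x)
{
    return zero_extract32(x, 2, 9) != 0;
}
```

Both test functions give the same answer for every x, so the
zero_extract form adds no expressive power here; it is only a
different spelling of the same shift and mask.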

sign_extract is nastier -- we really want to have a sign_extend that
works on separate bits, not as coarse as address units as we have now --
but it currently isn't handled much either.
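
As a rough illustration of what a sign extension on an arbitrary bit
width means, here is a small C sketch.  The name sign_extract32 is
hypothetical, the position is again assumed to count from the least
significant bit, and this only shows the arithmetic, not how it would
be represented in RTL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: take the size-bit field of x starting at bit
   pos and sign-extend it from its own top bit to 32 bits.  */
static int32_t sign_extract32(uint32_t x, unsigned size, unsigned pos)
{
    /* Keep the shift counts in range so the C code is well defined.  */
    assert(size >= 1 && size <= 31 && pos + size <= 32);
    uint32_t field = (x >> pos) & ((UINT32_C(1) << size) - 1);
    uint32_t sign = UINT32_C(1) << (size - 1);
    /* Flip the field's sign bit, then subtract it again: this copies
       the sign bit into all higher bits.  The final conversion assumes
       a two's complement int32_t.  */
    return (int32_t)((field ^ sign) - sign);
}
```

For example, a 3-bit field holding binary 111 comes out as -1, and
one holding binary 011 comes out as 3.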


Segher
