On 10/13/2015 07:12 AM, Dominik Vogt wrote:
In some cases, the work of the cse1 pass is counterproductive, as
we noticed on s390x.  The effect described below is present since
at least 4.8.0.  Note that this may not manifest as a
performance problem on all platforms.  Also note that -O1
does not show this behaviour because the responsible code is only
executed with -O2 or higher.

The core of the problem is the way cse1 sometimes handles function
parameters.  Roughly, the observed situation is

Before cse1:

   start of function
   set pseudoreg Rp to the first argument from hardreg R2
   (some code that uses Rp)
   set R2 to Rp

After cse1:

   start of function
   set pseudoreg Rp to the first argument from hardreg R2
   (some code that uses Rp)  <--- The use of Rp is still present
   set R2 to R2              <--- cse1 has replaced Rp with R2

After that, the set pattern is removed completely, and now both
Rp and R2 are live in the sketched code snippet.  Because R2 is
still supposed to be live later on, the ira pass chooses a
different hard register (R1) for Rp, and code to copy R1 back to
R2 is added later.  (See further down for Rtl and assembly code.)
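
To illustrate why the deleted copy matters to the allocator, here is a
minimal sketch in C of the two situations as live ranges over
instruction indices.  This is a toy model, not GCC code: the struct, the
function names and the instruction numbering are assumptions made purely
for illustration.

```c
#include <stdbool.h>

/* Toy model, not GCC code: a live range runs from the instruction that
   defines a value to the last instruction that uses it.  */
struct live_range { int def; int last_use; };

/* Two ranges conflict if they overlap in more than a single point.  A
   range that ends at the instruction where another begins (a plain
   copy) does not conflict, so both can share one hard register.  */
static bool ranges_conflict (struct live_range a, struct live_range b)
{
  return a.def < b.last_use && b.def < a.last_use;
}

/* Assumed numbering: 0 = function entry, 1 = "Rp := R2",
   2 = code using Rp, 3 = "R2 := Rp", 4 = later use of R2.  */

/* Before cse1: can Rp be allocated to R2?  */
static bool rp_may_use_r2_before_cse1 (void)
{
  struct live_range r2_incoming = { 0, 1 };
  struct live_range rp          = { 1, 3 };
  struct live_range r2_outgoing = { 3, 4 };
  return !ranges_conflict (rp, r2_incoming)
         && !ranges_conflict (rp, r2_outgoing);
}

/* After cse1: insn 3 became "R2 := R2" and was deleted, so R2 stays
   live from entry to insn 4 while Rp is live from 1 to 2.  */
static bool rp_may_use_r2_after_cse1 (void)
{
  struct live_range r2 = { 0, 4 };
  struct live_range rp = { 1, 2 };
  return !ranges_conflict (rp, r2);
}
```

In this model Rp can share R2 before cse1 but not afterwards, which
corresponds to the allocator having to pick a different hard register.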

--

There seems to be code to prevent this in cse.c:hash_rtx_cb() as a
comment from that function suggests:

     /* On some machines, we can't record any non-fixed hard register,
        because extending its life will cause reload problems.  We
        consider ap, fp, sp, gp to be fixed for this purpose.
This is referring to the inability to reload those objects.  It's a correctness concern, not a performance concern, with those registers.


     ...

Unfortunately this is not caused by hashing but by the code
dealing with src_related in cse_insn().  When cse_insn() handles
the "copy Rp to R2" instruction, it does nothing up to line 5020
and sets src_related there:

   /* This is the same as the destination of the insns, we want
      to prefer it.  Copy it to src_related.  The code below will
      then give it a negative cost.  */
   if (GET_CODE (dest) == code && rtx_equal_p (p->exp, dest))
     src_related = dest;

Eventually, the term src_related is used to replace the source
expression of the set pattern.  So, while the above comment may be
applicable to hashed expressions that are considered for
replacement, there's no such "safety net" for the expressions
src_related, src_folded etc.  I guess if there was, that would fix
the issue.
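
A "safety net" of this kind could look roughly like the following
sketch.  It is a self-contained toy, not the attached patch and not
GCC's internal API: the register numbering, TOY_FIRST_PSEUDO, the
fixed-register mask and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical numbering: registers 0..15 are hard registers,
   everything from TOY_FIRST_PSEUDO upwards is a pseudo register.  */
enum { TOY_FIRST_PSEUDO = 16 };

static bool toy_is_hard_reg (unsigned regno)
{
  return regno < TOY_FIRST_PSEUDO;
}

/* FIXED_MASK has bit N set if hard register N is fixed (e.g. sp, fp).  */
static bool toy_is_fixed (unsigned regno, uint32_t fixed_mask)
{
  return toy_is_hard_reg (regno) && ((fixed_mask >> regno) & 1u) != 0;
}

/* Return true if replacing the source OLD_SRC by NEW_SRC should be
   rejected because it would swap a pseudo for a non-fixed hard register
   and thereby extend that hard register's life.  */
static bool toy_reject_replacement (unsigned old_src, unsigned new_src,
                                    uint32_t fixed_mask)
{
  return !toy_is_hard_reg (old_src)
         && toy_is_hard_reg (new_src)
         && !toy_is_fixed (new_src, fixed_mask);
}
```

With Rp modelled as pseudo 20 and R2 as hard register 2, the check
rejects the "Rp -> R2" substitution, while substitutions by a fixed
register or by another pseudo still pass.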

--

So, I've made an experimental hack (see attachment) and tried
that.  In a larger test suite, register copies could be saved in
quite some places (including the test program below), but in other
places new register copies were introduced, resulting in about
twice as many "issues" as without the patch.

Maybe the patch is just too coarse.  In general I'd assume that
the register allocator does a better job of assigning hard
registers to pseudo registers.  Is it possible to better describe
when cse1 should keep its hands off pseudo registers?
We don't really have a way to describe this.

I know Vlad looked at problems in this space -- essentially knowing when two registers had the same value in the allocators/reload and exploiting that information.

My recollection was it didn't help in any measurable way -- I think he discussed it during one of the old GCC summit conferences. That was also in the reload era.

Ultimately this feels like all the issues around coalescing and copy-propagation. With that in mind, if we had lifetime & conflict information, then we'd be able to query that and perhaps be able to make different choices.

I wonder if the web-izer pass could help here or something based on it. Essentially what you want to do is a range split.


Jeff

