On 04/21/2015 08:24 AM, Jiong Wang wrote:

Jiong Wang writes:

2015-04-14 18:24 GMT+01:00 Jeff Law <l...@redhat.com>:
On 04/14/2015 10:48 AM, Steven Bosscher wrote:

So I think this stage2/3 binary difference is acceptable?


No, they should be identical. If there's a difference, then there's a
bug - which, it seems, you've already found, too.

RIght.  And so the natural question is how to fix.

At first glance it would seem like having this new code ignore dependencies
rising from debug insns would work.

Which then begs the question, what happens to the debug insn -- it's
certainly not going to be correct anymore if the transformation is made.

Exactly.

The debug_insn 2776 in my example is to record the base address of a
local array. the new code is doing correctly here by not shuffling the
operands of insn 2556 and 2557 as there is additional reference of
reg:1473 from debug insn, although the code will still execute correctly
if we do the transformation.

my understanding to fix this:

   * delete the out-of-date mismatch debug_insn? as there is no guarantee
     to generate accurate debug info under -O2.

     IMO, this debug_insn may affect "DW_AT_location" field for variable
     descrption of "classes" in .debug_info section, but it's omitted in
     the final output already.

     <3><38a4d>: Abbrev Number: 137 (DW_TAG_variable)
     <38a4f>   DW_AT_name : (indirect string, offset: 0x18db): classes
     <38a53>   DW_AT_decl_file   : 1
     <38a54>   DW_AT_decl_line   : 548
     <38a56>   DW_AT_type        : <0x38cb4>

   * update the debug_insn? if the following change is OK with dwarf standard

    from

      insn0: reg0 = fp + reg1
      debug_insn: var_loc = reg0 + const_off
      insn1: reg2 = reg0 + const_off

    to

      insn0: reg0 = fp + const_off
      debug_insn: var_loc = reg0 + reg1
      insn1: reg2 = reg0 + reg1

Thanks,


And attachment is the new patch which will update debug_insn as
described in the second solution above.

Now the stage2/3 binary differences on AArch64 gone away. Bootstrap OK.

On AArch64, this patch give 600+ new rtl loop invariants found across
spec2k6 float. +4.5% perf improvement on 436.cactusADM because four new
invariants found in the critical function "regex_compile".

The similar improvements may be achieved on other RISC backends like
powerpc/mips I guess.

One thing to mention, for AArch64, one minor glitch in
aarch64_legitimize_address needs to be fixed to let this patch take
effect, I will send out that patch later as it's a seperate issue.
Powerpc/Mips don't have this glitch in LEGITIMIZE_ADDRESS hook, so
should be OK, and I verified the base address of local array in the
testcase given by Seb on pr62173 do hoisted on ppc64 now. I think
pr62173 is fixed on those 64bit arch by this patch.

Thoughts?

Thanks.

2015-04-21  Jiong Wang  <jiong.w...@arm.com>

gcc/
   * loop-invariant.c (find_defs): Enable DF_DU_CHAIN build.
   (vfp_const_iv): New hash table.
   (expensive_addr_check_p): New boolean.
   (init_inv_motion_data): Initialize new variables.>
   (free_inv_motion_data): Release hash table.
   (create_new_invariant): Set cheap_address to false for iv in
   vfp_const_iv table.
   (find_invariant_insn): Skip dependencies check for iv in vfp_const_iv
   table.
   (use_for_single_du): New function.
   (reshuffle_insn_with_vfp): Likewise.
   (find_invariants_bb): Call reshuffle_insn_with_vfp.

gcc/testsuite/
    * gcc.dg/pr62173.c: New testcase.
So ultimately isn't this just a specialized version of reassociation of pointer arithmetic? And if so, don't we have all the usual problems around introducing overflow?

ISTM if this is going to go forward (ie, we somehow convince ourselves that this kind of reassociation is OK), then should it be made to apply on pointers in general rather than restricting to stack, frame, virtual-frame?

I wish I'd looked more closely at the patch the first time around so that I could have raised these questions sooner.

Jeff


Reply via email to