On Thu, Nov 28, 2013 at 9:18 PM, Vladimir Makarov <vmaka...@redhat.com> wrote:
> On 11/28/2013, 7:51 PM, H.J. Lu wrote:
>>
>> On Thu, Nov 28, 2013 at 2:11 PM, Vladimir Makarov <vmaka...@redhat.com>
>> wrote:
>>>
>>>    The following patch fixes PR57293
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57293
>>>
>>>    It is actually an implementation of missed LRA functionality in reg
>>> elimination.  Before the patch any explicit change of stack pointer in
>>> RTL resulted in necessity to use the frame pointer.
>>>
>>>    The patch has practically no effect on generic tuning of x86/x86-64.
>>> But it has a dramatic effect on code performance for other tunings
>>> like corei7 which don't use incoming args accumulation.  The maximum
>>> SPEC2000 improvement 2.5% is achieved on x86 SPECInt2000.  But
>>> SPECFP2000 rate also has improvement about 1% on x86 and x86-64.  Too
>>> bad that I did not implement it at the first place.  The results would
>>> have been even much better ones reported on 2012 GNU Cauldron as I
>>> also used -mtune=corei7 that time.
>>>
>>> The patch was bootstrapped and tested on x86-64/x86 and ppc.
>>>
>>> Committed as rev. 205498.
>>>
>>>   2013-11-28  Vladimir Makarov<vmaka...@redhat.com>
>>>
>>>          PR target/57293
>>>          * ira.h (ira_setup_eliminable_regset): Remove parameter.
>>>          * ira.c (ira_setup_eliminable_regset): Ditto.  Add
>>>          SUPPORTS_STACK_ALIGNMENT for crtl->stack_realign_needed.
>>>          Don't call lra_init_elimination.
>>>          (ira): Call ira_setup_eliminable_regset without arguments.
>>>          * loop-invariant.c (calculate_loop_reg_pressure): Remove
>>> argument
>>>          from ira_setup_eliminable_regset call.
>>>          * gcse.c (calculate_bb_reg_pressure): Ditto.
>>>          * haifa-sched.c (sched_init): Ditto.
>>>          * lra.h (lra_init_elimination): Remove the prototype.
>>>          * lra-int.h (lra_insn_recog_data): New member sp_offset.  Move
>>>          used_insn_alternative upper.
>>>          (lra_eliminate_regs_1): Add one more parameter.
>>>          (lra-eliminate): Ditto.
>>>          * lra.c (lra_invalidate_insn_data): Set sp_offset.
>>>          (setup_sp_offset): New.
>>>          (lra_process_new_insns): Call setup_sp_offset.
>>>          (lra): Add argument to lra_eliminate calls.
>>>          * lra-constraints.c (get_equiv_substitution): Rename to
>>> get_equiv.
>>>          (get_equiv_with_elimination): New.
>>>          (process_addr_reg): Call get_equiv_with_elimination instead of
>>>          get_equiv_substitution.
>>>          (equiv_address_substitution): Ditto.
>>>          (loc_equivalence_change_p): Ditto.
>>>          (loc_equivalence_callback, lra_constraints): Ditto.
>>>          (curr_insn_transform): Ditto.  Print the sp offset
>>>          (process_alt_operands): Prevent stack pointer reloads.
>>>          (lra_constraints): Remove one argument from lra_eliminate call.
>>>          Move it up.  Mark used hard regs bfore it.  Use
>>>          get_equiv_with_elimination instead of get_equiv_substitution.
>>>          * lra-eliminations.c (lra_eliminate_regs_1): Add parameter and
>>>          assert for param values combination.  Use sp offset.  Add
>>> argument
>>>          to lra_eliminate_regs_1 calls.
>>>          (lra_eliminate_regs): Add argument to lra_eliminate_regs_1 call.
>>>          (curr_sp_change): New static var.
>>>          (mark_not_eliminable): Add parameter.  Update curr_sp_change.
>>>          Don't prevent elimination to sp if we can calculate its change.
>>>          Pass the argument to mark_not_eliminable calls.
>>>          (eliminate_regs_in_insn): Add a parameter.  Use sp offset.  Add
>>>          argument to lra_eliminate_regs_1 call.
>>>          (update_reg_eliminate): Move calculation of hard regs for spill
>>>          lower.  Switch off lra_in_progress temporarily to generate regs
>>>          involved into elimination.
>>>          (lra_init_elimination): Rename to init_elimination.  Make it
>>>          static.  Set up insn sp offset, check the offsets at the end of
>>>          BBs.
>>>          (process_insn_for_elimination): Add parameter.  Pass its value
>>> to
>>>          eliminate_regs_in_insn.
>>>          (lra_eliminate): : Add parameter.  Pass its value to
>>>          process_insn_for_elimination.  Add assert for param values
>>>          combination.  Call init_elimination.  Don't update offsets in
>>>          equivalence substitutions.
>>>          * lra-spills.c (assign_mem_slot): Don't call
>>> lra_eliminate_regs_1
>>>          for created stack slot.
>>>          (remove_pseudos): Call lra_eliminate_regs_1 before changing
>>> memory
>>>          onto stack slot.
>>>
>>
>> Hi Vladimir,
>>
>> Thanks for your hard work.   I noticed a few regressions
>> on x86-64:
>>

>> FAIL: gcc.target/i386/pr9771-1.c (internal compiler error)
>>
>
> Well, I guess it can happen when you don't use frame-pointer anymore.
>
>> FAIL: gcc.target/i386/pr9771-1.c (internal compiler error)
>> FAIL: gcc.target/i386/pr9771-1.c (test for excess errors)
>>
>
> May be it is wrong to use this test with this tuning as reload also fails on
> this test when -march=corei7 is used.  Or most probably the problem is
> somewhere else.  The patch just triggered it.
>

There is a comment:

   FIXME: the flags is incorrectly enabled for amdfam10, Bulldozer,
   Bobcat and Generic.  This is because disabling it causes large
   regression on mgrid due to IRA limitation leading to unecessary
   use of the frame pointer in 32bit mode.  */
DEF_TUNE (X86_TUNE_ACCUMULATE_OUTGOING_ARGS, "accumulate_outgoing_args",
          m_PPRO | m_P4_NOCONA | m_ATOM | m_SLM | m_AMD_MULTIPLE | m_GENERIC)

Does this mean we can enable it for m_AMD_MULTIPLE and
m_GENERIC?  If this bug is fixed, we should add

-mtune-ctrl=accumulate_outgoing_args

to gcc.target/i386/pr9771-1.c since it may be disabled by
default.

Thanks.

-- 
H.J.

Reply via email to