Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-25 Thread Vladimir Makarov

On 04/25/2013 11:29 AM, Vladimir Makarov wrote:

On 04/24/2013 03:42 PM, Michael Meissner wrote:

I'm seeing a lot of failures with these changes in make check.


I've reproduced it, Mike.  I'll work on it.  Thanks.


I've fixed it, Mike.  It is on the branch now.

2013-04-25  Vladimir Makarov  

* config/rs6000/rs6000.c (rs6000_emit_move): Use assigned hard reg
for LRA SD moves.

Index: config/rs6000/rs6000.c
===
--- config/rs6000/rs6000.c  (revision 198263)
+++ config/rs6000/rs6000.c  (working copy)
@@ -7377,9 +7377,14 @@ rs6000_emit_move (rtx dest, rtx source,

   if (regno >= FIRST_PSEUDO_REGISTER)
{
- cl = reg_preferred_class (regno);
- gcc_assert (cl != NO_REGS);
- regno = ira_class_hard_regs[cl][0];
+ if (reg_renumber[regno] >= 0)
+   regno = reg_renumber[regno];
+ else
+   {
+ cl = reg_preferred_class (regno);
+ gcc_assert (cl != NO_REGS);
+ regno = ira_class_hard_regs[cl][0];
+   }
}
   if (FP_REGNO_P (regno))
{
@@ -7407,9 +7412,14 @@ rs6000_emit_move (rtx dest, rtx source,

   if (regno >= FIRST_PSEUDO_REGISTER)
{
- cl = reg_preferred_class (regno);
- gcc_assert (cl != NO_REGS);
- regno = ira_class_hard_regs[cl][0];
+ if (reg_renumber[regno] >= 0)
+   regno = reg_renumber[regno];
+ else
+   {
+ cl = reg_preferred_class (regno);
+ gcc_assert (cl != NO_REGS);
+ regno = ira_class_hard_regs[cl][0];
+   }
}
   if (FP_REGNO_P (regno))
{





Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-25 Thread Vladimir Makarov

On 04/24/2013 03:42 PM, Michael Meissner wrote:

I'm seeing a lot of failures with these changes in make check.


I've reproduced it, Mike.  I'll work on it.  Thanks.



Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-24 Thread Michael Meissner
I'm seeing a lot of failures with these changes in make check.

The first two that I noticed on a build that did not use --with-cpu=power7:

1) c-c++-common/dfp/call-by-value.c (and others in the directory) fails with -O0
for all targets before power7 because it can't spill SDmode.  Note, in the
earlier targets, we need to have a wider spill slot because we don't have
32-bit integer load/store to the FPR registers.

FAIL: c-c++-common/dfp/call-by-value.c (internal compiler error)
FAIL: c-c++-common/dfp/call-by-value.c (test for excess errors)
Excess errors:
/home/meissner/fsf-src/meissner-lra/gcc/testsuite/c-c++-common/dfp/call-by-value.c:43:1:
 internal compiler error: in assign_by_spills, at lra-assigns.c:1268
0x104ceff7 assign_by_spills
/home/meissner/fsf-src/meissner-lra/gcc/lra-assigns.c:1268
0x104cfe43 lra_assign()
/home/meissner/fsf-src/meissner-lra/gcc/lra-assigns.c:1425
0x104ca837 lra(_IO_FILE*)
/home/meissner/fsf-src/meissner-lra/gcc/lra.c:2309
0x1047d6eb do_reload
/home/meissner/fsf-src/meissner-lra/gcc/ira.c:4619
0x1047d6eb rest_of_handle_reload
/home/meissner/fsf-src/meissner-lra/gcc/ira.c:4731

2) A lot of fortran tests are failing for all optimization levels due to
segmentation violations at runtime:

AIL: gfortran.dg/advance_1.f90  -O0  execution test
Executing on host: 
/home/meissner/fsf-build-ppc64/meissner-lra/gcc/testsuite/gfortran1/../../gfortran
 -B/home/meissner/fsf-build-ppc64/meissner-lra/gcc/testsuite/gfortran1/../../ 
-B/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unkno
wn-linux-gnu/./libgfortran/ 
/home/meissner/fsf-src/meissner-lra/gcc/testsuite/gfortran.dg/advance_1.f90  
-fno-diagnostics-show-caret   -O1   -pedantic-errors  
-B/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libg
fortran/.libs 
-L/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs
 
-L/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs
  -lm   -m32 -o ./advance_1.exe(t
imeout = 300)
spawn 
/home/meissner/fsf-build-ppc64/meissner-lra/gcc/testsuite/gfortran1/../../gfortran
 -B/home/meissner/fsf-build-ppc64/meissner-lra/gcc/testsuite/gfortran1/../../ 
-B/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/
./libgfortran/ 
/home/meissner/fsf-src/meissner-lra/gcc/testsuite/gfortran.dg/advance_1.f90 
-fno-diagnostics-show-caret -O1 -pedantic-errors 
-B/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs
 -L/ho
me/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs
 
-L/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs
 -lm -m32 -o ./advance_1.exe
PASS: gfortran.dg/advance_1.f90  -O1  (test for excess errors)
Setting LD_LIBRARY_PATH to 
.:/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs:/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs:/home/meissner/fsf-build-
ppc64/meissner-lra/gcc:/home/meissner/fsf-build-ppc64/meissner-lra/gcc/32:.:/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs:/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu
/32/libgfortran/.libs:/home/meissner/fsf-build-ppc64/meissner-lra/gcc:/home/meissner/fsf-build-ppc64/meissner-lra/gcc/32:/home/meissner/tools/ppc64/lib:/home/meissner/tools/ppc32/lib:/home/meissner/tools-binutils/ppc64/lib:/home/meissner/to
ols-binutils/ppc32/lib
spawn [open ...]

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-23 Thread David Edelsohn
On Tue, Apr 23, 2013 at 11:37 AM, Vladimir Makarov  wrote:

> Sorry, may be I missed some other places but
> could you be more specific about the place where combining lra_in_progress
> and mode happens now for legitimate address querying.
>
> I guess I explained in my previous emails why the following change is
> necessary in legitimate_lo_sum_address_p:
>
>   toc_ok_p = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
>   && small_toc_ref (x, VOIDmode));
>
> lra_in_progress was removed in my latest change which now looks like the
> original code:
>
>   if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
>
>   && !(/* ??? Assume floating point reg based on mode?  */
>TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
>&& (mode == DFmode || mode == DDmode)))
> return false;
>
> So I can not see other places.

Okay. I misunderstood the change. I thought that you were adding
lra_in_progress tied to the mode.

Thanks, David


Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-23 Thread Vladimir Makarov

On 04/23/2013 11:33 AM, David Edelsohn wrote:

On Tue, Apr 23, 2013 at 11:03 AM, Vladimir Makarov  wrote:


   LRA approach is different from reload one.  First of all LRA can create
pseudos (not assigned yet to anything) and secondly LRA makes changes
incrementally opposite to reload which makes all final rewriting code when
it decides that a final solution is found.  The difference in approaches
means that LRA can not use all reload macros and hooks completely without
changes.  Although I tried to minimize the changes, they are still present.
For example, LRA will never use hooks which call push_reload which is
completely oriented to reload pass.

   I'd not say that "the code uniformly returned consistent values". One
place where lra_in_progress is needed exactly because of discrepancy between
legitimize reload address hook (which can call push_reload and can not be
used by LRA) and legitimate address hook.  Two other places in
rs6000_emit_move for SDmode where lra_in_progress is used are because LRA
can create and use spilled pseudos instead of using concrete memory slot as
reload does.  And two the rest places are in calls of
legitimate_const_pool_address_p where LRA actually querying in strict sense.
So I think the changes are minimal.

David, thanks for expressing the concern.  I hope I answered it. The future
will show the reality not just our speculations.


Vlad,

I don't think that you understood my concern.  I am not concerned
about adding lra_in_progress.  I am concerned about combining
lra_in_progress with the mode.  I would like an explanation why it is
correct for the result of legitimate address functions to vary with
lra_in_progress combined with the mode.  Not simply that it works.

The issue should be if an offsettable address is valid and the size of
the offset.


Sorry, may be I missed some other places but
could you be more specific about the place where combining 
lra_in_progress and mode happens now for legitimate address querying.


I guess I explained in my previous emails why the following change is 
necessary in legitimate_lo_sum_address_p:


  toc_ok_p = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
  && small_toc_ref (x, VOIDmode));

lra_in_progress was removed in my latest change which now looks like the 
original code:


  if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
  && !(/* ??? Assume floating point reg based on mode?  */
   TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
   && (mode == DFmode || mode == DDmode)))
return false;

So I can not see other places.


Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-23 Thread David Edelsohn
On Tue, Apr 23, 2013 at 11:03 AM, Vladimir Makarov  wrote:

>   LRA approach is different from reload one.  First of all LRA can create
> pseudos (not assigned yet to anything) and secondly LRA makes changes
> incrementally opposite to reload which makes all final rewriting code when
> it decides that a final solution is found.  The difference in approaches
> means that LRA can not use all reload macros and hooks completely without
> changes.  Although I tried to minimize the changes, they are still present.
> For example, LRA will never use hooks which call push_reload which is
> completely oriented to reload pass.
>
>   I'd not say that "the code uniformly returned consistent values". One
> place where lra_in_progress is needed exactly because of discrepancy between
> legitimize reload address hook (which can call push_reload and can not be
> used by LRA) and legitimate address hook.  Two other places in
> rs6000_emit_move for SDmode where lra_in_progress is used are because LRA
> can create and use spilled pseudos instead of using concrete memory slot as
> reload does.  And two the rest places are in calls of
> legitimate_const_pool_address_p where LRA actually querying in strict sense.
> So I think the changes are minimal.
>
> David, thanks for expressing the concern.  I hope I answered it. The future
> will show the reality not just our speculations.


Vlad,

I don't think that you understood my concern.  I am not concerned
about adding lra_in_progress.  I am concerned about combining
lra_in_progress with the mode.  I would like an explanation why it is
correct for the result of legitimate address functions to vary with
lra_in_progress combined with the mode.  Not simply that it works.

The issue should be if an offsettable address is valid and the size of
the offset.

Thanks, David


Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-23 Thread Vladimir Makarov

On 04/22/2013 09:55 PM, David Edelsohn wrote:

On Mon, Apr 22, 2013 at 8:43 PM, Vladimir Makarov  wrote:


 * config/rs6000/rs6000.c (legitimate_lo_sum_address_p): Remove
 lra_in_progress guard for addressing something bigger than word.

That will work, but I'm worried that it is too fragile.  Previously
the code uniformly returned consistent values from the various
legitimate address functions.  Changing the response based on
lra_in_progress for various modes seems like a problem waiting to
happen.

  LRA approach is different from reload one.  First of all LRA can 
create pseudos (not assigned yet to anything) and secondly LRA makes 
changes incrementally opposite to reload which makes all final rewriting 
code when it decides that a final solution is found.  The difference in 
approaches means that LRA can not use all reload macros and hooks 
completely without changes.  Although I tried to minimize the changes, 
they are still present.  For example, LRA will never use hooks which 
call push_reload which is completely oriented to reload pass.


  I'd not say that "the code uniformly returned consistent values". One 
place where lra_in_progress is needed exactly because of discrepancy 
between legitimize reload address hook (which can call push_reload and 
can not be used by LRA) and legitimate address hook.  Two other places 
in rs6000_emit_move for SDmode where lra_in_progress is used are because 
LRA can create and use spilled pseudos instead of using concrete memory 
slot as reload does.  And two the rest places are in calls of 
legitimate_const_pool_address_p where LRA actually querying in strict 
sense.  So I think the changes are minimal.


David, thanks for expressing the concern.  I hope I answered it. The 
future will show the reality not just our speculations.




Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-22 Thread David Edelsohn
On Mon, Apr 22, 2013 at 8:43 PM, Vladimir Makarov  wrote:

> * config/rs6000/rs6000.c (legitimate_lo_sum_address_p): Remove
> lra_in_progress guard for addressing something bigger than word.

That will work, but I'm worried that it is too fragile.  Previously
the code uniformly returned consistent values from the various
legitimate address functions.  Changing the response based on
lra_in_progress for various modes seems like a problem waiting to
happen.

Thanks, David


Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-22 Thread Vladimir Makarov

On 13-04-22 3:31 PM, Michael Meissner wrote:

On Mon, Apr 22, 2013 at 03:26:45PM -0400, Vladimir Makarov wrote:

On 13-04-22 12:35 AM, Alan Modra wrote:

On Fri, Apr 19, 2013 at 05:16:43PM -0400, Vladimir Makarov wrote:

   I don't understand what this check means and what comments ??? means too.

A lo_sum mem is only valid if you know it won't be offset (or that
offsetting will never cross a 64k+32k boundary).  If the access is
smaller than a word then the load or store can be done in one insn.
No offset required.  If the access is a DFmode *and* you are loading
or storing a floating point reg, then the access is also done in one
insn.  The ??? comment is referring to the fact that you don't know
for sure that the DFmode is in a floating point reg.  It usually is,
but may be in two general purpose regs.  Which then need an offset to
load/store the second reg.


Alan, thanks for the explanation. I'll search for another solution.

I'm suspecting secondary_reload needs more tuning for TF/TD modes.

I've fixed dealII crash and commited the patch into the branch as rev. 
198169.


The dealII crash itself can be cured by treatment of lo_sum for LRA the 
same way as for reload (please see code checking modes for addressing 
more one word).  But in this case a few tests fail which is cured in LRA 
itself by trying to load address into pseudo using lo_sum.


The patch was successfully bootstrapped (--with-cpu=power7) and tested 
on GCC testsuite.


2013-04-22  Vladimir Makarov 

* lra-constraints.c (process_address): Try to put lo_sum into
register.
* config/rs6000/rs6000.c (legitimate_lo_sum_address_p): Remove
lra_in_progress guard for addressing something bigger than word.


Index: ChangeLog
===
--- ChangeLog   (revision 198101)
+++ ChangeLog   (working copy)
@@ -1,3 +1,10 @@
+2013-04-22  Vladimir Makarov  
+
+   * lra-constraints.c (process_address): Try to put lo_sum into
+   register.
+   * config/rs6000/rs6000.c (legitimate_lo_sum_address_p): Remove
+   lra_in_progress guard for addressing something bigger than word.
+
 2013-04-18  Vladimir Makarov  
 
* lra-constraints.c (check_and_process_move): Move code for move
Index: config/rs6000/rs6000.c
===
--- config/rs6000/rs6000.c  (revision 198101)
+++ config/rs6000/rs6000.c  (working copy)
@@ -5736,7 +5736,7 @@ legitimate_lo_sum_address_p (enum machin
return false;
   if (GET_MODE_NUNITS (mode) != 1)
return false;
-  if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD
+  if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
  && !(/* ??? Assume floating point reg based on mode?  */
   TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
   && (mode == DFmode || mode == DDmode)))
Index: lra-constraints.c
===
--- lra-constraints.c   (revision 198101)
+++ lra-constraints.c   (working copy)
@@ -2504,8 +2504,21 @@ process_address (int nop, rtx *before, r
*ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
if (! valid_address_p (ad.mode, *ad.outer, ad.as))
  {
-   *ad.inner = addr;
-   code = -1;
+   /* Try to put lo_sum into register.  */
+   insn = emit_insn (gen_rtx_SET
+ (VOIDmode, new_reg,
+  gen_rtx_LO_SUM (Pmode, new_reg, addr)));
+   code = recog_memoized (insn);
+   if (code >= 0)
+ {
+   *ad.inner = new_reg;
+   if (! valid_address_p (ad.mode, *ad.outer, ad.as))
+ {
+   *ad.inner = addr;
+   code = -1;
+ }
+ }
+   
  }
  }
if (code < 0)


Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-22 Thread Michael Meissner
On Mon, Apr 22, 2013 at 03:26:45PM -0400, Vladimir Makarov wrote:
> On 13-04-22 12:35 AM, Alan Modra wrote:
> >On Fri, Apr 19, 2013 at 05:16:43PM -0400, Vladimir Makarov wrote:
> >>   I don't understand what this check means and what comments ??? means too.
> >A lo_sum mem is only valid if you know it won't be offset (or that
> >offsetting will never cross a 64k+32k boundary).  If the access is
> >smaller than a word then the load or store can be done in one insn.
> >No offset required.  If the access is a DFmode *and* you are loading
> >or storing a floating point reg, then the access is also done in one
> >insn.  The ??? comment is referring to the fact that you don't know
> >for sure that the DFmode is in a floating point reg.  It usually is,
> >but may be in two general purpose regs.  Which then need an offset to
> >load/store the second reg.
> >
> Alan, thanks for the explanation. I'll search for another solution.

I'm suspecting secondary_reload needs more tuning for TF/TD modes.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-22 Thread Vladimir Makarov

On 13-04-22 12:35 AM, Alan Modra wrote:

On Fri, Apr 19, 2013 at 05:16:43PM -0400, Vladimir Makarov wrote:

   I don't understand what this check means and what comments ??? means too.

A lo_sum mem is only valid if you know it won't be offset (or that
offsetting will never cross a 64k+32k boundary).  If the access is
smaller than a word then the load or store can be done in one insn.
No offset required.  If the access is a DFmode *and* you are loading
or storing a floating point reg, then the access is also done in one
insn.  The ??? comment is referring to the fact that you don't know
for sure that the DFmode is in a floating point reg.  It usually is,
but may be in two general purpose regs.  Which then need an offset to
load/store the second reg.


Alan, thanks for the explanation. I'll search for another solution.



Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-21 Thread Alan Modra
On Fri, Apr 19, 2013 at 05:16:43PM -0400, Vladimir Makarov wrote:
>   If I change the above code to
> 
>   if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD
>   && !(/* ??? Assume floating point reg based on mode?  */
>TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
>&& (mode == DFmode || mode == DDmode || mode ==
> TFmode || mode == TDmode)))
> return false;
> 
>   the crash is gone.

A lo_sum address isn't always offsettable.  You know that you can't
read or write the whole TFmode with one instruction, and that the mem
will be accessed at addr+0 and addr+8 (and possibly at addr+4 and
addr+12).  If it so happens that addr is n*65536+32760, then addr+8
will overflow the 16-bit lo_sum and you'll actually try to access
n*65536-32768 for the second part.

So I don't think your change is correct.

>   I don't understand what this check means and what comments ??? means too.

A lo_sum mem is only valid if you know it won't be offset (or that
offsetting will never cross a 64k+32k boundary).  If the access is
smaller than a word then the load or store can be done in one insn.
No offset required.  If the access is a DFmode *and* you are loading
or storing a floating point reg, then the access is also done in one
insn.  The ??? comment is referring to the fact that you don't know
for sure that the DFmode is in a floating point reg.  It usually is,
but may be in two general purpose regs.  Which then need an offset to
load/store the second reg.

-- 
Alan Modra
Australia Development Lab, IBM


Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-19 Thread Vladimir Makarov

On 13-04-18 4:44 PM, Vladimir Makarov wrote:


Tomorrow I am going to look at SPEC2006 dealII crash for 32-bit mode.


  LRA crashes on insn

(insn 406 575 391 22 (set (reg:TF 35 3)
(mem/u/c:TF (lo_sum:SI (reg:SI 7 7 [414])
(symbol_ref/u:SI ("*.LC10") [flags 0x82])) [85 S16 
A128])) 
/home/cygnus/vmakarov/build/spec2006/benchspec/CPU2006/447.dealII/src/quadrature_lib.cc:80 
373 {*movtf_intern\

al}
 (expr_list:REG_DEAD (reg:SI 7 7 [414])
(nil)))

  As I understand this correct insn.  LRA checks the insn correctness in

if (!constrain_operands (1))
  fatal_insn_not_found (insn);

  and fails.  The reason for this code in 
rs6000.c::legitimate_lo_sum_address_p


  if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD
  && !(/* ??? Assume floating point reg based on mode?  */
   TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
   && (mode == DFmode || mode == DDmode)))
return false;

  lra_in_progress is false and mode==TFmode.  I don't want to setup 
lra_in_progress to true at this stage as I need right correctness check 
(lra_in_progress may affect code checking RTLcorrectness, make the test 
less rigourous).


  If I change the above code to

  if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD
  && !(/* ??? Assume floating point reg based on mode?  */
   TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
   && (mode == DFmode || mode == DDmode || mode == TFmode 
|| mode == TDmode)))

return false;

  the crash is gone.

  I don't understand what this check means and what comments ??? means too.

  So it is up to you, Mike, to decide is my change correct or not.

Thanks.




Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-18 Thread Michael Meissner
On Thu, Apr 18, 2013 at 04:44:21PM -0400, Vladimir Makarov wrote:
> LRA can fix the wrong address but secondary reload  was done before
> processing addresses.  It could be fixed in rs6000.c code too but it
> is complicated and I found a better (and i think more right)
> solution by moving secondary reload generation after address
> processing.

I tended to think secondary_reload should always happen after address
processing.

> Here is the patch for your branch (patch for trunk is a bit
> different as some changes in affected code were done on trunk).
> 
> 2013-04-18  Vladimir Makarov  
> 
> * lra-constraints.c (check_and_process_move): Move code for move
> cost check to simple_move_p.  Remove equiv_substitution.
> (simple_move_p): New function.
> (curr_insn_transform): Use the new function.  Move call of
> check_and_process_move after operand equiv substitution and
> address process.
> 
> Tomorrow I am going to look at SPEC2006 dealII crash for 32-bit mode.
> 

Thanks for the patch.  Unfortunately I just updated the branch to be merged up
to 198065.  So if you could send me a patch against current trunk, or update
the branch (branches/ibm/meissner-lra) and check it into the branch, it would
be appreciated.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-18 Thread Vladimir Makarov

On 04/17/2013 12:10 PM, Michael Meissner wrote:

On Wed, Apr 17, 2013 at 10:14:53AM -0400, Vladimir Makarov wrote:

Mike, thanks for the patch and all the SPEC2006 data  (which are
very useful as I have no access to power machine which can be used
for benchmarking).  I guess that may be some benchmark scores are
lower because of LRA lacks some micro-optimizations which reload
implements through many power hooks (e.g. LRA does not use push
reload).  Although sometimes it is not a bad thing (e.g. LRA does
not use  SECONDARY_MEMORY_NEEDED_RTX which permits to reuse the
stack slots for other useful things).

SECONDARY_MEMORY_NEEDED_RTX is needed for SDmode on machines before the power7,
where we need a larger stack slot to hold spilled SDmode values (power6 did not
have the LFIWZX instruction that is needed to load SDmode values into floating
point registers).

Thanks for the info.

In general I got impression that power7 is the most difficult port
for LRA.  If we manage to port it, LRA ports for other targets will
be easier.

I dunno, has LRA been ported to the SH yet?

Not yet.

Sorry for be inaccurate.  I meant 9 targets which I worked on to port LRA.

I also reproduced bootstrap failure --with-cpu=power7 and I am going
to work on this and after that on SPEC2006 you wrote about.


The bootstrap problem was in processing move whose operand was 
substituted by equiv. memory and the move needs secondary reload through 
a provided insn pattern.  The equiv memory was not legitimate and it 
resulted in failure to generated the secondary reload insn.


LRA can fix the wrong address but secondary reload  was done before 
processing addresses.  It could be fixed in rs6000.c code too but it is 
complicated and I found a better (and i think more right) solution by 
moving secondary reload generation after address processing.


Here is the patch for your branch (patch for trunk is a bit different as 
some changes in affected code were done on trunk).


2013-04-18  Vladimir Makarov  

* lra-constraints.c (check_and_process_move): Move code for move
cost check to simple_move_p.  Remove equiv_substitution.
(simple_move_p): New function.
(curr_insn_transform): Use the new function.  Move call of
check_and_process_move after operand equiv substitution and
address process.

Tomorrow I am going to look at SPEC2006 dealII crash for 32-bit mode.

Index: lra-constraints.c
===
--- lra-constraints.c   (revision 198028)
+++ lra-constraints.c   (working copy)
@@ -887,14 +887,6 @@ check_and_process_move (bool *change_p,
   lra_assert (curr_insn_set != NULL_RTX);
   dreg = dest = SET_DEST (curr_insn_set);
   sreg = src = SET_SRC (curr_insn_set);
-  /* Quick check on the right move insn which does not need
- reloads.  */
-  if ((dclass = get_op_class (dest)) != NO_REGS
-  && (sclass = get_op_class (src)) != NO_REGS
-  /* The backend guarantees that register moves of cost 2 never
-need reloads.  */
-  && targetm.register_move_cost (GET_MODE (src), dclass, sclass) == 2)
-return true;
   if (GET_CODE (dest) == SUBREG)
 dreg = SUBREG_REG (dest);
   if (GET_CODE (src) == SUBREG)
@@ -902,7 +894,6 @@ check_and_process_move (bool *change_p,
   if (! REG_P (dreg) || ! REG_P (sreg))
 return false;
   sclass = dclass = NO_REGS;
-  dreg = get_equiv_substitution (dreg);
   if (REG_P (dreg))
 dclass = get_reg_class (REGNO (dreg));
   if (dclass == ALL_REGS)
@@ -916,7 +907,6 @@ check_and_process_move (bool *change_p,
 return false;
   sreg_mode = GET_MODE (sreg);
   old_sreg = sreg;
-  sreg = get_equiv_substitution (sreg);
   if (REG_P (sreg))
 sclass = get_reg_class (REGNO (sreg));
   if (sclass == ALL_REGS)
@@ -2693,6 +2683,24 @@ emit_inc (enum reg_class new_rclass, rtx
   return result;
 }
 
+/* Return true if the current move insn does not need processing as we
+   already know that it satisfies its constraints.  */
+static bool
+simple_move_p (void)
+{
+  rtx dest, src;
+  enum reg_class dclass, sclass;
+
+  lra_assert (curr_insn_set != NULL_RTX);
+  dest = SET_DEST (curr_insn_set);
+  src = SET_SRC (curr_insn_set);
+  return ((dclass = get_op_class (dest)) != NO_REGS
+ && (sclass = get_op_class (src)) != NO_REGS
+ /* The backend guarantees that register moves of cost 2
+never need reloads.  */
+ && targetm.register_move_cost (GET_MODE (src), dclass, sclass) == 2);
+ }
+
 /* Swap operands NOP and NOP + 1. */
 static inline void
 swap_operands (int nop)
@@ -2736,15 +2744,13 @@ curr_insn_transform (void)
   int max_regno_before;
   int reused_alternative_num;
 
+  curr_insn_set = single_set (curr_insn);
+  if (curr_insn_set != NULL_RTX && simple_move_p ())
+return false;
+
   no_input_reloads_p = no_output_reloads_p = false;
   goal_alt_number = -1;
-
   change_p = sec_mem_p = false;
-  curr_insn_set = single_set (curr_insn);
-  if (curr_insn_set != 

Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-17 Thread Michael Meissner
On Wed, Apr 17, 2013 at 10:14:53AM -0400, Vladimir Makarov wrote:
> Mike, thanks for the patch and all the SPEC2006 data  (which are
> very useful as I have no access to power machine which can be used
> for benchmarking).  I guess that may be some benchmark scores are
> lower because of LRA lacks some micro-optimizations which reload
> implements through many power hooks (e.g. LRA does not use push
> reload).  Although sometimes it is not a bad thing (e.g. LRA does
> not use  SECONDARY_MEMORY_NEEDED_RTX which permits to reuse the
> stack slots for other useful things).

SECONDARY_MEMORY_NEEDED_RTX is needed for SDmode on machines before the power7,
where we need a larger stack slot to hold spilled SDmode values (power6 did not
have the LFIWZX instruction that is needed to load SDmode values into floating
point registers).

> In general I got impression that power7 is the most difficult port
> for LRA.  If we manage to port it, LRA ports for other targets will
> be easier.

I dunno, has LRA been ported to the SH yet?

> I also reproduced bootstrap failure --with-cpu=power7 and I am going
> to work on this and after that on SPEC2006 you wrote about.

Ok.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-17 Thread Vladimir Makarov

On 13-04-16 6:56 PM, Michael Meissner wrote:

I tracked down the bug with the spec 2006 benchmark WRF using the LRA register
allocator.

At one point LRA has decided to use the CTR to hold a CCmode value:

(insn 11019 11018 11020 16 (set (reg:CC 66 ctr [4411])
 (reg:CC 66 ctr [4411])) module_diffusion_em.fppized.f90:4885 360 
{*movcc_internal1}
  (expr_list:REG_DEAD (reg:CC 66 ctr [4411])
 (nil)))

Now movcc_internal1 has moves from r->h (which includes ctr/lr) and ctr/lr->r,
but it doesn't have a move to cover the nop move of moving the ctr to the ctr.
IMHO, LRA should not be generating NOP moves that are later deleted.

There are two ways to solve the problem.  One is not to let anything but int
modes into CTR/LR, which will also eliminate the register allocator from
spilling floating point values there (which we've seen in the past, but the
last time I tried to eliminate it I couldn't).  The following patch does this,
and also changes the assertion to call fatal_insn_not_found to make it clearer
what the error is.

I imagine, I could add a NOP move insn to movcc_internal1, but that just
strikes me as wrong.

Note, this does not fix the 32-bit failure in dealII, and I also noticed that I
can't bootstrap the compiler using --with-cpu=power7, which I will get to
tomorrow.

2013-04-16  Michael Meissner  

* config/rs6000/rs6000.opt (-mconstrain-regs): New debug switch to
control whether we only allow int modes to go in the CTR, LR,
VRSAVE, VSCR registers.
* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Likewise.
(rs6000_debug_reg_global): If -mdebug=reg, print out if SPRs are
constrained.
(rs6000_option_override_internal): Set -mconstrain-regs if we are
using the LRA register allocator.

* lra.c (check_rtl): Use fatal_insn_not_found to report constraint
does not match.

Mike, thanks for the patch and all the SPEC2006 data  (which are very 
useful as I have no access to power machine which can be used for 
benchmarking).  I guess that may be some benchmark scores are lower 
because of LRA lacks some micro-optimizations which reload implements 
through many power hooks (e.g. LRA does not use push reload).  Although 
sometimes it is not a bad thing (e.g. LRA does not use  
SECONDARY_MEMORY_NEEDED_RTX which permits to reuse the stack slots for 
other useful things).


In general I got impression that power7 is the most difficult port for 
LRA.  If we manage to port it, LRA ports for other targets will be easier.


I also reproduced bootstrap failure --with-cpu=power7 and I am going to 
work on this and after that on SPEC2006 you wrote about.




Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-16 Thread Michael Meissner
I tracked down the bug with the spec 2006 benchmark WRF using the LRA register
allocator.

At one point LRA has decided to use the CTR to hold a CCmode value:

(insn 11019 11018 11020 16 (set (reg:CC 66 ctr [4411])
(reg:CC 66 ctr [4411])) module_diffusion_em.fppized.f90:4885 360 
{*movcc_internal1}
 (expr_list:REG_DEAD (reg:CC 66 ctr [4411])
(nil)))

Now movcc_internal1 has moves from r->h (which includes ctr/lr) and ctr/lr->r,
but it doesn't have a move to cover the nop move of moving the ctr to the ctr.
IMHO, LRA should not be generating NOP moves that are later deleted.

There are two ways to solve the problem.  One is not to let anything but int
modes into CTR/LR, which will also eliminate the register allocator from
spilling floating point values there (which we've seen in the past, but the
last time I tried to eliminate it I couldn't).  The following patch does this,
and also changes the assertion to call fatal_insn_not_found to make it clearer
what the error is.

I imagine, I could add a NOP move insn to movcc_internal1, but that just
strikes me as wrong.

Note, this does not fix the 32-bit failure in dealII, and I also noticed that I
can't bootstrap the compiler using --with-cpu=power7, which I will get to
tomorrow.

2013-04-16  Michael Meissner  

* config/rs6000/rs6000.opt (-mconstrain-regs): New debug switch to
control whether we only allow int modes to go in the CTR, LR,
VRSAVE, VSCR registers.
* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Likewise.
(rs6000_debug_reg_global): If -mdebug=reg, print out if SPRs are
constrained.
(rs6000_option_override_internal): Set -mconstrain-regs if we are
using the LRA register allocator.

* lra.c (check_rtl): Use fatal_insn_not_found to report constraint
does not match.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.opt
===
--- gcc/config/rs6000/rs6000.opt(revision 197925)
+++ gcc/config/rs6000/rs6000.opt(working copy)
@@ -522,3 +522,7 @@ Control whether we save the TOC in the p
 mvsx-timode
 Target Undocumented Mask(VSX_TIMODE) Var(rs6000_isa_flags)
 ; Allow/disallow TImode in VSX registers
+
+mconstrain-regs
+Target Undocumented Mask(CONSTRAIN_REGS) Var(rs6000_isa_flags)
+; Only allow ints of certain modes to go in SPRs
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 197925)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -1608,6 +1608,18 @@ rs6000_hard_regno_mode_ok (int regno, en
   if (SPE_SIMD_REGNO_P (regno) && TARGET_SPE && SPE_VECTOR_MODE (mode))
 return 1;
 
+  /* See if we need to be stricter about what goes into the special
+ registers (LR, CTR, VSAVE, VSCR).  */
+  if (TARGET_CONSTRAIN_REGS)
+{
+  if (regno == LR_REGNO || regno == CTR_REGNO)
+   return (GET_MODE_CLASS (mode) == MODE_INT
+   && rs6000_hard_regno_nregs[mode][regno] == 1);
+
+  if (regno == VRSAVE_REGNO || regno == VSCR_REGNO)
+   return (mode == SImode);
+}
+
   /* We cannot put non-VSX TImode or PTImode anywhere except general register
  and it must be able to fit within the register set.  */
 
@@ -2054,6 +2066,9 @@ rs6000_debug_reg_global (void)
   if (targetm.lra_p ())
 fprintf (stderr, DEBUG_FMT_S, "lra", "true");
 
+  if (TARGET_CONSTRAIN_REGS)
+fprintf (stderr, DEBUG_FMT_S, "constrain-regs", "true");
+
   fprintf (stderr, DEBUG_FMT_S, "plt-format",
   TARGET_SECURE_PLT ? "secure" : "bss");
   fprintf (stderr, DEBUG_FMT_S, "struct-return",
@@ -2945,6 +2960,12 @@ rs6000_option_override_internal (bool gl
}
 }
 
+  /* If we are using the LRA register allocator, constrain the SPRs to only
+ have integer modes, and not modes like CC.  */
+  if ((rs6000_isa_flags_explicit & OPTION_MASK_CONSTRAIN_REGS) == 0
+  && targetm.lra_p ())
+rs6000_isa_flags |= OPTION_MASK_CONSTRAIN_REGS;
+
   /* Place FP constants in the constant pool instead of TOC
  if section anchors enabled.  */
   if (flag_section_anchors)
Index: gcc/lra.c
===
--- gcc/lra.c   (revision 197925)
+++ gcc/lra.c   (working copy)
@@ -1996,7 +1996,8 @@ check_rtl (bool final_p)
if (final_p)
  {
extract_insn (insn);
-   lra_assert (constrain_operands (1));
+   if (!constrain_operands (1))
+ fatal_insn_not_found (insn);
continue;
  }
if (insn_invalid_p (insn, false))