Re: RFC: LRA for x86/x86-64 [4/9]

2012-10-02 Thread Vladimir Makarov

On 10/01/2012 02:51 PM, Richard Sandiford wrote:

Vladimir Makarov vmaka...@redhat.com writes:

+/* Return register bank of given hard regno for the current target.  */
+DEFHOOK
+(register_bank,
+ A target hook which returns the register bank number to which the\
+  register @var{hard_regno} belongs to.  The smaller the number, the\
+  more preferable the hard register usage (when all other conditions are\
+  the same).  This hook can be used to prefer some hard register over\
+  others in LRA.  For example, some x86-64 register usage needs\
+  additional prefix which makes instructions longer.  The hook can\
+  return bigger bank number for such registers make them less favorable\
+  and as result making the generated code smaller.\
+  \
+  The default version of this target hook returns always zero.,
+ int, (int),
+ default_register_bank)

This is a horribly bikeshed-level comment, sorry, but I wonder if
something like register_priority would be better.  Register classes
are in some ways an extension of register banks, so it wasn't obvious
from the name why we needed both.
Ok.  I agree that is not a good term.  Register bank in hardware 
(especially in DSP) means a bit different thing.


Actually, on the Cauldron Ian asked me why it is different from register 
allocation order.  I should say that the order usually takes 
caller-saves info into account.  In x86-64, reg with REX flags can be 
caller-saved or not.

+/* Return true if maximal address displacement can be different.  */
+DEFHOOK
+(different_addr_displacement_p,
+ A target hook which returns true if an address with the same structure\
+  can have different maximal legitimate displacement.  For example, the\
+  displacement can depend on memory mode or on operand combinations in\
+  the insn.\
+  \
+  The default version of this target hook returns always false.,
+ bool, (void),
+ default_different_addr_displacement_p)

If I read the patch correctly, this is only used in:

+   if (lra_reg_spill_p || targetm.different_addr_displacement_p ())
+ lra_set_used_insn_alternative (insn, -1);

and so we keep the current alternative when neither spill_class_mode
nor different_addr_displacement_p is defined.  How many targets on the
LRA branch are like that?  I would have expected most targets with limited
address displacements would have to return true for the above hook,
because multiword loads and stores typically have to be split into word
loads and stores.  Same goes for strict-alignment targets, where wider
modes often have slightly lower maximal displacements.

E.g. for MIPS, SImode loads and stores have a displacement range of
[-32768, 32764], but DImode loads and stores only accept [-32768, 32760].
So the maximal displacement depends on mode, even though the instruction set
is pretty regular.

Targets with full address-size displacements can use the default false return,
but it looks like the x86 port defines spill_class_mode instead, so AIUI
the value isn't really tested on Core i7.  What's the impact of that compared
to the other x86 targets that don't set X86_TUNE_GENERAL_REGS_SSE_SPILL?
Is LRA just quicker for them, or will it make different decisions
(compared to Core i7) even for non-SSE insns?
It is mostly done for the LRA speed.  We could remove if-stmt and 
everything will be all right, only lra-constraints pass will go over all 
alternatives again.


Currently, there are two targets for which if-cond is true.  One is 
x86-64 (more exactly when corei7 tune is used) and another target PARISC 
for which different_add_displacement_p is true.


It might be more in the future.  Future ARMs and Powers might be 
profitable for spilling general registers into vector/floating point 
registers.  Might be new other targets will need 
different_addr_displacement_p.


We could rid off the hook but it is important for speeding LRA up. I 
felt that I should make LRA speed competitive with reload even it is 
hard because LRA works on RTL and not on internal structures as reload.  
I see now I was right worrying about the speed.  I guess if we could 
sacrifice 2% of compilation time, LRA code would be smaller and more 
clear.  If we could sacrifice 10% percent compilation time, the code 
would be even smaller and clear because we could use DF-infrastructure.

+/* Determine class of registers which could be used for spilled
+   pseudos instead of memory.  */
+DEFHOOK
+(spill_class,
+ This hook defines a class of registers which could be used for spilled 
pseudos\
+  of given class instead of memory,
+ reg_class_t, (reg_class_t),
+ NULL)

Should probably say that NO_REGS means none.

+/* Determine mode for spilling pseudos into registers instead of memory.  */
+DEFHOOK
+(spill_class_mode,
+ This hook defines mode in which a pseudo of given mode and of the first\
+  register class can be spilled into the second register class,
+ enum machine_mode, (reg_class_t, reg_class_t, enum machine_mode),
+ NULL)

It looks like the only use is in:


Re: RFC: LRA for x86/x86-64 [4/9]

2012-10-01 Thread Richard Sandiford
Thanks a lot for doing this.  When you finally get to the stage of
rm reload.c reload1.c, please do it in a screen session and save
the log for posterity.

Vladimir Makarov vmaka...@redhat.com writes:
 +/* Return register bank of given hard regno for the current target.  */
 +DEFHOOK
 +(register_bank,
 + A target hook which returns the register bank number to which the\
 +  register @var{hard_regno} belongs to.  The smaller the number, the\
 +  more preferable the hard register usage (when all other conditions are\
 +  the same).  This hook can be used to prefer some hard register over\
 +  others in LRA.  For example, some x86-64 register usage needs\
 +  additional prefix which makes instructions longer.  The hook can\
 +  return bigger bank number for such registers make them less favorable\
 +  and as result making the generated code smaller.\
 +  \
 +  The default version of this target hook returns always zero.,
 + int, (int),
 + default_register_bank)

This is a horribly bikeshed-level comment, sorry, but I wonder if
something like register_priority would be better.  Register classes
are in some ways an extension of register banks, so it wasn't obvious
from the name why we needed both.

 +/* Return true if maximal address displacement can be different.  */
 +DEFHOOK
 +(different_addr_displacement_p,
 + A target hook which returns true if an address with the same structure\
 +  can have different maximal legitimate displacement.  For example, the\
 +  displacement can depend on memory mode or on operand combinations in\
 +  the insn.\
 +  \
 +  The default version of this target hook returns always false.,
 + bool, (void),
 + default_different_addr_displacement_p)

If I read the patch correctly, this is only used in:

+   if (lra_reg_spill_p || targetm.different_addr_displacement_p ())
+ lra_set_used_insn_alternative (insn, -1);

and so we keep the current alternative when neither spill_class_mode
nor different_addr_displacement_p is defined.  How many targets on the
LRA branch are like that?  I would have expected most targets with limited
address displacements would have to return true for the above hook,
because multiword loads and stores typically have to be split into word
loads and stores.  Same goes for strict-alignment targets, where wider
modes often have slightly lower maximal displacements.

E.g. for MIPS, SImode loads and stores have a displacement range of
[-32768, 32764], but DImode loads and stores only accept [-32768, 32760].
So the maximal displacement depends on mode, even though the instruction set
is pretty regular.

Targets with full address-size displacements can use the default false return,
but it looks like the x86 port defines spill_class_mode instead, so AIUI
the value isn't really tested on Core i7.  What's the impact of that compared
to the other x86 targets that don't set X86_TUNE_GENERAL_REGS_SSE_SPILL?
Is LRA just quicker for them, or will it make different decisions
(compared to Core i7) even for non-SSE insns?

 +/* Determine class of registers which could be used for spilled
 +   pseudos instead of memory.  */
 +DEFHOOK
 +(spill_class,
 + This hook defines a class of registers which could be used for spilled 
 pseudos\
 +  of given class instead of memory,
 + reg_class_t, (reg_class_t),
 + NULL)

Should probably say that NO_REGS means none.

 +/* Determine mode for spilling pseudos into registers instead of memory.  */
 +DEFHOOK
 +(spill_class_mode,
 + This hook defines mode in which a pseudo of given mode and of the first\
 +  register class can be spilled into the second register class,
 + enum machine_mode, (reg_class_t, reg_class_t, enum machine_mode),
 + NULL)

It looks like the only use is in:

+ || (targetm.spill_class_mode (rclass, spill_class,
+   PSEUDO_REGNO_MODE (regno))
+ != PSEUDO_REGNO_MODE (regno))

So would it make sense to have a single hook like:

/* Determine mode for spilling pseudos into registers instead of memory.  */
DEFHOOK
(spill_class,
 This hook defines a class of registers which could be used for spilling\
  pseudos of the given mode and class, or @code{NO_REGS} if only memory\
  should be used.  Not defining this hook is equivalent to returning\
  @code{NO_REGS} for all inputs.
 reg_class_t, (reg_class_t, enum machine_mode),
 NULL)

?  It means that setup_reg_spill_flag needs a class-x-mode walk, but
(bad excuse ahoy) we have plenty of those already.  If we really wanted
to avoid the extra loop, we could make VOIDmode mean any mode,
although that does make the interface a bit more clunky.

Richard


Re: RFC: LRA for x86/x86-64 [4/9]

2012-10-01 Thread Paul_Koning

On Oct 1, 2012, at 2:51 PM, Richard Sandiford wrote:

 ...
 E.g. for MIPS, SImode loads and stores have a displacement range of
 [-32768, 32764], but DImode loads and stores only accept [-32768, 32760].
 So the maximal displacement depends on mode, even though the instruction set
 is pretty regular.

It may be that the case doesn't arise in code GCC generates, but I don't think 
that's true.  The offset field is always a 2's complement 16 bit integer, hence 
in the range -32768..32767.  The alignment required in loading multibyte data 
with aligned load/store instructions applies to the final address, not the 
offset.  For example, if R1 contains 1, then LD r2,32767(r1) will work.

paul



Re: RFC: LRA for x86/x86-64 [4/9]

2012-10-01 Thread Jeff Law

On 09/27/2012 07:44 PM, Vladimir Makarov wrote:

On 09/27/2012 08:07 PM, Joseph S. Myers wrote:

On Thu, 27 Sep 2012, Vladimir Makarov wrote:


Hook spill_class returns a value of enum reg_class which is defined in
target-depend include file.

That's what reg_class_t is for: avoiding enum reg_class in hook
interfaces.


Ok.  Thanks for pointing this out.

Here is the modified patch.

2012-09-27  Vladimir Makarov  vmaka...@redhat.com

 * targhooks.h (default_lra_p): Declare.
 (default_register_bank): Ditto.
 (default_different_addr_displacement_p): Ditto.
 * targhooks.c (default_lra_p): New function.
 (default_register_bank): Ditto.
 (default_different_addr_displacement_p): Ditto.
 * target.def (lra_p): New hook.
 (register_bank): Ditto.
 (different_addr_displacement_p): Ditto.
 (spill_class, spill_class_mode): New hooks.
 * doc/tm.texi.in: Add TARGET_LRA_P, TARGET_REGISTER_BANK,
 TARGET_DIFFERENT_ADDR_DISPLACEMENT_P, TARGET_SPILL_CLASS, and
 TARGET_SPILL_CLASS_MODE.
 * doc/tm.texi: Update.

The change also requires some modification in the 9th patch. The
ChangeLog for the patch should be the same as before.

This looks fine to me.

jeff



Re: RFC: LRA for x86/x86-64 [4/9]

2012-09-27 Thread Joseph S. Myers
On Thu, 27 Sep 2012, Vladimir Makarov wrote:

 * target.h: Include tm.h.

That's a backward step; we'd like parts of the compiler that aren't using 
target macros directly not to end up including tm.h.  Why do you need 
this?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RFC: LRA for x86/x86-64 [4/9]

2012-09-27 Thread Vladimir Makarov

On 09/27/2012 07:05 PM, Joseph S. Myers wrote:

On Thu, 27 Sep 2012, Vladimir Makarov wrote:


 * target.h: Include tm.h.

That's a backward step; we'd like parts of the compiler that aren't using
target macros directly not to end up including tm.h.  Why do you need
this?


Thanks, Joseph.

Hook spill_class returns a value of enum reg_class which is defined in 
target-depend include file.


If it is really bad, I'll try to find a solution but it will be probably 
not a pleasant one like the hook returning int which is converted to and 
from enum reg_class.




Re: RFC: LRA for x86/x86-64 [4/9]

2012-09-27 Thread Joseph S. Myers
On Thu, 27 Sep 2012, Vladimir Makarov wrote:

 Hook spill_class returns a value of enum reg_class which is defined in
 target-depend include file.

That's what reg_class_t is for: avoiding enum reg_class in hook 
interfaces.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RFC: LRA for x86/x86-64 [4/9]

2012-09-27 Thread Vladimir Makarov

On 09/27/2012 08:07 PM, Joseph S. Myers wrote:

On Thu, 27 Sep 2012, Vladimir Makarov wrote:


Hook spill_class returns a value of enum reg_class which is defined in
target-depend include file.

That's what reg_class_t is for: avoiding enum reg_class in hook
interfaces.


Ok.  Thanks for pointing this out.

Here is the modified patch.

2012-09-27  Vladimir Makarov  vmaka...@redhat.com

* targhooks.h (default_lra_p): Declare.
(default_register_bank): Ditto.
(default_different_addr_displacement_p): Ditto.
* targhooks.c (default_lra_p): New function.
(default_register_bank): Ditto.
(default_different_addr_displacement_p): Ditto.
* target.def (lra_p): New hook.
(register_bank): Ditto.
(different_addr_displacement_p): Ditto.
(spill_class, spill_class_mode): New hooks.
* doc/tm.texi.in: Add TARGET_LRA_P, TARGET_REGISTER_BANK,
TARGET_DIFFERENT_ADDR_DISPLACEMENT_P, TARGET_SPILL_CLASS, and
TARGET_SPILL_CLASS_MODE.
* doc/tm.texi: Update.

The change also requires some modification in the 9th patch. The 
ChangeLog for the patch should be the same as before.


Index: targhooks.c
===
--- targhooks.c	(revision 191771)
+++ targhooks.c	(working copy)
@@ -840,6 +840,24 @@ default_branch_target_register_class (vo
   return NO_REGS;
 }
 
+extern bool
+default_lra_p (void)
+{
+  return false;
+}
+
+int
+default_register_bank (int hard_regno ATTRIBUTE_UNUSED)
+{
+  return 0;
+}
+
+extern bool
+default_different_addr_displacement_p (void)
+{
+  return false;
+}
+
 reg_class_t
 default_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x ATTRIBUTE_UNUSED,
 			  reg_class_t reload_class_i ATTRIBUTE_UNUSED,
Index: targhooks.h
===
--- targhooks.h	(revision 191771)
+++ targhooks.h	(working copy)
@@ -132,6 +132,9 @@ extern rtx default_static_chain (const_t
 extern void default_trampoline_init (rtx, tree, rtx);
 extern int default_return_pops_args (tree, tree, int);
 extern reg_class_t default_branch_target_register_class (void);
+extern bool default_lra_p (void);
+extern int default_register_bank (int);
+extern bool default_different_addr_displacement_p (void);
 extern reg_class_t default_secondary_reload (bool, rtx, reg_class_t,
 	 enum machine_mode,
 	 secondary_reload_info *);
Index: doc/tm.texi
===
--- doc/tm.texi	(revision 191771)
+++ doc/tm.texi	(working copy)
@@ -2893,6 +2893,26 @@ as below:
 @end smallexample
 @end defmac
 
+@deftypefn {Target Hook} bool TARGET_LRA_P (void)
+A target hook which returns true if we use LRA instead of reload pass.  It means that LRA was ported to the target.The default version of this target hook returns always false.
+@end deftypefn
+
+@deftypefn {Target Hook} int TARGET_REGISTER_BANK (int)
+A target hook which returns the register bank number to which the  register @var{hard_regno} belongs to.  The smaller the number, the  more preferable the hard register usage (when all other conditions are  the same).  This hook can be used to prefer some hard register over  others in LRA.  For example, some x86-64 register usage needs  additional prefix which makes instructions longer.  The hook can  return bigger bank number for such registers make them less favorable  and as result making the generated code smaller.The default version of this target hook returns always zero.
+@end deftypefn
+
+@deftypefn {Target Hook} bool TARGET_DIFFERENT_ADDR_DISPLACEMENT_P (void)
+A target hook which returns true if an address with the same structure  can have different maximal legitimate displacement.  For example, the  displacement can depend on memory mode or on operand combinations in  the insn.The default version of this target hook returns always false.
+@end deftypefn
+
+@deftypefn {Target Hook} reg_class_t TARGET_SPILL_CLASS (reg_class_t)
+This hook defines a class of registers which could be used for spilled pseudos  of given class instead of memory
+@end deftypefn
+
+@deftypefn {Target Hook} {enum machine_mode} TARGET_SPILL_CLASS_MODE (reg_class_t, @var{reg_class_t}, enum @var{machine_mode})
+This hook defines mode in which a pseudo of given mode and of the first  register class can be spilled into the second register class
+@end deftypefn
+
 @node Old Constraints
 @section Obsolete Macros for Defining Constraints
 @cindex defining constraints, obsolete method
Index: doc/tm.texi.in
===
--- doc/tm.texi.in	(revision 191771)
+++ doc/tm.texi.in	(working copy)
@@ -2869,6 +2869,16 @@ as below:
 @end smallexample
 @end defmac
 
+@hook TARGET_LRA_P
+
+@hook TARGET_REGISTER_BANK
+
+@hook TARGET_DIFFERENT_ADDR_DISPLACEMENT_P
+
+@hook TARGET_SPILL_CLASS
+
+@hook TARGET_SPILL_CLASS_MODE
+
 @node Old Constraints
 @section Obsolete Macros for Defining Constraints
 @cindex defining