Re: [RFC] hard-reg-set.h refactoring

2011-08-02 Thread Mike Stump

On Aug 2, 2011, at 12:51 AM, Paolo Bonzini wrote:

> On 08/01/2011 09:10 PM, Dimitrios Apostolou wrote:
>> 
>> Keeping my patch exactly the same, just changing the
>> hook_void_hard_reg_set to receive a (HOST_WIDEST_FAST_INT *) arg and
>> doing the necessary typecasts, added an extra 3 M instructions.
>> 
>> But ix86_live_on_entry is only called 1233 times from df-scan.c.
>> This isn't enough to explain all this overhead.
> 
> Indeed, 0.2% is hard to attribute to anything anyway.

Only if you lack the tools to collect data.  :-(



Re: [RFC] hard-reg-set.h refactoring

2011-08-02 Thread Paolo Bonzini

On 08/01/2011 09:10 PM, Dimitrios Apostolou wrote:
>
> Keeping my patch exactly the same, just changing the
> hook_void_hard_reg_set to receive a (HOST_WIDEST_FAST_INT *) arg and
> doing the necessary typecasts, added an extra 3 M instructions.
>
> But ix86_live_on_entry is only called 1233 times from df-scan.c.
> This isn't enough to explain all this overhead.

Indeed, 0.2% is hard to attribute to anything anyway.  Can you try
building the patch on a couple more targets and submit it?


Paolo


Re: [RFC] hard-reg-set.h refactoring

2011-08-01 Thread Dimitrios Apostolou

On Mon, 1 Aug 2011, Paolo Bonzini wrote:

> On 08/01/2011 05:57 PM, Dimitrios Apostolou wrote:
>>
>> I don't fully understand the output from -fdump-tree-all, but my
>> conclusion, based also on profiler output and objdump, is that both
>> unrolling and inlining are happening in both versions. Nevertheless I can
>> see that the assembly output is a bit different in the two cases (I can post
>> specific disassembly output if you are interested).
>
> Thanks for checking.
>
> Have you tried the idea of passing an unsigned HOST_WIDEST_FAST_INT * (or
> whatever the name) to the target hook?


Keeping my patch exactly the same, just changing the 
hook_void_hard_reg_set to receive a (HOST_WIDEST_FAST_INT *) arg and doing 
the necessary typecasts, added an extra 3 M instructions.


But ix86_live_on_entry is only called 1233 times from df-scan.c. This
isn't enough to explain all this overhead.
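A minimal sketch of the two hook signatures being compared. Everything here is assumed for illustration: `uint64_t` stands in for unsigned HOST_WIDEST_FAST_INT, the 80-register target is made up, and `live_on_entry_set`, `live_on_entry_words` and `variants_agree` are hypothetical names whose bodies are placeholders, not the real ix86_live_on_entry. It also illustrates one plausible reason (an assumption, not stated in the thread) why a struct helps code that cannot include tm.h: a prototype taking `struct hard_reg_set *` needs only the struct tag, while the word-pointer variant needs nothing but an integer type.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Part 1: code compiled without tm.h (hypothetical).  An incomplete
   struct type is enough for prototypes that only pass pointers.  */
struct hard_reg_set;
typedef uint64_t HARD_REG_ELT_TYPE;  /* stand-in for unsigned HOST_WIDEST_FAST_INT */

typedef void (*set_hook) (struct hard_reg_set *);
typedef void (*words_hook) (HARD_REG_ELT_TYPE *);

/* Part 2: code that sees tm.h and can complete the type.
   80 hard registers is a made-up target value.  */
#define FIRST_PSEUDO_REGISTER 80
#define HARD_REG_ELT_BITS 64
#define HARD_REG_SET_LONGS \
  ((FIRST_PSEUDO_REGISTER + HARD_REG_ELT_BITS - 1) / HARD_REG_ELT_BITS)

struct hard_reg_set
{
  HARD_REG_ELT_TYPE elems[HARD_REG_SET_LONGS];
};

/* Variant A: the hook receives the struct by pointer.  */
static void
live_on_entry_set (struct hard_reg_set *regs)
{
  regs->elems[0] |= (HARD_REG_ELT_TYPE) 1 << 3;  /* placeholder: mark reg 3 */
}

/* Variant B: the hook receives a pointer to the raw words.  */
static void
live_on_entry_words (HARD_REG_ELT_TYPE *words)
{
  words[0] |= (HARD_REG_ELT_TYPE) 1 << 3;        /* same placeholder */
}

/* Run both variants on empty sets and check that they agree.  */
static bool
variants_agree (void)
{
  struct hard_reg_set a, b;
  set_hook hook_a = live_on_entry_set;
  words_hook hook_b = live_on_entry_words;

  memset (&a, 0, sizeof a);
  memset (&b, 0, sizeof b);
  hook_a (&a);
  hook_b (b.elems);  /* the array decays to a pointer; no cast needed */
  return memcmp (&a, &b, sizeof a) == 0 && a.elems[0] == 8;
}
```

The word-pointer variant keeps the hook's prototype independent of the set's layout, at the cost of type safety: nothing stops a caller from passing a pointer to a lone element.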



Dimitris



Re: [RFC] hard-reg-set.h refactoring

2011-08-01 Thread Paolo Bonzini

On 08/01/2011 05:57 PM, Dimitrios Apostolou wrote:
>
> I don't fully understand the output from -fdump-tree-all, but my
> conclusion, based also on profiler output and objdump, is that both
> unrolling and inlining are happening in both versions. Nevertheless I can
> see that the assembly output is a bit different in the two cases (I can post
> specific disassembly output if you are interested).

Thanks for checking.

Have you tried the idea of passing an unsigned HOST_WIDEST_FAST_INT *
(or whatever the name) to the target hook?


Paolo


Re: [RFC] hard-reg-set.h refactoring

2011-08-01 Thread Dimitrios Apostolou

On Sun, 31 Jul 2011, Paolo Bonzini wrote:

> On Sat, Jul 30, 2011 at 19:21, Dimitrios Apostolou  wrote:
>>
>> Nevertheless I'd appreciate comments on whether any part of this patch is
>> worth keeping. FWIW I've profiled this on i386 to be about 4 M instr slower
>> out of ~1.5 G inst. I'll now be checking the profiler to see where exactly
>> the overhead is.
>
> I suggest -fdump-tree-all too, to check if unrolling is happening and
> if not why.

I don't fully understand the output from -fdump-tree-all, but my
conclusion, based also on profiler output and objdump, is that both
unrolling and inlining are happening in both versions. Nevertheless I can
see that the assembly output is a bit different in the two cases (I can post
specific disassembly output if you are interested).


My opinion is that code cleanup is worth the minor overhead, given that 
there should be no regressions.



Thanks,
Dimitris



Re: [RFC] hard-reg-set.h refactoring

2011-07-31 Thread Paolo Bonzini
On Sat, Jul 30, 2011 at 19:21, Dimitrios Apostolou  wrote:
> I don't intend for this to go mainline; Jakub has explained on IRC that
> certain ABIs make it slower to pass structs and we wouldn't want that.

This can be "fixed" by marking the functions as always_inline.  They
should always be inlined anyway.
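A minimal sketch of the always_inline suggestion, assuming GCC's `always_inline` function attribute and a two-word set as a stand-in for the real configuration. The predicates take HARD_REG_SET by value, like the single-word versions removed in the diff; because GCC inlines every direct call to an always_inline function (and reports an error if it cannot), those call sites never go through the ABI's struct-passing convention.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins (hypothetical): two 64-bit words, i.e. up to 128 hard regs.  */
typedef uint64_t HARD_REG_ELT_TYPE;
#define HARD_REG_SET_LONGS 2

typedef struct hard_reg_set
{
  HARD_REG_ELT_TYPE elems[HARD_REG_SET_LONGS];
} HARD_REG_SET;

/* The struct is passed by value, but always_inline makes GCC inline
   every direct call, so no struct-passing call sequence is generated.  */
static inline __attribute__ ((always_inline)) bool
hard_reg_set_empty_p (const HARD_REG_SET x)
{
  for (int i = 0; i < HARD_REG_SET_LONGS; i++)
    if (x.elems[i] != 0)
      return false;
  return true;
}

static inline __attribute__ ((always_inline)) bool
hard_reg_set_intersect_p (const HARD_REG_SET x, const HARD_REG_SET y)
{
  for (int i = 0; i < HARD_REG_SET_LONGS; i++)
    if ((x.elems[i] & y.elems[i]) != 0)
      return true;
  return false;
}
```

Calls made through a function pointer still need an out-of-line copy, so the attribute only helps at direct call sites.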

> Nevertheless I'd appreciate comments on whether any part of this patch is
> worth keeping. FWIW I've profiled this on i386 to be about 4 M instr slower
> out of ~1.5 G inst. I'll now be checking the profiler to see where exactly
> the overhead is.

I suggest -fdump-tree-all too, to check if unrolling is happening and
if not why.

Paolo


[RFC] hard-reg-set.h refactoring

2011-07-30 Thread Dimitrios Apostolou

Hello list,

the attached patch changes hard-reg-set.h in the following areas:

1) HARD_REG_SET is now always a struct so that it can be used in files 
where we don't want to include tm.h. Many thanks to Paolo for providing 
the idea and the original patch.


2) Code for specific HARD_REG_SET_LONGS values is deleted and only generic 
code is left, making the file much more readable/maintainable. I expected 
GCC to unroll loops with 2-3 iterations, even at -O2, so performance 
should have stayed the same.
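A minimal sketch of the generic-only style, assuming a hypothetical 150-register target (so HARD_REG_SET_LONGS is 3), `uint64_t` standing in for the element type, and plain `assert` in place of the patch's gcc_rtl_assert. hard_reg_set_set_bit, hard_reg_set_bit_p and hard_reg_set_subset_p reuse names from the header, but these bodies are a guess, not taken from the patch; hard_reg_set_ior and ior_keeps_bit_130 are hypothetical. Every loop has a compile-time-constant trip count, which is what makes complete unrolling possible; whether GCC actually does it at -O2 depends on the compiler version and the loop body.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical configuration: 150 hard registers need 3 words.  */
typedef uint64_t HARD_REG_ELT_TYPE;
#define HARD_REG_ELT_BITS 64u
#define FIRST_PSEUDO_REGISTER 150u
#define HARD_REG_SET_LONGS \
  ((FIRST_PSEUDO_REGISTER + HARD_REG_ELT_BITS - 1) / HARD_REG_ELT_BITS)

typedef struct hard_reg_set
{
  HARD_REG_ELT_TYPE elems[HARD_REG_SET_LONGS];
} HARD_REG_SET;

/* Setting a bit: pick the word, then the bit inside it.  */
static inline void
hard_reg_set_set_bit (HARD_REG_SET *s, unsigned int bit)
{
  assert (bit < FIRST_PSEUDO_REGISTER);
  s->elems[bit / HARD_REG_ELT_BITS]
    |= (HARD_REG_ELT_TYPE) 1 << (bit % HARD_REG_ELT_BITS);
}

static inline bool
hard_reg_set_bit_p (const HARD_REG_SET s, unsigned int bit)
{
  assert (bit < FIRST_PSEUDO_REGISTER);
  return (s.elems[bit / HARD_REG_ELT_BITS] >> (bit % HARD_REG_ELT_BITS)) & 1;
}

/* One generic loop per operation replaces the per-size special cases.
   The trip count is the constant HARD_REG_SET_LONGS (3 here).  */
static inline void
hard_reg_set_ior (HARD_REG_SET *to, const HARD_REG_SET *from)
{
  for (unsigned int i = 0; i < HARD_REG_SET_LONGS; i++)
    to->elems[i] |= from->elems[i];
}

static inline bool
hard_reg_set_subset_p (const HARD_REG_SET x, const HARD_REG_SET y)
{
  for (unsigned int i = 0; i < HARD_REG_SET_LONGS; i++)
    if ((x.elems[i] & ~y.elems[i]) != 0)
      return false;
  return true;
}

/* Small usage check: set reg 130 in one set, merge it into another.  */
static bool
ior_keeps_bit_130 (void)
{
  HARD_REG_SET a = { { 0 } }, b = { { 0 } };
  hard_reg_set_set_bit (&a, 130);
  hard_reg_set_ior (&b, &a);
  return hard_reg_set_bit_p (b, 130) && hard_reg_set_subset_p (a, b);
}
```

If the loops are fully unrolled, each operation should reduce to three straight-line word operations, the same code the deleted size-specific branches spelled out by hand.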



I don't intend for this to go mainline; Jakub has explained on IRC that 
certain ABIs make it slower to pass structs and we wouldn't want that. 
Nevertheless I'd appreciate comments on whether any part of this patch is 
worth keeping. FWIW I've profiled this on i386 to be about 4 M instr 
slower out of ~1.5 G inst. I'll now be checking the profiler to see where 
exactly the overhead is.



Thanks,
Dimitris
=== modified file 'gcc/hard-reg-set.h'
--- gcc/hard-reg-set.h  2011-01-03 20:52:22 +
+++ gcc/hard-reg-set.h  2011-07-29 22:32:27 +
@@ -24,35 +24,31 @@ along with GCC; see the file COPYING3.  
 /* Define the type of a set of hard registers.  */
 
 /* HARD_REG_ELT_TYPE is a typedef of the unsigned integral type which
-   will be used for hard reg sets, either alone or in an array.
-
-   If HARD_REG_SET is a macro, its definition is HARD_REG_ELT_TYPE,
-   and it has enough bits to represent all the target machine's hard
-   registers.  Otherwise, it is a typedef for a suitably sized array
-   of HARD_REG_ELT_TYPEs.  HARD_REG_SET_LONGS is defined as how many.
+   will be used for hard reg sets.  A HARD_REG_ELT_TYPE, or an
+   array of them, is wrapped in a struct.
 
    Note that lots of code assumes that the first part of a regset is
    the same format as a HARD_REG_SET.  To help make sure this is true,
    we only try the widest fast integer mode (HOST_WIDEST_FAST_INT)
-   instead of all the smaller types.  This approach loses only if
-   there are very few registers and then only in the few cases where
-   we have an array of HARD_REG_SETs, so it needn't be as complex as
-   it used to be.  */
-
-typedef unsigned HOST_WIDEST_FAST_INT HARD_REG_ELT_TYPE;
-
-#if FIRST_PSEUDO_REGISTER <= HOST_BITS_PER_WIDEST_FAST_INT
-
-#define HARD_REG_SET HARD_REG_ELT_TYPE
+   instead of all the smaller types. */
 
+#ifdef ENABLE_RTL_CHECKING
+#define gcc_rtl_assert(EXPR) gcc_assert (EXPR)
 #else
+#define gcc_rtl_assert(EXPR) ((void)(0 && (EXPR)))
+#endif
+
+typedef unsigned HOST_WIDEST_FAST_INT HARD_REG_ELT_TYPE;
 
 #define HARD_REG_SET_LONGS \
  ((FIRST_PSEUDO_REGISTER + HOST_BITS_PER_WIDEST_FAST_INT - 1)  \
   / HOST_BITS_PER_WIDEST_FAST_INT)
-typedef HARD_REG_ELT_TYPE HARD_REG_SET[HARD_REG_SET_LONGS];
 
-#endif
+#define HARD_REG_SET struct hard_reg_set
+
+struct hard_reg_set {
+  HARD_REG_ELT_TYPE elems[HARD_REG_SET_LONGS];
+};
 
 /* HARD_CONST is used to cast a constant to the appropriate type
    for use with a HARD_REG_SET.  */
@@ -89,343 +85,108 @@ typedef HARD_REG_ELT_TYPE HARD_REG_SET[H
    hard_reg_set_intersect_p (X, Y), which returns true if X and Y intersect.
    hard_reg_set_empty_p (X), which returns true if X is empty.  */
 
-#define UHOST_BITS_PER_WIDE_INT ((unsigned) HOST_BITS_PER_WIDEST_FAST_INT)
 
-#ifdef HARD_REG_SET
+#define HARD_REG_ELT_BITS ((unsigned) HOST_BITS_PER_WIDEST_FAST_INT)
 
 #define SET_HARD_REG_BIT(SET, BIT)  \
- ((SET) |= HARD_CONST (1) << (BIT))
+  hard_reg_set_set_bit (&(SET), (BIT))
 #define CLEAR_HARD_REG_BIT(SET, BIT)  \
- ((SET) &= ~(HARD_CONST (1) << (BIT)))
+  hard_reg_set_clear_bit (&(SET), (BIT))
 #define TEST_HARD_REG_BIT(SET, BIT)  \
- (!!((SET) & (HARD_CONST (1) << (BIT))))
-
-#define CLEAR_HARD_REG_SET(TO) ((TO) = HARD_CONST (0))
-#define SET_HARD_REG_SET(TO) ((TO) = ~ HARD_CONST (0))
-
-#define COPY_HARD_REG_SET(TO, FROM) ((TO) = (FROM))
-#define COMPL_HARD_REG_SET(TO, FROM) ((TO) = ~(FROM))
-
-#define IOR_HARD_REG_SET(TO, FROM) ((TO) |= (FROM))
-#define IOR_COMPL_HARD_REG_SET(TO, FROM) ((TO) |= ~ (FROM))
-#define AND_HARD_REG_SET(TO, FROM) ((TO) &= (FROM))
-#define AND_COMPL_HARD_REG_SET(TO, FROM) ((TO) &= ~ (FROM))
-
-static inline bool
-hard_reg_set_subset_p (const HARD_REG_SET x, const HARD_REG_SET y)
-{
-  return (x & ~y) == HARD_CONST (0);
-}
+  hard_reg_set_bit_p ((SET), (BIT))
 
-static inline bool
-hard_reg_set_equal_p (const HARD_REG_SET x, const HARD_REG_SET y)
-{
-  return x == y;
-}
-
-static inline bool
-hard_reg_set_intersect_p (const HARD_REG_SET x, const HARD_REG_SET y)
-{
-  return (x & y) != HARD_CONST (0);
-}
-
-static inline bool
-hard_reg_set_empty_p (const HARD_REG_SET x)
+static inline void
+hard_reg_set_set_bit (HARD_REG_SET *s, unsigned int bit)
 {
-  return x == HARD_CONST (0);
-}
-
+#if HARD_REG_SET_LONGS > 1
+  int word = bit / HARD_REG_ELT_BITS;
+  int bitpos = bit % HARD_REG_ELT_BITS;
 #else
-
-#define SET_HARD_REG_BIT(SET, BIT) \
-  ((SET)[(BIT) / UHOST_BITS_PER_WIDE_INT]  \
-   |= HARD_CONST (1) << ((BIT) %