Re: [PATCH] Significantly reduce memory usage of genattrtab

2016-11-15 Thread Richard Sandiford
Bernd Edlinger  writes:
> On 11/15/16 13:21, Richard Sandiford wrote:
>> Bernd Edlinger  writes:
>>> Hi!
>>>
>>> The genattrtab build-tool uses way too much memory in general.
>>> I think there is no other build step that uses more memory.
>>>
>>> On the currently trunk it takes around 700MB to build the
>>> ARM latency tab files.  I debugged that yesterday
>>> and found that this can be reduced to 8MB (!).  Yes, really.
>>>
>>> So the attached patch does try really hard to hash and re-use
>>> all ever created rtx objects.
>>>
>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu and ARM.
>>> Is it OK for trunk?
>>
>> Just to check: does this produce the same output as before?
>> And did you notice any difference in the time genattrtab
>> takes to run?
>>
>
> The run time was in the range of 24-25s, with and without the patch.
>
> However the tables are a bit different, although that seems only to be
> w flaw with the ATTR_CURR_SIMPLIFIED_P which is now re-used when a
> matching rtx was found in the hash.  As I said, the generated functions
> do really work, probably because just a few simplifications are missing.
>
> So it looks like I need to clear the ATTR_CURR_SIMPLIFIED_P on re-used
> binary ops.  That I found out just by try-and-error.  I can say that now
> the generated functions are exactly identical for i386, arm and mips.
> The memory and the run time did not change due to this re-hashing.

OK, thanks for checking.

>> ATTR_PERMANENT_P is supposed to guarantee that no other rtx like it exists,
>> so that x != y when x or y is "permanent" implies that the attributes
>> must be different.  This lets attr_equal_p avoid a recursive walk:
>>
>> static int
>> attr_equal_p (rtx x, rtx y)
>> {
>>   return (x == y || (! (ATTR_PERMANENT_P (x) && ATTR_PERMANENT_P (y))
>>   && rtx_equal_p (x, y)));
>> }
>>
>> Does the patch still guarantee that?
>>
>
> Hmm, I see.  I expected that ATTR_PERMANENT_P means more or less,
> that it is in the hash table.  I believe that a long time ago, there
> was a kind of garbage collection of temporary rtx objects, that needed
> to be copied from the temporary space to the permanent space, after
> the simplification was done.  And then all temporary objects were
> just tossed away.  But that was long before my time.  Today
> everything is permanent, that is why the memory usage is unbounded.
>
> But I can fix that, by only setting ATTR_PERMANENT_P on the hashed
> rtx when both sub-rtx are also ATTR_PERMANENT_P.
>
>
> How does that new version look, is it OK?

OK.  Thanks for doing this, certainly an impressive headline number :-)

Richard


Re: [PATCH] Significantly reduce memory usage of genattrtab

2016-11-15 Thread Bernd Edlinger
On 11/15/16 13:21, Richard Sandiford wrote:
> Bernd Edlinger  writes:
>> Hi!
>>
>> The genattrtab build-tool uses way too much memory in general.
>> I think there is no other build step that uses more memory.
>>
>> On the currently trunk it takes around 700MB to build the
>> ARM latency tab files.  I debugged that yesterday
>> and found that this can be reduced to 8MB (!).  Yes, really.
>>
>> So the attached patch does try really hard to hash and re-use
>> all ever created rtx objects.
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu and ARM.
>> Is it OK for trunk?
>
> Just to check: does this produce the same output as before?
> And did you notice any difference in the time genattrtab
> takes to run?
>

The run time was in the range of 24-25s, with and without the patch.

However the tables are a bit different, although that seems only to be
w flaw with the ATTR_CURR_SIMPLIFIED_P which is now re-used when a
matching rtx was found in the hash.  As I said, the generated functions
do really work, probably because just a few simplifications are missing.

So it looks like I need to clear the ATTR_CURR_SIMPLIFIED_P on re-used
binary ops.  That I found out just by try-and-error.  I can say that now
the generated functions are exactly identical for i386, arm and mips.
The memory and the run time did not change due to this re-hashing.

>
> ATTR_PERMANENT_P is supposed to guarantee that no other rtx like it exists,
> so that x != y when x or y is "permanent" implies that the attributes
> must be different.  This lets attr_equal_p avoid a recursive walk:
>
> static int
> attr_equal_p (rtx x, rtx y)
> {
>   return (x == y || (! (ATTR_PERMANENT_P (x) && ATTR_PERMANENT_P (y))
>&& rtx_equal_p (x, y)));
> }
>
> Does the patch still guarantee that?
>

Hmm, I see.  I expected that ATTR_PERMANENT_P means more or less,
that it is in the hash table.  I believe that a long time ago, there
was a kind of garbage collection of temporary rtx objects, that needed
to be copied from the temporary space to the permanent space, after
the simplification was done.  And then all temporary objects were
just tossed away.  But that was long before my time.  Today
everything is permanent, that is why the memory usage is unbounded.

But I can fix that, by only setting ATTR_PERMANENT_P on the hashed
rtx when both sub-rtx are also ATTR_PERMANENT_P.


How does that new version look, is it OK?


Thanks
Bernd.
2016-11-15  Bernd Edlinger  

* genattrtab.c (attr_rtx_1): Avoid allocating new rtx objects.
Clear ATTR_CURR_SIMPLIFIED_P for re-used binary rtx objects.
Use DEF_ATTR_STRING for string arguments.  Use RTL_HASH for
integer arguments.  Only set ATTR_PERMANENT_P on newly hashed
rtx when all sub-rtx are also permanent.
(attr_eq): Simplify.
(attr_copy_rtx): Remove.
(make_canonical, get_attr_value): Use attr_equal_p.
(copy_boolean): Rehash NOT.
(simplify_test_exp_in_temp,
optimize_attrs): Remove call to attr_copy_rtx.
(attr_alt_intersection, attr_alt_union,
attr_alt_complement, mk_attr_alt): Rehash EQ_ATTR_ALT.
(make_automaton_attrs): Use attr_eq.
Index: gcc/genattrtab.c
===
--- gcc/genattrtab.c	(revision 242335)
+++ gcc/genattrtab.c	(working copy)
@@ -386,6 +386,7 @@ attr_rtx_1 (enum rtx_code code, va_list p)
   unsigned int hashcode;
   struct attr_hash *h;
   struct obstack *old_obstack = rtl_obstack;
+  int permanent_p = 1;
 
   /* For each of several cases, search the hash table for an existing entry.
  Use that entry if one is found; otherwise create a new RTL and add it
@@ -395,13 +396,8 @@ attr_rtx_1 (enum rtx_code code, va_list p)
 {
   rtx arg0 = va_arg (p, rtx);
 
-  /* A permanent object cannot point to impermanent ones.  */
   if (! ATTR_PERMANENT_P (arg0))
-	{
-	  rt_val = rtx_alloc (code);
-	  XEXP (rt_val, 0) = arg0;
-	  return rt_val;
-	}
+	permanent_p = 0;
 
   hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0));
   for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next)
@@ -425,14 +421,8 @@ attr_rtx_1 (enum rtx_code code, va_list p)
   rtx arg0 = va_arg (p, rtx);
   rtx arg1 = va_arg (p, rtx);
 
-  /* A permanent object cannot point to impermanent ones.  */
   if (! ATTR_PERMANENT_P (arg0) || ! ATTR_PERMANENT_P (arg1))
-	{
-	  rt_val = rtx_alloc (code);
-	  XEXP (rt_val, 0) = arg0;
-	  XEXP (rt_val, 1) = arg1;
-	  return rt_val;
-	}
+	permanent_p = 0;
 
   hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0) + RTL_HASH (arg1));
   for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next)
@@ -440,7 +430,10 @@ attr_rtx_1 (enum rtx_code code, va_list p)
 	&& GET_CODE (h->u.rtl) == code
 	&& XEXP (h->u.rtl, 0) == arg0
 	&& XEXP (h->u.rtl, 1) == arg1)
-	  return h->u.rtl;
+	  {
+	

Re: [PATCH] Significantly reduce memory usage of genattrtab

2016-11-15 Thread Richard Sandiford
Bernd Edlinger  writes:
> Hi!
>
> The genattrtab build-tool uses way too much memory in general.
> I think there is no other build step that uses more memory.
>
> On the currently trunk it takes around 700MB to build the
> ARM latency tab files.  I debugged that yesterday
> and found that this can be reduced to 8MB (!).  Yes, really.
>
> So the attached patch does try really hard to hash and re-use
> all ever created rtx objects.
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu and ARM.
> Is it OK for trunk?

Just to check: does this produce the same output as before?
And did you notice any difference in the time genattrtab
takes to run?

> Index: gcc/genattrtab.c
> ===
> --- gcc/genattrtab.c  (revision 242335)
> +++ gcc/genattrtab.c  (working copy)
> @@ -395,14 +395,6 @@ attr_rtx_1 (enum rtx_code code, va_list p)
>  {
>rtx arg0 = va_arg (p, rtx);
>  
> -  /* A permanent object cannot point to impermanent ones.  */
> -  if (! ATTR_PERMANENT_P (arg0))
> - {
> -   rt_val = rtx_alloc (code);
> -   XEXP (rt_val, 0) = arg0;
> -   return rt_val;
> - }
> -
>hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0));
>for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next)
>   if (h->hashcode == hashcode
> @@ -425,15 +417,6 @@ attr_rtx_1 (enum rtx_code code, va_list p)
>rtx arg0 = va_arg (p, rtx);
>rtx arg1 = va_arg (p, rtx);
>  
> -  /* A permanent object cannot point to impermanent ones.  */
> -  if (! ATTR_PERMANENT_P (arg0) || ! ATTR_PERMANENT_P (arg1))
> - {
> -   rt_val = rtx_alloc (code);
> -   XEXP (rt_val, 0) = arg0;
> -   XEXP (rt_val, 1) = arg1;
> -   return rt_val;
> - }
> -
>hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0) + RTL_HASH (arg1));
>for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next)
>   if (h->hashcode == hashcode

ATTR_PERMANENT_P is supposed to guarantee that no other rtx like it exists,
so that x != y when x or y is "permanent" implies that the attributes
must be different.  This lets attr_equal_p avoid a recursive walk:

static int
attr_equal_p (rtx x, rtx y)
{
  return (x == y || (! (ATTR_PERMANENT_P (x) && ATTR_PERMANENT_P (y))
 && rtx_equal_p (x, y)));
}

Does the patch still guarantee that?

Thanks,
Richard


[PATCH] Significantly reduce memory usage of genattrtab

2016-11-15 Thread Bernd Edlinger
Hi!

The genattrtab build-tool uses way too much memory in general.
I think there is no other build step that uses more memory.

On the currently trunk it takes around 700MB to build the
ARM latency tab files.  I debugged that yesterday
and found that this can be reduced to 8MB (!).  Yes, really.

So the attached patch does try really hard to hash and re-use
all ever created rtx objects.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu and ARM.
Is it OK for trunk?


Thanks
Bernd.2016-11-15  Bernd Edlinger  

* genattrtab.c (attr_rtx_1): Ignore ATTR_PERMANENT_P on arguments.
Use DEF_ATTR_STRING for string arguments.
Use RTL_HASH for integer arguments.
(attr_eq): Simplify.
(attr_copy_rtx): Remove.
(make_canonical, get_attr_value): Use attr_equal_p.
(copy_boolean): Rehash NOT.
(simplify_test_exp_in_temp,
optimize_attrs): Remove call to attr_copy_rtx.
(attr_alt_intersection, attr_alt_union,
attr_alt_complement, mk_attr_alt): Rehash EQ_ATTR_ALT.
(make_automaton_attrs): Use attr_eq.
Index: gcc/genattrtab.c
===
--- gcc/genattrtab.c	(revision 242335)
+++ gcc/genattrtab.c	(working copy)
@@ -395,14 +395,6 @@ attr_rtx_1 (enum rtx_code code, va_list p)
 {
   rtx arg0 = va_arg (p, rtx);
 
-  /* A permanent object cannot point to impermanent ones.  */
-  if (! ATTR_PERMANENT_P (arg0))
-	{
-	  rt_val = rtx_alloc (code);
-	  XEXP (rt_val, 0) = arg0;
-	  return rt_val;
-	}
-
   hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0));
   for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next)
 	if (h->hashcode == hashcode
@@ -425,15 +417,6 @@ attr_rtx_1 (enum rtx_code code, va_list p)
   rtx arg0 = va_arg (p, rtx);
   rtx arg1 = va_arg (p, rtx);
 
-  /* A permanent object cannot point to impermanent ones.  */
-  if (! ATTR_PERMANENT_P (arg0) || ! ATTR_PERMANENT_P (arg1))
-	{
-	  rt_val = rtx_alloc (code);
-	  XEXP (rt_val, 0) = arg0;
-	  XEXP (rt_val, 1) = arg1;
-	  return rt_val;
-	}
-
   hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0) + RTL_HASH (arg1));
   for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next)
 	if (h->hashcode == hashcode
@@ -481,6 +464,9 @@ attr_rtx_1 (enum rtx_code code, va_list p)
   char *arg0 = va_arg (p, char *);
   char *arg1 = va_arg (p, char *);
 
+  arg0 = DEF_ATTR_STRING (arg0);
+  arg1 = DEF_ATTR_STRING (arg1);
+
   hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0) + RTL_HASH (arg1));
   for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next)
 	if (h->hashcode == hashcode
@@ -497,6 +483,29 @@ attr_rtx_1 (enum rtx_code code, va_list p)
 	  XSTR (rt_val, 1) = arg1;
 	}
 }
+  else if (GET_RTX_LENGTH (code) == 2
+	   && GET_RTX_FORMAT (code)[0] == 'i'
+	   && GET_RTX_FORMAT (code)[1] == 'i')
+{
+  int  arg0 = va_arg (p, int);
+  int  arg1 = va_arg (p, int);
+
+  hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0) + RTL_HASH (arg1));
+  for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next)
+	if (h->hashcode == hashcode
+	&& GET_CODE (h->u.rtl) == code
+	&& XINT (h->u.rtl, 0) == arg0
+	&& XINT (h->u.rtl, 1) == arg1)
+	  return h->u.rtl;
+
+  if (h == 0)
+	{
+	  rtl_obstack = hash_obstack;
+	  rt_val = rtx_alloc (code);
+	  XINT (rt_val, 0) = arg0;
+	  XINT (rt_val, 1) = arg1;
+	}
+}
   else if (code == CONST_INT)
 {
   HOST_WIDE_INT arg0 = va_arg (p, HOST_WIDE_INT);
@@ -592,7 +601,7 @@ attr_printf (unsigned int len, const char *fmt, ..
 static rtx
 attr_eq (const char *name, const char *value)
 {
-  return attr_rtx (EQ_ATTR, DEF_ATTR_STRING (name), DEF_ATTR_STRING (value));
+  return attr_rtx (EQ_ATTR, name, value);
 }
 
 static const char *
@@ -646,89 +655,6 @@ attr_equal_p (rtx x, rtx y)
 		 && rtx_equal_p (x, y)));
 }
 
-/* Copy an attribute value expression,
-   descending to all depths, but not copying any
-   permanent hashed subexpressions.  */
-
-static rtx
-attr_copy_rtx (rtx orig)
-{
-  rtx copy;
-  int i, j;
-  RTX_CODE code;
-  const char *format_ptr;
-
-  /* No need to copy a permanent object.  */
-  if (ATTR_PERMANENT_P (orig))
-return orig;
-
-  code = GET_CODE (orig);
-
-  switch (code)
-{
-case REG:
-CASE_CONST_ANY:
-case SYMBOL_REF:
-case MATCH_TEST:
-case CODE_LABEL:
-case PC:
-case CC0:
-  return orig;
-
-default:
-  break;
-}
-
-  copy = rtx_alloc (code);
-  PUT_MODE (copy, GET_MODE (orig));
-  ATTR_IND_SIMPLIFIED_P (copy) = ATTR_IND_SIMPLIFIED_P (orig);
-  ATTR_CURR_SIMPLIFIED_P (copy) = ATTR_CURR_SIMPLIFIED_P (orig);
-  ATTR_PERMANENT_P (copy) = ATTR_PERMANENT_P (orig);
-
-  format_ptr = GET_RTX_FORMAT (GET_CODE (copy));
-
-  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (copy)); i++)
-{
-  switch (*format_ptr++)
-	{
-	case 'e':
-	  XEXP (copy, i) = XEXP