Re: [PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping

2013-10-31 Thread Martin Jambor
On Thu, Oct 31, 2013 at 12:16:41AM +0100, Jakub Jelinek wrote:
> On Fri, Oct 25, 2013 at 05:19:06PM +0200, Martin Jambor wrote:
> > 2013-10-23  Martin Jambor  
> > 
> > PR rtl-optimization/10474
> > * ira.c (find_moveable_pseudos): Do not calculate dominance info
> > nor df analysis.
> > (interesting_dest_for_shprep): New function.
> > (split_live_ranges_for_shrink_wrap): Likewise.
> > (ira): Calculate dominance info and df analysis. Call
> > split_live_ranges_for_shrink_wrap.
> > 
> > testsuite/
> > * gcc.dg/pr10474.c: New testcase.
> > * gcc.dg/ira-shrinkwrap-prep-1.c: Likewise.
> > * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
> 
> Unfortunately this patch breaks i686-linux bootstrap,

Because of this, PR 58934 and perhaps other problems, and because I
have reasons to doubt that I will be able to resolve them today or
tomorrow (for example seeing postreload in the backraces makes me
think I'll need some help :-), I am about to commit the following to
revert my patch, after it passed C and C++ bootstrap and x86_64-linux.

Sorry for the breakage,

Martin

2013-10-31  Martin Jambor  

PR rtl-optimization/58934

Revert:
2013-10-30  Martin Jambor  
PR rtl-optimization/10474
* ira.c (find_moveable_pseudos): Do not calculate dominance info
nor df analysis.
(interesting_dest_for_shprep): New function.
(split_live_ranges_for_shrink_wrap): Likewise.
(ira): Calculate dominance info and df analysis. Call
split_live_ranges_for_shrink_wrap.

testsuite/
* gcc.dg/pr10474.c: New testcase.
* gcc.dg/ira-shrinkwrap-prep-1.c: Likewise.
* gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.


diff --git a/gcc/ira.c b/gcc/ira.c
index d959109..1a26fed 100644
--- b/gcc/ira.c
+++ a/gcc/ira.c
@@ -3990,6 +3990,9 @@
   pseudo_replaced_reg.release ();
   pseudo_replaced_reg.safe_grow_cleared (max_regs);
 
+  df_analyze ();
+  calculate_dominance_info (CDI_DOMINATORS);
+
   i = 0;
   bitmap_initialize (&live, 0);
   bitmap_initialize (&used, 0);
@@ -4309,196 +4312,7 @@
   regstat_free_ri ();
   regstat_init_n_sets_and_refs ();
   regstat_compute_ri ();
-}
-
-
-/* If insn is interesting for parameter range-splitting shring-wrapping
-   preparation, i.e. it is a single set from a hard register to a pseudo, which
-   is live at CALL_DOM, return the destination.  Otherwise return NULL.  */
-
-static rtx
-interesting_dest_for_shprep (rtx insn, basic_block call_dom)
-{
-  rtx set = single_set (insn);
-  if (!set)
-return NULL;
-  rtx src = SET_SRC (set);
-  rtx dest = SET_DEST (set);
-  if (!REG_P (src) || !HARD_REGISTER_P (src)
-  || !REG_P (dest) || HARD_REGISTER_P (dest)
-  || (call_dom && !bitmap_bit_p (df_get_live_in (call_dom), REGNO (dest
-return NULL;
-  return dest;
-}
-
-/* Split live ranges of pseudos that are loaded from hard registers in the
-   first BB in a BB that dominates all non-sibling call if such a BB can be
-   found and is not in a loop.  Return true if the function has made any
-   changes.  */
-
-static bool
-split_live_ranges_for_shrink_wrap (void)
-{
-  basic_block bb, call_dom = NULL;
-  basic_block first = single_succ (ENTRY_BLOCK_PTR);
-  rtx insn, last_interesting_insn = NULL;
-  bitmap_head need_new, reachable;
-  vec queue;
-
-  if (!flag_shrink_wrap)
-return false;
-
-  bitmap_initialize (&need_new, 0);
-  bitmap_initialize (&reachable, 0);
-  queue.create (n_basic_blocks);
-
-  FOR_EACH_BB (bb)
-FOR_BB_INSNS (bb, insn)
-  if (CALL_P (insn) && !SIBLING_CALL_P (insn))
-   {
- if (bb == first)
-   {
- bitmap_clear (&need_new);
- bitmap_clear (&reachable);
- queue.release ();
- return false;
-   }
-
- bitmap_set_bit (&need_new, bb->index);
- bitmap_set_bit (&reachable, bb->index);
- queue.quick_push (bb);
- break;
-   }
-
-  if (queue.is_empty ())
-{
-  bitmap_clear (&need_new);
-  bitmap_clear (&reachable);
-  queue.release ();
-  return false;
-}
-
-  while (!queue.is_empty ())
-{
-  edge e;
-  edge_iterator ei;
-
-  bb = queue.pop ();
-  FOR_EACH_EDGE (e, ei, bb->succs)
-   if (e->dest != EXIT_BLOCK_PTR
-   && bitmap_set_bit (&reachable, e->dest->index))
- queue.quick_push (e->dest);
-}
-  queue.release ();
-
-  FOR_BB_INSNS (first, insn)
-{
-  rtx dest = interesting_dest_for_shprep (insn, NULL);
-  if (!dest)
-   continue;
-
-  if (DF_REG_DEF_COUNT (REGNO (dest)) > 1)
-   {
- bitmap_clear (&need_new);
- bitmap_clear (&reachable);
- return false;
-   }
-
-  for (df_ref use = DF_REG_USE_CHAIN (REGNO(dest));
-  use;
-  use = DF_REF_NEXT_REG (use))
-   {
- if (NONDEBUG_INSN_P (DF_REF_INSN (use))
- && GET_CODE (DF_REF_REG (use)) == SUBREG)
-   {
- /* This is 

ICE with "[PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping"

2013-10-30 Thread Hans-Peter Nilsson
> From: Jakub Jelinek 
> Date: Thu, 31 Oct 2013 00:16:41 +0100

> On Fri, Oct 25, 2013 at 05:19:06PM +0200, Martin Jambor wrote:
> > 2013-10-23  Martin Jambor  
> > 
> > PR rtl-optimization/10474
> > * ira.c (find_moveable_pseudos): Do not calculate dominance info
> > nor df analysis.
> > (interesting_dest_for_shprep): New function.
> > (split_live_ranges_for_shrink_wrap): Likewise.
> > (ira): Calculate dominance info and df analysis. Call
> > split_live_ranges_for_shrink_wrap.
> > 
> > testsuite/
> > * gcc.dg/pr10474.c: New testcase.
> > * gcc.dg/ira-shrinkwrap-prep-1.c: Likewise.
> > * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
> 
> Unfortunately this patch breaks i686-linux bootstrap,

It (revision r204205) "also" causes
.

(If I had seen your message Jakub, it might have saved me the
reghunt and a PR.  On the other hand, debugging an ICE is far
more comfortable than a miscompare.)

brgds, H-P


Re: [PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping

2013-10-30 Thread Jakub Jelinek
On Fri, Oct 25, 2013 at 05:19:06PM +0200, Martin Jambor wrote:
> 2013-10-23  Martin Jambor  
> 
>   PR rtl-optimization/10474
>   * ira.c (find_moveable_pseudos): Do not calculate dominance info
>   nor df analysis.
>   (interesting_dest_for_shprep): New function.
>   (split_live_ranges_for_shrink_wrap): Likewise.
>   (ira): Calculate dominance info and df analysis. Call
>   split_live_ranges_for_shrink_wrap.
> 
> testsuite/
>   * gcc.dg/pr10474.c: New testcase.
>   * gcc.dg/ira-shrinkwrap-prep-1.c: Likewise.
>   * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.

Unfortunately this patch breaks i686-linux bootstrap,
in r204204 compare passes, while in r204205 I'm getting .bad_compare
gcc/fortran/module.o differs
   
gcc/ipa.o differs   
   
gcc/go/gogo.o differs   
   
gcc/go/statements.o differs 
   
Most likely combine.o is miscompiled, but haven't verified that yet.

The way I'm configuring this on x86_64-linux is:
mkdir ~/hbin
cat > ~/hbin/as <<\EOF2
#!/bin/sh
exec /usr/bin/as --32 "$@"
EOF2
cat > ~/hbin/g++ <<\EOF2
#!/bin/sh
exec /usr/bin/g++ -m32 "$@"
EOF2
cat > ~/hbin/gcc <<\EOF2
#!/bin/sh
exec /usr/bin/gcc -m32 "$@"
EOF2
cat > ~/hbin/ld <<\EOF2
#!/bin/sh
case "$*" in
  --version) cat <<\EOF
GNU ld version 2.20.52.0.1-10.fc17 20100131
Copyright 2012 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later
version.
This program has absolutely no warranty.
EOF
  exit 0;;
esac
exec /usr/bin/ld -m elf_i386 -L /usr/lib/ "$@"
EOF2
chmod 755 ~/hbin/*
PATH=~/hbin:$PATH i386 ../configure --enable-languages=all,obj-c++,lto,go 
--enable-checking=yes,rtl
PATH=~/hbin:$PATH i386 make -j48

Jakub


Re: [PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping

2013-10-29 Thread Vladimir Makarov
On 10/25/2013 11:19 AM, Martin Jambor wrote:
> Hi,
>
> On Thu, Oct 24, 2013 at 01:02:51AM +0200, Steven Bosscher wrote:
>> On Wed, Oct 23, 2013 at 6:46 PM, Martin Jambor wrote:
>>
>>>  /* Perform the second half of the transformation started in
>>> @@ -4522,7 +4704,15 @@ ira (FILE *f)
>>>   allocation because of -O0 usage or because the function is too
>>>   big.  */
>>>if (ira_conflicts_p)
>>> -find_moveable_pseudos ();
>>> +{
>>> +  df_analyze ();
>>> +  calculate_dominance_info (CDI_DOMINATORS);
>>> +
>>> +  find_moveable_pseudos ();
>>> +  split_live_ranges_for_shrink_wrap ();
>>> +
>>> +  free_dominance_info (CDI_DOMINATORS);
>>> +}
>>>
>> You probably want to add another df_analyze if
>> split_live_ranges_for_shrink_wrap makes code transformations. AFAIU
>> find_moveable_pseudos doesn't change global liveness but your
>> transformation might. IRA/LRA need up-to-date DF_LR results to compute
>> allocno live ranges.
>>
> OK, I have changed the patch to fo that (it is below, still bootstraps
> and passes tests on x86_64 fine).  However, I have noticed that the
> corresponding part in function ira now looks like:
>
>   /* ... */
>   if (delete_trivially_dead_insns (get_insns (), max_reg_num ()))
> df_analyze ();
>
>   /* It is not worth to do such improvement when we use a simple
>  allocation because of -O0 usage or because the function is too
>  big.  */
>   if (ira_conflicts_p)
> {
>   df_analyze ();
>   calculate_dominance_info (CDI_DOMINATORS);
>
>   find_moveable_pseudos ();
>   if (split_live_ranges_for_shrink_wrap ())
>   df_analyze ();
>
>   free_dominance_info (CDI_DOMINATORS);
> }
>   /* ... */
>
> So, that left me wondering whether the first call to df_analyze is
> actually necessary, or whether perhaps the data are actually already
> up to date.  What do you think?
I guess it needs some investigation.  delete_trivially_dead_insns code
was taken from the old RA.  First of all, I don't know how many insns
are really trivially dead before RA in optimization and non-optimization
mode.  May be the code can be removed at all.  I'll put it on my todo list.
The patch is ok to commit.  Thanks for working on this, Martin.
>
> 2013-10-23  Martin Jambor  
>
>   PR rtl-optimization/10474
>   * ira.c (find_moveable_pseudos): Do not calculate dominance info
>   nor df analysis.
>   (interesting_dest_for_shprep): New function.
>   (split_live_ranges_for_shrink_wrap): Likewise.
>   (ira): Calculate dominance info and df analysis. Call
>   split_live_ranges_for_shrink_wrap.
>
> testsuite/
>   * gcc.dg/pr10474.c: New testcase.
>   * gcc.dg/ira-shrinkwrap-prep-1.c: Likewise.
>   * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
>
>



Re: [PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping

2013-10-25 Thread Martin Jambor
Hi,

On Thu, Oct 24, 2013 at 01:02:51AM +0200, Steven Bosscher wrote:
> On Wed, Oct 23, 2013 at 6:46 PM, Martin Jambor wrote:
> 
> >  /* Perform the second half of the transformation started in
> > @@ -4522,7 +4704,15 @@ ira (FILE *f)
> >   allocation because of -O0 usage or because the function is too
> >   big.  */
> >if (ira_conflicts_p)
> > -find_moveable_pseudos ();
> > +{
> > +  df_analyze ();
> > +  calculate_dominance_info (CDI_DOMINATORS);
> > +
> > +  find_moveable_pseudos ();
> > +  split_live_ranges_for_shrink_wrap ();
> > +
> > +  free_dominance_info (CDI_DOMINATORS);
> > +}
> >
> 
> You probably want to add another df_analyze if
> split_live_ranges_for_shrink_wrap makes code transformations. AFAIU
> find_moveable_pseudos doesn't change global liveness but your
> transformation might. IRA/LRA need up-to-date DF_LR results to compute
> allocno live ranges.
> 

OK, I have changed the patch to fo that (it is below, still bootstraps
and passes tests on x86_64 fine).  However, I have noticed that the
corresponding part in function ira now looks like:

  /* ... */
  if (delete_trivially_dead_insns (get_insns (), max_reg_num ()))
df_analyze ();

  /* It is not worth to do such improvement when we use a simple
 allocation because of -O0 usage or because the function is too
 big.  */
  if (ira_conflicts_p)
{
  df_analyze ();
  calculate_dominance_info (CDI_DOMINATORS);

  find_moveable_pseudos ();
  if (split_live_ranges_for_shrink_wrap ())
df_analyze ();

  free_dominance_info (CDI_DOMINATORS);
}
  /* ... */

So, that left me wondering whether the first call to df_analyze is
actually necessary, or whether perhaps the data are actually already
up to date.  What do you think?

Thanks for all the feedback,

Martin


2013-10-23  Martin Jambor  

PR rtl-optimization/10474
* ira.c (find_moveable_pseudos): Do not calculate dominance info
nor df analysis.
(interesting_dest_for_shprep): New function.
(split_live_ranges_for_shrink_wrap): Likewise.
(ira): Calculate dominance info and df analysis. Call
split_live_ranges_for_shrink_wrap.

testsuite/
* gcc.dg/pr10474.c: New testcase.
* gcc.dg/ira-shrinkwrap-prep-1.c: Likewise.
* gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.

diff --git a/gcc/ira.c b/gcc/ira.c
index 203fbff..532db31 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -3989,9 +3989,6 @@ find_moveable_pseudos (void)
   pseudo_replaced_reg.release ();
   pseudo_replaced_reg.safe_grow_cleared (max_regs);
 
-  df_analyze ();
-  calculate_dominance_info (CDI_DOMINATORS);
-
   i = 0;
   bitmap_initialize (&live, 0);
   bitmap_initialize (&used, 0);
@@ -4311,7 +4308,196 @@ find_moveable_pseudos (void)
   regstat_free_ri ();
   regstat_init_n_sets_and_refs ();
   regstat_compute_ri ();
-  free_dominance_info (CDI_DOMINATORS);
+}
+
+
+/* If insn is interesting for parameter range-splitting shring-wrapping
+   preparation, i.e. it is a single set from a hard register to a pseudo, which
+   is live at CALL_DOM, return the destination.  Otherwise return NULL.  */
+
+static rtx
+interesting_dest_for_shprep (rtx insn, basic_block call_dom)
+{
+  rtx set = single_set (insn);
+  if (!set)
+return NULL;
+  rtx src = SET_SRC (set);
+  rtx dest = SET_DEST (set);
+  if (!REG_P (src) || !HARD_REGISTER_P (src)
+  || !REG_P (dest) || HARD_REGISTER_P (dest)
+  || (call_dom && !bitmap_bit_p (df_get_live_in (call_dom), REGNO (dest
+return NULL;
+  return dest;
+}
+
+/* Split live ranges of pseudos that are loaded from hard registers in the
+   first BB in a BB that dominates all non-sibling call if such a BB can be
+   found and is not in a loop.  Return true if the function has made any
+   changes.  */
+
+static bool
+split_live_ranges_for_shrink_wrap (void)
+{
+  basic_block bb, call_dom = NULL;
+  basic_block first = single_succ (ENTRY_BLOCK_PTR);
+  rtx insn, last_interesting_insn = NULL;
+  bitmap_head need_new, reachable;
+  vec queue;
+
+  if (!flag_shrink_wrap)
+return false;
+
+  bitmap_initialize (&need_new, 0);
+  bitmap_initialize (&reachable, 0);
+  queue.create (n_basic_blocks);
+
+  FOR_EACH_BB (bb)
+FOR_BB_INSNS (bb, insn)
+  if (CALL_P (insn) && !SIBLING_CALL_P (insn))
+   {
+ if (bb == first)
+   {
+ bitmap_clear (&need_new);
+ bitmap_clear (&reachable);
+ queue.release ();
+ return false;
+   }
+
+ bitmap_set_bit (&need_new, bb->index);
+ bitmap_set_bit (&reachable, bb->index);
+ queue.quick_push (bb);
+ break;
+   }
+
+  if (queue.is_empty ())
+{
+  bitmap_clear (&need_new);
+  bitmap_clear (&reachable);
+  queue.release ();
+  return false;
+}
+
+  while (!queue.is_empty ())
+{
+  edge e;
+  edge_iterator ei;
+
+  bb = queue.pop ();
+  FOR_EACH_EDGE (e, ei, bb->suc

Re: [PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping

2013-10-23 Thread Steven Bosscher
On Wed, Oct 23, 2013 at 6:46 PM, Martin Jambor wrote:

>  /* Perform the second half of the transformation started in
> @@ -4522,7 +4704,15 @@ ira (FILE *f)
>   allocation because of -O0 usage or because the function is too
>   big.  */
>if (ira_conflicts_p)
> -find_moveable_pseudos ();
> +{
> +  df_analyze ();
> +  calculate_dominance_info (CDI_DOMINATORS);
> +
> +  find_moveable_pseudos ();
> +  split_live_ranges_for_shrink_wrap ();
> +
> +  free_dominance_info (CDI_DOMINATORS);
> +}
>

You probably want to add another df_analyze if
split_live_ranges_for_shrink_wrap makes code transformations. AFAIU
find_moveable_pseudos doesn't change global liveness but your
transformation might. IRA/LRA need up-to-date DF_LR results to compute
allocno live ranges.

Ciao!
Steven


Re: [PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping

2013-10-23 Thread Martin Jambor
Hi,

On Mon, Oct 21, 2013 at 11:00:38PM -0400, Vladimir Makarov wrote:
> On 13-10-21 6:56 PM, Steven Bosscher wrote:
> >
> >+   {
> >+ bitmap_clear (&need_new);
> >+ bitmap_clear (&reachable);
> >+ return;
> >+   }
> >+
> >+  for (df_ref use = DF_REG_USE_CHAIN (REGNO(dest));
> >+  use;
> >+  use = DF_REF_NEXT_REG (use))
> >You're using DF in these places. But IRA and LRA don't work with DF.
> >After update_equiv_regs DF caches and liveness may be incorrect. You'd
> >have to add a df_analyze call but I'm not sure how that will interact
> >with IRA/LRA's own dataflow frameworks (e.g. w.r.t.
> >REG_DEAD/REG_UNUSED notes).
> >
> >
> Sorry, Martin.  I think Steven is right.  IRA/LRA (and reload pass)
> creates so many changes in RTL that DF infrastructure would slow
> down the compiler a lot and therefore df info is not updated during
> RA.

no need to be sorry, I absolutely anticipated comments and requests
for changes.

>  Your patch mostly uses a correct DF-info because there are few
> changes since updating is off.

I think the reason why it works is that find_moveable_pseudos, which
is called immediately before the function I'm adding, already calls
df_analyze.  I suppose it does not cause any trouble since the call is
there for quite a few months already.  Because find_moveable_pseudos
will never split registers which are defined by a set from a hard
register (because rtx_moveable_p returns false for a HARD_REGISTER_P),
the analysis results should still be perfectly up-to-date for the
purposes of my transformation.

So, do you think that this could be just made more explicit by moving
the call to df_analyze (and dominator calculation) out of
find_moveable_pseudos to ira() as in the (bootstrapped, tested) patch
below?

> You could move your optimization a bit up before df_clear_flags
> (DF_NO_INSN_RESCAN); or move this call right after your
> optimizations (possibly some minor df calls are needed too to
> restore live info for RA after your RTL changes).

I tried this but got dark glibc double free and segfault errors from
deep down in the call to df_analyze in find_moveable_pseudos, so I
quickly chickened out.  I will re-try (or even move the transformation
more up front) if you or Steven reject this attempt.

Thanks,

Martin


2013-10-23  Martin Jambor  

PR rtl-optimization/10474
* ira.c (find_moveable_pseudos): Do not calculate dominance info
nor df analysis.
(interesting_dest_for_shprep): New function.
(split_live_ranges_for_shrink_wrap): Likewise.
(ira): Calculate dominance info and df analysis. Call
split_live_ranges_for_shrink_wrap.

testsuite/
* gcc.dg/pr10474.c: New testcase.
* gcc.dg/ira-shrinkwrap-prep-1.c: Likewise.
* gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.

diff --git a/gcc/ira.c b/gcc/ira.c
index 203fbff..0faea8f 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -3989,9 +3989,6 @@ find_moveable_pseudos (void)
   pseudo_replaced_reg.release ();
   pseudo_replaced_reg.safe_grow_cleared (max_regs);
 
-  df_analyze ();
-  calculate_dominance_info (CDI_DOMINATORS);
-
   i = 0;
   bitmap_initialize (&live, 0);
   bitmap_initialize (&used, 0);
@@ -4311,7 +4308,192 @@ find_moveable_pseudos (void)
   regstat_free_ri ();
   regstat_init_n_sets_and_refs ();
   regstat_compute_ri ();
-  free_dominance_info (CDI_DOMINATORS);
+}
+
+
+/* If insn is interesting for parameter range-splitting shring-wrapping
+   preparation, i.e. it is a single set from a hard register to a pseudo, which
+   is live at CALL_DOM, return the destination.  Otherwise return NULL.  */
+
+static rtx
+interesting_dest_for_shprep (rtx insn, basic_block call_dom)
+{
+  rtx set = single_set (insn);
+  if (!set)
+return NULL;
+  rtx src = SET_SRC (set);
+  rtx dest = SET_DEST (set);
+  if (!REG_P (src) || !HARD_REGISTER_P (src)
+  || !REG_P (dest) || HARD_REGISTER_P (dest)
+  || (call_dom && !bitmap_bit_p (df_get_live_in (call_dom), REGNO (dest
+return NULL;
+  return dest;
+}
+
+/* Split live ranges of pseudos that are loaded from hard registers in the
+   first BB in a BB that dominates all non-sibling call if such a BB can be
+   found and is not in a loop.  */
+
+static void
+split_live_ranges_for_shrink_wrap (void)
+{
+  basic_block bb, call_dom = NULL;
+  basic_block first = single_succ (ENTRY_BLOCK_PTR);
+  rtx insn, last_interesting_insn = NULL;
+  bitmap_head need_new, reachable;
+  vec queue;
+
+  if (!flag_shrink_wrap)
+return;
+
+  bitmap_initialize (&need_new, 0);
+  bitmap_initialize (&reachable, 0);
+  queue.create (n_basic_blocks);
+
+  FOR_EACH_BB (bb)
+FOR_BB_INSNS (bb, insn)
+  if (CALL_P (insn) && !SIBLING_CALL_P (insn))
+   {
+ if (bb == first)
+   {
+ bitmap_clear (&need_new);
+ bitmap_clear (&reachable);
+ queue.release ();
+ return;
+   }
+
+ bitmap_set_bit (&need_new, bb->in

Re: [PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping

2013-10-21 Thread Vladimir Makarov

On 13-10-21 6:56 PM, Steven Bosscher wrote:


+   {
+ bitmap_clear (&need_new);
+ bitmap_clear (&reachable);
+ return;
+   }
+
+  for (df_ref use = DF_REG_USE_CHAIN (REGNO(dest));
+  use;
+  use = DF_REF_NEXT_REG (use))
You're using DF in these places. But IRA and LRA don't work with DF.
After update_equiv_regs DF caches and liveness may be incorrect. You'd
have to add a df_analyze call but I'm not sure how that will interact
with IRA/LRA's own dataflow frameworks (e.g. w.r.t.
REG_DEAD/REG_UNUSED notes).


Sorry, Martin.  I think Steven is right.  IRA/LRA (and reload pass) 
creates so many changes in RTL that DF infrastructure would slow down 
the compiler a lot and therefore df info is not updated during RA.  Your 
patch mostly uses a correct DF-info because there are few changes since 
updating is off.


You could move your optimization a bit up before df_clear_flags 
(DF_NO_INSN_RESCAN); or move this call right after your optimizations 
(possibly some minor df calls are needed too to restore live info for RA 
after your RTL changes).




Re: [PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping

2013-10-21 Thread Steven Bosscher
On Mon, Oct 21, 2013 at 6:32 PM, Martin Jambor wrote:
> --- a/gcc/ira.c
> +++ b/gcc/ira.c
> @@ -4314,6 +4314,197 @@ find_moveable_pseudos (void)
>free_dominance_info (CDI_DOMINATORS);
>  }
>
> +
> +/* If insn is interesting for parameter range-splitting shring-wrapping
> +   preparation, i.e. it is a single set from a hard register to a pseudo, 
> which
> +   is live at CALL_DOM, return the destination.  Otherwise return NULL.  */
> +
> +static rtx
> +interesting_dest_for_shprep (rtx insn, basic_block call_dom)
> +{
> +  rtx set = single_set (insn);
> +  if (!set)
> +return NULL;
> +  rtx src = SET_SRC (set);
> +  rtx dest = SET_DEST (set);
> +  if (!REG_P (src) || !HARD_REGISTER_P (src)
> +  || !REG_P (dest) || HARD_REGISTER_P (dest)
> +  || (call_dom && !bitmap_bit_p (df_get_live_in (call_dom), REGNO 
> (dest

See below...


> +return NULL;
> +  return dest;
> +}
> +
> +/* Split live ranges of pseudos that are loaded from hard registers in the
> +   first BB in a BB that dominates all non-sibling call if such a BB can be
> +   found and is not in a loop.  */
> +
> +static void
> +split_live_ranges_for_shrink_wrap (void)
> +{
> +  basic_block bb, call_dom = NULL;
> +  basic_block first = single_succ (ENTRY_BLOCK_PTR);
> +  rtx insn, last_interesting_insn = NULL;
> +  bitmap_head need_new, reachable;
> +  vec queue;
> +
> +  if (!flag_shrink_wrap)
> +return;
> +
> +  bitmap_initialize (&need_new, 0);
> +  bitmap_initialize (&reachable, 0);
> +  queue.create (n_basic_blocks);
> +
> +  FOR_EACH_BB (bb)
> +FOR_BB_INSNS (bb, insn)
> +  if (CALL_P (insn) && !SIBLING_CALL_P (insn))
> +   {
> + if (bb == first)
> +   {
> + bitmap_clear (&need_new);
> + bitmap_clear (&reachable);
> + queue.release ();
> + return;
> +   }
> +
> + bitmap_set_bit (&need_new, bb->index);
> + bitmap_set_bit (&reachable, bb->index);
> + queue.quick_push (bb);
> + break;
> +   }
> +
> +  if (queue.is_empty ())
> +{
> +  bitmap_clear (&need_new);
> +  bitmap_clear (&reachable);
> +  queue.release ();
> +  return;
> +}
> +
> +  while (!queue.is_empty ())
> +{
> +  edge e;
> +  edge_iterator ei;
> +
> +  bb = queue.pop ();
> +  FOR_EACH_EDGE (e, ei, bb->succs)
> +   if (e->dest != EXIT_BLOCK_PTR
> +   && bitmap_set_bit (&reachable, e->dest->index))
> + queue.quick_push (e->dest);
> +}
> +  queue.release ();
> +
> +  FOR_BB_INSNS (first, insn)
> +{
> +  rtx dest = interesting_dest_for_shprep (insn, NULL);
> +  if (!dest)
> +   continue;
> +
> +  if (DF_REG_DEF_COUNT (REGNO (dest)) > 1)

See below...


> +   {
> + bitmap_clear (&need_new);
> + bitmap_clear (&reachable);
> + return;
> +   }
> +
> +  for (df_ref use = DF_REG_USE_CHAIN (REGNO(dest));
> +  use;
> +  use = DF_REF_NEXT_REG (use))

You're using DF in these places. But IRA and LRA don't work with DF.
After update_equiv_regs DF caches and liveness may be incorrect. You'd
have to add a df_analyze call but I'm not sure how that will interact
with IRA/LRA's own dataflow frameworks (e.g. w.r.t.
REG_DEAD/REG_UNUSED notes).


> + rtx uin = DF_REF_INSN (use);
> + int ubbi = BLOCK_FOR_INSN (uin)->index;

int ubbi = DF_REF_BB (use)?

Ciao!
Steven


[PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping

2013-10-21 Thread Martin Jambor
Hi,

in spring I have suggested to shedule pass_cprop_hardreg before
pass_thread_prologue_and_epilogue in order to create many more
shrink-wrapping opportunities.  The problem is that formal arguments
of a functions which need to be saved across calls on slow paths often
get assigned callee saved registers and the very first BB thus needs
prologue, which makes any attempt at shrink-wrapping difficult.

Steven suggested that splitting live-ranges of these registers might
be a better approach (and Vlad suggested that I should look at
http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01137.html, a big thanks
to both) and this is what the patch below does.

The basic idea is to split live ranges of problematic pseudos in the
common dominator of all non-sibling calls and all uses of those
pseudos in BBs which are reachable from calls, if such dominator is
not in a loop and if it does not post-dominate the entry BB.

As far as the number of performed shrink-wrappings is concerned, it
has even bigger effect to scheduling pass_cprop_hardreg earlier that I
have proposed in spring.  Apart from looking at bootstrap and SPEC
2006 (at -Ofast) I have tried compiling Python and Ruby (with whatever
their default flags were, I believe mostly -O3) because David
suggested at his Cauldron talk last year that shrink-wrapping may be
important:

Number of performed shrink-wrappings:
| Source  | Trunk | With patch |   % |
|-+---++-|
| SPEC 2006 Int   |   734 |   1670 | +127.52 |
| SPEC 2006 FP (*)|   575 |   1022 |  +77.74 |
| C/CPP bootstrap stage 2 |  1748 |   3015 |  +72.48 |
| Python 2.7.5|   231 |401 |  +73.59 |
| Python 3.3.2|   234 |416 |  +77.78 |
| Ruby 2.0.0-p247 |   331 |464 |  +40.18 |

(*) I forgot to increase the stack limit and I believe that is the
reason why 410.bwaves, 416.gamess, 447.dealII and 481.wrf SPEC 2006 FP
benchmarks failed, regardless of my patch, but I have not investigated
what exactly happened, so there might have even been compile-time
issues.

As far as run-times are concerned, I have tried running the benchmarks
from hg.python.org/benchmarks but I do not have anything to report.
David claimed that shrink-wrapping in function lookdict_string is
likely to help but this patch alone does not make that happen, I am
investigating what else needs to be done for this to happen.  I have
not yet tried benchmarking ruby.  Spec 2006 (*) run times are a bit
encouraging.

Run-times (seconds, median of three runs):
| Benchmark  | Trunk | Live range splitting | % |
|+---+--+---|
| 400.perlbench  |   294 |  290 | -1.36 |
| 401.bzip2  |   384 |  387 | +0.78 |
| 403.gcc|   239 |  237 | -0.84 |
| 429.mcf|   227 |  228 | +0.44 |
| 445.gobmk  |   392 |  390 | -0.51 |
| 456.hmmer  |   343 |  342 | -0.29 |
| 458.sjeng  |   418 |  411 | -1.67 |
| 462.libquantum |   287 |  288 | +0.35 |
| 464.h264ref|   417 |  418 | +0.24 |
| 471.omnetpp|   275 |  280 | +1.82 |
| 473.astar  |   312 |  295 | -5.45 |
| 483.xalancbmk  |   183 |  183 |  0.00 |
|+---+--+---|
| 433.milc   |   336 |  335 | -0.30 |
| 434.zeusmp |   247 |  246 | -0.40 |
| 435.gromacs|   259 |  259 |  0.00 |
| 436.cactusADM  |   192 |  186 | -3.12 |
| 437.leslie3d   |   227 |  226 | -0.44 |
| 444.namd   |   315 |  315 |  0.00 |
| 450.soplex |   202 |  200 | -0.99 |
| 453.povray |   136 |  134 | -1.47 |
| 454.calculix   |   300 |  300 |  0.00 |
| 459.GemsFDTD   |   273 |  267 | -2.20 |
| 465.tonto  |   685 |  685 |  0.00 |
| 470.lbm|   212 |  213 | +0.47 |
| 482.sphinx3|   363 |  360 | -0.83 |

The patch is speculative, quite a few modifications do not lead to
shrink wrappings, as outlined in the following table which shows how
many of the changed functions still needed their prologue in the first
BB.

Number of functions touched:
| |  Modified | |   |
| Source  | functions | In vain | % |
|-+---+-+---|
| SPEC 2006 Int   |  2004 | 745 | 37.18 |
| SPEC 2006 FP (*)|  1491 | 881 | 59.09 |
| C/C++ bootstrap stage 2 |  2864 | 935 | 32.65 |