[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2018-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #18 from Richard Biener  ---
Fixed.

[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2018-11-19 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #17 from Martin Liška  ---
Bill: Can the bug be marked as resolved? Or please update Known to work.

[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2016-08-11 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

--- Comment #16 from Bill Schmidt  ---
Author: wschmidt
Date: Thu Aug 11 22:20:41 2016
New Revision: 239395

URL: https://gcc.gnu.org/viewcvs?rev=239395&root=gcc&view=rev
Log:
2016-08-11  Richard Biener  
Bill Schmidt  

PR rtl-optimization/72855
* df-core.c (df_verify): Turn off DF_VERIFY_SCHEDULED at end.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/df-core.c

[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2016-08-11 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

--- Comment #15 from Bill Schmidt  ---
For your patch submission, the testing was done on
powerpc64le-unknown-linux-gnu.

[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2016-08-11 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

--- Comment #14 from Bill Schmidt  ---
Oh, I should have mentioned, it passed bootstrap with no regressions, so the
patch LGTM.

[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2016-08-11 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

--- Comment #13 from amker at gcc dot gnu.org ---
(In reply to Bill Schmidt from comment #10)
> Some experiments on trunk:
> 
> - Using Bin's patch, I see compile time reduced to ~14 minutes.
> - Using Richi's patch, I see compile time reduced to ~9 minutes.
> 
> So both are quite helpful compared to somewhere around 2 hours.
> 
> I'll plan to implement the pre-approved patch, but first I want to try to
> dig into why flag_checking appears to have an unexpected value.

Hi, could you help bootstrap/test the patch please?  I don't have a machine
with doloop optimization at hand.  If the test is OK, the patch should be
helpful in addition to the other one, because in effect it moves the cheaper
checks before the expensive one.  Thanks.

[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2016-08-11 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

--- Comment #12 from Bill Schmidt  ---
Last night before I ran out of time, I built a debug compiler on gcc-6-branch,
and flag_checking was always 0, and I didn't have the compile time issue.  So
it appears to be something that only occurs with a bootstrap compiler, which
isn't going to be very helpful for running this down. :/

[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2016-08-11 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

--- Comment #11 from rguenther at suse dot de  ---
On Wed, 10 Aug 2016, wschmidt at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855
> 
> --- Comment #10 from Bill Schmidt  ---
> Some experiments on trunk:
> 
> - Using Bin's patch, I see compile time reduced to ~14 minutes.
> - Using Richi's patch, I see compile time reduced to ~9 minutes.
> 
> So both are quite helpful compared to somewhere around 2 hours.
> 
> I'll plan to implement the pre-approved patch, but first I want to try to dig
> into why flag_checking appears to have an unexpected value.

Yes, that's very much appreciated.  As the issue also appears on older
branches without flag_checking, there may be a stray memory write to
df->changeable_flags somewhere?  As it is a global, maybe put a watchpoint
on it ...

[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2016-08-10 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

--- Comment #10 from Bill Schmidt  ---
Some experiments on trunk:

- Using Bin's patch, I see compile time reduced to ~14 minutes.
- Using Richi's patch, I see compile time reduced to ~9 minutes.

So both are quite helpful compared to somewhere around 2 hours.

I'll plan to implement the pre-approved patch, but first I want to try to dig
into why flag_checking appears to have an unexpected value.

[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2016-08-10 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

--- Comment #9 from rguenther at suse dot de  ---
On August 10, 2016 7:20:00 PM GMT+02:00, "dje at gcc dot gnu.org"
 wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855
>
>David Edelsohn  changed:
>
>           What    |Removed                     |Added
>----------------------------------------------------------------------------
>             Status|UNCONFIRMED                 |NEW
>   Last reconfirmed|                            |2016-08-10
>     Ever confirmed|0                           |1
>
>--- Comment #8 from David Edelsohn  ---
>> Yes, but that is guarded by flag_checking which defaults to 0.
>
>How can flag_checking be 0 if -fno-checking has an effect?

It can't have an effect with release checking unless something is seriously
botched, which is why I asked for this to be investigated (I can't reproduce
it on x86_64 Linux).

[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2016-08-10 Thread dje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

David Edelsohn  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-08-10
     Ever confirmed|0                           |1

--- Comment #8 from David Edelsohn  ---
> Yes, but that is guarded by flag_checking which defaults to 0.

How can flag_checking be 0 if -fno-checking has an effect?

[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2016-08-10 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

--- Comment #7 from rguenther at suse dot de  ---
On August 10, 2016 5:15:43 PM GMT+02:00, "wschmidt at gcc dot gnu.org"
 wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855
>
>--- Comment #5 from Bill Schmidt  ---
>(In reply to Richard Biener from comment #2)
>> If we have release checking enabled then we should hit
>> 
>> static void
>> df_analyze_1 (void)
>> {
>> ...
>> #ifndef ENABLE_DF_CHECKING
>>   if (df->changeable_flags & DF_VERIFY_SCHEDULED)
>> #endif
>>     df_verify ();
>> 
>> so I wonder why df->changeable_flags & DF_VERIFY_SCHEDULED is ever true
>> for release checking.  Can you track that down?  
>
>df_finish_pass has the following unguarded code at the end of the function:
>
>  if (flag_checking && verify)
>    df->changeable_flags |= DF_VERIFY_SCHEDULED;
>
>This is the only way that DF_VERIFY_SCHEDULED gets turned on by itself.

Yes, but that is guarded by flag_checking which defaults to 0.

>> But yes, performing df_verify for each loop in a function is excessive,
>> we seem to lack ever clearing said flag.  Does
>> 
>> Index: gcc/df-core.c
>> ===================================================================
>> --- gcc/df-core.c   (revision 239276)
>> +++ gcc/df-core.c   (working copy)
>> @@ -1833,6 +1833,7 @@ df_verify (void)
>>    if (df_live)
>>      df_live_verify_transfer_functions ();
>>  #endif
>> +  df->changeable_flags &= ~DF_VERIFY_SCHEDULED;
>>  }
>>  
>>  #ifdef DF_DEBUG_CFG
>> 
>> help in the non-DF-checking case?
>
>It should, since df_finish_pass isn't called (indirectly from iv_analysis_done)
>by the doloop code until after all of the loops have been processed.

If you manage to test that it's pre-approved.

[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2016-08-10 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

--- Comment #6 from Bill Schmidt  ---
(In reply to amker from comment #4)
> It reduces compile time for powerpc-elf on x86_64 machine from 54m to 5m. 
> The compiler is configured with checking.  With "--enable-checking=release",
> the current trunk compiles for ~5m too, but gcc in Ubuntu has the issue even
> it's configured as release?

Yes -- I can reproduce this with:

Target: powerpc64le-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
4.8.4-2ubuntu1~14.04' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs
--enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.8 --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls
--with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap
--disable-libsanitizer --disable-libsanitizer --disable-libquadmath
--enable-plugin --with-system-zlib --disable-browser-plugin
--enable-java-awt=gtk --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-ppc64el/jre --enable-java-home
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-ppc64el
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-ppc64el
--with-arch-directory=ppc64el --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--enable-objc-gc --enable-secureplt --with-cpu=power7 --with-tune=power8
--enable-targets=powerpcle-linux --disable-multilib --enable-multiarch
--disable-werror --with-long-double-128 --enable-checking=release
--build=powerpc64le-linux-gnu --host=powerpc64le-linux-gnu
--target=powerpc64le-linux-gnu
Thread model: posix
gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04) 

I also reproduced it with latest gcc-6-branch, which should be built with
release checking, and with trunk (but that should use extra checking).  The guy
who spotted this originally reported it on 4.8.4, 4.9.?, and 5.4 also.

For his purposes, he recognized that he should just remove the silly --param,
but this uncovered a rather interesting test in any event.

[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2016-08-10 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

--- Comment #5 from Bill Schmidt  ---
(In reply to Richard Biener from comment #2)
> If we have release checking enabled then we should hit
> 
> static void
> df_analyze_1 (void)
> {
> ...
> #ifndef ENABLE_DF_CHECKING
>   if (df->changeable_flags & DF_VERIFY_SCHEDULED)
> #endif
>     df_verify ();
> 
> so I wonder why df->changeable_flags & DF_VERIFY_SCHEDULED is ever true
> for release checking.  Can you track that down?  

df_finish_pass has the following unguarded code at the end of the function:

  if (flag_checking && verify)
    df->changeable_flags |= DF_VERIFY_SCHEDULED;

This is the only way that DF_VERIFY_SCHEDULED gets turned on by itself.

> But yes, performing df_verify for each loop in a function is excessive,
> we seem to lack ever clearing said flag.  Does
> 
> Index: gcc/df-core.c
> ===================================================================
> --- gcc/df-core.c   (revision 239276)
> +++ gcc/df-core.c   (working copy)
> @@ -1833,6 +1833,7 @@ df_verify (void)
>    if (df_live)
>      df_live_verify_transfer_functions ();
>  #endif
> +  df->changeable_flags &= ~DF_VERIFY_SCHEDULED;
>  }
>  
>  #ifdef DF_DEBUG_CFG
> 
> help in the non-DF-checking case?

It should, since df_finish_pass isn't called (indirectly from iv_analysis_done)
by the doloop code until after all of the loops have been processed.

[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2016-08-10 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

--- Comment #4 from amker at gcc dot gnu.org ---
Here is a simple refactoring patch.

diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
index c311516..9fb04cf 100644
--- a/gcc/loop-doloop.c
+++ b/gcc/loop-doloop.c
@@ -254,18 +254,51 @@ doloop_condition_get (rtx doloop_pat)
   return 0;
 }

-/* Return nonzero if the loop specified by LOOP is suitable for
-   the use of special low-overhead looping instructions.  DESC
-   describes the number of iterations of the loop.  */
+/* Check all insns of LOOP to see if the loop is suitable for the use of
+   special low-overhead looping instructions.  Return TRUE if yes, false
+   otherwise.  */

 static bool
-doloop_valid_p (struct loop *loop, struct niter_desc *desc)
+doloop_insn_valid_p (struct loop *loop)
 {
-  basic_block *body = get_loop_body (loop), bb;
-  rtx_insn *insn;
   unsigned i;
-  bool result = true;
+  rtx_insn *insn;
+  basic_block *body = get_loop_body (loop), bb;

+  for (i = 0; i < loop->num_nodes; i++)
+{
+  bb = body[i];
+
+  for (insn = BB_HEAD (bb);
+  insn != NEXT_INSN (BB_END (bb));
+  insn = NEXT_INSN (insn))
+   {
+ /* Different targets have different necessities for low-overhead
+looping.  Call the back end for each instruction within the loop
+to let it decide whether the insn prohibits a low-overhead loop.
+It will then return the cause for it to emit to the dump file.  */
+ const char * invalid = targetm.invalid_within_doloop (insn);
+ if (invalid)
+   {
+ if (dump_file)
+   fprintf (dump_file, "Doloop: %s\n", invalid);
+
+ free (body);
+ return false;
+   }
+   }
+}
+  free (body);
+  return true;
+}
+
+/* Check the number of iterations described by DESC of a loop to see if
+   the loop is suitable for the use of special low-overhead looping
+   instructions.  Return true if yes, false otherwise.  */
+
+static bool
+doloop_niter_valid_p (struct loop *, struct niter_desc *desc)
+{
   /* Check for loops that may not terminate under special conditions.  */
   if (!desc->simple_p
   || desc->assumptions
@@ -295,38 +328,10 @@ doloop_valid_p (struct loop *loop, struct niter_desc *desc)
 enable count-register loops in this case.  */
   if (dump_file)
fprintf (dump_file, "Doloop: Possible infinite iteration case.\n");
-  result = false;
-  goto cleanup;
-}
-
-  for (i = 0; i < loop->num_nodes; i++)
-{
-  bb = body[i];
-
-  for (insn = BB_HEAD (bb);
-  insn != NEXT_INSN (BB_END (bb));
-  insn = NEXT_INSN (insn))
-   {
- /* Different targets have different necessities for low-overhead
-looping.  Call the back end for each instruction within the loop
-to let it decide whether the insn prohibits a low-overhead loop.
-It will then return the cause for it to emit to the dump file.  */
- const char * invalid = targetm.invalid_within_doloop (insn);
- if (invalid)
-   {
- if (dump_file)
-   fprintf (dump_file, "Doloop: %s\n", invalid);
- result = false;
- goto cleanup;
-   }
-   }
+  return false;
 }
-  result = true;
-
-cleanup:
-  free (body);

-  return result;
+  return true;
 }

 /* Adds test of COND jumping to DEST on edge *E and set *E to the new fallthru
@@ -621,17 +626,25 @@ doloop_optimize (struct loop *loop)
   if (dump_file)
 fprintf (dump_file, "Doloop: Processing loop %d.\n", loop->num);

+  if (!doloop_insn_valid_p (loop))
+{
+  if (dump_file)
+   fprintf (dump_file, "Doloop: The loop is not suitable.\n");
+
+  return false;
+}
+
   iv_analysis_loop_init (loop);

   /* Find the simple exit of a LOOP.  */
   desc = get_simple_loop_desc (loop);

   /* Check that loop is a candidate for a low-overhead looping insn.  */
-  if (!doloop_valid_p (loop, desc))
+  if (!doloop_niter_valid_p (loop, desc))
 {
   if (dump_file)
-   fprintf (dump_file,
-"Doloop: The loop is not suitable.\n");
+   fprintf (dump_file, "Doloop: The loop is not suitable.\n");
+
   return false;
 }
   mode = desc->mode;

It reduces compile time for powerpc-elf on an x86_64 machine from 54m to 5m.
The compiler is configured with checking.  With "--enable-checking=release",
the current trunk also compiles in ~5m, but gcc in Ubuntu has the issue even
though it's configured as release?

[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2016-08-10 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

--- Comment #3 from amker at gcc dot gnu.org ---
(In reply to amker from comment #1)
> Among all loops in the large function, how many loops can be doloop
> optimized successfully?  Function doloop_optimize has some valid checks on
> doloop optimizations.  Is it possible to move most (some) of checks outside
> of the per-loop function.  Looks like targetm.invalid_within_doloop can be
> moved.  As a result, we can know some loops can't be doloop optimized, so
> the iv_analysis/df_analysis can be skipped for those loops.  For checks on
> loop analysis result:
>   if (!desc->simple_p
>   || desc->assumptions
>   || desc->infinite)
> I think it's possible to avoid per-loop analysis behavior too.  Since we
> don't change code, there is no need to do df analysis for each loop before
> iv_analysis.  As a result, df_verify can be saved.  We may need a new
> interface analyzing iv/loop_desc for all loops in loop-iv.c
> 
> Of course, if most loops can be doloop optimized, this may not be able to
> help.

As suspected, most loops are rejected for doloop optimization by the
target-dependent insn check.  I am testing a simple refactoring patch to see
if it can reduce compile time.

[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2016-08-10 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

--- Comment #2 from Richard Biener  ---
If we have release checking enabled then we should hit

static void
df_analyze_1 (void)
{
...
#ifndef ENABLE_DF_CHECKING
  if (df->changeable_flags & DF_VERIFY_SCHEDULED)
#endif
    df_verify ();

so I wonder why df->changeable_flags & DF_VERIFY_SCHEDULED is ever true
for release checking.  Can you track that down?  I can't reproduce
df_verify being called on the 4.9 branch with release checking on x86_64
(OTOH x86_64 doesn't have a doloop pattern).  Compile-time is 186s for 4.9,
main offenders:

 df reaching defs        :  24.61 (13%) usr
 alias stmt walking      :  13.72 ( 7%) usr
 dominator optimization  :  33.26 (18%) usr
 expand vars             :  48.89 (26%) usr

GCC 5 seems to be quite a bit worse with 400s:

 df reaching defs        :  13.16 ( 3%) usr
 alias stmt walking      : 216.73 (54%) usr
 expand vars             :  25.86 ( 6%) usr
 load CSE after reload   :  83.96 (21%) usr

GCC 6 uses a _load_ more memory on this testcase (I end up swapping with 8GB
RAM and finally get killed).  So even on x86_64 the testcase looks interesting.

Note that

static bool
doloop_optimize (struct loop *loop)
{
...
  iv_analysis_loop_init (loop);

  /* Find the simple exit of a LOOP.  */
  desc = get_simple_loop_desc (loop);


performs iv_analysis_loop_init and thus df_analyze_loop twice ...


But yes, performing df_verify for each loop in a function is excessive,
we seem to lack ever clearing said flag.  Does

Index: gcc/df-core.c
===================================================================
--- gcc/df-core.c   (revision 239276)
+++ gcc/df-core.c   (working copy)
@@ -1833,6 +1833,7 @@ df_verify (void)
   if (df_live)
     df_live_verify_transfer_functions ();
 #endif
+  df->changeable_flags &= ~DF_VERIFY_SCHEDULED;
 }
 
 #ifdef DF_DEBUG_CFG

help in the non-DF-checking case?

[Bug rtl-optimization/72855] Long compile time due to integrity checking during dataflow analysis per loop

2016-08-10 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72855

amker at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amker at gcc dot gnu.org

--- Comment #1 from amker at gcc dot gnu.org ---
Among all loops in the large function, how many loops can be doloop optimized
successfully?  Function doloop_optimize has some validity checks on doloop
optimizations.  Is it possible to move most (or some) of the checks outside of
the per-loop function?  It looks like targetm.invalid_within_doloop can be
moved.  As a result, we would know that some loops can't be doloop optimized,
so the iv_analysis/df_analysis can be skipped for those loops.  For checks on
the loop analysis result:
  if (!desc->simple_p
  || desc->assumptions
  || desc->infinite)
I think it's possible to avoid per-loop analysis behavior too.  Since we don't
change code, there is no need to do df analysis for each loop before
iv_analysis.  As a result, the df_verify calls can be avoided.  We may need a
new interface analyzing iv/loop_desc for all loops in loop-iv.c.

Of course, if most loops can be doloop optimized, this may not be able to help.