[Bug target/108738] compile-time and memory-hog in mdreorg

2023-03-02 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Keywords||compile-time-hog
   Target Milestone|--- |13.0
 Status|ASSIGNED|RESOLVED

--- Comment #9 from Richard Biener  ---
This has been mitigated for GCC 13, in theory backporting is possible but
there's a workaround available (-mno-stv), so I think that's not strictly
necessary since the issue is quite old already.

[Bug target/108738] compile-time and memory-hog in mdreorg

2023-03-02 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:6010189923908501ca5b02bd1f4aee05d2283118

commit r13-6439-g6010189923908501ca5b02bd1f4aee05d2283118
Author: Richard Biener 
Date:   Thu Mar 2 13:43:51 2023 +0100

target/108738 - limit STV chain discovery

The following puts a hard limit on the inherently quadratic STV chain
discovery.  Without a limit for the compiler.i testcase in PR26854
we see at -O2

 machine dep reorg  : 574.45 ( 53%)

with release checking while with the proposed limit it's

 machine dep reorg  :   2.86 (  1%)

PR target/108738
* config/i386/i386.opt (--param x86-stv-max-visits): New param.
* doc/invoke.texi (--param x86-stv-max-visits): Document it.
* config/i386/i386-features.h (scalar_chain::max_visits): New.
(scalar_chain::build): Add bitmap parameter, return boolean.
(scalar_chain::add_insn): Likewise.
(scalar_chain::analyze_register_chain): Likewise.
* config/i386/i386-features.cc (scalar_chain::scalar_chain):
Initialize max_visits.
(scalar_chain::analyze_register_chain): When we exhaust
max_visits, abort.  Also abort when running into any
disallowed insn.
(scalar_chain::add_insn): Propagate abort.
(scalar_chain::build): Likewise.  When aborting amend
the set of disallowed insn with the insns set.
(convert_scalars_to_vector): Adjust.  Do not convert aborted
chains.

[Bug target/108738] compile-time and memory-hog in mdreorg

2023-03-02 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738

--- Comment #7 from Richard Biener  ---
So it's now better than before but still quadratic.

Finding a strathegic place to limit the search with some --param might be a
solution, but there's no easy
point to hook that into.  You'd not want to disable the whole pass but
terminate the greedy search and axe the candidates sofar processed (to not run
into the same ones again), which might then result in "odd" STV decisions if
the remains are picked up.  To avoid this maybe maintain a "too big" set of
candidates
and if a further greedy search lands at a insn in that set, axe that search as
well.  Note it's not the size of the set but the complexity of the search that
needs limiting, so count the number of ref visits through
analyze_register_chain
for an invocation of scalar_chain::build and limit that to some --param.

I'm trying to prototype that.

[Bug target/108738] compile-time and memory-hog in mdreorg

2023-02-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:e1dfac7e71056e879f101fef1c5ecb8ff6be1a1f

commit r13-5995-ge1dfac7e71056e879f101fef1c5ecb8ff6be1a1f
Author: Richard Biener 
Date:   Thu Feb 9 13:40:43 2023 +0100

target/108738 - optimize bit operations in STV

The following does low-hanging optimizations, combining bitmap
test and set and removing redundant operations.

PR target/108738
* config/i386/i386-features.cc (scalar_chain::add_to_queue):
Combine bitmap test and set.
(scalar_chain::add_insn): Likewise.
(scalar_chain::analyze_register_chain): Remove redundant
attempt to add to queue and instead strengthen assert.
Sink common attempts to mark the def dual-mode.
(scalar_chain::add_to_queue): Remove redundant insn bitmap
check.

[Bug target/108738] compile-time and memory-hog in mdreorg

2023-02-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:ec23e9e25eb64bb066dc408fd498861b8587bec8

commit r13-5994-gec23e9e25eb64bb066dc408fd498861b8587bec8
Author: Richard Biener 
Date:   Thu Feb 9 11:03:25 2023 +0100

target/108738 - STV bitmap operations compile-time hog

When the set of candidates becomes very large then repeated
bit checks on it during the build of an actual chain can become
slow because of the O(n) nature of bitmap tests.  The following
switches the candidates bitmaps to the tree representation before
building the chains to get O(log n) amortized behavior.

For the testcase at hand this improves STV time by 50%.

PR target/108738
* config/i386/i386-features.cc (convert_scalars_to_vector):
Switch candidates bitmaps to tree view before building the chains.

[Bug target/108738] compile-time and memory-hog in mdreorg

2023-02-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738

--- Comment #4 from Richard Biener  ---
The odd thing is that DF_REF_CHAIN of a USE ref contains _all_ definitions of
the pseudo while DF_REF_CHAIN of a DEF ref contains only reaching uses!?

Probably an artifact of how df_chain_create_bb_process_use operates.

That makes this much harder to fix ...

[Bug target/108738] compile-time and memory-hog in mdreorg

2023-02-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738

--- Comment #3 from Richard Biener  ---
That brings it down to

 machine dep reorg  :  87.09 ( 28%)

let me see if there's something else obvious to do.

[Bug target/108738] compile-time and memory-hog in mdreorg

2023-02-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2023-02-09
 Ever confirmed|0   |1
  Known to fail||10.4.0
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org

--- Comment #2 from Richard Biener  ---
Happens at least since GCC 10.

There's a comment already:

  /* ???  The following is quadratic since analyze_register_chain
 iterates over all refs to look for dual-mode regs.  Instead this
 should be done separately for all regs mentioned in the chain once.  */
  df_ref ref;
  for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
if (!HARD_REGISTER_P (DF_REF_REG (ref)))
  analyze_register_chain (candidates, ref);

but the walk also adds to the queue, so it's more interwinded and the suggested
solution isn't the proper.  Instead it might be possible to just record
already visited regs.

[Bug target/108738] compile-time and memory-hog in mdreorg

2023-02-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-*

--- Comment #1 from Richard Biener  ---
A perf report isn't very conclusive, but at least points at STV:

Samples: 3M of event 'cycles:u', Event count (approx.): 4455606089800   
 Samples  Command  Shared Object   Symbol   
 2153935  cc1  cc1 [.] bitmap_bit_p 
  288315  cc1  cc1 [.] (anonymous
namespace)::scalar_chain::analyze_register_chain
  145980  cc1  cc1 [.] bitmap_equal_p   
   85953  cc1  cc1 [.] pop_scope
   70985  cc1  cc1 [.] bitmap_ior_into  
   87675  cc1  cc1 [.] df_chain_create

with -O2 -mno-stv we get to

 machine dep reorg  :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
0  (  0%)
 TOTAL  : 202.90  8.46211.41   
 1475M