[Bug target/108738] compile-time and memory-hog in mdreorg
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Keywords||compile-time-hog Target Milestone|--- |13.0 Status|ASSIGNED|RESOLVED --- Comment #9 from Richard Biener --- This has been mitigated for GCC 13, in theory backporting is possible but there's a workaround available (-mno-stv), so I think that's not strictly necessary since the issue is quite old already.
[Bug target/108738] compile-time and memory-hog in mdreorg
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738 --- Comment #8 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:6010189923908501ca5b02bd1f4aee05d2283118 commit r13-6439-g6010189923908501ca5b02bd1f4aee05d2283118 Author: Richard Biener Date: Thu Mar 2 13:43:51 2023 +0100 target/108738 - limit STV chain discovery The following puts a hard limit on the inherently quadratic STV chain discovery. Without a limit for the compiler.i testcase in PR26854 we see at -O2 machine dep reorg : 574.45 ( 53%) with release checking while with the proposed limit it's machine dep reorg : 2.86 ( 1%) PR target/108738 * config/i386/i386.opt (--param x86-stv-max-visits): New param. * doc/invoke.texi (--param x86-stv-max-visits): Document it. * config/i386/i386-features.h (scalar_chain::max_visits): New. (scalar_chain::build): Add bitmap parameter, return boolean. (scalar_chain::add_insn): Likewise. (scalar_chain::analyze_register_chain): Likewise. * config/i386/i386-features.cc (scalar_chain::scalar_chain): Initialize max_visits. (scalar_chain::analyze_register_chain): When we exhaust max_visits, abort. Also abort when running into any disallowed insn. (scalar_chain::add_insn): Propagate abort. (scalar_chain::build): Likewise. When aborting amend the set of disallowed insn with the insns set. (convert_scalars_to_vector): Adjust. Do not convert aborted chains.
[Bug target/108738] compile-time and memory-hog in mdreorg
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738 --- Comment #7 from Richard Biener --- So it's now better than before but still quadratic. Finding a strathegic place to limit the search with some --param might be a solution, but there's no easy point to hook that into. You'd not want to disable the whole pass but terminate the greedy search and axe the candidates sofar processed (to not run into the same ones again), which might then result in "odd" STV decisions if the remains are picked up. To avoid this maybe maintain a "too big" set of candidates and if a further greedy search lands at a insn in that set, axe that search as well. Note it's not the size of the set but the complexity of the search that needs limiting, so count the number of ref visits through analyze_register_chain for an invocation of scalar_chain::build and limit that to some --param. I'm trying to prototype that.
[Bug target/108738] compile-time and memory-hog in mdreorg
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738 --- Comment #6 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:e1dfac7e71056e879f101fef1c5ecb8ff6be1a1f commit r13-5995-ge1dfac7e71056e879f101fef1c5ecb8ff6be1a1f Author: Richard Biener Date: Thu Feb 9 13:40:43 2023 +0100 target/108738 - optimize bit operations in STV The following does low-hanging optimizations, combining bitmap test and set and removing redundant operations. PR target/108738 * config/i386/i386-features.cc (scalar_chain::add_to_queue): Combine bitmap test and set. (scalar_chain::add_insn): Likewise. (scalar_chain::analyze_register_chain): Remove redundant attempt to add to queue and instead strengthen assert. Sink common attempts to mark the def dual-mode. (scalar_chain::add_to_queue): Remove redundant insn bitmap check.
[Bug target/108738] compile-time and memory-hog in mdreorg
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738 --- Comment #5 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:ec23e9e25eb64bb066dc408fd498861b8587bec8 commit r13-5994-gec23e9e25eb64bb066dc408fd498861b8587bec8 Author: Richard Biener Date: Thu Feb 9 11:03:25 2023 +0100 target/108738 - STV bitmap operations compile-time hog When the set of candidates becomes very large then repeated bit checks on it during the build of an actual chain can become slow because of the O(n) nature of bitmap tests. The following switches the candidates bitmaps to the tree representation before building the chains to get O(log n) amortized behavior. For the testcase at hand this improves STV time by 50%. PR target/108738 * config/i386/i386-features.cc (convert_scalars_to_vector): Switch candidates bitmaps to tree view before building the chains.
[Bug target/108738] compile-time and memory-hog in mdreorg
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738 --- Comment #4 from Richard Biener --- The odd thing is that DF_REF_CHAIN of a USE ref contains _all_ definitions of the pseudo while DF_REF_CHAIN of a DEF ref contains only reaching uses!? Probably an artifact of how df_chain_create_bb_process_use operates. That makes this much harder to fix ...
[Bug target/108738] compile-time and memory-hog in mdreorg
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738 --- Comment #3 from Richard Biener --- That brings it down to machine dep reorg : 87.09 ( 28%) let me see if there's something else obvious to do.
[Bug target/108738] compile-time and memory-hog in mdreorg
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738 Richard Biener changed: What|Removed |Added Last reconfirmed||2023-02-09 Ever confirmed|0 |1 Known to fail||10.4.0 Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #2 from Richard Biener --- Happens at least since GCC 10. There's a comment already: /* ??? The following is quadratic since analyze_register_chain iterates over all refs to look for dual-mode regs. Instead this should be done separately for all regs mentioned in the chain once. */ df_ref ref; for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref)) if (!HARD_REGISTER_P (DF_REF_REG (ref))) analyze_register_chain (candidates, ref); but the walk also adds to the queue, so it's more interwinded and the suggested solution isn't the proper. Instead it might be possible to just record already visited regs.
[Bug target/108738] compile-time and memory-hog in mdreorg
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108738 Richard Biener changed: What|Removed |Added Target||x86_64-*-* --- Comment #1 from Richard Biener --- A perf report isn't very conclusive, but at least points at STV: Samples: 3M of event 'cycles:u', Event count (approx.): 4455606089800 Samples Command Shared Object Symbol 2153935 cc1 cc1 [.] bitmap_bit_p 288315 cc1 cc1 [.] (anonymous namespace)::scalar_chain::analyze_register_chain 145980 cc1 cc1 [.] bitmap_equal_p 85953 cc1 cc1 [.] pop_scope 70985 cc1 cc1 [.] bitmap_ior_into 87675 cc1 cc1 [.] df_chain_create with -O2 -mno-stv we get to machine dep reorg : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%) TOTAL : 202.90 8.46211.41 1475M