[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402
Bug 84402 depends on bug 109051, which changed state.

Bug 109051 Summary: Configure takes long time for multibuild of run-time libraries
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109051

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WORKSFORME
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402
Bug 84402 depends on bug 113575, which changed state.

Bug 113575 Summary: [14 Regression] memory hog building insn-opinit.o (i686-linux-gnu -> riscv64-linux-gnu)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113575

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402
Bug 84402 depends on bug 54179, which changed state.

Bug 54179 Summary: please split insn-emit.c !
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54179

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402

--- Comment #71 from CVS Commits ---
The master branch has been updated by Robin Dapp:

https://gcc.gnu.org/g:184378027e92f51e02d3649e0ca523f487fd2810

commit r14-5034-g184378027e92f51e02d3649e0ca523f487fd2810
Author: Robin Dapp
Date:   Thu Oct 12 11:23:26 2023 +0200

    genemit: Split insn-emit.cc into several partitions.

    On riscv insn-emit.cc has grown to over 1.2 million lines of code and
    compiling it takes considerable time.
    Therefore, this patch adjusts genemit to create several partitions
    (insn-emit-1.cc to insn-emit-n.cc).  The available patterns are
    written to the given files in a sequential fashion.

    Similar to match.pd a configure option --with-insnemit-partitions=num
    is introduced that makes the number of partitions configurable.

    gcc/ChangeLog:

            PR bootstrap/84402
            PR target/111600
            * Makefile.in: Handle split insn-emit.cc.
            * configure: Regenerate.
            * configure.ac: Add --with-insnemit-partitions.
            * genemit.cc (output_peephole2_scratches): Print to file instead
            of stdout.
            (print_code): Ditto.
            (gen_rtx_scratch): Ditto.
            (gen_exp): Ditto.
            (gen_emit_seq): Ditto.
            (emit_c_code): Ditto.
            (gen_insn): Ditto.
            (gen_expand): Ditto.
            (gen_split): Ditto.
            (output_add_clobbers): Ditto.
            (output_added_clobbers_hard_reg_p): Ditto.
            (print_overload_arguments): Ditto.
            (print_overload_test): Ditto.
            (handle_overloaded_code_for): Ditto.
            (handle_overloaded_gen): Ditto.
            (print_header): New function.
            (handle_arg): New function.
            (main): Split output into 10 files.
            * gensupport.cc (count_patterns): New function.
            * gensupport.h (count_patterns): Define.
            * read-md.cc (md_reader::print_md_ptr_loc): Add file argument.
            * read-md.h (class md_reader): Change definition.
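The "sequential fashion" mentioned in the commit message can be pictured with a minimal sketch. This is only an illustration under assumptions, not genemit's code: the function name `partition_sequentially`, the ceiling-division chunk size, and the padding with empty files are all hypothetical; only the `insn-emit-N.cc` naming comes from the commit message.

```python
def partition_sequentially(patterns, num_partitions):
    """Split patterns into contiguous chunks, one chunk per output file.

    Hypothetical helper: the exact rule genemit uses is not given in the
    commit message; this sketch simply uses ceiling division.
    """
    if num_partitions < 1:
        raise ValueError("need at least one partition")
    # Ceiling division so that every pattern lands in some file.
    per_file = max(1, -(-len(patterns) // num_partitions))
    chunks = [patterns[i:i + per_file]
              for i in range(0, len(patterns), per_file)]
    # Pad with empty chunks so the number of files stays fixed, which is
    # what a Makefile with a configured partition count would expect.
    chunks += [[] for _ in range(num_partitions - len(chunks))]
    return {f"insn-emit-{n}.cc": chunk
            for n, chunk in enumerate(chunks, start=1)}


if __name__ == "__main__":
    files = partition_sequentially([f"pattern_{i}" for i in range(25)], 10)
    for name, chunk in files.items():
        print(name, len(chunk))
```

Keeping the file count fixed even when some files end up empty means the build rules only need to know the configured number of partitions, not the number of patterns.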
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402
Bug 84402 depends on bug 54179, which changed state.

Bug 54179 Summary: please split insn-emit.c !
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54179

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|WONTFIX                     |---
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #70 from CVS Commits --- The master branch has been updated by Tamar Christina : https://gcc.gnu.org/g:0a85544e1aaeca41133ecfc438cda913dbc0f122 commit r14-501-g0a85544e1aaeca41133ecfc438cda913dbc0f122 Author: Tamar Christina Date: Fri May 5 13:42:17 2023 +0100 match.pd: Use splits in makefile and make configurable. This updates the build system to split up match.pd files into chunks of 10. This also introduces a new flag --with-matchpd-partitions which can be used to change the number of partitions. For the analysis of why 10 please look at the previous patch in the series. gcc/ChangeLog: PR bootstrap/84402 * Makefile.in (NUM_MATCH_SPLITS, MATCH_SPLITS_SEQ, GIMPLE_MATCH_PD_SEQ_SRC, GIMPLE_MATCH_PD_SEQ_O, GENERIC_MATCH_PD_SEQ_SRC, GENERIC_MATCH_PD_SEQ_O): New. (OBJS, MOSTLYCLEANFILES, .PRECIOUS): Use them. (s-match): Split into s-generic-match and s-gimple-match. * configure.ac (with-matchpd-partitions, DEFAULT_MATCHPD_PARTITIONS): New. * configure: Regenerate.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402

--- Comment #69 from CVS Commits ---
The master branch has been updated by Tamar Christina:

https://gcc.gnu.org/g:703417a030b3d80f55ba1402adc3f1692d3631e5

commit r14-500-g703417a030b3d80f55ba1402adc3f1692d3631e5
Author: Tamar Christina
Date:   Fri May 5 13:38:50 2023 +0100

    match.pd: automatically partition *-match.cc files.

    Following on from Richi's RFC[1] this is another attempt to split up match.pd
    into multiple gimple-match and generic-match files.  This version is fully
    automated and requires no human intervention.

    First things first, some perf numbers.  The following shows the effect of the
    patch on my desktop doing parallel compilation of gimple-match:

    +--------+------------------+--------+------------------+
    | splits | rel. improvement | splits | rel. improvement |
    +--------+------------------+--------+------------------+
    |      1 | 0.00%            |     33 | 91.03%           |
    |      2 | 71.77%           |     34 | 84.02%           |
    |      3 | 100.71%          |     35 | 83.42%           |
    |      4 | 143.08%          |     36 | 78.80%           |
    |      5 | 176.18%          |     37 | 74.06%           |
    |      6 | 174.40%          |     38 | 55.76%           |
    |      7 | 176.62%          |     39 | 66.90%           |
    |      8 | 168.35%          |     40 | 18.25%           |
    |      9 | 189.80%          |     41 | 16.55%           |
    |     10 | 171.77%          |     42 | 47.02%           |
    |     11 | 152.82%          |     43 | 15.29%           |
    |     12 | 112.20%          |     44 | 21.63%           |
    |     13 | 158.57%          |     45 | 41.53%           |
    |     14 | 158.57%          |     46 | 21.98%           |
    |     15 | 152.07%          |     47 | -42.74%          |
    |     16 | 151.70%          |     48 | -32.62%          |
    |     17 | 131.52%          |     49 | 11.81%           |
    |     18 | 133.11%          |     50 | 34.07%           |
    |     19 | 137.33%          |     51 | 2.71%            |
    |     20 | 103.83%          |     52 | -22.23%          |
    |     21 | 132.47%          |     53 | 32.30%           |
    |     22 | 116.52%          |     54 | 21.45%           |
    |     23 | 112.73%          |     55 | 40.02%           |
    |     24 | 111.94%          |     56 | 42.83%           |
    |     25 | 112.73%          |     57 | -9.98%           |
    |     26 | 104.07%          |     58 | 18.01%           |
    |     27 | 113.27%          |     59 | -4.91%           |
    |     28 | 96.77%           |     60 | 22.94%           |
    |     29 | 93.42%           |     61 | -3.73%           |
    |     30 | 87.67%           |     62 | -27.43%          |
    |     31 | 89.54%           |     63 | -1.05%           |
    |     32 | 84.42%           |     64 | -5.44%           |
    +--------+------------------+--------+------------------+

    As can be seen there seems to be a point of diminishing returns in doing splits.
    This comes from the fact that these match files consume a sizeable amount of
    headers.  At a certain point the parsing overhead of the headers dominates and
    you start losing the gains.

    As such from this I've made the default 10 splits per file to allow for some
    room for growth in the future without needing changes to the split amount.
    Since 5-10 show roughly the same gains it means we can afford to double the
    file sizes before we need to up the split amount.  This can be controlled
    by the configure parameter --with-matchpd-partitions=.

    At 10 splits the sizes of the files are:

     1.2M gimple-match-1.cc
     490K gimple-match-2.cc
     459K gimple-match-3.cc
     462K gimple-match-4.cc
     466K gimple-match-5.cc
     690K gimple-match-6.cc
     517K gimple-match-7.cc
     693K gimple-match-8.cc
    1011K gimple-match-9.cc
     490K gimple-match-10.cc
     210K gimple-match-auto.h

    The reason gimple-match-1.cc is so large is because it got allocated a very
    large function: gimple_simplify_NE_EXPR.

    Because of these sporadically large functions the allocation to a split happens
    based on the amount of data already written to a split instead of just a simple
    round robin allocation (though the patch supports that too).  This means that
    once gimple_simplify_NE_EXPR is allocated to gimple-match-1.cc nothing uses it
    again until the rest of the files catch up.

    To support this split a new header file *-match-auto.h is generated to allow
    the individual files to compile separately.

    Lastly for the auto generated files I use pragmas to silence the unused
    predicate warnings instead of the previous Makefile way because I couldn't find
    a way to set them without knowing the number of split files beforehand.

    Finally with this change, bootstrap time has dropped 8 minutes on AArch64.

    [1] https://gcc.gnu.org/legacy-ml/gcc
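The size-based allocation described in this commit message can be sketched as a greedy "least data written so far" rule. This is a minimal illustration, not genmatch's implementation: `allocate_by_size`, the tie-breaking towards the lowest-numbered split, and the byte sizes in the usage example are assumptions made for the sketch; only the function name gimple_simplify_NE_EXPR is taken from the message.

```python
def allocate_by_size(functions, num_splits):
    """Assign each function to the split with the least data written so far.

    functions: list of (name, size_in_bytes) in emission order.
    Returns (assignment, written), where assignment[i] lists the function
    names placed in split i and written[i] is the total size of split i.
    Hypothetical helper; ties go to the lowest-numbered split.
    """
    written = [0] * num_splits
    assignment = [[] for _ in range(num_splits)]
    for name, size in functions:
        # Pick the split that currently holds the fewest bytes.
        target = min(range(num_splits), key=lambda i: written[i])
        assignment[target].append(name)
        written[target] += size
    return assignment, written


if __name__ == "__main__":
    # Sizes are made up; one function is far larger than the rest.
    funcs = [("gimple_simplify_NE_EXPR", 1000),
             ("f1", 100), ("f2", 100), ("f3", 100), ("f4", 100)]
    assignment, written = allocate_by_size(funcs, 3)
    for i, (names, size) in enumerate(zip(assignment, written), start=1):
        print(f"gimple-match-{i}.cc", size, names)
```

With these made-up sizes the first split receives only the large function and is skipped for every later allocation, which mirrors the "nothing uses it again until the rest of the files catch up" behaviour the message describes.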
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #68 from CVS Commits --- The master branch has been updated by Tamar Christina : https://gcc.gnu.org/g:27fcf994c5515e1bbf2ff03d28fd2fa927c7e7b5 commit r14-499-g27fcf994c5515e1bbf2ff03d28fd2fa927c7e7b5 Author: Tamar Christina Date: Fri May 5 13:37:49 2023 +0100 genmatch: split shared code to gimple-match-exports.cc In preparation for automatically splitting match.pd files I split off the non-static helper functions that are shared between the match.pd functions off to another file. This file can be compiled in parallel and also allows us to later avoid duplicate symbols errors. gcc/ChangeLog: PR bootstrap/84402 * Makefile.in (OBJS): Add gimple-match-exports.o. * genmatch.cc (decision_tree::gen): Export gimple_gimplify helpers. * gimple-match-head.cc (gimple_simplify, gimple_resimplify1, gimple_resimplify2, gimple_resimplify3, gimple_resimplify4, gimple_resimplify5, constant_for_folding, convert_conditional_op, maybe_resimplify_conditional_op, gimple_match_op::resimplify, maybe_build_generic_op, build_call_internal, maybe_push_res_to_seq, do_valueize, try_conditional_simplification, gimple_extract, gimple_extract_op, canonicalize_code, commutative_binary_op_p, commutative_ternary_op_p, first_commutative_argument, associative_binary_op_p, directly_supported_p, get_conditional_internal_fn): Moved to gimple-match-exports.cc * gimple-match-exports.cc: New file.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #66 from CVS Commits --- The master branch has been updated by Tamar Christina : https://gcc.gnu.org/g:e487fcc0f7466ea663a0fea52076337bebd42b8b commit r14-497-ge487fcc0f7466ea663a0fea52076337bebd42b8b Author: Tamar Christina Date: Fri May 5 13:36:01 2023 +0100 match.pd: Remove commented out line pragmas unless -vv is used. genmatch currently outputs commented out line directives that have no effect but the compiler still has to parse only to discard. They are however handy when debugging genmatch output. As such this moves them behind the -vv flag. gcc/ChangeLog: PR bootstrap/84402 * genmatch.cc (output_line_directive): Only emit commented directive when -vv.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #67 from CVS Commits --- The master branch has been updated by Tamar Christina : https://gcc.gnu.org/g:c0ce29bc1ce329001b6c02bb3d34bcbb086e1b72 commit r14-498-gc0ce29bc1ce329001b6c02bb3d34bcbb086e1b72 Author: Tamar Christina Date: Fri May 5 13:36:43 2023 +0100 match.pd: CSE the dump output check. This is a small improvement in QoL codegen for match.pd to save time not re-evaluating the condition for printing debug information in every function. There is a small but consistent runtime and compile time win here. The runtime win comes from not having to do the condition over again, and on Arm platforms we now use the new test-and-branch support for booleans to only have a single instruction here. gcc/ChangeLog: PR bootstrap/84402 * genmatch.cc (decision_tree::gen, write_predicate): Generate new debug_dump var. (dt_simplify::gen_1): Use it.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #65 from CVS Commits --- The master branch has been updated by Tamar Christina : https://gcc.gnu.org/g:580cda3c2799b1f8323af770e52f1eb0fa204718 commit r14-496-g580cda3c2799b1f8323af770e52f1eb0fa204718 Author: Tamar Christina Date: Fri May 5 13:35:17 2023 +0100 match.pd: don't emit label if not needed This is a small QoL codegen improvement for match.pd to not emit labels when they are not needed. The codegen is nice and there is a small (but consistent) improvement in compile time. gcc/ChangeLog: PR bootstrap/84402 * genmatch.cc (dt_simplify::gen_1): Only emit labels if used.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #64 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:75cda3be0232f745cda4e177d514f6900390af0b commit r13-6902-g75cda3be0232f745cda4e177d514f6900390af0b Author: Richard Biener Date: Tue Mar 28 12:42:14 2023 +0200 bootstrap/84402 - improve (match ...) code generation The following avoids duplicating matching code for (match ...) in match.pd when possible. That's more easily possible for (match ...) than simplify because we do not need to handle common matches (those would be diagnosed only during compiling) nor is the result able to inspect the active operator. Specifically this reduces the size of the generated matches for the atomic ops as noted in PR108129. gimple-match.cc shrinks from 245k lines to 209k lines with this patch. PR bootstrap/84402 PR tree-optimization/108129 * genmatch.cc (lower_for): For (match ...) delay substituting into the match operator if possible. (dt_operand::gen_gimple_expr): For user_id look at the first substitute for determining how to access operands. (dt_operand::gen_generic_expr): Likewise. (dt_node::gen_kids): Properly sort user_ids according to their substitutes. (dt_node::gen_kids_1): Code-generate user_id matching.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402

--- Comment #63 from rguenther at suse dot de ---
On Tue, 28 Mar 2023, amonakov at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402
>
> Alexander Monakov changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |amonakov at gcc dot gnu.org
>
> --- Comment #61 from Alexander Monakov ---
> (In reply to Richard Biener from comment #60)
> > > This one is btw. a known issue PR108129.
> >
> > But the revision only slightly changes the patterns so I'm very curious
> > how it arrived at 30% slowdown.
>
> It adds an extra 'convert2?' to 'nop_atomic_bit_test_and_p' matchers, and
> since match.pd expansion works by emitting match subtrees twice for each '?'
> component, that gives an extra 2x factor to the already bad combinatorial
> explosion going on in those patterns.
>
> We really need to rework match-and-simplify emission in a smarter way. I've
> looked at that in January once, but there's a few things I'd need help
> understanding, such as...
>
> > The "trivial" improvement of course would be to special-case
> > iterator uses also for (match ...) like we do for (simplify ...) where
> > we can delay substitution.
>
> ... this. Is there a short explanation what's 'delayed substitution' in this
> context?

'delayed substitution' works for (simplify (...)) by not expanding
the substitution for each (for ..) iterator but instead passing it as a
variable to a split out common function.  For (match (...)) the
"substitution" part is trivial so there's no point doing that.  But instead
we can look to apply something similar to the "matching" part.  When we have

 (for X (A B ...)
  (simplify
   (op (X (op2 ...) ...))
   ...

we get for the matching of 'X' (if it's not at the toplevel)

 switch (...)
   {
   case A:
     {
       .. match the rest ..
     }
   case B:
     {
       .. match the rest ..
     }
   ...

but we can instead emit (maybe only in a subset of cases?)

 switch (...)
   {
   case A:
   case B:
   case ...:
     {
       .. match the rest ..
     }

In theory we support things like (for X (plus IFN_POW) (... as both
operators are binary - so that's cases we cannot handle this way.

Basically we'd keep the user-defined operator in the AST and adjust
code-generation to deal with that.  I will try to do that.
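The difference between the two emission strategies in this comment can be shown with a toy generator. This is a sketch only: `emit_switch`, the `code` switch operand, the `break;` statements, and the body string are hypothetical and are not how genmatch.cc is structured; the point is simply that the merged form emits the shared matching body once instead of once per iterator operator.

```python
def emit_switch(iterator_ops, body, merge=True):
    """Return C-like switch text for matching any operator of an iterator.

    iterator_ops: operator names substituted for the (for X ...) iterator.
    body: the text standing in for ".. match the rest ..".
    merge=False expands one block per operator; merge=True shares one block.
    Hypothetical helper for illustration only.
    """
    lines = ["switch (code)", "  {"]
    if merge:
        # All operators fall through into a single shared block.
        lines += [f"  case {op}:" for op in iterator_ops]
        lines += ["    {", f"      {body}", "      break;", "    }"]
    else:
        # One copy of the matching body per operator.
        for op in iterator_ops:
            lines += [f"  case {op}:", "    {", f"      {body}",
                      "      break;", "    }"]
    lines.append("  }")
    return "\n".join(lines)


if __name__ == "__main__":
    ops = ["A", "B", "C"]
    print(emit_switch(ops, "match_the_rest ();", merge=False))
    print()
    print(emit_switch(ops, "match_the_rest ();", merge=True))
```

As the comment notes, sharing is only possible when every substituted operator can be matched by the same code, which is not the case for mixes such as a tree code and an internal function.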
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #62 from Jakub Jelinek --- Looking at gimple-match.cc, the case CFN_BUILT_IN_ATOMIC_FETCH_OR_{1,2,4,8,16}: etc. blocks are identical there, except for the numbers in next_after_fail* label numbers. So, could we perhaps expand everything the way we do and just when emitting a switch hash the subtree of the cases to be emitted and if the hashes are equal also compare and if the subtrees are the same (== would result in the same text being emitted into the output except for the label numbers) emit multiple cases with the same block? Admittedly I haven't looked yet at the data structures genmatch.cc uses before emitting the source, so don't know whether it is feasible.
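The idea in this comment, grouping case blocks whose emitted text is identical apart from label numbers, can be sketched on plain strings. This is an illustration under assumptions: it works on already-generated text rather than on genmatch's internal decision tree, the helper names `normalize_labels` and `merge_identical_cases` are hypothetical, and the sample bodies are made up. The `next_after_fail` label prefix and the `CFN_BUILT_IN_ATOMIC_FETCH_OR_*` case names are taken from the comment.

```python
import re

LABEL_RE = re.compile(r"next_after_fail\d+")


def normalize_labels(body):
    """Renumber labels by first appearance so numbering differences vanish."""
    seen = {}

    def rename(match):
        return seen.setdefault(match.group(0), f"next_after_fail#{len(seen)}")

    return LABEL_RE.sub(rename, body)


def merge_identical_cases(cases):
    """Group case labels whose bodies are equal modulo label numbers.

    cases: list of (case_label, body_text) in emission order.
    Returns a list of (case_labels, body_text); the body kept for a group
    is the first one seen.  The dict lookup hashes the normalized body and
    then compares it for equality, as the comment suggests.
    """
    groups = {}
    for label, body in cases:
        key = normalize_labels(body)
        groups.setdefault(key, ([], body))[0].append(label)
    return list(groups.values())


if __name__ == "__main__":
    cases = [
        ("CFN_BUILT_IN_ATOMIC_FETCH_OR_1",
         "if (!ok) goto next_after_fail12; next_after_fail12:;"),
        ("CFN_BUILT_IN_ATOMIC_FETCH_OR_2",
         "if (!ok) goto next_after_fail13; next_after_fail13:;"),
        ("CFN_OTHER", "return false;"),
    ]
    for labels, body in merge_identical_cases(cases):
        print(labels, "->", body)
```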
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402

Alexander Monakov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #61 from Alexander Monakov ---
(In reply to Richard Biener from comment #60)
> > This one is btw. a known issue PR108129.
>
> But the revision only slightly changes the patterns so I'm very curious
> how it arrived at 30% slowdown.

It adds an extra 'convert2?' to 'nop_atomic_bit_test_and_p' matchers, and since match.pd expansion works by emitting match subtrees twice for each '?' component, that gives an extra 2x factor to the already bad combinatorial explosion going on in those patterns.

We really need to rework match-and-simplify emission in a smarter way. I've looked at that in January once, but there's a few things I'd need help understanding, such as...

> The "trivial" improvement of course would be to special-case
> iterator uses also for (match ...) like we do for (simplify ...) where
> we can delay substitution.

... this. Is there a short explanation what's 'delayed substitution' in this context?
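The "extra 2x factor" per '?' component can be made concrete with a toy expansion. This is a deliberately simplified model, not genmatch's lowering: `expand_optional` and the flat list representation are hypothetical, and the operator names other than `convert2` are placeholders.

```python
from itertools import product


def expand_optional(components):
    """Enumerate every variant of a pattern with optional components.

    components: list of (operator, is_optional).  An optional operator is
    either present or absent, so k optional components give 2**k variants.
    Hypothetical helper for illustration only.
    """
    choices = [((op,), ()) if optional else ((op,),)
               for op, optional in components]
    # Concatenate the chosen pieces of each combination into one tuple.
    return [sum(combo, ()) for combo in product(*choices)]


if __name__ == "__main__":
    before = [("convert", True), ("bit_and", False)]
    after = before + [("convert2", True)]
    print(len(expand_optional(before)), "variants before")
    print(len(expand_optional(after)), "variants after adding convert2?")
```

Each additional optional component doubles the number of variants, so a pattern that already carries several of them grows quickly.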
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402

--- Comment #60 from Richard Biener ---
(In reply to Martin Liška from comment #59)
> (In reply to Andrew Carlotti from comment #58)
> > Since November 2021, there's been a significant regression in the compile
> > time for gimple-match.cc during a bootstrap build (+100% in Stage 2, +73% in
> > Stage 3), with this regression accounting for over 20% of the current total
> > bootstrap time on some aarch64 machines.
>
> Thanks for the interesting numbers! Yeah, it's very unfortunate :/
>
> >
> > Most of the change in compile time is due to the following 6 commits (of
> > which one is a performance improvement, and one only regressed the Stage 2
> > build):
> >
> > 7df89377a7ae3906255e38a79be8e5d962c3a0df 24th November 2021
> > Enhance optimize_atomic_bit_test_and to handle truncation. (Hongtao Liu)
> > Stage 2: +27%
> > Stage 3: +33%
>
> This one is btw. a known issue PR108129.

But the revision only slightly changes the patterns so I'm very curious how it arrived at 30% slowdown.

Note these (match ..) patterns that are not used from inside match.pd itself (and do not use other (match ..)) would be perfect candidates to emit to separate files. Either by explicit syntax or magically where the former would be easier to cater for in the Makefile.

The "trivial" improvement of course would be to special-case iterator uses also for (match ...) like we do for (simplify ...) where we can delay substitution.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402

--- Comment #59 from Martin Liška ---
(In reply to Andrew Carlotti from comment #58)
> Since November 2021, there's been a significant regression in the compile
> time for gimple-match.cc during a bootstrap build (+100% in Stage 2, +73% in
> Stage 3), with this regression accounting for over 20% of the current total
> bootstrap time on some aarch64 machines.

Thanks for the interesting numbers! Yeah, it's very unfortunate :/

>
> Most of the change in compile time is due to the following 6 commits (of
> which one is a performance improvement, and one only regressed the Stage 2
> build):
>
> 7df89377a7ae3906255e38a79be8e5d962c3a0df 24th November 2021
> Enhance optimize_atomic_bit_test_and to handle truncation. (Hongtao Liu)
> Stage 2: +27%
> Stage 3: +33%

This one is btw. a known issue PR108129.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402

Andrew Carlotti changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrew.carlotti at arm dot com

--- Comment #58 from Andrew Carlotti ---
Since November 2021, there's been a significant regression in the compile time for gimple-match.cc during a bootstrap build (+100% in Stage 2, +73% in Stage 3), with this regression accounting for over 20% of the current total bootstrap time on some aarch64 machines.

Most of the change in compile time is due to the following 6 commits (of which one is a performance improvement, and one only regressed the Stage 2 build):

7df89377a7ae3906255e38a79be8e5d962c3a0df 24th November 2021
Enhance optimize_atomic_bit_test_and to handle truncation. (Hongtao Liu)
Stage 2: +27%
Stage 3: +33%

9a53101caadae1b5c8d791d247b05268ee4f7f92 16th May 2022
Add MIN/MAX folding from fold_cond_expr_with_comparison to match.pd (Richard Biener)
Stage 2: +15%
Stage 3: +15%

409978d58dafa689c5b3f85013e2786526160f2c 9th August 2022
tree-optimization/106514 - add --param max-jump-thread-paths (Richard Biener)
Stage 2: -7%
Stage 3: -10%

011d0a033ab370ea38b06b813ac62be8dde0801b 18th August 2022
Make path_range_query standalone and add reset_path. (Aldy Hernandez)
Stage 2: +5%
Stage 3: +0%

4d9db4bdd458a4b526f59e4bc5bbd549d3861cea 12th December 2022
middle-end: simplify complex if expressions where comparisons are inverse of one another. (Tamar Christina)
Stage 2: +10%
Stage 3: +9%

733a1b777f16cd397b43a242d9c31761f66d3da8 13th January 2023
sched-deps: do not schedule pseudos across calls [PR108117] (Alexander Monakov)
Stage 2: +14%
Stage 3: +9%
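To relate the six per-commit figures to the overall +100% / +73% regression, the percentages can be combined in two ways. The comment does not say whether each figure is relative to the build just before that commit or to a fixed baseline, so both readings are computed here; the helper names `compounded` and `summed` are hypothetical, and the numbers are copied from the list above.

```python
from math import prod

# Per-commit compile-time changes in percent, in the order listed above.
STAGE2 = [27, 15, -7, 5, 10, 14]
STAGE3 = [33, 15, -10, 0, 9, 9]


def compounded(percent_changes):
    """Total change if each figure is relative to the preceding state."""
    return (prod(1 + p / 100 for p in percent_changes) - 1) * 100


def summed(percent_changes):
    """Total change if each figure is relative to one fixed baseline."""
    return sum(percent_changes)


if __name__ == "__main__":
    for name, changes in (("Stage 2", STAGE2), ("Stage 3", STAGE3)):
        print(f"{name}: compounded {compounded(changes):+.1f}%, "
              f"summed {summed(changes):+d}%")
```

Under either reading the six commits add up to less than the reported totals, which fits the statement that they account for most, but not all, of the change.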
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #57 from Martin Liška --- Created attachment 53997 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53997&action=edit Partial linking path
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #56 from Martin Liška --- Created attachment 53996 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53996&action=edit make all-host on Ryzen 9 with LTO partial linking Using partial linking for the following 4 objects (gimple-match.o generic-match.o insn-recog.o insn-emit.o), I can speed up the build of all-host by almost 30s (from 145 to 115 seconds).
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #55 from Martin Liška --- Created attachment 53995 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53995&action=edit make all-host on Ryzen 9
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #54 from Martin Liška --- > Try LTOing libbackend.a? So this option is not feasible either: we're paying too high a price for the parallel WPA of the LTO, and the resulting time on 32 cores is even slower :/
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402

--- Comment #53 from Martin Liška ---
(In reply to Richard Biener from comment #50)
> (In reply to Martin Liška from comment #48)
> > Created attachment 53989 [details]
> > CPU utilization of make all-host on recent AMD server
> >
> > The situation with a recent AMD server is really bad! Having 192 cores, the
> > average CPU utilization of `make all-host` is 6% !
>
> Just do more builds in parallel!

No! I'm speaking about faster edit-build-debug cycles and also about faster builds of gcc packages.

> There's just 903 .o files in gcc/ and
> libbackend.a just has 490 of them. It's not surprising the few larger
> files stretch out the compile-time here.

Well, gimple-match.o takes ~66s on my new AMD Ryzen 9 5950X CPU :/

> Try LTOing libbackend.a?

Yep, that's our parallel for free approach and I would welcome that, however:

during IPA pass: inline
In member function ‘quick_push’,
    inlined from ‘make_forwarders_with_degenerate_phis’ at /home/marxin/Programming/gcc/gcc/tree-ssa-dce.cc:1848:6:
/home/marxin/Programming/gcc/gcc/vec.h:1958:28: internal compiler error: Segmentation fault
 1958 |   return m_vec->quick_push (obj);
      |                            ^
0x102f987 internal_error(char const*, ...)
        ???:0
0x117935b cgraph_node::get_untransformed_body()
        ???:0
0x123f6e9 optimize_inline_calls(tree_node*)
        ???:0
0x123e4d2 inline_transform(cgraph_node*)
        ???:0
0x123da5f execute_all_ipa_transforms(bool)
        ???:0
0x15ebe1b cgraph_node::expand()
        ???:0
0x15e2f6d symbol_table::compile()
        ???:0
0x15d0368 lto_main()
        ???:0

I'll isolate that and hope we can add a configure option for LTOed libbackend.a.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402

--- Comment #52 from Richard Biener ---
(In reply to Richard Biener from comment #51)
> (In reply to Martin Liška from comment #49)
> > [...]
> > Can any GNU make expert please judge here? Starting e.g. gimple-match.cc
> > early would really help
> > to speed up the build process.
>
> this has come up in the past and there's no reliable way to order things
> (just use make -j on such machines and overcommit?)

Doesn't make a difference to overall time, so early starting isn't the issue, it seems.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402

--- Comment #51 from Richard Biener ---
(In reply to Martin Liška from comment #49)
[...]
> Can any GNU make expert please judge here? Starting e.g. gimple-match.cc
> early would really help
> to speed up the build process.

this has come up in the past and there's no reliable way to order things (just use make -j on such machines and overcommit?)
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #50 from Richard Biener --- (In reply to Martin Liška from comment #48) > Created attachment 53989 [details] > CPU utilization of make all-host on recent AMD server > > The situation with a recent AMD server is really bad! Having 192 cores, the > average CPU utilization of `make all-host` is 6% ! Just do more builds in parallel! There's just 903 .o files in gcc/ and libbackend.a just has 490 of them. It's not surprising the few larger files stretch out the compile-time here. Try LTOing libbackend.a?
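The effect described here, a few large files stretching out the build, can be illustrated with a small scheduling model. This is a sketch under stated assumptions, not a measurement: the helpers `makespan` and `utilization` are hypothetical, the 2 second duration for ordinary objects is invented, the 490 object count comes from this comment, and the 66 second figure for gimple-match.o is the one reported in comment #53 of this thread. Dependencies between targets are ignored.

```python
import heapq


def makespan(durations, cores):
    """Finish time of greedy list scheduling: each job goes to the core
    that becomes free first, in the given order."""
    free_at = [0.0] * cores
    heapq.heapify(free_at)
    end = 0.0
    for duration in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
        end = max(end, start + duration)
    return end


def utilization(durations, cores):
    """Average fraction of core time spent doing useful work."""
    return sum(durations) / (cores * makespan(durations, cores))


if __name__ == "__main__":
    # One 66 s object started first, plus 489 hypothetical 2 s objects.
    jobs = [66.0] + [2.0] * 489
    for cores in (16, 192):
        print(f"{cores:3d} cores: makespan {makespan(jobs, cores):.0f}s, "
              f"utilization {utilization(jobs, cores):.1%}")
```

In this model the longest single compile sets a floor on the total time, so adding cores beyond a certain point only lowers utilization; splitting the large object is what shortens the build.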
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402

--- Comment #49 from Martin Liška ---
One more observation I made: apparently we're trying to sort OBJS (in Makefile.in) with the biggest at the very beginning:

    # Language-independent object files.
    # We put the *-match.o and insn-*.o files first so that a parallel make
    # will build them sooner, because they are large and otherwise tend to be
    # the last objects to finish building.
    OBJS = \
            gimple-match.o \
            generic-match.o \
            insn-attrtab.o \
            insn-automata.o \

That's fine, plus we make all objects depend on generated_files:

    # In order for parallel make to really start compiling the expensive
    # objects from $(OBJS) as early as possible, build all their
    # prerequisites strictly before all objects.
    $(ALL_HOST_OBJS) : | $(generated_files)

Using that, we should see gimple-match.o being spawned very soon, but it's not the case. Imagine you have already built all-host and let's see what happens:

$ rm -f gimple-match.o ; rm -f tree*.o && make -j4 --debug=b libbackend.a 2>&1 | less
...
 File 'gimple-match.o' does not exist.
 Prerequisite 'cs-bconfig.h' is newer than target 'bconfig.h'.
 Must remake target 'bconfig.h'.
 Prerequisite 'cstamp-h' is newer than target 'auto-host.h'.
 Must remake target 'auto-host.h'.
 Prerequisite 's-options' is newer than target 'optionlist'.
 Must remake target 'optionlist'.
 Prerequisite 's-gtyp-input' is newer than target 'gtyp-input.list'.
 Must remake target 'gtyp-input.list'.
 Prerequisite 's-bversion' is newer than target 'bversion.h'.
 Must remake target 'bversion.h'.
 Prerequisite 'cs-config.h' is newer than target 'config.h'.
 Must remake target 'config.h'.
...
 File 'tree-vrp.o' does not exist.
 File 'tree.o' does not exist.
 Prerequisite 's-i386-bt' is newer than target 'i386-builtin-types.inc'.
 Must remake target 'i386-builtin-types.inc'.
 File 'gimple-match.o' does not exist.
 Prerequisite 's-modes-h' is newer than target 'insn-modes.h'.
 Must remake target 'insn-modes.h'.
 Prerequisite 's-modes-inline-h' is newer than target 'insn-modes-inline.h'.
 Must remake target 'insn-modes-inline.h'.
 Prerequisite 's-version' is newer than target 'version.h'.
 Must remake target 'version.h'.
 Prerequisite 's-options-h' is newer than target 'options.h'.
 Must remake target 'options.h'.
 Prerequisite 's-genrtl-h' is newer than target 'genrtl.h'.
 Must remake target 'genrtl.h'.
 Prerequisite 's-modes-m' is newer than target 'min-insn-modes.cc'.
 Must remake target 'min-insn-modes.cc'.
...
 File 'gimple-match.o' does not exist.
 Prerequisite 's-gtype' is newer than target 'gtype-desc.h'.
 Must remake target 'gtype-desc.h'.
 Prerequisite 's-constants' is newer than target 'insn-constants.h'.
 Must remake target 'insn-constants.h'.
...
 Must remake target 'tree-affine.o'.
g++ -fno-PIE -c -g -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I/home/marxin/Programming/gcc/gcc -I/home/marxin/Programming/gcc/gcc/. -I/home/marxin/Programming/gcc/gcc/../include -I/home/marxin/Programming/gcc/gcc/../libcpp/include -I/home/marxin/Programming/gcc/gcc/../libcody -I/home/marxin/Programming/gcc/gcc/../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libdecnumber/bid -I../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libbacktrace -o tree-affine.o -MT tree-affine.o -MMD -MP -MF ./.deps/tree-affine.TPo /home/marxin/Programming/gcc/gcc/tree-affine.cc
 File 'tree-call-cdce.o' does not exist.
 Must remake target 'tree-call-cdce.o'.
g++ -fno-PIE -c -g -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I/home/marxin/Programming/gcc/gcc -I/home/marxin/Programming/gcc/gcc/. -I/home/marxin/Programming/gcc/gcc/../include -I/home/marxin/Programming/gcc/gcc/../libcpp/include -I/home/marxin/Programming/gcc/gcc/../libcody -I/home/marxin/Programming/gcc/gcc/../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libdecnumber/bid -I../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libbacktrace -o tree-call-cdce.o -MT tree-call-cdce.o -MMD -MP -MF ./.dep
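To check where a given object lands in the remake order, the `make --debug=b` output can be scanned for its "Must remake target" lines. This is a minimal sketch, assuming the message wording and straight quotes shown in the log above (other GNU make versions may quote differently); the helper names `remake_order` and `position_of` are hypothetical, and the sample text is abbreviated from the log in this comment.

```python
import re

# Matches lines such as: Must remake target 'bconfig.h'.
REMAKE_RE = re.compile(r"Must remake target '([^']+)'")


def remake_order(log_text):
    """Return targets in the order make decided to remake them."""
    return REMAKE_RE.findall(log_text)


def position_of(target, log_text):
    """Index of target in the remake order, or None if it never appears."""
    order = remake_order(log_text)
    return order.index(target) if target in order else None


if __name__ == "__main__":
    sample = """\
 File 'gimple-match.o' does not exist.
 Prerequisite 'cs-bconfig.h' is newer than target 'bconfig.h'.
 Must remake target 'bconfig.h'.
 Prerequisite 'cstamp-h' is newer than target 'auto-host.h'.
 Must remake target 'auto-host.h'.
 Must remake target 'tree-affine.o'.
"""
    print(remake_order(sample))
    print(position_of("gimple-match.o", sample))
```

Running this over a full log would show how many targets are remade before gimple-match.o, which is the quantity the observation in this comment is about.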
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #48 from Martin Liška --- Created attachment 53989 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53989&action=edit CPU utilization of make all-host on recent AMD server The situation with a recent AMD server is really bad! Having 192 cores, the average CPU utilization of `make all-host` is 6% !
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #47 from Segher Boessenkool --- (In reply to Sam James from comment #46) > Even partially making the build less recursive would likely help a fair bit. It will help a bit, sure, but not nearly as much as you perhaps hope for. There are quite a few "synchronisation" points where nothing after it can be done until everything before it has been done. Partly this is just because we have a three-stage bootstrap, but also there are some generator programs that everything else depends on (on its output that is), and those are real chokepoints. Also, recursive make is a scourge of humanity, for sure, but fixing this has to be done in auto first and foremost.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 Sam James changed: What|Removed |Added CC||sam at gentoo dot org --- Comment #46 from Sam James --- Even partially making the build less recursive would likely help a fair bit. The classic text on this is https://accu.org/journals/overload/14/71/miller_2004/. This doesn't mean that splitting up files is futile, but when watching a build, much of the time, make doesn't even get to traverse into each of the directories, because it doesn't know if it's able to. It can safely be done in stages. Using includes would let you get a lot of the current state wrt split directories. Could even just have a certain number of toplevel directories but non-recursive within them.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #45 from Eric Gallager --- (In reply to Martin Liška from comment #0) > [...] > Then I built GCC with -j1 and used following parser to generate reports: > https://github.com/marxin/script-misc/blob/master/parse-make-log.py The new URL for that script is now this, btw: https://github.com/marxin/script-misc/blob/master/legacy/parse-make-log.py
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #44 from Eric Gallager --- (In reply to Martin Liška from comment #43) > (In reply to Eric Gallager from comment #42) > > Is this just about parallelism bottlenecks for the main build target (e.g. > > just `make` or `make all`), or does it apply to other Makefile targets, too? > > (e.g. the testsuite via `make check`, or docs via `make pdf` or something) > > Well, it was intended to cover only the main build, which pdf can be seen as > part of. I usually have to run `make pdf` as a separate build target, though, as it doesn't get run as part of the main build for me... and the bottleneck there, for the pdf target, is in libstdc++ for me...
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #43 from Martin Liška --- (In reply to Eric Gallager from comment #42) > Is this just about parallelism bottlenecks for the main build target (e.g. > just `make` or `make all`), or does it apply to other Makefile targets, too? > (e.g. the testsuite via `make check`, or docs via `make pdf` or something) Well, it was intended to cover only the main build, which pdf can be seen as part of. On the other hand, `make check` should belong to a different PR if you have troubles with it.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #42 from Eric Gallager --- Is this just about parallelism bottlenecks for the main build target (e.g. just `make` or `make all`), or does it apply to other Makefile targets, too? (e.g. the testsuite via `make check`, or docs via `make pdf` or something)
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #41 from Andrew Pinski --- Latest discussion of this can also be found at: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571555.html
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 Andrew Pinski changed: What|Removed |Added Last reconfirmed|2018-09-07 00:00:00 |2021-7-18 Target Milestone|10.4|---
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 Richard Biener changed: What|Removed |Added Target Milestone|10.3|10.4 --- Comment #40 from Richard Biener --- GCC 10.3 is being released, retargeting bugs to GCC 10.4.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 Richard Biener changed: What|Removed |Added Target Milestone|10.2|10.3 --- Comment #39 from Richard Biener --- GCC 10.2 is released, adjusting target milestone.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #38 from jojo ---
(In reply to Martin Liška from comment #36)
> (In reply to jojo from comment #35)
> > (In reply to Martin Liška from comment #30)
> > > A possible solution can be usage of '-flinker-output=nolto-rel -r' for
> > > huge files.
> >
> > it's useful for splitting huge files ?
>
> There's experiment I did:
>
> $ time g++ -O2 /tmp/gimple-match.ii -c
>
> real    0m35.790s
> user    0m35.490s
> sys     0m0.268s
>
> $ time g++ -O2 /tmp/gimple-match.ii -c -flto
>
> real    0m8.138s
> user    0m7.915s
> sys     0m0.202s
>
> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o
>
> real    0m9.087s
> user    1m56.028s
> sys     0m3.292s
>
> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o --param lto-partitions=8
>
> real    0m7.350s
> user    0m48.548s
> sys     0m0.976s
>
> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o --param lto-partitions=4
>
> real    0m9.847s
> user    0m30.462s
> sys     0m0.392s
>
> so for N==4 we get to 8+10s = 18s (compared to the original 36s). And total
> user time is 30+8, which is comparable to the original 36s.

That looks like it reduces the cost a little for a huge file such as insn-emit.c. I want to use a shell tool like 'csplit' to split it and compile the pieces in parallel.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #37 from Richard Biener ---
(In reply to Martin Liška from comment #36)
> (In reply to jojo from comment #35)
> > (In reply to Martin Liška from comment #30)
> > > A possible solution can be usage of '-flinker-output=nolto-rel -r' for
> > > huge files.
> >
> > it's useful for splitting huge files ?
>
> There's experiment I did:
>
> $ time g++ -O2 /tmp/gimple-match.ii -c
>
> real    0m35.790s
> user    0m35.490s
> sys     0m0.268s
>
> $ time g++ -O2 /tmp/gimple-match.ii -c -flto
>
> real    0m8.138s
> user    0m7.915s
> sys     0m0.202s
>
> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o
>
> real    0m9.087s
> user    1m56.028s
> sys     0m3.292s
>
> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o --param lto-partitions=8
>
> real    0m7.350s
> user    0m48.548s
> sys     0m0.976s
>
> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o --param lto-partitions=4
>
> real    0m9.847s
> user    0m30.462s
> sys     0m0.392s
>
> so for N==4 we get to 8+10s = 18s (compared to the original 36s). And total
> user time is 30+8, which is comparable to the original 36s.

The GSoC parallelism project this year is supposed to replicate this in a cheaper way and also develop some magic to automatically trigger it when it seems profitable.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #36 from Martin Liška ---
(In reply to jojo from comment #35)
> (In reply to Martin Liška from comment #30)
> > A possible solution can be usage of '-flinker-output=nolto-rel -r' for huge
> > files.
>
> it's useful for splitting huge files ?

Here's an experiment I did:

$ time g++ -O2 /tmp/gimple-match.ii -c

real    0m35.790s
user    0m35.490s
sys     0m0.268s

$ time g++ -O2 /tmp/gimple-match.ii -c -flto

real    0m8.138s
user    0m7.915s
sys     0m0.202s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o

real    0m9.087s
user    1m56.028s
sys     0m3.292s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o --param lto-partitions=8

real    0m7.350s
user    0m48.548s
sys     0m0.976s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o --param lto-partitions=4

real    0m9.847s
user    0m30.462s
sys     0m0.392s

So for N==4 we get to 8+10s = 18s (compared to the original 36s). And total user time is 30+8, which is comparable to the original 36s.
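The arithmetic behind that conclusion can be spelled out for all three link variants. This is a small sketch that only reuses the `time` figures quoted in the comment; the variable names and the `seconds` helper are made up for illustration.

```python
def seconds(minutes: int, secs: float) -> float:
    """Convert a `time`-style XmY.YYYs value to seconds."""
    return 60 * minutes + secs


# Baseline: g++ -O2 -c without LTO (real, user).
baseline_real, baseline_user = seconds(0, 35.790), seconds(0, 35.490)
# Step 1: g++ -O2 -c -flto (real, user).
lto_compile_real, lto_compile_user = seconds(0, 8.138), seconds(0, 7.915)
# Step 2: the nolto-rel link step, keyed by --param lto-partitions (real, user).
link_steps = {
    "default": (seconds(0, 9.087), seconds(1, 56.028)),
    "8": (seconds(0, 7.350), seconds(0, 48.548)),
    "4": (seconds(0, 9.847), seconds(0, 30.462)),
}

for partitions, (link_real, link_user) in link_steps.items():
    total_real = lto_compile_real + link_real
    total_user = lto_compile_user + link_user
    print(f"lto-partitions={partitions}: "
          f"real {total_real:.1f}s vs {baseline_real:.1f}s, "
          f"user {total_user:.1f}s vs {baseline_user:.1f}s")
```

For 4 partitions this gives about 18s of wall time against the original 36s, at a total user time of about 38s, matching the rounded numbers in the comment.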
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 jojo changed: What|Removed |Added CC||rjiejie at me dot com --- Comment #35 from jojo --- (In reply to Martin Liška from comment #30) > A possible solution can be usage of '-flinker-output=nolto-rel -r' for huge > files. Is it useful for splitting huge files?
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 Eric Gallager changed: What|Removed |Added CC||egallager at gcc dot gnu.org --- Comment #34 from Eric Gallager --- (In reply to Giuliano Belinassi from comment #32) > (In reply to Eric Gallager from comment #31) > > I think this came up at Cauldron, but I forget what exactly people said > > about it... > > Actually this PR comes before Cauldron 2019. By "came up" I meant simply that it was mentioned, not that that was where it originated...
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 Jakub Jelinek changed: What|Removed |Added Target Milestone|10.0|10.2 --- Comment #33 from Jakub Jelinek --- GCC 10.1 has been released.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #32 from Giuliano Belinassi --- (In reply to Eric Gallager from comment #31) > I think this came up at Cauldron, but I forget what exactly people said > about it... Actually this PR predates Cauldron 2019. One way to fix this issue is to make the match.pd parser output several smaller gimple-match.c files and add these to the Makefile. The same procedure can then be repeated for other big files. Another solution is to parallelize GCC internals and make GCC communicate with Make somehow, so that when a CPU is idle it starts compiling some files in parallel.
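As a rough illustration of the first idea, a generator could distribute its patterns over N output files in sequential order. The sketch below is not genmatch code: `partition_sequentially`, the pattern names and the `gimple-match-N.c` file names are all hypothetical.

```python
from typing import List, Sequence


def partition_sequentially(items: Sequence[str], num_parts: int) -> List[List[str]]:
    """Split `items` into `num_parts` contiguous chunks whose sizes differ by at most one."""
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    base, extra = divmod(len(items), num_parts)
    parts: List[List[str]] = []
    start = 0
    for index in range(num_parts):
        # The first `extra` chunks take one additional item.
        size = base + (1 if index < extra else 0)
        parts.append(list(items[start:start + size]))
        start += size
    return parts


patterns = [f"pattern_{i}" for i in range(10)]  # placeholder pattern names
for number, chunk in enumerate(partition_sequentially(patterns, 3), start=1):
    print(f"gimple-match-{number}.c: {len(chunk)} patterns")
```

Each resulting file can then be listed as a separate object in the Makefile so that make is free to compile them concurrently. Equal pattern counts do not guarantee equal compile times, so a real split might need to balance by size instead.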
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #31 from Eric Gallager --- I think this came up at Cauldron, but I forget what exactly people said about it...
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 Martin Liška changed: What|Removed |Added Status|ASSIGNED|NEW Assignee|marxin at gcc dot gnu.org |unassigned at gcc dot gnu.org --- Comment #30 from Martin Liška --- A possible solution could be to use '-flinker-output=nolto-rel -r' for huge files.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #29 from Giuliano Belinassi ---
> No, the proper fix would be to split the generated files and compile them in
> parallel. Similarly for all the insn-*.c generated files. That would the
> proper fix.

Indeed. However, I am working on parallelizing the compilation with threads. This may lead to a solution, but may not be the best for this scenario.

> Anyway, I like the graph you made :)

Thank you.

> But what version of GCC is this graph, with what exact configuration?

* This is the gcc that I used to build: *

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 8.2.0-14' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 8.2.0 (Debian 8.2.0-14)

* The gcc that I built: *

Using built-in specs.
COLLECT_GCC=./xgcc
Target: x86_64-pc-linux-gnu
Configured with: /home/giulianob/gcc_svn/trunk//configure --disable-checking --disable-bootstrap
Thread model: posix
gcc version 9.0.1 20190205 (experimental) (GCC)
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #28 from Segher Boessenkool --- But what version of GCC is this graph, with what exact configuration?
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #27 from Martin Liška --- > Since gimple-match.c takes so long to compile, I was wondering if it might > be possible to reorder the compilation so we can push its compilation early > in the dependency graph. No, the proper fix would be to split the generated files and compile them in parallel. Similarly for all the insn-*.c generated files. That would be the proper fix. Anyway, I like the graph you made :)
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 Giuliano Belinassi changed: What|Removed |Added CC||giuliano.belinassi at usp dot br --- Comment #26 from Giuliano Belinassi --- Created attachment 45630 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45630&action=edit make -j 64 all-gcc, with --disable-bootstrap, on 64-cores. Blue means dependency to gimple-match.

Since gimple-match.c takes so long to compile, I was wondering if it might be possible to reorder the compilation so we can push its compilation early in the dependency graph. I did the following steps:

1) 'configure --disable-bootstrap'
2) 'make -j 64 all-gcc'
3) 'make clean'
4) 'make gimple-match.o' using a wrapper[1] that I created to log all files required by gimple-match

I then plotted the attached graph. Here, blue means a dependency of gimple-match, and the largest bar is 'gimple-match.c' itself. I used a 64-core AMD Opteron 6376 in the process. Any ideas?

[1] https://github.com/giulianobelinassi/gcc-timer-analysis
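A timing wrapper of the general kind described above could look roughly like the following. This is not the tool linked at [1]; `run_timed` is a hypothetical helper, and the demo call times a trivial Python process instead of a real compiler invocation.

```python
import subprocess
import sys
import time
from typing import List, Tuple


def run_timed(command: List[str]) -> Tuple[int, float]:
    """Run `command` and return (exit status, wall-clock seconds)."""
    start = time.monotonic()
    status = subprocess.call(command)
    return status, time.monotonic() - start


# Demo: time a trivial child process. A real wrapper would instead be
# handed the compiler command line (for example via CXX in the build).
status, elapsed = run_timed([sys.executable, "-c", "pass"])
print(f"exit={status} wall={elapsed:.3f}s")
```

Logging one such line per compiler invocation is enough to reconstruct which translation units dominate the wall time.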
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 Martin Liška changed: What|Removed |Added Target Milestone|9.0 |10.0
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 Martin Liška changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |marxin at gcc dot gnu.org --- Comment #25 from Martin Liška --- Let me assign it.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 Eric Gallager changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2018-09-07 CC||egallager at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #24 from Eric Gallager --- (In reply to Martin Liška from comment #23) > I can easily split insn-emit.c. Once we know which was a split should be > done, I can prepare patch for that. Confirmed, please do this!
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 Martin Liška changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #23 from Martin Liška --- I can easily split insn-emit.c. Once we know which way the split should be done, I can prepare a patch for that.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #22 from Martin Liška --- (In reply to rguent...@suse.de from comment #21) > On Wed, 4 Apr 2018, marxin at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 > > > > --- Comment #20 from Martin Liška --- > > For the libsanitizer/*/*_interceptors I make a quick patch: > > https://github.com/marxin/gcc/commit/5ce658230db567474997fa411f23ac78366487ce > > which basically splits asan_interceptors.cc and > > sanitizer_common_interceptors.inc and moves implementation of string > > functions > > to a separate compile unit. > > This shrinks time from 38->34s for asan_interceptors.cc being built with > > enabled checking stage1 compiler. > > > > I believe splitting the interceptors to couple of logical sub-files will > > make > > it very fast. List of interceptors grepped from > > sanitizer_common_interceptors.inc: > > I can imagine splitting that to components like string, stdio, time, > > process, > > thread, math,.. > > The question is of course _why_ it is this slow. It's not that this > is 1s of functions or very large ones... It's analyzed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78288
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #21 from rguenther at suse dot de --- On Wed, 4 Apr 2018, marxin at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 > > --- Comment #20 from Martin Liška --- > For the libsanitizer/*/*_interceptors I make a quick patch: > https://github.com/marxin/gcc/commit/5ce658230db567474997fa411f23ac78366487ce > which basically splits asan_interceptors.cc and > sanitizer_common_interceptors.inc and moves implementation of string functions > to a separate compile unit. > This shrinks time from 38->34s for asan_interceptors.cc being built with > enabled checking stage1 compiler. > > I believe splitting the interceptors to couple of logical sub-files will make > it very fast. List of interceptors grepped from > sanitizer_common_interceptors.inc: > I can imagine splitting that to components like string, stdio, time, process, > thread, math,.. The question is of course _why_ it is this slow. It's not that this is 1s of functions or very large ones...
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #20 from Martin Liška ---
For the libsanitizer/*/*_interceptors I made a quick patch:
https://github.com/marxin/gcc/commit/5ce658230db567474997fa411f23ac78366487ce
which basically splits asan_interceptors.cc and sanitizer_common_interceptors.inc and moves the implementation of string functions to a separate compile unit. This shrinks the time from 38s to 34s for asan_interceptors.cc when built with a checking-enabled stage1 compiler.

I believe splitting the interceptors into a couple of logical sub-files will make it very fast. I can imagine splitting that into components like string, stdio, time, process, thread, math, ...

List of interceptors grepped from sanitizer_common_interceptors.inc:

INTERCEPTOR(SIZE_T, strlen, const char *s) {
INTERCEPTOR(SIZE_T, strnlen, const char *s, SIZE_T maxlen) {
INTERCEPTOR(char*, strndup, const char *s, uptr size) {
INTERCEPTOR(char*, __strndup, const char *s, uptr size) {
INTERCEPTOR(char*, textdomain, const char *domainname) {
INTERCEPTOR(int, strcmp, const char *s1, const char *s2) {
INTERCEPTOR(int, strncmp, const char *s1, const char *s2, uptr size) {
INTERCEPTOR(int, strcasecmp, const char *s1, const char *s2) {
INTERCEPTOR(int, strncasecmp, const char *s1, const char *s2, SIZE_T size) {
INTERCEPTOR(char*, strstr, const char *s1, const char *s2) {
INTERCEPTOR(char*, strcasestr, const char *s1, const char *s2) {
INTERCEPTOR(char*, strtok, char *str, const char *delimiters) {
INTERCEPTOR(void*, memmem, const void *s1, SIZE_T len1, const void *s2,
INTERCEPTOR(char*, strchr, const char *s, int c) {
INTERCEPTOR(char*, strchrnul, const char *s, int c) {
INTERCEPTOR(char*, strrchr, const char *s, int c) {
INTERCEPTOR(SIZE_T, strspn, const char *s1, const char *s2) {
INTERCEPTOR(SIZE_T, strcspn, const char *s1, const char *s2) {
INTERCEPTOR(char *, strpbrk, const char *s1, const char *s2) {
INTERCEPTOR(void *, memset, void *dst, int v, uptr size) {
INTERCEPTOR(void *, memmove, void *dst, const void *src, uptr size) {
INTERCEPTOR(void *, memcpy, void *dst, const void *src, uptr size) {
INTERCEPTOR(int, memcmp, const void *a1, const void *a2, uptr size) {
INTERCEPTOR(void*, memchr, const void *s, int c, SIZE_T n) {
INTERCEPTOR(void*, memrchr, const void *s, int c, SIZE_T n) {
INTERCEPTOR(double, frexp, double x, int *exp) {
INTERCEPTOR(float, frexpf, float x, int *exp) {
INTERCEPTOR(long double, frexpl, long double x, int *exp) {
INTERCEPTOR(SSIZE_T, read, int fd, void *ptr, SIZE_T count) {
INTERCEPTOR(SIZE_T, fread, void *ptr, SIZE_T size, SIZE_T nmemb, void *file) {
INTERCEPTOR(SSIZE_T, pread, int fd, void *ptr, SIZE_T count, OFF_T offset) {
INTERCEPTOR(SSIZE_T, pread64, int fd, void *ptr, SIZE_T count, OFF64_T offset) {
INTERCEPTOR_WITH_SUFFIX(SSIZE_T, readv, int fd, __sanitizer_iovec *iov,
INTERCEPTOR(SSIZE_T, preadv, int fd, __sanitizer_iovec *iov, int iovcnt,
INTERCEPTOR(SSIZE_T, preadv64, int fd, __sanitizer_iovec *iov, int iovcnt,
INTERCEPTOR(SSIZE_T, write, int fd, void *ptr, SIZE_T count) {
INTERCEPTOR(SIZE_T, fwrite, const void *p, uptr size, uptr nmemb, void *file) {
INTERCEPTOR(SSIZE_T, pwrite, int fd, void *ptr, SIZE_T count, OFF_T offset) {
INTERCEPTOR(SSIZE_T, pwrite64, int fd, void *ptr, OFF64_T count,
INTERCEPTOR_WITH_SUFFIX(SSIZE_T, writev, int fd, __sanitizer_iovec *iov,
INTERCEPTOR(SSIZE_T, pwritev, int fd, __sanitizer_iovec *iov, int iovcnt,
INTERCEPTOR(SSIZE_T, pwritev64, int fd, __sanitizer_iovec *iov, int iovcnt,
INTERCEPTOR(int, prctl, int option, unsigned long arg2,
INTERCEPTOR(unsigned long, time, unsigned long *t) {
INTERCEPTOR(__sanitizer_tm *, localtime, unsigned long *timep) {
INTERCEPTOR(__sanitizer_tm *, localtime_r, unsigned long *timep, void *result) {
INTERCEPTOR(__sanitizer_tm *, gmtime, unsigned long *timep) {
INTERCEPTOR(__sanitizer_tm *, gmtime_r, unsigned long *timep, void *result) {
INTERCEPTOR(char *, ctime, unsigned long *timep) {
INTERCEPTOR(char *, ctime_r, unsigned long *timep, char *result) {
INTERCEPTOR(char *, asctime, __sanitizer_tm *tm) {
INTERCEPTOR(char *, asctime_r, __sanitizer_tm *tm, char *result) {
INTERCEPTOR(long, mktime, __sanitizer_tm *tm) {
INTERCEPTOR(char *, strptime, char *s, char *format, __sanitizer_tm *tm) {
INTERCEPTOR(int, vscanf, const char *format, va_list ap)
INTERCEPTOR(int, vsscanf, const char *str, const char *format, va_list ap)
INTERCEPTOR(int, vfscanf, void *stream, const char *format, va_list ap)
INTERCEPTOR(int, __isoc99_vscanf, const char *format, va_list ap)
INTERCEPTOR(int, __isoc99_vsscanf, const char *str, const char *format,
INTERCEPTOR(int, __isoc99_vfscanf, void *stream, const char *format, va_list ap)
INTERCEPTOR(int, scanf, const char *format, ...)
INTERCEPTOR(int, fscanf, void *stream, const char *format, ...)
INTERCEPTOR(int, sscanf, const char *str, const char *format, ...)
INTERCEPTOR(int, __isoc99_scanf, const char *format, ...)
INTERCEPTOR(int, __isoc99_fscanf, void *stream, const char *format, ...)
INTERCEPTOR(int, __isoc99_sscanf,
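To get a feel for how such a split might fall out, the grepped lines can be bucketed by function name. The sketch below is only an illustration: the grouping rules in `RULES` are guesses based on the component names suggested above, not how libsanitizer is organized, and `group_interceptors` is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Extract the function name: the second macro argument after the return type.
NAME_RE = re.compile(r"INTERCEPTOR(?:_WITH_SUFFIX)?\(\s*[^,]+,\s*(\w+)")

# Ordered, hypothetical grouping rules; the first matching rule wins.
RULES = [
    ("time", re.compile(r"time")),
    ("stdio", re.compile(r"scanf")),
    ("io", re.compile(r"^[fp]?(read|write)")),
    ("math", re.compile(r"^frexp")),
    ("string", re.compile(r"^_*(str|mem)")),
]


def component_of(name: str) -> str:
    """Return the component an interceptor name falls into, or 'other'."""
    for component, pattern in RULES:
        if pattern.search(name):
            return component
    return "other"


def group_interceptors(lines: List[str]) -> Dict[str, List[str]]:
    """Group interceptor names found in grep output lines by component."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        match = NAME_RE.search(line)
        if match:
            groups[component_of(match.group(1))].append(match.group(1))
    return dict(groups)


sample = [
    "INTERCEPTOR(SIZE_T, strlen, const char *s) {",
    "INTERCEPTOR(char *, strptime, char *s, char *format, __sanitizer_tm *tm) {",
    "INTERCEPTOR(SSIZE_T, pread64, int fd, void *ptr, SIZE_T count, OFF64_T offset) {",
    "INTERCEPTOR_WITH_SUFFIX(SSIZE_T, readv, int fd, __sanitizer_iovec *iov,",
    "INTERCEPTOR(long double, frexpl, long double x, int *exp) {",
    "INTERCEPTOR(int, __isoc99_sscanf,",
    "INTERCEPTOR(int, prctl, int option, unsigned long arg2,",
]
for component, names in sorted(group_interceptors(sample).items()):
    print(component, names)
```

The rule order matters: `strptime` would match the string rule, so the time rule is checked first.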
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #19 from Martin Liška --- (In reply to Tom Tromey from comment #17) > The results in comment #13 seem to be missing some compilations -- > I would have expected to see more files from libcpp in there. > As it is I only see directives.o and line-map.o. There was a minimum threshold of 0.5s, please take a look at log file in: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402#c18
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #18 from Martin Liška --- Created attachment 43492 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43492&action=edit Parallel build of make all-host on 128 core EPYC machine (log file)
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #17 from Tom Tromey --- The results in comment #13 seem to be missing some compilations -- I would have expected to see more files from libcpp in there. As it is I only see directives.o and line-map.o.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 Martin Liška changed: What|Removed |Added Attachment #43478|0 |1 is obsolete|| --- Comment #16 from Martin Liška --- Created attachment 43482 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43482&action=edit -ftime-report for most time consuming files on Haswell machine Properly generated with -O2 which was missing in previous version.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #15 from Segher Boessenkool --- This is a -O0 build? That's what that time report shows afaics.
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #14 from Martin Liška --- Created attachment 43478 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43478&action=edit -ftime-report for most time consuming files on Haswell machine
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 Martin Liška changed: What|Removed |Added CC||hubicka at ucw dot cz, ||rguenth at gcc dot gnu.org Target Milestone|--- |9.0
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #13 from Martin Liška --- Created attachment 43440 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440&action=edit Parallel build of make all-host on 128 core EPYC machine
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 Martin Liška changed: What|Removed |Added Attachment #43432|0 |1 is obsolete|| --- Comment #12 from Martin Liška --- Created attachment 43439 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43439&action=edit Parallel build of make all-host on 8 core Haswell machine
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #11 from Martin Liška --- (In reply to Martin Liška from comment #10) > Created attachment 43432 [details] > Parallel build of make all-host on 8 core Haswell machine This was generated with a slightly modified make (being able to run fully in parallel): https://github.com/marxin/make/tree/timestamp-v2 And output is then parsed and 'stacked' graph is generated: https://github.com/marxin/script-misc/blob/master/parse-make-log-parallel.py
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 Martin Liška changed: What|Removed |Added Attachment #43428|0 |1 is obsolete|| --- Comment #10 from Martin Liška --- Created attachment 43432 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43432&action=edit Parallel build of make all-host on 8 core Haswell machine
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #9 from Martin Liška --- Created attachment 43428 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43428&action=edit Parallel build of make all-host on 8 core Haswell machine
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #8 from Martin Liška --- I forgot to note that minimum time threshold is 0.5s for the wall time reports.
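For readers wondering why some objects are absent from the reports, a threshold filter of this kind might look as follows. The file names and timings are invented, `filter_report` is a hypothetical helper, and treating the 0.5s limit as inclusive is an assumption; the actual parsing scripts may differ.

```python
from typing import Dict

MIN_SECONDS = 0.5  # threshold mentioned in the comment


def filter_report(durations: Dict[str, float],
                  minimum: float = MIN_SECONDS) -> Dict[str, float]:
    """Keep entries that took at least `minimum` seconds, slowest first."""
    kept = {name: secs for name, secs in durations.items() if secs >= minimum}
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))


# Invented example timings in seconds.
timings = {"big.o": 12.0, "medium.o": 0.7, "tiny.o": 0.2}
print(filter_report(timings))
```

Anything compiling faster than the threshold simply never shows up in the report, which is easy to mistake for a missing compilation.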
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #7 from Martin Liška --- Created attachment 43426 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43426&action=edit wall time report: bootstrap stage3 on Haswell machine
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #6 from Martin Liška --- Created attachment 43425 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43425&action=edit wall time report: bootstrap stage2 on Haswell machine
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #4 from Martin Liška --- Created attachment 43423 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43423&action=edit wall time report: make (for configure --disable-bootstrap) on Haswell machine (system compiler -O2 -g)
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #5 from Martin Liška --- Created attachment 43424 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43424&action=edit wall time report: bootstrap stage1 on Haswell machine
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #2 from Martin Liška --- Created attachment 43421 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43421&action=edit make all-host -j128 on 128 core EPYC machine
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #3 from Martin Liška --- Created attachment 43422 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43422&action=edit make (for configure --disable-bootstrap) -j128 on 128 core EPYC machine
[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 --- Comment #1 from Martin Liška --- Created attachment 43420 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43420&action=edit make all-host -j8 on 8 core Haswell machine