On Mon, Apr 27, 2015 at 6:20 PM, Richard Sandiford <richard.sandif...@arm.com> wrote: > I think it's been the case for a while that parallel builds of GCC tend > to serialise around the compilation of insn-recog.c, especially with > higher --enable-checking settings. This patch tries to speed that > up by replacing most of genrecog with a new algorithm. > > I think the main problems with the current code are: > > 1. Vector architectures have added lots of new instructions that have > a similar shape and differ only in mode, code or unspec number. > The current algorithm doesn't have any way of factoring out those > similarities. > > 2. When matching a particular instruction, the current code examines > everything about a SET_DEST before moving on to the SET_SRC. This has > two subproblems: > > 2a. The destination of a SET isn't very distinctive. It's usually > just a register_operand, a memory_operand, a nonimmediate_operand > or a flags register. We therefore tend to backtrack to the > SET_DEST a lot, oscillating between groups of instructions with > the same kind of destination. > > 2b. Backtracking through predicate checks is relatively expensive. > It would be good to narrow down the "shape" of the instruction > first and only then check the predicates. (The backtracking is > expensive in terms of insn-recog.o compile time too, both because > we need to copy into argument registers and out of the result > register, and because it adds more sites where spills are needed.) > > 3. The code keeps one local variable per rtx depth, so it ends up > loading the same rtx many times over (mostly when backtracking). > This is very expensive in rtl-checking builds because each XEXP > includes a code check and a line-specific failure call. > > In principle the idea of having one local variable per depth > is good. But it was originally written that way when all optimisations > were done at the rtl level and I imagine each local variable mapped > to one pseudo register. These days the statements that reload the > value needed on backtracking lead to many more SSA names and phi > statements than you'd get with just a single variable per position > (loaded once, so naturally SSA). There is still the potential benefit > of avoiding having sibling rtxes live at once, but fixing (2) above > reduces that problem. > > Also, the code is all goto-based, which makes it rather hard to step through. > > The patch deals with these as follows: > > 1. Detect subpatterns that differ only by mode, code and/or integer > (e.g. unspec number) and split them out into a common routine. > > 2. Match the "shape" of the instruction first, in terms of codes, > integers and vector lengths, and only then check the modes, predicates > and dups. When checking the shape, handle SET_SRCs before SET_DESTs. > In practice this seems to greatly reduce the amount of backtracking. > > 3. Have one local variable per rtx position. I tested the patch with > and without the change and it helped a lot with rtl-checking builds > without seeming to affect release builds much either way. > > As far as debuggability goes, the new code avoids gotos and just > uses "natural" control flow. > > The headline stat is that a stage 3 --enable-checking=yes,rtl,df > build of insn-recog.c on my box goes from 7m43s to 2m2s (using the > same stage 2 compiler). The corresponding --enable-checking=release > change is from 49s to 24s (less impressive, as expected). > > The patch seems to speed up recog. E.g. the time taken to build > fold-const.ii goes from 6.74s before the patch to 6.69s after it; > not a big speed-up, but reproducible. > > Here's a comparison of the number of lines of code in insn-recog.c > before and after the patch on one target per config/ CPU: > > aarch64-linux-gnueabi 115526 38169 : 33.04% > alpha-linux-gnu 24479 10740 : 43.87% > arm-linux-gnueabi 169208 67759 : 40.04% > avr-rtems 55647 22127 : 39.76% > bfin-elf 13928 6498 : 46.65% > c6x-elf 29928 13324 : 44.52% > cr16-elf 2650 1419 : 53.55% > cris-elf 18669 7257 : 38.87% > epiphany-elf 19308 6131 : 31.75% > fr30-elf 2204 1112 : 50.45% > frv-linux-gnu 13541 5950 : 43.94% > h8300-elf 19584 9327 : 47.63% > hppa64-hp-hpux11.23 18299 8549 : 46.72% > ia64-linux-gnu 37629 17101 : 45.45% > iq2000-elf 2752 1609 : 58.47% > lm32-elf 1536 872 : 56.77% > m32c-elf 10040 4145 : 41.28% > m32r-elf 4436 2307 : 52.01% > m68k-linux-gnu 15739 8147 : 51.76% > mcore-elf 4816 2577 : 53.51% > mep-elf 67335 15929 : 23.66% > microblaze-elf 2656 1587 : 59.75% > mips-linux-gnu 54543 24186 : 44.34% > mmix 2597 1487 : 57.26% > mn10300-elf 6384 3294 : 51.60% > moxie-elf 1311 659 : 50.27% > msp430-elf 6054 2382 : 39.35% > nds32le-elf 5953 3152 : 52.95% > nios2-linux-gnu 3735 2143 : 57.38% > pdp11 2137 1157 : 54.14% > powerpc-eabispe 109322 40582 : 37.12% > powerpc-linux-gnu 82976 32192 : 38.80% > rl78-elf 5321 2432 : 45.71% > rx-elf 14454 7534 : 52.12% > s390-linux-gnu 48487 20454 : 42.18% > sh-linux-gnu 104087 41820 : 40.18% > sparc-linux-gnu 21912 10509 : 47.96% > spu-elf 19937 8182 : 41.04% > tilegx-elf 15412 6970 : 45.22% > tilepro-elf 11998 5479 : 45.67% > v850-elf 8725 4438 : 50.87% > vax-netbsdelf 4537 2410 : 53.12% > visium-elf 15190 7224 : 47.56% > x86_64-darwin 346396 119593 : 34.52% > xstormy16-elf 4660 2229 : 47.83% > xtensa-elf 2799 1514 : 54.09% > > Here's the loadable size of insn-recog.o in an --enable-checking=release > build on an x86_64-linux-gnu box: > > aarch64-linux-gnueabi 443955 298026 : 67.13% > alpha-linux-gnu 97194 80893 : 83.23% > arm-linux-gnueabi 782325 632248 : 80.82% > avr-rtems 226827 159763 : 70.43% > bfin-elf 52563 42376 : 80.62% > c6x-elf 112512 90142 : 80.12% > cr16-elf 10446 10006 : 95.79% > cris-elf 74771 52634 : 70.39% > epiphany-elf 87577 52284 : 59.70% > fr30-elf 8041 7713 : 95.92% > frv-linux-gnu 53699 47543 : 88.54% > h8300-elf 70708 66274 : 93.73% > hppa64-hp-hpux11.23 71597 57484 : 80.29% > ia64-linux-gnu 147286 130632 : 88.69% > iq2000-elf 11002 11774 : 107.02% > lm32-elf 5894 5798 : 98.37% > m32c-elf 36563 28855 : 78.92% > m32r-elf 17252 16910 : 98.02% > m68k-linux-gnu 58248 59781 : 102.63% > mcore-elf 17708 18948 : 107.00% > mep-elf 314466 150771 : 47.95% > microblaze-elf 10257 10534 : 102.70% > mips-linux-gnu 230991 191155 : 82.75% > mmix 10782 10678 : 99.04% > mn10300-elf 24035 22802 : 94.87% > moxie-elf 4622 4198 : 90.83% > msp430-elf 21707 16539 : 76.19% > nds32le-elf 22041 19444 : 88.22% > nios2-linux-gnu 15595 13238 : 84.89% > pdp11 7630 8254 : 108.18% > powerpc-eabispe 430816 308481 : 71.60% > powerpc-linux-gnu 317738 248534 : 78.22% > rl78-elf 18904 16329 : 86.38% > rx-elf 55015 56632 : 102.94% > s390-linux-gnu 190584 148961 : 78.16% > sh-linux-gnu 408446 307927 : 75.39% > sparc-linux-gnu 91016 80640 : 88.60% > spu-elf 80387 69151 : 86.02% > tilegx-elf 63815 49977 : 78.32% > tilepro-elf 51763 39252 : 75.83% > v850-elf 32812 28462 : 86.74% > vax-netbsdelf 18350 18259 : 99.50% > visium-elf 56872 46790 : 82.27% > x86_64-darwin 1306182 883169 : 67.61% > xstormy16-elf 17044 14430 : 84.66% > xtensa-elf 10780 9678 : 89.78% > > The same for --enable-checking=yes,rtl: > > aarch64-linux-gnueabi 1790272 507488 : 28.35% > alpha-linux-gnu 440058 193826 : 44.05% > arm-linux-gnueabi 2845568 1299337 : 45.66% > avr-rtems 885672 294387 : 33.24% > bfin-elf 280606 142836 : 50.90% > c6x-elf 486345 259256 : 53.31% > cr16-elf 46626 20044 : 42.99% > cris-elf 426813 144414 : 33.84% > epiphany-elf 353078 122166 : 34.60% > fr30-elf 40414 21042 : 52.07% > frv-linux-gnu 259550 111335 : 42.90% > h8300-elf 355199 158411 : 44.60% > hppa64-hp-hpux11.23 340584 149009 : 43.75% > ia64-linux-gnu 661364 293710 : 44.41% > iq2000-elf 41123 26709 : 64.95% > lm32-elf 20370 14781 : 72.56% > m32c-elf 174344 62000 : 35.56% > m32r-elf 74357 41773 : 56.18% > m68k-linux-gnu 275733 117445 : 42.59% > mcore-elf 85180 48018 : 56.37% > mep-elf 1450168 376020 : 25.93% > microblaze-elf 44189 26295 : 59.51% > mips-linux-gnu 876650 375753 : 42.86% > mmix 49882 25363 : 50.85% > mn10300-elf 128148 66768 : 52.10% > moxie-elf 23388 9011 : 38.53% > msp430-elf 114200 34426 : 30.15% > nds32le-elf 101416 73677 : 72.65% > nios2-linux-gnu 58799 29825 : 50.72% > pdp11 32836 18557 : 56.51% > powerpc-eabispe 1976098 626942 : 31.73% > powerpc-linux-gnu 1510652 526841 : 34.88% > rl78-elf 93675 40538 : 43.28% > rx-elf 279748 137284 : 49.07% > s390-linux-gnu 857009 316494 : 36.93% > sh-linux-gnu 2154337 806571 : 37.44% > sparc-linux-gnu 367682 169019 : 45.97% > spu-elf 341945 135629 : 39.66% > tilegx-elf 235480 112103 : 47.61% > tilepro-elf 246231 104137 : 42.29% > v850-elf 158028 72875 : 46.12% > vax-netbsdelf 85057 37578 : 44.18% > visium-elf 257148 103331 : 40.18% > x86_64-darwin 5514235 1721777 : 31.22% > xstormy16-elf 83456 46128 : 55.27% > xtensa-elf 52652 29880 : 56.75% > > Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-none-eabi. > Also tested by building the testsuite for each of the targets above > and making sure there were no assembly differences. Made sure that no > objects in spec2k6 changed for aarch64-linux-gnu (except for perlbench > perl.o and cactusADM datestamp.o, where the differences are timestamps). > OK to install? > > Thanks, > Richard > > PS. I've attached the new genrecog.c since the diff version is unreadable. > > > gcc/ > * Makefile.in (build/genrecog.o): Depend on inchash.h. > (build/genrecog$(build_exeext): Depend on build/hash-table.o and > build/inchash.o > * genrecog.c: Rewrite most of the code except for the third page. > > Index: gcc/Makefile.in > =================================================================== > --- gcc/Makefile.in 2015-04-27 10:42:57.783191573 +0100 > +++ gcc/Makefile.in 2015-04-27 10:43:42.878643078 +0100 > @@ -2527,7 +2527,8 @@ build/genpeep.o : genpeep.c $(RTL_BASE_H > build/genpreds.o : genpreds.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \ > coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h $(OBSTACK_H) > build/genrecog.o : genrecog.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \ > - coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h > + coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h \ > + $(HASH_TABLE_H) inchash.h > build/genhooks.o : genhooks.c $(TARGET_DEF) $(C_TARGET_DEF) \ > $(COMMON_TARGET_DEF) $(BCONFIG_H) $(SYSTEM_H) errors.h > build/genmddump.o : genmddump.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \ > @@ -2559,6 +2560,8 @@ genprog = $(genprogerr) check checksum c > # These programs need libs over and above what they get from the above list. > build/genautomata$(build_exeext) : BUILD_LIBS += -lm > > +build/genrecog$(build_exeext) : build/hash-table.o build/inchash.o > + > # For stage1 and when cross-compiling use the build libcpp which is > # built with NLS disabled. For stage2+ use the host library and > # its dependencies. > Hi Richard, I noticed that this patch caused ICE for gcc.target/arm/mmx-2.c on arm-none-linux-gnueabi. Could you please have a look at it?
The log message is as below, /projects/.../src/gcc/gcc/testsuite/gcc.target/arm/mmx-2.c: In function 'foo': /projects/.../src/gcc/gcc/testsuite/gcc.target/arm/mmx-2.c:166:1: error: unrecognizable insn: (insn 541 540 542 2 (set (reg:V4HI 512 [ D.4809 ]) (vec_merge:V4HI (vec_select:V4HI (reg:V4HI 510 [ D.4806 ]) (parallel [ (const_int 2 [0x2]) (const_int 0 [0]) (const_int 3 [0x3]) (const_int 1 [0x1]) ])) (vec_select:V4HI (reg:V4HI 511 [ D.4806 ]) (parallel [ (const_int 0 [0]) (const_int 2 [0x2]) (const_int 1 [0x1]) (const_int 3 [0x3]) ])) (const_int 5 [0x5]))) /projects/.../src/gcc/gcc/testsuite/gcc.target/arm/mmx-2.c:159 -1 (nil)) /projects/.../src/gcc/gcc/testsuite/gcc.target/arm/mmx-2.c:166:1: internal compiler error: in extract_insn, at recog.c:2341 0xa42d2a _fatal_insn(char const*, rtx_def const*, char const*, int, char const*) /projects/.../src/gcc/gcc/rtl-error.c:110 0xa42d59 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*) /projects/.../src/gcc/gcc/rtl-error.c:118 0xa15ff7 extract_insn(rtx_insn*) /projects/.../src/gcc/gcc/recog.c:2341 0x7ffb42 instantiate_virtual_regs_in_insn /projects/.../src/gcc/gcc/function.c:1598 0x7ffb42 instantiate_virtual_regs /projects/.../src/gcc/gcc/function.c:1966 0x7ffb42 execute /projects/.../src/gcc/gcc/function.c:2015 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <http://gcc.gnu.org/bugs.html> for instructions. GCC is configured with gcc/configure --target=arm-none-linux-gnueabi --prefix= --with-sysroot=... --enable-shared --disable-libsanitizer --disable-libssp --disable-libmudflap --with-plugin-ld=arm-none-linux-gnueabi-ld --enable-checking=yes --enable-languages=c,c++,fortran --with-gmp=... --with-mpfr=... --with-mpc=... --with-isl=... --with-cloog=... --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=softfp --with-arch=armv7-a Thanks, bin