[PATCH 0/4][v2] Implement smart multiple switch expansion algorithms

marxin Fri, 08 Jun 2018 06:14:42 -0700

Hello.

This is v2 of following patch series: 
https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00347.html
I've done some adjustments:


- patch series is split:
  * 1/4 - factoring out of switch conversion algorithm
  * 2/4 - jump table and bit test use clustering data structures, but operate
          on all cases
  * 3/4 - clustering is enabled for both algorithms
  * 4/4 - param change which makes reasonable sizes on binaries

- jump table density threshold has been changed to align with current trunk
- most of David's comments were resolved


I decided to change way how to calculate jump table and bit test benefit.
It uses the same calculation as we use now. Still I believe it's too high
number, factor of 10 basically means that expected grown would be 10x compared
to balanced tree expansion. It's caused by fact that cmp & tmp instruction
takes roughly 8B, which is size of one element in jump table.

There are statistics for parameter equal to 8.

1) GCC (-O2):

bloaty ./objdir2/gcc/cc1plus -- ./objdir3/gcc/cc1plus
     VM SIZE                      FILE SIZE
 ++++++++++++++ GROWING        ++++++++++++++
  +0.1% +11.3Ki .text          +11.3Ki  +0.1%
  [ = ]       0 .debug_info    +4.00Ki  +0.0%
  [ = ]       0 .debug_loc     +3.69Ki  +0.0%
  +0.1% +1.95Ki .eh_frame      +1.95Ki  +0.1%

 -------------- SHRINKING      --------------
  -2.6%  -195Ki .rodata         -195Ki  -2.6%
  [ = ]       0 .debug_line    -8.29Ki  -0.0%
  [ = ]       0 [Unmapped]     -2.11Ki -54.8%
  [ = ]       0 .debug_ranges  -2.08Ki  -0.0%
  [ = ]       0 .strtab           -691  -0.0%
  [ = ]       0 .symtab           -360  -0.0%
  [ = ]       0 .debug_abbrev     -167  -0.0%
  -0.0%     -80 .eh_frame_hdr      -80  -0.0%
  [ = ]       0 .debug_aranges     -16  -0.0%

  -0.6%  -181Ki TOTAL           -187Ki  -0.1%

Thus shrinks by 0.6%.

2) SPEC 2006 benchmarks (-O2): numbers are presented only if there's a change:

BENCHMARK: /tmp/before/astar_peak.amd64-m64-mine
BENCHMARK: /tmp/before/bwaves_peak.amd64-m64-mine
BENCHMARK: /tmp/before/bzip2_peak.amd64-m64-mine
BENCHMARK: /tmp/before/cactusADM_peak.amd64-m64-mine
  +0.0%     +16 .text             +16  +0.0%
  -0.4%    -320 .rodata          -320  -0.4%
  -0.0%    -312 TOTAL         -4.17Ki  -0.1%
BENCHMARK: /tmp/before/calculix_peak.amd64-m64-mine
  +0.1%    +184 .rodata          +184  +0.1%
  -0.0%     -16 .text             -16  -0.0%
  +0.0%    +112 TOTAL         +1.06Ki  +0.0%
BENCHMARK: /tmp/before/dealII_peak.amd64-m64-mine
  [ = ]       0 TOTAL           +16  +0.0%
BENCHMARK: /tmp/before/gamess_peak.amd64-m64-mine
  +0.0%    +480 .rodata        +480  +0.0%
  -0.0%    -112 .text          -112  -0.0%
  +0.0%    +352 TOTAL          -160  -0.0%
BENCHMARK: /tmp/before/gcc_peak.amd64-m64-mine
  -1.2% -9.78Ki .rodata       -9.78Ki  -1.2%
  -0.1% -1.44Ki .text         -1.44Ki  -0.1%
  -0.3% -11.3Ki TOTAL            +264  +0.0%
BENCHMARK: /tmp/before/GemsFDTD_peak.amd64-m64-mine
BENCHMARK: /tmp/before/gobmk_peak.amd64-m64-mine
  +0.0%     +64 .rodata           +64  +0.0%
  -0.0%     -16 .text             -16  -0.0%
  +0.0%     +64 TOTAL             -48  -0.0%
BENCHMARK: /tmp/before/gromacs_peak.amd64-m64-mine
  +0.0%     +80 .text           +80  +0.0%
  -0.6%    -512 .rodata        -512  -0.6%
  -0.0%    -384 TOTAL          +432  +0.0%
BENCHMARK: /tmp/before/h264ref_peak.amd64-m64-mine
  +0.5%    +384 .rodata        +384  +0.5%
  -0.0%     -96 .text           -96  -0.0%
  +0.0%    +288 TOTAL           -16  -0.0%
BENCHMARK: /tmp/before/hmmer_peak.amd64-m64-mine
  +2.6%    +704 .rodata          +704  +2.6%
  -0.1%    -288 .text            -288  -0.1%
  +0.1%    +416 TOTAL             +80  +0.0%
BENCHMARK: /tmp/before/lbm_peak.amd64-m64-mine
BENCHMARK: /tmp/before/leslie3d_peak.amd64-m64-mine
BENCHMARK: /tmp/before/libquantum_peak.amd64-m64-mine
   +26%    +336 .rodata        +336   +26%
  +0.6%    +176 .text          +176  +0.6%
  +1.2%    +512 TOTAL          -256  -0.1%
BENCHMARK: /tmp/before/mcf_peak.amd64-m64-mine
BENCHMARK: /tmp/before/milc_peak.amd64-m64-mine
BENCHMARK: /tmp/before/namd_peak.amd64-m64-mine
BENCHMARK: /tmp/before/omnetpp_peak.amd64-m64-mine
BENCHMARK: /tmp/before/perlbench_peak.amd64-m64-mine
  +4.5% +5.25Ki .rodata       +5.25Ki  +4.5%
  -0.2% -1.70Ki .text         -1.70Ki  -0.2%
  +0.3% +3.50Ki TOTAL         +2.41Ki  +0.0%
BENCHMARK: /tmp/before/povray_peak.amd64-m64-mine
   +20% +21.1Ki .rodata       +21.1Ki   +20%
  -0.8% -5.84Ki .text         -5.84Ki  -0.8%
  +1.4% +15.4Ki TOTAL         +19.6Ki  +0.3%
BENCHMARK: /tmp/before/sjeng_peak.amd64-m64-mine
  +0.7%    +192 .rodata        +192  +0.7%
  -0.1%     -80 .text           -80  -0.1%
  +0.0%    +128 TOTAL           -64  -0.0%
BENCHMARK: /tmp/before/soplex_peak.amd64-m64-mine
  +2.5%    +464 .rodata              +464  +2.5%
  -0.0%     -32 .text                 -32  -0.0%
  +0.1%    +444 TOTAL                -224  -0.0%
BENCHMARK: /tmp/before/specrand_peak.amd64-m64-mine
BENCHMARK: /tmp/before/sphinx_livepretend_peak.amd64-m64-mine
  +0.0%     +32 .text           +32  +0.0%
  -0.3%     -64 .rodata         -64  -0.3%
  -0.0%     -32 TOTAL           -48  -0.0%
BENCHMARK: /tmp/before/tonto_peak.amd64-m64-mine
  +0.2%    +928 .rodata          +928  +0.2%
  -0.0%    -176 .text            -176  -0.0%
  +0.0%    +760 TOTAL            +832  +0.0%
BENCHMARK: /tmp/before/wrf_peak.amd64-m64-mine
  +0.2%    +608 .rodata          +608  +0.2%
  -0.0%    -608 .text            -608  -0.0%
  +0.0%     +24 TOTAL            -272  -0.0%
BENCHMARK: /tmp/before/Xalan_peak.amd64-m64-mine
  +0.3% +2.06Ki .rodata           +2.06Ki  +0.3%
  -0.0%    -496 .text                -496  -0.0%
  +0.0% +1.60Ki TOTAL             +1.11Ki  +0.0%
BENCHMARK: /tmp/before/zeusmp_peak.amd64-m64-mine

Which gives us very similar numbers.

3) http://natsys-lab.blogspot.com/
https://github.com/natsys/blog

It's a HTTP request parser benchmark. There are some nice new jump tables
created:

http_ngx.c.167t.switchlower1:;; GIMPLE switch case clusters: 0 10 13 JT:97-105 

which is a switch that handles 'a' - 'z', plus 0, 10, 13. JT is created for
alphabet characters. I can also measure benchmark changes:

+--------------------------------+--------+--------+---------+
| https://github.com/natsys/blog |  base  | Gcc 9  |  ratio  |
+--------------------------------+--------+--------+---------+
| goto_big_header_line           | 244.22 | 202.62 | 82.97%  |
| goto_header_line               | 119.06 | 118.58 | 99.60%  |
| goto_opt_request_line          | 172.97 | 156.39 | 90.41%  |
| goto_request_line              | 201.47 | 159.29 | 79.06%  |
| hsm_header_line                | 283.14 | 282.16 | 99.65%  |
| ngx_big_header_line            | 755.11 | 683.98 | 90.58%  |
| ngx_header_line                |  84.85 |  84.46 | 99.54%  |
| ngx_lw_header_line             |  96.61 |  95.98 | 99.35%  |
| ngx_request_line               |  141.9 | 184.47 | 130.00% |
| tbl_big_header_line            | 414.59 | 359.58 | 86.73%  |
| tbl_header_line                | 206.87 | 205.71 | 99.44%  |
+--------------------------------+--------+--------+---------+

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Thoughts?
Martin

marxin (4):
  Transform switch_conversion into a class.
  Switch other switch expansion methods into classes.
  Enable clustering for switch statements.
  Change default for jump_table expansion ratio to 8.

 gcc/testsuite/gcc.dg/tree-ssa/vrp104.c |    2 +-
 gcc/tree-switch-conversion.c           | 2518 +++++++++++-------------
 gcc/tree-switch-conversion.h           |  845 ++++++++
 3 files changed, 1941 insertions(+), 1424 deletions(-)
 create mode 100644 gcc/tree-switch-conversion.h

-- 
2.17.0

[PATCH 0/4][v2] Implement smart multiple switch expansion algorithms

Reply via email to