This series implements some basic machine-independent optimizations.  They
simplify code and allow liveness analysis to do its work better.

Suppose we have the following ARM code:

 movw    r12, #0xb6db
 movt    r12, #0xdb6d

In TCG before optimizations we'll have:

 movi_i32 tmp8,$0xb6db
 mov_i32 r12,tmp8
 mov_i32 tmp8,r12
 ext16u_i32 tmp8,tmp8
 movi_i32 tmp9,$0xdb6d0000
 or_i32 tmp8,tmp8,tmp9
 mov_i32 r12,tmp8

And after optimizations we'll have this:

 movi_i32 r12,$0xdb6db6db
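
To make the example above concrete, here is a minimal sketch of constant and
copy propagation replayed over those seven ops.  The TempState structure and
the op_* helpers are hypothetical and much simpler than what tcg/optimize.c
does; the sketch only tracks which temps hold known constants.  Once r12 is
known to be constant, the intermediate ops are dead, and removing them is
assumed to be left to liveness analysis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-temp state: is the value a known constant? */
typedef struct {
    bool is_const;
    uint32_t val;
} TempState;

enum { R12, TMP8, TMP9, NB_TEMPS };

static TempState temps[NB_TEMPS];

/* movi: the destination becomes a known constant. */
static void op_movi(int dst, uint32_t val)
{
    assert(dst < NB_TEMPS);
    temps[dst].is_const = true;
    temps[dst].val = val;
}

/* mov: the destination inherits whatever is known about the source. */
static void op_mov(int dst, int src)
{
    temps[dst] = temps[src];
}

/* ext16u: fold only when the input is a known constant. */
static void op_ext16u(int dst, int src)
{
    TempState s = temps[src];
    temps[dst].is_const = s.is_const;
    temps[dst].val = (uint16_t)s.val;
}

/* or: fold only when both inputs are known constants. */
static void op_or(int dst, int a, int b)
{
    TempState x = temps[a], y = temps[b];
    temps[dst].is_const = x.is_const && y.is_const;
    temps[dst].val = x.val | y.val;
}

/* Replay the unoptimized op sequence; report what is known about r12. */
static bool fold_example(uint32_t *result)
{
    op_movi(TMP8, 0xb6db);
    op_mov(R12, TMP8);
    op_mov(TMP8, R12);
    op_ext16u(TMP8, TMP8);
    op_movi(TMP9, 0xdb6d0000);
    op_or(TMP8, TMP8, TMP9);
    op_mov(R12, TMP8);
    *result = temps[R12].val;
    return temps[R12].is_const;
}
```

Calling fold_example() reports that r12 is a known constant equal to
0xdb6db6db, which is why the whole sequence can collapse to a single movi_i32.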

Here are performance evaluation results for the SPEC CPU2000 integer tests in
user-mode emulation on an x86_64 host.  Each test was run 5 times on the
reference data set.  The tables below show the runtime in seconds for each run.

ARM guest without optimizations:
Test name       #1       #2       #3       #4       #5    Median
164.gzip    1408.891 1402.323 1407.623 1404.955 1405.396 1405.396
175.vpr     1245.31  1248.758 1247.936 1248.534 1247.534 1247.936
176.gcc      912.561  809.481  847.057  912.636  912.544  912.544
181.mcf      198.384  197.841  199.127  197.976  197.29   197.976
186.crafty  1545.881 1546.051 1546.002 1545.927 1545.945 1545.945
197.parser  3779.954 3779.878 3779.79  3779.94  3779.88  3779.88
252.eon     2563.168 2776.152 2776.395 2776.577 2776.202 2776.202
253.perlbmk 2591.781 2504.078 2507.07  2591.337 2463.401 2507.07
256.bzip2   1306.197 1304.639 1184.853 1305.141 1305.606 1305.141
300.twolf   2918.984 2918.926 2918.93  2918.97  2918.914 2918.93

ARM guest with optimizations:
Test name       #1       #2       #3       #4       #5    Median    Gain
164.gzip    1401.198 1376.337 1401.117 1401.23  1401.246 1401.198   0.30%
175.vpr     1247.964 1151.468 1247.76  1154.419 1242.017 1242.017   0.47%
176.gcc      896.882  918.546  918.297  851.465  918.39   918.297  -0.63%
181.mcf      198.19   197.399  198.421  198.663  198.312  198.312  -0.17%
186.crafty  1520.425 1520.362 1520.477 1520.445 1520.957 1520.445   1.65%
197.parser  3770.943 3770.927 3770.578 3771.048 3770.904 3770.927   0.24%
252.eon     2752.371 2752.111 2752.005 2752.214 2752.109 2752.111   0.87%
253.perlbmk 2577.462 2578.588 2493.567 2578.571 2578.318 2578.318  -2.84%
256.bzip2   1296.198 1271.128 1296.044 1296.321 1296.147 1296.147   0.69%
300.twolf   2888.984 2889.023 2889.225 2889.039 2889.05  2889.039   1.02%
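
For reference, the Median and Gain columns can be reproduced with a sketch
like the one below.  It assumes Gain is the relative reduction of the median
runtime, (median without - median with) / median without, which is consistent
with the ARM rows above.  The function names are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NB_RUNS 5

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of five runs: sort a copy and take the middle element. */
static double median5(const double runs[NB_RUNS])
{
    double sorted[NB_RUNS];
    memcpy(sorted, runs, sizeof(sorted));
    qsort(sorted, NB_RUNS, sizeof(sorted[0]), cmp_double);
    return sorted[NB_RUNS / 2];
}

/* Assumed definition of Gain: relative runtime reduction, in percent. */
static double gain_percent(double base_median, double opt_median)
{
    return (base_median - opt_median) / base_median * 100.0;
}
```

For 164.gzip on the ARM guest this gives medians of 1405.396 and 1401.198
seconds and a gain of about 0.30%.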


x86_64 guest without optimizations:
Test name       #1       #2       #3       #4       #5    Median
164.gzip     857.654  857.646  857.678  798.119  857.675  857.654
175.vpr      959.265  959.207  959.185  959.461  959.332  959.265
176.gcc      625.722  637.257  646.638  646.614  646.56   646.56
181.mcf      221.666  220.194  220.079  219.868  221.5    220.194
186.crafty  1129.531 1129.739 1129.573 1129.588 1129.624 1129.588
197.parser  1809.517 1809.516 1809.386 1809.477 1809.427 1809.477
253.perlbmk 1774.944 1776.046 1769.865 1774.052 1775.236 1774.944
254.gap     1061.033 1061.158 1061.064 1061.047 1061.01  1061.047
255.vortex  1871.261 1914.144 1914.057 1914.086 1914.127 1914.086
256.bzip2    918.916 1011.828 1011.819 1012.11  1011.932 1011.828
300.twolf   1332.797 1330.56  1330.687 1330.917 1330.602 1330.687

x86_64 guest with optimizations:
Test name       #1       #2       #3       #4       #5    Median    Gain
164.gzip     806.198  854.159  854.184  854.168  854.187  854.168   0.41%
175.vpr      955.905  950.86   955.876  876.397  955.957  955.876   1.82%
176.gcc      641.663  640.189  641.57   641.552  641.514  641.552   0.03%
181.mcf      217.619  218.627  218.699  217.977  216.955  217.977   1.18%
186.crafty  1123.909 1123.852 1123.917 1123.781 1123.805 1123.852   0.51%
197.parser  1813.94  1814.643 1815.286 1814.445 1813.72  1814.445  -0.27%
253.perlbmk 1791.536 1795.642 1793.0   1797.486 1791.401 1793.0    -1.02%
254.gap     1070.605 1070.216 1070.637 1070.168 1070.491 1070.491  -0.89%
255.vortex  1918.764 1918.573 1917.411 1918.287 1918.735 1918.573  -0.23%
256.bzip2   1017.179 1017.083 1017.283 1016.913 1017.189 1017.179  -0.53%
300.twolf   1321.072 1321.109 1321.019 1321.072 1321.004 1321.072   0.72%

254.gap and 255.vortex on the ARM guest and 252.eon on the x86_64 guest do not
work under QEMU for some unrelated reason, so they are missing from the tables.

Changes:
v1 -> v2
 - State and Vals arrays merged into an array of structures.
 - Added reference counting of temp's copies. This helps to reset temp's state
   faster in most cases.
 - Do not make copy propagation through operations with TCG_OPF_CALL_CLOBBER or
   TCG_OPF_SIDE_EFFECTS flag.
 - Split some expression simplifications into an independent switch.
 - Let compiler handle signed shifts and sign/zero extends in its
   implementation-defined way.

v2 -> v3
 - Elements of an equivalence class are placed in a doubly linked circular list
   so it's easier to choose a new representative.
 - CASE_OP_32_64 macro is used to reduce the number of ifdefs. Checkpatch is not
   happy about this change but I do not think spaces would be appropriate here.
 - Some constraints during copy propagation are relaxed.
 - Functions tcg_opt_gen_mov and tcg_opt_gen_movi are introduced to reduce code
   duplication.
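
To illustrate the first v3 item, here is a rough sketch of keeping the copies
of a value in a doubly linked circular list.  The CopyLink structure, the
NB_TEMPS constant and all function names are hypothetical and are not taken
from tcg/optimize.c.  The point is that when the representative of a class is
overwritten, any neighbour in the ring can take its place, and unlinking is
constant time.

```c
#include <assert.h>

#define NB_TEMPS 8   /* hypothetical number of temps */

/* Each temp is linked to the next and previous copy of the same value. */
typedef struct {
    int next_copy;
    int prev_copy;
} CopyLink;

static CopyLink links[NB_TEMPS];

/* Every temp starts as a ring of one: it is a copy only of itself. */
static void reset_all(void)
{
    for (int i = 0; i < NB_TEMPS; i++) {
        links[i].next_copy = i;
        links[i].prev_copy = i;
    }
}

/* Record "dst = src": insert dst into src's ring, right after src. */
static void add_copy(int dst, int src)
{
    assert(links[dst].next_copy == dst);   /* dst must be unlinked */
    links[dst].next_copy = links[src].next_copy;
    links[dst].prev_copy = src;
    links[links[src].next_copy].prev_copy = dst;
    links[src].next_copy = dst;
}

/* A neighbour that can represent the class once t is gone, or -1. */
static int new_representative(int t)
{
    return links[t].next_copy != t ? links[t].next_copy : -1;
}

/* t is overwritten: unlink it from its ring in constant time. */
static void reset_temp(int t)
{
    links[links[t].next_copy].prev_copy = links[t].prev_copy;
    links[links[t].prev_copy].next_copy = links[t].next_copy;
    links[t].next_copy = t;
    links[t].prev_copy = t;
}

/* Walk a's ring and check whether b is on it. */
static int are_copies(int a, int b)
{
    for (int i = links[a].next_copy; i != a; i = links[i].next_copy) {
        if (i == b) {
            return 1;
        }
    }
    return a == b;
}
```

With temps 1 and 2 recorded as copies of temp 0, resetting temp 0 leaves 1 and
2 linked to each other, so either of them can serve as the new representative.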

Kirill Batuzov (6):
  Add TCG optimizations stub
  Add copy and constant propagation.
  Do constant folding for basic arithmetic operations.
  Do constant folding for boolean operations.
  Do constant folding for shift operations.
  Do constant folding for unary operations.

 Makefile.target |    2 +-
 tcg/optimize.c  |  568 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.c       |    6 +
 tcg/tcg.h       |    3 +
 4 files changed, 578 insertions(+), 1 deletions(-)
 create mode 100644 tcg/optimize.c

-- 
1.7.4.1

