https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #21 from Roger Sayle <roger at nextmovesoftware dot com> ---
I completely agree that Richard Sandiford's patch is a much better solution,
but I'd like to counter the claims that the change originally proposed in
comment #8 is obviously universally bad.

Segher has proposed that object code size correlates with the quality of
"combine", so I thought that I'd point out that the original patch reduces code
size on the CSiBE benchmark on x86_64 when compiled with -Os (a net 19 bytes
over the files listed below).

#bench,file,before,after,delta,cumsum
bzip2-1.0.2,bzlib,11750,11756,6,6
cg_compiler_opensrc,memory,818,825,7,13
jpeg-6b,jcsample,2814,2804,-10,3
jpeg-6b,jcphuff,3606,3609,3,6
libpng-1.2.5,pngget,3798,3792,-6,0
linux-2.4.23-pre3-testplatform,fs/nfs/nfs2xdr,6070,6069,-1,-1
linux-2.4.23-pre3-testplatform,fs/nfs/nfs3xdr,8229,8225,-4,-5
linux-2.4.23-pre3-testplatform,fs/ext3/balloc,6769,6773,4,-1
linux-2.4.23-pre3-testplatform,fs/ext3/ialloc,6063,6064,1,0
linux-2.4.23-pre3-testplatform,fs/nfsd/nfsfh,5896,5889,-7,-7
linux-2.4.23-pre3-testplatform,fs/nfsd/nfs3xdr,6145,6143,-2,-9
linux-2.4.23-pre3-testplatform,fs/lockd/xdr,5924,5916,-8,-17
linux-2.4.23-pre3-testplatform,fs/lockd/xdr4,5883,5875,-8,-25
linux-2.4.23-pre3-testplatform,fs/inode,8227,8229,2,-23
linux-2.4.23-pre3-testplatform,mm/memory,8877,8880,3,-20
linux-2.4.23-pre3-testplatform,lib/zlib_deflate/deflate,6319,6315,-4,-24
linux-2.4.23-pre3-testplatform,net/ipv4/tcp_ipv4,17587,17588,1,-23
linux-2.4.23-pre3-testplatform,net/sunrpc/auth_unix,2155,2148,-7,-30
linux-2.4.23-pre3-testplatform,net/sunrpc/svcauth,948,947,-1,-31
linux-2.4.23-pre3-testplatform,net/sunrpc/xdr,4033,4043,10,-21
linux-2.4.23-pre3-testplatform,kernel/timer,5106,5108,2,-19
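
For anyone who wants to double-check the bookkeeping in a report like the one
above, here is a minimal sketch (not part of the original measurements) that
recomputes the delta and cumsum columns from the before/after sizes.  The
function name check_rows and the SAMPLE string are hypothetical; SAMPLE just
repeats the first five rows of the table.

```python
import csv
import io

# First five rows of the -Os table above, with its header line.
SAMPLE = """\
#bench,file,before,after,delta,cumsum
bzip2-1.0.2,bzlib,11750,11756,6,6
cg_compiler_opensrc,memory,818,825,7,13
jpeg-6b,jcsample,2814,2804,-10,3
jpeg-6b,jcphuff,3606,3609,3,6
libpng-1.2.5,pngget,3798,3792,-6,0
"""


def check_rows(text):
    """Return (rows whose delta/cumsum disagree with before/after, final total)."""
    mismatches = []
    running = 0
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the '#bench,...' header line
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if len(row) != 6:
            # A row with a missing column cannot be checked; report and skip it.
            mismatches.append(tuple(row[:2]))
            continue
        bench, name, before, after, delta, cumsum = row
        recomputed = int(after) - int(before)
        running += recomputed
        if recomputed != int(delta) or running != int(cumsum):
            mismatches.append((bench, name))
    return mismatches, running


if __name__ == "__main__":
    bad, total = check_rows(SAMPLE)
    print("mismatching rows:", bad)
    print("net size change (bytes):", total)
```

Rows with a missing column are reported rather than guessed at, so the running
total after such a row should be treated with caution.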

The story with -O2 is more complicated.  It does indeed increase code size, but
the effects are greatly inflated by jump alignment (notice that all but one of
the deltas in the report below are multiples of 16).  If the single
pathological OpenTCP-1.0.4 ip test (+160 bytes) is excluded, the size is
reduced over the remaining tests (145 - 160 = -15 bytes).

#bench,file,before,after,delta,cumsum
OpenTCP-1.0.4,dns/dns,1793,1825,32,32
OpenTCP-1.0.4,http/http_server,2691,2676,-15,17
OpenTCP-1.0.4,ip,3152,3312,160,177
OpenTCP-1.0.4,tcp,8823,8839,16,193
OpenTCP-1.0.4,udp,2147,2163,16,209
bzip2-1.0.2,compress,17779,17763,-16,193
jikespg-1.3,src/mkfirst,20023,20007,-16,177
jikespg-1.3,src/mkred,8993,16,193
jikespg-1.3,src/produce,14897,14961,-16,177
jikespg-1.3,src/remsp,10678,10694,16,193
jikespg-1.3,src/resolve,17542,17558,16,209
jpeg-6b,jchuff,6439,6423,-16,193
jpeg-6b,jcphuff,7921,7905,-16,177
jpeg-6b,jdmarker,9845,9893,48,225
jpeg-6b,jquant2,6785,6769,-16,209
jpeg-6b,wrtarga,1353,1369,16,225
jpeg-6b,wrbmp,2551,2567,16,241
libmspack,test/cabextract_md5,32951,32935,-16,225
libpng-1.2.5,pngget,4583,4567,-16,209
libpng-1.2.5,pngwutil,21647,21615,-32,177
libpng-1.2.5,pngrtran,26045,26109,64,241
libpng-1.2.5,pngwtran,2539,2555,16,257
linux-2.4.23-pre3-testplatform,fs/nfs/nfs3xdr,14038,14006,-32,225
linux-2.4.23-pre3-testplatform,fs/nfsd/nfs3xdr,13584,13568,-16,209
linux-2.4.23-pre3-testplatform,fs/lockd/xdr,9554,9538,-16,193
linux-2.4.23-pre3-testplatform,fs/lockd/xdr4,7903,7919,16,209
linux-2.4.23-pre3-testplatform,fs/buffer,22824,22840,16,225
linux-2.4.23-pre3-testplatform,mm/filemap,23872,23888,16,241
linux-2.4.23-pre3-testplatform,net/ipv4/ip_input,4189,4173,-16,225
linux-2.4.23-pre3-testplatform,net/ipv4/ip_fragment,7242,7226,-16,209
linux-2.4.23-pre3-testplatform,net/ipv4/ip_options,7664,7680,16,225
linux-2.4.23-pre3-testplatform,net/ipv4/ip_output,10956,10924,-32,193
linux-2.4.23-pre3-testplatform,net/ipv4/tcp_ipv4,22663,22679,16,209
linux-2.4.23-pre3-testplatform,net/ipv4/udp,10365,10349,-16,193
linux-2.4.23-pre3-testplatform,net/ipv4/icmp,8589,8573,-16,177
linux-2.4.23-pre3-testplatform,net/sunrpc/auth_unix,2782,2766,-16,161
linux-2.4.23-pre3-testplatform,net/sunrpc/svcauth,1172,1156,-16,145
linux-2.4.23-pre3-testplatform,drivers/char/raw,4860,4876,16,161
linux-2.4.23-pre3-testplatform,kernel/exit,5485,5469,-16,145
linux-2.4.23-pre3-testplatform,kernel/timer,7257,7273,16,161
lwip-0.5.3.preproc,src/core/ipv4/ip,1883,1899,16,177
lwip-0.5.3.preproc,src/core/tcp_input,5513,5497,-16,161
lwip-0.5.3.preproc,src/core/tcp_output,3290,3354,64,225
teem-1.6.0-src,src/gage/st,5248,5264,16,241
teem-1.6.0-src,src/nrrd/apply1D,10837,10789,-48,193
unrarlib-0.4.0,unrarlib/unrarlib,16682,16666,-16,177
zlib-1.1.4,deflate,8721,8689,-32,145
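
The alignment observation can be checked mechanically.  The following is only
a rough sketch: O2_DELTAS is the delta column of the -O2 report above copied
in row order, and the names summarize and OPENTCP_IP_INDEX are my own
(hypothetical) helpers, not anything from CSiBE.

```python
# Delta column of the -O2 report above, in row order (bytes).
O2_DELTAS = [
    32, -15, 160, 16, 16, -16, -16, 16, -16, 16, 16, -16, -16, 48, -16, 16,
    16, -16, -16, -32, 64, 16, -32, -16, -16, 16, 16, 16, -16, -16, 16, -32,
    16, -16, -16, -16, -16, 16, -16, 16, 16, -16, 64, 16, -48, -16, -32,
]

# Position of the OpenTCP-1.0.4,ip row (the +160 outlier) in the list above.
OPENTCP_IP_INDEX = 2


def summarize(deltas, align=16):
    """Return the row count, the number of deltas that are multiples of
    `align`, and the net size change."""
    aligned = sum(1 for d in deltas if d % align == 0)
    return {"rows": len(deltas), "aligned": aligned, "total": sum(deltas)}


if __name__ == "__main__":
    everything = summarize(O2_DELTAS)
    without_ip = summarize(
        [d for i, d in enumerate(O2_DELTAS) if i != OPENTCP_IP_INDEX]
    )
    print("all rows:          ", everything)
    print("without OpenTCP ip:", without_ip)
```

With the deltas as reported, this gives a net change of 145 bytes over 47 rows,
46 of which are multiples of 16, and -15 bytes once the OpenTCP ip row is left
out.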

Picking "lwip-0.5.3.preproc,src/core/tcp_output" as an example size regression,
the first difference in code is:

Before:
83 c2 05                add    $0x5,%edx
89 d3                   mov    %edx,%ebx
c1 e3 0c                shl    $0xc,%ebx

After:
c1 e2 0c                shl    $0xc,%edx
8d 9a 00 50 00 00       lea    0x5000(%rdx),%ebx

Notice that the size has increased by a byte (from 8 bytes to 9), but the new
sequence is now only two instructions compared to the original three.
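
To make the equivalence concrete, here is a small model of the two sequences
using 32-bit wrap-around arithmetic.  It is only a sketch: the function names
are hypothetical, it compares just the value left in %ebx (the two sequences
leave different values in %edx), and the byte strings are the encodings shown
in the disassembly above.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit register wrap-around

# Instruction encodings copied from the disassembly above.
BEFORE_BYTES = ["83 c2 05", "89 d3", "c1 e3 0c"]
AFTER_BYTES = ["c1 e2 0c", "8d 9a 00 50 00 00"]


def before_seq(edx):
    """add $0x5,%edx ; mov %edx,%ebx ; shl $0xc,%ebx"""
    edx = (edx + 5) & MASK32
    ebx = edx
    ebx = (ebx << 12) & MASK32
    return ebx


def after_seq(edx):
    """shl $0xc,%edx ; lea 0x5000(%rdx),%ebx"""
    edx = (edx << 12) & MASK32       # 32-bit shift also zero-extends into %rdx
    ebx = (edx + 0x5000) & MASK32    # lea result is truncated to 32 bits
    return ebx


def encoded_size(encodings):
    """Total number of bytes in a list of space-separated hex encodings."""
    return sum(len(e.split()) for e in encodings)


if __name__ == "__main__":
    for x in (0, 1, 0x12345, 0x000FFFFB, 0xFFFFFFFF):
        assert before_seq(x) == after_seq(x)
    print("before:", len(BEFORE_BYTES), "insns,", encoded_size(BEFORE_BYTES), "bytes")
    print("after: ", len(AFTER_BYTES), "insns,", encoded_size(AFTER_BYTES), "bytes")
```

The identity being used is (x + 5) << 12 == (x << 12) + 0x5000 modulo 2^32,
since 5 << 12 is 0x5000; the extra byte comes from the 32-bit displacement in
the lea.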

Let's just say the situation is complicated (comparing code size when not
optimizing for code size may be misleading), but importantly it is possible to
do better than the current expand_compound_operation/make_compound_operation.
