https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65483
            Bug ID: 65483
           Summary: bzip2 bsR/bsW should be auto-inlined
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org

bzip2 contains:

INLINE UInt32 bsR ( Int32 n )
{
   UInt32 v;
   bsNEEDR ( n );
   v = (bsBuff >> (bsLive-n)) & ((1 << n)-1);
   bsLive -= n;
   return v;
}

and

INLINE void bsW ( Int32 n, UInt32 v )
{
   bsNEEDW ( n );
   bsBuff |= (v << (32 - bsLive - n));
   bsLive += n;
}

which should be inlined. INLINE is, however, defined to nothing for SPEC.
The catch is that we instead inline fgetc/fputc into the functions here:

#define bsNEEDR(nz)                           \
{                                             \
   while (bsLive < nz) {                      \
      Int32 zzi = fgetc ( bsStream );         \
      if (zzi == EOF) compressedStreamEOF();  \
      bsBuff = (bsBuff << 8) | (zzi & 0xffL); \
      bsLive += 8;                            \
   }                                          \
}

/*---------------------------------------------*/
#define bsNEEDW(nz)                           \
{                                             \
   while (bsLive >= 8) {                      \
      fputc ( (UChar)(bsBuff >> 24),          \
               bsStream );                    \
      bsBuff <<= 8;                           \
      bsLive -= 8;                            \
      bytesOut++;                             \
   }                                          \
}

Considering spec_getc/285 with 33 size
 to be inlined into bsR/98 in unknown:-1
 Estimated badness is -21.814074, frequency 21.04.
    Badness calculation for bsR/98 -> spec_getc/285
      size growth 27, time 22 inline hints: cross_module big_speedup
      -10.907037: guessed profile. frequency 21.035000, count 0 caller count 0 time w/o inlining 1063.840001, time w inlining 769.350000 overall growth 0 (current) 0 (original)
      Adjusted by hints -21.814074
                Accounting size:20.00, time:304.69 on predicate:(true)
...
 Inlined into bsR which now has time 767 and size 55,net change of +27.

This makes bsR reach the inline-insns-auto limit, so bsR itself is no longer
auto-inlined into its callers.
bsR is estimated as:

Inline summary for bsR/98 inlinable
  self time:       559
  global time:     0
  self size:       28
  global size:     0
  min size:       0
  self stack:      0
  global stack:    0
    size:21.000000, time:304.328000, predicate:(true)
    size:3.000000, time:1.982000, predicate:(not inlined)
  calls:
    compressedStreamEOF/143 function not considered for inlining
      loop depth: 0 freq:   8 size: 1 time: 10 callee size:12 stack: 0
    spec_getc/153 function body not available
      loop depth: 1 freq:21035 size: 3 time: 12 callee size: 0 stack: 0

The spec_getc is implemented as:

int spec_getc (int fd) {
    int rc = 0;
    debug1(4,"spec_getc: %d = ", fd);
    if (fd > MAX_SPEC_FD) {
        fprintf(stderr, "spec_read: fd=%d, > MAX_SPEC_FD!\n", fd);
        exit (1);
    }
    if (spec_fd[fd].pos >= spec_fd[fd].len) {
        debug(4,"EOF\n");
        return EOF;
    }
    rc = spec_fd[fd].buf[spec_fd[fd].pos++];
    debug1(4,"%d\n", rc);
    return rc;
}

We, however, split out the error handling into spec_getc.part and get:

Inline summary for spec_getc/38 inlinable
  self time:       24
  global time:     0
  self size:       33
  global size:     0
  min size:       0
  self stack:      0
  global stack:    0
    size:20.000000, time:14.485000, predicate:(true)
    size:3.000000, time:1.998000, predicate:(not inlined)

which makes it quite a good inline candidate, especially because the call
appears within what we consider an internal loop of bsR.

Apparently clang gets lucky here because it inlines more at compile time and
spec_getc is housed in a different translation unit.
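
For anyone wanting to poke at this outside SPEC, here is a minimal sketch of the same shape, not the SPEC or bzip2 sources. It assumes a hypothetical in-memory reader (struct mem_stream, mem_getc, bad_stream are names made up for illustration) standing in for spec_getc, with the error path moved out of line by hand, roughly as the function splitting that produces spec_getc.part does automatically. The EOF handling in bsR is simplified: the original calls compressedStreamEOF().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical in-memory stream standing in for SPEC's spec_fd[] table. */
struct mem_stream {
    const unsigned char *buf;
    size_t pos;
    size_t len;
};

/* Cold error path, kept out of line by hand. */
__attribute__((noinline, cold))
static void bad_stream(void)
{
    fprintf(stderr, "mem_getc: null stream\n");
    exit(1);
}

/* Hot path: a bounds check and one load, so it looks cheap to inline. */
static int mem_getc(struct mem_stream *s)
{
    if (s == NULL)
        bad_stream();
    if (s->pos >= s->len)
        return EOF;
    return s->buf[s->pos++];
}

static struct mem_stream *bsStream;
static uint32_t bsBuff;
static int bsLive;

/* Same shape as bzip2's bsR: a refill loop, then extraction of n bits.
   No inline keyword, mirroring INLINE being defined to nothing. */
static uint32_t bsR(int n)
{
    assert(n > 0 && n <= 24);
    while (bsLive < n) {
        int zzi = mem_getc(bsStream);
        if (zzi == EOF)
            zzi = 0; /* simplification: pad with zero bits at end of input */
        bsBuff = (bsBuff << 8) | ((uint32_t)zzi & 0xffu);
        bsLive += 8;
    }
    uint32_t v = (bsBuff >> (bsLive - n)) & ((1u << n) - 1u);
    bsLive -= n;
    return v;
}
```

Compiling a file like this with -O2 -fdump-ipa-inline should produce an inliner dump in the same format as the excerpts above, which lets one check whether the reader gets inlined into bsR and whether bsR is still considered small enough for its own callers. The sizes and decisions will likely differ from the SPEC numbers.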