https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65483

            Bug ID: 65483
           Summary: bzip2 bsR/bsW should be auto-inlined
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org

bzip2 contains:
INLINE UInt32 bsR ( Int32 n )
{
   UInt32 v;
   bsNEEDR ( n );
   v = (bsBuff >> (bsLive-n)) & ((1 << n)-1);
   bsLive -= n;
   return v;
}

and

INLINE void bsW ( Int32 n, UInt32 v )
{
   bsNEEDW ( n );
   bsBuff |= (v << (32 - bsLive - n));
   bsLive += n;
}

Both should be inlined.  INLINE is, however, defined to nothing for SPEC, so
the decision is left to the auto-inliner.  The catch is that we instead inline
fgetc/fputc into these functions, through the following macros:

#define bsNEEDR(nz)                           \
{                                             \
   while (bsLive < nz) {                      \
      Int32 zzi = fgetc ( bsStream );         \
      if (zzi == EOF) compressedStreamEOF();  \
      bsBuff = (bsBuff << 8) | (zzi & 0xffL); \
      bsLive += 8;                            \
   }                                          \
}


/*---------------------------------------------*/
#define bsNEEDW(nz)                           \
{                                             \
   while (bsLive >= 8) {                      \
      fputc ( (UChar)(bsBuff >> 24),          \
               bsStream );                    \
      bsBuff <<= 8;                           \
      bsLive -= 8;                            \
      bytesOut++;                             \
   }                                          \
}
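
For readers without the bzip2 sources at hand, here is a minimal,
self-contained sketch of the same shape as bsR with bsNEEDR expanded.  It is
not bzip2's code: BitReader, mem_getc and bit_read are hypothetical names, the
input is an in-memory buffer instead of bsStream, and end of input is padded
with zero bits where the original calls compressedStreamEOF().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for bsStream plus bsBuff/bsLive; not bzip2's layout. */
typedef struct {
    const uint8_t *data;  /* input bytes */
    size_t len;           /* number of input bytes */
    size_t pos;           /* index of the next byte to fetch */
    uint32_t buff;        /* plays the role of bsBuff */
    int live;             /* plays the role of bsLive */
} BitReader;

/* Plays the role of fgetc/spec_getc: returns -1 at end of input. */
static int mem_getc(BitReader *br)
{
    if (br->pos >= br->len)
        return -1;
    return br->data[br->pos++];
}

/* Same shape as bsR: refill loop (bsNEEDR), then shift and mask.
   Assumes 1 <= n <= 24 so the 32-bit buffer never drops pending bits. */
static uint32_t bit_read(BitReader *br, int n)
{
    assert(n >= 1 && n <= 24);
    while (br->live < n) {
        int c = mem_getc(br);
        if (c < 0)
            c = 0;  /* the original calls compressedStreamEOF() here */
        br->buff = (br->buff << 8) | (uint32_t)(c & 0xff);
        br->live += 8;
    }
    uint32_t v = (br->buff >> (br->live - n)) & ((1u << n) - 1u);
    br->live -= n;
    return v;
}
```

The body outside the loop is only a shift, a mask and a subtraction, which is
why the function is a natural inline candidate until the byte-fetch call in
the loop is itself inlined and grows it.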

Considering spec_getc/285 with 33 size
 to be inlined into bsR/98 in unknown:-1
 Estimated badness is -21.814074, frequency 21.04.
    Badness calculation for bsR/98 -> spec_getc/285
      size growth 27, time 22 inline hints: cross_module big_speedup
      -10.907037: guessed profile. frequency 21.035000, count 0 caller count 0
        time w/o inlining 1063.840001, time w inlining 769.350000 overall growth 0
        (current) 0 (original)
      Adjusted by hints -21.814074
                Accounting size:20.00, time:304.69 on predicate:(true)
...
 Inlined into bsR which now has time 767 and size 55,net change of +27.

This makes bsR reach the inline-insns-auto limit, so bsR itself is no longer
auto-inlined into its callers.

bsR is estimated as:

Inline summary for bsR/98 inlinable
  self time:       559
  global time:     0
  self size:       28
  global size:     0
  min size:       0
  self stack:      0
  global stack:    0
    size:21.000000, time:304.328000, predicate:(true)
    size:3.000000, time:1.982000, predicate:(not inlined)
  calls:
    compressedStreamEOF/143 function not considered for inlining
      loop depth: 0 freq:   8 size: 1 time: 10 callee size:12 stack: 0
    spec_getc/153 function body not available
      loop depth: 1 freq:21035 size: 3 time: 12 callee size: 0 stack: 0
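
The size and time bookkeeping in the two dumps above can be checked by hand.
The sketch below only restates that arithmetic with figures copied from the
dumps; the helper names are illustrative and are not GCC internals.

```c
/* Figures come from the dumps above; the helper names are illustrative only. */

/* Caller size after inlining = caller self size + reported size growth. */
static int size_after_inlining(int self_size, int growth)
{
    return self_size + growth;
}

/* Estimated time saved = time without inlining - time with inlining. */
static double time_saved(double t_without, double t_with)
{
    return t_without - t_with;
}
```

With the reported values, size_after_inlining(28, 27) gives 55, matching
"size 55, net change of +27", and time_saved(1063.840001, 769.350000) is
about 294.49.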

spec_getc is implemented as:

int spec_getc (int fd) {
    int rc = 0;
    debug1(4,"spec_getc: %d = ", fd);
    if (fd > MAX_SPEC_FD) {
        fprintf(stderr, "spec_read: fd=%d, > MAX_SPEC_FD!\n", fd);
        exit (1);
    }
    if (spec_fd[fd].pos >= spec_fd[fd].len) {
        debug(4,"EOF\n");
        return EOF;
    }
    rc = spec_fd[fd].buf[spec_fd[fd].pos++];
    debug1(4,"%d\n", rc);
    return rc;
}

We, however, split out the error handling into spec_getc.part and get:

Inline summary for spec_getc/38 inlinable
  self time:       24
  global time:     0
  self size:       33
  global size:     0
  min size:       0
  self stack:      0
  global stack:    0
    size:20.000000, time:14.485000, predicate:(true)
    size:3.000000, time:1.998000, predicate:(not inlined)

which makes it quite a good inline candidate, especially because the call
appears within what we consider an internal loop of bsR.
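
As a hand-written approximation of that split (the compiler does this on its
internal representation, not on source), the result looks roughly like the
GNU C below.  The names spec_getc_hot and spec_getc_cold, the value of
MAX_SPEC_FD and the layout of spec_fd are assumptions made for illustration,
and the debug calls are omitted.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_SPEC_FD 3  /* assumed value, for illustration only */

/* Assumed layout; only the fields used by spec_getc are modelled. */
struct spec_fd_t {
    int pos;
    int len;
    const unsigned char *buf;
};
static struct spec_fd_t spec_fd[MAX_SPEC_FD + 1];

/* Cold part: the error handling moved out of the hot path; never returns. */
__attribute__((noinline, cold, noreturn))
static void spec_getc_cold(int fd)
{
    fprintf(stderr, "spec_read: fd=%d, > MAX_SPEC_FD!\n", fd);
    exit(1);
}

/* Hot part: what remains is small, so it looks cheap to inline into a loop. */
static int spec_getc_hot(int fd)
{
    if (fd > MAX_SPEC_FD)
        spec_getc_cold(fd);
    if (spec_fd[fd].pos >= spec_fd[fd].len)
        return EOF;
    return spec_fd[fd].buf[spec_fd[fd].pos++];
}
```

Only the hot part is then weighed as an inline candidate, which is how a
function with an fprintf/exit path ends up with an estimated size of 33.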

Apparently clang gets lucky here because it inlines more at compile time and
spec_getc is housed in a different translation unit, so it is not available
for inlining into bsR at that point.
