On Jan 26, 2016, at 12:45 PM, Richard Biener <richard.guent...@gmail.com> wrote:
> The original reasoning was to inline only the fast path if it is known at 
> compile time and otherwise have a call. Exactly to avoid bloating callers 
> with inlined conditionals.

That’s part of it.  And generally, yes, we do that.  One other part is, from 
the comment:

         For variable-precision integers like wide_int, handle HWI
         and sub-HWI integers inline.  */

Before, we just did the shift directly; that _was_ the fast path.  We now guard 
the fast path with a check on the shift amount, because we want to ensure the 
shift is small enough to be well defined.  The old code was the fast path 
because almost all shifts ever executed are on HWI-sized or smaller values and 
shift by 63 bits or less.  If we make the check static only, then we push the 
existing fast path out into the subroutine call.  Since the condition is almost 
always true, and since the value being tested is in a register for the 
subroutine call anyway, the extra check should be fairly light (one instruction 
slot, zero execution time, zero dependencies, one slot of BHT pollution).  With 
34 call sites at roughly 4 bytes each, that should add around 136 bytes to the 
executable.  I think that space is paid for by the speed difference of the fast 
path and how often it hits.

But it would be nice to have someone who loves benchmarking compile times just 
tell us which one is faster.  :-)  Anyone who knows which form is faster is 
free to switch to the other one.  My confidence that I know which one is faster 
is small; I'm happy to defer to someone more confident than I am.
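To make the two forms under discussion concrete, here is a minimal C++ sketch, assuming a toy stand-in for wide_int. The names toy_wide_int, lshift_slow, lshift_one_limb, lshift_static_only and lshift_dynamic are hypothetical and are not GCC's wide-int.h API. The "static only" form inlines the fast path only when the compiler can prove the condition at compile time (via the GCC/Clang builtin __builtin_constant_p) and otherwise always calls; the runtime-check form tests the same condition on every call, so the common "fits in one HWI, shift below the precision" case never leaves the caller.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy stand-in for a variable-precision integer such as
// wide_int: up to 4 x 64 = 256 bits, least significant limb first.
// Invariant: bits at or above 'precision' are zero.
constexpr unsigned kMaxLimbs = 4;
struct toy_wide_int {
  unsigned precision;             // significant bits, 1..256
  std::uint64_t limbs[kMaxLimbs];
};

// Slow path: handles any precision and any shift amount, one bit at a
// time. A real implementation would define this out of line.
toy_wide_int lshift_slow(const toy_wide_int &x, unsigned shift) {
  assert(x.precision >= 1 && x.precision <= 64 * kMaxLimbs);
  toy_wide_int r{x.precision, {}};  // all limbs zero
  if (shift >= x.precision)
    return r;  // every bit is shifted out
  for (unsigned i = 0; i < x.precision - shift; ++i)
    if ((x.limbs[i / 64] >> (i % 64)) & 1)
      r.limbs[(i + shift) / 64] |= std::uint64_t{1} << ((i + shift) % 64);
  return r;
}

// Fast path: caller guarantees precision <= 64 and shift < precision,
// so the native shift is well defined. Bits pushed past 'precision'
// are masked off.
inline toy_wide_int lshift_one_limb(const toy_wide_int &x, unsigned shift) {
  std::uint64_t mask = x.precision == 64
                           ? ~std::uint64_t{0}
                           : (std::uint64_t{1} << x.precision) - 1;
  return toy_wide_int{x.precision, {(x.limbs[0] << shift) & mask}};
}

// Form 1 (static only): take the inline fast path only when the compiler
// can prove at compile time that it applies; otherwise always call.
inline toy_wide_int lshift_static_only(const toy_wide_int &x, unsigned shift) {
  bool fits = x.precision <= 64 && shift < x.precision;
  if (__builtin_constant_p(fits) && fits)
    return lshift_one_limb(x, shift);
  return lshift_slow(x, shift);
}

// Form 2 (runtime check): test the same condition at run time, costing a
// compare and branch at each call site but keeping the common case inline.
inline toy_wide_int lshift_dynamic(const toy_wide_int &x, unsigned shift) {
  if (x.precision <= 64 && shift < x.precision)
    return lshift_one_limb(x, shift);
  return lshift_slow(x, shift);
}
```

Both forms compute the same result; they differ only in how much code lands at each call site and whether the common case pays for a call, which is exactly the trade-off a compile-time benchmark would have to settle.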
