https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67153

--- Comment #22 from ncm at cantrip dot org ---
(In reply to Nathan Kurz from comment #21)
> My current belief is
> that everything here is expected behavior, and there is no bug with either
> the compiler or processor.  
> 
> The code spends most of its time in a tight loop that depends on runtime
> input, and the compiler doesn't have any reason to know which branch is more
> likely.   The addition of "count" changes the heuristic in recent compilers,
> and by chance, changes it for the worse.  

I am going to disagree, carefully.  It seems clear, in any case, that Haswell
is off the hook.

1. As a correction: *without* the count takes twice as long to run as with,
   or when using bitset<>.

2. As a heuristic, favoring a branch to skip a biggish loop body evidently 
   has much less downside than favoring the branch into it.  Maybe Gcc
   already has such a heuristic, and the addition of 7 conditional 
   increments in the loop, or whatever overhead bitset<> adds, was enough
   to push it over?  

Westmere runs both instruction sequences (with and without __builtin_expect)
the same.  Maybe on Westmere the loop takes two cycles regardless of code 
placement, and Gcc is (still) tuned for the older timings?

Reply via email to