https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398

--- Comment #27 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Wilco from comment #13)
> So to add some real numbers to the discussion, the average number of
> iterations is 4.31. Frequency stats (16 includes all iterations > 16 too):
> 
> 1: 29.0
> 2: 4.2
> 3: 1.0
> 4: 36.7
> 5: 8.7
> 6: 3.4
> 7: 3.0
> 8: 2.6
> 9: 2.1
> 10: 1.9
> 11: 1.6
> 12: 1.2
> 13: 0.9
> 14: 0.8
> 15: 0.7
> 16: 2.1
> 

I found one interesting thing:
If wide reads are used for runs of more than 16 iterations, performance
improves significantly (>18%) for xz_r in SPEC.
This means that although runs of more than 16 iterations are infrequent, they
still account for a large part of the runtime.

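if (len_limit - len > 16)
    {
      /* Long run: compare sizeof(TYPEE) bytes per iteration.  */
      for(++len; len + sizeof(TYPEE) <= len_limit; len += sizeof(TYPEE))
        {
          long long a = *((TYPEE*)(cur+len));
          long long b = *((TYPEE*)(pb+len));
          if (a != b) {
            /* The lowest differing bit locates the first differing byte
               (little-endian).  */
            int lz = __builtin_ctzll(a ^ b);
            len += lz / 8;
            goto found;
          }
        }
      /* Byte-wise tail for what is left after the wide loop.  */
      for (;len != len_limit; ++len)
        if (pb[len] != cur[len])
          break;
    found:;
    }
else
 { xxxx original loop}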
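
For anyone who wants to try the wide-read idea outside of xz, here is a minimal
standalone sketch. It is not the code I benchmarked: match_len is a
hypothetical helper name, the loads go through memcpy to avoid unaligned and
aliasing problems, and it assumes a little-endian target, a compiler that
provides __builtin_ctzll, and len <= limit on entry.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of xz or GCC: returns the length of the
   common prefix of a and b, starting the scan at len and stopping at limit.
   Assumes little-endian byte order and len <= limit.  */
static uint32_t
match_len (const uint8_t *a, const uint8_t *b, uint32_t len, uint32_t limit)
{
  /* Compare eight bytes per iteration while a full word still fits.  */
  while (limit - len >= sizeof (uint64_t))
    {
      uint64_t x, y;
      memcpy (&x, a + len, sizeof x);   /* unaligned-safe load */
      memcpy (&y, b + len, sizeof y);
      if (x != y)
        /* On little-endian, the lowest set bit of x ^ y lies in the
           first differing byte.  */
        return len + (uint32_t) (__builtin_ctzll (x ^ y) / 8);
      len += sizeof (uint64_t);
    }

  /* Byte-wise tail for the remaining (fewer than eight) bytes.  */
  while (len != limit && a[len] == b[len])
    ++len;
  return len;
}
```

On a big-endian target the first differing byte would sit at the other end of
the word, so a leading-zero count would be needed instead of
__builtin_ctzll.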
