https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
--- Comment #27 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Wilco from comment #13)
> So to add some real numbers to the discussion, the average number of
> iterations is 4.31. Frequency stats (16 includes all iterations > 16 too):
>
> 1: 29.0
> 2: 4.2
> 3: 1.0
> 4: 36.7
> 5: 8.7
> 6: 3.4
> 7: 3.0
> 8: 2.6
> 9: 2.1
> 10: 1.9
> 11: 1.6
> 12: 1.2
> 13: 0.9
> 14: 0.8
> 15: 0.7
> 16: 2.1

I found one interesting thing: if wide reads are used for the runs with more
than 16 iterations, performance improves significantly (>18%) for xz_r in SPEC.
This means that although runs of more than 16 iterations are infrequent, they
still account for a large part of the runtime.

if (len_limit - len > 16)
  {
    /* Compare sizeof(TYPEE) bytes per iteration.  */
    for (++len; len + sizeof(TYPEE) <= len_limit; len += sizeof(TYPEE))
      {
        long long a = *((TYPEE*)(cur+len));
        long long b = *((TYPEE*)(pb+len));
        if (a != b)
          {
            /* The lowest set bit of a ^ b lies in the first differing
               byte (assuming little-endian byte order).  */
            int lz = __builtin_ctzll(a ^ b);
            len += lz / 8;
            goto found;
          }
      }
    /* Byte-wise tail for what is left after the wide loop.  */
    for (; len != len_limit; ++len)
      if (pb[len] != cur[len])
        break;
  found:;
  }
else
  {
    xxxx original loop
  }
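
For anyone who wants to try the wide-read idea outside of xz, here is a minimal self-contained sketch of the same technique. It is not the code that was benchmarked above: the helper name match_len_wide is made up, the word type is fixed to uint64_t instead of TYPEE, the loads go through memcpy to avoid unaligned-access and aliasing problems, and the scan starts at len itself rather than at len + 1. It assumes GCC or Clang for the __builtin_ctzll/__builtin_clzll builtins and the __BYTE_ORDER__ macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from xz): return the first index >= len at
   which a and b differ, or len_limit if they match up to len_limit.  */
static size_t
match_len_wide (const uint8_t *a, const uint8_t *b,
                size_t len, size_t len_limit)
{
  assert (len <= len_limit);

  /* Wide loop: compare 8 bytes per iteration while a full word fits.  */
  while (len_limit - len >= sizeof (uint64_t))
    {
      uint64_t x, y;
      memcpy (&x, a + len, sizeof x);   /* safe unaligned load */
      memcpy (&y, b + len, sizeof y);
      uint64_t diff = x ^ y;
      if (diff != 0)
        {
          /* Locate the first differing byte inside the word.  */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
          return len + (size_t) __builtin_clzll (diff) / 8;
#else
          return len + (size_t) __builtin_ctzll (diff) / 8;
#endif
        }
      len += sizeof (uint64_t);
    }

  /* Byte-wise tail for the remaining 0..7 bytes.  */
  while (len != len_limit && a[len] == b[len])
    ++len;
  return len;
}
```

With two 26-byte buffers that first differ at index 23, match_len_wide (a, b, 0, 26) takes two full 8-byte steps and finds the mismatch inside the third word. Whether this pays off depends on the length distribution: with the frequencies quoted above, most runs are too short to benefit, so the original loop is kept for them.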