https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325
--- Comment #16 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> I'm all for removing the 1/3 for innermost loop handling (in cunroll
> the unrolled loop is then innermost). I'm more concerned about
> unrolling more than one level which is exactly what's required for
> 454.calculix.

Removing the 1/3 for the innermost loop would be sufficient to solve both the issue in this PR and x264_pixel_var_8x8 from 525.x264_r. I'll try to benchmark that.