https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64691
Bug ID: 64691
Summary: Suboptimal register allocation for bytes comparison on i386
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: enkovich.gnu at gmail dot com

This problem was found in the 256.bzip2 benchmark code compiled by GCC 5.0 at -O2. There is a small loop with a byte comparison which turned out to be inefficient because the compared values were not allocated to registers that allow byte access. That caused additional copies and, as a result, a significant slowdown of the loop.

The situation can be reproduced on a small test if we restrict register usage.

>cat test.c
void test (unsigned char *p, unsigned char val)
{
  unsigned char tmp1, tmp2;
  int i;

  i = 0;
  tmp1 = p[0];
  while (val != tmp1)
    {
      i++;
      tmp2 = tmp1;
      tmp1 = p[i];
      p[i] = tmp2;
    }
  p[0] = tmp1;
}
>gcc -O2 -m32 -ffixed-ebx test.c -S

Here is the loop:

.L3:
        movzbl  (%eax), %ebp
        movl    %esi, %ecx
        movb    %dl, (%eax)
        addl    $1, %eax
        movl    %ebp, %edx
        cmpb    %dl, %cl
        jne     .L3

We have an extra register copy esi->ecx to perform the comparison, because in 32-bit mode %esi has no byte subregister and cmpb needs one. I suppose the easiest way to get better register allocation here would be to transform the QI comparison into an SI one to relax the register constraints.
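The suggested QI-to-SI transformation can be illustrated at the source level. The sketch below is only an assumption about what the widened form would look like in C; it is not the proposed compiler change, and GCC is free to narrow the comparison back to a byte compare. The names test_wide and check_variants_agree are hypothetical and do not appear in the original report. Both operands are zero-extended bytes, so a 32-bit comparison gives the same result as the 8-bit one while allowing any general register to hold the operands.

```c
#include <assert.h>
#include <string.h>

/* Original loop from the report: byte-wide (QImode) comparison. */
void test (unsigned char *p, unsigned char val)
{
  unsigned char tmp1, tmp2;
  int i = 0;

  tmp1 = p[0];
  while (val != tmp1)
    {
      i++;
      tmp2 = tmp1;
      tmp1 = p[i];
      p[i] = tmp2;
    }
  p[0] = tmp1;
}

/* Hypothetical variant: the compared values are kept as zero-extended
   32-bit integers, so the comparison itself is word-wide (SImode).
   Only the stores back to memory are byte-wide.  */
void test_wide (unsigned char *p, unsigned char val)
{
  unsigned int wval = val;      /* zero-extended once, outside the loop */
  unsigned int tmp1, tmp2;
  int i = 0;

  tmp1 = p[0];                  /* zero-extending load, like movzbl */
  while (wval != tmp1)          /* 32-bit comparison */
    {
      i++;
      tmp2 = tmp1;
      tmp1 = p[i];
      p[i] = (unsigned char) tmp2;
    }
  p[0] = (unsigned char) tmp1;
}

/* Runs both variants on the same input and checks that they agree.
   The value searched for must be present in the buffer, otherwise
   the loops run past its end.  */
int check_variants_agree (void)
{
  unsigned char a[] = { 1, 2, 3, 4, 5 };
  unsigned char b[] = { 1, 2, 3, 4, 5 };

  test (a, 4);
  test_wide (b, 4);
  assert (memcmp (a, b, sizeof a) == 0);
  return 1;
}
```

Compiling both functions with the command line from the report and comparing the generated loops would show whether the widened form actually avoids the extra copy; that outcome is not claimed here.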