https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65698
Bug ID: 65698
Summary: Non-optimal code for simple compare function for x86 32-bit target
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com

For the attached test-case, the inner loop shows the following deficiencies:

1. 2 redundant fills and one spill in the comparison part of the loop. I assume
   that only 4 registers are needed to hold the bases of 'v1' and 'v2' and the
   indexes 'i1' and 'i2'; one more register is required to hold 'c1' or 's1'.
2. 2 redundant lea instructions to perform multiplication by 2.

Here is the optimal binary produced by the icc compiler (with the increment part
deleted):

    2e: 8a 04 3b          mov    (%ebx,%edi,1),%al
    31: 3a 04 3e          cmp    (%esi,%edi,1),%al
    34: 75 53             jne    89 <my_cmp+0x89>
    36: 0f b7 04 5a       movzwl (%edx,%ebx,2),%eax
    3a: 0f b7 2c 72       movzwl (%edx,%esi,2),%ebp
    3e: 3b c5             cmp    %ebp,%eax
    40: 75 47             jne    89 <my_cmp+0x89>
    42: 8a 44 3b 01       mov    0x1(%ebx,%edi,1),%al
    46: 3a 44 3e 01       cmp    0x1(%esi,%edi,1),%al
    4a: 75 3d             jne    89 <my_cmp+0x89>
    4c: 0f b7 44 5a 02    movzwl 0x2(%edx,%ebx,2),%eax
    51: 0f b7 6c 72 02    movzwl 0x2(%edx,%esi,2),%ebp
    56: 3b c5             cmp    %ebp,%eax
    58: 75 2f             jne    89 <my_cmp+0x89>
    5a: 83 c3 02          add    $0x2,%ebx
    ...
    7b: 7f b1             jg     2e <my_cmp+0x2e>

Note also that if we comment out the 2 lines

    if (i1 > n) i1 -= n;
    if (i2 > n) i2 -= n;

we get optimal code with the gcc compiler.
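
The attached test-case is not reproduced in this report. For readers without the attachment, the following is only a hypothetical sketch of the kind of loop being described, not the reporter's actual code. The function name `my_cmp_sketch`, the parameter types (inferred from the byte loads and `movzwl` loads in the icc listing), the loop bound, the return values, and the assumption that both arrays hold n + 1 elements are all guesses; only the names 'v1', 'v2', 'i1', 'i2', 'c1', 's1' and the two index wrap-around lines come from the report.

```c
#include <assert.h>

/* Hypothetical reconstruction, NOT the attached test-case.
 * Compares the sequences starting at indexes i1 and i2, wrapping each
 * index back by n once it exceeds n. */
int my_cmp_sketch(const unsigned char *v1, const unsigned short *v2,
                  int i1, int i2, int n)
{
    /* Assumption: both arrays hold n + 1 elements, because an index
     * equal to n is not wrapped by the checks below. */
    assert(n > 0 && i1 >= 0 && i1 <= n && i2 >= 0 && i2 <= n);

    for (int k = 0; k < n; k++) {
        /* Byte comparison (the mov/cmp on %al in the icc listing). */
        unsigned char c1 = v1[i1], c2 = v1[i2];
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;

        /* 16-bit comparison (the movzwl loads scaled by 2). */
        unsigned short s1 = v2[i1], s2 = v2[i2];
        if (s1 != s2)
            return s1 < s2 ? -1 : 1;

        i1++;
        i2++;
        /* The two lines the report says make gcc's code non-optimal. */
        if (i1 > n) i1 -= n;
        if (i2 > n) i2 -= n;
    }
    return 0;
}
```

Under these assumptions, five values need to stay live across the comparison part (two bases, two indexes, and one loaded value), which is the register budget the report argues for on a 32-bit x86 target.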