https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67288
Bug ID: 67288
Summary: [4.9 regression] non optimal simple function (useless additional shift/remove/shift/add)
Product: gcc
Version: 4.9.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: regression
Assignee: unassigned at gcc dot gnu.org
Reporter: [email protected]
Target Milestone: ---
The following function (from the Linux kernel, compiled with -O2) produced good assembly with GCC 4.8.3. With GCC 4.9.3, the generated code contains four unnecessary instructions:
/* L1 cache line size in bytes, and its log2 */
#define L1_CACHE_BYTES 16
#define L1_CACHE_SHIFT 4

#define mb() __asm__ __volatile__ ("sync" : : : "memory")

static inline void dcbf(void *addr)
{
	__asm__ __volatile__ ("dcbf 0, %0" : : "r"(addr) : "memory");
}

void flush_dcache_range(unsigned long start, unsigned long stop)
{
	void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
	unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
	unsigned int i;

	/* flush one cache line per iteration */
	for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
		dcbf(addr);
	if (i)
		mb();
}
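To make the loop bound concrete, here is a hosted-C sketch that mirrors flush_dcache_range() but replaces dcbf() with a counter, so it can run on any machine. count_flushes() is a hypothetical name, and addr is kept as an unsigned long instead of void * so the sketch does not rely on GNU void-pointer arithmetic.

```c
/* Same values as in the report. */
#define L1_CACHE_BYTES 16
#define L1_CACHE_SHIFT 4

/* Hypothetical host-side model of flush_dcache_range(): returns how many
   dcbf instructions the kernel loop would issue, without touching caches. */
static unsigned int count_flushes(unsigned long start, unsigned long stop)
{
    unsigned long addr = start & ~(unsigned long)(L1_CACHE_BYTES - 1);
    unsigned int size = stop - addr + (L1_CACHE_BYTES - 1);
    unsigned int i;

    for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
        ; /* the kernel issues dcbf(addr) here */
    return i; /* equals size >> L1_CACHE_SHIFT */
}
```

For instance, count_flushes(0x1003, 0x1010) is 1 (only the line at 0x1000), count_flushes(0x1000, 0x1011) is 2, and an empty range gives 0. The returned value is size >> L1_CACHE_SHIFT, which is the count that should end up in CTR.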
Result with GCC 4.9.3 (15 instructions):
c000d970 <flush_dcache_range>:
c000d970: 54 63 00 36 rlwinm r3,r3,0,0,27
c000d974: 38 84 00 0f addi r4,r4,15
c000d978: 7c 83 20 50 subf r4,r3,r4
c000d97c: 54 89 e1 3f rlwinm. r9,r4,28,4,31
c000d980: 4d 82 00 20 beqlr
c000d984: 55 24 20 36 rlwinm r4,r9,4,0,27
c000d988: 39 24 ff f0 addi r9,r4,-16
c000d98c: 55 29 e1 3e rlwinm r9,r9,28,4,31
c000d990: 39 29 00 01 addi r9,r9,1
c000d994: 7d 29 03 a6 mtctr r9
c000d998: 7c 00 18 ac dcbf 0,r3
c000d99c: 38 63 00 10 addi r3,r3,16
c000d9a0: 42 00 ff f8 bdnz c000d998 <flush_dcache_range+0x28>
c000d9a4: 7c 00 04 ac sync
c000d9a8: 4e 80 00 20 blr
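The following four instructions are redundant (shift left by 4 bits, subtract 16, shift right by 4 bits, add 1). The beqlr has already returned when r9 is zero, so r9 is at least 1 here, and since r9 is the result of a 4-bit right shift it is below 2^28. Therefore ((r9 << 4) - 16) >> 4 is exactly r9 - 1, and adding 1 just recomputes r9: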
c000d984: 55 24 20 36 rlwinm r4,r9,4,0,27
c000d988: 39 24 ff f0 addi r9,r4,-16
c000d98c: 55 29 e1 3e rlwinm r9,r9,28,4,31
c000d990: 39 29 00 01 addi r9,r9,1
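To make the redundancy concrete, the four instructions can be modelled in C. This is a sketch assuming 32-bit registers; rlwinm() is a hypothetical helper that implements only the mb <= me form of the instruction (enough for this listing), and ctr_gcc49() is a hypothetical name for the 4.9.3 sequence.

```c
#include <stdint.h>

/* Model of PowerPC rlwinm: rotate rs left by sh bits, then AND with a mask
   of ones from bit mb to bit me (bit 0 = MSB). Assumes mb <= me, sh < 32. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    uint32_t rot = sh ? (rs << sh) | (rs >> (32 - sh)) : rs;
    uint32_t mask = (0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31 - me));
    return rot & mask;
}

/* The GCC 4.9.3 sequence that computes the CTR value from r9. */
static uint32_t ctr_gcc49(uint32_t r9)
{
    uint32_t r4 = rlwinm(r9, 4, 0, 27);  /* r4 = r9 << 4 */
    r9 = r4 - 16u;                       /* addi r9,r4,-16 */
    r9 = rlwinm(r9, 28, 4, 31);          /* r9 = r9 >> 4 */
    return r9 + 1u;                      /* addi r9,r9,1 */
}
```

For example, ctr_gcc49(1) and ctr_gcc49(0x0FFFFFFF) both return their argument. Only r9 == 0, which the beqlr already excludes, would give a different value (0x10000000), so the sequence can be dropped and r9 moved into CTR directly.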
Result with GCC 4.8.3, which was optimal (11 instructions):
c000d894 <flush_dcache_range>:
c000d894: 54 63 00 36 rlwinm r3,r3,0,0,27
c000d898: 38 84 00 0f addi r4,r4,15
c000d89c: 7d 23 20 50 subf r9,r3,r4
c000d8a0: 55 29 e1 3f rlwinm. r9,r9,28,4,31
c000d8a4: 4d 82 00 20 beqlr
c000d8a8: 7d 29 03 a6 mtctr r9
c000d8ac: 7c 00 18 ac dcbf 0,r3
c000d8b0: 38 63 00 10 addi r3,r3,16
c000d8b4: 42 00 ff f8 bdnz c000d8ac <flush_dcache_range+0x18>
c000d8b8: 7c 00 04 ac sync
c000d8bc: 4e 80 00 20 blr
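For comparison, the 4.8.3 prologue maps one-to-one onto the source expression size >> L1_CACHE_SHIFT. The sketch below transliterates it into C, assuming 32-bit registers and a 32-bit unsigned long as on this target; both function names are hypothetical.

```c
#include <stdint.h>

/* Transliteration of the GCC 4.8.3 prologue: returns the value moved into
   CTR, or 0 when the function returns early via beqlr. */
static uint32_t ctr_from_gcc48_asm(uint32_t start, uint32_t stop)
{
    uint32_t r3 = start & 0xFFFFFFF0u;  /* rlwinm r3,r3,0,0,27 */
    uint32_t r4 = stop + 15u;           /* addi r4,r4,15 */
    uint32_t r9 = r4 - r3;              /* subf r9,r3,r4 */
    return r9 >> 4;                     /* rlwinm. r9,r9,28,4,31 */
}

/* The trip count as written in the C source: size >> L1_CACHE_SHIFT. */
static uint32_t trip_count_from_source(uint32_t start, uint32_t stop)
{
    uint32_t addr = start & ~(uint32_t)15;
    uint32_t size = stop - addr + 15u;
    return size >> 4;
}
```

Both functions perform the same operations in the same order, which is why GCC 4.8.3 can move r9 into CTR with no further arithmetic.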