[Bug target/109116] vector_pair register allocation bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116

Peter Bergner changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
     Ever confirmed|0                             |1
   Last reconfirmed|                              |2023-03-15
           Assignee|unassigned at gcc dot gnu.org |bergner at gcc dot gnu.org

--- Comment #3 from Peter Bergner ---
Mine.
[Bug target/109116] vector_pair register allocation bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116

--- Comment #2 from Chip Kerchner ---
This could be a bigger issue with register allocation after disassembling an opaque object such as a vector_pair or MMA object.
[Bug rtl-optimization/109116] vector_pair register allocation bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116

--- Comment #1 from Chip Kerchner ---
This has been in GCC since the initial version that supported __vector_pair (10.x).
[Bug rtl-optimization/109116] New: vector_pair register allocation bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116

            Bug ID: 109116
           Summary: vector_pair register allocation bug
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: chip.kerchner at ibm dot com
  Target Milestone: ---

There seems to be a bug in the register allocator when using a __vector_pair. GCC did not choose registers for the load that suit the later instructions.

With this testcase

```
#include <altivec.h>  /* header name missing from the archived text; <altivec.h> assumed */

#if !__has_builtin(__builtin_vsx_disassemble_pair)
#define __builtin_vsx_disassemble_pair __builtin_mma_disassemble_pair
#endif

int main()
{
  float A[8] = { float(1), float(2), float(3), float(4),
                 float(5), float(6), float(7), float(8) };
  __vector_pair P;
  __vector_quad Q;
  vector float B, C[2], D[4];

  __builtin_mma_xxsetaccz(&Q);
  P = *reinterpret_cast<__vector_pair *>(A);
  B = *reinterpret_cast<vector float *>(A);
  __builtin_vsx_disassemble_pair((void*)(C), &P);
  __builtin_mma_xvf32gerpp(&Q, reinterpret_cast<__vector unsigned char>(C[0]),
                           reinterpret_cast<__vector unsigned char>(B));
  __builtin_mma_xvf32gerpp(&Q, reinterpret_cast<__vector unsigned char>(C[1]),
                           reinterpret_cast<__vector unsigned char>(B));
  __builtin_mma_disassemble_acc((void *)D, &Q);
  return int(D[0][0]);
}
```

it produces output with extra (unneeded) register moves:

```
plxvp 12,.LANCHOR0@pcrel
xxsetaccz 0
plxv 33,.LC1@pcrel
xxlor 45,13,13
xxlor 32,12,12
xvf32gerpp 0,45,33
xvf32gerpp 0,32,33
xxmfacc 0
```
Re: Register Allocation Bug?
> #define ESP(rel,value,addr) \
>   asm volatile ("mov (%%esp, %2, 4), %0\n\t" \
>                 "lea (%%esp, %2, 4), %1\n\t" \
>                 : "=r" (value), "=r" (addr) \
>                 : "r" (rel)); \
>
> It didn't work as expected so I looked at the assembler code generated for the above:
>
> 1: b8 00 00 00 00    mov    $0x0,%eax
> 2: 8b 04 84          mov    (%esp,%eax,4),%eax
> 3: 8d 14 84          lea    (%esp,%eax,4),%edx
> 4: 89 45 f8          mov    %eax,0xfff8(%ebp)
> 5: 89 55 fc          mov    %edx,0xfffc(%ebp)
>
> As it turns out, %eax is being used for both input and output in line 2, clobbering %eax, so of course line 3 does not give the expected result... Is this a compiler error?

It's not a compiler bug: you need to use an "early clobber", namely "=&r" (value). See the Fine Manual.

Segher
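
A minimal sketch of the early-clobber fix described in the reply above, assuming an x86 target and GCC-style inline asm. To keep it safe to run, the hypothetical helper `load_and_addr` indexes a caller-supplied array instead of %esp; the helper name, its signature and the portable fallback are illustrative and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fetch base[idx] and compute &base[idx] in a single
   asm statement, mirroring the mov/lea pair from the ESP macro. */
static void load_and_addr(const int *base, long idx,
                          int *value, const int **addr)
{
    assert(base != NULL);
#if defined(__i386__) || defined(__x86_64__)
    int v;
    const int *a;
    /* "=&r" marks the first output as early-clobbered: it is written by the
       movl while the inputs %2 and %3 are still needed by the lea, so GCC
       must not place it in the same register as either input. The second
       output is written by the last instruction and needs no '&'. */
    __asm__ volatile ("movl (%2,%3,4), %0\n\t"
                      "lea (%2,%3,4), %1"
                      : "=&r" (v), "=r" (a)
                      : "r" (base), "r" (idx)
                      : "memory");
    *value = v;
    *addr = a;
#else
    /* Fallback for other targets (an assumption made for portability). */
    *value = base[idx];
    *addr = &base[idx];
#endif
}
```

Calling `load_and_addr(data, 2, &v, &a)` on an `int data[4]` should leave `v` equal to `data[2]` and `a` pointing at `&data[2]`. With a plain "=r" on the first output, GCC would be free to reuse an input register for it, which is the behaviour seen in the quoted disassembly.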
Register Allocation Bug?
Hi List

I have a question (or possible compiler bug) regarding inline assembly that I hope you can help me with. I wanted a routine that would give me the value and address of a memory location relative to the stack pointer. What I initially tried was the following:

    #define ESP(rel,value,addr) \
      asm volatile ("mov (%%esp, %2, 4), %0\n\t" \
                    "lea (%%esp, %2, 4), %1\n\t" \
                    : "=r" (value), "=r" (addr) \
                    : "r" (rel)); \

It didn't work as expected so I looked at the assembler code generated for the above:

    1: b8 00 00 00 00    mov    $0x0,%eax
    2: 8b 04 84          mov    (%esp,%eax,4),%eax
    3: 8d 14 84          lea    (%esp,%eax,4),%edx
    4: 89 45 f8          mov    %eax,0xfff8(%ebp)
    5: 89 55 fc          mov    %edx,0xfffc(%ebp)

As it turns out, %eax is being used for both input and output in line 2, clobbering %eax, so of course line 3 does not give the expected result... Is this a compiler error? I thought the only way the same register would be used for both input and output was if you use the "0" constraint? I'm compiling with 'GCC 4.2.1 20070719'.

The best solution I found was to split the two assembler statements in the following way:

    #define ESP(rel,value,addr) \
      asm volatile ("movl (%%esp, %1, 4), %0\n\t" : \
                    "=r" (value) : "r" (rel)); \
      asm volatile ("lea (%%esp, %1, 4), %0\n\t" : \
                    "=r" (addr) : "r" (rel));

The above compiles into six instructions instead of five (duplicating mov $0x0,%eax) but it has the benefit of only using one register:

    1: b8 00 00 00 00    mov    $0x0,%eax
    2: 8b 04 84          mov    (%esp,%eax,4),%eax
    3: 89 45 fc          mov    %eax,0xfffc(%ebp)
    4: b8 00 00 00 00    mov    $0x0,%eax
    5: 8d 04 84          lea    (%esp,%eax,4),%eax
    6: 89 45 f0          mov    %eax,0xfff0(%ebp)

So, again, my question is this: Is the compiler doing what it's supposed to when it's assigning the same register to both input and output when the specified constraint is "r" and not "0"?

As far as I can tell this problem has been floating around for a number of years. The following post from 2000 describes exactly the same issue:

http://gcc.gnu.org/ml/gcc-bugs/2000-07/msg00456.html

Since it hasn't been fixed maybe it's a bu..*ahem*..feature?

Best
/Kasper
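
On the "r" versus "0" question, a short sketch may help, assuming an x86 target and GCC-style inline asm. A matching constraint such as "0" requires the input to live in the same register as output 0, whereas a plain "r" input merely permits that sharing, because GCC assumes every input is consumed before any output is written. The helper name `add_one` and the portable fallback are hypothetical and not part of the original post.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: increment x in place using a matching constraint. */
static int add_one(int x)
{
    assert(x < INT_MAX);  /* avoid signed overflow in the fallback path */
#if defined(__i386__) || defined(__x86_64__)
    int result;
    /* "0" ties the input x to the register chosen for output %0, so the
       addl updates that single register in place. */
    __asm__ ("addl $1, %0"
             : "=r" (result)
             : "0" (x)
             : "cc");
    return result;
#else
    /* Fallback for other targets (an assumption made for portability). */
    return x + 1;
#endif
}
```

Here sharing one register is exactly what is wanted, since the instruction reads and writes the same operand. In a multi-instruction asm where an output is written before the last input is read, the sharing that "r" allows becomes a hazard.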