[Bug target/109116] vector_pair register allocation bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116

Peter Bergner changed:

What             |Removed                        |Added
Status           |UNCONFIRMED                    |ASSIGNED
Ever confirmed   |0                              |1
Last reconfirmed |                               |2023-03-15
Assignee         |unassigned at gcc dot gnu.org  |bergner at gcc dot gnu.org

--- Comment #3 from Peter Bergner ---
Mine.
[Bug target/109116] vector_pair register allocation bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116

--- Comment #2 from Chip Kerchner ---
This could be a bigger issue with register allocation after disassembling an opaque object such as a vector_pair or an MMA accumulator.
[Bug rtl-optimization/109116] vector_pair register allocation bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116

--- Comment #1 from Chip Kerchner ---
This has been in GCC since the initial version that supported __vector_pair (10.x).
[Bug rtl-optimization/109116] New: vector_pair register allocation bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116

Bug ID: 109116
Summary: vector_pair register allocation bug
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: chip.kerchner at ibm dot com
Target Milestone: ---

There seems to be a bug in the register allocator when using a __vector_pair: for the pair load, GCC does not choose registers that the later instructions can use directly.

With this testcase:

```
#include <altivec.h>

#if !__has_builtin(__builtin_vsx_disassemble_pair)
#define __builtin_vsx_disassemble_pair __builtin_mma_disassemble_pair
#endif

int main() {
  float A[8] = { float(1), float(2), float(3), float(4),
                 float(5), float(6), float(7), float(8) };
  __vector_pair P;
  __vector_quad Q;
  vector float B, C[2], D[4];

  __builtin_mma_xxsetaccz(&Q);
  P = *reinterpret_cast<__vector_pair *>(A);
  B = *reinterpret_cast<vector float *>(A);
  __builtin_vsx_disassemble_pair((void*)(C), &P);
  __builtin_mma_xvf32gerpp(&Q, reinterpret_cast<__vector unsigned char>(C[0]),
                           reinterpret_cast<__vector unsigned char>(B));
  __builtin_mma_xvf32gerpp(&Q, reinterpret_cast<__vector unsigned char>(C[1]),
                           reinterpret_cast<__vector unsigned char>(B));
  __builtin_mma_disassemble_acc((void *)D, &Q);

  return int(D[0][0]);
}
```

it produces output with extra (unneeded) register moves, namely the two xxlor instructions:

```
        plxvp 12,.LANCHOR0@pcrel
        xxsetaccz 0
        plxv 33,.LC1@pcrel
        xxlor 45,13,13
        xxlor 32,12,12
        xvf32gerpp 0,45,33
        xvf32gerpp 0,32,33
        xxmfacc 0
```
Re: Register Allocation Bug?
> ```
> #define ESP(rel,value,addr) \
>   asm volatile ("mov (%%esp, %2, 4), %0\n\t" \
>                 "lea (%%esp, %2, 4), %1\n\t" \
>                 : "=r" (value), "=r" (addr) \
>                 : "r" (rel));
> ```
>
> It didn't work as expected, so I looked at the assembler code generated for the above:
>
> ```
> 1: b8 00 00 00 00    mov    $0x0,%eax
> 2: 8b 04 84          mov    (%esp,%eax,4),%eax
> 3: 8d 14 84          lea    (%esp,%eax,4),%edx
> 4: 89 45 f8          mov    %eax,0xfff8(%ebp)
> 5: 89 55 fc          mov    %edx,0xfffc(%ebp)
> ```
>
> As it turns out, %eax is being used for both input and output in line 2, clobbering %eax, so of course line 3 does not give the expected result...
>
> Is this a compiler error?

It's not a compiler bug: you need to use an early clobber, namely "=&r" (value). See the Fine Manual.

Segher
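The early-clobber fix can be sketched as a self-contained function. This is not Kasper's exact macro: to make it testable without depending on the value of %esp, the sketch assumes an explicit base pointer instead of the stack pointer, but the register conflict is the same. The first instruction writes %0 while %2 and %3 are still needed by the second, so %0 is marked "=&r"; %1 is written only by the last instruction and can stay "=r". The names `load_and_addr` and `demo_load_and_addr` are hypothetical, and the "memory" clobber is there because the asm reads memory that is not passed as an operand. It assumes x86 or x86-64 and GCC-style extended asm in AT&T syntax.

```c
#include <assert.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "sketch assumes x86 or x86-64 (AT&T-syntax GCC extended asm)"
#endif

/* Hypothetical helper: returns base[idx] and &base[idx] from one asm block.
   Same shape as the ESP macro, but with an explicit base pointer. */
static void load_and_addr(const int *base, long idx,
                          int *value, const int **addr)
{
    int v;
    const int *a;
    __asm__ volatile(
        "mov (%2,%3,4), %0\n\t"   /* writes %0 ...                            */
        "lea (%2,%3,4), %1"       /* ... while %2 and %3 are still needed     */
        : "=&r"(v),               /* early clobber: never shares with %2/%3   */
          "=r"(a)                 /* written last, may reuse an input reg     */
        : "r"(base), "r"(idx)
        : "memory");              /* asm reads *base, which is not an operand */
    *value = v;
    *addr = a;
}

/* Hypothetical usage check. */
void demo_load_and_addr(void)
{
    int arr[4] = {10, 20, 30, 40};
    int v = 0;
    const int *a = 0;

    load_and_addr(arr, 2, &v, &a);
    assert(v == 30);          /* value of arr[2]   */
    assert(a == &arr[2]);     /* address of arr[2] */
}
```

Removing the & from `"=&r"(v)` reintroduces the hazard from the original macro: GCC may then place v in the same register as base or idx, and the mov would overwrite it before the lea reads it.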
Register Allocation Bug?
Hi List

I have a question (or possible compiler bug) regarding inline assembly that I hope you can help me with. I wanted a routine that would give me the value and address of a memory location relative to the stack pointer. What I initially tried was the following:

```
#define ESP(rel,value,addr) \
  asm volatile ("mov (%%esp, %2, 4), %0\n\t" \
                "lea (%%esp, %2, 4), %1\n\t" \
                : "=r" (value), "=r" (addr) \
                : "r" (rel));
```

It didn't work as expected, so I looked at the assembler code generated for the above:

```
1: b8 00 00 00 00    mov    $0x0,%eax
2: 8b 04 84          mov    (%esp,%eax,4),%eax
3: 8d 14 84          lea    (%esp,%eax,4),%edx
4: 89 45 f8          mov    %eax,0xfff8(%ebp)
5: 89 55 fc          mov    %edx,0xfffc(%ebp)
```

As it turns out, %eax is being used for both input and output in line 2, clobbering %eax, so of course line 3 does not give the expected result...

Is this a compiler error? I thought the only way the same register would be used for both input and output was if you used the "0" constraint? I'm compiling with 'GCC 4.2.1 20070719'.

The best solution I found was to split the two assembler statements in the following way:

```
#define ESP(rel,value,addr) \
  asm volatile ("movl (%%esp, %1, 4), %0\n\t" : \
                "=r" (value) : "r" (rel)); \
  asm volatile ("lea (%%esp, %1, 4), %0\n\t" : \
                "=r" (addr) : "r" (rel));
```

The above compiles into six instructions instead of five (duplicating mov $0x0,%eax), but it has the benefit of only using one register:

```
1: b8 00 00 00 00    mov    $0x0,%eax
2: 8b 04 84          mov    (%esp,%eax,4),%eax
3: 89 45 fc          mov    %eax,0xfffc(%ebp)
4: b8 00 00 00 00    mov    $0x0,%eax
5: 8d 04 84          lea    (%esp,%eax,4),%eax
6: 89 45 f0          mov    %eax,0xfff0(%ebp)
```

So, again, my question is this: is the compiler doing what it's supposed to when it's assigning the same register to both input and output when the specified constraint is "r" and not "0"? As far as I can tell, this problem has been floating around for a number of years.
The following post from 2000 describes exactly the same issue:
http://gcc.gnu.org/ml/gcc-bugs/2000-07/msg00456.html

Since it hasn't been fixed, maybe it's a bu..*ahem*..feature?

Best
/Kasper
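The "r" versus "0" distinction asked about above can be shown with a minimal sketch. It assumes x86 or x86-64 and GCC-style extended asm, and the names `add_tied` and `demo_add_tied` are hypothetical. A matching constraint such as "0" forces an input into the same register as output operand 0, which is what a two-operand instruction like x86 `add` needs. A plain "r" input is not tied to anything. Per the GCC manual, GCC may still place an output in the same register as an unrelated input, on the assumption that the asm consumes its inputs before writing its outputs. That assumption is what broke the original ESP macro, and "=&r" is the way to opt out of it.

```c
#include <assert.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "sketch assumes x86 or x86-64 (AT&T-syntax GCC extended asm)"
#endif

/* "0" ties input a to output operand 0: both name the SAME register,
   so the two-operand add computes sum = a + b in place. */
static int add_tied(int a, int b)
{
    int sum;
    __asm__("add %2, %0"      /* %0 already holds a on entry            */
            : "=r"(sum)       /* operand 0                              */
            : "0"(a),         /* operand 1: must use operand 0's reg    */
              "r"(b)          /* operand 2: any general register        */
            : "cc");          /* add updates the flags                  */
    return sum;
}

/* Hypothetical usage check. */
void demo_add_tied(void)
{
    assert(add_tied(2, 3) == 5);
    assert(add_tied(-7, 7) == 0);
}
```

Replacing `"0"(a)` with `"r"(a)` would make this sketch wrong: %0 would then be whatever register GCC picks for the output, with no guarantee that it holds a before the add runs.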