[Bug target/109116] vector_pair register allocation bug

2023-03-15 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116

Peter Bergner  changed:

               What|Removed                       |Added

             Status|UNCONFIRMED                   |ASSIGNED
     Ever confirmed|0                             |1
   Last reconfirmed|                              |2023-03-15
           Assignee|unassigned at gcc dot gnu.org |bergner at gcc dot gnu.org

--- Comment #3 from Peter Bergner  ---
Mine.

[Bug target/109116] vector_pair register allocation bug

2023-03-13 Thread chip.kerchner at ibm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116

--- Comment #2 from Chip Kerchner  ---
This could be a bigger issue with register allocation after disassembling
an opaque object like vector_pair or MMA.

[Bug rtl-optimization/109116] vector_pair register allocation bug

2023-03-13 Thread chip.kerchner at ibm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116

--- Comment #1 from Chip Kerchner  ---
This has been in GCC since the initial version that supported __vector_pair
(10.x)

[Bug rtl-optimization/109116] New: vector_pair register allocation bug

2023-03-13 Thread chip.kerchner at ibm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116

            Bug ID: 109116
           Summary: vector_pair register allocation bug
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: chip.kerchner at ibm dot com
  Target Milestone: ---

There seems to be a bug in the register allocator when using a __vector_pair.
GCC does not load the pair into registers that the later instructions can use
directly.

With this testcase

```
#include <altivec.h>

#if !__has_builtin(__builtin_vsx_disassemble_pair)
#define __builtin_vsx_disassemble_pair __builtin_mma_disassemble_pair
#endif

int main() {
  float A[8] = { float(1), float(2), float(3), float(4),
 float(5), float(6), float(7), float(8) };
  __vector_pair P;
  __vector_quad Q;
  vector float B, C[2], D[4];

  __builtin_mma_xxsetaccz(&Q);
  P = *reinterpret_cast<__vector_pair *>(A);
  B = *reinterpret_cast<vector float *>(A);
  __builtin_vsx_disassemble_pair((void*)(C), &P);
  __builtin_mma_xvf32gerpp(&Q, reinterpret_cast<__vector unsigned char>(C[0]),
reinterpret_cast<__vector unsigned char>(B));
  __builtin_mma_xvf32gerpp(&Q, reinterpret_cast<__vector unsigned char>(C[1]),
reinterpret_cast<__vector unsigned char>(B));
  __builtin_mma_disassemble_acc((void *)D, &Q);

  return int(D[0][0]);
}
```

It produces an output with extra (unneeded) register moves.

```
plxvp 12,.LANCHOR0@pcrel
xxsetaccz 0
plxv 33,.LC1@pcrel
xxlor 45,13,13
xxlor 32,12,12
xvf32gerpp 0,45,33
xvf32gerpp 0,32,33
xxmfacc 0
```

Re: Register Allocation Bug?

2009-04-06 Thread Segher Boessenkool
#define ESP(rel,value,addr) \
asm volatile ("mov (%%esp, %2, 4), %0\n\t"  \
  "lea (%%esp, %2, 4), %1\n\t"  \
  : "=r" (value), "=r" (addr)   \
  : "r" (rel)); \


It didn't work as expected so I looked at the assembler code generated
for the above:

 1:   b8 00 00 00 00    mov    $0x0,%eax
 2:   8b 04 84          mov    (%esp,%eax,4),%eax
 3:   8d 14 84          lea    (%esp,%eax,4),%edx
 4:   89 45 f8          mov    %eax,0xfff8(%ebp)
 5:   89 55 fc          mov    %edx,0xfffc(%ebp)


As it turns out, %eax is being used for both input and output in line
2, clobbering %eax, so of course line 3 does not give the expected
result... Is this a compiler error?


It's not a compiler bug: you need to use an "early clobber", namely
"=&r"(value) .  See the Fine Manual.


Segher
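
For readers following along, here is a minimal sketch of the early-clobber
fix. It is not the original macro: so that it can run outside a 32-bit build,
it indexes a caller-supplied array instead of %esp, and it falls back to plain
C when not compiled for x86-64. The function name load_value_and_address and
its parameters are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread): fetch table[index] and its
   address with one asm statement, mirroring the mov/lea pair above. */
static void load_value_and_address(const int64_t *table, size_t index,
                                   int64_t *value, const int64_t **addr)
{
#if defined(__x86_64__) && defined(__GNUC__)
    int64_t v;
    const int64_t *a;
    __asm__ volatile (
        "mov (%2,%3,8), %0\n\t"   /* writes %0 before the inputs are done */
        "lea (%2,%3,8), %1\n\t"   /* still needs %2 and %3 intact         */
        : "=&r" (v),   /* early clobber: must not share a register with an input */
          "=r" (a)     /* written by the final instruction, plain "=r" suffices  */
        : "r" (table), "r" (index)
        : "memory");   /* the asm reads the array through a pointer */
    *value = v;
    *addr = a;
#else
    /* Assumption: on other targets, fall back to equivalent plain C. */
    *value = table[index];
    *addr = &table[index];
#endif
}
```

Without the "&", the compiler may place %0 in the same register as %2 or
%3, because it assumes all inputs are consumed before any output is written.
That is the situation in the listing above, where %eax holds both rel and
value.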



Register Allocation Bug?

2009-03-25 Thread Kasper Bonne
Hi List

I have a question (or possible compiler bug) regarding inline assembly
that I hope you can help me with.

I wanted a routine that would give me the value and address of a
memory location relative to the stack pointer. What I initially tried
was the following:

#define ESP(rel,value,addr) \
asm volatile ("mov (%%esp, %2, 4), %0\n\t"  \
  "lea (%%esp, %2, 4), %1\n\t"  \
  : "=r" (value), "=r" (addr)   \
  : "r" (rel)); \

It didn't work as expected so I looked at the assembler code generated
for the above:

 1:   b8 00 00 00 00    mov    $0x0,%eax
 2:   8b 04 84          mov    (%esp,%eax,4),%eax
 3:   8d 14 84          lea    (%esp,%eax,4),%edx
 4:   89 45 f8          mov    %eax,0xfff8(%ebp)
 5:   89 55 fc          mov    %edx,0xfffc(%ebp)


As it turns out, %eax is being used for both input and output in line
2, clobbering %eax, so of course line 3 does not give the expected
result... Is this a compiler error?  I thought the only way the same
register would be used for both input and output was if you use the
"0" constraint? I'm compiling with 'GCC 4.2.1 20070719'.

The best solution I found was to split the two assembler statements in
the following way:

#define ESP(rel,value,addr) \
asm volatile ("movl (%%esp, %1, 4), %0\n\t" :   \
  "=r" (value) : "r" (rel));\
asm volatile ("lea  (%%esp, %1, 4), %0\n\t" :   \
  "=r" (addr) : "r" (rel));

The above compiles into six instructions instead of five (duplicating
mov $0x0,%eax) but it has the benefit of only using one register:

 1:   b8 00 00 00 00    mov    $0x0,%eax
 2:   8b 04 84          mov    (%esp,%eax,4),%eax
 3:   89 45 fc          mov    %eax,0xfffc(%ebp)
 4:   b8 00 00 00 00    mov    $0x0,%eax
 5:   8d 04 84          lea    (%esp,%eax,4),%eax
 6:   89 45 f0          mov    %eax,0xfff0(%ebp)

So, again, my question is this: Is the compiler doing what it's
supposed to when it's assigning the same register to both input and
output when the specified constraint is "r" and not "0"?

As far as I can tell this problem has been floating around for a
number of years. The following post from 2000 describes exactly the
same issue:

http://gcc.gnu.org/ml/gcc-bugs/2000-07/msg00456.html

Since it hasn't been fixed maybe it's a bu..*ahem*..feature?

Best
/Kasper