[Bug target/109116] vector_pair register allocation bug

2023-03-15 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116

Peter Bergner  changed:

           What    |Removed                      |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                  |ASSIGNED
     Ever confirmed|0                            |1
   Last reconfirmed|                             |2023-03-15
           Assignee|unassigned at gcc dot gnu.org|bergner at gcc dot gnu.org

--- Comment #3 from Peter Bergner  ---
Mine.

[Bug target/109116] vector_pair register allocation bug

2023-03-13 Thread chip.kerchner at ibm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116

--- Comment #2 from Chip Kerchner  ---
This could be a bigger issue with register allocation after disassembling an
opaque object like vector_pair or MMA.

[Bug rtl-optimization/109116] vector_pair register allocation bug

2023-03-13 Thread chip.kerchner at ibm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116

--- Comment #1 from Chip Kerchner  ---
This has been in GCC since the initial version that supported __vector_pair
(10.x)

[Bug rtl-optimization/109116] New: vector_pair register allocation bug

2023-03-13 Thread chip.kerchner at ibm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116

            Bug ID: 109116
           Summary: vector_pair register allocation bug
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: chip.kerchner at ibm dot com
  Target Milestone: ---

There seems to be a bug in the register allocator when using a __vector_pair. 
GCC didn't choose a register for the load that served the later instruction.

With this testcase

```
#include <altivec.h>

#if !__has_builtin(__builtin_vsx_disassemble_pair)
#define __builtin_vsx_disassemble_pair __builtin_mma_disassemble_pair
#endif

int main() {
  float A[8] = { float(1), float(2), float(3), float(4),
                 float(5), float(6), float(7), float(8) };
  __vector_pair P;
  __vector_quad Q;
  vector float B, C[2], D[4];

  __builtin_mma_xxsetaccz(&Q);
  P = *reinterpret_cast<__vector_pair *>(A);
  B = *reinterpret_cast<__vector float *>(A);
  __builtin_vsx_disassemble_pair((void*)(C), &P);
  __builtin_mma_xvf32gerpp(&Q, reinterpret_cast<__vector unsigned char>(C[0]),
                           reinterpret_cast<__vector unsigned char>(B));
  __builtin_mma_xvf32gerpp(&Q, reinterpret_cast<__vector unsigned char>(C[1]),
                           reinterpret_cast<__vector unsigned char>(B));
  __builtin_mma_disassemble_acc((void *)D, &Q);

  return int(D[0][0]);
}
```

It produces an output with extra (unneeded) register moves.

```
plxvp 12,.LANCHOR0@pcrel
xxsetaccz 0
plxv 33,.LC1@pcrel
xxlor 45,13,13
xxlor 32,12,12
xvf32gerpp 0,45,33
xvf32gerpp 0,32,33
xxmfacc 0
```

Re: Register Allocation Bug?

2009-04-06 Thread Segher Boessenkool
#define ESP(rel,value,addr) \
asm volatile ("mov (%%esp, %2, 4), %0\n\t"  \
              "lea (%%esp, %2, 4), %1\n\t"  \
              : "=r" (value), "=r" (addr)   \
              : "r" (rel)); \


It didn't work as expected so I looked at the assembler code generated
for the above:

 1:   b8 00 00 00 00  mov    $0x0,%eax
 2:   8b 04 84        mov    (%esp,%eax,4),%eax
 3:   8d 14 84        lea    (%esp,%eax,4),%edx
 4:   89 45 f8        mov    %eax,0xfff8(%ebp)
 5:   89 55 fc        mov    %edx,0xfffc(%ebp)


As it turns out, %eax is being used for both input and output in line
2, clobbering %eax, so of course line 3 does not give the expected
result... Is this a compiler error?


It's not a compiler bug: you need to use an early clobber, namely
"=&r"(value).  See the Fine Manual.


Segher
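
A minimal sketch of the early-clobber fix, assuming an x86 target and GCC-style extended asm. The helper name `load_value_and_address` and the explicit `base` pointer are illustrative assumptions, not from the thread; the original macro addressed the stack through %esp directly. The `&` in `"=&r"` tells the compiler that the first output is written before the inputs have all been read, so it may not share a register with `base` or `rel`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: load base[rel] and compute &base[rel] in a single
   asm statement. Falls back to plain C on non-x86 targets. */
void load_value_and_address(const int *base, size_t rel,
                            int *value, const int **addr)
{
    assert(base != NULL && value != NULL && addr != NULL);
#if defined(__i386__) || defined(__x86_64__)
    int v;
    const int *a;
    __asm__ volatile ("movl (%2,%3,4), %0\n\t"   /* writes %0 first ...      */
                      "lea  (%2,%3,4), %1"       /* ... then reads %2 and %3 */
                      : "=&r" (v),   /* early clobber: kept apart from inputs */
                        "=r" (a)     /* written last, may reuse an input reg  */
                      : "r" (base), "r" (rel)
                      : "memory");   /* the asm reads memory through base     */
    *value = v;
    *addr = a;
#else
    *value = base[rel];
    *addr = &base[rel];
#endif
}
```

Without the `&`, the compiler is free to place `v` in the same register as `rel`, which is the overlap seen in line 2 of the listing above.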



Register Allocation Bug?

2009-03-25 Thread Kasper Bonne
Hi List

I have a question (or possible compiler bug) regarding inline assembly
that I hope you can help me with.

I wanted a routine that would give me the value and address of a
memory location relative to the stack pointer. What I initially tried
was the following:

#define ESP(rel,value,addr) \
asm volatile ("mov (%%esp, %2, 4), %0\n\t"  \
              "lea (%%esp, %2, 4), %1\n\t"  \
              : "=r" (value), "=r" (addr)   \
              : "r" (rel)); \

It didn't work as expected so I looked at the assembler code generated
for the above:

 1:   b8 00 00 00 00  mov    $0x0,%eax
 2:   8b 04 84        mov    (%esp,%eax,4),%eax
 3:   8d 14 84        lea    (%esp,%eax,4),%edx
 4:   89 45 f8        mov    %eax,0xfff8(%ebp)
 5:   89 55 fc        mov    %edx,0xfffc(%ebp)


As it turns out, %eax is being used for both input and output in line
2, clobbering %eax, so of course line 3 does not give the expected
result... Is this a compiler error?  I thought the only way the same
register would be used for both input and output was if you use the
"0" constraint? I'm compiling with 'GCC 4.2.1 20070719'.

The best solution I found was to split the two assembler statements in
the following way:

#define ESP(rel,value,addr) \
asm volatile ("movl (%%esp, %1, 4), %0\n\t" :   \
              "=r" (value) : "r" (rel));        \
asm volatile ("lea  (%%esp, %1, 4), %0\n\t" :   \
              "=r" (addr) : "r" (rel));

The above compiles into six instructions instead of five (duplicating
mov $0x0,%eax) but it has the benefit of only using one register:

 1:   b8 00 00 00 00  mov    $0x0,%eax
 2:   8b 04 84        mov    (%esp,%eax,4),%eax
 3:   89 45 fc        mov    %eax,0xfffc(%ebp)
 4:   b8 00 00 00 00  mov    $0x0,%eax
 5:   8d 04 84        lea    (%esp,%eax,4),%eax
 6:   89 45 f0        mov    %eax,0xfff0(%ebp)

So, again, my question is this: Is the compiler doing what it's
supposed to when it's assigning the same register to both input and
output when the specified constraint is "r" and not "0"?
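
For contrast, here is a small sketch of what the "0" matching constraint does, assuming an x86 target and GCC-style extended asm. The function name `add_four` is hypothetical and not from this thread. A matching constraint requires the input to live in the same location as output operand 0, whereas a plain "r" only permits the compiler to make that choice.

```c
/* Hypothetical helper: adds 4 to x in place. The "0" constraint ties the
   input x to the same register as output operand 0, so the single addl
   both reads and writes it. Falls back to plain C on non-x86 targets. */
int add_four(int x)
{
#if defined(__i386__) || defined(__x86_64__)
    int result;
    __asm__ ("addl $4, %0"
             : "=r" (result)   /* operand 0: output register            */
             : "0" (x)         /* input must share operand 0's register */
             : "cc");          /* addl modifies the flags               */
    return result;
#else
    return x + 4;
#endif
}
```

With plain "r" on both sides the overlap is allowed but not guaranteed, which is why an output written before the inputs are consumed needs the `&` modifier instead.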

As far as I can tell this problem has been floating around for a
number of years. The following post from 2000 describes exactly the
same issue:

http://gcc.gnu.org/ml/gcc-bugs/2000-07/msg00456.html

Since it hasn't been fixed maybe it's a bu..*ahem*..feature?

Best
/Kasper