Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI

2020-02-08 Thread Uros Bizjak
On Sat, Feb 8, 2020 at 11:52 AM Jakub Jelinek  wrote:
>
> On Sat, Feb 08, 2020 at 11:32:40AM +0100, Uros Bizjak wrote:
> > I think that the patch should also be backported to gcc-9 branch. The
> > change is backward compatible, since the new code will save and
> > restore zmm16+ registers at the caller site, and the old code (e.g.
> > existing libraries) will then unnecessarily save and restore regs in
> > the callee.
>
> Actually, it isn't backward compatible.
> If both caller and callee touch xmm16+ (sure, unlikely), then if caller
> is compiled by new gcc and callee by old gcc it is the case you described,
> caller doesn't try to preserve values across the call and callee uselessly
> saves them anyway (unless it fails to assemble due to SEH).
> But if caller is compiled by old gcc and callee by new gcc, caller assumes
> it can preserve values across the call in those registers and callee doesn't
> save them and clobbers them.

I was considering only the most likely scenario where the compiler is
upgraded without recompiling existing system libraries. So, with the
new compiler installed, the new compiled code calls either new or old
libraries. *Assuming* that old libraries never call new code, the
situation is OK.

As you point out, when the old code calls new code, registers get
clobbered. By not backporting the patch, we can consider gcc-9 and
gcc-10 ABI incompatible, otherwise the forward incompatibility moves
in between gcc-9.2 and gcc-9.3. I agree that the former is somehow
more convenient.

IMO, this issue should be pointed out in the compiler release notes in any case.

Uros.


Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI

2020-02-08 Thread Jakub Jelinek
On Sat, Feb 08, 2020 at 11:32:40AM +0100, Uros Bizjak wrote:
> I think that the patch should also be backported to gcc-9 branch. The
> change is backward compatible, since the new code will save and
> restore zmm16+ registers at the caller site, and the old code (e.g.
> existing libraries) will then unnecessarily save and restore regs in
> the callee.

Actually, it isn't backward compatible.
If both caller and callee touch xmm16+ (sure, unlikely), then if caller
is compiled by new gcc and callee by old gcc it is the case you described,
caller doesn't try to preserve values across the call and callee uselessly
saves them anyway (unless it fails to assemble due to SEH).
But if caller is compiled by old gcc and callee by new gcc, caller assumes
it can preserve values across the call in those registers and callee doesn't
save them and clobbers them.

Jakub



Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI

2020-02-08 Thread Jakub Jelinek
On Sat, Feb 08, 2020 at 11:32:40AM +0100, Uros Bizjak wrote:
> On Sat, Feb 8, 2020 at 11:05 AM Jakub Jelinek  wrote:
> >
> > On Sat, Feb 08, 2020 at 08:24:38AM +, JonY wrote:
> > > It does not, I just checked with the master branch of binutils.
> > ...
> > > I did a -c test build with an older toolchain, it fails to compile
> > > (invalid register for .seh_savexmm) while the latest gcc is passing,
> > > both are using the same binutils version.
> > >
> > > I guess that covers it?
> >
> > I went ahead and committed the patch (with the double config/i386/ in it
> > fixed obviously).
> 
> I think that the patch should also be backported to gcc-9 branch. The
> change is backward compatible, since the new code will save and
> restore zmm16+ registers at the caller site, and the old code (e.g.
> existing libraries) will then unnecessarily save and restore regs in
> the callee.

Agreed, will do together with other backports soon.

Jakub



Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI

2020-02-08 Thread Uros Bizjak
On Sat, Feb 8, 2020 at 11:05 AM Jakub Jelinek  wrote:
>
> On Sat, Feb 08, 2020 at 08:24:38AM +, JonY wrote:
> > It does not, I just checked with the master branch of binutils.
> ...
> > I did a -c test build with an older toolchain, it fails to compile
> > (invalid register for .seh_savexmm) while the latest gcc is passing,
> > both are using the same binutils version.
> >
> > I guess that covers it?
>
> I went ahead and committed the patch (with the double config/i386/ in it
> fixed obviously).

I think that the patch should also be backported to gcc-9 branch. The
change is backward compatible, since the new code will save and
restore zmm16+ registers at the caller site, and the old code (e.g.
existing libraries) will then unnecessarily save and restore regs in
the callee.

HJ fixed some other MSABI issue there.

Uros.

>
> > > 2020-02-07  Uroš Bizjak  
> > > Jakub Jelinek  
> > >
> > > PR target/65782
> > > * config/i386/config/i386/i386.h (CALL_USED_REGISTERS): Make
> > > xmm16-xmm31 call-used even in 64-bit ms-abi.
> > >
> > > * gcc.target/i386/pr65782.c: New test.
>
> Jakub
>


Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI

2020-02-08 Thread Jakub Jelinek
On Sat, Feb 08, 2020 at 08:24:38AM +, JonY wrote:
> It does not, I just checked with the master branch of binutils.
...
> I did a -c test build with an older toolchain, it fails to compile
> (invalid register for .seh_savexmm) while the latest gcc is passing,
> both are using the same binutils version.
> 
> I guess that covers it?

I went ahead and committed the patch (with the double config/i386/ in it
fixed obviously).

> > 2020-02-07  Uroš Bizjak  
> > Jakub Jelinek  
> > 
> > PR target/65782
> > * config/i386/config/i386/i386.h (CALL_USED_REGISTERS): Make
> > xmm16-xmm31 call-used even in 64-bit ms-abi.
> > 
> > * gcc.target/i386/pr65782.c: New test.

Jakub



Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI

2020-02-08 Thread JonY
On 2/7/20 11:28 AM, Jakub Jelinek wrote:
> On Fri, Feb 07, 2020 at 10:57:22AM +, JonY wrote:
 Is this patch testing still required? I just got back from traveling.
>>>
>>> Yes, our reading of the MS ABI docs show that xmm16-31 are to be call used
>>> (not preserved over calls), while in gcc they are currently handled as
>>> preserved across the calls.
> 
> The other parts are I guess mainly about SEH.  Consider e.g.
> void
> foo (void)
> {
>   register double x __asm ("xmm14");
>   register double y __asm ("xmm18");
>   asm ("" : "=x" (x));
>   asm ("" : "=v" (y));
>   x += y;
>   y += x;
>   asm ("" : : "x" (x));
>   asm ("" : : "v" (y));
> }
> looking at cross-compiler output, with -O2 -mavx512f this emits
>   .file   "abcdeq.c"
>   .text
>   .align 16
>   .globl  foo
>   .deffoo;.scl2;  .type   32; .endef
>   .seh_proc   foo
> foo:
>   subq$40, %rsp
>   .seh_stackalloc 40
>   vmovaps %xmm14, (%rsp)
>   .seh_savexmm%xmm14, 0
>   vmovaps %xmm18, 16(%rsp)
>   .seh_savexmm%xmm18, 16
>   .seh_endprologue
>   vaddsd  %xmm18, %xmm14, %xmm14
>   vaddsd  %xmm18, %xmm14, %xmm18
>   vmovaps (%rsp), %xmm14
>   vmovaps 16(%rsp), %xmm18
>   addq$40, %rsp
>   ret
>   .seh_endproc
>   .ident  "GCC: (GNU) 10.0.1 20200207 (experimental)"
> Does whatever assembler mingw64 uses even assemble this (I mean the
> .seh_savexmm %xmm16, 16 could be problematic)?''

It does not, I just checked with the master branch of binutils.

> I can find e.g.
> https://stackoverflow.com/questions/43152633/invalid-register-for-seh-savexmm-in-cygwin/43210527
> which then links to
> https://gcc.gnu.org/PR65782
> 
> So, I'd say we want to add PR target/65782 to the ChangeLog entry in the
> patch, and likely add a testcase like the above, so like below?
> 
> Have you tested the earlier version of the patch on mingw64 or cygwin?
> If yes, can you just test the testcase from the following patch, without and
> with the i386.h part and verify it FAILs without and PASSes with it?
> Just make check-gcc RUNTESTFLAGS=i386.exp=pr65782.c should do?
> If not, can you please test the whole patch?
> 

I did a -c test build with an older toolchain, it fails to compile
(invalid register for .seh_savexmm) while the latest gcc is passing,
both are using the same binutils version.

I guess that covers it?


> 2020-02-07  Uroš Bizjak  
>   Jakub Jelinek  
> 
>   PR target/65782
>   * config/i386/config/i386/i386.h (CALL_USED_REGISTERS): Make
>   xmm16-xmm31 call-used even in 64-bit ms-abi.
> 
>   * gcc.target/i386/pr65782.c: New test.
> 
> --- gcc/config/i386/config/i386/i386.h.jj 2020-01-22 10:19:24.199221986 
> +0100
> +++ gcc/config/i386/config/i386/i386.h2020-02-04 12:09:12.338341003 
> +0100
> @@ -1128,9 +1128,9 @@ extern const char *host_detect_local_cpu
>  /*xmm8,xmm9,xmm10,xmm11,xmm12,xmm13,xmm14,xmm15*/\
>   6,   6,6,6,6,6,6,6, \
>  /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/  \
> - 6,6, 6,6,6,6,6,6,   \
> + 1,1, 1,1,1,1,1,1,   \
>  /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/  \
> - 6,6, 6,6,6,6,6,6,   \
> + 1,1, 1,1,1,1,1,1,   \
>   /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/  \
>   1,   1,   1,   1,   1,   1,   1,   1 }
>  
> --- gcc/testsuite/gcc.target/i386/pr65782.c.jj2020-02-07 
> 12:21:09.472819018 +0100
> +++ gcc/testsuite/gcc.target/i386/pr65782.c   2020-02-07 12:24:06.820154495 
> +0100
> @@ -0,0 +1,16 @@
> +/* PR target/65782 */
> +/* { dg-do assemble { target { avx512vl && { ! ia32 } } } } */
> +/* { dg-options "-O2 -mavx512vl" } */
> +
> +void
> +foo (void)
> +{
> +  register double x __asm ("xmm14");
> +  register double y __asm ("xmm18");
> +  asm ("" : "=x" (x));
> +  asm ("" : "=v" (y));
> +  x += y;
> +  y += x;
> +  asm ("" : : "x" (x));
> +  asm ("" : : "v" (y));
> +}
> 
> 
>   Jakub
> 




signature.asc
Description: OpenPGP digital signature


Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI

2020-02-07 Thread Jakub Jelinek
On Fri, Feb 07, 2020 at 10:57:22AM +, JonY wrote:
> >> Is this patch testing still required? I just got back from traveling.
> > 
> > Yes, our reading of the MS ABI docs show that xmm16-31 are to be call used
> > (not preserved over calls), while in gcc they are currently handled as
> > preserved across the calls.

The other parts are I guess mainly about SEH.  Consider e.g.
void
foo (void)
{
  register double x __asm ("xmm14");
  register double y __asm ("xmm18");
  asm ("" : "=x" (x));
  asm ("" : "=v" (y));
  x += y;
  y += x;
  asm ("" : : "x" (x));
  asm ("" : : "v" (y));
}
looking at cross-compiler output, with -O2 -mavx512f this emits
.file   "abcdeq.c"
.text
.align 16
.globl  foo
.deffoo;.scl2;  .type   32; .endef
.seh_proc   foo
foo:
subq$40, %rsp
.seh_stackalloc 40
vmovaps %xmm14, (%rsp)
.seh_savexmm%xmm14, 0
vmovaps %xmm18, 16(%rsp)
.seh_savexmm%xmm18, 16
.seh_endprologue
vaddsd  %xmm18, %xmm14, %xmm14
vaddsd  %xmm18, %xmm14, %xmm18
vmovaps (%rsp), %xmm14
vmovaps 16(%rsp), %xmm18
addq$40, %rsp
ret
.seh_endproc
.ident  "GCC: (GNU) 10.0.1 20200207 (experimental)"
Does whatever assembler mingw64 uses even assemble this (I mean the
.seh_savexmm %xmm16, 16 could be problematic)?
I can find e.g.
https://stackoverflow.com/questions/43152633/invalid-register-for-seh-savexmm-in-cygwin/43210527
which then links to
https://gcc.gnu.org/PR65782

So, I'd say we want to add PR target/65782 to the ChangeLog entry in the
patch, and likely add a testcase like the above, so like below?

Have you tested the earlier version of the patch on mingw64 or cygwin?
If yes, can you just test the testcase from the following patch, without and
with the i386.h part and verify it FAILs without and PASSes with it?
Just make check-gcc RUNTESTFLAGS=i386.exp=pr65782.c should do?
If not, can you please test the whole patch?

2020-02-07  Uroš Bizjak  
Jakub Jelinek  

PR target/65782
* config/i386/config/i386/i386.h (CALL_USED_REGISTERS): Make
xmm16-xmm31 call-used even in 64-bit ms-abi.

* gcc.target/i386/pr65782.c: New test.

--- gcc/config/i386/config/i386/i386.h.jj   2020-01-22 10:19:24.199221986 
+0100
+++ gcc/config/i386/config/i386/i386.h  2020-02-04 12:09:12.338341003 +0100
@@ -1128,9 +1128,9 @@ extern const char *host_detect_local_cpu
 /*xmm8,xmm9,xmm10,xmm11,xmm12,xmm13,xmm14,xmm15*/  \
  6,   6,6,6,6,6,6,6,   \
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/\
- 6,6, 6,6,6,6,6,6, \
+ 1,1, 1,1,1,1,1,1, \
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/\
- 6,6, 6,6,6,6,6,6, \
+ 1,1, 1,1,1,1,1,1, \
  /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/\
  1,   1,   1,   1,   1,   1,   1,   1 }
 
--- gcc/testsuite/gcc.target/i386/pr65782.c.jj  2020-02-07 12:21:09.472819018 
+0100
+++ gcc/testsuite/gcc.target/i386/pr65782.c 2020-02-07 12:24:06.820154495 
+0100
@@ -0,0 +1,16 @@
+/* PR target/65782 */
+/* { dg-do assemble { target { avx512vl && { ! ia32 } } } } */
+/* { dg-options "-O2 -mavx512vl" } */
+
+void
+foo (void)
+{
+  register double x __asm ("xmm14");
+  register double y __asm ("xmm18");
+  asm ("" : "=x" (x));
+  asm ("" : "=v" (y));
+  x += y;
+  y += x;
+  asm ("" : : "x" (x));
+  asm ("" : : "v" (y));
+}


Jakub



Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI

2020-02-07 Thread JonY
On 2/6/20 6:07 AM, Jakub Jelinek wrote:
> On Thu, Feb 06, 2020 at 01:00:36AM +, JonY wrote:
>> On 2/4/20 11:42 AM, Jakub Jelinek wrote:
>>> Hi!
>>>
>>> On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
 I guess that Comment #9 patch form the PR should be trivially correct,
 but althouhg it looks obvious, I don't want to propose the patch since
 I have no means of testing it.
>>>
>>> I don't have means of testing it either.
>>> https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019
>>> is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low
>>> 128-bits only) are call preserved.
>>>
>>> Jonathan, could you please test this if it is sufficient to just change
>>> CALL_USED_REGISTERS or if e.g. something in the pro/epilogue needs tweaking
>>> too?  Thanks.
>>
>> Is this patch testing still required? I just got back from traveling.
> 
> Yes, our reading of the MS ABI docs show that xmm16-31 are to be call used
> (not preserved over calls), while in gcc they are currently handled as
> preserved across the calls.
> 
>   Jakub
> 


--- original.s  2020-02-06 09:00:02.014638069 +
+++ new.s   2020-02-07 10:28:55.678317667 +
@@ -7,23 +7,23 @@
 qux:
subq$72, %rsp
.seh_stackalloc 72
-   vmovaps %xmm18, 48(%rsp)
-   .seh_savexmm%xmm18, 48
+   vmovaps %xmm6, 48(%rsp)
+   .seh_savexmm%xmm6, 48
.seh_endprologue
callbar
vmovapd %xmm0, %xmm1
-   vmovapd %xmm1, %xmm18
+   vmovapd %xmm1, %xmm6
callfoo
leaq32(%rsp), %rcx
-   vmovapd %xmm18, %xmm0
-   vmovaps %xmm0, 32(%rsp)
+   vmovapd %xmm6, %xmm0
+   vmovapd %xmm0, 32(%rsp)
callbaz
nop
-   vmovaps 48(%rsp), %xmm18
+   vmovaps 48(%rsp), %xmm6
addq$72, %rsp
ret
.seh_endproc
-   .ident  "GCC: (GNU) 10.0.0 20191024 (experimental)"
+   .ident  "GCC: (GNU) 10.0.1 20200206 (experimental)"
.defbar;.scl2;  .type   32; .endef
.deffoo;.scl2;  .type   32; .endef
.defbaz;.scl2;  .type   32; .endef

GCC with the patch now seems to put the variables in xmm6, unfortunately
I don't know enough of AVX or stack setups to know if that's all that is
needed.



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI

2020-02-05 Thread Jakub Jelinek
On Thu, Feb 06, 2020 at 01:00:36AM +, JonY wrote:
> On 2/4/20 11:42 AM, Jakub Jelinek wrote:
> > Hi!
> > 
> > On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
> >> I guess that Comment #9 patch form the PR should be trivially correct,
> >> but althouhg it looks obvious, I don't want to propose the patch since
> >> I have no means of testing it.
> > 
> > I don't have means of testing it either.
> > https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019
> > is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low
> > 128-bits only) are call preserved.
> > 
> > Jonathan, could you please test this if it is sufficient to just change
> > CALL_USED_REGISTERS or if e.g. something in the pro/epilogue needs tweaking
> > too?  Thanks.
> 
> Is this patch testing still required? I just got back from traveling.

Yes, our reading of the MS ABI docs show that xmm16-31 are to be call used
(not preserved over calls), while in gcc they are currently handled as
preserved across the calls.

Jakub



Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI

2020-02-05 Thread JonY
On 2/4/20 11:42 AM, Jakub Jelinek wrote:
> Hi!
> 
> On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
>> I guess that Comment #9 patch form the PR should be trivially correct,
>> but althouhg it looks obvious, I don't want to propose the patch since
>> I have no means of testing it.
> 
> I don't have means of testing it either.
> https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019
> is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low
> 128-bits only) are call preserved.
> 
> Jonathan, could you please test this if it is sufficient to just change
> CALL_USED_REGISTERS or if e.g. something in the pro/epilogue needs tweaking
> too?  Thanks.

Is this patch testing still required? I just got back from traveling.



signature.asc
Description: OpenPGP digital signature