Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI
On Sat, Feb 8, 2020 at 11:52 AM Jakub Jelinek wrote: > > On Sat, Feb 08, 2020 at 11:32:40AM +0100, Uros Bizjak wrote: > > I think that the patch should also be backported to gcc-9 branch. The > > change is backward compatible, since the new code will save and > > restore zmm16+ registers at the caller site, and the old code (e.g. > > existing libraries) will then unnecessarily save and restore regs in > > the callee. > > Actually, it isn't backward compatible. > If both caller and callee touch xmm16+ (sure, unlikely), then if caller > is compiled by new gcc and callee by old gcc it is the case you described, > caller doesn't try to preserve values across the call and callee uselessly > saves them anyway (unless it fails to assemble due to SEH). > But if caller is compiled by old gcc and callee by new gcc, caller assumes > it can preserve values across the call in those registers and callee doesn't > save them and clobbers them. I was considering only the most likely scenario where the compiler is upgraded without recompiling existing system libraries. So, with the new compiler installed, the new compiled code calls either new or old libraries. *Assuming* that old libraries never call new code, the situation is OK. As you point out, when the old code calls new code, registers get clobbered. By not backporting the patch, we can consider gcc-9 and gcc-10 ABI incompatible, otherwise the forward incompatibility moves in between gcc-9.2 and gcc-9.3. I agree that the former is somehow more convenient. IMO, this issue should be pointed out in the compiler release notes in any case. Uros.
Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI
On Sat, Feb 08, 2020 at 11:32:40AM +0100, Uros Bizjak wrote: > I think that the patch should also be backported to gcc-9 branch. The > change is backward compatible, since the new code will save and > restore zmm16+ registers at the caller site, and the old code (e.g. > existing libraries) will then unnecessarily save and restore regs in > the callee. Actually, it isn't backward compatible. If both caller and callee touch xmm16+ (sure, unlikely), then if caller is compiled by new gcc and callee by old gcc it is the case you described, caller doesn't try to preserve values across the call and callee uselessly saves them anyway (unless it fails to assemble due to SEH). But if caller is compiled by old gcc and callee by new gcc, caller assumes it can preserve values across the call in those registers and callee doesn't save them and clobbers them. Jakub
Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI
On Sat, Feb 08, 2020 at 11:32:40AM +0100, Uros Bizjak wrote: > On Sat, Feb 8, 2020 at 11:05 AM Jakub Jelinek wrote: > > > > On Sat, Feb 08, 2020 at 08:24:38AM +, JonY wrote: > > > It does not, I just checked with the master branch of binutils. > > ... > > > I did a -c test build with an older toolchain, it fails to compile > > > (invalid register for .seh_savexmm) while the latest gcc is passing, > > > both are using the same binutils version. > > > > > > I guess that covers it? > > > > I went ahead and committed the patch (with the double config/i386/ in it > > fixed obviously). > > I think that the patch should also be backported to gcc-9 branch. The > change is backward compatible, since the new code will save and > restore zmm16+ registers at the caller site, and the old code (e.g. > existing libraries) will then unnecessarily save and restore regs in > the callee. Agreed, will do together with other backports soon. Jakub
Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI
On Sat, Feb 8, 2020 at 11:05 AM Jakub Jelinek wrote: > > On Sat, Feb 08, 2020 at 08:24:38AM +, JonY wrote: > > It does not, I just checked with the master branch of binutils. > ... > > I did a -c test build with an older toolchain, it fails to compile > > (invalid register for .seh_savexmm) while the latest gcc is passing, > > both are using the same binutils version. > > > > I guess that covers it? > > I went ahead and committed the patch (with the double config/i386/ in it > fixed obviously). I think that the patch should also be backported to gcc-9 branch. The change is backward compatible, since the new code will save and restore zmm16+ registers at the caller site, and the old code (e.g. existing libraries) will then unnecessarily save and restore regs in the callee. HJ fixed some other MSABI issue there. Uros. > > > > 2020-02-07 Uroš Bizjak > > > Jakub Jelinek > > > > > > PR target/65782 > > > * config/i386/config/i386/i386.h (CALL_USED_REGISTERS): Make > > > xmm16-xmm31 call-used even in 64-bit ms-abi. > > > > > > * gcc.target/i386/pr65782.c: New test. > > Jakub >
Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI
On Sat, Feb 08, 2020 at 08:24:38AM +, JonY wrote: > It does not, I just checked with the master branch of binutils. ... > I did a -c test build with an older toolchain, it fails to compile > (invalid register for .seh_savexmm) while the latest gcc is passing, > both are using the same binutils version. > > I guess that covers it? I went ahead and committed the patch (with the double config/i386/ in it fixed obviously). > > 2020-02-07 Uroš Bizjak > > Jakub Jelinek > > > > PR target/65782 > > * config/i386/config/i386/i386.h (CALL_USED_REGISTERS): Make > > xmm16-xmm31 call-used even in 64-bit ms-abi. > > > > * gcc.target/i386/pr65782.c: New test. Jakub
Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI
On 2/7/20 11:28 AM, Jakub Jelinek wrote: > On Fri, Feb 07, 2020 at 10:57:22AM +, JonY wrote: Is this patch testing still required? I just got back from traveling. >>> >>> Yes, our reading of the MS ABI docs show that xmm16-31 are to be call used >>> (not preserved over calls), while in gcc they are currently handled as >>> preserved across the calls. > > The other parts are I guess mainly about SEH. Consider e.g. > void > foo (void) > { > register double x __asm ("xmm14"); > register double y __asm ("xmm18"); > asm ("" : "=x" (x)); > asm ("" : "=v" (y)); > x += y; > y += x; > asm ("" : : "x" (x)); > asm ("" : : "v" (y)); > } > looking at cross-compiler output, with -O2 -mavx512f this emits > .file "abcdeq.c" > .text > .align 16 > .globl foo > .deffoo;.scl2; .type 32; .endef > .seh_proc foo > foo: > subq$40, %rsp > .seh_stackalloc 40 > vmovaps %xmm14, (%rsp) > .seh_savexmm%xmm14, 0 > vmovaps %xmm18, 16(%rsp) > .seh_savexmm%xmm18, 16 > .seh_endprologue > vaddsd %xmm18, %xmm14, %xmm14 > vaddsd %xmm18, %xmm14, %xmm18 > vmovaps (%rsp), %xmm14 > vmovaps 16(%rsp), %xmm18 > addq$40, %rsp > ret > .seh_endproc > .ident "GCC: (GNU) 10.0.1 20200207 (experimental)" > Does whatever assembler mingw64 uses even assemble this (I mean the > .seh_savexmm %xmm16, 16 could be problematic)?'' It does not, I just checked with the master branch of binutils. > I can find e.g. > https://stackoverflow.com/questions/43152633/invalid-register-for-seh-savexmm-in-cygwin/43210527 > which then links to > https://gcc.gnu.org/PR65782 > > So, I'd say we want to add PR target/65782 to the ChangeLog entry in the > patch, and likely add a testcase like the above, so like below? > > Have you tested the earlier version of the patch on mingw64 or cygwin? > If yes, can you just test the testcase from the following patch, without and > with the i386.h part and verify it FAILs without and PASSes with it? > Just make check-gcc RUNTESTFLAGS=i386.exp=pr65782.c should do? > If not, can you please test the whole patch? > I did a -c test build with an older toolchain, it fails to compile (invalid register for .seh_savexmm) while the latest gcc is passing, both are using the same binutils version. I guess that covers it? > 2020-02-07 Uroš Bizjak > Jakub Jelinek > > PR target/65782 > * config/i386/config/i386/i386.h (CALL_USED_REGISTERS): Make > xmm16-xmm31 call-used even in 64-bit ms-abi. > > * gcc.target/i386/pr65782.c: New test. > > --- gcc/config/i386/config/i386/i386.h.jj 2020-01-22 10:19:24.199221986 > +0100 > +++ gcc/config/i386/config/i386/i386.h2020-02-04 12:09:12.338341003 > +0100 > @@ -1128,9 +1128,9 @@ extern const char *host_detect_local_cpu > /*xmm8,xmm9,xmm10,xmm11,xmm12,xmm13,xmm14,xmm15*/\ > 6, 6,6,6,6,6,6,6, \ > /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/ \ > - 6,6, 6,6,6,6,6,6, \ > + 1,1, 1,1,1,1,1,1, \ > /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/ \ > - 6,6, 6,6,6,6,6,6, \ > + 1,1, 1,1,1,1,1,1, \ > /* k0, k1, k2, k3, k4, k5, k6, k7*/ \ > 1, 1, 1, 1, 1, 1, 1, 1 } > > --- gcc/testsuite/gcc.target/i386/pr65782.c.jj2020-02-07 > 12:21:09.472819018 +0100 > +++ gcc/testsuite/gcc.target/i386/pr65782.c 2020-02-07 12:24:06.820154495 > +0100 > @@ -0,0 +1,16 @@ > +/* PR target/65782 */ > +/* { dg-do assemble { target { avx512vl && { ! ia32 } } } } */ > +/* { dg-options "-O2 -mavx512vl" } */ > + > +void > +foo (void) > +{ > + register double x __asm ("xmm14"); > + register double y __asm ("xmm18"); > + asm ("" : "=x" (x)); > + asm ("" : "=v" (y)); > + x += y; > + y += x; > + asm ("" : : "x" (x)); > + asm ("" : : "v" (y)); > +} > > > Jakub > signature.asc Description: OpenPGP digital signature
Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI
On Fri, Feb 07, 2020 at 10:57:22AM +, JonY wrote: > >> Is this patch testing still required? I just got back from traveling. > > > > Yes, our reading of the MS ABI docs show that xmm16-31 are to be call used > > (not preserved over calls), while in gcc they are currently handled as > > preserved across the calls. The other parts are I guess mainly about SEH. Consider e.g. void foo (void) { register double x __asm ("xmm14"); register double y __asm ("xmm18"); asm ("" : "=x" (x)); asm ("" : "=v" (y)); x += y; y += x; asm ("" : : "x" (x)); asm ("" : : "v" (y)); } looking at cross-compiler output, with -O2 -mavx512f this emits .file "abcdeq.c" .text .align 16 .globl foo .deffoo;.scl2; .type 32; .endef .seh_proc foo foo: subq$40, %rsp .seh_stackalloc 40 vmovaps %xmm14, (%rsp) .seh_savexmm%xmm14, 0 vmovaps %xmm18, 16(%rsp) .seh_savexmm%xmm18, 16 .seh_endprologue vaddsd %xmm18, %xmm14, %xmm14 vaddsd %xmm18, %xmm14, %xmm18 vmovaps (%rsp), %xmm14 vmovaps 16(%rsp), %xmm18 addq$40, %rsp ret .seh_endproc .ident "GCC: (GNU) 10.0.1 20200207 (experimental)" Does whatever assembler mingw64 uses even assemble this (I mean the .seh_savexmm %xmm16, 16 could be problematic)? I can find e.g. https://stackoverflow.com/questions/43152633/invalid-register-for-seh-savexmm-in-cygwin/43210527 which then links to https://gcc.gnu.org/PR65782 So, I'd say we want to add PR target/65782 to the ChangeLog entry in the patch, and likely add a testcase like the above, so like below? Have you tested the earlier version of the patch on mingw64 or cygwin? If yes, can you just test the testcase from the following patch, without and with the i386.h part and verify it FAILs without and PASSes with it? Just make check-gcc RUNTESTFLAGS=i386.exp=pr65782.c should do? If not, can you please test the whole patch? 2020-02-07 Uroš Bizjak Jakub Jelinek PR target/65782 * config/i386/config/i386/i386.h (CALL_USED_REGISTERS): Make xmm16-xmm31 call-used even in 64-bit ms-abi. * gcc.target/i386/pr65782.c: New test. --- gcc/config/i386/config/i386/i386.h.jj 2020-01-22 10:19:24.199221986 +0100 +++ gcc/config/i386/config/i386/i386.h 2020-02-04 12:09:12.338341003 +0100 @@ -1128,9 +1128,9 @@ extern const char *host_detect_local_cpu /*xmm8,xmm9,xmm10,xmm11,xmm12,xmm13,xmm14,xmm15*/ \ 6, 6,6,6,6,6,6,6, \ /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/\ - 6,6, 6,6,6,6,6,6, \ + 1,1, 1,1,1,1,1,1, \ /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/\ - 6,6, 6,6,6,6,6,6, \ + 1,1, 1,1,1,1,1,1, \ /* k0, k1, k2, k3, k4, k5, k6, k7*/\ 1, 1, 1, 1, 1, 1, 1, 1 } --- gcc/testsuite/gcc.target/i386/pr65782.c.jj 2020-02-07 12:21:09.472819018 +0100 +++ gcc/testsuite/gcc.target/i386/pr65782.c 2020-02-07 12:24:06.820154495 +0100 @@ -0,0 +1,16 @@ +/* PR target/65782 */ +/* { dg-do assemble { target { avx512vl && { ! ia32 } } } } */ +/* { dg-options "-O2 -mavx512vl" } */ + +void +foo (void) +{ + register double x __asm ("xmm14"); + register double y __asm ("xmm18"); + asm ("" : "=x" (x)); + asm ("" : "=v" (y)); + x += y; + y += x; + asm ("" : : "x" (x)); + asm ("" : : "v" (y)); +} Jakub
Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI
On 2/6/20 6:07 AM, Jakub Jelinek wrote: > On Thu, Feb 06, 2020 at 01:00:36AM +, JonY wrote: >> On 2/4/20 11:42 AM, Jakub Jelinek wrote: >>> Hi! >>> >>> On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote: I guess that Comment #9 patch form the PR should be trivially correct, but althouhg it looks obvious, I don't want to propose the patch since I have no means of testing it. >>> >>> I don't have means of testing it either. >>> https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019 >>> is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low >>> 128-bits only) are call preserved. >>> >>> Jonathan, could you please test this if it is sufficient to just change >>> CALL_USED_REGISTERS or if e.g. something in the pro/epilogue needs tweaking >>> too? Thanks. >> >> Is this patch testing still required? I just got back from traveling. > > Yes, our reading of the MS ABI docs show that xmm16-31 are to be call used > (not preserved over calls), while in gcc they are currently handled as > preserved across the calls. > > Jakub > --- original.s 2020-02-06 09:00:02.014638069 + +++ new.s 2020-02-07 10:28:55.678317667 + @@ -7,23 +7,23 @@ qux: subq$72, %rsp .seh_stackalloc 72 - vmovaps %xmm18, 48(%rsp) - .seh_savexmm%xmm18, 48 + vmovaps %xmm6, 48(%rsp) + .seh_savexmm%xmm6, 48 .seh_endprologue callbar vmovapd %xmm0, %xmm1 - vmovapd %xmm1, %xmm18 + vmovapd %xmm1, %xmm6 callfoo leaq32(%rsp), %rcx - vmovapd %xmm18, %xmm0 - vmovaps %xmm0, 32(%rsp) + vmovapd %xmm6, %xmm0 + vmovapd %xmm0, 32(%rsp) callbaz nop - vmovaps 48(%rsp), %xmm18 + vmovaps 48(%rsp), %xmm6 addq$72, %rsp ret .seh_endproc - .ident "GCC: (GNU) 10.0.0 20191024 (experimental)" + .ident "GCC: (GNU) 10.0.1 20200206 (experimental)" .defbar;.scl2; .type 32; .endef .deffoo;.scl2; .type 32; .endef .defbaz;.scl2; .type 32; .endef GCC with the patch now seems to put the variables in xmm6, unfortunately I don't know enough of AVX or stack setups to know if that's all that is needed. signature.asc Description: OpenPGP digital signature
Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI
On Thu, Feb 06, 2020 at 01:00:36AM +, JonY wrote: > On 2/4/20 11:42 AM, Jakub Jelinek wrote: > > Hi! > > > > On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote: > >> I guess that Comment #9 patch form the PR should be trivially correct, > >> but althouhg it looks obvious, I don't want to propose the patch since > >> I have no means of testing it. > > > > I don't have means of testing it either. > > https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019 > > is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low > > 128-bits only) are call preserved. > > > > Jonathan, could you please test this if it is sufficient to just change > > CALL_USED_REGISTERS or if e.g. something in the pro/epilogue needs tweaking > > too? Thanks. > > Is this patch testing still required? I just got back from traveling. Yes, our reading of the MS ABI docs show that xmm16-31 are to be call used (not preserved over calls), while in gcc they are currently handled as preserved across the calls. Jakub
Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI
On 2/4/20 11:42 AM, Jakub Jelinek wrote: > Hi! > > On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote: >> I guess that Comment #9 patch form the PR should be trivially correct, >> but althouhg it looks obvious, I don't want to propose the patch since >> I have no means of testing it. > > I don't have means of testing it either. > https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019 > is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low > 128-bits only) are call preserved. > > Jonathan, could you please test this if it is sufficient to just change > CALL_USED_REGISTERS or if e.g. something in the pro/epilogue needs tweaking > too? Thanks. Is this patch testing still required? I just got back from traveling. signature.asc Description: OpenPGP digital signature
[PATCH] i386: Make xmm16-xmm31 call used even in ms ABI
Hi! On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote: > I guess that Comment #9 patch form the PR should be trivially correct, > but althouhg it looks obvious, I don't want to propose the patch since > I have no means of testing it. I don't have means of testing it either. https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019 is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low 128-bits only) are call preserved. Jonathan, could you please test this if it is sufficient to just change CALL_USED_REGISTERS or if e.g. something in the pro/epilogue needs tweaking too? Thanks. We are talking e.g. about /* { dg-options "-O2 -mabi=ms -mavx512vl" } */ typedef double V __attribute__((vector_size (16))); void foo (void); V bar (void); void baz (V); void qux (void) { V c; { register V a __asm ("xmm18"); V b = bar (); asm ("" : "=x" (a) : "0" (b)); c = a; } foo (); { register V d __asm ("xmm18"); V e; d = c; asm ("" : "=x" (e) : "0" (d)); baz (e); } } where according to the MSDN doc gcc incorrectly holds the c value in xmm18 register across the foo call; if foo is compiled by some Microsoft compiler (or LLVM), then it could clobber %xmm18. If all xmm18 occurrences are changed to say xmm15, then it is valid to hold the 128-bit value across the foo call (though, surprisingly, LLVM saves it into stack anyway). 2020-02-04 Uroš Bizjak * config/i386/config/i386/i386.h (CALL_USED_REGISTERS): Make xmm16-xmm31 call-used even in 64-bit ms-abi. --- gcc/config/i386/config/i386/i386.h.jj 2020-01-22 10:19:24.199221986 +0100 +++ gcc/config/i386/config/i386/i386.h 2020-02-04 12:09:12.338341003 +0100 @@ -1128,9 +1128,9 @@ extern const char *host_detect_local_cpu /*xmm8,xmm9,xmm10,xmm11,xmm12,xmm13,xmm14,xmm15*/ \ 6, 6,6,6,6,6,6,6, \ /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/\ - 6,6, 6,6,6,6,6,6, \ + 1,1, 1,1,1,1,1,1, \ /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/\ - 6,6, 6,6,6,6,6,6, \ + 1,1, 1,1,1,1,1,1, \ /* k0, k1, k2, k3, k4, k5, k6, k7*/\ 1, 1, 1, 1, 1, 1, 1, 1 } Jakub