[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2023-06-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #21 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:a90f558bbb87c0b5d2b1e07d55bd585b2285cf3d

commit r14-2114-ga90f558bbb87c0b5d2b1e07d55bd585b2285cf3d
Author: liuhongt 
Date:   Mon Jun 26 13:59:29 2023 +0800

Don't issue vzeroupper for vzeroupper call_insn.

gcc/ChangeLog:

PR target/82735
* config/i386/i386.cc (ix86_avx_u127_mode_needed): Don't emit
vzeroupper for vzeroupper call_insn.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-vzeroupper-30.c: New test.

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-06-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #20 from CVS Commits  ---
The master branch has been updated by H.J. Lu :

https://gcc.gnu.org/g:e89759fdfc80db223bd852aba937acb2d7c2cd80

commit r12-1265-ge89759fdfc80db223bd852aba937acb2d7c2cd80
Author: H.J. Lu 
Date:   Mon Jun 7 11:43:25 2021 -0700

x86: Don't compile pr82735-[345].c for x32

Since -mabi=ms isn't compatible with x32, skip pr82735-[345].c for x32.

PR target/82735
* gcc.target/i386/pr82735-3.c: Don't compile for x32.
* gcc.target/i386/pr82735-4.c: Likewise.
* gcc.target/i386/pr82735-5.c: Likewise.

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-06-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #19 from Hongtao.liu  ---
Fixed in GCC12.

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-06-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #18 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:9a90b311f22956addaf4f5f9bdb3592afd45083f

commit r12-1253-g9a90b311f22956addaf4f5f9bdb3592afd45083f
Author: liuhongt 
Date:   Tue Jun 1 09:09:44 2021 +0800

Fix _mm256_zeroupper by representing the instructions as call_insns in
which the call has a special vzeroupper ABI.

When __builtin_ia32_vzeroupper is called explicitly, the corresponding
vzeroupper pattern does not carry any CLOBBERS or SETs before LRA,
which leads to incorrect optimization in pass_reload. In order to
solve this problem, this patch refine instructions as call_insns in
which the call has a special vzeroupper ABI.

gcc/ChangeLog:

PR target/82735
* config/i386/i386-expand.c (ix86_expand_builtin): Remove
assignment of cfun->machine->has_explicit_vzeroupper.
* config/i386/i386-features.c
(ix86_add_reg_usage_to_vzerouppers): Delete.
(ix86_add_reg_usage_to_vzeroupper): Ditto.
(rest_of_handle_insert_vzeroupper): Remove
ix86_add_reg_usage_to_vzerouppers, add df_analyze at the end
of the function.
(gate): Remove cfun->machine->has_explicit_vzeroupper.
* config/i386/i386-protos.h (ix86_expand_avx_vzeroupper):
Declared.
* config/i386/i386.c (ix86_insn_callee_abi): New function.
(ix86_initialize_callee_abi): Ditto.
(ix86_expand_avx_vzeroupper): Ditto.
(ix86_hard_regno_call_part_clobbered): Adjust for vzeroupper
ABI.
(TARGET_INSN_CALLEE_ABI): Define as ix86_insn_callee_abi.
(ix86_emit_mode_set): Call ix86_expand_avx_vzeroupper
directly.
* config/i386/i386.h (struct GTY(()) machine_function): Delete
has_explicit_vzeroupper.
* config/i386/i386.md (enum unspec): New member
UNSPEC_CALLEE_ABI.
(ABI_DEFAULT,ABI_VZEROUPPER,ABI_UNKNOWN): New
define_constants for insn callee abi index.
* config/i386/predicates.md (vzeroupper_pattern): Adjust.
* config/i386/sse.md (UNSPECV_VZEROUPPER): Deleted.
(avx_vzeroupper): Call ix86_expand_avx_vzeroupper.
(*avx_vzeroupper): Rename to ..
(avx_vzeroupper_callee_abi): .. this, and adjust pattern as
call_insn which has a special vzeroupper ABI.
(*avx_vzeroupper_1): Deleted.

gcc/testsuite/ChangeLog:

PR target/82735
* gcc.target/i386/pr82735-1.c: New test.
* gcc.target/i386/pr82735-2.c: New test.
* gcc.target/i386/pr82735-3.c: New test.
* gcc.target/i386/pr82735-4.c: New test.
* gcc.target/i386/pr82735-5.c: New test.

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-06-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #17 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:16465ceb06cc1f65cfca3c0eb2c1ee27ab03bdfd

commit r12-1252-g16465ceb06cc1f65cfca3c0eb2c1ee27ab03bdfd
Author: liuhongt 
Date:   Tue Jun 1 09:00:57 2021 +0800

CALL_INSN may not be a real function call.

Use "used" flag for CALL_INSN to indicate it's a fake call. If it's a
fake call, it won't have its own function stack.

gcc/ChangeLog

PR target/82735
* df-scan.c (df_get_call_refs): When call_insn is a fake call,
it won't use stack pointer reg.
* final.c (leaf_function_p): When call_insn is a fake call, it
won't affect caller as a leaf function.
* reg-stack.c (callee_clobbers_any_stack_reg): New.
(subst_stack_regs): When call_insn doesn't clobber any stack
reg, don't clear the arguments.
* rtl.c (shallow_copy_rtx): Don't clear flag used when orig is
a insn.
* shrink-wrap.c (requires_stack_frame_p): No need for stack
frame for a fake call.
* rtl.h (FAKE_CALL_P): New macro.

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-05-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #16 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #15)
> (In reply to Hongtao.liu from comment #14)
> > (In reply to Hongtao.liu from comment #12)
> > > (In reply to Jakub Jelinek from comment #10)
> > > > Last touched in PR99563.
> > > > I guess for the explicit user vzeroupper we need to add the 
> > > > clobbers/sets
> > > > earlier than in the vzeroupper pass, but ideally in a way that doesn't 
> > > > force
> > > > save/restore of registers that aren't really needed in the function.
> > > 
> > > Yes, if we want to add the clobbers/sets earlier(than CSE1), vzeroupper 
> > > pass
> > > should be able to remove those unnecessary clobbers/sets.
> > 
> > Correct typo, add the clobbers/sets earlier(than RA)
> 
> I'm trying to add a post_reload splitter to add CLOBBERS of xmm to
> vzeroupper so that LRA knows vzeroupper will kill those xmm registers, then
> in pass_vzeroupper, transform those CLOBBERS to SET (xmm, xmm), it will
> benifit post_reload CSE which allow lower 128bits to cross vzeroupper, then
> in post_reload split2, drop those SETs, it's safe since there's no CSE
> between split2 and split3, problem is there's no update for data flow info
> between split2 and pro_and_epilog which mean even i manually drop those
> SETS, xmm6-xmm15 are still marked as used which causes redudant save and
> restore under 64-bit MSabi.
> 
> I'm thinking of adding a target_hook for updating df info just in the
> begenning of pass_pro_and_epilogue, the default behavior of the target_hook
> is doing nothing , and in i386 backend, df_analyse is called only under
> TARGET_AVX && cfun->machine->has_explicit_vzeroupper.

Oh, regs_ever_live isn't recomputed, that's why even i manually drop those SETs
is post_reload splitter, but xmm6 to xmm15 is still marked as live.

/* After reload, some ports add certain bits to regs_ever_live so
 this cannot be reset.  */

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-05-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #15 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #14)
> (In reply to Hongtao.liu from comment #12)
> > (In reply to Jakub Jelinek from comment #10)
> > > Last touched in PR99563.
> > > I guess for the explicit user vzeroupper we need to add the clobbers/sets
> > > earlier than in the vzeroupper pass, but ideally in a way that doesn't 
> > > force
> > > save/restore of registers that aren't really needed in the function.
> > 
> > Yes, if we want to add the clobbers/sets earlier(than CSE1), vzeroupper pass
> > should be able to remove those unnecessary clobbers/sets.
> 
> Correct typo, add the clobbers/sets earlier(than RA)

I'm trying to add a post_reload splitter to add CLOBBERS of xmm to vzeroupper
so that LRA knows vzeroupper will kill those xmm registers, then in
pass_vzeroupper, transform those CLOBBERS to SET (xmm, xmm), it will benifit
post_reload CSE which allow lower 128bits to cross vzeroupper, then in
post_reload split2, drop those SETs, it's safe since there's no CSE between
split2 and split3, problem is there's no update for data flow info between
split2 and pro_and_epilog which mean even i manually drop those SETS,
xmm6-xmm15 are still marked as used which causes redudant save and restore
under 64-bit MSabi.

I'm thinking of adding a target_hook for updating df info just in the begenning
of pass_pro_and_epilogue, the default behavior of the target_hook is doing
nothing , and in i386 backend, df_analyse is called only under TARGET_AVX &&
cfun->machine->has_explicit_vzeroupper.

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-04-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #14 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #12)
> (In reply to Jakub Jelinek from comment #10)
> > Last touched in PR99563.
> > I guess for the explicit user vzeroupper we need to add the clobbers/sets
> > earlier than in the vzeroupper pass, but ideally in a way that doesn't force
> > save/restore of registers that aren't really needed in the function.
> 
> Yes, if we want to add the clobbers/sets earlier(than CSE1), vzeroupper pass
> should be able to remove those unnecessary clobbers/sets.

Correct typo, add the clobbers/sets earlier(than RA)

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-04-28 Thread marcin.slusarz at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #13 from Marcin Ślusarz  ---
FTR, to reproduce this problem with gcc 9 and 10 I had to either replace -mavx
with -march=native or add -mtune=native. The problem starts reproducing with
-march=haswell.

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-04-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #12 from Hongtao.liu  ---
(In reply to Jakub Jelinek from comment #10)
> Last touched in PR99563.
> I guess for the explicit user vzeroupper we need to add the clobbers/sets
> earlier than in the vzeroupper pass, but ideally in a way that doesn't force
> save/restore of registers that aren't really needed in the function.

Yes, if we want to add the clobbers/sets earlier(than CSE1), vzeroupper pass
should be able to remove those unnecessary clobbers/sets.

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-04-28 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #11 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #9)
> (In reply to Richard Biener from comment #4)
> > Indeed as far as I understand an unspec volatile isn't sth clobbering
> > registers (not even memory?!).  The insn is missing inputs/outputs
> > (we might be able to model that lowparts are preserved).
> 
> Since the instruction operates on the whole pack (8 or 16 SSE registers),
> this approach will introduce uninitialized uses, and will clobber the whole
> register pack. Since SSE registers are callee-saved this means all registers
> will be saved in function prologue and restored in the epilogue.

I was typing a bit too fast here: SSE registers are NOT preserved across
function calls for SYSV ABI, and lower 128bit are preserved for MS ABI.

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-04-28 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #10 from Jakub Jelinek  ---
Last touched in PR99563.
I guess for the explicit user vzeroupper we need to add the clobbers/sets
earlier than in the vzeroupper pass, but ideally in a way that doesn't force
save/restore of registers that aren't really needed in the function.

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-04-28 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #9 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #4)
> Indeed as far as I understand an unspec volatile isn't sth clobbering
> registers (not even memory?!).  The insn is missing inputs/outputs
> (we might be able to model that lowparts are preserved).

Since the instruction operates on the whole pack (8 or 16 SSE registers), this
approach will introduce uninitialized uses, and will clobber the whole register
pack. Since SSE registers are callee-saved this means all registers will be
saved in function prologue and restored in the epilogue.

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-04-28 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #8 from Uroš Bizjak  ---
(In reply to Hongtao.liu from comment #7)
> Confirmed, let me fix this.

Please note that the current definition of vzeroupper does not model effects of
the instruction at all. The current definition is intended to handle automatic
vzeroupper insertion, and the builtin somehow slipped out of the mind...

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-04-25 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

Hongtao.liu  changed:

   What|Removed |Added

 CC||crazylht at gmail dot com

--- Comment #7 from Hongtao.liu  ---
Confirmed, let me fix this.

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-04-24 Thread noloader at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #6 from Jeffrey Walton  ---
Add 9.3 to the know to fail list:

$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-04-24 Thread noloader at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #5 from Jeffrey Walton  ---
I think we are seeing this bug in the field. We are catching lots of failed
self tests as we test on multiple platforms, including Ubuntu 14 ERS and Ubuntu
16 LTS.

The problem makes GCC 4.8.4 through 7.5 practically useless for AVX and AVX2.

I don't see the problem with GCC 9.3.

Maybe the problem got fixed somewhere along the way?

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2017-10-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #4 from Richard Biener  ---
Indeed as far as I understand an unspec volatile isn't sth clobbering
registers (not even memory?!).  The insn is missing inputs/outputs
(we might be able to model that lowparts are preserved).

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2017-10-26 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #3 from Marc Glisse  ---
Actually, what CSE1 does might be fine, and it is LRA that should have noticed
that the register it assigned was clobbered, so it should have spilled (or
better rematerialized). Assuming the i386 backend does say that this unspec
clobbers the registers, which I am not seeing right now (but I may not be
looking in the right place).

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2017-10-26 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

Marc Glisse  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-10-26
 Ever confirmed|0   |1

--- Comment #2 from Marc Glisse  ---
CSE1 happily turns uses of the second constant, loaded after vzeroupper, into
uses of the first constant, loaded before, ignoring the fact that vzeroupper
clobbers (the upper part of) all avx registers.

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2017-10-26 Thread marcin.slusarz at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #1 from Marcin Ślusarz  ---
Heh, there are really stupid bugs in both files. Thankfully they don't change
the outcome.

Updated code:
$ cat main.c 
#include 
#include 

void test(char *dest);

int main()
{
char buf[64];
memset(buf, 0x2, 64);
test(buf);
for (int i = 0; i < 32; ++i)
printf("%d ", buf[i]);
printf("\n");
return 0;
}
$ cat zeroupper.c 
#include 

void test(char *dest)
{
__m256i ymm1 = _mm256_set1_epi8((char)0x1);
_mm256_storeu_si256((__m256i *)(dest + 32), ymm1);
_mm256_zeroupper();
__m256i ymm2 = _mm256_set1_epi8((char)0x1);
_mm256_storeu_si256((__m256i *)dest, ymm2);
}

Still the output is:
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0