Re: [PATCH]rs6000: avoid peeking eof after __vector keyword

2022-03-21 Thread Jiufu Guo via Gcc-patches


Hi!

Segher Boessenkool  writes:

> On Mon, Mar 21, 2022 at 02:14:08PM -0400, David Edelsohn wrote:
>> On Mon, Mar 21, 2022 at 5:13 AM Jiufu Guo  wrote:
>> > There is a rare corner case where __vector is followed only by ";"
>> > near the end of the file.
>
>> This is okay. Maybe a tweak to the comment, see below.
>
> This whole function could use some restructuring / rewriting to make
> clearer what it actually does.  See the function comment:
>
> /* Called to decide whether a conditional macro should be expanded.
>Since we have exactly one such macro (i.e, 'vector'), we do not
>need to examine the 'tok' parameter.  */
>
> ... followed by 17 uses of "tok".  Yes, some of those overwrite the
> function argument, but that doesn't make it any better!  :-P
>
> Some factoring would help, too, perhaps.

Thanks for your review!

I was also confused by it when I first examined this function.
'tok' is used directly at the beginning, and then it is
overwritten by cpp_peek_token.
From the history of this function, even the first version already
contained this 'inconsistency' between comment and implementation. :-P

Judging from the related code, this function decides whether a
conditional macro should be expanded by peeking at two or more
tokens.
The context-sensitive macros are vector/bool/pixel.  The corresponding
keywords __vector/__bool/__pixel are unconditional.
Based on that code, rs6000_macro_to_expand checks whether the
'vector' token is followed by bool/__bool or pixel/__pixel.  To do
this, 'tok' has to be examined.
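A minimal C sketch of the lookahead described above, assuming a hypothetical token array in place of libcpp (token, peek_token and vector_should_expand are illustrative names, not GCC or libcpp functions). It deliberately models only the bool/__bool and pixel/__pixel check, and shows how a bounded peek that returns an EOF token keeps the lookahead from reading past the last token, the kind of end-of-file corner case this patch is about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token model for illustration; this is not the libcpp API.  */
typedef enum { TOK_NAME, TOK_PUNCT, TOK_EOF } tok_kind;
typedef struct { tok_kind kind; const char *spelling; } token;

/* Return the token N places after POS, or an EOF token when the stream
   ends first, so a lookahead never reads past the last token.  */
static token
peek_token (const token *toks, size_t ntoks, size_t pos, size_t n)
{
  static const token eof = { TOK_EOF, "" };
  if (pos + n >= ntoks)
    return eof;
  return toks[pos + n];
}

/* True if T is a name spelled A or B.  */
static bool
is_name (token t, const char *a, const char *b)
{
  return t.kind == TOK_NAME
         && (strcmp (t.spelling, a) == 0 || strcmp (t.spelling, b) == 0);
}

/* Simplified decision for the context-sensitive 'vector' at POS: expand
   only when the next token is bool/__bool or pixel/__pixel.  */
static bool
vector_should_expand (const token *toks, size_t ntoks, size_t pos)
{
  assert (pos < ntoks && toks[pos].kind == TOK_NAME
          && strcmp (toks[pos].spelling, "vector") == 0);
  token next = peek_token (toks, ntoks, pos, 1);
  /* A ';' or the end of input simply fails both tests.  */
  return is_name (next, "bool", "__bool") || is_name (next, "pixel", "__pixel");
}
```

With this shape, "vector" as the very last token yields the EOF token from peek_token instead of an out-of-bounds read.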

From this understanding, we could simply update the comment.
However, this patch does not cover that.


BR,
Jiufu

>
>
> Segher


[COMMITTED] print-tree:Avoid warnings of overflow

2022-03-21 Thread Qian Jianhua via Gcc-patches
This patch avoids two warnings of "'sprintf' may write a
terminating nul past the end of the destination
[-Wformat-overflow=]" when building GCC.

Tested on x86_64, and committed as obvious.

gcc/ChangeLog:

* print-tree.cc: Change array length
---
 gcc/print-tree.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/print-tree.cc b/gcc/print-tree.cc
index 0876da873a9..6d45a4a5966 100644
--- a/gcc/print-tree.cc
+++ b/gcc/print-tree.cc
@@ -776,7 +776,7 @@ print_node (FILE *file, const char *prefix, tree node, int indent,
{
  /* Buffer big enough to format a 32-bit UINT_MAX into, plus
 the text.  */
- char temp[15];
+ char temp[16];
 
  sprintf (temp, "arg:%d", i);
  print_node (file, temp, TREE_OPERAND (node, i), indent + 4);
@@ -886,7 +886,7 @@ print_node (FILE *file, const char *prefix, tree node, int indent,
  {
  /* Buffer big enough to format a 32-bit UINT_MAX into, plus
 the text.  */
-   char temp[15];
+   char temp[16];
sprintf (temp, "elt:%d", i);
print_node (file, temp, TREE_VEC_ELT (node, i), indent + 4);
  }
-- 
2.18.1
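The arithmetic behind the new size, assuming a 32-bit int: the longest string "%d" can produce is INT_MIN, "-2147483648" (11 characters), so "arg:" (4) + 11 + the terminating NUL (1) needs 16 bytes. UINT_MAX, which the source comment mentions, has only 10 digits and needs 15, presumably where the old size came from. A small C check (bytes_needed is a hypothetical helper) that uses snprintf with a NULL buffer and size 0, which returns the length the output would have:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Bytes needed, including the terminating NUL, to format VALUE with FMT.
   snprintf with a NULL buffer and size 0 writes nothing and only
   reports the length of the would-be output.  */
static int
bytes_needed (const char *fmt, int value)
{
  int len = snprintf (NULL, 0, fmt, value);
  assert (len >= 0);
  return len + 1;
}
```

For example, bytes_needed ("arg:%d", INT_MIN) gives 16 on a target with a 32-bit int, one more than the old char temp[15].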





RE: [PATCH v3] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-21 Thread Liu, Hongtao via Gcc-patches



> -----Original Message-----
> From: Wang, Hongyu 
> Sent: Tuesday, March 22, 2022 11:28 AM
> To: Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [PATCH v3] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch
> [PR 104978]
> 
> Hi, here is the patch with force_reg before lowpart_subreg.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
> 
> Ok for master?
> 
> For complex scalar intrinsics like _mm_mask_fcmadd_sch, the mask should be
> ANDed with 1 so that only its lowest bit takes effect.
> Use a masked vmovss to perform the same operation, since it ignores the
> higher bits of the mask.
> 
> gcc/ChangeLog:
> 
>   PR target/104978
>   * config/i386/sse.md
>   (avx512fp16_fmaddcsh_v8hf_mask1   Use avx512f_movsf_mask instead of vmovaps or vblend, and
>   force_reg before lowpart_subreg.
>   (avx512fp16_fcmaddcsh_v8hf_mask1 
> gcc/testsuite/ChangeLog:
> 
>   PR target/104978
>   * gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c: Adjust asm scan.
>   * gcc.target/i386/avx512fp16-vfmaddcsh-1a.c: Ditto.
>   * gcc.target/i386/avx512fp16-vfcmaddcsh-1c.c: Removed.
>   * gcc.target/i386/avx512fp16-vfmaddcsh-1c.c: Ditto.
>   * gcc.target/i386/pr104978.c: New test.
> 
> V3
> ---
>  gcc/config/i386/sse.md| 62 ++-
>  .../i386/avx512fp16-vfcmaddcsh-1a.c   |  4 +-
>  .../i386/avx512fp16-vfcmaddcsh-1c.c   | 13 
>  .../gcc.target/i386/avx512fp16-vfmaddcsh-1a.c |  4 +-
>   .../gcc.target/i386/avx512fp16-vfmaddcsh-1c.c | 13 
>  gcc/testsuite/gcc.target/i386/pr104978.c  | 18 ++
>  6 files changed, 42 insertions(+), 72 deletions(-)
>  delete mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1c.c
>  delete mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104978.c
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 21bf3c55c95..6f7af2f21d6 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -6576,7 +6576,7 @@ (define_expand "avx512fp16_fmaddcsh_v8hf_mask1"
> (match_operand:QI 4 "register_operand")]
>"TARGET_AVX512FP16 && "
>  {
> -  rtx op0, op1;
> +  rtx op0, op1, dest;
> 
>if ()
>  emit_insn (gen_avx512fp16_fmaddcsh_v8hf_mask (
> @@ -6586,26 +6586,15 @@ (define_expand "avx512fp16_fmaddcsh_v8hf_mask1"
>  emit_insn (gen_avx512fp16_fmaddcsh_v8hf_mask (operands[0],
>operands[1], operands[2], operands[3], operands[4]));
> 
> -  if (TARGET_AVX512VL)
> -  {
> -op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
> -op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> -emit_insn (gen_avx512vl_loadv4sf_mask (op0, op0, op1, operands[4]));
> -  }
> -  else
> -  {
> -rtx mask, tmp, vec_mask;
> -mask = lowpart_subreg (SImode, operands[4], QImode),
> -tmp = gen_reg_rtx (SImode);
> -emit_insn (gen_ashlsi3 (tmp, mask, GEN_INT (31)));
> -vec_mask = gen_reg_rtx (V4SImode);
> -emit_insn (gen_rtx_SET (vec_mask, CONST0_RTX (V4SImode)));
> -emit_insn (gen_vec_setv4si_0 (vec_mask, vec_mask, tmp));
> -vec_mask = lowpart_subreg (V4SFmode, vec_mask, V4SImode);
> -op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
> -op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> -emit_insn (gen_sse4_1_blendvps (op0, op1, op0, vec_mask));
> -  }
> +  op0 = lowpart_subreg (V4SFmode, force_reg (V8HFmode, operands[0]),
> + V8HFmode);
> +  if (!MEM_P (operands[1]))
> +operands[1] = force_reg (V8HFmode, operands[1]);
> +  op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> +  dest = gen_reg_rtx (V4SFmode);
> +  emit_insn (gen_avx512f_movsf_mask (dest, op1, op0, op1,
> +operands[4]));
> +  emit_move_insn (operands[0], lowpart_subreg (V8HFmode, dest,
> +V4SFmode));
>DONE;
>  })
> 
> @@ -6631,7 +6620,7 @@ (define_expand "avx512fp16_fcmaddcsh_v8hf_mask1"
> (match_operand:QI 4 "register_operand")]
>"TARGET_AVX512FP16 && "
>  {
> -  rtx op0, op1;
> +  rtx op0, op1, dest;
> 
>if ()
>  emit_insn (gen_avx512fp16_fcmaddcsh_v8hf_mask (
> @@ -6641,26 +6630,15 @@ (define_expand "avx512fp16_fcmaddcsh_v8hf_mask1"
>  emit_insn (gen_avx512fp16_fcmaddcsh_v8hf_mask (operands[0],
>operands[1], operands[2], operands[3], operands[4]));
> 
> -  if (TARGET_AVX512VL)
> -  {
> -op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
> -op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> -emit_insn (gen_avx512vl_loadv4sf_mask (op0, op0, op1, operands[4]));
> -  }
> -  else
> -  {
> -rtx mask, tmp, vec_mask;
> -mask = lowpart_subreg (SImode, operands[4], QImode),
> -tmp = gen_reg_rtx (SImode);
> -emit_insn (gen_ashlsi3 (tmp, mask, GEN_INT (31)));
> -vec_mask = gen_reg_rtx (V4SImode);
> -emit_insn (gen_rtx_SET (vec_mask, CONST0_RTX (V4SImode)));
> -emit_insn (gen_vec_setv4si_0 (vec_mask, 

[PATCH v3] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-21 Thread Hongyu Wang via Gcc-patches
Hi, here is the patch with force_reg before lowpart_subreg.

Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.

Ok for master?

For complex scalar intrinsics like _mm_mask_fcmadd_sch, the
mask should be ANDed with 1 so that only its lowest bit takes effect.
Use a masked vmovss to perform the same operation, since it ignores
the higher bits of the mask.
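A rough model of the masking semantics in plain C, with 32-bit integer lanes standing in for the FP16 data (v4x32, merge_all_lanes and merge_scalar are hypothetical names, and no FP16 arithmetic is modelled). Assuming the usual scalar-intrinsic convention, only mask bit 0 should choose between the FMA result and a in lane 0, with lanes 1-3 taken from a. A full-width per-lane masked merge also consults the higher mask bits, so with those bits set it can keep lanes that should have come from a; a masked vmovss-style move looks at bit 0 only.

```c
#include <stdint.h>

/* A 128-bit register modelled as four 32-bit lanes; in the complex FP16
   scalar intrinsics, lane 0 holds the single complex value.  */
typedef struct { uint32_t lane[4]; } v4x32;

/* Full-width masked merge: lane I takes SRC when mask bit I is set and
   otherwise keeps DST, so every mask bit matters.  */
static v4x32
merge_all_lanes (v4x32 dst, v4x32 src, uint8_t mask)
{
  for (int i = 0; i < 4; i++)
    if (mask & (1u << i))
      dst.lane[i] = src.lane[i];
  return dst;
}

/* Scalar masked move in the style of vmovss with a mask: only bit 0 is
   consulted; lane 0 is RESULT or A, and lanes 1-3 always come from A.  */
static v4x32
merge_scalar (v4x32 a, v4x32 result, uint8_t mask)
{
  v4x32 r = a;
  if (mask & 1u)
    r.lane[0] = result.lane[0];
  return r;
}
```

With a = {1, 2, 3, 4}, result = {10, 20, 30, 40} and mask 0xFF, merge_all_lanes keeps 20, 30 and 40 in the upper lanes, while merge_scalar returns {10, 2, 3, 4}.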

gcc/ChangeLog:

PR target/104978
* config/i386/sse.md
(avx512fp16_fmaddcsh_v8hf_mask1"
(match_operand:QI 4 "register_operand")]
   "TARGET_AVX512FP16 && "
 {
-  rtx op0, op1;
+  rtx op0, op1, dest;
 
   if ()
 emit_insn (gen_avx512fp16_fmaddcsh_v8hf_mask (
@@ -6586,26 +6586,15 @@ (define_expand "avx512fp16_fmaddcsh_v8hf_mask1"
 emit_insn (gen_avx512fp16_fmaddcsh_v8hf_mask (operands[0],
   operands[1], operands[2], operands[3], operands[4]));
 
-  if (TARGET_AVX512VL)
-  {
-op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
-op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
-emit_insn (gen_avx512vl_loadv4sf_mask (op0, op0, op1, operands[4]));
-  }
-  else
-  {
-rtx mask, tmp, vec_mask;
-mask = lowpart_subreg (SImode, operands[4], QImode),
-tmp = gen_reg_rtx (SImode);
-emit_insn (gen_ashlsi3 (tmp, mask, GEN_INT (31)));
-vec_mask = gen_reg_rtx (V4SImode);
-emit_insn (gen_rtx_SET (vec_mask, CONST0_RTX (V4SImode)));
-emit_insn (gen_vec_setv4si_0 (vec_mask, vec_mask, tmp));
-vec_mask = lowpart_subreg (V4SFmode, vec_mask, V4SImode);
-op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
-op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
-emit_insn (gen_sse4_1_blendvps (op0, op1, op0, vec_mask));
-  }
+  op0 = lowpart_subreg (V4SFmode, force_reg (V8HFmode, operands[0]),
+   V8HFmode);
+  if (!MEM_P (operands[1]))
+operands[1] = force_reg (V8HFmode, operands[1]);
+  op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
+  dest = gen_reg_rtx (V4SFmode);
+  emit_insn (gen_avx512f_movsf_mask (dest, op1, op0, op1, operands[4]));
+  emit_move_insn (operands[0], lowpart_subreg (V8HFmode, dest,
+  V4SFmode));
   DONE;
 })
 
@@ -6631,7 +6620,7 @@ (define_expand "avx512fp16_fcmaddcsh_v8hf_mask1"
(match_operand:QI 4 "register_operand")]
   "TARGET_AVX512FP16 && "
 {
-  rtx op0, op1;
+  rtx op0, op1, dest;
 
   if ()
 emit_insn (gen_avx512fp16_fcmaddcsh_v8hf_mask (
@@ -6641,26 +6630,15 @@ (define_expand "avx512fp16_fcmaddcsh_v8hf_mask1"
 emit_insn (gen_avx512fp16_fcmaddcsh_v8hf_mask (operands[0],
   operands[1], operands[2], operands[3], operands[4]));
 
-  if (TARGET_AVX512VL)
-  {
-op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
-op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
-emit_insn (gen_avx512vl_loadv4sf_mask (op0, op0, op1, operands[4]));
-  }
-  else
-  {
-rtx mask, tmp, vec_mask;
-mask = lowpart_subreg (SImode, operands[4], QImode),
-tmp = gen_reg_rtx (SImode);
-emit_insn (gen_ashlsi3 (tmp, mask, GEN_INT (31)));
-vec_mask = gen_reg_rtx (V4SImode);
-emit_insn (gen_rtx_SET (vec_mask, CONST0_RTX (V4SImode)));
-emit_insn (gen_vec_setv4si_0 (vec_mask, vec_mask, tmp));
-vec_mask = lowpart_subreg (V4SFmode, vec_mask, V4SImode);
-op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
-op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
-emit_insn (gen_sse4_1_blendvps (op0, op1, op0, vec_mask));
-  }
+  op0 = lowpart_subreg (V4SFmode, force_reg (V8HFmode, operands[0]),
+   V8HFmode);
+  if (!MEM_P (operands[1]))
+operands[1] = force_reg (V8HFmode, operands[1]);
+  op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
+  dest = gen_reg_rtx (V4SFmode);
+  emit_insn (gen_avx512f_movsf_mask (dest, op1, op0, op1, operands[4]));
+  emit_move_insn (operands[0], lowpart_subreg (V8HFmode, dest,
+  V4SFmode));
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c
index eb96588df39..0f87861f09b 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c
@@ -1,13 +1,13 @@
 /* { dg-do compile } */
-/* { dg-options "-mavx512fp16 -mno-avx512vl -O2" } */
+/* { dg-options "-mavx512fp16 -O2" } */
 /* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
 /* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vfcmaddcsh\[ 

Re: [PATCH v2] Add TARGET_MOVE_WITH_MODE_P

2022-03-21 Thread H.J. Lu via Gcc-patches
On Mon, Mar 14, 2022 at 8:44 AM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Wed, Mar 9, 2022 at 7:04 PM Richard Sandiford
> >  wrote:
> >>
> >> Richard Biener via Gcc-patches  writes:
> >> > On Wed, Mar 2, 2022 at 10:18 PM H.J. Lu  wrote:
> >> >>
> >> >> On Wed, Mar 02, 2022 at 09:51:26AM +0100, Richard Biener wrote:
> >> >> > On Tue, Mar 1, 2022 at 11:41 PM H.J. Lu via Gcc-patches
> >> >> >  wrote:
> >> >> > >
> >> >> > > Add TARGET_FOLD_MEMCPY_MAX for the maximum number of bytes to fold 
> >> >> > > memcpy.
> >> >> > > The default is
> >> >> > >
> >> >> > > MOVE_MAX * MOVE_RATIO (optimize_function_for_size_p (cfun))
> >> >> > >
> >> >> > > For x86, it is MOVE_MAX to restore the old behavior before
> >> >> >
> >> >> > I know we've discussed this to death in the PR, I just want to repeat 
> >> >> > here
> >> >> > that the GIMPLE folding expects to generate a single load and a single
> >> >> > store (that is what it does on the GIMPLE level) which is why MOVE_MAX
> >> >> > was chosen originally (it's documented to what a "single instruction" 
> >> >> > does).
> >> >> > In practice MOVE_MAX does not seem to cover vector register sizes
> >> >> > so Richard pulled MOVE_RATIO which is really intended to cover
> >> >> > the case of using multiple instructions for moving memory (but then I
> >> >> > don't remember whether for the ARM case the single load/store GIMPLE
> >> >> > will be expanded to multiple load/store instructions).
> >> >> >
> >> >> > TARGET_FOLD_MEMCPY_MAX sounds like a stop-gap solution,
> >> >> > being very specific for memcpy folding (we also fold memmove btw).
> >> >> >
> >> >> > There is also MOVE_MAX_PIECES which _might_ be more appropriate
> >> >> > than MOVE_MAX here and still honor the idea of single instructions.
> >> >> > Now neither arm nor aarch64 define this and it defaults to MOVE_MAX,
> >> >> > not MOVE_MAX * MOVE_RATIO.
> >> >> >
> >> >> > So if we need a new hook then that hook should at least get the
> >> >> > 'speed' argument of MOVE_RATIO and it should get a better name.
> >> >> >
> >> >> > I still think that it should be possible to improve the insn check to
> >> >> > avoid use of "disabled" modes, maybe that's also a point to add
> >> >> > a new hook like .move_with_mode_p or so?  To quote, we do
> >> >>
> >> >> Here is the v2 patch to add TARGET_MOVE_WITH_MODE_P.
> >> >
> >> > Again I'd like to shine light on MOVE_MAX_PIECES which explicitly
> >> > mentions "a load or store used TO COPY MEMORY" (emphasis mine)
> >> > and whose x86 implementation would already be fine (doing larger moves
> >> > and also not doing too large moves).  But apparently the arm folks
> >> > decided that that's not fit and instead (mis-?)used MOVE_MAX * 
> >> > MOVE_RATIO.
> >>
> >> It seems like there are old comments and old documentation that justify
> >> both interpretations, so there are good arguments on both sides.  But
> >> with this kind of thing I think we have to infer the meaning of the
> >> macro from the way it's currently used, rather than trusting such old
> >> and possibly out-of-date and contradictory information.
> >>
> >> FWIW, I agree that (if we exclude old reload, which we should!) the
> >> only direct uses of MOVE_MAX before the patch were not specific to
> >> integer registers and so MOVE_MAX should include vectors if the
> >> target wants vector modes to be used for general movement.
> >>
> >> Even if people disagree that that's the current meaning, I think it's
> >> at least a sensible meaning.  It provides information that AFAIK isn't
> >> available otherwise, and it avoids overlap with MAX_FIXED_MODE_SIZE.
> >>
> >> So FWIW, I think it'd be reasonable to change non-x86 targets if they
> >> want vector modes to be used for single-insn copies.
> >
> > Note a slight complication in the GIMPLE folding case is that we
> > do not end up using vector modes but we're using "fake"
> > integer modes like OImode which x86 has move patterns for.
> > If we'd use vector modes we could use existing target hooks to
> > eventually decide whether auto-using those is desired or not.
>
> Hmm, yeah.  Certainly we shouldn't require the target to support
> a scalar integer equivalent of every vector mode.
>

I'd like to resolve this before GCC 12 is released.

Thanks.


-- 
H.J.
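To make the two readings in this thread concrete, a hypothetical C model of the GIMPLE memcpy-folding decision (fold_limit and can_fold_to_single_move are illustrative names, not GCC functions, and the power-of-two length condition is an assumption of the model). The limit is either MOVE_MAX alone, matching the "single load and single store" view, or MOVE_MAX * MOVE_RATIO, the default proposed for TARGET_FOLD_MEMCPY_MAX; either way the target must also be able to move a value of that size in one instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: the byte limit under either interpretation.  */
static size_t
fold_limit (size_t move_max, size_t move_ratio, bool scale_by_ratio)
{
  assert (move_ratio > 0);
  return scale_by_ratio ? move_max * move_ratio : move_max;
}

static bool
is_pow2 (size_t n)
{
  return n != 0 && (n & (n - 1)) == 0;
}

/* Fold a constant-length copy into one load plus one store only when the
   length is within LIMIT, is a power of two, and the target can move a
   value of that size in a single instruction (HAVE_SINGLE_MOVE stands in
   for the insn check discussed above).  */
static bool
can_fold_to_single_move (size_t len, size_t limit, bool have_single_move)
{
  return len <= limit && is_pow2 (len) && have_single_move;
}
```

With made-up values MOVE_MAX = 16 and MOVE_RATIO = 4, a 32-byte copy folds only under the second reading, which is why the question of whether that single GIMPLE load/store really maps to one instruction matters.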


Re: [PATCH] RISC-V: Implement ZTSO extension.

2022-03-21 Thread Kito Cheng via Gcc-patches
Hi Palmer:

Cool, so I'll keep that in the GCC 13 queue :)

On Tue, Mar 22, 2022 at 10:41 AM Palmer Dabbelt  wrote:
>
> On Mon, 21 Mar 2022 19:39:24 PDT (-0700), kito.ch...@sifive.com wrote:
> > Hi Palmer:
> >
> > I guess the problem is binutils isn't included and it's too close to the
> > GCC release, and binutils will report errors if it has any unsupported
> > extensions.
>
> Ya, sorry, I was trying to say that we should have more than just the
> binutils support -- IIUC having binutils support the GCC flags at
> release is the standard way to do things, and I don't see any reason to
> rush this.
>
> >
> > Most distros will use GCC 12 + binutils 2.38 or GCC 11 + binutils 2.38, so
> > neither combination works with a march string containing ztso.
> >
> > So that's why I am not intending to include that at this moment, but maybe
> > we could include that first and it'll work once binutils 2.39 is released,
> > then we can have GCC 12 + binutils 2.39 in the next few months.
> >
> > Anyway, I think I am fine with that, and I'll ping Nelson for the binutils
> > part.
> >
> > On Tue, Mar 22, 2022 at 9:13 AM Palmer Dabbelt  wrote:
> >
> >> On Thu, 17 Mar 2022 23:52:04 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> >> > Hi Shi-Hua:
> >> >
> >> > Thanks, this patch is LGTM, but I would defer that until stage 1,
> >> > because the binutils part isn't merged yet.
> >>
> >> IMO we should at least have a __riscv_ztso define, and ideally have the
> >> relevant builtins ported (atomics, fences, etc) as well.  Otherwise this
> >> is really just setting a bit that makes binaries incompatible without
> >> providing any real benefit.  That'll also let us work through how these
> >> mappings should be implemented, so we don't end up with issues like we
> >> did with WMO.
> >>
> >> >
> >> > On Tue, Mar 15, 2022 at 5:10 PM  wrote:
> >> >>
> >> >> From: LiaoShihua 
> >> >>
> >> >>   ZTSO is the extension for the total store ordering (TSO) memory model.
> >> >>   This extension adds no new instructions to the ISA, and you can use it with arch "ztso".
> >> >>   If you use it, the TSO flag will be generated in the ELF header.
> >> >>
> >> >> gcc/ChangeLog:
> >> >>
> >> >> * common/config/riscv/riscv-common.cc: define new arch.
> >> >> * config/riscv/riscv-opts.h (MASK_ZTSO): Ditto.
> >> >> (TARGET_ZTSO):Ditto.
> >> >> * config/riscv/riscv.opt:Ditto.
> >> >>
> >> >> ---
> >> >>  gcc/common/config/riscv/riscv-common.cc | 4 +++-
> >> >>  gcc/config/riscv/riscv-opts.h   | 3 +++
> >> >>  gcc/config/riscv/riscv.opt  | 3 +++
> >> >>  3 files changed, 9 insertions(+), 1 deletion(-)
> >> >>
> >> >> diff --git a/gcc/common/config/riscv/riscv-common.cc
> >> b/gcc/common/config/riscv/riscv-common.cc
> >> >> index a904893b9ed..f4730b991d7 100644
> >> >> --- a/gcc/common/config/riscv/riscv-common.cc
> >> >> +++ b/gcc/common/config/riscv/riscv-common.cc
> >> >> @@ -185,6 +185,8 @@ static const struct riscv_ext_version
> >> riscv_ext_version_table[] =
> >> >>{"zvl32768b", ISA_SPEC_CLASS_NONE, 1, 0},
> >> >>{"zvl65536b", ISA_SPEC_CLASS_NONE, 1, 0},
> >> >>
> >> >> +  {"ztso", ISA_SPEC_CLASS_NONE, 0, 1},
> >> >> +
> >> >>/* Terminate the list.  */
> >> >>{NULL, ISA_SPEC_CLASS_NONE, 0, 0}
> >> >>  };
> >> >> @@ -1080,7 +1082,7 @@ static const riscv_ext_flag_table_t
> >> riscv_ext_flag_table[] =
> >> >>{"zvl32768b", _options::x_riscv_zvl_flags, MASK_ZVL32768B},
> >> >>{"zvl65536b", _options::x_riscv_zvl_flags, MASK_ZVL65536B},
> >> >>
> >> >> -
> >> >> +  {"ztso", _options::x_riscv_ztso_subext, MASK_ZTSO},
> >> >>{NULL, NULL, 0}
> >> >>  };
> >> >>
> >> >> diff --git a/gcc/config/riscv/riscv-opts.h
> >> b/gcc/config/riscv/riscv-opts.h
> >> >> index 929e4e3a7c5..9cb5f2a550a 100644
> >> >> --- a/gcc/config/riscv/riscv-opts.h
> >> >> +++ b/gcc/config/riscv/riscv-opts.h
> >> >> @@ -136,4 +136,7 @@ enum stack_protector_guard {
> >> >>  #define TARGET_ZVL32768B ((riscv_zvl_flags & MASK_ZVL32768B) != 0)
> >> >>  #define TARGET_ZVL65536B ((riscv_zvl_flags & MASK_ZVL65536B) != 0)
> >> >>
> >> >> +#define MASK_ZTSO   (1 <<  0)
> >> >> +#define TARGET_ZTSO ((riscv_ztso_subext & MASK_ZTSO) != 0)
> >> >> +
> >> >>  #endif /* ! GCC_RISCV_OPTS_H */
> >> >> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> >> >> index 9fffc08220d..6128bfa31dc 100644
> >> >> --- a/gcc/config/riscv/riscv.opt
> >> >> +++ b/gcc/config/riscv/riscv.opt
> >> >> @@ -209,6 +209,9 @@ int riscv_vector_eew_flags
> >> >>  TargetVariable
> >> >>  int riscv_zvl_flags
> >> >>
> >> >> +TargetVariable
> >> >> +int riscv_ztso_subext
> >> >> +
> >> >>  Enum
> >> >>  Name(isa_spec_class) Type(enum riscv_isa_spec_class)
> >> >>  Supported ISA specs (for use with the -misa-spec= option):
> >> >> --
> >> >> 2.31.1.windows.1
> >> >>
> >>
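The flag plumbing in this patch follows the existing riscv_ext_flag_table pattern: an extension name maps to a flag word and a mask bit, and TARGET_ZTSO tests that bit. A C analogue, for illustration only: GCC's table uses a C++ pointer-to-member where this sketch uses offsetof, and struct opts, set_ext_flag and TARGET_ZTSO_P are hypothetical names. MASK_ZTSO keeps the patch's bit layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same bit layout as the patch: ZTSO is bit 0 of its own flag word.  */
#define MASK_ZTSO (1 << 0)

/* Simplified stand-in for GCC's option structure.  */
struct opts
{
  int riscv_ztso_subext;
};

/* C analogue of a riscv_ext_flag_table_t entry.  */
struct ext_flag
{
  const char *name;
  size_t offset;   /* Which int field in struct opts to update.  */
  int mask;
};

static const struct ext_flag ext_flag_table[] = {
  {"ztso", offsetof (struct opts, riscv_ztso_subext), MASK_ZTSO},
  {NULL, 0, 0}   /* Terminate the list.  */
};

/* Set the flag bit for extension EXT; return 1 if EXT was recognized.  */
static int
set_ext_flag (struct opts *o, const char *ext)
{
  assert (o != NULL && ext != NULL);
  for (const struct ext_flag *e = ext_flag_table; e->name != NULL; e++)
    if (strcmp (e->name, ext) == 0)
      {
        int *field = (int *) ((char *) o + e->offset);
        *field |= e->mask;
        return 1;
      }
  return 0;
}

/* Counterpart of TARGET_ZTSO, reading an explicit options struct.  */
#define TARGET_ZTSO_P(o) ((((o)->riscv_ztso_subext) & MASK_ZTSO) != 0)
```

As Palmer notes, setting this bit by itself changes no code generation; it only records the extension.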


Re: [PATCH] RISC-V: Implement ZTSO extension.

2022-03-21 Thread Palmer Dabbelt

On Mon, 21 Mar 2022 19:39:24 PDT (-0700), kito.ch...@sifive.com wrote:

Hi Palmer:

I guess the problem is binutils isn't included and it's too close to the
GCC release, and binutils will report errors if it has any unsupported
extensions.


Ya, sorry, I was trying to say that we should have more than just the 
binutils support -- IIUC having binutils support the GCC flags at 
release is the standard way to do things, and I don't see any reason to 
rush this.




Most distros will use GCC 12 + binutils 2.38 or GCC 11 + binutils 2.38, so
neither combination works with a march string containing ztso.

So that's why I am not intending to include that at this moment, but maybe
we could include that first and it'll work once binutils 2.39 is released,
then we can have GCC 12 + binutils 2.39 in the next few months.

Anyway, I think I am fine with that, and I'll ping Nelson for the binutils
part.

On Tue, Mar 22, 2022 at 9:13 AM Palmer Dabbelt  wrote:


On Thu, 17 Mar 2022 23:52:04 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> Hi Shi-Hua:
>
> Thanks, this patch is LGTM, but I would defer that until stage 1,
> because the binutils part isn't merged yet.

IMO we should at least have a __riscv_ztso define, and ideally have the
relevant builtins ported (atomics, fences, etc) as well.  Otherwise this
is really just setting a bit that makes binaries incompatible without
providing any real benefit.  That'll also let us work through how these
mappings should be implemented, so we don't end up with issues like we
did with WMO.

>
> On Tue, Mar 15, 2022 at 5:10 PM  wrote:
>>
>> From: LiaoShihua 
>>
>>   ZTSO is the extension for the total store ordering (TSO) memory model.
>>   This extension adds no new instructions to the ISA, and you can use it with arch "ztso".
>>   If you use it, the TSO flag will be generated in the ELF header.
>>
>> gcc/ChangeLog:
>>
>> * common/config/riscv/riscv-common.cc: define new arch.
>> * config/riscv/riscv-opts.h (MASK_ZTSO): Ditto.
>> (TARGET_ZTSO):Ditto.
>> * config/riscv/riscv.opt:Ditto.
>>
>> ---
>>  gcc/common/config/riscv/riscv-common.cc | 4 +++-
>>  gcc/config/riscv/riscv-opts.h   | 3 +++
>>  gcc/config/riscv/riscv.opt  | 3 +++
>>  3 files changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/common/config/riscv/riscv-common.cc
b/gcc/common/config/riscv/riscv-common.cc
>> index a904893b9ed..f4730b991d7 100644
>> --- a/gcc/common/config/riscv/riscv-common.cc
>> +++ b/gcc/common/config/riscv/riscv-common.cc
>> @@ -185,6 +185,8 @@ static const struct riscv_ext_version
riscv_ext_version_table[] =
>>{"zvl32768b", ISA_SPEC_CLASS_NONE, 1, 0},
>>{"zvl65536b", ISA_SPEC_CLASS_NONE, 1, 0},
>>
>> +  {"ztso", ISA_SPEC_CLASS_NONE, 0, 1},
>> +
>>/* Terminate the list.  */
>>{NULL, ISA_SPEC_CLASS_NONE, 0, 0}
>>  };
>> @@ -1080,7 +1082,7 @@ static const riscv_ext_flag_table_t
riscv_ext_flag_table[] =
>>{"zvl32768b", _options::x_riscv_zvl_flags, MASK_ZVL32768B},
>>{"zvl65536b", _options::x_riscv_zvl_flags, MASK_ZVL65536B},
>>
>> -
>> +  {"ztso", _options::x_riscv_ztso_subext, MASK_ZTSO},
>>{NULL, NULL, 0}
>>  };
>>
>> diff --git a/gcc/config/riscv/riscv-opts.h
b/gcc/config/riscv/riscv-opts.h
>> index 929e4e3a7c5..9cb5f2a550a 100644
>> --- a/gcc/config/riscv/riscv-opts.h
>> +++ b/gcc/config/riscv/riscv-opts.h
>> @@ -136,4 +136,7 @@ enum stack_protector_guard {
>>  #define TARGET_ZVL32768B ((riscv_zvl_flags & MASK_ZVL32768B) != 0)
>>  #define TARGET_ZVL65536B ((riscv_zvl_flags & MASK_ZVL65536B) != 0)
>>
>> +#define MASK_ZTSO   (1 <<  0)
>> +#define TARGET_ZTSO ((riscv_ztso_subext & MASK_ZTSO) != 0)
>> +
>>  #endif /* ! GCC_RISCV_OPTS_H */
>> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
>> index 9fffc08220d..6128bfa31dc 100644
>> --- a/gcc/config/riscv/riscv.opt
>> +++ b/gcc/config/riscv/riscv.opt
>> @@ -209,6 +209,9 @@ int riscv_vector_eew_flags
>>  TargetVariable
>>  int riscv_zvl_flags
>>
>> +TargetVariable
>> +int riscv_ztso_subext
>> +
>>  Enum
>>  Name(isa_spec_class) Type(enum riscv_isa_spec_class)
>>  Supported ISA specs (for use with the -misa-spec= option):
>> --
>> 2.31.1.windows.1
>>



Re: [PATCH] RISC-V: Implement ZTSO extension.

2022-03-21 Thread Kito Cheng
Hi Palmer:

I guess the problem is binutils isn't included and it's too close to the
GCC release, and binutils will report errors if it has any unsupported
extensions.

Most distros will use GCC 12 + binutils 2.38 or GCC 11 + binutils 2.38, so
neither combination works with a march string containing ztso.

So that's why I am not intending to include that at this moment, but maybe
we could include that first and it'll work once binutils 2.39 is released,
then we can have GCC 12 + binutils 2.39 in the next few months.

Anyway, I think I am fine with that, and I'll ping Nelson for the binutils
part.

On Tue, Mar 22, 2022 at 9:13 AM Palmer Dabbelt  wrote:

> On Thu, 17 Mar 2022 23:52:04 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> > Hi Shi-Hua:
> >
> > Thanks, this patch is LGTM, but I would defer that until stage 1,
> > because the binutils part isn't merged yet.
>
> IMO we should at least have a __riscv_ztso define, and ideally have the
> relevant builtins ported (atomics, fences, etc) as well.  Otherwise this
> is really just setting a bit that makes binaries incompatible without
> providing any real benefit.  That'll also let us work through how these
> mappings should be implemented, so we don't end up with issues like we
> did with WMO.
>
> >
> > On Tue, Mar 15, 2022 at 5:10 PM  wrote:
> >>
> >> From: LiaoShihua 
> >>
> >>   ZTSO is the extension for the total store ordering (TSO) memory model.
> >>   This extension adds no new instructions to the ISA, and you can use it with arch "ztso".
> >>   If you use it, the TSO flag will be generated in the ELF header.
> >>
> >> gcc/ChangeLog:
> >>
> >> * common/config/riscv/riscv-common.cc: define new arch.
> >> * config/riscv/riscv-opts.h (MASK_ZTSO): Ditto.
> >> (TARGET_ZTSO):Ditto.
> >> * config/riscv/riscv.opt:Ditto.
> >>
> >> ---
> >>  gcc/common/config/riscv/riscv-common.cc | 4 +++-
> >>  gcc/config/riscv/riscv-opts.h   | 3 +++
> >>  gcc/config/riscv/riscv.opt  | 3 +++
> >>  3 files changed, 9 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/gcc/common/config/riscv/riscv-common.cc
> b/gcc/common/config/riscv/riscv-common.cc
> >> index a904893b9ed..f4730b991d7 100644
> >> --- a/gcc/common/config/riscv/riscv-common.cc
> >> +++ b/gcc/common/config/riscv/riscv-common.cc
> >> @@ -185,6 +185,8 @@ static const struct riscv_ext_version
> riscv_ext_version_table[] =
> >>{"zvl32768b", ISA_SPEC_CLASS_NONE, 1, 0},
> >>{"zvl65536b", ISA_SPEC_CLASS_NONE, 1, 0},
> >>
> >> +  {"ztso", ISA_SPEC_CLASS_NONE, 0, 1},
> >> +
> >>/* Terminate the list.  */
> >>{NULL, ISA_SPEC_CLASS_NONE, 0, 0}
> >>  };
> >> @@ -1080,7 +1082,7 @@ static const riscv_ext_flag_table_t
> riscv_ext_flag_table[] =
> >>{"zvl32768b", _options::x_riscv_zvl_flags, MASK_ZVL32768B},
> >>{"zvl65536b", _options::x_riscv_zvl_flags, MASK_ZVL65536B},
> >>
> >> -
> >> +  {"ztso", _options::x_riscv_ztso_subext, MASK_ZTSO},
> >>{NULL, NULL, 0}
> >>  };
> >>
> >> diff --git a/gcc/config/riscv/riscv-opts.h
> b/gcc/config/riscv/riscv-opts.h
> >> index 929e4e3a7c5..9cb5f2a550a 100644
> >> --- a/gcc/config/riscv/riscv-opts.h
> >> +++ b/gcc/config/riscv/riscv-opts.h
> >> @@ -136,4 +136,7 @@ enum stack_protector_guard {
> >>  #define TARGET_ZVL32768B ((riscv_zvl_flags & MASK_ZVL32768B) != 0)
> >>  #define TARGET_ZVL65536B ((riscv_zvl_flags & MASK_ZVL65536B) != 0)
> >>
> >> +#define MASK_ZTSO   (1 <<  0)
> >> +#define TARGET_ZTSO ((riscv_ztso_subext & MASK_ZTSO) != 0)
> >> +
> >>  #endif /* ! GCC_RISCV_OPTS_H */
> >> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> >> index 9fffc08220d..6128bfa31dc 100644
> >> --- a/gcc/config/riscv/riscv.opt
> >> +++ b/gcc/config/riscv/riscv.opt
> >> @@ -209,6 +209,9 @@ int riscv_vector_eew_flags
> >>  TargetVariable
> >>  int riscv_zvl_flags
> >>
> >> +TargetVariable
> >> +int riscv_ztso_subext
> >> +
> >>  Enum
> >>  Name(isa_spec_class) Type(enum riscv_isa_spec_class)
> >>  Supported ISA specs (for use with the -misa-spec= option):
> >> --
> >> 2.31.1.windows.1
> >>
>


Re: [PATCH] [i386] Extend splitter pattern to reversed condition by swapping then and else rtx. [PR target/104982]

2022-03-21 Thread Hongtao Liu via Gcc-patches
On Mon, Mar 21, 2022 at 9:06 PM liuhongt  wrote:
>
> Failed to match this instruction:
> (set (reg/v:SI 88 [ z ])
> (if_then_else:SI (eq (zero_extract:SI (reg:SI 92)
> (const_int 1 [0x1])
> (zero_extend:SI (subreg:QI (reg:SI 93) 0)))
> (const_int 0 [0]))
> (reg:SI 95)
> (reg:SI 94)))
>
> but it's equal to
>
> (set (reg/v:SI 88 [ z ])
> (if_then_else:SI (ne (zero_extract:SI (reg:SI 92)
> (const_int 1 [0x1])
> (zero_extend:SI (subreg:QI (reg:SI 93) 0)))
> (const_int 0 [0]))
> (reg:SI 94)
> (reg:SI 95)))
>
> which is the exact existing splitter.
>
> The patch will fix below regressions:
>
> On x86-64, r12-7687 caused:
>
> FAIL: gcc.target/i386/bt-5.c scan-assembler-not sar[lq][ \t]
> FAIL: gcc.target/i386/bt-5.c scan-assembler-times bt[lq][ \t] 7
>
> Bootstrap and regtested on x86_64-pc-linux-gnu{-m32,}
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/104982
> * config/i386/i386.md (*jcc_bt_mask): Extend the
> following splitter to reversed condition.
> ---
>  gcc/config/i386/i386.md | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 02f298c2846..c74edd1aaef 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -14182,12 +14182,12 @@ (define_insn_and_split "*jcc_bt_mask"
>  (define_split
>[(set (match_operand:SWI248 0 "register_operand")
> (if_then_else:SWI248
> -(ne
> - (zero_extract:SWI48
> -  (match_operand:SWI48 1 "register_operand")
> -  (const_int 1)
> -  (zero_extend:SI (match_operand:QI 2 "register_operand")))
> - (const_int 0))
> +(match_operator 5 "bt_comparison_operator"
> + [(zero_extract:SWI48
> +   (match_operand:SWI48 1 "register_operand")
> +   (const_int 1)
> +   (zero_extend:SI (match_operand:QI 2 "register_operand")))
> +  (const_int 0)])
>  (match_operand:SWI248 3 "nonimmediate_operand")
>  (match_operand:SWI248 4 "nonimmediate_operand")))]
>"TARGET_USE_BT && TARGET_CMOVE
> @@ -14202,6 +14202,8 @@ (define_split
>  (match_dup 3)
>  (match_dup 4)))]
>  {
> +  if (GET_CODE (operands[5]) == EQ)
> +    std::swap (operands[3], operands[4]);
>operands[2] = lowpart_subreg (SImode, operands[2], QImode);
>  })
>
> --
> 2.18.1
>


-- 
BR,
Hongtao

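The equivalence the patch relies on can be checked in plain C: an `eq` test against zero that selects `(a, b)` picks the same value as an `ne` test that selects `(b, a)`. This is only a sketch of the RTL semantics (the function names are made up for illustration), not the splitter itself or the `bt`/`cmov` sequence it emits. It is the same observation that lets the C body simply swap operands[3] and operands[4] when operator 5 is `EQ`.

```c
#include <assert.h>
#include <stdint.h>

/* EQ form from the failed combine match:
   (if_then_else (eq (zero_extract x 1 n) 0) a b)  */
static int32_t
select_eq (uint32_t x, unsigned n, int32_t a, int32_t b)
{
  /* zero_extract of width 1 at bit position n (n < 32 assumed).  */
  return ((x >> n) & 1u) == 0 ? a : b;
}

/* NE form with the two arms swapped, i.e. the shape the existing
   splitter already handled.  */
static int32_t
select_ne_swapped (uint32_t x, unsigned n, int32_t a, int32_t b)
{
  return ((x >> n) & 1u) != 0 ? b : a;
}
```

For every bit value the two functions agree, which is why a single `bt_comparison_operator` pattern plus a conditional swap covers both conditions.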

Re: [PATCH] RISC-V: Implement ZTSO extension.

2022-03-21 Thread Palmer Dabbelt

On Thu, 17 Mar 2022 23:52:04 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

Hi Shi-Hua:

Thanks, this patch LGTM, but I would defer it until stage 1
because the binutils part isn't merged yet.


IMO we should at least have a __riscv_ztso define, and ideally have the 
relevant builtins ported (atomics, fences, etc.) as well.  Otherwise this 
is really just setting a bit that makes binaries incompatible without 
providing any real benefit.  That'll also let us work through how these 
mappings should be implemented, so we don't end up with issues like we 
did with WMO.




On Tue, Mar 15, 2022 at 5:10 PM  wrote:


From: LiaoShihua 

  ZTSO is the extension for the Total Store Ordering (TSO) memory model.
  This extension adds no new instructions to the ISA; you can enable it with the arch string "ztso".
  If you use it, the TSO flag will be generated in the ELF header.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Define new arch.
* config/riscv/riscv-opts.h (MASK_ZTSO): Ditto.
(TARGET_ZTSO): Ditto.
* config/riscv/riscv.opt: Ditto.

---
 gcc/common/config/riscv/riscv-common.cc | 4 +++-
 gcc/config/riscv/riscv-opts.h   | 3 +++
 gcc/config/riscv/riscv.opt  | 3 +++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index a904893b9ed..f4730b991d7 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -185,6 +185,8 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"zvl32768b", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zvl65536b", ISA_SPEC_CLASS_NONE, 1, 0},

+  {"ztso", ISA_SPEC_CLASS_NONE, 0, 1},
+
   /* Terminate the list.  */
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
 };
@@ -1080,7 +1082,7 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"zvl32768b", &gcc_options::x_riscv_zvl_flags, MASK_ZVL32768B},
   {"zvl65536b", &gcc_options::x_riscv_zvl_flags, MASK_ZVL65536B},

-
+  {"ztso", &gcc_options::x_riscv_ztso_subext, MASK_ZTSO},
   {NULL, NULL, 0}
 };

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 929e4e3a7c5..9cb5f2a550a 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -136,4 +136,7 @@ enum stack_protector_guard {
 #define TARGET_ZVL32768B ((riscv_zvl_flags & MASK_ZVL32768B) != 0)
 #define TARGET_ZVL65536B ((riscv_zvl_flags & MASK_ZVL65536B) != 0)

+#define MASK_ZTSO (1 <<  0)
+#define TARGET_ZTSO ((riscv_ztso_subext & MASK_ZTSO) != 0)
+
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 9fffc08220d..6128bfa31dc 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -209,6 +209,9 @@ int riscv_vector_eew_flags
 TargetVariable
 int riscv_zvl_flags

+TargetVariable
+int riscv_ztso_subext
+
 Enum
 Name(isa_spec_class) Type(enum riscv_isa_spec_class)
 Supported ISA specs (for use with the -misa-spec= option):
--
2.31.1.windows.1

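To see how the pieces of this patch fit together (a table entry naming an option variable and a mask, plus a TARGET_ZTSO test on that variable), here is a much-reduced C model. `struct opts_model`, `enable_ext` and the offset-based table are hypothetical stand-ins: GCC's real riscv_ext_flag_table stores C++ pointers-to-member into gcc_options, and the real TARGET_ZTSO takes no argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MASK_ZTSO (1 << 0)

/* Hypothetical, much-reduced stand-in for gcc_options.  */
struct opts_model
{
  int riscv_zvl_flags;
  int riscv_ztso_subext;
};

/* Model of one flag-table entry: extension name, which option variable
   holds its bit (as a byte offset, since C has no pointer-to-member),
   and the mask to set.  */
struct ext_flag_entry
{
  const char *ext;
  size_t var_offset;
  int mask;
};

static const struct ext_flag_entry ext_flag_table[] = {
  {"ztso", offsetof (struct opts_model, riscv_ztso_subext), MASK_ZTSO},
  {NULL, 0, 0}
};

/* Set the flag bit for EXT if the table knows it; return 1 on success.  */
static int
enable_ext (struct opts_model *opts, const char *ext)
{
  for (const struct ext_flag_entry *e = ext_flag_table; e->ext; e++)
    if (strcmp (e->ext, ext) == 0)
      {
        int *var = (int *) ((char *) opts + e->var_offset);
        *var |= e->mask;
        return 1;
      }
  return 0;
}

/* Model of TARGET_ZTSO, taking the options struct explicitly.  */
#define TARGET_ZTSO(opts) (((opts)->riscv_ztso_subext & MASK_ZTSO) != 0)
```

As Palmer notes above, setting this bit by itself only affects the ELF flag; the model says nothing about how atomics or fences would be mapped under TSO.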


[pushed] c++: initialized array of vla [PR58646]

2022-03-21 Thread Jason Merrill via Gcc-patches
We went into build_vec_init because we're dealing with a VLA, but then
build_vec_init thought it was safe to just build an INIT_EXPR because the
outer dimension is constant.  Nope.
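
A short C illustration of why a constant outer bound is not enough: in `int a[2][n]` the element type `int[n]` is itself variably sized, so the size of the whole array is computed at run time. This is only a sketch of the type-level point in C (the helper names are made up, and n is assumed positive); the G++ test uses `= {}` on the VLA as a GNU extension, which this sketch does not attempt, and it does not exercise build_vec_init.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the whole array: 2 rows of n ints, known only at run time.  */
static size_t
vla_2d_size (int n)
{
  int a[2][n];
  return sizeof a;
}

/* Size of one element of the outer dimension, i.e. of int[n].  */
static size_t
vla_row_size (int n)
{
  int a[2][n];
  return sizeof a[0];
}
```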

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* init.cc (build_vec_init): Check for vla element type.

gcc/testsuite/ChangeLog:

* g++.dg/ext/vla24.C: New.
---
 gcc/cp/init.cc   | 1 +
 gcc/testsuite/g++.dg/ext/vla24.C | 7 +++
 2 files changed, 8 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/vla24.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index 7575597c8fd..08767679dd4 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -4395,6 +4395,7 @@ build_vec_init (tree base, tree maxindex, tree init,
   if (init
   && TREE_CODE (atype) == ARRAY_TYPE
   && TREE_CONSTANT (maxindex)
+  && !vla_type_p (type)
   && (from_array == 2
  ? vec_copy_assign_is_trivial (inner_elt_type, init)
  : !TYPE_NEEDS_CONSTRUCTING (type))
diff --git a/gcc/testsuite/g++.dg/ext/vla24.C b/gcc/testsuite/g++.dg/ext/vla24.C
new file mode 100644
index 000..0a99c003ffb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/vla24.C
@@ -0,0 +1,7 @@
+// PR c++/58646
+// { dg-additional-options -Wno-vla }
+
+void foo(int n)
+{
+  int a[2][n] = {};
+}

base-commit: bec69ac548b0f37b41d07082d6ee52b52d356536
-- 
2.27.0



Re: [PATCH] x86: Properly check FEATURE_AESKLE

2022-03-21 Thread Uros Bizjak via Gcc-patches
On Mon, Mar 21, 2022 at 10:51 PM H.J. Lu  wrote:
>
> On Mon, Mar 21, 2022 at 2:29 PM Uros Bizjak  wrote:
> >
> > On Mon, Mar 21, 2022 at 2:56 PM H.J. Lu  wrote:
> > >
> > > 1. Pass 0x19 to __cpuid for bit_AESKLE.
> > > 2. Enable FEATURE_AESKLE only if bit_AESKLE is set.
> > >
> > > PR target/104998
> > > * common/config/i386/cpuinfo.h (get_available_features): Pass
> > > 0x19 to __cpuid for bit_AESKLE.  Enable FEATURE_AESKLE only if
> > > bit_AESKLE is set.
> >
> > LGTM.
>
> OK for backport?

Looks safe, so OK.

Thanks,
Uros.

>
> Thanks.
>
> > Thanks,
> > Uros.
> >
> > > ---
> > >  gcc/common/config/i386/cpuinfo.h | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
> > > index 61b1a0f291c..239759dc766 100644
> > > --- a/gcc/common/config/i386/cpuinfo.h
> > > +++ b/gcc/common/config/i386/cpuinfo.h
> > > @@ -779,11 +779,11 @@ get_available_features (struct __processor_model *cpu_model,
> > >/* Get Advanced Features at level 0x19 (eax = 0x19).  */
> > >if (max_cpuid_level >= 0x19)
> > >  {
> > > -  set_feature (FEATURE_AESKLE);
> > > -  __cpuid (19, eax, ebx, ecx, edx);
> > > +  __cpuid (0x19, eax, ebx, ecx, edx);
> > >/* Check if OS support keylocker.  */
> > >if (ebx & bit_AESKLE)
> > > {
> > > + set_feature (FEATURE_AESKLE);
> > >   if (ebx & bit_WIDEKL)
> > > set_feature (FEATURE_WIDEKL);
> > >   if (has_kl)
> > > --
> > > 2.35.1
> > >
>
>
>
> --
> H.J.

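The two fixes are easy to see in a self-contained model: `19` is decimal (leaf 0x13), so the old call queried the wrong CPUID leaf, and FEATURE_AESKLE must only be set after EBX has been checked. The function below is a simplified, hypothetical model of that logic; the `MODEL_` names and feature values are illustrative stand-ins, `has_kl` handling is omitted, and the EBX bit values are assumed to match GCC's `<cpuid.h>` (AESKLE in bit 0, WIDEKL in bit 2). No real CPUID instruction is executed.

```c
#include <assert.h>

/* Decimal 19 is CPUID leaf 0x13, not the Key Locker leaf 0x19.  */
_Static_assert (19 == 0x13, "19 is leaf 0x13");

/* Assumed EBX bits of CPUID leaf 0x19.  */
#define MODEL_BIT_AESKLE 0x1u
#define MODEL_BIT_WIDEKL 0x4u

/* Hypothetical feature flags returned by the model.  */
#define MODEL_FEATURE_AESKLE 0x1u
#define MODEL_FEATURE_WIDEKL 0x2u

/* Fixed logic: read leaf 0x19 only if it exists, and report AESKLE
   (and WIDEKL) only when the AESKLE bit is set in EBX.  */
static unsigned
keylocker_features (unsigned max_cpuid_level, unsigned ebx_leaf_0x19)
{
  unsigned features = 0;
  if (max_cpuid_level >= 0x19 && (ebx_leaf_0x19 & MODEL_BIT_AESKLE))
    {
      features |= MODEL_FEATURE_AESKLE;
      if (ebx_leaf_0x19 & MODEL_BIT_WIDEKL)
        features |= MODEL_FEATURE_WIDEKL;
    }
  return features;
}
```

With the old code, the equivalent of this function would have reported AESKLE whenever leaf 0x19 existed, regardless of EBX.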

Re: [PATCH v3] x86: Disable SSE in ISA2 for -mgeneral-regs-only

2022-03-21 Thread Uros Bizjak via Gcc-patches
On Mon, Mar 21, 2022 at 10:57 PM H.J. Lu  wrote:
>
> On Mon, Mar 21, 2022 at 10:50:11PM +0100, Uros Bizjak wrote:
> > On Mon, Mar 21, 2022 at 10:47 PM H.J. Lu  wrote:
> > >
> > > On Mon, Mar 21, 2022 at 10:23:59PM +0100, Uros Bizjak wrote:
> > > > On Mon, Mar 21, 2022 at 10:10 PM H.J. Lu  wrote:
> > > > >
> > > > > SSE and AVX ISAs in ISA2 should be disabled for -mgeneral-regs-only.
> > > > >
> > > > > gcc/
> > > > >
> > > > > PR target/105000
> > > > > * common/config/i386/i386-common.cc
> > > > > (OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET): Also disable SSE
> > > > > and AVX.
> > > > >
> > > > > gcc/testsuite/
> > > > >
> > > > > PR target/105000
> > > > > * gcc.target/i386/pr105000-1.c: New test.
> > > > > * gcc.target/i386/pr105000-2.c: Likewise.
> > > > > * gcc.target/i386/pr105000-3.c: Likewise.
> > > > > ---
> > > > >  gcc/common/config/i386/i386-common.cc  |  4 +++-
> > > > >  gcc/testsuite/gcc.target/i386/pr105000-1.c | 11 +++
> > > > >  gcc/testsuite/gcc.target/i386/pr105000-2.c | 11 +++
> > > > >  gcc/testsuite/gcc.target/i386/pr105000-3.c | 11 +++
> > > > >  4 files changed, 36 insertions(+), 1 deletion(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-1.c
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-2.c
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-3.c
> > > > >
> > > > > diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> > > > > index 449df6351c9..b77d495e9a4 100644
> > > > > --- a/gcc/common/config/i386/i386-common.cc
> > > > > +++ b/gcc/common/config/i386/i386-common.cc
> > > > > @@ -321,7 +321,9 @@ along with GCC; see the file COPYING3.  If not see
> > > > > | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
> > > > > | OPTION_MASK_ISA2_AVX512FP16_UNSET)
> > > > >  #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
> > > > > -  (OPTION_MASK_ISA2_AVX512F_UNSET)
> > > > > +  (OPTION_MASK_ISA2_SSE_UNSET \
> > > > > +   | OPTION_MASK_ISA2_AVX_UNSET \
> > > > > +   | OPTION_MASK_ISA2_AVX512F_UNSET)
> > > >
> > > > The above should only need OPTION_MASK_ISA2_SSE_UNSET, other options
> > > > follow from #define chain.
> > > >
> > >
> > > Here is the v2 patch to use OPTION_MASK_ISA2_SSE_UNSET.  OK for
> > > master and GCC 11 branches?
> >
> > Have you regression-tested it?
>
> I tested with the original patch.   Since OPTION_MASK_ISA2_SSE_UNSET
> is the same as
>
> (OPTION_MASK_ISA2_SSE_UNSET
>  | OPTION_MASK_ISA2_AVX_UNSET
>  | OPTION_MASK_ISA2_AVX512F_UNSET)
>
> there should be no difference.

I hope so.

OK.

Thanks,
Uros.

>
> >
> > > Thanks.
> > >
> > >
> > > H.J.
> > > ---
> > > Replace OPTION_MASK_ISA2_AVX512F_UNSET with OPTION_MASK_ISA2_SSE_UNSET
> > > in OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET to disable SSE, AVX and
> > > AVX512 ISAs.
> > >
> > > gcc/
> > >
> > > PR target/105000
> > > * common/config/i386/i386-common.cc
> > > (OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET): Replace
> > > OPTION_MASK_ISA2_AVX512F_UNSET with OPTION_MASK_ISA2_SSE_UNSET.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/105000
> > > * gcc.target/i386/pr105000-1.c: New test.
> > > * gcc.target/i386/pr105000-2.c: Likewise.
> > > * gcc.target/i386/pr105000-3.c: Likewise.
> > > * gcc.target/i386/pr105000-4.c: Likewise.
> > > ---
> > >  gcc/common/config/i386/i386-common.cc  |  2 +-
> > >  gcc/testsuite/gcc.target/i386/pr105000-1.c | 11 +++
> > >  gcc/testsuite/gcc.target/i386/pr105000-2.c | 11 +++
> > >  gcc/testsuite/gcc.target/i386/pr105000-3.c | 11 +++
> > >  gcc/testsuite/gcc.target/i386/pr105000-4.c | 11 +++
> > >  5 files changed, 45 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-2.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-3.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-4.c
> > >
> > > diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> > > index 449df6351c9..c64d7b01126 100644
> > > --- a/gcc/common/config/i386/i386-common.cc
> > > +++ b/gcc/common/config/i386/i386-common.cc
> > > @@ -321,7 +321,7 @@ along with GCC; see the file COPYING3.  If not see
> > > | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
> > > | OPTION_MASK_ISA2_AVX512FP16_UNSET)
> > >  #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
> > > -  (OPTION_MASK_ISA2_AVX512F_UNSET)
> > > +  (OPTION_MASK_ISA2_SSE_UNSET)
> >
> > No need for parentheses.
> >
>
> Fixed in the v3 patch.
>
>
> H.J.
> ---
> Replace OPTION_MASK_ISA2_AVX512F_UNSET with OPTION_MASK_ISA2_SSE_UNSET
> in OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET to disable SSE, AVX and
> AVX512 ISAs.
>
> gcc/
>
> PR target/105000
> * 

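Uros's point that the other masks "follow from #define chain" can be modeled with a few made-up bits. The macro names and bit values below are hypothetical; only the shape of the chain (each *_UNSET mask including everything that depends on it, as in the OPTION_MASK_ISA2_AVX_UNSET / SSE4_2_UNSET / SSE4_1_UNSET context lines of the diff) is taken from the patch. Under that assumption, v1's three-way OR and v3's single SSE mask are the same value, while the old AVX512F-only mask misses ISA2 bits that depend on AVX2 or SSE2.

```c
#include <assert.h>

/* Hypothetical ISA2 bits for features that depend on AVX-512F, AVX2
   and SSE2 respectively.  */
#define ISA2_AVX512_DEP (1u << 0)
#define ISA2_AVX2_DEP   (1u << 1)
#define ISA2_SSE2_DEP   (1u << 2)

/* Each *_UNSET mask includes every mask that depends on it.  */
#define ISA2_AVX512F_UNSET ISA2_AVX512_DEP
#define ISA2_AVX2_UNSET (ISA2_AVX2_DEP | ISA2_AVX512F_UNSET)
#define ISA2_AVX_UNSET ISA2_AVX2_UNSET
#define ISA2_SSE4_2_UNSET ISA2_AVX_UNSET
#define ISA2_SSE4_1_UNSET ISA2_SSE4_2_UNSET
#define ISA2_SSSE3_UNSET ISA2_SSE4_1_UNSET
#define ISA2_SSE3_UNSET ISA2_SSSE3_UNSET
#define ISA2_SSE2_UNSET (ISA2_SSE2_DEP | ISA2_SSE3_UNSET)
#define ISA2_SSE_UNSET ISA2_SSE2_UNSET

/* Pre-patch definition, v1 of the patch, and v3 of the patch.  */
#define GENERAL_REGS_ONLY_UNSET_OLD ISA2_AVX512F_UNSET
#define GENERAL_REGS_ONLY_UNSET_V1 \
  (ISA2_SSE_UNSET | ISA2_AVX_UNSET | ISA2_AVX512F_UNSET)
#define GENERAL_REGS_ONLY_UNSET_V3 ISA2_SSE_UNSET

/* Clear every ISA2 bit named in UNSET_MASK.  */
static unsigned
isa2_after_unset (unsigned isa2_flags, unsigned unset_mask)
{
  return isa2_flags & ~unset_mask;
}
```

The model also reflects Uros's last nit: a mask that expands to a single name, like GENERAL_REGS_ONLY_UNSET_V3, needs no extra parentheses, whereas the OR in v1 does.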
[PATCH v3] x86: Disable SSE in ISA2 for -mgeneral-regs-only

2022-03-21 Thread H.J. Lu via Gcc-patches
On Mon, Mar 21, 2022 at 10:50:11PM +0100, Uros Bizjak wrote:
> On Mon, Mar 21, 2022 at 10:47 PM H.J. Lu  wrote:
> >
> > On Mon, Mar 21, 2022 at 10:23:59PM +0100, Uros Bizjak wrote:
> > > On Mon, Mar 21, 2022 at 10:10 PM H.J. Lu  wrote:
> > > >
> > > > SSE and AVX ISAs in ISA2 should be disabled for -mgeneral-regs-only.
> > > >
> > > > gcc/
> > > >
> > > > PR target/105000
> > > > * common/config/i386/i386-common.cc
> > > > (OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET): Also disable SSE
> > > > and AVX.
> > > >
> > > > gcc/testsuite/
> > > >
> > > > PR target/105000
> > > > * gcc.target/i386/pr105000-1.c: New test.
> > > > * gcc.target/i386/pr105000-2.c: Likewise.
> > > > * gcc.target/i386/pr105000-3.c: Likewise.
> > > > ---
> > > >  gcc/common/config/i386/i386-common.cc  |  4 +++-
> > > >  gcc/testsuite/gcc.target/i386/pr105000-1.c | 11 +++
> > > >  gcc/testsuite/gcc.target/i386/pr105000-2.c | 11 +++
> > > >  gcc/testsuite/gcc.target/i386/pr105000-3.c | 11 +++
> > > >  4 files changed, 36 insertions(+), 1 deletion(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-1.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-2.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-3.c
> > > >
> > > > diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> > > > index 449df6351c9..b77d495e9a4 100644
> > > > --- a/gcc/common/config/i386/i386-common.cc
> > > > +++ b/gcc/common/config/i386/i386-common.cc
> > > > @@ -321,7 +321,9 @@ along with GCC; see the file COPYING3.  If not see
> > > > | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
> > > > | OPTION_MASK_ISA2_AVX512FP16_UNSET)
> > > >  #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
> > > > -  (OPTION_MASK_ISA2_AVX512F_UNSET)
> > > > +  (OPTION_MASK_ISA2_SSE_UNSET \
> > > > +   | OPTION_MASK_ISA2_AVX_UNSET \
> > > > +   | OPTION_MASK_ISA2_AVX512F_UNSET)
> > >
> > > The above should only need OPTION_MASK_ISA2_SSE_UNSET, other options
> > > follow from #define chain.
> > >
> >
> > Here is the v2 patch to use OPTION_MASK_ISA2_SSE_UNSET.  OK for
> > master and GCC 11 branches?
> 
> Have you regression-tested it?

I tested with the original patch.   Since OPTION_MASK_ISA2_SSE_UNSET
is the same as

(OPTION_MASK_ISA2_SSE_UNSET
 | OPTION_MASK_ISA2_AVX_UNSET
 | OPTION_MASK_ISA2_AVX512F_UNSET)

there should be no difference.

> 
> > Thanks.
> >
> >
> > H.J.
> > ---
> > Replace OPTION_MASK_ISA2_AVX512F_UNSET with OPTION_MASK_ISA2_SSE_UNSET
> > in OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET to disable SSE, AVX and
> > AVX512 ISAs.
> >
> > gcc/
> >
> > PR target/105000
> > * common/config/i386/i386-common.cc
> > (OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET): Replace
> > OPTION_MASK_ISA2_AVX512F_UNSET with OPTION_MASK_ISA2_SSE_UNSET.
> >
> > gcc/testsuite/
> >
> > PR target/105000
> > * gcc.target/i386/pr105000-1.c: New test.
> > * gcc.target/i386/pr105000-2.c: Likewise.
> > * gcc.target/i386/pr105000-3.c: Likewise.
> > * gcc.target/i386/pr105000-4.c: Likewise.
> > ---
> >  gcc/common/config/i386/i386-common.cc  |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr105000-1.c | 11 +++
> >  gcc/testsuite/gcc.target/i386/pr105000-2.c | 11 +++
> >  gcc/testsuite/gcc.target/i386/pr105000-3.c | 11 +++
> >  gcc/testsuite/gcc.target/i386/pr105000-4.c | 11 +++
> >  5 files changed, 45 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-4.c
> >
> > diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> > index 449df6351c9..c64d7b01126 100644
> > --- a/gcc/common/config/i386/i386-common.cc
> > +++ b/gcc/common/config/i386/i386-common.cc
> > @@ -321,7 +321,7 @@ along with GCC; see the file COPYING3.  If not see
> > | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
> > | OPTION_MASK_ISA2_AVX512FP16_UNSET)
> >  #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
> > -  (OPTION_MASK_ISA2_AVX512F_UNSET)
> > +  (OPTION_MASK_ISA2_SSE_UNSET)
> 
> No need for parentheses.
> 

Fixed in the v3 patch.


H.J.
---
Replace OPTION_MASK_ISA2_AVX512F_UNSET with OPTION_MASK_ISA2_SSE_UNSET
in OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET to disable SSE, AVX and
AVX512 ISAs.

gcc/

PR target/105000
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET): Replace
OPTION_MASK_ISA2_AVX512F_UNSET with OPTION_MASK_ISA2_SSE_UNSET.

gcc/testsuite/

PR target/105000
* gcc.target/i386/pr105000-1.c: New test.
* gcc.target/i386/pr105000-2.c: Likewise.
* 

Re: [PATCH] x86: Properly check FEATURE_AESKLE

2022-03-21 Thread H.J. Lu via Gcc-patches
On Mon, Mar 21, 2022 at 2:29 PM Uros Bizjak  wrote:
>
> On Mon, Mar 21, 2022 at 2:56 PM H.J. Lu  wrote:
> >
> > 1. Pass 0x19 to __cpuid for bit_AESKLE.
> > 2. Enable FEATURE_AESKLE only if bit_AESKLE is set.
> >
> > PR target/104998
> > * common/config/i386/cpuinfo.h (get_available_features): Pass
> > 0x19 to __cpuid for bit_AESKLE.  Enable FEATURE_AESKLE only if
> > bit_AESKLE is set.
>
> LGTM.

OK for backport?

Thanks.

> Thanks,
> Uros.
>
> > ---
> >  gcc/common/config/i386/cpuinfo.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
> > index 61b1a0f291c..239759dc766 100644
> > --- a/gcc/common/config/i386/cpuinfo.h
> > +++ b/gcc/common/config/i386/cpuinfo.h
> > @@ -779,11 +779,11 @@ get_available_features (struct __processor_model *cpu_model,
> >/* Get Advanced Features at level 0x19 (eax = 0x19).  */
> >if (max_cpuid_level >= 0x19)
> >  {
> > -  set_feature (FEATURE_AESKLE);
> > -  __cpuid (19, eax, ebx, ecx, edx);
> > +  __cpuid (0x19, eax, ebx, ecx, edx);
> >/* Check if OS support keylocker.  */
> >if (ebx & bit_AESKLE)
> > {
> > + set_feature (FEATURE_AESKLE);
> >   if (ebx & bit_WIDEKL)
> > set_feature (FEATURE_WIDEKL);
> >   if (has_kl)
> > --
> > 2.35.1
> >



-- 
H.J.


Re: [PATCH v2] x86: Disable SSE in ISA2 for -mgeneral-regs-only

2022-03-21 Thread Uros Bizjak via Gcc-patches
On Mon, Mar 21, 2022 at 10:47 PM H.J. Lu  wrote:
>
> On Mon, Mar 21, 2022 at 10:23:59PM +0100, Uros Bizjak wrote:
> > On Mon, Mar 21, 2022 at 10:10 PM H.J. Lu  wrote:
> > >
> > > SSE and AVX ISAs in ISA2 should be disabled for -mgeneral-regs-only.
> > >
> > > gcc/
> > >
> > > PR target/105000
> > > * common/config/i386/i386-common.cc
> > > (OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET): Also disable SSE
> > > and AVX.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/105000
> > > * gcc.target/i386/pr105000-1.c: New test.
> > > * gcc.target/i386/pr105000-2.c: Likewise.
> > > * gcc.target/i386/pr105000-3.c: Likewise.
> > > ---
> > >  gcc/common/config/i386/i386-common.cc  |  4 +++-
> > >  gcc/testsuite/gcc.target/i386/pr105000-1.c | 11 +++
> > >  gcc/testsuite/gcc.target/i386/pr105000-2.c | 11 +++
> > >  gcc/testsuite/gcc.target/i386/pr105000-3.c | 11 +++
> > >  4 files changed, 36 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-2.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-3.c
> > >
> > > diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> > > index 449df6351c9..b77d495e9a4 100644
> > > --- a/gcc/common/config/i386/i386-common.cc
> > > +++ b/gcc/common/config/i386/i386-common.cc
> > > @@ -321,7 +321,9 @@ along with GCC; see the file COPYING3.  If not see
> > > | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
> > > | OPTION_MASK_ISA2_AVX512FP16_UNSET)
> > >  #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
> > > -  (OPTION_MASK_ISA2_AVX512F_UNSET)
> > > +  (OPTION_MASK_ISA2_SSE_UNSET \
> > > +   | OPTION_MASK_ISA2_AVX_UNSET \
> > > +   | OPTION_MASK_ISA2_AVX512F_UNSET)
> >
> > The above should only need OPTION_MASK_ISA2_SSE_UNSET, other options
> > follow from #define chain.
> >
>
> Here is the v2 patch to use OPTION_MASK_ISA2_SSE_UNSET.  OK for
> master and GCC 11 branches?

Have you regression-tested it?

> Thanks.
>
>
> H.J.
> ---
> Replace OPTION_MASK_ISA2_AVX512F_UNSET with OPTION_MASK_ISA2_SSE_UNSET
> in OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET to disable SSE, AVX and
> AVX512 ISAs.
>
> gcc/
>
> PR target/105000
> * common/config/i386/i386-common.cc
> (OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET): Replace
> OPTION_MASK_ISA2_AVX512F_UNSET with OPTION_MASK_ISA2_SSE_UNSET.
>
> gcc/testsuite/
>
> PR target/105000
> * gcc.target/i386/pr105000-1.c: New test.
> * gcc.target/i386/pr105000-2.c: Likewise.
> * gcc.target/i386/pr105000-3.c: Likewise.
> * gcc.target/i386/pr105000-4.c: Likewise.
> ---
>  gcc/common/config/i386/i386-common.cc  |  2 +-
>  gcc/testsuite/gcc.target/i386/pr105000-1.c | 11 +++
>  gcc/testsuite/gcc.target/i386/pr105000-2.c | 11 +++
>  gcc/testsuite/gcc.target/i386/pr105000-3.c | 11 +++
>  gcc/testsuite/gcc.target/i386/pr105000-4.c | 11 +++
>  5 files changed, 45 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-4.c
>
> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> index 449df6351c9..c64d7b01126 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -321,7 +321,7 @@ along with GCC; see the file COPYING3.  If not see
> | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
> | OPTION_MASK_ISA2_AVX512FP16_UNSET)
>  #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
> -  (OPTION_MASK_ISA2_AVX512F_UNSET)
> +  (OPTION_MASK_ISA2_SSE_UNSET)

No need for parentheses.

Uros.

>  #define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
>  #define OPTION_MASK_ISA2_SSE4_2_UNSET OPTION_MASK_ISA2_AVX_UNSET
>  #define OPTION_MASK_ISA2_SSE4_1_UNSET OPTION_MASK_ISA2_SSE4_2_UNSET
> diff --git a/gcc/testsuite/gcc.target/i386/pr105000-1.c b/gcc/testsuite/gcc.target/i386/pr105000-1.c
> new file mode 100644
> index 000..020e2adca83
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr105000-1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mshstk -mavxvnni" } */
> +
> +#include 
> +
> +__attribute__((target("no-mmx,no-sse")))
> +int
> +foo ()
> +{
> +  return _get_ssp ();
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr105000-2.c b/gcc/testsuite/gcc.target/i386/pr105000-2.c
> new file mode 100644
> index 000..a113fd1dfa2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr105000-2.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mshstk -mkl" } */
> +
> +#include 
> +
> +__attribute__((target("no-mmx,no-sse")))
> +int
> 

[PATCH v2] x86: Disable SSE in ISA2 for -mgeneral-regs-only

2022-03-21 Thread H.J. Lu via Gcc-patches
On Mon, Mar 21, 2022 at 10:23:59PM +0100, Uros Bizjak wrote:
> On Mon, Mar 21, 2022 at 10:10 PM H.J. Lu  wrote:
> >
> > SSE and AVX ISAs in ISA2 should be disabled for -mgeneral-regs-only.
> >
> > gcc/
> >
> > PR target/105000
> > * common/config/i386/i386-common.cc
> > (OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET): Also disable SSE
> > and AVX.
> >
> > gcc/testsuite/
> >
> > PR target/105000
> > * gcc.target/i386/pr105000-1.c: New test.
> > * gcc.target/i386/pr105000-2.c: Likewise.
> > * gcc.target/i386/pr105000-3.c: Likewise.
> > ---
> >  gcc/common/config/i386/i386-common.cc  |  4 +++-
> >  gcc/testsuite/gcc.target/i386/pr105000-1.c | 11 +++
> >  gcc/testsuite/gcc.target/i386/pr105000-2.c | 11 +++
> >  gcc/testsuite/gcc.target/i386/pr105000-3.c | 11 +++
> >  4 files changed, 36 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-3.c
> >
> > diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> > index 449df6351c9..b77d495e9a4 100644
> > --- a/gcc/common/config/i386/i386-common.cc
> > +++ b/gcc/common/config/i386/i386-common.cc
> > @@ -321,7 +321,9 @@ along with GCC; see the file COPYING3.  If not see
> > | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
> > | OPTION_MASK_ISA2_AVX512FP16_UNSET)
> >  #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
> > -  (OPTION_MASK_ISA2_AVX512F_UNSET)
> > +  (OPTION_MASK_ISA2_SSE_UNSET \
> > +   | OPTION_MASK_ISA2_AVX_UNSET \
> > +   | OPTION_MASK_ISA2_AVX512F_UNSET)
> 
> The above should only need OPTION_MASK_ISA2_SSE_UNSET, other options
> follow from #define chain.
> 

Here is the v2 patch to use OPTION_MASK_ISA2_SSE_UNSET.  OK for
master and GCC 11 branches?

Thanks.


H.J.
---
Replace OPTION_MASK_ISA2_AVX512F_UNSET with OPTION_MASK_ISA2_SSE_UNSET
in OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET to disable SSE, AVX and
AVX512 ISAs.

gcc/

PR target/105000
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET): Replace
OPTION_MASK_ISA2_AVX512F_UNSET with OPTION_MASK_ISA2_SSE_UNSET.

gcc/testsuite/

PR target/105000
* gcc.target/i386/pr105000-1.c: New test.
* gcc.target/i386/pr105000-2.c: Likewise.
* gcc.target/i386/pr105000-3.c: Likewise.
* gcc.target/i386/pr105000-4.c: Likewise.
---
 gcc/common/config/i386/i386-common.cc  |  2 +-
 gcc/testsuite/gcc.target/i386/pr105000-1.c | 11 +++
 gcc/testsuite/gcc.target/i386/pr105000-2.c | 11 +++
 gcc/testsuite/gcc.target/i386/pr105000-3.c | 11 +++
 gcc/testsuite/gcc.target/i386/pr105000-4.c | 11 +++
 5 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-4.c

diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 449df6351c9..c64d7b01126 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -321,7 +321,7 @@ along with GCC; see the file COPYING3.  If not see
| OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
| OPTION_MASK_ISA2_AVX512FP16_UNSET)
 #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
-  (OPTION_MASK_ISA2_AVX512F_UNSET)
+  (OPTION_MASK_ISA2_SSE_UNSET)
 #define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
 #define OPTION_MASK_ISA2_SSE4_2_UNSET OPTION_MASK_ISA2_AVX_UNSET
 #define OPTION_MASK_ISA2_SSE4_1_UNSET OPTION_MASK_ISA2_SSE4_2_UNSET
diff --git a/gcc/testsuite/gcc.target/i386/pr105000-1.c b/gcc/testsuite/gcc.target/i386/pr105000-1.c
new file mode 100644
index 000..020e2adca83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105000-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mshstk -mavxvnni" } */
+
+#include 
+
+__attribute__((target("no-mmx,no-sse")))
+int
+foo ()
+{
+  return _get_ssp ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr105000-2.c b/gcc/testsuite/gcc.target/i386/pr105000-2.c
new file mode 100644
index 000..a113fd1dfa2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105000-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mshstk -mkl" } */
+
+#include 
+
+__attribute__((target("no-mmx,no-sse")))
+int
+foo ()
+{
+  return _get_ssp ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr105000-3.c b/gcc/testsuite/gcc.target/i386/pr105000-3.c
new file mode 100644
index 000..7e82925270c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105000-3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mshstk -mwidekl" } */
+
+#include 
+

Re: [PATCH] x86: Properly check FEATURE_AESKLE

2022-03-21 Thread Uros Bizjak via Gcc-patches
On Mon, Mar 21, 2022 at 2:56 PM H.J. Lu  wrote:
>
> 1. Pass 0x19 to __cpuid for bit_AESKLE.
> 2. Enable FEATURE_AESKLE only if bit_AESKLE is set.
>
> PR target/104998
> * common/config/i386/cpuinfo.h (get_available_features): Pass
> 0x19 to __cpuid for bit_AESKLE.  Enable FEATURE_AESKLE only if
> bit_AESKLE is set.

LGTM.

Thanks,
Uros.

> ---
>  gcc/common/config/i386/cpuinfo.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
> index 61b1a0f291c..239759dc766 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -779,11 +779,11 @@ get_available_features (struct __processor_model *cpu_model,
>/* Get Advanced Features at level 0x19 (eax = 0x19).  */
>if (max_cpuid_level >= 0x19)
>  {
> -  set_feature (FEATURE_AESKLE);
> -  __cpuid (19, eax, ebx, ecx, edx);
> +  __cpuid (0x19, eax, ebx, ecx, edx);
>/* Check if OS support keylocker.  */
>if (ebx & bit_AESKLE)
> {
> + set_feature (FEATURE_AESKLE);
>   if (ebx & bit_WIDEKL)
> set_feature (FEATURE_WIDEKL);
>   if (has_kl)
> --
> 2.35.1
>


[patch]update the documentation for TARGET_ZERO_CALL_USED_REGS hook and add an assertion

2022-03-21 Thread Qing Zhao via Gcc-patches
Hi, 

Per our discussion on: 
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592002.html

I came up with the following patch to:

1. Update the documentation for the TARGET_ZERO_CALL_USED_REGS hook;
2. Add an assertion in function.cc to make sure that the set of actually zeroed
registers is a subset of all call-used registers;
   (I didn’t add a new parameter to TARGET_ZERO_CALL_USED_REGS because I think
adding the assertion in the common place, function.cc, is simpler to implement.)
3. This new assertion identified a bug in the i386 implementation; fix this bug
in i386.

This patch was bootstrapped and regression-tested on both x86 and aarch64 with no regressions.

Okay for commit?

thanks.

Qing

===
From 2e5bc1b25a707c6a17afbf03da2a8bec5b03454d Mon Sep 17 00:00:00 2001
From: Qing Zhao 
Date: Fri, 18 Mar 2022 20:49:56 +
Subject: [PATCH] Add an assertion: the zeroed_hardregs set is a subset of all
 call used regs.

We should make sure that the hard register set that is actually cleared by
the target hook zero_call_used_regs is a subset of all call-used
registers.

At the same time, update the documentation for the target hook
TARGET_ZERO_CALL_USED_REGS.

This new assertion identified a bug in the i386 implementation, which
incorrectly set zeroed_hardregs for stack registers.  Fix this bug
in the i386 implementation.

gcc/ChangeLog:

2022-03-21  Qing Zhao  

* config/i386/i386.cc (zero_all_st_registers): Return the value of
num_of_st.
(ix86_zero_call_used_regs): Update zeroed_hardregs set according to
the return value of zero_all_st_registers.
* doc/tm.texi: Update the documentation of TARGET_ZERO_CALL_USED_REGS.
* function.cc (gen_call_used_regs_seq): Add an assertion.
* target.def: Update the documentation of TARGET_ZERO_CALL_USED_REGS.
---
 gcc/config/i386/i386.cc | 27 ++-
 gcc/doc/tm.texi |  7 +++
 gcc/function.cc | 22 ++
 gcc/target.def  |  7 +++
 4 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 5a561966eb44..d84047a4bc1b 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -3753,16 +3753,17 @@ zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
needs to be cleared, the whole stack should be cleared.  However,
x87 stack registers that hold the return value should be excluded.
x87 returns in the top (two for complex values) register, so
-   num_of_st should be 7/6 when x87 returns, otherwise it will be 8.  */
+   num_of_st should be 7/6 when x87 returns, otherwise it will be 8.
+   Return the value of num_of_st.  */
 
 
-static bool
+static int
 zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
 {
 
   /* If the FPU is disabled, no need to zero all st registers.  */
   if (! (TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387))
-return false;
+return 0;
 
   unsigned int num_of_st = 0;
   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
@@ -3774,7 +3775,7 @@ zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
   }
 
   if (num_of_st == 0)
-return false;
+return 0;
 
   bool return_with_x87 = false;
   return_with_x87 = (crtl->return_rtx
@@ -3802,7 +3803,7 @@ zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
   insn = emit_insn (gen_rtx_SET (st_reg, st_reg));
   add_reg_note (insn, REG_DEAD, st_reg);
 }
-  return true;
+  return num_of_st;
 }
 
 
@@ -3851,7 +3852,7 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 {
   HARD_REG_SET zeroed_hardregs;
   bool all_sse_zeroed = false;
-  bool all_st_zeroed = false;
+  int all_st_zeroed_num = 0;
   bool all_mm_zeroed = false;
 
   CLEAR_HARD_REG_SET (zeroed_hardregs);
@@ -3881,9 +3882,17 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
   if (!exit_with_mmx_mode)
 /* x87 exit mode, we should zero all st registers together.  */
 {
-  all_st_zeroed = zero_all_st_registers (need_zeroed_hardregs);
-  if (all_st_zeroed)
-   SET_HARD_REG_BIT (zeroed_hardregs, FIRST_STACK_REG);
+  all_st_zeroed_num = zero_all_st_registers (need_zeroed_hardregs);
+
+  if (all_st_zeroed_num > 0)
+   for (unsigned int regno = FIRST_STACK_REG; regno <= LAST_STACK_REG; regno++)
+ /* x87 stack registers that hold the return value should be excluded.
+x87 returns in the top (two for complex values) register.  */
+ if (all_st_zeroed_num == 8
+ || !((all_st_zeroed_num >= 6 && regno == REGNO (crtl->return_rtx))
+  || (all_st_zeroed_num == 6
+  && (regno == (REGNO (crtl->return_rtx) + 1)))))
+   SET_HARD_REG_BIT (zeroed_hardregs, regno);
 }
   else
 /* MMX exit mode, check whether we can zero all mm registers.  */
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 2f92d37da8c0..c5006afc00d2 100644
--- a/gcc/doc/tm.texi
+++ 
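To make the new register-selection condition in ix86_zero_call_used_regs easier to follow, here is a standalone sketch that uses a plain 8-bit mask instead of HARD_REG_SET. Assumptions, not GCC's real encoding: the x87 stack registers are numbered 0..7, and `return_reg` stands in for REGNO (crtl->return_rtx) relative to FIRST_STACK_REG. `num_of_st` is the value the patched zero_all_st_registers returns (0, or 8/7/6 as described in its comment).

```cpp
#include <cstdint>

// Sketch of which x87 stack registers the patch records as zeroed.
// Stack registers are numbered 0..7 here (hypothetical numbering);
// RETURN_REG models REGNO (crtl->return_rtx) - FIRST_STACK_REG.
// NUM_OF_ST: 0 = nothing zeroed, 8 = no x87 return value,
// 7 = scalar x87 return, 6 = complex x87 return (two registers).
constexpr std::uint8_t zeroed_st_mask (int num_of_st, unsigned return_reg)
{
  std::uint8_t mask = 0;
  if (num_of_st <= 0)
    return mask;
  for (unsigned regno = 0; regno < 8; regno++)
    {
      // Registers that still hold the return value are excluded.
      bool holds_return_value
        = (num_of_st >= 6 && regno == return_reg)
          || (num_of_st == 6 && regno == return_reg + 1);
      if (num_of_st == 8 || !holds_return_value)
        mask = static_cast<std::uint8_t> (mask | (1u << regno));
    }
  return mask;
}
```

Because the `num_of_st == 8` test comes first, the return-register comparison is only reached when an x87 register actually holds the return value, which is what keeps REGNO (crtl->return_rtx) safe to evaluate in the real code.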

Re: [PATCH] x86: Disable SSE and AVX in ISA2 for -mgeneral-regs-only

2022-03-21 Thread Uros Bizjak via Gcc-patches
On Mon, Mar 21, 2022 at 10:10 PM H.J. Lu  wrote:
>
> SSE and AVX ISAs in ISA2 should be disabled for -mgeneral-regs-only.
>
> gcc/
>
> PR target/105000
> * common/config/i386/i386-common.cc
> (OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET): Also disable SSE
> and AVX.
>
> gcc/testsuite/
>
> PR target/105000
> * gcc.target/i386/pr105000-1.c: New test.
> * gcc.target/i386/pr105000-2.c: Likewise.
> * gcc.target/i386/pr105000-3.c: Likewise.
> ---
>  gcc/common/config/i386/i386-common.cc  |  4 +++-
>  gcc/testsuite/gcc.target/i386/pr105000-1.c | 11 +++
>  gcc/testsuite/gcc.target/i386/pr105000-2.c | 11 +++
>  gcc/testsuite/gcc.target/i386/pr105000-3.c | 11 +++
>  4 files changed, 36 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-3.c
>
> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> index 449df6351c9..b77d495e9a4 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -321,7 +321,9 @@ along with GCC; see the file COPYING3.  If not see
> | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
> | OPTION_MASK_ISA2_AVX512FP16_UNSET)
>  #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
> -  (OPTION_MASK_ISA2_AVX512F_UNSET)
> +  (OPTION_MASK_ISA2_SSE_UNSET \
> +   | OPTION_MASK_ISA2_AVX_UNSET \
> +   | OPTION_MASK_ISA2_AVX512F_UNSET)

The above should only need OPTION_MASK_ISA2_SSE_UNSET; the other options
follow from the #define chain.
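The point about the #define chain can be illustrated with a reduced model. Each *_UNSET mask is defined in terms of the UNSET mask of the ISA that depends on it, so OR-ing in the SSE mask already pulls in the AVX and AVX-512 bits. The bit values and the SSE -> SSE2 -> SSE3 -> SSSE3 -> SSE4_1 and AVX2 links below are hypothetical; only the AVX -> AVX2, SSE4_2 -> AVX and SSE4_1 -> SSE4_2 links mirror the defines quoted in this thread. constexpr variables are used here in place of the real preprocessor macros.

```cpp
#include <cstdint>

// Hypothetical ISA2 bits, for illustration only.
constexpr std::uint64_t ISA2_AVXVNNI = 1ull << 0;
constexpr std::uint64_t ISA2_AVX512FP16 = 1ull << 1;

// Each *_UNSET mask is built from the UNSET mask of the ISA(s) that
// depend on it, so unsetting a base ISA also unsets its dependents.
constexpr std::uint64_t ISA2_AVX512F_UNSET = ISA2_AVX512FP16;
constexpr std::uint64_t ISA2_AVX2_UNSET = ISA2_AVXVNNI | ISA2_AVX512F_UNSET;
constexpr std::uint64_t ISA2_AVX_UNSET = ISA2_AVX2_UNSET;
constexpr std::uint64_t ISA2_SSE4_2_UNSET = ISA2_AVX_UNSET;
constexpr std::uint64_t ISA2_SSE4_1_UNSET = ISA2_SSE4_2_UNSET;
constexpr std::uint64_t ISA2_SSSE3_UNSET = ISA2_SSE4_1_UNSET;
constexpr std::uint64_t ISA2_SSE3_UNSET = ISA2_SSSE3_UNSET;
constexpr std::uint64_t ISA2_SSE2_UNSET = ISA2_SSE3_UNSET;
constexpr std::uint64_t ISA2_SSE_UNSET = ISA2_SSE2_UNSET;

// True if every bit of SUB is also set in SUPER.
constexpr bool covers (std::uint64_t super, std::uint64_t sub)
{
  return (super & sub) == sub;
}

// The reviewer's suggested definition versus the one in the patch.
constexpr std::uint64_t suggested = ISA2_SSE_UNSET;
constexpr std::uint64_t patch_form
  = ISA2_SSE_UNSET | ISA2_AVX_UNSET | ISA2_AVX512F_UNSET;
```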

Uros.

>  #define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
>  #define OPTION_MASK_ISA2_SSE4_2_UNSET OPTION_MASK_ISA2_AVX_UNSET
>  #define OPTION_MASK_ISA2_SSE4_1_UNSET OPTION_MASK_ISA2_SSE4_2_UNSET
> diff --git a/gcc/testsuite/gcc.target/i386/pr105000-1.c b/gcc/testsuite/gcc.target/i386/pr105000-1.c
> new file mode 100644
> index 000..020e2adca83
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr105000-1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mshstk -mavxvnni" } */
> +
> +#include 
> +
> +__attribute__((target("no-mmx,no-sse")))
> +int
> +foo ()
> +{
> +  return _get_ssp ();
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr105000-2.c b/gcc/testsuite/gcc.target/i386/pr105000-2.c
> new file mode 100644
> index 000..a113fd1dfa2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr105000-2.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mshstk -mkl" } */
> +
> +#include 
> +
> +__attribute__((target("no-mmx,no-sse")))
> +int
> +foo ()
> +{
> +  return _get_ssp ();
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr105000-3.c b/gcc/testsuite/gcc.target/i386/pr105000-3.c
> new file mode 100644
> index 000..7e82925270c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr105000-3.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mshstk -mwidekl" } */
> +
> +#include 
> +
> +__attribute__((target("no-mmx,no-sse")))
> +int
> +foo ()
> +{
> +  return _get_ssp ();
> +}
> --
> 2.35.1
>


[PATCH] x86: Disable SSE and AVX in ISA2 for -mgeneral-regs-only

2022-03-21 Thread H.J. Lu via Gcc-patches
SSE and AVX ISAs in ISA2 should be disabled for -mgeneral-regs-only.

gcc/

PR target/105000
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET): Also disable SSE
and AVX.

gcc/testsuite/

PR target/105000
* gcc.target/i386/pr105000-1.c: New test.
* gcc.target/i386/pr105000-2.c: Likewise.
* gcc.target/i386/pr105000-3.c: Likewise.
---
 gcc/common/config/i386/i386-common.cc  |  4 +++-
 gcc/testsuite/gcc.target/i386/pr105000-1.c | 11 +++
 gcc/testsuite/gcc.target/i386/pr105000-2.c | 11 +++
 gcc/testsuite/gcc.target/i386/pr105000-3.c | 11 +++
 4 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr105000-3.c

diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 449df6351c9..b77d495e9a4 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -321,7 +321,9 @@ along with GCC; see the file COPYING3.  If not see
| OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
| OPTION_MASK_ISA2_AVX512FP16_UNSET)
 #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
-  (OPTION_MASK_ISA2_AVX512F_UNSET)
+  (OPTION_MASK_ISA2_SSE_UNSET \
+   | OPTION_MASK_ISA2_AVX_UNSET \
+   | OPTION_MASK_ISA2_AVX512F_UNSET)
 #define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
 #define OPTION_MASK_ISA2_SSE4_2_UNSET OPTION_MASK_ISA2_AVX_UNSET
 #define OPTION_MASK_ISA2_SSE4_1_UNSET OPTION_MASK_ISA2_SSE4_2_UNSET
diff --git a/gcc/testsuite/gcc.target/i386/pr105000-1.c b/gcc/testsuite/gcc.target/i386/pr105000-1.c
new file mode 100644
index 000..020e2adca83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105000-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mshstk -mavxvnni" } */
+
+#include 
+
+__attribute__((target("no-mmx,no-sse")))
+int
+foo ()
+{
+  return _get_ssp ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr105000-2.c b/gcc/testsuite/gcc.target/i386/pr105000-2.c
new file mode 100644
index 000..a113fd1dfa2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105000-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mshstk -mkl" } */
+
+#include 
+
+__attribute__((target("no-mmx,no-sse")))
+int
+foo ()
+{
+  return _get_ssp ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr105000-3.c b/gcc/testsuite/gcc.target/i386/pr105000-3.c
new file mode 100644
index 000..7e82925270c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105000-3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mshstk -mwidekl" } */
+
+#include 
+
+__attribute__((target("no-mmx,no-sse")))
+int
+foo ()
+{
+  return _get_ssp ();
+}
-- 
2.35.1



[PATCH] x86: Disable AVX on pr86722.c and pr90356.c

2022-03-21 Thread H.J. Lu via Gcc-patches
On Thu, Mar 17, 2022 at 9:47 PM sunil.k.pandey via Gcc-patches
 wrote:
>
> On Linux/x86_64,
>
> c482c28ba4c549006deb70dead90fe8ab34dcbcf is the first bad commit
> commit c482c28ba4c549006deb70dead90fe8ab34dcbcf
> Author: Roger Sayle 
> Date:   Thu Mar 17 21:56:32 2022 +
>
> PR 90356: Use xor to load const_double 0.0 on SSE (always)
>
> caused
>
> FAIL: gcc.target/i386/pr86722.c scan-assembler-not orpd
> FAIL: gcc.target/i386/pr90356.c scan-assembler pxor
>
> with GCC configured with
>
> ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-7693/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
>
> To reproduce:
>
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr86722.c --target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr90356.c --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr90356.c --target_board='unix{-m64\ -march=cascadelake}'"
>

I am checking in this testcase fix.

-- 
H.J.
From 24722f23a9d6ac4fbc3694b92585db61797a843d Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 21 Mar 2022 13:57:31 -0700
Subject: [PATCH] x86: Disable AVX on pr86722.c and pr90356.c

SSE/SSE2 are enabled explicitly on pr86722.c and pr90356.c.  Disable AVX
to avoid AVX with -march=native.

	PR target/86722
	PR tree-optimization/90356
	* gcc.target/i386/pr86722.c: Add -mno-avx.
	* gcc.target/i386/pr90356.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pr86722.c | 3 +--
 gcc/testsuite/gcc.target/i386/pr90356.c | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr86722.c b/gcc/testsuite/gcc.target/i386/pr86722.c
index 1092c4d8035..4de2ca1a6c0 100644
--- a/gcc/testsuite/gcc.target/i386/pr86722.c
+++ b/gcc/testsuite/gcc.target/i386/pr86722.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-O2 -msse" } */
+/* { dg-options "-O2 -mno-avx -msse" } */
 
 void f(double*d,double*e){
   for(;d

[pushed] c++: designated init and aggregate members [PR102337]

2022-03-21 Thread Jason Merrill via Gcc-patches
Our C++20 designated initializer handling was broken with members of class
type; we would find the relevant member and then try to find a member of
the member with the same name.  Or we would sometimes ignore the designator
entirely.  The former problem is fixed by the change to reshape_init_class,
the latter by the change to reshape_init_r.
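A hypothetical example of the shape described above: the designator `.a` names a class-type member of `Holder`, and that class (`Pair`) happens to have its own member called `a`. This is valid C++20; `.a` must be resolved against `Holder` only, and `Holder::a` is initialized from the whole braced list. The type names are made up for illustration and are not the new desig2x.C tests.

```cpp
// Hypothetical types: Holder::a has class type, and that class also
// has a member named 'a'.
struct Pair { int a; int b; };
struct Holder { Pair a; int c; };

// '.a' designates Holder::a; the braced list { 1, 2 } initializes the
// whole Pair, and '.a' is not looked up again inside Pair.
constexpr Holder holder = { .a = { 1, 2 }, .c = 3 };
```

Designated initializers are a C++20 feature; older -std modes may accept this only as an extension, with a pedantic warning.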

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/103337
PR c++/102740
PR c++/103299
PR c++/102538

gcc/cp/ChangeLog:

* decl.cc (reshape_init_class): Avoid looking for designator
after we found it.
(reshape_init_r): Keep looking for designator.

gcc/testsuite/ChangeLog:

* g++.dg/ext/flexary3.C: Remove one error.
* g++.dg/parse/pr43765.C: Likewise.
* g++.dg/cpp2a/desig22.C: New test.
* g++.dg/cpp2a/desig23.C: New test.
* g++.dg/cpp2a/desig24.C: New test.
* g++.dg/cpp2a/desig25.C: New test.
---
 gcc/cp/decl.cc   | 47 +---
 gcc/testsuite/g++.dg/cpp2a/desig22.C | 11 +++
 gcc/testsuite/g++.dg/cpp2a/desig23.C | 20 
 gcc/testsuite/g++.dg/cpp2a/desig24.C | 11 +++
 gcc/testsuite/g++.dg/cpp2a/desig25.C | 13 
 gcc/testsuite/g++.dg/ext/flexary3.C  |  2 +-
 gcc/testsuite/g++.dg/parse/pr43765.C |  6 ++--
 7 files changed, 101 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig22.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig23.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig24.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig25.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 375385e0013..34d9dad9fb0 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -6598,8 +6598,9 @@ reshape_init_class (tree type, reshape_iter *d, bool first_initializer_p,
 {
   tree field_init;
   constructor_elt *old_cur = d->cur;
+  bool direct_desig = false;
 
-  /* Handle designated initializers, as an extension.  */
+  /* Handle C++20 designated initializers.  */
   if (d->cur->index)
{
  if (d->cur->index == error_mark_node)
@@ -6617,7 +6618,10 @@ reshape_init_class (tree type, reshape_iter *d, bool first_initializer_p,
}
}
  else if (TREE_CODE (d->cur->index) == IDENTIFIER_NODE)
-   field = get_class_binding (type, d->cur->index);
+   {
+ field = get_class_binding (type, d->cur->index);
+ direct_desig = true;
+   }
  else
{
  if (complain & tf_error)
@@ -6669,6 +6673,7 @@ reshape_init_class (tree type, reshape_iter *d, bool first_initializer_p,
  break;
  gcc_assert (aafield);
  field = aafield;
+ direct_desig = false;
}
}
 
@@ -6683,9 +6688,32 @@ reshape_init_class (tree type, reshape_iter *d, bool first_initializer_p,
   assumed to correspond to no elements of the initializer list.  */
goto continue_;
 
-  field_init = reshape_init_r (TREE_TYPE (field), d,
-  /*first_initializer_p=*/NULL_TREE,
-  complain);
+  if (direct_desig)
+   {
+ /* The designated field F is initialized from this one element:
+Temporarily clear the designator so a recursive reshape_init_class
+doesn't try to find it again in F, and adjust d->end so we don't
+try to use the next initializer to initialize another member of F.
+
+Note that we don't want these changes if we found the designator
+inside an anon aggr above; we leave them alone to implement:
+
+"If the element is an anonymous union member and the initializer
+list is a brace-enclosed designated-initializer-list, the element
+is initialized by the designated-initializer-list { D }, where D
+is the designated-initializer-clause naming a member of the
+anonymous union member."  */
+ auto end_ = make_temp_override (d->end, d->cur + 1);
+ auto idx_ = make_temp_override (d->cur->index, NULL_TREE);
+ field_init = reshape_init_r (TREE_TYPE (field), d,
+  /*first_initializer_p=*/NULL_TREE,
+  complain);
+   }
+  else
+   field_init = reshape_init_r (TREE_TYPE (field), d,
+/*first_initializer_p=*/NULL_TREE,
+complain);
+
   if (field_init == error_mark_node)
return error_mark_node;
 
@@ -6941,6 +6969,15 @@ reshape_init_r (tree type, reshape_iter *d, tree first_initializer_p,
 to handle initialization of arrays and similar.  */
  else if (COMPOUND_LITERAL_P (stripped_init))
gcc_assert (!BRACE_ENCLOSED_INITIALIZER_P (stripped_init));
+ /* If we have an unresolved 

[pushed] c++: designator and anon struct [PR101767]

2022-03-21 Thread Jason Merrill via Gcc-patches
We found .x in the anonymous struct, but then didn't find .y there; we
should decide that means we're done with the struct rather than that the
code is wrong.
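The "back out of the anonymous struct" rule can be pictured with a toy model (hypothetical types, not GCC's trees or the real reshape_init_class loop): while the designators name fields of the anonymous aggregate they are consumed there, and the first one that does not match is handed back to the enclosing class instead of being diagnosed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy member description: an empty NAME marks an anonymous
// struct/union whose own fields are listed in FIELDS.
struct Member
{
  std::string name;
  std::vector<Member> fields;
};

// Starting at designator POS, consume the designators that name fields
// of the anonymous aggregate ANON.  Stop at the first one that does not
// match and return its position, so the enclosing class can try it
// (rather than treating it as an error).
std::size_t consume_designators (const Member &anon,
                                 const std::vector<std::string> &desigs,
                                 std::size_t pos)
{
  assert (anon.name.empty () && pos <= desigs.size ());
  while (pos < desigs.size ())
    {
      bool found = false;
      for (const Member &f : anon.fields)
        if (f.name == desigs[pos])
          {
            found = true;
            break;
          }
      if (!found)
        break;  // Back out to the enclosing class.
      ++pos;
    }
  return pos;
}
```

For the new anon-struct10.C layout, the designators { .x, .y } would be split as: .x consumed by the anonymous struct, .y handed back and then matched by the anonymous union.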

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/101767

gcc/cp/ChangeLog:

* decl.cc (reshape_init_class): Back out of anon struct
if a designator doesn't match.

gcc/testsuite/ChangeLog:

* g++.dg/ext/anon-struct10.C: New test.
---
 gcc/cp/decl.cc   |  5 +
 gcc/testsuite/g++.dg/ext/anon-struct10.C | 21 +
 2 files changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/anon-struct10.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 8afda8264c4..375385e0013 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -6626,6 +6626,11 @@ reshape_init_class (tree type, reshape_iter *d, bool first_initializer_p,
  return error_mark_node;
}
 
+ if (!field && ANON_AGGR_TYPE_P (type))
+   /* Apparently the designator isn't for a member of this anonymous
+  struct, so head back to the enclosing class.  */
+   break;
+
  if (!field || TREE_CODE (field) != FIELD_DECL)
{
  if (complain & tf_error)
diff --git a/gcc/testsuite/g++.dg/ext/anon-struct10.C b/gcc/testsuite/g++.dg/ext/anon-struct10.C
new file mode 100644
index 000..9b01bf3fada
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/anon-struct10.C
@@ -0,0 +1,21 @@
+// PR c++/101767
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-Wno-pedantic" }
+
+typedef struct {
+  struct {
+int x;
+  };
+  union {
+int y;
+float z;
+  };
+} S;
+
+void foo(void)
+{
+  [[maybe_unused]] S a = {
+.x = 1,
+.y = 0
+  };
+}

base-commit: 70b8f43695b0e6fabc760d247ac83f354092b21d
-- 
2.27.0



[committed] d: Fix internal compiler error: in build_complex, at tree.c:2358

2022-03-21 Thread Iain Buclaw via Gcc-patches
Hi,

This patch fixes an ICE in the D front-end when constructing a complex
object from a struct literal typed as enum __c_complex_float.

The conversion from the special _Complex enum to native complex used
build_complex, however the input value isn't necessarily a literal.
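build_complex makes a COMPLEX_CST, which only makes sense when both parts are constants; in the new test the parts are the run-time parameters `re` and `im`. The toy model below (hypothetical types, not GCC's tree API) shows the distinction the fix relies on, assuming complex_expr behaves like a folding COMPLEX_EXPR builder that yields a constant only when both parts are constants.

```cpp
#include <cassert>
#include <optional>

// Toy operand: LITERAL is engaged only when the operand is a constant.
struct Operand
{
  std::optional<float> literal;
};

enum class NodeKind { ComplexCst, ComplexExpr };

// Models build_complex: only valid for constant parts.  Passing a
// non-constant part trips the assertion, the analogue of the ICE.
NodeKind build_complex_cst (const Operand &re, const Operand &im)
{
  assert (re.literal && im.literal);
  return NodeKind::ComplexCst;
}

// Models a folding COMPLEX_EXPR builder: fold constant parts into a
// constant, otherwise keep a run-time expression.
NodeKind build_complex_expr (const Operand &re, const Operand &im)
{
  if (re.literal && im.literal)
    return build_complex_cst (re, im);
  return NodeKind::ComplexExpr;
}
```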

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32,
committed to mainline, and backported to the releases/gcc-11 branch.

Regards,
Iain.

---
PR d/105004

gcc/d/ChangeLog:

* d-codegen.cc (build_struct_literal): Use complex_expr to build
complex expressions from __c_complex types.

gcc/testsuite/ChangeLog:

* gdc.dg/pr105004.d: New test.
---
 gcc/d/d-codegen.cc  |  2 +-
 gcc/testsuite/gdc.dg/pr105004.d | 14 ++
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gdc.dg/pr105004.d

diff --git a/gcc/d/d-codegen.cc b/gcc/d/d-codegen.cc
index 3e54d3bffd0..3206edd17e8 100644
--- a/gcc/d/d-codegen.cc
+++ b/gcc/d/d-codegen.cc
@@ -1161,7 +1161,7 @@ build_struct_literal (tree type, vec  *init)
   if (COMPLEX_FLOAT_TYPE_P (type))
 {
   gcc_assert (vec_safe_length (init) == 2);
-  return build_complex (type, (*init)[0].value, (*init)[1].value);
+  return complex_expr (type, (*init)[0].value, (*init)[1].value);
 }
 
   vec  *ve = NULL;
diff --git a/gcc/testsuite/gdc.dg/pr105004.d b/gcc/testsuite/gdc.dg/pr105004.d
new file mode 100644
index 000..60b3c3f635e
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/pr105004.d
@@ -0,0 +1,14 @@
+// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105004
+// { dg-do compile }
+
+private struct _Complex(T)
+{
+T re;
+T im;
+}
+enum __c_complex_float  : _Complex!float;
+
+__c_complex_float pr105004(float re, float im)
+{
+return typeof(return)(re, im);
+}
-- 
2.32.0



Re: [PATCH]rs6000: avoid peeking eof after __vector keyword

2022-03-21 Thread Segher Boessenkool
On Mon, Mar 21, 2022 at 02:14:08PM -0400, David Edelsohn wrote:
> On Mon, Mar 21, 2022 at 5:13 AM Jiufu Guo  wrote:
> > There is a rare corner case where __vector is followed only by ";"
> > near the end of the file.

> This is okay. Maybe a tweak to the comment, see below.

This whole function could use some restructuring / rewriting to make
clearer what it actually does.  See the function comment:

/* Called to decide whether a conditional macro should be expanded.
   Since we have exactly one such macro (i.e, 'vector'), we do not
   need to examine the 'tok' parameter.  */

... followed by 17 uses of "tok".  Yes, some of those overwrite the
function argument, but that doesn't make it any better!  :-P

Some factoring would help, too, perhaps.


Segher


Re: [PATCH]rs6000: avoid peeking eof after __vector keyword

2022-03-21 Thread David Edelsohn via Gcc-patches
On Mon, Mar 21, 2022 at 5:13 AM Jiufu Guo  wrote:
>
> Hi!
>
> There is a rare corner case where __vector is followed only by ";"
> near the end of the file.
>
> Like the case in PR101168:
> using vdbl =  __vector double;
> #define BREAK 1
>
> For this case, "__vector double" is not followed by a CPP_NAME; it is
> followed by CPP_SEMICOLON and then EOF.  In this case, there are no
> more tokens in the file, so there is no need to continue parsing the
> file.
>
> This patch passes bootstrap and regtest on ppc64 and ppc64le.

This is okay. Maybe a tweak to the comment, see below.

Thanks, David

>
>
> BR,
> Jiufu
>
>
> PR preprocessor/101168
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000-c.cc (rs6000_macro_to_expand):
> Avoid empty identifier.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/powerpc/pr101168.C: New test.
>
>
> ---
>  gcc/config/rs6000/rs6000-c.cc   | 4 +++-
>  gcc/testsuite/g++.target/powerpc/pr101168.C | 6 ++
>  2 files changed, 9 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr101168.C
>
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index 3b62b499df2..f8cc7bad812 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -282,7 +282,9 @@ rs6000_macro_to_expand (cpp_reader *pfile, const cpp_token *tok)
> expand_bool_pixel = __pixel_keyword;
>   else if (ident == C_CPP_HASHNODE (__bool_keyword))
> expand_bool_pixel = __bool_keyword;
> - else
> +
> + /* If it needs to check tokens continue.  */

Maybe /* If there are more tokens to check.  */ ?

> + else if (ident)
> {
>   /* Try two tokens down, too.  */
>   do
> diff --git a/gcc/testsuite/g++.target/powerpc/pr101168.C b/gcc/testsuite/g++.target/powerpc/pr101168.C
> new file mode 100644
> index 000..284e77fdc88
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/pr101168.C
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-options "-maltivec" } */
> +
> +using vdbl =  __vector double;
> +#define BREAK 1
> --
> 2.25.1
>
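A much-simplified model of the guard this patch adds: the "try two tokens down" lookahead only runs when the token after the vector keyword is an identifier, so a ";" right before the end of the file stops the peeking. The token type, `peek` and `after_vector` are hypothetical; libcpp's cpp_token and cpp_peek_token differ, and the real rs6000_macro_to_expand handles more cases.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy token stream.
enum class TokKind { Name, Semicolon, Eof };
struct Token { TokKind kind; std::string spelling; };

// Peeking past the last token yields EOF.
const Token &peek (const std::vector<Token> &toks, std::size_t idx)
{
  static const Token eof{TokKind::Eof, ""};
  return idx < toks.size () ? toks[idx] : eof;
}

// Like altivec_categorize_keyword: the identifier of a name token,
// or nullptr for anything else (';', EOF, ...).
const std::string *ident_of (const Token &t)
{
  return t.kind == TokKind::Name ? &t.spelling : nullptr;
}

struct Decision { std::string expand; int peeked; };

// Decide what to expand after the vector keyword at POS.
Decision after_vector (const std::vector<Token> &toks, std::size_t pos)
{
  assert (pos < toks.size ());
  Decision d{"", 1};
  const std::string *ident = ident_of (peek (toks, pos + 1));
  if (ident && (*ident == "bool" || *ident == "pixel"))
    d.expand = *ident;
  else if (ident)  // The new guard: only look further after a name.
    {
      d.peeked = 2;
      const std::string *ident2 = ident_of (peek (toks, pos + 2));
      if (ident2 && (*ident2 == "bool" || *ident2 == "pixel"))
        d.expand = *ident2;
    }
  return d;
}
```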


Re: [PING^2][PATCH][final] Handle compiler-generated asm insn

2022-03-21 Thread Tom de Vries via Gcc-patches

On 3/21/22 14:49, Richard Biener wrote:

On Mon, Mar 21, 2022 at 12:50 PM Tom de Vries  wrote:


On 3/21/22 08:58, Richard Biener wrote:

On Thu, Mar 17, 2022 at 4:10 PM Tom de Vries via Gcc-patches
 wrote:


On 3/9/22 13:50, Tom de Vries wrote:

On 2/22/22 14:55, Tom de Vries wrote:

Hi,

For the nvptx port, with -mptx-comment we have in pr53465.s:
...
   // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
   // Start: Added by -minit-regs=3:
   // #NO_APP
   mov.u32 %r26, 0;
   // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
   // End: Added by -minit-regs=3:
   // #NO_APP
...

The comments were generated using the compiler-generated equivalent of:
...
 asm ("// Comment");
...
but both the printed location and the NO_APP/APP are unnecessary for a
compiler-generated asm insn.

Fix this by handling ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION in
final_scan_insn_1, such that we simply get:
...
   // Start: Added by -minit-regs=3:
   mov.u32 %r26, 0;
   // End: Added by -minit-regs=3:
...
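The intended output difference can be sketched as a toy printer (a hypothetical function, not the code in final.cc): user asm gets the #APP/#NO_APP markers and a location line, mimicking the nvptx output above, while compiler-generated asm is printed as-is. How to recognise "compiler-generated" is what the rest of this thread debates (unknown location here, later a flag on the ASM_INPUT), so the sketch simply takes it as a bool.

```cpp
#include <cassert>
#include <string>

// Toy model of printing an asm_input insn with nvptx-style '//'
// comments.  User asm is bracketed by #APP/#NO_APP and preceded by a
// location line; compiler-generated asm is emitted verbatim.
std::string print_asm_input (const std::string &text, bool compiler_generated,
                             const std::string &file = "", int line = 0)
{
  if (compiler_generated)
    return "\t" + text + "\n";
  // User asm is expected to carry a source location.
  assert (!file.empty () && line > 0);
  return "\t// #APP\n// " + std::to_string (line) + " \"" + file
         + "\" 1\n\t" + text + "\n\t// #NO_APP\n";
}
```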

Tested on nvptx.

OK for trunk?





Ping^2.

Tobias just reported an ICE in PR104968, and this patch fixes it.

I'd like to know whether this patch is acceptable for stage 4 or not.

If not, I need to fix PR104968 in a different way.  Say, disable
-mcomment by default, or try harder to propagate source info on
outlined functions.




Hi,

thanks for the review.


Usually targets use UNSPECs to emit compiler-generated "asm"
instructions.


Ack. [ I could go down that route eventually, but for now I'm hoping to
implement this without having to change the port. ]


I think an unknown location is a reasonable but not
the best way to identify 'compiler-generated', we might lose
the location through optimization.  (why does it not use
the INSN_LOCATION?)



I don't know.  FWIW, at the time that ASM_INPUT_SOURCE_LOCATION was
introduced (2007), there was no INSN_LOCATION yet (introduced in 2012),
only INSN_LOCATOR, my guess is that it has something to do with that.


Rather than a location I'd use sth like DECL_ARTIFICIAL to
disable 'user-mangling', do we have something like that for
ASM or an insn in general?


Haven't found it.


If not maybe there's an unused
bit on ASMs we can enable this way.


Done.  I've used the jump flag for that.

Updated, untested patch attached.

Is this what you meant?


Hmm.  I now read that ASM_INPUT is in every PATTERN of an insn


Maybe I misunderstand, but that sounds incorrect to me.  That is, can 
you point me to where you read that?


Maybe you're referring to the fact that an ASM_INPUT may occur inside an 
ASM_OPERANDS, as "a convenient way to hold a string" (quoting rtl.def)?



and wonder how this all works out there.  That is, by default the
ASM_INPUT would be artificial (for regular define_insn) but asm("")
in source would mark them ASM_INPUT_USER_P or so.



If you're suggesting to make it by default artificial, then that doesn't 
sound like a bad idea to me.  In this iteration I haven't implemented 
this (yet), but instead explicitly marked as artificial some other uses 
of ASM_INPUT.



But then I know nothing here.  I did expect us to look at
ASM_OPERANDS instead of just ASM_INPUT (but the code you
are changing is about ASM_INPUT).



I extended the rationale in the commit log a bit to include a 
description of what the rtl-equivalent of 'asm ("// Comment")' looks 
like, and there's no ASM_OPERANDS there.



That said, the comments should probably explicitly say this
is about ASM_INPUT in an ASM_OPERANDS instruction
template, not some other pattern.



AFAIU, this isn't about an ASM_INPUT in an ASM_OPERANDS instruction
template, so at this point I haven't updated the comment.


Thanks,
- Tom
[final] Handle compiler-generated asm insn

For the nvptx port, with -mptx-comment we have for test-case pr53465.c at
mach:
...
(insn 66 43 65 3 (asm_input ("// Start: Added by -minit-regs=3:")) -1
 (nil))
(insn 65 66 67 3 (set (reg/v:SI 26 [ d ])
(const_int 0 [0])) 6 {*movsi_insn}
 (nil))
(insn 67 65 44 3 (asm_input ("// End: Added by -minit-regs=3:")) -1
 (nil))
...
and in pr53465.s:
...
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// Start: Added by -minit-regs=3:
// #NO_APP
mov.u32 %r26, 0;
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// End: Added by -minit-regs=3:
// #NO_APP
...

[ The comment insns were modelled after:
...
  asm ("// Comment");
...
which expands to:
...
(insn 5 2 6 2 (parallel [
(asm_input/v ("// Comment") test.c:4)
(clobber (mem:BLK (scratch) [0  A8]))
]) "test.c":4:3 -1
 (nil))
...
Note btw the differences: the comment insn has no clobber, and ASM_INPUT is
not volatile. ]

Both the printed location and the NO_APP/APP are unnecessary for a
compiler-generated asm insn.

Fix this by:
- 

Re: [PATCH] Ignore (possible) signed zeros in operands of FP comparisons.

2022-03-21 Thread Aldy Hernandez via Gcc-patches
On Fri, Mar 18, 2022 at 7:33 PM Aldy Hernandez  wrote:

> > > Consider the following interesting example:
> > >
> > > int foo(int x, double y) {
> > >  return (x * 0.0) < y;
> > > }
> > >
> > > Although we know that x (when converted to double) can't be NaN or
> > > Inf, we still worry that for negative values of x that (x * 0.0) may
> > > be -0.0 and so perform the multiplication at run-time. But in this
> > > case, the result of the comparison (-0.0 < y) will be exactly the same
> > > as (+0.0 < y) for any y, hence the above may be safely constant
> > > folded to "0.0 < y" avoiding the multiplication at run-time.

Ok, once the "frange" infrastructure is in place, it's actually
trivial.  See attached patch and tests.  We can do everything with
small range-op entries and evrp / ranger will handle everything else.
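The two IEEE facts the quoted argument relies on can be spot-checked in a few lines, assuming IEEE 754 doubles and no -ffast-math: x * 0.0 for an int x is never NaN (an int converts to a finite double) but is -0.0 for negative x, and `<` gives the same answer for -0.0 and +0.0. The probe list is illustrative, not a proof.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// x * 0.0 for an int x: finite times zero, so never NaN, but the
// result is -0.0 when x is negative.
double times_zero (int x)
{
  return static_cast<double> (x) * 0.0;
}

bool is_negative_zero (double d)
{
  return d == 0.0 && std::signbit (d);
}

// True if (-0.0 < y) and (+0.0 < y) agree for every probe value of y,
// including the infinities and NaN.
bool signed_zero_irrelevant_for_less ()
{
  const double inf = std::numeric_limits<double>::infinity ();
  const double nan = std::numeric_limits<double>::quiet_NaN ();
  const double probes[] = { -inf, -1.0, -0.0, 0.0, 1.0, inf, nan,
                            std::numeric_limits<double>::denorm_min () };
  for (double y : probes)
    if ((-0.0 < y) != (0.0 < y))
      return false;
  return true;
}
```

The int-to-double conversion matters because of infinity: for an arbitrary double x, Inf * 0.0 is NaN, so the fold is only safe once x is known to be finite.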

Roger, I believe this is what you described:

+// { dg-do compile }
+// { dg-options "-O2 -fno-tree-forwprop -fno-thread-jumps -fdump-tree-evrp -fdump-tree-optimized" }
+
+extern void link_error ();
+
+int foo(int x, double y)
+{
+  return (x * 0.0) < y;
+}
+
+// The multiply should be gone by *vrp time.
+// { dg-final { scan-tree-dump-not " \\* " "evrp" } }
+
+// Ultimately, there should be no references to "x".
+// { dg-final { scan-tree-dump-not "x_" "optimized" } }

With the attached patch (and pending patches), we keep track that a
cast from int to float is guaranteed to not be NaN, which allows us to
fold x*0.0 into 0.0, and DCE to remove x altogether.  Also, as the
other tests show, we are able to resolve __builtin_isnan's for the
operations implemented.  It should be straightforward for someone with
floating point foo to extend this to all operations.
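A toy model of the three new range-op entries (a hypothetical struct, not GCC's frange API) shows how the properties flow from the cast to the multiply and the unordered check. One deliberate difference from the patch: the model also tracks "may be infinite", because for a general operand Inf * 0.0 is NaN; the int-to-float conversion clears both flags.

```cpp
// Toy floating-point range: only a few properties are tracked.
struct FRange
{
  bool maybe_nan;
  bool maybe_inf;
  bool known_zero;
};

constexpr FRange varying { true, true, false };
constexpr FRange real_cst_zero { false, false, true };  // the constant 0.0

enum class Tri { False, True, Unknown };

// FLOAT_EXPR: converting an integer yields neither NaN nor Inf
// (assuming the float type covers the integer type's range, as
// double does for int).
constexpr FRange fold_float_expr ()
{
  return FRange{ false, false, false };
}

// UNORDERED_EXPR: known false when neither operand can be NaN.
constexpr Tri fold_unordered (FRange lh, FRange rh)
{
  return (!lh.maybe_nan && !rh.maybe_nan) ? Tri::False : Tri::Unknown;
}

// MULT_EXPR by zero: a zero (of either sign) when the other operand
// is neither NaN nor infinite.
constexpr FRange fold_mult (FRange lh, FRange rh)
{
  if (rh.known_zero && !rh.maybe_nan && !lh.maybe_nan && !lh.maybe_inf)
    return FRange{ false, false, true };
  return varying;
}
```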

Aldy
From 2a6218e97782f63dfe9836e6024fbb28c8cbb803 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Mon, 21 Mar 2022 16:26:40 +0100
Subject: [PATCH] [frange] Implement NAN aware stubs for FLOAT_EXPR,
 UNORDERED_EXPR, and MULT_EXPR.

---
 gcc/range-op-float.cc| 90 
 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-01.c | 14 +++
 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-02.c | 14 +++
 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-03.c | 15 
 4 files changed, 133 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-01.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-02.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-03.c

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 988a3938959..52aaa01ab0f 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -922,6 +922,93 @@ foperator_cst::fold_range (frange , tree type ATTRIBUTE_UNUSED,
   return true;
 }
 
+class foperator_cast : public range_operator
+{
+  using range_operator::fold_range;
+public:
+  virtual bool fold_range (frange , tree type,
+			   const irange ,
+			   const frange ,
+			   relation_kind rel = VREL_NONE) const override;
+} fop_convert;
+
+bool
+foperator_cast::fold_range (frange , tree type,
+			const irange ,
+			const frange ,
+			relation_kind) const
+{
+  if (empty_range_varying (r, type, inner, outer))
+return true;
+
+  r.set_varying (type);
+
+  // Some flags can be cleared when converting from ints.
+  r.clear_prop (FRANGE_PROP_NAN);
+
+  return true;
+}
+
+class foperator_unordered : public range_operator
+{
+  using range_operator::fold_range;
+public:
+  virtual bool fold_range (irange , tree type,
+			   const frange ,
+			   const frange ,
+			   relation_kind rel = VREL_NONE) const override;
+} fop_unordered;
+
+bool
+foperator_unordered::fold_range (irange , tree type,
+ const frange ,
+ const frange ,
+ relation_kind) const
+{
+  if (empty_range_varying (r, type, lh, rh))
+return true;
+
+  // Return FALSE if both operands are !NaN.
+  if (!lh.get_prop (FRANGE_PROP_NAN) && !rh.get_prop (FRANGE_PROP_NAN))
+{
+  r = range_false (type);
+  return true;
+}
+
+  return false;
+}
+
+class foperator_mult : public range_operator
+{
+  using range_operator::fold_range;
+public:
+  virtual bool fold_range (frange , tree type,
+			   const frange ,
+			   const frange ,
+			   relation_kind rel = VREL_NONE) const override;
+} fop_mult;
+
+bool
+foperator_mult::fold_range (frange , tree type,
+			const frange ,
+			const frange ,
+			relation_kind) const
+{
+  if (empty_range_varying (r, type, lh, rh))
+return true;
+
+  // When x is !Nan, x * 0.0 = 0.0
+  if (rh.zero_p ()
+  && !rh.get_prop (FRANGE_PROP_NAN)
+  && !lh.get_prop (FRANGE_PROP_NAN))
+{
+  r.set_zero (type);
+  return true;
+}
+
+  return false;
+}
+
 class floating_table : public range_op_table
 {
 public:
@@ -944,6 +1031,9 @@ floating_table::floating_table ()
   set (PAREN_EXPR, fop_identity);
   set (OBJ_TYPE_REF, fop_identity);
   set (REAL_CST, fop_real_cst);
+  set (FLOAT_EXPR, fop_convert);
+  set (UNORDERED_EXPR, 

[PATCH] tree-optimization/104912 - ensure cost model is checked first

2022-03-21 Thread Richard Biener via Gcc-patches
The following makes sure that when we build the versioning condition
for vectorization including the cost model check, we check for the
cost model and branch over other versioning checks.  That is what
the cost modeling assumes, since the cost model check is the only
one accounted for in the scalar outside cost.  Currently we emit
all checks as straight-line code combined with bitwise ops which
can result in surprising ordering of checks in the final assembly.
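The difference between emitting the checks as straight-line bitwise code and branching on the cost model check first can be modelled in a few lines. The names are hypothetical; in the vectorizer the checks are GIMPLE conditions, not function calls. The counters show that only the split form skips the (expensive) alias check when the cost check already fails.

```cpp
#include <cassert>

// Counts how often each versioning check is evaluated.
struct Counters
{
  int cost = 0;
  int alias = 0;
};

bool cost_check (Counters &c, bool profitable)
{
  ++c.cost;
  return profitable;
}

bool alias_check (Counters &c, bool no_alias)
{
  ++c.alias;
  return no_alias;
}

// Straight-line form: both checks are evaluated and combined with a
// bitwise AND (deliberately not &&), so the alias check always runs.
bool combined (Counters &c, bool profitable, bool no_alias)
{
  return cost_check (c, profitable) & alias_check (c, no_alias);
}

// Split form: branch on the cost model check first.
bool split (Counters &c, bool profitable, bool no_alias)
{
  if (!cost_check (c, profitable))
    return false;
  return alias_check (c, no_alias);
}
```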

Since loop_version accepts only a single versioning condition
the splitting is done after the fact.

The result is a 1.5% speedup of 416.gamess on x86_64 when compiling
with -Ofast and tuning for generic or skylake.  That's not enough
to recover from the slowdown when vectorizing but it now cuts off
the expensive alias versioning test.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk?

For the rest of the regression my plan is to somehow factor in
the evolution of the number of iterations in the outer loop
(which is {1, +, 1}) to somehow bump the static profitability
estimate and together with the "cheap" cost model check never
execute the vectorized version (well, it is actually never executed,
but only because the alias check fails).

Thanks,
Richard.

2022-03-21  Richard Biener  

PR tree-optimization/104912
* tree-vect-loop-manip.cc (vect_loop_versioning): Split
the cost model check to a separate BB to make sure it is
checked first and not combined with other version checks.
---
 gcc/tree-vect-loop-manip.cc | 53 ++---
 1 file changed, 50 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index a7bbc916bbc..8ef333eb31b 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3445,13 +3445,28 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
cond_expr = expr;
 }
 
+  tree cost_name = NULL_TREE;
+  if (cond_expr
+  && !integer_truep (cond_expr)
+  && (version_niter
+ || version_align
+ || version_alias
+ || version_simd_if_cond))
+cost_name = cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
+   &cond_expr_stmt_list,
+   is_gimple_val, NULL_TREE);
+
   if (version_niter)
 vect_create_cond_for_niters_checks (loop_vinfo, &cond_expr);
 
   if (cond_expr)
-cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
-   &cond_expr_stmt_list,
-   is_gimple_condexpr, NULL_TREE);
+{
+  gimple_seq tem = NULL;
+  cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
+ &tem,
+ is_gimple_condexpr, NULL_TREE);
+  gimple_seq_add_seq (&cond_expr_stmt_list, tem);
+}
 
   if (version_align)
 vect_create_cond_for_align_checks (loop_vinfo, &cond_expr,
@@ -3654,6 +3669,38 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
   update_ssa (TODO_update_ssa);
 }
 
+  /* Split the cost model check off to a separate BB.  Costing assumes
+ this is the only thing we perform when we enter the scalar loop.  */
+  if (cost_name)
+{
+  gimple *def = SSA_NAME_DEF_STMT (cost_name);
+  /* All uses of the cost check are 'true' after the check we
+are going to insert.  */
+  replace_uses_by (cost_name, boolean_true_node);
+  /* And we're going to build the new single use of it.  */
+  gcond *cond = gimple_build_cond (NE_EXPR, cost_name, boolean_false_node,
+  NULL_TREE, NULL_TREE);
+  edge e = split_block (gimple_bb (def), def);
+  gimple_stmt_iterator gsi = gsi_for_stmt (def);
+  gsi_insert_after (&gsi, cond, GSI_NEW_STMT);
+  edge true_e, false_e;
+  extract_true_false_edges_from_block (e->dest, &true_e, &false_e);
+  e->flags &= ~EDGE_FALLTHRU;
+  e->flags |= EDGE_TRUE_VALUE;
+  edge e2 = make_edge (e->src, false_e->dest, EDGE_FALSE_VALUE);
+  e->probability = prob;
+  e2->probability = prob.invert ();
+  set_immediate_dominator (CDI_DOMINATORS, false_e->dest, e->src);
+  auto_vec<basic_block> adj;
+  for (basic_block son = first_dom_son (CDI_DOMINATORS, e->dest);
+  son;
+  son = next_dom_son (CDI_DOMINATORS, son))
+   if (EDGE_COUNT (son->preds) > 1)
+ adj.safe_push (son);
+  for (auto son : adj)
+   set_immediate_dominator (CDI_DOMINATORS, son, e->src);
+}
+
   if (version_niter)
 {
   /* The versioned loop could be infinite, we need to clear existing
-- 
2.34.1


Re: [PATCH v3, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-03-21 Thread will schmidt via Gcc-patches
On Mon, 2022-03-21 at 09:51 +0800, HAO CHEN GUI wrote:
> Hi,
> This patch adds V1TI mode to a new mode iterator used in vector
> comparison expanders. Without the patch, comparisons between two vector
> __int128 values are converted to scalar comparisons with branches, which
> is suboptimal. The patch fixes the issue: now all comparisons between two
> vector __int128 values generate the new P10 comparison instructions. The
> related built-ins also generate the same instructions after gimple folding,
> so they're added back to the list.
> 

Hi,
Thanks for reworking the description, this clears up my uncertainty. 
:-)
A few spots where spaces should be added after periods.  No need to
re-post for just that.  Patch content otherwise seems OK to me, though
I defer to others for any subtleties with actual VEC_IC related
changes.
Thanks
-Will


> Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-03-16 Haochen Gui 
> 
> gcc/
>   PR target/103316
>   * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable
>   gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET,
>   RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT,
>   RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI.
>   * config/rs6000/vector.md (VEC_IC): Define. Add support for new Power10
>   V1TI instructions.
>   (vec_cmp): Set mode iterator to VEC_IC.
>   (vec_cmpu): Likewise.
> 
> gcc/testsuite/
>   PR target/103316
>   * gcc.target/powerpc/pr103316.c: New.
>   * gcc.target/powerpc/fold-vec-cmp-int128.c: New cases for vector
>   __int128.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
> index 5d34c1bcfc9..fac7f43f438 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -1994,16 +1994,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPEQUH:
>  case RS6000_BIF_VCMPEQUW:
>  case RS6000_BIF_VCMPEQUD:
> -/* We deliberately omit RS6000_BIF_VCMPEQUT for now, because gimple
> -   folding produces worse code for 128-bit compares.  */
> +case RS6000_BIF_VCMPEQUT:
>fold_compare_helper (gsi, EQ_EXPR, stmt);
>return true;
> 
>  case RS6000_BIF_VCMPNEB:
>  case RS6000_BIF_VCMPNEH:
>  case RS6000_BIF_VCMPNEW:
> -/* We deliberately omit RS6000_BIF_VCMPNET for now, because gimple
> -   folding produces worse code for 128-bit compares.  */
> +case RS6000_BIF_VCMPNET:
>fold_compare_helper (gsi, NE_EXPR, stmt);
>return true;
> 
> @@ -2015,9 +2013,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPGE_U4SI:
>  case RS6000_BIF_CMPGE_2DI:
>  case RS6000_BIF_CMPGE_U2DI:
> -/* We deliberately omit RS6000_BIF_CMPGE_1TI and RS6000_BIF_CMPGE_U1TI
> -   for now, because gimple folding produces worse code for 128-bit
> -   compares.  */
> +case RS6000_BIF_CMPGE_1TI:
> +case RS6000_BIF_CMPGE_U1TI:
>fold_compare_helper (gsi, GE_EXPR, stmt);
>return true;
> 
> @@ -2029,9 +2026,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPGTUW:
>  case RS6000_BIF_VCMPGTUD:
>  case RS6000_BIF_VCMPGTSD:
> -/* We deliberately omit RS6000_BIF_VCMPGTUT and RS6000_BIF_VCMPGTST
> -   for now, because gimple folding produces worse code for 128-bit
> -   compares.  */
> +case RS6000_BIF_VCMPGTUT:
> +case RS6000_BIF_VCMPGTST:
>fold_compare_helper (gsi, GT_EXPR, stmt);
>return true;
> 
> @@ -2043,9 +2039,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPLE_U4SI:
>  case RS6000_BIF_CMPLE_2DI:
>  case RS6000_BIF_CMPLE_U2DI:
> -/* We deliberately omit RS6000_BIF_CMPLE_1TI and RS6000_BIF_CMPLE_U1TI
> -   for now, because gimple folding produces worse code for 128-bit
> -   compares.  */
> +case RS6000_BIF_CMPLE_1TI:
> +case RS6000_BIF_CMPLE_U1TI:
>fold_compare_helper (gsi, LE_EXPR, stmt);
>return true;
> 
> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index b87a742cca8..d88869cc8d0 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -26,6 +26,9 @@
>  ;; Vector int modes
>  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
> 
> +;; Vector int modes for comparison
> +(define_mode_iterator VEC_IC [V16QI V8HI V4SI V2DI (V1TI "TARGET_POWER10")])
> +
>  ;; 128-bit int modes
>  (define_mode_iterator VEC_TI [V1TI TI])
> 
> @@ -533,10 +536,10 @@ (define_expand "vcond_mask_"
> 
>  ;; For signed integer vectors comparison.
>  (define_expand "vec_cmp"
> -  [(set (match_operand:VEC_I 0 "vint_operand")
> +  [(set (match_operand:VEC_IC 0 "vint_operand")
>   (match_operator 1 

Re: [PATCH v2] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-21 Thread Hongyu Wang via Gcc-patches
> Considering ICE in PR104976, it's better to force_reg before lowpart_subreg.
> i.e.
> op0 = lowpart_subreg (V4SFmode, force_reg (V8HFmode, operands[0]), V8HFmode);
> if (!MEM_P (operands[1]))
>   operands[1] = force_reg (V8HFmode, operands[1]);
> op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> rtx dest = gen_reg_rtx (V4SFmode);
> emit_insn (gen_avx512f_movsf_mask (dest, op1, op0, op1, operands[4]));
> emit_move_insn (operands[0], lowpart_subreg (V8HFmode, dest, V4SFmode));

I think this is different from PR104976, since operands[0] and
operands[1] here are strictly V8HF operands from the builtin input.
I suppose there is no way for a subreg of a different size to reach
the expander; otherwise the (__v8hf) conversion in the builtin would
fail first.

Hongtao Liu via Gcc-patches wrote on Monday, March 21, 2022 at 20:53:

>
> On Mon, Mar 21, 2022 at 7:52 PM Hongyu Wang via Gcc-patches
>  wrote:
> >
> > Hi,
> >
> > For complex scalar intrinsics like _mm_mask_fcmadd_sch, the
> > mask should be ANDed with 1 so that it only applies to the lowest element.
> > Use a masked vmovss to perform the same operation, since it ignores the
> > higher bits of the mask.
> >
> > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
> >
> > Ok for master?
> >
> > gcc/ChangeLog:
> >
> > PR target/104978
> > * config/i386/sse.md
> > (avx512fp16_fmaddcsh_v8hf_mask1 > Use avx512f_movsf_mask instead of vmovaps or vblend.
> > (avx512fp16_fcmaddcsh_v8hf_mask1 >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/104978
> > * gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c: Adjust scan.
> > * gcc.target/i386/avx512fp16-vfmaddcsh-1a.c: Ditto.
> > * gcc.target/i386/avx512fp16-vfcmaddcsh-1c.c: Removed.
> > * gcc.target/i386/avx512fp16-vfmaddcsh-1c.c: Ditto.
> > * gcc.target/i386/pr104978.c: New test.
> > ---
> >  gcc/config/i386/sse.md| 48 ---
> >  .../i386/avx512fp16-vfcmaddcsh-1a.c   |  4 +-
> >  .../i386/avx512fp16-vfcmaddcsh-1c.c   | 13 -
> >  .../gcc.target/i386/avx512fp16-vfmaddcsh-1a.c |  4 +-
> >  .../gcc.target/i386/avx512fp16-vfmaddcsh-1c.c | 13 -
> >  gcc/testsuite/gcc.target/i386/pr104978.c  | 18 +++
> >  6 files changed, 30 insertions(+), 70 deletions(-)
> >  delete mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1c.c
> >  delete mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1c.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104978.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index 21bf3c55c95..1087a37812f 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -6586,26 +6586,10 @@ (define_expand "avx512fp16_fmaddcsh_v8hf_mask1"
> >  emit_insn (gen_avx512fp16_fmaddcsh_v8hf_mask (operands[0],
> >operands[1], operands[2], operands[3], operands[4]));
> >
> > -  if (TARGET_AVX512VL)
> > -  {
> > -op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
> > -op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> > -emit_insn (gen_avx512vl_loadv4sf_mask (op0, op0, op1, operands[4]));
> > -  }
> > -  else
> > -  {
> > -rtx mask, tmp, vec_mask;
> > -mask = lowpart_subreg (SImode, operands[4], QImode),
> > -tmp = gen_reg_rtx (SImode);
> > -emit_insn (gen_ashlsi3 (tmp, mask, GEN_INT (31)));
> > -vec_mask = gen_reg_rtx (V4SImode);
> > -emit_insn (gen_rtx_SET (vec_mask, CONST0_RTX (V4SImode)));
> > -emit_insn (gen_vec_setv4si_0 (vec_mask, vec_mask, tmp));
> > -vec_mask = lowpart_subreg (V4SFmode, vec_mask, V4SImode);
> > -op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
> > -op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> > -emit_insn (gen_sse4_1_blendvps (op0, op1, op0, vec_mask));
> > -  }
> > +  op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
> > +  op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> > +  emit_insn (gen_avx512f_movsf_mask (op1, op1, op0, op1, operands[4]));
> > +  emit_move_insn (op0, op1);
> Considering ICE in PR104976, it's better to force_reg before lowpart_subreg.
> i.e.
> op0 = lowpart_subreg (V4SFmode, force_reg (V8HFmode, operands[0]), V8HFmode);
> if (!MEM_P (operands[1]))
>   operands[1] = force_reg (V8HFmode, operands[1]);
> op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> rtx dest = gen_reg_rtx (V4SFmode);
> emit_insn (gen_avx512f_movsf_mask (dest, op1, op0, op1, operands[4]));
> emit_move_insn (operands[0], lowpart_subreg (V8HFmode, dest, V4SFmode));
>
> >DONE;
> >  })
> >
> > @@ -6641,26 +6625,10 @@ (define_expand "avx512fp16_fcmaddcsh_v8hf_mask1"
> >  emit_insn (gen_avx512fp16_fcmaddcsh_v8hf_mask (operands[0],
> >operands[1], operands[2], operands[3], operands[4]));
> >
> > -  if (TARGET_AVX512VL)
> > -  {
> > -op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
> > -op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> > -emit_insn 

[PATCH] x86: Properly check FEATURE_AESKLE

2022-03-21 Thread H.J. Lu via Gcc-patches
1. Pass 0x19 to __cpuid for bit_AESKLE.
2. Enable FEATURE_AESKLE only if bit_AESKLE is set.

PR target/104998
* common/config/i386/cpuinfo.h (get_available_features): Pass
0x19 to __cpuid for bit_AESKLE.  Enable FEATURE_AESKLE only if
bit_AESKLE is set.
---
 gcc/common/config/i386/cpuinfo.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 61b1a0f291c..239759dc766 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -779,11 +779,11 @@ get_available_features (struct __processor_model *cpu_model,
   /* Get Advanced Features at level 0x19 (eax = 0x19).  */
   if (max_cpuid_level >= 0x19)
 {
-  set_feature (FEATURE_AESKLE);
-  __cpuid (19, eax, ebx, ecx, edx);
+  __cpuid (0x19, eax, ebx, ecx, edx);
   /* Check if OS support keylocker.  */
   if (ebx & bit_AESKLE)
{
+ set_feature (FEATURE_AESKLE);
  if (ebx & bit_WIDEKL)
set_feature (FEATURE_WIDEKL);
  if (has_kl)
-- 
2.35.1
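A small sketch of the corrected gating, written as a pure function over an EBX value so it can be exercised without Key Locker hardware. The bit masks are assumptions for illustration (my understanding is that <cpuid.h> puts bit_AESKLE at bit 0 and bit_WIDEKL at bit 2 of leaf 0x19 EBX; check the header), and decode_leaf_0x19 is a hypothetical name that models only the AESKLE/WIDEKL part, not the has_kl handling. Note that the old call passed decimal 19, which is leaf 0x13, not 0x19.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed EBX bit masks for CPUID leaf 0x19 (illustrative only).  */
#define SKETCH_BIT_AESKLE 0x1u
#define SKETCH_BIT_WIDEKL 0x4u

struct kl_features
{
  bool aeskle;
  bool widekl;
};

/* Nothing is reported unless leaf 0x19 exists and EBX has the AESKLE
   bit set; WIDEKL is only considered inside that check.  */
static struct kl_features
decode_leaf_0x19 (unsigned max_cpuid_level, uint32_t ebx)
{
  struct kl_features f = { false, false };
  if (max_cpuid_level >= 0x19 && (ebx & SKETCH_BIT_AESKLE))
    {
      f.aeskle = true;
      if (ebx & SKETCH_BIT_WIDEKL)
        f.widekl = true;
    }
  return f;
}
```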



Re: [PING^2][PATCH][final] Handle compiler-generated asm insn

2022-03-21 Thread Richard Biener via Gcc-patches
On Mon, Mar 21, 2022 at 12:50 PM Tom de Vries  wrote:
>
> On 3/21/22 08:58, Richard Biener wrote:
> > On Thu, Mar 17, 2022 at 4:10 PM Tom de Vries via Gcc-patches
> >  wrote:
> >>
> >> On 3/9/22 13:50, Tom de Vries wrote:
> >>> On 2/22/22 14:55, Tom de Vries wrote:
>  Hi,
> 
>  For the nvptx port, with -mptx-comment we have in pr53465.s:
>  ...
>    // #APP
>  // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
>    // Start: Added by -minit-regs=3:
>    // #NO_APP
>    mov.u32 %r26, 0;
>    // #APP
>  // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
>    // End: Added by -minit-regs=3:
>    // #NO_APP
>  ...
> 
>  The comments were generated using the compiler-generated equivalent of:
>  ...
>  asm ("// Comment");
>  ...
>  but both the printed location and the NO_APP/APP are unnecessary for a
>  compiler-generated asm insn.
> 
>  Fix this by handling ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION in
>  final_scan_insn_1, so that we simply get:
>  ...
>    // Start: Added by -minit-regs=3:
>    mov.u32 %r26, 0;
>    // End: Added by -minit-regs=3:
>  ...
> 
>  Tested on nvptx.
> 
>  OK for trunk?
> 
> >>>
> >>
> >> Ping^2.
> >>
> >> Tobias just reported an ICE in PR104968, and this patch fixes it.
> >>
> >> I'd like to know whether this patch is acceptable for stage 4 or not.
> >>
> >> If not, I need to fix PR104968 in a different way.  Say, disable
> >> -mcomment by default, or trying harder to propagate source info on
> >> outlined functions.
> >
>
> Hi,
>
> thanks for the review.
>
> > Usually targets use UNSPECs to emit compiler-generated "asm"
> > instructions.
>
> Ack. [ I could go down that route eventually, but for now I'm hoping to
> implement this without having to change the port. ]
>
> > I think an unknown location is a reasonable but not
> > the best way to identify 'compiler-generated', we might lose
> > the location through optimization.  (why does it not use
> > the INSN_LOCATION?)
> >
>
> I don't know.  FWIW, at the time that ASM_INPUT_SOURCE_LOCATION was
> introduced (2007), there was no INSN_LOCATION yet (introduced in 2012),
> only INSN_LOCATOR, my guess is that it has something to do with that.
>
> > Rather than a location I'd use sth like DECL_ARTIFICIAL to
> > disable 'user-mangling', do we have something like that for
> > ASM or an insn in general?
>
> Haven't found it.
>
> > If not maybe there's an unused
> > bit on ASMs we can enable this way.
>
> Done.  I've used the jump flag for that.
>
> Updated, untested patch attached.
>
> Is this what you meant?

Hmm.  I now read that ASM_INPUT is in every PATTERN of an insn
and wonder how this all works out there.  That is, by default the
ASM_INPUT would be artificial (for regular define_insn) but asm("")
in source would mark them ASM_INPUT_USER_P or so.

But then I know nothing here.  I did expect us to look at
ASM_OPERANDS instead of just ASM_INPUT (but the code you
are changing is about ASM_INPUT).

That said, the comments should probably explicitly say this
is about ASM_INPUT in an ASM_OPERANDS instruction
template, not some other pattern.

But yes, this was kind-of what I meant.

Any considerations from others?

Thanks,
Richard.

>
> Thanks,
> - Tom


[PATCH] [i386] Extend splitter pattern to reversed condition by swapping then and else rtx. [PR target/104982]

2022-03-21 Thread liuhongt via Gcc-patches
Failed to match this instruction:
(set (reg/v:SI 88 [ z ])
(if_then_else:SI (eq (zero_extract:SI (reg:SI 92)
(const_int 1 [0x1])
(zero_extend:SI (subreg:QI (reg:SI 93) 0)))
(const_int 0 [0]))
(reg:SI 95)
(reg:SI 94)))

but it's equal to

(set (reg/v:SI 88 [ z ])
(if_then_else:SI (ne (zero_extract:SI (reg:SI 92)
(const_int 1 [0x1])
(zero_extend:SI (subreg:QI (reg:SI 93) 0)))
(const_int 0 [0]))
(reg:SI 94)
(reg:SI 95)))

which is the exact existing splitter.
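The identity can be checked at the C level: selecting A when a bit is clear and B otherwise is the same as selecting B when the bit is set and A otherwise. In the RTL above, A corresponds to reg 95 and B to reg 94. The helpers below are hypothetical and only illustrate the equivalence; the patch itself gets there by matching bt_comparison_operator and swapping operands[3] and operands[4] for EQ.

```c
#include <stdint.h>

/* EQ form, as produced by combine: bit N of V clear selects A.  */
static uint32_t select_eq (uint32_t v, unsigned n, uint32_t a, uint32_t b)
{
  return ((v >> n) & 1) == 0 ? a : b;
}

/* NE form with the arms swapped, which the existing splitter matches.  */
static uint32_t select_ne_swapped (uint32_t v, unsigned n, uint32_t a,
                                   uint32_t b)
{
  return ((v >> n) & 1) != 0 ? b : a;
}
```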

The patch fixes the following regressions:

On x86-64, r12-7687 caused:

FAIL: gcc.target/i386/bt-5.c scan-assembler-not sar[lq][ \t]
FAIL: gcc.target/i386/bt-5.c scan-assembler-times bt[lq][ \t] 7

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ok for trunk?

gcc/ChangeLog:

PR target/104982
* config/i386/i386.md (*jcc_bt_mask): Extend the
following splitter to reversed condition.
---
 gcc/config/i386/i386.md | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 02f298c2846..c74edd1aaef 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14182,12 +14182,12 @@ (define_insn_and_split "*jcc_bt_mask"
 (define_split
   [(set (match_operand:SWI248 0 "register_operand")
(if_then_else:SWI248
-(ne
- (zero_extract:SWI48
-  (match_operand:SWI48 1 "register_operand")
-  (const_int 1)
-  (zero_extend:SI (match_operand:QI 2 "register_operand")))
- (const_int 0))
+(match_operator 5 "bt_comparison_operator"
+ [(zero_extract:SWI48
+   (match_operand:SWI48 1 "register_operand")
+   (const_int 1)
+   (zero_extend:SI (match_operand:QI 2 "register_operand")))
+  (const_int 0)])
 (match_operand:SWI248 3 "nonimmediate_operand")
 (match_operand:SWI248 4 "nonimmediate_operand")))]
   "TARGET_USE_BT && TARGET_CMOVE
@@ -14202,6 +14202,8 @@ (define_split
 (match_dup 3)
 (match_dup 4)))]
 {
+  if (GET_CODE (operands[5]) == EQ)
+std::swap (operands[3], operands[4]);
   operands[2] = lowpart_subreg (SImode, operands[2], QImode);
 })
 
-- 
2.18.1



Re: [PATCH v2] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-21 Thread Hongtao Liu via Gcc-patches
On Mon, Mar 21, 2022 at 7:52 PM Hongyu Wang via Gcc-patches
 wrote:
>
> Hi,
>
> For complex scalar intrinsics like _mm_mask_fcmadd_sch, the
> mask should be ANDed with 1 so that it only applies to the lowest element.
> Use a masked vmovss to perform the same operation, since it ignores the
> higher bits of the mask.
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
>
> Ok for master?
>
> gcc/ChangeLog:
>
> PR target/104978
> * config/i386/sse.md
> (avx512fp16_fmaddcsh_v8hf_mask1 Use avx512f_movsf_mask instead of vmovaps or vblend.
> (avx512fp16_fcmaddcsh_v8hf_mask1
> gcc/testsuite/ChangeLog:
>
> PR target/104978
> * gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c: Adjust scan.
> * gcc.target/i386/avx512fp16-vfmaddcsh-1a.c: Ditto.
> * gcc.target/i386/avx512fp16-vfcmaddcsh-1c.c: Removed.
> * gcc.target/i386/avx512fp16-vfmaddcsh-1c.c: Ditto.
> * gcc.target/i386/pr104978.c: New test.
> ---
>  gcc/config/i386/sse.md| 48 ---
>  .../i386/avx512fp16-vfcmaddcsh-1a.c   |  4 +-
>  .../i386/avx512fp16-vfcmaddcsh-1c.c   | 13 -
>  .../gcc.target/i386/avx512fp16-vfmaddcsh-1a.c |  4 +-
>  .../gcc.target/i386/avx512fp16-vfmaddcsh-1c.c | 13 -
>  gcc/testsuite/gcc.target/i386/pr104978.c  | 18 +++
>  6 files changed, 30 insertions(+), 70 deletions(-)
>  delete mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1c.c
>  delete mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104978.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 21bf3c55c95..1087a37812f 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -6586,26 +6586,10 @@ (define_expand "avx512fp16_fmaddcsh_v8hf_mask1"
>  emit_insn (gen_avx512fp16_fmaddcsh_v8hf_mask (operands[0],
>operands[1], operands[2], operands[3], operands[4]));
>
> -  if (TARGET_AVX512VL)
> -  {
> -op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
> -op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> -emit_insn (gen_avx512vl_loadv4sf_mask (op0, op0, op1, operands[4]));
> -  }
> -  else
> -  {
> -rtx mask, tmp, vec_mask;
> -mask = lowpart_subreg (SImode, operands[4], QImode),
> -tmp = gen_reg_rtx (SImode);
> -emit_insn (gen_ashlsi3 (tmp, mask, GEN_INT (31)));
> -vec_mask = gen_reg_rtx (V4SImode);
> -emit_insn (gen_rtx_SET (vec_mask, CONST0_RTX (V4SImode)));
> -emit_insn (gen_vec_setv4si_0 (vec_mask, vec_mask, tmp));
> -vec_mask = lowpart_subreg (V4SFmode, vec_mask, V4SImode);
> -op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
> -op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> -emit_insn (gen_sse4_1_blendvps (op0, op1, op0, vec_mask));
> -  }
> +  op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
> +  op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> +  emit_insn (gen_avx512f_movsf_mask (op1, op1, op0, op1, operands[4]));
> +  emit_move_insn (op0, op1);
Considering ICE in PR104976, it's better to force_reg before lowpart_subreg.
i.e.
op0 = lowpart_subreg (V4SFmode, force_reg (V8HFmode, operands[0]), V8HFmode);
if (!MEM_P (operands[1]))
  operands[1] = force_reg (V8HFmode, operands[1]);
op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
rtx dest = gen_reg_rtx (V4SFmode);
emit_insn (gen_avx512f_movsf_mask (dest, op1, op0, op1, operands[4]));
emit_move_insn (operands[0], lowpart_subreg (V8HFmode, dest, V4SFmode));

>DONE;
>  })
>
> @@ -6641,26 +6625,10 @@ (define_expand "avx512fp16_fcmaddcsh_v8hf_mask1"
>  emit_insn (gen_avx512fp16_fcmaddcsh_v8hf_mask (operands[0],
>operands[1], operands[2], operands[3], operands[4]));
>
> -  if (TARGET_AVX512VL)
> -  {
> -op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
> -op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> -emit_insn (gen_avx512vl_loadv4sf_mask (op0, op0, op1, operands[4]));
> -  }
> -  else
> -  {
> -rtx mask, tmp, vec_mask;
> -mask = lowpart_subreg (SImode, operands[4], QImode),
> -tmp = gen_reg_rtx (SImode);
> -emit_insn (gen_ashlsi3 (tmp, mask, GEN_INT (31)));
> -vec_mask = gen_reg_rtx (V4SImode);
> -emit_insn (gen_rtx_SET (vec_mask, CONST0_RTX (V4SImode)));
> -emit_insn (gen_vec_setv4si_0 (vec_mask, vec_mask, tmp));
> -vec_mask = lowpart_subreg (V4SFmode, vec_mask, V4SImode);
> -op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
> -op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> -emit_insn (gen_sse4_1_blendvps (op0, op1, op0, vec_mask));
> -  }
> +  op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
> +  op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
> +  emit_insn (gen_avx512f_movsf_mask (op1, op1, op0, op1, operands[4]));
> +  emit_move_insn (op0, op1);
>DONE;
>  })
>
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c 

[PATCH] Add condition coverage profiling

2022-03-21 Thread Jørgen Kvalsvik via Gcc-patches
This patch adds support in gcc+gcov for modified condition/decision
coverage (MC/DC) with the -fprofile-conditions flag. MC/DC is a type of
test/code coverage, and it is particularly important in the aviation and
automotive industries for safety-critical applications. In particular,
it is required or recommended by:

* DO-178C for the most critical software (Level A) in avionics
* IEC 61508 for SIL 4
* ISO 26262-6 for ASIL D

From the SQLite webpage:

Two methods of measuring test coverage were described above:
"statement" and "branch" coverage. There are many other test
coverage metrics besides these two. Another popular metric is
"Modified Condition/Decision Coverage" or MC/DC. Wikipedia defines
MC/DC as follows:

* Each decision tries every possible outcome.
* Each condition in a decision takes on every possible outcome.
* Each entry and exit point is invoked.
* Each condition in a decision is shown to independently affect
  the outcome of the decision.

In the C programming language where && and || are "short-circuit"
operators, MC/DC and branch coverage are very nearly the same thing.
The primary difference is in boolean vector tests. One can test for
any of several bits in bit-vector and still obtain 100% branch test
coverage even though the second element of MC/DC - the requirement
that each condition in a decision take on every possible outcome -
might not be satisfied.

https://sqlite.org/testing.html#mcdc
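To make the SQLite point concrete (the flag names are hypothetical): a test of several bits in one bit-vector is a single decision with a single branch, so two tests reach 100% branch coverage. The short-circuit form has three conditions, each of which typically becomes its own branch in the CFG.

```c
#include <stdbool.h>

/* Hypothetical flag bits.  */
#define FLAG_A 0x1u
#define FLAG_B 0x2u
#define FLAG_C 0x4u

/* One decision, one branch: true if any of the three flags is set.  */
static bool any_flag (unsigned flags)
{
  return (flags & (FLAG_A | FLAG_B | FLAG_C)) != 0;
}

/* Same decision written with short-circuit operators: three conditions.  */
static bool any_flag_sc (unsigned flags)
{
  return (flags & FLAG_A) || (flags & FLAG_B) || (flags & FLAG_C);
}
```

With only any_flag (FLAG_A) and any_flag (0), both outcomes of the single branch are taken, yet FLAG_B and FLAG_C are never shown to be true on their own.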

Whalen, Heimdahl, and De Silva "Efficient Test Coverage Measurement for
MC/DC" describes an algorithm for adding instrumentation by carrying
over information from the AST, but my algorithm does analysis on the
control flow graph. This should make it work for any language gcc
supports, although I have only tested it on constructs in C and C++, see
testsuite/gcc.misc-tests and testsuite/g++.dg.

Like Whalen et al., this implementation uses bitsets to store conditions,
which gcov later interprets. This is very fast, but introduces a maximum
limit for the number of terms in a single boolean expression. This limit
is the number of bits in a gcov_unsigned_type (which is typedef'd to
uint64_t), so for most practical purposes this would be acceptable. The
limitation can be relaxed with a more sophisticated way of storing and
updating bitsets (for example length-encoding).
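A minimal sketch of the bitset idea follows, with two simplifications. It assumes plain condition coverage (each condition observed both true and false) rather than the full MC/DC independence analysis. It also wraps a source expression directly instead of instrumenting the CFG as the patch does. struct cond_bits, rec and outcomes_covered are hypothetical names, and the uint64_t fields are what impose the 64-condition limit.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i of seen_true / seen_false is set once condition i has evaluated
   to true / false.  */
struct cond_bits
{
  uint64_t seen_true;
  uint64_t seen_false;
};

/* Record the outcome of condition IDX and pass the value through, so it
   can wrap each operand of && / || without changing short-circuiting.  */
static int rec (struct cond_bits *cb, unsigned idx, int value)
{
  assert (idx < 64);
  if (value)
    cb->seen_true |= UINT64_C (1) << idx;
  else
    cb->seen_false |= UINT64_C (1) << idx;
  return value != 0;
}

/* Instrumented form of (a && (b || c)) && d, conditions numbered 0..3.  */
static int fn (struct cond_bits *cb, int a, int b, int c, int d)
{
  return (rec (cb, 0, a) && (rec (cb, 1, b) || rec (cb, 2, c)))
         && rec (cb, 3, d);
}

static unsigned popcount64 (uint64_t x)
{
  unsigned n = 0;
  for (; x; x &= x - 1)
    n++;
  return n;
}

/* Number of outcomes observed, out of 2 * (number of conditions).  */
static unsigned outcomes_covered (const struct cond_bits *cb)
{
  return popcount64 (cb->seen_true) + popcount64 (cb->seen_false);
}
```

Calling fn with (1, 1, 0, 1), (0, 0, 0, 0) and (1, 0, 0, 1) records 6 of the 8 possible outcomes: condition 2 is never true and condition 3 is never false.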

In action it looks pretty similar to branch coverage. The -g short
option carries no particular significance; it was chosen because it was
available and its upper-case counterpart was free too.

gcov --conditions:

3:   17:void fn (int a, int b, int c, int d) {
3:   18:if ((a && (b || c)) && d)
conditions covered 5/8
condition  1 not covered (false)
condition  2 not covered (true)
condition  2 not covered (false)
1:   19:x = 1;
-:   20:else
2:   21:x = 2;
3:   22:}

gcov --conditions --json-format:

"conditions": [
{
"not_covered_false": [
1,
2
],
"count": 8,
"covered": 5,
"not_covered_true": [
2
]
}
],

C++ destructors will add extra conditionals that are not explicitly
present in source code. These are present in the CFG, which means the
condition coverage will show them.

gcov --conditions

-:5:struct A {
1:6:explicit A (int x) : v (x) {}
1:7:operator bool () const { return bool(v); }
1:8:~A() {}
-:9:
-:   10:int v;
-:   11:};
-:   12:
2:   13:void fn (int a, int b) {
   2*:   14:x = a && A(b);
conditions covered 2/4
condition  0 not covered (true)
condition  1 not covered (true)
conditions covered 2/2

They are also reported by the branch coverage.

gcov -bc

   2*:   14:x = a && A(b);
branch  0 taken 1 (fallthrough)
branch  1 taken 1
call2 returned 1
call3 returned 1
branch  4 taken 0 (fallthrough)
branch  5 taken 1
branch  6 taken 1 (fallthrough)
branch  7 taken 1

The algorithm struggles in a particular case where gcc would generate
identical CFGs for different expressions:

and.c:

if (a && b && c)
x = 1;

ifs.c:

if (a)
if (b)
if (c)
x = 1;

if (a && b && c)
x = 1;

and.c.gcov:

#:2:if (a && b && c)
conditions covered 2/2
conditions covered 2/2
conditions covered 2/2
#:3:x = 1;

ifs.c.gcov:
#:2:if (a)
conditions covered 2/2
#:3:   2if (b)
conditions covered 2/2
#:4:   2if (c)
conditions covered 2/2
#:5:x = 1;

These programs are semantically equivalent, and are interpreted as 3
separate conditional expressions. It does not matter w.r.t. coverage,
but is not ideal. Adding an else to and.c fixes the issue:

#:

[PATCH v2] AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]

2022-03-21 Thread Hongyu Wang via Gcc-patches
Hi,

For complex scalar intrinsics like _mm_mask_fcmadd_sch, the
mask should be ANDed with 1 so that it only applies to the lowest element.
Use a masked vmovss to perform the same operation, since it ignores the
higher bits of the mask.

Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.

Ok for master?
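A rough scalar model of why only bit 0 of the mask should matter. It treats the 128-bit register as four 32-bit lanes, with one complex _Float16 pair per lane. This is an illustration under those assumptions, not the expander code, and v4x32, merge_all_lanes and merge_low_lane are hypothetical. As I read the removed AVX512VL path, it was a per-lane masked move, so mask bits 1-3 could pull upper lanes from the computed result. A masked scalar move such as vmovss only consults bit 0.

```c
#include <stdint.h>

/* A 128-bit register modelled as four 32-bit lanes; lane 0 holds the
   single complex value the _sch intrinsics operate on.  */
typedef struct
{
  uint32_t lane[4];
} v4x32;

/* Per-lane masked move: every mask bit selects its own lane from
   COMPUTED, so bits 1..3 leak into the upper lanes.  */
static v4x32
merge_all_lanes (v4x32 src, v4x32 computed, uint8_t k)
{
  v4x32 r = src;
  for (int i = 0; i < 4; i++)
    if (k & (1u << i))
      r.lane[i] = computed.lane[i];
  return r;
}

/* Masked scalar move: only bit 0 of K is used, and lanes 1..3 always
   come from SRC.  */
static v4x32
merge_low_lane (v4x32 src, v4x32 computed, uint8_t k)
{
  v4x32 r = src;
  if (k & 1)
    r.lane[0] = computed.lane[0];
  return r;
}
```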

gcc/ChangeLog:

PR target/104978
* config/i386/sse.md
(avx512fp16_fmaddcsh_v8hf_mask1"
 emit_insn (gen_avx512fp16_fmaddcsh_v8hf_mask (operands[0],
   operands[1], operands[2], operands[3], operands[4]));
 
-  if (TARGET_AVX512VL)
-  {
-op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
-op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
-emit_insn (gen_avx512vl_loadv4sf_mask (op0, op0, op1, operands[4]));
-  }
-  else
-  {
-rtx mask, tmp, vec_mask;
-mask = lowpart_subreg (SImode, operands[4], QImode),
-tmp = gen_reg_rtx (SImode);
-emit_insn (gen_ashlsi3 (tmp, mask, GEN_INT (31)));
-vec_mask = gen_reg_rtx (V4SImode);
-emit_insn (gen_rtx_SET (vec_mask, CONST0_RTX (V4SImode)));
-emit_insn (gen_vec_setv4si_0 (vec_mask, vec_mask, tmp));
-vec_mask = lowpart_subreg (V4SFmode, vec_mask, V4SImode);
-op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
-op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
-emit_insn (gen_sse4_1_blendvps (op0, op1, op0, vec_mask));
-  }
+  op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
+  op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
+  emit_insn (gen_avx512f_movsf_mask (op1, op1, op0, op1, operands[4]));
+  emit_move_insn (op0, op1);
   DONE;
 })
 
@@ -6641,26 +6625,10 @@ (define_expand "avx512fp16_fcmaddcsh_v8hf_mask1"
 emit_insn (gen_avx512fp16_fcmaddcsh_v8hf_mask (operands[0],
   operands[1], operands[2], operands[3], operands[4]));
 
-  if (TARGET_AVX512VL)
-  {
-op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
-op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
-emit_insn (gen_avx512vl_loadv4sf_mask (op0, op0, op1, operands[4]));
-  }
-  else
-  {
-rtx mask, tmp, vec_mask;
-mask = lowpart_subreg (SImode, operands[4], QImode),
-tmp = gen_reg_rtx (SImode);
-emit_insn (gen_ashlsi3 (tmp, mask, GEN_INT (31)));
-vec_mask = gen_reg_rtx (V4SImode);
-emit_insn (gen_rtx_SET (vec_mask, CONST0_RTX (V4SImode)));
-emit_insn (gen_vec_setv4si_0 (vec_mask, vec_mask, tmp));
-vec_mask = lowpart_subreg (V4SFmode, vec_mask, V4SImode);
-op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
-op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
-emit_insn (gen_sse4_1_blendvps (op0, op1, op0, vec_mask));
-  }
+  op0 = lowpart_subreg (V4SFmode, operands[0], V8HFmode);
+  op1 = lowpart_subreg (V4SFmode, operands[1], V8HFmode);
+  emit_insn (gen_avx512f_movsf_mask (op1, op1, op0, op1, operands[4]));
+  emit_move_insn (op0, op1);
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c
index eb96588df39..0f87861f09b 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c
@@ -1,13 +1,13 @@
 /* { dg-do compile } */
-/* { dg-options "-mavx512fp16 -mno-avx512vl -O2" } */
+/* { dg-options "-mavx512fp16 -O2" } */
 /* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
 /* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
 /* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vblendvps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
 /* { dg-final { scan-assembler-times "vmovss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vmovss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}+(?:\n|\[ \\t\]+#)" 2 } } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1c.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1c.c
deleted file mode 

Re: [PING^2][PATCH][final] Handle compiler-generated asm insn

2022-03-21 Thread Tom de Vries via Gcc-patches

On 3/21/22 08:58, Richard Biener wrote:

On Thu, Mar 17, 2022 at 4:10 PM Tom de Vries via Gcc-patches
 wrote:


On 3/9/22 13:50, Tom de Vries wrote:

On 2/22/22 14:55, Tom de Vries wrote:

Hi,

For the nvptx port, with -mptx-comment we have in pr53465.s:
...
  // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
  // Start: Added by -minit-regs=3:
  // #NO_APP
  mov.u32 %r26, 0;
  // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
  // End: Added by -minit-regs=3:
  // #NO_APP
...

The comments were generated using the compiler-generated equivalent of:
...
asm ("// Comment");
...
but both the printed location and the NO_APP/APP are unnecessary for a
compiler-generated asm insn.

Fix this by handling ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION in
final_scan_insn_1, so that we simply get:
...
  // Start: Added by -minit-regs=3:
  mov.u32 %r26, 0;
  // End: Added by -minit-regs=3:
...

Tested on nvptx.

OK for trunk?





Ping^2.

Tobias just reported an ICE in PR104968, and this patch fixes it.

I'd like to know whether this patch is acceptable for stage 4 or not.

If not, I need to fix PR104968 in a different way, say by disabling
-mcomment by default, or by trying harder to propagate source info to
outlined functions.




Hi,

thanks for the review.


Usually targets use UNSPECs to emit compiler-generated "asm"
instructions.


Ack. [ I could go down that route eventually, but for now I'm hoping to 
implement this without having to change the port. ]



I think an unknown location is a reasonable but not
the best way to identify 'compiler-generated', we might lose
the location through optimization.  (why does it not use
the INSN_LOCATION?)



I don't know.  FWIW, at the time that ASM_INPUT_SOURCE_LOCATION was 
introduced (2007), there was no INSN_LOCATION yet (introduced in 2012), 
only INSN_LOCATOR, my guess is that it has something to do with that.



Rather than a location I'd use sth like DECL_ARTIFICIAL to
disable 'user-mangling', do we have something like that for
ASM or an insn in general? 


Haven't found it.


If not maybe there's an unused
bit on ASMs we can enable this way.


Done.  I've used the jump flag for that.

Updated, untested patch attached.

Is this what you meant?

Thanks,
- Tom

[final] Handle compiler-generated asm insn

For the nvptx port, with -mptx-comment we have in pr53465.s:
...
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// Start: Added by -minit-regs=3:
// #NO_APP
mov.u32 %r26, 0;
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// End: Added by -minit-regs=3:
// #NO_APP
...

The comments were generated using the compiler-generated equivalent of:
...
  asm ("// Comment");
...
but both the printed location and the NO_APP/APP are unnecessary for a
compiler-generated asm insn.

Fix this by:
- adding new flag ASM_INPUT_ARTIFICIAL_P
- in gen_comment:
  - setting ASM_INPUT_ARTIFICIAL_P to 1
  - setting ASM_INPUT_SOURCE_LOCATION to UNKNOWN_LOCATION,
- in final_scan_insn_1:
  - handling ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION and
  ASM_INPUT_ARTIFICIAL_P
so that we simply get:
...
// Start: Added by -minit-regs=3:
mov.u32 %r26, 0;
// End: Added by -minit-regs=3:
...
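A minimal plain-C sketch of the idea, assuming a simplified model: the struct and function below are hypothetical stand-ins for an ASM_INPUT rtx, ASM_INPUT_SOURCE_LOCATION and ASM_INPUT_ARTIFICIAL_P, not GCC's data structures. User-written asm keeps its #APP/#NO_APP bracketing and location line, while compiler-generated asm is printed bare. (In final.cc the markers are managed by app_enable/app_disable, so #NO_APP normally appears later; the sketch prints it immediately for simplicity.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an ASM_INPUT rtx; not GCC's real layout.  */
struct asm_input_sketch
{
  const char *string;  /* The asm text to emit.  */
  const char *file;    /* NULL models UNKNOWN_LOCATION.  */
  int line;
  bool artificial;     /* Models ASM_INPUT_ARTIFICIAL_P.  */
};

/* Print ASM_IN to OUT and return the number of lines written.
   User-written asm is bracketed by #APP/#NO_APP and preceded by its
   source location (when known); compiler-generated asm is printed bare.  */
static int
emit_asm_input_sketch (FILE *out, const struct asm_input_sketch *asm_in)
{
  assert (out != NULL && asm_in != NULL);
  int lines = 0;
  bool user_asm = !asm_in->artificial;

  if (user_asm)
    {
      fprintf (out, "// #APP\n");
      lines++;
      if (asm_in->file != NULL && asm_in->line != 0)
        {
          fprintf (out, "// %d \"%s\" 1\n", asm_in->line, asm_in->file);
          lines++;
        }
    }

  fprintf (out, "\t%s\n", asm_in->string);
  lines++;

  if (user_asm)
    {
      fprintf (out, "// #NO_APP\n");
      lines++;
    }
  return lines;
}
```

Keeping "artificial" as a separate flag rather than inferring it from a missing location mirrors the review feedback in this thread: a location can be lost through optimization, but a dedicated bit cannot.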

Tested on nvptx.

gcc/ChangeLog:

2022-02-21  Tom de Vries  

	PR rtl-optimization/104596
	* rtl.h (struct rtx_def): Document use of jump flag in ASM_INPUT.
	(ASM_INPUT_ARTIFICIAL_P): New macro.
	* config/nvptx/nvptx.cc (gen_comment): Use gen_rtx_ASM_INPUT instead
	of gen_rtx_ASM_INPUT_loc.  Set ASM_INPUT_ARTIFICIAL_P.
	* final.cc (final_scan_insn_1): Handle
	ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION and
	ASM_INPUT_ARTIFICIAL_P.

---
 gcc/config/nvptx/nvptx.cc |  5 +++--
 gcc/final.cc  | 18 --
 gcc/rtl.h |  3 +++
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 87efc23bd96a..93df3f309d18 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -5442,8 +5442,9 @@ gen_comment (const char *s)
   size_t len = strlen (ASM_COMMENT_START) + strlen (sep) + strlen (s) + 1;
   char *comment = (char *) alloca (len);
   snprintf (comment, len, "%s%s%s", ASM_COMMENT_START, sep, s);
-  return gen_rtx_ASM_INPUT_loc (VOIDmode, ggc_strdup (comment),
-DECL_SOURCE_LOCATION (cfun->decl));
+  rtx asm_input = gen_rtx_ASM_INPUT (VOIDmode, ggc_strdup (comment));
+  ASM_INPUT_ARTIFICIAL_P (asm_input) = 1;
+  return asm_input;
 }
 
 /* Initialize all declared regs at function entry.
diff --git a/gcc/final.cc b/gcc/final.cc
index a9868861bd2c..fee512869482 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -2642,15 +2642,21 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file, int optimize_p ATTRIBUTE_UNUSED,
 	if (string[0])
 	  {
 		

[committed] [PATCH] Avoid a warning of overflow

2022-03-21 Thread qianjh--- via Gcc-patches
Thank you @Arnaud Charlet
Committed

> -Original Message-
> From: Arnaud Charlet 
> Sent: Monday, March 21, 2022 5:47 PM
> To: Qian, Jianhua/钱 建华 
> Cc: gcc-patches@gcc.gnu.org; Arnaud Charlet 
> Subject: Re: PING [PATCH] Avoid a warning of overflow
> 
> > This warning will become an error in stage 2 of bootstrap when using the
> > "make BOOT_CFLAGS='-O0' BOOT_CXXFLAGS='-O0'" command, so it is better to
> > fix it.  There are other similar warnings; I will submit patches for them
> > one by one.
> >
> > Tested on x86_64. OK for trunk?
> 
> This is OK (pretty much obvious), thanks.
> 
> > > -Original Message-
> > > From: Qian Jianhua 
> > > Sent: Friday, March 18, 2022 6:02 PM
> > > To: Qian, Jianhua/钱 建华 
> > > Subject: [PATCH] Avoid a warning of overflow
> > >
> > > This patch avoids a warning of "c-ada-spec.cc:1660:34: warning:
> > > 'sprintf' may write a terminating nul past the end of the
> > > destination [-Wformat-overflow=]" when building GCC.
> > >
> > > gcc/c-family/
> > >   * c-ada-spec.cc: Change array length
> > >
> > > ---
> > >  gcc/c-family/c-ada-spec.cc | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/gcc/c-family/c-ada-spec.cc b/gcc/c-family/c-ada-spec.cc
> > > index
> > > 149d336ee96..aeb429136b6 100644
> > > --- a/gcc/c-family/c-ada-spec.cc
> > > +++ b/gcc/c-family/c-ada-spec.cc
> > > @@ -1579,7 +1579,7 @@ dump_ada_function_declaration (pretty_printer
> > > *buffer, tree func,
> > >tree type = TREE_TYPE (func);
> > >tree arg = TYPE_ARG_TYPES (type);
> > >tree t;
> > > -  char buf[17];
> > > +  char buf[18];
> > >int num, num_args = 0, have_args = true, have_ellipsis = false;
> > >
> > >/* Compute number of arguments.  */
> > > --
> > > 2.18.1


Re: [PATCH] libstdc++: Work around clang misdesign in time_get<>::get [PR104990]

2022-03-21 Thread Jonathan Wakely via Gcc-patches
On Mon, 21 Mar 2022 at 06:42, Jakub Jelinek wrote:
>
> Hi!
>
> Apparently clang has a -fgnuc-version= option which allows it to pretend
> it is any GCC version the user likes.  It is already bad that it claims to
> be GCC 4.2 compatible by default when it is not (various unimplemented
> extensions at least), but this option is a horrible idea.
>
> Anyway, this patch adds a hack for it.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK for trunk.

In stage 1 we might want to consider removing the __GNUC__ check
entirely. We don't support using old versions of genuine GCC. Intel
icc claims to be the latest GCC. And now Clang can be told to cosplay
as any version of GCC, so we can't trust __GNUC__ at all. So this
check would be just !defined(__clang__).
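The guard in the patch can be restated as a small C predicate, which makes the failure mode easy to see: clang defines __GNUC__ (and with -fgnuc-version= it can claim any value), so only the explicit __clang__ test keeps the GCC-only hack out. The function names below are hypothetical; only the preprocessor condition comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* The patch's condition as a pure function: GNUC_MAJOR models the value
   of __GNUC__ and IS_CLANG models defined(__clang__).  */
static bool
use_gcc_only_hack_p (int gnuc_major, bool is_clang)
{
  return gnuc_major >= 5 && !is_clang;
}

/* The same condition evaluated for whatever compiler builds this file.  */
static bool
use_gcc_only_hack_here_p (void)
{
#if defined(__GNUC__) && __GNUC__ >= 5 && !defined(__clang__)
  return true;
#else
  return false;
#endif
}
```

With the stage 1 simplification suggested above, the predicate would reduce to just !is_clang.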


>
> 2022-03-21  Jakub Jelinek  
>
> PR libstdc++/104990
> * include/bits/locale_facets_nonio.tcc (get): Don't check if do_get
> isn't overloaded if __clang__ is defined.
>
> --- libstdc++-v3/include/bits/locale_facets_nonio.tcc   2022-03-18 10:37:41.176593188 +0100
> +++ libstdc++-v3/include/bits/locale_facets_nonio.tcc   2022-03-20 20:28:07.203815325 +0100
> @@ -1465,7 +1465,7 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
>ctype<_CharT> const& __ctype = use_facet >(__loc);
>__err = ios_base::goodbit;
>bool __use_state = false;
> -#if __GNUC__ >= 5
> +#if __GNUC__ >= 5 && !defined(__clang__)
>  #pragma GCC diagnostic push
>  #pragma GCC diagnostic ignored "-Wpmf-conversions"
>// Nasty hack.  The C++ standard mandates that get invokes the do_get
>
> Jakub
>



Re: [PATCH] Allow (void *) 0xdeadbeef accesses without warnings [PR99578]

2022-03-21 Thread Martin Liška

Hi.

I'm installing the following patch that documents the newly introduced 
parameter.

Martin

From 3f18553eb7dabc6528d712e54b25ea6f96e51bde Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 21 Mar 2022 10:46:57 +0100
Subject: [PATCH] docs: Document min-pagesize parameter.

gcc/ChangeLog:

	* doc/invoke.texi: Document min-pagesize parameter.
---
 gcc/doc/invoke.texi | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e3f2e82cde5..4da4a1170f5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15132,6 +15132,9 @@ when evaluating outgoing edge ranges.
 @item relation-block-limit
 Maximum number of relations the oracle will register in a basic block.
 
+@item min-pagesize
+Minimum page size for warning purposes.
+
 @item openacc-kernels
 Specify mode of OpenACC `kernels' constructs handling.
 With @option{--param=openacc-kernels=decompose}, OpenACC `kernels'
-- 
2.35.1



Re: PING [PATCH] Avoid a warning of overflow

2022-03-21 Thread Arnaud Charlet via Gcc-patches
> This warning will become an error in stage 2 of bootstrap when using the
> "make BOOT_CFLAGS='-O0' BOOT_CXXFLAGS='-O0'" command, so it is better to
> fix it.  There are other similar warnings; I will submit patches for them
> one by one.
> 
> Tested on x86_64. OK for trunk?

This is OK (pretty much obvious), thanks.

> > -Original Message-
> > From: Qian Jianhua 
> > Sent: Friday, March 18, 2022 6:02 PM
> > To: Qian, Jianhua/钱 建华 
> > Subject: [PATCH] Avoid a warning of overflow
> > 
> > This patch avoids a warning of "c-ada-spec.cc:1660:34: warning:
> > 'sprintf' may write a terminating nul past the end of the destination
> > [-Wformat-overflow=]" when building GCC.
> > 
> > gcc/c-family/
> > * c-ada-spec.cc: Change array length
> > 
> > ---
> >  gcc/c-family/c-ada-spec.cc | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/gcc/c-family/c-ada-spec.cc b/gcc/c-family/c-ada-spec.cc index
> > 149d336ee96..aeb429136b6 100644
> > --- a/gcc/c-family/c-ada-spec.cc
> > +++ b/gcc/c-family/c-ada-spec.cc
> > @@ -1579,7 +1579,7 @@ dump_ada_function_declaration (pretty_printer
> > *buffer, tree func,
> >tree type = TREE_TYPE (func);
> >tree arg = TYPE_ARG_TYPES (type);
> >tree t;
> > -  char buf[17];
> > +  char buf[18];
> >int num, num_args = 0, have_args = true, have_ellipsis = false;
> > 
> >/* Compute number of arguments.  */
> > --
> > 2.18.1


PING [PATCH] Avoid a warning of overflow

2022-03-21 Thread qianjh--- via Gcc-patches
Hi 

This warning will become an error in stage 2 of bootstrap when using the
"make BOOT_CFLAGS='-O0' BOOT_CXXFLAGS='-O0'" command, so it is better to
fix it.  There are other similar warnings; I will submit patches for them
one by one.

Tested on x86_64. OK for trunk?

> -Original Message-
> From: Qian Jianhua 
> Sent: Friday, March 18, 2022 6:02 PM
> To: Qian, Jianhua/钱 建华 
> Subject: [PATCH] Avoid a warning of overflow
> 
> This patch avoids a warning of "c-ada-spec.cc:1660:34: warning:
> 'sprintf' may write a terminating nul past the end of the destination
> [-Wformat-overflow=]" when building GCC.
> 
> gcc/c-family/
>   * c-ada-spec.cc: Change array length
> 
> ---
>  gcc/c-family/c-ada-spec.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/c-family/c-ada-spec.cc b/gcc/c-family/c-ada-spec.cc index
> 149d336ee96..aeb429136b6 100644
> --- a/gcc/c-family/c-ada-spec.cc
> +++ b/gcc/c-family/c-ada-spec.cc
> @@ -1579,7 +1579,7 @@ dump_ada_function_declaration (pretty_printer
> *buffer, tree func,
>tree type = TREE_TYPE (func);
>tree arg = TYPE_ARG_TYPES (type);
>tree t;
> -  char buf[17];
> +  char buf[18];
>int num, num_args = 0, have_args = true, have_ellipsis = false;
> 
>/* Compute number of arguments.  */
> --
> 2.18.1
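-Wformat-overflow is about the terminating nul: the destination must hold every formatted character plus one more byte. The patch does not show the sprintf call at c-ada-spec.cc:1660, so the sketch below uses a hypothetical "arg%d" format purely for illustration; it shows how snprintf can report the required size and refuse to truncate.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical format, for illustration only.  */
#define ARG_FORMAT "arg%d"

/* Bytes needed for the formatted text *including* the terminating nul.
   snprintf with a NULL buffer and size 0 writes nothing and returns the
   length the full output would have had.  */
static size_t
arg_buffer_size (int num)
{
  int len = snprintf (NULL, 0, ARG_FORMAT, num);
  assert (len >= 0);
  return (size_t) len + 1;
}

/* Format into BUF, reporting failure instead of silently truncating
   when the text plus its nul does not fit in SIZE bytes.  */
static bool
format_arg (char *buf, size_t size, int num)
{
  int len = snprintf (buf, size, ARG_FORMAT, num);
  return len >= 0 && (size_t) len < size;
}
```

Sizing a fixed array for the worst case (here INT_MIN, which needs 15 bytes with a 32-bit int) is what growing buf silences; checking the snprintf return value is the alternative when the worst case is hard to bound.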



[PATCH] Dump when estimating the number of iterations of a loop

2022-03-21 Thread Richard Biener via Gcc-patches
Currently the dumps are somewhat intermingled and do not show the
(possibly bad) recursion between niter estimation and number of
iterations computation.  The following makes them a little easier to
decipher by dumping when we start niter estimation for a loop.
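A minimal sketch of the control flow in the hunk below, with hypothetical stand-ins for dump_file, dump_flags, TDF_DETAILS and the loop's estimate state (the flag value is arbitrary). Because estimate_state is set before the estimation work, a repeated or recursive call returns early and the message appears once per loop.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins; the bit value is arbitrary.  */
#define SKETCH_TDF_DETAILS (1u << 3)

enum sketch_est_state { SKETCH_EST_NOT_COMPUTED, SKETCH_EST_AVAILABLE };

struct sketch_loop
{
  int num;
  enum sketch_est_state estimate_state;
};

static FILE *sketch_dump_file;
static unsigned sketch_dump_flags;
static int sketch_dump_count;

static void
sketch_estimate_numbers_of_iterations (struct sketch_loop *loop)
{
  /* Already estimated, or being estimated further up the call chain:
     return without dumping, which also cuts off recursion.  */
  if (loop->estimate_state != SKETCH_EST_NOT_COMPUTED)
    return;

  /* Only dump with -details.  */
  if (sketch_dump_file && (sketch_dump_flags & SKETCH_TDF_DETAILS))
    {
      fprintf (sketch_dump_file,
               "Estimating # of iterations of loop %d\n", loop->num);
      sketch_dump_count++;
    }

  loop->estimate_state = SKETCH_EST_AVAILABLE;
  /* ... the actual estimation, which may recurse, would go here ...  */
}
```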

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-03-21  Richard Biener  

* tree-ssa-loop-niter.cc (estimate_numbers_of_iterations): Dump
that we are estimating niter of loop.
---
 gcc/tree-ssa-loop-niter.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 9bb5097379b..afa51064953 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -4374,6 +4374,9 @@ estimate_numbers_of_iterations (class loop *loop)
   if (loop->estimate_state != EST_NOT_COMPUTED)
 return;
 
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, "Estimating # of iterations of loop %d\n", loop->num);
+
   loop->estimate_state = EST_AVAILABLE;
 
   /* If we have a measured profile, use it to estimate the number of
-- 
2.34.1


[PATCH]rs6000: avoid peeking eof after __vector keyword

2022-03-21 Thread Jiufu Guo via Gcc-patches
Hi!

There is a rare corner case where __vector is followed only by ";"
near the end of the file.

Like the case in PR101168:
using vdbl =  __vector double;
#define BREAK 1

In this case, "__vector double" is not followed by a PP_NAME; it is
followed by CPP_SEMICOLON and then EOF.  Since there are no more tokens
in the file, there is no need to continue looking ahead.
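A toy model of the guard, deliberately much simpler than rs6000_macro_to_expand: the token types and the "look up to two tokens ahead for bool/pixel" rule are hypothetical simplifications. The point it illustrates is the one the patch adds with "else if (ident)": lookahead stops as soon as a peeked token is not an identifier, so for the PR101168 input "__vector double;" it never tries to look past the ';' and the end of the file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy token stream (hypothetical; not libcpp's cpp_token).  The stream
   always ends with TOY_EOF, and peeking past the end keeps returning it.  */
enum toy_kind { TOY_NAME, TOY_SEMICOLON, TOY_EOF };

struct toy_token
{
  enum toy_kind kind;
  const char *name;  /* Only meaningful for TOY_NAME.  */
};

/* Identifier N tokens ahead, or NULL if that token is not a name.  */
static const char *
toy_peek_ident (const struct toy_token *toks, size_t ntoks, size_t n)
{
  assert (ntoks > 0 && toks[ntoks - 1].kind == TOY_EOF);
  if (n >= ntoks)
    n = ntoks - 1;
  return toks[n].kind == TOY_NAME ? toks[n].name : NULL;
}

/* Decide whether a conditional "vector" should expand, given the tokens
   after it.  Stop as soon as a peeked token is not an identifier.  */
static bool
toy_vector_should_expand (const struct toy_token *toks, size_t ntoks)
{
  for (size_t i = 0; i < 2; i++)
    {
      const char *ident = toy_peek_ident (toks, ntoks, i);
      if (ident == NULL)
        return false;  /* ';' or EOF: nothing more to look at.  */
      if (strcmp (ident, "bool") == 0 || strcmp (ident, "pixel") == 0)
        return true;
    }
  return false;
}
```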

This patch passes bootstrap and regtest on ppc64 and ppc64le.


BR,
Jiufu


PR preprocessor/101168

gcc/ChangeLog:

* config/rs6000/rs6000-c.cc (rs6000_macro_to_expand):
Avoid empty identifier.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/pr101168.C: New test.


---
 gcc/config/rs6000/rs6000-c.cc   | 4 +++-
 gcc/testsuite/g++.target/powerpc/pr101168.C | 6 ++
 2 files changed, 9 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr101168.C

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 3b62b499df2..f8cc7bad812 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -282,7 +282,9 @@ rs6000_macro_to_expand (cpp_reader *pfile, const cpp_token *tok)
expand_bool_pixel = __pixel_keyword;
  else if (ident == C_CPP_HASHNODE (__bool_keyword))
expand_bool_pixel = __bool_keyword;
- else
+
+ /* If it needs to check tokens continue.  */
+ else if (ident)
{
  /* Try two tokens down, too.  */
  do
diff --git a/gcc/testsuite/g++.target/powerpc/pr101168.C b/gcc/testsuite/g++.target/powerpc/pr101168.C
new file mode 100644
index 000..284e77fdc88
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr101168.C
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec" } */
+
+using vdbl =  __vector double;
+#define BREAK 1
-- 
2.25.1



[committed] RISC-V: Implement misc macro for vector extensions.

2022-03-21 Thread Kito Cheng
See also:
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/21
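The if/else chains visible in the riscv-c.cc hunk of this patch can be restated as pure functions, which makes the precedence explicit (the diff is cut off after the __riscv_v_elen_fp part, so only that much is modelled). The parameters stand in for TARGET_VECTOR_ELEN_64/32, TARGET_VECTOR_ELEN_FP_64/32 and TARGET_MIN_VLEN; the function names and the -1 "not defined" convention are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical convention: -1 means the macro is left undefined.  */
#define SKETCH_UNDEFINED (-1)

/* __riscv_v_elen: the widest supported integer element.  */
static int
sketch_riscv_v_elen (bool elen_64, bool elen_32)
{
  if (elen_64)
    return 64;
  if (elen_32)
    return 32;
  return SKETCH_UNDEFINED;
}

/* __riscv_v_elen_fp: the widest FP element, or 0 when vectors exist
   but have no FP support.  */
static int
sketch_riscv_v_elen_fp (bool elen_fp_64, bool elen_fp_32, int min_vlen)
{
  if (elen_fp_64)
    return 64;
  if (elen_fp_32)
    return 32;
  if (min_vlen != 0)
    return 0;
  return SKETCH_UNDEFINED;
}

/* __riscv_v_min_vlen: only defined when a minimum VLEN is known.  */
static int
sketch_riscv_v_min_vlen (int min_vlen)
{
  return min_vlen != 0 ? min_vlen : SKETCH_UNDEFINED;
}
```

On the user side this lets code test, for example, whether __riscv_v_elen is defined and at least 64 before relying on 64-bit vector elements.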

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_flag_table):
Update flag name and mask name.
* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Define
misc macro for vector extensions.
* config/riscv/riscv-opts.h (MASK_VECTOR_EEW_32): Rename to ...
(MASK_VECTOR_ELEN_32): ... this.
(MASK_VECTOR_EEW_64): Rename to ...
(MASK_VECTOR_ELEN_64): ... this.
(MASK_VECTOR_EEW_FP_32): Rename to ...
(MASK_VECTOR_ELEN_FP_32): ... this.
(MASK_VECTOR_EEW_FP_64): Rename to ...
(MASK_VECTOR_ELEN_FP_64): ... this.
(TARGET_VECTOR_ELEN_32): New.
(TARGET_VECTOR_ELEN_64): Ditto.
(TARGET_VECTOR_ELEN_FP_32): Ditto.
(TARGET_VECTOR_ELEN_FP_64): Ditto.
(TARGET_MIN_VLEN): Ditto.
* config/riscv/riscv.opt (riscv_vector_eew_flags): Rename to ...
(riscv_vector_elen_flags): ... this.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-13.c: New.
* gcc.target/riscv/arch-14.c: Ditto.
* gcc.target/riscv/arch-15.c: Ditto.
* gcc.target/riscv/predef-18.c: Ditto.
* gcc.target/riscv/predef-19.c: Ditto.
* gcc.target/riscv/predef-20.c: Ditto.
---
 gcc/common/config/riscv/riscv-common.cc| 16 ++--
 gcc/config/riscv/riscv-c.cc| 18 +
 gcc/config/riscv/riscv-opts.h  | 25 +-
 gcc/config/riscv/riscv.opt |  2 +-
 gcc/testsuite/gcc.target/riscv/arch-13.c   |  5 ++
 gcc/testsuite/gcc.target/riscv/arch-14.c   |  5 ++
 gcc/testsuite/gcc.target/riscv/arch-15.c   |  5 ++
 gcc/testsuite/gcc.target/riscv/predef-18.c | 84 +
 gcc/testsuite/gcc.target/riscv/predef-19.c | 88 ++
 gcc/testsuite/gcc.target/riscv/predef-20.c | 84 +
 10 files changed, 319 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-15.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-18.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-19.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-20.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 48c4fabdc6b..1501242e296 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1116,16 +1116,16 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"zve64f",   _options::x_target_flags, MASK_VECTOR},
   {"zve64d",   _options::x_target_flags, MASK_VECTOR},
 
-  /* We don't need to put complete EEW/EEW_FP info here, due to the
+  /* We don't need to put complete ELEN/ELEN_FP info here, due to the
  implication relation of vector extension.
- e.g. v -> zve64d ... zve32x, so v has set MASK_VECTOR_EEW_FP_64,
- MASK_VECTOR_EEW_FP_32, MASK_VECTOR_EEW_64 and MASK_VECTOR_EEW_32
+ e.g. v -> zve64d ... zve32x, so v has set MASK_VECTOR_ELEN_FP_64,
+ MASK_VECTOR_ELEN_FP_32, MASK_VECTOR_ELEN_64 and MASK_VECTOR_ELEN_32
  due to the extension implication.  */
-  {"zve32x",   _options::x_riscv_vector_eew_flags, MASK_VECTOR_EEW_32},
-  {"zve32f",   _options::x_riscv_vector_eew_flags, MASK_VECTOR_EEW_FP_32},
-  {"zve64x",   _options::x_riscv_vector_eew_flags, MASK_VECTOR_EEW_64},
-  {"zve64f",   _options::x_riscv_vector_eew_flags, MASK_VECTOR_EEW_FP_32},
-  {"zve64d",   _options::x_riscv_vector_eew_flags, MASK_VECTOR_EEW_FP_64},
+  {"zve32x",   _options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_32},
+  {"zve32f",   _options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_32},
+  {"zve64x",   _options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_64},
+  {"zve64f",   _options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_32},
+  {"zve64d",   _options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_64},
 
   {"zvl32b",_options::x_riscv_zvl_flags, MASK_ZVL32B},
   {"zvl64b",_options::x_riscv_zvl_flags, MASK_ZVL64B},
diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 73c62f41274..eb7ef09297e 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -104,6 +104,24 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
 
 }
 
+  if (TARGET_MIN_VLEN != 0)
+builtin_define_with_int_value ("__riscv_v_min_vlen", TARGET_MIN_VLEN);
+
+  if (TARGET_VECTOR_ELEN_64)
+builtin_define_with_int_value ("__riscv_v_elen", 64);
+  else if (TARGET_VECTOR_ELEN_32)
+builtin_define_with_int_value ("__riscv_v_elen", 32);
+
+  if (TARGET_VECTOR_ELEN_FP_64)
+builtin_define_with_int_value ("__riscv_v_elen_fp", 64);
+  else if (TARGET_VECTOR_ELEN_FP_32)
+builtin_define_with_int_value ("__riscv_v_elen_fp", 32);
+  else if (TARGET_MIN_VLEN != 0)
+builtin_define_with_int_value ("__riscv_v_elen_fp", 0);
+
+  if 

Re: [PING^2][PATCH][final] Handle compiler-generated asm insn

2022-03-21 Thread Richard Biener via Gcc-patches
On Thu, Mar 17, 2022 at 4:10 PM Tom de Vries via Gcc-patches
 wrote:
>
> On 3/9/22 13:50, Tom de Vries wrote:
> > On 2/22/22 14:55, Tom de Vries wrote:
> >> Hi,
> >>
> >> For the nvptx port, with -mptx-comment we have in pr53465.s:
> >> ...
> >>  // #APP
> >> // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
> >>  // Start: Added by -minit-regs=3:
> >>  // #NO_APP
> >>  mov.u32 %r26, 0;
> >>  // #APP
> >> // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
> >>  // End: Added by -minit-regs=3:
> >>  // #NO_APP
> >> ...
> >>
> >> The comments were generated using the compiler-generated equivalent of:
> >> ...
> >>asm ("// Comment");
> >> ...
> >> but both the printed location and the NO_APP/APP are unnecessary for a
> >> compiler-generated asm insn.
> >>
> >> Fix this by handling ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION in
> >> final_scan_insn_1, so that we simply get:
> >> ...
> >>  // Start: Added by -minit-regs=3:
> >>  mov.u32 %r26, 0;
> >>  // End: Added by -minit-regs=3:
> >> ...
> >>
> >> Tested on nvptx.
> >>
> >> OK for trunk?
> >>
> >
>
> Ping^2.
>
> Tobias just reported an ICE in PR104968, and this patch fixes it.
>
> I'd like to know whether this patch is acceptable for stage 4 or not.
>
> If not, I need to fix PR104968 in a different way, say by disabling
> -mcomment by default, or by trying harder to propagate source info to
> outlined functions.

Usually targets use UNSPECs to emit compiler-generated "asm"
instructions.  I think an unknown location is a reasonable but not
the best way to identify 'compiler-generated', we might lose
the location through optimization.  (why does it not use
the INSN_LOCATION?)

Rather than a location I'd use sth like DECL_ARTIFICIAL to
disable 'user-mangling', do we have something like that for
ASM or an insn in general?  If not maybe there's an unused
bit on ASMs we can enable this way.  IIRC some of the Ada
hardening GIMPLE passes also emit ASMs that could 'benefit'
from this.

Richard.

> Thanks,
> - Tom
>
> >> [final] Handle compiler-generated asm insn
> >>
> >> gcc/ChangeLog:
> >>
> >> 2022-02-21  Tom de Vries  
> >>
> >> PR rtl-optimization/104596
> >> * config/nvptx/nvptx.cc (gen_comment): Use gen_rtx_ASM_INPUT instead
> >> of gen_rtx_ASM_INPUT_loc.
> >> * final.cc (final_scan_insn_1): Handle
> >> ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION.
> >>
> >> ---
> >>   gcc/config/nvptx/nvptx.cc |  3 +--
> >>   gcc/final.cc  | 17 +++--
> >>   2 files changed, 12 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
> >> index 858789e6df7..4124c597f24 100644
> >> --- a/gcc/config/nvptx/nvptx.cc
> >> +++ b/gcc/config/nvptx/nvptx.cc
> >> @@ -5381,8 +5381,7 @@ gen_comment (const char *s)
> >> size_t len = strlen (ASM_COMMENT_START) + strlen (sep) + strlen
> >> (s) + 1;
> >> char *comment = (char *) alloca (len);
> >> snprintf (comment, len, "%s%s%s", ASM_COMMENT_START, sep, s);
> >> -  return gen_rtx_ASM_INPUT_loc (VOIDmode, ggc_strdup (comment),
> >> -cfun->function_start_locus);
> >> +  return gen_rtx_ASM_INPUT (VOIDmode, ggc_strdup (comment));
> >>   }
> >>   /* Initialize all declared regs at function entry.
> >> diff --git a/gcc/final.cc b/gcc/final.cc
> >> index a9868861bd2..e6443ef7a4f 100644
> >> --- a/gcc/final.cc
> >> +++ b/gcc/final.cc
> >> @@ -2642,15 +2642,20 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file,
> >> int optimize_p ATTRIBUTE_UNUSED,
> >>   if (string[0])
> >> {
> >>   expanded_location loc;
> >> +bool unknown_loc_p
> >> +  = ASM_INPUT_SOURCE_LOCATION (body) == UNKNOWN_LOCATION;
> >> -app_enable ();
> >> -loc = expand_location (ASM_INPUT_SOURCE_LOCATION (body));
> >> -if (*loc.file && loc.line)
> >> -  fprintf (asm_out_file, "%s %i \"%s\" 1\n",
> >> -   ASM_COMMENT_START, loc.line, loc.file);
> >> +if (!unknown_loc_p)
> >> +  {
> >> +app_enable ();
> >> +loc = expand_location (ASM_INPUT_SOURCE_LOCATION (body));
> >> +if (*loc.file && loc.line)
> >> +  fprintf (asm_out_file, "%s %i \"%s\" 1\n",
> >> +   ASM_COMMENT_START, loc.line, loc.file);
> >> +  }
> >>   fprintf (asm_out_file, "\t%s\n", string);
> >>   #if HAVE_AS_LINE_ZERO
> >> -if (*loc.file && loc.line)
> >> +if (!unknown_loc_p && loc.file && *loc.file && loc.line)
> >> fprintf (asm_out_file, "%s 0 \"\" 2\n", ASM_COMMENT_START);
> >>   #endif
> >> }


Re: [PATCH] rtl-ssa: Fix prev/next_def confusion [PR104869]

2022-03-21 Thread Richard Biener via Gcc-patches
On Sun, Mar 20, 2022 at 10:41 PM Richard Sandiford via Gcc-patches
 wrote:
>
> rtl-ssa chains definitions into an RPO list.  It also groups
> sequences of clobbers together into a single node, so that it's
> possible to skip over the clobbers in constant time in order to
> get the next or previous set.
>
> When adding a clobber to an insn, the main DF barriers for that
> clobber are the last use of the previous set (if any) and the next
> set (if any); adding a new clobber to a sea of clobbers is fine.
> def_lookup provided the basis for these barriers as prev_def ()
> and next_def ().
>
> But of course, in hindsight, those were bad names, since they
> implied that the returned values were literally the previous
> definition (of any kind) or the next definition (of any kind).
> And function_info::make_use_available was using the same routines
> assuming that they had that meaning. :-(
>
> This made a difference for the case where the start of a BB
> occurs in the middle of an (RPO) clobber group: we then want
> the previous and next clobbers in the group, rather than the
> set before the clobber group and the set after the clobber group.
>
> This patch renames the existing routines to something that's hopefully
> clearer (though also more long-winded).  It then adds routines that
> really do provide the previous and next definitions.
>
> This complication is supposed to be internal to rtl-ssa and, as
> mentioned above, is part of trying to reduce time complexity.
>
> Tested on aarch64-linux-gnu and powerpc64le-linux-gnu.  OK to install?

OK.

> This will be latent on GCC 11, so it should probably be backported there.

Agreed, maybe after a short soaking time.

Richard.
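A toy model of the distinction described in the patch, assuming the definitions of one resource are just an array sorted by program point (rtl-ssa instead groups consecutive clobbers into a single node so a group can be skipped in constant time; this sketch scans linearly, and all names are hypothetical). For a point inside a clobber group, prev/next_def return the neighbouring clobbers, while the renamed last_def_of_prev_group/first_def_of_next_group skip the whole group.

```c
#include <assert.h>

/* Hypothetical model: defs of one resource, sorted by POINT.  */
enum toy_def_kind { TOY_SET, TOY_CLOBBER };

struct toy_def
{
  int point;
  enum toy_def_kind kind;
};

/* Closest def of any kind before POINT, or -1.  */
static int
toy_prev_def (const struct toy_def *defs, int n, int point)
{
  int result = -1;
  for (int i = 0; i < n && defs[i].point < point; i++)
    result = i;
  return result;
}

/* Closest def of any kind after POINT, or -1.  */
static int
toy_next_def (const struct toy_def *defs, int n, int point)
{
  for (int i = 0; i < n; i++)
    if (defs[i].point > point)
      return i;
  return -1;
}

/* Nonzero if POINT falls strictly inside a run of clobbers.  */
static int
toy_inside_clobber_group (const struct toy_def *defs, int n, int point)
{
  int prev = toy_prev_def (defs, n, point);
  int next = toy_next_def (defs, n, point);
  return prev >= 0 && next >= 0
         && defs[prev].kind == TOY_CLOBBER && defs[next].kind == TOY_CLOBBER;
}

/* Roughly what the old prev_def returned: skip the whole clobber group
   that POINT is inside.  */
static int
toy_last_def_of_prev_group (const struct toy_def *defs, int n, int point)
{
  int i = toy_prev_def (defs, n, point);
  if (toy_inside_clobber_group (defs, n, point))
    while (i >= 0 && defs[i].kind == TOY_CLOBBER)
      i--;
  return i;
}

/* Likewise for the old next_def.  */
static int
toy_first_def_of_next_group (const struct toy_def *defs, int n, int point)
{
  int i = toy_next_def (defs, n, point);
  if (toy_inside_clobber_group (defs, n, point))
    while (i >= 0 && i < n && defs[i].kind == TOY_CLOBBER)
      i++;
  return i < n ? i : -1;
}
```

Outside a clobber group the two notions coincide, which is why the confusion only showed up when a basic block started in the middle of a group.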

> Richard
>
>
> gcc/
> * rtl-ssa/accesses.h (clobber_group::prev_clobber): Declare.
> (clobber_group::next_clobber): Likewise.
> (def_lookup::prev_def): Rename to...
> (def_lookup::last_def_of_prev_group): ...this.
> (def_lookup::next_def): Rename to...
> (def_lookup::first_def_of_next_group): ...this.
> (def_lookup::matching_or_prev_def): Rename to...
> (def_lookup::matching_set_or_last_def_of_prev_group): ...this.
> (def_lookup::matching_or_next_def): Rename to...
> (def_lookup::matching_set_or_first_def_of_next_group): ...this.
> (def_lookup::prev_def): New function, taking the lookup insn as
> argument.
> (def_lookup::next_def): Likewise.
> * rtl-ssa/member-fns.inl (def_lookup::prev_def): Rename to...
> (def_lookup::last_def_of_prev_group): ...this.
> (def_lookup::next_def): Rename to...
> (def_lookup::first_def_of_next_group): ...this.
> (def_lookup::matching_or_prev_def): Rename to...
> (def_lookup::matching_set_or_last_def_of_prev_group): ...this.
> (def_lookup::matching_or_next_def): Rename to...
> (def_lookup::matching_set_or_first_def_of_next_group): ...this.
> * rtl-ssa/movement.h (restrict_movement_for_dead_range): Update after
> above renaming.
> * rtl-ssa/accesses.cc (clobber_group::prev_clobber): New function.
> (clobber_group::next_clobber): Likewise.
> (def_lookup::prev_def): Likewise.
> (def_lookup::next_def): Likewise.
> (function_info::make_use_available): Pass the lookup insn to
> def_lookup::prev_def and def_lookup::next_def.
>
> gcc/testsuite/
> * g++.dg/pr104869.C: New test.
> ---
>  gcc/rtl-ssa/accesses.cc | 52 +-
>  gcc/rtl-ssa/accesses.h  | 22 --
>  gcc/rtl-ssa/member-fns.inl  | 12 ++---
>  gcc/rtl-ssa/movement.h  |  6 +--
>  gcc/testsuite/g++.dg/pr104869.C | 78 +
>  5 files changed, 155 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/pr104869.C
>
> diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
> index 47f2aea05db..dcf2335056b 100644
> --- a/gcc/rtl-ssa/accesses.cc
> +++ b/gcc/rtl-ssa/accesses.cc
> @@ -393,6 +393,28 @@ set_node::print (pretty_printer *pp) const
>pp_access (pp, first_def ());
>  }
>
> +// See the comment above the declaration.
> +clobber_info *
> +clobber_group::prev_clobber (insn_info *insn) const
> +{
> +  auto  = const_cast (m_clobber_tree);
> +  int comparison = lookup_clobber (tree, insn);
> +  if (comparison <= 0)
> +return dyn_cast (tree.root ()->prev_def ());
> +  return tree.root ();
> +}
> +
> +// See the comment above the declaration.
> +clobber_info *
> +clobber_group::next_clobber (insn_info *insn) const
> +{
> +  auto  = const_cast (m_clobber_tree);
> +  int comparison = lookup_clobber (tree, insn);
> +  if (comparison >= 0)
> +return dyn_cast (tree.root ()->next_def ());
> +  return tree.root ();
> +}
> +
>  // See the comment above the declaration.
>  void
>  clobber_group::print (pretty_printer *pp) const
> @@ -415,6 +437,32 @@ clobber_group::print (pretty_printer *pp) const
>pp_indentation (pp) -= 4;

Re: [PATCH] Reset relations when crossing backedges.

2022-03-21 Thread Richard Biener via Gcc-patches
On Sat, Mar 19, 2022 at 8:27 PM Jeff Law  wrote:
>
>
>
> On 2/2/2022 2:27 AM, Richard Biener wrote:
> > On Tue, Feb 1, 2022 at 7:41 PM Aldy Hernandez  wrote:
> >> Ping
> > I didn't quite get Jeff's comment, so I waited (sorry).  I've meanwhile added
> Sorry.  IIRC the concern was whether or not we need to do something
> special for irreducible regions.  In that case which edge gets marked as
> the backedge depends on graph traversal order.  My suggestion was to use
> the dominance relationship instead of edge flags.

Ah.  Yes - it depends on what situation we are trying to detect here.  If we
are just trying to identify PHI arguments from "backedges" using dominance
info is good.  If we are trying to detect backedges from a CFG walk then we
have to use a backedge DFS that matches our CFG walk, otherwise the
backedges might not match ours for irreducible regions.
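A minimal sketch of the dominance-based test: an edge u->v is treated as a backedge when v dominates u, independent of any DFS order. Dominators are computed with the simple iterative data-flow algorithm on bitmasks, so the sketch is limited to 32 blocks; the types and names are hypothetical. In the irreducible example neither edge of the 1<->2 cycle qualifies, whereas a DFS would mark one of them depending on visit order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BLOCKS 32

/* Hypothetical CFG edge; block 0 is the entry block.  */
struct cfg_edge_sketch
{
  int src, dest;
};

/* Iterative dominator computation on bitmasks:
   dom[entry] = {entry}, dom[b] = {b} | intersection of dom[p] over
   predecessors p, repeated until nothing changes.  */
static void
compute_dominators (int nblocks, const struct cfg_edge_sketch *edges,
                    int nedges, uint32_t *dom)
{
  assert (nblocks > 0 && nblocks <= MAX_BLOCKS);
  uint32_t all = nblocks == 32 ? UINT32_MAX
                               : (((uint32_t) 1 << nblocks) - 1);
  dom[0] = 1u;
  for (int b = 1; b < nblocks; b++)
    dom[b] = all;

  bool changed = true;
  while (changed)
    {
      changed = false;
      for (int b = 1; b < nblocks; b++)
        {
          uint32_t meet = all;
          for (int e = 0; e < nedges; e++)
            if (edges[e].dest == b)
              meet &= dom[edges[e].src];
          uint32_t next = meet | ((uint32_t) 1 << b);
          if (next != dom[b])
            {
              dom[b] = next;
              changed = true;
            }
        }
    }
}

/* A backedge in the dominance sense: the destination dominates the
   source.  */
static bool
dominance_backedge_p (const uint32_t *dom, struct cfg_edge_sketch e)
{
  return (dom[e.src] >> e.dest) & 1u;
}
```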

Richard.

> Jeff
>


Re: [r12-7687 Regression] FAIL: gcc.target/i386/bt-5.c scan-assembler-times bt[lq][ \t] 7 on Linux/x86_64

2022-03-21 Thread Richard Biener via Gcc-patches
On Thu, 17 Mar 2022, sunil.k.pandey wrote:

> On Linux/x86_64,
> 
> 3a7ba8fd0cda387809e4902328af2473662b6a4a is the first bad commit
> commit 3a7ba8fd0cda387809e4902328af2473662b6a4a
> Author: Richard Biener 
> Date:   Thu Mar 17 08:10:59 2022 +0100
> 
> tree-optimization/104960 - unsplit edges after late sinking
> 
> caused
> 
> FAIL: gcc.target/i386/bt-5.c scan-assembler-not sar[lq][ \t]
> FAIL: gcc.target/i386/bt-5.c scan-assembler-times bt[lq][ \t] 7

Btw, I've seen this as well and the "only" change is swapped edges
out of a conditional from the split/unsplit dance.  I was hoping
somebody more familiar with the change introducing that testcase
was going to investigate but in the end it's probably RTL expansion
presenting slightly different RTL IL to ifcvt.

Richard.


[PATCH] libstdc++: Work around clang misdesign in time_get<>::get [PR104990]

2022-03-21 Thread Jakub Jelinek via Gcc-patches
Hi!

Apparently clang has a -fgnuc-version= option which allows it to pretend
it is any GCC version the user likes.  It is already bad that it claims to
be GCC 4.2 compatible by default when it is not (various unimplemented
extensions at least), but this option is a horrible idea.

Anyway, this patch adds a hack for it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-03-21  Jakub Jelinek  

PR libstdc++/104990
* include/bits/locale_facets_nonio.tcc (get): Don't check if do_get
isn't overloaded if __clang__ is defined.

--- libstdc++-v3/include/bits/locale_facets_nonio.tcc   2022-03-18 10:37:41.176593188 +0100
+++ libstdc++-v3/include/bits/locale_facets_nonio.tcc   2022-03-20 20:28:07.203815325 +0100
@@ -1465,7 +1465,7 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
   ctype<_CharT> const& __ctype = use_facet >(__loc);
   __err = ios_base::goodbit;
   bool __use_state = false;
-#if __GNUC__ >= 5
+#if __GNUC__ >= 5 && !defined(__clang__)
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Wpmf-conversions"
   // Nasty hack.  The C++ standard mandates that get invokes the do_get

Jakub