[PATCH] Add missing space after seen_error in gcc/cp/pt.cc

2024-06-04 Thread Simon Martin
I realized that I committed a change with a missing space after seen_error.
This fixes it, as well as another occurrence in the same file.

Apologies for the mistake - I'll commit this as obvious.

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr): Add missing space after seen_error.
(dependent_type_p): Likewise.

---
 gcc/cp/pt.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index edb94a000ea..8cbcf7cdf7a 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -20918,7 +20918,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
   be using lambdas anyway, so it's ok to be
   stricter.  Be strict with C++20 template-id ADL too.
   And be strict if we're already failing anyway.  */
-   bool strict = in_lambda || template_id_p || seen_error();
+   bool strict = in_lambda || template_id_p || seen_error ();
bool diag = true;
if (strict)
  error_at (cp_expr_loc_or_input_loc (t),
@@ -28020,7 +28020,7 @@ dependent_type_p (tree type)
 providing us with a dependent type.  */
   gcc_assert (type);
   gcc_assert (TREE_CODE (type) != TEMPLATE_TYPE_PARM || is_auto (type)
- || seen_error());
+ || seen_error ());
   return false;
 }
 
-- 
2.44.0




[PATCH] Add missing check for const_pool in the escaped solutions

2024-05-17 Thread Richard Biener
The ptr-vs-ptr compare folding using points-to info was missing a
check for const_pool being included in the escaped solution.  The
following fixes that, fixing the observed execute FAIL of
experimental/functional/searchers.cc

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
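
For readers following along, here is a minimal sketch of the idea, assuming a heavily simplified model of a points-to solution. The names pt_sketch, includes_const_pool and testcase_x_reaches_const_pool are hypothetical and exist only for this illustration; the real helper also has to consider the IPA escaped solution.

```cpp
#include <cassert>

// Simplified stand-in for GCC's struct pt_solution: only two flags are modelled.
struct pt_sketch
{
  bool const_pool;  // the solution names constant pool entries directly
  bool escaped;     // the solution includes whatever ESCAPED points to
};

// Hypothetical helper mirroring the shape of the new check: a solution
// reaches the constant pool either directly or through ESCAPED.
inline bool
includes_const_pool (const pt_sketch &pt, const pt_sketch &escaped_set)
{
  return pt.const_pool || (pt.escaped && escaped_set.const_pool);
}

// Mirrors the testcase: p = "Hello" puts CONST_POOL into ESCAPED, and
// x = foo () points to ESCAPED without having const_pool set itself.
inline bool
testcase_x_reaches_const_pool ()
{
  const pt_sketch escaped_set = { true, false };
  const pt_sketch x = { false, true };
  assert (!x.const_pool);  // a check of the flag alone would miss this
  return includes_const_pool (x, escaped_set);
}
```

Looking only at the const_pool flag of x says "no constant pool here", while following ESCAPED shows that x may still point to a string literal, which is why the compare must not be folded.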

* tree-ssa-alias.h (pt_solution_includes_const_pool): Declare.
* tree-ssa-alias.cc (ptrs_compare_unequal): Use
pt_solution_includes_const_pool.
* tree-ssa-structalias.cc (pt_solution_includes_const_pool): New.

* gcc.dg/torture/20240517-1.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/20240517-1.c | 26 +++
 gcc/tree-ssa-alias.cc |  3 ++-
 gcc/tree-ssa-alias.h  |  1 +
 gcc/tree-ssa-structalias.cc   | 11 ++
 4 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/20240517-1.c

diff --git a/gcc/testsuite/gcc.dg/torture/20240517-1.c b/gcc/testsuite/gcc.dg/torture/20240517-1.c
new file mode 100644
index 000..ab83d3ca6fb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/20240517-1.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fmerge-all-constants" } */
+
+char *p;
+
+char * __attribute__((noipa))
+foo () { return p+1; }
+
+volatile int z;
+
+int main()
+{
+  /* ESCAPED = CONST_POOL */
+  p = "Hello";
+  /* PT = ESCAPED */
+  char *x = foo ();
+  char *y;
+  /* y PT = CONST_POOL */
+  if (z)
+    y = "Baz";
+  else
+    y = "Hello" + 1;
+  if (y != x)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index 6d31fc83691..9f5f69bcfad 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -501,7 +501,8 @@ ptrs_compare_unequal (tree ptr1, tree ptr2)
  || pi2->pt.vars_contains_interposable)
return false;
  if ((!pi1->pt.null || !pi2->pt.null)
- && (!pi1->pt.const_pool || !pi2->pt.const_pool))
+ && (!pt_solution_includes_const_pool (&pi1->pt)
+ || !pt_solution_includes_const_pool (&pi2->pt)))
return !pt_solutions_intersect (&pi1->pt, &pi2->pt);
}
 }
diff --git a/gcc/tree-ssa-alias.h b/gcc/tree-ssa-alias.h
index e29dff58375..5cd64e72295 100644
--- a/gcc/tree-ssa-alias.h
+++ b/gcc/tree-ssa-alias.h
@@ -178,6 +178,7 @@ extern bool pt_solution_empty_p (const pt_solution *);
 extern bool pt_solution_singleton_or_null_p (struct pt_solution *, unsigned *);
 extern bool pt_solution_includes_global (struct pt_solution *, bool);
 extern bool pt_solution_includes (struct pt_solution *, const_tree);
+extern bool pt_solution_includes_const_pool (struct pt_solution *);
 extern bool pt_solutions_intersect (struct pt_solution *, struct pt_solution *);
 extern void pt_solution_reset (struct pt_solution *);
 extern void pt_solution_set (struct pt_solution *, bitmap, bool);
diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 0c6085b1766..61fb3610a17 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -7080,6 +7080,17 @@ pt_solution_includes (struct pt_solution *pt, const_tree decl)
   return res;
 }
 
+/* Return true if the points-to solution *PT contains a reference to a
+   constant pool entry.  */
+
+bool
+pt_solution_includes_const_pool (struct pt_solution *pt)
+{
+  return (pt->const_pool
+ || (pt->escaped && (!cfun || cfun->gimple_df->escaped.const_pool))
+ || (pt->ipa_escaped && ipa_escaped_pt.const_pool));
+}
+
 /* Return true if both points-to solutions PT1 and PT2 have a non-empty
intersection.  */
 
-- 
2.35.3


[PATCH] Add missing hf/bf patterns.

2024-03-17 Thread liuhongt
This fixes an ICE on an unrecognizable logic-operation insn generated by the
lroundmn2 expanders.
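
As background on where scalar logic operations on HF/BF values can come from: sign manipulation such as copysign is usually done with AND, ANDNOT and IOR on the raw bit pattern. The sketch below shows that on a 16-bit IEEE binary16 pattern in plain C++. It is only an illustration; the names half_bits, sign_mask and copysign_bits are made up here, and it is an assumption that the lround expanders reach the logic patterns through this kind of sign handling.

```cpp
#include <cassert>
#include <cstdint>

// Raw bit pattern of an IEEE binary16 value; bit 15 holds the sign.
using half_bits = std::uint16_t;
constexpr half_bits sign_mask = 0x8000;

// copysign (magnitude, sign_source) built only from AND, ANDNOT and IOR,
// the same kinds of operations the and/andnot/ior insn patterns provide.
constexpr half_bits
copysign_bits (half_bits magnitude, half_bits sign_source)
{
  return static_cast<half_bits> ((magnitude & ~sign_mask)      // andnot: clear sign
                                 | (sign_source & sign_mask)); // and + ior: copy sign
}

// 0x3800 is +0.5 and 0xC000 is -2.0 in binary16, so the result is -0.5.
inline bool
half_copysign_example ()
{
  assert (copysign_bits (0x3800, 0xC000) == 0xB800);
  return true;
}
```

If the back end has no pattern for such an operation in HFmode or BFmode, an insn of that shape cannot be recognized, which would match the kind of failure described above.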

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

PR target/114334
* config/i386/i386.md (mode): Add new number V8BF,V16BF,V32BF.
(MODEF248): New mode iterator.
(ssevecmodesuffix): Handle BF and HF.
* config/i386/sse.md (andnot3): Extend to HF/BF.
(3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr114334.c: New test.
---
 gcc/config/i386/i386.md  | 13 +
 gcc/config/i386/sse.md   | 22 +++---
 gcc/testsuite/gcc.target/i386/pr114334.c |  8 
 3 files changed, 28 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr114334.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index df97a2d6270..11fdc6af3fa 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -543,8 +543,9 @@ (define_attr "type"
 
 ;; Main data type used by the insn
 (define_attr "mode"
-  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,BF,SF,DF,XF,TF,V32HF,V16HF,V8HF,
-   V16SF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,V8DF,V4HF,V4BF,V2HF,V2BF"
+  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,BF,SF,DF,XF,TF,
+   V32HF,V16HF,V8HF,V4HF,V2HF,V32BF,V16BF,V8BF,V4BF,V2BF,
+   V16SF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,V8DF"
   (const_string "unknown"))
 
 ;; The CPU unit operations uses.
@@ -1323,6 +1324,8 @@ (define_mode_attr ashl_input_operand
 ;; SSE and x87 SFmode and DFmode floating point modes
 (define_mode_iterator MODEF [SF DF])
 
+(define_mode_iterator MODEF248 [BF HF SF (DF "TARGET_SSE2")])
+
 ;; SSE floating point modes
 (define_mode_iterator MODEFH [(HF "TARGET_AVX512FP16") SF DF])
 
@@ -1347,7 +1350,8 @@ (define_mode_attr ssemodesuffix
(V64QI "b") (V32HI "w") (V16SI "d") (V8DI "q")])
 
 ;; SSE vector suffix for floating point modes
-(define_mode_attr ssevecmodesuffix [(SF "ps") (DF "pd")])
+;; BF HF use same suffix as SF for logic operations.
+(define_mode_attr ssevecmodesuffix [(BF "ps") (HF "ps") (SF "ps") (DF "pd")])
 
 ;; SSE vector mode corresponding to a scalar mode
 (define_mode_attr ssevecmode
@@ -1357,7 +1361,8 @@ (define_mode_attr ssevecmodelower
 
 ;; AVX512F vector mode corresponding to a scalar mode
 (define_mode_attr avx512fvecmode
-  [(QI "V64QI") (HI "V32HI") (SI "V16SI") (DI "V8DI") (SF "V16SF") (DF "V8DF")])
+  [(QI "V64QI") (HI "V32HI") (SI "V16SI") (DI "V8DI")
+   (HF "V32HF") (BF "V32BF") (SF "V16SF") (DF "V8DF")])
 
 ;; Instruction suffix for REX 64bit operators.
 (define_mode_attr rex64suffix [(SI "{l}") (DI "{q}")])
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 1bc614ab702..3286d3a4fac 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5125,12 +5125,12 @@ (define_expand "signbit2"
 ;; because the native instructions read the full 128-bits.
 
 (define_insn "*andnot3"
-  [(set (match_operand:MODEF 0 "register_operand" "=x,x,v,v")
-   (and:MODEF
- (not:MODEF
-   (match_operand:MODEF 1 "register_operand" "0,x,v,v"))
-   (match_operand:MODEF 2 "register_operand" "x,x,v,v")))]
-  "SSE_FLOAT_MODE_P (mode)"
+  [(set (match_operand:MODEF248 0 "register_operand" "=x,x,v,v")
+   (and:MODEF248
+ (not:MODEF248
+   (match_operand:MODEF248 1 "register_operand" "0,x,v,v"))
+   (match_operand:MODEF248 2 "register_operand" "x,x,v,v")))]
+  "TARGET_SSE"
 {
   char buf[128];
   const char *ops;
@@ -5257,11 +5257,11 @@ (define_insn "*andnot3"
  (const_string "TI")))])
 
 (define_insn "3"
-  [(set (match_operand:MODEF 0 "register_operand" "=x,x,v,v")
-   (any_logic:MODEF
- (match_operand:MODEF 1 "register_operand" "%0,x,v,v")
- (match_operand:MODEF 2 "register_operand" "x,x,v,v")))]
-  "SSE_FLOAT_MODE_P (mode)"
+  [(set (match_operand:MODEF248 0 "register_operand" "=x,x,v,v")
+   (any_logic:MODEF248
+ (match_operand:MODEF248 1 "register_operand" "%0,x,v,v")
+ (match_operand:MODEF248 2 "register_operand" "x,x,v,v")))]
+  "TARGET_SSE"
 {
   char buf[128];
   const char *ops;
diff --git a/gcc/testsuite/gcc.target/i386/pr114334.c b/gcc/testsuite/gcc.target/i386/pr114334.c
new file mode 100644
index 000..8e38e24cd16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114334.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512fp16" } */
+
+long
+foo(_Float16 f)
+{
+  return __builtin_lroundf16(f);
+}
-- 
2.31.1



Re: [PATCH] Add missing declaration of get_restrict in C++ interface

2023-11-09 Thread David Malcolm
On Thu, 2023-11-09 at 21:51 +0100, Guillaume Gomez wrote:
> I confirm it does. I realized it when finalizing our patch for
> attributes support.

Excellent; thanks for the fix.

Dave




Re: [PATCH] Add missing declaration of get_restrict in C++ interface

2023-11-09 Thread Guillaume Gomez
I confirm it does. I realized it when finalizing our patch for
attributes support.

On Thu, 9 Nov 2023 at 21:49, David Malcolm wrote:
>
> On Thu, 2023-11-09 at 21:03 +0100, Guillaume Gomez wrote:
> > Hi,
> >
> > This patch adds the `get_restrict` method declaration for
> > the C++ interface as it was forgotten.
> >
> > Thanks in advance for the review.
>
> Looking at my jit.sum results, it looks like the .cc files are indeed
> FAILing on initial compilation, with errors such as:
>
> In file included from gcc/testsuite/jit.dg/test-alignment.cc:4:
> gcc/testsuite/../jit/libgccjit++.h:1414:1: error: no declaration matches 
> 'gccjit::type gccjit::type::get_restrict()'
> gcc/testsuite/../jit/libgccjit++.h:1414:1: note: no functions named 
> 'gccjit::type gccjit::type::get_restrict()'
> gcc/testsuite/../jit/libgccjit++.h:350:9: note: 'class gccjit::type' defined 
> here
>
> which presumably started with r14-3552-g29763b002459cb.
>
> Hence the patch looks good to me - thanks!
>
> Does this patch fix those test cases?
>
> Dave
>


Re: [PATCH] Add missing declaration of get_restrict in C++ interface

2023-11-09 Thread David Malcolm
On Thu, 2023-11-09 at 21:03 +0100, Guillaume Gomez wrote:
> Hi,
> 
> This patch adds the `get_restrict` method declaration for
> the C++ interface as it was forgotten.
> 
> Thanks in advance for the review.

Looking at my jit.sum results, it looks like the .cc files are indeed
FAILing on initial compilation, with errors such as:

In file included from gcc/testsuite/jit.dg/test-alignment.cc:4:
gcc/testsuite/../jit/libgccjit++.h:1414:1: error: no declaration matches 'gccjit::type gccjit::type::get_restrict()'
gcc/testsuite/../jit/libgccjit++.h:1414:1: note: no functions named 'gccjit::type gccjit::type::get_restrict()'
gcc/testsuite/../jit/libgccjit++.h:350:9: note: 'class gccjit::type' defined here

which presumably started with r14-3552-g29763b002459cb.

Hence the patch looks good to me - thanks!

Does this patch fix those test cases?

Dave



[PATCH] Add missing declaration of get_restrict in C++ interface

2023-11-09 Thread Guillaume Gomez
Hi,

This patch adds the `get_restrict` method declaration for
the C++ interface as it was forgotten.

Thanks in advance for the review.
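
To illustrate the kind of problem being fixed, here is a small self-contained sketch. It does not use libgccjit++.h; the namespace sketch, the class layout and the qualifier bits are invented for the example. The point is only that C++ rejects an out-of-class member definition unless the class body declares that member.

```cpp
#include <cassert>

namespace sketch
{
  // Hypothetical wrapper class; the bit values are arbitrary.
  class type
  {
  public:
    explicit type (unsigned quals = 0) : m_quals (quals) {}

    type get_volatile ();
    type get_restrict ();  // removing this declaration makes the
                           // definition of get_restrict below an error

    unsigned quals () const { return m_quals; }

  private:
    unsigned m_quals;
  };

  // Out-of-class definitions, as a header-only wrapper might provide them.
  inline type type::get_volatile () { return type (m_quals | 1u); }
  inline type type::get_restrict () { return type (m_quals | 2u); }

  // Small self-check that both qualifiers can be chained.
  inline bool
  chained_qualifiers_ok ()
  {
    assert (type ().get_volatile ().get_restrict ().quals () == 3u);
    return true;
  }
}
```

With the declaration commented out, a compiler reports that no declaration matches the out-of-class definition, so every translation unit including the header fails to build.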
From e819fd01cd3e79bfab28a77f4ce78f34156e7a83 Mon Sep 17 00:00:00 2001
From: Guillaume Gomez 
Date: Thu, 9 Nov 2023 17:53:08 +0100
Subject: [PATCH] Add missing declaration of get_restrict in C++ interface

gcc/jit/ChangeLog:

	* libgccjit++.h:
---
 gcc/jit/libgccjit++.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/jit/libgccjit++.h b/gcc/jit/libgccjit++.h
index 4a04db386e6..f9a0017cae5 100644
--- a/gcc/jit/libgccjit++.h
+++ b/gcc/jit/libgccjit++.h
@@ -360,6 +360,7 @@ namespace gccjit
 type get_volatile ();
 type get_aligned (size_t alignment_in_bytes);
 type get_vector (size_t num_units);
+type get_restrict ();
 
 // Shortcuts for getting values of numeric types:
 rvalue zero ();
-- 
2.34.1



Re: [PATCH] Add missing return in gori_compute::logical_combine

2023-09-25 Thread Andrew MacLeod
OK for trunk at least.   Thanks.  I presume it'll be fine for the other 
releases.


Andrew

On 9/25/23 11:51, Eric Botcazou wrote:

Hi,

the varying case currently falls through to the 1/true case.

Tested on x86_64-suse-linux, OK for mainline, 13 and 12 branches?


2023-09-25  Eric Botcazou  

* gimple-range-gori.cc (gori_compute::logical_combine): Add missing
return statement in the varying case.


2023-09-25  Eric Botcazou  

* gnat.dg/opt102.adb: New test.
* gnat.dg/opt102_pkg.adb, gnat.dg/opt102_pkg.ads: New helper.





[PATCH] Add missing return in gori_compute::logical_combine

2023-09-25 Thread Eric Botcazou
Hi,

the varying case currently falls through to the 1/true case.
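
A much reduced model of this kind of bug, for illustration only: the types and the function combine are hypothetical and are not the gori code. Result sets are plain bit masks, and the "varying" branch has to return its own result instead of letting control continue into the code for the definite cases.

```cpp
#include <cassert>

// Hypothetical three-state condition: known false, known true, or unknown.
enum class lhs_state { is_false, is_true, varying };

// Returns the set of possible values (a bit mask) given the sets that
// hold when the condition is true and when it is false.
inline unsigned
combine (lhs_state lhs, unsigned when_true, unsigned when_false)
{
  unsigned result = 0;

  if (lhs == lhs_state::varying)
    {
      // Either outcome is possible, so take the union of both sets.
      result = when_true | when_false;
      return result;  // without this return, the code below would run
                      // and overwrite the union with the "true" set
    }

  if (lhs == lhs_state::is_false)
    result = when_false;
  else
    result = when_true;
  return result;
}

// Small self-check of the varying case.
inline bool
varying_keeps_union ()
{
  assert (combine (lhs_state::varying, 1u, 2u) == 3u);
  return true;
}
```

Dropping the early return makes the varying case yield only the "true" set, which is too narrow and can lead to wrong code.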

Tested on x86_64-suse-linux, OK for mainline, 13 and 12 branches?


2023-09-25  Eric Botcazou  

* gimple-range-gori.cc (gori_compute::logical_combine): Add missing
return statement in the varying case.


2023-09-25  Eric Botcazou  

* gnat.dg/opt102.adb: New test.
* gnat.dg/opt102_pkg.adb, gnat.dg/opt102_pkg.ads: New helper.

-- 
Eric Botcazou
diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index 51fb542a19c..2694e551d73 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -876,6 +876,7 @@ gori_compute::logical_combine (vrange &r, enum tree_code code,
 	  r.dump (dump_file);
 	  fputc ('\n', dump_file);
 	}
+  return res;
 }
 
   switch (code)
package body Opt102_Pkg is

  function Get (E : Enum; F, M : access Integer) return Integer is
  begin
case E is
  when One   => return 0;
  when Two   => return F.all;
  when Three => return M.all;
end case;
  end;

end Opt102_Pkg;
-- { dg-do run }
-- { dg-options "-O2 -gnata" }

with Opt102_Pkg; use Opt102_Pkg;

procedure Opt102 is
  I, F : aliased Integer;
begin
  I := Get (Two, F'Access, null);
end;
package Opt102_Pkg is

  type Enum is (One, Two, Three);

  function Get (E : Enum; F, M : access Integer) return Integer
with Pre => (E = One) = (F = null and M = null) and
(E = Two) = (F /= null) and
(E = Three) = (M /= null);

end Opt102_Pkg;


[PATCH] add missing dg-require alias to gcc.dg/torture/pr100786.c

2022-03-28 Thread Richard Biener via Gcc-patches


Pushed.

2022-03-28  Richard Biener  

* gcc.dg/torture/pr100786.c: Add dg-require alias.
---
 gcc/testsuite/gcc.dg/torture/pr100786.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/torture/pr100786.c b/gcc/testsuite/gcc.dg/torture/pr100786.c
index 42f4e485593..7c03b08d8cb 100644
--- a/gcc/testsuite/gcc.dg/torture/pr100786.c
+++ b/gcc/testsuite/gcc.dg/torture/pr100786.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-alias "" } */
 
 const double a = 0;
 extern int b __attribute__((alias("a")));
-- 
2.34.1


Re: [PATCH] add missing GTY support for hash-map (PR 100463)

2021-05-24 Thread Jason Merrill via Gcc-patches

On 5/24/21 12:46 PM, Martin Sebor wrote:

Instantiating a hash_map on a number of integer types including,
for example, int or unsigned int (such as location_t), or
HOST_WIDE_INT, and using it with the garbage collector causes many
cryptic compilation errors due to incomplete support for such types
(the PR shows a few examples).  I ran into these errors as I was
prototyping a new feature and they took me egregiously long to
figure out, even with help from others.

The attached patch adds the missing functions necessary to complete
the support for all integer types to avoid these errors.  This is
prerequisite for a future patch of mine.  The patch uses just one
of these hash_map instances but others shouldn't have to run into
the same errors if they happen to choose one of them.

Tested on x86_64-linux.


OK.

Jason



[PATCH] add missing GTY support for hash-map (PR 100463)

2021-05-24 Thread Martin Sebor via Gcc-patches

Instantiating a hash_map on a number of integer types including,
for example, int or unsigned int (such as location_t), or
HOST_WIDE_INT, and using it with the garbage collector causes many
cryptic compilation errors due to incomplete support for such types
(the PR shows a few examples).  I ran into these errors as I was
prototyping a new feature and they took me egregiously long to
figure out, even with help from others.

The attached patch adds the missing functions necessary to complete
the support for all integer types to avoid these errors.  This is
prerequisite for a future patch of mine.  The patch uses just one
of these hash_map instances but others shouldn't have to run into
the same errors if they happen to choose one of them.
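
A stand-alone sketch of the overload problem, under the assumption that the errors come from generic container code calling a marking function on each element. The names mark and mark_all are invented here and are not the GCC functions. In standard C++, if only the int and unsigned int overloads exist, a call with a long argument is ambiguous, so providing one no-op overload per integer type avoids the error.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch
{
  // No-op markers: integers hold no pointers for a collector to follow.
  inline void mark (int) { }
  inline void mark (unsigned int) { }
  inline void mark (long) { }
  inline void mark (unsigned long) { }
  inline void mark (long long) { }
  inline void mark (unsigned long long) { }

  // Generic walker standing in for the code a user-marked container
  // instantiates for its element type.  Returns the number of elements.
  template<typename T, std::size_t N>
  std::size_t
  mark_all (const T (&values)[N])
  {
    for (std::size_t i = 0; i < N; ++i)
      mark (values[i]);  // needs exactly one best overload for T
    return N;
  }

  // Instantiates the walker for an element type beyond int.
  inline bool
  long_elements_ok ()
  {
    const long values[] = { 1L, 2L, 3L };
    assert (mark_all (values) == 3);
    return true;
  }
}
```

Deleting the long overloads from the sketch turns the call inside mark_all into a compile-time ambiguity for long elements, far away from the code that chose the element type.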

Tested on x86_64-linux.

Martin
PR other/100463 - many errors using GTY and hash_map

gcc/ChangeLog:
	* ggc.h (gt_ggc_mx): Add overloads for all integers.
	(gt_pch_nx):  Same.
	* hash-map.h (class hash_map): Add pch_nx_helper overloads for all
	integers.
	(hash_map::operator==): New function.

diff --git a/gcc/ggc.h b/gcc/ggc.h
index 65f6cb4d19d..92884717f5c 100644
--- a/gcc/ggc.h
+++ b/gcc/ggc.h
@@ -332,19 +332,30 @@ gt_pch_nx (const char *)
 {
 }
 
-inline void
-gt_ggc_mx (int)
-{
-}
-
-inline void
-gt_pch_nx (int)
-{
-}
-
-inline void
-gt_pch_nx (unsigned int)
-{
-}
+inline void gt_pch_nx (bool) { }
+inline void gt_pch_nx (char) { }
+inline void gt_pch_nx (signed char) { }
+inline void gt_pch_nx (unsigned char) { }
+inline void gt_pch_nx (short) { }
+inline void gt_pch_nx (unsigned short) { }
+inline void gt_pch_nx (int) { }
+inline void gt_pch_nx (unsigned int) { }
+inline void gt_pch_nx (long int) { }
+inline void gt_pch_nx (unsigned long int) { }
+inline void gt_pch_nx (long long int) { }
+inline void gt_pch_nx (unsigned long long int) { }
+
+inline void gt_ggc_mx (bool) { }
+inline void gt_ggc_mx (char) { }
+inline void gt_ggc_mx (signed char) { }
+inline void gt_ggc_mx (unsigned char) { }
+inline void gt_ggc_mx (short) { }
+inline void gt_ggc_mx (unsigned short) { }
+inline void gt_ggc_mx (int) { }
+inline void gt_ggc_mx (unsigned int) { }
+inline void gt_ggc_mx (long int) { }
+inline void gt_ggc_mx (unsigned long int) { }
+inline void gt_ggc_mx (long long int) { }
+inline void gt_ggc_mx (unsigned long long int) { }
 
 #endif
diff --git a/gcc/hash-map.h b/gcc/hash-map.h
index 0779c930f0a..dd039f10343 100644
--- a/gcc/hash-map.h
+++ b/gcc/hash-map.h
@@ -107,27 +107,31 @@ class GTY((user)) hash_map
 	  gt_pch_nx (&x, op, cookie);
 	}
 
-static void
-  pch_nx_helper (int, gt_pointer_operator, void *)
-	{
-	}
-
-static void
-  pch_nx_helper (unsigned int, gt_pointer_operator, void *)
-	{
-	}
-
-static void
-  pch_nx_helper (bool, gt_pointer_operator, void *)
-	{
-	}
-
 template<typename T>
   static void
   pch_nx_helper (T *&x, gt_pointer_operator op, void *cookie)
 	{
 	  op (&x, cookie);
 	}
+
+/* The overloads below should match those in ggc.h.  */
+#define DEFINE_PCH_HELPER(T)			\
+static void pch_nx_helper (T, gt_pointer_operator, void *) { }
+
+DEFINE_PCH_HELPER (bool);
+DEFINE_PCH_HELPER (char);
+DEFINE_PCH_HELPER (signed char);
+DEFINE_PCH_HELPER (unsigned char);
+DEFINE_PCH_HELPER (short);
+DEFINE_PCH_HELPER (unsigned short);
+DEFINE_PCH_HELPER (int);
+DEFINE_PCH_HELPER (unsigned int);
+DEFINE_PCH_HELPER (long);
+DEFINE_PCH_HELPER (unsigned long);
+DEFINE_PCH_HELPER (long long);
+DEFINE_PCH_HELPER (unsigned long long);
+
+#undef DEFINE_PCH_HELPER
   };
 
 public:
@@ -273,8 +277,12 @@ public:
   return reference_pair (e.m_key, e.m_value);
 }
 
-bool
-operator != (const iterator &other) const
+bool operator== (const iterator &other) const
+{
+  return m_iter == other.m_iter;
+}
+
+bool operator != (const iterator &other) const
 {
   return m_iter != other.m_iter;
 }



[PATCH] Add missing changes to Makefile.tpl

2021-02-28 Thread H.J. Lu via Gcc-patches
On Sat, Feb 27, 2021 at 11:01 PM Mike Frysinger  wrote:
>
> On 19 Dec 2020 10:10, H.J. Lu via Gdb-patches wrote:
> > --- a/Makefile.in
> > +++ b/Makefile.in
> >
> > +PGO_BUILD_TRAINING_FLAGS_TO_PASS = \
> > + PGO_BUILD_TRAINING=yes \
> > + CFLAGS_FOR_TARGET="$(PGO_BUILD_TRAINING_CFLAGS)" \
> > + CXXFLAGS_FOR_TARGET="$(PGO_BUILD_TRAINING_CXXFLAGS)"
> > +
> > +# Ignore "make check" errors in PGO training runs.
> > +PGO_BUILD_TRAINING_MFLAGS = -i
>
> these lines are in Makefile.in but not Makefile.tpl.  so regenerating
> the file causes them to be removed.  can you take a look please ?
>

I checked in this patch as an obvious change.

Thanks.

-- 
H.J.
From 1dbde357be3ce2641595b10436822e699abe32a0 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sun, 28 Feb 2021 04:39:38 -0800
Subject: [PATCH] Add missing changes to Makefile.tpl

Update Makefile.tpl to add missing changes in

commit af019bfde9b13d628202fe58054ec7ff08d92a0f
Author: H.J. Lu 
Date:   Sat Jan 9 06:51:15 2021 -0800

Support the PGO build for binutils+gdb

"autogen Makefile.def" showed no changes in Makefile.in.

	PR binutils/26766
	* Makefile.tpl (PGO_BUILD_TRAINING_FLAGS_TO_PASS): Add
	PGO_BUILD_TRAINING=yes.
	(PGO_BUILD_TRAINING_MFLAGS): New.
	(all): Pass $(PGO_BUILD_TRAINING_MFLAGS) to the PGO build.
---
 ChangeLog| 8 
 Makefile.tpl | 5 +
 2 files changed, 13 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index e9a5611c5e7..4cd48fa1dad 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2021-02-28  H.J. Lu  
+
+	PR binutils/26766
+	* Makefile.tpl (PGO_BUILD_TRAINING_FLAGS_TO_PASS): Add
+	PGO_BUILD_TRAINING=yes.
+	(PGO_BUILD_TRAINING_MFLAGS): New.
+	(all): Pass $(PGO_BUILD_TRAINING_MFLAGS) to the PGO build.
+
 2021-02-09  Alan Modra  
 
 	* configure.ac: Delete arm*-*-symbianelf* entry.
diff --git a/Makefile.tpl b/Makefile.tpl
index 38f0b021f43..84fee3dd0f7 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -440,9 +440,13 @@ PGO_BUILD_TRAINING_CFLAGS:= \
 PGO_BUILD_TRAINING_CXXFLAGS:= \
 	$(filter-out -specs=%,$(PGO_BUILD_TRAINING_CXXFLAGS))
 PGO_BUILD_TRAINING_FLAGS_TO_PASS = \
+	PGO_BUILD_TRAINING=yes \
 	CFLAGS_FOR_TARGET="$(PGO_BUILD_TRAINING_CFLAGS)" \
 	CXXFLAGS_FOR_TARGET="$(PGO_BUILD_TRAINING_CXXFLAGS)"
 
+# Ignore "make check" errors in PGO training runs.
+PGO_BUILD_TRAINING_MFLAGS = -i
+
 # Additional PGO and LTO compiler options to use profiling data for the
 # PGO build.
 PGO_BUILD_USE_FLAGS_TO_PASS = \
@@ -784,6 +788,7 @@ all:
 		$(PGO_BUILD_GEN_FLAGS_TO_PASS) all-host all-target \
 @if pgo-build
 	&& $(MAKE) $(RECURSE_FLAGS_TO_PASS) \
+		$(PGO_BUILD_TRAINING_MFLAGS) \
 		$(PGO_BUILD_TRAINING_FLAGS_TO_PASS) \
 		$(PGO_BUILD_TRAINING) \
 	&& $(MAKE) $(RECURSE_FLAGS_TO_PASS) clean \
-- 
2.29.2



Re: [PATCH] Add missing varasm DECL_P check.

2020-12-10 Thread Richard Sandiford via Gcc-patches
Jim Wilson  writes:
> This fixes a riscv64-linux bootstrap failure.
>
> get_constant_section calls the select_section target hook, and select_section
> calls get_named_section which calls get_section.  So it is possible to have
> a constant not a decl in both of these functions.  They already call DECL_P
> checks everywhere except for the new code HJ recently added.  This adds the
> missing DECL_P check.
>
> Verified with a riscv64-linux bootstrap.
>
> OK?

OK, thanks.  (And yeah, I agree a testcase isn't needed for bootstrap fixes.)

Richard

>
> Jim
> ---
>  gcc/varasm.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index 0fac3688828..5b2e123b0da 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -294,6 +294,7 @@ get_section (const char *name, unsigned int flags, tree 
> decl,
>flags |= SECTION_NAMED;
>if (HAVE_GAS_SHF_GNU_RETAIN
>&& decl != nullptr
> +  && DECL_P (decl)
>&& DECL_PRESERVE_P (decl))
>  flags |= SECTION_RETAIN;
>if (*slot == NULL)


Re: [PATCH] Add missing varasm DECL_P check.

2020-12-09 Thread Jim Wilson
On Wed, Dec 9, 2020 at 7:14 PM H.J. Lu  wrote:

>  A testcase?
>

A testcase requires the RISC-V select_section target hook, so it isn't
going to be very useful.  I don't see any other linux targets that have
this hook defined.  Just a few embedded targets.  The testcase
is libgfortran/generated/product_c4.c.  I haven't tried to reduce it.  It
fails both for a native build and a cross build.

Jim


Re: [PATCH] Add missing varasm DECL_P check.

2020-12-09 Thread H.J. Lu via Gcc-patches
On Wed, Dec 9, 2020 at 7:10 PM Jim Wilson  wrote:
>
> This fixes a riscv64-linux bootstrap failure.
>
> get_constant_section calls the select_section target hook, and select_section
> calls get_named_section which calls get_section.  So it is possible to have
> a constant not a decl in both of these functions.  They already call DECL_P
> checks everywhere except for the new code HJ recently added.  This adds the
> missing DECL_P check.
>
> Verified with a riscv64-linux bootstrap.
>
> OK?
>
> Jim
> ---
>  gcc/varasm.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index 0fac3688828..5b2e123b0da 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -294,6 +294,7 @@ get_section (const char *name, unsigned int flags, tree 
> decl,
>flags |= SECTION_NAMED;
>if (HAVE_GAS_SHF_GNU_RETAIN
>&& decl != nullptr
> +  && DECL_P (decl)
>&& DECL_PRESERVE_P (decl))
>  flags |= SECTION_RETAIN;
>if (*slot == NULL)
> --
> 2.17.1
>

 A testcase?

-- 
H.J.


[PATCH] Add missing varasm DECL_P check.

2020-12-09 Thread Jim Wilson
This fixes a riscv64-linux bootstrap failure.

get_constant_section calls the select_section target hook, and select_section
calls get_named_section which calls get_section.  So it is possible to have
a constant not a decl in both of these functions.  They already call DECL_P
checks everywhere except for the new code HJ recently added.  This adds the
missing DECL_P check.
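
The guard pattern can be sketched outside GCC as follows. Everything in this block is hypothetical (node, is_decl, decl_preserve, wants_retain); it only models the rule that a decl-only accessor must be preceded by a kind check when the argument may also be a constant.

```cpp
#include <cassert>

// Hypothetical node kinds: the argument may be a declaration or a constant.
enum class node_kind { constant, decl };

struct node
{
  node_kind kind;
  bool preserve;  // only meaningful when kind == node_kind::decl
};

inline bool
is_decl (const node *n)
{
  return n->kind == node_kind::decl;
}

// Decl-only accessor; like a checking macro, it refuses other node kinds.
inline bool
decl_preserve (const node *n)
{
  assert (is_decl (n));
  return n->preserve;
}

// The kind test sits between the null test and the accessor, so a
// constant never reaches decl_preserve thanks to short-circuit evaluation.
inline bool
wants_retain (const node *n)
{
  return n != nullptr && is_decl (n) && decl_preserve (n);
}
```

Without the is_decl test in wants_retain, passing a constant would trip the assertion inside the accessor, which is the analogue of the checking failure seen during bootstrap.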

Verified with a riscv64-linux bootstrap.

OK?

Jim
---
 gcc/varasm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/varasm.c b/gcc/varasm.c
index 0fac3688828..5b2e123b0da 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -294,6 +294,7 @@ get_section (const char *name, unsigned int flags, tree decl,
   flags |= SECTION_NAMED;
   if (HAVE_GAS_SHF_GNU_RETAIN
   && decl != nullptr
+  && DECL_P (decl)
   && DECL_PRESERVE_P (decl))
 flags |= SECTION_RETAIN;
   if (*slot == NULL)
-- 
2.17.1



Re: [PATCH] Add missing gnu-versioned-namespace symbols

2020-11-03 Thread Jonathan Wakely via Gcc-patches

On 02/11/20 21:52 +0100, François Dumont via Libstdc++ wrote:

On 02/11/20 3:17 pm, Jonathan Wakely wrote:

On 01/11/20 20:48 +0100, François Dumont via Libstdc++ wrote:

Several tests are failing because of those missing symbols.

I understand why we need to export symbols relying on the
versioned namespace but I don't understand why we need to do it
for _GLIBCXX_DEBUG symbols which do not depend on the versioned
namespace.


If you don't export the symbol, it can't be found by code linking to
libstdc++.so.8


So I understand that in versioned namespace mode only 
gnu-versioned-namespace.ver is being used and not gnu.ver.


Right.



This linker script is the only one used to build libstdc++.so.8 so all
symbols that need to be exported by that library have to be exported
by this script. Nothing exports that debug symbol unless you add it
here.

What I don't understand is why the __istream_extract symbol isn't
matched by the wildcard in the extern "C++" block at the top of the
file.


Maybe for the same reason that the std::__copy_streambufs before this
one and some other symbols in std::__8 had to be explicitly exported
too.


But I don't know it.


Yeah, I don't understand those either.

OK for trunk anyway. I'll investigate another day.




Re: [PATCH] Add missing gnu-versioned-namespace symbols

2020-11-02 Thread François Dumont via Gcc-patches

On 02/11/20 3:17 pm, Jonathan Wakely wrote:

On 01/11/20 20:48 +0100, François Dumont via Libstdc++ wrote:

Several tests are failing because of those missing symbols.

I understand why we need to export symbols relying on the versioned
namespace but I don't understand why we need to do it for
_GLIBCXX_DEBUG symbols which do not depend on the versioned namespace.


If you don't export the symbol, it can't be found by code linking to
libstdc++.so.8


So I understand that in versioned namespace mode only 
gnu-versioned-namespace.ver is being used and not gnu.ver.




This linker script is the only one used to build libstdc++.so.8 so all
symbols that need to be exported by that library have to be exported
by this script. Nothing exports that debug symbol unless you add it
here.

What I don't understand is why the __istream_extract symbol isn't
matched by the wildcard in the extern "C++" block at the top of the
file.


Maybe for the same reason that the std::__copy_streambufs before this
one and some other symbols in std::__8 had to be explicitly exported too.


But I don't know it.




Do you want to backport the Debug symbol ?

    libstdc++: Add missing gnu-versioned-namespace symbols

    libstdc++-v3/ChangeLog:

            * config/abi/pre/gnu-versioned-namespace.ver:
            Add __istream_extract and _Safe_local_iterator_base::_M_attach_single
            symbols.

Tested under Linux x86_64 versioned namespace.

Ok to commit ?

François



diff --git a/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver 
b/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver

index 0965854fbc3..3b6d7944d06 100644
--- a/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
+++ b/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
@@ -98,6 +98,9 @@ GLIBCXX_8.0 {
    _ZNSt3__817__copy_streambufsI*;
    _ZNSt3__821__copy_streambufs_eofI*;

+    # std::__istream_extract(wistream&, wchar_t*, streamsize)
+    _ZNSt3__817__istream_extractIwNS_11char_traitsIwvRNS_13basic_istreamIT_T0_EEPS4_[ilx];
+
    # __gnu_cxx::__atomic_add
    # __gnu_cxx::__exchange_and_add
    _ZN9__gnu_cxx3__812__atomic_addEPV[il][il];
@@ -145,6 +148,7 @@ GLIBCXX_8.0 {
_ZN11__gnu_debug30_Safe_unordered_container_base13_M_detach_allEv;
_ZN11__gnu_debug25_Safe_local_iterator_base9_M_attachEPNS_19_Safe_sequence_baseEb;
    _ZN11__gnu_debug25_Safe_local_iterator_base9_M_detachEv;
+    _ZN11__gnu_debug25_Safe_local_iterator_base16_M_attach_singleEPNS_19_Safe_sequence_baseEb;

    # parallel mode
    _ZN14__gnu_parallel9_Settings3getEv;






Re: [PATCH] Add missing gnu-versioned-namespace symbols

2020-11-02 Thread Jonathan Wakely via Gcc-patches

On 01/11/20 20:48 +0100, François Dumont via Libstdc++ wrote:

Several tests are failing because of those missing symbols.

I understand why we need to export symbols relying on the versioned
namespace but I don't understand why we need to do it for
_GLIBCXX_DEBUG symbols which do not depend on the versioned namespace.


If you don't export the symbol, it can't be found by code linking to
libstdc++.so.8

This linker script is the only one used to build libstdc++.so.8 so all
symbols that need to be exported by that library have to be exported
by this script. Nothing exports that debug symbol unless you add it
here.

What I don't understand is why the __istream_extract symbol isn't
matched by the wildcard in the extern "C++" block at the top of the
file.


Do you want to backport the Debug symbol ?

    libstdc++: Add missing gnu-versioned-namespace symbols

    libstdc++-v3/ChangeLog:

            * config/abi/pre/gnu-versioned-namespace.ver:
            Add __istream_extract and _Safe_local_iterator_base::_M_attach_single
            symbols.

Tested under Linux x86_64 versioned namespace.

Ok to commit ?

François




diff --git a/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver 
b/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
index 0965854fbc3..3b6d7944d06 100644
--- a/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
+++ b/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
@@ -98,6 +98,9 @@ GLIBCXX_8.0 {
 _ZNSt3__817__copy_streambufsI*;
 _ZNSt3__821__copy_streambufs_eofI*;
 
+# std::__istream_extract(wistream&, wchar_t*, streamsize)
+_ZNSt3__817__istream_extractIwNS_11char_traitsIwvRNS_13basic_istreamIT_T0_EEPS4_[ilx];
+
 # __gnu_cxx::__atomic_add
 # __gnu_cxx::__exchange_and_add
 _ZN9__gnu_cxx3__812__atomic_addEPV[il][il];
@@ -145,6 +148,7 @@ GLIBCXX_8.0 {
 _ZN11__gnu_debug30_Safe_unordered_container_base13_M_detach_allEv;
 _ZN11__gnu_debug25_Safe_local_iterator_base9_M_attachEPNS_19_Safe_sequence_baseEb;
 _ZN11__gnu_debug25_Safe_local_iterator_base9_M_detachEv;
+_ZN11__gnu_debug25_Safe_local_iterator_base16_M_attach_singleEPNS_19_Safe_sequence_baseEb;
 
 # parallel mode
 _ZN14__gnu_parallel9_Settings3getEv;




[PATCH] Add missing gnu-versioned-namespace symbols

2020-11-01 Thread François Dumont via Gcc-patches

Several tests are failing because of those missing symbols.

I understand why we need to export symbols that depend on the versioned 
namespace but I don't understand why we need to do it for _GLIBCXX_DEBUG 
symbols which are not versioned-namespace dependent.


Do you want to backport the Debug symbol ?

    libstdc++: Add missing gnu-versioned-namespace symbols

    libstdc++-v3/ChangeLog:

    * config/abi/pre/gnu-versioned-namespace.ver:
    Add __istream_extract and
    _Safe_local_iterator_base::_M_attach_single symbols.

Tested under Linux x86_64 versioned namespace.

Ok to commit ?

François

diff --git a/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver b/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
index 0965854fbc3..3b6d7944d06 100644
--- a/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
+++ b/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
@@ -98,6 +98,9 @@ GLIBCXX_8.0 {
 _ZNSt3__817__copy_streambufsI*;
 _ZNSt3__821__copy_streambufs_eofI*;
 
+# std::__istream_extract(wistream&, wchar_t*, streamsize)
+_ZNSt3__817__istream_extractIwNS_11char_traitsIwvRNS_13basic_istreamIT_T0_EEPS4_[ilx];
+
 # __gnu_cxx::__atomic_add
 # __gnu_cxx::__exchange_and_add
 _ZN9__gnu_cxx3__812__atomic_addEPV[il][il];
@@ -145,6 +148,7 @@ GLIBCXX_8.0 {
 _ZN11__gnu_debug30_Safe_unordered_container_base13_M_detach_allEv;
 _ZN11__gnu_debug25_Safe_local_iterator_base9_M_attachEPNS_19_Safe_sequence_baseEb;
 _ZN11__gnu_debug25_Safe_local_iterator_base9_M_detachEv;
+_ZN11__gnu_debug25_Safe_local_iterator_base16_M_attach_singleEPNS_19_Safe_sequence_baseEb;
 
 # parallel mode
 _ZN14__gnu_parallel9_Settings3getEv;


Re: [PATCH] Add Missing FSF copyright notes for some x86 intrinsic headers

2020-09-29 Thread H.J. Lu via Gcc-patches
On Mon, Sep 28, 2020 at 9:06 AM H.J. Lu  wrote:
>
> On Mon, Sep 28, 2020 at 9:04 AM Hongyu Wang via Gcc-patches
>  wrote:
> >
> > Hi,
> >
> > Some x86 intrinsic headers are missing FSF copyright notes. This patch adds
> > the missing notes to those headers.
> >
> > OK for master?
> >
> > gcc/ChangeLog:
> >
> > * config/i386/amxbf16intrin.h: Add FSF copyright notes.
> > * config/i386/amxint8intrin.h: Ditto.
> > * config/i386/amxtileintrin.h: Ditto.
> > * config/i386/avx512vp2intersectintrin.h: Ditto.
> > * config/i386/avx512vp2intersectvlintrin.h: Ditto.
> > * config/i386/pconfigintrin.h: Ditto.
> > * config/i386/tsxldtrkintrin.h: Ditto.
> > * config/i386/wbnoinvdintrin.h: Ditto.
> >
>
> I will check it for Hongyu tomorrow if there are no objections.
>

I checked it into master branch and will backport it to release branches.


-- 
H.J.


Re: [PATCH] Add Missing FSF copyright notes for some x86 intrinsic headers

2020-09-28 Thread H.J. Lu via Gcc-patches
On Mon, Sep 28, 2020 at 9:04 AM Hongyu Wang via Gcc-patches
 wrote:
>
> Hi,
>
> Some x86 intrinsic headers are missing FSF copyright notes. This patch adds
> the missing notes to those headers.
>
> OK for master?
>
> gcc/ChangeLog:
>
> * config/i386/amxbf16intrin.h: Add FSF copyright notes.
> * config/i386/amxint8intrin.h: Ditto.
> * config/i386/amxtileintrin.h: Ditto.
> * config/i386/avx512vp2intersectintrin.h: Ditto.
> * config/i386/avx512vp2intersectvlintrin.h: Ditto.
> * config/i386/pconfigintrin.h: Ditto.
> * config/i386/tsxldtrkintrin.h: Ditto.
> * config/i386/wbnoinvdintrin.h: Ditto.
>

I will check it for Hongyu tomorrow if there are no objections.

Thanks.

-- 
H.J.


[PATCH] Add Missing FSF copyright notes for some x86 intrinsic headers

2020-09-28 Thread Hongyu Wang via Gcc-patches
Hi,

Some x86 intrinsic headers are missing FSF copyright notes. This patch adds
the missing notes to those headers.

OK for master?

gcc/ChangeLog:

* config/i386/amxbf16intrin.h: Add FSF copyright notes.
* config/i386/amxint8intrin.h: Ditto.
* config/i386/amxtileintrin.h: Ditto.
* config/i386/avx512vp2intersectintrin.h: Ditto.
* config/i386/avx512vp2intersectvlintrin.h: Ditto.
* config/i386/pconfigintrin.h: Ditto.
* config/i386/tsxldtrkintrin.h: Ditto.
* config/i386/wbnoinvdintrin.h: Ditto.

-- 
Regards,

Hongyu, Wang
From ec6263ba1d74953721dd274c301bdeeeb71d5e77 Mon Sep 17 00:00:00 2001
From: Hongyu Wang 
Date: Mon, 28 Sep 2020 22:22:28 +
Subject: [PATCH] Add missing FSF copyright notes for x86 intrinsic headers.

gcc/ChangeLog:

	* config/i386/amxbf16intrin.h: Add FSF copyright notes.
	* config/i386/amxint8intrin.h: Ditto.
	* config/i386/amxtileintrin.h: Ditto.
	* config/i386/avx512vp2intersectintrin.h: Ditto.
	* config/i386/avx512vp2intersectvlintrin.h: Ditto.
	* config/i386/pconfigintrin.h: Ditto.
	* config/i386/tsxldtrkintrin.h: Ditto.
	* config/i386/wbnoinvdintrin.h: Ditto.
---
 gcc/config/i386/amxbf16intrin.h  | 23 
 gcc/config/i386/amxint8intrin.h  | 23 
 gcc/config/i386/amxtileintrin.h  | 23 
 gcc/config/i386/avx512vp2intersectintrin.h   | 23 
 gcc/config/i386/avx512vp2intersectvlintrin.h | 23 
 gcc/config/i386/pconfigintrin.h  | 23 
 gcc/config/i386/tsxldtrkintrin.h | 23 
 gcc/config/i386/wbnoinvdintrin.h | 23 
 8 files changed, 184 insertions(+)

diff --git a/gcc/config/i386/amxbf16intrin.h b/gcc/config/i386/amxbf16intrin.h
index b1620963944..77cc395e86d 100644
--- a/gcc/config/i386/amxbf16intrin.h
+++ b/gcc/config/i386/amxbf16intrin.h
@@ -1,3 +1,26 @@
+/* Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
 #if !defined _IMMINTRIN_H_INCLUDED
 #error "Never use <amxbf16intrin.h> directly; include <immintrin.h> instead."
 #endif
diff --git a/gcc/config/i386/amxint8intrin.h b/gcc/config/i386/amxint8intrin.h
index 11adc1f1295..f4e410b6647 100644
--- a/gcc/config/i386/amxint8intrin.h
+++ b/gcc/config/i386/amxint8intrin.h
@@ -1,3 +1,26 @@
+/* Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
 #if !defined _IMMINTRIN_H_INCLUDED
 #error "Never use <amxint8intrin.h> directly; include <immintrin.h> instead."
 #endif
diff --git a/gcc/config/i386/amxtileintrin.h b/gcc/config/i386/amxtileintrin.h
index e78e5c04909..41fb9a5d86a 100644
--- a/gcc/config/i386/amxtileintrin.h
+++ b/gcc/config/i386/amxtileintrin.h
@@ -1,3 +1,26 @@
+/* Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   b

Re: [PATCH] Add missing vn_reference_t::punned initialization

2020-08-24 Thread Richard Biener via Gcc-patches
On Thu, Aug 13, 2020 at 2:49 PM Martin Liška  wrote:
>
> As mentioned in the PR, we miss one initialization of ::punned
> in vn_reference_lookup_call.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

OK.

Thanks,
Richard.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> PR tree-optimization/96597
> * tree-ssa-sccvn.c (vn_reference_lookup_call): Add missing
> initialization of ::punned.
> (vn_reference_insert): Use consistently false instead of 0.
> (vn_reference_insert_pieces): Likewise.
> ---
>   gcc/tree-ssa-sccvn.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
> index 934ae40670d..789d3664db5 100644
> --- a/gcc/tree-ssa-sccvn.c
> +++ b/gcc/tree-ssa-sccvn.c
> @@ -3578,6 +3578,7 @@ vn_reference_lookup_call (gcall *call, vn_reference_t 
> *vnresult,
> vr->vuse = vuse ? SSA_VAL (vuse) : NULL_TREE;
> vr->operands = valueize_shared_reference_ops_from_call (call);
> vr->type = gimple_expr_type (call);
> +  vr->punned = false;
> vr->set = 0;
> vr->base_set = 0;
> vr->hashcode = vn_reference_compute_hash (vr);
> @@ -3601,7 +3602,7 @@ vn_reference_insert (tree op, tree result, tree vuse, 
> tree vdef)
> vr1->vuse = vuse_ssa_val (vuse);
> vr1->operands = valueize_shared_reference_ops_from_ref (op, &tem).copy ();
> vr1->type = TREE_TYPE (op);
> -  vr1->punned = 0;
> +  vr1->punned = false;
> ao_ref op_ref;
> ao_ref_init (&op_ref, op);
> vr1->set = ao_ref_alias_set (&op_ref);
> @@ -3661,7 +3662,7 @@ vn_reference_insert_pieces (tree vuse, alias_set_type 
> set,
> vr1->vuse = vuse_ssa_val (vuse);
> vr1->operands = valueize_refs (operands);
> vr1->type = type;
> -  vr1->punned = 0;
> +  vr1->punned = false;
> vr1->set = set;
> vr1->base_set = base_set;
> vr1->hashcode = vn_reference_compute_hash (vr1);
> --
> 2.28.0
>


[PATCH] Add missing vn_reference_t::punned initialization

2020-08-13 Thread Martin Liška

As mentioned in the PR, we miss one initialization of ::punned
in vn_reference_lookup_call.
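
To illustrate the kind of problem this addresses, here is a minimal sketch with a hypothetical, heavily simplified record type (ref_record and init_lookup_key are made-up names, not the real vn_reference_s or its callers). The point is only that a routine filling a scratch record should assign every field it later relies on.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for a value-numbering
// reference record; the real structure has many more fields.
struct ref_record
{
  unsigned hashcode;
  int set;
  int base_set;
  bool punned;
};

// Fill a caller-provided scratch record for a lookup.  Every field is
// assigned, so nothing left over in the scratch storage can leak in.
void init_lookup_key (ref_record *vr, unsigned hashcode)
{
  vr->hashcode = hashcode;
  vr->set = 0;
  vr->base_set = 0;
  vr->punned = false;   // the kind of assignment the patch adds
}
```

If the last assignment were dropped, a record that previously held punned == true would keep that stale value after being reused.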

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

PR tree-optimization/96597
* tree-ssa-sccvn.c (vn_reference_lookup_call): Add missing
initialization of ::punned.
(vn_reference_insert): Use consistently false instead of 0.
(vn_reference_insert_pieces): Likewise.
---
 gcc/tree-ssa-sccvn.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 934ae40670d..789d3664db5 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -3578,6 +3578,7 @@ vn_reference_lookup_call (gcall *call, vn_reference_t 
*vnresult,
   vr->vuse = vuse ? SSA_VAL (vuse) : NULL_TREE;
   vr->operands = valueize_shared_reference_ops_from_call (call);
   vr->type = gimple_expr_type (call);
+  vr->punned = false;
   vr->set = 0;
   vr->base_set = 0;
   vr->hashcode = vn_reference_compute_hash (vr);
@@ -3601,7 +3602,7 @@ vn_reference_insert (tree op, tree result, tree vuse, 
tree vdef)
   vr1->vuse = vuse_ssa_val (vuse);
   vr1->operands = valueize_shared_reference_ops_from_ref (op, &tem).copy ();
   vr1->type = TREE_TYPE (op);
-  vr1->punned = 0;
+  vr1->punned = false;
   ao_ref op_ref;
   ao_ref_init (&op_ref, op);
   vr1->set = ao_ref_alias_set (&op_ref);
@@ -3661,7 +3662,7 @@ vn_reference_insert_pieces (tree vuse, alias_set_type set,
   vr1->vuse = vuse_ssa_val (vuse);
   vr1->operands = valueize_refs (operands);
   vr1->type = type;
-  vr1->punned = 0;
+  vr1->punned = false;
   vr1->set = set;
   vr1->base_set = base_set;
   vr1->hashcode = vn_reference_compute_hash (vr1);
--
2.28.0



Re: [PATCH] Add missing check for gassign.

2020-06-18 Thread Richard Biener via Gcc-patches
On Thu, Jun 18, 2020 at 9:42 AM Martin Liška  wrote:
>
> Hi.
>
> We should check for gassign before doing gimple_assign_rhs_code and friends.
>
> Ready to be installed after proper testing?

OK.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * tree-vect-generic.c (expand_vector_condition): Check
> for gassign before inspecting RHS.
> ---
>   gcc/tree-vect-generic.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
> index fb955bbf3d2..83d399a7898 100644
> --- a/gcc/tree-vect-generic.c
> +++ b/gcc/tree-vect-generic.c
> @@ -957,8 +957,9 @@ expand_vector_condition (gimple_stmt_iterator *gsi)
>
> if (code == SSA_NAME)
>   {
> -  gimple *assign = SSA_NAME_DEF_STMT (a);
> -  if (TREE_CODE_CLASS (gimple_assign_rhs_code (assign)) == 
> tcc_comparison)
> +  gassign *assign = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (a));
> +  if (assign != NULL
> + && TREE_CODE_CLASS (gimple_assign_rhs_code (assign)) == 
> tcc_comparison)
> {
>   a_is_comparison = true;
>   a1 = gimple_assign_rhs1 (assign);
> --
> 2.27.0
>


[PATCH] Add missing check for gassign.

2020-06-18 Thread Martin Liška

Hi.

We should check for gassign before doing gimple_assign_rhs_code and friends.
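
The pattern is "downcast first, test for null, then inspect". A minimal sketch follows, using standard dynamic_cast on a hypothetical miniature hierarchy as a stand-in for GCC's dyn_cast on gimple statements; none of the type or function names below are GCC's.

```cpp
#include <cassert>

// Hypothetical miniature statement hierarchy, not GCC's gimple classes.
struct stmt { virtual ~stmt () = default; };
enum rhs_code { PLUS, LESS_THAN };
struct assign_stmt : stmt
{
  rhs_code code;
  explicit assign_stmt (rhs_code c) : code (c) {}
};
struct call_stmt : stmt {};

// Only inspect the RHS code once we know DEF really is an assignment.
bool defined_by_comparison (stmt *def)
{
  assign_stmt *assign = dynamic_cast<assign_stmt *> (def);
  return assign != nullptr && assign->code == LESS_THAN;
}
```

A defining statement that is a call simply yields false here instead of being treated as an assignment.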

Ready to be installed after proper testing?

Thanks,
Martin

gcc/ChangeLog:

* tree-vect-generic.c (expand_vector_condition): Check
for gassign before inspecting RHS.
---
 gcc/tree-vect-generic.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index fb955bbf3d2..83d399a7898 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -957,8 +957,9 @@ expand_vector_condition (gimple_stmt_iterator *gsi)
 
   if (code == SSA_NAME)

 {
-  gimple *assign = SSA_NAME_DEF_STMT (a);
-  if (TREE_CODE_CLASS (gimple_assign_rhs_code (assign)) == tcc_comparison)
+  gassign *assign = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (a));
+  if (assign != NULL
+ && TREE_CODE_CLASS (gimple_assign_rhs_code (assign)) == 
tcc_comparison)
{
  a_is_comparison = true;
  a1 = gimple_assign_rhs1 (assign);
--
2.27.0



Re: [PATCH] Add missing store in emission of asan_stack_free.

2020-06-10 Thread Jakub Jelinek via Gcc-patches
On Wed, Jun 10, 2020 at 01:14:59PM +0200, Martin Liška wrote:
> From 4d2e0b1e87b08ec21fd82144f00d364687030706 Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Tue, 19 May 2020 16:57:56 +0200
> Subject: [PATCH] Add missing store in emission of asan_stack_free.
> 
> gcc/ChangeLog:
> 
> 2020-05-19  Martin Liska  
> 
>   PR sanitizer/94910
>   * asan.c (asan_emit_stack_protection): Emit
>   also **SavedFlagPtr(FakeStack) = 0 in order to release
>   a stack frame.

Please adjust the ChangeLog too.

Ok with that change.

Jakub



Re: [PATCH] Add missing store in emission of asan_stack_free.

2020-06-10 Thread Martin Liška

On 6/10/20 12:08 PM, Jakub Jelinek wrote:

On Wed, Jun 10, 2020 at 11:49:01AM +0200, Martin Liška wrote:

-   store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
-BITS_PER_UNIT, true, RETURN_BEGIN);
+   {
+ /* Emit:
+  memset(ShadowBase, kAsanStackAfterReturnMagic, ShadowSize);
+  **SavedFlagPtr(FakeStack) = 0


SavedFlagPtr has two arguments, doesn't it?


Good point, I copied that from 
llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
which has it wrong. Fixed that.




+ */
+ store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
+  BITS_PER_UNIT, true, RETURN_BEGIN);
+
+ unsigned HOST_WIDE_INT offset
+   = (1 << (use_after_return_class + 6));
+ offset -= GET_MODE_SIZE (ptr_mode);


So, mem here is a MEM into which we stored ASAN_STACK_RETIRED_MAGIC.


Ok, we should rather start from 'base'.




+ mem = adjust_address (mem, ptr_mode, offset);


This adds offset to it and changes mode to ptr_mode.  So,
mem is now *(ptr_mode)(&old_mem + offset)


+ rtx addr = gen_reg_rtx (ptr_mode);
+ emit_move_insn (addr, mem);


We load that value.


+ mem = gen_rtx_MEM (ptr_mode, addr);
+ mem = adjust_address (mem, QImode, 0);


And here I'm lost why you do that.  If you want to store a single
byte into what it points to, then why don't you just
mem = gen_rtx_MEM (QImode, addr);
instead of the above two lines?


Because I'm not very familiar with RTL ;)


adjust_address will return a MEM like the above, with offset not adjusted
(as the addition is 0) and mode changed to QImode, but there is no reason
not to create it already in QImode.


All right.
What about this?
Martin




+ emit_move_insn (mem, const0_rtx);
+   }
else if (use_after_return_class >= 5
   || !set_storage_via_setmem (shadow_mem,
   GEN_INT (sz),
--
2.26.2




Jakub



From 4d2e0b1e87b08ec21fd82144f00d364687030706 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 19 May 2020 16:57:56 +0200
Subject: [PATCH] Add missing store in emission of asan_stack_free.

gcc/ChangeLog:

2020-05-19  Martin Liska  

	PR sanitizer/94910
	* asan.c (asan_emit_stack_protection): Emit
	also **SavedFlagPtr(FakeStack) = 0 in order to release
	a stack frame.
---
 gcc/asan.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/gcc/asan.c b/gcc/asan.c
index c9872f1b007..e015fa3ec9b 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1598,8 +1598,24 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   if (use_after_return_class < 5
 	  && can_store_by_pieces (sz, builtin_memset_read_str, &c,
   BITS_PER_UNIT, true))
-	store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
-			 BITS_PER_UNIT, true, RETURN_BEGIN);
+	{
+	  /* Emit:
+	   memset(ShadowBase, kAsanStackAfterReturnMagic, ShadowSize);
+	   **SavedFlagPtr(FakeStack, class_id) = 0
+	  */
+	  store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
+			   BITS_PER_UNIT, true, RETURN_BEGIN);
+
+	  unsigned HOST_WIDE_INT offset
+	= (1 << (use_after_return_class + 6));
+	  offset -= GET_MODE_SIZE (ptr_mode);
+	  mem = gen_rtx_MEM (ptr_mode, base);
+	  mem = adjust_address (mem, ptr_mode, offset);
+	  rtx addr = gen_reg_rtx (ptr_mode);
+	  emit_move_insn (addr, mem);
+	  mem = gen_rtx_MEM (QImode, addr);
+	  emit_move_insn (mem, const0_rtx);
+	}
   else if (use_after_return_class >= 5
 	   || !set_storage_via_setmem (shadow_mem,
 	   GEN_INT (sz),
-- 
2.26.2



Re: [PATCH] Add missing store in emission of asan_stack_free.

2020-06-10 Thread Jakub Jelinek via Gcc-patches
On Wed, Jun 10, 2020 at 11:49:01AM +0200, Martin Liška wrote:
> - store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
> -  BITS_PER_UNIT, true, RETURN_BEGIN);
> + {
> +   /* Emit:
> +memset(ShadowBase, kAsanStackAfterReturnMagic, ShadowSize);
> +**SavedFlagPtr(FakeStack) = 0

SavedFlagPtr has two arguments, doesn't it?

> +   */
> +   store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
> +BITS_PER_UNIT, true, RETURN_BEGIN);
> +
> +   unsigned HOST_WIDE_INT offset
> + = (1 << (use_after_return_class + 6));
> +   offset -= GET_MODE_SIZE (ptr_mode);

So, mem here is a MEM into which we stored ASAN_STACK_RETIRED_MAGIC.

> +   mem = adjust_address (mem, ptr_mode, offset);

This adds offset to it and changes mode to ptr_mode.  So,
mem is now *(ptr_mode)(&old_mem + offset)

> +   rtx addr = gen_reg_rtx (ptr_mode);
> +   emit_move_insn (addr, mem);

We load that value.

> +   mem = gen_rtx_MEM (ptr_mode, addr);
> +   mem = adjust_address (mem, QImode, 0);

And here I'm lost why you do that.  If you want to store a single
byte into what it points to, then why don't you just
mem = gen_rtx_MEM (QImode, addr);
instead of the above two lines?
adjust_address will return a MEM like the above, with offset not adjusted
(as the addition is 0) and mode changed to QImode, but there is no reason
not to create it already in QImode.

> +   emit_move_insn (mem, const0_rtx);
> + }
>else if (use_after_return_class >= 5
>  || !set_storage_via_setmem (shadow_mem,
>  GEN_INT (sz),
> -- 
> 2.26.2
> 


Jakub



Re: [PATCH] Add missing store in emission of asan_stack_free.

2020-06-10 Thread Martin Liška

On 6/10/20 10:42 AM, Jakub Jelinek wrote:

E.g. we just shouldn't reuse
MEMs (even after adjusting them) from different indirection levels because
we risk some attributes (alias set, MEM_EXPR, whatever else) will stay
around from the different indirection level.


All right, what about the updated patch? I must confess that RTL instruction
emission is not my strength.

Martin
From 16e46a532c059930887bc30f82c3054a75a5a56d Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 19 May 2020 16:57:56 +0200
Subject: [PATCH] Add missing store in emission of asan_stack_free.

gcc/ChangeLog:

2020-05-19  Martin Liska  

	PR sanitizer/94910
	* asan.c (asan_emit_stack_protection): Emit
	also **SavedFlagPtr(FakeStack) = 0 in order to release
	a stack frame.
---
 gcc/asan.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/gcc/asan.c b/gcc/asan.c
index c9872f1b007..232341f5c4b 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1598,8 +1598,24 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   if (use_after_return_class < 5
 	  && can_store_by_pieces (sz, builtin_memset_read_str, &c,
   BITS_PER_UNIT, true))
-	store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
-			 BITS_PER_UNIT, true, RETURN_BEGIN);
+	{
+	  /* Emit:
+	   memset(ShadowBase, kAsanStackAfterReturnMagic, ShadowSize);
+	   **SavedFlagPtr(FakeStack) = 0
+	  */
+	  store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
+			   BITS_PER_UNIT, true, RETURN_BEGIN);
+
+	  unsigned HOST_WIDE_INT offset
+	= (1 << (use_after_return_class + 6));
+	  offset -= GET_MODE_SIZE (ptr_mode);
+	  mem = adjust_address (mem, ptr_mode, offset);
+	  rtx addr = gen_reg_rtx (ptr_mode);
+	  emit_move_insn (addr, mem);
+	  mem = gen_rtx_MEM (ptr_mode, addr);
+	  mem = adjust_address (mem, QImode, 0);
+	  emit_move_insn (mem, const0_rtx);
+	}
   else if (use_after_return_class >= 5
 	   || !set_storage_via_setmem (shadow_mem,
 	   GEN_INT (sz),
-- 
2.26.2



Re: [PATCH] Add missing store in emission of asan_stack_free.

2020-06-10 Thread Jakub Jelinek via Gcc-patches
On Wed, Jun 10, 2020 at 10:24:59AM +0200, Martin Liška wrote:
> > > This doesn't look correct to me.
> > > I'd think the first adjust_address should be
> > >   mem = adjust_address (mem, ptr_mode, offset);
> > > which will give you a MEM with ptr_mode which has SavedFlagPtr(FakeStack)
> > > address, i.e. *SavedFlagPtr(FakeStack).
> > > Next, you want to load that into some temporary, so e.g.
> > >   rtx addr = gen_reg_rtx (ptr_mode);
> > >   emit_move_insn (addr, mem);
> > > next you need to convert that ptr_mode to Pmode if needed, so something 
> > > like
> > >   addr = convert_memory_address (Pmode, addr);
> > > and finally:
> > >   mem = gen_rtx_MEM (QImode, addr);
> > >   emit_move_insn (mem, const0_rtx);
> > > Completely untested.
> > 
> > This is not correct. With your suggestion I have:
> > 
> > int foo(int index)
> > {
> >    int a[100];
> >    return a[index];
> > }
> > 
> > $ diff -u before.s after.s
> > --- before.s    2020-06-01 15:15:22.634337654 +0200
> > +++ after.s    2020-06-01 15:16:32.205711511 +0200
> > @@ -81,8 +81,7 @@
> >   movq    %rdi, 2147450920(%rax)
> >   movq    %rsi, 2147450928(%rax)
> >   movq    %rdi, 2147450936(%rax)
> > -    movq    504(%rbx), %rax
> > -    movb    $0, (%rax)
> > +    movb    $0, 504(%rbx)
> >   jmp    .L3
> >   .L2:
> >   movq    $0, 2147450880(%rax)
> > 
> > One level of dereference is missing. Looking at clang:
> > 
> >  movq    %rsi, 2147450928(%rax)
> >  movq    %rdi, 2147450936(%rax)
> >  movq    504(%rbx), %rax
> >  movb    $0, (%rax)
> >  jmp    .L3
> > .L2:
> > 
> > It does the same as my patch.
> 
> Jakub?

Even if so, just add that another level of indirection where it belongs,
but as I said, what you posted didn't feel right.  E.g. we just shouldn't reuse
MEMs (even after adjusting them) from different indirection levels because
we risk some attributes (alias set, MEM_EXPR, whatever else) will stay
around from the different indirection level.
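
To make the two levels of indirection concrete, here is a minimal sketch in plain C++, not RTL. It assumes, as in the patch comment, that the last pointer-sized slot of the frame holds the address of a flag byte. The function name and the byte-buffer model of the frame are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model of releasing a frame of FRAME_SIZE bytes: the last
// pointer-sized slot of the frame holds the address of a flag byte, and
// that flag byte is what has to be cleared.
void release_frame (unsigned char *frame, std::size_t frame_size)
{
  unsigned char *flag;
  // First level: load the saved flag pointer from the end of the frame.
  std::memcpy (&flag, frame + frame_size - sizeof flag, sizeof flag);
  // Second level: store a zero byte through that pointer.
  *flag = 0;
}
```

Storing the zero byte directly into the slot would correspond to the single movb into 504(%rbx) shown in the quoted diff, which leaves the flag itself untouched.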

Jakub



Re: [PATCH] Add missing store in emission of asan_stack_free.

2020-06-10 Thread Martin Liška

On 6/1/20 3:18 PM, Martin Liška wrote:

On 6/1/20 2:52 PM, Jakub Jelinek wrote:

On Mon, Jun 01, 2020 at 02:28:51PM +0200, Martin Liška wrote:

--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1598,8 +1598,24 @@ asan_emit_stack_protection (rtx base, rtx pbase, 
unsigned int alignb,
    if (use_after_return_class < 5
    && can_store_by_pieces (sz, builtin_memset_read_str, &c,
    BITS_PER_UNIT, true))
-    store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
- BITS_PER_UNIT, true, RETURN_BEGIN);
+    {
+  /* Emit:
+   memset(ShadowBase, kAsanStackAfterReturnMagic, ShadowSize);
+   **SavedFlagPtr(FakeStack) = 0
+  */
+  store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
+   BITS_PER_UNIT, true, RETURN_BEGIN);
+
+  unsigned HOST_WIDE_INT offset
+    = (1 << (use_after_return_class + 6));
+  offset -= GET_MODE_SIZE (ptr_mode);
+  mem = adjust_address (mem, Pmode, offset);
+  mem = gen_rtx_MEM (ptr_mode, mem);
+  rtx tmp_reg = gen_reg_rtx (Pmode);
+  emit_move_insn (tmp_reg, mem);
+  mem = adjust_address (mem, QImode, 0);
+  emit_move_insn (mem, const0_rtx);


This doesn't look correct to me.
I'd think the first adjust_address should be
  mem = adjust_address (mem, ptr_mode, offset);
which will give you a MEM with ptr_mode which has SavedFlagPtr(FakeStack)
address, i.e. *SavedFlagPtr(FakeStack).
Next, you want to load that into some temporary, so e.g.
  rtx addr = gen_reg_rtx (ptr_mode);
  emit_move_insn (addr, mem);
next you need to convert that ptr_mode to Pmode if needed, so something like
  addr = convert_memory_address (Pmode, addr);
and finally:
  mem = gen_rtx_MEM (QImode, addr);
  emit_move_insn (mem, const0_rtx);
Completely untested.


This is not correct. With your suggestion I have:

int foo(int index)
{
   int a[100];
   return a[index];
}

$ diff -u before.s after.s
--- before.s    2020-06-01 15:15:22.634337654 +0200
+++ after.s    2020-06-01 15:16:32.205711511 +0200
@@ -81,8 +81,7 @@
  movq    %rdi, 2147450920(%rax)
  movq    %rsi, 2147450928(%rax)
  movq    %rdi, 2147450936(%rax)
-    movq    504(%rbx), %rax
-    movb    $0, (%rax)
+    movb    $0, 504(%rbx)
  jmp    .L3
  .L2:
  movq    $0, 2147450880(%rax)

One level of dereference is missing. Looking at clang:

 movq    %rsi, 2147450928(%rax)
 movq    %rdi, 2147450936(%rax)
 movq    504(%rbx), %rax
 movb    $0, (%rax)
 jmp    .L3
.L2:

It does the same as my patch.
Martin



Jakub





Jakub?

Thanks,
Martin


Re: [PATCH] Add missing store in emission of asan_stack_free.

2020-06-01 Thread Martin Liška

On 6/1/20 2:52 PM, Jakub Jelinek wrote:

On Mon, Jun 01, 2020 at 02:28:51PM +0200, Martin Liška wrote:

--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1598,8 +1598,24 @@ asan_emit_stack_protection (rtx base, rtx pbase, 
unsigned int alignb,
if (use_after_return_class < 5
  && can_store_by_pieces (sz, builtin_memset_read_str, &c,
  BITS_PER_UNIT, true))
-   store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
-BITS_PER_UNIT, true, RETURN_BEGIN);
+   {
+ /* Emit:
+  memset(ShadowBase, kAsanStackAfterReturnMagic, ShadowSize);
+  **SavedFlagPtr(FakeStack) = 0
+ */
+ store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
+  BITS_PER_UNIT, true, RETURN_BEGIN);
+
+ unsigned HOST_WIDE_INT offset
+   = (1 << (use_after_return_class + 6));
+ offset -= GET_MODE_SIZE (ptr_mode);
+ mem = adjust_address (mem, Pmode, offset);
+ mem = gen_rtx_MEM (ptr_mode, mem);
+ rtx tmp_reg = gen_reg_rtx (Pmode);
+ emit_move_insn (tmp_reg, mem);
+ mem = adjust_address (mem, QImode, 0);
+ emit_move_insn (mem, const0_rtx);


This doesn't look correct to me.
I'd think the first adjust_address should be
  mem = adjust_address (mem, ptr_mode, offset);
which will give you a MEM with ptr_mode which has SavedFlagPtr(FakeStack)
address, i.e. *SavedFlagPtr(FakeStack).
Next, you want to load that into some temporary, so e.g.
  rtx addr = gen_reg_rtx (ptr_mode);
  emit_move_insn (addr, mem);
next you need to convert that ptr_mode to Pmode if needed, so something like
  addr = convert_memory_address (Pmode, addr);
and finally:
  mem = gen_rtx_MEM (QImode, addr);
  emit_move_insn (mem, const0_rtx);
Completely untested.


This is not correct. With your suggestion I have:

int foo(int index)
{
  int a[100];
  return a[index];
}

$ diff -u before.s after.s
--- before.s    2020-06-01 15:15:22.634337654 +0200
+++ after.s    2020-06-01 15:16:32.205711511 +0200
@@ -81,8 +81,7 @@
     movq    %rdi, 2147450920(%rax)
     movq    %rsi, 2147450928(%rax)
     movq    %rdi, 2147450936(%rax)
-    movq    504(%rbx), %rax
-    movb    $0, (%rax)
+    movb    $0, 504(%rbx)
     jmp    .L3
 .L2:
     movq    $0, 2147450880(%rax)

One level of dereference is missing. Looking at clang:

    movq    %rsi, 2147450928(%rax)
    movq    %rdi, 2147450936(%rax)
    movq    504(%rbx), %rax
    movb    $0, (%rax)
    jmp    .L3
.L2:

It does the same as my patch.
Martin
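
For reference, the offset arithmetic from the patch can be checked in isolation. This is a minimal sketch with a hypothetical helper name. That the 504 in the assembly above corresponds to class 3 with 8-byte pointers is my reading of the numbers, not something stated in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Offset of the saved flag pointer within a frame of the given class,
// following the arithmetic in the patch:
//   (1 << (class + 6)) - pointer size.
std::uint64_t saved_flag_offset (int class_id, std::uint64_t ptr_size)
{
  return (std::uint64_t{1} << (class_id + 6)) - ptr_size;
}
```

With class_id 3 and ptr_size 8 this gives 512 - 8 = 504.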



Jakub





Re: [PATCH] Add missing store in emission of asan_stack_free.

2020-06-01 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 01, 2020 at 02:28:51PM +0200, Martin Liška wrote:
> --- a/gcc/asan.c
> +++ b/gcc/asan.c
> @@ -1598,8 +1598,24 @@ asan_emit_stack_protection (rtx base, rtx pbase, 
> unsigned int alignb,
>if (use_after_return_class < 5
> && can_store_by_pieces (sz, builtin_memset_read_str, &c,
> BITS_PER_UNIT, true))
> - store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
> -  BITS_PER_UNIT, true, RETURN_BEGIN);
> + {
> +   /* Emit:
> +memset(ShadowBase, kAsanStackAfterReturnMagic, ShadowSize);
> +**SavedFlagPtr(FakeStack) = 0
> +   */
> +   store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
> +BITS_PER_UNIT, true, RETURN_BEGIN);
> +
> +   unsigned HOST_WIDE_INT offset
> + = (1 << (use_after_return_class + 6));
> +   offset -= GET_MODE_SIZE (ptr_mode);
> +   mem = adjust_address (mem, Pmode, offset);
> +   mem = gen_rtx_MEM (ptr_mode, mem);
> +   rtx tmp_reg = gen_reg_rtx (Pmode);
> +   emit_move_insn (tmp_reg, mem);
> +   mem = adjust_address (mem, QImode, 0);
> +   emit_move_insn (mem, const0_rtx);

This doesn't look correct to me.
I'd think the first adjust_address should be
  mem = adjust_address (mem, ptr_mode, offset);
which will give you a MEM with ptr_mode which has SavedFlagPtr(FakeStack)
address, i.e. *SavedFlagPtr(FakeStack).
Next, you want to load that into some temporary, so e.g.
  rtx addr = gen_reg_rtx (ptr_mode);
  emit_move_insn (addr, mem);
next you need to convert that ptr_mode to Pmode if needed, so something like
  addr = convert_memory_address (Pmode, addr);
and finally:
  mem = gen_rtx_MEM (QImode, addr);
  emit_move_insn (mem, const0_rtx);
Completely untested.

Jakub



Re: [PATCH] Add missing store in emission of asan_stack_free.

2020-06-01 Thread Martin Liška

On 5/20/20 1:03 PM, Franz Sirl wrote:

Am 2020-05-19 um 21:05 schrieb Martin Liška:

Hi.

For smaller stacks, asan_emit_stack_protection emits the asan_stack_free
sequence directly. That's fine, but we're missing the piece that marks the
stack as released, so we run out of pre-allocated stacks. I also included some
stack-related constants that were used in asan.c.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2020-05-19  Martin Liska  

 PR sanitizer/94910
 * asan.c (asan_emit_stack_protection): Emit
 also **SavedFlagPtr(FakeStack) = 0 in order to release
 a stack frame.
 * asan.h (ASAN_MIN_STACK_FRAME_SIZE_LOG): New.
 (ASAN_MAX_STACK_FRAME_SIZE_LOG): Likewise.
 (ASAN_MIN_STACK_FRAME_SIZE): Likewise.
 (ASAN_MAX_STACK_FRAME_SIZE): Likewise.
---
  gcc/asan.c | 26 ++
  gcc/asan.h |  8 
  2 files changed, 30 insertions(+), 4 deletions(-)




 >-  if (asan_frame_size > 32 && asan_frame_size <= 65536 && pbase
 >+  if (asan_frame_size >= ASAN_MIN_STACK_FRAME_SIZE

Hi,

is the change from > to >= and from 32 to 64 for ASAN_MIN_STACK_FRAME_SIZE 
intentional? Just asking because it doesn't look obvious from Changelog or patch.
Also a few lines below the "5" in

   use_after_return_class = floor_log2 (asan_frame_size - 1) - 5;

looks like it may be related to ASAN_MIN_STACK_FRAME_SIZE_LOG.


Hello.

Thank you very much for the useful feedback. I did get the refactoring
wrong.

I suggest limiting the patch to the emission change in asan_emit_stack_protection.
Tested locally with asan.exp file.

Ready for master?
Thanks,
Martin



regards,
Franz


>From 5d0c64b2f4028af3ed575934ecc0c3378cca3de1 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 19 May 2020 16:57:56 +0200
Subject: [PATCH] Add missing store in emission of asan_stack_free.

gcc/ChangeLog:

2020-05-19  Martin Liska  

	PR sanitizer/94910
	* asan.c (asan_emit_stack_protection): Emit
	also **SavedFlagPtr(FakeStack) = 0 in order to release
	a stack frame.
---
 gcc/asan.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/gcc/asan.c b/gcc/asan.c
index c9872f1b007..e8d2a25ff79 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1598,8 +1598,24 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   if (use_after_return_class < 5
 	  && can_store_by_pieces (sz, builtin_memset_read_str, &c,
   BITS_PER_UNIT, true))
-	store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
-			 BITS_PER_UNIT, true, RETURN_BEGIN);
+	{
+	  /* Emit:
+	   memset(ShadowBase, kAsanStackAfterReturnMagic, ShadowSize);
+	   **SavedFlagPtr(FakeStack) = 0
+	  */
+	  store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
+			   BITS_PER_UNIT, true, RETURN_BEGIN);
+
+	  unsigned HOST_WIDE_INT offset
+	= (1 << (use_after_return_class + 6));
+	  offset -= GET_MODE_SIZE (ptr_mode);
+	  mem = adjust_address (mem, Pmode, offset);
+	  mem = gen_rtx_MEM (ptr_mode, mem);
+	  rtx tmp_reg = gen_reg_rtx (Pmode);
+	  emit_move_insn (tmp_reg, mem);
+	  mem = adjust_address (mem, QImode, 0);
+	  emit_move_insn (mem, const0_rtx);
+	}
   else if (use_after_return_class >= 5
 	   || !set_storage_via_setmem (shadow_mem,
 	   GEN_INT (sz),
-- 
2.26.2



Re: [PATCH] Add missing expander for vector float_extend and float_truncate [PR target/95125]

2020-05-24 Thread Uros Bizjak via Gcc-patches
On Sun, May 24, 2020 at 9:20 AM Hongtao Liu  wrote:
>
>   Bootstrap is ok, regression test on i386/x86-64 backend is ok.
>
> gcc/ChangeLog
> PR target/95125
> * config/i386/sse.md (sf2dfmode_lower): New mode attribute.
> (trunc2) New expander.
> (extend2): Ditto.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/pr95125-avx.c: New test.
> * gcc.target/i386/pr95125-avx512f.c: Ditto.

OK.

Thanks,
Uros.


[PATCH] Add missing expander for vector float_extend and float_truncate [PR target/95125]

2020-05-24 Thread Hongtao Liu via Gcc-patches
  Bootstrap is ok, regression test on i386/x86-64 backend is ok.

gcc/ChangeLog
PR target/95125
* config/i386/sse.md (sf2dfmode_lower): New mode attribute.
(trunc2) New expander.
(extend2): Ditto.

gcc/testsuite/ChangeLog
* gcc.target/i386/pr95125-avx.c: New test.
* gcc.target/i386/pr95125-avx512f.c: Ditto.

--
BR,
Hongtao


0001-Add-missing-expander-for-vector-float_extend-and-flo.patch
Description: Binary data


Re: [PATCH] Add missing store in emission of asan_stack_free.

2020-05-20 Thread Franz Sirl

On 2020-05-19 at 21:05, Martin Liška wrote:

Hi.

We make direct emission for asan_emit_stack_protection for smaller stacks.
That's fine but we're missing the piece that marks the stack as released
and we run out of pre-allocated stacks. I also included some stack-related
constants that were used in asan.c.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2020-05-19  Martin Liska  

 PR sanitizer/94910
 * asan.c (asan_emit_stack_protection): Emit
 also **SavedFlagPtr(FakeStack) = 0 in order to release
 a stack frame.
 * asan.h (ASAN_MIN_STACK_FRAME_SIZE_LOG): New.
 (ASAN_MAX_STACK_FRAME_SIZE_LOG): Likewise.
 (ASAN_MIN_STACK_FRAME_SIZE): Likewise.
 (ASAN_MAX_STACK_FRAME_SIZE): Likewise.
---
  gcc/asan.c | 26 ++
  gcc/asan.h |  8 
  2 files changed, 30 insertions(+), 4 deletions(-)




>-  if (asan_frame_size > 32 && asan_frame_size <= 65536 && pbase
>+  if (asan_frame_size >= ASAN_MIN_STACK_FRAME_SIZE

Hi,

is the change from > to >= and from 32 to 64 for 
ASAN_MIN_STACK_FRAME_SIZE intentional? Just asking because it doesn't 
look obvious from Changelog or patch.

Also a few lines below the "5" in

  use_after_return_class = floor_log2 (asan_frame_size - 1) - 5;

looks like it may be related to ASAN_MIN_STACK_FRAME_SIZE_LOG.

regards,
Franz
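
The arithmetic behind both questions can be checked with a small stand-alone
C sketch.  It assumes only what the quoted hunks show: the class is
floor_log2 (asan_frame_size - 1) - 5, the frame size of a class is
1 << (class + 6), and the bounds are 32/65536 before and 64/65536 after the
patch.  The helper names (floor_log2_u, old_check, new_check, ...) are
hypothetical.

```c
#include <assert.h>

/* Index of the highest set bit; a stand-in for GCC's floor_log2.  */
static int
floor_log2_u (unsigned long x)
{
  assert (x != 0);
  int n = -1;
  while (x)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Class computation as written in asan_emit_stack_protection.  */
static int
use_after_return_class (unsigned long frame_size)
{
  return floor_log2_u (frame_size - 1) - 5;
}

/* Size of a fake frame of the given class: 64 bytes for class 0.  */
static unsigned long
class_frame_size (int class_id)
{
  return 1UL << (class_id + 6);
}

/* Condition before the patch.  */
static int
old_check (unsigned long s)
{
  return s > 32 && s <= 65536;
}

/* Condition after the patch, with ASAN_MIN_STACK_FRAME_SIZE == 64.  */
static int
new_check (unsigned long s)
{
  return s >= 64 && s <= 65536;
}
```

Under these assumptions sizes 33..64 all map to class 0, whose fake frame is
64 bytes, so the "- 5" is the class-0 log (6) minus one.  The new condition
drops sizes 33..63, which the old condition accepted.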


[PATCH] Add missing store in emission of asan_stack_free.

2020-05-19 Thread Martin Liška

Hi.

We make direct emission for asan_emit_stack_protection for smaller stacks.
That's fine but we're missing the piece that marks the stack as released
and we run out of pre-allocated stacks. I also included some stack-related
constants that were used in asan.c.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2020-05-19  Martin Liska  

PR sanitizer/94910
* asan.c (asan_emit_stack_protection): Emit
also **SavedFlagPtr(FakeStack) = 0 in order to release
a stack frame.
* asan.h (ASAN_MIN_STACK_FRAME_SIZE_LOG): New.
(ASAN_MAX_STACK_FRAME_SIZE_LOG): Likewise.
(ASAN_MIN_STACK_FRAME_SIZE): Likewise.
(ASAN_MAX_STACK_FRAME_SIZE): Likewise.
---
 gcc/asan.c | 26 ++
 gcc/asan.h |  8 
 2 files changed, 30 insertions(+), 4 deletions(-)


diff --git a/gcc/asan.c b/gcc/asan.c
index c9872f1b007..4cdf294e82b 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1375,7 +1375,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   char buf[32];
   HOST_WIDE_INT base_offset = offsets[length - 1];
   HOST_WIDE_INT base_align_bias = 0, offset, prev_offset;
-  HOST_WIDE_INT asan_frame_size = offsets[0] - base_offset;
+  unsigned HOST_WIDE_INT asan_frame_size = offsets[0] - base_offset;
   HOST_WIDE_INT last_offset, last_size, last_size_aligned;
   int l;
   unsigned char cur_shadow_byte = ASAN_STACK_MAGIC_LEFT;
@@ -1428,7 +1428,9 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   str_cst = asan_pp_string (&asan_pp);
 
   /* Emit the prologue sequence.  */
-  if (asan_frame_size > 32 && asan_frame_size <= 65536 && pbase
+  if (asan_frame_size >= ASAN_MIN_STACK_FRAME_SIZE
+  && asan_frame_size <= ASAN_MAX_STACK_FRAME_SIZE
+  && pbase
   && param_asan_use_after_return)
 {
   use_after_return_class = floor_log2 (asan_frame_size - 1) - 5;
@@ -1598,8 +1600,24 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   if (use_after_return_class < 5
 	  && can_store_by_pieces (sz, builtin_memset_read_str, &c,
   BITS_PER_UNIT, true))
-	store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
-			 BITS_PER_UNIT, true, RETURN_BEGIN);
+	{
+	  /* Emit:
+	   memset(ShadowBase, kAsanStackAfterReturnMagic, ShadowSize);
+	   **SavedFlagPtr(FakeStack) = 0
+	  */
+	  store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c,
+			   BITS_PER_UNIT, true, RETURN_BEGIN);
+
+	  unsigned HOST_WIDE_INT offset
+	= (1 << (use_after_return_class + ASAN_MIN_STACK_FRAME_SIZE_LOG));
+	  offset -= GET_MODE_SIZE (ptr_mode);
+	  mem = adjust_address (mem, Pmode, offset);
+	  mem = gen_rtx_MEM (ptr_mode, mem);
+	  rtx tmp_reg = gen_reg_rtx (Pmode);
+	  emit_move_insn (tmp_reg, mem);
+	  mem = adjust_address (mem, QImode, 0);
+	  emit_move_insn (mem, const0_rtx);
+	}
   else if (use_after_return_class >= 5
 	   || !set_storage_via_setmem (shadow_mem,
 	   GEN_INT (sz),
diff --git a/gcc/asan.h b/gcc/asan.h
index 9efd33f9b86..5ff0735045f 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -58,6 +58,14 @@ extern hash_set <tree> *asan_used_labels;
 
 #define ASAN_MIN_RED_ZONE_SIZE	16
 
+/* Minimal and maximal frame size for a fake stack frame.  */
+
+#define ASAN_MIN_STACK_FRAME_SIZE_LOG 6
+#define ASAN_MAX_STACK_FRAME_SIZE_LOG 16
+
+#define ASAN_MIN_STACK_FRAME_SIZE (1UL << ASAN_MIN_STACK_FRAME_SIZE_LOG)
+#define ASAN_MAX_STACK_FRAME_SIZE (1UL << ASAN_MAX_STACK_FRAME_SIZE_LOG)
+
 /* Shadow memory values for stack protection.  Left is below protected vars,
the first pointer in stack corresponding to that offset contains
ASAN_STACK_FRAME_MAGIC word, the second pointer to a string describing



Re: [PATCH] add missing -Wno-stack-usage etc. options [PR90983]

2020-04-22 Thread Jeff Law via Gcc-patches
On Tue, 2020-04-21 at 13:37 -0600, Martin Sebor via Gcc-patches wrote:
> In addition to accepting argument values in excess of INT_MAX in
> options like -Walloca-larger-than=byte-size, GCC 9 has changed
> the behavior of such options with byte-size of zero.  While in prior
> versions zero disables the warning for any size, in GCC 9 it enables
> it for all non-zero sizes.  Since all these byte-size options are
> enabled by default for sizes in excess of PTRDIFF_MAX, users who
> want to disable them need to use the newly added -Wno-xxx options
> (such as -Wno-alloca-larger-than).
> 
> However, although I documented all of the new options, I only
> remembered to add the negative options for the C/C++ family and
> forgot about all the common ones, including -Wframe-larger-than=,
> -Wlarger-than=, and -Wstack-usage=.  As a result, users who want
> to disable the default, say -Wstack-usage, cannot use
> the -Wno-stack-usage as the manual leads them to do but have to
> use the less than intuitive workaround of specifying a very large
> argument to the positive option, e.g., something like
> -Wstack-usage=999EiB (denoting 999 exbibytes).
> 
> To avoid this hassle the attached patch provides the three missing
> negative options.
> 
> Tested on x86_64-linux.
OK
jeff



[PATCH] add missing -Wno-stack-usage etc. options [PR90983]

2020-04-21 Thread Martin Sebor via Gcc-patches

In addition to accepting argument values in excess of INT_MAX in
options like -Walloca-larger-than=byte-size, GCC 9 has changed
the behavior of such options with byte-size of zero.  While in prior
versions zero disables the warning for any size, in GCC 9 it enables
it for all non-zero sizes.  Since all these byte-size options are
enabled by default for sizes in excess of PTRDIFF_MAX, users who
want to disable them need to use the newly added -Wno-xxx options
(such as -Wno-alloca-larger-than).

However, although I documented all of the new options, I only
remembered to add the negative options for the C/C++ family and
forgot about all the common ones, including -Wframe-larger-than=,
-Wlarger-than=, and -Wstack-usage=.  As a result, users who want
to disable the default, say -Wstack-usage, cannot use
the -Wno-stack-usage as the manual leads them to do but have to
use the less than intuitive workaround of specifying a very large
argument to the positive option, e.g., something like
-Wstack-usage=999EiB (denoting 999 exbibytes).

To avoid this hassle the attached patch provides the three missing
negative options.

Tested on x86_64-linux.

Martin
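
The threshold semantics described above can be sketched in a few lines of C.
This is a simplified model, not the option machinery: LIMIT_NONE,
size_warning_fires and effective_limit are hypothetical names, and the
"last option on the command line wins" rule is an assumption that matches
what the new tests expect.

```c
#include <stdint.h>

/* Hypothetical stand-in for an effectively unlimited threshold, which is
   what the new -Wno-xxx aliases request.  */
#define LIMIT_NONE UINT64_MAX

/* A size warning fires when the size is strictly greater than the limit.
   With a limit of zero this is true for every non-zero size.  */
static int
size_warning_fires (uint64_t size, uint64_t limit)
{
  return size > limit;
}

/* Assumed rule: when several limits are given, the last one wins.  */
static uint64_t
effective_limit (const uint64_t *limits, int count, uint64_t default_limit)
{
  return count > 0 ? limits[count - 1] : default_limit;
}

/* Models "-Wstack-usage=123 -Wno-stack-usage" on a 1234-byte frame and
   returns whether the warning would fire.  */
static int
demo_positive_then_negative (void)
{
  const uint64_t limits[] = { 123, LIMIT_NONE };
  return size_warning_fires (1234, effective_limit (limits, 2, LIMIT_NONE));
}
```

In this model a limit of zero warns for every non-zero size (the GCC 9
behavior described above), while the negative form never warns because no
size exceeds the maximum threshold.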
PR driver/90983 - manual documents `-Wno-stack-usage` flag but it is unrecognized
gcc/ChangeLog:

	PR driver/90983
	* common.opt (-Wno-frame-larger-than): New option.
	(-Wno-larger-than, -Wno-stack-usage): Same.


gcc/testsuite/ChangeLog:

	PR driver/90983
	* gcc.dg/Wframe-larger-than-3.c: New test.
	* gcc.dg/Wlarger-than4.c: New test.
	* gcc.dg/Wstack-usage.c: New test.

diff --git a/gcc/common.opt b/gcc/common.opt
index 1e604ba9bb6..d33383b523c 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -601,6 +601,10 @@ Wframe-larger-than=
 Common RejectNegative Joined Host_Wide_Int ByteSize Warning Var(warn_frame_larger_than_size) Init(HOST_WIDE_INT_MAX)
 -Wframe-larger-than=<byte-size>	Warn if a function's stack frame requires in excess of <byte-size>.
 
+Wno-frame-larger-than
+Common Alias(Wframe-larger-than=,18446744073709551615EiB,none) Warning
+Disable -Wframe-larger-than= warning.  Equivalent to -Wframe-larger-than=<SIZE_MAX> or larger.
+
 Wfree-nonheap-object
 Common Var(warn_free_nonheap_object) Init(1) Warning
 Warn when attempting to free a non-heap object.
@@ -631,6 +635,10 @@ Wlarger-than=
 Common RejectNegative Joined Host_Wide_Int ByteSize Warning Var(warn_larger_than_size) Init(HOST_WIDE_INT_MAX)
 -Wlarger-than=<byte-size>	Warn if an object's size exceeds <byte-size>.
 
+Wno-larger-than
+Common Alias(Wlarger-than=,18446744073709551615EiB,none) Warning
+Disable -Wlarger-than= warning.  Equivalent to -Wlarger-than=<SIZE_MAX> or larger.
+
 Wnonnull-compare
 Var(warn_nonnull_compare) Warning
 Warn if comparing pointer parameter with nonnull attribute with NULL.
@@ -704,6 +712,10 @@ Wstack-usage=
 Common Joined RejectNegative Host_Wide_Int ByteSize Var(warn_stack_usage) Warning Init(HOST_WIDE_INT_MAX)
 -Wstack-usage=<byte-size>	Warn if stack usage might exceed <byte-size>.
 
+Wno-stack-usage
+Common Alias(Wstack-usage=,18446744073709551615EiB,none) Warning
+Disable Wstack-usage= warning.  Equivalent to Wstack-usage=<SIZE_MAX> or larger.
+
 Wstrict-aliasing
 Common Warning
 Warn about code which might break strict aliasing rules.
diff --git a/gcc/testsuite/gcc.dg/Wframe-larger-than-3.c b/gcc/testsuite/gcc.dg/Wframe-larger-than-3.c
new file mode 100644
index 000..ae5b2f497c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wframe-larger-than-3.c
@@ -0,0 +1,11 @@
+/* PR 90983/manual documents `-Wno-stack-usage` flag, but it is unrecognized
+   { dg-do compile }
+   { dg-options "-Wall -Wframe-larger-than=123 -Wno-frame-larger-than" } */
+
+void f (void*);
+
+void g (void)
+{
+  char a [1234];
+  f (a);
+}
diff --git a/gcc/testsuite/gcc.dg/Wlarger-than4.c b/gcc/testsuite/gcc.dg/Wlarger-than4.c
new file mode 100644
index 000..dd936c671cd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wlarger-than4.c
@@ -0,0 +1,5 @@
+/* PR 90983/manual documents `-Wno-stack-usage` flag, but it is unrecognized
+   { dg-do compile }
+   { dg-options "-Wall -Wlarger-than=123 -Wno-larger-than" } */
+
+char a [1234];
diff --git a/gcc/testsuite/gcc.dg/Wstack-usage.c b/gcc/testsuite/gcc.dg/Wstack-usage.c
new file mode 100644
index 000..4738b69478b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wstack-usage.c
@@ -0,0 +1,14 @@
+/* PR 90983/manual documents `-Wno-stack-usage` flag, but it is unrecognized
+   { dg-do compile }
+   { dg-options "-Wall -Wstack-usage=123 -Wno-stack-usage" } */
+
+void f (void*);
+
+void g (int n)
+{
+  if (n < 1234)
+n = 1234;
+
+  char a [n];
+  f (a);
+}


[PATCH] Add missing space in various string literals

2019-10-24 Thread Jakub Jelinek
Hi!

On Thu, Oct 24, 2019 at 01:20:26PM -0400, Marek Polacek wrote:
> Since r269045 we're missing a space in a diagnostic.
> 
> Bootstrapped/regtested on x86_64-linux, applying to trunk and 9.
> 
> 2019-10-24  Marek Polacek  
> 
>   * decl.c (reshape_init_r): Add missing space.

This change reminded me that it is time to run my
https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00844.html
script again.  While it has lots of false positives, it discovered quite a
few issues besides the bug you've fixed.

I'll commit this as obvious to trunk if it passes bootstrap/regtest.
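
For readers unfamiliar with this class of bug: adjacent string literals are
concatenated by the compiler with nothing in between, so a message split
across source lines silently loses the space unless one of the pieces
carries it.  A minimal sketch using one of the arc.c strings from the patch
below (the variable and function names are made up for illustration):

```c
#include <assert.h>
#include <string.h>

/* Before the fix: the two pieces join as "...last insn;add a nop".  */
static const char *const without_space
  = ";; loop %d has a label as last insn;"
    "add a nop\n";

/* After the fix: the first piece ends in a space.  */
static const char *const with_space
  = ";; loop %d has a label as last insn; "
    "add a nop\n";

/* Confirm how the pieces were joined in each variant.  */
static void
check_literals (void)
{
  assert (strstr (without_space, "insn;add") != NULL);
  assert (strstr (with_space, "insn; add") != NULL);
  assert (strlen (with_space) == strlen (without_space) + 1);
}
```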

2019-10-24  Jakub Jelinek  

* config/arc/arc.c (hwloop_optimize): Add missing space in string
literal.
* config/rx/rx.c (rx_print_operand): Likewise.
* tree-vect-data-refs.c (vect_analyze_data_refs): Likewise.
* tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Likewise.
* ipa-sra.c (create_parameter_descriptors, process_scan_results):
Likewise.
* genemit.c (emit_c_code): Likewise.
* plugin.c (try_init_one_plugin): Likewise.  Formatting fix.
cp/
* call.c (convert_arg_to_ellipsis): Add missing space in string
literal.

--- gcc/config/arc/arc.c.jj 2019-09-11 10:27:44.612703959 +0200
+++ gcc/config/arc/arc.c	2019-10-24 19:38:21.796846873 +0200
@@ -8001,7 +8001,7 @@ hwloop_optimize (hwloop_info loop)
  return false;
}
   if (dump_file)
-   fprintf (dump_file, ";; loop %d has a control like last insn;"
+   fprintf (dump_file, ";; loop %d has a control like last insn; "
 "add a nop\n",
 loop->loop_no);
 
@@ -8011,7 +8011,7 @@ hwloop_optimize (hwloop_info loop)
   if (LABEL_P (last_insn))
 {
   if (dump_file)
-   fprintf (dump_file, ";; loop %d has a label as last insn;"
+   fprintf (dump_file, ";; loop %d has a label as last insn; "
 "add a nop\n",
 loop->loop_no);
   last_insn = emit_insn_after (gen_nopv (), last_insn);
@@ -8038,7 +8038,7 @@ hwloop_optimize (hwloop_info loop)
   if (entry_edge == NULL)
 {
   if (dump_file)
-   fprintf (dump_file, ";; loop %d has no fallthru edge jumping"
+   fprintf (dump_file, ";; loop %d has no fallthru edge jumping "
 "into the loop\n",
 loop->loop_no);
   return false;
--- gcc/config/rx/rx.c.jj   2019-09-11 10:27:41.266755350 +0200
+++ gcc/config/rx/rx.c  2019-10-24 19:39:53.224447237 +0200
@@ -649,7 +649,7 @@ rx_print_operand (FILE * file, rtx op, i
case CTRLREG_INTB:  fprintf (file, "intb"); break;
default:
  warning (0, "unrecognized control register number: %d"
-  "- using %<psw%>", (int) INTVAL (op));
+  " - using %<psw%>", (int) INTVAL (op));
  fprintf (file, "psw");
  break;
}
--- gcc/cp/call.c.jj	2019-10-24 14:46:34.976751156 +0200
+++ gcc/cp/call.c   2019-10-24 19:43:52.416785521 +0200
@@ -7590,7 +7590,7 @@ convert_arg_to_ellipsis (tree arg, tsubs
  && TYPE_MODE (TREE_TYPE (prom)) != TYPE_MODE (arg_type)
  && (complain & tf_warning))
warning_at (loc, OPT_Wabi, "scoped enum %qT passed through %<...%>"
-   "as %qT before %<-fabi-version=6%>, %qT after",
+   " as %qT before %<-fabi-version=6%>, %qT after",
arg_type,
TREE_TYPE (prom), ENUM_UNDERLYING_TYPE (arg_type));
  if (!abi_version_at_least (6))
--- gcc/plugin.c.jj 2019-05-20 11:39:35.305796134 +0200
+++ gcc/plugin.c	2019-10-24 19:46:59.839916339 +0200
@@ -712,10 +712,10 @@ try_init_one_plugin (struct plugin_name_
   if (dlsym (dl_handle, str_license) == NULL)
 fatal_error (input_location,
 "plugin %s is not licensed under a GPL-compatible license"
-"%s", plugin->full_name, dlerror ());
+" %s", plugin->full_name, dlerror ());
 
-  PTR_UNION_AS_VOID_PTR (plugin_init_union) =
-  dlsym (dl_handle, str_plugin_init_func_name);
+  PTR_UNION_AS_VOID_PTR (plugin_init_union)
+= dlsym (dl_handle, str_plugin_init_func_name);
   plugin_init = PTR_UNION_AS_CAST_PTR (plugin_init_union);
 
   if ((err = dlerror ()) != NULL)
--- gcc/tree-vect-data-refs.c.jj	2019-10-21 13:06:29.220299826 +0200
+++ gcc/tree-vect-data-refs.c   2019-10-24 19:51:53.689419065 +0200
@@ -4282,7 +4282,7 @@ vect_analyze_data_refs (vec_info *vinfo,
{
  if (nested_in_vect_loop_p (loop, stmt_info))
return opt_result::failure_at (stmt_info->stmt,
-  "not vectorized:"
+  "not vectorized: "
   "not suitable for strided load %G",
   stmt_info->stmt);
  STMT_VINFO_STRIDED_P (stmt_info) = true;
--- gcc/tree-ssa-loop-ch.c.jj   2019-07-10 15:52:27.851038998 +0200

Re: [PATCH] Add missing gimple_call_set_fntype

2019-10-04 Thread Jeff Law
On 10/4/19 7:07 AM, Martin Jambor wrote:
> Hi,
> 
> when looking for detected call argument type incompatibilities I
> stumbled over a call to builtin_memcpy which originally was a call to
> builtin_memset but simplify_builtin_call changed it to the former
> without setting the gimple statement fntype.  That's probably not a big
> deal but since I know about it I thought I might as well make the
> statement consistent with the following.
> 
> Bootstrapped and tested on x86-64-linux.  OK for trunk?
> 
> Thanks,
> 
> Martin
> 
> 
> 2019-10-04  Martin Jambor  
> 
>   * tree-ssa-forwprop.c (simplify_builtin_call): Set gimple call
>   fntype when switching to calling memcpy instead of memset.
OK
jeff


[PATCH] Add missing gimple_call_set_fntype

2019-10-04 Thread Martin Jambor
Hi,

when looking for detected call argument type incompatibilities I
stumbled over a call to builtin_memcpy which originally was a call to
builtin_memset but simplify_builtin_call changed it to the former
without setting the gimple statement fntype.  That's probably not a big
deal but since I know about it I thought I might as well make the
statement consistent with the following.

Bootstrapped and tested on x86-64-linux.  OK for trunk?

Thanks,

Martin
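
The inconsistency can be pictured with a toy model in C.  Nothing here is
GCC API: struct fn_decl, struct call_stmt and the helpers are hypothetical,
and types are represented as plain strings purely for illustration.

```c
#include <assert.h>
#include <string.h>

/* A declaration carries its own function type.  */
struct fn_decl
{
  const char *name;
  const char *type;
};

/* A call statement records both the callee and, separately, the type
   the call is made through.  */
struct call_stmt
{
  const struct fn_decl *fndecl;
  const char *fntype;
};

static const struct fn_decl memset_decl
  = { "memset", "void *(void *, int, size_t)" };
static const struct fn_decl memcpy_decl
  = { "memcpy", "void *(void *, const void *, size_t)" };

/* The statement is consistent when its recorded type matches the type
   of the declaration it calls.  */
static int
call_is_consistent (const struct call_stmt *c)
{
  assert (c && c->fndecl);
  return strcmp (c->fntype, c->fndecl->type) == 0;
}

/* Switch the callee only: the recorded type goes stale.  */
static int
demo_decl_only (void)
{
  struct call_stmt c = { &memset_decl, memset_decl.type };
  c.fndecl = &memcpy_decl;
  return call_is_consistent (&c);
}

/* Switch the callee and refresh the recorded type from it.  */
static int
demo_decl_and_type (void)
{
  struct call_stmt c = { &memset_decl, memset_decl.type };
  c.fndecl = &memcpy_decl;
  c.fntype = c.fndecl->type;
  return call_is_consistent (&c);
}
```

In the model, changing only the callee leaves the statement claiming the
memset signature (second parameter int) while calling memcpy (second
parameter const void *), which is the kind of mismatch the argument-type
check flagged.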


2019-10-04  Martin Jambor  

* tree-ssa-forwprop.c (simplify_builtin_call): Set gimple call
fntype when switching to calling memcpy instead of memset.
---
 gcc/tree-ssa-forwprop.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index 221f140b356..a1e22c93631 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -1426,8 +1426,10 @@ simplify_builtin_call (gimple_stmt_iterator *gsi_p, tree callee2)
  if (!is_gimple_val (ptr1))
ptr1 = force_gimple_operand_gsi (gsi_p, ptr1, true, NULL_TREE,
 true, GSI_SAME_STMT);
- gimple_call_set_fndecl (stmt2,
- builtin_decl_explicit (BUILT_IN_MEMCPY));
+ tree fndecl = builtin_decl_explicit (BUILT_IN_MEMCPY);
+ gimple_call_set_fndecl (stmt2, fndecl);
+ gimple_call_set_fntype (as_a <gcall *> (stmt2),
+ TREE_TYPE (fndecl));
  gimple_call_set_arg (stmt2, 0, ptr1);
  gimple_call_set_arg (stmt2, 1, new_str_cst);
  gimple_call_set_arg (stmt2, 2,
-- 
2.23.0



Re: [PATCH] Add missing popcount simplifications (PR90693)

2019-08-22 Thread Richard Biener
On Wed, Aug 21, 2019 at 5:53 PM Jeff Law  wrote:
>
> On 8/21/19 7:57 AM, Wilco Dijkstra wrote:
> > Hi Richard,
> >
> >>>
> >>> I think this should be in expand stage where there could be comparison
> >>> of the cost of the RTLs.
> >>
> >> I tend to agree here, if not then for the reason the "simplified" variants
> >> have more GIMPLE stmts which means they are not "simpler".  In
> >> fact I'd argue for canonicalization we'd want to have the reverse
> >> "simplifications" on GIMPLE and expansion based on target cost.
> >
> > So how would this work? Expand works on one statement at a time, but
> > we are dealing with more complex expressions here. When we get a
> > popcount (x) > 1 in expand_gimple_cond, the popcount has already been
> > expanded. And the code in builtins.c that emits popcount doesn't see or
> > consider the comparison, so it would be difficult to change it at that 
> > point.
> > None of the infrastructure in expand seems to be set up to do complex
> > pattern matches and replacements at expand time...
> >
> > Costing would be difficult too since rtx_cost doesn't support builtins or
> > calls, so each backend would need to be modified to add costs for these.
> >
> > So what is the best place to do pattern matches? I thought it was all
> > moving to match.pd.
> I believe the expanders have access to more than one statement via the
> use-def chains and TER's transformations.

Either that or as repeatedly suggested elsewhere more complex
"expand time instruction selection" can happen on GIMPLE right
before RTL expansion (pass_optimize_widening_mul is a pass
doing something like that).  We probably want to have a more
formalized thing at some point as part of RTL expansion itself
to also get rid of TER.

The issue with using TER for this is that TER doesn't "handle"
internal FN calls I think (which is simply an oversight):

  /* Increment counter if this is a non BUILT_IN call. We allow
 replacement over BUILT_IN calls since many will expand to inline
 insns instead of a true call.  */
  if (is_gimple_call (stmt)
  && !((fndecl = gimple_call_fndecl (stmt))
   && fndecl_built_in_p (fndecl)))
cur_call_cnt++;

Richard.

> jeff


Re: [PATCH] Add missing popcount simplifications (PR90693)

2019-08-21 Thread Jeff Law
On 8/21/19 7:57 AM, Wilco Dijkstra wrote:
> Hi Richard,
> 
>>>
>>> I think this should be in expand stage where there could be comparison
>>> of the cost of the RTLs.
>>
>> I tend to agree here, if not then for the reason the "simplified" variants
>> have more GIMPLE stmts which means they are not "simpler".  In
>> fact I'd argue for canonicalization we'd want to have the reverse
>> "simplifications" on GIMPLE and expansion based on target cost.
>  
> So how would this work? Expand works on one statement at a time, but
> we are dealing with more complex expressions here. When we get a
> popcount (x) > 1 in expand_gimple_cond, the popcount has already been
> expanded. And the code in builtins.c that emits popcount doesn't see or
> consider the comparison, so it would be difficult to change it at that point.
> None of the infrastructure in expand seems to be set up to do complex
> pattern matches and replacements at expand time...
> 
> Costing would be difficult too since rtx_cost doesn't support builtins or
> calls, so each backend would need to be modified to add costs for these.
> 
> So what is the best place to do pattern matches? I thought it was all
> moving to match.pd.
I believe the expanders have access to more than one statement via the
use-def chains and TER's transformations.

jeff


Re: [PATCH] Add missing popcount simplifications (PR90693)

2019-08-21 Thread Wilco Dijkstra
Hi Richard,

> >
> > I think this should be in expand stage where there could be comparison
> > of the cost of the RTLs.
> 
> I tend to agree here, if not then for the reason the "simplified" variants
> have more GIMPLE stmts which means they are not "simpler".  In
> fact I'd argue for canonicalization we'd want to have the reverse
> "simplifications" on GIMPLE and expansion based on target cost.
 
So how would this work? Expand works on one statement at a time, but
we are dealing with more complex expressions here. When we get a
popcount (x) > 1 in expand_gimple_cond, the popcount has already been
expanded. And the code in builtins.c that emits popcount doesn't see or
consider the comparison, so it would be difficult to change it at that point.
None of the infrastructure in expand seems to be set up to do complex
pattern matches and replacements at expand time...

Costing would be difficult too since rtx_cost doesn't support builtins or
calls, so each backend would need to be modified to add costs for these.

So what is the best place to do pattern matches? I thought it was all
moving to match.pd.

Wilco

Re: [PATCH] Add missing popcount simplifications (PR90693)

2019-08-14 Thread Richard Biener
On Tue, Aug 13, 2019 at 6:47 PM Andrew Pinski  wrote:
>
> On Tue, Aug 13, 2019 at 8:50 AM Wilco Dijkstra  wrote:
> >
> > Add simplifications for popcount (x) > 1 to (x & (x-1)) != 0 and
> > popcount (x) == 1 into (x-1) <u (x & -x) [...]
> > single-use cases and support an optional convert.  A microbenchmark
> > shows a speedup of 2-2.5x on both x64 and AArch64.
> >
> > Bootstrap OK, OK for commit?
>
> I think this should be in expand stage where there could be comparison
> of the cost of the RTLs.

I tend to agree here, if not then for the reason the "simplified" variants
have more GIMPLE stmts which means they are not "simpler".  In
fact I'd argue for canonicalization we'd want to have the reverse
"simplifications" on GIMPLE and expansion based on target cost.

Richard.

> The only reason why it is faster for AARCH64 is the requirement of
> moving between the GPRs and the SIMD registers.
>
> Thanks,
> Andrew Pinski
>
> >
> > ChangeLog:
> > 2019-08-13  Wilco Dijkstra  
> >
> > gcc/
> > PR middle-end/90693
> > * match.pd: Add popcount simplifications.
> >
> > testsuite/
> > PR middle-end/90693
> > * gcc.dg/fold-popcount-5.c: Add new test.
> >
> > ---
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 0317bc704f771f626ab72189b3a54de00087ad5a..bf4351a330f45f3a1424d9792cefc3da6267597d 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -5356,7 +5356,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > rep (eq eq ne ne)
> >  (simplify
> >(cmp (popcount @0) integer_zerop)
> > -  (rep @0 { build_zero_cst (TREE_TYPE (@0)); }
> > +  (rep @0 { build_zero_cst (TREE_TYPE (@0)); })))
> > +  /* popcount(X) == 1 -> (X-1) <u (X & -X).  */
> > +  (for cmp (eq ne)
> > +   rep (lt ge)
> > +(simplify
> > +  (cmp (convert? (popcount:s @0)) integer_onep)
> > +  (with {
> > + tree utype = unsigned_type_for (TREE_TYPE (@0));
> > + tree a0 = fold_convert (utype, @0); }
> > +   (rep (plus { a0; } { build_minus_one_cst (utype); })
> > +(bit_and (negate { a0; }) { a0; })
> > +  /* popcount(X) > 1 -> (X & (X-1)) != 0.  */
> > +  (for cmp (gt le)
> > +   rep (ne eq)
> > +(simplify
> > +  (cmp (convert? (popcount:s @0)) integer_onep)
> > +  (rep (bit_and (plus @0 { build_minus_one_cst (TREE_TYPE (@0)); }) @0)
> > +  { build_zero_cst (TREE_TYPE (@0)); }
> >
> >  /* Simplify:
> >
> > diff --git a/gcc/testsuite/gcc.dg/fold-popcount-5.c b/gcc/testsuite/gcc.dg/fold-popcount-5.c
> > new file mode 100644
> > index 
> > ..fcf3910587caacb8e39cf437dc3971df892f405a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/fold-popcount-5.c
> > @@ -0,0 +1,69 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +/* Test popcount (x) > 1 -> (x & (x-1)) != 0.  */
> > +
> > +int test_1 (long x)
> > +{
> > +  return __builtin_popcountl (x) >= 2;
> > +}
> > +
> > +int test_2 (int x)
> > +{
> > +  return (unsigned) __builtin_popcount (x) <= 1u;
> > +}
> > +
> > +int test_3 (unsigned x)
> > +{
> > +  return __builtin_popcount (x) > 1u;
> > +}
> > +
> > +int test_4 (unsigned long x)
> > +{
> > +  return (unsigned char) __builtin_popcountl (x) > 1;
> > +}
> > +
> > +int test_5 (unsigned long x)
> > +{
> > +  return (signed char) __builtin_popcountl (x) <= (signed char)1;
> > +}
> > +
> > +int test_6 (unsigned long long x)
> > +{
> > +  return 2u <= __builtin_popcountll (x);
> > +}
> > +
> > +/* Test popcount (x) == 1 -> (x-1) <u (x & -x).  */
> > +
> > +int test_7 (unsigned long x)
> > +{
> > +  return __builtin_popcountl (x) != 1;
> > +}
> > +
> > +int test_8 (long long x)
> > +{
> > +  return (unsigned) __builtin_popcountll (x) == 1u;
> > +}
> > +
> > +int test_9 (int x)
> > +{
> > +  return (unsigned char) __builtin_popcount (x) != 1u;
> > +}
> > +
> > +int test_10 (unsigned x)
> > +{
> > +  return (unsigned char) __builtin_popcount (x) == 1;
> > +}
> > +
> > +int test_11 (long x)
> > +{
> > +  return (signed char) __builtin_popcountl (x) == 1;
> > +}
> > +
> > +int test_12 (long x)
> > +{
> > +  return 1u == __builtin_popcountl (x);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "popcount" 0 "optimized" } } */
> > +


Re: [PATCH] Add missing popcount simplifications (PR90693)

2019-08-13 Thread Andrew Pinski
On Tue, Aug 13, 2019 at 8:50 AM Wilco Dijkstra  wrote:
>
> Add simplifications for popcount (x) > 1 to (x & (x-1)) != 0 and
> popcount (x) == 1 into (x-1) <u (x & -x) [...]
> single-use cases and support an optional convert.  A microbenchmark
> shows a speedup of 2-2.5x on both x64 and AArch64.
>
> Bootstrap OK, OK for commit?

I think this should be in expand stage where there could be comparison
of the cost of the RTLs.
The only reason why it is faster for AARCH64 is the requirement of
moving between the GPRs and the SIMD registers.

Thanks,
Andrew Pinski
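
Independently of where the transformation should live, the two rewrites
under discussion can be sanity-checked in plain C.  The first is stated in
the patch description; the second, (x-1) <u (x & -x), is read off the
match.pd pattern quoted below (plus of minus one on the left, bit_and of
negate on the right, in the unsigned type).  The function names are
illustrative only, and __builtin_popcount is assumed to be available
(GCC or Clang).

```c
/* popcount (x) > 1  <=>  (x & (x-1)) != 0: clearing the lowest set bit
   leaves something behind only if there was a second bit.  */
static int
more_than_one_bit (unsigned x)
{
  return (x & (x - 1)) != 0;
}

/* popcount (x) == 1  <=>  (x-1) <u (x & -x): x & -x isolates the lowest
   set bit, and x-1 is below it only when x is exactly that bit.  For
   x == 0 the left side wraps to the maximum value, so the test fails.  */
static int
exactly_one_bit (unsigned x)
{
  return (x - 1) < (x & -x);
}

/* Compare both rewrites against the builtin for every 16-bit value.
   Returns 1 if they always agree, 0 otherwise.  */
static int
identities_hold (void)
{
  for (unsigned x = 0; x <= 0xffffu; x++)
    {
      int pc = __builtin_popcount (x);
      if ((pc > 1) != more_than_one_bit (x))
	return 0;
      if ((pc == 1) != exactly_one_bit (x))
	return 0;
    }
  return 1;
}
```

The comparison must be unsigned, which is why the pattern converts the
operand with unsigned_type_for before building the expression.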

>
> ChangeLog:
> 2019-08-13  Wilco Dijkstra  
>
> gcc/
> PR middle-end/90693
> * match.pd: Add popcount simplifications.
>
> testsuite/
> PR middle-end/90693
> * gcc.dg/fold-popcount-5.c: Add new test.
>
> ---
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0317bc704f771f626ab72189b3a54de00087ad5a..bf4351a330f45f3a1424d9792cefc3da6267597d 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5356,7 +5356,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> rep (eq eq ne ne)
>  (simplify
>(cmp (popcount @0) integer_zerop)
> -  (rep @0 { build_zero_cst (TREE_TYPE (@0)); }
> +  (rep @0 { build_zero_cst (TREE_TYPE (@0)); })))
> +  /* popcount(X) == 1 -> (X-1) <u (X & -X).  */
> +  (for cmp (eq ne)
> +   rep (lt ge)
> +(simplify
> +  (cmp (convert? (popcount:s @0)) integer_onep)
> +  (with {
> + tree utype = unsigned_type_for (TREE_TYPE (@0));
> + tree a0 = fold_convert (utype, @0); }
> +   (rep (plus { a0; } { build_minus_one_cst (utype); })
> +(bit_and (negate { a0; }) { a0; })
> +  /* popcount(X) > 1 -> (X & (X-1)) != 0.  */
> +  (for cmp (gt le)
> +   rep (ne eq)
> +(simplify
> +  (cmp (convert? (popcount:s @0)) integer_onep)
> +  (rep (bit_and (plus @0 { build_minus_one_cst (TREE_TYPE (@0)); }) @0)
> +  { build_zero_cst (TREE_TYPE (@0)); }
>
>  /* Simplify:
>
> diff --git a/gcc/testsuite/gcc.dg/fold-popcount-5.c 
> b/gcc/testsuite/gcc.dg/fold-popcount-5.c
> new file mode 100644
> index 
> ..fcf3910587caacb8e39cf437dc3971df892f405a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/fold-popcount-5.c
> @@ -0,0 +1,69 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +/* Test popcount (x) > 1 -> (x & (x-1)) != 0.  */
> +
> +int test_1 (long x)
> +{
> +  return __builtin_popcountl (x) >= 2;
> +}
> +
> +int test_2 (int x)
> +{
> +  return (unsigned) __builtin_popcount (x) <= 1u;
> +}
> +
> +int test_3 (unsigned x)
> +{
> +  return __builtin_popcount (x) > 1u;
> +}
> +
> +int test_4 (unsigned long x)
> +{
> +  return (unsigned char) __builtin_popcountl (x) > 1;
> +}
> +
> +int test_5 (unsigned long x)
> +{
> +  return (signed char) __builtin_popcountl (x) <= (signed char)1;
> +}
> +
> +int test_6 (unsigned long long x)
> +{
> +  return 2u <= __builtin_popcountll (x);
> +}
> +
> +/* Test popcount (x) == 1 -> (x-1) <u (x & -x).  */
> +
> +int test_7 (unsigned long x)
> +{
> +  return __builtin_popcountl (x) != 1;
> +}
> +
> +int test_8 (long long x)
> +{
> +  return (unsigned) __builtin_popcountll (x) == 1u;
> +}
> +
> +int test_9 (int x)
> +{
> +  return (unsigned char) __builtin_popcount (x) != 1u;
> +}
> +
> +int test_10 (unsigned x)
> +{
> +  return (unsigned char) __builtin_popcount (x) == 1;
> +}
> +
> +int test_11 (long x)
> +{
> +  return (signed char) __builtin_popcountl (x) == 1;
> +}
> +
> +int test_12 (long x)
> +{
> +  return 1u == __builtin_popcountl (x);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "popcount" 0 "optimized" } } */
> +


Re: [PATCH] Add missing popcount simplifications (PR90693)

2019-08-13 Thread Marc Glisse

On Tue, 13 Aug 2019, Wilco Dijkstra wrote:


Add simplifications for popcount (x) > 1 to (x & (x-1)) != 0 and
popcount (x) == 1 into (x-1) <u (x & -x).  [...]

Is that true even on targets that have a popcount instruction? (-mpopcnt 
for x64)



diff --git a/gcc/match.pd b/gcc/match.pd
index 
0317bc704f771f626ab72189b3a54de00087ad5a..bf4351a330f45f3a1424d9792cefc3da6267597d
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5356,7 +5356,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   rep (eq eq ne ne)
(simplify
  (cmp (popcount @0) integer_zerop)
-  (rep @0 { build_zero_cst (TREE_TYPE (@0)); }
+  (rep @0 { build_zero_cst (TREE_TYPE (@0)); })))
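+  /* popcount(X) == 1 -> (X-1) <u (X & -X).  */
+  (for cmp (eq ne)
+   rep (lt ge)
+(simplify
+  (cmp (convert? (popcount:s @0)) integer_onep)
+  (with {
+ tree utype = unsigned_type_for (TREE_TYPE (@0));
+ tree a0 = fold_convert (utype, @0); }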

That doesn't seem right for a gimple transformation. I assume you didn't 
write (convert:utype @0) in the output because you want to avoid doing it 
3 times? IIRC you are allowed to write (convert:utype@1 @0) in the output 
and reuse @1 several times.



+   (rep (plus { a0; } { build_minus_one_cst (utype); })
+(bit_and (negate { a0; }) { a0; })
+  /* popcount(X) > 1 -> (X & (X-1)) != 0.  */
+  (for cmp (gt le)
+   rep (ne eq)
+(simplify
+  (cmp (convert? (popcount:s @0)) integer_onep)
+  (rep (bit_and (plus @0 { build_minus_one_cst (TREE_TYPE (@0)); }) @0)
+  { build_zero_cst (TREE_TYPE (@0)); }


Are there any types where this could be a problem? Say if you cast to a 
1-bit type. Actually, even converting popcnt(__uint128_t(-1)) to signed 
char may be problematic.
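
A small sketch of the second concern, assuming GCC's documented behaviour of reducing an out-of-range value modulo 2^N when narrowing to a signed type (the C standard leaves this implementation-defined). The 128-bit value is modelled as two 64-bit halves, and the function names are hypothetical. With all 128 bits set the count is 128, which narrows to -128, so the narrowed comparison and the bit-trick rewrite disagree.

```c
#include <assert.h>
#include <stdint.h>

/* Population count of a 128-bit value given as two 64-bit halves.  */
static int popcount128 (uint64_t hi, uint64_t lo)
{
  return __builtin_popcountll (hi) + __builtin_popcountll (lo);
}

/* The comparison as written in the source: the count is narrowed to
   signed char before comparing.  With GCC, 128 narrows to -128.  */
static int narrowed_le_1 (uint64_t hi, uint64_t lo)
{
  return (signed char) popcount128 (hi, lo) <= 1;
}

/* What the rewrite computes if the narrowing is ignored:
   popcount (x) <= 1  <->  (x & (x-1)) == 0 on the full 128 bits.  */
static int rewritten_le_1 (uint64_t hi, uint64_t lo)
{
  /* x - 1 on a two-word value: borrow from hi when lo is zero.  */
  uint64_t lo_m1 = lo - 1;
  uint64_t hi_m1 = hi - (lo == 0);
  return ((hi & hi_m1) | (lo & lo_m1)) == 0;
}

/* Shows the disagreement for the all-ones input and the agreement for
   a value with a single bit set.  */
static void check_narrowing_example (void)
{
  assert (narrowed_le_1 (UINT64_MAX, UINT64_MAX)
          != rewritten_le_1 (UINT64_MAX, UINT64_MAX));
  assert (narrowed_le_1 (0, 4) == rewritten_le_1 (0, 4));
}
```

For source operands of 64 bits or fewer the count is at most 64 and fits in signed char, so this particular mismatch needs a wider operand or a narrower conversion target.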


--
Marc Glisse


[PATCH] Add missing popcount simplifications (PR90693)

2019-08-13 Thread Wilco Dijkstra
Add simplifications for popcount (x) > 1 to (x & (x-1)) != 0 and
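popcount (x) == 1 into (x-1) <u (x & -x).  [...]
single-use cases and support an optional convert.  A microbenchmark
shows a speedup of 2-2.5x on both x64 and AArch64.

Bootstrap OK, OK for commit?

ChangeLog:
2019-08-13  Wilco Dijkstra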

gcc/
PR middle-end/90693
* match.pd: Add popcount simplifications.

testsuite/
PR middle-end/90693
* gcc.dg/fold-popcount-5.c: Add new test.

---

diff --git a/gcc/match.pd b/gcc/match.pd
index 
0317bc704f771f626ab72189b3a54de00087ad5a..bf4351a330f45f3a1424d9792cefc3da6267597d
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5356,7 +5356,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
rep (eq eq ne ne)
 (simplify
   (cmp (popcount @0) integer_zerop)
-  (rep @0 { build_zero_cst (TREE_TYPE (@0)); }
+  (rep @0 { build_zero_cst (TREE_TYPE (@0)); })))
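+  /* popcount(X) == 1 -> (X-1) <u (X & -X).  */
+  (for cmp (eq ne)
+   rep (lt ge)
+(simplify
+  (cmp (convert? (popcount:s @0)) integer_onep)
+  (with {
+ tree utype = unsigned_type_for (TREE_TYPE (@0));
+ tree a0 = fold_convert (utype, @0); }
+   (rep (plus { a0; } { build_minus_one_cst (utype); })
+(bit_and (negate { a0; }) { a0; })
+  /* popcount(X) > 1 -> (X & (X-1)) != 0.  */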
+  (for cmp (gt le)
+   rep (ne eq)
+(simplify
+  (cmp (convert? (popcount:s @0)) integer_onep)
+  (rep (bit_and (plus @0 { build_minus_one_cst (TREE_TYPE (@0)); }) @0)
+  { build_zero_cst (TREE_TYPE (@0)); }
 
 /* Simplify:
 
diff --git a/gcc/testsuite/gcc.dg/fold-popcount-5.c 
b/gcc/testsuite/gcc.dg/fold-popcount-5.c
new file mode 100644
index 
..fcf3910587caacb8e39cf437dc3971df892f405a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-popcount-5.c
@@ -0,0 +1,69 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+/* Test popcount (x) > 1 -> (x & (x-1)) != 0.  */
+
+int test_1 (long x)
+{
+  return __builtin_popcountl (x) >= 2;
+}
+
+int test_2 (int x)
+{
+  return (unsigned) __builtin_popcount (x) <= 1u;
+}
+
+int test_3 (unsigned x)
+{
+  return __builtin_popcount (x) > 1u;
+}
+
+int test_4 (unsigned long x)
+{
+  return (unsigned char) __builtin_popcountl (x) > 1;
+}
+
+int test_5 (unsigned long x)
+{
+  return (signed char) __builtin_popcountl (x) <= (signed char)1;
+}
+
+int test_6 (unsigned long long x)
+{
+  return 2u <= __builtin_popcountll (x);
+}
+
+/* Test popcount (x) == 1 -> (x-1) <u (x & -x).  */

Re: [PATCH] Add missing _mm{256,512}_zext* intrinsics (PRs target/83250, target/91340)

2019-08-12 Thread Uros Bizjak
On Mon, Aug 12, 2019 at 4:57 PM Jakub Jelinek  wrote:
>
> Hi!
>
> The following patch adds 9 missing intrinsics, which are like _mm*_cast*,
> but don't leave the upper bits undefined - set them to zero instead.
> The implementation uses code that combine manages to optimize well,
> the only problem is that as the 512-bit intrinsics are supposed to be
> avx512f and some needed intrinsics they'd ideally use are avx512dq, it means
> that for _mm512_zextpd128_pd512/_mm512_zextps256_ps512 we emit
> vmovaps/vmovapd instead of vmovapd/vmovaps.
>
> I've also discovered that for AVX, there is no test coverage of the various
> cast intrinsics, so I've added that too.
>
> The PR has some details on other possible expansions, it would be nice to
> optimize also those definitions into the same code, but it will require some
> extra define_insn_and_split, though I think that can be done incrementally;
> and once done, perhaps we could change the 
> _mm512_zextpd128_pd512/_mm512_zextps256_ps512
> so that they actually generate the right ps vs. pd variant of move.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2019-08-12  Jakub Jelinek  
>
> PR target/83250
> PR target/91340
> * config/i386/avxintrin.h (_mm256_zextpd128_pd256,
> _mm256_zextps128_ps256, _mm256_zextsi128_si256): New intrinsics.
> * config/i386/avx512fintrin.h (_mm512_zextpd128_pd512,
> _mm512_zextps128_ps512, _mm512_zextsi128_si512, 
> _mm512_zextpd256_pd512,
> _mm512_zextps256_ps512, _mm512_zextsi256_si512): Likewise.
>
> * gcc.target/i386/avx-typecast-1.c: New test.
> * gcc.target/i386/avx-typecast-2.c: New test.
> * gcc.target/i386/avx512f-typecast-2.c: New test.

OK for AVX, LGTM for AVX512F.

Thanks,
Uros.

>
> --- gcc/config/i386/avxintrin.h.jj  2019-08-05 12:25:34.476667673 +0200
> +++ gcc/config/i386/avxintrin.h 2019-08-12 14:33:07.905601186 +0200
> @@ -1484,6 +1484,26 @@ _mm256_castsi128_si256 (__m128i __A)
>return (__m256i) __builtin_ia32_si256_si ((__v4si)__A);
>  }
>
> +/* Similarly, but with zero extension instead of undefined values.  */
> +
> +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm256_zextpd128_pd256 (__m128d __A)
> +{
> +  return _mm256_insertf128_pd (_mm256_setzero_pd (), __A, 0);
> +}
> +
> +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm256_zextps128_ps256 (__m128 __A)
> +{
> +  return _mm256_insertf128_ps (_mm256_setzero_ps (), __A, 0);
> +}
> +
> +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm256_zextsi128_si256 (__m128i __A)
> +{
> +  return _mm256_insertf128_si256 (_mm256_setzero_si256 (), __A, 0);
> +}
> +
>  extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
>  _mm256_set_m128 ( __m128 __H, __m128 __L)
>  {
> --- gcc/config/i386/avx512fintrin.h.jj  2019-07-12 09:34:49.524385009 +0200
> +++ gcc/config/i386/avx512fintrin.h 2019-08-12 14:36:52.281169281 +0200
> @@ -15437,6 +15437,48 @@ _mm512_castsi256_si512 (__m256i __A)
>return (__m512i)__builtin_ia32_si512_256si ((__v8si)__A);
>  }
>
> +extern __inline __m512d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_zextpd128_pd512 (__m128d __A)
> +{
> +  return (__m512d) _mm512_insertf32x4 (_mm512_setzero_ps (), (__m128) __A, 
> 0);
> +}
> +
> +extern __inline __m512
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_zextps128_ps512 (__m128 __A)
> +{
> +  return _mm512_insertf32x4 (_mm512_setzero_ps (), __A, 0);
> +}
> +
> +extern __inline __m512i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_zextsi128_si512 (__m128i __A)
> +{
> +  return _mm512_inserti32x4 (_mm512_setzero_si512 (), __A, 0);
> +}
> +
> +extern __inline __m512d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_zextpd256_pd512 (__m256d __A)
> +{
> +  return _mm512_insertf64x4 (_mm512_setzero_pd (), __A, 0);
> +}
> +
> +extern __inline __m512
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_zextps256_ps512 (__m256 __A)
> +{
> +  return (__m512) _mm512_insertf64x4 (_mm512_setzero_pd (), (__m256d) __A, 
> 0);
> +}
> +
> +extern __inline __m512i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_zextsi256_si512 (__m256i __A)
> +{
> +  return _mm512_inserti64x4 (_mm512_setzero_si512 (), __A, 0);
> +}
> +
>  extern __inline __mmask16
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_cmpeq_epu32_mask (__m512i __A, __m512i __B)
> --- gcc/testsuite/gcc.target/i386/avx-typecast-1.c.jj   2019-08-12 
> 15:12:51.597209881 +0200
> +++ gcc/testsuite/gcc.target/i386/avx-typecast-1.c  2019-08-12 
> 15:12:47.334274860 +0200
> @@ -0,0 +1,83 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -ma

[PATCH] Add missing _mm{256,512}_zext* intrinsics (PRs target/83250, target/91340)

2019-08-12 Thread Jakub Jelinek
Hi!

The following patch adds 9 missing intrinsics, which are like _mm*_cast*,
but don't leave the upper bits undefined - set them to zero instead.
The implementation uses code that combine manages to optimize well,
the only problem is that as the 512-bit intrinsics are supposed to be
avx512f and some needed intrinsics they'd ideally use are avx512dq, it means
that for _mm512_zextpd128_pd512/_mm512_zextps256_ps512 we emit
vmovaps/vmovapd instead of vmovapd/vmovaps.

I've also discovered that for AVX, there is no test coverage of the various
cast intrinsics, so I've added that too.

The PR has some details on other possible expansions, it would be nice to
optimize also those definitions into the same code, but it will require some
extra define_insn_and_split, though I think that can be done incrementally;
and once done, perhaps we could change the 
_mm512_zextpd128_pd512/_mm512_zextps256_ps512
so that they actually generate the right ps vs. pd variant of move.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-08-12  Jakub Jelinek  

PR target/83250
PR target/91340
* config/i386/avxintrin.h (_mm256_zextpd128_pd256,
_mm256_zextps128_ps256, _mm256_zextsi128_si256): New intrinsics.
* config/i386/avx512fintrin.h (_mm512_zextpd128_pd512,
_mm512_zextps128_ps512, _mm512_zextsi128_si512, _mm512_zextpd256_pd512,
_mm512_zextps256_ps512, _mm512_zextsi256_si512): Likewise.

* gcc.target/i386/avx-typecast-1.c: New test.
* gcc.target/i386/avx-typecast-2.c: New test.
* gcc.target/i386/avx512f-typecast-2.c: New test.

--- gcc/config/i386/avxintrin.h.jj  2019-08-05 12:25:34.476667673 +0200
+++ gcc/config/i386/avxintrin.h 2019-08-12 14:33:07.905601186 +0200
@@ -1484,6 +1484,26 @@ _mm256_castsi128_si256 (__m128i __A)
   return (__m256i) __builtin_ia32_si256_si ((__v4si)__A);
 }
 
+/* Similarly, but with zero extension instead of undefined values.  */
+
+extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm256_zextpd128_pd256 (__m128d __A)
+{
+  return _mm256_insertf128_pd (_mm256_setzero_pd (), __A, 0);
+}
+
+extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm256_zextps128_ps256 (__m128 __A)
+{
+  return _mm256_insertf128_ps (_mm256_setzero_ps (), __A, 0);
+}
+
+extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm256_zextsi128_si256 (__m128i __A)
+{
+  return _mm256_insertf128_si256 (_mm256_setzero_si256 (), __A, 0);
+}
+
 extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm256_set_m128 ( __m128 __H, __m128 __L)
 {
--- gcc/config/i386/avx512fintrin.h.jj  2019-07-12 09:34:49.524385009 +0200
+++ gcc/config/i386/avx512fintrin.h 2019-08-12 14:36:52.281169281 +0200
@@ -15437,6 +15437,48 @@ _mm512_castsi256_si512 (__m256i __A)
   return (__m512i)__builtin_ia32_si512_256si ((__v8si)__A);
 }
 
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_zextpd128_pd512 (__m128d __A)
+{
+  return (__m512d) _mm512_insertf32x4 (_mm512_setzero_ps (), (__m128) __A, 0);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_zextps128_ps512 (__m128 __A)
+{
+  return _mm512_insertf32x4 (_mm512_setzero_ps (), __A, 0);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_zextsi128_si512 (__m128i __A)
+{
+  return _mm512_inserti32x4 (_mm512_setzero_si512 (), __A, 0);
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_zextpd256_pd512 (__m256d __A)
+{
+  return _mm512_insertf64x4 (_mm512_setzero_pd (), __A, 0);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_zextps256_ps512 (__m256 __A)
+{
+  return (__m512) _mm512_insertf64x4 (_mm512_setzero_pd (), (__m256d) __A, 0);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_zextsi256_si512 (__m256i __A)
+{
+  return _mm512_inserti64x4 (_mm512_setzero_si512 (), __A, 0);
+}
+
 extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_cmpeq_epu32_mask (__m512i __A, __m512i __B)
--- gcc/testsuite/gcc.target/i386/avx-typecast-1.c.jj   2019-08-12 
15:12:51.597209881 +0200
+++ gcc/testsuite/gcc.target/i386/avx-typecast-1.c  2019-08-12 
15:12:47.334274860 +0200
@@ -0,0 +1,83 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx" } */
+/* { dg-require-effective-target avx } */
+
+#include "avx-check.h"
+
+extern int memcmp (const void *, const void *, __SIZE_TYPE__);
+
+void
+avx_test (void)
+{
+  union256i_d  a, ad;
+  union256  b, bd;
+  union256d  c, cd;
+  union128i_d  d, dd;
+  union128  e, ed;
+  union128d  f, fd;
+  int i;
+
+  for (i = 0; i < 8; i++)
+{
+  a

Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-06-07 Thread Jeff Law
On 6/3/19 5:06 AM, Jakub Jelinek wrote:
> On Mon, Jun 03, 2019 at 06:01:40PM +0800, Hongtao Liu wrote:
>>   The following patch adds forgotten avx512f fpclass intrinsics for
>> masked scalar operations.
>>
>> Bootstrapped/regtested on x86_64-linux and i686-linux (on skylake-avx512),
>> ok for trunk?
>>
>> Changelog:
>>
>> gcc/
>> +2019-03-24 Hongtao Liu 
>> +
>> + PR target/89803
>> + * config/i386/avx512dqintrin.h
>> + (_mm_mask_fpclass_ss_mask,_mm_mask_fpclass_sd_mask):
>> + New intrinsics.
>> + * config/i386/i386-builtin.def
>> + (__builtin_ia32_fpclassss_mask, _builtin_ia32_fpclasssd_mask):
>> + New builtins.
>> + * config/i386/sse.md
>> + (define_insn "avx512dq_vmfpclass):
>> + Modified with mask.
> 
> Given that the __builtin_ia32_fpclasss[sd] builtins are AVX512DQ only,
> wouldn't it make more sense to remove the __builtin_ia32_fpclasss[sd]
> builtins rather than keep them, adjust _mm_mask_fpclass_ss/_mm_mask_fpclass_sd
> so that they use these new builtins instead of old and pass in -1 and
> make sure we emit the same code as before for those intrinsics?
> 
> We have way too many ia32 builtins.
Can't argue with the too many ia32 builtins.  If we easily collapse
things the way you suggest, it'd be better.

jeff
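
The consolidation suggested above, where the unmasked intrinsic calls the masked builtin with an all-ones mask, can be sketched in portable C. This is a scalar model only: the function names are hypothetical, mask8 stands in for __mmask8, and only two classification categories are modelled, with the immediate bit values 0x08 (+Inf) and 0x10 (-Inf) assumed for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

typedef uint8_t mask8;  /* stand-in for __mmask8 */

/* Hypothetical scalar model of a masked classify builtin: bit 0 of the
   result is set when X falls in a category selected by IMM and bit 0 of
   MASK is set.  Only +Inf (0x08) and -Inf (0x10) are modelled here.  */
static mask8 model_fpclass_mask (double x, int imm, mask8 mask)
{
  int hit = 0;
  if ((imm & 0x08) && isinf (x) && x > 0)
    hit = 1;
  if ((imm & 0x10) && isinf (x) && x < 0)
    hit = 1;
  return (mask8) (hit & mask);
}

/* The unmasked form needs no builtin of its own: pass an all-ones mask.  */
static mask8 model_fpclass (double x, int imm)
{
  return model_fpclass_mask (x, imm, (mask8) -1);
}

/* The unmasked wrapper must agree with the masked form when mask bit 0
   is set, and the masked form must return 0 when it is clear.  */
static void check_mask_model (void)
{
  assert (model_fpclass (INFINITY, 0x08)
          == model_fpclass_mask (INFINITY, 0x08, 1));
  assert (model_fpclass_mask (INFINITY, 0x08, 0) == 0);
}
```

With this shape only the masked builtin has to exist in the compiler, which is the reduction in the number of ia32 builtins being asked for.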


Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-06-04 Thread Hongtao Liu
On Tue, Jun 4, 2019 at 5:56 PM Hongtao Liu  wrote:
>
> On Tue, Jun 4, 2019 at 5:21 PM Jakub Jelinek  wrote:
> >
> > On Tue, Jun 04, 2019 at 05:00:05PM +0800, Hongtao Liu wrote:
> > > Thanks for reminding, Here is updated:
> >
> > You've missed some notes.  Ok for trunk with:
> > 1) the following patch applied on top of your patch
> > 2) the ChangeLog entries moved to the start of the ChangeLog (normally,
> >ChangeLog entries are not added as part of the patch, but before the
> >patch in text form, because the ChangeLog files are updated many times
> >a day
> >
>
> Ok, thanks.
>
> > --- mask_fpclasss[sd]_v3.diff   2019-06-04 11:11:31.007712339 +0200
> > +++ mask_fpclasss[sd]_v3.diff   2019-06-04 11:14:19.581047040 +0200
> > @@ -2,7 +2,7 @@ Index: gcc/ChangeLog
> >  ===
> >  --- gcc/ChangeLog  (revision 271853)
> >  +++ gcc/ChangeLog  (working copy)
> > -@@ -4706,6 +4706,24 @@
> > +@@ -4706,6 +4706,23 @@
> > reprocessing.  Always call df_analyze before fixing up debug bind
> > insns.
> >
> > @@ -12,17 +12,16 @@ Index: gcc/ChangeLog
> >  +  * config/i386/avx512dqintrin.h (_mm_mask_fpclass_ss_mask,
> >  +  _mm_mask_fpclass_sd_mask): New intrinsics.
> >  +  (_mm_fpclass_ss_mask, _mm_fpclass_sd_mask): Modified, use new 
> > builtins.
> > -+  * config/i386/i386-builtin.def
> > -+  (__builtin_ia32_fpclassss_mask, __builtin_ia32_fpclasssd_mask):
> > -+  New builtins.
> > ++  * config/i386/i386-builtin.def (__builtin_ia32_fpclassss_mask,
> > ++  __builtin_ia32_fpclasssd_mask): New builtins.
> >  +  (__builtin_ia32_fpclassss, __builtin_ia32_fpclasssd): Deleted.
> >  +  * config/i386/i386-builtin-types.def (DEF_FUNCTION_TYPE (QI, V2DF, 
> > INT),
> >  +  DEF_FUNCTION_TYPE (QI, V4SF, INT)): Deleted.
> >  +  * config/i386/i386-expand.c (case QI_FTYPE_V4SF_INT,
> >  +  case QI_FTYPE_V2SF_INT): Ditto.
> > -+  * config/i386/sse.md
> > -+  (define_insn "avx512dq_vmfpclass):
> > -+  Extended to insnstructions with mask operands.
> > ++  * config/i386/sse.md (avx512dq_vmfpclass): Rename to ...
> > ++  (avx512dq_vmfpclass): ... this.  Add
> > ++   to insn template.
> >  +
> >   2019-03-23  Segher Boessenkool  
> >
> > @@ -184,10 +183,10 @@ Index: gcc/testsuite/ChangeLog
> >  +  (__builtin_ia32_fpclassss_mask, __builtin_ia32_fpclasssd_mask): 
> > Define.
> >  +  * gcc.target/i386/avx512dq-vfpclassss-2.c: New.
> >  +  * gcc.target/i386/avx512dq-vfpclasssd-2.c: New.
> > -+  * gcc.target/i386/avx512dq-vfpclassss-1.c (avx512f_test):
> > -+  Add test for _mm_mask_fpclass_ss_mask.
> > -+  * gcc.target/i386/avx512dq-vfpclasssd-1.c (avx512f_test):
> > -+  Add test for _mm_mask_fpclass_sd_mask.
> > ++  * gcc.target/i386/avx512dq-vfpclassss-1.c (avx512f_test): Add test 
> > for
> > ++  _mm_mask_fpclass_ss_mask.
> > ++  * gcc.target/i386/avx512dq-vfpclasssd-1.c (avx512f_test): Add test 
> > for
> > ++  _mm_mask_fpclass_sd_mask.
> >  +
> >   2019-03-22  Vladimir Makarov  
> >
> >
> >
> > Jakub
>
>
>
> --
> BR,
> Hongtao

Author: liuhongt
Date: Wed Jun  5 06:04:22 2019
New Revision: 271946

URL: https://gcc.gnu.org/viewcvs?rev=271946&root=gcc&view=rev
Log:
gcc/
2019-06-05  Hongtao Liu  

PR target/89803
* config/i386/avx512dqintrin.h (_mm_mask_fpclass_ss_mask,
_mm_mask_fpclass_sd_mask): New intrinsics.
(_mm_fpclass_ss_mask, _mm_fpclass_sd_mask): Modified, use new builtins.
* config/i386/i386-builtin.def
(__builtin_ia32_fpclassss_mask, __builtin_ia32_fpclasssd_mask):
New builtins.
(__builtin_ia32_fpclassss, __builtin_ia32_fpclasssd): Deleted.
* config/i386/i386-builtin-types.def (DEF_FUNCTION_TYPE (QI, V2DF,
INT),
DEF_FUNCTION_TYPE (QI, V4SF, INT)): Deleted.
* config/i386/i386-expand.c (case QI_FTYPE_V4SF_INT,
case QI_FTYPE_V2SF_INT): Ditto.
* config/i386/sse.md
(define_insn "avx512dq_vmfpclass):
Extended to insnstructions with mask operands.

gcc/testsuite
2019-06-05  Hongtao Liu  

PR target/89803
* gcc.target/i386/avx-1.c (__builtin_ia32_fpclassss,
__builtin_ia32_fpclasssd): Removed.
(__builtin_ia32_fpclassss_mask, __builtin_ia32_fpclasssd_mask): Define.
* gcc.target/i386/sse-13.c (__builtin_ia32_fpclassss,
__builtin_ia32_fpclasssd): Removed.
(__builtin_ia32_fpclassss_mask, __builtin_ia32_fpclasssd_mask): Define.
* gcc.target/i386/sse-23.c (__builtin_ia32_fpclassss,
__builtin_ia32_fpclasssd): Removed.
(__builtin_ia32_fpclassss_mask, __builtin_ia32_fpclasssd_mask): Define.
* gcc.target/i386/avx512dq-vfpclassss-2.c: New.
* gcc.target/i386/avx512dq-vfpclasssd-2.c: New.
* gcc.target/i386/avx512dq-vfpclassss-1.c (avx512f_test):
Add test for _mm_mask_fpclass_ss_mask.
* gcc.target/i3

Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-06-04 Thread Hongtao Liu
On Tue, Jun 4, 2019 at 5:21 PM Jakub Jelinek  wrote:
>
> On Tue, Jun 04, 2019 at 05:00:05PM +0800, Hongtao Liu wrote:
> > Thanks for reminding, Here is updated:
>
> You've missed some notes.  Ok for trunk with:
> 1) the following patch applied on top of your patch
> 2) the ChangeLog entries moved to the start of the ChangeLog (normally,
>ChangeLog entries are not added as part of the patch, but before the
>patch in text form, because the ChangeLog files are updated many times
>a day
>

Ok, thanks.

> --- mask_fpclasss[sd]_v3.diff   2019-06-04 11:11:31.007712339 +0200
> +++ mask_fpclasss[sd]_v3.diff   2019-06-04 11:14:19.581047040 +0200
> @@ -2,7 +2,7 @@ Index: gcc/ChangeLog
>  ===
>  --- gcc/ChangeLog  (revision 271853)
>  +++ gcc/ChangeLog  (working copy)
> -@@ -4706,6 +4706,24 @@
> +@@ -4706,6 +4706,23 @@
> reprocessing.  Always call df_analyze before fixing up debug bind
> insns.
>
> @@ -12,17 +12,16 @@ Index: gcc/ChangeLog
>  +  * config/i386/avx512dqintrin.h (_mm_mask_fpclass_ss_mask,
>  +  _mm_mask_fpclass_sd_mask): New intrinsics.
>  +  (_mm_fpclass_ss_mask, _mm_fpclass_sd_mask): Modified, use new 
> builtins.
> -+  * config/i386/i386-builtin.def
> -+  (__builtin_ia32_fpcla_mask, __builtin_ia32_fpclasssd_mask):
> -+  New builtins.
> ++  * config/i386/i386-builtin.def (__builtin_ia32_fpcla_mask,
> ++  __builtin_ia32_fpclasssd_mask): New builtins.
>  +  (__builtin_ia32_fpcla, __builtin_ia32_fpclasssd): Deleted.
>  +  * config/i386/i386-builtin-types.def (DEF_FUNCTION_TYPE (QI, V2DF, 
> INT),
>  +  DEF_FUNCTION_TYPE (QI, V4SF, INT)): Deleted.
>  +  * config/i386/i386-expand.c (case QI_FTYPE_V4SF_INT,
>  +  case QI_FTYPE_V2SF_INT): Ditto.
> -+  * config/i386/sse.md
> -+  (define_insn "avx512dq_vmfpclass):
> -+  Extended to insnstructions with mask operands.
> ++  * config/i386/sse.md (avx512dq_vmfpclass): Rename to ...
> ++  (avx512dq_vmfpclass): ... this.  Add
> ++   to insn template.
>  +
>   2019-03-23  Segher Boessenkool  
>
> @@ -184,10 +183,10 @@ Index: gcc/testsuite/ChangeLog
>  +  (__builtin_ia32_fpclassss_mask, __builtin_ia32_fpclasssd_mask): 
> Define.
>  +  * gcc.target/i386/avx512dq-vfpclassss-2.c: New.
>  +  * gcc.target/i386/avx512dq-vfpclasssd-2.c: New.
> -+  * gcc.target/i386/avx512dq-vfpclassss-1.c (avx512f_test):
> -+  Add test for _mm_mask_fpclass_ss_mask.
> -+  * gcc.target/i386/avx512dq-vfpclasssd-1.c (avx512f_test):
> -+  Add test for _mm_mask_fpclass_sd_mask.
> ++  * gcc.target/i386/avx512dq-vfpclassss-1.c (avx512f_test): Add test for
> ++  _mm_mask_fpclass_ss_mask.
> ++  * gcc.target/i386/avx512dq-vfpclasssd-1.c (avx512f_test): Add test for
> ++  _mm_mask_fpclass_sd_mask.
>  +
>   2019-03-22  Vladimir Makarov  
>
>
>
> Jakub



-- 
BR,
Hongtao


Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-06-04 Thread Jakub Jelinek
On Tue, Jun 04, 2019 at 05:00:05PM +0800, Hongtao Liu wrote:
> Thanks for reminding, Here is updated:

You've missed some notes.  Ok for trunk with:
1) the following patch applied on top of your patch
2) the ChangeLog entries moved to the start of the ChangeLog (normally,
   ChangeLog entries are not added as part of the patch, but before the
   patch in text form, because the ChangeLog files are updated many times
   a day

--- mask_fpclasss[sd]_v3.diff   2019-06-04 11:11:31.007712339 +0200
+++ mask_fpclasss[sd]_v3.diff   2019-06-04 11:14:19.581047040 +0200
@@ -2,7 +2,7 @@ Index: gcc/ChangeLog
 ===
 --- gcc/ChangeLog  (revision 271853)
 +++ gcc/ChangeLog  (working copy)
-@@ -4706,6 +4706,24 @@
+@@ -4706,6 +4706,23 @@
reprocessing.  Always call df_analyze before fixing up debug bind
insns.
  
@@ -12,17 +12,16 @@ Index: gcc/ChangeLog
 +  * config/i386/avx512dqintrin.h (_mm_mask_fpclass_ss_mask,
 +  _mm_mask_fpclass_sd_mask): New intrinsics.
 +  (_mm_fpclass_ss_mask, _mm_fpclass_sd_mask): Modified, use new builtins.
-+  * config/i386/i386-builtin.def
-+  (__builtin_ia32_fpclassss_mask, __builtin_ia32_fpclasssd_mask):
-+  New builtins.
++  * config/i386/i386-builtin.def (__builtin_ia32_fpclassss_mask,
++  __builtin_ia32_fpclasssd_mask): New builtins.
 +  (__builtin_ia32_fpclassss, __builtin_ia32_fpclasssd): Deleted.
 +  * config/i386/i386-builtin-types.def (DEF_FUNCTION_TYPE (QI, V2DF, INT),
 +  DEF_FUNCTION_TYPE (QI, V4SF, INT)): Deleted.
 +  * config/i386/i386-expand.c (case QI_FTYPE_V4SF_INT,
 +  case QI_FTYPE_V2SF_INT): Ditto.
-+  * config/i386/sse.md
-+  (define_insn "avx512dq_vmfpclass):
-+  Extended to insnstructions with mask operands.
++  * config/i386/sse.md (avx512dq_vmfpclass): Rename to ...
++  (avx512dq_vmfpclass): ... this.  Add
++   to insn template.
 +
  2019-03-23  Segher Boessenkool  
  
@@ -184,10 +183,10 @@ Index: gcc/testsuite/ChangeLog
 +  (__builtin_ia32_fpclassss_mask, __builtin_ia32_fpclasssd_mask): Define.
 +  * gcc.target/i386/avx512dq-vfpclassss-2.c: New.
 +  * gcc.target/i386/avx512dq-vfpclasssd-2.c: New.
-+  * gcc.target/i386/avx512dq-vfpclassss-1.c (avx512f_test):
-+  Add test for _mm_mask_fpclass_ss_mask.
-+  * gcc.target/i386/avx512dq-vfpclasssd-1.c (avx512f_test):
-+  Add test for _mm_mask_fpclass_sd_mask.
++  * gcc.target/i386/avx512dq-vfpclassss-1.c (avx512f_test): Add test for
++  _mm_mask_fpclass_ss_mask.
++  * gcc.target/i386/avx512dq-vfpclasssd-1.c (avx512f_test): Add test for
++  _mm_mask_fpclass_sd_mask.
 +
  2019-03-22  Vladimir Makarov  
  


Jakub


Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-06-04 Thread Hongtao Liu
On Tue, Jun 4, 2019 at 3:59 PM Jakub Jelinek  wrote:
>
> On Tue, Jun 04, 2019 at 03:38:08PM +0800, Hongtao Liu wrote:
> > --- gcc/ChangeLog (revision 271853)
> > +++ gcc/ChangeLog (working copy)
> > @@ -4706,6 +4706,26 @@
> >   reprocessing.  Always call df_analyze before fixing up debug bind
> >   insns.
> >
> > +2019-03-24 Hongtao Liu   
>
> name should be separated from date and email by 2 spaces on each side,
> you have just one space before and a tab after.
>
> > +
> > + PR target/89803
> > + * config/i386/avx512dqintrin.h
> > + (_mm_mask_fpclass_ss_mask,_mm_mask_fpclass_sd_mask):
> > + New intrinsics.
>
> There should be space after comma, and a line break should be there
> only when it will not fit, so:
>
> +   * config/i386/avx512dqintrin.h (_mm_mask_fpclass_ss_mask,
> +   _mm_mask_fpclass_sd_mask): New intrinsics.
>
> > + (_mm_fpclass_ss_mask,_mm_fpclass_sd_mask):
> > + Modified, use new builtins.
>
> Similarly.
>
> > + * config/i386/i386-builtin.def
> > + (__builtin_ia32_fpclassss_mask, _builtin_ia32_fpclasssd_mask):
> > + New builtins.
>
> Again.
>
> > + (__builtin_ia32_fpclassss, _builtin_ia32_fpclasssd): Deleted.
> > + * config/i386/i386-builtin-types.def:
> > + Delete relate types.
>
> You should say what exactly you've deleted, so
>
> +   * config/i386/i386-builtin-types.def (QI_FTYPE_V2DF_INT,
> +   QI_FTYPE_V4SF_INT): Remove.
>
> > + * config/i386/i386-expand.c:
> > + Ditto.
>
> Mention what you've changed, so
>
> +   * config/i386/i386-expand.c (ix86_expand_args_builtin): Remove
> +   QI_FTYPE_V2DF_INT and QI_FTYPE_V4SF_INT cases.
>
> > + * config/i386/sse.md
> > + (define_insn "avx512dq_vmfpclass):
> > + Modified with mask.
>
> That is not what you've done.
>
> +   * config/i386/sse.md (avx512dq_vmfpclass): Rename to ...
> +   (avx512dq_vmfpclass): ... this.  Add
> +to insn template.
>
> > --- gcc/config/i386/avx512dqintrin.h  (revision 271853)
> > +++ gcc/config/i386/avx512dqintrin.h  (working copy)
> > @@ -1362,7 +1362,7 @@
> >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> >  _mm_fpclass_ss_mask (__m128 __A, const int __imm)
> >  {
> > -  return (__mmask8) __builtin_ia32_fpclassss ((__v4sf) __A, __imm);
> > +  return (__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) __A, __imm, 
> > -1);
>
> Most other avx512*.h code uses explicit (__mmaskN) -1 instead of just -1, so
> perhaps for consistency use:
> +  return (__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) __A, __imm,
> +   (__mmask8) -1);
> ?
>
> >  }
> >
> >  extern __inline __mmask8
> > @@ -1369,9 +1369,23 @@
> >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> >  _mm_fpclass_sd_mask (__m128d __A, const int __imm)
> >  {
> > -  return (__mmask8) __builtin_ia32_fpclasssd ((__v2df) __A, __imm);
> > +  return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, 
> > -1);
> >  }
>
> Likewise.
>
> >  #define _mm_fpclass_ss_mask(X, C)  
> >   \
> > -  ((__mmask8) __builtin_ia32_fpclassss ((__v4sf) (__m128) (X), (int) (C))) 
> >  \
> > +  ((__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) (__m128) (X), (int) 
> > (C), (__mmask8) (-1))) \
> >
> >  #define _mm_fpclass_sd_mask(X, C)  
> >   \
> > -  ((__mmask8) __builtin_ia32_fpclasssd ((__v2df) (__m128d) (X), (int) 
> > (C))) \
> > +  ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X), (int) 
> > (C), (__mmask8) (-1))) \
> >
> > +#define _mm_mask_fpclass_ss_mask(X, C, U)  
> >   \
> > +  ((__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) (__m128) (X), (int) 
> > (C), (__mmask8) (U)))
> > +
> > +#define _mm_mask_fpclass_sd_mask(X, C, U)  
> >   \
> > +  ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X), (int) 
> > (C), (__mmask8) (U)))
>
> Too long lines.
>
> > +2019-03-24 Hongtao Liu 
> > +
> > + PR target/89803
> > + * gcc.target/i386/avx-1.c
> > + (__builtin_ia32_fpclasss[sd]): Replaced with 
> > builtin_ia32_fpclasss[sd]_mask.
> > + * gcc.target/i386/sse-13.c:
> > + (__builtin_ia32_fpclasss[sd]): Likewise.
> > + * gcc.target/i386/sse-23.c
> > + (__builtin_ia32_fpclasss[sd]): Likewise.
>
> Similar problems in this ChangeLog as in gcc/ChangeLog, you don't want a
> linebreak after the filename if the function name can fit in, too long line
> too, sse-13.c has an extra : after it and I believe we don't allow wildcards
> in the function names between
> ()s, so it should be:
> +   * gcc.target/i386/avx-1.c (__builtin_ia32_fpclassss,
> +   __builtin_ia32_fpclasssd): Remove.
> +   (__builtin_ia32_fpclassss_mask, __builtin_ia32_fpclasssd_mask): 
> Define.
> etc.
>
> Jakub

Thanks for reminding, Here is updated:

Index: gcc/ChangeLog

Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-06-04 Thread Jakub Jelinek
On Tue, Jun 04, 2019 at 03:38:08PM +0800, Hongtao Liu wrote:
> --- gcc/ChangeLog (revision 271853)
> +++ gcc/ChangeLog (working copy)
> @@ -4706,6 +4706,26 @@
>   reprocessing.  Always call df_analyze before fixing up debug bind
>   insns.
>  
> +2019-03-24 Hongtao Liu   

name should be separated from date and email by 2 spaces on each side,
you have just one space before and a tab after.

> +
> + PR target/89803
> + * config/i386/avx512dqintrin.h
> + (_mm_mask_fpclass_ss_mask,_mm_mask_fpclass_sd_mask):
> + New intrinsics.

There should be space after comma, and a line break should be there
only when it will not fit, so:

+   * config/i386/avx512dqintrin.h (_mm_mask_fpclass_ss_mask,
+   _mm_mask_fpclass_sd_mask): New intrinsics.

> + (_mm_fpclass_ss_mask,_mm_fpclass_sd_mask):
> + Modified, use new builtins.

Similarly.

> + * config/i386/i386-builtin.def
> + (__builtin_ia32_fpcla_mask, _builtin_ia32_fpclasssd_mask):
> + New builtins.

Again.

> + (__builtin_ia32_fpcla, _builtin_ia32_fpclasssd): Deleted.
> + * config/i386/i386-builtin-types.def:
> + Delete relate types.

You should say what exactly you've deleted, so

+   * config/i386/i386-builtin-types.def (QI_FTYPE_V2DF_INT,
+   QI_FTYPE_V4SF_INT): Remove.

> + * config/i386/i386-expand.c:
> + Ditto.

Mention what you've changed, so

+   * config/i386/i386-expand.c (ix86_expand_args_builtin): Remove
+   QI_FTYPE_V2DF_INT and QI_FTYPE_V4SF_INT cases.

> + * config/i386/sse.md
> + (define_insn "avx512dq_vmfpclass):
> + Modified with mask.

That is not what you've done.

+   * config/i386/sse.md (avx512dq_vmfpclass): Rename to ...
+   (avx512dq_vmfpclass): ... this.  Add
+to insn template.

> --- gcc/config/i386/avx512dqintrin.h  (revision 271853)
> +++ gcc/config/i386/avx512dqintrin.h  (working copy)
> @@ -1362,7 +1362,7 @@
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_fpclass_ss_mask (__m128 __A, const int __imm)
>  {
> -  return (__mmask8) __builtin_ia32_fpcla ((__v4sf) __A, __imm);
> +  return (__mmask8) __builtin_ia32_fpcla_mask ((__v4sf) __A, __imm, -1);

Most other avx512*.h code uses explicit (__mmaskN) -1 instead of just -1, so
perhaps for consistency use:
+  return (__mmask8) __builtin_ia32_fpcla_mask ((__v4sf) __A, __imm,
+   (__mmask8) -1);
?

>  }
>  
>  extern __inline __mmask8
> @@ -1369,9 +1369,23 @@
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_fpclass_sd_mask (__m128d __A, const int __imm)
>  {
> -  return (__mmask8) __builtin_ia32_fpclasssd ((__v2df) __A, __imm);
> +  return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, -1);
>  }

Likewise.

>  #define _mm_fpclass_ss_mask(X, C)
> \
> -  ((__mmask8) __builtin_ia32_fpcla ((__v4sf) (__m128) (X), (int) (C)))  \
> +  ((__mmask8) __builtin_ia32_fpcla_mask ((__v4sf) (__m128) (X), (int) 
> (C), (__mmask8) (-1))) \
>  
>  #define _mm_fpclass_sd_mask(X, C)
> \
> -  ((__mmask8) __builtin_ia32_fpclasssd ((__v2df) (__m128d) (X), (int) (C))) \
> +  ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X), (int) 
> (C), (__mmask8) (-1))) \
>  
> +#define _mm_mask_fpclass_ss_mask(X, C, U)
> \
> +  ((__mmask8) __builtin_ia32_fpcla_mask ((__v4sf) (__m128) (X), (int) 
> (C), (__mmask8) (U)))
> +
> +#define _mm_mask_fpclass_sd_mask(X, C, U)
> \
> +  ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X), (int) 
> (C), (__mmask8) (U)))

Too long lines.

> +2019-03-24 Hongtao Liu 
> +
> + PR target/89803
> + * gcc.target/i386/avx-1.c
> + (__builtin_ia32_fpclasss[sd]): Replaced with 
> builtin_ia32_fpclasss[sd]_mask.
> + * gcc.target/i386/sse-13.c:
> + (__builtin_ia32_fpclasss[sd]): Likewise.
> + * gcc.target/i386/sse-23.c
> + (__builtin_ia32_fpclasss[sd]): Likewise.

Similar problems in this ChangeLog as in gcc/ChangeLog: you don't want a
line break after the filename if the function name fits on the same line,
there is a too long line too, sse-13.c has an extra : after it, and I believe
we don't allow wildcards in the function names between ()s, so it should be:
+   * gcc.target/i386/avx-1.c (__builtin_ia32_fpcla,
+   __builtin_ia32_fpclasssd): Remove.
+   (__builtin_ia32_fpcla_mask, __builtin_ia32_fpclasssd_mask): Define.
etc.

Jakub


Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-06-04 Thread Hongtao Liu
On Mon, Jun 3, 2019 at 7:06 PM Jakub Jelinek  wrote:
>
> On Mon, Jun 03, 2019 at 06:01:40PM +0800, Hongtao Liu wrote:
> > >   The following patch adds forgotten avx512f fpclass intrinsics for
> > masked scalar operations.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux (on skylake-avx512),
> > ok for trunk?
> >
> > Changelog:
> >
> > gcc/
> > +2019-03-24 Hongtao Liu 
> > +
> > + PR target/89803
> > + * config/i386/avx512dqintrin.h
> > + (_mm_mask_fpclass_ss_mask,_mm_mask_fpclass_sd_mask):
> > + New intrinsics.
> > + * config/i386/i386-builtin.def
> > + (__builtin_ia32_fpcla_mask, _builtin_ia32_fpclasssd_mask):
> > + New builtins.
> > + * config/i386/sse.md
> > + (define_insn "avx512dq_vmfpclass):
> > + Modified with mask.
>
> Given that the __builtin_ia32_fpclasss[sd] builtins are AVX512DQ only,
> wouldn't it make more sense to remove the __builtin_ia32_fpclasss[sd]
> builtins rather than keep them, adjust _mm_mask_fpclass_ss/_mm_mask_fpclass_sd
> so that they use these new builtins instead of old and pass in -1 and
> make sure we emit the same code as before for those intrinsics?
>
> We have way too many ia32 builtins.
>
> Jakub

Yes, here is the updated patch.

-- 
BR,
Hongtao
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 271853)
+++ gcc/ChangeLog	(working copy)
@@ -4706,6 +4706,26 @@
 	reprocessing.  Always call df_analyze before fixing up debug bind
 	insns.
 
+2019-03-24 Hongtao Liu	
+
+	PR target/89803
+	* config/i386/avx512dqintrin.h
+	(_mm_mask_fpclass_ss_mask,_mm_mask_fpclass_sd_mask):
+	New intrinsics.
+	(_mm_fpclass_ss_mask,_mm_fpclass_sd_mask):
+	Modified, use new builtins.
+	* config/i386/i386-builtin.def
+	(__builtin_ia32_fpcla_mask, _builtin_ia32_fpclasssd_mask):
+	New builtins.
+	(__builtin_ia32_fpcla, _builtin_ia32_fpclasssd): Deleted.
+	* config/i386/i386-builtin-types.def:
+	Delete relate types.
+	* config/i386/i386-expand.c:
+	Ditto.
+	* config/i386/sse.md
+	(define_insn "avx512dq_vmfpclass):
+	Modified with mask.
+
 2019-03-23  Segher Boessenkool  
 
 	* config/rs6000/xmmintrin.h (_mm_movemask_pi8): Implement for 32-bit
Index: gcc/config/i386/avx512dqintrin.h
===
--- gcc/config/i386/avx512dqintrin.h	(revision 271853)
+++ gcc/config/i386/avx512dqintrin.h	(working copy)
@@ -1362,7 +1362,7 @@
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fpclass_ss_mask (__m128 __A, const int __imm)
 {
-  return (__mmask8) __builtin_ia32_fpcla ((__v4sf) __A, __imm);
+  return (__mmask8) __builtin_ia32_fpcla_mask ((__v4sf) __A, __imm, -1);
 }
 
 extern __inline __mmask8
@@ -1369,9 +1369,23 @@
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fpclass_sd_mask (__m128d __A, const int __imm)
 {
-  return (__mmask8) __builtin_ia32_fpclasssd ((__v2df) __A, __imm);
+  return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, -1);
 }
 
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fpclass_ss_mask (__mmask8 __U, __m128 __A, const int __imm)
+{
+  return (__mmask8) __builtin_ia32_fpcla_mask ((__v4sf) __A, __imm, __U);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fpclass_sd_mask (__mmask8 __U, __m128d __A, const int __imm)
+{
+  return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, __U);
+}
+
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_cvtt_roundpd_epi64 (__m512d __A, const int __R)
@@ -2618,11 +2632,17 @@
 (__mmask16)(U)))
 
 #define _mm_fpclass_ss_mask(X, C)		\
-  ((__mmask8) __builtin_ia32_fpcla ((__v4sf) (__m128) (X), (int) (C)))  \
+  ((__mmask8) __builtin_ia32_fpcla_mask ((__v4sf) (__m128) (X), (int) (C), (__mmask8) (-1))) \
 
 #define _mm_fpclass_sd_mask(X, C)		\
-  ((__mmask8) __builtin_ia32_fpclasssd ((__v2df) (__m128d) (X), (int) (C))) \
+  ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X), (int) (C), (__mmask8) (-1))) \
 
+#define _mm_mask_fpclass_ss_mask(X, C, U)	\
+  ((__mmask8) __builtin_ia32_fpcla_mask ((__v4sf) (__m128) (X), (int) (C), (__mmask8) (U)))
+
+#define _mm_mask_fpclass_sd_mask(X, C, U)	\
+  ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X), (int) (C), (__mmask8) (U)))
+
 #define _mm512_mask_fpclass_pd_mask(u, X, C)\
   ((__mmask8) __builtin_ia32_fpclasspd512_mask ((__v8df) (__m512d) (X), \
 		(int) (C), (__mmask8)(u)))
Index: gcc/config/i386/i386-builtin-types.def
===
--- gcc/config/i386/i386-builtin-types.def	(revision 271853)
+++ gcc/config/i386/i386-builtin-types.def	(working copy)
@@ -964,11 +964,9 @@
 DEF_FUNCTION_TYPE (QI, V8DF, INT)
 DEF_FUNCTION_TYPE (QI, V4DF, INT)
 DEF_FUNCTION_TYPE (QI, V4DF, 

Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-06-03 Thread Jakub Jelinek
On Mon, Jun 03, 2019 at 06:01:40PM +0800, Hongtao Liu wrote:
>   The following patch adds forgotten avx512f fpclass intrinsics for
> masked scalar operations.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux (on skylake-avx512),
> ok for trunk?
> 
> Changelog:
> 
> gcc/
> +2019-03-24 Hongtao Liu 
> +
> + PR target/89803
> + * config/i386/avx512dqintrin.h
> + (_mm_mask_fpclass_ss_mask,_mm_mask_fpclass_sd_mask):
> + New intrinsics.
> + * config/i386/i386-builtin.def
> + (__builtin_ia32_fpcla_mask, _builtin_ia32_fpclasssd_mask):
> + New builtins.
> + * config/i386/sse.md
> + (define_insn "avx512dq_vmfpclass):
> + Modified with mask.

Given that the __builtin_ia32_fpclasss[sd] builtins are AVX512DQ only,
wouldn't it make more sense to remove the __builtin_ia32_fpclasss[sd]
builtins rather than keep them, adjust _mm_mask_fpclass_ss/_mm_mask_fpclass_sd
so that they use these new builtins instead of old and pass in -1 and
make sure we emit the same code as before for those intrinsics?

We have way too many ia32 builtins.

Jakub
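
To illustrate the suggestion above, here is a minimal, portable sketch of how
the unmasked scalar intrinsics could be expressed through the masked ones by
passing an all-ones mask.  The names mask8, model_fpclass_sd_mask and
model_fpclass_sd are hypothetical stand-ins rather than GCC builtins, and the
category test is reduced to "is infinite" purely for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for __mmask8.
using mask8 = std::uint8_t;

// Model of a masked scalar classify: only bit 0 of the result is meaningful,
// and it is cleared when bit 0 of the write-mask u is 0.  The real
// instruction selects categories through an immediate; this sketch assumes a
// single fixed category ("is infinite") to stay portable.
inline mask8 model_fpclass_sd_mask (double x, mask8 u)
{
  mask8 hit = std::isinf (x) ? 1 : 0;
  return static_cast<mask8> (hit & u & 1);
}

// The unmasked form is the masked form with an all-ones mask, written with
// an explicit cast as in the other avx512*.h headers.
inline mask8 model_fpclass_sd (double x)
{
  return model_fpclass_sd_mask (x, (mask8) -1);
}

// Small self-check of the relationship between the two forms.
inline void check_fpclass_model ()
{
  assert (model_fpclass_sd (INFINITY) == model_fpclass_sd_mask (INFINITY, 0xFF));
  assert (model_fpclass_sd_mask (INFINITY, 0) == 0);
}
```

With this shape only the masked builtin has to exist, and the unmasked
wrapper should produce the same result as before as long as the mask is all
ones.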


[PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-06-03 Thread Hongtao Liu
Hi Jeff:
  The following patch adds forgotten avx512f fpclass intrinsics for
masked scalar operations.

Bootstrapped/regtested on x86_64-linux and i686-linux (on skylake-avx512),
ok for trunk?

Changelog:

gcc/
+2019-03-24 Hongtao Liu 
+
+ PR target/89803
+ * config/i386/avx512dqintrin.h
+ (_mm_mask_fpclass_ss_mask,_mm_mask_fpclass_sd_mask):
+ New intrinsics.
+ * config/i386/i386-builtin.def
+ (__builtin_ia32_fpcla_mask, _builtin_ia32_fpclasssd_mask):
+ New builtins.
+ * config/i386/sse.md
+ (define_insn "avx512dq_vmfpclass):
+ Modified with mask.

gcc/testsuite
+2019-03-24 Hongtao Liu 
+
+ PR target/89803
+ * gcc.target/i386/avx-1.c (__builtin_ia32_fpcla_mask,
+ __builtin_ia32_fpclasssd_mask): Define.
+ * gcc.target/i386/sse-13.c (__builtin_ia32_fpcla_mask,
+ __builtin_ia32_fpclasssd_mask): Likewise.
+ * gcc.target/i386/sse-23.c (__builtin_ia32_fpcla_mask)
+ (__builtin_ia32_fpclasssd_mask): Likewise.
+ * gcc.target/i386/avx512dq-vfpcla-2.c: New.
+ * gcc.target/i386/avx512dq-vfpclasssd-2.c: Likewise.
+ * gcc.target/i386/avx512dq-vfpcla-1.c (avx512f_test):
+ Add test for _mm_mask_fpclass_ss_mask.
+ * gcc.target/i386/avx512dq-vfpclasssd-1.c (avx512f_test):
+ Add test for _mm_mask_fpclass_sd_mask.

-- 
BR,
Hongtao
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 271853)
+++ gcc/ChangeLog	(working copy)
@@ -4706,6 +4706,19 @@
 	reprocessing.  Always call df_analyze before fixing up debug bind
 	insns.
 
+2019-03-24 Hongtao Liu	
+
+	PR target/89803
+	* config/i386/avx512dqintrin.h
+	(_mm_mask_fpclass_ss_mask,_mm_mask_fpclass_sd_mask):
+	New intrinsics.
+	* config/i386/i386-builtin.def
+	(__builtin_ia32_fpcla_mask, _builtin_ia32_fpclasssd_mask):
+	New builtins.
+	* config/i386/sse.md
+	(define_insn "avx512dq_vmfpclass):
+	Modified with mask.
+
 2019-03-23  Segher Boessenkool  
 
 	* config/rs6000/xmmintrin.h (_mm_movemask_pi8): Implement for 32-bit
Index: gcc/config/i386/avx512dqintrin.h
===
--- gcc/config/i386/avx512dqintrin.h	(revision 271853)
+++ gcc/config/i386/avx512dqintrin.h	(working copy)
@@ -1372,6 +1372,20 @@
   return (__mmask8) __builtin_ia32_fpclasssd ((__v2df) __A, __imm);
 }
 
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fpclass_ss_mask (__mmask8 __U, __m128 __A, const int __imm)
+{
+  return (__mmask8) __builtin_ia32_fpcla_mask ((__v4sf) __A, __imm, __U);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fpclass_sd_mask (__mmask8 __U, __m128d __A, const int __imm)
+{
+  return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, __U);
+}
+
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_cvtt_roundpd_epi64 (__m512d __A, const int __R)
@@ -2623,6 +2637,12 @@
 #define _mm_fpclass_sd_mask(X, C)		\
   ((__mmask8) __builtin_ia32_fpclasssd ((__v2df) (__m128d) (X), (int) (C))) \
 
+#define _mm_mask_fpclass_ss_mask(X, C, U)	\
+  ((__mmask8) __builtin_ia32_fpcla_mask ((__v4sf) (__m128) (X), (int) (C)), (__mmask8) (U))
+
+#define _mm_mask_fpclass_sd_mask(X, C, U)	\
+  ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X), (int) (C)), (__mmask8) (U))
+
 #define _mm512_mask_fpclass_pd_mask(u, X, C)\
   ((__mmask8) __builtin_ia32_fpclasspd512_mask ((__v8df) (__m512d) (X), \
 		(int) (C), (__mmask8)(u)))
Index: gcc/config/i386/i386-builtin.def
===
--- gcc/config/i386/i386-builtin.def	(revision 271853)
+++ gcc/config/i386/i386-builtin.def	(working copy)
@@ -2086,9 +2086,11 @@
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_fpclassv4df_mask, "__builtin_ia32_fpclasspd256_mask", IX86_BUILTIN_FPCLASSPD256, UNKNOWN, (int) QI_FTYPE_V4DF_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_fpclassv2df_mask, "__builtin_ia32_fpclasspd128_mask", IX86_BUILTIN_FPCLASSPD128, UNKNOWN, (int) QI_FTYPE_V2DF_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vmfpclassv2df, "__builtin_ia32_fpclasssd", IX86_BUILTIN_FPCLASSSD, UNKNOWN, (int) QI_FTYPE_V2DF_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vmfpclassv2df_mask, "__builtin_ia32_fpclasssd_mask", IX86_BUILTIN_FPCLASSSD_MASK, UNKNOWN, (int) QI_FTYPE_V2DF_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_fpclassv8sf_mask, "__builtin_ia32_fpclassps256_mask", IX86_BUILTIN_FPCLASSPS256, UNKNOWN, (int) QI_FTYPE_V8SF_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_fpclassv4sf_mask, "__builtin_ia32_fpclassps128_mask", IX86_BUILTIN_FPCLASSPS128, UNKNOWN, (int) QI_FTYPE_V4SF_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vmfpc

[PATCH] Add missing feature test macro to C++17 status table

2019-05-23 Thread Jonathan Wakely

* doc/xml/manual/status_cxx2017.xml: Add feature test macro for
P0040R3.
* doc/html/*: Regenerate.

Committed to trunk.
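
For readers who want to use the macro documented by this change, here is a
small sketch of testing __cpp_lib_raw_memory_algorithms before relying on the
C++17 raw memory algorithms.  The helper name construct_and_sum is made up
for this example, and the fallback branch is only an assumption about what
user code might do on an older library.

```cpp
#include <cassert>
#include <memory>
#include <new>

// Sums four value-initialised ints built in raw storage.
inline int construct_and_sum ()
{
  alignas (int) unsigned char storage[4 * sizeof (int)];
  int *first = reinterpret_cast<int *> (storage);
#if defined(__cpp_lib_raw_memory_algorithms) \
    && __cpp_lib_raw_memory_algorithms >= 201606L
  // The library advertises the C++17 raw memory algorithms.
  std::uninitialized_value_construct_n (first, 4);
#else
  // Fallback for older libraries: placement new, one element at a time.
  for (int i = 0; i < 4; ++i)
    ::new (static_cast<void *> (first + i)) int ();
#endif
  int sum = 0;
  for (int i = 0; i < 4; ++i)
    sum += first[i];
  // int is trivially destructible, so no destroy step is needed here.
  return sum;
}

// Both branches value-initialise the ints, so the sum is zero either way.
inline void check_construct_and_sum ()
{
  assert (construct_and_sum () == 0);
}
```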


commit f58762124c374887a5785a6f4a812af64fe5b2f1
Author: Jonathan Wakely 
Date:   Thu May 23 17:00:37 2019 +0100

Add missing feature test macro to C++17 status table

* doc/xml/manual/status_cxx2017.xml: Add feature test macro for
P0040R3.
* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml 
b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index 73403ef6ba0..a11e93cda90 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -392,7 +392,7 @@ Feature-testing recommendations for C++.

   
7.1 
-   
+  __cpp_lib_raw_memory_algorithms >= 201606L
 
 
 


[PATCH] Add missing piece of P0777R1 and update C++20 status docs

2019-05-17 Thread Jonathan Wakely

* doc/xml/manual/status_cxx2020.xml: Update P0608R3, P0777R1, and
P1165R1 entries.
* doc/html/*: Regenerate.
* include/std/tuple (make_from_tuple): Use remove_reference_t instead
of decay_t (P0777R1).

Tested powerpc64le-linux, committed to trunk.
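
As a rough illustration of why remove_reference_t is enough here, the sketch
below computes a tuple size after stripping only the reference and compares
it with the decay_t result.  The helper tuple_elements and the Point type are
invented for this example and are not part of libstdc++.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Number of elements of a tuple-like type, stripping only the reference and
// leaving cv-qualifiers in place.
template <typename Tuple>
constexpr std::size_t tuple_elements ()
{
  return std::tuple_size_v<std::remove_reference_t<Tuple>>;
}

// tuple_size is specialised for cv-qualified types, so no decay is needed.
static_assert (tuple_elements<const std::tuple<int, char> &> () == 2,
               "tuple_size accepts cv-qualified tuple types");
static_assert (tuple_elements<const std::tuple<int, char> &> ()
               == std::tuple_size_v<std::decay_t<const std::tuple<int, char> &>>,
               "same answer as decay_t for tuple-like types");

// Hypothetical user type constructed from a tuple.
struct Point
{
  int x;
  int y;
  Point (int a, int b) : x (a), y (b) {}
};

inline Point make_point ()
{
  return std::make_from_tuple<Point> (std::make_tuple (3, 4));
}

inline void check_make_point ()
{
  Point p = make_point ();
  assert (p.x == 3);
  assert (p.y == 4);
}
```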


commit 867e57fbf824520973453cfa9e08c3a9ee3a5df8
Author: Jonathan Wakely 
Date:   Fri May 17 00:29:22 2019 +0100

Add missing piece of P0777R1 and update C++20 status docs

* doc/xml/manual/status_cxx2020.xml: Update P0608R3, P0777R1, and
P1165R1 entries.
* doc/html/*: Regenerate.
* include/std/tuple (make_from_tuple): Use remove_reference_t 
instead
of decay_t (P0777R1).

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2020.xml 
b/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
index 5cdd2227ae0..c7a543f85d9 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
@@ -227,14 +227,13 @@ Feature-testing recommendations for C++.
 
 
 
-  
 Treating Unnecessary decay 
   
 http://www.w3.org/1999/xlink"; 
xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0777r1.pdf";>
P0777R1

   
-   
+   9.1 
   
 
 
@@ -702,14 +701,13 @@ Feature-testing recommendations for C++.
 
 
 
-  
 A sane variant converting constructor 
   
 http://www.w3.org/1999/xlink"; 
xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0608r3.html";>
P0608R3

   
-   
+   10.1 
   
 
 
@@ -863,14 +861,13 @@ Feature-testing recommendations for C++.
 
 
 
-  
 Make stateful allocator propagation more consistent for 
operator+(basic_string) 
   
 http://www.w3.org/1999/xlink"; 
xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1165r1.html";>
P1165R1

   
-   
+   10.1 
   
 
 
diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index a28111749f0..b81157c097b 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -1756,7 +1756,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   return __make_from_tuple_impl<_Tp>(
 std::forward<_Tuple>(__t),
-   make_index_sequence<tuple_size_v<decay_t<_Tuple>>>{});
+   make_index_sequence<tuple_size_v<remove_reference_t<_Tuple>>>{});
 }
 #endif // C++17
 


Re: [PATCH] Add missing target options (PR middle-end/90258).

2019-04-26 Thread Martin Liška

On 4/26/19 5:24 PM, Jeff Law wrote:

On 4/26/19 5:02 AM, Martin Liška wrote:

Hi.

The fix handles the forgotten target options for which
get_valid_option_values returns an empty list.

The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed on trunk and later on the gcc-9 branch?
Thanks,
Martin

gcc/ChangeLog:

2019-04-26  Martin Liska  

PR middle-end/90258
* opt-suggestions.c (option_proposer::build_option_suggestions):
When get_valid_option_values returns empty values, add the
misspelling candidate.

gcc/testsuite/ChangeLog:

2019-04-26  Martin Liska  

PR middle-end/90258
* gcc.dg/completion-5.c: New test.
* gcc.target/i386/spellcheck-options-5.c: New test.

OK for the trunk.  Your call on when to cherry-pick it onto the gcc-9
branch.

Jeff



Thank you for the review; installed on trunk as r270622.
I would like to see the patch in GCC 9.1. What is the release managers' opinion?

Thanks,
Martin



Re: [PATCH] Add missing target options (PR middle-end/90258).

2019-04-26 Thread Jeff Law
On 4/26/19 5:02 AM, Martin Liška wrote:
> Hi.
> 
> The fix handles the forgotten target options for which
> get_valid_option_values returns an empty list.
> 
> The patch bootstraps on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed on trunk and later on the gcc-9 branch?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2019-04-26  Martin Liska  
> 
>   PR middle-end/90258
>   * opt-suggestions.c (option_proposer::build_option_suggestions):
>   When get_valid_option_values returns empty values, add the
>   misspelling candidate.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-04-26  Martin Liska  
> 
>   PR middle-end/90258
>   * gcc.dg/completion-5.c: New test.
>   * gcc.target/i386/spellcheck-options-5.c: New test.
OK for the trunk.  Your call on when to cherry-pick it onto the gcc-9
branch.

Jeff


[PATCH] Add missing target options (PR middle-end/90258).

2019-04-26 Thread Martin Liška
Hi.

The fix handles the forgotten target options for which
get_valid_option_values returns an empty list.

The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed on trunk and later on the gcc-9 branch?
Thanks,
Martin

gcc/ChangeLog:

2019-04-26  Martin Liska  

PR middle-end/90258
* opt-suggestions.c (option_proposer::build_option_suggestions):
When get_valid_option_values returns empty values, add the
misspelling candidate.

gcc/testsuite/ChangeLog:

2019-04-26  Martin Liska  

PR middle-end/90258
* gcc.dg/completion-5.c: New test.
* gcc.target/i386/spellcheck-options-5.c: New test.
---
 gcc/opt-suggestions.c| 5 -
 gcc/testsuite/gcc.dg/completion-5.c  | 7 +++
 gcc/testsuite/gcc.target/i386/spellcheck-options-5.c | 5 +
 3 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/completion-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/spellcheck-options-5.c


diff --git a/gcc/opt-suggestions.c b/gcc/opt-suggestions.c
index 415dcc9fc45..a820c78ff56 100644
--- a/gcc/opt-suggestions.c
+++ b/gcc/opt-suggestions.c
@@ -141,12 +141,14 @@ option_proposer::build_option_suggestions (const char *prefix)
 	}
 	  else
 	{
+	  bool option_added = false;
 	  if (option->flags & CL_TARGET)
 		{
 		  vec option_values
 		= targetm_common.get_valid_option_values (i, prefix);
 		  if (!option_values.is_empty ())
 		{
+		  option_added = true;
 		  for (unsigned j = 0; j < option_values.length (); j++)
 			{
 			  char *with_arg = concat (opt_text, option_values[j],
@@ -158,7 +160,8 @@ option_proposer::build_option_suggestions (const char *prefix)
 		}
 		  option_values.release ();
 		}
-	  else
+
+	  if (!option_added)
 		add_misspelling_candidates (m_option_suggestions, option,
 	opt_text);
 	}
diff --git a/gcc/testsuite/gcc.dg/completion-5.c b/gcc/testsuite/gcc.dg/completion-5.c
new file mode 100644
index 000..6719cfb6717
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/completion-5.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "--completion=-mfm" } */
+
+/* { dg-begin-multiline-output "" }
+-mfma
+-mfma4
+   { dg-end-multiline-output "" } */
diff --git a/gcc/testsuite/gcc.target/i386/spellcheck-options-5.c b/gcc/testsuite/gcc.target/i386/spellcheck-options-5.c
new file mode 100644
index 000..4a878ba2da0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/spellcheck-options-5.c
@@ -0,0 +1,5 @@
+/* PR middle-end/90258.  */
+
+/* { dg-do compile } */
+/* { dg-options "-mandroidX" } */
+/* { dg-error "unrecognized command line option '-mandroidX'; did you mean '-mandroid'"  "" { target *-*-* } 0 } */
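
The control-flow change in opt-suggestions.c above can be sketched in
isolation as follows.  This is a simplified model, not the GCC code: the
Option struct and the suggest_old/suggest_new functions are hypothetical, and
plain std::vector and std::string replace GCC's own containers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of a command-line option.
struct Option
{
  std::string text;                 // e.g. "-march=" or "-mandroid"
  bool is_target;                   // models the CL_TARGET flag
  std::vector<std::string> values;  // models get_valid_option_values
};

// Before the patch: the else belongs to the is_target test, so a target
// option with no valid values contributes no candidate at all.
inline std::vector<std::string> suggest_old (const Option &o)
{
  std::vector<std::string> out;
  if (o.is_target)
    {
      for (const std::string &v : o.values)
        out.push_back (o.text + v);
    }
  else
    out.push_back (o.text);
  return out;
}

// After the patch: remember whether anything was added and fall back to the
// plain option text otherwise.
inline std::vector<std::string> suggest_new (const Option &o)
{
  std::vector<std::string> out;
  bool option_added = false;
  if (o.is_target && !o.values.empty ())
    {
      option_added = true;
      for (const std::string &v : o.values)
        out.push_back (o.text + v);
    }
  if (!option_added)
    out.push_back (o.text);
  return out;
}

inline void check_suggestions ()
{
  Option android = { "-mandroid", true, {} };
  assert (suggest_old (android).empty ());
  assert (suggest_new (android).size () == 1);
}
```

In this model a target option without valid values, such as the -mandroid
case from the new test, is the only input for which the two functions differ.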



Re: [PATCH] Add missing libsanitizer extra patch (r259664) (PR sanitizer/89941).

2019-04-08 Thread Jakub Jelinek
On Wed, Apr 03, 2019 at 10:09:19AM +0200, Martin Liška wrote:
> Hi.
> 
> The patch reapplies a local change we already carried on top of
> trunk libsanitizer. I'll then add the patch to
> libsanitizer/LOCAL_PATCHES.
> 
> Ready for trunk?
> Thanks,
> Martin
> 
> libsanitizer/ChangeLog:
> 
> 2019-04-03  Martin Liska  
> 
>   PR sanitizer/89941
>   * sanitizer_common/sanitizer_platform_limits_linux.cc (defined):
>   Reapply patch from r259664.
>   * sanitizer_common/sanitizer_platform_limits_posix.h (defined):
>   Likewise.

Ok.

Jakub


[PATCH] Add missing libsanitizer extra patch (r259664) (PR sanitizer/89941).

2019-04-03 Thread Martin Liška
Hi.

The patch reapplies a local change we already carried on top of
trunk libsanitizer. I'll then add the patch to
libsanitizer/LOCAL_PATCHES.

Ready for trunk?
Thanks,
Martin

libsanitizer/ChangeLog:

2019-04-03  Martin Liska  

PR sanitizer/89941
* sanitizer_common/sanitizer_platform_limits_linux.cc (defined):
Reapply patch from r259664.
* sanitizer_common/sanitizer_platform_limits_posix.h (defined):
Likewise.
---
 .../sanitizer_common/sanitizer_platform_limits_linux.cc| 7 +--
 .../sanitizer_common/sanitizer_platform_limits_posix.h | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)


diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc b/libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc
index 23a014823c4..3a906538129 100644
--- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc
+++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc
@@ -25,9 +25,12 @@
 
 // With old kernels (and even new kernels on powerpc) asm/stat.h uses types that
 // are not defined anywhere in userspace headers. Fake them. This seems to work
-// fine with newer headers, too.
+// fine with newer headers, too.  Beware that with , struct stat
+// takes the form of struct stat64 on 32-bit platforms if _FILE_OFFSET_BITS=64.
+// Also, for some platforms (e.g. mips) there are additional members in the
+//  struct stat:s.
 #include 
-#if defined(__x86_64__) ||  defined(__mips__)
+#if defined(__x86_64__)
 #include 
 #else
 #define ino_t __kernel_ino_t
diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
index 91f38918f35..73af92af1e8 100644
--- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
+++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
@@ -87,7 +87,7 @@ namespace __sanitizer {
 #elif defined(__mips__)
   const unsigned struct_kernel_stat_sz =
  SANITIZER_ANDROID ? FIRST_32_SECOND_64(104, 128) :
- FIRST_32_SECOND_64(160, 216);
+ FIRST_32_SECOND_64(144, 216);
   const unsigned struct_kernel_stat64_sz = 104;
 #elif defined(__s390__) && !defined(__s390x__)
   const unsigned struct_kernel_stat_sz = 64;
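
As a reading aid for the mips hunk above, here is a small sketch of how a
32-bit/64-bit selector of this kind behaves.  The function
first_32_second_64 is an assumed model inferred from the macro's name, not
the libsanitizer definition, and mips_struct_kernel_stat_sz is a hypothetical
wrapper that only reuses the numbers from the patched lines.

```cpp
#include <cassert>

// Assumed behaviour, inferred from the macro's name: pick the first value on
// 32-bit targets and the second on 64-bit targets.
constexpr unsigned first_32_second_64 (unsigned a, unsigned b)
{
  return sizeof (void *) == 8 ? b : a;
}

// Sizes taken from the mips hunk (values after the patch).
constexpr unsigned mips_struct_kernel_stat_sz (bool android)
{
  return android ? first_32_second_64 (104, 128)
                 : first_32_second_64 (144, 216);
}

inline void check_stat_sizes ()
{
  const unsigned expected = sizeof (void *) == 8 ? 216u : 144u;
  assert (mips_struct_kernel_stat_sz (false) == expected);
}
```

Under this model the patch only changes the value seen by 32-bit non-Android
mips builds, from 160 to 144.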



Re: Ping Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-03-29 Thread Hongtao Liu
On Sat, Mar 30, 2019 at 5:34 AM Jeff Law  wrote:
>
> On 3/28/19 1:38 AM, Uros Bizjak wrote:
> > On Thu, Mar 28, 2019 at 7:47 AM Hongtao Liu  wrote:
> >>
> >> Hi Uros:
> >>   would you help to review this patch?
> >
> > This is an AVX512F patch; you will need approval from the maintainer
> > first. I have no plans to maintain AVX512 beyond rubber-stamping an OK
> > on dead-obvious regression fixes from reputable contributors. It is
> > simply too much involvement for me. If the appointed maintainer doesn't
> > respond anymore, then I suggest you raise the issue with the GCC
> > steering committee.
> Also note, this is not fixing a regression relative to a prior release.
>   I'd prefer to see this moved to gcc-10 unless there is a strong
> justification for pushing it into gcc-9.
>
> The subject line should also be changed to reference the right bz.  I
> think the right one is 89803.
>
> jeff

Yes, it's PR 89803; sorry for the typo.

And thank you for your explanation.

-- 
BR,
Hongtao


Re: Ping Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/897803)

2019-03-29 Thread Jeff Law
On 3/28/19 1:38 AM, Uros Bizjak wrote:
> On Thu, Mar 28, 2019 at 7:47 AM Hongtao Liu  wrote:
>>
>> Hi Uros:
>>   would you help to review this patch?
> 
> This is an AVX512F patch; you will need approval from the maintainer
> first. I have no plans to maintain AVX512 beyond rubber-stamping an OK
> on dead-obvious regression fixes from reputable contributors. It is
> simply too much involvement for me. If the appointed maintainer doesn't
> respond anymore, then I suggest you raise the issue with the GCC
> steering committee.
Also note, this is not fixing a regression relative to a prior release.
  I'd prefer to see this moved to gcc-10 unless there is a strong
justification for pushing it into gcc-9.

The subject line should also be changed to reference the right bz.  I
think the right one is 89803.

jeff


Re: Ping Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/897803)

2019-03-28 Thread Uros Bizjak
On Thu, Mar 28, 2019 at 7:47 AM Hongtao Liu  wrote:
>
> Hi Uros:
>   would you help to review this patch?

This is an AVX512F patch; you will need approval from the maintainer
first. I have no plans to maintain AVX512 beyond rubber-stamping an OK
on dead-obvious regression fixes from reputable contributors. It is
simply too much involvement for me. If the appointed maintainer doesn't
respond anymore, then I suggest you raise the issue with the GCC
steering committee.

Uros.

> Regards,
> Hongtao.
>
> On Sun, Mar 24, 2019 at 8:13 PM Hongtao Liu  wrote:
> >
> > Hi:
> >   The following patch adds forgotten avx512f fpclass intrinsics for
> > masked scalar operations.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux (on skylake-avx512),
> > ok for trunk?
> >
> > Index: ChangeLog
> > ===
> > --- ChangeLog (revision 269894)
> > +++ ChangeLog (working copy)
> > @@ -1,3 +1,16 @@
> > +2019-03-24 Hongtao Liu 
> > +
> > + PR target/89803
> > + * config/i386/avx512dqintrin.h
> > + (_mm_mask_fpclass_ss_mask,_mm_mask_fpclass_sd_mask):
> > + New intrinsics.
> > + * config/i386/i386-builtin.def
> > + (__builtin_ia32_fpcla_mask, _builtin_ia32_fpclasssd_mask):
> > + New builtins.
> > + * config/i386/sse.md
> > + (define_insn "avx512dq_vmfpclass):
> > + Modified with mask.
> > +
> >  2019-03-23  Segher Boessenkool  
> >
> >   * config/rs6000/xmmintrin.h (_mm_movemask_pi8): Implement for 32-bit
> > Index: config/i386/avx512dqintrin.h
> > ===
> > --- config/i386/avx512dqintrin.h (revision 269894)
> > +++ config/i386/avx512dqintrin.h (working copy)
> > @@ -1372,6 +1372,20 @@
> >return (__mmask8) __builtin_ia32_fpclasssd ((__v2df) __A, __imm);
> >  }
> >
> > +extern __inline __mmask8
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_mask_fpclass_ss_mask (__mmask8 __U, __m128 __A, const int __imm)
> > +{
> > +  return (__mmask8) __builtin_ia32_fpcla_mask ((__v4sf) __A, __imm, 
> > __U);
> > +}
> > +
> > +extern __inline __mmask8
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_mask_fpclass_sd_mask (__mmask8 __U, __m128d __A, const int __imm)
> > +{
> > +  return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, 
> > __U);
> > +}
> > +
> >  extern __inline __m512i
> >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> >  _mm512_cvtt_roundpd_epi64 (__m512d __A, const int __R)
> > @@ -2623,6 +2637,12 @@
> >  #define _mm_fpclass_sd_mask(X, C) \
> >((__mmask8) __builtin_ia32_fpclasssd ((__v2df) (__m128d) (X), (int) 
> > (C))) \
> >
> > +#define _mm_mask_fpclass_ss_mask(X, C, U) \
> > +  ((__mmask8) __builtin_ia32_fpcla_mask ((__v4sf) (__m128) (X),
> > (int) (C)), (__mmask8) (U))
> > +
> > +#define _mm_mask_fpclass_sd_mask(X, C, U) \
> > +  ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X),
> > (int) (C)), (__mmask8) (U))
> > +
> >  #define _mm512_mask_fpclass_pd_mask(u, X, C)\
> >((__mmask8) __builtin_ia32_fpclasspd512_mask ((__v8df) (__m512d) (X), \
> >   (int) (C), (__mmask8)(u)))
> > Index: config/i386/i386-builtin.def
> > ===
> > --- config/i386/i386-builtin.def (revision 269894)
> > +++ config/i386/i386-builtin.def (working copy)
> > @@ -2082,9 +2082,11 @@
> >  BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0,
> > CODE_FOR_avx512dq_fpclassv4df_mask,
> > "__builtin_ia32_fpclasspd256_mask", IX86_BUILTIN_FPCLASSPD256,
> > UNKNOWN, (int) QI_FTYPE_V4DF_INT_UQI)
> >  BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0,
> > CODE_FOR_avx512dq_fpclassv2df_mask,
> > "__builtin_ia32_fpclasspd128_mask", IX86_BUILTIN_FPCLASSPD128,
> > UNKNOWN, (int) QI_FTYPE_V2DF_INT_UQI)
> >  BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vmfpclassv2df,
> > "__builtin_ia32_fpclasssd", IX86_BUILTIN_FPCLASSSD, UNKNOWN, (int)
> > QI_FTYPE_V2DF_INT)
> > +BDESC (OPTION_MASK_ISA_AVX512DQ, 0,
> > CODE_FOR_avx512dq_vmfpclassv2df_mask, "__builtin_ia32_fpclasssd_mask",
> > IX86_BUILTIN_FPCLASSSD_MASK, UNKNOWN, (int) QI_FTYPE_V2DF_INT_UQI)
> >  BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0,
> > CODE_FOR_avx512dq_fpclassv8sf_mask,
> > "__builtin_ia32_fpclassps256_mask", IX86_BUILTIN_FPCLASSPS256,
> > UNKNOWN, (int) QI_FTYPE_V8SF_INT_UQI)
> >  BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0,
> > CODE_FOR_avx512dq_fpclassv4sf_mask,
> > "__builtin_ia32_fpclassps128_mask", IX86_BUILTIN_FPCLASSPS128,
> > UNKNOWN, (int) QI_FTYPE_V4SF_INT_UQI)
> >  BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vmfpclassv4sf,
> > "__builtin_ia32_fpclassss", IX86_BUILTIN_FPCLASSSS, UNKNOWN, (int)
> > QI_FTYPE_V4SF_INT)
> > +BDESC (OPTION_MASK_ISA_AVX512DQ, 0,
> > CODE_FOR_avx512dq_vmfpclassv4sf_mask, "__builtin_ia32_fpclassss_mask",
> > IX86_BUILTIN_FPCLASSSS_MASK, UNKNOWN, (int)

Ping Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-03-27 Thread Hongtao Liu
Hi Uros,
  Would you help review this patch?

Regards,
Hongtao.

On Sun, Mar 24, 2019 at 8:13 PM Hongtao Liu  wrote:
>
> Hi:
>   The following patch adds forgotten avx512dq fpclass intrinsics for
> masked scalar operations.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux (on skylake-avx512),
> ok for trunk?
>
> Index: ChangeLog
> ===
> --- ChangeLog (revision 269894)
> +++ ChangeLog (working copy)
> @@ -1,3 +1,16 @@
> +2019-03-24 Hongtao Liu 
> +
> + PR target/89803
> + * config/i386/avx512dqintrin.h
> + (_mm_mask_fpclass_ss_mask, _mm_mask_fpclass_sd_mask):
> + New intrinsics.
> + * config/i386/i386-builtin.def
> + (__builtin_ia32_fpclassss_mask, __builtin_ia32_fpclasssd_mask):
> + New builtins.
> + * config/i386/sse.md
> + (define_insn "avx512dq_vmfpclass):
> + Modified with mask.
> +
>  2019-03-23  Segher Boessenkool  
>
>   * config/rs6000/xmmintrin.h (_mm_movemask_pi8): Implement for 32-bit
> Index: config/i386/avx512dqintrin.h
> ===
> --- config/i386/avx512dqintrin.h (revision 269894)
> +++ config/i386/avx512dqintrin.h (working copy)
> @@ -1372,6 +1372,20 @@
>return (__mmask8) __builtin_ia32_fpclasssd ((__v2df) __A, __imm);
>  }
>
> +extern __inline __mmask8
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_fpclass_ss_mask (__mmask8 __U, __m128 __A, const int __imm)
> +{
> +  return (__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) __A, __imm, __U);
> +}
> +
> +extern __inline __mmask8
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_fpclass_sd_mask (__mmask8 __U, __m128d __A, const int __imm)
> +{
> +  return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, __U);
> +}
> +
>  extern __inline __m512i
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_cvtt_roundpd_epi64 (__m512d __A, const int __R)
> @@ -2623,6 +2637,12 @@
>  #define _mm_fpclass_sd_mask(X, C) \
>((__mmask8) __builtin_ia32_fpclasssd ((__v2df) (__m128d) (X), (int) (C))) \
>
> +#define _mm_mask_fpclass_ss_mask(X, C, U) \
> +  ((__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) (__m128) (X),
> (int) (C)), (__mmask8) (U))
> +
> +#define _mm_mask_fpclass_sd_mask(X, C, U) \
> +  ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X),
> (int) (C)), (__mmask8) (U))
> +
>  #define _mm512_mask_fpclass_pd_mask(u, X, C)\
>((__mmask8) __builtin_ia32_fpclasspd512_mask ((__v8df) (__m512d) (X), \
>   (int) (C), (__mmask8)(u)))
> Index: config/i386/i386-builtin.def
> ===
> --- config/i386/i386-builtin.def (revision 269894)
> +++ config/i386/i386-builtin.def (working copy)
> @@ -2082,9 +2082,11 @@
>  BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0,
> CODE_FOR_avx512dq_fpclassv4df_mask,
> "__builtin_ia32_fpclasspd256_mask", IX86_BUILTIN_FPCLASSPD256,
> UNKNOWN, (int) QI_FTYPE_V4DF_INT_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0,
> CODE_FOR_avx512dq_fpclassv2df_mask,
> "__builtin_ia32_fpclasspd128_mask", IX86_BUILTIN_FPCLASSPD128,
> UNKNOWN, (int) QI_FTYPE_V2DF_INT_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vmfpclassv2df,
> "__builtin_ia32_fpclasssd", IX86_BUILTIN_FPCLASSSD, UNKNOWN, (int)
> QI_FTYPE_V2DF_INT)
> +BDESC (OPTION_MASK_ISA_AVX512DQ, 0,
> CODE_FOR_avx512dq_vmfpclassv2df_mask, "__builtin_ia32_fpclasssd_mask",
> IX86_BUILTIN_FPCLASSSD_MASK, UNKNOWN, (int) QI_FTYPE_V2DF_INT_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0,
> CODE_FOR_avx512dq_fpclassv8sf_mask,
> "__builtin_ia32_fpclassps256_mask", IX86_BUILTIN_FPCLASSPS256,
> UNKNOWN, (int) QI_FTYPE_V8SF_INT_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0,
> CODE_FOR_avx512dq_fpclassv4sf_mask,
> "__builtin_ia32_fpclassps128_mask", IX86_BUILTIN_FPCLASSPS128,
> UNKNOWN, (int) QI_FTYPE_V4SF_INT_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vmfpclassv4sf,
> "__builtin_ia32_fpclassss", IX86_BUILTIN_FPCLASSSS, UNKNOWN, (int)
> QI_FTYPE_V4SF_INT)
> +BDESC (OPTION_MASK_ISA_AVX512DQ, 0,
> CODE_FOR_avx512dq_vmfpclassv4sf_mask, "__builtin_ia32_fpclassss_mask",
> IX86_BUILTIN_FPCLASSSS_MASK, UNKNOWN, (int) QI_FTYPE_V4SF_INT_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0,
> CODE_FOR_avx512vl_cvtb2maskv16qi, "__builtin_ia32_cvtb2mask128",
> IX86_BUILTIN_CVTB2MASK128, UNKNOWN, (int) UHI_FTYPE_V16QI)
>  BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0,
> CODE_FOR_avx512vl_cvtb2maskv32qi, "__builtin_ia32_cvtb2mask256",
> IX86_BUILTIN_CVTB2MASK256, UNKNOWN, (int) USI_FTYPE_V32QI)
>  BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0,
> CODE_FOR_avx512vl_cvtw2maskv8hi, "__builtin_ia32_cvtw2mask128",
> IX86_BUILTIN_CVTW2MASK128, UNKNOWN, (int) UQI_FTYPE_V8HI)
> Index: config/i386/sse.md
> 

[PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/89803)

2019-03-24 Thread Hongtao Liu
Hi:
  The following patch adds forgotten avx512dq fpclass intrinsics for
masked scalar operations.
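
For readers who want the intended semantics of the masked scalar form without
AVX512DQ hardware, here is a minimal portable sketch.  It is not the GCC
implementation: the model_* names and the FPC_* constants are hypothetical,
and the imm8 category layout is taken from my reading of the Intel
documentation for VFPCLASSSD, so treat it as an assumption.  The point it
illustrates is that the result is the classification of element 0 ANDed with
bit 0 of the write mask.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed imm8 category bits (hypothetical names).  */
enum
{
  FPC_QNAN = 0x01, FPC_POS_ZERO = 0x02, FPC_NEG_ZERO = 0x04,
  FPC_POS_INF = 0x08, FPC_NEG_INF = 0x10, FPC_DENORM = 0x20,
  FPC_NEG_FINITE = 0x40, FPC_SNAN = 0x80
};

/* Return 1 if X falls into any category selected by IMM, else 0.
   MXCSR.DAZ is not modelled.  */
int
model_fpclass_sd (double x, int imm)
{
  uint64_t bits;
  assert ((imm & ~0xff) == 0);
  memcpy (&bits, &x, sizeof bits);

  int neg = (int) (bits >> 63);
  uint64_t exp = (bits >> 52) & 0x7ff;
  uint64_t mant = bits & 0xfffffffffffffULL;	/* low 52 bits */
  int quiet = (int) ((bits >> 51) & 1);
  int zero = exp == 0 && mant == 0;
  int cls = 0;

  if (exp == 0x7ff && mant != 0)
    cls |= quiet ? FPC_QNAN : FPC_SNAN;
  if (exp == 0x7ff && mant == 0)
    cls |= neg ? FPC_NEG_INF : FPC_POS_INF;
  if (zero)
    cls |= neg ? FPC_NEG_ZERO : FPC_POS_ZERO;
  if (exp == 0 && mant != 0)
    cls |= FPC_DENORM;
  if (neg && exp != 0x7ff && !zero)
    cls |= FPC_NEG_FINITE;

  return (cls & imm) != 0;
}

/* Model of the masked scalar form: only bit 0 of U matters, and a
   cleared mask bit forces the result to 0.  */
uint8_t
model_mask_fpclass_sd_mask (uint8_t u, double a0, int imm)
{
  return (uint8_t) ((u & 1) & model_fpclass_sd (a0, imm));
}
```

With this model, model_mask_fpclass_sd_mask (1, -0.0, FPC_NEG_ZERO) yields 1,
while the same call with a mask of 0 yields 0 regardless of the input.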

Bootstrapped/regtested on x86_64-linux and i686-linux (on skylake-avx512),
ok for trunk?

Index: ChangeLog
===
--- ChangeLog (revision 269894)
+++ ChangeLog (working copy)
@@ -1,3 +1,16 @@
+2019-03-24 Hongtao Liu 
+
+ PR target/89803
+ * config/i386/avx512dqintrin.h
+ (_mm_mask_fpclass_ss_mask, _mm_mask_fpclass_sd_mask):
+ New intrinsics.
+ * config/i386/i386-builtin.def
+ (__builtin_ia32_fpclassss_mask, __builtin_ia32_fpclasssd_mask):
+ New builtins.
+ * config/i386/sse.md
+ (define_insn "avx512dq_vmfpclass):
+ Modified with mask.
+
 2019-03-23  Segher Boessenkool  

  * config/rs6000/xmmintrin.h (_mm_movemask_pi8): Implement for 32-bit
Index: config/i386/avx512dqintrin.h
===
--- config/i386/avx512dqintrin.h (revision 269894)
+++ config/i386/avx512dqintrin.h (working copy)
@@ -1372,6 +1372,20 @@
   return (__mmask8) __builtin_ia32_fpclasssd ((__v2df) __A, __imm);
 }

+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fpclass_ss_mask (__mmask8 __U, __m128 __A, const int __imm)
+{
+  return (__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) __A, __imm, __U);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fpclass_sd_mask (__mmask8 __U, __m128d __A, const int __imm)
+{
+  return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, __U);
+}
+
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_cvtt_roundpd_epi64 (__m512d __A, const int __R)
@@ -2623,6 +2637,12 @@
 #define _mm_fpclass_sd_mask(X, C) \
   ((__mmask8) __builtin_ia32_fpclasssd ((__v2df) (__m128d) (X), (int) (C))) \

+#define _mm_mask_fpclass_ss_mask(X, C, U) \
+  ((__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) (__m128) (X),
(int) (C)), (__mmask8) (U))
+
+#define _mm_mask_fpclass_sd_mask(X, C, U) \
+  ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X),
(int) (C)), (__mmask8) (U))
+
 #define _mm512_mask_fpclass_pd_mask(u, X, C)\
   ((__mmask8) __builtin_ia32_fpclasspd512_mask ((__v8df) (__m512d) (X), \
  (int) (C), (__mmask8)(u)))
Index: config/i386/i386-builtin.def
===
--- config/i386/i386-builtin.def (revision 269894)
+++ config/i386/i386-builtin.def (working copy)
@@ -2082,9 +2082,11 @@
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0,
CODE_FOR_avx512dq_fpclassv4df_mask,
"__builtin_ia32_fpclasspd256_mask", IX86_BUILTIN_FPCLASSPD256,
UNKNOWN, (int) QI_FTYPE_V4DF_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0,
CODE_FOR_avx512dq_fpclassv2df_mask,
"__builtin_ia32_fpclasspd128_mask", IX86_BUILTIN_FPCLASSPD128,
UNKNOWN, (int) QI_FTYPE_V2DF_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vmfpclassv2df,
"__builtin_ia32_fpclasssd", IX86_BUILTIN_FPCLASSSD, UNKNOWN, (int)
QI_FTYPE_V2DF_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, 0,
CODE_FOR_avx512dq_vmfpclassv2df_mask, "__builtin_ia32_fpclasssd_mask",
IX86_BUILTIN_FPCLASSSD_MASK, UNKNOWN, (int) QI_FTYPE_V2DF_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0,
CODE_FOR_avx512dq_fpclassv8sf_mask,
"__builtin_ia32_fpclassps256_mask", IX86_BUILTIN_FPCLASSPS256,
UNKNOWN, (int) QI_FTYPE_V8SF_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0,
CODE_FOR_avx512dq_fpclassv4sf_mask,
"__builtin_ia32_fpclassps128_mask", IX86_BUILTIN_FPCLASSPS128,
UNKNOWN, (int) QI_FTYPE_V4SF_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vmfpclassv4sf,
"__builtin_ia32_fpclassss", IX86_BUILTIN_FPCLASSSS, UNKNOWN, (int)
QI_FTYPE_V4SF_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, 0,
CODE_FOR_avx512dq_vmfpclassv4sf_mask, "__builtin_ia32_fpclassss_mask",
IX86_BUILTIN_FPCLASSSS_MASK, UNKNOWN, (int) QI_FTYPE_V4SF_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0,
CODE_FOR_avx512vl_cvtb2maskv16qi, "__builtin_ia32_cvtb2mask128",
IX86_BUILTIN_CVTB2MASK128, UNKNOWN, (int) UHI_FTYPE_V16QI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0,
CODE_FOR_avx512vl_cvtb2maskv32qi, "__builtin_ia32_cvtb2mask256",
IX86_BUILTIN_CVTB2MASK256, UNKNOWN, (int) USI_FTYPE_V32QI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0,
CODE_FOR_avx512vl_cvtw2maskv8hi, "__builtin_ia32_cvtw2mask128",
IX86_BUILTIN_CVTW2MASK128, UNKNOWN, (int) UQI_FTYPE_V8HI)
Index: config/i386/sse.md
===
--- config/i386/sse.md (revision 269894)
+++ config/i386/sse.md (working copy)
@@ -2,7 +2,7 @@
(set_attr "prefix" "evex")
(set_attr "mode" "")])

-(define_insn "avx512dq_vmfpclass"
+(define_insn "avx512dq_vmfpclass"
   [(set (match_operand: 0 "register_operand" "=k

Re: [PATCH] Add missing avx512fintrin.h _mm_mask{,3,z}_f{,n}m{add,sub}_s{s,d} intrinsics (PR target/89784)

2019-03-22 Thread Uros Bizjak
On Fri, Mar 22, 2019 at 11:40 AM Jakub Jelinek  wrote:
>
> On Fri, Mar 22, 2019 at 11:11:58AM +0100, Uros Bizjak wrote:
> > > For FMA, naturally only the two operands that are multiplied should be
> > > commutative, but in most patterns one of those two uses "0" or "0,0"
> >
> > This should be safe, we have had "*add_1" for decades that does
> > just the above.
>
> Sure, the 0 isn't a problem in itself.
>
> > > constraint and there is one or two match_dup 1 for it, so it really
> > > isn't commutative.
> >
> > Hm, this situation involving match_dup needs some more thinking...
>
> But this one is.
> If one reads the documentation
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_mask_fmadd_sd&expand=5236,2545
> or
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_maskz_fmadd_sd&expand=5236,2545,2547
> then even in the source form related description the a and b arguments
> aren't commutative, because a is used 3 or 2 times while b is used just
> once.
>
> Compare to
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_mask3_fmadd_sd&expand=5236,2545,2547,2546
> where a and b are both used just once and which thus can be commutative,
> only c is used 3 times.
>
> For this reason, even _mm{,256,512}_mask_f{,n}m{add,sub}_p{s,d} aren't using %
> and IMHO can't.
> Now, _mm{,256,512}_maskz_f{,n}m{add,sub}_p{s,d} actually do use % and can,
> because both a and b are used just once, it is filled with zeros if mask bit
> is 0.  That is different from _mm_maskz_f{,n}m{add,sub}_s{s,d} which
> fills with 0 only the first element and the elements above it are copied
> from a.
>
> > > Which leaves us with the 4 mask3 patterns only, as I said above, for
> > > the first two where neither of those are negated I think % should be ok.
> > > For the .*fnm{add,sub}.*mask3.* ones I'm not sure, because one of them
> > > is negated.  On the other side, seems various other existing fnm*
> > > patterns use % even on those.
> >
> > It is safe to use even if one of the first two operands is negated.
> > According to the documentation, the negation represents negation of
> > the intermediate product, so it doesn't matter which operand is
> > negated.
>
> So like this if it passes bootstrap/regtest?

Yes.

> PS., it would be nice to have some testsuite coverage also for cases where
> the intrinsics are called from noipa wrapper functions and one of the
> arguments is __m128{,d} *x and passes *x to the intrinsic (repeated so that
> we test all non-mask arguments that way).  And it would be nice to also have
> testcases with constant -1 masks and with constant 0 mask.  I just compiled
> attached sources and eyeballed the result that at least on some cases it
> performed the expected simplifications, but probably don't have spare cycles
> now to turn that all into something suitable for the testsuite (ideally it
> would be both test for specific instructions and runtime test).
>
> 2019-03-22  Jakub Jelinek  
>
> * config/i386/sse.md (_fmadd__mask3,
> _fmsub__mask3,
> _fnmadd__mask3,
> _fnmsub__mask3,
> avx512f_vmfmadd__mask3,
> avx512f_vmfmsub__mask3,
> *avx512f_vmfnmadd__mask3): Use 
> 
> instead of register_operand and %v instead of v for match_operand 1.
> (avx512f_vmfnmsub__mask3): Rename to ...
> (*avx512f_vmfnmsub__mask3): ... this.  Use
>  instead of register_operand and %v instead of v
> for match_operand 1.

OK.

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2019-03-22 11:11:58.330060594 +0100
> +++ gcc/config/i386/sse.md  2019-03-22 11:21:12.901952453 +0100
> @@ -3973,7 +3973,7 @@ (define_insn "_fmadd__mask
>[(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
> (vec_merge:VF_AVX512VL
>   (fma:VF_AVX512VL
> -   (match_operand:VF_AVX512VL 1 "register_operand" "v")
> +   (match_operand:VF_AVX512VL 1 "" "%v")
> (match_operand:VF_AVX512VL 2 "" 
> "")
> (match_operand:VF_AVX512VL 3 "register_operand" "0"))
>   (match_dup 3)
> @@ -4094,7 +4094,7 @@ (define_insn "_fmsub__mask
>[(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
> (vec_merge:VF_AVX512VL
>   (fma:VF_AVX512VL
> -   (match_operand:VF_AVX512VL 1 "register_operand" "v")
> +   (match_operand:VF_AVX512VL 1 "" "%v")
> (match_operand:VF_AVX512VL 2 "" 
> "")
> (neg:VF_AVX512VL
>   (match_operand:VF_AVX512VL 3 "register_operand" "0")))
> @@ -4217,7 +4217,7 @@ (define_insn "_fnmadd__mas
> (vec_merge:VF_AVX512VL
>   (fma:VF_AVX512VL
> (neg:VF_AVX512VL
> - (match_operand:VF_AVX512VL 1 "register_operand" "v"))
> + (match_operand:VF_AVX512VL 1 "" "%v"))
> (match_operand:VF_AVX512VL 2 "" 
> "")
> (match_operand:VF_AVX512VL 3 "register_operand" "0"))
>   (match_dup 3)
> @@ -4345,7 +434

Re: [PATCH] Add missing avx512fintrin.h _mm_mask{,3,z}_f{,n}m{add,sub}_s{s,d} intrinsics (PR target/89784)

2019-03-22 Thread Jakub Jelinek
On Fri, Mar 22, 2019 at 11:11:58AM +0100, Uros Bizjak wrote:
> > For FMA, naturally only the two operands that are multiplied should be
> > commutative, but in most patterns one of those two uses "0" or "0,0"
> 
> This should be safe, we have had "*add_1" for decades that does
> just the above.

Sure, the 0 isn't a problem in itself.

> > constraint and there is one or two match_dup 1 for it, so it really
> > isn't commutative.
> 
> Hm, this situation involving match_dup needs some more thinking...

But this one is.
If one reads the documentation
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_mask_fmadd_sd&expand=5236,2545
or
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_maskz_fmadd_sd&expand=5236,2545,2547
then even in the source form related description the a and b arguments
aren't commutative, because a is used 3 or 2 times while b is used just
once.

Compare to
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_mask3_fmadd_sd&expand=5236,2545,2547,2546
where a and b are both used just once and which thus can be commutative,
only c is used 3 times.

For this reason, even _mm{,256,512}_mask_f{,n}m{add,sub}_p{s,d} aren't using %
and IMHO can't.
Now, _mm{,256,512}_maskz_f{,n}m{add,sub}_p{s,d} actually do use % and can,
because both a and b are used just once, it is filled with zeros if mask bit
is 0.  That is different from _mm_maskz_f{,n}m{add,sub}_s{s,d} which
fills with 0 only the first element and the elements above it are copied
from a.
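
To make the operand-usage argument concrete, here is a small portable sketch
of the three scalar double variants as I read them from the Intrinsics Guide
pages linked above.  The v2df_model type and the model_* functions are
hypothetical stand-ins, not the real intrinsics, and a plain a * b + c is used
instead of a fused operation, so rounding is not modelled.

```c
#include <assert.h>

/* Hypothetical two-lane stand-in for __m128d; lo is element 0.  */
typedef struct { double lo, hi; } v2df_model;

v2df_model
make_v2df (double lo, double hi)
{
  v2df_model r = { lo, hi };
  return r;
}

/* mask form: A is multiplied, is the fallback for element 0 and
   supplies the upper element, so it is used three times.  */
v2df_model
model_mask_fmadd_sd (v2df_model a, unsigned k, v2df_model b, v2df_model c)
{
  return make_v2df ((k & 1) ? a.lo * b.lo + c.lo : a.lo, a.hi);
}

/* maskz form: element 0 is zeroed when the mask bit is clear, but the
   upper element is still copied from A, so A is used twice.  */
v2df_model
model_maskz_fmadd_sd (unsigned k, v2df_model a, v2df_model b, v2df_model c)
{
  return make_v2df ((k & 1) ? a.lo * b.lo + c.lo : 0.0, a.hi);
}

/* mask3 form: C is the fallback and supplies the upper element; A and B
   are each used once, only in the product.  */
v2df_model
model_mask3_fmadd_sd (v2df_model a, v2df_model b, v2df_model c, unsigned k)
{
  return make_v2df ((k & 1) ? a.lo * b.lo + c.lo : c.lo, c.hi);
}

/* Swapping A and B is invisible for mask3 but visible for mask/maskz.  */
void
check_commutativity (void)
{
  v2df_model a = make_v2df (2.0, 10.0);
  v2df_model b = make_v2df (3.0, 20.0);
  v2df_model c = make_v2df (4.0, 30.0);

  for (unsigned k = 0; k < 2; k++)
    {
      v2df_model x = model_mask3_fmadd_sd (a, b, c, k);
      v2df_model y = model_mask3_fmadd_sd (b, a, c, k);
      assert (x.lo == y.lo && x.hi == y.hi);
    }

  assert (model_mask_fmadd_sd (a, 1, b, c).hi
	  != model_mask_fmadd_sd (b, 1, a, c).hi);
  assert (model_maskz_fmadd_sd (1, a, b, c).hi
	  != model_maskz_fmadd_sd (1, b, a, c).hi);
}
```

In this model only the mask3 variant gives the same result when the two
multiplied operands are exchanged, which is the property a "%" constraint
would rely on.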

> > Which leaves us with the 4 mask3 patterns only, as I said above, for
> > the first two where neither of those are negated I think % should be ok.
> > For the .*fnm{add,sub}.*mask3.* ones I'm not sure, because one of them
> > is negated.  On the other side, seems various other existing fnm*
> > patterns use % even on those.
> 
> It is safe to use even if one of the first two operands is negated.
> According to the documentation, the negation represents negation of
> the intermediate product, so it doesn't matter which operand is
> negated.

So like this if it passes bootstrap/regtest?

PS., it would be nice to have some testsuite coverage also for cases where
the intrinsics are called from noipa wrapper functions and one of the
arguments is __m128{,d} *x and passes *x to the intrinsic (repeated so that
we test all non-mask arguments that way).  And it would be nice to also have
testcases with constant -1 masks and with constant 0 mask.  I just compiled
attached sources and eyeballed the result that at least on some cases it
performed the expected simplifications, but probably don't have spare cycles
now to turn that all into something suitable for the testsuite (ideally it
would be both test for specific instructions and runtime test).

2019-03-22  Jakub Jelinek  

* config/i386/sse.md (_fmadd__mask3,
_fmsub__mask3,
_fnmadd__mask3,
_fnmsub__mask3,
avx512f_vmfmadd__mask3,
avx512f_vmfmsub__mask3,
*avx512f_vmfnmadd__mask3): Use 
instead of register_operand and %v instead of v for match_operand 1.
(avx512f_vmfnmsub__mask3): Rename to ...
(*avx512f_vmfnmsub__mask3): ... this.  Use
 instead of register_operand and %v instead of v
for match_operand 1.

--- gcc/config/i386/sse.md.jj   2019-03-22 11:11:58.330060594 +0100
+++ gcc/config/i386/sse.md  2019-03-22 11:21:12.901952453 +0100
@@ -3973,7 +3973,7 @@ (define_insn "_fmadd__mask
   [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
(vec_merge:VF_AVX512VL
  (fma:VF_AVX512VL
-   (match_operand:VF_AVX512VL 1 "register_operand" "v")
+   (match_operand:VF_AVX512VL 1 "" "%v")
(match_operand:VF_AVX512VL 2 "" 
"")
(match_operand:VF_AVX512VL 3 "register_operand" "0"))
  (match_dup 3)
@@ -4094,7 +4094,7 @@ (define_insn "_fmsub__mask
   [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
(vec_merge:VF_AVX512VL
  (fma:VF_AVX512VL
-   (match_operand:VF_AVX512VL 1 "register_operand" "v")
+   (match_operand:VF_AVX512VL 1 "" "%v")
(match_operand:VF_AVX512VL 2 "" 
"")
(neg:VF_AVX512VL
  (match_operand:VF_AVX512VL 3 "register_operand" "0")))
@@ -4217,7 +4217,7 @@ (define_insn "_fnmadd__mas
(vec_merge:VF_AVX512VL
  (fma:VF_AVX512VL
(neg:VF_AVX512VL
- (match_operand:VF_AVX512VL 1 "register_operand" "v"))
+ (match_operand:VF_AVX512VL 1 "" "%v"))
(match_operand:VF_AVX512VL 2 "" 
"")
(match_operand:VF_AVX512VL 3 "register_operand" "0"))
  (match_dup 3)
@@ -4345,7 +4345,7 @@ (define_insn "_fnmsub__mas
(vec_merge:VF_AVX512VL
  (fma:VF_AVX512VL
(neg:VF_AVX512VL
- (match_operand:VF_AVX512VL 1 "register_operand" "v"))
+ (match_operand:VF_AVX512VL 1 "" "%v"))
(match_operand:VF_AVX512VL 2 "" 

Re: [PATCH] Add missing avx512fintrin.h _mm_mask{,3,z}_f{,n}m{add,sub}_s{s,d} intrinsics (PR target/89784)

2019-03-22 Thread Uros Bizjak
On Fri, Mar 22, 2019 at 11:02 AM Jakub Jelinek  wrote:
>
> On Fri, Mar 22, 2019 at 10:35:45AM +0100, Uros Bizjak wrote:
> > On Fri, Mar 22, 2019 at 9:41 AM Jakub Jelinek  wrote:
> > > The following patch adds forgotten avx512f fma intrinsics for masked
> > > scalar operations.
> > >
> > > Bootstrapped/regtested on x86_64-linux and i686-linux (on skylake-avx512),
> > > ok for trunk?
> >
> > There are several possibilities to mark the 1st and the 2nd operand of
> > fma pattern as commutative ("%"). However, there are already existing
> > patterns without commutative operand, so this improvement could be
> > eventually submitted as a follow-on patch.
>
> I actually don't know where it would be safe to use % in these masked
> patterns, except perhaps avx512f_vmfm{add,sub}__mask3.
>
> For FMA, naturally only the two operands that are multiplied should be
> commutative, but in most patterns one of those two uses "0" or "0,0"

This should be safe, we have had "*add_1" for decades that does
just the above.

> constraint and there is one or two match_dup 1 for it, so it really
> isn't commutative.

Hm, this situation involving match_dup needs some more thinking...

> Which leaves us with the 4 mask3 patterns only, as I said above, for
> the first two where neither of those are negated I think % should be ok.
> For the .*fnm{add,sub}.*mask3.* ones I'm not sure, because one of them
> is negated.  On the other side, seems various other existing fnm*
> patterns use % even on those.

It is safe to use even if one of the first two operands is negated.
According to the documentation, the negation represents negation of
the intermediate product, so it doesn't matter which operand is
negated.
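
As a quick sanity check of that statement, here is a scalar sketch with
hypothetical helper names.  It uses a plain multiply and add rather than a
fused operation, so it only illustrates the algebra: negating the product,
the first factor or the second factor all give the same value, and therefore
exchanging the two multiplied operands is harmless.

```c
#include <assert.h>

/* fnmadd as described: the intermediate product is negated.  */
double
model_fnmadd (double a, double b, double c)
{
  return -(a * b) + c;
}

/* Same operation with the negation attached to the first factor.  */
double
model_fnmadd_neg_op1 (double a, double b, double c)
{
  return (-a) * b + c;
}

/* Same operation with the negation attached to the second factor.  */
double
model_fnmadd_neg_op2 (double a, double b, double c)
{
  return a * (-b) + c;
}

/* Check the three forms against each other and against swapped operands,
   using values whose products are exactly representable.  */
void
check_negation_placement (void)
{
  double a = 1.5, b = -2.0, c = 0.25;

  assert (model_fnmadd (a, b, c) == model_fnmadd_neg_op1 (a, b, c));
  assert (model_fnmadd (a, b, c) == model_fnmadd_neg_op2 (a, b, c));
  assert (model_fnmadd_neg_op1 (a, b, c) == model_fnmadd_neg_op1 (b, a, c));
}
```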

Uros.


Re: [PATCH] Add missing avx512fintrin.h _mm_mask{,3,z}_f{,n}m{add,sub}_s{s,d} intrinsics (PR target/89784)

2019-03-22 Thread Jakub Jelinek
On Fri, Mar 22, 2019 at 10:35:45AM +0100, Uros Bizjak wrote:
> On Fri, Mar 22, 2019 at 9:41 AM Jakub Jelinek  wrote:
> > The following patch adds forgotten avx512f fma intrinsics for masked scalar
> > operations.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux (on skylake-avx512),
> > ok for trunk?
> 
> There are several possibilities to mark the 1st and the 2nd operand of
> fma pattern as commutative ("%"). However, there are already existing
> patterns without commutative operand, so this improvement could be
> eventually submitted as a follow-on patch.

I actually don't know where it would be safe to use % in these masked
patterns, except perhaps avx512f_vmfm{add,sub}__mask3.

For FMA, naturally only the two operands that are multiplied should be
commutative, but in most patterns one of those two uses "0" or "0,0"
constraint and there is one or two match_dup 1 for it, so it really
isn't commutative.

Which leaves us with the 4 mask3 patterns only, as I said above, for
the first two where neither of those are negated I think % should be ok.
For the .*fnm{add,sub}.*mask3.* ones I'm not sure, because one of them
is negated.  On the other side, seems various other existing fnm*
patterns use % even on those.

Jakub


Re: [PATCH] Add missing avx512fintrin.h _mm_mask{,3,z}_f{,n}m{add,sub}_s{s,d} intrinsics (PR target/89784)

2019-03-22 Thread Uros Bizjak
On Fri, Mar 22, 2019 at 9:41 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following patch adds forgotten avx512f fma intrinsics for masked scalar
> operations.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux (on skylake-avx512),
> ok for trunk?

There are several possibilities to mark the 1st and the 2nd operand of
fma pattern as commutative ("%"). However, there are already existing
patterns without commutative operand, so this improvement could be
eventually submitted as a follow-on patch.

So, LGTM for the whole thing.

Thanks,
Uros.

> 2019-03-22  Jakub Jelinek  
>
> PR target/89784
> * config/i386/i386.c (enum ix86_builtins): Remove
> IX86_BUILTIN_VFMSUBSD3_MASK3 and IX86_BUILTIN_VFMSUBSS3_MASK3.
> * config/i386/i386-builtin.def (__builtin_ia32_vfmaddsd3_mask,
> __builtin_ia32_vfmaddsd3_mask3, __builtin_ia32_vfmaddsd3_maskz,
> __builtin_ia32_vfmsubsd3_mask3, __builtin_ia32_vfmaddss3_mask,
> __builtin_ia32_vfmaddss3_mask3, __builtin_ia32_vfmaddss3_maskz,
> __builtin_ia32_vfmsubss3_mask3): New builtins.
> * config/i386/sse.md (avx512f_vmfmadd__mask,
> avx512f_vmfmadd__mask3,
> avx512f_vmfmadd__maskz_1,
> *avx512f_vmfmsub__mask,
> avx512f_vmfmsub__mask3,
> *avx512f_vmfmasub__maskz_1,
> *avx512f_vmfnmadd__mask,
> *avx512f_vmfnmadd__mask3,
> *avx512f_vmfnmadd__maskz_1,
> *avx512f_vmfnmsub__mask,
> *avx512f_vmfnmsub__mask3,
> *avx512f_vmfnmasub__maskz_1): New define_insns.
> (avx512f_vmfmadd__maskz): New define_expand.
> * config/i386/avx512fintrin.h (_mm_mask_fmadd_sd, _mm_mask_fmadd_ss,
> _mm_mask3_fmadd_sd, _mm_mask3_fmadd_ss, _mm_maskz_fmadd_sd,
> _mm_maskz_fmadd_ss, _mm_mask_fmsub_sd, _mm_mask_fmsub_ss,
> _mm_mask3_fmsub_sd, _mm_mask3_fmsub_ss, _mm_maskz_fmsub_sd,
> _mm_maskz_fmsub_ss, _mm_mask_fnmadd_sd, _mm_mask_fnmadd_ss,
> _mm_mask3_fnmadd_sd, _mm_mask3_fnmadd_ss, _mm_maskz_fnmadd_sd,
> _mm_maskz_fnmadd_ss, _mm_mask_fnmsub_sd, _mm_mask_fnmsub_ss,
> _mm_mask3_fnmsub_sd, _mm_mask3_fnmsub_ss, _mm_maskz_fnmsub_sd,
> _mm_maskz_fnmsub_ss, _mm_mask_fmadd_round_sd, _mm_mask_fmadd_round_ss,
> _mm_mask3_fmadd_round_sd, _mm_mask3_fmadd_round_ss,
> _mm_maskz_fmadd_round_sd, _mm_maskz_fmadd_round_ss,
> _mm_mask_fmsub_round_sd, _mm_mask_fmsub_round_ss,
> _mm_mask3_fmsub_round_sd, _mm_mask3_fmsub_round_ss,
> _mm_maskz_fmsub_round_sd, _mm_maskz_fmsub_round_ss,
> _mm_mask_fnmadd_round_sd, _mm_mask_fnmadd_round_ss,
> _mm_mask3_fnmadd_round_sd, _mm_mask3_fnmadd_round_ss,
> _mm_maskz_fnmadd_round_sd, _mm_maskz_fnmadd_round_ss,
> _mm_mask_fnmsub_round_sd, _mm_mask_fnmsub_round_ss,
> _mm_mask3_fnmsub_round_sd, _mm_mask3_fnmsub_round_ss,
> _mm_maskz_fnmsub_round_sd, _mm_maskz_fnmsub_round_ss): New intrinsics.
>
> * gcc.target/i386/sse-13.c (__builtin_ia32_vfmaddsd3_mask,
> __builtin_ia32_vfmaddsd3_mask3, __builtin_ia32_vfmaddsd3_maskz,
> __builtin_ia32_vfmsubsd3_mask3, __builtin_ia32_vfmaddss3_mask,
> __builtin_ia32_vfmaddss3_mask3, __builtin_ia32_vfmaddss3_maskz,
> __builtin_ia32_vfmsubss3_mask3): Define.
> * gcc.target/i386/sse-23.c (__builtin_ia32_vfmaddsd3_mask,
> __builtin_ia32_vfmaddsd3_mask3, __builtin_ia32_vfmaddsd3_maskz,
> __builtin_ia32_vfmsubsd3_mask3, __builtin_ia32_vfmaddss3_mask,
> __builtin_ia32_vfmaddss3_mask3, __builtin_ia32_vfmaddss3_maskz,
> __builtin_ia32_vfmsubss3_mask3): Define.
> * gcc.target/i386/avx-1.c (__builtin_ia32_vfmaddsd3_mask,
> __builtin_ia32_vfmaddsd3_mask3, __builtin_ia32_vfmaddsd3_maskz,
> __builtin_ia32_vfmsubsd3_mask3, __builtin_ia32_vfmaddss3_mask,
> __builtin_ia32_vfmaddss3_mask3, __builtin_ia32_vfmaddss3_maskz,
> __builtin_ia32_vfmsubss3_mask3): Define.
> * gcc.target/i386/sse-14.c: Add tests for
> _mm_mask{,3,z}_f{,n}m{add,sub}_round_s{s,d} builtins.
> * gcc.target/i386/sse-22.c: Likewise.
>
> 2019-03-22  Hongtao Liu  
>
> * gcc.target/i386/avx512f-vfmaddXXXsd-1.c (avx512f_test): Add tests
> for _mm_mask{,3,z}_*.
> * gcc.target/i386/avx512f-vfmaddXXXss-1.c (avx512f_test): Likewise.
> * gcc.target/i386/avx512f-vfmsubXXXsd-1.c (avx512f_test): Likewise.
> * gcc.target/i386/avx512f-vfmsubXXXss-1.c (avx512f_test): Likewise.
> * gcc.target/i386/avx512f-vfnmaddXXXsd-1.c (avx512f_test): Likewise.
> * gcc.target/i386/avx512f-vfnmaddXXXss-1.c (avx512f_test): Likewise.
> * gcc.target/i386/avx512f-vfnmsubXXXsd-1.c (avx512f_test): Likewise.
> * gcc.target/i386/avx512f-vfnmsubXXXss-1.c (avx512f_test): Likewise.
> * gcc.target/i386/avx512f-vfmaddXXXsd-2.c: New test.
> * gcc.target/i386/avx512f-vfmaddXXXss-2.c: New test.
> * gcc.

[PATCH] Add missing avx512fintrin.h _mm_mask{,3,z}_f{,n}m{add,sub}_s{s,d} intrinsics (PR target/89784)

2019-03-22 Thread Jakub Jelinek
Hi!

The following patch adds forgotten avx512f fma intrinsics for masked scalar
operations.

Bootstrapped/regtested on x86_64-linux and i686-linux (on skylake-avx512),
ok for trunk?

2019-03-22  Jakub Jelinek  

PR target/89784
* config/i386/i386.c (enum ix86_builtins): Remove
IX86_BUILTIN_VFMSUBSD3_MASK3 and IX86_BUILTIN_VFMSUBSS3_MASK3.
* config/i386/i386-builtin.def (__builtin_ia32_vfmaddsd3_mask,
__builtin_ia32_vfmaddsd3_mask3, __builtin_ia32_vfmaddsd3_maskz,
__builtin_ia32_vfmsubsd3_mask3, __builtin_ia32_vfmaddss3_mask,
__builtin_ia32_vfmaddss3_mask3, __builtin_ia32_vfmaddss3_maskz,
__builtin_ia32_vfmsubss3_mask3): New builtins.
* config/i386/sse.md (avx512f_vmfmadd__mask,
avx512f_vmfmadd__mask3,
avx512f_vmfmadd__maskz_1,
*avx512f_vmfmsub__mask,
avx512f_vmfmsub__mask3,
*avx512f_vmfmasub__maskz_1,
*avx512f_vmfnmadd__mask,
*avx512f_vmfnmadd__mask3,
*avx512f_vmfnmadd__maskz_1,
*avx512f_vmfnmsub__mask,
*avx512f_vmfnmsub__mask3,
*avx512f_vmfnmasub__maskz_1): New define_insns.
(avx512f_vmfmadd__maskz): New define_expand.
* config/i386/avx512fintrin.h (_mm_mask_fmadd_sd, _mm_mask_fmadd_ss,
_mm_mask3_fmadd_sd, _mm_mask3_fmadd_ss, _mm_maskz_fmadd_sd,
_mm_maskz_fmadd_ss, _mm_mask_fmsub_sd, _mm_mask_fmsub_ss,
_mm_mask3_fmsub_sd, _mm_mask3_fmsub_ss, _mm_maskz_fmsub_sd,
_mm_maskz_fmsub_ss, _mm_mask_fnmadd_sd, _mm_mask_fnmadd_ss,
_mm_mask3_fnmadd_sd, _mm_mask3_fnmadd_ss, _mm_maskz_fnmadd_sd,
_mm_maskz_fnmadd_ss, _mm_mask_fnmsub_sd, _mm_mask_fnmsub_ss,
_mm_mask3_fnmsub_sd, _mm_mask3_fnmsub_ss, _mm_maskz_fnmsub_sd,
_mm_maskz_fnmsub_ss, _mm_mask_fmadd_round_sd, _mm_mask_fmadd_round_ss,
_mm_mask3_fmadd_round_sd, _mm_mask3_fmadd_round_ss,
_mm_maskz_fmadd_round_sd, _mm_maskz_fmadd_round_ss,
_mm_mask_fmsub_round_sd, _mm_mask_fmsub_round_ss,
_mm_mask3_fmsub_round_sd, _mm_mask3_fmsub_round_ss,
_mm_maskz_fmsub_round_sd, _mm_maskz_fmsub_round_ss,
_mm_mask_fnmadd_round_sd, _mm_mask_fnmadd_round_ss,
_mm_mask3_fnmadd_round_sd, _mm_mask3_fnmadd_round_ss,
_mm_maskz_fnmadd_round_sd, _mm_maskz_fnmadd_round_ss,
_mm_mask_fnmsub_round_sd, _mm_mask_fnmsub_round_ss,
_mm_mask3_fnmsub_round_sd, _mm_mask3_fnmsub_round_ss,
_mm_maskz_fnmsub_round_sd, _mm_maskz_fnmsub_round_ss): New intrinsics.

* gcc.target/i386/sse-13.c (__builtin_ia32_vfmaddsd3_mask,
__builtin_ia32_vfmaddsd3_mask3, __builtin_ia32_vfmaddsd3_maskz,
__builtin_ia32_vfmsubsd3_mask3, __builtin_ia32_vfmaddss3_mask,
__builtin_ia32_vfmaddss3_mask3, __builtin_ia32_vfmaddss3_maskz,
__builtin_ia32_vfmsubss3_mask3): Define.
* gcc.target/i386/sse-23.c (__builtin_ia32_vfmaddsd3_mask,
__builtin_ia32_vfmaddsd3_mask3, __builtin_ia32_vfmaddsd3_maskz,
__builtin_ia32_vfmsubsd3_mask3, __builtin_ia32_vfmaddss3_mask,
__builtin_ia32_vfmaddss3_mask3, __builtin_ia32_vfmaddss3_maskz,
__builtin_ia32_vfmsubss3_mask3): Define.
* gcc.target/i386/avx-1.c (__builtin_ia32_vfmaddsd3_mask,
__builtin_ia32_vfmaddsd3_mask3, __builtin_ia32_vfmaddsd3_maskz,
__builtin_ia32_vfmsubsd3_mask3, __builtin_ia32_vfmaddss3_mask,
__builtin_ia32_vfmaddss3_mask3, __builtin_ia32_vfmaddss3_maskz,
__builtin_ia32_vfmsubss3_mask3): Define.
* gcc.target/i386/sse-14.c: Add tests for
_mm_mask{,3,z}_f{,n}m{add,sub}_round_s{s,d} builtins.
* gcc.target/i386/sse-22.c: Likewise.

2019-03-22  Hongtao Liu  

* gcc.target/i386/avx512f-vfmaddXXXsd-1.c (avx512f_test): Add tests
for _mm_mask{,3,z}_*.
* gcc.target/i386/avx512f-vfmaddXXXss-1.c (avx512f_test): Likewise.
* gcc.target/i386/avx512f-vfmsubXXXsd-1.c (avx512f_test): Likewise.
* gcc.target/i386/avx512f-vfmsubXXXss-1.c (avx512f_test): Likewise.
* gcc.target/i386/avx512f-vfnmaddXXXsd-1.c (avx512f_test): Likewise.
* gcc.target/i386/avx512f-vfnmaddXXXss-1.c (avx512f_test): Likewise.
* gcc.target/i386/avx512f-vfnmsubXXXsd-1.c (avx512f_test): Likewise.
* gcc.target/i386/avx512f-vfnmsubXXXss-1.c (avx512f_test): Likewise.
* gcc.target/i386/avx512f-vfmaddXXXsd-2.c: New test.
* gcc.target/i386/avx512f-vfmaddXXXss-2.c: New test.
* gcc.target/i386/avx512f-vfmsubXXXsd-2.c: New test.
* gcc.target/i386/avx512f-vfmsubXXXss-2.c: New test.
* gcc.target/i386/avx512f-vfnmaddXXXsd-2.c: New test.
* gcc.target/i386/avx512f-vfnmaddXXXss-2.c: New test.
* gcc.target/i386/avx512f-vfnmsubXXXsd-2.c: New test.
* gcc.target/i386/avx512f-vfnmsubXXXss-2.c: New test.

--- gcc/config/i386/i386.c.jj   2019-03-19 08:25:24.225118967 +0100
+++ gcc/config/i386/i386.c  2019-03-21 17:08:40.840369883 +0100
@@ -30524,8 +30524,6 @@

Re: [PATCH] Add missing avx512fintrin.h intrinsics (PR target/89602)

2019-03-08 Thread H.J. Lu
On Thu, Mar 7, 2019 at 4:09 PM Jakub Jelinek  wrote:
>
> On Thu, Mar 07, 2019 at 08:11:53AM +0100, Uros Bizjak wrote:
> > > +(define_insn "*avx512f_load_mask"
> > > +  [(set (match_operand: 0 "register_operand" "=v")
> > > +   (vec_merge:
> > > + (vec_merge:
> > > +   (vec_duplicate:
> > > + (match_operand:MODEF 1 "memory_operand" "m"))
> > > +   (match_operand: 2 "nonimm_or_0_operand" "0C")
> > > +   (match_operand:QI 3 "nonmemory_operand" "Yk"))
> >
> > Is there a reason to have nonmemory_operand predicate here instead of
> > register_operand?
>
> Thanks for catching that up, that was from my earlier attempt to have
> Yk,n constraints and deal with that during output.  For store it was
> possible, for others there were some cases it couldn't handle but further
> testing revealed that the combiner already handles most of the constant
> mask cases right.
>
> Here is updated patch, I've changed this in two spots.  It even improves the
> constant 1 case (the only one that is still not optimized as much as it
> should):
>  f4:
> -       movzbl  .LC0(%rip), %eax
> +       movl    $1, %eax
>         kmovw   %eax, %k1
>         vmovsd  (%rsi), %xmm0{%k1}{z}
>         ret
> Tested so far with make check-gcc RUNTESTFLAGS=i386.exp=avx512f-vmovs*.c
> and compiling/eyeballing differences on the short testcase I've posted
> in the description with also the u, -> 1, and u, -> 0, changes, apart
> from the above f4 no differences.
>
> Ok for trunk if it passes another full bootstrap/regtest?
>
> 2019-03-07  Jakub Jelinek  
>
> PR target/89602
> * config/i386/sse.md (avx512f_mov_mask,
> *avx512f_load_mask, avx512f_store_mask): New define_insns.
> (avx512f_load_mask): New define_expand.
> * config/i386/i386-builtin.def (__builtin_ia32_loadsd_mask,
> __builtin_ia32_loadss_mask, __builtin_ia32_storesd_mask,
> __builtin_ia32_storess_mask, __builtin_ia32_movesd_mask,
> __builtin_ia32_movess_mask): New builtins.
> * config/i386/avx512fintrin.h (_mm_mask_load_ss, _mm_maskz_load_ss,
> _mm_mask_load_sd, _mm_maskz_load_sd, _mm_mask_move_ss,
> _mm_maskz_move_ss, _mm_mask_move_sd, _mm_maskz_move_sd,
> _mm_mask_store_ss, _mm_mask_store_sd): New intrinsics.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89630

This looks very strange, since this patch only touched the backend.

-- 
H.J.


Re: [PATCH] Add missing avx512fintrin.h intrinsics (PR target/89602)

2019-03-07 Thread H.J. Lu
Looks good to me.

Thanks.

On Thu, Mar 7, 2019, 4:15 PM Uros Bizjak  wrote:

> On Thu, Mar 7, 2019 at 9:09 AM Jakub Jelinek  wrote:
> >
> > On Thu, Mar 07, 2019 at 08:11:53AM +0100, Uros Bizjak wrote:
> > > > +(define_insn "*avx512f_load_mask"
> > > > +  [(set (match_operand: 0 "register_operand" "=v")
> > > > +   (vec_merge:
> > > > + (vec_merge:
> > > > +   (vec_duplicate:
> > > > + (match_operand:MODEF 1 "memory_operand" "m"))
> > > > +   (match_operand: 2 "nonimm_or_0_operand" "0C")
> > > > +   (match_operand:QI 3 "nonmemory_operand" "Yk"))
> > >
> > > Is there a reason to have nonmemory_operand predicate here instead of
> > > register_operand?
> >
> > Thanks for catching that up, that was from my earlier attempt to have
> > Yk,n constraints and deal with that during output.  For store it was
> > possible, for others there were some cases it couldn't handle but further
> > testing revealed that the combiner already handles most of the constant
> > mask cases right.
> >
> > Here is updated patch, I've changed this in two spots.  It even improves
> the
> > constant 1 case (the only one that is still not optimized as much as it
> > should):
> >  f4:
> > -       movzbl  .LC0(%rip), %eax
> > +       movl    $1, %eax
> >         kmovw   %eax, %k1
> >         vmovsd  (%rsi), %xmm0{%k1}{z}
> >         ret
> > Tested so far with make check-gcc RUNTESTFLAGS=i386.exp=avx512f-vmovs*.c
> > and compiling/eyeballing differences on the short testcase I've posted
> > in the description with also the u, -> 1, and u, -> 0, changes, apart
> > from the above f4 no differences.
> >
> > Ok for trunk if it passes another full bootstrap/regtest?
>
> LGTM with another fixup below.
>
> HJ should approve addition of intrinsic in header files.
>
> Thanks,
> Uros.
>
> >
> > 2019-03-07  Jakub Jelinek  
> >
> > PR target/89602
> > * config/i386/sse.md (avx512f_mov_mask,
> > *avx512f_load_mask, avx512f_store_mask): New
> define_insns.
> > (avx512f_load_mask): New define_expand.
> > * config/i386/i386-builtin.def (__builtin_ia32_loadsd_mask,
> > __builtin_ia32_loadss_mask, __builtin_ia32_storesd_mask,
> > __builtin_ia32_storess_mask, __builtin_ia32_movesd_mask,
> > __builtin_ia32_movess_mask): New builtins.
> > * config/i386/avx512fintrin.h (_mm_mask_load_ss,
> _mm_maskz_load_ss,
> > _mm_mask_load_sd, _mm_maskz_load_sd, _mm_mask_move_ss,
> > _mm_maskz_move_ss, _mm_mask_move_sd, _mm_maskz_move_sd,
> > _mm_mask_store_ss, _mm_mask_store_sd): New intrinsics.
> >
> > * gcc.target/i386/avx512f-vmovss-1.c: New test.
> > * gcc.target/i386/avx512f-vmovss-2.c: New test.
> > * gcc.target/i386/avx512f-vmovss-3.c: New test.
> > * gcc.target/i386/avx512f-vmovsd-1.c: New test.
> > * gcc.target/i386/avx512f-vmovsd-2.c: New test.
> > * gcc.target/i386/avx512f-vmovsd-3.c: New test.
> >
> > --- gcc/config/i386/sse.md.jj   2019-02-20 23:40:17.119140235 +0100
> > +++ gcc/config/i386/sse.md  2019-03-06 19:15:12.379749161 +0100
> > @@ -1151,6 +1151,67 @@ (define_insn "_load_mask"
> > (set_attr "memory" "none,load")
> > (set_attr "mode" "")])
> >
> > +(define_insn "avx512f_mov_mask"
> > +  [(set (match_operand:VF_128 0 "register_operand" "=v")
> > +   (vec_merge:VF_128
> > + (vec_merge:VF_128
> > +   (match_operand:VF_128 2 "register_operand" "v")
> > +   (match_operand:VF_128 3 "nonimm_or_0_operand" "0C")
> > +   (match_operand:QI 4 "register_operand" "Yk"))
> > + (match_operand:VF_128 1 "register_operand" "v")
> > + (const_int 1)))]
> > +  "TARGET_AVX512F"
> > +  "vmov\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
> > +  [(set_attr "type" "ssemov")
> > +   (set_attr "prefix" "evex")
> > +   (set_attr "mode" "")])
> > +
> > +(define_expand "avx512f_load_mask"
> > +  [(set (match_operand: 0 "register_operand")
> > +   (vec_merge:
> > + (vec_merge:
> > +   (vec_duplicate:
> > + (match_operand:MODEF 1 "memory_operand"))
> > +   (match_operand: 2 "nonimm_or_0_operand")
> > +   (match_operand:QI 3 "nonmemory_operand"))
>
> Use register_operand here; the expander should match the corresponding
> insn pattern.
>
> > + (match_dup 4)
> > + (const_int 1)))]
> > +  "TARGET_AVX512F"
> > +  "operands[4] = CONST0_RTX (mode);")
> > +
> > +(define_insn "*avx512f_load_mask"
> > +  [(set (match_operand: 0 "register_operand" "=v")
> > +   (vec_merge:
> > + (vec_merge:
> > +   (vec_duplicate:
> > + (match_operand:MODEF 1 "memory_operand" "m"))
> > +   (match_operand: 2 "nonimm_or_0_operand" "0C")
> > +   (match_operand:QI 3 "register_operand" "Yk"))
> > + (match_operand: 4 "const0_operand" "C")
> > + (const_int 1)))]
> > +  "TARGET_AVX512F"
> > +  "vmov\t{%1, %0%{%3%}%N2|%0%{3%}%N2, %1}"
> > +  [(set_attr "ty

Re: [PATCH] Add missing avx512fintrin.h intrinsics (PR target/89602)

2019-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2019 at 9:09 AM Jakub Jelinek  wrote:
>
> On Thu, Mar 07, 2019 at 08:11:53AM +0100, Uros Bizjak wrote:
> > > +(define_insn "*avx512f_load_mask"
> > > +  [(set (match_operand: 0 "register_operand" "=v")
> > > +   (vec_merge:
> > > + (vec_merge:
> > > +   (vec_duplicate:
> > > + (match_operand:MODEF 1 "memory_operand" "m"))
> > > +   (match_operand: 2 "nonimm_or_0_operand" "0C")
> > > +   (match_operand:QI 3 "nonmemory_operand" "Yk"))
> >
> > Is there a reason to have nonmemory_operand predicate here instead of
> > register_operand?
>
> Thanks for catching that up, that was from my earlier attempt to have
> Yk,n constraints and deal with that during output.  For store it was
> possible, for others there were some cases it couldn't handle but further
> testing revealed that the combiner already handles most of the constant
> mask cases right.
>
> Here is updated patch, I've changed this in two spots.  It even improves the
> constant 1 case (the only one that is still not optimized as much as it
> should):
>  f4:
> -       movzbl  .LC0(%rip), %eax
> +       movl    $1, %eax
>         kmovw   %eax, %k1
>         vmovsd  (%rsi), %xmm0{%k1}{z}
>         ret
> Tested so far with make check-gcc RUNTESTFLAGS=i386.exp=avx512f-vmovs*.c
> and compiling/eyeballing differences on the short testcase I've posted
> in the description with also the u, -> 1, and u, -> 0, changes, apart
> from the above f4 no differences.
>
> Ok for trunk if it passes another full bootstrap/regtest?

LGTM with another fixup below.

HJ should approve addition of intrinsic in header files.

Thanks,
Uros.

>
> 2019-03-07  Jakub Jelinek  
>
> PR target/89602
> * config/i386/sse.md (avx512f_mov_mask,
> *avx512f_load_mask, avx512f_store_mask): New define_insns.
> (avx512f_load_mask): New define_expand.
> * config/i386/i386-builtin.def (__builtin_ia32_loadsd_mask,
> __builtin_ia32_loadss_mask, __builtin_ia32_storesd_mask,
> __builtin_ia32_storess_mask, __builtin_ia32_movesd_mask,
> __builtin_ia32_movess_mask): New builtins.
> * config/i386/avx512fintrin.h (_mm_mask_load_ss, _mm_maskz_load_ss,
> _mm_mask_load_sd, _mm_maskz_load_sd, _mm_mask_move_ss,
> _mm_maskz_move_ss, _mm_mask_move_sd, _mm_maskz_move_sd,
> _mm_mask_store_ss, _mm_mask_store_sd): New intrinsics.
>
> * gcc.target/i386/avx512f-vmovss-1.c: New test.
> * gcc.target/i386/avx512f-vmovss-2.c: New test.
> * gcc.target/i386/avx512f-vmovss-3.c: New test.
> * gcc.target/i386/avx512f-vmovsd-1.c: New test.
> * gcc.target/i386/avx512f-vmovsd-2.c: New test.
> * gcc.target/i386/avx512f-vmovsd-3.c: New test.
>
> --- gcc/config/i386/sse.md.jj   2019-02-20 23:40:17.119140235 +0100
> +++ gcc/config/i386/sse.md  2019-03-06 19:15:12.379749161 +0100
> @@ -1151,6 +1151,67 @@ (define_insn "_load_mask"
> (set_attr "memory" "none,load")
> (set_attr "mode" "")])
>
> +(define_insn "avx512f_mov_mask"
> +  [(set (match_operand:VF_128 0 "register_operand" "=v")
> +   (vec_merge:VF_128
> + (vec_merge:VF_128
> +   (match_operand:VF_128 2 "register_operand" "v")
> +   (match_operand:VF_128 3 "nonimm_or_0_operand" "0C")
> +   (match_operand:QI 4 "register_operand" "Yk"))
> + (match_operand:VF_128 1 "register_operand" "v")
> + (const_int 1)))]
> +  "TARGET_AVX512F"
> +  "vmov\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
> +  [(set_attr "type" "ssemov")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "")])
> +
> +(define_expand "avx512f_load_mask"
> +  [(set (match_operand: 0 "register_operand")
> +   (vec_merge:
> + (vec_merge:
> +   (vec_duplicate:
> + (match_operand:MODEF 1 "memory_operand"))
> +   (match_operand: 2 "nonimm_or_0_operand")
> +   (match_operand:QI 3 "nonmemory_operand"))

Use register_operand here; the expander should match the corresponding insn pattern.

> + (match_dup 4)
> + (const_int 1)))]
> +  "TARGET_AVX512F"
> +  "operands[4] = CONST0_RTX (mode);")
> +
> +(define_insn "*avx512f_load_mask"
> +  [(set (match_operand: 0 "register_operand" "=v")
> +   (vec_merge:
> + (vec_merge:
> +   (vec_duplicate:
> + (match_operand:MODEF 1 "memory_operand" "m"))
> +   (match_operand: 2 "nonimm_or_0_operand" "0C")
> +   (match_operand:QI 3 "register_operand" "Yk"))
> + (match_operand: 4 "const0_operand" "C")
> + (const_int 1)))]
> +  "TARGET_AVX512F"
> +  "vmov\t{%1, %0%{%3%}%N2|%0%{3%}%N2, %1}"
> +  [(set_attr "type" "ssemov")
> +   (set_attr "prefix" "evex")
> +   (set_attr "memory" "load")
> +   (set_attr "mode" "")])
> +
> +(define_insn "avx512f_store_mask"
> +  [(set (match_operand:MODEF 0 "memory_operand" "=m")
> +   (if_then_else:MODEF
> + (and:QI (match_operand:QI 2 "register_operand" "Yk")
> + 

Re: [PATCH] Add missing avx512fintrin.h intrinsics (PR target/89602)

2019-03-07 Thread Jakub Jelinek
On Thu, Mar 07, 2019 at 08:11:53AM +0100, Uros Bizjak wrote:
> > +(define_insn "*avx512f_load_mask"
> > +  [(set (match_operand: 0 "register_operand" "=v")
> > +   (vec_merge:
> > + (vec_merge:
> > +   (vec_duplicate:
> > + (match_operand:MODEF 1 "memory_operand" "m"))
> > +   (match_operand: 2 "nonimm_or_0_operand" "0C")
> > +   (match_operand:QI 3 "nonmemory_operand" "Yk"))
> 
> Is there a reason to have nonmemory_operand predicate here instead of
> register_operand?

Thanks for catching that up, that was from my earlier attempt to have
Yk,n constraints and deal with that during output.  For store it was
possible, for others there were some cases it couldn't handle but further
testing revealed that the combiner already handles most of the constant
mask cases right.

Here is updated patch, I've changed this in two spots.  It even improves the
constant 1 case (the only one that is still not optimized as much as it
should):
 f4:
-       movzbl  .LC0(%rip), %eax
+       movl    $1, %eax
        kmovw   %eax, %k1
        vmovsd  (%rsi), %xmm0{%k1}{z}
        ret
Tested so far with make check-gcc RUNTESTFLAGS=i386.exp=avx512f-vmovs*.c
and compiling/eyeballing differences on the short testcase I've posted
in the description with also the u, -> 1, and u, -> 0, changes, apart
from the above f4 no differences.

Ok for trunk if it passes another full bootstrap/regtest?

2019-03-07  Jakub Jelinek  

PR target/89602
* config/i386/sse.md (avx512f_mov_mask,
*avx512f_load_mask, avx512f_store_mask): New define_insns.
(avx512f_load_mask): New define_expand.
* config/i386/i386-builtin.def (__builtin_ia32_loadsd_mask,
__builtin_ia32_loadss_mask, __builtin_ia32_storesd_mask,
__builtin_ia32_storess_mask, __builtin_ia32_movesd_mask,
__builtin_ia32_movess_mask): New builtins.
* config/i386/avx512fintrin.h (_mm_mask_load_ss, _mm_maskz_load_ss,
_mm_mask_load_sd, _mm_maskz_load_sd, _mm_mask_move_ss,
_mm_maskz_move_ss, _mm_mask_move_sd, _mm_maskz_move_sd,
_mm_mask_store_ss, _mm_mask_store_sd): New intrinsics.

* gcc.target/i386/avx512f-vmovss-1.c: New test.
* gcc.target/i386/avx512f-vmovss-2.c: New test.
* gcc.target/i386/avx512f-vmovss-3.c: New test.
* gcc.target/i386/avx512f-vmovsd-1.c: New test.
* gcc.target/i386/avx512f-vmovsd-2.c: New test.
* gcc.target/i386/avx512f-vmovsd-3.c: New test.

--- gcc/config/i386/sse.md.jj   2019-02-20 23:40:17.119140235 +0100
+++ gcc/config/i386/sse.md  2019-03-06 19:15:12.379749161 +0100
@@ -1151,6 +1151,67 @@ (define_insn "_load_mask"
(set_attr "memory" "none,load")
(set_attr "mode" "")])
 
+(define_insn "avx512f_mov_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+   (vec_merge:VF_128
+ (vec_merge:VF_128
+   (match_operand:VF_128 2 "register_operand" "v")
+   (match_operand:VF_128 3 "nonimm_or_0_operand" "0C")
+   (match_operand:QI 4 "register_operand" "Yk"))
+ (match_operand:VF_128 1 "register_operand" "v")
+ (const_int 1)))]
+  "TARGET_AVX512F"
+  "vmov\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "")])
+
+(define_expand "avx512f_load_mask"
+  [(set (match_operand: 0 "register_operand")
+   (vec_merge:
+ (vec_merge:
+   (vec_duplicate:
+ (match_operand:MODEF 1 "memory_operand"))
+   (match_operand: 2 "nonimm_or_0_operand")
+   (match_operand:QI 3 "nonmemory_operand"))
+ (match_dup 4)
+ (const_int 1)))]
+  "TARGET_AVX512F"
+  "operands[4] = CONST0_RTX (mode);")
+
+(define_insn "*avx512f_load_mask"
+  [(set (match_operand: 0 "register_operand" "=v")
+   (vec_merge:
+ (vec_merge:
+   (vec_duplicate:
+ (match_operand:MODEF 1 "memory_operand" "m"))
+   (match_operand: 2 "nonimm_or_0_operand" "0C")
+   (match_operand:QI 3 "register_operand" "Yk"))
+ (match_operand: 4 "const0_operand" "C")
+ (const_int 1)))]
+  "TARGET_AVX512F"
+  "vmov\t{%1, %0%{%3%}%N2|%0%{3%}%N2, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "load")
+   (set_attr "mode" "")])
+
+(define_insn "avx512f_store_mask"
+  [(set (match_operand:MODEF 0 "memory_operand" "=m")
+   (if_then_else:MODEF
+ (and:QI (match_operand:QI 2 "register_operand" "Yk")
+(const_int 1))
+ (vec_select:MODEF
+   (match_operand: 1 "register_operand" "v")
+   (parallel [(const_int 0)]))
+ (match_dup 0)))]
+  "TARGET_AVX512F"
+  "vmov\t{%1, %0%{%2%}|%0%{%2%}, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "store")
+   (set_attr "mode" "")])
+
 (define_insn "_blendm"
   [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v")
(vec_me

Re: [PATCH] Add missing avx512fintrin.h intrinsics (PR target/89602)

2019-03-06 Thread Uros Bizjak
On Thu, Mar 7, 2019 at 12:49 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following patch adds vmovss/vmovsd masked intrinsics.
> On
> #include 
> __m128 f1 (__m128 w, __mmask8 u, const float *p) { return _mm_mask_load_ss (w, u, p); }
> __m128 f2 (__mmask8 u, const float *p) { return _mm_maskz_load_ss (u, p); }
> __m128d f3 (__m128d w, __mmask8 u, const double *p) { return _mm_mask_load_sd (w, u, p); }
> __m128d f4 (__mmask8 u, const double *p) { return _mm_maskz_load_sd (u, p); }
> __m128 f5 (__m128 w, __mmask8 u, __m128 a, __m128 b) { return _mm_mask_move_ss (w, u, a, b); }
> __m128 f6 (__mmask8 u, __m128 a, __m128 b) { return _mm_maskz_move_ss (u, a, b); }
> __m128d f7 (__m128d w, __mmask8 u, __m128d a, __m128d b) { return _mm_mask_move_sd (w, u, a, b); }
> __m128d f8 (__mmask8 u, __m128d a, __m128d b) { return _mm_maskz_move_sd (u, a, b); }
> void f9 (float *p, __mmask8 u, __m128 a) { _mm_mask_store_ss (p, u, a); }
> void f10 (double *p, __mmask8 u, __m128d a) { _mm_mask_store_sd (p, u, a); }
> it generates the same assembly with -O2 -mavx512f as icc 19 or clang trunk.
> It mostly does a good job also when the mask is constant, on the above
> testcase with u arguments replaced with 1 I get:
> f1: vmovss (%rsi), %xmm0
> f2: vmovss (%rsi), %xmm0
> f3: vmovq (%rsi), %xmm0
> f4: movzbl .LC0(%rip), %eax; kmovw %eax, %k1; vmovsd (%rsi), %xmm0{%k1}{z}
> f5: vmovss %xmm2, %xmm1, %xmm0
> f6: vmovss %xmm1, %xmm0, %xmm0
> f7: vmovsd %xmm2, %xmm1, %xmm0
> f8: vmovsd %xmm1, %xmm0, %xmm0
> f9: vmovss %xmm0, (%rdi)
> f10: vmovlpd %xmm0, (%rdi)
> Except for f4 that looks reasonable to me (and as tested in the testsuite
> works too), for f4 guess either we need to improve simplify-rtx.c or add
> some pattern for the combiner.  Can handle that as follow-up.
> When instead using 0 mask, I get:
> f1: kxorw %k1, %k1, %k1; vmovss (%rsi), %xmm0{%k1}
> f2: vxorps %xmm0, %xmm0, %xmm0
> f3: kxorw %k1, %k1, %k1; vmovsd (%rsi), %xmm0{%k1}
> f4: vxorpd %xmm0, %xmm0, %xmm0
> f5: vmovss %xmm0, %xmm1, %xmm0
> f6: kxorw %k1, %k1, %k1; vmovss %xmm1, %xmm0, %xmm0{%k1}{z}
> f7: vmovsd %xmm0, %xmm1, %xmm0
> f8: kxorw %k1, %k1, %k1; vmovsd %xmm1, %xmm0, %xmm0{%k1}{z}
> f9: nothing
> f10: nothing
> which looks good to me.  For f1/f3/f6/f8, I really have no idea if there is
> some single insn that could do that kind of operation.  This is also tested
> at runtime in the testsuite.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2019-03-07  Jakub Jelinek  
>
> PR target/89602
> * config/i386/sse.md (avx512f_mov_mask,
> *avx512f_load_mask, avx512f_store_mask): New define_insns.
> (avx512f_load_mask): New define_expand.
> * config/i386/i386-builtin.def (__builtin_ia32_loadsd_mask,
> __builtin_ia32_loadss_mask, __builtin_ia32_storesd_mask,
> __builtin_ia32_storess_mask, __builtin_ia32_movesd_mask,
> __builtin_ia32_movess_mask): New builtins.
> * config/i386/avx512fintrin.h (_mm_mask_load_ss, _mm_maskz_load_ss,
> _mm_mask_load_sd, _mm_maskz_load_sd, _mm_mask_move_ss,
> _mm_maskz_move_ss, _mm_mask_move_sd, _mm_maskz_move_sd,
> _mm_mask_store_ss, _mm_mask_store_sd): New intrinsics.
>
> * gcc.target/i386/avx512f-vmovss-1.c: New test.
> * gcc.target/i386/avx512f-vmovss-2.c: New test.
> * gcc.target/i386/avx512f-vmovss-3.c: New test.
> * gcc.target/i386/avx512f-vmovsd-1.c: New test.
> * gcc.target/i386/avx512f-vmovsd-2.c: New test.
> * gcc.target/i386/avx512f-vmovsd-3.c: New test.
>
> --- gcc/config/i386/sse.md.jj   2019-02-20 23:40:17.119140235 +0100
> +++ gcc/config/i386/sse.md  2019-03-06 19:15:12.379749161 +0100
> @@ -1151,6 +1151,67 @@ (define_insn "_load_mask"
> (set_attr "memory" "none,load")
> (set_attr "mode" "")])
>
> +(define_insn "avx512f_mov_mask"
> +  [(set (match_operand:VF_128 0 "register_operand" "=v")
> +   (vec_merge:VF_128
> + (vec_merge:VF_128
> +   (match_operand:VF_128 2 "register_operand" "v")
> +   (match_operand:VF_128 3 "nonimm_or_0_operand" "0C")
> +   (match_operand:QI 4 "register_operand" "Yk"))
> + (match_operand:VF_128 1 "register_operand" "v")
> + (const_int 1)))]
> +  "TARGET_AVX512F"
> +  "vmov\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
> +  [(set_attr "type" "ssemov")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "")])
> +
> +(define_expand "avx512f_load_mask"
> +  [(set (match_operand: 0 "register_operand")
> +   (vec_merge:
> + (vec_merge:
> +   (vec_duplicate:
> + (match_operand:MODEF 1 "memory_operand"))
> +   (match_operand: 2 "nonimm_or_0_operand")
> +   (match_operand:QI 3 "nonmemory_operand"))
> + (match_dup 4)
> + (const_int 1)))]
> +  "TARGET_AVX512F"
> +  "operands[4] = CONST0_RTX (mode);")
> +
> +(define_insn "*avx512f_load_mask"
> +  [(set (match_operand: 0 "register_operand" "=v")
>

[PATCH] Add missing avx512fintrin.h intrinsics (PR target/89602)

2019-03-06 Thread Jakub Jelinek
Hi!

The following patch adds vmovss/vmovsd masked intrinsics.
On
#include 
__m128 f1 (__m128 w, __mmask8 u, const float *p) { return _mm_mask_load_ss (w, u, p); }
__m128 f2 (__mmask8 u, const float *p) { return _mm_maskz_load_ss (u, p); }
__m128d f3 (__m128d w, __mmask8 u, const double *p) { return _mm_mask_load_sd (w, u, p); }
__m128d f4 (__mmask8 u, const double *p) { return _mm_maskz_load_sd (u, p); }
__m128 f5 (__m128 w, __mmask8 u, __m128 a, __m128 b) { return _mm_mask_move_ss (w, u, a, b); }
__m128 f6 (__mmask8 u, __m128 a, __m128 b) { return _mm_maskz_move_ss (u, a, b); }
__m128d f7 (__m128d w, __mmask8 u, __m128d a, __m128d b) { return _mm_mask_move_sd (w, u, a, b); }
__m128d f8 (__mmask8 u, __m128d a, __m128d b) { return _mm_maskz_move_sd (u, a, b); }
void f9 (float *p, __mmask8 u, __m128 a) { _mm_mask_store_ss (p, u, a); }
void f10 (double *p, __mmask8 u, __m128d a) { _mm_mask_store_sd (p, u, a); }
it generates the same assembly with -O2 -mavx512f as icc 19 or clang trunk.
It mostly does a good job also when the mask is constant, on the above
testcase with u arguments replaced with 1 I get:
f1: vmovss (%rsi), %xmm0
f2: vmovss (%rsi), %xmm0
f3: vmovq (%rsi), %xmm0
f4: movzbl .LC0(%rip), %eax; kmovw %eax, %k1; vmovsd (%rsi), %xmm0{%k1}{z}
f5: vmovss %xmm2, %xmm1, %xmm0
f6: vmovss %xmm1, %xmm0, %xmm0
f7: vmovsd %xmm2, %xmm1, %xmm0
f8: vmovsd %xmm1, %xmm0, %xmm0
f9: vmovss %xmm0, (%rdi)
f10: vmovlpd %xmm0, (%rdi)
Except for f4 that looks reasonable to me (and as tested in the testsuite
works too), for f4 guess either we need to improve simplify-rtx.c or add
some pattern for the combiner.  Can handle that as follow-up.
When instead using 0 mask, I get:
f1: kxorw %k1, %k1, %k1; vmovss (%rsi), %xmm0{%k1}
f2: vxorps %xmm0, %xmm0, %xmm0
f3: kxorw %k1, %k1, %k1; vmovsd (%rsi), %xmm0{%k1}
f4: vxorpd %xmm0, %xmm0, %xmm0
f5: vmovss %xmm0, %xmm1, %xmm0
f6: kxorw %k1, %k1, %k1; vmovss %xmm1, %xmm0, %xmm0{%k1}{z}
f7: vmovsd %xmm0, %xmm1, %xmm0
f8: kxorw %k1, %k1, %k1; vmovsd %xmm1, %xmm0, %xmm0{%k1}{z}
f9: nothing
f10: nothing
which looks good to me.  For f1/f3/f6/f8, I really have no idea if there is
some single insn that could do that kind of operation.  This is also tested
at runtime in the testsuite.
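
As a rough illustration of the semantics described above, here is a scalar C++ model written for this note. It does not use the real intrinsics, and the names `V4`, `ref_mask_load_ss`, `ref_maskz_load_ss`, `ref_mask_move_ss`, `ref_maskz_move_ss` and `ref_mask_store_ss` are hypothetical. The behaviour it encodes (only mask bit 0 is consulted, the loads zero the upper elements, the moves take the upper elements from `a`) is an assumption based on reading the patterns in the patch and the assembly listings, not a quote from avx512fintrin.h.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 128-bit vector of four floats; element 0 is the low lane.
using V4 = std::array<float, 4>;

// Assumed model of _mm_mask_load_ss (w, u, p): the low element comes from
// memory when mask bit 0 is set, otherwise from w; upper elements are zeroed.
inline V4 ref_mask_load_ss(const V4& w, std::uint8_t u, const float* p) {
  if (u & 1) {
    assert(p != nullptr);  // memory is only read when mask bit 0 is set
    return V4{*p, 0.0f, 0.0f, 0.0f};
  }
  return V4{w[0], 0.0f, 0.0f, 0.0f};
}

// Assumed model of _mm_maskz_load_ss (u, p): as above, but a cleared mask
// bit yields zero in the low element instead of a pass-through value.
inline V4 ref_maskz_load_ss(std::uint8_t u, const float* p) {
  if (u & 1) {
    assert(p != nullptr);
    return V4{*p, 0.0f, 0.0f, 0.0f};
  }
  return V4{0.0f, 0.0f, 0.0f, 0.0f};
}

// Assumed model of _mm_mask_move_ss (w, u, a, b): the low element is b[0]
// when mask bit 0 is set, otherwise w[0]; upper elements are copied from a.
inline V4 ref_mask_move_ss(const V4& w, std::uint8_t u, const V4& a, const V4& b) {
  return V4{(u & 1) ? b[0] : w[0], a[1], a[2], a[3]};
}

// Assumed model of _mm_maskz_move_ss (u, a, b): zeroing variant of the above.
inline V4 ref_maskz_move_ss(std::uint8_t u, const V4& a, const V4& b) {
  return V4{(u & 1) ? b[0] : 0.0f, a[1], a[2], a[3]};
}

// Assumed model of _mm_mask_store_ss (p, u, a): the low element of a is
// written to *p only when mask bit 0 is set; otherwise memory is untouched.
inline void ref_mask_store_ss(float* p, std::uint8_t u, const V4& a) {
  if (u & 1) {
    assert(p != nullptr);
    *p = a[0];
  }
}
```

Under this model a zero mask makes `ref_maskz_load_ss` return all zeros and makes `ref_mask_store_ss` a no-op, which lines up with the `vxorps` for f2 and the "nothing" for f9 in the zero-mask listing.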

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-03-07  Jakub Jelinek  

PR target/89602
* config/i386/sse.md (avx512f_mov_mask,
*avx512f_load_mask, avx512f_store_mask): New define_insns.
(avx512f_load_mask): New define_expand.
* config/i386/i386-builtin.def (__builtin_ia32_loadsd_mask,
__builtin_ia32_loadss_mask, __builtin_ia32_storesd_mask,
__builtin_ia32_storess_mask, __builtin_ia32_movesd_mask,
__builtin_ia32_movess_mask): New builtins.
* config/i386/avx512fintrin.h (_mm_mask_load_ss, _mm_maskz_load_ss,
_mm_mask_load_sd, _mm_maskz_load_sd, _mm_mask_move_ss,
_mm_maskz_move_ss, _mm_mask_move_sd, _mm_maskz_move_sd,
_mm_mask_store_ss, _mm_mask_store_sd): New intrinsics.

* gcc.target/i386/avx512f-vmovss-1.c: New test.
* gcc.target/i386/avx512f-vmovss-2.c: New test.
* gcc.target/i386/avx512f-vmovss-3.c: New test.
* gcc.target/i386/avx512f-vmovsd-1.c: New test.
* gcc.target/i386/avx512f-vmovsd-2.c: New test.
* gcc.target/i386/avx512f-vmovsd-3.c: New test.

--- gcc/config/i386/sse.md.jj   2019-02-20 23:40:17.119140235 +0100
+++ gcc/config/i386/sse.md  2019-03-06 19:15:12.379749161 +0100
@@ -1151,6 +1151,67 @@ (define_insn "_load_mask"
(set_attr "memory" "none,load")
(set_attr "mode" "")])
 
+(define_insn "avx512f_mov_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+   (vec_merge:VF_128
+ (vec_merge:VF_128
+   (match_operand:VF_128 2 "register_operand" "v")
+   (match_operand:VF_128 3 "nonimm_or_0_operand" "0C")
+   (match_operand:QI 4 "register_operand" "Yk"))
+ (match_operand:VF_128 1 "register_operand" "v")
+ (const_int 1)))]
+  "TARGET_AVX512F"
+  "vmov\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "")])
+
+(define_expand "avx512f_load_mask"
+  [(set (match_operand: 0 "register_operand")
+   (vec_merge:
+ (vec_merge:
+   (vec_duplicate:
+ (match_operand:MODEF 1 "memory_operand"))
+   (match_operand: 2 "nonimm_or_0_operand")
+   (match_operand:QI 3 "nonmemory_operand"))
+ (match_dup 4)
+ (const_int 1)))]
+  "TARGET_AVX512F"
+  "operands[4] = CONST0_RTX (mode);")
+
+(define_insn "*avx512f_load_mask"
+  [(set (match_operand: 0 "register_operand" "=v")
+   (vec_merge:
+ (vec_merge:
+   (vec_duplicate:
+ (match_operand:MODEF 1 "memory_operand" "m"))
+   (match_operand: 2 "nonimm_or_0_operand" "0C")
+   (match_operand:QI 3 "nonmemory_operand" "Yk"))
+ (match_operand: 

[PATCH] Add missing exports for symbols used by directory iterators

2019-01-28 Thread Jonathan Wakely

* config/abi/pre/gnu.ver (GLIBCXX_3.4.26): Add missing exports for
__shared_ptr instantiations used by gcc4-compatible ABI.

Tested x86_64-linux, committed to trunk.

commit b50b2bf26eef0e87014c19991bebfa9b75c2b754
Author: Jonathan Wakely 
Date:   Tue Jan 29 00:19:59 2019 +

Add missing exports for symbols used by directory iterators

* config/abi/pre/gnu.ver (GLIBCXX_3.4.26): Add missing exports for
__shared_ptr instantiations used by gcc4-compatible ABI.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 34c70b6cb8f..a0860668b90 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2225,6 +2225,12 @@ GLIBCXX_3.4.26 {
     _ZNSt10filesystem7__cxx1128recursive_directory_iteratoraSEOS1_;
     _ZNSt10filesystem7__cxx1128recursive_directory_iteratorppEv;
 
+    _ZNSt12__shared_ptrINSt10filesystem4_DirELN9__gnu_cxx12_Lock_policyE2EEC1Ev;
+    _ZNSt12__shared_ptrINSt10filesystem4_DirELN9__gnu_cxx12_Lock_policyE2EEC1EOS4_;
+    _ZNSt12__shared_ptrINSt10filesystem4_DirELN9__gnu_cxx12_Lock_policyE2EEaSEOS4_;
+    _ZNSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE2EEC1Ev;
+    _ZNSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE2EEC1EOS5_;
+
     _ZNSt12__shared_ptrINSt10filesystem7__cxx114_DirELN9__gnu_cxx12_Lock_policyE2EEC1Ev;
     _ZNSt12__shared_ptrINSt10filesystem7__cxx114_DirELN9__gnu_cxx12_Lock_policyE2EEC1EOS5_;
     _ZNSt12__shared_ptrINSt10filesystem7__cxx114_DirELN9__gnu_cxx12_Lock_policyE2EEaSEOS5_;


Re: [PATCH] Add missing noexpect causes in tuple for move functions

2018-11-30 Thread nick



On 2018-11-30 6:12 p.m., Ville Voutilainen wrote:
> On Sat, 1 Dec 2018 at 01:05, Nicholas Krause  wrote:
>>
>> This adds the remaining noexcept clauses required for this class
>> to meet the spec, as discussed last year and documented here:
>> http://cplusplus.github.io/LWG/lwg-active.html#2899.
> 
> I don't see how this change is sufficient; the noexcept-specs need to
> be added to tuple's
> special member functions, not just to _Tuple_impl, and your suggested
> patch contains no
> tests.
> 

It was tested; I just didn't mention that because I assumed it was understood.
That's my mistake, and sorry for that. This was more to make sure that the
approach is fine. If you would prefer that I send a patch cleaning it up for
all the classes, i.e. tuple and its variants, that's fine. I just want to ask:
do you want a single patch, or a series with each patch touching one of the
tuple classes? I assume you're the maintainer.

Cheers,

Nick


Re: [PATCH] Add missing noexpect causes in tuple for move functions

2018-11-30 Thread Ville Voutilainen
On Sat, 1 Dec 2018 at 01:05, Nicholas Krause  wrote:
>
> This adds the remaining noexcept clauses required for this class
> to meet the spec, as discussed last year and documented here:
> http://cplusplus.github.io/LWG/lwg-active.html#2899.

I don't see how this change is sufficient; the noexcept-specs need to
be added to tuple's
special member functions, not just to _Tuple_impl, and your suggested
patch contains no
tests.
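
A test of the kind asked for here might look like the following sketch. It is not taken from the patch or from the libstdc++ testsuite; `NothrowMove` and `ThrowingMove` are hypothetical element types invented for illustration, and the checks target the user-visible `std::tuple` move members rather than the `_Tuple_impl` converting constructors that the submitted diff touches.

```cpp
#include <tuple>
#include <type_traits>

// Hypothetical element type whose move operations are declared noexcept.
struct NothrowMove {
  NothrowMove() = default;
  NothrowMove(NothrowMove&&) noexcept = default;
  NothrowMove& operator=(NothrowMove&&) noexcept = default;
};

// Hypothetical element type whose move operations are allowed to throw.
struct ThrowingMove {
  ThrowingMove() = default;
  ThrowingMove(ThrowingMove&&) noexcept(false) {}
  ThrowingMove& operator=(ThrowingMove&&) noexcept(false) { return *this; }
};

// A tuple of nothrow-movable elements should itself be nothrow move constructible.
constexpr bool kNothrowCtor =
    std::is_nothrow_move_constructible<std::tuple<int, NothrowMove>>::value;

// A tuple holding an element with a throwing move constructor should not be.
constexpr bool kThrowingCtor =
    std::is_nothrow_move_constructible<std::tuple<int, ThrowingMove>>::value;

// The same expectation for move assignment with nothrow elements.
constexpr bool kNothrowAssign =
    std::is_nothrow_move_assignable<std::tuple<int, NothrowMove>>::value;

static_assert(kNothrowCtor, "tuple of nothrow-movable elements: nothrow move ctor");
static_assert(!kThrowingCtor, "tuple with a throwing element: potentially-throwing move ctor");
static_assert(kNothrowAssign, "tuple of nothrow-movable elements: nothrow move assignment");
```

Because every check is a `static_assert`, such a file only needs to compile; a real testsuite entry would additionally carry the usual dg directives.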


[PATCH] Add missing noexpect causes in tuple for move functions

2018-11-30 Thread Nicholas Krause
This adds the remaining noexcept clauses required for this class
to meet the spec, as discussed last year and documented here:
http://cplusplus.github.io/LWG/lwg-active.html#2899.

Signed-off-by: Nicholas Krause 
---
 libstdc++-v3/include/std/tuple | 4 
 1 file changed, 4 insertions(+)

diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 56b97c25eed..d17512a1b7e 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -214,6 +214,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
enable_if::type>
 explicit
 constexpr _Tuple_impl(_UHead&& __head, _UTail&&... __tail)
+noexcept(__and_,
+ is_nothrow_move_constructible<_Inherited>>::value)
: _Inherited(std::forward<_UTail>(__tail)...),
  _Base(std::forward<_UHead>(__head)) { }
 
@@ -237,6 +239,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 constexpr _Tuple_impl(_Tuple_impl<_Idx, _UHead, _UTails...>&& __in)
+noexcept(__and_,
+ is_nothrow_move_constructible<_Inherited>>::value)
: _Inherited(std::move
 (_Tuple_impl<_Idx, _UHead, _UTails...>::_M_tail(__in))),
  _Base(std::forward<_UHead>
-- 
2.17.1



Re: [PATCH] Add missing ZLIBINC to CFLAGS-optinfo-emit-json.o

2018-11-14 Thread David Malcolm
On Wed, 2018-11-14 at 09:49 +, Kyrill Tkachov wrote:
> On 13/11/18 18:45, David Malcolm wrote:
> > On Tue, 2018-11-13 at 17:58 +, Kyrill Tkachov wrote:
> > > Hi David,
> > > 
> > > On 09/11/18 21:00, Jeff Law wrote:
> > > > On 11/9/18 10:51 AM, David Malcolm wrote:
> > > > > > One of the concerns noted at Cauldron about
> > > > > > -fsave-optimization-record was the size of the output files.
> > > > > > 
> > > > > > This patch implements compression of the
> > > > > > -fsave-optimization-record output, using zlib.
> > > > > > 
> > > > > > I did some before/after testing of this patch, using SPEC 2017's
> > > > > > 502.gcc_r with -O3, looking at the sizes of the generated
> > > > > > FILENAME.opt-record.json[.gz] files.
> > > > > > 
> > > > > > The largest file was for insn-attrtab.c:
> > > > > >    before:  171736285 bytes (164M)
> > > > > >    after:     5304015 bytes (5.1M)
> > > > > > 
> > > > > > Smallest file was for vasprintf.c:
> > > > > >    before:      30567 bytes
> > > > > >    after:        4485 bytes
> > > > > > 
> > > > > > Median file by size before was lambda-mat.c:
> > > > > >    before:    2266738 bytes (2.2M)
> > > > > >    after:       75988 bytes (74K)
> > > > > > 
> > > > > > Total of all files in the benchmark:
> > > > > >    before: 2041720713 bytes (1.9G)
> > > > > >    after:    66870770 bytes (63.8M)
> > > > > > 
> > > > > > ...so clearly compression is a big win in terms of file size, at
> > > > > > the cost of making the files slightly more awkward to work with. [1]
> > > > > > I also wonder if we want to support any pre-filtering of the output
> > > > > > (FWIW roughly half of the biggest file seems to be "Adding assert for "
> > > > > > messages from tree-vrp.c).
> > > > > > 
> > > > > > Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
> > > > > > 
> > > > > > OK for trunk?
> > > > > 
> > > 
> > > So does this now add a dependency on zlib?
> > > I can't build GCC on my aarch64-none-linux machine after this
> > > patch
> > > due to a missing zlib.h.
> > > I see there's a zlib in the top-level GCC tree. Is that
> > > build/used
> > > during the GCC build itself?
> > > 
> > > Thanks,
> > > Kyrill
> > 
> > Sorry about that.  Does the following patch fix the build for you?
> 
> Yes, that fixes it.
> Thanks David!
> 
> Kyrill

Thanks; I've committed it to trunk as r266156.

Dave
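
As a rough cross-check of the size figures quoted in this thread, the byte counts can be turned into compression ratios. This is only a sketch: the byte counts are copied from the quoted measurements, while the SIZES table and the ratio helper are names made up here for illustration.

```python
# Byte counts copied from the quoted -fsave-optimization-record measurements.
# SIZES and ratio() are illustrative names, not part of GCC.
SIZES = {
    "insn-attrtab.c (largest)": (171736285, 5304015),
    "vasprintf.c (smallest)": (30567, 4485),
    "lambda-mat.c (median)": (2266738, 75988),
    "total": (2041720713, 66870770),
}


def ratio(before, after):
    """Return how many times smaller the compressed file is."""
    return before / after


for name, (before, after) in SIZES.items():
    print(f"{name}: {ratio(before, after):.1f}x smaller "
          f"({after / 1024:.1f} KiB after)")
```

On these figures the large files shrink by roughly a factor of 30, while the smallest file shrinks by less than a factor of 7.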


[PATCH] Add missing dir to create_testsuite_files script

2018-11-14 Thread Jonathan Wakely

* scripts/create_testsuite_files: Add special_functions to the list
of directories to search. Add comment referring to conformance.exp.
* testsuite/libstdc++-dg/conformance.exp: Add comment referring
to create_testsuite_files.

Committed to trunk.

commit de9099395703eac44f7d1ab7c06dc20718dcc0b8
Author: Jonathan Wakely 
Date:   Wed Nov 14 14:11:14 2018 +

Add missing dir to create_testsuite_files script

* scripts/create_testsuite_files: Add special_functions to the list
of directories to search. Add comment referring to conformance.exp.
* testsuite/libstdc++-dg/conformance.exp: Add comment referring
to create_testsuite_files.

diff --git a/libstdc++-v3/scripts/create_testsuite_files b/libstdc++-v3/scripts/create_testsuite_files
index 156304c2ad2..40e81cea8a9 100755
--- a/libstdc++-v3/scripts/create_testsuite_files
+++ b/libstdc++-v3/scripts/create_testsuite_files
@@ -31,8 +31,10 @@ tests_file_perf="$outdir/testsuite_files_performance"
 cd $srcdir
 # This is the ugly version of "everything but the current directory".  It's
 # what has to happen when find(1) doesn't support -mindepth, or -xtype.
+# The directories here should be consistent with libstdc++-dg/conformance.exp
 dlist=`echo [0-9][0-9]*`
 dlist="$dlist abi backward ext performance tr1 tr2 decimal experimental"
+dlist="$dlist special_functions"
 find $dlist "(" -type f -o -type l ")" -name "*.cc" -print > $tmp.01
 find $dlist "(" -type f -o -type l ")" -name "*.c" -print > $tmp.02
 cat  $tmp.01 $tmp.02 | sort > $tmp.1
diff --git a/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp b/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
index 49ac6fb1649..f372d670f6b 100644
--- a/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
+++ b/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
@@ -52,6 +52,7 @@ if {[info exists tests_file] && [file exists $tests_file]} {
 close $f
 } else {
 # Find directories that might have tests.
+# This list should be consistent with scripts/create_testsuite_files
 set subdirs [glob "$srcdir/\[0-9\]\[0-9\]*"]
 lappend subdirs "$srcdir/abi"
 lappend subdirs "$srcdir/backward"
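
For readers less familiar with the shell idiom in create_testsuite_files, the directory walk can be sketched in Python roughly as follows. This is an approximation written for illustration, not code from the libstdc++ tree: the directory names are taken from the dlist lines in the patch above, while NAMED_DIRS and list_test_sources are hypothetical names, and Python's sort order may differ from sort(1) under some locales.

```python
import fnmatch
import os

# Directory names copied from the dlist assignments in the patch above.
NAMED_DIRS = ["abi", "backward", "ext", "performance", "tr1", "tr2",
              "decimal", "experimental", "special_functions"]


def list_test_sources(srcdir):
    """Return sorted relative paths of *.cc and *.c files under the test dirs."""
    # Equivalent of `echo [0-9][0-9]*`: entries starting with two digits.
    numbered = sorted(d for d in os.listdir(srcdir)
                      if fnmatch.fnmatch(d, "[0-9][0-9]*"))
    found = []
    for top in numbered + NAMED_DIRS:
        root = os.path.join(srcdir, top)
        if not os.path.isdir(root):
            continue  # unlike find(1), silently skip missing directories
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if name.endswith((".cc", ".c")):
                    full = os.path.join(dirpath, name)
                    found.append(os.path.relpath(full, srcdir))
    return sorted(found)
```

A directory that is missing from the list (as special_functions was before this patch) is simply never visited, so its tests do not appear in the generated file list.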


Re: [PATCH] Add missing ZLIBINC to CFLAGS-optinfo-emit-json.o

2018-11-14 Thread Kyrill Tkachov



On 13/11/18 18:45, David Malcolm wrote:
> On Tue, 2018-11-13 at 17:58 +, Kyrill Tkachov wrote:
> > Hi David,
> > 
> > On 09/11/18 21:00, Jeff Law wrote:
> > > On 11/9/18 10:51 AM, David Malcolm wrote:
> > > > One of the concerns noted at Cauldron about
> > > > -fsave-optimization-record was the size of the output files.
> > > > 
> > > > This file implements compression of the
> > > > -fsave-optimization-record output, using zlib.
> > > > 
> > > > I did some before/after testing of this patch, using SPEC 2017's
> > > > 502.gcc_r with -O3, looking at the sizes of the generated
> > > > FILENAME.opt-record.json[.gz] files.
> > > > 
> > > > The largest file was for insn-attrtab.c:
> > > >   before:  171736285 bytes (164M)
> > > >   after:     5304015 bytes (5.1M)
> > > > 
> > > > Smallest file was for vasprintf.c:
> > > >   before:  30567 bytes
> > > >   after:    4485 bytes
> > > > 
> > > > Median file by size before was lambda-mat.c:
> > > >   before:  2266738 bytes (2.2M)
> > > >   after:     75988 bytes (15K)
> > > > 
> > > > Total of all files in the benchmark:
> > > >   before: 2041720713 bytes (1.9G)
> > > >   after:    66870770 bytes (63.8M)
> > > > 
> > > > ...so clearly compression is a big win in terms of file size, at
> > > > the cost of making the files slightly more awkward to work with. [1]
> > > > I also wonder if we want to support any pre-filtering of the
> > > > output (FWIW roughly half of the biggest file seems to be
> > > > "Adding assert for " messages from tree-vrp.c).
> > > > 
> > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > > 
> > > > OK for trunk?
> > > > 
> > 
> > So does this now add a dependency on zlib?
> > I can't build GCC on my aarch64-none-linux machine after this patch
> > due to a missing zlib.h.
> > I see there's a zlib in the top-level GCC tree. Is that build/used
> > during the GCC build itself?
> > 
> > Thanks,
> > Kyrill
> 
> Sorry about that.  Does the following patch fix the build for you?

Yes, that fixes it.
Thanks David!

Kyrill

> gcc/ChangeLog:
> * Makefile.in (CFLAGS-optinfo-emit-json.o): Add $(ZLIBINC).
> ---
>  gcc/Makefile.in | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 16c9ed6..1e8a311 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -2233,7 +2233,7 @@ s-bversion: BASE-VER
> $(STAMP) s-bversion
>  
>  CFLAGS-toplev.o += -DTARGET_NAME=\"$(target_noncanonical)\"
> -CFLAGS-optinfo-emit-json.o += -DTARGET_NAME=\"$(target_noncanonical)\"
> +CFLAGS-optinfo-emit-json.o += -DTARGET_NAME=\"$(target_noncanonical)\" $(ZLIBINC)
>  
>  pass-instances.def: $(srcdir)/passes.def $(PASSES_EXTRA) \
> $(srcdir)/gen-pass-instances.awk




[PATCH] Add missing ZLIBINC to CFLAGS-optinfo-emit-json.o

2018-11-13 Thread David Malcolm
On Tue, 2018-11-13 at 17:58 +, Kyrill Tkachov wrote:
> Hi David,
> 
> On 09/11/18 21:00, Jeff Law wrote:
> > On 11/9/18 10:51 AM, David Malcolm wrote:
> > > One of the concerns noted at Cauldron about -fsave-optimization-
> > > record
> > > was the size of the output files.
> > > 
> > > This file implements compression of the -fsave-optimization-
> > > record
> > > output, using zlib.
> > > 
> > > I did some before/after testing of this patch, using SPEC 2017's
> > > 502.gcc_r with -O3, looking at the sizes of the generated
> > > FILENAME.opt-record.json[.gz] files.
> > > 
> > > The largest file was for insn-attrtab.c:
> > >   before:  171736285 bytes (164M)
> > >   after: 5304015 bytes (5.1M)
> > > 
> > > Smallest file was for vasprintf.c:
> > >   before:  30567 bytes
> > >   after:4485 bytes
> > > 
> > > Median file by size before was lambda-mat.c:
> > >   before:2266738 bytes (2.2M)
> > >   after:   75988 bytes (15K)
> > > 
> > > Total of all files in the benchmark:
> > >   before: 2041720713 bytes (1.9G)
> > >   after:66870770 bytes (63.8M)
> > > 
> > > ...so clearly compression is a big win in terms of file size, at
> > > the
> > > cost of making the files slightly more awkward to work with. [1]
> > > I also wonder if we want to support any pre-filtering of the
> > > output
> > > (FWIW roughly half of the biggest file seems to be "Adding assert
> > > for "
> > > messages from tree-vrp.c).
> > > 
> > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > 
> > > OK for trunk?
> > > 
> 
> So does this now add a dependency on zlib?
> I can't build GCC on my aarch64-none-linux machine after this patch
> due to a missing zlib.h.
> I see there's a zlib in the top-level GCC tree. Is that build/used
> during the GCC build itself?
> 
> Thanks,
> Kyrill

Sorry about that.  Does the following patch fix the build for you?

gcc/ChangeLog:
* Makefile.in (CFLAGS-optinfo-emit-json.o): Add $(ZLIBINC).
---
 gcc/Makefile.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 16c9ed6..1e8a311 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2233,7 +2233,7 @@ s-bversion: BASE-VER
$(STAMP) s-bversion
 
 CFLAGS-toplev.o += -DTARGET_NAME=\"$(target_noncanonical)\"
-CFLAGS-optinfo-emit-json.o += -DTARGET_NAME=\"$(target_noncanonical)\"
+CFLAGS-optinfo-emit-json.o += -DTARGET_NAME=\"$(target_noncanonical)\" $(ZLIBINC)
 
 pass-instances.def: $(srcdir)/passes.def $(PASSES_EXTRA) \
$(srcdir)/gen-pass-instances.awk
-- 
1.8.5.3
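
For readers wondering what "slightly more awkward to work with" means in practice, here is a minimal sketch of writing and reading gzip-compressed JSON with Python's standard library. It assumes the .json.gz output is ordinary gzip-wrapped JSON; the record contents and the helper names write_records and read_records are invented for illustration and do not reflect GCC's actual record schema.

```python
import gzip
import json
import os
import tempfile


def write_records(path, records):
    """Write records as gzip-compressed JSON text."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(records, f)


def read_records(path):
    """Read gzip-compressed JSON back into Python objects."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


# Hypothetical, repetitive records standing in for real optimization remarks.
records = [{"pass": "vrp", "message": "Adding assert for x_%d" % i}
           for i in range(1000)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.opt-record.json.gz")
    write_records(path, records)
    compressed = os.path.getsize(path)
    raw = len(json.dumps(records).encode("utf-8"))
    assert read_records(path) == records
    print(f"raw {raw} bytes, gzip {compressed} bytes")
```

The extra step is small (gzip.open instead of open), but tools that expect plain JSON need to decompress first, which is the trade-off mentioned in the quoted message.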



Re: [PATCH] add missing @opindex to Warning Options

2018-06-11 Thread Martin Sebor

On 06/11/2018 01:20 PM, Jeff Law wrote:
> On 06/07/2018 09:21 PM, Eric Gallager wrote:
>> On 6/7/18, Martin Sebor wrote:
>>> A bunch of warning options are missing an @opindex entry,
>>> usually for the negative form.  I went through them all
>>> and added them where it made sense.
>>>
>>> Unless there are objections I will commit the patch to
>>> trunk next week.
>>>
>>> I think the patch is also appropriate for release branches
>>> (modulo options that don't exist there), so likewise, unless
>>> Jakub or Richard feel differently I will backport it to GCC
>>> 6 and 7.
>>>
>>> Martin
>>>
>>
>> There's a 'p' missing in "Wno-openm-simd" which looks like is being
>> propagated by copying and pasting
> OK with that fixed.
> jeff

Thanks for spotting the missing p!  Committed in r261453.

Martin


Re: [PATCH] add missing @opindex to Warning Options

2018-06-11 Thread Jeff Law
On 06/07/2018 09:21 PM, Eric Gallager wrote:
> On 6/7/18, Martin Sebor  wrote:
>> A bunch of warning options are missing an @opindex entry,
>> usually for the negative form.  I went through them all
>> and added them where it made sense.
>>
>> Unless there are objections I will commit the patch to
>> trunk next week.
>>
>> I think the patch is also appropriate for release branches
>> (modulo options that don't exist there), so likewise, unless
>> Jakub or Richard feel differently I will backport it to GCC
>> 6 and 7.
>>
>> Martin
>>
> 
> There's a 'p' missing in "Wno-openm-simd" which looks like is being
> propagated by copying and pasting
OK with that fixed.
jeff



Re: [PATCH] add missing @opindex to Warning Options

2018-06-07 Thread Eric Gallager
On 6/7/18, Martin Sebor  wrote:
> A bunch of warning options are missing an @opindex entry,
> usually for the negative form.  I went through them all
> and added them where it made sense.
>
> Unless there are objections I will commit the patch to
> trunk next week.
>
> I think the patch is also appropriate for release branches
> (modulo options that don't exist there), so likewise, unless
> Jakub or Richard feel differently I will backport it to GCC
> 6 and 7.
>
> Martin
>

There's a 'p' missing in "Wno-openm-simd" which looks like is being
propagated by copying and pasting

