Re: [PATCH v3] c++: Further tweaks for new-expression and paren-init [PR77841]

2020-09-08 Thread Marek Polacek via Gcc-patches
On Tue, Sep 08, 2020 at 04:19:42PM -0400, Jason Merrill wrote:
> On 9/8/20 4:06 PM, Marek Polacek wrote:
> > On Mon, Sep 07, 2020 at 11:19:47PM -0400, Jason Merrill wrote:
> > > On 9/6/20 11:34 AM, Marek Polacek wrote:
> > > > @@ -3944,9 +3935,9 @@ build_new (location_t loc, vec<tree, va_gc> **placement, tree type,
> > > >}
> > > >  /* P1009: Array size deduction in new-expressions.  */
> > > > -  if (TREE_CODE (type) == ARRAY_TYPE
> > > > -  && !TYPE_DOMAIN (type)
> > > > -  && *init)
> > > > +  const bool deduce_array_p = (TREE_CODE (type) == ARRAY_TYPE
> > > > +  && !TYPE_DOMAIN (type));
> > > > +  if (*init && (deduce_array_p || (nelts && cxx_dialect >= cxx20)))
> > > 
> > > Looks like this won't handle new (char[4]), for which we also get an
> > > ARRAY_TYPE.
> > 
> > Good catch.  Fixed & paren-init37.C added.
> > 
> > > >{
> > > >  /* This means we have 'new T[]()'.  */
> > > >  if ((*init)->is_empty ())
> > > > @@ -3955,16 +3946,20 @@ build_new (location_t loc, vec<tree, va_gc> **placement, tree type,
> > > >   CONSTRUCTOR_IS_DIRECT_INIT (ctor) = true;
> > > >   vec_safe_push (*init, ctor);
> > > > }
> > > > +  tree array_type = deduce_array_p ? TREE_TYPE (type) : type;
> > > 
> > > I'd call this variable elt_type.
> > 
> > Right, and it should be inside the block below.
> > 
> > > >  tree &elt = (**init)[0];
> > > >  /* The C++20 'new T[](e_0, ..., e_k)' case allowed by P0960.  */
> > > >  if (!DIRECT_LIST_INIT_P (elt) && cxx_dialect >= cxx20)
> > > > {
> > > > - /* Handle new char[]("foo").  */
> > > > + /* Handle new char[]("foo"): turn it into new char[]{"foo"}.  */
> > > >   if (vec_safe_length (*init) == 1
> > > > - && char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (type)))
> > > > + && char_type_p (TYPE_MAIN_VARIANT (array_type))
> > > >   && TREE_CODE (tree_strip_any_location_wrapper (elt))
> > > >  == STRING_CST)
> > > > -   /* Leave it alone: the string should not be wrapped in {}.  */;
> > > > +   {
> > > > + elt = build_constructor_single (init_list_type_node, NULL_TREE, elt);
> > > > + CONSTRUCTOR_IS_DIRECT_INIT (elt) = true;
> > > > +   }
> > > >   else
> > > > {
> > > >   tree ctor = build_constructor_from_vec (init_list_type_node, *init);
> > > 
> > > With this change, doesn't the string special case produce the same
> > > result as the general case?
> > 
> > The problem is that reshape_init won't do anything for
> > CONSTRUCTOR_IS_PAREN_INIT.
> 
> Ah, yes, that flag is the difference.
> 
> > So the reshape_init in build_new_1 wouldn't unwrap the outermost { } around
> > a STRING_CST.
> 
> > Perhaps reshape_init should be adjusted to do that unwrapping even when
> > it gets a CONSTRUCTOR_IS_PAREN_INIT CONSTRUCTOR.  But I'm not sure if it
> > should also do the reference_related_p unwrapping in reshape_init_r in
> > that case.
> 
> That would make sense to me.

Done (but only for the outermost CONSTRUCTOR) in the below.  It allowed me to...

> > > > @@ -3977,9 +3972,15 @@ build_new (location_t loc, vec<tree, va_gc> **placement, tree type,
> > > > }
> > > > }
> > > >  /* Otherwise we should have 'new T[]{e_0, ..., e_k}'.  */
> > > > -  if (BRACE_ENCLOSED_INITIALIZER_P (elt))
> > > > -   elt = reshape_init (type, elt, complain);
> > > > -  cp_complete_array_type (&type, elt, /*do_default*/false);
> > > > +  if (deduce_array_p)
> > > > +   {
> > > > + /* Don't reshape ELT itself: we want to pass a list-initializer
> > > > +to build_new_1, even for STRING_CSTs.  */
> > > > + tree e = elt;
> > > > + if (BRACE_ENCLOSED_INITIALIZER_P (e))
> > > > +   e = reshape_init (type, e, complain);
> > > 
> > > The comment is unclear; this call does reshape the CONSTRUCTOR ELT points
> > > to, it just doesn't change ELT if the reshape call returns something else.
> > 
> > Yea, I've amended the comment.
> > 
> > > Why are we reshaping here, anyway?  Won't that lead to undesired brace
> > > elision?
> > 
> > We have to reshape before deducing the array, otherwise we could deduce the
> > wrong number of elements when certain braces were omitted.  E.g. in
> > 
> >struct S { int x, y; };
> >new S[]{1, 2, 3, 4}; // braces elided, is { {1, 2}, {3, 4} }
> 
> Ah, right, we also get here for initializers written with actual braces.
> 
> > we want S[2], not S[4].  A way to test it would be
> > 
> >struct S { int x, y; };
> >S *p = new S[]{1, 2, 3, 4};
> > 
> >void* operator new (unsigned long int size)
> >{
> >if (size != sizeof (S) * 2)
> > __builtin_abort ();
> >return __builtin_malloc (size);
> >}
> > 
> >int main () { }
> > 
> > I can add that 

Re: [PATCH] Implement __builtin_thread_pointer for x86 TLS

2020-09-08 Thread Hongtao Liu via Gcc-patches
On Tue, Sep 8, 2020 at 4:52 PM Jakub Jelinek  wrote:
>
> On Tue, Sep 08, 2020 at 04:14:52PM +0800, Hongtao Liu wrote:
> > Hi:
> >   We have "*load_tp_<mode>" in i386.md for loading the thread pointer,
> > so this patch merely adds the expander for
> > __builtin_thread_pointer.
> >
> >   Bootstrap is ok, regression test is ok for i386/x86-64 backend.
> >   Ok for trunk?
> >
> > gcc/ChangeLog:
> > PR target/96955
> > * config/i386/i386.md (get_thread_pointer): New
> > expander.
>
> I wonder if this shouldn't be done only if targetm.have_tls is true.
> Because on targets that use emulated TLS it doesn't really make much sense.
>

Changed as

 ;; Load and add the thread base pointer from %:0.
+(define_expand "get_thread_pointer<mode>"
+  [(set (match_operand:PTR 0 "register_operand")
+   (unspec:PTR [(const_int 0)] UNSPEC_TP))]
+  ""
+{
+  /* targetm does not exist in the scope of the condition.  */
+  if (!targetm.have_tls)
+    error ("%<__builtin_thread_pointer%> is not supported on this target");
+})


> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr96955-builtin_thread_pointer.c: New test.
>
> The testcase naming is weird.  Either call it pr96955.c, or
> builtin_thread_pointer.c, but not both.
>

Renamed to builtin_thread_pointer.c.

Update patch.
-- 
BR,
Hongtao
From 400418fadce46e7db7bd37be45ef5ff5beb08d19 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Tue, 8 Sep 2020 15:44:58 +0800
Subject: [PATCH] Implement __builtin_thread_pointer for x86 TLS.

gcc/ChangeLog:
	PR target/96955
	* config/i386/i386.md (get_thread_pointer): New
	expander.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/builtin_thread_pointer.c: New test.
---
 gcc/config/i386/i386.md   | 10 +++
 .../gcc.target/i386/builtin_thread_pointer.c  | 28 +++
 2 files changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/builtin_thread_pointer.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 446793b78db..2f6eb0a7b98 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15433,6 +15433,16 @@ (define_insn_and_split "*tls_local_dynamic_32_once"
   (clobber (reg:CC FLAGS_REG))])])
 
 ;; Load and add the thread base pointer from %:0.
+(define_expand "get_thread_pointer<mode>"
+  [(set (match_operand:PTR 0 "register_operand")
+	(unspec:PTR [(const_int 0)] UNSPEC_TP))]
+  ""
+{
+  /* targetm does not exist in the scope of the condition.  */
+  if (!targetm.have_tls)
+    error ("%<__builtin_thread_pointer%> is not supported on this target");
+})
+
 (define_insn_and_split "*load_tp_<mode>"
   [(set (match_operand:PTR 0 "register_operand" "=r")
 	(unspec:PTR [(const_int 0)] UNSPEC_TP))]
diff --git a/gcc/testsuite/gcc.target/i386/builtin_thread_pointer.c b/gcc/testsuite/gcc.target/i386/builtin_thread_pointer.c
new file mode 100644
index 000..dce31488117
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/builtin_thread_pointer.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mtls-direct-seg-refs -O2 -masm=att" } */
+
+int*
+foo1 ()
+{
+  return (int*) __builtin_thread_pointer ();
+}
+
+/* { dg-final { scan-assembler "mov\[lq\]\[ \t\]*%\[fg\]s:0, %\[re\]ax" } }  */
+
+int
+foo2 ()
+{
+  int* p =  (int*) __builtin_thread_pointer ();
+  return p[4];
+}
+
+/* { dg-final { scan-assembler "movl\[ \t\]*%\[fg\]s:16, %eax" } }  */
+
+int
+foo3 (int i)
+{
+  int* p = (int*) __builtin_thread_pointer ();
+  return p[i];
+}
+
+/* { dg-final { scan-assembler "movl\[ \t\]*%\[fg\]s:0\\(,%\[a-z0-9\]*,4\\), %eax" } }  */
-- 
2.18.1



libbacktrace patch committed: Don't strip underscore on 64-bit PE

2020-09-08 Thread Ian Lance Taylor via Gcc-patches
This patch to libbacktrace avoids stripping a leading underscore from
symbol names on 64-bit PE COFF.  Bootstrapped and ran Go tests on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian

* pecoff.c (coff_initialize_syminfo): Add is_64 parameter.
(coff_add): Determine and pass is_64.
diff --git a/libbacktrace/pecoff.c b/libbacktrace/pecoff.c
index 221571c862e..49e5c3d868c 100644
--- a/libbacktrace/pecoff.c
+++ b/libbacktrace/pecoff.c
@@ -330,7 +330,7 @@ coff_is_function_symbol (const b_coff_internal_symbol *isym)
 
 static int
 coff_initialize_syminfo (struct backtrace_state *state,
-uintptr_t base_address,
+uintptr_t base_address, int is_64,
 const b_coff_section_header *sects, size_t sects_num,
 const b_coff_external_symbol *syms, size_t syms_size,
 const unsigned char *strtab, size_t strtab_size,
@@ -426,9 +426,12 @@ coff_initialize_syminfo (struct backtrace_state *state,
  else
name = isym.name;
 
- /* Strip leading '_'.  */
- if (name[0] == '_')
-   name++;
+ if (!is_64)
+   {
+ /* Strip leading '_'.  */
+ if (name[0] == '_')
+   name++;
+   }
 
  /* Symbol value is section relative, so we need to read the address
 of its section.  */
@@ -605,6 +608,7 @@ coff_add (struct backtrace_state *state, int descriptor,
   off_t max_offset;
   struct backtrace_view debug_view;
   int debug_view_valid;
+  int is_64;
   uintptr_t image_base;
   struct dwarf_sections dwarf_sections;
 
@@ -680,12 +684,16 @@ coff_add (struct backtrace_state *state, int descriptor,
   sects = (const b_coff_section_header *)
 (sects_view.data + fhdr.size_of_optional_header);
 
+  is_64 = 0;
   if (fhdr.size_of_optional_header > sizeof (*opt_hdr))
 {
   if (opt_hdr->magic == PE_MAGIC)
image_base = opt_hdr->u.pe.image_base;
   else if (opt_hdr->magic == PEP_MAGIC)
-   image_base = opt_hdr->u.pep.image_base;
+   {
+ image_base = opt_hdr->u.pep.image_base;
+ is_64 = 1;
+   }
   else
{
  error_callback (data, "bad magic in PE optional header", 0);
@@ -778,7 +786,7 @@ coff_add (struct backtrace_state *state, int descriptor,
   if (sdata == NULL)
goto fail;
 
-  if (!coff_initialize_syminfo (state, image_base,
+  if (!coff_initialize_syminfo (state, image_base, is_64,
sects, sects_num,
syms_view.data, syms_size,
str_view.data, str_size,


libbacktrace patch committed: Get executable name on macOS

2020-09-08 Thread Ian Lance Taylor via Gcc-patches
This patch to libbacktrace gets the executable name on macOS using
_NSGetExecutablePath.  This is another aspect of PR 96973.  Tested
basic functionality on macOS.  Bootstrapped and ran libbacktrace tests
on x86_64-pc-linux-gnu.  Committed to mainline.

Ian

 * fileline.c (macho_get_executable_path): New static function.
(fileline_initialize): Call macho_get_executable_path.
diff --git a/libbacktrace/fileline.c b/libbacktrace/fileline.c
index cc1011e8b5d..be62b9899c5 100644
--- a/libbacktrace/fileline.c
+++ b/libbacktrace/fileline.c
@@ -43,6 +43,10 @@ POSSIBILITY OF SUCH DAMAGE.  */
 #include 
 #endif
 
+#ifdef HAVE_MACH_O_DYLD_H
+#include 
+#endif
+
 #include "backtrace.h"
 #include "internal.h"
 
@@ -122,6 +126,35 @@ sysctl_exec_name2 (struct backtrace_state *state,
 
 #endif /* defined (HAVE_KERN_PROC_ARGS) || defined (HAVE_KERN_PROC) */
 
+#ifdef HAVE_MACH_O_DYLD_H
+
+static char *
+macho_get_executable_path (struct backtrace_state *state,
+  backtrace_error_callback error_callback, void *data)
+{
+  uint32_t len;
+  char *name;
+
+  len = 0;
+  if (_NSGetExecutablePath (NULL, &len) == 0)
+return NULL;
+  name = (char *) backtrace_alloc (state, len, error_callback, data);
+  if (name == NULL)
+return NULL;
+  if (_NSGetExecutablePath (name, &len) != 0)
+{
+  backtrace_free (state, name, len, error_callback, data);
+  return NULL;
+}
+  return name;
+}
+
+#else /* !defined (HAVE_MACH_O_DYLD_H) */
+
+#define macho_get_executable_path(state, error_callback, data) NULL
+
+#endif /* !defined (HAVE_MACH_O_DYLD_H) */
+
 /* Initialize the fileline information from the executable.  Returns 1
on success, 0 on failure.  */
 
@@ -159,7 +192,7 @@ fileline_initialize (struct backtrace_state *state,
 
   descriptor = -1;
   called_error_callback = 0;
-  for (pass = 0; pass < 7; ++pass)
+  for (pass = 0; pass < 8; ++pass)
 {
   int does_not_exist;
 
@@ -188,6 +221,9 @@ fileline_initialize (struct backtrace_state *state,
case 6:
  filename = sysctl_exec_name2 (state, error_callback, data);
  break;
+   case 7:
+ filename = macho_get_executable_path (state, error_callback, data);
+ break;
default:
  abort ();
}


Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-08 Thread luoxhu via Gcc-patches



On 2020/9/8 16:26, Richard Biener wrote:
>> Seems not only pseudo, for example "v = vec_insert (i, v, n);"
>> the vector variable will be store to stack first, then [r112:DI] is a
>> memory here to be processed.  So the patch loads it from stack(insn #10) to
>> temp vector register first, and store to stack again(insn #24) after
>> rs6000_vector_set_var.
> Hmm, yeah - I guess that's what should be addressed first then.
> I'm quite sure that in case 'v' is not on the stack but in memory like
> in my case a SImode store is better than what we get from
> vec_insert - in fact vec_insert will likely introduce a RMW cycle
> which is prone to inserting store-data-races?

Yes, for your case there is no stack operation, and to_rtx is expanded
with BLKmode instead of V4SImode.  Adding a check of the to_rtx mode could
work around it.  The asm doesn't show a store-hit-load issue.

optimized:

_1 = i_2(D) % 4;
VIEW_CONVERT_EXPR(x.u)[_1] = a_4(D);

expand:
2: r118:DI=%3:DI
3: r119:DI=%4:DI
4: NOTE_INSN_FUNCTION_BEG
7: r120:DI=unspec[`*.LANCHOR0',%2:DI] 47
  REG_EQUAL `*.LANCHOR0'
8: r122:SI=r118:DI#0
9: {r124:SI=r122:SI/0x4;clobber ca:SI;}
   10: r125:SI=r124:SI<<0x2
   11: r123:SI=r122:SI-r125:SI
  REG_EQUAL r122:SI%0x4
   12: r126:DI=sign_extend(r123:SI)
   13: r127:DI=r126:DI+0x4
   14: r128:DI=r127:DI<<0x2
   15: r129:DI=r120:DI+r128:DI
   16: [r129:DI]=r119:DI#0

 p to_rtx
$319 = (rtx_def *) (mem/c:BLK (reg/f:DI 120) [2 x+0 S32 A128])

asm:
addis 2,12,.TOC.-.LCF0@ha
addi 2,2,.TOC.-.LCF0@l
.localentry test,.-test
srawi 9,3,2
addze 9,9
addis 10,2,.LANCHOR0@toc@ha
addi 10,10,.LANCHOR0@toc@l
slwi 9,9,2
subf 9,9,3
extsw 9,9
addi 9,9,4
sldi 9,9,2
stwx 4,10,9
blr


> 
> So - what we need to "fix" is cfgexpand.c marking variably-indexed
> decls as not to be expanded as registers (see
> discover_nonconstant_array_refs).
> 
> I guess one way forward would be to perform instruction
> selection on GIMPLE here and transform
> 
> VIEW_CONVERT_EXPR(D.3185)[_1] = i_6(D)
> 
> to a (direct) internal function based on the vec_set optab.  

I don't quite understand what you mean here.  Do you mean:
ALTIVEC_BUILTIN_VEC_INSERT -> VIEW_CONVERT_EXPR -> internal function -> vec_set
or ALTIVEC_BUILTIN_VEC_INSERT -> internal function -> vec_set?
And which pass to put the selection and transform is acceptable?
Why call it *based on* the vec_set optab?  The VIEW_CONVERT_EXPR or
internal function is expanded to the vec_set optab.

I guess you suggest adding internal function for VIEW_CONVERT_EXPR in gimple,
and do the transform from internal function to vec_set optab in expander?
I doubt my understanding, as this looks really over-complicated since we
transform from VIEW_CONVERT_EXPR to the vec_set optab directly so far...
IIUC, an internal function doesn't seem to help much here, as Segher said
before.


> But then in GIMPLE D.3185 is also still memory (we don't have a variable
> index partial register set operation - BIT_INSERT_EXPR is
> currently specified to receive a constant bit position only).
> 
> At which point after your patch is the stack storage elided?
> 

Stack storage is elided by the register reload pass in RTL.


Thanks,
Xionghu


libbacktrace patch committed: Avoid ambiguous binary search

2020-09-08 Thread Ian Lance Taylor via Gcc-patches
This patch to libbacktrace avoids ambiguous binary searches.
Searching for a range match can cause the search order to not match
the sort order, which can cause libbacktrace to miss matching entries.
This patch allocates an extra entry at the end of function_addrs and
unit_addrs vectors, so that we can safely compare to the next entry
when searching.  It adjusts the matching code accordingly.  This fixes
https://github.com/ianlancetaylor/libbacktrace/issues/44.
Bootstrapped and ran libbacktrace and libgo tests on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian


* dwarf.c (function_addrs_search): Compare against the next entry
low address, not the high address.
(unit_addrs_search): Likewise.
(build_address_map): Add a trailing unit_addrs.
(read_function_entry): Add a trailing function_addrs.
(read_function_info): Likewise.
(report_inlined_functions): Search backward for function_addrs
match.
(dwarf_lookup_pc): Search backward for unit_addrs and
function_addrs matches.
diff --git a/libbacktrace/dwarf.c b/libbacktrace/dwarf.c
index 006c8181622..386701bffea 100644
--- a/libbacktrace/dwarf.c
+++ b/libbacktrace/dwarf.c
@@ -1164,9 +1164,11 @@ function_addrs_compare (const void *v1, const void *v2)
   return strcmp (a1->function->name, a2->function->name);
 }
 
-/* Compare a PC against a function_addrs for bsearch.  Note that if
-   there are multiple ranges containing PC, which one will be returned
-   is unpredictable.  We compensate for that in dwarf_fileline.  */
+/* Compare a PC against a function_addrs for bsearch.  We always
+   allocate an extra entry at the end of the vector, so that this
+   routine can safely look at the next entry.  Note that if there are
+   multiple ranges containing PC, which one will be returned is
+   unpredictable.  We compensate for that in dwarf_fileline.  */
 
 static int
 function_addrs_search (const void *vkey, const void *ventry)
@@ -1178,7 +1180,7 @@ function_addrs_search (const void *vkey, const void 
*ventry)
   pc = *key;
   if (pc < entry->low)
 return -1;
-  else if (pc >= entry->high)
+  else if (pc > (entry + 1)->low)
 return 1;
   else
 return 0;
@@ -1249,9 +1251,11 @@ unit_addrs_compare (const void *v1, const void *v2)
   return 0;
 }
 
-/* Compare a PC against a unit_addrs for bsearch.  Note that if there
-   are multiple ranges containing PC, which one will be returned is
-   unpredictable.  We compensate for that in dwarf_fileline.  */
+/* Compare a PC against a unit_addrs for bsearch.  We always allocate
+   an extra entry at the end of the vector, so that this routine can
+   safely look at the next entry.  Note that if there are multiple
+   ranges containing PC, which one will be returned is unpredictable.
+   We compensate for that in dwarf_fileline.  */
 
 static int
 unit_addrs_search (const void *vkey, const void *ventry)
@@ -1263,7 +1267,7 @@ unit_addrs_search (const void *vkey, const void *ventry)
   pc = *key;
   if (pc < entry->low)
 return -1;
-  else if (pc >= entry->high)
+  else if (pc > (entry + 1)->low)
 return 1;
   else
 return 0;
@@ -2091,6 +2095,7 @@ build_address_map (struct backtrace_state *state, 
uintptr_t base_address,
   size_t i;
   struct unit **pu;
   size_t unit_offset = 0;
+  struct unit_addrs *pa;
 
   memset (&addrs->vec, 0, sizeof addrs->vec);
   memset (&unit_vec->vec, 0, sizeof unit_vec->vec);
@@ -2231,6 +2236,17 @@ build_address_map (struct backtrace_state *state, 
uintptr_t base_address,
   if (info.reported_underflow)
 goto fail;
 
+  /* Add a trailing addrs entry, but don't include it in addrs->count.  */
+  pa = ((struct unit_addrs *)
+   backtrace_vector_grow (state, sizeof (struct unit_addrs),
+  error_callback, data, &addrs->vec));
+  if (pa == NULL)
+goto fail;
+  pa->low = 0;
+  --pa->low;
+  pa->high = pa->low;
+  pa->u = NULL;
+
   unit_vec->vec = units;
   unit_vec->count = units_count;
   return 1;
@@ -3404,8 +3420,23 @@ read_function_entry (struct backtrace_state *state, 
struct dwarf_data *ddata,
 
  if (fvec.count > 0)
{
+ struct function_addrs *p;
  struct function_addrs *faddrs;
 
+ /* Allocate a trailing entry, but don't include it
+in fvec.count.  */
+ p = ((struct function_addrs *)
+  backtrace_vector_grow (state,
+ sizeof (struct function_addrs),
+ error_callback, data,
+ &fvec));
+ if (p == NULL)
+   return 0;
+ p->low = 0;
+ --p->low;
+ p->high = p->low;
+ p->function = NULL;
+
   if (!backtrace_vector_release (state, &fvec,
 error_callback, data))
return 0;
@@ -3439,6 +3470,7 @@ read_function_info (struct backtrace_state 

[PATCH] Cygwin/MinGW: Do not version lto plugins

2020-09-08 Thread JonY via Gcc-patches
Hello,

The lto plugins are tied to the built GCC anyway, so there isn't much
point in versioning them.

* gcc/config.host: Remove version string
* lto-plugin/Makefile.am: Use libtool -avoid-version
* lto-plugin/Makefile.in: Regenerate

This patch has been in use with Cygwin gcc for a long time and should be
pushed upstream. Patch OK?

From 6bf6b87887a8a5eb53ad409cd4aa32cb1ac50786 Mon Sep 17 00:00:00 2001
From: Jonathan Yong <10wa...@gmail.com>
Date: Sat, 28 Jun 2014 09:35:02 +0800
Subject: [PATCH 1/1] Cygwin/MinGW: Do not version lto plugins

---
 gcc/config.host| 6 +++---
 lto-plugin/Makefile.am | 2 +-
 lto-plugin/Makefile.in | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config.host b/gcc/config.host
index 84f0433e2ad..373d5efd8da 100644
--- a/gcc/config.host
+++ b/gcc/config.host
@@ -232,7 +232,7 @@ case ${host} in
 out_host_hook_obj=host-cygwin.o
 host_xmake_file="${host_xmake_file} i386/x-cygwin"
 host_exeext=.exe
-host_lto_plugin_soname=cyglto_plugin-0.dll
+host_lto_plugin_soname=cyglto_plugin.dll
 ;;
   i[34567]86-*-mingw32*)
 host_xm_file=i386/xm-mingw32.h
@@ -240,7 +240,7 @@ case ${host} in
 host_exeext=.exe
 out_host_hook_obj=host-mingw32.o
 host_extra_gcc_objs="${host_extra_gcc_objs} driver-mingw32.o"
-host_lto_plugin_soname=liblto_plugin-0.dll
+host_lto_plugin_soname=liblto_plugin.dll
 ;;
   x86_64-*-mingw*)
 use_long_long_for_widest_fast_int=yes
@@ -249,7 +249,7 @@ case ${host} in
 host_exeext=.exe
 out_host_hook_obj=host-mingw32.o
 host_extra_gcc_objs="${host_extra_gcc_objs} driver-mingw32.o"
-host_lto_plugin_soname=liblto_plugin-0.dll
+host_lto_plugin_soname=liblto_plugin.dll
 ;;
   i[34567]86-*-darwin* | x86_64-*-darwin*)
 out_host_hook_obj="${out_host_hook_obj} host-i386-darwin.o"
diff --git a/lto-plugin/Makefile.am b/lto-plugin/Makefile.am
index ba5882df7a7..204b25f45ef 100644
--- a/lto-plugin/Makefile.am
+++ b/lto-plugin/Makefile.am
@@ -21,7 +21,7 @@ in_gcc_libs = $(foreach lib, $(libexecsub_LTLIBRARIES), $(gcc_build_dir)/$(lib))
 liblto_plugin_la_SOURCES = lto-plugin.c
 # Note that we intentionally override the bindir supplied by ACX_LT_HOST_FLAGS.
 liblto_plugin_la_LDFLAGS = $(AM_LDFLAGS) \
-   $(lt_host_flags) -module -bindir $(libexecsubdir)
+   $(lt_host_flags) -module -avoid-version -bindir $(libexecsubdir)
 # Can be simplified when libiberty becomes a normal convenience library.
 libiberty = $(with_libiberty)/libiberty.a
 libiberty_noasan = $(with_libiberty)/noasan/libiberty.a
diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in
index 7da7cd26dbf..834699b439e 100644
--- a/lto-plugin/Makefile.in
+++ b/lto-plugin/Makefile.in
@@ -350,7 +350,7 @@ libexecsub_LTLIBRARIES = liblto_plugin.la
 in_gcc_libs = $(foreach lib, $(libexecsub_LTLIBRARIES), $(gcc_build_dir)/$(lib))
 liblto_plugin_la_SOURCES = lto-plugin.c
 # Note that we intentionally override the bindir supplied by ACX_LT_HOST_FLAGS.
-liblto_plugin_la_LDFLAGS = $(AM_LDFLAGS) $(lt_host_flags) -module \
+liblto_plugin_la_LDFLAGS = $(AM_LDFLAGS) $(lt_host_flags) -module -avoid-version \
-bindir $(libexecsubdir) $(if $(wildcard \
$(libiberty_noasan)),, $(if $(wildcard \
$(libiberty_pic)),,-Wc,$(libiberty)))
-- 
2.11.4.GIT





libbacktrace patch committed: Correct tipo in comment

2020-09-08 Thread Ian Lance Taylor via Gcc-patches
This patch suggested by Ondřej Čertík fixes a typo in a comment.
Bootstrapped and ran libbacktrace tests on x86_64-pc-linux-gnu.
Committed to mainline.

Ian

* simple.c (simple_unwind): Correct comment spelling.
diff --git a/libbacktrace/simple.c b/libbacktrace/simple.c
index b9b971af95e..9ba660c871c 100644
--- a/libbacktrace/simple.c
+++ b/libbacktrace/simple.c
@@ -55,7 +55,7 @@ struct backtrace_simple_data
   int ret;
 };
 
-/* Unwind library callback routine.  This is passd to
+/* Unwind library callback routine.  This is passed to
_Unwind_Backtrace.  */
 
 static _Unwind_Reason_Code


Re: [PATCH] arm: Fix up arm_override_options_after_change [PR96939]

2020-09-08 Thread Jeff Law via Gcc-patches
On Tue, 2020-09-08 at 10:45 +0200, Jakub Jelinek via Gcc-patches wrote:
> Hi!
> 
> As mentioned in the PR, the testcase fails to link, because when set_cfun is
> being called on the crc function, arm_override_options_after_change is
> called from set_cfun -> invoke_set_current_function_hook:
>   /* Change optimization options if needed.  */
>   if (optimization_current_node != opts)
> {
>   optimization_current_node = opts;
>   cl_optimization_restore (&global_options, TREE_OPTIMIZATION (opts));
> }
> and at that point target_option_default_node actually matches even the
> current state of options, so this means armv7 (or whatever) arch is set as
> arm_active_target, then
>   targetm.set_current_function (fndecl);
> is called later in that function, which because the crc function's
> DECL_FUNCTION_SPECIFIC_TARGET is different from the current one will do:
>   cl_target_option_restore (&global_options, TREE_TARGET_OPTION (new_tree));
> which calls arm_option_restore and sets arm_active_target to armv8-a+crc
> (so far so good).
> Later arm_set_current_function calls:
>   save_restore_target_globals (new_tree);
> which in this case calls:
>   /* Call target_reinit and save the state for TARGET_GLOBALS.  */
>   TREE_TARGET_GLOBALS (new_tree) = save_target_globals_default_opts ();
> which because optimization_current_node != optimization_default_node
> (the testcase is LTO, so all functions have their
> DECL_FUNCTION_SPECIFIC_TARGET and TREE_OPTIMIZATION nodes) will call:
>   cl_optimization_restore
> (&global_options,
>  TREE_OPTIMIZATION (optimization_default_node));
> and
>   cl_optimization_restore (&global_options,
>TREE_OPTIMIZATION (opts));
> The problem is that these call arm_override_options_after_change again,
> and that one uses the target_option_default_node as what to set the
> arm_active_target to (i.e. back to armv7 or whatever, but not to the
> armv8-a+crc that should be the active target for the crc function).
> That means we then error on the builtin call in that function.
> 
> Now, the targetm.override_options_after_change hook is called always at the
> end of cl_optimization_restore, i.e. when we change the Optimization marked
> generic options.  So it seems unnecessary to call arm_configure_build_target
> at that point (nothing it depends on changed), and additionally incorrect
> (because it uses the target_option_default_node, rather than the current
> set of options; we'd need to revert
> https://gcc.gnu.org/legacy-ml/gcc-patches/2016-12/msg01390.html
> otherwise so that it works again with global_options otherwise).
> The options that arm_configure_build_target cares about will change only
> during option parsing (which is where it is called already), or during
> arm_set_current_function, where it is done during the
> cl_target_option_restore.
> Now, arm_override_options_after_change_1 wants to adjust the
> str_align_functions, which depends on the current Optimization options (e.g.
> optimize_size and flag_align_options and str_align_functions) as well as
> the target options target_flags, so IMHO needs to be called both
> when the Optimization options (possibly) change, i.e. from
> the targetm.override_options_after_change hook, and from when the target
> options change (set_current_function hook).
> 
> Bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk?
> 
> Looking further at arm_override_options_after_change_1, it also seems to be
> incorrect, rather than testing
> !opts->x_str_align_functions
> it should be really testing
> !opts_set->x_str_align_functions
> and get _options_set or similar passed to it as additional opts_set
> argument.  That is because otherwise the decision will be sticky, while it
> should be done whenever use provided -falign-functions but didn't provide
> -falign-functions= (either on the command line, or through optimize
> attribute or pragma).
> 
> 2020-09-08  Jakub Jelinek  
> 
>   PR target/96939
>   * config/arm/arm.c (arm_override_options_after_change): Don't call
>   arm_configure_build_target here.
>   (arm_set_current_function): Call arm_override_options_after_change_1
>   at the end.
> 
>   * gcc.target/arm/lto/pr96939_0.c: New test.
>   * gcc.target/arm/lto/pr96939_1.c: New file.
Any objection if I pull this into the Fedora tree and build a new GCC at some
point in the relatively near future (once approved)?  Similarly for your lto vs
linenumber patch?

Jeff



libbacktrace patch committed: Correct Mach-O memory allocation

2020-09-08 Thread Ian Lance Taylor via Gcc-patches
This libbacktrace patch corrects the amount of memory allocated when
looking for the Mach-O dsym file.  We weren't allocating space for the
backslash.  Thanks to Alex Crichton for noticing this.  This also
fixes the amount of space released when freeing diralc in the same
function.  Thanks to Francois-Xavier Coudert for noticing that.
Bootstrapped and ran libbacktrace tests on x86_64-pc-linux-gnu.
Committed to mainline.

Ian


* macho.c (macho_add_dsym): Make space for '/' in dsym.  Use
correct length when freeing diralc.


Re: [PATCH] libphobos: libdruntime doesn't support shadow stack (PR95680)

2020-09-08 Thread Rainer Orth
Hi Iain,

>>> ---
>>> libphobos/ChangeLog:
>>>
>>> PR d/95680
>>> * Makefile.in: Regenerate.
>>> * configure: Regenerate.
>>> * configure.ac (DCFG_ENABLE_CET): Substitute.
>>> * libdruntime/Makefile.in: Regenerate.
>>> * libdruntime/config/x86/switchcontext.S: Remove CET support code.
>>> * libdruntime/core/thread.d: Import gcc.config.  Don't set version
>>> AsmExternal when GNU_Enable_CET is true.
>>> * libdruntime/gcc/config.d.in (GNU_Enable_CET): Define.
>>> * src/Makefile.in: Regenerate.
>>> * testsuite/Makefile.in: Regenerate.
>> 
>> Looks good.  I can try it on Tiger Lake after it has been checked in.
>> 
>
> OK, I have committed it as r11-3047.

this patch broke Solaris/x86 bootstrap:

/vol/gcc/src/hg/master/local/libphobos/libdruntime/core/thread.d:3595:23: 
error: version AsmExternal defined after use
 3595 | version = AsmExternal;
  |   ^
/vol/gcc/src/hg/master/local/libphobos/libdruntime/core/thread.d:3603:27: 
error: version AsmX86_Posix defined after use
 3603 | version = AsmX86_Posix;
  |   ^

and similarly for the 64-bit version.  libdruntime/gcc/config.d has

// Whether libphobos has been configured with --enable-cet.
enum GNU_Enable_CET = false;

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


libbacktrace patch committed: Correctly swap Mach-O fat 32-bit file offset

2020-09-08 Thread Ian Lance Taylor via Gcc-patches
This libbacktrace patch correctly swaps the 32-bit file offset in a
Mach-O fat file.  This is based on a patch by Francois-Xavier Coudert,
who analyzed the problem.  This is for PR 96973.  Bootstrapped and
ran libbacktrace tests on x86_64-pc-linux-gnu.  Committed to mainline.

Ian

PR libbacktrace/96973
 * macho.c (macho_add_fat): Correctly swap 32-bit file offset.
diff --git a/libbacktrace/macho.c b/libbacktrace/macho.c
index bd737226ca6..20dd3262d58 100644
--- a/libbacktrace/macho.c
+++ b/libbacktrace/macho.c
@@ -793,13 +793,24 @@ macho_add_fat (struct backtrace_state *state, const char *filename,
 
   for (i = 0; i < nfat_arch; ++i)
 {
-  struct macho_fat_arch_64 fat_arch;
   uint32_t fcputype;
+  uint64_t foffset;
 
   if (is_64)
-   memcpy (&fat_arch,
-	   (const char *) arch_view.data + i * arch_size,
-	   arch_size);
+   {
+ struct macho_fat_arch_64 fat_arch_64;
+
+ memcpy (&fat_arch_64,
+ (const char *) arch_view.data + i * arch_size,
+ arch_size);
+ fcputype = fat_arch_64.cputype;
+ foffset = fat_arch_64.offset;
+ if (swapped)
+   {
+ fcputype = __builtin_bswap32 (fcputype);
+ foffset = __builtin_bswap64 (foffset);
+   }
+   }
   else
{
  struct macho_fat_arch fat_arch_32;
@@ -807,26 +818,18 @@ macho_add_fat (struct backtrace_state *state, const char *filename,
  memcpy (&fat_arch_32,
  (const char *) arch_view.data + i * arch_size,
  arch_size);
- fat_arch.cputype = fat_arch_32.cputype;
- fat_arch.cpusubtype = fat_arch_32.cpusubtype;
- fat_arch.offset = (uint64_t) fat_arch_32.offset;
- fat_arch.size = (uint64_t) fat_arch_32.size;
- fat_arch.align = fat_arch_32.align;
- fat_arch.reserved = 0;
+ fcputype = fat_arch_32.cputype;
+ foffset = (uint64_t) fat_arch_32.offset;
+ if (swapped)
+   {
+ fcputype = __builtin_bswap32 (fcputype);
+ foffset = (uint64_t) __builtin_bswap32 ((uint32_t) foffset);
+   }
}
 
-  fcputype = fat_arch.cputype;
-  if (swapped)
-   fcputype = __builtin_bswap32 (fcputype);
-
   if (fcputype == cputype)
{
- uint64_t foffset;
-
  /* FIXME: What about cpusubtype?  */
- foffset = fat_arch.offset;
- if (swapped)
-   foffset = __builtin_bswap64 (foffset);
  backtrace_release_view (state, &arch_view, error_callback, data);
  return macho_add (state, filename, descriptor, foffset, match_uuid,
base_address, skip_symtab, error_callback, data,


Re: [PATCH] c++: Fix resolving the address of overloaded pmf [PR96647]

2020-09-08 Thread Jason Merrill via Gcc-patches

On 9/8/20 9:17 AM, Patrick Palka wrote:

On Mon, 31 Aug 2020, Jason Merrill wrote:


On 8/28/20 12:45 PM, Patrick Palka wrote:

(Removing libstd...@gcc.gnu.org from CC list)

On Fri, 28 Aug 2020, Patrick Palka wrote:

In resolve_address_of_overloaded_function, currently only the second
pass over the overload set (which considers just the function templates
in the overload set) checks constraints and performs return type
deduction when necessary.  But as the testcases below show, we need to
do this when considering non-template functions during the first pass,
too.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?


OK.


gcc/cp/ChangeLog:

PR c++/96647
* class.c (resolve_address_of_overloaded_function): Also check
constraints and perform return type deduction when considering
non-template functions in the overload set.

gcc/testsuite/ChangeLog:

PR c++/96647
* g++.dg/cpp0x/auto-96647.C: New test.
* g++.dg/cpp2a/concepts-fn6.C: New test.
---
   gcc/cp/class.c| 16 
   gcc/testsuite/g++.dg/cpp0x/auto-96647.C   | 10 ++
   gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C | 10 ++
   3 files changed, 36 insertions(+)
   create mode 100644 gcc/testsuite/g++.dg/cpp0x/auto-96647.C
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C



+   if (undeduced_auto_decl (fn))
+ {
+   /* Force instantiation to do return type deduction.  */
+   ++function_depth;
+   instantiate_decl (fn, /*defer*/false, /*class*/false);
+   --function_depth;


How about maybe_instantiate_decl instead of this hunk?  This looks like it
could call instantiate_decl for a non-template function, which is wrong.


Good point.  We even ICE on the testcase error9.C below when using
instantiate_decl here, since we indeed end up calling it for the
non-specialization f(bool).

Does the following look OK?

-- >8 --

Subject: [PATCH] c++: Fix resolving the address of overloaded pmf [PR96647]

In resolve_address_of_overloaded_function, currently only the second
pass over the overload set (which considers just the function templates
in the overload set) checks constraints and performs return type
deduction when necessary.  But as the testcases below show, we need to
do the same when considering non-template functions during the first
pass.

gcc/cp/ChangeLog:

PR c++/96647
* class.c (resolve_address_of_overloaded_function): Check
constraints_satisfied_p and perform return-type deduction via
maybe_instantiate_decl when considering non-template functions
in the overload set.
* cp-tree.h (maybe_instantiate_decl): Declare.
* decl2.c (maybe_instantiate_decl): Remove static.

gcc/testsuite/ChangeLog:

PR c++/96647
* g++.dg/cpp0x/auto-96647.C: New test.
* g++.dg/cpp0x/error9.C: New test.
* g++.dg/cpp2a/concepts-fn6.C: New test.
---
  gcc/cp/class.c| 13 +
  gcc/cp/cp-tree.h  |  1 +
  gcc/cp/decl2.c|  3 +--
  gcc/testsuite/g++.dg/cpp0x/auto-96647.C   | 10 ++
  gcc/testsuite/g++.dg/cpp0x/error9.C   |  6 ++
  gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C | 10 ++
  6 files changed, 41 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/auto-96647.C
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/error9.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 3479b8207d2..c9a1f753d56 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -8286,6 +8286,19 @@ resolve_address_of_overloaded_function (tree target_type,
 one, or vice versa.  */
  continue;
  
+	/* Constraints must be satisfied.  This is done before
+	   return type deduction since that instantiates the
+	   function.  */
+	if (!constraints_satisfied_p (fn))
+	  continue;
+
+   if (undeduced_auto_decl (fn))
+ {
+   /* Force instantiation to do return type deduction.  */
+   maybe_instantiate_decl (fn);
+   require_deduced_type (fn);
+ }
+
/* In C++17 we need the noexcept-qualifier to compare types.  */
if (flag_noexcept_type
&& !maybe_instantiate_noexcept (fn, complain))
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 708de83eb46..78739411755 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6905,6 +6905,7 @@ extern void do_type_instantiation (tree, tree, tsubst_flags_t);
  extern bool always_instantiate_p  (tree);
  extern bool maybe_instantiate_noexcept(tree, tsubst_flags_t = tf_warning_or_error);
  extern tree instantiate_decl  (tree, bool, bool);
+extern void maybe_instantiate_decl (tree);
  extern int comp_template_parms(const_tree, 

Re: [PATCH v2] c++: Further tweaks for new-expression and paren-init [PR77841]

2020-09-08 Thread Jason Merrill via Gcc-patches

On 9/8/20 4:06 PM, Marek Polacek wrote:

On Mon, Sep 07, 2020 at 11:19:47PM -0400, Jason Merrill wrote:

On 9/6/20 11:34 AM, Marek Polacek wrote:

@@ -3944,9 +3935,9 @@ build_new (location_t loc, vec<tree, va_gc> **placement, tree type,
   }
 /* P1009: Array size deduction in new-expressions.  */
-  if (TREE_CODE (type) == ARRAY_TYPE
-  && !TYPE_DOMAIN (type)
-  && *init)
+  const bool deduce_array_p = (TREE_CODE (type) == ARRAY_TYPE
+  && !TYPE_DOMAIN (type));
+  if (*init && (deduce_array_p || (nelts && cxx_dialect >= cxx20)))


Looks like this won't handle new (char[4]), for which we also get an
ARRAY_TYPE.


Good catch.  Fixed & paren-init37.C added.


   {
 /* This means we have 'new T[]()'.  */
 if ((*init)->is_empty ())
@@ -3955,16 +3946,20 @@ build_new (location_t loc, vec<tree, va_gc> **placement, tree type,
  CONSTRUCTOR_IS_DIRECT_INIT (ctor) = true;
  vec_safe_push (*init, ctor);
}
+  tree array_type = deduce_array_p ? TREE_TYPE (type) : type;


I'd call this variable elt_type.


Right, and it should be inside the block below.


 tree &elt = (**init)[0];
 /* The C++20 'new T[](e_0, ..., e_k)' case allowed by P0960.  */
 if (!DIRECT_LIST_INIT_P (elt) && cxx_dialect >= cxx20)
{
- /* Handle new char[]("foo").  */
+ /* Handle new char[]("foo"): turn it into new char[]{"foo"}.  */
  if (vec_safe_length (*init) == 1
- && char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (type)))
+ && char_type_p (TYPE_MAIN_VARIANT (array_type))
  && TREE_CODE (tree_strip_any_location_wrapper (elt))
 == STRING_CST)
-   /* Leave it alone: the string should not be wrapped in {}.  */;
+   {
+ elt = build_constructor_single (init_list_type_node, NULL_TREE, elt);
+ CONSTRUCTOR_IS_DIRECT_INIT (elt) = true;
+   }
  else
{
  tree ctor = build_constructor_from_vec (init_list_type_node, *init);


With this change, doesn't the string special case produce the same result as
the general case?


The problem is that reshape_init won't do anything for 
CONSTRUCTOR_IS_PAREN_INIT.


Ah, yes, that flag is the difference.


So the reshape_init in build_new_1 wouldn't unwrap the outermost { } around
a STRING_CST.



Perhaps reshape_init should be adjusted to do that unwrapping even when it gets
a CONSTRUCTOR_IS_PAREN_INIT CONSTRUCTOR.  But I'm not sure if it should also do
the reference_related_p unwrapping in reshape_init_r in that case.


That would make sense to me.


@@ -3977,9 +3972,15 @@ build_new (location_t loc, vec<tree, va_gc> **placement, tree type,
}
}
 /* Otherwise we should have 'new T[]{e_0, ..., e_k}'.  */
-  if (BRACE_ENCLOSED_INITIALIZER_P (elt))
-   elt = reshape_init (type, elt, complain);
-  cp_complete_array_type (, elt, /*do_default*/false);
+  if (deduce_array_p)
+   {
+ /* Don't reshape ELT itself: we want to pass a list-initializer to
+build_new_1, even for STRING_CSTs.  */
+ tree e = elt;
+ if (BRACE_ENCLOSED_INITIALIZER_P (e))
+   e = reshape_init (type, e, complain);


The comment is unclear; this call does reshape the CONSTRUCTOR ELT points
to, it just doesn't change ELT if the reshape call returns something else.


Yea, I've amended the comment.


Why are we reshaping here, anyway?  Won't that lead to undesired brace
elision?


We have to reshape before deducing the array, otherwise we could deduce the
wrong number of elements when certain braces were omitted.  E.g. in

   struct S { int x, y; };
   new S[]{1, 2, 3, 4}; // braces elided, is { {1, 2}, {3, 4} }


Ah, right, we also get here for initializers written with actual braces.


we want S[2], not S[4].  A way to test it would be

   struct S { int x, y; };
   S *p = new S[]{1, 2, 3, 4};

   void* operator new (unsigned long int size)
   {
   if (size != sizeof (S) * 2)
__builtin_abort ();
   return __builtin_malloc (size);
   }

   int main () { }

I can add that too, if you want.  (It'd be safer if cp_complete_array_type
always reshaped but that's not trivial, as the original patch mentions.)
()-init-list wouldn't be reshaped because CONSTRUCTOR_IS_PAREN_INIT is set.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

Thanks,

-- >8 --
This patch corrects our handling of array new-expression with ()-init:

   new int[4](1, 2, 3, 4);

should work even with the explicit array bound, and

   new char[3]("so_sad");

should cause an error, but we weren't giving any.

Fixed by handling array new-expressions with ()-init in the same spot
where we deduce the array bound in array new-expression.  I'm now
always passing STRING_CSTs to build_new_1 wrapped in { } which allowed
me to remove the special handling of STRING_CSTs in build_new_1.  And
since the DIRECT_LIST_INIT_P block in build_new_1 calls digest_init, we

[PATCH] openacc: Fix atomic_capture-2.c iteration-ordering issues

2020-09-08 Thread Julian Brown
The test case was written with assumptions about loop iteration ordering
that are not guaranteed by OpenACC and do not apply on all targets,
in particular AMD GCN. This patch removes those assumptions.

Tested with offloading to AMD GCN. I will apply shortly.

Julian

2020-09-08  Julian Brown  

libgomp/
* testsuite/libgomp.oacc-c-c++-common/atomic_capture-2.c: Remove
iteration-ordering assumptions.
---
 .../atomic_capture-2.c| 92 +--
 1 file changed, 43 insertions(+), 49 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-2.c
index 842f2de4722..4f83f03899d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-2.c
@@ -37,11 +37,9 @@ main(int argc, char **argv)
   imin = idata[i] < imin ? idata[i] : imin;
 }
 
-  if (imax != 1234 || imin != 0)
+  if (imax != 1234 || imin < 0 || imin > 1)
 abort ();
 
-  return 0;
-
   igot = 0;
   iexp = 32;
 
@@ -443,17 +441,16 @@ main(int argc, char **argv)
 }
   }
 
+  int ones = 0, zeros = 0;
+
   for (i = 0; i < N; i++)
-if (i % 2 == 0)
-  {
-   if (idata[i] != 1)
- abort ();
-  }
-else
-  {
-   if (idata[i] != 0)
- abort ();
-  }
+if (idata[i] == 1)
+  ones++;
+else if (idata[i] == 0)
+  zeros++;
+
+  if (ones != N / 2 || zeros != N / 2)
+abort ();
 
   if (iexp != igot)
 abort ();
@@ -491,17 +488,16 @@ main(int argc, char **argv)
   }
   }
 
+  ones = zeros = 0;
+
   for (i = 0; i < N; i++)
-if (i % 2 == 0)
-  {
-   if (idata[i] != 0)
- abort ();
-  }
-else
-  {
-   if (idata[i] != 1)
- abort ();
-  }
+if (idata[i] == 1)
+  ones++;
+else if (idata[i] == 0)
+  zeros++;
+
+  if (ones != N / 2 || zeros != N / 2)
+abort ();
 
   if (iexp != igot)
 abort ();
@@ -579,7 +575,7 @@ main(int argc, char **argv)
   if (lexp != lgot)
 abort ();
 
-  lgot = 2LL;
+  lgot = 2LL << N;
   lexp = 2LL;
 
 #pragma acc data copy (lgot, ldata[0:N])
@@ -587,7 +583,7 @@ main(int argc, char **argv)
 #pragma acc parallel loop
 for (i = 0; i < N; i++)
   {
-long long expr = 1LL << N;
+   long long expr = 2LL;
 
 #pragma acc atomic capture
 { lgot = lgot / expr; ldata[i] = lgot; }
@@ -1450,17 +1446,16 @@ main(int argc, char **argv)
   }
   }
 
+  ones = zeros = 0;
+
   for (i = 0; i < N; i++)
-if (i % 2 == 0)
-  {
-   if (fdata[i] != 1.0)
- abort ();
-  }
-else
-  {
-   if (fdata[i] != 0.0)
- abort ();
-  }
+if (fdata[i] == 1.0)
+  ones++;
+else if (fdata[i] == 0.0)
+  zeros++;
+
+  if (ones != N / 2 || zeros != N / 2)
+abort ();
 
   if (fexp != fgot)
 abort ();
@@ -1498,17 +1493,16 @@ main(int argc, char **argv)
   }
   }
 
+  ones = zeros = 0;
+
   for (i = 0; i < N; i++)
-if (i % 2 == 0)
-  {
-   if (fdata[i] != 0.0)
- abort ();
-  }
-else
-  {
-   if (fdata[i] != 1.0)
- abort ();
-  }
+if (fdata[i] == 1.0)
+  ones++;
+else if (fdata[i] == 0.0)
+  zeros++;
+
+  if (ones != N / 2 || zeros != N / 2)
+abort ();
 
   if (fexp != fgot)
 abort ();
@@ -1569,7 +1563,7 @@ main(int argc, char **argv)
 abort ();
 
   fgot = 8192.0*8192.0*64.0;
-  fexp = 1.0;
+  fexp = fgot;
 
 #pragma acc data copy (fgot, fdata[0:N])
   {
@@ -1586,15 +1580,15 @@ main(int argc, char **argv)
   if (fexp != fgot)
 abort ();
 
-  fgot = 4.0;
-  fexp = 4.0;
+  fgot = 2.0 * (1LL << N);
+  fexp = 2.0;
 
 #pragma acc data copy (fgot, fdata[0:N])
   {
 #pragma acc parallel loop
 for (i = 0; i < N; i++)
   {
-long long expr = 1LL << N;
+   long long expr = 2LL;
 
 #pragma acc atomic capture
 { fgot = fgot / expr; fdata[i] = fgot; }
-- 
2.28.0



[PATCH] openacc: Fix mkoffload SGPR/VGPR count parsing for HSACO v3

2020-09-08 Thread Julian Brown
If an offload kernel uses a large number of VGPRs, AMD GCN hardware may
need to limit the number of threads/workers launched for that kernel.
The number of SGPRs/VGPRs in use is detected by mkoffload and recorded in
the processed output.  The patterns emitted detailing SGPR/VGPR occupancy
changed between HSACO v2 and v3 though, so this patch updates parsing
to account for that.

Tested with offloading to AMD GCN. I will apply shortly.

Julian

2020-09-08  Julian Brown  

gcc/
* config/gcn/mkoffload.c (process_asm): Initialise regcount.  Update
scanning for SGPR/VGPR usage for HSACO v3.
---
 gcc/config/gcn/mkoffload.c | 40 --
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.c b/gcc/config/gcn/mkoffload.c
index 808ce53176c..0983b98e178 100644
--- a/gcc/config/gcn/mkoffload.c
+++ b/gcc/config/gcn/mkoffload.c
@@ -432,7 +432,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 int sgpr_count;
 int vgpr_count;
 char *kernel_name;
-  } regcount;
+  } regcount = { -1, -1, NULL };
 
   /* Always add _init_array and _fini_array as kernels.  */
   obstack_ptr_grow (_os, xstrdup ("_init_array"));
@@ -440,7 +440,12 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   fn_count += 2;
 
   char buf[1000];
-  enum { IN_CODE, IN_AMD_KERNEL_CODE_T, IN_VARS, IN_FUNCS } state = IN_CODE;
+  enum
+{ IN_CODE,
+  IN_METADATA,
+  IN_VARS,
+  IN_FUNCS
+} state = IN_CODE;
   while (fgets (buf, sizeof (buf), in))
 {
   switch (state)
@@ -453,21 +458,25 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
	obstack_grow (&dims_os, &dim, sizeof (dim));
dims_count++;
  }
-   else if (sscanf (buf, " .amdgpu_hsa_kernel %ms\n",
-		    &regcount.kernel_name) == 1)
- break;
 
break;
  }
-   case IN_AMD_KERNEL_CODE_T:
+   case IN_METADATA:
  {
-   gcc_assert (regcount.kernel_name);
-   if (sscanf (buf, " wavefront_sgpr_count = %d\n",
-	       &regcount.sgpr_count) == 1)
+   if (sscanf (buf, " - .name: %ms\n", &regcount.kernel_name) == 1)
  break;
-   else if (sscanf (buf, " workitem_vgpr_count = %d\n",
+   else if (sscanf (buf, " .sgpr_count: %d\n",
+		    &regcount.sgpr_count) == 1)
+ {
+   gcc_assert (regcount.kernel_name);
+   break;
+ }
+   else if (sscanf (buf, " .vgpr_count: %d\n",
 		    &regcount.vgpr_count) == 1)
- break;
+ {
+   gcc_assert (regcount.kernel_name);
+   break;
+ }
 
break;
  }
@@ -508,9 +517,10 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
state = IN_VARS;
   else if (sscanf (buf, " .section .gnu.offload_funcs%c", &dummy) > 0)
state = IN_FUNCS;
-  else if (sscanf (buf, " .amd_kernel_code_%c", &dummy) > 0)
+  else if (sscanf (buf, " .amdgpu_metadata%c", &dummy) > 0)
{
- state = IN_AMD_KERNEL_CODE_T;
+ state = IN_METADATA;
+ regcount.kernel_name = NULL;
  regcount.sgpr_count = regcount.vgpr_count = -1;
}
   else if (sscanf (buf, " .section %c", &dummy) > 0
	   || sscanf (buf, " .data%c", &dummy) > 0
	   || sscanf (buf, " .ident %c", &dummy) > 0)
   || sscanf (buf, " .ident %c", ) > 0)
state = IN_CODE;
-  else if (sscanf (buf, " .end_amd_kernel_code_%c", &dummy) > 0)
+  else if (sscanf (buf, " .end_amdgpu_metadata%c", &dummy) > 0)
{
  state = IN_CODE;
  gcc_assert (regcount.kernel_name != NULL
@@ -531,7 +541,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
  regcount.sgpr_count = regcount.vgpr_count = -1;
}
 
-  if (state == IN_CODE || state == IN_AMD_KERNEL_CODE_T)
+  if (state == IN_CODE || state == IN_METADATA)
fputs (buf, out);
 }
 
-- 
2.28.0



[PATCH] amdgcn: Add waitcnt after LDS write instructions

2020-09-08 Thread Julian Brown
Data-share write (ds_write) instructions do not necessarily complete
the write to LDS immediately. When a write completes, LGKM_CNT is
decremented. For now, we wait until LGKM_CNT reaches zero after each
ds_write instruction.

This fixes a race condition in the case where LDS is read immediately
after being written. This can happen with broadcast operations.

This may be latent on mainline, since the broadcast machinery isn't
present there yet. Nonetheless, I will apply shortly.

Julian

2020-09-08  Julian Brown  

gcc/
* config/gcn/gcn-valu.md (scatter_insn_1offset_ds):
Add waitcnt.
* config/gcn/gcn.md (*mov_insn, *movti_insn): Add waitcnt to
ds_write alternatives.
---
 gcc/config/gcn/gcn-valu.md | 2 +-
 gcc/config/gcn/gcn.md  | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 26559ff765e..e4d7f2a0f49 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -923,7 +923,7 @@
   {
 addr_space_t as = INTVAL (operands[3]);
 static char buf[200];
-sprintf (buf, "ds_write%%b2\t%%0, %%2 offset:%%1%s",
+sprintf (buf, "ds_write%%b2\t%%0, %%2 offset:%%1%s\;s_waitcnt\tlgkmcnt(0)",
 (AS_GDS_P (as) ? " gds" : ""));
 return buf;
   }
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index ed98d2d2706..aeb25fbb931 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -554,7 +554,7 @@
   flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0
   flat_store_dword\t%A0, %1%O0%g0
   v_mov_b32\t%0, %1
-  ds_write_b32\t%A0, %1%O0
+  ds_write_b32\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read_b32\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)
   s_mov_b32\t%0, %1
   global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
@@ -582,7 +582,7 @@
   flat_load%o1\t%0, %A1%O1%g1\;s_waitcnt\t0
   flat_store%s0\t%A0, %1%O0%g0
   v_mov_b32\t%0, %1
-  ds_write%b0\t%A0, %1%O0
+  ds_write%b0\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read%u1\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)
   global_load%o1\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
   global_store%s0\t%A0, %1%O0%g0"
@@ -611,7 +611,7 @@
   #
   flat_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\t0
   flat_store_dwordx2\t%A0, %1%O0%g0
-  ds_write_b64\t%A0, %1%O0
+  ds_write_b64\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read_b64\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)
   global_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
   global_store_dwordx2\t%A0, %1%O0%g0"
@@ -667,7 +667,7 @@
   #
   global_store_dwordx4\t%A0, %1%O0%g0
   global_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
-  ds_write_b128\t%A0, %1%O0
+  ds_write_b128\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read_b128\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)"
   "reload_completed
&& REG_P (operands[0])
-- 
2.28.0



[PATCH] openacc: Fix race condition in Fortran loop collapse tests

2020-09-08 Thread Julian Brown
The gangs participating in a gang-partitioned loop are not all guaranteed
to complete before some given gang continues to execute beyond that loop.
This means that two existing test cases contain a race condition,
because a loop that may be gang-partitioned is followed immediately by
another loop.  The fix is to place the loops in separate parallel regions.

Tested with offloading to AMD GCN.  I will apply shortly (testsuite
change only).

Julian

2020-09-08  Julian Brown  

libgomp/
* testsuite/libgomp.oacc-fortran/collapse-1.f90: Fix race condition.
* testsuite/libgomp.oacc-fortran/collapse-2.f90: Likewise.
---
 libgomp/testsuite/libgomp.oacc-fortran/collapse-1.f90 | 3 +++
 libgomp/testsuite/libgomp.oacc-fortran/collapse-2.f90 | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/collapse-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/collapse-1.f90
index 918c5d0d5b1..4857752f1b0 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/collapse-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/collapse-1.f90
@@ -14,6 +14,9 @@ program collapse1
 end do
   end do
 end do
+  !$acc end parallel
+
+  !$acc parallel
   !$acc loop collapse(2) reduction(.or.:l)
 do i = 1, 3
   do j = 4, 6
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/collapse-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/collapse-2.f90
index 98b6987750e..0a543909127 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/collapse-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/collapse-2.f90
@@ -13,6 +13,9 @@ program collapse2
 do 164 k = 5, 7
   a(i, j, k) = i + j + k
 164  end do
+  !$acc end parallel
+
+  !$acc parallel
   !$acc loop collapse(2) reduction(.or.:l)
 firstdo: do i = 1, 3
   do j = 4, 6
-- 
2.28.0



Re: [PATCH v2] c++: Further tweaks for new-expression and paren-init [PR77841]

2020-09-08 Thread Marek Polacek via Gcc-patches
On Mon, Sep 07, 2020 at 11:19:47PM -0400, Jason Merrill wrote:
> On 9/6/20 11:34 AM, Marek Polacek wrote:
> > @@ -3944,9 +3935,9 @@ build_new (location_t loc, vec<tree, va_gc> **placement, tree type,
> >   }
> > /* P1009: Array size deduction in new-expressions.  */
> > -  if (TREE_CODE (type) == ARRAY_TYPE
> > -  && !TYPE_DOMAIN (type)
> > -  && *init)
> > +  const bool deduce_array_p = (TREE_CODE (type) == ARRAY_TYPE
> > +  && !TYPE_DOMAIN (type));
> > +  if (*init && (deduce_array_p || (nelts && cxx_dialect >= cxx20)))
> 
> Looks like this won't handle new (char[4]), for which we also get an
> ARRAY_TYPE.

Good catch.  Fixed & paren-init37.C added.

> >   {
> > /* This means we have 'new T[]()'.  */
> > if ((*init)->is_empty ())
> > @@ -3955,16 +3946,20 @@ build_new (location_t loc, vec<tree, va_gc> **placement, tree type,
> >   CONSTRUCTOR_IS_DIRECT_INIT (ctor) = true;
> >   vec_safe_push (*init, ctor);
> > }
> > +  tree array_type = deduce_array_p ? TREE_TYPE (type) : type;
> 
> I'd call this variable elt_type.

Right, and it should be inside the block below.

> > tree &elt = (**init)[0];
> > /* The C++20 'new T[](e_0, ..., e_k)' case allowed by P0960.  */
> > if (!DIRECT_LIST_INIT_P (elt) && cxx_dialect >= cxx20)
> > {
> > - /* Handle new char[]("foo").  */
> > + /* Handle new char[]("foo"): turn it into new char[]{"foo"}.  */
> >   if (vec_safe_length (*init) == 1
> > - && char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (type)))
> > + && char_type_p (TYPE_MAIN_VARIANT (array_type))
> >   && TREE_CODE (tree_strip_any_location_wrapper (elt))
> >  == STRING_CST)
> > -   /* Leave it alone: the string should not be wrapped in {}.  */;
> > +   {
> > + elt = build_constructor_single (init_list_type_node, NULL_TREE, elt);
> > + CONSTRUCTOR_IS_DIRECT_INIT (elt) = true;
> > +   }
> >   else
> > {
> >   tree ctor = build_constructor_from_vec (init_list_type_node, *init);
> 
> With this change, doesn't the string special case produce the same result as
> the general case?

The problem is that reshape_init won't do anything for 
CONSTRUCTOR_IS_PAREN_INIT.
So the reshape_init in build_new_1 wouldn't unwrap the outermost { } around
a STRING_CST.

Perhaps reshape_init should be adjusted to do that unwrapping even when it gets
a CONSTRUCTOR_IS_PAREN_INIT CONSTRUCTOR.  But I'm not sure if it should also do
the reference_related_p unwrapping in reshape_init_r in that case.

> > @@ -3977,9 +3972,15 @@ build_new (location_t loc, vec<tree, va_gc> **placement, tree type,
> > }
> > }
> > /* Otherwise we should have 'new T[]{e_0, ..., e_k}'.  */
> > -  if (BRACE_ENCLOSED_INITIALIZER_P (elt))
> > -   elt = reshape_init (type, elt, complain);
> > -  cp_complete_array_type (, elt, /*do_default*/false);
> > +  if (deduce_array_p)
> > +   {
> > + /* Don't reshape ELT itself: we want to pass a list-initializer to
> > +build_new_1, even for STRING_CSTs.  */
> > + tree e = elt;
> > + if (BRACE_ENCLOSED_INITIALIZER_P (e))
> > +   e = reshape_init (type, e, complain);
> 
> The comment is unclear; this call does reshape the CONSTRUCTOR ELT points
> to, it just doesn't change ELT if the reshape call returns something else.

Yea, I've amended the comment.

> Why are we reshaping here, anyway?  Won't that lead to undesired brace
> elision?

We have to reshape before deducing the array, otherwise we could deduce the
wrong number of elements when certain braces were omitted.  E.g. in

  struct S { int x, y; };
  new S[]{1, 2, 3, 4}; // braces elided, is { {1, 2}, {3, 4} }

we want S[2], not S[4].  A way to test it would be

  struct S { int x, y; };
  S *p = new S[]{1, 2, 3, 4};

  void* operator new (unsigned long int size)
  {
  if (size != sizeof (S) * 2)
__builtin_abort ();
  return __builtin_malloc (size);
  }

  int main () { }

I can add that too, if you want.  (It'd be safer if cp_complete_array_type
always reshaped but that's not trivial, as the original patch mentions.)
()-init-list wouldn't be reshaped because CONSTRUCTOR_IS_PAREN_INIT is set.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

Thanks,

-- >8 --
This patch corrects our handling of array new-expression with ()-init:

  new int[4](1, 2, 3, 4);

should work even with the explicit array bound, and

  new char[3]("so_sad");

should cause an error, but we weren't giving any.

Fixed by handling array new-expressions with ()-init in the same spot
where we deduce the array bound in array new-expression.  I'm now
always passing STRING_CSTs to build_new_1 wrapped in { } which allowed
me to remove the special handling of STRING_CSTs in build_new_1.  And
since the DIRECT_LIST_INIT_P block in build_new_1 calls digest_init, we
report errors about too short arrays.

I took a stab at cp_complete_array_type's "FIXME: this 

libbacktrace patch committed: Only match magic number at start of file

2020-09-08 Thread Ian Lance Taylor via Gcc-patches
This patch fixes the libbacktrace file type detection, which is run at
configure time, to only look for a magic number at the very start of
the file.  Otherwise we can get confused if the bytes happen to appear
elsewhere on the first "line".  This is for PR 96971.  Bootstrapped
and ran libbacktrace tests on x86_64-pc-linux-gnu.  Committed to
mainline.

Ian

PR libbacktrace/96971
* filetype.awk: Only match magic number at start of line.
diff --git a/libbacktrace/filetype.awk b/libbacktrace/filetype.awk
index 14d91581f7e..1eefa7e72f0 100644
--- a/libbacktrace/filetype.awk
+++ b/libbacktrace/filetype.awk
@@ -1,13 +1,13 @@
 # An awk script to determine the type of a file.
-/\177ELF\001/  { if (NR == 1) { print "elf32"; exit } }
-/\177ELF\002/  { if (NR == 1) { print "elf64"; exit } }
-/\114\001/ { if (NR == 1) { print "pecoff"; exit } }
-/\144\206/ { if (NR == 1) { print "pecoff"; exit } }
-/\001\337/ { if (NR == 1) { print "xcoff32"; exit } }
-/\001\367/ { if (NR == 1) { print "xcoff64"; exit } }
-/\376\355\372\316/ { if (NR == 1) { print "macho"; exit } }
-/\316\372\355\376/ { if (NR == 1) { print "macho"; exit } }
-/\376\355\372\317/ { if (NR == 1) { print "macho"; exit } }
-/\317\372\355\376/ { if (NR == 1) { print "macho"; exit } }
-/\312\376\272\276/ { if (NR == 1) { print "macho"; exit } }
-/\276\272\376\312/ { if (NR == 1) { print "macho"; exit } }
+/^\177ELF\001/  { if (NR == 1) { print "elf32"; exit } }
+/^\177ELF\002/  { if (NR == 1) { print "elf64"; exit } }
+/^\114\001/ { if (NR == 1) { print "pecoff"; exit } }
+/^\144\206/ { if (NR == 1) { print "pecoff"; exit } }
+/^\001\337/ { if (NR == 1) { print "xcoff32"; exit } }
+/^\001\367/ { if (NR == 1) { print "xcoff64"; exit } }
+/^\376\355\372\316/ { if (NR == 1) { print "macho"; exit } }
+/^\316\372\355\376/ { if (NR == 1) { print "macho"; exit } }
+/^\376\355\372\317/ { if (NR == 1) { print "macho"; exit } }
+/^\317\372\355\376/ { if (NR == 1) { print "macho"; exit } }
+/^\312\376\272\276/ { if (NR == 1) { print "macho"; exit } }
+/^\276\272\376\312/ { if (NR == 1) { print "macho"; exit } }


Re: Patch for 96948

2020-09-08 Thread Martin Storsjö

Hi,

On Tue, 8 Sep 2020, Kirill Müller wrote:


I haven't actually tested whether the cfa value (ms_context.Rsp) is valid, but I
also see no reason why it shouldn't be.

What does it take for your patch to be accepted? What's the minimum gcc
version where it will be available?


I'm not a maintainer nor a committer here, so it takes someone with such a 
role/privileges to review and commit it, and presumably it'd be part of 
the upcoming GCC 11 then, unless someone chooses to backport it to current 
release branches.


// Martin


[PATCH] Practical Improvement to libgcc Complex Divide

2020-09-08 Thread Patrick McGehearty via Gcc-patches
(Version 4)

(Added in version 4)
Fixed Changelog entry to include __divsc3, __divdc3, __divxc3, __divtc3.
Revised description to avoid incorrect use of "ulp (units in the last place)".
Modified the float precision case to use double precision when double
precision hardware is available.  Otherwise float uses the new algorithm.
Added code to scale subnormal numerator arguments when appropriate.
This change reduces 16 bit errors in double precision by a factor of 140.
Revised results charts to match current version of code.
Added background of tuning approach.

Summary of Purpose

The following patch to libgcc/libgcc2.c __divdc3 provides an
opportunity to gain important improvements to the quality of answers
for the default complex divide routine (half, float, double, extended,
long double precisions) when dealing with very large or very small exponents.

The current code correctly implements Smith's method (1962) [2]
further modified by c99's requirements for dealing with NaN (not a
number) results. When working with input values where the exponents
are greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are
substantially different from the answers provided by quad precision
more than 1% of the time. This error rate may be unacceptable for many
applications that cannot a priori restrict their computations to the
safe range. The proposed method reduces the frequency of
"substantially different" answers by more than 99% for double
precision at a modest cost of performance.

Differences between current gcc methods and the new method will be
described. Then accuracy and performance differences will be discussed.

Background

This project started with an investigation related to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714.  Study of Beebe[1]
provided an overview of past and recent practice for computing complex
divide. The current glibc implementation is based on Robert Smith's
algorithm [2] from 1962.  A google search found the paper by Baudin
and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's
proposed patch [4] is based on that paper.

I developed two test sets by randomly distributing values over
a restricted range and over the full range of input values. The current
complex divide handled the restricted range well enough, but failed on
the full range more than 1% of the time. Baudin and Smith's primary
test, which checks whether "ratio" equals zero, reduced the cases with
16 or more error bits by a factor of 5, but still left too many flawed
answers. Adding debug printouts to cases with substantial errors let me
see the intermediate calculations for test values that failed. I noted
that for many of the failures, "ratio" was subnormal. Changing the
"ratio" test from a check for zero to a check for subnormal reduced the
16-bit error rate by another factor of 12. This single modified test
provides the greatest benefit for the least cost, but the percentage
of cases with greater than 16 bit errors (double precision data) is
still greater than 0.027% (2.7 in 10,000).
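
That single modified test can be sketched directly; the helper name below is
mine, not the patch's actual code:

```c
#include <float.h>
#include <math.h>

/* Sketch of the modified "ratio" test described above: instead of
   checking ratio == 0.0 as Baudin and Smith do, treat any subnormal
   ratio as unreliable too, since subsequent arithmetic with it loses
   precision.  Illustrative only.  */
static int
ratio_is_tiny (double ratio)
{
  /* True for zero and for every subnormal value.  */
  return fabs (ratio) < DBL_MIN;
}
```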

Continued examination of remaining errors and their intermediate
computations led to the various input value tests and scaling used
to avoid under/overflow. The current patch does not handle some of the
rarest and most extreme combinations of input values, but the random
test data is only showing 1 case in 10 million that has an error of
greater than 12 bits. That case has 18 bits of error and is due to
subtraction cancellation. These results are significantly better
than the results reported by Baudin and Smith.

Support for half, float, double, extended, and long double precision
is included as all are handled with suitable preprocessor symbols in a
single source routine. Since half precision is computed with float
precision as per current libgcc practice, the enhanced algorithm
provides no benefit for half precision and would cost performance.
Therefore half precision is left unchanged.

The existing constants for each precision:
float: FLT_MAX, FLT_MIN;
double: DBL_MAX, DBL_MIN;
extended and/or long double: LDBL_MAX, LDBL_MIN
are used for avoiding the more common overflow/underflow cases.

Testing showed that when both parts of the denominator have exponents
small enough to allow shifting any subnormal values to normal values,
all input values can be scaled up without risking unnecessary overflow,
gaining a clear improvement in accuracy. Similarly, when
either numerator was subnormal and the other numerator and both
denominator values were not too large, scaling could be used to reduce
risk of computing with subnormals.  The test and scaling values used
all fit within the allowed exponent range for each precision required
by the C standard.
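
The shape of such a pre-scaling test can be sketched as follows.  The
threshold and the power-of-two scale factor here are illustrative
assumptions, not the exact constants the patch uses; the key point is that
scaling by a power of two is exact, so it cannot perturb the quotient.

```c
#include <float.h>
#include <math.h>

/* Sketch of denominator pre-scaling: if either part of the denominator
   is subnormal and the other part is small enough that scaling up
   cannot overflow, multiply both parts by an exact power of two.
   Thresholds and scale factor are illustrative assumptions.  */
static void
maybe_scale_denominator (double *c, double *d)
{
  const double scale = 0x1p53;  /* 2^53 normalizes any double subnormal */

  if ((fabs (*c) < DBL_MIN && fabs (*d) < DBL_MAX / scale)
      || (fabs (*d) < DBL_MIN && fabs (*c) < DBL_MAX / scale))
    {
      *c *= scale;
      *d *= scale;
    }
}
```

The numerator case described above works the same way, with bounds chosen so
the scaled values stay within the exponent range the C standard guarantees
for each precision.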

Float precision has even more difficulty with getting correct answers
than double precision. When hardware for double precision floating
point operations is available, float precision is now handled in
double precision intermediate calculations with the original Smith
algorithm (i.e. the 

[PATCH] Add support for atomic_flag::wait/notify_one/notify_all

2020-09-08 Thread Thomas Rodgers
* include/bits/atomic_base.h (__atomic_flag::wait): Define.
(__atomic_flag::notify_one): Likewise.
(__atomic_flag::notify_all): Likewise.
* testsuite/29_atomics/atomic_flag/wait_notify/1.cc: New test.
---
 libstdc++-v3/include/bits/atomic_base.h   | 23 +++
 .../29_atomics/atomic_flag/wait_notify/1.cc   | 61 +++
 2 files changed, 84 insertions(+)
 create mode 100644 
libstdc++-v3/testsuite/29_atomics/atomic_flag/wait_notify/1.cc

diff --git a/libstdc++-v3/include/bits/atomic_base.h 
b/libstdc++-v3/include/bits/atomic_base.h
index c121d993fee..a7ddd03d544 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -229,6 +229,29 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __atomic_load(&_M_i, &__v, int(__m));
   return __v == __GCC_ATOMIC_TEST_AND_SET_TRUEVAL;
 }
+
+_GLIBCXX_ALWAYS_INLINE void
+wait(bool __old,
+   memory_order __m = memory_order_seq_cst) const noexcept
+{
+  std::__atomic_wait(&_M_i, __old,
+[__m, this, __old]()
+{ return this->test(__m) != __old; });
+}
+
+// TODO add const volatile overload
+
+_GLIBCXX_ALWAYS_INLINE void
+notify_one() const noexcept
+{ std::__atomic_notify(&_M_i, false); }
+
+// TODO add const volatile overload
+
+_GLIBCXX_ALWAYS_INLINE void
+notify_all() const noexcept
+{ std::__atomic_notify(&_M_i, true); }
+
+// TODO add const volatile overload
 #endif // C++20
 
 _GLIBCXX_ALWAYS_INLINE void
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_flag/wait_notify/1.cc 
b/libstdc++-v3/testsuite/29_atomics/atomic_flag/wait_notify/1.cc
new file mode 100644
index 000..6de7873ecc2
--- /dev/null
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_flag/wait_notify/1.cc
@@ -0,0 +1,61 @@
+// { dg-options "-std=gnu++2a -pthread" }
+// { dg-do run { target c++2a } }
+// { dg-require-effective-target pthread }
+// { dg-require-gthreads "" }
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+int
+main()
+{
+  using namespace std::literals::chrono_literals;
+
+  std::mutex m;
+  std::condition_variable cv;
+
+  std::atomic_flag a;
+  std::atomic_flag b;
+  std::thread t([&]
+   {
+ cv.notify_one();
+ a.wait(false);
+  b.test_and_set();
+  b.notify_one();
+   });
+
+  std::unique_lock<std::mutex> l(m);
+  cv.wait(l);
+  std::this_thread::sleep_for(100ms);
+  a.test_and_set();
+  a.notify_one();
+  b.wait(false);
+  t.join();
+
+  VERIFY( a.test() );
+  VERIFY( b.test() );
+  return 0;
+}
-- 
2.26.2



Re: [PATCH] bb-reorder: Remove a misfiring micro-optimization (PR96475)

2020-09-08 Thread Segher Boessenkool
Hi!

Sorry this took so long to come back to...

On Tue, Aug 25, 2020 at 02:39:56PM -0600, Jeff Law wrote:
> On Fri, 2020-08-07 at 21:51 +, Segher Boessenkool wrote:
> > When the compgotos pass copies the tail of blocks ending in an indirect
> > jump, there is a micro-optimization to not copy the last one, since the
> > original block will then just be deleted.  This does not work properly
> > if cleanup_cfg does not merge all pairs of blocks we expect it to.
> > 
> > 
> > v2: This also deletes the other use of single_pred_p, which has the same
> > problem in principle, I just never have triggered it so far.
> > 
> > Tested on powerpc64-linux {-m32,-m64} like before.  Is this okay for
> > trunk?
> > 
> > 
> > Segher
> > 
> > 
> > 2020-08-07  Segher Boessenkool  
> > 
> > PR rtl-optimization/96475
> > * bb-reorder.c (maybe_duplicate_computed_goto): Remove single_pred_p
> > micro-optimization.
> So this may have already been answered, but why didn't we merge the single 
> pred
> with its single succ?

The patch makes the compgotos pass more robust, and also makes it work
in cases where the unmodified code would not (for example, with the
unmodified code, the original indirect jump is *not* copied to the
predecessors whenever possible).

I don't know if we hit this, or we hit a snag with some older GCC
releases that did not jump thread properly in all cases.  I suspect a
mixture of the two :-/

> Though I guess your patch wouldn't hurt, so OK.

Thanks!


Segher


RE: [PING] floatformat.h: Add bfloat16 support.

2020-09-08 Thread Joseph Myers
On Tue, 8 Sep 2020, Willgerodt, Felix via Gcc-patches wrote:

> Thanks for your review. It seems like the format issue was introduced by 
> my email client when hitting reply. Sorry for that! The original patch 
> is formatted correctly, as I used git send-email: 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552079.html
> 
> Could you double-check and push the patch for me? This is the first time 
> I contribute to gcc and I therefore don't have write access.

I've now pushed this patch.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Add support for putting jump table into relocation read-only section

2020-09-08 Thread Segher Boessenkool
Hi!

On Mon, Aug 24, 2020 at 03:53:44PM +0800, HAO CHEN GUI wrote:
>   abs_jump_table = (!CASE_VECTOR_PC_RELATIVE
>                     && !targetm.asm_out.generate_pic_addr_diff_vec ()) ? 1 : 0;

  x = y ? 1 : 0;

is the same as

  x = y;

if y is already only 0 or 1, and otherwise it is

  x = !!y;

You can also write this as an "if", that often is more readable.

> @@ -2491,9 +2491,17 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file, int 
> optimize_p ATTRIBUTE_UNUSED,
> if (! JUMP_TABLES_IN_TEXT_SECTION)
>   {
> int log_align;
> +   bool reloc;

> +extern section *default_function_rodata_section (tree, bool reloc);
> +extern section *default_no_function_rodata_section (tree, bool reloc);

"reloc" is an int elsewhere, and can have 4 values.  And you have to
interact with that code (mostly in varasm.c), so don't use the same name
meaning something else?


You need an RTL maintainer or global maintainer to review this...


Segher


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-08 Thread Qing Zhao via Gcc-patches



> On Sep 7, 2020, at 10:58 AM, H.J. Lu  wrote:
> 
> On Mon, Sep 7, 2020 at 7:06 AM Segher Boessenkool
> <seg...@kernel.crashing.org> wrote:
>> 
>> On Fri, Sep 04, 2020 at 11:52:13AM -0700, H.J. Lu wrote:
>>> On Fri, Sep 4, 2020 at 11:09 AM Segher Boessenkool
>>>  wrote:
 On Fri, Sep 04, 2020 at 10:34:23AM -0700, H.J. Lu wrote:
>> You probably have to do this for every target separately?  But it is not
>> enough to handle it in the epilogue, you also need to make sure it is
>> done on every path that returns *without* epilogue.
> 
> This feature is designed for normal return with epilogue.
 
 Very many normal returns do *not* pass through an epilogue, but are
 simple_return.  Disabling that is *much* more expensive than that 2%.
>>> 
>>> Sibcall isn't covered.  What other cases don't have an epilogue?
>> 
>> Shrink-wrapped stuff.  Quite important for performance.  Not something
>> you can throw away.
>> 
> 
> Qing, can you check how it interacts with shrink-wrap?

We had some discussion on shrink-wrapping previously, and we agreed on the 
following at that time:

"Shrink-wrapping often deals with the non-volatile registers, so that
doesn't matter much for this patch series.”

On the other hand, we deal with volatile registers in this patch, so from the 
registers' point of view, there is NO overlap between this 
patch and shrink-wrapping. 

So, what are the other possible issues when this patch interacts with 
shrink-wrapping?

When I checked the gcc source code on shrink-wrapping as following 
(gcc/function.c):


…….
  rtx_insn *epilogue_seq = make_epilogue_seq ();

  /* Try to perform a kind of shrink-wrapping, making sure the
 prologue/epilogue is emitted only around those parts of the
 function that require it.  */
  try_shrink_wrapping (&entry_edge, prologue_seq);

  /* If the target can handle splitting the prologue/epilogue into separate
 components, try to shrink-wrap these components separately.  */
  try_shrink_wrapping_separate (entry_edge->dest);

  /* If that did anything for any component we now need the generate the
 "main" prologue again.  Because some targets require some of these
 to be called in a specific order (i386 requires the split prologue
 to be first, for example), we create all three sequences again here.
 If this does not work for some target, that target should not enable
 separate shrink-wrapping.  */
  if (crtl->shrink_wrapped_separate)
{
  split_prologue_seq = make_split_prologue_seq ();
  prologue_seq = make_prologue_seq ();
  epilogue_seq = make_epilogue_seq ();
}
…….

My understanding from the above is:

1. “try_shrink_wrapping” should NOT interact with make_epilogue_seq, since only 
“prologue_seq” is touched. 
2. “try_shrink_wrapping_separate” might interact with the epilogue; however, if 
anything is changed by “try_shrink_wrapping_separate”,
make_epilogue_seq() will be called again, and then the zeroing sequence 
will still be generated at the end of the routine. 

So, from the above, I didn’t see any obvious issues.

But I might be missing some important issues here; please let me know what I 
am missing.

Thanks a lot for any help.

Qing



> 
> -- 
> H.J.



Re: [committed] analyzer: fix another ICE in constructor-handling [PR96949]

2020-09-08 Thread Thomas Koenig via Gcc-patches

Hi David,


I'm taking the liberty of adding the reproducer for this to
gfortran.dg/analyzer as a regression test; hope that's OK.


Sure.  Adding a passing test case like that certainly falls under the
"obvious and simple" rule.

Best regards

Thomas


Re: [RFC Patch] mklog.py: Parse first 10 lines for PR/DR number

2020-09-08 Thread Tobias Burnus

On 9/8/20 5:50 PM, Martin Sebor wrote:


On 9/8/20 3:47 AM, Tobias Burnus wrote:

currently, mklog searches for "PR" (and "DR") only in the
first line of a new 'testsuite' file.

I think in many cases, the PR is listed a bit later than
the first line – although, it is usually in the first few
lines; in my example, it is in line 3 and 4.

Admittedly, I do have cases where later lines are wrong
like
"! Not tested due to PR ...'

How about testing the first, e.g., ten lines?
That's what the attached patch does.


I frequently use "prN" in dg-warning directives xfailed due
to the pr.


Those won't match pr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?P<pr>PR 
[a-z+-]+\/[0-9]+)')


They're probably only rarely in the first 10 lines
but stopping the search after the first dg- directive is seen
would help reduce the likelihood of the false positives even
further.


I think stopping after the first 'dg-' directive does not make sense;
at least I tend to start testcases with 'dg-do' followed by
'dg-(additional-)options'.

However, the new version of the patch stops after the first
'dg-error/dg-warning'.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
mklog.py: Parse first 10 lines for PR/DR number

contrib/ChangeLog:

	* mklog.py: Parse first 10 lines for PR/DR number
	not only the first line.

diff --git a/contrib/mklog.py b/contrib/mklog.py
index 243edbb15c5..1e85dfe583a 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -38,6 +38,7 @@ from unidiff import PatchSet
 
 pr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?P<pr>PR [a-z+-]+\/[0-9]+)')
 dr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?P<dr>DR [0-9]+)')
+dg_regex = re.compile(r'{\s+dg-(error|warning)')
 identifier_regex = re.compile(r'^([a-zA-Z0-9_#].*)')
 comment_regex = re.compile(r'^\/\*')
 struct_regex = re.compile(r'^(class|struct|union|enum)\s+'
@@ -137,7 +138,10 @@ def generate_changelog(data, no_functions=False, fill_pr_titles=False):
 
 # Extract PR entries from newly added tests
 if 'testsuite' in file.path and file.is_added_file:
-for line in list(file)[0]:
+# Only search the first ten lines, as later lines may
+# contain commented code with a note that it
+# has not been tested due to a certain PR or DR.
+for line in list(file)[0][0:10]:
 m = pr_regex.search(line.value)
 if m:
 pr = m.group('pr')
@@ -149,7 +153,8 @@ def generate_changelog(data, no_functions=False, fill_pr_titles=False):
 dr = m.group('dr')
 if dr not in prs:
 prs.append(dr)
-else:
+elif dg_regex.search(line.value):
+# Found dg-warning/dg-error line
 break
 
 if fill_pr_titles:


Re: Patch for 96948

2020-09-08 Thread Kirill Müller via Gcc-patches

Hi


Thanks for the explanation, this makes sense now.

I haven't actually tested if the cfa value (ms_context.Rsp) is valid, I 
also see no reason why it shouldn't be.


What does it take for your patch to be accepted? What's the minimum gcc 
version where it will be available?



Best regards

Kirill


On 08.09.20 17:34, Martin Storsjö wrote:

Hi,

On Tue, 8 Sep 2020, Kirill Müller wrote:

Thanks for the heads up. The coincidence is funny -- a file that 
hasn't been touched for years.


I think we both may originally be triggered from the same guy asking 
around in different places about implementations of _Unwind_Backtrace 
for windows, actually.


I do believe that we need the logic around the `first` flag for 
consistency with the other unwind-*.c implementations.


Yes, if you store ms_context.Rip/Rsp before the RtlVirtualUnwind step 
- but my patch stores them afterwards; after RtlVirtualUnwind, before 
calling the callback.


The result should be the same, except that with the first-flag 
approach, I believe you're missing the last frame that is printed 
when using my patch.


// Martin


Re: [RFC Patch] mklog.py: Parse first 10 lines for PR/DR number

2020-09-08 Thread Martin Sebor via Gcc-patches

On 9/8/20 3:47 AM, Tobias Burnus wrote:

Hi Martin, hi all,

currently, mklog searches for "PR" (and "DR") only in the
first line of a new 'testsuite' file.

I think in many cases, the PR is listed a bit later than
the first line – although, it is usually in the first few
lines; in my example, it is in line 3 and 4.

Admittedly, I do have cases where later lines are wrong
like
"! Not tested due to PR ...'

How about testing the first, e.g., ten lines?
That's what the attached patch does.


I frequently use "prN" in dg-warning directives xfailed due
to the pr.  They're probably only rarely in the first 10 lines
but stopping the search after the first dg- directive is seen
would help reduce the likelihood of the false positives even
further.

Martin


RE: [PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-09-08 Thread Alex Coplan
Hi Christophe,

> -Original Message-
> From: Christophe Lyon 
> Sent: 08 September 2020 09:15
> To: Alex Coplan 
> Cc: gcc Patches ; Richard Earnshaw
> ; Marcus Shawcroft 
> Subject: Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend
> syntax
> 
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.md
> > (*adds__): Ensure extended operand
> > agrees with width of extension specifier.
> > (*subs__): Likewise.
> > (*adds__shift_): Likewise.
> > (*subs__shift_): Likewise.
> > (*add__): Likewise.
> > (*add__shft_): Likewise.
> > (*add_uxt_shift2): Likewise.
> > (*sub__): Likewise.
> > (*sub__shft_): Likewise.
> > (*sub_uxt_shift2): Likewise.
> > (*cmp_swp__reg): Likewise.
> > (*cmp_swp__shft_): Likewise.
> >
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/adds3.c: Fix test w.r.t. new syntax.
> > * gcc.target/aarch64/cmp.c: Likewise.
> > * gcc.target/aarch64/subs3.c: Likewise.
> > * gcc.target/aarch64/subsp.c: Likewise.
> > * gcc.target/aarch64/extend-syntax.c: New test.
> >
> 
> Hi,
> 
> I've noticed some of the new tests fail with -mabi=ilp32:
> gcc.target/aarch64/extend-syntax.c check-function-bodies add1
> gcc.target/aarch64/extend-syntax.c check-function-bodies add3
> gcc.target/aarch64/extend-syntax.c check-function-bodies sub2
> gcc.target/aarch64/extend-syntax.c check-function-bodies sub3
> gcc.target/aarch64/extend-syntax.c scan-assembler-times
> subs\tx[0-9]+, x[0-9]+, w[0-9]+, sxtw 3 1
> gcc.target/aarch64/subsp.c scan-assembler sub\tsp, sp, w[0-9]*, sxtw
> 4\n

Thanks for catching these.

The failures in extend-syntax.c just need the assertions tweaking, I have a
patch to fix those.

The failure in subsp.c is more interesting: looks like a missed optimisation on
ILP32, I'm taking a look.

> 
> Christophe

Thanks,
Alex


Re: Patch for 96948

2020-09-08 Thread Martin Storsjö

Hi,

On Tue, 8 Sep 2020, Kirill Müller wrote:

Thanks for the heads up. The coincidence is funny -- a file 
that hasn't been touched for years.


I think we both may originally be triggered from the same guy asking 
around in different places about implementations of _Unwind_Backtrace for 
windows, actually.


I do believe that we need the logic around the `first` flag 
for consistency with the other unwind-*.c implementations.


Yes, if you store ms_context.Rip/Rsp before the RtlVirtualUnwind step - 
but my patch stores them afterwards; after RtlVirtualUnwind, before 
calling the callback.


The result should be the same, except that with the first-flag approach, I 
believe you're missing the last frame that is printed when using my patch.


// Martin


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-08 Thread Patrick McGehearty via Gcc-patches

My understanding is this feature/flag is not intended to be "default on".
It is intended to be used in security sensitive environments such
as the Linux kernel where it was requested by kernel security experts.
I'm not understanding the objection here if the feature is requested
by security teams and the average cost is modest.

My background is in performance and application optimization. I agree
that for typical computation oriented, non-secure applications, I would
not use the feature, but for system applications that have the ability
to cross protection boundaries, it seems to be clearly a worthwhile
feature.

- patrick


On 9/7/2020 9:44 AM, Segher Boessenkool wrote:

On Fri, Sep 04, 2020 at 01:23:14AM +, Rodriguez Bahena, Victor wrote:

Qing, thanks a lot for the measurement, I am not sure if this is the limit of 
overhead the community is willing to accept by adding extra security (me as gcc 
user will be willing to accept).

The overhead is of course bearable for most programs / users, but what
is the return?  For what percentage of programs are ROP attacks no
longer possible, for example.


Segher




Re: [PATCH][libatomic] Add nvptx support

2020-09-08 Thread Tobias Burnus

On 9/8/20 8:51 AM, Tom de Vries wrote:


Add nvptx support to libatomic.


I tried it on powerpc64le-none-linux-gnu and that solves the
__sync_val_compare_and_swap_16 issue, I reported at
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553070.html

However, when trying Jakub's example (see below; syntax fixed
version), https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553142.html
it (still) fails with:

atomic.c: In function 'main._omp_fn.0':
atomic.c:6:11: internal compiler error: in write_fn_proto, at 
config/nvptx/nvptx.c:913
6 |   #pragma omp target

Tobias

PS: The 'atomic.c' testcase:

__uint128_t v;
#pragma omp declare target (v)
int
main ()
{
  #pragma omp target
  {
__atomic_add_fetch (&v, 1, __ATOMIC_RELAXED);
__atomic_fetch_add (&v, 1, __ATOMIC_RELAXED);
__uint128_t exp = 2;
__atomic_compare_exchange_n (&v, &exp, 7, 0, __ATOMIC_RELEASE, 
__ATOMIC_ACQUIRE);
  }
}

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


[committed] analyzer: fix another ICE in constructor-handling [PR96949]

2020-09-08 Thread David Malcolm via Gcc-patches
PR analyzer/96949 reports an ICE within -fanalyzer on a Fortran test
case with --param analyzer-max-svalue-depth=0, where that param value
leads to INTEGER_CST values in a RANGE_EXPR being treated as unknown
symbolic values.

This patch replaces implicit assumptions that these values are
concrete (and thus have concrete bit offsets), adding
error-handling for symbolic cases instead of assertions, fixing the ICE.

I'm taking the liberty of adding the reproducer for this to
gfortran.dg/analyzer as a regression test; hope that's OK.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r11-3052-g34d926dba097c4965917d09a3eedec11242c5457.

gcc/analyzer/ChangeLog:
PR analyzer/96949
* store.cc (binding_map::apply_ctor_val_to_range): Add
error-handling for the cases where we have symbolic offsets.

gcc/testsuite/ChangeLog:
PR analyzer/96949
* gfortran.dg/analyzer/pr96949.f90: New test.
---
 gcc/analyzer/store.cc |  8 ++--
 .../gfortran.dg/analyzer/pr96949.f90  | 20 +++
 2 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/analyzer/pr96949.f90

diff --git a/gcc/analyzer/store.cc b/gcc/analyzer/store.cc
index 94bcbecce88..1348895e5c7 100644
--- a/gcc/analyzer/store.cc
+++ b/gcc/analyzer/store.cc
@@ -466,11 +466,14 @@ binding_map::apply_ctor_val_to_range (const region 
*parent_reg,
   const region *max_element
 = get_subregion_within_ctor (parent_reg, max_index, mgr);
   region_offset min_offset = min_element->get_offset ();
+  if (min_offset.symbolic_p ())
+return false;
   bit_offset_t start_bit_offset = min_offset.get_bit_offset ();
   store_manager *smgr = mgr->get_store_manager ();
   const binding_key *max_element_key
 = binding_key::make (smgr, max_element, BK_direct);
-  gcc_assert (max_element_key->concrete_p ());
+  if (max_element_key->symbolic_p ())
+return false;
   const concrete_binding *max_element_ckey
 = max_element_key->dyn_cast_concrete_binding ();
   bit_size_t range_size_in_bits
@@ -478,7 +481,8 @@ binding_map::apply_ctor_val_to_range (const region 
*parent_reg,
   const concrete_binding *range_key
 = smgr->get_concrete_binding (start_bit_offset, range_size_in_bits,
  BK_direct);
-  gcc_assert (range_key->concrete_p ());
+  if (range_key->symbolic_p ())
+return false;
 
   /* Get the value.  */
   if (TREE_CODE (val) == CONSTRUCTOR)
diff --git a/gcc/testsuite/gfortran.dg/analyzer/pr96949.f90 
b/gcc/testsuite/gfortran.dg/analyzer/pr96949.f90
new file mode 100644
index 000..4af96bb9676
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/analyzer/pr96949.f90
@@ -0,0 +1,20 @@
+! { dg-do compile }
+! { dg-additional-options "-Wno-analyzer-too-complex --param 
analyzer-max-svalue-depth=0" }
+
+program n6
+  integer :: ck(2,2)
+  integer :: ac
+
+  data ck /4 * 1/
+
+  call x9()
+
+contains
+  subroutine x9()
+if (ck(2, 1) == 1) then
+   ac = 1
+else
+   ac = 0
+end if
+  end subroutine x9
+end program n6
-- 
2.26.2



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-08 Thread Qing Zhao via Gcc-patches



> On Sep 7, 2020, at 8:06 AM, Rodriguez Bahena, Victor wrote:
> 
>  
>  
> From: Qing Zhao <qing.z...@oracle.com>
> Date: Friday, September 4, 2020 at 9:19 AM
> To: "Rodriguez Bahena, Victor"  >, Kees Cook  >
> Cc: Segher Boessenkool, Jakub Jelinek, Uros Bizjak, GCC Patches
> Subject: Re: PING [Patch][Middle-end]Add 
> -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
>  
>  
> 
> 
>> On Sep 3, 2020, at 8:23 PM, Rodriguez Bahena, Victor wrote:
>>  
>> 
>> 
>> -Original Message-
>> From: Qing Zhao <qing.z...@oracle.com>
>> Date: Thursday, September 3, 2020 at 12:55 PM
>> To: Kees Cook <keesc...@chromium.org>
>> Cc: Segher Boessenkool, Jakub Jelinek, Uros Bizjak, "Rodriguez Bahena, Victor", 
>> GCC Patches <gcc-patches@gcc.gnu.org>
>> Subject: Re: PING [Patch][Middle-end]Add 
>> -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
>> 
>> 
>> 
>> 
>>> On Sep 3, 2020, at 12:13 PM, Kees Cook <keesc...@chromium.org> wrote:
>>> 
>>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
>>> 
 On average, all the options starting with “used_…”  (i.e, only the 
 registers that are used in the routine will be zeroed) have very low 
 runtime overheads, at most 1.72% for integer benchmarks, and 1.17% for FP 
 benchmarks. 
 If all the registers will be zeroed, the runtime overhead is bigger, 
 all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks 
 on average. 
 Looks like the overhead of zeroing vector registers is much bigger. 
 
 For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, 
 the runtime overhead with this is very small.
>>> 
>>> That looks great; thanks for doing those tests!
>>> 
>>> (And it seems like these benchmarks are kind of a "worst case" scenario
>>> with regard to performance, yes? As in it's mostly tight call loops?)
>> 
>>The top 3 benchmarks that have the most overhead from this option are: 
>> 531.deepsjeng_r, 541.leela_r, and 511.povray_r.
>>All of them are C++ benchmarks. 
>>I guess that the most important reason is  the smaller routine size in 
>> general (especially at the hot execution path or loops).
>>As a result, the overhead of these additional zeroing instructions in 
>> each routine will be relatively higher.  
>> 
>>Qing
>> 
>> I think that overhead is expected in benchmarks like 541.leela_r, according 
>> to 
>> https://urldefense.com/v3/__https://www.spec.org/cpu2017/Docs/benchmarks/541.leela_r.html__;!!GqivPVa7Brio!I4c2wyzrNGbeOTsX7BSD-4C9Cv3ypQ4N1qfRzSK__STxRGa5M4VarBKof2ak8-dT$
>>  
>> 
>>   is a benchmark for Artificial Intelligence (Monte Carlo simulation, game 
>> tree search & pattern recognition). The addition of fzero-call-used-regs 
>> will represent an overhead each time the functions are being call and in 
>> areas like game tree search is high. 
>> 
>> Qing, thanks a lot for the measurement, I am not sure if this is the limit 
>> of overhead the community is willing to accept by adding extra security (me 
>> as gcc user will be willing to accept). 
>  
> From the performance data, we can see that the runtime overhead of clearing 
> only_used registers is very reasonable, even for 541.leela_r, 
> 531.deepsjent_r, and 511.povray.   If try to clear all registers whatever 
> used or not in the current routine, the overhead will be increased 
> dramatically. 
>  
> So, my question is:
>  
> From the security point of view, does clearing ALL registers have more 
> benefit than clearing USED registers?  
> From my understanding, clearing registers that are not used in the current 
> routine does NOT provide additional benefit, correct me if I am wrong here.
>  
> You are right, it does not provide additional security

Then, is it necessary to provide 

-fzero-call-used-regs=all-arg|all-gpr|all   to the user?

Can we just delete these 3 sub options?


Qing


>  
>  
> Thanks.
>  
> Qing
>  
>  
>> 
>> Regards
>> 
>> Victor 
>> 
>> 
>> 
>>> 
>>> -- 
>>> Kees Cook
> 
> 
> 



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-08 Thread Qing Zhao via Gcc-patches


> On Sep 7, 2020, at 9:36 AM, Segher Boessenkool  
> wrote:
> 
> On Fri, Sep 04, 2020 at 02:00:41PM -0500, Qing Zhao wrote:
 However, if we only clear USED registers, the worst case is 1.72% on 
 average.  This overhead is very reasonable. 
>>> 
>>> No, that is the number I meant.  2% overhead is extremely much, unless
>>> this is magically super effective, and actually protects many things
>>> from exploitation (that aren't already protected some other way, SSP for
>>> example).
>> 
>> Then how about the 0.81% overhead on average for 
>> -fzero-call-used-regs=used_gpr_arg? 
> 
> That is still quite a lot.
> 
>> This option can be used to effectively mitigate ROP attack. 
> 
> Nice assertion.  Show it!

As I mentioned multiple times, one important piece of background for this patch 
is the following paper, published at the 2018 IEEE 29th International Conference 
on Application-specific Systems, Architectures and Processors (ASAP):

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming 
Attacks”

https://ieeexplore.ieee.org/document/8445132

Downloading this paper from IEEE requires a fee. I downloaded it through my 
company’s account; however, after consulting, it turned out that I am not 
allowed to further forward that copy to this alias. 

However, there is some more information about this paper available online:

https://www.semanticscholar.org/paper/Clean-the-Scratch-Registers:-A-Way-to-Mitigate-Rong-Xie/6f2ce4fd31baa0f6c02f9eb5c57b90d39fe5fa13

All the figures and tables in this paper are available at that link. 

Tables III, IV and V in the paper give the results of “zeroing scratch 
registers to mitigate ROP attacks”.  From those tables, zeroing scratch 
registers successfully mitigates ROP on all those benchmarks. 

What other information do you need to show the effectiveness of the ROP attack mitigation?

> 
>>> Yes.  Which is why I asked for numbers of both sides of the equation:
>>> how much it costs, vs. how much value it brings.--- Begin Message ---


> On Aug 25, 2020, at 9:05 AM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> 
> 
>> On Aug 25, 2020, at 1:41 AM, Uros Bizjak  wrote:
>> 
 
>> (The other side of the coin is how much this helps prevent exploitation;
>> numbers on that would be good to see, too.)
> 
> This can be well showed from the paper:
> 
> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented 
> Programming Attacks"
> 
> https://ieeexplore.ieee.org/document/8445132
> 
> Please take a look at this paper.
 
 As I told you before, that isn't open information, I cannot reply to
 any of that.
>>> 
>>> A little confused here: what do you mean by “open information”?  Is the 
>>> information in a published paper not open information?
>> 
>> No, because it is behind a paywall.
> 
> Still don’t understand here:  this paper has been published in the proceedings 
> of “2018 IEEE 29th International Conference on Application-specific Systems, 
> Architectures and Processors (ASAP)”.
> If you want to read the complete version online, you need to pay for it.
> 
> However, it’s still a published paper, and the information inside it should 
> be “open information”. 
> 
> So, what’s the definition of “open information” you have?
> 
> I downloaded a PDF copy of this paper through my company’s paid account.  But 
> I am not sure whether it’s legal for me to attach it to this mailing list?

After consulting, it turned out that I was not allowed to further forward the 
copy I downloaded through my company’s account to this alias. 
There is some more information about this paper available online though:

https://www.semanticscholar.org/paper/Clean-the-Scratch-Registers:-A-Way-to-Mitigate-Rong-Xie/6f2ce4fd31baa0f6c02f9eb5c57b90d39fe5fa13

All the figures and tables in this paper are available at this link. 

Figure 1 in the paper is an illustration of a typical ROP attack.  Please pay 
special attention to the “Gadgets”: carefully chosen machine instruction 
sequences that are already present in the machine's memory.  Each gadget 
typically ends in a return instruction and is located in a subroutine within 
the existing program and/or shared library code.  Chained together, these 
gadgets allow an attacker to perform arbitrary operations on a machine 
employing defenses that thwart simpler attacks.

The paper identifies the important features of a ROP attack as follows:

"First, the destination of using gadget chains in usual is performing system 
call or system function to perform malicious 

[committed] analyzer: fix ICE on RANGE_EXPR with CONSTRUCTOR value [PR96950]

2020-09-08 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to master as r11-3051-gaf656c401e97f9de2a8478f18278e8efb2a6cf23.

gcc/analyzer/ChangeLog:
PR analyzer/96950
* store.cc (binding_map::apply_ctor_to_region): Handle RANGE_EXPR
where min_index == max_index.
(binding_map::apply_ctor_val_to_range): Replace assertion that we
don't have a CONSTRUCTOR value with error-handling.
---
 gcc/analyzer/store.cc | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/gcc/analyzer/store.cc b/gcc/analyzer/store.cc
index 7f15aa92492..94bcbecce88 100644
--- a/gcc/analyzer/store.cc
+++ b/gcc/analyzer/store.cc
@@ -425,9 +425,18 @@ binding_map::apply_ctor_to_region (const region 
*parent_reg, tree ctor,
{
  tree min_index = TREE_OPERAND (index, 0);
  tree max_index = TREE_OPERAND (index, 1);
- if (!apply_ctor_val_to_range (parent_reg, mgr,
-   min_index, max_index, val))
-   return false;
+ if (min_index == max_index)
+   {
+ if (!apply_ctor_pair_to_child_region (parent_reg, mgr,
+   min_index, val))
+   return false;
+   }
+ else
+   {
+ if (!apply_ctor_val_to_range (parent_reg, mgr,
+   min_index, max_index, val))
+   return false;
+   }
  continue;
}
   if (!apply_ctor_pair_to_child_region (parent_reg, mgr, index, val))
@@ -472,7 +481,8 @@ binding_map::apply_ctor_val_to_range (const region 
*parent_reg,
   gcc_assert (range_key->concrete_p ());
 
   /* Get the value.  */
-  gcc_assert (TREE_CODE (val) != CONSTRUCTOR);
+  if (TREE_CODE (val) == CONSTRUCTOR)
+return false;
   const svalue *sval = get_svalue_for_ctor_val (val, mgr);
 
   /* Bind the value to the range.  */
-- 
2.26.2



[committed] analyzer: fix ICE on machine-specific builtins [PR96962]

2020-09-08 Thread David Malcolm via Gcc-patches
In g:ee7bfbe5eb70a23bbf3a2cedfdcbd2ea1a20c3f2 I added a
  switch (DECL_UNCHECKED_FUNCTION_CODE (callee_fndecl))
to region_model::on_call_pre guarded by
  fndecl_built_in_p (callee_fndecl).
I meant to handle only normal built-ins, whereas this
single-argument overload of fndecl_built_in_p returns true for any
kind of built-in.

PR analyzer/96962 reports a case where this matches for a
machine-specific builtin, leading to an ICE.  Fixed thusly.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to master as r11-3050-g47997a32e63b77ec88a7131a5d540f108c698661.

gcc/analyzer/ChangeLog:
PR analyzer/96962
* region-model.cc (region_model::on_call_pre): Fix guard on switch
on built-ins to only consider BUILT_IN_NORMAL, rather than other
kinds of built-ins.
---
 gcc/analyzer/region-model.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index a7bc48115ee..e6a9d3cacd8 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -653,7 +653,7 @@ region_model::on_call_pre (const gcall *call, 
region_model_context *ctxt)
 Having them split out into separate functions makes it easier
 to put breakpoints on the handling of specific functions.  */
 
-  if (fndecl_built_in_p (callee_fndecl)
+  if (fndecl_built_in_p (callee_fndecl, BUILT_IN_NORMAL)
  && gimple_builtin_call_types_compatible_p (call, callee_fndecl))
switch (DECL_UNCHECKED_FUNCTION_CODE (callee_fndecl))
  {
-- 
2.26.2



Re: [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes

2020-09-08 Thread Hans-Peter Nilsson
On Thu, 3 Sep 2020, Alexandre Oliva wrote:
> On Sep  3, 2020, Segher Boessenkool  wrote:
> > For instructions that inherently set a condition code register, the
> > @code{compare} operator is always written as the first RTL expression of
> > the @code{parallel} instruction pattern.
>
> Interesting.  I'm pretty sure I read email recently that suggested it
> was really up to the port, but I've caught up with GCC emails from years
> ago, so that might have been it.  Or I just misremember.  Whatever.

If you remember far enough back, you were right. :)

As I recall it, at one time, it was up to the port.  Then, some
time after the x86 port was decc0rated, and cc0 judged evil (but
before the infrastructure changes to seriously support
decc0ration), it became important and there was a discussion on
canonicalizing the order.  I remember arguing for the
flags-setting to be last in the parallel, consistent with the
clobber canonically being last and with the "important" part of
the insn coming first (possibly also having observed combine
ordering as you did), but that's not the way it turned out.
I have a faint memory about the order in the x86 patterns even
being used as an argument!

> Since there is a canonical order, maybe combine should attempt to follow
> that order.

> Anyway...  Does this still seem worth pursuing?

Are you referring to your non-flags-clobbering change, or to making
combine order flags-side-effect parallels canonically?

I don't have an opinion on the former, but IMHO, yes, getting
the combined order right seems worthwhile.  (No, I have no
targets that'd benefit from this.)

brgds, H-P


Re: [PATCH] tree-optimization/96043 - BB vectorization costing improvement

2020-09-08 Thread Richard Biener
On Tue, 8 Sep 2020, Richard Biener wrote:

> 
> This makes the BB vectorizer cost independent SLP subgraphs
> separately.  While on pristine trunk and for x86_64 I failed to
> distill a testcase where the vectorizer would think _any_
> basic-block vectorization opportunity is not profitable I do
> have pending work that would make the cost savings of a
> profitable opportunity make another independently not
> profitable opportunity vectorized.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> CCing some people to double-check my graph partitioning algorithm.

In fact I can amend gcc.dg/vect/costmodel/x86_64/costmodel-pr69297.c
to make it currently profitable to vectorize and after the patch
only vectorize the profitable part.  Consider the patch amended
with this.

This becomes more important once we merge Martin's patches to
consider the whole function at once and not stop at basic-block
boundaries for finding SLP opportunities.

Richard.

diff --git 
a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr69297.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr69297.c
index e65a30c06d6..ef74785f6a8 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr69297.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr69297.c
@@ -74,10 +74,28 @@ foo (int* diff)
 d[13] = m[12] - m[13];
 d[14] = m[14] + m[15];
 d[15] = m[15] - m[14];
+/* The following obviously profitable part should not make
+   the former unprofitable one profitable.  */
+diff[16 + 16] = diff[16];
+diff[17 + 16] = diff[17];
+diff[18 + 16] = diff[18];
+diff[19 + 16] = diff[19];
+diff[20 + 16] = diff[20];
+diff[21 + 16] = diff[21];
+diff[22 + 16] = diff[22];
+diff[23 + 16] = diff[23];
+diff[24 + 16] = diff[24];
+diff[25 + 16] = diff[25];
+diff[26 + 16] = diff[26];
+diff[27 + 16] = diff[27];
+diff[28 + 16] = diff[28];
+diff[29 + 16] = diff[29];
+diff[30 + 16] = diff[30];
+diff[31 + 16] = diff[31];
 for (k=0; k<16; k++)
   satd += abs(d[k]);
   return satd;
 }

 /* { dg-final { scan-tree-dump "vectorization is not profitable" "slp1" } 
} */
-/* { dg-final { scan-tree-dump-not "basic block vectorized" "slp1" } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing SLP tree" 1 "slp1" } } 
*/


> Richard.
> 
> 2020-09-08  Richard Biener  
> 
>   PR tree-optimization/96043
>   * tree-vectorizer.h (_slp_instance::cost_vec): New.
>   (_slp_instance::subgraph_entries): Likewise.
>   (BB_VINFO_TARGET_COST_DATA): Remove.
>   * tree-vect-slp.c (vect_free_slp_instance): Free
>   cost_vec and subgraph_entries.
>   (vect_analyze_slp_instance): Initialize them.
>   (vect_slp_analyze_operations): Defer passing costs to
>   the target, instead record them in the SLP graph entry.
>   (get_ultimate_leader): New helper for graph partitioning.
>   (vect_bb_partition_graph_r): Likewise.
>   (vect_bb_partition_graph): New function to partition the
>   SLP graph into independently costable parts.
>   (vect_bb_vectorization_profitable_p): Adjust to work on
>   a subgraph.
>   (vect_bb_vectorization_profitable_p): New wrapper,
>   discarding non-profitable vectorization of subgraphs.
>   (vect_slp_analyze_bb_1): Call vect_bb_partition_graph before
>   costing.
> ---
>  gcc/tree-vect-slp.c   | 190 ++
>  gcc/tree-vectorizer.h |   8 +-
>  2 files changed, 180 insertions(+), 18 deletions(-)
> 
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index 2b7fd685ef9..35e8985d159 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -126,6 +126,8 @@ vect_free_slp_instance (slp_instance instance, bool 
> final_p)
>  {
>vect_free_slp_tree (SLP_INSTANCE_TREE (instance), final_p);
>SLP_INSTANCE_LOADS (instance).release ();
> +  instance->subgraph_entries.release ();
> +  instance->cost_vec.release ();
>free (instance);
>  }
>  
> @@ -2269,6 +2271,8 @@ vect_analyze_slp_instance (vec_info *vinfo,
> SLP_INSTANCE_LOADS (new_instance) = vNULL;
> SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ? stmt_info : 
> NULL;
> new_instance->reduc_phis = NULL;
> +   new_instance->cost_vec = vNULL;
> +   new_instance->subgraph_entries = vNULL;
>  
> vect_gather_slp_loads (new_instance, node);
> if (dump_enabled_p ())
> @@ -3153,8 +3157,8 @@ vect_slp_analyze_operations (vec_info *vinfo)
>   visited.add (*x);
> i++;
>  
> -   add_stmt_costs (vinfo, vinfo->target_cost_data, _vec);
> -   cost_vec.release ();
> +   /* Remember the SLP graph entry cost for later.  */
> +   instance->cost_vec = cost_vec;
>   }
>  }
>  
> @@ -3162,18 +3166,113 @@ vect_slp_analyze_operations (vec_info *vinfo)
>if (bb_vec_info bb_vinfo = dyn_cast  (vinfo))
>  {
>hash_set svisited;
> -  stmt_vector_for_cost cost_vec;
> -  

RE: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2020-09-08 Thread Martin Jambor
Hi,

On Fri, Aug 21 2020, Tamar Christina wrote:
>> 
>> Honza's changes have been motivated to big extent as an enabler for IPA-CP
>> heuristics changes to actually speed up 548.exchange2_r.
>> 
>> On my AMD Zen2 machine, the run-time of exchange2 was 358 seconds two
>> weeks ago, this week it is 403, but with my WIP (and so far untested) patch
>> below it is just 276 seconds - faster than one built with GCC 8 which needs
>> 283 seconds.
>> 
>> I'll be interested in knowing if it also works this well on other 
>> architectures.
>> 

I have posted the new version of the patch series to the mailing list
yesterday and I have also pushed the branch to the FSF repo as
refs/users/jamborm/heads/ipa-context_and_exchange-200907

>
> Many thanks for working on this!
>
> I tried this on an AArch64 Neoverse-N1 machine and didn't see any difference.
> Do I need any flags for it to work? The patch was applied on top of 
> 656218ab982cc22b826227045826c92743143af1
>

I only have access to fairly old AMD (Seattle) Opteron 1100 which might
not support some interesting Aarch64 ISA extensions but I can measure a
significant speedup on it (everything with just -Ofast -march=native
-mtune=native, no non-default parameters, without LTO, without any
inlining options):

  GCC 10 branch:  915 seconds
  Master (rev. 995bb851ffe):  989 seconds
  My branch:  827 seconds

(All numbers are 548.exchange2_r reference run times.)

> And I tried 3 runs
> 1) -mcpu=native -Ofast -fomit-frame-pointer -flto --param 
> ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 
> -fno-inline-functions-called-once

This is the first time I saw -fno-inline-functions-called-once used in
this context.  This seems to indicate we are looking at another problem
that at least I have not known about yet.  Can you please upload
somewhere the inlining WPA dumps with and without the option?

Similarly, I do not need LTO for the speedup on x86_64.

The patches in the series should also remove the need for --param
ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 If you still need
them on my branch, could you please again provide me with (WPA, if with
LTO) ipa-cp dumps with and without them?


> 2) -mcpu=native -Ofast -fomit-frame-pointer -flto 
> -fno-inline-functions-called-once
> 3) -mcpu=native -Ofast -fomit-frame-pointer -flto
>
> First one used to give us the best result, with this patch there's no 
> difference between 1 and 2 (11% regression) and the 3rd one is about 15% on 
> top of that.

OK, so the patch did help (but above you wrote it did not?) but not
enough to be as fast as some previous revision and on top of that
-fno-inline-functions-called-once further helps but again not enough?

If correct, this looks like we need to examine what goes wrong
specifically in the case of Neoverse-N1 though.

Thanks,

Martin



[PATCH] arm: Add new vector mode macros

2020-09-08 Thread Richard Sandiford
[ This is related to Dennis's subtraction patch
  https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553339.html
  and the discussion about how the patterns were written.  I wanted
  to see whether there was a way that we could simplify the current
  addition handling that might perhaps make it easier to add other
  MVE operations in future.  It seemed like one of those situations
  in which the most productive thing would be to try it and see,
  rather than just describe it in words.

  One of the questions Ramana had in the thread above was: why does
  MVE not need the flag_unsafe_math_optimizations flag?  AIUI the reason
  is that MVE honours the FPSCR.FZ flag while SF Advanced SIMD always
  flushes to zero.  (HF Advanced SIMD honours FPSCR.FZ16 and so also
  doesn't need flag_unsafe_math_optimizations.) ]

The AArch32 port now has three vector extensions: iwMMXt, Neon
and MVE.  We already have some named expanders that are shared
by all three, and soon we'll need more.

One way of handling this would be to use define_mode_iterators
that specify the condition for each mode.  For example,

  (V16QI "TARGET_NEON || TARGET_HAVE_MVE")
  (V8QI "TARGET_NEON || TARGET_REALLY_IWMXXT")
  ...
  (V2SF "TARGET_NEON && flag_unsafe_math_optimizations")

etc.  However, we'll need several mode iterators, and it would
be repetitive to specify the mode condition every time.

This patch therefore introduces per-mode macros that say whether
we can perform general arithmetic on the mode.  Initially there are
two sets of macros:

ARM_HAVE_NEON_<MODE>_ARITH
  true if Neon can handle general arithmetic on <MODE>

ARM_HAVE_<MODE>_ARITH
  true if any vector extension can handle general arithmetic on <MODE>

The macro definitions themselves are undeniably ugly, but hopefully
they're justified by the simplifications they allow.

The patch converts the addition patterns to use this scheme.

Previously there were three copies of the V8HF and V4HF addition
patterns for Neon:

(1) *add<mode>3_neon, which provided plus:VnHF even without
TARGET_NEON_FP16INST.  This was probably harmless since all the
named patterns had an appropriate guard, but it is possible that
something could have tried to generate the plus directly, such as
by using a REG_EQUAL note to generate a new pattern.

(2) addv8hf3_neon and addv4hf3, which had the correct
TARGET_NEON_FP16INST target condition, but unnecessarily required
flag_unsafe_math_optimizations.  Unlike VnSF operations, VnHF
operations do not force flush to zero.

(3) add<mode>3_fp16, which provided plus:VnHF with the
correct conditions (TARGET_NEON_FP16INST, with no
flag_unsafe_math_optimizations test).

The patch in essence renames add<mode>3_fp16 to *add<mode>3_neon
(part of *add<mode>3_neon) and removes the other two patterns.

WDYT?  Does this look like a way forward?

Tested on arm-linux-gnueabihf and armeb-eabi.

Thanks,
Richard


gcc/
* config/arm/arm.h (ARM_HAVE_NEON_V8QI_ARITH, ARM_HAVE_NEON_V4HI_ARITH)
(ARM_HAVE_NEON_V2SI_ARITH, ARM_HAVE_NEON_V16QI_ARITH): New macros.
(ARM_HAVE_NEON_V8HI_ARITH, ARM_HAVE_NEON_V4SI_ARITH): Likewise.
(ARM_HAVE_NEON_V2DI_ARITH, ARM_HAVE_NEON_V4HF_ARITH): Likewise.
(ARM_HAVE_NEON_V8HF_ARITH, ARM_HAVE_NEON_V2SF_ARITH): Likewise.
(ARM_HAVE_NEON_V4SF_ARITH, ARM_HAVE_V8QI_ARITH, ARM_HAVE_V4HI_ARITH)
(ARM_HAVE_V2SI_ARITH, ARM_HAVE_V16QI_ARITH, ARM_HAVE_V8HI_ARITH)
(ARM_HAVE_V4SI_ARITH, ARM_HAVE_V2DI_ARITH, ARM_HAVE_V4HF_ARITH)
(ARM_HAVE_V2SF_ARITH, ARM_HAVE_V8HF_ARITH, ARM_HAVE_V4SF_ARITH):
Likewise.
* config/arm/iterators.md (VNIM, VNINOTM): Delete.
* config/arm/vec-common.md (add<mode>3, addv8hf3)
(add<mode>3): Replace with...
(add<mode>3): ...this new expander.
* config/arm/neon.md (*add<mode>3_neon): Use the new
ARM_HAVE_NEON_<MODE>_ARITH macros as the C condition.
(addv8hf3_neon, addv4hf3, add<mode>3_fp16): Delete in
favor of the above.
(neon_vadd<mode>): Use gen_add<mode>3 instead of
gen_add<mode>3_fp16.
---
 gcc/config/arm/arm.h | 41 +++
 gcc/config/arm/iterators.md  |  8 --
 gcc/config/arm/neon.md   | 47 ++--
 gcc/config/arm/vec-common.md | 42 
 4 files changed, 48 insertions(+), 90 deletions(-)

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 3887c51eebe..3284ae29d7c 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1106,6 +1106,47 @@ extern const int arm_arch_cde_coproc_bits[];
 #define VALID_MVE_STRUCT_MODE(MODE) \
   ((MODE) == TImode || (MODE) == OImode || (MODE) == XImode)
 
+/* The conditions under which vector modes are supported for general
+   arithmetic using Neon.  */
+
+#define ARM_HAVE_NEON_V8QI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V4HI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V2SI_ARITH TARGET_NEON
+
+#define ARM_HAVE_NEON_V16QI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V8HI_ARITH TARGET_NEON
+#define 

Re: Patch for 96948

2020-09-08 Thread Kirill Müller via Gcc-patches

Hi


Thanks for the heads up. The coincidence is funny -- a file that hasn't 
been touched for years.


I do believe that we need the logic around the `first` flag for 
consistency with the other unwind-*.c implementations. This can be 
verified by running the tests in the included libbacktrace library (e.g. 
via make check), in particular btest.c.  See 
https://github.com/ianlancetaylor/libbacktrace/issues/43 for the 
downstream ticket.



Best regards

Kirill


On 08.09.20 14:08, Martin Storsjö wrote:

Hi,

On Mon, 7 Sep 2020, Kirill Müller via Gcc-patches wrote:

As requested, attaching a patch for 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96948. This solves a 
problem with _Unwind_Backtrace() on mingw64 + SEH.


What a coincidence - I actually sent a patch for the exact same thing 
last week, see 
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553082.html.


My version doesn't set gcc_context.cfa though, but is simpler by 
avoiding the whole "first" flag logic.


I can send an updated patch that also sets gcc_context.cfa in a similar 
manner to the previous one.


// Martin


Re: [PATCH] c++: Fix resolving the address of overloaded pmf [PR96647]

2020-09-08 Thread Patrick Palka via Gcc-patches
On Mon, 31 Aug 2020, Jason Merrill wrote:

> On 8/28/20 12:45 PM, Patrick Palka wrote:
> > (Removing libstd...@gcc.gnu.org from CC list)
> > 
> > On Fri, 28 Aug 2020, Patrick Palka wrote:
> > > In resolve_address_of_overloaded_function, currently only the second
> > > pass over the overload set (which considers just the function templates
> > > in the overload set) checks constraints and performs return type
> > > deduction when necessary.  But as the testcases below show, we need to
> > > do this when considering non-template functions during the first pass,
> > > too.
> > > 
> > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   PR c++/96647
> > >   * class.c (resolve_address_of_overloaded_function): Also check
> > >   constraints and perform return type deduction when considering
> > >   non-template functions in the overload set.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   PR c++/96647
> > >   * g++.dg/cpp0x/auto-96647.C: New test.
> > >   * g++.dg/cpp2a/concepts-fn6.C: New test.
> > > ---
> > >   gcc/cp/class.c| 16 
> > >   gcc/testsuite/g++.dg/cpp0x/auto-96647.C   | 10 ++
> > >   gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C | 10 ++
> > >   3 files changed, 36 insertions(+)
> > >   create mode 100644 gcc/testsuite/g++.dg/cpp0x/auto-96647.C
> > >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C
> 
> > > + if (undeduced_auto_decl (fn))
> > > +   {
> > > + /* Force instantiation to do return type deduction.  */
> > > + ++function_depth;
> > > + instantiate_decl (fn, /*defer*/false, /*class*/false);
> > > + --function_depth;
> 
> How about maybe_instantiate_decl instead of this hunk?  This looks like it
> could call instantiate_decl for a non-template function, which is wrong.

Good point.  We even ICE on the testcase error9.C below when using
instantiate_decl here, since we indeed end up calling it for the
non-specialization f(bool).

Does the following look OK?

-- >8 --

Subject: [PATCH] c++: Fix resolving the address of overloaded pmf [PR96647]

In resolve_address_of_overloaded_function, currently only the second
pass over the overload set (which considers just the function templates
in the overload set) checks constraints and performs return type
deduction when necessary.  But as the testcases below show, we need to
do the same when considering non-template functions during the first
pass.

gcc/cp/ChangeLog:

PR c++/96647
* class.c (resolve_address_of_overloaded_function): Check
constraints_satisfied_p and perform return-type deduction via
maybe_instantiate_decl when considering non-template functions
in the overload set.
* cp-tree.h (maybe_instantiate_decl): Declare.
* decl2.c (maybe_instantiate_decl): Remove static.

gcc/testsuite/ChangeLog:

PR c++/96647
* g++.dg/cpp0x/auto-96647.C: New test.
* g++.dg/cpp0x/error9.C: New test.
* g++.dg/cpp2a/concepts-fn6.C: New test.
---
 gcc/cp/class.c| 13 +
 gcc/cp/cp-tree.h  |  1 +
 gcc/cp/decl2.c|  3 +--
 gcc/testsuite/g++.dg/cpp0x/auto-96647.C   | 10 ++
 gcc/testsuite/g++.dg/cpp0x/error9.C   |  6 ++
 gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C | 10 ++
 6 files changed, 41 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/auto-96647.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/error9.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 3479b8207d2..c9a1f753d56 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -8286,6 +8286,19 @@ resolve_address_of_overloaded_function (tree target_type,
 one, or vice versa.  */
  continue;
 
+   /* Constraints must be satisfied. This is done before
+  return type deduction since that instantiates the
+  function. */
+   if (!constraints_satisfied_p (fn))
+ continue;
+
+   if (undeduced_auto_decl (fn))
+ {
+   /* Force instantiation to do return type deduction.  */
+   maybe_instantiate_decl (fn);
+   require_deduced_type (fn);
+ }
+
/* In C++17 we need the noexcept-qualifier to compare types.  */
if (flag_noexcept_type
&& !maybe_instantiate_noexcept (fn, complain))
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 708de83eb46..78739411755 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6905,6 +6905,7 @@ extern void do_type_instantiation (tree, tree, 
tsubst_flags_t);
 extern bool always_instantiate_p   (tree);
 extern bool maybe_instantiate_noexcept (tree, tsubst_flags_t = 
tf_warning_or_error);
 extern tree instantiate_decl   (tree, bool, bool);
+extern void maybe_instantiate_decl (tree);
 

[PATCH] tree-optimization/96043 - BB vectorization costing improvement

2020-09-08 Thread Richard Biener


This makes the BB vectorizer cost independent SLP subgraphs
separately.  While on pristine trunk and for x86_64 I failed to
distill a testcase where the vectorizer would think _any_
basic-block vectorization opportunity is not profitable I do
have pending work that would make the cost savings of a
profitable opportunity make another independently not
profitable opportunity vectorized.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

CCing some people to double-check my graph partitioning algorithm.

Richard.

2020-09-08  Richard Biener  

PR tree-optimization/96043
* tree-vectorizer.h (_slp_instance::cost_vec): New.
(_slp_instance::subgraph_entries): Likewise.
(BB_VINFO_TARGET_COST_DATA): Remove.
* tree-vect-slp.c (vect_free_slp_instance): Free
cost_vec and subgraph_entries.
(vect_analyze_slp_instance): Initialize them.
(vect_slp_analyze_operations): Defer passing costs to
the target, instead record them in the SLP graph entry.
(get_ultimate_leader): New helper for graph partitioning.
(vect_bb_partition_graph_r): Likewise.
(vect_bb_partition_graph): New function to partition the
SLP graph into independently costable parts.
(vect_bb_vectorization_profitable_p): Adjust to work on
a subgraph.
(vect_bb_vectorization_profitable_p): New wrapper,
discarding non-profitable vectorization of subgraphs.
(vect_slp_analyze_bb_1): Call vect_bb_partition_graph before
costing.
---
 gcc/tree-vect-slp.c   | 190 ++
 gcc/tree-vectorizer.h |   8 +-
 2 files changed, 180 insertions(+), 18 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 2b7fd685ef9..35e8985d159 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -126,6 +126,8 @@ vect_free_slp_instance (slp_instance instance, bool final_p)
 {
   vect_free_slp_tree (SLP_INSTANCE_TREE (instance), final_p);
   SLP_INSTANCE_LOADS (instance).release ();
+  instance->subgraph_entries.release ();
+  instance->cost_vec.release ();
   free (instance);
 }
 
@@ -2269,6 +2271,8 @@ vect_analyze_slp_instance (vec_info *vinfo,
  SLP_INSTANCE_LOADS (new_instance) = vNULL;
  SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ? stmt_info : 
NULL;
  new_instance->reduc_phis = NULL;
+ new_instance->cost_vec = vNULL;
+ new_instance->subgraph_entries = vNULL;
 
  vect_gather_slp_loads (new_instance, node);
  if (dump_enabled_p ())
@@ -3153,8 +3157,8 @@ vect_slp_analyze_operations (vec_info *vinfo)
visited.add (*x);
  i++;
 
- add_stmt_costs (vinfo, vinfo->target_cost_data, _vec);
- cost_vec.release ();
+ /* Remember the SLP graph entry cost for later.  */
+ instance->cost_vec = cost_vec;
}
 }
 
@@ -3162,18 +3166,113 @@ vect_slp_analyze_operations (vec_info *vinfo)
   if (bb_vec_info bb_vinfo = dyn_cast  (vinfo))
 {
   hash_set svisited;
-  stmt_vector_for_cost cost_vec;
-  cost_vec.create (2);
   for (i = 0; vinfo->slp_instances.iterate (i, ); ++i)
vect_bb_slp_mark_live_stmts (bb_vinfo, SLP_INSTANCE_TREE (instance),
-instance, _vec, svisited);
-  add_stmt_costs (vinfo, vinfo->target_cost_data, _vec);
-  cost_vec.release ();
+instance, >cost_vec, svisited);
 }
 
   return !vinfo->slp_instances.is_empty ();
 }
 
+/* Get the SLP instance leader from INSTANCE_LEADER thereby transitively
+   closing the eventual chain.  */
+
+static slp_instance
+get_ultimate_leader (slp_instance instance,
+hash_map _leader)
+{
+  auto_vec chain;
+  slp_instance *tem;
+  while (*(tem = instance_leader.get (instance)) != instance)
+{
+  chain.safe_push (tem);
+  instance = *tem;
+}
+  while (!chain.is_empty ())
+*chain.pop () = instance;
+  return instance;
+}
+
+/* Worker of vect_bb_partition_graph, recurse on NODE.  */
+
+static void
+vect_bb_partition_graph_r (bb_vec_info bb_vinfo,
+  slp_instance instance, slp_tree node,
+  hash_map 
_to_instance,
+  hash_map 
_leader)
+{
+  stmt_vec_info stmt_info;
+  unsigned i;
+  bool all = true;
+  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
+{
+  bool existed_p;
+  slp_instance _instance
+   = stmt_to_instance.get_or_insert (stmt_info, _p);
+  if (!existed_p)
+   {
+ all = false;
+   }
+  else if (stmt_instance != instance)
+   {
+ /* If we're running into a previously marked stmt make us the
+leader of the current ultimate leader.  This keeps the
+leader chain acyclic and works even when the current instance
+connects two previously independent graph parts.  */
+ stmt_instance = 

[PATCH v2] --enable-link-serialization support

2020-09-08 Thread Jakub Jelinek via Gcc-patches
On Mon, Sep 07, 2020 at 05:58:04PM -0400, Jason Merrill via Gcc-patches wrote:
> On Thu, Sep 3, 2020 at 10:49 AM Jakub Jelinek via Gcc-patches
>  wrote:
> >
> > On Thu, Sep 03, 2020 at 03:53:35PM +0200, Richard Biener wrote:
> > > On Thu, 3 Sep 2020, Jakub Jelinek wrote:
> > > But is that an issue in practice?  I usually do not do make -j32 cc1plus
> > > in a tree that was configured for bootstrap, nor do I use
> > > --enable-link-serialization in that case.
> >
> > Guess most often true, but one could still do it when debugging some
> > particular problem during bootstrap.

Ok, based on your and Jason's comment, I've moved it back to gcc/ configure,
which means that even inside of gcc/ make all (as well as e.g. make lto-dump)
will serialize and build all previous large binaries when configured this
way.
One can always make -j32 cc1 DO_LINK_SERIALIZATION=
to avoid that.
Furthermore, I've implemented the idea I wrote about, so that
--enable-link-serialization
is the same as
--enable-link-serialization=1
and means the large link commands are serialized; one can as before use (the
default)
--disable-link-serialization
which allows all links to run in parallel, but one can now also use
--enable-link-serialization=3
etc. which says that at most 3 of the large link commands can run
concurrently.
And finally I've implemented (only if the serialization is enabled) simple
progress bars for the linking.
With --enable-link-serialization and e.g. the 5 large links I have in my
current tree (cc1, cc1plus, f951, lto1 and lto-dump), before the linking it
prints
Linking |==--      | 20%
and after it
Linking |====      | 40%
(each 2 = characters stand for an already finished link, each 2 -
characters stand for the link being started).
With --enable-link-serialization=3 it will change the way the start is
printed, one will get:
Linking |--        | 0%
at the start of the cc1 link,
Linking |>>--      | 0%
at the start of the second large link and
Linking |>>>>--    | 0%
at the start of the third large link, where each 2 > characters stand for
an already pending link.  The printing at the end of each link command is
the same as with the full serialization, i.e. for the above 3:
Linking |==        | 20%
Linking |====      | 40%
Linking |======    | 60%
but one could actually get them in any order depending on which of those 3
finishes first - to get it 100% accurate I'd need to add some directory with
files representing finished links or similar, doesn't seem worth it.
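The progress-bar format just described can be modeled in a few lines. This is an illustrative sketch only — the real mechanism is shell code in the generated Makefile rules, and `linking_line` is an assumed name, not anything in the patch:

```cpp
#include <cassert>
#include <string>

// Model of the "Linking |...| NN%" line: each finished link prints two
// '=' characters, each already pending link two '>' characters, and the
// link being started two '-' characters; the bar is padded to two
// columns per large link, and the percentage counts finished links.
std::string linking_line (int total, int finished, int pending,
                          bool starting)
{
  int percent = finished * 100 / total;
  std::string bar (2 * finished, '=');
  bar += std::string (2 * pending, '>');
  if (starting)
    bar += "--";
  bar.resize (2 * total, ' ');          // pad to fixed bar width
  return "Linking |" + bar + "| " + std::to_string (percent) + "%";
}
```

With 5 large links, starting the second link after the first has finished yields `Linking |==--      | 20%`, matching the output quoted above.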

2020-09-08  Jakub Jelinek  

gcc/
* configure.ac: Add $lang.prev rules, INDEX.$lang and SERIAL_LIST and
SERIAL_COUNT variables to Make-hooks.
(--enable-link-serialization): New configure option.
* Makefile.in (DO_LINK_SERIALIZATION, LINK_PROGRESS): New variables.
* doc/install.texi (--enable-link-serialization): Document.
* configure: Regenerated.
gcc/c/
* Make-lang.in (c.serial): New goal.
(.PHONY): Add c.serial c.prev.
(cc1$(exeext)): Call LINK_PROGRESS.
gcc/cp/
* Make-lang.in (c++.serial): New goal.
(.PHONY): Add c++.serial c++.prev.
(cc1plus$(exeext)): Depend on c++.prev.  Call LINK_PROGRESS.
gcc/fortran/
* Make-lang.in (fortran.serial): New goal.
(.PHONY): Add fortran.serial fortran.prev.
(f951$(exeext)): Depend on fortran.prev.  Call LINK_PROGRESS.
gcc/lto/
* Make-lang.in (lto, lto1.serial, lto2.serial): New goals.
(.PHONY): Add lto lto1.serial lto1.prev lto2.serial lto2.prev.
(lto.all.cross, lto.start.encap): Remove dependencies.
($(LTO_EXE)): Depend on lto1.prev.  Call LINK_PROGRESS.
($(LTO_DUMP_EXE)): Depend on lto2.prev.  Call LINK_PROGRESS.
gcc/objc/
* Make-lang.in (objc.serial): New goal.
(.PHONY): Add objc.serial objc.prev.
(cc1obj$(exeext)): Depend on objc.prev.  Call LINK_PROGRESS.
gcc/objcp/
* Make-lang.in (obj-c++.serial): New goal.
(.PHONY): Add obj-c++.serial obj-c++.prev.
(cc1objplus$(exeext)): Depend on obj-c++.prev.  Call LINK_PROGRESS.
gcc/ada/
* gcc-interface/Make-lang.in (ada.serial): New goal.
(.PHONY): Add ada.serial ada.prev.
(gnat1$(exeext)): Depend on ada.prev.  Call LINK_PROGRESS.
gcc/brig/
* Make-lang.in (brig.serial): New goal.
(.PHONY): Add brig.serial brig.prev.
(brig1$(exeext)): Depend on brig.prev.  Call LINK_PROGRESS.
gcc/go/
* Make-lang.in (go.serial): New goal.
(.PHONY): Add go.serial go.prev.
(go1$(exeext)): Depend on go.prev.  Call LINK_PROGRESS.
gcc/jit/
* Make-lang.in (jit.serial): New goal.
(.PHONY): Add jit.serial jit.prev.
($(LIBGCCJIT_FILENAME)): Depend on jit.prev.  Call LINK_PROGRESS.
gcc/d/
* Make-lang.in (d.serial): New goal.
(.PHONY): Add d.serial d.prev.
(d21$(exeext)): Depend on d.prev.  Call LINK_PROGRESS.

--- gcc/configure.ac.jj 2020-09-08 12:24:39.199542406 +0200
+++ 

Re: [PATCH] code generate live lanes in basic-block vectorization

2020-09-08 Thread Christophe Lyon via Gcc-patches
On Tue, 8 Sep 2020 at 14:15, Richard Biener  wrote:
>
> On Tue, 8 Sep 2020, Christophe Lyon wrote:
>
> > Hi Richard,
> >
> > On Fri, 4 Sep 2020 at 15:42, Richard Biener  wrote:
> > >
> > > The following adds the capability to code-generate live lanes in
> > > basic-block vectorization using lane extracts from vector stmts
> > > rather than keeping the original scalar code around for those.
> > > This eventually makes previously not profitable vectorizations
> > > profitable (the live scalar code was appropriately costed so
> > > are the lane extracts now), without considering the cost model
> > > this patch doesn't add or remove any basic-block vectorization
> > > capabilities.
> > >
> > > The patch re/ab-uses STMT_VINFO_LIVE_P in basic-block vectorization
> > > mode to tell whether a live lane is vectorized or whether it is
> > > provided by means of keeping the scalar code live.
> > >
> > > The patch is a first step towards vectorizing sequences of
> > > stmts that do not end up in stores or vector constructors though.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > >
> > > Any comments?
> > >
> > Yes: this is causing an ICE on arm:
>
> As usual this isn't enough for me to reproduce with a cross.
> Can you open a bugreport with the cc1 command & the configury pasted?
>

Sure, I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96978

> Thanks,
> Richard.
>
> > FAIL: gcc.dg/vect/bb-slp-pr92596.c (internal compiler error)
> > FAIL: gcc.dg/vect/bb-slp-pr92596.c (test for excess errors)
> > Excess errors:
> > during GIMPLE pass: slp
> > dump file: bb-slp-pr92596.c.173t.slp2
> > /gcc/testsuite/gcc.dg/vect/bb-slp-pr92596.c:11:6: internal compiler
> > error: in vect_transform_stmt, at tree-vect-stmts.c:10870
> > 0xfa16cc vect_transform_stmt(vec_info*, _stmt_vec_info*,
> > gimple_stmt_iterator*, _slp_tree*, _slp_instance*)
> > /gcc/tree-vect-stmts.c:10870
> > 0xfd6954 vect_schedule_slp_instance
> > /gcc/tree-vect-slp.c:4570
> > 0xfd684f vect_schedule_slp_instance
> > /gcc/tree-vect-slp.c:4436
> > 0xfd684f vect_schedule_slp_instance
> > /gcc/tree-vect-slp.c:4436
> > 0xfdeace vect_schedule_slp(vec_info*)
> > /gcc/tree-vect-slp.c:4695
> > 0xfe2529 vect_slp_region
> > /gcc/tree-vect-slp.c:3529
> > 0xfe33d7 vect_slp_bb(basic_block_def*)
> > /gcc/tree-vect-slp.c:3647
> > 0xfe503c execute
> > /gcc/tree-vectorizer.c:1429
> >
> > Christophe
> >
> > > Thanks,
> > > Richard.
> > >
> > > 2020-09-04  Richard Biener  
> > >
> > > * tree-vectorizer.h (vectorizable_live_operation): Adjust.
> > > * tree-vect-loop.c (vectorizable_live_operation): Vectorize
> > > live lanes out of basic-block vectorization nodes.
> > > * tree-vect-slp.c (vect_bb_slp_mark_live_stmts): New function.
> > > (vect_slp_analyze_operations): Analyze live lanes and their
> > > vectorization possibility after the whole SLP graph is final.
> > > (vect_bb_slp_scalar_cost): Adjust for vectorized live lanes.
> > > * tree-vect-stmts.c (can_vectorize_live_stmts): Adjust.
> > > (vect_transform_stmt): Call can_vectorize_live_stmts also for
> > > basic-block vectorization.
> > >
> > > * gcc.dg/vect/bb-slp-46.c: New testcase.
> > > * gcc.dg/vect/bb-slp-47.c: Likewise.
> > > * gcc.dg/vect/bb-slp-32.c: Adjust.
> > > ---
> > >  gcc/testsuite/gcc.dg/vect/bb-slp-32.c |   7 +-
> > >  gcc/testsuite/gcc.dg/vect/bb-slp-46.c |  28 +++
> > >  gcc/testsuite/gcc.dg/vect/bb-slp-47.c |  14 ++
> > >  gcc/tree-vect-loop.c  | 243 --
> > >  gcc/tree-vect-slp.c   | 145 +--
> > >  gcc/tree-vect-stmts.c |  12 +-
> > >  gcc/tree-vectorizer.h |   2 +-
> > >  7 files changed, 332 insertions(+), 119 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-46.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-47.c
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-32.c 
> > > b/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
> > > index 41bbf352156..020b6365e02 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
> > > @@ -7,16 +7,21 @@ int foo (int *p, int a, int b)
> > >  {
> > >int x[4];
> > >int tem0, tem1, tem2, tem3;
> > > +  int sum = 0;
> > >tem0 = p[0] + 1 + a;
> > > +  sum += tem0;
> > >x[0] = tem0;
> > >tem1 = p[1] + 2 + b;
> > > +  sum += tem1;
> > >x[1] = tem1;
> > >tem2 = p[2] + 3 + b;
> > > +  sum += tem2;
> > >x[2] = tem2;
> > >tem3 = p[3] + 4 + a;
> > > +  sum += tem3;
> > >x[3] = tem3;
> > >bar (x);
> > > -  return tem0 + tem1 + tem2 + tem3;
> > > +  return sum;
> > >  }
> > >
> > >  /* { dg-final { scan-tree-dump "vectorization is not profitable" "slp2" 
> > > { xfail  { vect_no_align && { ! vect_hw_misalign } } } } } */
> > > diff --git 

[PATCH v2] libgcc: Expose the instruction pointer and stack pointer in SEH _Unwind_Backtrace

2020-09-08 Thread Martin Storsjö
Previously, the SEH version of _Unwind_Backtrace did unwind
the stack and call the provided callback function as intended,
but there was little the caller could do within the callback to
actually get any info about that particular level in the unwind.

Set the ra and cfa pointers, which are used by _Unwind_GetIP
and _Unwind_GetCFA, to allow using these functions from the
callback to inspect the state at each stack frame.

2020-09-08  Martin Storsjö  

libgcc/Changelog:
* unwind-seh.c (_Unwind_Backtrace): Set the ra and cfa pointers
before calling the callback.
---
 libgcc/unwind-seh.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libgcc/unwind-seh.c b/libgcc/unwind-seh.c
index 1a70180cfaa..275d782903a 100644
--- a/libgcc/unwind-seh.c
+++ b/libgcc/unwind-seh.c
@@ -466,6 +466,11 @@ _Unwind_Backtrace(_Unwind_Trace_Fn trace,
&gcc_context.disp->HandlerData,
&gcc_context.disp->EstablisherFrame, NULL);
 
+  /* Set values that the callback can inspect via _Unwind_GetIP
+   * and _Unwind_GetCFA. */
+  gcc_context.ra = ms_context.Rip;
+  gcc_context.cfa = ms_context.Rsp;
+
   /* Call trace function.  */
   if (trace (&gcc_context, trace_argument) != _URC_NO_REASON)
return _URC_FATAL_PHASE1_ERROR;
-- 
2.17.1
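For reference, this is the kind of caller the patch enables: a backtrace callback that reads the instruction pointer and CFA at every frame. A minimal, hedged sketch (hypothetical example, not part of the patch; it uses only the public _Unwind_Backtrace / _Unwind_GetIP / _Unwind_GetCFA API, which behaves the same on non-SEH targets):

```cpp
#include <cassert>
#include <unwind.h>

// Callback invoked once per frame; with this patch applied, the ra/cfa
// fields are populated on SEH targets, so _Unwind_GetIP and
// _Unwind_GetCFA return meaningful values here.
static _Unwind_Reason_Code
trace_fn (struct _Unwind_Context *ctx, void *arg)
{
  int *frames = static_cast<int *> (arg);
  if (_Unwind_GetIP (ctx) != 0)   // a valid return address at this frame
    ++*frames;
  return _URC_NO_REASON;
}

// Walk the current call stack and count the frames seen.
int count_frames (void)
{
  int frames = 0;
  _Unwind_Backtrace (trace_fn, &frames);
  return frames;
}
```

Before the patch, an SEH build would still walk the stack, but _Unwind_GetIP inside the callback had nothing useful to return.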



Re: [PATCH] code generate live lanes in basic-block vectorization

2020-09-08 Thread Richard Biener
On Tue, 8 Sep 2020, Christophe Lyon wrote:

> Hi Richard,
> 
> On Fri, 4 Sep 2020 at 15:42, Richard Biener  wrote:
> >
> > The following adds the capability to code-generate live lanes in
> > basic-block vectorization using lane extracts from vector stmts
> > rather than keeping the original scalar code around for those.
> > This eventually makes previously not profitable vectorizations
> > profitable (the live scalar code was appropriately costed so
> > are the lane extracts now), without considering the cost model
> > this patch doesn't add or remove any basic-block vectorization
> > capabilities.
> >
> > The patch re/ab-uses STMT_VINFO_LIVE_P in basic-block vectorization
> > mode to tell whether a live lane is vectorized or whether it is
> > provided by means of keeping the scalar code live.
> >
> > The patch is a first step towards vectorizing sequences of
> > stmts that do not end up in stores or vector constructors though.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > Any comments?
> >
> Yes: this is causing an ICE on arm:

As usual this isn't enough for me to reproduce with a cross.
Can you open a bugreport with the cc1 command & the configury pasted?

Thanks,
Richard.

> FAIL: gcc.dg/vect/bb-slp-pr92596.c (internal compiler error)
> FAIL: gcc.dg/vect/bb-slp-pr92596.c (test for excess errors)
> Excess errors:
> during GIMPLE pass: slp
> dump file: bb-slp-pr92596.c.173t.slp2
> /gcc/testsuite/gcc.dg/vect/bb-slp-pr92596.c:11:6: internal compiler
> error: in vect_transform_stmt, at tree-vect-stmts.c:10870
> 0xfa16cc vect_transform_stmt(vec_info*, _stmt_vec_info*,
> gimple_stmt_iterator*, _slp_tree*, _slp_instance*)
> /gcc/tree-vect-stmts.c:10870
> 0xfd6954 vect_schedule_slp_instance
> /gcc/tree-vect-slp.c:4570
> 0xfd684f vect_schedule_slp_instance
> /gcc/tree-vect-slp.c:4436
> 0xfd684f vect_schedule_slp_instance
> /gcc/tree-vect-slp.c:4436
> 0xfdeace vect_schedule_slp(vec_info*)
> /gcc/tree-vect-slp.c:4695
> 0xfe2529 vect_slp_region
> /gcc/tree-vect-slp.c:3529
> 0xfe33d7 vect_slp_bb(basic_block_def*)
> /gcc/tree-vect-slp.c:3647
> 0xfe503c execute
> /gcc/tree-vectorizer.c:1429
> 
> Christophe
> 
> > Thanks,
> > Richard.
> >
> > 2020-09-04  Richard Biener  
> >
> > * tree-vectorizer.h (vectorizable_live_operation): Adjust.
> > * tree-vect-loop.c (vectorizable_live_operation): Vectorize
> > live lanes out of basic-block vectorization nodes.
> > * tree-vect-slp.c (vect_bb_slp_mark_live_stmts): New function.
> > (vect_slp_analyze_operations): Analyze live lanes and their
> > vectorization possibility after the whole SLP graph is final.
> > (vect_bb_slp_scalar_cost): Adjust for vectorized live lanes.
> > * tree-vect-stmts.c (can_vectorize_live_stmts): Adjust.
> > (vect_transform_stmt): Call can_vectorize_live_stmts also for
> > basic-block vectorization.
> >
> > * gcc.dg/vect/bb-slp-46.c: New testcase.
> > * gcc.dg/vect/bb-slp-47.c: Likewise.
> > * gcc.dg/vect/bb-slp-32.c: Adjust.
> > ---
> >  gcc/testsuite/gcc.dg/vect/bb-slp-32.c |   7 +-
> >  gcc/testsuite/gcc.dg/vect/bb-slp-46.c |  28 +++
> >  gcc/testsuite/gcc.dg/vect/bb-slp-47.c |  14 ++
> >  gcc/tree-vect-loop.c  | 243 --
> >  gcc/tree-vect-slp.c   | 145 +--
> >  gcc/tree-vect-stmts.c |  12 +-
> >  gcc/tree-vectorizer.h |   2 +-
> >  7 files changed, 332 insertions(+), 119 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-46.c
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-47.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-32.c 
> > b/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
> > index 41bbf352156..020b6365e02 100644
> > --- a/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
> > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
> > @@ -7,16 +7,21 @@ int foo (int *p, int a, int b)
> >  {
> >int x[4];
> >int tem0, tem1, tem2, tem3;
> > +  int sum = 0;
> >tem0 = p[0] + 1 + a;
> > +  sum += tem0;
> >x[0] = tem0;
> >tem1 = p[1] + 2 + b;
> > +  sum += tem1;
> >x[1] = tem1;
> >tem2 = p[2] + 3 + b;
> > +  sum += tem2;
> >x[2] = tem2;
> >tem3 = p[3] + 4 + a;
> > +  sum += tem3;
> >x[3] = tem3;
> >bar (x);
> > -  return tem0 + tem1 + tem2 + tem3;
> > +  return sum;
> >  }
> >
> >  /* { dg-final { scan-tree-dump "vectorization is not profitable" "slp2" { 
> > xfail  { vect_no_align && { ! vect_hw_misalign } } } } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-46.c 
> > b/gcc/testsuite/gcc.dg/vect/bb-slp-46.c
> > new file mode 100644
> > index 000..4e4571ef640
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-46.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-fdump-tree-optimized" } */
> > +
> > +int 

Re: Patch for 96948

2020-09-08 Thread Martin Storsjö

Hi,

On Mon, 7 Sep 2020, Kirill Müller via Gcc-patches wrote:

> As requested, attaching a patch for
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96948. This solves a problem
> with _Unwind_Backtrace() on mingw64 + SEH.


What a coincidence - I actually sent a patch for the exact same thing 
last week, see 
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553082.html.


My version doesn't set gcc_context.cfa though, but is simpler by avoiding 
the whole "first" flag logic.


I can send an updated patch that also sets gcc_context.cfa in a similar
manner to the previous one.


// Martin


Re: [PATCH] code generate live lanes in basic-block vectorization

2020-09-08 Thread Christophe Lyon via Gcc-patches
Hi Richard,

On Fri, 4 Sep 2020 at 15:42, Richard Biener  wrote:
>
> The following adds the capability to code-generate live lanes in
> basic-block vectorization using lane extracts from vector stmts
> rather than keeping the original scalar code around for those.
> This eventually makes previously not profitable vectorizations
> profitable (the live scalar code was appropriately costed so
> are the lane extracts now), without considering the cost model
> this patch doesn't add or remove any basic-block vectorization
> capabilities.
>
> The patch re/ab-uses STMT_VINFO_LIVE_P in basic-block vectorization
> mode to tell whether a live lane is vectorized or whether it is
> provided by means of keeping the scalar code live.
>
> The patch is a first step towards vectorizing sequences of
> stmts that do not end up in stores or vector constructors though.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Any comments?
>
Yes: this is causing an ICE on arm:
FAIL: gcc.dg/vect/bb-slp-pr92596.c (internal compiler error)
FAIL: gcc.dg/vect/bb-slp-pr92596.c (test for excess errors)
Excess errors:
during GIMPLE pass: slp
dump file: bb-slp-pr92596.c.173t.slp2
/gcc/testsuite/gcc.dg/vect/bb-slp-pr92596.c:11:6: internal compiler
error: in vect_transform_stmt, at tree-vect-stmts.c:10870
0xfa16cc vect_transform_stmt(vec_info*, _stmt_vec_info*,
gimple_stmt_iterator*, _slp_tree*, _slp_instance*)
/gcc/tree-vect-stmts.c:10870
0xfd6954 vect_schedule_slp_instance
/gcc/tree-vect-slp.c:4570
0xfd684f vect_schedule_slp_instance
/gcc/tree-vect-slp.c:4436
0xfd684f vect_schedule_slp_instance
/gcc/tree-vect-slp.c:4436
0xfdeace vect_schedule_slp(vec_info*)
/gcc/tree-vect-slp.c:4695
0xfe2529 vect_slp_region
/gcc/tree-vect-slp.c:3529
0xfe33d7 vect_slp_bb(basic_block_def*)
/gcc/tree-vect-slp.c:3647
0xfe503c execute
/gcc/tree-vectorizer.c:1429

Christophe

> Thanks,
> Richard.
>
> 2020-09-04  Richard Biener  
>
> * tree-vectorizer.h (vectorizable_live_operation): Adjust.
> * tree-vect-loop.c (vectorizable_live_operation): Vectorize
> live lanes out of basic-block vectorization nodes.
> * tree-vect-slp.c (vect_bb_slp_mark_live_stmts): New function.
> (vect_slp_analyze_operations): Analyze live lanes and their
> vectorization possibility after the whole SLP graph is final.
> (vect_bb_slp_scalar_cost): Adjust for vectorized live lanes.
> * tree-vect-stmts.c (can_vectorize_live_stmts): Adjust.
> (vect_transform_stmt): Call can_vectorize_live_stmts also for
> basic-block vectorization.
>
> * gcc.dg/vect/bb-slp-46.c: New testcase.
> * gcc.dg/vect/bb-slp-47.c: Likewise.
> * gcc.dg/vect/bb-slp-32.c: Adjust.
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-32.c |   7 +-
>  gcc/testsuite/gcc.dg/vect/bb-slp-46.c |  28 +++
>  gcc/testsuite/gcc.dg/vect/bb-slp-47.c |  14 ++
>  gcc/tree-vect-loop.c  | 243 --
>  gcc/tree-vect-slp.c   | 145 +--
>  gcc/tree-vect-stmts.c |  12 +-
>  gcc/tree-vectorizer.h |   2 +-
>  7 files changed, 332 insertions(+), 119 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-46.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-47.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-32.c 
> b/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
> index 41bbf352156..020b6365e02 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
> @@ -7,16 +7,21 @@ int foo (int *p, int a, int b)
>  {
>int x[4];
>int tem0, tem1, tem2, tem3;
> +  int sum = 0;
>tem0 = p[0] + 1 + a;
> +  sum += tem0;
>x[0] = tem0;
>tem1 = p[1] + 2 + b;
> +  sum += tem1;
>x[1] = tem1;
>tem2 = p[2] + 3 + b;
> +  sum += tem2;
>x[2] = tem2;
>tem3 = p[3] + 4 + a;
> +  sum += tem3;
>x[3] = tem3;
>bar (x);
> -  return tem0 + tem1 + tem2 + tem3;
> +  return sum;
>  }
>
>  /* { dg-final { scan-tree-dump "vectorization is not profitable" "slp2" { 
> xfail  { vect_no_align && { ! vect_hw_misalign } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-46.c 
> b/gcc/testsuite/gcc.dg/vect/bb-slp-46.c
> new file mode 100644
> index 000..4e4571ef640
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-46.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-fdump-tree-optimized" } */
> +
> +int a[4], b[4];
> +int foo ()
> +{
> +  int tem0 = a[0] + b[0];
> +  int temx = tem0 * 17;  /* this fails without a real need */
> +  int tem1 = a[1] + b[1];
> +  int tem2 = a[2] + b[2];
> +  int tem3 = a[3] + b[3];
> +  int temy = tem3 * 13;
> +  a[0] = tem0;
> +  a[1] = tem1;
> +  a[2] = tem2;
> +  a[3] = tem3;
> +  return temx + temy;
> +}
> +
> +/* We should extract the live lane from the vectorized add rather than
> +   keeping the original 

Re: [PATCH] gcc: Make strchr return value pointers const

2020-09-08 Thread Martin Storsjö

On Tue, 8 Sep 2020, Jakub Jelinek wrote:

> On Tue, Sep 08, 2020 at 12:16:08PM +0100, Richard Sandiford wrote:
> > > Are platform maintainers allowed to push general changes like these? If
> > > so I can push soon.
> >
> > Yeah, anyone with commit access can push an approved patch.
>
> I've pushed this one yesterday already:
> https://gcc.gnu.org/g:3fe3efe5c141a88a80c1ecc6aebc7f15d6426f62


Thanks!

// Martin



Re: [PATCH] gcc: Make strchr return value pointers const

2020-09-08 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 08, 2020 at 12:16:08PM +0100, Richard Sandiford wrote:
> > Are platform maintainers allowed to push general changes like these? If
> > so I can push soon.
> 
> Yeah, anyone with commit access can push an approved patch.

I've pushed this one yesterday already:
https://gcc.gnu.org/g:3fe3efe5c141a88a80c1ecc6aebc7f15d6426f62

Anyway, with git I'd like to say that it is desirable to commit
such patches with git commit --author '...' to give due credit.

Jakub



[PUSHED] PR tree-optimization/96967 - cast label range to type of switch operand

2020-09-08 Thread Aldy Hernandez via Gcc-patches
This is the same issue as PR96818.  It's another intersect that's 
missing a cast in the same function.  I missed it in the previous PR.


Pushed as obvious.

PR tree-optimization/96967
* tree-vrp.c (find_case_label_range): Cast label range to
type of switch operand.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr96967.c | 36 ++++++++++++++++++++++++++++++++++++
 gcc/tree-vrp.c                          |  2 ++
 2 files changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr96967.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr96967.c b/gcc/testsuite/gcc.dg/tree-ssa/pr96967.c
new file mode 100644
index 000..249dfc7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr96967.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fshort-enums" } */
+
+enum re {
+  o3,
+};
+
+int
+uj (int mq, enum re dn)
+{
+  enum re nr = mq;
+
+  switch (nr)
+{
+case 4:
+  if (dn == 0)
+goto wdev_inactive_unlock;
+  break;
+
+default:
+  break;
+}
+
+  switch (nr)
+{
+case 0:
+case 4:
+  return 0;
+
+default:
+  break;
+}
+
+ wdev_inactive_unlock:
+  return 1;
+}
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index f7b0692..b493e40 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -3828,6 +3828,8 @@ find_case_label_range (gswitch *switch_stmt, const irange *range_of_op)

   tree case_high
= CASE_HIGH (label) ? CASE_HIGH (label) : CASE_LOW (label);
   int_range_max label_range (CASE_LOW (label), case_high);
+  if (!types_compatible_p (label_range.type (), range_of_op->type ()))
+   range_cast (label_range, range_of_op->type ());
   label_range.intersect (range_of_op);
   if (label_range == *range_of_op)
return label;
--
1.8.3.1



Re: [PATCH] gcc: Make strchr return value pointers const

2020-09-08 Thread Richard Sandiford
JonY via Gcc-patches  writes:
> On 9/4/20 12:47 PM, Martin Storsjö wrote:
>> Hi,
>> 
>> On Fri, 4 Sep 2020, Jakub Jelinek wrote:
>> 
>>> On Tue, Sep 01, 2020 at 04:01:42PM +0300, Martin Storsjö wrote:
 This fixes compilation of codepaths for dos-like filesystems
 with Clang. When built with clang, it treats C input files as C++
 when the compiler driver is invoked in C++ mode, triggering errors
 when the return value of strchr() on a pointer to const is assigned
 to a pointer to non-const variable.
>>>
>>> Not really specific to clang, e.g. glibc does that in its headers too
>>> as the C++ standard mandates that (and I guess mingw should do that too).
>>>
 This matches similar variables outside of the ifdefs for dos-like
 path handling.

 2020-09-01  Martin Storsjö  

 gcc/Changelog:
     * dwarf2out.c (file_name_acquire): Make a strchr return value
     pointer to const.

 libcpp/Changelog:
     * files.c (remap_filename): Make a strchr return value pointer
     to const.
>>>
>>> LGTM.  And it is short enough not to need copyright assignment, so ok for
>>> trunk.
>> 
>> Thanks! Can someone commit this for me?
>> 
>> // Martin
>
> Ping can anyone commit this?
>
> Are platform maintainers allowed to push general changes like these? If
> so I can push soon.

Yeah, anyone with commit access can push an approved patch.

Richard
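The C++ rule the thread refers to can be shown in a few lines. A hedged sketch (`find_sep` is a hypothetical helper, not from the patch): as Jakub notes, the C++ standard mandates const-correct overloads of strchr in <cstring>, so on a pointer to const the result is itself pointer to const and cannot initialize a plain char *.

```cpp
#include <cassert>
#include <cstring>

// Return the first path separator in PATH, or a null pointer.
const char *find_sep (const char *path)
{
  const char *p = std::strchr (path, '/');  // OK: const-qualified result
  // char *q = std::strchr (path, '/');     // ill-formed in C++: drops const
  return p;
}
```

This is exactly the pattern the dwarf2out.c and libcpp/files.c changes adopt: declare the variable receiving strchr's result as pointer to const.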


[committed] MSP430: Fix detection of assembler support for .mspabi_attribute

2020-09-08 Thread Jozef Lawrynowicz
The assembly code ".mspabi_attribute 4,1" uses the object attribute
mechanism to indicate that the 430 ISA is in use. However, the default
ISA is 430X, so GAS fails to assemble this since the ISA wasn't also set
to 430 on the command line.

Successfully regtested msp430.exp. This fixes the object attribute tests
when GAS hasn't been built using a unified source tree alongside GCC.

Committed as obvious.
From b75863a88ececd4fcce9e3b35df8d91b82cf4fc5 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Tue, 8 Sep 2020 11:31:02 +0100
Subject: [PATCH] MSP430: Fix detection of assembler support for
 .mspabi_attribute

The assembly code ".mspabi_attribute 4,1" uses the object attribute
mechanism to indicate that the 430 ISA is in use. However, the default
ISA is 430X, so GAS fails to assemble this since the ISA wasn't also set
to 430 on the command line.

gcc/ChangeLog:

* config/msp430/msp430.c (msp430_file_end): Fix jumbled
HAVE_AS_MSPABI_ATTRIBUTE and HAVE_AS_GNU_ATTRIBUTE checks.
* configure: Regenerate.
* configure.ac: Use ".mspabi_attribute 4,2" to check for assembler
support for this object attribute directive.
---
 gcc/config/msp430/msp430.c | 4 ++--
 gcc/configure  | 2 +-
 gcc/configure.ac   | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index d0557fe9058..a299ed7f9d1 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -2091,7 +2091,7 @@ msp430_output_aligned_decl_common (FILE * stream,
 static void
 msp430_file_end (void)
 {
-#ifdef HAVE_AS_GNU_ATTRIBUTE
+#ifdef HAVE_AS_MSPABI_ATTRIBUTE
   /* Enum for tag names.  */
   enum
 {
@@ -2130,7 +2130,7 @@ msp430_file_end (void)
   OFBA_MSPABI_Tag_Data_Model,
   TARGET_LARGE ? OFBA_MSPABI_Val_Model_Large
   : OFBA_MSPABI_Val_Model_Small);
-#ifdef HAVE_AS_MSPABI_ATTRIBUTE
+#ifdef HAVE_AS_GNU_ATTRIBUTE
   /* Emit .gnu_attribute directive for Tag_GNU_MSP430_Data_Region.  */
   fprintf (asm_out_file, "\t%s %d, %d\n", gnu_attr, Tag_GNU_MSP430_Data_Region,
   msp430_data_region == MSP430_REGION_LOWER
diff --git a/gcc/configure b/gcc/configure
index 0f7a8dbe0f9..0a09777dd42 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -27981,7 +27981,7 @@ else
   then gcc_cv_as_msp430_mspabi_attribute=yes
 fi
   elif test x$gcc_cv_as != x; then
-$as_echo '.mspabi_attribute 4,1' > conftest.s
+$as_echo '.mspabi_attribute 4,2' > conftest.s
 if { ac_try='$gcc_cv_as $gcc_cv_as_flags  -o conftest.o conftest.s >&5'
   { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
   (eval $ac_try) 2>&5
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 0f11238c19f..6a233a3c706 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -5041,7 +5041,7 @@ pointers into PC-relative form.])
  [Define if your assembler supports .gnu_attribute.])])
 gcc_GAS_CHECK_FEATURE([.mspabi_attribute support],
   gcc_cv_as_msp430_mspabi_attribute, [2,33,50],,
-  [.mspabi_attribute 4,1],,
+  [.mspabi_attribute 4,2],,
   [AC_DEFINE(HAVE_AS_MSPABI_ATTRIBUTE, 1,
  [Define if your assembler supports .mspabi_attribute.])])
 if test x$enable_newlib_nano_formatted_io = xyes; then
-- 
2.28.0



[Patch] Fortran: Fixes for OpenMP loop-iter privatization (PRs 95109 + 94690)

2020-09-08 Thread Tobias Burnus

This patch fixes all known issues related to loop-iter privatization.

This patch removes the code added to gfc_resolve_do_iterator in commit
r11-349-gf884bef2105d748fd7869cd641cbb4f6b6bb.

I added it to fix some issues in testsuite/libgomp.fortran/pr66199-*.f90,
but it turned out that removing it no longer causes failures; on the other
hand, while r11-349 caused target1.f90 to fail, just removing that change
did not fix it.

Digging deeper, I found that for target1.f90's
  !$omp target teams ...
  !$omp distribute parallel do simd ...
the do-loop's i and j were added to '(target) teams'
instead of the 'distribute parallel do simd', causing
a middle-end ICE → resolve.c and openmp.c part of the patch.

Some testing and looking at dumps additionally showed
that for 'target parallel do simd', all loop variables
were 'shared', as the wrong function was called in trans-openmp.c.

The latter is also the reason that 'combined-if.f90' now has
a higher 'omp simd.*if' count.

Tested on x86-64-gnu-linux.

OK for the trunk and GCC 10?
(As r11-349 was not applied to GCC 10, the
gfc_resolve_do_iterator change is trunk only.)

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
Fortran: Fixes for OpenMP loop-iter privatization (PRs 95109 + 94690)

This commit also fixes a gfortran.dg/gomp/target1.f90 regression;
target1.f90 tests the resolve.c and openmp.c changes.

gcc/fortran/ChangeLog:

	PR fortran/95109
	PR fortran/94690
	* resolve.c (gfc_resolve_code): Also call
	gfc_resolve_omp_parallel_blocks for 'distribute parallel do (simd)'.
	* openmp.c (gfc_resolve_omp_parallel_blocks): Handle it.
	(gfc_resolve_do_iterator): Remove special code for SIMD, which is
	not needed.
	* trans-openmp.c (gfc_trans_omp_target): For TARGET_PARALLEL_DO_SIMD,
	call simd not do processing function.

gcc/testsuite/ChangeLog:

	PR fortran/95109
	PR fortran/94690
	* gfortran.dg/gomp/combined-if.f90: Update scan-tree-dump-times for
	'omp simd.*if'.
	* gfortran.dg/gomp/openmp-simd-5.f90: New test.

 gcc/fortran/openmp.c | 27 ++--
 gcc/fortran/resolve.c|  2 ++
 gcc/fortran/trans-openmp.c   |  8 ++-
 gcc/testsuite/gfortran.dg/gomp/combined-if.f90   |  2 +-
 gcc/testsuite/gfortran.dg/gomp/openmp-simd-5.f90 | 24 +
 5 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index d0e516c472d..1efce33e519 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5962,6 +5962,8 @@ gfc_resolve_omp_parallel_blocks (gfc_code *code, gfc_namespace *ns)
 
   switch (code->op)
 {
+case EXEC_OMP_DISTRIBUTE_PARALLEL_DO:
+case EXEC_OMP_DISTRIBUTE_PARALLEL_DO_SIMD:
 case EXEC_OMP_PARALLEL_DO:
 case EXEC_OMP_PARALLEL_DO_SIMD:
 case EXEC_OMP_TARGET_PARALLEL_DO:
@@ -6047,31 +6049,6 @@ gfc_resolve_do_iterator (gfc_code *code, gfc_symbol *sym, bool add_clause)
   if (omp_current_ctx->sharing_clauses->contains (sym))
 return;
 
-  if (omp_current_ctx->is_openmp && omp_current_ctx->code->block)
-{
-  /* SIMD is handled differently and, hence, ignored here.  */
-  gfc_code *omp_code = omp_current_ctx->code->block;
-  for ( ; omp_code->next; omp_code = omp_code->next)
-	switch (omp_code->op)
-	  {
-	  case EXEC_OMP_SIMD:
-	  case EXEC_OMP_DO_SIMD:
-	  case EXEC_OMP_PARALLEL_DO_SIMD:
-	  case EXEC_OMP_DISTRIBUTE_SIMD:
-	  case EXEC_OMP_DISTRIBUTE_PARALLEL_DO_SIMD:
-	  case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD:
-	  case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_SIMD:
-	  case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
-	  case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
-	  case EXEC_OMP_TARGET_PARALLEL_DO_SIMD:
-	  case EXEC_OMP_TARGET_SIMD:
-	  case EXEC_OMP_TASKLOOP_SIMD:
-	return;
-	  default:
-	break;
-	  }
-}
-
   if (! omp_current_ctx->private_iterators->add (sym) && add_clause)
 {
   gfc_omp_clauses *omp_clauses = omp_current_ctx->code->ext.omp_clauses;
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index ebf89a9b1f5..f4ce49f8432 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -11722,6 +11722,8 @@ gfc_resolve_code (gfc_code *code, gfc_namespace *ns)
 	  omp_workshare_flag = 1;
 	  gfc_resolve_omp_parallel_blocks (code, ns);
 	  break;
+	case EXEC_OMP_DISTRIBUTE_PARALLEL_DO:
+	case EXEC_OMP_DISTRIBUTE_PARALLEL_DO_SIMD:
 	case EXEC_OMP_PARALLEL:
 	case EXEC_OMP_PARALLEL_DO:
 	case EXEC_OMP_PARALLEL_DO_SIMD:
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 7d3365fe7e0..0e1da0426b4 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -5591,13 +5591,19 @@ gfc_trans_omp_target (gfc_code *code)
   }
   break;
 case EXEC_OMP_TARGET_PARALLEL_DO:
-case 

Re: Do we need to do a loop invariant motion after loop interchange ?

2020-09-08 Thread Bin.Cheng via Gcc-patches
On Mon, Sep 7, 2020 at 5:42 PM HAO CHEN GUI  wrote:
>
> Hi,
>
> I want to follow Lijia's work as I gained the performance benefit on
> some SPEC workloads by adding a lim (loop invariant motion) pass after loop interchange.  Could
> you send me the latest patches? I could do further testing. Thanks a lot.
Hi,
Hmm, not sure if this refers to me?  I only provided an example patch
(which isn't complete) before Lijia's.  Unfortunately I don't have any
latest patch about this either.
As Richard suggested, maybe you (if you work on this) can simplify the
implementation.  Anyway, we only need to hoist memory references here.
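A minimal sketch of the transformation in question (hypothetical C, not from any of the patches discussed): after interchange, a load that used to vary with the old inner loop becomes invariant in the new inner loop, and hoisting it is exactly the memory-reference motion being asked for.

```c
#include <assert.h>

/* Hypothetical sketch: interchange turns `for j { for i { m[j][i] += a[i]; } }`
   into the i-outer form below, where the load of a[i] no longer depends on
   the inner induction variable j and can be hoisted out of the j loop --
   the kind of memory reference an invariant-motion pass would move.  */
static void interchanged_and_hoisted (int m[4][4], const int *a)
{
  for (int i = 0; i < 4; i++)
    {
      int t = a[i];                 /* hoisted invariant load */
      for (int j = 0; j < 4; j++)
        m[j][i] += t;
    }
}
```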

Thanks,
bin
>
> https://gcc.gnu.org/pipermail/gcc/2020-February/232091.html
>


RE: [PATCH v2] doc: add 'cd' command before 'make check-gcc' command in install.texi

2020-09-08 Thread Hu, Jiangping
Hi, H-P

> > > On Sat, 29 Aug 2020, Hu Jiangping wrote:
> > >
> > > > This patch adds a 'cd' command before the 'make check-gcc' command
> > > > when running the testsuite on selected tests.
> > >
> > > No, don't do that; those targets work fine from the toplevel
> > > too, and then include the language libs.
> > Yes, I know that 'make check-gcc' works well from the toplevel,
> > but 'make check-g++' does not. Is there anything wrong with
> > the Makefile?
> 
> IIUC check-g++ is somewhat a historic artefact, but for
> consistency it should be added to the toplevel Makefile too as
> a synonym for check-c++.
Thanks for your suggestion. I'm submitting a patch for that.
Any advice will be appreciated.

> 
> > i.e.:
> > Note that if 'make check-testsuite' is run from the object directory,
> >   not only the tests under the gcc subdirectory but also the tests under
> >   the target libraries will be performed.
> 
> (There's no "make check-testsuite".)
Oh, I mean 'check-@var{testsuite}'.

If the patch for the Makefile is ok, then the patch here will be modified
accordingly.
For example, there is no need to add cd objdir/gcc before the 'make check-'
commands, but in the text above the commands we need to describe the difference
between the targets in the gcc subdirectory and the object directory.

Regards!
Hujp

> 
> > What do you think?
> 
> I think that after re-reading the patch, I retract my objection,
> thanks.
> 
> FWIW, note also "check-gcc-" where LANGUAGE="c, c++,
> fortran, ada" which do consider the library testsuite.
> 
> brgds, H-P
> 





Re: [PATCH] libphobos: libdruntime doesn't support shadow stack (PR95680)

2020-09-08 Thread Iain Buclaw via Gcc-patches
Excerpts from H.J. Lu's message of September 8, 2020 4:09 am:
> On Mon, Sep 7, 2020 at 2:35 PM Iain Buclaw  wrote:
>>
>> Hi,
>>
>> This patch removes whatever CET support was in the switchContext routine
>> for the x86 D runtime, and instead uses the ucontext fallback, which
>> properly handles the shadow stack.
>>
>> Rather than implementing support within D runtime itself, use libc
>> getcontext/setcontext functions if CET is enabled instead.
>>
>> HJ, does this look reasonable before I commit it?  The detection has
>> been done at configure-time, rather than adding a predefined version
>> condition for CET within the compiler.
>>
>> Done regression testing on x86_64-linux-gnu/-m32/-mx32.
>>
>> Regards
>> Iain.
>>
>> ---
>> libphobos/ChangeLog:
>>
>> PR d/95680
>> * Makefile.in: Regenerate.
>> * configure: Regenerate.
>> * configure.ac (DCFG_ENABLE_CET): Substitute.
>> * libdruntime/Makefile.in: Regenerate.
>> * libdruntime/config/x86/switchcontext.S: Remove CET support code.
>> * libdruntime/core/thread.d: Import gcc.config.  Don't set version
>> AsmExternal when GNU_Enable_CET is true.
>> * libdruntime/gcc/config.d.in (GNU_Enable_CET): Define.
>> * src/Makefile.in: Regenerate.
>> * testsuite/Makefile.in: Regenerate.
> 
> Looks good.  I can try it on Tiger Lake after it has been checked in.
> 

OK, I have committed it as r11-3047.

Iain.


[RFC Patch] mklog.py: Parse first 10 lines for PR/DR number

2020-09-08 Thread Tobias Burnus

Hi Martin, hi all,

currently, mklog searches for "PR" (and "DR") only in the
first line of a new 'testsuite' file.

I think in many cases, the PR is listed a bit later than
the first line – although it is usually in the first few
lines; in my example, it is in lines 3 and 4.

Admittedly, I do have cases where later lines are wrong,
like
"! Not tested due to PR ..."

How about testing the first, e.g., ten lines?
That's what the attached patch does.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
mklog.py: Parse first 10 lines for PR/DR number

contrib/ChangeLog:

	* mklog.py: Parse first 10 lines for PR/DR number
	not only the first line.

diff --git a/contrib/mklog.py b/contrib/mklog.py
index 243edbb15c5..d334a3875c9 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -137,7 +137,10 @@ def generate_changelog(data, no_functions=False, fill_pr_titles=False):
 
 # Extract PR entries from newly added tests
 if 'testsuite' in file.path and file.is_added_file:
-for line in list(file)[0]:
+# Only search the first ten lines, as later lines may
+# contain commented code with a note that it
+# has not been tested due to a certain PR or DR.
+for line in list(file)[0][0:10]:
 m = pr_regex.search(line.value)
 if m:
 pr = m.group('pr')
@@ -149,8 +152,6 @@ def generate_changelog(data, no_functions=False, fill_pr_titles=False):
 dr = m.group('dr')
 if dr not in prs:
 prs.append(dr)
-else:
-break
 
 if fill_pr_titles:
 out += get_pr_titles(prs)


[PATCH] Makefile.tpl: Add check-g++

2020-09-08 Thread Hu Jiangping
This patch adds a new check-g++ target to the Makefile under toplevel,
as a synonym of the check-c++ target.

This keeps it consistent with the check-g++ target under the gcc
subdirectory.  And because check-gcc can be run at the toplevel,
it is quite likely that check-g++ will also be tried there,
but currently that gives a 'No rule to make target' error.

ChangeLog:
2020-09-08 Hu Jiangping 

	* Makefile.tpl (check-g++): New target.  As synonym of check-c++.
	* Makefile.in: Regenerated.

Bootstrapped on aarch64.  Ok for master?

Regards!
Hujp

---
 Makefile.in  | 3 +++
 Makefile.tpl | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/Makefile.in b/Makefile.in
index 36e369df6e7..35b57d5af21 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -4,6 +4,9 @@ check-gcc-d:
 check-d: check-gcc-d check-target-libphobos
 
 
+.PHONY: check-g++
+check-g++: check-c++
+
 # The gcc part of install-no-fixedincludes, which relies on an intimate
 # knowledge of how a number of gcc internal targets (inter)operate.  Delegate.
 .PHONY: gcc-install-no-fixedincludes
diff --git a/Makefile.tpl b/Makefile.tpl
index efed1511750..6dfe3c9caca 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -1542,6 +1542,9 @@ check-gcc-[+language+]:
 check-[+language+]: check-gcc-[+language+][+ FOR lib-check-target +] [+ 
lib-check-target +][+ ENDFOR lib-check-target +]
 [+ ENDFOR languages +]
 
+.PHONY: check-g++
+check-g++: check-c++
+
 # The gcc part of install-no-fixedincludes, which relies on an intimate
 # knowledge of how a number of gcc internal targets (inter)operate.  Delegate.
 .PHONY: gcc-install-no-fixedincludes
-- 
2.17.1





[COMMITTED] config: Sync largefile.m4 from binutils-gdb

2020-09-08 Thread Rainer Orth
The following patch improves handling of largefile support with procfs
on 32-bit Solaris.  It has already been approved and installed for
binutils-gdb in the thread starting at

[PATCH] Unify Solaris procfs and largefile handling
https://sourceware.org/pipermail/gdb-patches/2020-June/169977.html

I'm syncing the config/largefile.m4 part to gcc now which is the master
for config.  Since ACX_LARGEFILE isn't used anywhere in the gcc tree,
I'm installing it as obvious.

[Or rather: I meant to and failed:

remote: *** ChangeLog format failed:
remote: ERR: first line should start with a tab, an asterisk and a space:"  
Sync from binutils-gdb."
remote: ERR: additional author must be indented with one tab and four spaces:"  
2020-07-30  Rainer Orth  "
remote: ERR: first line should start with a tab, an asterisk and a space:"  
2020-07-30  Rainer Orth  "
remote: 
remote: Please see: https://gcc.gnu.org/codingconventions.html#ChangeLogs
remote: 
abort: git remote error: refs/heads/master failed to update

How am I supposed to install a ChangeLog entry like the one below?  The
format is analogous to the one used for backports.  Martin?]

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2020-09-08  Rainer Orth  

config:
Sync from binutils-gdb.
2020-07-30  Rainer Orth  

* largefile.m4 (ACX_LARGEFILE) :
Check for  incompatibility with large-file support
on Solaris.
Only disable large-file support and perhaps plugins if needed.
Set, substitute LARGEFILE_CPPFLAGS if so.

# HG changeset patch
# Parent  9631429b03ab5d025fee1443d88e46caa38dbd23
config: Sync largefile.m4 from binutils-gdb

diff --git a/config/largefile.m4 b/config/largefile.m4
--- a/config/largefile.m4
+++ b/config/largefile.m4
@@ -1,5 +1,5 @@
 # This macro wraps AC_SYS_LARGEFILE with one exception for Solaris.
-# PR 9992/binutils: We have to replicate everywhere the behaviour of
+# PR binutils/9992: We have to replicate everywhere the behaviour of
 # bfd's configure script so that all the directories agree on the size
 # of structures used to describe files.
 
@@ -16,17 +16,38 @@ AC_REQUIRE([AC_CANONICAL_TARGET])
 AC_PLUGINS
 
 case "${host}" in
-changequote(,)dnl
-  sparc-*-solaris*|i[3-7]86-*-solaris*)
-changequote([,])dnl
-# On native 32bit sparc and ia32 solaris, large-file and procfs support
-# are mutually exclusive; and without procfs support, the bfd/ elf module
-# cannot provide certain routines such as elfcore_write_prpsinfo
-# or elfcore_write_prstatus.  So unless the user explicitly requested
-# large-file support through the --enable-largefile switch, disable
-# large-file support in favor of procfs support.
-test "${target}" = "${host}" -a "x$plugins" = xno \
-  && : ${enable_largefile="no"}
+  sparc-*-solaris*|i?86-*-solaris*)
+# On native 32-bit Solaris/SPARC and x86, large-file and procfs support
+# were mutually exclusive until Solaris 11.3.  Without procfs support,
+# the bfd/ elf module cannot provide certain routines such as
+# elfcore_write_prpsinfo or elfcore_write_prstatus.  So unless the user
+# explicitly requested large-file support through the
+# --enable-largefile switch, disable large-file support in favor of
+# procfs support.
+#
+# Check if  is incompatible with large-file support.
+AC_TRY_COMPILE([#define _FILE_OFFSET_BITS 64
+#define _STRUCTURED_PROC 1
+#include ], , acx_cv_procfs_lfs=yes, acx_cv_procfs_lfs=no)
+#
+# Forcefully disable large-file support only if necessary, gdb is in
+# tree and enabled.
+if test "${target}" = "${host}" -a "$acx_cv_procfs_lfs" = no \
+ -a -d $srcdir/../gdb -a "$enable_gdb" != no; then
+  : ${enable_largefile="no"}
+  if test "$plugins" = yes; then
+	AC_MSG_WARN([
+plugin support disabled; require large-file support which is incompatible with GDB.])
+	plugins=no
+  fi
+fi
+#
+# Explicitly undef _FILE_OFFSET_BITS if enable_largefile=no for the
+# benefit of g++ 9+ which predefines it on Solaris.
+if test "$enable_largefile" = no; then
+  LARGEFILE_CPPFLAGS="-U_FILE_OFFSET_BITS"
+  AC_SUBST(LARGEFILE_CPPFLAGS)
+fi
 ;;
 esac
 


[committed] MSP430: Use enums to handle -mcpu= values

2020-09-08 Thread Jozef Lawrynowicz
The -mcpu= option accepts only a handful of string values.
Using enums instead of strings to handle the accepted values removes the
need to have specific processing of the strings in the backend, and
simplifies any comparisons that need to be performed on the value.

It also allows the default value to have semantic equivalence to a user-set
value, whilst retaining the ability to differentiate between them.
Practically, this allows a user-set -mcpu= value to override the ISA set by
-mmcu, whilst the default -mcpu= value can still have an explicit meaning.
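The pattern can be sketched in isolation (hypothetical names, not the actual msp430-common.c code): parse the option string to an enum once, so later checks are integer compares, and an unset default (CPU_NONE below) stays distinguishable from any user-set value.

```c
#include <assert.h>
#include <strings.h>

/* Hypothetical sketch of handling -mcpu= values as enums rather than
   strings.  CPU_NONE is the "not set by the user" default, which later
   code can tell apart from an explicit CPU_MSP430/CPU_MSP430X.  */
enum msp430_cpu { CPU_NONE, CPU_MSP430, CPU_MSP430X };

static enum msp430_cpu parse_mcpu (const char *arg)
{
  if (!strcasecmp (arg, "msp430") || !strcasecmp (arg, "430"))
    return CPU_MSP430;
  if (!strcasecmp (arg, "msp430x") || !strcasecmp (arg, "msp430xv2")
      || !strcasecmp (arg, "430x") || !strcasecmp (arg, "430xv2"))
    return CPU_MSP430X;
  return CPU_NONE;              /* unrecognized: caller reports an error */
}
```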

Successfully regtested on trunk.

Committed as obvious.
>From cd2d3822ca0f2f743601cc9d048d51f6d326f6a2 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Tue, 8 Sep 2020 10:10:17 +0100
Subject: [PATCH] MSP430: Use enums to handle -mcpu= values

The -mcpu= option accepts only a handful of string values.
Using enums instead of strings to handle the accepted values removes the
need to have specific processing of the strings in the backend, and
simplifies any comparisons which need to be performed on the value.

It also allows the default value to have semantic equivalence to a user-set
value, whilst retaining the ability to differentiate between them.
Practically, this allows a user-set -mcpu= value to override the ISA set by
-mmcu, whilst the default -mcpu= value can still have an explicit meaning.

gcc/ChangeLog:

* common/config/msp430/msp430-common.c (msp430_handle_option): Remove
OPT_mcpu_ handling.
Set target_cpu value to new enum values when parsing certain -mmcu=
values.
* config/msp430/msp430-opts.h (enum msp430_cpu_types): New.
* config/msp430/msp430.c (msp430_option_override): Handle new
target_cpu enum values.
Set target_cpu using extracted value for given MCU when -mcpu=
option is not passed by the user.
* config/msp430/msp430.opt: Handle -mcpu= values using enums.

gcc/testsuite/ChangeLog:

* gcc.target/msp430/mcpu-is-430.c: New test.
* gcc.target/msp430/mcpu-is-430x.c: New test.
* gcc.target/msp430/mcpu-is-430xv2.c: New test.
---
 gcc/common/config/msp430/msp430-common.c  | 26 +++
 gcc/config/msp430/msp430-opts.h   | 12 +
 gcc/config/msp430/msp430.c| 21 ++-
 gcc/config/msp430/msp430.opt  | 23 +++-
 gcc/testsuite/gcc.target/msp430/mcpu-is-430.c | 10 +++
 .../gcc.target/msp430/mcpu-is-430x.c  | 12 +
 .../gcc.target/msp430/mcpu-is-430xv2.c| 13 ++
 7 files changed, 80 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/msp430/mcpu-is-430.c
 create mode 100644 gcc/testsuite/gcc.target/msp430/mcpu-is-430x.c
 create mode 100644 gcc/testsuite/gcc.target/msp430/mcpu-is-430xv2.c

diff --git a/gcc/common/config/msp430/msp430-common.c 
b/gcc/common/config/msp430/msp430-common.c
index 0e261c40015..65be3194683 100644
--- a/gcc/common/config/msp430/msp430-common.c
+++ b/gcc/common/config/msp430/msp430-common.c
@@ -27,7 +27,7 @@
 #include "opts.h"
 #include "flags.h"
 
-/* Check for generic -mcpu= and -mmcu= names here.  If found then we
+/* Check for generic -mmcu= names here.  If found then we
convert to a baseline cpu name.  Otherwise we allow the option to
be passed on to the backend where it can be checked more fully.  */
 
@@ -39,26 +39,6 @@ msp430_handle_option (struct gcc_options *opts 
ATTRIBUTE_UNUSED,
 {
   switch (decoded->opt_index)
 {
-case OPT_mcpu_:
-  if (strcasecmp (decoded->arg, "msp430x") == 0
- || strcasecmp (decoded->arg, "msp430xv2") == 0
- || strcasecmp (decoded->arg, "430x") == 0
- || strcasecmp (decoded->arg, "430xv2") == 0)
-   {
- target_cpu = "msp430x";
-   }
-  else if (strcasecmp (decoded->arg, "msp430") == 0
-  || strcasecmp (decoded->arg, "430") == 0)
-   {
- target_cpu = "msp430";
-   }
-  else
-   {
- error ("unrecognized argument of %<-mcpu%>: %s", decoded->arg);
- return false;
-   }
-  break;
-
 case OPT_mmcu_:
   /* For backwards compatibility we recognise two generic MCU
 430X names.  However we want to be able to generate special C
@@ -66,13 +46,13 @@ msp430_handle_option (struct gcc_options *opts 
ATTRIBUTE_UNUSED,
 to NULL.  */
   if (strcasecmp (decoded->arg, "msp430") == 0)
{
- target_cpu = "msp430";
+ target_cpu = MSP430_CPU_MSP430;
  target_mcu = NULL;
}
   else if (strcasecmp (decoded->arg, "msp430x") == 0
   || strcasecmp (decoded->arg, "msp430xv2") == 0)
{
- target_cpu = "msp430x";
+ target_cpu = MSP430_CPU_MSP430X;
  target_mcu = NULL;
}
   break;
diff --git a/gcc/config/msp430/msp430-opts.h b/gcc/config/msp430/msp430-opts.h
index 4d208306367..fa64677cb0b 100644
--- 

Re: [PATCH] Implement __builtin_thread_pointer for x86 TLS

2020-09-08 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 08, 2020 at 04:14:52PM +0800, Hongtao Liu wrote:
> Hi:
>   We have "*load_tp_" in i386.md for load of thread pointer in
> i386.md, so this patch merely adds the expander for
> __builtin_thread_pointer.
> 
>   Bootstrap is ok, regression test is ok for i386/x86-64 backend.
>   Ok for trunk?
> 
> gcc/ChangeLog:
> PR target/96955
> * config/i386/i386.md (get_thread_pointer): New
> expander.

I wonder if this shouldn't be done only if targetm.have_tls is true.
Because on targets that use emulated TLS it doesn't really make much sense.

> gcc/testsuite/ChangeLog:
> 
> * gcc.target/i386/pr96955-builtin_thread_pointer.c: New test.

The testcase naming is weird.  Either call it pr96955.c, or
builtin_thread_pointer.c, but not both.

> From 4d80571d325859f75b6eb896a0def9695fb65c49 Mon Sep 17 00:00:00 2001
> From: liuhongt 
> Date: Tue, 8 Sep 2020 15:44:58 +0800
> Subject: [PATCH] Implement __builtin_thread_pointer for x86 TLS.
> 
> gcc/ChangeLog:
>   PR target/96955
>   * config/i386/i386.md (get_thread_pointer): New
>   expander.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/pr96955-builtin_thread_pointer.c: New test.
> ---
>  gcc/config/i386/i386.md   |  5 
>  .../i386/pr96955-builtin_thread_pointer.c | 28 +++
>  2 files changed, 33 insertions(+)
>  create mode 100644 
> gcc/testsuite/gcc.target/i386/pr96955-builtin_thread_pointer.c
> 
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 446793b78db..55b1852cf9a 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -15433,6 +15433,11 @@ (define_insn_and_split "*tls_local_dynamic_32_once"
>(clobber (reg:CC FLAGS_REG))])])
>  
>  ;; Load and add the thread base pointer from %:0.
> +(define_expand "get_thread_pointer"
> +  [(set (match_operand:PTR 0 "register_operand")
> + (unspec:PTR [(const_int 0)] UNSPEC_TP))]
> +  "")
> +
>  (define_insn_and_split "*load_tp_"
>[(set (match_operand:PTR 0 "register_operand" "=r")
>   (unspec:PTR [(const_int 0)] UNSPEC_TP))]
> diff --git a/gcc/testsuite/gcc.target/i386/pr96955-builtin_thread_pointer.c 
> b/gcc/testsuite/gcc.target/i386/pr96955-builtin_thread_pointer.c
> new file mode 100644
> index 000..dce31488117
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr96955-builtin_thread_pointer.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mtls-direct-seg-refs -O2 -masm=att" } */
> +
> +int*
> +foo1 ()
> +{
> +  return (int*) __builtin_thread_pointer ();
> +}
> +
> +/* { dg-final { scan-assembler "mov\[lq\]\[ \t\]*%\[fg\]s:0, %\[re\]ax" } }  
> */
> +
> +int
> +foo2 ()
> +{
> +  int* p =  (int*) __builtin_thread_pointer ();
> +  return p[4];
> +}
> +
> +/* { dg-final { scan-assembler "movl\[ \t\]*%\[fg\]s:16, %eax" } }  */
> +
> +int
> +foo3 (int i)
> +{
> +  int* p = (int*) __builtin_thread_pointer ();
> +  return p[i];
> +}
> +
> +/* { dg-final { scan-assembler "movl\[ \t\]*%\[fg\]s:0\\(,%\[a-z0-9\]*,4\\), 
> %eax" } }  */
> -- 
> 2.18.1
> 


Jakub



[PATCH] arm: Fix up arm_override_options_after_change [PR96939]

2020-09-08 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in the PR, the testcase fails to link, because when set_cfun is
being called on the crc function, arm_override_options_after_change is
called from set_cfun -> invoke_set_current_function_hook:
  /* Change optimization options if needed.  */
  if (optimization_current_node != opts)
{
  optimization_current_node = opts;
  cl_optimization_restore (_options, TREE_OPTIMIZATION (opts));
}
and at that point target_option_default_node actually matches even the
current state of options, so this means armv7 (or whatever) arch is set as
arm_active_target, then
  targetm.set_current_function (fndecl);
is called later in that function, which because the crc function's
DECL_FUNCTION_SPECIFIC_TARGET is different from the current one will do:
  cl_target_option_restore (_options, TREE_TARGET_OPTION (new_tree));
which calls arm_option_restore and sets arm_active_target to armv8-a+crc
(so far so good).
Later arm_set_current_function calls:
  save_restore_target_globals (new_tree);
which in this case calls:
  /* Call target_reinit and save the state for TARGET_GLOBALS.  */
  TREE_TARGET_GLOBALS (new_tree) = save_target_globals_default_opts ();
which because optimization_current_node != optimization_default_node
(the testcase is LTO, so all functions have their
DECL_FUNCTION_SPECIFIC_TARGET and TREE_OPTIMIZATION nodes) will call:
  cl_optimization_restore
(_options,
 TREE_OPTIMIZATION (optimization_default_node));
and
  cl_optimization_restore (_options,
   TREE_OPTIMIZATION (opts));
The problem is that these call arm_override_options_after_change again,
and that one uses the target_option_default_node as what to set the
arm_active_target to (i.e. back to armv7 or whatever, but not to the
armv8-a+crc that should be the active target for the crc function).
That means we then error on the builtin call in that function.

Now, the targetm.override_options_after_change hook is called always at the
end of cl_optimization_restore, i.e. when we change the Optimization marked
generic options.  So it seems unnecessary to call arm_configure_build_target
at that point (nothing it depends on changed), and additionally incorrect
(because it uses the target_option_default_node, rather than the current
set of options; we'd need to revert
https://gcc.gnu.org/legacy-ml/gcc-patches/2016-12/msg01390.html
otherwise so that it works again with global_options otherwise).
The options that arm_configure_build_target cares about will change only
during option parsing (which is where it is called already), or during
arm_set_current_function, where it is done during the
cl_target_option_restore.
Now, arm_override_options_after_change_1 wants to adjust the
str_align_functions, which depends on the current Optimization options (e.g.
optimize_size and flag_align_options and str_align_functions) as well as
the target options target_flags, so IMHO needs to be called both
when the Optimization options (possibly) change, i.e. from
the targetm.override_options_after_change hook, and from when the target
options change (set_current_function hook).

Bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk?

Looking further at arm_override_options_after_change_1, it also seems to be
incorrect, rather than testing
!opts->x_str_align_functions
it should be really testing
!opts_set->x_str_align_functions
and get _options_set or similar passed to it as additional opts_set
argument.  That is because otherwise the decision will be sticky, while it
should be done whenever the user provided -falign-functions but didn't provide
-falign-functions= (either on the command line, or through the optimize
attribute or pragma).
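The opts vs. opts_set distinction can be illustrated with a hypothetical sketch (names invented, not the actual arm.c fields): keying a default on the value itself is "sticky" once computed, while keying on a separate explicitly-set flag keeps the decision repeatable across restores.

```c
#include <assert.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical sketch: o->align plays the role of
   opts->x_str_align_functions, s->align_explicit the role of the
   opts_set bit.  Testing the flag (not the value) lets the default be
   recomputed on every option restore, as the message argues it should.  */
struct opts { const char *align; };
struct opts_set { int align_explicit; };

static void apply_align_default (struct opts *o, const struct opts_set *s,
                                 int optimize_size)
{
  if (!s->align_explicit)       /* user did not pass -falign-functions=  */
    o->align = optimize_size ? "2" : "16";
}
```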

2020-09-08  Jakub Jelinek  

PR target/96939
* config/arm/arm.c (arm_override_options_after_change): Don't call
arm_configure_build_target here.
(arm_set_current_function): Call arm_override_options_after_change_1
at the end.

* gcc.target/arm/lto/pr96939_0.c: New test.
* gcc.target/arm/lto/pr96939_1.c: New file.

--- gcc/config/arm/arm.c.jj 2020-07-30 15:04:38.136293101 +0200
+++ gcc/config/arm/arm.c2020-09-07 10:43:54.809561852 +0200
@@ -3037,10 +3037,6 @@ arm_override_options_after_change_1 (str
 static void
 arm_override_options_after_change (void)
 {
-  arm_configure_build_target (_active_target,
- TREE_TARGET_OPTION (target_option_default_node),
- _options_set, false);
-
   arm_override_options_after_change_1 (_options);
 }
 
@@ -32338,6 +32334,8 @@ arm_set_current_function (tree fndecl)
   cl_target_option_restore (_options, TREE_TARGET_OPTION (new_tree));
 
   save_restore_target_globals (new_tree);
+
+  arm_override_options_after_change_1 (_options);
 }
 
 /* Implement TARGET_OPTION_PRINT.  */
--- gcc/testsuite/gcc.target/arm/lto/pr96939_0.c.jj 2020-09-07 
11:26:45.909937609 +0200
+++ 

Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-08 Thread Richard Biener via Gcc-patches
On Tue, Sep 8, 2020 at 10:11 AM luoxhu  wrote:
>
> Hi Richi,
>
> On 2020/9/7 19:57, Richard Biener wrote:
> > +  if (TREE_CODE (to) == ARRAY_REF)
> > +   {
> > + tree op0 = TREE_OPERAND (to, 0);
> > + if (TREE_CODE (op0) == VIEW_CONVERT_EXPR
> > + && expand_view_convert_to_vec_set (to, from, to_rtx))
> > +   {
> > + pop_temp_slots ();
> > + return;
> > +   }
> > +   }
> >
> > you're placing this at an awkward spot IMHO, after to_rtx expansion
> > but disregading parts of  it and compensating just with 'to' matching.
> > Is the pieces (offset, bitpos) really too awkward to work with for
> > matching?
> >
> > Because as written you'll miscompile
> >
> > struct X { _vector signed int v; _vector singed int u; } x;
> >
> > test(int i, int a)
> > {
> >x.u[i] = a;
> > }
> >
> > as I think you'll end up assigning to x.v.
>
> Thanks for pointing that out; this case will be a problem for the patch.
> I checked with optimize_bitfield_assignment_op; it will return very early
> as mode1 is not VOIDmode, and this is actually not a "FIELD op= VAL"
> operation?
>
> To be honest, I am not quite familiar with this part of the code; I put the new
> function expand_view_convert_to_vec_set just after the to_rtx expansion because
> adjust_address will change the V4SImode memory to SImode memory, but I need
> to keep the target to_rtx V4SImode to save the vector after calling
> rs6000_vector_set_var, so it seems paradoxical here?
>
>  p to_rtx
> $264 = (rtx_def *) (mem/c:V4SI (reg/f:DI 112 virtual-stack-vars) [1 D.3186+0 
> S16 A128])
>
> => to_rtx = adjust_address (to_rtx, mode1, 0);
>
> p to_rtx
> $265 = (rtx_def *) (mem/c:SI (reg/f:DI 112 virtual-stack-vars) [1 D.3186+0 S4 
> A128])
>
>
> >
> > Are we just interested in the case were we store to a
> > pseudo or also when the destination is memory?  I guess
> > only when it's a pseudo - correct?  In that case
> > handling this all in optimize_bitfield_assignment_op
> > is probably the best thing to try.
> >
> > Note we possibly refrain from assigning a pseudo to
> > such vector because we see a variable array-ref to it.
>
> Seems not only pseudo; for example, with "v = vec_insert (i, v, n);"
> the vector variable will be stored to the stack first, then [r112:DI] is a
> memory here to be processed.  So the patch loads it from the stack (insn #10) to
> a temp vector register first, and stores to the stack again (insn #24) after
> rs6000_vector_set_var.

Hmm, yeah - I guess that's what should be addressed first then.
I'm quite sure that in case 'v' is not on the stack but in memory like
in my case a SImode store is better than what we get from
vec_insert - in fact vec_insert will likely introduce a RMW cycle
which is prone to inserting store-data-races?

So - what we need to "fix" is cfgexpand.c marking variably-indexed
decls as not to be expanded as registers (see
discover_nonconstant_array_refs).

I guess one way forward would be to perform instruction
selection on GIMPLE here and transform

VIEW_CONVERT_EXPR(D.3185)[_1] = i_6(D)

to a (direct) internal function based on the vec_set optab.  But then
in GIMPLE D.3185 is also still memory (we don't have a variable
index partial register set operation - BIT_INSERT_EXPR is
currently specified to receive a constant bit position only).
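At the source level, the operation under discussion can be sketched portably (hypothetical helper using GCC's vector extension; the union is a stand-in for the VIEW_CONVERT_EXPR):

```c
#include <assert.h>

/* Sketch of what `v = vec_insert (i, v, n)` means: a variable-index
   element store into a vector.  GIMPLE expresses it as an array store
   through a VIEW_CONVERT_EXPR, since BIT_INSERT_EXPR currently only
   accepts a constant position.  */
typedef int v4si __attribute__ ((vector_size (16)));

static v4si vec_insert_var (int i, v4si v, unsigned n)
{
  union { v4si vec; int arr[4]; } u = { .vec = v };
  u.arr[n & 3] = i;             /* masked index, as in the optimized dump */
  return u.vec;
}
```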

At which point after your patch is the stack storage elided?

>
> optimized:
>
> D.3185 = v_3(D);
> _1 = n_5(D) & 3;
> VIEW_CONVERT_EXPR(D.3185)[_1] = i_6(D);
> v_8 = D.3185;
> return v_8;
>
> => expand without the patch:
>
> 2: r119:V4SI=%2:V4SI
> 3: r120:DI=%5:DI
> 4: r121:DI=%6:DI
> 5: NOTE_INSN_FUNCTION_BEG
> 8: [r112:DI]=r119:V4SI
>
> 9: r122:DI=r121:DI&0x3
>10: r123:DI=r122:DI<<0x2
>11: r124:DI=r112:DI+r123:DI
>12: [r124:DI]=r120:DI#0
>
>13: r126:V4SI=[r112:DI]
>14: r118:V4SI=r126:V4SI
>18: %2:V4SI=r118:V4SI
>19: use %2:V4SI
>
> => expand with the patch (replace #9~#12 to #10~#24):
>
> 2: r119:V4SI=%2:V4SI
> 3: r120:DI=%5:DI
> 4: r121:DI=%6:DI
> 5: NOTE_INSN_FUNCTION_BEG
> 8: [r112:DI]=r119:V4SI
> 9: r122:DI=r121:DI&0x3
>
>10: r123:V4SI=[r112:DI] // load from stack
>11: {r125:SI=0x3-r122:DI#0;clobber ca:SI;}
>12: r125:SI=r125:SI<<0x2
>13: {r125:SI=0x14-r125:SI;clobber ca:SI;}
>14: r128:DI=unspec[`*.LC0',%2:DI] 47
>   REG_EQUAL `*.LC0'
>15: r127:V2DI=[r128:DI]
>   REG_EQUAL const_vector
>16: r126:V16QI=r127:V2DI#0
>17: r129:V16QI=unspec[r120:DI#0] 61
>18: r130:V16QI=unspec[r125:SI] 151
>19: r131:V16QI=unspec[r129:V16QI,r129:V16QI,r130:V16QI] 232
>20: r132:V16QI=unspec[r126:V16QI,r126:V16QI,r130:V16QI] 232
>21: r124:V16QI=r123:V4SI#0
>22: r124:V16QI={(r132:V16QI!=const_vector)?r131:V16QI:r124:V16QI}
>23: r123:V4SI=r124:V16QI#0
>24: [r112:DI]=r123:V4SI   // store to stack.
>
>25: r134:V4SI=[r112:DI]
>26: r118:V4SI=r134:V4SI
>30: %2:V4SI=r118:V4SI
>31: use %2:V4SI
>
>
> Thanks,
> Xionghu


Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-09-08 Thread Christophe Lyon via Gcc-patches
On Mon, 17 Aug 2020 at 11:00, Alex Coplan  wrote:
>
> Hello,
>
> Given the following C function:
>
> double *f(double *p, unsigned x)
> {
> return p + x;
> }
>
> prior to this patch, GCC at -O2 would generate:
>
> f:
> add x0, x0, x1, uxtw 3
> ret
>
> but this add instruction uses architecturally-invalid syntax: the width
> of the third operand conflicts with the width of the extension
> specifier. The third operand is only permitted to be an x register when
> the extension specifier is (u|s)xtx.
>
> This instruction, and analogous insns for adds, sub, subs, and cmp, are
> rejected by clang, but accepted by binutils. Assembling and
> disassembling such an insn with binutils gives the architecturally-valid
> version in the disassembly:
>
>0:   8b214c00add x0, x0, w1, uxtw #3
>
> This patch fixes several patterns in the AArch64 backend to use the
> standard syntax as specified in the Arm ARM such that GCC's output can
> be assembled by assemblers other than GAS.
>
> Note that an obvious omission here is that this patch does not touch the
> mult patterns such as *add__mult_. I found
> that I couldn't hit these patterns with C code since multiplications by
> powers of two always get turned into shifts by earlier RTL passes. If
> there's a way to reliably hit these patterns, then perhaps these should
> be updated as well.
>
> Testing:
>  * New test which checks for the correct syntax in all updated
>patterns (fails before and passes after the aarch64.md change).
>  * New test can be assembled by both GAS and llvm-mc following the
>change.
>  * Bootstrapped and regtested on aarch64-none-linux-gnu.
>
> OK for master?
>
> Thanks,
> Alex
>
> ---
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md
> (*adds__): Ensure extended operand
> agrees with width of extension specifier.
> (*subs__): Likewise.
> (*adds__shift_): Likewise.
> (*subs__shift_): Likewise.
> (*add__): Likewise.
> (*add__shft_): Likewise.
> (*add_uxt_shift2): Likewise.
> (*sub__): Likewise.
> (*sub__shft_): Likewise.
> (*sub_uxt_shift2): Likewise.
> (*cmp_swp__reg): Likewise.
> (*cmp_swp__shft_): Likewise.
>
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/adds3.c: Fix test w.r.t. new syntax.
> * gcc.target/aarch64/cmp.c: Likewise.
> * gcc.target/aarch64/subs3.c: Likewise.
> * gcc.target/aarch64/subsp.c: Likewise.
> * gcc.target/aarch64/extend-syntax.c: New test.
>

Hi,

I've noticed some of the new tests fail with -mabi=ilp32:
gcc.target/aarch64/extend-syntax.c check-function-bodies add1
gcc.target/aarch64/extend-syntax.c check-function-bodies add3
gcc.target/aarch64/extend-syntax.c check-function-bodies sub2
gcc.target/aarch64/extend-syntax.c check-function-bodies sub3
gcc.target/aarch64/extend-syntax.c scan-assembler-times
subs\tx[0-9]+, x[0-9]+, w[0-9]+, sxtw 3 1
gcc.target/aarch64/subsp.c scan-assembler sub\tsp, sp, w[0-9]*, sxtw 4\n

Christophe


[PATCH] Implement __builtin_thread_pointer for x86 TLS

2020-09-08 Thread Hongtao Liu via Gcc-patches
Hi:
  We have "*load_tp_" in i386.md for loading the thread pointer, so this
patch merely adds the expander for __builtin_thread_pointer.

  Bootstrap is OK, and regression tests pass for the i386/x86-64 backend.
  OK for trunk?

gcc/ChangeLog:
PR target/96955
* config/i386/i386.md (get_thread_pointer): New
expander.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr96955-builtin_thread_pointer.c: New test.


-- 
BR,
Hongtao
From 4d80571d325859f75b6eb896a0def9695fb65c49 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Tue, 8 Sep 2020 15:44:58 +0800
Subject: [PATCH] Implement __builtin_thread_pointer for x86 TLS.

gcc/ChangeLog:
	PR target/96955
	* config/i386/i386.md (get_thread_pointer): New
	expander.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr96955-builtin_thread_pointer.c: New test.
---
 gcc/config/i386/i386.md   |  5 
 .../i386/pr96955-builtin_thread_pointer.c | 28 +++
 2 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr96955-builtin_thread_pointer.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 446793b78db..55b1852cf9a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15433,6 +15433,11 @@ (define_insn_and_split "*tls_local_dynamic_32_once"
   (clobber (reg:CC FLAGS_REG))])])
 
 ;; Load and add the thread base pointer from %:0.
+(define_expand "get_thread_pointer"
+  [(set (match_operand:PTR 0 "register_operand")
+	(unspec:PTR [(const_int 0)] UNSPEC_TP))]
+  "")
+
 (define_insn_and_split "*load_tp_"
   [(set (match_operand:PTR 0 "register_operand" "=r")
 	(unspec:PTR [(const_int 0)] UNSPEC_TP))]
diff --git a/gcc/testsuite/gcc.target/i386/pr96955-builtin_thread_pointer.c b/gcc/testsuite/gcc.target/i386/pr96955-builtin_thread_pointer.c
new file mode 100644
index 000..dce31488117
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr96955-builtin_thread_pointer.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mtls-direct-seg-refs -O2 -masm=att" } */
+
+int*
+foo1 ()
+{
+  return (int*) __builtin_thread_pointer ();
+}
+
+/* { dg-final { scan-assembler "mov\[lq\]\[ \t\]*%\[fg\]s:0, %\[re\]ax" } }  */
+
+int
+foo2 ()
+{
+  int* p =  (int*) __builtin_thread_pointer ();
+  return p[4];
+}
+
+/* { dg-final { scan-assembler "movl\[ \t\]*%\[fg\]s:16, %eax" } }  */
+
+int
+foo3 (int i)
+{
+  int* p = (int*) __builtin_thread_pointer ();
+  return p[i];
+}
+
+/* { dg-final { scan-assembler "movl\[ \t\]*%\[fg\]s:0\\(,%\[a-z0-9\]*,4\\), %eax" } }  */
-- 
2.18.1



Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-08 Thread luoxhu via Gcc-patches
Hi Richi,

On 2020/9/7 19:57, Richard Biener wrote:
> +  if (TREE_CODE (to) == ARRAY_REF)
> +   {
> + tree op0 = TREE_OPERAND (to, 0);
> + if (TREE_CODE (op0) == VIEW_CONVERT_EXPR
> + && expand_view_convert_to_vec_set (to, from, to_rtx))
> +   {
> + pop_temp_slots ();
> + return;
> +   }
> +   }
> 
> you're placing this at an awkward spot IMHO, after to_rtx expansion
> but disregading parts of  it and compensating just with 'to' matching.
> Is the pieces (offset, bitpos) really too awkward to work with for
> matching?
> 
> Because as written you'll miscompile
> 
> struct X { _vector signed int v; _vector singed int u; } x;
> 
> test(int i, int a)
> {
>x.u[i] = a;
> }
> 
> as I think you'll end up assigning to x.v.

Thanks for pointing that out; this case would indeed be a problem for the patch.
I checked optimize_bitfield_assignment_op: it returns very early because
mode1 is not VOIDmode, and this is not actually a "FIELD op= VAL"
operation, is it?

To be honest, I am not very familiar with this part of the code.  I put the
new function expand_view_convert_to_vec_set just after the to_rtx expansion
because adjust_address changes the V4SImode memory to SImode memory, but I
need to keep the target to_rtx in V4SImode to save the vector after calling
rs6000_vector_set_var, so it seems paradoxical here?

 p to_rtx
$264 = (rtx_def *) (mem/c:V4SI (reg/f:DI 112 virtual-stack-vars) [1 D.3186+0 S16 A128])

=> to_rtx = adjust_address (to_rtx, mode1, 0);

p to_rtx
$265 = (rtx_def *) (mem/c:SI (reg/f:DI 112 virtual-stack-vars) [1 D.3186+0 S4 A128])


> 
> Are we just interested in the case were we store to a
> pseudo or also when the destination is memory?  I guess
> only when it's a pseudo - correct?  In that case
> handling this all in optimize_bitfield_assignment_op
> is probably the best thing to try.
> 
> Note we possibly refrain from assigning a pseudo to
> such vector because we see a variable array-ref to it.

It seems to be not only pseudos: for example, with "v = vec_insert (i, v, n);"
the vector variable is stored to the stack first, so [r112:DI] here is a
memory that needs to be processed.  The patch therefore loads it from the
stack (insn #10) into a temporary vector register first, and stores it back
to the stack (insn #24) after rs6000_vector_set_var.

optimized:

D.3185 = v_3(D);
_1 = n_5(D) & 3;
VIEW_CONVERT_EXPR(D.3185)[_1] = i_6(D);
v_8 = D.3185;
return v_8;

=> expand without the patch:

2: r119:V4SI=%2:V4SI
3: r120:DI=%5:DI
4: r121:DI=%6:DI
5: NOTE_INSN_FUNCTION_BEG
8: [r112:DI]=r119:V4SI

9: r122:DI=r121:DI&0x3
   10: r123:DI=r122:DI<<0x2
   11: r124:DI=r112:DI+r123:DI
   12: [r124:DI]=r120:DI#0

   13: r126:V4SI=[r112:DI]
   14: r118:V4SI=r126:V4SI
   18: %2:V4SI=r118:V4SI
   19: use %2:V4SI

=> expand with the patch (replace #9~#12 to #10~#24):

2: r119:V4SI=%2:V4SI
3: r120:DI=%5:DI
4: r121:DI=%6:DI
5: NOTE_INSN_FUNCTION_BEG
8: [r112:DI]=r119:V4SI
9: r122:DI=r121:DI&0x3

   10: r123:V4SI=[r112:DI] // load from stack
   11: {r125:SI=0x3-r122:DI#0;clobber ca:SI;}
   12: r125:SI=r125:SI<<0x2
   13: {r125:SI=0x14-r125:SI;clobber ca:SI;}
   14: r128:DI=unspec[`*.LC0',%2:DI] 47
  REG_EQUAL `*.LC0'
   15: r127:V2DI=[r128:DI]
  REG_EQUAL const_vector
   16: r126:V16QI=r127:V2DI#0
   17: r129:V16QI=unspec[r120:DI#0] 61
   18: r130:V16QI=unspec[r125:SI] 151
   19: r131:V16QI=unspec[r129:V16QI,r129:V16QI,r130:V16QI] 232
   20: r132:V16QI=unspec[r126:V16QI,r126:V16QI,r130:V16QI] 232
   21: r124:V16QI=r123:V4SI#0
   22: r124:V16QI={(r132:V16QI!=const_vector)?r131:V16QI:r124:V16QI}
   23: r123:V4SI=r124:V16QI#0
   24: [r112:DI]=r123:V4SI   // store to stack.

   25: r134:V4SI=[r112:DI]
   26: r118:V4SI=r134:V4SI
   30: %2:V4SI=r118:V4SI
   31: use %2:V4SI


Thanks,
Xionghu


[PATCH] rs6000: Use direct move for char/short vector CTOR [PR96933]

2020-09-08 Thread Kewen.Lin via Gcc-patches
Hi,

This patch makes vector CTORs with char/short elements leverage direct
move instructions when they are available.  With one constructed
test case, it speeds up by 145% for char and 190% for short on P9.

Tested SPEC2017 x264_r at -Ofast on P9; it gets a 1.61% speedup
(though that is based on unexpected SLP, see PR96789).

Bootstrapped/regtested on powerpc64{,le}-linux-gnu P8 and
powerpc64le-linux-gnu P9.

Is it ok for trunk?

BR,
Kewen


gcc/ChangeLog:

PR target/96933
* config/rs6000/rs6000.c (rs6000_expand_vector_init): Use direct move
instructions for vector construction with char/short types.
* config/rs6000/rs6000.md (p8_mtvsrwz_v16qisi2): New define_insn.
(p8_mtvsrd_v16qidi2): Likewise. 

gcc/testsuite/ChangeLog:

PR target/96933
* gcc.target/powerpc/pr96933-1.c: New test.
* gcc.target/powerpc/pr96933-2.c: New test.
* gcc.target/powerpc/pr96933-3.c: New test.
* gcc.target/powerpc/pr96933.h: New test.
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ca5b71ecdd3..39d7e2e9451 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6411,11 +6411,11 @@ rs6000_expand_vector_init (rtx target, rtx vals)
 {
   machine_mode mode = GET_MODE (target);
   machine_mode inner_mode = GET_MODE_INNER (mode);
-  int n_elts = GET_MODE_NUNITS (mode);
+  unsigned int n_elts = GET_MODE_NUNITS (mode);
   int n_var = 0, one_var = -1;
   bool all_same = true, all_const_zero = true;
   rtx x, mem;
-  int i;
+  unsigned int i;
 
   for (i = 0; i < n_elts; ++i)
 {
@@ -6681,6 +6681,207 @@ rs6000_expand_vector_init (rtx target, rtx vals)
   return;
 }
 
+  if (TARGET_DIRECT_MOVE && (mode == V16QImode || mode == V8HImode))
+{
+  rtx op[16];
+  /* Force the values into word_mode registers.  */
+  for (i = 0; i < n_elts; i++)
+   {
+ rtx tmp = force_reg (GET_MODE_INNER (mode), XVECEXP (vals, 0, i));
+ if (TARGET_POWERPC64)
+   {
+ op[i] = gen_reg_rtx (DImode);
+ emit_insn (gen_zero_extendqidi2 (op[i], tmp));
+   }
+ else
+   {
+ op[i] = gen_reg_rtx (SImode);
+ emit_insn (gen_zero_extendqisi2 (op[i], tmp));
+   }
+   }
+
+  rtx vr_qi[16];
+  rtx vr_hi[8];
+  rtx vr_si[4];
+  rtx vr_di[2];
+
+  rtx (*merge_v16qi) (rtx, rtx, rtx) = NULL;
+  rtx (*merge_v8hi) (rtx, rtx, rtx) = NULL;
+  rtx (*merge_v4si) (rtx, rtx, rtx) = NULL;
+  rtx perm_idx;
+
+  /* Set up some common gen routines and values according to endianness.  */
+  if (BYTES_BIG_ENDIAN)
+   {
+ if (mode == V16QImode)
+   {
+ merge_v16qi = gen_altivec_vmrghb;
+ merge_v8hi = gen_altivec_vmrglh;
+   }
+ else
+   merge_v8hi = gen_altivec_vmrghh;
+
+ merge_v4si = gen_altivec_vmrglw;
+ perm_idx = GEN_INT (3);
+   }
+  else
+   {
+ if (mode == V16QImode)
+   {
+ merge_v16qi = gen_altivec_vmrglb;
+ merge_v8hi = gen_altivec_vmrghh;
+   }
+ else
+   merge_v8hi = gen_altivec_vmrglh;
+
+ merge_v4si = gen_altivec_vmrghw;
+ perm_idx = GEN_INT (0);
+   }
+
+  if (TARGET_DIRECT_MOVE_128)
+   {
+ /* Take unsigned char big endianness as example for below comments,
+the input values are: c1, c2, c3, c4, ..., c15, c16.  */
+
+ rtx vr_mrg1[8];
+ /* Move to VSX register with vec_concat, each has 2 values.
+eg: vr_mrg1[0] = { ... c1, ... c2 };
+vr_mrg1[1] = { ... c3, ... c4 };
+...  */
+ for (i = 0; i < n_elts / 2; i++)
+   {
+ rtx tmp = gen_reg_rtx (V2DImode);
+ emit_insn (gen_vsx_concat_v2di (tmp, op[i * 2], op[i * 2 + 1]));
+ vr_mrg1[i] = gen_reg_rtx (V16QImode);
+ emit_move_insn (vr_mrg1[i], gen_lowpart (V16QImode, tmp));
+   }
+
+ /* Obtain the control vector for further merging.  */
+ rtx vr_ctrl = gen_reg_rtx (V16QImode);
+ if (mode == V16QImode)
+   {
+ rtx val;
+ if (BYTES_BIG_ENDIAN)
+   val = gen_int_mode (0x070f171f, SImode);
+ else
+   val = gen_int_mode (0x18100800, SImode);
+ rtvec v = gen_rtvec (4, val, val, val, val);
+ rtx tmp = gen_reg_rtx (V4SImode);
+ emit_insn (gen_vec_initv4sisi (tmp,
+gen_rtx_PARALLEL (V4SImode, v)));
+ emit_move_insn (vr_ctrl, gen_lowpart (V16QImode, tmp));
+   }
+ else
+   {
+ rtx val;
+ if (BYTES_BIG_ENDIAN)
+   val = gen_int_mode (0x06070e0f16171e1f, DImode);
+ else
+   val = gen_int_mode (0x1918111009080100, DImode);
+ rtvec v = gen_rtvec (2, 

RE: [PING] floatformat.h: Add bfloat16 support.

2020-09-08 Thread Willgerodt, Felix via Gcc-patches
Thanks for your review. It seems like the format issue was introduced by my 
email client when hitting reply. Sorry for that!
The original patch is formatted correctly, as I used git send-email: 
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552079.html

Could you double-check and push the patch for me? This is the first time I 
have contributed to GCC, so I don't have write access.

Regards,
Felix

-Original Message-
From: Joseph Myers  
Sent: Montag, 7. September 2020 21:45
To: Willgerodt, Felix 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PING] floatformat.h: Add bfloat16 support.

On Mon, 7 Sep 2020, Willgerodt, Felix via Gcc-patches wrote:

> @@ -133,6 +133,9 @@ extern const struct floatformat 
> floatformat_ia64_quad_little;
>  /* IBM long double (double+double).  */  extern const struct 
> floatformat floatformat_ibm_long_double_big;  extern const struct 
> floatformat floatformat_ibm_long_double_little;
> +/* bfloat16.  */
> +extern const struct floatformat floatformat_bfloat16_big; extern 
> +const struct floatformat floatformat_bfloat16_little;

There seems to be something odd about the diff formatting here.  I'd expect 
each declaration to be on its own line, not "extern const" at the end of a line 
and the rest of a declaration on the next line.  OK with that fixed.

--
Joseph S. Myers
mailto:jos...@codesourcery.com
Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Gary Kershaw
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928


[PATCH][libatomic] Add nvptx support

2020-09-08 Thread Tom de Vries
Hi,

Add nvptx support to libatomic.

Given that atomic_test_and_set is not implemented for nvptx (PR96964), the
compiler translates __atomic_test_and_set by falling back on the "Failing all
else, assume a single threaded environment and simply perform the operation"
case in expand_atomic_test_and_set, so it doesn't map onto an actual atomic
operation.

Still, that counts as supported for the configure test of libatomic, so we
end up with HAVE_ATOMIC_TAS_1/2/4/8/16 == 1, and the corresponding
__atomic_test_and_set_1/2/4/8/16 in libatomic all using that non-atomic
implementation.

Fix this by adding an atomic_test_and_set expansion for nvptx that uses
libatomic's __atomic_test_and_set_1.

This again makes the configure tests for HAVE_ATOMIC_TAS_1/2/4/8/16 fail, so
instead we use this case in tas_n.c:
...
/* If this type is smaller than word-sized, fall back to a word-sized
   compare-and-swap loop.  */
bool
SIZE(libat_test_and_set) (UTYPE *mptr, int smodel)
...
which for __atomic_test_and_set_8 uses INVERT_MASK_8.

Add INVERT_MASK_8 in libatomic_i.h, as well as MASK_8.

Tested libatomic testsuite on nvptx.

Non-target bits (sync-builtins.def, libatomic_i.h) OK for trunk?

Any other comments?

Thanks,
- Tom

[libatomic] Add nvptx support

gcc/ChangeLog:

PR target/96964
* config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New
expansion.
* sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin.

libatomic/ChangeLog:

PR target/96898
* configure.tgt: Add nvptx.
* libatomic_i.h (MASK_8, INVERT_MASK_8): New macro definition.
* config/nvptx/host-config.h: New file.
* config/nvptx/lock.c: New file.

---
 gcc/config/nvptx/nvptx.md| 16 +++
 gcc/sync-builtins.def|  2 ++
 libatomic/config/nvptx/host-config.h | 56 
 libatomic/config/nvptx/lock.c| 56 
 libatomic/configure.tgt  |  3 ++
 libatomic/libatomic_i.h  |  2 ++
 6 files changed, 135 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 4168190fa42..6178e6a0f77 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1667,6 +1667,22 @@
   "%.\\tatom%A1.b%T0.\\t%0, %1, %2;"
   [(set_attr "atomic" "true")])
 
+(define_expand "atomic_test_and_set"
+  [(match_operand:SI 0 "nvptx_register_operand")   ;; bool success output
+   (match_operand:QI 1 "memory_operand")   ;; memory
+   (match_operand:SI 2 "const_int_operand")]   ;; model
+  ""
+{
+  rtx libfunc;
+  rtx addr;
+  libfunc = init_one_libfunc ("__atomic_test_and_set_1");
+  addr = convert_memory_address (ptr_mode, XEXP (operands[1], 0));
+  emit_library_call_value (libfunc, operands[0], LCT_NORMAL, SImode,
+ addr, ptr_mode,
+ operands[2], SImode);
+  DONE;
+})
+
 (define_insn "nvptx_barsync"
   [(unspec_volatile [(match_operand:SI 0 "nvptx_nonmemory_operand" "Ri")
 (match_operand:SI 1 "const_int_operand")]
diff --git a/gcc/sync-builtins.def b/gcc/sync-builtins.def
index 156a13ce0f8..b802257bd1a 100644
--- a/gcc/sync-builtins.def
+++ b/gcc/sync-builtins.def
@@ -261,6 +261,8 @@ DEF_SYNC_BUILTIN (BUILT_IN_SYNC_SYNCHRONIZE, "__sync_synchronize",
 
 DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_TEST_AND_SET, "__atomic_test_and_set",
  BT_FN_BOOL_VPTR_INT, ATTR_NOTHROWCALL_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_TEST_AND_SET_1, "__atomic_test_and_set_1",
+ BT_FN_BOOL_VPTR_INT, ATTR_NOTHROWCALL_LEAF_LIST)
 
 DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_CLEAR, "__atomic_clear", BT_FN_VOID_VPTR_INT,
  ATTR_NOTHROWCALL_LEAF_LIST)
diff --git a/libatomic/config/nvptx/host-config.h 
b/libatomic/config/nvptx/host-config.h
new file mode 100644
index 000..eb9de81f388
--- /dev/null
+++ b/libatomic/config/nvptx/host-config.h
@@ -0,0 +1,56 @@
+/* Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This file is part of the GNU Atomic Library (libatomic).
+
+   Libatomic is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and 

Re: Bug in FINDLOC documentation at https://gcc.gnu.org/onlinedocs/gfortran/FINDLOC.html

2020-09-08 Thread Thomas Koenig via Gcc-patches

Hello Kay,


The above website reads: "9.108 FINDLOC — Search an array for a value

Description: Determines the location of the element in the array with the value 
given in the VALUE argument, or, if the DIM argument is supplied, determines the 
locations of the maximum element along each row of the array in the DIM direction. 
"

This was seemingly copied without change from the MAXLOC documentation, but "maximum 
element" should probably be changed to "matching elements", otherwise this makes no 
sense (to me at least).

Is there someone reading this list who can fix this?


Thanks for the report!
I have just committed the attached patch as obvious and simple
after checking with "make dvi" and "make pdf".

I will backport this shortly.

Fix description of FINDLOC result.

gcc/fortran/ChangeLog:

* intrinsic.texi: Fix description of FINDLOC result.

Best regards

Thomas
diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
index 13325ede3e3..eda87ea6a2c 100644
--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -6154,7 +6154,8 @@ END PROGRAM
 @item @emph{Description}:
 Determines the location of the element in the array with the value
 given in the @var{VALUE} argument, or, if the @var{DIM} argument is
-supplied, determines the locations of the maximum element along each
+supplied, determines the locations of the elements equal to the
+@var{VALUE} argument element along each
 row of the array in the @var{DIM} direction.  If @var{MASK} is
 present, only the elements for which @var{MASK} is @code{.TRUE.} are
 considered.  If more than one element in the array has the value