Re: [PATCH] [X86] Delete Deadcode.

2020-11-25 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 26, 2020 at 12:50:17PM +0800, Hongtao Liu wrote:
> Hi:
>   This patch deletes dead code in ix86_expand_special_args_builtin.
> 
>   Bootstrap and regression test are ok.
> 
> gcc/ChangeLog:
> * config/i386/i386-expand.c
> (ix86_expand_special_args_builtin): Delete last_arg_constant.

Ok for trunk.

Jakub



Re: [PATCH] make POINTER_PLUS offset sizetype (PR 97956)

2020-11-25 Thread Richard Biener via Gcc-patches
On Wed, Nov 25, 2020 at 7:22 PM Martin Sebor  wrote:
>
> On 11/25/20 2:31 AM, Richard Biener wrote:
> > On Wed, Nov 25, 2020 at 1:45 AM Martin Sebor via Gcc-patches
> >  wrote:
> >>
> >> Offsets in pointer expressions are signed but GCC prefers to
> >> represent them as sizetype instead, and sometimes (though not
> >> always) crashes during GIMPLE verification when they're not.
> >> The sometimes-but-not-always part makes it easy for mistakes
> >> to slip in and go undetected for months, until someone either
> >> trips over it by accident, or deliberately tries to break
> >> things (the test case in the bug relies on declaring memchr
> >> with the third argument of type signed long which is what's
> >> apparently needed to trigger the ICE).  The attached patch
> >> corrects a couple of such mistakes.
> >>
> >> Martin
> >>
> >> PS It would save us the time and effort dealing with these
> >> bugs to either detect (or even correct) the mistakes early,
> >> at the time the POINTER_PLUS_EXPR is built.  Adding an assert
> >> to gimple_build_assign() to verify that it has the expected
> >> type (or converting the operand to sizetype) as in the change
> >> below does that.  I'm pretty sure I submitted a patch like it
> >> in the past but it was rejected.  If I'm wrong or if there are
> >> no objections to it now I'll be happy to commit it as well.
> >
> > We already verify this in verify_gimple_assign_binary after
> > > each pass.  If anything, this would argue for verifying all built
> > > stmts immediately, assigns with verify_gimple_assign.
> > > But I think this is overkill - your testcase is already caught
> > by the IL verification.
>
> You're right, having the check wouldn't have prevented this bug.
> But I'm not worried about this test case.  What I'd like to do
> is reduce the risk of similar problems happening in the future
> where the check would help.  Catching problems earlier by having
> functions verify their pre- and postconditions is good practice.
> So yes, I think all these build() functions should do that (not
> necessarily to the same extent as the full-blown IL verification
> but at least the basics).
>
> >
> > Btw, are you sure the offset returned by constant_byte_string
> > is never checked to be positive in callers?
>
> The function sets *PTR_OFFSET to sizetype in all but this one
> case (actually, it also sets it to integer_zero_one).  Callers
> then typically compare it to the length of the string to see
> if it's less.  If not, the result is discarded because it refers
> outside the string.  It's tested for equality to zero but I don't
> see it being checked to see if it's positive and I'm not sure to
> what end.  What's your concern?

My concern was that code might do

 /* Handle broken code.  */
 if (tree_int_cst_sgn (offset) < 0)
  return NULL_TREE;

.. expect offset to be within the string ..

but I didn't look at much context.  If the function uses sizetype
everywhere else that concern is moot and thus this hunk is OK
as well.

Thanks,
Richard.
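
A minimal sketch of the signedness point, in plain C rather than GCC's
tree API.  Once the offset is carried in an unsigned type (size_t stands
in for sizetype here), a negative value wraps to a very large one, so a
single bound comparison rejects it and a separate sign test has nothing
left to catch.  The second function shows the kind of cheap precondition
check at build time that the PS argues for.  Both function names are
hypothetical and do not exist in GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a GCC function: return 1 if OFFSET refers
   to a byte inside a string of LENGTH bytes, 0 otherwise.  */
static int
offset_in_string (size_t offset, size_t length)
{
  /* A negative offset converted to size_t wraps to a huge value and
     fails this comparison, so no sign check is needed.  */
  return offset < length;
}

/* Hypothetical "build" function that verifies its precondition up
   front instead of relying on a later verification pass.  */
static const char *
checked_pointer_plus (const char *base, size_t length, size_t offset)
{
  assert (base != NULL);
  assert (offset <= length);	/* one past the end is still valid */
  return base + offset;
}
```

With a signed offset type the caller would need the explicit
"offset < 0" test first, which is the situation the question above was
probing for.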

> Anyway, as an experiment, I've changed the function to set
> the offset to ssizetype instead of sizetype and reran a subset
> of the test suite with the check in gimple_build_assign and it
> didn't trigger.  So I guess the sloppiness here doesn't matter.
>
> That said, there is a bug in the function that I noticed while
> making this change so it wasn't a completely pointless exercise.
> The function should call itself recursively but instead it calls
> string_constant.  I'll resubmit the sizetype change with the fix
> for this bug.
>
> Martin
>
> >
> > The gimple-fold.c hunk and the new testcase are OK.
> >
> > Richard.
> >
> >> Both patches were tested on x86_64-linux.
> >>
> >> diff --git a/gcc/gimple.c b/gcc/gimple.c
> >> index e3e508daf2f..8e88bab9e41 100644
> >> --- a/gcc/gimple.c
> >> +++ b/gcc/gimple.c
> >> @@ -489,6 +489,9 @@ gassign *
> >>gimple_build_assign (tree lhs, enum tree_code subcode, tree op1,
> >>tree op2 MEM_STAT_DECL)
> >>{
> >> +  if (subcode == POINTER_PLUS_EXPR)
> >> +gcc_checking_assert (ptrofftype_p (TREE_TYPE (op2)));
> >> +
> >>  return gimple_build_assign_1 (lhs, subcode, op1, op2, NULL_TREE
> >>   PASS_MEM_STAT);
> >>}
>


Re: [PATCH] libgfortran: Correct FP feature macro checks

2020-11-25 Thread Thomas Koenig via Gcc-patches

Am 25.11.20 um 23:02 schrieb Maciej W. Rozycki:

The Fortran intrinsics like HUGE, EPSILON, SELECTED_REAL_KIND etc
would have to be handled correctly, both for simplification in
the front end and in the library.

Does the program

   print *,HUGE(1.0)
   print *,EPSILON(1.0)
end

print correct values?

  Well, it does not link, for the somewhat unsurprising reason of a missing
libgfortran runtime.


OK.

What you can do instead is to use the C interoperability feature
to make a Fortran subroutine which does not depend on
libgfortran, like this:

subroutine read_val (r, d, i) bind(c)
  use iso_c_binding, only : c_float, c_double, c_int
  real(kind=c_float), intent(out) :: r
  real(kind=c_double), intent(out) :: d
  integer(kind=c_int), intent(out) :: i
  r = huge(1._c_float)
  d = huge(1._c_double)
  i = selected_real_kind(6)
end subroutine read_val

and then call it from C, like this:

#include <stdio.h>

void read_val (float *, double *, int *);

int main ()
{
  float r;
  double d;
  int i;
  read_val (&r, &d, &i);
  printf ("r = %e d = %e i = %d\n", r, d, i);
  return 0;
}

On my (IEEE) box, this prints

r = 3.402823e+38 d = 1.797693e+308 i = 4

so if you have a working printf (or some other way to display
floating-point-variables) for C, you can examine the
values.

Best regards

Thomas
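
As a cross-check that needs no Fortran at all, the corresponding C
limits are available from <float.h>.  On an IEEE system FLT_MAX and
DBL_MAX should agree with the huge() results printed above.  This is
only a sketch; the helper names are made up for illustration.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Hypothetical helpers: the C counterparts of huge(1._c_float) and
   huge(1._c_double).  */
static float
c_huge_float (void)
{
  return FLT_MAX;
}

static double
c_huge_double (void)
{
  return DBL_MAX;
}

/* Print the limits in the same format as the test program above.  */
static void
print_c_limits (void)
{
  /* The C standard guarantees at least 6 decimal digits for float,
     the precision that selected_real_kind(6) asks for.  */
  assert (FLT_DIG >= 6);
  printf ("FLT_MAX = %e DBL_MAX = %e FLT_EPSILON = %e\n",
	  c_huge_float (), c_huge_double (), FLT_EPSILON);
}
```

If the values from read_val and from print_c_limits differ, the
front-end simplification of the intrinsics is the first place to look.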


[PATCH] [X86] Delete Deadcode.

2020-11-25 Thread Hongtao Liu via Gcc-patches
Hi:
  This patch deletes dead code in ix86_expand_special_args_builtin.

  Bootstrap and regression test are ok.

gcc/ChangeLog:
* config/i386/i386-expand.c
(ix86_expand_special_args_builtin): Delete last_arg_constant.
From 948756dae8f67bf766714d9ecc064b4eea9952cd Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Thu, 26 Nov 2020 09:49:18 +0800
Subject: [PATCH 1/2] Delete dead code in ix86_expand_special_args_builtin

gcc/ChangeLog:
	* config/i386/i386-expand.c
	(ix86_expand_special_args_builtin): Delete last_arg_constant.
---
 gcc/config/i386/i386-expand.c | 62 ++-
 1 file changed, 25 insertions(+), 37 deletions(-)

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 73e3358b290..e7768882158 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -10494,7 +10494,6 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
   machine_mode mode;
 } args[3];
   enum insn_code icode = d->icode;
-  bool last_arg_constant = false;
   const struct insn_data_d *insn_p = &insn_data[icode];
   machine_mode tmode = insn_p->operand[0].mode;
   enum { load, store } klass;
@@ -10824,48 +10823,37 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
   op = expand_normal (arg);
   match = insn_p->operand[i + 1].predicate (op, mode);
 
-  if (last_arg_constant && (i + 1) == nargs)
+  if (i == memory)
 	{
-	  if (!match)
-	{
-	  error ("the last argument must be an 8-bit immediate");
-	  return const0_rtx;
-	}
+	  /* This must be the memory operand.  */
+	  op = ix86_zero_extend_to_Pmode (op);
+	  op = gen_rtx_MEM (mode, op);
+	  /* op at this point has just BITS_PER_UNIT MEM_ALIGN
+	 on it.  Try to improve it using get_pointer_alignment,
+	 and if the special builtin is one that requires strict
+	 mode alignment, also from it's GET_MODE_ALIGNMENT.
+	 Failure to do so could lead to ix86_legitimate_combined_insn
+	 rejecting all changes to such insns.  */
+	  unsigned int align = get_pointer_alignment (arg);
+	  if (aligned_mem && align < GET_MODE_ALIGNMENT (mode))
+	align = GET_MODE_ALIGNMENT (mode);
+	  if (MEM_ALIGN (op) < align)
+	set_mem_align (op, align);
 	}
   else
 	{
-	  if (i == memory)
-	{
-	  /* This must be the memory operand.  */
-	  op = ix86_zero_extend_to_Pmode (op);
-	  op = gen_rtx_MEM (mode, op);
-	  /* op at this point has just BITS_PER_UNIT MEM_ALIGN
-		 on it.  Try to improve it using get_pointer_alignment,
-		 and if the special builtin is one that requires strict
-		 mode alignment, also from it's GET_MODE_ALIGNMENT.
-		 Failure to do so could lead to ix86_legitimate_combined_insn
-		 rejecting all changes to such insns.  */
-	  unsigned int align = get_pointer_alignment (arg);
-	  if (aligned_mem && align < GET_MODE_ALIGNMENT (mode))
-		align = GET_MODE_ALIGNMENT (mode);
-	  if (MEM_ALIGN (op) < align)
-		set_mem_align (op, align);
-	}
-	  else
-	{
-	  /* This must be register.  */
-	  if (VECTOR_MODE_P (mode))
-		op = safe_vector_operand (op, mode);
+	  /* This must be register.  */
+	  if (VECTOR_MODE_P (mode))
+	op = safe_vector_operand (op, mode);
 
-	  op = fixup_modeless_constant (op, mode);
+	  op = fixup_modeless_constant (op, mode);
 
-	  if (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode)
-		op = copy_to_mode_reg (mode, op);
-	  else
-	{
-	  op = copy_to_reg (op);
-	  op = lowpart_subreg (mode, op, GET_MODE (op));
-	}
+	  if (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode)
+	op = copy_to_mode_reg (mode, op);
+	  else
+	{
+	  op = copy_to_reg (op);
+	  op = lowpart_subreg (mode, op, GET_MODE (op));
 	}
 	}
 
-- 
2.18.1



Re: [PATCH][PR target/97642] Fix incorrect replacement of vmovdqu32 with vpblendd.

2020-11-25 Thread Hongtao Liu via Gcc-patches
On Wed, Nov 25, 2020 at 7:37 PM Jakub Jelinek  wrote:
>
> On Wed, Nov 25, 2020 at 07:32:44PM +0800, Hongtao Liu wrote:
> > Update patch:
> >   1. ix86_expand_special_args_builtin is used for expanding mask load
> > intrinsics, this function will always convert the constant mask
> > operands into reg. So for the situation of all-ones mask, keep this
> > constant, and also change the mask operand predicate(of corresponding
> > expander) to register_or_constm1_operand.
> >   2. Delete last_arg_constant which is not used in
> > ix86_expand_special_args_builtin(maybe should be in a separate patch?)
>
> Yes, please make it a separate patch, it should go in first and
> should just drop last_arg_constant from that function plus the
> reindentation.
>
> Then post the PR97642 incremental to that.
>

Updated.

> Thanks.
>
> Jakub
>

-- 
BR,
Hongtao
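
Why a masked load must not be turned into a full load plus blend can be
shown with a scalar model, written here in portable C rather than with
intrinsics.  A masked load touches only the elements whose mask bit is
set, so it is safe on a buffer shorter than the vector; a blend that
first loads the whole vector is not.  Only an all-ones mask makes the
two equivalent.  The names masked_load_model and VEC_ELEMS are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VEC_ELEMS 8		/* hypothetical: elements per vector */

/* Scalar model of a merge-masked load: for each element I, read
   MEM[I] only if bit I of MASK is set, otherwise keep SRC[I].
   Elements whose mask bit is clear are never dereferenced.  */
static void
masked_load_model (int32_t *dst, const int32_t *src,
		   const int32_t *mem, uint8_t mask)
{
  assert (dst && src && mem);
  for (int i = 0; i < VEC_ELEMS; i++)
    dst[i] = ((mask >> i) & 1) ? mem[i] : src[i];
}
```

Calling it with a three-element buffer and mask 0x07 reads exactly three
elements, which is the access pattern the new tests are meant to
preserve.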
From b1256b6ef8f877244f4955b9205d53797424fc27 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Tue, 3 Nov 2020 17:26:43 +0800
Subject: [PATCH 2/2] Fix incorrect replacement of vmovdqu32 with vpblendd
 which can cause fault.

gcc/ChangeLog:

	PR target/97642
	* config/i386/i386-expand.c
	(ix86_expand_special_args_builtin): Don't move all-ones mask
	operands into register.
	* config/i386/sse.md (UNSPEC_MASKLOAD): New unspec.
	(*_load_mask): New define_insns for masked load
	instructions.
	(_load_mask): Changed to define_expands which
	specifically handle memory or all-ones mask operands.
	(_blendm): Changed to define_insns which are same
	as original _load_mask with adjustment of
	operands order.
	(*_load): New define_insn_and_split which is
	used to optimize for masked load with all one mask.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bw-vmovdqu16-1.c: Adjust testcase to
	make sure only masked load instruction is generated.
	* gcc.target/i386/avx512bw-vmovdqu8-1.c: Ditto.
	* gcc.target/i386/avx512f-vmovapd-1.c: Ditto.
	* gcc.target/i386/avx512f-vmovaps-1.c: Ditto.
	* gcc.target/i386/avx512f-vmovdqa32-1.c: Ditto.
	* gcc.target/i386/avx512f-vmovdqa64-1.c: Ditto.
	* gcc.target/i386/avx512vl-vmovapd-1.c: Ditto.
	* gcc.target/i386/avx512vl-vmovaps-1.c: Ditto.
	* gcc.target/i386/avx512vl-vmovdqa32-1.c: Ditto.
	* gcc.target/i386/avx512vl-vmovdqa64-1.c: Ditto.
	* gcc.target/i386/pr97642-1.c: New test.
	* gcc.target/i386/pr97642-2.c: New test.
---
 gcc/config/i386/i386-expand.c |   8 +-
 gcc/config/i386/sse.md| 148 ++
 .../gcc.target/i386/avx512bw-vmovdqu16-1.c|   6 +-
 .../gcc.target/i386/avx512bw-vmovdqu8-1.c |   6 +-
 .../gcc.target/i386/avx512f-vmovapd-1.c   |   2 +-
 .../gcc.target/i386/avx512f-vmovaps-1.c   |   2 +-
 .../gcc.target/i386/avx512f-vmovdqa32-1.c |   2 +-
 .../gcc.target/i386/avx512f-vmovdqa64-1.c |   2 +-
 .../gcc.target/i386/avx512vl-vmovapd-1.c  |   4 +-
 .../gcc.target/i386/avx512vl-vmovaps-1.c  |   4 +-
 .../gcc.target/i386/avx512vl-vmovdqa32-1.c|   4 +-
 .../gcc.target/i386/avx512vl-vmovdqa64-1.c|   4 +-
 gcc/testsuite/gcc.target/i386/pr97642-1.c |  41 +
 gcc/testsuite/gcc.target/i386/pr97642-2.c |  77 +
 14 files changed, 263 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97642-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97642-2.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index e7768882158..d034612d9ee 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -10848,7 +10848,13 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
 
 	  op = fixup_modeless_constant (op, mode);
 
-	  if (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode)
+	  /* NB: 3-operands load implied it's a mask load,
+	 and that mask operand shoud be at the end.
+	 Keep all-ones mask which would be simplified by the expander.  */
+	  if (nargs == 3 && i == 2 && klass == load
+	  && constm1_operand (op, mode))
+	;
+	  else if (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode)
 	op = copy_to_mode_reg (mode, op);
 	  else
 	{
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 11936809561..c7f7aeec51d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -111,6 +111,8 @@ (define_c_enum "unspec" [
   UNSPEC_MASKOP
   UNSPEC_KORTEST
   UNSPEC_KTEST
+  ;; Mask load
+  UNSPEC_MASKLOAD
 
   ;; For embed. rounding feature
   UNSPEC_EMBEDDED_ROUNDING
@@ -1065,18 +1067,39 @@ (define_insn "mov_internal"
 	  ]
 	  (symbol_ref "true")))])
 
-(define_insn "_load_mask"
-  [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v,v")
+;; If mem_addr points to a memory region with less than whole vector size bytes
+;; of accessible memory and k is a mask that would prevent reading the inaccessible
+;; bytes from mem_addr, add UNSPEC_MASKLOAD to prevent it to be transformed to vpblendd
+;; See pr97642.
+(define_expand "_load_mask"
+  [(set (match_operand:V48_AVX512VL 0 "register_operand")

Re: [PATCH v2] rs6000, vector integer multiply/divide/modulo instructions

2020-11-25 Thread Segher Boessenkool
On Tue, Nov 24, 2020 at 08:34:51PM -0600, Pat Haugen wrote:
> On 11/24/20 8:17 PM, Pat Haugen via Gcc-patches wrote:
> > On 11/24/20 12:59 PM, Carl Love via Gcc-patches wrote:
> >> +(define_insn "modu_"
> >> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> >> +  (umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> >> +   (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> >> +  "TARGET_POWER10"
> >> +  "vmodu %0,%1,%2"
> >> +  [(set_attr "type" "vecdiv")
> >> +   (set_attr "size" "128")])
> > 
> > We should only be setting "size" "128" for instructions that operate on 
> > scalar 128-bit data items (i.e. 'vdivesq' etc). Since the above insns are 
> > either V2DI/V4SI (ala VIlong mode_iterator), they shouldn't be marked as 
> > size 128. If you want to set the size based on mode, (set_attr "size" 
> > "") should do the trick I believe.
> 
> Well, after you update "(define_mode_attr bits" in rs6000.md for V2DI/V4SI.

So far,  was only used for scalars.  I agree that for vectors it
makes most sense to do the element size (because the vector size always
is 128 bits, and for scheduling the element size can matter).  But, the
definitions of  and  now say

;; What data size does this instruction work on?
;; This is used for insert, mul and others as necessary.
(define_attr "size" "8,16,32,64,128" (const_string "32"))

and

;; How many bits in this mode?
(define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")
   (SF "32") (DF "64")])
so those need a bit of update as well then :-)


Segher


Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.

2020-11-25 Thread Hongtao Liu via Gcc-patches
Thanks for the review.
BTW, the patch is already installed because Uros reviewed it in another
thread:
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558682.html

On Thu, Nov 26, 2020 at 3:15 AM Jeff Law  wrote:
>
>
>
> On 11/11/20 1:03 AM, Hongtao Liu via Gcc-patches wrote:
>
> >
> >
> >
> > vec_set_rebaserebase_onr11-4901.patch
> >
> > From c9d684c37b5f79f68f938f39eeb9e7989b10302d Mon Sep 17 00:00:00 2001
> > From: liuhongt 
> > Date: Mon, 19 Oct 2020 16:04:39 +0800
> > Subject: [PATCH] Support variable index vec_set.
> >
> > gcc/ChangeLog:
> >
> >   PR target/97194
> >   * config/i386/i386-expand.c (ix86_expand_vector_set_var): New 
> > function.
> >   * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> >   * config/i386/predicates.md (vec_setm_operand): New predicate,
> >   true for const_int_operand or register_operand under TARGET_AVX2.
> >   * config/i386/sse.md (vec_set): Support both constant
> >   and variable index vec_set.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/i386/avx2-vec-set-1.c: New test.
> >   * gcc.target/i386/avx2-vec-set-2.c: New test.
> >   * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> >   * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> >   * gcc.target/i386/avx512f-vec-set-2.c: New test.
> >   * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> This is OK.  Sorry for the delays.
>
> jeff
>


-- 
BR,
Hongtao


[committed] patch fixing PR97983

2020-11-25 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97983

The patch was successfully bootstrapped on x86_64 and s390x (with 
--enable-languages=c,c++ --enable-checking=release --disable-multilib 
--with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions 
--enable-gnu-unique-object --enable-linker-build-id
--with-gcc-major-version-only --with-linker-hash-style=gnu
--enable-plugin --enable-initfini-array --with-isl 
--enable-gnu-indirect-function --with-long-double-128 --with-arch=zEC12 
--with-tune=z13 --enable-decimal-float)



commit 0ea3f28e49b1c936fae2b8a5a418440635c6b13a (HEAD -> master)
Author: Vladimir N. Makarov 
Date:   Wed Nov 25 17:06:13 2020 -0500

[PR97983] LRA: Use the right emit func for putting insn in the destination BB.

gcc/

2020-11-25  Vladimir Makarov  

PR bootstrap/97983
* lra.c (lra_process_new_insns): Use emit_insn_before_noloc or
emit_insn_after_noloc with the destination BB.
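
A toy model of why the choice of emit function matters, assuming a
plain doubly linked list in place of the insn chain and a head/end pair
in place of BB_HEAD/BB_END.  If the insertion point lies just past the
end of the destination block, the new node has to be added after the
block's last node so that the block's end pointer moves with it.  All
names are hypothetical, and the real emit functions do considerably
more than this.

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *prev, *next; int id; };
struct block { struct node *head, *end; };

/* Insert N after POS and extend BB if POS was its last node.  */
static void
insert_after (struct node *n, struct node *pos, struct block *bb)
{
  assert (n && pos && bb);
  n->prev = pos;
  n->next = pos->next;
  if (pos->next)
    pos->next->prev = n;
  pos->next = n;
  if (bb->end == pos)
    bb->end = n;
}

/* Insert N before POS and extend BB if POS was its first node.  */
static void
insert_before (struct node *n, struct node *pos, struct block *bb)
{
  assert (n && pos && bb);
  n->next = pos;
  n->prev = pos->prev;
  if (pos->prev)
    pos->prev->next = n;
  pos->prev = n;
  if (bb->head == pos)
    bb->head = n;
}

/* Place N in BB just ahead of TMP, mirroring the choice in the patch.  */
static void
place_in_block (struct node *n, struct node *tmp, struct block *bb)
{
  if (bb->end == tmp->prev)
    insert_after (n, tmp->prev, bb);	/* TMP lies past the block's end */
  else
    insert_before (n, tmp, bb);
}
```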

diff --git a/gcc/lra.c b/gcc/lra.c
index 4ec0f466376..a79213e32e0 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -1908,13 +1908,9 @@ lra_process_new_insns (rtx_insn *insn, rtx_insn *before, rtx_insn *after,
 		  tmp = NEXT_INSN (tmp);
 		if (NOTE_INSN_BASIC_BLOCK_P (tmp))
 		  tmp = NEXT_INSN (tmp);
-		for (curr = tmp; curr != NULL; curr = NEXT_INSN (curr))
-		  if (INSN_P (curr))
-		break;
 		/* Do not put reload insns if it is the last BB
-		   without actual insns.  In this case the reload insns
-		   can get null BB after emitting.  */
-		if (curr == NULL)
+		   without actual insns.  */
+		if (tmp == NULL)
 		  continue;
 		start_sequence ();
 		for (curr = after; curr != NULL_RTX; curr = NEXT_INSN (curr))
@@ -1927,7 +1923,11 @@ lra_process_new_insns (rtx_insn *insn, rtx_insn *before, rtx_insn *after,
 			 e->dest->index);
 		dump_rtl_slim (lra_dump_file, copy, NULL, -1, 0);
 		  }
-		emit_insn_before (copy, tmp);
+		/* Use the right emit func for setting up BB_END/BB_HEAD: */
+		if (BB_END (e->dest) == PREV_INSN (tmp))
+		  emit_insn_after_noloc (copy, PREV_INSN (tmp), e->dest);
+		else
+		  emit_insn_before_noloc (copy, tmp, e->dest);
 		push_insns (last, PREV_INSN (copy));
 		setup_sp_offset (copy, last);
 		/* We can ignore BB live info here as it and reg notes


Re: [PATCH] c++: v2: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-11-25 Thread Jason Merrill via Gcc-patches

On 11/25/20 1:50 PM, Jakub Jelinek wrote:

On Wed, Nov 25, 2020 at 12:26:17PM -0500, Jason Merrill wrote:

+ if (DECL_BIT_FIELD (fld)
+ && DECL_NAME (fld) == NULL_TREE)
+   continue;


I think you want to check DECL_PADDING_P here; the C and C++ front ends set
it on unnamed bit-fields, and that's what is_empty_type looks at.


Ok, changed in my copy.  I'll also post a patch for
__builtin_clear_padding to use DECL_PADDING_P in there instead of
DECL_BIT_FIELD/DECL_NAME==NULL.


+  if (TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
+   {
+ /* Don't perform array-to-pointer conversion.  */
+ arg = mark_rvalue_use (arg, loc, true);
+ if (!complete_type_or_maybe_complain (TREE_TYPE (arg), arg, complain))
+   return error_mark_node;
+   }
+  else
+   arg = decay_conversion (arg, complain);


bit_cast operates on an lvalue argument, so I don't think we want
decay_conversion at all here.


+  if (error_operand_p (arg))
+   return error_mark_node;
+
+  arg = convert_from_reference (arg);


This shouldn't be necessary; the argument should already be converted from
reference.  Generally we call convert_from_reference on the result of some
processing, not on an incoming argument.


Removing these two regresses some tests in the testsuite.
It is true that std::bit_cast's argument must be a reference, and so when
one uses std::bit_cast one won't run into these problems, but the builtin's
argument itself is an rvalue and so we need to deal with people calling it
directly.
So, commenting out the decay_conversion and convert_from_reference results
in:
extern V v;
...
   __builtin_bit_cast (int, v);
no longer being reported as invalid use of incomplete type, but
error: '__builtin_bit_cast' source size '' not equal to destination type size 
'4'
(note nothing in between '' for the size because the size is NULL).
Ditto for:
extern V *p;
...
   __builtin_bit_cast (int, *p);
I guess I could add some hand written code to deal with incomplete types
to cure these.  But e.g. decay_conversion also calls mark_rvalue_use which
we also need e.g. for -Wunused-but-set*, but don't we also need it e.g. for
lambdas? The builtin is after all using the argument as an rvalue
(reads it).


OK, it isn't exactly use as an rvalue, but I guess it's close enough to 
work.



Another change that commenting out those two parts causes is different
diagnostics on bit-cast4.C,
bit-cast4.C:7:30: error: '__builtin_bit_cast' is not a constant expression 
because 'const int* const' is a pointer type
bit-cast4.C:7:30: error: '__builtin_bit_cast' is not a constant expression 
because 'int D::* const' is a pointer to member type
bit-cast4.C:7:30: error: '__builtin_bit_cast' is not a constant expression 
because 'int (D::* const)() const' is a pointer to member type
The tests expect 'const int*', 'int D::*' and 'int (D::*)() const', i.e. the
toplevel qualifiers stripped from those.
Commenting out just the arg = convert_from_reference (arg); doesn't regress
anything though, it is the decay_conversion.


Let's just drop that part, then.


+/* Attempt to interpret aggregate of TYPE from bytes encoded in target
+   byte order at PTR + OFF with LEN bytes.  MASK contains bits set if the value
+   is indeterminate.  */
+
+static tree
+cxx_native_interpret_aggregate (tree type, const unsigned char *ptr, int off,
+   int len, unsigned char *mask,
+   const constexpr_ctx *ctx, bool *non_constant_p,
+   location_t loc)


Can this be, say, native_interpret_initializer in fold-const?  It doesn't
seem closely tied to the front end other than diagnostics that could move to
the caller, like you've already done for the non-aggregate case.


The middle-end doesn't need it ATM for anything, plus I think the
ctx/non_constant_p/loc and diagnostics is really C++ FE specific.
If you really want it in fold-const.c, the only way I can imagine it is
that it would be
tree
native_interpret_aggregate (tree type, const unsigned char *ptr, int off,
int len, unsigned char *mask = NULL,
tree (*mask_callback) (void *, int) = NULL,
void *mask_data = NULL)
where C++ would call it with the mask argument, as mask_callback a FE function
that would emit the diagnostics and decide what to return when mask is set
on something, and mask_data would be a pointer to struct containing
const constexpr_ctx *ctx; bool *non_constant_p; location_t loc;
for it.


Instead of a callback, the function could express mask-related failure 
in a way that the caller can detect and diagnose.  Perhaps by clearing 
mask elements that correspond to padding in the output, so you can again 
just scan through the mask to see if there are any elements set?


Jason
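
The suggestion of reporting mask-related failure through the mask
itself might look like the sketch below, in plain C with hypothetical
names (clear_padding_bits, mask_any_set); it is not taken from the
patch.  The interpreting routine clears mask bytes that correspond to
padding in the output, and the caller then scans what remains and issues
its own diagnostic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Clear LEN mask bytes starting at OFF.  These bytes are padding in
   the destination, so indeterminate source bits there are harmless.  */
static void
clear_padding_bits (unsigned char *mask, size_t off, size_t len)
{
  assert (mask != NULL);
  memset (mask + off, 0, len);
}

/* Return nonzero if any of the LEN mask bytes is still set, i.e. some
   indeterminate bit lands in a non-padding part of the result.  */
static int
mask_any_set (const unsigned char *mask, size_t len)
{
  assert (mask != NULL);
  for (size_t i = 0; i < len; i++)
    if (mask[i] != 0)
      return 1;
  return 0;
}
```

This keeps the front-end specific state (ctx, non_constant_p, loc) out
of the shared routine, at the cost of a second pass over the mask.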



[r11-5391 Regression] FAIL: gcc.target/i386/avx512vl-vxorpd-2.c execution test on Linux/x86_64

2020-11-25 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

bb07490abba850fd5b1d2d09d76d18b8bdc7d817 is the first bad commit
commit bb07490abba850fd5b1d2d09d76d18b8bdc7d817
Author: Jan Hubicka 
Date:   Wed Nov 25 20:51:26 2020 +0100

Add EAF_NODIRECTESCAPE flag

caused

FAIL: gcc.target/i386/avx512vl-vandnpd-2.c execution test
FAIL: gcc.target/i386/avx512vl-vandpd-2.c execution test
FAIL: gcc.target/i386/avx512vl-vorpd-2.c execution test
FAIL: gcc.target/i386/avx512vl-vxorpd-2.c execution test

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-5391/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vandnpd-2.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vandnpd-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vandnpd-2.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vandnpd-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vandpd-2.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vandpd-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vandpd-2.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vandpd-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vorpd-2.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vorpd-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vorpd-2.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vorpd-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vxorpd-2.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vxorpd-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vxorpd-2.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512vl-vxorpd-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at skpgkp2 at gmail dot com)


Re: How to traverse all the local variables that declared in the current routine?

2020-11-25 Thread Martin Sebor via Gcc-patches

On 11/24/20 9:54 AM, Qing Zhao via Gcc-patches wrote:




On Nov 24, 2020, at 9:55 AM, Richard Biener  wrote:

On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao  wrote:





On Nov 24, 2020, at 1:32 AM, Richard Biener  wrote:

On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches
 wrote:


Hi,

Does gcc provide an iterator to traverse all the local variables that are 
declared in the current routine?

If not, what’s the best way to traverse the local variables?


Depends on what for.  There's the source-level view you get by walking
BLOCK_VARS of the scope tree, there's cfun->local_variables
(FOR_EACH_LOCAL_DECL), and there are SSA names (FOR_EACH_SSA_NAME).


I am planning to add a new phase immediately after
“pass_late_warn_uninitialized” to initialize all auto-variables that are
not explicitly initialized in the declaration.  The basic idea is as follows:

** The proposal:

A. add a new GCC option: (same name and meaning as CLANG)
-ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;

B. add a new attribute for variable:
__attribute__((uninitialized))
the marked variable is intentionally left uninitialized for performance reasons.

C. The implementation needs to keep the current static warning on uninitialized
variables untouched in order to avoid "forking the language".
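
As a rough illustration of the intended effect only: in zero mode, an
automatic variable without an initializer would behave as if it had been
zeroed at its declaration.  The sketch below spells that out by hand
with memset.  It does not use the proposed option or attribute, and the
struct and function names are invented.

```c
#include <assert.h>
#include <string.h>

struct packet { int id; char tag[4]; };

/* What zero mode would conceptually do for "struct packet p;".  */
static struct packet
make_packet_zero_init (void)
{
  struct packet p;		/* no explicit initializer */
  /* Written by hand here; inserted by the compiler under the proposal.  */
  memset (&p, 0, sizeof p);
  assert (p.id == 0);
  return p;
}
```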


** The implementation:

There are two major requirements for the implementation:

1. all auto-variables that do not have an explicit initializer should be
initialized to zero by this option.  (Same behavior as CLANG)

2. keep the current static warning on uninitialized variables untouched.

In order to satisfy 1, we should check whether an auto-variable has initializer
or not;
In order to satisfy 2, we should add this new transformation after
"pass_late_warn_uninitialized".

So, we should be able to check whether an auto-variable has initializer or not 
after “pass_late_warn_uninitialized”,
If Not, then insert an initialization for it.

For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?


Yes, but do you want to catch variables promoted to registers as well,
or just variables on the stack?


I think both as long as they are source-level auto-variables. Then which one is 
better?




Another issue is, in order to check whether an auto-variable has initializer, I 
plan to add a new bit in “decl_common” as:
  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
  unsigned decl_is_initialized :1;

/* IN VAR_DECL, set when the decl is initialized at the declaration.  */
#define DECL_IS_INITIALIZED(NODE) \
  (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)

set this bit when setting DECL_INITIAL for the variables in FE. then keep it
even though DECL_INITIAL might be NULLed.


For locals it would be more reliable to set this flag during gimplification.


You mean I can set the flag “DECL_IS_INITIALIZED (decl)” inside the routine
“gimplify_decl_expr” (gimplify.c) as follows:

   if (VAR_P (decl) && !DECL_EXTERNAL (decl))
 {
   tree init = DECL_INITIAL (decl);
...
   if (init && init != error_mark_node)
 {
   if (!TREE_STATIC (decl))
{
  DECL_IS_INITIALIZED(decl) = 1;
}

Is this enough for all Frontends? Are there other places that I need to 
maintain this bit?





Do you have any comments or suggestions?


As said above - do you want to cover registers as well as locals?


All the locals from the source-code point of view should be covered.  (From my
study so far, it looks like Clang adds that phase in the FE.)
If GCC adds this phase in FE, then the following design requirement

C. The implementation needs to keep the current static warning on uninitialized
variables untouched in order to avoid "forking the language”.

cannot be satisfied, since gcc’s uninitialized variables analysis is applied 
quite late.

So, we have to add this new phase after “pass_late_warn_uninitialized”.


  I'd do
the actual zeroing during RTL expansion instead since otherwise you
have to figure out yourself whether a local is actually used (see expand_stack_vars).


Adding this new transformation during RTL expansion is okay.  I will check on 
this in more detail to see how to add it to the RTL expansion phase.


Note that optimization will already have made use of the "uninitialized" state
of locals so depending on what the actual goal is here "late" may be too late.


This is a really good point…

In order to prevent optimizations from using the “uninitialized” state of locals, we 
should add the zeroing phase as early as possible (adding it in the FE might be best
for this issue). However, if we have to meet the following requirement:

C. The implementation needs to keep the current static warning on uninitialized
variables untouched in order to avoid "forking the language”.

We have to move the new phase after all the uninitialized analysis is done in 
order to avoid “forking the language”.

So, this is a problem that is not easy to resolve.

Do you have any suggestions on this?


Not having th

Re: [07/23] Add a class that multiplexes two pointer types

2020-11-25 Thread Martin Sebor via Gcc-patches

On 11/13/20 1:14 AM, Richard Sandiford via Gcc-patches wrote:

This patch adds a pointer_mux class that provides similar
functionality to:

 union { T1 *a; T2 *b; };
 ...
 bool is_b_rather_than_a;

except that the is_b_rather_than_a tag is stored in the low bit
of the pointer.  See the comments in the patch for a comparison
between the two approaches and why this one can be more efficient.

I've tried to microoptimise the class a fair bit, since a later
patch uses it extensively in order to keep the sizes of data
structures down.
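
To make the low-bit tagging idea above concrete, here is a minimal sketch. It is not the pointer_mux class from the patch: the name tagged_ptr_sketch and its members are hypothetical, and it assumes that converting a suitably aligned object pointer to std::uintptr_t leaves the low bit clear, which is implementation-defined behaviour rather than something the language guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration only; not the pointer_mux class from the patch.
// Stores either a T1 * ("A") or a nonnull T2 * ("B") in one word, using the
// low bit of the address as the tag.
template <typename T1, typename T2>
class tagged_ptr_sketch
{
  static_assert (alignof (T1) > 1 && alignof (T2) > 1,
		 "the low bit must be free in both pointer types");

public:
  // Return an A pointer with the given value (may be null).
  static tagged_ptr_sketch first (T1 *a)
  {
    return tagged_ptr_sketch (reinterpret_cast<std::uintptr_t> (a));
  }

  // Return a B pointer with the given (nonnull) value.
  static tagged_ptr_sketch second (T2 *b)
  {
    assert (b != nullptr);
    return tagged_ptr_sketch (reinterpret_cast<std::uintptr_t> (b) | 1);
  }

  // True if the stored pointer is a B.
  bool is_second () const { return m_bits & 1; }

  // Return the A pointer, or null if a B is stored.
  T1 *as_first () const
  {
    return is_second () ? nullptr : reinterpret_cast<T1 *> (m_bits);
  }

  // Return the B pointer (tag bit cleared), or null if an A is stored.
  T2 *as_second () const
  {
    return (is_second ()
	    ? reinterpret_cast<T2 *> (m_bits & ~std::uintptr_t (1))
	    : nullptr);
  }

private:
  explicit tagged_ptr_sketch (std::uintptr_t bits) : m_bits (bits) {}

  std::uintptr_t m_bits;
};
```

Because a B is never null, a null result from as_second () can only mean "this is an A", which is the single run-time test the description refers to.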


I've been reading these changes more out of interest than to
provide comments.  I like your use of C++ -- you clearly know
the language very well.  I also appreciate the extensive
commentary.  It makes understanding the code (and the changes)
much easier.  Thank you for doing that!  We should all aspire
to follow your example! :)

I do have one concern: the tendency to prioritize efficiency
over safety (this can be said about most GCC code). Specifically
in this class, the address bit twiddling makes me uneasy.  I don't
think the object model in either language (certainly not C, and
I don't have the impression C++ does either) makes it unequivocally
valid.  On the contrary, I'd say many of us interpret the current
rules as leaving it undefined.  There are efforts to sanction
this sort of thing under some conditions (e.g., the C object
model proposal) but they have not been adopted yet.  I think
we should try to avoid exploiting these dark corners in new
code.

I'm not too concerned that it will break with some compilers
(it might, but code like this is out there already and works).
What I worry about is that it will either prevent or make much more
difficult any access checking that might otherwise be possible.
I also worry that it will encourage people who look to GCC code
for examples to duplicate these tricks in their own code, making
it in turn harder for us to help them detect bugs in it.

Having said that, I looked for tests that verify this new utility
class (and the others in this series), partly to get a better
idea of how it's meant to be used.  I couldn't find any.  I'd
expect every nontrivial, general-purpose utility class to come
with tests.  (Having a library of these components might make
testing easier.)

Martin



gcc/
* mux-utils.h: New file.
---
  gcc/mux-utils.h | 248 
  1 file changed, 248 insertions(+)
  create mode 100644 gcc/mux-utils.h

diff --git a/gcc/mux-utils.h b/gcc/mux-utils.h
new file mode 100644
index 000..17ced49cd22
--- /dev/null
+++ b/gcc/mux-utils.h
@@ -0,0 +1,248 @@
+// Multiplexer utilities
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify it under
+// the terms of the GNU General Public License as published by the Free
+// Software Foundation; either version 3, or (at your option) any later
+// version.
+//
+// GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+// WARRANTY; without even the implied warranty of MERCHANTABILITY or
+// FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+// for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with GCC; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef GCC_MUX_UTILS_H
+#define GCC_MUX_UTILS_H 1
+
+// A class that stores a choice "A or B", where A has type T1 * and B has
+// type T2 *.  Both T1 and T2 must have an alignment greater than 1, since
+// the low bit is used to identify B over A.  T1 and T2 can be the same.
+//
+// A can be a null pointer but B cannot.
+//
+// Barring the requirement that B must be nonnull, using the class is
+// equivalent to using:
+//
+// union { T1 *A; T2 *B; };
+//
+// and having a separate tag bit to indicate which alternative is active.
+// However, using this class can have two advantages over a union:
+//
+// - It avoids the need to find somewhere to store the tag bit.
+//
+// - The compiler is aware that B cannot be null, which can make checks
+//   of the form:
+//
+//   if (auto *B = mux.dyn_cast ())
+//
+//   more efficient.  With a union-based representation, the dyn_cast
+//   check could fail either because MUX is an A or because MUX is a
+//   null B, both of which require a run-time test.  With a pointer_mux,
+//   only a check for MUX being A is needed.
+template
+class pointer_mux
+{
+public:
+  // Return an A pointer with the given value.
+  static pointer_mux first (T1 *);
+
+  // Return a B pointer with the given (nonnull) value.
+  static pointer_mux second (T2 *);
+
+  pointer_mux () = default;
+
+  // Create a null A pointer.
+  pointer_mux (std::nullptr_t) : m_ptr (nullptr) {}
+
+  // Create an A or B pointer with the given value.  This is only valid
+  // if T1 and T2 are distinct and if T can be resolved to exactly one
+  // of them.
+  templ

Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-25 Thread Stefan Kanthak
Jeff Law  wrote:

[...]

> By understanding how your proposed changes affect other processors, you
> can write better changes that are more likely to get included. 
> Furthermore you can focus efforts on things that matter more in the real
> world. DImode shifts in libgcc are _not_ useful to try and optimize on
> x86_64 as it has instructions to implement 64 bit shifts. DImode shifts
> in libgcc are not useful to try and optimize on i686 as the compiler can
> synthesize them on i686. DImode shifts can't be easily synthesized on
> other targets and on those targets we call the routines in libgcc2. 
> Similarly for other routines you find in libgcc2.

What makes you think that my patches addressed only i386 and AMD64?

Again: from the absence of __addDI3/__subDI3 in libgcc2.[ch] I had reason
to assume that GCC synthesizes "double-word" addition/subtraction on all
processors, not just on x86.
Since "double-word" comparison and shifts are likewise simple operations
I further assumed that GCC synthesizes them too on all processors.

What's the fundamental difference between subtraction and comparison?
Why does GCC generate calls to __[u]cmpDI2 for a simple comparison
instead of synthesizing it?
And: as shown in libgcc2.c, "double-word" shifts can easily be synthesized
using "single-word" shifts plus logical OR on any target.
I expected GCC to synthesize these operations on non-x86 processors too,
just like "double-word" addition and subtraction.
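
For readers following along, the synthesis described above can be sketched as below. This is an illustration under stated assumptions, not the libgcc2.c source: the names dword_sketch and shift_left_sketch are hypothetical, a "word" is assumed to be 32 bits, and only the logical left shift is shown. Note that the routine itself uses nothing wider than word-sized shifts and OR, so it does not depend on the operation it implements.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration only; a 64-bit value held as two 32-bit words.
struct dword_sketch
{
  std::uint32_t low;
  std::uint32_t high;
};

// Shift V left by COUNT bits (0 <= COUNT < 64) using only 32-bit shifts
// and a logical OR.
dword_sketch shift_left_sketch (dword_sketch v, unsigned count)
{
  assert (count < 64);
  if (count == 0)
    return v;

  dword_sketch result;
  if (count >= 32)
    {
      // Everything that survives comes from the low word.
      result.low = 0;
      result.high = v.low << (count - 32);
    }
  else
    {
      // The bits shifted out of the low word carry into the high word.
      result.low = v.low << count;
      result.high = (v.high << count) | (v.low >> (32 - count));
    }
  return result;
}
```

Whether a compiler expands this inline or calls a library routine is a per-target code-size and cost decision, which is the point under discussion in this thread.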

A possible/reasonable explanation would be code size, i.e. if the synthesized
instructions need significantly more memory than the function call (including
the argument setup of course).

Stefan


Re: [PATCH] libgfortran: Correct FP feature macro checks

2020-11-25 Thread Tobias Burnus

On 25.11.20 23:02, Maciej W. Rozycki wrote:


Well, it does not link, for the somewhat unsurprising reason of a missing
libgfortran runtime.  I have modified the program with whatever little
Fortran skills I gained a while ago to get something that can be parseable
for a human being in the assembly form:


You could also try -fdump-tree-original or -fdump-parse-tree
which might be a bit more readable than assembler – at least
it avoids the problem of D-floating format.


   real(8) :: h = HUGE(1.0)
   real(8) :: e = EPSILON(1.0)

   print *,h
   print *,e
end


huge and epsilon are defined as:
  huge(x) = (1 - b**(-p)) * b**(emax-1) * b
  epsilon(x) = b**(1-p)
with
  b = radix = REAL_MODE_FORMAT (mode)->b
  p = digits = REAL_MODE_FORMAT (mode)->p
  emax = max_exponent = REAL_MODE_FORMAT (mode)->emax

For C/C++, it is defined with
  %s = FLT, DBL, LDBL + FLT%d%s with %d e.g. 128 and %s = "" or "X" if extended
as
  __%s_MAX__ = builtin_define_with_hex_fp_value (... decimal_dig ...)
with
  decimal_dig = 1 + fmt->p * log10_b
and
  __%s_EPSILON__
as
  if (fmt->pnan < fmt->p)
    /* This is an IBM extended double format, so 1.0 + any double is
       representable precisely.  */
    sprintf (buf, "0x1p%d", fmt->emin - fmt->p);
  else
    sprintf (buf, "0x1p%d", 1 - fmt->p);
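
As a quick sanity check of the huge/epsilon formulas above, the following sketch evaluates them for a radix-2 format. The struct fp_format_sketch and both functions are hypothetical names, not GCC's real_format; the sketch handles b = 2 only, and the usage below assumes p = 53 and emax = 1024, the values C exposes as DBL_MANT_DIG and DBL_MAX_EXP for IEEE 754 double.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the three format parameters used above.
struct fp_format_sketch
{
  int b;     // radix
  int p;     // digits
  int emax;  // max_exponent
};

// huge(x) = (1 - b**(-p)) * b**(emax-1) * b, for radix 2 only.
// The last factor is applied separately so that b**emax, which is not
// representable, is never formed.
double huge_sketch (const fp_format_sketch &f)
{
  assert (f.b == 2);
  return (1.0 - std::ldexp (1.0, -f.p)) * std::ldexp (1.0, f.emax - 1) * 2.0;
}

// epsilon(x) = b**(1-p), for radix 2 only.
double epsilon_sketch (const fp_format_sketch &f)
{
  assert (f.b == 2);
  return std::ldexp (1.0, 1 - f.p);
}
```

With { 2, 53, 1024 } these evaluate to the largest finite double and to 2**-52 respectively, matching DBL_MAX and DBL_EPSILON.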


Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [C PATCH] Do not drop qualifiers for _Atomic in typeof

2020-11-25 Thread Joseph Myers
On Wed, 25 Nov 2020, Uecker, Martin wrote:

> So OK to apply with the following Changelog?

OK fixed as noted.

> 2020-11-25  Martin Uecker  
> 
> gcc/c/

Should mention the PR number in the ChangeLog entry (the part that will 
end up automatically added to the ChangeLog file), not just the summary 
line.

> * c-parsers.c (c_parser_declaration_or_fndef): Remove redundant 
> code

It's c-parser.c.  The git hooks will complain if the file names mentioned 
don't match the files changed in the commit.

> gcc/ginclude/
>   * ginclude/stdatomic.h: Use comma operator to drop qualifiers.

gcc/ginclude/ doesn't have its own ChangeLog, this entry goes in gcc/.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 00/31] VAX: Bring the port up to date (yes, MODE_CC conversion is included)

2020-11-25 Thread coypu--- via Gcc-patches
On Tue, Nov 24, 2020 at 05:27:10AM +, Maciej W. Rozycki wrote:
> On Tue, 24 Nov 2020, Maciej W. Rozycki wrote:
> 
> > > I don't know how or why __FLT_HAS_INFINITY is set for a target which
> > > does not support it, but if you get rid of that macro, that particular
> > > problem should be solved.
> > 
> >  Thanks for the hint; I didn't look into it any further not to distract 
> > myself from the scope of the project.  I have now, and the check you have 
> > quoted is obviously broken (as are all the remaining similar ones), given:
> > 
> > $ vax-netbsdelf-gcc -E -dM - < /dev/null | sort | grep _HAS_
> > #define __DBL_HAS_DENORM__ 0
> > #define __DBL_HAS_INFINITY__ 0
> > #define __DBL_HAS_QUIET_NAN__ 0
> > #define __FLT_HAS_DENORM__ 0
> > #define __FLT_HAS_INFINITY__ 0
> > #define __FLT_HAS_QUIET_NAN__ 0
> > #define __LDBL_HAS_DENORM__ 0
> > #define __LDBL_HAS_INFINITY__ 0
> > #define __LDBL_HAS_QUIET_NAN__ 0
> > $ 
> > 
> > which looks reasonable to me.  This seems straightforward to fix to me, so 
> > I'll include it along with verification I am about to schedule (assuming 
> > that this will be enough for libgfortran to actually build; obviously it 
> > hasn't been tried by anyone with such a setup for a while now, as these 
> > libgfortran checks date back to 2009).
> 
>  Well, it is still broken, owing to NetBSD failing to implement POSIX 2008 
> locale handling correctly, apparently deliberately[1], and missing 
> uselocale(3)[2] while still providing newlocale(3).  This confuses our 
> conditionals and consequently:
> 
> .../libgfortran/io/transfer.c: In function 'data_transfer_init_worker':
> .../libgfortran/io/transfer.c:3416:30: error:
> 'old_locale_lock' undeclared (first use in this function)
>  3416 |   __gthread_mutex_lock (&old_locale_lock);
>   |  ^~~
> 
> etc.
> 
>  We can probably work it around by downgrading to setlocale(3) for NetBSD 
> (i.e. whenever either function is missing) unless someone from the NetBSD 
> community contributes a better implementation (they seem to prefer their 
> own non-standard printf_l(3) library API).

Hi Maciej,

I've been building successfully with setting:
export ac_cv_func_freelocale=no
export ac_cv_func_newlocale=no
export ac_cv_func_uselocale=no

I think the code to avoid these functions already exists, but just the
configure tests need tuning.

Also, this is amazing work!


Re: [PATCH 00/31] VAX: Bring the port up to date (yes, MODE_CC conversion is included)

2020-11-25 Thread Joseph Myers
On Wed, 25 Nov 2020, Maciej W. Rozycki wrote:

>  For the other pieces that are missing perhaps my work I did many years 
> ago to port glibc 2.4 (the last one I was able to cook up without NPTL), 
> and specifically libm within, to the never-upstreamed VAX/Linux target 
> might be useful to complete the effort, as there seems to be an overlap 
> here.  That port hasn't been fully verified though and I do not promise 
> doing any work related to it anytime either.  The glibc patches continue 
> being available online to download and use under the terms of the GNU GPL 
> for anyone though.

I think I mentioned before: if you wish to bring a VAX port back to 
current glibc, I think it would make more sense to use software IEEE 
floating point rather than adding new support to glibc for a non-IEEE 
floating-point format.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/25/20 2:22 PM, Stefan Kanthak wrote:
> Jakub Jelinek  wrote:
>
>> On Wed, Nov 25, 2020 at 09:22:53PM +0100, Stefan Kanthak wrote:
>>>> As Jakub has already indicated, your change will result in infinite
>>>> recursion on avr. I happened to have a cr16 handy and it looks like
>>>> it'd generate infinite recursion there too.
>>> JFTR: does GCC emit a warning then? If not: why not?
>> Why should it.  libgcc is something GCC has full control over and can assume
>> it is written in a way that it can't recurse infinitely, that is one of
>> libgcc design goals.
> Where are these design goals documented?
The internals manual is the best source of information on these
routines.  But the implications of what you're going to read there won't
be clear unless you think hard and probably spend some time working with
embedded targets.  Jakub is absolutely, 100% correct here.

>>> Since I neither have an avr nor a cr16 here, and also no TR-440, no S/3x0,
>>> no Spectra-70, no PDP-11, no VAX, no SPARC, no MIPS, no PowerPC, no MC68k,
>>> no NSC16xxx and no NSC32xxx any more, GCC only gives me access to the x86
>>> code it generates.
>> You can always use cross-compilers.
> There are no cross-compilers available here.
> Why should I waste time and energy to build cross-compilers for processors
> I don't use?
If all we had to care about was x86, then your changes would likely be
fine, but GCC targets dozens of distinct processors from deeply embedded
to mainframes and we have to consider how changes impact all of them,
both from a correctness and from a performance standpoint.

By understanding how your proposed changes affect other processors, you
can write better changes that are more likely to get included. 
Furthermore you can focus efforts on things that matter more in the real
world.  DImode shifts in libgcc are _not_ useful to try and optimize on
x86_64 as it has instructions to implement 64 bit shifts.  DImode shifts
in libgcc are not useful to try and optimize on i686 as the compiler can
synthesize them on i686.  DImode shifts can't be easily synthesized on
other targets and on those targets we call the routines in libgcc2. 
Similarly for other routines you find in libgcc2.



>
> [...]
>
>> As has been said multiple times, trying to optimize routines that are never
>> called on x86 for x86
> According to Andreas Schwab, the 64-bit shift routines are called on 32-bit
> processors when compiling with -Os!
> x86 is a 32-bit processor.
He's making a generalization.  It's not going to apply to every 32bit
processor.  Nor would it even necessarily apply to every 16 bit
processor.  But I'm not sure what the point of arguing about this is. 
What you're trying to do with the shift routines is wrong.

>
>> is just wasted energy, better invest your time in
>> functions that are actually ever called.
> Where is the documentation that names the routines (not) called on the >50
> target architectures?
Nobody's bothered to document that.  I don't think doing so would
ultimately be useful.   I do think there is some documentation on what
we used to call libgcc1 as those routines have typically been provided
as assembly code on the targets where they're needed (think 32bit
arithmetic on something like an 8 or 16 bit target).  I haven't looked
at or needed to look for that documentation in over 20 years, so I don't
know if it's still around.


> And why do you build and ship routines which are never called at all?
> This is a waste of time, space and energy!
They are used on some and not others.

I'm sorry you think it's a waste of time, space and energy.  These bits
have a purpose and it would be more work than it's worth to try and
selectively build and ship these routines on a target by target basis. 
They're part of the source tree, they get built into the library and
when they are actually needed, they will be used.  As a generality
you're going to find that most are not needed on x86.  So spending time
optimizing them for x86 just isn't all that useful.

jeff



Re: H8 cc0 conversion

2020-11-25 Thread Hans-Peter Nilsson
On Tue, 24 Nov 2020, Eric Botcazou wrote:

> > I'm interested in any notes, however vague, on that matter.  I was
> > a bit surprised to see that myself...that is, after fixing
> > *some* related regressions, like the one in combine.  (Did I
> > actually miss something?)
>
> My recollection for the Visium port would align with what Jeff saw but, on the
> other hand, this could have been very marginal a phenomenon in the end.

Thanks.  Though, without claims substantiated as anything more
than a feeling, I'm going to stick out my chin and say that you're
both seasoned enough gcc hackers to be influenced by earlier
experience, and that things have changed enough that this is no
longer true.

Also, early-debug cause-misattributions may be a factor.  (For
the combine thing, I first suspected it being target rtx_costs.)

Also for visium, it may very well be a remaining odd case in
dbr/reorg.  We've only fixed a few paths in that pile, but that
hasn't had any effect in *my* benchmarks.  Hm, I also realize I
can't speak about scheduling and LRA.

With the alternative being the machine description exploding
(linearly) with error-prone edits, I'll insist that for this
kind of machine it's better to expose the target_flags_regnum
clobbers before reload.  So, better try this approach first,
when it costs "nothing", before going for the big(ger) edit of
adding define_insn_and_splits for just-about-everything (bigger
than adding a register clobber to most patterns).

Current cc0 head-count is down to avr, cr16, h8300, vax, with
two of them recently having patches posted, alas not a lot of
ports left to try this advice.

brgds, H-P


[PATCH] [tree-optimization] Optimize max/min pattern with comparison

2020-11-25 Thread Eugene Rozenfeld via Gcc-patches
Make the following simplifications:
X <= MAX(X, Y) -> true
X > MAX(X, Y) -> false
X >= MIN(X, Y) -> true
X < MIN(X, Y) -> false

This fixes PR96708.

Tested on x86_64-pc-linux-gnu.

bool f(int a, int b)
{
    int tmp = (a < b) ? b : a;
    return tmp >= a;
}

Code without the patch:

vmovd  xmm0,edi
vmovd  xmm1,esi
vpmaxsd xmm0,xmm0,xmm1
vmovd  eax,xmm0
cmp    eax,edi
setge  al
ret

Code with the patch:

mov    eax,0x1
ret
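
The four identities listed at the top of this message can be checked mechanically. The sketch below is only a sanity check on plain ints, not part of the patch; the function name is hypothetical.

```cpp
#include <algorithm>

// Hypothetical helper: returns true if all four simplifications from the
// patch description hold for this particular pair of ints.
bool max_min_identities_hold_sketch (int x, int y)
{
  return (x <= std::max (x, y))       // X <= MAX(X, Y) -> true
	 && !(x > std::max (x, y))    // X >  MAX(X, Y) -> false
	 && (x >= std::min (x, y))    // X >= MIN(X, Y) -> true
	 && !(x < std::min (x, y));   // X <  MIN(X, Y) -> false
}
```

Calling it over a small grid of (x, y) pairs, including equal and negative values, exercises every ordering of the two operands.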

Eugene


0001-Optimize-max-pattern-with-comparison.patch


Re: [PATCH] libgfortran: Correct FP feature macro checks

2020-11-25 Thread Maciej W. Rozycki
On Wed, 25 Nov 2020, Thomas Koenig wrote:

> >xbig = 26.543, xhuge = 6.71e+7, xmax = 2.53e+307;
> 
> The Fortran intrinsics like HUGE, EPSILON, SELECTED_REAL_KIND etc
> would have to be handled correctly, both for simplification in
> the front end and in the library.
> 
> Does the program
> 
>   print *,HUGE(1.0)
>   print *,EPSILON(1.0)
> end
> 
> print correct values?

 Well, it does not link, for the somewhat unsurprising reason of a missing 
libgfortran runtime.  I have modified the program with whatever little 
Fortran skills I gained a while ago to get something that can be parseable 
for a human being in the assembly form:

  real(8) :: h = HUGE(1.0)
  real(8) :: e = EPSILON(1.0)

  print *,h
  print *,e
end

This yields the following data produced for the real literals referred:

.align 2
.type   h.2, @object
.size   h.2, 8
h.2:
.long   -32769
.long   0
.align 2
.type   e.1, @object
.size   e.1, 8
e.1:
.long   13568
.long   0

which made me realise the NetBSD target defaults to the D-floating format 
for some reason, which is significantly different in the range supported 
from IEEE 754 double; specifically it has 1 sign bit, 8 exponent bits 
(like IEEE 754 single) and 55 trailing significand bits.
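
To read the .long pairs above by hand, a small decoder helps. The sketch below is an illustration, not anything from GCC or NetBSD: the function name is hypothetical, and the word layout is an assumption not stated in this message, namely that the low 16 bits of the first long hold the sign (bit 15), the excess-128 exponent (bits 14:7) and the top 7 fraction bits, with the remaining 16-bit words holding progressively less significant fraction bits.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: decode a D-floating value from the two values
// printed as ".long" directives, in the order they appear in the listing.
double decode_d_floating_sketch (std::int32_t first, std::int32_t second)
{
  std::uint32_t lo = static_cast<std::uint32_t> (first);
  std::uint32_t hi = static_cast<std::uint32_t> (second);
  std::uint16_t w0 = lo & 0xffff, w1 = lo >> 16;
  std::uint16_t w2 = hi & 0xffff, w3 = hi >> 16;

  bool negative = (w0 >> 15) & 1;
  int exponent = (w0 >> 7) & 0xff;
  if (exponent == 0)
    return 0.0;  // Zero; the reserved-operand case is ignored in this sketch.

  // 55 stored fraction bits plus the hidden leading 1: a 56-bit significand
  // representing a value in [0.5, 1).
  std::uint64_t significand = (std::uint64_t (1) << 55)
			      | (std::uint64_t (w0 & 0x7f) << 48)
			      | (std::uint64_t (w1) << 32)
			      | (std::uint64_t (w2) << 16)
			      | std::uint64_t (w3);

  // Converting 56 bits to an IEEE double may round for arbitrary inputs.
  double value = std::ldexp (static_cast<double> (significand),
			     exponent - 128 - 56);
  return negative ? -value : value;
}
```

Under that assumed layout, the pair (-32769, 0) decodes to (1 - 2**-24) * 2**127 and the pair (13568, 0) decodes to 2**-23.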

 So I am going to give up on giving this format any support, sorry (for 
the VAX/Linux port I chose to use the G-floating format, which at least 
gives some hope for interchangeability, even though I realise in some uses 
the extra mantissa bits D-floating provides are useful).

 In any case the output above does not appear exactly right to me as I 
would expect:

h.2:
.long   -32769
.long   -1

(`e' seems right to me).

  Maciej


Re: [PATCH] Fixup additional search path in offload_additional_options

2020-11-25 Thread Thomas Schwinge
Hi!

On 2020-11-25T15:44:56+, Richard Biener  wrote:
> On Wed, 25 Nov 2020, Jakub Jelinek wrote:
>
>> On Wed, Nov 25, 2020 at 04:30:44PM +0100, Richard Biener wrote:
>> > This fixes the search when configured with --libexecdir=lib64,

(I can't comment on that one specifically, not using this option.)

>> > I've adjusted the bin reference for consistency.
>> >
>> > Testing in progress.  Does this look sensible?
>> >
>> > 2020-11-25  Richard Biener  
>> >
>> > libgomp/
>> >* configure: Regenerate.
>> >* plugin/configfrag.ac (offload_additional_options): Use
>> >$(libexecdir) and $(bindir) instead of hard-coding them.
>>
>> LGTM.
>>
>>  Jakub.
>
> Hmm, but $(libexecdir) includes the prefix, thus expands to
> /usr/lib64 for me.

> So what's the tgt_dir used for besides
> populating offload_additional_options?

As far as I can tell, in addition to 'libgomp/plugin/' ('tgt_dir'), as
you've found, the 'path' argument to
'--enable-offload-targets=target=path' is only also used in
'liboffloadmic/plugin/' ('accel_search_dir') to populate search paths
(I'm not familiar with that one in detail).  At least in the libgomp
case, these search paths are (supposed to be) used only for build-tree
testing.

So, it seems this doesn't actually match the description in
'gcc/doc/install.texi':

@item --enable-offload-targets=@var{target1}[=@var{path1}],@dots{},@var{targetN}[=@var{pathN}]
Enable offloading to targets @var{target1}, @dots{}, @var{targetN}.
Offload compilers are expected to be already installed.  Default search
path for them is @file{@var{exec-prefix}}, but it can be changed by
specifying paths @var{path1}, @dots{}, @var{pathN}.

... which (to me) sounds as if these search paths would apply not only
for build-tree testing, but also for installed usage?


> That said, in this
> very spot not specifying it would work for me I guess,
> historically I have used /usr/nvptx as path for reasons
> I do not remember :/ (newlib is installed in this location)

I too configure/install, for example, the offload compilers into their
own prefix, for example: '[install]/offload-nvptx-none/' instead of
'[install]/' (which would be '/usr' in your case, I suppose), because I
don't like the host and offloading compilers overwriting each other's
files.

For that to work, I then need to set up some symlinks so that the host
compiler can find the 'mkoffload's:

lto-wrapper: fatal error: could not find accel/nvptx-none/mkoffload in 
[...]/install/bin/../libexec/gcc/x86_64-pc-linux-gnu/10.0.1/:[...]/install/bin/../libexec/gcc/
 (consider using '-B')

$ ls -l install/bin/../libexec/gcc/accel/*
[...] install/bin/../libexec/gcc/accel/amdgcn-amdhsa -> 
../../../offload-amdgcn-amdhsa/libexec/gcc/x86_64-pc-linux-gnu/11.0.0/accel/amdgcn-amdhsa
[...] install/bin/../libexec/gcc/accel/nvptx-none -> 
../../../offload-nvptx-none/libexec/gcc/x86_64-pc-linux-gnu/11.0.0/accel/nvptx-none
[...] install/bin/../libexec/gcc/accel/x86_64-intelmicemul-linux-gnu -> 
../../../offload-x86_64-intelmicemul-linux-gnu/libexec/gcc/x86_64-pc-linux-gnu/11.0.0/accel/x86_64-intelmicemul-linux-gnu

(Normally these are found via
'install/bin/../libexec/gcc/x86_64-pc-linux-gnu/10.0.1/' etc., but I want
to avoid the changing version tag ('10.0.1').)

..., and some symlinks so that the 'mkoffload's can find the offload
compilers:

mkoffload: fatal error: offload compiler 
x86_64-pc-linux-gnu-accel-nvptx-none-gcc not found (consider using '-B')

$ ls -l install/bin/*-accel-*-gcc
[...] install/bin/x86_64-pc-linux-gnu-accel-amdgcn-amdhsa-gcc -> 
../offload-amdgcn-amdhsa/bin/x86_64-pc-linux-gnu-accel-amdgcn-amdhsa-gcc
[...] install/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc -> 
../offload-nvptx-none/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc
[...] 
install/bin/x86_64-pc-linux-gnu-accel-x86_64-intelmicemul-linux-gnu-gcc -> 
../offload-x86_64-intelmicemul-linux-gnu/bin/x86_64-pc-linux-gnu-accel-x86_64-intelmicemul-linux-gnu-gcc

This is not quite polished, but I still like it better than installing
everything into the same prefix.  (This way, we only add to the host
compiler installation tree the few symlinks cited above; everything else
of the offload compiler installations is separated in
'[install]/offload-[target]'.)

I didn't get around yet to proposing changing GCC to make that the
default mode of installation/operation.  (What do you/others think?)
(Also, I have not looked up how other distributors are handling this.)


Grüße
 Thomas


Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-25 Thread Stefan Kanthak
Jakub Jelinek  wrote:

> On Wed, Nov 25, 2020 at 09:22:53PM +0100, Stefan Kanthak wrote:
>> > As Jakub has already indicated, your change will result in infinite
>> > recursion on avr. I happened to have a cr16 handy and it looks like
>> > it'd generate infinite recursion there too.
>>
>> JFTR: does GCC emit a warning then? If not: why not?
>
> Why should it.  libgcc is something GCC has full control over and can assume
> it is written in a way that it can't recurse infinitely, that is one of
> libgcc design goals.

Where are these design goals documented?

>> Since I neither have an avr nor a cr16 here, and also no TR-440, no S/3x0,
>> no Spectra-70, no PDP-11, no VAX, no SPARC, no MIPS, no PowerPC, no MC68k,
>> no NSC16xxx and no NSC32xxx any more, GCC only gives me access to the x86
>> code it generates.
>
> You can always use cross-compilers.

There are no cross-compilers available here.
Why should I waste time and energy to build cross-compilers for processors
I don't use?

[...]

> As has been said multiple times, trying to optimize routines that are never
> called on x86 for x86

According to Andreas Schwab, the 64-bit shift routines are called on 32-bit
processors when compiling with -Os!
x86 is a 32-bit processor.

> is just wasted energy, better invest your time in
> functions that are actually ever called.

Where is the documentation that names the routines (not) called on the >50
target architectures?
And why do you build and ship routines which are never called at all?
This is a waste of time, space and energy!

Stefan



Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/25/20 1:22 PM, Stefan Kanthak wrote:
> Jeff Law  wrote:
>
>> On 11/24/20 8:40 AM, Stefan Kanthak wrote:
>>> Andreas Schwab wrote:
>>>
>>>> On Nov 24 2020, Stefan Kanthak wrote:
>>>>
>>>>> 'nuff said
>>>> What's your point?
>>> Pinpoint deficiencies and bugs in GCC and libgcc, plus a counter
>>> example to your "argument"!
>>> I recommend careful reading.
>> Umm, you should broaden your horizons.
> My horizon is as wide as the cap^Winability of GCC available for the
> machines I use.
Then you should expect that some of your patches are going to simply be
inappropriate for GCC.  When people with 20-30 years of experience with
GCC tell you what you're doing is wrong in the context of GCC, it would
be wise to listen.  Continuing to argue doesn't change the fact that
what you're trying to do is fundamentally wrong.

>
>> The world is not an x86.
> The GCC available for the machines I use is only able to generate x86
> code.
You can certainly take GCC sources and use them to create cross
compilers for any architecture supported by GCC.  People do it all the time.

>
>> I'm pretty sure Andreas was referring to non-x86 targets.
> | On most 32-bit targets with -Os.
>
> Are avr or cr16 32-bit processors?
I don't know offhand.  And ultimately it doesn't matter, the shift
changes are simply wrong and will not be applied.

>
>> As Jakub has already indicated, your change will result in infinite
>> recursion on avr. I happened to have a cr16 handy and it looks like
>> it'd generate infinite recursion there too.
> JFTR: does GCC emit a warning then? If not: why not?
GCC doesn't try to warn for self-recursion.

>
> Since I neither have an avr nor a cr16 here, and also no TR-440, no S/3x0,
> no Spectra-70, no PDP-11, no VAX, no SPARC, no MIPS, no PowerPC, no MC68k,
> no NSC16xxx and no NSC32xxx any more, GCC only gives me access to the x86
> code it generates.
Read about building cross compilers.  I don't have any of those things
either.  But I'm still able to build and test them without anything
other than a generic x86_64 laptop.

Hell, I can *bootstrap* on things like m68k, mips, hppa, riscv, alpha,
and  others using qemu emulation on a generic x86_64 laptop.


>
>> On other targets the routines you're changing won't be used because they
>> either have 64 bit shifts or the compiler can synthesize them from other
>> primitives that are available.
> These routines are documented in
> 
> and might be called by your users.
Those routines are not for direct use by users.  I've already stated
that.  In fact, you're referring to the "gcc internals" manual which is
for developers working on GCC itself, not for end users.  There's
separate documentation for end users and I'd fully expect it not to
document any of the routines in libgcc, because they're not for end users.


Also note the first sentence on the page you've referenced:

"The integer arithmetic routines are used on platforms that don’t
provide hardware support for arithmetic operations on some modes."

Which again is what Jakub, myself and others have been telling you.  You
can't (for example) use a 64bit shift in any of the 64bit shift routines
in libgcc2.  It's fundamentally wrong.
>> It's pointless to keep arguing on the shift stuff. What you've
>> submitted is fundamentally wrong in the context of gcc's libgcc2
>> routines. It's that simple. If you keep arguing about it you're likely
>> just going to annoy those who can help you to the point where they won't
>> bother.
> Is there any documentation for (the design and restrictions of) libgcc2?
Not that I'm immediately aware of.

> The patches I sent for the shift and the comparison routines are based
> on the assumption that GCC generates code for "double-word" arithmetic
> inline.
And that's a fundamentally incorrect assumption as myself and others
have pointed out.


Jeff



Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-25 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 25, 2020 at 09:22:53PM +0100, Stefan Kanthak wrote:
> > As Jakub has already indicated, your change will result in infinite
> > recursion on avr. I happened to have a cr16 handy and it looks like
> > it'd generate infinite recursion there too.
> 
> JFTR: does GCC emit a warning then? If not: why not?

Why should it?  libgcc is something GCC has full control over and can assume
it is written in a way that it can't recurse infinitely, that is one of
libgcc design goals.

> Since I neither have an avr nor a cr16 here, and also no TR-440, no S/3x0,
> no Spectra-70, no PDP-11, no VAX, no SPARC, no MIPS, no PowerPC, no MC68k,
> no NSC16xxx and no NSC32xxx any more, GCC only gives me access to the x86
> code it generates.

You can always use cross-compilers.

> > On other targets the routines you're changing won't be used because they
> > either have 64 bit shifts or the compiler can synthesize them from other
> > primitives that are available.
> 
> These routines are documented in
> 
> and might be called by your users.

That is just theory; if people call them by hand instead of using normal C
arithmetic they will get worse code in any case (at least because of the
library call).
As has been said multiple times, trying to optimize routines that are never
called on x86 for x86 is just wasted energy, better invest your time in
functions that are actually ever called.  And even for those, care has to be
taken so that it doesn't break any other of the > 50 target architectures
GCC supports.

Jakub



Re: [20/23] rtlanal: Add simple_regno_set

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/13/20 1:21 AM, Richard Sandiford via Gcc-patches wrote:
> This patch adds a routine for finding a “simple” SET for a register
> definition.  See the comment in the patch for details.
>
> gcc/
>   * rtl.h (simple_regno_set): Declare.
>   * rtlanal.c (simple_regno_set): New function.
So I was a bit confused that this is supposed to reject read-write, but
what it's really rejecting is a narrow subset of read-write.  In
particular it rejects things that are potentially RMW via subregs. It
doesn't prevent the destination from appearing as a source operand.  You
might consider clarifying the comment.

OK

jeff



Re: [18/23] recog: Add an RAII class for undoing insn changes

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/13/20 1:20 AM, Richard Sandiford via Gcc-patches wrote:
> When using validate_change to make a group of changes, you have
> to remember to cancel them if something goes wrong.  This patch
> adds an RAII class to make that easier.  See the comments in the
> patch for details and examples.
>
> gcc/
>   * recog.h (insn_change_watermark): New class.
Ah, funny, I nearly suggested this with the temporary undo thingie.

OK
jeff
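
The idea of cancelling a group of changes automatically can be sketched
as follows.  This is only an illustration under stated assumptions:
PendingChanges and ChangeGuard are hypothetical stand-ins operating on
plain ints, not the recog.h interface from the patch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a log of pending changes.
class PendingChanges
{
public:
  // Store NEW_VALUE at LOC, remembering the old value.
  void change (int *loc, int new_value)
  {
    m_entries.push_back (Entry{loc, *loc});
    *loc = new_value;
  }

  std::size_t size () const { return m_entries.size (); }

  // Undo all changes numbered NUM and up, newest first.
  void cancel_from (std::size_t num)
  {
    while (m_entries.size () > num)
      {
        *m_entries.back ().loc = m_entries.back ().old_value;
        m_entries.pop_back ();
      }
  }

private:
  struct Entry { int *loc; int old_value; };
  std::vector<Entry> m_entries;
};

// Remembers how many changes existed on construction and cancels any
// later ones on destruction, unless keep () was called.
class ChangeGuard
{
public:
  explicit ChangeGuard (PendingChanges &changes)
    : m_changes (changes), m_mark (changes.size ()) {}
  ~ChangeGuard () { m_changes.cancel_from (m_mark); }

  ChangeGuard (const ChangeGuard &) = delete;
  ChangeGuard &operator= (const ChangeGuard &) = delete;

  // Accept everything recorded so far.
  void keep () { m_mark = m_changes.size (); }

private:
  PendingChanges &m_changes;
  std::size_t m_mark;
};
```

With a guard in scope, every early return cancels the group without an
explicit cleanup call; only the success path has to call keep ().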



Re: [16/23] recog: Add a way of temporarily undoing changes

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/13/20 1:19 AM, Richard Sandiford via Gcc-patches wrote:
> In some cases, it can be convenient to roll back the changes that
> have been made by validate_change to see how things looked before,
> then reroll the changes.  For example, this makes it possible
> to defer calculating the cost of an instruction until we know that
> the result is actually needed.  It can also make dumps easier to read.
>
> This patch adds a couple of helper functions for doing that.
>
> gcc/
>   * recog.h (temporarily_undo_changes, redo_changes): Declare.
>   * recog.c (swap_change, temporarily_undo_changes): New functions.
>   (redo_changes): Likewise.
OK...  But...
+
> +/* Temporarily undo all the changes numbered NUM and up, with a view
> +   to reapplying them later.  The next call to the changes machinery
> +   must be:
> +
> +  redo_changes (NUM)
> +
> +   otherwise things will end up in an invalid state.  */
It'd be nice if we had state validation in the other routines. Somebody
is likely to mess this up at some point...


jeff
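
One way such state validation could look is sketched below.  It is an
assumption-laden illustration, not the code from the patch: ChangeLog is
a hypothetical class working on plain ints, and the assertions stand in
for whatever checking the real routines might gain.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical change log that refuses any operation other than the
// matching redo () while changes are temporarily undone.
class ChangeLog
{
public:
  void change (int *loc, int new_value)
  {
    assert (!m_undone);
    m_entries.push_back (Entry{loc, *loc});
    *loc = new_value;
  }

  // Swap the old values back in for changes numbered NUM and up.
  void temporarily_undo (std::size_t num)
  {
    assert (!m_undone && num <= m_entries.size ());
    for (std::size_t i = m_entries.size (); i > num; --i)
      swap_entry (m_entries[i - 1]);
    m_undone = true;
    m_undone_from = num;
  }

  // Reapply the changes; NUM must match the preceding temporarily_undo.
  void redo (std::size_t num)
  {
    assert (m_undone && num == m_undone_from);
    for (std::size_t i = num; i < m_entries.size (); ++i)
      swap_entry (m_entries[i]);
    m_undone = false;
  }

  bool undone () const { return m_undone; }

private:
  // OTHER_VALUE is whichever value is not currently stored at LOC.
  struct Entry { int *loc; int other_value; };

  static void swap_entry (Entry &e) { std::swap (*e.loc, e.other_value); }

  std::vector<Entry> m_entries;
  bool m_undone = false;
  std::size_t m_undone_from = 0;
};
```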




Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-25 Thread Stefan Kanthak
Jeff Law  wrote:

> On 11/24/20 8:40 AM, Stefan Kanthak wrote:
>> Andreas Schwab wrote:
>>
>>> On Nov 24 2020, Stefan Kanthak wrote:
>>>
 'nuff said
>>> What's your point?
>> Pinpoint deficiencies and bugs in GCC and libgcc, plus a counter
>> example to your "argument"!
>> I recommend careful reading.
> Umm, you should broaden your horizons.

My horizon is as wide as the cap^Winability of GCC available for the
machines I use.

> The world is not an x86.

The GCC available for the machines I use is only able to generate x86
code.

> I'm pretty sure Andreas was referring to non-x86 targets.

| On most 32-bit targets with -Os.

Are avr or cr16 32-bit processors?

> As Jakub has already indicated, your change will result in infinite
> recursion on avr. I happened to have a cr16 handy and it looks like
> it'd generate infinite recursion there too.

JFTR: does GCC emit a warning then? If not: why not?

Since I neither have an avr nor a cr16 here, and also no TR-440, no S/3x0,
no Spectra-70, no PDP-11, no VAX, no SPARC, no MIPS, no PowerPC, no MC68k,
no NSC16xxx and no NSC32xxx any more, GCC only gives me access to the x86
code it generates.

> On other targets the routines you're changing won't be used because they
> either have 64 bit shifts or the compiler can synthesize them from other
> primitives that are available.

These routines are documented in

and might be called by your users.

> It's pointless to keep arguing on the shift stuff. What you've
> submitted is fundamentally wrong in the context of gcc's libgcc2
> routines. It's that simple. If you keep arguing about it you're likely
> just going to annoy those who can help you to the point where they won't
> bother.

Is there any documentation for (the design and restrictions of) libgcc2?
The patches I sent for the shift and the comparison routines are based
on the assumption that GCC generates code for "double-word" arithmetic
inline.
If it doesn't: where are __addDI3 and __subDI3 defined, and where are
__adddi3, __addti3, __subdi3 and __subti3 documented?

Since libgcc2.[hc] don't define them, and

doesn't document them, I had reason to believe that my assumption holds.

> I think the bswapsi2 change will go forward, but it needs to be tested.

Stefan



Re: [13/23] recog: Split out a register_asm_p function

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/13/20 1:18 AM, Richard Sandiford via Gcc-patches wrote:
> verify_changes has a test for whether a particular hard register
> is a user-defined register asm.  A later patch needs to test the
> same thing, so this patch splits it out into a helper.
>
> gcc/
>   * rtl.h (register_asm_p): Declare.
>   * recog.c (verify_changes): Split out the test for whether
>   a hard register is a register asm to...
>   (register_asm_p): ...this new function.
OK
jeff



Re: [12/23] Export print-rtl.c:print_insn_with_notes

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/13/20 1:17 AM, Richard Sandiford via Gcc-patches wrote:
> Later patches want to use print_insn_with_notes (printing to
> a pretty_printer).  This patch exports it from print-rtl.c.
>
> The non-notes version is already public.
>
> gcc/
>   * print-rtl.h (print_insn_with_notes): Declare.
>   * print-rtl.c (print_insn_with_notes): Make non-static
OK
jeff



Re: [07/23] Add a class that multiplexes two pointer types

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/13/20 1:14 AM, Richard Sandiford via Gcc-patches wrote:
> This patch adds a pointer_mux class that provides similar
> functionality to:
>
> union { T1 *a; T2 *b; };
> ...
> bool is_b_rather_than_a;
>
> except that the is_b_rather_than_a tag is stored in the low bit
> of the pointer.  See the comments in the patch for a comparison
> between the two approaches and why this one can be more efficient.
>
> I've tried to microoptimise the class a fair bit, since a later
> patch uses it extensively in order to keep the sizes of data
> structures down.
>
> gcc/
>   * mux-utils.h: New file.
Do we have any potentially bootstrappable targets where we can't
guarantee pointer alignment of at least 16 bits?  I see what look like
suitable asserts, and presumably if we trigger them, then we're going to
need to rethink this and fall back to a separate bit?

jeff
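
For readers unfamiliar with the technique, a minimal sketch of storing
the tag in the low bit of a pointer follows.  PointerMux and its member
names are hypothetical and are not the interface of mux-utils.h; the
static_assert plays the role of the alignment checks discussed above.

```cpp
#include <cstdint>

// Holds either a T1 * or a T2 *, using the low bit of the address as the
// "is second" tag.  Requires both types to be at least 2-byte aligned.
template <typename T1, typename T2>
class PointerMux
{
  static_assert (alignof (T1) >= 2 && alignof (T2) >= 2,
                 "low bit of the pointer must be free");

public:
  static PointerMux first (T1 *ptr)
  {
    return PointerMux (reinterpret_cast<std::uintptr_t> (ptr));
  }

  static PointerMux second (T2 *ptr)
  {
    return PointerMux (reinterpret_cast<std::uintptr_t> (ptr) | 1);
  }

  bool is_first () const { return (m_bits & 1) == 0; }
  bool is_second () const { return (m_bits & 1) != 0; }

  // Return the stored pointer, or null if the other kind is stored.
  T1 *get_first () const
  {
    return is_first () ? reinterpret_cast<T1 *> (m_bits) : nullptr;
  }

  T2 *get_second () const
  {
    return is_second ()
           ? reinterpret_cast<T2 *> (m_bits & ~std::uintptr_t (1))
           : nullptr;
  }

private:
  explicit PointerMux (std::uintptr_t bits) : m_bits (bits) {}

  std::uintptr_t m_bits;
};
```

The object is one pointer wide, whereas the union-plus-bool layout needs
a second word once padding is taken into account.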



Re: [06/23] Add an RAII class for managing obstacks

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/13/20 1:14 AM, Richard Sandiford via Gcc-patches wrote:
> This patch adds an RAII class for managing the lifetimes of objects
> on an obstack.  See the comments in the patch for more details and
> example usage.
>
> gcc/
>   * obstack-utils.h: New file.
RAII is goodness.  One could argue that most of our obstacks should
probably be converted.


jeff
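
As a rough illustration of the pattern, the sketch below scopes
allocations on a toy bump allocator.  Arena and ArenaScope are
hypothetical names, the allocator ignores alignment, and none of this is
the obstack-utils.h interface.

```cpp
#include <cstddef>
#include <vector>

// Toy stack-like allocator standing in for an obstack.
class Arena
{
public:
  explicit Arena (std::size_t capacity) : m_storage (capacity), m_used (0) {}

  // Return SIZE bytes, or null if the arena is full.
  void *allocate (std::size_t size)
  {
    if (size > m_storage.size () - m_used)
      return nullptr;
    void *result = m_storage.data () + m_used;
    m_used += size;
    return result;
  }

  std::size_t used () const { return m_used; }

  // Free everything allocated after MARK.
  void release_to (std::size_t mark) { m_used = mark; }

private:
  std::vector<unsigned char> m_storage;
  std::size_t m_used;
};

// Frees, on scope exit, everything allocated since construction.
class ArenaScope
{
public:
  explicit ArenaScope (Arena &arena)
    : m_arena (arena), m_mark (arena.used ()) {}
  ~ArenaScope () { m_arena.release_to (m_mark); }

  ArenaScope (const ArenaScope &) = delete;
  ArenaScope &operator= (const ArenaScope &) = delete;

private:
  Arena &m_arena;
  std::size_t m_mark;
};
```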



Re: [05/23] Add more iterator utilities

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/13/20 1:13 AM, Richard Sandiford via Gcc-patches wrote:
> This patch adds some more iterator helper classes.  They really fall
> into two groups, but there didn't seem much value in separating them:
>
> - A later patch has a class hierarchy of the form:
>
>  Base
>   +- Derived1
>   +- Derived2
>
>   A class wants to store an array A1 of Derived1 pointers and an
>   array A2 of Derived2 pointers.  However, for compactness reasons,
>   it was convenient to have a single array of Base pointers,
>   with A1 and A2 being slices of this array.  This reduces the
>   overhead from two pointers and two ints (3 LP64 words) to one
>   pointer and two ints (2 LP64 words).
>
>   But consumers of the class shouldn't be aware of this: they should
>   see A1 as containing Derived1 pointers rather than Base pointers
>   and A2 as containing Derived2 pointers rather than Base pointers.
>   This patch adds derived_iterator and const_derived_container
>   classes to support this use case.
>
> - A later patch also adds various linked lists.  This patch adds
>   wrapper_iterator and list_iterator classes to make it easier
>   to create iterators for these linked lists.  For example:
>
> // Iterators for lists of definitions.
> using def_iterator = list_iterator;
> using reverse_def_iterator
>   = list_iterator;
>
>   This in turn makes it possible to use range-based for loops
>   on the lists.
>
> The patch just adds the things that the later patches need; it doesn't
> try to make the classes as functionally complete as possible.  I think
> we should add extra functionality when needed rather than ahead of time.
>
> gcc/
>   * iterator-utils.h (derived_iterator): New class.
>   (const_derived_container, wrapper_iterator): Likewise.
>   (list_iterator): Likewise.
OK
jeff
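
A minimal sketch of the linked-list half of this, assuming an iterator
parameterised by the member that links to the next element.  Node,
ListIterator and IteratorRange are hypothetical names, not the classes
added to iterator-utils.h.

```cpp
// Hypothetical singly-linked list element.
struct Node
{
  int value;
  Node *next;
};

// Forward iterator over a list whose elements are chained through the
// pointer-to-member NEXT.
template <typename T, T *T::*Next>
class ListIterator
{
public:
  explicit ListIterator (T *node = nullptr) : m_node (node) {}

  T &operator* () const { return *m_node; }

  ListIterator &operator++ ()
  {
    m_node = m_node->*Next;
    return *this;
  }

  bool operator== (const ListIterator &other) const
  {
    return m_node == other.m_node;
  }

  bool operator!= (const ListIterator &other) const
  {
    return m_node != other.m_node;
  }

private:
  T *m_node;
};

// Pairs two iterators so that range-based for loops can use them.
template <typename Iter>
struct IteratorRange
{
  Iter first;
  Iter last;
  Iter begin () const { return first; }
  Iter end () const { return last; }
};

using NodeIterator = ListIterator<Node, &Node::next>;

// Add up the values in the list starting at HEAD.
int
sum_values (Node *head)
{
  int total = 0;
  IteratorRange<NodeIterator> range{NodeIterator (head), NodeIterator ()};
  for (Node &node : range)
    total += node.value;
  return total;
}
```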



Re: [04/23] Move iterator_range to a new iterator-utils.h file

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/13/20 1:13 AM, Richard Sandiford via Gcc-patches wrote:
> A later patch will add more iterator-related utilities.  Rather than
> putting them all directly in coretypes.h, it seemed better to add a
> new header file, here called "iterator-utils.h".  This preliminary
> patch moves the existing iterator_range class there too.
>
> I used the same copyright date range as coretypes.h “just to be sure”.
>
> gcc/
>   * coretypes.h (iterator_range): Move to...
>   * iterator-utils.h: ...this new file.
OK
jeff



Re: [02/23] rtlanal: Remove noop_move_p REG_EQUAL condition

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/13/20 1:12 AM, Richard Sandiford via Gcc-patches wrote:
> noop_move_p currently keeps any instruction that has a REG_EQUAL
> note, on the basis that the equality might be useful in future.
> But this creates a perverse incentive not to add potentially-useful
> REG_EQUAL notes, in case they prevent an instruction from later being
> removed as dead.
>
> The condition originates from flow.c:life_analysis_1 and predates
> the changes tracked by the current repository (1992).  It probably
> made sense when most optimisations were done on RTL rather than FE
> trees, but it seems counterproductive now.
>
> gcc/
>   * rtlanal.c (noop_move_p): Don't check for REG_EQUAL notes.
I would guess this was primarily for the old libcall mechanism where
we'd have a self-copy at the end of the sequence with a REG_EQUAL note
for the expression's natural form.  All that's been broken for a long
time.  So I'm not going to lose any sleep if we want to remove this
little chunk of code.

OK

jeff



Re: [03/23] reginfo: Add a global_reg_set

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/13/20 1:12 AM, Richard Sandiford via Gcc-patches wrote:
> A later patch wants to use the set of global registers as a HARD_REG_SET
> rather than a bool/char array.  Most other arrays already have a
> HARD_REG_SET counterpart, but this one didn't.
>
> gcc/
>   * hard-reg-set.h (global_reg_set): Declare.
>   * reginfo.c (global_reg_set): New variable.
>   (init_reg_sets_1, globalize_reg): Update it when globalizing
>   registers.
OK
jeff



Re: [01/23] vec: Silence clang warning

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/13/20 1:11 AM, Richard Sandiford via Gcc-patches wrote:
> I noticed during compatibility testing that clang warns that this
> operator won't be implicitly const in C++14 onwards.
>
> gcc/
>   * vec.h (vnull::operator vec): Make const.
OK
jeff
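
Background for the warning: in C++11 a constexpr non-static member
function is implicitly const, while from C++14 onwards it is not, so the
const has to be spelled out to keep the same meaning in both.  A
simplified sketch, using a hypothetical Vec and NullVec rather than the
real vec.h types:

```cpp
// Hypothetical, much simplified vector handle.
template <typename T>
struct Vec
{
  T *data = nullptr;
  unsigned length = 0;
};

// Converts to an empty Vec of any element type.
struct NullVec
{
  // The explicit const keeps this callable on const objects in C++14 and
  // later, where constexpr no longer implies it.
  template <typename T>
  constexpr operator Vec<T> () const { return Vec<T> (); }
};

constexpr NullVec null_vec{};
```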



Re: [00/23] Make fwprop use an on-the-side RTL SSA representation

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/13/20 1:10 AM, Richard Sandiford via Gcc-patches wrote:
> Just after GCC 10 stage 1 closed (oops), I posted a patch to add a new
> combine pass.  One of its main aims was to allow instructions to move
> around where necessary in order to make a combination possible.
> It also tried to parallelise instructions that use the same resource.
>
> That pass contained its own code for maintaining limited def-use chains.
> When I posted the patch, Segher asked why we wanted yet another piece
> of pass-specific code to do that.  Although I had specific reasons
> (which I explained at the time) I've gradually come round to agreeing
> that that was a flaw.
>
> This series of patches is the result of a Covid-time project to add
> a more general, pass-agnostic framework.  There are two parts:
> adding the framework itself, and using it to make fwprop.c faster.
>
> The framework part
> --
>
> The framework provides an optional, on-the-side SSA view of existing
> RTL instructions.  Each instruction gets a list of definitions and a
> list of uses, with each use having a single definition.  Phi nodes
> handle cases in which there are multiple possible definitions of a
> register on entry to a basic block.  There are also routines for
> updating instructions while keeping the SSA representation intact.
>
> The aim is only to provide a different view of existing RTL instructions.
> Unlike gimple, and unlike (IIRC) the old RTL SSA project from way back,
> the new framework isn't a “native” SSA representation.  This means that
> all inputs to a phi node for a register R are also definitions of
> register R; no move operation is “hidden” in the phi node.
Hmm, I'm trying to parse what the last phrase means.  Does it mean that
the "hidden copy" problem for out-of-ssa is avoided?  And if so, how is
that maintained over time?  Things like copy-prop will tend to introduce
those issues even if they didn't originally exist.

>
> Like gimple, the framework treats memory as a single unified resource.
>
> A more in-depth summary is contained in the doc patch, but some
> other random notes:
>
> * At the moment, the SSA information is local to one pass, but it might
>   be good to maintain it between passes in future.
Right.  I think we can look at the passes near fwprop as good targets
for extending the lifetime over which we have an SSA framework.   I note
CSE is just before the first fwprop and CSE is a hell of a lot easier in
an SSA world :-)  It's unfortunate that there are no DCE passes abutting
fwprop as DCE is really easy in an SSA world.

>
> * The SSA code groups blocks into extended basic blocks, with the
>   EBBs rather than individual blocks having phi nodes.  
So I haven't looked at the patch, but the usual place to put PHIs is at
the dominance frontier.  But extra PHIs just increase time/memory and
shouldn't affect correctness.

>
> * The framework also provides live range information for registers
>   within an extended basic block and allows instructions to move within
>   their EBB.  It might be useful to allow further movement in future;
>   I just don't have a use case for it yet.
Yup.   You could do something like Click's algorithm to schedule the
instructions in a block to maximize CSE opportunities on top of this.

>
> * One advantage of the new infrastructure is that it gives
>   recog_for_combine-like behaviour: if recog wants to add clobbers
>   of things like the flags register, the SSA code will make sure
>   that the flags register is free.
I look more at the intersection between combine and SSA as an
opportunity to combine on extended blocks, simplify the "does dataflow
allow this combination" logic, drop the need to build/maintain LOG_LINKS
and more generally simplify note distribution.

> * I've tried to optimise the code for both memory footprint and
>   compile time.  The first part involves quite a bit of overloading
>   of pointers and various other kinds of reuse, so most of the new data
>   structures use private member variables and public accessor functions.
>   I know that style isn't universally popular, but I think it's
>   justified here.  Things could easily go wrong if passes tried
>   to operate directly on the underlying data structures.
ACK.

>
> * Debug instructions get SSA information too, on a best-effort basis.
>   Providing complete information would be significantly more expensive.
>
> * I wasn't sure for new C++ code whether to stick to the old C /* … */
>   comments, or whether to switch to //.  In the end I went for //,
>   on the basis that:
>
>   - The ranger code already does this.
>
>   - // is certainly more idiomatic in C++.
>
>   - // is in the lisp tradition of per-line comments and it matches the
> ;; used in .md files.  I feel sure that GCC would have been written
> using // from the outset if that had been possible.
I think we're allowing both and realistically /* */ vs // shouldn't be
something we spend a lot of time arguing about :-)


>

Re: test: Update cases for vect_partial_vectors_usage_1

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/23/20 2:09 AM, Kewen.Lin via Gcc-patches wrote:
> Hi,
>
> I adjusted some vectorization test cases for vect_partial_vectors_usage_1
> before, but as exposed in the recent testings, some of them need to be
> adjusted again.  The reason is that the commit r11-3393 improved the
> epilogue loop handling of partial vectors and we won't use partial vectors
> to vectorize a single iteration scalar loop now.
>
> The affected test cases have only one single iteration in their epilogues
> separately, so we shouldn't expect the vectorization to happen on the
> epilogues any more.
>
> Tested with explicit --param=vect-partial-vector-usage=1 and default
> enablement.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/vect/slp-perm-1.c: Adjust for partial vectors.
>   * gcc.dg/vect/slp-perm-5.c: Likewise.
>   * gcc.dg/vect/slp-perm-6.c: Likewise.
>   * gcc.dg/vect/slp-perm-7.c: Likewise.
OK
jeff



Re: [patch][rtl-optimization][i386][pr97777] Fix a reg-stack df maintenance bug triggered by zero-call-used-regs pass.

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/19/20 8:59 AM, Qing Zhao via Gcc-patches wrote:
> Hi, 
>
> PR97777 - ICE: in df_refs_verify, at df-scan.c:3991 with -O
> -ffinite-math-only -fzero-call-used-regs=all
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97777
>
> is a bug triggered by the new pass zero-call-used-regs; however, it’s an old
> bug in the pass “reg-stack”.
> This pass does not correctly maintain the df information after
> transformation.
>
> Since the transformation in the reg-stack pass is quite complicated, involving
> both instruction changes and control flow changes, I called
> “df_insn_rescan_all” after the transformation is done.
>
> The patch has been tested with bootstrap with 
> --enable-checking=yes,rtl,df,extra, no regression. 
>
> Okay for commit?
>
> Qing
>
> From c2573c6c8552b7b4c2eedb0684ce48b5c11436ec Mon Sep 17 00:00:00 2001
> From: qing zhao 
> Date: Thu, 19 Nov 2020 16:46:50 +0100
> Subject: [PATCH] rtl-optimization: Fix data flow maintenance bug in
>  reg-stack.c [pr97777]
>
> reg-stack pass does not maintain the data flow information correctly.
> Call df_insn_rescan_all after the transformation is done.
>
> gcc/
>   PR rtl-optimization/97777
>   * reg-stack.c (rest_of_handle_stack_regs): Call
>   df_insn_rescan_all if reg_to_stack returns true.
>
>   gcc/testsuite/
>   PR rtl-optimization/97777
>   * gcc.target/i386/pr97777.c: New test.
I'd like to see more analysis here.

ie, precisely what data is out of date and why?

Jeff




Fix duplication hook in ipa-modref

2020-11-25 Thread Jan Hubicka
Hi,
while constructing a testcase for the direct escape analysis I noticed that
I forgot to duplicate arg_flags in the duplicate hook, so we lose track of
them when cloning.  I am sure I had it there at some point, but
apparently it got lost while breaking up patches.

Bootstrapped/regtested x86_64-linux, committed.

Honza

* ipa-modref.c (modref_summaries::duplicate,
modref_summaries_lto::duplicate): Copy arg_flags.
(remap_arg_flags): Fix remapping of arg_flags.
diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index e6cb4a87b69..d1d4ba786a4 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -2142,6 +2146,8 @@ modref_summaries::duplicate (cgraph_node *, cgraph_node *dst,
 src_data->loads->max_accesses);
   dst_data->loads->copy_from (src_data->loads);
   dst_data->writes_errno = src_data->writes_errno;
+  if (src_data->arg_flags.length ())
+dst_data->arg_flags = src_data->arg_flags.copy ();
 }
 
 /* Called when new clone is inserted to callgraph late.  */
@@ -2165,6 +2171,8 @@ modref_summaries_lto::duplicate (cgraph_node *, cgraph_node *,
 src_data->loads->max_accesses);
   dst_data->loads->copy_from (src_data->loads);
   dst_data->writes_errno = src_data->writes_errno;
+  if (src_data->arg_flags.length ())
+dst_data->arg_flags = src_data->arg_flags.copy ();
 }
 
 namespace
@@ -2690,7 +2698,7 @@ remap_arg_flags (auto_vec  &arg_flags, clone_info *info)
   if (o >= 0 && (int)old.length () > o && old[o])
max = i;
 }
-  if (max > 0)
+  if (max >= 0)
 arg_flags.safe_grow_cleared (max + 1, true);
   FOR_EACH_VEC_SAFE_ELT (info->param_adjustments->m_adj_params, i, p)
 {
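
The one-character change in the last hunk is an off-by-one fix.  A
minimal sketch of why it matters, assuming the highest used index starts
at -1 when nothing is in use (that initial value is an assumption, it is
not visible in the hunk) and using a hypothetical helper on std::vector:

```cpp
#include <vector>

// Hypothetical helper: size FLAGS so that every index up to MAX_USED is
// valid.  MAX_USED is -1 when no index is in use.
void
grow_for_max_index (std::vector<int> &flags, int max_used)
{
  flags.clear ();
  // Testing "> 0" here would skip the case where only index 0 is used.
  if (max_used >= 0)
    flags.resize (max_used + 1, 0);
}
```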


Re: [13/32] new options

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/3/20 2:15 PM, Nathan Sidwell wrote:
> Here are the new options, along with the C++ lang-spec changes.
>
> Modules is enabled by -fmodules-ts, it is not implicitly enabled by
> -std=c++20.  Usually that's the only option you need to add for a
> module-aware build.
>
> To build a header unit you can either add -fmodule-header to a c++
> build, or you can set the language to be c++-header and add -fmodules-ts:
>
> g++ -x c++-header -fmodules-ts my-header-file
>
> To search the user or system include paths select c++-user-header or
> c++-system-header as the language.
>
> Enabling -fmodules-ts will disable PCH; they do not play well
> together. There is a potential issue down the road when we implicitly
> enable modules.  At that point building a header-unit could become
> indistinguishable from building a PCH.  Perhaps we should consider
> phasing in an explicit PCH option?
In a modules-capable world, how much value still exists in the old PCH
scheme?  The old PCH scheme can't work in some environments anyway
(anyone using ASLR & PIE).

I'm going to assume the spec changes are OK.  My brain always melts when
I start looking at them.

OK

Jeff



Re: Add 'g++.dg/gomp/map-{1,2}.C'

2020-11-25 Thread Thomas Schwinge
Hi!

On 2020-11-25T11:52:44+0100, Jakub Jelinek  wrote:
> On Wed, Nov 25, 2020 at 11:43:48AM +0100, Thomas Schwinge wrote:
>> On 2020-11-25T11:10:18+0100, Jakub Jelinek  wrote:
>> > On Wed, Nov 25, 2020 at 11:00:57AM +0100, Thomas Schwinge wrote:
>> >> I had reason to look into OpenMP C++ 'map' clause parsing, and a
>> >> testsuite enhancement to "Add 'g++.dg/gomp/map-{1,2}.C'" fell out of
>> >> that, see attached.  OK to push?
>> >>
>> >> Note two XFAILs in 'g++.dg/gomp/map-1.C' compared to the C/C++ variant.

>> [...] the first XFAIL disappears (now matches the C/C++ variant),
>> but the second remains, see updated patch attached.
>
> Ok, patch ok for trunk.

I've pushed "Add 'g++.dg/gomp/map-{1,2}.C'" to master branch in commit
1049e5408fa343b5bf0a6380212a8ec8dfe2b6fc, and backported to
releases/gcc-10 branch in commit
78853078d692809807f44348948041c5fbe2588d, to releases/gcc-9 branch in
commit a03fa173f60c66889f84da947a62f5b1b42cdf07, to releases/gcc-8 branch
in commit e7e0360147d973e7634f16bfec265cd4cc937e1c, see attached.  (All
branches, as I need this later in context of an OpenACC-specific bug
fix.)

> If you could file a PR for the TODO xfail, I'd appreciate it.

 "[OMP] Missing 'omp_mappable_type' error
diagnostic inside C++ template".  Unfortunately, I failed to replace
"TODO" with "PR97996" before pushing.


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
From 1049e5408fa343b5bf0a6380212a8ec8dfe2b6fc Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 25 Nov 2020 11:41:45 +0100
Subject: [PATCH] Add 'g++.dg/gomp/map-{1,2}.C'

	gcc/testsuite/
	* g++.dg/gomp/map-1.C: New.
	* g++.dg/gomp/map-2.C: Likewise.
	* c-c++-common/gomp/map-1.c: Adjust.
	* c-c++-common/gomp/map-2.c: Likewise.
---
 gcc/testsuite/c-c++-common/gomp/map-1.c|  5 +++--
 gcc/testsuite/c-c++-common/gomp/map-2.c|  5 +++--
 .../gomp/map-1.c => g++.dg/gomp/map-1.C}   | 14 +++---
 .../gomp/map-2.c => g++.dg/gomp/map-2.C}   | 12 ++--
 4 files changed, 27 insertions(+), 9 deletions(-)
 copy gcc/testsuite/{c-c++-common/gomp/map-1.c => g++.dg/gomp/map-1.C} (95%)
 copy gcc/testsuite/{c-c++-common/gomp/map-2.c => g++.dg/gomp/map-2.C} (91%)

diff --git a/gcc/testsuite/c-c++-common/gomp/map-1.c b/gcc/testsuite/c-c++-common/gomp/map-1.c
index 508dc8d6b01..31100b0396b 100644
--- a/gcc/testsuite/c-c++-common/gomp/map-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/map-1.c
@@ -1,5 +1,6 @@
-/* { dg-do compile } */
-/* { dg-options "-fopenmp" } */
+/* Test 'map' clause diagnostics.  */
+
+/* See also corresponding C++ variant: '../../g++.dg/gomp/map-1.C'.  */
 
 extern int a[][10], a2[][10];
 int b[10], c[10][2], d[10], e[10], f[10];
diff --git a/gcc/testsuite/c-c++-common/gomp/map-2.c b/gcc/testsuite/c-c++-common/gomp/map-2.c
index 101f4047b85..cd69f6b9a57 100644
--- a/gcc/testsuite/c-c++-common/gomp/map-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/map-2.c
@@ -1,5 +1,6 @@
-/* { dg-do compile } */
-/* { dg-options "-fopenmp" } */
+/* Test 'map' clause diagnostics.  */
+
+/* See also corresponding C++ variant: '../../g++.dg/gomp/map-2.C'.  */
 
 void
 foo (int *p, int (*q)[10], int r[10], int s[10][10])
diff --git a/gcc/testsuite/c-c++-common/gomp/map-1.c b/gcc/testsuite/g++.dg/gomp/map-1.C
similarity index 95%
copy from gcc/testsuite/c-c++-common/gomp/map-1.c
copy to gcc/testsuite/g++.dg/gomp/map-1.C
index 508dc8d6b01..11275efff4a 100644
--- a/gcc/testsuite/c-c++-common/gomp/map-1.c
+++ b/gcc/testsuite/g++.dg/gomp/map-1.C
@@ -1,5 +1,6 @@
-/* { dg-do compile } */
-/* { dg-options "-fopenmp" } */
+/* Test 'map' clause diagnostics.  */
+
+/* See also corresponding C/C++ variant: '../../c-c++-common/gomp/map-1.c'.  */
 
 extern int a[][10], a2[][10];
 int b[10], c[10][2], d[10], e[10], f[10];
@@ -16,6 +17,7 @@ int t[10];
 void bar (int *);
 #pragma omp end declare target
 
+template 
 void
 foo (int g[3][10], int h[4][8], int i[2][10], int j[][9],
  int g2[3][10], int h2[4][8], int i2[2][10], int j2[][9])
@@ -39,7 +41,7 @@ foo (int g[3][10], int h[4][8], int i[2][10], int j[][9],
   #pragma omp target map(alloc: s2) /* { dg-error "'s2' does not have a mappable type in 'map' clause" } */
 ;
   #pragma omp target map(to: a[:][:]) /* { dg-error "array type length expression must be specified" } */
-bar (&a[0][0]); /* { dg-error "referenced in target region does not have a mappable type" } */
+bar (&a[0][0]); /* { dg-error "referenced in target region does not have a mappable type" "TODO" { xfail *-*-* } } */
   #pragma omp target map(tofrom: b[-1:]) /* { dg-error "negative low bound in array section" } */
 bar (b);
   #pragma omp target map(tofrom: c[:-3][:]) /* { dg-error "negative length in array section" } */
@@ -107,3 +109,9 @@ foo (int g[3][10], int h[4][8], int i[2][10], int j[][9],
   #pragma omp tar

Re: [PATCH] RFC: add "deallocated_by" attribute for use by analyzer

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/18/20 2:20 PM, Martin Sebor wrote:
> On 11/18/20 1:41 PM, David Malcolm wrote:
>
>> So hopefully that gives us a way forward.  I'm about to disappear for a
>> week and a half, so don't let my analyzer patches stand in the way of
>> Martin's.  I can finish reworking my stuff on top of Martin's when I
>> get back, or if they aren't in "master" yet I can combine parts of
>> Martin's patches and integrate them into mine.
>>
>> Hope this sounds sane
>
> It does to me.  Thanks again for taking the time to do the prototype
> and confirm there are no roadblocks for sharing the same attribute!
Good.  Then let's focus on trying to get Martin's work onto the trunk
with David's work following, utilizing Martin's attribute.

jeff



Re: [PATCH 00/31] VAX: Bring the port up to date (yes, MODE_CC conversion is included)

2020-11-25 Thread Maciej W. Rozycki
On Tue, 24 Nov 2020, Thomas Koenig wrote:

> > I am going to give up at this point, as porting libgfortran to non-IEEE FP
> > is surely well beyond what I can afford to do right now.
> 
> Can you file a PR about this? Eliminating __builtin_inf and friends
> sounds doable.

 There's more to that unfortunately.  I would have done it right away if 
it was so easy.

> And does anybody know what we should return in cases where the result
> exceeds the maximum representable number?

 Presumably the standard has it (implementation-specific for non-IEEE-754 
I suppose; in the VAX FP ISA an underflow optionally traps, and otherwise 
produces a zero result, while an overflow traps unconditionally and keeps 
the destination operand unchanged).

  Maciej


Re: [12/32] user documentation

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/3/20 2:15 PM, Nathan Sidwell wrote:
> This is the user documentation.
>
>
> 12-core-doc.diff
>
I think this is fine. 
jeff



Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/11/20 1:03 AM, Hongtao Liu via Gcc-patches wrote:

>
>
>
> vec_set_rebaserebase_onr11-4901.patch
>
> From c9d684c37b5f79f68f938f39eeb9e7989b10302d Mon Sep 17 00:00:00 2001
> From: liuhongt 
> Date: Mon, 19 Oct 2020 16:04:39 +0800
> Subject: [PATCH] Support variable index vec_set.
>
> gcc/ChangeLog:
>
>   PR target/97194
>   * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
>   * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
>   * config/i386/predicates.md (vec_setm_operand): New predicate,
>   true for const_int_operand or register_operand under TARGET_AVX2.
>   * config/i386/sse.md (vec_set): Support both constant
>   and variable index vec_set.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/i386/avx2-vec-set-1.c: New test.
>   * gcc.target/i386/avx2-vec-set-2.c: New test.
>   * gcc.target/i386/avx512bw-vec-set-1.c: New test.
>   * gcc.target/i386/avx512bw-vec-set-2.c: New test.
>   * gcc.target/i386/avx512f-vec-set-2.c: New test.
>   * gcc.target/i386/avx512vl-vec-set-2.c: New test.
This is OK.  Sorry for the delays.

jeff



Re: [PATCH] libgfortran: Correct FP feature macro checks

2020-11-25 Thread Thomas Koenig via Gcc-patches

Hi Maciej,


  Infinity is the least of a problem, because as it turns out we have
numerous FP library functions in libgfortran that require explicit porting
to each FP format supported, like these settings:

  xbig = 26.543, xhuge = 6.71e+7, xmax = 2.53e+307;


The Fortran intrinsics like HUGE, EPSILON, SELECTED_REAL_KIND etc.
would have to be handled correctly, both for simplification in
the front end and in the library.

Does the program

  print *,HUGE(1.0)
  print *,EPSILON(1.0)
end

print correct values?
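
For comparison, here is a minimal C++ sketch of the same two queries on the host side. It assumes that Fortran default REAL corresponds to C++ float, which is typical but not guaranteed; the function names are made up for illustration and are not part of libgfortran.

```cpp
#include <limits>

// Assumption: Fortran default REAL maps to C++ float on this host.

// Analogue of HUGE(1.0): the largest finite value of the type.
constexpr float huge_real() { return std::numeric_limits<float>::max(); }

// Analogue of EPSILON(1.0): the gap between 1.0 and the next
// representable value above it.
constexpr float epsilon_real() { return std::numeric_limits<float>::epsilon(); }

// True when float follows IEEE 754 (IEC 60559).  A target using a VAX
// floating-point format would be expected to report false here and to
// yield different values from the two functions above.
constexpr bool real_is_ieee() { return std::numeric_limits<float>::is_iec559; }
```

Comparing these values with what the Fortran program prints on the target is one way to spot a front end or library that still assumes IEEE 754 ranges.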

Regards

Thomas



Fix templatized C++ OpenACC 'cache' directive ICEs

2020-11-25 Thread Thomas Schwinge
Hi!

I've pushed "Fix templatized C++ OpenACC 'cache' directive ICEs" to
master branch in commit 0cab70604cfda30bc64351b39493ef884ff7ba10, and
backported to releases/gcc-10 branch in commit
5bfcc9e103c06d85de43766fe05eb59f4f50c3db, to releases/gcc-9 branch in
commit 1cb1c9e62f92ad674976b0da8cc46d7350d79a05, to releases/gcc-8 branch
in commit b4a3e26c329f63c9953f4c4e3141c562bf91ce93, see attached.


Regards
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
From 0cab70604cfda30bc64351b39493ef884ff7ba10 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 25 Nov 2020 13:03:52 +0100
Subject: [PATCH] Fix templatized C++ OpenACC 'cache' directive ICEs

This has been broken forever, whoops...

	gcc/cp/
	* pt.c (tsubst_omp_clauses): Handle 'OMP_CLAUSE__CACHE_'.
	(tsubst_expr): Handle 'OACC_CACHE'.
	gcc/testsuite/
	* c-c++-common/goacc/cache-1.c: Update.
	* c-c++-common/goacc/cache-2.c: Likewise.
	* g++.dg/goacc/cache-1.C: New.
	* g++.dg/goacc/cache-2.C: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c++/cache-1.C: New.
	* testsuite/libgomp.oacc-c-c++-common/cache-1.c: Update.
---
 gcc/cp/pt.c|  2 ++
 gcc/testsuite/c-c++-common/goacc/cache-1.c | 18 +++---
 gcc/testsuite/c-c++-common/goacc/cache-2.c | 10 +-
 gcc/testsuite/g++.dg/goacc/cache-1.C   | 15 +++
 .../goacc/cache-2.c => g++.dg/goacc/cache-2.C} | 15 +++
 libgomp/testsuite/libgomp.oacc-c++/cache-1.C   | 13 +
 .../libgomp.oacc-c-c++-common/cache-1.c| 12 +++-
 7 files changed, 68 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/goacc/cache-1.C
 copy gcc/testsuite/{c-c++-common/goacc/cache-2.c => g++.dg/goacc/cache-2.C} (90%)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/cache-1.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index fdd7f2d457b..4fb0bc82c31 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -17245,6 +17245,7 @@ tsubst_omp_clauses (tree clauses, enum c_omp_region_type ort,
 	case OMP_CLAUSE_FROM:
 	case OMP_CLAUSE_TO:
 	case OMP_CLAUSE_MAP:
+	case OMP_CLAUSE__CACHE_:
 	case OMP_CLAUSE_NONTEMPORAL:
 	case OMP_CLAUSE_USE_DEVICE_PTR:
 	case OMP_CLAUSE_USE_DEVICE_ADDR:
@@ -18761,6 +18762,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
   add_stmt (t);
   break;
 
+case OACC_CACHE:
 case OACC_ENTER_DATA:
 case OACC_EXIT_DATA:
 case OACC_UPDATE:
diff --git a/gcc/testsuite/c-c++-common/goacc/cache-1.c b/gcc/testsuite/c-c++-common/goacc/cache-1.c
index 1d4759e738c..242f3c612fd 100644
--- a/gcc/testsuite/c-c++-common/goacc/cache-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/cache-1.c
@@ -1,9 +1,15 @@
-/* OpenACC cache directive: valid usage.  */
-/* For execution testing, this file is "#include"d from
-   libgomp/testsuite/libgomp.oacc-c-c++-common/cache-1.c.  */
+/* OpenACC 'cache' directive: valid usage.  */
 
-int
-main (int argc, char **argv)
+/* See also corresponding C++ variant: '../../g++.dg/goacc/cache-1.C'.  */
+
+/* For execution testing, this file is '#include'd from
+   '../../../../libgomp/testsuite/libgomp.oacc-c-c++-common/cache-1.c'.  */
+
+#ifdef TEMPLATIZE
+template 
+#endif
+static void
+test ()
 {
 #define N   2
 int a[N], b[N];
@@ -61,6 +67,4 @@ main (int argc, char **argv)
 if (a[i] != b[i])
 __builtin_abort ();
 }
-
-return 0;
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/cache-2.c b/gcc/testsuite/c-c++-common/goacc/cache-2.c
index d1181d1b6e7..80b925e5112 100644
--- a/gcc/testsuite/c-c++-common/goacc/cache-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/cache-2.c
@@ -1,7 +1,9 @@
-/* OpenACC cache directive: invalid usage.  */
+/* OpenACC 'cache' directive: invalid usage.  */
 
-int
-main (int argc, char **argv)
+/* See also corresponding C++ variant: '../../g++.dg/goacc/cache-2.C'.  */
+
+static void
+test ()
 {
 #define N   2
 int a[N], b[N];
@@ -52,6 +54,4 @@ main (int argc, char **argv)
 if (a[i] != b[i])
 __builtin_abort ();
 }
-
-return 0;
 }
diff --git a/gcc/testsuite/g++.dg/goacc/cache-1.C b/gcc/testsuite/g++.dg/goacc/cache-1.C
new file mode 100644
index 000..a8d5ab32016
--- /dev/null
+++ b/gcc/testsuite/g++.dg/goacc/cache-1.C
@@ -0,0 +1,15 @@
+/* OpenACC 'cache' directive: valid usage.  */
+
+/* See also corresponding C/C++ variant '../../c-c++-common/goacc/cache-1.c'.  */
+
+/* For execution testing, this file is '#include'd from
+   '../../../../libgomp/testsuite/libgomp.oacc-c++/cache-1.C'.  */
+
+#define TEMPLATIZE
+#include "../../c-c++-common/goacc/cache-1.c"
+
+static void
+instantiate ()
+{
+  &test<0>;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/cache-2.c b/gcc/testsuite/g++.dg/goacc/cache-2.C
similarity index 90%
copy from gcc/testsuite/c-c++-common/goacc/cache-2.c
copy to gcc/testsuite/g++.dg/goacc/cache-

Re: [C PATCH] Do not drop qualifiers for _Atomic in typeof

2020-11-25 Thread Uecker, Martin
On Monday, 23.11.2020, at 20:21 +, Joseph Myers wrote:
> On Mon, 23 Nov 2020, Uecker, Martin wrote:
> 
> > Joseph,
> > 
> > here is the patch to not drop qualifiers for _Atomic in
> > typeof. I am not sure whether this is appropriate in
> > stage3, but I wanted to leave it here for you to comment
> > and so that it does not get lost.
> > 
> > First, I noticed that the change to drop qualifiers
> > in lvalue conversion also implies that __auto_type now
> > always uses the non-qualified type. I think this is more
> > correct, and also what other compilers and C++'s auto do.
> > The first change here in c-parser would remove the now
> > redundant code to drop qualifiers for _Atomic types.
> > 
> > The second change would remove the code to drop qualifiers
> > for _Atomic types for typeof. I would then use the
> > comma operator for stdatomic to remove all qualifiers.
> > Here, the question is whether this may have some
> > unintended side effects.
> 
> This is OK, with references to bugs 65455 and 92935 as I think it fixes 
> those.
> 
> Any change to qualifiers for typeof risks breaking something relying on 
> the details of when the result is or is not qualified, but given that in 
> previous GCC versions that was poorly defined and inconsistent, making 
> these changes to make it more consistent seems reasonable.
> 
> It is probably still the case that _Typeof as proposed for ISO C would 
> need special handling of function and function pointer types for the same 
> reason as _Generic has such handling (_Noreturn not being part of the 
> function type as defined by ISO C).

So OK to apply with the following Changelog?



C: Do not drop qualifiers in typeof for _Atomic types. [PR65455,PR92935]

2020-11-25  Martin Uecker  

gcc/c/
* c-parser.c (c_parser_declaration_or_fndef): Remove redundant code
to drop qualifiers of _Atomic types for __auto_type.
(c_parser_typeof_specifier): Do not drop qualifiers of _Atomic
types for __typeof__.

gcc/ginclude/
* ginclude/stdatomic.h: Use comma operator to drop qualifiers.
 
gcc/testsuite/
* gcc.dg/typeof-2.c: Adapt test.



Re: [PATCH] libgfortran: Correct FP feature macro checks

2020-11-25 Thread Maciej W. Rozycki
On Wed, 25 Nov 2020, Tobias Burnus wrote:

> Does this solve all infinity issues? Or is there still code requiring it
> implicitly? From the previous exchange, it sounded as if there are still
> issues.

 Infinity is the least of a problem, because as it turns out we have 
numerous FP library functions in libgfortran that require explicit porting 
to each FP format supported, like these settings:

  xbig = 26.543, xhuge = 6.71e+7, xmax = 2.53e+307;

which need to be adjusted for the different representable ranges provided 
by the VAX format in this case.  Fortunately the VAX encoding is quite 
similar to IEEE 754 as far as the encodings of regular FP data go, but 
the bias of the exponent is bumped up by 2, that is for example with the 
double format using the VAX G-floating data type it is 1025 rather than 
1023 -- 1 by the encoding of the exponent itself and 1 by the value of the 
implicit mantissa bit.  Some of my previous work on the libm part of glibc 
can possibly be reused; see my other message here:



 BTW there's an interesting story available online regarding the VAX vs 
IEEE 754 (Intel) FP format, which does explain the similarities: 

.

  Maciej
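
The exponent arithmetic described above can be checked with a short C++ sketch. It assumes the host double is IEEE 754 binary64, and it computes the VAX G-floating exponent field arithmetically rather than producing a real VAX encoding; both function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Stored (biased) exponent field of an IEEE 754 binary64 value.
// Assumption: the host double is IEEE 754 binary64.
inline int ieee_double_stored_exponent(double x)
{
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return static_cast<int>((bits >> 52) & 0x7ff);
}

// Exponent field that VAX G-floating would store for the same value.
// G-floating uses an excess-1024 exponent and a significand normalized
// to [0.5, 1), which is the normalization std::frexp returns.
inline int vax_g_stored_exponent(double x)
{
  int e = 0;
  std::frexp(x, &e);  // x == m * 2^e with 0.5 <= |m| < 1
  return e + 1024;
}
```

For 1.0 the first function gives 1023 and the second 1025: one step comes from the excess value (1024 rather than 1023) and one from where the implicit mantissa bit sits.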


Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/24/20 8:40 AM, Stefan Kanthak wrote:
> Andreas Schwab wrote:
>
>> On Nov 24 2020, Stefan Kanthak wrote:
>>
>>> 'nuff said
>> What's your point?
> Pinpoint deficiencies and bugs in GCC and libgcc, plus a counter
> example to your "argument"!
> I recommend careful reading.
Umm, you should broaden your horizons.  The world is not an x86.  I'm
pretty sure Andreas was referring to non-x86 targets.

As Jakub has already indicated, your change will result in infinite
recursion on avr.  I happened to have a cr16 handy and it looks like
it'd generate infinite recursion there too.

On other targets the routines you're changing won't be used because they
either have 64 bit shifts or the compiler can synthesize them from other
primitives that are available.

It's pointless to keep arguing on the shift stuff.  What you've
submitted is fundamentally wrong in the context of gcc's libgcc2
routines.  It's that simple.  If you keep arguing about it you're likely
just going to annoy those who can help you to the point where they won't
bother.

I think the bswapsi2 change will go forward, but it needs to be tested.

jeff



Re: [PATCH] c++: v2: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-11-25 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 25, 2020 at 12:26:17PM -0500, Jason Merrill wrote:
> > + if (DECL_BIT_FIELD (fld)
> > + && DECL_NAME (fld) == NULL_TREE)
> > +   continue;
> 
> I think you want to check DECL_PADDING_P here; the C and C++ front ends set
> it on unnamed bit-fields, and that's what is_empty_type looks at.

Ok, changed in my copy.  I'll also post a patch for
__builtin_clear_padding to use DECL_PADDING_P in there instead of
DECL_BIT_FIELD/DECL_NAME==NULL.

> > +  if (TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
> > +   {
> > + /* Don't perform array-to-pointer conversion.  */
> > + arg = mark_rvalue_use (arg, loc, true);
> > + if (!complete_type_or_maybe_complain (TREE_TYPE (arg), arg, complain))
> > +   return error_mark_node;
> > +   }
> > +  else
> > +   arg = decay_conversion (arg, complain);
> 
> bit_cast operates on an lvalue argument, so I don't think we want
> decay_conversion at all here.
> 
> > +  if (error_operand_p (arg))
> > +   return error_mark_node;
> > +
> > +  arg = convert_from_reference (arg);
> 
> This shouldn't be necessary; the argument should already be converted from
> reference.  Generally we call convert_from_reference on the result of some
> processing, not on an incoming argument.

Removing these two regresses some tests in the testsuite.
It is true that std::bit_cast's argument must be a reference, and so when
one uses std::bit_cast one won't run into these problems, but the builtin's
argument itself is an rvalue and so we need to deal with people calling it
directly.
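
To illustrate the distinction, here is a rough sketch of a library-style wrapper that takes its argument by reference and forwards it to the builtin. This is not the libstdc++ implementation; bit_cast_sketch and the SKETCH_ macro are made-up names, and the memcpy fallback is only there so the sketch also builds with compilers that lack the builtin.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Detect the builtin without breaking preprocessors that lack __has_builtin.
#ifdef __has_builtin
# if __has_builtin(__builtin_bit_cast)
#  define SKETCH_HAS_BUILTIN_BIT_CAST 1
# endif
#endif

// Library-style wrapper: the argument arrives as a reference, so the
// wrapper never sees an array-to-pointer decay or an incomplete type.
template <typename To, typename From>
To bit_cast_sketch(const From& from) noexcept
{
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  static_assert(std::is_trivially_copyable<To>::value
                && std::is_trivially_copyable<From>::value,
                "both types must be trivially copyable");
#ifdef SKETCH_HAS_BUILTIN_BIT_CAST
  return __builtin_bit_cast(To, from);
#else
  To to;  // fallback: byte-wise copy, not usable in constant expressions
  std::memcpy(&to, &from, sizeof to);
  return to;
#endif
}
```

Code that calls the builtin directly skips the static_asserts and the reference binding, which is why the builtin itself still has to diagnose incomplete types and similar misuse.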
So, commenting out the decay_conversion and convert_from_reference results
in:
extern V v;
...
  __builtin_bit_cast (int, v);
no longer being reported as invalid use of incomplete type, but
error: '__builtin_bit_cast' source size '' not equal to destination type size '4'
(note nothing in between '' for the size because the size is NULL).
Ditto for:
extern V *p;
...
  __builtin_bit_cast (int, *p);
I guess I could add some hand written code to deal with incomplete types
to cure these.  But e.g. decay_conversion also calls mark_rvalue_use which
we also need e.g. for -Wunused-but-set*, but don't we also need it e.g. for
lambdas? The builtin is after all using the argument as an rvalue
(reads it).
Another change that commenting out those two parts causes is different
diagnostics on bit-cast4.C,
bit-cast4.C:7:30: error: '__builtin_bit_cast' is not a constant expression because 'const int* const' is a pointer type
bit-cast4.C:7:30: error: '__builtin_bit_cast' is not a constant expression because 'int D::* const' is a pointer to member type
bit-cast4.C:7:30: error: '__builtin_bit_cast' is not a constant expression because 'int (D::* const)() const' is a pointer to member type
The tests expect 'const int*', 'int D::*' and 'int (D::*)() const', i.e. the
toplevel qualifiers stripped from those.
Commenting out just the arg = convert_from_reference (arg); doesn't regress
anything though, it is the decay_conversion.

> > +/* Attempt to interpret aggregate of TYPE from bytes encoded in target
> > +   byte order at PTR + OFF with LEN bytes.  MASK contains bits set if the 
> > value
> > +   is indeterminate.  */
> > +
> > +static tree
> > +cxx_native_interpret_aggregate (tree type, const unsigned char *ptr, int 
> > off,
> > +   int len, unsigned char *mask,
> > +   const constexpr_ctx *ctx, bool *non_constant_p,
> > +   location_t loc)
> 
> Can this be, say, native_interpret_initializer in fold-const?  It doesn't
> seem closely tied to the front end other than diagnostics that could move to
> the caller, like you've already done for the non-aggregate case.

The middle-end doesn't need it ATM for anything, plus I think the
ctx/non_constant_p/loc and diagnostics is really C++ FE specific.
If you really want it in fold-const.c, the only way I can imagine it is
that it would be
tree
native_interpret_aggregate (tree type, const unsigned char *ptr, int off,
int len, unsigned char *mask = NULL,
tree (*mask_callback) (void *, int) = NULL,
void *mask_data = NULL)
where C++ would call it with the mask argument, as mask_callback a FE function
that would emit the diagnostics and decide what to return when mask is set
on something, and mask_data would be a pointer to struct containing
const constexpr_ctx *ctx; bool *non_constant_p; location_t loc;
for it.

Jakub



[committed] libstdc++: Remove redundant clock conversions in atomic waits

2020-11-25 Thread Jonathan Wakely via Gcc-patches
For the case where a timeout is specified using the system_clock we
perform a conversion to the preferred clock (which is either
steady_clock or system_clock itself), wait using __cond_wait_until_impl,
and then check the time by that clock again to see if it was reached.
This is entirely redundant, as we can just call __cond_wait_until_impl
directly. It will wait using the specified clock, and there's no need to
check the time twice. For the no_timeout case this removes two
unnecessary calls to the clock's now() function, and for the timeout
case it removes three calls.
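
For clocks other than the preferred one, the conversion still has to happen. A minimal standalone sketch of that delta-based conversion follows; convert_deadline is a hypothetical name, and the real code lives in __cond_wait_until in the diff below.

```cpp
#include <chrono>

// Re-express a deadline given on clock From as a deadline on clock To by
// sampling both clocks once and carrying the remaining time across.
// The result is only approximate: the two now() calls are not atomic,
// and the clocks may drift or be adjusted relative to each other.
template <typename To, typename From, typename Duration>
auto convert_deadline(const std::chrono::time_point<From, Duration>& atime)
{
  const auto from_now = From::now();   // "now" on the caller's clock
  const auto to_now = To::now();       // "now" on the clock we wait on
  const auto delta = atime - from_now; // time left until the deadline
  return to_now + delta;
}
```

When From and To are the same clock the two samples cancel out, so the whole exercise is pure overhead, and that is the case the patch short-circuits.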

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h (__cond_wait_until): Do not
perform redundant conversions to the same clock.

Tested powerpc-aix and sparc-solaris2.11.

Committed to trunk.

commit dfc537e554afa98b42a4b203ffd08c0eddba746e
Author: Jonathan Wakely 
Date:   Wed Nov 25 17:59:44 2020

libstdc++: Remove redundant clock conversions in atomic waits

diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h b/libstdc++-v3/include/bits/atomic_timed_wait.h
index 9e44114dd5b6..1c91c858ce7c 100644
--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
@@ -166,24 +166,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __cond_wait_until(__condvar& __cv, mutex& __mx,
  const chrono::time_point<_Clock, _Duration>& __atime)
   {
-#ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
-   using __clock_t = chrono::steady_clock;
-#else
+#ifndef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
using __clock_t = chrono::system_clock;
+#else
+   using __clock_t = chrono::steady_clock;
+   if constexpr (is_same_v<_Clock, chrono::steady_clock>)
+ return __detail::__cond_wait_until_impl(__cv, __mx, __atime);
+   else
 #endif
-   const typename _Clock::time_point __c_entry = _Clock::now();
-   const __clock_t::time_point __s_entry = __clock_t::now();
-   const auto __delta = __atime - __c_entry;
-   const auto __s_atime = __s_entry + __delta;
-   if (__detail::__cond_wait_until_impl(__cv, __mx, __s_atime)
-   == __atomic_wait_status::no_timeout)
- return __atomic_wait_status::no_timeout;
-   // We got a timeout when measured against __clock_t but
-   // we need to check against the caller-supplied clock
-   // to tell whether we should return a timeout.
-   if (_Clock::now() < __atime)
- return __atomic_wait_status::no_timeout;
-   return __atomic_wait_status::timeout;
+   if constexpr (is_same_v<_Clock, chrono::system_clock>)
+ return __detail::__cond_wait_until_impl(__cv, __mx, __atime);
+   else
+ {
+   const typename _Clock::time_point __c_entry = _Clock::now();
+   const __clock_t::time_point __s_entry = __clock_t::now();
+   const auto __delta = __atime - __c_entry;
+   const auto __s_atime = __s_entry + __delta;
+   if (__detail::__cond_wait_until_impl(__cv, __mx, __s_atime)
+   == __atomic_wait_status::no_timeout)
+ return __atomic_wait_status::no_timeout;
+   // We got a timeout when measured against __clock_t but
+   // we need to check against the caller-supplied clock
+   // to tell whether we should return a timeout.
+   if (_Clock::now() < __atime)
+ return __atomic_wait_status::no_timeout;
+   return __atomic_wait_status::timeout;
+ }
   }
 #endif // FUTEX
 


[committed] libstdc++: Encapsulate __gthread_cond_t as std::__condvar

2020-11-25 Thread Jonathan Wakely via Gcc-patches
This introduces a new internal utility, std::__condvar, which is a
simplified form of std::condition_variable. It has no dependency on
 or std::unique_lock, which allows it to be used in
.

This avoids repeating the #ifdef __GTHREAD_COND_INIT preprocessor
conditions and associated logic for initializing a __gthread_cond_t
correctly. It also encapsulates most of the __gthread_cond_xxx functions
as member functions of __condvar.

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h (__cond_wait_until_impl):
Do not define when _GLIBCXX_HAVE_LINUX_FUTEX is defined. Use
__condvar and mutex instead of __gthread_cond_t and
unique_lock.
(__cond_wait_until): Likewise. Fix test for return value of
__cond_wait_until_impl.
(__timed_waiters::_M_do_wait_until): Use __condvar instead
of __gthread_cond_t.
* include/bits/atomic_wait.h: Remove 
include. Only include  if not using futexes.
(__platform_wait_max_value): Remove unused variable.
(__waiters::lock_t): Use lock_guard instead of unique_lock.
(__waiters::_M_cv): Use __condvar instead of __gthread_cond_t.
(__waiters::_M_do_wait(__platform_wait_t)): Likewise.
(__waiters::_M_notify()): Likewise. Use notify_one() if not
asked to notify all.
* include/bits/std_mutex.h (__condvar): New type.
* include/std/condition_variable (condition_variable::_M_cond)
(condition_variable::wait_until): Use __condvar instead of
__gthread_cond_t.
* src/c++11/condition_variable.cc (condition_variable): Define
default constructor and destructor as defaulted.
(condition_variable::wait, condition_variable::notify_one)
(condition_variable::notify_all): Forward to corresponding
member function of __condvar.

Tested x86_64-linux, powerpc64le-linux, powerpc-aix and
sparc-solaris2.11. Committed to trunk.

commit 7d2a98a7273c423842a3935de64b15a6d6cb33bc
Author: Jonathan Wakely 
Date:   Wed Nov 25 14:24:21 2020

libstdc++: Encapsulate __gthread_cond_t as std::__condvar

diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h b/libstdc++-v3/include/bits/atomic_timed_wait.h
index b13f8aa12861..9e44114dd5b6 100644
--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
@@ -40,6 +40,7 @@
 #include 
 
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
+#include  // std::terminate
 #include 
 #endif
 
@@ -113,13 +114,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return __atomic_wait_status::timeout;
  }
   }
-#endif
+#else // ! FUTEX
 
 #ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
 template
   __atomic_wait_status
-  __cond_wait_until_impl(__gthread_cond_t* __cv,
- unique_lock& __lock,
+  __cond_wait_until_impl(__condvar& __cv, mutex& __mx,
  const chrono::time_point& __atime)
   {
auto __s = chrono::time_point_cast(__atime);
@@ -131,62 +131,

Re: [PATCH] libstdc++: Add C++2a synchronization support

2020-11-25 Thread Jonathan Wakely via Gcc-patches

On 25/11/20 10:35 +, Jonathan Wakely wrote:

I've pushed that as ad9cbcee543ecccd79fa49dafcd925532d2ce210 but there
are still other FAILs to be fixed.


I think the other FAILs are due to a race condition in the tests,
fixed by this patch. Tested x86_64-linux, powerpc64le-linux,
sparc-solaris2.11 and powerpc-aix. Committed to trunk.

I'll keep an eye on the testresults to see if this really fixes it or
not.



commit f76cad692a62d44ed32d010200bad74f36c73092
Author: Jonathan Wakely 
Date:   Wed Nov 25 14:39:54 2020

libstdc++: Fix testsuite helper functions [PR 97936]

This fixes a race condition in the util/atomic/wait_notify_util.h header
used by several tests, which should make the tests work properly.

libstdc++-v3/ChangeLog:

PR libstdc++/97936
* testsuite/29_atomics/atomic/wait_notify/bool.cc: Re-enable
test.
* testsuite/29_atomics/atomic/wait_notify/generic.cc: Likewise.
* testsuite/29_atomics/atomic/wait_notify/pointers.cc: Likewise.
* testsuite/29_atomics/atomic_flag/wait_notify/1.cc: Likewise.
* testsuite/29_atomics/atomic_float/wait_notify.cc: Likewise.
* testsuite/29_atomics/atomic_integral/wait_notify.cc: Likewise.
* testsuite/util/atomic/wait_notify_util.h: Fix missed
notifications by making the new thread wait until the parent
thread is waiting on the condition variable.

diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/bool.cc b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/bool.cc
index 29781c6e1357..c14a2391d68b 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/bool.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/bool.cc
@@ -2,7 +2,6 @@
 // { dg-do run { target c++2a } }
 // { dg-require-gthreads "" }
 // { dg-additional-options "-pthread" { target pthread } }
-// { dg-skip-if "broken" { ! *-*-*linux } }
 
 // Copyright (C) 2020 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/generic.cc b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/generic.cc
index 629556a9d2d0..988fe7b334f3 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/generic.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/generic.cc
@@ -2,7 +2,6 @@
 // { dg-do run { target c++2a } }
 // { dg-require-gthreads "" }
 // { dg-additional-options "-pthread" { target pthread } }
-// { dg-skip-if "broken" { ! *-*-*linux } }
 
 // Copyright (C) 2020 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/pointers.cc b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/pointers.cc
index f54961f893d4..87830236e0ee 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/pointers.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/pointers.cc
@@ -2,7 +2,6 @@
 // { dg-do run { target c++2a } }
 // { dg-additional-options "-pthread" { target pthread } }
 // { dg-require-gthreads "" }
-// { dg-skip-if "broken" { ! *-*-*linux } }
 
 // Copyright (C) 2020 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_flag/wait_notify/1.cc b/libstdc++-v3/testsuite/29_atomics/atomic_flag/wait_notify/1.cc
index 763d3e77159c..991713fbcdee 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_flag/wait_notify/1.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_flag/wait_notify/1.cc
@@ -2,7 +2,6 @@
 // { dg-do run { target c++2a } }
 // { dg-require-gthreads "" }
 // { dg-additional-options "-pthread" { target pthread } }
-// { dg-skip-if "broken" { ! *-*-*linux } }
 
 // Copyright (C) 2020 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc b/libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc
index 8f9e4a39a21f..134eff39e1b1 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc
@@ -3,7 +3,6 @@
 // { dg-require-gthreads "" }
 // { dg-additional-options "-pthread" { target pthread } }
 // { dg-add-options libatomic }
-// { dg-skip-if "broken" { ! *-*-*linux } }
 
 // Copyright (C) 2020 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_integral/wait_notify.cc b/libstdc++-v3/testsuite/29_atomics/atomic_integral/wait_notify.cc
index 762583cf8c76..c65379cba619 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_integral/wait_notify.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_integral/wait_notify.cc
@@ -3,7 +3,6 @@
 // { dg-require-gthreads "" }
 // { dg-add-options libatomic }
 // { dg-additional-options "-pthread" { target pthread } }
-// { dg-skip-if "broken" { *-*-* } }
 
 // Copyright (C) 2020 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/util/atomic/wait_notify_util.h b/libstdc++-v3/testsuite/util/atomic/wait_notify_util.h
index a319

Re: [PATCH 00/31] VAX: Bring the port up to date (yes, MODE_CC conversion is included)

2020-11-25 Thread Maciej W. Rozycki
On Fri, 20 Nov 2020, Maciej W. Rozycki wrote:

>  These changes have been regression-tested throughout development with the 
> `vax-netbsdelf' target running NetBSD 9.0, using said VAXstation 4000/60, 
> which uses the Mariah implementation of the VAX architecture.  The host 
> used was `powerpc64le-linux-gnu' and occasionally `x86_64-linux-gnu' as 
> well; changes outside the VAX backend were all natively bootstrapped and 
> regression-tested with both these hosts.

 I forgot to note that I have been going through this final verification 
with the native compiler and the `vax-netbsdelf' cross-compiler built with 
it both configured with `--disable-werror'.  This is due to a recent 
regression with the Go frontend causing a build error otherwise:

.../gcc/go/gofrontend/go-diagnostics.cc: In function 'std::string 
expand_message(const char*, va_list)':
.../gcc/go/gofrontend/go-diagnostics.cc:110:61: error: '' may be 
used uninitialized [-Werror=maybe-uninitialized]
  110 |  "memory allocation failed in vasprintf");
  | ^
In file included from 
.../prev-powerpc64le-linux-gnu/libstdc++-v3/include/string:55,
 from .../gcc/go/go-system.h:34,
 from .../gcc/go/gofrontend/go-linemap.h:10,
 from .../gcc/go/gofrontend/go-diagnostics.h:10,
 from .../gcc/go/gofrontend/go-diagnostics.cc:7:
.../prev-powerpc64le-linux-gnu/libstdc++-v3/include/bits/basic_string.h:525:7: 
note: by argument 3 of type 'const std::allocator&' to 
'std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::basic_string(const 
_CharT*, const _Alloc&) [with  = std::allocator; 
_CharT = char; _Traits = std::char_traits; _Alloc = 
std::allocator]' declared here
  525 |   basic_string(const _CharT* __s, const _Alloc& __a = _Alloc())
  |   ^~~~
.../gcc/go/gofrontend/go-diagnostics.cc:110:61: note: '' declared 
here
  110 |  "memory allocation failed in vasprintf");
  | ^
cc1plus: all warnings being treated as errors
make[3]: *** [.../gcc/go/Make-lang.in:242: go/go-diagnostics.o] Error 1

the cause for which I decided I could not afford the time to track down.  
Perhaps it has been fixed since, but mentioning it in case it has not.

 Earlier verification iterations were done with `--enable-werror-always'.

  Maciej


Re: [PATCH 00/31] VAX: Bring the port up to date (yes, MODE_CC conversion is included)

2020-11-25 Thread Maciej W. Rozycki
On Tue, 24 Nov 2020, Maciej W. Rozycki wrote:

> I am going to give up at this point, as porting libgfortran to non-IEEE FP 
> is surely well beyond what I can afford to do right now.

 I have now posted fixes for the issues handled so far.

 For the other pieces that are missing perhaps my work I did many years 
ago to port glibc 2.4 (the last one I was able to cook up without NPTL), 
and specifically libm within, to the never-upstreamed VAX/Linux target 
might be useful to complete the effort, as there seems to be an overlap 
here.  That port hasn't been fully verified though and I do not promise 
doing any work related to it anytime either.  The glibc patches continue 
being available online to download and use under the terms of the GNU GPL 
for anyone though.

  Maciej


Re: [PATCH] make POINTER_PLUS offset sizetype (PR 97956)

2020-11-25 Thread Martin Sebor via Gcc-patches

On 11/25/20 2:31 AM, Richard Biener wrote:
> On Wed, Nov 25, 2020 at 1:45 AM Martin Sebor via Gcc-patches
>  wrote:
>>
>> Offsets in pointer expressions are signed but GCC prefers to
>> represent them as sizetype instead, and sometimes (though not
>> always) crashes during GIMPLE verification when they're not.
>> The sometimes-but-not-always part makes it easy for mistakes
>> to slip in and go undetected for months, until someone either
>> trips over it by accident, or deliberately tries to break
>> things (the test case in the bug relies on declaring memchr
>> with the third argument of type signed long which is what's
>> apparently needed to trigger the ICE).  The attached patch
>> corrects a couple of such mistakes.
>>
>> Martin
>>
>> PS It would save us the time and effort dealing with these
>> bugs to either detect (or even correct) the mistakes early,
>> at the time the POINTER_PLUS_EXPR is built.  Adding an assert
>> to gimple_build_assign() to verify that it has the expected
>> type (or converting the operand to sizetype) as in the change
>> below does that.  I'm pretty sure I submitted a patch like it
>> in the past but it was rejected.  If I'm wrong or if there are
>> no objections to it now I'll be happy to commit it as well.
>
> We already verify this in verify_gimple_assign_binary after
> each pass.  If anything, this would argue for verifying all built
> stmts immediately, assigns with verify_gimple_assign.
> But I think this is overkill - your testcase is already caught
> by the IL verification.


You're right, having the check wouldn't have prevented this bug.
But I'm not worried about this test case.  What I'd like to do
is reduce the risk of similar problems happening in the future
where the check would help.  Catching problems earlier by having
functions verify their pre- and postconditions is good practice.
So yes, I think all these build() functions should do that (not
necessarily to the same extent as the full-blown IL verification
but at least the basics).
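
A minimal sketch of that idea, on a hypothetical toy IL rather than
GCC's own data structures: the builder asserts the same predicate that
a later whole-IL verifier would use, so a wrong offset type is reported
where the statement is built.  All names here (build_stmt,
stmt_operands_ok, offset_type_ok, the TK_* and OP_* enumerators) are
made up for illustration, and offset_type_ok is only a simplified
stand-in for ptrofftype_p.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature IL, not GCC's: just enough to show a builder
   checking its own precondition.  */
enum type_kind { TK_SIZETYPE, TK_SSIZETYPE, TK_POINTER };
enum op_code { OP_PLUS, OP_POINTER_PLUS };

struct operand { enum type_kind type; long value; };
struct stmt { enum op_code code; struct operand op1, op2; };

/* Simplified stand-in for ptrofftype_p: only the unsigned size type
   is accepted as a pointer offset.  */
static bool
offset_type_ok (enum type_kind t)
{
  return t == TK_SIZETYPE;
}

/* The precondition itself, usable both by the builder and by a
   separate verification pass.  */
static bool
stmt_operands_ok (enum op_code code, struct operand op1, struct operand op2)
{
  if (code == OP_POINTER_PLUS)
    return op1.type == TK_POINTER && offset_type_ok (op2.type);
  return op1.type == op2.type;
}

/* Builder that checks its precondition at the point of construction.  */
static struct stmt
build_stmt (enum op_code code, struct operand op1, struct operand op2)
{
  struct stmt s;
  assert (stmt_operands_ok (code, op1, op2));
  s.code = code;
  s.op1 = op1;
  s.op2 = op2;
  return s;
}
```

With the assert compiled in, building an OP_POINTER_PLUS statement with
a TK_SSIZETYPE offset aborts inside build_stmt instead of surfacing
only when a later verification pass runs.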



> Btw, are you sure the offset returned by constant_byte_string
> is never checked to be positive in callers?


The function sets *PTR_OFFSET to sizetype in all but this one
case (actually, it also sets it to integer_zero_node).  Callers
then typically compare it to the length of the string to see
if it's less.  If not, the result is discarded because it refers
outside the string.  It's tested for equality to zero but I don't
see it being checked to see if it's positive and I'm not sure to
what end.  What's your concern?

Anyway, as an experiment, I've changed the function to set
the offset to ssizetype instead of sizetype and reran a subset
of the test suite with the check in gimple_build_assign and it
didn't trigger.  So I guess the sloppiness here doesn't matter.

That said, there is a bug in the function that I noticed while
making this change so it wasn't a completely pointless exercise.
The function should call itself recursively but instead it calls
string_constant.  I'll resubmit the sizetype change with the fix
for this bug.

Martin



> The gimple-fold.c hunk and the new testcase are OK.
>
> Richard.
>
>> Both patches were tested on x86_64-linux.
>>
>> diff --git a/gcc/gimple.c b/gcc/gimple.c
>> index e3e508daf2f..8e88bab9e41 100644
>> --- a/gcc/gimple.c
>> +++ b/gcc/gimple.c
>> @@ -489,6 +489,9 @@ gassign *
>>  gimple_build_assign (tree lhs, enum tree_code subcode, tree op1,
>>                       tree op2 MEM_STAT_DECL)
>>  {
>> +  if (subcode == POINTER_PLUS_EXPR)
>> +    gcc_checking_assert (ptrofftype_p (TREE_TYPE (op2)));
>> +
>>    return gimple_build_assign_1 (lhs, subcode, op1, op2, NULL_TREE
>>                                  PASS_MEM_STAT);
>>  }




Re: [PATCH] libgfortran: Verify the presence of all functions for POSIX 2008 locale

2020-11-25 Thread Tobias Burnus

Patch looks good to me.

Thanks,

Tobias

PS: Are there still remaining issues to be solved? Or does libgfortran
now build?

On 25.11.20 19:14, Maciej W. Rozycki wrote:

While we have `configure' checks for the individual POSIX 2008 extended
locale functions we refer to and use to guard the respective call sites,
we only verify the presence of `newlocale' for our global feature enable
check.  Consequently compilation fails for targets like NetBSD that only
have partial support for POSIX 2008 locale features and in particular
lack the `uselocale' function:

.../libgfortran/io/transfer.c: In function 'data_transfer_init_worker':
.../libgfortran/io/transfer.c:3416:30: error:
'old_locale_lock' undeclared (first use in this function)
  3416 |   __gthread_mutex_lock (&old_locale_lock);
   |  ^~~
.../libgfortran/io/transfer.c:3416:30: note: each undeclared identifier is 
reported only once for each function it appears in
.../libgfortran/io/transfer.c:3417:12: error:
'old_locale_ctr' undeclared (first use in this function)
  3417 |   if (!old_locale_ctr++)
   |^~
.../libgfortran/io/transfer.c:3419:11: error:
'old_locale' undeclared (first use in this function); did you mean 'c_locale'?
  3419 |   old_locale = setlocale (LC_NUMERIC, NULL);
   |   ^~
   |   c_locale
.../libgfortran/io/transfer.c: In function 'finalize_transfer':
.../libgfortran/io/transfer.c:4253:26: error:
'old_locale_lock' undeclared (first use in this function)
  4253 |   __gthread_mutex_lock (&old_locale_lock);
   |  ^~~
.../libgfortran/io/transfer.c:4254:10: error:
'old_locale_ctr' undeclared (first use in this function)
  4254 |   if (!--old_locale_ctr)
   |  ^~
.../libgfortran/io/transfer.c:4256:30: error:
'old_locale' undeclared (first use in this function); did you mean 'c_locale'?
  4256 |   setlocale (LC_NUMERIC, old_locale);
   |  ^~
   |  c_locale
make[3]: *** [Makefile:6221: transfer.lo] Error 1

Only enable the use of POSIX 2008 extended locale features then when all
the three functions required are present, removing said build errors.

  libgfortran/
  * io/io.h [HAVE_NEWLOCALE]: Also check for HAVE_FREELOCALE and
  HAVE_USELOCALE.
  [HAVE_FREELOCALE && HAVE_NEWLOCALE && HAVE_USELOCALE]
  (HAVE_POSIX_2008_LOCALE): New macro.
  (st_parameter_dt) [HAVE_NEWLOCALE]: Check for
  HAVE_POSIX_2008_LOCALE instead.
  * io/transfer.c (data_transfer_init_worker, finalize_transfer)
  [HAVE_USELOCALE]: Check for HAVE_POSIX_2008_LOCALE instead.
  * io/unit.c [HAVE_NEWLOCALE]: Likewise.
  (init_units) [HAVE_NEWLOCALE]: Likewise.
  (close_units) [HAVE_FREELOCALE]: Likewise.
  * runtime/error.c (gf_strerror) [HAVE_USELOCALE]: Likewise.
---
  libgfortran/io/io.h |   10 +++---
  libgfortran/io/transfer.c   |4 ++--
  libgfortran/io/unit.c   |6 +++---
  libgfortran/runtime/error.c |2 +-
  4 files changed, 13 insertions(+), 9 deletions(-)

gcc-netbsd-libgfortran-locale.diff
Index: gcc/libgfortran/io/io.h
===
--- gcc.orig/libgfortran/io/io.h
+++ gcc/libgfortran/io/io.h
@@ -52,8 +52,12 @@ struct format_data;
  typedef struct fnode fnode;
  struct gfc_unit;

-#ifdef HAVE_NEWLOCALE
-/* We have POSIX 2008 extended locale stuff.  */
+#if defined (HAVE_FREELOCALE) && defined (HAVE_NEWLOCALE) \
+  && defined (HAVE_USELOCALE)
+/* We have POSIX 2008 extended locale stuff.  We only choose to use it
+   if all the functions required are present as some systems, e.g. NetBSD
+   do not have `uselocale'.  */
+#define HAVE_POSIX_2008_LOCALE
  extern locale_t c_locale;
  internal_proto(c_locale);
  #else
@@ -562,7 +566,7 @@ typedef struct st_parameter_dt
char *line_buffer;
struct format_data *fmt;
namelist_info *ionml;
-#ifdef HAVE_NEWLOCALE
+#ifdef HAVE_POSIX_2008_LOCALE
locale_t old_locale;
  #endif
/* Current position within the look-ahead line buffer.  */
Index: gcc/libgfortran/io/transfer.c
===
--- gcc.orig/libgfortran/io/transfer.c
+++ gcc/libgfortran/io/transfer.c
@@ -3410,7 +3410,7 @@ data_transfer_init_worker (st_parameter_

if (dtp->u.p.current_unit->flags.form == FORM_FORMATTED)
  {
-#ifdef HAVE_USELOCALE
+#ifdef HAVE_POSIX_2008_LOCALE
dtp->u.p.old_locale = uselocale (c_locale);
  #else
__gthread_mutex_lock (&old_locale_lock);
@@ -4243,7 +4243,7 @@ finalize_transfer (st_parameter_dt *dtp)
  }
  }

-#ifdef HAVE_USELOCALE
+#ifdef HAVE_POSIX_2008_LOCALE
if (dtp->u.p.old_locale != (locale_t) 0)
  {
uselocale (dtp->u.p.old_locale);
Index: gcc/libgfortran/io/unit.c
=

Re: [PATCH] libgfortran: Correct FP feature macro checks

2020-11-25 Thread Tobias Burnus

LGTM.

Does this solve all infinity issues? Or is there still code requiring it
implicitly? From the previous exchange, it sounded as if there are still
issues.

Tobias

On 25.11.20 19:14, Maciej W. Rozycki wrote:

The *_HAS_* floating-point feature macros are defined as 0/1 rather than
#undef/#define settings by gcc/c-family/c-cppbuiltin.c.  Consequently we
choose to use infinity and NaN features even with non-IEEE-754 targets
such as `vax-netbsdelf' that lack them, causing build warnings and
failures like:

In file included from .../libgfortran/generated/maxval_r4.c:26:
.../libgfortran/generated/maxval_r4.c: In function 'maxval_r4':
.../libgfortran/libgfortran.h:292:30: warning: target format does not support 
infinity
   292 | # define GFC_REAL_4_INFINITY __builtin_inff ()
   |  ^~
.../libgfortran/generated/maxval_r4.c:149:19:
note: in expansion of macro 'GFC_REAL_4_INFINITY'
   149 | result = -GFC_REAL_4_INFINITY;
   |   ^~~
.../libgfortran/generated/maxval_r4.c: In function 'mmaxval_r4':
.../libgfortran/libgfortran.h:292:30: warning: target format does not support 
infinity
   292 | # define GFC_REAL_4_INFINITY __builtin_inff ()
   |  ^~
.../libgfortran/generated/maxval_r4.c:363:19:
note: in expansion of macro 'GFC_REAL_4_INFINITY'
   363 | result = -GFC_REAL_4_INFINITY;
   |   ^~~
{standard input}: Assembler messages:
{standard input}:204: Fatal error: Can't relocate expression
make[3]: *** [Makefile:3358: maxval_r4.lo] Error 1

Correct the checks then for __FLT_HAS_INFINITY__, __DBL_HAS_INFINITY__,
__LDBL_HAS_INFINITY__, __FLT_HAS_QUIET_NAN__, __DBL_HAS_QUIET_NAN__, and
__LDBL_HAS_QUIET_NAN__ to match semantics and remove build issues coming
from the misinterpretation of these macros.

  libgfortran/
  * libgfortran.h: Use #if rather than #ifdef with
  __FLT_HAS_INFINITY__, __DBL_HAS_INFINITY__,
  __LDBL_HAS_INFINITY__, __FLT_HAS_QUIET_NAN__,
  __DBL_HAS_QUIET_NAN__, and __LDBL_HAS_QUIET_NAN__.
---
  libgfortran/libgfortran.h |   12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)

gcc-libgfortran-fp-has.diff
Index: gcc/libgfortran/libgfortran.h
===
--- gcc.orig/libgfortran/libgfortran.h
+++ gcc/libgfortran/libgfortran.h
@@ -288,13 +288,13 @@ typedef GFC_UINTEGER_4 gfc_char4_t;

  /* M{IN,AX}{LOC,VAL} need also infinities and NaNs if supported.  */

-#ifdef __FLT_HAS_INFINITY__
+#if __FLT_HAS_INFINITY__
  # define GFC_REAL_4_INFINITY __builtin_inff ()
  #endif
-#ifdef __DBL_HAS_INFINITY__
+#if __DBL_HAS_INFINITY__
  # define GFC_REAL_8_INFINITY __builtin_inf ()
  #endif
-#ifdef __LDBL_HAS_INFINITY__
+#if __LDBL_HAS_INFINITY__
  # ifdef HAVE_GFC_REAL_10
  #  define GFC_REAL_10_INFINITY __builtin_infl ()
  # endif
@@ -306,13 +306,13 @@ typedef GFC_UINTEGER_4 gfc_char4_t;
  #  endif
  # endif
  #endif
-#ifdef __FLT_HAS_QUIET_NAN__
+#if __FLT_HAS_QUIET_NAN__
  # define GFC_REAL_4_QUIET_NAN __builtin_nanf ("")
  #endif
-#ifdef __DBL_HAS_QUIET_NAN__
+#if __DBL_HAS_QUIET_NAN__
  # define GFC_REAL_8_QUIET_NAN __builtin_nan ("")
  #endif
-#ifdef __LDBL_HAS_QUIET_NAN__
+#if __LDBL_HAS_QUIET_NAN__
  # ifdef HAVE_GFC_REAL_10
  #  define GFC_REAL_10_QUIET_NAN __builtin_nanl ("")
  # endif

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


[PATCH] libgfortran: Verify the presence of all functions for POSIX 2008 locale

2020-11-25 Thread Maciej W. Rozycki
While we have `configure' checks for the individual POSIX 2008 extended 
locale functions we refer to and use to guard the respective call sites, 
we only verify the presence of `newlocale' for our global feature enable 
check.  Consequently compilation fails for targets like NetBSD that only 
have partial support for POSIX 2008 locale features and in particular 
lack the `uselocale' function:

.../libgfortran/io/transfer.c: In function 'data_transfer_init_worker':
.../libgfortran/io/transfer.c:3416:30: error:
'old_locale_lock' undeclared (first use in this function)
 3416 |   __gthread_mutex_lock (&old_locale_lock);
  |  ^~~
.../libgfortran/io/transfer.c:3416:30: note: each undeclared identifier is 
reported only once for each function it appears in
.../libgfortran/io/transfer.c:3417:12: error:
'old_locale_ctr' undeclared (first use in this function)
 3417 |   if (!old_locale_ctr++)
  |^~
.../libgfortran/io/transfer.c:3419:11: error:
'old_locale' undeclared (first use in this function); did you mean 'c_locale'?
 3419 |   old_locale = setlocale (LC_NUMERIC, NULL);
  |   ^~
  |   c_locale
.../libgfortran/io/transfer.c: In function 'finalize_transfer':
.../libgfortran/io/transfer.c:4253:26: error:
'old_locale_lock' undeclared (first use in this function)
 4253 |   __gthread_mutex_lock (&old_locale_lock);
  |  ^~~
.../libgfortran/io/transfer.c:4254:10: error:
'old_locale_ctr' undeclared (first use in this function)
 4254 |   if (!--old_locale_ctr)
  |  ^~
.../libgfortran/io/transfer.c:4256:30: error:
'old_locale' undeclared (first use in this function); did you mean 'c_locale'?
 4256 |   setlocale (LC_NUMERIC, old_locale);
  |  ^~
  |  c_locale
make[3]: *** [Makefile:6221: transfer.lo] Error 1

Only enable the use of POSIX 2008 extended locale features then when all 
the three functions required are present, removing said build errors.
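
A minimal sketch of the pattern, assuming the HAVE_* macros would come
from a configure-generated config.h (none is defined in the sketch, so
the fallback branch is what gets compiled).  The function name
format_real and the fixed "%.2f" format are made up for illustration;
this is not the libgfortran code.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Enable the per-thread path only when every function it calls has
   been detected, instead of keying on any single one of them.  */
#if defined (HAVE_FREELOCALE) && defined (HAVE_NEWLOCALE) \
    && defined (HAVE_USELOCALE)
# define HAVE_POSIX_2008_LOCALE 1
#endif

/* Format X with a '.' decimal point whatever the user's locale is.  */
static int
format_real (char *buf, size_t len, double x)
{
  int n;
#ifdef HAVE_POSIX_2008_LOCALE
  /* Per-thread switch: newlocale, uselocale and freelocale are all
     needed on this path.  If newlocale fails, the current locale is
     left in place.  */
  locale_t c_loc = newlocale (LC_NUMERIC_MASK, "C", (locale_t) 0);
  locale_t old = c_loc != (locale_t) 0 ? uselocale (c_loc) : (locale_t) 0;
  n = snprintf (buf, len, "%.2f", x);
  if (c_loc != (locale_t) 0)
    {
      uselocale (old);
      freelocale (c_loc);
    }
#else
  /* Process-wide fallback; multi-threaded code would also need a lock
     around this, which the sketch leaves out.  */
  char saved[64];
  const char *cur = setlocale (LC_NUMERIC, NULL);
  snprintf (saved, sizeof saved, "%s", cur != NULL ? cur : "C");
  setlocale (LC_NUMERIC, "C");
  n = snprintf (buf, len, "%.2f", x);
  setlocale (LC_NUMERIC, saved);
#endif
  return n;
}
```

Guarding both the declarations and the call sites with the one combined
macro keeps them in step, which is what the separate HAVE_NEWLOCALE and
HAVE_USELOCALE checks failed to do on NetBSD.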

libgfortran/
* io/io.h [HAVE_NEWLOCALE]: Also check for HAVE_FREELOCALE and
HAVE_USELOCALE.
[HAVE_FREELOCALE && HAVE_NEWLOCALE && HAVE_USELOCALE]
(HAVE_POSIX_2008_LOCALE): New macro.
(st_parameter_dt) [HAVE_NEWLOCALE]: Check for
HAVE_POSIX_2008_LOCALE instead.
* io/transfer.c (data_transfer_init_worker, finalize_transfer)
[HAVE_USELOCALE]: Check for HAVE_POSIX_2008_LOCALE instead.
* io/unit.c [HAVE_NEWLOCALE]: Likewise.
(init_units) [HAVE_NEWLOCALE]: Likewise.
(close_units) [HAVE_FREELOCALE]: Likewise.
* runtime/error.c (gf_strerror) [HAVE_USELOCALE]: Likewise.
---
 libgfortran/io/io.h |   10 +++---
 libgfortran/io/transfer.c   |4 ++--
 libgfortran/io/unit.c   |6 +++---
 libgfortran/runtime/error.c |2 +-
 4 files changed, 13 insertions(+), 9 deletions(-)

gcc-netbsd-libgfortran-locale.diff
Index: gcc/libgfortran/io/io.h
===
--- gcc.orig/libgfortran/io/io.h
+++ gcc/libgfortran/io/io.h
@@ -52,8 +52,12 @@ struct format_data;
 typedef struct fnode fnode;
 struct gfc_unit;
 
-#ifdef HAVE_NEWLOCALE
-/* We have POSIX 2008 extended locale stuff.  */
+#if defined (HAVE_FREELOCALE) && defined (HAVE_NEWLOCALE) \
+  && defined (HAVE_USELOCALE)
+/* We have POSIX 2008 extended locale stuff.  We only choose to use it
+   if all the functions required are present as some systems, e.g. NetBSD
+   do not have `uselocale'.  */
+#define HAVE_POSIX_2008_LOCALE
 extern locale_t c_locale;
 internal_proto(c_locale);
 #else
@@ -562,7 +566,7 @@ typedef struct st_parameter_dt
  char *line_buffer;
  struct format_data *fmt;
  namelist_info *ionml;
-#ifdef HAVE_NEWLOCALE
+#ifdef HAVE_POSIX_2008_LOCALE
  locale_t old_locale;
 #endif
  /* Current position within the look-ahead line buffer.  */
Index: gcc/libgfortran/io/transfer.c
===
--- gcc.orig/libgfortran/io/transfer.c
+++ gcc/libgfortran/io/transfer.c
@@ -3410,7 +3410,7 @@ data_transfer_init_worker (st_parameter_
 
   if (dtp->u.p.current_unit->flags.form == FORM_FORMATTED)
 {
-#ifdef HAVE_USELOCALE
+#ifdef HAVE_POSIX_2008_LOCALE
   dtp->u.p.old_locale = uselocale (c_locale);
 #else
   __gthread_mutex_lock (&old_locale_lock);
@@ -4243,7 +4243,7 @@ finalize_transfer (st_parameter_dt *dtp)
}
 }
 
-#ifdef HAVE_USELOCALE
+#ifdef HAVE_POSIX_2008_LOCALE
   if (dtp->u.p.old_locale != (locale_t) 0)
 {
   uselocale (dtp->u.p.old_locale);
Index: gcc/libgfortran/io/unit.c
===
--- gcc.orig/libgfortran/io/unit.c
+++ gcc/libgfortran/io/unit.c
@@ -114,7 +114,7 @@ static char stdout_name[] = "stdout";
 static char stderr_name

[PATCH] libgfortran: Correct FP feature macro checks

2020-11-25 Thread Maciej W. Rozycki
The *_HAS_* floating-point feature macros are defined as 0/1 rather than 
#undef/#define settings by gcc/c-family/c-cppbuiltin.c.  Consequently we 
choose to use infinity and NaN features even with non-IEEE-754 targets 
such as `vax-netbsdelf' that lack them, causing build warnings and 
failures like:

In file included from .../libgfortran/generated/maxval_r4.c:26:
.../libgfortran/generated/maxval_r4.c: In function 'maxval_r4':
.../libgfortran/libgfortran.h:292:30: warning: target format does not support 
infinity
  292 | # define GFC_REAL_4_INFINITY __builtin_inff ()
  |  ^~
.../libgfortran/generated/maxval_r4.c:149:19:
note: in expansion of macro 'GFC_REAL_4_INFINITY'
  149 | result = -GFC_REAL_4_INFINITY;
  |   ^~~
.../libgfortran/generated/maxval_r4.c: In function 'mmaxval_r4':
.../libgfortran/libgfortran.h:292:30: warning: target format does not support 
infinity
  292 | # define GFC_REAL_4_INFINITY __builtin_inff ()
  |  ^~
.../libgfortran/generated/maxval_r4.c:363:19:
note: in expansion of macro 'GFC_REAL_4_INFINITY'
  363 | result = -GFC_REAL_4_INFINITY;
  |   ^~~
{standard input}: Assembler messages:
{standard input}:204: Fatal error: Can't relocate expression
make[3]: *** [Makefile:3358: maxval_r4.lo] Error 1

Correct the checks then for __FLT_HAS_INFINITY__, __DBL_HAS_INFINITY__, 
__LDBL_HAS_INFINITY__, __FLT_HAS_QUIET_NAN__, __DBL_HAS_QUIET_NAN__, and 
__LDBL_HAS_QUIET_NAN__ to match semantics and remove build issues coming 
from the misinterpretation of these macros.
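
The difference can be shown with a small sketch.  The DEMO_* macros are
hypothetical stand-ins for the compiler-provided *_HAS_* macros, fixed
here to the values a target without infinities would see.

```c
/* Hypothetical stand-ins for 0/1-valued feature macros.  */
#define DEMO_HAS_INFINITY 0
#define DEMO_HAS_QUIET_NAN 1

/* #ifdef only asks whether the name is defined, so a macro defined
   as 0 still selects this branch.  */
#ifdef DEMO_HAS_INFINITY
# define IFDEF_SEES_INFINITY 1
#else
# define IFDEF_SEES_INFINITY 0
#endif

/* #if evaluates the value, so a macro defined as 0 is rejected.  */
#if DEMO_HAS_INFINITY
# define IF_SEES_INFINITY 1
#else
# define IF_SEES_INFINITY 0
#endif

#if DEMO_HAS_QUIET_NAN
# define IF_SEES_QUIET_NAN 1
#else
# define IF_SEES_QUIET_NAN 0
#endif

/* Report what each style of check concluded about infinity support.  */
static int
ifdef_result (void)
{
  return IFDEF_SEES_INFINITY;
}

static int
if_result (void)
{
  return IF_SEES_INFINITY;
}
```

An identifier that is not defined at all evaluates to 0 in an #if
expression, so switching from #ifdef to #if does not change the outcome
where the macro is absent.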

libgfortran/
* libgfortran.h: Use #if rather than #ifdef with 
__FLT_HAS_INFINITY__, __DBL_HAS_INFINITY__, 
__LDBL_HAS_INFINITY__, __FLT_HAS_QUIET_NAN__, 
__DBL_HAS_QUIET_NAN__, and __LDBL_HAS_QUIET_NAN__.
---
 libgfortran/libgfortran.h |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

gcc-libgfortran-fp-has.diff
Index: gcc/libgfortran/libgfortran.h
===
--- gcc.orig/libgfortran/libgfortran.h
+++ gcc/libgfortran/libgfortran.h
@@ -288,13 +288,13 @@ typedef GFC_UINTEGER_4 gfc_char4_t;
 
 /* M{IN,AX}{LOC,VAL} need also infinities and NaNs if supported.  */
 
-#ifdef __FLT_HAS_INFINITY__
+#if __FLT_HAS_INFINITY__
 # define GFC_REAL_4_INFINITY __builtin_inff ()
 #endif
-#ifdef __DBL_HAS_INFINITY__
+#if __DBL_HAS_INFINITY__
 # define GFC_REAL_8_INFINITY __builtin_inf ()
 #endif
-#ifdef __LDBL_HAS_INFINITY__
+#if __LDBL_HAS_INFINITY__
 # ifdef HAVE_GFC_REAL_10
 #  define GFC_REAL_10_INFINITY __builtin_infl ()
 # endif
@@ -306,13 +306,13 @@ typedef GFC_UINTEGER_4 gfc_char4_t;
 #  endif
 # endif
 #endif
-#ifdef __FLT_HAS_QUIET_NAN__
+#if __FLT_HAS_QUIET_NAN__
 # define GFC_REAL_4_QUIET_NAN __builtin_nanf ("")
 #endif
-#ifdef __DBL_HAS_QUIET_NAN__
+#if __DBL_HAS_QUIET_NAN__
 # define GFC_REAL_8_QUIET_NAN __builtin_nan ("")
 #endif
-#ifdef __LDBL_HAS_QUIET_NAN__
+#if __LDBL_HAS_QUIET_NAN__
 # ifdef HAVE_GFC_REAL_10
 #  define GFC_REAL_10_QUIET_NAN __builtin_nanl ("")
 # endif


Re: [PATCH] Overflow-trapping integer arithmetic routines7code

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/25/20 6:18 AM, Stefan Kanthak wrote:
> Jeff Law  wrote:
>
>> On 11/10/20 10:21 AM, Stefan Kanthak wrote:
>>
>>>> So with all that in mind, I installed everything except the bits which
>>>> have the LIBGCC2_BAD_CODE ifdefs after testing on the various crosses.
>>>> If you could remove the ifdefs on the abs/mult changes and resubmit them
>>>> it'd be appreciated.
>>> Done.
>> Thanks. I'm doing some testing on the abs changes right now. They look
>> pretty reasonable, though they will tend to generate worse code on
>> targets that don't handle overflow arithmetic and testing all that well.
> OTOH the changes yield better code on targets which have a proper
> overflow handling, and may benefit from eventual improvements in the
> compiler/code generator itself on all targets.
I mentioned it mostly because I wanted others to be aware that there are
targets where the abs changes may generate slightly worse code and that
resolution is (IMHO) mostly a matter of improving overflow handling in
the target.  These issues are small enough that I don't think they
should hinder the abs changes moving forward.


>
>> Also note that your approach always does 3 multiplies, which can be very
>> expensive on some architectures. The existing version in libgcc2.c will
>> often just do one or two multiplies. So while your implementation looks
>> a lot simpler, I suspect it's often much slower. And on targets without
>> 32bit multiplication support, it's probably horribly bad.
> All (current) processors I know have super-scalar architecture and a
> hardware multiplier, they'll execute the 3 multiplies in parallel.
> In my tests on i386 and AMD64 (Core2, Skylake, Ryzen/EPYC), the code
> generated for 
> as well as the (almost) branch-free code shown below and in
>  runs 10% to 25%
> faster than the __mulvDI3 routine from libgcc: the many conditional
> branches of your current implementation impair performance more than 3
> multiplies!
All the world is not an x86.  GCC supports over 30 distinct processor
types, many of which target the embedded world.   Those chips often have
limited multiply capabilities and they're often quite slow with
no/minimal pipelining and no superscalar or out of order capabilities.

The fact that it runs faster on x86 is good, but we have to think more
broadly.  As it stands right now I'm not going to put the
multiply changes in.  If you wanted to rework them so they're less
costly on the embedded targets, then that would be helpful.


>
>> My inclination is to leave the overflow checking double-word multiplier
>> as-is.
> See but  ff.
Already read and considered it. 
>
>> Though I guess you could keep the same structure as the existing
>> implementation which tries to avoid unnecessary multiplies and still use
>> the __builtin_{add,mul}_overflow to simplify the code a bit less
>> aggressively.
> Tertium datur: take a look at the __udivmodDI4 routine.
> It has separate code paths for targets without hardware divider, and
> also for targets where the hardware divider needs a normalized dividend.
> I therefore propose to add separate code paths for targets with and
> without hardware multiplier for the __mulvDI3 routine too, guarded by a
> preprocessor macro which tells whether a target has a hardware multiplier.
I don't think there is a way to indicate that there's a hardware
multiplier available (or what capabilities it might have -- some might
just have a 16x16 multiplier with or without widening variants, it can
depend on precisely what revision of the chip you're targeting -- which
can change based on compiler flags) and I would oppose a change that
adds something like TARGET_HAS_NO_HW_DIVIDE.  That's a wart and one I
would oppose spreading further.

Instead keep the tests that detect the special cases that don't need as
many multiplies and use the overflow builtins within that implementation
framework.  In cases where we can use the operands directly, that's
helpful as going through the struct/union likely leads to unnecessary
register shuffling.
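
A rough sketch of that shape, under loud assumptions: checked_mul64 and
fits_in_int32 are hypothetical names, the function reports overflow
through its return value instead of trapping, and it works on int64_t
directly.  It is not the libgcc2.c __mulvDI3 code, only an illustration
of keeping a cheap special-case test in front of the overflow builtin.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if X is representable in 32 bits.  */
static bool
fits_in_int32 (int64_t x)
{
  return x >= INT32_MIN && x <= INT32_MAX;
}

/* Multiply A by B, store the product in *RES and return true on
   overflow.  */
static bool
checked_mul64 (int64_t a, int64_t b, int64_t *res)
{
  /* Special case kept from the "avoid unnecessary work" structure:
     both magnitudes are at most 2^31, so the product's magnitude is
     at most 2^62 and cannot overflow; one multiply is enough.  */
  if (fits_in_int32 (a) && fits_in_int32 (b))
    {
      *res = a * b;
      return false;
    }

  /* General case: let the GCC/Clang builtin do the checked multiply
     on the operands directly.  */
  return __builtin_mul_overflow (a, b, res);
}
```

Whether the early test pays off depends on the target; the sketch only
shows that the special cases and the builtin can coexist in one
routine.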

jeff



Re: [PATCH] configury : Fix LEB128 support for non-GNU assemblers.

2020-11-25 Thread Jeff Law via Gcc-patches



On 11/25/20 2:49 AM, Iain Sandoe wrote:
> Hi,
>
> I’ve had this patch in my Darwin trees for (literally) years, it is
> relevant to
> the GCC ports that use a non-binutils assembler.
>
> At present, even if such an assembler supports LEB128,  the GCC config
> is setting HAVE_LEB128 = 0.
>
> The ports I know of that can benefit from a change here are:
>
> AIX (unaffected by this change, since the assembler [at least on
> gcc119] does
>     not support leb128 directives)
>
> Darwin (checked the various different assemblers)
>
> GCN (I can’t test this, so Andrew, please could you say if the change
>   is OK for that)
>
> Solaris (bootstrapped and tests running on GCC211, but maybe Rainer would
>     want wider checks).
>
> I guess we could exclude specific ports that don’t want to use leb128
> with
> a target elif in the configuration.
>
> OK for master?
>
> thanks
> Iain
>
> = commit message
>
> Some assemblers that either do not respond to --version, or are non-GNU,
> do support leb128 directives.  The current configuration test omits these
> cases and only supports GNU assemblers with a version > 2.11.
>
> The patch extends the asm test to cover one failure case present in
> assemblers based off an older version of GAS (where a 64 bit value with
> the MSB set presented to a .uleb128 directive causes a fail).
>
> This change then assumes that a non-GNU assembler that passes the asm
> test
> correctly supports the directives.
>
> gcc/ChangeLog:
>
> * configure.ac (check leb128 support): Support non-GNU assemblers
> that pass the leb128 configure test.  Add a test for uleb128 with
> the MSB set for a 64 bit value.
> * configure: Regenerated.
OK.  Obviously if Rainer's tests show adjustment is necessary we'll have
to take care of that. 
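
For reference, a sketch of what a .uleb128 directive has to produce for
the case the commit message mentions (a 64-bit value with the MSB set).
This is the generic unsigned LEB128 algorithm; encode_uleb128 is a
hypothetical helper name, and the code is not part of the configure
test.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode VALUE as unsigned LEB128 into OUT, which must have room for
   at least 10 bytes (the maximum for a 64-bit value).  Returns the
   number of bytes written.  */
static size_t
encode_uleb128 (uint64_t value, unsigned char *out)
{
  size_t n = 0;
  do
    {
      unsigned char byte = value & 0x7f;  /* Low seven bits first.  */
      value >>= 7;
      if (value != 0)
        byte |= 0x80;                     /* More bytes follow.  */
      out[n++] = byte;
    }
  while (value != 0);
  return n;
}
```

For 0x8000000000000000 this yields ten bytes: nine 0x80 bytes followed
by 0x01, since bit 63 is the lowest bit of the tenth 7-bit group.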

Jeff



[committed] libstdc++: Fix missing subsumption in std::iterator_traits [PR 97935]

2020-11-25 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

PR libstdc++/97935
* include/bits/iterator_concepts.h (__detail::__iter_without_category):
New helper concept.
(__iterator_traits::__cat): Use __detail::__iter_without_category.
* testsuite/24_iterators/associated_types/iterator.traits.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

I'll backport this to gcc-10 too.


commit 9d908b7fc475b351622fa5630d4874068c789d70
Author: Jonathan Wakely 
Date:   Wed Nov 25 17:18:44 2020

libstdc++: Fix missing subsumption in std::iterator_traits [PR 97935]

libstdc++-v3/ChangeLog:

PR libstdc++/97935
* include/bits/iterator_concepts.h 
(__detail::__iter_without_category):
New helper concept.
(__iterator_traits::__cat): Use __detail::__iter_without_category.
* testsuite/24_iterators/associated_types/iterator.traits.cc: New 
test.

diff --git a/libstdc++-v3/include/bits/iterator_concepts.h 
b/libstdc++-v3/include/bits/iterator_concepts.h
index 8ff4f8667dd1..6668caa8185c 100644
--- a/libstdc++-v3/include/bits/iterator_concepts.h
+++ b/libstdc++-v3/include/bits/iterator_concepts.h
@@ -357,6 +357,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 template<typename _Iter>
   concept __iter_without_nested_types = !__iter_with_nested_types<_Iter>;
+
+template<typename _Iter>
+  concept __iter_without_category
+   = !requires { typename _Iter::iterator_category; };
+
   } // namespace __detail
 
   template
@@ -396,20 +401,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{ using type = typename _Iter::iterator_category; };
 
   template<typename _Iter>
-   requires (!requires { typename _Iter::iterator_category; }
- && __detail::__cpp17_randacc_iterator<_Iter>)
+   requires __detail::__iter_without_category<_Iter>
+ && __detail::__cpp17_randacc_iterator<_Iter>
struct __cat<_Iter>
{ using type = random_access_iterator_tag; };
 
   template<typename _Iter>
-   requires (!requires { typename _Iter::iterator_category; }
- && __detail::__cpp17_bidi_iterator<_Iter>)
+   requires __detail::__iter_without_category<_Iter>
+ && __detail::__cpp17_bidi_iterator<_Iter>
struct __cat<_Iter>
{ using type = bidirectional_iterator_tag; };
 
   template<typename _Iter>
-   requires (!requires { typename _Iter::iterator_category; }
- && __detail::__cpp17_fwd_iterator<_Iter>)
+   requires __detail::__iter_without_category<_Iter>
+ && __detail::__cpp17_fwd_iterator<_Iter>
struct __cat<_Iter>
{ using type = forward_iterator_tag; };
 
diff --git 
a/libstdc++-v3/testsuite/24_iterators/associated_types/iterator.traits.cc 
b/libstdc++-v3/testsuite/24_iterators/associated_types/iterator.traits.cc
new file mode 100644
index ..c3549fefaf60
--- /dev/null
+++ b/libstdc++-v3/testsuite/24_iterators/associated_types/iterator.traits.cc
@@ -0,0 +1,56 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+
+#include <iterator>
+
+struct bidi_iterator
+{
+  // No nested reference and pointer types.
+  // No iterator_category.
+
+  // cpp17-iterator requirements:
+  int&   operator*() const;
+  bidi_iterator& operator++();
+  bidi_iterator  operator++(int);
+
+  // cpp17-input-iterator requirements:
+  friend bool operator==(const bidi_iterator&, const bidi_iterator&);
+  using difference_type = long long;
+  using value_type = int;
+
+  // cpp17-forward-iterator requirements:
+  bidi_iterator();
+
+  // cpp17-bidirectional-iterator requirements:
+  bidi_iterator& operator--();
+  bidi_iterator operator--(int);
+};
+
+void
+test01()
+{
+  // PR libstdc++/97935
+  // Missing subsumption in iterator category detection
+  using namespace std;
+  static_assert(__detail::__cpp17_bidi_iterator<bidi_iterator>);
+  static_assert(same_as<iterator_traits<bidi_iterator>::iterator_category,
+   bidirectional_iterator_tag>,
+   "PR libstdc++/97935");
+}


[committed] libstdc++: Fix test failure on AIX

2020-11-25 Thread Jonathan Wakely via Gcc-patches
This fixes a failure on AIX 7.2:

FAIL: 17_intro/names.cc (test for excess errors)
Excess errors:
/home/jwakely/src/gcc/libstdc++-v3/testsuite/17_intro/names.cc:99: error: 
expected identifier before '(' token
/usr/include/sys/var.h:187: error: expected unqualified-id before '{' token
/usr/include/sys/var.h:187: error: expected ')' before '{' token
/usr/include/sys/var.h:337: error: expected unqualified-id before ';' token
/usr/include/sys/var.h:337: error: expected ')' before ';' token

libstdc++-v3/ChangeLog:

* testsuite/17_intro/names.cc: Do not test 'v' on AIX.

Tested powerpc64le-linux and powerpc-aix. Committed to trunk.

commit 1a8d1f54de371de88b2604d8c0e4e01306be8870
Author: Jonathan Wakely 
Date:   Wed Nov 25 16:58:05 2020

libstdc++: Fix test failure on AIX

This fixes a failure on AIX 7.2:

FAIL: 17_intro/names.cc (test for excess errors)
Excess errors:
/home/jwakely/src/gcc/libstdc++-v3/testsuite/17_intro/names.cc:99: error: 
expected identifier before '(' token
/usr/include/sys/var.h:187: error: expected unqualified-id before '{' token
/usr/include/sys/var.h:187: error: expected ')' before '{' token
/usr/include/sys/var.h:337: error: expected unqualified-id before ';' token
/usr/include/sys/var.h:337: error: expected ')' before ';' token

libstdc++-v3/ChangeLog:

* testsuite/17_intro/names.cc: Do not test 'v' on AIX.

diff --git a/libstdc++-v3/testsuite/17_intro/names.cc 
b/libstdc++-v3/testsuite/17_intro/names.cc
index 5a61c97e9899..2c8bfff26e1c 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -193,6 +193,8 @@
 #undef r
 #undef x
 #undef y
+// <sys/var.h> defines vario::v
+#undef v
 #endif
 
 #ifdef __hpux__


Re: How to traverse all the local variables that declared in the current routine?

2020-11-25 Thread Qing Zhao via Gcc-patches



> On Nov 25, 2020, at 3:11 AM, Richard Biener  
> wrote:
>> 
>> 
>> Hi,
>> 
>> Does gcc provide an iterator to traverse all the local variables that are 
>> declared in the current routine?
>> 
>> If not, what’s the best way to traverse the local variables?
>> 
>> 
>> Depends on what for.  There's the source level view you get by walking
>> BLOCK_VARS of the
>> scope tree, there's cfun->local_variables (FOR_EACH_LOCAL_DECL) and
>> there's SSA names
>> (FOR_EACH_SSA_NAME).
>> 
>> 
>> I am planning to add a new phase immediately after 
>> “pass_late_warn_uninitialized” to initialize all auto-variables that are
>> not explicitly initialized in the declaration, the basic idea is as follows:
>> 
>> ** The proposal:
>> 
>> A. add a new GCC option: (same name and meaning as CLANG)
>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>> 
>> B. add a new attribute for variable:
>> __attribute__((uninitialized))
>> the marked variable is uninitialized intentionally for performance purpose.
>> 
>> C. The implementation needs to keep the current static warning on 
>> uninitialized
>> variables untouched in order to avoid "forking the language".
>> 
>> 
>> ** The implementation:
>> 
>> There are two major requirements for the implementation:
>> 
>> 1. all auto-variables that do not have an explicit initializer should be 
>> initialized to
>> zero by this option.  (Same behavior as CLANG)
>> 
>> 2. keep the current static warning on uninitialized variables untouched.
>> 
>> In order to satisfy 1, we should check whether an auto-variable has 
>> initializer
>> or not;
>> In order to satisfy 2, we should add this new transformation after
>> "pass_late_warn_uninitialized".
>> 
>> So, we should be able to check whether an auto-variable has initializer or 
>> not after “pass_late_warn_uninitialized”,
>> If Not, then insert an initialization for it.
>> 
>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>> 
>> 
>> Yes, but do you want to catch variables promoted to register as well
>> or just variables
>> on the stack?
>> 
>> 
>> I think both as long as they are source-level auto-variables. Then which one 
>> is better?
>> 
>> 
>> Another issue is, in order to check whether an auto-variable has 
>> initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag during gimplification.
>> 
>> 
>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)”  inside the routine 
>> “gimplify_decl_expr” (gimplify.c) as follows:
>> 
>>  if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>>{
>>  tree init = DECL_INITIAL (decl);
>> ...
>>  if (init && init != error_mark_node)
>>{
>>  if (!TREE_STATIC (decl))
>>{
>>  DECL_IS_INITIALIZED(decl) = 1;
>>}
>> 
>> Is this enough for all Frontends? Are there other places that I need to 
>> maintain this bit?
>> 
>> 
>> 
>> Do you have any comment and suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?
>> 
>> 
>> All the locals from the source-code point of view should be covered.   (From 
>> my study so far,  looks like that Clang adds that phase in FE).
>> If GCC adds this phase in FE, then the following design requirement
>> 
>> C. The implementation needs to keep the current static warning on 
>> uninitialized
>> variables untouched in order to avoid "forking the language”.
>> 
>> cannot be satisfied.  Since gcc’s uninitialized variables analysis is 
>> applied quite late.
>> 
>> So, we have to add this new phase after “pass_late_warn_uninitialized”.
>> 
>> I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure out yourself whether a local is actually used (see 
>> expand_stack_vars)
>> 
>> 
>> Adding  this new transformation during RTL expansion is okay.  I will check 
>> on this in more details to see how to add it to RTL expansion phase.
>> 
>> 
>> Note that optimization will already have made use of "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too 
>> late.
>> 
>> 
>> This is a really good point…
>> 
>> In order to avoid optimization  to use the “uninitialized” state of locals, 
>> we should add the zeroing phase as early as possible (adding it in FE might 
>> be best
>> for this issue). However, if we have to meet the following requirement:
> 
> So is optimization supposed to pick up zero or is it supposed to act
> as if the initializer
> is unknown?

Good question!

Theoretically,  the new option -ftrivial-auto-var-init=zero is supposed to add 

Re: [PATCH] clean up more -Wformat-diag (PR 94982)

2020-11-25 Thread Jason Merrill via Gcc-patches

On 11/24/20 8:09 PM, Martin Sebor via Gcc-patches wrote:

The attached patch cleans up most remaining -Wformat-diag instances
in an x86_64-build.  I tried to minimize using #pragma diagnostic
so I tweaked the code instead.  A preferable solution might be to
introduce a new format attribute and use it to exempt the pp_printf()
and verbatim() functions from some of the -Wformat-diag checks but that
would require more surgery on the warning code than I think is called
for at this point.

Tested by bootstrapping all of the languages below (same set I test
all my patches with) and regtesting:
   ada,brig,c,c++,d,fortran,jit,lto,objc,obj-c++


OK.



Re: Unreviewed g++ driver patch

2020-11-25 Thread Jason Merrill via Gcc-patches

On 11/25/20 9:41 AM, Rainer Orth wrote:

The g++spec.c part of this patch

[PATCH] ada: c++: Get rid of libposix4, librt on Solaris
 https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559341.html

is the only remaining part still needing review by a c++ or driver
maintainer.  It's more than a week now.


OK.



Re: [PATCH] c++: v2: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-11-25 Thread Jason Merrill via Gcc-patches

On 11/2/20 2:21 PM, Jakub Jelinek wrote:

On Thu, Jul 30, 2020 at 10:16:46AM -0400, Jason Merrill via Gcc-patches wrote:

The following patch adds __builtin_bit_cast builtin, similarly to
clang or MSVC which implement std::bit_cast using such a builtin too.


Great!


Sorry for the long delay.

The following patch implements what you've requested in the review.
It doesn't deal with the padding bits being (sometimes) well defined, but if
we e.g. get a new CONSTRUCTOR bit that would request that (and corresponding
gimplifier change to pre-initialize to all zeros), it could be easily
adjusted (just for such CONSTRUCTORs memset the whole mask portion
(total_bytes) and set mask to NULL).  If there are other cases where
such handling is desirable (e.g. I wonder about bit_cast being called on
non-automatic const variables where their CONSTRUCTOR at least
implementation-wise would end up being on pre-zeroed .rodata (or .data)
memory), that can be handled too.

In the previous version I was using next_initializable_field, that is
something that isn't available in the middle-end, so all I do right now
is just skip non-FIELD_DECLs, unnamed bit-fields and fields with zero size.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-11-02  Jakub Jelinek  

PR libstdc++/93121
* fold-const.h (native_encode_initializer): Add mask argument
defaulted to nullptr.
(find_bitfield_repr_type): Declare.
* fold-const.c (find_bitfield_repr_type): New function.
(native_encode_initializer): Add mask argument and support for
filling it.  Handle also some bitfields without integral
DECL_BIT_FIELD_REPRESENTATIVE.

* c-common.h (enum rid): Add RID_BUILTIN_BIT_CAST.
* c-common.c (c_common_reswords): Add __builtin_bit_cast.

* cp-tree.h (cp_build_bit_cast): Declare.
* cp-tree.def (BIT_CAST_EXPR): New tree code.
* cp-objcp-common.c (names_builtin_p): Handle RID_BUILTIN_BIT_CAST.
(cp_common_init_ts): Handle BIT_CAST_EXPR.
* cxx-pretty-print.c (cxx_pretty_printer::postfix_expression):
Likewise.
* parser.c (cp_parser_postfix_expression): Handle
RID_BUILTIN_BIT_CAST.
* semantics.c (cp_build_bit_cast): New function.
* tree.c (cp_tree_equal): Handle BIT_CAST_EXPR.
(cp_walk_subtrees): Likewise.
* pt.c (tsubst_copy): Likewise.
* constexpr.c (check_bit_cast_type, cxx_native_interpret_aggregate,
cxx_eval_bit_cast): New functions.
(cxx_eval_constant_expression): Handle BIT_CAST_EXPR.
(potential_constant_expression_1): Likewise.
* cp-gimplify.c (cp_genericize_r): Likewise.

* g++.dg/cpp2a/bit-cast1.C: New test.
* g++.dg/cpp2a/bit-cast2.C: New test.
* g++.dg/cpp2a/bit-cast3.C: New test.
* g++.dg/cpp2a/bit-cast4.C: New test.
* g++.dg/cpp2a/bit-cast5.C: New test.

--- gcc/fold-const.h.jj 2020-08-18 07:50:18.716920202 +0200
+++ gcc/fold-const.h 2020-11-02 15:53:59.859477063 +0100
@@ -27,9 +27,10 @@ extern int folding_initializer;
  /* Convert between trees and native memory representation.  */
  extern int native_encode_expr (const_tree, unsigned char *, int, int off = -1);
  extern int native_encode_initializer (tree, unsigned char *, int,
- int off = -1);
+ int off = -1, unsigned char * = nullptr);
  extern tree native_interpret_expr (tree, const unsigned char *, int);
  extern bool can_native_interpret_type_p (tree);
+extern tree find_bitfield_repr_type (int, int);
  extern void shift_bytes_in_array_left (unsigned char *, unsigned int,
   unsigned int);
  extern void shift_bytes_in_array_right (unsigned char *, unsigned int,
--- gcc/fold-const.c.jj 2020-10-15 15:09:49.079725120 +0200
+++ gcc/fold-const.c 2020-11-02 17:20:43.784633491 +0100
@@ -7915,25 +7915,78 @@ native_encode_expr (const_tree expr, uns
  }
  }
  
+/* Try to find a type whose byte size is smaller or equal to LEN bytes larger
+   or equal to FIELDSIZE bytes, with underlying mode precision/size multiple
+   of BITS_PER_UNIT.  As native_{interpret,encode}_int works in term of
+   machine modes, we can't just use build_nonstandard_integer_type.  */
+
+tree
+find_bitfield_repr_type (int fieldsize, int len)
+{
+  machine_mode mode;
+  for (int pass = 0; pass < 2; pass++)
+{
+  enum mode_class mclass = pass ? MODE_PARTIAL_INT : MODE_INT;
+  FOR_EACH_MODE_IN_CLASS (mode, mclass)
+   if (known_ge (GET_MODE_SIZE (mode), fieldsize)
+   && known_eq (GET_MODE_PRECISION (mode),
+GET_MODE_BITSIZE (mode))
+   && known_le (GET_MODE_SIZE (mode), len))
+ {
+   tree ret = lang_hooks.types.type_for_mode (mode, 1);
+   if (ret && TYPE_MODE (ret) == mode)
+ return ret;
+ }
+}
+
+  for (int i = 0; i < NUM_INT_N_ENTS;

Re: [PATCH 1/7] arm: Auto-vectorization for MVE: vand

2020-11-25 Thread Andre Simoes Dias Vieira via Gcc-patches

Hi Christophe,

Thanks for these! See some inline comments.

On 25/11/2020 13:54, Christophe Lyon via Gcc-patches wrote:

This patch enables MVE vandq instructions for auto-vectorization.  MVE
vandq insns in mve.md are modified to use 'and' instead of unspec
expression to support and3.  The and3 expander is added to
vec-common.md

2020-11-12  Christophe Lyon  

gcc/
* gcc/config/arm/iterators.md (supf): Remove VANDQ_S and VANDQ_U.
(VANDQ): Remove.
* config/arm/mve.md (mve_vandq_u): New entry for vand
instruction using expression and.
(mve_vandq_s): New expander.
* config/arm/neon.md (and3): Renamed into and3_neon.
* config/arm/unspecs.md (VANDQ_S, VANDQ_U): Remove.
* config/arm/vec-common.md (and3): New expander.

gcc/testsuite/
* gcc.target/arm/simd/mve-vand.c: New test.
---
  gcc/config/arm/iterators.md  |  4 +---
  gcc/config/arm/mve.md| 20 
  gcc/config/arm/neon.md   |  2 +-
  gcc/config/arm/unspecs.md|  2 --
  gcc/config/arm/vec-common.md | 15 
  gcc/testsuite/gcc.target/arm/simd/mve-vand.c | 34 
  6 files changed, 66 insertions(+), 11 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vand.c

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 592af35..72039e4 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -1232,8 +1232,7 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
   (VADDLVQ_P_U "u") (VCMPNEQ_U "u") (VCMPNEQ_S "s")
   (VABDQ_M_S "s") (VABDQ_M_U "u") (VABDQ_S "s")
   (VABDQ_U "u") (VADDQ_N_S "s") (VADDQ_N_U "u")
-  (VADDVQ_P_S "s")   (VADDVQ_P_U "u") (VANDQ_S "s")
-  (VANDQ_U "u") (VBICQ_S "s") (VBICQ_U "u")
+  (VADDVQ_P_S "s")   (VADDVQ_P_U "u") (VBICQ_S "s") (VBICQ_U "u")
   (VBRSRQ_N_S "s") (VBRSRQ_N_U "u") (VCADDQ_ROT270_S "s")
   (VCADDQ_ROT270_U "u") (VCADDQ_ROT90_S "s")
   (VCMPEQQ_S "s") (VCMPEQQ_U "u") (VCADDQ_ROT90_U "u")
@@ -1501,7 +1500,6 @@ (define_int_iterator VABDQ [VABDQ_S VABDQ_U])
  (define_int_iterator VADDQ_N [VADDQ_N_S VADDQ_N_U])
  (define_int_iterator VADDVAQ [VADDVAQ_S VADDVAQ_U])
  (define_int_iterator VADDVQ_P [VADDVQ_P_U VADDVQ_P_S])
-(define_int_iterator VANDQ [VANDQ_U VANDQ_S])
  (define_int_iterator VBICQ [VBICQ_S VBICQ_U])
  (define_int_iterator VBRSRQ_N [VBRSRQ_N_U VBRSRQ_N_S])
  (define_int_iterator VCADDQ_ROT270 [VCADDQ_ROT270_S VCADDQ_ROT270_U])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index ecbaaa9..975eb7d 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -894,17 +894,27 @@ (define_insn "mve_vaddvq_p_"
  ;;
  ;; [vandq_u, vandq_s])
  ;;
-(define_insn "mve_vandq_"
+;; signed and unsigned versions are the same: define the unsigned
+;; insn, and use an expander for the signed one as we still reference
+;; both names from arm_mve.h.
+(define_insn "mve_vandq_u"
[
 (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-   (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-  (match_operand:MVE_2 2 "s_register_operand" "w")]
-VANDQ))
+   (and:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+  (match_operand:MVE_2 2 "s_register_operand" "w")))
The predicate on the second operand is more restrictive than the one in the 
expander ('neon_inv_logic_op2'). This should still work with immediates (or, 
well, I checked for integers); it generates a loop like this:


    vldrw.32    q3, [r0]
    vldr.64 d4, .L8
    vldr.64 d5, .L8+8
    vand    q3, q3, q2
    vstrw.32    q3, [r2]

MVE does support vand with immediates, just like NEON. I suspect you 
could just copy the way Neon handles these; it is possibly worth the little 
extra effort. You can use dest[i] = a[i] & ~1 as a testcase.
If you don't, it might still be worth expanding the test to make sure 
other immediate/type combinations don't trigger an ICE?
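
For reference, the suggested testcase could look roughly like the sketch below. The function name and signature are made up for illustration, and nothing here asserts which instruction the vectorizer ends up choosing; it only pins down the scalar results such a test would expect.

```c
#include <stdint.h>

/* Hypothetical testcase body: clear bit 0 of every element.  The constant
   ~1 is the kind of immediate operand discussed above.  */
void
and_not_one (int32_t *dest, const int32_t *a, int n)
{
  for (int i = 0; i < n; i++)
    dest[i] = a[i] & ~1;
}
```

Variants over other element types and other constants would follow the same shape.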


I'm not sure I understand why it loads it in two 64-bit chunks rather than 
doing a single load, or just using something like a vmov or vbic immediate. 
Anyhow, that's a worry for another day, I guess.

]
"TARGET_HAVE_MVE"
-  "vand %q0, %q1, %q2"
+  "vand\t%q0, %q1, %q2"
[(set_attr "type" "mve_move")
  ])
+(define_expand "mve_vandq_s"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand")
+   (and:MVE_2 (match_operand:MVE_2 1 "s_register_operand")
+  (match_operand:MVE_2 2 "s_register_operand")))
+  ]
+  "TARGET_HAVE_MVE"
+)
  
  ;;

  ;; [vbicq_s, vbicq_u])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 2d76769..dc4707d 100644
--- a/gcc/config/arm/neon.md
+++ 

Re: [PATCH 00/31] VAX: Bring the port up to date (yes, MODE_CC conversion is included)

2020-11-25 Thread Maciej W. Rozycki
On Mon, 23 Nov 2020, Paul Koning wrote:

> > Then there is a fix for the PDP11 backend addressing an issue I found in 
> > the handling of floating-point comparisons.  Unlike all the other changes 
> > this one has not been regression-tested, not even built as I have no idea 
> > how to prepare a development environment for a PDP11 target (also none of 
> > my VAX pieces is old enough to support PDP11 machine code execution).
> 
> I agree this is a correct change, interesting that it was missed before.  
> You'd expect some ICE issues from that mistake.  Perhaps there were and 
> I didn't realize the cause; the PDP11 test run is not yet fully clean.

 Nothing like that, I wouldn't expect an ICE here.  Just as none happened 
with the VAX backend before a test case made me realise a corresponding 
change was required.  It's just a pessimisation: the RTL simply doesn't 
match and the comparison to remove stays.

> I've hacked together a primitive newlib based "bare metal" execution 
> test setup that uses SIMH, but it's not a particularly clean setup.  
> And it hasn't been posted, I hope to be able to do that at some point.

 Hmm, I gather those systems are able to run some kind of BSD Unix: don't 
they support the r-commands which would allow you to run DejaGNU testing 
with a realistic environment PDP-11 hardware would be usually used with, 
possibly on actual hardware even?  I always feel a bit uneasy about the 
accuracy of any simulation (having suffered from bugs in QEMU causing 
false negatives in software verification).

 While I would expect old BSD libc to miss some of the C language features 
considered standard nowadays, I think at least the C GCC frontend runtime 
(libgcc.a) and the test suite do not overall rely on their presence, and 
any individual test cases that do can be easily excluded.

> Thanks for the fix.

 I take it as an approval and will apply the change then along with the 
rest of the series.  Thank you for your review.

  Maciej


[PATCH 2/2] gcc/testsuite/s390: Add test cases for float_t

2020-11-25 Thread Marius Hillenbrand via Gcc-patches
Add two test cases that check for acceptable combinations of float_t and
FLT_EVAL_METHOD on s390x.

Tested against an as-is glibc and one modified so that it derives
float_t from FLT_EVAL_METHOD.

gcc/testsuite/ChangeLog:

2020-11-25  Marius Hillenbrand  

* gcc.target/s390/float_t-1.c: New test.
* gcc.target/s390/float_t-2.c: New test.
---
 gcc/testsuite/gcc.target/s390/float_t-1.c | 15 +++
 gcc/testsuite/gcc.target/s390/float_t-2.c | 13 +
 2 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/float_t-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/float_t-2.c

diff --git a/gcc/testsuite/gcc.target/s390/float_t-1.c b/gcc/testsuite/gcc.target/s390/float_t-1.c
new file mode 100644
index 000..3455694250f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/float_t-1.c
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+/* { dg-options "-std=c99" } */
+#include <math.h>
+#include <stdlib.h>
+
+int main()
+{
+  /* In standard-compliant mode, the size of float_t and FLT_EVAL_METHOD must
+ match. */
+  if (sizeof(float_t) == sizeof(double) && __FLT_EVAL_METHOD__ != 1)
+abort();
+  if (sizeof(float_t) == sizeof(float) && __FLT_EVAL_METHOD__ != 0)
+abort();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/s390/float_t-2.c b/gcc/testsuite/gcc.target/s390/float_t-2.c
new file mode 100644
index 000..ebeda28b6d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/float_t-2.c
@@ -0,0 +1,13 @@
+/* { dg-do run } */
+/* { dg-options "-std=gnu99" } */
+#include 
+#include <math.h>
+#include <stdlib.h>
+int main()
+{
+  /* In gnuXY mode, the size of float_t and FLT_EVAL_METHOD must
+ match, with the historic exception of permitting double and 0. */
+  if (sizeof(float_t) == sizeof(float) && __FLT_EVAL_METHOD__ == 1)
+abort();
+  return 0;
+}
-- 
2.26.2



[PATCH 1/2] IBM Z: Configure excess precision for float at compile-time

2020-11-25 Thread Marius Hillenbrand via Gcc-patches
Historically, float_t has been defined as double on s390 and gcc would
emit double precision insns for evaluating float expressions when in
standard-compliant mode. Configure that behavior at compile-time as prep
for changes in glibc: When glibc ties float_t to double, keep the old
behavior; when glibc derives float_t from FLT_EVAL_METHOD (as on most
other archs), revert to the default behavior (i.e.,
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT). Provide a configure option
--enable-s390-excess-float-precision to override the check.

gcc/ChangeLog:

2020-11-25  Marius Hillenbrand  

* configure.ac: Add configure option
--enable-s390-excess-float-precision and check to derive default
from glibc.
* config/s390/s390.c: Guard s390_excess_precision with an ifdef
for ENABLE_S390_EXCESS_FLOAT_PRECISION.
* doc/install.texi: Document --enable-s390-excess-float-precision.
* configure: Regenerate.
* config.in: Regenerate.
---
 gcc/config/s390/s390.c | 27 ++---
 gcc/configure.ac   | 45 ++
 gcc/doc/install.texi   | 10 ++
 3 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 6983e363252..02f18366aa1 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16376,20 +16376,28 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty
   return NULL;
 }
 
-/* Implement TARGET_C_EXCESS_PRECISION.
+#if ENABLE_S390_EXCESS_FLOAT_PRECISION == 1
+/* Implement TARGET_C_EXCESS_PRECISION to maintain historic behavior with older
+   glibc versions
 
-   FIXME: For historical reasons, float_t and double_t are typedef'ed to
+   For historical reasons, float_t and double_t had been typedef'ed to
double on s390, causing operations on float_t to operate in a higher
precision than is necessary.  However, it is not the case that SFmode
operations have implicit excess precision, and we generate more optimal
code if we let the compiler know no implicit extra precision is added.
 
-   That means when we are compiling with -fexcess-precision=fast, the value
-   we set for FLT_EVAL_METHOD will be out of line with the actual precision of
-   float_t (though they would be correct for -fexcess-precision=standard).
+   With a glibc with that "historic" definition, configure will enable this hook
+   to set FLT_EVAL_METHOD to 1 for -fexcess-precision=standard (e.g., as implied
+   by -std=cXY).  That means when we are compiling with -fexcess-precision=fast,
+   the value we set for FLT_EVAL_METHOD will be out of line with the actual
+   precision of float_t.
 
-   A complete fix would modify glibc to remove the unnecessary typedef
-   of float_t to double.  */
+   Newer versions of glibc will be modified to derive the definition of float_t
+   from FLT_EVAL_METHOD on s390x, as on many other architectures.  There,
+   configure will disable this hook by default, so that we defer to the default
+   of FLT_EVAL_METHOD_PROMOTE_TO_FLOAT and a resulting typedef of float_t to
+   float.  Note that in that scenario, float_t and FLT_EVAL_METHOD will be in
+   line independent of -fexcess-precision. */
 
 static enum flt_eval_method
 s390_excess_precision (enum excess_precision_type type)
@@ -16412,6 +16420,7 @@ s390_excess_precision (enum excess_precision_type type)
 }
   return FLT_EVAL_METHOD_UNPREDICTABLE;
 }
+#endif
 
 /* Implement the TARGET_ASAN_SHADOW_OFFSET hook.  */
 
@@ -16708,8 +16717,12 @@ s390_shift_truncation_mask (machine_mode mode)
 #undef TARGET_ASM_CAN_OUTPUT_MI_THUNK
 #define TARGET_ASM_CAN_OUTPUT_MI_THUNK hook_bool_const_tree_hwi_hwi_const_tree_true
 
+#if ENABLE_S390_EXCESS_FLOAT_PRECISION == 1
+/* This hook is only needed to maintain the historic behavior with glibc
+   versions that typedef float_t to double. */
 #undef TARGET_C_EXCESS_PRECISION
 #define TARGET_C_EXCESS_PRECISION s390_excess_precision
+#endif
 
 #undef  TARGET_SCHED_ADJUST_PRIORITY
 #define TARGET_SCHED_ADJUST_PRIORITY s390_adjust_priority
diff --git a/gcc/configure.ac b/gcc/configure.ac
index b410428b4fc..24679a540c1 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -7318,6 +7318,51 @@ if test x"$ld_pushpopstate_support" = xyes; then
 fi
 AC_MSG_RESULT($ld_pushpopstate_support)
 
+# On s390, float_t has historically been statically defined as double for no
+# good reason. To comply with the C standard in the light of this definition,
+# gcc has evaluated float expressions in double precision when in
+# standards-compatible mode or when given -fexcess-precision=standard. To enable
+# a smooth transition towards the new model used by most architectures, where
+# gcc describes its behavior via the macro __FLT_EVAL_METHOD__ and glibc derives
+# float_t from that, this behavior can be configured with
+# --enable-s390-excess-float-precision. When given as enabled, that flag selects
+# the old model. When omitted, nativ

[PATCH 0/2] IBM Z: Prepare cleanup of float express precision

2020-11-25 Thread Marius Hillenbrand via Gcc-patches
Hi,

gcc has special behavior for FLT_EVAL_METHOD on s390x that causes
performance impact in some scenarios and fails to align with float_t wrt
the C standard in others. These two patches prepare gcc for a cleanup to
get rid of that special case, to improve standard compliance and avoid
the overhead.

On s390 today, float_t is defined as double while gcc by default sets
FLT_EVAL_METHOD to 0 and evaluates float expressions in
single-precision. To mitigate that mismatch, with -std=c99 gcc emits
double precision instructions for evaluating float expressions -- at the
cost of additional conversion instructions. Earlier discussions favored
this behavior to maintain ABI compatibility and compliance with the C
standard (that is, for -std=c99), see the discussion around
https://gcc.gnu.org/legacy-ml/gcc-patches/2016-09/msg02392.html Given
the performance overhead, I have reevaluated the impact of cleaning up
the special behavior and changing float_t into float on s390, and now
think that option to be favorable.
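
The mismatch described above can be made concrete with a small sketch. It encodes the C99 pairing of FLT_EVAL_METHOD values 0, 1 and 2 with float_t being float, double and long double respectively; the function name is hypothetical, and other (implementation-defined) values are simply treated as consistent.

```c
#include <stddef.h>

/* Returns 1 when a float_t of FLOAT_T_SIZE bytes agrees with the given
   FLT_EVAL_METHOD value as C99 pairs them, 0 otherwise.  Values other than
   0, 1 and 2 are implementation-defined, so they are not checked here and
   count as consistent.  */
int
float_t_consistent (int eval_method, size_t float_t_size)
{
  switch (eval_method)
    {
    case 0:
      return float_t_size == sizeof (float);
    case 1:
      return float_t_size == sizeof (double);
    case 2:
      return float_t_size == sizeof (long double);
    default:
      return 1;
    }
}
```

With <float.h> and <math.h> included, a caller could evaluate float_t_consistent (FLT_EVAL_METHOD, sizeof (float_t)). The situation described for s390 today corresponds to float_t_consistent (0, sizeof (double)), which yields 0.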

The reason for float_t being defined as double is that the port of glibc
to s390 deferred to the generic definition, which back then defaulted to
double. Since then, that definition has not been changed to avoid
breaking ABIs that use float_t. I found only two affected packages,
clucene and ImageMagick, out of >130k Debian packages scanned, and
prepared patches to avoid impact. ImageMagick's ABI has become
independent of float_t on s390 since 7.0.10-39 (patch in
https://github.com/ImageMagick/ImageMagick/pull/2832); patch for clucene
in https://sourceforge.net/p/clucene/bugs/233/.

To smooth the transition, the first patch makes gcc's behavior
configurable at compile-time with the flag
--enable-s390-excess-float-precision. When the flag is enabled, gcc
maintains the current behavior. By default, configure will test glibc's
behavior: if glibc ties float_t to double, configure will enable the
flag and maintain the current behavior.  Otherwise, it will disable the
flag and drop the special behavior.
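
As a rough model of that decision, the following sketch may help. The enum and function names are illustrative only and do not appear in the patch; the handling of an explicit disable is an assumption based on the option being described as overriding the check.

```c
/* Hypothetical three-state view of --enable-s390-excess-float-precision:
   not given, explicitly enabled, or explicitly disabled.  */
enum excess_precision_option { OPT_AUTO, OPT_ENABLED, OPT_DISABLED };

/* Returns 1 if gcc should keep the historic excess-precision behavior.
   GLIBC_FLOAT_T_IS_DOUBLE stands for the outcome of the configure test
   against the target glibc.  */
int
keep_historic_behavior (enum excess_precision_option opt,
                        int glibc_float_t_is_double)
{
  if (opt == OPT_ENABLED)
    return 1;
  if (opt == OPT_DISABLED)
    return 0;  /* Assumption: an explicit disable overrides the check.  */
  /* Default: follow what glibc does.  */
  return glibc_float_t_is_double != 0;
}
```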

Bootstrapped and regtested on s390x, both with a conventional and a
modified glibc. Bootstrapped and regtested on x86-64. Inspected the
generated headers on both targets.

Marius Hillenbrand (2):
  IBM Z: Configure excess precision for float at compile-time
  gcc/testsuite/s390: Add test cases for float_t

 gcc/config/s390/s390.c| 27 ++
 gcc/configure.ac  | 45 +++
 gcc/doc/install.texi  | 10 +
 gcc/testsuite/gcc.target/s390/float_t-1.c | 15 
 gcc/testsuite/gcc.target/s390/float_t-2.c | 13 +++
 5 files changed, 103 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/float_t-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/float_t-2.c

-- 
2.26.2



[PR66791][ARM] Replace calls to __builtin_neon_vmvn* by ~ for vmvn intrinsics

2020-11-25 Thread Prathamesh Kulkarni via Gcc-patches
Hi,
This patch replaces calls to __builtin_neon_vmvnv* builtins with ~
operator in arm_neon.h.
Cross-tested on arm*-*-*.
OK to commit ?

Thanks,
Prathamesh


vmvn-1.diff
Description: Binary data


Re: [PATCH] c++: v2: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-11-25 Thread Jonathan Wakely via Gcc-patches

On 25/11/20 17:24 +0100, Jakub Jelinek wrote:

On Wed, Nov 25, 2020 at 10:31:28AM +0000, Jonathan Wakely via Gcc-patches wrote:

On 25/11/20 10:23 +0100, Jakub Jelinek wrote:
> On Tue, Nov 24, 2020 at 05:31:03PM -0700, Jeff Law wrote:
> > First, do we need to document the new builtin? 
>
> I think for builtins that are only meant for libstdc++ as underlying
> implementation of its documented in the standard APIs we have some cases
> where we document them and other cases where we don't.

And people report bugs when they're not documented. They might want to
know how the built-in is supposed to behave so they can implement the
equivalent in other compilers. Documenting the *intended* behaviour
also makes it clear which behaviours can be relied on, which allows us
to change the result for edge cases or out-of-contract inputs later.
If we don't describe the expected behaviour, then we can't really
blame people for relying on whatever it happens to do today.

I used to think we don't need to bother documenting them, but I've
been persuaded that it's useful to do it.


Ok, so like this?

2020-11-25  Jakub Jelinek  

PR libstdc++/93121
* doc/extend.texi (__builtin_bit_cast): Document.

--- gcc/doc/extend.texi.jj  2020-11-23 17:01:51.986013540 +0100
+++ gcc/doc/extend.texi 2020-11-25 17:21:14.696046005 +0100
@@ -13574,6 +13574,21 @@ have intederminate values and the object
bitwise compared to some other object, for example for atomic operations.
@end deftypefn

+@deftypefn {Built-in Function} @var{type} __builtin_bit_cast (@var{type}, @var{arg})
+The @code{__builtin_bit_cast} function is available only
+in C++.  The built-in is intended to be used by implementations of
+the @code{std::bit_cast} C++ template function.  Programs should make
+use of the latter function rather than invoking the built-in directly.
+
+This built-in function allows reinterpreting the bits of the @var{arg}
+argument as if it had type @var{type}.  @var{type} and the type of the
+@var{arg} argument need to be trivially copyable types with the same size.
+When manifestly constant-evaluated, it performs extra diagnostics required
+for @code{std::bit_cast} and returns a constant expression if @var{arg}
+is a constant expression.  For more details
+refer to the latest revision of the C++ standard.
+@end deftypefn
+
@deftypefn {Built-in Function} long __builtin_expect (long @var{exp}, long @var{c})
@opindex fprofile-arcs
You may use @code{__builtin_expect} to provide the compiler with


That looks good to me.




Re: [PATCH] c++: v2: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-11-25 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 25, 2020 at 10:31:28AM +0000, Jonathan Wakely via Gcc-patches wrote:
> On 25/11/20 10:23 +0100, Jakub Jelinek wrote:
> > On Tue, Nov 24, 2020 at 05:31:03PM -0700, Jeff Law wrote:
> > > First, do we need to document the new builtin? 
> > 
> > I think for builtins that are only meant for libstdc++ as underlying
> > implementation of its documented in the standard APIs we have some cases
> > where we document them and other cases where we don't.
> 
> And people report bugs when they're not documented. They might want to
> know how the built-in is supposed to behave so they can implement the
> equivalent in other compilers. Documenting the *intended* behaviour
> also makes it clear which behaviours can be relied on, which allows us
> to change the result for edge cases or out-of-contract inputs later.
> If we don't describe the expected behaviour, then we can't really
> blame people for relying on whatever it happens to do today.
> 
> I used to think we don't need to bother documenting them, but I've
> been persuaded that it's useful to do it.

Ok, so like this?

2020-11-25  Jakub Jelinek  

PR libstdc++/93121
* doc/extend.texi (__builtin_bit_cast): Document.

--- gcc/doc/extend.texi.jj  2020-11-23 17:01:51.986013540 +0100
+++ gcc/doc/extend.texi 2020-11-25 17:21:14.696046005 +0100
@@ -13574,6 +13574,21 @@ have intederminate values and the object
 bitwise compared to some other object, for example for atomic operations.
 @end deftypefn
 
+@deftypefn {Built-in Function} @var{type} __builtin_bit_cast (@var{type}, @var{arg})
+The @code{__builtin_bit_cast} function is available only
+in C++.  The built-in is intended to be used by implementations of
+the @code{std::bit_cast} C++ template function.  Programs should make
+use of the latter function rather than invoking the built-in directly.
+
+This built-in function allows reinterpreting the bits of the @var{arg}
+argument as if it had type @var{type}.  @var{type} and the type of the
+@var{arg} argument need to be trivially copyable types with the same size.
+When manifestly constant-evaluated, it performs extra diagnostics required
+for @code{std::bit_cast} and returns a constant expression if @var{arg}
+is a constant expression.  For more details
+refer to the latest revision of the C++ standard.
+@end deftypefn
+
 @deftypefn {Built-in Function} long __builtin_expect (long @var{exp}, long @var{c})
 @opindex fprofile-arcs
 You may use @code{__builtin_expect} to provide the compiler with


Jakub
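
For readers following the documentation text above, here is a minimal
sketch of how a library-level wrapper might forward to the built-in.  The
names bit_cast_sketch and bit_cast_sketch_demo are hypothetical and are not
part of GCC or libstdc++; the memcpy fallback is an assumption added only so
the sketch also compiles where the built-in is unavailable.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Older compilers do not provide __has_builtin; treat every query as "no".
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Hypothetical helper: reinterpret the bits of FROM as a value of type To.
// Both types must be trivially copyable and have the same size.
template <typename To, typename From>
To bit_cast_sketch(const From& from)
{
  static_assert(sizeof(To) == sizeof(From), "types must have the same size");
  static_assert(std::is_trivially_copyable<To>::value
                && std::is_trivially_copyable<From>::value,
                "types must be trivially copyable");
#if __has_builtin(__builtin_bit_cast)
  return __builtin_bit_cast(To, from);
#else
  // Fallback (assumption, not from the patch): requires To to be
  // default-constructible and is not usable in constant expressions.
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
#endif
}

// Usage example.  Assumes float is a 32-bit IEEE 754 type, in which 1.0f
// is encoded as 0x3f800000.
inline void bit_cast_sketch_demo()
{
  assert(bit_cast_sketch<std::uint32_t>(1.0f) == 0x3f800000u);
}
```

As the documentation says, programs should normally call std::bit_cast
rather than the built-in; the wrapper only illustrates the size and
trivially-copyable requirements.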



[committed] aarch64: Avoid false dependencies for SVE unary operations

2020-11-25 Thread Richard Sandiford via Gcc-patches
For calls like:

z0 = svabs_s8_x (p0, z1)

we previously generated:

abs z0.b, p0/m, z1.b

However, this creates a false dependency on z0 (the merge input).
This can lead to strange results in some cases, e.g. serialising
the operation behind arbitrary earlier operations, or preventing
two iterations of a loop from being executed in parallel.

This patch therefore ties the input to the output, using a MOVPRFX
if necessary and possible.  (The SVE2 unary long instructions do
not support MOVPRFX.)

When testing the patch, I hit a bug in the big-endian SVE move
optimisation in aarch64_maybe_expand_sve_subreg_move.  I don't
have an independent testcase for it, so I didn't split it out
into a separate patch.

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to master.
I'll backport to GCC 10 in a few days.

Richard


gcc/
* config/aarch64/aarch64.c (aarch64_maybe_expand_sve_subreg_move):
Do not optimize LRA subregs.
* config/aarch64/aarch64-sve.md
(@aarch64_pred_): Tie the input to the
output.
(@aarch64_sve_revbhw_): Likewise.
(*2): Likewise.
(@aarch64_pred_sxt): Likewise.
(*cnot): Likewise.
(@aarch64_pred_): Likewise.
(@aarch64_sve__nontrunc):
Likewise.
(@aarch64_sve__trunc):
Likewise.
(@aarch64_sve__nonextend):
Likewise.
(@aarch64_sve__extend):
Likewise.
(@aarch64_sve__trunc):
Likewise.
(@aarch64_sve__trunc):
Likewise.
(@aarch64_sve__nontrunc):
Likewise.
* config/aarch64/aarch64-sve2.md
(@aarch64_pred_): Likewise.
(@aarch64_pred_): Likewise.
(@aarch64_pred_): Likewise.
(@aarch64_pred_): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/abs_f16.c (abs_f16_x_untied): Expect
a MOVPRFX instruction.
* gcc.target/aarch64/sve/acle/asm/abs_f32.c (abs_f32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/abs_f64.c (abs_f64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/abs_s16.c (abs_s16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/abs_s32.c (abs_s32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/abs_s64.c (abs_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/abs_s8.c (abs_s8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cls_s16.c (cls_s16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cls_s32.c (cls_s32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cls_s64.c (cls_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cls_s8.c (cls_s8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/clz_s16.c (clz_s16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/clz_s32.c (clz_s32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/clz_s64.c (clz_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/clz_s8.c (clz_s8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/clz_u16.c (clz_u16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/clz_u32.c (clz_u32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/clz_u64.c (clz_u64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/clz_u8.c (clz_u8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnot_s16.c (cnot_s16_x_untied):
Ditto.
* gcc.target/aarch64/sve/acle/asm/cnot_s32.c (cnot_s32_x_untied):
Ditto.
* gcc.target/aarch64/sve/acle/asm/cnot_s64.c (cnot_s64_x_untied):
Ditto.
* gcc.target/aarch64/sve/acle/asm/cnot_s8.c (cnot_s8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnot_u16.c (cnot_u16_x_untied):
Ditto.
* gcc.target/aarch64/sve/acle/asm/cnot_u32.c (cnot_u32_x_untied):
Ditto.
* gcc.target/aarch64/sve/acle/asm/cnot_u64.c (cnot_u64_x_untied):
Ditto.
* gcc.target/aarch64/sve/acle/asm/cnot_u8.c (cnot_u8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_bf16.c (cnt_bf16_x_untied):
Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_f16.c (cnt_f16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_f32.c (cnt_f32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_f64.c (cnt_f64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_s16.c (cnt_s16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_s32.c (cnt_s32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_s64.c (cnt_s64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_s8.c (cnt_s8_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_u16.c (cnt_u16_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_u32.c (cnt_u32_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_u64.c (cnt_u64_x_untied): Ditto.
* gcc.target/aarch64/sve/acle/asm/cnt_u8.c (cnt_u8_x_untied): Ditto.
* gcc.target/aarch64/sve/acl

Go patch committed: Avoid silent integer truncation in compiler

2020-11-25 Thread Ian Lance Taylor via Gcc-patches
This patch to the Go frontend avoids silent integer truncation when
compiling code like string(1 << 32).  In the conversion of a constant
integer to a string type, the value of the constant integer was being
silently truncated from unsigned long to unsigned int, producing the
wrong string value.  This patch adds an explicit overflow check to
avoid this problem.  This is for https://golang.org/issue/42790.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
c9525352984b5ded5ce444969dae67d93f7748f8
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 3e94dabcd30..c14ee7e7b14 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-78c9a657fdbc9e812d39910fb93fbae4affe4360
+8cbe18aff99dbf79bd1adb9c025418e84505ffd5
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc
index 6bc93488939..44b0ad7 100644
--- a/gcc/go/gofrontend/expressions.cc
+++ b/gcc/go/gofrontend/expressions.cc
@@ -4024,8 +4024,16 @@ Type_conversion_expression::do_string_constant_value(std::string* val) const
  unsigned long ival;
  if (nc.to_unsigned_long(&ival) == Numeric_constant::NC_UL_VALID)
{
+ unsigned int cval = static_cast<unsigned int>(ival);
+ if (static_cast<unsigned long>(cval) != ival)
+   {
+ go_warning_at(this->location(), 0,
+   "unicode code point 0x%lx out of range",
+   ival);
+ cval = 0xfffd; // Unicode "replacement character."
+   }
  val->clear();
- Lex::append_char(ival, true, val, this->location());
+ Lex::append_char(cval, true, val, this->location());
  return true;
}
}
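
The round-trip check in the hunk above can be sketched in isolation as
follows.  This is not the gofrontend code: narrow_code_point and its demo
are hypothetical names, and fixed-width types are used here (an assumption)
so that the behaviour does not depend on the width of unsigned long.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: narrow a 64-bit constant to a 32-bit code point and
// substitute U+FFFD (the Unicode replacement character) if any high bits
// would be lost by the narrowing.
inline std::uint32_t narrow_code_point(std::uint64_t ival, bool* out_of_range)
{
  std::uint32_t cval = static_cast<std::uint32_t>(ival);
  // Widening the narrowed value back must reproduce the input exactly;
  // otherwise the conversion silently dropped bits.
  *out_of_range = static_cast<std::uint64_t>(cval) != ival;
  return *out_of_range ? 0xfffdu : cval;
}

// Usage example: 'A' survives unchanged, 1 << 32 is flagged and replaced.
inline void narrow_code_point_demo()
{
  bool out_of_range = false;
  assert(narrow_code_point(0x41, &out_of_range) == 0x41u && !out_of_range);
  assert(narrow_code_point(std::uint64_t{1} << 32, &out_of_range) == 0xfffdu
         && out_of_range);
}
```

Without the check, 1 << 32 would narrow to 0 and produce a string holding
a NUL character instead of the replacement character.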


Re: Unreviewed g++ driver patch

2020-11-25 Thread Nathan Sidwell

On 11/25/20 9:41 AM, Rainer Orth wrote:

The g++spec.c part of this patch

[PATCH] ada: c++: Get rid of libposix4, librt on Solaris
 https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559341.html

is the only remaining part still needing review by a c++ or driver
maintainer.  It's more than a week now.



the c++ bit is fine, thanks!

nathan

--
Nathan Sidwell


Re: [PATCH] Fixup additional search path in offload_additional_options

2020-11-25 Thread Richard Biener
On Wed, 25 Nov 2020, Jakub Jelinek wrote:

> On Wed, Nov 25, 2020 at 04:30:44PM +0100, Richard Biener wrote:
> > This fixes the search when configured with --libexecdir=lib64,
> > I've adjusted the bin reference for consistency.
> > 
> > Testing in progress.  Does this look sensible?
> > 
> > 2020-11-25  Richard Biener  
> > 
> > libgomp/
> > * configure: Regenerate.
> > * plugin/configfrag.ac (offload_additional_options): Use
> > $(libexecdir) and $(bindir) instead of hard-coding them.
> 
> LGTM.
> 
>   Jakub.

Hmm, but $(libexecdir) includes the prefix, thus expands to
/usr/lib64 for me.  So what's the tgt_dir used for besides
populating offload_additional_options?  That said, in this
very spot not specifying it would work for me I guess,
historically I have used /usr/nvptx as path for reasons
I do not remember :/ (newlib is installed in this location)

Richard.


Re: [PATCH] rs6000: Set param_vect_partial_vector_usage as 1 for P10

2020-11-25 Thread Segher Boessenkool
Hi!

On Wed, Nov 25, 2020 at 02:02:16PM +0800, Kewen.Lin wrote:
> This patch is to set param_vect_partial_vector_usage as 1 on P10
> by default.  Due to the unexpected performance of the vector-with-length
> instructions on Power9, we didn't enable vectorization with partial
> vectors before.  Some recent testing shows that they perform as
> expected on Power10 now.  The performance evaluation on the whole
> SPEC2017 with latest trunk and option set power10/Ofast/unroll shows
> it can speed up 525.x264_r by 10.80% and 554.roms_r by 1.94%.  One
> notable degradation is 523.xalancbmk_r at -1.79%, but some investigation
> identified it as not directly related to this enablement.


> +  if (TARGET_POWER10)
> +   SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> + param_vect_partial_vector_usage, 1);
> +  else
> +   /* Disable it on the default supported hardware Power9 since
> +   lxvl/stxvl have unexpected performance behaviors.  */
> +   SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> + param_vect_partial_vector_usage, 0);

Maybe write this like

  /* The lxvl/stxvl instructions don't perform well before Power10.  */
  if (TARGET_POWER10)
   SET_OPTION_IF_UNSET (&global_options, &global_options_set,
param_vect_partial_vector_usage, 1);
  else
   SET_OPTION_IF_UNSET (&global_options, &global_options_set,
param_vect_partial_vector_usage, 0);

Okay for trunk with such a comment (before the "if").  Thanks!
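
For readers unfamiliar with the "set only if the user did not" pattern that
SET_OPTION_IF_UNSET expresses, a rough stand-alone sketch follows.  The
struct and function names are hypothetical stand-ins, not GCC's real
global_options machinery.

```cpp
#include <cassert>

// Hypothetical stand-ins for global_options and global_options_set: one
// struct holds the value, the other records whether the user set it.
struct options_sketch { int vect_partial_vector_usage; };
struct options_set_sketch { bool vect_partial_vector_usage; };

// Apply a target default only when the user gave no explicit value.
inline void set_if_unset(options_sketch& opts, const options_set_sketch& set,
                         int value)
{
  if (!set.vect_partial_vector_usage)
    opts.vect_partial_vector_usage = value;
}

// The lxvl/stxvl instructions don't perform well before Power10.
inline void override_partial_vector_usage(options_sketch& opts,
                                          const options_set_sketch& set,
                                          bool target_power10)
{
  set_if_unset(opts, set, target_power10 ? 1 : 0);
}

// Usage example: with no explicit user setting, Power10 gets 1.
inline void override_partial_vector_usage_demo()
{
  options_sketch opts = {2};
  options_set_sketch set = {false};
  override_partial_vector_usage(opts, set, true);
  assert(opts.vect_partial_vector_usage == 1);
}
```

The point of the pattern is that an explicit --param given by the user
always wins over the target default.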


Segher


Re: [PATCH] Fixup additional search path in offload_additional_options

2020-11-25 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 25, 2020 at 04:30:44PM +0100, Richard Biener wrote:
> This fixes the search when configured with --libexecdir=lib64,
> I've adjusted the bin reference for consistency.
> 
> Testing in progress.  Does this look sensible?
> 
> 2020-11-25  Richard Biener  
> 
> libgomp/
>   * configure: Regenerate.
>   * plugin/configfrag.ac (offload_additional_options): Use
>   $(libexecdir) and $(bindir) instead of hard-coding them.

LGTM.

Jakub



[PATCH] Fixup additional search path in offload_additional_options

2020-11-25 Thread Richard Biener
This fixes the search when configured with --libexecdir=lib64,
I've adjusted the bin reference for consistency.

Testing in progress.  Does this look sensible?

2020-11-25  Richard Biener  

libgomp/
* configure: Regenerate.
* plugin/configfrag.ac (offload_additional_options): Use
$(libexecdir) and $(bindir) instead of hard-coding them.
---
 libgomp/configure| 2 +-
 libgomp/plugin/configfrag.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgomp/configure b/libgomp/configure
index e48371d5093..f130f97330f 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15371,7 +15371,7 @@ rm -f core conftest.err conftest.$ac_objext \
 fi
 # Configure additional search paths.
 if test x"$tgt_dir" != x; then
-  offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
+  offload_additional_options="$offload_additional_options -B$tgt_dir/\$(libexecdir)/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/\$(bindir)"
   offload_additional_lib_paths="$offload_additional_lib_paths:$tgt_dir/lib64:$tgt_dir/lib:$tgt_dir/lib32"
 else
   offload_additional_options="$offload_additional_options -B\$(libexecdir)/gcc/\$(target_alias)/\$(gcc_version) -B\$(bindir)"
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 14030082ea8..5b6ae6c09f1 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -238,7 +238,7 @@ if test x"$enable_offload_targets" != x; then
 fi
 # Configure additional search paths.
 if test x"$tgt_dir" != x; then
-  offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
+  offload_additional_options="$offload_additional_options -B$tgt_dir/\$(libexecdir)/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/\$(bindir)"
   offload_additional_lib_paths="$offload_additional_lib_paths:$tgt_dir/lib64:$tgt_dir/lib:$tgt_dir/lib32"
 else
   offload_additional_options="$offload_additional_options -B\$(libexecdir)/gcc/\$(target_alias)/\$(gcc_version) -B\$(bindir)"
-- 
2.26.2


Unreviewed g++ driver patch

2020-11-25 Thread Rainer Orth
The g++spec.c part of this patch

[PATCH] ada: c++: Get rid of libposix4, librt on Solaris
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559341.html

is the only remaining part still needing review by a c++ or driver
maintainer.  It's more than a week now.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] middle-end: __builtin_mul_overflow expansion improvements [PR95862]

2020-11-25 Thread Richard Biener
On Wed, 25 Nov 2020, Jakub Jelinek wrote:

> Hi!
> 
> The following patch adds some improvements for __builtin_mul_overflow
> expansion.
> One optimization is for the u1 * u2 -> sr case, as documented we normally
> do:
>  u1 * u2 -> sr
> res = (S) (u1 * u2)
> ovf = res < 0 || main_ovf (true)
> where main_ovf (true) stands for jump on unsigned multiplication overflow.
> If we know that the most significant bits of both operands are clear (such
> as when they are zero extended from something smaller), we can
> emit better code by handling it like s1 * s2 -> sr, i.e. just jump on
> overflow after signed multiplication.
> 
> Another two cases are s1 * s2 -> ur or s1 * u2 -> ur.  If we know the minimum
> precision needed to encode all values of both arguments summed together
> is smaller or equal to the destination precision (such as when the two
> arguments are sign (or zero) extended from half precision types), we know
> overflow happens if and only if one argument is negative and the other is positive
> (not zero), because even if both have maximum possible values, the maximum
> is still representable (e.g. for char * char -> unsigned short
> 0x7f * 0x7f = 0x3f01 and for char * unsigned char -> unsigned short
> 0x7f * 0xffU = 0x7e81) and as the result is unsigned, all negative results
> do overflow, but are also representable if we consider the result signed
> - all of them have the MSB set.  So, it is more efficient to just
> do the normal multiplication in that case and compare the result considered
> as signed value against 0, if it is smaller, overflow happened.
> 
> And the get_min_precision change is to improve the char to short handling,
> we have there in the IL
>   _2 = (int) arg_1(D);
> promotion from C promotions from char or unsigned char arg, and the caller
> adds a NOP_EXPR cast to short or unsigned short.  get_min_precision punts
> on the narrowing cast though, it handled only widening casts, but we can
> handle narrowing casts fine too, by recursing on the narrowing cast operands
> and using it only if it has in the end smaller minimal precision, which
> would duplicate the sign bits (or zero bits) to both the bits above the
> narrowing conversion and also at least one below that.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.
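
The s1 * s2 -> ur case described in the quoted text above can be checked
with a small stand-alone sketch.  It assumes a GCC or Clang compiler (for
__builtin_mul_overflow) and 8-bit operands with a 16-bit unsigned result;
the function names are hypothetical and this is not the expansion code from
the patch.

```cpp
#include <cassert>
#include <cstdint>

// Reference result via the GCC/Clang built-in.
inline bool mul_overflow_builtin(std::int8_t a, std::int8_t b,
                                 std::uint16_t* res)
{
  return __builtin_mul_overflow(a, b, res);
}

// Hand-written equivalent for this narrow case: do the plain multiplication
// and report overflow when the sign bit of the 16-bit result is set.
inline bool mul_overflow_signbit(std::int8_t a, std::int8_t b,
                                 std::uint16_t* res)
{
  int wide = static_cast<int>(a) * static_cast<int>(b);
  *res = static_cast<std::uint16_t>(wide);  // wraps modulo 2^16
  return (*res & 0x8000u) != 0;
}

// Exhaustively compare both variants over all 8-bit operand pairs.
inline void check_mul_overflow_variants()
{
  for (int a = -128; a <= 127; ++a)
    for (int b = -128; b <= 127; ++b)
      {
        std::uint16_t r1 = 0, r2 = 0;
        bool o1 = mul_overflow_builtin(static_cast<std::int8_t>(a),
                                       static_cast<std::int8_t>(b), &r1);
        bool o2 = mul_overflow_signbit(static_cast<std::int8_t>(a),
                                       static_cast<std::int8_t>(b), &r2);
        assert(o1 == o2);
        assert(r1 == r2);
      }
}
```

The largest product, (-128) * (-128) = 0x4000, still has the MSB clear, and
every negative product has it set, which is why the sign-bit test is
sufficient here.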

> 2020-10-25  Jakub Jelinek  
> 
>   PR rtl-optimization/95862
>   * internal-fn.c (get_min_precision): For narrowing conversion, recurse
>   on the operand and if the operand precision is smaller than the
>   current one, return that smaller precision.
>   (expand_mul_overflow): For s1 * u2 -> ur and s1 * s2 -> ur cases
>   if the sum of minimum precisions of both operands is smaller or equal
>   to the result precision, just perform normal multiplication and
>   set overflow to the sign bit of the multiplication result.  For
>   u1 * u2 -> sr if both arguments have the MSB known zero, use
>   normal s1 * s2 -> sr expansion.
> 
>   * gcc.dg/builtin-artih-overflow-5.c: New test.
> 
> --- gcc/internal-fn.c.jj  2020-10-13 09:25:28.444909391 +0200
> +++ gcc/internal-fn.c 2020-11-24 15:54:32.684762538 +0100
> @@ -553,6 +553,16 @@ get_min_precision (tree arg, signop sign
>if (++cnt > 30)
>   return prec + (orig_sign != sign);
>  }
> +  if (CONVERT_EXPR_P (arg)
> +  && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (arg, 0)))
> +  && TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) > prec)
> +{
> +  /* We have e.g. (unsigned short) y_2 where int y_2 = (int) x_1(D);
> +  If y_2's min precision is smaller than prec, return that.  */
> +  int oprec = get_min_precision (TREE_OPERAND (arg, 0), sign);
> +  if (oprec < prec)
> + return oprec + (orig_sign != sign);
> +}
>if (TREE_CODE (arg) != SSA_NAME)
>  return prec + (orig_sign != sign);
>wide_int arg_min, arg_max;
> @@ -1357,6 +1367,37 @@ expand_mul_overflow (location_t loc, tre
>  NULL, done_label, 
> profile_probability::very_likely ());
> goto do_error_label;
>   case 3:
> +   if (get_min_precision (arg1, UNSIGNED)
> +   + get_min_precision (arg0, SIGNED) <= GET_MODE_PRECISION (mode))
> + {
> +   /* If the first operand is sign extended from narrower type, the
> +  second operand is zero extended from narrower type and
> +  the sum of the two precisions is smaller or equal to the
> +  result precision: if the first argument is at runtime
> +  non-negative, maximum result will be 0x7e81 or 0x7f..fe80..01
> +  and there will be no overflow, if the first argument is
> +  negative and the second argument zero, the result will be
> +  0 and there will be no overflow, if the first argument is
> +  negative and the second argument positive, the result when
> +  treated as signed wi

Re: Improve tracking of parameters that does not escape

2020-11-25 Thread Richard Biener
On Wed, 25 Nov 2020, Jan Hubicka wrote:

> Hi,
> we discussed this patch briefly two weeks ago, but did not reach a
> conclusion, and since I wanted to avoid ICF fixes slipping another release
> I had a chance to return to it only now.  The main limitation of modref is
> the fact that it does not track anything in memory.  This is intentional - I
> wanted the initial implementation to be cheap.  However, it also makes it
> very limited when it comes to detecting noescape, especially because it is
> paranoid about what memory accesses may be used to copy (bits of) pointers.
> 
> Consider:
> 
> void
> test (int *a, int *b)
> {
>   *a=*b;
> }
> 
> Here both parameters are noescape. However we get:
> 
> Analyzing flags of ssa name: a_4(D)
>   Analyzing stmt:*a_4(D) = _1;
>   current flags of a_4(D) direct noescape
> flags of ssa name a_4(D) direct noescape
> Analyzing flags of ssa name: b_3(D)
>   Analyzing stmt:_1 = *b_3(D);
> Analyzing flags of ssa name: _1
>   Analyzing stmt:*a_4(D) = _1;
>   ssa name saved to memory
>   current flags of _1
> flags of ssa name _1
>   current flags of b_3(D)
> 
> So for a we get the flags right, but for b we see a memory write and stop
> tracking completely, assuming that the memory may cause an indirect escape.
> 
> This patch adds EAF_NODIRECTESCAPE, a weaker variant of EAF_NOESCAPE where
> we only know that the pointer itself does not escape, but memory pointed to
> may.  This is a lot more reliable to auto-detect than EAF_NOESCAPE and still
> enables additional optimization.  With the patch we get the nodirectescape
> flag for b, which in practice enables similar optimization as EAF_NOESCAPE
> for arrays of integers that point nowhere :)
> 
> The patch is very effective on cc1plus, changing:
> 
> Alias oracle query stats:
>   refs_may_alias_p: 65974098 disambiguations, 75491744 queries
>   ref_maybe_used_by_call_p: 239316 disambiguations, 66783365 queries
>   call_may_clobber_ref_p: 109214 disambiguations, 114381 queries
>   nonoverlapping_component_refs_p: 0 disambiguations, 37014 queries
>   nonoverlapping_refs_since_match_p: 26917 disambiguations, 56947 must overlaps, 84634 queries
>   aliasing_component_refs_p: 63593 disambiguations, 2026642 queries
>   TBAA oracle: 25059985 disambiguations 58735771 queries
>12279288 are in alias set 0
>10228328 queries asked about the same object
>124 queries asked about the same alias set
>0 access volatile
>9551512 are dependent in the DAG
>1616534 are aritificially in conflict with void *
> 
> Modref stats:
>   modref use: 13629 disambiguations, 362550 queries
>   modref clobber: 1603074 disambiguations, 12633002 queries
>   4128405 tbaa queries (0.326795 per modref query)
>   678007 base compares (0.053670 per modref query)
> 
> PTA query stats:
>   pt_solution_includes: 1447025 disambiguations, 13421154 queries
>   pt_solutions_intersect: 1014606 disambiguations, 12743264 queries
> 
> to:
> 
> Alias oracle query stats:
>   refs_may_alias_p: 76994196 disambiguations, 86322026 queries
>   ref_maybe_used_by_call_p: 398635 disambiguations, 77664397 queries
>   call_may_clobber_ref_p: 248995 disambiguations, 252747 queries
>   nonoverlapping_component_refs_p: 0 disambiguations, 36357 queries
>   nonoverlapping_refs_since_match_p: 26973 disambiguations, 56944 must overlaps, 84688 queries
>   aliasing_component_refs_p: 63472 disambiguations, 2013517 queries
>   TBAA oracle: 25278106 disambiguations 59186830 queries
>12480044 are in alias set 0
>10260217 queries asked about the same object
>121 queries asked about the same alias set
>0 access volatile
>9550119 are dependent in the DAG
>1618223 are aritificially in conflict with void *
> 
> Modref stats:
>   modref use: 13909 disambiguations, 370418 queries
>   modref clobber: 1643513 disambiguations, 18036536 queries
>   4197648 tbaa queries (0.232730 per modref query)
>   727893 base compares (0.040357 per modref query)
> 
> PTA query stats:
>   pt_solution_includes: 11463123 disambiguations, 22989602 queries
>   pt_solutions_intersect: 1238048 disambiguations, 12893812 queries
> 
> This is 1447025->11463123 PTA disambiguations, so about 7.9 times more.
> (There is also an increase in the number of queries.)
> 
> For tramp3d I get:
> Alias oracle query stats:
>   refs_may_alias_p: 2394105 disambiguations, 2675969 queries
>   ref_maybe_used_by_call_p: 11048 disambiguations, 2428198 queries
>   call_may_clobber_ref_p: 922 disambiguations, 932 queries
>   nonoverlapping_component_refs_p: 0 disambiguations, 4457 queries
>   nonoverlapping_refs_since_match_p: 329 disambiguations, 10298 must overlaps, 10714 queries
>   aliasing_component_refs_p: 956 disambiguations, 36074 queries
>   TBAA oracle: 1046044 disambiguations 1942025 queries
>169583 are in alias set 0
>507146 queries asked 

Re: Free more of CFG in release_function_body

2020-11-25 Thread Richard Biener
On Wed, 25 Nov 2020, Jan Hubicka wrote:

> > On Wed, Nov 25, 2020 at 3:11 PM Jan Hubicka  wrote:
> > >
> > > > On Tue, 24 Nov 2020, Jan Hubicka wrote:
> > > >
> > > > > Hi,
> > > > > at the end of processing a function body we loop over basic blocks and
> > > > > free all edges, while we do not free the rest.  I think this is a
> > > > > leftover from the time edges were not garbage collected and we were
> > > > > not using ggc_free.
> > > > > It makes more sense to free all associated structures (which is
> > > > > important for the WPA memory footprint).
> > > > >
> > > > > Bootstrapped/regtested x86_64-linux, OK?
> > > >
> > > > OK.
> > >
> > > Unfortunately the patch does not survive LTO bootstrap.  The problem is
> > > that we keep DECL_INITIAL, which points to blocks; blocks point to
> > > var_decls, these point to SSA_NAMEs, which point to statements, and
> > > those point to basic blocks.
> > 
> > VAR_DECLs point to SSA_NAMEs?  It's the other way around.  We for sure
> > free SSA_NAMEs (well, maybe not explicitly with ggc_free).
> 
> I am going to debug this more carefully now.  I think it was a VAR_DECL
> with a variadic type pointing to an SSA_NAME.  Should be easy to reduce with
> a gcac compiler.

Possibly another case of a missing DECL_EXPR then.  In those cases
SSA names leak into TYPE_MIN/MAX_VALUE or TYPE/DECL_SIZE during
gimplification.  But those are frontend bugs (there are plenty reported).

Richard.


Re: Free more of CFG in release_function_body

2020-11-25 Thread Jan Hubicka
> On Wed, Nov 25, 2020 at 3:11 PM Jan Hubicka  wrote:
> >
> > > On Tue, 24 Nov 2020, Jan Hubicka wrote:
> > >
> > > > Hi,
> > > > at the end of processing a function body we loop over basic blocks and
> > > > free all edges, while we do not free the rest.  I think this is a leftover
> > > > from the time edges were not garbage collected and we were not using
> > > > ggc_free.  It makes more sense to free all associated structures (which is
> > > > important for the WPA memory footprint).
> > > >
> > > > Bootstrapped/regtested x86_64-linux, OK?
> > >
> > > OK.
> >
> > Unfortunately the patch does not survive LTO bootstrap.  The problem is
> > that we keep DECL_INITIAL, which points to blocks; blocks point to
> > var_decls, these point to SSA_NAMEs, which point to statements, and
> > those point to basic blocks.
> 
> VAR_DECLs point to SSA_NAMEs?  It's the other way around.  We for sure
> free SSA_NAMEs (well, maybe not explicitly with ggc_free).

I am going to debug this more carefully now.  I think it was a VAR_DECL
with a variadic type pointing to an SSA_NAME.  Should be easy to reduce with
a gcac compiler.

Honza


Re: Handle EAF_DIRECT and EAF_UNUSED of pure calls

2020-11-25 Thread Richard Biener via Gcc-patches
On Wed, Nov 25, 2020 at 3:14 PM Jan Hubicka  wrote:
>
> Hi,
> while looking into structalias I noticed that we ignore EAF flags here.
> This is a pity since we can still apply direct and unused.
> This patch simply copies the logic from normal call handling.  I realize that
> it is a bit more expensive, since it creates a callarg and does the transitive
> closure there instead of doing one common transitive closure on call use.
> I can also scan first whether there are both direct and !direct arguments and
> do this optimization, but it does not seem to affect build times (tested
> on a spec2k6 gcc LTO build).
>
> lto-bootstrapped/regtested x86_64-linux.

OK.

Richard.

> Honza
>
> diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
> index a4832b75436..5f84f7d467f 100644
> --- a/gcc/tree-ssa-structalias.c
> +++ b/gcc/tree-ssa-structalias.c
> @@ -4253,12 +4253,20 @@ handle_pure_call (gcall *stmt, vec *results)
>for (i = 0; i < gimple_call_num_args (stmt); ++i)
>  {
>tree arg = gimple_call_arg (stmt, i);
> +  int flags = gimple_call_arg_flags (stmt, i);
> +
> +  if (flags & EAF_UNUSED)
> +   continue;
> +
>if (!uses)
> -   {
> - uses = get_call_use_vi (stmt);
> - make_any_offset_constraints (uses);
> - make_transitive_closure_constraints (uses);
> -   }
> +   uses = get_call_use_vi (stmt);
> +  varinfo_t tem = new_var_info (NULL_TREE, "callarg", true);
> +  tem->is_reg_var = true;
> +  make_constraint_to (tem->id, arg);
> +  make_any_offset_constraints (tem);
> +  if (!(flags & EAF_DIRECT))
> +   make_transitive_closure_constraints (tem);
> +  make_copy_constraint (uses, tem->id);
>make_constraint_to (uses->id, arg);
>  }
>


Re: Free more of CFG in release_function_body

2020-11-25 Thread Richard Biener via Gcc-patches
On Wed, Nov 25, 2020 at 3:11 PM Jan Hubicka  wrote:
>
> > On Tue, 24 Nov 2020, Jan Hubicka wrote:
> >
> > > Hi,
> > > at the end of processing a function body we loop over basic blocks and
> > > free all edges, while we do not free the rest.  I think this is a leftover
> > > from the time edges were not garbage collected and we were not using
> > > ggc_free.  It makes more sense to free all associated structures (which is
> > > important for the WPA memory footprint).
> > >
> > > Bootstrapped/regtested x86_64-linux, OK?
> >
> > OK.
>
> Unfortunately the patch does not survive LTO bootstrap.  The problem is
> that we keep DECL_INITIAL, which points to blocks; blocks point to
> var_decls, these point to SSA_NAMEs, which point to statements, and
> those point to basic blocks.

VAR_DECLs point to SSA_NAMEs?  It's the other way around.  We for sure
free SSA_NAMEs (well, maybe not explicitly with ggc_free).

Richard.

>
> I wonder whether, with early debug, we still need all the logic about keeping
> DECL_INITIAL.
>
> I have committed a version that frees everything but the BBs themselves and
> will look into cleaning the pointers in decl_initial.
>
> gcc/ChangeLog:
>
> 2020-11-25  Jan Hubicka  
>
> * cfg.c (free_block): New function.
> (clear_edges): Rename to 
> (free_cfg): ... this one; also free BBs and vectors.
> (expunge_block): Update comment.
> * cfg.h (clear_edges): Rename to ...
> (free_cfg): ... this one.
> * cgraph.c (release_function_body): Use free_cfg.
>
> diff --git a/gcc/cfg.c b/gcc/cfg.c
> index de0e71db850..529b6ed2105 100644
> --- a/gcc/cfg.c
> +++ b/gcc/cfg.c
> @@ -25,7 +25,7 @@ along with GCC; see the file COPYING3.  If not see
>
> Available functionality:
>   - Initialization/deallocation
> -init_flow, clear_edges
> +init_flow, free_cfg
>   - Low level basic block manipulation
>  alloc_block, expunge_block
>   - Edge manipulation
> @@ -83,7 +83,7 @@ init_flow (struct function *the_fun)
>the_fun->cfg->bb_flags_allocated = BB_ALL_FLAGS;
>  }
>
> -/* Helper function for remove_edge and clear_edges.  Frees edge structure
> +/* Helper function for remove_edge and free_cffg.  Frees edge structure
> without actually removing it from the pred/succ arrays.  */
>
>  static void
> @@ -93,29 +93,44 @@ free_edge (function *fn, edge e)
>ggc_free (e);
>  }
>
> -/* Free the memory associated with the edge structures.  */
> +/* Free basic block BB.  */
> +
> +static void
> +free_block (basic_block bb)
> +{
> +   vec_free (bb->succs);
> +   bb->succs = NULL;
> +   vec_free (bb->preds);
> +   bb->preds = NULL;
> +   /* Do not free BB itself yet since we leak pointers to dead statements
> +  that points to dead basic blocks.  */
> +}
> +
> +/* Free the memory associated with the CFG in FN.  */
>
>  void
> -clear_edges (struct function *fn)
> +free_cfg (struct function *fn)
>  {
> -  basic_block bb;
>edge e;
>edge_iterator ei;
> +  basic_block next;
>
> -  FOR_EACH_BB_FN (bb, fn)
> +  for (basic_block bb = ENTRY_BLOCK_PTR_FOR_FN (fn); bb; bb = next)
>  {
> +  next = bb->next_bb;
>FOR_EACH_EDGE (e, ei, bb->succs)
> free_edge (fn, e);
> -  vec_safe_truncate (bb->succs, 0);
> -  vec_safe_truncate (bb->preds, 0);
> +  free_block (bb);
>  }
>
> -  FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR_FOR_FN (fn)->succs)
> -free_edge (fn, e);
> -  vec_safe_truncate (EXIT_BLOCK_PTR_FOR_FN (fn)->preds, 0);
> -  vec_safe_truncate (ENTRY_BLOCK_PTR_FOR_FN (fn)->succs, 0);
> -
>gcc_assert (!n_edges_for_fn (fn));
> +  /* Sanity check that dominance tree is freed.  */
> +  gcc_assert (!fn->cfg->x_dom_computed[0] && !fn->cfg->x_dom_computed[1]);
> +
> +  vec_free (fn->cfg->x_label_to_block_map);
> +  vec_free (basic_block_info_for_fn (fn));
> +  ggc_free (fn->cfg);
> +  fn->cfg = NULL;
>  }
>
>  /* Allocate memory for basic_block.  */
> @@ -190,8 +205,8 @@ expunge_block (basic_block b)
>/* We should be able to ggc_free here, but we are not.
>   The dead SSA_NAMES are left pointing to dead statements that are pointing
>   to dead basic blocks making garbage collector to die.
> - We should be able to release all dead SSA_NAMES and at the same time we should
> - clear out BB pointer of dead statements consistently.  */
> + We should be able to release all dead SSA_NAMES and at the same time we
> + should clear out BB pointer of dead statements consistently.  */
>  }
>
>  /* Connect E to E->src.  */
> diff --git a/gcc/cfg.h b/gcc/cfg.h
> index 93fde6df2bf..a9c8300f173 100644
> --- a/gcc/cfg.h
> +++ b/gcc/cfg.h
> @@ -82,7 +82,7 @@ struct GTY(()) control_flow_graph {
>
>
>  extern void init_flow (function *);
> -extern void clear_edges (function *);
> +extern void free_cfg (function *);
>  extern basic_block alloc_block (void);
>  extern void link_block (basic_block, basic_block);
>  extern void unlink_block (basic_block);
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
>

Handle EAF_DIRECT and EAF_UNUSED of pure calls

2020-11-25 Thread Jan Hubicka
Hi,
while looking into structalias I noticed that we ignore EAF flags here.
This is a pity since we can still apply direct and unused.
This patch simply copies the logic from normal call handling.  I realize that
it is a bit more expensive, since it creates a callarg and does the transitive
closure there instead of doing one common transitive closure on call use.
I can also scan first whether there are both direct and !direct arguments and
do this optimization, but it does not seem to affect build times (tested
on a spec2k6 gcc LTO build).

lto-bootstrapped/regtested x86_64-linux.

Honza

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index a4832b75436..5f84f7d467f 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4253,12 +4253,20 @@ handle_pure_call (gcall *stmt, vec *results)
   for (i = 0; i < gimple_call_num_args (stmt); ++i)
 {
   tree arg = gimple_call_arg (stmt, i);
+  int flags = gimple_call_arg_flags (stmt, i);
+
+  if (flags & EAF_UNUSED)
+   continue;
+
   if (!uses)
-   {
- uses = get_call_use_vi (stmt);
- make_any_offset_constraints (uses);
- make_transitive_closure_constraints (uses);
-   }
+   uses = get_call_use_vi (stmt);
+  varinfo_t tem = new_var_info (NULL_TREE, "callarg", true);
+  tem->is_reg_var = true;
+  make_constraint_to (tem->id, arg);
+  make_any_offset_constraints (tem);
+  if (!(flags & EAF_DIRECT))
+   make_transitive_closure_constraints (tem);
+  make_copy_constraint (uses, tem->id);
   make_constraint_to (uses->id, arg);
 }
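
The per-argument decision in the hunk above can be sketched outside GCC as a small standalone model. This is only an illustration under stated assumptions: the flag values, the struct arg_plan type and plan_pure_call_arg are hypothetical names invented here, not GCC's definitions, and the model records only which kinds of constraints would be requested, not how the solver represents them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration; GCC defines its own.  */
enum { TOY_EAF_DIRECT = 1 << 0, TOY_EAF_UNUSED = 1 << 1 };

/* What the per-argument logic decides to request.  */
struct arg_plan
{
  bool skipped;             /* Argument contributes nothing to the call use.  */
  bool any_offset;          /* Any field of the pointed-to object may be read.  */
  bool transitive_closure;  /* Memory reachable through it may be read too.  */
};

/* Mirror the decision order of the patched loop for one argument.  */
struct arg_plan
plan_pure_call_arg (int flags)
{
  struct arg_plan plan = { false, false, false };

  /* An unused argument is skipped entirely.  */
  if (flags & TOY_EAF_UNUSED)
    {
      plan.skipped = true;
      return plan;
    }

  /* Every used argument gets the any-offset treatment; only arguments
     that may be dereferenced indirectly need the transitive closure.  */
  plan.any_offset = true;
  if (!(flags & TOY_EAF_DIRECT))
    plan.transitive_closure = true;
  return plan;
}
```

In this model an argument flagged direct still gets the any-offset treatment but avoids the closure, which is the saving the patch is after.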
 


Re: Free more of CFG in release_function_body

2020-11-25 Thread Jan Hubicka
> On Tue, 24 Nov 2020, Jan Hubicka wrote:
> 
> > Hi,
> > at the end of processing a function body we loop over basic blocks and
> > free all edges while we do not free the rest.  I think this is a leftover
> > from the time edges were not garbage collected and we were not using
> > ggc_free.  It makes more sense to free all associated structures (which is
> > important for WPA memory footprint).
> > 
> > Bootstrapped/regtested x86_64-linux, OK?
> 
> OK.

Unfortunately the patch does not survive LTO bootstrap.  The problem is
that we keep DECL_INITIAL that points to blocks, and blocks point to
var_decls, and these point to SSA_NAMES that point to statements, and
those point to basic blocks.

I wonder whether, with early debug, we still need all the logic about
keeping DECL_INITIAL.

I have committed a version that frees everything but the BBs themselves and
will look into cleaning the pointers in DECL_INITIAL.
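
The shape of that compromise (release what each block owns, keep the block itself because stale pointers may still reach it) can be sketched with a toy structure.  This is a minimal sketch under assumptions: struct toy_block and the toy_* functions are hypothetical, and plain malloc/free stands in for GCC's garbage-collected allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a basic block; all names here are hypothetical.  */
struct toy_block
{
  struct toy_block *next;  /* Chain of blocks, in the role of next_bb.  */
  int *succs;              /* Heap-allocated edge list, may be NULL.  */
  int *preds;
};

/* Release what a block owns but keep the block itself alive, because
   stale pointers from elsewhere may still refer to it.  */
void
toy_free_block (struct toy_block *bb)
{
  free (bb->succs);
  bb->succs = NULL;
  free (bb->preds);
  bb->preds = NULL;
}

/* Walk the chain starting at HEAD and return the number of blocks
   visited.  The link is read before the block is processed, so the
   walk would stay valid even if the block were later freed as well.  */
int
toy_free_cfg (struct toy_block *head)
{
  int visited = 0;
  struct toy_block *next;

  for (struct toy_block *bb = head; bb; bb = next)
    {
      next = bb->next;
      toy_free_block (bb);
      visited++;
    }
  return visited;
}
```

Nulling the vectors after freeing them means a later stale access sees an empty block rather than freed memory.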

gcc/ChangeLog:

2020-11-25  Jan Hubicka  

* cfg.c (free_block): New function.
(clear_edges): Rename to ...
(free_cfg): ... this one; also free BBs and vectors.
(expunge_block): Update comment.
* cfg.h (clear_edges): Rename to ...
(free_cfg): ... this one.
* cgraph.c (release_function_body): Use free_cfg.

diff --git a/gcc/cfg.c b/gcc/cfg.c
index de0e71db850..529b6ed2105 100644
--- a/gcc/cfg.c
+++ b/gcc/cfg.c
@@ -25,7 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 
Available functionality:
  - Initialization/deallocation
-init_flow, clear_edges
+init_flow, free_cfg
  - Low level basic block manipulation
 alloc_block, expunge_block
  - Edge manipulation
@@ -83,7 +83,7 @@ init_flow (struct function *the_fun)
   the_fun->cfg->bb_flags_allocated = BB_ALL_FLAGS;
 }
 
-/* Helper function for remove_edge and clear_edges.  Frees edge structure
+/* Helper function for remove_edge and free_cfg.  Frees edge structure
without actually removing it from the pred/succ arrays.  */
 
 static void
@@ -93,29 +93,44 @@ free_edge (function *fn, edge e)
   ggc_free (e);
 }
 
-/* Free the memory associated with the edge structures.  */
+/* Free basic block BB.  */
+
+static void
+free_block (basic_block bb)
+{
+   vec_free (bb->succs);
+   bb->succs = NULL;
+   vec_free (bb->preds);
+   bb->preds = NULL;
+   /* Do not free BB itself yet since we leak pointers to dead statements
+  that point to dead basic blocks.  */
+}
+
+/* Free the memory associated with the CFG in FN.  */
 
 void
-clear_edges (struct function *fn)
+free_cfg (struct function *fn)
 {
-  basic_block bb;
   edge e;
   edge_iterator ei;
+  basic_block next;
 
-  FOR_EACH_BB_FN (bb, fn)
+  for (basic_block bb = ENTRY_BLOCK_PTR_FOR_FN (fn); bb; bb = next)
 {
+  next = bb->next_bb;
   FOR_EACH_EDGE (e, ei, bb->succs)
free_edge (fn, e);
-  vec_safe_truncate (bb->succs, 0);
-  vec_safe_truncate (bb->preds, 0);
+  free_block (bb);
 }
 
-  FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR_FOR_FN (fn)->succs)
-free_edge (fn, e);
-  vec_safe_truncate (EXIT_BLOCK_PTR_FOR_FN (fn)->preds, 0);
-  vec_safe_truncate (ENTRY_BLOCK_PTR_FOR_FN (fn)->succs, 0);
-
   gcc_assert (!n_edges_for_fn (fn));
+  /* Sanity check that dominance tree is freed.  */
+  gcc_assert (!fn->cfg->x_dom_computed[0] && !fn->cfg->x_dom_computed[1]);
+  
+  vec_free (fn->cfg->x_label_to_block_map);
+  vec_free (basic_block_info_for_fn (fn));
+  ggc_free (fn->cfg);
+  fn->cfg = NULL;
 }
 
 /* Allocate memory for basic_block.  */
@@ -190,8 +205,8 @@ expunge_block (basic_block b)
   /* We should be able to ggc_free here, but we are not.
  The dead SSA_NAMES are left pointing to dead statements that are pointing
  to dead basic blocks making garbage collector to die.
- We should be able to release all dead SSA_NAMES and at the same time we should
- clear out BB pointer of dead statements consistently.  */
+ We should be able to release all dead SSA_NAMES and at the same time we
+ should clear out BB pointer of dead statements consistently.  */
 }
 
 /* Connect E to E->src.  */
diff --git a/gcc/cfg.h b/gcc/cfg.h
index 93fde6df2bf..a9c8300f173 100644
--- a/gcc/cfg.h
+++ b/gcc/cfg.h
@@ -82,7 +82,7 @@ struct GTY(()) control_flow_graph {
 
 
 extern void init_flow (function *);
-extern void clear_edges (function *);
+extern void free_cfg (function *);
 extern basic_block alloc_block (void);
 extern void link_block (basic_block, basic_block);
 extern void unlink_block (basic_block);
diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 19dfe2be23b..dbde8aaaba1 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -1811,7 +1811,7 @@ release_function_body (tree decl)
  gcc_assert (!dom_info_available_p (fn, CDI_DOMINATORS));
  gcc_assert (!dom_info_available_p (fn, CDI_POST_DOMINATORS));
  delete_tree_cfg_annotations (fn);
- clear_edges (fn);
+ free_cfg (fn);
  fn->cfg = NULL;
}
   if (fn->value_histograms)
diff --

[PING] [PATCH] libstdc++: Pretty printers for std::_Bit_reference, std::_Bit_iterator and std::_Bit_const_iterator

2020-11-25 Thread Michael Weghorn via Gcc-patches

I'd like to ping for this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553870.html

Michael

On 11/10/2020 19.22, Michael Weghorn via Gcc-patches wrote:

On 22/09/2020 12.04, Jonathan Wakely wrote:

On 14/09/20 16:49 +0200, Michael Weghorn via Libstdc++ wrote:

Hi,

the attached patch implements pretty printers relevant for iteration
over std::vector<bool>, thus handling the TODO
added in commit 36d0dada6773d7fd7c5ace64c90e723930a3b81e
("Have std::vector printer's iterator return bool for vector<bool>",
2019-06-19):

    TODO add printer for vector<bool>'s _Bit_iterator and
    _Bit_const_iterator

Tested on x86_64-pc-linux-gnu (Debian testing).

I haven't filed any copyright assignment for GCC yet, but I'm happy to
do so when pointed to the right place.


Thanks for the patch! I'll send you the form to start the copyright
assignment process.




Thanks! The copyright assignment is done now. Is there anything else to
do from my side at the moment?



[committed] Fix atomic_capture-1.f90 testcase

2020-11-25 Thread Andrew Stubbs
This libgomp OpenACC testcase makes assumptions about the order in which 
loop iterations will run that are invalid on amdgcn. Apparently nvptx 
does work that way, but I find that surprising in itself.


For example, this patch ensures that where a test expects one bit left 
set, or unset, then it doesn't matter which bit it is.
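
To see why a population count is order-independent where a sign test is not, here is a minimal sketch in C rather than Fortran.  It is a sequential model under assumptions: clear_bits_in_order and the other names are hypothetical, a 32-bit word is assumed, and the permutation passed in stands for whatever order the device happens to run the iterations in.

```c
#include <assert.h>
#include <stdint.h>

/* Count set bits portably (Kernighan's method).  */
int
popcount32 (uint32_t x)
{
  int n = 0;
  while (x)
    {
      x &= x - 1;  /* Clear the lowest set bit.  */
      n++;
    }
  return n;
}

/* Clear the 32 bits of an all-ones word in the order given by ORDER
   (a permutation of 0..31), storing the value seen just before each
   update in CAPTURED.  Returns the final value of the word.  */
uint32_t
clear_bits_in_order (const int order[32], uint32_t captured[32])
{
  uint32_t got = UINT32_MAX;
  for (int k = 0; k < 32; k++)
    {
      int bit = order[k];
      captured[bit] = got;            /* The "capture" part.  */
      got &= ~(UINT32_C (1) << bit);  /* The update part.  */
    }
  return got;
}

/* Return 1 if every captured value still had at least one bit set.  */
int
all_captures_nonzero (const uint32_t captured[32])
{
  for (int i = 0; i < 32; i++)
    if (popcount32 (captured[i]) == 0)
      return 0;
  return 1;
}
```

Each captured value still contains the bit its own iteration is about to clear, so the count is positive for any order.  Whether the top bit is still set depends on when the iteration for bit 31 happens to run.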


Committed as obvious. Please revert if that turns out not to be true.

Andrew
Fix atomic_capture-1.f90 testcase

The testcase had invalid assumptions about which loop iterations would run
first and last.

libgomp/ChangeLog

	* testsuite/libgomp.oacc-fortran/atomic_capture-1.f90 (main): Adjust
	expected results.

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90
index 536b3f0030c..0b923d5c5bf 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90
@@ -299,7 +299,7 @@ program main
   ! At most one iarr element can be 0.
   do i = 1, N
  if ((iarr(i) == 0 .and. i /= itmp) &
- .or. iarr(i) < 0 .or. iarr(i) >= N) STOP 35
+ .or. iarr(i) < 0 .or. iarr(i) > N) STOP 35
   end do
   if (igot /= iexp) STOP 36
 
@@ -336,7 +336,7 @@ program main
 
   !$acc parallel loop copy (igot, itmp)
 do i = 0, N - 1
-  iexpr = ibclr (-2, i)
+  iexpr = ibclr (-1, i)
   !$acc atomic capture
   iarr(i) = igot
   igot = iand (igot, iexpr)
@@ -345,7 +345,7 @@ program main
   !$acc end parallel loop
 
   do i = 1, N
- if (.not. (iarr(i - 1) < 0)) STOP 39
+ if (.not. (popcnt(iarr(i - 1)) > 0)) STOP 39
   end do
   if (igot /= iexp) STOP 40
 
@@ -363,7 +363,7 @@ program main
   !$acc end parallel loop
 
   do i = 1, N
- if (.not. (iarr(i - 1) >= 0)) STOP 41
+ if (.not. (popcnt(iarr(i - 1)) < 32)) STOP 41
   end do
   if (igot /= iexp) STOP 42
 
@@ -381,7 +381,7 @@ program main
   !$acc end parallel loop
 
   do i = 1, N
- if (.not. (iarr(i - 1) < 0)) STOP 43
+ if (.not. (popcnt(iarr(i - 1)) > 0)) STOP 43
   end do
   if (igot /= iexp) STOP 44
 
@@ -398,7 +398,7 @@ program main
   !$acc end parallel loop
 
   do i = 1, N
- if (.not. (1 <= iarr(i) .and. iarr(i) < iexp)) STOP 45
+ if (.not. (1 <= iarr(i) .and. iarr(i) <= iexp)) STOP 45
   end do
   if (igot /= iexp) STOP 46
 
@@ -415,7 +415,7 @@ program main
   !$acc end parallel loop
 
   do i = 1, N
- if (.not. (iarr(i) == 1 .or. iarr(i) == N)) STOP 47
+ if (.not. (iarr(i) >= 1 .or. iarr(i) <= N)) STOP 47
   end do
   if (igot /= iexp) STOP 48
 
@@ -424,7 +424,7 @@ program main
 
   !$acc parallel loop copy (igot, itmp)
 do i = 0, N - 1
-  iexpr = ibclr (-2, i)
+  iexpr = ibclr (-1, i)
   !$acc atomic capture
   iarr(i) = igot
   igot = iand (iexpr, igot)
@@ -433,7 +433,7 @@ program main
   !$acc end parallel loop
 
   do i = 1, N
- if (.not. (iarr(i - 1) < 0)) STOP 49
+ if (.not. (popcnt(iarr(i - 1)) > 0)) STOP 49
   end do
   if (igot /= iexp) STOP 50
 
@@ -451,7 +451,7 @@ program main
   !$acc end parallel loop
 
   do i = 1, N
- if (.not. (iarr(i - 1) >= 0)) STOP 51
+ if (.not. (popcnt(iarr(i - 1)) < 32)) STOP 51
   end do
   if (igot /= iexp) STOP 52
 
@@ -469,7 +469,7 @@ program main
   !$acc end parallel loop
 
   do i = 1, N
- if (.not. (iarr(i - 1) < 0)) STOP 53
+ if (.not. (popcnt(iarr(i - 1)) > 0)) STOP 53
   end do
   if (igot /= iexp) STOP 54
 
@@ -755,7 +755,7 @@ program main
   !$acc end parallel loop
 
   do i = 1, N
- if (.not. (iarr(i) == iexp)) STOP 89
+ if (.not. (iarr(i) <= i)) STOP 89
   end do
   if (igot /= iexp) STOP 90
 
@@ -773,7 +773,7 @@ program main
   !$acc end parallel loop
 
   do i = 1, N
- if (.not. (iarr(i - 1) <= 0)) STOP 91
+ if (.not. (popcnt(iarr(i - 1)) < 32)) STOP 91
   end do
   if (igot /= iexp) STOP 92
 
@@ -791,7 +791,7 @@ program main
   !$acc end parallel loop
 
   do i = 1, N
- if (.not. (iarr(i - 1) >= -1)) STOP 93
+ if (.not. (popcnt(iarr(i - 1)) > 0)) STOP 93
   end do
   if (igot /= iexp) STOP 94
 
@@ -809,7 +809,7 @@ program main
   !$acc end parallel loop
 
   do i = 1, N
- if (.not. (iarr(i - 1) <= 0)) STOP 95
+ if (.not. (popcnt(iarr(i - 1)) < 32)) STOP 95
   end do
   if (igot /= iexp) STOP 96
 
@@ -843,7 +843,7 @@ program main
   !$acc end parallel loop
 
   do i = 1, N
- if (.not. (iarr(i) == iexp )) STOP 99
+ if (.not. (iarr(i) <= i)) STOP 99
   end do
   if (igot /= iexp) STOP 100
 
@@ -861,7 +861,7 @@ program main
   !$acc end parallel loop
 
   do i = 1, N
- if (.not. (iarr(i - 1) <= 0)) STOP 101
+ if (.not. (popcnt(iarr(i - 1)) < 32)) STOP 101
   end do
   if (igot /= iexp) STOP 102
 
@@ -879,7 +879,7 @@ program main
   !$acc end parallel loop
 
   do i = 1, N
- if (.not. (iarr(i - 1) >= iexp)) STOP 103
+ if (.not. (popcnt(iarr(i - 1)) > 0)) STOP 103
   end do
   if (igot /= iexp) STOP 104
 
@@ -897,7 +897,7 @@ program main
   !$acc end parallel loop
 
   do i = 

Re: [PATCH] middle-end/97579 - lower VECTOR_BOOLEAN_TYPE_P VEC_COND_EXPRs

2020-11-25 Thread Richard Biener
On Wed, 25 Nov 2020, Richard Biener wrote:

> On Wed, 25 Nov 2020, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > This makes sure to lower VECTOR_BOOLEAN_TYPE_P typed VEC_COND_EXPRs
> > > so we don't try to use vcond to expand those.  That's especially
> > > important for x86 integer mode boolean vectors but eventually
> > > as well for aarch64 / gcn VnBImode ones.
> > >
> > > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > >
> > > 2020-11-25  Richard Biener  
> > >
> > >   PR middle-end/97579
> > >   * gimple-isel.cc (gimple_expand_vec_cond_expr): Lower
> > >   VECTOR_BOOLEAN_TYPE_P VEC_COND_EXPRs.
> > >
> > >   * gcc.dg/pr97579.c: New testcase.
> > > ---
> > >  gcc/gimple-isel.cc | 21 +++--
> > >  gcc/testsuite/gcc.dg/pr97579.c | 31 +++
> > >  2 files changed, 50 insertions(+), 2 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/pr97579.c
> > >
> > > diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
> > > index b5362cc4b01..eac1f62e1ff 100644
> > > --- a/gcc/gimple-isel.cc
> > > +++ b/gcc/gimple-isel.cc
> > > @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.  If not see
> > >  #include "tree-ssa-dce.h"
> > >  #include "memmodel.h"
> > >  #include "optabs.h"
> > > +#include "gimple-fold.h"
> > >  
> > >  /* Expand all ARRAY_REF(VIEW_CONVERT_EXPR) gimple assignments into calls to
> > > internal function based on vector type of selected expansion.
> > > @@ -134,6 +135,24 @@ gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi,
> > >lhs = gimple_assign_lhs (stmt);
> > >machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
> > >  
> > > +  /* Lower mask typed VEC_COND_EXPRs to bitwise operations.  Those can
> > > + end up generated by folding and at least for integer mode masks
> > > + we cannot expect vcond expanders to exist.  We lower a ? b : c
> > > + to (b & a) | (c & ~a).  */
> > 
> > I think it is reasonable to provide them for vector modes though.
> > E.g. for SVE we have:
> > 
> > (define_insn "@vcond_mask_"
> >   [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
> > (ior:PRED_ALL
> >   (and:PRED_ALL
> > (match_operand:PRED_ALL 3 "register_operand" "Upa")
> > (match_operand:PRED_ALL 1 "register_operand" "Upa"))
> >   (and:PRED_ALL
> > (not (match_dup 3))
> > (match_operand:PRED_ALL 2 "register_operand" "Upa"))))]
> >   "TARGET_SVE"
> >   "sel\t%0.b, %3, %1.b, %2.b"
> > )
> > 
> > So it might be better to check for an integer mode as well.
> 
> OK, I'll adjust accordingly.  Looks like GCN uses DImode.
> 
> (define_expand "vcond_mask_di"
>   [(parallel
> [(set (match_operand:V_ALL 0   "register_operand" "")
>   (vec_merge:V_ALL
> (match_operand:V_ALL 1 "gcn_vop3_operand" "")
> (match_operand:V_ALL 2 "gcn_alu_operand" "")
> (match_operand:DI 3  "register_operand" "")))
>  (clobber (scratch:))])]
>   ""
>   "")

The following is what I pushed.

Richard.
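
The lowering named in the patch comment, a ? b : c to (b & a) | (c & ~a), can be checked on plain integers.  This is a minimal sketch, assuming a 64-bit integer stands in for an integer-mode boolean vector with one bit per lane; mask_select and mask_select_reference are hypothetical names, not GCC functions.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise select on an integer-mode mask: for each bit position, take
   the bit from B where A is 1 and from C where A is 0.  */
uint64_t
mask_select (uint64_t a, uint64_t b, uint64_t c)
{
  return (b & a) | (c & ~a);
}

/* Reference: the same selection written lane by lane with ?:.  */
uint64_t
mask_select_reference (uint64_t a, uint64_t b, uint64_t c)
{
  uint64_t result = 0;
  for (int lane = 0; lane < 64; lane++)
    {
      uint64_t bit = UINT64_C (1) << lane;
      result |= (a & bit) ? (b & bit) : (c & bit);
    }
  return result;
}
```

Because every lane is a single bit, the select needs no vcond expander at all, only AND, OR and NOT on the integer mode.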

From 6a3223f2a445d5a1872f65cabe922fbe2d86eb1b Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Wed, 25 Nov 2020 12:31:54 +0100
Subject: [PATCH] middle-end/97579 - lower VECTOR_BOOLEAN_TYPE_P VEC_COND_EXPRs
To: gcc-patches@gcc.gnu.org

This makes sure to lower VECTOR_BOOLEAN_TYPE_P typed non-vector
mode VEC_COND_EXPRs so we don't try to use vcond to expand those.
That's required for x86 and gcn integer mode boolean vectors.

2020-11-25  Richard Biener  

PR middle-end/97579
* gimple-isel.cc (gimple_expand_vec_cond_expr): Lower
VECTOR_BOOLEAN_TYPE_P, non-vector mode VEC_COND_EXPRs.

* gcc.dg/pr97579.c: New testcase.
---
 gcc/gimple-isel.cc | 22 --
 gcc/testsuite/gcc.dg/pr97579.c | 31 +++
 2 files changed, 51 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr97579.c

diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index b5362cc4b01..83281c0cbf9 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-dce.h"
 #include "memmodel.h"
 #include "optabs.h"
+#include "gimple-fold.h"
 
 /* Expand all ARRAY_REF(VIEW_CONVERT_EXPR) gimple assignments into calls to
internal function based on vector type of selected expansion.
@@ -134,6 +135,25 @@ gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi,
   lhs = gimple_assign_lhs (stmt);
   machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
 
+  /* Lower mask typed, non-vector mode VEC_COND_EXPRs to bitwise operations.
+ Those can end up generated by folding and at least for integer mode masks
+ we cannot expect vcond expanders to exist.  We lower a ? b : c
+ to (b & a) | (c & ~a).  */
+  if (!VECTOR_MODE_P (mode))
+{
+  gcc_assert (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (lhs))
+ && types_compatible_p (TREE_TYPE (op0), TREE_TYPE 

RE: [backport gcc-8,9][arm] Thumb2 out of range conditional branch fix [PR91816]

2020-11-25 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Stam Markianos-Wright 
> Sent: 25 November 2020 13:49
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Ramana
> Radhakrishnan ; Kyrylo Tkachov
> ; ni...@redhat.com
> Subject: [backport gcc-8,9][arm] Thumb2 out of range conditional branch fix
> [PR91816]
> 
> Hi all,
> 
> Now that I have pushed the entirety of this patch to gcc-10 and gcc-11,
> I would like to backport it to gcc-8 and gcc-9.
> 
> PR link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91816
> 
> This patch had originally been approved here:
> 
> https://gcc.gnu.org/legacy-ml/gcc-patches/2020-01/msg02010.html
> 
> See the attached diffs that have been rebased and apply cleanly.
> 
> Tested on a cross arm-none-eabi and also in a Cortex A-15 bootstrap with
> no regressions.
> 
> Ok to backport?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Stam Markianos-Wright

