Re: [PATCH] Improve AVX512 vector shift patterns (PR target/82370)

2017-10-19 Thread Kirill Yukhin
Hello Jakub,
On 04 Oct 21:29, Jakub Jelinek wrote:
> Hi!
> 
> EVEX encoded vector shifts by immediate allow memory operand as input.
> We handle this right for the sra patterns by having 3 distinct
> define_insns, one TARGET_AVX512VL with masking, where the non-masked
> insn names start with *, that have (=v,v,v) and (=v,vm,N) alternatives
> and nonimmediate_operand for the middle operand, then SSE2 pattern
> for the same modes with just the noavx and avx alternatives and finally
> a 512-bit vector pattern with masking that also has the nonimmediate_operand
> etc.  For the logical shifts we have 3 define_insns too, but with very
> different split that makes it not possible to do this.
> 
> The following patch reworks the logical vector shifts so that they are
> similar to the arithmetic right vector shifts, except for the needed V?DI
> differences.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
The patch is OK for trunk.

--
Thanks, K

> 
> 2017-10-04  Jakub Jelinek  
> 
>   PR target/82370
>   * config/i386/sse.md (VI248_AVX2, VI248_AVX512BW, VI248_AVX512BW_2):
>   New mode iterators.
>   (3): Change the last of the 3
>   define_insns for logical vector shifts to use VI248_AVX512BW
>   iterator instead of VI48_AVX512, remove 
>   condition, useless isa and prefix attributes.  Change the first
>   2 of these define_insns to ...
>   (3): ... this, new
>   define_insn for avx512vl.
>   (3): ... and this, new define_insn without
>   masking for non-avx512vl.
> 
>   * gcc.target/i386/avx-pr82370.c: New test.
>   * gcc.target/i386/avx2-pr82370.c: New test.
>   * gcc.target/i386/avx512f-pr82370.c: New test.
>   * gcc.target/i386/avx512bw-pr82370.c: New test.
>   * gcc.target/i386/avx512vl-pr82370.c: New test.
>   * gcc.target/i386/avx512vlbw-pr82370.c: New test.
> 
> --- gcc/config/i386/sse.md.jj 2017-10-04 09:45:55.0 +0200
> +++ gcc/config/i386/sse.md2017-10-04 12:18:19.163858188 +0200
> @@ -403,11 +403,19 @@ (define_mode_iterator VI48_AVX2
>[(V8SI "TARGET_AVX2") V4SI
> (V4DI "TARGET_AVX2") V2DI])
>  
> +(define_mode_iterator VI248_AVX2
> +  [(V16HI "TARGET_AVX2") V8HI
> +   (V8SI "TARGET_AVX2") V4SI
> +   (V4DI "TARGET_AVX2") V2DI])
> +
>  (define_mode_iterator VI248_AVX2_8_AVX512F_24_AVX512BW
>[(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
> (V16SI "TARGET_AVX512BW") (V8SI "TARGET_AVX2") V4SI
> (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI])
>  
> +(define_mode_iterator VI248_AVX512BW
> +  [(V32HI "TARGET_AVX512BW") V16SI V8DI])
> +
>  (define_mode_iterator VI248_AVX512BW_AVX512VL
>[(V32HI "TARGET_AVX512BW") 
> (V4DI "TARGET_AVX512VL") V16SI V8DI])
> @@ -418,6 +426,11 @@ (define_mode_iterator VI248_AVX512BW_1
>V8SI V4SI
>V2DI])
> 
> +(define_mode_iterator VI248_AVX512BW_2
> + [(V16HI "TARGET_AVX512BW") (V8HI "TARGET_AVX512BW")
> +  V8SI V4SI
> +  V4DI V2DI])
> +   
>  (define_mode_iterator VI48_AVX512F
>[(V16SI "TARGET_AVX512F") V8SI V4SI
> (V8DI "TARGET_AVX512F") V4DI V2DI])
> @@ -10731,59 +10744,51 @@ (define_insn "ashr3"
> (const_string "0")))
> (set_attr "mode" "")])
>  
> -(define_insn "3"
> -  [(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand" "=x,v")
> - (any_lshift:VI2_AVX2_AVX512BW
> -   (match_operand:VI2_AVX2_AVX512BW 1 "register_operand" "0,v")
> -   (match_operand:DI 2 "nonmemory_operand" "xN,vN")))]
> -  "TARGET_SSE2 &&  && "
> -  "@
> -   p\t{%2, %0|%0, %2}
> -   vp\t{%2, %1, %0|%0, 
> %1, %2}"
> -  [(set_attr "isa" "noavx,avx")
> -   (set_attr "type" "sseishft")
> +(define_insn "3"
> +  [(set (match_operand:VI248_AVX512BW_2 0 "register_operand" "=v,v")
> + (any_lshift:VI248_AVX512BW_2
> +   (match_operand:VI248_AVX512BW_2 1 "nonimmediate_operand" "v,vm")
> +   (match_operand:DI 2 "nonmemory_operand" "v,N")))]
> +  "TARGET_AVX512VL"
> +  "vp\t{%2, %1, %0|%0, 
> %1, %2}"
> +  [(set_attr "type" "sseishft")
> (set (attr "length_immediate")
>   (if_then_else (match_operand 2 "const_int_operand")
> (const_string "1")
> (const_string "0")))
> -   (set_attr "prefix_data16" "1,*")
> -   (set_attr "prefix" "orig,vex")
> (set_attr "mode" "")])
>  
> -(define_insn "3"
> -  [(set (match_operand:VI48_AVX2 0 "register_operand" "=x,x,v")
> - (any_lshift:VI48_AVX2
> -   (match_operand:VI48_AVX2 1 "register_operand" "0,x,v")
> -   (match_operand:DI 2 "nonmemory_operand" "xN,xN,vN")))]
> -  "TARGET_SSE2 && "
> +(define_insn "3"
> +  [(set (match_operand:VI248_AVX2 0 "register_operand" "=x,x")
> + (any_lshift:VI248_AVX2
> +   (match_operand:VI248_AVX2 1 "register_operand" "0,x")
> +   (match_operand:DI 2 "nonmemory_operand" "xN,xN")))]
> +  "TARGET_SSE2"
>"@
> p\t{%2, %0|%0, %2}
> -   vp\t{%2, %1, %0|%0, 
> %1, %2}
> -   vp\t{%2, %1, %0|%0, 
> %1, %2}"  
> -  [(set_attr "isa" "noavx,avx,avx512bw")
> +   vp\t{%2, %1, %0|%0, 

Re: [PATCH] PR target/82624 msp430-elf target must allow for NULL pointer dereferences

2017-10-19 Thread DJ Delorie

Thanks for your patch; I applied it with some minor changes.  Please
note that you don't need to submit patches to generated files (*.1 and
*.info), that patches are customarily made against the development tree
not a released tarball (which is probably why you thought you had to
patch the generated files), and a ChangeLog entry is needed (I wrote one
for you).

Please see https://gcc.gnu.org/contribute.html for more details; the
better prepared your patch is, the easier it is to accept it ;-)

Thanks!
DJ


Re: [PATCH] Prefer shorter VEX encoding of VP{AND,OR,XOR,ANDN} over EVEX when possible (PR target/82370)

2017-10-19 Thread Kirill Yukhin
Hello Jakub, Uroš,
On 04 Oct 13:41, Uros Bizjak wrote:
> On Wed, Oct 4, 2017 at 10:33 AM, Jakub Jelinek  wrote:
> > Hi!
> >
> > Most AVX* instructions have the same insn name between VEX and EVEX
> > encoded insns and whether it is EVEX or VEX encoded is determined by
> > the operands by the assembler (if there is masking/broadcast, or
> > %[xy]mm16+ operands are present, or when using 512-bit vectors).
> > This is not the case for the logical instruction, we have VEX
> > encoded VP{AND,OR,XOR,ANDN} and EVEX encoded VP{AND,OR,XOR,ANDN}{D,Q}.
> >
> > Right now we use the d or q suffixes for -mavx512vl even when we could
> > use VEX encoded insn, which results in larger opcode.
> >
> > The following patch fixes that, by emitting the suffix only when needed.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2017-10-04  Jakub Jelinek  
> >
> > PR target/82370
> > * config/i386/sse.md (*andnot3,
> > 3, *3): Split
> > (=v,v,vm) alternative into (=x,x,xm) and (=v,v,vm), for 128-bit
> > and 256-bit vectors, the (=x,x,xm) alternative and when mask is
> > not applied use empty suffix even for TARGET_AVX512VL.
> > * config/i386/subst.md (mask_prefix3, mask_prefix4): When mask
> > is applied, supply evex,evex or evex,evex,evex instead of just
> > evex.
> 
> LGTM, but Kirill should give the final approval for this patch.
Patch is OK for main trunk. Thanks!

--
K


[PATCH][compare-elim] Fix PR rtl-optimization/82597

2017-10-19 Thread Michael Collison
This patch fixes an ICE on x86 because we were not constraining the operands
of a recognized insn. Bootstrapped and tested on aarch64-none-linux-gnu and 
x86_64.

Also successfully compiled the failing test cases in 82597 and duplicate 82592.

Ok for trunk?

2017-10-18  Michael Collison  

PR rtl-optimization/82597
* compare-elim.c: (try_validate_parallel): Constrain operands
of recognized insn.

2017-10-18  Michael Collison  
* gcc.dg/pr82597.c: New test.



rb8338.patch
Description: rb8338.patch


Re: [PATCH] Document --coverage and fork-like functions (PR gcov-profile/82457).

2017-10-19 Thread Sandra Loosemore

On 10/19/2017 12:26 PM, Eric Gallager wrote:

On 10/19/17, Martin Liška  wrote:

Hi.

As discussed in the PR, we should be more precise in our documentation.
The patch does that.

Ready for trunk?
Martin

gcc/ChangeLog:

2017-10-19  Martin Liska  

PR gcov-profile/82457
* doc/invoke.texi: Document that one needs a non-strict ISO mode
for fork-like functions to be properly instrumented.
---
  gcc/doc/invoke.texi | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)





The wording is kinda unclear because the modes in the parentheses are
all strict ISO modes, but the part before the parentheses says
NON-strict... I think you either need an additional "not" inside the
parentheses, or to change all the instances of -std=c* to -std=gnu*.


The wording in the patch doesn't make sense to me, either.  If I 
understand the issue correctly, the intent is probably to say something like


Unless a strict ISO C dialect option is in effect,
@code{fork} calls are detected and correctly handled without double 
counting.


??

-Sandra




Re: [PATCH, RFC] Add a pass counter for "are we there yet" purposes

2017-10-19 Thread Sandra Loosemore

On 10/16/2017 10:15 AM, Richard Biener wrote:

On October 16, 2017 5:46:35 PM GMT+02:00, Sandra Loosemore 
 wrote:

On 10/16/2017 12:53 AM, Richard Biener wrote:

On October 16, 2017 7:38:50 AM GMT+02:00, Sandra Loosemore

 wrote:

This patch is a first cut at solving the problem discussed in this
thread

https://gcc.gnu.org/ml/gcc/2017-10/msg00016.html

where I have some nios2 backend patches in my queue that need a way

of

knowing whether the split1 pass has run yet.  There seemed to be
agreement that a general way to query the pass manager for this
information would be useful.

[snip]


I missed the post of why you need to know this. But as you noticed

we're using reload_completed for similar purpose. There's also the
possibility of setting/adding a pass property that split could provide
and which you could query. We're using this to signal the various
different lowering stages in GIMPLE for example.

See the thread linked above.  There was interest in a general mechanism

to query the pass manager state instead of adding the bookkeeping to
the
nios2 backend or adding something to track the split1 pass to the
target-independent parts of the compiler.  After fiddling with it,
though, I'm not sure this is an improvement.

Maybe it would help the discussion if I got my nios2 patch set posted
so
that everybody could see the actual use case?  It'll take me a few days

to finish cleaning it up.


I guess that might help. I have the feeling that querying for 'did pass X run' 
is wrong conceptually.


Now posted in this thread.
https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01309.html

-Sandra



[patch 5/5] test cases

2017-10-19 Thread Sandra Loosemore
This patch adds new test cases to check that the new addressing 
optimizations are happening as expected.  There's also a tweak to an 
existing test where the optimization patches changed the code generated 
to something that was equally correct but not what the test case was 
trying to test.


-Sandra

2017-10-19  Sandra Loosemore  

	gcc/testsuite/
	* gcc.target/nios2/cdx-branch.c:  Fix broken test.
	* gcc.target/nios2/lo-addr-bypass.c: New.
	* gcc.target/nios2/lo-addr-char.c: New.
	* gcc.target/nios2/lo-addr-int.c: New.
	* gcc.target/nios2/lo-addr-pic.c: New.
	* gcc.target/nios2/lo-addr-short.c: New.
	* gcc.target/nios2/lo-addr-tls.c: New.
	* gcc.target/nios2/lo-addr-uchar.c: New.
	* gcc.target/nios2/lo-addr-ushort.c: New.
	* gcc.target/nios2/lo-addr-volatile.c: New.
diff --git a/gcc/testsuite/gcc.target/nios2/cdx-branch.c b/gcc/testsuite/gcc.target/nios2/cdx-branch.c
index 3b984f2..3a9c459 100644
--- a/gcc/testsuite/gcc.target/nios2/cdx-branch.c
+++ b/gcc/testsuite/gcc.target/nios2/cdx-branch.c
@@ -23,7 +23,7 @@ extern int i (int);
 extern int j (int);
 extern int k (int);
 
-int h (int a)
+int h (int a, int b)
 {
   int x;
 
@@ -31,7 +31,7 @@ int h (int a)
  an unconditional branch from one branch of the "if" to
  the return statement.  We compile this testcase with -Os to
  avoid insertion of a duplicate epilogue in place of the branch.  */
-  if (a == 1)
+  if (a == b)
 x = i (37);
   else
 x = j (42);
diff --git a/gcc/testsuite/gcc.target/nios2/lo-addr-bypass.c b/gcc/testsuite/gcc.target/nios2/lo-addr-bypass.c
new file mode 100644
index 000..24e6cfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nios2/lo-addr-bypass.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=r2 -mbypass-cache" } */
+/* { dg-final { scan-assembler-times "addi\tr., r., %lo" 12 } } */
+/* { dg-final { scan-assembler-not "ldw\t" } } */
+/* { dg-final { scan-assembler-not "stw\t" } } */
+/* { dg-final { scan-assembler-not "ldwio\tr., %lo" } } */
+/* { dg-final { scan-assembler-not "stwio\tr., %lo" } } */
+
+/* Check that we do not generate %lo addresses with R2 ldstio instructions.
+   %lo requires a 16-bit relocation and on R2 these instructions only have a
+   12-bit register offset.  */
+#define TYPE int
+
+struct ss
+{
+  TYPE x1,x2;
+};
+
+extern TYPE S1;
+extern TYPE S2[];
+
+extern struct ss S3;
+extern struct ss S4[];
+
+TYPE *addr1 (void) { return  }
+TYPE get1 (void) { return S1; }
+void set1 (TYPE value) { S1 = value; }
+
+TYPE *addr2 (int i) { return &(S2[i]); }
+TYPE get2 (int i) { return S2[i]; }
+void set2 (int i, TYPE value) { S2[i] = value; }
+
+TYPE *addr3 (void) { return &(S3.x2); }
+TYPE get3 (void) { return S3.x2; }
+void set3 (TYPE value) { S3.x2 = value; }
+
+TYPE *addr4 (int i) { return &(S4[i].x2); }
+TYPE get4 (int i) { return S4[i].x2; }
+void set4 (int i, TYPE value) { S4[i].x2 = value; }
+
diff --git a/gcc/testsuite/gcc.target/nios2/lo-addr-char.c b/gcc/testsuite/gcc.target/nios2/lo-addr-char.c
new file mode 100644
index 000..dd99245
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nios2/lo-addr-char.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times "addi\tr., r., %lo" 4 } } */
+/* { dg-final { scan-assembler-times "ldbu\tr., %lo" 4 } } */
+/* { dg-final { scan-assembler-times "ldb\tr., %lo" 16 } } */
+/* { dg-final { scan-assembler-times "stb\tr., %lo" 4 } } */
+
+/* Check that various address forms involving a symbolic constant
+   with a possible constant offset and/or index register are optimized
+   to generate a %lo relocation in the load/store instructions instead
+   of a plain register indirect addressing mode.  */
+/* Note: get* uses ldhu but ext* uses ldh since TYPE is signed.  */
+
+#define TYPE signed char
+
+struct ss
+{
+  TYPE x1,x2;
+};
+
+extern TYPE S1;
+extern TYPE S2[];
+
+extern struct ss S3;
+extern struct ss S4[];
+
+TYPE *addr1 (void) { return  }
+TYPE get1 (void) { return S1; }
+void set1 (TYPE value) { S1 = value; }
+
+TYPE *addr2 (int i) { return &(S2[i]); }
+TYPE get2 (int i) { return S2[i]; }
+void set2 (int i, TYPE value) { S2[i] = value; }
+
+TYPE *addr3 (void) { return &(S3.x2); }
+TYPE get3 (void) { return S3.x2; }
+void set3 (TYPE value) { S3.x2 = value; }
+
+TYPE *addr4 (int i) { return &(S4[i].x2); }
+TYPE get4 (int i) { return S4[i].x2; }
+void set4 (int i, TYPE value) { S4[i].x2 = value; }
+
+int extw1 (void) { return (int)(S1); }
+int extw2 (int i) { return (int)(S2[i]); }
+int extw3 (void) { return (int)(S3.x2); }
+int extw4 (int i) { return (int)(S4[i].x2); }
+unsigned int extwu1 (void) { return (unsigned int)(S1); }
+unsigned int extwu2 (int i) { return (unsigned int)(S2[i]); }
+unsigned int extwu3 (void) { return (unsigned int)(S3.x2); }
+unsigned int extwu4 (int i) { return (unsigned int)(S4[i].x2); }
+
+short exth1 (void) { return (short)(S1); }
+short exth2 (int i) { return (short)(S2[i]); }
+short exth3 (void) { return 

[patch 4/5] nios2 cost model improvements

2017-10-19 Thread Sandra Loosemore

This patch adds a TARGET_ADDRESS_COST hook to the nios2 backend.  I
also made a number of improvements to the TARGET_RTX_COSTS hook.  I
suspect the original implementation was copied from some other backend
in the mists of ancient time, and never really adjusted to match the
nios2 architecture.  :-P

The change to nios2_legitimize_address to recognize register indirect
with constant offset is for the benefit of the ivopts pass cost
modeling; falling through to target-independent code doesn't produce
that mode directly, which caused ivopts to overestimate its cost.

-Sandra

2017-10-19  Sandra Loosemore  

	gcc/
	* config/nios2/nios2.c (nios2_rtx_costs): Make costs better
	reflect reality.
	(nios2_address_cost): Define.
	(nios2_legitimize_address): Recognize (exp + constant) directly.
	(TARGET_ADDRESS_COST): Define.
diff --git a/gcc/config/nios2/nios2.c b/gcc/config/nios2/nios2.c
index d739d7f..dfd00dd 100644
--- a/gcc/config/nios2/nios2.c
+++ b/gcc/config/nios2/nios2.c
@@ -63,6 +63,7 @@ static const char *nios2_unspec_reloc_name (int);
 static void nios2_register_builtin_fndecl (unsigned, tree);
 static rtx nios2_ldst_parallel (bool, bool, bool, rtx, int,
 unsigned HOST_WIDE_INT, bool);
+static int nios2_address_cost (rtx, machine_mode, addr_space_t, bool);
 
 /* Threshold for data being put into the small data/bss area, instead
of the normal data area (references to the small data/bss area take
@@ -1502,29 +1503,25 @@ nios2_simple_const_p (const_rtx cst)
cost has been computed, and false if subexpressions should be
scanned.  In either case, *TOTAL contains the cost result.  */
 static bool
-nios2_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED,
-		 int outer_code ATTRIBUTE_UNUSED,
-		 int opno ATTRIBUTE_UNUSED,
-		 int *total, bool speed ATTRIBUTE_UNUSED)
+nios2_rtx_costs (rtx x, machine_mode mode,
+		 int outer_code,
+		 int opno,
+		 int *total, bool speed)
 {
   int code = GET_CODE (x);
 
   switch (code)
 {
   case CONST_INT:
-if (INTVAL (x) == 0)
+if (INTVAL (x) == 0 || nios2_simple_const_p (x))
   {
 *total = COSTS_N_INSNS (0);
 return true;
   }
-else if (nios2_simple_const_p (x))
-  {
-*total = COSTS_N_INSNS (2);
-return true;
-  }
 else
   {
-*total = COSTS_N_INSNS (4);
+	/* High + lo_sum.  */
+*total = COSTS_N_INSNS (1);
 return true;
   }
 
@@ -1532,10 +1529,30 @@ nios2_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED,
   case SYMBOL_REF:
   case CONST:
   case CONST_DOUBLE:
-{
-  *total = COSTS_N_INSNS (4);
-  return true;
-}
+	if (gprel_constant_p (x))
+  {
+*total = COSTS_N_INSNS (1);
+return true;
+  }
+	else
+	  {
+	/* High + lo_sum.  */
+	*total = COSTS_N_INSNS (1);
+	return true;
+	  }
+
+  case HIGH:
+	{
+	  /* This is essentially a constant.  */
+	  *total = COSTS_N_INSNS (0);
+	  return true;
+	}
+
+  case LO_SUM:
+	{
+	  *total = COSTS_N_INSNS (0);
+	  return true;
+	}
 
   case AND:
 	{
@@ -1549,29 +1566,83 @@ nios2_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED,
 	  return false;
 	}
 
+  /* For insns that have an execution latency (3 cycles), don't
+	 penalize by the full amount since we can often schedule
+	 to avoid it.  */
   case MULT:
 {
-  *total = COSTS_N_INSNS (1);
+	  if (!TARGET_HAS_MUL)
+	*total = COSTS_N_INSNS (5);  /* Guess?  */
+	  else if (speed)
+	*total = COSTS_N_INSNS (2);  /* Latency adjustment.  */
+	  else 
+	*total = COSTS_N_INSNS (1);
   return false;
 }
-  case SIGN_EXTEND:
+
+  case DIV:
 {
-  *total = COSTS_N_INSNS (3);
+	  if (!TARGET_HAS_DIV)
+	*total = COSTS_N_INSNS (5);  /* Guess?  */
+	  else if (speed)
+	*total = COSTS_N_INSNS (2);  /* Latency adjustment.  */
+	  else 
+	*total = COSTS_N_INSNS (1);
   return false;
 }
-  case ZERO_EXTEND:
+
+  case ASHIFT:
+  case ASHIFTRT:
+  case LSHIFTRT:
+  case ROTATE:
 {
-  *total = COSTS_N_INSNS (1);
+	  if (!speed)
+	*total = COSTS_N_INSNS (1);
+	  else 
+	*total = COSTS_N_INSNS (2);  /* Latency adjustment.  */
   return false;
 }
+	
+  case ZERO_EXTRACT:
+	if (TARGET_HAS_BMX)
+	  {
+	*total = COSTS_N_INSNS (1);
+	return true;
+	  }
+	return false;
 
-case ZERO_EXTRACT:
-  if (TARGET_HAS_BMX)
+  case SIGN_EXTEND:
+{
+	  if (MEM_P (XEXP (x, 0)))
+	*total = COSTS_N_INSNS (1);
+	  else
+	*total = COSTS_N_INSNS (3);
+	  return false;
+	}
+
+  case MEM:
 	{
-  *total = COSTS_N_INSNS (1);
-  return true;
+	  rtx addr = XEXP (x, 0);
+
+	  /* Account for cost of different addressing modes.  */
+	  *total = nios2_address_cost (addr, mode, ADDR_SPACE_GENERIC, 

[patch 3/5] defer splitting of symbolic addresses

2017-10-19 Thread Sandra Loosemore
This patch defers splitting of symbolic addresses into HIGH/LO_SUM pairs 
to the split1 pass.  Currently this is handled in the movsi expander, 
which runs too early to allow optimizations such as cse before 
splitting.  Deferring the splitting requires that these addresses be 
recognized as legitimate meanwhile.  There are new splitters added to 
all the insns that take MEM operands with general address forms, as well 
as movsi.


Also included in this patch, LO_SUM is recognized as a legitimate 
address form and there is code to support emitting assembly language for it.


-Sandra

2017-10-19  Sandra Loosemore  

	gcc/
	* config/nios2/nios2-protos.h (nios2_large_constant_p): Declare.
	(nios2_symbolic_memory_operand_p): Declare.
	(nios2_split_large_constant): Declare.
	(nios2_split_symbolic_memory_operand): Declare.
	* config/nios2/nios2.c (nios2_symbolic_constant_p): New.
	(nios2_plus_symbolic_constant_p): New.
	(nios2_valid_addr_expr_p): Recognize addresses involving 
	symbolic constants.
	(nios2_legitimate_address_p): Likewise, also LO_SUM.
	(nios2_symbolic_memory_operand_p): New.
	(nios2_large_constant_p): New.
	(nios2_split_large_constant): New.
	(nios2_split_plus_large_constant): New.
	(nios2_split_symbolic_memory_operand): New.
	(nios2_legitimize_address): Code refactoring.  Handle addresses
	involving symbolic constants.
	(nios2_emit_move_sequence): Likewise.
	(nios2_print_operand): Improve error output.
	(nios2_print_operand_address): Handle LO_SUM.
	(nios2_cdx_narrow_form_p): Likewise.
	* config/nios2/nios2.md (movqi_internal): Add splitter for memory
	operands involving symbolic constants.
	(movhi_internal, movsi_internal): Likewise.
	(zero_extendhisi2, zero_extendqi2): Likewise.
	(extendhisi2, extendqi2): Likewise.
diff --git a/gcc/config/nios2/nios2-protos.h b/gcc/config/nios2/nios2-protos.h
index 925d46a..33f0ec3 100644
--- a/gcc/config/nios2/nios2-protos.h
+++ b/gcc/config/nios2/nios2-protos.h
@@ -31,6 +31,11 @@ extern void nios2_function_profiler (FILE *, int);
 extern void nios2_init_expanders (void);
 
 #ifdef RTX_CODE
+extern bool nios2_large_constant_p (rtx);
+extern bool nios2_symbolic_memory_operand_p (rtx);
+
+extern rtx nios2_split_large_constant (rtx, rtx);
+extern rtx nios2_split_symbolic_memory_operand (rtx);
 extern bool nios2_emit_move_sequence (rtx *, machine_mode);
 extern void nios2_emit_expensive_div (rtx *, machine_mode);
 extern void nios2_adjust_call_address (rtx *, rtx);
diff --git a/gcc/config/nios2/nios2.c b/gcc/config/nios2/nios2.c
index 120adb5..d739d7f 100644
--- a/gcc/config/nios2/nios2.c
+++ b/gcc/config/nios2/nios2.c
@@ -55,6 +55,7 @@
 #include "target-def.h"
 
 /* Forward function declarations.  */
+static bool nios2_symbolic_constant_p (rtx);
 static bool prologue_saved_reg_p (unsigned);
 static void nios2_load_pic_register (void);
 static void nios2_register_custom_code (unsigned int, enum nios2_ccs_code, int);
@@ -1975,7 +1976,39 @@ nios2_validate_compare (machine_mode mode, rtx *cmp, rtx *op1, rtx *op2)
 }
 
 
-/* Addressing Modes.  */
+/* Addressing modes and constants.  */
+
+/* Return true if X is constant expression with a reference to an
+   "ordinary" symbol; not GOT-relative, not GP-relative, not TLS.  */
+static bool
+nios2_symbolic_constant_p (rtx x)
+{
+  rtx base, offset;
+
+  if (flag_pic)
+return false;
+  if (GET_CODE (x) == LABEL_REF)
+return true;
+  else if (CONSTANT_P (x))
+{
+  split_const (x, , );
+  return (SYMBOL_REF_P (base)
+		&& !SYMBOL_REF_TLS_MODEL (base)
+		&& !gprel_constant_p (base)
+		&& SMALL_INT (INTVAL (offset)));
+}
+  return false;
+}
+
+/* Return true if X is an expression of the form 
+   (PLUS reg symbolic_constant).  */
+static bool
+nios2_plus_symbolic_constant_p (rtx x)
+{
+  return (GET_CODE (x) == PLUS
+	  && REG_P (XEXP (x, 0))
+	  && nios2_symbolic_constant_p (XEXP (x, 1)));
+}
 
 /* Implement TARGET_LEGITIMATE_CONSTANT_P.  */
 static bool
@@ -2044,6 +2077,8 @@ nios2_valid_addr_expr_p (rtx base, rtx offset, bool strict_p)
 	  && nios2_regno_ok_for_base_p (REGNO (base), strict_p)
 	  && (offset == NULL_RTX
 	  || nios2_valid_addr_offset_p (offset)
+	  || (nios2_symbolic_constant_allowed () 
+		  && nios2_symbolic_constant_p (offset))
 	  || nios2_unspec_reloc_p (offset)));
 }
 
@@ -2066,6 +2101,11 @@ nios2_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED,
 
   /* Else, fall through.  */
 case LABEL_REF:
+  if (nios2_symbolic_constant_allowed () 
+	  && nios2_symbolic_constant_p (operand))
+	return true;
+
+  /* Else, fall through.  */
 case CONST_INT:
 case CONST_DOUBLE:
   return false;
@@ -2080,9 +2120,28 @@ nios2_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED,
 rtx op0 = XEXP (operand, 0);
 rtx op1 = XEXP (operand, 1);
 
-	return (nios2_valid_addr_expr_p (op0, op1, strict_p)
-		|| nios2_valid_addr_expr_p (op1, op0, strict_p));
+	if (nios2_valid_addr_expr_p (op0, op1, strict_p) 
+	|| 

[patch 2/5] add hook to track when splitting is complete

2017-10-19 Thread Sandra Loosemore

This patch adds a function to indicate whether the split1 pass has run
yet.  This is used in part 3 of the patch set to decide whether 32-bit
symbolic constant expressions are permitted, e.g. in
TARGET_LEGITIMATE_ADDRESS_P and the movsi expander.

Since there's currently no usable hook for querying the pass manager
where it is relative to another pass, I implemented this using a
target-specific pass that runs directly after split1 and does nothing
but set a flag.

-Sandra

2017-10-19  Sandra Loosemore  

	gcc/
	* config/nios2/nios2_protos.h (nios2_init_expanders): Declare.
	* config/nios2/nios2.c: Adjust includes.
	(split_completed): New.
	(nios2_symbolic_constant_allowed): New.
	(nios2_init_expanders): New.
	(nios2_postsplit_pass_data): New.
	(class pass_nios2_postsplit): New.
	(make_nios2_postsplit): New.
	(nios2_option_override): Register postsplit pass.
	* config/nios2/nios2.h (INIT_EXPANDERS): Define.
diff --git a/gcc/config/nios2/nios2-protos.h b/gcc/config/nios2/nios2-protos.h
index 4478334..925d46a 100644
--- a/gcc/config/nios2/nios2-protos.h
+++ b/gcc/config/nios2/nios2-protos.h
@@ -28,6 +28,7 @@ extern void nios2_expand_prologue (void);
 extern void nios2_expand_epilogue (bool);
 extern bool nios2_expand_return (void);
 extern void nios2_function_profiler (FILE *, int);
+extern void nios2_init_expanders (void);
 
 #ifdef RTX_CODE
 extern bool nios2_emit_move_sequence (rtx *, machine_mode);
diff --git a/gcc/config/nios2/nios2.c b/gcc/config/nios2/nios2.c
index 659250a..120adb5 100644
--- a/gcc/config/nios2/nios2.c
+++ b/gcc/config/nios2/nios2.c
@@ -47,6 +47,8 @@
 #include "toplev.h"
 #include "langhooks.h"
 #include "stor-layout.h"
+#include "tree-pass.h"
+#include "context.h"
 #include "builtins.h"
 
 /* This file should be included last.  */
@@ -101,6 +103,71 @@ static int custom_code_index[256];
 static bool custom_code_conflict = false;
 
 
+/* Support for target-specific postsplit pass.  */
+
+/* Symbolic constants are split into high/lo_sum pairs during the 
+   split1 pass.  After that, they are not considered legitimate addresses.
+   In order to keep track of whether or not we can generate or recognize
+   symbolic constants, we use a target-specific pass that runs 
+   immediately after split1 and does nothing but set a flag we can check.  */
+
+static bool split_completed = false;
+
+static bool
+nios2_symbolic_constant_allowed (void)
+{
+  /* The reload_completed check is for the benefit of
+ nios2_asm_output_mi_thunk and perhaps other places that try to
+ emulate a post-reload pass.  */
+  return !split_completed && !reload_completed;
+}
+
+/* Use INIT_EXPANDERS to ensure that split_completed is reset before
+   compiling every function.  */
+void
+nios2_init_expanders (void)
+{
+  split_completed = false;
+}
+
+namespace {
+
+static const pass_data nios2_postsplit_pass_data =
+{
+  RTL_PASS,  	// type
+  "nios2_postsplit",	// name
+  OPTGROUP_NONE, 	// optinfo_flags
+  TV_REST_OF_COMPILATION,// tv_id
+  0, 	// properties_required
+  0, 	// properties_provided
+  0, 	// properties_destroyed
+  0, 	// todo_flags_start
+  0		 	// todo_flags_finish
+};
+
+class pass_nios2_postsplit : public rtl_opt_pass
+{
+ public:
+  pass_nios2_postsplit (gcc::context *ctxt)
+: rtl_opt_pass (nios2_postsplit_pass_data, ctxt)
+  {}
+
+  virtual unsigned int execute (function *)
+{
+  split_completed = true;
+  return 0;
+}
+};
+
+} /* Anonymous namespace */
+
+rtl_opt_pass *
+make_nios2_postsplit (gcc::context *ctxt)
+{
+  return new pass_nios2_postsplit (ctxt);
+}
+
+
 /* Definition of builtin function types for nios2.  */
 
 #define N2_FTYPES\
@@ -1407,6 +1474,12 @@ nios2_option_override (void)
 
   nios2_custom_check_insns ();
 
+  /* Register target-specific passes.  */
+  opt_pass *postsplit = make_nios2_postsplit (g);
+  struct register_pass_info finish_split_info
+= { postsplit, "split1", 1, PASS_POS_INSERT_AFTER };
+  register_pass (_split_info);
+
   /* Save the initial options in case the user does function specific
  options.  */
   target_option_default_node = target_option_current_node
diff --git a/gcc/config/nios2/nios2.h b/gcc/config/nios2/nios2.h
index 10bebfb..d077dc5 100644
--- a/gcc/config/nios2/nios2.h
+++ b/gcc/config/nios2/nios2.h
@@ -503,6 +503,9 @@ do {\
 #define ASM_PREFERRED_EH_DATA_FORMAT(CODE, GLOBAL)		\
   (flag_pic ? DW_EH_PE_aligned : DW_EH_PE_sdata4)
 
+/* Target-specific initialization.  */
+#define INIT_EXPANDERS nios2_init_expanders ()
+
 /* Misc. parameters.  */
 
 #define STORE_FLAG_VALUE 1


[patch 1/5] switch nios2 backend to LRA

2017-10-19 Thread Sandra Loosemore

This patch switches the nios2 backend to use LRA.  It's conceptually
independent of the other changes in this patch set and in fact I had
wanted to push this change earlier in the summer.  But, enabling LRA
by itself causes a handful of test regressions due to a missing
movsi splitter -- which is provided in part 3 of the patch series.

-Sandra

2017-10-19  Sandra Loosemore  

	gcc/
	* config/nios2/nios2.c (TARGET_LRA_P): Don't override.
diff --git a/gcc/config/nios2/nios2.c b/gcc/config/nios2/nios2.c
index 2602605..659250a 100644
--- a/gcc/config/nios2/nios2.c
+++ b/gcc/config/nios2/nios2.c
@@ -5052,9 +5052,6 @@ nios2_adjust_reg_alloc_order (void)
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P nios2_legitimate_address_p
 
-#undef TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
-
 #undef TARGET_PREFERRED_RELOAD_CLASS
 #define TARGET_PREFERRED_RELOAD_CLASS nios2_preferred_reload_class
 


[patch 0/5] nios2 address mode improvements

2017-10-19 Thread Sandra Loosemore

This is the set of nios2 optimization patches that I've previously
mentioned in these threads:

https://gcc.gnu.org/ml/gcc/2017-10/msg00016.html
https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00957.html

To give an overview of what this is for

The nios2 backend currently generates quite bad code for memory
accesses with addresses involving symbolic constants.  Like a typical
RISC machine, nios2 requires splitting such 32-bit constants into
HIGH/LO_SUM pairs.  Currently this happens in expand, and address
expressions involving such constants are always converted to use a
register indirect form.

One part of the problem is that the backend currently doesn't
recognize that LO_SUM is a legitimate address form (it's register
indirect with a constant offset using the %lo relocation).  That's
fixed in these patches.

A harder problem is that doing the high/lo_sum splitting in expand
inhibits subsequent optimizations.  One such problem arises when you
have accesses to multiple fields in a static structure object.  Expand
sees this as many (symbol + offset) expressions involving the same
symbol with different constant offsets.  What we should be doing in
that case is CSE'ing the symbol address computation rather than
splitting every such expression individually.

This patch series attacks that problem by deferring splitting to the
split1 pass, which happens after cse and fwprop optimizations.
Deferring the splitting also requires that TARGET_LEGITIMATE_ADDRESS_P
accept these symbolic constant expressions until the splitting takes
place, and that code that might generate 32-bit constants in other
places (e.g., the movsi expander) must not do so after they are
supposed to have been split.

This patch series also includes general improvements to the cost model
to get better CSE results -- in particular, the nios2 backend has been 
completely missing an implementation for TARGET_ADDRESS_COST.  I also 
found that making TARGET_LEGITIMIZE_ADDRESS smarter resulted in better

address cost modeling by the ivopts pass.

All together, this resulted in about a 7% code size improvement on the
customer-provided test case I was using for tuning purposes.

Patches in this set are broken down as follows:

1: Switch to LRA.
2: Detect when splitting has been completed.
3: Add splitters and recognize the new address modes.
4: Cost model improvements.
5: Test cases.

Part 2 is the piece that relates to the discussion linked above.  As
implemented, it works fine, but it's maybe not the best design.  I'll
hold off on committing the entire set for at least a few days in case
somebody wants to suggest a better solution.

-Sandra



[PATCH, rs6000] Add Power 9 support for vec_first builtins

2017-10-19 Thread Carl Love
GCC maintainers:

The following patch adds support for the vec_first_match_index,
vec_first_match_or_eos_index, vec_first_mismatch_index,  and
vec_first_mismatch_or_eos_index builtins for ISA 3.0.  The patch has
been tested on:

powerpc64le-unknown-linux-gnu (Power 8 LE)
powerpc64le-unknown-linux-gnu (Power 9 LE)

without regressions.  

Please let me know if the following patch is acceptable.  Thanks.

   Carl Love


gcc/ChangeLog:

2017-10-19  Carl Love  

* config/rs6000/rs6000-c.c: Add support for builtins:
unsigned int vec_first_match_index (vector signed char,
vector signed char);
unsigned int vec_first_match_index (vector unsigned char,
vector unsigned char);
unsigned int vec_first_match_index (vector signed int,
vector signed int);
unsigned int vec_first_match_index (vector unsigned int,
vector unsigned int);
unsigned int vec_first_match_index (vector signed short,
vector signed short);
unsigned int vec_first_match_index (vector unsigned short,
vector unsigned short);
unsigned int vec_first_match_or_eos_index (vector signed char,
   vector signed char);
unsigned int vec_first_match_or_eos_index (vector unsigned char,
   vector unsigned char);
unsigned int vec_first_match_or_eos_index (vector signed int,
   vector signed int);
unsigned int vec_first_match_or_eos_index (vector unsigned int,
   vector unsigned int);
unsigned int vec_first_match_or_eos_index (vector signed short,
   vector signed short);
unsigned int vec_first_match_or_eos_index (vector unsigned short,
   vector unsigned short);
unsigned int vec_first_mismatch_index (vector signed char,
   vector signed char);
unsigned int vec_first_mismatch_index (vector unsigned char,
   vector unsigned char);
unsigned int vec_first_mismatch_index (vector signed int,
   vector signed int);
unsigned int vec_first_mismatch_index (vector unsigned int,
   vector unsigned int);
unsigned int vec_first_mismatch_index (vector signed short,
   vector signed short);
unsigned int vec_first_mismatch_index (vector unsigned short,
   vector unsigned short);
unsigned int vec_first_mismatch_or_eos_index (vector signed char,
  vector signed char);
unsigned int vec_first_mismatch_or_eos_index (vector unsigned char,
  vector unsigned char);
unsigned int vec_first_mismatch_or_eos_index (vector signed int,
  vector signed int);
unsigned int vec_first_mismatch_or_eos_index (vector unsigned int,
  vector unsigned int);
unsigned int vec_first_mismatch_or_eos_index (vector signed short,
  vector signed short);
unsigned int vec_first_mismatch_or_eos_index (vector unsigned short,
  vector unsigned short);
* config/rs6000/rs6000-builtin.def (VFIRSTMATCHINDEX,
VFIRSTMATCHOREOSINDEX, VFIRSTMISMATCHINDEX, VFIRSTMISMATCHOREOSINDEX):
Add BU_P9V_AV_2 expansions for the builtins.
* config/rs6000/altivec.h (vec_first_match_index,
vec_first_mismatch_index, vec_first_match_or_eos_index,
vec_first_mismatch_or_eos_index): Add #defines for the builtins.
* config/rs6000/rs6000-protos.h (bytes_in_mode): Add extern
declaration.
* config/rs6000/rs6000.c (bytes_in_mode): Add function to return mode
size in bytes.
* config/rs6000/vsx.md: (first_match_index_,
first_match_or_eos_index_, first_mismatch_index_,
first_mismatch_or_eos_index_):  Add define expand to implement
the builtins.
(vctzlsbb_): Add mode field to define_insn for vctzlsbb.
* doc/extend.texi: Update the built-in documenation file for the new
built-in functions.

gcc/testsuite/ChangeLog:

2017-10-19  

[Ada] Spurious elaboration error in SPARK

2017-10-19 Thread Pierre-Marie de Rodat
This patch corrects the diagnostics related to elaboration requirements of
instantiations imposed on a unit. In addition, the patch updates two key
routines to handle compilation unit instances.


-- Source --


--  gnat.adc

pragma SPARK_Mode (On);

--  gen_pack.ads

generic
package Gen_Pack is
   generic
   package Nested is
  procedure Proc;
   end Nested;
end Gen_Pack;

--  gen_pack.adb

with Ada.Text_IO; use Ada.Text_IO;

package body Gen_Pack is
   package body Nested is
  procedure Proc is
  begin
 Put_Line ("Proc");
  end Proc;
   end Nested;
end Gen_Pack;

--  gen_proc.ads

generic
procedure Gen_Proc;

--  gen_proc.adb

with Ada.Text_IO; use Ada.Text_IO;

procedure Gen_Proc is
begin
   Put_Line ("Gen_Proc");
end Gen_Proc;

--  inst_proc.ads

with Gen_Proc;
pragma Elaborate (Gen_Proc);

procedure Inst_Proc is new Gen_Proc;

--  inst_proc2.ads

with Gen_Proc;

procedure Inst_Proc2 is new Gen_Proc;

--  inst_pack.ads

with Gen_Pack;
pragma Elaborate_All (Gen_Pack);

package Inst_Pack is new Gen_Pack;

--  inst_pack2.ads

with Gen_Pack;

package Inst_Pack2 is new Gen_Pack;

--  inst_nested.ads

with Inst_Pack;
pragma Elaborate_All (Inst_Pack);

package Inst_Nested is new Inst_Pack.Nested;

--  inst_nested2.ads

with Inst_Pack;

package Inst_Nested2 is new Inst_Pack.Nested;

--  call_proc.ads

package Call_Proc is
   procedure Force_Body;
end Call_Proc;

--  call_proc.adb

with Inst_Proc;
pragma Elaborate_All (Inst_Proc);

package body Call_Proc is
   procedure Force_Body is begin null; end Force_Body;
begin
   Inst_Proc;
end Call_Proc;

--  call_proc2.ads

package Call_Proc2 is
   procedure Force_Body;
end Call_Proc2;

--  call_proc2.adb

with Inst_Proc2;

package body Call_Proc2 is
   procedure Force_Body is begin null; end Force_Body;
begin
   Inst_Proc2;
end Call_Proc2;

--  call_nested_proc.ads

package Call_Nested_Proc is
   procedure Force_Body;
end Call_Nested_Proc;

--  call_nested_proc.adb

with Inst_Nested;
pragma Elaborate_All (Inst_Nested);

package body Call_Nested_Proc is
   procedure Force_Body is begin null; end Force_Body;
begin
   Inst_Nested.Proc;
end Call_Nested_Proc;

--  call_nested_proc2.ads

package Call_Nested_Proc2 is
   procedure Force_Body;
end Call_Nested_Proc2;

--  call_nested_proc2.adb

with Inst_Nested2;

package body Call_Nested_Proc2 is
   procedure Force_Body is begin null; end Force_Body;
begin
   Inst_Nested2.Proc;
end Call_Nested_Proc2;


-- Compilation and output --


$ gcc -c -gnatd.v inst_proc.ads
$ gcc -c -gnatd.v inst_pack.ads
$ gcc -c -gnatd.v inst_nested.ads
$ gcc -c -gnatd.v call_proc.adb
$ gcc -c -gnatd.v call_nested_proc.adb
$ gcc -c -gnatd.v inst_proc2.ads
$ gcc -c -gnatd.v inst_pack2.ads
$ gcc -c -gnatd.v inst_nested2.ads
$ gcc -c -gnatd.v call_proc2.adb
$ gcc -c -gnatd.v call_nested_proc2.adb
inst_proc2.ads:3:01: instantiation of "Gen_Proc" during elaboration in SPARK
inst_proc2.ads:3:01: unit "Inst_Proc2" requires pragma "Elaborate" for
  "Gen_Proc"
inst_proc2.ads:3:01:   body of unit "Inst_Proc2" elaborated
inst_proc2.ads:3:01:   procedure "Gen_Proc" instantiated as "Inst_Proc2" at
  line 3
inst_pack2.ads:3:01: instantiation of "Gen_Pack" during elaboration in SPARK
inst_pack2.ads:3:01: unit "Inst_Pack2" requires pragma "Elaborate_All" for
  "Gen_Pack"
inst_pack2.ads:3:01:   spec of unit "Inst_Pack2" elaborated
inst_pack2.ads:3:01:   package "Gen_Pack" instantiated as "Inst_Pack2" at line
  3
inst_nested2.ads:3:01: instantiation of "Nested" during elaboration in SPARK
inst_nested2.ads:3:01: unit "Inst_Nested2" requires pragma "Elaborate_All" for
  "Inst_Pack"
inst_nested2.ads:3:01:   spec of unit "Inst_Nested2" elaborated
inst_nested2.ads:3:01:   package "Nested" instantiated as "Inst_Nested2" at
  line 3
call_proc2.adb:6:04: call to "Inst_Proc2" during elaboration in SPARK
call_proc2.adb:6:04: unit "Call_Proc2" requires pragma "Elaborate_All" for
  "Inst_Proc2"
call_proc2.adb:6:04:   body of unit "Call_Proc2" elaborated
call_proc2.adb:6:04:   procedure "Inst_Proc2" called at line 6
call_nested_proc2.adb:6:16: call to "Proc" during elaboration in SPARK
call_nested_proc2.adb:6:16: unit "Call_Nested_Proc2" requires pragma
  "Elaborate_All" for "Inst_Nested2"
call_nested_proc2.adb:6:16:   body of unit "Call_Nested_Proc2" elaborated
call_nested_proc2.adb:6:16:   procedure "Proc" called at line 6

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-10-19  Hristian Kirtchev  

* sem_elab.adb (Compilation_Unit): Handle the case of a subprogram
instantiation that acts as a compilation unit.
(Find_Code_Unit): Reimplemented.
(Find_Top_Unit): Reimplemented.
(Find_Unit_Entity): New routine.
(Process_Instantiation_SPARK): Correct the elaboration requirement a
package instantiation imposes on a unit.

Index: sem_elab.adb

Re: [PATCH, version 2], Add support for _Float and _FloatX sqrt, fma, fmin, fmax built-in functions

2017-10-19 Thread Joseph Myers
On Fri, 20 Oct 2017, Michael Meissner wrote:

> On Thu, Oct 19, 2017 at 10:15:44PM +, Joseph Myers wrote:
> > On Thu, 19 Oct 2017, Michael Meissner wrote:
> > 
> > > 1)I switched to use DEF_EXT_LIB_BUILTIN to declare the _Float 
> > > and
> > >   _FloatX functions.  This allows treating __builtin_sqrtf128 the same
> > >   as sqrtf128.
> > 
> > It's not correct to do that unconditionally for all the existing 
> > DEF_GCC_FLOATN_NX_BUILTINS functions.  There should not be a public 
> > huge_valf128 function any more than a public huge_val function, just 
> > __builtin_huge_valf128.
> > 
> > Rather, you should add a new DEF_EXT_LIB_FLOATN_NX_BUILTINS.  It should be 
> > used by the new functions, and by the existing copysign / fabs / nan 
> > functions.  It should not be used by the existing huge_val / inf / nans 
> > functions.
> 
> Ok.  Would an initial patch to go back to using DEF_GCC_BUILTIN (which means
> people would need to use __builtin_sqrtf128 and __builtin_fmaf128) be
> acceptable, while I delve into the appropriate incantations to get it to work
> as desired be ok?

My argument was that *if* you provide __FP_FAST_FMAF128 (etc.), that 
should relate to an fmaf128 (etc.) built-in function.  If you don't define 
__FP_FAST_*, doing everything with DEF_GCC_BUILTIN would be reasonable as 
a starting point (with a view to moving selected functions to 
DEF_EXT_LIB_BUILTIN and defining __FP_FAST_* in a subsequent patch, and 
subject to calls still falling back to call the right external functions, 
e.g. sqrtf128 to deal with the errno-setting case of __builtin_sqrtf128).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [patch, c++] Add a warning flag for the enum bit-field declaration warning in bug #61414.

2017-10-19 Thread Martin Sebor

On 10/19/2017 12:38 PM, Eric Gallager wrote:

On 10/16/17, Martin Sebor  wrote:

On 10/16/2017 06:37 AM, Sam van Kampen via gcc-patches wrote:

..I just realised that the clang flag is -Wbitfield-enum-conversion, not
-Wenum-bitfield-conversion. Please apply the patch below instead, which
has replaced the two words to remove the inconsistency.

2017-10-16  Sam van Kampen  

* c-family/c.opt: Add a warning flag for struct bit-fields
being too small to hold enumerated types.
* cp/class.c: Likewise.
* doc/invoke.texi: Add documentation for said warning flag.

Index: gcc/c-family/c.opt
===
--- gcc/c-family/c.opt  (revision 253784)
+++ gcc/c-family/c.opt  (working copy)
@@ -346,6 +346,10 @@ Wframe-address
 C ObjC C++ ObjC++ Var(warn_frame_address) Warning LangEnabledBy(C ObjC
C++ ObjC++,Wall)
 Warn when __builtin_frame_address or __builtin_return_address is used
unsafely.

+Wbitfield-enum-conversion
+C++ Var(warn_bitfield_enum_conversion) Init(1) Warning
+Warn about struct bit-fields being too small to hold enumerated types.


Warning by default is usually reserved for problems that are
most likely indicative of bugs.  Does this rise to that level?
(It seems that it should be in the same class as -Wconversion).


The warning is already enabled by default; this patch just adds the
new flag controlling it. I don't think it's worth changing it away
from warning by default when it already warns by default now.


The C++ warning in its present form is considered unhelpful
by users (as evident from the bug report).  That it warns by
default is part of the problem.  GCC (in C mode), Clang, and
Intel ICC warn only for enums whose enumerators cannot fit in
the bitfield.  (In Clang the warning is in addition disabled
by default and has to be explicitly enabled.)

The suggestion is to add a two-level warning and adjust G++
to warn the same way at level 1 as other compilers do (and
to include this level in -Wall, or if it's considered
appropriate and necessary, enable it by default), and warn
as it does now only at level 2 (with that level having to
be explicitly requested).  I would further suggest to do
the same in GCC (in C mode), if only for consistency.

Martin



Re: [PATCH, version 2], Add support for _Float and _FloatX sqrt, fma, fmin, fmax built-in functions

2017-10-19 Thread Michael Meissner
On Thu, Oct 19, 2017 at 10:15:44PM +, Joseph Myers wrote:
> On Thu, 19 Oct 2017, Michael Meissner wrote:
> 
> > 1)  I switched to use DEF_EXT_LIB_BUILTIN to declare the _Float 
> > and
> > _FloatX functions.  This allows treating __builtin_sqrtf128 the same
> > as sqrtf128.
> 
> It's not correct to do that unconditionally for all the existing 
> DEF_GCC_FLOATN_NX_BUILTINS functions.  There should not be a public 
> huge_valf128 function any more than a public huge_val function, just 
> __builtin_huge_valf128.
> 
> Rather, you should add a new DEF_EXT_LIB_FLOATN_NX_BUILTINS.  It should be 
> used by the new functions, and by the existing copysign / fabs / nan 
> functions.  It should not be used by the existing huge_val / inf / nans 
> functions.

Ok.  Would an initial patch to go back to using DEF_GCC_BUILTIN (which means
people would need to use __builtin_sqrtf128 and __builtin_fmaf128) be
acceptable, while I delve into the appropriate incantations to get it to work
as desired be ok?

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices

2017-10-19 Thread Martin Sebor

On 10/19/2017 02:34 AM, Richard Biener wrote:

On Thu, Oct 19, 2017 at 1:19 AM, Martin Sebor  wrote:

On 10/18/2017 04:48 AM, Richard Biener wrote:


On Wed, Oct 18, 2017 at 5:34 AM, Martin Sebor  wrote:


While testing my latest -Wrestrict changes I noticed a number of
opportunities to improve the -Warray-bounds warning.  Attached
is a patch that implements a solution for the following subset
of these:

PR tree-optimization/82596 - missing -Warray-bounds on an out-of
  bounds index into string literal
PR tree-optimization/82588 - missing -Warray-bounds on an excessively
  large index
PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds
  inner indices



I meant to use size_type_node (size_t), not sizetype.  But
I just checked that ptrdiff_type_node is initialized in
build_common_tree_nodes and thus always available.


I see.  Using ptrdiff_type_node is preferable for the targets
where ptrdiff_t has a greater precision than size_t (e.g., VMS).
It makes sense now.  I should remember to change all the other
places where I introduced ssizetype to use ptrdiff_type_node.




As an aside, at some point I would like to get away from a type
based limit in all these warnings and instead use one that can
be controlled by an option so that a user can impose a lower limit
on the maximum size of an object and have all size-related warnings
(and perhaps even optimizations) enforce it and benefit from it.


You could add a --param that is initialized from ptrdiff_type_node.


Yes, that's an option to consider.  Thanks.




+  tree arg = TREE_OPERAND (ref, 0);
+  tree_code code = TREE_CODE (arg);
+  if (code == COMPONENT_REF)
+   {
+ HOST_WIDE_INT off;
+ if (tree base = get_addr_base_and_unit_offset (ref, ))
+   up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, up_bound_p1,
+  TYPE_SIZE_UNIT (TREE_TYPE (base)));
+ else
+   return;

so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will be
false).
simply not subtracting anyhing instead of returning would be
conservatively
correct, no?  Likewise subtracting the offset of the array for all
"previous"
variably indexed components with assuming the lowest value for the index.
But as above I think compensating for the offset of the array within the
object
is academic ... ;)



I was going to say yes (it gives up) but on second thought I don't
think it does.  Only the major index can be unbounded and the code
does consider the size of the sub-array when checking the major
index.  So, IIUC, I think this works correctly as is (*).  What
doesn't work is VLAs but those are a separate problem.  Let me
know if I misunderstood your question.


get_addr_base_and_unit_offset will return NULL if there's any variable
component in 'ref'.  So as written it seems to be dead code (you
want to pass 'arg'?)


Sorry, I'm not sure I understand what you mean.  What do you think
is dead code?  The call to get_addr_base_and_unit_offset() is also
made for an array of unspecified bound (up_bound is null) and for
an array at the end of a struct.  For those the function returns
non-null, and for the others (arrays of runtime bound) it returns
null.  (I passed arg instead of ref but I see no difference in
my tests.)



I was asking you to remove the 'else return' because w/o subtracting
the upper bound is just more conservative.


Sure, that sounds good.

Attached is an updated patch.

Thanks
Martin
PR tree-optimization/82588 - missing -Warray-bounds on a excessively large index
PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds inner indic

gcc/ChangeLog:
	PR tree-optimization/82588
	PR tree-optimization/82583
	* tree-vrp.c (check_array_ref): Handle flexible array members,
	string literals, and inner indices.
	(search_for_addr_array): Add detail to diagnostics.

gcc/testsuite/ChangeLog:
	PR tree-optimization/82588
	PR tree-optimization/82583	
	* c-c++-common/Warray-bounds.c: New test.
	* gcc.dg/Warray-bounds-11.c: Adjust.
	* gcc.dg/Warray-bounds-22.c: New test.

diff --git a/gcc/testsuite/c-c++-common/Warray-bounds.c b/gcc/testsuite/c-c++-common/Warray-bounds.c
new file mode 100644
index 000..2207999
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Warray-bounds.c
@@ -0,0 +1,240 @@
+/* PR tree-optimization/82588 - missing -Warray-bounds on an excessively
+   large index
+   { dg-do compile }
+   { dg-options "-O2 -Warray-bounds -ftrack-macro-expansion=0" }  */
+
+#define SIZE_MAX  __SIZE_MAX__
+#define SSIZE_MAX __PTRDIFF_MAX__
+#define SSIZE_MIN (-SSIZE_MAX - 1)
+
+typedef __PTRDIFF_TYPE__ ssize_t;
+typedef __SIZE_TYPE__size_t;
+
+extern ssize_t signed_value (void)
+{
+  extern volatile ssize_t signed_value_source;
+  return signed_value_source;
+}
+
+extern size_t unsigned_value (void)
+{
+  extern volatile size_t unsigned_value_source;
+  return unsigned_value_source;
+}
+
+ssize_t signed_range (ssize_t min, ssize_t max)
+{

Re: [PATCH, version 2], Add support for _Float and _FloatX sqrt, fma, fmin, fmax built-in functions

2017-10-19 Thread Joseph Myers
On Thu, 19 Oct 2017, Michael Meissner wrote:

> 1)I switched to use DEF_EXT_LIB_BUILTIN to declare the _Float 
> and
>   _FloatX functions.  This allows treating __builtin_sqrtf128 the same
>   as sqrtf128.

It's not correct to do that unconditionally for all the existing 
DEF_GCC_FLOATN_NX_BUILTINS functions.  There should not be a public 
huge_valf128 function any more than a public huge_val function, just 
__builtin_huge_valf128.

Rather, you should add a new DEF_EXT_LIB_FLOATN_NX_BUILTINS.  It should be 
used by the new functions, and by the existing copysign / fabs / nan 
functions.  It should not be used by the existing huge_val / inf / nans 
functions.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH, version 2], Add support for _Float and _FloatX sqrt, fma, fmin, fmax built-in functions

2017-10-19 Thread Michael Meissner
On Wed, Sep 13, 2017 at 10:49:43PM +, Joseph Myers wrote:
> On Wed, 13 Sep 2017, Michael Meissner wrote:
> 
> > This patch adds support on PowerPC ISA 3.0 for the built-in function
> > __builtin_sqrtf128 generating the XSSQRTQP hardware square root instruction 
> > and
> > the built-in function __builtin_fmaf128 generating XSMADDQP, XSMSUBQP,
> > XSNMADDQP, and XSNMSUBQP fused multiply-add instructions.
> 
> Is there a reason for these to be architecture-specific rather than 
> generic everywhere _Float128 is supported?  (With the fmaf128 / sqrtf128 
> names available as well as the __builtin_* variants of those.)

The basic reason was I hadn't yet discovered all of the places that need to be
modified to add generic _Float128 math functions.

> Full support for _FloatN/_FloatNx variants of all the existing built-in 
> functions might be complicated, and run into potential issues with startup 
> cost of creating large numbers of extra built-in functions (it's 
> desirable, but possibly hard, which is why I excluded it from the initial 
> _FloatN / _FloatNx support patches).  But adding just these two functions 
> to builtins.def and making them fold / expand appropriately ought to be 
> much simpler.  (I realise sqrt goes through internal-fn.def and 
> DEF_INTERNAL_FLT_FN expects a particular set of functions for standard 
> types, so maybe some duplication would be involved to get the built-in 
> function expanded appropriately, i.e. using an insn pattern or a call to 
> an external sqrtf128 function according to whether such an insn pattern is 
> available.  fma ought not to involve much more than adding an extra case 
> where CASE_FLT_FN (BUILT_IN_FMA) is used.)

I have now gone through and added the proper support for _Float128 sqrt, fma,
fmin, and fmax.  I have added the framework so that other functions as needed
can be added over time.

> > While I was at it, I changed the documentation so that it no longer 
> > documents
> > the 'q' built-in functions (to mirror libquadmath) but instead just 
> > documented
> > the 'f128' functions that matches glibc 2.26 and the technical report that
> > added the _FloatF128 date.
> 
> Those *f128 built-in functions (inf / huge_val / nan / nans / fabs / 
> copysign) are not target-specific; they exist for all _FloatN / _FloatNx 
> types for all targets with such types.  So it doesn't seem appropriate to 
> document them in a target-specific section of the manual, beyond a brief 
> cross-reference to the documentation of the functions as 
> target-independent.

Highlights of the patch:

1)  I switched to use DEF_EXT_LIB_BUILTIN to declare the _Float and
_FloatX functions.  This allows treating __builtin_sqrtf128 the same
as sqrtf128.

2)  Add support in gencfn-macros.c to build the appropriate CASE_CFN_* and
operators in cfn-operators.pd that can be used as needed.

3)  I did not enable _Float128 support for all math built-ins, but just the
built-in functions I am currently need to support in (just like
copysign and fabs were previously done).  I expect over time there
might be some more needed to be added to the list.  I added fmin and
fmax to the machine independent built-ins, but I will submit a patch
later to enable them in the PowerPC.

4)  I went through and added support for copysign, fma, fmin, and fmax
functions in the same places the current float/double/long double
functions are handled.

5)  I removed the PowerPC sqrtf128 and fmaf128 built-ins, since these are
now handled by machine independent code.  In doing so, I deleted two
tests that did not allow the built-ins where the software emulator is
used.  The GLIBC 2.26 as shipped with the Advance Toolchain 11.0-1
contain these functions.

6)  In the previous version of the patch, I put in a special warning for
fmaf128 (that it might not be present if the h/w instructions weren't
available).  When I wrote that patch, the initial release of Advance
Toolchain 11.0-0 did not include a fmaf128 function.  It now includes
the function, so I don't need the warning.

I have checked this patch on the following systems with bootstrap and make
check for gcc/g++/gfortran/lto:

1)  Little endian power8 system using --with-cpu=power8
2)  Big endian power7 system (both 64/32-bit) using --with-cpu=power7
3)  Little endian power9 prototype system using --with-cpu=power9
4)  A Fedora 21 x86_64 system

Can I check these patches into the trunk?

[gcc]
2017-10-19  Michael Meissner  

* builtins.c (CASE_MATHRN_FLOATN): New helper macro to support
math functions that have _Float and _FloatX variants.
(mathfn_built_in_2): Add support for copysign, fma, fmax, fmin,
and sqrt having _Float and _FloatX variants.
(DEF_INTERNAL_FLT_FLOATN_FN): New helper macro to support for math
   

Re: [RFA] Implement __VA_OPT__

2017-10-19 Thread Tom Tromey
> "Tom" == Tom Tromey  writes:

Tom> [ __VA_OPT__ ]
Tom> Here's v3.

Tom> Ping.

Ping #2.

Tom


Re: [patch] avoid printing leading 0 in widest_int hex dumps

2017-10-19 Thread Richard Sandiford
Richard Sandiford  writes:
> Aldy Hernandez  writes:
>> On Tue, Oct 17, 2017 at 6:05 PM, Richard Sandiford
>>  wrote:
>>> Andrew MacLeod  writes:
 On 10/17/2017 08:18 AM, Richard Sandiford wrote:
> Aldy Hernandez  writes:
>> Hi folks!
>>
>> Calling print_hex() on a widest_int with the most significant bit turned
>> on can lead to a leading zero being printed (0x0). This produces
>> confusing dumps to say the least, especially when you incorrectly assume
>> an integer is NOT signed :).
> That's the intended behaviour though.  wide_int-based types only use as
> many HWIs as they need to store their current value, with any other bits
> in the value being a sign extension of the top bit.  So if the most
> significant HWI in a widest_int is zero, that HWI is there to say that
> the previous HWI should be zero- rather than sign-extended.
>
> So:
>
> 0x0  -> (1 << 32) - 1 to infinite precision
> (i.e. a positive value)
> 0x   -> -1
>
> Thanks,
> Richard

 I for one find this very confusing.  If I have a 128 bit value, I don't
 expect to see a 132 bits.  And there are enough 0's its not obvious when
 I look.
>>>
>>> But Aldy was talking about widest_int, which is wider than 128 bits.
>>> It's an approximation of infinite precision.
>>
>> IMO, we should document this leading zero in print_hex, as it's not
>> inherently obvious.
>>
>> But yes, I was talking about widest_int.  I should explain what I am
>> trying to accomplish, since perhaps there is a better way.
>>
>> I am printing a a wide_int (bounds[i] below), but I really don't want
>> to print the sign extension nonsense, since it's a detail of the
>> underlying representation.
>
> Ah!  OK.  Yeah, I agree it doesn't make sense to print sign-extension
> bits above the precision.  I think it'd work if print_hex used
> extract_uhwi insteead of elt, which would also remove the need
> to handle "negative" numbers specially.  I'll try that tomorrow.

How about this?  Not tested much beyond the selftests themselves.

Thanks,
Richard


gcc/
* wide-int-print.cc (print_hex): Loop based on extract_uhwi.
Don't print any bits outside the precision of the value.
* wide-int.cc (test_printing): Add some new tests.

Index: gcc/wide-int-print.cc
===
--- gcc/wide-int-print.cc   2017-07-02 10:05:20.997439436 +0100
+++ gcc/wide-int-print.cc   2017-10-19 21:20:05.138500726 +0100
@@ -103,30 +103,28 @@ print_decu (const wide_int_ref , FILE
 }
 
 void
-print_hex (const wide_int_ref , char *buf)
+print_hex (const wide_int_ref , char *buf)
 {
-  int i = wi.get_len ();
-
-  if (wi == 0)
+  if (val == 0)
 buf += sprintf (buf, "0x0");
   else
 {
-  if (wi::neg_p (wi))
+  buf += sprintf (buf, "0x");
+  int start = ROUND_DOWN (val.get_precision (), HOST_BITS_PER_WIDE_INT);
+  int width = val.get_precision () - start;
+  bool first_p = true;
+  for (int i = start; i >= 0; i -= HOST_BITS_PER_WIDE_INT)
{
- int j;
- /* If the number is negative, we may need to pad value with
-0xFFF...  because the leading elements may be missing and
-we do not print a '-' with hex.  */
- buf += sprintf (buf, "0x");
- for (j = BLOCKS_NEEDED (wi.get_precision ()); j > i; j--)
-   buf += sprintf (buf, HOST_WIDE_INT_PRINT_PADDED_HEX, 
HOST_WIDE_INT_M1);
-
+ unsigned HOST_WIDE_INT uhwi = wi::extract_uhwi (val, i, width);
+ if (!first_p)
+   buf += sprintf (buf, HOST_WIDE_INT_PRINT_PADDED_HEX, uhwi);
+ else if (uhwi != 0)
+   {
+ buf += sprintf (buf, HOST_WIDE_INT_PRINT_HEX_PURE, uhwi);
+ first_p = false;
+   }
+ width = HOST_BITS_PER_WIDE_INT;
}
-  else
-   buf += sprintf (buf, "0x" HOST_WIDE_INT_PRINT_HEX_PURE, wi.elt (--i));
-
-  while (--i >= 0)
-   buf += sprintf (buf, HOST_WIDE_INT_PRINT_PADDED_HEX, wi.elt (i));
 }
 }
 
Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2017-10-19 21:19:47.100454414 +0100
+++ gcc/wide-int.cc 2017-10-19 21:20:05.138500726 +0100
@@ -2253,6 +2253,17 @@ test_printing ()
   VALUE_TYPE a = from_int (42);
   assert_deceq ("42", a, SIGNED);
   assert_hexeq ("0x2a", a);
+  assert_hexeq ("0x1f", wi::shwi (-1, 69));
+  assert_hexeq ("0x", wi::mask (64, false, 69));
+  assert_hexeq ("0x", wi::mask  (64, false));
+  if (WIDE_INT_MAX_PRECISION > 128)
+{
+  assert_hexeq ("0x2fffe",
+   wi::lshift (1, 129) + wi::lshift (1, 64) - 2);
+  assert_hexeq 

[PATCH] Slightly optimize noreturn functions (PR target/82158)

2017-10-19 Thread Jakub Jelinek
Hi!

The following patch, when optimizing, replaces GIMPLE_RETURN in
noreturn functions right after warning about them with __builtin_unreachable
().  The advantage of that is that we don't emit unnecessary epilogues for
them and perhaps optimize away some other useless computations if the only
side-effect of those is feeding the return value.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-10-19  Jakub Jelinek  

PR target/82158
* tree-cfg.c (pass_warn_function_return::execute): In noreturn
functions when optimizing replace GIMPLE_RETURN stmts with
calls to __builtin_unreachable ().

* gcc.dg/tree-ssa/noreturn-1.c: New test.

--- gcc/tree-cfg.c.jj   2017-10-10 22:04:09.0 +0200
+++ gcc/tree-cfg.c  2017-10-19 18:41:04.843653944 +0200
@@ -9084,13 +9084,29 @@ pass_warn_function_return::execute (func
   && EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (fun)->preds) > 0)
 {
   location = UNKNOWN_LOCATION;
-  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (fun)->preds)
+  for (ei = ei_start (EXIT_BLOCK_PTR_FOR_FN (fun)->preds);
+  (e = ei_safe_edge (ei)); )
{
  last = last_stmt (e->src);
  if ((gimple_code (last) == GIMPLE_RETURN
   || gimple_call_builtin_p (last, BUILT_IN_RETURN))
- && (location = gimple_location (last)) != UNKNOWN_LOCATION)
+ && location == UNKNOWN_LOCATION
+ && (location = gimple_location (last)) != UNKNOWN_LOCATION
+ && !optimize)
break;
+ /* When optimizing, replace return stmts in noreturn functions
+with __builtin_unreachable () call.  */
+ if (optimize && gimple_code (last) == GIMPLE_RETURN)
+   {
+ tree fndecl = builtin_decl_implicit (BUILT_IN_UNREACHABLE);
+ gimple *new_stmt = gimple_build_call (fndecl, 0);
+ gimple_set_location (new_stmt, gimple_location (last));
+ gimple_stmt_iterator gsi = gsi_for_stmt (last);
+ gsi_replace (, new_stmt, true);
+ remove_edge (e);
+   }
+ else
+   ei_next ();
}
   if (location == UNKNOWN_LOCATION)
location = cfun->function_end_locus;
--- gcc/testsuite/gcc.dg/tree-ssa/noreturn-1.c.jj   2017-10-19 
18:50:53.861660409 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/noreturn-1.c  2017-10-19 18:55:32.270354804 
+0200
@@ -0,0 +1,42 @@
+/* { dg-do compile } *
+/* { dg-options "-O2 -fdump-tree-ssa -std=gnu11" } */
+/* { dg-final { scan-tree-dump-times "__builtin_unreachable" 4 "ssa" } } */
+
+void bar1 (void);
+void bar2 (void);
+void bar3 (void);
+void bar4 (void);
+
+_Noreturn void
+foo1 (int *p, int y)
+{
+  bar1 ();
+  *p = y;
+  return;  /* { dg-warning "function declared 'noreturn' has a 'return' 
statement" } */
+}  /* { dg-warning "'noreturn' function does return" "" { target 
*-*-* } .-1 } */
+
+_Noreturn void
+foo2 (int *p, int y)
+{
+  bar2 ();
+  *p = y;
+}  /* { dg-warning "'noreturn' function does return" } */
+
+_Noreturn void
+foo3 (int *p, int y)
+{
+  if (y > 10)
+return;/* { dg-warning "function declared 'noreturn' has a 'return' 
statement" } */
+  bar3 ();
+  *p = y;
+  return;  /* { dg-warning "function declared 'noreturn' has a 'return' 
statement" } */
+}  /* { dg-warning "'noreturn' function does return" } */
+
+_Noreturn void
+foo4 (int *p, int y)
+{
+  if (y > 10)
+return;/* { dg-warning "function declared 'noreturn' has a 'return' 
statement" } */
+  bar4 ();
+  *p = y;
+}  /* { dg-warning "'noreturn' function does return" } */

Jakub


[PATCH] Fix bootstrap with libsanitizer (PR sanitizer/82595)

2017-10-19 Thread Jakub Jelinek
Hi!

The PR claims a bootstrap failure because we link lsan_preinit.cc that
contains .preinit_array section item into the liblsan shared library.

Of course, this object isn't meant to be linked into the library, but
rather handled like libasan_preinit.o - linked directly into executables.
The point of these is to initialize the runtimes as early as possible.

I've discovered that we install libtsan_preinit.o, but don't actually
link it into executables, the patch fixes that too.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

For the release branches, I'd just go for removing lsan_preinit.cc from
lsan_files and nothing else.

2017-10-19  Jakub Jelinek  

PR sanitizer/82595
* config/gnu-user.h (LIBTSAN_EARLY_SPEC): Add libtsan_preinit.o
for -fsanitize=thread link of executables.
(LIBLSAN_EARLY_SPEC): Add liblsan_preinit.o for -fsanitize=leak
link of executables.

* lsan/lsan.h (__lsan_init): Add SANITIZER_INTERFACE_ATTRIBUTE.
* lsan/Makefile.am (nodist_toolexeclib_HEADERS): Add
liblsan_preinit.o.
(lsan_files): Remove lsan_preinit.cc.
(liblsan_preinit.o): New rule.
* lsan/Makefile.in: Regenerated.

--- gcc/config/gnu-user.h.jj2017-09-13 16:22:13.0 +0200
+++ gcc/config/gnu-user.h   2017-10-19 15:06:11.427688056 +0200
@@ -162,11 +162,13 @@ see the files COPYING3 and COPYING.RUNTI
   LD_STATIC_OPTION " --whole-archive -lasan --no-whole-archive " \
   LD_DYNAMIC_OPTION "}}%{!static-libasan:-lasan}"
 #undef LIBTSAN_EARLY_SPEC
-#define LIBTSAN_EARLY_SPEC "%{static-libtsan:%{!shared:" \
+#define LIBTSAN_EARLY_SPEC "%{!shared:libtsan_preinit%O%s} " \
+  "%{static-libtsan:%{!shared:" \
   LD_STATIC_OPTION " --whole-archive -ltsan --no-whole-archive " \
   LD_DYNAMIC_OPTION "}}%{!static-libtsan:-ltsan}"
 #undef LIBLSAN_EARLY_SPEC
-#define LIBLSAN_EARLY_SPEC "%{static-liblsan:%{!shared:" \
+#define LIBLSAN_EARLY_SPEC "%{!shared:liblsan_preinit%O%s} " \
+  "%{static-liblsan:%{!shared:" \
   LD_STATIC_OPTION " --whole-archive -llsan --no-whole-archive " \
   LD_DYNAMIC_OPTION "}}%{!static-liblsan:-llsan}"
 #endif
--- libsanitizer/lsan/lsan.h.jj 2017-10-19 18:59:17.0 +0200
+++ libsanitizer/lsan/lsan.h2017-10-19 21:16:09.313551127 +0200
@@ -64,4 +64,4 @@ void GetStackTraceWithPcBpAndContext(__s
 extern bool lsan_inited;
 extern bool lsan_init_is_running;
 
-extern "C" void __lsan_init();
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE void __lsan_init();
--- libsanitizer/lsan/Makefile.am.jj2017-10-19 13:20:58.0 +0200
+++ libsanitizer/lsan/Makefile.am   2017-10-19 15:13:53.280015739 +0200
@@ -12,6 +12,7 @@ ACLOCAL_AMFLAGS = -I m4
 noinst_LTLIBRARIES = libsanitizer_lsan.la
 if LSAN_SUPPORTED
 toolexeclib_LTLIBRARIES = liblsan.la
+nodist_toolexeclib_HEADERS = liblsan_preinit.o
 endif
 
 sanitizer_lsan_files = \
@@ -27,7 +28,6 @@ lsan_files = \
lsan_malloc_mac.cc \
lsan_allocator.cc \
lsan_interceptors.cc \
-   lsan_preinit.cc \
lsan_thread.cc
 
 libsanitizer_lsan_la_SOURCES = $(sanitizer_lsan_files)
@@ -40,6 +40,9 @@ endif
 liblsan_la_LIBADD += $(LIBSTDCXX_RAW_CXX_LDFLAGS)
 liblsan_la_LDFLAGS = -version-info `grep -v '^\#' $(srcdir)/libtool-version` 
$(link_liblsan)
 
+liblsan_preinit.o: lsan_preinit.o
+   cp $< $@
+
 # Work around what appears to be a GNU make bug handling MAKEFLAGS
 # values defined in terms of make variables, as is the case for CC and
 # friends when we are called from the top level Makefile.
--- libsanitizer/lsan/Makefile.in.jj2017-10-19 13:20:58.0 +0200
+++ libsanitizer/lsan/Makefile.in   2017-10-19 15:14:17.530717247 +0200
@@ -15,6 +15,7 @@
 
 @SET_MAKE@
 
+
 VPATH = @srcdir@
 am__make_dryrun = \
   { \
@@ -100,7 +101,8 @@ am__uninstall_files_from_dir = { \
 || { echo " ( cd '$$dir' && rm -f" $$files ")"; \
  $(am__cd) "$$dir" && rm -f $$files; }; \
   }
-am__installdirs = "$(DESTDIR)$(toolexeclibdir)"
+am__installdirs = "$(DESTDIR)$(toolexeclibdir)" \
+   "$(DESTDIR)$(toolexeclibdir)"
 LTLIBRARIES = $(noinst_LTLIBRARIES) $(toolexeclib_LTLIBRARIES)
 am__DEPENDENCIES_1 =
 liblsan_la_DEPENDENCIES =  \
@@ -110,7 +112,7 @@ liblsan_la_DEPENDENCIES =  \
 am__objects_1 = lsan_common.lo lsan_common_linux.lo lsan_common_mac.lo
 am__objects_2 = $(am__objects_1) lsan.lo lsan_linux.lo lsan_mac.lo \
lsan_malloc_mac.lo lsan_allocator.lo lsan_interceptors.lo \
-   lsan_preinit.lo lsan_thread.lo
+   lsan_thread.lo
 am_liblsan_la_OBJECTS = $(am__objects_2)
 liblsan_la_OBJECTS = $(am_liblsan_la_OBJECTS)
 liblsan_la_LINK = $(LIBTOOL) --tag=CXX $(AM_LIBTOOLFLAGS) \
@@ -139,6 +141,7 @@ am__can_run_installinfo = \
 n|no|NO) false;; \
 *) (install-info --version) >/dev/null 2>&1;; \
   esac
+HEADERS = $(nodist_toolexeclib_HEADERS)
 ETAGS = etags
 CTAGS = ctags
 ACLOCAL = @ACLOCAL@
@@ -298,6 +301,7 @@ AM_CXXFLAGS = -Wall -W -Wno-unused-param
 ACLOCAL_AMFLAGS = -I m4
 

Re: [patch, c++] Add a warning flag for the enum bit-field declaration warning in bug #61414.

2017-10-19 Thread Eric Gallager
On 10/16/17, Martin Sebor  wrote:
> On 10/16/2017 06:37 AM, Sam van Kampen via gcc-patches wrote:
>> ..I just realised that the clang flag is -Wbitfield-enum-conversion, not
>> -Wenum-bitfield-conversion. Please apply the patch below instead, which
>> has replaced the two words to remove the inconsistency.
>>
>> 2017-10-16  Sam van Kampen  
>>
>> * c-family/c.opt: Add a warning flag for struct bit-fields
>> being too small to hold enumerated types.
>> * cp/class.c: Likewise.
>> * doc/invoke.texi: Add documentation for said warning flag.
>>
>> Index: gcc/c-family/c.opt
>> ===
>> --- gcc/c-family/c.opt   (revision 253784)
>> +++ gcc/c-family/c.opt   (working copy)
>> @@ -346,6 +346,10 @@ Wframe-address
>>  C ObjC C++ ObjC++ Var(warn_frame_address) Warning LangEnabledBy(C ObjC
>> C++ ObjC++,Wall)
>>  Warn when __builtin_frame_address or __builtin_return_address is used
>> unsafely.
>>
>> +Wbitfield-enum-conversion
>> +C++ Var(warn_bitfield_enum_conversion) Init(1) Warning
>> +Warn about struct bit-fields being too small to hold enumerated types.
>
> Warning by default is usually reserved for problems that are
> most likely indicative of bugs.  Does this rise to that level?
> (It seems that it should be in the same class as -Wconversion).

The warning is already enabled by default; this patch just adds the
new flag controlling it. I don't think it's worth changing it away
from warning by default when it already warns by default now.

>
>> +
>>  Wbuiltin-declaration-mismatch
>>  C ObjC C++ ObjC++ Var(warn_builtin_declaraion_mismatch) Init(1) Warning
>>  Warn when a built-in function is declared with the wrong signature.
>> Index: gcc/cp/class.c
>> ===
>> --- gcc/cp/class.c   (revision 253784)
>> +++ gcc/cp/class.c   (working copy)
>> @@ -3278,10 +3278,11 @@ check_bitfield_decl (tree field)
>> && tree_int_cst_lt (TYPE_SIZE (type), w)))
>>  warning_at (DECL_SOURCE_LOCATION (field), 0,
>>  "width of %qD exceeds its type", field);
>> -  else if (TREE_CODE (type) == ENUMERAL_TYPE
>> +  else if (warn_bitfield_enum_conversion
>> +   && TREE_CODE (type) == ENUMERAL_TYPE
>> && (0 > (compare_tree_int
>>  (w, TYPE_PRECISION (ENUM_UNDERLYING_TYPE (type))
>> -warning_at (DECL_SOURCE_LOCATION (field), 0,
>> +warning_at (DECL_SOURCE_LOCATION (field),
>> OPT_Wbitfield_enum_conversion,
>>  "%qD is too small to hold all values of %q#T",
>>  field, type);
>
> I would suggest to include a bit more detail in the message, such
> as the smallest number of bits needed to represent every value of
> the enumeration.  It's also often helpful to include a note
> pointing to the relevant declaration (in this case it could be
> the bit-field).
>
>>  }
>> Index: gcc/doc/invoke.texi
>> ===
>> --- gcc/doc/invoke.texi  (revision 253784)
>> +++ gcc/doc/invoke.texi  (working copy)
>> @@ -264,7 +264,7 @@ Objective-C and Objective-C++ Dialects}.
>>  -Walloca  -Walloca-larger-than=@var{n} @gol
>>  -Wno-aggressive-loop-optimizations  -Warray-bounds
>> -Warray-bounds=@var{n} @gol
>>  -Wno-attributes  -Wbool-compare  -Wbool-operation @gol
>> --Wno-builtin-declaration-mismatch @gol
>> +-Wbitfield-enum-conversion -Wno-builtin-declaration-mismatch @gol
>>  -Wno-builtin-macro-redefined  -Wc90-c99-compat  -Wc99-c11-compat @gol
>>  -Wc++-compat  -Wc++11-compat  -Wc++14-compat  @gol
>>  -Wcast-align  -Wcast-align=strict  -Wcast-qual  @gol
>> @@ -5431,6 +5431,12 @@ Incrementing a boolean is invalid in C++17, and de
>>
>>  This warning is enabled by @option{-Wall}.
>>
>> +@item -Wbitfield-enum-conversion
>> +@opindex Wbitfield-enum-conversion
>> +@opindex Wno-bitfield-enum-conversion
>> +Warn about a bit-field potentially being too small to hold all values
>> +of an enumerated type. This warning is enabled by default.
>> +
>
> The brief description makes me wonder what this really means.  What
> is the meaning of "potentially?"  What (what expressions) triggers
> the warning?  Presumably it's initialization and assignment.  Does
> explicit or implicit conversion also trigger is?  I would suggest
> to expand the documentation to obviate these questions, and perhaps
> also include an example.
>
> Martin
>


Re: [RFC PATCH] Merge libsanitizer from upstream

2017-10-19 Thread Eric Gallager
On 10/19/17, Jakub Jelinek  wrote:
> On Thu, Oct 19, 2017 at 02:07:24PM +0300, Maxim Ostapenko wrote:
>> > Is the patch (the merge + this incremental) ok for trunk?
>>
>> I think the patch is OK, just wondering about two things:
>
> Richi just approved the patch on IRC, so I'll commit, then we can deal with
> follow-ups.
>
>> 1) We have a bunch of GCC local patches, did you include them into a
>> cumulative patch (I guess yes)?
>
> I have done some verification today, diff from upstream r285547 to
> unpatched
> GCC (with the LLVM Compiler infrastructure two liners removed), attached
> P1,
> and diff from upstream r315899 to patched GCC (again, two liners removed),
> attached P2 and went through the changes in P1 and verified that except for
> the ubsan backwards compatibility we had that can't work anymore everything
> else was upstreamed, or remained in P2.  So P2 is the current diff from
> upstream, with the sanitizer_common/sanitizer_symbolizer_libbacktrace.cc
> changes now filed upstream.
>
>> 2) Upstream has enabled LSan for x86 and ARM, is it worth to enable them
>> in GCC too?
>
> Maybe, feel free to post patches.  For LSan we need to split off
> lsan_preinit
> out of liblsan and link it into executables, will handle it next (there is
> a PR about it, just wanted to wait until the merge is in).
>

Just for reference, the PR number is bug 82595 btw, for anyone reading
who isn't already on it:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82595

>   Jakub
>


Re: [PATCH] Document --coverage and fork-like functions (PR gcov-profile/82457).

2017-10-19 Thread Eric Gallager
On 10/19/17, Martin Liška  wrote:
> Hi.
>
> As discussed in the PR, we should be more precise in our documentation.
> The patch does that.
>
> Ready for trunk?
> Martin
>
> gcc/ChangeLog:
>
> 2017-10-19  Martin Liska  
>
>   PR gcov-profile/82457
>   * doc/invoke.texi: Document that one needs a non-strict ISO mode
>   for fork-like functions to be properly instrumented.
> ---
>  gcc/doc/invoke.texi | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
>
>

The wording is kinda unclear because the modes in the parentheses are
all strict ISO modes, but the part before the parentheses says
NON-strict... I think you either need an additional "not" inside the
parentheses, or to change all the instances of -std=c* to -std=gnu*.


[PATCH] PR target/82624 msp430-elf target must allow for NULL pointer dereferences

2017-10-19 Thread Orlando Arias
Greetings,

As requested by Andrew Pinski, I should send the patch to this mailing
list. The patch was also included in the bug report alongside an
explanation as to why it is necessary [1].

In brief, the memory map of the MSP430 reserves the area from address
0x to address 0x000f for special function registers. Software
attempting to access address 0x as an implicitly declared pointer
will cause the compiler to silently generate bad code (gcc7) or to throw
an error message (gcc8 behavior, I think) when compiled with -O2 or
higher due to the -fdelete-null-pointer-checks optimization.

The attached patch disables the optimization for the msp430-elf target.
Please let me know if any changes are needed (also, keep me CC'd if
necessary, I am not a member of this mailing list). Thank you.

Cheers,
Orlando.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82624
diff -rupN gcc-7.2.0-pristine/gcc/config/msp430/msp430.c gcc-7.2.0-changed/gcc/config/msp430/msp430.c
--- gcc-7.2.0-pristine/gcc/config/msp430/msp430.c	2017-03-10 20:43:48.186872000 -0500
+++ gcc-7.2.0-changed/gcc/config/msp430/msp430.c	2017-10-18 16:00:46.677567638 -0400
@@ -748,6 +748,10 @@ hwmult_name (unsigned int val)
 static void
 msp430_option_override (void)
 {
+  /* The MSP430 architecture can safely dereference a NULL pointer. In fact,
+  there are memory mapped registers there.  */
+  flag_delete_null_pointer_checks = 0;
+
   init_machine_status = msp430_init_machine_status;
 
   if (target_cpu)
diff -rupN gcc-7.2.0-pristine/gcc/doc/g++.1 gcc-7.2.0-changed/gcc/doc/g++.1
--- gcc-7.2.0-pristine/gcc/doc/g++.1	2017-08-14 04:30:43.506125138 -0400
+++ gcc-7.2.0-changed/gcc/doc/g++.1	2017-10-18 16:07:27.160923633 -0400
@@ -7045,7 +7045,8 @@ Use \fB\-fno\-delete\-null\-pointer\-che
 for programs that depend on that behavior.
 .Sp
 This option is enabled by default on most targets.  On Nios \s-1II ELF,\s0 it
-defaults to off.  On \s-1AVR\s0 and \s-1CR16,\s0 this option is completely disabled.
+defaults to off.  On \s-1AVR\s0, \s-1CR16\s0, and \s-1MSP430\s0 this option is
+completely disabled.
 .Sp
 Passes that use the dataflow information
 are enabled independently at different optimization levels.
diff -rupN gcc-7.2.0-pristine/gcc/doc/gcc.1 gcc-7.2.0-changed/gcc/doc/gcc.1
--- gcc-7.2.0-pristine/gcc/doc/gcc.1	2017-08-14 04:30:43.002125125 -0400
+++ gcc-7.2.0-changed/gcc/doc/gcc.1	2017-10-18 16:06:31.367587143 -0400
@@ -7045,7 +7045,8 @@ Use \fB\-fno\-delete\-null\-pointer\-che
 for programs that depend on that behavior.
 .Sp
 This option is enabled by default on most targets.  On Nios \s-1II ELF,\s0 it
-defaults to off.  On \s-1AVR\s0 and \s-1CR16,\s0 this option is completely disabled.
+defaults to off.  On \s-1AVR\s0, \s-1CR16\s0, and \s-1MSP430\s0, this option is
+completely disabled.
 .Sp
 Passes that use the dataflow information
 are enabled independently at different optimization levels.
diff -rupN gcc-7.2.0-pristine/gcc/doc/gcc.info gcc-7.2.0-changed/gcc/doc/gcc.info
--- gcc-7.2.0-pristine/gcc/doc/gcc.info	2017-08-14 04:30:40.618125066 -0400
+++ gcc-7.2.0-changed/gcc/doc/gcc.info	2017-10-18 16:05:19.124249722 -0400
@@ -7070,7 +7070,7 @@ optimizations to be performed is desired
  for programs that depend on that behavior.
 
  This option is enabled by default on most targets.  On Nios II
- ELF, it defaults to off.  On AVR and CR16, this option is
+ ELF, it defaults to off.  On AVR, CR16, and MSP430, this option is
  completely disabled.
 
  Passes that use the dataflow information are enabled independently
diff -rupN gcc-7.2.0-pristine/gcc/doc/invoke.texi gcc-7.2.0-changed/gcc/doc/invoke.texi
--- gcc-7.2.0-pristine/gcc/doc/invoke.texi	2017-07-26 08:42:03.184523000 -0400
+++ gcc-7.2.0-changed/gcc/doc/invoke.texi	2017-10-18 16:04:16.724246191 -0400
@@ -7618,7 +7618,7 @@ Use @option{-fno-delete-null-pointer-che
 for programs that depend on that behavior.
 
 This option is enabled by default on most targets.  On Nios II ELF, it
-defaults to off.  On AVR and CR16, this option is completely disabled.  
+defaults to off.  On AVR, CR16, and MSP430, this option is completely disabled.
 
 Passes that use the dataflow information
 are enabled independently at different optimization levels.
diff -rupN gcc-7.2.0-pristine/gcc/testsuite/lib/target-supports.exp gcc-7.2.0-changed/gcc/testsuite/lib/target-supports.exp
--- gcc-7.2.0-pristine/gcc/testsuite/lib/target-supports.exp	2017-03-24 09:59:51.032979000 -0400
+++ gcc-7.2.0-changed/gcc/testsuite/lib/target-supports.exp	2017-10-18 16:26:11.234320570 -0400
@@ -514,7 +514,8 @@ proc check_effective_target_keeps_null_p
 if [target_info exists keeps_null_pointer_checks] {
   return 1
 }
-if { [istarget avr-*-*] } {
+if { [istarget avr-*-*]
+	 || [istarget msp430-*-*] } {
 	return 1;   
 }
 return 0


signature.asc
Description: OpenPGP digital signature


Re: [RFC] New pragma exec_charset

2017-10-19 Thread Joseph Myers
On Thu, 19 Oct 2017, Andreas Krebbel wrote:

>  gcc/testsuite/gcc.dg/pragma-exec_charset-1.c | 26 +++

I'd expect a c-c++-common test rather than a C-specific one, given there 
are significant differences in how the C and C++ front ends handle lexing.

> +#pragma GCC exec_charset("UTF16")
> +call_with_utf16("hello world");

I don't think UTF16 makes sense as an example, as it's not a multibyte 
character set (characters are 16-bit, and with 8-bit bytes that means many 
characters contain NUL bytes, which is not allowed for multibyte character 
sets).

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH, i386]: PR 82618, Inefficient double-word subtraction on x86_64

2017-10-19 Thread Uros Bizjak
Attached patch converts SUB with its output unused to CMP. This can
happen in double-word subtraction when lowpart of the result is unused
(but highpart subtraction still needs to borrow carry flag).

2017-10-19  Uros Bizjak  

PR target/82618
* config/i386/i386.md (sub to cmp): New peephole2 pattern.

testsuite/ChangeLog:

2017-10-18  Uros Bizjak  
Jakub Jelinek  

PR target/82618
* gcc.target/i386/pr82618.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 253899)
+++ config/i386/i386.md (working copy)
@@ -6766,6 +6766,17 @@
   [(set_attr "type" "alu")
(set_attr "mode" "")])
 
+(define_peephole2
+  [(parallel
+ [(set (reg:CC FLAGS_REG)
+  (compare:CC (match_operand:SWI 0 "general_reg_operand")
+  (match_operand:SWI 1 "general_gr_operand")))
+  (set (match_dup 0)
+  (minus:SWI (match_dup 0) (match_dup 1)))])]
+  "find_regno_note (peep2_next_insn (0), REG_UNUSED, REGNO (operands[0])) != 0"
+  [(set (reg:CC FLAGS_REG)
+   (compare:CC (match_dup 0) (match_dup 1)))])
+
 (define_insn "*subsi_3_zext"
   [(set (reg FLAGS_REG)
(compare (match_operand:SI 1 "register_operand" "0")
Index: testsuite/gcc.target/i386/pr82618.c
===
--- testsuite/gcc.target/i386/pr82618.c (nonexistent)
+++ testsuite/gcc.target/i386/pr82618.c (working copy)
@@ -0,0 +1,18 @@
+/* PR target/82618 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#ifdef __SIZEOF_INT128__
+typedef unsigned __int128 U;
+typedef unsigned long long H;
+#else
+typedef unsigned long long U;
+typedef unsigned int H;
+#endif
+
+H f0 (U x, U y)
+{
+  return (x - y) >> (__CHAR_BIT__ * sizeof (H));
+}
+
+/* { dg-final { scan-assembler {\mcmp} } } */


Re: [committed] Add gnu::unique_ptr

2017-10-19 Thread David Malcolm
On Thu, 2017-10-19 at 19:23 +0200, Gerald Pfeifer wrote:
> On Mon, 16 Oct 2017, David Malcolm wrote:
> > For reference, here's what I committed:
> 
> I'm afraid this may have broken the bootstrap with clang?
> 
> In file included from /scratch/tmp/gerald/gcc-HEAD/gcc/unique-ptr-
> tests.cc:23:
> In file included from /scratch/tmp/gerald/gcc-
> HEAD/gcc/../include/unique-ptr.h:77:
> In file included from /usr/include/c++/v1/memory:629:
> /usr/include/c++/v1/typeinfo:199:2: error: no member named
> 'fancy_abort' in namespace 'std::__1'; did you mean simply
> 'fancy_abort'?
> _VSTD::abort();
> ^~~
> /usr/include/c++/v1/__config:390:15: note: expanded from macro
> '_VSTD'
> #define _VSTD std::_LIBCPP_NAMESPACE
>   ^
> /scratch/tmp/gerald/gcc-HEAD/gcc/system.h:725:13: note: 'fancy_abort'
> declared here
> extern void fancy_abort (const char *, int, const char *)
> ^
> 
> 
> This is FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) 
> (based on LLVM 4.0.0), on x86_64-unknown-freebsd11.1.
> 
> Gerald

Sorry about the breakage.

There seem to have been similar problems on OS X:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82610

The proposed fix there is to include  in system.h, which
presumably would fix this also.


Re: [committed] Add gnu::unique_ptr

2017-10-19 Thread Gerald Pfeifer
On Mon, 16 Oct 2017, David Malcolm wrote:
> For reference, here's what I committed:

I'm afraid this may have broken the bootstrap with clang?

In file included from /scratch/tmp/gerald/gcc-HEAD/gcc/unique-ptr-tests.cc:23:
In file included from 
/scratch/tmp/gerald/gcc-HEAD/gcc/../include/unique-ptr.h:77:
In file included from /usr/include/c++/v1/memory:629:
/usr/include/c++/v1/typeinfo:199:2: error: no member named 'fancy_abort' in 
namespace 'std::__1'; did you mean simply 'fancy_abort'?
_VSTD::abort();
^~~
/usr/include/c++/v1/__config:390:15: note: expanded from macro '_VSTD'
#define _VSTD std::_LIBCPP_NAMESPACE
  ^
/scratch/tmp/gerald/gcc-HEAD/gcc/system.h:725:13: note: 'fancy_abort' declared 
here
extern void fancy_abort (const char *, int, const char *)
^


This is FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) 
(based on LLVM 4.0.0), on x86_64-unknown-freebsd11.1.

Gerald


Re: [PATCH] PR libgcc/59714 complex division is surprising on aarch64

2017-10-19 Thread Wilco Dijkstra
Vladimir Mezentsev 
> On 10/19/2017 06:37 AM, Richard Earnshaw (lists) wrote:
>> On 19/10/17 14:07, Wilco Dijkstra wrote:
>>> Vladimir wrote:
>>>
>>> +# Disable floating-point expression contraction
>>> +LIBGCC2_FFP_CONTRAST_CFLAGS = -ffp-contract=off
>>> +
>>>
>>> It looks like this disables fp-contract in all of libgcc...
>>> What is the the number of FMAs in libgcc before/after?
>
>   How can I find this number ?
> dis  | grep  | wc

Eg. objdump -d ~/install/gcc/lib64/libgcc_s.so | grep -c fmadd
Also grep fmsub, fnmadd, fnmsub.

> > It's probably better to do this with an attribute
> >
> >    __attribute__((optimize("fp-contract=off")))
> >
> > on the affected functions.
>
>    I like your suggestion.

I don't think this will work in general, IIRC only a few targets correctly
implement the optimize pragma. Changing the makefile to build only
__divdc3/__divtc3/__divsc3/__divhc3 without FMA looks like a better
approach.

However we also need to decide whether this is the best possible fix.
It will affect accuracy of other inputs as well...

Wilco

Re: [RFC] New pragma exec_charset

2017-10-19 Thread Martin Sebor

On 10/19/2017 09:50 AM, Andreas Krebbel wrote:

The TPF operating system uses the GCC S/390 backend.  They set an
EBCDIC exec charset for compilation using -fexec-charset.  However,
certain libraries require ASCII strings instead.  In order to be able
to put calls to that library into the normal code it is required to
switch the exec charset within a compilation unit.

This is an attempt to implement it by adding a new pragma which could
be used like in the following example:

int
foo ()
{
  call_with_utf8("hello world");

#pragma GCC exec_charset("UTF16")
  call_with_utf16("hello world");

#pragma GCC exec_charset(pop)
  call_with_utf8("hello world");
}

Does this look reasonable?


I'm not an expert on this but at a high level it looks reasonable
to me.  But based on some small amount of work I did in this area
I have a couple of questions.

There are a few places in the compiler that already do or that
should but don't yet handle different execution character sets.
The former include built-ins like __bultin_isdigit() and
__builtin_sprintf (in both builtins.c and gimple-ssa-sprintf.c)
The latter is the -Wformat checking done by the C and C++ front
ends.  The missing support for the latter is the subject of bug
38308.  According to bug 81686, LTO is apparently also missing
support for exec-charset.

I'm curious how the pragma might interact with these two areas,
and whether the lack of support for it in the latter is a concern
(and if not, why not).  For the former, I'm also wondering about
the interaction of inlining and other interprocedural optimizations
with the pragma.  Does it propagate through inlined calls as one
would expect?

Thanks
Martin



Re: [PATCH] PR libgcc/59714 complex division is surprising on aarch64

2017-10-19 Thread Vladimir Mezentsev

On 10/19/2017 06:37 AM, Richard Earnshaw (lists) wrote:

On 19/10/17 14:07, Wilco Dijkstra wrote:

Vladimir wrote:

+# Disable floating-point expression contraction
+LIBGCC2_FFP_CONTRAST_CFLAGS = -ffp-contract=off
+

It looks like this disables fp-contract in all of libgcc...
What is the the number of FMAs in libgcc before/after?


  How can I find this number ?
dis  | grep  | wc

Or are there the other simple ways ?



If it disables anything other than the ones in complex division, it
would have an unintended performance impact. It's best to do
this just for complex division.

Wilco


It's probably better to do this with an attribute

__attribute__((optimize("fp-contract=off")))

on the affected functions.


  I like your suggestion.

-Vladimir



R.





Re: Drop edge counts from CFG

2017-10-19 Thread Uros Bizjak
Hello!

> * cgraphunit.c (init_lowered_empty_function): UPdate.
> (cgraph_node::expand_thunk): UPdate.
> * gimple-pretty-print.c (dump_probability): UPdate.
...
> (gimple_mod_subtract): UPdate.
> (gimple_ic): UPdate.
> (gimple_stringop_fixed_value): UPdate.

s/UPdate/Update/g

Uros.


[PR other/79543] Fix GNU ld --version scanning to conform to the GNU Coding Standards

2017-10-19 Thread Thomas Schwinge
Hi!

As discussed in :

| [...] target
| libraries [...] conditionally use certain linker features, depending on linker
| version numbers.  For example, linker version scripts in libgomp; see
| LIBGOMP_BUILD_VERSIONED_SHLIB, LIBGOMP_BUILD_VERSIONED_SHLIB_GNU usage in
| libgomp/Makefile.am, which are defined in libgomp/acinclude.m4 based on the
| enable_symvers value, which is set by some logic depending on "ld --version"
| output.  Looking at one specific build's libgomp/config.log, I see:
| 
| [...]
| configure:16132: checking if the linker ([...]/gcc/collect-ld) is GNU ld
| configure:16147: result: yes
| [...]
| configure:16389: WARNING: === Linker version 1125 is too old for
| configure:16391: WARNING: === full symbol versioning support in this 
release of GCC.
| configure:16393: WARNING: === You would need to upgrade your binutils to 
version
| configure:16395: WARNING: === 21400 or later and rebuild GCC.
| configure:16404: WARNING: === Symbol versioning will be disabled.
| [...]
| 
| Now, what is this "1125" linker version?
| 
| $ [...]/gcc/collect-ld --version
| GNU ld (Sourcery CodeBench ([...]) Lite 2015.11-[...]) 2.25.51
| 
| ..., and then apply the "magic" used in libgomp/acinclude.m4 to compute
| libgomp_gnu_ld_version:
| 
| $ [...]/gcc/collect-ld --version | sed -e 's/GNU gold /GNU ld /;s/GNU ld 
version /GNU ld /;s/GNU ld ([^)]*) /GNU ld /;s/GNU ld \([0-9.][0-9.]*\).*/\1/; 
q' | awk -F. '{ if (NF<3) $3=0; print ($1*100+$2)*100+$3 }'
| 1125
| 
| "1125" instead of the expected "22551".  That's pretty weird, and I suppose it
| may be causing all kinds of "interesting" issues, when using a linker that has
| been configured to override/augment its version string?

Specifically, it's the additional set of nested parens that is confusing
the sed command.

| Joseph told me that "the correct logic for finding the version number is to
| take everything after the last space on the first line of the output, as per
| the GNU Coding Standards.

See here:
:
"[...] the version number proper starts after the last space.  [...]"

| (It's possible some very old binutils versions may
| not have properly formatted output; my view is that each GCC version should
| have a minimum corresponding binutils version, no more than say five years 
old,
| for targets using GNU binutils.)"

(Agreed.  That's for another day.)

| Yet better, obviously, would be to not rely on such version-based decisions at
| all.

(That, too.)

| (Hopefully, similar parsing is not also being done for "as --version", or 
other
| tools

(That, too; not verified now.)

| and this was just a one-off problem, possibly originally introduced in
| libstdc++-v3/acinclude.m4, 15 years ago, and the copied from there to other
| target libraries?  I haven't looked in detail.)

(Also, I don't know what the GNU Conding Standards described, 15 years
ago.)


| (For reference, this is the root cause for the issue reported in
| 
.)

In addition to that libgomp example, the problem can be observed for
libstdc++ in the following existing test cases
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c,
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-2.c,
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-3.c,
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-4.c, and
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/host_data-1.c.  Trying to
link in cuBLAS (these require a nvptx offloading configuration), these
FAIL to compile:

/usr/lib/x86_64-linux-gnu/libcublas.so: undefined reference to `operator 
delete[](void*)@GLIBCXX_3.4'
/usr/lib/x86_64-linux-gnu/libcublas.so: undefined reference to `operator 
new[](unsigned long)@GLIBCXX_3.4'


Applying the following patch (that is, simplying the sed command), makes
the problem go away:

-sed -e 's/GNU gold /GNU ld /;s/GNU ld version /GNU ld /;s/GNU ld ([^)]*) 
/GNU ld /;s/GNU ld \([0-9.][0-9.]*\).*/\1/; q'`
+sed -e 's/GNU gold /GNU ld /;s/GNU ld .* \(.*\)/\1/; q'`

In one specific build, this causes the following changes:

x86_64-none-linux-gnu/libatomic/config.log:

[...]
-configure:15071: WARNING: === Linker version 1125 is too old for
-configure:15073: WARNING: === full symbol versioning support in this 
release of GCC.
-configure:15075: WARNING: === You would need to upgrade your binutils to 
version
-configure:15077: WARNING: === 21400 or later and rebuild GCC.
-configure:15086: WARNING: === Symbol versioning will be disabled.
-configure:15136: versioning on shared library symbols is no
+configure:15136: versioning on shared library symbols is gnu
[...]

x86_64-none-linux-gnu/libgomp/config.log:

[...]
-configure:16458: WARNING: === Linker version 1125 is too old for
-configure:16460: WARNING: === full 

[PATCH, i386]: A couple of cleanups in output insn mnemonic constructions

2017-10-19 Thread Uros Bizjak
No functional changes.

2017-10-19  Uros Bizjak  

* config/i386/i386.c (output_387_binary_op): Rewrite SSE part.
(ix86_emit_mode_set): Rewrite insn mnemonic construction.
(ix86_prepare_fp_compare_args): Redefine is_sse as bool.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 253899)
+++ config/i386/i386.c  (working copy)
@@ -18149,90 +18149,67 @@ output_387_binary_op (rtx_insn *insn, rtx *operand
 {
   static char buf[40];
   const char *p;
-  const char *ssep;
-  int is_sse = SSE_REG_P (operands[0]) || SSE_REG_P (operands[1]) || SSE_REG_P 
(operands[2]);
+  bool is_sse
+= (SSE_REG_P (operands[0])
+   || SSE_REG_P (operands[1]) || SSE_REG_P (operands[2]));
 
-  /* Even if we do not want to check the inputs, this documents input
- constraints.  Which helps in understanding the following code.  */
-  if (flag_checking)
-{
-  if (STACK_REG_P (operands[0])
- && ((REG_P (operands[1])
-  && REGNO (operands[0]) == REGNO (operands[1])
-  && (STACK_REG_P (operands[2]) || MEM_P (operands[2])))
- || (REG_P (operands[2])
- && REGNO (operands[0]) == REGNO (operands[2])
- && (STACK_REG_P (operands[1]) || MEM_P (operands[1]
- && (STACK_TOP_P (operands[1]) || STACK_TOP_P (operands[2])))
-   ; /* ok */
-  else
-   gcc_assert (is_sse);
-}
+  if (is_sse)
+p = "%v";
+  else if (GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_INT
+  || GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT)
+p = "fi";
+  else
+p = "f";
 
+  strcpy (buf, p);
+
   switch (GET_CODE (operands[3]))
 {
 case PLUS:
-  if (GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_INT
- || GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT)
-   p = "fiadd";
-  else
-   p = "fadd";
-  ssep = "vadd";
-  break;
-
+  p = "add"; break;
 case MINUS:
-  if (GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_INT
- || GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT)
-   p = "fisub";
-  else
-   p = "fsub";
-  ssep = "vsub";
-  break;
-
+  p = "sub"; break;
 case MULT:
-  if (GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_INT
- || GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT)
-   p = "fimul";
-  else
-   p = "fmul";
-  ssep = "vmul";
-  break;
-
+  p = "mul"; break;
 case DIV:
-  if (GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_INT
- || GET_MODE_CLASS (GET_MODE (operands[2])) == MODE_INT)
-   p = "fidiv";
-  else
-   p = "fdiv";
-  ssep = "vdiv";
-  break;
-
+  p = "div"; break;
 default:
   gcc_unreachable ();
 }
 
+  strcat (buf, p);
+
   if (is_sse)
{
+ p = (GET_MODE (operands[0]) == SFmode) ? "ss" : "sd";
+ strcat (buf, p);
+
  if (TARGET_AVX)
-   {
-strcpy (buf, ssep);
-if (GET_MODE (operands[0]) == SFmode)
-  strcat (buf, "ss\t{%2, %1, %0|%0, %1, %2}");
-else
-  strcat (buf, "sd\t{%2, %1, %0|%0, %1, %2}");
-   }
+   p = "\t{%2, %1, %0|%0, %1, %2}";
  else
-   {
-strcpy (buf, ssep + 1);
-if (GET_MODE (operands[0]) == SFmode)
-  strcat (buf, "ss\t{%2, %0|%0, %2}");
-else
-  strcat (buf, "sd\t{%2, %0|%0, %2}");
-   }
-  return buf;
+   p = "\t{%2, %0|%0, %2}";
+
+ strcat (buf, p);
+ return buf;
}
-  strcpy (buf, p);
 
+  /* Even if we do not want to check the inputs, this documents input
+ constraints.  Which helps in understanding the following code.  */
+  if (flag_checking)
+{
+  if (STACK_REG_P (operands[0])
+ && ((REG_P (operands[1])
+  && REGNO (operands[0]) == REGNO (operands[1])
+  && (STACK_REG_P (operands[2]) || MEM_P (operands[2])))
+ || (REG_P (operands[2])
+ && REGNO (operands[0]) == REGNO (operands[2])
+ && (STACK_REG_P (operands[1]) || MEM_P (operands[1]
+ && (STACK_TOP_P (operands[1]) || STACK_TOP_P (operands[2])))
+   ; /* ok */
+  else
+   gcc_unreachable ();
+}
+
   switch (GET_CODE (operands[3]))
 {
 case MULT:
@@ -18820,10 +18797,13 @@ ix86_emit_mode_set (int entity, int mode, int prev
 const char *
 output_fix_trunc (rtx_insn *insn, rtx *operands, bool fisttp)
 {
-  int stack_top_dies = find_regno_note (insn, REG_DEAD, FIRST_STACK_REG) != 0;
-  int dimode_p = GET_MODE (operands[0]) == DImode;
+  bool stack_top_dies = find_regno_note (insn, REG_DEAD, FIRST_STACK_REG);
+  bool dimode_p = GET_MODE (operands[0]) == DImode;
   int round_mode = get_attr_i387_cw (insn);
 
+  static char buf[40];
+  const char *p;
+
   /* Jump through a hoop or two for DImode, 

Re: [PATCH GCC][2/3]Simplify ((A +- CST1 CMP A +- CST2)) for undefined overflow type

2017-10-19 Thread Bin.Cheng
On Thu, Oct 19, 2017 at 4:33 PM, Marc Glisse  wrote:
> On Thu, 19 Oct 2017, Bin Cheng wrote:
>
>> * match.pd (A +- CST1 CMP A +- CST2): New pattern.
>
>
> Similarly, this has a very large overlap with "X + Z < Y + Z" transforms
> already in match.pd. It may handle X - CST CMP X + CST that the other
> doesn't (?), but we tend to canonicalize X-5 to X+-5 anyway.
And drop this one.

Thanks,
bin
>
> --
> Marc Glisse


Re: [PATCH GCC][1/3]Simplify (A + CST cmp A -> CST cmp zero) for undefined overflow type

2017-10-19 Thread Bin.Cheng
On Thu, Oct 19, 2017 at 4:22 PM, Marc Glisse  wrote:
> On Thu, 19 Oct 2017, Bin Cheng wrote:
>
>> * match.pd (A + CST cmp A  ->  CST cmp zero): New simplification
>> for undefined overflow types in (A + CST CMP A  ->  A CMP' CST').
>
>
> Could you check if you still need that? I recently added something very
> similar (search for "X + Y < Y" in match.pd).
Ah, yes indeed, the two patterns are unnecessary now on TOT.  I will drop this.

Thanks,
bin
>
> --
> Marc Glisse


Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices

2017-10-19 Thread Martin Sebor

Good question!  STRING_CST does have a domain.  The problem is
that array_at_struct_end_p() returns true for STRING_CST.  I've
added the handling to the function and removed the block above
from the latest patch.


Can you split out the STRING_CST handling and commit that separately
(split the testcase)?  That part looks ok.


I've committed r253902.  It turns out, however, that this subset
of the patch doesn't fix the whole problem.  What's still missing
is the handling of:

  int g (int i)
  {
return (i < 0 ? ABC : DEF)[7];   // missing -Warray-bounds
  }

Surprisingly, this happens to work in C++ but not in C (which
is in contrast to bug 82609 where it's the other way around).
The root cause is that the expression is represented as
a MEM_REF(ADDR_EXPR (STRING_CST)) and check_array_bounds()
only considers ARRAY_REF and ADDR_EXPR.

I will submit a separate patch for that, perhaps along with
one for bug 82612 that you commented on this morning (missing
-Warray-bounds on a non-zero offset from the address of a non-
array object).

Martin


Re: [PATCH] ira-color: fix allocno_priority_compare_func for qsort (PR 82395)

2017-10-19 Thread Vladimir Makarov

On 10/19/2017 06:37 AM, Alexander Monakov wrote:

Ping.

On Thu, 5 Oct 2017, Alexander Monakov wrote:
Bootstrapped and regtested on x86-64, OK for trunk?

OK.  Alexander, sorry for the delay with the answer and thank you for 
finding and fixing the problem.

PR rtl-optimization/82395
* ira-color.c (allocno_priority_compare_func): Fix comparison step
based on non_spilled_static_chain_regno_p.

diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 22fdb88274d..31a4a8074d1 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -3005,14 +3005,13 @@ allocno_priority_compare_func (const void *v1p, const 
void *v2p)
  {
ira_allocno_t a1 = *(const ira_allocno_t *) v1p;
ira_allocno_t a2 = *(const ira_allocno_t *) v2p;
-  int pri1, pri2;
+  int pri1, pri2, diff;
  
/* Assign hard reg to static chain pointer pseudo first when

   non-local goto is used.  */
-  if (non_spilled_static_chain_regno_p (ALLOCNO_REGNO (a1)))
-return 1;
-  else if (non_spilled_static_chain_regno_p (ALLOCNO_REGNO (a2)))
-return -1;
+  if ((diff = (non_spilled_static_chain_regno_p (ALLOCNO_REGNO (a2))
+  - non_spilled_static_chain_regno_p (ALLOCNO_REGNO (a1 != 0)
+return diff;
pri1 = allocno_priorities[ALLOCNO_NUM (a1)];
pri2 = allocno_priorities[ALLOCNO_NUM (a2)];
if (pri2 != pri1)





[RFC] New pragma exec_charset

2017-10-19 Thread Andreas Krebbel
The TPF operating system uses the GCC S/390 backend.  They set an
EBCDIC exec charset for compilation using -fexec-charset.  However,
certain libraries require ASCII strings instead.  In order to be able
to put calls to that library into the normal code it is required to
switch the exec charset within a compilation unit.

This is an attempt to implement it by adding a new pragma which could
be used like in the following example:

int
foo ()
{
  call_with_utf8("hello world");

#pragma GCC exec_charset("UTF16")
  call_with_utf16("hello world");

#pragma GCC exec_charset(pop)
  call_with_utf8("hello world");
}

Does this look reasonable?

Bye,

-Andreas-
---
 gcc/c-family/c-pragma.c  | 50 
 gcc/doc/extend.texi  | 26 +++
 gcc/testsuite/gcc.dg/pragma-exec_charset-1.c | 26 +++
 libcpp/charset.c |  2 +-
 libcpp/include/cpplib.h  |  3 ++
 libcpp/init.c|  2 +-
 libcpp/internal.h|  1 -
 7 files changed, 107 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pragma-exec_charset-1.c

diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index f7b59b3..db281b9 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -34,6 +34,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "plugin.h"
 
+extern cpp_options *cpp_opts;
+
 #define GCC_BAD(gmsgid) \
   do { warning (OPT_Wpragmas, gmsgid); return; } while (0)
 #define GCC_BAD2(gmsgid, arg) \
@@ -1141,6 +1143,52 @@ handle_pragma_message (cpp_reader *ARG_UNUSED(dummy))
 inform (input_location, "#pragma message: %s", TREE_STRING_POINTER 
(message));
 }
 
+static void
+handle_pragma_exec_charset (cpp_reader *ARG_UNUSED(dummy))
+{
+  enum cpp_ttype token;
+  tree x;
+  static const char* previous_charset = NULL;
+
+  token = pragma_lex ();
+  if (token == CPP_OPEN_PAREN)
+{
+  token = pragma_lex ();
+  if (token == CPP_STRING)
+   {
+ previous_charset = cpp_opts->narrow_charset;
+ cpp_opts->narrow_charset = TREE_STRING_POINTER (x);
+   }
+  else if (token == CPP_NAME
+  && strncmp (IDENTIFIER_POINTER (x), "pop", 3) == 0)
+   {
+ if (previous_charset == NULL)
+   {
+ warning (OPT_Wpragmas,
+  "pop without previous exec_charset use - ignored");
+ return;
+   }
+ cpp_opts->narrow_charset = previous_charset;
+ previous_charset = NULL;
+   }
+  else
+   GCC_BAD ("expected a charset string or pop after %<#pragma 
exec_charset%>");
+
+  if (pragma_lex () != CPP_CLOSE_PAREN)
+   GCC_BAD ("malformed %<#pragma exec_charset%>, ignored");
+}
+  else
+GCC_BAD ("expected a string after %<#pragma exec_charset%>");
+
+  if (pragma_lex () != CPP_EOF)
+warning (OPT_Wpragmas, "junk at end of %<#pragma exec_charset%>");
+
+  inform (input_location, "switching to exec charset: %s",
+ cpp_opts->narrow_charset);
+  cpp_destroy_iconv (parse_in);
+  cpp_init_iconv (parse_in);
+}
+
 /* Mark whether the current location is valid for a STDC pragma.  */
 
 static bool valid_location_for_stdc_pragma;
@@ -1571,6 +1619,8 @@ init_pragma (void)
handle_pragma_redefine_extname);
 
   c_register_pragma_with_expansion (0, "message", handle_pragma_message);
+  c_register_pragma_with_expansion ("GCC", "exec_charset",
+   handle_pragma_exec_charset);
 
 #ifdef REGISTER_TARGET_PRAGMAS
   REGISTER_TARGET_PRAGMAS ();
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d9b7a54..b67993a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21611,6 +21611,7 @@ for further explanation.
 * Push/Pop Macro Pragmas::
 * Function Specific Option Pragmas::
 * Loop-Specific Pragmas::
+* Charset-Specific Pragmas::
 @end menu
 
 @node AArch64 Pragmas
@@ -22209,6 +22210,31 @@ void ignore_vec_dep (int *a, int k, int c, int m)
 @}
 @end smallexample
 
+@node Charset-Specific Pragmas
+@subsection Charset-Specific Pragmas
+
+@table @code
+@item #pragma GCC exec_charset(@var{"charset"})
+@cindex pragma GCC exec_charset
+
+Set the execution character set, used for string and character
+constants.  The default is the exec charset specified with
+@option{-fexec-charset} or UTF-8 if @option{-fexec-charset} isn't used.
+charset can be any encoding supported by the system's "iconv" library
+routine.  The special value @var{pop} (without ``) can be
+used to switch back to the exec charset before the last @code{#pragma
+GCC exec_charset} setting.
+@end table
+
+@smallexample
+call_with_utf8("hello world");
+
+#pragma GCC exec_charset("UTF16")
+call_with_utf16("hello world");
+
+#pragma GCC exec_charset(pop)
+call_with_utf8("hello world");
+@end smallexample
 
 @node Unnamed Fields
 @section Unnamed 

Re: [PATCH GCC][2/3]Simplify ((A +- CST1 CMP A +- CST2)) for undefined overflow type

2017-10-19 Thread Marc Glisse

On Thu, 19 Oct 2017, Bin Cheng wrote:


* match.pd (A +- CST1 CMP A +- CST2): New pattern.


Similarly, this has a very large overlap with "X + Z < Y + Z" transforms 
already in match.pd. It may handle X - CST CMP X + CST that the other 
doesn't (?), but we tend to canonicalize X-5 to X+-5 anyway.


--
Marc Glisse


Re: [PATCH GCC][1/3]Simplify (A + CST cmp A -> CST cmp zero) for undefined overflow type

2017-10-19 Thread Marc Glisse

On Thu, 19 Oct 2017, Bin Cheng wrote:


* match.pd (A + CST cmp A  ->  CST cmp zero): New simplification
for undefined overflow types in (A + CST CMP A  ->  A CMP' CST').


Could you check if you still need that? I recently added something very 
similar (search for "X + Y < Y" in match.pd).


--
Marc Glisse


Correct cost of SSE and x87 instructions for generic and core

2017-10-19 Thread Jan Hubicka
Hi,
core and generic costs of x87 and SSE instructions seems to follow pentium4 
settings
which is not very realistic. This patch sets them according to latencies.  I 
have
tested this on haswell as part of the vectorizer cost metric patch (where we 
want
to have sane values to get sane decisions) and it did not cause any regressions
on spec2k/2k6 and C++ benchmarks.  I am not 100% sure what of the improvements
seen can be attributed to the cost change alone and what to the vectorizer 
metric
but we will see tomorrow from our periodic testers.

Note that atom and other cost tables seems off too.  I will send separate patch 
for
this but I have no way to benchmark it.

Bootstrapped/regtested x86_linux, will commit it shortly.

Honza

* x86-tune-costs.h (generic_cost, core_cost): Correct costs
of x87 and SSE instructions.
Index: config/i386/x86-tune-costs.h
===
--- config/i386/x86-tune-costs.h(revision 253824)
+++ config/i386/x86-tune-costs.h(working copy)
@@ -2196,8 +2196,7 @@ static stringop_algs generic_memset[2] =
 static const
 struct processor_costs generic_cost = {
   COSTS_N_INSNS (1),   /* cost of an add instruction */
-  /* On all chips taken into consideration lea is 2 cycles and more.  With
- this cost however our current implementation of synth_mult results in
+  /* Setting cost to 2 makes our current implementation of synth_mult result in
  use of unnecessary temporary registers causing regression on several
  SPECfp benchmarks.  */
   COSTS_N_INSNS (1) + 1,   /* cost of a lea instruction */
@@ -2246,23 +2245,23 @@ struct processor_costs generic_cost = {
   /* Benchmarks shows large regressions on K8 sixtrack benchmark when this
  value is increased to perhaps more appropriate value of 5.  */
   3,   /* Branch cost */
-  COSTS_N_INSNS (8),   /* cost of FADD and FSUB insns.  */
-  COSTS_N_INSNS (8),   /* cost of FMUL instruction.  */
+  COSTS_N_INSNS (3),   /* cost of FADD and FSUB insns.  */
+  COSTS_N_INSNS (3),   /* cost of FMUL instruction.  */
   COSTS_N_INSNS (20),  /* cost of FDIV instruction.  */
-  COSTS_N_INSNS (8),   /* cost of FABS instruction.  */
-  COSTS_N_INSNS (8),   /* cost of FCHS instruction.  */
+  COSTS_N_INSNS (1),   /* cost of FABS instruction.  */
+  COSTS_N_INSNS (1),   /* cost of FCHS instruction.  */
   COSTS_N_INSNS (40),  /* cost of FSQRT instruction.  */
 
-  COSTS_N_INSNS (8),   /* cost of cheap SSE instruction.  */
-  COSTS_N_INSNS (8),   /* cost of ADDSS/SD SUBSS/SD insns.  */
-  COSTS_N_INSNS (8),   /* cost of MULSS instruction.  */
-  COSTS_N_INSNS (8),   /* cost of MULSD instruction.  */
-  COSTS_N_INSNS (8),   /* cost of FMA SS instruction.  */
-  COSTS_N_INSNS (8),   /* cost of FMA SD instruction.  */
-  COSTS_N_INSNS (20),  /* cost of DIVSS instruction.  */
-  COSTS_N_INSNS (20),  /* cost of DIVSD instruction.  */
-  COSTS_N_INSNS (40),  /* cost of SQRTSS instruction.  */
-  COSTS_N_INSNS (40),  /* cost of SQRTSD instruction.  */
+  COSTS_N_INSNS (1),   /* cost of cheap SSE instruction.  */
+  COSTS_N_INSNS (3),   /* cost of ADDSS/SD SUBSS/SD insns.  */
+  COSTS_N_INSNS (4),   /* cost of MULSS instruction.  */
+  COSTS_N_INSNS (5),   /* cost of MULSD instruction.  */
+  COSTS_N_INSNS (5),   /* cost of FMA SS instruction.  */
+  COSTS_N_INSNS (5),   /* cost of FMA SD instruction.  */
+  COSTS_N_INSNS (18),  /* cost of DIVSS instruction.  */
+  COSTS_N_INSNS (32),  /* cost of DIVSD instruction.  */
+  COSTS_N_INSNS (30),  /* cost of SQRTSS instruction.  */
+  COSTS_N_INSNS (58),  /* cost of SQRTSD instruction.  */
   1, 2, 1, 1,  /* reassoc int, fp, vec_int, vec_fp.  */
   generic_memcpy,
   generic_memset,
@@ -2344,12 +2343,12 @@ struct processor_costs core_cost = {
   6,   /* number of parallel prefetches */
   /* FIXME perhaps more appropriate value is 5.  */
   3,   /* Branch cost */
-  COSTS_N_INSNS (8),   /* cost of FADD and FSUB insns.  */
-  COSTS_N_INSNS (8),   /* cost of FMUL instruction.  */
-  COSTS_N_INSNS (20),  /* cost of FDIV instruction.  */
-  COSTS_N_INSNS (8),   /* cost of FABS instruction.  */
-  COSTS_N_INSNS (8),   /* cost of FCHS instruction.  */
-  COSTS_N_INSNS (40),  /* cost of FSQRT instruction.  */
+  

Drop edge counts from CFG

2017-10-19 Thread Jan Hubicka
Hi,
this is penultimate patch to convert CFG to new profile_count/probability
infrastructure.  The patch simply drops counts from edges as they can be
determined by applying edge's probability to source BB count. In most case
updating needed is converting e->count use to e->count() but there are
non-trivial changes to updating of profile after threading and propagation of
BBs known to be never executed.  

In the next step I will drop BB frequencies (and add extra mode to profile_count
for count being guessed and local to function) which will allow more cleanups
and then check for profile updating bugs.

Bootstrapped/regtested x86_64-linux, I will commit it after bit of extra
testing.

2017-10-19  Jan Hubicka  

* asan.c (create_cond_insert_point): Do not update edge count.
* auto-profile.c (afdo_propagate_edge): Update for edge count removal.
(afdo_propagate_circuit): Likewise.
(afdo_calculate_branch_prob): Likewise.
(afdo_annotate_cfg): Likewise.
* basic-block.h (struct edge_def): Remove count.
(edge_def::count): New accessor.
* bb-reorder.c (rotate_loop): Update.
(find_traces_1_round): Update.
(connect_traces): Update.
(sanitize_hot_paths): Update.
* cfg.c (unchecked_make_edge): Update.
(make_single_succ_edge): Update.
(check_bb_profile): Update.
(dump_edge_info): Update.
(update_bb_profile_for_threading): Update.
(scale_bbs_frequencies_int): Update.
(scale_bbs_frequencies_gcov_type): Update.
(scale_bbs_frequencies_profile_count): Update.
(scale_bbs_frequencies): Update.
* cfganal.c (connect_infinite_loops_to_exit): Update.
* cfgbuild.c (compute_outgoing_frequencies): Update.
(find_many_sub_basic_blocks): Update.
* cfgcleanup.c (try_forward_edges): Update.
(try_crossjump_to_edge): Update
* cfgexpand.c (expand_gimple_cond): Update
(expand_gimple_tailcall): Update
(construct_exit_block): Update
* cfghooks.c (verify_flow_info): Update
(redirect_edge_succ_nodup): Update
(split_edge): Update
(make_forwarder_block): Update
(duplicate_block): Update
(account_profile_record): Update
* cfgloop.c (find_subloop_latch_edge_by_profile): Update.
* cfgloopanal.c (expected_loop_iterations_unbounded): Update.
* cfgloopmanip.c (scale_loop_profile): Update.
(loopify): Update.
(lv_adjust_loop_entry_edge): Update.
* cfgrtl.c (try_redirect_by_replacing_jump): Update.
(force_nonfallthru_and_redirect): Update.
(purge_dead_edges): Update.
(rtl_flow_call_edges_add): Update.
* cgraphunit.c (init_lowered_empty_function): UPdate.
(cgraph_node::expand_thunk): UPdate.
* gimple-pretty-print.c (dump_probability): UPdate.
(dump_edge_probability): UPdate.
* gimple-ssa-isolate-paths.c (isolate_path): UPdate.
* haifa-sched.c (sched_create_recovery_edges): UPdate.
* hsa-gen.c (convert_switch_statements): UPdate.
* ifcvt.c (dead_or_predicable): UPdate.
* ipa-inline-transform.c (inline_transform): UPdate.
* ipa-split.c (split_function): UPdate.
* ipa-utils.c (ipa_merge_profiles): UPdate.
* loop-doloop.c (add_test): UPdate.
* loop-unroll.c (unroll_loop_runtime_iterations): UPdate.
* lto-streamer-in.c (input_cfg): UPdate.
(input_function): UPdate.
* lto-streamer-out.c (output_cfg): UPdate.
* modulo-sched.c (sms_schedule): UPdate.
* postreload-gcse.c (eliminate_partially_redundant_load): UPdate.
* predict.c (maybe_hot_edge_p): UPdate.
(unlikely_executed_edge_p): UPdate.
(probably_never_executed_edge_p): UPdate.
(dump_prediction): UPdate.
(drop_profile): UPdate.
(propagate_unlikely_bbs_forward): UPdate.
(determine_unlikely_bbs): UPdate.
(force_edge_cold): UPdate.
* profile.c (compute_branch_probabilities): UPdate.
* reg-stack.c (better_edge): UPdate.
* shrink-wrap.c (handle_simple_exit): UPdate.
* tracer.c (better_p): UPdate.
* trans-mem.c (expand_transaction): UPdate.
(split_bb_make_tm_edge): UPdate.
* tree-call-cdce.c: UPdate.
* tree-cfg.c (gimple_find_sub_bbs): UPdate.
(gimple_split_edge): UPdate.
(gimple_duplicate_sese_region): UPdate.
(gimple_duplicate_sese_tail): UPdate.
(gimple_flow_call_edges_add): UPdate.
(insert_cond_bb): UPdate.
(execute_fixup_cfg): UPdate.
* tree-cfgcleanup.c (cleanup_control_expr_graph): UPdate.
* tree-complex.c (expand_complex_div_wide): UPdate.
* tree-eh.c (lower_resx): UPdate.
(unsplit_eh): UPdate.
(cleanup_empty_eh_move_lp): UPdate.
* tree-inline.c (copy_edges_for_bb): UPdate.

Re: [RFC PATCH] Fix pointer diff (was: -fsanitize=pointer-overflow support (PR sanitizer/80998))

2017-10-19 Thread Richard Biener
On Mon, Oct 9, 2017 at 12:55 PM, Marc Glisse  wrote:
> On Mon, 3 Jul 2017, Richard Biener wrote:
>
>> On Sat, 1 Jul 2017, Marc Glisse wrote:
>>
>>> I wrote a quick prototype to see what the fallout would look like.
>>> Surprisingly, it actually passes bootstrap+testsuite on ppc64el with all
>>> languages with no regression. Sure, it is probably not a complete
>>> migration, there are likely a few places still converting to ptrdiff_t
>>> to perform a regular subtraction, but this seems to indicate that the
>>> work isn't as bad as using a signed type in pointer_plus_expr for
>>> instance.
>>
>>
>> The fold_binary_loc hunk looks dangerous (it'll generate MINUS_EXPR
>> from POINTER_MINUS_EXPR in some cases I guess).
>>
>> The tree code needs documenting in tree.def and generic.texi.
>>
>> Otherwise ok(*).
>>
>> Thanks,
>> Richard.
>>
>> (*) ok, just kidding -- or maybe not
>
>
> I updated the prototype a bit. Some things I noticed:
>
> - the C front-end has support for address spaces that seems to imply that
> pointer subtraction (and division by the size) may be done in a type larger
> than ptrdiff_t. Currently, it generates
> (ptrdiff_t)(((inttype)q-(inttype)p)/size) for q-p where inttype is some type
> potentially larger than ptrdiff_t.

It uses a ptrdiff_t corresponding to the respective address space, yes.
That we use sizetype elsewhere unconditionally is a bug :/

 I am thus generating a POINTER_DIFF_EXPR
> with that type, while I was originally hoping its type would always be
> ptrdiff_t. It complicates code and I am not sure I didn't break address
> spaces anyway... (expansion likely needs to do the inttype dance)

I think that's fine.  Ideally targets would provide a type to use for each
respective address space given we have targets that have sizetype smaller
than ptr_mode.

> Are ptrdiff_t (what POINTER_DIFF_EXPR should return) and size_t (what
> POINTER_PLUS_EXPR takes as second argument) always the same type
> signed/unsigned?

POINTER_PLUS_EXPR takes 'sizetype', not size_t.  So the answer is clearly
no.  And yes, that's ugly and broken and should be fixed.

> Counter-examples: m32c (when !TARGET_A16), visium, darwin,
> powerpcspe, s390, vms... and it isn't even always the same that is bigger
> than the other. That's quite messy.

Eh.  Yeah, targets are free to choose 'sizetype' and they do so for
efficiency.  IMHO we should get rid of this "feature".

> - I had to use @@ in match.pd, not for constants, but because in GENERIC we
> sometimes ended up with pointers where operand_equal_p said yes but
> types_match disagreed.
>
> - It currently regresses 2 go tests: net/http runtime/debug

Those are flaky for me and fail sometimes and sometimes not.

+@item POINTER_DIFF_EXPR
+This node represents pointer subtraction.  The two operands always
+have pointer/reference type.  The second operand is always an unsigned
+integer type compatible with sizetype.  It returns a signed integer.

the 2nd sentence looks bogusly copied.

+  /* FIXME.  */
+  if (code == POINTER_DIFF_EXPR)
+   return int_const_binop (MINUS_EXPR,
+   fold_convert (ptrdiff_type_node, arg1),
+   fold_convert (ptrdiff_type_node, arg2));

  wide_int_to_tree (type, wi::to_widest (arg1) - wi::to_widest (arg2));

?

+case POINTER_DIFF_EXPR:
+  {
+   if (!POINTER_TYPE_P (rhs1_type)
+   || !POINTER_TYPE_P (rhs2_type)
+   // || !useless_type_conversion_p (rhs2_type, rhs1_type)

types_compatible_p (rhs1_type, rhs2_type)?

+   // || !useless_type_conversion_p (ptrdiff_type_node, lhs_type))
+   || TREE_CODE (lhs_type) != INTEGER_TYPE
+   || TYPE_UNSIGNED (lhs_type))
+ {
+   error ("type mismatch in pointer diff expression");
+   debug_generic_stmt (lhs_type);
+   debug_generic_stmt (rhs1_type);
+   debug_generic_stmt (rhs2_type);
+   return true;

there's also verify_expr which could want adjustment for newly created
tree kinds.

So if the precision of the result is larger than that of the pointer operands
we sign-extend the result, right?  So the subtraction is performed in precision
of the pointer operands and then sign-extended/truncated to the result type?
Which means it is _not_ a widening subtraction to get around the
half-address-space
issue.  The tree.def documentation should reflect this semantic
detail.  Not sure
if the RTL expansion follows it.

I think that we'd ideally have a single legal INTEGER_TYPE precision
per pointer type precision and that those precisions should match...
we don't have to follow the mistakes of POINTER_PLUS_EXPR.

So ... above verify TYPE_PRECISION (rhs1_type) == TYPE_PRECISION (lhs_type)?
Some targets have 24bit ptr_mode but no 24bit integer type which means the
FE likely chooses 32bit int for the difference computation.  But the middle-end
should be free to create a 24bit precision type with SImode.

Otherwise as said 

Re: [RFC] Make 4-stage PGO bootstrap really working

2017-10-19 Thread Markus Trippelsdorf
On 2017.10.19 at 14:56 +0200, Martin Liška wrote:
> PING^2

> So far so good with a small exception: conftest.gcda files that
> trigger -Wcoverage-mismatch. Can we remove these before a stage? Do we
> do a similar thing somewhere?

I think you should simply remove all these conftest.gcda files before
STAGEfeedback.

-- 
Markus


Re: [PATCH] Fix nrv-1.c false failure on aarch64.

2017-10-19 Thread Richard Biener
On Thu, Oct 19, 2017 at 2:47 PM, Richard Earnshaw (lists)
 wrote:
> On 19/10/17 09:14, Richard Biener wrote:
>> On Wed, Oct 18, 2017 at 6:59 PM, Egeyar Bagcioglu
>>  wrote:
>>> Hello,
>>>
>>> Test case "guality.exp=nrv-1.c" fails on aarch64. Optimizations reorder the
>>> instructions and cause the value of a variable to be checked before its
>>> first assignment. The following patch is moving the
>>> break point to the end of the function. Therefore, it ensures that the break
>>> point is reached after the assignment instruction is executed.
>>>
>>> Please review the patch and apply if legitimate.
>>
>> guality testcases are mostly user-experience tests but they are indeed
>> prone to the usual jumpiness.  As a user we'd expect a breakpoint
>> on this line to trigger only if previous stmts have been committed.
>>
>
> If all the side effects of the later expression occur before any of the
> side effects of the earlier one then this would never happen.
>
> You might be able to concoct dwarf expressions that gave the appearance
> of the expression having been evaluated, but memory dumping would almost
> certainly reveal that memory hadn't really been updated at that point.
>
>> I guess Alex work on stmt frontiers will fix this instance?
>
> Don't stmt frontiers just enable you to identify exactly one stopping
> point with each statement, so that you don't keep repeatedly stepping to
> the same line?

I thought they made sure the point it comes up with is at least somewhat
correlated with the intended point in the excution flow?

If we're allowed to move all those notes to the beginning of the function
with just keeping their order I would miss the point of having them at all.

But I don't remember anything "invalidating" such notes in the patch
so maybe this
is indeed what they are... :/

Richard.

> R.
>
>>
>> Richard.
>>
>>> Egeyar
>>>
>
>


Re: [C++ PATCH] Avoid bogus -Wreturn-local-addr warnings in templates (PR c++/82600)

2017-10-19 Thread Nathan Sidwell

On 10/19/2017 04:15 AM, Jakub Jelinek wrote:

Hi!

A recent change to check_return_expr resulted in
maybe_warn_about_returning_address_of_local being called also with
processing_template_decl.  The problem with that is that the function
relies on folding (fold_for_warn) which isn't performed at all when
processing_template_decl.  So, we have still ARRAY_REF for a local decl
that has pointer type, rather than the expected POINTER_PLUS_EXPR,
where ARRAY_REF would only remain if the variable was actually an array.

The following patch fixes that by not calling the function at all
when processing_template_decl like before.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-10-19  Jakub Jelinek  

PR c++/82600
* typeck.c (check_return_expr): Don't call
maybe_warn_about_returning_address_of_local in templates.



ok, thanks

nathan


--
Nathan Sidwell


[PATCH] Improve tests for error reporting in Filesystem TS

2017-10-19 Thread Jonathan Wakely

Improve some experimental::filesystem tests to ensure that error_code
arguments are cleared, and the VERIFY( !ec ) tests don't pass just
because that was already true and the function didn't touch it.

* testsuite/experimental/filesystem/iterators/
recursive_directory_iterator.cc: Ensure that error_code arguments are
cleared when required.
* testsuite/experimental/filesystem/operations/create_directory.cc:
Remove redundant check.
* testsuite/experimental/filesystem/operations/temp_directory_path.cc:
Ensure that error_code argument is cleared when required.

Tested powerpc64le-linux, committed to trunk.

commit fc430c0f163f814948abe7ae717f117ccae5f802
Author: Jonathan Wakely 
Date:   Thu Oct 19 14:46:36 2017 +0100

Improve tests for error reporting in Filesystem TS

* testsuite/experimental/filesystem/iterators/
recursive_directory_iterator.cc: Ensure that error_code arguments 
are
cleared when required.
* testsuite/experimental/filesystem/operations/create_directory.cc:
Remove redundant check.
* 
testsuite/experimental/filesystem/operations/temp_directory_path.cc:
Ensure that error_code argument is cleared when required.

diff --git 
a/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc
 
b/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc
index e7b5e53d43d..50cc7d45de8 100644
--- 
a/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc
+++ 
b/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc
@@ -28,6 +28,7 @@ namespace fs = std::experimental::filesystem;
 void
 test01()
 {
+  const std::error_code bad_ec = make_error_code(std::errc::invalid_argument);
   std::error_code ec;
 
   // Test non-existent path.
@@ -37,15 +38,19 @@ test01()
   VERIFY( iter == end(iter) );
 
   // Test empty directory.
+  ec = bad_ec;
   create_directory(p, fs::current_path(), ec);
   VERIFY( !ec );
+  ec = bad_ec;
   iter = fs::recursive_directory_iterator(p, ec);
   VERIFY( !ec );
   VERIFY( iter == end(iter) );
 
   // Test non-empty directory.
-  create_directories(p / "d1/d2");
+  ec = bad_ec;
+  create_directories(p / "d1/d2", ec);
   VERIFY( !ec );
+  ec = bad_ec;
   iter = fs::recursive_directory_iterator(p, ec);
   VERIFY( !ec );
   VERIFY( iter != end(iter) );
@@ -56,6 +61,7 @@ test01()
   VERIFY( iter == end(iter) );
 
   // Test inaccessible directory.
+  ec = bad_ec;
   permissions(p, fs::perms::none, ec);
   VERIFY( !ec );
   iter = fs::recursive_directory_iterator(p, ec);
@@ -64,15 +70,19 @@ test01()
 
   // Test inaccessible directory, skipping permission denied.
   const auto opts = fs::directory_options::skip_permission_denied;
+  ec = bad_ec;
   iter = fs::recursive_directory_iterator(p, opts, ec);
   VERIFY( !ec );
   VERIFY( iter == end(iter) );
 
   // Test inaccessible sub-directory.
+  ec = bad_ec;
   permissions(p, fs::perms::owner_all, ec);
   VERIFY( !ec );
+  ec = bad_ec;
   permissions(p/"d1/d2", fs::perms::none, ec);
   VERIFY( !ec );
+  ec = bad_ec;
   iter = fs::recursive_directory_iterator(p, ec);
   VERIFY( !ec );
   VERIFY( iter != end(iter) );
@@ -84,12 +94,14 @@ test01()
   VERIFY( iter == end(iter) );
 
   // Test inaccessible sub-directory, skipping permission denied.
+  ec = bad_ec;
   iter = fs::recursive_directory_iterator(p, opts, ec);
   VERIFY( !ec );
   VERIFY( iter != end(iter) );
   VERIFY( iter->path() == p/"d1" );
   ++iter;  // should recurse into d1
   VERIFY( iter->path() == p/"d1/d2" );
+  ec = bad_ec;
   iter.increment(ec);  // should fail to recurse into p/d1/d2, so skip it
   VERIFY( !ec );
   VERIFY( iter == end(iter) );
@@ -101,12 +113,15 @@ test01()
 void
 test02()
 {
+  const std::error_code bad_ec = make_error_code(std::errc::invalid_argument);
   std::error_code ec;
   const auto p = __gnu_test::nonexistent_path();
+  ec = bad_ec;
   create_directories(p / "d1/d2", ec);
   VERIFY( !ec );
 
   // Test post-increment (libstdc++/71005)
+  ec = bad_ec;
   auto iter = fs::recursive_directory_iterator(p, ec);
   VERIFY( !ec );
   VERIFY( iter != end(iter) );
@@ -126,7 +141,7 @@ test02()
 void
 test03()
 {
-  std::error_code ec;
+  std::error_code ec = make_error_code(std::errc::invalid_argument);
   const auto p = __gnu_test::nonexistent_path();
   create_directories(p / "longer_than_small_string_buffer", ec);
   VERIFY( !ec );
diff --git 
a/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directory.cc 
b/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directory.cc
index f1c50a6e8e3..85e8281a6e1 100644
--- 
a/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directory.cc
+++ 
b/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directory.cc
@@ -50,7 +50,6 @@ test01()
   VERIFY( !ec );
   

[PATCH] Fix path::iterator post-increment and post-decrement

2017-10-19 Thread Jonathan Wakely

I made a dumb mistake in the post-inc and post-dec operators for
the path::iterator type, forgetting that _M_cur is sometimes null (for
a single-element path).

* include/experimental/bits/fs_path.h (path::iterator++(int))
(path::iterator--(int)): Fix for paths with only one component.
* testsuite/experimental/filesystem/path/itr/traversal.cc: Test
post-increment and post-decrement.

Tested powerpc64le-linux, committed to trunk.

It's too late to fix this on the gcc-5 branch but I'll backport it to
gcc-6 and gcc-7 soon.

commit 1c773ea81a8781dfbb4381a496aa5bab5bd623de
Author: Jonathan Wakely 
Date:   Wed Oct 18 02:47:24 2017 +0100

Fix path::iterator post-increment and post-decrement

* include/experimental/bits/fs_path.h (path::iterator++(int))
(path::iterator--(int)): Fix for paths with only one component.
* testsuite/experimental/filesystem/path/itr/traversal.cc: Test
post-increment and post-decrement.

diff --git a/libstdc++-v3/include/experimental/bits/fs_path.h 
b/libstdc++-v3/include/experimental/bits/fs_path.h
index cde3897b8e5..9121439b7f2 100644
--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -725,10 +725,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 pointer   operator->() const { return std::__addressof(**this); }
 
 iterator& operator++();
-iterator  operator++(int) { auto __tmp = *this; ++_M_cur; return __tmp; }
+iterator  operator++(int) { auto __tmp = *this; ++*this; return __tmp; }
 
 iterator& operator--();
-iterator  operator--(int) { auto __tmp = *this; --_M_cur; return __tmp; }
+iterator  operator--(int) { auto __tmp = *this; --*this; return __tmp; }
 
 friend bool operator==(const iterator& __lhs, const iterator& __rhs)
 { return __lhs._M_equals(__rhs); }
diff --git 
a/libstdc++-v3/testsuite/experimental/filesystem/path/itr/traversal.cc 
b/libstdc++-v3/testsuite/experimental/filesystem/path/itr/traversal.cc
index bc1091476b5..dbb4d46796d 100644
--- a/libstdc++-v3/testsuite/experimental/filesystem/path/itr/traversal.cc
+++ b/libstdc++-v3/testsuite/experimental/filesystem/path/itr/traversal.cc
@@ -79,9 +79,28 @@ test02()
   }
 }
 
+void
+test03()
+{
+  path paths[] = { "single", "multiple/elements" };
+  for (const path& p : paths)
+for (auto iter = p.begin(); iter != p.end(); ++iter)
+{
+  auto iter2 = iter;
+  ++iter;
+  iter2++;
+  VERIFY( iter2 == iter );
+  auto iter3 = iter;
+  --iter3;
+  iter2--;
+  VERIFY( iter2 == iter3 );
+}
+}
+
 int
 main()
 {
   test01();
   test02();
+  test03();
 }


[PATCH] Update references to C++17 in libstdc++ manual

2017-10-19 Thread Jonathan Wakely

* doc/xml/manual/status_cxx2017.xml: Update references to C++17
section numbers.

Committed to trunk.

commit 834582b0c91337b031fa610e08c81ff1f2087f53
Author: Jonathan Wakely 
Date:   Thu Oct 19 14:38:40 2017 +0100

Update references to C++17 in libstdc++ manual

* doc/xml/manual/status_cxx2017.xml: Update references to C++17
section numbers.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml 
b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index fd66ac503a8..b5a65be12c9 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -935,49 +935,49 @@ Feature-testing recommendations for C++.

 

-  20.6.5 [optional.bad_optional_access]
+  23.6.5 [optional.bad_optional_access]
   what() returns "bad optional access".

 

-  20.7.2 [variant.variant]
+  23.7.3 [variant.variant]
   Whether variant supports over-aligned types
   should be documented here.

 

-  20.7.10 [variant.bad.access]
+  23.7.10 [variant.bad.access]
   what() returns "Unexpected index".

 

-  20.12.5.2 [memory.resource.pool.options]
+  23.12.5.2 [memory.resource.pool.options]
   The limits for maximum number of blocks and largest allocation size
   supported by pool_options should be documented
   here.

 

-  20.12.6.1 [memory.resource.monotonic.buffer.ctor]
+  23.12.6.1 [memory.resource.monotonic.buffer.ctor]
   The default next_buffer_size and growth factor should
   be documented here.

 

-  20.15.4.3 [meta.unary.prop]
+  23.15.4.3 [meta.unary.prop]
   The predicate condition for
   has_unique_object_representations is true for all scalar
   types except floating point types.

 

-  20.19.3 [execpol.type],
-  25.2.3 [algorithms.parallel.exec]
+  23.19.3 [execpol.type],
+  28.4.3 [algorithms.parallel.exec]
   There are no implementation-defined execution policies.

 

-  22.4.2 [string.view.template]
+  24.4.2 [string.view.template]
   basic_string_viewC, T::iterator is
   C* and
   basic_string_viewC, T::const_iterator is
@@ -986,7 +986,7 @@ Feature-testing recommendations for C++.
 
 

-  25.2.3 [algorithms.parallel.exec]
+  28.4.3 [algorithms.parallel.exec]
   Threads of execution created by std::thread
   provide concurrent forward progress guarantees, so threads of execution
   implicitly created by the library will provide parallel forward
@@ -994,39 +994,39 @@ Feature-testing recommendations for C++.

 

-  26.4.1 [cfenv.syn]
+  29.4.1 [cfenv.syn]
   The effects of the cfenv functions
   depends on whether the FENV_ACCESS pragma is supported,
   and on the C library that provides the header.

 

-  26.6.9 [c.math.rand]
+  29.6.9 [c.math.rand]
   Whether the rand function may introduce data
   races depends on the target C library that provides the function.

 

-  26.9.5 [sf.cmath]
+  29.9.5 [sf.cmath]
   The effect of calling the mathematical special functions with large
   inputs should be documented here.

 

-  27.10.2.1 [fs.conform.9945]
+  30.10.2.1 [fs.conform.9945]
   The behavior of the filesystem library implementation will depend on
   the target operating system. Some features will not be not supported
   on some targets.

 

-  27.10.6 [fs.filesystem.syn]
+  30.10.5 [fs.filesystem.syn]
   The clock used for file times is
   std::chrono::system_clock.

 

-  27.10.8 [path.generic]
+  30.10.7.1 [fs.path.generic]
   dot-dot in the root-directory refers to the root-directory itself.

 


[PATCH] Use __LONG_LONG_MAX__ instead of LONG_LONG_MAX in test

2017-10-19 Thread Jonathan Wakely

The compiler always defines __LONG_LONG_MAX__, so use that.

* testsuite/decimal/conversion-to-integral.cc: Use predefined macro
instead of non-standard glibc one.

Tested powerpc64le-linux, committed to trunk.

commit 64dcf27be075febc17f9823e3d8155476f99fffa
Author: Jonathan Wakely 
Date:   Thu Oct 19 14:36:20 2017 +0100

Use __LONG_LONG_MAX__ instead of LONG_LONG_MAX in test

* testsuite/decimal/conversion-to-integral.cc: Use predefined macro
instead of non-standard glibc one.

diff --git a/libstdc++-v3/testsuite/decimal/conversion-to-integral.cc 
b/libstdc++-v3/testsuite/decimal/conversion-to-integral.cc
index 6fd59a67b70..cc48fa8729d 100644
--- a/libstdc++-v3/testsuite/decimal/conversion-to-integral.cc
+++ b/libstdc++-v3/testsuite/decimal/conversion-to-integral.cc
@@ -64,7 +64,7 @@ void
 conversion_to_integral_128 (void)
 {
   #undef MAXVAL
-  #define MAXVAL LONG_LONG_MAX
+  #define MAXVAL __LONG_LONG_MAX__
   decimal128 a, b (1), c (-1), d (MAXVAL), e (-MAXVAL);
   long long ll;
 


Re: PR82575, lto debugobj references __gnu_lto_slim, ld test liblto-17 fails

2017-10-19 Thread Richard Biener
On Fri, 20 Oct 2017, Alan Modra wrote:

> Bootstrapped and regression tested powerpc64le-linux.  OK for trunk?

Ok.

Thanks,
Richard.

>   PR lto/82575
>   * simple-object-elf.c (simple_object_elf_copy_lto_debug_sections):
>   Make discarded non-local symbols weak and hidden.
> 
> diff --git a/libiberty/simple-object-elf.c b/libiberty/simple-object-elf.c
> index c394924..1afd3eb 100644
> --- a/libiberty/simple-object-elf.c
> +++ b/libiberty/simple-object-elf.c
> @@ -236,8 +236,10 @@ typedef struct
>  
>  #define STB_LOCAL0   /* Local symbol */
>  #define STB_GLOBAL   1   /* Global symbol */
> +#define STB_WEAK 2   /* Weak global */
>  
>  #define STV_DEFAULT  0   /* Visibility is specified by binding type */
> +#define STV_HIDDEN   2   /* Can only be seen inside currect component */
>  
>  /* Functions to fetch and store different ELF types, depending on the
> endianness and size.  */
> @@ -1365,18 +1367,25 @@ simple_object_elf_copy_lto_debug_sections 
> (simple_object_read *sobj,
>   {
> /* Make discarded symbols undefined and unnamed
>in case it is local.  */
> -   if (ELF_ST_BIND (*st_info) == STB_LOCAL)
> - ELF_SET_FIELD (type_functions, ei_class, Sym,
> -ent, st_name, Elf_Word, 0);
> +   int bind = ELF_ST_BIND (*st_info);
> +   if (bind == STB_LOCAL)
> + {
> +   ELF_SET_FIELD (type_functions, ei_class, Sym,
> +  ent, st_name, Elf_Word, 0);
> +   *st_other = STV_DEFAULT;
> + }
> +   else
> + {
> +   bind = STB_WEAK;
> +   *st_other = STV_HIDDEN;
> + }
> +   *st_info = ELF_ST_INFO (bind, STT_NOTYPE);
> ELF_SET_FIELD (type_functions, ei_class, Sym,
>ent, st_value, Elf_Addr, 0);
> ELF_SET_FIELD (type_functions, ei_class, Sym,
>ent, st_size, Elf_Word, 0);
> ELF_SET_FIELD (type_functions, ei_class, Sym,
>ent, st_shndx, Elf_Half, SHN_UNDEF);
> -   *st_info = ELF_ST_INFO (ELF_ST_BIND (*st_info),
> -   STT_NOTYPE);
> -   *st_other = STV_DEFAULT;
>   }
>   }
> XDELETEVEC (strings);
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] PR libgcc/59714 complex division is surprising on aarch64

2017-10-19 Thread Richard Earnshaw (lists)
On 19/10/17 14:07, Wilco Dijkstra wrote:
> Vladimir wrote:
> 
> +# Disable floating-point expression contraction
> +LIBGCC2_FFP_CONTRAST_CFLAGS = -ffp-contract=off
> +
> 
> It looks like this disables fp-contract in all of libgcc...
> What is the the number of FMAs in libgcc before/after?
> 
> If it disables anything other than the ones in complex division, it
> would have an unintended performance impact. It's best to do
> this just for complex division.
> 
> Wilco
> 

It's probably better to do this with an attribute

__attribute__((optimize("fp-contract=off")))

on the affected functions.

R.


PR82575, lto debugobj references __gnu_lto_slim, ld test liblto-17 fails

2017-10-19 Thread Alan Modra
Bootstrapped and regression tested powerpc64le-linux.  OK for trunk?

PR lto/82575
* simple-object-elf.c (simple_object_elf_copy_lto_debug_sections):
Make discarded non-local symbols weak and hidden.

diff --git a/libiberty/simple-object-elf.c b/libiberty/simple-object-elf.c
index c394924..1afd3eb 100644
--- a/libiberty/simple-object-elf.c
+++ b/libiberty/simple-object-elf.c
@@ -236,8 +236,10 @@ typedef struct
 
 #define STB_LOCAL  0   /* Local symbol */
 #define STB_GLOBAL 1   /* Global symbol */
+#define STB_WEAK   2   /* Weak global */
 
 #define STV_DEFAULT0   /* Visibility is specified by binding type */
+#define STV_HIDDEN 2   /* Can only be seen inside currect component */
 
 /* Functions to fetch and store different ELF types, depending on the
endianness and size.  */
@@ -1365,18 +1367,25 @@ simple_object_elf_copy_lto_debug_sections 
(simple_object_read *sobj,
{
  /* Make discarded symbols undefined and unnamed
 in case it is local.  */
- if (ELF_ST_BIND (*st_info) == STB_LOCAL)
-   ELF_SET_FIELD (type_functions, ei_class, Sym,
-  ent, st_name, Elf_Word, 0);
+ int bind = ELF_ST_BIND (*st_info);
+ if (bind == STB_LOCAL)
+   {
+ ELF_SET_FIELD (type_functions, ei_class, Sym,
+ent, st_name, Elf_Word, 0);
+ *st_other = STV_DEFAULT;
+   }
+ else
+   {
+ bind = STB_WEAK;
+ *st_other = STV_HIDDEN;
+   }
+ *st_info = ELF_ST_INFO (bind, STT_NOTYPE);
  ELF_SET_FIELD (type_functions, ei_class, Sym,
 ent, st_value, Elf_Addr, 0);
  ELF_SET_FIELD (type_functions, ei_class, Sym,
 ent, st_size, Elf_Word, 0);
  ELF_SET_FIELD (type_functions, ei_class, Sym,
 ent, st_shndx, Elf_Half, SHN_UNDEF);
- *st_info = ELF_ST_INFO (ELF_ST_BIND (*st_info),
- STT_NOTYPE);
- *st_other = STV_DEFAULT;
}
}
  XDELETEVEC (strings);

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH GCC][3/3]Refine CFG and bound information for split loops

2017-10-19 Thread Bin Cheng
Hi,
This is a rework of patch at 
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01037.html.
The new patch doesn't try to handle all cases, instead, it only handles obvious 
cases.
It also tries to add tests illustrating different cases handled.
Bootstrap and test for patch set on x86_64 and AArch64.  Comments?

Thanks,
bin
2017-10-16  Bin Cheng  

* tree-ssa-loop-split.c (compute_new_first_bound): New parameter.
Compute and return bound information for the second split loop.
(adjust_loop_split): New function.
(split_loop): Update use and call above function.

gcc/testsuite/ChangeLog
2017-10-16  Bin Cheng  

* gcc.dg/loop-split-1.c: New test.
* gcc.dg/loop-split-2.c: New test.
* gcc.dg/loop-split-3.c: New test.

[PATCH GCC][2/3]Simplify ((A +- CST1 CMP A +- CST2)) for undefined overflow type

2017-10-19 Thread Bin Cheng
Hi,
This patch adds pattern simplifying (A +- CST1 CMP A +- CST2) for undefined 
overflow types.
Bootstrap and test for patch set on x86_64 and AArch64.  Comments?

Thanks,
bin
2017-10-16  Bin Cheng  

* match.pd (A +- CST1 CMP A +- CST2): New pattern.From 6e31cde6560366242c15039a5b3032f5425750e0 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Thu, 10 Aug 2017 17:29:22 +0100
Subject: [PATCH 2/3] simplify-AopCst1-cmp-AopCst2-20170806.txt

---
 gcc/match.pd | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 64b023d..dae0f1c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3485,7 +3485,30 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (if (cmp == LE_EXPR)
 	 (ge (convert:st @0) { build_zero_cst (st); })
 	 (lt (convert:st @0) { build_zero_cst (st); }))
- 
+
+/* A +- CST1 CMP A +- CST2 in type with undefined overflow behavior.  */
+(for cmp  (lt gt le ge)
+ (for xop (plus minus)
+  (for yop (plus minus)
+   (simplify
+(cmp (xop @0 INTEGER_CST@1) (yop @0 INTEGER_CST@2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+	 && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
+	 && types_compatible_p (TREE_TYPE (@1), TREE_TYPE (@2)))
+ (with
+  {
+	tree cst1 = @1, cst2 = @2, zero = build_zero_cst (TREE_TYPE (@1));
+	if (xop == MINUS_EXPR)
+	  cst1 = int_const_binop (MINUS_EXPR, zero, cst1);
+	if (yop == MINUS_EXPR)
+	  cst2 = int_const_binop (MINUS_EXPR, zero, cst2);
+
+fold_overflow_warning (("assuming signed overflow does not occur "
+"when simplifying A +- CST cmp A +- CST"),
+			   WARN_STRICT_OVERFLOW_CONDITIONAL);
+  }
+  (cmp { cst1; } { cst2; })))
+
 (for cmp (unordered ordered unlt unle ungt unge uneq ltgt)
  /* If the second operand is NaN, the result is constant.  */
  (simplify
-- 
1.9.1



[PATCH GCC][1/3]Simplify (A + CST cmp A -> CST cmp zero) for undefined overflow type

2017-10-19 Thread Bin Cheng
Hi,
This is a rework of patch set at 
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01036.html
and https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01037.html.  The patch set 
improves niters
bound analysis for split loop.  Instead of feeding bound computation to generic 
folder, this
patch simplifies (A + CST cmp A  ->  CST cmp zero) for types with undefined 
overflow behavior.
Bootstrap and test for patch set on x86_64 and AArch64.  Comments?

Thanks,
bin
2017-10-16  Bin Cheng  

* match.pd (A + CST cmp A  ->  CST cmp zero): New simplification
for undefined overflow types in (A + CST CMP A  ->  A CMP' CST').From 9eb5d484235b97ed6e4e5f153dd7f159d7365f38 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Mon, 16 Oct 2017 14:24:10 +0100
Subject: [PATCH 1/3] simplify-AopCst-cmp-A-20171006.txt

---
 gcc/match.pd | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index f2c4373..64b023d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3518,7 +3518,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* When one argument is a constant, overflow detection can be simplified.
Currently restricted to single use so as not to interfere too much with
ADD_OVERFLOW detection in tree-ssa-math-opts.c.
-   A + CST CMP A  ->  A CMP' CST' */
+   A + CST CMP A  ->  A CMP' CST'
+
+   For type with undefined overflow behavior, the expression can also be
+   simplified by assuming overflow won't happen.
+   A + CST cmp A  -> CST cmp zero.  */
 (for cmp (lt le ge gt)
  out (gt gt le le)
  (simplify
@@ -3530,7 +3534,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(with { unsigned int prec = TYPE_PRECISION (TREE_TYPE (@0)); }
 (out @0 { wide_int_to_tree (TREE_TYPE (@0),
 			wi::max_value (prec, UNSIGNED)
-- wi::to_wide (@1)); })
+- wi::to_wide (@1)); }))
+   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+&& TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
+(with
+ {
+   tree zero = build_zero_cst (TREE_TYPE (@1));
+
+   fold_overflow_warning (("assuming signed overflow does not occur "
+			   "when simplifying A + CST cmp A"),
+			  WARN_STRICT_OVERFLOW_CONDITIONAL);
+ }
+ (cmp @1 { zero; }))
 
 /* To detect overflow in unsigned A - B, A < B is simpler than A - B > A.
However, the detection logic for SUB_OVERFLOW in tree-ssa-math-opts.c
-- 
1.9.1



[ARM] PR 82445 - suppress 32-bit aligned ldrd/strd peepholing with -mno-unaligned-access

2017-10-19 Thread Richard Earnshaw (lists)
Peephole patterns exist in the arm backend to spot load/store
operations to adjacent memory operations in order to convert them into
ldrd/strd instructions.  However, when we have strict alignment
enforced, then we can only do this if the accesses are known to be
64-bit aligned; this is unlikely to be the case for most loads.  The
patch adds some alignment checking to the code that validates the
addresses for use in the peephole patterns.  This should also fix
incorrect generation of ldrd/strd with unaligned accesses that could
previously have occurred on ARMv5e where all such operations must be
64-bit aligned.

I've added some new tests as well.  In doing so I discovered that the
ldrd/strd peephole tests could never fail since they would match the
source file name in the scanned assembly as well as any instructions
of the intended type.  I've fixed those by tightening the scan results
slightly.

gcc:

PR target/82445
* config/arm/arm.c (align_ok_ldrd_strd): New function.
(mem_ok_for_ldrd_strd): New parameter align.  Extract the   
alignment of the mem into it.
(gen_operands_ldrd_strd): Validate the alignment of the
accesses.

testsuite:

PR target/82445
* gcc.target/arm/peep-ldrd-1.c: Tighten test scan pattern.
* gcc.target/arm/peep-strd-1.c: Likewise.
* gcc.target/arm/peep-ldrd-2.c: New test.
* gcc.target/arm/peep-strd-2.c: New test.

Committed to all active branches (trunk, 6 & 7).
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 622218c..879be49 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15288,12 +15288,23 @@ operands_ok_ldrd_strd (rtx rt, rtx rt2, rtx rn, HOST_WIDE_INT offset,
   return true;
 }
 
+/* Return true if a 64-bit access with alignment ALIGN and with a
+   constant offset OFFSET from the base pointer is permitted on this
+   architecture.  */
+static bool
+align_ok_ldrd_strd (HOST_WIDE_INT align, HOST_WIDE_INT offset)
+{
+  return (unaligned_access
+	  ? (align >= BITS_PER_WORD && (offset & 3) == 0)
+	  : (align >= 2 * BITS_PER_WORD && (offset & 7) == 0));
+}
+
 /* Helper for gen_operands_ldrd_strd.  Returns true iff the memory
operand MEM's address contains an immediate offset from the base
-   register and has no side effects, in which case it sets BASE and
-   OFFSET accordingly.  */
+   register and has no side effects, in which case it sets BASE,
+   OFFSET and ALIGN accordingly.  */
 static bool
-mem_ok_for_ldrd_strd (rtx mem, rtx *base, rtx *offset)
+mem_ok_for_ldrd_strd (rtx mem, rtx *base, rtx *offset, HOST_WIDE_INT *align)
 {
   rtx addr;
 
@@ -15312,6 +15323,7 @@ mem_ok_for_ldrd_strd (rtx mem, rtx *base, rtx *offset)
   gcc_assert (MEM_P (mem));
 
   *offset = const0_rtx;
+  *align = MEM_ALIGN (mem);
 
   addr = XEXP (mem, 0);
 
@@ -15352,7 +15364,7 @@ gen_operands_ldrd_strd (rtx *operands, bool load,
 bool const_store, bool commute)
 {
   int nops = 2;
-  HOST_WIDE_INT offsets[2], offset;
+  HOST_WIDE_INT offsets[2], offset, align[2];
   rtx base = NULL_RTX;
   rtx cur_base, cur_offset, tmp;
   int i, gap;
@@ -15364,7 +15376,8 @@ gen_operands_ldrd_strd (rtx *operands, bool load,
  registers, and the corresponding memory offsets.  */
   for (i = 0; i < nops; i++)
 {
-  if (!mem_ok_for_ldrd_strd (operands[nops+i], _base, _offset))
+  if (!mem_ok_for_ldrd_strd (operands[nops+i], _base, _offset,
+ [i]))
 return false;
 
   if (i == 0)
@@ -15478,6 +15491,7 @@ gen_operands_ldrd_strd (rtx *operands, bool load,
   /* Swap the instructions such that lower memory is accessed first.  */
   std::swap (operands[0], operands[1]);
   std::swap (operands[2], operands[3]);
+  std::swap (align[0], align[1]);
   if (const_store)
 std::swap (operands[4], operands[5]);
 }
@@ -15491,6 +15505,9 @@ gen_operands_ldrd_strd (rtx *operands, bool load,
   if (gap != 4)
 return false;
 
+  if (!align_ok_ldrd_strd (align[0], offset))
+return false;
+
   /* Make sure we generate legal instructions.  */
   if (operands_ok_ldrd_strd (operands[0], operands[1], base, offset,
  false, load))
diff --git a/gcc/testsuite/gcc.target/arm/peep-ldrd-1.c b/gcc/testsuite/gcc.target/arm/peep-ldrd-1.c
index eb2b86e..d49eff6 100644
--- a/gcc/testsuite/gcc.target/arm/peep-ldrd-1.c
+++ b/gcc/testsuite/gcc.target/arm/peep-ldrd-1.c
@@ -8,4 +8,4 @@ int foo(int a, int b, int* p, int *q)
   *p = a;
   return a;
 }
-/* { dg-final { scan-assembler "ldrd" } } */
+/* { dg-final { scan-assembler "ldrd\\t" } } */
diff --git a/gcc/testsuite/gcc.target/arm/peep-ldrd-2.c b/gcc/testsuite/gcc.target/arm/peep-ldrd-2.c
new file mode 100644
index 000..6822c2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/peep-ldrd-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_prefer_ldrd_strd } */
+/* { dg-options "-O2 -mno-unaligned-access" }  */
+int foo(int a, int 

Re: [PATCH] PR libgcc/59714 complex division is surprising on aarch64

2017-10-19 Thread Wilco Dijkstra
Vladimir wrote:

+# Disable floating-point expression contraction
+LIBGCC2_FFP_CONTRAST_CFLAGS = -ffp-contract=off
+

It looks like this disables fp-contract in all of libgcc...
What is the the number of FMAs in libgcc before/after?

If it disables anything other than the ones in complex division, it
would have an unintended performance impact. It's best to do
this just for complex division.

Wilco

Re: [patch] Add -static-libquadmath option

2017-10-19 Thread Janus Weil
Ping!

Is there any hope to get this 3-year-old patch to trunk after all? (It
seems there was at least a review of the Fortran parts, which has not
seen a reply ...)

Cheers,
Janus



2014-10-15 8:49 GMT+02:00 FX :
> ping
>
> Patch needs:
>   - C/driver options maintainer, or global reviewer, to OK the C changes 
> (outside darwin). They are really simple
>   - Fortran maintainer to OK the Fortran part
>
>
>> Version 2 of the patch, now handling the darwin case (thanks Iain) and 
>> expressely noting in the documentation the implications on redistribution 
>> (thanks Joseph).
>> Bootstrapped and regtested on x86_64-apple-darwin14.
>>
>> OK to commit?
>>
>> I need a C/driver options maintainer, or global reviewer, to OK the C 
>> changes (but they really are obvious).
>> As Iain suggested the darwin.h change, I consider it pre-approved by him :)
>>
>>
>>
>>
>>> We have a -static-libgfortran option, but on targets where we support 
>>> quad-prec math through libquadmath, we didn’t have an equivalent 
>>> -static-libquadmath so far. This patch adds it, in what I think is a rather 
>>> straightforward manner.
>>>
>>> The only minor complication comes from the fact that previously, linking 
>>> libquadmath was conditionally through libgfortran.spec. So the spec was 
>>> modified: when we use -static-libquadmath, the driver itself includes the 
>>> -lquadmath (surrounded by the necessary static linking directives), so that 
>>> libgfortran.spec shouldn’t do it again.
>>>
>>> Bootstrapped and regtested on x86_64 linux. OK to commit?


Re: [RFC] Make 4-stage PGO bootstrap really working

2017-10-19 Thread Martin Liška
PING^2

On 09/14/2017 02:20 PM, Martin Liška wrote:
> PING^1
> 
> On 08/30/2017 11:45 AM, Martin Liška wrote:
>> Hi.
>>
>> This is follow up which I've just noticed. Main problem we have is that
>> an instrumented compiler w/ -fprofile-generate (built in $OBJDIR/gcc 
>> subfolder)
>> will generate all *.gcda files in a same dir as *.o files. That's problematic
>> because we then have *.gcda files spread in 'profile' subfolder (because 
>> profile'
>> compiler builds libgcc) and 'train' subfolder. Eventually in 'feedback' stage
>> we don't load any *.gcda files :/
>>
>> Well I really hope we need to set -fprofile-generate=$folder to a $folder. 
>> There comes
>> second problem: all *.gcda files are created as $folder/$aux_base_name.gcda 
>> which makes
>> it useless as we multiple same file names:
>>
>> $ find . -name expr.c
>> ./libcpp/expr.c
>> ./gcc/expr.c
>>
>> Thus I suggest patch #0001 that appends full path of current work dir. Patch 
>> #0002 sets
>> a folder for PGO bootstrap. So far so good with a small exception: 
>> conftest.gcda files
>> that trigger -Wcoverage-mismatch. Can we remove these before a stage? Do we 
>> do a similar
>> thing somewhere?
>>
>> Thoughts?
>> Thanks,
>> Martin
>>
> 



Re: [PATCH] Fix nrv-1.c false failure on aarch64.

2017-10-19 Thread Richard Earnshaw (lists)
On 19/10/17 09:14, Richard Biener wrote:
> On Wed, Oct 18, 2017 at 6:59 PM, Egeyar Bagcioglu
>  wrote:
>> Hello,
>>
>> Test case "guality.exp=nrv-1.c" fails on aarch64. Optimizations reorder the
>> instructions and cause the value of a variable to be checked before its
>> first assignment. The following patch is moving the
>> break point to the end of the function. Therefore, it ensures that the break
>> point is reached after the assignment instruction is executed.
>>
>> Please review the patch and apply if legitimate.
> 
> guality testcases are mostly user-experience tests but they are indeed
> prone to the usual jumpiness.  As a user we'd expect a breakpoint
> on this line to trigger only if previous stmts have been committed.
> 

If all the side effects of the later expression occur before any of the
side effects of the earlier one then this would never happen.

You might be able to concoct dwarf expressions that gave the appearance
of the expression having been evaluated, but memory dumping would almost
certainly reveal that memory hadn't really been updated at that point.

> I guess Alex work on stmt frontiers will fix this instance?

Don't stmt frontiers just enable you to identify exactly one stopping
point with each statement, so that you don't keep repeatedly stepping to
the same line?

R.

> 
> Richard.
> 
>> Egeyar
>>




Re: [PATCH] Add offset_int to guard HOST_WIDE_INT overflow (PR tree-optimization/82042).

2017-10-19 Thread Richard Biener
On Thu, Oct 19, 2017 at 1:21 PM, Martin Liška  wrote:
> On 09/20/2017 05:47 PM, Richard Biener wrote:
>> On Wed, Sep 20, 2017 at 9:50 AM, Martin Liška  wrote:
>>> Hi.
>>>
>>> This is partial fix for the PR, where I calculate 2 offsets in offset_int 
>>> type.
>>>
>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>
>>> Ready to be installed?
>>
>> That seems to poorly handle the cases in offset_overlap_p.  What kind of
>> overflow do we see here?
>
> We hit:
>
> (const:DI (plus:DI (symbol_ref:DI ("a") [flags 0x40]  0x77fe6ea0 a>)
> (const_int -9223372036854775805 [0x8003] 
> "/home/marxin/Programming/testcases/pr82042.c":8 81 {*movdi_internal}
>
> and:
>
> (const:DI (plus:DI (symbol_ref:DI ("a") [flags 0x40]  0x77fe6ea0 a>)
> (const_int 9223372036854775806 [0x7ffe] 81 
> {*movdi_internal}
>
> The UBSAN error is:
> ../../gcc/tree-ssa-alias.c:704:30: runtime error: signed integer overflow: 
> 9223372036854775806 * 8 cannot be represented in type 'long int'

Huh, but that's a different error?  You are patching alias.c ...

I guess semantically we need to make 'c' sth that is big enough to handle
address differences in bytes (offset_int is for bits) and with
arbitrary positive/negative
value.  offset_int would work of course.   The testcase for the MEMs above is
artificial I guess (those are big offsets to a symbol...).

There are many more places we process offsets that way, for example
in nonoverlapping_memrefs_p (and I bet tree-ssa-alias.c as well).

In the end I don't think we want to slow down code just for the sake of
UBSAN.  IMHO for code invoking undefined behavior (object too large)
it's reasonable for the compiler to invoke undefined behavior ;)

Richard.

> Martin
>
>>
>> Richard.
>>
>>> Martin
>>>
>>> gcc/ChangeLog:
>>>
>>> 2017-09-20  Martin Liska  
>>>
>>> PR tree-optimization/82042
>>> * alias.c (memrefs_conflict_p): Calculate offset in offset_int
>>> type.
>>> * cse.c (use_related_value): Likewise.
>>> ---
>>>  gcc/alias.c | 10 --
>>>  gcc/cse.c   | 14 ++
>>>  2 files changed, 18 insertions(+), 6 deletions(-)
>>>
>>>
>


Re: [PATCH] Fix UBSAN errors in dse.c (PR rtl-optimization/82044).

2017-10-19 Thread Martin Liška
On 09/20/2017 10:15 AM, Jakub Jelinek wrote:
> On Wed, Sep 20, 2017 at 09:50:32AM +0200, Martin Liška wrote:
>> Hello.
>>
>> Following patch handles UBSAN (overflow) in dce.c.
> 
> dse.c ;)
> 
>> --- a/gcc/dse.c
>> +++ b/gcc/dse.c
>> @@ -929,7 +929,9 @@ set_usage_bits (group_info *group, HOST_WIDE_INT offset, 
>> HOST_WIDE_INT width,
>>  {
>>HOST_WIDE_INT i;
>>bool expr_escapes = can_escape (expr);
>> -  if (offset > -MAX_OFFSET && offset + width < MAX_OFFSET)
>> +  if (offset > -MAX_OFFSET
>> +  && offset < MAX_OFFSET
>> +  && offset + width < MAX_OFFSET)
> 
> This can still overflow if width is close to HOST_WIDE_INT_MAX.
> Anyway, I don't remember this code too much, but wonder if either offset or
> width or their sum is outside of the -MAX_OFFSET, MAX_OFFSET range if we
> still don't want to record usage bits at least in the intersection of
> -MAX_OFFSET, MAX_OFFSET and offset, offset + width (the latter performed
> with infinite precision; though, if record_store is changed as suggested
> below, offset + width shouldn't overflow).
> 
>>  for (i=offset; i>{
>>  bitmap store1;
>> @@ -1536,7 +1538,11 @@ record_store (rtx body, bb_info_t bb_info)
>>  }
>>store_info->group_id = group_id;
>>store_info->begin = offset;
>> -  store_info->end = offset + width;
>> +  if (offset > HOST_WIDE_INT_MAX - width)
>> +store_info->end = HOST_WIDE_INT_MAX;
>> +  else
>> +store_info->end = offset + width;
> 
> If offset + width overflows, I think we risk wrong-code by doing this, plus
> there are 3 other offset + width computations earlier in record_store
> before we reach this.  I think instead we should treat such cases as wild
> stores early, i.e.:
>if (!canon_address (mem, _id, , ))
>  {
>clear_rhs_from_active_local_stores ();
>return 0;
>  }
>  
>if (GET_MODE (mem) == BLKmode)
>  width = MEM_SIZE (mem);
>else
>  width = GET_MODE_SIZE (GET_MODE (mem));
> 
> +  if (offset > HOST_WIDE_INT_MAX - width)
> +{
> +  clear_rhs_from_active_local_stores ();
> +  return 0;
> +}
> 
> or so.
> 
>> +
>>store_info->is_set = GET_CODE (body) == SET;
>>store_info->rhs = rhs;
>>store_info->const_rhs = const_rhs;
>> @@ -1976,6 +1982,14 @@ check_mem_read_rtx (rtx *loc, bb_info_t bb_info)
>>return;
>>  }
>>  
>> +  if (offset > MAX_OFFSET)
>> +{
>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> +fprintf (dump_file, " reaches MAX_OFFSET.\n");
>> +  add_wild_read (bb_info);
>> +  return;
>> +}
>> +

Hi.

The later one works for me. I'm going to regtest that.

Ready after it survives regression tests?

Thanks,
Martin

> 
> Is offset > MAX_OFFSET really problematic (and not just the width != -1 &&
> offset + width overflowing case)?
> 
>>if (GET_MODE (mem) == BLKmode)
>>  width = -1;
>>else
>>
> 
> 
>   Jakub
> 



Re: [RFC PATCH] Merge libsanitizer from upstream

2017-10-19 Thread Kamil Rytarowski
On 19.10.2017 13:17, Jakub Jelinek wrote:
> On Thu, Oct 19, 2017 at 02:07:24PM +0300, Maxim Ostapenko wrote:
>>> Is the patch (the merge + this incremental) ok for trunk?
>>
>> I think the patch is OK, just wondering about two things:
> 
> Richi just approved the patch on IRC, so I'll commit, then we can deal with
> follow-ups.
> 
>> 1) We have a bunch of GCC local patches, did you include them into a
>> cumulative patch (I guess yes)?
> 
> I have done some verification today, diff from upstream r285547 to unpatched
> GCC (with the LLVM Compiler infrastructure two liners removed), attached P1,
> and diff from upstream r315899 to patched GCC (again, two liners removed),
> attached P2 and went through the changes in P1 and verified that except for
> the ubsan backwards compatibility we had that can't work anymore everything
> else was upstreamed, or remained in P2.  So P2 is the current diff from
> upstream, with the sanitizer_common/sanitizer_symbolizer_libbacktrace.cc
> changes now filed upstream.
> 
>> 2) Upstream has enabled LSan for x86 and ARM, is it worth to enable them in
>> GCC too?
> 
> Maybe, feel free to post patches.  For LSan we need to split off lsan_preinit
> out of liblsan and link it into executables, will handle it next (there is a
> PR about it, just wanted to wait until the merge is in).
> 
>   Jakub
> 

NetBSD/amd64 ships now with asan, ubsan, libfuzzer (WIP) and safestack
upstreamed.

tsan is work-in-progress and it will be followed by msan.

"make check-tsan" with local patches results so far:

Expected Passes: 248
Expected Failures  : 1
Unsupported Tests  : 83
Unexpected Failures: 44

It's worth to enable the upstreamed features on the GCC/NetBSD frontend
as well. We are doing it currently downstream in the NetBSD distribution.



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] Add offset_int to guard HOST_WIDE_INT overflow (PR tree-optimization/82042).

2017-10-19 Thread Martin Liška
On 09/20/2017 05:47 PM, Richard Biener wrote:
> On Wed, Sep 20, 2017 at 9:50 AM, Martin Liška  wrote:
>> Hi.
>>
>> This is partial fix for the PR, where I calculate 2 offsets in offset_int 
>> type.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
> 
> That seems to poorly handle the cases in offset_overlap_p.  What kind of
> overflow do we see here?

We hit:

(const:DI (plus:DI (symbol_ref:DI ("a") [flags 0x40] )
(const_int -9223372036854775805 [0x8003] 
"/home/marxin/Programming/testcases/pr82042.c":8 81 {*movdi_internal}

and:

(const:DI (plus:DI (symbol_ref:DI ("a") [flags 0x40] )
(const_int 9223372036854775806 [0x7ffe] 81 
{*movdi_internal}

The UBSAN error is:
../../gcc/tree-ssa-alias.c:704:30: runtime error: signed integer overflow: 
9223372036854775806 * 8 cannot be represented in type 'long int'

Martin

> 
> Richard.
> 
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2017-09-20  Martin Liska  
>>
>> PR tree-optimization/82042
>> * alias.c (memrefs_conflict_p): Calculate offset in offset_int
>> type.
>> * cse.c (use_related_value): Likewise.
>> ---
>>  gcc/alias.c | 10 --
>>  gcc/cse.c   | 14 ++
>>  2 files changed, 18 insertions(+), 6 deletions(-)
>>
>>



Re: [PATCH GCC]Try harder to find base object by expanding base address

2017-10-19 Thread Richard Biener
On Fri, Oct 13, 2017 at 5:04 PM, Bin Cheng  wrote:
> Hi,
> I ran into this when investigating PR82369 which we failed to find base 
> object.
> This simple patch tries harder to find base object by expanding base address
> in alloc_iv.  In general, we don't want to do aggressive expansion, but this
> case is fine because finding base object means reduction happened during the
> expansion.

I'm not sure.  Does simplification happen for the testcase when expanding?

>  And it's good to have base object for address type iv_uses.

Because we find related candidates via base-object?

It looks like we could avoid excessive expansion by doing
determine_base_object together with expr expansion given
the base object search is linear.  We can simply follow
SSA defs until we reach a suitable base and upon unwinding
build the base expression?

Richard.

> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>
> Thanks,
> bin
> 2017-10-12  Bin Cheng  
>
> * tree-scalar-evolution.c (alloc_iv): New parameter controlling
> base expansion for finding base object.
> (find_interesting_uses_address): Adjust call to alloc_iv.
>
> gcc/testsuite
> 2017-10-12  Bin Cheng  
>
> * gcc.dg/tree-ssa/ivopt_6.c: New test.


Re: [RFC PATCH] Merge libsanitizer from upstream

2017-10-19 Thread Jakub Jelinek
On Thu, Oct 19, 2017 at 02:07:24PM +0300, Maxim Ostapenko wrote:
> > Is the patch (the merge + this incremental) ok for trunk?
> 
> I think the patch is OK, just wondering about two things:

Richi just approved the patch on IRC, so I'll commit, then we can deal with
follow-ups.

> 1) We have a bunch of GCC local patches, did you include them into a
> cumulative patch (I guess yes)?

I have done some verification today, diff from upstream r285547 to unpatched
GCC (with the LLVM Compiler infrastructure two liners removed), attached P1,
and diff from upstream r315899 to patched GCC (again, two liners removed),
attached P2 and went through the changes in P1 and verified that except for
the ubsan backwards compatibility we had that can't work anymore everything
else was upstreamed, or remained in P2.  So P2 is the current diff from
upstream, with the sanitizer_common/sanitizer_symbolizer_libbacktrace.cc
changes now filed upstream.

> 2) Upstream has enabled LSan for x86 and ARM, is it worth to enable them in
> GCC too?

Maybe, feel free to post patches.  For LSan we need to split off lsan_preinit
out of liblsan and link it into executables, will handle it next (there is a
PR about it, just wanted to wait until the merge is in).

Jakub
--- /usr/src/llvm/projects/compiler-rt.285547/lib/asan/asan_globals.cc  
2017-10-19 10:29:52.916249242 +0200
+++ asan/asan_globals.cc2017-10-19 09:17:31.163837518 +0200
@@ -149,23 +147,6 @@ static void CheckODRViolationViaIndicato
   }
 }
 
-// Check ODR violation for given global G by checking if it's already poisoned.
-// We use this method in case compiler doesn't use private aliases for global
-// variables.
-static void CheckODRViolationViaPoisoning(const Global *g) {
-  if (__asan_region_is_poisoned(g->beg, g->size_with_redzone)) {
-// This check may not be enough: if the first global is much larger
-// the entire redzone of the second global may be within the first global.
-for (ListOfGlobals *l = list_of_all_globals; l; l = l->next) {
-  if (g->beg == l->g->beg &&
-  (flags()->detect_odr_violation >= 2 || g->size != l->g->size) &&
-  !IsODRViolationSuppressed(g->name))
-ReportODRViolation(g, FindRegistrationSite(g),
-   l->g, FindRegistrationSite(l->g));
-}
-  }
-}
-
 // Clang provides two different ways for global variables protection:
 // it can poison the global itself or its private alias. In former
 // case we may poison same symbol multiple times, that can help us to
@@ -213,8 +194,6 @@ static void RegisterGlobal(const Global
 // where two globals with the same name are defined in different modules.
 if (UseODRIndicator(g))
   CheckODRViolationViaIndicator(g);
-else
-  CheckODRViolationViaPoisoning(g);
   }
   if (CanPoisonMemory())
 PoisonRedZones(*g);
--- 
/usr/src/llvm/projects/compiler-rt.285547/lib/sanitizer_common/sanitizer_common_interceptors.inc
2017-10-19 10:29:53.330243835 +0200
+++ sanitizer_common/sanitizer_common_interceptors.inc  2017-10-19 
09:17:31.139837810 +0200
@@ -3352,7 +3350,8 @@ INTERCEPTOR(int, getgroups, int size, u3
   // its metadata. See
   // https://github.com/google/sanitizers/issues/321.
   int res = REAL(getgroups)(size, lst);
-  if (res && lst) COMMON_INTERCEPTOR_WRITE_RANGE(ctx, lst, res * sizeof(*lst));
+  if (res >= 0 && lst && size > 0)
+COMMON_INTERCEPTOR_WRITE_RANGE(ctx, lst, res * sizeof(*lst));
   return res;
 }
 #define INIT_GETGROUPS COMMON_INTERCEPT_FUNCTION(getgroups);
@@ -4552,11 +4551,15 @@ void *__tls_get_addr_opt(void *arg);
 //   descriptor offset as an argument instead of a pointer.  GOT address
 //   is passed in r12, so it's necessary to write it in assembly.  This is
 //   the function used by the compiler.
-#define INIT_TLS_GET_ADDR COMMON_INTERCEPT_FUNCTION(__tls_get_addr_internal)
+extern "C" uptr __tls_get_offset_wrapper(void *arg, uptr (*fn)(void *arg));
+#define INIT_TLS_GET_ADDR COMMON_INTERCEPT_FUNCTION(__tls_get_offset)
+DEFINE_REAL(uptr, __tls_get_offset, void *arg)
+extern "C" uptr __tls_get_offset(void *arg);
+extern "C" uptr __interceptor___tls_get_offset(void *arg);
 INTERCEPTOR(uptr, __tls_get_addr_internal, void *arg) {
   void *ctx;
   COMMON_INTERCEPTOR_ENTER(ctx, __tls_get_addr_internal, arg);
-  uptr res = REAL(__tls_get_addr_internal)(arg);
+  uptr res = __tls_get_offset_wrapper(arg, REAL(__tls_get_offset));
   uptr tp = reinterpret_cast(__builtin_thread_pointer());
   void *ptr = reinterpret_cast(res + tp);
   uptr tls_begin, tls_end;
@@ -4568,32 +4571,43 @@ INTERCEPTOR(uptr, __tls_get_addr_interna
   }
   return res;
 }
-// We need a protected symbol aliasing the above, so that we can jump
+// We need a hidden symbol aliasing the above, so that we can jump
 // directly to it from the assembly below.
 extern "C" __attribute__((alias("__interceptor___tls_get_addr_internal"),
-  visibility("protected")))
-uptr 

Re: [RFC PATCH] Merge libsanitizer from upstream

2017-10-19 Thread Maxim Ostapenko

Hi, I'm sorry for a late response.

On 19/10/17 13:52, Jakub Jelinek wrote:

On Mon, Oct 16, 2017 at 08:52:50PM +0200, Jakub Jelinek wrote:

The following patch is an attempt at libsanitizer merge from upstream.
Sadly libubsan has several ABI incompatible changes, dunno if we should
fight the mess and re-add backward compatibility back, or as the patch
does just bump soname, upstream clearly doesn't care about ABI compatibility
at all.

Bootstrapped/regtested on x86_64-linux and i686-linux, it would be nice to
get it tested on other targets (e.g. darwin, that breaks almost all the time
when doing merges, but no access to that).

Included is just the non-libsanitizer/ part plus GCC owned file changes
in libsanitizer (except regenerated ones), attached is bzip2ed full patch.

Thoughts on this?

Seems I've missed two test failures during the testing, this incremental patch 
ought
to fix that (with LTO we optimize away __asan_default_options and thus the
test fails, and libasan changed the env var handling, instead of having
handle_segv={0,1} it now has {0,1,2} modes and the old default 1 mode is now
the non-default mode 2, which the test relies on.

To answer my own question about possibility to avoid libubsan soname bump,
seems that is really not possible, because they removed already before
last merge __ubsan_handle_cfi_bad_type{,_abort} entrypoints (which we've
added back to make the library backwards compatible), but now they have
added __ubsan_handle_cfi_bad_type again with a different signature!
Argh!

Is the patch (the merge + this incremental) ok for trunk?


I think the patch is OK, just wondering about two things:

1) We have a bunch of GCC local patches, did you include them into a 
cumulative patch (I guess yes)?
2) Upstream has enabled LSan for x86 and ARM, is it worth to enable them 
in GCC too?


-Maxim



2017-10-19  Jakub Jelinek  

* g++.dg/asan/default-options-1.C (__asan_default_options): Add
used attribute.
* g++.dg/asan/asan_test.C: Run with ASAN_OPTIONS=handle_segv=2
in the environment.

--- gcc/testsuite/g++.dg/asan/default-options-1.C.jj2015-10-29 
09:14:39.0 +0100
+++ gcc/testsuite/g++.dg/asan/default-options-1.C   2017-10-19 
12:09:26.367324880 +0200
@@ -3,7 +3,7 @@
  const char *kAsanDefaultOptions="verbosity=1 foo=bar";
  
  extern "C"

-__attribute__((no_sanitize_address))
+__attribute__((no_sanitize_address, used))
  const char *__asan_default_options() {
return kAsanDefaultOptions;
  }
--- gcc/testsuite/g++.dg/asan/asan_test.C.jj2017-01-24 09:27:20.0 
+0100
+++ gcc/testsuite/g++.dg/asan/asan_test.C   2017-10-19 12:06:27.453510604 
+0200
@@ -8,6 +8,7 @@
  // { dg-additional-options "-DASAN_AVOID_EXPENSIVE_TESTS=1" { target { ! 
run_expensive_tests } } }
  // { dg-additional-options "-msse2" { target { i?86-*-linux* x86_64-*-linux* 
} } }
  // { dg-additional-options "-D__NO_INLINE__" { target { *-*-linux-gnu } } }
+// { dg-set-target-env-var ASAN_OPTIONS "handle_segv=2" }
  // { dg-final { asan-gtest } }
  
  #include "asan_test.cc"



Jakub







Re: [patch] Enhance support for -Wstack-usage/-Wvla-larger-than/-Walloca-larger-than

2017-10-19 Thread Richard Biener
On Thu, Oct 19, 2017 at 12:28 PM, Eric Botcazou  wrote:
>> Looks ok.  I wonder if you want to explicitely document that max_size < size
>> doesn't have any effect on actual code generation and is not checked for.
>
> Documentation amended to that effect:
>
>  -- Built-in Function: void *__builtin_alloca_with_align_and_max
>   (size_t size, size_t alignment, size_t max_size)
>  Similar to `__builtin_alloca_with_align' but takes an extra
>  argument specifying an upper bound for SIZE in case its value
>  cannot be computed at compile time, for use by `-fstack-usage',
>  `-Wstack-usage' and `-Walloca-larger-than' computations.  MAX_SIZE
>  has no effect on code generation and no attempt is made to check
>  its compatibility with SIZE.

works for me.

>> Also it seems that __builtin_alloca_with_align (20, 8, 16) will still
>> account 20 as the size and not 16 compared to 20 arriving in a variable
>> which is when 16 will be used.  So at least for accounting always use MIN
>> (size, max_size)?
>
> The implementation is in keeping with the above documentation, i.e. SIZE will
> prevail if its value can be computed at compile time.

OK.  Just wanted to mention it, there'll likely be cases where -O0 then
reports a smaller stack usage than -O2 because of this.

Thanks,
Richard.

> --
> Eric Botcazou


Re: [PATCH 6/9] [LVU] Allow final_start_function to skip initial insns

2017-10-19 Thread Richard Biener
On Sat, Sep 30, 2017 at 11:08 AM, Alexandre Oliva  wrote:
> This API change will enable final_start_function() to "consume"
> initial insns, and choose the first insn to be passed to final().
>
> Many ports call final_start_function() and final() when creating
> thunks and whatnot, so they needed adjusting.

But for MI thunks (and whatnot) there's no debug info.  So why do we care?
Most of them ask for sth very low-level.

That is, almost all the time the sequence is final_start_function immediately
followed by a call to final.  So isn't better refactoring the answer here?
Some ports only call final_start_function and never final for their thunks
for example.

That said, what do we lose when you do not adjust these things?

After all final () does more stuff than calling final_scan_insn on all
insns but you only do that for those you handle in final_start_function.

It seems to me that the current split between start/final/end is too
artificial and that the few "raw" building blocks that backends maybe
need should be factored out for them?

That said, it looks like an ugly "layering" violation you are introducing.

But of course I don't know history or details around this part of the
compiler...

That said, all ports invoking final () invoke it in the
start/final/end sequence.

spu, s390, rs6000 (and thus powerpcspe), pa, nds32, i386, cris, arm
invoke final_start_function and not final in the next 3 lines.  They all
invoke final_end_function but not final inbetween (all invokers of final
do the 3-line dance).

So, refactor final() to do all three and provide the chunks targets
really care for?

Why can't we do the param locview handling from final ()?

Probably all remanents of targets with text prologues?

Anyway, don't feel like acking in its current state because of this,
maybe somebody else can chime in.

Thanks,
Richard.

> for  gcc/ChangeLog
>
> * output.h (final_start_function): Adjust.
> * final.c (final_start_function): Take pointer to FIRST.
> (rest_of_handle_final): Adjust.
> * config/aarch64/aarch64.c (aarch64_output_mi_thunk): Adjust.
> * config/alpha/alpha.c (alpha_output_mi_thunk_osf): Likewise.
> * config/arm/arm.c (arm_thumb1_mi_thunk): Likewise.
> (arm32_output_mi_thunk): Likewise.
> * config/cris/cris.c (cris_asm_output_mi_thunk): Likewise.
> * config/i386/i386.c (ix86_code_end): Likewise.
> (x86_output_mi_thunk): Likewise.
> * config/ia64/ia64.c (ia64_output_mi_thunk): Likewise.
> * config/m68k/m68k.c (m68k_output_mi_thunk): Likewise.
> * config/microblaze/microblaze.c (microblaze_asm_output_mi_thunk):
> Likewise.
> * config/mips/mips.c (mips_output_mi_thunk): Likewise.
> * config/nds32/nds32.c (nds32_asm_output_mi_thunk): Likewise.
> * config/nios2/nios2.c (nios2_asm_output_mi_thunk): Likewise.
> * config/pa/pa.c (pa_asm_output_mi_thunk): Likewise.
> * config/rs6000/rs6000.c (rs6000_output_mi_thunk): Likewise.
> (rs6000_code_end): Likewise.
> * config/s390/s390.c (s390_output_mi_thunk): Likewise.
> * config/sh/sh.c (sh_output_mi_thunk): Likewise.
> * config/sparc/sparc.c (sparc_output_mi_thunk): Likewise.
> * config/spu/spu.c (spu_output_mi_thunk): Likewise.
> * config/tilegx/tilegx.c (tilegx_output_mi_thunk): Likewise.
> * config/tilepro/tilepro.c (tilepro_asm_output_mi_thunk): Likewise.
> ---
>  gcc/config/aarch64/aarch64.c   | 2 +-
>  gcc/config/alpha/alpha.c   | 2 +-
>  gcc/config/arm/arm.c   | 5 +++--
>  gcc/config/cris/cris.c | 3 ++-
>  gcc/config/i386/i386.c | 5 +++--
>  gcc/config/ia64/ia64.c | 2 +-
>  gcc/config/m68k/m68k.c | 2 +-
>  gcc/config/microblaze/microblaze.c | 2 +-
>  gcc/config/mips/mips.c | 2 +-
>  gcc/config/nds32/nds32.c   | 3 ++-
>  gcc/config/nios2/nios2.c   | 2 +-
>  gcc/config/pa/pa.c | 3 ++-
>  gcc/config/rs6000/rs6000.c | 5 +++--
>  gcc/config/s390/s390.c | 3 ++-
>  gcc/config/sh/sh.c | 2 +-
>  gcc/config/sparc/sparc.c   | 2 +-
>  gcc/config/spu/spu.c   | 3 ++-
>  gcc/config/tilegx/tilegx.c | 2 +-
>  gcc/config/tilepro/tilepro.c   | 2 +-
>  gcc/final.c| 9 ++---
>  gcc/output.h   | 2 +-
>  21 files changed, 37 insertions(+), 26 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 23f5aff..73872dd 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3961,7 +3961,7 @@ aarch64_output_mi_thunk (FILE *file, tree thunk 
> ATTRIBUTE_UNUSED,
>
>insn = get_insns ();
>shorten_branches (insn);
> -  final_start_function (insn, file, 1);
> +  final_start_function (, file, 1);
>final (insn, file, 1);
>final_end_function ();
>
> 

Re: [RFC PATCH] Merge libsanitizer from upstream

2017-10-19 Thread Jakub Jelinek
On Mon, Oct 16, 2017 at 08:52:50PM +0200, Jakub Jelinek wrote:
> The following patch is an attempt at libsanitizer merge from upstream.
> Sadly libubsan has several ABI incompatible changes, dunno if we should
> fight the mess and re-add backward compatibility back, or as the patch
> does just bump soname, upstream clearly doesn't care about ABI compatibility
> at all.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, it would be nice to
> get it tested on other targets (e.g. darwin, that breaks almost all the time
> when doing merges, but no access to that).
> 
> Included is just the non-libsanitizer/ part plus GCC owned file changes
> in libsanitizer (except regenerated ones), attached is bzip2ed full patch.
> 
> Thoughts on this?

Seems I've missed two test failures during the testing, this incremental patch 
ought
to fix that (with LTO we optimize away __asan_default_options and thus the
test fails, and libasan changed the env var handling, instead of having
handle_segv={0,1} it now has {0,1,2} modes and the old default 1 mode is now
the non-default mode 2, which the test relies on.

To answer my own question about possibility to avoid libubsan soname bump,
seems that is really not possible, because they removed already before
last merge __ubsan_handle_cfi_bad_type{,_abort} entrypoints (which we've
added back to make the library backwards compatible), but now they have
added __ubsan_handle_cfi_bad_type again with a different signature!
Argh!

Is the patch (the merge + this incremental) ok for trunk?

2017-10-19  Jakub Jelinek  

* g++.dg/asan/default-options-1.C (__asan_default_options): Add
used attribute.
* g++.dg/asan/asan_test.C: Run with ASAN_OPTIONS=handle_segv=2
in the environment.

--- gcc/testsuite/g++.dg/asan/default-options-1.C.jj2015-10-29 
09:14:39.0 +0100
+++ gcc/testsuite/g++.dg/asan/default-options-1.C   2017-10-19 
12:09:26.367324880 +0200
@@ -3,7 +3,7 @@
 const char *kAsanDefaultOptions="verbosity=1 foo=bar";
 
 extern "C"
-__attribute__((no_sanitize_address))
+__attribute__((no_sanitize_address, used))
 const char *__asan_default_options() {
   return kAsanDefaultOptions;
 }
--- gcc/testsuite/g++.dg/asan/asan_test.C.jj2017-01-24 09:27:20.0 
+0100
+++ gcc/testsuite/g++.dg/asan/asan_test.C   2017-10-19 12:06:27.453510604 
+0200
@@ -8,6 +8,7 @@
 // { dg-additional-options "-DASAN_AVOID_EXPENSIVE_TESTS=1" { target { ! 
run_expensive_tests } } }
 // { dg-additional-options "-msse2" { target { i?86-*-linux* x86_64-*-linux* } 
} }
 // { dg-additional-options "-D__NO_INLINE__" { target { *-*-linux-gnu } } }
+// { dg-set-target-env-var ASAN_OPTIONS "handle_segv=2" }
 // { dg-final { asan-gtest } }
 
 #include "asan_test.cc"


Jakub


[PATCH] rs6000: Fix "missing mode" on UNSPEC_TOCSLOT

2017-10-19 Thread Segher Boessenkool
Currently gen* warn about a missing mode on an UNSPEC_TOCSLOT unspec in
some call patterns.  Those unspecs are created from rs6000.c, with Pmode
always.  This patch fixes the patterns to say :P as well.

Tested on powerpc64-linux {-m32,-m64}.  Committing to trunk.


Segher


2018-10-19  Segher Boessenkool  

* config/rs6000/rs6000.md (*call_indirect_aix,
*call_value_indirect_aix, *call_indirect_elfv2,
*call_value_indirect_elfv2): Add correct mode to the unspec.

---
 gcc/config/rs6000/rs6000.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index a47f746..5ef7f6b 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -11165,7 +11165,7 @@ (define_insn "*call_indirect_aix"
   [(call (mem:SI (match_operand:P 0 "register_operand" "c,*l"))
 (match_operand 1 "" "g,g"))
(use (match_operand:P 2 "memory_operand" ","))
-   (set (reg:P TOC_REGNUM) (unspec [(match_operand:P 3 "const_int_operand" 
"n,n")] UNSPEC_TOCSLOT))
+   (set (reg:P TOC_REGNUM) (unspec:P [(match_operand:P 3 "const_int_operand" 
"n,n")] UNSPEC_TOCSLOT))
(clobber (reg:P LR_REGNO))]
   "DEFAULT_ABI == ABI_AIX"
   " 2,%2\;b%T0l\; 2,%3(1)"
@@ -11177,7 +11177,7 @@ (define_insn "*call_value_indirect_aix"
(call (mem:SI (match_operand:P 1 "register_operand" "c,*l"))
  (match_operand 2 "" "g,g")))
(use (match_operand:P 3 "memory_operand" ","))
-   (set (reg:P TOC_REGNUM) (unspec [(match_operand:P 4 "const_int_operand" 
"n,n")] UNSPEC_TOCSLOT))
+   (set (reg:P TOC_REGNUM) (unspec:P [(match_operand:P 4 "const_int_operand" 
"n,n")] UNSPEC_TOCSLOT))
(clobber (reg:P LR_REGNO))]
   "DEFAULT_ABI == ABI_AIX"
   " 2,%3\;b%T1l\; 2,%4(1)"
@@ -11191,7 +11191,7 @@ (define_insn "*call_value_indirect_aix"
 (define_insn "*call_indirect_elfv2"
   [(call (mem:SI (match_operand:P 0 "register_operand" "c,*l"))
 (match_operand 1 "" "g,g"))
-   (set (reg:P TOC_REGNUM) (unspec [(match_operand:P 2 "const_int_operand" 
"n,n")] UNSPEC_TOCSLOT))
+   (set (reg:P TOC_REGNUM) (unspec:P [(match_operand:P 2 "const_int_operand" 
"n,n")] UNSPEC_TOCSLOT))
(clobber (reg:P LR_REGNO))]
   "DEFAULT_ABI == ABI_ELFv2"
   "b%T0l\; 2,%2(1)"
@@ -11202,7 +11202,7 @@ (define_insn "*call_value_indirect_elfv2"
   [(set (match_operand 0 "" "")
(call (mem:SI (match_operand:P 1 "register_operand" "c,*l"))
  (match_operand 2 "" "g,g")))
-   (set (reg:P TOC_REGNUM) (unspec [(match_operand:P 3 "const_int_operand" 
"n,n")] UNSPEC_TOCSLOT))
+   (set (reg:P TOC_REGNUM) (unspec:P [(match_operand:P 3 "const_int_operand" 
"n,n")] UNSPEC_TOCSLOT))
(clobber (reg:P LR_REGNO))]
   "DEFAULT_ABI == ABI_ELFv2"
   "b%T1l\; 2,%3(1)"
-- 
1.8.3.1



Re: [PATCH] ira-color: fix allocno_priority_compare_func for qsort (PR 82395)

2017-10-19 Thread Alexander Monakov
Ping.

On Thu, 5 Oct 2017, Alexander Monakov wrote:
> In ira-color.c, qsort comparator allocno_priority_compare_func lacks anti-
> commutativity and can indicate A < B < A if boths allocnos satisfy
> non_spilled_static_chain_regno_p.  It should fall down to following
> sub-comparisons in that case.
> 
> There is another issue: the comment doesn't match the code.  We are sorting
> allocnos by priority, higher first, and the comment says that allocnos
> corresponding to static chain pointer register should go first. However,
> the code implements the opposite ordering.
> 
> I think the comment documents the intended behavior and the code is wrong.
> Thus, the patch changes comparison value to match the comment.
> 
> A similar issue was present in lra-assigns.c and was fixed by this patch:
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00893.html
> 
> Bootstrapped and regtested on x86-64, OK for trunk?
> 
> Thanks.
> Alexander
> 
>   PR rtl-optimization/82395
>   * ira-color.c (allocno_priority_compare_func): Fix comparison step
>   based on non_spilled_static_chain_regno_p.
> 
> diff --git a/gcc/ira-color.c b/gcc/ira-color.c
> index 22fdb88274d..31a4a8074d1 100644
> --- a/gcc/ira-color.c
> +++ b/gcc/ira-color.c
> @@ -3005,14 +3005,13 @@ allocno_priority_compare_func (const void *v1p, const 
> void *v2p)
>  {
>ira_allocno_t a1 = *(const ira_allocno_t *) v1p;
>ira_allocno_t a2 = *(const ira_allocno_t *) v2p;
> -  int pri1, pri2;
> +  int pri1, pri2, diff;
>  
>/* Assign hard reg to static chain pointer pseudo first when
>   non-local goto is used.  */
> -  if (non_spilled_static_chain_regno_p (ALLOCNO_REGNO (a1)))
> -return 1;
> -  else if (non_spilled_static_chain_regno_p (ALLOCNO_REGNO (a2)))
> -return -1;
> +  if ((diff = (non_spilled_static_chain_regno_p (ALLOCNO_REGNO (a2))
> +- non_spilled_static_chain_regno_p (ALLOCNO_REGNO (a1 != 0)
> +return diff;
>pri1 = allocno_priorities[ALLOCNO_NUM (a1)];
>pri2 = allocno_priorities[ALLOCNO_NUM (a2)];
>if (pri2 != pri1)


Re: [patch] Enhance support for -Wstack-usage/-Wvla-larger-than/-Walloca-larger-than

2017-10-19 Thread Eric Botcazou
> Looks ok.  I wonder if you want to explicitely document that max_size < size
> doesn't have any effect on actual code generation and is not checked for.

Documentation amended to that effect:

 -- Built-in Function: void *__builtin_alloca_with_align_and_max
  (size_t size, size_t alignment, size_t max_size)
 Similar to `__builtin_alloca_with_align' but takes an extra
 argument specifying an upper bound for SIZE in case its value
 cannot be computed at compile time, for use by `-fstack-usage',
 `-Wstack-usage' and `-Walloca-larger-than' computations.  MAX_SIZE
 has no effect on code generation and no attempt is made to check
 its compatibility with SIZE.

> Also it seems that __builtin_alloca_with_align (20, 8, 16) will still
> account 20 as the size and not 16 compared to 20 arriving in a variable
> which is when 16 will be used.  So at least for accounting always use MIN
> (size, max_size)?

The implementation is in keeping with the above documentation, i.e. SIZE will 
prevail if its value can be computed at compile time.

-- 
Eric Botcazou


Re: [libstdc++, patch] Fix build on APFS file system

2017-10-19 Thread Petr Ovtchenkov
On Thu, 19 Oct 2017 10:37:14 +0200
Richard Biener  wrote:

> On Wed, Oct 18, 2017 at 11:58 PM, FX  wrote:
> >> Could you test using .PHONY: install-headers instead?
> >> That target *is* phony, so telling make that seems sensible.
> >
> > I’ve tried adding it to the existing .PHONY list:
> >
> > Index: libstdc++-v3/include/Makefile.in
> > ===
> > --- libstdc++-v3/include/Makefile.in(revision 253855)
> > +++ libstdc++-v3/include/Makefile.in(working copy)
> > @@ -1449,6 +1449,7 @@ uninstall-am:
> > distclean-libtool dvi dvi-am html html-am info info-am install \
> > install-am install-data install-data-am install-data-local \
> > install-dvi install-dvi-am install-exec install-exec-am \
> > +   install-freestanding-headers install-headers \
> > install-html install-html-am install-info install-info-am \
> > install-man install-pdf install-pdf-am install-ps \
> > install-ps-am install-strip installcheck installcheck-am \
> >
> >
> > But that doesn’t work:
> >
> > In file included
> > from 
> > /Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/bits/exception_ptr.h:39:0,
> > from /Users/fx/devel/gcc/trunk/libstdc++-v3/libsupc++/exception:143,
> > from 
> > /Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/ios:39,
> > from 
> > /Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/istream:38,
> > from 
> > /Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/sstream:38,
> > from 
> > /Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/complex:45,
> > from 
> > /Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/ccomplex:39,
> > from 
> > /Users/fx/devel/gcc/trunk/libstdc++-v3/include/precompiled/stdc++.h:52: 
> > /Users/fx/devel/gcc/trunk/libstdc++-v3/libsupc++/typeinfo:36:10:
> > fatal error: bits/hash_bytes.h: No such file or directory #include 
> > 
> >   ^~~
> 
> So the issue is PCH generation (why's that re-generated at install time?).

May be indeed: I have a lot parallel builds and never see problem like this,
but I skip PCH generation.

--

   - ptr


[PATCH] Document --coverage and fork-like functions (PR gcov-profile/82457).

2017-10-19 Thread Martin Liška
Hi.

As discussed in the PR, we should be more precise in our documentation.
The patch does that.

Ready for trunk?
Martin

gcc/ChangeLog:

2017-10-19  Martin Liska  

PR gcov-profile/82457
* doc/invoke.texi: Document that one needs a non-strict ISO mode
for fork-like functions to be properly instrumented.
---
 gcc/doc/invoke.texi | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5e88279528f..b37bca48960 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10861,7 +10861,9 @@ information.  This may be repeated any number of times.  You can run
 concurrent instances of your program, and provided that the file system
 supports locking, the data files will be correctly updated.  Also
 @code{fork} calls are detected and correctly handled (double counting
-will not happen).
+will not happen).  For C language, a non-strict ISO C mode
+(@option{-ansi}, @option{-std=c90}, @option{-std=c99} or @option{-std=c11})
+is needed.
 
 @item
 For profile-directed optimizations, compile the source files again with



Re: [PATCH] Revert r238089 (PR driver/81829).

2017-10-19 Thread Martin Liška
PING^1

On 09/15/2017 02:07 PM, Martin Liška wrote:
> Hi.
> 
> In order to make the code simple and transparent, I suggest to revert r238089.
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 



Re: Add an alternative vector loop iv mechanism

2017-10-19 Thread Richard Biener
On Thu, Oct 19, 2017 at 10:48 AM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Thu, Oct 19, 2017 at 12:28 AM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
 On Fri, Oct 13, 2017 at 4:10 PM, Richard Sandiford
  wrote:
> Normally we adjust the vector loop so that it iterates:
>
>(original number of scalar iterations - number of peels) / VF
>
> times, enforcing this using an IV that starts at zero and increments
> by one each iteration.  However, dividing by VF would be expensive
> for variable VF, so this patch adds an alternative in which the IV
> increments by VF each iteration instead.  We then need to take care
> to handle possible overflow in the IV.

 Hmm, why do you need to handle possible overflow?  Doesn't the
 original loop have a natural IV that evolves like this?  After all we
 can compute an expression for niters of the scalar loop.
>>>
>>> The problem comes with loops like:
>>>
>>>unsigned char i = 0;
>>>do
>>>  {
>>>...
>>>i--;
>>>  }
>>>while (i != 0);
>>>
>>> The loop statements execute 256 times and the latch executes 255 times.
>>> LOOP_VINFO_NITERSM1 is then 255 but LOOP_VINFO_NITERS (stored as an
>>> unsigned char) is 0.
>>
>> Yes, that's an existing issue and the reason why I introduced
>> NITERSM1.  All remaining uses of NITERS should really go away
>> because of this corner-case.  So you are introducing a new user?
>
> It's not really an NITERSM1 vs. NITERS thing.  We'd get the same
> result/have the same problem with NITERSM1 - (STEP - 1) instead
> of NITERS - STEP, namely:
>
> - the new IV uses the same type as NITERS
> - we only want the loop to iterate if there are at least STEP scalar
>   iterations to go
> - this means that the natural limit is "IV <= NITERS - STEP"
>   or "IV <= NITERSM1 - (STEP - 1)" (both equivalent)
> - the loop is only guaranteed to terminate if the IV can hit
>   a value STEP times higher than that, i.e. "IV == NITERS - STEP"
>   must be followed by an iteration in which the branch-back
>   condition is false
> - but if NITERS can't represent the actual number of iterations,
>   then there is no value STEP times higher than that
> - we cope with this by starting the IV at -1 and using a limit
>   of "IV < NITERS - STEP" i.e. "IV <= NITERSM1 - STEP".
>
> So you could see this as using a limit based on NITERSM1 with a
> start of -1, although the "< NITERS - STEP" avoids the need to
> subtract 1 at runtime.
>
> But it seems better to use a 0-based IV when we can, since that
> leads to more natural ivopts opportunities.  That's why the loop
> tests for the overflow case and only uses the -1 based IV when
> necessary.

I see.  Thanks for the clarification.

Richard.

> Thanks,
> Richard
>
>>
>> Richard.
>>
>>> This leads to things like:
>>>
>>>   /* Constant case.  */
>>>   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>>> {
>>>   tree cst_niters = LOOP_VINFO_NITERS (loop_vinfo);
>>>   tree cst_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);
>>>
>>>   gcc_assert (TREE_CODE (cst_niters) == INTEGER_CST);
>>>   gcc_assert (TREE_CODE (cst_nitersm1) == INTEGER_CST);
>>>   if (wi::to_widest (cst_nitersm1) < wi::to_widest (cst_niters))
>>> return true;
>>> }
>>>
>>> in loop_niters_no_overflow.
>>>
> The new mechanism isn't used yet; a later patch replaces the
> "if (1)" with a check for variable VF.  If the patch is OK, I'll
> hold off applying it until the follow-on is ready to go in.

 I indeed don't like code that isn't exercised.  Otherwise looks reasonable.
>>>
>>> Thanks.
>>>
>>> Richard
>>>
 Thanks,
 Richard.

> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
> OK to install when the time comes?
>
> Richard
>
>
> 2017-10-13  Richard Sandiford  
>
> gcc/
> * tree-vect-loop-manip.c: Include gimple-fold.h.
> (slpeel_make_loop_iterate_ntimes): Add step, final_iv and
> niters_maybe_zero parameters.  Handle other cases besides a step of
>> 1.
> (vect_gen_vector_loop_niters): Add a step_vector_ptr parameter.
> Add a path that uses a step of VF instead of 1, but disable it
> for now.
> (vect_do_peeling): Add step_vector, niters_vector_mult_vf_var
> and niters_no_overflow parameters.  Update calls to
> slpeel_make_loop_iterate_ntimes and vect_gen_vector_loop_niters.
> Create a new SSA name if the latter choses to use a ste other
> than zero, and return it via niters_vector_mult_vf_var.
> * tree-vect-loop.c (vect_transform_loop): Update calls to
> vect_do_peeling, vect_gen_vector_loop_niters and
> 

Re: Add an alternative vector loop iv mechanism

2017-10-19 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Oct 19, 2017 at 12:28 AM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Fri, Oct 13, 2017 at 4:10 PM, Richard Sandiford
>>>  wrote:
 Normally we adjust the vector loop so that it iterates:

(original number of scalar iterations - number of peels) / VF

 times, enforcing this using an IV that starts at zero and increments
 by one each iteration.  However, dividing by VF would be expensive
 for variable VF, so this patch adds an alternative in which the IV
 increments by VF each iteration instead.  We then need to take care
 to handle possible overflow in the IV.
>>>
>>> Hmm, why do you need to handle possible overflow?  Doesn't the
>>> original loop have a natural IV that evolves like this?  After all we
>>> can compute an expression for niters of the scalar loop.
>>
>> The problem comes with loops like:
>>
>>unsigned char i = 0;
>>do
>>  {
>>...
>>i--;
>>  }
>>while (i != 0);
>>
>> The loop statements execute 256 times and the latch executes 255 times.
>> LOOP_VINFO_NITERSM1 is then 255 but LOOP_VINFO_NITERS (stored as an
>> unsigned char) is 0.
>
> Yes, that's an existing issue and the reason why I introduced
> NITERSM1.  All remaining uses of NITERS should really go away
> because of this corner-case.  So you are introducing a new user?

It's not really an NITERSM1 vs. NITERS thing.  We'd get the same
result/have the same problem with NITERSM1 - (STEP - 1) instead
of NITERS - STEP, namely:

- the new IV uses the same type as NITERS
- we only want the loop to iterate if there are at least STEP scalar
  iterations to go
- this means that the natural limit is "IV <= NITERS - STEP"
  or "IV <= NITERSM1 - (STEP - 1)" (both equivalent)
- the loop is only guaranteed to terminate if the IV can hit
  a value STEP times higher than that, i.e. "IV == NITERS - STEP"
  must be followed by an iteration in which the branch-back
  condition is false
- but if NITERS can't represent the actual number of iterations,
  then there is no value STEP times higher than that
- we cope with this by starting the IV at -1 and using a limit
  of "IV < NITERS - STEP" i.e. "IV <= NITERSM1 - STEP".

So you could see this as using a limit based on NITERSM1 with a
start of -1, although the "< NITERS - STEP" avoids the need to
subtract 1 at runtime.

But it seems better to use a 0-based IV when we can, since that
leads to more natural ivopts opportunities.  That's why the loop
tests for the overflow case and only uses the -1 based IV when
necessary.

Thanks,
Richard

>
> Richard.
>
>> This leads to things like:
>>
>>   /* Constant case.  */
>>   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>> {
>>   tree cst_niters = LOOP_VINFO_NITERS (loop_vinfo);
>>   tree cst_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);
>>
>>   gcc_assert (TREE_CODE (cst_niters) == INTEGER_CST);
>>   gcc_assert (TREE_CODE (cst_nitersm1) == INTEGER_CST);
>>   if (wi::to_widest (cst_nitersm1) < wi::to_widest (cst_niters))
>> return true;
>> }
>>
>> in loop_niters_no_overflow.
>>
 The new mechanism isn't used yet; a later patch replaces the
 "if (1)" with a check for variable VF.  If the patch is OK, I'll
 hold off applying it until the follow-on is ready to go in.
>>>
>>> I indeed don't like code that isn't exercised.  Otherwise looks reasonable.
>>
>> Thanks.
>>
>> Richard
>>
>>> Thanks,
>>> Richard.
>>>
 Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
 OK to install when the time comes?

 Richard


 2017-10-13  Richard Sandiford  

 gcc/
 * tree-vect-loop-manip.c: Include gimple-fold.h.
 (slpeel_make_loop_iterate_ntimes): Add step, final_iv and
 niters_maybe_zero parameters.  Handle other cases besides a step of
> 1.
 (vect_gen_vector_loop_niters): Add a step_vector_ptr parameter.
 Add a path that uses a step of VF instead of 1, but disable it
 for now.
 (vect_do_peeling): Add step_vector, niters_vector_mult_vf_var
 and niters_no_overflow parameters.  Update calls to
 slpeel_make_loop_iterate_ntimes and vect_gen_vector_loop_niters.
 Create a new SSA name if the latter choses to use a ste other
 than zero, and return it via niters_vector_mult_vf_var.
 * tree-vect-loop.c (vect_transform_loop): Update calls to
 vect_do_peeling, vect_gen_vector_loop_niters and
 slpeel_make_loop_iterate_ntimes.
 * tree-vectorizer.h (slpeel_make_loop_iterate_ntimes,
> vect_do_peeling)
 (vect_gen_vector_loop_niters): Update declarations after above
>>> changes.

 Index: gcc/tree-vect-loop-manip.c
 

Re: [PATCH GCC][4/7]Choose exit edge/path when removing inner loop's exit statement

2017-10-19 Thread Bin.Cheng
On Thu, Oct 19, 2017 at 9:31 AM, Tom de Vries  wrote:
> On 10/09/2017 03:34 PM, Richard Biener wrote:
>>
>> On Thu, Oct 5, 2017 at 3:16 PM, Bin Cheng  wrote:
>>>
>>> Hi,
>>> Function generate_loops_for_partition chooses arbitrary path when
>>> removing exit
>>> condition not in partition.  This is fine for now because it's impossible
>>> to have
>>> loop exit condition in case of innermost distribution.  After extending
>>> to loop
>>> nest distribution, we must choose exit edge/path for inner loop's exit
>>> condition,
>>> otherwise an infinite empty loop will be generated.  Test case added.
>>>
>>> Bootstrap and test in patch set on x86_64 and AArch64, is it OK?
>>
>>
>> Ok.
>>
>> Richard.
>>
>>> Thanks,
>>> bin
>>> 2017-10-04  Bin Cheng  
>>>
>>>  * tree-loop-distribution.c (generate_loops_for_partition):
>>> Remove
>>>  inner loop's exit stmt by making it always exit the loop,
>>> otherwise
>>>  we would generate an infinite empty loop.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2017-10-04  Bin Cheng  
>>>
>>>  * gcc.dg/tree-ssa/ldist-27.c: New test.
>
>
> Hi,
>
> I've committed patch below to specify the stack size requirements of this
> test-case (fixing the test failure for nvptx).
Hi,
Maybe we can simply make the structure a global variable?

Thanks,
bin
>
> Does it make sense to trim down the test-case using #ifdef STACK_SIZE?
>
> Thanks,
> - Tom


Re: [libstdc++, patch] Fix build on APFS file system

2017-10-19 Thread FX
> So the issue is PCH generation (why's that re-generated at install time?).

As suggested by Marc in the PR, I've removed the @ from include/Makefile.in, 
and removed the leading - for lines with LN_S.

The result of "make -d --trace -j8 all-target-libstdc++-v3", in a build where 
x86_64-apple-darwin17.0.0/libstdc++-v3 was entirely removed, can be found here: 
https://gist.github.com/fxcoudert/b621465a794d968593bc7ed90c0fc1fb

FX

Re: [libstdc++, patch] Fix build on APFS file system

2017-10-19 Thread Richard Biener
On Wed, Oct 18, 2017 at 11:58 PM, FX  wrote:
>> Could you test using .PHONY: install-headers instead?
>> That target *is* phony, so telling make that seems sensible.
>
> I’ve tried adding it to the existing .PHONY list:
>
> Index: libstdc++-v3/include/Makefile.in
> ===
> --- libstdc++-v3/include/Makefile.in(revision 253855)
> +++ libstdc++-v3/include/Makefile.in(working copy)
> @@ -1449,6 +1449,7 @@ uninstall-am:
> distclean-libtool dvi dvi-am html html-am info info-am install \
> install-am install-data install-data-am install-data-local \
> install-dvi install-dvi-am install-exec install-exec-am \
> +   install-freestanding-headers install-headers \
> install-html install-html-am install-info install-info-am \
> install-man install-pdf install-pdf-am install-ps \
> install-ps-am install-strip installcheck installcheck-am \
>
>
> But that doesn’t work:
>
> In file included from 
> /Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/bits/exception_ptr.h:39:0,
>  from 
> /Users/fx/devel/gcc/trunk/libstdc++-v3/libsupc++/exception:143,
>  from 
> /Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/ios:39,
>  from 
> /Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/istream:38,
>  from 
> /Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/sstream:38,
>  from 
> /Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/complex:45,
>  from 
> /Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/ccomplex:39,
>  from 
> /Users/fx/devel/gcc/trunk/libstdc++-v3/include/precompiled/stdc++.h:52:
> /Users/fx/devel/gcc/trunk/libstdc++-v3/libsupc++/typeinfo:36:10: fatal error: 
> bits/hash_bytes.h: No such file or directory
>  #include 
>   ^~~

So the issue is PCH generation (why's that re-generated at install time?).

Richard.


Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices

2017-10-19 Thread Richard Biener
On Thu, Oct 19, 2017 at 1:19 AM, Martin Sebor  wrote:
> On 10/18/2017 04:48 AM, Richard Biener wrote:
>>
>> On Wed, Oct 18, 2017 at 5:34 AM, Martin Sebor  wrote:
>>>
>>> While testing my latest -Wrestrict changes I noticed a number of
>>> opportunities to improve the -Warray-bounds warning.  Attached
>>> is a patch that implements a solution for the following subset
>>> of these:
>>>
>>> PR tree-optimization/82596 - missing -Warray-bounds on an out-of
>>>   bounds index into string literal
>>> PR tree-optimization/82588 - missing -Warray-bounds on an excessively
>>>   large index
>>> PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds
>>>   inner indices
>>>
>>> The patch also adds more detail to the -Warray-bounds diagnostics
>>> to make it easier to see the cause of the problem.
>>>
>>> Richard, since the changes touch tree-vrp.c, I look in particular
>>> for your comments.
>>
>>
>> +  /* Accesses to trailing arrays via pointers may access storage
>> +beyond the types array bounds.  For such arrays, or for flexible
>> +array members as well as for other arrays of an unknown size,
>> +replace the upper bound with a more permissive one that assumes
>> +the size of the largest object is SSIZE_MAX.  */
>>
>> I believe handling those cases are somewhat academic, but ...
>
>
> Thanks for the quick review!
>
> I agree the SSIZE_MAX tests handle corner cases and there are
> arguably more important gaps here to plug (e.g., VLAs).  Then
> again, most bugs tend to lurk in corner cases of one kind or
> another and these seemed like a good way for me to come up to
> speed on the implementation before tackling those.  If you
> have suggestions for which to dive into next I'm happy to take
> them.
>
>> +  tree eltype = TREE_TYPE (ref);
>> +  tree eltsize = TYPE_SIZE_UNIT (eltype);
>>
>> needs to use array_ref_element_size.  Note that this size can be
>> non-constant in which case you now ICE so you have to check
>> this is an INTEGER_CST.
>
>
> Thanks.  I did reproduce a few ICEs due to VLAs.  I've fixed
> the problems and added tests for them.  One-dimensional VLAs
> are now handled the same way arrays of unknown bound are.
> Handling them more intelligently (i.e., taking into account
> the ranges their bounds are in) and especially handling
> multidimensional VLAs will take some effort.
>
>>
>> +  tree maxbound = TYPE_MAX_VALUE (ssizetype);
>>
>> please don't use ssizetype - sizetype can be of different precision
>> than pointers (and thus objects).  size_type_node would come
>> close (maps to size_t), eventually ptrdiff_type_node is also
>> defined by all frontends.
>
>
> Okay, I've changed it to sizetype (it would be nice if there
> were a cleaner way of doing it rather than by bit-twiddling.)
>
> ptrdiff_type would have been my first choice but it's only defined
> by the C/C++ front ends and not available in the middle-end, so
> the other warnings that consider object sizes deal with ssizetype
> (e.g., -Walloc-size-larger-than, -Wstringop- overflow, and
> -Wformat-overflow).
>
> That said, I'm not sure I understand under what conditions
> ssizetype is not the signed equivalent of sizetype.  Can you
> clarify?

I meant to use size_type_node (size_t), not sizetype.  But
I just checked that ptrdiff_type_node is initialized in
build_common_tree_nodes and thus always available.

> As an aside, at some point I would like to get away from a type
> based limit in all these warnings and instead use one that can
> be controlled by an option so that a user can impose a lower limit
> on the maximum size of an object and have all size-related warnings
> (and perhaps even optimizations) enforce it and benefit from it.

You could add a --param that is initialized from ptrdiff_type_node.

>> +  up_bound_p1 = fold_build2 (TRUNC_DIV_EXPR, ssizetype, maxbound,
>> eltsize);
>> +
>>
>> int_const_binop if you insist on using trees...
>
>
> Sure.  (I think offset_int would be more convenient but the rest
> of the function uses trees so I just stuck to those to avoid
> converting things back and forth or disrupting more of the code
> than I had to.)

Understood.

>>
>> +  tree arg = TREE_OPERAND (ref, 0);
>> +  tree_code code = TREE_CODE (arg);
>> +  if (code == COMPONENT_REF)
>> +   {
>> + HOST_WIDE_INT off;
>> + if (tree base = get_addr_base_and_unit_offset (ref, ))
>> +   up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, up_bound_p1,
>> +  TYPE_SIZE_UNIT (TREE_TYPE (base)));
>> + else
>> +   return;
>>
>> so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will be
>> false).
>> simply not subtracting anyhing instead of returning would be
>> conservatively
>> correct, no?  Likewise subtracting the offset of the array for all
>> "previous"
>> variably indexed components with assuming the lowest value for the index.
>> But as above I 

Re: [PATCH GCC][4/7]Choose exit edge/path when removing inner loop's exit statement

2017-10-19 Thread Tom de Vries

On 10/09/2017 03:34 PM, Richard Biener wrote:

On Thu, Oct 5, 2017 at 3:16 PM, Bin Cheng  wrote:

Hi,
Function generate_loops_for_partition chooses arbitrary path when removing exit
condition not in partition.  This is fine for now because it's impossible to 
have
loop exit condition in case of innermost distribution.  After extending to loop
nest distribution, we must choose exit edge/path for inner loop's exit 
condition,
otherwise an infinite empty loop will be generated.  Test case added.

Bootstrap and test in patch set on x86_64 and AArch64, is it OK?


Ok.

Richard.


Thanks,
bin
2017-10-04  Bin Cheng  

 * tree-loop-distribution.c (generate_loops_for_partition): Remove
 inner loop's exit stmt by making it always exit the loop, otherwise
 we would generate an infinite empty loop.

gcc/testsuite/ChangeLog
2017-10-04  Bin Cheng  

 * gcc.dg/tree-ssa/ldist-27.c: New test.


Hi,

I've committed patch below to specify the stack size requirements of 
this test-case (fixing the test failure for nvptx).


Does it make sense to trim down the test-case using #ifdef STACK_SIZE?

Thanks,
- Tom
Specify required stack size for gcc.dg/tree-ssa/ldist-27.c

2017-10-19  Tom de Vries  

	* gcc.dg/tree-ssa/ldist-27.c: Use dg-require-stack-size.

---
 gcc/testsuite/gcc.dg/tree-ssa/ldist-27.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-27.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-27.c
index 3580c65..cd9696e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-27.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-27.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O3 -ftree-loop-distribute-patterns -fdump-tree-ldist-details" } */
+/* { dg-require-stack-size "(300 + 200 + 300 * 200) * 8" } */
 
 #define M (300)
 #define N (200)


Re: [PATCH, i386] Improve double-word comparisons (PR target/82580)

2017-10-19 Thread Uros Bizjak
On Thu, Oct 19, 2017 at 10:09 AM, Jakub Jelinek  wrote:
> Hi!
>
> With the patch you've checked in yesterday we generate e.g. for f2
> cmpq%rdi, %rdx
> sbbq%rsi, %rcx
> setb%al
> movzbl  %al, %eax
> This is because the peephole2s we have for setcc + movzbl to xorl + setcc
> check that the flags register is dead before the instruction before
> setcc (sbbq above), which is not the case, so we can't insert xorl %eax, %eax
> before sbbq, as that woiuld clobber flags.  But we can emit
> xorl%eax, %eax
> cmpq%rdi, %rdx
> sbbq%rsi, %rcx
> setb%al
> in this case, move the clearing one insn earlier (of course assuming that
> neither the cmpq nor sbbq use or set %rax).
>
> The following peephole2s do that, bootstrapped/regtested on x86_64-linux and
> i686-linux, ok for trunk?
>
> The reason for only testing no movzb* insns for lp64 is that for -m32 there
> are some - the arguments are passed on the stack, so they need to be loaded
> from there and %eax is one of the registers it chooses to use.
>
> 2017-10-19  Jakub Jelinek  
>
> PR target/82580
> * config/i386/i386.md (setcc + movzbl to xor + setcc): New peephole2.
> (setcc + and to xor + setcc): New peephole2.
>
> * gcc.target/i386/pr82580.c: Use {\msbb} instead of "sbb" in
> scan-assembler-times.  Check that there are no movzb* instructions
> if lp64.

OK.

Thanks,
Uros.

> --- gcc/config/i386/i386.md.jj  2017-10-18 14:01:33.0 +0200
> +++ gcc/config/i386/i386.md 2017-10-18 14:38:04.623063038 +0200
> @@ -12300,6 +12300,34 @@ (define_peephole2
>ix86_expand_clear (operands[3]);
>  })
>
> +(define_peephole2
> +  [(set (reg FLAGS_REG) (match_operand 0))
> +   (parallel [(set (reg FLAGS_REG) (match_operand 1))
> + (match_operand 5)])
> +   (set (match_operand:QI 2 "register_operand")
> +   (match_operator:QI 3 "ix86_comparison_operator"
> + [(reg FLAGS_REG) (const_int 0)]))
> +   (set (match_operand 4 "any_QIreg_operand")
> +   (zero_extend (match_dup 2)))]
> +  "(peep2_reg_dead_p (4, operands[2])
> +|| operands_match_p (operands[2], operands[4]))
> +   && ! reg_overlap_mentioned_p (operands[4], operands[0])
> +   && ! reg_overlap_mentioned_p (operands[4], operands[1])
> +   && ! reg_set_p (operands[4], operands[5])
> +   && refers_to_regno_p (FLAGS_REG, operands[1], (rtx *)NULL)
> +   && peep2_regno_dead_p (0, FLAGS_REG)"
> +  [(set (match_dup 6) (match_dup 0))
> +   (parallel [(set (match_dup 7) (match_dup 1))
> + (match_dup 5)])
> +   (set (strict_low_part (match_dup 8))
> +   (match_dup 3))]
> +{
> +  operands[6] = gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);
> +  operands[7] = gen_rtx_REG (GET_MODE (operands[1]), FLAGS_REG);
> +  operands[8] = gen_lowpart (QImode, operands[4]);
> +  ix86_expand_clear (operands[4]);
> +})
> +
>  ;; Similar, but match zero extend with andsi3.
>
>  (define_peephole2
> @@ -12345,6 +12373,35 @@ (define_peephole2
>operands[6] = gen_lowpart (QImode, operands[3]);
>ix86_expand_clear (operands[3]);
>  })
> +
> +(define_peephole2
> +  [(set (reg FLAGS_REG) (match_operand 0))
> +   (parallel [(set (reg FLAGS_REG) (match_operand 1))
> + (match_operand 5)])
> +   (set (match_operand:QI 2 "register_operand")
> +   (match_operator:QI 3 "ix86_comparison_operator"
> + [(reg FLAGS_REG) (const_int 0)]))
> +   (parallel [(set (match_operand 4 "any_QIreg_operand")
> +  (zero_extend (match_dup 2)))
> + (clobber (reg:CC FLAGS_REG))])]
> +  "(peep2_reg_dead_p (4, operands[2])
> +|| operands_match_p (operands[2], operands[4]))
> +   && ! reg_overlap_mentioned_p (operands[4], operands[0])
> +   && ! reg_overlap_mentioned_p (operands[4], operands[1])
> +   && ! reg_set_p (operands[4], operands[5])
> +   && refers_to_regno_p (FLAGS_REG, operands[1], (rtx *)NULL)
> +   && peep2_regno_dead_p (0, FLAGS_REG)"
> +  [(set (match_dup 6) (match_dup 0))
> +   (parallel [(set (match_dup 7) (match_dup 1))
> + (match_dup 5)])
> +   (set (strict_low_part (match_dup 8))
> +   (match_dup 3))]
> +{
> +  operands[6] = gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);
> +  operands[7] = gen_rtx_REG (GET_MODE (operands[1]), FLAGS_REG);
> +  operands[8] = gen_lowpart (QImode, operands[4]);
> +  ix86_expand_clear (operands[4]);
> +})
>
>  ;; Call instructions.
>
> --- gcc/testsuite/gcc.target/i386/pr82580.c.jj  2017-10-19 09:08:14.499637398 
> +0200
> +++ gcc/testsuite/gcc.target/i386/pr82580.c 2017-10-19 10:01:22.148721921 
> +0200
> @@ -35,4 +35,5 @@ void f21 (S x, S y) { if (x >= y) bar ()
>  void f22 (S x, S y) { if (x < y) bar (); }
>  void f23 (S x, S y) { if (x <= y) bar (); }
>
> -/* { dg-final { scan-assembler-times "sbb" 16 } } */
> +/* { dg-final { scan-assembler-times {\msbb} 16 } } */
> +/* { dg-final { scan-assembler-not {\mmovzb} { target 

Re: Add an alternative vector loop iv mechanism

2017-10-19 Thread Richard Biener
On Thu, Oct 19, 2017 at 12:28 AM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Fri, Oct 13, 2017 at 4:10 PM, Richard Sandiford
>>  wrote:
>>> Normally we adjust the vector loop so that it iterates:
>>>
>>>(original number of scalar iterations - number of peels) / VF
>>>
>>> times, enforcing this using an IV that starts at zero and increments
>>> by one each iteration.  However, dividing by VF would be expensive
>>> for variable VF, so this patch adds an alternative in which the IV
>>> increments by VF each iteration instead.  We then need to take care
>>> to handle possible overflow in the IV.
>>
>> Hmm, why do you need to handle possible overflow?  Doesn't the
>> original loop have a natural IV that evolves like this?  After all we
>> can compute an expression for niters of the scalar loop.
>
> The problem comes with loops like:
>
>unsigned char i = 0;
>do
>  {
>...
>i--;
>  }
>while (i != 0);
>
> The loop statements execute 256 times and the latch executes 255 times.
> LOOP_VINFO_NITERSM1 is then 255 but LOOP_VINFO_NITERS (stored as an
> unsigned char) is 0.

Yes, that's an existing issue and the reason why I introduced
NITERSM1.  All remaining uses of NITERS should really go away
because of this corner-case.  So you are introducing a new user?

Richard.

> This leads to things like:
>
>   /* Constant case.  */
>   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> {
>   tree cst_niters = LOOP_VINFO_NITERS (loop_vinfo);
>   tree cst_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);
>
>   gcc_assert (TREE_CODE (cst_niters) == INTEGER_CST);
>   gcc_assert (TREE_CODE (cst_nitersm1) == INTEGER_CST);
>   if (wi::to_widest (cst_nitersm1) < wi::to_widest (cst_niters))
> return true;
> }
>
> in loop_niters_no_overflow.
>
>>> The new mechanism isn't used yet; a later patch replaces the
>>> "if (1)" with a check for variable VF.  If the patch is OK, I'll
>>> hold off applying it until the follow-on is ready to go in.
>>
>> I indeed don't like code that isn't exercised.  Otherwise looks reasonable.
>
> Thanks.
>
> Richard
>
>> Thanks,
>> Richard.
>>
>>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
>>> OK to install when the time comes?
>>>
>>> Richard
>>>
>>>
>>> 2017-10-13  Richard Sandiford  
>>>
>>> gcc/
>>> * tree-vect-loop-manip.c: Include gimple-fold.h.
>>> (slpeel_make_loop_iterate_ntimes): Add step, final_iv and
>>> niters_maybe_zero parameters.  Handle other cases besides a step of 
>>> 1.
>>> (vect_gen_vector_loop_niters): Add a step_vector_ptr parameter.
>>> Add a path that uses a step of VF instead of 1, but disable it
>>> for now.
>>> (vect_do_peeling): Add step_vector, niters_vector_mult_vf_var
>>> and niters_no_overflow parameters.  Update calls to
>>> slpeel_make_loop_iterate_ntimes and vect_gen_vector_loop_niters.
>>> Create a new SSA name if the latter choses to use a ste other
>>> than zero, and return it via niters_vector_mult_vf_var.
>>> * tree-vect-loop.c (vect_transform_loop): Update calls to
>>> vect_do_peeling, vect_gen_vector_loop_niters and
>>> slpeel_make_loop_iterate_ntimes.
>>> * tree-vectorizer.h (slpeel_make_loop_iterate_ntimes, 
>>> vect_do_peeling)
>>> (vect_gen_vector_loop_niters): Update declarations after above
>> changes.
>>>
>>> Index: gcc/tree-vect-loop-manip.c
>>> ===
>>> --- gcc/tree-vect-loop-manip.c  2017-10-13 15:01:40.144777367 +0100
>>> +++ gcc/tree-vect-loop-manip.c  2017-10-13 15:01:40.296014347 +0100
>>> @@ -41,6 +41,7 @@ Software Foundation; either version 3, o
>>>  #include "tree-scalar-evolution.h"
>>>  #include "tree-vectorizer.h"
>>>  #include "tree-ssa-loop-ivopts.h"
>>> +#include "gimple-fold.h"
>>>
>>>  /*
>>>Simple Loop Peeling Utilities
>>> @@ -247,30 +248,115 @@ adjust_phi_and_debug_stmts (gimple *upda
>>> gimple_bb (update_phi));
>>>  }
>>>
>>> -/* Make the LOOP iterate NITERS times. This is done by adding a new IV
>>> -   that starts at zero, increases by one and its limit is NITERS.
>>> +/* Make LOOP iterate N == (NITERS - STEP) / STEP + 1 times,
>>> +   where NITERS is known to be outside the range [1, STEP - 1].
>>> +   This is equivalent to making the loop execute NITERS / STEP
>>> +   times when NITERS is nonzero and (1 << M) / STEP times otherwise,
>>> +   where M is the precision of NITERS.
>>> +
>>> +   NITERS_MAYBE_ZERO is true if NITERS can be zero, false it is known
>>> +   to be >= STEP.  In the latter case N is always NITERS / STEP.
>>> +
>>> +   If FINAL_IV is nonnull, it is an SSA name that should be set to
>>> +   N * STEP on exit from the 

[C++ PATCH] Avoid bogus -Wreturn-local-addr warnings in templates (PR c++/82600)

2017-10-19 Thread Jakub Jelinek
Hi!

A recent change to check_return_expr resulted in
maybe_warn_about_returning_address_of_local being called also with
processing_template_decl.  The problem with that is that the function
relies on folding (fold_for_warn) which isn't performed at all when
processing_template_decl.  So, we have still ARRAY_REF for a local decl
that has pointer type, rather than the expected POINTER_PLUS_EXPR,
where ARRAY_REF would only remain if the variable was actually an array.

The following patch fixes that by not calling the function at all
when processing_template_decl like before.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-10-19  Jakub Jelinek  

PR c++/82600
* typeck.c (check_return_expr): Don't call
maybe_warn_about_returning_address_of_local in templates.

* g++.dg/warn/Wreturn-local-addr-4.C: New test.

--- gcc/cp/typeck.c.jj  2017-10-10 22:04:05.0 +0200
+++ gcc/cp/typeck.c 2017-10-18 12:17:29.282963388 +0200
@@ -9228,7 +9228,8 @@ check_return_expr (tree retval, bool *no
   && TREE_CODE (TREE_OPERAND (retval, 1)) == AGGR_INIT_EXPR)
retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
 TREE_OPERAND (retval, 0));
-  else if (maybe_warn_about_returning_address_of_local (retval))
+  else if (!processing_template_decl
+  && maybe_warn_about_returning_address_of_local (retval))
retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
 build_zero_cst (TREE_TYPE (retval)));
 }
--- gcc/testsuite/g++.dg/warn/Wreturn-local-addr-4.C.jj 2017-10-18 
12:21:08.569273452 +0200
+++ gcc/testsuite/g++.dg/warn/Wreturn-local-addr-4.C2017-10-18 
12:16:28.0 +0200
@@ -0,0 +1,18 @@
+// PR c++/82600
+// { dg-do compile }
+
+void *b[10];
+
+template 
+void **
+foo (int x)
+{
+  void **a = b;// { dg-bogus "address of local variable 'a' 
returned" }
+  return [x];
+}
+
+void **
+bar (int x)
+{
+  return foo <0> (x);
+}

Jakub


Re: [PATCH] Fix nrv-1.c false failure on aarch64.

2017-10-19 Thread Richard Biener
On Wed, Oct 18, 2017 at 6:59 PM, Egeyar Bagcioglu
 wrote:
> Hello,
>
> Test case "guality.exp=nrv-1.c" fails on aarch64. Optimizations reorder the
> instructions and cause the value of a variable to be checked before its
> first assignment. The following patch is moving the
> break point to the end of the function. Therefore, it ensures that the break
> point is reached after the assignment instruction is executed.
>
> Please review the patch and apply if legitimate.

guality testcases are mostly user-experience tests but they are indeed
prone to the usual jumpiness.  As a user we'd expect a breakpoint
on this line to trigger only if previous stmts have been committed.

I guess Alex work on stmt frontiers will fix this instance?

Richard.

> Egeyar
>


Re: [patch] Fix PR debug/82509

2017-10-19 Thread Richard Biener
On Wed, Oct 18, 2017 at 3:54 PM, Eric Botcazou  wrote:
>> Hmm.  It makes tracking DIE builds difficult now that not all allocations go
>> through new_die anymore.
>
> I wouldn't have created such a precedent though, IOW there is nothing new.

Ah, didn't notice the other existing cases.

>> Can you instead split out a new_die_raw
>> function with just the allocation and the die_tag initialization?  Or make
>> !parent_die && !t this mode, thus add
>>
>>   if (parent_die != NULL)
>> add_child_die (parent_die, die);
>>   else if (! t)
>> return die;
>>   else
>>  {
>>
>> ?  Otherwise the patch looks sensible.
>
> Here's a revision version which makes sure that there is a single call to
>
>   ggc_cleared_alloc ()
>
> in the entire file.  Tested on x86_64-suse-linux.

Thanks, this is ok.

Richard.

>
> PR debug/82509
> * dwarf2out.c (new_die_raw): New static inline function.
> (new_die): Use it to create the DIE.
> (add_AT_external_die_ref): Likewise.
> (clone_die): Likewise.
> (clone_as_declaration): Likewise.
> (dwarf2out_vms_debug_main_pointer): Likewise.
> (base_type_die): Likewise.  Remove early return for corner cases.
> Do not call add_pubtype on the DIE here.
> (is_base_type): Remove ERROR_MARK and return 0 for VOID_TYPE.
> (modified_type_die): Adjust the lookup for reverse order DIEs.  Skip
> typedefs for base types with DW_AT_endianity.  Make sure a DIE with
> native order exists for base types, attach the DIE manually and call
> add_pubtype on it.  Do not equate a reverse order DIE to the type.
>
> --
> Eric Botcazou


[PATCH, i386] Improve double-word comparisons (PR target/82580)

2017-10-19 Thread Jakub Jelinek
Hi!

With the patch you've checked in yesterday we generate e.g. for f2
cmpq%rdi, %rdx
sbbq%rsi, %rcx
setb%al
movzbl  %al, %eax
This is because the peephole2s we have for setcc + movzbl to xorl + setcc
check that the flags register is dead before the instruction before
setcc (sbbq above), which is not the case, so we can't insert xorl %eax, %eax
before sbbq, as that woiuld clobber flags.  But we can emit
xorl%eax, %eax
cmpq%rdi, %rdx
sbbq%rsi, %rcx
setb%al
in this case, move the clearing one insn earlier (of course assuming that
neither the cmpq nor sbbq use or set %rax).

The following peephole2s do that, bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk?

The reason for only testing no movzb* insns for lp64 is that for -m32 there
are some - the arguments are passed on the stack, so they need to be loaded
from there and %eax is one of the registers it chooses to use.

2017-10-19  Jakub Jelinek  

PR target/82580
* config/i386/i386.md (setcc + movzbl to xor + setcc): New peephole2.
(setcc + and to xor + setcc): New peephole2.

* gcc.target/i386/pr82580.c: Use {\msbb} instead of "sbb" in
scan-assembler-times.  Check that there are no movzb* instructions
if lp64.

--- gcc/config/i386/i386.md.jj  2017-10-18 14:01:33.0 +0200
+++ gcc/config/i386/i386.md 2017-10-18 14:38:04.623063038 +0200
@@ -12300,6 +12300,34 @@ (define_peephole2
   ix86_expand_clear (operands[3]);
 })
 
+(define_peephole2
+  [(set (reg FLAGS_REG) (match_operand 0))
+   (parallel [(set (reg FLAGS_REG) (match_operand 1))
+ (match_operand 5)])
+   (set (match_operand:QI 2 "register_operand")
+   (match_operator:QI 3 "ix86_comparison_operator"
+ [(reg FLAGS_REG) (const_int 0)]))
+   (set (match_operand 4 "any_QIreg_operand")
+   (zero_extend (match_dup 2)))]
+  "(peep2_reg_dead_p (4, operands[2])
+|| operands_match_p (operands[2], operands[4]))
+   && ! reg_overlap_mentioned_p (operands[4], operands[0])
+   && ! reg_overlap_mentioned_p (operands[4], operands[1])
+   && ! reg_set_p (operands[4], operands[5])
+   && refers_to_regno_p (FLAGS_REG, operands[1], (rtx *)NULL)
+   && peep2_regno_dead_p (0, FLAGS_REG)"
+  [(set (match_dup 6) (match_dup 0))
+   (parallel [(set (match_dup 7) (match_dup 1))
+ (match_dup 5)])
+   (set (strict_low_part (match_dup 8))
+   (match_dup 3))]
+{
+  operands[6] = gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);
+  operands[7] = gen_rtx_REG (GET_MODE (operands[1]), FLAGS_REG);
+  operands[8] = gen_lowpart (QImode, operands[4]);
+  ix86_expand_clear (operands[4]);
+})
+
 ;; Similar, but match zero extend with andsi3.
 
 (define_peephole2
@@ -12345,6 +12373,35 @@ (define_peephole2
   operands[6] = gen_lowpart (QImode, operands[3]);
   ix86_expand_clear (operands[3]);
 })
+
+(define_peephole2
+  [(set (reg FLAGS_REG) (match_operand 0))
+   (parallel [(set (reg FLAGS_REG) (match_operand 1))
+ (match_operand 5)])
+   (set (match_operand:QI 2 "register_operand")
+   (match_operator:QI 3 "ix86_comparison_operator"
+ [(reg FLAGS_REG) (const_int 0)]))
+   (parallel [(set (match_operand 4 "any_QIreg_operand")
+  (zero_extend (match_dup 2)))
+ (clobber (reg:CC FLAGS_REG))])]
+  "(peep2_reg_dead_p (4, operands[2])
+|| operands_match_p (operands[2], operands[4]))
+   && ! reg_overlap_mentioned_p (operands[4], operands[0])
+   && ! reg_overlap_mentioned_p (operands[4], operands[1])
+   && ! reg_set_p (operands[4], operands[5])
+   && refers_to_regno_p (FLAGS_REG, operands[1], (rtx *)NULL)
+   && peep2_regno_dead_p (0, FLAGS_REG)"
+  [(set (match_dup 6) (match_dup 0))
+   (parallel [(set (match_dup 7) (match_dup 1))
+ (match_dup 5)])
+   (set (strict_low_part (match_dup 8))
+   (match_dup 3))]
+{
+  operands[6] = gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);
+  operands[7] = gen_rtx_REG (GET_MODE (operands[1]), FLAGS_REG);
+  operands[8] = gen_lowpart (QImode, operands[4]);
+  ix86_expand_clear (operands[4]);
+})
 
 ;; Call instructions.
 
--- gcc/testsuite/gcc.target/i386/pr82580.c.jj  2017-10-19 09:08:14.499637398 
+0200
+++ gcc/testsuite/gcc.target/i386/pr82580.c 2017-10-19 10:01:22.148721921 
+0200
@@ -35,4 +35,5 @@ void f21 (S x, S y) { if (x >= y) bar ()
 void f22 (S x, S y) { if (x < y) bar (); }
 void f23 (S x, S y) { if (x <= y) bar (); }
 
-/* { dg-final { scan-assembler-times "sbb" 16 } } */
+/* { dg-final { scan-assembler-times {\msbb} 16 } } */
+/* { dg-final { scan-assembler-not {\mmovzb} { target lp64 } } } */

Jakub


Re: [RFA] Zen tuning part 9: Add support for scatter/gather in vectorizer costmodel

2017-10-19 Thread Richard Biener
On Thu, 19 Oct 2017, Jan Hubicka wrote:

> Hi,
> this is proof of concept patch for vectorizer costs to use costs used for 
> rtx_cost
> and register_move_cost which are readily available in ix86_costs instead of 
> using
> its own set of random values.  At least until we have proof of evidence that 
> vectroizer
> costs needs to differ, I do not think we want to complicate CPU tuning by 
> having them
> twice.
> 
> This is of course quite intrusive change to what we have becuase it affects 
> all
> x86 targets.  I have finally worked out that the "random" values used by AMD 
> target
> corresponds to latencies of bdver1.
> 
> I have benchmarked them on Zen and also temporarily patches Czerny (Haswel).
> It seems to cause no regression and quite nice improvements:
>   - 27.3% for facerec on Zen
>   - 7% for mgrid on Haswel
>   - maybe 1% for galgel of Haswell
>   - 3% for facerec on Haswell
>   - maybe 1% aspi on Haswell
>   - there may be small off-noise improvement for rnflow and regression for 
> fatigue2 on Haswell
> 
> So I would say that outcome is surprisingly good (especially due to lack of
> noteworthy regressions).  I also know that vectorizer hurts performance on 
> Zen and
> Mesa/tonto benchmarks which is not cured by this patch alone.
> 
> There is testsuite fallout though.
> 
> ./testsuite/g++/g++.sum:FAIL: g++.dg/vect/slp-pr56812.cc  -std=c++11  
> scan-tree-dump-times slp1 "basic block vectorized" 1 (found 0 times)
> ./testsuite/g++/g++.sum:FAIL: g++.dg/vect/slp-pr56812.cc  -std=c++14  
> scan-tree-dump-times slp1 "basic block vectorized" 1 (found 0 times)
> ./testsuite/g++/g++.sum:FAIL: g++.dg/vect/slp-pr56812.cc  -std=c++98  
> scan-tree-dump-times slp1 "basic block vectorized" 1 (found 0 times)
> 
>   Here we vectorize the loop before first while originally we unrolled and 
> SLP vectorized next
> 
> ./testsuite/gcc/gcc.sum:FAIL: gcc.target/i386/l_fma_double_1.c 
> scan-assembler-times vfmadd[123]+sd 56 (found 32 times)
> ./testsuite/gcc/gcc.sum:FAIL: gcc.target/i386/l_fma_double_2.c 
> scan-assembler-times vfmadd[123]+sd 56 (found 32 times)
> ./testsuite/gcc/gcc.sum:FAIL: gcc.target/i386/l_fma_double_3.c 
> scan-assembler-times vfmadd[123]+sd 56 (found 32 times)
> ./testsuite/gcc/gcc.sum:FAIL: gcc.target/i386/l_fma_double_4.c 
> scan-assembler-times vfmadd[123]+sd 56 (found 32 times)
> ./testsuite/gcc/gcc.sum:FAIL: gcc.target/i386/l_fma_double_5.c 
> scan-assembler-times vfmadd[123]+sd 56 (found 32 times)
> ./testsuite/gcc/gcc.sum:FAIL: gcc.target/i386/l_fma_double_6.c 
> scan-assembler-times vfmadd[123]+sd 56 (found 32 times)
> ./testsuite/gcc/gcc.sum:FAIL: gcc.target/i386/l_fma_float_1.c 
> scan-assembler-times vfmadd[123]+ss 120 (found 64 times)
> ./testsuite/gcc/gcc.sum:FAIL: gcc.target/i386/l_fma_float_2.c 
> scan-assembler-times vfmadd[123]+ss 120 (found 64 times)
> ./testsuite/gcc/gcc.sum:FAIL: gcc.target/i386/l_fma_float_3.c 
> scan-assembler-times vfmadd[123]+ss 120 (found 64 times)
> ./testsuite/gcc/gcc.sum:FAIL: gcc.target/i386/l_fma_float_4.c 
> scan-assembler-times vfmadd[123]+ss 120 (found 64 times)
> ./testsuite/gcc/gcc.sum:FAIL: gcc.target/i386/l_fma_float_5.c 
> scan-assembler-times vfmadd[123]+ss 120 (found 64 times)
> ./testsuite/gcc/gcc.sum:FAIL: gcc.target/i386/l_fma_float_6.c 
> scan-assembler-times vfnmsub[123]+ss 120 (found 64 times)
> 
> And friends, clearly we do not vectorize all loops, I did not look into 
> details yet
> 
> ./testsuite/gcc/gcc.sum:FAIL: gcc.target/i386/pr61403.c scan-assembler blend
> 
> Here again we vectorize loop while originally we did SLP.  I am not sure why 
> loop
> vectorizer does not use blend.
> 
> ./testsuite/gcc/gcc.sum:FAIL: gcc.target/i386/pr79683.c scan-assembler-times 
> padd 1 (found 0 times)
> 
> Here we are supposed to vectorize two integer additions, but since generic 
> cost model now claims that
> latency of vector add is twice of integer add we don't.  I think it makes 
> sense.
> 
> ./testsuite/gcc/gcc.sum:FAIL: gcc.target/i386/pr79723.c scan-assembler 
> mov[au]p.[ \t][^,]+, %gs:
> 
> Similarly here.
> 
> If it seems to make sense, I will clean it up (remove now unused entries and 
> scale
> conditional costs by COSTS_N_INSNS) and fix the tessuite fallout.

Please look at the testsuite fallout in detail.  Note that only
testcases that do not disable the cost model should be affected
(all vect.exp testcases disable the cost model for example).

The patch itself looks mostly good, I suppose if we also have
separate costs for float vs. double you could do a bit better
by looking at the type in more detail.  I think vectype should
be non-NULL most of the time - but for example for the scalar cost
it might not be always there, likewise for SLP vectorization the
scalar cost calls will not have the type information so you'll
get a mixup between integer and FP costing scalar vs. vector.
Nothing that cannot be fixed on the vectorizer side, but ...
(we'd document that for scalar costs we pass the scalar type
for example).


Re: [PATCH] Derive interface buffers from max name length

2017-10-19 Thread Bernhard Reutner-Fischer
On Sat, Jun 18, 2016 at 09:46:17PM +0200, Bernhard Reutner-Fischer wrote:
> On December 3, 2015 10:46:09 AM GMT+01:00, Janne Blomqvist 
>  wrote:
> >On Tue, Dec 1, 2015 at 6:51 PM, Bernhard Reutner-Fischer
> > wrote:
> >> On 1 December 2015 at 15:52, Janne Blomqvist
> > wrote:
> >>> On Tue, Dec 1, 2015 at 2:54 PM, Bernhard Reutner-Fischer
> >>>  wrote:
>  These three function used a hardcoded buffer of 100 but would be
> >better
>  off to base off GFC_MAX_SYMBOL_LEN which denotes the maximum length
> >of a
>  name in any of our supported standards (63 as of f2003 ff.).
> >>>
> >>> Please use xasprintf() instead (and free the result, or course). One
> >>> of my backburner projects is to get rid of these static symbol
> >>> buffers, and use dynamic buffers (or the symbol table) instead. We
> >>> IIRC already have some ugly hacks by using hashing to get around
> >>> GFC_MAX_SYMBOL_LEN when handling mangled symbols. Your patch doesn't
> >>> make the situation worse per se, but if you're going to fix it, lets
> >>> do it properly.
> >>
> >> I see.
> >>
> >> /scratch/src/gcc-6.0.mine/gcc/fortran$ git grep
> >> "^[[:space:]]*char[[:space:]][[:space:]]*[^[;[:space:]]*\[" | wc -l
> >> 142
> >> /scratch/src/gcc-6.0.mine/gcc/fortran$ git grep "xasprintf" | wc -l
> >> 32
> >
> >Yes, that's why it's on the TODO-list rather than on the DONE-list. :)
> >
> >> What about memory fragmentation when switching to heap-based
> >allocation?
> >> Or is there consensus that these are in the noise compared to other
> >> parts of the compiler?
> >
> >Heap fragmentation is an issue, yes. I'm not sure it's that
> >performance-critical, but I don't think there is any consensus. I just
> >want to avoid ugly hacks like symbol hashing to fit within some fixed
> >buffer. Perhaps an good compromise would be something like std::string
> >with small string optimization, but as you have seen there is some
> >resistance to C++. But this is more relevant for mangled symbols, so
> >GFC_MAX_MANGLED_SYMBOL_LEN is more relevant here, and there's only a
> >few of them left. So, well, if you're sure that mangled symbols are
> >never copied into the buffers your patch modifies, please consider
> >your original patch Ok as well. Whichever you prefer.
> >
> >Performance-wise I think a bigger benefit would be to use the symbol
> >table more and then e.g. be able to do pointer comparisons rather than
> >strcmp(). But that is certainly much more work.
> 
> Hm, worth a look indeed since that would certainly be a step in the right 
> direction.

Installed the initial patch as intermediate step as r253881 for now.

thanks,
> 
> >
> >> BTW:
> >> $ git grep APO
> >> io.c:  static const char *delim[] = { "APOSTROPHE", "QUOTE", "NONE",
> >NULL };
> >> io.c:  static const char *delim[] = { "APOSTROPHE", "QUOTE", "NONE",
> >NULL };
> >
> >? What are you saying?
> 
> delim is duplicated, we should remove one instance.
> thanks,
> 


Re: [patch] avoid printing leading 0 in widest_int hex dumps

2017-10-19 Thread Aldy Hernandez



On 10/18/2017 06:39 PM, Richard Sandiford wrote:

Aldy Hernandez  writes:

On Tue, Oct 17, 2017 at 6:05 PM, Richard Sandiford



Ah!  OK.  Yeah, I agree it doesn't make sense to print sign-extension
bits above the precision.  I think it'd work if print_hex used
extract_uhwi insteead of elt, which would also remove the need
to handle "negative" numbers specially.  I'll try that tomorrow.


Thanks.




Currently I'm doing this to chop off the unnecessary bits:

/* Wide ints may be sign extended to the full extent of the
underlying HWI storage, even if the precision we care about
is smaller.  Chop off the excess bits for prettier output.  */
signop sign = TYPE_UNSIGNED (type) ? UNSIGNED : SIGNED;
widest_int val = widest_int::from (bounds[i], sign);
val &= wi::mask (bounds[i].get_precision (), false);

if (val > 0x)
   print_hex (val, pp_buffer (buffer)->digit_buffer);
else
   print_dec (val, pp_buffer (buffer)->digit_buffer, sign);

Since I am calling print_hex() on the widest_int, I get the leading 0.

Do you recommend another way of accomplishing this?


Is it just print_hex that's the problem?  Does print_dec handle
big negative numbers properly?


I haven't checked.  I don't bother printing in decimal when the number 
is larger than say, 0x, hence the code above :).



Aldy


Re: [patch] implement generic debug() functions for wide_int's

2017-10-19 Thread Aldy Hernandez



On 10/18/2017 11:23 PM, Martin Sebor wrote:

On 10/18/2017 11:08 AM, Aldy Hernandez wrote:

It looks like the generic debug() function for wide_int's is missing.
Instead, we have a wi->dump() method.  I have implemented debug() for
generic wide_int and for widest_int, which should cover the common
cases.  Anything else, can continue using the ->dump() method
templated methods.


Do you by any chance also plan to add support to the pretty printer
for wide_int (and offset_int)?  It would obviate having to convert
them to {s,u}hwi.


The pretty printer wasn't on my radar.  I am mostly concerned in 
enhancing the debugging experience right now.  Perhaps I should put that 
on my todo list...


Aldy


Re: [patch] implement generic debug() functions for wide_int's

2017-10-19 Thread Aldy Hernandez



On 10/18/2017 06:22 PM, Pedro Alves wrote:

On 10/18/2017 06:08 PM, Aldy Hernandez wrote:


Also, do we have a blessed way of specifying overloaded functions in
ChangeLog's?  I couldn't find anything in our GCC coding guidelines or
in the GNU coding guidelines.  For lack of direction, I'm doing the
following:

* wide-int.cc (debug) [const wide_int &]: New.
(debug) [const wide_int *]: New.
(debug) [const widest_int &]: New.
(debug) [const widest_int *]: New.

It seems appropriate, as that is the GNU way of changelogs for a
conditional change to a function ???.


Doesn't seem that appropriate to me.  Square brackets are used for
conditional compilation (#ifdef etc.), but overloads are not that.

I'd suggest looking in libstdc++'s ChangeLog for precedents.  It's where
I looked at when I had the same question for GDB, FWIW.  E.g., a very
recent libstdc++ commit from Jon had:

 * include/bits/stl_map.h (map::insert(value_type&&))
 (map::insert(const_iterator, value_type&&)): Add overload for rvalues.


...but that's sooo ugly :).

Very well.  Committed the patch below.

Thanks for setting me straight.

commit 1a91de5beb16d130bed17f6eeb6b5ca6af454003 (HEAD -> trunk)
Author: Aldy Hernandez 
Date:   Thu Oct 19 03:49:34 2017 -0400

Update my last ChangeLog entry to properly specify overloaded 
functions.


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a139a824d35..e43c53661e9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -17,10 +17,10 @@

 2017-10-18  Aldy Hernandez  

-   * wide-int.cc (debug) [const wide_int &]: New.
-   (debug) [const wide_int *]: New.
-   (debug) [const widest_int &]: New.
-   (debug) [const widest_int *]: New.
+   * wide-int.cc (debug (const wide_int &)): New.
+   (debug (const wide_int *)): New.
+   (debug (const widest_int &)): New.
+   (debug (const widest_int *)): New.


  1   2   >