date:20151208

No functional changes.

2015-12-08  Uros Bizjak  

* config/i386/i386.c (ix86_emit_swsqrtsf): Cleanup
infinity filterning code.

Bootstrapped and regression tested on x86_64-linux-gnu, committed to
mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 231355)
+++ config/i386/i386.c  (working copy)
@@ -47827,8 +47827,7 @@ void ix86_emit_swdivsf (rtx res, rtx a, rtx b, mac
 /* Output code to perform a Newton-Rhapson approximation of a
single precision floating point [reciprocal] square root.  */
 
-void ix86_emit_swsqrtsf (rtx res, rtx a, machine_mode mode,
-bool recip)
+void ix86_emit_swsqrtsf (rtx res, rtx a, machine_mode mode, bool recip)
 {
   rtx x0, e0, e1, e2, e3, mthree, mhalf;
   REAL_VALUE_TYPE r;
@@ -47868,13 +47867,9 @@ void ix86_emit_swdivsf (rtx res, rtx a, rtx b, mac
   /* If (a == 0.0) Filter out infinity to prevent NaN for sqrt(0.0).  */
   if (!recip)
 {
-  rtx zero, mask;
+  rtx zero = force_reg (mode, CONST0_RTX(mode));
+  rtx mask;
 
-  zero = gen_reg_rtx (mode);
-  mask = gen_reg_rtx (mode);
-
-  zero = force_reg (mode, CONST0_RTX(mode));
-
   /* Handle masked compare.  */
   if (VECTOR_MODE_P (mode) && GET_MODE_SIZE (mode) == 64)
{
@@ -47885,8 +47880,8 @@ void ix86_emit_swdivsf (rtx res, rtx a, rtx b, mac
}
   else
{
+ mask = gen_reg_rtx (mode);
  emit_insn (gen_rtx_SET (mask, gen_rtx_NE (mode, zero, a)));
-
  emit_insn (gen_rtx_SET (x0, gen_rtx_AND (mode, x0, mask)));
}
 }

Re: [PATCH] Fix warnings from including fdl.texi into gnat-style.texi


On 23/03/15 16:00, Gerald Pfeifer wrote:

On Fri, 20 Mar 2015, Tom de Vries wrote:

The gnat-style.texi part is OK. I cannot approve the fdl part though.

Gerald,

Can you approve the fdl part?


Let's assume I can.  Okay.

Can you just describe the _why_ a bit in a @comment (in simple
words beyond showing the error message), that is, what the issue
is and how you avoid it?  That should help someone coming in
later, trying to understand.



Gerald,

was the 'Okay' above:
- a figure of speech (as I read it), or
- an actual approval (conditional on the adding of the comment)
?

Thanks,
- Tom

Re: [PATCH, PR68640] Clear restrict in install_var_field

On Tue, 8 Dec 2015, Jakub Jelinek wrote:

> On Tue, Dec 08, 2015 at 09:49:49AM +0100, Tom de Vries wrote:
> > diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> > index d1d1e3c..ac4a94d 100644
> > --- a/gcc/omp-low.c
> > +++ b/gcc/omp-low.c
> > @@ -1389,6 +1389,12 @@ install_var_field (tree var, bool by_ref, int mask, 
> > omp_context *ctx,
> >   || !is_gimple_omp_oacc (ctx->stmt));
> >  
> >type = TREE_TYPE (var);
> > +  /* Prevent redeclaring the var in the split-off function with a restrict
> > + pointer type.  Note that we only clear type itself, restrict 
> > qualifiers in
> > + the pointed-to type will be ignored by points-to analysis.  */
> > +  if (POINTER_TYPE_P (type))
> > +type = build_qualified_type (type, TYPE_QUALS (type) & 
> > ~TYPE_QUAL_RESTRICT);
> 
> Is it necessary to call build_qualified_type in the common case (when it is
> not restrict)?  I'd think if (POINTER_TYPE_P (type) && TYPE_RESTRICT (type))
> might be better.

> That said, I forgot when exactly are the cliques computed, it would be nice
> if then computing those and seeing a
> GOMP_parallel/GOACC_parallel_keyed/GOMP_target_ext/GOMP_task/GOMP_taskloop*
> builtins we would just continue walking the outlined functions they refer to
> as if it was a part of the current function for the purpose of the restrict
> computation.

Cliques are not computed in IPA PTA and the above would require IPA.

Richard.

Re: [patch] Fix PR middle-end/68291 & 68292

2015-12-08 Thread Christophe Lyon

On 7 December 2015 at 10:35, Eric Botcazou  wrote:
> Hi,
>
> it's a couple of regressions in the C testsuite present on SPARC 64-bit and
> coming from the new coalescing code which fails to handle vector types with
> BLKmode that are returned in multiple registers.  The code assigns a BLKmode
> REG to the RESULT_DECL of the function in expand_function_start and this later
> causes expand_function_end to choke.
>
> As discussed with Alexandre in the audit trail, the attached minimal fix just
> prevents the problematic BLKmode REG from being generated, which appears to be
> sufficient to restore the nominal operating mode.
>
> Tested on x86-64/Linux and SPARC64/Solaris, OK for the mainline?
>
>
> 2015-12-07  Eric Botcazou  
>
> PR middle-end/68291
> PR middle-end/68292
> * cfgexpand.c (set_rtl): Always accept PARALLELs with BLKmode for
> SSA names based on RESULT_DECLs.
> * function.c (expand_function_start): Do not create BLKmode REGs
> for GIMPLE registers when coalescing is enabled.
>

Hi Eric,

Since you committed this (r231372), I've noticed several regressions
on ARM and AArch64.
You can have a look at
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/231372/report-build-info.html
for more details.

Can you have a look?

Thanks,

Christophe.


> --
> Eric Botcazou

Re: [patch] Fix PR middle-end/68291 & 68292

> Since you committed this (r231372), I've noticed several regressions
> on ARM and AArch64.
> You can have a look at
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/231372/
> report-build-info.html for more details.

I presume it's:

Fail appears  [ => FAIL]:
  gcc.dg/pr63594-1.c (internal compiler error)
  gcc.dg/pr63594-2.c (internal compiler error)
  gcc.dg/vect/vect-singleton_1.c (internal compiler error)
  gcc.dg/vect/vect-singleton_1.c -flto -ffat-lto-objects (internal compiler 
error)
  gcc.target/aarch64/aapcs64/func-ret-1.c compilation,  -O1  (internal 
compiler error)
  gcc.target/aarch64/aapcs64/func-ret-1.c compilation,  -O2  (internal 
compiler error)
  gcc.target/aarch64/aapcs64/func-ret-1.c compilation,  -O3 -g  (internal 
compiler error)
  gcc.target/aarch64/aapcs64/func-ret-1.c compilation,  -Og -g  (internal 
compiler error)
  gcc.target/aarch64/aapcs64/func-ret-1.c compilation,  -Os  (internal 
compiler error)
  gcc.target/aarch64/pr65491_1.c (internal compiler error)

on aarch64?  Yes, I'm going to have a look.

-- 
Eric Botcazou

[PATCH][AArch64] PR target/68696 FAIL: gcc.target/aarch64/vbslq_u64_1.c scan-assembler-times bif\tv 1

2015-12-08 Thread Kyrill Tkachov


Hi all,

The test gcc.target/aarch64/vbslq_u64_1.c started failing recently due to some 
tree-level changes.
This just exposed a deficiency in our xor-and-xor pattern for the vector 
bit-select pattern:
aarch64_simd_bsl_internal.

We now fail to match the rtx:
(set (reg:V4SI 79)
(xor:V4SI (and:V4SI (xor:V4SI (reg:V4SI 32 v0 [ a ])
(reg/v:V4SI 77 [ b ]))
(reg:V4SI 34 v2 [ mask ]))
(reg/v:V4SI 77 [ b ])))

whereas before combine attempted:
(set (reg:V4SI 79)
(xor:V4SI (and:V4SI (xor:V4SI (reg/v:V4SI 77 [ b ])
(reg:V4SI 32 v0 [ a ]))
(reg:V4SI 34 v2 [ mask ]))
(reg/v:V4SI 77 [ b ])))

Note that just the order of the operands of the inner XOR has changed.
This could be solved by making the second operand of the outer XOR a 4th operand
of the pattern, enforcing that it should be equal to operand 2 or 3 in the 
pattern
condition and performing the appropriate swapping in the output template.
However, the aarch64_simd_bsl_internal pattern is expanded to by other
places in aarch64-simd.md and updating all the callsites to add a 4th operand is
wasteful and makes them harder to understand.

Therefore this patch adds a new define_insn with the match_dup of operand 2 in
the outer XOR.  I also had to update the alternatives/constraints in the pattern
and the output template. Basically it involves swapping operands 2 and 3 around 
in the
constraints and output templates.

The test now combines to a single vector bfi instruction again.

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-12-08  Kyrylo Tkachov  

PR target/68696
* config/aarch64/aarch64-simd.md (*aarch64_simd_bsl_alt):
New pattern.
(aarch64_simd_bsl_internal): Update comment to reflect
the above.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 030a1013caa8a965bcd1615c9686d0be715be921..2856f017c16380623960e21d66149d18efe176f0 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2153,6 +2153,10 @@ (define_insn "aarch64_reduc__internal"
 ;; bit op0, op2, mask
 ;;   if (op0 = op2) (so 0-bits in mask choose bits from op1, else op0)
 ;; bif op0, op1, mask
+;;
+;; This pattern is expanded to by the aarch64_simd_bsl expander.
+;; Some forms of straight-line code may generate the equivalent form
+;; in *aarch64_simd_bsl_alt.
 
 (define_insn "aarch64_simd_bsl_internal"
   [(set (match_operand:VSDQ_I_DI 0 "register_operand" "=w,w,w")
@@ -2172,6 +2176,29 @@ (define_insn "aarch64_simd_bsl_internal"
   [(set_attr "type" "neon_bsl")]
 )
 
+;; We need this form in addition to the above pattern to match the case
+;; when combine tries merging three insns such that the second operand of
+;; the outer XOR matches the second operand of the inner XOR rather than
+;; the first.  The two are equivalent but since recog doesn't try all
+;; permutations of commutative operations, we have to have a separate pattern.
+
+(define_insn "*aarch64_simd_bsl_alt"
+  [(set (match_operand:VSDQ_I_DI 0 "register_operand" "=w,w,w")
+	(xor:VSDQ_I_DI
+	   (and:VSDQ_I_DI
+	 (xor:VSDQ_I_DI
+	   (match_operand:VSDQ_I_DI 3 "register_operand" "w,w,0")
+	   (match_operand:VSDQ_I_DI 2 "register_operand" "w,0,w"))
+	  (match_operand:VSDQ_I_DI 1 "register_operand" "0,w,w"))
+	  (match_dup:VSDQ_I_DI 2)))]
+  "TARGET_SIMD"
+  "@
+  bsl\\t%0., %3., %2.
+  bit\\t%0., %3., %1.
+  bif\\t%0., %2., %1."
+  [(set_attr "type" "neon_bsl")]
+)
+
 (define_expand "aarch64_simd_bsl"
   [(match_operand:VALLDIF 0 "register_operand")
(match_operand: 1 "register_operand")

Re: [PATCH, PR68640] Clear restrict in install_var_field


On 08/12/15 10:41, Tom de Vries wrote:

Cliques are not computed in IPA PTA and the above would require IPA.


FTR, this is a generic (not openmp-related) example, and AFAIU a more
generic form of the issue that is being raised.
...
static void __attribute__((noinline, noclone))
bar (int *a, int * b)
{
   *a = 1;
   *b = 2;
   *a = *a + 1;
}

void
foo (int *__restrict__ a, int *__restrict__ b)
{
   bar (a, b);
}
...


Filed as PR68787 - fipa-pta to interpret restrict.

Thanks,
- Tom

Re: [SPARC] Fix PR target/63668

> 2015-12-07  Eric Botcazou  
> 
>   PR target/63668
>   * doc/invoke.texi (SPARC options): Document -mstd-struct-return.

Applied on mainline.

* doc/invoke.texi (SPARC options): Fix typo.

-- 
Eric BotcazouIndex: doc/invoke.texi
===
--- doc/invoke.texi	(revision 231394)
+++ doc/invoke.texi	(working copy)
@@ -21605,7 +21605,7 @@ the rules of the ABI@.
 @opindex mstd-struct-return
 @opindex mno-std-struct-return
 With @option{-mstd-struct-return}, the compiler generates checking code
-in functions returning structures or unions that detect size mismatches
+in functions returning structures or unions to detect size mismatches
 between the two sides of function calls, as per the 32-bit ABI@.
 
 The default is @option{-mno-std-struct-return}.  This option has no effect

Re: [PATCH, PR68640] Clear restrict in install_var_field

On Tue, 8 Dec 2015, Tom de Vries wrote:

> Hi,
> 
> this patch fixes PR68640.
> 
> Consider this testcase:
> ...
> int
> foo (int *__restrict__ ap)
> {
>   int *bp = ap;
> #pragma omp parallel for
>   for (unsigned int idx = 0; idx < N; idx++)
> ap[idx] = bp[idx];
> }
> ...
> 
> When declaring a field for restrict pointer ap in the data passing struct for
> the thread function, we declare that field with restrict.
> 
> So, we roughly get the equivalent of:
> ...
> foo.omp_fn (int *restrict ap, int *bp)
> {
>   for (unsigned int idx = 0; idx < N; idx++)
> ap[idx] = bp[idx];
> }
> 
> int
> foo (int *__restrict__ ap)
> {
>   int *bp = ap;
>   GOMP_parallel (foo.omp_fn, ap, bp, ...);
> }
> ...
> 
> That is incorrect, since ap is not restrict in thread function foo.omp_fn,
> given that we also access *ap via bp.
> 
> Atm, this doesn't result in wrong code, but it might result in wrong code when
> we make restrict interpretation more aggressive.
> 
> The patch fixes the problem by clearing the restrict qualifier on pointer
> types in install_var_field.
> 
> Build non-bootstrap c compiler and tested gcc/gomp.exp and
> target-libgomp/c.exp.
> 
> OK for stage3 trunk if bootstrap and reg-test succeeds?

Ok.

Richard.

> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment

2015-12-08 Thread Tobias Burnus

On Mon, Dec 07, 2015 at 02:09:22PM +, Matthew Wahab wrote:
> >>I wonder whether using
> >>__asm__ __volatile__ ("":::"memory");
> >>would be sufficient as it has a way lower overhead than
> >>__sync_synchronize().
>
> I don't know anything about Fortran or coarrays and I'm curious
> whether this affects architectures with weak memory models. Is the
> barrier only needed to stop reordering by the compiler or is does it
> also need to stop reordering by the hardware?

Short answer: I think no mfence is needed as either the communication
is local (to the thread/process) - in which case the hardware will act
correctly - or the communication is remote (different thread, process,
communication to different computer via interlink [ethernet, infiniband,
...]); and in the later case, the communication library has to deal with
it.

However, I think that except for SYNC using
  __asm__ __volatile__ ("":::"memory");
is the wrong solution - even though it might work as band aid.

 * * *

To gether some suggestions, here is how coarrays work. They are based on:
everything is a local variable - except it is a coarray. And all communication
is explicit - but is often single sideded.  That scheme works well (also)
with distributed memory and is similar to MPI (Message Passing Interface).

Using

integer :: var[*]

one declares a coarray scalar variable. Accessing it as

  var = 5
  tmp = var

acts on the variable on the current 'image' (process). Using

  var[idx] = 5   ! Set 'var' in image 'idx' to '5'
  tmp = var[idx] ! Obtain value of 'var' on image 'idx'

one accesses that variable on a remote image - except if 'idx' matches
the current image; in that case, it acts on the local variable.

In one segment (which ends at explicit or implicit synchronizations,
e.g. using SYNC ALL): If the value of a is changed - either locally
or by a remote image - then only that image is permitted to 'read' the
value (or to change it as that would be another possibility for race
conditions).

Very often, coarray variables are arrays and heavily accessed as local
varable in hot loops. But on the other hand, the value of the variable
can be changed by external means - up to having hardware support to
write into the memory from another computer.

Thus, there two cases were the local view might fail:

 (a) when the variable has been changed in a previous segment by a
 remote process, e.g.
   var = 5   ! assue 'var' is a coarray
   sync all  ! end of segment 
   ! ! var is changed by a remote image
   sync all  ! end of segment 
   tmp = var
 where "var" might or might not have the value 5. Or more fancy
 without locks and global barriers:
   type(event_type) :: var_is_set[*]
   var = 5
   event post(have_set_var[remote_idx])
 ! Remote waits for event, sets out 'var' to 42 and
 ! then posts an event that it is set
   event wait(have_set_var)
   tmp = var
where one tells the remote process that "var = 5" has been set
and waits until it has set the local variable "var".

 (b) when the variable is changed on the same image by two means, e.g.
   var = 5
   var[this_image()] = 42
   tmp = var

The communication maps to function calls like:
  __gfortran_caf_send(var_token, idx, 5)// var[idx] = 5
  __gfortran_caf_get (tmp, var_token, idx)  // tmp = var[idx]
[Real commands, see
https://gcc.gnu.org/onlinedocs/gfortran/index.html#toc_Coarray-Programming]

It is assumed that the library called takes care of the hardware part,
i.e. locking, mfence etc. - such that in the compiler itself, one only
has to deal with restricting compiler optimizations to the places where
it is permitted.

A coarray comes into existance via:

  var = _gfortran_caf_register (size, _token);

similar to malloc - and var_token identifies the coarray; what the
content is, is left to the library. - For the good or worse,
_gfortran_caf_register doesn't tell the compiler that it has clobbered
"var" as in a way the pointer address escaped. Nor is "var" marked as
volatile.

Coming back to item (a): After a segment has ended (SYNC ALL, EVENT WAIT
or similar), the compiler can no longer assume that coarray variables
have the same value. Those variables can be hidden as in
call foo()
sync all
call foo()
where 'foo' doesn't know about the 'sync all' and the caller doesn't know
whether 'foo' has accesses to coarrays (and whether it accesses them). I
think
  __asm__ __volatile__ ("":::"memory");
should be sufficient in that case, assuming that the communication library
takes are of making all changes available (mfence, __sync_synchronize or
whatever).

The other question is item (b): Here, one has an alias problem within the
segment - but the alias only rarely happens. Code of the form
  var[idx] = ...
usually does not access the local memory; it happens only in corner cases
like:
  do i = 1, num_images
var[i] = 5
  end do
which sets 'var' to 5 on all images -

Re: [PATCH, PR68640] Clear restrict in install_var_field


On 08/12/15 10:22, Richard Biener wrote:

On Tue, 8 Dec 2015, Jakub Jelinek wrote:


On Tue, Dec 08, 2015 at 09:49:49AM +0100, Tom de Vries wrote:

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d1d1e3c..ac4a94d 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1389,6 +1389,12 @@ install_var_field (tree var, bool by_ref, int mask, 
omp_context *ctx,
  || !is_gimple_omp_oacc (ctx->stmt));

type = TREE_TYPE (var);
+  /* Prevent redeclaring the var in the split-off function with a restrict
+ pointer type.  Note that we only clear type itself, restrict qualifiers in
+ the pointed-to type will be ignored by points-to analysis.  */
+  if (POINTER_TYPE_P (type))
+type = build_qualified_type (type, TYPE_QUALS (type) & 
~TYPE_QUAL_RESTRICT);


Is it necessary to call build_qualified_type in the common case (when it is
not restrict)?  I'd think if (POINTER_TYPE_P (type) && TYPE_RESTRICT (type))
might be better.




Will do.


That said, I forgot when exactly are the cliques computed, it would be nice
if then computing those and seeing a
GOMP_parallel/GOACC_parallel_keyed/GOMP_target_ext/GOMP_task/GOMP_taskloop*
builtins we would just continue walking the outlined functions they refer to
as if it was a part of the current function for the purpose of the restrict
computation.


Cliques are not computed in IPA PTA and the above would require IPA.


FTR, this is a generic (not openmp-related) example, and AFAIU a more 
generic form of the issue that is being raised.

...
static void __attribute__((noinline, noclone))
bar (int *a, int * b)
{
  *a = 1;
  *b = 2;
  *a = *a + 1;
}

void
foo (int *__restrict__ a, int *__restrict__ b)
{
  bar (a, b);
}
...

Thanks,
- Tom

[PATCH, PR68640] Clear restrict in install_var_field


Hi,

this patch fixes PR68640.

Consider this testcase:
...
int
foo (int *__restrict__ ap)
{
  int *bp = ap;
#pragma omp parallel for
  for (unsigned int idx = 0; idx < N; idx++)
ap[idx] = bp[idx];
}
...

When declaring a field for restrict pointer ap in the data passing 
struct for the thread function, we declare that field with restrict.


So, we roughly get the equivalent of:
...
foo.omp_fn (int *restrict ap, int *bp)
{
  for (unsigned int idx = 0; idx < N; idx++)
ap[idx] = bp[idx];
}

int
foo (int *__restrict__ ap)
{
  int *bp = ap;
  GOMP_parallel (foo.omp_fn, ap, bp, ...);
}
...

That is incorrect, since ap is not restrict in thread function 
foo.omp_fn, given that we also access *ap via bp.


Atm, this doesn't result in wrong code, but it might result in wrong 
code when we make restrict interpretation more aggressive.


The patch fixes the problem by clearing the restrict qualifier on 
pointer types in install_var_field.


Build non-bootstrap c compiler and tested gcc/gomp.exp and 
target-libgomp/c.exp.


OK for stage3 trunk if bootstrap and reg-test succeeds?

Thanks,
- Tom
Clear restrict in install_var_field

2015-12-02  Tom de Vries  

	PR tree-optimization/68640
	* omp-low.c (install_var_field): Clear the restrict qualifier on the var
	type.

---
 gcc/omp-low.c   |  6 ++
 gcc/testsuite/gcc.dg/gomp/pr68640.c | 16 
 2 files changed, 22 insertions(+)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d1d1e3c..ac4a94d 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1389,6 +1389,12 @@ install_var_field (tree var, bool by_ref, int mask, omp_context *ctx,
 	  || !is_gimple_omp_oacc (ctx->stmt));
 
   type = TREE_TYPE (var);
+  /* Prevent redeclaring the var in the split-off function with a restrict
+ pointer type.  Note that we only clear type itself, restrict qualifiers in
+ the pointed-to type will be ignored by points-to analysis.  */
+  if (POINTER_TYPE_P (type))
+type = build_qualified_type (type, TYPE_QUALS (type) & ~TYPE_QUAL_RESTRICT);
+
   if (mask & 4)
 {
   gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
diff --git a/gcc/testsuite/gcc.dg/gomp/pr68640.c b/gcc/testsuite/gcc.dg/gomp/pr68640.c
new file mode 100644
index 000..f333db0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/pr68640.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp -fdump-tree-ealias-all" } */
+
+#define N 1024
+
+int
+foo (int *__restrict__ ap)
+{
+  int *bp = ap;
+#pragma omp parallel for
+  for (unsigned int idx = 0; idx < N; idx++)
+ap[idx] = bp[idx];
+}
+
+/* { dg-final { scan-tree-dump-times "clique 1 base 1" 2 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 2 "ealias" } } */

Re: [patch] Fix PR middle-end/68291 & 68292

2015-12-08 Thread Christophe Lyon

On 8 December 2015 at 10:46, Eric Botcazou  wrote:
>> Since you committed this (r231372), I've noticed several regressions
>> on ARM and AArch64.
>> You can have a look at
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/231372/
>> report-build-info.html for more details.
>
> I presume it's:
>
> Fail appears  [ => FAIL]:
>   gcc.dg/pr63594-1.c (internal compiler error)
>   gcc.dg/pr63594-2.c (internal compiler error)
>   gcc.dg/vect/vect-singleton_1.c (internal compiler error)
>   gcc.dg/vect/vect-singleton_1.c -flto -ffat-lto-objects (internal compiler
> error)
>   gcc.target/aarch64/aapcs64/func-ret-1.c compilation,  -O1  (internal
> compiler error)
>   gcc.target/aarch64/aapcs64/func-ret-1.c compilation,  -O2  (internal
> compiler error)
>   gcc.target/aarch64/aapcs64/func-ret-1.c compilation,  -O3 -g  (internal
> compiler error)
>   gcc.target/aarch64/aapcs64/func-ret-1.c compilation,  -Og -g  (internal
> compiler error)
>   gcc.target/aarch64/aapcs64/func-ret-1.c compilation,  -Os  (internal
> compiler error)
>   gcc.target/aarch64/pr65491_1.c (internal compiler error)
>
> on aarch64?  Yes, I'm going to have a look.
>

Yes you are right. the PASS->FAIL and "PASS disappears" are consequences
of the new failures above.
You can download the gcc.log files by clicking on the "log" link, it that helps.

Thanks,

Christophe.

> --
> Eric Botcazou

Re: [gomp4] Adjust Fortran OACC async lib test

2015-12-08 Thread Thomas Schwinge

Hi!

On Mon, 23 Nov 2015 19:09:36 +0800, Chung-Lin Tang  
wrote:
> this fix adds more acc_wait's to libgomp.oacc-fortran/lib-1[13].f90.

It has not been obvious to me that these test cases would regress (PASS
-> FAIL, at least for some optimization levels) when your recent "[PATCH,
libgomp] Rewire OpenACC async" is applied.  Likewise for the testcase
affected ("repaired") by "[PATCH, C++] Wrap OpenACC wait in EXPR_STMT",
libgomp.oacc-c++/template-reduction.C.  Aside from verbally noting such
dependencies between patches and test cases, that latter patch submission
(for trunk) could have included (something like) this gomp-4_0-branch
libgomp.oacc-c++/template-reduction.C test case, to motivate the code
change, for example.

> For lib-12.f90, it's sort of a fix before we can resolve the issue
> of intended semantics for "wait+async".
> 
> As for lib-13.f90, I believe these added acc_wait calls seem
> reasonable, since we can't immediately assume the async-launched parallels
> already completed there.
> 
> Does this seem reasonable?

I think (and Jim, original author of these tests, copied to correct me if
I'm wrong) the intention of these tests is to launch a kernel
asynchronously, running long enough so that in the following we can test
that it's still running, enqueue asynchronous waits, and so on
(acc_sync_test, acc_wait_async, and so on).  Adding acc_wait calls
renders any such testing void?

However, isn't currently the logic written the wrong way round?  That is,
currently we abort if there are still asynchronous operations running,
which would explain why your change to add acc_wait calls does fix these
tests...

(And, of course a simple "do i = 1, 100: j = j + 1" loop is not
really a bullet-proof way to achieve such a long-running kernel...)

Given these problems, I suggest indeed you do commit your patch, and I'll
make a note that these test cases need to be revisited.

When committing this, please also remove the XFAIL directives from
libgomp.oacc-c-c++-common/asyncwait-1.c, which you forgot to do in your
gomp-4_0-branch "[PATCH, libgomp] Rewire OpenACC async" commit.

> Index: libgomp.oacc-fortran/lib-12.f90
> ===
> --- libgomp.oacc-fortran/lib-12.f90   (revision 230719)
> +++ libgomp.oacc-fortran/lib-12.f90   (working copy)
> @@ -15,6 +15,8 @@ program main
>  end do
>!$acc end parallel
>  
> +  call acc_wait (0)
> +
>call acc_wait_async (0, 1)
>  
>if (acc_async_test (0) .neqv. .TRUE.) call abort
> Index: libgomp.oacc-fortran/lib-13.f90
> ===
> --- libgomp.oacc-fortran/lib-13.f90   (revision 230719)
> +++ libgomp.oacc-fortran/lib-13.f90   (working copy)
> @@ -21,6 +21,9 @@ program main
>  end do
>!$acc end data
>  
> +  call acc_wait (1)
> +  call acc_wait (2)
> +
>if (acc_async_test (1) .neqv. .TRUE.) call abort
>if (acc_async_test (2) .neqv. .TRUE.) call abort

Grüße
 Thomas

Re: [PATCH] Add testcase for c++/68116


On 12/07/2015 06:49 PM, Marek Polacek wrote:


diff --git gcc/testsuite/g++.dg/cpp0x/pr68116.C 
gcc/testsuite/g++.dg/cpp0x/pr68116.C
index e69de29..04ed901 100644
--- gcc/testsuite/g++.dg/cpp0x/pr68116.C
+++ gcc/testsuite/g++.dg/cpp0x/pr68116.C
@@ -0,0 +1,12 @@
+// PR c++/68116
+// { dg-do compile { target c++11 } }
+
+class C {
+  void foo ();
+  typedef void (C::*T) (int);
+  static T b[];
+};
+C::T C::b[]
+{
+  T (::foo)
+};


The problem I have with approving C++ testcases is that I have no idea 
whether this is valid or not or what it expresses. You should Cc Jason 
(which I've now done).



Bernd

Re: [PATCH, i386, PR68627] Prohibit AVX-512VL broadcasts generation on KNL.

On Tue, Dec 8, 2015 at 11:40 AM, Kirill Yukhin  wrote:
> Hello,
> On 08 Dec 09:47, Andreas Schwab wrote:
>> FAIL: gfortran.dg/pr68627.f   -O  (test for excess errors)
>> Excess errors:
>> gfortran: error: unrecognized command line option '-mavx512f'
> Thanks for pointing.
>
> I've checked in this as obvious:
>
> gcc/testsuite:
> * gfortran.dg/pr68627.f: Limit target x86.
>
> diff --git a/gcc/testsuite/gfortran.dg/pr68627.f 
> b/gcc/testsuite/gfortran.dg/pr68627.f
> index 32ff4a7..54575d7 100644
> --- a/gcc/testsuite/gfortran.dg/pr68627.f
> +++ b/gcc/testsuite/gfortran.dg/pr68627.f
> @@ -1,4 +1,4 @@
> -! { dg-do compile { target lp64 } }
> +! { dg-do compile { target { { i?86-*-* x86_64-*-* } && lp64 } } }

Actually, you need { ! { ia32 } } here, since lp64 filters out x32 in
addition to ia32. Please note that since x32 is x86_64 target, it
implies 16 xmm registers.

Uros.

Re: [PATCH 8/N] Fix memory leak in tree-vectorizer.h

On Tue, Dec 8, 2015 at 1:01 PM, Richard Biener
 wrote:
> On Tue, Dec 8, 2015 at 12:30 PM, Martin Liška  wrote:
>> Hello.
>>
>> The patch removes memory leaks that are caused by overwriting an existing
>> item in stmt_vec_info_vec (in set_vinfo_for_stmt). My first attempt was to 
>> call
>> free_stmt_vec_info for old entries that are overwritten, but it caused 
>> double frees
>> as there are some references between stmt_vec_infos.
>>
>> So that I've decided to allocate all stmt_vec_info structures from a memory 
>> pool, which
>> is released in free_stmt_vec_info_vec routine. Replacing 'vec' (used for 
>> simd_clone_info and same_align_regs) to 'auto_vec'
>> helps to reduce another leaks. To be honest, the solution is not ideal as 
>> destructor
>> of there auto_vec is not called, however with the patch applied, there is 
>> just a single memory leak
>> in the whole test-suite related to tree-vect-stmts.c (which is unrelated to 
>> these vectors).
>>
>> Patch can bootstrap and survives regression tests on 
>> x86_64-unknown-linux-gnu.
>>
>> Ready for trunk?
>
>  new_stmt_vec_info (gimple *stmt, vec_info *vinfo)
>  {
>stmt_vec_info res;
> -  res = (stmt_vec_info) xcalloc (1, sizeof (struct _stmt_vec_info));
> +  res = stmt_vec_info_pool.allocate ();
>
> previously it was zeroed memory now it is no longer (AFAIK).
>
> ICK.  You do
>
> +struct _stmt_vec_info
> +{
> +  _stmt_vec_info (): type ((enum stmt_vec_info_type) 0), live (false),
> +  in_pattern_p (false), stmt (NULL), vinfo (0), vectype (NULL_TREE),
> +  vectorized_stmt (NULL), data_ref_info (0), dr_base_address (NULL_TREE),
> +  dr_init (NULL_TREE), dr_offset (NULL_TREE), dr_step (NULL_TREE),
> +  dr_aligned_to (NULL_TREE), loop_phi_evolution_base_unchanged (NULL_TREE),
> +  loop_phi_evolution_part (NULL_TREE), related_stmt (NULL),
> pattern_def_seq (0),
> +  same_align_refs (0), simd_clone_info (0), def_type ((enum vect_def_type) 
> 0),
> +  slp_type ((enum slp_vect_type) 0), first_element (NULL), next_element 
> (NULL),
> +  same_dr_stmt (NULL), size (0), store_count (0), gap (0), min_neg_dist (0),
> +  relevant ((enum vect_relevant) 0), vectorizable (false),
> +  gather_scatter_p (false), strided_p (false), simd_lane_access_p (false),
> +  v_reduc_type ((enum vect_reduction_type) 0) {}
>
> where I think a
>
>   _stmt_vec_info () { memset (this, 0, sizeof (_stmt_vec_info)); }
>
> would be much nicer.  Or just keep the struct as a POD and add that memset
> at the allocation point.  (and double-check the C++ alloc-pool doesn't end up
> zeroing everything twice that way...)
>
> Note that overwriting an existing entry in stmt_vec_info_vec should not 
> happen.
> In which path does that it happen?  We probably should assert the entry
> is NULL.
>
> Thus your fix shouldn't be necessary.

Testing a patch.

Richard.

> Thanks,
> Richard.
>
>
>> Martin

Re: [PATCH, i386, PR68627] Prohibit AVX-512VL broadcasts generation on KNL.

2015-12-08 Thread Kirill Yukhin

Hello,
On 08 Dec 09:47, Andreas Schwab wrote:
> FAIL: gfortran.dg/pr68627.f   -O  (test for excess errors)
> Excess errors:
> gfortran: error: unrecognized command line option '-mavx512f'
Thanks for pointing.

I've checked in this as obvious:

gcc/testsuite:
* gfortran.dg/pr68627.f: Limit target x86.

diff --git a/gcc/testsuite/gfortran.dg/pr68627.f 
b/gcc/testsuite/gfortran.dg/pr68627.f
index 32ff4a7..54575d7 100644
--- a/gcc/testsuite/gfortran.dg/pr68627.f
+++ b/gcc/testsuite/gfortran.dg/pr68627.f
@@ -1,4 +1,4 @@
-! { dg-do compile { target lp64 } }
+! { dg-do compile { target { { i?86-*-* x86_64-*-* } && lp64 } } }
 
 ! { dg-options "-Ofast -mavx512f -ffixed-xmm1 -ffixed-xmm2 -ffixed-xmm3 
-ffixed-xmm4 -ffixed-xmm5 -ffixed-xmm6 -ffixed-xmm7 -ffixed-xmm8 -ffixed-xmm9 
-ffixed-xmm10 -ffixed-xmm11 -ffixed-xmm12 -ffixed-xmm13 -ffixed-xmm14 
-ffixed-xmm15" }
 
--
Thanks, K
> 
> Andreas.
> 
> -- 
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."

Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment

2015-12-08 Thread Tobias Burnus

Dear Alessandro, dear all,

On Mon, Dec 07, 2015 at 03:48:17PM +0100, Alessandro Fanfarillo wrote:
> Your patch fixes the issues. In attachment patch, test case and changelog.

Regarding the ChangeLog: Please include the added lines, only, and not the
change as patch. gcc/testsuite/ChangeLog changes too often such that a patch
won't apply.


Regarding the patch, I wonder where the memory synchronization makes sense
and where it is not required. (cf. also my email to Matthew in this thread,
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00828.html)


I think it should be after all image control statements (8.5.1 in
http://j3-fortran.org/doc/year/15/15-007r2.pdf):
SYNC ALL, SYNC IMAGES, SYNC MEMORY, ALLOCATE, DEALLOCATE,
CRITICAL ... END CRITICAL, EVENT POST, EVENT WAIT, LOCK, UNLOCK,
MOVE_ALLOC.

Thus:
- SYNC ..., ALLOCATE/DEALLOCATE: I think those are all handled by the
  current patch
- MOVE_ALLOC: This one should be handled via the internal (de)allocate
  handling (please check!)
- EVENT WAIT, CRITICAL and LOCK: Obtaining a lock or receiving an event
  implies that quite likely some other process has changed something
  before. For those, the assembler statement really has to be added.
- EVENT POST, UNLOCK and END CRITICAL: While those are image control
  statements, I do not see how a remote image could modify a value in
  a well defined way, which is guaranteed to be available after that
  statement - but might not yet be available already at the previous
  segment (i.e. the previous image control statement).

Hence: I think you should update the patch to also handle 
EVENT WAIT, CRITICAL and LOCK - and to check MOVE_ALLOC.



Additionally, we need to handle the alias issue of:
  var = 5
  var[this_image()] = 42
  tmp = var

Both _gfortran_caf_send and _gfortran_caf_sendget can modify the
value of a variable; thus, calling the assembler after the function
makes sense.


However, _gfortran_caf_get does not modify the associated variable;
adding the assembler statement *after* _gfortran_caf_get. The
question is, however, whether one needs to take care of 'flushing'
the variable before the _gfortran_caf_get:
   var = 7
   ...
   var = 5
   tmp = var[this_image()]
   result = var + tmp
Here, one needs to prevent the compiler of keeping "5" only in a
register or moving "var = 5" after the _gfortran_caf_get call.


Thus, you have to move the assembler statement before the library
call in _gfortran_caf_get - and add another one at the beginning
of _gfortran_caf_sendget.

(For send/get, might might come up with something better than
""::"memory", but for now, it should do.)

Cheers,

Tobias

Re: [patch] Fix PR middle-end/68291 & 68292

> Yes you are right. the PASS->FAIL and "PASS disappears" are consequences
> of the new failures above.

OK.  The issue is that we used to create a REG:BLK for RESULT_DECL, but now we 
get to hard_function_value as originally, which rightfully adjusts it to SI:

  val
   = targetm.calls.function_value (valtype, func ? func : fntype, outgoing);

  if (REG_P (val)
  && GET_MODE (val) == BLKmode)
{
  unsigned HOST_WIDE_INT bytes = int_size_in_bytes (valtype);
  machine_mode tmpmode;

  /* int_size_in_bytes can return -1.  We don't need a check here
 since the value of bytes will then be large enough that no
 mode will match anyway.  */

  for (tmpmode = GET_CLASS_NARROWEST_MODE (MODE_INT);
   tmpmode != VOIDmode;
   tmpmode = GET_MODE_WIDER_MODE (tmpmode))
{
  /* Have we found a large enough mode?  */
  if (GET_MODE_SIZE (tmpmode) >= bytes)
break;
}

  /* No suitable mode found.  */
  gcc_assert (tmpmode != VOIDmode);

  PUT_MODE (val, tmpmode);
}

and we assert in the second gcc_checking_assert of set_rtl because mode and 
promote_ssa_mode don't agree anymore (SI vs BLK).  So we need to relax the 
second gcc_checking_assert the same way as the first one.

I'm going to test it on x86-64, SPARC64 and Aarch64.


PR middle-end/68291
PR middle-end/68292
* cfgexpand.c (set_rtl): Always accept mode mismatch for SSA names
with BLKmode promoted mode based on RESULT_DECLs.


-- 
Eric BotcazouIndex: cfgexpand.c
===
--- cfgexpand.c	(revision 231394)
+++ cfgexpand.c	(working copy)
@@ -203,11 +203,14 @@ set_rtl (tree t, rtx x)
  PARM_DECLs and RESULT_DECLs, we'll have been called by
  set_parm_rtl, which will give us the default def, so we don't
  have to compute it ourselves.  For RESULT_DECLs, we accept mode
- mismatches too, as long as we're not coalescing across variables,
- so that we don't reject BLKmode PARALLELs or unpromoted REGs.  */
+ mismatches too, as long as we have BLKmode or are not coalescing
+ across variables, so that we don't reject BLKmode PARALLELs or
+ unpromoted REGs.  */
   gcc_checking_assert (!x || x == pc_rtx || TREE_CODE (t) != SSA_NAME
-		   || (SSAVAR (t) && TREE_CODE (SSAVAR (t)) == RESULT_DECL
-			   && !flag_tree_coalesce_vars)
+		   || (SSAVAR (t)
+			   && TREE_CODE (SSAVAR (t)) == RESULT_DECL
+			   && (promote_ssa_mode (t, NULL) == BLKmode
+			   || !flag_tree_coalesce_vars))
 		   || !use_register_for_decl (t)
 		   || GET_MODE (x) == promote_ssa_mode (t, NULL));

Re: [testsuite][ARM target attributes] Fix effective_target tests

2015-12-08 Thread Kyrill Tkachov

Hi Christophe,

On 27/11/15 13:00, Christophe Lyon wrote:

Hi,

After the recent commits from Christian adding target attributes
support for ARM FPU settings, I've noticed that some of the tests
were failing because of incorrect assumptions wrt to the default
cpu/fpu/float-abi of the compiler.

This patch fixes the problems I've noticed in the following way:
- do not force -mfloat-abi=softfp in dg-options, to avoid conflicts
when gcc is configured --with-float=hard

- change arm_vfp_ok such that it tries several -mfpu/-mfloat-abi
flags, checks that __ARM_FP is defined and __ARM_NEON_FP is not
defined

- introduce arm_fp_ok, which is similar but does not enforce fpu setting

- add a new effective_target: arm_crypto_pragma_ok to check that
setting this fpu via a pragma is actually supported by the current
"multilib". This is different from checking the command-line option
because the pragma might conflict with the command-line options in
use.

The updates in the testcases are as follows:
- attr-crypto.c, we have to make sure that the defaut fpu does not
conflict with the one forced by pragma. That's why I use the arm_vfp
options/effective_target. This is needed if gcc has been configured
--with-fpu=neon-fp16, as the pragma fpu=crypto-neon-fp-armv8 would
conflict.

- attr-neon-builtin-fail.c: use arm_fp to force the appropriate
float-abi setting. Enforcing fpu is not needed here.

- attr-neon-fp16.c: similar, I also removed arm_neon_ok since it was
not necessary to make the test pass in my testing. On second thought,
I'm wondering whether I should leave it and make the test unsupported
in more cases (such as when forcing -march=armv5t, although it does
pass with this patch)

- attr-neon2.c: use arm_vfp to force the appropriate float-abi
setting. Enforcing mfpu=vfp is needed to avoid conflict with the
pragma target fpu=neon (for instance if the toolchain default is
neon-fp16)

- attr-neon3.c: similar

Tested on a variety of configurations, see:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/230929-target-attr/report-build-info.html

Note that the regressions reported fall into 3 categories:
- when forcing march=armv5t: tests are now unsupported because I
modified arm_crypto_ok to require arm_v8_neon_ok instead of arm32.

- the warning reported by attr-neon-builtin-fail.c moved from line 12
to 14 and is thus seen as a regression + one improvement

- finally, attr-neon-fp16.c causes an ICE on armeb compilers, for
which I need to post a bugzilla.

TBH, I'm a bit concerned by the complexity of all these multilib-like
conditions. I'm confident that I'm still missing some combinations :-)

And with new target attributes coming, new architectures etc... all
this logic is likely to become even more complex.

That being said, OK for trunk?

I'd like us to clean up and reorganise the gcc.target/arm testsuite
at some point to make it more robust with respect to the tons of multilib
options and configurations we can have for arm. It's becoming very frustrating
to test feature-specific stuff :(

This is ok in the meantime.
Sorry for the delay.

Thanks for handling this!
Kyrill

Christophe

2015-11-27 Christophe Lyon

* lib/target-supports.exp
(check_effective_target_arm_vfp_ok_nocache): New.
(check_effective_target_arm_vfp_ok): Call the new
check_effective_target_arm_vfp_ok_nocache function.
(check_effective_target_arm_fp_ok_nocache): New.
(check_effective_target_arm_fp_ok): New.
(add_options_for_arm_fp): New.
(check_effective_target_arm_crypto_ok_nocache): Require
target_arm_v8_neon_ok instead of arm32.
(check_effective_target_arm_crypto_pragma_ok_nocache): New.
(check_effective_target_arm_crypto_pragma_ok): New.
(add_options_for_arm_vfp): New.
* gcc.target/arm/attr-crypto.c: Use arm_crypto_pragma_ok effective
target. Do not force -mfloat-abi=softfp, use arm_vfp effective
target instead.
* gcc.target/arm/attr-neon-builtin-fail.c: Do not force
-mfloat-abi=softfp, use arm_fp effective target instead.
* gcc.target/arm/attr-neon-fp16.c: Likewise. Remove arm_neon_ok
dependency.
* gcc.target/arm/attr-neon2.c: Do not force -mfloat-abi=softfp,
use arm_vfp effective target instead.
* gcc.target/arm/attr-neon3.c: Likewise.

Re: [testsuite][ARM target attributes] Fix effective_target tests

2015-12-08 Thread Christophe Lyon

On 8 December 2015 at 11:50, Kyrill Tkachov  wrote:
> Hi Christophe,
>
>
> On 27/11/15 13:00, Christophe Lyon wrote:
>>
>> Hi,
>>
>> After the recent commits from Christian adding target attributes
>> support for ARM FPU settings,  I've noticed that some of the tests
>> were failing because of incorrect assumptions wrt to the default
>> cpu/fpu/float-abi of the compiler.
>>
>> This patch fixes the problems I've noticed in the following way:
>> - do not force -mfloat-abi=softfp in dg-options, to avoid conflicts
>> when gcc is configured --with-float=hard
>>
>> - change arm_vfp_ok such that it tries several -mfpu/-mfloat-abi
>> flags, checks that __ARM_FP is defined and __ARM_NEON_FP is not
>> defined
>>
>> - introduce arm_fp_ok, which is similar but does not enforce fpu setting
>>
>> - add a new effective_target: arm_crypto_pragma_ok to check that
>> setting this fpu via a pragma is actually supported by the current
>> "multilib". This is different from checking the command-line option
>> because the pragma might conflict with the command-line options in
>> use.
>>
>> The updates in the testcases are as follows:
>> - attr-crypto.c, we have to make sure that the defaut fpu does not
>> conflict with the one forced by pragma. That's why I use the arm_vfp
>> options/effective_target. This is needed if gcc has been configured
>> --with-fpu=neon-fp16, as the pragma fpu=crypto-neon-fp-armv8 would
>> conflict.
>>
>> - attr-neon-builtin-fail.c: use arm_fp to force the appropriate
>> float-abi setting. Enforcing fpu is not needed here.
>>
>> - attr-neon-fp16.c: similar, I also removed arm_neon_ok since it was
>> not necessary to make the test pass in my testing. On second thought,
>> I'm wondering whether I should leave it and make the test unsupported
>> in more cases (such as when forcing -march=armv5t, although it does
>> pass with this patch)
>>
>> - attr-neon2.c: use arm_vfp to force the appropriate float-abi
>> setting. Enforcing mfpu=vfp is needed to avoid conflict with the
>> pragma target fpu=neon (for instance if the toolchain default is
>> neon-fp16)
>>
>> - attr-neon3.c: similar
>>
>> Tested on a variety of configurations, see:
>>
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/230929-target-attr/report-build-info.html
>>
>> Note that the regressions reported fall into 3 categories:
>> - when forcing march=armv5t: tests are now unsupported because I
>> modified arm_crypto_ok to require arm_v8_neon_ok instead of arm32.
>>
>> - the warning reported by attr-neon-builtin-fail.c moved from line 12
>> to 14 and is thus seen as a regression + one improvement
>>
>> - finally, attr-neon-fp16.c causes an ICE on armeb compilers, for
>> which I need to post a bugzilla.
>>
>>
>> TBH, I'm a bit concerned by the complexity of all these multilib-like
>> conditions. I'm confident that I'm still missing some combinations :-)
>>
>> And with new target attributes coming, new architectures etc... all
>> this logic is likely to become even more complex.
>>
>> That being said, OK for trunk?
>
>
> I'd like us to clean up and reorganise the gcc.target/arm testsuite
> at some point to make it more robust with respect to the tons of multilib
> options and configurations we can have for arm. It's becoming very
> frustrating
> to test feature-specific stuff :(
>
> This is ok in the meantime.
> Sorry for the delay.
>

Thanks for the approval, glad to see I'm not the only one willing to see
improvements in this area :)

Committed as r231403.

Christophe.

> Thanks for handling this!
> Kyrill
>
>
>> Christophe
>>
>>
>> 2015-11-27  Christophe Lyon  
>>
>>  * lib/target-supports.exp
>>  (check_effective_target_arm_vfp_ok_nocache): New.
>>  (check_effective_target_arm_vfp_ok): Call the new
>>  check_effective_target_arm_vfp_ok_nocache function.
>>  (check_effective_target_arm_fp_ok_nocache): New.
>>  (check_effective_target_arm_fp_ok): New.
>>  (add_options_for_arm_fp): New.
>>  (check_effective_target_arm_crypto_ok_nocache): Require
>>  target_arm_v8_neon_ok instead of arm32.
>>  (check_effective_target_arm_crypto_pragma_ok_nocache): New.
>>  (check_effective_target_arm_crypto_pragma_ok): New.
>>  (add_options_for_arm_vfp): New.
>>  * gcc.target/arm/attr-crypto.c: Use arm_crypto_pragma_ok effective
>>  target. Do not force -mfloat-abi=softfp, use arm_vfp effective
>>  target instead.
>>  * gcc.target/arm/attr-neon-builtin-fail.c: Do not force
>>  -mfloat-abi=softfp, use arm_fp effective target instead.
>>  * gcc.target/arm/attr-neon-fp16.c: Likewise. Remove arm_neon_ok
>>  dependency.
>>  * gcc.target/arm/attr-neon2.c: Do not force -mfloat-abi=softfp,
>>  use arm_vfp effective target instead.
>>  * gcc.target/arm/attr-neon3.c: Likewise.
>
>

Re: [PATCH 9/N] Fix memory leak tree-if-conv.c


On 12/08/2015 12:32 PM, Martin Liška wrote:

Simple memory leak fix.
Patch can bootstrap and survives regression tests on x86_64-unknown-linux-gnu.


This one is OK. For the larger one I'm not sure whether we shouldn't be 
saying no around this point and wait until stage 1 (unless we have 
regressions).



Bernd

Re: -fstrict-aliasing fixes 5/6: make type system independent of flag_strict_aliasing

> The problem is with the type:
> (gdb) p debug_tree (p)
>  
> sizes-gimplified public visited unsigned DI
> size  bitsizetype> constant visited 64> unit size  type  constant visited 8> align 64
> symtab 0 alias set -1 canonical type 0x76af02a0
> pointer_to_this >
> 
> it is a recursive pointer to itself. Does this make sense in Ada?  If so we
> will need to add a recursion guard into the loop and put the alias set into
> voidptr_alias_set.  It more looks like a frontend bug to me - I can not
> think of a use for this beast.

This one (access1) is admittedly border line and we can probably kludge around 
it in the front-end, but what about access2 and more generally pointer cycles?

-- 
Eric Botcazou

Re: [PATCH] Fix Changelog entry and add pr66896.C


On 12/07/2015 04:05 PM, Martin Liška wrote:

As Jakub pointed out in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66896#c15, 
I forgot
to add a test-case to both GCC-5-branch and trunk.

May I please installed the suggested patch to both these branches?


Sure.


Bernd

Re: [PATCH] New version of libmpx with new memmove wrapper

2015-12-08 Thread Aleksandra Tsvetkova

Wrong version of patch was attached.

On Tue, Dec 8, 2015 at 1:46 PM, Aleksandra Tsvetkova  wrote:
> gcc/testsuite/ChangeLog
> 2015-10-27  Tsvetkova Alexandra  
>
> * gcc.target/i386/mpx/memmove-1.c: New test for __mpx_wrapper_memmove.
> * gcc.target/i386/mpx/memmove-2.c: New test covering fail on spec.
>
> libmpx/ChangeLog
> 2015-10-28  Tsvetkova Alexandra  
>
> * mpxrt/Makefile.am (libmpx_la_LDFLAGS): Add -version-info option.
> * libmpxwrap/Makefile.am (libmpx_la_LDFLAGS): Likewise + includes fixed.
> * libmpx/Makefile.in: Regenerate.
> * mpxrt/Makefile.in: Regenerate.
> * libmpxwrap/Makefile.in: Regenerate.
> * mpxrt/libtool-version: New version.
> * libmpxwrap/libtool-version: Likewise.
> * mpxrt/libmpx.map: Add new version and a new symbol.
> * mpxrt/mpxrt.h: New file.
> * mpxrt/mpxrt.c (NUM_L1_BITS): Moved to mpxrt.h.
> (REG_IP_IDX): Moved to mpxrt.h.
> (REX_PREFIX): Moved to mpxrt.h.
> (XSAVE_OFFSET_IN_FPMEM): Moved to mpxrt.h.
> (MPX_L1_SIZE): Moved to mpxrt.h.
> * libmpxwrap/mpx_wrappers.c: (__mpx_wrapper_memmove): Rewritten.
> (mpx_pointer): New type.
> (mpx_bt_entry): New type.
> (alloc_bt): New function.
> (get_bt): New function.
> (copy_if_possible): New function.
> (copy_if_possible_from_end): New function.
> (move_bounds): New function.
diff --git a/gcc/testsuite/gcc.target/i386/mpx/memmove-1.c 
b/gcc/testsuite/gcc.target/i386/mpx/memmove-1.c
new file mode 100755
index 000..57030a3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mpx/memmove-1.c
@@ -0,0 +1,119 @@
+/* { dg-do run } */
+/* { dg-options "-fcheck-pointer-bounds -mmpx" } */
+
+
+#include 
+#include 
+#include 
+#include 
+#include "mpx-check.h"
+
+#ifdef __i386__
+/* i386 directory size is 4MB.  */
+#define MPX_NUM_L2_BITS 10
+#define MPX_NUM_IGN_BITS 2
+#else /* __i386__ */
+/* x86_64 directory size is 2GB.  */
+#define MPX_NUM_L2_BITS 17
+#define MPX_NUM_IGN_BITS 3
+#endif /* !__i386__ */
+
+
+/* bt_num_of_elems is the number of elements in bounds table.  */
+unsigned long bt_num_of_elems = (1UL << MPX_NUM_L2_BITS);
+/* Function to test MPX wrapper of memmove function.
+   src_bigger_dst determines which address is bigger, can be 0 or 1.
+   src_bt_index and dst_bt index are bt_indexes
+   from the beginning of the page.
+   bd_index_end is the bd index of the last element of src if we define
+   bd index of the first element as 0.
+   src_bt index_end is bt index of the last element of src.
+   pointers inside determines if array being copied includes pointers
+   src_align and dst_align are alignments of src and dst.
+   Arrays may contain unaligned pointers.  */
+int
+test (int src_bigger_dst, int src_bt_index, int dst_bt_index,
+  int bd_index_end, int src_bt_index_end, int pointers_inside,
+  int src_align, int dst_align)
+{
+  const int n =
+src_bt_index_end - src_bt_index + bd_index_end * bt_num_of_elems;
+  if (n < 0)
+{
+  return 0;
+}
+  const int num_of_pointers = (bd_index_end + 2) * bt_num_of_elems;
+  void **arr = 0;
+  posix_memalign ((void **) (),
+   1UL << (MPX_NUM_L2_BITS + MPX_NUM_IGN_BITS),
+   num_of_pointers * sizeof (void *));
+  void **src = arr, **dst = arr;
+  if ((src_bigger_dst) && (src_bt_index < dst_bt_index))
+src_bt_index += bt_num_of_elems;
+  if (!(src_bigger_dst) && (src_bt_index > dst_bt_index))
+dst_bt_index += bt_num_of_elems;
+  src += src_bt_index;
+  dst += dst_bt_index;
+  char *realign = (char *) src;
+  realign += src_align;
+  src = (void **) realign;
+  realign = (char *) dst;
+  realign += src_align;
+  dst = (void **) realign;
+  if (pointers_inside)
+{
+  for (int i = 0; i < n; i++)
+src[i] = __bnd_set_ptr_bounds (arr + i, i * sizeof (void *) + 1);
+}
+  memmove (dst, src, n * sizeof (void *));
+  if (pointers_inside)
+{
+  for (int i = 0; i < n; i++)
+{
+  if (dst[i] != arr + i)
+abort ();
+  if (__bnd_get_ptr_lbound (dst[i]) != arr + i)
+abort ();
+  if (__bnd_get_ptr_ubound (dst[i]) != arr + 2 * i)
+abort ();
+}
+}
+  free (arr);
+  return 0;
+}
+
+/* Call testall to test common cases of memmove for MPX.  */
+void
+testall ()
+{
+  int align[3];
+  align[0] = 0;
+  align[1] = 1;
+  align[2] = 7;
+  for (int pointers_inside = 0; pointers_inside < 2; pointers_inside++)
+for (int src_bigger_dst = 0; src_bigger_dst < 2; src_bigger_dst++)
+  for (int src_align = 0; src_align < 3; src_align ++)
+for (int dst_align = 0; dst_align < 3; dst_align ++)
+  for (int pages = 0; pages < 4; pages++)
+{
+  test (src_bigger_dst, 1, 2, pages, 1, pointers_inside,
+align[src_align], align[dst_align]);
+  test (src_bigger_dst, 1, 2, pages, 2, pointers_inside,
+

Re: [PING v3][PATCH][4.9] Backport fix for PR sanitizer/64820.

2015-12-08 Thread Jakub Jelinek

On Tue, Dec 08, 2015 at 01:48:46PM +0300, Maxim Ostapenko wrote:
> 
> On 01/12/15 20:23, Maxim Ostapenko wrote:
> >On 25/11/15 12:14, Maxim Ostapenko wrote:
> >>I would like to ping the patch:
> >>https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02174.html.
> >>
> >
> >Ping.
> >
> 
> Ping.

Ok.

Jakub

Re: [PATCH] Fix -Werror= handling for Joined warnings, add a few missing Warning keywords (PRs c/48088, c/68657)


On 12/07/2015 11:41 PM, Jakub Jelinek wrote:

On Mon, Dec 07, 2015 at 04:11:48PM +0100, Bernd Schmidt wrote:

Let's document arguments; for the ones identical to read_cmdline_option an
explicit pointer there is sufficient, but errors is new.



This also needs an update to the function comment.

Other than that I'm ok with this. This area could probably be restructured a
bit but for now I think this is good enough.


So like this?  Bootstrapped/regtested on x86_64-linux and i686-linux.


Yes, thanks.


Bernd

[PATCH 9/N] Fix memory leak tree-if-conv.c

Hello.

Simple memory leak fix.
Patch can bootstrap and survives regression tests on x86_64-unknown-linux-gnu.

Ready for trunk?
Martin
>From f2f533ec19b36a9ead2f72b148d1aeed074ef136 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Sat, 28 Nov 2015 08:42:14 +0100
Subject: [PATCH 2/2] Fix memory leak in tree-if-conv.c

gcc/ChangeLog:

2015-12-07  Martin Liska  

	* tree-if-conv.c (ifcvt_local_dce): Replace vec with auto_vec.
---
 gcc/tree-if-conv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index f43942d..05e4e13 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -2566,7 +2566,7 @@ ifcvt_local_dce (basic_block bb)
   gimple *stmt1;
   gimple *phi;
   gimple_stmt_iterator gsi;
-  vec worklist;
+  auto_vec worklist;
   enum gimple_code code;
   use_operand_p use_p;
   imm_use_iterator imm_iter;
-- 
2.6.3

[PING v3][PATCH][4.9] Backport fix for PR sanitizer/64820.

2015-12-08 Thread Maxim Ostapenko



On 01/12/15 20:23, Maxim Ostapenko wrote:

On 25/11/15 12:14, Maxim Ostapenko wrote:
I would like to ping the patch: 
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02174.html.




Ping.



Ping.

-Maxim

[PATCH 8/N] Fix memory leak in tree-vectorizer.h

Hello.

The patch removes memory leaks that are caused by overwriting an existing
item in stmt_vec_info_vec (in set_vinfo_for_stmt). My first attempt was to call
free_stmt_vec_info for old entries that are overwritten, but it caused double 
frees
as there are some references between stmt_vec_infos.

So that I've decided to allocate all stmt_vec_info structures from a memory 
pool, which
is released in free_stmt_vec_info_vec routine. Replacing 'vec' (used for 
simd_clone_info and same_align_regs) to 'auto_vec'
helps to reduce another leaks. To be honest, the solution is not ideal as 
destructor
of there auto_vec is not called, however with the patch applied, there is just 
a single memory leak
in the whole test-suite related to tree-vect-stmts.c (which is unrelated to 
these vectors).

Patch can bootstrap and survives regression tests on x86_64-unknown-linux-gnu.

Ready for trunk?
Martin
>From a2f09db0c4fe3920c744ccc77cf5e28b2a7df458 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 2 Dec 2015 12:59:00 +0100
Subject: [PATCH 1/2] Fix memory leak in tree-vectorizer.h

gcc/ChangeLog:

2015-12-07  Martin Liska  

	* tree-loop-distribution.c: Include alloc-pool.h.
	* tree-parloops.c: Likewise.
	* tree-ssa-loop-ivopts.c: Likewise.
	* tree-ssa-loop-manip.c: Likewise.
	* tree-ssa-loop.c: Likewise.
	* tree-vect-data-refs.c: Likewise.
	* tree-vect-loop-manip.c: Likewise.
	* tree-vect-loop.c: Likewise.
	* tree-vect-patterns.c: Likewise.
	* tree-vect-slp.c: Likewise.
	* tree-vect-stmts.c (vectorizable_simd_clone_call): Do not
	release arginfo.
	(new_stmt_vec_info): Allocate stmt_vec_info from a newly added
	pool allocator.
	(free_stmt_vec_info_vec): Release the pool.
	(free_stmt_vec_info): Remove an entry from the memory pool.
	* tree-vectorizer.c (stmt_vec_info_pool): Declare the memory
	pool.
	* tree-vectorizer.h (struct _stmt_vec_info): Replace vec with
	auto_vec and provide default contructor.
---
 gcc/tree-loop-distribution.c |  1 +
 gcc/tree-parloops.c  |  1 +
 gcc/tree-ssa-loop-ivopts.c   |  1 +
 gcc/tree-ssa-loop-manip.c|  1 +
 gcc/tree-ssa-loop.c  |  1 +
 gcc/tree-vect-data-refs.c|  1 +
 gcc/tree-vect-loop-manip.c   |  1 +
 gcc/tree-vect-loop.c |  1 +
 gcc/tree-vect-patterns.c |  1 +
 gcc/tree-vect-slp.c  |  1 +
 gcc/tree-vect-stmts.c| 27 ---
 gcc/tree-vectorizer.c|  4 
 gcc/tree-vectorizer.h| 23 +++
 13 files changed, 41 insertions(+), 23 deletions(-)

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index a1936f0..637d8dd 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -63,6 +63,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa.h"
 #include "cfgloop.h"
 #include "tree-scalar-evolution.h"
+#include "alloc-pool.h"
 #include "tree-vectorizer.h"
 
 
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 9b564ca..a1085c1 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "tree-scalar-evolution.h"
 #include "langhooks.h"
+#include "alloc-pool.h"
 #include "tree-vectorizer.h"
 #include "tree-hasher.h"
 #include "tree-parloops.h"
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index d7a0e9e..5204d13 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -101,6 +101,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-propagate.h"
 #include "tree-ssa-address.h"
 #include "builtins.h"
+#include "alloc-pool.h"
 #include "tree-vectorizer.h"
 
 /* FIXME: Expressions are expanded to RTL in this pass to determine the
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index b614412..f1f9866 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop.h"
 #include "tree-into-ssa.h"
 #include "tree-ssa.h"
+#include "alloc-pool.h"
 #include "cfgloop.h"
 #include "tree-scalar-evolution.h"
 #include "params.h"
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index cf7d94e..8dab786 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "tree-inline.h"
 #include "tree-scalar-evolution.h"
+#include "alloc-pool.h"
 #include "tree-vectorizer.h"
 #include "omp-low.h"
 
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 8810af1..e39c804 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop.h"
 #include "cfgloop.h"
 #include "tree-scalar-evolution.h"
+#include "alloc-pool.h"
 #include "tree-vectorizer.h"
 #include "expr.h"
 #include "builtins.h"
diff --git

Re: [PATCH 8/N] Fix memory leak in tree-vectorizer.h

On Tue, Dec 8, 2015 at 12:30 PM, Martin Liška  wrote:
> Hello.
>
> The patch removes memory leaks that are caused by overwriting an existing
> item in stmt_vec_info_vec (in set_vinfo_for_stmt). My first attempt was to 
> call
> free_stmt_vec_info for old entries that are overwritten, but it caused double 
> frees
> as there are some references between stmt_vec_infos.
>
> So that I've decided to allocate all stmt_vec_info structures from a memory 
> pool, which
> is released in free_stmt_vec_info_vec routine. Replacing 'vec' (used for 
> simd_clone_info and same_align_regs) to 'auto_vec'
> helps to reduce another leaks. To be honest, the solution is not ideal as 
> destructor
> of there auto_vec is not called, however with the patch applied, there is 
> just a single memory leak
> in the whole test-suite related to tree-vect-stmts.c (which is unrelated to 
> these vectors).
>
> Patch can bootstrap and survives regression tests on x86_64-unknown-linux-gnu.
>
> Ready for trunk?

 new_stmt_vec_info (gimple *stmt, vec_info *vinfo)
 {
   stmt_vec_info res;
-  res = (stmt_vec_info) xcalloc (1, sizeof (struct _stmt_vec_info));
+  res = stmt_vec_info_pool.allocate ();

previously it was zeroed memory now it is no longer (AFAIK).

ICK.  You do

+struct _stmt_vec_info
+{
+  _stmt_vec_info (): type ((enum stmt_vec_info_type) 0), live (false),
+  in_pattern_p (false), stmt (NULL), vinfo (0), vectype (NULL_TREE),
+  vectorized_stmt (NULL), data_ref_info (0), dr_base_address (NULL_TREE),
+  dr_init (NULL_TREE), dr_offset (NULL_TREE), dr_step (NULL_TREE),
+  dr_aligned_to (NULL_TREE), loop_phi_evolution_base_unchanged (NULL_TREE),
+  loop_phi_evolution_part (NULL_TREE), related_stmt (NULL),
pattern_def_seq (0),
+  same_align_refs (0), simd_clone_info (0), def_type ((enum vect_def_type) 0),
+  slp_type ((enum slp_vect_type) 0), first_element (NULL), next_element (NULL),
+  same_dr_stmt (NULL), size (0), store_count (0), gap (0), min_neg_dist (0),
+  relevant ((enum vect_relevant) 0), vectorizable (false),
+  gather_scatter_p (false), strided_p (false), simd_lane_access_p (false),
+  v_reduc_type ((enum vect_reduction_type) 0) {}

where I think a

  _stmt_vec_info () { memset (this, 0, sizeof (_stmt_vec_info)); }

would be much nicer.  Or just keep the struct as a POD and add that memset
at the allocation point.  (and double-check the C++ alloc-pool doesn't end up
zeroing everything twice that way...)

Note that overwriting an existing entry in stmt_vec_info_vec should not happen.
In which path does that it happen?  We probably should assert the entry
is NULL.

Thus your fix shouldn't be necessary.

Thanks,
Richard.


> Martin

[Patch 3/3][Arm] Add support for IEEE-conformant versions of scalar fmin* and fmax*

2015-12-08 Thread David Sherwood

Hi,

Here is the last patch of the fmin/fmax change, which adds the optabs
to the arm backend.

Tested:

arm-none-eabi: no regressions

Good to go?
David Sherwood.

ChangeLog:

2015-12-08  David Sherwood  

gcc/
* config/arm/iterators.md: New iterators.
* config/arm/unspecs.md: New unspecs.
* config/arm/neon.md: New pattern.
* config/arm/vfp.md: Likewise.
gcc/testsuite
* gcc.target/arm/fmaxmin.c: New test.

> -Original Message-
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: 25 November 2015 12:39
> To: David Sherwood
> Cc: GCC Patches; Richard Sandiford
> Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar 
> fmin* and fmax*
> 
> On Mon, Nov 23, 2015 at 10:21 AM, David Sherwood  
> wrote:
> > Hi,
> >
> > This is part 1 of a reworked version of a patch I originally submitted in
> > August, rebased after Richard Sandiford's recent work on the internal
> > functions. This first patch adds the internal function definitions and 
> > optabs
> > that provide support for IEEE fmax()/fmin() functions.
> >
> > Later patches will add the appropriate aarch64/aarch32 vector instructions.
> 
> Ok.
> 
> Thanks,
> Richard.
> 
> > Tested:
> >
> > x86_64-linux: no regressions
> > aarch64-none-elf: no regressions
> > arm-none-eabi: no regressions
> >
> > Regards,
> > David Sherwood.
> >
> > ChangeLog:
> >
> > 2015-11-19  David Sherwood  
> >
> > gcc/
> > * optabs.def: Add new optabs fmax_optab/fmin_optab.
> > * internal-fn.def: Add new fmax/fmin internal functions.
> > * config/aarch64/aarch64.md: New pattern.
> > * config/aarch64/aarch64-simd.md: Likewise.
> > * config/aarch64/iterators.md: New unspecs, iterators.
> > * config/arm/iterators.md: New iterators.
> > * config/arm/unspecs.md: New unspecs.
> > * config/arm/neon.md: New pattern.
> > * config/arm/vfp.md: Likewise.
> > * doc/md.texi: Add fmin and fmax patterns.
> > gcc/testsuite
> > * gcc.target/aarch64/fmaxmin.c: New test.
> > * gcc.target/arm/fmaxmin.c: New test.
> >
> >
> >> -Original Message-
> >> From: Richard Biener [mailto:richard.guent...@gmail.com]
> >> Sent: 19 August 2015 13:35
> >> To: Richard Biener; David Sherwood; GCC Patches; Richard Sandiford
> >> Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of 
> >> scalar fmin* and fmax*
> >>
> >> On Wed, Aug 19, 2015 at 2:11 PM, Richard Sandiford
> >>  wrote:
> >> > Richard Biener  writes:
> >> >> On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford
> >> >>  wrote:
> >> >>> Richard Biener  writes:
> >>  On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
> >>   wrote:
> >> > Richard Biener  writes:
> >> >> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood
> >> >>  wrote:
> >>  On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
> >>   wrote:
> >>  > Hi Richard,
> >>  >
> >>  > Thanks for the reply. I'd chosen to add new expressions as this
> >>  > seemed more
> >>  > consistent with the existing MAX_EXPR and MIN_EXPR tree codes. 
> >>  > In
> >>  > addition it
> >>  > would seem to provide more opportunities for optimisation than a
> >>  > target-specific
> >>  > builtin implementation would. I accept that optimisation
> >>  > opportunities will
> >>  > be more limited for strict math compilation, but that it was 
> >>  > still
> >>  > worth having
> >>  > them. Also, if we did map it to builtins then the scalar
> >>  > version would go
> >>  > through the optabs and the vector version would go through the
> >>  > target's builtin
> >>  > expansion, which doesn't seem very consistent.
> >> 
> >>  On another note ISTR you can't associate STRICT_MIN/MAX_EXPR and 
> >>  thus
> >>  you can't vectorize anyway?  (strict IEEE behavior is about NaNs,
> >>  correct?)
> >> >>> I thought for this particular case associativity wasn't an issue?
> >> >>> We're not doing any
> >> >>> reductions here, just simply performing max/min operations on each
> >> >>> pair of elements
> >> >>> in the vectors. I thought for IEEE-compliant behaviour we just 
> >> >>> need to
> >> >>> ensure that for
> >> >>> each pair of elements if one element is a NaN we return the other 
> >> >>> one.
> >> >>
> >> >> Hmm, true.  Ok, my comment still stands - I don't see that using a
> >> >> tree code is the best thing to do here.  You can add fmin/max optabs
> >> >> and

Re: -fstrict-aliasing fixes 5/6: make type system independent of flag_strict_aliasing

On Tue, 8 Dec 2015, Eric Botcazou wrote:

> > The problem is with the type:
> > (gdb) p debug_tree (p)
> >  
> > sizes-gimplified public visited unsigned DI
> > size  > bitsizetype> constant visited 64> unit size  > type  constant visited 8> align 64
> > symtab 0 alias set -1 canonical type 0x76af02a0
> > pointer_to_this >
> > 
> > it is a recursive pointer to itself. Does this make sense in Ada?  If so we
> > will need to add a recursion guard into the loop and put the alias set into
> > voidptr_alias_set.  It more looks like a frontend bug to me - I can not
> > think of a use for this beast.
> 
> This one (access1) is admittedly border line and we can probably kludge 
> around 
> it in the front-end, but what about access2 and more generally pointer cycles?

Usually cycles happen through structure members and it might be that
all other frontends have the pointed-to type incomplete.  But the
above recursion shouldn't apply for the structure case.

Not sure how your other examples look like.

Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching

2015-12-08 Thread Ramana Radhakrishnan

On Tue, Dec 8, 2015 at 12:53 PM, Christian Bruel  wrote:
> Hi,
>
> The order of the NEON builtins construction has led to complications since
> the attribute target support. This was not a problem when driven from the
> command line, but was causing various issues when the builtins was mixed
> between fpu configurations or when used with LTO.
>
> Firstly the builtin functions was not initialized before the parsing of
> functions, leading to wrong type initializations.
>
> Then error catching code when a builtin was used without the proper fpu
> flags was incomprehensible for the user, for instance
>
> #include "arm_neon.h"
>
> int8x8_t a, b;
> int16x8_t e;
>
> void
> main()
> {
>   e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
> }

I'm not sure what problem you are trying to solve here - The user
should never be using __builtin_neon_vaddlsv8qi (a, b) here. What
happens with vaddl_s16 intrinsic ?

They really have to only use the vaddl_s8 intrinsic.


Ramana

>
> compiled with default options (without -mfpu=neon -mfloat-abi=hard) gave
> pages of
>
> /arm-none-eabi/6.0.0/include/arm_neon.h:39:9: error: unknown type name
> '__simd64_int8_t'
>  typedef __simd64_int8_t int8x8_t;
> ...
> ...
> arm_neon.h:4724:3: error: can't convert a vector of type 'poly64x2_t {aka
> __vector(4) int}' to type 'int' which has different size
>return (poly64x2_t)__builtin_neon_vsli_nv2di ((int64x2_t) __a,
> (int64x2_t) __b, __c);
>^~
> ...
> ... and one for each arm_neon.h lines..
>
> by postponing the check into arm_expand_builtin, we now emit something more
> useful:
>
> testo.c: In function 'main':
> testo.c:9:7: error: '__builtin_neon_vaddlsv8qi' neon builtin is not
> supported in this configuration.
>e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
>^~~
>
> One small side effect to note: The total memory allocated is 370k bigger
> when neon is not used, so this support will have a follow-up to make their
> initialization lazy. But I'd like first to stabilize the stuff for stage3
> (or get it pre-approved if the memory is an issue)
>
> tested without new failures with {,-mfpu=vfp,-mfpu=neon}{,-march=armv7-a\}
> (a few tests that was fail are now unsupported)
>
> OK for trunk ?
>
>
>
>
>
>
>
>

Re: [PATCH] Convert SPARC to LRA

2015-12-08 Thread Sebastian Huber


Hello David,

since the LRA patch is still reverted on the trunk, I guess the switch 
to LRA will not happen in GCC 6?


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.

Re: [PATCH] New version of libmpx with new memmove wrapper

2015-12-08 Thread Aleksandra Tsvetkova

gcc/testsuite/ChangeLog
2015-10-27  Tsvetkova Alexandra  

* gcc.target/i386/mpx/memmove-1.c: New test for __mpx_wrapper_memmove.
* gcc.target/i386/mpx/memmove-2.c: New test covering fail on spec.

libmpx/ChangeLog
2015-10-28  Tsvetkova Alexandra  

* mpxrt/Makefile.am (libmpx_la_LDFLAGS): Add -version-info option.
* libmpxwrap/Makefile.am (libmpx_la_LDFLAGS): Likewise + includes fixed.
* libmpx/Makefile.in: Regenerate.
* mpxrt/Makefile.in: Regenerate.
* libmpxwrap/Makefile.in: Regenerate.
* mpxrt/libtool-version: New version.
* libmpxwrap/libtool-version: Likewise.
* mpxrt/libmpx.map: Add new version and a new symbol.
* mpxrt/mpxrt.h: New file.
* mpxrt/mpxrt.c (NUM_L1_BITS): Moved to mpxrt.h.
(REG_IP_IDX): Moved to mpxrt.h.
(REX_PREFIX): Moved to mpxrt.h.
(XSAVE_OFFSET_IN_FPMEM): Moved to mpxrt.h.
(MPX_L1_SIZE): Moved to mpxrt.h.
* libmpxwrap/mpx_wrappers.c: (__mpx_wrapper_memmove): Rewritten.
(mpx_pointer): New type.
(mpx_bt_entry): New type.
(alloc_bt): New function.
(get_bt): New function.
(copy_if_possible): New function.
(copy_if_possible_from_end): New function.
(move_bounds): New function.
diff --git a/gcc/testsuite/gcc.target/i386/mpx/memmove-2.c 
b/gcc/testsuite/gcc.target/i386/mpx/memmove-2.c
new file mode 100755
index 000..d0ada25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mpx/memmove-2.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-options "-fcheck-pointer-bounds -mmpx" } */
+
+
+#include 
+#include 
+#include 
+#include 
+#include "mpx-check.h"
+
+#ifdef __i386__
+/* i386 directory size is 4MB.  */
+#define MPX_NUM_L2_BITS 10
+#define MPX_NUM_IGN_BITS 2
+#else /* __i386__ */
+/* x86_64 directory size is 2GB.  */
+#define MPX_NUM_L2_BITS 17
+#define MPX_NUM_IGN_BITS 3
+#endif /* !__i386__ */
+
+
+/* bt_num_of_elems is the number of elements in bounds table.  */
+unsigned long bt_num_of_elems = (1UL << MPX_NUM_L2_BITS);
+/* Function to test MPX wrapper of memmove function.
+   This test checks that a bug making spec2000 fail with
+   SEGFAULT is fixed.  */
+
+int
+mpx_test (int argc, const char **argv)
+{
+  void **arr = 0;
+  posix_memalign ((void **) (),
+   1UL << (MPX_NUM_L2_BITS + MPX_NUM_IGN_BITS),
+   2 * bt_num_of_elems * sizeof (void *));
+  void **src = arr, **dst = arr, **ptr = arr;
+  src += 10;
+  dst += 1;
+  ptr += bt_num_of_elems + 100;
+  ptr[0] = __bnd_set_ptr_bounds (arr + 1, sizeof (void *) + 1);
+  memmove (dst, src, 5 * sizeof (void *));
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/mpx/memmove.c 
b/gcc/testsuite/gcc.target/i386/mpx/memmove.c
new file mode 100755
index 000..57030a3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mpx/memmove.c
@@ -0,0 +1,119 @@
+/* { dg-do run } */
+/* { dg-options "-fcheck-pointer-bounds -mmpx" } */
+
+
+#include 
+#include 
+#include 
+#include 
+#include "mpx-check.h"
+
+#ifdef __i386__
+/* i386 directory size is 4MB.  */
+#define MPX_NUM_L2_BITS 10
+#define MPX_NUM_IGN_BITS 2
+#else /* __i386__ */
+/* x86_64 directory size is 2GB.  */
+#define MPX_NUM_L2_BITS 17
+#define MPX_NUM_IGN_BITS 3
+#endif /* !__i386__ */
+
+
+/* bt_num_of_elems is the number of elements in bounds table.  */
+unsigned long bt_num_of_elems = (1UL << MPX_NUM_L2_BITS);
+/* Function to test MPX wrapper of memmove function.
+   src_bigger_dst determines which address is bigger, can be 0 or 1.
+   src_bt_index and dst_bt index are bt_indexes
+   from the beginning of the page.
+   bd_index_end is the bd index of the last element of src if we define
+   bd index of the first element as 0.
+   src_bt index_end is bt index of the last element of src.
+   pointers inside determines if array being copied includes pointers
+   src_align and dst_align are alignments of src and dst.
+   Arrays may contain unaligned pointers.  */
+int
+test (int src_bigger_dst, int src_bt_index, int dst_bt_index,
+  int bd_index_end, int src_bt_index_end, int pointers_inside,
+  int src_align, int dst_align)
+{
+  const int n =
+src_bt_index_end - src_bt_index + bd_index_end * bt_num_of_elems;
+  if (n < 0)
+{
+  return 0;
+}
+  const int num_of_pointers = (bd_index_end + 2) * bt_num_of_elems;
+  void **arr = 0;
+  posix_memalign ((void **) (),
+   1UL << (MPX_NUM_L2_BITS + MPX_NUM_IGN_BITS),
+   num_of_pointers * sizeof (void *));
+  void **src = arr, **dst = arr;
+  if ((src_bigger_dst) && (src_bt_index < dst_bt_index))
+src_bt_index += bt_num_of_elems;
+  if (!(src_bigger_dst) && (src_bt_index > dst_bt_index))
+dst_bt_index += bt_num_of_elems;
+  src += src_bt_index;
+  dst += dst_bt_index;
+  char *realign = (char *) src;
+  realign += src_align;
+  src = (void **) realign;
+  realign = (char *) dst;
+  realign += src_align;
+  dst = (void **) realign;
+  if (pointers_inside)
+{
+  for (int i = 0; i < n; i++)
+src[i] =

Re: [PATCH] Enable libstdc++ numeric conversions on Cygwin

2015-12-08 Thread Alan Lawrence


On 13/11/15 14:52, Jonathan Wakely wrote:

That patch was wrong, the new macros in include/bits/c++config used
"CSTDIO" instead of "STDIO" so it caused several tests to go from
PASS to UNSUPPORTED, oops!

This is the correct version, tested again more carefully, on
powerpc64le-linux and powerpc-aix and x86_64-dragonfly.

Committed to trunk.


Just as a note, on baremetal ARM and AArch64 (arm-none-eabi / aarch64-none-elf), 
this causes a bunch more tests to be executed that were previously UNSUPPORTED, 
including


FAIL: 21_strings/basic_string/numeric_conversions/char/stod.cc execution test
FAIL: 21_strings/basic_string/numeric_conversions/char/stof.cc execution test
FAIL: 21_strings/basic_string/numeric_conversions/char/stold.cc execution test
FAIL: 21_strings/basic_string/numeric_conversions/wchar_t/stod.cc execution test
FAIL: 21_strings/basic_string/numeric_conversions/wchar_t/stof.cc execution test
FAIL: 21_strings/basic_string/numeric_conversions/wchar_t/stold.cc execution 
test
FAIL: 27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc execution test

these were previously gated off by the dg-require-string-conversions, via the 
"#if !defined(_GLIBCXX_USE_C99) || defined(_GLIBCXX_HAVE_BROKEN_VSWPRINTF)" as 
GLIBCXX_USE_C99 was not defined. However, now we have "#if 
!(_GLIBCXX_USE_C99_STDIO && _GLIBCXX_USE_C99_STDLIB && _GLIBCXX_USE_C99_WCHAR)" 
and all of those are true.


(Note the tests would have failed before if we had executed them, there is no 
change there AFAICS.)


Cheers,
Alan

Re: [PATCH v3] Fix shrink-wrapping bug (PR67778, PR68634)


On 12/08/2015 01:40 AM, Segher Boessenkool wrote:


- if (can_get_prologue (pro, prologue_clobbered))
-   last_ok = pro;
}


Where did that test go?


Bernd

Re: [PATCH PR68542]

2015-12-08 Thread Yuri Rumyantsev

Hi Richard,

Here is the second part of patch.

Is it OK for trunk?

I assume that it should fix huge degradation on 481.wrf for -march=bdver4 also.

ChangeLog:
2015-12-08  Yuri Rumyantsev  

PR middle-end/68542
* config/i386/i386.c (ix86_expand_branch): Implement integral vector
comparison with boolean result.
* config/i386/sse.md (define_expand "cbranch4): Add define-expand
for vector comparion with eq/ne only.
* tree-vect-loop.c (is_valid_sink): New function.
(optimize_mask_stores): Likewise.
* tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
has_mask_store field of vect_info.
* tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
vectorized loops having masked stores.
* tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
correspondent macros.
(optimize_mask_stores): Add prototype.

gcc/testsuite/ChangeLog:
* gcc.target/i386/avx2-vect-mask-store-move1.c: New test.

2015-12-07 13:57 GMT+03:00 Yuri Rumyantsev :
> Richard!
>
> Here is middle-end part of patch with changes proposed by you.
>
> Is it OK for trunk?
>
> Thanks.
> Yuri.
>
> ChangeLog:
> 2015-12-07  Yuri Rumyantsev  
>
> PR middle-end/68542
> * fold-const.c (fold_relational_const): Add handling of vector
> comparison with boolean result.
> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
> comparison of vector operands with boolean result for EQ/NE only.
> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
> (verify_gimple_cond): Likewise.
> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
> combining for non-compatible vector types.
> * tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
> vector types.
>
>
>
> 2015-12-04 18:07 GMT+03:00 Yuri Rumyantsev :
>> Hi Richard.
>>
>> Thanks a lot for your review.
>> Below are my answers.
>>
>> You asked why I inserted additional check to
>> ++ b/gcc/tree-ssa-forwprop.c
>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>> tree_code code, tree type,
>>
>>gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>
>> +  /* Do not perform combining it types are not compatible.  */
>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>> +  && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE 
>> (op0
>> +return NULL_TREE;
>> +
>>
>> again, how does this happen?
>>
>> This is because without it I've got assert in fold_convert_loc
>>   gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
>>  && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));
>>
>> since it tries to convert vector of bool to scalar bool.
>> Here is essential part of call-stack:
>>
>> #0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
>> at ../../gcc/diagnostic.c:1259
>> #1  0x01743ada in fancy_abort (
>> file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
>> function=0x184b9d0 > tree_node*)::__FUNCTION__> "fold_convert_loc") at
>> ../../gcc/diagnostic.c:1332
>> #2  0x009c8330 in fold_convert_loc (loc=0, type=0x718a9d20,
>> arg=0x71a7f488) at ../../gcc/fold-const.c:2216
>> #3  0x009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
>> type=0x718a9d20, op0=0x71a7f460, op1=0x718c2000,
>> op2=0x718c2030) at ../../gcc/fold-const.c:11453
>> #4  0x009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
>> type=0x718a9d20, op0=0x71a7f460, op1=0x718c2000,
>> op2=0x718c2030) at ../../gcc/fold-const.c:12394
>> #5  0x009d870c in fold_binary_op_with_conditional_arg (loc=0,
>> code=EQ_EXPR, type=0x718a9d20, op0=0x71a7f460,
>> op1=0x71a48780, cond=0x71a7f460, arg=0x71a48780,
>> cond_first_p=1) at ../../gcc/fold-const.c:6465
>> #6  0x009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
>> type=0x718a9d20, op0=0x71a7f460, op1=0x71a48780)
>> at ../../gcc/fold-const.c:9211
>> #7  0x00ecb8fa in combine_cond_expr_cond (stmt=0x71a487d0,
>> code=EQ_EXPR, type=0x718a9d20, op0=0x71a7f460,
>> op1=0x71a48780, invariant_only=true)
>> at ../../gcc/tree-ssa-forwprop.c:382
>>
>>
>> Secondly, I did not catch your idea to implement GCC Vector Extension
>> for vector comparison with bool result since
>> such extension completely depends on comparison context, e.g. for your
>> example, result type of comparison depends on using - for
>> if-comparison it is scalar, but for c = (a==b) - result type is
>> vector. I don't think that this is reasonable for current release.
>>
>> And finally about AMD performance. I checked that this transformation
>> works for "-march=bdver4" option and regression for 481.wrf must
>> disappear too.
>>
>> Thanks.
>> Yuri.
>>
>> 2015-12-04 15:18 GMT+03:00 Richard Biener :
>>> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev

Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment

2015-12-08 Thread Alessandro Fanfarillo

Hi,

in attachment the new patch. I also checked the behavior with
move_alloc: it synchronizes right after the deregistration of the
destination.
I also noticed that __asm__ __volatile__ ("":::"memory") is called
before sync all and not after. It shouldn't be a problem, right?


2015-12-08 11:01 GMT+01:00 Tobias Burnus :
> Dear Alessandro, dear all,
>
> On Mon, Dec 07, 2015 at 03:48:17PM +0100, Alessandro Fanfarillo wrote:
>> Your patch fixes the issues. In attachment patch, test case and changelog.
>
> Regarding the ChangeLog: Please include the added lines, only, and not the
> change as patch. gcc/testsuite/ChangeLog changes too often such that a patch
> won't apply.
>
>
> Regarding the patch, I wonder where the memory synchronization makes sense
> and where it is not required. (cf. also my email to Matthew in this thread,
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00828.html)
>
>
> I think it should be after all image control statements (8.5.1 in
> http://j3-fortran.org/doc/year/15/15-007r2.pdf):
> SYNC ALL, SYNC IMAGES, SYNC MEMORY, ALLOCATE, DEALLOCATE,
> CRITICAL ... END CRITICAL, EVENT POST, EVENT WAIT, LOCK, UNLOCK,
> MOVE_ALLOC.
>
> Thus:
> - SYNC ..., ALLOCATE/DEALLOCATE: I think those are all handled by the
>   current patch
> - MOVE_ALLOC: This one should be handled via the internal (de)allocate
>   handling (please check!)
> - EVENT WAIT, CRITICAL and LOCK: Obtaining a lock or receiving an event
>   implies that quite likely some other process has changed something
>   before. For those, the assembler statement really has to be added.
> - EVENT POST, UNLOCK and END CRITICAL: While those are image control
>   statements, I do not see how a remote image could modify a value in
>   a well defined way, which is guaranteed to be available after that
>   statement - but might not yet be available already at the previous
>   segment (i.e. the previous image control statement).
>
> Hence: I think you should update the patch to also handle
> EVENT WAIT, CRITICAL and LOCK - and to check MOVE_ALLOC.
>
>
>
> Additionally, we need to handle the alias issue of:
>   var = 5
>   var[this_image()] = 42
>   tmp = var
>
> Both _gfortran_caf_send and _gfortran_caf_sendget can modify the
> value of a variable; thus, calling the assembler after the function
> makes sense.
>
>
> However, _gfortran_caf_get does not modify the associated variable;
> adding the assembler statement *after* _gfortran_caf_get. The
> question is, however, whether one needs to take care of 'flushing'
> the variable before the _gfortran_caf_get:
>var = 7
>...
>var = 5
>tmp = var[this_image()]
>result = var + tmp
> Here, one needs to prevent the compiler of keeping "5" only in a
> register or moving "var = 5" after the _gfortran_caf_get call.
>
>
> Thus, you have to move the assembler statement before the library
> call in _gfortran_caf_get - and add another one at the beginning
> of _gfortran_caf_sendget.
>
> (For send/get, might might come up with something better than
> ""::"memory", but for now, it should do.)
>
> Cheers,
>
> Tobias
2015-12-08  Tobias Burnus  
Alessandro Fanfarillo 

* trans.c (gfc_allocate_using_lib,gfc_deallocate_with_status):
Introducing __asm__ __volatile__ ("":::"memory")
after image control statements.
* trans-stmt.c  (gfc_trans_sync, gfc_trans_event_post_wait,
gfc_trans_lock_unlock, gfc_trans_critical): Ditto.
* trans-intrinsic.c (gfc_conv_intrinsic_caf_get,
conv_caf_send): Introducing __asm__ __volatile__ ("":::"memory")
after send, before get and around sendget.

2015-12-08  Tobias Burnus  
Alessandro Fanfarillo 

* gfortran.dg/coarray_40.f90: New.

commit 6cdffc285931eb604d4c900d77fe60b96d604556
Author: Alessandro Fanfarillo 
Date:   Tue Dec 8 14:58:20 2015 +0100

Introducing __asm__ __volatile__ (:::memory) after image control 
statements, send, get and sendget

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 21efe44..0e4fcc5 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -1211,6 +1211,14 @@ gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, 
tree lhs, tree lhs_kind,
   if (lhs == NULL_TREE)
 may_require_tmp = boolean_false_node;
 
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+   gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+   tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (>pre, tmp);
+
   tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_get, 9,
 token, offset, image_index,

Re: [PATCH] Add testcase for c++/68116

2015-12-08 Thread Marek Polacek

On Tue, Dec 08, 2015 at 09:15:33AM -0500, Jason Merrill wrote:
> On 12/08/2015 06:54 AM, Bernd Schmidt wrote:
> >On 12/07/2015 06:49 PM, Marek Polacek wrote:
> >
> >>diff --git gcc/testsuite/g++.dg/cpp0x/pr68116.C
> >>gcc/testsuite/g++.dg/cpp0x/pr68116.C
> >>index e69de29..04ed901 100644
> >>--- gcc/testsuite/g++.dg/cpp0x/pr68116.C
> >>+++ gcc/testsuite/g++.dg/cpp0x/pr68116.C
> >>@@ -0,0 +1,12 @@
> >>+// PR c++/68116
> >>+// { dg-do compile { target c++11 } }
> >>+
> >>+class C {
> >>+  void foo ();
> >>+  typedef void (C::*T) (int);
> >>+  static T b[];
> >>+};
> >>+C::T C::b[]
> >>+{
> >>+  T (::foo)
> >>+};
> >
> >The problem I have with approving C++ testcases is that I have no idea
> >whether this is valid or not or what it expresses. You should Cc Jason
> >(which I've now done).
> 
> That's odd code--I don't approve of the cast in the initializer--but it is
> well-formed.  OK.

Certainly we shouldn't ICE on this, as we used to, which is what I'm trying to
ensure here.  The code looks weird indeed, but it's C++... ;)

Thanks,

Marek

[PATCH 1/2] PR c/68757: fix uninitialied src_range for various builtins

2015-12-08 Thread David Malcolm

This fixes various uninitialized src_range of c_expr, this time
in the various builtins that are parsed via c_parser_get_builtin_args.

Bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?

gcc/c/ChangeLog:
PR c/68757
* c-parser.c (c_parser_get_builtin_args): Add
"out_close_paren_loc" param, and write back to it.
(c_parser_postfix_expression): Capture the closing
parenthesis location for RID_CHOOSE_EXPR,
RID_BUILTIN_CALL_WITH_STATIC_CHAIN, RID_BUILTIN_COMPLEX,
RID_BUILTIN_SHUFFLE and use it to set the source range
for such expressions; within RID_BUILTIN_COMPLEX set
the underlying location.

gcc/testsuite/ChangeLog:
PR c/68757
* gcc.dg/plugin/diagnostic-test-expressions-1.c
(test_builtin_choose_expr): New test function.
(test_builtin_call_with_static_chain): Likewise.
(test_builtin_complex): Likewise.
(test_builtin_shuffle): Likewise.
---
 gcc/c/c-parser.c   | 39 -
 .../gcc.dg/plugin/diagnostic-test-expressions-1.c  | 50 ++
 2 files changed, 78 insertions(+), 11 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index c7d15f9..8ea0e95 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -6933,11 +6933,14 @@ c_parser_alignof_expression (c_parser *parser)
for the middle-end nodes like COMPLEX_EXPR, VEC_PERM_EXPR and
others.  The name of the builtin is passed using BNAME parameter.
Function returns true if there were no errors while parsing and
-   stores the arguments in CEXPR_LIST.  */
+   stores the arguments in CEXPR_LIST.  If it returns true,
+   *OUT_CLOSE_PAREN_LOC is written to with the location of the closing
+   parenthesis.  */
 static bool
 c_parser_get_builtin_args (c_parser *parser, const char *bname,
   vec **ret_cexpr_list,
-  bool choose_expr_p)
+  bool choose_expr_p,
+  location_t *out_close_paren_loc)
 {
   location_t loc = c_parser_peek_token (parser)->location;
   vec *cexpr_list;
@@ -6955,6 +6958,7 @@ c_parser_get_builtin_args (c_parser *parser, const char 
*bname,
 
   if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN))
 {
+  *out_close_paren_loc = c_parser_peek_token (parser)->location;
   c_parser_consume_token (parser);
   return true;
 }
@@ -6974,6 +6978,7 @@ c_parser_get_builtin_args (c_parser *parser, const char 
*bname,
   vec_safe_push (cexpr_list, expr);
 }
 
+  *out_close_paren_loc = c_parser_peek_token (parser)->location;
   if (!c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
 return false;
 
@@ -7594,11 +7599,13 @@ c_parser_postfix_expression (c_parser *parser)
vec *cexpr_list;
c_expr_t *e1_p, *e2_p, *e3_p;
tree c;
+   location_t close_paren_loc;
 
c_parser_consume_token (parser);
if (!c_parser_get_builtin_args (parser,
"__builtin_choose_expr",
-   _list, true))
+   _list, true,
+   _paren_loc))
  {
expr.value = error_mark_node;
break;
@@ -7626,6 +7633,7 @@ c_parser_postfix_expression (c_parser *parser)
" a constant");
constant_expression_warning (c);
expr = integer_zerop (c) ? *e3_p : *e2_p;
+   set_c_expr_source_range (, loc, close_paren_loc);
break;
  }
case RID_TYPES_COMPATIBLE_P:
@@ -7677,11 +7685,13 @@ c_parser_postfix_expression (c_parser *parser)
vec *cexpr_list;
c_expr_t *e2_p;
tree chain_value;
+   location_t close_paren_loc;
 
c_parser_consume_token (parser);
if (!c_parser_get_builtin_args (parser,
"__builtin_call_with_static_chain",
-   _list, false))
+   _list, false,
+   _paren_loc))
  {
expr.value = error_mark_node;
break;
@@ -7710,17 +7720,20 @@ c_parser_postfix_expression (c_parser *parser)
"must be a pointer type");
else
  CALL_EXPR_STATIC_CHAIN (expr.value) = chain_value;
+   set_c_expr_source_range (, loc, close_paren_loc);
break;
  }
case RID_BUILTIN_COMPLEX:
  {
vec *cexpr_list;
c_expr_t *e1_p, *e2_p;
+   location_t close_paren_loc;
 
c_parser_consume_token (parser);
if (!c_parser_get_builtin_args (parser,

[PATCH 2/2] C: fix uninitialized ranges for alignof

2015-12-08 Thread David Malcolm

This fixes another uninitialized src_range of a c_expr, this time
in __alignof__ expressions.

Bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?

gcc/c/ChangeLog:
* c-parser.c (c_parser_alignof_expression): Capture location of
closing parenthesis (if any), or of end of unary expression, and
use it to build a src_range for the expression.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic-test-expressions-1.c (test_alignof):
New test function.
---
 gcc/c/c-parser.c   | 16 +
 .../gcc.dg/plugin/diagnostic-test-expressions-1.c  | 27 ++
 2 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 8ea0e95..4611e5b 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -6853,7 +6853,8 @@ static struct c_expr
 c_parser_alignof_expression (c_parser *parser)
 {
   struct c_expr expr;
-  location_t loc = c_parser_peek_token (parser)->location;
+  location_t start_loc = c_parser_peek_token (parser)->location;
+  location_t end_loc;
   tree alignof_spelling = c_parser_peek_token (parser)->value;
   gcc_assert (c_parser_next_token_is_keyword (parser, RID_ALIGNOF));
   bool is_c11_alignof = strcmp (IDENTIFIER_POINTER (alignof_spelling),
@@ -6864,10 +6865,10 @@ c_parser_alignof_expression (c_parser *parser)
   if (is_c11_alignof)
 {
   if (flag_isoc99)
-   pedwarn_c99 (loc, OPT_Wpedantic, "ISO C99 does not support %qE",
+   pedwarn_c99 (start_loc, OPT_Wpedantic, "ISO C99 does not support %qE",
 alignof_spelling);
   else
-   pedwarn_c99 (loc, OPT_Wpedantic, "ISO C90 does not support %qE",
+   pedwarn_c99 (start_loc, OPT_Wpedantic, "ISO C90 does not support %qE",
 alignof_spelling);
 }
   c_parser_consume_token (parser);
@@ -6884,6 +6885,7 @@ c_parser_alignof_expression (c_parser *parser)
   c_parser_consume_token (parser);
   loc = c_parser_peek_token (parser)->location;
   type_name = c_parser_type_name (parser);
+  end_loc = c_parser_peek_token (parser)->location;
   c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, "expected %<)%>");
   if (type_name == NULL)
{
@@ -6910,21 +6912,25 @@ c_parser_alignof_expression (c_parser *parser)
false, is_c11_alignof, 1);
   ret.original_code = ERROR_MARK;
   ret.original_type = NULL;
+  set_c_expr_source_range (, start_loc, end_loc);
   return ret;
 }
   else
 {
   struct c_expr ret;
   expr = c_parser_unary_expression (parser);
+  end_loc = expr.src_range.m_finish;
 alignof_expr:
   mark_exp_read (expr.value);
   c_inhibit_evaluation_warnings--;
   in_alignof--;
-  pedwarn (loc, OPT_Wpedantic, "ISO C does not allow %<%E (expression)%>",
+  pedwarn (start_loc,
+  OPT_Wpedantic, "ISO C does not allow %<%E (expression)%>",
   alignof_spelling);
-  ret.value = c_alignof_expr (loc, expr.value);
+  ret.value = c_alignof_expr (start_loc, expr.value);
   ret.original_code = ERROR_MARK;
   ret.original_type = NULL;
+  set_c_expr_source_range (, start_loc, end_loc);
   return ret;
 }
 }
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-1.c 
b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-1.c
index 023385b..175b2ea 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-1.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-1.c
@@ -568,6 +568,33 @@ void test_builtin_shuffle (v4si a, v4si b, v4si mask)
{ dg-end-multiline-output "" } */
 }
 
+void test_alignof (int param)
+{
+  __emit_expression_range (0, __alignof__ (int) + param );  /* { dg-warning 
"range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, __alignof__ (int) + param );
+   ~~^~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0,  param + __alignof__ (int) );  /* { dg-warning 
"range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0,  param + __alignof__ (int) );
+~~^~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0,  __alignof__ (param) + param );  /* { dg-warning 
"range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0,  __alignof__ (param) + param );
+^~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0,  param + __alignof__ (param) );  /* { dg-warning 
"range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0,  param + __alignof__ (param) );
+~~^
+   { dg-end-multiline-output "" } */
+}
+
 /* Examples of non-trivial expressions.  /
 
 extern double sqrt (double x);

Re: [6/6] OpenMP 4.0 library testsuite

2015-12-08 Thread Ilya Verbin

Hi!

On Tue, Oct 08, 2013 at 21:54:47 +0200, Jakub Jelinek wrote:
>   * testsuite/libgomp.c++/udr-1.C: New test.
>   * testsuite/libgomp.c++/udr-3.C: New test.
>   * testsuite/libgomp.c++/udr-9.C: New test.

I've just noticed that these tests fail fith -flto (on latest trunk and on Oct 
2013).

FAIL: libgomp.c++/udr-1.C (internal compiler error)
FAIL: libgomp.c++/udr-3.C (internal compiler error)
FAIL: libgomp.c++/udr-9.C (internal compiler error)
FAIL: libgomp.c++/udr-11.C (internal compiler error)
FAIL: libgomp.c++/udr-13.C (internal compiler error)
FAIL: libgomp.c++/udr-19.C (internal compiler error)

libgomp/testsuite/libgomp.c++/udr-1.C:56:13: internal compiler error: in 
discriminator_for_local_entity, at cp/mangle.c:1762
0x972ace discriminator_for_local_entity
gcc/cp/mangle.c:1762
0x972e49 write_local_name
gcc/cp/mangle.c:1850
0x96d079 write_name
gcc/cp/mangle.c:882
0x96c8c4 write_encoding
gcc/cp/mangle.c:744
0x96c365 write_mangled_name
gcc/cp/mangle.c:709
0x97c152 mangle_decl_string
gcc/cp/mangle.c:3509
0x97c198 get_mangled_id
gcc/cp/mangle.c:3531
0x97c622 mangle_decl(tree_node*)
gcc/cp/mangle.c:3598
0x12e3d77 decl_assembler_name(tree_node*)
gcc/tree.c:670
0x12f778d assign_assembler_name_if_neeeded(tree_node*)
gcc/tree.c:5879
0x12f78e5 free_lang_data_in_cgraph
gcc/tree.c:5934
0x12f7a99 free_lang_data
gcc/tree.c:5976
0x12f7b38 execute
gcc/tree.c:6025
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

  -- Ilya

Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching

2015-12-08 Thread Ramana Radhakrishnan

On Tue, Dec 8, 2015 at 1:29 PM, Christian Bruel  wrote:
> Hello Ramana,
>
> On 12/08/2015 02:01 PM, Ramana Radhakrishnan wrote:
>>
>> On Tue, Dec 8, 2015 at 12:53 PM, Christian Bruel 
>> wrote:
>>>
>>> Hi,
>>>
>>> The order of the NEON builtins construction has led to complications
>>> since
>>> the attribute target support. This was not a problem when driven from the
>>> command line, but was causing various issues when the builtins was mixed
>>> between fpu configurations or when used with LTO.
>>>
>>> Firstly the builtin functions was not initialized before the parsing of
>>> functions, leading to wrong type initializations.
>>>
>>> Then error catching code when a builtin was used without the proper fpu
>>> flags was incomprehensible for the user, for instance
>>>
>>> #include "arm_neon.h"
>>>
>>> int8x8_t a, b;
>>> int16x8_t e;
>>>
>>> void
>>> main()
>>> {
>>>e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
>>> }
>>
>>
>> I'm not sure what problem you are trying to solve here - The user
>> should never be using __builtin_neon_vaddlsv8qi (a, b) here. What
>> happens with vaddl_s16 intrinsic ?
>>
>> They really have to only use the vaddl_s8 intrinsic.
>
>
>
> Sure, that's not the problem, replace _builtin_neon_vaddlsv8qi by vaddl_s8.
> The tests (part of the patch) equivalently fails.
>
> But anyway, users do use the __builtin directly, see for instance the Bug
> 65837

I think that's just a reduced testcase from the issue to illustrate
the problem from Prathamesh who was trying to build chromium with LTO.

The __builtin_neon* aren't published anywhere and people really
shouldn't be using that directly in source code and only use the
interface in arm_neon.h which implements pretty much all the Neon
intrinsics in the ACLE document.

regards
Ramana


>
>
>
>>
>>
>> Ramana
>>
>>>
>>> compiled with default options (without -mfpu=neon -mfloat-abi=hard) gave
>>> pages of
>>>
>>> /arm-none-eabi/6.0.0/include/arm_neon.h:39:9: error: unknown type name
>>> '__simd64_int8_t'
>>>   typedef __simd64_int8_t int8x8_t;
>>> ...
>>> ...
>>> arm_neon.h:4724:3: error: can't convert a vector of type 'poly64x2_t {aka
>>> __vector(4) int}' to type 'int' which has different size
>>> return (poly64x2_t)__builtin_neon_vsli_nv2di ((int64x2_t) __a,
>>> (int64x2_t) __b, __c);
>>> ^~
>>> ...
>>> ... and one for each arm_neon.h lines..
>>>
>>> by postponing the check into arm_expand_builtin, we now emit something
>>> more
>>> useful:
>>>
>>> testo.c: In function 'main':
>>> testo.c:9:7: error: '__builtin_neon_vaddlsv8qi' neon builtin is not
>>> supported in this configuration.
>>> e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
>>> ^~~
>>>
>>> One small side effect to note: The total memory allocated is 370k bigger
>>> when neon is not used, so this support will have a follow-up to make
>>> their
>>> initialization lazy. But I'd like first to stabilize the stuff for stage3
>>> (or get it pre-approved if the memory is an issue)
>>>
>>> tested without new failures with
>>> {,-mfpu=vfp,-mfpu=neon}{,-march=armv7-a\}
>>> (a few tests that was fail are now unsupported)
>>>
>>> OK for trunk ?
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>

Re: [PATCH] Add testcase for c++/68116

2015-12-08 Thread Jason Merrill


On 12/08/2015 06:54 AM, Bernd Schmidt wrote:

On 12/07/2015 06:49 PM, Marek Polacek wrote:


diff --git gcc/testsuite/g++.dg/cpp0x/pr68116.C
gcc/testsuite/g++.dg/cpp0x/pr68116.C
index e69de29..04ed901 100644
--- gcc/testsuite/g++.dg/cpp0x/pr68116.C
+++ gcc/testsuite/g++.dg/cpp0x/pr68116.C
@@ -0,0 +1,12 @@
+// PR c++/68116
+// { dg-do compile { target c++11 } }
+
+class C {
+  void foo ();
+  typedef void (C::*T) (int);
+  static T b[];
+};
+C::T C::b[]
+{
+  T (::foo)
+};


The problem I have with approving C++ testcases is that I have no idea
whether this is valid or not or what it expresses. You should Cc Jason
(which I've now done).


That's odd code--I don't approve of the cast in the initializer--but it 
is well-formed.  OK.


Jason

Re: [PATCH] Add testcase for c++/68116


On 12/08/2015 03:21 PM, Marek Polacek wrote:

+C::T C::b[]
+{
+  T (::foo)
+};


The problem I have with approving C++ testcases is that I have no idea
whether this is valid or not or what it expresses. You should Cc Jason
(which I've now done).


That's odd code--I don't approve of the cast in the initializer--but it is
well-formed.  OK.


For all the other people who were clueless like me, apparently this is a 
thing called "list initialization" or "brace-init" that you can google for.



Bernd

Re: [RFA] [PATCH] [PR tree-optimization/68619] Avoid direct cfg cleanups in tree-ssa-dom.c [1/3]

On Tue, Dec 8, 2015 at 7:15 AM, Jeff Law  wrote:
>
> First in the series.  This merely refactors code from tree-ssa-sccvn.c into
> domwalk.c so that other walkers can use it as they see fit.
>
>
> There's an initialization function which sets all edges to executable.
>
> There's a test function to see if a block is reachable.
>
> There's a propagation function to propagate the unreachable property to
> edges.
>
> Finally a function to clear m_unreachable_dom.  I consider this a wart.
> Essentially that data member contains the highest unreachable block in the
> dominator tree.  Once we've finished processing that block's children, we
> need to clear the member.  Ideally clients wouldn't need to call this member
> function.
>
> Bikeshedding on the members names is encouraged.  And if someone has a
> clean, simple way to ensure that the m_unreachable_dom member gets cleared
> regardless of whether or not a client has its own after_dom_children
> callback, I'd love to hear it.

I wonder if it makes more sense to integrate this with the domwalker
itself.  Add
a constructor flag to it and do everything in itself.  By letting the
before_dom_children
return a taken edge (or NULL if unknown) it can drive the outgoing edge marking.
And the domwalk worker would simply not recurse to dom children for unreachable
blocks.

Richard.

> OK for trunk?
>
>
> Jeff
>
> commit 5e53fefae0373768b98d51d5746d43f36cecbe2a
> Author: Jeff Law 
> Date:   Mon Dec 7 11:32:58 2015 -0700
>
> * domwalk.h (dom_walker::init_edge_executable): New method.
> (dom_walker::maybe_clear_unreachable_dom): Likewise.
> (dom_walker::bb_reachable): Likewise.
> (dom_walker::propagate_unreachable_to_edges): Likewise.
> (dom_walker::m_unreachable_dom): New private data member.
> * domwalk.c: Include dumpfile.h.
> (dom_walker::init_edge_executable): New method.
> (dom_walker::maybe_clear_unreachable_dom): Likewise.
> (dom_walker::bb_reachable): Likewise.  Factored out of
> tree-ssa-sccvn.c
> (dom_walker::propagate_unreachable_to_edges): Likewise.
> * tree-ssa-sccvn.c (sccvn_dom_walker::unreachable_dom): Remove
> private data member.
> (sccvn_dom_walker::after_dom_children): Use methods from dom_walker
> class.
> (sccvn_dom_walker::before_dom_children): Similarly.
> (run_scc_vn): Likewise.
>
> diff --git a/gcc/domwalk.c b/gcc/domwalk.c
> index 167fc38..feb6478 100644
> --- a/gcc/domwalk.c
> +++ b/gcc/domwalk.c
> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "backend.h"
>  #include "cfganal.h"
>  #include "domwalk.h"
> +#include "dumpfile.h"
>
>  /* This file implements a generic walker for dominator trees.
>
> @@ -142,6 +143,93 @@ cmp_bb_postorder (const void *a, const void *b)
>return 1;
>  }
>
> +/* Mark all edges in the CFG as possibly being executable.  */
> +
> +void
> +dom_walker::init_edge_executable (struct function *fun)
> +{
> +  basic_block bb;
> +  FOR_ALL_BB_FN (bb, fun)
> +{
> +  edge_iterator ei;
> +  edge e;
> +  FOR_EACH_EDGE (e, ei, bb->succs)
> +   e->flags |= EDGE_EXECUTABLE;
> +}
> +}
> +
> +/* Return TRUE if BB is reachable, false otherwise.  */
> +
> +bool
> +dom_walker::bb_reachable (struct function *fun, basic_block bb)
> +{
> +  /* If any of the predecessor edges that do not come from blocks dominated
> + by us are still marked as possibly executable consider this block
> + reachable.  */
> +  bool reachable = false;
> +  if (!m_unreachable_dom)
> +{
> +  reachable = bb == ENTRY_BLOCK_PTR_FOR_FN (fun);
> +  edge_iterator ei;
> +  edge e;
> +  FOR_EACH_EDGE (e, ei, bb->preds)
> +   if (!dominated_by_p (CDI_DOMINATORS, e->src, bb))
> + reachable |= (e->flags & EDGE_EXECUTABLE);
> +}
> +
> +  return reachable;
> +}
> +
> +/* BB has been determined to be unreachable.  Propagate that property
> +   to incoming and outgoing edges of BB as appropriate.  */
> +
> +void
> +dom_walker::propagate_unreachable_to_edges (basic_block bb,
> +   FILE *dump_file,
> +   int dump_flags)
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +fprintf (dump_file, "Marking all outgoing edges of unreachable "
> +"BB %d as not executable\n", bb->index);
> +
> +  edge_iterator ei;
> +  edge e;
> +  FOR_EACH_EDGE (e, ei, bb->succs)
> +e->flags &= ~EDGE_EXECUTABLE;
> +
> +  FOR_EACH_EDGE (e, ei, bb->preds)
> +{
> +  if (dominated_by_p (CDI_DOMINATORS, e->src, bb))
> +   {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> +   fprintf (dump_file, "Marking backedge from BB %d into "
> +"unreachable BB %d as not executable\n",
> +e->src->index, bb->index);
> + e->flags &= ~EDGE_EXECUTABLE;
> +   }
> +}
> +

Re: [RFA] [PATCH] [PR tree-optimization/68619] Avoid direct cfg cleanups in tree-ssa-dom.c [1/3]

On Tue, Dec 8, 2015 at 3:23 PM, Richard Biener
 wrote:
> On Tue, Dec 8, 2015 at 7:15 AM, Jeff Law  wrote:
>>
>> First in the series.  This merely refactors code from tree-ssa-sccvn.c into
>> domwalk.c so that other walkers can use it as they see fit.
>>
>>
>> There's an initialization function which sets all edges to executable.
>>
>> There's a test function to see if a block is reachable.
>>
>> There's a propagation function to propagate the unreachable property to
>> edges.
>>
>> Finally a function to clear m_unreachable_dom.  I consider this a wart.
>> Essentially that data member contains the highest unreachable block in the
>> dominator tree.  Once we've finished processing that block's children, we
>> need to clear the member.  Ideally clients wouldn't need to call this member
>> function.
>>
>> Bikeshedding on the members names is encouraged.  And if someone has a
>> clean, simple way to ensure that the m_unreachable_dom member gets cleared
>> regardless of whether or not a client has its own after_dom_children
>> callback, I'd love to hear it.
>
> I wonder if it makes more sense to integrate this with the domwalker
> itself.  Add
> a constructor flag to it and do everything in itself.  By letting the
> before_dom_children
> return a taken edge (or NULL if unknown) it can drive the outgoing edge 
> marking.
> And the domwalk worker would simply not recurse to dom children for 
> unreachable
> blocks.

So interface-wise do

Index: gcc/domwalk.h
===
--- gcc/domwalk.h   (revision 231396)
+++ gcc/domwalk.h   (working copy)
@@ -30,13 +30,16 @@ along with GCC; see the file COPYING3.
 class dom_walker
 {
 public:
-  dom_walker (cdi_direction direction) : m_dom_direction (direction) {}
+  dom_walker (cdi_direction direction,
+ bool skip_unreachable_blocks = false)
+  : m_dom_direction (direction),
+m_skip_unreachable_blocks (skip_unreachable_blocks) {}

   /* Walk the dominator tree.  */
   void walk (basic_block);

   /* Function to call before the recursive walk of the dominator children.  */
-  virtual void before_dom_children (basic_block) {}
+  virtual edge before_dom_children (basic_block) {}

   /* Function to call after the recursive walk of the dominator children.  */
   virtual void after_dom_children (basic_block) {}
@@ -47,6 +50,7 @@ private:
  if it is set to CDI_POST_DOMINATORS, then we walk the post
  dominator tree.  */
   const ENUM_BITFIELD (cdi_direction) m_dom_direction : 2;
+  const bool m_skip_unreachable_blocks;
 };

 #endif

and simply inline all the code into dom_walker::walk.

Richard.

> Richard.
>
>> OK for trunk?
>>
>>
>> Jeff
>>
>> commit 5e53fefae0373768b98d51d5746d43f36cecbe2a
>> Author: Jeff Law 
>> Date:   Mon Dec 7 11:32:58 2015 -0700
>>
>> * domwalk.h (dom_walker::init_edge_executable): New method.
>> (dom_walker::maybe_clear_unreachable_dom): Likewise.
>> (dom_walker::bb_reachable): Likewise.
>> (dom_walker::propagate_unreachable_to_edges): Likewise.
>> (dom_walker::m_unreachable_dom): New private data member.
>> * domwalk.c: Include dumpfile.h.
>> (dom_walker::init_edge_executable): New method.
>> (dom_walker::maybe_clear_unreachable_dom): Likewise.
>> (dom_walker::bb_reachable): Likewise.  Factored out of
>> tree-ssa-sccvn.c
>> (dom_walker::propagate_unreachable_to_edges): Likewise.
>> * tree-ssa-sccvn.c (sccvn_dom_walker::unreachable_dom): Remove
>> private data member.
>> (sccvn_dom_walker::after_dom_children): Use methods from dom_walker
>> class.
>> (sccvn_dom_walker::before_dom_children): Similarly.
>> (run_scc_vn): Likewise.
>>
>> diff --git a/gcc/domwalk.c b/gcc/domwalk.c
>> index 167fc38..feb6478 100644
>> --- a/gcc/domwalk.c
>> +++ b/gcc/domwalk.c
>> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "backend.h"
>>  #include "cfganal.h"
>>  #include "domwalk.h"
>> +#include "dumpfile.h"
>>
>>  /* This file implements a generic walker for dominator trees.
>>
>> @@ -142,6 +143,93 @@ cmp_bb_postorder (const void *a, const void *b)
>>return 1;
>>  }
>>
>> +/* Mark all edges in the CFG as possibly being executable.  */
>> +
>> +void
>> +dom_walker::init_edge_executable (struct function *fun)
>> +{
>> +  basic_block bb;
>> +  FOR_ALL_BB_FN (bb, fun)
>> +{
>> +  edge_iterator ei;
>> +  edge e;
>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>> +   e->flags |= EDGE_EXECUTABLE;
>> +}
>> +}
>> +
>> +/* Return TRUE if BB is reachable, false otherwise.  */
>> +
>> +bool
>> +dom_walker::bb_reachable (struct function *fun, basic_block bb)
>> +{
>> +  /* If any of the predecessor edges that do not come from blocks dominated
>> + by us are still marked as possibly executable consider this block
>> + reachable.  */
>> +

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-08 Thread Ilya Verbin

On Tue, Dec 01, 2015 at 20:05:04 +0100, Jakub Jelinek wrote:
> This is racy, tsan would tell you so.
> Instead of a global var, I'd just change the devicep->is_initialized 
> field from bool into a 3 state field (perhaps enum), with states
> uninitialized, initialized, finalized, and then say in resolve_device,
> 
>   gomp_mutex_lock ([device_id].lock);
>   if (devices[device_id].state == GOMP_DEVICE_UNINITIALIZED)
> gomp_init_device ([device_id]);
>   else if (devices[device_id].state == GOMP_DEVICE_FINALIZED)
> {
>   gomp_mutex_unlock ([device_id].lock);
>   return NULL;
> }
>   gomp_mutex_unlock ([device_id].lock);
> 
> Though, of course, that is incomplete, because resolve_device takes one
> lock, gomp_get_target_fn_addr another one, gomp_map_vars yet another one.
> So I think either we want to rewrite the locking, such that say
> resolve_device returns a locked device and then you perform stuff on the
> locked device (disadvantage is that gomp_map_vars will call gomp_malloc
> with the lock held, which can take some time to allocate the memory),
> or there needs to be the possibility that gomp_map_vars rechecks if the
> device has not been finalized after taking the lock and returns to the
> caller if the device has been finalized in between resolve_device and
> gomp_map_vars.

This patch implements the second approach.  Is it OK?
Bootstrap and make check-target-libgomp passed.

libgomp/
* libgomp.h (gomp_device_state): New enum.
(struct gomp_device_descr): Replace is_initialized with state.
(gomp_fini_device): Remove declaration.
* oacc-host.c (host_dispatch): Use state instead of is_initialized.
* oacc-init.c (acc_init_1): Use state instead of is_initialized.
(acc_shutdown_1): Likewise.  Inline gomp_fini_device.
(acc_set_device_type): Use state instead of is_initialized.
(acc_set_device_num): Likewise.
* target.c (resolve_device): Use state instead of is_initialized.
Do not initialize finalized device.
(gomp_map_vars): Do nothing if device is finalized.
(gomp_unmap_vars): Likewise.
(gomp_update): Likewise.
(GOMP_offload_register_ver): Use state instead of is_initialized.
(GOMP_offload_unregister_ver): Likewise.
(gomp_init_device): Likewise.
(gomp_unload_device): Likewise.
(gomp_fini_device): Remove.
(gomp_get_target_fn_addr): Do nothing if device is finalized.
(GOMP_target): Go to host fallback if device is finalized.
(GOMP_target_ext): Likewise.
(gomp_exit_data): Do nothing if device is finalized.
(gomp_target_task_fn): Go to host fallback if device is finalized.
(gomp_target_fini): New static function.
(gomp_target_init): Use state instead of is_initialized.
Call gomp_target_fini at exit.
liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (unregister_main_image): Remove.
(register_main_image): Do not call unregister_main_image at exit.
(GOMP_OFFLOAD_fini_device): Allow for OpenMP.  Unregister main image.

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index c467f97..9d9949f 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -888,6 +888,14 @@ typedef struct acc_dispatch_t
   } cuda;
 } acc_dispatch_t;

+/* Various state of the accelerator device.  */
+enum gomp_device_state
+{
+  GOMP_DEVICE_UNINITIALIZED,
+  GOMP_DEVICE_INITIALIZED,
+  GOMP_DEVICE_FINALIZED
+};
+
 /* This structure describes accelerator device.
It contains name of the corresponding libgomp plugin, function handlers for
interaction with the device, ID-number of the device, and information about
@@ -933,8 +941,10 @@ struct gomp_device_descr
   /* Mutex for the mutable data.  */
   gomp_mutex_t lock;

-  /* Set to true when device is initialized.  */
-  bool is_initialized;
+  /* Current state of the device.  OpenACC allows to move from INITIALIZED 
state
+ back to UNINITIALIZED state.  OpenMP allows only to move from INITIALIZED
+ to FINALIZED state (at program shutdown).  */
+  enum gomp_device_state state;

   /* OpenACC-specific data and functions.  */
   /* This is mutable because of its mutable data_environ and target_data
@@ -962,7 +972,6 @@ extern void gomp_copy_from_async (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
 extern void gomp_free_memmap (struct splay_tree_s *);
-extern void gomp_fini_device (struct gomp_device_descr *);
 extern void gomp_unload_device (struct gomp_device_descr *);

 /* work.c */
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 9874804..d289b38 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -222,7 +222,7 @@ static struct gomp_device_descr host_dispatch =

 .mem_map = { NULL },
 /* .lock initilized in goacc_host_init.  */
-.is_initialized = false,
+.state =

[patch] Backport libstdc++ documentation improvements

2015-12-08 Thread Jonathan Wakely


This syncs the gcc-5-branch docs with those on trunk (which I should
have done before the 5.3 release, sorry).

Committed to gcc-5-branch.


commit 677f6293e1c961ba75fba97509b613f28b4afc33
Author: Jonathan Wakely 
Date:   Tue Dec 8 13:41:50 2015 +

Backport libstdc++ documentation improvements

	* doc/xml/manual/abi.xml: Backport documentation improvements from
	mainline.
	* doc/xml/manual/configure.xml: Likewise.
	* doc/xml/manual/diagnostics.xml: Likewise.
	* doc/xml/manual/extensions.xml: Likewise.
	* doc/xml/manual/status_cxx2011.xml: Likewise.
	* doc/xml/manual/status_cxx2014.xml: Likewise.
	* doc/xml/manual/using.xml: Likewise.
	* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/abi.xml b/libstdc++-v3/doc/xml/manual/abi.xml
index b399f71..a2ed57b 100644
--- a/libstdc++-v3/doc/xml/manual/abi.xml
+++ b/libstdc++-v3/doc/xml/manual/abi.xml
@@ -66,7 +66,7 @@
 
 
  Putting all of these ideas together results in the C++ Standard
-library ABI, which is the compilation of a given library API by a
+Library ABI, which is the compilation of a given library API by a
 given compiler ABI. In a nutshell:
 
 
diff --git a/libstdc++-v3/doc/xml/manual/configure.xml b/libstdc++-v3/doc/xml/manual/configure.xml
index f6a5551..7b09d01 100644
--- a/libstdc++-v3/doc/xml/manual/configure.xml
+++ b/libstdc++-v3/doc/xml/manual/configure.xml
@@ -297,10 +297,12 @@
 
  --enable-concept-checks
  This turns on additional compile-time checks for instantiated
-	library templates, in the form of specialized templates,
-	described here.  They
+	library templates, in the form of specialized templates described in
+the Concept
+Checking section.  They
 	can help users discover when they break the rules of the STL, before
-	their programs run.
+	their programs run. These checks are based on C++03 rules and some of
+	them are not compatible with correct C++11 code.
  
  
 
@@ -418,6 +420,15 @@
  
  
 
+ --enable-libstdcxx-filesystem-ts[default]
+ 
+Build libstdc++fs.a as well
+  as the usual libstdc++ and libsupc++ libraries. This is enabled by
+  default on select POSIX targets where it is known to work and disabled
+  otherwise.
+
+ 
+
 
 
 
diff --git a/libstdc++-v3/doc/xml/manual/diagnostics.xml b/libstdc++-v3/doc/xml/manual/diagnostics.xml
index 99206e9..88ed2e2 100644
--- a/libstdc++-v3/doc/xml/manual/diagnostics.xml
+++ b/libstdc++-v3/doc/xml/manual/diagnostics.xml
@@ -114,8 +114,9 @@
 
  
Please note that the checks are based on the requirements in the original
-   C++ standard, some of which have changed in the new C++11 revision.
-   Additionally, some correct code might be rejected by the concept checks,
+   C++ standard, many of which were relaxed in the C++11 standard and so valid
+   C++11 code may be incorrectly rejected by the concept checks.  Additionally,
+   some correct C++03 code might be rejected by the concept checks,
for example template argument types may need to be complete when used in
a template definition, rather than at the point of instantiation.
There are no plans to address these shortcomings.
diff --git a/libstdc++-v3/doc/xml/manual/extensions.xml b/libstdc++-v3/doc/xml/manual/extensions.xml
index c4120c9..41b1a80 100644
--- a/libstdc++-v3/doc/xml/manual/extensions.xml
+++ b/libstdc++-v3/doc/xml/manual/extensions.xml
@@ -82,7 +82,8 @@ extensions, be aware of two things:
   They can be enabled at configure time with
   --enable-concept-checks.
   You can enable them on a per-translation-unit basis with
-  #define _GLIBCXX_CONCEPT_CHECKS for GCC 3.4 and higher
+  #define
+  _GLIBCXX_CONCEPT_CHECKS for GCC 3.4 and higher
   (or with #define _GLIBCPP_CONCEPT_CHECKS for versions
   3.1, 3.2 and 3.3).

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
index 965df13..16ea8c4 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
@@ -17,8 +17,8 @@ Final Draft International Standard, Standard for Programming Language C++
 
 
 
-In this implementation -std=gnu++11 or
--std=c++11 flags must be used to enable language
+In this implementation the -std=gnu++11 or
+-std=c++11 flag must be used to enable language
 and library
 features. See dialect
 options. The pre-defined symbol
@@ -642,10 +642,8 @@ particular release.
   Class template shared_ptr
   Y
   
-	
 	  Uses code from
 	  http://www.w3.org/1999/xlink; xlink:href="http://www.boost.org/libs/smart_ptr/shared_ptr.htm;>boost::shared_ptr.
-	
   
 
 
@@ -2673,7 +2671,10 @@ particular release.
   30.2.3 [thread.req.native]/1
   native_handle_type and
   native_handle are provided. The handle types
-  are defined in terms of the Gthreads abstraction layer.
+  are defined in terms of the Gthreads abstraction layer, although this
+

[PATCH, i386, AVX-512] Fix assembler for broadcast pattern.

2015-12-08 Thread Kirill Yukhin

Hello,
Patch in the bottom fixes assembler strings for broadcast patterns.
Fixes spec2k6/464.h264ref compilation fail with -march=skylake-avx512.

Bootsatrapped, regtested & committed to main trunk.

gcc/
* config/i386/sse.md (define_insn "_vec_dup_1"): Fix
assembler to make source always 128bit.

--
Thanks, K

commit 87b314a103f47f21b342d1c0c751088f2562a481
Author: Kirill Yukhin 
Date:   Tue Dec 8 15:27:44 2015 +0300

AVX-512. Fix assembler section for broadcast pattern.

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index eb49c41..6740edf 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17223,8 +17223,9 @@
(match_operand:VI_AVX512BW 1 "nonimmediate_operand" "v,m")
(parallel [(const_int 0)]]
   "TARGET_AVX512F"
-  "vpbroadcast\t{%1, %0|%0, %1}
-   vpbroadcast\t{%x1, %0|%0, %x1}"
+  "@
+   vpbroadcast\t{%x1, %0|%0, %x1}
+   vpbroadcast\t{%x1, %0|%0, %1}"
   [(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
(set_attr "mode" "")])

Re: [PATCH, i386, PR68627] Prohibit AVX-512VL broadcasts generation on KNL.

2015-12-08 Thread Kirill Yukhin

On 08 Dec 13:01, Uros Bizjak wrote:
> On Tue, Dec 8, 2015 at 11:40 AM, Kirill Yukhin  
> wrote:
> > Hello,
> > On 08 Dec 09:47, Andreas Schwab wrote:
> >> FAIL: gfortran.dg/pr68627.f   -O  (test for excess errors)
> >> Excess errors:
> >> gfortran: error: unrecognized command line option '-mavx512f'
> > Thanks for pointing.
> >
> > I've checked in this as obvious:
> >
> > gcc/testsuite:
> > * gfortran.dg/pr68627.f: Limit target x86.
> >
> > diff --git a/gcc/testsuite/gfortran.dg/pr68627.f 
> > b/gcc/testsuite/gfortran.dg/pr68627.f
> > index 32ff4a7..54575d7 100644
> > --- a/gcc/testsuite/gfortran.dg/pr68627.f
> > +++ b/gcc/testsuite/gfortran.dg/pr68627.f
> > @@ -1,4 +1,4 @@
> > -! { dg-do compile { target lp64 } }
> > +! { dg-do compile { target { { i?86-*-* x86_64-*-* } && lp64 } } }
> 
> Actually, you need { ! { ia32 } } here, since lp64 filters out x32 in
> addition to ia32. Please note that since x32 is x86_64 target, it
> implies 16 xmm registers.
Thanks Uroš, fixed in main trunk.
> 
> Uros.

--
Thanks, K

Re: [PATCH,RFC] Introduce RUN_UNDER_VALGRIND in test-suite


On 12/08/2015 04:28 PM, Martin Liška wrote:


Majority of them (~2600 BTs) are in fortran FE (BT contains 'gfc_'): [2].
The rest contains some issues in CP FE, many GGC invalid read/write operations 
([4]) and many
memory leaks in gcc.c (for instance option handling).

My question is if a bug should be created for all fortran issues and whether 
it's realistic that
they can be eventually fixed in next stage1?


It hardly seems worthwhile to file a bug for every issue, that's just 
unnecessary bureaucracy. Fix them as you go along would be my 
recommendation, that probably takes a similar amount of time.


Can't say whether it's realistic that they'll be fixed. I'm not at home 
in the Fortran frontend.



Bernd

Re: [RFA] [PATCH] [PR tree-optimization/68619] Avoid direct cfg cleanups in tree-ssa-dom.c [1/3]

2015-12-08 Thread Trevor Saunders

On Mon, Dec 07, 2015 at 11:15:33PM -0700, Jeff Law wrote:
> 
> First in the series.  This merely refactors code from tree-ssa-sccvn.c into
> domwalk.c so that other walkers can use it as they see fit.
> 
> 
> There's an initialization function which sets all edges to executable.
> 
> There's a test function to see if a block is reachable.
> 
> There's a propagation function to propagate the unreachable property to
> edges.
> 
> Finally a function to clear m_unreachable_dom.  I consider this a wart.
> Essentially that data member contains the highest unreachable block in the
> dominator tree.  Once we've finished processing that block's children, we
> need to clear the member.  Ideally clients wouldn't need to call this member
> function.

Hmm, I expect you thought about this, but why not always see if we need
to clear it before calling after_dom_children () ?  I think that would
amount to an extra compare of a register and cached memory (we're just
about to use the vtable pointer anyway) so I expect that wouldn't effect
performance significantly.

Thinking about this more I wonder if we could move more of this into the
dom walker, and skip calling before / after dom_children on unreachable
blocks all together.  That would seem to work for sccvn, but I'm not
sure about what tree-ssa-dom.c is doing with the mark pushing and
clearing.

Trev

[COMMITTED][Testcase] Skip big-endian as well for gcc.target/aarch64/got_mem_hoist_1.c

2015-12-08 Thread Jiong Wang


The same skip should be applied to big-endian for tiny and large code model.

Applied to trunk as obvious r231413.

2015-12-08  Jiong Wang  

gcc/testsuite/
  * gcc.target/aarch64/got_mem_hoist_1.c (dg-skip-if): Match big-endian as well.

diff --git a/gcc/testsuite/gcc.target/aarch64/got_mem_hoist_1.c b/gcc/testsuite/gcc.target/aarch64/got_mem_hoist_1.c
index 2d8c8ae..9ee772f 100644
--- a/gcc/testsuite/gcc.target/aarch64/got_mem_hoist_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/got_mem_hoist_1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fpic -fdump-rtl-loop2_invariant" } */
-/* { dg-skip-if "Load/Store hoisted by RTL PRE already" { aarch64-*-* }  { "-mcmodel=tiny" "-mcmodel=large" } { "" } } */
+/* { dg-skip-if "Load/Store hoisted by RTL PRE already" { aarch64*-*-* }  { "-mcmodel=tiny" "-mcmodel=large" } { "" } } */
 
 int bar (int);
 int cal (void *);

Re: [PATCH 1/2] PR c/68757: fix uninitialied src_range for various builtins


On 12/08/2015 05:02 PM, David Malcolm wrote:


I actually implemented something like this when implementing these two
patches.

Work-in-progress patch attached, which introduces an INVALID_LOCATION
value for source_location/location_t, and uses it to "poison" the
initial value of c_expr's src_range, with lots of assertions to verify
that it's been overwritten by time we use it.

It doesn't fully work yet, but much of gcc.dg survives with this (and
it's what I used to detect the alignof issue in patch 2).  Does this
look like something I should pursue?


Most definitely IMO.


Bernd

PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class

2015-12-08 Thread H.J. Lu

On Mon, Nov 23, 2015 at 12:53 PM, H.J. Lu  wrote:
> On Mon, Nov 23, 2015 at 1:57 AM, Richard Biener
>  wrote:
>> On Sat, Nov 21, 2015 at 12:46 AM, H.J. Lu  wrote:
>>> On Fri, Nov 20, 2015 at 2:17 PM, Jason Merrill  wrote:
 On 11/20/2015 01:52 PM, H.J. Lu wrote:
>
> On Tue, Nov 17, 2015 at 4:22 AM, Richard Biener
>  wrote:
>>
>> On Tue, Nov 17, 2015 at 12:01 PM, H.J. Lu  wrote:
>>>
>>> Empty record should be returned and passed the same way in C and C++.
>>> This patch adds LANG_HOOKS_EMPTY_RECORD_P for C++ empty class, which
>>> defaults to return false.  For C++, LANG_HOOKS_EMPTY_RECORD_P is defined
>>> to is_really_empty_class, which returns true for C++ empty classes.  For
>>> LTO, we stream out a bit to indicate if a record is empty and we store
>>> it in TYPE_LANG_FLAG_0 when streaming in.  get_ref_base_and_extent is
>>> changed to set bitsize to 0 for empty records.  Middle-end and x86
>>> backend are updated to ignore empty records for parameter passing and
>>> function value return.  Other targets may need similar changes.
>>
>>
>> Please avoid a new langhook for this and instead claim a bit in
>> tree_type_common
>> like for example restrict_flag (double-check it is unused for
>> non-pointers).
>
>
> There is no bit in tree_type_common I can overload.  restrict_flag is
> checked for non-pointers to issue an error when it is used on
> non-pointers:
>
>
> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/template/qualttp20.C:19:38:
> error: ‘__restrict__’ qualifiers cannot be applied to ‘AS::L’
> typedef typename T::L __restrict__ r;// { dg-error "'__restrict__'
> qualifiers cannot" "" }


 The C++ front end only needs to check TYPE_RESTRICT for this purpose on
 front-end-specific type codes like TEMPLATE_TYPE_PARM; cp_type_quals could
 handle that specifically if you change TYPE_RESTRICT to only apply to
 pointers.

>>>
>>> restrict_flag is also checked in this case:
>>>
>>> [hjl@gnu-6 gcc]$ cat x.i
>>> struct dummy { };
>>>
>>> struct dummy
>>> foo (struct dummy __restrict__ i)
>>> {
>>>   return i;
>>> }
>>> [hjl@gnu-6 gcc]$ gcc -S x.i -Wall
>>> x.i:4:13: error: invalid use of ‘restrict’
>>>  foo (struct dummy __restrict__ i)
>>>  ^
>>> x.i:4:13: error: invalid use of ‘restrict’
>>> [hjl@gnu-6 gcc]$
>>>
>>> restrict_flag can't also be used to indicate `i' is an empty record.
>>
>> I'm sure this error can be done during parsing w/o relying on TYPE_RESTRICT.
>>
>> But well, use any other free bit (but do not enlarge
>> tree_type_common).  Eventually
>> you can free up a bit by putting sth into type_lang_specific currently
>> using bits
>> in tree_type_common.
>
> There are no bits in tree_type_common I can move.  Instead,
> this patch overloads side_effects_flag in tree_base.  Tested on
> Linux/x86-64.  OK for trunk?
>
> Thanks.
>
> --
> H.J.



-- 
H.J.
From 1e3e6ed42ed86b74d01bd3a6b869c15dba964c21 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sun, 15 Nov 2015 13:19:05 -0800
Subject: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class

Empty record should be returned and passed the same way in C and C++.
This patch overloads a bit, side_effects_flag, in tree_base for C++
empty class.  Middle-end and x86 backend are updated to ignore empty
records for parameter passing and function value return.  Other targets
may need similar changes.

get_ref_base_and_extent is changed to set bitsize to 0 for empty records
so that when ref_maybe_used_by_call_p_1 calls get_ref_base_and_extent to
get 0 as the maximum size on empty record.  Otherwise, find_tail_calls
won't perform tail call optimization for functions with empty record
parameters, as shown in g++.dg/pr60336-1.C and g++.dg/pr60336-2.C.

gcc/

	PR c++/60336
	PR middle-end/67239
	PR target/68355
	* calls.c (initialize_argument_information): Replace
	targetm.calls.function_arg, targetm.calls.function_incoming_arg
	and targetm.calls.function_arg_advance with function_arg,
	function_incoming_arg and function_arg_advance.
	(expand_call): Likewise.
	(emit_library_call_value_1): Likewise.
	(store_one_arg): Use 0 for empty record size.  Don't
	push 0 size argument onto stack.
	(must_pass_in_stack_var_size_or_pad): Return false for empty
	record.
	* dse.c (get_call_args): Replace targetm.calls.function_arg
	and targetm.calls.function_arg_advance with function_arg and
	function_arg_advance.
	* expr.c (block_move_libcall_safe_for_call_parm): Likewise.
	* function.c (aggregate_value_p): Replace
	targetm.calls.return_in_memory with return_in_memory.
	(assign_parm_find_entry_rtl): Replace
	targetm.calls.function_incoming_arg with function_incoming_arg.
	(assign_parms): Replace targetm.calls.function_arg_advance with

Re: [gomp4] Fix Fortran deviceptr

2015-12-08 Thread James Norris


Cesar,

On 12/07/2015 09:55 AM, Cesar Philippidis wrote:


[snip snip]
Two observations:

  1. Why is deviceptr so special that gomp_map_vars can't handle it
 directly?


Recall that the deviceptr clause in Fortran presents as two
mappings: FORCE_DEVICEPTR and POINTER. The former
has the device address in hostaddr, while the latter has the
host address of the pointer in hostaddr. The new code
detects a properly formed pair and if found, proceeds to
turn the latter into the FORCE_DEVICEPTR map, while the
former is effectively discarded by setting hostaddr to NULL.

This behavior is specific to OpenACC, so I decided put it where
I did. Could it be put into gomp_map_vars? Probably. Would it
be messy? Probably. But what may be a better solution would
be to use the functionality that handles the use_device_ptr
clause in OpenMP4.1 and eliminate FORCE_DEVICEPTR from
gomp_map_var. Have to look into that.




  2. It appears that deviceptr code in GOACC_parallel_keyed is mostly
 identical to GOACC_data_start. Can you put that duplicate code into
 a function? That would be easier to maintain in the long run.



Fixed.

Thanks,
Jim

Re: [gomp4] Fix Fortran deviceptr

2015-12-08 Thread Cesar Philippidis

On 12/08/2015 08:22 AM, James Norris wrote:

>>   2. It appears that deviceptr code in GOACC_parallel_keyed is mostly
>>  identical to GOACC_data_start. Can you put that duplicate code into
>>  a function? That would be easier to maintain in the long run.
>>
> 
> Fixed.

Where? I don't see a patch.

Since you're working on fortran, can you take a look at how
gfc_match_omp_clauses is handling OMP_CLAUSE_DEVICEPTR. It seems overly
confusing to me. Could it be done in a similar way as OMP_CLAUSE_COPYIN,
i.e., using gfc_match_omp_map_clause?

Cesar

Re: [C] Issue an error on scalar va_list with reverse storage order

> The new gcc.dg/sso-9.c test is failing for aarch64 and arm targets. There's
> no error generated if I compile the test from the command line for
> aarch64-none-elf. GCC for x86_64 does generate the error.

Fixed like so, applied as obvious, thanks for the heads up.


* gcc.dg/sso-9.c (foo): Robustify trick.

-- 
Eric BotcazouIndex: gcc.dg/sso-9.c
===
--- gcc.dg/sso-9.c	(revision 231394)
+++ gcc.dg/sso-9.c	(working copy)
@@ -22,6 +22,6 @@ void foo (int i, ...)
 {
   struct Rec a;
   va_start (a.v, i);
-  a.v = a.v, x = va_arg (a.v, int); /* { dg-error "array type|reverse storage order" } */
+  a.v = 0, x = va_arg (a.v, int); /* { dg-error "type|reverse storage order" } */
   va_end (a.v);
 }

Re: -fstrict-aliasing fixes 5/6: make type system independent of flag_strict_aliasing

> Usually cycles happen through structure members and it might be that
> all other frontends have the pointed-to type incomplete.  But the
> above recursion shouldn't apply for the structure case.

All types are equal in Ada and can be forward declared; the language specifies 
that their "elaboration" can be delayed until a "freeze" point (in particular, 
you cannot declare an object of the type until after it), from which all the 
incomplete references must be resolved to the final type.

> Not sure how your other examples look like.

Pure pointer cycles of any length.

-- 
Eric Botcazou

Re: [PATCH,RFC] Introduce RUN_UNDER_VALGRIND in test-suite

2015-12-08 Thread Jeff Law


On 12/08/2015 08:33 AM, Bernd Schmidt wrote:

On 12/08/2015 04:28 PM, Martin Liška wrote:


Majority of them (~2600 BTs) are in fortran FE (BT contains 'gfc_'): [2].
The rest contains some issues in CP FE, many GGC invalid read/write
operations ([4]) and many
memory leaks in gcc.c (for instance option handling).

My question is if a bug should be created for all fortran issues and
whether it's realistic that
they can be eventually fixed in next stage1?


It hardly seems worthwhile to file a bug for every issue, that's just
unnecessary bureaucracy. Fix them as you go along would be my
recommendation, that probably takes a similar amount of time.

Agreed, let's just fix them as we go along.

Suppressions should be a last resort in those cases where the code is 
operating correctly.  It does happen here and there.


jeff

[PATCH] Fix phiopt ICE in Factor conversion in COND_EXPR (PR tree-optimization/66949)

2015-12-08 Thread Marek Polacek

The following is a conservative fix for this PR.  This is an ICE transpiring
in the new "Factor conversion in COND_EXPR" optimization added in r225722.

Before this optimization kicks in, we have
  :
  ...
  p1_32 = (short unsigned int) _20;

  :
  ...
  iftmp.0_18 = (short unsigned int) _20;

  :
  ...
  # iftmp.0_19 = PHI 

after factor_out_conditional_conversion does its work, we end up with those two
def stmts removed and instead of the PHI we'll have

  # _35 = PHI <_20(3), _20(2)>
  iftmp.0_19 = (short unsigned int) _35;

That itself looks like a fine optimization, but after 
factor_out_conditional_conversion
there's
 320   phis = phi_nodes (bb2);
 321   phi = single_non_singleton_phi_for_edges (phis, e1, e2);
 322   gcc_assert (phi);
and phis look like
  b.2_38 = PHI 
  _35 = PHI <_20(3), _20(2)>
so single_non_singleton_phi_for_edges returns NULL and the subsequent assert 
triggers.

With this patch we won't ICE (and PRE should clean this up anyway), but I don't 
know,
maybe I should try harder to optimize even this problematical case (not sure 
how hard
it would be...)?

pr66949-2.c only ICEd on powerpc64le and I have verified that this patch fixes 
it too.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-12-08  Marek Polacek  

PR tree-optimization/66949
* tree-ssa-phiopt.c (factor_out_conditional_conversion): Return false if
NEW_ARG0 and NEW_ARG1 are equal.

* gcc.dg/torture/pr66949-1.c: New test.
* gcc.dg/torture/pr66949-2.c: New test.

diff --git gcc/testsuite/gcc.dg/torture/pr66949-1.c 
gcc/testsuite/gcc.dg/torture/pr66949-1.c
index e69de29..1b765bc 100644
--- gcc/testsuite/gcc.dg/torture/pr66949-1.c
+++ gcc/testsuite/gcc.dg/torture/pr66949-1.c
@@ -0,0 +1,28 @@
+/* PR tree-optimization/66949 */
+/* { dg-do compile } */
+
+int a, *b = , c;
+
+unsigned short
+fn1 (unsigned short p1, unsigned int p2)
+{
+  return p2 > 1 || p1 >> p2 ? p1 : p1 << p2;
+}
+
+void
+fn2 ()
+{
+  int *d = 
+  for (a = 0; a < -1; a = 1)
+;
+  if (a < 0)
+c = 0;
+  *b = fn1 (*d || c, *d);
+}
+
+int
+main ()
+{
+  fn2 ();
+  return 0;
+}
diff --git gcc/testsuite/gcc.dg/torture/pr66949-2.c 
gcc/testsuite/gcc.dg/torture/pr66949-2.c
index e69de29..e6250a3 100644
--- gcc/testsuite/gcc.dg/torture/pr66949-2.c
+++ gcc/testsuite/gcc.dg/torture/pr66949-2.c
@@ -0,0 +1,23 @@
+/* PR tree-optimization/66949 */
+/* { dg-do compile } */
+
+char a;
+int b, c, d;
+extern int fn2 (void);
+
+short
+fn1 (short p1, short p2)
+{
+  return p2 == 0 ? p1 : p1 / p2;
+}
+
+int
+main (void)
+{
+  char e = 1;
+  int f = 7;
+  c = a >> f;
+  b = fn1 (c, 0 < d <= e && fn2 ());
+
+  return 0;
+}
diff --git gcc/tree-ssa-phiopt.c gcc/tree-ssa-phiopt.c
index 344cd2f..caac5d5 100644
--- gcc/tree-ssa-phiopt.c
+++ gcc/tree-ssa-phiopt.c
@@ -477,6 +477,11 @@ factor_out_conditional_conversion (edge e0, edge e1, gphi 
*phi,
return false;
 }
 
+  /* If we were to continue, we'd create a PHI with same arguments for edges
+ E0 and E1.  That could get us in trouble later, so punt.  */
+  if (operand_equal_for_phi_arg_p (new_arg0, new_arg1))
+return false;
+
   /*  If arg0/arg1 have > 1 use, then this transformation actually increases
   the number of expressions evaluated at runtime.  */
   if (!has_single_use (arg0)

Marek

Re: Ping [PATCH] c++/42121 - diagnose invalid flexible array members

2015-12-08 Thread Martin Sebor


Thanks for the review and the helpful hints!

I've reworked and simplified the diagnostic part of the patch and
corrected the remaining issues I uncovered while testing the new
version (failing to reject some invalid flexible array members in
base classes).  Please find the new version in the attachment.

FWIW, in the patch, I tried to address only the reported problems
with flexible array members without changing how zero-length arrays
are treated.  That means that the latter are accepted in more cases
(with a pedantic warning) than the latter.  For example, the
following is accepted:

struct A {
int a[0];   // pedantic warning: zero-size array member
int n;  // not at end of struct A
};

while this is rejected

struct B {
int a[];// hard error: flexible array member not at
int n;  // end of struct B
};

It would be easy to change the patch to treat zero-length arrays
more like flexible array members if that's viewed as desirable.

I also tried to avoid rejecting flexible array members that are
currently accepted and that are safe.  For example, GCC currently
accepts both flexible array members and zero-length arrays in base
classes (even polymorphic ones).  The patch continues to accept
those for compatibility with code that relies on it as long as
the flexible array members didn't overlap other members in derived
classes.  For example, this is still accepted:

struct A { int x; };
struct B { int n, a[]; };
struct C: A, B { };

but this is rejected:

struct D: B, A { };

My replies to your comments are below.

On 12/04/2015 10:51 AM, Jason Merrill wrote:

On 12/03/2015 11:42 PM, Martin Sebor wrote:

+  if (next && TREE_CODE (next) == FIELD_DECL)


This will break if there's a non-field between the array and the next
field.


You're right, I missed that case in my testing.  Fixed.


@@ -4114,7 +4115,10 @@ walk_subobject_offsets (tree type,

   /* Avoid recursing into objects that are not interesting.  */
   if (!CLASS_TYPE_P (element_type)
-  || !CLASSTYPE_CONTAINS_EMPTY_CLASS_P (element_type))
+  || !CLASSTYPE_CONTAINS_EMPTY_CLASS_P (element_type)
+  || !domain
+  /* Flexible array members have no upper bound.  */
+  || !TYPE_MAX_VALUE (domain))


Why is this desirable?  We do want to avoid empty bases at the same
address as a flexible array of the same type.


As we discussed on IRC, this bit is fine.  I added a few tests for
the layout to make sure the offset of flexible array members matches
the size of the containing class.  While adding these tests I found
a couple of regressions unrelated to my changes (68727 and 68711) so
it was time well spent.


+  && !tree_int_cst_equal (size_max_node, TYPE_MAX_VALUE (dom)))


This can be integer_minus_onep or integer_all_onesp.


Thanks.




+ its fields.  The recursive call to the function will
+ either return 0 or the flexible array member whose


Let's say NULL_TREE here rather than 0.


Sure.




+  {
+bool dummy = false;
+check_flexarrays (t, TYPE_FIELDS (t), );
+  }


This should be called from check_bases_and_members, or even integrated
into check_field_decls.


I tried moving it to check_bases_and_members there but with more
testing found out that calling it there was too early. In order
to detect invalid flexible array members in virtual base classes
without rejecting valid ones, the primary base class needs to
have been determined.  That happens in in layout_class_type()
called later on in finish_struct_1(). So I moved it just past
that call.




-  else if (name)
-pedwarn (input_location, OPT_Wpedantic, "ISO C++ forbids
zero-size array %qD", name);


Why?


At one point, the diagnostic would emit a badly messed up name
in some corner cases.  I think it might have been when I set
TYPE_DOMAIN to NULL_TREE rather than with the current approach
(I can't reproduce it anymore). I've restored the else block.


Can we leave TYPE_DOMAIN null for flexible arrays so you don't need to
add special new handling all over the place?


This was my initial approach until I noticed that it diverged
from C where TYPE_DOMAIN is set to the range [0, NULL_TREE], so
I redid it for consistency.




-tree decl;
+tree decl = NULL_TREE;


Why?


To avoid an ICE later on.  I didn't spend too much time trying
to understand how the control flow changed to trigger it but my
guess is that it has to do with the change to the upper bound.

/home/msebor/scm/fsf/gcc-42121/gcc/testsuite/g++.dg/ext/flexary2.C:16:9: 
internal compiler error: tree check: expected tree that contains 'typed' 
structure, have '' in cp_apply_type_quals_to_decl, at 
cp/typeck.c:9134
0x13103fe tree_contains_struct_check_failed(tree_node const*, 
tree_node_structure_enum, char const*, int, char const*)

 /home/msebor/scm/fsf/gcc-42121/gcc/tree.c:9760
0x726b12 contains_struct_check(tree_node*, tree_node_structure_enum, 
char const*, int, char const*)

Re: [PATCH 1/2] PR c/68757: fix uninitialied src_range for various builtins


On 12/08/2015 04:43 PM, David Malcolm wrote:

This fixes various uninitialized src_range of c_expr, this time
in the various builtins that are parsed via c_parser_get_builtin_args.

Bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?


I think both of these patches are OK. Some questions though.


" a constant");
constant_expression_warning (c);
expr = integer_zerop (c) ? *e3_p : *e2_p;
+   set_c_expr_source_range (, loc, close_paren_loc);


If that had an uninitialized range, it implies that the *eN_p 
expressions also have uninitialized parts. Correct? If so, I think we 
should fix that.


Also, what happened to the idea of a constructor for c_expr_t?


Bernd

Re: [PATCH 1/2] PR c/68757: fix uninitialied src_range for various builtins

2015-12-08 Thread David Malcolm

On Tue, 2015-12-08 at 16:38 +0100, Bernd Schmidt wrote:
> On 12/08/2015 04:43 PM, David Malcolm wrote:
> > This fixes various uninitialized src_range of c_expr, this time
> > in the various builtins that are parsed via c_parser_get_builtin_args.
> >
> > Bootstrapped on x86_64-pc-linux-gnu.
> >
> > OK for trunk?
> 
> I think both of these patches are OK. Some questions though.
> 
> > " a constant");
> > constant_expression_warning (c);
> > expr = integer_zerop (c) ? *e3_p : *e2_p;
> > +   set_c_expr_source_range (, loc, close_paren_loc);
> 
> If that had an uninitialized range, it implies that the *eN_p 
> expressions also have uninitialized parts. Correct? If so, I think we 
> should fix that.

Hmmm... re-reading the patch, I think my description was sloppy; sorry.

I believe uninitialized data affected RID_BUILTIN_COMPLEX and
RID_BUILTIN_SHUFFLE, whereas RID_CHOOSE_EXPR and
RID_BUILTIN_CALL_WITH_STATIC_CHAIN copied src_range data from
subexpressions.

The fix for the uninitialized data made me think "what should the source
range of RID_BUILTIN_COMPLEX be?"  Hence I noticed that each of the
places where c_parser_get_builtin_args was used didn't have an explicit
choice about src_range, and that it ought to use the builtin args in its
range, and so I updated these.

Within RID_CHOOSE_EXPR and RID_BUILTIN_CALL_WITH_STATIC_CHAIN, *eN_P
come from c_parser_get_builtin_args, and reviewing that function, it
purely gets them by copying them from the return value of
c_parser_expr_no_commas, so there's no uninitialized data introduced
there.  [There could be an argument that for RID_CHOOSE_EXPR we should
use the range of the chosen expression, which was the status quo for
that case; the patch makes it use the range of the full
__builtin_choose_expr () for consistency with other the builtins].

(I believe the proposed ChangeLog does accurately reflect the change;
just the title may need tweaking).

> Also, what happened to the idea of a constructor for c_expr_t?

I actually implemented something like this when implementing these two
patches.  

Work-in-progress patch attached, which introduces an INVALID_LOCATION
value for source_location/location_t, and uses it to "poison" the
initial value of c_expr's src_range, with lots of assertions to verify
that it's been overwritten by time we use it.

It doesn't fully work yet, but much of gcc.dg survives with this (and
it's what I used to detect the alignof issue in patch 2).  Does this
look like something I should pursue?
>From cf8aa843a7bb5f173eb8107b1c721e964b9e0677 Mon Sep 17 00:00:00 2001
From: David Malcolm 
Date: Mon, 7 Dec 2015 13:47:03 -0500
Subject: [PATCH] WIP: add INVALID_LOCATION and use it to poison src_range

---
 gcc/c/c-parser.c  |  8 +++-
 gcc/c/c-tree.h| 26 ++
 gcc/c/c-typeck.c  |  2 ++
 gcc/tree.c|  6 ++
 libcpp/include/line-map.h | 10 +-
 libcpp/line-map.c |  7 +++
 6 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 4611e5b..cded9fe 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -63,6 +63,8 @@ void
 set_c_expr_source_range (c_expr *expr,
 			 location_t start, location_t finish)
 {
+  gcc_assert (start != INVALID_LOCATION);
+  gcc_assert (finish != INVALID_LOCATION);
   expr->src_range.m_start = start;
   expr->src_range.m_finish = finish;
   if (expr->value)
@@ -73,6 +75,8 @@ void
 set_c_expr_source_range (c_expr *expr,
 			 source_range src_range)
 {
+  gcc_assert (src_range.m_start != INVALID_LOCATION);
+  gcc_assert (src_range.m_finish != INVALID_LOCATION);
   expr->src_range = src_range;
   if (expr->value)
 set_source_range (expr->value, src_range);
@@ -6145,6 +6149,8 @@ c_parser_expr_no_commas (c_parser *parser, struct c_expr *after,
   c_parser_consume_token (parser);
   exp_location = c_parser_peek_token (parser)->location;
   rhs = c_parser_expr_no_commas (parser, NULL);
+  gcc_assert (rhs.src_range.m_start != INVALID_LOCATION);
+  gcc_assert (rhs.src_range.m_finish != INVALID_LOCATION);
   rhs = convert_lvalue_to_rvalue (exp_location, rhs, true, true);

   ret.value = build_modify_expr (op_location, lhs.value, lhs.original_type,
@@ -6917,7 +6923,6 @@ c_parser_alignof_expression (c_parser *parser)
 }
   else
 {
-  struct c_expr ret;
   expr = c_parser_unary_expression (parser);
   end_loc = expr.src_range.m_finish;
 alignof_expr:
@@ -6927,6 +6932,7 @@ c_parser_alignof_expression (c_parser *parser)
   pedwarn (start_loc,
 	   OPT_Wpedantic, "ISO C does not allow %<%E (expression)%>",
 	   alignof_spelling);
+  struct c_expr ret;
   ret.value = c_alignof_expr (start_loc, expr.value);
   ret.original_code = ERROR_MARK;
   ret.original_type = NULL;
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 00e72b1..30625ee 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@

[PATCH, testsuite] Fix sse4_1-round* inline asm statements

2015-12-08 Thread Bernd Edlinger


Hi,

this patch fixes the asm statements in the gcc.target/i386/sse4_1-round* test 
cases.

They do lots of things that are just absolutely forbidden, like clobber 
registers
that are not mentioned in the clobber list, and create a hidden data flow.

The test cases work just by chance, and You can see the asm statements
ripped completely apart by the loop optimizer if you try to do the assembler
part in a loop:

  for (i = 0; i < 10; i++) {
  __asm__ ("fld" ASM_SUFFIX " %0" : : "m" (*));

  __asm__ ("fstcw %0" : "=m" (*_cw));
  new_cw = saved_cw & clr_mask;
  new_cw |= type;
  __asm__ ("fldcw %0" : : "m" (*_cw));

  __asm__ ("frndint\n"
   "fstp" ASM_SUFFIX " %0\n" : "=m" (*));
  __asm__ ("fldcw %0" : : "m" (*_cw));
  }
  return ret;

So this patch avoids the hidden data flow, and
adds "st" to the clobber list.

Boot-strapped and reg-tested on x86_64-pc-linux-gnu
OK for trunk?


Thanks
Bernd.2015-12-08  Bernd Edlinger  

	* gcc.target/i386/sse4_1-round.h: Fix inline asm statements.
	* gcc.target/i386/sse4_1-roundsd-4.c: Fix inline asm statements.
	* gcc.target/i386/sse4_1-roundss-4.c: Fix inline asm statements.

Index: gcc/testsuite/gcc.target/i386/sse4_1-round.h
===
--- gcc/testsuite/gcc.target/i386/sse4_1-round.h	(revision 231343)
+++ gcc/testsuite/gcc.target/i386/sse4_1-round.h	(working copy)
@@ -42,16 +42,16 @@ do_round (FP_T f, int type)
   clr_mask = ~0x0C3F;
 }
 
-  __asm__ ("fld" ASM_SUFFIX " %0" : : "m" (*));
-
   __asm__ ("fstcw %0" : "=m" (*_cw));
   new_cw = saved_cw & clr_mask;
   new_cw |= type;
-  __asm__ ("fldcw %0" : : "m" (*_cw));
-
-  __asm__ ("frndint\n"
-	   "fstp" ASM_SUFFIX " %0\n" : "=m" (*));
-  __asm__ ("fldcw %0" : : "m" (*_cw));
+  __asm__ ("fld" ASM_SUFFIX " %1\n"
+	   "fldcw %2\n"
+	   "frndint\n"
+	   "fstp" ASM_SUFFIX " %0\n"
+	   "fldcw %3" : "=m" (*)
+		  : "m" (*), "m" (*_cw), "m" (*_cw)
+		  : "st");
   return ret;
 }
 
Index: gcc/testsuite/gcc.target/i386/sse4_1-roundsd-4.c
===
--- gcc/testsuite/gcc.target/i386/sse4_1-roundsd-4.c	(revision 231343)
+++ gcc/testsuite/gcc.target/i386/sse4_1-roundsd-4.c	(working copy)
@@ -50,16 +50,16 @@ do_round (double f, int type)
   clr_mask = ~0x0C3F;
 }
 
-  __asm__ ("fldl %0" : : "m" (*));
-
   __asm__ ("fstcw %0" : "=m" (*_cw));
   new_cw = saved_cw & clr_mask;
   new_cw |= type;
-  __asm__ ("fldcw %0" : : "m" (*_cw));
-
-  __asm__ ("frndint\n"
-	   "fstpl %0\n" : "=m" (*));
-  __asm__ ("fldcw %0" : : "m" (*_cw));
+  __asm__ ("fldl %1\n"
+	   "fldcw %2\n"
+	   "frndint\n"
+	   "fstpl %0\n"
+	   "fldcw %3" : "=m" (*)
+		  : "m" (*), "m" (*_cw), "m" (*_cw)
+		  : "st");
   return ret;
 }
 
Index: gcc/testsuite/gcc.target/i386/sse4_1-roundss-4.c
===
--- gcc/testsuite/gcc.target/i386/sse4_1-roundss-4.c	(revision 231343)
+++ gcc/testsuite/gcc.target/i386/sse4_1-roundss-4.c	(working copy)
@@ -50,16 +50,16 @@ do_round (float f, int type)
   clr_mask = ~0x0C3F;
 }
 
-  __asm__ ("flds %0" : : "m" (*));
-
   __asm__ ("fstcw %0" : "=m" (*_cw));
   new_cw = saved_cw & clr_mask;
   new_cw |= type;
-  __asm__ ("fldcw %0" : : "m" (*_cw));
-
-  __asm__ ("frndint\n"
-	   "fstps %0\n" : "=m" (*));
-  __asm__ ("fldcw %0" : : "m" (*_cw));
+  __asm__ ("flds %1\n"
+	   "fldcw %2\n"
+	   "frndint\n"
+	   "fstps %0\n"
+	   "fldcw %3" : "=m" (*)
+		  : "m" (*), "m" (*_cw), "m" (*_cw)
+		  : "st");
   return ret;
 }

Re: [PATCH] Convert SPARC to LRA

2015-12-08 Thread David Miller

From: Sebastian Huber 
Date: Tue, 8 Dec 2015 11:17:53 +0100

> since the LRA patch is still reverted on the trunk, I guess the
> switch to LRA will not happen in GCC 6?

Indeed, it is unlikely I will have time to work on this for at
least a month.

Re: [PATCH,RFC] Introduce RUN_UNDER_VALGRIND in test-suite

On 12/03/2015 03:15 PM, Bernd Schmidt wrote:
> On 11/23/2015 10:34 AM, Martin Liška wrote:
>> On 11/21/2015 05:26 AM, Hans-Peter Nilsson wrote:
>>> IIRC you can replace the actual dg-runtest proc with your own
>>> (implementing a wrapper).  Grep aroung, I think we do that
>>> already.  That's certainly preferable instead of touching all
>>> callers.
>>
>> You are right, the suggested patch was over-kill, wrapper should be fine for 
>> that.
>> Currently I've been playing with a bit different approach (suggested by 
>> Markus),
>> where I would like to enable valgrind in gcc.c using an environmental 
>> variable.
>>
>> Question is if it should replace existing ENABLE_VALGRIND_CHECKING and how to
>> integrate it with a valgrind suppressions file?
> 
> This patch still seems to be in the queue. I've been looking at it every now 
> and then, without really forming an opinion. In any case, I think we'll need 
> to postpone this to stage1 at this point.
> 
> Wouldn't it be better to fix issues first and only then enable running the 
> testsuite with valgrind, rather than make a suppression file?
> 
> Your latest patch seems to add the option of running the compiler without 
> ENABLE_CHECKING_VALGRIND being defined. Doesn't this run into problems when 
> the support in ggc isn't compiled in?
> 
> 
> Bernd

Hi.

Right, the patch is in queue and can wait for next stage1. I must agree with 
Hans-Peter Nilsson that we should
mainly focus on removal of memory leaks (and other invalid operations) rather 
that maintaining a list of suppressions.
After that, integration with existing configure machine should be easily 
doable, I guess.

I've just run the test-suite (with default languages) and report file was 
post-processed with my script [1] that
groups same back-traces together.

Currently we have ~20 errors, in ~4000 different back-traces.

Majority of them (~2600 BTs) are in fortran FE (BT contains 'gfc_'): [2].
The rest contains some issues in CP FE, many GGC invalid read/write operations 
([4]) and many
memory leaks in gcc.c (for instance option handling).

My question is if a bug should be created for all fortran issues and whether 
it's realistic that
they can be eventually fixed in next stage1?

Thanks,
Martin

[1] https://github.com/marxin/script-misc/blob/master/valgrind-grep.py
[2] 
https://drive.google.com/file/d/0B0pisUJ80pO1ZjdCVlZoeGZQNjg/view?usp=sharing
[3] 
https://drive.google.com/file/d/0B0pisUJ80pO1aFZTWk5sVTBlcHc/view?usp=sharing
[4] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68758

Re: [C] Issue an error on scalar va_list with reverse storage order

2015-12-08 Thread Matthew Wahab


Hello

On 03/12/15 14:53, Eric Botcazou wrote:

further testing revealed an issue with va_arg handling and reverse scalar 
storage
order on some platforms: when va_list is scalar, passing a field of a structure
with reverse SSO as first argument to va_start/va_arg/va_end doesn't work 
because
the machinery takes its address and this is not allowed for such a field (it's
really a corner case but gcc.c-torture/execute/stdarg-2.c does exercise it). 
Hence
the attached patch which issues an error in this case.


The new gcc.dg/sso-9.c test is failing for aarch64 and arm targets. There's no 
error
generated if I compile the test from the command line for aarch64-none-elf. GCC 
for
x86_64 does generate the error.

Matthew


2015-12-03  Eric Botcazou  

* c-tree.h (c_build_va_arg): Adjust prototype. * c-parser.c
(c_parser_postfix_expression): Adjust call to above. * c-typeck.c
(c_build_va_arg): Rename LOC parameter to LOC2, add LOC1 parameter, adjust
throughout and issue an error if EXPR is a component with reverse storage order.


2015-12-03  Eric Botcazou  

* gcc.dg/sso-9.c: New test.

Re: [patch] Fix PR middle-end/68291 & 68292


On 12/08/2015 11:50 AM, Eric Botcazou wrote:


I'm going to test it on x86-64, SPARC64 and Aarch64.


PR middle-end/68291
PR middle-end/68292
* cfgexpand.c (set_rtl): Always accept mode mismatch for SSA names
with BLKmode promoted mode based on RESULT_DECLs.


I think that is ok if the testing passes.


Bernd

[DOC, PATCH] Mention --enable-valgrind-annotations in install.texi

Hello.

I would like to add a missing configure option.

Thanks,
Martin
>From f828b34177908aebb1efab194f749bbdac0b1dbb Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 8 Dec 2015 16:54:43 +0100
Subject: [PATCH] Mention --enable-valgrind-annotations in install.texi

gcc/ChangeLog:

2015-12-08  Martin Liska  

	* doc/install.texi (--enable-valgrind-annotations): Mention
	the configure option in configure page.
---
 gcc/doc/install.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 0b71bef..6cb2079 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1584,6 +1584,11 @@ do a @samp{make -C gcc gnatlib_and_tools}.
 Specify that the run-time libraries for the various sanitizers should
 not be built.
 
+@item --enable-valgrind-annotations
+Specify that the compiler should interact with valgrind runtime, where
+selected invalid memory reads are marked as false positives and
+garbage collected memory is properly marked for proper interaction.
+
 @item --disable-libssp
 Specify that the run-time libraries for stack smashing protection
 should not be built.
-- 
2.6.3

[PATCH] Fix PR ipa/68790

Hi.

I'm sending patch for the PR, unfortunately I've back-ported Honza's patch
that introduced the regression to GCC 5.3.

Fix is obvious, I've been running regression tests for both
trunk and gcc-5-branch. Identical patch can be applied (modulo a line with 
'cleanup-ipa-dump').

Ready after it finishes?

Thanks,
Martin
>From 9f547d06489ffd85b20b09067b4cfbca2c0ff70e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 8 Dec 2015 18:46:44 +0100
Subject: [PATCH] Fix PR ipa/68790

gcc/ChangeLog:

2015-12-08  Martin Liska  

	PR ipa/68790
	* ipa-icf.c (sem_function::param_used_p): Return true
	if ipa_node_params_sum equals to NULL.

gcc/testsuite/ChangeLog:

2015-12-08  Martin Liska  

	* gcc.dg/ipa/pr68790.c: New test.
---
 gcc/ipa-icf.c  |  2 +-
 gcc/testsuite/gcc.dg/ipa/pr68790.c | 40 ++
 2 files changed, 41 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr68790.c

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 0029a48..889a07d 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -544,7 +544,7 @@ bool
 sem_function::param_used_p (unsigned int i)
 {
   if (ipa_node_params_sum == NULL)
-return false;
+return true;
 
   struct ipa_node_params *parms_info = IPA_NODE_REF (get_node ());
 
diff --git a/gcc/testsuite/gcc.dg/ipa/pr68790.c b/gcc/testsuite/gcc.dg/ipa/pr68790.c
new file mode 100644
index 000..258c004
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr68790.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-options "-O0 -fipa-icf -fdump-ipa-icf"  } */
+
+struct S
+{
+  int a;
+};
+
+int
+foo3 (struct S x, struct S y, struct S z)
+{
+  if (z.a != 9)
+__builtin_abort ();
+  return 0;
+}
+
+int
+bar3 (struct S x, struct S y, struct S z)
+{
+  return foo3 (y, x, z);
+}
+
+int
+baz3 (struct S x, struct S y, struct S z)
+{
+  return foo3 (y, z, x);
+}
+
+int
+main (void)
+{
+  struct S
+a = { 3 },
+b = { 6 },
+c = { 9 };
+
+  return bar3 (b, a, c);
+}
+
+/* { dg-final { scan-ipa-dump "Equal symbols: 0" "icf"  } } */
-- 
2.6.3

Re: [PATCH, testsuite] Fix sse4_1-round* inline asm statements

Hello!

> this patch fixes the asm statements in the gcc.target/i386/sse4_1-round* test 
> cases.
>
> They do lots of things that are just absolutely forbidden, like clobber 
> registers
> that are not mentioned in the clobber list, and create a hidden data flow.
>
> The test cases work just by chance, and You can see the asm statements
> ripped completely apart by the loop optimizer if you try to do the assembler
> part in a loop:
>
>   for (i = 0; i < 10; i++) {
>   __asm__ ("fld" ASM_SUFFIX " %0" : : "m" (*));
>
>   __asm__ ("fstcw %0" : "=m" (*_cw));
>   new_cw = saved_cw & clr_mask;
>   new_cw |= type;
>   __asm__ ("fldcw %0" : : "m" (*_cw));
>
>   __asm__ ("frndint\n"
>"fstp" ASM_SUFFIX " %0\n" : "=m" (*));
>   __asm__ ("fldcw %0" : : "m" (*_cw));
>   }
>   return ret;
>
> So this patch avoids the hidden data flow, and
> adds "st" to the clobber list.
>
> Boot-strapped and reg-tested on x86_64-pc-linux-gnu
> OK for trunk?

Uh, no.

x87 is a strange beast, and even your patch is wrong. The assembly
should be written in this way:

  __asm__ ("fnstcw %0" : "=m" (saved_cw));

  new_cw = saved_cw & clr_mask;
  new_cw |= type;

  __asm__ ("fldcw %2\n\t"
   "frndint\n\t"
   "fldcw %3" : "=t" (ret)
  : "0" (f), "m" (new_cw), "m" (saved_cw));

I'm testing the attached patch.

Uros.
Index: gcc.target/i386/sse4_1-roundsd-4.c
===
--- gcc.target/i386/sse4_1-roundsd-4.c  (revision 231413)
+++ gcc.target/i386/sse4_1-roundsd-4.c  (working copy)
@@ -36,7 +36,7 @@ init_round (double *src)
 static double
 do_round (double f, int type)
 {
-  short saved_cw, new_cw, clr_mask;
+  unsigned short saved_cw, new_cw, clr_mask;
   double ret;
 
   if ((type & 4))
@@ -50,16 +50,15 @@ do_round (double f, int type)
   clr_mask = ~0x0C3F;
 }
 
-  __asm__ ("fldl %0" : : "m" (*));
+  __asm__ ("fstcw %0" : "=m" (saved_cw));
 
-  __asm__ ("fstcw %0" : "=m" (*_cw));
   new_cw = saved_cw & clr_mask;
   new_cw |= type;
-  __asm__ ("fldcw %0" : : "m" (*_cw));
 
-  __asm__ ("frndint\n"
-  "fstpl %0\n" : "=m" (*));
-  __asm__ ("fldcw %0" : : "m" (*_cw));
+  __asm__ ("fldcw %2\n\t"
+  "frndint\n\t"
+  "fldcw %3" : "=t" (ret)
+ : "0" (f), "m" (new_cw), "m" (saved_cw));
   return ret;
 }
 
Index: gcc.target/i386/sse4_1-roundss-4.c
===
--- gcc.target/i386/sse4_1-roundss-4.c  (revision 231413)
+++ gcc.target/i386/sse4_1-roundss-4.c  (working copy)
@@ -36,7 +36,7 @@ init_round (float *src)
 static float
 do_round (float f, int type)
 {
-  short saved_cw, new_cw, clr_mask;
+  unsigned short saved_cw, new_cw, clr_mask;
   float ret;
 
   if ((type & 4))
@@ -50,16 +50,15 @@ do_round (float f, int type)
   clr_mask = ~0x0C3F;
 }
 
-  __asm__ ("flds %0" : : "m" (*));
+  __asm__ ("fstcw %0" : "=m" (saved_cw));
 
-  __asm__ ("fstcw %0" : "=m" (*_cw));
   new_cw = saved_cw & clr_mask;
   new_cw |= type;
-  __asm__ ("fldcw %0" : : "m" (*_cw));
 
-  __asm__ ("frndint\n"
-  "fstps %0\n" : "=m" (*));
-  __asm__ ("fldcw %0" : : "m" (*_cw));
+  __asm__ ("fldcw %2\n\t"
+  "frndint\n\t"
+  "fldcw %3" : "=t" (ret)
+ : "0" (f), "m" (new_cw), "m" (saved_cw));
   return ret;
 }

Re: [PATCH] Fix warnings from including fdl.texi into gnat-style.texi

2015-12-08 Thread Gerald Pfeifer

Hi Tom,

On Tue, 8 Dec 2015, Tom de Vries wrote:
>>> Can you approve the fdl part?
>> Let's assume I can.  Okay.
> was the 'Okay' above:
> - a figure of speech (as I read it), or
> - an actual approval (conditional on the adding of the comment)
> ?

I should have written this as "Let's assume I can: Okay." or
better "Let's assume I can. -> Okay."

Yes, please consider this approved.

Sorry if you have been waiting due to this!

Gerald

Re: [PATCH, testsuite] Fix sse4_1-round* inline asm statements

On Tue, Dec 8, 2015 at 7:10 PM, Uros Bizjak  wrote:
> Hello!
>
>> this patch fixes the asm statements in the gcc.target/i386/sse4_1-round* 
>> test cases.
>>
>> They do lots of things that are just absolutely forbidden, like clobber 
>> registers
>> that are not mentioned in the clobber list, and create a hidden data flow.
>>
>> The test cases work just by chance, and You can see the asm statements
>> ripped completely apart by the loop optimizer if you try to do the assembler
>> part in a loop:
>>
>>   for (i = 0; i < 10; i++) {
>>   __asm__ ("fld" ASM_SUFFIX " %0" : : "m" (*));
>>
>>   __asm__ ("fstcw %0" : "=m" (*_cw));
>>   new_cw = saved_cw & clr_mask;
>>   new_cw |= type;
>>   __asm__ ("fldcw %0" : : "m" (*_cw));
>>
>>   __asm__ ("frndint\n"
>>"fstp" ASM_SUFFIX " %0\n" : "=m" (*));
>>   __asm__ ("fldcw %0" : : "m" (*_cw));
>>   }
>>   return ret;
>>
>> So this patch avoids the hidden data flow, and
>> adds "st" to the clobber list.
>>
>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu
>> OK for trunk?
>
> Uh, no.
>
> x87 is a strange beast, and even your patch is wrong. The assembly
> should be written in this way:
>
>   __asm__ ("fnstcw %0" : "=m" (saved_cw));
>
>   new_cw = saved_cw & clr_mask;
>   new_cw |= type;
>
>   __asm__ ("fldcw %2\n\t"
>"frndint\n\t"
>"fldcw %3" : "=t" (ret)
>   : "0" (f), "m" (new_cw), "m" (saved_cw));
>
> I'm testing the attached patch.

Committed slightly updated patch to mainline SVN with the following ChangeLog:

2015-12-08  Uros Bizjak  

* gcc.target/i386/sse4_1-round.h (do_round): Fix inline asm statements.
* gcc.target/i386/sse4_1-roundsd-4.c (do_round): Ditto.
* gcc.target/i386/sse4_1-roundss-4.c (do_round): Ditto.

Tested on x86_64-linux-gnu {,-m32}, committed to mainline SVN, will be
backported to all active branches.

Uros.
Index: gcc.target/i386/sse4_1-round.h
===
--- gcc.target/i386/sse4_1-round.h  (revision 231413)
+++ gcc.target/i386/sse4_1-round.h  (working copy)
@@ -28,7 +28,7 @@ init_round (FP_T *src)
 static FP_T
 do_round (FP_T f, int type)
 {
-  short saved_cw, new_cw, clr_mask;
+  unsigned short saved_cw, new_cw, clr_mask;
   FP_T ret;
 
   if ((type & 4))
@@ -42,16 +42,15 @@ do_round (FP_T f, int type)
   clr_mask = ~0x0C3F;
 }
 
-  __asm__ ("fld" ASM_SUFFIX " %0" : : "m" (*));
+  __asm__ ("fnstcw %0" : "=m" (saved_cw));
 
-  __asm__ ("fstcw %0" : "=m" (*_cw));
   new_cw = saved_cw & clr_mask;
   new_cw |= type;
-  __asm__ ("fldcw %0" : : "m" (*_cw));
 
-  __asm__ ("frndint\n"
-  "fstp" ASM_SUFFIX " %0\n" : "=m" (*));
-  __asm__ ("fldcw %0" : : "m" (*_cw));
+  __asm__ ("fldcw %2\n\t"
+  "frndint\n\t"
+  "fldcw %3" : "=t" (ret)
+ : "0" (f), "m" (new_cw), "m" (saved_cw));
   return ret;
 }
 
Index: gcc.target/i386/sse4_1-roundsd-4.c
===
--- gcc.target/i386/sse4_1-roundsd-4.c  (revision 231413)
+++ gcc.target/i386/sse4_1-roundsd-4.c  (working copy)
@@ -36,7 +36,7 @@ init_round (double *src)
 static double
 do_round (double f, int type)
 {
-  short saved_cw, new_cw, clr_mask;
+  unsigned short saved_cw, new_cw, clr_mask;
   double ret;
 
   if ((type & 4))
@@ -50,16 +50,15 @@ do_round (double f, int type)
   clr_mask = ~0x0C3F;
 }
 
-  __asm__ ("fldl %0" : : "m" (*));
+  __asm__ ("fnstcw %0" : "=m" (saved_cw));
 
-  __asm__ ("fstcw %0" : "=m" (*_cw));
   new_cw = saved_cw & clr_mask;
   new_cw |= type;
-  __asm__ ("fldcw %0" : : "m" (*_cw));
 
-  __asm__ ("frndint\n"
-  "fstpl %0\n" : "=m" (*));
-  __asm__ ("fldcw %0" : : "m" (*_cw));
+  __asm__ ("fldcw %2\n\t"
+  "frndint\n\t"
+  "fldcw %3" : "=t" (ret)
+ : "0" (f), "m" (new_cw), "m" (saved_cw));
   return ret;
 }
 
Index: gcc.target/i386/sse4_1-roundss-4.c
===
--- gcc.target/i386/sse4_1-roundss-4.c  (revision 231413)
+++ gcc.target/i386/sse4_1-roundss-4.c  (working copy)
@@ -36,7 +36,7 @@ init_round (float *src)
 static float
 do_round (float f, int type)
 {
-  short saved_cw, new_cw, clr_mask;
+  unsigned short saved_cw, new_cw, clr_mask;
   float ret;
 
   if ((type & 4))
@@ -50,16 +50,15 @@ do_round (float f, int type)
   clr_mask = ~0x0C3F;
 }
 
-  __asm__ ("flds %0" : : "m" (*));
+  __asm__ ("fnstcw %0" : "=m" (saved_cw));
 
-  __asm__ ("fstcw %0" : "=m" (*_cw));
   new_cw = saved_cw & clr_mask;
   new_cw |= type;
-  __asm__ ("fldcw %0" : : "m" (*_cw));
 
-  __asm__ ("frndint\n"
-  "fstps %0\n" : "=m" (*));
-  __asm__ ("fldcw %0" : : "m" (*_cw));
+  __asm__ ("fldcw %2\n\t"
+  "frndint\n\t"
+  "fldcw %3" : "=t" (ret)
+ : "0" (f), "m" (new_cw), "m" (saved_cw));
   return ret;
 }

Re: [patch] Fix PR middle-end/68291 & 68292

> I think that is ok if the testing passes.

Thanks.  It did on the 3 architectures so I have applied the patch.

-- 
Eric Botcazou

[RFC] Request for comments on ivopts patch

2015-12-08 Thread Steve Ellcey

I have an ivopts optimization question/proposal.  When compiling the
attached program the ivopts pass prefers the original ivs over new ivs
and that causes us to generate less efficient code on MIPS.  It may
affect other platforms too.

The Source code is a C strcmp:

int strcmp (const char *p1, const char *p2)
{
  const unsigned char *s1 = (const unsigned char *) p1;
  const unsigned char *s2 = (const unsigned char *) p2;
  unsigned char c1, c2;
  do {
  c1 = (unsigned char) *s1++;
  c2 = (unsigned char) *s2++;
  if (c1 == '\0') return c1 - c2;
  } while (c1 == c2);
  return c1 - c2;
}

Currently the code prefers the original ivs and so it generates
code that increments s1 and s2 before doing the loads (and uses
a -1 offset):

  :
  # s1_1 = PHI 
  # s2_2 = PHI 
  s1_6 = s1_1 + 1;
  c1_8 = MEM[base: s1_6, offset: 4294967295B];
  s2_9 = s2_2 + 1;
  c2_10 = MEM[base: s2_9, offset: 4294967295B];
  if (c1_8 == 0)
goto ;
  else
goto ;

If I remove the cost increment for non-original ivs then GCC
does the loads before the increments:

 :
  # ivtmp.6_17 = PHI 
  # ivtmp.7_21 = PHI 
  _25 = (void *) ivtmp.6_17;
  c1_8 = MEM[base: _25, offset: 0B];
  _26 = (void *) ivtmp.7_21;
  c2_10 = MEM[base: _26, offset: 0B];
  if (c1_8 == 0)
goto ;
  else
goto ;
.
.
  :
  ivtmp.6_14 = ivtmp.6_17 + 1;
  ivtmp.7_23 = ivtmp.7_21 + 1;
  if (c1_8 == c2_10)
goto ;
  else
goto ;


This second case (without the preference for the original IV)
generates better code on MIPS because the final assembly 
has the increment instructions between the loads and the tests
of the values being loaded and so there is no delay (or less delay)
between the load and use.  It seems like this could easily be 
the case for other platforms too so I was wondering what people
thought of this patch:


2015-12-08  Steve Ellcey  

* tree-ssa-loop-ivopts.c (determine_iv_cost): Remove preference for
original ivs.


diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 98dc451..26daabc 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5818,14 +5818,6 @@ determine_iv_cost (struct ivopts_data *data, struct 
iv_cand *cand)
 
   cost = cost_step + adjust_setup_cost (data, cost_base.cost);
 
-  /* Prefer the original ivs unless we may gain something by replacing it.
- The reason is to make debugging simpler; so this is not relevant for
- artificial ivs created by other optimization passes.  */
-  if (cand->pos != IP_ORIGINAL
-  || !SSA_NAME_VAR (cand->var_before)
-  || DECL_ARTIFICIAL (SSA_NAME_VAR (cand->var_before)))
-cost++;
-
   /* Prefer not to insert statements into latch unless there are some
  already (so that we do not create unnecessary jumps).  */
   if (cand->pos == IP_END

[PATCH, i386]: Fix PR68701: ICE with -ffixed-ebp

Hello!

Attached patch fixes a compilation corner case, where -ffixed-ebp
option interferes with stack realignment requirements (the realignment
without -maccumulate-outgoing-args needs live %ebp).

The ICE can be worked around by adding -maccumulate-outgoing-args to
compile flags, and this is what the patch does, in addition to
emitting an informative warning.

2015-12-08  Uros Bizjak  

PR target/68701
* config/i386/i386.c (ix86_option_override_internal): Enable
-maccumulate-outgoing-args when %ebp is fixed due to stack
realignment requirements.

testsuite/ChangeLog:

2015-12-08  Uros Bizjak  

PR target/68701
* testsuite/gcc.target/i386/pr68701-1.c: New test.
* testsuite/gcc.target/i386/pr68701-2.c: Ditto.

Patch was bootstrapped and regression tested on x86_64-linux-gnu
{,-m32}. Patch was committed to mainline SVN and will be backported to
all active branches in a couple of days.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 231413)
+++ config/i386/i386.c  (working copy)
@@ -5296,6 +5296,17 @@ ix86_option_override_internal (bool main_args_p,
   opts->x_target_flags |= MASK_ACCUMULATE_OUTGOING_ARGS;
 }
 
+  /* Stack realignment without -maccumulate-outgoing-args requires %ebp,
+ so enable -maccumulate-outgoing-args when %ebp is fixed.  */
+  if (fixed_regs[BP_REG]
+  && !(opts->x_target_flags & MASK_ACCUMULATE_OUTGOING_ARGS))
+{
+  if (opts_set->x_target_flags & MASK_ACCUMULATE_OUTGOING_ARGS)
+   warning (0, "fixed ebp register requires %saccumulate-outgoing-args%s",
+prefix, suffix);
+  opts->x_target_flags |= MASK_ACCUMULATE_OUTGOING_ARGS;
+}
+
   /* Figure out what ASM_GENERATE_INTERNAL_LABEL builds as a prefix.  */
   {
 char *p;
Index: testsuite/gcc.target/i386/pr68701-1.c
===
--- testsuite/gcc.target/i386/pr68701-1.c   (nonexistent)
+++ testsuite/gcc.target/i386/pr68701-1.c   (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O -ffixed-ebp -mno-accumulate-outgoing-args" } */
+
+/* { dg-warning "fixed ebp register requires" "" { target *-*-* } 0 } */
+
+void foo (void);
+
+int
+main (int argc, char *argv[])
+{
+  foo ();
+  return argc - 1;
+}
Index: testsuite/gcc.target/i386/pr68701-2.c
===
--- testsuite/gcc.target/i386/pr68701-2.c   (nonexistent)
+++ testsuite/gcc.target/i386/pr68701-2.c   (working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O -ffixed-ebp -mno-accumulate-outgoing-args -mstackrealign 
-msse" } */
+
+/* { dg-warning "fixed ebp register requires" "" { target *-*-* } 0 } */
+
+typedef float V __attribute__((vector_size(16)));
+
+void bar (V a)
+{
+  volatile V b = a;
+}

Re: [PATCH] Fix PR ipa/68790

> Hi.
> 
> I'm sending patch for the PR, unfortunately I've back-ported Honza's patch
> that introduced the regression to GCC 5.3.
> 
> Fix is obvious, I've been running regression tests for both
> trunk and gcc-5-branch. Identical patch can be applied (modulo a line with 
> 'cleanup-ipa-dump').
> 
> Ready after it finishes?
> 
> Thanks,
> Martin

> >From 9f547d06489ffd85b20b09067b4cfbca2c0ff70e Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Tue, 8 Dec 2015 18:46:44 +0100
> Subject: [PATCH] Fix PR ipa/68790
> 
> gcc/ChangeLog:
> 
> 2015-12-08  Martin Liska  
> 
>   PR ipa/68790
>   * ipa-icf.c (sem_function::param_used_p): Return true
>   if ipa_node_params_sum equals to NULL.

OK, thanks

Honza

[PTX] remove some test skipping

2015-12-08 Thread Nathan Sidwell

These tests no longer cause the PTX assembler to seg fault, so no reason to skip 
them.


nathan
2015-12-08  Nathan Sidwell  

	* gcc.c-torture/compile/920723-1.c: Remove PTX skip. 
	* gcc.c-torture/compile/pr33855.c: Likewise.
	* gcc.c-torture/execute/981019-1.c: Remove PTX -O2 skip.

Index: gcc.c-torture/compile/920723-1.c
===
--- gcc.c-torture/compile/920723-1.c	(revision 231418)
+++ gcc.c-torture/compile/920723-1.c	(working copy)
@@ -1,4 +1,3 @@
-/* { dg-skip-if "ptxas seg faults" { nvptx-*-* } { "-O2" } { "" } } */
 
 #if defined(STACK_SIZE) && STACK_SIZE < 65536
 # define GITT_SIZE 75
Index: gcc.c-torture/compile/pr33855.c
===
--- gcc.c-torture/compile/pr33855.c	(revision 231418)
+++ gcc.c-torture/compile/pr33855.c	(working copy)
@@ -1,4 +1,3 @@
-/* { dg-skip-if "ptxas seg faults" { nvptx-*-* } { "-O1" } { "" } } */
 /* Testcase by Martin Michlmayr  */
 /* Used to segfault due to cselim not marking the complex temp var
as GIMPLE reg.  */
Index: gcc.c-torture/execute/981019-1.c
===
--- gcc.c-torture/execute/981019-1.c	(revision 231418)
+++ gcc.c-torture/execute/981019-1.c	(working copy)
@@ -1,4 +1,4 @@
-/* { dg-skip-if "ptxas seg faults" { nvptx-*-* } { "-O2" "-O3*" } { "" } } */
+/* { dg-skip-if "ptxas seg faults" { nvptx-*-* } { "-O3*" } { "" } } */
 
 extern int f2(void);
 extern int f3(void);

[PATCH] Encode alignment info in MASK_{LOAD,STORE} ifns (PR tree-optimization/68786)

2015-12-08 Thread Jakub Jelinek

Hi!

As written in the PR, with MASK_{LOAD,STORE} ifns we lose info on the
alignment of the point, unless the referenced SSA_NAME has pointer info
with alignment mask, but that is just an optimization issue rather than
requirement.

As the second argument to these builtins is always 0 (used just to hold
the pointer type for aliasing purposes), this patch uses the value of
the second arg to hold alignment info.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2015-12-08  Jakub Jelinek  

PR tree-optimization/68786
* tree-if-conv.c: Include builtins.h.
(predicate_mem_writes): Put result of get_object_alignment (ref)
into second argument's value.
* tree-vect-stmts.c (vectorizable_mask_load_store): Put minimum
pointer alignment into second argument's value.
* tree-data-ref.c (get_references_in_stmt): Use value of second
argument for build_aligned_type, and only the type to build
a zero second argument for MEM_REF.
* internal-fn.c (expand_mask_load_optab_fn,
expand_mask_store_optab_fn): Likewise.

--- gcc/tree-if-conv.c.jj   2015-11-30 13:40:38.0 +0100
+++ gcc/tree-if-conv.c  2015-12-08 17:29:18.139186787 +0100
@@ -111,6 +111,7 @@ along with GCC; see the file COPYING3.
 #include "dbgcnt.h"
 #include "tree-hash-traits.h"
 #include "varasm.h"
+#include "builtins.h"
 
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
@@ -2093,7 +2094,8 @@ predicate_mem_writes (loop_p loop)
vect_sizes.safe_push (bitsize);
vect_masks.safe_push (mask);
  }
-   ptr = build_int_cst (reference_alias_ptr_type (ref), 0);
+   ptr = build_int_cst (reference_alias_ptr_type (ref),
+get_object_alignment (ref));
/* Copy points-to info if possible.  */
if (TREE_CODE (addr) == SSA_NAME && !SSA_NAME_PTR_INFO (addr))
  copy_ref_info (build2 (MEM_REF, TREE_TYPE (ref), addr, ptr),
--- gcc/tree-vect-stmts.c.jj2015-12-02 20:26:59.0 +0100
+++ gcc/tree-vect-stmts.c   2015-12-08 17:10:47.226768560 +0100
@@ -2058,10 +2058,11 @@ vectorizable_mask_load_store (gimple *st
misalign = DR_MISALIGNMENT (dr);
  set_ptr_info_alignment (get_ptr_info (dataref_ptr), align,
  misalign);
+ tree ptr = build_int_cst (TREE_TYPE (gimple_call_arg (stmt, 1)),
+   misalign ? misalign & -misalign : align);
  new_stmt
= gimple_build_call_internal (IFN_MASK_STORE, 4, dataref_ptr,
- gimple_call_arg (stmt, 1),
- vec_mask, vec_rhs);
+ ptr, vec_mask, vec_rhs);
  vect_finish_stmt_generation (stmt, new_stmt, gsi);
  if (i == 0)
STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
@@ -2107,10 +2108,11 @@ vectorizable_mask_load_store (gimple *st
misalign = DR_MISALIGNMENT (dr);
  set_ptr_info_alignment (get_ptr_info (dataref_ptr), align,
  misalign);
+ tree ptr = build_int_cst (TREE_TYPE (gimple_call_arg (stmt, 1)),
+   misalign ? misalign & -misalign : align);
  new_stmt
= gimple_build_call_internal (IFN_MASK_LOAD, 3, dataref_ptr,
- gimple_call_arg (stmt, 1),
- vec_mask);
+ ptr, vec_mask);
  gimple_call_set_lhs (new_stmt, make_ssa_name (vec_dest));
  vect_finish_stmt_generation (stmt, new_stmt, gsi);
  if (i == 0)
--- gcc/tree-data-ref.c.jj  2015-11-09 13:39:32.0 +0100
+++ gcc/tree-data-ref.c 2015-12-08 17:38:55.199094723 +0100
@@ -3872,6 +3872,8 @@ get_references_in_stmt (gimple *stmt, ve
   else if (stmt_code == GIMPLE_CALL)
 {
   unsigned i, n;
+  tree ptr, type;
+  unsigned int align;
 
   ref.is_read = false;
   if (gimple_call_internal_p (stmt))
@@ -3882,12 +3884,16 @@ get_references_in_stmt (gimple *stmt, ve
  break;
ref.is_read = true;
  case IFN_MASK_STORE:
-   ref.ref = fold_build2 (MEM_REF,
-  ref.is_read
-  ? TREE_TYPE (gimple_call_lhs (stmt))
-  : TREE_TYPE (gimple_call_arg (stmt, 3)),
-  gimple_call_arg (stmt, 0),
-  gimple_call_arg (stmt, 1));
+   ptr = build_int_cst (TREE_TYPE (gimple_call_arg (stmt, 1)), 0);
+   align = tree_to_shwi (gimple_call_arg (stmt, 1));
+   if (ref.is_read)
+ type = TREE_TYPE (gimple_call_lhs (stmt));
+   else
+ type = TREE_TYPE

Re: [PTX] remove some test skipping

2015-12-08 Thread Alexander Monakov

On Tue, 8 Dec 2015, Nathan Sidwell wrote:

> These tests no longer cause the PTX assembler to seg fault, so no reason to
> skip them.

Shouldn't all such cases use 'dg-xfail' rather than 'dg-skip-if'?  Should be
more reasonable to still run the test and eventually get XPASS'es as ptxas
improves.

Alexander

Re: [GOOGLE] Remove overly-aggressive LIPO assert

2015-12-08 Thread Xinliang David Li

ok.

David

On Fri, Dec 4, 2015 at 11:29 AM, Teresa Johnson  wrote:
> Ping.
> Thanks, Teresa
>
> On Wed, Dec 2, 2015 at 12:46 PM, Teresa Johnson  wrote:
>> Remove an assert that was overly-strict and already partially redundant
>> with an immediately prior assert. In this case we had a hidden visibility
>> function clone that was created after the LIPO link due to indirect call
>> promotion. It is a cgraph_is_aux_decl_external node.
>>
>> Fixes failures and passes regression tests. Ok for Google branch?
>>
>> 2015-12-02  Teresa Johnson  
>>
>> Google ref b/25925223.
>> * l-ipo.c (cgraph_lipo_get_resolved_node_1): Remove overly-strict
>> assert.
>>
>> Index: l-ipo.c
>> ===
>> --- l-ipo.c (revision 231131)
>> +++ l-ipo.c (working copy)
>> @@ -1457,9 +1457,6 @@ cgraph_lipo_get_resolved_node_1 (tree decl, bool d
>>gcc_assert (DECL_EXTERNAL (decl)
>>|| cgraph_is_aux_decl_external (n)
>>|| DECL_VIRTUAL_P (decl));
>> -  gcc_assert (/* This is the case for explicit extern
>> instantiation,
>> - when cgraph node is not created before link.  
>> */
>> -  DECL_EXTERNAL (decl));
>>cgraph_link_node (n);
>>return n;
>>  }
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: -fstrict-aliasing fixes 5/6: make type system independent of flag_strict_aliasing

> > Usually cycles happen through structure members and it might be that
> > all other frontends have the pointed-to type incomplete.  But the
> > above recursion shouldn't apply for the structure case.
> 
> All types are equal in Ada and can be forward declared; the language 
> specifies 
> that their "elaboration" can be delayed until a "freeze" point (in 
> particular, 
> you cannot declare an object of the type until after it), from which all the 
> incomplete references must be resolved to the final type.
> 
> > Not sure how your other examples look like.
> 
> Pure pointer cycles of any length.

OK, the code already pre-allocates the vector to be 8 elements.  What about
simply punting when reaching this depth? I do not think real world program have
more than 8 nested pointers often enough for this to matter.
They will then get same alias set as void *.

Honza
> 
> -- 
> Eric Botcazou

Re: -fstrict-aliasing fixes 5/6: make type system independent of flag_strict_aliasing

> OK, the code already pre-allocates the vector to be 8 elements.  What about
> simply punting when reaching this depth? I do not think real world program
> have more than 8 nested pointers often enough for this to matter.
> They will then get same alias set as void *.

Fine with me, but the len<8 cases need to be dealt with as well.

-- 
Eric Botcazou

Transparent alias suport part 2 (lto-partition fixes)

Hi,
this patch fixes lto-partition WRT transparent aliases.  Normal aliases are
always put to the same partition as their target and in other partitions
they become part of the boundary (so we know that the two symbols are in fact
equivalent).

Weakrefs and transparent aliases go only into the partitions that use them.

Other issue is that while promoting a symbol to hidden you in fact also promote
all transparent aliases.

Bootstrapped/regtested x86_64-linux, will commit it shortly.
Honza
* lto-partition.c (add_symbol_to_partition_1): Transparent aliases
are not part of the definition.
(contained_in_symbol): Likewise.
(promote_symbol): When promoting a symbol also promote all transparent
aliases.
(rename_statics): Weakref needs unique name, too.
Index: lto/lto-partition.c
===
--- lto/lto-partition.c (revision 231376)
+++ lto/lto-partition.c (working copy)
@@ -177,8 +177,20 @@ add_symbol_to_partition_1 (ltrans_partit
   /* Add all aliases associated with the symbol.  */
 
   FOR_EACH_ALIAS (node, ref)
-if (!node->weakref)
+if (!ref->referring->transparent_alias)
   add_symbol_to_partition_1 (part, ref->referring);
+else
+  {
+   struct ipa_ref *ref2;
+   /* We do not need to add transparent aliases if they are not used.
+  However we must add aliases of transparent aliases if they exist.  */
+   FOR_EACH_ALIAS (ref->referring, ref2)
+ {
+   /* Nested transparent aliases are not permitted.  */
+   gcc_checking_assert (!ref2->referring->transparent_alias);
+   add_symbol_to_partition_1 (part, ref2->referring);
+ }
+  }
 
   /* Ensure that SAME_COMDAT_GROUP lists all allways added in a group.  */
   if (node->same_comdat_group)
@@ -199,8 +211,10 @@ add_symbol_to_partition_1 (ltrans_partit
 static symtab_node *
 contained_in_symbol (symtab_node *node)
 {
-  /* Weakrefs are never contained in anything.  */
-  if (node->weakref)
+  /* There is no need to consider transparent aliases to be part of the
+ definition: they are only useful insite the partition they are output
+ and thus we will always see an explicit reference to it.  */
+  if (node->transparent_alias)
 return node;
   if (cgraph_node *cnode = dyn_cast  (node))
 {
@@ -967,6 +981,23 @@ promote_symbol (symtab_node *node)
   TREE_PUBLIC (node->decl) = 1;
   DECL_VISIBILITY (node->decl) = VISIBILITY_HIDDEN;
   DECL_VISIBILITY_SPECIFIED (node->decl) = true;
+  ipa_ref *ref;
+
+  /* Promoting a symbol also promotes all trasparent aliases with exception
+ of weakref where the visibility flags are always wrong and set to 
+ !PUBLIC.  */
+  for (unsigned i = 0; node->iterate_direct_aliases (i, ref); i++)
+{
+  struct symtab_node *alias = ref->referring;
+  if (alias->transparent_alias && !alias->weakref)
+   {
+ TREE_PUBLIC (alias->decl) = 1;
+ DECL_VISIBILITY (alias->decl) = VISIBILITY_HIDDEN;
+ DECL_VISIBILITY_SPECIFIED (alias->decl) = true;
+   }
+  gcc_assert (!alias->weakref || TREE_PUBLIC (alias->decl));
+}
+
   if (symtab->dump_file)
 fprintf (symtab->dump_file,
"Promoting as hidden: %s\n", node->name ());
@@ -974,7 +1005,8 @@ promote_symbol (symtab_node *node)
 
 /* Return true if NODE needs named section even if it won't land in the 
partition
symbol table.
-   FIXME: we should really not use named sections for inline clones and master 
clones.  */
+   FIXME: we should really not use named sections for inline clones and master
+   clones.  */
 
 static bool
 may_need_named_section_p (lto_symtab_encoder_t encoder, symtab_node *node)
@@ -1004,7 +1036,7 @@ rename_statics (lto_symtab_encoder_t enc
   tree name = DECL_ASSEMBLER_NAME (decl);
 
   /* See if this is static symbol. */
-  if ((node->externally_visible
+  if (((node->externally_visible && !node->weakref)
   /* FIXME: externally_visible is somewhat illogically not set for
 external symbols (i.e. those not defined).  Remove this test
 once this is fixed.  */

[gomp4] libgomp documentation: CUDA Streams Usage

2015-12-08 Thread Thomas Schwinge

Hi!

On Mon, 12 Jan 2015 13:55:47 -0600, James Norris  
wrote:
> The attached patch adds a new section to the documentation
> for libgomp. This section describes the use of streams
> within the OpenACC portion of the library.

That never made it upstream; with a little bit of copy-editing now
committed to gomp-4_0-branch in r231424:

commit ec7ae163b644bd11fd7343dd576cc9da0b50cbc7
Author: tschwinge 
Date:   Tue Dec 8 20:44:06 2015 +

libgomp documentation: CUDA Streams Usage

libgomp/
* libgomp.texi (CUDA Streams Usage): New chapter.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@231424 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp |5 +
 libgomp/libgomp.texi   |   49 +++-
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index a59cc9d..4b99302 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,4 +1,9 @@
 2015-12-08  Thomas Schwinge  
+   James Norris  
+
+   * libgomp.texi (CUDA Streams Usage): New chapter.
+
+2015-12-08  Thomas Schwinge  
 
* testsuite/libgomp.oacc-c-c++-common/routine-bind-nohost-1.c: New
file.
diff --git libgomp/libgomp.texi libgomp/libgomp.texi
index 019e439..542ca2f 100644
--- libgomp/libgomp.texi
+++ libgomp/libgomp.texi
@@ -100,6 +100,8 @@ changed to GNU Offloading and Multi Processing Runtime 
Library.
   programming interface.
 * OpenACC Environment Variables::Influencing OpenACC runtime behavior with
  environment variables.
+* CUDA Streams Usage::   Notes on the implementation of
+ asynchronous operations.
 * OpenACC Library Interoperability:: OpenACC library interoperability with the
  NVIDIA CUBLAS library.
 * Enabling OpenMP::  How to enable OpenMP for your
@@ -552,6 +554,51 @@ Print debug information pertaining to the accelerator.
 @end table
 
 
+
+@c -
+@c CUDA Streams Usage
+@c -
+
+@node CUDA Streams Usage
+@chapter CUDA Streams Usage
+
+This applies to the @code{nvptx} plugin only.
+
+The library provides elements that perform asynchronous movement of
+data and asynchronous operation of computing constructs.  This
+asynchronous functionality is implemented by making use of CUDA
+streams@footnote{See "Stream Management" in "CUDA Driver API",
+TRM-06703-001, Version 5.5, July 2013, for additional information}.
+
+The primary means by which the asychronous functionality is accessed
+is through the use of those OpenACC directives which make use of the
+@code{async} and @code{wait} clauses.  When the @code{async} clause is
+first used with a directive, it will create a CUDA stream.  If an
+@code{async-argument} is used with the @code{async} clause, then the
+stream will be associated with the specified @code{async-argument}.
+
+Following the creation of an association between a CUDA stream and the
+@code{async-argument} of an @code{async} clause, both the @code{wait}
+clause and the @code{wait} directive can be used.  When either the
+clause or directive is used after stream creation, it creates a
+rendezvous point whereby execution will wait until all operations
+associated with the @code{async-argument}, that is, stream, have
+completed.
+
+Normally, the management of the streams that are created as a result of
+using the @code{async} clause, is done without any intervention by the
+caller.  This implies the association between the @code{async-argument}
+and the CUDA stream will be maintained for the lifetime of the program.
+However, this association can be changed through the use of the library
+function @code{acc_set_cuda_stream}.  When the function
+@code{acc_set_cuda_stream} is used, the CUDA stream that was
+originally associated with the @code{async} clause will be destroyed.
+Caution should be taken when changing the association as subsequent
+references to the @code{async-argument} will be referring to a different
+CUDA stream.
+
+
+
 @c -
 @c OpenACC Library Interoperability
 @c -
@@ -564,7 +611,7 @@ Print debug information pertaining to the accelerator.
 As the OpenACC library is built using the CUDA Driver API, the question has
 arisen on what impact does using the OpenACC library have on a program that
 uses the Runtime library, or a library based on the Runtime library, e.g.,
-CUBLAS@footnote{Seee section 2.26, "Interactions with the CUDA Driver

Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching

2015-12-08 Thread Ramana Radhakrishnan



On 08/12/15 13:53, Christian Bruel wrote:
> 
>>
>> The __builtin_neon* aren't published anywhere and people really
>> shouldn't be using that directly in source code and only use the
>> interface in arm_neon.h which implements pretty much all the Neon
>> intrinsics in the ACLE document.
>>
> 
> yes, I see. I wanted to reduce the problem as well, not to confuse anything 
> by exposing those. sorry about this.
> 
> Here is the amended patch that use the arm_neon.h interface instead of the 
> builtins. Still fixes the same issues
> 
> Thanks
> 
> Christian
> 

> lto-neon.patch
> 
> 2015-12-07  Christian Bruel  
> 
>   * config/arm/arm-builtins.c (ARM_BUILTIN_CRYPTO_BASE): New enum tag.
>   (arm_init_neon_builtins_internal): Rename arm_init_neon_builtins,
>   (arm_init_crypto_builtins_internal): Rename arm_init_crypto_builtins.
>   use add_builtin_function_ext_scope instead of add_builtin_function.
>   (neon_set_p, neon_crypto_set_p): Remove.
>   (arm_init_builtins): Always call arm_init_neon_builtins and
>   arm_init_crypto_builtins.
>   (arm_expand_builtin): Check ARM_BUILTIN_NEON_BASE and
>   ARM_BUILTIN_CRYPTO_BASE.
>   * config/arm/arm-protos.h (arm_init_neon_builtins): Remove proto.
>   * config/arm/arm.c (arm_can_inline_p): Return OK for builtins.
>   (arm_valid_target_attribute_tree) : Remove arm_init_neon_builtins call.
> 
> 2015-12-07  Christian Bruel  
> 
>   PR target/pr68784
>   PR target/pr65837
>   * gcc.target/arm/pr68784.c: New test.
>   * gcc.target/arm/lto/pr65837_0_attr.c: New test.
>   * gcc.target/arm/lto/pr65837_0.c: Force float-abi.
> 
> Index: gcc/config/arm/arm-builtins.c
> ===
> --- gcc/config/arm/arm-builtins.c (revision 231363)
> +++ gcc/config/arm/arm-builtins.c (working copy)
> @@ -526,6 +526,8 @@ enum arm_builtins
>  #define CRYPTO3(L, U, M1, M2, M3, M4) \
>ARM_BUILTIN_CRYPTO_##U,
>  
> +  ARM_BUILTIN_CRYPTO_BASE,
> +
>  #include "crypto.def"
>  
>  #undef CRYPTO1
> @@ -894,7 +896,7 @@ arm_init_simd_builtin_scalar_types (void
>  }
>  
>  static void
> -arm_init_neon_builtins_internal (void)
> +arm_init_neon_builtins (void)
>  {
>unsigned int i, fcode = ARM_BUILTIN_NEON_PATTERN_START;
>  
> @@ -1018,7 +1020,7 @@ arm_init_neon_builtins_internal (void)
>  }
>  
>  static void
> -arm_init_crypto_builtins_internal (void)
> +arm_init_crypto_builtins (void)
>  {
>tree V16UQI_type_node
>  = arm_simd_builtin_type (V16QImode, true, false);
> @@ -1098,25 +1100,6 @@ arm_init_crypto_builtins_internal (void)
>#undef FT3
>  }
>  
> -static bool neon_set_p = false;
> -static bool neon_crypto_set_p = false;
> -
> -void
> -arm_init_neon_builtins (void)
> -{
> -  if (! neon_set_p)
> -{
> -  neon_set_p = true;
> -  arm_init_neon_builtins_internal ();
> -}
> -
> -  if (! neon_crypto_set_p && TARGET_CRYPTO && TARGET_HARD_FLOAT)
> -{
> -  neon_crypto_set_p = true;
> -  arm_init_crypto_builtins_internal ();
> -}
> -}
> -
>  #undef NUM_DREG_TYPES
>  #undef NUM_QREG_TYPES
>  
> @@ -1777,8 +1760,9 @@ arm_init_builtins (void)
>   arm_init_neon_builtins which uses it.  */
>arm_init_fp16_builtins ();
>  
> -  if (TARGET_NEON)
> -arm_init_neon_builtins ();
> +  arm_init_neon_builtins ();
> +
> +  arm_init_crypto_builtins ();
>  
>if (TARGET_CRC32)
>  arm_init_crc32_builtins ();
> @@ -2332,9 +2316,26 @@ arm_expand_builtin (tree exp,
>int mask;
>int imm;
>  
> +  /* Check in the context of the function making the call whether the
> + builtin is supported.  */
> +  if (fcode >= ARM_BUILTIN_NEON_BASE && !TARGET_NEON)
> +{
> +  error ("%qE neon builtin is not supported in this configuration.",
> +  fndecl);
> +  return const0_rtx;
> +}

Can we make this error message more user friendly.

"You must enable NEON instructions (e.g. -mfloat-abi=softfp -mfpu=neon) to use 
these intrinsics"

> +
>if (fcode >= ARM_BUILTIN_NEON_BASE)
>  return arm_expand_neon_builtin (fcode, exp, target);
>  
> +  if (fcode >= ARM_BUILTIN_CRYPTO_BASE
> +  && (!TARGET_CRYPTO || !TARGET_HARD_FLOAT))
> +{
> +  error ("%qE crypto builtin is not supported in this configuration.",
> +  fndecl);
> +  return const0_rtx;
> +}

"You must enable crypto intrinsics (e.g. -mfloat-abi=softfp 
-mfpu=crypto-neon...) to use these intrinsics" 

I'm still playing with this patch.

regards
Ramana





> +




>switch (fcode)
>  {
>  case ARM_BUILTIN_GET_FPSCR:
> Index: gcc/config/arm/arm-protos.h
> ===
> --- gcc/config/arm/arm-protos.h   (revision 231363)
> +++ gcc/config/arm/arm-protos.h   (working copy)
> @@ -213,7 +213,6 @@ extern void arm_mark_dllimport (tree);
>  extern bool arm_change_mode_p (tree);
>  #endif
>  
> -extern void

[gomp4] [WIP] OpenACC bind, nohost clauses (was: [OpenACC] C, C++: bind and nohost clauses)

2015-12-08 Thread Thomas Schwinge

Hi!

On Sat, 14 Nov 2015 09:36:36 +0100, I wrote:
> Initial support for the OpenACC bind and nohost clauses (routine
> directive) for C, C++.  Fortran to follow.  Middle end handling and more
> complete testsuite coverage also to follow once we got a few details
> clarified.  OK for trunk?

(Has not yet been reviewed.)  Meanwhile, I continued working on the
implementation, focussing on C.  See also my question "How to rewrite
call targets (OpenACC bind clause)",
.

To enable Cesar to help with the C++ and Fortran front ends (thanks!), in
r231423, I just committed "[WIP] OpenACC bind, nohost clauses" to
gomp-4_0-branch.  (There has already been initial support, parsing only,
on gomp-4_0-branch.)  I'll try to make progress with the generic middle
end bits, but will appreciate any review comments, so before inlining the
complete patch, first a few questions/comments:

In the OpenACC bind(Y) clause attached to a routine(X) directive, Y can
be an identifier or a string.  In the front ends, I canonicalize that
into a string, as we -- at least currently -- don't have any use for the
identifier (or decl?) later on:

--- gcc/tree-core.h
+++ gcc/tree-core.h
@@ -461,7 +461,7 @@ enum omp_clause_code {
-  /* OpenACC clause: bind ( identifer | string ).  */
+  /* OpenACC clause: bind (string).  */
   OMP_CLAUSE_BIND,

All the following are unreachable for OMP_CLAUSE_BIND, OMP_CLAUSE_NOHOST;
document that to make it obvious/expected:

--- gcc/cp/pt.c
+++ gcc/cp/pt.c
@@ -14501,6 +14501,8 @@ tsubst_omp_clauses (tree clauses, bool 
declare_simd, bool allow_fields,
  }
  }
  break;
+   case OMP_CLAUSE_BIND:
+   case OMP_CLAUSE_NOHOST:
default:
  gcc_unreachable ();
}
--- gcc/gimplify.c
+++ gcc/gimplify.c
@@ -7413,6 +7413,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
  ctx->default_kind = OMP_CLAUSE_DEFAULT_KIND (c);
  break;
 
+   case OMP_CLAUSE_BIND:
+   case OMP_CLAUSE_NOHOST:
default:
  gcc_unreachable ();
}
@@ -8104,6 +8106,8 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, 
gimple_seq body, tree *list_p,
case OMP_CLAUSE_DEVICE_TYPE:
  break;
 
+   case OMP_CLAUSE_BIND:
+   case OMP_CLAUSE_NOHOST:
default:
  gcc_unreachable ();
}
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -2279,6 +2279,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
  sorry ("Clause not supported yet");
  break;
 
+   case OMP_CLAUSE_BIND:
+   case OMP_CLAUSE_NOHOST:
default:
  gcc_unreachable ();
}
@@ -2453,6 +2455,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
  sorry ("Clause not supported yet");
  break;
 
+   case OMP_CLAUSE_BIND:
+   case OMP_CLAUSE_NOHOST:
default:
  gcc_unreachable ();
}
--- gcc/tree-nested.c
+++ gcc/tree-nested.c
@@ -1200,6 +1200,8 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct 
walk_stmt_info *wi)
case OMP_CLAUSE_SEQ:
  break;
 
+   case OMP_CLAUSE_BIND:
+   case OMP_CLAUSE_NOHOST:
default:
  gcc_unreachable ();
}
@@ -1882,6 +1884,8 @@ convert_local_omp_clauses (tree *pclauses, struct 
walk_stmt_info *wi)
case OMP_CLAUSE_SEQ:
  break;
 
+   case OMP_CLAUSE_BIND:
+   case OMP_CLAUSE_NOHOST:
default:
  gcc_unreachable ();
}

C front end:

--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -11607,6 +11607,8 @@ c_parser_oacc_clause_async (c_parser *parser, tree 
list)
 static tree
 c_parser_oacc_clause_bind (c_parser *parser, tree list)
 {
+  check_no_duplicate_clause (list, OMP_CLAUSE_BIND, "bind");
+
   location_t loc = c_parser_peek_token (parser)->location;
 
   parser->lex_untranslated_string = true;
@@ -11615,20 +11617,43 @@ c_parser_oacc_clause_bind (c_parser *parser, tree 
list)
   parser->lex_untranslated_string = false;
   return list;
 }
-  if (c_parser_next_token_is (parser, CPP_NAME)
-  || c_parser_next_token_is (parser, CPP_STRING))
+  tree name = error_mark_node;
+  c_token *token = c_parser_peek_token (parser);
+  if (c_parser_next_token_is (parser, CPP_NAME))
 {
-  tree t = c_parser_peek_token (parser)->value;
+  tree decl = lookup_name (token->value);
+  if (!decl)
+   error_at (token->location, "%qE has not been declared",
+ token->value);
+  else if (TREE_CODE (decl)

Re: [PTX] remove some test skipping

2015-12-08 Thread Nathan Sidwell


On 12/08/15 14:35, Alexander Monakov wrote:

On Tue, 8 Dec 2015, Nathan Sidwell wrote:


These tests no longer cause the PTX assembler to seg fault, so no reason to
skip them.


Shouldn't all such cases use 'dg-xfail' rather than 'dg-skip-if'?  Should be
more reasonable to still run the test and eventually get XPASS'es as ptxas
improves.


I have a vague memory that dg-xfail didn't work because of the failure mode -- 
the harness claimed they were UNRESOLVED.


nathan

Transparent alias suport part 4 (ipa-visibility fixes)

Hi,
this patch fixes two issues with transparent laiases in ipa-visibility. First
one can't replace weakrefs by local aliases because those would lose the
weakness and second transparent aliases do not really have resolution info set
in all cases, so we want to skip them when considering a comdat for 
localization.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

* ipa-visibility.c (can_replace_by_local_alias): Look through 
transparent
aliaes; refuse weakrefs.
(update_visibility_by_resolution_info): Skip transparent aliases in the
analysis part
Index: ipa-visibility.c
===
--- ipa-visibility.c(revision 231425)
+++ ipa-visibility.c(working copy)
@@ -332,6 +332,13 @@ varpool_node::externally_visible_p (void
 bool
 can_replace_by_local_alias (symtab_node *node)
 {
+  /* Weakrefs have a reason to be non-local.  Be sure we do not replace
+ them.  */
+  while (node->transparent_alias && node->definition && !node->weakref)
+node = node->get_alias_target ();
+  if (node->weakref)
+return false;
+  
   return (node->get_availability () > AVAIL_INTERPOSABLE
  && !decl_binds_to_current_def_p (node->decl)
  && !node->can_be_discarded_p ());
@@ -392,7 +399,7 @@ update_visibility_by_resolution_info (sy
 for (symtab_node *next = node->same_comdat_group;
 next != node; next = next->same_comdat_group)
   {
-   if (!next->externally_visible)
+   if (!next->externally_visible || next->transparent_alias)
  continue;
 
bool same_def

Transparent alias suport part 5 (varpool fixes)

Hi,
this patch fixes ICE in varpool_node::get_availability which happens when
you dump a node with weakref and makes symbol_table::remove_unreferenced_decls
to keep aliases in the boundary so we do not lose the information that they
actually represent the same location in the binary.

This is needed to fix alias.c/tree-alias.c WRT aliases.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

* varpool.c (varpool_node::get_availability): Recurse only on
weakrefs with definition in the target.
(symbol_table::remove_unreferenced_decls): Keep aliases in the boundary.
Index: varpool.c
===
--- varpool.c   (revision 231425)
+++ varpool.c   (working copy)
@@ -490,7 +490,7 @@ varpool_node::get_availability (void)
   if (DECL_IN_CONSTANT_POOL (decl)
   || DECL_VIRTUAL_P (decl))
 return AVAIL_AVAILABLE;
-  if (transparent_alias)
+  if (transparent_alias && definition)
 {
   enum availability avail;
 
@@ -667,11 +667,11 @@ symbol_table::remove_unreferenced_decls
enqueue_node (vnode, );
  else
{
- referenced.add (node);
- while (node->alias && node->definition)
+ referenced.add (vnode);
+ while (vnode && vnode->alias && vnode->definition)
{
- node = node->get_alias_target ();
- referenced.add (node);
+ vnode = vnode->get_alias_target ();
+ referenced.add (vnode);
}
}
}

Transparent alias suport part 6 (lto-cgraph fixes)