date:20151202

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Nathan Sidwell


On 12/02/15 09:22, Jakub Jelinek wrote:


I believe Alex' testing revealed that if you take address of the same .local
objects in several threads, the addresses are the same, and therefore you
refer to your own .local space rather than the other thread's.


Before or after applying cvta?

nathan

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Jakub Jelinek

On Wed, Dec 02, 2015 at 09:23:11AM -0500, Nathan Sidwell wrote:
> On 12/02/15 09:22, Jakub Jelinek wrote:
> 
> >I believe Alex' testing revealed that if you take address of the same .local
> >objects in several threads, the addresses are the same, and therefore you
> >refer to your own .local space rather than the other thread's.
> 
> Before or after applying cvta?

I'll let Alex answer that.

Jakub

[patch] libstdc++/56383 Fix ambiguity with multiple enable_shared_from_this bases

2015-12-02 Thread Jonathan Wakely


The friend function defined in enable_shared_from_this did not match
the declaration at namespace scope, so instead of defining the
previously declared function it added a new overload, which was always
a better match than the declared (but never defined) one.

That worked fine for a single base class, the friend function got
found by ADL, but with two enable_shared_from_this base classes the
two friend overloads were ambiguous.

This changes the friend to match the declaration, so that it can only
be called when an unambiguous enable_shared_from_this base can be
deduced, and so fails silently (as it is supposed to) when there is
not an unambiguous base class.

Tested powerpc64le-linux, committed to trunk.

This fix is simple enough that I'm going to backport it after 5.3 is
released.

commit 1fa7fd2d7699c41f147b42ae96f70bf7b9e8e2d6
Author: Jonathan Wakely 
Date:   Wed Dec 2 14:43:34 2015 +

Fix ambiguity with multiple enable_shared_from_this bases

	PR libstdc++/56383
	* testsuite/20_util/enable_shared_from_this/56383.cc: New.
	* include/bits/shared_ptr_base.h (__enable_shared_from_this): Make
	friend declaration match previous declaration of
	__enable_shared_from_this_helper.
	* include/bits/shared_ptr.h (enable_shared_from_this): Likewise.

diff --git a/libstdc++-v3/include/bits/shared_ptr.h b/libstdc++-v3/include/bits/shared_ptr.h
index 2413b1b..26a0ad3 100644
--- a/libstdc++-v3/include/bits/shared_ptr.h
+++ b/libstdc++-v3/include/bits/shared_ptr.h
@@ -582,19 +582,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	_M_weak_assign(_Tp1* __p, const __shared_count<>& __n) const noexcept
 	{ _M_weak_this._M_assign(__p, __n); }
 
-  template
+  template
 	friend void
-	__enable_shared_from_this_helper(const __shared_count<>& __pn,
-	 const enable_shared_from_this* __pe,
-	 const _Tp1* __px) noexcept
-	{
-	  if (__pe != nullptr)
-	__pe->_M_weak_assign(const_cast<_Tp1*>(__px), __pn);
-	}
+	__enable_shared_from_this_helper(const __shared_count<>&,
+	 const enable_shared_from_this<_Tp1>*,
+	 const _Tp2*) noexcept;
 
   mutable weak_ptr<_Tp>  _M_weak_this;
 };
 
+  template
+inline void
+__enable_shared_from_this_helper(const __shared_count<>& __pn,
+ const enable_shared_from_this<_Tp1>*
+ __pe, const _Tp2* __px) noexcept
+{
+  if (__pe != nullptr)
+	__pe->_M_weak_assign(const_cast<_Tp2*>(__px), __pn);
+}
+
   /**
*  @brief  Create an object that is owned by a shared_ptr.
*  @param  __a An allocator.
diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
index 1a96b4c..f4f98e6 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -1546,19 +1546,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	_M_weak_assign(_Tp1* __p, const __shared_count<_Lp>& __n) const noexcept
 	{ _M_weak_this._M_assign(__p, __n); }
 
-  template
+  template<_Lock_policy _Lp1, typename _Tp1, typename _Tp2>
 	friend void
-	__enable_shared_from_this_helper(const __shared_count<_Lp>& __pn,
-	 const __enable_shared_from_this* __pe,
-	 const _Tp1* __px) noexcept
-	{
-	  if (__pe != nullptr)
-	__pe->_M_weak_assign(const_cast<_Tp1*>(__px), __pn);
-	}
+	__enable_shared_from_this_helper(const __shared_count<_Lp1>&,
+	 const __enable_shared_from_this<_Tp1,
+	 _Lp1>*, const _Tp2*) noexcept;
 
   mutable __weak_ptr<_Tp, _Lp>  _M_weak_this;
 };
 
+  template<_Lock_policy _Lp1, typename _Tp1, typename _Tp2>
+inline void
+__enable_shared_from_this_helper(const __shared_count<_Lp1>& __pn,
+ const __enable_shared_from_this<_Tp1,
+ _Lp1>* __pe,
+ const _Tp2* __px) noexcept
+{
+  if (__pe != nullptr)
+	__pe->_M_weak_assign(const_cast<_Tp2*>(__px), __pn);
+}
 
   template
 inline __shared_ptr<_Tp, _Lp>
diff --git a/libstdc++-v3/testsuite/20_util/enable_shared_from_this/56383.cc b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/56383.cc
new file mode 100644
index 000..ea0f28d
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/56383.cc
@@ -0,0 +1,56 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// {

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Nathan Sidwell


On 12/02/15 10:12, Jakub Jelinek wrote:


If we have a reasonable IPA pass to discover which addressable variables can
be shared by multiple threads and which can't, then we could use soft-stack
for those that can be shared by multiple PTX threads (different warps, or
same warp, different threads in it), then we shouldn't need to copy any
stack, just broadcast the scalar vars.


Note the current scalar (.reg)  broadcasting uses the live register set.  Not 
the subset of that that is actually read within the partitioned region.  That'd 
be a relatively straightforward optimization I think.


nathan

Re: [PATCH, PING*4] Track indirect calls for call site information in debug info.

2015-12-02 Thread Pierre-Marie de Rodat


On 12/02/2015 02:57 PM, Jakub Jelinek wrote:

Ok, thanks.


Great, thank you! I’ve pushed the change.

--
Pierre-Marie de Rodat

[RFA] [PR tree-optimization/68599] Avoid over-zealous optimization with -funsafe-loop-optimizations

2015-12-02 Thread Jeff Law




I strongly recommend reading the analysis in pr45122 since pr68599 uses 
the same testcase and just triggers the same bug in the RTL optimizers 
instead of the tree optimziers.


As noted in 45122, with -funsafe-loop-optimizations, we may exit the 
loop an iteration too early.  The loop in question is finite and the 
counter does not overflow.  Yet -funsafe-loop-optimizations munges it badly.


As is noted in c#6 and patched in c#8, when there's more than one exit 
from the loop, simply discarding the assumptions for the trip count is 
"a bit too unsafe".  Richi & Zdenek agreed that disabling the 
optimization when the loop has > 1 exit was the preferred approach. 
Alex's patch did just that, but only for the tree optimizers.


This patch does essentially the same thing for the RTL loop optimizer. 
If the candidate loop has > 1 exit, then we don't allow 
-funsafe-loop-optimizations to drop the assumptions/infinite notes for 
the RTL loop.


This required ensuring that LOOPS_HAVE_RECORDED_EXITS when initializing 
the loop optimizer.


Bootstrapped and regression tested on x86_64-linux-gnu and 
powerpc64-linux-gnu.  For the latter, pr45122.c flips to a pass.  Given 
this is covered by the pr45122 testcase, I didn't add a new one.


OK for the trunk?

Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9a78b7a..ed677ec 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2015-12-02  Jeff Law  
+
+   PR tree-optimization/68599
+   * loop-init.c (rtl_loop_init): Set LOOPS_HAVE_RECORDED_EXITS
+   in call to loop_optimizer_init.
+   * loop-iv.c (get_simple_loop_desc): Only allow unsafe loop
+   optimization to drop the assumptions/infinite notations if
+   the loop has a single exit.
+
 2015-12-01  Andreas Tobler  
 
* config/rs6000/freebsd64.h (ELFv2_ABI_CHECK): Add new macro.
diff --git a/gcc/loop-init.c b/gcc/loop-init.c
index e32c94a..120316d 100644
--- a/gcc/loop-init.c
+++ b/gcc/loop-init.c
@@ -395,7 +395,7 @@ rtl_loop_init (void)
   dump_flow_info (dump_file, dump_flags);
 }
 
-  loop_optimizer_init (LOOPS_NORMAL);
+  loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS);
   return 0;
 }
 
diff --git a/gcc/loop-iv.c b/gcc/loop-iv.c
index c7d5164..dfa3ca3 100644
--- a/gcc/loop-iv.c
+++ b/gcc/loop-iv.c
@@ -3054,7 +3054,7 @@ get_simple_loop_desc (struct loop *loop)
}
}
 
-  if (flag_unsafe_loop_optimizations)
+  if (flag_unsafe_loop_optimizations && single_exit (loop))
{
  desc->assumptions = NULL_RTX;
  desc->infinite = NULL_RTX;

RE: [mips] Rotate stack checking loop

2015-12-02 Thread Moore, Catherine



> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Eric Botcazou
> Sent: Thursday, November 12, 2015 4:51 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [mips] Rotate stack checking loop
> 
> Hi,
> 
> this patch rotates the loop generated in the prologue to do stack checking
> when -fstack-check is specified, thereby saving one branch instruction.  It
> was initially implemented as a WHILE loop to match the generic
> implementation but can be turned into a DO-WHILE loop because the
> amount of stack to be checked is known at compile time (since it's the static
> part of the frame).
> 
> Unfortunately I don't have access to MIPS hardware any more so I only
> verified
> that the assembly code is as expected and can be assembled.   OK for
> mainline?
> 
> 
> 2015-11-12  Eric Botcazou  
> 
>   * config/mips/mips.c (mips_emit_probe_stack_range): Adjust.
>   (mips_output_probe_stack_range): Rotate the loop and simplify.
> 
This is OK.

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Jakub Jelinek

On Wed, Dec 02, 2015 at 09:14:03AM -0500, Nathan Sidwell wrote:
> On 12/02/15 08:46, Jakub Jelinek wrote:
> 
> >Or does the OpenACC execution model not allow anything like that, i.e.
> >have some function with an automatic variable pass the address of that
> >variable to some other function and that other function use #acc loop kind
> >that expects the caller to be at the worker level and splits the work among
> >the threads in the warp, on the array section pointed by that passed in
> >pointer?  See the OpenMP testcase I've posted in this thread.
> 
> There are two cases to consider
> 
> 1) the caller (& address taker) is already partitioned.  Thus the callers'
> frames are already copied.  The caller takes the address of the object in
> its own frame.
> 
> An example would be calling say __mulcd3 where the return value location is
> passed by pointer.
> 
> 2) the caller is not partitioned and calls a function containing a
> partitioned loop.  The caller takes the address of its instance of the
> variable.  As part of the RTL expansion we have to convert addresses (to be
> stored in registers) to the generic address space.  That conversion creates
> a pointer that may be used by any thread (on the same CTA)[*].  The function
> call is  executed by all threads (they're partially un-neutered before the
> call).  In the partitioned loop, each thread ends up accessing the location
> in the frame of the original calling active thread.
> 
> [*]  although .local is private to each thread, it's placed in memory that
> is reachable from anywhere, provided a generic address is used.  Essentially
> it's like TLS and genericization is simply adding the thread pointer to the
> local memory offset to create a generic address.

I believe Alex' testing revealed that if you take address of the same .local
objects in several threads, the addresses are the same, and therefore you
refer to your own .local space rather than the other thread's.  Which is why
the -msoft-stack stuff has been added.
Perhaps we need to use it everywhere, at least for OpenMP, and do it
selectively, non-addressable vars can stay .local, addressable vars proven
not to escape to other threads (or other functions that could access them
from other threads) would go to soft stack.

Jakub

[PTX] simplify movs

2015-12-02 Thread Nathan Sidwell

The PTX md file goes to a lot of effort handling SC and DC movs, including for 
unspecs to mov low and high parts around.  However, these code paths are not 
exercised in any gcc test or the build of newlib.  The generic handling of these 
movs deals with type punning, (using the stack frame, if needed).  There 
doesn't appear a need for a separate punbuffer.


Thus this patch deletes a lot of that machinery.

nathan
2015-12-02  Nathan Sidwell  

	* config/nvptx/nvptx-protos.h (nvptx_output_mov_insn): Declare.
	(nvptx_underlying_object_mode): Delete.
	* config/nvptx/nvptx.c (nvptx_underlying_object_mode): Delete.
	(output_reg): New.
	(nvptx_declare_function_name): Use output_reg.  Remove punning
	buffer.
	(nvptx_output_mov_insn): New.
	(nvptx_print_operand): Separate SUBREG handling, remove 'f' case,
	Use output_reg. Merge 't' and 'u' handling.
	* config/nvptx/nvptx.h (NVPTX_PUNNING_BUFFER_REGNUM): Delete.
	(struct machine_function): Remvoe punning_buffer_size.
	(REGISTER_NAMES): Remove %punbuffer.
	* config/nvptx/nvptx.md (UNSPEC_CPLX_LOWPART,
	UNSPEC_CPLX_HIGHPART): Delete.
	(*mov_insn [QHSDIM): Remove unnecessary constraints, use
	nvptx_output_mov_insn.
	(*mov_insn [SDFM): Reorder constraints to match integer
	moc.  Use nvptx_output_mov_insn.
	(highpartscsf2,  set_highpartscsf2, lowpartscsf2,
	set_lowpartscsf2): Delete.
	(mov [SDCM]): Delete.

Index: config/nvptx/nvptx-protos.h
===
--- config/nvptx/nvptx-protos.h	(revision 231177)
+++ config/nvptx/nvptx-protos.h	(working copy)
@@ -38,9 +38,9 @@ extern void nvptx_expand_oacc_join (unsi
 extern void nvptx_expand_call (rtx, rtx);
 extern rtx nvptx_expand_compare (rtx);
 extern const char *nvptx_ptx_type_from_mode (machine_mode, bool);
+extern const char *nvptx_output_mov_insn (rtx, rtx);
 extern const char *nvptx_output_call_insn (rtx_insn *, rtx, rtx);
 extern const char *nvptx_output_return (void);
-extern machine_mode nvptx_underlying_object_mode (rtx);
 extern const char *nvptx_section_from_addr_space (addr_space_t);
 extern bool nvptx_hard_regno_mode_ok (int, machine_mode);
 extern rtx nvptx_maybe_convert_symbolic_operand (rtx);
Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 231177)
+++ config/nvptx/nvptx.c	(working copy)
@@ -155,23 +155,6 @@ nvptx_option_override (void)
   worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 }
 
-/* Return the mode to be used when declaring a ptx object for OBJ.
-   For objects with subparts such as complex modes this is the mode
-   of the subpart.  */
-
-machine_mode
-nvptx_underlying_object_mode (rtx obj)
-{
-  if (GET_CODE (obj) == SUBREG)
-obj = SUBREG_REG (obj);
-  machine_mode mode = GET_MODE (obj);
-  if (mode == TImode)
-return DImode;
-  if (COMPLEX_MODE_P (mode))
-return GET_MODE_INNER (mode);
-  return mode;
-}
-
 /* Return a ptx type for MODE.  If PROMOTE, then use .u32 for QImode to
deal with ptx ideosyncracies.  */
 
@@ -257,6 +240,37 @@ maybe_split_mode (machine_mode mode)
   return VOIDmode;
 }
 
+/* Output a register, subreg, or register pair (with optional
+   enclosing braces).  */
+
+static void
+output_reg (FILE *file, unsigned regno, machine_mode inner_mode,
+	int subreg_offset = -1)
+{
+  if (inner_mode == VOIDmode)
+{
+  if (HARD_REGISTER_NUM_P (regno))
+	fprintf (file, "%s", reg_names[regno]);
+  else
+	fprintf (file, "%%r%d", regno);
+}
+  else if (subreg_offset >= 0)
+{
+  output_reg (file, regno, VOIDmode);
+  fprintf (file, "$%d", subreg_offset);
+}
+  else
+{
+  if (subreg_offset == -1)
+	fprintf (file, "{");
+  output_reg (file, regno, inner_mode, GET_MODE_SIZE (inner_mode));
+  fprintf (file, ",");
+  output_reg (file, regno, inner_mode, 0);
+  if (subreg_offset == -1)
+	fprintf (file, "}");
+}
+}
+
 /* Emit forking instructions for MASK.  */
 
 static void
@@ -724,16 +738,12 @@ nvptx_declare_function_name (FILE *file,
 	{
 	  machine_mode mode = PSEUDO_REGNO_MODE (i);
 	  machine_mode split = maybe_split_mode (mode);
+
 	  if (split != VOIDmode)
-	{
-	  fprintf (file, "\t.reg%s %%r%d$%d;\n",
-		   nvptx_ptx_type_from_mode (split, true), i, 0);
-	  fprintf (file, "\t.reg%s %%r%d$%d;\n",
-		   nvptx_ptx_type_from_mode (split, true), i, 1);
-	}
-	  else
-	fprintf (file, "\t.reg%s %%r%d;\n",
-		 nvptx_ptx_type_from_mode (mode, true), i);
+	mode = split;
+	  fprintf (file, "\t.reg%s ", nvptx_ptx_type_from_mode (mode, true));
+	  output_reg (file, i, split, -2);
+	  fprintf (file, ";\n");
 	}
 }
 
@@ -754,15 +764,6 @@ nvptx_declare_function_name (FILE *file,
 	   BITS_PER_WORD);
 }
 
-  if (cfun->machine->punning_buffer_size > 0)
-{
-  fprintf (file, "\t.reg.u%d %%punbuffer;\n"
-	   "\t.local.align 8 .b8 %%punbuffer_ar[%d];\n",
-	   BITS_PER_WORD,

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Jakub Jelinek

On Wed, Dec 02, 2015 at 05:54:51PM +0300, Alexander Monakov wrote:
> On Wed, 2 Dec 2015, Jakub Jelinek wrote:
> 
> > On Wed, Dec 02, 2015 at 08:02:47AM -0500, Nathan Sidwell wrote:
> > > On 12/02/15 05:40, Jakub Jelinek wrote:
> > > > Don't know the HW good enough, is there any power consumption, heat etc.
> > > >difference between the two approaches?  I mean does the HW consume 
> > > >different
> > > >amount of power if only one thread in a warp executes code and the other
> > > >threads in the same warp just jump around it, vs. having all threads 
> > > >busy?
> > > 
> > > Having all threads busy will increase power consumption.  It's also bad if
> > > the other vectors are executing memory access instructions.  However, for
> > 
> > Then the uniform SIMT approach might not be that good idea.
> 
> Why?  Remember that the tradeoff is copying registers (and in OpenACC, stacks
> too).  We don't know how the costs balance.  My intuition is that copying is
> worse compared to what I'm doing.
> 
> Anyhow, for good performance the offloaded code needs to be running in vector
> regions most of the time, where the concern doesn't apply.

But you never know if people actually use #pragma omp simd regions or not,
sometimes they will, sometimes they won't, and if the uniform SIMT increases
power consumption, it might not be desirable.

If we have a reasonable IPA pass to discover which addressable variables can
be shared by multiple threads and which can't, then we could use soft-stack
for those that can be shared by multiple PTX threads (different warps, or
same warp, different threads in it), then we shouldn't need to copy any
stack, just broadcast the scalar vars.

Jakub

Re: [OpenACC 0/7] host_data construct

2015-12-02 Thread Tom de Vries


On 30/11/15 20:30, Julian Brown wrote:

 libgomp/
 * oacc-parallel.c (GOACC_host_data): New function.
 * libgomp.map (GOACC_host_data): Add to GOACC_2.0.1.
 * testsuite/libgomp.oacc-c-c++-common/host_data-1.c: New test.
 * testsuite/libgomp.oacc-c-c++-common/host_data-2.c: New test.
 * testsuite/libgomp.oacc-c-c++-common/host_data-3.c: New test.
 * testsuite/libgomp.oacc-c-c++-common/host_data-4.c: New test.
 * testsuite/libgomp.oacc-c-c++-common/host_data-5.c: New test.
 * testsuite/libgomp.oacc-c-c++-common/host_data-6.c: New test.



Hi,

At r231169, I'm seeing these failures for a no-accelerator setup:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-2.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-4.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-5.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 execution test

...

Thanks,
- Tom

Re: [PATCH] Avoid false vector mask conversion

2015-12-02 Thread Richard Biener

On Wed, Dec 2, 2015 at 4:24 PM, Ilya Enkovich  wrote:
> 2015-12-02 17:52 GMT+03:00 Richard Biener :
>> On Thu, Nov 12, 2015 at 5:08 PM, Ilya Enkovich  
>> wrote:
>>> Hi,
>>>
>>> When we use LTO for fortran we may have a mix 32bit and 1bit scalar 
>>> booleans. It means we may have conversion of one scalar type to another 
>>> which confuses vectorizer because values with different scalar boolean type 
>>> may get the same vectype.  This patch transforms such conversions into 
>>> comparison.
>>>
>>> I managed to make a small fortran test which gets vectorized with this 
>>> patch but I didn't find how I can run fortran test with LTO and then scan 
>>> tree dump to check it is vectorized.  BTW here is a loop from the test:
>>>
>>>   real*8 a(18)
>>>   logical b(18)
>>>   integer i
>>>
>>>   do i=1,18
>>>  if(a(i).gt.0.d0) then
>>> b(i)=.true.
>>>  else
>>> b(i)=.false.
>>>  endif
>>>   enddo
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?
>>>
>>> Thanks,
>>> Ilya
>>> --
>>> gcc/
>>>
>>> 2015-11-12  Ilya Enkovich  
>>>
>>> * tree-vect-patterns.c (vect_recog_mask_conversion_pattern):
>>> Transform useless boolean conversion into assignment.
>>>
>>>
>>> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
>>> index b9d900c..62070da 100644
>>> --- a/gcc/tree-vect-patterns.c
>>> +++ b/gcc/tree-vect-patterns.c
>>> @@ -3674,6 +3674,38 @@ vect_recog_mask_conversion_pattern (vec 
>>> *stmts, tree *type_in,
>>>if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE)
>>>  return NULL;
>>>
>>> +  /* Check conversion between boolean types of different sizes.
>>> + If no vectype is specified, then we have a regular mask
>>> + assignment with no actual conversion.  */
>>> +  if (rhs_code == CONVERT_EXPR
>>
>> CONVERT_EXPR_CODE_P (rhs_code)
>>
>>> +  && !STMT_VINFO_DATA_REF (stmt_vinfo)
>>> +  && !STMT_VINFO_VECTYPE (stmt_vinfo))
>>> +{
>>> +  if (TREE_CODE (rhs1) != SSA_NAME)
>>> +   return NULL;
>>> +
>>> +  rhs1_type = search_type_for_mask (rhs1, vinfo);
>>> +  if (!rhs1_type)
>>> +   return NULL;
>>> +
>>> +  vectype1 = get_mask_type_for_scalar_type (rhs1_type);
>>> +
>>> +  if (!vectype1)
>>> +   return NULL;
>>> +
>>> +  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
>>> +  pattern_stmt = gimple_build_assign (lhs, rhs1);
>>
>> So what's the actual issue here?  That the conversion is spurious?
>> Why can't you accept this simply in vectorizable_assignment then?
>
> The problem is that conversion is supposed to be handled by
> vectorizable_conversion,
> but it fails to because it is not actually a conversion. I suppose it
> may be handled
> in vectorizable_assignment but I chose this pattern because it's meant
> to handle mask
> conversion issues.

I think it's always better to avoid patterns if you can.

Richard.

> Thanks,
> Ilya
>
>>
>> Richard.
>>
>>> +  *type_out = vectype1;
>>> +  *type_in = vectype1;
>>> +  stmts->safe_push (last_stmt);
>>> +  if (dump_enabled_p ())
>>> +   dump_printf_loc (MSG_NOTE, vect_location,
>>> + "vect_recog_mask_conversion_pattern: 
>>> detected:\n");
>>> +
>>> +  return pattern_stmt;
>>> +}
>>> +
>>>if (rhs_code != BIT_IOR_EXPR
>>>&& rhs_code != BIT_XOR_EXPR
>>>&& rhs_code != BIT_AND_EXPR)

Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path

2015-12-02 Thread Richard Biener

On Wed, Dec 2, 2015 at 4:31 PM, Jeff Law  wrote:
> On 12/02/2015 02:54 AM, Richard Biener wrote:
>>>
>>> Deferring to cfg_cleanup works because if cfg_cleanup does anything, it
>>> sets
>>> LOOPS_NEED_FIXUP (which we were trying to avoid in DOM).  So it seems
>>> that
>>> the gyrations we often do to avoid LOOPS_NEED_FIXUP are probably not all
>>> that valuable in the end.  Anyway...
>>
>>
>> Yeah, I have partially working patches lying around to "fix" CFG cleanup
>> to
>> avoid this.  Of course in the case of new loops appearing that's not
>> easily
>> possible.
>
> And that may argue that it's largely inevitable if we collapse a conditional
> (and thus delete an edge).
>
>
>>
>>> There's some fallout which I'm still exploring.  For example, we have
>>> cases
>>> where removal of the edge by DOM results in removal of a PHI argument in
>>> the
>>> target, which in turn results in the PHI becoming a degenerate which we
>>> can
>>> then propagate away.  I have a possible solution for this that I'm
>>> playing
>>> with.
>>>
>>> I suspect the right path is to continue down this path.
>>
>>
>> Yeah, the issue here is that DOM isn't tracking which edges are executable
>> to handle merge PHIs (or to aovid doing work in unreachable regions).
>
> Right.
>
>
> It should
>>
>> be possible to make it do that much like I extended SCCVN to do this
>> (when doing the DOM walk see if any incoming edge is marked executable
>> and if not, mark all outgoing edges as not executable, if the block is
>> executable
>> at the time we process the last stmt determine if we can compute the edge
>> that ends up always executed and mark all others as not executable)
>
> Essentially yes. I'm using the not-executable flag and bypassing things when
> it's discovered.
>
> The most interesting side effect, and one I haven't fully analyzed yet is an
> unexpected jump thread -- which I've traced back to differences in what the
> alias oracle is able to find when we walk unaliased vuses. Which makes
> totally no sense that it's unable to find the unaliased vuse in the
> simplified CFG, but finds it when we don't remove the unexecutable edge.  As
> I said, it makes no sense to me yet and I'm still digging.

The walking of PHI nodes is quite simplistic to avoid doing too much work so
an extra (not executable) edge may confuse it enough.  So this might be
"expected".  Adding a flag on whether EDGE_EXECUTABLE is to be
trusted would be an option (also helping SCCVN).

Richard.

> jeff

Re: [OpenACC 0/7] host_data construct

2015-12-02 Thread Thomas Schwinge

Hi!

Cesar and Jim copied, for help with Fortran and generally testsuite
things.

On Mon, 30 Nov 2015 19:30:34 +, Julian Brown  
wrote:
> [patch]

First, thanks!

> Tests look OK (libgomp/gcc/g++/libstdc++), and the new ones pass.

I see a regression (ICE) in gfortran.dg/goacc/coarray.f95 (done: XFAILed,
and obsolete dg-excess-errors directives removed; compare to
gfortran.dg/goacc/coarray_2.f90), and I see new FAILs for non-offloading
execution of libgomp.oacc-c-c++-common/host_data-2.c,
libgomp.oacc-c-c++-common/host_data-4.c, and
libgomp.oacc-c-c++-common/host_data-5.c (done: see below); confirmed by a
number of reports on the  and
 mailing lists.  I can understand that you
didn't see the Fortran problem if not running Fortrant testing (but
why?), but it's strange that you didn't see the libgomp C/C++ FAILs.

A few patch review items, some of which I've already addressed (see
below).

> --- a/gcc/c/c-parser.c
> +++ b/gcc/c/c-parser.c
> @@ -10279,6 +10279,8 @@ c_parser_omp_clause_name (c_parser *parser)
>   result = PRAGMA_OMP_CLAUSE_UNTIED;
> else if (!strcmp ("use_device_ptr", p))
>   result = PRAGMA_OMP_CLAUSE_USE_DEVICE_PTR;
> +   else if (!strcmp ("use_device", p))
> + result = PRAGMA_OACC_CLAUSE_USE_DEVICE;

"use_device" sorts before "use_device_ptr".  (Done.)

> @@ -12940,6 +12951,10 @@ c_parser_oacc_all_clauses (c_parser *parser, 
> omp_clause_mask mask,
> clauses = c_parser_oacc_data_clause (parser, c_kind, clauses);
> c_name = "self";
> break;
> + case PRAGMA_OACC_CLAUSE_USE_DEVICE:
> +   clauses = c_parser_oacc_clause_use_device (parser, clauses);
> +   c_name = "use_device";
> +   break;
>   case PRAGMA_OACC_CLAUSE_SEQ:
> clauses = c_parser_oacc_simple_clause (parser, OMP_CLAUSE_SEQ,
>   clauses);

Sorting?  (Done.)

> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> @@ -29232,6 +29232,8 @@ cp_parser_omp_clause_name (cp_parser *parser)
>   result = PRAGMA_OMP_CLAUSE_UNTIED;
> else if (!strcmp ("use_device_ptr", p))
>   result = PRAGMA_OMP_CLAUSE_USE_DEVICE_PTR;
> +   else if (!strcmp ("use_device", p))
> + result = PRAGMA_OACC_CLAUSE_USE_DEVICE;
> break;

Likewise.  (Done.)

> @@ -31598,6 +31600,11 @@ cp_parser_oacc_all_clauses (cp_parser *parser, 
> omp_clause_mask mask,
> clauses = cp_parser_oacc_data_clause (parser, c_kind, clauses);
> c_name = "self";
> break;
> + case PRAGMA_OACC_CLAUSE_USE_DEVICE:
> +   clauses = cp_parser_omp_var_list (parser, OMP_CLAUSE_USE_DEVICE,
> + clauses);
> +   c_name = "use_device";
> +   break;
>   case PRAGMA_OACC_CLAUSE_SEQ:
> clauses = cp_parser_oacc_simple_clause (parser, OMP_CLAUSE_SEQ,
>clauses, here);

Likewise.  (Done.)

> +#define OACC_HOST_DATA_CLAUSE_MASK   \
> +  ( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_USE_DEVICE) )
> +
> +/* OpenACC 2.0:
> +  # pragma acc host_data  new-line
> +  structured-block  */

Define OACC_HOST_DATA_CLAUSE_MASK after the "accepted syntax" comment.
(Done.)

There is no handlig of OMP_CLAUSE_USE_DEVICE in
gcc/cp/pt.c:tsubst_omp_clauses.  (Done.)

> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c

> @@ -6418,6 +6422,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
> *pre_p,
|if (!lang_GNU_Fortran ())
|  switch (code)
|{
|case OMP_TARGET:
>case OMP_TARGET_DATA:
>case OMP_TARGET_ENTER_DATA:
>case OMP_TARGET_EXIT_DATA:
> +  case OACC_HOST_DATA:
>   ctx->target_firstprivatize_array_bases = true;
>default:
>   break;

I understand it's not yet relevant/supported for OpenMP in Fortran, but
why is C/C++ vs. Fortran being handled differently here for OpenACC
host_data?

> --- a/libgomp/oacc-parallel.c
> +++ b/libgomp/oacc-parallel.c

> +void
> +GOACC_host_data (int device, size_t mapnum,
> +  void **hostaddrs, size_t *sizes, unsigned short *kinds)
> +{
> +  bool host_fallback = device == GOMP_DEVICE_HOST_FALLBACK;
> +  struct target_mem_desc *tgt;
> +
> +#ifdef HAVE_INTTYPES_H
> +  gomp_debug (0, "%s: mapnum=%"PRIu64", hostaddrs=%p, size=%p, kinds=%p\n",
> +   __FUNCTION__, (uint64_t) mapnum, hostaddrs, sizes, kinds);
> +#else
> +  gomp_debug (0, "%s: mapnum=%lu, hostaddrs=%p, sizes=%p, kinds=%p\n",
> +   __FUNCTION__, (unsigned long) mapnum, hostaddrs, sizes, kinds);
> +#endif
> +
> +  goacc_lazy_initialize ();
> +
> +  struct goacc_thread *thr = goacc_thread ();
> +  struct gomp_device_descr *acc_dev = thr->dev;
> +
> +  /* Host fallback or 'do nothing'.  */
> +  if ((acc_dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
> +  || host_fallback)
> +{
> +  tgt = gomp_map_vars (NULL, 0, NULL,

[PATCH 1/2] [graphite] rename flag_loop_optimize_isl to flag_loop_nest_optimize

2015-12-02 Thread Sebastian Pop

---
 gcc/common.opt  | 2 +-
 gcc/graphite-poly.c | 2 +-
 gcc/graphite.c  | 2 +-
 gcc/toplev.c| 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index e1617c4..e593631 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1379,7 +1379,7 @@ Common Ignore
 Does nothing. Preserved for backward compatibility.
 
 floop-nest-optimize
-Common Report Var(flag_loop_optimize_isl) Optimization
+Common Report Var(flag_loop_nest_optimize) Optimization
 Enable the ISL based loop nest optimizer.
 
 fstrict-volatile-bitfields
diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index a51aefe..f4bdd40 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -122,7 +122,7 @@ apply_poly_transforms (scop_p scop)
   if (flag_loop_parallelize_all)
 transform_done = true;
 
-  if (flag_loop_optimize_isl)
+  if (flag_loop_nest_optimize)
 transform_done |= optimize_isl (scop);
 
   return transform_done;
diff --git a/gcc/graphite.c b/gcc/graphite.c
index ee1d211..83aa88b 100644
--- a/gcc/graphite.c
+++ b/gcc/graphite.c
@@ -372,7 +372,7 @@ gate_graphite_transforms (void)
  is turned on.  */
   if (flag_graphite_identity
   || flag_loop_parallelize_all
-  || flag_loop_optimize_isl)
+  || flag_loop_nest_optimize)
 flag_graphite = 1;
 
   return flag_graphite != 0;
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 5aade2f..aee55fc 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1237,7 +1237,7 @@ process_options (void)
 
 #ifndef HAVE_isl
   if (flag_graphite
-  || flag_loop_optimize_isl
+  || flag_loop_nest_optimize
   || flag_graphite_identity
   || flag_loop_parallelize_all)
 sorry ("Graphite loop optimizations cannot be used (ISL is not available)" 
-- 
1.9.1

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Alexander Monakov

On Wed, 2 Dec 2015, Jakub Jelinek wrote:

> On Wed, Dec 02, 2015 at 08:02:47AM -0500, Nathan Sidwell wrote:
> > On 12/02/15 05:40, Jakub Jelinek wrote:
> > > Don't know the HW good enough, is there any power consumption, heat etc.
> > >difference between the two approaches?  I mean does the HW consume 
> > >different
> > >amount of power if only one thread in a warp executes code and the other
> > >threads in the same warp just jump around it, vs. having all threads busy?
> > 
> > Having all threads busy will increase power consumption.  It's also bad if
> > the other vectors are executing memory access instructions.  However, for
> 
> Then the uniform SIMT approach might not be that good idea.

Why?  Remember that the tradeoff is copying registers (and in OpenACC, stacks
too).  We don't know how the costs balance.  My intuition is that copying is
worse compared to what I'm doing.

Anyhow, for good performance the offloaded code needs to be running in vector
regions most of the time, where the concern doesn't apply.

Alexander

[PATCH] Fix PR66051

2015-12-02 Thread Richard Biener


This fixes the vectorizer part of PR66051 (a x86 target part remains
for the testcase in the PR - PR68655).  The issue is again a
misplaced check for SLP detection:

  /* Check that the size of interleaved loads group is not
 greater than the SLP group size.  */
  unsigned ncopies
= vectorization_factor / TYPE_VECTOR_SUBPARTS (vectype);
  if (is_a  (vinfo)
  && GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) == stmt
  && ((GROUP_SIZE (vinfo_for_stmt (stmt))
   - GROUP_GAP (vinfo_for_stmt (stmt)))
  > ncopies * group_size))
{
  if (dump_enabled_p ())
{
  dump_printf_loc (MSG_MISSED_OPTIMIZATION, 
vect_location,
   "Build SLP failed: the number "
   "of interleaved loads is greater 
than "
   "the SLP group size ");
  dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
stmt, 0);
  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
}
  /* Fatal mismatch.  */
  matches[0] = false;
  return false;
}

I've relaxed this multiple times but that still doesn't make it necessary.
It also uses a vectorization factor estimate as the vectorization factor
is not yet determined.  A good side-effect of the patch is that we
can get rid of that estimate completely.

Tested on the x86_64 vectorization tests sofar.

Bootstrap & regtest pending and I'll make sure SPEC CPU 2006 is
happy as well.

Thanks,
Richard.

2015-12-02  Richard Biener  

PR tree-optimization/66051
* tree-vect-slp.c (vect_build_slp_tree_1): Remove restriction
on load group size.  Do not pass in vectorization_factor.
(vect_transform_slp_perm_load): Do not require any permute support.
(vect_build_slp_tree): Do not pass in vectorization factor.
(vect_analyze_slp_instance): Do not compute vectorization
factor estimate.  Use vector size instead of vectorization factor
estimate to split store groups for BB vectorization.

* gcc.dg/vect/slp-42.c: New testcase.

Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c (revision 231167)
--- gcc/tree-vect-slp.c (working copy)
*** static bool
*** 430,437 
  vect_build_slp_tree_1 (vec_info *vinfo,
   vec stmts, unsigned int group_size,
   unsigned nops, unsigned int *max_nunits,
!  unsigned int vectorization_factor, bool *matches,
!  bool *two_operators)
  {
unsigned int i;
gimple *first_stmt = stmts[0], *stmt = stmts[0];
--- 430,436 
  vect_build_slp_tree_1 (vec_info *vinfo,
   vec stmts, unsigned int group_size,
   unsigned nops, unsigned int *max_nunits,
!  bool *matches, bool *two_operators)
  {
unsigned int i;
gimple *first_stmt = stmts[0], *stmt = stmts[0];
*** vect_build_slp_tree_1 (vec_info *vinfo,
*** 523,533 
  
/* In case of multiple types we need to detect the smallest type.  */
if (*max_nunits < TYPE_VECTOR_SUBPARTS (vectype))
! {
!   *max_nunits = TYPE_VECTOR_SUBPARTS (vectype);
!   if (is_a  (vinfo))
! vectorization_factor = *max_nunits;
! }
  
if (gcall *call_stmt = dyn_cast  (stmt))
{
--- 522,528 
  
/* In case of multiple types we need to detect the smallest type.  */
if (*max_nunits < TYPE_VECTOR_SUBPARTS (vectype))
!   *max_nunits = TYPE_VECTOR_SUBPARTS (vectype);
  
if (gcall *call_stmt = dyn_cast  (stmt))
{
*** vect_build_slp_tree_1 (vec_info *vinfo,
*** 700,730 
  else
{
  /* Load.  */
-   /* Check that the size of interleaved loads group is not
-  greater than the SLP group size.  */
- unsigned ncopies
-   = vectorization_factor / TYPE_VECTOR_SUBPARTS (vectype);
-   if (is_a  (vinfo)
- && GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) == stmt
-   && ((GROUP_SIZE (vinfo_for_stmt (stmt))
-  - GROUP_GAP (vinfo_for_stmt (stmt)))
- > ncopies * group_size))
- {
-   if (dump_enabled_p ())
- {
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-  "Build SLP failed: the number "
-  "of interleaved loads is greater than "
-

Re: [PATCH] Avoid false vector mask conversion

2015-12-02 Thread Ilya Enkovich

2015-12-02 17:52 GMT+03:00 Richard Biener :
> On Thu, Nov 12, 2015 at 5:08 PM, Ilya Enkovich  wrote:
>> Hi,
>>
>> When we use LTO for fortran we may have a mix 32bit and 1bit scalar 
>> booleans. It means we may have conversion of one scalar type to another 
>> which confuses vectorizer because values with different scalar boolean type 
>> may get the same vectype.  This patch transforms such conversions into 
>> comparison.
>>
>> I managed to make a small fortran test which gets vectorized with this patch 
>> but I didn't find how I can run fortran test with LTO and then scan tree 
>> dump to check it is vectorized.  BTW here is a loop from the test:
>>
>>   real*8 a(18)
>>   logical b(18)
>>   integer i
>>
>>   do i=1,18
>>  if(a(i).gt.0.d0) then
>> b(i)=.true.
>>  else
>> b(i)=.false.
>>  endif
>>   enddo
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2015-11-12  Ilya Enkovich  
>>
>> * tree-vect-patterns.c (vect_recog_mask_conversion_pattern):
>> Transform useless boolean conversion into assignment.
>>
>>
>> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
>> index b9d900c..62070da 100644
>> --- a/gcc/tree-vect-patterns.c
>> +++ b/gcc/tree-vect-patterns.c
>> @@ -3674,6 +3674,38 @@ vect_recog_mask_conversion_pattern (vec 
>> *stmts, tree *type_in,
>>if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE)
>>  return NULL;
>>
>> +  /* Check conversion between boolean types of different sizes.
>> + If no vectype is specified, then we have a regular mask
>> + assignment with no actual conversion.  */
>> +  if (rhs_code == CONVERT_EXPR
>
> CONVERT_EXPR_CODE_P (rhs_code)
>
>> +  && !STMT_VINFO_DATA_REF (stmt_vinfo)
>> +  && !STMT_VINFO_VECTYPE (stmt_vinfo))
>> +{
>> +  if (TREE_CODE (rhs1) != SSA_NAME)
>> +   return NULL;
>> +
>> +  rhs1_type = search_type_for_mask (rhs1, vinfo);
>> +  if (!rhs1_type)
>> +   return NULL;
>> +
>> +  vectype1 = get_mask_type_for_scalar_type (rhs1_type);
>> +
>> +  if (!vectype1)
>> +   return NULL;
>> +
>> +  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
>> +  pattern_stmt = gimple_build_assign (lhs, rhs1);
>
> So what's the actual issue here?  That the conversion is spurious?
> Why can't you accept this simply in vectorizable_assignment then?

The problem is that conversion is supposed to be handled by
vectorizable_conversion,
but it fails to because it is not actually a conversion. I suppose it
may be handled
in vectorizable_assignment but I chose this pattern because it's meant
to handle mask
conversion issues.

Thanks,
Ilya

>
> Richard.
>
>> +  *type_out = vectype1;
>> +  *type_in = vectype1;
>> +  stmts->safe_push (last_stmt);
>> +  if (dump_enabled_p ())
>> +   dump_printf_loc (MSG_NOTE, vect_location,
>> + "vect_recog_mask_conversion_pattern: detected:\n");
>> +
>> +  return pattern_stmt;
>> +}
>> +
>>if (rhs_code != BIT_IOR_EXPR
>>&& rhs_code != BIT_XOR_EXPR
>>&& rhs_code != BIT_AND_EXPR)

[PATCH^1] Add fuzzing coverage support

2015-12-02 Thread Dmitry Vyukov

On Wed, Dec 2, 2015 at 5:10 PM, Dmitry Vyukov  wrote:
> ping
>
> Number of bugs found with this coverage in kernel already crossed 40:
> https://github.com/google/syzkaller/wiki/Found-Bugs
>
>
>
>
> On Fri, Nov 27, 2015 at 3:30 PM, Dmitry Vyukov  wrote:
>> +syzkaller group
>>
>> On Fri, Nov 27, 2015 at 3:28 PM, Dmitry Vyukov  wrote:
>>> Hello,
>>>
>>> This patch adds support for coverage-guided fuzzing:
>>> https://codereview.appspot.com/280140043
>>>
>>> Coverage-guided fuzzing is a powerful randomized testing technique
>>> that uses coverage feedback to determine new interesting inputs to a
>>> program. Some examples of the systems that use it are AFL
>>> (http://lcamtuf.coredump.cx/afl/) and LibFuzzer
>>> (http://llvm.org/docs/LibFuzzer.html). Compiler coverage
>>> instrumentation allows to make fuzzing more efficient as compared to
>>> binary instrumentation.
>>>
>>> Flag that enables coverage is named -fsanitize-coverage=trace-pc to
>>> make it consistent with similar functionality in clang, which already
>>> supports a set of fuzzing coverage modes:
>>> http://clang.llvm.org/docs/SanitizerCoverage.html
>>> -fsanitize-coverage=trace-pc is not yet supported in clang, but we
>>> plan to add it.
>>>
>>> This particular coverage mode simply inserts function calls into every
>>> basic block.
>>> I've built syzkaller, a Linux system call fuzzer
>>> (https://github.com/google/syzkaller), using this functionality. The
>>> fuzzer has found 30+ previously unknown bugs in kernel
>>> (https://github.com/google/syzkaller/wiki/Found-Bugs) in slightly more
>>> than a month (while kernel was extensively fuzzed with trinity --
>>> non-guided fuzzer). Quentin also built some kernel fuzzer on top of
>>> it.
>>>
>>> Why not gcov. Typical fuzzing loop looks as follows: (1) reset
>>> coverage, (2) execute a bit of code, (3) collect coverage, repeat. A
>>> typical coverage can be just a dozen of basic blocks (e.g. invalid
>>> input). In such context gcov becomes prohibitively expensive as
>>> reset/collect coverage steps depend on total number of basic
>>> blocks/edges in program (in case of kernel it is about 2M). Cost of
>>> this "tracing" mode depends on number of executed basic blocks/edges.
>>> On top of that, kernel required per-thread coverage because there are
>>> always background threads and unrelated processes that also produce
>>> coverage. With inlined gcov instrumentation per-thread coverage is not
>>> possible.
>>> Inlined fast paths do not make lots of sense in fuzzing scenario,
>>> because lots of code is executed just once in between resets. Also it
>>> is not really possible to inline accesses to per-cpu and per-task data
>>> structures for kernel.
>>>
>>> OK for trunk?

Re: [ping] pending patches

2015-12-02 Thread Jason Merrill


On 12/02/2015 02:58 AM, Eric Botcazou wrote:

DWARF-2 (debug info for Scalar_Storage_Order attribute):
   https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01659.html


It doesn't look to me like DW_AT_endianity is applicable to array types 
or members in DWARF 3/4; instead, it should be applied to the underlying 
base type.



C++ (PR 68290: internal error with concepts):
   https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03301.html


OK, thanks.

Jason

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Nathan Sidwell


On 12/02/15 09:24, Jakub Jelinek wrote:

On Wed, Dec 02, 2015 at 09:23:11AM -0500, Nathan Sidwell wrote:

On 12/02/15 09:22, Jakub Jelinek wrote:


I believe Alex' testing revealed that if you take address of the same .local
objects in several threads, the addresses are the same, and therefore you
refer to your own .local space rather than the other thread's.


Before or after applying cvta?


I'll let Alex answer that.


Nevermind, I've run an experiment, and it appears that local addresses converted 
to generic do give the same value regardless of executing thread.  I guess that 
means that genericization of local addresses to physical memory is done late at 
the load/store insn, rather than in the cvta insn.


When I added routine support, I did wonder whether the calling routine would 
need to clone its stack frame, but determined against it using the logic I wrote 
earlier.


nathan

Re: Add fuzzing coverage support

2015-12-02 Thread Dmitry Vyukov

ping

Number of bugs found with this coverage in kernel already crossed 40:
https://github.com/google/syzkaller/wiki/Found-Bugs




On Fri, Nov 27, 2015 at 3:30 PM, Dmitry Vyukov  wrote:
> +syzkaller group
>
> On Fri, Nov 27, 2015 at 3:28 PM, Dmitry Vyukov  wrote:
>> Hello,
>>
>> This patch adds support for coverage-guided fuzzing:
>> https://codereview.appspot.com/280140043
>>
>> Coverage-guided fuzzing is a powerful randomized testing technique
>> that uses coverage feedback to determine new interesting inputs to a
>> program. Some examples of the systems that use it are AFL
>> (http://lcamtuf.coredump.cx/afl/) and LibFuzzer
>> (http://llvm.org/docs/LibFuzzer.html). Compiler coverage
>> instrumentation allows to make fuzzing more efficient as compared to
>> binary instrumentation.
>>
>> Flag that enables coverage is named -fsanitize-coverage=trace-pc to
>> make it consistent with similar functionality in clang, which already
>> supports a set of fuzzing coverage modes:
>> http://clang.llvm.org/docs/SanitizerCoverage.html
>> -fsanitize-coverage=trace-pc is not yet supported in clang, but we
>> plan to add it.
>>
>> This particular coverage mode simply inserts function calls into every
>> basic block.
>> I've built syzkaller, a Linux system call fuzzer
>> (https://github.com/google/syzkaller), using this functionality. The
>> fuzzer has found 30+ previously unknown bugs in kernel
>> (https://github.com/google/syzkaller/wiki/Found-Bugs) in slightly more
>> than a month (while kernel was extensively fuzzed with trinity --
>> non-guided fuzzer). Quentin also built some kernel fuzzer on top of
>> it.
>>
>> Why not gcov. Typical fuzzing loop looks as follows: (1) reset
>> coverage, (2) execute a bit of code, (3) collect coverage, repeat. A
>> typical coverage can be just a dozen of basic blocks (e.g. invalid
>> input). In such context gcov becomes prohibitively expensive as
>> reset/collect coverage steps depend on total number of basic
>> blocks/edges in program (in case of kernel it is about 2M). Cost of
>> this "tracing" mode depends on number of executed basic blocks/edges.
>> On top of that, kernel required per-thread coverage because there are
>> always background threads and unrelated processes that also produce
>> coverage. With inlined gcov instrumentation per-thread coverage is not
>> possible.
>> Inlined fast paths do not make lots of sense in fuzzing scenario,
>> because lots of code is executed just once in between resets. Also it
>> is not really possible to inline accesses to per-cpu and per-task data
>> structures for kernel.
>>
>> OK for trunk?

[PATCH 2/2] [graphite] fix invalid bounds on array refs

2015-12-02 Thread Sebastian Pop

While enabling graphite in -O3 we found a Fortran testcase that fails
because the max of the type domain is -1.  We used to add that as a constraint
to the elements accessed by the array, leading to a unfeasible constraint:
0 <= i <= -1.  Having that constraint, drops the data reference as that says
that there are no elements accessed in the array.
---
 gcc/graphite-dependences.c  | 49 -
 gcc/graphite-sese-to-poly.c | 98 ++---
 gcc/testsuite/gfortran.dg/graphite/run-id-3.f90 | 12 +++
 3 files changed, 114 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/graphite/run-id-3.f90

diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
index fcc771b..bb81ae3 100644
--- a/gcc/graphite-dependences.c
+++ b/gcc/graphite-dependences.c
@@ -87,7 +87,11 @@ scop_get_reads (scop_p scop, vec pbbs)
 {
   FOR_EACH_VEC_ELT (PBB_DRS (pbb), j, pdr)
if (pdr_read_p (pdr))
- res = isl_union_map_add_map (res, add_pdr_constraints (pdr, pbb));
+ {
+   if (dump_file)
+ print_pdr (dump_file, pdr);
+   res = isl_union_map_add_map (res, add_pdr_constraints (pdr, pbb));
+ }
 }
 
   return res;
@@ -108,7 +112,11 @@ scop_get_must_writes (scop_p scop, vec pbbs)
 {
   FOR_EACH_VEC_ELT (PBB_DRS (pbb), j, pdr)
if (pdr_write_p (pdr))
- res = isl_union_map_add_map (res, add_pdr_constraints (pdr, pbb));
+ {
+   if (dump_file)
+ print_pdr (dump_file, pdr);
+   res = isl_union_map_add_map (res, add_pdr_constraints (pdr, pbb));
+ }
 }
 
   return res;
@@ -129,7 +137,12 @@ scop_get_may_writes (scop_p scop, vec pbbs)
 {
   FOR_EACH_VEC_ELT (PBB_DRS (pbb), j, pdr)
if (pdr_may_write_p (pdr))
- res = isl_union_map_add_map (res, add_pdr_constraints (pdr, pbb));
+ {
+   if (dump_file)
+ print_pdr (dump_file, pdr);
+
+   res = isl_union_map_add_map (res, add_pdr_constraints (pdr, pbb));
+ }
 }
 
   return res;
@@ -313,6 +326,36 @@ compute_deps (scop_p scop, vec pbbs,
   isl_union_map *empty = isl_union_map_empty (space);
   isl_union_map *original = scop_get_original_schedule (scop, pbbs);
 
+  if (dump_file)
+{
+  fprintf (dump_file, "\n--- Documentation for datarefs dump: ---\n");
+  fprintf (dump_file, "Statements on the iteration domain are mapped to"
+  " array references.\n");
+  fprintf (dump_file, "  To read the following data references:\n\n");
+  fprintf (dump_file, "  S_5[i0] -> [106] : i0 >= 0 and i0 <= 3\n");
+  fprintf (dump_file, "  S_8[i0] -> [1, i0] : i0 >= 0 and i0 <= 3\n\n");
+
+  fprintf (dump_file, "  S_5[i0] is the dynamic instance of statement"
+  " bb_5 in a loop that accesses all iterations 0 <= i0 <= 3.\n");
+  fprintf (dump_file, "  [1, i0] is a 'memref' with alias set 1"
+  " and first subscript access i0.\n");
+  fprintf (dump_file, "  [106] is a 'scalar reference' which is the sum of"
+  " SSA_NAME_VERSION 6"
+  " and --param graphite-max-arrays-per-scop=100\n");
+  fprintf (dump_file, "---\n\n");
+
+  fprintf (dump_file, "data references (\n");
+  fprintf (dump_file, "  reads: ");
+  print_isl_union_map (dump_file, reads);
+  fprintf (dump_file, "  must_writes: ");
+  print_isl_union_map (dump_file, must_writes);
+  fprintf (dump_file, "  may_writes: ");
+  print_isl_union_map (dump_file, may_writes);
+  fprintf (dump_file, "  all_writes: ");
+  print_isl_union_map (dump_file, all_writes);
+  fprintf (dump_file, ")\n");
+}
+
   isl_union_map_compute_flow (isl_union_map_copy (reads),
  isl_union_map_copy (must_writes),
  isl_union_map_copy (may_writes),
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 7ef01fb..887c212 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -922,6 +922,33 @@ pdr_add_memory_accesses (isl_map *acc, dr_info )
   return acc;
 }
 
+/* Return true when the LOW and HIGH bounds of an array reference REF are valid
+   to extract constraints on accessed elements of the array.  Returning false 
is
+   the conservative answer.  */
+
+static bool
+bounds_are_valid (tree ref, tree low, tree high)
+{
+  if (!high)
+return false;
+
+  if (!tree_fits_shwi_p (low)
+  || !tree_fits_shwi_p (high))
+return false;
+
+  /* 1-element arrays at end of structures may extend over
+ their declared size.  */
+  if (array_at_struct_end_p (ref)
+  && operand_equal_p (low, high, 0))
+return false;
+
+  /* Fortran has some arrays where high bound is -1 and low is 0.  */
+  if (integer_onep (fold_build2 (LT_EXPR, boolean_type_node, high, low)))
+return false;
+
+  return true;
+}
+
 /* Add constrains representing

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Jakub Jelinek

On Wed, Dec 02, 2015 at 06:44:11PM +0300, Alexander Monakov wrote:
> > But you never know if people actually use #pragma omp simd regions or not,
> > sometimes they will, sometimes they won't, and if the uniform SIMT
> increases
> > power consumption, it might not be desirable.
> 
> It's easy to address: just terminate threads 1-31 if the linked image has
> no SIMD regions, like my pre-simd libgomp was doing.

Well, can't say the linked image in one shared library call a function
in another linked image in another shared library?  Or is that just not
supported for PTX?  I believe XeonPhi supports that.

If each linked image is self-contained, then that is probably a good idea,
but still you could have a single simd region somewhere and lots of other
target regions that don't use simd, or cases where only small amount of time
is spent in a simd region and this wouldn't help in that case.

If the addressables are handled through soft stack, then the rest is mostly
just SSA_NAMEs you can see on the edges of the SIMT region, that really
shouldn't be that expensive to broadcast or reduce back.

Jakub

Re: [gomp-nvptx 9/9] adjust SIMD loop lowering for SIMT targets

2015-12-02 Thread Alexander Monakov

On Wed, 2 Dec 2015, Jakub Jelinek wrote:
> Just wanted to see -fdump-tree-ompexp dump say from the testcase I've
> posted.  Does your patchset have any dependencies that aren't on the trunk?
> If not, I guess I just could apply the patchset and look at the results, but
> if there are, it would need applying more.

Hm, the testcase has a reduction, which would cause the loop have a _SIMDUID
clause, which would in turn make my patch give up, setting do_simt_transform
to false.  So I'm using presence of SIMDUID to see whether the loop has any
reduction/lastprivate data, which I'm not handling for SIMT yet.

(I should really start a branch)

Alexander

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Alexander Monakov



On Wed, 2 Dec 2015, Jakub Jelinek wrote:

> On Wed, Dec 02, 2015 at 09:23:11AM -0500, Nathan Sidwell wrote:
> > On 12/02/15 09:22, Jakub Jelinek wrote:
> > 
> > >I believe Alex' testing revealed that if you take address of the same 
> > >.local
> > >objects in several threads, the addresses are the same, and therefore you
> > >refer to your own .local space rather than the other thread's.
> > 
> > Before or after applying cvta?
> 
> I'll let Alex answer that.

Both before and after, see this email:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02081.html

Alexander

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Nathan Sidwell


On 12/02/15 09:41, Alexander Monakov wrote:

On Wed, 2 Dec 2015, Nathan Sidwell wrote:


On 12/02/15 05:40, Jakub Jelinek wrote:

Don't know the HW good enough, is there any power consumption, heat etc.
difference between the two approaches?  I mean does the HW consume different
amount of power if only one thread in a warp executes code and the other
threads in the same warp just jump around it, vs. having all threads busy?


Having all threads busy will increase power consumption. >


Is that from general principles (i.e. "if it doesn't increase power
consumption, the GPU is poorly optimized"), or is that based on specific
knowledge on how existing GPUs operate (presumably reverse-engineered or
privately communicated -- I've never seen any public statements on this
point)?


Nvidia told me.


The only certain case I imagine is instructions that go to SFU rather than
normal SPs -- but those are relatively rare.


It's also bad if the other vectors are executing memory access instructions.


How so?  The memory accesses are the same independent of whether you reading
the same data from 1 thread or 32 synchronous threads.


Nvidia told me.

Re: [PING^2][PATCH] Improve C++ loop's backward-jump location

2015-12-02 Thread Andreas Arnez

On Wed, Dec 02 2015, Alan Lawrence wrote:

[...]

> Since this, we've been seeing these tests fail natively on AArch64 and ARM:
>
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++11  gcov: 3 failures in line
> counts, 0 in branch percentages, 0 in return percentages, 0 in
> intermediate format
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++11  line 115: is 27:should be 14
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++11  line 58: is 18:should be 9
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++11  line 73: is 162:should be 81
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++14  gcov: 3 failures in line
> counts, 0 in branch percentages, 0 in return percentages, 0 in
> intermediate format
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++14  line 115: is 27:should be 14
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++14  line 58: is 18:should be 9
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++14  line 73: is 162:should be 81
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++98  gcov: 3 failures in line
> counts, 0 in branch percentages, 0 in return percentages, 0 in
> intermediate format
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++98  line 115: is 27:should be 14
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++98  line 58: is 18:should be 9
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++98  line 73: is 162:should be 81

Right, sorry about that.  See also:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68603

This should be fixed now.  Let me know if you still see problems.

--
Andreas

Re: [PATCH] Avoid false vector mask conversion

2015-12-02 Thread Richard Biener

On Thu, Nov 12, 2015 at 5:08 PM, Ilya Enkovich  wrote:
> Hi,
>
> When we use LTO for fortran we may have a mix 32bit and 1bit scalar booleans. 
> It means we may have conversion of one scalar type to another which confuses 
> vectorizer because values with different scalar boolean type may get the same 
> vectype.  This patch transforms such conversions into comparison.
>
> I managed to make a small fortran test which gets vectorized with this patch 
> but I didn't find how I can run fortran test with LTO and then scan tree dump 
> to check it is vectorized.  BTW here is a loop from the test:
>
>   real*8 a(18)
>   logical b(18)
>   integer i
>
>   do i=1,18
>  if(a(i).gt.0.d0) then
> b(i)=.true.
>  else
> b(i)=.false.
>  endif
>   enddo
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?
>
> Thanks,
> Ilya
> --
> gcc/
>
> 2015-11-12  Ilya Enkovich  
>
> * tree-vect-patterns.c (vect_recog_mask_conversion_pattern):
> Transform useless boolean conversion into assignment.
>
>
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index b9d900c..62070da 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -3674,6 +3674,38 @@ vect_recog_mask_conversion_pattern (vec 
> *stmts, tree *type_in,
>if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE)
>  return NULL;
>
> +  /* Check conversion between boolean types of different sizes.
> + If no vectype is specified, then we have a regular mask
> + assignment with no actual conversion.  */
> +  if (rhs_code == CONVERT_EXPR

CONVERT_EXPR_CODE_P (rhs_code)

> +  && !STMT_VINFO_DATA_REF (stmt_vinfo)
> +  && !STMT_VINFO_VECTYPE (stmt_vinfo))
> +{
> +  if (TREE_CODE (rhs1) != SSA_NAME)
> +   return NULL;
> +
> +  rhs1_type = search_type_for_mask (rhs1, vinfo);
> +  if (!rhs1_type)
> +   return NULL;
> +
> +  vectype1 = get_mask_type_for_scalar_type (rhs1_type);
> +
> +  if (!vectype1)
> +   return NULL;
> +
> +  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> +  pattern_stmt = gimple_build_assign (lhs, rhs1);

So what's the actual issue here?  That the conversion is spurious?
Why can't you accept this simply in vectorizable_assignment then?

Richard.

> +  *type_out = vectype1;
> +  *type_in = vectype1;
> +  stmts->safe_push (last_stmt);
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_NOTE, vect_location,
> + "vect_recog_mask_conversion_pattern: detected:\n");
> +
> +  return pattern_stmt;
> +}
> +
>if (rhs_code != BIT_IOR_EXPR
>&& rhs_code != BIT_XOR_EXPR
>&& rhs_code != BIT_AND_EXPR)

Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path

2015-12-02 Thread Jeff Law


On 12/02/2015 02:54 AM, Richard Biener wrote:

Deferring to cfg_cleanup works because if cfg_cleanup does anything, it sets
LOOPS_NEED_FIXUP (which we were trying to avoid in DOM).  So it seems that
the gyrations we often do to avoid LOOPS_NEED_FIXUP are probably not all
that valuable in the end.  Anyway...


Yeah, I have partially working patches lying around to "fix" CFG cleanup to
avoid this.  Of course in the case of new loops appearing that's not easily
possible.
And that may argue that it's largely inevitable if we collapse a 
conditional (and thus delete an edge).






There's some fallout which I'm still exploring.  For example, we have cases
where removal of the edge by DOM results in removal of a PHI argument in the
target, which in turn results in the PHI becoming a degenerate which we can
then propagate away.  I have a possible solution for this that I'm playing
with.

I suspect the right path is to continue down this path.


Yeah, the issue here is that DOM isn't tracking which edges are executable
to handle merge PHIs (or to aovid doing work in unreachable regions).

Right.


It should

be possible to make it do that much like I extended SCCVN to do this
(when doing the DOM walk see if any incoming edge is marked executable
and if not, mark all outgoing edges as not executable, if the block is
executable
at the time we process the last stmt determine if we can compute the edge
that ends up always executed and mark all others as not executable)
Essentially yes. I'm using the not-executable flag and bypassing things 
when it's discovered.


The most interesting side effect, and one I haven't fully analyzed yet 
is an unexpected jump thread -- which I've traced back to differences in 
what the alias oracle is able to find when we walk unaliased vuses. 
Which makes totally no sense that it's unable to find the unaliased vuse 
in the simplified CFG, but finds it when we don't remove the 
unexecutable edge.  As I said, it makes no sense to me yet and I'm still 
digging.


jeff

[gomp4] Merge trunk r231117 (2015-12-01) into gomp-4_0-branch

2015-12-02 Thread Thomas Schwinge

Hi!

Committed to gomp-4_0-branch in r231186:

commit e08db3c495b2447b7b3dacc61e9c297fd736a061
Merge: 32f9dce ce182d9
Author: tschwinge 
Date:   Wed Dec 2 16:16:40 2015 +

svn merge -r 231075:231117 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@231186 
138bc75d-0d04-0410-961f-82ee72b054a4


Grüße
 Thomas

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Alexander Monakov

On Wed, 2 Dec 2015, Nathan Sidwell wrote:

> On 12/02/15 05:40, Jakub Jelinek wrote:
> > Don't know the HW good enough, is there any power consumption, heat etc.
> > difference between the two approaches?  I mean does the HW consume different
> > amount of power if only one thread in a warp executes code and the other
> > threads in the same warp just jump around it, vs. having all threads busy?
> 
> Having all threads busy will increase power consumption. >

Is that from general principles (i.e. "if it doesn't increase power
consumption, the GPU is poorly optimized"), or is that based on specific
knowledge on how existing GPUs operate (presumably reverse-engineered or
privately communicated -- I've never seen any public statements on this
point)?

The only certain case I imagine is instructions that go to SFU rather than
normal SPs -- but those are relatively rare.

> It's also bad if the other vectors are executing memory access instructions.

How so?  The memory accesses are the same independent of whether you reading
the same data from 1 thread or 32 synchronous threads.

Alexander

Re: [PTX] simplify movs

2015-12-02 Thread Bernd Schmidt


On 12/02/2015 04:09 PM, Nathan Sidwell wrote:

The PTX md file goes to a lot of effort handling SC and DC movs,
including for unspecs to mov low and high parts around.  However, these
code paths are not exercised in any gcc test or the build of newlib.
The generic handling of these movs deals with type punning, (using the
stack frame, if needed).  There doesn't appear a need for a separate
punbuffer.

Thus this patch deletes a lot of that machinery.


Hmm, that was definitely necessary at one point. I wonder what changed?


Bernd

[PATCH] PR c/68637: Rebuild array with the updated function pointer type

2015-12-02 Thread H.J. Lu

On Tue, Dec 1, 2015 at 11:35 AM, H.J. Lu  wrote:
> When we apply function attribute to array of function pointer, we
> need to rebuild array with the updated function pointer type.
>
> gcc/
>
> PR c/68637
> * attribs.c (decl_attributes): Rebuid array with the updated
> * function pointer type.
>
> gcc/testsuite/
>
> PR c/68637
> * gcc.target/i386/pr68637.c: New test.

Here is the patch with run-time test.

-- 
H.J.
From 4e86b66144fba46b9b2ae88cf8d31179024ba7b8 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 1 Dec 2015 11:29:39 -0800
Subject: [PATCH] Rebuild array with the updated function pointer type

When we apply function attribute to array of function pointer, we need
to rebuild array with the updated function pointer type.

gcc/

	PR c/68637
	* attribs.c (decl_attributes): Rebuild array with the updated
	function pointer type.

gcc/testsuite/

	PR c/68637
	* gcc.target/i386/pr68637-1.c: New test.
	* gcc.target/i386/pr68637-2.c: Likewise.
---
 gcc/attribs.c | 18 +-
 gcc/testsuite/gcc.target/i386/pr68637-1.c | 10 ++
 gcc/testsuite/gcc.target/i386/pr68637-2.c | 25 +
 3 files changed, 52 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68637-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68637-2.c

diff --git a/gcc/attribs.c b/gcc/attribs.c
index affb21d..0be5ebf 100644
--- a/gcc/attribs.c
+++ b/gcc/attribs.c
@@ -494,9 +494,18 @@ decl_attributes (tree *node, tree attributes, int flags)
 	  flags &= ~(int) ATTR_FLAG_TYPE_IN_PLACE;
 	}
 
+  tree array_type = (TREE_CODE (*anode) == ARRAY_TYPE
+			 ? *anode
+			 : NULL_TREE);
+
   if (spec->function_type_required && TREE_CODE (*anode) != FUNCTION_TYPE
 	  && TREE_CODE (*anode) != METHOD_TYPE)
 	{
+	  /* We need to rebuid array with the updated function pointer
+	 type later. */
+	  if (array_type)
+	*anode = TREE_TYPE (*anode);
+
 	  if (TREE_CODE (*anode) == POINTER_TYPE
 	  && (TREE_CODE (TREE_TYPE (*anode)) == FUNCTION_TYPE
 		  || TREE_CODE (TREE_TYPE (*anode)) == METHOD_TYPE))
@@ -617,7 +626,14 @@ decl_attributes (tree *node, tree attributes, int flags)
 	  if (fn_ptr_quals)
 	fn_ptr_tmp = build_qualified_type (fn_ptr_tmp, fn_ptr_quals);
 	  if (DECL_P (*node))
-	TREE_TYPE (*node) = fn_ptr_tmp;
+	{
+	  if (array_type)
+		TREE_TYPE (*node)
+		  = build_array_type (fn_ptr_tmp,
+  TYPE_DOMAIN (array_type));
+	  else
+		TREE_TYPE (*node) = fn_ptr_tmp;
+	}
 	  else
 	{
 	  gcc_assert (TREE_CODE (*node) == POINTER_TYPE);
diff --git a/gcc/testsuite/gcc.target/i386/pr68637-1.c b/gcc/testsuite/gcc.target/i386/pr68637-1.c
new file mode 100644
index 000..b1661fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr68637-1.c
@@ -0,0 +1,10 @@
+/* { dg-do run { target ia32 } } */
+/* { dg-additional-sources pr68637-2.c } */
+
+extern void (*bar[]) (int, int) __attribute__ ((regparm (2)));
+
+void
+foo (void)
+{
+  bar[0] (1, 2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr68637-2.c b/gcc/testsuite/gcc.target/i386/pr68637-2.c
new file mode 100644
index 000..7d9654d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr68637-2.c
@@ -0,0 +1,25 @@
+/* { dg-do compile { target ia32 } } */
+
+static void 
+__attribute__ ((regparm (2)))
+bar0 (int i, int j)
+{
+  if (i != 1 || j != 2)
+__builtin_abort ();
+}
+
+typedef void (*func_t) (int, int) __attribute__ ((regparm (2)));
+
+func_t bar[] =
+{
+  bar0,
+};
+
+extern void foo (void);
+
+int
+main ()
+{
+  foo ();
+  return 0;
+}
-- 
2.5.0

DW_AT_endianity (was Re: [ping] pending patches)

2015-12-02 Thread Jason Merrill


On 12/02/2015 11:44 AM, Eric Botcazou wrote:

It doesn't look to me like DW_AT_endianity is applicable to array types
or members in DWARF 3/4; instead, it should be applied to the underlying
base type.


Yeah, I agreed but Jakub didn't and I trust him more than myself here. :-)
   https://gcc.gnu.org/ml/gcc/2015-06/msg00143.html
   https://gcc.gnu.org/ml/gcc/2015-06/msg00144.html
   https://gcc.gnu.org/ml/gcc/2015-06/msg00147.html



For the C family of languages where array types are not first-class citizens,
the SSO attribute simply cannot be supported on array types.  But for other
languages, like Ada, where they are, it can and it is supported on the branch,
so you can declare an array of 4 integers with big-endian storage order.


OK, but why not represent this as an array of big-endian integers?

Jason

Re: [PR67335] drop dummy zero from reverse VTA ops, fix infinite recursion

2015-12-02 Thread Jeff Law


On 11/26/2015 04:45 PM, Alexandre Oliva wrote:

VTA's cselib expression hashing compares expressions with the same
hash before adding them to the hash table.  When there is a collision
involving a self-referencing expression, we could get infinite
recursion, in spite of the cycle breakers already in place.  The
problem is currently latent in the trunk, because by chance we don't
get a collision.

Such value cycles are often introduced by reverse_op; most often,
they're indirect, and then value canonicalization takes care of the
cycle, but if the reverse operation simplifies to the original value,
we used to issue a (plus V (const_int 0)), because at some point
adding a plain value V to a location list as a reverse_op equivalence
caused other problems.

(Jakub, do you by any chance still remember what those problems were,
  some 5+ years ago?)

This dummy zero, in turn, caused the value canonicalizer to not fully
realize the equivalence, leading to more complex graphs and,
occasionally, to infinite recursion when comparing such
value-plus-zero expressions recursively.

Simply using V solves the infinite recursion from the PR testcase,
since the extra equivalence and the preexisting value canonicalization
together prevent recursion while the unrecognized equivalence
wouldn't, but it exposed another infinite recursion in
memrefs_conflict_p: get_addr had a cycle breaker in place, to skip RTL
referencing values introduced after the one we're examining, but it
wouldn't break the cycle if the value itself appeared in the
expression being examined.

After removing the dummy zero above, this kind of cycle in the
equivalence graph is no longer introduced by VTA itself, but dummy
zeros are also present in generated code, such as in the 32-bit x86's
pro_epilogue_adjust_stack_si_add epilogue insn generated as part of
the builtin longjmp in _Unwind_RaiseException building libgcc's
unwind-dw2.o.  So, break the recursion cycle for them too.


for  gcc/ChangeLog

PR debug/67355
* var-tracking.c (reverse_op): Don't add dummy zero to reverse
ops that simplify back to the original value.
* alias.c (refs_newer_value_p): Cut off recursion for
expressions containing the original value.

OK.

Presumably there's no good way to directly test this, even though we 
have a good sense of what's happening.  Thus no testcase.


Insert plug about unit testing here.

Jeff

Re: [ping] pending patches

2015-12-02 Thread Eric Botcazou

> It doesn't look to me like DW_AT_endianity is applicable to array types
> or members in DWARF 3/4; instead, it should be applied to the underlying
> base type.

Yeah, I agreed but Jakub didn't and I trust him more than myself here. :-)
  https://gcc.gnu.org/ml/gcc/2015-06/msg00143.html
  https://gcc.gnu.org/ml/gcc/2015-06/msg00144.html
  https://gcc.gnu.org/ml/gcc/2015-06/msg00147.html

-- 
Eric Botcazou

Re: [UPC 01/22] front-end changes

2015-12-02 Thread Gary Funck

On 12/01/15 09:12:44, Eric Botcazou wrote:
> > All languages (c, c++, fortran, go, lto, objc, obj-c++) have been
> > bootstrapped; no test suite regressions were introduced,
> > relative to the GCC trunk.
> 
> That's not all languages though, Ada and Java are missing.

Full bootstrap, no regressions relative to trunk for:
  ada,c,c++,fortran,go,java,lto,objc,obj-c++

Didn't do jit in this build, will add it in.

Re: Add fuzzing coverage support

2015-12-02 Thread Bernd Schmidt


On 12/02/2015 05:55 PM, Dmitry Vyukov wrote:

Can you point to some concrete coding style violations (besides
function comments)?

  (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \
-   | SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT)))
+   | SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT) \
+   || flag_sanitize_coverage))


The || should line up with the other condition (i.e. the part starting 
with flag_sanitize).



+unsigned sancov_pass (function *fun)


Split the line after the return type.


+
+template
+class pass_sancov : public gimple_opt_pass
+{


This seems to be a new idiom but I find it OK. One thing to consider 
would be whether you really need this split between O0/optimize 
versions, or whether you can find a place in the queue where to insert 
it unconditionally. Have you considered this at all or did you just 
follow asan/tsan?



+public:
+  static pass_data pd ()
+  {
+static const pass_data data =


I think a static data member would be better than the unnecessary pd () 
function. This is also unlike existing practice, and I wonder how others 
think about it. IMO a fairly strong case could be made that if we're 
using C++, then this sort of thing ought to be part of the class definition.



Bernd

Re: RFC: c-common PATCHes to fully fold warning function arguments

2015-12-02 Thread Jason Merrill


On 12/02/2015 12:50 PM, Joseph Myers wrote:

On Wed, 2 Dec 2015, Jason Merrill wrote:


As richi pointed out in another thread, calling fold in warn_tautological_cmp
is problematic because fold expects operands of its argument to have already
been folded; we really need to call *_fully_fold.

The first patch moves c_fully_fold into the C front end, and defines it for
C++ as well.


The new file needs copyright and license notices.


So it does, thanks.


The second patch defines fold_for_warn to call c_fully_fold and uses it in
places in the warn_* functions that want to know if the operand has a constant
value.

Joseph, are these changes OK for C?  The extra folding isn't currently
necessary for C, should fold_for_warn check c_dialect_cxx()?


I think it should check that, yes.


Done.

Are these OK?

Jason

commit 4facfe1ac9fe192fbfc0ac57868652af2ca2be16
Author: Jason Merrill 
Date:   Thu Nov 26 07:45:03 2015 -0500

	Define c_fully_fold separately for C and C++.

gcc/c-family/
	* c-common.c (c_disable_warnings, c_enable_warnings, c_fully_fold)
	(c_fully_fold_internal, decl_constant_value_for_optimization):
	Move to c/c-fold.c.
	* c-common.h: Don't declare decl_constant_value_for_optimization.
gcc/c/
	* c-fold.c (c_disable_warnings, c_enable_warnings, c_fully_fold)
	(c_fully_fold_internal, decl_constant_value_for_optimization):
	Move from c-common.c.
	* c-tree.h: Declare decl_constant_value_for_optimization.
	* Make-lang.in (C_AND_OBJC_OBJS): Add c-fold.o.
gcc/cp/
	* cp-gimplify.c (c_fully_fold): Define.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 7a9e3b9..40f86e3 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -299,7 +299,6 @@ const struct fname_var_t fname_vars[] =
 /* Global visibility options.  */
 struct visibility_flags visibility_options;
 
-static tree c_fully_fold_internal (tree expr, bool, bool *, bool *, bool);
 static tree check_case_value (location_t, tree);
 static bool check_case_bounds (location_t, tree, tree, tree *, tree *,
 			   bool *);
@@ -1095,582 +1094,6 @@ fix_string_type (tree value)
   return value;
 }
 
-/* If DISABLE is true, stop issuing warnings.  This is used when
-   parsing code that we know will not be executed.  This function may
-   be called multiple times, and works as a stack.  */
-
-static void
-c_disable_warnings (bool disable)
-{
-  if (disable)
-{
-  ++c_inhibit_evaluation_warnings;
-  fold_defer_overflow_warnings ();
-}
-}
-
-/* If ENABLE is true, reenable issuing warnings.  */
-
-static void
-c_enable_warnings (bool enable)
-{
-  if (enable)
-{
-  --c_inhibit_evaluation_warnings;
-  fold_undefer_and_ignore_overflow_warnings ();
-}
-}
-
-/* Fully fold EXPR, an expression that was not folded (beyond integer
-   constant expressions and null pointer constants) when being built
-   up.  If IN_INIT, this is in a static initializer and certain
-   changes are made to the folding done.  Clear *MAYBE_CONST if
-   MAYBE_CONST is not NULL and EXPR is definitely not a constant
-   expression because it contains an evaluated operator (in C99) or an
-   operator outside of sizeof returning an integer constant (in C90)
-   not permitted in constant expressions, or because it contains an
-   evaluated arithmetic overflow.  (*MAYBE_CONST should typically be
-   set to true by callers before calling this function.)  Return the
-   folded expression.  Function arguments have already been folded
-   before calling this function, as have the contents of SAVE_EXPR,
-   TARGET_EXPR, BIND_EXPR, VA_ARG_EXPR, OBJ_TYPE_REF and
-   C_MAYBE_CONST_EXPR.  */
-
-tree
-c_fully_fold (tree expr, bool in_init, bool *maybe_const)
-{
-  tree ret;
-  tree eptype = NULL_TREE;
-  bool dummy = true;
-  bool maybe_const_itself = true;
-  location_t loc = EXPR_LOCATION (expr);
-
-  /* This function is not relevant to C++ because C++ folds while
- parsing, and may need changes to be correct for C++ when C++
- stops folding while parsing.  */
-  if (c_dialect_cxx ())
-gcc_unreachable ();
-
-  if (!maybe_const)
-maybe_const = 
-  if (TREE_CODE (expr) == EXCESS_PRECISION_EXPR)
-{
-  eptype = TREE_TYPE (expr);
-  expr = TREE_OPERAND (expr, 0);
-}
-  ret = c_fully_fold_internal (expr, in_init, maybe_const,
-			   _const_itself, false);
-  if (eptype)
-ret = fold_convert_loc (loc, eptype, ret);
-  *maybe_const &= maybe_const_itself;
-  return ret;
-}
-
-/* Internal helper for c_fully_fold.  EXPR and IN_INIT are as for
-   c_fully_fold.  *MAYBE_CONST_OPERANDS is cleared because of operands
-   not permitted, while *MAYBE_CONST_ITSELF is cleared because of
-   arithmetic overflow (for C90, *MAYBE_CONST_OPERANDS is carried from
-   both evaluated and unevaluated subexpressions while
-   *MAYBE_CONST_ITSELF is carried from only evaluated
-   subexpressions).  FOR_INT_CONST indicates if EXPR is an expression
-

Fix TYPE_MAIN_VARIANT construction for arrays of qualified typedefs (PR c/68162)

2015-12-02 Thread Joseph Myers

PR c/68162 reports a spurious warning about incompatible types
involving arrays of const double, constructed in one place using a
typedef for const double and in another place literally using const
double.

The problem is that the array of the typedef was incorrectly
constructed without a TYPE_MAIN_VARIANT being an array of unqualified
elements as it should be (though it seems some more recent change
resulted in this producing incorrect diagnostics, likely the support
for C++-style handling of arrays of qualified type).  This patch fixes
the logic in grokdeclarator to determine first_non_attr_kind, which is
used to determine whether it is necessary to use the TYPE_MAIN_VARIANT
of the type in the declaration specifiers.

However, fixing that logic introduces a failure of
gcc.dg/debug/dwarf2/pr47939-4.c, a test introduced along with
first_non_attr_kind.  Thus, it is necessary to track the original
qualified typedef when qualifying an array type, to use it rather than
a newly-constructed type, to avoid regressing regarding typedef names
in debug info.  This is done along lines I suggested in
: track the
original type and the number of levels of array indirection at which
it appears, and, in possibly affected cases, pass extra arguments to
c_build_qualified_type (with default arguments to avoid needing to
pass those extra arguments explicitly everywhere).  Given Richard's
recent fix to dwarf2out.c, this allows the C bug to be fixed without
causing debug information regressions.

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  Applied to 
mainline.

gcc/c:
2015-12-02  Joseph Myers  

PR c/68162
* c-decl.c (grokdeclarator): Set first_non_attr_kind before
following link from declarator to next declarator.  Track original
qualified type and pass it to c_build_qualified_type.
* c-typeck.c (c_build_qualified_type): Add arguments
orig_qual_type and orig_qual_indirect.

gcc/c-family:
2015-12-02  Joseph Myers  

PR c/68162
* c-common.h (c_build_qualified_type): Add extra default
arguments.

gcc/cp:
2015-12-02  Joseph Myers  

PR c/68162
* tree.c (c_build_qualified_type): Add extra arguments.

gcc/testsuite:
2015-12-02  Joseph Myers  

PR c/68162
* gcc.dg/pr68162-1.c: New test.

Index: gcc/c/c-decl.c
===
--- gcc/c/c-decl.c  (revision 231075)
+++ gcc/c/c-decl.c  (working copy)
@@ -5351,6 +5351,8 @@ grokdeclarator (const struct c_declarator *declara
   tree returned_attrs = NULL_TREE;
   bool bitfield = width != NULL;
   tree element_type;
+  tree orig_qual_type = NULL;
+  size_t orig_qual_indirect = 0;
   struct c_arg_info *arg_info = 0;
   addr_space_t as1, as2, address_space;
   location_t loc = UNKNOWN_LOCATION;
@@ -5389,9 +5391,9 @@ grokdeclarator (const struct c_declarator *declara
case cdk_function:
case cdk_pointer:
  funcdef_syntax = (decl->kind == cdk_function);
- decl = decl->declarator;
  if (first_non_attr_kind == cdk_attrs)
first_non_attr_kind = decl->kind;
+ decl = decl->declarator;
  break;
 
case cdk_attrs:
@@ -5513,12 +5515,17 @@ grokdeclarator (const struct c_declarator *declara
   if ((TREE_CODE (type) == ARRAY_TYPE
|| first_non_attr_kind == cdk_array)
   && TYPE_QUALS (element_type))
-type = TYPE_MAIN_VARIANT (type);
+{
+  orig_qual_type = type;
+  type = TYPE_MAIN_VARIANT (type);
+}
   type_quals = ((constp ? TYPE_QUAL_CONST : 0)
| (restrictp ? TYPE_QUAL_RESTRICT : 0)
| (volatilep ? TYPE_QUAL_VOLATILE : 0)
| (atomicp ? TYPE_QUAL_ATOMIC : 0)
| ENCODE_QUAL_ADDR_SPACE (address_space));
+  if (type_quals != TYPE_QUALS (element_type))
+orig_qual_type = NULL_TREE;
 
   /* Applying the _Atomic qualifier to an array type (through the use
  of typedefs or typeof) must be detected here.  If the qualifier
@@ -6013,6 +6020,7 @@ grokdeclarator (const struct c_declarator *declara
array_ptr_attrs = NULL_TREE;
array_parm_static = 0;
  }
+   orig_qual_indirect++;
break;
  }
case cdk_function:
@@ -6022,6 +6030,7 @@ grokdeclarator (const struct c_declarator *declara
   attributes.  */
bool really_funcdef = false;
tree arg_types;
+   orig_qual_type = NULL_TREE;
if (funcdef_flag)
  {
const struct c_declarator *t = declarator->declarator;
@@ -6122,7 +6131,9 @@ grokdeclarator (const struct c_declarator *declara
  pedwarn (loc, OPT_Wpedantic,
   "ISO C forbids qualified function types");
if (type_quals)
-

Re: [UPC 02/22] tree-related changes

2015-12-02 Thread Gary Funck

On 12/01/15 12:26:32, Richard Biener wrote:
> On Mon, 30 Nov 2015, Gary Funck wrote:
> > -struct GTY(()) tree_type_common {
> > +struct GTY((user)) tree_type_common {
> >struct tree_common common;
> >tree size;
> >tree size_unit;
> > @@ -1441,10 +1458,10 @@ struct GTY(()) tree_type_common {
> >tree pointer_to;
> >tree reference_to;
> >union tree_type_symtab {
> > -int GTY ((tag ("TYPE_SYMTAB_IS_ADDRESS"))) address;
> > -const char * GTY ((tag ("TYPE_SYMTAB_IS_POINTER"))) pointer;
> > -struct die_struct * GTY ((tag ("TYPE_SYMTAB_IS_DIE"))) die;
> > -  } GTY ((desc ("debug_hooks->tree_type_symtab_field"))) symtab;
> > +int address;
> > +const char *pointer;
> > +struct die_struct *die;
> > +  } symtab;
>
> Err, you don't have debug info for this?  What is address?

Not sure what you mean.  The 'die' field is retained.
Is there something in the semantics of "GTY(( ((tag "
that relates to debug information?

> I do not like the explict GC of tree_type_common.

I'm not a fan either.

The gist is that we needed a map from tree nodes to tree nodes
to record the "layout qualifier" for layout qualifiers with
a value greater than one.  But when the garbage collector ran
over the hash table that maps integer constants to tree nodes,
it didn't know that the constant was being referenced by the
layout qualifier tree map.

We described the issue here:
https://gcc.gnu.org/ml/gcc-patches/2011-10/msg00800.html

The conclusion that we reached is that when tree nodes
were walked, we needed to check if there was a
tree node -> integer constant mapping, the integer constant map
(used to make tree nodes used to hold CST's unique)
needed to be marked to keep the CST mapping from going away.

This led to the conclusion that a custom GC routine was
needed for tree nodes.  Maybe that conclusion is wrong or
there is a better way to do things?

> > ===
> > --- gcc/tree-pretty-print.c (.../trunk) (revision 231059)
> > +++ gcc/tree-pretty-print.c (.../branches/gupc) (revision 231080)
> > @@ -1105,6 +1105,25 @@ dump_block_node (pretty_printer *pp, tre
> >  }
> >  
> >  
> > +static void
> > +dump_upc_type_quals (pretty_printer *buffer, tree type, int quals)
>
> Functions need comments.

OK.  Missed that one.  Will check on others.

> > Index: gcc/tree-sra.c
> > ===
> > --- gcc/tree-sra.c  (.../trunk) (revision 231059)
> > +++ gcc/tree-sra.c  (.../branches/gupc) (revision 231080)
> > @@ -3882,6 +3882,7 @@ find_param_candidates (void)
> >  
> >   if (TREE_CODE (type) == FUNCTION_TYPE
> >   || TYPE_VOLATILE (type)
> > + || SHARED_TYPE_P (type)
> 
> UPC_SHARED_TYPE_P ()

OK. As I mentioned in a previous reply, originally we prefixed
all "UPC" specific tree node fields and functions with UPC_ or upc_,
but as we transitioned away from UPC as a separate language
(ala ObjC) and made compilation conditional upon -fupc, an
observation was made off list that since the base tree nodes
are generic that naming UPC-related fields with "UPC" prefixes
didn't make sense, so we removed those prefixes.  There might
be a middle ground, however, whee UPC_SHARED_TYPE_P() is preferred
to SHARED_TYPE_P() because as you/others have mentioned,
the term "shared" gets used in a lot of contexts.

> > @@ -4381,6 +4422,7 @@ build1_stat (enum tree_code code, tree t
> >/* Whether a dereference is readonly has nothing to do with whether
> >  its operand is readonly.  */
> >TREE_READONLY (t) = 0;
> > +  TREE_SHARED (t) = SHARED_TYPE_P (type);
> 
> This is frontend logic and should reside in FEs.

[... several other similar actions taken contingent
upon SHARED_TYPE_P() elided ...]

OK, will take a look.

> > +  outer_is_pts_p = (POINTER_TYPE_P (outer_type)
> > +&& SHARED_TYPE_P (TREE_TYPE (outer_type)));
> > +  inner_is_pts_p = (POINTER_TYPE_P (inner_type)
> > +&& SHARED_TYPE_P (TREE_TYPE (inner_type)));
> > +
> > +  /* Pointer-to-shared types have special
> > + equivalence rules that must be checked.  */
> > +  if (outer_is_pts_p && inner_is_pts_p
> > +  && lang_hooks.types_compatible_p)
> > +return lang_hooks.types_compatible_p (outer_type, inner_type);
> 
> Sorry, but that can't fly with LTO.

Can you elaborate?  I'm not familiar with LTO or its requirements.

> I wonder why you didn't use address-spaces for whatever
> UPC needs for pointers.

Background: UPC pointers have special language defined rules
for address arithmetic.  In the simplest case, adding 1 to a UPC
pointer-to-shared increments the thread part of the pointer, until
it exceeds the number of threads in the program, at that point,
the "virtual address" (often an offset into the program's global
shared region) is incremented by the size of the element type.
Since UPC pointers have a thread field and block offset (phase)

Re: [PATCH, VECTOR ABI] Add attribute((simd)) to GCC.

2015-12-02 Thread Jeff Law


On 12/02/2015 05:46 AM, Kirill Yukhin wrote:

Hello Jakub,

On 13 Nov 13:16, Jakub Jelinek wrote:

--- /dev/null
+++ b/gcc/testsuite/c-c++-common/attr-simd.c


Similarly.

Ok for trunk with those changes.

It turns out that current implementation of GLibC does not
contain masked variants of math routines. So, this attribute
is useless until it is capable to generation only [nonmasked|maked]
variants of the routines.

Patch in the bottom introduces `notinbranch' and `inbranch' flags to
the attribute.

Bootstrapped and regtested. New tests pass.

Is it ok for trunk (GCC v6)?

gcc/
* c-family/c-common.c (c_common_attribute_table[]): Update max 
aerguments
count for "simd" attribute.
(handle_simd_attribute): Parse "notinbranch" and "inbranch" arguments.
* doc/extend.texi ("simd"): Describe new flags.
gcc/testsuite/
* c-c++-common/attr-simd-4.c: New test.
* c-c++-common/attr-simd-5.c: New test.
Why not use "unmasked" and "masked" instead of "notinbranch" and 
"inbranch"?   If those terms come from OpenMP or are in use by other 
compilers (llvm, icc, whatever), then I guess we should stick with them. 
 Otherwise we should consider [un]masked which are consistent with the 
vector abi document.


So I think if [not]inbranch comes from OpenMP, then this patch is OK 
as-is.   Similarly if other compilers are using the [not]inbranch 
modifier.  If the modifiers are totally arbitrary, then consider using 
[un]masked which is consistent with the vector abi documentation and the 
patch would be pre-approved with that change.



Jeff

Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

2015-12-02 Thread Thomas Schwinge

Hi!

On Tue, 1 Dec 2015 15:25:42 +0100, Tom de Vries  wrote:
> Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

>   * c-c++-common/goacc/kernels-alias-ipa-pta-2.c: New test.
>   * c-c++-common/goacc/kernels-alias-ipa-pta-3.c: New test.
>   * c-c++-common/goacc/kernels-alias-ipa-pta.c: New test.

I see:

PASS: c-c++-common/goacc/kernels-alias-ipa-pta-2.c (test for excess errors)
FAIL: c-c++-common/goacc/kernels-alias-ipa-pta-2.c scan-tree-dump-times 
optimized "(?n)= 0;$" 2
PASS: c-c++-common/goacc/kernels-alias-ipa-pta-2.c scan-tree-dump-times 
optimized "(?n)= 1;$" 1
FAIL: c-c++-common/goacc/kernels-alias-ipa-pta-2.c scan-tree-dump-times 
optimized "(?n)= \\*a" 0
PASS: c-c++-common/goacc/kernels-alias-ipa-pta-3.c (test for excess errors)
PASS: c-c++-common/goacc/kernels-alias-ipa-pta-3.c scan-tree-dump-times 
optimized "(?n)= 0;$" 1
PASS: c-c++-common/goacc/kernels-alias-ipa-pta-3.c scan-tree-dump-times 
optimized "(?n)= 1;$" 1
PASS: c-c++-common/goacc/kernels-alias-ipa-pta-3.c scan-tree-dump-times 
optimized "(?n)= \\*a" 1
PASS: c-c++-common/goacc/kernels-alias-ipa-pta.c (test for excess errors)
FAIL: c-c++-common/goacc/kernels-alias-ipa-pta.c scan-tree-dump-times 
optimized "(?n)= 0;$" 2
PASS: c-c++-common/goacc/kernels-alias-ipa-pta.c scan-tree-dump-times 
optimized "(?n)= 1;$" 1
FAIL: c-c++-common/goacc/kernels-alias-ipa-pta.c scan-tree-dump-times 
optimized "(?n)= \\*_[0-9]\\[0\\];$" 0

..., and similar for C++.  Looking at
c-c++-common/goacc/kernels-alias-ipa-pta.c:

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c
> @@ -0,0 +1,23 @@
> +/* { dg-additional-options "-O2" } */
> +/* { dg-additional-options "-fipa-pta -fdump-tree-optimized" } */
> +
> +#define N 2
> +
> +void
> +foo (void)
> +{
> +  unsigned int a[N];
> +  unsigned int b[N];
> +  unsigned int c[N];
> +
> +#pragma acc kernels pcopyout (a, b, c)
> +  {
> +a[0] = 0;
> +b[0] = 1;
> +c[0] = a[0];
> +  }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "(?n)= 0;$" 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "(?n)= 1;$" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "(?n)= \\*_\[0-9\]\\\[0\\\];$" 0 
> "optimized" } } */

..., manually running that one for C, I get:

;; Function foo._omp_fn.0 (foo._omp_fn.0, funcdef_no=1, decl_uid=1874, 
cgraph_uid=1, symbol_order=1)

__attribute__((oacc function (1, 1, 1), omp target entrypoint))
foo._omp_fn.0 (const struct .omp_data_t.0 & restrict .omp_data_i)
{
  unsigned int[2] * _3;
  unsigned int[2] * _5;
  unsigned int _7;
  unsigned int[2] * _8;

  :
  _3 = *.omp_data_i_2(D).a;
  *_3[0] = 0;
  _5 = *.omp_data_i_2(D).b;
  *_5[0] = 1;
  _7 = *_3[0];
  _8 = *.omp_data_i_2(D).c;
  *_8[0] = _7;
  return;

}



;; Function foo (foo, funcdef_no=0, decl_uid=1866, cgraph_uid=0, 
symbol_order=0)

foo ()
{
  unsigned int c[2];
  unsigned int b[2];
  unsigned int a[2];
  struct .omp_data_t.0 .omp_data_arr.1;
  static long unsigned int .omp_data_sizes.2[3] = {8, 8, 8};
  static short unsigned int .omp_data_kinds.3[3] = {514, 514, 514};

  :
  .omp_data_arr.1.c = 
  .omp_data_arr.1.b = 
  .omp_data_arr.1.a = 
  GOACC_parallel_keyed (-1, foo._omp_fn.0, 3, &.omp_data_arr.1, 
&.omp_data_sizes.2, &.omp_data_kinds.3, 0);
  .omp_data_arr.1 ={v} {CLOBBER};
  a ={v} {CLOBBER};
  b ={v} {CLOBBER};
  c ={v} {CLOBBER};
  return;

}


Grüße
 Thomas


signature.asc
Description: PGP signature

Re: RFC: Merge the GUPC branch into the GCC 6.0 trunk

2015-12-02 Thread Gary Funck

On 12/01/15 12:19:48, Bernd Schmidt wrote:
> On 12/01/2015 06:31 AM, Gary Funck wrote:
> >At this time, we would like to re-submit the UPC patches for comment
> >with the goal of introducing these changes into GCC 6.0.
> 
> This has missed stage 1 by a few weeks, we'd have to make an exception to
> include it at this late stage.

Based upon the feedback, it looks like GCC 6.0 is not feasible.

> 
> >@@ -857,9 +875,14 @@ struct GTY(()) tree_base {
> >unsigned user_align : 1;
> >unsigned nameless_flag : 1;
> >unsigned atomic_flag : 1;
> >-  unsigned spare0 : 3;
> >-
> >-  unsigned spare1 : 8;
> >+  unsigned shared_flag : 1;
> >+  unsigned strict_flag : 1;
> >+  unsigned relaxed_flag : 1;
> >+
> >+  unsigned threads_factor_flag : 1;
> >+  unsigned block_factor_0 : 1;
> >+  unsigned block_factor_x : 1;
> >+  unsigned spare1 : 5;
> 
> That's a lot of bits used up at once.

I provided some additional background in my reply to Richard.
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00136.html

> Does this solve anything that cannot be done with OpenMP, which we already
> support?

UPC's target use is similar to that of Co-Array Fortran (CAF),
mainly multi-node computations in addition to multi-core.
Also, in the sense that UPC makes syntactic extensions
it is arguably easier to write and to understand UPC programs
than the pragma based approach used by OpenMP and Cilk, or
library based solutions like MPI.

Here is an example application, parallel merge sort,
written in MPI, OpenMP, UPC, hybrid MPI/OpenMP, 
and hybrid UPC/OpenMP which illustrates how they
compare in terms of expressivity.  In general, the
bulks synchronous UPC implementation out-performs
the others.

https://github.com/gary-funck/parallel-merge-sort

> Can you show us any users of this that demonstrate that this is
> actually in use by anyone outside the universities responsible for UPC?

It is primarily used in universities and research labs.
Cray, IBM, and HP offer their own commercial compilers on
their HPC platforms.  Berkeley has an open UPC-to-C translator
and we have separately built a Clang-based compiler and
source-to-source translator.

> The language standard is apparently from 2005 [...]

The spec was updated in 2013.
http://upc.lbl.gov/publications/upc-spec-1.3.pdf

> but I've never heard of it and
> googling "upc" does not give any sensible results. The gccupc mailing list
> seems to have been dead for years judging by the archives. I'm worried we'll
> end up carrying something around as a burden that is of no practical use
> (considering we already support the more widespread OpenMP).

UPC is more similar to Co-Array Fortran (CAF) than OpenMP.
I don't keep up with developments in the OpenMP or OpenACC standards,
so am unaware of proposals to generalize them for multi-node
HPC applications.  As mentioned, IMO, UPC is more expressive
than OpenMP (which is pragma based).  Their programming
models are also different.  UPC is SIMD, and OpenMP uses
dynamic task dispatching.

Regarding a possible "burden", we have tried to modularize
the changes to minimize impact on the compiler.

We floated the idea of including UPC in GCC a few years back;
there were no objections at that time.  In the mean time,
we have been implementing changes based upon feedback,
porting the runtime to other communication layers
and implementing the changes needed to conform
to the 2013 UPC specification.

thanks,
- Gary

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Nathan Sidwell


On 12/02/15 11:35, Jakub Jelinek wrote:

On Wed, Dec 02, 2015 at 06:44:11PM +0300, Alexander Monakov wrote:

But you never know if people actually use #pragma omp simd regions or not,
sometimes they will, sometimes they won't, and if the uniform SIMT

increases

power consumption, it might not be desirable.


It's easy to address: just terminate threads 1-31 if the linked image has
no SIMD regions, like my pre-simd libgomp was doing.


Well, can't say the linked image in one shared library call a function
in another linked image in another shared library?  Or is that just not
supported for PTX?  I believe XeonPhi supports that.


I don't believe PTX supports such dynamic loading within the PTX program 
currently being executed.  The JIT compiler can have several PTX 'objects' 
loaded into it before you tell it to go link everything.  At that point all 
symbols must be resolved.  I've no idea as to how passing a pointer to a 
function in some other 'executable' and calling it might behave -- my suspicion 
is 'badly'.

[PATCH] Fix ICE in nonnull warning on OFFSET_TYPEs (PR c++/68653)

2015-12-02 Thread Marek Polacek

We were hitting the assert in nonnull_arg_p that only wants to see POINTER_TYPE,
but it got OFFSET_TYPE instead.  Which is--in tree.def--described as "a pointer
relative to an object".  Thus I think we should allow OFFSET_TYPE here.  In the
attached testcase I demonstrated that this warning works even for OFFSET_TYPEs.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-12-02  Marek Polacek  

PR c++/68653
* tree.c (nonnull_arg_p): Allow OFFSET_TYPE.

* g++.dg/warn/nonnull3.C: New test.

diff --git gcc/testsuite/g++.dg/warn/nonnull3.C 
gcc/testsuite/g++.dg/warn/nonnull3.C
index e69de29..d82fa31 100644
--- gcc/testsuite/g++.dg/warn/nonnull3.C
+++ gcc/testsuite/g++.dg/warn/nonnull3.C
@@ -0,0 +1,19 @@
+// PR c++/68653
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wall" }
+
+struct B;
+struct A {
+  template  __attribute__((nonnull)) bool foo (int T::*);
+  void bar (B *);
+};
+
+template  bool A::foo (int T::*p)
+{
+  return p;
+}
+void A::bar (B *)
+{
+  foo ((int B::*) nullptr);
+}
+// { dg-warning "nonnull argument" "" {target "*-*-*"} 0 }
diff --git gcc/tree.c gcc/tree.c
index 587bd74..bd5cf73 100644
--- gcc/tree.c
+++ gcc/tree.c
@@ -13848,7 +13848,9 @@ nonnull_arg_p (const_tree arg)
   tree t, attrs, fntype;
   unsigned HOST_WIDE_INT arg_num;
 
-  gcc_assert (TREE_CODE (arg) == PARM_DECL && POINTER_TYPE_P (TREE_TYPE 
(arg)));
+  gcc_assert (TREE_CODE (arg) == PARM_DECL
+ && (POINTER_TYPE_P (TREE_TYPE (arg))
+ || TREE_CODE (TREE_TYPE (arg)) == OFFSET_TYPE));
 
   /* The static chain decl is always non null.  */
   if (arg == cfun->static_chain_decl)

Marek

Re: RFC: c-common PATCHes to fully fold warning function arguments

2015-12-02 Thread Joseph Myers

On Wed, 2 Dec 2015, Jason Merrill wrote:

> > The new file needs copyright and license notices.
> 
> So it does, thanks.
> 
> > > The second patch defines fold_for_warn to call c_fully_fold and uses it in
> > > places in the warn_* functions that want to know if the operand has a
> > > constant
> > > value.
> > > 
> > > Joseph, are these changes OK for C?  The extra folding isn't currently
> > > necessary for C, should fold_for_warn check c_dialect_cxx()?
> > 
> > I think it should check that, yes.
> 
> Done.
> 
> Are these OK?

These are OK for C, with the copyright dates on the new file copied from 
c-common.c rather than only 2015.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH, i386, AVX-512] Split out mask version for vec_extract_hi_.

2015-12-02 Thread Kirill Yukhin

Hello,
On 30 Nov 13:46, Kirill Yukhin wrote:
> Hello,
> Patch in the bottom splits masked version of vec_extract_hi_
> to block AVX-1512VL insn generation for KNL and cures ICE on 
> spec2k6/450.soplex.
> 
> Bootstrapped and regtesed.
> 
> If no objections - I'll commit on Wednesday.
> 
> gcc/
>   * config/i386/sse.md (define_insn "vec_extract_hi__maskm"):
>   Remove "prefix_extra".
>   (define_insn "vec_extract_hi__mask"): New.
>   (define_insn "vec_extract_hi_"): Remove masking.
> gcc/testsuite/
>   * gcc.target/i386/avx512vl-vextractf32x4-1.c: Fix scan pattern.
> 
It looks like I forgot to fix i32x4 test as well.

Patch in the bottom fixes it (and a regression on 32b).

If no objections - I'll check it into main trunk tomorrow.

gcc/testsuite/
* gcc.target/i386/avx512vl-vextracti32x4-1.c: Fix scan pattern.

--
Thanks, K

diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vextracti32x4-1.c 
b/gcc/testsuite/gcc.target/i386/avx512vl-vextracti32x4-1.c
index 7ee87dd..0826a0b 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-vextracti32x4-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vextracti32x4-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-mavx512vl -O2" } */
-/* { dg-final { scan-assembler-times "vextracti32x4\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vextracti\(?:128|32x4\)\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
 /* { dg-final { scan-assembler-times "vextracti32x4\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vextracti32x4\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */

[PTX] #define cleanup

2015-12-02 Thread Nathan Sidwell


I noticed an enum would be cleaner to define the shuffle kinds.

nathan
2015-12-02  Nathan Sidwell  

	* config/nvptx/nvptx.c (enum nvptx_shuffle_kind): New.  Absorb
	SHUFFLE defines.
	(nvptx_gen_shuffle, nvptx_print_operand, nvptx_expand_shuffle): Adjust.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 231191)
+++ config/nvptx/nvptx.c	(working copy)
@@ -70,10 +70,15 @@
 /* This file should be included last.  */
 #include "target-def.h"
 
-#define SHUFFLE_UP 0
-#define SHUFFLE_DOWN 1
-#define SHUFFLE_BFLY 2
-#define SHUFFLE_IDX 3
+/* The kind of shuffe instruction.  */
+enum nvptx_shuffle_kind
+{
+  SHUFFLE_UP,
+  SHUFFLE_DOWN,
+  SHUFFLE_BFLY,
+  SHUFFLE_IDX,
+  SHUFFLE_MAX
+};
 
 /* Record the function decls we've written, and the libfuncs and function
decls corresponding to them.  */
@@ -1221,7 +1226,7 @@ nvptx_gen_pack (rtx dst, rtx src0, rtx s
across the vectors of a single warp.  */
 
 static rtx
-nvptx_gen_shuffle (rtx dst, rtx src, rtx idx, unsigned kind)
+nvptx_gen_shuffle (rtx dst, rtx src, rtx idx, nvptx_shuffle_kind kind)
 {
   rtx res;
 
@@ -2019,10 +2024,11 @@ nvptx_print_operand (FILE *file, rtx x,
 
 case 'S':
   {
-	unsigned kind = UINTVAL (x);
+	nvptx_shuffle_kind kind = (nvptx_shuffle_kind) UINTVAL (x);
+	/* Same order as nvptx_shuffle_kind.  */
 	static const char *const kinds[] = 
-	  {"up", "down", "bfly", "idx"};
-	fprintf (file, ".%s", kinds[kind]);
+	  {".up", ".down", ".bfly", ".idx"};
+	fputs (kinds[kind], file);
   }
   break;
 
@@ -3990,7 +3996,8 @@ nvptx_expand_shuffle (tree exp, rtx targ
   if (!REG_P (idx) && GET_CODE (idx) != CONST_INT)
 idx = copy_to_mode_reg (SImode, idx);
 
-  rtx pat = nvptx_gen_shuffle (target, src, idx, INTVAL (op));
+  rtx pat = nvptx_gen_shuffle (target, src, idx,
+			   (nvptx_shuffle_kind) INTVAL (op));
   if (pat)
 emit_insn (pat);

Re: [PATCH] -Wshift-overflow: Warn for shifting sign bit out of a negative number

2015-12-02 Thread Jeff Law


On 11/26/2015 04:52 AM, Paolo Bonzini wrote:

maybe_warn_shift_overflow is checking for patterns such as (1 << 31)
and not warning for them.

However, if the shifted value is negative, a shift by a non-zero
amount will always shift *out* of the sign bit rather than into it.
Thus it should be warned about, even if the value only requires
one bit more than the precision of the LHS.

Ok for trunk?

Paolo

gcc:
* c-family/c-common.c (maybe_warn_shift_overflow): Warn on all
overflows if shifting 1 out of the sign bit.

gcc/testsuite:
* c-c++-common/Wshift-overflow-1.c: Test shifting 1 out of the sign bit.
* c-c++-common/Wshift-overflow-2.c: Test shifting 1 out of the sign bit.
* c-c++-common/Wshift-overflow-3.c: Test shifting 1 out of the sign bit.
* c-c++-common/Wshift-overflow-4.c: Test shifting 1 out of the sign bit.
* c-c++-common/Wshift-overflow-6.c: Test shifting 1 out of the sign bit.
* c-c++-common/Wshift-overflow-7.c: Test shifting 1 out of the sign bit.

OK.
jeff

Re: [PATCH] Fix oacc kernels default mapping for scalars

2015-12-02 Thread Thomas Schwinge

Hi!

Copying Nathan for your information, in case you had not yet seen that
gcc/gimplify.c:oacc_default_clause change:

On Fri, 27 Nov 2015 12:29:21 +0100, Tom de Vries  wrote:
> The OpenACC 2.0a standard says this about the default mapping for 
> variables used in a kernels region:
> ...
> An array or variable of aggregate data type referenced in the kernels 
> construct that does not appear in a data clause for the construct or
> any enclosing data construct will be treated as if it appeared in a 
> present_or_copy clause for the kernels construct.
> 
> A scalar variable referenced in the kernels construct that does not 
> appear in a data clause for the construct or any enclosing data 
> construct will be treated as if it appeared in a copy clause.
> ...
> 
> But atm, all variables including the scalar ones have 'present_or_copy' 
> defaults.
> 
> This patch makes sure scalar variables get the 'copy' default.
> 
> Bootstrapped and reg-tested on x86_64. OK for stage3 trunk?

I see:

PASS: gfortran.dg/goacc/reduction-2.f95   -O   scan-tree-dump-times gimple 
"acc loop private.k. reduction..:a." 1
PASS: gfortran.dg/goacc/reduction-2.f95   -O   scan-tree-dump-times gimple 
"acc loop private.p. reduction..:a." 1
[-PASS:-]{+FAIL:+} gfortran.dg/goacc/reduction-2.f95   -O   
scan-tree-dump-times gimple "target oacc_kernels map.tofrom:a .len: 4.." 1
PASS: gfortran.dg/goacc/reduction-2.f95   -O   scan-tree-dump-times gimple 
"target oacc_parallel firstprivate.a." 1
PASS: gfortran.dg/goacc/reduction-2.f95   -O  (test for excess errors)

I guess that one needs to be updated?

> Fix oacc kernels default mapping for scalars
> 
> 2015-11-27  Tom de Vries  
> 
>   * gimplify.c (enum gimplify_omp_var_data): Add enum value
>   GOVD_MAP_FORCE.
>   (oacc_default_clause): Fix default for scalars in oacc kernels.
>   (gimplify_adjust_omp_clauses_1): Handle GOVD_MAP_FORCE.
> 
>   * c-c++-common/goacc/kernels-default-2.c: New test.
>   * c-c++-common/goacc/kernels-default.c: New test.
> 
> ---
>  gcc/gimplify.c   | 19 ++-
>  gcc/testsuite/c-c++-common/goacc/kernels-default-2.c | 17 +
>  gcc/testsuite/c-c++-common/goacc/kernels-default.c   | 14 ++
>  3 files changed, 45 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index fcac745..68d90bf 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -87,6 +87,9 @@ enum gimplify_omp_var_data
>/* Flag for GOVD_MAP, if it is always, to or always, tofrom mapping.  */
>GOVD_MAP_ALWAYS_TO = 65536,
>  
> +  /* Flag for GOVD_MAP, if it is a forced mapping.  */
> +  GOVD_MAP_FORCE = 131072,
> +
>GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
>  | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
>  | GOVD_LOCAL)
> @@ -5976,8 +5979,12 @@ oacc_default_clause (struct gimplify_omp_ctx *ctx, 
> tree decl, unsigned flags)
>gcc_unreachable ();
>  
>  case ORT_ACC_KERNELS:
> -  /* Everything under kernels are default 'present_or_copy'.  */
> +  /* Scalars are default 'copy' under kernels, non-scalars are default
> +  'present_or_copy'.  */
>flags |= GOVD_MAP;
> +  if (!AGGREGATE_TYPE_P (TREE_TYPE (decl)))
> + flags |= GOVD_MAP_FORCE;
> +
>rkind = "kernels";
>break;
>  
> @@ -7489,10 +7496,12 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, 
> void *data)
>  }
>else if (code == OMP_CLAUSE_MAP)
>  {
> -  OMP_CLAUSE_SET_MAP_KIND (clause,
> -flags & GOVD_MAP_TO_ONLY
> -? GOMP_MAP_TO
> -: GOMP_MAP_TOFROM);
> +  int kind = (flags & GOVD_MAP_TO_ONLY
> +   ? GOMP_MAP_TO
> +   : GOMP_MAP_TOFROM);
> +  if (flags & GOVD_MAP_FORCE)
> + kind |= GOMP_MAP_FLAG_FORCE;
> +  OMP_CLAUSE_SET_MAP_KIND (clause, kind);
>if (DECL_SIZE (decl)
> && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
>   {
> diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-default-2.c 
> b/gcc/testsuite/c-c++-common/goacc/kernels-default-2.c
> new file mode 100644
> index 000..232b123
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-default-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-additional-options "-O2" } */
> +/* { dg-additional-options "-fdump-tree-gimple" } */
> +
> +#define N 2
> +
> +void
> +foo (void)
> +{
> +  unsigned int a[N];
> +
> +#pragma acc kernels
> +  {
> +a[0]++;
> +  }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "map\\(tofrom" 1 "gimple" } } */
> diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-default.c 
> b/gcc/testsuite/c-c++-common/goacc/kernels-default.c
> new file mode 100644
> index 000..58cd5e1
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-default.c
> @@ -0,0 +1,14

Re: Add fuzzing coverage support

2015-12-02 Thread Dmitry Vyukov

On Wed, Dec 2, 2015 at 5:47 PM, Bernd Schmidt  wrote:
> On 12/02/2015 05:10 PM, Dmitry Vyukov wrote:
>>
>> ping
>
>
> I do not see the original submission in my archives.

That's strange. I don't see it in gcc-patches archives as well.
The original email contained a plain-text patch attachment. Attaching it again.


> This one comes too late
> to make it into gcc-6. I can make some initial comments.
>
 This patch adds support for coverage-guided fuzzing:
 https://codereview.appspot.com/280140043
>
>
> Please send patches with the mail (the best method is usually a text/plain
> attachment) so that they can be quoted.
>
> From what I can tell this is relatively straight-forward. You'll want to fix
> a number of issues with coding style (formatting, and lack of function
> comments).
>
> There's no documentation in invoke.texi.

Will fix.
Can you point to some concrete coding style violations (besides
function comments)?


> We seem to have no established process for deciding whether we want a new
> feature. I am not sure how to approach such a question, and it comes up
> quite often. Somehow the default answer usually seems to be to accept
> anything.
Index: ChangeLog
===
--- ChangeLog	(revision 231000)
+++ ChangeLog	(working copy)
@@ -1,3 +1,13 @@
+2015-11-27  Dmitry Vyukov  
+
+	* sancov.c: add file.
+	* Makefile.in: add sancov.c.
+	* passes.def: add sancov pass.
+	* tree-pass.h: add sancov pass.
+	* common.opt: add -fsanitize-coverage=trace-pc flag.
+	* sanitizer.def: add __sanitizer_cov_trace_pc builtin.
+	* builtins.def: enable sanitizer builtins when coverage is enabled.
+
 2015-11-27  Jakub Jelinek  
 
 	PR tree-optimization/68552
Index: Makefile.in
===
--- Makefile.in	(revision 231000)
+++ Makefile.in	(working copy)
@@ -1426,6 +1426,7 @@
 	tsan.o \
 	ubsan.o \
 	sanopt.o \
+	sancov.o \
 	tree-call-cdce.o \
 	tree-cfg.o \
 	tree-cfgcleanup.o \
@@ -2399,6 +2400,7 @@
   $(srcdir)/ubsan.c \
   $(srcdir)/tsan.c \
   $(srcdir)/sanopt.c \
+  $(srcdir)/sancov.c \
   $(srcdir)/ipa-devirt.c \
   $(srcdir)/internal-fn.h \
   @all_gtfiles@
Index: builtins.def
===
--- builtins.def	(revision 231000)
+++ builtins.def	(working copy)
@@ -210,7 +210,8 @@
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\
 	   true, true, true, ATTRS, true, \
 	  (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \
-| SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT)))
+| SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT) \
+			|| flag_sanitize_coverage))
 
 #undef DEF_CILKPLUS_BUILTIN
 #define DEF_CILKPLUS_BUILTIN(ENUM, NAME, TYPE, ATTRS)  \
Index: common.opt
===
--- common.opt	(revision 231000)
+++ common.opt	(working copy)
@@ -225,6 +225,11 @@
 Variable
 unsigned int flag_sanitize_recover = SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT | SANITIZE_KERNEL_ADDRESS
 
+fsanitize-coverage=trace-pc
+Common Report Var(flag_sanitize_coverage)
+Enable coverage-guided fuzzing support.
+Inserts call to __sanitizer_cov_trace_pc into every basic block.
+
 ; Flag whether a prefix has been added to dump_base_name
 Variable
 bool dump_base_name_prefixed = false
Index: passes.def
===
--- passes.def	(revision 231000)
+++ passes.def	(working copy)
@@ -237,6 +237,7 @@
   NEXT_PASS (pass_split_crit_edges);
   NEXT_PASS (pass_pre);
   NEXT_PASS (pass_sink_code);
+  NEXT_PASS (pass_sancov);
   NEXT_PASS (pass_asan);
   NEXT_PASS (pass_tsan);
   /* Pass group that runs when 1) enabled, 2) there are loops
@@ -346,6 +347,7 @@
  to forward object-size and builtin folding results properly.  */
   NEXT_PASS (pass_copy_prop);
   NEXT_PASS (pass_dce);
+  NEXT_PASS (pass_sancov);
   NEXT_PASS (pass_asan);
   NEXT_PASS (pass_tsan);
   /* ???  We do want some kind of loop invariant motion, but we possibly
@@ -369,6 +371,7 @@
   NEXT_PASS (pass_lower_vaarg);
   NEXT_PASS (pass_lower_vector);
   NEXT_PASS (pass_lower_complex_O0);
+  NEXT_PASS (pass_sancov_O0);
   NEXT_PASS (pass_asan_O0);
   NEXT_PASS (pass_tsan_O0);
   NEXT_PASS (pass_sanopt);
Index: sancov.c
===
--- sancov.c	(revision 0)
+++ sancov.c	(working copy)
@@ -0,0 +1,116 @@
+/* Code coverage instrumentation for fuzzing.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by Dmitry Vyukov 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is

Re: -fstrict-aliasing fixes 5/6: make type system independent of flag_strict_aliasing

2015-12-02 Thread Jan Hubicka

> 
> Rather than adjusting this function please adjust 
> alias_sets_must_conflict_p.
> 
> Otherwise this looks ok and indeed much nicer.

OK, will update it.  Will hold this patch until we resolve what we want to do
with the debug dumps. I do not seem to be able to reproduce any -fcompare-debug
issues with that, so perhaps we already strip the alias sets and the whoe
-fstrict-alias-set fiddling is unnecesary.

Honza

Re: -fstrict-aliasing fixes 4/6: do not fiddle with flag_strict_aliasing when expanding debug locations

2015-12-02 Thread Jan Hubicka

> On Wed, 2 Dec 2015, Jakub Jelinek wrote:
> 
> > On Wed, Dec 02, 2015 at 10:05:13AM +0100, Richard Biener wrote:
> > > On Wed, 2 Dec 2015, Jan Hubicka wrote:
> > > 
> > > > Hi,
> > > > this patch removes flag_strict_aliasing kludge in expanding debug 
> > > > locations and
> > > > instead it introduces explicit parameter DEBUG that makes
> > > > set_mem_attributes_minus_bitpos to not affect alias sets.  This is 
> > > > sanity
> > > > checked by comparing number of alias sets before and after at a time we
> > > > originally overwritten flag_strict_aliasing.
> > > > 
> > > > I also added code to prevent memory attributes creation for !optimize 
> > > > and to
> > > > avoid get_alias_set computation for !flag_strict_aliasing. This slightly
> > > > optimizes -O0 builds but the results seems to be down in the noise (I 
> > > > would not
> > > > object to leave it out).
> > > > 
> > > > The patch should fix at least one (latent?) bug that call_stmt expansion
> > > > invoke expand_debug_expr without clearing flag_strict_aliasing.
> > > > 
> > > > Bootstrapped/regtested x86_64-linux, also tested with compare-debug, OK?
> > > 
> > > First of all, why do debug MEMs need mem-attrs?
> > 
> > For aliasing purposes, like any other MEMs.  var-tracking needs to be able
> > to find out if some store through say some pointer could alias certain debug
> > MEM (then we need to flush the corresponding VALUEs), or not (then that
> > VALUE can be still considered live at that MEM location).
> 
> Ok, so we then pessimize the alias-set for all of them (when they
> don't have a DECL_RTL already) to use alias-set zero even if we
> wouldn't need to create a new alias set?  (and even if I don't see
> what issues that would cause apart from "messing up the dumps" which
> should be fixed by stripping alias sets from dumps)

Breaking out code into make_decl_rtl_raw is also easy, since set_mem_attributes
is called at very end of make_decl_rtl.

Jakub, in https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00316.html I make 
get_alias_set
to do the job even in !flas_strict_aliasing because this is needed to make it 
possible
to mix flag_strict_aliasing and !flag_strict_aliasing functions in one 
translation
unit.  We do that during LTO.

I wonder how to reproduce the messed up dumps here with that patch and if 
stripping
the alias sets would be OK? Indeed getting extra alias sets should not cause
codegen differences.

Honza
> 
> Richard.

Re: [PATCH] Fix declaration of pthread-structs in s-osinte-rtems.ads (ada/68169)

2015-12-02 Thread Joel Sherrill



On December 2, 2015 2:14:22 AM EST, Jeff Law  wrote:
>On 12/01/2015 12:56 PM, Jan Sommer wrote:
>> Am Monday 30 November 2015, 16:19:30 schrieb Jeff Law:
>>> On 11/30/2015 03:06 PM, Jan Sommer wrote:
 Could someone with write access please commit the patch?
 The paperwork with the FSF has gone through. If something else is
>missing, please tell me.
 I won't be available next week.
>>> I'm not sure what you built your patches again, but I can't apply
>them
>>> to the trunk.  Can you resend a patch as a diff against the trunk.
>>>
>>> Often I can fix things by hand, but this is Ada and I'd be much more
>>> likely to botch something.
>>
>> I updated the patches again. They should now fit with the heads of
>the respective branches again.
>> Maybe the Changelog will be out of synch again.
>> The patches are for the following branches:
>> ada-68169_4.9.diff   -->  gcc-4_9-branch
>> ada-68169_5.x.diff  -->   gcc-5-branch
>> ada-68169_trunk.diff --> trunk
>>
>> Let me know if they apply this time. I used svn diff to create them
>and used patch -p0 to test if they apply locally.
>THanks.  I've committed this to the trunk based on Joel's comments.
>
>The gcc-5 branch is frozen for the upcoming release and gcc-4.9 is 
>regression/doc fixes only.  It'll be up to the release managers whether
>
>or not to backport to those branches.

Thanks Jeff. 

I would consider this a regression. RTEMS changed the pthread_attr_t when we 
added thread affinity and updating Ada to match slipped through. We knew it 
needed attention for SMP but missed this critical piece to keep it working.

Jan.. Is there a gcc PR for this? To get it on a release branch, it is better 
to have one.

>Thanks.
>
>Jeff

--joel

Re: Add fuzzing coverage support

2015-12-02 Thread Dmitry Vyukov

On Wed, Dec 2, 2015 at 6:11 PM, Bernd Schmidt  wrote:
> On 12/02/2015 05:55 PM, Dmitry Vyukov wrote:
>>
>> Can you point to some concrete coding style violations (besides
>> function comments)?
>>
>>   (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \
>> -   | SANITIZE_UNDEFINED |
>> SANITIZE_NONDEFAULT)))
>> +   | SANITIZE_UNDEFINED |
>> SANITIZE_NONDEFAULT) \
>> +   || flag_sanitize_coverage))
>
>
> The || should line up with the other condition (i.e. the part starting with
> flag_sanitize).
>
>> +unsigned sancov_pass (function *fun)
>
>
> Split the line after the return type.
>
>> +
>> +template
>> +class pass_sancov : public gimple_opt_pass
>> +{

Thanks for the review!

> This seems to be a new idiom but I find it OK.

Duplicating this code would double patch size :)
Asan pass has this duplication.
FWIW I found this template trick better than duplication of a large
chunk of code

> One thing to consider would
> be whether you really need this split between O0/optimize versions, or
> whether you can find a place in the queue where to insert it
> unconditionally. Have you considered this at all or did you just follow
> asan/tsan?

I inserted the pass just before asan/tsan because it looks like the
right place for it. If we do it after asan, it will insert coverage
for all asan-emited BBs which is highly undesirable. I also think it
is a good idea to run a bunch of optimizations before coverage pass to
not emit too many coverage callbacks (but I can't say that I am very
knowledgeable in this area). FWIW clang does the same: coverage passes
run just before asan/tsan.

>> +public:
>> +  static pass_data pd ()
>> +  {
>> +static const pass_data data =
>
>
> I think a static data member would be better than the unnecessary pd ()
> function. This is also unlike existing practice, and I wonder how others
> think about it. IMO a fairly strong case could be made that if we're using
> C++, then this sort of thing ought to be part of the class definition.

I vary name of the pass depending on the O0 template argument (again
following asan):

O0 ? "sancov_O0" : "sancov", /* name */

If we call it "sancov" always, then I can make it just a global var
(as all other passes in gcc).
Or I can make it a static variable of the template class and move
definition of the class (as you proposed).
What would you prefer?

RFC: c-common PATCHes to fully fold warning function arguments

2015-12-02 Thread Jason Merrill

As richi pointed out in another thread, calling fold in 
warn_tautological_cmp is problematic because fold expects operands of 
its argument to have already been folded; we really need to call 
*_fully_fold.


The first patch moves c_fully_fold into the C front end, and defines it 
for C++ as well.


The second patch defines fold_for_warn to call c_fully_fold and uses it 
in places in the warn_* functions that want to know if the operand has a 
constant value.


Joseph, are these changes OK for C?  The extra folding isn't currently 
necessary for C, should fold_for_warn check c_dialect_cxx()?


Tested x86_64-pc-linux-gnu.
commit 9cac8bb59f9f8a0ca2395029ef13b8a88bc49409
Author: Jason Merrill 
Date:   Thu Nov 26 07:45:03 2015 -0500

	Define c_fully_fold separately for C and C++.

gcc/c-family/
	* c-common.c (c_disable_warnings, c_enable_warnings, c_fully_fold)
	(c_fully_fold_internal, decl_constant_value_for_optimization):
	Move to c/c-fold.c.
	* c-common.h: Don't declare decl_constant_value_for_optimization.
gcc/c/
	* c-fold.c (c_disable_warnings, c_enable_warnings, c_fully_fold)
	(c_fully_fold_internal, decl_constant_value_for_optimization):
	Move from c-common.c.
	* c-tree.h: Declare decl_constant_value_for_optimization.
	* Make-lang.in (C_AND_OBJC_OBJS): Add c-fold.o.
gcc/cp/
	* cp-gimplify.c (c_fully_fold): Define.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 7a9e3b9..40f86e3 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -299,7 +299,6 @@ const struct fname_var_t fname_vars[] =
 /* Global visibility options.  */
 struct visibility_flags visibility_options;
 
-static tree c_fully_fold_internal (tree expr, bool, bool *, bool *, bool);
 static tree check_case_value (location_t, tree);
 static bool check_case_bounds (location_t, tree, tree, tree *, tree *,
 			   bool *);
@@ -1095,582 +1094,6 @@ fix_string_type (tree value)
   return value;
 }
 
-/* If DISABLE is true, stop issuing warnings.  This is used when
-   parsing code that we know will not be executed.  This function may
-   be called multiple times, and works as a stack.  */
-
-static void
-c_disable_warnings (bool disable)
-{
-  if (disable)
-{
-  ++c_inhibit_evaluation_warnings;
-  fold_defer_overflow_warnings ();
-}
-}
-
-/* If ENABLE is true, reenable issuing warnings.  */
-
-static void
-c_enable_warnings (bool enable)
-{
-  if (enable)
-{
-  --c_inhibit_evaluation_warnings;
-  fold_undefer_and_ignore_overflow_warnings ();
-}
-}
-
-/* Fully fold EXPR, an expression that was not folded (beyond integer
-   constant expressions and null pointer constants) when being built
-   up.  If IN_INIT, this is in a static initializer and certain
-   changes are made to the folding done.  Clear *MAYBE_CONST if
-   MAYBE_CONST is not NULL and EXPR is definitely not a constant
-   expression because it contains an evaluated operator (in C99) or an
-   operator outside of sizeof returning an integer constant (in C90)
-   not permitted in constant expressions, or because it contains an
-   evaluated arithmetic overflow.  (*MAYBE_CONST should typically be
-   set to true by callers before calling this function.)  Return the
-   folded expression.  Function arguments have already been folded
-   before calling this function, as have the contents of SAVE_EXPR,
-   TARGET_EXPR, BIND_EXPR, VA_ARG_EXPR, OBJ_TYPE_REF and
-   C_MAYBE_CONST_EXPR.  */
-
-tree
-c_fully_fold (tree expr, bool in_init, bool *maybe_const)
-{
-  tree ret;
-  tree eptype = NULL_TREE;
-  bool dummy = true;
-  bool maybe_const_itself = true;
-  location_t loc = EXPR_LOCATION (expr);
-
-  /* This function is not relevant to C++ because C++ folds while
- parsing, and may need changes to be correct for C++ when C++
- stops folding while parsing.  */
-  if (c_dialect_cxx ())
-gcc_unreachable ();
-
-  if (!maybe_const)
-maybe_const = 
-  if (TREE_CODE (expr) == EXCESS_PRECISION_EXPR)
-{
-  eptype = TREE_TYPE (expr);
-  expr = TREE_OPERAND (expr, 0);
-}
-  ret = c_fully_fold_internal (expr, in_init, maybe_const,
-			   _const_itself, false);
-  if (eptype)
-ret = fold_convert_loc (loc, eptype, ret);
-  *maybe_const &= maybe_const_itself;
-  return ret;
-}
-
-/* Internal helper for c_fully_fold.  EXPR and IN_INIT are as for
-   c_fully_fold.  *MAYBE_CONST_OPERANDS is cleared because of operands
-   not permitted, while *MAYBE_CONST_ITSELF is cleared because of
-   arithmetic overflow (for C90, *MAYBE_CONST_OPERANDS is carried from
-   both evaluated and unevaluated subexpressions while
-   *MAYBE_CONST_ITSELF is carried from only evaluated
-   subexpressions).  FOR_INT_CONST indicates if EXPR is an expression
-   with integer constant operands, and if any of the operands doesn't
-   get folded to an integer constant, don't fold the expression itself.  */
-
-static tree
-c_fully_fold_internal (tree

Re: RFC: c-common PATCHes to fully fold warning function arguments

2015-12-02 Thread Joseph Myers

On Wed, 2 Dec 2015, Jason Merrill wrote:

> As richi pointed out in another thread, calling fold in warn_tautological_cmp
> is problematic because fold expects operands of its argument to have already
> been folded; we really need to call *_fully_fold.
> 
> The first patch moves c_fully_fold into the C front end, and defines it for
> C++ as well.

The new file needs copyright and license notices.

> The second patch defines fold_for_warn to call c_fully_fold and uses it in
> places in the warn_* functions that want to know if the operand has a constant
> value.
> 
> Joseph, are these changes OK for C?  The extra folding isn't currently
> necessary for C, should fold_for_warn check c_dialect_cxx()?

I think it should check that, yes.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: Patch RFA: Improve building gotools with a cross compiler

2015-12-02 Thread Ian Lance Taylor

Ping to build maintainers.  Thanks.

Ian


On Wed, Nov 25, 2015 at 2:22 PM, Ian Lance Taylor  wrote:
> PR 66147 points out that it doesn't work to build the gotools with a
> cross-compiler.  This patch improves matters.  I added a new host
> export to the top level Makefile.  Build maintainers, does this change
> seem OK?
>
> Ian
>
> ./ChangeLog:
>
> 2015-11-25  Ian Lance Taylor  
>
> PR go/66147
> * Makefile.tpl (HOST_EXPORTS): Add XGCC_FLAGS_FOR_TARGET.
> * Makefile.in: Regenerate.
>
> gotools/ChangeLog:
>
> 2015-11-25  Ian Lance Taylor  
>
> PR go/66147
> * Makefile.am (GOCOMPILER): In NATIVE case, add
> $(XGCC_FLAGS_FOR_TARGET).

Re: Fix 61441 [ 1/5] Add REAL_VALUE_ISSIGNALING_NAN

2015-12-02 Thread Jeff Law


On 11/26/2015 01:27 AM, Saraswati, Sujoy (OSTL) wrote:

Hi,
   This series of patches fixes PR61441.  The fix is broken into 5 patches.

   The first one adds REAL_VALUE_ISSIGNALING_NAN.

   2015-11-26  Sujoy Saraswati 

PR tree-optimization/61441
* real.c (real_issignaling_nan): New.
* real.h (real_issignaling_nan, REAL_VALUE_ISSIGNALING_NAN): New.
If you haven't set up write-access to the repository, please go ahead 
and get that process started:


https://www.gnu.org/software/gcc/svnwrite.html

You can list me as your sponsor on the form.

Once your account is set up, you can commit patches which have been 
approved.


I'll go ahead and approve #1, #2 and #4.  Richi has approved #3.

I'm still looking at #5.

Jeff

[PATCH] [graphite] fix PR68550: do not handle ISL loop peeled statements

2015-12-02 Thread Sebastian Pop

In case ISL did some loop peeling, like this:

  S_8(0);
  for (int c1 = 1; c1 <= 5; c1 += 1) {
S_8(c1);
  }
  S_8(6);

we should not copy loop-phi nodes in S_8(0) or in S_8(6).

* graphite-isl-ast-to-gimple.c (copy_loop_phi_nodes): Add dump.
(copy_bb_and_scalar_dependences): Do not code generate loop peeled
statements.
---
 gcc/graphite-isl-ast-to-gimple.c | 26 +
 gcc/testsuite/gfortran.dg/graphite/pr68550-1.f90 | 36 
 gcc/testsuite/gfortran.dg/graphite/pr68550-2.f90 | 14 +
 3 files changed, 76 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/graphite/pr68550-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/graphite/pr68550-2.f90

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index bfce316..3139f30 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -2096,6 +2096,12 @@ translate_isl_ast_to_gimple::copy_loop_phi_nodes 
(basic_block bb,
   codegen_error = !copy_loop_phi_args (phi, ibp_old_bb, new_phi,
  ibp_new_bb, true);
   update_stmt (new_phi);
+
+  if (dump_file)
+   {
+ fprintf (dump_file, "[codegen] creating loop-phi node: ");
+ print_gimple_stmt (dump_file, new_phi, 0, 0);
+   }
 }
 
   return true;
@@ -2894,6 +2900,26 @@ 
translate_isl_ast_to_gimple::copy_bb_and_scalar_dependences (basic_block bb,
  return NULL;
}
 
+ /* In case ISL did some loop peeling, like this:
+
+  S_8(0);
+  for (int c1 = 1; c1 <= 5; c1 += 1) {
+S_8(c1);
+  }
+  S_8(6);
+
+there should be no loop-phi nodes in S_8(0).
+
+FIXME: We need to reason about dynamic instances of S_8, i.e., the
+values of all scalar variables: for the moment we instantiate only
+SCEV analyzable expressions on the iteration domain, and we need to
+extend that to reductions that cannot be analyzed by SCEV.  */
+ if (!bb_in_sese_p (phi_bb, region->if_region->true_region->region))
+   {
+ codegen_error = true;
+ return NULL;
+   }
+
  if (dump_file)
fprintf (dump_file, "[codegen] bb_%d contains loop phi nodes.\n",
 bb->index);
diff --git a/gcc/testsuite/gfortran.dg/graphite/pr68550-1.f90 
b/gcc/testsuite/gfortran.dg/graphite/pr68550-1.f90
new file mode 100644
index 000..beed63e
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/graphite/pr68550-1.f90
@@ -0,0 +1,36 @@
+! { dg-do compile }
+! { dg-options "-floop-nest-optimize -O2" }
+
+SUBROUTINE integrate_core_1(grid,coef_xyz,pol_x,pol_y,&
+ pol_z,map,sphere_bounds,cmax,gridbounds)
+INTEGER, PARAMETER :: dp=8
+INTEGER, INTENT(IN):: sphere_bounds(*), cmax, &
+  map(-cmax:cmax,1:3), &
+  gridbounds(2,3)
+REAL(dp), INTENT(IN) :: grid(gridbounds(1,1):gridbounds(2,1), &
+ gridbounds(1,2):gridbounds(2,2),&
+ gridbounds(1,3):gridbounds(2,3))
+INTEGER, PARAMETER :: lp = 1
+REAL(dp), INTENT(IN)   :: pol_x(0:lp,-cmax:cmax), &
+  pol_y(1:2,0:lp,-cmax:0), &
+  pol_z(1:2,0:lp,-cmax:0)
+REAL(dp), INTENT(OUT) :: coef_xyz(((lp+1)*(lp+2)*(lp+3))/6)
+INTEGER   :: i, ig, igmax, igmin, j, j2, &
+ jg, jg2, jgmin, k, k2, kg, &
+ kg2, kgmin, lxp, sci
+REAL(dp)  :: coef_x(4,0:lp), &
+ coef_xy(2,((lp+1)*(lp+2))/2), &
+ s(4)
+DO kg=kgmin,0
+   DO jg=jgmin,0
+  coef_x=0.0_dp
+  DO ig=igmin,igmax
+ DO lxp=0,lp
+coef_x(:,lxp)=coef_x(:,lxp)+s(:)*pol_x(lxp,ig)
+ ENDDO
+  END DO
+ coef_xy(:,3)=coef_xy(:,3)+coef_x(3:4,0)*pol_y(2,1,jg)
+   END DO
+coef_xyz(3)=coef_xyz(3)+coef_xy(1,3)*pol_z(1,0,kg)
+END DO
+  END SUBROUTINE integrate_core_1
diff --git a/gcc/testsuite/gfortran.dg/graphite/pr68550-2.f90 
b/gcc/testsuite/gfortran.dg/graphite/pr68550-2.f90
new file mode 100644
index 000..fae0c92
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/graphite/pr68550-2.f90
@@ -0,0 +1,14 @@
+! { dg-do compile }
+! { dg-options "-floop-nest-optimize -fcheck=bounds -O1" }
+
+SUBROUTINE PD2VAL(RES,NDERIV,TG1,TG2,C0)
+INTEGER, PARAMETER :: dp=8
+REAL(KIND=dp), INTENT(OUT)  :: res(*)
+REAL(KIND=dp), INTENT(IN)   :: TG1, TG2, C0(105,*)
+REAL(KIND=dp)   :: T1(0:13), T2(0:13)
+ DO K=1,NDERIV+1
+  RES(K)=RES(K)+DOT_PRODUCT(T1(0:7),C0(70:77,K))*T2(6)
+  RES(K)=RES(K)+DOT_PRODUCT(T1(0:4),C0(91:95,K))*T2(9)
+  RES(K)=RES(K)+DOT_PRODUCT(T1(0:3),C0(96:99,K))*T2(10)
+ ENDDO
+END SUBROUTINE PD2VAL
-- 
1.9.1

Re: Add fuzzing coverage support

2015-12-02 Thread Bernd Schmidt


On 12/02/2015 05:10 PM, Dmitry Vyukov wrote:

ping


I do not see the original submission in my archives. This one comes too 
late to make it into gcc-6. I can make some initial comments.



This patch adds support for coverage-guided fuzzing:
https://codereview.appspot.com/280140043


Please send patches with the mail (the best method is usually a 
text/plain attachment) so that they can be quoted.


From what I can tell this is relatively straight-forward. You'll want 
to fix a number of issues with coding style (formatting, and lack of 
function comments).


There's no documentation in invoke.texi.

We seem to have no established process for deciding whether we want a 
new feature. I am not sure how to approach such a question, and it comes 
up quite often. Somehow the default answer usually seems to be to accept 
anything.



Bernd

Re: [PATCH, VECTOR ABI] Add attribute((simd)) to GCC.

2015-12-02 Thread Jakub Jelinek

On Wed, Dec 02, 2015 at 10:40:13AM -0700, Jeff Law wrote:
> Why not use "unmasked" and "masked" instead of "notinbranch" and "inbranch"?
> If those terms come from OpenMP or are in use by other compilers (llvm, icc,
> whatever), then I guess we should stick with them.  Otherwise we should
> consider [un]masked which are consistent with the vector abi document.

notinbranch/inbranch are OpenMP clauses used for this purpose.

Jakub

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Alexander Monakov

On Wed, 2 Dec 2015, Jakub Jelinek wrote:
> > It's easy to address: just terminate threads 1-31 if the linked image has
> > no SIMD regions, like my pre-simd libgomp was doing.
> 
> Well, can't say the linked image in one shared library call a function
> in another linked image in another shared library?  Or is that just not
> supported for PTX?  I believe XeonPhi supports that.

I meant the PTX linked (post PTX-JIT link) image, so regardless of support,
it's not an issue.  E.g. check early in gomp_nvptx_main if .weak
__nvptx_has_simd != 0.  It would only break if there was dlopen on PTX.

> If each linked image is self-contained, then that is probably a good idea,
> but still you could have a single simd region somewhere and lots of other
> target regions that don't use simd, or cases where only small amount of time
> is spent in a simd region and this wouldn't help in that case.

Should we actually be much concerned about optimizing this case, which
is unlikely to run faster than host cpu in the first place?

> If the addressables are handled through soft stack, then the rest is mostly
> just SSA_NAMEs you can see on the edges of the SIMT region, that really
> shouldn't be that expensive to broadcast or reduce back.

That's not enough: you have to reach the SIMD region entry in threads 1-31,
which means they need to execute all preceding control flow like thread 0,
which means they need to compute controlling predicates like thread 0.
(OpenACC broadcasts controlling predicates at branches)

Alexander

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Nathan Sidwell


On 12/02/15 12:09, Alexander Monakov wrote:


I meant the PTX linked (post PTX-JIT link) image, so regardless of support,
it's not an issue.  E.g. check early in gomp_nvptx_main if .weak
__nvptx_has_simd != 0.  It would only break if there was dlopen on PTX.


Note I found a bug in .weak support.  See the comment in  
gcc.dg/special/weak-2.c

/* NVPTX's implementation of weak is broken when a strong symbol is in
   a later object file than the weak definition.   */


That's not enough: you have to reach the SIMD region entry in threads 1-31,
which means they need to execute all preceding control flow like thread 0,
which means they need to compute controlling predicates like thread 0.
(OpenACC broadcasts controlling predicates at branches)


indeed.  Hence the partial 'forking' before a function call of a function with 
internal partitioned execution.


nathan

Re: Solaris vtv port breaks x32 build

2015-12-02 Thread Matthias Klose


On 02.12.2015 13:29, Rainer Orth wrote:

Exactly: moving AM_ENABLE_MULTILIB up as Matthias suggested sets
cross_compiling=maybe for non-default multilibs early, which should
achieve the desired behaviour.  All other libraries that invoke both
macros already do so in this order.


now committed.

2015-12-02  Matthias Klose  

* configure.ac: Move AM_ENABLE_MULTILIB before
GCC_LIBSTDCXX_RAW_CXX_FLAGS.
* configure: Regenerate.

[PATCH] Fix shrink-wrap bug with anticipating into loops (PR67778, PR68634)

2015-12-02 Thread Segher Boessenkool

After shrink-wrapping has found the "tightest fit" for where to place
the prologue, it tries move it earlier (so that frame saves are run
earlier) -- but without copying any more basic blocks.

Unfortunately a candidate block we select can be inside a loop, and we
will still allow it (because the loop always exits via our previously
chosen block).  We can do that just fine if we make a duplicate of the
block, but we do not want to here.

So we need to detect this situation.  We can place the prologue at a
previous block PRE only if PRE dominates every block reachable from
it.  This is a bit hard / expensive to compute, so instead this patch
allows a block PRE only if PRE does not post-dominate any of its
successors (other than itself).

Tested on the two testcases from the PRs.  Also regression checked
on powerpc64-linux.

Is this okay for trunk?


Segher


2015-12-02  Segher Boessenkool  

* shrink-wrap.c (try_shrink_wrapping): Disallow moving the prologue
back into a loop.

---
 gcc/shrink-wrap.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index 3a1df84..fe33652 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -752,7 +752,11 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head 
*bb_with,
 
   /* If we can move PRO back without having to duplicate more blocks, do so.
  We can move back to a block PRE if every path from PRE will eventually
- need a prologue, that is, PRO is a post-dominator of PRE.  */
+ need a prologue, that is, PRO is a post-dominator of PRE.  We might
+ need to duplicate PRE if there is any path from a successor of PRE back
+ to PRE, so don't allow that either (but self-loops are fine, as are any
+ other loops entirely dominated by PRE; this in general seems too
+ expensive to check for, for such an uncommon case).  */
 
   if (pro != entry)
 {
@@ -765,9 +769,20 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head 
*bb_with,
  if (!dominated_by_p (CDI_POST_DOMINATORS, pre, pro))
break;
 
+ bool ok = true;
+
+ if (!can_get_prologue (pre, prologue_clobbered))
+   ok = false;
+
+ FOR_EACH_EDGE (e, ei, pre->succs)
+   if (e->dest != pre
+   && dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre))
+ ok = false;
+
+ if (ok)
+   last_ok = pre;
+
  pro = pre;
- if (can_get_prologue (pro, prologue_clobbered))
-   last_ok = pro;
}
   pro = last_ok;
 
-- 
1.9.3

Re: Fix 61441 [5/5] Disable various transformations for signaling NaN operands

2015-12-02 Thread Jeff Law


On 11/26/2015 01:38 AM, Saraswati, Sujoy (OSTL) wrote:

Hi,
  This patch avoids various transformations with signaling NaN operands when 
flag_signaling_nans is on, to avoid folding which would lose exceptions. A test 
case for this change is also added as part of this patch.
Regards,
Sujoy

   2015-11-26  Sujoy Saraswati 

 PR tree-optimization/61441
 * fold-const.c (const_binop): Convert sNaN to qNaN when
 flag_signaling_nans is off.
 (const_unop): Avoid the operation, other than NEGATE and
 ABS, if flag_signaling_nans is on and the operand is an sNaN.
 (fold_convert_const_real_from_real): Avoid the operation if
 flag_signaling_nans is on and the operand is an sNaN.
 (integer_valued_real_unary_p): Update comment stating it
 returns false for sNaN values.
 (integer_valued_real_binary_p, integer_valued_real_call_p): Same.
 (integer_valued_real_single_p): Same.
 (integer_valued_real_invalid_p, integer_valued_real_p): Same.
 * fold-const-call.c (fold_const_pow): Avoid the operation
 if flag_signaling_nans is on and the operand is an sNaN.
 (fold_const_builtin_load_exponent) Same.
 (fold_const_call_sss): Same for BUILT_IN_POWI.
 * gimple-fold.c (gimple_assign_integer_valued_real_p): Same.
 (gimple_call_integer_valued_real_p): Same.
 (gimple_phi_integer_valued_real_p): Same.
 (gimple_stmt_integer_valued_real_p): Same.
 * simplify-rtx.c (simplify_const_unary_operation): Avoid the
 operation if flag_signaling_nans is on and the operand is an sNaN.
 (simplify_const_binary_operation): Same.
 * tree-ssa-math-opts.c (gimple_expand_builtin_pow): Avoid the
 operation if flag_signaling_nans is on and the operand is an sNaN.

 PR tree-optimization/61441
 * gcc.dg/pr61441.c: New testcase.

===
diff -u -p a/gcc/fold-const.c b/gcc/fold-const.c
--- a/gcc/fold-const.c  2015-11-25 15:24:49.656116740 +0530
+++ b/gcc/fold-const.c  2015-11-25 15:25:07.712117294 +0530
@@ -1166,9 +1166,21 @@ const_binop (enum tree_code code, tree a
/* If either operand is a NaN, just return it.  Otherwise, set up
  for floating-point trap; we return an overflow.  */
if (REAL_VALUE_ISNAN (d1))
-   return arg1;
+  {
+/* Make resulting NaN value to be qNaN when flag_signaling_nans
+   is off.  */
+d1.signalling = 0;
+t = build_real (type, d1);
+return t;
+  }
else if (REAL_VALUE_ISNAN (d2))
-   return arg2;
+  {
+/* Make resulting NaN value to be qNaN when flag_signaling_nans
+   is off.  */
+d2.signalling = 0;
+t = build_real (type, d2);
+return t;
+  }
I was a bit worried about this hunk, but by the time we get here we 
already know that at least one operand is a sNaN and that we're not 
honoring sNaNs.





inexact = real_arithmetic (, code, , );
real_convert (, mode, );
@@ -1538,6 +1550,15 @@ const_binop (enum tree_code code, tree t
  tree
  const_unop (enum tree_code code, tree type, tree arg0)
  {
+  /* Don't perform the operation, other than NEGATE and ABS, if
+ flag_signaling_nans is on and the operand is a NaN.  */
+  if (TREE_CODE (arg0) == REAL_CST
+  && HONOR_SNANS (TYPE_MODE (TREE_TYPE (arg0)))
+  && REAL_VALUE_ISSIGNALING_NAN (TREE_REAL_CST (arg0))
+  && code != NEGATE_EXPR
+  && code != ABS_EXPR)
+return NULL_TREE;
Why let NEGATE_EXPR and ABS_EXPR pass through here?  I realize that 
these can often be implemented with bit-twiddling, so they're usually 
considered special.  BUt in this case aren't we just dealing with 
constants and wouldn't we want to still express the neg/abs so that we 
get a signal when the input value is sNaN rather than collapse down to a 
constant?








  /* Return true if the floating point result of (CODE OP0) has an
 integer value.  We also allow +Inf, -Inf and NaN to be considered
-   integer values.
+   integer values. Return false for signalling NaN.
s/signalling/signaling/  I think it shows up in multiple places, so 
check for it using search/replace.


Also watch for using 8 spaces rather than a tab in your patches.  I 
think I see occurrences of that mistake in this patch.  SO just check 
everything you changed to be sure you aren't adding new instances of 
that nit.



I think a bit of explanation why you're letting NEGATE/ABS though in 
const_unop, fixing the spelling goof and the  instead of 8 spaces 
nit need to be addressed before this can be committed.


jeff

Re: Fix 61441 [5/5] Disable various transformations for signaling NaN operands

2015-12-02 Thread Joseph Myers

On Wed, 2 Dec 2015, Jeff Law wrote:

> Why let NEGATE_EXPR and ABS_EXPR pass through here?  I realize that these can
> often be implemented with bit-twiddling, so they're usually considered
> special.  BUt in this case aren't we just dealing with constants and wouldn't
> we want to still express the neg/abs so that we get a signal when the input
> value is sNaN rather than collapse down to a constant?

See IEEE 754-2008, 5.5.1.  "Implementations shall provide the following 
homogeneous quiet-computational sign bit operations for all supported 
arithmetic formats; they only affect the sign bit. The operations treat 
floating-point numbers and NaNs alike, and signal no exception. These 
operations may propagate non-canonical encodings.".

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: Add fuzzing coverage support

2015-12-02 Thread Jakub Jelinek

On Wed, Dec 02, 2015 at 05:55:29PM +0100, Dmitry Vyukov wrote:
> Can you point to some concrete coding style violations (besides
> function comments)?
> 
> 
> > We seem to have no established process for deciding whether we want a new
> > feature. I am not sure how to approach such a question, and it comes up
> > quite often. Somehow the default answer usually seems to be to accept
> > anything.

> Index: ChangeLog
> ===
> --- ChangeLog (revision 231000)
> +++ ChangeLog (working copy)
> @@ -1,3 +1,13 @@
> +2015-11-27  Dmitry Vyukov  
> +
> + * sancov.c: add file.
> + * Makefile.in: add sancov.c.
> + * passes.def: add sancov pass.
> + * tree-pass.h: add sancov pass.
> + * common.opt: add -fsanitize-coverage=trace-pc flag.
> + * sanitizer.def: add __sanitizer_cov_trace_pc builtin.
> + * builtins.def: enable sanitizer builtins when coverage is enabled.

All the ChangeLog entries should start with upper case letter after :,
instead of add file. say New file., in most other cases there is missing
() with what you've actually changed.
Say
* Makefile.in (OBJS): Add sancov.o.
You should not add sancov.c to GTFILES if there are no GTY variables (and if
there were, you would need to include the generated gt-* header at the end).
* builtins.def (DEF_SANITIZER_BUILTIN): Enable for
flag_sanitize_coverage.
etc.

> --- builtins.def  (revision 231000)
> +++ builtins.def  (working copy)
> @@ -210,7 +210,8 @@
>DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\
>  true, true, true, ATTRS, true, \
> (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \
> - | SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT)))
> + | SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT) \
> + || flag_sanitize_coverage))

The || should be indented below flag_.

> +  FOR_EACH_BB_FN (bb, fun)
> +{
> +  gsi = gsi_start_bb (bb);
> +  stmt = gsi_stmt (gsi);
> +  while (stmt && dyn_cast  (stmt))
> +{
> +  gsi_next ();
> +  stmt = gsi_stmt (gsi);
> +}
> +  if (!stmt)
> +continue;

That would be
gsi = gsi_after_labels (bb);
If you want to ignore empty basic blocks, then followed by
if (gsi_end_p (gsi))
  continue;

> +  f = gimple_build_call (builtin_decl_implicit (BUILT_IN_SANCOV_TRACE), 
> 0);

Generally, the text after BUILT_IN_ should be uppercase variant of the
function name, not a shorthand for it.
So I'd expect BUILT_IN_SANITIZER_COV_TRACE_PC instead.
If that makes the line too long, just use some temporary variable
say for builtin_decl_implicit result.

But I guess the most important question is, where is the
__sanitizer_cov_trace_pc function defined, I don't see it in libsanitizer
right now, is it meant for kernel only, something else?
If it is going to be in some libsanitizer library (which one), then one also
needs code to ensure that the library is linked in if -fsanitize-coverage
is used during linking.  Why is it not -fsanitize=coverage, btw?
Is it incompatible with some other sanitizers, or compatible with all of
them?

Jakub

Re: -fstrict-aliasing fixes 4/6: do not fiddle with flag_strict_aliasing when expanding debug locations

2015-12-02 Thread Jakub Jelinek

On Wed, Dec 02, 2015 at 09:16:10PM +0100, Jan Hubicka wrote:
>   * cfgexpand.c: Include alias.h

Missing full stop at the end.

>   (expand_call_stmt, expand_debug_expr): Set no_new_alias_sets;
>   do not fiddle with flag_strict_aliasing

Likewise.

>   * varasm.c: Include alias.h

Ditto.

> @@ -1034,6 +1037,10 @@ get_alias_set (tree t)
> gcc_checking_assert (p == TYPE_MAIN_VARIANT (p));
> if (TYPE_ALIAS_SET_KNOWN_P (p))
>   set = TYPE_ALIAS_SET (p);
> +   /* During debug statement expanding we can not allocate new alias sets

s/expanding/expansion/ ?

> +  /* During debug statement expanding we can not allocate new alias sets

Ditto.

Otherwise, it looks reasonable to me, but please wait for richi's feedback
on it.

Jakub

Re: [PATCH] PR target/48904 x86_64-knetbsd-gnu missing defs

2015-12-02 Thread Bernhard Reutner-Fischer

Trevor,

On 1 May 2015 at 01:23, Trevor Saunders  wrote:
> On Thu, Apr 30, 2015 at 11:58:09PM +0200, Bernhard Reutner-Fischer wrote:
>> On April 30, 2015 5:53:02 PM GMT+02:00, Jeff Law  wrote:
>> >On 04/30/2015 01:58 AM, Bernhard Reutner-Fischer wrote:
>> >> Hi,
>> >>
>> >> On 30 April 2015 at 07:00, Jeff Law  wrote:
>> >>> On 04/29/2015 02:01 AM, Bernhard Reutner-Fischer wrote:
>> 
>>  2012-09-21  H.J. Lu  
>> 
>>   PR target/48904
>>   * config.gcc (x86_64-*-knetbsd*-gnu): Add
>> >i386/knetbsd-gnu64.h.
>>   * config/i386/knetbsd-gnu64.h: New file
>> >>>
>> >>> OK.  Please install on the trunk.
>> >>
>> >> hmz, according to https://www.debian.org/ports/netbsd/ the debian
>> >> knetbsd port is abandoned since about 2002.
>> >> If this is true (please confirm) then we should probably remove
>> >knetbsd from
>> >> - upstream config repo
>> >> - GCC
>> >> - binutils-gdb
>> >>
>> >> instead of the above patchlet.
>> >> This would work equally well for me WRT config-list.mk builds..
>> >> [I should have checked this earlier, sorry..]
>> >Given what Guillem indicated, I'd support removal.
>> >
>> >It's often the case that we mark it as deprecated and issue an explicit
>> >
>> >error if someone tries to build the port.  That seems wise here.
>>
>> I will apply the abovementioned patch ASAP and let somebody else with 
>> janitorial spare cleanup cycles propose removal of
>> *-knetbsd-* then.
>> Given previous discussion in 
>> https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00582.html someone may also be 
>> in a position to remove *-openbsd3-* support, FWIW.
>> I personally cannot take care of these due to limited (essentially 
>> non-existing ;) spare time.
>
> I'll try to get to it before the end of stage 1.  Its pretty easy
> really, you just add the triple you want to obsolete to the switch in
> config.gcc at about line 240, and then wait for the next release.
>
> btw thanks for cleaning up config-list.mk :)
>
>> Same, BTW for Ultrix support which was officially removed from GCC last 
>> cycle, IIRC -- unless config entries are meant to be sticky from a GNU tool 
>> chain support POV per design/decision?
>
> It looks like most of the ultrix references are in config.guess /
> config.sub / configure so from upstream repositories.  THere's a few
> references in gcc/doc/ that could probably be cleaned up, and a little
> bit of something in fixincludes I think.
>
> Trev
>
>>
>> Either way, cannot clean these up properly ATM and not familiar with those 
>> policies, so punting for now..
>>
>> Cheers,
>> >
>> >jeff
>>

GCC-6 stage1 now ended. Can you take care of some of these
deprecations now, please?
TIA,

Re: [PATCH] Fix ICE in nonnull warning on OFFSET_TYPEs (PR c++/68653)

2015-12-02 Thread Jeff Law


On 12/02/2015 10:27 AM, Marek Polacek wrote:

We were hitting the assert in nonnull_arg_p that only wants to see POINTER_TYPE,
but it got OFFSET_TYPE instead.  Which is--in tree.def--described as "a pointer
relative to an object".  Thus I think we should allow OFFSET_TYPE here.  In the
attached testcase I demonstrated that this warning works even for OFFSET_TYPEs.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-12-02  Marek Polacek  

PR c++/68653
* tree.c (nonnull_arg_p): Allow OFFSET_TYPE.

* g++.dg/warn/nonnull3.C: New test.

OK.
jeff

Re: [PATCH] Fix c++/67337 (segfault in mangle.c)

2015-12-02 Thread Jason Merrill


OK, thanks.

Jason

Re: [PATCH] Add testcase for c/68513

2015-12-02 Thread Jakub Jelinek

On Wed, Dec 02, 2015 at 08:53:11PM +0100, Marek Polacek wrote:
> So Richi fixed this PR via changes for genmatch.c:
> .
> 
> I'd like to add this testcase which shows the original problem, and on top of
> that, I've added a bunch of other functions that try to exercises other
> patterns with save_exprs, both from match.pd and fold-const.c.
> 
> Tested on x86_64-linux, ok for trunk?
> 
> 2015-12-02  Marek Polacek  
> 
>   PR c/68513
>   * gcc.dg/pr68513.c: New test.

Ok, thanks.

Jakub

Re: [PATCH 1/2] destroy values as well as keys when removing them from hash maps

2015-12-02 Thread Trevor Saunders

On Wed, Dec 02, 2015 at 09:35:13AM +, Richard Sandiford wrote:
> Richard Biener  writes:
> > On Wed, 2 Dec 2015, Trevor Saunders wrote:
> >> On Tue, Dec 01, 2015 at 07:43:35PM +, Richard Sandiford wrote:
> >> > tbsaunde+...@tbsaunde.org writes:
> >> > > -template 
> >> > > +template 
> >> > >  template 
> >> > >  inline void
> >> > > -simple_hashmap_traits ::remove (T )
> >> > > +simple_hashmap_traits ::remove (T )
> >> > >  {
> >> > >H::remove (entry.m_key);
> >> > > +  entry.m_value.~Value ();
> >> > >  }
> >> > 
> >> > This is just repeating my IRC comment really, but doesn't this mean that
> >> > we're calling the destructor on an object that was never constructed?
> >> > I.e. nothing ever calls placement new on the entry, the m_key, or the
> >> > m_value.
> >> 
> >> I believe you are correct that placement new is not called.  I'd say its
> >> a bug waiting to happen given that the usage of auto_vec seems to
> >> demonstrate that people expect objects to be initialized and destroyed.
> >> However for now all values are either POD, or auto_vec and in either
> >> case the current 0 initialization has the same effect as the
> >> constructor.  So There may be a theoretical problem with how we
> >> initialize values that will become real when somebody adds a constructor
> >> that doesn't just 0 initialize.  So it should probably be improved at
> >> some point, but it doesn't seem necessary to mess with it at this point
> >> instead of next stage 1.
> >
> > Agreed.
> 
> OK.  I was just worried that (IIRC) we had cases where for:
> 
>  a.~foo ()
>  a.x = ...;
> 
> the assignment to a.x was removed as dead since the object had been
> destroyed.  Maybe that could happen again if we don't have an explicit
> constructor to create a new object.

It seems possible though it seems rather unlikely since you'd have to
insert a new element with the same hash as the one you just removed,
and the compiler would have to figure that out.  On the other hand I
suppose the existing code may already be invalid since it operates on
objects without constructing them.

Trev

> 
> Thanks,
> Richard
> 
> > You'll also need a more elaborate allocator/constructor
> > scheme for this considering the case where no default constructor
> > is available.  See how alloc-pool.h tries to dance around this
> > using a "raw" allocate and a operator new...
> >
> > Richard.
>

Re: -fstrict-aliasing fixes 4/6: do not fiddle with flag_strict_aliasing when expanding debug locations

2015-12-02 Thread Jan Hubicka

Hi,
I discussed the situation with Jakub on IRC today.  It seems we still want
to have alias sets in the dumps, but Jakub agrees with introduction new global
var to control this instead of tampering with flag_strict_aliasing.

This patch does that and also fixes expand_call_stmt which forget to set the
flag.  Additionally I modified get_alias_set to lookup existing alias set
whever possible. In majority of cases the alias set is there because the
variable was optimized out and the optimizers thus touched it earlier.

Bootstrapped/regtested x86_64-linux including ada. OK?

Honza

* cfgexpand.c: Include alias.h
(expand_call_stmt, expand_debug_expr): Set no_new_alias_sets;
do not fiddle with flag_strict_aliasing
* alias.c (no_new_alias_sets): New global var.
(get_alias_set): Do not introduce new alias sets when
no_new_alias_sets is set.
(new_alias_set): Check that no new alias sets are introduced.
* alias.h (no_new_alias_sets): Declare.
* varasm.c: Include alias.h
(make_decl_rtl_for_debug): Use no_new_alias_sets instead of
flag_strict_aliasing.
Index: cfgexpand.c
===
--- cfgexpand.c (revision 231122)
+++ cfgexpand.c (working copy)
@@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.
 #include "builtins.h"
 #include "tree-chkp.h"
 #include "rtl-chkp.h"
+#include "alias.h"
 
 /* Some systems use __main in a way incompatible with its use in gcc, in these
cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN to
@@ -2622,12 +2623,17 @@ expand_call_stmt (gcall *stmt)
   unsigned int ix;
   tree dtemp;
 
+  gcc_assert (!no_new_alias_sets);
+  no_new_alias_sets = true;
+
   if (debug_args)
for (ix = 1; (*debug_args)->iterate (ix, ); ix += 2)
  {
gcc_assert (TREE_CODE (dtemp) == DEBUG_EXPR_DECL);
expand_debug_expr (dtemp);
  }
+
+  no_new_alias_sets = false;
 }
 
   lhs = gimple_call_lhs (stmt);
@@ -4080,6 +4086,8 @@ expand_debug_expr (tree exp)
   int unsignedp = TYPE_UNSIGNED (TREE_TYPE (exp));
   addr_space_t as;
 
+  gcc_assert (no_new_alias_sets);
+
   switch (TREE_CODE_CLASS (TREE_CODE (exp)))
 {
 case tcc_expression:
@@ -5219,12 +5227,12 @@ expand_debug_locations (void)
 {
   rtx_insn *insn;
   rtx_insn *last = get_last_insn ();
-  int save_strict_alias = flag_strict_aliasing;
 
   /* New alias sets while setting up memory attributes cause
  -fcompare-debug failures, even though it doesn't bring about any
  codegen changes.  */
-  flag_strict_aliasing = 0;
+  gcc_assert (!no_new_alias_sets);
+  no_new_alias_sets = true;
 
   for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
 if (DEBUG_INSN_P (insn))
@@ -5284,7 +5292,7 @@ expand_debug_locations (void)
  avoid_complex_debug_insns (insn2, _VAR_LOCATION_LOC (insn2), 0);
   }
 
-  flag_strict_aliasing = save_strict_alias;
+  no_new_alias_sets = false;
 }
 
 /* Performs swapping operands of commutative operations to expand
Index: varasm.c
===
--- varasm.c(revision 231122)
+++ varasm.c(working copy)
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.
 #include "common/common-target.h"
 #include "asan.h"
 #include "rtl-iter.h"
+#include "alias.h"
 
 #ifdef XCOFF_DEBUGGING_INFO
 #include "xcoffout.h"  /* Needed for external data declarations.  */
@@ -1487,7 +1488,6 @@ make_decl_rtl (tree decl)
 rtx
 make_decl_rtl_for_debug (tree decl)
 {
-  unsigned int save_aliasing_flag;
   rtx rtl;
 
   if (DECL_RTL_SET_P (decl))
@@ -1496,18 +1496,16 @@ make_decl_rtl_for_debug (tree decl)
   /* Kludge alert!  Somewhere down the call chain, make_decl_rtl will
  call new_alias_set.  If running with -fcompare-debug, sometimes
  we do not want to create alias sets that will throw the alias
- numbers off in the comparison dumps.  So... clearing
- flag_strict_aliasing will keep new_alias_set() from creating a
- new set.  */
-  save_aliasing_flag = flag_strict_aliasing;
-  flag_strict_aliasing = 0;
+ numbers off in the comparison dumps.  */
+  bool save_no_new_alias_sets = no_new_alias_sets;
+  no_new_alias_sets = true;
 
   rtl = DECL_RTL (decl);
   /* Reset DECL_RTL back, as various parts of the compiler expects
  DECL_RTL set meaning it is actually going to be output.  */
   SET_DECL_RTL (decl, NULL);
 
-  flag_strict_aliasing = save_aliasing_flag;
+  no_new_alias_sets = save_no_new_alias_sets;
   return rtl;
 }
 
Index: alias.c
===
--- alias.c (revision 231122)
+++ alias.c (working copy)
@@ -145,6 +145,9 @@ struct GTY(()) alias_set_entry {
   bool has_pointer;
 };
 
+/* When true we will not allocate any new alias sets.  */
+bool no_new_alias_sets;
+
 static int rtx_equal_for_memref_p (const_rtx, const_rtx);

Re: [OpenACC 0/7] host_data construct

2015-12-02 Thread Cesar Philippidis

On 12/02/2015 07:58 AM, Thomas Schwinge wrote:

> diff --git gcc/testsuite/gfortran.dg/goacc/coarray.f95 
> gcc/testsuite/gfortran.dg/goacc/coarray.f95
> index 130ffc3..d2f10d5 100644
> --- gcc/testsuite/gfortran.dg/goacc/coarray.f95
> +++ gcc/testsuite/gfortran.dg/goacc/coarray.f95
> @@ -1,7 +1,9 @@
>  ! { dg-do compile } 
>  ! { dg-additional-options "-fcoarray=single" }
> -
> -! TODO: These cases must fail
> +!
> +! PR fortran/63861
> +! { dg-xfail-if "" { *-*-* } }
> +! { dg-excess-errors "TODO" }
>  
>  module test
>  contains
> @@ -9,7 +11,6 @@ contains
>  implicit none
>  integer :: i
>  integer, codimension[*] :: a
> -! { dg-excess-errors "sorry, unimplemented: directive not yet 
> implemented" }
>  !$acc declare device_resident (a)
>  !$acc data copy (a)
>  !$acc end data
> @@ -17,7 +18,6 @@ contains
>  !$acc end data
>  !$acc parallel private (a)
>  !$acc end parallel
> -! { dg-excess-errors "sorry, unimplemented: directive not yet 
> implemented" }
>  !$acc host_data use_device (a)
>  !$acc end host_data
>  !$acc parallel loop reduction(+:a)
> diff --git gcc/testsuite/gfortran.dg/goacc/coarray_2.f90 
> gcc/testsuite/gfortran.dg/goacc/coarray_2.f90
> index f9cf9ac..87e04d5 100644
> --- gcc/testsuite/gfortran.dg/goacc/coarray_2.f90
> +++ gcc/testsuite/gfortran.dg/goacc/coarray_2.f90
> @@ -3,6 +3,7 @@
>  !
>  ! PR fortran/63861
>  ! { dg-xfail-if "" { *-*-* } }
> +! { dg-excess-errors "TODO" }

This host_data patch exposed a bug in the fortran front end where it was
allowing arrays to be used as reduction variables. If replace you
replace codimension with dimension, you'd see a similar ICE. The
attached patch, while it doesn't make any attempt to fix the gimplifier
changes, does teach the fortran front end to error on acc reductions
containing array variables.

Note that this solution is somewhat aggressive because we probably
should allow reductions on individual array elements. E.g.

  !$acc loop reduction(+:var(1))

The c and c++ front ends also have that problem. Maybe I'll revisit this
later.

Is this ok for trunk? It will close pr63861.

Cesar
2015-12-02  Cesar Philippidis  

	gcc/fortran/
	PR fortran/63861
	* openmp.c (gfc_match_omp_clauses): Allow subarrays for acc reductions.
	(resolve_omp_clauses): Error on any acc reductions on arrays.

	gcc/testsuite/
	* gfortran.dg/goacc/array-reduction.f90: New test.
	* gfortran.dg/goacc/assumed.f95: Update expected diagnostics.
	* gfortran.dg/goacc/coarray.f95: Likewise.
	* gfortran.dg/goacc/coarray_2.f90: Likewise.
	* gfortran.dg/goacc/reduction-2.f95: Likewise.
	* gfortran.dg/goacc/reduction.f95: Likewise.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 6182464..276f2f1 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -978,7 +978,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 
 	  if (gfc_match_omp_variable_list (" :",
 	   >lists[OMP_LIST_REDUCTION],
-	   false, NULL, ) == MATCH_YES)
+	   false, NULL, , openacc)
+	  == MATCH_YES)
 	{
 	  gfc_omp_namelist *n;
 	  if (rop == OMP_REDUCTION_NONE)
@@ -3313,6 +3314,11 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 		   n->sym->name, >where);
 	  else
 	n->sym->mark = 1;
+
+	  /* OpenACC does not support reductions on arrays.  */
+	  if (n->sym->as)
+	gfc_error ("Array %qs is not permitted in reduction at %L",
+		   n->sym->name, >where);
 	}
 }
   
diff --git a/gcc/testsuite/gfortran.dg/goacc/array-reduction.f90 b/gcc/testsuite/gfortran.dg/goacc/array-reduction.f90
new file mode 100644
index 000..d71c400
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/array-reduction.f90
@@ -0,0 +1,74 @@
+program test
+  implicit none
+  integer a(10), i
+
+  a(:) = 0
+  
+  ! Array reductions.
+  
+  !$acc parallel reduction (+:a) ! { dg-error "Array 'a' is not permitted in reduction" }
+  do i = 1, 10
+ a = a + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel
+  !$acc loop reduction (+:a) ! { dg-error "Array 'a' is not permitted in reduction" }
+  do i = 1, 10
+ a = a + 1
+  end do
+  !$acc end parallel
+
+  !$acc kernels
+  !$acc loop reduction (+:a) ! { dg-error "Array 'a' is not permitted in reduction" }
+  do i = 1, 10
+ a = a + 1
+  end do
+  !$acc end kernels
+
+  ! Subarray reductions.
+  
+  !$acc parallel reduction (+:a(1:5)) ! { dg-error "Array 'a' is not permitted in reduction" }
+  do i = 1, 10
+ a = a + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel
+  !$acc loop reduction (+:a(1:5)) ! { dg-error "Array 'a' is not permitted in reduction" }
+  do i = 1, 10
+ a = a + 1
+  end do
+  !$acc end parallel
+
+  !$acc kernels
+  !$acc loop reduction (+:a(1:5)) ! { dg-error "Array 'a' is not permitted in reduction" }
+  do i = 1, 10
+ a = a + 1
+  end do
+  !$acc end kernels
+
+  ! Reductions on array elements.
+  
+

[PATCH] Add testcase for c/68513

2015-12-02 Thread Marek Polacek

So Richi fixed this PR via changes for genmatch.c:
.

I'd like to add this testcase which shows the original problem, and on top of
that, I've added a bunch of other functions that try to exercises other
patterns with save_exprs, both from match.pd and fold-const.c.

Tested on x86_64-linux, ok for trunk?

2015-12-02  Marek Polacek  

PR c/68513
* gcc.dg/pr68513.c: New test.

diff --git gcc/testsuite/gcc.dg/pr68513.c gcc/testsuite/gcc.dg/pr68513.c
index e69de29..71298d8 100644
--- gcc/testsuite/gcc.dg/pr68513.c
+++ gcc/testsuite/gcc.dg/pr68513.c
@@ -0,0 +1,125 @@
+/* PR c/68513 */
+/* { dg-do compile } */
+/* { dg-options "-funsafe-math-optimizations -fno-math-errno -O 
-Wno-div-by-zero" } */
+
+int i;
+unsigned u;
+volatile int *e;
+
+#define E (i ? *e : 0)
+
+/* Can't trigger some of them because operand_equal_p will return false
+   for side-effects.  */
+
+/* (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x */
+int
+fn1 (void)
+{
+  int r = 0;
+  r += (short) (E & ~u | i & u);
+  r += -(short) (E & ~u | i & u);
+  r += (short) -(E & ~u | i & u);
+  return r;
+}
+
+/* sqrt(x) < y is x >= 0 && x != +Inf, when y is large.  */
+double
+fn2 (void)
+{
+  double r;
+  r = __builtin_sqrt (E) < __builtin_inf ();
+  return r;
+}
+
+/* sqrt(x) < c is the same as x >= 0 && x < c*c.  */
+double
+fn3 (void)
+{
+  double r;
+  r = __builtin_sqrt (E) < 1.3;
+  return r;
+}
+
+/* copysign(x,y)*copysign(x,y) -> x*x.  */
+double
+fn4 (double y, double x)
+{
+  return __builtin_copysign (E, y) * __builtin_copysign (E, y);
+}
+
+/* x <= +Inf is the same as x == x, i.e. !isnan(x).  */
+int
+fn5 (void)
+{
+  return E <= __builtin_inf ();
+}
+
+/* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
+int
+fn6 (void)
+{
+  return (i & ~E) - (i & E);
+}
+
+/* Fold (A & B) - (A & ~B) into B - (A ^ B).  */
+int
+fn7 (void)
+{
+  return (i & E) - (i & ~E);
+}
+
+/* x + (x & 1) -> (x + 1) & ~1 */
+int
+fn8 (void)
+{
+  return E + (E & 1);
+}
+
+/* Simplify comparison of something with itself.  */
+int
+fn9 (void)
+{
+  return E <= E | E >= E;
+}
+
+/* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
+int
+fn10 (void)
+{
+  return (i & ~E) - (i & E);
+}
+
+/* abs(x)*abs(x) -> x*x.  Should be valid for all types.  */
+int
+fn11 (void)
+{
+  return __builtin_abs (E) * __builtin_abs (E);
+}
+
+/* (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2) */
+int
+fn12 (void)
+{
+  return (E | 11) & 12;
+}
+
+/* fold_range_test */
+int
+fn13 (const char *s)
+{
+  return s[E] != '\0' && s[E] != '/';
+}
+
+/* fold_comparison */
+int
+fn14 (void)
+{
+  return (!!i ? : (u *= E / 0)) >= (u = E);
+}
+
+/* fold_mult_zconjz */
+_Complex int
+fn15 (_Complex volatile int *z)
+{
+  return *z * ~*z;
+}

Marek

Re: [PATCH] Allocate constant size dynamic stack space in the prologue

2015-12-02 Thread Jeff Law


On 11/27/2015 07:09 AM, Dominik Vogt wrote:

New patch with the following changes:

* Fixed comment about dynamic var area placement.
* The area is now placed further away from the stack pointer than
   the non-dynamic stack variables (tested only with
   STACK_GROWS_DOWNWARD).  This is a possible performance
   improvement on S/390 (hoping that more variables will be
   addressable using a displacement).
* Moved the code that calculates the size to actually allocate
   from the size required by dynamic stack variables to a separate
   function.  Use that function from allocate_dynamic_stack_space()
   and expand_stack_vars() so the size calculations are the same
   for both.
* Use a target hook to activate the feature (for now).
   (This is just meant to make it more feasible to be included in
   Gcc6.  If it's to late for this the code may be as well be used
   for all targets.)

My inclination is for this to wait until gcc7.

jeff

[PATCH] Fix -E preprocessing of pragmas (PR preprocessor/57580)

2015-12-02 Thread Jakub Jelinek

Hi!

As the testcases show, we can happily emit e.g.
  { #pragma omp single
in the -E preprocessed output, which is not valid C/C++,
the preprocessing directives have to be at the beginning of line
or there can be optional whitespace before them.  But for -fpreprocessed
input there is even a stronger requirement - they have to be at the
beginning of line.

c-ppoutput.c already had a print.printed flag for it, but it has not
been updated in all the spots where it should be.  The following patch
ensures that print.printed is true if the last printed character is not a
newline (except for short chunks of code that obviously don't care about the
value of the flag).

I've compared preprocessed dwarf2out.c before/after the patch (both -E and
-E -P), and the patch makes no difference for that case.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2015-12-02  Jakub Jelinek  

PR preprocessor/57580
* c-ppoutput.c (print): Change printed field to bool.
Move src_file last for smaller padding.
(init_pp_output): Set print.printed to false instead of 0.
(scan_translation_unit): Fix up formatting.  Set print.printed
to true after printing something other than newline.
(scan_translation_unit_trad): Set print.printed to true instead of 1.
(maybe_print_line_1): Set print.printed to false instead of 0.
(print_line_1): Likewise.
(do_line_change): Set print.printed to true instead of 1.
(cb_define, dump_queued_macros, cb_include, cb_def_pragma,
dump_macro): Set print.printed to false after printing newline.

* c-c++-common/cpp/pr57580.c: New test.
* c-c++-common/gomp/pr57580.c: New test.

--- gcc/c-family/c-ppoutput.c.jj2015-11-14 19:35:34.0 +0100
+++ gcc/c-family/c-ppoutput.c   2015-12-02 16:50:54.345754775 +0100
@@ -31,11 +31,11 @@ static struct
   const cpp_token *prev;   /* Previous token.  */
   const cpp_token *source; /* Source token for spacing.  */
   int src_line;/* Line number currently being written. 
 */
-  unsigned char printed;   /* Nonzero if something output at line.  */
+  bool printed;/* True if something output at line.  */
   bool first_time; /* pp_file_change hasn't been called yet.  */
-  const char *src_file;/* Current source file.  */
   bool prev_was_system_token;  /* True if the previous token was a
   system token.*/
+  const char *src_file;/* Current source file.  */
 } print;
 
 /* Defined and undefined macros being queued for output with -dU at
@@ -153,7 +153,7 @@ init_pp_output (FILE *out_stream)
 
   /* Initialize the print structure.  */
   print.src_line = 1;
-  print.printed = 0;
+  print.printed = false;
   print.prev = 0;
   print.outf = out_stream;
   print.first_time = 1;
@@ -206,12 +206,16 @@ scan_translation_unit (cpp_reader *pfile
{
  line_marker_emitted = do_line_change (pfile, token, loc, false);
  putc (' ', print.outf);
+ print.printed = true;
}
  else if (print.source->flags & PREV_WHITE
   || (print.prev
   && cpp_avoid_paste (pfile, print.prev, token))
   || (print.prev == NULL && token->type == CPP_HASH))
-   putc (' ', print.outf);
+   {
+ putc (' ', print.outf);
+ print.printed = true;
+   }
}
   else if (token->flags & PREV_WHITE)
{
@@ -222,6 +226,7 @@ scan_translation_unit (cpp_reader *pfile
  && !in_pragma)
line_marker_emitted = do_line_change (pfile, token, loc, false);
  putc (' ', print.outf);
+ print.printed = true;
}
 
   avoid_paste = false;
@@ -239,7 +244,7 @@ scan_translation_unit (cpp_reader *pfile
fprintf (print.outf, "%s %s", space, name);
  else
fprintf (print.outf, "%s", name);
- print.printed = 1;
+ print.printed = true;
  in_pragma = true;
}
   else if (token->type == CPP_PRAGMA_EOL)
@@ -250,23 +255,23 @@ scan_translation_unit (cpp_reader *pfile
   else
{
  if (cpp_get_options (parse_in)->debug)
- linemap_dump_location (line_table, token->src_loc,
-print.outf);
+   linemap_dump_location (line_table, token->src_loc, print.outf);
 
  if (do_line_adjustments
  && !in_pragma
  && !line_marker_emitted
- && print.prev_was_system_token != !!in_system_header_at(loc)
+ && print.prev_was_system_token != !!in_system_header_at (loc)
  && !is_location_from_builtin_token (loc))
/* The system-ness of this token is different from the one
   of the previous token.  Let's emit a line change to

Re: [PATCH] Fix shrink-wrap bug with anticipating into loops (PR67778, PR68634)

2015-12-02 Thread Jakub Jelinek

On Wed, Dec 02, 2015 at 06:21:47PM +, Segher Boessenkool wrote:
> --- a/gcc/shrink-wrap.c
> +++ b/gcc/shrink-wrap.c
> @@ -752,7 +752,11 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head 
> *bb_with,
>  
>/* If we can move PRO back without having to duplicate more blocks, do so.
>   We can move back to a block PRE if every path from PRE will eventually
> - need a prologue, that is, PRO is a post-dominator of PRE.  */
> + need a prologue, that is, PRO is a post-dominator of PRE.  We might
> + need to duplicate PRE if there is any path from a successor of PRE back
> + to PRE, so don't allow that either (but self-loops are fine, as are any
> + other loops entirely dominated by PRE; this in general seems too
> + expensive to check for, for such an uncommon case).  */

So, what will happen if PRE self-loops?  It would be nice to have it covered
by a testcase.

> +   bool ok = true;
> +
> +   if (!can_get_prologue (pre, prologue_clobbered))
> + ok = false;
> +
> +   FOR_EACH_EDGE (e, ei, pre->succs)
> + if (e->dest != pre
> + && dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre))
> +   ok = false;

I wonder if it wouldn't be better to:

if (!can_get_prologue (pre, prologue_clobbered))
  ok = false;
else
  FOR_EACH_EDGE (e, ei, pre->succs)
if (e->dest != pre
&& dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre))
  {
ok = false;
break;
  }

so that it doesn't walk or continue walking the edges if not needed.

Anyway, that are my comments, I'll defer the rest of the review to somebody
else.

Jakub

Re: [PATCH/RFC v2] PR68212: Improve Accounting of Block Frequencies During Loop Unrolling

2015-12-02 Thread Jeff Law


On 11/25/2015 09:18 AM, Bernd Schmidt wrote:

I'm reading up on this stuff, but I'm probably still not the best person
to review the actual frequency manipulation parts in this. There are a
few things I can comment on, however.

The first question would be, have you looked at the rebuild_frequencies
code in predict.c, and whether you can reuse that entirely or in part?
There's similar code in the tree-ssa-threadupdate.c which tries to 
update the frequencies/probabilities in response to all the block 
copying and CFG reorganization that occurs as a result of jump 
threading.  It's a bit of a nasty problem.


I can try to help with the frequency manipulation bits.  It's not my 
strong suit, but I did have to dig into it at more than the surface 
level when the google folks submitted their rewrite of the frequency 
adjustments in tree-ssa-threadupdate.c


Kelvin, if you could post your next iteration, it'd be helpful.

jeff

Re: [OpenACC 0/7] host_data construct

2015-12-02 Thread Steve Kargl

On Wed, Dec 02, 2015 at 11:16:10AM -0800, Cesar Philippidis wrote:
> 
> This host_data patch exposed a bug in the fortran front end where it was
> allowing arrays to be used as reduction variables. If replace you
> replace codimension with dimension, you'd see a similar ICE. The
> attached patch, while it doesn't make any attempt to fix the gimplifier
> changes, does teach the fortran front end to error on acc reductions
> containing array variables.
> 
> Note that this solution is somewhat aggressive because we probably
> should allow reductions on individual array elements. E.g.
> 
>   !$acc loop reduction(+:var(1))
> 
> The c and c++ front ends also have that problem. Maybe I'll revisit this
> later.
> 
> Is this ok for trunk? It will close pr63861.
> 

I think that it is OK, but will defer to Jakub or Thomas.
I suspect tht Jakub may be pre-occupied with the upcoming
5.3 release.

-- 
Steve

Re: Patch RFA: Improve building gotools with a cross compiler

2015-12-02 Thread Jeff Law


On 11/25/2015 03:22 PM, Ian Lance Taylor wrote:

PR 66147 points out that it doesn't work to build the gotools with a
cross-compiler.  This patch improves matters.  I added a new host
export to the top level Makefile.  Build maintainers, does this change
seem OK?

Ian

./ChangeLog:

2015-11-25  Ian Lance Taylor  

PR go/66147
* Makefile.tpl (HOST_EXPORTS): Add XGCC_FLAGS_FOR_TARGET.
* Makefile.in: Regenerate.

gotools/ChangeLog:

2015-11-25  Ian Lance Taylor  

PR go/66147
* Makefile.am (GOCOMPILER): In NATIVE case, add
$(XGCC_FLAGS_FOR_TARGET).
Given this only affects Go, it's fine by me.  That's why I didn't look 
at it earlier -- I missed the RFA tag.


Jeff

Re: Fix 61441 [5/5] Disable various transformations for signaling NaN operands

2015-12-02 Thread Jeff Law


On 12/02/2015 12:08 PM, Joseph Myers wrote:

On Wed, 2 Dec 2015, Jeff Law wrote:


Why let NEGATE_EXPR and ABS_EXPR pass through here?  I realize that these can
often be implemented with bit-twiddling, so they're usually considered
special.  BUt in this case aren't we just dealing with constants and wouldn't
we want to still express the neg/abs so that we get a signal when the input
value is sNaN rather than collapse down to a constant?


See IEEE 754-2008, 5.5.1.  "Implementations shall provide the following
homogeneous quiet-computational sign bit operations for all supported
arithmetic formats; they only affect the sign bit. The operations treat
floating-point numbers and NaNs alike, and signal no exception. These
operations may propagate non-canonical encodings.".

Ah, in that case, nevermind :-)

So I think it's just the spelling and whitespace nits that need to be fixed.

jeff

Re: [OpenACC 0/7] host_data construct

2015-12-02 Thread Jakub Jelinek

On Wed, Dec 02, 2015 at 11:16:10AM -0800, Cesar Philippidis wrote:
> > --- gcc/testsuite/gfortran.dg/goacc/coarray_2.f90
> > +++ gcc/testsuite/gfortran.dg/goacc/coarray_2.f90
> > @@ -3,6 +3,7 @@
> >  !
> >  ! PR fortran/63861
> >  ! { dg-xfail-if "" { *-*-* } }
> > +! { dg-excess-errors "TODO" }
> 
> This host_data patch exposed a bug in the fortran front end where it was
> allowing arrays to be used as reduction variables. If replace you
> replace codimension with dimension, you'd see a similar ICE. The
> attached patch, while it doesn't make any attempt to fix the gimplifier
> changes, does teach the fortran front end to error on acc reductions
> containing array variables.

Does the OpenACC standard disallow array reductions?
Just asking, because OpenMP allows them (up to 4.0 only in Fortran,
in 4.5 also C/C++ array sections are allowed).

If the OpenACC standard disallows them, then it is desirable to reject them
and the patch is ok, otherwise you should try harder to support them ;).

Jakub

Re: [PATCH] Fix large code model with the ELFv2 ABI

2015-12-02 Thread David Edelsohn

On Wed, Dec 2, 2015 at 2:44 PM, Ulrich Weigand  wrote:
> Hello,
>
> this patch fixes support for the large code model with the ELFv2 ABI.
>
> The global entry point prologue currently assumes that the TOC associated
> with a function is less than 2GB away from the function entry point.  This
> is always true when using the medium or small code model, but may not be
> the case when using the large code model.
>
> This patch adds a new variant of the ELFv2 global entry point prologue that
> lifts the 2GB restriction when building with -mcmodel=large.  This works by
> emitting a quadword containing the distance from the function entry point
> to its associated TOC immediately before the entry point (this is done in
> rs6000_elf_declare_function_name), and then using a prologue like:
>
> ld r2,-8(r12)
> add r2,r2,r12
>
> In addition, if assembler support for this new relocation is detected,
> the compiler emits a R_PPC64_ENTRY reloc on the first instruction of
> this new prologue.  This will allow the linker to rewrite the prologue
> to the original form if it turns out at link time that the distance
> between entry point and TOC actually happens to be less than 2GB.
>
> The patch also introduces a new function rs6000_global_entry_point_needed_p,
> which is used instead of directly checking cfun->machine->r2_setup_needed.
> This allows handling global entry point prologues for C++ thunks.  This was
> previously done by having rs6000_output_mi_thunk set the r2_setup_needed
> flag, but this no longer works, since we now need to check whether we need
> a global entry point prologue in rs6000_elf_declare_function_name, which
> is already called *before* rs6000_output_mi_thunk.
>
> Finally, the patch removes use of the GNU local label extension ("0b")
> in favour of compiler-emitted internal labels.  This seems clearer now
> that the entry point code may be split across two different functions
> (rs6000_output_function_prologue vs. rs6000_elf_declare_function_name)
> and makes it simpler to move the location of the TOC delta quadword
> at some future time.  Also, it removes the implicit assumption that
> the system assembler supports this GNU extension, which I understand
> we don't assume anywhere else.
>
> Tested on powerp64le-linux.  Also tested bootstrap/regtest with an
> extra patch to switch the default code model to CMODEL_LARGE.  Tested
> both with an assembler supporting R_PPC64_ENTRY and with an assembler
> that doesn't support it.
>
> OK for mainline?
>
> Bye,
> Ulrich
>
> ChangeLog:
>
> * configure.ac: Check assembler support for R_PPC64_ENTRY relocation.
> * configure: Regenerate.
> * config.in: Regenerate.
> * config/rs6000/rs6000.c (rs6000_global_entry_point_needed_p): New
> function.
> (rs6000_output_function_prologue): Use it instead of checking
> cfun->machine->r2_setup_needed.  Use internal labels instead of
> GNU as local label extension.  Handle ELFv2 large code model.
> (rs6000_output_mi_thunk): Do not set cfun->machine->r2_setup_needed.
> (rs6000_elf_declare_function_name): Handle ELFv2 large code model.

Okay.

Thanks, David

Re: [OpenACC 0/7] host_data construct

2015-12-02 Thread Cesar Philippidis

On 12/02/2015 11:35 AM, Jakub Jelinek wrote:
> On Wed, Dec 02, 2015 at 11:16:10AM -0800, Cesar Philippidis wrote:
>>> --- gcc/testsuite/gfortran.dg/goacc/coarray_2.f90
>>> +++ gcc/testsuite/gfortran.dg/goacc/coarray_2.f90
>>> @@ -3,6 +3,7 @@
>>>  !
>>>  ! PR fortran/63861
>>>  ! { dg-xfail-if "" { *-*-* } }
>>> +! { dg-excess-errors "TODO" }
>>
>> This host_data patch exposed a bug in the fortran front end where it was
>> allowing arrays to be used as reduction variables. If replace you
>> replace codimension with dimension, you'd see a similar ICE. The
>> attached patch, while it doesn't make any attempt to fix the gimplifier
>> changes, does teach the fortran front end to error on acc reductions
>> containing array variables.
> 
> Does the OpenACC standard disallow array reductions?
> Just asking, because OpenMP allows them (up to 4.0 only in Fortran,
> in 4.5 also C/C++ array sections are allowed).
> 
> If the OpenACC standard disallows them, then it is desirable to reject them
> and the patch is ok, otherwise you should try harder to support them ;).

Array reductions aren't supported in OpenACC 2.0.

Cesar

[PATCH] Improve expansion of double word popcount or parity (PR target/68647)

2015-12-02 Thread Jakub Jelinek

Hi!

expand_unop already handles specially several bitop builtins (strangely
e.g. only clz and not ctz which can be handled pretty much the same,
except comparing the other subreg), but does not handle popcount/parity
this way.

popcount of double word value can be computed as popcount (hi) + popcount (lo),
parity of double word value can be computed as parity (hi ^ lo).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2015-12-02  Jakub Jelinek  

PR target/68647
* optabs.c (expand_doubleword_popcount, expand_doubleword_parity):
New functions.
(expand_unop): Use them.

* gcc.target/i386/pr68647.c: New test.

--- gcc/optabs.c.jj 2015-11-27 10:01:03.0 +0100
+++ gcc/optabs.c2015-12-02 09:50:46.774907703 +0100
@@ -2223,6 +2223,58 @@ expand_doubleword_clz (machine_mode mode
   return 0;
 }
 
+/* Try calculating popcount of a double-word quantity as two popcount's of
+   word-sized quantities and summing up the results.  */
+static rtx
+expand_doubleword_popcount (machine_mode mode, rtx op0, rtx target)
+{
+  rtx t0, t1, t;
+  rtx_insn *seq;
+
+  start_sequence ();
+
+  t0 = expand_unop_direct (word_mode, popcount_optab,
+  operand_subword_force (op0, 0, mode), NULL_RTX,
+  true);
+  t1 = expand_unop_direct (word_mode, popcount_optab,
+  operand_subword_force (op0, 1, mode), NULL_RTX,
+  true);
+  if (!t0 || !t1)
+{
+  end_sequence ();
+  return NULL_RTX;
+}
+
+  /* If we were not given a target, use a word_mode register, not a
+ 'mode' register.  The result will fit, and nobody is expecting
+ anything bigger (the return type of __builtin_popcount* is int).  */
+  if (!target)
+target = gen_reg_rtx (word_mode);
+
+  t = expand_binop (word_mode, add_optab, t0, t1, target, 0, OPTAB_DIRECT);
+
+  seq = get_insns ();
+  end_sequence ();
+
+  add_equal_note (seq, t, POPCOUNT, op0, 0);
+  emit_insn (seq);
+  return t;
+}
+
+/* Try calculating
+   (parity:wide x)
+   as
+   (parity:narrow (low (x) ^ high (x))) */
+static rtx
+expand_doubleword_parity (machine_mode mode, rtx op0, rtx target)
+{
+  rtx t = expand_binop (word_mode, xor_optab,
+   operand_subword_force (op0, 0, mode),
+   operand_subword_force (op0, 1, mode),
+   NULL_RTX, 0, OPTAB_DIRECT);
+  return expand_unop (word_mode, parity_optab, t, target, true);
+}
+
 /* Try calculating
(bswap:narrow x)
as
@@ -2582,7 +2634,7 @@ expand_absneg_bit (enum rtx_code code, m
different mode or with a libcall.  */
 static rtx
 expand_unop_direct (machine_mode mode, optab unoptab, rtx op0, rtx target,
-int unsignedp)
+   int unsignedp)
 {
   if (optab_handler (unoptab, mode) != CODE_FOR_nothing)
 {
@@ -2665,6 +2717,27 @@ expand_unop (machine_mode mode, optab un
   goto try_libcall;
 }
 
+  if (unoptab == popcount_optab
+  && GET_MODE_SIZE (mode) == 2 * UNITS_PER_WORD
+  && optab_handler (unoptab, word_mode) != CODE_FOR_nothing
+  && optimize_insn_for_speed_p ())
+{
+  temp = expand_doubleword_popcount (mode, op0, target);
+  if (temp)
+   return temp;
+}
+
+  if (unoptab == parity_optab
+  && GET_MODE_SIZE (mode) == 2 * UNITS_PER_WORD
+  && (optab_handler (unoptab, word_mode) != CODE_FOR_nothing
+ || optab_handler (popcount_optab, word_mode) != CODE_FOR_nothing)
+  && optimize_insn_for_speed_p ())
+{
+  temp = expand_doubleword_parity (mode, op0, target);
+  if (temp)
+   return temp;
+}
+
   /* Widening (or narrowing) bswap needs special treatment.  */
   if (unoptab == bswap_optab)
 {
--- gcc/testsuite/gcc.target/i386/pr68647.c.jj  2015-12-02 10:01:25.621671528 
+0100
+++ gcc/testsuite/gcc.target/i386/pr68647.c 2015-12-02 10:01:56.451226044 
+0100
@@ -0,0 +1,18 @@
+/* PR target/68647 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mpopcnt" } */
+
+int
+f1 (unsigned long long a)
+{
+  return __builtin_popcountll (a);
+}
+
+int
+f2 (unsigned long long a)
+{
+  return __builtin_parityll (a);
+}
+
+/* { dg-final { scan-assembler-not "__popcountdi2" } } */
+/* { dg-final { scan-assembler-not "__paritydi2" } } */

Jakub

Re: [PATCH] Improve expansion of double word popcount or parity (PR target/68647)

2015-12-02 Thread Jeff Law


On 12/02/2015 12:03 PM, Jakub Jelinek wrote:

Hi!

expand_unop already handles specially several bitop builtins (strangely
e.g. only clz and not ctz which can be handled pretty much the same,
except comparing the other subreg), but does not handle popcount/parity
this way.

popcount of double word value can be computed as popcount (hi) + popcount (lo),
parity of double word value can be computed as parity (hi ^ lo).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2015-12-02  Jakub Jelinek  

PR target/68647
* optabs.c (expand_doubleword_popcount, expand_doubleword_parity):
New functions.
(expand_unop): Use them.

* gcc.target/i386/pr68647.c: New test.

OK.
jeff

[GOOGLE] Remove overly-aggressive LIPO assert

2015-12-02 Thread Teresa Johnson

Remove an assert that was overly-strict and already partially redundant
with an immediately prior assert. In this case we had a hidden visibility
function clone that was created after the LIPO link due to indirect call
promotion. It is a cgraph_is_aux_decl_external node.

Fixes failures and passes regression tests. Ok for Google branch?

2015-12-02  Teresa Johnson  

Google ref b/25925223.
* l-ipo.c (cgraph_lipo_get_resolved_node_1): Remove overly-strict
assert.

Index: l-ipo.c
===
--- l-ipo.c (revision 231131)
+++ l-ipo.c (working copy)
@@ -1457,9 +1457,6 @@ cgraph_lipo_get_resolved_node_1 (tree decl, bool d
   gcc_assert (DECL_EXTERNAL (decl)
   || cgraph_is_aux_decl_external (n)
   || DECL_VIRTUAL_P (decl));
-  gcc_assert (/* This is the case for explicit extern
instantiation,
- when cgraph node is not created before link.  */
-  DECL_EXTERNAL (decl));
   cgraph_link_node (n);
   return n;
 }


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: -fstrict-aliasing fixes 4/6: do not fiddle with flag_strict_aliasing when expanding debug locations

2015-12-02 Thread Richard Biener

On Wed, 2 Dec 2015, Jan Hubicka wrote:

> Hi,
> this patch removes flag_strict_aliasing kludge in expanding debug locations 
> and
> instead it introduces explicit parameter DEBUG that makes
> set_mem_attributes_minus_bitpos to not affect alias sets.  This is sanity
> checked by comparing number of alias sets before and after at a time we
> originally overwritten flag_strict_aliasing.
> 
> I also added code to prevent memory attributes creation for !optimize and to
> avoid get_alias_set computation for !flag_strict_aliasing. This slightly
> optimizes -O0 builds but the results seems to be down in the noise (I would 
> not
> object to leave it out).
> 
> The patch should fix at least one (latent?) bug that call_stmt expansion
> invoke expand_debug_expr without clearing flag_strict_aliasing.
> 
> Bootstrapped/regtested x86_64-linux, also tested with compare-debug, OK?

First of all, why do debug MEMs need mem-attrs?  Second, I'd rather
refactor make_decl_rtl into a _raw part that can be used from
both make_decl_rtl and make_decl_rtl_for_debug avoiding the debug
parameter.

I don't think any of this is suitable for stage3.

Thanks,
Richard.

> Honza
> 
>   * cfgexpand.c: Include alias.h
>   (expand_debug_expr): Pass debug=true to set_mem_attributes.
>   (expand_debug_locations): Do not fiddle with flag_strict_aliasing;
>   sanity check that no new alias set was introduced.
>   * varasm.c: Include alias.h
>   (make_decl_rtl): New parameter DEBUG; pass it to set_mem_attributes.
>   (make_decl_rtl_for_debug): Do ont fiddle with flag_strict_aliasing;
>   assert that no new alias set was introduced.
>   * varasm.h (make_decl_rtl): New parameter debug.
>   * alias.h (num_alias_sets): New function.
>   * emit-rtl.c (set_mem_attributes_minus_bitpos): New parameter DEBUG;
>   exit early when not optimizing; do not introduce new alias set when
>   producing debug only attributes.
>   (set_mem_attributes): New parameter DEBUG.
>   * emit-rtl.h (set_mem_attributes, set_mem_attributes_minus_bitpos):
>   New parameters DEBUG.
>   (num_alias_sets): New function.
> 
> Index: cfgexpand.c
> ===
> --- cfgexpand.c   (revision 231122)
> +++ cfgexpand.c   (working copy)
> @@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.
>  #include "builtins.h"
>  #include "tree-chkp.h"
>  #include "rtl-chkp.h"
> +#include "alias.h"
>  
>  /* Some systems use __main in a way incompatible with its use in gcc, in 
> these
> cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN 
> to
> @@ -4178,7 +4179,7 @@ expand_debug_expr (tree exp)
>   return NULL_RTX;
> op0 = gen_rtx_CONST_STRING (Pmode, TREE_STRING_POINTER (exp));
> op0 = gen_rtx_MEM (BLKmode, op0);
> -   set_mem_attributes (op0, exp, 0);
> +   set_mem_attributes (op0, exp, 0, true);
> return op0;
>   }
>/* Fall through...  */
> @@ -4346,7 +4347,7 @@ expand_debug_expr (tree exp)
>   return NULL;
>  
>op0 = gen_rtx_MEM (mode, op0);
> -  set_mem_attributes (op0, exp, 0);
> +  set_mem_attributes (op0, exp, 0, true);
>if (TREE_CODE (exp) == MEM_REF
> && !is_gimple_mem_ref_addr (TREE_OPERAND (exp, 0)))
>   set_mem_expr (op0, NULL_TREE);
> @@ -4372,7 +4373,7 @@ expand_debug_expr (tree exp)
>  
>op0 = gen_rtx_MEM (mode, op0);
>  
> -  set_mem_attributes (op0, exp, 0);
> +  set_mem_attributes (op0, exp, 0, true);
>set_mem_addr_space (op0, as);
>  
>return op0;
> @@ -4458,7 +4459,7 @@ expand_debug_expr (tree exp)
> op0 = copy_rtx (op0);
>   if (op0 == orig_op0)
> op0 = shallow_copy_rtx (op0);
> - set_mem_attributes (op0, exp, 0);
> + set_mem_attributes (op0, exp, 0, true);
> }
>  
>   if (bitpos == 0 && mode == GET_MODE (op0))
> @@ -5219,12 +5220,11 @@ expand_debug_locations (void)
>  {
>rtx_insn *insn;
>rtx_insn *last = get_last_insn ();
> -  int save_strict_alias = flag_strict_aliasing;
>  
>/* New alias sets while setting up memory attributes cause
>   -fcompare-debug failures, even though it doesn't bring about any
>   codegen changes.  */
> -  flag_strict_aliasing = 0;
> +  int num = num_alias_sets ();
>  
>for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
>  if (DEBUG_INSN_P (insn))
> @@ -5284,7 +5284,7 @@ expand_debug_locations (void)
> avoid_complex_debug_insns (insn2, _VAR_LOCATION_LOC (insn2), 0);
>}
>  
> -  flag_strict_aliasing = save_strict_alias;
> +  gcc_checking_assert (num == num_alias_sets ());
>  }
>  
>  /* Performs swapping operands of commutative operations to expand
> Index: varasm.c
> ===
> --- varasm.c  (revision 231122)
> +++ varasm.c  (working copy)
> @@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.
>

Re: [PATCH] S/390: Fix warning in "*movstr" pattern.

2015-12-02 Thread Dominik Vogt

On Wed, Dec 02, 2015 at 09:59:10AM +0100, Andreas Krebbel wrote:
> On 11/30/2015 03:45 PM, Dominik Vogt wrote:
> > On Mon, Nov 09, 2015 at 01:33:23PM +0100, Andreas Krebbel wrote:
> >> On 11/04/2015 02:39 AM, Dominik Vogt wrote:
> >>> On Tue, Nov 03, 2015 at 06:47:28PM +0100, Ulrich Weigand wrote:
>  Dominik Vogt wrote:
> > +++ b/gcc/testsuite/gcc.target/s390/md/movstr-1.c
> > @@ -0,0 +1,11 @@
> > +/* Machine description pattern tests.  */
> > +
> > +/* { dg-do assemble } */
> > +/* { dg-options "-dP -save-temps" } */
> 
> -save-temps is not necessary for a dg-do assemble test.

It *is* necessary for "assemble", but not for "compile" which
should be used here.  Anyway, I want to upgrade the test to a
"run" test that also veryfies whether the generated code does the
right thing.

> > +# Additional md torture tests.
> > +torture-init
> > +set MD_TEST_OPTS [list \
> > +   {-Os -march=z900} {-Os -march=z13} \
> > +   {-O0 -march=z900} {-O0 -march=z13} \
> > +   {-O1 -march=z900} {-O1 -march=z13} \
> > +   {-O2 -march=z900} {-O2 -march=z13} \
> > +   {-O3 -march=z900} {-O3 -march=z13}]
> > +set-torture-options $MD_TEST_OPTS
> > +gcc-dg-runtest [lsort [glob -nocomplain $md_tests]] "" $DEFAULT_CFLAGS
> >  torture-finish
> 
> Does it really make sense to use different -march options for the
> md/ tests? Whether a certain pattern will match usually depends on
> the -march level. I would say the -march option needs to be part
> of testcase.

Agreed, but I think with "run" tests various -march= and -O
options are useful.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

Re: -fstrict-aliasing fixes 5/6: make type system independent of flag_strict_aliasing

2015-12-02 Thread Richard Biener

On Wed, 2 Dec 2015, Jan Hubicka wrote:

> Hi,
> this patch makes the type system to be unchanged by flag_strict_aliasing.
> This is needed to prevent optimization loss in flag_strict_aliasing code where
> some !flag_strict_aliasing code put alias set 0 into a type (this happens
> in all cases I modified in my original patch). It is also necessary to 
> validate
> ipa-icf and operand_equal_p transformations to be safe for code transitions
> !flag_strict_aliasing->flag_strict_aliasing that I wasn to do in the inliner.
> 
> This patch goes the opposite way than my previous attempt (and is short unlike
> the explanation ;).  Instead of adding extra parameter to get_alias_set it
> makes get_alias_set do ignore flag_strict_aliasing.  To make sure that no TBAA
> is used when !flag_strict_aliasing I can simply disable alias_set_subset_of 
> and
> alias_sets_conflict_p which are the only way TBAA oracle can disambiguate
> items.
> 
> Next there are cases where optimizations are disabled to keep TBAA right.  
> I audited the code and found only function.c (that uses object_must_conflict
> for packing) and ipa-icf/fold-const.  This patch updates 
> objects_must_conflict_p, fold-const
> already check flag_strict_aliasing and I did not update ipa-icf because I 
> would
> have to disable non-strict-aliasing path in the followup patch.
> 
> I checked that there is no code difference with -fno-strict-aliasing 
> -fno-ipa-icf
> with this patch on tramp3d and dealII
> 
> 
> Bootstrapped/regtested x86_64-linux and also lto-bootstraped. Looks OK?
> 
>   * alias.c (alias_set_subset_of, alias_sets_conflict_p,
>   objects_must_conflict_p): Short circuit for !flag_strict_aliasing
>   (get_alias_set): Remove flag_strict_aliasing check.
>   (new_alias_set): Likewise.
> Index: alias.c
> ===
> --- alias.c   (revision 231081)
> +++ alias.c   (working copy)
> @@ -405,6 +405,10 @@ alias_set_subset_of (alias_set_type set1
>  {
>alias_set_entry *ase2;
>  
> +  /* Disable TBAA oracle with !flag_strict_aliasing.  */
> +  if (!flag_strict_aliasing)
> +return true;
> +
>/* Everything is a subset of the "aliases everything" set.  */
>if (set2 == 0)
>  return true;
> @@ -466,6 +470,10 @@ alias_sets_conflict_p (alias_set_type se
>alias_set_entry *ase1;
>alias_set_entry *ase2;
>  
> +  /* Disable TBAA oracle with !flag_strict_aliasing.  */
> +  if (!flag_strict_aliasing)
> +return true;
> +
>/* The easy case.  */
>if (alias_sets_must_conflict_p (set1, set2))
>  return 1;
> @@ -561,6 +569,9 @@ objects_must_conflict_p (tree t1, tree t
>  {
>alias_set_type set1, set2;
>  
> +  if (!flag_strict_aliasing)
> +return 1;
> +

Rather than adjusting this function please adjust 
alias_sets_must_conflict_p.

Otherwise this looks ok and indeed much nicer.

Thanks,
Richard.

>/* If neither has a type specified, we don't know if they'll conflict
>   because we may be using them to store objects of various types, for
>   example the argument and local variables areas of inlined functions.  */
> @@ -816,10 +827,12 @@ get_alias_set (tree t)
>  {
>alias_set_type set;
>  
> -  /* If we're not doing any alias analysis, just assume everything
> - aliases everything else.  Also return 0 if this or its type is
> - an error.  */
> -  if (! flag_strict_aliasing || t == error_mark_node
> +  /* We can not give up with -fno-strict-aliasing because we need to build
> + proper type representation for possible functions which are build with
> + -fstirct-aliasing.  */
> +
> +  /* return 0 if this or its type is an error.  */
> +  if (t == error_mark_node
>|| (! TYPE_P (t)
> && (TREE_TYPE (t) == 0 || TREE_TYPE (t) == error_mark_node)))
>  return 0;
> @@ -1085,15 +1098,10 @@ get_alias_set (tree t)
>  alias_set_type
>  new_alias_set (void)
>  {
> -  if (flag_strict_aliasing)
> -{
> -  if (alias_sets == 0)
> - vec_safe_push (alias_sets, (alias_set_entry *) NULL);
> -  vec_safe_push (alias_sets, (alias_set_entry *) NULL);
> -  return alias_sets->length () - 1;
> -}
> -  else
> -return 0;
> +  if (alias_sets == 0)
> +vec_safe_push (alias_sets, (alias_set_entry *) NULL);
> +  vec_safe_push (alias_sets, (alias_set_entry *) NULL);
> +  return alias_sets->length () - 1;
>  }
>  
>  /* Indicate that things in SUBSET can alias things in SUPERSET, but that

Re: [PATCH] Empty redirect_edge_var_map after each pass and function

2015-12-02 Thread Richard Biener

On Tue, 1 Dec 2015, Jeff Law wrote:

> On 12/01/2015 11:33 AM, Alan Lawrence wrote:
> > 
> > I was not able to reduce the testcase below about 30k characters, with e.g.
> > #define T_VOID 0
> >  T_VOID 
> > producing the ICE, but manually changing to
> >  0 
> > preventing the ICE; as did running the preprocessor as a separate step, or a
> > wide variety of options (e.g. -fdump-tree-alias).
> Which is almost always an indication that there's a memory corruption, or
> uninitialized memory read or something similar.
> 
> 
> > 
> > In the end I traced this to loop_unswitch reading stale values from the edge
> > redirect map, which is keyed on 'edge' (a pointer to struct edge_def); the
> > map
> > entries had been left there by pass_dominator (on a different function), and
> > by
> > "chance" the edge *pointers* were the same as to some current edge_defs
> > (even
> > though they pointed to structures created by different allocations, the
> > first
> > of which had since been freed). Hence the fragility of the testcase and
> > environment.
> Right.  So the question I have is how/why did DOM leave anything in the map.
> And if DOM is fixed to not leave stuff lying around, can we then assert that
> nothing is ever left in those maps between passes?  There's certainly no good
> reason I'm aware of why DOM would leave things in this state.

It happens not only with DOM but with all passes doing edge redirection.
This is because the map is populated by GIMPLE cfg hooks just in case
it might be used.  But there is no such thing as a "start CFG manip"
and "end CFG manip" to cleanup such dead state.

IMHO the redirect-edge-var-map stuff is just the very most possible
unclean implementation possible. :(  (see how remove_edge "clears"
stale info from the map to avoid even more "interesting" stale
data)

Ideally we could assert the map is empty whenever we leave a pass,
but as said it triggers all over the place.  Even cfg-cleanup causes
such stale data.

I agree that the patch is only a half-way "solution", but a full
solution would require sth more explicit, like we do with
initialize_original_copy_tables/free_original_copy_tables.  Thus
require passes to explicitely request the edge data to be preserved
with a initialize_edge_var_map/free_edge_var_map call pair.

Not appropriate at this stage IMHO (well, unless it turns out to be
a very localized patch).

Richard.

PR68146: Check for null SSA_NAME_DEF_STMTs in fold-const.c

2015-12-02 Thread Richard Sandiford

The problem in the testcase was that tree-complex.c was trying
to fold ABS_EXPRs of SSA names that didn't yet have a definition
(because the SSA names were real and imaginary parts of a complex
SSA name whose definition hadn't yet been visited by the pass).
tree-complex.c uses a straightforward walk in index order:

  /* ??? Ideally we'd traverse the blocks in breadth-first order.  */
  old_last_basic_block = last_basic_block_for_fn (cfun);
  FOR_EACH_BB_FN (bb, cfun)
{

and in the testcase, we have a block A with a single successor B that
comes before it.  B has no other predecessor and has a complex division
that uses an SSA name X defined in A, so we split the components of X
before we reach the definition of X.  (I imagine cfgcleanup would
clean this up by joining A and B.)

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
PR tree-optimization/68146
* fold-const.c (tree_binary_nonnegative_warnv_p): Check for null
SSA_NAME_DEF_STMTs.
(integer_valued_real_call_p): Likewise.

gcc/testsuite/
* gfortran.fortran-torture/compile/pr68146.f90: New test.

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 16bff5f..c99e78e 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -12867,6 +12867,8 @@ tree_binary_nonnegative_warnv_p (enum tree_code code, 
tree type, tree op0,
 bool
 tree_single_nonnegative_warnv_p (tree t, bool *strict_overflow_p, int depth)
 {
+  gimple *stmt;
+
   if (TYPE_UNSIGNED (TREE_TYPE (t)))
 return true;
 
@@ -12892,8 +12894,9 @@ tree_single_nonnegative_warnv_p (tree t, bool 
*strict_overflow_p, int depth)
 to provide it through dataflow propagation.  */
   return (!name_registered_for_update_p (t)
  && depth < PARAM_VALUE (PARAM_MAX_SSA_NAME_QUERY_DEPTH)
- && gimple_stmt_nonnegative_warnv_p (SSA_NAME_DEF_STMT (t),
- strict_overflow_p, depth));
+ && (stmt = SSA_NAME_DEF_STMT (t))
+ && gimple_stmt_nonnegative_warnv_p (stmt, strict_overflow_p,
+ depth));
 
 default:
   return tree_simple_nonnegative_warnv_p (TREE_CODE (t), TREE_TYPE (t));
@@ -13508,6 +13511,7 @@ integer_valued_real_call_p (combined_fn fn, tree arg0, 
tree arg1, int depth)
 bool
 integer_valued_real_single_p (tree t, int depth)
 {
+  gimple *stmt;
   switch (TREE_CODE (t))
 {
 case REAL_CST:
@@ -13524,8 +13528,8 @@ integer_valued_real_single_p (tree t, int depth)
 to provide it through dataflow propagation.  */
   return (!name_registered_for_update_p (t)
  && depth < PARAM_VALUE (PARAM_MAX_SSA_NAME_QUERY_DEPTH)
- && gimple_stmt_integer_valued_real_p (SSA_NAME_DEF_STMT (t),
-   depth));
+ && (stmt = SSA_NAME_DEF_STMT (t))
+ && gimple_stmt_integer_valued_real_p (stmt, depth));
 
 default:
   break;
diff --git a/gcc/testsuite/gfortran.fortran-torture/compile/pr68146.f90 
b/gcc/testsuite/gfortran.fortran-torture/compile/pr68146.f90
new file mode 100644
index 000..7f75ec0
--- /dev/null
+++ b/gcc/testsuite/gfortran.fortran-torture/compile/pr68146.f90
@@ -0,0 +1,11 @@
+subroutine foo(a1, a2, s1, s2, n)
+  integer :: n
+  complex(kind=8) :: a1(n), a2(n), s1, s2
+  do i = 1, n
+ a1(i) = i
+  end do
+  s1 = 20.0 / s2
+  do i = 1, n
+ a2(i) = i / s2
+  end do
+end subroutine foo

Re: [PATCH] Empty redirect_edge_var_map after each pass and function

2015-12-02 Thread Richard Biener

On Tue, 1 Dec 2015, Alan Lawrence wrote:

> This follows on from discussion at
> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03392.html
> To recap: Starting in r229479 and continuing until at least 229711, compiling
> polynom.c from spec2000 on aarch64-none-linux-gnu, with options
> -O3 -mcpu=cortex-a53 -ffast-math (on both cross, native bootstrapped, and 
> native
> --disable-bootstrap compilers), produced a verify_gimple ICE after unswitch:
> 
> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c: In function 
> 'NormalizeCoeffsListx':
> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: 
> incompatible types in PHI argument 0
>  TypHandle NormalizeCoeffsListx ( hdC )
>^
> long int
> 
> int
> 
> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: location 
> references block not in block tree
> l1_279 = PHI <1(28), l1_299(33)>
> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: invalid 
> PHI argument
> 
> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: internal 
> compiler error: tree check: expected class 'type', have 'declaration' 
> (namespace_decl) in useless_type_conversion_p, at gimple-expr.c:84
> 0xd110ef tree_class_check_failed(tree_node const*, tree_code_class, char 
> const*, int, char const*)
> ../../gcc-fsf/gcc/tree.c:9643
> 0x82561b tree_class_check
> ../../gcc-fsf/gcc/tree.h:3042
> 0x82561b useless_type_conversion_p(tree_node*, tree_node*)
> ../../gcc-fsf/gcc/gimple-expr.c:84
> 0xaca043 verify_gimple_phi
> ../../gcc-fsf/gcc/tree-cfg.c:4673
> 0xaca043 verify_gimple_in_cfg(function*, bool)
> ../../gcc-fsf/gcc/tree-cfg.c:4967
> 0x9c2e0b execute_function_todo
> ../../gcc-fsf/gcc/passes.c:1967
> 0x9c360b do_per_function
> ../../gcc-fsf/gcc/passes.c:1659
> 0x9c3807 execute_todo
> ../../gcc-fsf/gcc/passes.c:2022
> 
> I was not able to reduce the testcase below about 30k characters, with e.g.
> #define T_VOID 0
>  T_VOID 
> producing the ICE, but manually changing to
>  0 
> preventing the ICE; as did running the preprocessor as a separate step, or a
> wide variety of options (e.g. -fdump-tree-alias).
> 
> In the end I traced this to loop_unswitch reading stale values from the edge
> redirect map, which is keyed on 'edge' (a pointer to struct edge_def); the map
> entries had been left there by pass_dominator (on a different function), and 
> by
> "chance" the edge *pointers* were the same as to some current edge_defs (even
> though they pointed to structures created by different allocations, the first
> of which had since been freed). Hence the fragility of the testcase and
> environment.
> 
> While the ICE is prevented merely by adding a call to
> redirect_edge_var_map_destroy at the end of pass_dominator::execute, given the
> fragility of the bug, difficulty of reducing the testcase, and the low 
> overhead
> of emptying an already-empty map, I believe the right fix is to empty the map
> as often as can correctly do so, hence this patch - based substantially on
> Richard's comments in PR/68117.
> 
> Bootstrapped + check-gcc + check-g++ on x86_64 linux, based on r231105; I've
> also built SPEC2000 on aarch64-none-linux-gnu by applying this patch (roughly)
> onto the previously-failing r229711, which also passes aarch64 bootstrap, and
> a more recent bootstrap on aarch64 is ongoing. Assuming/if no regressions 
> there...
> 
> Is this ok for trunk?
> 
> This could also be a candidate for the 5.3 release; backporting depends only 
> on
> the (fairly trivial) r230357.

Looks good to me (for both, but backport only after 5.3 is released).  But
please wait for the discussion with Jeff to settle down.

Thanks,
Richard.
 
> gcc/ChangeLog:
> 
>   Alan Lawrence  
>   Richard Biener  
> 
>   * cfgexpand.c (pass_expand::execute): Replace call to
>   redirect_edge_var_map_destroy with redirect_edge_var_map_empty.
>   * tree-ssa.c (delete_tree_ssa): Likewise.
>   * function.c (set_cfun): Call redirect_edge_var_map_empty.
>   * passes.c (execute_one_ipa_transform_pass, execute_one_pass): Likewise.
>   * tree-ssa.h (redirect_edge_var_map_destroy): Remove.
>   (redirect_edge_var_map_empty): New.
>   * tree-ssa.c (redirect_edge_var_map_destroy): Remove.
>   (redirect_edge_var_map_empty): New.
> 
> ---
>  gcc/cfgexpand.c | 2 +-
>  gcc/function.c  | 2 ++
>  gcc/passes.c| 2 ++
>  gcc/tree-ssa.c  | 8 
>  gcc/tree-ssa.h  | 2 +-
>  5 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 1990e10..ede1b82 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -6291,7 +6291,7 @@ pass_expand::execute (function *fun)
>expand_phi_nodes ();
>  
>/* Release any stale SSA redirection data.  */
> -  redirect_edge_var_map_destroy ();
> +  redirect_edge_var_map_empty ();
>  
>/* Register rtl specific

Re: [PATCH 1/2] destroy values as well as keys when removing them from hash maps

2015-12-02 Thread Richard Biener

On Wed, 2 Dec 2015, Trevor Saunders wrote:

> On Tue, Dec 01, 2015 at 07:43:35PM +, Richard Sandiford wrote:
> > tbsaunde+...@tbsaunde.org writes:
> > > -template 
> > > +template 
> > >  template 
> > >  inline void
> > > -simple_hashmap_traits ::remove (T )
> > > +simple_hashmap_traits ::remove (T )
> > >  {
> > >H::remove (entry.m_key);
> > > +  entry.m_value.~Value ();
> > >  }
> > 
> > This is just repeating my IRC comment really, but doesn't this mean that
> > we're calling the destructor on an object that was never constructed?
> > I.e. nothing ever calls placement new on the entry, the m_key, or the
> > m_value.
> 
> I believe you are correct that placement new is not called.  I'd say its
> a bug waiting to happen given that the usage of auto_vec seems to
> demonstrate that people expect objects to be initialized and destroyed.
> However for now all values are either POD, or auto_vec and in either
> case the current 0 initialization has the same effect as the
> constructor.  So There may be a theoretical problem with how we
> initialize values that will become real when somebody adds a constructor
> that doesn't just 0 initialize.  So it should probably be improved at
> some point, but it doesn't seem necessary to mess with it at this point
> instead of next stage 1.

Agreed.  You'll also need a more elaborate allocator/constructor
scheme for this considering the case where no default constructor
is available.  See how alloc-pool.h tries to dance around this
using a "raw" allocate and a operator new...

Richard.

> Trev
> 
> > 
> > Thanks,
> > Richard
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

1 2 >

1 - 100 of 193 matches

Mail list logo