Re: [PATCH] PR77359: Properly align local variables in functions calling alloca.

2016-11-10 Thread David Edelsohn
On Thu, Nov 10, 2016 at 6:47 PM, Dominik Vogt  wrote:
> On Thu, Nov 03, 2016 at 11:40:44AM +0100, Dominik Vogt wrote:
>> The attached patch fixes the stack layout problems on AIX and
>> Power as described here:
>>
>>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77359
>>
>> The patch has been bootstrapped on AIX (32 Bit) and bootstrapped
>> and regression tested on Power (biarch).  It needs more testing
>> that I cannot do with the hardware available to me.
>>
>> If the patch is good, this one can be re-applied:
>>
>>   https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01730.html
>>   https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01616.html
>
> So, is this patch in order to be committed?  (Assuming that a
> followup patch will clean up the rs6000.h+aix.h quirks.)

Please also update the ASCII pictures above the rs6000_stack_info()
function in rs6000.c to show / describe the new padding for alignment.

Thanks, David


Re: [PATCH/AARCH64] Handle ILP32 multi-arch

2016-11-10 Thread Andrew Pinski
On Tue, Oct 25, 2016 at 3:25 PM, Matthias Klose  wrote:
> On 07.10.2016 23:08, Andrew Pinski wrote:
>> Hi,
>>   This patch adds ilp32 multi-arch support.  This is needed to support
>> multi-arch on Debian like systems.
>>
>> OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>> Also tested with ilp32 with a newly built toolchain that supports
>> ILP32 with Ubuntu 1604 base.
>>
>> Thanks,
>> Andrew
>>
>> ChangeLog:
>> * config/aarch64/t-aarch64-linux (MULTILIB_OSDIRNAMES): Handle
>> multi-arch for ilp32.
>
> I can't approve that, but it looks like a reasonable change, but we should
> document the multiarch triplet at https://wiki.debian.org/Multiarch/Tuples


Ping?

Thanks,
Andrew

>
> Matthias
>


[PATCH/AARCH64] Improved -mcpu/mtune/march=native handling

2016-11-10 Thread Andrew Pinski
As I mentioned in my other emails, parsing /proc/cpuinfo has one issue:
the current parsing assumes many different things about the
format.  So the best way to do this is to parse the
/sys/devices/system/cpu/cpuN/regs/identification/midr_el1 files
instead.  To find out which CPUs are present (though not necessarily
online) we parse the "/sys/devices/system/cpu/present" file.  We fall
back to parsing /proc/cpuinfo if parsing any of these files fails,
including not finding out which CPU we are on.  The main reason why we
fall back is that only newer kernels support exporting this file.  To
get the features I just look at the hwcap that the kernel passes to
userspace, so I needed to add an extra argument to
AARCH64_OPT_EXTENSION.  I also had to define some HWCAP_* macros in
driver-aarch64.c since older kernel headers don't have these values
defined.
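
[Editor's sketch, not part of the patch.]  The MIDR_EL1 decoding described
above might look roughly like the following.  The field positions follow the
ARMv8-A register layout (implementer in bits [31:24], part number in bits
[15:4]); the function names and the assumption that the sysfs file holds a
single hexadecimal number such as "0x00000000431f0a10" are illustrative and
may not match what the patch actually does.

```c
#include <stdint.h>
#include <stdlib.h>

/* Implementer code: bits [31:24] of MIDR_EL1.  */
static unsigned
midr_implementer (uint64_t midr)
{
  return (unsigned) ((midr >> 24) & 0xff);
}

/* Primary part number: bits [15:4] of MIDR_EL1.  */
static unsigned
midr_part_no (uint64_t midr)
{
  return (unsigned) ((midr >> 4) & 0xfff);
}

/* Parse the text read from a midr_el1 file (assumed to be one hex
   number, optionally prefixed with "0x").  Returns 0 on success and
   stores the value in *MIDR; returns -1 if no digits were found.  */
static int
parse_midr_text (const char *text, uint64_t *midr)
{
  char *end;
  unsigned long long value = strtoull (text, &end, 16);

  if (end == text)
    return -1;
  *midr = (uint64_t) value;
  return 0;
}
```

A caller would read the file contents into a buffer, pass it to
parse_midr_text, and fall back to another detection method when it
returns -1.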

It should also be possible to parse
/sys/devices/system/cpu/cpu%d/cache%d directory to get cache
information too but that is left for another patch and another time.

Since I don't have access to a big.LITTLE system, someone should test
there with a new enough kernel; I was using stock 4.9.0-rc3.

OK?  Bootstrapped and tested on ThunderX on aarch64-linux-gnu with no
regressions and making sure /proc/cpuinfo is not read (by using
strace).

Thanks,
Andrew Pinski

ChangeLog:
* config/aarch64/aarch64-option-extensions.def: Document extra
argument to AARCH64_OPT_EXTENSION.  Update for the extra argument for
all of the option extensions.
* config/aarch64/driver-aarch64.c: Include sys/auxv.h and asm/hwcap.h.
(HWCAP_CRC32): Define if needed.
(HWCAP_ATOMICS): Likewise.
(HWCAP_FPHP): Likewise.
(HWCAP_ASIMDHP): Likewise.
(aarch64_arch_extension): New field hwcap_mask.
(AARCH64_OPT_EXTENSION): Handle extra argument.
(AARCH64_BIG_LITTLE): Always put the larger core number first.
(valid_bL_core_p): Don't check AARCH64_BIG_LITTLE for the opposite
order as it already handles the order.
(implementor_from_midr): New function.
(part_no_from_midr): New function.
(sysfsformat): New define.
(host_detect_local_cpu_sys): New function.
(host_detect_local_cpu): Call host_detect_local_cpu_sys if opening
"/sys/devices/system/cpu/present" file worked.
* common/config/aarch64/aarch64-common.c (AARCH64_OPT_EXTENSION):
Handle extra argument.
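
[Editor's sketch, not part of the patch.]  The ChangeLog entries above
describe an X-macro table that gains one more argument carrying a hwcap
mask.  A minimal stand-alone version of that idea is shown below.  All
DEMO_* names are hypothetical, the table is a stand-in for the real .def
file, and the bit values are chosen for illustration; real code would take
them from asm/hwcap.h.

```c
#include <stddef.h>

/* Illustrative bit values only.  */
#define DEMO_HWCAP_FP     (1UL << 0)
#define DEMO_HWCAP_ASIMD  (1UL << 1)
#define DEMO_HWCAP_AES    (1UL << 3)
#define DEMO_HWCAP_PMULL  (1UL << 4)
#define DEMO_HWCAP_SHA1   (1UL << 5)
#define DEMO_HWCAP_SHA2   (1UL << 6)
#define DEMO_HWCAP_CRC32  (1UL << 7)

struct demo_extension
{
  const char *name;
  unsigned long hwcap_mask;
};

/* Stand-in for the .def file: one entry per extension.  */
#define DEMO_ALL_EXTENSIONS \
  DEMO_OPT_EXTENSION ("fp", DEMO_HWCAP_FP) \
  DEMO_OPT_EXTENSION ("simd", DEMO_HWCAP_ASIMD) \
  DEMO_OPT_EXTENSION ("crypto", DEMO_HWCAP_AES | DEMO_HWCAP_PMULL \
                                | DEMO_HWCAP_SHA1 | DEMO_HWCAP_SHA2) \
  DEMO_OPT_EXTENSION ("crc", DEMO_HWCAP_CRC32)

static const struct demo_extension demo_extensions[] =
{
#define DEMO_OPT_EXTENSION(NAME, HWCAP) { NAME, HWCAP },
  DEMO_ALL_EXTENSIONS
#undef DEMO_OPT_EXTENSION
  { NULL, 0 }
};

/* An extension counts as present only if every bit of its mask is set.  */
static int
demo_extension_present (const struct demo_extension *ext, unsigned long hwcap)
{
  return (hwcap & ext->hwcap_mask) == ext->hwcap_mask;
}

/* Number of table entries whose mask is fully covered by HWCAP.  */
static unsigned
demo_count_present (unsigned long hwcap)
{
  unsigned count = 0;

  for (const struct demo_extension *e = demo_extensions; e->name; e++)
    if (demo_extension_present (e, hwcap))
      count++;
  return count;
}
```

Requiring all mask bits matters for entries such as "crypto", which depend
on several kernel-reported features at once.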
Index: common/config/aarch64/aarch64-common.c
===
--- common/config/aarch64/aarch64-common.c  (revision 242061)
+++ common/config/aarch64/aarch64-common.c  (working copy)
@@ -121,7 +121,7 @@ struct aarch64_option_extension
 /* ISA extensions in AArch64.  */
 static const struct aarch64_option_extension all_extensions[] =
 {
-#define AARCH64_OPT_EXTENSION(NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF, Z) \
+#define AARCH64_OPT_EXTENSION(NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF, Z, YY) \
   {NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF},
 #include "config/aarch64/aarch64-option-extensions.def"
   {NULL, 0, 0, 0}
Index: config/aarch64/aarch64-option-extensions.def
===
--- config/aarch64/aarch64-option-extensions.def  (revision 242061)
+++ config/aarch64/aarch64-option-extensions.def  (working copy)
@@ -21,7 +21,7 @@
 
Before using #include to read this file, define a macro:
 
-  AARCH64_OPT_EXTENSION(EXT_NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF, FEATURE_STRING)
+  AARCH64_OPT_EXTENSION(EXT_NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF, FEATURE_STRING, HWCAP)
 
EXT_NAME is the name of the extension, represented as a string constant.
FLAGS_CANONICAL is the canonical internal name for this flag.
@@ -36,28 +36,29 @@
the extension (for example, the 'crypto' extension depends on four
entries: aes, pmull, sha1, sha2 being present).  In that case this field
should contain a space (" ") separated list of the strings in 'Features'
-   that are required.  Their order is not important.  */
+   that are required.  Their order is not important.  
+   HWCAP is the required hwcap mask for this feature. */
 
 /* Enabling "fp" just enables "fp".
Disabling "fp" also disables "simd", "crypto" and "fp16".  */
-AARCH64_OPT_EXTENSION("fp", AARCH64_FL_FP, 0, AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_F16, "fp")
+AARCH64_OPT_EXTENSION("fp", AARCH64_FL_FP, 0, AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_F16, "fp", HWCAP_FP)
 
 /* Enabling "simd" also enables "fp".
Disabling "simd" also disables "crypto".  */
-AARCH64_OPT_EXTENSION("simd", AARCH64_FL_SIMD, AARCH64_FL_FP, AARCH64_FL_CRYPTO, "asimd")
+AARCH64_OPT_EXTENSION("simd", AARCH64_FL_SIMD, AARCH64_FL_FP, AARCH64_FL_CRYPTO, "asimd", HWCAP_ASIMD)
 
 /* Enabling "crypto" also enables "fp", "simd".
Disabling "crypto" just disables "crypto".  */
-AARCH64_OPT_EXTENSION("crypto", AARCH64_FL_CRYPTO, AARCH64_FL_FP | AARCH64_FL_SIMD, 0, "aes pmull sha1 sha2")
+AARCH64_OPT_EXTENSION("crypto", AARCH64_FL_CRYPTO, AARCH64_FL_FP | AARCH64_FL_SIMD, 0, "aes 

Re: [PATCH] Fix PR 78243 on PowerPC

2016-11-10 Thread Segher Boessenkool
On Thu, Nov 10, 2016 at 07:21:15PM -0500, Michael Meissner wrote:
> --- gcc/config/rs6000/vsx.md  (revision 242048)
> +++ gcc/config/rs6000/vsx.md  (revision 242049)
> @@ -2542,10 +2542,13 @@ (define_insn "vsx_extract__p9"
>"VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB
> && TARGET_VSX_SMALL_INTEGER"
>  {
> -  /* Note, the element number has already been adjusted for endianness, so we
> - don't have to adjust it here.  */
> -  int unit_size = GET_MODE_UNIT_SIZE (mode);
> -  HOST_WIDE_INT offset = unit_size * INTVAL (operands[2]);
> +  HOST_WIDE_INT elt = INTVAL (operands[2]);
> +  HOST_WIDE_INT elt_adj = ((!VECTOR_ELT_ORDER_BIG)
> +? (GET_MODE_NUNITS (mode) - 1 - elt)
> +: elt);

Two unnecessary pairs of parens (three, but emacs likes the outer pair?)

Otherwise looks fine.  Okay for trunk with that fix, thanks,


Segher


Re: [PATCH] PR78241, fix loop unroller when niter expr is not reliable

2016-11-10 Thread Andrew Pinski
On Wed, Nov 9, 2016 at 2:13 PM, Pat Haugen  wrote:
> The following fixes a problem introduced by my earlier loop unroller patch, 
> https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01612.html. In instances where 
> the niter expr is not reliable we need to still emit an initial peel copy of 
> the loop.
>
> Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk?

This fixes the performance regression I reported with the original
patch at https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01224.html .

Thanks,
Andrew

>
> -Pat
>
>
> 2016-11-09  Pat Haugen  
>
> PR rtl-optimization/78241
> * loop-unroll.c (unroll_loop_runtime_iterations): Don't adjust 'niter', but
> emit initial peel copy if niter expr is not reliable.
>
>
> testsuite/ChangeLog:
> 2016-11-09  Pat Haugen  
>
> * gcc.dg/pr78241.c: New test.
>
>


[PATCH] Fix PR 78243 on PowerPC

2016-11-10 Thread Michael Meissner
Aaron Sawdey has been running the GCC testsuite on the power9 simulator and he
noticed that:

gcc.c-torture/execute/pr68532.c

does not run, and opened bug 78243 for this failure.

Now, if you compile pr68532 with normal options (-O2/-O3 and -mcpu=power8 or
-mcpu=power9) it works because with the normal powerpc vectorization costs, the
vectorizer only generates scalar code.  However, the options in the PR
explicitly turn on -fno-vect-cost-model, which forces the loop to be
vectorized.  In doing so, it generates pretty bad code.

The vectorizer generates a vector add loop, and at the end it does a vector
reduction to get the total added.  When -mcpu=power9 is used it generates a
VEXTRACTUH instruction to extract the HImode from the V8HImode vector.
Unfortunately, on little endian (with little endian element ordering), it gets
the wrong element.
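
[Editor's sketch, not part of the patch.]  The adjustment the fix makes can
be stated as a small stand-alone function.  The name and parameters are
hypothetical; ELT_ORDER_BIG stands in for VECTOR_ELT_ORDER_BIG, and the
arithmetic mirrors the elt_adj/offset computation in the vsx.md hunk.

```c
/* Byte offset handed to the extract instruction for element ELT of a
   vector with NUNITS elements of UNIT_SIZE bytes each.  With big-endian
   element order the element number is used as is; otherwise it is
   mirrored so that element 0 maps to the last slot.  */
static int
extract_byte_offset (int elt, int nunits, int unit_size, int elt_order_big)
{
  int elt_adj = elt_order_big ? elt : nunits - 1 - elt;

  return unit_size * elt_adj;
}
```

For a V8HImode vector (8 elements of 2 bytes), element 0 gives byte offset
0 with big-endian element order and byte offset 14 with little-endian
element order.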

This patch fixes that problem.  I did a bootstrap and regression test, and
there were no regressions.  I ran the test on the power9 simulator for both big
endian and little endian options and it passed.  I also ran the following
executable tests from the testsuite which exercise vector init, set, and
extract for each of the basic types:

vec-init-1.c    element type: int
vec-init-2.c    element type: long
vec-init-4.c    element type: short
vec-init-5.c    element type: signed char
vec-init-8.c    element type: float
vec-init-9.c    element type: double

All tests passed on both little endian and big endian simulator runs.  Can I
check this patch into the trunk?

2016-11-10  Michael Meissner  

PR target/78243
* config/rs6000/vsx.md (vsx_extract__p9): Correct the
element order for little endian ordering.

* config/rs6000/altivec.md (reduc_plus_scal_): Use
VECTOR_ELT_ORDER_BIG and not BYTES_BIG_ENDIAN to adjust element
number.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md  (revision 242048)
+++ gcc/config/rs6000/vsx.md  (revision 242049)
@@ -2542,10 +2542,13 @@ (define_insn "vsx_extract__p9"
   "VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB
&& TARGET_VSX_SMALL_INTEGER"
 {
-  /* Note, the element number has already been adjusted for endianness, so we
- don't have to adjust it here.  */
-  int unit_size = GET_MODE_UNIT_SIZE (mode);
-  HOST_WIDE_INT offset = unit_size * INTVAL (operands[2]);
+  HOST_WIDE_INT elt = INTVAL (operands[2]);
+  HOST_WIDE_INT elt_adj = ((!VECTOR_ELT_ORDER_BIG)
+  ? (GET_MODE_NUNITS (mode) - 1 - elt)
+  : elt);
+
+  HOST_WIDE_INT unit_size = GET_MODE_UNIT_SIZE (mode);
+  HOST_WIDE_INT offset = unit_size * elt_adj;
 
   operands[2] = GEN_INT (offset);
   if (unit_size == 4)
Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md  (revision 242048)
+++ gcc/config/rs6000/altivec.md  (revision 242049)
@@ -2785,7 +2785,7 @@ (define_expand "reduc_plus_scal_"
   rtx vtmp1 = gen_reg_rtx (V4SImode);
   rtx vtmp2 = gen_reg_rtx (mode);
   rtx dest = gen_lowpart (V4SImode, vtmp2);
-  int elt = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 : 0;
+  int elt = VECTOR_ELT_ORDER_BIG ? GET_MODE_NUNITS (mode) - 1 : 0;
 
   emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
   emit_insn (gen_altivec_vsum4ss (vtmp1, operands[1], vzero));


Re: [PATCH] PR77359: Properly align local variables in functions calling alloca.

2016-11-10 Thread Segher Boessenkool
On Fri, Nov 11, 2016 at 12:47:02AM +0100, Dominik Vogt wrote:
> On Thu, Nov 03, 2016 at 11:40:44AM +0100, Dominik Vogt wrote:
> > The attached patch fixes the stack layout problems on AIX and
> > Power as described here:
> > 
> >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77359
> > 
> > The patch has been bootstrapped on AIX (32 Bit) and bootstrapped
> > and regression tested on Power (biarch).  It needs more testing
> > that I cannot do with the hardware available to me.
> > 
> > If the patch is good, this one can be re-applied:
> > 
> >   https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01730.html
> >   https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01616.html
> 
> So, is this patch in order to be committed?  (Assuming that a
> followup patch will clean up the rs6000.h+aix.h quirks.)

You say it needs more testing -- what testing?

(And it needs to be posted to gcc-patches@ of course).

> > +#undef STARTING_FRAME_OFFSET
> > +#define STARTING_FRAME_OFFSET \
> > +  (FRAME_GROWS_DOWNWARD \
> > +   ? 0 \
> > +   : (cfun->calls_alloca \
> > +      ? RS6000_ALIGN (crtl->outgoing_args_size + RS6000_SAVE_AREA, 16) \
> > +      : (RS6000_ALIGN (crtl->outgoing_args_size, 16) + RS6000_SAVE_AREA)))

Maybe you can make the comment explain these last two lines as well...  It
seems to me you want to align STARTING_FRAME_OFFSET if calls_alloca?

Also add a comment for the one in rs6000.h?
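
[Editor's sketch, not part of the thread.]  The difference between the last
two lines of the macro can be seen with plain numbers.  The sketch assumes
that RS6000_ALIGN rounds its first argument up to a multiple of the second
(a power of two); the helper names and the sample sizes are hypothetical.

```c
/* Round N up to the next multiple of A; A must be a power of two.  */
static unsigned long
align_up (unsigned long n, unsigned long a)
{
  return (n + a - 1) & ~(a - 1);
}

/* calls_alloca case: the sum of outgoing args and save area is aligned,
   so the resulting offset itself is a multiple of 16.  */
static unsigned long
start_offset_with_alloca (unsigned long args_size, unsigned long save_area)
{
  return align_up (args_size + save_area, 16);
}

/* Other case: only the outgoing args are aligned, so the offset is a
   multiple of 16 only if the save area size is.  */
static unsigned long
start_offset_without_alloca (unsigned long args_size, unsigned long save_area)
{
  return align_up (args_size, 16) + save_area;
}
```

With 8 bytes of outgoing arguments and a 24-byte save area, the first form
yields 32 while the second yields 40, which is not 16-byte aligned.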

> > +/* Offset from the stack pointer register to an item dynamically
> > +   allocated on the stack, e.g., by `alloca'.
> > +
> > +   The default value for this macro is `STACK_POINTER_OFFSET' plus the
> > +   length of the outgoing arguments.  The default is correct for most
> > +   machines.  See `function.c' for details.  */
> > +#undef STACK_DYNAMIC_OFFSET
> > +#define STACK_DYNAMIC_OFFSET(FUNDECL) \
> > +   RS6000_ALIGN (crtl->outgoing_args_size + (STACK_POINTER_OFFSET), 16)

You don't need parens around STACK_POINTER_OFFSET.

Looks fine to me except for those nits,


Segher


Re: [PATCH] PR77359: Properly align local variables in functions calling alloca.

2016-11-10 Thread Dominik Vogt
On Thu, Nov 03, 2016 at 11:40:44AM +0100, Dominik Vogt wrote:
> The attached patch fixes the stack layout problems on AIX and
> Power as described here:
> 
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77359
> 
> The patch has been bootstrapped on AIX (32 Bit) and bootstrapped
> and regression tested on Power (biarch).  It needs more testing
> that I cannot do with the hardware available to me.
> 
> If the patch is good, this one can be re-applied:
> 
>   https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01730.html
>   https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01616.html

So, is this patch in order to be committed?  (Assuming that a
followup patch will clean up the rs6000.h+aix.h quirks.)

> gcc/ChangeLog
> 
>   * config/rs6000/rs6000.c (rs6000_stack_info): Properly align local
>   variables in functions calling alloca.
>   * config/rs6000/rs6000.h (STARTING_FRAME_OFFSET, STACK_DYNAMIC_OFFSET):
>   Likewise.
>   * config/rs6000/aix.h (STARTING_FRAME_OFFSET, STACK_DYNAMIC_OFFSET):
>   Copy AIX specific versions of the rs6000.h macros to aix.h.

> >From bd36042fd82e29204d2f10c180b9e7c27281eef2 Mon Sep 17 00:00:00 2001
> From: Dominik Vogt 
> Date: Fri, 28 Oct 2016 12:59:55 +0100
> Subject: [PATCH] PR77359: Properly align local variables in functions
>  calling alloca.
> 
> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77359 for a discussion of the
> problem and the fix.
> ---
>  gcc/config/rs6000/aix.h| 27 +++
>  gcc/config/rs6000/rs6000.c |  9 +++--
>  gcc/config/rs6000/rs6000.h | 14 --
>  3 files changed, 42 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/config/rs6000/aix.h b/gcc/config/rs6000/aix.h
> index b254236..7773517 100644
> --- a/gcc/config/rs6000/aix.h
> +++ b/gcc/config/rs6000/aix.h
> @@ -40,6 +40,33 @@
>  #undef  STACK_BOUNDARY
>  #define STACK_BOUNDARY 128
>  
> +/* Offset within stack frame to start allocating local variables at.
> +   If FRAME_GROWS_DOWNWARD, this is the offset to the END of the
> +   first local allocated.  Otherwise, it is the offset to the BEGINNING
> +   of the first local allocated.
> +
> +   On the RS/6000, the frame pointer is the same as the stack pointer,
> +   except for dynamic allocations.  So we start after the fixed area and
> +   outgoing parameter area.  */
> +
> +#undef STARTING_FRAME_OFFSET
> +#define STARTING_FRAME_OFFSET \
> +  (FRAME_GROWS_DOWNWARD \
> +   ? 0 \
> +   : (cfun->calls_alloca \
> +      ? RS6000_ALIGN (crtl->outgoing_args_size + RS6000_SAVE_AREA, 16) \
> +      : (RS6000_ALIGN (crtl->outgoing_args_size, 16) + RS6000_SAVE_AREA)))
> +
> +/* Offset from the stack pointer register to an item dynamically
> +   allocated on the stack, e.g., by `alloca'.
> +
> +   The default value for this macro is `STACK_POINTER_OFFSET' plus the
> +   length of the outgoing arguments.  The default is correct for most
> +   machines.  See `function.c' for details.  */
> +#undef STACK_DYNAMIC_OFFSET
> +#define STACK_DYNAMIC_OFFSET(FUNDECL) \
> +   RS6000_ALIGN (crtl->outgoing_args_size + (STACK_POINTER_OFFSET), 16)
> +
>  #undef  TARGET_IEEEQUAD
>  #define TARGET_IEEEQUAD 0
>  
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index f9e4739..02ed9c1 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -26004,8 +26004,13 @@ rs6000_stack_info (void)
>info->reg_size = reg_size;
>info->fixed_size   = RS6000_SAVE_AREA;
>info->vars_size= RS6000_ALIGN (get_frame_size (), 8);
> -  info->parm_size= RS6000_ALIGN (crtl->outgoing_args_size,
> -  TARGET_ALTIVEC ? 16 : 8);
> +  if (cfun->calls_alloca)
> +info->parm_size  =
> +  RS6000_ALIGN (crtl->outgoing_args_size + info->fixed_size,
> + STACK_BOUNDARY / BITS_PER_UNIT) - info->fixed_size;
> +  else
> +info->parm_size  = RS6000_ALIGN (crtl->outgoing_args_size,
> +  TARGET_ALTIVEC ? 16 : 8);
>if (FRAME_GROWS_DOWNWARD)
>  info->vars_size
>+= RS6000_ALIGN (info->fixed_size + info->vars_size + info->parm_size,
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 4b83abd..c11dc1b 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -1728,9 +1728,12 @@ extern enum reg_class rs6000_constraints[RS6000_CONSTRAINT_MAX];
>  #define STARTING_FRAME_OFFSET \
>(FRAME_GROWS_DOWNWARD \
> ? 0 \
> -   : (RS6000_ALIGN (crtl->outgoing_args_size,   

Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-10 Thread Segher Boessenkool
On Thu, Nov 10, 2016 at 02:42:24PM -0800, Andrew Pinski wrote:
> On Thu, Nov 10, 2016 at 6:25 AM, Kyrill Tkachov
> > I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there were
> > some interesting swings.
> > 458.sjeng +1.45%
> > 471.omnetpp   +2.19%
> > 445.gobmk -2.01%
> >
> > On SPECFP:
> > 453.povray+7.00%
> 
> 
> Wow, this looks really good.  Thank you for implementing this.  If I
> get some time I am going to try it out on other processors than A72
> but I doubt I have time any time soon.

I'd love to hear what causes the slowdown for gobmk as well, btw.


Segher


Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-10 Thread Dominik Vogt
On Thu, Nov 10, 2016 at 11:53:07PM +0100, Marc Glisse wrote:
> On Thu, 10 Nov 2016, Dominik Vogt wrote:
> 
> >On Wed, Nov 09, 2016 at 03:46:38PM +0100, Richard Biener wrote:
> >>On Wed, Nov 9, 2016 at 3:30 PM, Dominik Vogt  
> >>wrote:
> >>>Something like the attached patch?  Robin and I have spent quite
> >>>some time to figure out the new pattern.  Two questions:
> >>>
> >>>1) In the match expression you cannot just use SSA_NAME@0 because
> >>>   then the "case SSA_NAME:" is added to a switch for another
> >>>   pattern that already has that label.  Thus we made that "proxy"
> >>>   predicate "ssa_name_p" that forces the code for the new pattern
> >>>   out of the old switch and into a separate code block.  We
> >>>   couldn't figure out whether this joining of case labels is a
> >>>   feature in the matching language.  So, is this the right way to
> >>>   deal with the conflicting labels?
> >>
> >>No, just do not match SSA_NAME.  And instead of
> >>
> >>+  (with { gimple *def_stmt = SSA_NAME_DEF_STMT (@0); }
> >>+   (if (is_gimple_assign (def_stmt)
> >>+   && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt)))
> >>
> >>you instead want to change the pattern to
> >>
> >>(simplify
> >>  (cmp (convert @0) INTEGER_CST@1)
> >>
> >>@0 will then be your innerop
> >>
> >>note that you can't use get_value_range but you have to use the
> >>get_range_info interface instead.  I suppose a helper function
> >>somewhere that computes whether an expression fits a type
> >>would be helpful (see expr_not_equal_to for sth related,
> >>thus expr_fits_type_p (@0, TREE_TYPE (@1)))
> >
> >All right, I think we got that (new patch attached).
> >
> >>Likewise the overflow_infinity checks do not translate to match.pd
> >>(or rather the range info you get).
> >
> >Can you give us another hint here, please?  The overflow check
> >should probably go into expr_fits_type_p, but with only the min
> >and max values from get_range_info, how can the overflow
> >TREE_OVERFLOW_P flag be retrieved from @1, to duplicate the logic
> >from is_{nega,posi}tive_overflow_infinity?  Is it available
> >somewhere, or is it necessary to somehow re-calculate it from the
> >expression?
> >
> >(This is really necessary so that cases like this don't start
> >folding with the patch:
> >
> >--
> >signed char foo3uu (unsigned char a)
> >{
> > unsigned char d;
> > unsigned long un;
> >
> > d = (a & 63) + 200;
> > un = d;
> > if (un >= 12)
> >   ubar(un);
> >
> > return d;
> >}
> >--
> 
> What's wrong with folding un >= 12 to d >= 12 (ignoring
> profitability, which you already handle with single_use)? I am not
> convinced we need the overflow stuff at all here.

This is the patch that added the overflow check:

https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00037.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57124
--
PR tree-optimization/57124
* tree-vrp.c (simplify_cond_using_ranges): Only simplify a
conversion feeding a condition if the range has an overflow
if -fstrict-overflow.  Add warnings for when we do make the
transformation.

PR tree-optimization/57124
* gcc.c-torture/execute/pr57124.c: New test.
* gcc.c-torture/execute/pr57124.x: Set -fno-strict-overflow.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@199305 138bc75d-0d04-0410-961f-82ee72b054a4
--

> +(for cmp (eq ne gt ge lt le)
> 
> (for cmp (simple_comparison)
> 
> + (cmp (convert@0 @1) INTEGER_CST@2)
> + (if (TREE_CODE (@1) == SSA_NAME
> 
> (cmp (convert@0 SSA_NAME@1) INTEGER_CST@2)
> 
> +(cmp { @1; } (convert @2))
> 
> (cmp @1 (convert @2))

With some more cleaning:

(for cmp (simple_comparison)
 (simplify
  (cmp (convert@0 SSA_NAME@1) INTEGER_CST@2)
  (if (!POINTER_TYPE_P (TREE_TYPE (@1))
       && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (@1)
       && desired_pro_or_demotion_p (TREE_TYPE (@1), TREE_TYPE (@0))
       && expr_fits_type_p (@1, TREE_TYPE (@0))
       && int_fits_type_p (@2, TREE_TYPE (@1))
       && (!has_single_use (@1) || has_single_use (@0)))
   (cmp @1 (convert @2)))))
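
[Editor's sketch, not part of the thread.]  The expr_fits_type_p helper
used above is only proposed in this discussion.  One way to read its intent,
given only a [min, max] range such as get_range_info provides, is a plain
interval check against the bounds of the target type.  The function below is
a hypothetical stand-alone illustration on 64-bit integers, not GCC code, and
it is limited to precisions from 1 to 63 bits.

```c
#include <stdint.h>

/* Return nonzero if every value in [MIN, MAX] is representable in an
   integer type with PRECISION bits (1..63) and the given signedness.  */
static int
range_fits_type (int64_t min, int64_t max, unsigned precision, int is_unsigned)
{
  int64_t type_min, type_max;

  if (is_unsigned)
    {
      type_min = 0;
      type_max = (int64_t) ((UINT64_C (1) << precision) - 1);
    }
  else
    {
      type_max = (int64_t) ((UINT64_C (1) << (precision - 1)) - 1);
      type_min = -type_max - 1;
    }
  return min >= type_min && max <= type_max;
}
```

For example, the range [200, 255] fits an unsigned 8-bit type but not a
signed one.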

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH, gcc, wwwdocs] Document upcoming Qualcomm Falkor processor support

2016-11-10 Thread Gerald Pfeifer
On Fri, 11 Nov 2016, Siddhesh Poyarekar wrote:
> This patch documents the newly added flag in gcc 7 for the upcoming
> Qualcomm Falkor processor core.

Looks good to me.  Probably a good idea for one of the ARM maintainers
to sign off, too.

Gerald


Go patch committed: copy signal code from Go 1.7 runtime

2016-11-10 Thread Ian Lance Taylor
This patch to the Go frontend and libgo copies the signal code from
the Go 1.7 runtime.

This adds a little shell script to auto-generate runtime.sigtable from
the known signal names.

This forces the main package to always import the runtime package.
Otherwise some runtime package global variables may never be
initialized.

This sets the syscallsp and syscallpc fields of g when entering a
syscall, so that the runtime package knows when a g is executing a
syscall.

This fixes runtime.funcPC to avoid dead store elimination of the
interface value when the function is inlined.

The signal code in C now has some target-specific code to return the
PC where the signal occurred and to dump the registers on a hard
crash.  This is what the gc toolchain does as well.  I wrote versions
of that code for x86 GNU/Linux.  Other targets will fall back
reasonably and display less information.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
Bootstrapped and ran relevant tests on sparc-sun-solaris.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 242024)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-cac897bd27885c18a16dacfe27d5efd4526455c5
+449e918b0f93d3e3339edcec21a5bc157f548e54
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc  (revision 242024)
+++ gcc/go/gofrontend/expressions.cc  (working copy)
@@ -3791,7 +3791,7 @@ Unary_expression::do_flatten(Gogo* gogo,
this->escapes_ = false;
 
   // When compiling the runtime, the address operator does not
-  // cause local variables to escapes.  When escape analysis
+  // cause local variables to escape.  When escape analysis
   // becomes the default, this should be changed to make it an
   // error if we have an address operator that escapes.
   if (gogo->compiling_runtime() && gogo->package_name() == "runtime")
Index: gcc/go/gofrontend/gogo.cc
===
--- gcc/go/gofrontend/gogo.cc   (revision 242024)
+++ gcc/go/gofrontend/gogo.cc   (working copy)
@@ -394,6 +394,7 @@ void
 Gogo::import_package(const std::string& filename,
 const std::string& local_name,
 bool is_local_name_exported,
+bool must_exist,
 Location location)
 {
   if (filename.empty())
@@ -497,7 +498,8 @@ Gogo::import_package(const std::string&
this->relative_import_path_);
   if (stream == NULL)
 {
-  go_error_at(location, "import file %qs not found", filename.c_str());
+  if (must_exist)
+   go_error_at(location, "import file %qs not found", filename.c_str());
   return;
 }
 
@@ -2179,6 +2181,14 @@ Gogo::is_thunk(const Named_object* no)
 void
 Gogo::define_global_names()
 {
+  if (this->is_main_package())
+{
+  // Every Go program has to import the runtime package, so that
+  // it is properly initialized.
+  this->import_package("runtime", "_", false, false,
+  Linemap::predeclared_location());
+}
+
   for (Bindings::const_declarations_iterator p =
 this->globals_->begin_declarations();
p != this->globals_->end_declarations();
Index: gcc/go/gofrontend/gogo.h
===
--- gcc/go/gofrontend/gogo.h  (revision 241341)
+++ gcc/go/gofrontend/gogo.h  (working copy)
@@ -301,7 +301,7 @@ class Gogo
   // the declarations are added to the global scope.
   void
   import_package(const std::string& filename, const std::string& local_name,
-bool is_local_name_exported, Location);
+bool is_local_name_exported, bool must_exist, Location);
 
   // Whether we are the global binding level.
   bool
Index: gcc/go/gofrontend/parse.cc
===
--- gcc/go/gofrontend/parse.cc  (revision 241341)
+++ gcc/go/gofrontend/parse.cc  (working copy)
@@ -5722,7 +5722,7 @@ Parse::import_spec(void*)
 }
 
   this->gogo_->import_package(token->string_value(), local_name,
- is_local_name_exported, location);
+ is_local_name_exported, true, location);
 
   this->advance_token();
 }
Index: libgo/Makefile.am
===
--- libgo/Makefile.am   (revision 241742)
+++ libgo/Makefile.am   (working copy)
@@ -480,14 +480,12 @@ runtime_files = \
runtime/print.c \
runtime/proc.c \
runtime/runtime_c.c \
-   runtime/signal_unix.c \
runtime/thread.c \
$(runtime_thread_files) \
runtime/yield.c 

Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-10 Thread Marc Glisse

On Thu, 10 Nov 2016, Dominik Vogt wrote:


On Wed, Nov 09, 2016 at 03:46:38PM +0100, Richard Biener wrote:

On Wed, Nov 9, 2016 at 3:30 PM, Dominik Vogt  wrote:

Something like the attached patch?  Robin and I have spent quite
some time to figure out the new pattern.  Two questions:

1) In the match expression you cannot just use SSA_NAME@0 because
   then the "case SSA_NAME:" is added to a switch for another
   pattern that already has that label.  Thus we made that "proxy"
   predicate "ssa_name_p" that forces the code for the new pattern
   out of the old switch and into a separate code block.  We
   couldn't figure out whether this joining of case labels is a
   feature in the matching language.  So, is this the right way to
   deal with the conflicting labels?


No, just do not match SSA_NAME.  And instead of

+  (with { gimple *def_stmt = SSA_NAME_DEF_STMT (@0); }
+   (if (is_gimple_assign (def_stmt)
+   && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt)))

you instead want to change the pattern to

(simplify
  (cmp (convert @0) INTEGER_CST@1)

@0 will then be your innerop

note that you can't use get_value_range but you have to use the
get_range_info interface instead.  I suppose a helper function
somewhere that computes whether an expression fits a type
would be helpful (see expr_not_equal_to for sth related,
thus expr_fits_type_p (@0, TREE_TYPE (@1)))


All right, I think we got that (new patch attached).


Likewise the overflow_infinity checks do not translate to match.pd
(or rather the range info you get).


Can you give us another hint here, please?  The overflow check
should probably go into expr_fits_type_p, but with only the min
and max values from get_range_info, how can the overflow
TREE_OVERFLOW_P flag be retrieved from @1, to duplicate the logic
from is_{nega,posi}tive_overflow_infinity?  Is it available
somewhere, or is it necessary to somehow re-calculate it from the
expression?

(This is really necessary so that cases like this don't start
folding with the patch:

--
signed char foo3uu (unsigned char a)
{
 unsigned char d;
 unsigned long un;

 d = (a & 63) + 200;
 un = d;
 if (un >= 12)
   ubar(un);

 return d;
}
--


What's wrong with folding un >= 12 to d >= 12 (ignoring profitability, 
which you already handle with single_use)? I am not convinced we need the 
overflow stuff at all here.
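
[Editor's sketch, not part of the thread.]  The point about foo3uu can be
checked by brute force: widening an unsigned char to unsigned long preserves
the value, so comparing the narrow value against 12 gives the same answer as
comparing the widened one, even though the addition wraps for some inputs.
The helper names below are hypothetical.

```c
/* Number of inputs A in 0..255 for which the wide comparison
   (un >= 12) and the narrow comparison (d >= 12) disagree.  */
static int
count_fold_mismatches (void)
{
  int mismatches = 0;

  for (int a = 0; a <= 255; a++)
    {
      unsigned char d = (unsigned char) ((a & 63) + 200);
      unsigned long un = d;

      if ((un >= 12) != (d >= 12))
        mismatches++;
    }
  return mismatches;
}

/* Number of inputs A in 0..255 for which (a & 63) + 200 wraps around
   in unsigned char, leaving D below 12.  */
static int
count_wrapped_inputs (void)
{
  int wrapped = 0;

  for (int a = 0; a <= 255; a++)
    {
      unsigned char d = (unsigned char) ((a & 63) + 200);

      if (d < 12)
        wrapped++;
    }
  return wrapped;
}
```

The wrapped inputs are those with (a & 63) between 56 and 63; they are the
cases where the condition is false, and both forms of the comparison agree
on them.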


+(for cmp (eq ne gt ge lt le)

(for cmp (simple_comparison)

+ (cmp (convert@0 @1) INTEGER_CST@2)
+ (if (TREE_CODE (@1) == SSA_NAME

(cmp (convert@0 SSA_NAME@1) INTEGER_CST@2)

+  (cmp { @1; } (convert @2))

(cmp @1 (convert @2))

--
Marc Glisse


Re: [PATCH] combine: Do not call simplify from inside change_zero_ext (PR78232)

2016-11-10 Thread Segher Boessenkool
Hi all,

I now committed this, with changelog

PR rtl-optimization/78232
* combine.c (try_combine): Add a big comment about why reusing i2dest
is undesirable.
(change_zero_ext): Do not call simplify_gen_binary, do the
simplifications manually.

and the patch adds


--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -3560,6 +3560,15 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
{
  machine_mode new_mode = GET_MODE (SET_DEST (newpat));
 
+ /* ??? Reusing i2dest without resetting the reg_stat entry for it
+(temporarily, until we are committed to this instruction
+combination) does not work: for example, any call to nonzero_bits
+on the register (from a splitter in the MD file, for example)
+will get the old information, which is invalid.
+
+Since nowadays we can create registers during combine just fine,
+we should just create a new one here, not reuse i2dest.  */
+
  /* First try to split using the original register as a
 scratch register.  */
  parallel = gen_rtx_PARALLEL (VOIDmode,


Re: [Patch, fortran] PR44265 - Link error with reference to parameter array in specification expression

2016-11-10 Thread Dominique d'Humières
FAIL: gfortran.dg/char_result_16.f90   -g -flto  (internal compiler error)
FAIL: gfortran.dg/char_result_16.f90   -g -flto  (test for excess errors)

The ICE is for both -m32 and -m64 (module_procedure_3_db_1.f90 is the test 
posted in my last mail)

% gfc module_procedure_3_db_1.f90 -flto
module_procedure_3_db_1.f90:29:0: internal compiler error: in 
get_partitioning_class, at symtab.c:1848
 END PROGRAM WheresThatbLinkingConstantGone

Sorry to be such a nuisance!-(

Dominique

> On 10 Nov 2016 at 15:49, Paul Richard Thomas
> wrote:
> 
> Hi Dominique.
> 
> snip
>> I have a last glitch (which can be deferred if needed):
> snip
> 
> Fixed by the new patch, which is attached. Bootstraps and regtests OK.
> 
> OK for trunk?
> 
> Paul
> 
> 2016-11-10  Paul Thomas  
> 
>PR fortran/44265
>* gfortran.h : Add fn_result_spec bitfield to gfc_symbol.
>* resolve.c (flag_fn_result_spec): New function.
>(resolve_fntype): Call it for character result lengths.
>* symbol.c (gfc_new_symbol): Set fn_result_spec to zero.
>* trans-decl.c (gfc_sym_mangled_identifier): Include the
>procedure name in the mangled name for symbols with the
>fn_result_spec bit set.
>(gfc_get_symbol_decl): Mangle the name of these symbols.
>(gfc_create_module_variable): Allow them through the assert.
>(gfc_generate_function_code): Remove the assert before the
>initialization of sym->tlink because the frontend no longer
>uses this field.
>* trans-expr.c (gfc_map_intrinsic_function): Add a case to
>treat the LEN_TRIM intrinsic.
> 
> 2016-11-10  Paul Thomas  
> 
>PR fortran/44265
>* gfortran.dg/char_result_14.f90: New test.
>* gfortran.dg/char_result_15.f90: New test.
>* gfortran.dg/char_result_16.f90: New test.
> 
> 
> -- 
> The difference between genius and stupidity is; genius has its limits.
> 
> Albert Einstein



Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-10 Thread Andrew Pinski
On Thu, Nov 10, 2016 at 6:25 AM, Kyrill Tkachov
 wrote:
> Hi all,
>
> This patch implements the new separate shrink-wrapping hooks for aarch64.
> In separate shrink wrapping (as I understand it) we consider each register
> save/restore as
> a 'component' that can be performed independently of the other save/restores
> in the prologue/epilogue
> and can be moved outside the prologue/epilogue and instead performed only in
> the basic blocks where it's
> actually needed. This allows us to avoid saving and restoring registers on
> execution paths where a register
> might not be needed.
>
> In the most general form a 'component' can be any operation that the
> prologue/epilogue performs, for example
> stack adjustment. But in this patch we only consider callee-saved register
> save/restores as components.
> The code is in many ways similar to the powerpc implementation of the hooks.
>
> The hooks implemented are:
> * TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS: Returns a bitmap containing a
> bit for each register that should
> be considered a 'component' i.e. its save/restore should be separated from
> the prologue and epilogue and placed
> at the basic block where it's needed.
>
> * TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB: Determine for a given basic block
> which 'component' registers it needs.
> This is determined through dataflow. If a component register is in the
> IN,GEN or KILL sets for the basic block
> it's considered as needed and marked as such in the bitmap.
>
> * TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS and
> TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS: Given a bitmap
> of component registers emits the save or restore code for them.
>
> * TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS: Given a bitmap of component
> registers record in the backend that
> the register is shrink-wrapped using this approach and that the normal
> prologue and epilogue expansion code
> should not emit code for them. This is done similarly to powerpc by defining
> a bool array in machine_function
> where we record whether each register is separately shrink-wrapped.  The
> prologue and epilogue expansion code
> (through aarch64_save_callee_saves and aarch64_restore_callee_saves) is
> updated to not emit save/restores for
> these registers if they appear in that array.
>
> Our prologue and epilogue code has a lot of intricate logic to perform stack
> adjustments using the writeback
> forms of the load/store instructions. Separately shrink-wrapping those
> registers marked for writeback
> (cfun->machine->frame.wb_candidate1 and cfun->machine->frame.wb_candidate2)
> broke that codegen and I had to
> emit an explicit stack adjustment instruction that created ugly
> prologue/epilogue sequences. So this patch
> is conservative and doesn't allow shrink-wrapping of the registers marked
> for writeback. Maybe in the future
> we can relax it (for example allow wrapping of one of the two writeback
> registers if the writeback amount
> can be encoded in a single-register writeback store/load) but given the
> development stage of GCC I thought
> I'd play it safe.
>
> I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there were
> some interesting swings.
> 458.sjeng     +1.45%
> 471.omnetpp   +2.19%
> 445.gobmk     -2.01%
>
> On SPECFP:
> 453.povray    +7.00%


Wow, this looks really good.  Thank you for implementing this.  If I
get some time I am going to try it out on other processors than A72
but I doubt I have time any time soon.

Thanks,
Andrew

>
> I'll be re-running the benchmarks with Segher's recent patch [1] to see if
> it fixes the regression
> and if it does I think this can go in.
>
> [1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00889.html
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Thanks,
> Kyrill
>
> 2016-11-10  Kyrylo Tkachov  
>
> * config/aarch64/aarch64.h (machine_function): Add
> reg_is_wrapped_separately field.
> * config/aarch64/aarch64.c (emit_set_insn): Change return type to
> rtx_insn *.
> (aarch64_save_callee_saves): Don't save registers that are wrapped
> separately.
> (aarch64_restore_callee_saves): Don't restore registers that are
> wrapped separately.
> (offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p,
> aarch64_offset_7bit_signed_scaled_p): Move earlier in the file.
> (aarch64_get_separate_components): New function.
> (aarch64_components_for_bb): Likewise.
> (aarch64_disqualify_components): Likewise.
> (aarch64_emit_prologue_components): Likewise.
> (aarch64_emit_epilogue_components): Likewise.
> (aarch64_set_handled_components): Likewise.
> (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS,
> TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB,
> TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS,
> TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS,
> TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS,
> TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define.
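
As a rough illustration of the component idea described in the quoted message, here is a toy model in C. It is a sketch under stated assumptions, not the aarch64 implementation: the real hooks operate on GCC's sbitmap type, whereas a plain 32-bit integer stands in for the bitmap here, and the names component_set, separate_components and components_for_bb are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per hypothetical callee-saved register number (0..31).  */
typedef uint32_t component_set;

/* Candidate components: every callee-saved register that must be saved,
   minus the two registers reserved for the writeback forms.  A negative
   writeback candidate means "none".  */
component_set
separate_components (component_set saved_regs, int wb_candidate1,
                     int wb_candidate2)
{
  component_set components = saved_regs;

  assert (wb_candidate1 < 32 && wb_candidate2 < 32);
  if (wb_candidate1 >= 0)
    components &= ~((component_set) 1 << wb_candidate1);
  if (wb_candidate2 >= 0)
    components &= ~((component_set) 1 << wb_candidate2);
  return components;
}

/* A basic block needs a component if the corresponding register is in
   the block's IN, GEN or KILL set.  */
component_set
components_for_bb (component_set components, component_set in,
                   component_set gen, component_set kill)
{
  return components & (in | gen | kill);
}
```

With registers 19, 20, 29 and 30 saved and 29/30 chosen as writeback candidates, only 19 and 20 remain as components; a block whose dataflow sets mention register 19 alone would then need just that one save/restore.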


Re: C++ PATCH for c++/77337 (auto return and constexpr lambda)

2016-11-10 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 01:42:20PM -0800, Jason Merrill wrote:
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp1y/auto-fn33.C
> @@ -0,0 +1,27 @@
> +// PR c++/77337
> +// { dg-do compile { target c++14 } }
> +
> +template<typename Functor>
> +struct fix_type {
> +  Functor functor;
> +
> +  decltype(auto) operator()()
> +  { return functor(*this); }
> +};
> +
> +template<typename Functor>
> +fix_type<Functor> fix(Functor functor)
> +{ return { functor }; }
> +
> +int main()
> +{
> +  auto zero = fix
> +([](auto& self) -> int // N.B. non-deduced, non-dependent return type
> + {
> +   return 0;
> +
> +   self(); // error: use of 'decltype(auto) 
> fix_type::operator()() [with Functor = main()::]' 
> before deduction of 'auto'

Wouldn't it be clearer to turn that // error: line into
// { dg-bogus "use of \[^\n\r]* before deduction of 'auto'" }
so that it is clear that the error is undesirable even to a casual reader?

> + });
> +
> +  return zero();
> +}

Jakub


Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)

2016-11-10 Thread Joseph Myers
On Thu, 10 Nov 2016, Richard Biener wrote:

> I'll address those comments.  As you did not have any comments on the 
> c-parser.[CH] parts does that mean you are fine with them?  That is, 
> does the above constitute a complete review of the patch?

I am fine with the c-parser.[ch] parts.

-- 
Joseph S. Myers
jos...@codesourcery.com


C++ PATCH for c++/77337 (auto return and constexpr lambda)

2016-11-10 Thread Jason Merrill
The constexpr lambda change introduced some problematic dependency
ordering, since we instantiate constexpr functions aggressively so that
they are available for constexpr evaluation.  The patch for 65942
delayed that instantiation by triggering it from constexpr evaluation
directly rather than earlier in mark_used, but instantiate_decl still
wanted to instantiate them right away.  This patch fixes that and a
latent bug in tsubst_friend_function, which was leaving DECL_INITIAL
set on a function that was not yet instantiated.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit aea89e80dc0b7b976fd69680771a8534af265d7e
Author: Jason Merrill 
Date:   Thu Nov 10 10:13:44 2016 -0800

PR c++/77337 - auto return and lambda

* pt.c (tsubst_friend_function): Don't set DECL_INITIAL.
(instantiate_decl): It's OK to defer a constexpr function.
* cp-tree.h (DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION): Check
DECL_LANG_SPECIFIC.
* decl2.c (decl_defined_p): Use it.  No longer static.
* decl.c (redeclaration_error_message): Use decl_defined_p.
* constexpr.c (cxx_eval_call_expression): Set input_location around
call to instantiate_decl.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 43457d2..f75f0b0 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1464,9 +1464,12 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
   if (!DECL_INITIAL (fun)
   && DECL_TEMPLOID_INSTANTIATION (fun))
 {
+  location_t save_loc = input_location;
+  input_location = loc;
   ++function_depth;
   instantiate_decl (fun, /*defer_ok*/false, /*expl_inst*/false);
   --function_depth;
+  input_location = save_loc;
 }
 
   /* If in direct recursive call, optimize definition search.  */
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 9b5b5bc..8183775 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -4380,7 +4380,8 @@ more_aggr_init_expr_args_p (const 
aggr_init_expr_arg_iterator *iter)
instantiated will not be a DECL_TEMPLATE_INSTANTIATION, but will be
a DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION.  */
 #define DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION(DECL) \
-  (DECL_TEMPLATE_INFO (DECL) && !DECL_USE_TEMPLATE (DECL))
+  (DECL_LANG_SPECIFIC (DECL) && DECL_TEMPLATE_INFO (DECL) \
+   && !DECL_USE_TEMPLATE (DECL))
 
 /* Nonzero if DECL is a function generated from a function 'temploid',
i.e. template, member of class template, or dependent friend.  */
@@ -5895,6 +5896,7 @@ extern void import_export_decl(tree);
 extern tree build_cleanup  (tree);
 extern tree build_offset_ref_call_from_tree(tree, vec **,
 tsubst_flags_t);
+extern bool decl_defined_p (tree);
 extern bool decl_constant_var_p(tree);
 extern bool decl_maybe_constant_var_p  (tree);
 extern void no_linkage_error   (tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 4b18d4e..185c98b 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -2778,8 +2778,8 @@ redeclaration_error_message (tree newdecl, tree olddecl)
 warn_extern_redeclared_static.  */
 
   /* Defining the same name twice is no good.  */
-  if (DECL_INITIAL (olddecl) != NULL_TREE
- && DECL_INITIAL (newdecl) != NULL_TREE)
+  if (decl_defined_p (olddecl)
+ && decl_defined_p (newdecl))
{
  if (DECL_NAME (olddecl) == NULL_TREE)
return G_("%q#D not declared in class");
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index e0fff1e..4ebc7dc 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -80,7 +80,6 @@ static void import_export_class (tree);
 static tree get_guard_bits (tree);
 static void determine_visibility_from_class (tree, tree);
 static bool determine_hidden_inline (tree);
-static bool decl_defined_p (tree);
 static void maybe_instantiate_decl (tree);
 
 /* A list of static class variables.  This is needed, because a
@@ -4085,11 +4084,15 @@ collect_ada_namespace (tree namespc, const char 
*source_file)
 /* Returns true iff there is a definition available for variable or
function DECL.  */
 
-static bool
+bool
 decl_defined_p (tree decl)
 {
   if (TREE_CODE (decl) == FUNCTION_DECL)
-return (DECL_INITIAL (decl) != NULL_TREE);
+return (DECL_INITIAL (decl) != NULL_TREE
+   /* A pending instantiation of a friend temploid is defined.  */
+   || (DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION (decl)
+   && DECL_INITIAL (DECL_TEMPLATE_RESULT
+                            (DECL_TI_TEMPLATE (decl)))));
   else
 {
   gcc_assert (VAR_P (decl));
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index e8b6afd..d4855d5 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -9383,10 +9383,6 @@ tsubst_friend_function (tree decl, tree args)
   else
new_friend_result_template_info = NULL_TREE;
 
- 

Re: Reject out-of-range bit pos in bit-fields insns operating on a register.

2016-11-10 Thread Andreas Schwab
On Nov 10 2016, Jeff Law  wrote:

> On 11/09/2016 03:40 AM, Andreas Schwab wrote:
>> As seen by the testcase in PR77822, combine can generate out-of-range
>> bit pos in a bit-field insn, unless the pattern explicitly rejects it.
>> This only makes a difference for expressions that are undefined at
>> runtime.  Without that we would either generate bad assembler or ICE in
>> output_btst.
>>
>>  PR target/78254
>>  * config/m68k/m68k.md: Reject out-of-range bit pos in bit-fields
>>  insns operating on a register.
> Could you please include a testcase for this?  Even if it's something that
> triggered during a bootstrap -- few people bootstrap m68k these days :-)

It didn't trigger during bootstrap.  It was the testcase for PR77822.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: Reject out-of-range bit pos in bit-fields insns operating on a register.

2016-11-10 Thread Jeff Law

On 11/09/2016 03:40 AM, Andreas Schwab wrote:

As seen by the testcase in PR77822, combine can generate out-of-range
bit pos in a bit-field insn, unless the pattern explicitly rejects it.
This only makes a difference for expressions that are undefined at
runtime.  Without that we would either generate bad assembler or ICE in
output_btst.

PR target/78254
* config/m68k/m68k.md: Reject out-of-range bit pos in bit-fields
insns operating on a register.
Could you please include a testcase for this?  Even if it's something 
that triggered during a bootstrap -- few people bootstrap m68k these 
days :-)


jeff



Re: Go patch committed: copy print code from Go 1.7 runtime

2016-11-10 Thread Ian Lance Taylor
On Tue, Oct 18, 2016 at 1:06 AM, Uros Bizjak  wrote:
>
>> This patch copies the code that implements the print and println
>> predeclared functions from the Go 1.7 runtime.  The compiler is
>> changed to use the new names, and to call the printlock and
>> printunlock functions around a sequence of print calls.  The writebuf
>> field in the g struct changes to a slice.  Bootstrapped and ran Go
>> testsuite on x86_64-pc-linux-gnu.  Committed to mainline.
>
> This patch probably introduced recent regression on 32bit x86 multilib:
>
> Running target unix/-m32
> FAIL: go.test/test/fixedbugs/bug114.go compilation,  -O2 -g
> FAIL: go.test/test/printbig.go -O (test for excess errors)
> FAIL: go.test/test/printbig.go execution
>
> === go Summary for unix/-m32 ===
>
> # of expected passes6875
> # of unexpected failures3
> # of expected failures  1
> # of untested testcases 12
> # of unsupported tests  2
>
> e.g.:
>
> /home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:15:27:
> error: integer constant overflow
> /home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:15:45:
> error: integer constant overflow
> /home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:19:38:
> error: integer constant overflow
> /home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:19:56:
> error: integer constant overflow
>
> FAIL: go.test/test/fixedbugs/bug114.go compilation,  -O2 -g
> UNTESTED: go.test/test/fixedbugs/bug114.go execution,  -O2 -g
>
> FAIL: go.test/test/printbig.go -O (test for excess errors)
> Excess errors:
> /home/uros/git/gcc/gcc/testsuite/go.test/test/printbig.go:12:8: error:
> integer constant overflow
> /home/uros/git/gcc/gcc/testsuite/go.test/test/printbig.go:13:15:
> error: integer constant overflow
>
> ./printbig.exe >printbig.p 2>&1
> couldn't execute "./printbig.exe": no such file or directory
> FAIL: go.test/test/printbig.go execution
> UNTESTED: go.test/test/printbig.go compare


For the record, Uros filed this as https://gcc.gnu.org/PR78145 and it
is now fixed.

Ian


Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-10 Thread Dominik Vogt
On Wed, Nov 09, 2016 at 03:46:38PM +0100, Richard Biener wrote:
> On Wed, Nov 9, 2016 at 3:30 PM, Dominik Vogt  wrote:
> > Something like the attached patch?  Robin and I have spent quite
> > some time to figure out the new pattern.  Two questions:
> >
> > 1) In the match expression you cannot just use SSA_NAME@0 because
> >then the "case SSA_NAME:" is added to a switch for another
> >pattern that already has that label.  Thus we made that "proxy"
> >predicate "ssa_name_p" that forces the code for the new pattern
> >out of the old switch and into a separate code block.  We
> >couldn't figure out whether this joining of case labels is a
> >feature in the matching language.  So, is this the right way to
> >deal with the conflicting labels?
> 
> No, just do not match SSA_NAME.  And instead of
> 
> +  (with { gimple *def_stmt = SSA_NAME_DEF_STMT (@0); }
> +   (if (is_gimple_assign (def_stmt)
> +   && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt)))
> 
> you instead want to change the pattern to
> 
> (simplify
>   (cmp (convert @0) INTEGER_CST@1)
> 
> @0 will then be your innerop
> 
> note that you can't use get_value_range but you have to use the
> get_range_info interface instead.  I suppose a helper function
> somewhere that computes whether an expression fits a type
> would be helpful (see expr_not_equal_to for sth related,
> thus expr_fits_type_p (@0, TREE_TYPE (@1)))

All right, I think we got that (new patch attached).

> Likewise the overflow_infinity checks do not translate to match.pd
> (or rather the range info you get).

Can you give us another hint here, please?  The overflow check
should probably go into expr_fits_type_p, but with only the min
and max values from get_range_info, how can the overflow
TREE_OVERFLOW_P flag be retrieved from @1, to duplicate the logic
from is_{nega,posi}tive_overflow_infinity?  Is it available
somewhere, or is it necessary to somehow re-calculate it from the
expression?

(This is really necessary so that cases like this don't start
folding with the patch:

--
signed char foo3uu (unsigned char a) 
{ 
  unsigned char d; 
  unsigned long un; 
 
  d = (a & 63) + 200; 
  un = d; 
  if (un >= 12) 
ubar(un); 
 
  return d; 
}
--

)
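
To make the "fits the type" question concrete, here is a minimal sketch in C of the interval test such a helper boils down to. It is not the code from the attached patch, which works on wide_int values obtained through get_range_info; the struct int_range type and the range_fits_p name are assumptions made up for this illustration, and long long stands in for arbitrary-precision integers.

```c
#include <assert.h>
#include <stdbool.h>

/* Closed integer interval [min, max].  */
struct int_range
{
  long long min;
  long long max;
};

/* Return true if every value in SRC is representable in a type whose
   values span DST.  */
bool
range_fits_p (struct int_range src, struct int_range dst)
{
  assert (src.min <= src.max && dst.min <= dst.max);
  return src.min >= dst.min && src.max <= dst.max;
}
```

For the example above, (a & 63) + 200 evaluated without truncation has the range [200, 263], which does not fit into unsigned char's [0, 255], whereas a & 63 alone, with range [0, 63], does.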

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
>From d6f30094d892d86598f6af35cb3b99c258642e9b Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 2 Nov 2016 14:01:46 +0100
Subject: [PATCH] Convert folding in simplify_cond_using_ranges() to match.pd.

In cases like this

  ac_4 = *p_3(D);
  bc_5 = ac_4 & 63;
  l_6 = (long int) bc_5;
  if (l_6 > 9)
bar(l_6);

the function would fold bc_5 into the condition, replacing l_6, but l_6 cannot
be eliminated.  We'd end up with bc_5 being used in two places and l_6 in one
place without the chance to eliminate either.

The patched code suppresses folding if

  bc_5 has only a single use (i.e. being cast to l_6),

AND

  l_6 has more than one use.

However, the patch does not catch the case where all uses of l_6 could be
eliminated by the function, e.g.

  ...
  if (l_6 > 9)
bar();
  else if (l_6 > 5)
foo();
---
 gcc/fold-const.c | 64 ++
 gcc/fold-const.h |  1 +
 gcc/match.pd | 28 
 gcc/tree-vrp.c   | 66 
 4 files changed, 93 insertions(+), 66 deletions(-)

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 593ea16..082ad46 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -9100,6 +9100,70 @@ expr_not_equal_to (tree t, const wide_int &w)
 }
 }
 
+/* Return true if the value range of T is known to fit into TYPE.  */
+
+bool
+expr_fits_type_p (tree t, tree type)
+{
+  wide_int min, max;
+  value_range_type rtype;
+  signop sign;
+  signop tsign;
+  bool has_range = false;
+
+  if (!INTEGRAL_TYPE_P (type))
+return false;
+  tsign = TYPE_SIGN (type);
+  sign = TYPE_SIGN (TREE_TYPE (t));
+  switch (TREE_CODE (t))
+{
+case INTEGER_CST:
+  min = t;
+  max = t;
+  has_range = true;
+  break;
+
+case SSA_NAME:
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
+   return false;
+  rtype = get_range_info (t, &min, &max);
+  if (rtype == VR_RANGE)
+   has_range = true;
+  else if (rtype == VR_ANTI_RANGE)
+   {
+ /* As ANTI_RANGEs always wrap around, just check if T's whole type's
+value range fits into TYPE.  */
+ min = TYPE_MIN_VALUE (TREE_TYPE (t));
+ max = TYPE_MAX_VALUE (TREE_TYPE (t));
+ has_range = true;
+   }
+  break;
+}
+  if (has_range)
+{
+  if (sign == tsign)
+   {
+ if (wi::le_p (max, TYPE_MAX_VALUE (type), sign)
+ && wi::ge_p (min, TYPE_MIN_VALUE (type), sign))
+   return true;
+   }
+  else if (sign == SIGNED && tsign == UNSIGNED)
+   {
+ if (wi::ge_p (min, 0, SIGNED)
+ && wi::le_p (max, TYPE_MAX_VALUE (type), UN

[gomp4] remove OMP_CLAUSE_DEVICE_RESIDENT

2016-11-10 Thread Cesar Philippidis
I've committed this patch to gomp-4_0-branch which removes
OMP_CLAUSE_DEVICE_RESIDENT. This standalone clause is no longer
necessary, and hasn't been for a while, because device_resident is
treated as a data mapping type for OMP_CLAUSE_MAP, and not a clause itself.

Cesar
2016-11-10  Cesar Philippidis  

	gcc/fortran/
	* trans-openmp.c (gfc_trans_omp_clauses_1): Remove
	OMP_CLAUSE_DEVICE_RESIDENT.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Remove
	OMP_CLAUSE_DEVICE_RESIDENT.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-low.c (scan_sharing_clauses): Likewise.
	* tree-core.h (enum omp_clause_code): Likewise.
	* tree-nested.c (convert_nonlocal_omp_clauses): Likewise.
	(convert_local_omp_clauses): Likewise.
	* tree-pretty-print.c (dump_omp_clause): Likewise.
	* tree.c (walk_tree_1): Likewise.


diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 9924872..3c53414 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1781,9 +1781,6 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, gfc_omp_clauses *clauses,
 	case OMP_LIST_USE_DEVICE:
 	  clause_code = OMP_CLAUSE_USE_DEVICE_PTR;
 	  goto add_clause;
-	case OMP_LIST_DEVICE_RESIDENT:
-	  clause_code = OMP_CLAUSE_DEVICE_RESIDENT;
-	  goto add_clause;
 
 	add_clause:
 	  omp_clauses
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 36c128b..9649fae 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7594,10 +7594,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	remove = true;
 	  break;
 
-	case OMP_CLAUSE_DEVICE_RESIDENT:
-	  remove = true;
-	  break;
-
 	case OMP_CLAUSE_NOWAIT:
 	case OMP_CLAUSE_ORDERED:
 	case OMP_CLAUSE_UNTIED:
@@ -8445,7 +8441,6 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 	case OMP_CLAUSE__CILK_FOR_COUNT_:
 	case OMP_CLAUSE_ASYNC:
 	case OMP_CLAUSE_WAIT:
-	case OMP_CLAUSE_DEVICE_RESIDENT:
 	case OMP_CLAUSE_INDEPENDENT:
 	case OMP_CLAUSE_NUM_GANGS:
 	case OMP_CLAUSE_NUM_WORKERS:
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e7cb66c..142330c 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2246,7 +2246,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 	  break;
 
 	case OMP_CLAUSE_BIND:
-	case OMP_CLAUSE_DEVICE_RESIDENT:
 	case OMP_CLAUSE_NOHOST:
 	case OMP_CLAUSE__CACHE_:
 	default:
@@ -2414,7 +2413,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 	  break;
 
 	case OMP_CLAUSE_BIND:
-	case OMP_CLAUSE_DEVICE_RESIDENT:
 	case OMP_CLAUSE_NOHOST:
 	case OMP_CLAUSE__CACHE_:
 	default:
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index b07ff30..a3c4b18 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -313,9 +313,6 @@ enum omp_clause_code {
  #pragma acc cache (variable-list).  */
   OMP_CLAUSE__CACHE_,
 
-  /* OpenACC clause: device_resident (variable_list).  */
-  OMP_CLAUSE_DEVICE_RESIDENT,
-
   /* OpenACC clause: gang [(gang-argument-list)].
  Where
   gang-argument-list: [gang-argument-list, ] gang-argument
diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c
index ed77798..55f1f20 100644
--- a/gcc/tree-nested.c
+++ b/gcc/tree-nested.c
@@ -1215,7 +1215,6 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
 	  gcc_unreachable ();
 
 	case OMP_CLAUSE_BIND:
-	case OMP_CLAUSE_DEVICE_RESIDENT:
 	case OMP_CLAUSE_NOHOST:
 	default:
 	  gcc_unreachable ();
@@ -1914,7 +1913,6 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
 	  gcc_unreachable ();
 
 	case OMP_CLAUSE_BIND:
-	case OMP_CLAUSE_DEVICE_RESIDENT:
 	case OMP_CLAUSE_NOHOST:
 	default:
 	  gcc_unreachable ();
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index f5970e1..f1af737 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -407,9 +407,6 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, int flags)
 case OMP_CLAUSE__LOOPTEMP_:
   name = "_looptemp_";
   goto print_remap;
-case OMP_CLAUSE_DEVICE_RESIDENT:
-  name = "device_resident";
-  goto print_remap;
 case OMP_CLAUSE_TO_DECLARE:
   name = "to";
   goto print_remap;
diff --git a/gcc/tree.c b/gcc/tree.c
index f26e282..7c87612 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -281,7 +281,6 @@ unsigned const char omp_clause_num_ops[] =
   1, /* OMP_CLAUSE_USE_DEVICE_PTR  */
   1, /* OMP_CLAUSE_IS_DEVICE_PTR  */
   2, /* OMP_CLAUSE__CACHE_  */
-  1, /* OMP_CLAUSE_DEVICE_RESIDENT  */
   2, /* OMP_CLAUSE_GANG  */
   1, /* OMP_CLAUSE_ASYNC  */
   1, /* OMP_CLAUSE_WAIT  */
@@ -356,7 +355,6 @@ const char * const omp_clause_code_name[] =
   "use_device_ptr",
   "is_device_ptr",
   "_cache_",
-  "device_resident",
   "gang",
   "async",
   "wait",
@@ -11651,7 +11649,6 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
 	  WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 1));
 	  /* FALLTHRU */
 
-	case OMP_CLAUSE_DEVICE_RESIDENT:
 	case OMP_CLAUSE_ASYNC:
 	case OMP_CLAUSE_WAIT:
 	case OMP_CLAUSE_WORKER:


Re: [PATCH] Fix ICE in default_use_anchors_for_symbol_p (PR middle-end/78201)

2016-11-10 Thread Yvan Roux
Hi,

On 10 November 2016 at 18:00, Jakub Jelinek  wrote:
> Hi!
>
> On arm/aarch64 we ICE because some decls that make it here have non-NULL
> DECL_SIZE, which is a VAR_DECL rather than CONST_INT (or DECL_SIZE that
> doesn't fit into shwi would ICE similarly).  While it is arguably a FE bug
> that it creates for VLA initialization from STRING_CST such a decl,
> I believe we have some PRs about it already open.
> I think it won't hurt to check for the large sizes properly even in this
> function though, and punt on unexpected cases, or even extremely large ones.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, Yvan said he is going
> to test on arm and/or aarch64.  Ok for trunk if the testing there passes
> too?

Bootstrapped and regtested on arm, aarch64, linux and bare targets
without issue.

Thanks,
Yvan

> 2016-11-10  Jakub Jelinek  
>
> PR middle-end/78201
> * varasm.c (default_use_anchors_for_symbol_p): Fix a comment typo.
> Don't test decl != NULL.  Don't look at DECL_SIZE, but DECL_SIZE_UNIT
> instead, return false if it is NULL, or doesn't fit into uhwi, or
> is larger or equal to targetm.max_anchor_offset.
>
> * g++.dg/opt/pr78201.C: New test.
>
> --- gcc/varasm.c.jj 2016-10-31 13:28:12.0 +0100
> +++ gcc/varasm.c2016-11-10 15:18:41.282886244 +0100
> @@ -6804,11 +6804,12 @@ default_use_anchors_for_symbol_p (const_
> return false;
>
>/* Don't use section anchors for decls that won't fit inside a single
> -anchor range to reduce the amount of instructions require to refer
> +anchor range to reduce the amount of instructions required to refer
>  to the entire declaration.  */
> -  if (decl && DECL_SIZE (decl)
> -&& tree_to_shwi (DECL_SIZE (decl))
> -   >= (targetm.max_anchor_offset * BITS_PER_UNIT))
> +  if (DECL_SIZE_UNIT (decl) == NULL_TREE
> + || !tree_fits_uhwi_p (DECL_SIZE_UNIT (decl))
> + || (tree_to_uhwi (DECL_SIZE_UNIT (decl))
> + >= (unsigned HOST_WIDE_INT) targetm.max_anchor_offset))
> return false;
>
>  }
> --- gcc/testsuite/g++.dg/opt/pr78201.C.jj   2016-11-10 15:20:18.398660681 
> +0100
> +++ gcc/testsuite/g++.dg/opt/pr78201.C  2016-11-10 15:19:58.0 +0100
> @@ -0,0 +1,13 @@
> +// PR middle-end/78201
> +// { dg-do compile }
> +// { dg-options "-O2" }
> +
> +struct B { long d (); } *c;
> +long e;
> +
> +void
> +foo ()
> +{
> +  char a[e] = "";
> +  c && c->d();
> +}
>
> Jakub


Re: [ipa-vrp] ice in set_value_range

2016-11-10 Thread kugan

Hi David,

Sorry about the breakage. I have already reverted this patch as this is 
causing bootstrap failures. I will test it on more targets before 
submitting this patch again.


Thanks,
Kugan

On 11/11/16 00:25, David Edelsohn wrote:

Kugan

Is there a PR for this failure?  It broke bootstrap on AIX as well and
I only was able to track it to your patch last night.

Thanks, David



[PATCH, Fortran] PR78277: ICE in is_anonymous_component, at fortran/interface.c:450

2016-11-10 Thread Fritz Reese
All,

The attached fixes an ICE-on-invalid-code, specifically due to invalid
anonymous structure declarations, as seen in the attached test case.
This also improves error handling in such cases: the anonymous
structure body will continue to be parsed even if the variable-decl
after the opening variable-type-decl is invalid. (Something similar
could be done to improve regular structure declarations.) See the
in-code comments and the comments on the PR for an additional
description.

Along with the first patch, I've attached another patch
(struct_whitespace) containing whitespace-only changes to some
dec-structure-related code; the poor formatting was introduced before
I had my vim settings right.

I intend to commit the two attached patches soon for trunk if nobody
finds any issues with them. They both regtest on x86_64-redhat-linux of
course.

---
Fritz Reese

> pr78277.diff
From: Fritz O. Reese 
Date: Thu, 10 Nov 2016 13:36:54 -0500
Subject: [PATCH] Fix ICE and improve errors for invalid anonymous
structure declarations.

PR fortran/78277
* gcc/fortran/decl.c (gfc_match_data_decl): Gracefully handle bad
anonymous structure declarations.

PR fortran/78277
* gcc/testsuite/gfortran.dg/dec_structure_17.f90: New test.
<

> struct_whitespace.diff
From: Fritz O. Reese 
Date: Thu, 10 Nov 2016 11:02:08 -0500
Subject: [PATCH] Fix some whitespace.

   gcc/fortran/
   * decl.c (get_struct_decl, gfc_match_map, gfc_match_union): Fix
   whitespace.
   * interface.c (gfc_compare_union_types): Likewise.
<
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 1272f1f..bf6bc24 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -4901,7 +4901,28 @@ ok:
 }
 
   if (!gfc_error_flag_test ())
-gfc_error ("Syntax error in data declaration at %C");
+{
+  /* An anonymous structure declaration is unambiguous; if we matched one
+according to gfc_match_structure_decl, we need to return MATCH_YES
+here to avoid confusing the remaining matchers, even if there was an
+error during variable_decl.  We must flush any such errors.  Note this
+causes the parser to gracefully continue parsing the remaining input
+as a structure body, which likely follows.  */
+  if (current_ts.type == BT_DERIVED && current_ts.u.derived
+ && gfc_fl_struct (current_ts.u.derived->attr.flavor))
+   {
+ gfc_error_now ("Syntax error in anonymous structure declaration"
+" at %C");
+ /* Skip the bad variable_decl and line up for the start of the
+structure body.  */
+ gfc_error_recovery ();
+ m = MATCH_YES;
+ goto cleanup;
+   }
+
+  gfc_error ("Syntax error in data declaration at %C");
+}
+
   m = MATCH_ERROR;
 
   gfc_free_data_all (gfc_current_ns);
diff --git a/gcc/testsuite/gfortran.dg/dec_structure_17.f90 
b/gcc/testsuite/gfortran.dg/dec_structure_17.f90
new file mode 100644
index 000..18d3193
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dec_structure_17.f90
@@ -0,0 +1,27 @@
+! { dg-do compile }
+! { dg-options "-fdec-structure" }
+!
+! PR fortran/78277
+!
+! Fix ICE for invalid structure declaration code.
+!
+
+subroutine sub1()
+  structure /s/
+structure t
+  integer i
+end structure
+  end structure
+  record /s/ u
+  interface
+subroutine sub0(u)
+  structure /s/
+structure t. ! { dg-error "Syntax error in anonymous structure decl" }
+  integer i
+end structure
+  end structure
+  record /s/ u
+end
+  end interface
+  call sub0(u)
+end
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 0120ceb..1272f1f 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -8597,31 +8597,31 @@ get_struct_decl (const char *name, sym_flavor fl, locus 
*decl,
 match
 gfc_match_map (void)
 {
-/* Counter used to give unique internal names to map structures. */
-static unsigned int gfc_map_id = 0;
-char name[GFC_MAX_SYMBOL_LEN + 1];
-gfc_symbol *sym;
-locus old_loc;
+  /* Counter used to give unique internal names to map structures. */
+  static unsigned int gfc_map_id = 0;
+  char name[GFC_MAX_SYMBOL_LEN + 1];
+  gfc_symbol *sym;
+  locus old_loc;
 
-old_loc = gfc_current_locus;
+  old_loc = gfc_current_locus;
 
-if (gfc_match_eos () != MATCH_YES)
-  {
-  gfc_error ("Junk after MAP statement at %C");
-  gfc_current_locus = old_loc;
-  return MATCH_ERROR;
-  }
+  if (gfc_match_eos () != MATCH_YES)
+{
+   gfc_error ("Junk after MAP statement at %C");
+   gfc_current_locus = old_loc;
+   return MATCH_ERROR;
+}
 
-/* Map blocks are anonymous so we make up unique names for the symbol table
-   which are invalid Fortran identifiers.  */
-snprintf (name, GFC_MAX_SYMBOL_LEN + 1, "MM$%u", gfc_map_id++);
+  /* Map blocks are anonymous so we make up unique names for the symbol table
+   

Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)

2016-11-10 Thread Richard Biener
On November 10, 2016 6:38:12 PM GMT+01:00, Joseph Myers 
 wrote:
>On Fri, 28 Oct 2016, Richard Biener wrote:
>
>> +/* Parse a gimple expression.
>> +
>> +   gimple-expression:
>> + gimple-unary-expression
>> + gimple-call-statement
>> + gimple-binary-expression
>> + gimple-assign-expression
>> + gimple-cast-expression
>
>I don't see any comments expanding what the syntax is for most of these
>
>constructs.
>
>> +  if (c_parser_next_token_is (parser, CPP_EQ))
>> +c_parser_consume_token (parser);
>
>That implies you're allowing an optional '=' at this point in the
>syntax.  
>That doesn't seem to make sense to me; I'd expect you to do if (=) { 
>process assignment; } else { other cases; } or similar.
>
>> +  /* GIMPLE PHI expression.  */
>> +  if (c_parser_next_token_is_keyword (parser, RID_PHI))
>
>I don't see this mentioned in any of the syntax comments.
>
>> +  struct {
>> +/* The expression at this stack level.  */
>> +struct c_expr expr;
>> +/* The operation on its left.  */
>> +enum tree_code op;
>> +/* The source location of this operation.  */
>> +location_t loc;
>> +  } stack[2];
>> +  int sp;
>> +  /* Location of the binary operator.  */
>> +  location_t binary_loc = UNKNOWN_LOCATION;  /* Quiet warning.  */
>> +#define POP   \
>
>This all looks like excess complexity.  The syntax in the comment 
>indicates that in GIMPLE, the operands of a binary expression are unary
>
>expressions.  So nothing related to precedence is needed at all, and
>you 
>shouldn't need this stack construct.

I'll address those comments.  As you did not have any comments on the 
c-parser.[CH] parts does that mean you are fine with them?  That is, does the 
above constitute a complete review of the patch?

Thanks,
Richard.




Re: [PATCH] Fix PR71762

2016-11-10 Thread Richard Biener
On November 10, 2016 7:39:57 PM GMT+01:00, Marc Glisse  
wrote:
>On Thu, 10 Nov 2016, Richard Biener wrote:
>
>> The following fixes PR71762 via reverting the transforms of
>> ~X & Y to X < Y and similar because when the bools they apply to
>> are expanded to RTL undefined values are not reliably zero-extended
>> and thus the transform is invalid.  Ensuring the zero-extension
>> is too costly IMHO and the proper fix is to move the transform
>> to RTL where we can check known-zero-bits to validate validity
>> or to fix GIMPLE to not have operations on types not matching their
>mode
>> in precision.
>
>Can you explain why this particular transformation is special? We have
>a 
>number of other optimizations that take advantage of the fact that bool
>is 
>in [0, 1], even without looking at VRP, say for instance x != 0 -> x.
>Are 
>we supposed to remove all of them?

No.  But UNDEFINED & 0 is well-defined even if the precision ends up
being bigger.  UNDEFINED != 0 is never well-defined.

Richard.
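
A minimal sketch of the failure mode described in the quoted patch description, assuming a bool that lives in an 8-bit register whose upper bits may hold garbage when the value is undefined.  The helper names and_not_bit and less_than are hypothetical; they only model the two forms of the expression and are not GCC code.

```c
#include <stdint.h>

/* Model a 1-bit bool held in an 8-bit register.  Only bit 0 is
   meaningful; the upper bits may be garbage if the value is undefined
   and was never zero-extended.  */

/* Original form: ~X & Y, truncated back to 1-bit precision.  */
static unsigned
and_not_bit (uint8_t x, uint8_t y)
{
  return (~x & y) & 1u;
}

/* Transformed form: X < Y, evaluated on the whole register.  */
static unsigned
less_than (uint8_t x, uint8_t y)
{
  return x < y;
}
```

For x and y restricted to 0 and 1 the two helpers agree.  With x = 2 (bit 0 clear, garbage above it) and y = 1, and_not_bit yields 1 while less_than yields 0, which is the kind of mismatch that makes the comparison form unsafe without zero-extension.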




Re: [PATCH] Fix PR71762

2016-11-10 Thread Marc Glisse

On Thu, 10 Nov 2016, Richard Biener wrote:


> The following fixes PR71762 via reverting the transforms of
> ~X & Y to X < Y and similar because when the bools they apply to
> are expanded to RTL undefined values are not reliably zero-extended
> and thus the transform is invalid.  Ensuring the zero-extension
> is too costly IMHO and the proper fix is to move the transform
> to RTL where we can check known-zero-bits to validate validity
> or to fix GIMPLE to not have operations on types not matching their mode
> in precision.


Can you explain why this particular transformation is special? We have a 
number of other optimizations that take advantage of the fact that bool is 
in [0, 1], even without looking at VRP, say for instance x != 0 -> x. Are 
we supposed to remove all of them?


--
Marc Glisse


[PATCH, gcc, wwwdocs] Document upcoming Qualcomm Falkor processor support

2016-11-10 Thread Siddhesh Poyarekar
Hi,

This patch documents the newly added flag in gcc 7 for the upcoming
Qualcomm Falkor processor core.

Siddhesh

Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.24
diff -u -r1.24 changes.html
--- htdocs/gcc-7/changes.html   9 Nov 2016 14:28:59 -   1.24
+++ htdocs/gcc-7/changes.html   10 Nov 2016 13:43:42 -
@@ -296,6 +296,11 @@
-mcpu=vulcan and -mtune=vulcan options as
well as the equivalent target attributes and pragmas.
  
+ 
+   The Qualcomm Falkor processor is now supported via the
+   -mcpu=falkor and -mtune=falkor options as
+   well as the equivalent target attributes and pragmas.
+ 

 
 ARM


Re: [patch, doc] PR 37998 -- Unclear documentation of -fno-common

2016-11-10 Thread Joseph Myers
On Thu, 10 Nov 2016, Sandra Loosemore wrote:

> This patch is for a long-standing issue with the way -fno-common is
> documented.  The specific bug report was about an ambiguous use of the word
> "This", but as I looked more at the way this option was described, I realized
> the whole thing was needlessly confusing because it didn't use the correct
> terminology -- the current text tries to use "declaration" to describe what
> the C standard calls a "tentative definition".
> 
> Joseph, can you look this over with your language-lawyer hat on, before I
> commit it?  I'm pretty sure I understand what this option does, but I want to
> be sure I'm explaining it correctly now.

This looks correct to me.

-- 
Joseph S. Myers
jos...@codesourcery.com


[patch, doc] PR 37998 -- Unclear documentation of -fno-common

2016-11-10 Thread Sandra Loosemore
This patch is for a long-standing issue with the way -fno-common is 
documented.  The specific bug report was about an ambiguous use of the 
word "This", but as I looked more at the way this option was described, 
I realized the whole thing was needlessly confusing because it didn't 
use the correct terminology -- the current text tries to use 
"declaration" to describe what the C standard calls a "tentative 
definition".


Joseph, can you look this over with your language-lawyer hat on, before 
I commit it?  I'm pretty sure I understand what this option does, but I 
want to be sure I'm explaining it correctly now.


-Sandra

2016-11-09  Sandra Loosemore  

	PR c/37998

	gcc/
	* doc/invoke.texi (Code Gen Options) [-fno-common]: Use correct
	terminology.  Expand to remove ambiguity.
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 241974)
+++ gcc/doc/invoke.texi	(working copy)
@@ -11953,25 +11953,32 @@ Use it to conform to a non-default appli
 
 @item -fno-common
 @opindex fno-common
-In C code, controls the placement of uninitialized global variables.
-Unix C compilers have traditionally permitted multiple definitions of
-such variables in different compilation units by placing the variables
-in a common block.
-This is the behavior specified by @option{-fcommon}, and is the default
-for GCC on most targets.
-On the other hand, this behavior is not required by ISO C, and on some
-targets may carry a speed or code size penalty on variable references.
-The @option{-fno-common} option specifies that the compiler should place
-uninitialized global variables in the data section of the object file,
-rather than generating them as common blocks.
-This has the effect that if the same variable is declared
-(without @code{extern}) in two different compilations,
-you get a multiple-definition error when you link them.
-In this case, you must compile with @option{-fcommon} instead.
+@cindex tentative definitions
+In C code, this option controls the placement of global variables 
+defined without an initializer, known as @dfn{tentative definitions} 
+in the C standard.  Tentative definitions are distinct from declarations 
+of a variable with the @code{extern} keyword, which do not allocate storage.
+
+Unix C compilers have traditionally allocated storage for
+uninitialized global variables in a common block.  This allows the
+linker to resolve all tentative definitions of the same variable
+in different compilation units to the same object, or to a non-tentative
+definition.  
+This is the behavior specified by @option{-fcommon}, and is the default for 
+GCC on most targets.  
+On the other hand, this behavior is not required by ISO
+C, and on some targets may carry a speed or code size penalty on
+variable references.
+
+The @option{-fno-common} option specifies that the compiler should instead
+place uninitialized global variables in the data section of the object file.
+This inhibits the merging of tentative definitions by the linker so
+you get a multiple-definition error if the same 
+variable is defined in more than one compilation unit.
 Compiling with @option{-fno-common} is useful on targets for which
 it provides better performance, or if you wish to verify that the
 program will work on other systems that always treat uninitialized
-variable declarations this way.
+variable definitions this way.
 
 @item -fno-ident
 @opindex fno-ident
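
The terminology used in the revised text can be illustrated with a small sketch.  The variable and function names are hypothetical; the sketch stays within one translation unit, where repeating a tentative definition is valid C.

```c
/* Tentative definitions: no initializer and no extern.  Repeating one
   within a single translation unit is valid C.  */
int shared_counter;
int shared_counter;

/* A declaration only: it refers to the object but allocates no
   storage.  */
extern int shared_counter;

/* With no initialized definition in this file, the tentative
   definitions behave as one definition initialized to zero.  */
int
read_shared_counter (void)
{
  return shared_counter;
}
```

If a second source file also contained "int shared_counter;", -fcommon would let the linker merge the two tentative definitions, whereas -fno-common would produce the multiple-definition error described in the patch.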


Re: [PATCH], PowerPC ISA 3.0, allow QImode/HImode to go into vector registers

2016-11-10 Thread Michael Meissner
On Thu, Nov 10, 2016 at 10:05:25AM -0600, Segher Boessenkool wrote:
> Hi Mike,
> 
> > I have built the spec 2006 CPU benchmark suite with these changes, and the
> > power8 (ISA 2.07) code generation does not change.
> 
> Very good to hear :-)
> 
> Just some nits; okay for trunk with that fixed:
> 
> > +(define_split
> > +  [(set (match_operand:EXTHI 0 "altivec_register_operand" "")
> > +   (sign_extend:EXTHI
> > +(match_operand:HI 1 "indexed_or_indirect_operand" "")))]
> > +  "TARGET_P9_VECTOR && reload_completed"
> > +  [(set (match_dup 2)
> > +   (match_dup 1))
> > +   (set (match_dup 0)
> > +   (sign_extend:EXTHI (match_dup 2)))]
> > +{
> > +  operands[2] = gen_rtx_REG (HImode, REGNO (operands[1]));
> > +})
> 
> Please lose the default "" (here and elsewhere).

Ok for the match_operands, but I discovered that match_scratch still needs the
"".

> > Property changes on: gcc/testsuite/gcc.target/powerpc/p9-minmax-1.c
> > ___
> > Modified: svn:mergeinfo
> >Merged 
> > /trunk/gcc/testsuite/gcc.target/powerpc/p9-minmax-1.c:r241733-241924
> > 
> > 
> > Property changes on: gcc/testsuite/gcc.target/powerpc/p9-minmax-2.c
> > ___
> > Modified: svn:mergeinfo
> >Merged 
> > /trunk/gcc/testsuite/gcc.target/powerpc/p9-minmax-2.c:r241733-241924
> 
> I don't know what this is?

This is some artifact from svnmerge.  It shows up a lot when I do an
svn diff against the branch.  Usually I try to delete it, but once in a while it
slips through.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



[gomp4] add support for derived types in ACC UPDATE

2016-11-10 Thread Cesar Philippidis
OpenACC 2.0a has limited support for Fortran derived types. Basically,
derived type variables are only supported in ACC UPDATE. Rather than
adding generalized support for derived types in the gimplifier, this
patch has the Fortran FE pass both subarrays and arrays as void pointers
with an appropriate OMP_CLAUSE_SIZE. ACC UPDATE is an executable
directive, and the gimplifier would just end up pruning out all of the
unnecessary supporting data clauses otherwise.

As of right now, GCC still lacks support for non-contiguous subarray
arguments to ACC UPDATE. I'm not sure why it was decided to let ACC
UPDATE support non-contiguous subarrays, but it already is an oddball
with its lone support for derived types.

This patch has been committed to gomp-4_0-branch.

Cesar
2016-11-10  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (gfc_match_omp_variable_list): New allow_derived argument.
	(gfc_match_omp_map_clause): Update call to
	gfc_match_omp_variable_list.
	(gfc_match_omp_clauses): Update calls to gfc_match_omp_map_clause.
	(gfc_match_oacc_update): Update call to gfc_match_omp_clauses.
	(resolve_omp_clauses): Permit derived type variables in ACC UPDATE
	clauses.
	* trans-openmp.c (gfc_trans_omp_clauses_1): Lower derived type
	members.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Update handling of ACC
	UPDATE variables.

	gcc/testsuite/
	* gfortran.dg/goacc/derived-types.f90: New test.

	libgomp/
	* testsuite/libgomp.oacc-fortran/update-2.f90: New test.


diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 95885bc..0a9d137 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -216,7 +216,8 @@ static match
 gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 			 bool allow_common, bool *end_colon = NULL,
 			 gfc_omp_namelist ***headp = NULL,
-			 bool allow_sections = false)
+			 bool allow_sections = false,
+			 bool allow_derived = false)
 {
   gfc_omp_namelist *head, *tail, *p;
   locus old_loc, cur_loc;
@@ -242,7 +243,8 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	case MATCH_YES:
 	  gfc_expr *expr;
 	  expr = NULL;
-	  if (allow_sections && gfc_peek_ascii_char () == '(')
+	  if (allow_sections && gfc_peek_ascii_char () == '('
+	  || allow_derived && gfc_peek_ascii_char () == '%')
 	{
 	  gfc_current_locus = cur_loc;
 	  m = gfc_match_variable (&expr, 0);
@@ -634,10 +636,11 @@ cleanup:
 
 static bool
 gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op,
-			  bool common_blocks)
+			  bool common_blocks, bool allow_derived)
 {
   gfc_omp_namelist **head = NULL;
-  if (gfc_match_omp_variable_list ("", list, common_blocks, NULL, &head, true)
+  if (gfc_match_omp_variable_list ("", list, common_blocks, NULL, &head, true,
+   allow_derived)
   == MATCH_YES)
 {
   gfc_omp_namelist *n;
@@ -655,7 +658,8 @@ gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op,
 static match
 gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 		   uint64_t dtype_mask, bool first = true,
-		   bool needs_space = true, bool openacc = false)
+		   bool needs_space = true, bool openacc = false,
+		   bool allow_derived = false)
 {
   gfc_omp_clauses *base_clauses, *c = gfc_get_omp_clauses ();
   locus old_loc;
@@ -773,7 +777,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 	  if ((mask & OMP_CLAUSE_COPY)
 	  && gfc_match ("copy ( ") == MATCH_YES
 	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-	   OMP_MAP_FORCE_TOFROM, openacc))
+	   OMP_MAP_FORCE_TOFROM, openacc,
+	   allow_derived))
 	continue;
 	  if (mask & OMP_CLAUSE_COPYIN)
 	{
@@ -781,7 +786,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 		{
 		  if (gfc_match ("copyin ( ") == MATCH_YES
 		  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-		   OMP_MAP_FORCE_TO, true))
+		   OMP_MAP_FORCE_TO, true,
+		   allow_derived))
 		continue;
 		}
 	  else if (gfc_match_omp_variable_list ("copyin (",
@@ -792,7 +798,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 	  if ((mask & OMP_CLAUSE_COPYOUT)
 	  && gfc_match ("copyout ( ") == MATCH_YES
 	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-	   OMP_MAP_FORCE_FROM, true))
+	   OMP_MAP_FORCE_FROM, true,
+	   allow_derived))
 	continue;
 	  if ((mask & OMP_CLAUSE_COPYPRIVATE)
 	  && gfc_match_omp_variable_list ("copyprivate (",
@@ -802,14 +809,15 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 	  if ((mask & OMP_CLAUSE_CREATE)
 	  && gfc_match ("create ( ") == MATCH_YES
 	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-	   OMP_MAP_FORCE_ALLOC, true))
+	   OMP_MAP_FORCE_ALLOC, true,
+	   allow_derived))
 	continue;
 	  break;
 	case 'd':
 	  if ((mask & OMP_CLAUSE_DELETE)
 	  && gfc_match ("delete ( ") == MATCH_YES
 	  && gfc_match_omp_map_cl

Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)

2016-11-10 Thread Joseph Myers
On Fri, 28 Oct 2016, Richard Biener wrote:

> +/* Parse a gimple expression.
> +
> +   gimple-expression:
> + gimple-unary-expression
> + gimple-call-statement
> + gimple-binary-expression
> + gimple-assign-expression
> + gimple-cast-expression

I don't see any comments expanding what the syntax is for most of these 
constructs.

> +  if (c_parser_next_token_is (parser, CPP_EQ))
> +c_parser_consume_token (parser);

That implies you're allowing an optional '=' at this point in the syntax.  
That doesn't seem to make sense to me; I'd expect you to do if (=) { 
process assignment; } else { other cases; } or similar.

> +  /* GIMPLE PHI expression.  */
> +  if (c_parser_next_token_is_keyword (parser, RID_PHI))

I don't see this mentioned in any of the syntax comments.

> +  struct {
> +/* The expression at this stack level.  */
> +struct c_expr expr;
> +/* The operation on its left.  */
> +enum tree_code op;
> +/* The source location of this operation.  */
> +location_t loc;
> +  } stack[2];
> +  int sp;
> +  /* Location of the binary operator.  */
> +  location_t binary_loc = UNKNOWN_LOCATION;  /* Quiet warning.  */
> +#define POP\

This all looks like excess complexity.  The syntax in the comment 
indicates that in GIMPLE, the operands of a binary expression are unary 
expressions.  So nothing related to precedence is needed at all, and you 
shouldn't need this stack construct.

-- 
Joseph S. Myers
jos...@codesourcery.com
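
The last point above can be sketched with a toy parser: when both operands of a binary expression are simple operands, one operator is read at most, and no precedence stack is needed.  The grammar and every name below are hypothetical and far smaller than the GIMPLE front end; this is an illustration, not the proposed c-parser code.

```c
#include <ctype.h>

/* Toy grammar (an assumption for illustration only):
     binary-expression:  operand [ op operand ]
     operand:            digit-sequence
     op:                 '+' | '-' | '*'  */

static const char *
skip_spaces (const char *p)
{
  while (*p == ' ')
    p++;
  return p;
}

/* Parse one operand; return 1 and advance *PP on success.  */
static int
parse_operand (const char **pp, long *value)
{
  const char *p = skip_spaces (*pp);
  if (!isdigit ((unsigned char) *p))
    return 0;
  long v = 0;
  while (isdigit ((unsigned char) *p))
    v = v * 10 + (*p++ - '0');
  *value = v;
  *pp = p;
  return 1;
}

/* Parse and evaluate the whole string; return 1 on success.  There is
   no operator stack: a second operator is simply a syntax error.  */
static int
parse_binary (const char *s, long *result)
{
  long lhs, rhs;
  if (!parse_operand (&s, &lhs))
    return 0;
  s = skip_spaces (s);
  if (*s == '\0')
    {
      *result = lhs;
      return 1;
    }
  char op = *s++;
  if (op != '+' && op != '-' && op != '*')
    return 0;
  if (!parse_operand (&s, &rhs))
    return 0;
  if (*skip_spaces (s) != '\0')
    return 0;
  *result = op == '+' ? lhs + rhs : op == '-' ? lhs - rhs : lhs * rhs;
  return 1;
}
```

Here parse_binary ("2 + 3", &r) succeeds with r set to 5, while "2 + 3 * 4" is rejected outright instead of being resolved by precedence.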


Re: [PATCH] dwarf2cfi: Dump row differences before asserting

2016-11-10 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 11:29:12AM -0600, Segher Boessenkool wrote:
> Ping.
> 
> On Mon, Oct 31, 2016 at 05:15:53PM +, Segher Boessenkool wrote:
> > If maybe_record_trace_start fails because the CFI is inconsistent on two
> > paths into a block it currently just ICEs.  This changes it to also dump
> > the CFI on those two paths in the dump file; debugging it without that
> > information is hopeless.
> > 
> > Tested on powerpc64-linux {-m32,-m64}.  Is this okay for trunk?
> > 
> > 
> > Segher
> > 
> > 
> > 2016-10-31  Segher Boessenkool  
> > 
> > * dwarf2cfi.c (dump_cfi_row): Add forward declaration.
> > (maybe_record_trace_start): If the CFI is different on the new and
> > old paths, print out both to the dump file before ICEing.

Ok, thanks.

Jakub


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread Nathan Sidwell

On 11/10/2016 08:24 AM, Martin Liška wrote:
> On 11/10/2016 05:17 PM, David Edelsohn wrote:
>> Maybe instead of adding "maybe", we need to change the severity of the
>> warning so that the warning is not emitted by default.
>
> Adding the warning option to -Wextra can be solution. Is it acceptable
> approach?


I don't think that's good.  Now I understand the -pthreads thing, we 
have different use cases.


1) user explicitly said -fprofile-update=FOO.  They shouldn't have to 
enable something else to get a diagnostic that FOO doesn't work.


2) driver implicitly said -fprofile-update=FOO, because the user said 
-pthreads but the driver doesn't know if FOO is acceptable.  We want to 
silently fallback to the old behaviour.


The proposed solution addresses #2 by having the driver say 
-fprofile-update=META-FOO.  My dislike is that we're exposing this to 
the user and they're going to start using it.  That strikes me as 
undesirable.


How hard is it to implement the -fprofile-update option value as a list,
i.e. '-fprofile-update=atomic,single', with semantics of 'pick the first
one you can do'? If that's straightforward, then that seems to me a
better solution for #2. [flyby-thought, have 'atomic,single' as an
acceptable single option value?]


Failing that, Martin's solution is probably the sanest available 
solution, but I'd like to rename 'maybe-atomic' to the more meaningful 
'prefer-atomic'.  With 'maybe-atomic', I'm left wondering if it looks at 
the phase of the moon.


nathan

--
Nathan Sidwell
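
The 'pick the first one you can do' semantics suggested above could look roughly like the following sketch.  The enum, the helper names and the atomic_supported flag are all hypothetical; this does not reflect how GCC parses the option.

```c
#include <string.h>

enum update_mode { UPDATE_INVALID, UPDATE_SINGLE, UPDATE_ATOMIC };

/* Return nonzero if the LEN characters at P spell NAME exactly.  */
static int
name_is (const char *p, size_t len, const char *name)
{
  return strlen (name) == len && strncmp (p, name, len) == 0;
}

/* Walk a comma-separated LIST and return the first mode that can be
   honoured.  ATOMIC_SUPPORTED stands in for whatever target check the
   compiler would really perform.  */
static enum update_mode
pick_profile_update (const char *list, int atomic_supported)
{
  const char *p = list;
  while (*p != '\0')
    {
      size_t len = strcspn (p, ",");
      if (name_is (p, len, "single"))
        return UPDATE_SINGLE;
      if (name_is (p, len, "atomic"))
        {
          if (atomic_supported)
            return UPDATE_ATOMIC;
          /* Not available here: fall through to the next entry.  */
        }
      else
        return UPDATE_INVALID;  /* Unknown name.  */
      p += len;
      if (*p == ',')
        p++;
    }
  return UPDATE_INVALID;  /* Nothing in the list was usable.  */
}
```

With this shape, "atomic,single" resolves to atomic where it is supported and silently to single where it is not, while a bare "atomic" on an unsupported target is still reported as unusable, which matches use case 1.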


Re: [PATCH] dwarf2cfi: Dump row differences before asserting

2016-11-10 Thread Segher Boessenkool
Ping.

On Mon, Oct 31, 2016 at 05:15:53PM +, Segher Boessenkool wrote:
> If maybe_record_trace_start fails because the CFI is inconsistent on two
> paths into a block it currently just ICEs.  This changes it to also dump
> the CFI on those two paths in the dump file; debugging it without that
> information is hopeless.
> 
> Tested on powerpc64-linux {-m32,-m64}.  Is this okay for trunk?
> 
> 
> Segher
> 
> 
> 2016-10-31  Segher Boessenkool  
> 
>   * dwarf2cfi.c (dump_cfi_row): Add forward declaration.
>   (maybe_record_trace_start): If the CFI is different on the new and
>   old paths, print out both to the dump file before ICEing.
> 
> ---
>  gcc/dwarf2cfi.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
> index da9da52..19edc28 100644
> --- a/gcc/dwarf2cfi.c
> +++ b/gcc/dwarf2cfi.c
> @@ -2239,6 +2239,8 @@ add_cfis_to_fde (void)
>  }
>  }
>  
> +static void dump_cfi_row (FILE *f, dw_cfi_row *row);
> +
>  /* If LABEL is the start of a trace, then initialize the state of that
> trace from CUR_TRACE and CUR_ROW.  */
>  
> @@ -2282,7 +2284,21 @@ maybe_record_trace_start (rtx_insn *start, rtx_insn 
> *origin)
>/* We ought to have the same state incoming to a given trace no
>matter how we arrive at the trace.  Anything else means we've
>got some kind of optimization error.  */
> -  gcc_checking_assert (cfi_row_equal_p (cur_row, ti->beg_row));
> +#if CHECKING_P
> +  if (!cfi_row_equal_p (cur_row, ti->beg_row))
> + {
> +   if (dump_file)
> + {
> +   fprintf (dump_file, "Inconsistent CFI state!\n");
> +   fprintf (dump_file, "SHOULD have:\n");
> +   dump_cfi_row (dump_file, ti->beg_row);
> +   fprintf (dump_file, "DO have:\n");
> +   dump_cfi_row (dump_file, cur_row);
> + }
> +
> +   gcc_unreachable ();
> + }
> +#endif
>  
>/* The args_size is allowed to conflict if it isn't actually used.  */
>if (ti->beg_true_args_size != args_size)
> -- 
> 1.9.3


[PATCH][ARM] Improve max_insns_skipped logic

2016-11-10 Thread Wilco Dijkstra
Improve the logic when setting max_insns_skipped.  Limit the maximum size of an
IT block to MAX_INSN_PER_IT_BLOCK, as otherwise multiple IT instructions are
needed, increasing codesize.  Given 4 works well for Thumb-2, use the same limit
for ARM for consistency.

ChangeLog:
2016-11-04  Wilco Dijkstra  

* config/arm/arm.c (arm_option_params_internal): Improve setting of
max_insns_skipped.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
f046854e9665d54911616fc1c60fee407188f7d6..29e8d1d07d918fbb2a627a653510dfc8587ee01a
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2901,20 +2901,12 @@ arm_option_params_internal (void)
   targetm.max_anchor_offset = TARGET_MAX_ANCHOR_OFFSET;
 }
 
-  if (optimize_size)
-{
-  /* If optimizing for size, bump the number of instructions that we
- are prepared to conditionally execute (even on a StrongARM).  */
-  max_insns_skipped = 6;
+  /* Increase the number of conditional instructions with -Os.  */
+  max_insns_skipped = optimize_size ? 4 : current_tune->max_insns_skipped;
 
-  /* For THUMB2, we limit the conditional sequence to one IT block.  */
-  if (TARGET_THUMB2)
-max_insns_skipped = arm_restrict_it ? 1 : 4;
-}
-  else
-/* When -mrestrict-it is in use tone down the if-conversion.  */
-max_insns_skipped = (TARGET_THUMB2 && arm_restrict_it)
-  ? 1 : current_tune->max_insns_skipped;
+  /* For THUMB2, we limit the conditional sequence to one IT block.  */
+  if (TARGET_THUMB2)
+max_insns_skipped = MIN (max_insns_skipped, MAX_INSN_PER_IT_BLOCK);
 }
 
 /* True if -mflip-thumb should next add an attribute for the default
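
The effect of the change can be compared with a stand-alone model of the old and new logic.  This is a sketch only: the function names are hypothetical, the flags are passed as plain ints, and MAX_INSN_PER_IT_BLOCK is assumed to evaluate to 1 with -mrestrict-it and 4 otherwise, which the patch itself does not show.

```c
/* Model of the logic removed by the patch.  */
static int
old_max_insns_skipped (int optimize_size, int thumb2, int restrict_it,
                       int tune_value)
{
  if (optimize_size)
    return thumb2 ? (restrict_it ? 1 : 4) : 6;
  return (thumb2 && restrict_it) ? 1 : tune_value;
}

/* Model of the logic added by the patch.  */
static int
new_max_insns_skipped (int optimize_size, int thumb2, int restrict_it,
                       int tune_value)
{
  /* Assumed value of MAX_INSN_PER_IT_BLOCK.  */
  int it_block_limit = restrict_it ? 1 : 4;
  int n = optimize_size ? 4 : tune_value;
  if (thumb2 && n > it_block_limit)
    n = it_block_limit;
  return n;
}
```

Under these assumptions, ARM state with -Os drops from 6 to 4, and Thumb-2 without -mrestrict-it is now capped at 4 even when the tuning value is larger.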



Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-10 Thread Andrew Senkevich
2016-11-10 20:14 GMT+03:00 Vladimir N Makarov :
>
>
> On 11/10/2016 11:27 AM, Andrew Senkevich wrote:
>>
>> Hi,
>>
>> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>>
>> It requires additional patch for register allocator from Vladimir
>> Makarov to be committed before.
>>
>>
> I've just committed the necessary patch.

Thanks, Vladimir.


--
WBR,
Andrew


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-10 Thread Andrew Senkevich
2016-11-10 19:36 GMT+03:00 Jakub Jelinek :
> On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
>> Hi,
>>
>> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>>
>> It requires additional patch for register allocator from Vladimir
>> Makarov to be committed before.
>
> Your MUA ate tabs (and in the ChangeLog you're using spaces instead of
> tabs), can you repost as attachment or configure your MUA not to do this?
>
> Just a couple of random nits follow:
>
>> * gcc.target/i386/sse-12.c: Add -mavx5124fmaddps.
>
> This mentions an option that doesn't exist, is that s/dd// ?

Yes.
Attached fixed version.


--
WBR,
Andrew


new_avx512_instructions.patch
Description: Binary data


Re: [PATCH 2/2][AArch64] Add bfx attribute

2016-11-10 Thread Dr. Philipp Tomsich
> On 10 Nov 2016, at 18:14, Wilco Dijkstra  wrote:
> 
> I think the XGene-1 scheduler might need a similar change as currently all
> AArch64 shifts are modelled as 2-cycle operations.

Thanks for the heads-up.  We’ll indeed need to update this.

Regards,
Philipp.

[PATCH][AArch64] Improve TI mode address offsets

2016-11-10 Thread Wilco Dijkstra
Improve TI mode address offsets - these may use either a 64-bit LDP or a
128-bit LDR, so we need to use the correct intersection of offsets.
When splitting a large offset into base and offset, use a signed 9-bit
unscaled offset.

Remove the Ump constraint on movti and movtf instructions as this blocks
the reload optimizer from merging address CSEs (is this supposed to work
only on 'm' constraints?).  The result is improved codesize, especially
wrf and gamess in SPEC2006.


int f (int x)
{
  __int128_t arr[100];
  arr[31] = 0;
  arr[48] = 0;
  arr[79] = 0;
  arr[65] = 0;
  arr[70] = 0;
  return arr[x];
}

Before patch (note the multiple redundant add x1, sp, 1024):
sub sp, sp, #1600
sbfiz   x0, x0, 4, 32
add x1, sp, 256
stp xzr, xzr, [x1, 240]
add x1, sp, 768
stp xzr, xzr, [x1]
add x1, sp, 1024
stp xzr, xzr, [x1, 240]
add x1, sp, 1024
stp xzr, xzr, [x1, 16]
add x1, sp, 1024
stp xzr, xzr, [x1, 96]
ldr w0, [sp, x0]
add sp, sp, 1600
ret

After patch:
sub sp, sp, #1600
sbfiz   x0, x0, 4, 32
add x1, sp, 1024
stp xzr, xzr, [sp, 496]
stp xzr, xzr, [x1, -256]
stp xzr, xzr, [x1, 240]
stp xzr, xzr, [x1, 16]
stp xzr, xzr, [x1, 96]
ldr w0, [sp, x0]
add sp, sp, 1600
ret


Bootstrap & regress OK.
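
The base/offset split behind the "After patch" listing can be checked with a small sketch.  It uses the same rounding expression as the patch, (offset + 0x100) & ~0x1ff, applied to the byte offsets of the 16-byte array elements; the function name split_offset_9bit is hypothetical.

```c
#include <stdint.h>

/* Split OFFSET into an anchor BASE and a residual DISP that fits a
   signed 9-bit unscaled immediate (-256..255).  */
static void
split_offset_9bit (int64_t offset, int64_t *base, int64_t *disp)
{
  *base = (offset + 0x100) & ~(int64_t) 0x1ff;
  *disp = offset - *base;
}
```

For arr[48], arr[65], arr[70] and arr[79] (byte offsets 768, 1040, 1120 and 1264) the anchor is 1024 every time, with residuals -256, 16, 96 and 240.  These are the displacements applied to x1 in the listing, so a single add x1, sp, 1024 serves all four stores.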

ChangeLog:
2015-11-10  Wilco Dijkstra  

gcc/
* config/aarch64/aarch64.md (movti_aarch64): Change Ump to m.
(movtf_aarch64): Likewise.
* config/aarch64/aarch64.c (aarch64_classify_address):
Use correct intersection of offsets.
(aarch64_legitimize_address_displacement): Use 9-bit signed offsets.
(aarch64_legitimize_address): Use 9-bit signed offsets for TI/TF mode.
Use 7-bit signed scaled mode for modes > 16 bytes.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
3045e6d6447d5c1860feb51708eeb2a21d2caca9..45f44e96ba9e9d3c8c41d977aa509fa13398a8fd
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4066,7 +4066,8 @@ aarch64_classify_address (struct aarch64_address_info 
*info,
 instruction memory accesses.  */
  if (mode == TImode || mode == TFmode)
return (aarch64_offset_7bit_signed_scaled_p (DImode, offset)
-   && offset_9bit_signed_unscaled_p (mode, offset));
+   && (offset_9bit_signed_unscaled_p (mode, offset)
+   || offset_12bit_unsigned_scaled_p (mode, offset)));
 
  /* A 7bit offset check because OImode will emit a ldp/stp
 instruction (only big endian will get here).
@@ -4270,18 +4271,19 @@ aarch64_legitimate_address_p (machine_mode mode, rtx x,
 /* Split an out-of-range address displacement into a base and offset.
Use 4KB range for 1- and 2-byte accesses and a 16KB range otherwise
to increase opportunities for sharing the base address of different sizes.
-   For TI/TFmode and unaligned accesses use a 256-byte range.  */
+   For unaligned accesses and TI/TF mode use the signed 9-bit range.  */
 static bool
 aarch64_legitimize_address_displacement (rtx *disp, rtx *off, machine_mode 
mode)
 {
-  HOST_WIDE_INT mask = GET_MODE_SIZE (mode) < 4 ? 0xfff : 0x3fff;
+  HOST_WIDE_INT offset = INTVAL (*disp);
+  HOST_WIDE_INT base = offset & ~(GET_MODE_SIZE (mode) < 4 ? 0xfff : 0x3ffc);
 
-  if (mode == TImode || mode == TFmode ||
-  (INTVAL (*disp) & (GET_MODE_SIZE (mode) - 1)) != 0)
-mask = 0xff;
+  if (mode == TImode || mode == TFmode
+  || (offset & (GET_MODE_SIZE (mode) - 1)) != 0)
+base = (offset + 0x100) & ~0x1ff;
 
-  *off = GEN_INT (INTVAL (*disp) & ~mask);
-  *disp = GEN_INT (INTVAL (*disp) & mask);
+  *off = GEN_INT (base);
+  *disp = GEN_INT (offset - base);
   return true;
 }
 
@@ -5148,12 +5150,10 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, 
machine_mode mode)
  x = gen_rtx_PLUS (Pmode, base, offset_rtx);
}
 
-  /* Does it look like we'll need a load/store-pair operation?  */
+  /* Does it look like we'll need a 16-byte load/store-pair operation?  */
   HOST_WIDE_INT base_offset;
-  if (GET_MODE_SIZE (mode) > 16
- || mode == TImode)
-   base_offset = ((offset + 64 * GET_MODE_SIZE (mode))
-  & ~((128 * GET_MODE_SIZE (mode)) - 1));
+  if (GET_MODE_SIZE (mode) > 16)
+   base_offset = (offset + 0x400) & ~0x7f0;
   /* For offsets aren't a multiple of the access size, the limit is
 -256...255.  */
   else if (offset & (GET_MODE_SIZE (mode) - 1))
@@ -5167,6 +5167,8 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, 
machine_mode mode)
   /* Small negative offsets are supported.  */
   else if (IN_RANGE (offset, -256, 0))
base_offset = 0;
+  else if (mode == TImod

Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-10 Thread Vladimir N Makarov



On 11/10/2016 11:27 AM, Andrew Senkevich wrote:
> Hi,
>
> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>
> It requires additional patch for register allocator from Vladimir
> Makarov to be committed before.

I've just committed the necessary patch.



[PATCH 2/2][AArch64] Add bfx attribute

2016-11-10 Thread Wilco Dijkstra
The second patch updates the Cortex-A57 scheduler now that we can differentiate
between shifts and bitfield inserts.  The Cortex-A57 Software Optimization Guide
indicates that BFM operations use the integer multi-cycle pipeline, while ARM
UXTB/H instructions use the Integer 1 or Integer 0 pipelines, so swap the bfm
and extend reservations.  This results in minor scheduling differences.

I think the XGene-1 scheduler might need a similar change, as currently all
AArch64 shifts are modelled as 2-cycle operations.

ChangeLog:
2016-11-10  Wilco Dijkstra  

* config/arm/cortex-a57.md (cortex_a57_alu): Move extend here, move bfm...
(cortex_a57_alu_shift): ...here.

--
diff --git a/gcc/config/arm/cortex-a57.md b/gcc/config/arm/cortex-a57.md
index da461846baa5b28ce3d9c9f731dbfd7becb31a85..63072509e50375929f75c44af900a4803a6285f3 100644
--- a/gcc/config/arm/cortex-a57.md
+++ b/gcc/config/arm/cortex-a57.md
@@ -297,7 +297,7 @@
(eq_attr "type" "alu_imm,alus_imm,logic_imm,logics_imm,\
alu_sreg,alus_sreg,logic_reg,logics_reg,\
adc_imm,adcs_imm,adc_reg,adcs_reg,\
-   adr,bfm,bfx,clz,rbit,rev,alu_dsp_reg,\
+   adr,bfx,extend,clz,rbit,rev,alu_dsp_reg,\
rotate_imm,shift_imm,shift_reg,\
mov_imm,mov_reg,\
mvn_imm,mvn_reg,\
@@ -307,7 +307,7 @@
 ;; ALU ops with immediate shift
 (define_insn_reservation "cortex_a57_alu_shift" 3
   (and (eq_attr "tune" "cortexa57")
-   (eq_attr "type" "extend,\
+   (eq_attr "type" "bfm,\
alu_shift_imm,alus_shift_imm,\
crc,logic_shift_imm,logics_shift_imm,\
mov_shift,mvn_shift"))



Re: gomp-nvptx branch - middle-end changes

2016-11-10 Thread Alexander Monakov
gcc/
* internal-fn.c (expand_GOMP_SIMT_LANE): New.
(expand_GOMP_SIMT_VF): New.
(expand_GOMP_SIMT_LAST_LANE): New.
(expand_GOMP_SIMT_ORDERED_PRED): New.
(expand_GOMP_SIMT_VOTE_ANY): New.
(expand_GOMP_SIMT_XCHG_BFLY): New.
(expand_GOMP_SIMT_XCHG_IDX): New.
* internal-fn.def (GOMP_SIMT_LANE): New.
(GOMP_SIMT_VF): New.
(GOMP_SIMT_LAST_LANE): New.
(GOMP_SIMT_ORDERED_PRED): New.
(GOMP_SIMT_VOTE_ANY): New.
(GOMP_SIMT_XCHG_BFLY): New.
(GOMP_SIMT_XCHG_IDX): New.
* omp-low.c (omp_maybe_offloaded_ctx): New, outlined from...
(create_omp_child_function): ...here.  Set "omp target entrypoint"
or "omp declare target" attribute based on is_gimple_omp_offloaded.
(omp_max_simt_vf): New.  Use it...
(omp_max_vf): ...here.
(lower_rec_input_clauses): Add reduction lowering for SIMT execution.
(lower_lastprivate_clauses): Likewise, for "lastprivate" lowering.
(lower_omp_ordered): Likewise, for "ordered" lowering.
(expand_omp_simd): Add SIMT transforms.
(pass_data_lower_omp): Add PROP_gimple_lomp_dev.
(execute_omp_device_lower): New.
(pass_data_omp_device_lower): New.
(pass_omp_device_lower): New pass.
(make_pass_omp_device_lower): New.
* passes.def (pass_omp_device_lower): Position new pass.
* tree-pass.h (PROP_gimple_lomp_dev): Define.
(make_pass_omp_device_lower): Declare.

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index cbee97e..fd1cd8b 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -157,6 +157,132 @@ expand_ANNOTATE (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* Lane index on SIMT targets: thread index in the warp on NVPTX.  On targets
+   without SIMT execution this should be expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_SIMT_LANE (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  gcc_assert (targetm.have_omp_simt_lane ());
+  emit_insn (targetm.gen_omp_simt_lane (target));
+}
+
+/* This should get expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_SIMT_VF (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
+/* Lane index of the first SIMT lane that supplies a non-zero argument.
+   This is a SIMT counterpart to GOMP_SIMD_LAST_LANE, used to represent the
+   lane that executed the last iteration for handling OpenMP lastprivate.  */
+
+static void
+expand_GOMP_SIMT_LAST_LANE (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx cond = expand_normal (gimple_call_arg (stmt, 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_operand ops[2];
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], cond, mode);
+  gcc_assert (targetm.have_omp_simt_last_lane ());
+  expand_insn (targetm.code_for_omp_simt_last_lane, 2, ops);
+}
+
+/* Non-transparent predicate used in SIMT lowering of OpenMP "ordered".  */
+
+static void
+expand_GOMP_SIMT_ORDERED_PRED (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx ctr = expand_normal (gimple_call_arg (stmt, 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_operand ops[2];
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], ctr, mode);
+  gcc_assert (targetm.have_omp_simt_ordered ());
+  expand_insn (targetm.code_for_omp_simt_ordered, 2, ops);
+}
+
+/* "Or" boolean reduction across SIMT lanes: return non-zero in all lanes if
+   any lane supplies a non-zero argument.  */
+
+static void
+expand_GOMP_SIMT_VOTE_ANY (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx cond = expand_normal (gimple_call_arg (stmt, 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_operand ops[2];
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], cond, mode);
+  gcc_assert (targetm.have_omp_simt_vote_any ());
+  expand_insn (targetm.code_for_omp_simt_vote_any, 2, ops);
+}
+
+/* Exchange between SIMT lanes with a "butterfly" pattern: source lane index
+   is destination lane index XOR given offset.  */
+
+static void
+expand_GOMP_SIMT_XCHG_BFLY (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx src = expand_normal (gimple_call_arg (stmt, 0));
+  rtx idx = expand_normal (gimple_call_arg (stmt, 1));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_op

Re: gomp-nvptx branch - libgomp changes

2016-11-10 Thread Alexander Monakov
libgomp/

* Makefile.am (libgomp_la_SOURCES): Add atomic.c, icv.c, icv-device.c.
* Makefile.in: Regenerate.
* configure.ac [nvptx*-*-*] (libgomp_use_pthreads): Set and use it...
(LIBGOMP_USE_PTHREADS): ...here; new define.
* configure: Regenerate.
* config.h.in: Likewise.
* config/posix/affinity.c: Move to...
* affinity.c: ...here (new file).  Guard use of PThreads-specific
interface by LIBGOMP_USE_PTHREADS. 
* critical.c: Split out GOMP_atomic_{start,end} into...
* atomic.c: ...here (new file).
* env.c: Split out ICV definitions into...
* icv.c: ...here (new file) and...
* icv-device.c: ...here. New file.
* config/linux/lock.c (gomp_init_lock_30): Move to generic lock.c.
(gomp_destroy_lock_30): Ditto.
(gomp_set_lock_30): Ditto.
(gomp_unset_lock_30): Ditto.
(gomp_test_lock_30): Ditto.
(gomp_init_nest_lock_30): Ditto.
(gomp_destroy_nest_lock_30): Ditto.
(gomp_set_nest_lock_30): Ditto.
(gomp_unset_nest_lock_30): Ditto.
(gomp_test_nest_lock_30): Ditto.
* lock.c: New.
* config/nvptx/lock.c: New.
* config/nvptx/bar.c: New.
* config/nvptx/bar.h: New.
* config/nvptx/doacross.h: New.
* config/nvptx/error.c: New.
* config/nvptx/icv-device.c: New.
* config/nvptx/mutex.h: New.
* config/nvptx/pool.h: New.
* config/nvptx/proc.c: New.
* config/nvptx/ptrlock.h: New.
* config/nvptx/sem.h: New.
* config/nvptx/simple-bar.h: New.
* config/nvptx/target.c: New.
* config/nvptx/task.c: New.
* config/nvptx/team.c: New.
* config/nvptx/time.c: New.
* config/posix/simple-bar.h: New.
* libgomp.h: Guard pthread.h inclusion.  Include simple-bar.h.
(gomp_num_teams_var): Declare.
(struct gomp_thread_pool): Change threads_dock member to
gomp_simple_barrier_t.
[__nvptx__] (gomp_thread): New implementation.
(gomp_thread_attr): Guard by LIBGOMP_USE_PTHREADS.
(gomp_thread_destructor): Ditto.
(gomp_init_thread_affinity): Ditto.
* team.c: Guard uses of PThreads-specific interfaces by
LIBGOMP_USE_PTHREADS.  Adjust all uses of threads_dock.
(gomp_free_thread) [__nvptx__]: Do not call 'free'.

* config/nvptx/alloc.c: Delete.
* config/nvptx/barrier.c: Ditto.
* config/nvptx/fortran.c: Ditto.
* config/nvptx/iter.c: Ditto.
* config/nvptx/iter_ull.c: Ditto.
* config/nvptx/loop.c: Ditto.
* config/nvptx/loop_ull.c: Ditto.
* config/nvptx/ordered.c: Ditto.
* config/nvptx/parallel.c: Ditto.
* config/nvptx/section.c: Ditto.
* config/nvptx/single.c: Ditto.
* config/nvptx/splay-tree.c: Ditto.
* config/nvptx/work.c: Ditto.

* testsuite/libgomp.fortran/fortran.exp (lang_link_flags): Pass
-foffload=-lgfortran in addition to -lgfortran.
* testsuite/libgomp.oacc-fortran/fortran.exp (lang_link_flags): Ditto.

* plugin/plugin-nvptx.c: Include .
(struct targ_fn_descriptor): Add new fields.
(struct ptx_device): Ditto.  Set them...
(nvptx_open_device): ...here.
(nvptx_adjust_launch_bounds): New.
(nvptx_host2dev): Allow NULL 'nvthd'.
(nvptx_dev2host): Ditto.
(GOMP_OFFLOAD_get_caps): Add GOMP_OFFLOAD_CAP_OPENMP_400.
(link_ptx): Adjust log sizes.
(nvptx_host2dev): Allow NULL 'nvthd'.
(nvptx_dev2host): Ditto.
(nvptx_set_clocktick): New.  Use it...
(GOMP_OFFLOAD_load_image): ...here.  Set new targ_fn_descriptor
fields.
(GOMP_OFFLOAD_dev2dev): New.
(nvptx_adjust_launch_bounds): New.
(nvptx_stacks_size): New.
(nvptx_stacks_alloc): New.
(nvptx_stacks_free): New.
(GOMP_OFFLOAD_run): New.
(GOMP_OFFLOAD_async_run): New (stub).

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index a3e1c2b..4090336 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -58,12 +58,12 @@ libgomp_la_LDFLAGS = $(libgomp_version_info) 
$(libgomp_version_script) \
 libgomp_la_DEPENDENCIES = $(libgomp_version_dep)
 libgomp_la_LINK = $(LINK) $(libgomp_la_LDFLAGS)
 
-libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c iter.c \
-   iter_ull.c loop.c loop_ull.c ordered.c parallel.c sections.c single.c \
-   task.c team.c work.c lock.c mutex.c proc.c sem.c bar.c ptrlock.c \
-   time.c fortran.c affinity.c target.c splay-tree.c libgomp-plugin.c \
-   oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c oacc-async.c \
-   oacc-plugin.c oacc-cuda.c priority_queue.c
+libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \
+   icv.c icv-device.c iter.c iter_ull.c loop.c loop_ull.c ordered.c \
+   parallel.c sections.c single.c task.c team.

A RA patch necessary for new Intel insns generation

2016-11-10 Thread Vladimir N Makarov
Hi, the following patch is necessary for generating the new Intel insns
that require 4 aligned zmm regs.


Committed as rev. 242043.

Index: ChangeLog
===
--- ChangeLog	(revision 242040)
+++ ChangeLog	(working copy)
@@ -1,3 +1,12 @@
+2016-11-10  Vladimir Makarov  
+
+	* target.def (additional_allocno_class_p): New.
+	* hooks.h (hook_bool_reg_class_t_false): New prototype.
+	* hooks.c (hook_bool_reg_class_t_false): New.
+	* ira.c (setup_allocno_and_important_classes): Use the new hook.
+	* doc/tm.texi.in (TARGET_ADDITIONAL_ALLOCNO_CLASS_P): Add it.
+	* doc/tm.texi: Update.
+
 2016-11-10  Jason Merrill  
 
 	* gengtype.c (new_structure): Append to structures list.
Index: hooks.h
===
--- hooks.h	(revision 242040)
+++ hooks.h	(working copy)
@@ -55,6 +55,7 @@ extern bool hook_bool_rtx_insn_true (rtx
 extern bool hook_bool_rtx_false (rtx);
 extern bool hook_bool_rtx_insn_int_false (rtx_insn *, int);
 extern bool hook_bool_uintp_uintp_false (unsigned int *, unsigned int *);
+extern bool hook_bool_reg_class_t_false (reg_class_t regclass);
 extern bool hook_bool_rtx_mode_int_int_intp_bool_false (rtx, machine_mode,
 			int, int, int *, bool);
 extern bool hook_bool_tree_tree_false (tree, tree);
Index: hooks.c
===
--- hooks.c	(revision 242040)
+++ hooks.c	(working copy)
@@ -466,3 +466,11 @@ hook_bool_uint_uintp_false (unsigned int
 {
   return false;
 }
+
+/* Generic hook that takes a register class and returns false.  */
+bool
+hook_bool_reg_class_t_false (reg_class_t regclass ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
Index: target.def
===
--- target.def	(revision 242040)
+++ target.def	(working copy)
@@ -5029,6 +5029,18 @@ DEFHOOK
  reg_class_t, (reg_class_t, machine_mode),
  NULL)
 
+/* Determine an additional allocno class.  */
+DEFHOOK
+(additional_allocno_class_p,
+ "This hook should return @code{true} if given class of registers should\
+  be an allocno class in any way.  Usually RA uses only one register\
+  class from all classes containing the same register set.  In some\
+  complicated cases, you need to have two or more such classes as\
+  allocno ones for RA correct work.  Not defining this hook is\
+  equivalent to returning @code{false} for all inputs.",
+ bool, (reg_class_t),
+ hook_bool_reg_class_t_false)
+
 DEFHOOK
 (cstore_mode,
  "This hook defines the machine mode to use for the boolean result of\
Index: ira.c
===
--- ira.c	(revision 242040)
+++ ira.c	(working copy)
@@ -1012,7 +1012,7 @@ setup_allocno_and_important_classes (voi
 temp_hard_regset2))
 	break;
 	}
-  if (j >= n)
+  if (j >= n || targetm.additional_allocno_class_p (i))
 	classes[n++] = (enum reg_class) i;
   else if (i == GENERAL_REGS)
 	/* Prefer general regs.  For i386 example, it means that
Index: doc/tm.texi.in
===
--- doc/tm.texi.in	(revision 242040)
+++ doc/tm.texi.in	(working copy)
@@ -2507,6 +2507,8 @@ value that the middle-end intended.
 
 @hook TARGET_SPILL_CLASS
 
+@hook TARGET_ADDITIONAL_ALLOCNO_CLASS_P
+
 @hook TARGET_CSTORE_MODE
 
 @hook TARGET_COMPUTE_PRESSURE_CLASSES
Index: doc/tm.texi
===
--- doc/tm.texi	(revision 242040)
+++ doc/tm.texi	(working copy)
@@ -2899,6 +2899,10 @@ addressing.
 This hook defines a class of registers which could be used for spilling  pseudos of the given mode and class, or @code{NO_REGS} if only memory  should be used.  Not defining this hook is equivalent to returning  @code{NO_REGS} for all inputs.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_ADDITIONAL_ALLOCNO_CLASS_P (reg_class_t)
+This hook should return @code{true} if given class of registers should  be an allocno class in any way.  Usually RA uses only one register  class from all classes containing the same register set.  In some  complicated cases, you need to have two or more such classes as  allocno ones for RA correct work.  Not defining this hook is  equivalent to returning @code{false} for all inputs.
+@end deftypefn
+
 @deftypefn {Target Hook} machine_mode TARGET_CSTORE_MODE (enum insn_code @var{icode})
 This hook defines the machine mode to use for the boolean result of  conditional store patterns.  The ICODE argument is the instruction code  for the cstore being performed.  Not definiting this hook is the same  as accepting the mode encoded into operand 0 of the cstore expander  patterns.
 @end deftypefn


[PATCH 1/2][AArch64] Add bfx attribute

2016-11-10 Thread Wilco Dijkstra
Currently the SBFM, UBFM and BFM instructions all use the attribute "bfm".
SBFM and UBFM include all shifts on AArch64, which are simpler than bitfield
insert.  Add a new bfx attribute for these instructions so that they can be
modelled more accurately in the future.  There is no difference in code
generation.
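
To illustrate why the immediate shifts fall under UBFM, here is a small sketch
of the architectural aliases for 64-bit LSL and LSR, together with a toy
evaluator for UBFM.  All names (bfm_operands, lsl64_as_ubfm, lsr64_as_ubfm,
ubfm64) are made up for this example and do not appear in the patch.

```c
#include <stdint.h>

/* Hypothetical helper type: the two immediates of a UBFM instruction.  */
struct bfm_operands
{
  unsigned immr;
  unsigned imms;
};

/* LSL Xd, Xn, #shift is an alias of
   UBFM Xd, Xn, #(-shift mod 64), #(63 - shift), for shift in 0..63.  */
struct bfm_operands
lsl64_as_ubfm (unsigned shift)
{
  struct bfm_operands op = { (64 - shift) & 63, 63 - shift };
  return op;
}

/* LSR Xd, Xn, #shift is an alias of UBFM Xd, Xn, #shift, #63.  */
struct bfm_operands
lsr64_as_ubfm (unsigned shift)
{
  struct bfm_operands op = { shift, 63 };
  return op;
}

/* Toy model of 64-bit UBFM.  If imms >= immr it extracts a field starting
   at bit immr; otherwise it moves the low imms + 1 bits up to bit
   64 - immr.  All other result bits are zero.  */
uint64_t
ubfm64 (uint64_t src, unsigned immr, unsigned imms)
{
  unsigned width = imms >= immr ? imms - immr + 1 : imms + 1;
  uint64_t mask = width == 64 ? ~UINT64_C (0) : (UINT64_C (1) << width) - 1;
  if (imms >= immr)
    return (src >> immr) & mask;
  return (src & mask) << (64 - immr);
}
```

A left shift by 4 maps to UBFM with immr 60 and imms 59, and evaluating that
with ubfm64 gives the same result as src << 4.  Neither alias reads the
previous value of the destination register, which is what distinguishes them
from the bitfield insert done by BFM.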

ChangeLog:
2016-11-10  Wilco Dijkstra  

* config/aarch64/aarch64.md (aarch64_ashl_sisd_or_int_3): Use bfx attribute.
(aarch64_lshr_sisd_or_int_3): Likewise.
(aarch64_ashr_sisd_or_int_3): Likewise.
(si3_insn_uxtw): Likewise.
(3_insn): Likewise.
(_ashl): Likewise.
(zero_extend_lshr): Likewise.
(extend_ashr): Likewise.
(): Likewise.
(insv): Likewise.
(andim_ashift_bfiz): Likewise.
* config/aarch64/thunderx.md (thunderx_shift): Add bfx.
* config/arm/cortex-a53.md (cortex_a53_alu_shift): Likewise.
* config/arm/cortex-a57.md (cortex_a57_alu): Add bfx.
* config/arm/exynos-m1.md (exynos_m1_alu): Add bfx.
(exynos_m1_alu_p): Likewise.
* config/arm/types.md: Add bfx.
* config/arm/xgene1.md (xgene1_bfm): Add bfx.

--
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 62eda569f9b642ac569a61718d7debf7eae1b59e..afd463602af4c3f19db8f8cc834aa8cf0b78867e 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3955,7 +3955,7 @@
shl\t%0, %1, %2
ushl\t%0, %1, %2"
   [(set_attr "simd" "no,no,yes,yes")
-   (set_attr "type" "bfm,shift_reg,neon_shift_imm, neon_shift_reg")]
+   (set_attr "type" "bfx,shift_reg,neon_shift_imm, neon_shift_reg")]
 )
 
 ;; Logical right shift using SISD or Integer instruction
@@ -3972,7 +3972,7 @@
#
#"
   [(set_attr "simd" "no,no,yes,yes,yes")
-   (set_attr "type" "bfm,shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")]
+   (set_attr "type" "bfx,shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")]
 )
 
 (define_split
@@ -4019,7 +4019,7 @@
#
#"
   [(set_attr "simd" "no,no,yes,yes,yes")
-   (set_attr "type" "bfm,shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")]
+   (set_attr "type" "bfx,shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")]
 )
 
 (define_split
@@ -4129,7 +4129,7 @@
   "@
\\t%w0, %w1, %2
\\t%w0, %w1, %w2"
-  [(set_attr "type" "bfm,shift_reg")]
+  [(set_attr "type" "bfx,shift_reg")]
 )
 
 (define_insn "*3_insn"
@@ -4141,7 +4141,7 @@
   operands[3] = GEN_INT ( - UINTVAL (operands[2]));
   return "\t%w0, %w1, %2, %3";
 }
-  [(set_attr "type" "bfm")]
+  [(set_attr "type" "bfx")]
 )
 
 (define_insn "*extr5_insn"
@@ -4234,7 +4234,7 @@
   operands[3] = GEN_INT ( - UINTVAL (operands[2]));
   return "bfiz\t%0, %1, %2, %3";
 }
-  [(set_attr "type" "bfm")]
+  [(set_attr "type" "bfx")]
 )
 
 (define_insn "*zero_extend_lshr"
@@ -4247,7 +4247,7 @@
   operands[3] = GEN_INT ( - UINTVAL (operands[2]));
   return "ubfx\t%0, %1, %2, %3";
 }
-  [(set_attr "type" "bfm")]
+  [(set_attr "type" "bfx")]
 )
 
 (define_insn "*extend_ashr"
@@ -4260,7 +4260,7 @@
   operands[3] = GEN_INT ( - UINTVAL (operands[2]));
   return "sbfx\\t%0, %1, %2, %3";
 }
-  [(set_attr "type" "bfm")]
+  [(set_attr "type" "bfx")]
 )
 
 ;; ---
@@ -4283,7 +4283,7 @@
 (match_operand 3 "const_int_operand" "n")))]
   ""
   "bfx\\t%0, %1, %3, %2"
-  [(set_attr "type" "bfm")]
+  [(set_attr "type" "bfx")]
 )
 
 ;; Bitfield Insert (insv)
@@ -4365,7 +4365,7 @@
  : GEN_INT ( - UINTVAL (operands[2]));
   return "bfiz\t%0, %1, %2, %3";
 }
-  [(set_attr "type" "bfm")]
+  [(set_attr "type" "bfx")]
 )
 
 ;; XXX We should match (any_extend (ashift)) here, like (and (ashift)) below
@@ -4377,7 +4377,7 @@
 (match_operand 3 "const_int_operand" "n")))]
   "aarch64_mask_and_shift_for_ubfiz_p (mode, operands[3], operands[2])"
   "ubfiz\\t%0, %1, %2, %P3"
-  [(set_attr "type" "bfm")]
+  [(set_attr "type" "bfx")]
 )
 
 (define_insn "bswap2"
diff --git a/gcc/config/aarch64/thunderx.md b/gcc/config/aarch64/thunderx.md
index 058713a2ad98a364d36a3faaf0e93c39cb89adbc..7c1c28b0498cfe0129e3f0de7e29e31536fe421a 100644
--- a/gcc/config/aarch64/thunderx.md
+++ b/gcc/config/aarch64/thunderx.md
@@ -39,7 +39,7 @@
 
 (define_insn_reservation "thunderx_shift" 1
   (and (eq_attr "tune" "thunderx")
-   (eq_attr "type" "bfm,extend,rotate_imm,shift_imm,shift_reg,rbit,rev"))
+   (eq_attr "type" "bfm,bfx,extend,rotate_imm,shift_imm,shift_reg,rbit,rev"))
   "thunderx_pipe0 | thunderx_pipe1")
 
 
diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
index 70c0f4daabe0ccb8e32808f1af51f5460e087a18..eb6d0b04976aaf441dd95cc43d02918226e75387 100644
--- a/gcc/config/arm/cortex-a53.md
+++ b/gcc/config/arm/cortex-a53.md
@@ -93,7 +93,7 @@
   (and (eq_attr "tune" "cortexa53")
(eq_attr "type" "alu_shift_imm,alus_shift_imm,
crc,logic_shift_imm,logics_

gomp-nvptx branch status

2016-11-10 Thread Alexander Monakov
Hello,

I'd like to provide an overview of the gomp-nvptx branch status. In response to
this message I'll send two more emails, with libgomp and middle-end changes on
the branch.  Some of the changes to libgomp such as build machinery adaptations
have already received substantial comments in 2015, but the middle-end stuff is
mostly unreviewed I believe.

Middle-end changes mostly amount to adding SIMD-to-SIMT transforms in omp-low.c,
as shown on the Cauldron.  SIMT outlining via gimplifier abuse is not there, and
neither is cloning of SIMD/SIMT loops.  Outlining is required for correctness,
and cloning is useful because it avoids intermixing SIMD+SIMT, letting us be
sure that SIMT lowering does not 'dirty' SIMD loops and regress host/MIC
vectorization.  I could argue that it's possible to improve my SIMT lowering to
avoid some dirtying (like moving loop-invariant calls to GOMP_SIMT_VF()), but
the need for outlining makes that moot anyway, I think.
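
For readers unfamiliar with the exchange primitives added by the middle-end
changes, the "butterfly" pattern of GOMP_SIMT_XCHG_BFLY (source lane index is
destination lane index XOR offset) can be simulated on the host.  The
following is only a sketch of how a sum reduction across lanes could use such
an exchange; the lane count TOY_VF and the function toy_bfly_sum are made up
for illustration and are not taken from the branch.

```c
#define TOY_VF 8  /* hypothetical lane count; must be a power of two  */

/* Sum reduction over TOY_VF simulated lanes.  After log2 (TOY_VF) rounds
   every lane holds the total, so no separate broadcast is needed.  */
void
toy_bfly_sum (int lanes[TOY_VF])
{
  for (unsigned offset = TOY_VF / 2; offset > 0; offset /= 2)
    {
      int incoming[TOY_VF];

      /* Exchange step: lane I reads the value held by lane I ^ OFFSET.  */
      for (unsigned i = 0; i < TOY_VF; i++)
        incoming[i] = lanes[i ^ offset];

      /* Combine step, done by all lanes in lockstep.  */
      for (unsigned i = 0; i < TOY_VF; i++)
        lanes[i] += incoming[i];
    }
}
```

Starting from lane values 1 through 8, every lane ends up holding 36 after
three rounds.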

To get great performance this will need further changes everywhere, including
in target-independent code, due to accidents like this bug (which I'd like to
ping given the topic): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68706 

With OpenMP/PTX offloading there are 5 additional failures in
check-target-libgomp:

Two due to tests using 'usleep' in a target region:
FAIL: libgomp.c/target-32.c (test for excess errors)
FAIL: libgomp.c/thread-limit-2.c (test for excess errors)

Two with 'target nowait' (not implemented):
FAIL: libgomp.c/target-33.c execution test
FAIL: libgomp.c/target-34.c execution test

One with 'target link' (not implemented):
FAIL: libgomp.c/target-link-1.c (test for excess errors)

Eventually these can be fixed by implementing the two missing OpenMP 4.5
features; for the 'usleep' issues, while I think it's not good to have tests
with that, eventually I'd like to provide a port of musl libc for PTX which
would also provide usleep (either a no-op stub, or based on a busy loop).

Short term, it should be possible to implement something like -foffload=^nvptx
to skip PTX (and only PTX) offloading on those tests.

Thanks.
Alexander


[PATCH][AArch64] Tweak Cortex-A57 vector cost

2016-11-10 Thread Wilco Dijkstra
The existing vector costs stop some beneficial vectorization.  This is mostly
due to the vector statement cost being set to 3, as well as vector loads having
a higher cost than scalar loads.  This means that even when we vectorize 4x, it
is possible that the cost of a vectorized loop is similar to the scalar version,
and we fail to vectorize.  For example, for a particular loop the costs for
-mcpu=generic are:

note: Cost model analysis: 
  Vector inside of loop cost: 146
  Vector prologue cost: 5
  Vector epilogue cost: 0
  Scalar iteration cost: 50
  Scalar outside cost: 0
  Vector outside cost: 5
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 1
note:   Runtime profitability threshold = 3
note:   Static estimate profitability threshold = 3
note: loop vectorized


While -mcpu=cortex-a57 reports:

note: Cost model analysis: 
  Vector inside of loop cost: 294
  Vector prologue cost: 15
  Vector epilogue cost: 0
  Scalar iteration cost: 74
  Scalar outside cost: 0
  Vector outside cost: 15
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 31
note:   Runtime profitability threshold = 30
note:   Static estimate profitability threshold = 30
note: not vectorized: vectorization not profitable.
note: not vectorized: iteration count smaller than user specified loop bound parameter or minimum profitable iterations (whichever is more conservative).


Using a cost of 3 for a vector operation suggests that vector operations are
3 times as expensive as scalar operations.  Since most vector operations have
a throughput similar to that of scalar operations, this is not correct.

Using slightly lower values for these heuristics now allows this loop
and many others to be vectorized.  On a proprietary benchmark the gain
from vectorizing this loop is around 15-30% which shows vectorizing it is
indeed beneficial.
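
The "Calculated minimum iters" figures in the two dumps above can be
reproduced with a small calculation.  The following is only a sketch,
assuming a vectorization factor of 4 (the 4x mentioned above) and no peeled
iterations; the function and parameter names are made up for illustration and
are not the vectorizer's actual interface.

```c
/* Return the smallest iteration count at which the vector loop is estimated
   to be strictly cheaper than the scalar loop, or -1 if it never is.
   Simplified: peeled prologue and epilogue iterations are assumed to be 0.  */
long
toy_min_profitable_iters (long vec_inside_cost, long vec_outside_cost,
                          long scalar_iter_cost, long scalar_outside_cost,
                          long vf)
{
  /* Cost saved by one vector iteration compared with VF scalar ones.  */
  long gain = scalar_iter_cost * vf - vec_inside_cost;
  if (gain <= 0)
    return -1;

  /* Extra one-off cost of the vector version, scaled like the gain.  */
  long overhead = (vec_outside_cost - scalar_outside_cost) * vf;
  long iters = overhead / gain;

  /* Require a strict win: bump the count while the costs are still tied.  */
  if (scalar_iter_cost * vf * iters <= vec_inside_cost * iters + overhead)
    iters++;
  return iters;
}
```

With the -mcpu=generic numbers (146, 5, 50, 0) this gives 1, and with the
-mcpu=cortex-a57 numbers (294, 15, 74, 0) it gives 31, matching the dumps.
In the Cortex-A57 case four scalar iterations cost 296 against 294 for one
vector iteration, so a saving of only 2 has to pay off an overhead of 60.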

ChangeLog:
2016-11-10  Wilco Dijkstra  

* config/aarch64/aarch64.c (cortexa57_vector_cost):
Change vec_stmt_cost, vec_align_load_cost and vec_unalign_load_cost.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 279a6dfaa4a9c306bc7a8dba9f4f53704f61fefe..cff2e8fc6e9309e6aa4f68a5aba3bfac3b737283 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -382,12 +382,12 @@ static const struct cpu_vector_cost cortexa57_vector_cost =
   1, /* scalar_stmt_cost  */
   4, /* scalar_load_cost  */
   1, /* scalar_store_cost  */
-  3, /* vec_stmt_cost  */
+  2, /* vec_stmt_cost  */
   3, /* vec_permute_cost  */
   8, /* vec_to_scalar_cost  */
   8, /* scalar_to_vec_cost  */
-  5, /* vec_align_load_cost  */
-  5, /* vec_unalign_load_cost  */
+  4, /* vec_align_load_cost  */
+  4, /* vec_unalign_load_cost  */
   1, /* vec_unalign_store_cost  */
   1, /* vec_store_cost  */
   1, /* cond_taken_branch_cost  */


[committed] Bump fortran _OPENMP macro and openmp_version

2016-11-10 Thread Jakub Jelinek
Hi!

I think it is better to announce 4.5 than 4.0 at the current state.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2016-11-10  Jakub Jelinek  

gcc/fortran/
* cpp.c (cpp_define_builtins): Define _OPENMP to 201511 instead
of 201307.
* gfortran.texi: Mention partial OpenMP 4.5 support.
* intrinsic.texi: Update for OpenMP 4.5.
gcc/testsuite/
* gfortran.dg/openmp-define-3.f90: Expect 201511 instead of
201307.
libgomp/
* omp_lib.f90.in (openmp_version): Change to 201511 from 201307.
* omp_lib.h.in (openmp_version): Likewise.
* testsuite/libgomp.fortran/openmp_version-1.f: Expect 201511 instead
of 201307.
* testsuite/libgomp.fortran/openmp_version-2.f90: Likewise.

--- gcc/fortran/cpp.c.jj2016-01-04 14:55:59.0 +0100
+++ gcc/fortran/cpp.c   2016-11-10 12:59:33.829477276 +0100
@@ -168,7 +168,7 @@ cpp_define_builtins (cpp_reader *pfile)
 cpp_define (pfile, "_OPENACC=201306");
 
   if (flag_openmp)
-cpp_define (pfile, "_OPENMP=201307");
+cpp_define (pfile, "_OPENMP=201511");
 
   /* The defines below are necessary for the TARGET_* macros.
 
--- gcc/fortran/gfortran.texi.jj2016-11-03 22:03:29.0 +0100
+++ gcc/fortran/gfortran.texi   2016-11-10 13:02:11.380466602 +0100
@@ -536,7 +536,8 @@ The current status of the support is can
 and @ref{TS 18508 status} sections of the documentation.
 
 Additionally, the GNU Fortran compilers supports the OpenMP specification
-(version 4.0, @url{http://openmp.org/@/wp/@/openmp-specifications/}).
+(version 4.0 and most of the features of the 4.5 version,
+@url{http://openmp.org/@/wp/@/openmp-specifications/}).
 There also is initial support for the OpenACC specification (targeting
 version 2.0, @uref{http://www.openacc.org/}).
 Note that this is an experimental feature, incomplete, and subject to
@@ -1999,7 +2000,7 @@ and environment variables that influence
 
 GNU Fortran strives to be compatible to the 
 @uref{http://openmp.org/wp/openmp-specifications/,
-OpenMP Application Program Interface v4.0}.
+OpenMP Application Program Interface v4.5}.
 
 To enable the processing of the OpenMP directive @code{!$omp} in
 free-form source code; the @code{c$omp}, @code{*$omp} and @code{!$omp}
--- gcc/fortran/intrinsic.texi.jj   2016-10-31 13:28:11.0 +0100
+++ gcc/fortran/intrinsic.texi  2016-11-10 13:02:57.463880355 +0100
@@ -14769,7 +14769,7 @@ with the following options: @code{-fno-u
 @section OpenMP Modules @code{OMP_LIB} and @code{OMP_LIB_KINDS}
 @table @asis
 @item @emph{Standard}:
-OpenMP Application Program Interface v4.0
+OpenMP Application Program Interface v4.5
 @end table
 
 
@@ -14783,8 +14783,8 @@ the named constants defined in the modul
 below.
 
 For details refer to the actual
-@uref{http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf,
-OpenMP Application Program Interface v4.0}.
+@uref{http://www.openmp.org/wp-content/uploads/openmp-4.5.pdf,
+OpenMP Application Program Interface v4.5}.
 
 @code{OMP_LIB_KINDS} provides the following scalar default-integer
 named constants:
@@ -14799,7 +14799,7 @@ named constants:
 @code{OMP_LIB} provides the scalar default-integer
 named constant @code{openmp_version} with a value of the form
 @var{mm}, where @code{} is the year and @var{mm} the month
-of the OpenMP version; for OpenMP v4.0 the value is @code{201307}.
+of the OpenMP version; for OpenMP v4.5 the value is @code{201511}.
 
 The following scalar integer named constants of the
 kind @code{omp_sched_kind}:
--- gcc/testsuite/gfortran.dg/openmp-define-3.f90.jj2014-06-18 09:11:57.0 +0200
+++ gcc/testsuite/gfortran.dg/openmp-define-3.f90   2016-11-10 13:04:12.381927290 +0100
@@ -6,6 +6,6 @@
 # error _OPENMP not defined
 #endif
 
-#if _OPENMP != 201307
+#if _OPENMP != 201511
 # error _OPENMP defined to wrong value
 #endif
--- libgomp/omp_lib.f90.in.jj   2016-01-04 14:38:59.0 +0100
+++ libgomp/omp_lib.f90.in  2016-11-10 13:04:30.704694198 +0100
@@ -59,7 +59,7 @@
   module omp_lib
 use omp_lib_kinds
 implicit none
-integer, parameter :: openmp_version = 201307
+integer, parameter :: openmp_version = 201511
 
 interface
   subroutine omp_init_lock (svar)
--- libgomp/omp_lib.h.in.jj 2016-01-04 14:38:59.0 +0100
+++ libgomp/omp_lib.h.in2016-11-10 13:04:43.363533159 +0100
@@ -58,7 +58,7 @@
   parameter (omp_lock_hint_contended = 2)
   parameter (omp_lock_hint_nonspeculative = 4)
   parameter (omp_lock_hint_speculative = 8)
-  parameter (openmp_version = 201307)
+  parameter (openmp_version = 201511)
 
   external omp_init_lock, omp_init_nest_lock
   external omp_init_lock_with_hint
--- libgomp/testsuite/libgomp.fortran/openmp_version-1.f.jj 2014-06-18 09:11:57.0 +0200
+++ libgomp/testsuite/libgomp.fortran/openmp_version-1.f2016-11-10 13:05:04.255267386 +0100
@@ -4,6

[PATCH] Fix ICE in default_use_anchors_for_symbol_p (PR middle-end/78201)

2016-11-10 Thread Jakub Jelinek
Hi!

On arm/aarch64 we ICE because some decls that make it here have a non-NULL
DECL_SIZE which is a VAR_DECL rather than a CONST_INT (a DECL_SIZE that
doesn't fit into shwi would ICE similarly).  It is arguably a FE bug
that such a decl is created for VLA initialization from a STRING_CST, and
I believe we already have some PRs open about it.
I think it won't hurt to check for the large sizes properly even in this
function though, and punt on unexpected cases, or even extremely large ones.
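
The intended order of checks can be sketched outside of GCC as follows.  This
is only an illustration with a toy structure standing in for the decl's size;
toy_size and toy_use_anchor_p are made-up names, and the real function works
on trees via DECL_SIZE_UNIT as described in the ChangeLog below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a declaration's size; not GCC's tree representation.  */
struct toy_size
{
  bool known;          /* false models a missing (NULL) size  */
  bool constant;       /* false models a variable size, e.g. a VLA  */
  bool fits_unsigned;  /* false models a constant too wide for the host  */
  uint64_t bytes;      /* meaningful only if all three flags are true  */
};

/* Return true if a section anchor may be used.  Every unexpected case
   punts by returning false instead of asserting.  */
bool
toy_use_anchor_p (const struct toy_size *size, uint64_t max_anchor_offset)
{
  if (!size->known || !size->constant || !size->fits_unsigned)
    return false;
  /* Sizes equal to or larger than the anchor range are rejected too.  */
  return size->bytes < max_anchor_offset;
}
```

The point of the ordering is that the size is only read after it is known to
be a constant that fits, so a variable-sized decl is rejected rather than
reaching a conversion that would abort.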

Bootstrapped/regtested on x86_64-linux and i686-linux, Yvan said he is going
to test on arm and/or aarch64.  Ok for trunk if the testing there passes
too?

2016-11-10  Jakub Jelinek  

PR middle-end/78201
* varasm.c (default_use_anchors_for_symbol_p): Fix a comment typo.
Don't test decl != NULL.  Don't look at DECL_SIZE, but DECL_SIZE_UNIT
instead, return false if it is NULL, or doesn't fit into uhwi, or
is larger or equal to targetm.max_anchor_offset.

* g++.dg/opt/pr78201.C: New test.

--- gcc/varasm.c.jj 2016-10-31 13:28:12.0 +0100
+++ gcc/varasm.c2016-11-10 15:18:41.282886244 +0100
@@ -6804,11 +6804,12 @@ default_use_anchors_for_symbol_p (const_
return false;
 
   /* Don't use section anchors for decls that won't fit inside a single
-anchor range to reduce the amount of instructions require to refer
+anchor range to reduce the amount of instructions required to refer
 to the entire declaration.  */
-  if (decl && DECL_SIZE (decl)
-&& tree_to_shwi (DECL_SIZE (decl))
-   >= (targetm.max_anchor_offset * BITS_PER_UNIT))
+  if (DECL_SIZE_UNIT (decl) == NULL_TREE
+ || !tree_fits_uhwi_p (DECL_SIZE_UNIT (decl))
+ || (tree_to_uhwi (DECL_SIZE_UNIT (decl))
+ >= (unsigned HOST_WIDE_INT) targetm.max_anchor_offset))
return false;
 
 }
--- gcc/testsuite/g++.dg/opt/pr78201.C.jj   2016-11-10 15:20:18.398660681 +0100
+++ gcc/testsuite/g++.dg/opt/pr78201.C  2016-11-10 15:19:58.0 +0100
@@ -0,0 +1,13 @@
+// PR middle-end/78201
+// { dg-do compile }
+// { dg-options "-O2" }
+
+struct B { long d (); } *c;
+long e;
+
+void
+foo ()
+{
+  char a[e] = "";
+  c && c->d();
+}

Jakub


Re: [C++ PATCH] Only warn with -Wc++1z-compat about mangling of visible symbols

2016-11-10 Thread Jason Merrill
OK.

On Thu, Nov 10, 2016 at 8:52 AM, Jakub Jelinek  wrote:
> Hi!
>
> It seems -Wabi/-Wc++1z-compat warns about mangling changes even for symbols
> that are not visible outside of their TU (so likely only inline asm or tools
> looking at .symtab STB_LOCAL symbols would notice).  Perhaps that is fine
> for -Wabi that isn't enabled in -Wall/-Wextra, but -Wc++1z-compat is, and
> I think most of the users don't really care about mangling of those symbols.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-11-10  Jakub Jelinek  
>
> * mangle.c (mangle_decl): Only emit -Wc++1z-compat warnings for
> public or external symbols.
>
> * g++.dg/cpp1z/noexcept-type14.C: New test.
> * g++.dg/asan/asan_test.C: Remove -Wno-c++1z-compat from dg-options.
>
> --- gcc/cp/mangle.c.jj  2016-11-09 23:55:59.0 +0100
> +++ gcc/cp/mangle.c 2016-11-10 10:14:26.914059686 +0100
> @@ -3836,7 +3836,8 @@ mangle_decl (const tree decl)
>  }
>SET_DECL_ASSEMBLER_NAME (decl, id);
>
> -  if (G.need_cxx1z_warning)
> +  if (G.need_cxx1z_warning
> +  && (TREE_PUBLIC (decl) || DECL_REALLY_EXTERN (decl)))
>  warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wc__1z_compat,
> "mangled name for %qD will change in C++17 because the "
> "exception specification is part of a function type",
> --- gcc/testsuite/g++.dg/cpp1z/noexcept-type14.C.jj 2016-11-10 10:14:42.898857481 +0100
> +++ gcc/testsuite/g++.dg/cpp1z/noexcept-type14.C    2016-11-10 10:22:46.412741092 +0100
> @@ -0,0 +1,26 @@
> +// { dg-do compile { target c++11 } }
> +// { dg-options "-Wall" }
> +
> +#define A asm volatile ("" : : : "memory")
> +void foo () throw () {}
> +extern void f1 (decltype (foo) *); // { dg-bogus "mangled name" }
> +void f2 (decltype (foo) *);// { dg-bogus "mangled name" }
> +extern void f3 (decltype (foo) *); // { dg-warning "mangled name" "" { target c++14_down } }
> +void f4 (decltype (foo) *);// { dg-warning "mangled name" "" { target c++14_down } }
> +void f5 (decltype (foo) *) { A; }  // { dg-warning "mangled name" "" { target c++14_down } }
> +static void f6 (decltype (foo) *) { A; }// { dg-bogus "mangled name" }
> +namespace N {
> +void f7 (decltype (foo) *) { A; }  // { dg-warning "mangled name" "" { target c++14_down } }
> +}
> +namespace {
> +void f8 (decltype (foo) *) { A; }  // { dg-bogus "mangled name" }
> +}
> +void bar ()
> +{
> +  f3 (foo);
> +  f4 (foo);
> +  f5 (foo);
> +  f6 (foo);
> +  N::f7 (foo);
> +  f8 (foo);
> +}
> --- gcc/testsuite/g++.dg/asan/asan_test.C.jj    2016-11-10 00:00:00.0 +0100
> +++ gcc/testsuite/g++.dg/asan/asan_test.C   2016-11-10 13:28:55.348073019 +0100
> @@ -2,7 +2,7 @@
>  // { dg-skip-if "" { *-*-* } { "*" } { "-O2" } }
>  // { dg-skip-if "" { *-*-* } { "-flto" } { "" } }
>  // { dg-additional-sources "asan_globals_test-wrapper.cc" }
> -// { dg-options "-std=c++11 -fsanitize=address -fno-builtin -Wall -Wno-c++1z-compat -Werror -g -DASAN_UAR=0 -DASAN_HAS_EXCEPTIONS=1 -DASAN_HAS_BLACKLIST=0 -DSANITIZER_USE_DEJAGNU_GTEST=1 -lasan -lpthread -ldl" }
> +// { dg-options "-std=c++11 -fsanitize=address -fno-builtin -Wall -Werror -g -DASAN_UAR=0 -DASAN_HAS_EXCEPTIONS=1 -DASAN_HAS_BLACKLIST=0 -DSANITIZER_USE_DEJAGNU_GTEST=1 -lasan -lpthread -ldl" }
>  // { dg-additional-options "-DASAN_NEEDS_SEGV=1" { target { ! arm*-*-* } } }
>  // { dg-additional-options "-DASAN_LOW_MEMORY=1 -DASAN_NEEDS_SEGV=0" { target arm*-*-* } }
>  // { dg-additional-options "-DASAN_AVOID_EXPENSIVE_TESTS=1" { target { ! run_expensive_tests } } }
>
> Jakub


[C++ PATCH] Only warn with -Wc++1z-compat about mangling of visible symbols

2016-11-10 Thread Jakub Jelinek
Hi!

It seems -Wabi/-Wc++1z-compat warns about mangling changes even for symbols
that are not visible outside of its TU (so likely only inline asm or tools
looking at .symtab STB_LOCAL symbols would notice).  Perhaps that is fine
for -Wabi that isn't enabled in -Wall/-Wextra, but -Wc++1z-compat is, and
I think most of the users don't really care about mangling of those symbols.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-10  Jakub Jelinek  

* mangle.c (mangle_decl): Only emit -Wc++1z-compat warnings for
public or external symbols.

* g++.dg/cpp1z/noexcept-type14.C: New test.
* g++.dg/asan/asan_test.C: Remove -Wno-c++1z-compat from dg-options.

--- gcc/cp/mangle.c.jj  2016-11-09 23:55:59.0 +0100
+++ gcc/cp/mangle.c 2016-11-10 10:14:26.914059686 +0100
@@ -3836,7 +3836,8 @@ mangle_decl (const tree decl)
 }
   SET_DECL_ASSEMBLER_NAME (decl, id);
 
-  if (G.need_cxx1z_warning)
+  if (G.need_cxx1z_warning
+  && (TREE_PUBLIC (decl) || DECL_REALLY_EXTERN (decl)))
 warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wc__1z_compat,
"mangled name for %qD will change in C++17 because the "
"exception specification is part of a function type",
--- gcc/testsuite/g++.dg/cpp1z/noexcept-type14.C.jj 2016-11-10 10:14:42.898857481 +0100
+++ gcc/testsuite/g++.dg/cpp1z/noexcept-type14.C    2016-11-10 10:22:46.412741092 +0100
@@ -0,0 +1,26 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wall" }
+
+#define A asm volatile ("" : : : "memory")
+void foo () throw () {}
+extern void f1 (decltype (foo) *); // { dg-bogus "mangled name" }
+void f2 (decltype (foo) *);// { dg-bogus "mangled name" }
+extern void f3 (decltype (foo) *); // { dg-warning "mangled name" "" { target c++14_down } }
+void f4 (decltype (foo) *);// { dg-warning "mangled name" "" { target c++14_down } }
+void f5 (decltype (foo) *) { A; }  // { dg-warning "mangled name" "" { target c++14_down } }
+static void f6 (decltype (foo) *) { A; }// { dg-bogus "mangled name" }
+namespace N {
+void f7 (decltype (foo) *) { A; }  // { dg-warning "mangled name" "" { target c++14_down } }
+}
+namespace {
+void f8 (decltype (foo) *) { A; }  // { dg-bogus "mangled name" }
+}
+void bar ()
+{
+  f3 (foo);
+  f4 (foo);
+  f5 (foo);
+  f6 (foo);
+  N::f7 (foo);
+  f8 (foo);
+}
--- gcc/testsuite/g++.dg/asan/asan_test.C.jj    2016-11-10 00:00:00.0 +0100
+++ gcc/testsuite/g++.dg/asan/asan_test.C   2016-11-10 13:28:55.348073019 +0100
@@ -2,7 +2,7 @@
 // { dg-skip-if "" { *-*-* } { "*" } { "-O2" } }
 // { dg-skip-if "" { *-*-* } { "-flto" } { "" } }
 // { dg-additional-sources "asan_globals_test-wrapper.cc" }
-// { dg-options "-std=c++11 -fsanitize=address -fno-builtin -Wall -Wno-c++1z-compat -Werror -g -DASAN_UAR=0 -DASAN_HAS_EXCEPTIONS=1 -DASAN_HAS_BLACKLIST=0 -DSANITIZER_USE_DEJAGNU_GTEST=1 -lasan -lpthread -ldl" }
+// { dg-options "-std=c++11 -fsanitize=address -fno-builtin -Wall -Werror -g -DASAN_UAR=0 -DASAN_HAS_EXCEPTIONS=1 -DASAN_HAS_BLACKLIST=0 -DSANITIZER_USE_DEJAGNU_GTEST=1 -lasan -lpthread -ldl" }
 // { dg-additional-options "-DASAN_NEEDS_SEGV=1" { target { ! arm*-*-* } } }
 // { dg-additional-options "-DASAN_LOW_MEMORY=1 -DASAN_NEEDS_SEGV=0" { target arm*-*-* } }
 // { dg-additional-options "-DASAN_AVOID_EXPENSIVE_TESTS=1" { target { ! run_expensive_tests } } }

Jakub
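
As background for why the mangled names change at all: from C++17 on, the exception specification is part of the function type, so a function declared noexcept and one that is not have different types. A small sketch follows; the function names are made up for illustration and do not come from the patch.

```cpp
#include <type_traits>

// Two otherwise identical declarations; only the exception
// specification differs.  They are only used inside decltype, so no
// definitions are needed.
void may_throw ();
void never_throws () noexcept;

// True when the two declarations above have different function types.
constexpr bool
noexcept_is_part_of_type ()
{
  return !std::is_same<decltype (may_throw), decltype (never_throws)>::value;
}

// True when the compiler reports the C++17 feature through its
// feature-test macro.
constexpr bool
language_has_noexcept_types ()
{
#ifdef __cpp_noexcept_function_type
  return true;
#else
  return false;
#endif
}
```

Before C++17 both declarations have the same type, so a function taking `decltype (never_throws) *` mangles the same as one taking `decltype (may_throw) *`; in C++17 mode it no longer does, which is the change the warning reports.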


Re: [PATCH, LIBGCC] Avoid count_leading_zeros with undefined result (PR 78067)

2016-11-10 Thread Joseph Myers
On Thu, 10 Nov 2016, James Greenhalgh wrote:

> 
> On Wed, Nov 09, 2016 at 10:16:35PM +, Joseph Myers wrote:
> > On Wed, 9 Nov 2016, Bernd Edlinger wrote:
> >
> > > Yes, but maybe introduce a test if the half-wide value fits?
> > >
> > > like:
> > >
> > > #define M_OK2(M, T) ((M) > sizeof(T) * CHAR_BIT / 2 - 1)
> >
> > Something like that.
> 
> In patch form, that would look like this...
> 
> I've checked on my ARM and AArch64 trees with _Float16 support that this
> lets the tests pass.
> 
> OK?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-10 Thread Kyrill Tkachov

On 10/11/16 16:26, Segher Boessenkool wrote:
> Hi!

Hi,

> Great to see this.  Just a few comments...
>
> On Thu, Nov 10, 2016 at 02:25:47PM +, Kyrill Tkachov wrote:
>> +/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
>> +
>> +static sbitmap
>> +aarch64_get_separate_components (void)
>> +{
>> +  /* Calls to alloca further extend the stack frame and it can be messy to
>> + figure out the location of the stack slots for each register.
>> + For now be conservative.  */
>> +  if (cfun->calls_alloca)
>> +return NULL;
>
> The generic code already disallows functions with alloca (in
> try_shrink_wrapping_separate).

Ok, I'll remove this.

>> +static void
>> +aarch64_emit_prologue_components (sbitmap components)
>> +{
>> +  rtx ptr_reg = gen_rtx_REG (Pmode, frame_pointer_needed
>> +? HARD_FRAME_POINTER_REGNUM
>> +: STACK_POINTER_REGNUM);
>> +
>> +  for (unsigned regno = R0_REGNUM; regno <= V31_REGNUM; regno++)
>> +if (bitmap_bit_p (components, regno))
>> +  {
>> +   rtx reg = gen_rtx_REG (Pmode, regno);
>> +   HOST_WIDE_INT offset = cfun->machine->frame.reg_offset[regno];
>> +   if (!frame_pointer_needed)
>> +   offset += cfun->machine->frame.frame_size
>> + - cfun->machine->frame.hard_fp_offset;
>> +   rtx addr = plus_constant (Pmode, ptr_reg, offset);
>> +   rtx mem = gen_frame_mem (Pmode, addr);
>> +
>> +   RTX_FRAME_RELATED_P (emit_move_insn (mem, reg)) = 1;
>> +  }
>> +}
>
> I think you should emit the CFI notes here directly, just like for the
> epilogue components.

The prologue code in expand_prologue doesn't attach any explicit notes,
so I didn't want to deviate from that. Looking at the powerpc implementation,
would that be a REG_CFA_OFFSET with the (SET (mem) (reg)) expression for saving
the reg?

Thanks,
Kyrill

> Segher




Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-10 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
> Hi,
> 
> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
> 
> It requires additional patch for register allocator from Vladimir
> Makarov to be committed before.

Your MUA ate tabs (and in the ChangeLog you're using spaces instead of
tabs), can you repost as attachment or configure your MUA not to do this?

Just a couple of random nits follow:

> * gcc.target/i386/sse-12.c: Add -mavx5124fmaddps.

This mentions an option that doesn't exist, is that s/dd// ?

> * gcc.target/i386/sse-13.c: Ditto.

> @@ -399,6 +403,13 @@ ix86_handle_option (struct gcc_options *opts,
>   {
>opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512F_UNSET;
>opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_UNSET;
> +
> +  //turn off additional isa flags

Comments start with capital letter, end with ., there should be space
between // and T, better use /* ... */ style comment to match other
comments in the file.

> +  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
> +  opts->x_ix86_isa_flags2_explicit |=
> OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
> +  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
> +  opts->x_ix86_isa_flags2_explicit |=
> OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
> +
>   }

The formatting looks very weird.

Jakub


[PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-10 Thread Andrew Senkevich
Hi,

this patch enables the AVX512_4FMAPS and AVX512_4VNNIW instructions.

It requires an additional register allocator patch from Vladimir
Makarov to be committed first.

gcc/
* common/config/i386/i386-common.c
(OPTION_MASK_ISA_AVX5124FMAPS_SET,
OPTION_MASK_ISA_AVX5124FMAPS_UNSET,
OPTION_MASK_ISA_AVX5124VNNIW_SET,
OPTION_MASK_ISA_AVX5124VNNIW_UNSET): New.
(ix86_handle_option): Handle OPT_mavx5124fmaps,
OPT_mavx5124vnniw.
* config.gcc: Add avx5124fmapsintrin.h, avx5124vnniwintrin.h.
* config/i386/avx5124fmapsintrin.h: New file.
* config/i386/avx5124vnniwintrin.h: Ditto.
* config/i386/constraints.md (h): New constraint.
* config/i386/cpuid.h: (bit_AVX5124VNNIW,
bit_AVX5124FMAPS): New.
* config/i386/driver-i386.c (host_detect_local_cpu):
Detect avx5124fmaps, avx5124vnniw.
* config/i386/i386-builtin-types.def: Add types
V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI,
V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF,
V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF,
V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI,
V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI,
V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI.
* config/i386/i386-builtin.def (__builtin_ia32_4fmaddps_mask,
__builtin_ia32_4fmaddps, __builtin_ia32_4fmaddss,
__builtin_ia32_4fmaddss_mask, __builtin_ia32_4fnmaddps_mask,
__builtin_ia32_4fnmaddps, __builtin_ia32_4fnmaddss,
__builtin_ia32_4fnmaddss_mask, __builtin_ia32_vp4dpwssd,
__builtin_ia32_vp4dpwssd_mask, __builtin_ia32_vp4dpwssds,
__builtin_ia32_vp4dpwssds_mask): New.
* config/i386/i386-c.c (ix86_target_macros_internal):
Define __AVX5124FMAPS__, __AVX5124VNNIW__.
* config/i386/i386-modes.def (VECTOR_MODES (FLOAT, 256),
VECTOR_MODE (INT, SI, 64)): New modes.
* config/i386/i386.c (ix86_target_string): Add -mavx5124fmaps,
-mavx5124vnniw.
(PTA_AVX5124FMAPS, PTA_AVX5124VNNIW): Define.
(ix86_option_override_internal): Handle new options.
(ix86_valid_target_attribute_inner_p): Add avx5124fmaps,
avx5124vnniw.
(ix86_expand_builtin): Handle new builtins.
(ix86_additional_allocno_class_p): New.
* config/i386/i386.h (TARGET_AVX5124FMAPS,
TARGET_AVX5124FMAPS_P,
TARGET_AVX5124VNNIW,
TARGET_AVX5124VNNIW_P): Define.
(reg_class): Add MOD4_SSE_REGS.
(MOD4_SSE_REG_P, MOD4_SSE_REGNO_P): New.
* config/i386/i386.opt: Add mavx5124fmaps, mavx5124vnniw.
* config/i386/immintrin.h: Include avx5124fmapsintrin.h,
avx5124vnniwintrin.h.
* config/i386/sse.md (unspec): Add UNSPEC_VP4FMADD,
UNSPEC_VP4FNMADD,
UNSPEC_VP4DPWSSD, UNSPEC_VP4DPWSSDS.
(define_mode_iterator IMOD4): New.
(define_mode_attr imod4_narrow): Ditto.
(define_insn "mov"): Ditto.
(define_insn "avx5124fmaddps_4fmaddps"): Ditto.
(define_insn "avx5124fmaddps_4fmaddps_mask"): Ditto.
(define_insn "avx5124fmaddps_4fmaddps_maskz"): Ditto.
(define_insn "avx5124fmaddps_4fmaddss"): Ditto.
(define_insn "avx5124fmaddps_4fmaddss_mask"): Ditto.
(define_insn "avx5124fmaddps_4fmaddss_maskz"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddps"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddps_mask"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddps_maskz"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddss"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddss_mask"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddss_maskz"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssd"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssd_mask"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssd_maskz"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssds"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssds_mask"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssds_maskz"): Ditto.
* init-regs.c (initialize_uninitialized_regs): Add emit_clobber call.
* genmodes.c (mode_size_inline): Extend return type.
* machmode.h (mode_size, mode_base_align): Extend type.
gcc/testsuite/
* gcc.target/i386/avx5124fmadd-v4fmaddps-1.c: New test.
* gcc.target/i386/avx5124fmadd-v4fmaddps-2.c: Ditto.
* gcc.target/i386/avx5124fmadd-v4fmaddss-1.c: Ditto.
* gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c: Ditto.
* gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c: Ditto.
* gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c: Ditto.
* gcc.target/i386/avx5124fmaps-check.h: Ditto.
* gcc.target/i386/avx5124vnniw-check.h: Ditto.
* gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c: Ditto.
* gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c: Ditto.
* gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c: Ditto.
   

Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-10 Thread Segher Boessenkool
Hi!

Great to see this.  Just a few comments...

On Thu, Nov 10, 2016 at 02:25:47PM +, Kyrill Tkachov wrote:
> +/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
> +
> +static sbitmap
> +aarch64_get_separate_components (void)
> +{
> +  /* Calls to alloca further extend the stack frame and it can be messy to
> + figure out the location of the stack slots for each register.
> + For now be conservative.  */
> +  if (cfun->calls_alloca)
> +return NULL;

The generic code already disallows functions with alloca (in
try_shrink_wrapping_separate).

> +static void
> +aarch64_emit_prologue_components (sbitmap components)
> +{
> +  rtx ptr_reg = gen_rtx_REG (Pmode, frame_pointer_needed
> +  ? HARD_FRAME_POINTER_REGNUM
> +  : STACK_POINTER_REGNUM);
> +
> +  for (unsigned regno = R0_REGNUM; regno <= V31_REGNUM; regno++)
> +if (bitmap_bit_p (components, regno))
> +  {
> + rtx reg = gen_rtx_REG (Pmode, regno);
> + HOST_WIDE_INT offset = cfun->machine->frame.reg_offset[regno];
> + if (!frame_pointer_needed)
> + offset += cfun->machine->frame.frame_size
> +   - cfun->machine->frame.hard_fp_offset;
> + rtx addr = plus_constant (Pmode, ptr_reg, offset);
> + rtx mem = gen_frame_mem (Pmode, addr);
> +
> + RTX_FRAME_RELATED_P (emit_move_insn (mem, reg)) = 1;
> +  }
> +}

I think you should emit the CFI notes here directly, just like for the
epilogue components.


Segher
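
The slot-address arithmetic in the quoted aarch64_emit_prologue_components can be sketched on its own. This is a simplified illustration under stated assumptions, not the aarch64 back end: `frame_layout` and `save_slot_offset` are hypothetical names, and the only thing carried over from the patch is the rule that the register's offset is used as-is relative to the frame pointer and is biased by frame_size - hard_fp_offset when addressing from the stack pointer.

```cpp
#include <cstdint>

// Hypothetical, reduced view of the frame description used in the patch.
struct frame_layout
{
  std::int64_t frame_size;      // total size of the frame in bytes
  std::int64_t hard_fp_offset;  // distance from the frame pointer to the frame top
};

// Offset of a register's save slot from the base register the prologue
// component uses: the hard frame pointer when one is needed, otherwise
// the stack pointer.
constexpr std::int64_t
save_slot_offset (const frame_layout &frame, std::int64_t reg_offset,
                  bool frame_pointer_needed)
{
  return frame_pointer_needed
           ? reg_offset
           : reg_offset + (frame.frame_size - frame.hard_fp_offset);
}
```

For a 64-byte frame with hard_fp_offset 16 and a register saved at offset 8, the slot is at offset 8 from the frame pointer and at offset 56 from the stack pointer.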


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread Martin Liška
On 11/10/2016 05:17 PM, David Edelsohn wrote:
> Maybe instead of adding "maybe", we need to change the severity of the
> warning so that the warning is not emitted by default.

Adding the warning option to -Wextra could be a solution. Is that an
acceptable approach?

Martin


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread Nathan Sidwell

On 11/10/2016 07:55 AM, David Edelsohn wrote:

> gcc.c now imposes profile-update=atomic if -pthread is used, even if
> the target does not support profile-update=atomic.

ah, that's where this is coming from.

nathan

--
Nathan Sidwell


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread David Edelsohn
On Thu, Nov 10, 2016 at 10:58 AM, Martin Liška  wrote:
> On 11/10/2016 04:43 PM, Nathan Sidwell wrote:
>> On 11/10/2016 05:19 AM, Martin Liška wrote:
>>
>>>> On 10/13/2016 05:34 PM, Martin Liška wrote:
>>>>> Hello.
>>>>>
>>>>> As it's very hard to guess from GCC driver whether a target supports 
>>>>> atomic updates
>>>>> for GCOV counter or not, I decided to come up with a new option value 
>>>>> (maybe-atomic),
>>>>> that would be transformed in a corresponding value (single or atomic) in 
>>>>> tree-profile.c.
>>>>> The GCC driver selects the option when -pthread is present in the command 
>>>>> line.
>>>>>
>>>>> That should fix all tests failures seen on AIX target.
>>>>>
>>>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>>>
>>>>> Ready to be installed?
>>
>> I dislike this.  If it's hard for gcc itself to know, how much harder for 
>> the user must it be?   (does gcc have another instance of an option that 
>> behaves 'prefer-A-or-B-if-you-can't'?
>>
>> It's also not clear what problem it's solving for the user?  If the user 
>> needs atomic update, they should get a hard error if the target doesn't 
>> support it.  If they don't need atomic, why ask for it?
>
> My initial motivation was to automatically selected -fprofile-update=atomic 
> if supported by a target and when '-pthread' is present on command line.
> As it's very problematic to identify (from GCC driver) whether a target 
> supports or not atomic updates, 'maybe' option is the only possible we can 
> guess.
>
>>
>> But as ever, I'm not going to veto it.
>
> Other option is to disable selection of -fprofile-update=atomic automatically.

Unfortunately, this cannot use a configure test or manually set value
based on target because the same gcc.c driver is invoked with
different options that may provide atomic update in some variants and
no atomic update in other (e.g., -m64) because the profile counter is
64 bits.

Maybe instead of adding "maybe", we need to change the severity of the
warning so that the warning is not emitted by default.

- David


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread David Edelsohn
On Thu, Nov 10, 2016 at 11:14 AM, Nathan Sidwell  wrote:
> On 11/10/2016 07:43 AM, Nathan Sidwell wrote:
>>
>> On 11/10/2016 05:19 AM, Martin Liška wrote:
>>
>>>> On 10/13/2016 05:34 PM, Martin Liška wrote:
>>>>>
>>>>> Hello.
>>>>>
>>>>> As it's very hard to guess from GCC driver whether a target supports
>>>>> atomic updates
>>>>> for GCOV counter or not, I decided to come up with a new option
>>>>> value (maybe-atomic),
>>>>> that would be transformed in a corresponding value (single or
>>>>> atomic) in tree-profile.c.
>>>>> The GCC driver selects the option when -pthread is present in the
>>>>> command line.
>>>>>
>>>>> That should fix all tests failures seen on AIX target.
>>>>>
>>>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression
>>>>> tests.
>>>>>
>>>>> Ready to be installed?
>>
>>
>> I dislike this.  If it's hard for gcc itself to know, how much harder
>> for the user must it be?   (does gcc have another instance of an option
>> that behaves 'prefer-A-or-B-if-you-can't'?
>
>
> Thinking further.  why isn't the right solution for -fprofile-update=atomic
> when faced with a target that cannot support it to:
> a) issue an error and bail out at the first opportunity
> b) or issue a warning and fall back to single threaded update?
>
> For #b presumably there'll be the capability of suppressing that particular
> warning?

Because that incorrectly breaks a huge portion of the testsuite.
That's not what the user intended.

- David


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread Nathan Sidwell

On 11/10/2016 07:43 AM, Nathan Sidwell wrote:
> On 11/10/2016 05:19 AM, Martin Liška wrote:
>
>>> On 10/13/2016 05:34 PM, Martin Liška wrote:
>>>> Hello.
>>>>
>>>> As it's very hard to guess from GCC driver whether a target supports
>>>> atomic updates
>>>> for GCOV counter or not, I decided to come up with a new option
>>>> value (maybe-atomic),
>>>> that would be transformed in a corresponding value (single or
>>>> atomic) in tree-profile.c.
>>>> The GCC driver selects the option when -pthread is present in the
>>>> command line.
>>>>
>>>> That should fix all tests failures seen on AIX target.
>>>>
>>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression
>>>> tests.
>>>>
>>>> Ready to be installed?
>
> I dislike this.  If it's hard for gcc itself to know, how much harder
> for the user must it be?   (does gcc have another instance of an option
> that behaves 'prefer-A-or-B-if-you-can't'?


Thinking further.  why isn't the right solution for 
-fprofile-update=atomic when faced with a target that cannot support it to:

a) issue an error and bail out at the first opportunity
b) or issue a warning and fall back to single threaded update?

For #b presumably there'll be the capability of suppressing that 
particular warning?


nathan

--
Nathan Sidwell


Re: [PATCH], PowerPC ISA 3.0, allow QImode/HImode to go into vector registers

2016-11-10 Thread Segher Boessenkool
Hi Mike,

> I have built the spec 2006 CPU benchmark suite with these changes, and the
> power8 (ISA 2.07) code generation does not change.

Very good to hear :-)

Just some nits; okay for trunk with that fixed:

> +(define_split
> +  [(set (match_operand:EXTHI 0 "altivec_register_operand" "")
> + (sign_extend:EXTHI
> +  (match_operand:HI 1 "indexed_or_indirect_operand" "")))]
> +  "TARGET_P9_VECTOR && reload_completed"
> +  [(set (match_dup 2)
> + (match_dup 1))
> +   (set (match_dup 0)
> + (sign_extend:EXTHI (match_dup 2)))]
> +{
> +  operands[2] = gen_rtx_REG (HImode, REGNO (operands[1]));
> +})

Please lose the default "" (here and elsewhere).

> Property changes on: gcc/testsuite/gcc.target/powerpc/p9-minmax-1.c
> ___
> Modified: svn:mergeinfo
>Merged /trunk/gcc/testsuite/gcc.target/powerpc/p9-minmax-1.c:r241733-241924
> 
> 
> Property changes on: gcc/testsuite/gcc.target/powerpc/p9-minmax-2.c
> ___
> Modified: svn:mergeinfo
>Merged /trunk/gcc/testsuite/gcc.target/powerpc/p9-minmax-2.c:r241733-241924

I don't know what this is?

Thanks!


Segher


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread Martin Liška
On 11/10/2016 04:43 PM, Nathan Sidwell wrote:
> On 11/10/2016 05:19 AM, Martin Liška wrote:
> 
>>> On 10/13/2016 05:34 PM, Martin Liška wrote:
>>>> Hello.
>>>>
>>>> As it's very hard to guess from GCC driver whether a target supports 
>>>> atomic updates
>>>> for GCOV counter or not, I decided to come up with a new option value 
>>>> (maybe-atomic),
>>>> that would be transformed in a corresponding value (single or atomic) in 
>>>> tree-profile.c.
>>>> The GCC driver selects the option when -pthread is present in the command 
>>>> line.
>>>>
>>>> That should fix all tests failures seen on AIX target.
>>>>
>>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>>
>>>> Ready to be installed?
> 
> I dislike this.  If it's hard for gcc itself to know, how much harder for the 
> user must it be?   (does gcc have another instance of an option that behaves 
> 'prefer-A-or-B-if-you-can't'?
> 
> It's also not clear what problem it's solving for the user?  If the user 
> needs atomic update, they should get a hard error if the target doesn't 
> support it.  If they don't need atomic, why ask for it?

My initial motivation was to automatically select -fprofile-update=atomic if 
the target supports it and '-pthread' is present on the command line.
As it's very problematic to identify (from the GCC driver) whether a target 
supports atomic updates or not, a 'maybe' option is the best guess we can 
make.

> 
> But as ever, I'm not going to veto it.

The other option is to disable the automatic selection of -fprofile-update=atomic.

Martin

> 
> nathan
> 



Re: [PATCH, LIBGCC] Avoid count_leading_zeros with undefined result (PR 78067)

2016-11-10 Thread James Greenhalgh

On Wed, Nov 09, 2016 at 10:16:35PM +, Joseph Myers wrote:
> On Wed, 9 Nov 2016, Bernd Edlinger wrote:
>
> > Yes, but maybe introduce a test if the half-wide value fits?
> >
> > like:
> >
> > #define M_OK2(M, T) ((M) > sizeof(T) * CHAR_BIT / 2 - 1)
>
> Something like that.

In patch form, that would look like this...

I've checked on my ARM and AArch64 trees with _Float16 support that this
lets the tests pass.

OK?

Thanks,
James

---
2016-11-10  James Greenhalgh  

* gcc.dg/torture/fp-int-convert.h (M_OK2): New, use it in
WVAL0S tests added in r241817.

diff --git a/gcc/testsuite/gcc.dg/torture/fp-int-convert.h b/gcc/testsuite/gcc.dg/torture/fp-int-convert.h
index bbe9666..2b904b6 100644
--- a/gcc/testsuite/gcc.dg/torture/fp-int-convert.h
+++ b/gcc/testsuite/gcc.dg/torture/fp-int-convert.h
@@ -53,13 +53,14 @@ do {\
   TEST_I_F_VAL (U, F, HVAL1U (P, U), P_OK (P, U));		\
   TEST_I_F_VAL (U, F, HVAL1U (P, U) + 1, P_OK (P, U));		\
   TEST_I_F_VAL (U, F, HVAL1U (P, U) - 1, P_OK (P, U));		\
-  TEST_I_F_VAL (I, F, WVAL0S (I), 1);\
-  TEST_I_F_VAL (I, F, -WVAL0S (I), 1);\
+  TEST_I_F_VAL (I, F, WVAL0S (I), M_OK2 (M, U));		\
+  TEST_I_F_VAL (I, F, -WVAL0S (I), M_OK2 (M, U));		\
 } while (0)
 
 #define P_OK(P, T) ((P) >= sizeof(T) * CHAR_BIT)
 #define P_OK1(P, T) ((P) >= sizeof(T) * CHAR_BIT - 1)
 #define M_OK1(M, T) ((M) > sizeof(T) * CHAR_BIT - 1)
+#define M_OK2(M, T) ((M) > sizeof(T) * CHAR_BIT / 2 - 1)
 #define HVAL0U(P, U) (U)(P_OK (P, U)	 \
 			 ? (U)1		 \
 			 : (((U)1 << (sizeof(U) * CHAR_BIT - 1))	 \
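
To make the new M_OK2 guard concrete, here is a minimal sketch that restates M_OK1 and M_OK2 as functions. It assumes that M is the floating-point type's maximum exponent (as with FLT_MAX_EXP) and uses 16 for a half-precision type such as _Float16; `m_ok1` and `m_ok2` are hypothetical names used only for this illustration.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Same comparison as M_OK1 (M, T): the exponent range reaches past the
// full width of T.
template <typename T>
constexpr bool
m_ok1 (std::size_t max_exp)
{
  return max_exp > sizeof (T) * CHAR_BIT - 1;
}

// Same comparison as M_OK2 (M, T): the exponent range reaches past
// half the width of T, which is what a half-wide test value needs.
template <typename T>
constexpr bool
m_ok2 (std::size_t max_exp)
{
  return max_exp > sizeof (T) * CHAR_BIT / 2 - 1;
}
```

With a maximum exponent of 16, the half-wide check passes for a 32-bit integer type (16 > 15) and fails for a 64-bit one (16 > 31), so the half-wide tests would only be expected to succeed in the first case.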


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread David Edelsohn
On Thu, Nov 10, 2016 at 10:43 AM, Nathan Sidwell  wrote:
> On 11/10/2016 05:19 AM, Martin Liška wrote:
>
>>> On 10/13/2016 05:34 PM, Martin Liška wrote:

>>>> Hello.
>>>>
>>>> As it's very hard to guess from GCC driver whether a target supports
>>>> atomic updates
>>>> for GCOV counter or not, I decided to come up with a new option value
>>>> (maybe-atomic),
>>>> that would be transformed in a corresponding value (single or atomic) in
>>>> tree-profile.c.
>>>> The GCC driver selects the option when -pthread is present in the
>>>> command line.
>>>>
>>>> That should fix all tests failures seen on AIX target.
>>>>
>>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression
>>>> tests.
>>>>
>>>> Ready to be installed?
>
>
> I dislike this.  If it's hard for gcc itself to know, how much harder for
> the user must it be?   (does gcc have another instance of an option that
> behaves 'prefer-A-or-B-if-you-can't'?
>
> It's also not clear what problem it's solving for the user?  If the user
> needs atomic update, they should get a hard error if the target doesn't
> support it.  If they don't need atomic, why ask for it?
>
> But as ever, I'm not going to veto it.

Do you have a better suggestion?

gcc.c now imposes profile-update=atomic if -pthread is used, even if
the target does not support profile-update=atomic.

Either gcc.c must not impose profile-update=atomic or we need some way
of differentiating between when the request should fail because the
user really expects it and when the request should silently and gently
be ignored.

The atomic update feature is nice, but currently GCC is trying to be
too smart in guessing how important the feature is to the user.

Thanks, David


Re: Fix compilation errors with libstdc++v3 for AVR target and allow --enable-libstdcxx

2016-11-10 Thread Felipe Magno de Almeida
On Thu, Nov 10, 2016 at 1:39 PM, Felipe Magno de Almeida
 wrote:
> Hello,
>
> Sorry for top-posting, but this is a ping for the attached patch.
>
> The patch doesn't seem to have been applied nor refused. So I'm
> pinging to see if I need to change something? I already have a
> copyright assignment now.
>
> I'm attaching an updated patch that doesn't conflict in the Changelog
> file.

Reattaching the patch with the correct git commit descriptions.

> Regards,

[snip]

> --
> Felipe Magno de Almeida

Kind regards,
-- 
Felipe Magno de Almeida
From 2e62f3a4b3c307e2abec0cb0302ec601b8f693e8 Mon Sep 17 00:00:00 2001
From: Felipe Magno de Almeida 
Date: Thu, 15 Sep 2016 15:54:50 -0300
Subject: [PATCH 1/3] Add #ifdef case for 16 bits in cow-stdexcept.cc

Added #ifdef case for when void* is 16 bits so it compiles in AVR
target.
---
 libstdc++-v3/ChangeLog  |  4 
 libstdc++-v3/src/c++11/cow-stdexcept.cc | 11 ---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index f405ccd..80d2e9d 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,7 @@
+2016-11-10  Felipe Magno de Almeida 
+
+	* src/c++11/cow-stdexcept.cc: Add special case for 16 bit pointers
+
 2016-11-09  Tim Shen  
 
 	* libstdc++-v3/include/bits/regex.h (regex_iterator::regex_iterator()):
diff --git a/libstdc++-v3/src/c++11/cow-stdexcept.cc b/libstdc++-v3/src/c++11/cow-stdexcept.cc
index 31a89df..641b372 100644
--- a/libstdc++-v3/src/c++11/cow-stdexcept.cc
+++ b/libstdc++-v3/src/c++11/cow-stdexcept.cc
@@ -208,6 +208,8 @@ extern void* _ZGTtnaX (size_t sz) __attribute__((weak));
 extern void _ZGTtdlPv (void* ptr) __attribute__((weak));
 extern uint8_t _ITM_RU1(const uint8_t *p)
   ITM_REGPARM __attribute__((weak));
+extern uint16_t _ITM_RU2(const uint16_t *p)
+  ITM_REGPARM __attribute__((weak));
 extern uint32_t _ITM_RU4(const uint32_t *p)
   ITM_REGPARM __attribute__((weak));
 extern uint64_t _ITM_RU8(const uint64_t *p)
@@ -272,12 +274,15 @@ _txnal_cow_string_C1_for_exceptions(void* that, const char* s,
 static void* txnal_read_ptr(void* const * ptr)
 {
   static_assert(sizeof(uint64_t) == sizeof(void*)
-		|| sizeof(uint32_t) == sizeof(void*),
-		"Pointers must be 32 bits or 64 bits wide");
+		|| sizeof(uint32_t) == sizeof(void*)
+		|| sizeof(uint16_t) == sizeof(void*),
+		"Pointers must be 16 bits, 32 bits or 64 bits wide");
 #if __UINTPTR_MAX__ == __UINT64_MAX__
   return (void*)_ITM_RU8((const uint64_t*)ptr);
-#else
+#elif __UINTPTR_MAX__ == __UINT32_MAX__
   return (void*)_ITM_RU4((const uint32_t*)ptr);
+#else
+  return (void*)_ITM_RU2((const uint16_t*)ptr);
 #endif
 }
 
-- 
2.10.2


From 7ed4af72fe0bdee1a38c7487955590fb64f76a5d Mon Sep 17 00:00:00 2001
From: Felipe Magno de Almeida 
Date: Thu, 15 Sep 2016 18:52:57 -0300
Subject: [PATCH 2/3] Use temporary int objects to access struct tm members

Call _M_extract_* functions family through temporary int objects, so
it doesn't convert from lvalue to rvalue through a temporary in AVR
because of the incompatible types used in AVR-Libc.

This fixes compilation errors with AVR-Libc while compiling libstdc++
for AVR target.
---
 libstdc++-v3/ChangeLog|  5 +++
 libstdc++-v3/include/bits/locale_facets_nonio.tcc | 44 ---
 2 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index 80d2e9d..80c75c6 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,5 +1,10 @@
 2016-11-10  Felipe Magno de Almeida 
 
+	* include/bits/locale_facets_nonio.tcc: Avoid compilation errors
+	with non-standard struct tm.
+
+2016-11-10  Felipe Magno de Almeida 
+
 	* src/c++11/cow-stdexcept.cc: Add special case for 16 bit pointers
 
 2016-11-09  Tim Shen  
diff --git a/libstdc++-v3/include/bits/locale_facets_nonio.tcc b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
index 1a4f9a0..cc9d2df 100644
--- a/libstdc++-v3/include/bits/locale_facets_nonio.tcc
+++ b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
@@ -659,30 +659,38 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
 		  // Abbreviated weekday name [tm_wday]
 		  const char_type*  __days1[7];
 		  __tp._M_days_abbreviated(__days1);
-		  __beg = _M_extract_name(__beg, __end, __tm->tm_wday, __days1,
+  __mem = __tm->tm_wday;
+		  __beg = _M_extract_name(__beg, __end, __mem, __days1,
 	  7, __io, __tmperr);
+  __tm->tm_wday = __mem;
 		  break;
 		case 'A':
 		  // Weekday name [tm_wday].
 		  const char_type*  __days2[7];
 		  __tp._M_days(__days2);
-		  __beg = _M_extract_name(__beg, __end, __tm->tm_wday, __days2,
+  __mem = __tm->tm_wday;
+		  __beg = _M_extract_name(__beg, __end, __mem, __days2,
 	  7, __io, __tmperr);
+  __tm->tm_wday = __mem;
 		  break;
 		case 'h':
 		case 'b':
 		  // Abbreviated month name [tm_mon]
 		  const char_type*  __months1[12];
 		  __tp._M_months_abbrevi
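
A minimal sketch of the temporary-int approach described in the commit message above. It assumes a hypothetical struct `tiny_tm` whose field is `int8_t` (a stand-in for a non-standard struct tm), and `extract_into` is an illustrative stand-in for the `_M_extract_*` helpers, not the libstdc++ code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a libc whose struct tm members are not int.
struct tiny_tm {
  std::int8_t tm_wday;
};

// Illustrative stand-in for the _M_extract_* helpers: writes through int&.
static void extract_into(int& member, int parsed) {
  member = parsed;
}

// Copy the member into an int, call the helper, then copy the result back.
// Passing tm.tm_wday directly would not compile here, because an int8_t
// lvalue cannot bind to a non-const int&.
static void set_wday(tiny_tm& tm, int parsed) {
  int mem = tm.tm_wday;
  extract_into(mem, parsed);
  tm.tm_wday = static_cast<std::int8_t>(mem);
}
```

The order matters: the member is read into the temporary before the call and written back after it, so a helper that leaves its argument untouched on error also leaves the member untouched.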

Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread Nathan Sidwell

On 11/10/2016 05:19 AM, Martin Liška wrote:


On 10/13/2016 05:34 PM, Martin Liška wrote:

Hello.

As it's very hard to guess from the GCC driver whether a target supports
atomic updates for GCOV counters or not, I decided to come up with a new
option value (maybe-atomic) that would be transformed into a corresponding
value (single or atomic) in tree-profile.c.
The GCC driver selects the option when -pthread is present in the command line.

That should fix all test failures seen on the AIX target.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?


I dislike this.  If it's hard for gcc itself to know, how much harder
for the user must it be?  (Does gcc have another instance of an option
that behaves 'prefer-A-or-B-if-you-can't'?)


It's also not clear what problem it's solving for the user.  If the user
needs atomic update, they should get a hard error if the target doesn't
support it.  If they don't need atomic, why ask for it?


But as ever, I'm not going to veto it.

nathan

--
Nathan Sidwell
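
A minimal sketch of the option resolution under discussion, assuming a hypothetical enum and a boolean `target_has_atomics` input standing for "the target can update GCOV counters atomically". None of these names come from the patch, which does this in tree-profile.c:

```cpp
#include <cassert>

// Hypothetical model of the -fprofile-update values discussed above.
enum class ProfileUpdate { Single, Atomic, MaybeAtomic };

// Resolve maybe-atomic to a concrete mode. Explicit Single and Atomic
// requests are passed through unchanged in this sketch; what to do with an
// explicit Atomic request on a target without atomics is exactly the
// question raised in the review and is not modelled here.
static ProfileUpdate resolve_profile_update(ProfileUpdate requested,
                                            bool target_has_atomics) {
  if (requested != ProfileUpdate::MaybeAtomic)
    return requested;
  return target_has_atomics ? ProfileUpdate::Atomic : ProfileUpdate::Single;
}
```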


Re: Fix compilation errors with libstdc++v3 for AVR target and allow --enable-libstdcxx

2016-11-10 Thread Felipe Magno de Almeida
Hello,

Sorry for top-posting, but this is a ping for the attached patch.

The patch doesn't seem to have been either applied or rejected, so I'm
pinging to see whether I need to change something. I already have a
copyright assignment now.

I'm attaching an updated patch that doesn't conflict in the ChangeLog
file.

Regards,

On Fri, Sep 16, 2016 at 3:37 AM, Felipe Magno de Almeida
 wrote:
> Hello,
>
> Another patch.
>
> On Fri, Sep 16, 2016 at 2:53 AM, Felipe Magno de Almeida
>  wrote:
>> On Fri, Sep 16, 2016 at 2:42 AM, Marc Glisse  wrote:
>>> On Thu, 15 Sep 2016, Felipe Magno de Almeida wrote:
>>>
>>> +   || sizeof(uint32_t) == sizeof(void*)
>>> +|| sizeof(uint16_t) == sizeof(void*),
>>>
>>> Indentation is off?
>>>
>>>> Call _M_extract_* functions family through temporary int objects
>>>
>>>
>>> Would it make sense to use a template type instead of int for this
>>> parameter? Or possibly have a typedef that defaults to int (what POSIX
>>> requires). The hard case would be a libc that uses bitfields for the fields
>>> of struct tm (that could save some space), but I don't think anyone does
>>> that.
>>
>> I've tried both approaches. Templates were causing problems of
>> undefined instantiations because they were being used as ints too
>> in other _M_extract functions through a tmp integer. And typedef's
>> caused the same problem of having to use a tmp value of the right
>> type but for example _M_extract_wday_or_month could not have the
>> same type (in AVR they do) and I'd have to use a temporary anyway
>> then.
>>
>> This was the least intrusive way.
>>
>>>> float pointing
>>>
>>> floating point?
>>
>> :D. Yes.
>>
>>> A ChangeLog entry would also help.
>>
>> OK.
>>
>>> --
>>> Marc Glisse
>>
>> Regards,
>> --
>> Felipe Magno de Almeida
>
>
>
> --
> Felipe Magno de Almeida



-- 
Felipe Magno de Almeida
diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index f405ccd..b0efe72 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,17 @@
+2016-11-10  Felipe Magno de Almeida 
+
+	* src/c++11/cow-stdexcept.cc: Add special case for 16 bit pointers
+
+2016-11-10  Felipe Magno de Almeida 
+
+	* include/bits/locale_facets_nonio.tcc: Avoid compilation errors
+	with non-standard struct tm.
+
+2016-11-10  Felipe Magno de Almeida 
+
+	* crossconfig.m4: Add avr target for cross-compilation
+	* configure: Regenerate
+
 2016-11-09  Tim Shen  
 
 	* libstdc++-v3/include/bits/regex.h (regex_iterator::regex_iterator()):
diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 8481a48..5e3f783 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -28840,6 +28840,55 @@ case "${host}" in
 # This is a freestanding configuration; there is nothing to do here.
 ;;
 
+  avr*-*-*)
+$as_echo "#define HAVE_ACOSF 1" >>confdefs.h
+
+$as_echo "#define HAVE_ASINF 1" >>confdefs.h
+
+$as_echo "#define HAVE_ATAN2F 1" >>confdefs.h
+
+$as_echo "#define HAVE_ATANF 1" >>confdefs.h
+
+$as_echo "#define HAVE_CEILF 1" >>confdefs.h
+
+$as_echo "#define HAVE_COSF 1" >>confdefs.h
+
+$as_echo "#define HAVE_COSHF 1" >>confdefs.h
+
+$as_echo "#define HAVE_EXPF 1" >>confdefs.h
+
+$as_echo "#define HAVE_FABSF 1" >>confdefs.h
+
+$as_echo "#define HAVE_FLOORF 1" >>confdefs.h
+
+$as_echo "#define HAVE_FMODF 1" >>confdefs.h
+
+$as_echo "#define HAVE_FREXPF 1" >>confdefs.h
+
+$as_echo "#define HAVE_SQRTF 1" >>confdefs.h
+
+$as_echo "#define HAVE_HYPOTF 1" >>confdefs.h
+
+$as_echo "#define HAVE_LDEXPF 1" >>confdefs.h
+
+$as_echo "#define HAVE_LOG10F 1" >>confdefs.h
+
+$as_echo "#define HAVE_LOGF 1" >>confdefs.h
+
+$as_echo "#define HAVE_MODFF 1" >>confdefs.h
+
+$as_echo "#define HAVE_POWF 1" >>confdefs.h
+
+$as_echo "#define HAVE_SINF 1" >>confdefs.h
+
+$as_echo "#define HAVE_SINHF 1" >>confdefs.h
+
+$as_echo "#define HAVE_TANF 1" >>confdefs.h
+
+$as_echo "#define HAVE_TANHF 1" >>confdefs.h
+
+;;
+
   mips*-sde-elf*)
 # These definitions are for the SDE C library rather than newlib.
 SECTION_FLAGS='-ffunction-sections -fdata-sections'
diff --git a/libstdc++-v3/crossconfig.m4 b/libstdc++-v3/crossconfig.m4
index 6abc84f..2b955ec 100644
--- a/libstdc++-v3/crossconfig.m4
+++ b/libstdc++-v3/crossconfig.m4
@@ -9,6 +9,32 @@ case "${host}" in
 # This is a freestanding configuration; there is nothing to do here.
 ;;
 
+  avr*-*-*)
+AC_DEFINE(HAVE_ACOSF)
+AC_DEFINE(HAVE_ASINF)
+AC_DEFINE(HAVE_ATAN2F)
+AC_DEFINE(HAVE_ATANF)
+AC_DEFINE(HAVE_CEILF)
+AC_DEFINE(HAVE_COSF)
+AC_DEFINE(HAVE_COSHF)
+AC_DEFINE(HAVE_EXPF)
+AC_DEFINE(HAVE_FABSF)
+AC_DEFINE(HAVE_FLOORF)
+AC_DEFINE(HAVE_FMODF)
+AC_DEFINE(HAVE_FREXPF)
+AC_DEFINE(HAVE_SQRTF)
+AC_DEFINE(HAVE_HYPOTF)
+AC_DEFINE(HAVE_LDEXPF)
+AC_DEFINE(HAVE_LOG10F)
+AC_DEFINE(HAVE_LOGF)
+AC_DEFINE(HAVE_MODFF)
+AC_DEFINE(HAVE_POWF)
+AC_DEFINE(HAVE_SINF)
+AC_DEFINE(HAVE

[PATCH] Support no newline in print_gimple_stmt

2016-11-10 Thread Martin Liška
I've just noticed that tree-ssa-dse wrongly prints a new line to dump file.
For the next stage1, I'll go through usages of print_gimple_stmt and remove
extra new lines like:

gcc/auto-profile.c:  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
gcc/auto-profile.c-  fprintf (dump_file, "\n");

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
From ab1ed77381f78a8940ca250ee0f5ef5cd6b87e7f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 10 Nov 2016 14:54:00 +0100
Subject: [PATCH] Support no newline in print_gimple_stmt

gcc/ChangeLog:

2016-11-10  Martin Liska  

	* gimple-pretty-print.c (print_gimple_stmt): Add new argument.
	* gimple-pretty-print.h (print_gimple_stmt): Declare the new
	argument.
	* tree-ssa-dse.c (dse_optimize_stmt): Use the argument to avoid
	printing a newline.
---
 gcc/gimple-pretty-print.c | 7 +--
 gcc/gimple-pretty-print.h | 2 +-
 gcc/tree-ssa-dse.c        | 5 +++--
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index f588f5e..8538911 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -76,13 +76,16 @@ debug_gimple_stmt (gimple *gs)
FLAGS as in pp_gimple_stmt_1.  */
 
 void
-print_gimple_stmt (FILE *file, gimple *g, int spc, int flags)
+print_gimple_stmt (FILE *file, gimple *g, int spc, int flags,
+		   bool newline)
 {
   pretty_printer buffer;
   pp_needs_newline (&buffer) = true;
   buffer.buffer->stream = file;
   pp_gimple_stmt_1 (&buffer, g, spc, flags);
-  pp_newline_and_flush (&buffer);
+  if (newline)
+pp_newline (&buffer);
+  pp_flush (&buffer);
 }
 
 DEBUG_FUNCTION void
diff --git a/gcc/gimple-pretty-print.h b/gcc/gimple-pretty-print.h
index f8eef99..6890cfb 100644
--- a/gcc/gimple-pretty-print.h
+++ b/gcc/gimple-pretty-print.h
@@ -27,7 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 extern void debug_gimple_stmt (gimple *);
 extern void debug_gimple_seq (gimple_seq);
 extern void print_gimple_seq (FILE *, gimple_seq, int, int);
-extern void print_gimple_stmt (FILE *, gimple *, int, int);
+extern void print_gimple_stmt (FILE *, gimple *, int, int, bool newline = true);
 extern void debug (gimple &ref);
 extern void debug (gimple *ptr);
 extern void print_gimple_expr (FILE *, gimple *, int, int);
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 372a0be..b1a8c5d 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -237,7 +237,8 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 		{
 		  fprintf (dump_file, "  Deleted dead call '");
-		  print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0);
+		  print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0,
+ false);
 		  fprintf (dump_file, "'\n");
 		}
 
@@ -293,7 +294,7 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
   if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "  Deleted dead store '");
-	  print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0);
+	  print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0, false);
 	  fprintf (dump_file, "'\n");
 	}
 
-- 
2.10.1
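
A minimal sketch of the defaulted-argument pattern the patch uses, with a hypothetical `print_stmt` writing to a `std::ostream` instead of GCC's pretty_printer. The names are illustrative only and are not GCC APIs:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Illustrative stand-in for print_gimple_stmt: the trailing newline is
// optional and defaults to true, so existing callers keep their behaviour.
static void print_stmt(std::ostream& out, const std::string& stmt,
                       bool newline = true) {
  out << stmt;
  if (newline)
    out << '\n';
  out.flush();
}

// Mirrors the dump in dse_optimize_stmt: the statement is wrapped in
// quotes, so the printer must not emit its own newline.
static std::string dump_deleted(const std::string& stmt) {
  std::ostringstream out;
  out << "  Deleted dead store '";
  print_stmt(out, stmt, /*newline=*/false);
  out << "'\n";
  return out.str();
}
```

Putting the default in the declaration only, as the patch does in gimple-pretty-print.h, means no existing call site has to change.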



Unreviewed libstdc++ patch

2016-11-10 Thread Rainer Orth
The libstdc++ part of the following patch

[fixincludes, v3] Don't define libstdc++-internal macros in Solaris 10+ 

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00330.html

has remained unreviewed for a week.  Bruce already approved the
fixincludes part.

In the meantime, full testing on mainline has completed and backports to
the gcc-5 and gcc-6 branches have also been tested successfully.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH][ARM] PR78255: wrong code generation for indirect sibling calls

2016-11-10 Thread Andre Vieira (lists)
Hi,

As reported in PR78255 there is currently an issue with indirect sibling
calls in ARM when the address of the sibling call is loaded into 'r3'
and that same register is chosen to align the stack.  See the report for
further information.

As I mentioned in the bugzilla ticket I am not sure this is the right
approach, though it works... Bootstrapped on ARM and no regressions.

Do you think this is OK? Another solution would be to make sure that
'arm_get_frame_offsets' recalculates offsets after we know that the call
is going to be indirect, i.e. after we know the address is going to be
loaded into a register, but I do not know what a sane way would be to
ensure this.

Regards,
Andre

gcc/ChangeLog
2016-11-10  Andre Vieira  

* config/arm/arm.md (sibcall_internal): Add 'use' to pattern.
(sibcall_value_internal): Likewise.
(sibcall_insn): Likewise.
(sibcall_value_insn): Likewise.


gcc/testsuite/ChangeLog
2016-11-10  Andre Vieira  

* gcc.target/arm/pr78255.c: New.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 8393f65bcf4c9c3e61b91e5adcd5f59ff7c6ec3f..ab28b15f3e4ebbaca2b8ec0523493b54cce8c306 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8192,7 +8192,8 @@
   [(parallel [(call (match_operand 0 "memory_operand" "")
(match_operand 1 "general_operand" ""))
  (return)
- (use (match_operand 2 "" ""))])])
+ (use (match_operand 2 "" ""))
+ (use (match_dup 0))])])
 
 ;; We may also be able to do sibcalls for Thumb, but it's much harder...
 (define_expand "sibcall"
@@ -8225,7 +8226,8 @@
   (call (match_operand 1 "memory_operand" "")
 (match_operand 2 "general_operand" "")))
  (return)
- (use (match_operand 3 "" ""))])])
+ (use (match_operand 3 "" ""))
+ (use (match_dup 1))])])
 
 (define_expand "sibcall_value"
   [(parallel [(set (match_operand 0 "" "")
@@ -8258,7 +8260,8 @@
  [(call (mem:SI (match_operand:SI 0 "call_insn_operand" "Cs, US"))
(match_operand 1 "" ""))
   (return)
-  (use (match_operand 2 "" ""))]
+  (use (match_operand 2 "" ""))
+  (use (match_operand 3 "" ""))]
   "TARGET_32BIT && SIBLING_CALL_P (insn)"
   "*
   if (which_alternative == 1)
@@ -8279,7 +8282,8 @@
(call (mem:SI (match_operand:SI 1 "call_insn_operand" "Cs,US"))
 (match_operand 2 "" "")))
   (return)
-  (use (match_operand 3 "" ""))]
+  (use (match_operand 3 "" ""))
+  (use (match_operand 4 "" ""))]
   "TARGET_32BIT && SIBLING_CALL_P (insn)"
   "*
   if (which_alternative == 1)
diff --git a/gcc/testsuite/gcc.target/arm/pr78255.c b/gcc/testsuite/gcc.target/arm/pr78255.c
new file mode 100644
index ..4901acea51466c0bac92d9cb90e52b00b450d88a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr78255.c
@@ -0,0 +1,57 @@
+/* { dg-do run } */
+/* { dg-options "-O2" }  */
+
+#include <string.h>
+
+struct table_s
+{
+void (*fun0)
+( void );
+void (*fun1)
+( void );
+void (*fun2)
+( void );
+void (*fun3)
+( void );
+void (*fun4)
+( void );
+void (*fun5)
+( void );
+void (*fun6)
+( void );
+void (*fun7)
+( void );
+} table;
+
+void callback0(){__asm("mov r0, r0 \n\t");}
+void callback1(){__asm("mov r0, r0 \n\t");}
+void callback2(){__asm("mov r0, r0 \n\t");}
+void callback3(){__asm("mov r0, r0 \n\t");}
+void callback4(){__asm("mov r0, r0 \n\t");}
+
+void test (void) {
+memset(&table, 0, sizeof table);
+
+asm volatile ("" : : : "r3");
+
+table.fun0 = callback0;
+table.fun1 = callback1;
+table.fun2 = callback2;
+table.fun3 = callback3;
+table.fun4 = callback4;
+table.fun0();
+}
+
+void foo (void)
+{
+  __builtin_abort ();
+}
+
+int main (void)
+{
+  unsigned long p = (unsigned long) &foo;
+  asm volatile ("mov r3, %0" : : "r" (p));
+  test ();
+
+  return 0;
+}


Re: [Patch, fortran] PR44265 - Link error with reference to parameter array in specification expression

2016-11-10 Thread Paul Richard Thomas
Hi Dominique.

snip
> I have a last glitch (which can be deferred if needed):
snip

Fixed by the new patch, which is attached. Bootstraps and regtests OK.

OK for trunk?

Paul

2016-11-10  Paul Thomas  

PR fortran/44265
* gfortran.h : Add fn_result_spec bitfield to gfc_symbol.
* resolve.c (flag_fn_result_spec): New function.
(resolve_fntype): Call it for character result lengths.
* symbol.c (gfc_new_symbol): Set fn_result_spec to zero.
* trans-decl.c (gfc_sym_mangled_identifier): Include the
procedure name in the mangled name for symbols with the
fn_result_spec bit set.
(gfc_get_symbol_decl): Mangle the name of these symbols.
(gfc_create_module_variable): Allow them through the assert.
(gfc_generate_function_code): Remove the assert before the
initialization of sym->tlink because the frontend no longer
uses this field.
* trans-expr.c (gfc_map_intrinsic_function): Add a case to
treat the LEN_TRIM intrinsic.

2016-11-10  Paul Thomas  

PR fortran/44265
* gfortran.dg/char_result_14.f90: New test.
* gfortran.dg/char_result_15.f90: New test.
* gfortran.dg/char_result_16.f90: New test.


-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein
Index: gcc/fortran/gfortran.h
===
*** gcc/fortran/gfortran.h  (revision 241994)
--- gcc/fortran/gfortran.h  (working copy)
*** typedef struct gfc_symbol
*** 1498,1503 
--- 1498,1505 
unsigned equiv_built:1;
/* Set if this variable is used as an index name in a FORALL.  */
unsigned forall_index:1;
+   /* Set if the symbol is used in a function result specification.  */
+   unsigned fn_result_spec:1;
/* Used to avoid multiple resolutions of a single symbol.  */
unsigned resolved:1;
/* Set if this is a module function or subroutine with the
Index: gcc/fortran/resolve.c
===
*** gcc/fortran/resolve.c   (revision 241994)
--- gcc/fortran/resolve.c   (working copy)
*** resolve_equivalence (gfc_equiv *eq)
*** 15732,15737 
--- 15732,15785 
  }
  
  
+ /* Function called by resolve_fntype to flag other symbols used in the
+    length type parameter specification of function results.  */
+ 
+ static bool
+ flag_fn_result_spec (gfc_expr *expr,
+  gfc_symbol *sym ATTRIBUTE_UNUSED,
+  int *f ATTRIBUTE_UNUSED)
+ {
+   gfc_namespace *ns;
+   gfc_symbol *s;
+ 
+   if (expr->expr_type == EXPR_VARIABLE)
+ {
+   s = expr->symtree->n.sym;
+   for (ns = s->ns; ns; ns = ns->parent)
+   if (!ns->parent)
+ break;
+ 
+   if (!s->fn_result_spec
+ && s->attr.flavor == FL_PARAMETER)
+   {
+ /* Function contained in a module */
+ if (ns->proc_name && ns->proc_name->attr.flavor == FL_MODULE)
+   {
+ gfc_symtree *st;
+ s->fn_result_spec = 1;
+ /* Make sure that this symbol is translated as a module
+variable.  */
+ st = gfc_get_unique_symtree (ns);
+ st->n.sym = s;
+ s->refs++;
+   }
+ /* ... which is use associated and called.  */
+ else if (s->attr.use_assoc || s->attr.used_in_submodule
+   ||
+ /* External function matched with an interface.  */
+ (s->ns->proc_name
+  && ((s->ns == ns
+&& s->ns->proc_name->attr.if_source == IFSRC_DECL)
+  || s->ns->proc_name->attr.if_source == IFSRC_IFBODY)
+  && s->ns->proc_name->attr.function))
+   s->fn_result_spec = 1;
+   }
+ }
+   return false;
+ }
+ 
+ 
  /* Resolve function and ENTRY types, issue diagnostics if needed.  */
  
  static void
*** resolve_fntype (gfc_namespace *ns)
*** 15782,15787 
--- 15830,15838 
el->sym->attr.untyped = 1;
  }
}
+ 
+   if (sym->ts.type == BT_CHARACTER)
+ gfc_traverse_expr (sym->ts.u.cl->length, NULL, flag_fn_result_spec, 0);
  }
  
  
Index: gcc/fortran/symbol.c
===
*** gcc/fortran/symbol.c(revision 241994)
--- gcc/fortran/symbol.c(working copy)
*** gfc_new_symbol (const char *name, gfc_na
*** 2933,2938 
--- 2933,2939 
p->common_block = NULL;
p->f2k_derived = NULL;
p->assoc = NULL;
+   p->fn_result_spec = 0;

return p;
  }
Index: gcc/fortran/trans-decl.c
===
*** gcc/fortran/trans-decl.c(revision 241994)
--- gcc/fortran/trans-decl.c(working copy)
*** gfc_sym_mangled_identifier (gfc_symbol *
*** 355,362 
if (sym->attr.is_bind_c == 1 && sym->binding_label)
  return get_identifier (sym

[PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-10 Thread Kyrill Tkachov

Hi all,

This patch implements the new separate shrink-wrapping hooks for aarch64.
In separate shrink wrapping (as I understand it) we consider each register
save/restore as a 'component' that can be performed independently of the
other save/restores in the prologue/epilogue and can be moved outside the
prologue/epilogue and instead performed only in the basic blocks where it's
actually needed. This allows us to avoid saving and restoring registers on
execution paths where a register might not be needed.

In the most general form a 'component' can be any operation that the
prologue/epilogue performs, for example stack adjustment. But in this patch
we only consider callee-saved register save/restores as components.
The code is in many ways similar to the powerpc implementation of the hooks.

The hooks implemented are:
* TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS: Returns a bitmap containing a
bit for each register that should be considered a 'component', i.e. its
save/restore should be separated from the prologue and epilogue and placed
at the basic block where it's needed.

* TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB: Determines for a given basic block
which 'component' registers it needs. This is determined through dataflow.
If a component register is in the IN, GEN or KILL sets for the basic block
it's considered as needed and marked as such in the bitmap.

* TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS and
TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS: Given a bitmap of component
registers, emit the save or restore code for them.

* TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS: Given a bitmap of component
registers, records in the backend that these registers are shrink-wrapped
using this approach and that the normal prologue and epilogue expansion code
should not emit code for them. This is done similarly to powerpc by defining
a bool array in machine_function where we record whether each register is
separately shrink-wrapped.  The prologue and epilogue expansion code
(through aarch64_save_callee_saves and aarch64_restore_callee_saves) is
updated to not emit save/restores for these registers if they appear in that
array.

Our prologue and epilogue code has a lot of intricate logic to perform stack
adjustments using the writeback forms of the load/store instructions.
Separately shrink-wrapping those registers marked for writeback
(cfun->machine->frame.wb_candidate1 and cfun->machine->frame.wb_candidate2)
broke that codegen and I had to emit an explicit stack adjustment instruction
that created ugly prologue/epilogue sequences. So this patch is conservative
and doesn't allow shrink-wrapping of the registers marked for writeback.
Maybe in the future we can relax it (for example allow wrapping of one of the
two writeback registers if the writeback amount can be encoded in a
single-register writeback store/load) but given the development stage of GCC
I thought I'd play it safe.
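
A minimal sketch of the component selection described above, using `std::bitset` in place of GCC's sbitmap. The register count, function names and the "negative means none" convention are assumptions for illustration and are not the aarch64.c implementation:

```cpp
#include <bitset>
#include <cassert>

constexpr int kNumRegs = 32;            // assumed register count for the sketch
using RegSet = std::bitset<kNumRegs>;

// Components: callee-saved registers that need saving, minus the (up to two)
// registers reserved for the writeback stack adjustment. A negative
// candidate means "none".
static RegSet separate_components(RegSet saved, int wb_candidate1,
                                  int wb_candidate2) {
  RegSet components = saved;
  if (wb_candidate1 >= 0)
    components.reset(wb_candidate1);
  if (wb_candidate2 >= 0)
    components.reset(wb_candidate2);
  return components;
}

// A block needs a component if the register is in its IN, GEN or KILL set.
static RegSet components_for_bb(RegSet components, RegSet in, RegSet gen,
                                RegSet kill) {
  return components & (in | gen | kill);
}
```

With registers 19, 20, 29 and 30 saved and 29/30 as the writeback candidates, only 19 and 20 remain as components, and a block that merely kills register 21 needs none of them.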

I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there were
some interesting swings.
458.sjeng     +1.45%
471.omnetpp   +2.19%
445.gobmk     -2.01%

On SPECFP:
453.povray    +7.00%

I'll be re-running the benchmarks with Segher's recent patch [1] to see if it
fixes the regression, and if it does I think this can go in.

[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00889.html

Bootstrapped and tested on aarch64-none-linux-gnu.

Thanks,
Kyrill

2016-11-10  Kyrylo Tkachov  

* config/aarch64/aarch64.h (machine_function): Add
reg_is_wrapped_separately field.
* config/aarch64/aarch64.c (emit_set_insn): Change return type to
rtx_insn *.
(aarch64_save_callee_saves): Don't save registers that are wrapped
separately.
(aarch64_restore_callee_saves): Don't restore registers that are
wrapped separately.
(offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p,
aarch64_offset_7bit_signed_scaled_p): Move earlier in the file.
(aarch64_get_separate_components): New function.
(aarch64_components_for_bb): Likewise.
(aarch64_disqualify_components): Likewise.
(aarch64_emit_prologue_components): Likewise.
(aarch64_emit_epilogue_components): Likewise.
(aarch64_set_handled_components): Likewise.
(TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS,
TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB,
TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS,
TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS,
TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS,
TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define.
commit 14c7a66d9f3a44ef40499e61ca9643c7dfbc6c82
Author: Kyrylo Tkachov 
Date:   Tue Oct 11 09:25:54 2016 +0100

[AArch64] Separate shrink wrapping hooks implementation

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 325e725..5508333 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1138,7 +1138,7 @@ aarch64_is_extend_from_extract (machine_mode mode, rtx mult_imm,
 
 /* Emit an insn that's a simple single-set.  Both the 

Re: [PATCH] Fix PR78189

2016-11-10 Thread Christophe Lyon
On 10 November 2016 at 09:34, Richard Biener  wrote:
> On Wed, 9 Nov 2016, Christophe Lyon wrote:
>
>> On 9 November 2016 at 09:36, Bin.Cheng  wrote:
>> > On Tue, Nov 8, 2016 at 9:11 AM, Richard Biener  wrote:
>> >> On Mon, 7 Nov 2016, Christophe Lyon wrote:
>> >>
>> >>> Hi Richard,
>> >>>
>> >>>
>> >>> On 7 November 2016 at 09:01, Richard Biener  wrote:
>> >>> >
>> >>> > The following fixes an oversight when computing alignment in the
>> >>> > vectorizer.
>> >>> >
>> >>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>> >>> >
>> >>> > Richard.
>> >>> >
>> >>> > 2016-11-07  Richard Biener  
>> >>> >
>> >>> > PR tree-optimization/78189
>> >>> > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Fix
>> >>> > alignment computation.
>> >>> >
>> >>> > * g++.dg/torture/pr78189.C: New testcase.
>> >>> >
>> >>> > Index: gcc/testsuite/g++.dg/torture/pr78189.C
>> >>> > ===
>> >>> > --- gcc/testsuite/g++.dg/torture/pr78189.C  (revision 0)
>> >>> > +++ gcc/testsuite/g++.dg/torture/pr78189.C  (working copy)
>> >>> > @@ -0,0 +1,41 @@
>> >>> > +/* { dg-do run } */
>> >>> > +/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" } */
>> >>> > +
>> >>> > +#include <cstddef>
>> >>> > +
>> >>> > +struct A
>> >>> > +{
>> >>> > +  void * a;
>> >>> > +  void * b;
>> >>> > +};
>> >>> > +
>> >>> > +struct alignas(16) B
>> >>> > +{
>> >>> > +  void * pad;
>> >>> > +  void * misaligned;
>> >>> > +  void * pad2;
>> >>> > +
>> >>> > +  A a;
>> >>> > +
>> >>> > +  void Null();
>> >>> > +};
>> >>> > +
>> >>> > +void B::Null()
>> >>> > +{
>> >>> > +  a.a = nullptr;
>> >>> > +  a.b = nullptr;
>> >>> > +}
>> >>> > +
>> >>> > +void __attribute__((noinline,noclone))
>> >>> > +NullB(void * misalignedPtr)
>> >>> > +{
>> >>> > +  B* b = reinterpret_cast<B*>(reinterpret_cast<char*>(misalignedPtr) - offsetof(B, misaligned));
>> >>> > +  b->Null();
>> >>> > +}
>> >>> > +
>> >>> > +int main()
>> >>> > +{
>> >>> > +  B b;
>> >>> > +  NullB(&b.misaligned);
>> >>> > +  return 0;
>> >>> > +}
>> >>> > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> >>> > index 9346cfe..b03cb1e 100644
>> >>> > --- gcc/tree-vect-data-refs.c
>> >>> > +++ gcc/tree-vect-data-refs.c
>> >>> > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
>> >>> >base = ref;
>> >>> >while (handled_component_p (base))
>> >>> >  base = TREE_OPERAND (base, 0);
>> >>> > +  unsigned int base_alignment;
>> >>> > +  unsigned HOST_WIDE_INT base_bitpos;
>> >>> > +  get_object_alignment_1 (base, &base_alignment, &base_bitpos);
>> >>> > +  /* As data-ref analysis strips the MEM_REF down to its base operand
>> >>> > + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to
>> >>> > + adjust things to make base_alignment valid as the alignment of
>> >>> > + DR_BASE_ADDRESS.  */
>> >>> >if (TREE_CODE (base) == MEM_REF)
>> >>> > -base = build2 (MEM_REF, TREE_TYPE (base), base_addr,
>> >>> > -  build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), 0));
>> >>> > -  unsigned int base_alignment = get_object_alignment (base);
>> >>> > +{
>> >>> > +  base_bitpos -= mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT;
>> >>> > +  base_bitpos &= (base_alignment - 1);
>> >>> > +}
>> >>> > +  if (base_bitpos != 0)
>> >>> > +base_alignment = base_bitpos & -base_bitpos;
>> >>> > +  /* Also look at the alignment of the base address DR analysis
>> >>> > + computed.  */
>> >>> > +  unsigned int base_addr_alignment = get_pointer_alignment (base_addr);
>> >>> > +  if (base_addr_alignment > base_alignment)
>> >>> > +base_alignment = base_addr_alignment;
>> >>> >
>> >>> >if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype)))
>> >>> >  DR_VECT_AUX (dr)->base_element_aligned = true;
>> >>>
>> >>> Since you committed this patch (r241892), I'm seeing execution failures:
>> >>>   gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test
>> >>>   gcc.dg/vect/pr40074.c execution test
>> >>> on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9
>> >>> --with-fpu=neon-fp16
>> >>> (using qemu as simulator)
>> >>
>> >> The difference is that we now vectorize the testcase with versioning
>> >> for alignment (but it should never execute the vectorized variant).
>> >> I need arm peoples help to understand what is wrong.
>> > Hi All,
>> > I will look at it.
>> >
>>
>> Hi,
>>
>> This is causing new regressions on armeb:
>>   gcc.dg/vect/vect-strided-a-u8-i2-gap.c -flto -ffat-lto-objects
>> scan-tree-dump-times vect "vectorized 2 loops" 1
>>   gcc.dg/vect/vect-strided-a-u8-i2-gap.c scan-tree-dump-times vect
>> "vectorized 2 loops" 1
>
> It's actually an improvement as armeb can't do unaligned loads.
> Before the patch we versioned both loops for alignment (they
> can't be possibly both ali
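
A worked sketch of the `base_bitpos & -base_bitpos` step in the quoted patch: isolating the lowest set bit of a misalignment gives the largest power of two that still divides it. `known_alignment` is a hypothetical helper, it assumes `alignment_bits` is a power of two, and it applies the masking unconditionally, which is a simplification of the patch:

```cpp
#include <cassert>
#include <cstdint>

// Alignment (in bits) that can still be guaranteed for an object whose
// base has alignment_bits alignment and sits bitpos bits past that base.
static std::uint64_t known_alignment(std::uint64_t alignment_bits,
                                     std::uint64_t bitpos) {
  bitpos &= alignment_bits - 1;        // misalignment within the alignment
  if (bitpos != 0)
    return bitpos & (~bitpos + 1);     // lowest set bit, same as x & -x
  return alignment_bits;
}
```

For a 128-bit aligned base, an offset of 96 bits (binary 1100000) leaves only 32-bit alignment, while an offset of 256 bits keeps the full 128.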

Re: [ARM] PR 78253 do not resolve weak ref locally

2016-11-10 Thread Christophe Lyon
On 10 November 2016 at 11:05, Richard Earnshaw
 wrote:
> On 09/11/16 21:29, Christophe Lyon wrote:
>> Hi,
>>
>> PR 78253 shows that the handling of weak references has changed for
>> ARM with gcc-5.
>>
>> When r220674 was committed, default_binds_local_p_2 gained a new
>> parameter (weak_dominate), which, when true, implies that a reference
>> to a weak symbol defined locally will be resolved locally, even though
>> it could be overridden by a strong definition in another object file.
>>
>> With r220674, default_binds_local_p forces weak_dominate=true,
>> effectively changing the previous behavior.
>>
>> The attached patch introduces default_binds_local_p_4 which is a copy
>> of default_binds_local_p_2, but using weak_dominate=false, and updates
>> the ARM target to call default_binds_local_p_4 instead of
>> default_binds_local_p_2.
>>
>> I ran cross-tests on various arm* configurations with no regression,
>> and checked that the test attached to the original bugzilla now works
>> as expected.
>>
>> I am not sure why weak_dominate defaults to true, and I couldn't
>> really understand why by reading the threads related to r220674 and
>> following updates to default_binds_local_p_* which all deal with other
>> corner cases and do not discuss the weak_dominate parameter.
>>
>> Or should this patch be made more generic?
>>
>
> I certainly don't think it should be ARM specific.
That was my feeling too.

>
> The questions I have are:
>
> 1) What do other targets do today.  Are they the same, or different?

arm, aarch64, s390 use default_binds_local_p_2 since PR 65780, and
default_binds_local_p before that. Both have weak_dominate=true.
i386 has its own version, calling default_binds_local_p_3 with true
for weak_dominate.

But the behaviour of default_binds_local_p changed with r220674 as I said above.
See https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=220674 and
notice how weak_dominate was introduced

The original bug report is about a different case:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32219

The original patch submission is
https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00410.html
and the 1st version with weak_dominate is in:
https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00469.html
but it's not clear to me why this was introduced.

> 2) If different why?
on aarch64, although binds_local_p returns true, the relocations used when
building the function pointer are still the same (still via the GOT).

aarch64 has different logic than arm when accessing a symbol
(e.g. aarch64_classify_symbol).

> 3) Is the current behaviour really what was intended by the patch?  ie.
> Was the old behaviour actually wrong?
>
That's what I was wondering.
Before r220674, calling a weak function directly or via a function
pointer had the same effect (in other words, the function pointer
points to the actual implementation: the strong one if any, the weak
one otherwise).

After r220674, on arm the function pointer points to the weak
definition, which seems wrong to me, it should leave the actual
resolution to the linker.
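
A minimal sketch of the two access paths being compared, assuming GCC or Clang on an ELF target. `hook`, `call_direct` and `call_indirect` are hypothetical names. In a single translation unit with no strong definition both paths reach the weak one; the case from the PR needs a second object file providing a strong `hook`:

```cpp
#include <cassert>

// Weak definition: a strong definition of hook() in another object file
// would replace this one at link time.
extern "C" __attribute__((weak)) int hook() { return 1; }

using hook_fn = int (*)();

// Direct call and call through a function pointer. Per the discussion
// above, both are expected to reach whichever definition the linker picks.
static int call_direct() { return hook(); }

static int call_indirect() {
  hook_fn fp = &hook;
  return fp();
}
```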


> R.
>> Thanks,
>>
>> Christophe
>>
>


Re: [Patch, Fortran, committed] PR 46459: ICE (segfault): Invalid read in compare_actual_formal [error recovery]

2016-11-10 Thread Janus Weil
Hi Andre,

> well, is it really that obvious?

well ... what can I say. If you wanna be strict about it, I guess
there is no such thing as an "obvious patch". There is basically
always something that you can miss, or that can be improved. Mikael's
patch was obvious to me in the sense that it is clear and simple and
fixes the ICE without any side effects. Thus it's a clear improvement,
and it has been rotting in bugzilla for over five years.

Are you implying that it was premature to commit it as 'obvious'?


> It fixes the ICE, correct. But what about
> cases where the actual has an explicit interface, but is not a variable? Like
> in:
> [...]
> Can you comment
> having thought about this somewhat?

In fact I have not thought about any further cases. Since you're not
giving full examples, I can only guess what you mean: The cases in the
attachment are working as expected. Anything else?

Cheers,
Janus




> On Wed, 9 Nov 2016 21:42:15 +0100
> Janus Weil  wrote:
>
>> Hi all,
>>
>> I have committed yet another obvious ice-on-invalid fix:
>>
>> https://gcc.gnu.org/viewcvs?rev=242020&root=gcc&view=rev
>>
>> Cheers,
>> Janus
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de

  implicit none
  call sub(f())

contains

  function f()
    integer :: f
  end function

  ! sub() defined as in the pr.
  subroutine sub (j)
    integer, volatile :: j
  end subroutine

end

  type t
    integer :: i
  end type

  type(t) :: foo

  call sub(foo%i)

contains

  ! sub() defined as in the pr.
  subroutine sub (j)
    integer, volatile :: j
  end subroutine

end


[PATCH] Fix PR71762

2016-11-10 Thread Richard Biener

The following fixes PR71762 via reverting the transforms of
~X & Y to X < Y and similar because when the bools they apply to
are expanded to RTL undefined values are not reliably zero-extended
and thus the transform is invalid.  Ensuring the zero-extension
is too costly IMHO and the proper fix is to move the transform
to RTL where we can check known-zero-bits to validate validity
or to fix GIMPLE to not have operations on types not matching their mode
in precision.
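
As a rough illustration of why missing zero-extension matters (this is not
the testcase from the PR, and the helper names are made up): the two forms
agree as long as both operands are really 0 or 1, and disagree once a
register holds garbage in its upper bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-bit semantics of (~X & Y) != 0: only bit 0 is significant.  */
static bool
andnot_bit0 (unsigned x, unsigned y)
{
  return ((~x & y) & 1u) != 0;
}

/* The replacement X < Y, evaluated on the whole register.  */
static bool
less_than_full (unsigned x, unsigned y)
{
  return x < y;
}

/* Return true when both forms agree for the given register contents.  */
bool
forms_agree (unsigned x, unsigned y)
{
  return andnot_bit0 (x, y) == less_than_full (x, y);
}

void
self_check (void)
{
  /* Properly zero-extended booleans: the transform is fine.  */
  assert (forms_agree (0, 0));
  assert (forms_agree (0, 1));
  assert (forms_agree (1, 0));
  assert (forms_agree (1, 1));
  /* X has bit 0 clear but garbage above it: the forms disagree.  */
  assert (!forms_agree (2, 1));
}
```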

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk
and branches?

Any takers for the RTL implementation?

Thanks,
Richard.

2016-11-10  Richard Biener  

PR middle-end/71762
* match.pd ((~X & Y) -> X < Y, (X & ~Y) -> Y < X,
(~X | Y) -> X <= Y, (X | ~Y) -> Y <= X): Remove.

* gcc.dg/torture/pr71762-1.c: New testcase.
* gcc.dg/torture/pr71762-2.c: Likewise.
* gcc.dg/torture/pr71762-3.c: Likewise.
* gcc.dg/tree-ssa/forwprop-28.c: XFAIL.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 242004)
+++ gcc/match.pd(working copy)
@@ -944,33 +944,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (op:c truth_valued_p@0 (logical_inverted_value @0))
   { constant_boolean_node (op == NE_EXPR ? true : false, type); }))
 
-/* If arg1 and arg2 are booleans (or any single bit type)
-   then try to simplify:
-
-   (~X & Y) -> X < Y
-   (X & ~Y) -> Y < X
-   (~X | Y) -> X <= Y
-   (X | ~Y) -> Y <= X
-
-   But only do this if our result feeds into a comparison as
-   this transformation is not always a win, particularly on
-   targets with and-not instructions.
-   -> simplify_bitwise_binary_boolean */
-(simplify
-  (ne (bit_and:c (bit_not @0) @1) integer_zerop)
-  (if (INTEGRAL_TYPE_P (TREE_TYPE (@1))
-   && TYPE_PRECISION (TREE_TYPE (@1)) == 1)
-   (if (TYPE_UNSIGNED (TREE_TYPE (@1)))
-(lt @0 @1)
-(gt @0 @1))))
-(simplify
-  (ne (bit_ior:c (bit_not @0) @1) integer_zerop)
-  (if (INTEGRAL_TYPE_P (TREE_TYPE (@1))
-   && TYPE_PRECISION (TREE_TYPE (@1)) == 1)
-   (if (TYPE_UNSIGNED (TREE_TYPE (@1)))
-(le @0 @1)
-(ge @0 @1))))
-
 /* ~~x -> x */
 (simplify
   (bit_not (bit_not @0))
Index: gcc/testsuite/gcc.dg/torture/pr71762-1.c
===
--- gcc/testsuite/gcc.dg/torture/pr71762-1.c(revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr71762-1.c(working copy)
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fdisable-rtl-init-regs" } */
+
+static _Bool
+foo (_Bool a, _Bool b)
+{
+  int x = a && ! b;
+  return x != 0;
+}
+
+int y = 1;
+int main()
+{
+  _Bool x;
+  if (foo (x, y))
+__builtin_abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/torture/pr71762-2.c
===
--- gcc/testsuite/gcc.dg/torture/pr71762-2.c(revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr71762-2.c(working copy)
@@ -0,0 +1,17 @@
+/* { dg-do run } */
+
+static _Bool
+foo (_Bool a, _Bool b)
+{
+  int x = a && ! b;
+  return x != 0;
+}
+
+int y = 1;
+int main()
+{
+  _Bool x[32];
+  if (foo (x[1], y))
+__builtin_abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/torture/pr71762-3.c
===
--- gcc/testsuite/gcc.dg/torture/pr71762-3.c(revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr71762-3.c(working copy)
@@ -0,0 +1,22 @@
+/* { dg-do run } */
+
+static _Bool
+foo (_Bool a, _Bool b)
+{
+  int x = a && ! b;
+  return x != 0;
+}
+
+int y = 1;
+int main()
+{
+  register _Bool x
+  /* Add register spec for the argv parameter to main.  */
+#if __i386__ || __x86_64__
+  __asm__("%esi")
+#endif
+;
+  if (foo (x, y))
+__builtin_abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/tree-ssa/forwprop-28.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/forwprop-28.c (revision 242032)
+++ gcc/testsuite/gcc.dg/tree-ssa/forwprop-28.c (working copy)
@@ -83,6 +83,8 @@ test_8 (int code)
to a ordered compare.  But the transform does not trigger if we transform
the negated code == 22 compare to code != 22 first.  It turns out if
we do that we even generate better code on x86 at least.  */
+/* ???  As PR71762 notices this transform causes wrong-code issues in RTL
+   with one uninitialized operand, thus it has been disabled.  */
 
-/* { dg-final { scan-tree-dump-times "simplified to if \\\(\[^ ]* \[<>\]" 4 "forwprop1"} } */
+/* { dg-final { scan-tree-dump-times "simplified to if \\\(\[^ ]* \[<>\]" 4 "forwprop1" { xfail *-*-* } } } */
 


Re: [ipa-vrp] ice in set_value_range

2016-11-10 Thread David Edelsohn
Kugan

Is there a PR for this failure?  It broke bootstrap on AIX as well and
I only was able to track it to your patch last night.

Thanks, David


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread Martin Liška
PING^2

On 10/31/2016 10:13 AM, Martin Liška wrote:
> PING^1
> 
> On 10/13/2016 05:34 PM, Martin Liška wrote:
>> Hello.
>>
>> As it's very hard to guess from GCC driver whether a target supports atomic updates
>> for GCOV counter or not, I decided to come up with a new option value (maybe-atomic),
>> that would be transformed in a corresponding value (single or atomic) in tree-profile.c.
>> The GCC driver selects the option when -pthread is present in the command line.
>>
>> That should fix all tests failures seen on AIX target.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
>> Martin
>>



Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset (pr 77608)

2016-11-10 Thread Richard Biener
On Tue, Nov 8, 2016 at 5:03 AM, Martin Sebor  wrote:
> It's taken me longer than I expected to finally get back to this
> project.  Sorry about the delay.
>
>   https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01110.html
>
> Attached is an updated patch with this enhancement and reflecting
> your previous comment.
>
> Besides running the GCC test suite I tested the patch by building
> Binutils and the Linux kernel.  It found one stpcpy-related overflow
> in Binutils that I'm looking into and reduced by one the number of
> problems reported by the -Wformat-length option in the kernel (I
> haven't yet checked which one it eliminated).
>
> Although I'm not done investigating the Binutils problem I'm posting
> the patch for review now to allow for comments before stage 1 ends.

@@ -158,14 +170,149 @@ compute_object_offset (const_tree expr, const_tree var)
   return size_binop (code, base, off);
 }

+static bool
+operand_unsigned_p (tree op)
+{
+  if (TREE_CODE (op) == SSA_NAME)

new functions need a comment.  But maybe you want to use tree_expr_nonnegative_p
to also allow signed but known positive ones?

+/* Fill the 2-element OFFRANGE array with the range of values OFF
+   is known to be in.  Postcodition: OFFRANGE[0] <= OFFRANGE[1].  */
+
+static bool
+get_offset_range (tree off, HOST_WIDE_INT offrange[2])
+{
+  STRIP_NOPS (off);

why strip nops (even sign changes!) here?  Why below convert things
via to_uhwi when offrange is of type signed HOST_WIDE_INT[2]?

+ gimple *def = SSA_NAME_DEF_STMT (off);
+ if (is_gimple_assign (def))
+   {
+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == PLUS_EXPR)
+   {
+ /* Handle offset in the form VAR + CST where VAR's type
+is unsigned so the offset must be the greater of
+OFFRANGE[0] and CST.  This assumes the PLUS_EXPR
+is in a canonical form with CST second.  */
+ tree rhs2 = gimple_assign_rhs2 (def);

err, what?  What about overflow?  Aren't you just trying to decompose
'off' into a variable and a constant part here and somehow extracting a
range for the variable part?  So why not just do that?

+  else if (range_type == VR_ANTI_RANGE)
+   {
+ offrange[0] = max.to_uhwi () + 1;
+ offrange[1] = min.to_uhwi () - 1;
+ return true;
+   }

first of all, how do you know it fits uhwi?  Second, from ~[5, 9] you get
[10, 4] !?  That looks bogus (and contrary to the function comment
postcondition)
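
To spell the arithmetic out, here is a small sketch that mirrors the two
assignments of the VR_ANTI_RANGE branch using plain long long instead of
wide_int (the function names are made up):

```c
#include <assert.h>

/* Lower bound as computed by the quoted branch: max + 1.  */
long long
anti_range_lo (long long min, long long max)
{
  assert (min <= max);
  return max + 1;
}

/* Upper bound as computed by the quoted branch: min - 1.  */
long long
anti_range_hi (long long min, long long max)
{
  assert (min <= max);
  return min - 1;
}

/* Postcondition from the function comment: OFFRANGE[0] <= OFFRANGE[1].  */
int
postcondition_holds (long long min, long long max)
{
  return anti_range_lo (min, max) <= anti_range_hi (min, max);
}
```

For ~[5, 9] this gives 10 and 4, and since max + 1 is always greater than
min - 1 when min <= max, the postcondition cannot hold for any anti-range.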

+  else if (range_type == VR_VARYING)
+   {
+ gimple *def = SSA_NAME_DEF_STMT (off);
+ if (is_gimple_assign (def))
+   {
+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == NOP_EXPR)
+   {

please trust range-info instead of doing your own little VRP here.
VR_VARYING -> return false.

stopping review here noting that other parts of the compiler
split constant parts from variable parts and it looks like this is
what you want to do here?  That is, enhance

static vec object_sizes[4];

and cache a SSA-NAME, constant offset pair in addition?  Or just a range
(range of that SSA name + offset), if that's good enough -- wait, that's
what you get from get_range_info!

So I'm not sure where the whole complication in the patch comes from...

Possibly from the fact that VRP on pointers is limited and thus &a[i] + 5
doesn't get you a range for i + 5?

Richard.

> Martin
>
> PS The tests added in the patch (but nothing else) depend on
> the changes in the patch for c/53562:
>
>   https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00483.html
>


[PATCH] Fix print_node for CONSTRUCTORs

2016-11-10 Thread Martin Liška
Hello.

Following patch fixes indentation of print_node when printing a constructor
that has some equal elements. Current implementation caches tree to prevent deep
debug outputs. Such behavior is undesired for ctor elements. Apart from that,
I switch to hash_set for a table that is used for tree node caching.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Before:
  constant 1>
val  
constant 120>
idx  constant 2> val 
idx  constant 3> val 
idx  constant 4> val 
idx  constant 5> val >

After:
  constant 1>
val  
constant 120>
idx  constant 2>
val  
constant 120>
idx  constant 3>
val  
constant 120>
idx  constant 4>
val  
constant 120>
idx  constant 5>
val  
constant 120>>

Ready to be installed?
Martin
>From 6d18ff00ec1d8e6a8a154fbb70af25b2dda8165e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 9 Nov 2016 16:28:52 +0100
Subject: [PATCH] Fix print_node for CONSTRUCTORs

gcc/ChangeLog:

2016-11-10  Martin Liska  

	* print-tree.c (struct bucket): Remove.
	(print_node): Add new argument which drives whether a tree node
	is printed briefly or not.
	(debug_tree): Replace a custom hash table with hash_set.
	* print-tree.h (print_node): Add the argument.
---
 gcc/print-tree.c | 43 +++
 gcc/print-tree.h |  3 ++-
 2 files changed, 17 insertions(+), 29 deletions(-)

diff --git a/gcc/print-tree.c b/gcc/print-tree.c
index 8c63cb8..f3ee04c 100644
--- a/gcc/print-tree.c
+++ b/gcc/print-tree.c
@@ -33,19 +33,14 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-pretty-print.h" /* FIXME */
 #include "tree-cfg.h"
 #include "tree-dump.h"
+#include "print-tree.h"
 
 /* Define the hash table of nodes already seen.
Such nodes are not repeated; brief cross-references are used.  */
 
 #define HASH_SIZE 37
 
-struct bucket
-{
-  tree node;
-  struct bucket *next;
-};
-
-static struct bucket **table;
+static hash_set *table = NULL;
 
 /* Print PREFIX and ADDR to FILE.  */
 void
@@ -176,10 +171,9 @@ indent_to (FILE *file, int column)
starting in column INDENT.  */
 
 void
-print_node (FILE *file, const char *prefix, tree node, int indent)
+print_node (FILE *file, const char *prefix, tree node, int indent,
+	bool brief_for_visited)
 {
-  int hash;
-  struct bucket *b;
   machine_mode mode;
   enum tree_code_class tclass;
   int len;
@@ -219,21 +213,14 @@ print_node (FILE *file, const char *prefix, tree node, int indent)
   /* Allow this function to be called if the table is not there.  */
   if (table)
 {
-  hash = ((uintptr_t) node) % HASH_SIZE;
-
   /* If node is in the table, just mention its address.  */
-  for (b = table[hash]; b; b = b->next)
-	if (b->node == node)
-	  {
-	print_node_brief (file, prefix, node, indent);
-	return;
-	  }
+  if (table->contains (node) && brief_for_visited)
+	{
+	  print_node_brief (file, prefix, node, indent);
+	  return;
+	}
 
-  /* Add this node to the table.  */
-  b = XNEW (struct bucket);
-  b->node = node;
-  b->next = table[hash];
-  table[hash] = b;
+  table->add (node);
 }
 
   /* Indent to the specified column, since this is the long form.  */
@@ -846,8 +833,8 @@ print_node (FILE *file, const char *prefix, tree node, int indent)
 	FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (node),
   cnt, index, value)
 	  {
-		print_node (file, "idx", index, indent + 4);
-		print_node (file, "val", value, indent + 4);
+		print_node (file, "idx", index, indent + 4, false);
+		print_node (file, "val", value, indent + 4, false);
 	  }
 	  }
 	  break;
@@ -997,10 +984,10 @@ print_node (FILE *file, const char *prefix, tree node, int indent)
 DEBUG_FUNCTION void
 debug_tree (tree node)
 {
-  table = XCNEWVEC (struct bucket *, HASH_SIZE);
+  table = new hash_set (HASH_SIZE);
   print_node (stderr, "", node, 0);
-  free (table);
-  table = 0;
+  delete table;
+  table = NULL;
   putc ('\n', stderr);
 }
 
diff --git a/gcc/print-tree.h b/gcc/print-tree.h
index 124deab..fd610f9 100644
--- a/gcc/print-tree.h
+++ b/gcc/print-tree.h
@@ -38,7 +38,8 @@ extern void debug_raw (vec &ref);
 extern void debug_raw (vec *ptr);
 #ifdef BUFSIZ
 extern void dump_addr (FILE*, const char *, const void *);
-extern void print_node (FILE *, const char *, tree, int);
+extern void print_node (FILE *, const char *, tree, int,
+			bool brief_for_visited = true);
 extern void print_node_brief (FILE *, const char *, const_tree, int);
 extern void indent_to (FILE *, int);
 #endif
-- 
2.10.1



Re: RFA: PATCH to gengtype to avoid putting tree_node support in front end objects

2016-11-10 Thread Jakub Jelinek
On Thu, Oct 27, 2016 at 09:36:09AM -0400, Jason Merrill wrote:
> Currently, the way gengtype works it scans the list of source files
> with front end files at the end, and pushes data structures onto a
> stack.  It then processes the stack in LIFO order, so that data
> structures from front ends are handled first.  As a result, if a GTY
> data structure in a front end depends on tree_node, gengtype happily
> puts gt_ggc_mx(tree_node*&) in a front end file, leading to link
> errors on all other front ends.
> 
> This patch avoids this problem by appending to the list of data
> structures so that they are processed in FIFO order, and so tree_node
> gets handled in gtype-desc.o.
> 
> Tested x86_64-pc-linux-gnu, OK for trunk?

> commit 487a1c95c0d3169b2041942ff4f8d71c9ff689eb
> Author: Jason Merrill 
> Date:   Wed Oct 26 23:12:23 2016 -0400
> 
> * gengtype.c (new_structure): Append to structures list.
> 
> (find_structure): Likewise.

Please remove the blank line in the ChangeLog.

Reviewing the differences it creates is hard, because all the generated
files now have their functions emitted in the reverse of the previous
order, so I only looked at files whose size changed.  Beyond gtype.state,
in my case that is only gt-tree-phinodes.h, which lost
void
gt_ggc_mx (struct gimple *& x)
{
  if (x)
gt_ggc_mx_gimple ((void *) x);
}
and
void
gt_pch_nx (struct gimple *& x)
{
  if (x)
gt_pch_nx_gimple ((void *) x);
}
and gtype-desc.c which didn't contain those but now it does (for
gtype-desc.c it is hard to find out due to the reordering what else
has changed, but as gt-tree-phinodes.h shrunk by 170 characters and
gtype-desc.c grew by 170 characters, I'd think it is all that changed).
I believe those routines belong to gtype-desc.c, that is where similar
ones for tree_node, etc. are, tree-phinodes.h certainly isn't the header
that defines gimple.
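
The ordering effect described in the quoted mail can be sketched with a
plain linked list.  This is only an illustration, not gengtype code; the
struct and function names are made up.

```c
#include <assert.h>
#include <stddef.h>

struct structure
{
  int order;                    /* registration order, 0 = seen first */
  struct structure *next;
};

/* Old behaviour: push onto the front, so the walk is newest-first.  */
static void
prepend (struct structure **head, struct structure *s)
{
  s->next = *head;
  *head = s;
}

/* New behaviour: add at the tail, so the walk is oldest-first.  */
static void
append (struct structure **head, struct structure *s)
{
  struct structure **tail = head;
  while (*tail)
    tail = &(*tail)->next;
  s->next = NULL;
  *tail = s;
}

/* Register a middle-end structure (order 0) and then a front-end
   structure (order 1); return the order of the one processed first.  */
int
first_processed (int fifo)
{
  struct structure middle_end = { 0, NULL };
  struct structure front_end = { 1, NULL };
  struct structure *list = NULL;
  void (*add) (struct structure **, struct structure *)
    = fifo ? append : prepend;

  add (&list, &middle_end);
  add (&list, &front_end);
  assert (list != NULL);
  return list->order;
}
```

With prepending, the front-end structure is handled first, which is how
support routines for shared types could end up in front-end files; with
appending, the middle-end structure comes first.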

So I think this patch is ok for trunk.  Thanks.

Jakub


Re: [PATCH] DWARF: make signedness explicit for enumerator const values

2016-11-10 Thread Mark Wielaard
On Thu, 2016-10-13 at 18:12 +0200, Pierre-Marie de Rodat wrote:
> Currently, the DWARF description does not specify the signedness of the
> representation of enumeration types.  This is a problem in some
> contexts where DWARF consumers need to determine if value X is greater
> than value Y.
> 
> For instance in Ada:
> 
> type Enum_Type is ( A, B, C, D);
> for Enum_Type use (-1, 0, 1, 2);
> 
> type Rec_Type (E : Enum_Type) is record
>when A .. B => null;
>when others => B : Boolean;
> end record;
> 
> The above can be described in DWARF the following way:
> 
> DW_TAG_enumeration_type(Enum_Type)
> | DW_AT_byte_size: 1
>   DW_TAG_enumerator(A)
>   | DW_AT_const_value: -1
>   DW_TAG_enumerator(B)
>   | DW_AT_const_value: 0
>   DW_TAG_enumerator(C)
>   | DW_AT_const_value: 1
>   DW_TAG_enumerator(D)
>   | DW_AT_const_value: 2
> 
> DW_TAG_structure_type(Rec_Type)
>   DW_TAG_member(E)
>   | DW_AT_type: 
>   DW_TAG_variant_part
>   | DW_AT_discr: 
> DW_TAG_variant
> | DW_AT_discr_list: DW_DSC_range 0x7f 0
> DW_TAG_variant
> | DW_TAG_member(b)
> 
> DWARF consumers need to know that enumerators (A, B, C and D) are signed
> in order to determine the set of E values for which Rec_Type has a B
> field.  In practice, they need to know how to interpret the 0x7f LEB128
> number above (-1, not 127).
> 
> There seem to be only two alternatives to solve this issue: one is to
> add a DW_AT_type attribute to DW_TAG_enumeration_type DIEs to make it
> point to a base type that specifies the signedness.  The other is to
> make sure the form of the DW_AT_const_value attribute carries the
> signedness information.  This patch implements the latter.

IMHO having an explicit DW_AT_type pointing at the base type with size
and encoding for the DW_TAG_enumeration_type is better for consumers than
having to try and interpret the DW_FORM used to encode the values.

Alternatively could we just attach a DW_AT_encoding to the
DW_TAG_enumeration_type? The spec doesn't list it as one of the
attributes for an enumeration_type, but it makes sense given it already
carries bit/byte size attributes.
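
For reference, the ambiguity of the 0x7f byte in the quoted
DW_AT_discr_list can be sketched with two small decoders.  These are
illustrative helpers, not code from any DWARF consumer, and they assume a
well-formed encoding of at most nine bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode an unsigned LEB128 number starting at P.  */
uint64_t
decode_uleb128 (const uint8_t *p)
{
  uint64_t result = 0;
  unsigned shift = 0;
  uint8_t byte;

  assert (p != NULL);
  do
    {
      byte = *p++;
      result |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
    }
  while (byte & 0x80);
  return result;
}

/* Decode a signed LEB128 number starting at P.  */
int64_t
decode_sleb128 (const uint8_t *p)
{
  uint64_t result = 0;
  unsigned shift = 0;
  uint8_t byte;

  assert (p != NULL);
  do
    {
      byte = *p++;
      result |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
    }
  while (byte & 0x80);
  /* Sign-extend from bit 6 of the last byte.  */
  if (shift < 64 && (byte & 0x40))
    result |= ~(uint64_t) 0 << shift;
  return (int64_t) result;
}
```

The same byte decodes to 127 with the first function and to -1 with the
second, so a consumer has to learn the signedness from somewhere else.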

Thanks,

Mark


Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-10 Thread Richard Biener
On Thu, 10 Nov 2016, Richard Biener wrote:

> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
> 
> > Richard,
> > 
> > Here is the updated patch 3.
> > 
> > I checked that all new tests related to epilogue vectorization passed with it.
> > 
> > Your comments will be appreciated.
> 
> A lot better now.  Instead of the ->aux dance I now prefer to
> pass the original loop's loop_vinfo to vect_analyze_loop as
> optional argument (if non-NULL we analyze the epilogue of that 
> loop_vinfo).  OTOH I remember we mainly use it to get at the
> original vectorization factor?  So we can pass down an (optional)
> forced vectorization factor as well?

Btw, I wonder if you can produce a single patch containing just
epilogue vectorization, that is combine patches 1-3 but rip out
changes only needed by later patches?

Thanks,
Richard.

> Richard.
> 
> > 2016-11-08 15:38 GMT+03:00 Richard Biener :
> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
> > >
> > >> Hi Richard,
> > >>
> > >> I did not understand your last remark:
> > >>
> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
> > >> >
> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> > >> >   && dump_enabled_p ())
> > >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> > >> >"loop vectorized\n");
> > >> > -   vect_transform_loop (loop_vinfo);
> > >> > +   new_loop = vect_transform_loop (loop_vinfo);
> > >> > num_vectorized_loops++;
> > >> >/* Now that the loop has been vectorized, allow it to be 
> > >> > unrolled
> > >> >   etc.  */
> > >> >  loop->force_vectorize = false;
> > >> >
> > >> > +   /* Add new loop to a processing queue.  To make it easier
> > >> > +  to match loop and its epilogue vectorization in dumps
> > >> > +  put new loop as the next loop to process.  */
> > >> > +   if (new_loop)
> > >> > + {
> > >> > +   loops.safe_insert (i + 1, new_loop->num);
> > >> > +   vect_loops_num = number_of_loops (cfun);
> > >> > + }
> > >> >
> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> > >> f> unction which will set up stuff properly (and also perform
> > >> > the if-conversion of the epilogue there).
> > >> >
> > >> > That said, if we can get in non-masked epilogue vectorization
> > >> > separately that would be great.
> > >>
> > >> Could you please clarify your proposal.
> > >
> > > When a loop was vectorized set things up to immediately vectorize
> > > its epilogue, avoiding changing the loop iteration and avoiding
> > > the re-use of ->aux.
> > >
> > > Richard.
> > >
> > >> Thanks.
> > >> Yuri.
> > >>
> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener :
> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
> > >> >
> > >> >> Hi All,
> > >> >>
> > >> >> I re-send all patches sent by Ilya earlier for review which support
> > >> >> vectorization of loop epilogues and loops with low trip count. We
> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> > >> >> approved by Jeff.
> > >> >>
> > >> >> I did re-base of all patches and performed bootstrapping and
> > >> >> regression testing that did not show any new failures. Also all
> > >> >> changes related to new vect_do_peeling algorithm have been changed
> > >> >> accordingly.
> > >> >>
> > >> >> Is it OK for trunk?
> > >> >
> > >> > I would have preferred that the series up to -03-nomask-tails would
> > >> > _only_ contain epilogue loop vectorization changes but unfortunately
> > >> > the patchset is oddly separated.
> > >> >
> > >> > I have a comment on that part nevertheless:
> > >> >
> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> > >> > loop_vinfo)
> > >> >/* Check if we can possibly peel the loop.  */
> > >> >if (!vect_can_advance_ivs_p (loop_vinfo)
> > >> >|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> > >> > -  || loop->inner)
> > >> > +  || loop->inner
> > >> > +  /* Required peeling was performed in prologue and
> > >> > +is not required for epilogue.  */
> > >> > +  || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> > >> >  do_peeling = false;
> > >> >
> > >> >if (do_peeling
> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> > >> > loop_vinfo)
> > >> >
> > >> >do_versioning =
> > >> > optimize_loop_nest_for_speed_p (loop)
> > >> > -   && (!loop->inner); /* FORNOW */
> > >> > +   && (!loop->inner) /* FORNOW */
> > >> > +/* Required versioning was performed for the
> > >> > +  original loop and is not required for epilogue.  */
> > >> > +   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
> > >> >
> > >> >if (do_versioning)
> > >> >  {
> > >> >
> > >> > please do that check in the single caller of this function.
> > >> >
> > >> > Otherwise I still dislike the new ->aux use and I believe that simply
> > >> > passing down info from the processed parent

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-10 Thread Richard Biener
On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:

> Richard,
> 
> Here is the updated patch 3.
> 
> I checked that all new tests related to epilogue vectorization passed with it.
> 
> Your comments will be appreciated.

A lot better now.  Instead of the ->aux dance I now prefer to
pass the original loop's loop_vinfo to vect_analyze_loop as
optional argument (if non-NULL we analyze the epilogue of that 
loop_vinfo).  OTOH I remember we mainly use it to get at the
original vectorization factor?  So we can pass down an (optional)
forced vectorization factor as well?

Richard.

> 2016-11-08 15:38 GMT+03:00 Richard Biener :
> > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
> >
> >> Hi Richard,
> >>
> >> I did not understand your last remark:
> >>
> >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >> >
> >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >> >   && dump_enabled_p ())
> >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> >> >"loop vectorized\n");
> >> > -   vect_transform_loop (loop_vinfo);
> >> > +   new_loop = vect_transform_loop (loop_vinfo);
> >> > num_vectorized_loops++;
> >> >/* Now that the loop has been vectorized, allow it to be unrolled
> >> >   etc.  */
> >> >  loop->force_vectorize = false;
> >> >
> >> > +   /* Add new loop to a processing queue.  To make it easier
> >> > +  to match loop and its epilogue vectorization in dumps
> >> > +  put new loop as the next loop to process.  */
> >> > +   if (new_loop)
> >> > + {
> >> > +   loops.safe_insert (i + 1, new_loop->num);
> >> > +   vect_loops_num = number_of_loops (cfun);
> >> > + }
> >> >
> >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> >> > function which will set up stuff properly (and also perform
> >> > the if-conversion of the epilogue there).
> >> >
> >> > That said, if we can get in non-masked epilogue vectorization
> >> > separately that would be great.
> >>
> >> Could you please clarify your proposal.
> >
> > When a loop was vectorized set things up to immediately vectorize
> > its epilogue, avoiding changing the loop iteration and avoiding
> > the re-use of ->aux.
> >
> > Richard.
> >
> >> Thanks.
> >> Yuri.
> >>
> >> 2016-11-02 15:27 GMT+03:00 Richard Biener :
> >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
> >> >
> >> >> Hi All,
> >> >>
> >> >> I re-send all patches sent by Ilya earlier for review which support
> >> >> vectorization of loop epilogues and loops with low trip count. We
> >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> >> >> approved by Jeff.
> >> >>
> >> >> I did re-base of all patches and performed bootstrapping and
> >> >> regression testing that did not show any new failures. Also all
> >> >> changes related to new vect_do_peeling algorithm have been changed
> >> >> accordingly.
> >> >>
> >> >> Is it OK for trunk?
> >> >
> >> > I would have preferred that the series up to -03-nomask-tails would
> >> > _only_ contain epilogue loop vectorization changes but unfortunately
> >> > the patchset is oddly separated.
> >> >
> >> > I have a comment on that part nevertheless:
> >> >
> >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> >> > loop_vinfo)
> >> >/* Check if we can possibly peel the loop.  */
> >> >if (!vect_can_advance_ivs_p (loop_vinfo)
> >> >|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> >> > -  || loop->inner)
> >> > +  || loop->inner
> >> > +  /* Required peeling was performed in prologue and
> >> > +is not required for epilogue.  */
> >> > +  || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> >> >  do_peeling = false;
> >> >
> >> >if (do_peeling
> >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> >> > loop_vinfo)
> >> >
> >> >do_versioning =
> >> > optimize_loop_nest_for_speed_p (loop)
> >> > -   && (!loop->inner); /* FORNOW */
> >> > +   && (!loop->inner) /* FORNOW */
> >> > +/* Required versioning was performed for the
> >> > +  original loop and is not required for epilogue.  */
> >> > +   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
> >> >
> >> >if (do_versioning)
> >> >  {
> >> >
> >> > please do that check in the single caller of this function.
> >> >
> >> > Otherwise I still dislike the new ->aux use and I believe that simply
> >> > passing down info from the processed parent would be _much_ cleaner.
> >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >> >
> >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >> > && dump_enabled_p ())
> >> >dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> >> > "loop vectorized\n");
> >> > -   vect_transform_loop (loop_vinfo);
> >> > +   new_loop = vect_transform_loop (loop_vinfo);
> >> > num_vectorized_loops++;
> >> > /*

Re: [PATCH] libiberty: Add Rust symbol demangling.

2016-11-10 Thread Mark Wielaard
On Thu, 2016-11-03 at 18:39 +0100, Mark Wielaard wrote:
> Adds Rust symbol demangler. Rust mangles symbols using GNU_V3 style,
> adding a hash and various special character substitutions. This adds
> a new rust style to cplus_demangle and adds 3 helper functions
> rust_demangle, rust_demangle_sym and rust_is_mangled.
> 
> rust-demangle.c was written by David. Mark did the code formatting to
> GNU style and integration into the gcc/libiberty build system and
> testsuite.
> 
> The original code was written for the perf tool. David agreed with
> submitting it for gcc/libiberty so binutils, gdb, valgrind, etc.
> can also easily reuse it. He has sent request-assign to ass...@gnu.org.
> I already have write after approval for gcc.

Ping.

Any comments are welcome. A variant of this is being used already in
perf and valgrind. It would be nice to make it part of upstream
libiberty.

I did also apply this patch to binutils-gdb/libiberty and with a
oneliner patch (*) it allows c++filt to handle rust mangled symbols and
accept --format=rust (and displays it with --help).

I did not yet test gdb, which would need a similar small tweak to its
current language settings. Any feedback on whether it also works as
expected with gdb appreciated.

Thanks,

Mark

(*)

diff --git a/binutils/cxxfilt.c b/binutils/cxxfilt.c
index d5863ee..7cf9458 100644
--- a/binutils/cxxfilt.c
+++ b/binutils/cxxfilt.c
@@ -242,6 +242,7 @@ main (int argc, char **argv)
 case gnu_v3_demangling:
 case dlang_demangling:
 case auto_demangling:
+case rust_demangling:
   valid_symbols = standard_symbol_characters ();
   break;
 case hp_demangling:



Patch ping: Fix dwarf2out related bootstrap failure on Solaris (PR debug/78191)

2016-11-10 Thread Jakub Jelinek
On Thu, Nov 03, 2016 at 05:42:51PM +0100, Jakub Jelinek wrote:
> Bootstrapped/regtested on x86_64-linux and i686-linux and Rainer has kindly
> tested it on Solaris, ok for trunk?
> 
> 2016-11-03  Jakub Jelinek  
> 
>   * dwarf2out.c (size_of_discr_list): Fix typo in function comment.
> 
>   PR debug/78191
>   * dwarf2out.c (abbrev_opt_base_type_end): New variable.
>   (die_abbrev_cmp): Sort dies with die_abbrev smaller than
>   abbrev_opt_base_type_end only by increasing die_abbrev, before
>   any other dies.
>   (optimize_abbrev_table): Don't change abbrev numbers of
>   base types and CU or optimize implicit consts in them if
>   calc_base_type_die_sizes has been called during build_abbrev_table.
>   (calc_base_type_die_sizes): If abbrev_opt_start, set
>   abbrev_opt_base_type_end to one plus largest base type's
>   die_abbrev.

I'd like to ping this patch, it fixes a bootstrap failure on Solaris
(both sparc and x86), so I think it is high priority.

Thanks.

Jakub


Re: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)

2016-11-10 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 12:02:45PM +0100, Martin Liška wrote:
> >From fb4b852a17656309e6acfb8da97cf9bce4b3b176 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Wed, 9 Nov 2016 11:52:00 +0100
> Subject: [PATCH] Create live_switch_vars conditionally (PR sanitizer/78270)
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-11-09  Martin Liska  
> 

Missing
PR sanitizer/78270
note.

>   * gcc.dg/asan/pr78269.c: New test.

Misnamed test, should be pr78270.c.
> 
> gcc/ChangeLog:
> 
> 2016-11-09  Martin Liska  
> 

Also missing the PR line.

>   * gimplify.c (gimplify_switch_expr): Create live_switch_vars
>   only when SWITCH_BODY is a BIND_EXPR.

Ok with those nits fixed.

Jakub


Re: [PATCH] debug/PR78112: remove recent duplicates for DW_TAG_subprogram attributes

2016-11-10 Thread Pierre-Marie de Rodat

On 11/09/2016 10:02 AM, Richard Biener wrote:

> Using scan-assembler-times on the dwarf?


I always have a bad feeling about this kind of check as I imagine it can 
break very easily with legit changes. But I have nothing better to 
contribute, so I’ve added one such testcase. ;-)



Ok to commit?


Ok.


This is committed as r242035. Thanks!

--
Pierre-Marie de Rodat


[PATCH] [ARC] New option handling, refurbish multilib support.

2016-11-10 Thread Claudiu Zissulescu
Hi,

Please find the revised patch, which includes the refurbishing of the
mmpy-option option, and a new comment on the DEFAULT_arc_fpu_build
define. As for the last suggestion, my proposal is to have a later
patch on the topic of .cpu, synced with a related binutils patch.

OK to apply?
Claudiu

gcc/
2016-05-09  Claudiu Zissulescu  

* config/arc/arc-arch.h: New file.
* config/arc/arc-arches.def: Likewise.
* config/arc/arc-cpus.def: Likewise.
* config/arc/arc-options.def: Likewise.
* config/arc/t-multilib: Likewise.
* config/arc/genmultilib.awk: Likewise.
* config/arc/genoptions.awk: Likewise.
* config/arc/arc-tables.opt: Likewise.
* config/arc/driver-arc.c: Likewise.
* common/config/arc/arc-common.c (arc_handle_option): Trace
toggled options.
* config.gcc (arc*-*-*): Add arc-tables.opt to arc's extra
options; check for supported cpu against arc-cpus.def file.
(arc*-*-elf*, arc*-*-linux-uclibc*): Use new make fragment; define
TARGET_CPU_BUILD macro; add driver-arc.o as an extra object.
* config/arc/arc-c.def: Add emacs local variables.
* config/arc/arc-opts.h (processor_type): Use arc-cpus.def file.
(FPU_FPUS, FPU_FPUD, FPU_FPUDA, FPU_FPUDA_DIV, FPU_FPUDA_FMA)
(FPU_FPUDA_ALL, FPU_FPUS_DIV, FPU_FPUS_FMA, FPU_FPUS_ALL)
(FPU_FPUD_DIV, FPU_FPUD_FMA, FPU_FPUD_ALL): New defines.
(DEFAULT_arc_fpu_build): Define.
(DEFAULT_arc_mpy_option): Define.
* config/arc/arc-protos.h (arc_init): Delete.
* config/arc/arc.c (arc_cpu_name): New variable.
(arc_selected_cpu, arc_selected_arch, arc_arcem, arc_archs)
(arc_arc700, arc_arc600, arc_arc601): New variable.
(arc_init): Add static; remove selection of default tune value,
cleanup obsolete error messages.
(arc_override_options): Make use of .def files for selecting the
right cpu and option configurations.
* config/arc/arc.h (stdbool.h): Include.
(TARGET_CPU_DEFAULT): Define.
(CPP_SPEC): Remove mcpu=NPS400 handling.
(arc_cpu_to_as): Declare.
(EXTRA_SPEC_FUNCTIONS): Define.
(OPTION_DEFAULT_SPECS): Likewise.
(ASM_DEFAULT): Remove.
(ASM_SPEC): Use arc_cpu_to_as.
(DRIVER_SELF_SPECS): Remove deprecated options.
(arc_base_cpu): Declare.
(TARGET_ARC600, TARGET_ARC601, TARGET_ARC700, TARGET_EM)
(TARGET_HS, TARGET_V2, TARGET_ARC600): Make them use arc_base_cpu
variable.
(MULTILIB_DEFAULTS): Use ARC_MULTILIB_CPU_DEFAULT.
* config/arc/arc.md (attr_cpu): Remove.
* config/arc/arc.opt (mno-mpy): Deprecate.
(mcpu=ARC600, mcpu=ARC601, mcpu=ARC700, mcpu=NPS400, mcpu=ARCEM)
(mcpu=ARCHS): Remove.
(mcrc, mdsp-packa, mdvbf, mmac-d16, mmac-24, mtelephony, mrtsc):
Deprecate.
(mbarrel_shifte, mspfp_, mdpfp_, mdsp_pack, mmac_): Remove.
(arc_fpu): Use new defines.
(mpy-option): Change to use numeric or string like inputs.
* config/arc/t-arc (driver-arc.o): New target.
(arc-cpus, t-multilib, arc-tables.opt): Likewise.
* config/arc/t-arc-newlib: Delete.
* config/arc/t-arc-uClibc: Renamed to t-uClibc.
* doc/invoke.texi (ARC): Update arc options.
---
 gcc/common/config/arc/arc-common.c |  69 -
 gcc/config.gcc |  47 +
 gcc/config/arc/arc-arch.h  | 120 ++
 gcc/config/arc/arc-arches.def  |  56 ++
 gcc/config/arc/arc-c.def   |   4 +
 gcc/config/arc/arc-cpus.def|  75 ++
 gcc/config/arc/arc-options.def | 109 
 gcc/config/arc/arc-opts.h  |  49 +++--
 gcc/config/arc/arc-protos.h|   1 -
 gcc/config/arc/arc-tables.opt  |  90 
 gcc/config/arc/arc.c   | 176 +---
 gcc/config/arc/arc.h   |  89 
 gcc/config/arc/arc.md  |   5 -
 gcc/config/arc/arc.opt | 169 +++---
 gcc/config/arc/driver-arc.c|  78 ++
 gcc/config/arc/genmultilib.awk | 203 +
 gcc/config/arc/genoptions.awk  |  86 
 gcc/config/arc/t-arc   |  19 
 gcc/config/arc/t-arc-newlib|  46 -
 gcc/config/arc/t-arc-uClibc|  20 
 gcc/config/arc/t-multilib  |  34 +++
 gcc/config/arc/t-uClibc|  20 
 gcc/doc/invoke.texi|  90 +---
 23 files changed, 1277 insertions(+), 378 deletions(-)
 create mode 100644 gcc/config/arc/arc-arch.h
 create mode 100644 gcc/config/arc/arc-arches.def
 create mode 100644 gcc/config/arc/arc-cpus.def
 create mode 100644 gcc/config/arc/arc-options.def
 create mode 100644 gcc/config/arc/arc-tables.opt
 create mode 100644 gcc/config/arc/driver-a

Re: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)

2016-11-10 Thread Martin Liška
On 11/09/2016 02:47 PM, Martin Liška wrote:
> On 11/09/2016 02:29 PM, Jakub Jelinek wrote:
>> On Wed, Nov 09, 2016 at 02:16:45PM +0100, Martin Liška wrote:
>>> As shown in the attached test-case, the assert cannot always be true.
>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>
>>> Ready to be installed?
>>> Martin
>>
>>> >From b55459461f3f7396a094be6801082715ddb4b30d Mon Sep 17 00:00:00 2001
>>> From: marxin 
>>> Date: Wed, 9 Nov 2016 11:52:00 +0100
>>> Subject: [PATCH] Remove unneeded gcc_assert in gimplifier (PR 
>>> sanitizer/78270)
>>>
>>> gcc/ChangeLog:
>>>
>>> 2016-11-09  Martin Liska  
>>>
>>> PR sanitizer/78270
>>> * gimplify.c (gimplify_switch_expr):
>>
>> No description on what you've changed.
>>
>> That said, I'm not 100% sure it is the right fix.
>> As the testcase shows, for switch without GIMPLE_BIND wrapping the body
>> we can have variables that are in scope from the switch onwards.
>> I bet we could also have variables that go out of scope, say if in the
>> compound literal's initializer there is ({ ... }) that declares variables.
>> I doubt you can have a valid case/default label in those though, so
>> perhaps it would be simpler not to create live_switch_vars at all
>> if SWITCH_BODY is not a BIND_EXPR?
> 
> I like the approach you introduced! I'll re-trigger regression tests and
> send a newer version of patch.
> 
> Martin

Sending the patch.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

> 
>>
>>  Jakub
>>
> 

From fb4b852a17656309e6acfb8da97cf9bce4b3b176 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 9 Nov 2016 11:52:00 +0100
Subject: [PATCH] Create live_switch_vars conditionally (PR sanitizer/78270)

gcc/testsuite/ChangeLog:

2016-11-09  Martin Liska  

	* gcc.dg/asan/pr78269.c: New test.

gcc/ChangeLog:

2016-11-09  Martin Liska  

	* gimplify.c (gimplify_switch_expr): Create live_switch_vars
	only when SWITCH_BODY is a BIND_EXPR.
---
 gcc/gimplify.c  | 20 +++-
 gcc/testsuite/gcc.dg/asan/pr78269.c | 13 +
 2 files changed, 28 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/asan/pr78269.c

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index d392450..da60c05 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -2241,7 +2241,7 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
 {
   vec<tree> labels;
   vec<tree> saved_labels;
-  hash_set<tree> *saved_live_switch_vars;
+  hash_set<tree> *saved_live_switch_vars = NULL;
   tree default_case = NULL_TREE;
   gswitch *switch_stmt;
 
@@ -2253,8 +2253,14 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
  labels.  Save all the things from the switch body to append after.  */
   saved_labels = gimplify_ctxp->case_labels;
   gimplify_ctxp->case_labels.create (8);
-  saved_live_switch_vars = gimplify_ctxp->live_switch_vars;
-  gimplify_ctxp->live_switch_vars = new hash_set<tree> (4);
+
+  /* Do not create live_switch_vars if SWITCH_BODY is not a BIND_EXPR.  */
+  if (TREE_CODE (SWITCH_BODY (switch_expr)) == BIND_EXPR)
+	{
+	  saved_live_switch_vars = gimplify_ctxp->live_switch_vars;
+	  gimplify_ctxp->live_switch_vars = new hash_set<tree> (4);
+	}
+
   bool old_in_switch_expr = gimplify_ctxp->in_switch_expr;
   gimplify_ctxp->in_switch_expr = true;
 
@@ -2269,8 +2275,12 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
 
   labels = gimplify_ctxp->case_labels;
   gimplify_ctxp->case_labels = saved_labels;
-  gcc_assert (gimplify_ctxp->live_switch_vars->elements () == 0);
-  delete gimplify_ctxp->live_switch_vars;
+
+  if (gimplify_ctxp->live_switch_vars)
+	{
+	  gcc_assert (gimplify_ctxp->live_switch_vars->elements () == 0);
+	  delete gimplify_ctxp->live_switch_vars;
+	}
   gimplify_ctxp->live_switch_vars = saved_live_switch_vars;
 
   preprocess_case_label_vec_for_gimple (labels, index_type,
diff --git a/gcc/testsuite/gcc.dg/asan/pr78269.c b/gcc/testsuite/gcc.dg/asan/pr78269.c
new file mode 100644
index 000..55840b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/pr78269.c
@@ -0,0 +1,13 @@
+// { dg-do compile }
+// { dg-additional-options "-Wno-switch-unreachable" }
+
+typedef struct
+{
+} bdaddr_t;
+
+int a;
+void fn1 ()
+{
+  switch (a)
+&(bdaddr_t){};
+}
-- 
2.10.1



Re: [PATCH] loop distribution bug fix

2016-11-10 Thread Richard Biener
On Thu, Nov 10, 2016 at 6:25 AM, Jim Wilson  wrote:
> This fixes a bug in code adding edges to the dependence graph. Values
> for this_dir can be -1 (backward edge), 0 (no edge), 1 (forward edge),
> and 2 (both backward and forward edges). There can be multiple
> dependencies checked, creating multiple edges that have to be merged
> together. this_dir contains the current edge. dir contains the
> previous edges. The code fails to handle the case where this_dir is 2,
> in which case we can return immediately. This is a minor optimization
> to improve compile time. The code handles the case where dir is
> non-zero and this_dir is zero by returning 2, which is incorrect. dir
> should be unmodified in this case. We can return 2 only if both dir
> and this_dir are non-zero and unequal, i.e. one is -1 and the other is
> 1. This problem creates extra unnecessary edges, which can prevent
> loops from being distributed. The patch fixes both problems.
>
> This passed an x86_64 gcc bootstrap with -ftree-loop-distribution
> added to BOOT_CFLAGS and a testsuite regression check.  Curiously, I
> see that I get different results for the C/C++ ubsan tests every time
> I run them, but this has nothing to do with my patch, as it happens
> with or without my patch.  I haven't tried debugging this yet.  Might
> be related to my recent upgrade to Ubuntu 16.04 LTS.  Otherwise, there
> are no regressions.
>
> On SPEC CPU2006, on aarch64, I see 5879 loops distributed without the
> patch, and 5906 loops distributed with the patch. So 27 extra loops
> are distributed which is about 0.5% more loop distributions. There is
> no measurable performance gain from the bug fix on the CPU2006 run
> time though I plan to spend some more time looking at this code to see
> if I can find other improvements.
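
The merging rule described in the quoted text can be sketched in C as
follows. This is only an illustration under stated assumptions: merge_dir
is a hypothetical helper, not the function touched by the patch, and it
assumes the -1/0/1/2 encoding given above.

```c
/* Hypothetical helper, not GCC's actual code: merge the direction of a
   newly checked dependence (this_dir) into the direction accumulated so
   far (dir).  Encoding assumed from the description above:
   -1 backward edge, 0 no edge, 1 forward edge, 2 both directions.  */
int
merge_dir (int dir, int this_dir)
{
  /* Once both directions are present, nothing can change the result.  */
  if (dir == 2 || this_dir == 2)
    return 2;

  /* No new edge: the accumulated direction is left unmodified.  */
  if (this_dir == 0)
    return dir;

  /* First real edge seen so far.  */
  if (dir == 0)
    return this_dir;

  /* Both non-zero: equal directions stay as they are, while -1 combined
     with 1 means both a backward and a forward edge.  */
  return dir == this_dir ? dir : 2;
}
```

With this rule merge_dir (1, 0) stays 1 instead of becoming 2, which is
the case the description calls incorrect in the existing code.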

The biggest "lack" of loop distribution is the ability to undo CSE so for

 for (;;)
  {
 a[i] = a[i] + 1;
 b[i] = a[i];
  }

CSE makes us see

  for (;;)
{
   tem = a[i];
   tem2 = tem + 1;
   a[i] = tem2;
   b[i] = tem2;
}

and loop distribution cannot re-materialize tem2 from a[i] thus most of the
time it ends up pulling redundant computations into each partition (sometimes
that can reduce memory bandwidth if one less stream is used but sometimes
not, like in the above case).

Then of course the cost model is purely modeled for STREAM (reduce the number
of memory streams).  So loop distribution is expected to pessimize code for
the CSE case in case you are not memory bound and improve things if you
are memory bound.
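
To make the trade-off concrete, here is a hedged C sketch of the loop above
and two ways it could be split. The function names are made up for
illustration, a and b are assumed not to overlap, and this is not a claim
about which partitioning GCC actually produces.

```c
#include <stddef.h>

/* Original loop: one pass, a[i] is loaded once per iteration.  */
void
fused (int *a, int *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      a[i] = a[i] + 1;
      b[i] = a[i];
    }
}

/* Distribution with the computation duplicated in each partition: both
   loops load a[i] and add 1.  The b partition has to run first because
   it needs the old value of a[i].  */
void
distributed_duplicated (int *a, int *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    b[i] = a[i] + 1;
  for (size_t i = 0; i < n; i++)
    a[i] = a[i] + 1;
}

/* Distribution with the value re-materialized from memory: the second
   partition simply reloads what the first one stored.  */
void
distributed_rematerialized (int *a, int *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = a[i] + 1;
  for (size_t i = 0; i < n; i++)
    b[i] = a[i];
}
```

All three produce the same contents in a and b; they differ only in how
many loads and additions each loop performs.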

> OK?

Ok.

Thanks for the improvement!
Richard.

> Jim


[Patch 4/5] OpenACC tile clause support, Fortran front-end parts

2016-11-10 Thread Chung-Lin Tang
The Fortran front-end patches. These were originally written by Cesar.

Thanks,
Chung-Lin

2016-XX-XX  Cesar Philippidis  

fortran/
* openmp.c (resolve_oacc_positive_int_expr): Promote the warning
to an error.
(resolve_oacc_loop_blocks): Use integer zero to represent the '*'
tile argument.
(resolve_omp_clauses): Error on directives containing both tile
and collapse clauses.
* trans-openmp.c (gfc_trans_omp_do): Lower tiled loops like
collapsed loops.
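
As a rough illustration of the last two entries, the sketch below shows how
a collapse depth can be derived from a tile list in which 0 stands for '*'.
struct tile_arg and effective_collapse are hypothetical names used only
here; the logic mirrors the gfc_trans_omp_do hunk further down but is not
the front-end code itself.

```c
#include <stddef.h>

/* Hypothetical stand-in for a list of tile arguments.  A size of 0
   represents the '*' argument.  */
struct tile_arg
{
  long size;
  struct tile_arg *next;
};

/* Return the number of loops to lower together.  A tile list, if
   present, takes priority over the collapse value, and the result is
   never smaller than 1.  */
int
effective_collapse (int collapse, const struct tile_arg *tile)
{
  if (tile)
    {
      collapse = 0;
      for (const struct tile_arg *el = tile; el; el = el->next)
	collapse++;
    }

  if (collapse <= 0)
    collapse = 1;

  return collapse;
}
```

For a clause such as tile(10, *) the list has two elements, so two nested
loops are handled together.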


gcc/testsuite/
* gfortran.dg/goacc/loop-2.f95: Change expected tile clause
warnings to errors.
* gfortran.dg/goacc/loop-5.f95: Likewise.
* gfortran.dg/goacc/sie.f95: Likewise.
* gfortran.dg/goacc/tile-1.f90: New test.
* gfortran.dg/goacc/tile-2.f90: New test.
* gfortran.dg/goacc/tile-lowering.f95: New test.
Index: fortran/openmp.c
===
--- fortran/openmp.c	(revision 241809)
+++ fortran/openmp.c	(working copy)
@@ -3024,8 +3024,8 @@ resolve_oacc_positive_int_expr (gfc_expr *expr, co
   resolve_oacc_scalar_int_expr (expr, clause);
   if (expr->expr_type == EXPR_CONSTANT && expr->ts.type == BT_INTEGER
   && mpz_sgn(expr->value.integer) <= 0)
-gfc_warning (0, "INTEGER expression of %s clause at %L must be positive",
-		 clause, &expr->where);
+gfc_error ("INTEGER expression of %s clause at %L must be positive",
+	   clause, &expr->where);
 }
 
 /* Emits error when symbol is pointer, cray pointer or cray pointee
@@ -3859,6 +3859,8 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_claus
 if (omp_clauses->wait_list)
   for (el = omp_clauses->wait_list; el; el = el->next)
 	resolve_oacc_scalar_int_expr (el->expr, "WAIT");
+  if (omp_clauses->collapse && omp_clauses->tile_list)
+gfc_error ("Incompatible use of TILE and COLLAPSE at %L", &code->loc);
 }
 
 
@@ -4964,11 +4966,11 @@ resolve_oacc_loop_blocks (gfc_code *code)
 	  if (el->expr == NULL)
 	{
 	  /* NULL expressions are used to represent '*' arguments.
-		 Convert those to a -1 expressions.  */
+		 Convert those to a 0 expressions.  */
 	  el->expr = gfc_get_constant_expr (BT_INTEGER,
 		gfc_default_integer_kind,
 		&code->loc);
-	  mpz_set_si (el->expr->value.integer, -1);
+	  mpz_set_si (el->expr->value.integer, 0);
 	}
 	  else
 	{
Index: fortran/trans-openmp.c
===
--- fortran/trans-openmp.c	(revision 241809)
+++ fortran/trans-openmp.c	(working copy)
@@ -3162,7 +3162,18 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op,
   vec<dovar_init> inits = vNULL;
   dovar_init *di;
   unsigned ix;
+  gfc_expr_list *tile = do_clauses ? do_clauses->tile_list : clauses->tile_list;
 
+  /* Both collapsed and tiled loops are lowered the same way.  In
+ OpenACC, those clauses are not compatible, so prioritize the tile
+ clause, if present.  */
+  if (tile)
+{
+  collapse = 0;
+  for (gfc_expr_list *el = tile; el; el = el->next)
+	collapse++;
+}
+
   if (collapse <= 0)
 collapse = 1;
 
Index: testsuite/gfortran.dg/goacc/loop-5.f95
===
--- testsuite/gfortran.dg/goacc/loop-5.f95	(revision 241809)
+++ testsuite/gfortran.dg/goacc/loop-5.f95	(working copy)
@@ -93,9 +93,6 @@ program test
   DO j = 1,10
   ENDDO
 ENDDO
-!$acc loop tile(-1) ! { dg-warning "must be positive" }
-do i = 1,10
-enddo
 !$acc loop vector tile(*)
 DO i = 1,10
 ENDDO
@@ -129,9 +126,6 @@ program test
   DO j = 1,10
   ENDDO
 ENDDO
-!$acc loop tile(-1) ! { dg-warning "must be positive" }
-do i = 1,10
-enddo
 !$acc loop vector tile(*)
 DO i = 1,10
 ENDDO
@@ -242,9 +236,6 @@ program test
 DO j = 1,10
 ENDDO
   ENDDO
-  !$acc kernels loop tile(-1) ! { dg-warning "must be positive" }
-  do i = 1,10
-  enddo
   !$acc kernels loop vector tile(*)
   DO i = 1,10
   ENDDO
@@ -333,9 +324,6 @@ program test
 DO j = 1,10
 ENDDO
   ENDDO
-  !$acc parallel loop tile(-1) ! { dg-warning "must be positive" }
-  do i = 1,10
-  enddo
   !$acc parallel loop vector tile(*)
   DO i = 1,10
   ENDDO
Index: testsuite/gfortran.dg/goacc/tile-1.f90
===
--- testsuite/gfortran.dg/goacc/tile-1.f90	(revision 0)
+++ testsuite/gfortran.dg/goacc/tile-1.f90	(revision 0)
@@ -0,0 +1,339 @@
+subroutine parloop
+  integer, parameter :: n = 100
+  integer i, j, k, a
+
+  !$acc parallel loop tile(10)
+  do i = 1, n
+  end do
+  
+  !$acc parallel loop tile(*)
+  do i = 1, n
+  end do
+
+  !$acc parallel loop tile(10, *)
+  do i = 1, n
+ do j = 1, n
+ end do
+  end do
+ 
+  !$acc parallel loop tile(10, *, i) ! { dg-error "" }
+  do i = 1, n
+ do j = 1, n
+do k = 1, n
+end do
+ end do
+  end do 
+
+  !$acc parallel loop t

[Patch 5/5] OpenACC tile clause support, libgomp testsuite patches

2016-11-10 Thread Chung-Lin Tang
Some additional tests and adjustments to existing ones were made.

2016-XX-XX  Nathan Sidwell  
Chung-Lin Tang  

libgomp/
* testsuite/libgomp.oacc-c-c++-common/tile-1.c: New.
* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust and
add additional case.
* testsuite/libgomp.oacc-c-c++-common/vprop.c: XFAIL under
"openacc_nvidia_accel_selected".
* libgomp.oacc-fortran/nested-function-1.f90 (test2): Add num_workers(8)
clause.

Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c	(revision 241809)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c	(working copy)
@@ -112,7 +112,7 @@ int vector_1 (int *ary, int size)
 	ary[ix] = place ();
   }
 
-  return check (ary, size, 0, 0, 1);
+  return check (ary, size, 0, 1, 1);
 }
 
 int vector_2 (int *ary, int size)
@@ -196,10 +196,24 @@ int gang_3 (int *ary, int size)
 	ary[ix + jx * 64] = place ();
   }
 
+  return check (ary, size, 1, 1, 1);
+}
+
+int gang_4 (int *ary, int size)
+{
+  clear (ary, size);
+  
+#pragma acc parallel vector_length(32) copy(ary[0:size]) firstprivate (size)
+  {
+#pragma acc loop auto
+for (int jx = 0; jx <  size; jx++)
+  ary[jx] = place ();
+  }
+
   return check (ary, size, 1, 0, 1);
 }
 
-#define N (32*32*32)
+#define N (32*32*32*2)
 int main ()
 {
   int ondev = 0;
@@ -227,6 +241,8 @@ int main ()
 return 1;
   if (gang_3 (ary,  N))
 return 1;
+  if (gang_4 (ary,  N))
+return 1;
 
   return 0;
 }
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/tile-1.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/tile-1.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/tile-1.c	(revision 0)
@@ -0,0 +1,281 @@
+/* This code uses nvptx inline assembly guarded with acc_on_device, which is
+   not optimized away at -O0, and then confuses the target assembler.
+   { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
+
+/* { dg-additional-options "-fopenacc-dim=32" } */
+
+#include <stdio.h>
+#include <openacc.h>
+
+static int check (const int *ary, int size, int gp, int wp, int vp)
+{
+  int exit = 0;
+  int ix;
+  int gangs[32], workers[32], vectors[32];
+
+  for (ix = 0; ix < 32; ix++)
+gangs[ix] = workers[ix] = vectors[ix] = 0;
+  
+  for (ix = 0; ix < size; ix++)
+{
+  vectors[ary[ix] & 0xff]++;
+  workers[(ary[ix] >> 8) & 0xff]++;
+  gangs[(ary[ix] >> 16) & 0xff]++;
+}
+
+  for (ix = 0; ix < 32; ix++)
+{
+  if (gp)
+	{
+	  int expect = gangs[0];
+	  if (gangs[ix] != expect)
+	{
+	  exit = 1;
+	  printf ("gang %d not used %d times\n", ix, expect);
+	}
+	}
+  else if (ix && gangs[ix])
+	{
+	  exit = 1;
+	  printf ("gang %d unexpectedly used\n", ix);
+	}
+
+  if (wp)
+	{
+	  int expect = workers[0];
+	  if (workers[ix] != expect)
+	{
+	  exit = 1;
+	  printf ("worker %d not used %d times\n", ix, expect);
+	}
+	}
+  else if (ix && workers[ix])
+	{
+	  exit = 1;
+	  printf ("worker %d unexpectedly used\n", ix);
+	}
+
+  if (vp)
+	{
+	  int expect = vectors[0];
+	  if (vectors[ix] != expect)
+	{
+	  exit = 1;
+	  printf ("vector %d not used %d times\n", ix, expect);
+	}
+	}
+  else if (ix && vectors[ix])
+	{
+	  exit = 1;
+	  printf ("vector %d unexpectedly used\n", ix);
+	}
+  
+}
+  return exit;
+}
+
+#pragma acc routine seq
+static int __attribute__((noinline)) place ()
+{
+  int r = 0;
+
+  if (acc_on_device (acc_device_nvidia))
+{
+  int g = 0, w = 0, v = 0;
+
+  __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
+  __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
+  __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
+  r = (g << 16) | (w << 8) | v;
+}
+  return r;
+}
+
+static void clear (int *ary, int size)
+{
+  int ix;
+
+  for (ix = 0; ix < size; ix++)
+ary[ix] = -1;
+}
+
+int gang_vector_1 (int *ary, int size)
+{
+  clear (ary, size);
+#pragma acc parallel vector_length(32) num_gangs (32) copy (ary[0:size]) firstprivate (size)
+  {
+#pragma acc loop tile(128) gang vector
+for (int jx = 0; jx < size; jx++)
+  ary[jx] = place ();
+  }
+
+  return check (ary, size, 1, 0, 1);
+}
+
+int gang_vector_2a (int *ary, int size)
+{
+  if (size % 256)
+return 1;
+  
+  clear (ary, size);
+#pragma acc parallel vector_length(32) num_gangs (32) copy (ary[0:size]) firstprivate (size)
+  {
+#pragma acc loop tile(64, 64) gang vector
+for (int jx = 0; jx < size / 256; jx++)
+  for (int ix = 0; ix < 256; ix++)
+	ary[jx * 256 + ix] = place ();
+  }
+
+  return check (ary, size, 1, 0, 1);
+}
+
+int gang_vector_2b (int *ary, int size)
+{
+  if (size % 256)
+return 1;
+  
+  clear (ary, size);
+#pragma acc parallel vector_length(32) num_gangs (32) copy (ary[0:size]) firstprivate (size)
+  {

[Patch 3/5] OpenACC tile clause support, C/C++ front-end parts

2016-11-10 Thread Chung-Lin Tang
These are the patches for the C/C++ front-ends, along with the
testsuite patches.

Thanks,
Chung-Lin

2016-XX-XX  Nathan Sidwell  

c/
* c-parser.c (c_parser_omp_clause_collapse): Disallow tile.
(c_parser_oacc_clause_tile): Disallow collapse. Fix parsing and
semantic checking.
* c-parser.c (c_parser_omp_for_loop): Accept tiling constructs.

cp/
* parser.c (cp_parser_oacc_clause_tile): Disallow collapse.  Fix
parsing.  Parse constant expression. Remove semantic checking.
(cp_parser_omp_clause_collapse): Disallow tile.
(cp_parser_omp_for_loop): Deal with tile clause.  Don't emit a
parse error about missing for after already emitting one.
Use more conventional for idiom for unbounded loop.
* pt.c (tsubst_omp_clauses): Require integral constant expression
for COLLAPSE and TILE.  Remove broken TILE subst.
* semantics.c (finish_omp_clauses): Correct TILE semantic check.
(finish_omp_for): Deal with tile clause.

gcc/testsuite/
* c-c++-common/goacc/loop-auto-1.c: Adjust and add additional
case.
* c-c++-common/goacc/loop-auto-2.c: New.
* c-c++-common/goacc/tile.c: Include stdbool, fix expected errors.
* g++.dg/goacc/template.C: Test tile subst.  Adjust erroneous
uses.
* g++.dg/goacc/tile-1.C: Check tile subst.
* gcc.dg/goacc/loop-processing-1.c: Adjust dg-final pattern.
Index: c/c-parser.c
===
--- c/c-parser.c	(revision 241809)
+++ c/c-parser.c	(working copy)
@@ -11010,6 +11010,7 @@ c_parser_omp_clause_collapse (c_parser *parser, tr
   location_t loc;
 
   check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse");
+  check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile");
 
   loc = c_parser_peek_token (parser)->location;
   if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
@@ -11920,10 +11921,11 @@ static tree
 c_parser_oacc_clause_tile (c_parser *parser, tree list)
 {
   tree c, expr = error_mark_node;
-  location_t loc, expr_loc;
+  location_t loc;
   tree tile = NULL_TREE;
 
   check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile");
+  check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse");
 
   loc = c_parser_peek_token (parser)->location;
   if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
@@ -11931,16 +11933,19 @@ c_parser_oacc_clause_tile (c_parser *parser, tree
 
   do
 {
+  if (tile && !c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
+	return list;
+
   if (c_parser_next_token_is (parser, CPP_MULT)
 	  && (c_parser_peek_2nd_token (parser)->type == CPP_COMMA
 	  || c_parser_peek_2nd_token (parser)->type == CPP_CLOSE_PAREN))
 	{
 	  c_parser_consume_token (parser);
-	  expr = integer_minus_one_node;
+	  expr = integer_zero_node;
 	}
   else
 	{
-	  expr_loc = c_parser_peek_token (parser)->location;
+	  location_t expr_loc = c_parser_peek_token (parser)->location;
 	  c_expr cexpr = c_parser_expr_no_commas (parser, NULL);
 	  cexpr = convert_lvalue_to_rvalue (expr_loc, cexpr, false, true);
 	  expr = cexpr.value;
@@ -11952,28 +11957,20 @@ c_parser_oacc_clause_tile (c_parser *parser, tree
 	  return list;
 	}
 
-	  if (!INTEGRAL_TYPE_P (TREE_TYPE (expr)))
-	{
-	  c_parser_error (parser, "%<tile%> value must be integral");
-	  return list;
-	}
-
 	  expr = c_fully_fold (expr, false, NULL);
 
-	  /* Attempt to statically determine when expr isn't positive.  */
-	  c = fold_build2_loc (expr_loc, LE_EXPR, boolean_type_node, expr,
-			   build_int_cst (TREE_TYPE (expr), 0));
-	  protected_set_expr_location (c, expr_loc);
-	  if (c == boolean_true_node)
+	  if (!INTEGRAL_TYPE_P (TREE_TYPE (expr))
+	  || TREE_CODE (expr) != INTEGER_CST
+	  || !tree_fits_shwi_p (expr)
+	  || tree_to_shwi (expr) <= 0)
 	{
-	  warning_at (expr_loc, 0,"%<tile%> value must be positive");
-	  expr = integer_one_node;
+	  error_at (expr_loc, "%<tile%> argument needs positive"
+			" integral constant");
+	  expr = integer_zero_node;
 	}
 	}
 
   tile = tree_cons (NULL_TREE, expr, tile);
-  if (c_parser_next_token_is (parser, CPP_COMMA))
-	c_parser_consume_token (parser);
 }
   while (c_parser_next_token_is_not (parser, CPP_CLOSE_PAREN));
 
@@ -14899,11 +14896,17 @@ c_parser_omp_for_loop (location_t loc, c_parser *p
   bool fail = false, open_brace_parsed = false;
   int i, collapse = 1, ordered = 0, count, nbraces = 0;
   location_t for_loc;
+  bool tiling = false;
   vec<tree, va_gc> *for_block = make_tree_vector ();
 
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
 if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
   collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (cl));
+else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_TILE)
+  {
+	tiling = true;
+	collapse = list_length (OMP_CLAUSE_TILE_LIST (cl));
+  }
 else if (OMP_CLAUSE_CO
