Re: [PATCH, vec-tails 05/10] Check if loop can be masked

2016-06-16 Thread Jeff Law

On 05/19/2016 01:42 PM, Ilya Enkovich wrote:

Hi,

This patch introduces analysis to determine if loop can be masked
(compute LOOP_VINFO_CAN_BE_MASKED and LOOP_VINFO_REQUIRED_MASKS)
and compute how much masking costs.

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* tree-vect-loop.c: Include insn-config.h and recog.h.
(vect_check_required_masks_widening): New.
(vect_check_required_masks_narrowing): New.
(vect_get_masking_iv_elems): New.
(vect_get_masking_iv_type): New.
(vect_get_extreme_masks): New.
(vect_check_required_masks): New.
(vect_analyze_loop_operations): Add vect_check_required_masks
call to compute LOOP_VINFO_CAN_BE_MASKED.
(vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and
LOOP_VINFO_NEED_MASKING before starting over.
(vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and
masking cost.
* tree-vect-stmts.c (can_mask_load_store): New.
(vect_model_load_masking_cost): New.
(vect_model_store_masking_cost): New.
(vect_model_simple_masking_cost): New.
(vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED
and masking cost.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vect_stmt_should_be_masked_for_epilogue): New.
(vect_add_required_mask_for_stmt): New.
(vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED.
* tree-vectorizer.h (vect_model_load_masking_cost): New.
(vect_model_store_masking_cost): New.
(vect_model_simple_masking_cost): New.


diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index e25a0ce..31360d3 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -31,6 +31,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "optabs-tree.h"
+#include "insn-config.h"
+#include "recog.h"   /* FIXME: for insn_data */

Ick :(



+
+/* Function vect_check_required_masks_narowing.

narrowing



+
+   Return 1 if vector mask of type MASK_TYPE can be narrowed
+   to a type having REQ_ELEMS elements in a single vector.  */
+
+static bool
+vect_check_required_masks_narrowing (loop_vec_info loop_vinfo,
+tree mask_type, unsigned req_elems)
Given the common structure & duplication I can't help but wonder if a 
single function should be used for widening/narrowing.  Ultimately can't 
you swap  mask_elems/req_elems and always go narrower to wider (using a 
different optab for the two different cases)?
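
Roughly, the merged helper could look like this self-contained sketch (stand-in
names and stand-in per-step hooks, not the vectorizer code): swap the two element
counts so the walk always runs from the narrower mask to the wider one, and let
only the per-step capability hook differ between the two directions.

#include <stdbool.h>
#include <stdio.h>

typedef bool (*mask_step_fn) (unsigned from_elems);

/* Hypothetical per-step capability queries for the two directions.  */
static bool widen_step_ok (unsigned from_elems)  { return from_elems >= 2; }
static bool narrow_step_ok (unsigned from_elems) { return from_elems <= 64; }

/* One check for both directions: normalise to "fewer elements -> more
   elements" and double the element count at each step.  */
static bool
check_required_masks (unsigned mask_elems, unsigned req_elems)
{
  bool swapped = req_elems < mask_elems;
  unsigned lo = swapped ? req_elems : mask_elems;
  unsigned hi = swapped ? mask_elems : req_elems;
  mask_step_fn step_ok = swapped ? widen_step_ok : narrow_step_ok;

  for (unsigned elems = lo; elems < hi; elems *= 2)
    if (!step_ok (elems))
      return false;
  return true;
}

int
main (void)
{
  printf ("%d %d\n", check_required_masks (4, 16), check_required_masks (16, 4));
  return 0;
}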






+
+/* Function vect_get_masking_iv_elems.
+
+   Return a number of elements in IV used for loop masking.  */
+static int
+vect_get_masking_iv_elems (loop_vec_info loop_vinfo)
+{
+  tree iv_type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo));
I'm guessing Richi's comment about what tree type you're looking at 
refers to this and similar instances.  Doesn't this give you the type of 
the number of iterations rather than the type of the iteration variable 
itself?





 +

+  if (!expand_vec_cmp_expr_p (iv_vectype, mask_type))
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"cannot be masked: required vector comparison "
+"is not supported.\n");
+  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+  return;
+}
On a totally unrelated topic, I was speaking with David Malcolm earlier 
this week about how to turn this kind of missed optimization information 
we currently emit into dumps into something more easily consumed by users.


The general issue is that we've got customers that want to understand 
why certain optimizations fire or do not fire.  They're by far more 
interested in the vectorizer than anything else.


We have a sense that much of the information those customers desire is 
sitting in the dump files, but it's buried in there with other stuff 
that isn't generally useful to users.


So we're pondering what it might take to take these glorified fprintf 
calls and turn them into a first class diagnostic that could be emitted 
to stderr or into the dump file depending (of course) on the options 
passed to GCC.


The reason I bring this up is the hope that your team might have some 
insights based on what ICC has done in the past for its customers.


Anyway, back to the code...



diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 9ab4af4..91ebe5a 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "builtins.h"
 #include "internal-fn.h"
+#include "tree-ssa-loop-ivopts.h"

 /* For lang_hooks.types.type_for_mode.  */
 #include "langhooks.h"
@@ -535,6 +536,38 @@ process_use (gimple *stmt, tree use, loop_vec_info loop_vinfo, bool live_p

Re: [Patch, avr] Fix PR 71151

2016-06-16 Thread Senthil Kumar Selvaraj

Senthil Kumar Selvaraj writes:

> Georg-Johann Lay writes:
>
>> Senthil Kumar Selvaraj schrieb:
>>> Hi,
>>> 
>>>   This patch fixes PR 71151 by eliminating the
>>>   TARGET_ASM_FUNCTION_RODATA_SECTION hook and setting
>>>   JUMP_TABLES_IN_TEXT_SECTION to 1.
>>> 
>>>   As described in the bugzilla entry, this hook assumed it would get
>>>   called only for jumptable rodata for functions. This was true until
>>>   6.1, when a commit in varasm.c started calling the hook for mergeable
>>>   string/constant data as well.
>>> 
>>>   This resulted in string constants ending up in a section intended for
>>>   jumptables (flash), and broke code using those constants, which
>>>   expects them to be present in rodata (SRAM).
>>> 
>>>   Given that the original reason for placing jumptables in a section was
>>>   fixed by Johann in PR 63323, this patch restores the original
>>>   behavior. Reg testing on both gcc-6-branch and trunk showed no 
>>> regressions.
>>
>> Just for the record:
>>
>> The intention for jump-tables in function-rodata-section was to get 
>> fine-grained section for the tables so that --gc-sections and 
>> -ffunction-sections not only gc unused functions but also unused 
>> jump-tables.  As these tables had to reside in the lowest 64KiB of flash 
>> (.progmem section) neither .rodata nor .text was a correct placement, 
>> hence the hacking in TARGET_ASM_FUNCTION_RODATA_SECTION.
>>
>> Before using TARGET_ASM_FUNCTION_RODATA_SECTION, all jump tables were 
>> put into .progmem.gcc_sw_table by ASM_OUTPUT_BEFORE_CASE_LABEL switching 
>> to that section.
>>
>> We actually never had jump-tables in .text before...
>
> JUMP_TABLES_IN_TEXT_SECTION was 1 before r37465 - that was when the
> progmem.gcc_sw_table section was introduced. But yes, I understand that
> the target hook for FUNCTION_RODATA_SECTION was done to get them gc'ed
> along with the code.
>
>>
>> The purpose of PR63323 was to have more generic jump-table 
>> implementation that also works if the table does NOT reside in the lower 
>> 64KiB.  This happens when moving the whole TEXT section around, like 
>> for a bootloader.
>
> Understood.
>>
>>>   As pointed out by Johann, this may end up increasing code
>>>   size if there are lots of branches that cross the jump tables. I
>>>   intend to propose a separate patch that gives additional information
>>>   to the target hook (SECCAT_RODATA_{STRING,JUMPTABLE}) so it can know
>>>   what type of function rodata is coming on. Johann also suggested
>>>   handling jump table generation ourselves - I'll experiment with that
>>>   some more.
>>> 
>>>   If ok, could someone commit please? Could you also backport to
>>>   gcc-6-branch?
>>> 
>>> Regards
>>> Senthil
>>> 
>>> gcc/ChangeLog
>>> 
>>> 2016-06-03  Senthil Kumar Selvaraj  
>>> 
>>> * config/avr/avr.c (avr_asm_function_rodata_section): Remove.
>>> * config/avr/avr.c (TARGET_ASM_FUNCTION_RODATA_SECTION): Remove.
>>> 
>>> gcc/testsuite/ChangeLog
>>> 
>>> 2016-06-03  Senthil Kumar Selvaraj  
>>> 
>>> * gcc/testsuite/gcc.target/avr/pr71151-1.c: New.
>>> * gcc/testsuite/gcc.target/avr/pr71151-2.c: New.
>>> 
>>> diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
>>> index ba5cd91..3cb8cb7 100644
>>> --- gcc/config/avr/avr.c
>>> +++ gcc/config/avr/avr.c
>>> @@ -9488,65 +9488,6 @@ avr_asm_init_sections (void)
>>>  }
>>>  
>>>  
>>> -/* Implement `TARGET_ASM_FUNCTION_RODATA_SECTION'.  */
>>> -
>>> -static section*
>>> -avr_asm_function_rodata_section (tree decl)
>>> -{
>>> -  /* If a function is unused and optimized out by -ffunction-sections
>>> - and --gc-sections, ensure that the same will happen for its jump
>>> - tables by putting them into individual sections.  */
>>> -
>>> -  unsigned int flags;
>>> -  section * frodata;
>>> -
>>> -  /* Get the frodata section from the default function in varasm.c
>>> - but treat function-associated data-like jump tables as code
>>> - rather than as user defined data.  AVR has no constant pools.  */
>>> -  {
>>> -int fdata = flag_data_sections;
>>> -
>>> -flag_data_sections = flag_function_sections;
>>> -frodata = default_function_rodata_section (decl);
>>> -flag_data_sections = fdata;
>>> -flags = frodata->common.flags;
>>> -  }
>>> -
>>> -  if (frodata != readonly_data_section
>>> -  && flags & SECTION_NAMED)
>>> -{
>>> -  /* Adjust section flags and replace section name prefix.  */
>>> -
>>> -  unsigned int i;
>>> -
>>> -  static const char* const prefix[] =
>>> -{
>>> -  ".rodata",  ".progmem.gcc_sw_table",
>>> -  ".gnu.linkonce.r.", ".gnu.linkonce.t."
>>> -};
>>> -
>>> -  for (i = 0; i < sizeof (prefix) / sizeof (*prefix); i += 2)
>>> -{
>>> -  const char * old_prefix = prefix[i];
>>> -  const char * new_prefix = prefix[i+1];
>>> -  const char * name = frodata->named.name;
>>> -
>>> -  if (STR_PREFIX_P (name, old_prefix))
>>> -{
>>> -  

[PATCH][AArch64] Handle iterator definitions with conditionals in geniterator.sh

2016-06-16 Thread Szabolcs Nagy
Turn the following definition in iterators.md

(define_mode_iterator XXX [(YYY "condition") ZZZ])

into

#define BUILTIN_XXX(T, N, MAP) \
  VAR2 (T, N, MAP, yyy, zzz)

Previously, geniterators.sh skipped definitions with conditions.

gcc/ChangeLog:

2016-06-16  Szabolcs Nagy  

* config/aarch64/geniterators.sh: Handle parenthesised conditions.
diff --git a/gcc/config/aarch64/geniterators.sh b/gcc/config/aarch64/geniterators.sh
index ec1b1ea..8baa244 100644
--- a/gcc/config/aarch64/geniterators.sh
+++ b/gcc/config/aarch64/geniterators.sh
@@ -23,10 +23,7 @@
 # BUILTIN_ macros, which expand to VAR Macros covering the
 # same set of modes as the iterator in iterators.md
 #
-# Find the  definitions (may span several lines), skip the ones
-# which does not have a simple format because it contains characters we
-# don't want to or can't handle (e.g P, PTR iterators change depending on
-# Pmode and ptr_mode).
+# Find the  definitions (may span several lines).
 LC_ALL=C awk '
 BEGIN {
 	print "/* -*- buffer-read-only: t -*- */"
@@ -49,12 +46,24 @@ iterdef {
 	sub(/.*\(define_mode_iterator/, "", s)
 }
 
-iterdef && s ~ /\)/ {
+iterdef {
+	# Count the parentheses, the iterator definition ends
+	# if there are more closing ones than opening ones.
+	nopen = gsub(/\(/, "(", s)
+	nclose = gsub(/\)/, ")", s)
+	if (nopen >= nclose)
+		next
+
 	iterdef = 0
 
 	gsub(/[ \t]+/, " ", s)
-	sub(/ *\).*/, "", s)
+	sub(/ *\)[^)]*$/, "", s)
 	sub(/^ /, "", s)
+
+	# Drop the conditions.
+	gsub(/ *"[^"]*" *\)/, "", s)
+	gsub(/\( */, "", s)
+
 	if (s !~ /^[A-Za-z0-9_]+ \[[A-Z0-9 ]*\]$/)
 		next
 	sub(/\[ */, "", s)


RFC: pass to warn on questionable uses of alloca().

2016-06-16 Thread Aldy Hernandez

Hi folks!

I've been working on a plugin to warn on unbounded uses of alloca() to 
help find questionable uses in glibc and other libraries.  It occurred 
to me that the broader community could benefit from it, as it has found 
quite a few interesting cases. So, I've reimplemented it as an actual 
pass, lest it be lost in plugin la-la land and bit-rot.


Before I sink any more time cleaning it up, would this be something 
acceptable into the compiler?  It doesn't have anything glibc specific, 
except possibly the following idiom which I allow:


if (gate_function (length))
alloca(length);

...and the above is probably common enough that we should handle it.

The testcase has a lot of examples of what the pass handles.

Thoughts?

Aldy

p.s. The pass currently warns on all uses of VLAs.  I'm not completely 
sold on this idea, so perhaps we could remove it, or gate it with a flag.
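
For illustration only (hypothetical snippet, not from the patch or its testsuite),
this is the kind of bounded VLA that the pass as described would still flag:

/* Even though n is checked first, the VLA itself is warned about,
   because the pass currently flags every VLA use.  */
#include <stdio.h>

static void
use_vla (unsigned n)
{
  if (n < 100)
    {
      char buf[n];              /* would get the VLA warning */
      buf[0] = '\0';
      printf ("%zu\n", sizeof buf);
    }
}

int
main (void)
{
  use_vla (10);
  return 0;
}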
gcc/

* Makefile.in (OBJS): Add walloca.o.
* common.opt (Walloca): New.
(Walloca-strict): New.
(Walloca-max-size): New.
* passes.def: Add pass_walloca.
* tree-pass.h (make_pass_walloca): New.
* walloca.c: New pass.

gcc/c-family/

Add Walloca and Walloca-strict entries.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 776f6d7..6964de1 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1537,6 +1537,7 @@ OBJS = \
varpool.o \
vmsdbgout.o \
vtable-verify.o \
+   walloca.o \
web.o \
wide-int.o \
wide-int-print.o \
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 83fd84c..816f854 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -275,6 +275,14 @@ Wall
 C ObjC C++ ObjC++ Warning
 Enable most warning messages.
 
+Walloca
+LangEnabledBy(C ObjC C++ ObjC++,Wall)
+; in common.opt
+
+Walloca-strict
+LangEnabledBy(C ObjC C++ ObjC++)
+; in common.opt
+
 Warray-bounds
 LangEnabledBy(C ObjC C++ ObjC++,Wall)
 ; in common.opt
diff --git a/gcc/common.opt b/gcc/common.opt
index f0d7196..a4ed603 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -545,6 +545,18 @@ Waggressive-loop-optimizations
 Common Var(warn_aggressive_loop_optimizations) Init(1) Warning
 Warn if a loop with constant number of iterations triggers undefined behavior.
 
+Walloca
+Common Var(warn_alloca) Warning
+Warn on unbounded uses of alloca and all uses of variable-length arrays.
+
+Walloca-strict
+Common Var(warn_alloca_strict) Init(0) Warning
+Warn on all uses of alloca or variable-length arrays.
+
+Walloca-max-size=
+Common Joined RejectNegative UInteger Var(warn_alloca_max_size) Init(4000) Warning
+Maximum size of alloca argument to allow without warning.
+
 Warray-bounds
 Common Var(warn_array_bounds) Warning
 Warn if an array is accessed out of bounds.
diff --git a/gcc/passes.def b/gcc/passes.def
index 3647e90..4b503e8 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -303,6 +303,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_simduid_cleanup);
   NEXT_PASS (pass_lower_vector_ssa);
   NEXT_PASS (pass_cse_reciprocals);
+  NEXT_PASS (pass_walloca);
   NEXT_PASS (pass_reassoc, false /* insert_powi_p */);
   NEXT_PASS (pass_strength_reduction);
   NEXT_PASS (pass_split_paths);
diff --git a/gcc/testsuite/gcc.dg/Walloca-1.c b/gcc/testsuite/gcc.dg/Walloca-1.c
new file mode 100644
index 000..23c5123
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Walloca-1.c
@@ -0,0 +1,89 @@
+/* { dg-do compile } */
+/* { dg-options "-Walloca -O -Walloca-max-size=2000" } */
+
+#define alloca __builtin_alloca
+
+typedef unsigned long size_t;
+extern size_t strlen(const char *);
+
+extern void useit (char *);
+
+int num;
+
+void test_walloca (size_t len, size_t len2, size_t len3)
+{
+  int i;
+
+  for (i=0; i < 123; ++i)
+{
+  char *s = alloca (566);  /* { dg-warning "alloca within a loop" } */
+  useit (s);
+}
+
+  char *s = alloca (123);
+  useit (s);   // OK, constant argument to alloca
+
+  s = alloca (num);/* { dg-warning "unbounded use" } */
+  useit (s);
+
+  s = alloca(9);   /* { dg-warning "is too big" } */
+  useit (s);
+
+  if (len < 2000)
+{
+  s = alloca(len); // OK, bounded
+  useit (s);
+}
+
+  if (len + len2 < 2000)   // OK, bounded
+{
+  s = alloca(len + len2);
+  useit (s);
+}
+
+  if (len3 <= 2001)
+{
+  s = alloca(len3);/* { dg-warning "alloca is too big" } */
+  useit(s);
+}
+}
+
+// Test __libc_use_alloca hack from glibc.
+extern int __libc_alloca_cutoff (size_t size) __attribute__ ((const));
+extern __inline __attribute__ ((__always_inline__))
+int
+__libc_use_alloca (size_t size)
+{
+  return (__builtin_expect (size <= 4000 / 4, 1)
+ || __builtin_expect (__libc_alloca_cutoff (size), 1));
+}
+
+void test_libc_use_alloca_from_glibc (const char *tocode)
+{
+  size_t len = strlen (tocode) + 3;
+  _Bool usealloca = __libc_use_alloca (len

Re: RFC: pass to warn on questionable uses of alloca().

2016-06-16 Thread Jakub Jelinek
On Thu, Jun 16, 2016 at 04:32:57AM -0400, Aldy Hernandez wrote:
> p.s. The pass currently warns on all uses of VLAs.  I'm not completely sold
> on this idea, so perhaps we could remove it, or gate it with a flag.

Just random nits, no comments on the idea of the patch.

>   * walloca.c: New pass.

Wouldn't it be better to call it gimple-ssa-warn-alloca.c or something
similar?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/Walloca-1.c
> @@ -0,0 +1,89 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Walloca -O -Walloca-max-size=2000" } */
> +
> +#define alloca __builtin_alloca
> +
> +typedef unsigned long size_t;

This should be typedef __SIZE_TYPE__ size_t;

> +  if (range_type == VR_RANGE
> +  && wi::fits_uhwi_p (max)
> +  && max.to_uhwi () <= (unsigned) warn_alloca_max_size)

Can't you use just wide-int comparison?

Jakub


Re: [PATCH] Add port for Phoenix-RTOS on ARM platform.

2016-06-16 Thread Kyrill Tkachov


On 15/06/16 23:23, Jeff Law wrote:

On 06/15/2016 08:21 AM, Jakub Sejdak wrote:

Hello,


First of all, do you or your employer have a copyright assignment
to the FSF? The above link contains instructions on how to do that.
It is a necessary prerequisite to accepting any non-small change.


Sorry for a late response, but it took me some time to fulfill
requirements mentioned above.
We (Phoenix Systems) now have a copyright assignment to the FSF.

Which I can confirm was recently recorded by the FSF.



Thanks Jeff,

Could you please give some guidance with respect to Jakub's request at:
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01153.html ?

This is regarding being appointed maintainer for the arm*-*-phoenix*
targets.

Thanks,
Kyrill


Jeff




Re: [PATCH] Add port for Phoenix-RTOS on ARM platform.

2016-06-16 Thread Jakub Sejdak
Actually, if possible, I would skip the "arm" part, because we plan to
port Phoenix-RTOS to other platforms. It will be easier to do it
once.

2016-06-16 10:52 GMT+02:00 Kyrill Tkachov :
>
> On 15/06/16 23:23, Jeff Law wrote:
>>
>> On 06/15/2016 08:21 AM, Jakub Sejdak wrote:
>>>
>>> Hello,
>>>
 First of all, do you or your employer have a copyright assignment
 to the FSF? The above link contains instructions on how to do that.
 It is a necessary prerequisite to accepting any non-small change.
>>>
>>>
>>> Sorry for a late response, but it took me some time to fulfill
>>> requirements mentioned above.
>>> We (Phoenix Systems) now have a copyright assignment to the FSF.
>>
>> Which I can confirm was recently recorded by the FSF.
>>
>
> Thanks Jeff,
>
> Could you please give some guidance with respect to Jakub's request at:
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01153.html ?
>
> This is regarding being appointed maintainer for the arm*-*-phoenix*
> targets.
>
> Thanks,
> Kyrill
>
>> Jeff
>
>



-- 
Jakub Sejdak
Software Engineer
Phoenix Systems (www.phoesys.com)
+48 608 050 163


Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant

2016-06-16 Thread Kyrill Tkachov


On 15/06/16 22:53, Marc Glisse wrote:

On Wed, 15 Jun 2016, Kyrill Tkachov wrote:


This is a respin of https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html 
following feedback.
I've changed the code to cast the operand to an unsigned type before applying 
the multiplication algorithm
and cast it back to the signed type at the end.
Whether to perform the cast is now determined by the function 
cast_mult_synth_to_unsigned in which I've implemented
the cases that Marc mentioned in [1]. Please do let me know
if there are any other cases that need to be handled.


Ah, I never meant those cases as an exhaustive list, I was just looking for 
examples showing that the transformation was unsafe, and those 2 came to mind:

- x*15 -> x*16-x the second one can overflow even when the first one doesn't.

- x*-2 -> -(x*2) can overflow when the result is INT_MIN (maybe that's 
redundant with the negate_variant check?)

On the other hand, as long as we remain in the 'positive' operations, turning
x*3 to x<<1+x seems perfectly safe. And even x*30 to (x*16-x)*2 cannot cause
spurious overflows. But I didn't look at the algorithm closely enough to
characterize the safe cases. Now if you have done it, that's good :-)
Otherwise, we might want to err on the side of caution.
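
As a stand-alone illustration (not part of the patch) of the first case, here is
a snippet with a value chosen so that x*15 fits in int while the x*16
intermediate step does not:

/* x = INT_MAX/15, so 15*x == 2147483640 still fits in int, but the
   synthesised form goes through 16*x, which exceeds INT_MAX.  */
#include <stdio.h>

int
main (void)
{
  int x = __INT_MAX__ / 15;
  int direct, tmp, synth;

  int ovf_direct = __builtin_mul_overflow (x, 15, &direct);
  int ovf_synth  = __builtin_mul_overflow (x, 16, &tmp);
  ovf_synth |= __builtin_sub_overflow (tmp, x, &synth);

  printf ("x*15 overflows: %d, x*16-x overflows: %d\n", ovf_direct, ovf_synth);
  return 0;
}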




I'll be honest, I didn't give it much thought beyond convincing myself that the 
two cases you listed are legitimate.
Looking at expand_mult_const in expmed.c can be helpful (where it updates 
val_so_far for checking purposes) to see
the different algorithm cases. I think the only steps that could cause overflow 
are alg_sub_t_m2, alg_sub_t2_m and alg_sub_factor or when the final step is 
negate_variant, which are what you listed (and covered in this patch).

richi is away on PTO for the time being though, so we have some time to 
convince ourselves :)

Thanks,
Kyrill



[PATCH, i386][Updated] Add native support for VIA C7, Eden and Nano CPUs

2016-06-16 Thread J. Mayer
The following patch adds support and native detection for C7, Eden
"Samuel2", Eden "Nehemiah", Eden "Esther", Eden x2, Eden x4, Nano 1xxx,
Nano 2xxx, Nano 3xxx, Nano x2 and Nano x4 VIA CPUs.

This patch has been updated against the current repository.
It contains documentation and Changelog updates.

Please CC me on any comments / reviews / change requests.

---

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2650405..05b450a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,15 @@
+2016-06-16  Jocelyn Mayer 
+
+   * gcc/config/i386/driver-i386.c (host_detect_local_cpu): Set
+   PROCESSOR_K8 for signature_CENTAUR_ebx with has_longmode.
+   : Pass nano-3000, nano, eden-x2 or k8 for
+   signature_CENTAUR_ebx.
+   : Pass c7, nehemiah or samuel-2 for
signature_CENTAUR_ebx.
+   * gcc/config/i386/i386.c (ix86_option_override_internal): Add
+   definitions for VIA c7, samuel-2, nehemiah, esther, eden-x2,
eden-x4,
+   nano, nano-1000, nano-2000, nano-3000, nano-x2 and nano-x4.
+   * gcc/doc/invoke.texi: Document new VIA -march entries.
+
 2016-06-15  Michael Meissner  
 
* config/rs6000/vsx.md (VSINT_84): Add DImode to enable loading
diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-
i386.c
index a9d5135..d2c4c4c 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -651,7 +651,9 @@ const char *host_detect_local_cpu (int argc, const
char **argv)
  break;
 
case 6:
- if (model > 9 || has_longmode)
+ if (has_longmode)
+   processor = PROCESSOR_K8;
+ else if (model > 9)
/* Use the default detection procedure.  */
;
  else if (model == 9)
@@ -869,9 +871,30 @@ const char *host_detect_local_cpu (int argc, const
char **argv)
cpu = "athlon";
   break;
 case PROCESSOR_K8:
-  if (arch && has_sse3)
-   cpu = "k8-sse3";
+  if (arch)
+   {
+ if (vendor == signature_CENTAUR_ebx)
+   {
+ if (has_sse4_1)
+   /* Nano 3000 | Nano dual / quad core | Eden X4 */
+   cpu = "nano-3000";
+ else if (has_ssse3)
+   /* Nano 1000 | Nano 2000 */
+   cpu = "nano";
+ else if (has_sse3)
+   /* Eden X2 */
+   cpu = "eden-x2";
+ else
+   /* Default to k8 */
+   cpu = "k8";
+   }
+ else if (has_sse3)
+   cpu = "k8-sse3";
+ else
+   cpu = "k8";
+   }
   else
+   /* For -mtune, we default to -mtune=k8 */
cpu = "k8";
   break;
 case PROCESSOR_AMDFAM10:
@@ -903,7 +926,22 @@ const char *host_detect_local_cpu (int argc, const
char **argv)
   /* Use something reasonable.  */
   if (arch)
{
- if (has_ssse3)
+ if (vendor == signature_CENTAUR_ebx)
+   {
+ if (has_sse3) {
+   /* C7 / Eden "Esther" */
+   cpu = "c7";
+ } else if (has_sse) {
+   /* Eden "Nehemiah" */
+   cpu = "nehemiah";
+ } else if (has_3dnow) {
+   /* Eden "Samuel2" */
+   cpu = "samuel-2";
+ } else {
+   /* We have no idea: default to generic i386 */
+ }
+   }
+ else if (has_ssse3)
cpu = "core2";
  else if (has_sse3)
{
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c5e5e12..3c88912 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4783,8 +4783,15 @@ ix86_option_override_internal (bool main_args_p,
   {"winchip-c6", PROCESSOR_I486, CPU_NONE, PTA_MMX},
   {"winchip2", PROCESSOR_I486, CPU_NONE, PTA_MMX | PTA_3DNOW |
PTA_PRFCHW},
   {"c3", PROCESSOR_I486, CPU_NONE, PTA_MMX | PTA_3DNOW |
PTA_PRFCHW},
+  {"samuel-2", PROCESSOR_I486, CPU_NONE, PTA_MMX | PTA_3DNOW |
PTA_PRFCHW},
   {"c3-2", PROCESSOR_PENTIUMPRO, CPU_PENTIUMPRO,
PTA_MMX | PTA_SSE | PTA_FXSR},
+  {"nehemiah", PROCESSOR_PENTIUMPRO, CPU_PENTIUMPRO,
+   PTA_MMX | PTA_SSE | PTA_FXSR},
+  {"c7", PROCESSOR_PENTIUMPRO, CPU_PENTIUMPRO,
+   PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_FXSR},
+  {"esther", PROCESSOR_PENTIUMPRO, CPU_PENTIUMPRO,
+   PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_FXSR},
   {"i686", PROCESSOR_PENTIUMPRO, CPU_PENTIUMPRO, 0},
   {"pentiumpro", PROCESSOR_PENTIUMPRO, CPU_PENTIUMPRO, 0},
   {"pentium2", PROCESSOR_PENTIUMPRO, CPU_PENTIUMPRO, PTA_MMX |
PTA_FXSR},
@@ -4915,6 +4922,30 @@ ix86_option_override_internal (bool main_args_p,
| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
| PTA_BMI | PTA_F16C | PTA_MOVBE | PTA_PRFCHW
| PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
+  {"eden-x2", PROCESSOR_K8, CPU_K8,
+   PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+| PTA_FXSR},
+  {"eden-x4", PROCESSOR_K8, CPU_K8,
+   PTA_64BIT | P

Re: [PATCH][ARM][1/4] Replace uses of int_log2 by exact_log2

2016-06-16 Thread Kyrill Tkachov

Ping.

Thanks,
Kyrill

On 09/06/16 12:04, Kyrill Tkachov wrote:

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00139.html

Thanks,
Kyrill

On 02/06/16 11:37, Kyrill Tkachov wrote:

I wanted to ping this patch, but checking the gcc-patches archive I see this 
wasn't
archived, though I have confirmed that the patch was sent out and distributed by
the mail server correctly.

Anyway, resending...

Thanks,
Kyrill

On 24/05/16 14:25, Kyrill Tkachov wrote:

Hi all,

The int_log2 function in arm.c is not really useful since we already have a 
generic function for calculating
the log2 of HOST_WIDE_INTs. The only difference in functionality is that 
int_log2 also asserts that the result
is no greater than 31.

This patch removes int_log2 in favour of exact_log2 and adds an assert on the 
result to make sure the return
value was as expected.
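
As a rough, self-contained sketch of the resulting pattern (local stand-in helper
and names, not the arm.c code): an exact_log2-style query returns -1 for values
that are not powers of two, and the caller asserts the 0..31 range that int_log2
used to guarantee.

#include <assert.h>
#include <stdio.h>

/* Stand-in for exact_log2: log2 of X if X is a power of two, else -1.  */
static int
exact_log2_sketch (unsigned long long x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int log = 0;
  while (x >>= 1)
    log++;
  return log;
}

int
main (void)
{
  int shift = exact_log2_sketch (1ULL << 12);
  assert (shift >= 0 && shift <= 31);   /* the range int_log2 asserted */
  printf ("%d\n", shift);
  return 0;
}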

Bootstrapped and tested on arm-none-linux-gnueabihf.

Is this ok? Or is there something I'm missing about int_log2?

Thanks,
Kyrill

2016-05-24  Kyrylo Tkachov  

* config/arm/arm.c (int_log2): Delete definition and prototype.
(shift_op): Use exact_log2 instead of int_log2.
(vfp3_const_double_for_fract_bits): Likewise.








[Ada] Exclude private protected type defined in the runtime for restrictions

2016-06-16 Thread Arnaud Charlet
This is preliminary work to allow an implementation change in the runtime.
Does not affect users.

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-06-16  Tristan Gingold  

* einfo.ads (Has_Protected): Clarify comment.
* sem_ch9.adb (Analyze_Protected_Type_Declaration): Do not
consider private protected types declared in the runtime for
the No_Local_Protected_Types restriction.

Index: sem_ch9.adb
===
--- sem_ch9.adb (revision 237439)
+++ sem_ch9.adb (working copy)
@@ -32,8 +32,10 @@
 with Errout;use Errout;
 with Exp_Ch9;   use Exp_Ch9;
 with Elists;use Elists;
+with Fname; use Fname;
 with Freeze;use Freeze;
 with Layout;use Layout;
+with Lib;   use Lib;
 with Lib.Xref;  use Lib.Xref;
 with Namet; use Namet;
 with Nlists;use Nlists;
@@ -1985,12 +1987,27 @@
 
   Set_Ekind  (T, E_Protected_Type);
   Set_Is_First_Subtype   (T, True);
-  Set_Has_Protected  (T, True);
   Init_Size_Align(T);
   Set_Etype  (T, T);
   Set_Has_Delayed_Freeze (T, True);
   Set_Stored_Constraint  (T, No_Elist);
 
+  --  Mark this type as a protected type for the sake of restrictions,
+  --  unless the protected type is declared in a private part of a package
+  --  of the runtime. With this exception, the Suspension_Object from
+  --  Ada.Synchronous_Task_Control can be implemented using a protected
+  --  without triggering violations of No_Local_Protected_Objects when the
+  --  user locally declares such an object. This may look like a trick but
+  --  the user doesn't have to know how Suspension_Object is implemented.
+
+  if In_Private_Part (Current_Scope)
+and then Is_Internal_File_Name (Unit_File_Name (Current_Sem_Unit))
+  then
+ Set_Has_Protected   (T, False);
+  else
+ Set_Has_Protected   (T, True);
+  end if;
+
   --  Set the SPARK_Mode from the current context (may be overwritten later
   --  with an explicit pragma).
 
Index: einfo.ads
===
--- einfo.ads   (revision 237436)
+++ einfo.ads   (working copy)
@@ -1936,10 +1936,10 @@
 --Has_Protected (Flag271) [base type only]
 --   Defined in all type entities. Set on protected types themselves, and
 --   also (recursively) on any composite type which has a component for
---   which Has_Protected is set. The meaning is that an allocator for
---   or declaration of such an object must create the required protected
---   objects. Note: the flag is not set on access types, even if they
---   designate an object that Has_Protected.
+--   which Has_Protected is set, unless the protected type is declared in
+--   the private part of an internal unit. The meaning is that restrictions
+--   for protected types apply to this type. Note: the flag is not set on
+--   access types, even if they designate an object that Has_Protected.
 
 --Has_Qualified_Name (Flag161)
 --   Defined in all entities. Set if the name in the Chars field has


[Ada] Improve the support of No_Use_Entity

2016-06-16 Thread Arnaud Charlet
This patch performs a code cleanup of the previous implementation and extends
its functionality to facilitate the use of this restriction with entities of
the Ada83 package Text_IO. For example:

pragma Restrictions (No_Use_Of_Entity => Text_IO.Put_Line);

with Text_IO; use Text_IO;
procedure Restrict is
begin
   Put ("Hello");
   Put_Line ("Hello_World!");  -- Restriction failed

   Text_IO.Put ("Hello");
   Text_IO.Put_Line ("Hello_World!");  -- Restriction failed
end;

Command: gcc -c restrict.adb
Output:
restrict.adb:7:04: reference to "Put_Line" violates restriction
   No_Use_Of_Entity at line 1
restrict.adb:10:11: reference to "Text_IO.Put_Line" violates restriction
No_Use_Of_Entity at line 1

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-06-16  Javier Miranda  

* restrict.adb (Check_Restriction_No_Use_Of_Entity): Avoid
never-ending loop, code cleanup; adding also support for Text_IO.
* sem_ch8.adb (Find_Expanded_Name): Invoke
Check_Restriction_No_Use_Entity.

Index: restrict.adb
===
--- restrict.adb(revision 237429)
+++ restrict.adb(working copy)
@@ -6,7 +6,7 @@
 --  --
 -- B o d y  --
 --  --
---  Copyright (C) 1992-2015, Free Software Foundation, Inc. --
+--  Copyright (C) 1992-2016, Free Software Foundation, Inc. --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -759,9 +759,16 @@
 Ent  := Entity (N);
 Expr := NE_Ent.Entity;
 loop
-   --  Here if at outer level of entity name in reference
+   --  Here if at outer level of entity name in reference (handle
+   --  also the direct use of Text_IO in the pragma). For example:
+   --  pragma Restrictions (No_Use_Of_Entity => Text_IO.Put);
 
-   if Scope (Ent) = Standard_Standard then
+   if Scope (Ent) = Standard_Standard
+ or else (Nkind (Expr) = N_Identifier
+   and then Chars (Ent) = Name_Text_IO
+   and then Chars (Scope (Ent)) = Name_Ada
+   and then Scope (Scope (Ent)) = Standard_Standard)
+   then
   if Nkind_In (Expr, N_Identifier, N_Operator_Symbol)
 and then Chars (Ent) = Chars (Expr)
   then
@@ -774,22 +781,19 @@
  return;
 
   else
- goto Continue;
+ exit;
   end if;
 
--  Here if at outer level of entity name in table
 
elsif Nkind_In (Expr, N_Identifier, N_Operator_Symbol) then
-  goto Continue;
+  exit;
 
--  Here if neither at the outer level
 
else
   pragma Assert (Nkind (Expr) = N_Selected_Component);
-
-  if Chars (Selector_Name (Expr)) /= Chars (Ent) then
- goto Continue;
-  end if;
+  exit when Chars (Selector_Name (Expr)) /= Chars (Ent);
end if;
 
--  Move up a level
@@ -800,10 +804,6 @@
end loop;
 
Expr := Prefix (Expr);
-
-   --  Entry did not match
-
-   <<Continue>> null;
 end loop;
  end;
   end loop;
Index: sem_ch12.adb
===
--- sem_ch12.adb(revision 237437)
+++ sem_ch12.adb(working copy)
@@ -1112,7 +1112,7 @@
   --  Find actual that corresponds to a given a formal parameter. If the
   --  actuals are positional, return the next one, if any. If the actuals
   --  are named, scan the parameter associations to find the right one.
-  --  A_F is the corresponding entity in the analyzed generic,which is
+  --  A_F is the corresponding entity in the analyzed generic, which is
   --  placed on the selector name for ASIS use.
   --
   --  In Ada 2005, a named association may be given with a box, in which
@@ -1257,7 +1257,7 @@
 
  elsif No (Selector_Name (Actual)) then
 Found_Assoc := Actual;
-Act := Explicit_Generic_Actual_Parameter (Actual);
+Act := Explicit_Generic_Actual_Parameter (Actual);
 Num_Matched := Num_Matched + 1;
 Next (Actual);
 
@@ -1271,12 +1271,17 @@
 Prev:= Empty;
 
 while Presen

Re: [PATCH] Fix builtin-arith-overflow-p-1[23].c on i686

2016-06-16 Thread Jakub Jelinek
On Wed, Jun 15, 2016 at 10:44:06PM +0200, Uros Bizjak wrote:
> Please also change similar peephole2 pattern (that does a zext with an
> and insn) a couple of patterns below the one you are changing.

Here is what I've committed to the trunk and 6.2 after bootstrap/regtest on
x86_64-linux and i686-linux.
For 5/4.9, this doesn't apply cleanly, as http://gcc.gnu.org/r222592
aka https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01930.html
has not been backported.  Shall I backport that too, or just not apply to
5/4.9?  I guess I should, because:
int v;
__attribute__ ((noinline, noclone)) void bar (void) { v++; }
__attribute__ ((noinline, noclone)) void foo (unsigned int x) { signed int y = ((-__INT_MAX__ - 1) / 2), r; if (__builtin_mul_overflow (x, y, &r)) bar (); }
int main () { foo (2); if (v) __builtin_abort (); return 0; }
is miscompiled with -m32 -O2 in 5.x (though, 4.9 doesn't support
__builtin_*_overflow, so maybe it is not an issue there).

2016-06-16  Jakub Jelinek  

* config/i386/i386.md (setcc + movzbl peephole2): Use reg_set_p.
(setcc + and peephole2): Likewise.

--- gcc/config/i386/i386.md.jj  2016-06-15 19:09:09.233682173 +0200
+++ gcc/config/i386/i386.md 2016-06-16 09:28:23.725863387 +0200
@@ -11849,8 +11849,7 @@ (define_peephole2
   "(peep2_reg_dead_p (3, operands[1])
 || operands_match_p (operands[1], operands[3]))
&& ! reg_overlap_mentioned_p (operands[3], operands[0])
-   && ! (GET_CODE (operands[4]) == CLOBBER
-&& reg_mentioned_p (operands[3], operands[4]))"
+   && ! reg_set_p (operands[3], operands[4])"
   [(parallel [(set (match_dup 5) (match_dup 0))
  (match_dup 4)])
(set (strict_low_part (match_dup 6))
@@ -11894,8 +11893,7 @@ (define_peephole2
   "(peep2_reg_dead_p (3, operands[1])
 || operands_match_p (operands[1], operands[3]))
&& ! reg_overlap_mentioned_p (operands[3], operands[0])
-   && ! (GET_CODE (operands[4]) == CLOBBER
-&& reg_mentioned_p (operands[3], operands[4]))"
+   && ! reg_set_p (operands[3], operands[4])"
   [(parallel [(set (match_dup 5) (match_dup 0))
  (match_dup 4)])
(set (strict_low_part (match_dup 6))

Jakub


[Ada] Warn on buffer overrun with complex overlay

2016-06-16 Thread Arnaud Charlet
This change improves the warning issued for buffer overruns caused by overlays
where the underlying object is too small, by taking into account the offset
of the overlaid object from the first bit of the underlying object.

The effect is visible on the following package:

 1. with Interfaces; use Interfaces;
 2.
 3. package P is
 4.
 5.   type Arr1 is array (Positive range <>) of Unsigned_16;
 6.
 7.   type Rec1 is record
 8. I : Integer;
 9. A : Arr1 (1 .. 4);
10.   end record;
11.
12.   type Arr2 is array (Positive range <>) of Rec1;
13.
14.   type Rec2 is record
15. I : Integer;
16. A : Arr2 (1 .. 2);
17.   end record;
18.
19.   R : Rec2;
20.
21.   Obj1 : Arr1 (1 .. 13);
22.   for Obj1'Address use R.A(1).I'Address;  -- warning
  |
>>> warning: "Obj1" overlays smaller object
>>> warning: program execution may be erroneous
>>> warning: size of "Obj1" is 208
>>> warning: size of "R" is 224
>>> warning: and offset of "Obj1" is 32

23.
24.   Obj2 : Arr1 (1 .. 7);
25.   for Obj2'Address use R.A(2).I'Address;  -- warning
  |
>>> warning: "Obj2" overlays smaller object
>>> warning: program execution may be erroneous
>>> warning: size of "Obj2" is 112
>>> warning: size of "R" is 224
>>> warning: and offset of "Obj2" is 128

26.
27.   Obj3 : Arr1 (1 .. 10);
28.   for Obj3'Address use R.A(1).A(2)'Address;  -- warning
  |
>>> warning: "Obj3" overlays smaller object
>>> warning: program execution may be erroneous
>>> warning: size of "Obj3" is 160
>>> warning: size of "R" is 224
>>> warning: and offset of "Obj3" is 80

29.
30.   Obj4 : Arr1 (1 .. 2);
31.   for Obj4'Address use R.A(2).A(4)'Address;  -- warning
  |
>>> warning: "Obj4" overlays smaller object
>>> warning: program execution may be erroneous
>>> warning: size of "Obj4" is 32
>>> warning: size of "R" is 224
>>> warning: and offset of "Obj4" is 208

32.
33.   Obj5 : Unsigned_16;
34.   for Obj5'Address use R.A(2).A(4)'Address;  -- no warning
35.
36. end P;
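
For reference, the numbers in the warnings follow directly from the layouts
above: Rec1 is 32 + 4*16 = 96 bits, so R (of type Rec2) is 32 + 2*96 = 224 bits;
Obj1 is 13*16 = 208 bits and starts at bit offset 32 (the position of R.A(1).I),
and 32 + 208 = 240 > 224, hence the warning. The other cases are analogous.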

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-06-16  Eric Botcazou  

* sem_util.ads (Indexed_Component_Bit_Offset): Declare.
* sem_util.adb (Indexed_Component_Bit_Offset): New
function returning the offset of an indexed component.
(Has_Compatible_Alignment_Internal): Call it.
* sem_ch13.adb (Offset_Value): New function returning the offset of an
Address attribute reference from the underlying entity.
(Validate_Address_Clauses): Call it and take the offset into
account for the size warning.

Index: sem_util.adb
===
--- sem_util.adb(revision 237510)
+++ sem_util.adb(working copy)
@@ -8780,7 +8780,6 @@
  elsif Nkind (Expr) = N_Indexed_Component then
 declare
Typ : constant Entity_Id := Etype (Prefix (Expr));
-   Ind : constant Node_Id   := First_Index (Typ);
 
 begin
--  Packing generates unknown alignment if layout is not done
@@ -8789,22 +8788,12 @@
   Set_Result (Unknown);
end if;
 
-   --  Check prefix and component offset
+   --  Check prefix and component offset (or at least size)
 
Check_Prefix;
-   Offs := Component_Size (Typ);
-
-   --  Small optimization: compute the full offset when possible
-
-   if Offs /= No_Uint
- and then Offs > Uint_0
- and then Present (Ind)
- and then Nkind (Ind) = N_Range
- and then Compile_Time_Known_Value (Low_Bound (Ind))
- and then Compile_Time_Known_Value (First (Expressions (Expr)))
-   then
-  Offs := Offs * (Expr_Value (First (Expressions (Expr)))
-- Expr_Value (Low_Bound ((Ind;
+   Offs := Indexed_Component_Bit_Offset (Expr);
+   if Offs = No_Uint then
+  Offs := Component_Size (Typ);
end if;
 end;
  end if;
@@ -11064,6 +11053,59 @@
   return Empty;
end Incomplete_Or_Partial_View;
 
+   --
+   -- Indexed_Component_Bit_Offset --
+   --
+
+   function Indexed_Component_Bit_Offset (N : Node_Id) return Uint is
+  Exp : constant Node_Id   := First (Expressions (N));
+  Typ : constant Entity_Id := Etype (Prefix (N));
+  Off : constant Uint  := Component_Size (Typ);
+  Ind : Node_Id;
+
+   begin
+  --  Return early if the component size is not known or variable
+
+  if Off 

[Ada] Avoid anonymous array object for aggregates with qualified expressions

2016-06-16 Thread Arnaud Charlet
This patch improves the memory usage of object declarations initialized by
a qualified array aggregate. Previously, as per RM 4.3(5), an anonymous object
was created to capture the value of the array aggregate, effectively doubling
the memory consumption. These changes remove the anonymous object
declaration and instead ignore the qualified expression. As noted in the
comments, this is allowed due to RM 7.6(17 1/3).


-- Source --


--  pack.adb

procedure Pack is
   
   type Rec is record
  I  : Integer;
  SI : Short_Integer;
  B  : Boolean;
   end record;

   type Arr is array (1 .. 3, 0 .. 255) of Rec;
   Obj_1 : Arr := Arr'(others => (others => Rec'(0, 0, False)));

begin
   null;
end Pack;


-- Compilation and output --


gnatmake -g -f -gnatD pack.adb
grep "obj_1[]*:[a-z_]*;" pack.adb.dg
   obj_1 : pack__arr;

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-06-16  Justin Squirek  

* sem_ch3.adb (Analyze_Object_Declaration): Add a missing check
for optimized aggregate arrays with qualified expressions.
* exp_aggr.adb (Expand_Array_Aggregate): Fix block and
conditional statement in charge of deciding whether to perform
in-place expansion. Specifically, use Parent_Node to jump over
the qualified expression to the object declaration node. Also,
a check has been inserted to skip the optimization if SPARK 2005
is being used in strict adherence to RM 4.3(5).

Index: sem_ch3.adb
===
--- sem_ch3.adb (revision 237439)
+++ sem_ch3.adb (working copy)
@@ -3471,7 +3471,7 @@
 
  --  In case of aggregates we must also take care of the correct
  --  initialization of nested aggregates bug this is done at the
- --  point of the analysis of the aggregate (see sem_aggr.adb).
+ --  point of the analysis of the aggregate (see sem_aggr.adb) ???
 
  if Present (Expression (N))
and then Nkind (Expression (N)) = N_Aggregate
@@ -4038,7 +4038,10 @@
 
   elsif Is_Array_Type (T)
 and then No_Initialization (N)
-and then Nkind (Original_Node (E)) = N_Aggregate
+and then (Nkind (Original_Node (E)) = N_Aggregate
+   or else (Nkind (Original_Node (E)) = N_Qualified_Expression
+ and then Nkind (Original_Node (Expression
+(Original_Node (E = N_Aggregate))
   then
  if not Is_Entity_Name (Object_Definition (N)) then
 Act_T := Etype (E);
Index: exp_aggr.adb
===
--- exp_aggr.adb(revision 237429)
+++ exp_aggr.adb(working copy)
@@ -5433,8 +5433,8 @@
 
   --  STEP 3
 
-  --  Delay expansion for nested aggregates: it will be taken care of
-  --  when the parent aggregate is expanded.
+  --  Delay expansion for nested aggregates: it will be taken care of when
+  --  the parent aggregate is expanded.
 
   Parent_Node := Parent (N);
   Parent_Kind := Nkind (Parent_Node);
@@ -5524,14 +5524,18 @@
  and then Parent_Kind = N_Object_Declaration
  and then not
Must_Slide (Etype (Defining_Identifier (Parent_Node)), Typ)
- and then N = Expression (Parent_Node)
+ and then Present (Expression (Parent_Node))
+ and then not Has_Controlled_Component (Typ)
  and then not Is_Bit_Packed_Array (Typ)
- and then not Has_Controlled_Component (Typ)
+
+ --  ??? the test for SPARK 05 needs documentation
+
+ and then not Restriction_Check_Required (SPARK_05)
   then
  In_Place_Assign_OK_For_Declaration := True;
- Tmp := Defining_Identifier (Parent (N));
- Set_No_Initialization (Parent (N));
- Set_Expression (Parent (N), Empty);
+ Tmp := Defining_Identifier (Parent_Node);
+ Set_No_Initialization (Parent_Node);
+ Set_Expression (Parent_Node, Empty);
 
  --  Set kind and type of the entity, for use in the analysis
  --  of the subsequent assignments. If the nominal type is not
@@ -5544,10 +5548,10 @@
  if not Is_Constrained (Typ) then
 Build_Constrained_Type (Positional => False);
 
- elsif Is_Entity_Name (Object_Definition (Parent (N)))
-   and then Is_Constrained (Entity (Object_Definition (Parent (N
+ elsif Is_Entity_Name (Object_Definition (Parent_Node))
+   and then Is_Constrained (Entity (Object_Definition (Parent_Node)))
  then
-Set_Etype (Tmp, Entity (Object_Definition (Parent (N;
+Set_Etype (Tmp, Entity (Object_Definition (Parent_Node)));
 
  else
 Set_Size_Known_At_Compile_Time (Typ, False);


[Ada] Use System.Priority to validate pragma Priority value for subprogram.

2016-06-16 Thread Arnaud Charlet
This fixes a corner case for pragma Priority (0) set on the main subprogram.
Does not affect usual platforms.

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-06-16  Tristan Gingold  

* sem_prag.adb (Analyze_Pragma): Simplify code
for Pragma_Priority.

Index: sem_prag.adb
===
--- sem_prag.adb(revision 237433)
+++ sem_prag.adb(working copy)
@@ -18903,22 +18903,15 @@
--  where we ignore the value if out of range.
 
else
-  declare
- Val : constant Uint := Expr_Value (Arg);
-  begin
- if not Relaxed_RM_Semantics
-   and then
- (Val < 0
-   or else Val > Expr_Value (Expression
-   (Parent (RTE (RE_Max_Priority)
- then
-Error_Pragma_Arg
-  ("main subprogram priority is out of range", Arg1);
- else
-Set_Main_Priority
-  (Current_Sem_Unit, UI_To_Int (Expr_Value (Arg)));
- end if;
-  end;
+  if not Relaxed_RM_Semantics
+and then not Is_In_Range (Arg, RTE (RE_Priority))
+  then
+ Error_Pragma_Arg
+   ("main subprogram priority is out of range", Arg1);
+  else
+ Set_Main_Priority
+   (Current_Sem_Unit, UI_To_Int (Expr_Value (Arg)));
+  end if;
end if;
 
--  Load an arbitrary entity from System.Tasking.Stages or


[Ada] Fix minor memory leak in GNAT.Command_Line

2016-06-16 Thread Arnaud Charlet
When a new switch is defined with a specific name for its parameter,
that name is not freed. This is a minor leak, since such switches
are in general defined once at the beginning of the program, and
never modified afterwards.
Detected with valgrind.

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-06-16  Emmanuel Briot  

* g-comlin.adb: Fix minor memory leak in GNAT.Command_Line.

Index: g-comlin.adb
===
--- g-comlin.adb(revision 237429)
+++ g-comlin.adb(working copy)
@@ -3073,6 +3073,7 @@
Free (Config.Switches (S).Long_Switch);
Free (Config.Switches (S).Help);
Free (Config.Switches (S).Section);
+   Free (Config.Switches (S).Argument);
 end loop;
 
 Unchecked_Free (Config.Switches);


[Ada] Missing finalization of controlled build-in-place function result

2016-06-16 Thread Arnaud Charlet
This patch modifies the finalization machinery to recognize a controlled
deferred constant initialized by means of a build-in-place function call
as requiring finalization actions.


-- Source --


--  types.ads

private with Ada.Finalization;

package Types is
   type T (<>) is limited private;
   function Create return T;

private
   type T is new Ada.Finalization.Limited_Controlled with record
  Id : Natural := 0;
   end record;

   overriding procedure Initialize (X : in out T);
   overriding procedure Finalize (X : in out T);
end Types;

--  types.adb

with Ada.Text_IO; use Ada.Text_IO;

package body Types is
   Id_Gen : Natural := 0;

   procedure Finalize (X : in out T) is
   begin
  Put_Line ("  fin" & X.Id'Img);
  X.Id := 0;
   end;

   procedure Initialize (X : in out T) is
   begin
  Id_Gen := Id_Gen + 1;
  X.Id   := Id_Gen;
  Put_Line ("  ini" & X.Id'Img);
   end Initialize;

   function Create return T is
   begin
  return Result : T do
 Put_Line ("Create");
  end return;
   end Create;
end Types;

--  main.adb

with Types; use Types;

procedure Main is
   Obj : T renames Create;
begin
   null;
end Main;


-- Compilation and output --


$ gnatmake -q main.adb
$ ./main
  ini 1
Create
  fin 1

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-06-16  Hristian Kirtchev  

* exp_ch7.adb (Find_Last_Init): Remove obsolete code. The
logic is now performed by Process_Object_Declaration.
(Process_Declarations): Recognize a controlled deferred
constant which is in fact initialized by means of a
build-in-place function call as needing finalization actions.
(Process_Object_Declaration): Insert the counter after the
build-in-place initialization call for a controlled object. This
was previously done in Find_Last_Init.
* exp_util.adb (Requires_Cleanup_Actions): Recognize a controlled
deferred constant which is in fact initialized by means of a
build-in-place function call as needing finalization actions.

Index: exp_ch7.adb
===
--- exp_ch7.adb (revision 237429)
+++ exp_ch7.adb (working copy)
@@ -2100,16 +2100,21 @@
   null;
 
--  The object is of the form:
-   --Obj : Typ [:= Expr];
+   --Obj : [constant] Typ [:= Expr];
 
-   --  Do not process the incomplete view of a deferred constant.
-   --  Do not consider tag-to-class-wide conversions.
+   --  Do not process tag-to-class-wide conversions because they do
+   --  not yield an object. Do not process the incomplete view of a
+   --  deferred constant. Note that an object initialized by means
+   --  of a build-in-place function call may appear as a deferred
+   --  constant after expansion activities. These kinds of objects
+   --  must be finalized.
 
elsif not Is_Imported (Obj_Id)
  and then Needs_Finalization (Obj_Typ)
- and then not (Ekind (Obj_Id) = E_Constant
-and then not Has_Completion (Obj_Id))
  and then not Is_Tag_To_Class_Wide_Conversion (Obj_Id)
+ and then not (Ekind (Obj_Id) = E_Constant
+and then not Has_Completion (Obj_Id)
+and then No (BIP_Initialization_Call (Obj_Id)))
then
   Processing_Actions;
 
@@ -2757,48 +2762,9 @@
 
 Stmt := Next_Suitable_Statement (Decl);
 
---  A limited controlled object initialized by a function call uses
---  the build-in-place machinery to obtain its value.
+--  Nothing to do for an object with suppressed initialization
 
---Obj : Lim_Controlled_Type := Func_Call;
-
---  is expanded into
-
---Obj  : Lim_Controlled_Type;
---type Ptr_Typ is access Lim_Controlled_Type;
---Temp : constant Ptr_Typ :=
--- Func_Call
---   (BIPalloc  => 1,
---BIPaccess => Obj'Unrestricted_Access)'reference;
-
---  In this scenario the declaration of the temporary acts as the
---  last initialization statement.
-
-if Is_Limited_Type (Obj_Typ)
-  and then Has_Init_Expression (Decl)
-  and then No (Expression (Decl))
-then
-   while Present (Stmt) loop
-  if Nkind (Stmt) = N_Object_Declaration
-and then Present (Expression (Stmt))
-and then Is_Object_Access_BIP_Func_Call
-   (Expr   => Expression (Stmt),
-Obj_I

[Ada] Missing errors on illegal expressions for entry pre/postconditions

2016-06-16 Thread Arnaud Charlet
This patch adds checks on the expressions of pre/postconditions for task and
protected entries, prior to their full analysis, so that errors are properly
emitted in various compiler modes.

Tested by ACATS 4.0L: B611008

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-06-16  Ed Schonberg  

* sem_ch3.adb (Check_Entry_Contracts): New procedure, subsidiary
of Analyze_Declarations, that performs pre-analysis of
pre/postconditions on entry declarations before full analysis
is performed after entries have been converted into procedures.
Done solely to capture semantic errors.
* sem_attr.adb (Analyze_Attribute, case 'Result): Add guard to
call to Denote_Same_Function.

Index: sem_ch3.adb
===
--- sem_ch3.adb (revision 237514)
+++ sem_ch3.adb (working copy)
@@ -2165,6 +2165,13 @@
   --  (They have the sloc of the label as found in the source, and that
   --  is ahead of the current declarative part).
 
+  procedure Check_Entry_Contracts;
+  --  Perform a pre-analysis of the pre- and postconditions of an entry
+  --  declaration. This must be done before full resolution and creation
+  --  of the parameter block, etc. to catch illegal uses within the
+  --  contract expression. Full analysis of the expression is done when
+  --  the contract is processed.
+
   procedure Handle_Late_Controlled_Primitive (Body_Decl : Node_Id);
   --  Determine whether Body_Decl denotes the body of a late controlled
   --  primitive (either Initialize, Adjust or Finalize). If this is the
@@ -2189,6 +2196,56 @@
  end loop;
   end Adjust_Decl;
 
+  ---
+  -- Check_Entry_Contracts --
+  ---
+
+  procedure Check_Entry_Contracts is
+ ASN : Node_Id;
+ Ent : Entity_Id;
+ Exp : Node_Id;
+
+  begin
+ Ent := First_Entity (Current_Scope);
+ while Present (Ent) loop
+
+--  This only concerns entries with pre/postconditions
+
+if Ekind (Ent) = E_Entry
+  and then Present (Contract (Ent))
+  and then Present (Pre_Post_Conditions (Contract (Ent)))
+then
+   ASN := Pre_Post_Conditions (Contract (Ent));
+   Push_Scope (Ent);
+   Install_Formals (Ent);
+
+   --  Pre/postconditions are rewritten as Check pragmas. Analysis
+   --  is performed on a copy of the pragma expression, to prevent
+   --  modifying the original expression.
+
+   while Present (ASN) loop
+  if Nkind (ASN) = N_Pragma then
+ Exp :=
+   New_Copy_Tree
+ (Expression
+   (First (Pragma_Argument_Associations (ASN;
+ Set_Parent (Exp, ASN);
+
+ --  ??? why not Preanalyze_Assert_Expression
+
+ Preanalyze (Exp);
+  end if;
+
+  ASN := Next_Pragma (ASN);
+   end loop;
+
+   End_Scope;
+end if;
+
+Next_Entity (Ent);
+ end loop;
+  end Check_Entry_Contracts;
+
   --
   -- Handle_Late_Controlled_Primitive --
   --
@@ -2349,12 +2406,14 @@
  --  (This is needed in any case for early instantiations ???).
 
  if No (Next_Decl) then
-if Nkind_In (Parent (L), N_Component_List,
- N_Task_Definition,
- N_Protected_Definition)
-then
+if Nkind (Parent (L)) = N_Component_List then
null;
 
+elsif Nkind_In (Parent (L), N_Protected_Definition,
+N_Task_Definition)
+then
+   Check_Entry_Contracts;
+
 elsif Nkind (Parent (L)) /= N_Package_Specification then
if Nkind (Parent (L)) = N_Package_Body then
   Freeze_From := First_Entity (Current_Scope);
Index: sem_attr.adb
===
--- sem_attr.adb(revision 237507)
+++ sem_attr.adb(working copy)
@@ -5348,7 +5348,9 @@
 if Is_Entity_Name (P) then
Pref_Id := Entity (P);
 
-   if Ekind_In (Pref_Id, E_Function, E_Generic_Function) then
+   if Ekind_In (Pref_Id, E_Function, E_Generic_Function)
+ and then Ekind (Spec_Id) = Ekind (Pref_Id)
+   then
   if Denote_Same_Function (Pref_Id, Spec_Id) then
 
  --  Correct the prefix of the attribute when the context


[PATCH][COMMITTED] [ARC] Fix option text.

2016-06-16 Thread Claudiu Zissulescu
Add a dot at the end of the sentence.

gcc/
2016-06-16  Claudiu Zissulescu  

* config/arc/arc.opt (mtp-regno): Update text.
---
 gcc/ChangeLog  | 4 
 gcc/config/arc/arc.opt | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b93a689..9138fd3 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2016-06-16  Claudiu Zissulescu  
+
+   * config/arc/arc.opt (mtp-regno): Update text.
+
 2016-06-16  Renlin Li  
 
* config/aarch64/aarch64.c (aarch64_legitimize_address): Fix a typo.
diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt
index d957a92..4caf366 100644
--- a/gcc/config/arc/arc.opt
+++ b/gcc/config/arc/arc.opt
@@ -465,7 +465,7 @@ Enum(arc_fpu) String(fpud_all) Value(FPU_SP | FPU_SC | 
FPU_SF | FPU_SD | FPU_DP
 
 mtp-regno=
 Target RejectNegative Joined UInteger Var(arc_tp_regno) Init(25)
-Specify thread pointer register number
+Specify thread pointer register number.
 
 mtp-regno=none
 Target RejectNegative Var(arc_tp_regno,-1)
-- 
1.9.1



Re: [PATCH] Fix builtin-arith-overflow-p-1[23].c on i686

2016-06-16 Thread Jakub Jelinek
On Thu, Jun 16, 2016 at 11:51:12AM +0200, Jakub Jelinek wrote:
> Here is what I've committed to the trunk and 6.2 after bootstrap/regtest on
> x86_64-linux and i686-linux.
> For 5/4.9, this doesn't apply cleanly, as http://gcc.gnu.org/r222592
> aka https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01930.html
> has not been backported.  Shall I backport that too, or just not apply to
> 5/4.9?  I guess I should, because:
> int v;
> __attribute__ ((noinline, noclone)) void bar (void) { v++; }
> __attribute__ ((noinline, noclone)) void foo (unsigned int x) { signed int y 
> = ((-__INT_MAX__ - 1) / 2), r; if (__builtin_mul_overflow (x, y, &r)) bar (); 
> }
> int main () { foo (2); if (v) __builtin_abort (); return 0; }
> is miscompiled with -m32 -O2 in 5.x (though, 4.9 doesn't support
> __builtin_*_overflow, so maybe it is not an issue there).

Ok, I went ahead and committed following to 5.x branch
and the testcase also to trunk and 6.2.
What is your preference for 4.9?  The testcase isn't applicable.

2016-06-16  Jakub Jelinek  

PR target/71554
* config/i386/i386.md (setcc + movzbl peephole2): Use reg_set_p.
(setcc + and peephole2): Likewise.

Backported from mainline
2015-04-29  Uros Bizjak  

* config/i386/i386.md (setcc+movzbl peephole2): Check also clobbered
reg.
(setcc+andl peephole2): Ditto.

2016-06-16  Jakub Jelinek  

PR target/71554
* gcc.c-torture/execute/pr71554.c: New test.

--- gcc/config/i386/i386.md (revision 237518)
+++ gcc/config/i386/i386.md (working copy)
@@ -11645,7 +11645,8 @@ (define_peephole2
(zero_extend (match_dup 1)))]
   "(peep2_reg_dead_p (3, operands[1])
 || operands_match_p (operands[1], operands[3]))
-   && ! reg_overlap_mentioned_p (operands[3], operands[0])"
+   && ! reg_overlap_mentioned_p (operands[3], operands[0])
+   && ! reg_set_p (operands[3], operands[4])"
   [(parallel [(set (match_dup 5) (match_dup 0))
  (match_dup 4)])
(set (strict_low_part (match_dup 6))
@@ -11688,7 +11689,8 @@ (define_peephole2
  (clobber (reg:CC FLAGS_REG))])]
   "(peep2_reg_dead_p (3, operands[1])
 || operands_match_p (operands[1], operands[3]))
-   && ! reg_overlap_mentioned_p (operands[3], operands[0])"
+   && ! reg_overlap_mentioned_p (operands[3], operands[0])
+   && ! reg_set_p (operands[3], operands[4])"
   [(parallel [(set (match_dup 5) (match_dup 0))
  (match_dup 4)])
(set (strict_low_part (match_dup 6))
--- gcc/testsuite/gcc.c-torture/execute/pr71554.c   (revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/pr71554.c   (working copy)
@@ -0,0 +1,28 @@
+/* PR target/71554 */
+
+int v;
+
+__attribute__ ((noinline, noclone)) void
+bar (void)
+{
+  v++;
+}
+
+__attribute__ ((noinline, noclone))
+void
+foo (unsigned int x)
+{
+  signed int y = ((-__INT_MAX__ - 1) / 2);
+  signed int r;
+  if (__builtin_mul_overflow (x, y, &r))
+bar ();
+}
+
+int
+main ()
+{
+  foo (2);
+  if (v)
+__builtin_abort ();
+  return 0;
+}


Jakub


Re: [PATCH, IA64, RFT]: Implement PR 71242, Missing built-in functions for float128 NaNs

2016-06-16 Thread Alexander Monakov
Hi,

> 2016-06-12  Uros Bizjak  
> 
> PR target/71242
> * config/ia64/ia64.c (enum ia64_builtins) [IA64_BUILTIN_NANQ]: New.
> [IA64_BUILTIN_NANSQ]: Ditto.
> (ia64_fold_builtin): New function.
> (TARGET_FOLD_BUILTIN): New define.
> (ia64_init_builtins) Declare const_string_type node.
> Add __builtin_nanq and __builtin_nansq builtin functions.
> (ia64_expand_builtin): Handle IA64_BUILTIN_NANQ and IA64_BUILTIN_NANSQ.
> 
> testsuite/ChangeLog:
> 
> 2016-06-12  Uros Bizjak  
> 
> PR target/71241
> * testsuite/gcc.dg/torture/float128-nan.c: Also run on ia64-*-*.
> 
> Tested by building a cross-compiler to ia64-linux-gnu and eyeballing the
> resulting assembly.
> 
> Can someone please test this patch on a real IA64 ?

I gave it a shot.  It bootstraps, and the float128-nan.c test passes.
Unfortunately, I had trouble running the testsuite to completion, but fwiw I
don't see anything alarming in the partial results I got.

Alexander


Re: [PATCH] Fix builtin-arith-overflow-p-1[23].c on i686

2016-06-16 Thread Uros Bizjak
On Thu, Jun 16, 2016 at 12:39 PM, Jakub Jelinek  wrote:
> On Thu, Jun 16, 2016 at 11:51:12AM +0200, Jakub Jelinek wrote:
>> Here is what I've committed to the trunk and 6.2 after bootstrap/regtest on
>> x86_64-linux and i686-linux.
>> For 5/4.9, this doesn't apply cleanly, as http://gcc.gnu.org/r222592
>> aka https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01930.html
>> has not been backported.  Shall I backport that too, or just not apply to
>> 5/4.9?  I guess I should, because:
>> int v;
>> __attribute__ ((noinline, noclone)) void bar (void) { v++; }
>> __attribute__ ((noinline, noclone)) void foo (unsigned int x) { signed int y 
>> = ((-__INT_MAX__ - 1) / 2), r; if (__builtin_mul_overflow (x, y, &r)) bar 
>> (); }
>> int main () { foo (2); if (v) __builtin_abort (); return 0; }
>> is miscompiled with -m32 -O2 in 5.x (though, 4.9 doesn't support
>> __builtin_*_overflow, so maybe it is not an issue there).
>
> Ok, I went ahead and committed following to 5.x branch
> and the testcase also to trunk and 6.2.
> What is your preference for 4.9?  The testcase isn't applicable.

Let's also patch 4.9.  We know what kind of pattern causes problems, and
it is possible for 4.9 to generate a problematic sequence (involving a
cc-setting arithmetic insn) even without the new patterns.

The patch should be safe, since it only adds another condition.

Thanks,
Uros.

> 2016-06-16  Jakub Jelinek  
>
> PR target/71554
> * config/i386/i386.md (setcc + movzbl peephole2): Use reg_set_p.
> (setcc + and peephole2): Likewise.
>
> Backported from mainline
> 2015-04-29  Uros Bizjak  
>
> * config/i386/i386.md (setcc+movzbl peephole2): Check also clobbered
> reg.
> (setcc+andl peephole2): Ditto.
>
> 2016-06-16  Jakub Jelinek  
>
> PR target/71554
> * gcc.c-torture/execute/pr71554.c: New test.
>
> --- gcc/config/i386/i386.md (revision 237518)
> +++ gcc/config/i386/i386.md (working copy)
> @@ -11645,7 +11645,8 @@ (define_peephole2
> (zero_extend (match_dup 1)))]
>"(peep2_reg_dead_p (3, operands[1])
>  || operands_match_p (operands[1], operands[3]))
> -   && ! reg_overlap_mentioned_p (operands[3], operands[0])"
> +   && ! reg_overlap_mentioned_p (operands[3], operands[0])
> +   && ! reg_set_p (operands[3], operands[4])"
>[(parallel [(set (match_dup 5) (match_dup 0))
>   (match_dup 4)])
> (set (strict_low_part (match_dup 6))
> @@ -11688,7 +11689,8 @@ (define_peephole2
>   (clobber (reg:CC FLAGS_REG))])]
>"(peep2_reg_dead_p (3, operands[1])
>  || operands_match_p (operands[1], operands[3]))
> -   && ! reg_overlap_mentioned_p (operands[3], operands[0])"
> +   && ! reg_overlap_mentioned_p (operands[3], operands[0])
> +   && ! reg_set_p (operands[3], operands[4])"
>[(parallel [(set (match_dup 5) (match_dup 0))
>   (match_dup 4)])
> (set (strict_low_part (match_dup 6))
> --- gcc/testsuite/gcc.c-torture/execute/pr71554.c   (revision 0)
> +++ gcc/testsuite/gcc.c-torture/execute/pr71554.c   (working copy)
> @@ -0,0 +1,28 @@
> +/* PR target/71554 */
> +
> +int v;
> +
> +__attribute__ ((noinline, noclone)) void
> +bar (void)
> +{
> +  v++;
> +}
> +
> +__attribute__ ((noinline, noclone))
> +void
> +foo (unsigned int x)
> +{
> +  signed int y = ((-__INT_MAX__ - 1) / 2);
> +  signed int r;
> +  if (__builtin_mul_overflow (x, y, &r))
> +bar ();
> +}
> +
> +int
> +main ()
> +{
> +  foo (2);
> +  if (v)
> +__builtin_abort ();
> +  return 0;
> +}
>
>
> Jakub


[PATCH][ARM]Use different startfile and endfile for elf target when generating shared object.

2016-06-16 Thread Renlin Li

Hi all,

GCC has startfile and endfile spec strings built into it.
startfile is used to specify object files to include at the start of
the link process, while endfile is used to specify object files to
include at the end of the link process.


crtbegin.o is one of the object files specified by the startfile spec 
string. IIUC, crtbeginS.o should be used in place of crtbegin.o when 
generating shared objects.
The same applies to crtend.o, which is one of the endfile objects; crtendS.o 
should be used when generating shared objects.
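
For illustration only (not output from an actual link), the intended
selection is roughly:

  default link:   crti.o crtbegin.o  crt0.o ... crtend.o  crtn.o
  with -shared:   crti.o crtbeginS.o crt0.o ... crtendS.o crtn.o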


This patch makes the change to use different crtbegin and crtend files 
when creating shared and static objects for the elf toolchain.  The linux 
toolchain already makes this distinction.


So when the toolchain doesn't support shared objects, the following error 
message will be produced:

ld: cannot find crtbeginS.o: No such file or directory

Still, the spec strings built into GCC can be overridden by using the
-specs= command-line switch to specify a spec file.

Regression tested on arm-none-eabi without new issues.  OK for trunk?

Regards,
Renlin Li

gcc/ChangeLog:

2016-06-16  Renlin Li  

* config/arm/unknown-elf.h (UNKNOWN_ELF_STARTFILE_SPEC): Use
crtbeginS.o for shared object.
(UNKNOWN_ELF_ENDFILE_SPEC): Use crtendS.o for shared object.
diff --git a/gcc/config/arm/unknown-elf.h b/gcc/config/arm/unknown-elf.h
index fafe057..12ef497 100644
--- a/gcc/config/arm/unknown-elf.h
+++ b/gcc/config/arm/unknown-elf.h
@@ -29,14 +29,19 @@
 #endif
 
 /* Now we define the strings used to build the spec file.  */
-#define UNKNOWN_ELF_STARTFILE_SPEC	" crti%O%s crtbegin%O%s crt0%O%s"
+#define UNKNOWN_ELF_STARTFILE_SPEC	\
+  "crti%O%s \
+  %{!shared:crtbegin%O%s} %{shared:crtbeginS%O%s} \
+  crt0%O%s"
 
 #undef  STARTFILE_SPEC
 #define STARTFILE_SPEC	\
   "%{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s} "	\
   UNKNOWN_ELF_STARTFILE_SPEC
 
-#define UNKNOWN_ELF_ENDFILE_SPEC	"crtend%O%s crtn%O%s"
+#define UNKNOWN_ELF_ENDFILE_SPEC	\
+  "%{!shared:crtend%O%s} %{shared:crtendS%O%s} \
+  crtn%O%s"
 
 #undef  ENDFILE_SPEC
 #define ENDFILE_SPEC	UNKNOWN_ELF_ENDFILE_SPEC


[PATCH, PR middle-end/71488] Fix vectorization of comparison of booleans

2016-06-16 Thread Ilya Enkovich
Hi,

This patch fixes incorrect comparison vectorization for booleans.
The problem is that a regular comparison which works for scalars
doesn't work for vectors due to the different binary representation.
Also, this never works for scalar masks.

This patch replaces such comparisons with bitwise operations
which work correctly for both vector and scalar masks.
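
To make the rewrite concrete (my illustration, not part of the patch): for
boolean values in {0, 1} the comparisons are plain AND/IOR/NOT/XOR
identities, and the bitwise forms remain correct when "true" is an
all-ones vector mask, where a signed comparison would give the wrong
answer:

/* Sanity-check the scalar identities behind the bitwise rewrite.  */
#include <cassert>

int
main ()
{
  for (int a = 0; a <= 1; a++)
    for (int b = 0; b <= 1; b++)
      {
        assert ((a >  b) == (a & !b));   /* GT  */
        assert ((a >= b) == (a | !b));   /* GE  */
        assert ((a <  b) == (!a & b));   /* LT  */
        assert ((a <= b) == (!a | b));   /* LE  */
        assert ((a == b) == !(a ^ b));   /* EQ  */
        assert ((a != b) == (a ^ b));    /* NE  */
      }
  return 0;
}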

Bootstrapped and regtested on x86_64-unknown-linux-gnu.  Is it
OK for trunk?  What should be done for gcc-6-branch?  Port this
patch or just restrict vectorization for comparison of booleans?

Thanks,
Ilya
--
gcc/

2016-06-15  Ilya Enkovich  

PR middle-end/71488
* tree-vect-patterns.c (vect_recog_mask_conversion_pattern): Support
comparison of boolean vectors.
* tree-vect-stmts.c (vectorizable_comparison): Vectorize comparison
of boolean vectors using bitwise operations.

gcc/testsuite/

2016-06-15  Ilya Enkovich  

PR middle-end/71488
* g++.dg/pr71488.C: New test.
* gcc.dg/vect/vect-bool-cmp.c: New test.


diff --git a/gcc/testsuite/g++.dg/pr71488.C b/gcc/testsuite/g++.dg/pr71488.C
new file mode 100644
index 000..d7d657e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr71488.C
@@ -0,0 +1,24 @@
+// PR middle-end/71488
+// { dg-do run }
+// { dg-options "-O3 -std=c++11" }
+// { dg-additional-options "-march=westmere" { target i?86-*-* x86_64-*-* } }
+// { dg-require-effective-target c++11 }
+
+#include <valarray>
+
+int var_4 = 1;
+long long var_9 = 0;
+
+int main() {
+  
+  std::valarray<std::valarray<int>> v10;
+
+  v10.resize(1);
+  v10[0].resize(4);
+
+  for (int i = 0; i < 4; i++)
+v10[0][i] = ((var_9 == 0) > unsigned (var_4 == 0)) + (var_9 == 0);
+
+  if (v10[0][0] != 2)
+__builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bool-cmp.c 
b/gcc/testsuite/gcc.dg/vect/vect-bool-cmp.c
new file mode 100644
index 000..a1e2a24
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bool-cmp.c
@@ -0,0 +1,252 @@
+/* PR71488 */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-additional-options "-msse4" { target { i?86-*-* x86_64-*-* } } } */
+
+int i1, i2;
+
+void __attribute__((noclone,noinline))
+fn1 (int * __restrict__ p1, int * __restrict__ p2, int * __restrict__ p3, int 
size)
+{
+  int i;
+
+  for (i = 0; i < size; i++)
+p1[i] = ((p2[i] == 0) > (unsigned)(p3[i] == 0)) + (p2[i] == 0);
+}
+
+void __attribute__((noclone,noinline))
+fn2 (int * __restrict__ p1, int * __restrict__ p2, short * __restrict__ p3, 
int size)
+{
+  int i;
+
+  for (i = 0; i < size; i++)
+p1[i] = ((p2[i] == 0) > (unsigned)(p3[i] == 0)) + (p2[i] == 0);
+}
+
+void __attribute__((noclone,noinline))
+fn3 (int * __restrict__ p1, int * __restrict__ p2, long long * __restrict__ 
p3, int size)
+{
+  int i;
+
+  for (i = 0; i < size; i++)
+p1[i] = ((p2[i] == 0) > (unsigned)(p3[i] == 0)) + (p2[i] == 0);
+}
+
+void __attribute__((noclone,noinline))
+fn4 (int * __restrict__ p1, int * __restrict__ p2, int * __restrict__ p3, int 
size)
+{
+  int i;
+
+  for (i = 0; i < size; i++)
+p1[i] = ((p2[i] == 0) >= (unsigned)(p3[i] == 0)) + (p2[i] == 0);
+}
+
+void __attribute__((noclone,noinline))
+fn5 (int * __restrict__ p1, int * __restrict__ p2, short * __restrict__ p3, 
int size)
+{
+  int i;
+
+  for (i = 0; i < size; i++)
+p1[i] = ((p2[i] == 0) >= (unsigned)(p3[i] == 0)) + (p2[i] == 0);
+}
+
+void __attribute__((noclone,noinline))
+fn6 (int * __restrict__ p1, int * __restrict__ p2, long long * __restrict__ 
p3, int size)
+{
+  int i;
+
+  for (i = 0; i < size; i++)
+p1[i] = ((p2[i] == 0) >= (unsigned)(p3[i] == 0)) + (p2[i] == 0);
+}
+
+void __attribute__((noclone,noinline))
+fn7 (int * __restrict__ p1, int * __restrict__ p2, int * __restrict__ p3, int 
size)
+{
+  int i;
+
+  for (i = 0; i < size; i++)
+p1[i] = ((p2[i] == 0) < (unsigned)(p3[i] == 0)) + (p2[i] == 0);
+}
+
+void __attribute__((noclone,noinline))
+fn8 (int * __restrict__ p1, int * __restrict__ p2, short * __restrict__ p3, 
int size)
+{
+  int i;
+
+  for (i = 0; i < size; i++)
+p1[i] = ((p2[i] == 0) < (unsigned)(p3[i] == 0)) + (p2[i] == 0);
+}
+
+void __attribute__((noclone,noinline))
+fn9 (int * __restrict__ p1, int * __restrict__ p2, long long * __restrict__ 
p3, int size)
+{
+  int i;
+
+  for (i = 0; i < size; i++)
+p1[i] = ((p2[i] == 0) < (unsigned)(p3[i] == 0)) + (p2[i] == 0);
+}
+
+void __attribute__((noclone,noinline))
+fn10 (int * __restrict__ p1, int * __restrict__ p2, int * __restrict__ p3, int 
size)
+{
+  int i;
+
+  for (i = 0; i < size; i++)
+p1[i] = ((p2[i] == 0) <= (unsigned)(p3[i] == 0)) + (p2[i] == 0);
+}
+
+void __attribute__((noclone,noinline))
+fn11 (int * __restrict__ p1, int * __restrict__ p2, short * __restrict__ p3, 
int size)
+{
+  int i;
+
+  for (i = 0; i < size; i++)
+p1[i] = ((p2[i] == 0) <= (unsigned)(p3[i] == 0)) + (p2[i] == 0);
+}
+
+void __attribute__((noclone,noinline))
+fn12 (int * __restrict__ p1, int * __restrict__ p2, long long * __restrict

Re: [PATCH] Optimize inserting value_type into std::vector

2016-06-16 Thread Jonathan Wakely

On 15/06/16 11:15 +0100, Jonathan Wakely wrote:

* include/bits/stl_vector.h (vector::_S_insert_aux_assign): Define
new overloaded functions.
* include/bits/vector.tcc (vector::_M_insert_aux): Use new functions
to avoid creating a redundant temporary.
* testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc: New
test.

Tested x86_64-linux.


Committed to trunk.



Fix loop size estimate in tree-ssa-loop-ivcanon

2016-06-16 Thread Jan Hubicka
Hi,
tree_estimate_loop_size contains one extra else that prevents it from
determining that the induction variable comparison is going to be
eliminated in both the peeled copies as well as the last copy.  This
patch fixes it (it really removes one else, but needs to reformat the
conditional).
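
For example (my illustration, not taken from the patch), when a loop like
the following is fully peeled, the exit test folds to a constant in the
peeled copies and in the last copy alike, and the size estimate should
account for all of them:

int a[3];

void
f (void)
{
  for (int i = 0; i < 3; i++)
    a[i] = i;
}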

Bootstrapped/regtested x86_64-linux, committed.

Honza

* g++.dg/vect/pr36648.cc: Disable cunrolli.
* tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Fix estimation
of comparisons in the last iteration.
Index: testsuite/g++.dg/vect/pr36648.cc
===
--- testsuite/g++.dg/vect/pr36648.cc(revision 237477)
+++ testsuite/g++.dg/vect/pr36648.cc(working copy)
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_float } */
+// { dg-additional-options "-fdisable-tree-cunrolli" }
 
 struct vector
 {
Index: tree-ssa-loop-ivcanon.c
===
--- tree-ssa-loop-ivcanon.c (revision 237477)
+++ tree-ssa-loop-ivcanon.c (working copy)
@@ -255,69 +255,73 @@ tree_estimate_loop_size (struct loop *lo
 
  /* Look for reasons why we might optimize this stmt away. */
 
- if (gimple_has_side_effects (stmt))
-   ;
- /* Exit conditional.  */
- else if (exit && body[i] == exit->src
-  && stmt == last_stmt (exit->src))
+ if (!gimple_has_side_effects (stmt))
{
- if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file, "   Exit condition will be eliminated "
-"in peeled copies.\n");
- likely_eliminated_peeled = true;
-   }
- else if (edge_to_cancel && body[i] == edge_to_cancel->src
-  && stmt == last_stmt (edge_to_cancel->src))
-   {
- if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file, "   Exit condition will be eliminated "
-"in last copy.\n");
- likely_eliminated_last = true;
-   }
- /* Sets of IV variables  */
- else if (gimple_code (stmt) == GIMPLE_ASSIGN
- && constant_after_peeling (gimple_assign_lhs (stmt), stmt, loop))
-   {
- if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file, "   Induction variable computation will"
-" be folded away.\n");
- likely_eliminated = true;
-   }
- /* Assignments of IV variables.  */
- else if (gimple_code (stmt) == GIMPLE_ASSIGN
-  && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
-  && constant_after_peeling (gimple_assign_rhs1 (stmt), stmt,
- loop)
-  && (gimple_assign_rhs_class (stmt) != GIMPLE_BINARY_RHS
-  || constant_after_peeling (gimple_assign_rhs2 (stmt),
- stmt, loop)))
-   {
- size->constant_iv = true;
- if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file,
-"   Constant expression will be folded away.\n");
- likely_eliminated = true;
-   }
- /* Conditionals.  */
- else if ((gimple_code (stmt) == GIMPLE_COND
-   && constant_after_peeling (gimple_cond_lhs (stmt), stmt,
-  loop)
-   && constant_after_peeling (gimple_cond_rhs (stmt), stmt,
-  loop)
-   /* We don't simplify all constant compares so make sure
-  they are not both constant already.  See PR70288.  */
-   && (! is_gimple_min_invariant (gimple_cond_lhs (stmt))
-   || ! is_gimple_min_invariant (gimple_cond_rhs (stmt))))
-  || (gimple_code (stmt) == GIMPLE_SWITCH
-  && constant_after_peeling (gimple_switch_index (
-   as_a <gswitch *> (stmt)),
+ /* Exit conditional.  */
+ if (exit && body[i] == exit->src
+ && stmt == last_stmt (exit->src))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "   Exit condition will be eliminated "
+"in peeled copies.\n");
+ likely_eliminated_peeled = true;
+   }
+ if (edge_to_cancel && body[i] == edge_to_cancel->src
+ && stmt == last_stmt (edge_to_cancel->src))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "   Exit condition will be eliminated "
+"in last copy.\n");
+ likely_eliminated_last = true;
+   

Re: [Patch] Implement is_[nothrow_]swappable (p0185r1) - 2nd try

2016-06-16 Thread Jonathan Wakely

On 15/06/16 20:07 +0200, Daniel Krügler wrote:

2016-06-14 23:22 GMT+02:00 Daniel Krügler :

This is an implementation of the Standard is_swappable traits according to

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0185r1.html

During that work it has been found that std::array's member swap's exception
specification for zero-size arrays was incorrectly depending on the value_type
and that was fixed as well.
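
A minimal usage sketch (mine, not part of the patch; the _v helpers need
-std=gnu++17):

#include <array>
#include <type_traits>
#include <utility>

struct ThrowingSwap
{
  ThrowingSwap(ThrowingSwap&&) noexcept(false);
  ThrowingSwap& operator=(ThrowingSwap&&) noexcept(false);
};

static_assert(std::is_swappable_v<int>, "");
static_assert(std::is_nothrow_swappable_v<int>, "");
static_assert(std::is_swappable_v<ThrowingSwap>, "");
static_assert(!std::is_nothrow_swappable_v<ThrowingSwap>, "");

// With the zero-size fix, std::array<T, 0>::swap no longer inherits the
// potentially-throwing exception specification of the value_type.
static_assert(noexcept(std::declval<std::array<ThrowingSwap, 0>&>()
                         .swap(std::declval<std::array<ThrowingSwap, 0>&>())),
              "");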

This patch is *untested*, because I cannot make the tests run on my
Windows system.

Upon the suggestion of Mike Stump I'm proposing this patch
nonetheless, asking for sending
me as specific feedback as possible for any failing tests so that I
can try to make further
adjustments if needed.


And now also with the promised patch files.


I'm still seeing large numbers of failed static assertions, and
duplicate definitions. I'll try to fix them.

As Marc suggested, getting access to the compile farm would allow you
to test on a real OS. Installing GNU/Linux in a VM would also allow
you to run the tests. (I have no idea how to test on Windows, I never
managed to get it working when I tried).



Re: [Patch] Implement is_[nothrow_]swappable (p0185r1) - 2nd try

2016-06-16 Thread Jonathan Wakely

On 16/06/16 14:00 +0100, Jonathan Wakely wrote:

On 15/06/16 20:07 +0200, Daniel Krügler wrote:

2016-06-14 23:22 GMT+02:00 Daniel Krügler :

This is an implementation of the Standard is_swappable traits according to

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0185r1.html

During that work it has been found that std::array's member swap's exception
specification for zero-size arrays was incorrectly depending on the value_type
and that was fixed as well.

This patch is *untested*, because I cannot make the tests run on my
Windows system.

Upon the suggestion of Mike Stump I'm proposing this patch
nonetheless, asking for sending
me as specific feedback as possible for any failing tests so that I
can try to make further
adjustments if needed.


And now also with the promised patch files.


I'm still seeing large numbers of failed static assertions, and
duplicate definitions. I'll try to fix them.


Ah, the duplicate definitions might be because I messed up applying
the patch.


Re: [Patch] Implement is_[nothrow_]swappable (p0185r1) - 2nd try

2016-06-16 Thread Jonathan Wakely

On 16/06/16 14:01 +0100, Jonathan Wakely wrote:

On 16/06/16 14:00 +0100, Jonathan Wakely wrote:

On 15/06/16 20:07 +0200, Daniel Krügler wrote:

2016-06-14 23:22 GMT+02:00 Daniel Krügler :

This is an implementation of the Standard is_swappable traits according to

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0185r1.html

During that work it has been found that std::array's member swap's exception
specification for zero-size arrays was incorrectly depending on the value_type
and that was fixed as well.

This patch is *untested*, because I cannot make the tests run on my
Windows system.

Upon the suggestion of Mike Stump I'm proposing this patch
nonetheless, asking for sending
me as specific feedback as possible for any failing tests so that I
can try to make further
adjustments if needed.


And now also with the promised patch files.


I'm still seeing large numbers of failed static assertions, and
duplicate definitions. I'll try to fix them.


Ah, the duplicate definitions might be because I messed up applying
the patch.


With that fixed, the remaining failures are:

/home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/20_util/is_nothrow_swappable/./value.h:285:3:
error: static assertion failed
/home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/20_util/is_nothrow_swappable/./value.h:287:3:
error: static assertion failed
/home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/20_util/is_nothrow_swappable/./value.h:289:3:
error: static assertion failed

Those assertions fail for both 20_util/is_nothrow_swappable/value.cc
and 20_util/is_nothrow_swappable/value_ext.cc




Re: [PATCH, vec-tails 01/10] New compiler options

2016-06-16 Thread Ilya Enkovich
On 20 May 14:40, Ilya Enkovich wrote:
> > Can you make all these --params then?  I think to be useful to users we'd 
> > want
> > them to be loop pragmas rather than options.
> 
> OK, I'll change it to params.  I didn't think about control via
> pragmas but will do now.
> 
> Thanks,
> Ilya
> 
> >
> > Richard.
> >

Hi,

Here is a set of params to be used instead of new flags.  Does this set look
OK?
I still use a new option for the cost model, for convenient re-use of the cost
model enum.
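
For reference, with this set the knobs would be spelled as, e.g.
(illustrative command line, standard --param syntax):

  --param vect-epilogues-combine=1 --param vect-epilogues-mask=1
  --param vect-epilogues-nomask=1 --param vect-short-loops=1
  -fvect-epilogue-cost-model=dynamic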

Thanks,
Ilya
--
gcc/

2016-06-16  Ilya Enkovich  

* common.opt (fvect-epilogue-cost-model=): New.
* params.def (PARAM_VECT_EPILOGUES_COMBINE): New.
(PARAM_VECT_EPILOGUES_MASK): New.
(PARAM_VECT_EPILOGUES_NOMASK): New.
(PARAM_VECT_SHORT_LOOPS): New.
* doc/invoke.texi (-fvect-epilogue-cost-model): New.


diff --git a/gcc/common.opt b/gcc/common.opt
index fccd4b5..10cd75b 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2582,6 +2582,10 @@ fsimd-cost-model=
 Common Joined RejectNegative Enum(vect_cost_model) Var(flag_simd_cost_model) 
Init(VECT_COST_MODEL_UNLIMITED) Optimization
 Specifies the vectorization cost model for code marked with a simd directive.
 
+fvect-epilogue-cost-model=
+Common Joined RejectNegative Enum(vect_cost_model) 
Var(flag_vect_epilogue_cost_model) Init(VECT_COST_MODEL_DEFAULT) Optimization
+Specifies the cost model for epilogue vectorization.
+
 Enum
 Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown 
vectorizer cost model %qs)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ce162a0..ecbd7ce 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7638,6 +7638,14 @@ or Cilk Plus simd directive.  The @var{model} argument 
should be one of
 have the same meaning as described in @option{-fvect-cost-model} and by
 default a cost model defined with @option{-fvect-cost-model} is used.
 
+@item -fvect-epilogue-cost-model=@var{model}
+@opindex fvect-epilogue-cost-model
+Alter the cost model used for vectorization of loop epilogues.  The
+@var{model} argument should be one of @samp{unlimited}, @samp{dynamic},
+@samp{cheap}.  All values of @var{model} have the same meaning as
+described in @option{-fvect-cost-model} and by default @samp{dynamic}
+cost model is used.
+
 @item -ftree-vrp
 @opindex ftree-vrp
 Perform Value Range Propagation on trees.  This is similar to the
diff --git a/gcc/params.def b/gcc/params.def
index 62a1e40..3bac68c 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1220,6 +1220,28 @@ DEFPARAM (PARAM_MAX_SPECULATIVE_DEVIRT_MAYDEFS,
  "Maximum number of may-defs visited when devirtualizing "
  "speculatively", 50, 0, 0)
 
+DEFPARAM (PARAM_VECT_EPILOGUES_COMBINE,
+ "vect-epilogues-combine",
+ "Enable loop epilogue vectorization by combining it with "
+ "vectorized loop body.",
+ 0, 0, 1)
+
+DEFPARAM (PARAM_VECT_EPILOGUES_MASK,
+ "vect-epilogues-mask",
+ "Enable loop epilogue vectorization using the same vector "
+ "size and masking.",
+ 0, 0, 1)
+
+DEFPARAM (PARAM_VECT_EPILOGUES_NOMASK,
+ "vect-epilogues-nomask",
+ "Enable loop epilogue vectorization using smaller vector size.",
+ 0, 0, 1)
+
+DEFPARAM (PARAM_VECT_SHORT_LOOPS,
+ "vect-short-loops",
+ "Enable vectorization of low trip count loops using masking.",
+ 0, 0, 1)
+
 /*
 
 Local variables:


[PATCH] Remove trailing whitespace from libstdc++ headers

2016-06-16 Thread Jonathan Wakely
	* include/std/array: Remove trailing whitespace.
	* include/std/atomic: Likewise.
	* include/std/bitset: Likewise.
	* include/std/chrono: Likewise.
	* include/std/complex: Likewise.
	* include/std/condition_variable: Likewise.
	* include/std/fstream: Likewise.
	* include/std/functional: Likewise.
	* include/std/future: Likewise.
	* include/std/iomanip: Likewise.
	* include/std/iosfwd: Likewise.
	* include/std/istream: Likewise.
	* include/std/limits: Likewise.
	* include/std/ratio: Likewise.
	* include/std/scoped_allocator: Likewise.
	* include/std/sstream: Likewise.
	* include/std/stdexcept: Likewise.
	* include/std/string: Likewise.
	* include/std/system_error: Likewise.
	* include/std/thread: Likewise.
	* include/std/tuple: Likewise.
	* include/std/type_traits: Likewise.
	* include/std/utility: Likewise.
	* include/std/valarray: Likewise.
	* include/std/vector: Likewise.

Tested x86_64-linux, committed to trunk.


patch.txt.gz
Description: application/gzip


[wwwdocs] Buildstat update for 5.x

2016-06-16 Thread Tom G. Christensen
Latest results for 5.x

-tgc

Testresults for 5.4.0:
  i386-pc-solaris2.11 (2)
  i386-pc-solaris2.12 (2)
  sparc-sun-solaris2.11 (2)
  sparc-sun-solaris2.12 (2)

Index: buildstat.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/buildstat.html,v
retrieving revision 1.8
diff -u -r1.8 buildstat.html
--- buildstat.html  3 Jun 2016 19:44:57 -   1.8
+++ buildstat.html  16 Jun 2016 13:52:26 -
@@ -86,6 +86,8 @@
 i386-pc-solaris2.12
  
 Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00375.html";>5.4.0,
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00374.html";>5.4.0,
 https://gcc.gnu.org/ml/gcc-testresults/2015-04/msg03525.html";>5.1.0
 
 
@@ -139,7 +141,7 @@
 powerpc64-unknown-linux-gnu
  
 Test results:
-https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00324.html";>5.4.0
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00324.html";>5.4.0,
 https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg02803.html";>5.3.0,
 https://gcc.gnu.org/ml/gcc-testresults/2015-10/msg01557.html";>5.2.0
 
@@ -149,7 +151,7 @@
 powerpc64le-unknown-linux-gnu
  
 Test results:
-https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00322.html";>5.4.0
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00322.html";>5.4.0,
 https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg02804.html";>5.3.0
 
 
@@ -177,6 +179,8 @@
 sparc-sun-solaris2.11
  
 Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00408.html";>5.4.0,
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00393.html";>5.4.0,
 https://gcc.gnu.org/ml/gcc-testresults/2015-04/msg03586.html";>5.1.0
 
 
@@ -240,6 +244,24 @@
 
 
 
+
+i386-pc-solaris2.11
+ 
+Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00379.html";>5.4.0,
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00349.html";>5.4.0
+
+
+
+
+sparc-sun-solaris2.12
+ 
+Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00377.html";>5.4.0,
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00376.html";>5.4.0
+
+
+
 
 
 


[wwwdocs] Buildstat update for 6.x

2016-06-16 Thread Tom G. Christensen
Latest results for 6.x

-tgc

Testresults for 6.1.0:
  i386-pc-solaris2.10
  i386-pc-solaris2.11
  i386-pc-solaris2.12
  i686-pc-linux-gnu
  sparc64-sun-solaris2.10
  sparc-sun-solaris2.10
  sparc-sun-solaris2.11
  sparc-sun-solaris2.12
  x86_64-apple-darwin11.4.2
  x86_64-apple-darwin15.5.0
  x86_64-pc-linux-gnu
  x86_64-w64-mingw32

Index: buildstat.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/buildstat.html,v
retrieving revision 1.2
diff -u -r1.2 buildstat.html
--- buildstat.html  29 Apr 2016 21:15:40 -  1.2
+++ buildstat.html  16 Jun 2016 13:56:58 -
@@ -23,6 +23,38 @@
 
 
 
+i386-pc-solaris2.10
+ 
+Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-05/msg02729.html";>6.1.0
+
+
+
+
+i386-pc-solaris2.11
+ 
+Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-04/msg02749.html";>6.1.0
+
+
+
+
+i386-pc-solaris2.12
+ 
+Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-04/msg02750.html";>6.1.0
+
+
+
+
+i686-pc-linux-gnu
+ 
+Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-05/msg01943.html";>6.1.0
+
+
+
+
 powerpc64le-unknown-linux-gnu
  
 Test results:
@@ -38,6 +70,70 @@
 
 
 
+
+sparc64-sun-solaris2.10
+ 
+Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg01694.html";>6.1.0
+
+
+
+
+sparc-sun-solaris2.10
+ 
+Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg01693.html";>6.1.0
+
+
+
+
+sparc-sun-solaris2.11
+ 
+Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-05/msg01130.html";>6.1.0
+
+
+
+
+sparc-sun-solaris2.12
+ 
+Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-05/msg01124.html";>6.1.0
+
+
+
+
+x86_64-apple-darwin11.4.2
+ 
+Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-05/msg00137.html";>6.1.0
+
+
+
+
+x86_64-apple-darwin15.5.0
+ 
+Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-04/msg02883.html";>6.1.0
+
+
+
+
+x86_64-pc-linux-gnu
+ 
+Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-04/msg02751.html";>6.1.0
+
+
+
+
+x86_64-w64-mingw32
+ 
+Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-05/msg00106.html";>6.1.0
+
+
+
 
 
 


Re: [PATCH] Handle undefined extern vars in output_in_order

2016-06-16 Thread Alexander Monakov
On Thu, 9 Jun 2016, Alexander Monakov wrote:

> Hi,
> 
> This patch teaches cgraphunit.c:output_in_order to output undefined external
> variables via assemble_undefined_decl.  At the moment that is only done for
> -ftoplevel-reorder in varpool.c:symbol_table::output_variables.  This patch
> makes both behave the same way.  I've also made handling of variables in both
> functions look similar to each other.

Ping.

Thanks.
Alexander


Re: [wwwdocs] Buildstat update for 5.x

2016-06-16 Thread Tom G. Christensen
On Thu, Jun 16, 2016 at 03:53:40PM +0200, Tom G. Christensen wrote:
> Latest results for 5.x
> 
> -tgc
> 
> Testresults for 5.4.0:
>   i386-pc-solaris2.11 (2)
>   i386-pc-solaris2.12 (2)
>   sparc-sun-solaris2.11 (2)
>   sparc-sun-solaris2.12 (2)
> 

Updated patch with new entries added in the right position.

-tgc
Index: buildstat.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/buildstat.html,v
retrieving revision 1.8
diff -u -r1.8 buildstat.html
--- buildstat.html  3 Jun 2016 19:44:57 -   1.8
+++ buildstat.html  16 Jun 2016 13:59:57 -
@@ -83,9 +83,20 @@
 
 
 
+i386-pc-solaris2.11
+ 
+Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00379.html";>5.4.0,
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00349.html";>5.4.0
+
+
+
+
 i386-pc-solaris2.12
  
 Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00375.html";>5.4.0,
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00374.html";>5.4.0,
 https://gcc.gnu.org/ml/gcc-testresults/2015-04/msg03525.html";>5.1.0
 
 
@@ -139,7 +150,7 @@
 powerpc64-unknown-linux-gnu
  
 Test results:
-https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00324.html";>5.4.0
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00324.html";>5.4.0,
 https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg02803.html";>5.3.0,
 https://gcc.gnu.org/ml/gcc-testresults/2015-10/msg01557.html";>5.2.0
 
@@ -149,7 +160,7 @@
 powerpc64le-unknown-linux-gnu
  
 Test results:
-https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00322.html";>5.4.0
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00322.html";>5.4.0,
 https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg02804.html";>5.3.0
 
 
@@ -177,11 +188,31 @@
 sparc-sun-solaris2.11
  
 Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00408.html";>5.4.0,
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00393.html";>5.4.0,
 https://gcc.gnu.org/ml/gcc-testresults/2015-04/msg03586.html";>5.1.0
 
 
 
 
+sparc-sun-solaris2.12
+ 
+Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00377.html";>5.4.0,
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00376.html";>5.4.0
+
+
+
+
+sparc-sun-solaris2.12
+ 
+Test results:
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00377.html";>5.4.0,
+https://gcc.gnu.org/ml/gcc-testresults/2016-06/msg00376.html";>5.4.0
+
+
+
+
 sparc64-sun-solaris2.10
  
 Test results:


Re: [PATCH] Handle undefined extern vars in output_in_order

2016-06-16 Thread Jan Hubicka
> On Thu, 9 Jun 2016, Alexander Monakov wrote:
> 
> > Hi,
> > 
> > This patch teaches cgraphunit.c:output_in_order to output undefined external
> > variables via assemble_undefined_decl.  At the moment that is only done for
> > -ftoplevel-reorder in varpool.c:symbol_table::output_variables.  This patch
> > makes both behave the same way.  I've also made handling of variables in 
> > both
> > functions look similar to each other.
> 
> Ping.
+  FOR_EACH_VARIABLE (pv)
+{
+  if (no_reorder && !pv->no_reorder)
+   continue;
+  if (DECL_HARD_REGISTER (pv->decl))
+   continue;
+  if (DECL_HAS_VALUE_EXPR_P (pv->decl))
+   {
+ gcc_checking_assert (lookup_attribute ("omp declare target link",
+DECL_ATTRIBUTES (pv->decl)));
+#ifdef ACCEL_COMPILER
+ continue;
+#endif
+   }
+  i = pv->order;
+  gcc_assert (nodes[i].kind == ORDER_UNDEFINED);
+  nodes[i].kind = pv->definition ? ORDER_VAR : ORDER_VAR_UNDEF;
+  nodes[i].u.v = pv;

Order for undefined variables is not computed, so it will be 0.  Doesn't this
overwrite existing entries of the nodes array?

Honza
> 
> Thanks.
> Alexander


Commit: MSP430: Rename entries in option enums

2016-06-16 Thread Nick Clifton
Hi Guys,

  I recently noticed that the MSP430 backend uses some pretty generic
  names for the enum values of its hardware multiply and memory region
  options.  This could possibly cause problems if these names are used
  elsewhere, so I have decided to check in the patch below to fix this.
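
  A contrived illustration (not from the sources) of the kind of clash the
  rename avoids; unscoped enumerators all land in the enclosing scope, so a
  second use of a generic name is rejected:

    enum msp430_hwmult_types { NONE, AUTO, SMALL, LARGE, F5SERIES };
    enum some_other_option   { NONE };  /* error: redeclaration of enumerator 'NONE' */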

  Tested with no regressions on an msp430-elf toolchain.

Cheers
  Nick

gcc/ChangeLog
2016-06-16  Nick Clifton  

* config/msp430/msp430-opts.h (msp430_hwmult_types): Add
MSP430_HWMULT_ prefix to enum values.
(msp430_regions): Add MSP430_REGION_ prefix to enum values.
* config/msp430/msp430.c: Update use of enum values.
* config/msp430/msp430.md: Likewise.
* config/msp430/msp430.opt: Likewise.

Index: config/msp430/msp430-opts.h
===
--- config/msp430/msp430-opts.h	(revision 237527)
+++ config/msp430/msp430-opts.h	(working copy)
@@ -22,19 +22,19 @@
 
 enum msp430_hwmult_types
 {
-  NONE,
-  AUTO,
-  SMALL,
-  LARGE,
-  F5SERIES
+  MSP430_HWMULT_NONE,
+  MSP430_HWMULT_AUTO,
+  MSP430_HWMULT_SMALL,
+  MSP430_HWMULT_LARGE,
+  MSP430_HWMULT_F5SERIES
 };
 
 enum msp430_regions
 {
-  ANY,
-  EITHER,
-  LOWER,
-  UPPER
+  MSP430_REGION_ANY,
+  MSP430_REGION_EITHER,
+  MSP430_REGION_LOWER,
+  MSP430_REGION_UPPER
 };
 
 #endif
Index: config/msp430/msp430.c
===
--- config/msp430/msp430.c	(revision 237527)
+++ config/msp430/msp430.c	(working copy)
@@ -777,20 +777,21 @@
 			   target_mcu, xisa ? "430X" : "430", msp430x ? "430X" : "430");
 
 		if (msp430_mcu_data[i].hwmpy == 0
-		&& msp430_hwmult_type != AUTO
-		&& msp430_hwmult_type != NONE)
+		&& msp430_hwmult_type != MSP430_HWMULT_AUTO
+		&& msp430_hwmult_type != MSP430_HWMULT_NONE)
 		  warning (0, "MCU '%s' does not have hardware multiply support, but -mhwmult is set to %s",
 			   target_mcu,
-			   msp430_hwmult_type == SMALL ? "16-bit" : msp430_hwmult_type == LARGE ? "32-bit" : "f5series");
-		else if (msp430_hwmult_type == SMALL
+			   msp430_hwmult_type == MSP430_HWMULT_SMALL ? "16-bit"
+			   : msp430_hwmult_type == MSP430_HWMULT_LARGE ? "32-bit" : "f5series");
+		else if (msp430_hwmult_type == MSP430_HWMULT_SMALL
 		&& msp430_mcu_data[i].hwmpy != 1
 		&& msp430_mcu_data[i].hwmpy != 2 )
 		  warning (0, "MCU '%s' supports %s hardware multiply, but -mhwmult is set to 16-bit",
 			   target_mcu, hwmult_name (msp430_mcu_data[i].hwmpy));
-		else if (msp430_hwmult_type == LARGE && msp430_mcu_data[i].hwmpy != 4)
+		else if (msp430_hwmult_type == MSP430_HWMULT_LARGE && msp430_mcu_data[i].hwmpy != 4)
 		  warning (0, "MCU '%s' supports %s hardware multiply, but -mhwmult is set to 32-bit",
 			   target_mcu, hwmult_name (msp430_mcu_data[i].hwmpy));
-		else if (msp430_hwmult_type == F5SERIES && msp430_mcu_data[i].hwmpy != 8)
+		else if (msp430_hwmult_type == MSP430_HWMULT_F5SERIES && msp430_mcu_data[i].hwmpy != 8)
 		  warning (0, "MCU '%s' supports %s hardware multiply, but -mhwmult is set to f5series",
 			   target_mcu, hwmult_name (msp430_mcu_data[i].hwmpy));
 	  }
@@ -801,7 +802,7 @@
 
   if (i < 0)
 	{
-	  if (msp430_hwmult_type == AUTO)
+	  if (msp430_hwmult_type == MSP430_HWMULT_AUTO)
 	{
 	  if (msp430_warn_mcu)
 		{
@@ -815,7 +816,7 @@
 			 target_mcu);
 		}
 
-	  msp430_hwmult_type = NONE;
+	  msp430_hwmult_type = MSP430_HWMULT_NONE;
 	}
 	  else if (target_cpu == NULL)
 	{
@@ -833,15 +834,15 @@
 }
 
   /* The F5 series are all able to support the 430X ISA.  */
-  if (target_cpu == NULL && target_mcu == NULL && msp430_hwmult_type == F5SERIES)
+  if (target_cpu == NULL && target_mcu == NULL && msp430_hwmult_type == MSP430_HWMULT_F5SERIES)
 msp430x = true;
 
   if (TARGET_LARGE && !msp430x)
 error ("-mlarge requires a 430X-compatible -mmcu=");
 
-  if (msp430_code_region == UPPER && ! msp430x)
+  if (msp430_code_region == MSP430_REGION_UPPER && ! msp430x)
 error ("-mcode-region=upper requires 430X-compatible cpu");
-  if (msp430_data_region == UPPER && ! msp430x)
+  if (msp430_data_region == MSP430_REGION_UPPER && ! msp430x)
 error ("-mdata-region=upper requires 430X-compatible cpu");
 
   if (flag_exceptions || flag_non_call_exceptions
@@ -2166,24 +2167,24 @@
 
   if (TREE_CODE (decl) == FUNCTION_DECL)
 {
-  if (msp430_code_region == LOWER)
+  if (msp430_code_region == MSP430_REGION_LOWER)
 	return lower_prefix;
 
-  if (msp430_code_region == UPPER)
+  if (msp430_code_region == MSP430_REGION_UPPER)
 	return upper_prefix;
 
-  if (msp430_code_region == EITHER)
+  if (msp430_code_region == MSP430_REGION_EITHER)
 	return either_prefix;
 }
   else
 {
-  if (msp430_data_region == LOWER)
+  if (msp430_data_region == MSP430_REGION_LOWER)
 	return lower_prefix;
 
-  if (msp430_data_region == UPPER)
+  if (msp430_data_region == MSP430_REGION_UPPER)
 	return upper_prefix;
 
- 

Re: [PATCH, IA64, RFT]: Implement PR 71242, Missing built-in functions for float128 NaNs

2016-06-16 Thread Uros Bizjak
On Thu, Jun 16, 2016 at 12:44 PM, Alexander Monakov  wrote:
> Hi,
>
>> 2016-06-12  Uros Bizjak  
>>
>> PR target/71242
>> * config/ia64/ia64.c (enum ia64_builtins) [IA64_BUILTIN_NANQ]: New.
>> [IA64_BUILTIN_NANSQ]: Ditto.
>> (ia64_fold_builtin): New function.
>> (TARGET_FOLD_BUILTIN): New define.
>> (ia64_init_builtins) Declare const_string_type node.
>> Add __builtin_nanq and __builtin_nansq builtin functions.
>> (ia64_expand_builtin): Handle IA64_BUILTIN_NANQ and IA64_BUILTIN_NANSQ.
>>
>> testsuite/ChangeLog:
>>
>> 2016-06-12  Uros Bizjak  
>>
>> PR target/71241
>> * testsuite/gcc.dg/torture/float128-nan.c: Also run on ia64-*-*.
>>
>> Tested by building a cross-compiler to ia64-linux-gnu and eyeballing the
>> resulting assembly.
>>
>> Can someone please test this patch on a real IA64 ?
>
> I gave it a shot.  It bootstraps, and the float128-nan.c test passes.
> Unfortunately, I had trouble running the testsuite to completion, but fwiw I
> don't see anything alarming in the partial results I got.

Many thanks!

I have committed the patch to the mainline to resolve PR71242

Uros.


Re: [PATCH] Handle undefined extern vars in output_in_order

2016-06-16 Thread Alexander Monakov
On Thu, 16 Jun 2016, Jan Hubicka wrote:
> > On Thu, 9 Jun 2016, Alexander Monakov wrote:
> +  FOR_EACH_VARIABLE (pv)
[snip]
> +  i = pv->order;
> +  gcc_assert (nodes[i].kind == ORDER_UNDEFINED);
> +  nodes[i].kind = pv->definition ? ORDER_VAR : ORDER_VAR_UNDEF;
> +  nodes[i].u.v = pv;
> 
> Order for undefined variables is not computed, so it will be 0.  Doesn't
> this overwrite existing entries of the nodes array?

Hm, I've tried the following testcase:

extern int a, b;
int f()
{
  return a+b;
}

and in the above loop I see pv->order == 2 on the first iteration,
pv->order == 1 on the second. Under what circumstances wouldn't
order be computed?

Thanks.
Alexander


Re: [C++ PATCH] Don't promote bitfields in last arg of __builtin_*_overflow_p

2016-06-16 Thread Joseph Myers
On Wed, 15 Jun 2016, Martin Sebor wrote:

> Looks fine to me.  The bit-field handling should be explained
> in the manual.  Though useful, it's unusual enough that I don't
> think people will expect it (there have been bug reports or
> questions in the past about the C handling of bit-fields from
> users familiar with the C++ semantics).

And at least one bug for C++ (70733) that was closed on the basis of C 
semantics where I don't see that closure as correct under C++ semantics (I 
haven't verified whether the bug report is correct, however).

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Add a new target hook to compute the frame layout

2016-06-16 Thread Bernd Edlinger
Hi!


By the design of the target hook INITIAL_ELIMINATION_OFFSET
it is necessary to call this function several times with
different register combinations.
Most targets use a cached data structure that describes the
exact frame layout of the current function.

It is safe to skip the computation when reload_completed = true,
and most targets do that already.

However, while reload is doing its work, it is not clear when to
do the computation and when not.  This results in unnecessary
work.  Computing the frame layout can be a simple function or an
arbitrarily complex one, for instance one that walks all instructions
of the current function, which is more or less the common case.


This patch adds a new optional target hook that can be used
by the target to factor the INITIAL_ELIMINATION_OFFSET hook
into an O(n) computation part and an O(1) result function.

The patch implements the compute_frame_layout target hook just
for ARM for the moment, to show the principle.
Other targets may also implement that hook, if it seems appropriate.
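
A rough sketch of the split (hypothetical code, all names invented here,
not taken from any backend):

/* O(n) part: walk the function once and cache the layout.
   This is what the new TARGET_COMPUTE_FRAME_LAYOUT hook would do.  */
static struct { bool valid; int frame_size; } layout_cache;

static void
example_compute_frame_layout (void)
{
  /* ... scan all insns, decide which registers to save, ... */
  layout_cache.frame_size = 0;  /* computed size goes here */
  layout_cache.valid = true;
}

/* O(1) part: INITIAL_ELIMINATION_OFFSET just reads the cached layout
   instead of recomputing it for every register combination.  */
static int
example_initial_elimination_offset (int from, int to)
{
  (void) from; (void) to;  /* a real target adds a per-pair adjustment */
  return layout_cache.valid ? layout_cache.frame_size : 0;
}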


Boot-strapped and reg-tested on arm-linux-gnueabihf.
OK for trunk?


Thanks
Bernd.
2016-06-16  Bernd Edlinger  

* target.def (compute_frame_layout): New optional target hook.
* doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
* doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
* lra-eliminations.c (update_reg_eliminate): Call compute_frame_layout
target hook.
* reload1.c (verify_initial_elim_offsets): Likewise.
* config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
(use_simple_return_p): Call arm_compute_frame_layout if needed.
(arm_get_frame_offsets): Split up into this ...
(arm_compute_frame_layout): ... and this function.
Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c	(Revision 233176)
+++ gcc/config/arm/arm.c	(Arbeitskopie)
@@ -81,6 +81,7 @@ static bool arm_const_not_ok_for_debug_p (rtx);
 static bool arm_needs_doubleword_align (machine_mode, const_tree);
 static int arm_compute_static_chain_stack_bytes (void);
 static arm_stack_offsets *arm_get_frame_offsets (void);
+static void arm_compute_frame_layout (void);
 static void arm_add_gc_roots (void);
 static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
 			 unsigned HOST_WIDE_INT, rtx, rtx, int, int);
@@ -669,6 +670,9 @@ static const struct attribute_spec arm_attribute_t
 #undef TARGET_SCALAR_MODE_SUPPORTED_P
 #define TARGET_SCALAR_MODE_SUPPORTED_P arm_scalar_mode_supported_p
 
+#undef TARGET_COMPUTE_FRAME_LAYOUT
+#define TARGET_COMPUTE_FRAME_LAYOUT arm_compute_frame_layout
+
 #undef TARGET_FRAME_POINTER_REQUIRED
 #define TARGET_FRAME_POINTER_REQUIRED arm_frame_pointer_required
 
@@ -3813,6 +3817,10 @@ use_simple_return_p (void)
 {
   arm_stack_offsets *offsets;
 
+  /* Note this function can be called before or after reload.  */
+  if (!reload_completed)
+arm_compute_frame_layout ();
+
   offsets = arm_get_frame_offsets ();
   return offsets->outgoing_args != 0;
 }
@@ -19238,7 +19246,7 @@ arm_compute_static_chain_stack_bytes (void)
 
 /* Compute a bit mask of which registers need to be
saved on the stack for the current function.
-   This is used by arm_get_frame_offsets, which may add extra registers.  */
+   This is used by arm_compute_frame_layout, which may add extra registers.  */
 
 static unsigned long
 arm_compute_save_reg_mask (void)
@@ -20789,12 +20797,25 @@ any_sibcall_could_use_r3 (void)
   alignment.  */
 
 
+/* Return cached stack offsets.  */
+
+static arm_stack_offsets *
+arm_get_frame_offsets (void)
+{
+  struct arm_stack_offsets *offsets;
+
+  offsets = &cfun->machine->stack_offsets;
+
+  return offsets;
+}
+
+
 /* Calculate stack offsets.  These are used to calculate register elimination
offsets and in prologue/epilogue code.  Also calculates which registers
should be saved.  */
 
-static arm_stack_offsets *
-arm_get_frame_offsets (void)
+static void
+arm_compute_frame_layout (void)
 {
   struct arm_stack_offsets *offsets;
   unsigned long func_type;
@@ -20806,19 +20827,6 @@ any_sibcall_could_use_r3 (void)
 
   offsets = &cfun->machine->stack_offsets;
 
-  /* We need to know if we are a leaf function.  Unfortunately, it
- is possible to be called after start_sequence has been called,
- which causes get_insns to return the insns for the sequence,
- not the function, which will cause leaf_function_p to return
- the incorrect result.
-
- to know about leaf functions once reload has completed, and the
- frame size cannot be changed after that time, so we can safely
- use the cached value.  */
-
-  if (reload_completed)
-return offsets;
-
   /* Initially this is the size of the local variables.  It will translated
  into an offset once we have determined the size of preceding data.  */
   frame_size = ROUND_UP_WORD (get_frame_size ());
@@ -20885,7 +20893,7 @@ any_sibcall_could_u

Re: [Patch] Implement is_[nothrow_]swappable (p0185r1) - 2nd try

2016-06-16 Thread Jonathan Wakely

On 16/06/16 14:08 +0100, Jonathan Wakely wrote:

/home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/20_util/is_nothrow_swappable/./value.h:285:3:
error: static assertion failed
/home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/20_util/is_nothrow_swappable/./value.h:287:3:
error: static assertion failed
/home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/20_util/is_nothrow_swappable/./value.h:289:3:
error: static assertion failed

Those assertions fail for both 20_util/is_nothrow_swappable/value.cc
and 20_util/is_nothrow_swappable/value_ext.cc


Those assertions should be changed because with your patch, queue,
priority_queue and stack are nothrow swappable: they don't
depend on the value_type's swappable trait, but on the sequence's.

The other problems are that some tests need their dg-error lines
adjusting, and  says #ifdef __cplusplus >= 201402L which
should be #if.

Here's the patch I've tested and committed, thanks for working on
this!

commit e875362ec65adc793161cec9ee6d025ac86f12e5
Author: Jonathan Wakely 
Date:   Thu Jun 16 13:55:08 2016 +0100

Provide swappable traits (p0185r1)

2016-06-16  Daniel Kruegler  

	Provide swappable traits (p0185r1)
	* include/std/type_traits (is_swappable, is_nothrow_swappable,
	is_swappable_with, is_nothrow_swappable_with, is_swappable_v,
	is_nothrow_swappable_v, is_swappable_with_v,
	is_nothrow_swappable_with_v): New.
	* include/bits/stl_pair.h: Use it as per p0185r1.
	* include/bits/stl_queue.h: Likewise.
	* include/bits/stl_stack.h: Likewise.
	* include/bits/unique_ptr.h: Likewise.
	* include/std/tuple: Likewise.
	* include/std/array: Likewise. Fix zero-size member swap.
	* include/bits/hashtable.h: Use __and_.
	* testsuite/20_util/is_nothrow_swappable/requirements/
	explicit_instantiation.cc: Change test options to std=gnu++17.
	* testsuite/20_util/is_nothrow_swappable/requirements/typedefs.cc:
	Likewise.
	* testsuite/20_util/is_nothrow_swappable/value.cc: Likewise.
	* testsuite/20_util/is_swappable/requirements/
	explicit_instantiation.cc: Likewise.
	* testsuite/20_util/is_swappable/requirements/typedefs.cc: Likewise.
	* testsuite/20_util/is_swappable/value.cc: Likewise.
	* testsuite/20_util/is_nothrow_swappable/requirements/
	explicit_instantiation_ext.cc: New.
	* testsuite/20_util/is_nothrow_swappable/requirements/typedefs_ext.cc:
	New.
	* testsuite/20_util/is_nothrow_swappable/value.h: New.
	* testsuite/20_util/is_nothrow_swappable/value_ext.cc: New.
	* testsuite/20_util/is_nothrow_swappable_with/requirements/
	explicit_instantiation.cc: New.
	* testsuite/20_util/is_nothrow_swappable_with/requirements/typedefs.cc:
	New.
	* testsuite/20_util/is_nothrow_swappable_with/value.cc: New.
	* testsuite/20_util/is_swappable/requirements/
	explicit_instantiation_ext.cc: New.
	* testsuite/20_util/is_swappable/requirements/typedefs_ext.cc: New.
	* testsuite/20_util/is_swappable/value.h: New.
	* testsuite/20_util/is_swappable/value_ext.cc: New.
	* testsuite/20_util/is_swappable_with/requirements/
	explicit_instantiation.cc: New.
	* testsuite/20_util/is_swappable_with/requirements/typedefs.cc: New.
	* testsuite/20_util/is_swappable_with/value.cc: New.
	* testsuite/23_containers/array/tuple_interface/get_neg.cc: Adjust
	dg-error line numbers.
	* testsuite/23_containers/array/tuple_interface/tuple_element_neg.cc:
	Likewise.

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 5748920..05f27b4 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -475,8 +475,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   void
   swap(_Hashtable&)
-  noexcept(__is_nothrow_swappable<_H1>::value
-	   && __is_nothrow_swappable<_Equal>::value);
+  noexcept(__and_<__is_nothrow_swappable<_H1>,
+	  __is_nothrow_swappable<_Equal>>::value);
 
   // Basic container operations
   iterator
@@ -1236,8 +1236,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 	   _H1, _H2, _Hash, _RehashPolicy, _Traits>::
 swap(_Hashtable& __x)
-noexcept(__is_nothrow_swappable<_H1>::value
-	 && __is_nothrow_swappable<_Equal>::value)
+noexcept(__and_<__is_nothrow_swappable<_H1>,
+	__is_nothrow_swappable<_Equal>>::value)
 {
   // The only base class with member variables is hash_code_base.
   // We define _Hash_code_base::_M_swap because different
diff --git a/libstdc++-v3/include/bits/stl_pair.h b/libstdc++-v3/include/bits/stl_pair.h
index 37ee5cc..5ff160a 100644
--- a/libstdc++-v3/include/bits/stl_pair.h
+++ b/libstdc++-v3/include/bits/stl_pair.h
@@ -341,8 +341,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   void
   swap(pair& __p)
-  noexcept(__is_nothrow_swappable<_T1>::value
-   && __is_nothro

Re: [PATCH] Handle undefined extern vars in output_in_order

2016-06-16 Thread Jan Hubicka
> On Thu, 16 Jun 2016, Jan Hubicka wrote:
> > > On Thu, 9 Jun 2016, Alexander Monakov wrote:
> > +  FOR_EACH_VARIABLE (pv)
> [snip]
> > +  i = pv->order;
> > +  gcc_assert (nodes[i].kind == ORDER_UNDEFINED);
> > +  nodes[i].kind = pv->definition ? ORDER_VAR : ORDER_VAR_UNDEF;
> > +  nodes[i].u.v = pv;
> > 
> > Order for undefined variables is not computed, so it will be 0.  Doesn't
> > this overwrite existing entries of the nodes array?
> 
> Hm, I've tried the following testcase:
> 
> extern int a, b;
> int f()
> {
>   return a+b;
> }
> 
> and in the above loop I see pv->order == 2 on the first iteration,
> pv->order == 1 on the second. Under what circumstances wouldn't
> order be computed?

I see, order is created at the time a variable is added to the symbol table
(not at the time the definition is given).
So we should have order everywhere.
The patch is OK.

Honza
> 
> Thanks.
> Alexander


Re: RFA (gimplify): PATCH to implement C++ order of evaluation paper

2016-06-16 Thread Jason Merrill
On Wed, Jun 15, 2016 at 6:30 AM, Richard Biener
 wrote:
> On Tue, Jun 14, 2016 at 10:15 PM, Jason Merrill  wrote:
>> As discussed in bug 71104, the C++ P0145 proposal specifies the evaluation
>> order of certain operations:
>>
>> 1. a.b
>> 2. a->b
>> 3. a->*b
>> 4. a(b1, b2, b3)
>> 5. b @= a
>> 6. a[b]
>> 7. a << b
>> 8. a >> b
>>
>> The second patch introduces a flag -fargs-in-order to control whether these
>> orders are enforced on calls.  -fargs-in-order=1 enforces all but the
>> ordering between function arguments in #4.
>>
>> The first patch implements #5 for the built-in assignment operator by
>> changing the order of gimplification of MODIFY_EXPR in the back end, as
>> richi was also thinking about doing to fix 71104.  This runs into problems
>> with DECL_VALUE_EXPR variables, where is_gimple_reg can be true before
>> gimplification and false afterward, so he checks for this situation in
>> rhs_predicate_for.  richi, you said you were still working on 71104; is this
>> patch OK to put in for now, or should I wait for something better?

> I wasn't too happy about the rhs_predicate_for change and I was also worried
> about generating a lot less optimal GIMPLE due to evaluating the predicate
> on un-gimplified *to_p.

We can try to be more clever about recognizing things that will
gimplify to a reg.  How does this patch look?

> I wondered if we should simply gimplify *from_p
> with is_gimple_mem_rhs_or_call unconditionally, then gimplify *to_p
> and after that if (unmodified) rhs_predicate_for (*to_p) is !=
> is_gimple_mem_rhs_or_call re-gimplify *from_p to avoid this.  That should 
> also avoid changing
> rhs_predicate_for.

The problem with this approach is that gimplification is destructive;
you can't just throw away the first sequence and gimplify again.  For
instance, SAVE_EXPRs are clobbered the first time they are seen in
gimplification.

> Not sure if that solves whatever you were running into with OpenMP.
>
> I simply didn't have found the time to experiment with the above or even
> validate my fear by say comparing .gimple dumps of cc1 files with/without
> the gimplification order change.

Looking through the gimple dumps for optabs.c and c-common.c with this
patch I don't see any increase in temporaries, but I do see some
improved locality such that we initialize a pointer temporary just
before assigning to one of its fields rather than initializing it
before doing all the value computation, e.g.

before:
-  _14 = *node;
-  _15 = contains_struct_check (_14, 1, "../../../gcc/gcc/c-family/c-common.
c", 7672, &__FUNCTION__);
...lots...
-  _15->typed.type = _56;

after:
+  _55 = *node;
+  _56 = contains_struct_check (_55, 1,
"../../../gcc/gcc/c-family/c-common.c", 7672, &__FUNCTION__);
+  _56->typed.type = _54;

Is this version of the patch OK?

Jason
commit 50495a102be99950002b0cc9f824fcb90cdf65fb
Author: Jason Merrill 
Date:   Thu Jun 16 01:25:02 2016 -0400

P0145R2: Refining Expression Order for C++ (assignment)

* gimplify.c (will_be_gimple_reg): New.
(rhs_predicate_for): Use it.
(gimplify_modify_expr): Gimplify RHS first.

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index ae8b4fc..5d51d64 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -3802,12 +3802,45 @@ gimplify_init_ctor_eval (tree object, 
vec *elts,
 }
 }
 
+/* Return true if LHS will satisfy is_gimple_reg after gimplification.  */
+
+static bool
+will_be_gimple_reg (tree lhs)
+{
+  while (true)
+switch (TREE_CODE (lhs))
+  {
+  case COMPOUND_EXPR:
+   lhs = TREE_OPERAND (lhs, 1);
+   break;
+
+  case INIT_EXPR:
+  case MODIFY_EXPR:
+  case PREINCREMENT_EXPR:
+  case PREDECREMENT_EXPR:
+   lhs = TREE_OPERAND (lhs, 0);
+   break;
+
+  case VAR_DECL:
+  case PARM_DECL:
+  case RESULT_DECL:
+   if (DECL_HAS_VALUE_EXPR_P (lhs))
+ {
+   lhs = DECL_VALUE_EXPR (lhs);
+   break;
+ }
+   /* else fall through.  */
+  default:
+   return is_gimple_reg (lhs);
+  }
+}
+
 /* Return the appropriate RHS predicate for this LHS.  */
 
 gimple_predicate
 rhs_predicate_for (tree lhs)
 {
-  if (is_gimple_reg (lhs))
+  if (will_be_gimple_reg (lhs))
 return is_gimple_reg_rhs_or_call;
   else
 return is_gimple_mem_rhs_or_call;
@@ -4778,10 +4811,6 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, 
gimple_seq *post_p,
  that is what we must do here.  */
   maybe_with_size_expr (from_p);
 
-  ret = gimplify_expr (to_p, pre_p, post_p, is_gimple_lvalue, fb_lvalue);
-  if (ret == GS_ERROR)
-return ret;
-
   /* As a special case, we have to temporarily allow for assignments
  with a CALL_EXPR on the RHS.  Since in GIMPLE a function call is
  a toplevel statement, when gimplifying the GENERIC expression
@@ -4799,6 +4828,10 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, 
gimple_seq *post_p,
   if (ret == GS_ERROR)
 return ret;
 
+  ret = gimplify

Re: [PATCH] Handle undefined extern vars in output_in_order

2016-06-16 Thread Alexander Monakov
On Thu, 16 Jun 2016, Jan Hubicka wrote:
> I see, order is created at a time variable is added to symbol table (not at
> time when definition is given).  So we should have order everywhere.
> Patch is OK

Thanks!  If you don't mind a quick followup question: now that both
FOR_EACH_VARIABLE loops in two functions have the same structure, is
it alright to add a comment of the form

  /* There is a similar loop in output_in_order.  Please keep them in sync.  */

to symbol_table::output_variables, and vice versa?

Alexander


Re: [PATCH, vec-tails 07/10] Support loop epilogue combining

2016-06-16 Thread Ilya Enkovich
2016-06-15 14:44 GMT+03:00 Richard Biener :
> On Thu, May 19, 2016 at 9:44 PM, Ilya Enkovich  wrote:
>> Hi,
>>
>> This patch introduces support for loop epilogue combining.  This includes
>> support in cost estimation and all required changes required to mask
>> vectorized loop.
>
> I wonder why you compute a minimum number of iterations to make masking
> of the vectorized body profitable rather than a maximum number of iterations.
>
> I'd say masking the vectorized loop is profitable if niter/vf *
> masking-overhead < epilogue-cost.
> Masking the epilogue is profitable if vectorizing the epilogue with
> masking is profitable.
>
> Am I missing something?

We don't have two versions of the vectorized loop.  The choice is between the
vector and scalar loop, and in this case the minimum number of iterations is
what we need.  Generating two vectorized loop versions would be something new
to the vectorizer.

Thanks,
Ilya

>
> Thanks,
> Richard.
>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2016-05-19  Ilya Enkovich  
>>
>> * dbgcnt.def (vect_tail_combine): New.
>> * params.def (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD): New.
>> * tree-vect-data-refs.c (vect_get_new_ssa_name): Support 
>> vect_mask_var.
>> * tree-vect-loop-manip.c (slpeel_tree_peel_loop_to_edge): Support
>> epilogue combined with loop body.
>> (vect_do_peeling_for_loop_bound): Likewise.
>> * tree-vect-loop.c Include alias.h and dbgcnt.h.
>> (vect_estimate_min_profitable_iters): Add 
>> ret_min_profitable_combine_niters
>> arg, compute number of iterations for which loop epilogue combining 
>> is
>> profitable.
>> (vect_generate_tmps_on_preheader): Support combined apilogue.
>> (vect_gen_ivs_for_masking): New.
>> (vect_get_mask_index_for_elems): New.
>> (vect_get_mask_index_for_type): New.
>> (vect_gen_loop_masks): New.
>> (vect_mask_reduction_stmt): New.
>> (vect_mask_mask_load_store_stmt): New.
>> (vect_mask_load_store_stmt): New.
>> (vect_combine_loop_epilogue): New.
>> (vect_transform_loop): Support combined apilogue.
>>
>>
>> diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
>> index 78ddcc2..73c2966 100644
>> --- a/gcc/dbgcnt.def
>> +++ b/gcc/dbgcnt.def
>> @@ -192,4 +192,5 @@ DEBUG_COUNTER (treepre_insert)
>>  DEBUG_COUNTER (tree_sra)
>>  DEBUG_COUNTER (vect_loop)
>>  DEBUG_COUNTER (vect_slp)
>> +DEBUG_COUNTER (vect_tail_combine)
>>  DEBUG_COUNTER (dom_unreachable_edges)
>> diff --git a/gcc/params.def b/gcc/params.def
>> index 62a1e40..98d6c5a 100644
>> --- a/gcc/params.def
>> +++ b/gcc/params.def
>> @@ -1220,6 +1220,11 @@ DEFPARAM (PARAM_MAX_SPECULATIVE_DEVIRT_MAYDEFS,
>>   "Maximum number of may-defs visited when devirtualizing "
>>   "speculatively", 50, 0, 0)
>>
>> +DEFPARAM (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD,
>> + "vect-cost-increase-combine-threshold",
>> + "Cost increase threshold to mask main loop for epilogue.",
>> + 10, 0, 300)
>> +
>>  /*
>>
>>  Local variables:
>> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> index f275933..c5bdeb9 100644
>> --- a/gcc/tree-vect-data-refs.c
>> +++ b/gcc/tree-vect-data-refs.c
>> @@ -4000,6 +4000,9 @@ vect_get_new_ssa_name (tree type, enum vect_var_kind 
>> var_kind, const char *name)
>>case vect_scalar_var:
>>  prefix = "stmp";
>>  break;
>> +  case vect_mask_var:
>> +prefix = "mask";
>> +break;
>>case vect_pointer_var:
>>  prefix = "vectp";
>>  break;
>> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
>> index fab5879..b3c0668 100644
>> --- a/gcc/tree-vect-loop-manip.c
>> +++ b/gcc/tree-vect-loop-manip.c
>> @@ -1195,6 +1195,7 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, 
>> struct loop *scalar_loop,
>>int first_guard_probability = 2 * REG_BR_PROB_BASE / 3;
>>int second_guard_probability = 2 * REG_BR_PROB_BASE / 3;
>>int probability_of_second_loop;
>> +  bool skip_second_after_first = false;
>>
>>if (!slpeel_can_duplicate_loop_p (loop, e))
>>  return NULL;
>> @@ -1393,7 +1394,11 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, 
>> struct loop *scalar_loop,
>>  {
>>loop_vec_info loop_vinfo = loop_vec_info_for_loop (loop);
>>tree scalar_loop_iters = LOOP_VINFO_NITERSM1 (loop_vinfo);
>> -  unsigned limit = LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1;
>> +  unsigned limit = 0;
>> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>> +   skip_second_after_first = true;
>> +  else
>> +   limit = LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1;
>>if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
>> limit = limit + 1;
>>if (check_profitability
>> @@ -1464,11 +1469,20 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, 
>> struct loop *scalar_loop,
>>bb_between_loops = new_exit_bb;
>>bb_after_second_loop = split_edge (single_exit (second_loop));
>>
>> -  pre_con

Re: [PATCH] Handle undefined extern vars in output_in_order

2016-06-16 Thread Jan Hubicka
> On Thu, 16 Jun 2016, Jan Hubicka wrote:
> > I see, order is created at a time variable is added to symbol table (not at
> > time when definition is given).  So we should have order everywhere.
> > Patch is OK
> 
> Thanks!  If you don't mind a quick followup question: now that both
> FOR_EACH_VARIABLE loops in two functions have the same structure, is
> it alright to add a comment of the form
> 
>   /* There is a similar loop in output_in_order.  Please keep them in sync.  
> */
> 
> to symbol_table::output_variables, and vice versa?

Yes, that is fine.

Honza


RE: [PATCH][AArch64] Enable -frename-registers at -O2 and higher

2016-06-16 Thread Evandro Menezes
> Though there's a slight (<1%) overall improvement on Exynos M1, there just were
> too many significant (<-3%) regressions for a few significant improvements for me
> to be comfortable with -frename-registers being a generic default for AArch64.
> 
> I'll run some larger benchmarks tonight, but I'm leaning towards having it as a
> target specific extra tuning option.

The results are in and -frename-registers is not a good idea for Exynos M1.

Thank you,

-- 
Evandro Menezes  Austin, TX






Re: [PATCH, vec-tails 08/10] Support loop epilogue masking and low trip count loop vectorization

2016-06-16 Thread Jeff Law

On 05/19/2016 01:46 PM, Ilya Enkovich wrote:

Hi,

This patch enables vectorization of loop epilogues and low trip count
loops using masking.

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* dbgcnt.def (vect_tail_mask): New.
* tree-vect-loop.c (vect_analyze_loop_2): Support masked loop
epilogues and low trip count loops.
(vect_get_known_peeling_cost): Ignore scalat epilogue cost for

s/scalat/scalar/


loops we are going to mask.
(vect_estimate_min_profitable_iters): Support masked loop
epilogues and low trip count loops.
* tree-vectorizer.c (vectorize_loops): Add a message for a case
when loop epilogue can't be vectorized.

I don't see anything here that worries me.  Richi's question is a valid 
one, but I don't have a strong opinion on whether or not that should be 
explored as a prerequisite for this work to be accepted or if it should 
be a follow-up item.  So take guidance from Richi on that.


jeff



Re: [PATCH, vec-tails 07/10] Support loop epilogue combining

2016-06-16 Thread Jeff Law

On 06/16/2016 09:41 AM, Ilya Enkovich wrote:

2016-06-15 14:44 GMT+03:00 Richard Biener :

On Thu, May 19, 2016 at 9:44 PM, Ilya Enkovich  wrote:

Hi,

This patch introduces support for loop epilogue combining.  This includes
support in cost estimation and all required changes required to mask
vectorized loop.


I wonder why you compute a minimum number of iterations to make masking
of the vectorized body profitable rather than a maximum number of iterations.

I'd say masking the vectorized loop is profitable if niter/vf *
masking-overhead < epilogue-cost.
Masking the epilogue is profitable if vectorizing the epilogue with
masking is profitable.

Am I missing something?


We don't have two versions of vectorized loop.  The choice is between vector
and scalar loop and in this case minimum number of iterations is what we need.
Generating two vectorized loop versions would be something new to vectorizer.

What I think Richi is saying is that we have to multiply the cost of the
masking overhead by the number of iterations of the vectorized loop to
determine the cost of masking -- the more loop iterations we have, the
greater the cost of masking in the loop becomes, and those costs may be
higher than the normal epilogue sequence.
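
To put rough numbers on that trade-off, here is a small standalone sketch in
plain C.  The cost units and the VF of 8 are made up for illustration; this is
not the vectorizer's actual cost model, it merely mirrors Richi's
"niter/vf * masking-overhead < epilogue-cost" comparison:

#include <stdbool.h>
#include <stdio.h>

/* Masking adds a per-vector-iteration overhead, so its total cost grows
   with niters/vf, while the cost of a separate scalar epilogue is roughly
   a fixed constant.  */
static bool
masking_profitable (long niters, long vf,
                    long mask_overhead_per_iter, long epilogue_cost)
{
  long vector_iters = niters / vf;
  return vector_iters * mask_overhead_per_iter < epilogue_cost;
}

int
main (void)
{
  /* Few iterations: the one-off epilogue cost dominates, masking wins.  */
  printf ("niters = 40:   profitable = %d\n",
          masking_profitable (40, 8, 2, 120));
  /* Many iterations: the accumulated masking overhead dominates.  */
  printf ("niters = 4000: profitable = %d\n",
          masking_profitable (4000, 8, 2, 120));
  return 0;
}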


Jeff





Re: [PATCH, vec-tails 08/10] Support loop epilogue masking and low trip count loop vectorization

2016-06-16 Thread Ilya Enkovich
2016-06-15 15:00 GMT+03:00 Richard Biener :
> On Thu, May 19, 2016 at 9:46 PM, Ilya Enkovich  wrote:
>> Hi,
>>
>> This patch enables vectorization of loop epilogues and low trip count
>> loops using masking.
>
> I wonder why we have the epilogue masking restriction with respect to
> the original vectorization factor - shouldn't this simply be handled by
> vectorizing the epilogue?  First trying the original VF (requires masking
> and is equivalent to low-tripcount loop vectorization), then if that is not
> profitable iterate to smaller VFs?   [yes, ideally we'd be able to compare
> cost for vectorization with different VFs and choose the best VF]

When the main loop is vectorized using some VF, we compute epilogue masking
profitability and generate an epilogue to be vectorized and masked using
exactly the same VF.  In the ideal case we never fail to vectorize the
epilogue because we check that it can be masked.  Unfortunately we may lose
some info when generating a loop copy (e.g. scev info is lost) and therefore
may fail to vectorize the epilogue.

I expect that if we lose some info and thus fail to vectorize for a specified
VF (for which the main loop was successfully vectorized), then we are going to
fail to vectorize for other vector sizes too.  Actually, I'd prefer to try
only that one vector size for vectorization with masking, to save compilation
time.
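
For illustration, here is a conceptual sketch in plain C of what masking the
epilogue with the same VF means.  The VF of 8 is arbitrary and the code is a
hand-written pseudo-vector form, not what the vectorizer actually generates:

#include <stddef.h>

#define VF 8

void
add_arrays (float *a, const float *b, size_t n)
{
  size_t i = 0;

  /* Main vectorized body: handles full groups of VF elements; the inner
     loop stands in for a single vector operation.  */
  for (; i + VF <= n; i += VF)
    for (size_t lane = 0; lane < VF; lane++)
      a[i + lane] += b[i + lane];

  /* Masked epilogue using the same VF: one more "vector" iteration in
     which each lane compares its index against n and only active lanes
     commit a result.  */
  if (i < n)
    for (size_t lane = 0; lane < VF; lane++)
      if (i + lane < n)
        a[i + lane] += b[i + lane];
}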

Thanks,
Ilya

>
> Thanks,
> Richard.
>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2016-05-19  Ilya Enkovich  
>>
>> * dbgcnt.def (vect_tail_mask): New.
>> * tree-vect-loop.c (vect_analyze_loop_2): Support masked loop
>> epilogues and low trip count loops.
>> (vect_get_known_peeling_cost): Ignore scalat epilogue cost for
>> loops we are going to mask.
>> (vect_estimate_min_profitable_iters): Support masked loop
>> epilogues and low trip count loops.
>> * tree-vectorizer.c (vectorize_loops): Add a message for a case
>> when loop epilogue can't be vectorized.
>>
>>
>> diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
>> index 73c2966..5aad1d7 100644
>> --- a/gcc/dbgcnt.def
>> +++ b/gcc/dbgcnt.def
>> @@ -193,4 +193,5 @@ DEBUG_COUNTER (tree_sra)
>>  DEBUG_COUNTER (vect_loop)
>>  DEBUG_COUNTER (vect_slp)
>>  DEBUG_COUNTER (vect_tail_combine)
>> +DEBUG_COUNTER (vect_tail_mask)
>>  DEBUG_COUNTER (dom_unreachable_edges)
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index 1a80c42..7075f29 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -2199,7 +2199,7 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool 
>> &fatal)
>>int saved_vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>>HOST_WIDE_INT estimated_niter;
>>unsigned th;
>> -  int min_scalar_loop_bound;
>> +  int min_scalar_loop_bound = 0;
>>
>>/* Check the SLP opportunities in the loop, analyze and build SLP trees.  
>> */
>>ok = vect_analyze_slp (loop_vinfo, n_stmts);
>> @@ -2224,6 +2224,30 @@ start_over:
>>unsigned vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>>gcc_assert (vectorization_factor != 0);
>>
>> +  /* For now we mask loop epilogue using the same VF since it was used
>> + for cost estimations and it should be easier for reduction
>> + optimization.  */
>> +  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
>> +  && LOOP_VINFO_ORIG_MASK_EPILOGUE (loop_vinfo)
>> +  && LOOP_VINFO_ORIG_VECT_FACTOR (loop_vinfo) != 
>> (int)vectorization_factor)
>> +{
>> +  if (dump_enabled_p ())
>> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> +"not vectorized: VF for loop epilogue doesn't "
>> +"match original loop VF.\n");
>> +  return false;
>> +}
>> +
>> +  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
>> +  && !LOOP_VINFO_ORIG_MASK_EPILOGUE (loop_vinfo)
>> +  && LOOP_VINFO_ORIG_VECT_FACTOR (loop_vinfo) <= 
>> (int)vectorization_factor)
>> +{
>> +  if (dump_enabled_p ())
>> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> +"not vectorized: VF for loop epilogue is too 
>> small\n");
>> +  return false;
>> +}
>> +
>>if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && dump_enabled_p ())
>>  dump_printf_loc (MSG_NOTE, vect_location,
>>  "vectorization_factor = %d, niters = "
>> @@ -2237,11 +2261,29 @@ start_over:
>>|| (max_niter != -1
>>   && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
>>  {
>> -  if (dump_enabled_p ())
>> -   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> -"not vectorized: iteration count smaller than "
>> -"vectorization factor.\n");
>> -  return false;
>> +  /* Allow low trip count for loop epilogue we want to mask.  */
>> +  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
>> + && LOOP_VINFO_ORIG_MASK_EPILOGUE (loop_vinfo))
>> +   ;
>> +  /* Allow low trip count for non-epi

Re: [PATCH][AArch64] Handle iterator definitions with conditionals in geniterator.sh

2016-06-16 Thread James Greenhalgh
On Thu, Jun 16, 2016 at 09:31:19AM +0100, Szabolcs Nagy wrote:
> Turn the following definition in iterators.md
> 
> (define_mode_iterator XXX [(YYY "condition") ZZZ])
> 
> into
> 
> #define BUILTIN_XXX(T, N, MAP) \
>   VAR2 (T, N, MAP, yyy, zzz)
> 
> previously geniterators.sh skipped definitions with
> conditions.

I think this is OK.

It means that if we use any of these iterators with a conditional we may expose
some names that may not map to valid patterns (and then ICE), but these are in
the implementation namespace, so we make no promises that they will
unconditionally work anyway. But, I don't see how you could add this
conditional information back in, certainly not in the preprocessor as these are
runtime decisions.

We might want to catch a pattern that has not been enabled in
aarch64_simd_expand_builtin and provide a suitable error message, rather
than ICEing. But that can come as a follow-up.

The patch is OK to apply.

Thanks,
James

> 
> gcc/ChangeLog:
> 
> 2016-06-16  Szabolcs Nagy  
> 
>   * config/aarch64/geniterators.sh: Handle parenthesised conditions.




[PATCH] Add 'Fortran' to display text of all PRED_FORTRAN_*

2016-06-16 Thread Martin Liška
Hello.

Following patch just enhances display names of all Fortran predictors.
Pre-approved by Honza.

Patch survives reg&bootstrap on x86_64-linux-gnu.
Installed as r237532.

Martin
>From 4bfd7a78395c5145712fd5a104e9a9dd43b9c541 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 16 Jun 2016 13:38:30 +0200
Subject: [PATCH 1/2] Add 'Fortran' to display text of all PRED_FORTRAN_*
 predictors.

gcc/ChangeLog:

2016-06-16  Martin Liska  

	* predict.def: Add 'Fortran' to display text of all
	PRED_FORTRAN_* predictors.
---
 gcc/predict.def | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/gcc/predict.def b/gcc/predict.def
index da4f9ab..3e3a43a 100644
--- a/gcc/predict.def
+++ b/gcc/predict.def
@@ -159,40 +159,43 @@ DEF_PREDICTOR (PRED_COLD_LABEL, "cold label", PROB_VERY_LIKELY,
 /* The following predictors are used in Fortran. */
 
 /* Branch leading to an integer overflow are extremely unlikely.  */
-DEF_PREDICTOR (PRED_FORTRAN_OVERFLOW, "overflow", PROB_ALWAYS,
+DEF_PREDICTOR (PRED_FORTRAN_OVERFLOW, "Fortran overflow", PROB_ALWAYS,
 	   PRED_FLAG_FIRST_MATCH)
 
 /* Branch leading to a failure status are unlikely.  This can occur for out
of memory.  This predictor only occurs when the user explicitly asked
for a return status.  By default, the code aborts,
which is handled via PRED_NORETURN.  */
-DEF_PREDICTOR (PRED_FORTRAN_FAIL_ALLOC, "fail alloc", PROB_VERY_LIKELY, 0)
+DEF_PREDICTOR (PRED_FORTRAN_FAIL_ALLOC, "Fortran fail alloc", PROB_VERY_LIKELY, 0)
 
 /* Predictor is used for an allocation of an already allocated memory or
deallocating an already deallocated allocatable.  */
-DEF_PREDICTOR (PRED_FORTRAN_REALLOC, "repeated allocation/deallocation", \
-	   PROB_LIKELY, 0)
+DEF_PREDICTOR (PRED_FORTRAN_REALLOC, \
+	   "Fortran repeated allocation/deallocation", PROB_LIKELY, 0)
 
 /* Branch leading to an I/O failure status are unlikely.  This predictor is
used for I/O failures such as for invalid unit numbers.  This predictor
only occurs when the user explicitly asked for a return status.  By default,
the code aborts, which is handled via PRED_NORETURN.  */
-DEF_PREDICTOR (PRED_FORTRAN_FAIL_IO, "fail IO", HITRATE (85), 0)
+DEF_PREDICTOR (PRED_FORTRAN_FAIL_IO, "Fortran fail IO", HITRATE (85), 0)
 
 /* Branch leading to a run-time warning message which is printed only once
are unlikely.  The print-warning branch itself can be likely or unlikely.  */
-DEF_PREDICTOR (PRED_FORTRAN_WARN_ONCE, "warn once", HITRATE (75), 0)
+DEF_PREDICTOR (PRED_FORTRAN_WARN_ONCE, "Fortran warn once", HITRATE (75), 0)
 
 /* Branch belonging to a zero-sized array.  */
-DEF_PREDICTOR (PRED_FORTRAN_SIZE_ZERO, "zero-sized array", HITRATE(99), 0)
+DEF_PREDICTOR (PRED_FORTRAN_SIZE_ZERO, "Fortran zero-sized array", \
+	   HITRATE (99), 0)
 
 /* Branch belonging to an invalid bound index, in a context where it is
standard conform and well defined but rather pointless and, hence, rather
unlikely to occur.  */
-DEF_PREDICTOR (PRED_FORTRAN_INVALID_BOUND, "zero-sized array", HITRATE(90), 0)
+DEF_PREDICTOR (PRED_FORTRAN_INVALID_BOUND, "Fortran invalid bound", \
+	   HITRATE (90), 0)
 
 /* Branch belonging to the handling of absent optional arguments.  This
predictor is used when an optional dummy argument, associated with an
absent argument, is passed on as actual argument to another procedure,
which in turn has an optional argument.  */
-DEF_PREDICTOR (PRED_FORTRAN_ABSENT_DUMMY, "absent dummy", HITRATE(60), 0)
+DEF_PREDICTOR (PRED_FORTRAN_ABSENT_DUMMY, "Fortran absent dummy", \
+	   HITRATE (60), 0)
-- 
2.8.3



Re: [PATCH, vec-tails 07/10] Support loop epilogue combining

2016-06-16 Thread Ilya Enkovich
2016-06-16 18:51 GMT+03:00 Jeff Law :
> On 06/16/2016 09:41 AM, Ilya Enkovich wrote:
>>
>> 2016-06-15 14:44 GMT+03:00 Richard Biener :
>>>
>>> On Thu, May 19, 2016 at 9:44 PM, Ilya Enkovich 
>>> wrote:

 Hi,

 This patch introduces support for loop epilogue combining.  This
 includes
 support in cost estimation and all required changes required to mask
 vectorized loop.
>>>
>>>
>>> I wonder why you compute a minimum number of iterations to make masking
>>> of the vectorized body profitable rather than a maximum number of
>>> iterations.
>>>
>>> I'd say masking the vectorized loop is profitable if niter/vf *
>>> masking-overhead < epilogue-cost.
>>> Masking the epilogue is profitable if vectorizing the epilogue with
>>> masking is profitable.
>>>
>>> Am I missing something?
>>
>>
>> We don't have two versions of vectorized loop.  The choice is between
>> vector
>> and scalar loop and in this case minimum number of iterations is what we
>> need.
>> Generating two vectorized loop versions would be something new to
>> vectorizer.
>
> What I think Richi is saying is that we have to multiply the cost of the
> masking overhead by the number of iterations of vectorized loop to determine
> the cost of masking -- the more loop iterations we have, the greater the
> cost of masking in the loop becomes and those costs may be higher than the
> normal epilogue sequence.

Right.  But we compute that dynamically.  And what do we do when we see the
overall masking cost become greater than the scalar epilogue cost?  The only
case when this check is useful is when we have a vectorized non-combined
version of the loop.  The original idea of combining (patches sent by Yuri
last year) was to use it only in cases where the masking cost is small enough
(and we expect the cheap masking computations to be 'hidden' under heavier
instructions by the scheduler, so we don't lose performance even for high
iteration counts).

Dynamically choosing between combined and non-combined versions is
another story.

Thanks,
Ilya

>
> Jeff
>
>
>


[PATCH] Introduce fortran loop preheader

2016-06-16 Thread Martin Liška
Hello.

Following patch introduces FORTRAN_LOOP_PREHEADER predictor for all
Fortran loops that are transformed to:

   [Evaluate loop bounds and step]
   dovar = from;
   if ((step > 0) ? (dovar <= to) : (dovar >= to))
{
  for (;;)
{
  body;
   cycle_label:
  cond = (dovar == to);
  dovar += step;
  if (cond) goto end_label;
}
  }
   end_label:

The first condition can be predicted as satisfied in most situations.
I've also measured the predictor on SPEC2006 and the hit rate was 99%,
which is the value I've picked for the predictor.  The patch is pre-approved by Honza.
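
As a side note, here is a minimal standalone C sketch of that guard with
hypothetical bounds; it is false essentially only for zero-trip loops such as
DO i = 1, 0, which is why a 99% hit rate is plausible:

#include <stdio.h>

/* Mirrors the pre-header condition shown above,
   (step > 0) ? (dovar <= to) : (dovar >= to), with dovar == from.  */
static int
loop_entered (int from, int to, int step)
{
  return (step > 0) ? (from <= to) : (from >= to);
}

int
main (void)
{
  printf ("DO i = 1, 100    -> %d\n", loop_entered (1, 100, 1));   /* taken */
  printf ("DO i = 1, 0      -> %d\n", loop_entered (1, 0, 1));     /* zero-trip */
  printf ("DO i = 10, 1, -1 -> %d\n", loop_entered (10, 1, -1));   /* taken */
  return 0;
}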

Patch survives reg&bootstrap on x86_64-linux-gnu.
Installed as r237533.

Martin
>From 6072b4e4a215b876da99ebb37132852bdfd033ee Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 15 Jun 2016 14:45:12 +0200
Subject: [PATCH 2/2] Introduce fortran loop preheader

gcc/ChangeLog:

2016-06-15  Martin Liska  

	* predict.def: Add fortran loop preheader predictor.
	* gimple-fold.c (gimple_fold_stmt_to_constant_1): Properly
	fold IFN_BUILTIN_EXPECT with a known constant argument.

gcc/fortran/ChangeLog:

2016-06-15  Martin Liska  

	* trans-stmt.c (gfc_trans_simple_do): Predict the edge.

gcc/testsuite/ChangeLog:

2016-06-16  Martin Liska  

	* gfortran.dg/predict-1.f90: New test.
---
 gcc/fortran/trans-stmt.c|  4 +++-
 gcc/gimple-fold.c   |  8 
 gcc/predict.def |  6 ++
 gcc/testsuite/gfortran.dg/predict-1.f90 | 12 
 4 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/predict-1.f90

diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 7d3cf8c..84bf749 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -1938,7 +1938,9 @@ gfc_trans_simple_do (gfc_code * code, stmtblock_t *pblock, tree dovar,
   else
 cond = fold_build2_loc (loc, GE_EXPR, boolean_type_node, dovar,
 			to);
-  tmp = fold_build3_loc (loc, COND_EXPR, void_type_node, cond, tmp,
+
+  tmp = fold_build3_loc (loc, COND_EXPR, void_type_node,
+			 gfc_likely (cond, PRED_FORTRAN_LOOP_PREHEADER), tmp,
 			 build_empty_stmt (loc));
   gfc_add_expr_to_block (pblock, tmp);
 
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 885367e..fa03e89 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -5250,6 +5250,14 @@ gimple_fold_stmt_to_constant_1 (gimple *stmt, tree (*valueize) (tree),
 	  case IFN_UBSAN_CHECK_MUL:
 		subcode = MULT_EXPR;
 		break;
+	  case IFN_BUILTIN_EXPECT:
+		  {
+		tree arg0 = gimple_call_arg (stmt, 0);
+		tree op0 = (*valueize) (arg0);
+		if (TREE_CODE (op0) == INTEGER_CST)
+		  return op0;
+		return NULL_TREE;
+		  }
 	  default:
 		return NULL_TREE;
 	  }
diff --git a/gcc/predict.def b/gcc/predict.def
index 3e3a43a..a0d0ba9 100644
--- a/gcc/predict.def
+++ b/gcc/predict.def
@@ -199,3 +199,9 @@ DEF_PREDICTOR (PRED_FORTRAN_INVALID_BOUND, "Fortran invalid bound", \
which in turn has an optional argument.  */
 DEF_PREDICTOR (PRED_FORTRAN_ABSENT_DUMMY, "Fortran absent dummy", \
 	   HITRATE (60), 0)
+
+/* Fortran DO statement generates a pre-header guard:
+   empty = (step > 0 ? to < from : to > from), which can be predicted
+   to be very likely.  */
+DEF_PREDICTOR (PRED_FORTRAN_LOOP_PREHEADER, "Fortran loop preheader", \
+	   HITRATE (99), 0)
diff --git a/gcc/testsuite/gfortran.dg/predict-1.f90 b/gcc/testsuite/gfortran.dg/predict-1.f90
new file mode 100644
index 000..81f0436
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/predict-1.f90
@@ -0,0 +1,12 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-profile_estimate" }
+
+subroutine test(block, array)
+integer :: i, block(9), array(2)
+
+do i = array(1), array(2)
+block(i) = i
+end do
+end subroutine test
+
+! { dg-final { scan-tree-dump-times "Fortran loop preheader heuristics of edge\[^:\]*: 99.0%" 1 "profile_estimate" } }
-- 
2.8.3



Re: [PATCH, i386][Updated] Add native support for VIA C7, Eden and Nano CPUs

2016-06-16 Thread Uros Bizjak
On Thu, Jun 16, 2016 at 11:12 AM, J. Mayer  wrote:
> The following patch adds support and native detection for C7, Eden
> "Samuel2", Eden "Nehemiah", Eden "Esther", Eden x2, Eden x4, Nano 1xxx,
> Nano 2xxx, Nano 3xxx, Nano x2 and Nano x4 VIA CPUs.
>
> This patch has been updated against current repository.
> It contains documentation and Changelog updates.
>
> Please CC me to any comment / review / change request.

The patch is OK, modulo redundant

: Pass c7, nehemiah or samuel-2 for signature_CENTAUR_ebx.

The fallback path should not be reached for new processors. This part is a
safety net, intended for "strange" targets - emulators, prehistoric
parts or simply for the cases where the precise details shouldn't
matter.

Attached is the patch I have committed to SVN repository.

2016-06-16  Jocelyn Mayer  

* config/i386/driver-i386.c (host_detect_local_cpu): Set
PROCESSOR_K8 for signature_CENTAUR_ebx with has_longmode.
: Pass nano-3000, nano, eden-x2 or k8 for
signature_CENTAUR_ebx.
* config/i386/i386.c (ix86_option_override_internal): Add
definitions for VIA c7, samuel-2, nehemiah, esther, eden-x2, eden-x4,
nano, nano-1000, nano-2000, nano-3000, nano-x2 and nano-x4.
* doc/invoke.texi: Document new VIA -march entries.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32},
committed to mainline.

BTW: Can you please prepare an entry for the release notes (something
like [1]) mentioning new options? (Release notes are not yet present
for 7.0 development branch, so there is quite some time available ;) )

[1] https://gcc.gnu.org/gcc-6/changes.html

Uros.
Index: config/i386/driver-i386.c
===
--- config/i386/driver-i386.c   (revision 237528)
+++ config/i386/driver-i386.c   (working copy)
@@ -651,7 +651,9 @@
  break;
 
case 6:
- if (model > 9 || has_longmode)
+ if (has_longmode)
+   processor = PROCESSOR_K8;
+ else if (model > 9)
/* Use the default detection procedure.  */
;
  else if (model == 9)
@@ -869,9 +871,30 @@
cpu = "athlon";
   break;
 case PROCESSOR_K8:
-  if (arch && has_sse3)
-   cpu = "k8-sse3";
+  if (arch)
+   {
+ if (vendor == signature_CENTAUR_ebx)
+   {
+ if (has_sse4_1)
+   /* Nano 3000 | Nano dual / quad core | Eden X4 */
+   cpu = "nano-3000";
+ else if (has_ssse3)
+   /* Nano 1000 | Nano 2000 */
+   cpu = "nano";
+ else if (has_sse3)
+   /* Eden X2 */
+   cpu = "eden-x2";
+ else
+   /* Default to k8 */
+   cpu = "k8";
+   }
+ else if (has_sse3)
+   cpu = "k8-sse3";
+ else
+   cpu = "k8";
+   }
   else
+   /* For -mtune, we default to -mtune=k8 */
cpu = "k8";
   break;
 case PROCESSOR_AMDFAM10:
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 237528)
+++ config/i386/i386.c  (working copy)
@@ -4783,8 +4783,15 @@
   {"winchip-c6", PROCESSOR_I486, CPU_NONE, PTA_MMX},
   {"winchip2", PROCESSOR_I486, CPU_NONE, PTA_MMX | PTA_3DNOW | PTA_PRFCHW},
   {"c3", PROCESSOR_I486, CPU_NONE, PTA_MMX | PTA_3DNOW | PTA_PRFCHW},
+  {"samuel-2", PROCESSOR_I486, CPU_NONE, PTA_MMX | PTA_3DNOW | PTA_PRFCHW},
   {"c3-2", PROCESSOR_PENTIUMPRO, CPU_PENTIUMPRO,
PTA_MMX | PTA_SSE | PTA_FXSR},
+  {"nehemiah", PROCESSOR_PENTIUMPRO, CPU_PENTIUMPRO,
+PTA_MMX | PTA_SSE | PTA_FXSR},
+  {"c7", PROCESSOR_PENTIUMPRO, CPU_PENTIUMPRO,
+PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_FXSR},
+  {"esther", PROCESSOR_PENTIUMPRO, CPU_PENTIUMPRO,
+PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_FXSR},
   {"i686", PROCESSOR_PENTIUMPRO, CPU_PENTIUMPRO, 0},
   {"pentiumpro", PROCESSOR_PENTIUMPRO, CPU_PENTIUMPRO, 0},
   {"pentium2", PROCESSOR_PENTIUMPRO, CPU_PENTIUMPRO, PTA_MMX | PTA_FXSR},
@@ -4843,6 +4850,29 @@
PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE | PTA_PRFCHW | PTA_FXSR},
   {"x86-64", PROCESSOR_K8, CPU_K8,
PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR},
+  {"eden-x2", PROCESSOR_K8, CPU_K8,
+PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_FXSR},
+  {"nano", PROCESSOR_K8, CPU_K8,
+PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+| PTA_SSSE3 | PTA_FXSR},
+  {"nano-1000", PROCESSOR_K8, CPU_K8,
+PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+| PTA_SSSE3 | PTA_FXSR},
+  {"nano-2000", PROCESSOR_K8, CPU_K8,
+PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+| PTA_SSSE3 | PTA_FXSR},
+  {"nano-3000", PROCESSOR_K8, CPU_K8,
+PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+| PTA_SSSE3 | PTA_SSE4_1 | PTA_FXSR},

Re: [PATCH, vec-tails 02/10] Extend _loop_vec_info structure with epilogue related fields

2016-06-16 Thread Ilya Enkovich
2016-06-16 8:11 GMT+03:00 Jeff Law :
> On 05/19/2016 01:38 PM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> This patch adds new fields to _loop_vec_info structure to support loop
>> epilogue vectorization.
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2016-05-19  Ilya Enkovich  
>>
>> * tree-vectorizer.h (struct _loop_vec_info): Add new fields
>> can_be_masked, required_masks, mask_epilogue, combine_epilogue,
>> need_masking, orig_loop_info.
>> (LOOP_VINFO_CAN_BE_MASKED): New.
>> (LOOP_VINFO_REQUIRED_MASKS): New.
>> (LOOP_VINFO_COMBINE_EPILOGUE): New.
>> (LOOP_VINFO_MASK_EPILOGUE): New.
>> (LOOP_VINFO_NEED_MASKING): New.
>> (LOOP_VINFO_ORIG_LOOP_INFO): New.
>> (LOOP_VINFO_EPILOGUE_P): New.
>> (LOOP_VINFO_ORIG_MASK_EPILOGUE): New.
>> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
>> * tree-vect-loop.c (new_loop_vec_info): Initialize new
>> _loop_vec_info fields.
>
> I don't see anything here that is inherently wrong/bad here; I think this
> would be fine once the whole set is approved.   I also think if you find
> that you need additional similar kinds of fields, that would be OK as well.
>
> The one question I would ask -- do we ever need to copy VINFO data from one
> loop to a duplicate, and if so, ISTM that the code to copy that data would
> be a part of this patch.

AFAIK we currently never copy a vectorized loop in the vectorizer.  I've never
seen VINFO being copied and I don't see a corresponding API in tree-vectorizer.h.
I'll double-check it.

Thanks,
Ilya

>
> jeff
>


Re: RFA (gimplify): PATCH to implement C++ order of evaluation paper

2016-06-16 Thread Jakub Jelinek
On Thu, Jun 16, 2016 at 11:28:48AM -0400, Jason Merrill wrote:
>  gimple_predicate
>  rhs_predicate_for (tree lhs)
>  {
> -  if (is_gimple_reg (lhs))
> +  if (will_be_gimple_reg (lhs))
>  return is_gimple_reg_rhs_or_call;
>else
>  return is_gimple_mem_rhs_or_call;
> @@ -4778,10 +4811,6 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, 
> gimple_seq *post_p,
>   that is what we must do here.  */
>maybe_with_size_expr (from_p);
>  
> -  ret = gimplify_expr (to_p, pre_p, post_p, is_gimple_lvalue, fb_lvalue);
> -  if (ret == GS_ERROR)
> -return ret;
> -
>/* As a special case, we have to temporarily allow for assignments
>   with a CALL_EXPR on the RHS.  Since in GIMPLE a function call is
>   a toplevel statement, when gimplifying the GENERIC expression
> @@ -4799,6 +4828,10 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, 
> gimple_seq *post_p,
>if (ret == GS_ERROR)
>  return ret;
>  
> +  ret = gimplify_expr (to_p, pre_p, post_p, is_gimple_lvalue, fb_lvalue);
> +  if (ret == GS_ERROR)
> +return ret;
> +
>/* In case of va_arg internal fn wrappped in a WITH_SIZE_EXPR, add the type
>   size as argument to the call.  */
>if (TREE_CODE (*from_p) == WITH_SIZE_EXPR)

I wonder if, instead of trying to guess early what we'll gimplify into, it
wouldn't be better to gimplify *from_p twice: the first time with a predicate
that would assume *to_p could be gimplified into is_gimple_reg, but
guarantee there are no side-effects (so that those aren't evaluated
after lhs side-effects), and a second time if needed (if *to_p didn't end up
being is_gimple_reg).  So something like a new predicate like:

static bool
is_whatever (tree t)
{
  /* For calls, as there are side-effects, assume lhs might not be
     is_gimple_reg.  */
  if (TREE_CODE (t) == CALL_EXPR && is_gimple_reg_type (TREE_TYPE (t)))
    return is_gimple_val (t);
  /* For other side effects, also make sure those are evaluated before
     side-effects in lhs.  */
  if (TREE_THIS_VOLATILE (t))
    return is_gimple_mem_rhs_or_call (t);
  /* Otherwise, optimistically assume lhs will be is_gimple_reg.  */
  return is_gimple_reg_rhs_or_call (t);
}

and then do in gimplify_modify_expr:

  ret = gimplify_expr (from_p, pre_p, post_p,
                       is_gimple_reg (*to_p)
                       ? is_gimple_reg_rhs_or_call : is_whatever,
                       fb_rvalue);
  if (ret == GS_ERROR)
    return ret;

  ret = gimplify_expr (to_p, pre_p, post_p, is_gimple_lvalue, fb_lvalue);
  if (ret == GS_ERROR)
    return ret;

  if (!is_gimple_reg (*to_p) && !is_gimple_mem_rhs_or_call (*from_p))
    {
      ret = gimplify_expr (from_p, pre_p, post_p, is_gimple_mem_rhs_or_call,
                           fb_rvalue);
      if (ret == GS_ERROR)
        return ret;
    }

Or if you want to guess if *to_p will be is_gimple_reg or not after
gimplification, do it just very conservatively and let the more difficult
to predict cases handled worst case by forcing something into a temporary
with the above code.

Jakub


Re: RFA (gimplify): PATCH to implement C++ order of evaluation paper

2016-06-16 Thread Jakub Jelinek
On Thu, Jun 16, 2016 at 06:15:34PM +0200, Jakub Jelinek wrote:
> and then do in gimplify_modify_expr:
>   ret = gimplify_expr (from_p, pre_p, post_p,
>  is_gimple_reg (*to_p)

^^^ of course even this is a prediction and wrong one for
DECL_HAS_VALUE_EXPR_Ps.  Conservative would be is_whatever always.

>  ? is_gimple_reg_rhs_or_call : is_whatever,
>fb_rvalue);
>   if (ret == GS_ERROR)
> return ret;
> 
>   ret = gimplify_expr (to_p, pre_p, post_p, is_gimple_lvalue, fb_lvalue);
>   if (ret == GS_ERROR)
> return ret;
> 
>   if (!is_gimple_reg (*to_p) && !is_gimple_mem_rhs_or_call (*from_p))
> {
>   ret = gimplify_expr (from_p, pre_p, post_p, is_gimple_mem_rhs_or_call,
>  fb_rvalue);
>   if (ret == GS_ERROR)
>   return ret;
> }
> 
> Or if you want to guess if *to_p will be is_gimple_reg or not after
> gimplification, do it just very conservatively and let the more difficult
> to predict cases handled worst case by forcing something into a temporary
> with the above code.

Jakub


Re: [PATCH] Add port for Phoenix-RTOS on ARM platform.

2016-06-16 Thread Jeff Law

On 06/16/2016 02:59 AM, Jakub Sejdak wrote:

Actually, if possible, I would skip the "arm" part, because we plan to
port Phoenix-RTOS for other platforms. It will be easier to do it
once.

Generally we prefer to see an ongoing commitment to the GCC project
along with regular high-quality contributions before appointing maintainers.


So at least in the immediate term let's get you write privileges so you
can commit approved changes and get on the path towards maintaining the
Phoenix-RTOS configurations.


https://www.gnu.org/software/gcc/svnwrite.html

jeff



Re: [Patch, avr] Fix PR 71151

2016-06-16 Thread Denis Chertykov
2016-06-16 10:27 GMT+03:00 Senthil Kumar Selvaraj
:
>
> Senthil Kumar Selvaraj writes:
>
>> Georg-Johann Lay writes:
>>
>>> Senthil Kumar Selvaraj schrieb:
 Hi,

   This patch fixes PR 71151 by eliminating the
   TARGET_ASM_FUNCTION_RODATA_SECTION hook and setting
   JUMP_TABLES_IN_TEXT_SECTION to 1.

   As described in the bugzilla entry, this hook assumed it will get
   called only for jumptable rodata for functions. This was true until
   6.1, when a commit in varasm.c started calling the hook for mergeable
   string/constant data as well.

   This resulted in string constants ending up in a section intended for
   jumptables (flash), and broke code using those constants, which
   expects them to be present in rodata (SRAM).

   Given that the original reason for placing jumptables in a section was
   fixed by Johann in PR 63323, this patch restores the original
   behavior. Reg testing on both gcc-6-branch and trunk showed no 
 regressions.
>>>
>>> Just for the record:
>>>
>>> The intention for jump-tables in function-rodata-section was to get
>>> fine-grained section for the tables so that --gc-sections and
>>> -ffunction-sections not only gc unused functions but also unused
>>> jump-tables.  As these tables had to reside in the lowest 64KiB of flash
>>> (.progmem section) neither .rodata nor .text was a correct placement,
>>> hence the hacking in TARGET_ASM_FUNCTION_RODATA_SECTION.
>>>
>>> Before using TARGET_ASM_FUNCTION_RODATA_SECTION, all jump tables were
>>> put into .progmem.gcc_sw_table by ASM_OUTPUT_BEFORE_CASE_LABEL switching
>>> to that section.
>>>
>>> We actually never had jump-tables in .text before...
>>
>> JUMP_TABLES_IN_TEXT_SECTION was 1 before r37465 - that was when the
>> progmem.gcc_sw_table section was introduced. But yes, I understand that
>> the target hook for FUNCTION_RODATA_SECTION was done to get them gc'ed
>> along with the code.
>>
>>>
>>> The purpose of PR63323 was to have more generic jump-table
>>> implementation that also works if the table does NOT reside in the lower
>>> 64KiB.  This happens when moving whole whole TEXT section around like
>>> for a bootloader.
>>
>> Understood.
>>>
   As pointed out by Johann, this may end up increasing code
   size if there are lots of branches that cross the jump tables. I
   intend to propose a separate patch that gives additional information
   to the target hook (SECCAT_RODATA_{STRING,JUMPTABLE}) so it can know
   what type of function rodata is coming on. Johann also suggested
   handling jump table generation ourselves - I'll experiment with that
   some more.

   If ok, could someone commit please? Could you also backport to
   gcc-6-branch?

 Regards
 Senthil

 gcc/ChangeLog

 2016-06-03  Senthil Kumar Selvaraj  

 * config/avr/avr.c (avr_asm_function_rodata_section): Remove.
 * config/avr/avr.c (TARGET_ASM_FUNCTION_RODATA_SECTION): Remove.

 gcc/testsuite/ChangeLog

 2016-06-03  Senthil Kumar Selvaraj  

 * gcc/testsuite/gcc.target/avr/pr71151-1.c: New.
 * gcc/testsuite/gcc.target/avr/pr71151-2.c: New.

 diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
 index ba5cd91..3cb8cb7 100644
 --- gcc/config/avr/avr.c
 +++ gcc/config/avr/avr.c
 @@ -9488,65 +9488,6 @@ avr_asm_init_sections (void)
  }


 -/* Implement `TARGET_ASM_FUNCTION_RODATA_SECTION'.  */
 -
 -static section*
 -avr_asm_function_rodata_section (tree decl)
 -{
 -  /* If a function is unused and optimized out by -ffunction-sections
 - and --gc-sections, ensure that the same will happen for its jump
 - tables by putting them into individual sections.  */
 -
 -  unsigned int flags;
 -  section * frodata;
 -
 -  /* Get the frodata section from the default function in varasm.c
 - but treat function-associated data-like jump tables as code
 - rather than as user defined data.  AVR has no constant pools.  */
 -  {
 -int fdata = flag_data_sections;
 -
 -flag_data_sections = flag_function_sections;
 -frodata = default_function_rodata_section (decl);
 -flag_data_sections = fdata;
 -flags = frodata->common.flags;
 -  }
 -
 -  if (frodata != readonly_data_section
 -  && flags & SECTION_NAMED)
 -{
 -  /* Adjust section flags and replace section name prefix.  */
 -
 -  unsigned int i;
 -
 -  static const char* const prefix[] =
 -{
 -  ".rodata",  ".progmem.gcc_sw_table",
 -  ".gnu.linkonce.r.", ".gnu.linkonce.t."
 -};
 -
 -  for (i = 0; i < sizeof (prefix) / sizeof (*prefix); i += 2)
 -{
 -  const char * old_prefix = prefix[i];
 -  const char * new_prefix = pref

Re: [PATCH, vec-tails 07/10] Support loop epilogue combining

2016-06-16 Thread Jeff Law

On 05/19/2016 01:44 PM, Ilya Enkovich wrote:

Hi,

This patch introduces support for loop epilogue combining.  This includes
support in cost estimation and all required changes required to mask
vectorized loop.

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* dbgcnt.def (vect_tail_combine): New.
* params.def (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD): New.
* tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var.
* tree-vect-loop-manip.c (slpeel_tree_peel_loop_to_edge): Support
epilogue combined with loop body.
(vect_do_peeling_for_loop_bound): Likewise.
* tree-vect-loop.c Include alias.h and dbgcnt.h.
(vect_estimate_min_profitable_iters): Add 
ret_min_profitable_combine_niters
arg, compute number of iterations for which loop epilogue combining is
profitable.
(vect_generate_tmps_on_preheader): Support combined apilogue.
(vect_gen_ivs_for_masking): New.
(vect_get_mask_index_for_elems): New.
(vect_get_mask_index_for_type): New.
(vect_gen_loop_masks): New.
(vect_mask_reduction_stmt): New.
(vect_mask_mask_load_store_stmt): New.
(vect_mask_load_store_stmt): New.
(vect_combine_loop_epilogue): New.
(vect_transform_loop): Support combined apilogue.


diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index fab5879..b3c0668 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1464,11 +1469,20 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, 
struct loop *scalar_loop,
   bb_between_loops = new_exit_bb;
   bb_after_second_loop = split_edge (single_exit (second_loop));

-  pre_condition =
-   fold_build2 (EQ_EXPR, boolean_type_node, *first_niters, niters);
-  skip_e = slpeel_add_loop_guard (bb_between_loops, pre_condition, NULL,
-  bb_after_second_loop, bb_before_first_loop,
- inverse_probability 
(second_guard_probability));
+  if (skip_second_after_first)
+/* We can just redirect edge from bb_between_loops to
+   bb_after_second_loop but we have many code assuming
+   we have a guard after the first loop.  So just make
+   always taken condtion.  */
+pre_condition = fold_build2 (EQ_EXPR, boolean_type_node, integer_zero_node,
+integer_zero_node);


This isn't ideal, but I don't think it's that big of an issue.


@@ -1758,8 +1772,10 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
   basic_block preheader;
   int loop_num;
   int max_iter;
+  int bound2;
   tree cond_expr = NULL_TREE;
   gimple_seq cond_expr_stmt_list = NULL;
+  bool combine = LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo);

   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,
@@ -1769,12 +1785,13 @@ vect_do_peeling_for_loop_bound (loop_vec_info 
loop_vinfo,

   loop_num  = loop->num;

+  bound2 = combine ? th : LOOP_VINFO_VECT_FACTOR (loop_vinfo);

Can you document what the TH parameter is to the various routines that 
use it in tree-vect-loop-manip.c?  I realize you didn't add it, but it 
would help anyone looking at this code in the future to know it's the 
threshold of iterations for vectorization without having to find it in 
other function comment headers ;-)


That's pre-approved to go in immediately :-)


@@ -1803,7 +1820,11 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
   max_iter = (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
  ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) * 2
  : LOOP_VINFO_VECT_FACTOR (loop_vinfo)) - 2;
-  if (check_profitability)
+  /* When epilogue is combined only profitability
+ treshold matters.  */

s/treshold/threshold/




 static void
 vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
int *ret_min_profitable_niters,
-   int *ret_min_profitable_estimate)
+   int *ret_min_profitable_estimate,
+   int *ret_min_profitable_combine_niters)

I'm torn a bit here.  There are all kinds of things missing/incomplete in 
the function comments throughout the vectorizer.  And in some cases, 
like this one, the parameters are largely self-documenting.  But we've 
also got coding standards that we'd like to adhere to.


I don't think it's fair to require you to fix all these issues in the 
vectorizer (though if you wanted to, I'd fully support those as 
independent cleanups).


Perhaps just document LOOP_VINFO with a generic comment about the ret_* 
parameters for this function rather than a comment for each ret_* 
parameter.  Pre-approved for the trunk independent of the vec-tails work.




@@ -3728,6 +3784,77 @@ vect_estimate_min_profitable_iters (loop_vec_info 
loop_vinfo,
 min_profitable_estimate);

+
+  unsigned combine_treshold
+   = PARAM_VALUE (PARAM_VECT_COST_

Re: [PATCH, vec-tails 09/10] Print more info about vectorized loop

2016-06-16 Thread Jeff Law

On 05/19/2016 01:49 PM, Ilya Enkovich wrote:

Hi,

This patch extends dumps for vectorized loops to provide more info
about them and also specify used vector size.  This is to be used
for tests.

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* tree-vect-loop.c (vect_transform_loop): Print more info
about vectorized loop and specify used vector size.

OK when the rest of the dependent work is approved.

jeff



Re: [PATCH, vec-tails 06/10] Mark the first vector store generated for a scalar store

2016-06-16 Thread Jeff Law

On 05/19/2016 01:43 PM, Ilya Enkovich wrote:

Hi,

This patch adds a STMT_VINFO_FIRST_COPY_P field to the statement vec info.
This is used to find the first vector store generated for a
scalar one.  For other statements I use the original scalar statement
to find the first and following vector statements.  For stores the
original scalar statement is removed and this new field is used
to mark a chain start.  Also the original data reference and vector
type are preserved in the first vector statement for masking
purposes.

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* tree-vect-stmts.c (vectorizable_mask_load_store): Mark
the first copy of generated vector stores.
(vectorizable_store): Mark the first copy of generated
vector stores and provide it with vectype and the original
data reference.
* tree-vectorizer.h (struct _stmt_vec_info): Add first_copy_p
field.
(STMT_VINFO_FIRST_COPY_P): New.

OK when the rest of the patch kit is approved.

jeff



[Patch ARM arm_neon.h] s/__FAST_MATH/__FAST_MATH__/g

2016-06-16 Thread James Greenhalgh

Hi,

As subject, config/arm/arm_neon.h currently uses __FAST_MATH, but:

  $ gcc -E -dM - -ffast-math < /dev/null | grep FAST_MATH
  #define __FAST_MATH__ 1

It should be spelled as __FAST_MATH__.

I've made that change, and confirmed that it causes the preprocessor to
do what was intended for these intrinsics under -ffast-math.
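
For reference, a minimal standalone sketch (not the intrinsics themselves)
showing why the misspelled guard never selects the intended path; compile it
with and without -ffast-math and compare the output:

#include <stdio.h>

int
main (void)
{
#ifdef __FAST_MATH          /* wrong spelling: GCC never defines this */
  puts ("plain-operator path (what was intended under -ffast-math)");
#else
  puts ("builtin fallback path (what was always taken before the fix)");
#endif

#ifdef __FAST_MATH__        /* correct spelling: defined by -ffast-math */
  puts ("__FAST_MATH__ is defined");
#endif
  return 0;
}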

Currently bootstrapped on arm-none-linux-gnueabihf with no issues.

This could also be backported to release branches. I think Ramana's patch
went in for GCC 5.0, so backports to gcc_5_branch and gcc_6_branch would
be feasible.

Thanks,
James

---
2016-06-16  James Greenhalgh  

* config/arm/arm_neon.h (vadd_f32): replace __FAST_MATH with
__FAST_MATH__.
(vaddq_f32): Likewise.
(vmul_f32): Likewise.
(vmulq_f32): Likewise.
(vsub_f32): Likewise.
(vsubq_f32): Likewise.

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 7997cb4..32ee06c 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -530,7 +530,7 @@ vadd_s32 (int32x2_t __a, int32x2_t __b)
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vadd_f32 (float32x2_t __a, float32x2_t __b)
 {
-#ifdef __FAST_MATH
+#ifdef __FAST_MATH__
   return __a + __b;
 #else
   return (float32x2_t) __builtin_neon_vaddv2sf (__a, __b);
@@ -594,7 +594,7 @@ vaddq_s64 (int64x2_t __a, int64x2_t __b)
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vaddq_f32 (float32x4_t __a, float32x4_t __b)
 {
-#ifdef __FAST_MATH
+#ifdef __FAST_MATH__
   return __a + __b;
 #else
   return (float32x4_t) __builtin_neon_vaddv4sf (__a, __b);
@@ -1030,7 +1030,7 @@ vmul_s32 (int32x2_t __a, int32x2_t __b)
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vmul_f32 (float32x2_t __a, float32x2_t __b)
 {
-#ifdef __FAST_MATH
+#ifdef __FAST_MATH__
   return __a * __b;
 #else
   return (float32x2_t) __builtin_neon_vmulfv2sf (__a, __b);
@@ -1077,7 +1077,7 @@ vmulq_s32 (int32x4_t __a, int32x4_t __b)
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vmulq_f32 (float32x4_t __a, float32x4_t __b)
 {
-#ifdef __FAST_MATH
+#ifdef __FAST_MATH__
   return __a * __b;
 #else
   return (float32x4_t) __builtin_neon_vmulfv4sf (__a, __b);
@@ -1678,7 +1678,7 @@ vsub_s32 (int32x2_t __a, int32x2_t __b)
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vsub_f32 (float32x2_t __a, float32x2_t __b)
 {
-#ifdef __FAST_MATH
+#ifdef __FAST_MATH__
   return __a - __b;
 #else
   return (float32x2_t) __builtin_neon_vsubv2sf (__a, __b);
@@ -1742,7 +1742,7 @@ vsubq_s64 (int64x2_t __a, int64x2_t __b)
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vsubq_f32 (float32x4_t __a, float32x4_t __b)
 {
-#ifdef __FAST_MATH
+#ifdef __FAST_MATH__
   return __a - __b;
 #else
   return (float32x4_t) __builtin_neon_vsubv4sf (__a, __b);


[PATCH,rs6000] Improve tests for Power9 vector absolute difference unsigned instructions

2016-06-16 Thread Kelvin Nilsen
This patch improves upon a recently committed patch to add support for
Power9 vector absolute difference unsigned instructions in two ways:

1. The dg-require-effective-target directive is changed in all tests to
allow the tests to run even when the testsuite is not running on a Power9
platform, as long as the associated assembler (as) understands Power9
instructions.  A dg-skip-if directive is added to all tests to disable
these tests on aix hosts, because that platform is known to have
incompatibilities with these tests.

2. The body of the vadsdub-2.c test is modified to test different
behavior than is tested by vadsdub-1.c.  In the previous commit, these
two tests were identical.

gcc/testsuite/ChangeLog:

2016-06-16  Kelvin Nilsen  

* gcc.target/powerpc/vadsdu-0.c: Replace
dg-require-effective-target directive to allow test to run on more
platforms, and add dg-skip-if directive to disable test on aix
platforms because of known incompatibilities.
* gcc.target/powerpc/vadsdu-1.c: Likewise.
* gcc.target/powerpc/vadsdu-2.c: Likewise.
* gcc.target/powerpc/vadsdu-3.c: Likewise.
* gcc.target/powerpc/vadsdu-4.c: Likewise.
* gcc.target/powerpc/vadsdu-5.c: Likewise.
* gcc.target/powerpc/vadsdub-1.c: Likewise.
* gcc.target/powerpc/vadsdub-2.c: Replace
dg-require-effective-target directive to allow test to run on more
platforms, and add dg-skip-if directive to disable test on aix
platforms because of known incompatibilities.
(doAbsoluteDifferenceUnsigned): Replace __builtin_vec_vadub call
with vec_absdb call to differentiate this test from vadsdub-1.c.
* gcc.target/powerpc/vadsduh-1.c: Replace
dg-require-effective-target directive to allow test to run on more
platforms, and add dg-skip-if directive to disable test on aix
platforms because of known incompatibilities.
* gcc.target/powerpc/vadsduh-2.c: Likewise.
* gcc.target/powerpc/vadsduw-1.c: Likewise.
* gcc.target/powerpc/vadsduw-2.c: Likewise.

Index: gcc/testsuite/gcc.target/powerpc/vadsdu-0.c
===
--- gcc/testsuite/gcc.target/powerpc/vadsdu-0.c (revision 237462)
+++ gcc/testsuite/gcc.target/powerpc/vadsdu-0.c (working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power9" } } */
-/* { dg-require-effective-target p9vector_hw } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "" { powerpc*-*-aix* } { "*" } { "" } } */
 /* { dg-options "-mcpu=power9" } */
 
 /* This test should succeed on both 32- and 64-bit configurations.  */
Index: gcc/testsuite/gcc.target/powerpc/vadsdu-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vadsdu-1.c (revision 237462)
+++ gcc/testsuite/gcc.target/powerpc/vadsdu-1.c (working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power9" } } */
-/* { dg-require-effective-target p9vector_hw } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "" { powerpc*-*-aix* } { "*" } { "" } } */
 /* { dg-options "-mcpu=power9" } */
 
 /* This test should succeed on both 32- and 64-bit configurations.  */
Index: gcc/testsuite/gcc.target/powerpc/vadsdu-2.c
===
--- gcc/testsuite/gcc.target/powerpc/vadsdu-2.c (revision 237462)
+++ gcc/testsuite/gcc.target/powerpc/vadsdu-2.c (working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power9" } } */
-/* { dg-require-effective-target p9vector_hw } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "" { powerpc*-*-aix* } { "*" } { "" } } */
 /* { dg-options "-mcpu=power9" } */
 
 /* This test should succeed on both 32- and 64-bit configurations.  */
Index: gcc/testsuite/gcc.target/powerpc/vadsdu-3.c
===
--- gcc/testsuite/gcc.target/powerpc/vadsdu-3.c (revision 237462)
+++ gcc/testsuite/gcc.target/powerpc/vadsdu-3.c (working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power9" } } */
-/* { dg-require-effective-target p9vector_hw } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "" { powerpc*-*-aix* } { "*" } { "" } } */
 /* { dg-options "-mcpu=power9" } */
 
 /* This test should succeed on both 32- and 64-bit configurations.  */
Index: gcc/testsuite/gcc.target/powerpc/vadsdu-4.c
===
--- gc

Re: [PATCH,rs6000] Improve tests for Power9 vector absolute difference unsigned instructions

2016-06-16 Thread Kelvin Nilsen


On 06/16/2016 11:47 AM, Kelvin Nilsen wrote:
> This patch improves upon a recently committed patch to add support for
> Power9 vector absolute difference unsigned instructions in two ways:
> 
> 1. The dg-require-effective-target directive is changed in all tests to
> allow the test to run even though the tests are not run on a Power9
> platform, as long as the associated as tool understands Power9
> instructions.  A dg-skip-if directive is added to all tests to disable
> these tests on aix hosts, because that platform is known to have
> incompatibilities for these tests.
> 
> 2. The body of the vadsdub-2.c test is modified to test different
> behavior than is tested by vadsdub-1.c.  In the previous commit, these
> two tests were identical.
> 

Sorry for quick-fingering my post.  I meant to add before posting that I
have bootstrapped and regression tested on powerpc64le-unknown-linux-gnu
with no regressions.  Is this ok for trunk?  Is it ok for gcc-6 after a
few days of burn-in on the trunk?

Thanks.



[DOC PATCH, i386]: Document -m80387 and -mhard-float

2016-06-16 Thread Uros Bizjak
Hello!

These two options were missing from the documentation.

2016-06-16  Uros Bizjak  

* doc/invoke.texi (x86 Options): Document -m80387 and -mhard-float.

Bootstrapped on x86_64-linux-gnu, committed to mainline SVN.

Uros.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 237534)
+++ doc/invoke.texi (working copy)
@@ -1149,7 +1149,7 @@
 -mtune-ctrl=@var{feature-list} -mdump-tune-features -mno-default @gol
 -mfpmath=@var{unit} @gol
 -masm=@var{dialect}  -mno-fancy-math-387 @gol
--mno-fp-ret-in-387  -msoft-float @gol
+-mnofp-ret-in-387 -m80387 -mhard-float -msoft-float @gol
 -mno-wide-multiply  -mrtd  -malign-double @gol
 -mpreferred-stack-boundary=@var{num} @gol
 -mincoming-stack-boundary=@var{num} @gol
@@ -23544,7 +23544,15 @@
 comparisons.  These correctly handle the case where the result of a
 comparison is unordered.
 
+@item -m80387
+@item -mhard-float
+@opindex 80387
+@opindex mhard-float
+Generate output containing 80387 instructions for floating point.
+
+@item -mno-80387
 @item -msoft-float
+@opindex no-80387
 @opindex msoft-float
 Generate output containing library calls for floating point.
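
For reference, a tiny sketch (not part of the patch) of what the two modes
mean for generated code; the function name is made up:

/* With -m80387/-mhard-float the multiply is done with x87 (or SSE)
   instructions; with -msoft-float it becomes a library call such as
   __muldf3.  */
double
scale (double x)
{
  return x * 3.0;
}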
 


Re: [Patch] Implement is_[nothrow_]swappable (p0185r1) - 2nd try

2016-06-16 Thread Daniel Krügler
2016-06-16 17:07 GMT+02:00 Jonathan Wakely :
> On 16/06/16 14:08 +0100, Jonathan Wakely wrote:
>>
>>
>> /home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/20_util/is_nothrow_swappable/./value.h:285:3:
>> error: static assertion failed
>>
>> /home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/20_util/is_nothrow_swappable/./value.h:287:3:
>> error: static assertion failed
>>
>> /home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/20_util/is_nothrow_swappable/./value.h:289:3:
>> error: static assertion failed
>>
>> Those assertions fail for both 20_util/is_nothrow_swappable/value.cc
>> and 20_util/is_nothrow_swappable/value_ext.cc
>
>
> Those assertions should be changed because with your patch queue,
> priority_queue and stack are nothrow swappable, because they don't
> depend on the value_type's swappable trait, but the sequence's.

Yes, I agree.
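
For illustration, a minimal sketch of the behaviour behind those
assertions (assuming the C++17 trait names; not part of the patch):

#include <stack>
#include <type_traits>

struct ThrowingSwap
{
  // User-provided swap that is not noexcept.
  friend void swap(ThrowingSwap&, ThrowingSwap&) { }
};

// The element type itself is not nothrow swappable...
static_assert(!std::is_nothrow_swappable<ThrowingSwap>::value, "");
// ...but stack only swaps its underlying std::deque, whose swap is
// noexcept, so the adaptor is nothrow swappable anyway.
static_assert(std::is_nothrow_swappable<std::stack<ThrowingSwap>>::value, "");

int main() { }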

> The other problems are that some tests need their dg-error lines
> adjusting, and  says #ifdef __cplusplus >= 201402L which
> should be #if.

OK.

Thanks again for your patience in testing the whole mess!

- Daniel


[committed] Fix OpenMP C++ mapping of struct elements with reference to struct as base

2016-06-16 Thread Jakub Jelinek
Hi!

As the testcase shows, we weren't handling properly the case where the
decl after which .field appears in map clauses is reference to struct/class.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, and
tested with x86_64-intelmicemul-linux offloading on x86_64-linux, committed
to trunk.
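
For reference, a minimal sketch of the problematic pattern (not the
committed libgomp.c++/target-20.C testcase): the base of the mapped
component is a reference to the struct.

// Compile with -fopenmp; sketch only.
struct S { int x, y; };

int
bump (S &s)
{
  #pragma omp target map(tofrom: s.x)
  s.x++;
  return s.x;
}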

2016-06-16  Jakub Jelinek  

* gimplify.c (gimplify_scan_omp_clauses): Handle COMPONENT_REFs
with base of reference to struct.

* parser.c (cp_parser_omp_var_list_no_open): Call
convert_from_reference before cp_parser_postfix_dot_deref_expression.
* semantics.c (finish_omp_clauses): Don't ICE when
processing_template_decl when checking for bitfields and unions.
Look through REFERENCE_REF_P as base of COMPONENT_REF.

* testsuite/libgomp.c++/target-20.C: New test.

--- gcc/gimplify.c.jj   2016-06-16 15:33:37.137940504 +0200
+++ gcc/gimplify.c  2016-06-16 16:50:21.489329556 +0200
@@ -6983,6 +6983,11 @@ gimplify_scan_omp_clauses (tree *list_p,
{
  while (TREE_CODE (decl) == COMPONENT_REF)
decl = TREE_OPERAND (decl, 0);
+ if (TREE_CODE (decl) == INDIRECT_REF
+ && DECL_P (TREE_OPERAND (decl, 0))
+ && (TREE_CODE (TREE_TYPE (TREE_OPERAND (decl, 0)))
+ == REFERENCE_TYPE))
+   decl = TREE_OPERAND (decl, 0);
}
  if (gimplify_expr (pd, pre_p, NULL, is_gimple_lvalue, fb_lvalue)
  == GS_ERROR)
@@ -6998,9 +7003,11 @@ gimplify_scan_omp_clauses (tree *list_p,
  break;
}
 
- if (TYPE_SIZE_UNIT (TREE_TYPE (decl)) == NULL
- || (TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (decl)))
- != INTEGER_CST))
+ tree stype = TREE_TYPE (decl);
+ if (TREE_CODE (stype) == REFERENCE_TYPE)
+   stype = TREE_TYPE (stype);
+ if (TYPE_SIZE_UNIT (stype) == NULL
+ || TREE_CODE (TYPE_SIZE_UNIT (stype)) != INTEGER_CST)
{
  error_at (OMP_CLAUSE_LOCATION (c),
"mapping field %qE of variable length "
@@ -7040,6 +7047,14 @@ gimplify_scan_omp_clauses (tree *list_p,
  base = get_inner_reference (base, &bitsize, &bitpos, &offset,
  &mode, &unsignedp, &reversep,
  &volatilep, false);
+ tree orig_base = base;
+ if ((TREE_CODE (base) == INDIRECT_REF
+  || (TREE_CODE (base) == MEM_REF
+  && integer_zerop (TREE_OPERAND (base, 1
+ && DECL_P (TREE_OPERAND (base, 0))
+ && (TREE_CODE (TREE_TYPE (TREE_OPERAND (base, 0)))
+ == REFERENCE_TYPE))
+   base = TREE_OPERAND (base, 0);
  gcc_assert (base == decl
  && (offset == NULL_TREE
  || TREE_CODE (offset) == INTEGER_CST));
@@ -7053,7 +7068,10 @@ gimplify_scan_omp_clauses (tree *list_p,
  tree l = build_omp_clause (OMP_CLAUSE_LOCATION (c),
 OMP_CLAUSE_MAP);
  OMP_CLAUSE_SET_MAP_KIND (l, GOMP_MAP_STRUCT);
- OMP_CLAUSE_DECL (l) = decl;
+ if (orig_base != base)
+   OMP_CLAUSE_DECL (l) = unshare_expr (orig_base);
+ else
+   OMP_CLAUSE_DECL (l) = decl;
  OMP_CLAUSE_SIZE (l) = size_int (1);
  if (struct_map_to_clause == NULL)
struct_map_to_clause = new hash_map;
@@ -7095,6 +7113,18 @@ gimplify_scan_omp_clauses (tree *list_p,
  *list_p = l;
  list_p = &OMP_CLAUSE_CHAIN (l);
}
+ if (orig_base != base && code == OMP_TARGET)
+   {
+ tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c),
+ OMP_CLAUSE_MAP);
+ enum gomp_map_kind mkind
+   = GOMP_MAP_FIRSTPRIVATE_REFERENCE;
+ OMP_CLAUSE_SET_MAP_KIND (c2, mkind);
+ OMP_CLAUSE_DECL (c2) = decl;
+ OMP_CLAUSE_SIZE (c2) = size_zero_node;
+ OMP_CLAUSE_CHAIN (c2) = OMP_CLAUSE_CHAIN (l);
+ OMP_CLAUSE_CHAIN (l) = c2;
+   }
  flags = GOVD_MAP | GOVD_EXPLICIT;
  if (GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)) || ptr)
flags |= GOVD_SEEN;
@@ -7113,8 +7143,12 @@ gimplify_scan_omp_cl

[PATCH] Fix finding of a first match predictor

2016-06-16 Thread Martin Liška

Hello.

Currently, when we look for a first match predictor, we first pick the
predictor that comes earliest in the list of predictors and only then check
whether it has PRED_FLAG_FIRST_MATCH set.  The proper implementation is to
consider only predictors that have the flag set.

The patch has been bootstrapped and regression-tested on x86_64-linux-gnu
and is pre-approved by Honza.

Installed as r237539.

Martin
From c20cc4a5f3b7ae756ce286a1ef0045f50bc96d46 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 16 Jun 2016 17:44:48 +0200
Subject: [PATCH] Fix finding of a first match predictor

gcc/ChangeLog:

2016-06-16  Martin Liska  

	* predict.c (combine_predictions_for_insn): When we find a first
	match predictor, we should consider just predictors with
	PRED_FLAG_FIRST_MATCH.  Print either first match (if any) or
	DS theory predictor.
	(combine_predictions_for_bb): Likewise.
---
 gcc/predict.c | 30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/gcc/predict.c b/gcc/predict.c
index bafcc96..642bd62 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -835,7 +835,8 @@ combine_predictions_for_insn (rtx_insn *insn, basic_block bb)
 	int probability = INTVAL (XEXP (XEXP (note, 0), 1));
 
 	found = true;
-	if (best_predictor > predictor)
+	if (best_predictor > predictor
+	&& predictor_info[predictor].flags & PRED_FLAG_FIRST_MATCH)
 	  best_probability = probability, best_predictor = predictor;
 
 	d = (combined_probability * probability
@@ -855,7 +856,7 @@ combine_predictions_for_insn (rtx_insn *insn, basic_block bb)
  use no_prediction heuristic, in case we did match, use either
  first match or Dempster-Shaffer theory depending on the flags.  */
 
-  if (predictor_info [best_predictor].flags & PRED_FLAG_FIRST_MATCH)
+  if (best_predictor != END_PREDICTORS)
 first_match = true;
 
   if (!found)
@@ -863,10 +864,12 @@ combine_predictions_for_insn (rtx_insn *insn, basic_block bb)
 		 combined_probability, bb);
   else
 {
-  dump_prediction (dump_file, PRED_DS_THEORY, combined_probability,
-		   bb, !first_match ? REASON_NONE : REASON_IGNORED);
-  dump_prediction (dump_file, PRED_FIRST_MATCH, best_probability,
-		   bb, first_match ? REASON_NONE : REASON_IGNORED);
+  if (!first_match)
+	dump_prediction (dump_file, PRED_DS_THEORY, combined_probability,
+			 bb, !first_match ? REASON_NONE : REASON_IGNORED);
+  else
+	dump_prediction (dump_file, PRED_FIRST_MATCH, best_probability,
+			 bb, first_match ? REASON_NONE : REASON_IGNORED);
 }
 
   if (first_match)
@@ -1096,7 +1099,8 @@ combine_predictions_for_bb (basic_block bb, bool dry_run)
 	  found = true;
 	  /* First match heuristics would be widly confused if we predicted
 	 both directions.  */
-	  if (best_predictor > predictor)
+	  if (best_predictor > predictor
+	&& predictor_info[predictor].flags & PRED_FLAG_FIRST_MATCH)
 	{
   struct edge_prediction *pred2;
 	  int prob = probability;
@@ -1142,17 +1146,19 @@ combine_predictions_for_bb (basic_block bb, bool dry_run)
  use no_prediction heuristic, in case we did match, use either
  first match or Dempster-Shaffer theory depending on the flags.  */
 
-  if (predictor_info [best_predictor].flags & PRED_FLAG_FIRST_MATCH)
+  if (best_predictor != END_PREDICTORS)
 first_match = true;
 
   if (!found)
 dump_prediction (dump_file, PRED_NO_PREDICTION, combined_probability, bb);
   else
 {
-  dump_prediction (dump_file, PRED_DS_THEORY, combined_probability, bb,
-		   !first_match ? REASON_NONE : REASON_IGNORED);
-  dump_prediction (dump_file, PRED_FIRST_MATCH, best_probability, bb,
-		   first_match ? REASON_NONE : REASON_IGNORED);
+  if (!first_match)
+	dump_prediction (dump_file, PRED_DS_THEORY, combined_probability, bb,
+			 !first_match ? REASON_NONE : REASON_IGNORED);
+  else
+	dump_prediction (dump_file, PRED_FIRST_MATCH, best_probability, bb,
+			 first_match ? REASON_NONE : REASON_IGNORED);
 }
 
   if (first_match)
-- 
2.8.3



Re: Container debug light mode

2016-06-16 Thread François Dumont

And here is the patch to only add light debug checks to vector and deque.
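
As a quick illustration, the kind of misuse the new checks are meant to
catch (a sketch, assuming a build with the checks enabled):

#include <vector>

int main()
{
  std::vector<int> v;   // empty
  return v.back();      // precondition violation: with the new
                        // __glibcxx_requires_nonempty check this aborts
                        // instead of reading past the end
}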

* include/debug/debug.h
(__glibcxx_requires_non_empty_range, __glibcxx_requires_nonempty)
(__glibcxx_requires_subscript): Move...
* include/debug/assertions.h: ...here and add __builtin_expect.
(_GLIBCXX_DEBUG_ONLY): Remove ; value.
* include/bits/stl_deque.h
(std::deque<>::operator[]): Add __glibcxx_requires_subscript check.
(std::deque<>::front()): Add __glibcxx_requires_nonempty check.
(std::deque<>::back()): Likewise.
(std::deque<>::pop_front()): Likewise.
(std::deque<>::pop_back()): Likewise.
(std::deque<>::swap(deque&)): Add allocator check.
* include/bits/stl_vector.h
(std::vector<>::operator[]): Add __glibcxx_requires_subscript check.
(std::vector<>::front()): Add __glibcxx_requires_nonempty check.
(std::vector<>::back()): Likewise.
(std::vector<>::pop_back()): Likewise.
(std::vector<>::swap(vector&)): Add allocator check.

Tested under Linux x86_64.

François

On 13/06/2016 12:21, Jonathan Wakely wrote:

On 08/06/16 22:53 +0200, François Dumont wrote:

Hi

   Here is the patch I already proposed to introduce the debug light 
mode for vector and deque containers.


   It also simplifies some internal calls.


This looks great, and I'd like to see it on trunk, but could you split
it into two patches please? The simplifications to use
__iterator_category and replace insert() with _M_insert_* are good but
are unrelated to the debug mode parts so if there are two separate
commits it's easier to backport one piece separately, or to identify
any regressions that might be introduced.




diff --git a/libstdc++-v3/include/bits/stl_deque.h b/libstdc++-v3/include/bits/stl_deque.h
index f63ae4c..66b8da6 100644
--- a/libstdc++-v3/include/bits/stl_deque.h
+++ b/libstdc++-v3/include/bits/stl_deque.h
@@ -63,6 +63,8 @@
 #include 
 #endif
 
+#include 
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
@@ -1365,7 +1367,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*/
   reference
   operator[](size_type __n) _GLIBCXX_NOEXCEPT
-  { return this->_M_impl._M_start[difference_type(__n)]; }
+  {
+	__glibcxx_requires_subscript(__n);
+	return this->_M_impl._M_start[difference_type(__n)];
+  }
 
   /**
*  @brief Subscript access to the data contained in the %deque.
@@ -1380,7 +1385,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*/
   const_reference
   operator[](size_type __n) const _GLIBCXX_NOEXCEPT
-  { return this->_M_impl._M_start[difference_type(__n)]; }
+  {
+	__glibcxx_requires_subscript(__n);
+	return this->_M_impl._M_start[difference_type(__n)];
+  }
 
 protected:
   /// Safety check used only from at().
@@ -1437,7 +1445,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*/
   reference
   front() _GLIBCXX_NOEXCEPT
-  { return *begin(); }
+  {
+	__glibcxx_requires_nonempty();
+	return *begin();
+  }
 
   /**
*  Returns a read-only (constant) reference to the data at the first
@@ -1445,7 +1456,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*/
   const_reference
   front() const _GLIBCXX_NOEXCEPT
-  { return *begin(); }
+  {
+	__glibcxx_requires_nonempty();
+	return *begin();
+  }
 
   /**
*  Returns a read/write reference to the data at the last element of the
@@ -1454,6 +1468,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   reference
   back() _GLIBCXX_NOEXCEPT
   {
+	__glibcxx_requires_nonempty();
 	iterator __tmp = end();
 	--__tmp;
 	return *__tmp;
@@ -1466,6 +1481,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   const_reference
   back() const _GLIBCXX_NOEXCEPT
   {
+	__glibcxx_requires_nonempty();
 	const_iterator __tmp = end();
 	--__tmp;
 	return *__tmp;
@@ -1549,6 +1565,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   void
   pop_front() _GLIBCXX_NOEXCEPT
   {
+	__glibcxx_requires_nonempty();
 	if (this->_M_impl._M_start._M_cur
 	!= this->_M_impl._M_start._M_last - 1)
 	  {
@@ -1571,6 +1588,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   void
   pop_back() _GLIBCXX_NOEXCEPT
   {
+	__glibcxx_requires_nonempty();
 	if (this->_M_impl._M_finish._M_cur
 	!= this->_M_impl._M_finish._M_first)
 	  {
@@ -1789,6 +1807,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   void
   swap(deque& __x) _GLIBCXX_NOEXCEPT
   {
+#if __cplusplus >= 201103L
+	__glibcxx_assert(_Alloc_traits::propagate_on_container_swap::value
+			 || _M_get_Tp_allocator() == __x._M_get_Tp_allocator());
+#endif
 	_M_impl._M_swap_data(__x._M_impl);
 	_Alloc_traits::_S_on_swap(_M_get_Tp_allocator(),
   __x._M_get_Tp_allocator());
diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h
index 8badea3..e2c42cb 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -63,6 +63,8 @@
 #include 
 #endif
 
+#include 
+
 namespace std _GL

Re: PR 71181 Avoid rehash after reserve

2016-06-16 Thread François Dumont

Here is a new version incorporating all your feedback.

PR libstdc++/71181
* include/tr1/hashtable_policy.h
(_Prime_rehash_policy::_M_next_bkt): Make past-the-end iterator
dereferenceable to avoid check on lower_bound result.
(_Prime_rehash_policy::_M_bkt_for_elements): Call latter.
(_Prime_rehash_policy::_M_need_rehash): Likewise.
* src/c++11/hashtable_c++0x.cc (_Prime_rehash_policy::_M_next_bkt):
Always return a value greater than input value. Set _M_next_resize to
max value when reaching highest prime number.
* src/shared/hashtable-aux.cc (__prime_list): Add comment about sentinel being now useless.
* testsuite/23_containers/unordered_set/hash_policy/71181.cc: New.
* testsuite/23_containers/unordered_set/hash_policy/power2_rehash.cc
(test02): New.
* testsuite/23_containers/unordered_set/hash_policy/prime_rehash.cc: New.

* testsuite/23_containers/unordered_set/hash_policy/rehash.cc:
Fix indentation.
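
For reference, a sketch of the user-visible guarantee this is about (not
the committed testcase): after reserve(n), inserting n elements must not
trigger a rehash.

#include <unordered_set>
#include <cassert>

int main()
{
  std::unordered_set<int> s;
  s.reserve(97);                  // 97 is a prime bucket count, the case
                                  // where the old code rehashed too early
  const auto buckets = s.bucket_count();
  for (int i = 0; i < 97; ++i)
    s.insert(i);
  assert(s.bucket_count() == buckets);
}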


On 15/06/2016 10:29, Jonathan Wakely wrote:

On 14/06/16 22:34 +0200, François Dumont wrote:

   const unsigned long* __next_bkt =
-  std::lower_bound(__prime_list + 5, __prime_list + 
__n_primes, __n);

+  std::lower_bound(__prime_list + 6, __prime_list_end, __n);
+
+if (*__next_bkt == __n && __next_bkt != __prime_list_end)
+  ++__next_bkt;


Can we avoid this check by searching for __n + 1 instead of __n with
the lower_bound call?


Yes, that's another option, I will give it a try.


I did some comparisons and this version seems to execute fewer
instructions in some simple tests, according to cachegrind.

The only drawback is that calling _M_next_bkt(size_t(-1)) doesn't give the
right result, because __n + 1 overflows.  But such an extreme value is
unlikely to occur in real use cases, so it is acceptable.
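
For what it's worth, a tiny sketch of the __n + 1 idea outside the library
internals (hypothetical table and function names):

#include <algorithm>

static const unsigned long primes[] = { 2, 3, 5, 7, 11, 13 };

unsigned long
next_bucket(unsigned long n)
{
  // Searching for n + 1 makes lower_bound return the first prime
  // strictly greater than n, so no post-check is needed.  Assumes
  // n < 13 and that n + 1 does not overflow (the size_t(-1) caveat).
  return *std::lower_bound(primes, primes + 6, n + 1);
}
// next_bucket(5) == 7, next_bucket(7) == 11, next_bucket(11) == 13.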


Tested under Linux x86_64.

François

diff --git a/libstdc++-v3/include/tr1/hashtable_policy.h b/libstdc++-v3/include/tr1/hashtable_policy.h
index 4ee6d45..c5cf866 100644
--- a/libstdc++-v3/include/tr1/hashtable_policy.h
+++ b/libstdc++-v3/include/tr1/hashtable_policy.h
@@ -420,8 +420,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Prime_rehash_policy::
   _M_next_bkt(std::size_t __n) const
   {
-const unsigned long* __p = std::lower_bound(__prime_list, __prime_list
-		+ _S_n_primes, __n);
+// Don't include the last prime in the search, so that anything
+// higher than the second-to-last prime returns a past-the-end
+// iterator that can be dereferenced to get the last prime.
+const unsigned long* __p
+  = std::lower_bound(__prime_list, __prime_list + _S_n_primes - 1, __n);
 _M_next_resize = 
   static_cast(__builtin_ceil(*__p * _M_max_load_factor));
 return *__p;
@@ -434,11 +437,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _M_bkt_for_elements(std::size_t __n) const
   {
 const float __min_bkts = __n / _M_max_load_factor;
-const unsigned long* __p = std::lower_bound(__prime_list, __prime_list
-		+ _S_n_primes, __min_bkts);
-_M_next_resize =
-  static_cast(__builtin_ceil(*__p * _M_max_load_factor));
-return *__p;
+return _M_next_bkt(__builtin_ceil(__min_bkts));
   }
 
   // Finds the smallest prime p such that alpha p > __n_elt + __n_ins.
@@ -462,12 +461,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	if (__min_bkts > __n_bkt)
 	  {
 	__min_bkts = std::max(__min_bkts, _M_growth_factor * __n_bkt);
-	const unsigned long* __p =
-	  std::lower_bound(__prime_list, __prime_list + _S_n_primes,
-			   __min_bkts);
-	_M_next_resize = static_cast
-	  (__builtin_ceil(*__p * _M_max_load_factor));
-	return std::make_pair(true, *__p);
+	return std::make_pair(true,
+  _M_next_bkt(__builtin_ceil(__min_bkts)));
 	  }
 	else 
 	  {
diff --git a/libstdc++-v3/src/c++11/hashtable_c++0x.cc b/libstdc++-v3/src/c++11/hashtable_c++0x.cc
index a5e6520..ce4961f 100644
--- a/libstdc++-v3/src/c++11/hashtable_c++0x.cc
+++ b/libstdc++-v3/src/c++11/hashtable_c++0x.cc
@@ -46,22 +46,38 @@ namespace __detail
   {
 // Optimize lookups involving the first elements of __prime_list.
 // (useful to speed-up, eg, constructors)
-static const unsigned char __fast_bkt[12]
-  = { 2, 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11 };
+static const unsigned char __fast_bkt[13]
+  = { 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11, 13, 13 };
 
-if (__n <= 11)
+if (__n <= 12)
   {
 	_M_next_resize =
 	  __builtin_ceil(__fast_bkt[__n] * (long double)_M_max_load_factor);
 	return __fast_bkt[__n];
   }
 
+// Number of primes (without sentinel).
 constexpr auto __n_primes
   = sizeof(__prime_list) / sizeof(unsigned long) - 1;
+
+// Don't include the last prime in the search, so that anything
+// higher than the second-to-last prime returns a past-the-end
+// iterator that can be dereferenced to get the last prime.
+constexpr auto __last_prime = __prime_list + __n_primes - 1;
+
+// Look for 'n + 1' to make sure returned value will be greater than n.
 co

Re: [DOC PATCH, i386]: Document -m80387 and -mhard-float

2016-06-16 Thread Bernhard Reutner-Fischer
On Thu, Jun 16, 2016 at 07:58:59PM +0200, Uros Bizjak wrote:
> Hello!
> 
> These two options were missing from the documentation.
> 
> 2016-06-16  Uros Bizjak  
> 
> * doc/invoke.texi (x86 Options): Document -m80387 and -mhard-float.
> 
> Bootstrapped on x86_64-linux-gnu, committed to mainline SVN.
> 
> Uros.

> Index: doc/invoke.texi
> ===
> --- doc/invoke.texi   (revision 237534)
> +++ doc/invoke.texi   (working copy)
> @@ -1149,7 +1149,7 @@
>  -mtune-ctrl=@var{feature-list} -mdump-tune-features -mno-default @gol
>  -mfpmath=@var{unit} @gol
>  -masm=@var{dialect}  -mno-fancy-math-387 @gol
> --mno-fp-ret-in-387  -msoft-float @gol
> +-mnofp-ret-in-387 -m80387 -mhard-float -msoft-float @gol

Fixed thusly and committed as obvious as r237540.

thanks,
>  -mno-wide-multiply  -mrtd  -malign-double @gol
>  -mpreferred-stack-boundary=@var{num} @gol
>  -mincoming-stack-boundary=@var{num} @gol
> @@ -23544,7 +23544,15 @@
>  comparisons.  These correctly handle the case where the result of a
>  comparison is unordered.
>  
> +@item -m80387
> +@item -mhard-float
> +@opindex 80387
> +@opindex mhard-float
> +Generate output containing 80387 instructions for floating point.
> +
> +@item -mno-80387
>  @item -msoft-float
> +@opindex no-80387
>  @opindex msoft-float
>  Generate output containing library calls for floating point.
>  



Re: Container debug light mode

2016-06-16 Thread Jonathan Wakely

On 16/06/16 21:28 +0200, François Dumont wrote:

And here is the patch to only add light debug checks to vector and deque.


Excellent, thanks - this is OK for trunk.




[committed] Consolidate various PIC pc-relative sequences to one output function in pa.c

2016-06-16 Thread John David Anglin
The attached patch consolidates the various PIC pc-relative sequences used to
load function and code-label addresses into one function.  This simplifies the
output functions.  It also allows use of the mfia instruction to load the
current program counter when generating PA 2.0 code.

These sequences are primarily used for long PIC calls.

Tested on hppa-unknown-linux-gnu, hppa64-hp-hpux11.11 and hppa2.0w-hp-hpux11.11 
using
BOOT_CFLAGS and BOOT_CCCFLAGS for long call generation.

Committed to trunk.

Dave
--
John David Anglin   dave.ang...@bell.net


2016-06-16  John David Anglin  

* config/pa/pa.c (pa_output_pic_pcrel_sequence): New.
(pa_output_lbranch): Use pa_output_pic_pcrel_sequence.
(pa_output_millicode_call): Likewise.
(pa_output_call): Likewise.
(pa_output_indirect_call): Likewise.
(pa_asm_output_mi_thunk): Likewise.

Index: config/pa/pa.c
===
--- config/pa/pa.c  (revision 237385)
+++ config/pa/pa.c  (working copy)
@@ -6710,6 +6710,57 @@
   return buf;
 }
 
+/* Output a PIC pc-relative instruction sequence to load the address of
+   OPERANDS[0] to register OPERANDS[2].  OPERANDS[0] is a symbol ref
+   or a code label.  OPERANDS[1] specifies the register to use to load
+   the program counter.  OPERANDS[3] may be used for label generation
+   The sequence is always three instructions in length.  The program
+   counter recorded for PA 1.X is eight bytes more than that for PA 2.0.
+   Register %r1 is clobbered.  */
+
+static void
+pa_output_pic_pcrel_sequence (rtx *operands)
+{
+  gcc_assert (SYMBOL_REF_P (operands[0]) || LABEL_P (operands[0]));
+  if (TARGET_PA_20)
+{
+  /* We can use mfia to determine the current program counter.  */
+  if (TARGET_SOM || !TARGET_GAS)
+   {
+ operands[3] = gen_label_rtx ();
+ targetm.asm_out.internal_label (asm_out_file, "L",
+ CODE_LABEL_NUMBER (operands[3]));
+ output_asm_insn ("mfia %1", operands);
+ output_asm_insn ("addil L'%0-%l3,%1", operands);
+ output_asm_insn ("ldo R'%0-%l3(%%r1),%2", operands);
+   }
+  else
+   {
+ output_asm_insn ("mfia %1", operands);
+ output_asm_insn ("addil L'%0-$PIC_pcrel$0+12,%1", operands);
+ output_asm_insn ("ldo R'%0-$PIC_pcrel$0+16(%%r1),%2", operands);
+   }
+}
+  else
+{
+  /* We need to use a branch to determine the current program counter.  */
+  output_asm_insn ("{bl|b,l} .+8,%1", operands);
+  if (TARGET_SOM || !TARGET_GAS)
+   {
+ operands[3] = gen_label_rtx ();
+ output_asm_insn ("addil L'%0-%l3,%1", operands);
+ targetm.asm_out.internal_label (asm_out_file, "L",
+ CODE_LABEL_NUMBER (operands[3]));
+ output_asm_insn ("ldo R'%0-%l3(%%r1),%2", operands);
+   }
+  else
+   {
+ output_asm_insn ("addil L'%0-$PIC_pcrel$0+4,%1", operands);
+ output_asm_insn ("ldo R'%0-$PIC_pcrel$0+8(%%r1),%2", operands);
+   }
+}
+}
+
 /* This routine handles output of long unconditional branches that
exceed the maximum range of a simple branch instruction.  Since
we don't have a register available for the branch, we save register
@@ -6730,7 +6781,7 @@
 const char *
 pa_output_lbranch (rtx dest, rtx_insn *insn, int xdelay)
 {
-  rtx xoperands[2];
+  rtx xoperands[4];
  
   xoperands[0] = dest;
 
@@ -6800,20 +6851,9 @@
 }
   else if (flag_pic)
 {
-  output_asm_insn ("{bl|b,l} .+8,%%r1", xoperands);
-  if (TARGET_SOM || !TARGET_GAS)
-   {
- xoperands[1] = gen_label_rtx ();
- output_asm_insn ("addil L'%l0-%l1,%%r1", xoperands);
- targetm.asm_out.internal_label (asm_out_file, "L",
- CODE_LABEL_NUMBER (xoperands[1]));
- output_asm_insn ("ldo R'%l0-%l1(%%r1),%%r1", xoperands);
-   }
-  else
-   {
- output_asm_insn ("addil L'%l0-$PIC_pcrel$0+4,%%r1", xoperands);
- output_asm_insn ("ldo R'%l0-$PIC_pcrel$0+8(%%r1),%%r1", xoperands);
-   }
+  xoperands[1] = gen_rtx_REG (Pmode, 1);
+  xoperands[2] = xoperands[1];
+  pa_output_pic_pcrel_sequence (xoperands);
   output_asm_insn ("bv %%r0(%%r1)", xoperands);
 }
   else
@@ -7642,10 +7682,9 @@
 {
   int attr_length = get_attr_length (insn);
   int seq_length = dbr_sequence_length ();
-  rtx xoperands[3];
+  rtx xoperands[4];
 
   xoperands[0] = call_dest;
-  xoperands[2] = gen_rtx_REG (Pmode, TARGET_64BIT ? 2 : 31);
 
   /* Handle the common case where we are sure that the branch will
  reach the beginning of the $CODE$ subspace.  The within reach
@@ -7657,7 +7696,8 @@
  || (attr_length == 28
  && get_attr_type (insn) == TYPE_SH_FUNC_ADRS)))
 {
-  output_asm_insn ("{bl|b,l} %0,%2", xoperands);
+  xoperands[1] = gen_rtx_REG (Pmode, TARGET_64BIT ? 2 :

Re: [PATCH] Backport PowerPC complex __float128 compiler support to GCC 6.x

2016-06-16 Thread Joseph Myers
On Wed, 15 Jun 2016, Michael Meissner wrote:

> Note, I do feel the front ends should be modified to allow __complex 
> __float128
> directly rather than having to use an attribute to force the complex type (and
> to use mode(TF) on x86 or mode(KF) on PowerPC).  It would clean up both x86 
> and
> PowerPC.  However, those patches aren't written yet.

I'm now working on support for TS 18661-3 _FloatN / _FloatNx type names 
(keywords), constant suffixes and  additions.  That will 
address, for C, the need to use modes for complex float128 (bug 32187) by 
allowing the standard _Complex _Float128 to be used.  The issue would 
still apply for C++ (I'm not including any C++ support for these type 
names / constant suffixes in my patch), and for complex ibm128.
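
For concreteness, a sketch of the attribute workaround being referred to
(assuming a powerpc compiler with the __float128 support from this thread
and -mfloat128; KC is the complex counterpart of the KF mode mentioned
above, TC/TF being the x86 equivalents):

/* Sketch only: until _Complex _Float128 (or a C++ equivalent) can be
   written directly, the complex type has to be forced via a mode
   attribute.  */
typedef _Complex float __cfloat128 __attribute__ ((__mode__ (__KC__)));

__cfloat128
square (__cfloat128 z)
{
  return z * z;
}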

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH,rs6000] Improve tests for Power9 vector absolute difference unsigned instructions

2016-06-16 Thread Segher Boessenkool
On Thu, Jun 16, 2016 at 11:47:17AM -0600, Kelvin Nilsen wrote:
> This patch improves upon a recently committed patch to add support for
> Power9 vector absolute difference unsigned instructions in two ways:

This is okay for trunk, and 6 after a while.  Thanks.  One comment:

> +/* { dg-skip-if "" { powerpc*-*-aix* } { "*" } { "" } } */

You can write that as just

/* { dg-skip-if "" { powerpc*-*-aix* } } */


Segher


RFC: 2->2 combine patch (was: Re: [PATCH] Allow fwprop to undo vectorization harm (PR68961))

2016-06-16 Thread Segher Boessenkool
On Fri, Jun 10, 2016 at 11:20:22AM +0200, Richard Biener wrote:
> With the proposed cost change for vector construction we will end up
> vectorizing the testcase in PR68961 again (on x86_64 and likely
> on ppc64le as well after that target gets adjustments).  Currently
> we can't optimize that away again noticing the direct overlap of
> argument and return registers.  The obstackle is
> 
> (insn 7 4 8 2 (set (reg:V2DF 93)
> (vec_concat:V2DF (reg/v:DF 91 [ a ])
> (reg/v:DF 92 [ aa ]))) 
> ...
> (insn 21 8 24 2 (set (reg:DI 97 [ D.1756 ])
> (subreg:DI (reg:TI 88 [ D.1756 ]) 0))
> (insn 24 21 11 2 (set (reg:DI 100 [+8 ])
> (subreg:DI (reg:TI 88 [ D.1756 ]) 8))
> 
> which we eventually optimize to DFmode subregs of (reg:V2DF 93).
> 
> First of all simplify_subreg doesn't handle the subregs of a vec_concat
> (easy fix below).
> 
> Then combine doesn't like to simplify the multi-use (it tries some
> parallel it seems).

Combine will not do a 2->2 combination currently.  Say it is combining
A with a later B into C, and the result of A is used again later, then
it tries a parallel of A with C.  That usually does not match an insn for
the target.

If this were a 3->2 (or 4->2) combination, or A or C are no-op moves
(so that they will disappear later in combines), combine will break the
parallel into two and see if that matches.  We can in fact do that for
2->2 combinations as well: this removes a log_link (from A to B), so
combine cannot get into an infinite loop, even though it does not make
the number of RTL insns lower.

So I tried out the patch below.  It decreases code size on most targets
(mostly fixed length insn targets), and increases it a small bit on some
variable length insn targets (doing an op twice, instead of doing it once
and doing a move).  It looks to be all good there too, but there are so
many changes that it is almost impossible to really check.

So: can people try this out with their favourite benchmarks, please?


Segher


diff --git a/gcc/combine.c b/gcc/combine.c
index 6b5d000..2c99b4e 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -3933,8 +3933,6 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   && XVECLEN (newpat, 0) == 2
   && GET_CODE (XVECEXP (newpat, 0, 0)) == SET
   && GET_CODE (XVECEXP (newpat, 0, 1)) == SET
-  && (i1 || set_noop_p (XVECEXP (newpat, 0, 0))
- || set_noop_p (XVECEXP (newpat, 0, 1)))
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT


Re: OpenACC wait clause

2016-06-16 Thread Cesar Philippidis
On 06/07/2016 08:02 AM, Jakub Jelinek wrote:
> On Tue, Jun 07, 2016 at 08:01:10AM -0700, Cesar Philippidis wrote:
>> On 06/07/2016 04:13 AM, Jakub Jelinek wrote:
>>
>>> I've noticed
>>>   if ((mask & OMP_CLAUSE_WAIT)
>>>   && !c->wait
>>>   && gfc_match ("wait") == MATCH_YES)
>>> {
>>>   c->wait = true; 
>>>   match_oacc_expr_list (" (", &c->wait_list, false);
>>>   continue;
>>> }
>>> which looks just weird and confusing.  Why isn't this instead:
>>>   if ((mask & OMP_CLAUSE_WAIT)
>>>   && !c->wait
>>>   && (match_oacc_expr_list ("wait (", &c->wait_list, false)
>>>   == MATCH_YES))
>>> {
>>>   c->wait = true; 
>>>   continue;
>>> }
>>> ?  Otherwise you happily accept wait without following (, perhaps even
>>> combined with another clause without any space in between etc.
>>
>> Both acc wait and async accept optional parenthesis arguments. E.g.,
>>
>>   #pragma acc wait
>>
>> blocks for all of the async streams to complete before proceeding, whereas
>>
>>   #pragma acc wait (1, 5)
>>
>> only blocks for async streams 1 and 5.
> 
> But then you need to set need_space = true; if it doesn't have the ( after
> it.

I was distracted with acc routine stuff, so it took me a little longer to
get around to this.  In addition to that problem with the wait clause, I
discovered a similar problem with the async clause and the wait directive.
Is this patch ok for trunk and gcc-6?

Cesar

2016-06-16  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (gfc_match_omp_clauses): Update the use of needs_space for
	the OpenACC wait and async clauses.
	(gfc_match_oacc_wait): Ensure that there is a space when the optional
	parenthesis is missing.

	gcc/testsuite/
	* gfortran.dg/goacc/asyncwait-2.f95: Add additional test coverage.
	* gfortran.dg/goacc/asyncwait-4.f95: Likewise.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 2c92794..435c709 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -677,7 +677,6 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 	  && gfc_match ("async") == MATCH_YES)
 	{
 	  c->async = true;
-	  needs_space = false;
 	  if (gfc_match (" ( %e )", &c->async_expr) != MATCH_YES)
 		{
 		  c->async_expr
@@ -685,6 +684,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 	 gfc_default_integer_kind,
 	 &gfc_current_locus);
 		  mpz_set_si (c->async_expr->value.integer, GOMP_ASYNC_NOVAL);
+		  needs_space = true;
 		}
 	  continue;
 	}
@@ -1328,7 +1328,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 	  && gfc_match ("wait") == MATCH_YES)
 	{
 	  c->wait = true;
-	  match_oacc_expr_list (" (", &c->wait_list, false);
+	  if (match_oacc_expr_list (" (", &c->wait_list, false) == MATCH_NO)
+		needs_space = true;
 	  continue;
 	}
 	  if ((mask & OMP_CLAUSE_WORKER)
@@ -1649,7 +1650,7 @@ gfc_match_oacc_wait (void)
   gfc_expr_list *wait_list = NULL, *el;
 
   match_oacc_expr_list (" (", &wait_list, true);
-  gfc_match_omp_clauses (&c, OACC_WAIT_CLAUSES, false, false, true);
+  gfc_match_omp_clauses (&c, OACC_WAIT_CLAUSES, false, true, true);
 
   if (gfc_match_omp_eos () != MATCH_YES)
 {
diff --git a/gcc/testsuite/gfortran.dg/goacc/asyncwait-2.f95 b/gcc/testsuite/gfortran.dg/goacc/asyncwait-2.f95
index db0ce1f..7b2ae07 100644
--- a/gcc/testsuite/gfortran.dg/goacc/asyncwait-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/asyncwait-2.f95
@@ -83,6 +83,18 @@ program asyncwait
   end do
   !$acc end parallel ! { dg-error "Unexpected \\\!\\\$ACC END PARALLEL" }
 
+  !$acc parallel copyin (a(1:N)) copy (b(1:N)) waitasync ! { dg-error "Unclassifiable OpenACC directive" }
+  do i = 1, N
+ b(i) = a(i)
+  end do
+  !$acc end parallel ! { dg-error "Unexpected \\\!\\\$ACC END PARALLEL" }
+
+  !$acc parallel copyin (a(1:N)) copy (b(1:N)) asyncwait ! { dg-error "Unclassifiable OpenACC directive" }
+  do i = 1, N
+ b(i) = a(i)
+  end do
+  !$acc end parallel ! { dg-error "Unexpected \\\!\\\$ACC END PARALLEL" }
+  
   !$acc parallel copyin (a(1:N)) copy (b(1:N)) wait
   do i = 1, N
  b(i) = a(i)
diff --git a/gcc/testsuite/gfortran.dg/goacc/asyncwait-4.f95 b/gcc/testsuite/gfortran.dg/goacc/asyncwait-4.f95
index cd64ef3..01349b0 100644
--- a/gcc/testsuite/gfortran.dg/goacc/asyncwait-4.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/asyncwait-4.f95
@@ -34,4 +34,6 @@ program asyncwait
   !$acc wait async (1.0) ! { dg-error "ASYNC clause at \\\(1\\\) requires a scalar INTEGER expression" }
 
   !$acc wait async 1 ! { dg-error "Unexpected junk in \\\!\\\$ACC WAIT at" }
+
+  !$acc waitasync ! { dg-error "Unexpected junk in \\\!\\\$ACC WAIT at" }
 end program asyncwait


[openacc] clean up acc directive matching in fortran

2016-06-16 Thread Cesar Philippidis
This patch introduces a match_acc function to the fortran FE. It's
almost identical to match_omp, but it passes openacc = true to
gfc_match_omp_clauses. I supposed I could have consolidated those two
functions, but they are reasonably simple so I left them separate. Maybe
a follow up patch can consolidate them. I was able to eliminate a lot of
duplicate code with this function.

Is this ok for trunk and gcc-6?

Cesar


Re: [openacc] clean up acc directive matching in fortran

2016-06-16 Thread Cesar Philippidis
On 06/16/2016 08:30 PM, Cesar Philippidis wrote:
> This patch introduces a match_acc function to the fortran FE. It's
> almost identical to match_omp, but it passes openacc = true to
> gfc_match_omp_clauses. I supposed I could have consolidated those two
> functions, but they are reasonably simple so I left them separate. Maybe
> a follow up patch can consolidate them. I was able to eliminate a lot of
> duplicate code with this function.
> 
> Is this ok for trunk and gcc-6?

And here's the patch.

Cesar

2016-06-16  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (match_acc): New generic function to parse OpenACC
	directives.
	(gfc_match_oacc_parallel_loop): Use it.
	(gfc_match_oacc_parallel): Likewise.
	(gfc_match_oacc_kernels_loop): Likewise.
	(gfc_match_oacc_kernels): Likewise.
	(gfc_match_oacc_data): Likewise.
	(gfc_match_oacc_host_data): Likewise.
	(gfc_match_oacc_loop): Likewise.
	(gfc_match_oacc_enter_data): Likewise.
	(gfc_match_oacc_exit_data): Likewise.


diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index b780f26..435c709 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1412,63 +1412,101 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
   (OMP_CLAUSE_GANG | OMP_CLAUSE_WORKER | OMP_CLAUSE_VECTOR | OMP_CLAUSE_SEQ)
 
 
-static match
-match_acc (gfc_exec_op op, uint64_t mask)
+match
+gfc_match_oacc_parallel_loop (void)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, mask, false, false, true) != MATCH_YES)
+  if (gfc_match_omp_clauses (&c, OACC_PARALLEL_LOOP_CLAUSES, false, false,
+			 true) != MATCH_YES)
 return MATCH_ERROR;
-  new_st.op = op;
+
+  new_st.op = EXEC_OACC_PARALLEL_LOOP;
   new_st.ext.omp_clauses = c;
   return MATCH_YES;
 }
 
-match
-gfc_match_oacc_parallel_loop (void)
-{
-  return match_acc (EXEC_OACC_PARALLEL_LOOP, OACC_PARALLEL_LOOP_CLAUSES);
-}
-
 
 match
 gfc_match_oacc_parallel (void)
 {
-  return match_acc (EXEC_OACC_PARALLEL, OACC_PARALLEL_CLAUSES);
+  gfc_omp_clauses *c;
+  if (gfc_match_omp_clauses (&c, OACC_PARALLEL_CLAUSES, false, false, true)
+  != MATCH_YES)
+return MATCH_ERROR;
+
+  new_st.op = EXEC_OACC_PARALLEL;
+  new_st.ext.omp_clauses = c;
+  return MATCH_YES;
 }
 
 
 match
 gfc_match_oacc_kernels_loop (void)
 {
-  return match_acc (EXEC_OACC_KERNELS_LOOP, OACC_KERNELS_LOOP_CLAUSES);
+  gfc_omp_clauses *c;
+  if (gfc_match_omp_clauses (&c, OACC_KERNELS_LOOP_CLAUSES, false, false,
+			 true) != MATCH_YES)
+return MATCH_ERROR;
+
+  new_st.op = EXEC_OACC_KERNELS_LOOP;
+  new_st.ext.omp_clauses = c;
+  return MATCH_YES;
 }
 
 
 match
 gfc_match_oacc_kernels (void)
 {
-  return match_acc (EXEC_OACC_KERNELS, OACC_KERNELS_CLAUSES);
+  gfc_omp_clauses *c;
+  if (gfc_match_omp_clauses (&c, OACC_KERNELS_CLAUSES, false, false, true)
+  != MATCH_YES)
+return MATCH_ERROR;
+
+  new_st.op = EXEC_OACC_KERNELS;
+  new_st.ext.omp_clauses = c;
+  return MATCH_YES;
 }
 
 
 match
 gfc_match_oacc_data (void)
 {
-  return match_acc (EXEC_OACC_DATA, OACC_DATA_CLAUSES);
+  gfc_omp_clauses *c;
+  if (gfc_match_omp_clauses (&c, OACC_DATA_CLAUSES, false, false, true)
+  != MATCH_YES)
+return MATCH_ERROR;
+
+  new_st.op = EXEC_OACC_DATA;
+  new_st.ext.omp_clauses = c;
+  return MATCH_YES;
 }
 
 
 match
 gfc_match_oacc_host_data (void)
 {
-  return match_acc (EXEC_OACC_HOST_DATA, OACC_HOST_DATA_CLAUSES);
+  gfc_omp_clauses *c;
+  if (gfc_match_omp_clauses (&c, OACC_HOST_DATA_CLAUSES, false, false, true)
+  != MATCH_YES)
+return MATCH_ERROR;
+
+  new_st.op = EXEC_OACC_HOST_DATA;
+  new_st.ext.omp_clauses = c;
+  return MATCH_YES;
 }
 
 
 match
 gfc_match_oacc_loop (void)
 {
-  return match_acc (EXEC_OACC_LOOP, OACC_LOOP_CLAUSES);
+  gfc_omp_clauses *c;
+  if (gfc_match_omp_clauses (&c, OACC_LOOP_CLAUSES, false, false, true)
+  != MATCH_YES)
+return MATCH_ERROR;
+
+  new_st.op = EXEC_OACC_LOOP;
+  new_st.ext.omp_clauses = c;
+  return MATCH_YES;
 }
 
 
@@ -1580,14 +1618,28 @@ gfc_match_oacc_update (void)
 match
 gfc_match_oacc_enter_data (void)
 {
-  return match_acc (EXEC_OACC_ENTER_DATA, OACC_ENTER_DATA_CLAUSES);
+  gfc_omp_clauses *c;
+  if (gfc_match_omp_clauses (&c, OACC_ENTER_DATA_CLAUSES, false, false, true)
+  != MATCH_YES)
+return MATCH_ERROR;
+
+  new_st.op = EXEC_OACC_ENTER_DATA;
+  new_st.ext.omp_clauses = c;
+  return MATCH_YES;
 }
 
 
 match
 gfc_match_oacc_exit_data (void)
 {
-  return match_acc (EXEC_OACC_EXIT_DATA, OACC_EXIT_DATA_CLAUSES);
+  gfc_omp_clauses *c;
+  if (gfc_match_omp_clauses (&c, OACC_EXIT_DATA_CLAUSES, false, false, true)
+  != MATCH_YES)
+return MATCH_ERROR;
+
+  new_st.op = EXEC_OACC_EXIT_DATA;
+  new_st.ext.omp_clauses = c;
+  return MATCH_YES;
 }
 
 


Re: [Patch, avr] Fix PR 71151

2016-06-16 Thread Senthil Kumar Selvaraj

Denis Chertykov writes:

> 2016-06-16 10:27 GMT+03:00 Senthil Kumar Selvaraj
> :
>>
>> Senthil Kumar Selvaraj writes:
>>
>>> Georg-Johann Lay writes:
>>>
 Senthil Kumar Selvaraj schrieb:
> Hi,
>
>   This patch fixes PR 71151 by eliminating the
>   TARGET_ASM_FUNCTION_RODATA_SECTION hook and setting
>   JUMP_TABLES_IN_TEXT_SECTION to 1.
>
>   As described in the bugzilla entry, this hook assumed it will get
>   called only for jumptable rodata for functions. This was true until
>   6.1, when a commit in varasm.c started calling the hook for mergeable
>   string/constant data as well.
>
>   This resulted in string constants ending up in a section intended for
>   jumptables (flash), and broke code using those constants, which
>   expects them to be present in rodata (SRAM).
>
>   Given that the original reason for placing jumptables in a section was
>   fixed by Johann in PR 63323, this patch restores the original
>   behavior. Reg testing on both gcc-6-branch and trunk showed no 
> regressions.

 Just for the record:

 The intention for jump-tables in function-rodata-section was to get
 fine-grained section for the tables so that --gc-sections and
 -ffunction-sections not only gc unused functions but also unused
 jump-tables.  As these tables had to reside in the lowest 64KiB of flash
 (.progmem section) neither .rodata nor .text was a correct placement,
 hence the hacking in TARGET_ASM_FUNCTION_RODATA_SECTION.

 Before using TARGET_ASM_FUNCTION_RODATA_SECTION, all jump tables were
 put into .progmem.gcc_sw_table by ASM_OUTPUT_BEFORE_CASE_LABEL switching
 to that section.

 We actually never had jump-tables in .text before...
>>>
>>> JUMP_TABLES_IN_TEXT_SECTION was 1 before r37465 - that was when the
>>> progmem.gcc_sw_table section was introduced. But yes, I understand that
>>> the target hook for FUNCTION_RODATA_SECTION was done to get them gc'ed
>>> along with the code.
>>>

 The purpose of PR63323 was to have a more generic jump-table
 implementation that also works if the table does NOT reside in the lower
 64KiB.  This happens when moving the whole TEXT section around, like
 for a bootloader.
>>>
>>> Understood.

>   As pointed out by Johann, this may end up increasing code
>   size if there are lots of branches that cross the jump tables. I
>   intend to propose a separate patch that gives additional information
>   to the target hook (SECCAT_RODATA_{STRING,JUMPTABLE}) so it can know
>   what type of function rodata is coming on. Johann also suggested
>   handling jump table generation ourselves - I'll experiment with that
>   some more.
>
>   If ok, could someone commit please? Could you also backport to
>   gcc-6-branch?
>
> Regards
> Senthil
>
> gcc/ChangeLog
>
> 2016-06-03  Senthil Kumar Selvaraj  
>
> * config/avr/avr.c (avr_asm_function_rodata_section): Remove.
> * config/avr/avr.c (TARGET_ASM_FUNCTION_RODATA_SECTION): Remove.
>
> gcc/testsuite/ChangeLog
>
> 2016-06-03  Senthil Kumar Selvaraj  
>
> * gcc/testsuite/gcc.target/avr/pr71151-1.c: New.
> * gcc/testsuite/gcc.target/avr/pr71151-2.c: New.
>
> diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
> index ba5cd91..3cb8cb7 100644
> --- gcc/config/avr/avr.c
> +++ gcc/config/avr/avr.c
> @@ -9488,65 +9488,6 @@ avr_asm_init_sections (void)
>  }
>
>
> -/* Implement `TARGET_ASM_FUNCTION_RODATA_SECTION'.  */
> -
> -static section*
> -avr_asm_function_rodata_section (tree decl)
> -{
> -  /* If a function is unused and optimized out by -ffunction-sections
> - and --gc-sections, ensure that the same will happen for its jump
> - tables by putting them into individual sections.  */
> -
> -  unsigned int flags;
> -  section * frodata;
> -
> -  /* Get the frodata section from the default function in varasm.c
> - but treat function-associated data-like jump tables as code
> - rather than as user defined data.  AVR has no constant pools.  */
> -  {
> -int fdata = flag_data_sections;
> -
> -flag_data_sections = flag_function_sections;
> -frodata = default_function_rodata_section (decl);
> -flag_data_sections = fdata;
> -flags = frodata->common.flags;
> -  }
> -
> -  if (frodata != readonly_data_section
> -  && flags & SECTION_NAMED)
> -{
> -  /* Adjust section flags and replace section name prefix.  */
> -
> -  unsigned int i;
> -
> -  static const char* const prefix[] =
> -{
> -  ".rodata",  ".progmem.gcc_sw_table",
> -  ".gnu.linkonce.r.", ".gnu.linkonce.t."
> -};
> -
> -  for (i = 0; i < s