Re: [AArch64] Add precision choices for the reciprocal square root approximation

2016-03-18 Thread Evandro Menezes

On 03/18/16 10:21, Wilco Dijkstra wrote:

Hi Evandro,


For example, though this approximation improves the performance
noticeably for DF on A57, for SF, not so much, if at all.

I'm still skeptical that you can ever get any gain on scalars. I bet the only
gain is on 4x vectorized floats.


I created a simple test that loops around an inline asm version of the 
Newton series using scalar insns and got these results on A57:


   1/sqrt(x):    18290898/s
   Fast:         45896823/s

   1/sqrtf(x):   69618490/s
   Fast:         61865874/s
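
For reference, a minimal sketch in plain C of the Newton iteration for 1/sqrt(x), not the inline asm that produced the numbers above. The name approx_rsqrt and the integer bit trick used for the initial estimate are assumptions for illustration; on AArch64 the estimate and the step would come from the frsqrte and frsqrts insns instead.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: approximate 1/sqrt(x) for finite x > 0.  */
static float
approx_rsqrt (float x)
{
  uint32_t bits;
  float y;

  /* Crude initial estimate derived from the exponent bits; a software
     stand-in for the hardware estimate instruction.  */
  memcpy (&bits, &x, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);
  memcpy (&y, &bits, sizeof y);

  /* Newton steps: y = y * (3 - x * y * y) / 2.  Each step roughly
     doubles the number of correct bits.  */
  for (int i = 0; i < 3; i++)
    y = y * (1.5f - 0.5f * x * y * y);

  return y;
}
```

Each step costs three dependent multiplies plus the step operation, which is why the scalar SF case may not beat a fast hardware sqrt and divide.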



So what I would like to see is this implemented in a more general way. We should
be able to choose whether to expand depending on the mode - including whether it is
vectorized. For example enable on V4SFmode and maybe V2DFmode, but not
on any scalars.

Then we'd add new CPU tuning settings for division, sqrt and rsqrt (rather than
adding lots of extra tune flags).


If I understood you correctly, would something like coarse tuning flags 
along with target-specific cost or parameters tables be what you have in 
mind?



Note the md file should call a function in aarch64.c to decide whether to
expand or not (your division approximation patch makes the decision in the md
file which does not seem a good idea).


I agree.  Will modify it.
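
A rough sketch of what such a per-mode decision function could look like, assuming a per-CPU table with one bit per mode. Everything here (enum fp_mode, struct approx_tuning, use_approx_rsqrt_p) is hypothetical and only illustrates the shape of the idea; it is not the interface in aarch64.c.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's machine modes.  */
enum fp_mode { MODE_SF, MODE_DF, MODE_V2SF, MODE_V4SF, MODE_V2DF };

#define MODE_BIT(m) (1u << (m))

/* Hypothetical per-CPU tuning entry: bit N set means "use the
   approximation for mode N".  */
struct approx_tuning
{
  unsigned int rsqrt_modes;
};

/* The md expander would call this instead of testing flags itself.  */
static bool
use_approx_rsqrt_p (const struct approx_tuning *tune, enum fp_mode mode)
{
  return (tune->rsqrt_modes & MODE_BIT (mode)) != 0;
}

/* Example tuning: vectors only, no scalars.  */
static const struct approx_tuning vector_only_tuning =
{
  MODE_BIT (MODE_V4SF) | MODE_BIT (MODE_V2DF)
};
```

Keeping the decision in one C function means the md patterns only ask a yes/no question, and new CPUs only add a table entry.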

Thank you,

--
Evandro Menezes



Re: PING: [PATCH] PR driver/70192: Properly set flag_pie and flag_pic

2016-03-18 Thread H.J. Lu
On Thu, Mar 17, 2016 at 8:55 AM, Bernd Schmidt  wrote:
> On 03/17/2016 04:26 PM, H.J. Lu wrote:
>>
>> On Thu, Mar 17, 2016 at 8:23 AM, Bernd Schmidt 
>> wrote:
>>>
>>> On 03/17/2016 04:13 PM, H.J. Lu wrote:

>>>> We can add an effective target, something like ignore_pic_pie, and
>>>> use it instead of *-*-darwin*.
>>>
>>>
>>>
>>> That should have been done _before_ committing the patch in a form that
>>> was
>>> not approved.
>>>
>>
>> How should we move forward?
>
>
> Maybe an effective target pic_default, which tests whether __PIC__ is
> defined without any options. Please prepare a patch.
>

That isn't sufficient for Darwin since it ignores -fno-pic and -fno-pie.
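
For context, the probe behind a pic_default style check boils down to something like the sketch below (pic_enabled is a hypothetical name). It only reports whether __PIC__ is defined for the current compile; it cannot tell whether the target silently ignores -fno-pic or -fno-pie, which is the Darwin case.

```c
/* Returns 1 when the compiler defines __PIC__ for this translation
   unit, 0 otherwise.  */
static int
pic_enabled (void)
{
#ifdef __PIC__
  return 1;
#else
  return 0;
#endif
}
```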


-- 
H.J.


[Patch, testsuite] Skip testcase for avr

2016-03-18 Thread Senthil Kumar Selvaraj
Hi,

 This trivial patch skips gcc.c-torture/compile/20151204.c for the avr
 target - the test allocates ~64K on the stack, which is too big
 for the avr target. Right now, the test errors out with "total
 size of local objects too large".

 If this is ok, could someone commit please? I don't have commit access.

Regards
Senthil

gcc/testsuite/ChangeLog
2016-03-16  Senthil Kumar Selvaraj  

  * gcc.c-torture/compile/20151204.c: Skip for avr.


diff --git a/gcc/testsuite/gcc.c-torture/compile/20151204.c b/gcc/testsuite/gcc.c-torture/compile/20151204.c
index 036316c..0a60871 100644
--- a/gcc/testsuite/gcc.c-torture/compile/20151204.c
+++ b/gcc/testsuite/gcc.c-torture/compile/20151204.c
@@ -1,3 +1,5 @@
+/* { dg-skip-if "Array too big" { "avr-*-*" } { "*" } { "" } } */
+
 typedef __SIZE_TYPE__ size_t;
 
 int strcmp (const char*, const char*);


Re: [PATCH][PR rtl-optimization/70024] Fix argument to CROSSING_JUMP_P

2016-03-18 Thread Andreas Schwab
Jeff Law  writes:

>   PR rtl-optimization/70024

That's probably a typo.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [gomp4.1] map clause parsing improvements

2016-03-18 Thread Jakub Jelinek
On Thu, Mar 17, 2016 at 03:34:09PM +0100, Thomas Schwinge wrote:
> That's simple enough; OK to commit?  (I'm also including the related
> change, to rename the Fortran OMP_MAP_FORCE_DEALLOC to OMP_MAP_DELETE,
> because I think that's what you'd do, once starting the OpenMP 4.5
> Fortran front end work.)

Ok, thanks.

Jakub


Re: [PATCH PR69489/01]Improve tree ifcvt by storing/tracking DR against its innermost loop behavior if possible

2016-03-18 Thread Richard Biener
On Wed, Mar 16, 2016 at 10:59 AM, Bin Cheng  wrote:
> Hi,
> One issue revealed in tree ifcvt is that the pass stores/tracks DRs against their
> memory references in IR.  This makes it difficult to identify the same memory
> reference appearing in different forms.  Given the example below:
>
> void foo (int a[], int b[])
> {
>   int i;
>   for (i = 0; i < 100; i++)
>     {
>       if (a[i] == 0)
>         a[i] = b[i]*4;
>       else
>         a[i] = b[i]*3;
>     }
> }
>
> The gimple dump before tree ifcvt is as:
>
>   :
>
>   :
>   # i_24 = PHI 
>   # ivtmp_28 = PHI 
>   _5 = (long unsigned int) i_24;
>   _6 = _5 * 4;
>   _8 = a_7(D) + _6;
>   _9 = *_8;
>   if (_9 == 0)
> goto ;
>   else
> goto ;
>
>   :
>   _11 = b_10(D) + _6;
>   _12 = *_11;
>   _13 = _12 * 4;
>   goto ;
>
>   :
>   _15 = b_10(D) + _6;
>   _16 = *_15;
>   _17 = _16 * 3;
>
>   :
>   # cstore_1 = PHI <_13(4), _17(5)>
>   *_8 = cstore_1;
>   i_19 = i_24 + 1;
>   ivtmp_23 = ivtmp_28 - 1;
>   if (ivtmp_23 != 0)
> goto ;
>   else
> goto ;
>
>   :
>   goto ;
>
>   :
>   return;
>
> The two memory references "*_11" and "*_15" are actually the same thing, but
> ifcvt failed to discover this because they are recorded in different forms.
> This patch fixes the issue by recording/tracking a memory reference against its
> innermost_loop_behavior if the memory reference has an innermost_loop_behavior
> and it is not a compound reference.
> BTW, PR56625 reported that this case couldn't be vectorized even though tree if-conv
> can handle it.  The interesting thing is that, at the current time, it's tree if-conv that
> cannot handle the case.  Once it's if-converted with this patch, the
> vectorizer is able to handle it too.
>
> Bootstrapped and tested on x86_64 and AArch64.  Is it OK?  Not sure if it's for GCC 7.

Hmm.

+  equal_p = true;
+  if (e1->base_address && e2->base_address)
+equal_p &= operand_equal_p (e1->base_address, e2->base_address, 0);
+  if (e1->offset && e2->offset)
+equal_p &= operand_equal_p (e1->offset, e2->offset, 0);

surely better to return false early.
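
Roughly, the early-return shape would look like the sketch below. It uses plain int pointers in place of trees, and struct loop_behavior, field_equal_p and behavior_equal_p are hypothetical names, so take it as an illustration of the control flow rather than the actual comparison function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified stand-in: int pointers play the role of the
   tree operands of an innermost_loop_behavior.  */
struct loop_behavior
{
  const int *base_address;
  const int *offset;
};

/* Stand-in for operand_equal_p.  */
static bool
field_equal_p (const int *a, const int *b)
{
  return *a == *b;
}

/* Same semantics as accumulating into equal_p, but bails out at the
   first mismatch instead of evaluating the remaining comparisons.  */
static bool
behavior_equal_p (const struct loop_behavior *e1,
		  const struct loop_behavior *e2)
{
  if (e1->base_address && e2->base_address
      && !field_equal_p (e1->base_address, e2->base_address))
    return false;

  if (e1->offset && e2->offset
      && !field_equal_p (e1->offset, e2->offset))
    return false;

  return true;
}
```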

I think we don't want this in tree-data-refs.h also because of ...

@@ -615,15 +619,29 @@
hash_memrefs_baserefs_and_store_DRs_read_written_info
(data_reference_p a)
   data_reference_p *master_dr, *base_master_dr;
   tree ref = DR_REF (a);
   tree base_ref = DR_BASE_OBJECT (a);
+  innermost_loop_behavior *innermost = &DR_INNERMOST (a);
   tree ca = bb_predicate (gimple_bb (DR_STMT (a)));
   bool exist1, exist2;

-  while (TREE_CODE (ref) == COMPONENT_REF
-|| TREE_CODE (ref) == IMAGPART_EXPR
-|| TREE_CODE (ref) == REALPART_EXPR)
-ref = TREE_OPERAND (ref, 0);
+  /* If reference in DR has innermost loop behavior and it is not
+ a compound memory reference, we store it to innermost_DR_map,
+ otherwise to ref_DR_map.  */
+  if (TREE_CODE (ref) == COMPONENT_REF
+  || TREE_CODE (ref) == IMAGPART_EXPR
+  || TREE_CODE (ref) == REALPART_EXPR
+  || !(DR_BASE_ADDRESS (a) || DR_OFFSET (a)
+  || DR_INIT (a) || DR_STEP (a) || DR_ALIGNED_TO (a)))
+{
+  while (TREE_CODE (ref) == COMPONENT_REF
+|| TREE_CODE (ref) == IMAGPART_EXPR
+|| TREE_CODE (ref) == REALPART_EXPR)
+   ref = TREE_OPERAND (ref, 0);
+
+  master_dr = &ref_DR_map->get_or_insert (ref, &exist1);
+}
+  else
+master_dr = &innermost_DR_map->get_or_insert (innermost, &exist1);

we don't want an extra hashmap but to replace ref_DR_map entirely.  So we'd need to
strip outermost non-variant handled-components (COMPONENT_REF, IMAGPART
and REALPART) before creating the DR (or adjust the equality function
and hashing to disregard them, which means subtracting their offset from DR_INIT).

To adjust the references we collect you could maybe use a callback
to get_references_in_stmt.

OTOH post-processing the DRs in if_convertible_loop_p_1 can be as simple as

Index: tree-if-conv.c
===
--- tree-if-conv.c  (revision 234215)
+++ tree-if-conv.c  (working copy)
@@ -1235,6 +1220,38 @@ if_convertible_loop_p_1 (struct loop *lo

   for (i = 0; refs->iterate (i, &dr); i++)
 {
+  tree *refp = &DR_REF (dr);
+  while ((TREE_CODE (*refp) == COMPONENT_REF
+ && TREE_OPERAND (*refp, 2) == NULL_TREE)
+|| TREE_CODE (*refp) == IMAGPART_EXPR
+|| TREE_CODE (*refp) == REALPART_EXPR)
+   refp = &TREE_OPERAND (*refp, 0);
+  if (refp != &DR_REF (dr))
+   {
+ tree saved_base = *refp;
+ *refp = integer_zero_node;
+
+ if (DR_INIT (dr))
+   {
+ tree poffset;
+ int punsignedp, preversep, pvolatilep;
+ machine_mode pmode;
+ HOST_WIDE_INT pbitsize, pbitpos;
+ get_inner_reference (DR_REF (dr), &pbitsize, &pbitpos, &poffset,
+  &pmode, &punsignedp, &preversep, &pvolatilep,
+  false);
+ gcc_assert (poffset == NULL_TREE);
+
+ DR_INIT 

Re: C++ PATCH to fix missing warning (PR c++/70194)

2016-03-18 Thread Patrick Palka
On Thu, Mar 17, 2016 at 12:27 PM, Jeff Law  wrote:
> On 03/16/2016 06:43 PM, Martin Sebor wrote:
>>>
>>> @@ -3974,6 +3974,38 @@ build_vec_cmp (tree_code code, tree type,
>>> return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec);
>>>   }
>>>
>>> +/* Possibly warn about an address never being NULL.  */
>>> +
>>> +static void
>>> +warn_for_null_address (location_t location, tree op, tsubst_flags_t
>>> complain)
>>> +{
>>
>> ...
>>>
>>> +  if (TREE_CODE (cop) == ADDR_EXPR
>>> +  && decl_with_nonnull_addr_p (TREE_OPERAND (cop, 0))
>>> +  && !TREE_NO_WARNING (cop))
>>> +warning_at (location, OPT_Waddress, "the address of %qD will never "
>>> +"be NULL", TREE_OPERAND (cop, 0));
>>> +
>>> +  if (CONVERT_EXPR_P (op)
>>> +  && TREE_CODE (TREE_TYPE (TREE_OPERAND (op, 0))) == REFERENCE_TYPE)
>>> +{
>>> +  tree inner_op = op;
>>> +  STRIP_NOPS (inner_op);
>>> +
>>> +  if (DECL_P (inner_op))
>>> +warning_at (location, OPT_Waddress,
>>> +"the compiler can assume that the address of "
>>> +"%qD will never be NULL", inner_op);
>>
>>
>> Since I noted the subtle differences between the phrasing of
>> the various -Waddress warnings in the bug, I have to ask: what is
>> the significance of the difference between the two warnings here?
>>
>> Would it not be appropriate to issue the first warning in the latter
>> case?  Or perhaps even use the same text as is already used elsewhere:
>> "the address of %qD will always evaluate as ‘true’" (since it may not
>> be the macro NULL that's mentioned in the expression).
>
> They were added at different times AFAICT.  The former is fairly old
> (Douglas Gregor, 2008) at this point.  The latter was added by Patrick Palka
> for 65168 about a year ago.
>
> You could directly ask Patrick about motivations for a different message.

There is no plausible way for the address of a non-reference variable
to be NULL even in code with UB (aside from __attribute__ ((weak)) in
which case the warning is suppressed).  But the address of a reference
can easily seem to be NULL if one performs UB and assigns to it *(int
*)NULL or something like that.  I think that was my motivation, anyway
:)


Re: [C++ PATCH] Diagnose invalid _Jv_AllocObject prototype (PR c++/70267)

2016-03-18 Thread Jason Merrill

On 03/17/2016 03:35 PM, Jakub Jelinek wrote:

_Jv_AllocObject returns a pointer, and as the testcase below shows,
we easily ICE if a wrong prototype is provided for it instead.
There are already other diagnostics (e.g. when it is missing, or when
it is an overloaded function), so this ensures at least the return type
is sane.


OK.


Wonder about all the other spots where the C++ FE relies on user prototypes
for __cxa* etc. functions, perhaps some sanity checking will be needed too
to avoid ICEs on invalid stuff.


I think the compiler generates internal declarations for all the __cxa* 
functions.  We've run into issues with broken definitions of 
std::initializer_list, but we have sanity checks on that now.


Jason



[committed] Fix linux blk-merge boot problem on hppa

2016-03-18 Thread John David Anglin
The attached patch fixes a problem causing block/blk-merge.c in the linux kernel
to be miscompiled.  As a result, block segments were not properly split and boot
has failed since linux 4.3.

The problem was found by a regression search.  The patch reverts a change in
the handling of the Q and T constraints.

Tested on hppa-unknown-linux-gnu, hppa2.0w-hp-hpux11.11 and hppa64-hp-hpux11.11.
Committed to trunk, 4.9 and 5 branches.

Dave
--
John David Anglin   dave.ang...@bell.net


2016-03-17  John David Anglin  

PR target/70188
* config/pa/constraints.md: Revert 2015-02-13 change.  Use
define_constraint for "Q" and "T" constraints.

Index: config/pa/constraints.md
===
--- config/pa/constraints.md(revision 234201)
+++ config/pa/constraints.md(working copy)
@@ -106,7 +106,7 @@
   (and (match_code "mem")
(match_test "IS_LO_SUM_DLT_ADDR_P (XEXP (op, 0))")))
 
-(define_memory_constraint "Q"
+(define_constraint "Q"
   "A memory operand that can be used as the destination operand of an
integer store, or the source operand of an integer load.  That is
any memory operand that isn't a symbolic, indexed or lo_sum memory
@@ -122,7 +122,7 @@
   (and (match_code "mem")
(match_test "IS_INDEX_ADDR_P (XEXP (op, 0))")))
 
-(define_memory_constraint "T"
+(define_constraint "T"
   "A memory operand for floating-point loads and stores."
   (match_test "floating_point_store_memory_operand (op, mode)"))
 


[COMMITTED][AArch64] Tweak the pipeline model for Exynos M1

2016-03-18 Thread Evandro Menezes

Tweak the pipeline model for Exynos M1

* gcc/config/aarch64/aarch64.c
(exynosm1_tunings): Enable the weak prefetching model.

Committed as r234307.

--
Evandro Menezes

From a75d875a3c64180c9d6c368e2d87036d70f66036 Mon Sep 17 00:00:00 2001
From: evandro 
Date: Thu, 17 Mar 2016 21:20:50 +
Subject: [PATCH] Tweak the pipeline model for Exynos M1

	* gcc/config/aarch64/aarch64.c
	(exynosm1_tunings): Enable the weak prefetching model.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@234307 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog| 7 +++
 gcc/config/aarch64/aarch64.c | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 4a74494..fc50cb7 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2016-03-17  Evandro Menezes  
+	
+	Tweak the pipeline model for Exynos M1
+
+	* config/aarch64/aarch64.c (exynosm1_tunings): 	Enable weak prefetching
+	model.
+
 2016-03-17  David Malcolm  
 
 	PR c/70264
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 12e498d..ed0daa5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -537,7 +537,7 @@ static const struct tune_params exynosm1_tunings =
   2,	/* min_div_recip_mul_df.  */
   48,	/* max_case_values.  */
   64,	/* cache_line_size.  */
-  tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
+  tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_APPROX_RSQRT) /* tune_flags.  */
 };
 
-- 
1.9.1



[PATCH V3]PR other/70268: map one directory name (old) to another (new) in __FILE__

2016-03-18 Thread Hongxu Jia

Changed in V3:

- Rebase to latest master (efc86c4c627b82364f118a29b5d9d58cad8b8c76)

- Fix bad formatting (missing space before '(').

- Use of @code{} around literal source code text.

//Hongxu
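
To make the intended old-to-new mapping concrete, here is a small standalone sketch of prefix remapping. The function remap_prefix, its signature and the fall-back behaviour are assumptions for illustration only; this is not the remap_file_filename implementation from the patch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if FILENAME starts with OLD_PREFIX, write
   NEW_PREFIX followed by the rest of FILENAME into BUF and return BUF.
   Otherwise, or if the result would not fit, return FILENAME as is.  */
static const char *
remap_prefix (const char *filename, const char *old_prefix,
	      const char *new_prefix, char *buf, size_t buflen)
{
  size_t old_len = strlen (old_prefix);
  int n;

  if (strncmp (filename, old_prefix, old_len) != 0)
    return filename;

  n = snprintf (buf, buflen, "%s%s", new_prefix, filename + old_len);
  if (n < 0 || (size_t) n >= buflen)
    return filename;

  return buf;
}
```

With an old prefix of "/build" and a new prefix of "/usr", a __FILE__ of "/build/src/a.c" would become "/usr/src/a.c", while paths outside "/build" stay untouched.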

From 7fe014845596f547d735324c466799d8207d282d Mon Sep 17 00:00:00 2001
From: Hongxu Jia 
Date: Wed, 16 Mar 2016 04:55:56 -0700
Subject: [PATCH V3] map one directory name (old) to another (new) in __FILE__

PR other/70268

* gcc/c-family/c.opt(-ffile-prefix-map=): New option.
* gcc/c-family/c-opts.c: Include file-map.h
(c_common_handle_option): Handle -ffile-prefix-map.
* gcc/gimplify.c: Include file-map.h
(gimplify_call_expr): Call remap_file_filename
* gcc/dwarf2out.c (gen_producer_string): Ignore -ffile-prefix-map.
* libcpp/macro.c: Include file-map.h
(_cpp_builtin_macro_text): Call remap_file_filename
* libcpp/include/file-map.h (remap_file_filename,
add_file_prefix_map): Declare.
* libcpp/file-map.c: Include config.h, system.h, file-map.h.
(struct file_prefix_map, file_prefix_maps, add_file_prefix_map,
remap_file_filename): New.
* libcpp/Makefile.in (file-map.c, file-map.o,
file-map.h): Update dependencies.
* doc/invoke.texi (-ffile-prefix-map): Document.

Signed-off-by: Hongxu Jia 
---
 gcc/ChangeLog | 10 ++
 gcc/c-family/c-opts.c |  6 
 gcc/c-family/c.opt|  4 +++
 gcc/doc/invoke.texi   |  7 
 gcc/dwarf2out.c   |  1 +
 gcc/gimplify.c|  2 ++
 libcpp/ChangeLog  | 13 +++
 libcpp/Makefile.in| 10 +++---
 libcpp/file-map.c | 92 +++
 libcpp/include/file-map.h | 30 
 libcpp/macro.c|  2 ++
 11 files changed, 172 insertions(+), 5 deletions(-)
 create mode 100644 libcpp/file-map.c
 create mode 100644 libcpp/include/file-map.h

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7c0f325..54d353e 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2016-03-18  Hongxu Jia  
+
+	PR other/70268
+	* c-family/c-opts.c: Include file-map.h
+	(c_common_handle_option): Handle -ffile-prefix-map.
+	* c-family/c.opt(-ffile-prefix-map=): New option.
+	* gimplify.c: Include file-map.h
+	(gimplify_call_expr): Call remap_file_filename
+	* dwarf2out.c (gen_producer_string): Ignore -ffile-prefix-map.
+
 2016-03-17  John David Anglin  
 
 	PR target/70188
diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index fec58bc..d0033f6 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "plugin.h"		/* For PLUGIN_INCLUDE_FILE event.  */
 #include "mkdeps.h"
 #include "dumpfile.h"
+#include "file-map.h"
 
 #ifndef DOLLARS_IN_IDENTIFIERS
 # define DOLLARS_IN_IDENTIFIERS true
@@ -503,6 +504,11 @@ c_common_handle_option (size_t scode, const char *arg, int value,
   cpp_opts->narrow_charset = arg;
   break;
 
+case OPT_ffile_prefix_map_:
+  if (add_file_prefix_map (arg) < 0)
+error ("invalid argument %qs to -ffile-prefix-map", arg);
+  break;
+
 case OPT_fwide_exec_charset_:
   cpp_opts->wide_charset = arg;
   break;
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 7c5f6c7..17eb865 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1204,6 +1204,10 @@ fexec-charset=
 C ObjC C++ ObjC++ Joined RejectNegative
 -fexec-charset=	Convert all strings and character constants to character set .
 
+ffile-prefix-map=
+C ObjC C++ ObjC++ Joined RejectNegative

Re: [PATCH, PR70185] Only finalize dot files that have been initialized

2016-03-18 Thread Tom de Vries

On 16/03/16 12:34, Richard Biener wrote:

On Wed, Mar 16, 2016 at 11:57 AM, Tom de Vries  wrote:

Hi,

Atm, using -fdump-tree-all-graph produces invalid dot files:
...
$ rm *.c.* ; gcc test.c -O2 -S -fdump-tree-all-graph
$ for f in *.dot; do dot -Tpdf $f -o dot.pdf; done
Warning: test.c.006t.omplower.dot: syntax error in line 1 near '}'
Warning: test.c.007t.lower.dot: syntax error in line 1 near '}'
Warning: test.c.010t.eh.dot: syntax error in line 1 near '}'
Warning: test.c.292t.statistics.dot: syntax error in line 1 near '}'
$ cat test.c.006t.omplower.dot
}
$
...
These dot files are finalized, but never initialized or used.

The 006/007/010 files are not used because '(fn->curr_properties & PROP_cfg)
== 0' at the corresponding passes.

And the file test.c.292t.statistics.dot is not used, because it doesn't
belong to a single pass.

The current finalization code doesn't handle these cases:
...
   /* Do whatever is necessary to finish printing the graphs.  */
   for (i = TDI_end; (dfi = dumps->get_dump_file_info (i)) != NULL; ++i)
 if (dumps->dump_initialized_p (i)
 && (dfi->pflags & TDF_GRAPH) != 0
 && (name = dumps->get_dump_file_name (i)) != NULL)
   {
 finish_graph_dump_file (name);
 free (name);
   }
...

The patch fixes this by simply testing for pass->graph_dump_initialized
instead.

[ That fix exposes the lack of initialization of graph_dump_initialized. It
seems to be initialized for static passes, but for dynamically added passes,
such as for instance vzeroupper, the value is uninitialized. The patch also fixes
this. ]
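
The idea, reduced to a standalone sketch: only finalize a graph dump if it was actually started. struct dump_info and finish_graph_dumps below are hypothetical simplifications, not the dumpfile.h or passes.c code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified dump record.  */
struct dump_info
{
  const char *name;
  bool graph_dump_initialized;
};

/* Call FINISH (if non-null) for every dump whose graph output was
   started, and return how many were finalized.  Dumps that were never
   initialized are skipped, so no lone closing '}' gets written.  */
static size_t
finish_graph_dumps (const struct dump_info *dumps, size_t n,
		    void (*finish) (const char *))
{
  size_t finished = 0;

  for (size_t i = 0; i < n; i++)
    if (dumps[i].graph_dump_initialized)
      {
	if (finish)
	  finish (dumps[i].name);
	finished++;
      }

  return finished;
}
```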

Bootstrapped and reg-tested on x86_64.

OK for stage1?


Seeing this I wonder if it makes more sense to move ->graph_dump_initialized
from pass to dump_file_info?


Done.


Also in the above shouldn't it use
dfi->pfilename rather than dumps->get_dump_file_name (i)?


That one isn't defined anymore once we get to finish_optimization_passes.

OK for stage1 if bootstrap and reg-test succeed?

Thanks,
- Tom

Only finalize dot files that have been initialized

2016-03-16  Tom de Vries  

	PR other/70185
	* tree-pass.h (class opt_pass): Remove graph_dump_initialized member.
	* dumpfile.h (struct dump_file_info): Add graph_dump_initialized field.
	* dumpfile.c (dump_files): Initialize graph_dump_initialized field.
	* passes.c (finish_optimization_passes): Only call
	finish_graph_dump_file if dfi->graph_dump_initialized.
	(execute_function_dump, pass_init_dump_file): Use
	dfi->graph_dump_initialized instead of pass->graph_dump_initialized.

---
 gcc/dumpfile.c  | 22 +++---
 gcc/dumpfile.h  |  4 
 gcc/passes.c| 16 ++--
 gcc/tree-pass.h |  5 -
 4 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
index 144e371..74522a6 100644
--- a/gcc/dumpfile.c
+++ b/gcc/dumpfile.c
@@ -50,29 +50,29 @@ int dump_flags;
TREE_DUMP_INDEX enumeration in dumpfile.h.  */
 static struct dump_file_info dump_files[TDI_end] =
 {
-  {NULL, NULL, NULL, NULL, NULL, NULL, NULL, 0, 0, 0, 0, 0, 0, false},
+  {NULL, NULL, NULL, NULL, NULL, NULL, NULL, 0, 0, 0, 0, 0, 0, false, false},
   {".cgraph", "ipa-cgraph", NULL, NULL, NULL, NULL, NULL, TDF_IPA,
-   0, 0, 0, 0, 0, false},
+   0, 0, 0, 0, 0, false, false},
   {".type-inheritance", "ipa-type-inheritance", NULL, NULL, NULL, NULL, NULL, TDF_IPA,
-   0, 0, 0, 0, 0, false},
+   0, 0, 0, 0, 0, false, false},
   {".tu", "translation-unit", NULL, NULL, NULL, NULL, NULL, TDF_TREE,
-   0, 0, 0, 0, 1, false},
+   0, 0, 0, 0, 1, false, false},
   {".class", "class-hierarchy", NULL, NULL, NULL, NULL, NULL, TDF_TREE,
-   0, 0, 0, 0, 2, false},
+   0, 0, 0, 0, 2, false, false},
   {".original", "tree-original", NULL, NULL, NULL, NULL, NULL, TDF_TREE,
-   0, 0, 0, 0, 3, false},
+   0, 0, 0, 0, 3, false, false},
   {".gimple", "tree-gimple", NULL, NULL, NULL, NULL, NULL, TDF_TREE,
-   0, 0, 0, 0, 4, false},
+   0, 0, 0, 0, 4, false, false},
   {".nested", "tree-nested", NULL, NULL, NULL, NULL, NULL, TDF_TREE,
-   0, 0, 0, 0, 5, false},
+   0, 0, 0, 0, 5, false, false},
 #define FIRST_AUTO_NUMBERED_DUMP 6
 
   {NULL, "tree-all", NULL, NULL, NULL, NULL, NULL, TDF_TREE,
-   0, 0, 0, 0, 0, false},
+   0, 0, 0, 0, 0, false, false},
   {NULL, "rtl-all", NULL, NULL, NULL, NULL, NULL, TDF_RTL,
-   0, 0, 0, 0, 0, false},
+   0, 0, 0, 0, 0, false, false},
   {NULL, "ipa-all", NULL, NULL, NULL, NULL, NULL, TDF_IPA,
-   0, 0, 0, 0, 0, false},
+   0, 0, 0, 0, 0, false, false},
 };
 
 /* Define a name->number mapping for a dump flag value.  */
diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
index c168cbf..3f08b16 100644
--- a/gcc/dumpfile.h
+++ b/gcc/dumpfile.h
@@ -120,6 +120,10 @@ struct dump_file_info
   bool owns_strings;/* fields "suffix", "swtch", "glob" can be
    const strings, or can be dynamically
    allocated, needing free.  */
+  bool graph_dump_initialized;  /* When a given dump file is being 

Re: [AArch64] Emit square root using the Newton series

2016-03-18 Thread Evandro Menezes

On 03/10/16 19:06, Wilco Dijkstra wrote:

Evandro Menezes  wrote:

That's what I had in mind too, but around the approximation for x^-1/2
and using masks for vector cases thusly:

 fcmne   v3.4s, v0.4s, #0.0
 frsqrte v1.4s, v0.4s
 fmul    v2.4s, v1.4s, v1.4s
 frsqrts v2.4s, v0.4s, v2.4s
 fmul    v1.4s, v1.4s, v2.4s
 fmul    v2.4s, v1.4s, v1.4s
 frsqrts v2.4s, v0.4s, v2.4s
 fmul    v1.4s, v1.4s, v2.4s
 and     v1.4s, v3.4s
 fmul    v0.4s, v1.4s, v0.4s

That's possible but the overall latency is higher - according to exynos-1.md the
above takes 44 cycles while my version would be 37.


Emit square root using the Newton series

2016-03-16  Evandro Menezes  
Wilco Dijkstra  

gcc/
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_APPROX_SQRT_{SF,DF}): New tuning macros.
* config/aarch64/aarch64-protos.h
(aarch64_emit_approx_rsqrt): Replace with
   "aarch64_emit_approx_sqrt".
(AARCH64_EXTRA_TUNE_APPROX_SQRT): New macro.
* config/aarch64/aarch64.c
(exynosm1_tunings): Use the new macro.
(aarch64_emit_approx_sqrt): Define new function.
* config/aarch64/aarch64.md
(rsqrt2): Use new function instead.
(sqrt2): New expansion and insn definitions.
* config/aarch64/aarch64-simd.md: Likewise.
* config/aarch64/aarch64.opt
(mlow-precision-recip-sqrt): Expand option description.
* doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.


This patch merges the functions that emit the approximate reciprocal
square root and the approximate square root, with the latter handling
the case where the input argument is zero.
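
In scalar C terms the relationship is sqrt(x) = x * (1/sqrt(x)), with zero needing special care because the reciprocal estimate is infinite there and 0 * inf would give NaN. A sketch under those assumptions follows; newton_rsqrt and approx_sqrt are hypothetical names, and the bit-trick estimate stands in for frsqrte.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: Newton approximation of 1/sqrt(x), x > 0.  */
static float
newton_rsqrt (float x)
{
  uint32_t bits;
  float y;

  /* Software stand-in for the hardware estimate.  */
  memcpy (&bits, &x, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);
  memcpy (&y, &bits, sizeof y);

  /* y = y * (3 - x * y * y) / 2, three times.  */
  for (int i = 0; i < 3; i++)
    y = y * (1.5f - 0.5f * x * y * y);

  return y;
}

/* Hypothetical helper: sqrt(x) for x >= 0 via the reciprocal square
   root.  Zero is returned unchanged, which plays the role of the
   compare-and-mask pair in the vector sequence.  */
static float
approx_sqrt (float x)
{
  if (x == 0.0f)
    return x;

  return x * newton_rsqrt (x);
}
```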


It depends on the patch at 
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00534.html


I appreciate your feedback.

Thank you,

--
Evandro Menezes

From a69a80da4c3feab691d3c1df28906ef195e5631d Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Wed, 16 Mar 2016 15:21:00 -0500
Subject: [PATCH] Emit square root using the Newton series

2016-03-16  Evandro Menezes  
Wilco Dijkstra  

gcc/
	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNE_APPROX_SQRT_{SF,DF}): New tuning macros.
	* config/aarch64/aarch64-protos.h
	(aarch64_emit_approx_rsqrt): Replace with "aarch64_emit_approx_sqrt".
	(AARCH64_EXTRA_TUNE_APPROX_SQRT): New macro.
	* config/aarch64/aarch64.c
	(exynosm1_tunings): Use the new macro.
	(aarch64_emit_approx_sqrt): Define new function.
	* config/aarch64/aarch64.md
	(rsqrt2): Use new function instead.
	(sqrt2): New expansion and insn definitions.
	* config/aarch64/aarch64-simd.md: Likewise.
	* config/aarch64/aarch64.opt
	(mlow-precision-recip-sqrt): Expand option description.
	* doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.
---
 gcc/config/aarch64/aarch64-protos.h |  4 +-
 gcc/config/aarch64/aarch64-simd.md  | 27 +++-
 gcc/config/aarch64/aarch64-tuning-flags.def |  3 +-
 gcc/config/aarch64/aarch64.c| 97 +++--
 gcc/config/aarch64/aarch64.md   | 25 +++-
 gcc/config/aarch64/aarch64.opt  |  4 +-
 gcc/doc/invoke.texi |  9 +--
 7 files changed, 138 insertions(+), 31 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 58e5d73..c9a5192 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -265,6 +265,8 @@ enum aarch64_extra_tuning_flags
 
 #define AARCH64_EXTRA_TUNE_APPROX_RSQRT \
   (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF | AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF)
+#define AARCH64_EXTRA_TUNE_APPROX_SQRT \
+  (AARCH64_EXTRA_TUNE_APPROX_SQRT_DF | AARCH64_EXTRA_TUNE_APPROX_SQRT_SF)
 
 extern struct tune_params aarch64_tune_params;
 
@@ -364,7 +366,7 @@ void aarch64_register_pragmas (void);
 void aarch64_relayout_simd_types (void);
 void aarch64_reset_previous_fndecl (void);
 void aarch64_save_restore_target_globals (tree);
-void aarch64_emit_approx_rsqrt (rtx, rtx);
+void aarch64_emit_approx_sqrt (rtx, rtx, bool);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bd73bce..31191bb 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -405,7 +405,7 @@
 		 UNSPEC_RSQRT))]
   "TARGET_SIMD"
 {
-  aarch64_emit_approx_rsqrt (operands[0], operands[1]);
+  aarch64_emit_approx_sqrt (operands[0], operands[1], true);
   DONE;
 })
 
@@ -4307,7 +4307,30 @@
 
 ;; sqrt
 
-(define_insn "sqrt2"
+(define_expand "sqrt2"
+  [(set (match_operand:VDQF 0 "register_operand")
+	(sqrt:VDQF 

Re: C++ PATCH to fix missing warning (PR c++/70194)

2016-03-18 Thread Martin Sebor

@@ -3974,6 +3974,38 @@ build_vec_cmp (tree_code code, tree type,
return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec);
  }

+/* Possibly warn about an address never being NULL.  */
+
+static void
+warn_for_null_address (location_t location, tree op, tsubst_flags_t complain)
+{

...

+  if (TREE_CODE (cop) == ADDR_EXPR
+  && decl_with_nonnull_addr_p (TREE_OPERAND (cop, 0))
+  && !TREE_NO_WARNING (cop))
+warning_at (location, OPT_Waddress, "the address of %qD will never "
+   "be NULL", TREE_OPERAND (cop, 0));
+
+  if (CONVERT_EXPR_P (op)
+  && TREE_CODE (TREE_TYPE (TREE_OPERAND (op, 0))) == REFERENCE_TYPE)
+{
+  tree inner_op = op;
+  STRIP_NOPS (inner_op);
+
+  if (DECL_P (inner_op))
+   warning_at (location, OPT_Waddress,
+   "the compiler can assume that the address of "
+   "%qD will never be NULL", inner_op);


Since I noted the subtle differences between the phrasing of
the various -Waddress warnings in the bug, I have to ask: what is
the significance of the difference between the two warnings here?

Would it not be appropriate to issue the first warning in the latter
case?  Or perhaps even use the same text as is already used elsewhere:
"the address of %qD will always evaluate as ‘true’" (since it may not
be the macro NULL that's mentioned in the expression).

Martin


Re: [PATCH] Fix compiling large files

2016-03-18 Thread Jeff Law

On 03/15/2016 04:31 PM, Richard Henderson wrote:

On 03/10/2016 08:20 PM, DJ Delorie wrote:

I'm moving on to Plan C but I put a copy of the file on
.../dj/foo.c.gz (195Mb) if anyone wants to find out
why there's a 16Gb limit compiling it...


With just the following, we successfully compile your file.

It takes about 25 minutes and memory use tops out around 40GB.
Which still seems insane for a 1.6GB input file consisting
primarily of data for a static array, but that's a
different problem.

At this point we usually have a PR to go with all stage4
changes.  But a meaningful PR is difficult to create, since
the attachment would be too large.  Perhaps a generator could
be created, but since it wouldn't go in the testsuite it seems
like a waste of time.

Thoughts?


r~


 * line-map.c (new_linemap): Make alloc_size a size_t.
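
A small illustration of why the width of an allocation size matters for very large inputs. It assumes the old variable was a 32-bit unsigned type, which the ChangeLog entry does not state, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Size computed in 32 bits: the product wraps modulo 2^32.  */
static uint32_t
alloc_size_32 (uint32_t count, uint32_t elem_size)
{
  return count * elem_size;
}

/* Size computed as size_t: on a 64-bit host the same product fits.  */
static size_t
alloc_size_wide (size_t count, size_t elem_size)
{
  return count * elem_size;
}
```

For example, 0x10000000 entries of 32 bytes each need 8GB; the 32-bit product wraps to 0, so a reallocation based on it would be far too small.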

OK.
jeff



Re: Wonly-top-basic-asm

2016-03-18 Thread Bernd Schmidt

On 03/17/2016 06:23 AM, David Wohlferd wrote:

2016-03-16  David Wohlferd  
 Bernd Schmidt  

 * doc/extend.texi: Doc basic asm behavior re clobbers.



Any objections from the release managers if I install this for David at 
this stage?



Bernd



Re: [PATCH, i386, AVX-512] Emit vpbroadcastq instead of non-existent vbroadcastsd.

2016-03-18 Thread Richard Biener
On Fri, 18 Mar 2016, Kirill Yukhin wrote:

> Hello,
> Intel spec [1] states that almost all variants of the broadcasting
> instructions are available, except for (p. 2-4)
>   vbroadcastsd %xmm, %xmm
> It is safe to emit
>   vpbroadcastq %xmm, %xmm
> instead.
> 
> I was unable to extract a testcase, but if this insn is generated
> we'll get an asm error.
> 
> [1] - https://software.intel.com/sites/default/files/managed/b4/3a/319433-024.pdf
> 
> Bootstrapped and regtested.
> 
> Richard,
> is it ok to check in to main trunk?

If it's ok with Uros yes.
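
The reason the integer variant is a safe substitute is that a broadcast only copies bits. A portable C sketch of that argument (broadcast_df_as_di is a hypothetical name, and this models the data movement only, not the instruction encoding):

```c
#include <stdint.h>
#include <string.h>

/* Copy the 64-bit pattern of VALUE into both lanes of OUT by going
   through integer lanes, as an integer broadcast would.  */
static void
broadcast_df_as_di (double value, double out[2])
{
  uint64_t bits;
  uint64_t lanes[2];

  memcpy (&bits, &value, sizeof bits);
  lanes[0] = bits;
  lanes[1] = bits;
  memcpy (out, lanes, sizeof lanes);
}
```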

Richard.

> gcc/
>   * config/i386/sse.md: Use vpbroadcastq for broadcasting DF
>   values to 128b regs.
> 
> --
> Thanks, K
> 
> commit 72e85f1b936d61edc93603862c810a8b4817b8a7
> Author: Kirill Yukhin 
> Date:   Thu Mar 17 18:05:22 2016 +0300
> 
> AVX-512. Use vpbroadcastq for broadcasting DF values to 128b regs.
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 3c521b3..b25c246 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17269,7 +17269,14 @@
>   (match_operand: 1 "nonimmediate_operand" "vm")
>   (parallel [(const_int 0)]]
>"TARGET_AVX512F"
> -  "vbroadcast\t{%1, 
> %0|%0, %1}"
> +{
> +  /*  There is no DF broadcast (in AVX-512*) to 128b register.
> +  Mimic it with integer variant.  */
> +  if (mode == V2DFmode)
> +return "vpbroadcastq\t{%1, %0|%0, %1}";
> +  else
> +return "vbroadcast\t{%1, 
> %0|%0, %1}";
> +}
>[(set_attr "type" "ssemov")
> (set_attr "prefix" "evex")
> (set_attr "mode" "")])
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Re: Wonly-top-basic-asm

2016-03-18 Thread Bernd Schmidt

On 03/17/2016 06:23 AM, David Wohlferd wrote:

> On 3/14/2016 8:28 AM, Bernd Schmidt wrote:
>> The example is not good, as discussed previously, and IMO the best
>> option is to remove it. Otherwise I have no objections to the latest
>> variant.
>
> Despite the problems I have with the existing sample, adding the
> information/warnings is more important to me, and more valuable to
> users.  Perhaps we can revisit the sample when pr24414 gets addressed.


My thought was to remove the example altogether, which we might want to 
do later on. But for now I've committed your change since it can be seen 
as an improvement in itself.



Bernd



RFA: PATCH to load_register_parameters for empty structs and sibcalls

2016-03-18 Thread Jason Merrill
Discussion of empty class parameter passing ABI led me to notice that 
r162402 broke sibcalls with arguments of size 0 in some cases.  Before 
that commit, the code read



else if ((partial == 0 || args[i].pass_on_stack)
 && size != 0)
{
  rtx mem = validize_mem (args[i].value);

  /* Check for overlap with already clobbered argument area. */
  if (is_sibcall
  && mem_overlaps_already_clobbered_arg_p (XEXP (args[i].value, 0), size))
 *sibcall_failure = 1;


and after,


else if (partial == 0 || args[i].pass_on_stack)
  {
rtx mem = validize_mem (args[i].value);

/* Check for overlap with already clobbered argument area,
   providing that this has non-zero size.  */
if (is_sibcall
&& (size == 0
|| mem_overlaps_already_clobbered_arg_p
 (XEXP (args[i].value, 0), size)))
  *sibcall_failure = 1;


So now we set *sibcall_failure if size==0, whereas before we didn't 
enter the outer block.  The comment also contradicts the code.
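
As a minimal sketch of the three variants of this condition (the function names are hypothetical, and the `overlaps` argument merely stands in for the result of mem_overlaps_already_clobbered_arg_p), the difference can be reduced to plain predicates:

```cpp
// Hypothetical model of when *sibcall_failure gets set for one argument.
// 'overlaps' stands in for mem_overlaps_already_clobbered_arg_p (...).

// Before r162402: the enclosing block was skipped entirely for size == 0,
// so a zero-sized argument could never trigger the failure here.
bool sibcall_fails_before(bool is_sibcall, int size, bool overlaps)
{
  if (size == 0)
    return false;
  return is_sibcall && overlaps;
}

// After r162402: a zero-sized argument alone forces the failure.
bool sibcall_fails_after(bool is_sibcall, int size, bool overlaps)
{
  return is_sibcall && (size == 0 || overlaps);
}

// With the proposed fix: the overlap check only matters for non-zero sizes,
// which is what the comment in calls.c describes.
bool sibcall_fails_fixed(bool is_sibcall, int size, bool overlaps)
{
  return is_sibcall && size != 0 && overlaps;
}
```

For a sibcall with a zero-sized argument, only the middle variant reports a failure; for non-zero sizes all three agree.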


OK for trunk?
commit 2fa659c5b7779ba190cd5e9d15a8b7bc3dc5df35
Author: Jason Merrill 
Date:   Wed Mar 16 12:57:37 2016 -0400

	* calls.c (load_register_parameters): Fix zero size sibcall logic.

diff --git a/gcc/calls.c b/gcc/calls.c
index 7b28f43..6415e08 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -2083,9 +2083,9 @@ load_register_parameters (struct arg_data *args, int num_actuals,
 	  /* Check for overlap with already clobbered argument area,
 	 providing that this has non-zero size.  */
 	  if (is_sibcall
-		  && (size == 0
-		  || mem_overlaps_already_clobbered_arg_p 
-	   (XEXP (args[i].value, 0), size)))
+		  && size != 0
+		  && (mem_overlaps_already_clobbered_arg_p
+		  (XEXP (args[i].value, 0), size)))
 		*sibcall_failure = 1;
 
 	  if (size % UNITS_PER_WORD == 0
diff --git a/gcc/testsuite/gcc.dg/sibcall-11.c b/gcc/testsuite/gcc.dg/sibcall-11.c
new file mode 100644
index 0000000..ae58770
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/sibcall-11.c
@@ -0,0 +1,7 @@
+// Test for sibcall optimization with empty struct.
+// { dg-options "-O2" }
+// { dg-final { scan-assembler "jmp" { target i?86-*-* x86_64-*-* } } }
+
+struct A { };
+void f(struct A);
+void g(struct A a) { f(a); }


[gomp-nvptx 2/7] nvptx libgcc: use attribute shared

2016-03-18 Thread Alexander Monakov
* config/nvptx/crt0.c (__nvptx_stacks): Define in C.  Use it...
(__nvptx_uni): Ditto.
(__main): ...here instead of inline asm.
* config/nvptx/stacks.c (__nvptx_stacks): Define in C.
(__nvptx_uni): Ditto.
---
 libgcc/ChangeLog.gomp-nvptx  |  8 
 libgcc/config/nvptx/crt0.c   | 10 --
 libgcc/config/nvptx/stacks.c |  9 ++---
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/libgcc/config/nvptx/crt0.c b/libgcc/config/nvptx/crt0.c
index 5e04b0f..9e9a25e 100644
--- a/libgcc/config/nvptx/crt0.c
+++ b/libgcc/config/nvptx/crt0.c
@@ -41,10 +41,8 @@ abort (void)
   exit (255);
 }
 
-asm ("// BEGIN GLOBAL VAR DECL: __nvptx_stacks");
-asm (".extern .shared .u64 __nvptx_stacks[32];");
-asm ("// BEGIN GLOBAL VAR DECL: __nvptx_uni");
-asm (".extern .shared .u32 __nvptx_uni[32];");
+extern char *__nvptx_stacks[32] __attribute__((shared));
+extern unsigned __nvptx_uni[32] __attribute__((shared));
 
 extern int main (int argc, char *argv[]);
 
@@ -54,8 +52,8 @@ __main (int *__retval, int __argc, char *__argv[])
   __exitval = __retval;
 
   static char gstack[131072] __attribute__((aligned(8)));
-  asm ("st.shared.u64 [__nvptx_stacks], %0;" : : "r" (gstack + sizeof gstack));
-  asm ("st.shared.u32 [__nvptx_uni], %0;" : : "r" (0));
+  __nvptx_stacks[0] = gstack + sizeof gstack;
+  __nvptx_uni[0] = 0;
 
   exit (main (__argc, __argv));
 }
diff --git a/libgcc/config/nvptx/stacks.c b/libgcc/config/nvptx/stacks.c
index a7e640a..4640fc9 100644
--- a/libgcc/config/nvptx/stacks.c
+++ b/libgcc/config/nvptx/stacks.c
@@ -21,10 +21,5 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
<http://www.gnu.org/licenses/>.  */
 
-/* __shared__ char *__nvptx_stacks[32];  */
-asm ("// BEGIN GLOBAL VAR DEF: __nvptx_stacks");
-asm (".visible .shared .u64 __nvptx_stacks[32];");
-
-/* __shared__ unsigned __nvptx_uni[32];  */
-asm ("// BEGIN GLOBAL VAR DEF: __nvptx_uni");
-asm (".visible .shared .u32 __nvptx_uni[32];");
+char *__nvptx_stacks[32] __attribute__((shared)) = { 0 };
+unsigned __nvptx_uni[32] __attribute__((shared)) = { 0 };


[Patch testsuite obvious][gcc-5] g++.dg/ext/pr57735.C should not run if the testsuite is explicitly passing -mfloat-abi=hard

2016-03-18 Thread Andre Vieira (lists)
On 09/06/15 14:07, James Greenhalgh wrote:
> 
> Hi,
> 
> g++.dg/ext/pr57735.C is failing for test runs which explicitly pass
> -mfloat-abi=hard. Looking at the test, it seems the best fix would be
> to check before adding -mfloat-abi=soft that we are not testing some other
> float-abi. We also fail to check that it is OK to add -march=armv5te
> and -marm.
> 
> Fixed using the same mechanisms we use elsewhere in the gcc.target/arm/
> tests with the attached, applied as obvious as revision 224280.
> 
> Thanks,
> James
> 
> ---
> gcc/testsuite/
> 
> 2015-06-09  James Greenhalgh  
> 
>   * g++.dg/ext/pr57735.C: Do not override -mfloat-abi directives
>   passed by the testsuite driver.
> 

Thomas committed this on my behalf to gcc-5-branch as obvious as
revision r234326.

Cheers,
Andre


Re: [PATCH] PR lto/70258: [6 Regression] flag_pic is cleared for PIE in lto_post_options

2016-03-18 Thread Richard Biener
On Wed, Mar 16, 2016 at 10:47 PM, H.J. Lu  wrote:
> Since PIE implies PIC, we should set flag_pic to flag_pie for PIE in
> LTO.
>
> Tested on x86-64.  OK for trunk?

Ok.  I wonder if we need to do sth to flag_shlib here as well?

Richard.

> H.J.
> ---
> PR lto/70258
> * lto-lang.c (lto_post_options): Set flag_pic to flag_pie for
> PIE.
> ---
>  gcc/lto/lto-lang.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c
> index 691e9e2..b5efe3a 100644
> --- a/gcc/lto/lto-lang.c
> +++ b/gcc/lto/lto-lang.c
> @@ -836,7 +836,7 @@ lto_post_options (const char **pfilename ATTRIBUTE_UNUSED)
>/* If -fPIC or -fPIE was used at compile time, be sure that
>   flag_pie is 2.  */
>flag_pie = MAX (flag_pie, flag_pic);
> -  flag_pic = 0;
> +  flag_pic = flag_pie;
>break;
>
>  case LTO_LINKER_OUTPUT_EXEC: /* Normal executable */
> --
> 2.5.0
>


[gomp-nvptx 1/7] libgomp: remove paste error in gomp_team_barrier_wait_end

2016-03-18 Thread Alexander Monakov
* config/nvptx/bar.c: Remove wrong invocation of
gomp_barrier_wait_end from gomp_team_barrier_wait_end.
---
 libgomp/ChangeLog.gomp-nvptx | 5 +
 libgomp/config/nvptx/bar.c   | 2 --
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/libgomp/config/nvptx/bar.c b/libgomp/config/nvptx/bar.c
index e6e8daa..a0d8a44 100644
--- a/libgomp/config/nvptx/bar.c
+++ b/libgomp/config/nvptx/bar.c
@@ -80,8 +80,6 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
 {
   unsigned int generation, gen;
 
-  gomp_barrier_wait_end (bar, state);
-
   if (__builtin_expect (state & BAR_WAS_LAST, 0))
 {
   /* Next time we'll be awaiting TOTAL threads again.  */


[i386] Support .largecomm with Solaris as (PR target/61821)

2016-03-18 Thread Rainer Orth
gcc.target/i386/pr61599-1.c currently FAILs on 64-bit Solaris/x86 with
the native assembler:

FAIL: gcc.target/i386/pr61599-1.c (test for excess errors)
WARNING: gcc.target/i386/pr61599-1.c compilation failed to produce executable

Assembler: pr61599-1.c
"pr61599-1.s", line 2 : Illegal mnemonic
Near line: ".largecomm  a,1073741824,64"
"pr61599-1.s", line 2 : Syntax error

gcc emits

.largecomm  a,1073741824,64

here.  The issue is twofold: /bin/as uses the .lbcomm directive instead
of .largecomm.  In addition, the declaration needs to be prefixed with a
section declaration like this:

.section .lbss,"wah",@nobits
.lbcomm  a,1073741824,64

This can be done in gas as well: the resulting object files are
identical with or without the .section declaration.
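
A rough illustration of the intended assembler output, not the actual i386.c code: the helper below is hypothetical and simply formats the two lines described above, choosing the directive by assembler. The section flags are copied from the example in this message; in GCC they would come from the section machinery rather than a literal string.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: build the text emitted for a large common object.
// use_gas selects .largecomm (gas) over .lbcomm (Solaris /bin/as).
std::string large_common_directive(const std::string &name,
                                   unsigned long long size,
                                   unsigned align, bool use_gas)
{
  std::ostringstream out;
  // Switch to .lbss first; harmless with gas, required by Solaris as.
  out << "\t.section\t.lbss,\"wah\",@nobits\n";
  out << (use_gas ? "\t.largecomm\t" : "\t.lbcomm\t");
  out << name << "," << size << "," << align << "\n";
  return out.str();
}
```

Calling it with ("a", 1073741824, 64, false) yields the .section/.lbcomm pair shown above.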

The following patch implements this.  Because we need the corresponding
decl when switching to .lbss, all instances of ASM_OUTPUT_ALIGNED_COMMON
need to be changed to ASM_OUTPUT_ALIGNED_DECL_COMMON for the additional
decl argument.

Besides, this patch depends on the previous one for large section
handling with Solaris as:

https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01056.html

Bootstrapped without regressions on i386-pc-solaris2.12 (with as and
gas) and x86_64-pc-linux-gnu.

Ok for mainline now or after gcc 6 branches?

Thanks.
Rainer


2016-03-15  Rainer Orth  

PR target/61821
* config/i386/i386.c (LARGECOMM_SECTION_ASM_OP): Define default.
(x86_elf_aligned_common): Rename to ...
(x86_elf_aligned_decl_common): ... this.
Add decl arg.  Switch to .lbss for largecomm object.  Use
LARGECOMM_SECTION_ASM_OP.
* config/i386/i386-protos.h (x86_elf_aligned_common): Reflect
renaming.
* config/i386/x86-64.h (ASM_OUTPUT_ALIGNED_COMMON): Rename to ...
(ASM_OUTPUT_ALIGNED_DECL_COMMON): ... this.
Pass new decl arg.
* config/i386/sol2.h (ASM_OUTPUT_ALIGNED_COMMON): Likewise.
[!USE_GAS] (LARGECOMM_SECTION_ASM_OP): Define.

# HG changeset patch
# Parent  f2778f1226a58a09af87570a94143ad767c3de2d
Support .largecomm with Solaris as (PR target/61821)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -294,8 +294,8 @@ extern int ix86_decompose_address (rtx, 
 extern int memory_address_length (rtx, bool);
 extern void x86_output_aligned_bss (FILE *, tree, const char *,
 unsigned HOST_WIDE_INT, int);
-extern void x86_elf_aligned_common (FILE *, const char *,
-unsigned HOST_WIDE_INT, int);
+extern void x86_elf_aligned_decl_common (FILE *, tree, const char *,
+	 unsigned HOST_WIDE_INT, int);
 
 #ifdef RTX_CODE
 extern void ix86_fp_comparison_codes (enum rtx_code code, enum rtx_code *,
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6630,19 +6630,27 @@ x86_64_elf_unique_section (tree decl, in
 }
 
 #ifdef COMMON_ASM_OP
+
+#ifndef LARGECOMM_SECTION_ASM_OP
+#define LARGECOMM_SECTION_ASM_OP "\t.largecomm\t"
+#endif
+
 /* This says how to output assembler code to declare an
uninitialized external linkage data object.
 
-   For medium model x86-64 we need to use .largecomm opcode for
+   For medium model x86-64 we need to use LARGECOMM_SECTION_ASM_OP opcode for
large objects.  */
 void
-x86_elf_aligned_common (FILE *file,
+x86_elf_aligned_decl_common (FILE *file, tree decl,
 			const char *name, unsigned HOST_WIDE_INT size,
 			int align)
 {
   if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
   && size > (unsigned int)ix86_section_threshold)
-fputs ("\t.largecomm\t", file);
+{
+  switch_to_section (get_named_section (decl, ".lbss", 0));
+  fputs (LARGECOMM_SECTION_ASM_OP, file);
+}
   else
 fputs (COMMON_ASM_OP, file);
   assemble_name (file, name);
diff --git a/gcc/config/i386/sol2.h b/gcc/config/i386/sol2.h
--- a/gcc/config/i386/sol2.h
+++ b/gcc/config/i386/sol2.h
@@ -183,15 +183,15 @@ along with GCC; see the file COPYING3.  
 
 /* As in sparc/sol2.h, override the default from i386/x86-64.h to work
around Sun as TLS bug.  */
-#undef  ASM_OUTPUT_ALIGNED_COMMON
-#define ASM_OUTPUT_ALIGNED_COMMON(FILE, NAME, SIZE, ALIGN)		\
+#undef  ASM_OUTPUT_ALIGNED_DECL_COMMON
+#define ASM_OUTPUT_ALIGNED_DECL_COMMON(FILE, DECL, NAME, SIZE, ALIGN)	\
   do	\
 {	\
   if (TARGET_SUN_TLS		\
 	  && in_section			\
 	  && ((in_section->common.flags & SECTION_TLS) == SECTION_TLS))	\
 	switch_to_section (bss_section);\
-  x86_elf_aligned_common (FILE, NAME, SIZE, ALIGN);			\
+  x86_elf_aligned_decl_common (FILE, DECL, NAME, SIZE, ALIGN);	\
 }	\
   while  (0)
 
@@ -229,6 +229,10 @@ along with GCC; see the file COPYING3.  
 #define DTORS_SECTION_ASM_OP	"\t.section\t.dtors, \"aw\""
 #endif
 
+#ifndef USE_GAS
+#define 

Re: [PATCH] Fix PR c++/70121 (premature folding of const var that was implicitly captured)

2016-03-18 Thread Patrick Palka
On Fri, Mar 18, 2016 at 11:14 AM, Jason Merrill  wrote:
> On 03/10/2016 05:58 PM, Patrick Palka wrote:
>>
>> This patch reverses the behavior of process_outer_var_ref, so that we
>> always implicitly capture a const variable if it's capturable, instead
>> of always trying to first fold it to a constant.  This behavior however
>> is wrong too, and introduces a different but perhaps less important
>> regression: if we implicitly capture by value a const object that is not
>> actually odr-used within the body of the lambda, we may introduce a
>> redundant call to its copy/move constructor, see pr70121-2.C.
>
>
> In general I'm disinclined to trade one bug for another, and I'm skeptical
> that the different regression is less important; I imagine that most uses of
> const variables will be for their constant value rather than their address.

Okay, I may try to implement a complete fix for GCC 7 then.


Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class

2016-03-18 Thread H.J. Lu
On Wed, Mar 16, 2016 at 10:02 AM, H.J. Lu  wrote:
> On Wed, Mar 16, 2016 at 9:58 AM, Jason Merrill  wrote:
>> On 03/16/2016 08:38 AM, H.J. Lu wrote:
>>>
>>> FAIL: g++.dg/abi/pr60336-1.C   scan-assembler jmp[\t ]+[^$]*?_Z3xxx9true_type
>>> FAIL: g++.dg/abi/pr60336-5.C   scan-assembler jmp[\t ]+[^$]*?_Z3xxx9true_type
>>> FAIL: g++.dg/abi/pr60336-6.C   scan-assembler jmp[\t ]+[^$]*?_Z3xxx9true_type
>>> FAIL: g++.dg/abi/pr60336-7.C   scan-assembler jmp[\t ]+[^$]*?_Z3xxx9true_type
>>> FAIL: g++.dg/abi/pr60336-9.C   scan-assembler jmp[\t ]+[^$]*?_Z3xxx9true_type
>>> FAIL: g++.dg/abi/pr68355.C   scan-assembler jmp[\t ]+[^$]*?_Z3xxx17integral_constantIbLb1EE
>>
>>
>> These pass for me on x86_64, but I do see calls with -m32.
>>
>>> They are expected, since get_ref_base_and_extent needs to be
>>> changed to set bitsize to 0 for empty types so that
>>> ref_maybe_used_by_call_p_1 gets 0 as the maximum size of an empty
>>> type when it calls get_ref_base_and_extent.  Otherwise, find_tail_calls
>>> won't perform tail call optimization for functions with empty type
>>> parameters.
>>
>>
>> That isn't why the optimization isn't happening in pr68355 with -m32; the
>> .optimized dump has
>>
>>   xxx (D.2289); [tail call]
>>
>> Rather, the failure seems to happen in load_register_parameters, at
>>
>>>   /* Check for overlap with already clobbered argument area,
>>>  providing that this has non-zero size.  */
>>>   if (is_sibcall
>>>   && (size == 0
>>>   || mem_overlaps_already_clobbered_arg_p
>>>(XEXP (args[i].value, 0), size)))
>>> *sibcall_failure = 1;
>>
>>
>> The code seems to contradict the comment, and seems to have been broken by
>> r162402.  Applying this additional patch fixes those tests.
>>
>
> I am running the full test now.

On x86-64, I got

export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.dg/ubsan/object-size-9.c:11:13:
runtime error: load of address 0x00600ffa with insufficient space
for an object of type 'char'
0x00600ffa: note: pointer points here

PASS: gcc.dg/ubsan/object-size-9.c   -O2  execution test
FAIL: gcc.dg/ubsan/object-size-9.c   -O2  output pattern test
Output was:


-- 
H.J.


Re: [AArch64] Add precision choices for the reciprocal square root approximation

2016-03-18 Thread Wilco Dijkstra

Evandro Menezes  wrote:
> On 03/18/16 10:21, Wilco Dijkstra wrote:
> > Hi Evandro,
> >
> >> For example, though this approximation improves the performance
> >> noticeably for DF on A57, for SF, not so much, if at all.
> > I'm still skeptical that you ever can get any gain on scalars. I bet the
> > only gain is on 4x vectorized floats.
>
> I created a simple test that loops around an inline asm version of the
> Newton series using scalar insns and got these results on A57:

That's pure max throughput rather than answering the question whether
it speeds up code that does real work. A test that loads an array of vectors and
writes back the unit vectors would be a more realistic scenario.
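
A sketch of what such a test kernel might look like. Everything here is an assumption for illustration: the type and function names are made up, the initial estimate uses the well-known integer bit trick as a portable stand-in for a hardware estimate instruction, and two Newton steps are an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdint.h>

// Approximate 1/sqrt(x) with a rough initial guess refined by Newton steps:
//   y' = y * (1.5 - 0.5 * x * y * y)
float rsqrt_newton(float x, int steps)
{
  assert(x > 0.0f);                    // zero or negative input is not handled
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits); // reinterpret the float's bit pattern
  bits = 0x5f3759dfu - (bits >> 1);    // stand-in for a hardware estimate
  float y;
  std::memcpy(&y, &bits, sizeof y);
  for (int i = 0; i < steps; ++i)
    y = y * (1.5f - 0.5f * x * y * y);
  return y;
}

struct vec3 { float x, y, z; };

// Load an array of vectors and write back the corresponding unit vectors.
void normalize_all(const vec3 *in, vec3 *out, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    {
      float len2 = in[i].x * in[i].x + in[i].y * in[i].y + in[i].z * in[i].z;
      float r = rsqrt_newton(len2, 2);
      out[i].x = in[i].x * r;
      out[i].y = in[i].y * r;
      out[i].z = in[i].z * r;
    }
}
```

Timing normalize_all over a large array, with and without the approximation, would exercise loads, multiplies and stores around the rsqrt rather than only its peak throughput.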

Note our testing showed rsqrt slows down various benchmarks:
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00574.html.

> If I understood you correctly, would something like coarse tuning flags
> along with target-specific cost or parameters tables be what you have in
> mind?

Yes, the magic tuning flags can be coarse (on/off is good enough). If we can
agree that these expansions are really only useful for 4x vectorized code and
not much else then all we need is a function that enables it for those modes. 
Otherwise we would need per-CPU settings that select which expansions are
enabled for which modes (not just single/double).
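
To make the two options concrete, here is a hedged sketch; the enum, struct and function names are invented for illustration and do not correspond to anything in aarch64.c.

```cpp
// Hypothetical mode list; the real back end uses machine_mode.
enum approx_mode { MODE_SF, MODE_DF, MODE_V2SF, MODE_V4SF, MODE_V2DF };

// Option 1: a single function that enables the expansion only for
// 4x vectorized floats, with one coarse on/off tuning flag.
bool rsqrt_approx_simple_p(bool tune_flag_on, approx_mode mode)
{
  return tune_flag_on && mode == MODE_V4SF;
}

// Option 2: per-CPU settings, here one bit per mode.
struct cpu_approx_tuning
{
  unsigned rsqrt_modes; // bit N set => expansion enabled for mode N
};

bool rsqrt_approx_tuned_p(const cpu_approx_tuning &tune, approx_mode mode)
{
  return ((tune.rsqrt_modes >> mode) & 1u) != 0;
}
```

With the second form, a core that only benefits for DF would set just the MODE_DF bit, at the cost of carrying a mask per expansion (division, sqrt, rsqrt) in each tuning structure.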

Wilco



[gomp-nvptx 0/7] Various fixes

2016-03-18 Thread Alexander Monakov
Hello,

I have committed the following patches to amonakov/gomp-nvptx branch to fix a
few bugs uncovered in recent testing (including testing on a 32-bit ARM
platform).  Patch 1 fixes an odd mispaste in bar.c, patches 2,5,6,7 address
32-bit portability issues, patch 3 works around a deadlock on error reporting
(this is a regression that is also visible on trunk with OpenACC offloading),
and patch 4 is a slightly more comprehensive fix to nvptx debuginfo generation.

Alexander Monakov (7):
  libgomp: remove paste error in gomp_team_barrier_wait_end
  nvptx libgcc: use attribute shared
  libgomp plugin: make cuMemFreeHost error non-fatal
  nvptx backend: re-enable line info generation
  nvptx backend: use POINTER_SIZE instead of BITS_PER_WORD
  nvptx backend: change mul.u32 to mul.lo.u32
  nvptx backend: define STACK_SIZE_MODE

 gcc/ChangeLog.gomp-nvptx  | 23 +++
 gcc/config/nvptx/nvptx.c  | 21 ++---
 gcc/config/nvptx/nvptx.h  |  1 +
 libgcc/ChangeLog.gomp-nvptx   |  8 
 libgcc/config/nvptx/crt0.c| 10 --
 libgcc/config/nvptx/stacks.c  |  9 ++---
 libgomp/ChangeLog.gomp-nvptx  |  9 +
 libgomp/config/nvptx/bar.c|  2 --
 libgomp/plugin/plugin-nvptx.c |  2 +-
 9 files changed, 54 insertions(+), 31 deletions(-)



Re: [AArch64] Add precision choices for the reciprocal square root approximation

2016-03-18 Thread Evandro Menezes

On 03/18/16 17:20, Wilco Dijkstra wrote:

> Evandro Menezes wrote:
>> On 03/18/16 10:21, Wilco Dijkstra wrote:
>>> Hi Evandro,
>>>
>>>> For example, though this approximation improves the performance
>>>> noticeably for DF on A57, for SF, not so much, if at all.
>>> I'm still skeptical that you ever can get any gain on scalars. I bet the
>>> only gain is on 4x vectorized floats.
>>
>> I created a simple test that loops around an inline asm version of the
>> Newton series using scalar insns and got these results on A57:
>
> That's pure max throughput rather than answering the question whether
> it speeds up code that does real work. A test that loads an array of vectors and
> writes back the unit vectors would be a more realistic scenario.
>
> Note our testing showed rsqrt slows down various benchmarks:
> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00574.html.


I remember having seen that, but my point is that if A57 enabled this 
for only DF, it might be an overall improvement.



>> If I understood you correctly, would something like coarse tuning flags
>> along with target-specific cost or parameters tables be what you have in
>> mind?
>
> Yes, the magic tuning flags can be coarse (on/off is good enough). If we can
> agree that these expansions are really only useful for 4x vectorized code and
> not much else then all we need is a function that enables it for those modes.
> Otherwise we would need per-CPU settings that select which expansions are
> enabled for which modes (not just single/double).


Just to be clear, the flags refer to the inner mode, whether scalar or 
vector.
I'm not hopeful that it can be said that this is only useful when 
vectorized though.


--
Evandro Menezes



Patch ping

2016-03-18 Thread Jakub Jelinek
Hi!

I'd like to ping 2 patches, one mine:
  P2 PR70001 http://gcc.gnu.org/ml/gcc-patches/2016-03/msg00710.html
and one from Alex, which hasn't been pinged for quite a while, but is P1:
  P1 PR69315 https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02010.html

Thanks.

Jakub


Re: [PATCH] Change replace_rtx if from is a REG (PR target/70245, take 2)

2016-03-18 Thread Oleg Endo
On Thu, 2016-03-17 at 12:16 +0100, Jakub Jelinek wrote:

> Thus, I've reverted the patch (kept the testcase), and after some
> discussions on IRC bootstrapped/regtested on x86_64-linux and
> i686-linux the following version, which right now should change behavior
> just for the i?86 case and nothing else, so shouldn't break other targets.
> I believe at least the epiphany and sh peepholes that use replace_rtx 
> will want similar treatment, but will leave testing of that to their
> maintainers.

As far as I can see, replace_rtx is used only in one pattern in sh.md:

(define_peephole2
  [(set (match_operand:SI 0 "general_movdst_operand" "")
(match_operand:SI 1 "arith_reg_or_0_operand" ""))
   (set (match_operand:SI 2 "arith_reg_dest" "")
(if_then_else:SI (match_operator 4 "equality_comparison_operator"
   [(match_operand:SI 3 "arith_reg_operand" "")
(const_int 0)])
 (match_dup 0)
 (match_dup 2)))]
  "TARGET_SHMEDIA && peep2_reg_dead_p (2, operands[0])
   && (!REG_P (operands[1]) || GENERAL_REGISTER_P (REGNO (operands[1])))"
  [(set (match_dup 2)
(if_then_else:SI (match_dup 4) (match_dup 1) (match_dup 2)))]
{
  replace_rtx (operands[4], operands[0], operands[1]);
})

This pattern is only for SH5/SH64, which I'm planning on removing after
GCC 6.  So no concerns or further actions required here.

Cheers,
Oleg


Re: [RFA][PR rtl-optimization/70263] Fix creation of new REG_EQUIV notes

2016-03-18 Thread Jeff Law

On 03/18/2016 01:16 PM, Bernd Schmidt wrote:

> On 03/18/2016 08:14 PM, Jeff Law wrote:
>> I also added a blurb to the dump file when we create these equivalences
>> and included a test to verify the code fires.  I verified it fired on
>> x86 and x86-64.  It may or may not fire on other targets, so I left the
>> test in the i386 specific subdirectory.
>
> This is the sort of thing I'd want to do with rtl unit tests.

Yea.  Along the same lines, my patch for the coalescing problem 
introduces a new bitmap function that I'd like to cover with some unit 
tests.  I'm sure we're going to find oodles of these things as we 
continue development and ponder how to better test things than scanning 
dump files.


Jeff


Re: C++ PATCH for c++/70259 (-flifetime-dse vs. empty bases)

2016-03-18 Thread Jakub Jelinek
On Wed, Mar 16, 2016 at 02:47:09PM -0400, Jason Merrill wrote:
> The constructor for an empty class can't do the -flifetime-dse clobber
> because when the class is used as a base it might be assigned the same
> offset as a real base, so the clobber would mess with real data.

Isn't this needed also for the begin_destructor_body case?
I mean can't it clobber prematurely something that shouldn't be clobbered
yet?

> commit e1a5f038350d1881153d8f65359bd883f7452237
> Author: Jason Merrill 
> Date:   Wed Mar 16 13:46:32 2016 -0400
> 
>   PR c++/70259
>   * decl.c (start_preparsed_function): Don't clobber an empty base.

Jakub


Re: [PATCH 2/4][AArch64] Increase the loop peeling limit

2016-03-18 Thread Evandro Menezes

On 02/03/16 13:46, Evandro Menezes wrote:

On 01/08/16 16:55, Evandro Menezes wrote:

On 12/16/2015 02:11 PM, Evandro Menezes wrote:

On 12/16/2015 05:24 AM, Richard Earnshaw (lists) wrote:

On 15/12/15 23:34, Evandro Menezes wrote:

On 12/14/2015 05:26 AM, James Greenhalgh wrote:

On Thu, Dec 03, 2015 at 03:07:43PM -0600, Evandro Menezes wrote:

On 11/20/2015 05:53 AM, James Greenhalgh wrote:

On Thu, Nov 19, 2015 at 04:04:41PM -0600, Evandro Menezes wrote:

On 11/05/2015 02:51 PM, Evandro Menezes wrote:

2015-11-05  Evandro Menezes 

gcc/

* config/aarch64/aarch64.c
(aarch64_override_options_internal):
Increase loop peeling limit.

This patch increases the limit for the number of peeled insns.
With this change, I noticed no major regression in either
Geekbench v3 or SPEC CPU2000 while some benchmarks, typically FP
ones, improved significantly.

I tested this tuning on Exynos M1 and on A57. ThunderX seems to
benefit from this tuning too.  However, I'd appreciate comments
from other stakeholders.

Ping.


I'd like to leave this for a call from the port maintainers. I can see why
this leads to more opportunities for vectorization, but I'm concerned about
the wider impact on code size. Certainly I wouldn't expect this to be our
default at -O2 and below.

My gut feeling is that this doesn't really belong in the back-end (there are
presumably good reasons why the default for this parameter across GCC has
fluctuated from 400 to 100 to 200 over recent years), but as I say, I'd
like Marcus or Richard to make the call as to whether or not we take this
patch.


Please, correct me if I'm wrong, but loop peeling is enabled only
with loop unrolling (and with PGO).  If so, then extra code size is
not a concern, for this heuristic is only active when unrolling
loops, when code size is already of secondary importance.


My understanding was that loop peeling is enabled from -O2 upwards, and
is also used to partially peel unaligned loops for vectorization (allowing
the vector code to be well aligned), or to completely peel inner loops which
may then become amenable to SLP vectorization.

If I'm wrong then I take back these objections. But I was sure this
parameter was used in a number of situations outside of just
-funroll-loops/-funroll-all-loops. Certainly I remember seeing performance
sensitivities to this parameter at -O3 in some internal workloads I was
analysing.


Vectorization, including SLP, is only enabled at -O3, isn't it?  It
seems to me that peeling is only used by optimizations which already
lead to potential increase in code size.

For instance, with "-Ofast -funroll-all-loops", the total text size for
the SPEC CPU2000 suite is 26.9MB with this proposed change and 26.8MB
without it; with just "-O2", it is the same at 23.1MB regardless of this
setting.

So it seems to me that this proposal should be neutral for up to -O2.


My preference would be to not diverge from the global parameter
settings.  I haven't looked in detail at this parameter but it seems to
me there are two possible paths:

1) We could get agreement globally that the parameter should be increased.

2) We could agree that this specific use of the parameter is distinct
from some other uses and deserves a new param in its own right with a
higher value.


Here's what I have observed, not only in AArch64: architectures 
benefit differently from certain loop optimizations, especially 
those dealing with vectorization. Be it because some have plenty of 
registers for more aggressive loop unrolling, or because some have 
lower costs to vectorize.  With this, I'm trying to imply that there 
may be the case to wiggle this parameter to suit loop optimizations 
better to specific targets.  While it is not the only parameter 
related to loop optimizations, it seems to be the one with the 
desired effects, as exemplified by PPC, S390 and x86 (AOSP).  Though 
there is the possibility that they are actually side-effects, as 
Richard Biener perhaps implied in another reply.


Gents,

Any new thoughts on this proposal?


Ping?


Ping^2

--
Evandro Menezes



[PATCH] Fix PR70288

2016-03-18 Thread Richard Biener

The following fixes excessive compile-time and memory-usage needed to
build the testcases which is caused by severe mis-calculation of
size-after-unrolling because it simply assumes that conditionals
with is_gimple_min_invariant ops can be folded to a constant.
This is not always true, like for

 int a[1], b[1];
 if (a < b)
   ...

which causes the testcase to explode.  The easiest fix is to look
for a change in constness due to peeling rather than only
constness.
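
Reduced to a predicate, the idea might be sketched as follows. The struct and function names are hypothetical; the two flags stand in for is_gimple_min_invariant on the operand as it is now and constant_after_peeling on the same operand.

```cpp
// Hypothetical summary of one comparison operand.
struct operand_info
{
  bool invariant_now;           // already a constant before peeling
  bool constant_after_peeling;  // becomes (or stays) constant once peeled
};

// Count a conditional as "optimized away" only if peeling is what makes it
// constant.  If both operands were already invariant and the compare still
// exists, folding evidently did not simplify it, so do not assume it will.
bool cond_counted_as_folded(const operand_info &lhs, const operand_info &rhs)
{
  return lhs.constant_after_peeling
         && rhs.constant_after_peeling
         && (!lhs.invariant_now || !rhs.invariant_now);
}
```

For the a < b case above both operands are invariant already, so the conditional is no longer counted as free; a compare of an induction variable against a constant still is.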

Of course in the end we want to fold these (and comparing
[i] < [i] will run into the issue even w/o the fix).  But it
is not clear to me to what we should simplify the compare - replacing
it with __builtin_trap () might be best I suppose but I'm sure
it will break things out in the wild in interesting ways.  Not
optimizing is a better choice here IMHO.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-03-18  Richard Biener  

PR tree-optimization/70288
* tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Make sure
we do not estimate unsimplified all-constant conditionals or
switches as optimized away.

* gcc.dg/torture/pr70288-1.c: New testcase.
* gcc.dg/torture/pr70288-2.c: Likewise.

Index: gcc/tree-ssa-loop-ivcanon.c
===
*** gcc/tree-ssa-loop-ivcanon.c (revision 234320)
--- gcc/tree-ssa-loop-ivcanon.c (working copy)
*** tree_estimate_loop_size (struct loop *lo
*** 298,308 
  /* Conditionals.  */
  else if ((gimple_code (stmt) == GIMPLE_COND
&& constant_after_peeling (gimple_cond_lhs (stmt), stmt, 
loop)
!   && constant_after_peeling (gimple_cond_rhs (stmt), stmt, 
loop))
   || (gimple_code (stmt) == GIMPLE_SWITCH
   && constant_after_peeling (gimple_switch_index (
as_a  (stmt)),
! stmt, loop)))
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "   Constant conditional.\n");
--- 298,314 
  /* Conditionals.  */
  else if ((gimple_code (stmt) == GIMPLE_COND
&& constant_after_peeling (gimple_cond_lhs (stmt), stmt, 
loop)
!   && constant_after_peeling (gimple_cond_rhs (stmt), stmt, 
loop)
!   /* We don't simplify all constant compares so make sure
!  they are not both constant already.  See PR70288.  */
!   && (! is_gimple_min_invariant (gimple_cond_lhs (stmt))
!   || ! is_gimple_min_invariant (gimple_cond_rhs (stmt
   || (gimple_code (stmt) == GIMPLE_SWITCH
   && constant_after_peeling (gimple_switch_index (
as_a  (stmt)),
! stmt, loop)
!  && ! is_gimple_min_invariant (gimple_switch_index (
!  as_a  
(stmt)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "   Constant conditional.\n");
Index: gcc/testsuite/gcc.dg/torture/pr70288-1.c
===
*** gcc/testsuite/gcc.dg/torture/pr70288-1.c(revision 0)
--- gcc/testsuite/gcc.dg/torture/pr70288-1.c(working copy)
***
*** 0 
--- 1,36 
+ /* { dg-do compile } */
+ /* { dg-require-effective-target int32plus } */
+ 
+ int main()
+ {
+   int var6 = -1267827473;
+   do {
+   ++var6;
+   double s1_115[4], s2_108[4];
+   int var8 = -161498264;
+   do {
+ ++var8;
+ int var12 = 1260960076;
+ for (; var12 <= 1260960080; ++var12) {
+ int var13 = 1960990937;
+ do {
+ ++var13;
+ int var14 = 2128638723;
+ for (; var14 <= 2128638728; ++var14) {
+ int var22 = -1141190839;
+ do {
+ ++var22;
+ if (s2_108 > s1_115) {
+ int var23 = -890798748;
+ do {
+ ++var23;
+ long long e_119[4];
+ } while (var23 <= -890798746);
+ }
+ } while (var22 <= -1141190829);
+ }
+ } while (var13 <= 1960990946);
+ }
+   } while (var8 <= -161498254);
+   } while (var6 <= -1267827462);
+ }
Index: gcc/testsuite/gcc.dg/torture/pr70288-2.c
===
*** gcc/testsuite/gcc.dg/torture/pr70288-2.c(revision 0)
--- gcc/testsuite/gcc.dg/torture/pr70288-2.c(working copy)

Re: [RFA][PATCH][PR tree-optimization/64058] Improve and stabilize sorting of coalesce pairs

2016-03-18 Thread Jeff Law

On 03/14/2016 07:08 PM, Trevor Saunders wrote:


>> To work around the narrow API in the comparison function we have to either
>> store additional data in each node or have them available in globals.  The
>> former would be horribly wasteful, the latter is just ugly.  I choose the
>> latter in the lazy evaluation of the conflicts version.
>
> it's a bit ugly in C++98, but you can give std::sort a random object with
> operator () to compare with.

So we could just wrap the object with a class that has a suitable 
operator?  I like that much more than mucking around with global 
variables.  Let me give that a whirl.


Jeff
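
The comparison-object idea discussed above can be sketched roughly as follows.  This is only an illustration written in C++98 style: pair_t, conflict_info, compare_pairs, sort_pairs and the ordering rules (cost first, then conflict count, then element indices) are hypothetical stand-ins, not the structures or the ordering used in the actual coalescing code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a coalesce pair; the real GCC structure differs.
struct pair_t
{
  int first_element;
  int second_element;
  int cost;
};

// Hypothetical stand-in for the extra data the comparison needs.
struct conflict_info
{
  std::vector<int> conflicts_per_element;
};

// Comparison object: it carries a pointer to the extra data, so the
// comparison does not have to read it from global variables.
class compare_pairs
{
public:
  explicit compare_pairs (const conflict_info *info) : m_info (info) {}

  // Assumed ordering: higher cost first, then fewer conflicts, then the
  // element indices so that the result is a deterministic total order.
  bool operator() (const pair_t &a, const pair_t &b) const
  {
    if (a.cost != b.cost)
      return a.cost > b.cost;
    const int ca = conflicts (a);
    const int cb = conflicts (b);
    if (ca != cb)
      return ca < cb;
    if (a.first_element != b.first_element)
      return a.first_element < b.first_element;
    return a.second_element < b.second_element;
  }

private:
  // Sum of the conflict counts of both elements of the pair.
  int conflicts (const pair_t &p) const
  {
    const std::size_t n = m_info->conflicts_per_element.size ();
    assert (p.first_element >= 0 && std::size_t (p.first_element) < n);
    assert (p.second_element >= 0 && std::size_t (p.second_element) < n);
    return m_info->conflicts_per_element[p.first_element]
           + m_info->conflicts_per_element[p.second_element];
  }

  const conflict_info *m_info;
};

// Sort the pairs using the comparison object instead of globals.
void sort_pairs (std::vector<pair_t> &pairs, const conflict_info &info)
{
  std::sort (pairs.begin (), pairs.end (), compare_pairs (&info));
}
```

Because the final tie-breaks compare the element indices, two distinct pairs never compare as equivalent, which is one way to keep the result independent of how the sort implementation treats ties.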


Re: [PATCH] Fix PR64764

2016-03-18 Thread Richard Biener
On Wed, 16 Mar 2016, H.J. Lu wrote:

> On Wed, Mar 16, 2016 at 9:41 AM, H.J. Lu  wrote:
> > On Wed, Mar 16, 2016 at 9:35 AM, Tom de Vries  
> > wrote:
> >> On 16/03/16 17:15, H.J. Lu wrote:
> >>>
> >>> On Wed, Mar 16, 2016 at 9:12 AM, H.J. Lu  wrote:
> >>
> >>
>  Any particular reason why this test was changed to DOS format?
> >>
> >>
> >> FWIW, the test was in DOS format from the start.
> >>
> >>
> >
> > DOS format was introduced by r220530:
> >
> > Index: gcc.dg/uninit-19.c
> > ===
> > --- gcc.dg/uninit-19.c  (revision 220529)
> > +++ gcc.dg/uninit-19.c  (revision 220530)
> > @@ -10,7 +10,7 @@ fn1 (int p1, float *f1, float *f2, float
> >   unsigned char *c2, float *p10)^M
> >  {^M
> >if (p1 & 8)^M
> > -b[3] = p10[a];  /* { dg-warning "may be used uninitialized" } */^M
> > +b[3] = p10[a];  /* 13.  */^M
> >  }^M
> >  ^M
> >  void^M
> > @@ -19,5 +19,8 @@ fn2 ()
> >float *n;^M
> >if (l & 6)^M
> >  n =  + m;^M
> > -  fn1 (l, , , , , , , n);^M
> > +  fn1 (l, , , , , , , n);  /* 22.  */^M
> >  }^M
> > +^M
> > +/* { dg-warning "may be used uninitialized" "" { target nonpic } 13 } */^M
> > +/* { dg-warning "may be used uninitialized" "" { target { ! nonpic }
> > } 22 } */^M
> >
> > "^M" was added to those changed lines.
> >
> 
> Never mind.  "^M" was there before.

Happens from time to time when I download testcases from bugzilla
attachments.  If they are DOS format they stay that way.  Then often
I just edit 't.c' by removing everything and pasting something new in
which seems to retain DOS format as well.

Richard.


[PATCH, rs6000] Add support for xxpermr and vpermr instructions

2016-03-18 Thread Kelvin Nilsen


This patch adds support for two new Power9 instructions, xxpermr and 
vpermr, providing more efficient vector permutation operations on
little-endian configurations. These new instructions are described in
the Power ISA 3.0 document.  Selection of the new instructions is
conditioned upon TARGET_P9_VECTOR and !VECTOR_ELT_ORDER_BIG.
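
As a rough illustration of why a "reversed" permute can replace the existing little-endian sequence, here is a byte-level scalar model.  It assumes that the "r" forms pick source byte 31 - index where the plain forms pick source byte index; that reading, and the names permute, permute_reversed and inverted_selector_matches_reversed, are assumptions made for this sketch and are not taken from the patch.  Operand ordering and register classes are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of a plain permute over a 32-byte source:
// out[i] = src[low 5 bits of sel[i]].
void permute (const std::uint8_t src[32], const std::uint8_t sel[16],
              std::uint8_t out[16])
{
  for (std::size_t i = 0; i < 16; ++i)
    out[i] = src[sel[i] & 31];
}

// Model of the assumed "reversed" permute:
// out[i] = src[31 - (low 5 bits of sel[i])].
void permute_reversed (const std::uint8_t src[32], const std::uint8_t sel[16],
                       std::uint8_t out[16])
{
  for (std::size_t i = 0; i < 16; ++i)
    out[i] = src[31 - (sel[i] & 31)];
}

// Check, for every possible selector byte value, that a plain permute
// with a bitwise-inverted selector equals the reversed permute with the
// original selector.
bool inverted_selector_matches_reversed ()
{
  std::uint8_t src[32];
  for (std::size_t i = 0; i < 32; ++i)
    src[i] = static_cast<std::uint8_t> (100 + i);

  for (unsigned base = 0; base < 256; base += 16)
    {
      std::uint8_t sel[16], inverted[16], a[16], b[16];
      for (std::size_t i = 0; i < 16; ++i)
        {
          sel[i] = static_cast<std::uint8_t> (base + i);
          inverted[i] = static_cast<std::uint8_t> (~sel[i]);
        }
      permute (src, inverted, a);
      permute_reversed (src, sel, b);
      for (std::size_t i = 0; i < 16; ++i)
        if (a[i] != b[i])
          return false;
    }
  return true;
}

// Small self-check that can be called from a test driver.
void check_permute_model ()
{
  assert (inverted_selector_matches_reversed ());
}
```

The identity behind it is that complementing a 5-bit index v gives 31 - v, so under this model the separate selector-inversion step becomes unnecessary once the hardware does the index reversal itself.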

The patch has been bootstrapped and tested on powerpc64le-unknown-linux-gnu
and powerpc64-unknown-linux-gnu with no regressions.  Is this ok for GCC 
7 when stage 1 opens?


(A previous version of this patch was distributed and approved, but 
further experience with testing of P9 fusion instructions revealed a 
problem with that particular code expansion.  So this new revision of 
the patch omits the fusion instruction generation pattern.)


Thanks.

gcc/testsuite/ChangeLog:

2016-03-17  Kelvin Nilsen  

* gcc.target/powerpc/p9-permute.c: Generalize test to run on
big-endian Power9 in addition to little-endian Power9.
* gcc.target/powerpc/p9-vpermr.c: New test.


gcc/ChangeLog:

2016-03-17  Kelvin Nilsen  

* config/rs6000/altivec.md: (UNSPEC_VPERMR): New unspec
constant.
(*altivec_vpermr__internal): New insn.
* config/rs6000/rs6000.c (rs6000_expand_vector_set): If
!BYTES_BIG_ENDIAN and TARGET_P9_VECTOR, expand using template
that translates into new xxpermr or vpermr instructions.
(altivec_expand_vec_perm_le): if TARGET_P9_VECTOR, expand using
template that translates into new xxpermr or vpermr
instructions.

--
Kelvin Nilsen, Ph.D.  kdnil...@linux.vnet.ibm.com
home office: 801-756-4821, cell: 520-991-6727
IBM Linux Technology Center - PPC Toolchain
Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md(revision 233539)
+++ gcc/config/rs6000/altivec.md(working copy)
@@ -58,6 +58,7 @@
UNSPEC_VSUM2SWS
UNSPEC_VSUMSWS
UNSPEC_VPERM
+   UNSPEC_VPERMR
UNSPEC_VPERM_UNS
UNSPEC_VRFIN
UNSPEC_VCFUX
@@ -1962,6 +1963,19 @@
   [(set_attr "type" "vecperm")
(set_attr "length" "4,4,8")])
 
+(define_insn "*altivec_vpermr__internal"
+  [(set (match_operand:VM 0 "register_operand" "=v,?wo")
+   (unspec:VM [(match_operand:VM 1 "register_operand" "v,0")
+   (match_operand:VM 2 "register_operand" "v,wo")
+   (match_operand:V16QI 3 "register_operand" "v,wo")]
+  UNSPEC_VPERMR))]
+  "TARGET_P9_VECTOR"
+  "@
+   vpermr %0,%1,%2,%3
+   xxpermr %x0,%x2,%x3"
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "4,4")])
+
 (define_insn "altivec_vperm_v8hiv16qi"
   [(set (match_operand:V16QI 0 "register_operand" "=v,?wo,?")
(unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v,0,wo")
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 233539)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6553,19 +6553,27 @@ rs6000_expand_vector_set (rtx target, rtx val, int
UNSPEC_VPERM);
   else 
 {
-  /* Invert selector.  We prefer to generate VNAND on P8 so
- that future fusion opportunities can kick in, but must
- generate VNOR elsewhere.  */
-  rtx notx = gen_rtx_NOT (V16QImode, force_reg (V16QImode, x));
-  rtx iorx = (TARGET_P8_VECTOR
- ? gen_rtx_IOR (V16QImode, notx, notx)
- : gen_rtx_AND (V16QImode, notx, notx));
-  rtx tmp = gen_reg_rtx (V16QImode);
-  emit_insn (gen_rtx_SET (tmp, iorx));
-
-  /* Permute with operands reversed and adjusted selector.  */
-  x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp),
- UNSPEC_VPERM);
+  if (TARGET_P9_VECTOR)
+   x = gen_rtx_UNSPEC (mode,
+   gen_rtvec (3, target, reg, 
+  force_reg (V16QImode, x)),
+   UNSPEC_VPERMR);
+  else
+   {
+ /* Invert selector.  We prefer to generate VNAND on P8 so
+that future fusion opportunities can kick in, but must
+generate VNOR elsewhere.  */
+ rtx notx = gen_rtx_NOT (V16QImode, force_reg (V16QImode, x));
+ rtx iorx = (TARGET_P8_VECTOR
+ ? gen_rtx_IOR (V16QImode, notx, notx)
+ : gen_rtx_AND (V16QImode, notx, notx));
+ rtx tmp = gen_reg_rtx (V16QImode);
+ emit_insn (gen_rtx_SET (tmp, iorx));
+ 
+ /* Permute with operands reversed and adjusted selector.  */
+ x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp),
+ UNSPEC_VPERM);
+   }
 }
 
   emit_insn (gen_rtx_SET (target, x));
@@ -33421,18 +33429,26 @@ altivec_expand_vec_perm_le (rtx operands[4])
   if (!REG_P (target))
 tmp = gen_reg_rtx (mode);
 
-  /* Invert the selector 

Re: [C PATCH] Prevent -Wunused-value warning with __atomic_fetch_* (PR c/69407)

2016-03-18 Thread Uros Bizjak
On Fri, Mar 18, 2016 at 4:33 PM, Uros Bizjak  wrote:
> On Mon, Mar 7, 2016 at 2:34 PM, Marek Polacek  wrote:
>> On Fri, Mar 04, 2016 at 07:17:46PM +0100, Uros Bizjak wrote:
>>> Hello!
>>>
>>> > This is not a regression but I thought I'd post this anyway.  Martin 
>>> > reported
>>> > that we generate -Wunused-value warnings on the attached testcase, which
>>> > arguably doesn't make sense.  Setting TREE_USED suppresses the warning.  
>>> > Since
>>> > we already compute 'fetch_op' I used that.  (This warning doesn't trigger 
>>> > e.g.
>>> > for __atomic_load/store/compare.)
>>> >
>>> > Bootstrapped/regtested on x86_64-linux, ok for trunk or gcc7?
>>> >
>>> > 2016-03-04  Marek Polacek  
>>> >
>>> > PR c/69407
>>> > * c-common.c (resolve_overloaded_builtin): Set TREE_USED for the fetch
>>> > operations.
>>> >
>>> > * gcc.dg/atomic-op-6.c: New test.
>>>
>>> You can probably revert my workaround [1] that suppressed these
>>> warnings in libsupc++/guard.cc.
>>
>> Ah, thanks for the heads-up, I'll do that once I get the patch in.
>
> I have committed the attached revert after bootstrap on
> x86_64-linux-gnu {,-m32}. There were no warnings when compiling
> guard.cc.
>
> 2016-03-18  Uros Bizjak  
>
> Revert:
> 2015-07-02  Uros Bizjak  
>
> * libsupc++/guard.cc (__test_and_acquire): Use __p after __atomic_load
> to avoid unused variable warning.
> (__set_and_release): Use __p after __atomic_store to avoid unused
> variable warning.

Whoops, I looked at the wrong part of the build log ... unfortunately,
the warning still happens, and I have to revert the revert ...

Sorry for the noise,
Uros.
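
For readers without the PR at hand, the kind of statement under discussion looks roughly like the sketch below: an __atomic_fetch_* builtin called only for its side effect, with the returned old value discarded.  This is not the testcase attached to the PR; counter, bump, bump_and_get_old and check_atomic_fetch are hypothetical names, and whether a given compiler version warns on it is not claimed here.

```cpp
#include <cassert>

// Hypothetical shared counter used only for this illustration.
static int counter = 0;

// The fetch-and-add is used purely for its side effect; the old value
// it returns is deliberately ignored.
void bump ()
{
  __atomic_fetch_add (&counter, 1, __ATOMIC_SEQ_CST);
}

// The same builtin with its result used: returns the value before the add.
int bump_and_get_old ()
{
  return __atomic_fetch_add (&counter, 1, __ATOMIC_SEQ_CST);
}

// Small self-check that can be called from a test driver.
void check_atomic_fetch ()
{
  counter = 0;
  bump ();
  assert (counter == 1);
  assert (bump_and_get_old () == 1);
  assert (counter == 2);
}
```

Both forms perform the same atomic update; the only difference is whether the caller looks at the returned value.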


Re: [PATCH, i386, AVX-512] Emit vpbroadcastq instead if non-existent vbroadcastsd.

2016-03-18 Thread Uros Bizjak
On Fri, Mar 18, 2016 at 10:44 AM, Kirill Yukhin  wrote:
> Hello,
> The Intel spec [1] states that almost all variants of the broadcast
> instructions are available, except for (p. 2-4)
> vbroadcastsd %xmm, %xmm
> It is safe to emit
> vpbroadcastq %xmm, %xmm
> instead.
>
> I was unable to extract a testcase, but if this insn is generated,
> we'll get an asm error.
>
> [1] - 
> https://software.intel.com/sites/default/files/managed/b4/3a/319433-024.pdf
>
> Bootstrapped and regtested.
>
> Richard,
> is it ok to check in to main trunk?
>
> gcc/
> * config/i386/sse.md: Use vpbroadcastq for broadcasting DF
> values to 128b regs.

OK.

Thanks,
Uros.
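
The reason an integer broadcast can stand in for the missing DF one can be sketched with a scalar model: a broadcast only copies a 64-bit pattern into every lane, so it does not matter whether those bits are viewed as a double or as a quadword.  The model below is illustrative only; xmm_model, broadcast_q, broadcast_double_via_q and lane_as_double are hypothetical names, and no real intrinsics or instructions are used.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert (sizeof (double) == sizeof (std::uint64_t),
               "this sketch assumes a 64-bit double");

// Hypothetical model of a 128-bit register as two 64-bit lanes.
struct xmm_model
{
  std::uint64_t lane[2];
};

// Model of an integer quadword broadcast: copy the bits to both lanes.
xmm_model broadcast_q (std::uint64_t bits)
{
  xmm_model r;
  r.lane[0] = bits;
  r.lane[1] = bits;
  return r;
}

// Broadcast a double by passing its bit pattern through the integer
// broadcast; no floating-point operation touches the value.
xmm_model broadcast_double_via_q (double d)
{
  std::uint64_t bits;
  std::memcpy (&bits, &d, sizeof bits);
  return broadcast_q (bits);
}

// Reinterpret one lane of the model register as a double.
double lane_as_double (const xmm_model &r, int i)
{
  double d;
  std::memcpy (&d, &r.lane[i], sizeof d);
  return d;
}

// Small self-check that can be called from a test driver.
void check_broadcast_model ()
{
  const xmm_model r = broadcast_double_via_q (1.5);
  assert (lane_as_double (r, 0) == 1.5);
  assert (lane_as_double (r, 1) == 1.5);
}
```

Since the bits are copied unchanged, special values such as NaNs and signed zeros come through with the same bit pattern in every lane.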

> --
> Thanks, K
>
> commit 72e85f1b936d61edc93603862c810a8b4817b8a7
> Author: Kirill Yukhin 
> Date:   Thu Mar 17 18:05:22 2016 +0300
>
> AVX-512. Use vpbroadcastq for broadcasting DF values to 128b regs.
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 3c521b3..b25c246 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17269,7 +17269,14 @@
> (match_operand: 1 "nonimmediate_operand" "vm")
> (parallel [(const_int 0)]]
>"TARGET_AVX512F"
> -  "vbroadcast\t{%1, %0|%0, %1}"
> +{
> +  /*  There is no DF broadcast (in AVX-512*) to 128b register.
> +  Mimic it with integer variant.  */
> +  if (mode == V2DFmode)
> +return "vpbroadcastq\t{%1, %0|%0, %1}";
> +  else
> +return "vbroadcast\t{%1, %0|%0, %1}";
> +}
>[(set_attr "type" "ssemov")
> (set_attr "prefix" "evex")
> (set_attr "mode" "")])