Re: [committed][PATCH] Change order of processing blocks/threads in tree-ssa-threadupdate.c

2017-11-14 Thread Aldy Hernandez



On 11/14/2017 10:46 PM, Jeff Law wrote:

With my local patches to remove jump threading from VRP I was seeing a
fairly obvious jump threading path left in the CFG after DOM.  This
missed jump thread ultimately caused a false positive uninitialized warning.


This wouldn't be uninit-pred-[68]* or some such, which I also trigger 
when messing around with the backwards threader?



ping-ponging, but not due to this patch AFAICT.  Also verified by visual
inspection that the first DOM pass fully threaded the code in question
when using a local branch that has my removal of threading from tree-vrp
patches installed and bootstrapping that branch.


If DOM dumps the threads to the dump file, you may want to bake that 
into a GIMPLE FE test.



Installing on the trunk.


You forgot to attach the patch :).

Aldy


Re: [patch] backwards threader cleanups

2017-11-14 Thread Aldy Hernandez



On 11/14/2017 02:38 PM, David Malcolm wrote:

On Tue, 2017-11-14 at 14:08 -0500, Aldy Hernandez wrote:



   https://gcc.gnu.org/codingconventions.html#Class_Form
says that:

"When defining a class, first [...]
declare all public member functions,
[...]
then declare all non-public member functions, and
then declare all non-public member variables."


Wow, I did not expect that order.  Fixed.



Should the class have a ctor?  I see in
   thread_jumps::find_jump_threads_backwards
below that you have:


+  /* Initialize the pass local data that changes at each iteration.  */
+  path.truncate (0);
+  path.safe_push (bb);
+  visited_bbs.empty ();
+  seen_loop_phi = false;
+  this->speed_p = speed_p;


As the comment says, these are per iteration, so I can't just put them 
in a constructor.  Perhaps I should say "per basic block" to make this 
clearer.




(Is this a self-assign from this->speed_p? should the "speed_p" param
be renamed, e.g. to "speed_p_")


Yes.  Fixed.




max_threaded_paths = PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATHS);


...but I'm not familiar enough with the code in question to be able
to know if it makes sense to move this initialization logic into a ctor
(i.e. is it per BB, or per CFG)


Per BB.  I've lumped this with the block above that now says "per basic 
block".
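The division of labor settled here (per-pass state tied to the object's lifetime, per-basic-block state reset at the public entry point) can be sketched outside GCC.  All names below are illustrative stand-ins, not the actual thread_jumps members:

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of the pattern discussed above: state that is
// constant for the whole pass could live in a constructor, while
// per-basic-block state must be (re)initialized in the public entry
// point, once per block.
class threader_sketch
{
public:
  void find_threads (int bb, bool speed_p_arg)
  {
    /* Per basic block: reset on every call.  */
    m_path.clear ();
    m_path.push_back (bb);
    m_visited.clear ();
    m_speed_p = speed_p_arg;
    m_max_paths = 20;   /* Stands in for PARAM_VALUE (...).  */
  }
  const std::vector<int> &path () const { return m_path; }

private:
  std::vector<int> m_path;      /* Current candidate path.  */
  std::vector<int> m_visited;   /* Blocks already seen for this BB.  */
  bool m_speed_p = false;
  int m_max_paths = 0;
};
```

Because each call wipes the per-block state, one object can be constructed per pass and reused across every block, which is exactly why a constructor alone cannot do this job.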




[...snip...]


@@ -823,11 +810,12 @@ pass_thread_jumps::execute (function *fun)
   loop_optimizer_init (LOOPS_HAVE_PREHEADERS | LOOPS_HAVE_SIMPLE_LATCHES);
 
   /* Try to thread each block with more than one successor.  */
+  thread_jumps pass;
   basic_block bb;
   FOR_EACH_BB_FN (bb, fun)
     {
       if (EDGE_COUNT (bb->succs) > 1)
-	find_jump_threads_backwards (bb, true);
+	pass.find_jump_threads_backwards (bb, true);
     }
   bool changed = thread_through_all_blocks (true);


Is it just me, or is "pass" a bit non-descript here?
How about "threader" or somesuch?


Done.





@@ -883,11 +871,12 @@ pass_early_thread_jumps::execute (function *fun)
   loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
 
   /* Try to thread each block with more than one successor.  */
+  thread_jumps pass;
   basic_block bb;
   FOR_EACH_BB_FN (bb, fun)
     {
       if (EDGE_COUNT (bb->succs) > 1)
-	find_jump_threads_backwards (bb, false);
+	pass.find_jump_threads_backwards (bb, false);
     }


Similarly here

[...snip...]


Hope this is constructive


Yes, very.  Thanks!

Updated patch attached.
Aldy
gcc/

	* hash-set.h (hash_set::empty): New.
	* tree-ssa-threadbackward.h: Remove.
	* tree-ssa-threadbackward.c (class thread_jumps): New.
	Move max_threaded_paths into class.
	(fsm_find_thread_path): Remove arguments that are now in class.
	(profitable_jump_thread_path): Rename to...
	(thread_jumps::profitable_jump_thread_path): ...this.
	(convert_and_register_jump_thread_path): Rename to...
	(thread_jumps::convert_and_register_current_path): ...this.
	(check_subpath_and_update_thread_path): Rename to...
	(thread_jumps::check_subpath_and_update_thread_path): ...this.
	(register_jump_thread_path_if_profitable): Rename to...
	(thread_jumps::register_jump_thread_path_if_profitable): ...this.
	(handle_phi): Rename to...
	(thread_jumps::handle_phi): ...this.
	(handle_assignment): Rename to...
	(thread_jumps::handle_assignment): ...this.
	(fsm_find_control_statement_thread_paths): Rename to...
	(thread_jumps::fsm_find_control_statement_thread_paths): ...this.
	(find_jump_threads_backwards): Rename to...
	(thread_jumps::find_jump_threads_backwards): ...this.
	Initialize path local data.
	(pass_thread_jumps::execute): Call find_jump_threads_backwards
	from within thread_jumps class.
	(pass_early_thread_jumps::execute): Same.

diff --git a/gcc/hash-set.h b/gcc/hash-set.h
index d2247d39571..8ce796d1c48 100644
--- a/gcc/hash-set.h
+++ b/gcc/hash-set.h
@@ -80,6 +80,10 @@ public:
 
   size_t elements () const { return m_table.elements (); }
 
+  /* Clear the hash table.  */
+
+  void empty () { m_table.empty (); }
+
   class iterator
   {
   public:
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 12bc80350f5..5396f8e6761 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -38,7 +38,35 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-inline.h"
 #include "tree-vectorizer.h"
 
-static int max_threaded_paths;
+class thread_jumps
+{
+ public:
+  void find_jump_threads_backwards (basic_block bb, bool speed_p);
+ private:
+  edge profitable_jump_thread_path (basic_block bbi, tree name, tree arg,
+				    bool *creates_irreducible_loop);
+  void convert_and_register_current_path (edge taken_edge);
+  void register_jump_thread_path_if_profitable (tree name, tree arg,
+						basic_block def_bb);
+  void handle_assignment (gimple *stmt, tree name, basic_block def_bb);
+  void handle_phi (gphi *phi, tree name, basic_block def_bb);
+  void fsm_find_control_statement_thread_paths (tree name);
+  bool check_subpath_and_update_thread_path

Re: [PATCH][RFC] Instrument function exit with __builtin_unreachable in C++.

2017-11-14 Thread Martin Liška
PING^1

On 11/07/2017 11:08 AM, Martin Liška wrote:
> On 11/06/2017 06:33 PM, Jakub Jelinek wrote:
>> On Mon, Nov 06, 2017 at 06:23:11PM +0100, Eric Botcazou wrote:
>>>> Thank you for review, done that.
>>>
>>> This has enabled -Wreturn-type for Ada, which we don't want since the
>>> warning is outsmarted by the language, so I have applied this.
>>>
>>>
>>> 2017-11-06  Eric Botcazou  
>>>
>>> * gcc-interface/misc.c (gnat_post_options): Clear warn_return_type.
>>
>> Hasn't it enabled it also for FEs other than the C family and Fortran?
>> Say jit, brig, go, lto?, ...
>> I think better would be to remove the initialization to -1 and revert the
>> fortran/options.c change, and instead use in the C family:
>>   if (!global_options_set.x_warn_return_type)
>>     warn_return_type = c_dialect_cxx ();
>>
>> Unless it for some reason doesn't work for -Wall or -W or similar.
>>
>>  Jakub
>>
> 
> Hello.
> 
> Sorry for the inconvenience, however using Jakub's approach really does
> not work properly with -Wall.
> 
> Thus I'm suggesting the following patch that will disable it in all
> *_post_options hooks.
> 
> Ready after it survives regression tests?
> Martin
> 



Re: [PATCH] Fix UBSAN errors in dse.c (PR rtl-optimization/82044).

2017-11-14 Thread Martin Liška
On 11/08/2017 05:31 PM, Jeff Law wrote:
> I don't see an updated patch in this thread?  THe last message I see is
> this one where you indicate you're going to tweak the patch and re-test.
> 
> Jeff

Yes, I tweaked and tested following patch.

Martin
From a369ac78b887e219a375e17d6817c1f744e71779 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 19 Oct 2017 13:38:01 +0200
Subject: [PATCH] Fix UBSAN errors in dse.c (PR rtl-optimization/82044).

gcc/ChangeLog:

2017-10-19  Martin Liska  

	PR rtl-optimization/82044
	PR tree-optimization/82042
	* dse.c (check_mem_read_rtx): Check for overflow.
---
 gcc/dse.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/dse.c b/gcc/dse.c
index 563ca9f56f3..f6d5e6e6fe2 100644
--- a/gcc/dse.c
+++ b/gcc/dse.c
@@ -1981,6 +1981,12 @@ check_mem_read_rtx (rtx *loc, bb_info_t bb_info)
   else
     width = GET_MODE_SIZE (GET_MODE (mem));
 
+  if (offset > HOST_WIDE_INT_MAX - width)
+    {
+      clear_rhs_from_active_local_stores ();
+      return;
+    }
+
   read_info = read_info_type_pool.allocate ();
   read_info->group_id = group_id;
   read_info->mem = mem;
-- 
2.14.3
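The guard added above avoids forming `offset + width` when that sum would overflow a HOST_WIDE_INT: test `offset > MAX - width` first, where the subtraction is always safe.  A minimal standalone sketch of the same idiom, with `long long` standing in for HOST_WIDE_INT and `read_range_ok` as a hypothetical name:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Return true only when [offset, offset + width) can be formed without
   signed overflow; mirrors the `offset > HOST_WIDE_INT_MAX - width`
   test in the patch.  */
static bool
read_range_ok (long long offset, long long width)
{
  /* LLONG_MAX - width cannot overflow when width > 0, so this
     comparison is safe where `offset + width` would not be.  */
  if (width > 0 && offset > LLONG_MAX - width)
    return false;
  return true;
}
```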



[PATCH][Aarch64] v2: Arithmetic overflow tests [Patch 4/4]

2017-11-14 Thread Michael Collison
This is a respin of an AArch64 patch that adds support for builtin arithmetic 
overflow operations. This update separates the patch into multiple pieces and 
addresses comments made by Richard Earnshaw here:

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00249.html

Original patch and motivation for patch here:

https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01512.html

This patch contains new test cases to verify that the new overflow patterns are 
being utilized.

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

2017-10-26  Michael Collison  
Richard Henderson 

* gcc.target/aarch64/builtin_sadd_128.c: New testcase.
* gcc.target/aarch64/builtin_saddl.c: New testcase.
* gcc.target/aarch64/builtin_saddll.c: New testcase.
* gcc.target/aarch64/builtin_uadd_128.c: New testcase.
* gcc.target/aarch64/builtin_uaddl.c: New testcase.
* gcc.target/aarch64/builtin_uaddll.c: New testcase.
* gcc.target/aarch64/builtin_ssub_128.c: New testcase.
* gcc.target/aarch64/builtin_ssubl.c: New testcase.
* gcc.target/aarch64/builtin_ssubll.c: New testcase.
* gcc.target/aarch64/builtin_usub_128.c: New testcase.
* gcc.target/aarch64/builtin_usubl.c: New testcase.
* gcc.target/aarch64/builtin_usubll.c: New testcase.
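The testcases above exercise GCC's generic overflow builtins at the various widths; roughly, each checks that code like the following compiles to a flag-setting add/sub plus a branch on the overflow flag.  This standalone sketch only demonstrates the builtins' semantics, not the aarch64 assembly scan the real tests perform:

```c
#include <assert.h>
#include <limits.h>

/* __builtin_add_overflow returns nonzero when the infinite-precision
   sum does not fit the destination type; on aarch64 the goal of the
   patch series is for this to become an add that sets the overflow
   flag rather than a widened computation plus comparison.  */
static int
saddl_overflows (long a, long b, long *res)
{
  return __builtin_add_overflow (a, b, res);
}
```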


gnutools-6308-tests-v1.patch.patch
Description: gnutools-6308-tests-v1.patch.patch


[PATCH][Aarch64] v2: Arithmetic overflow subv patterns [Patch 3/4]

2017-11-14 Thread Michael Collison
This is a re-spin of an AArch64 patch that adds support for builtin arithmetic 
overflow operations. This update separates the patch into multiple pieces and 
addresses comments made by Richard Earnshaw here:

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00249.html

Original patch and motivation for patch here:

https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01512.html

This patch contains new patterns for subv overflow operations.

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

2017-10-26  Michael Collison  
Richard Henderson 

* config/aarch64/aarch64.md (subv4, usubv4): New patterns.
(subti): Handle op1 zero.
(subvti4, usub4ti4): New.
(*sub3_compare1_imm): New.
(sub3_carryinCV): New.
(*sub3_carryinCV_z1_z2, *sub3_carryinCV_z1): New.
(*sub3_carryinCV_z2, *sub3_carryinCV): New.



gnutools-6308-subv-v2.patch.patch
Description: gnutools-6308-subv-v2.patch.patch


[PATCH][Aarch64] v2: Arithmetic overflow addv patterns [Patch 2/4]

2017-11-14 Thread Michael Collison
This is a respin of an AArch64 patch that adds support for builtin arithmetic 
overflow operations. This update separates the patch into multiple pieces and 
addresses comments made by Richard Earnshaw here:

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00249.html

Original patch and motivation for patch here:

https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01512.html

This patch contains new patterns for addv overflow operations.

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?


2017-10-26  Michael Collison  
Richard Henderson 

* config/aarch64/aarch64.md (addv4, uaddv4): New.
(addti3): Create simpler code if low part is already known to be 0.
(addvti4, uaddvti4): New.
(*add3_compareC_cconly_imm): New.
(*add3_compareC_cconly): New.
(*add3_compareC_imm): New.
(*add3_compareC): Rename from add3_compare1; do not
handle constants within this pattern.
(*add3_compareV_cconly_imm): New.
(*add3_compareV_cconly): New.
(*add3_compareV_imm): New.
(add3_compareV): New.
(add3_carryinC, add3_carryinV): New.
(*add3_carryinC_zero, *add3_carryinV_zero): New.
(*add3_carryinC, *add3_carryinV): New.
(*add3_compareC_cconly_imm): Replace 'ne' operator
with 'comparison' operator.
(*add3_compareV_cconly_imm): Ditto.
(*add3_compareV_cconly): Ditto.
(*add3_compareV_imm): Ditto.
(add3_compareV): Ditto.
(add3_carryinC): Ditto.
(*add3_carryinC_zero): Ditto.
(*add3_carryinC): Ditto.
(add3_carryinV): Ditto.
(*add3_carryinV_zero): Ditto.
(*add3_carryinV): Ditto.


gnutools-6308-addv-v2.patch.patch
Description: gnutools-6308-addv-v2.patch.patch


[PATCH][Aarch64] v2: Arithmetic overflow common functions [Patch 1/4]

2017-11-14 Thread Michael Collison
This is a respin of an AArch64 patch that adds support for builtin arithmetic 
overflow operations. This update separates the patch into multiple pieces and 
addresses comments made by Richard Earnshaw here:

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00249.html

Original patch and motivation for patch here:

https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01512.html

This patch primarily contains common functions in aarch64.c for generating
TImode scratch registers, and common rtl functions utilized by the overflow
patterns in aarch64.md.  In addition, a new mode representing overflow,
CC_Vmode, is introduced.

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

2017-10-26  Michael Collison  
Richard Henderson 

* config/aarch64/aarch64-modes.def (CC_V): New.
* config/aarch64/aarch64-protos.h
(aarch64_add_128bit_scratch_regs): Declare.
(aarch64_subv_128bit_scratch_regs): Declare.
(aarch64_expand_subvti): Declare.
(aarch64_gen_unlikely_cbranch): Declare.
* config/aarch64/aarch64.c (aarch64_select_cc_mode): Test
for signed overflow using CC_Vmode.
(aarch64_get_condition_code_1): Handle CC_Vmode.
(aarch64_gen_unlikely_cbranch): New function.
(aarch64_add_128bit_scratch_regs): New function.
(aarch64_subv_128bit_scratch_regs): New function.
(aarch64_expand_subvti): New function.


gnutools-6308-common-v2.patch.patch
Description: gnutools-6308-common-v2.patch.patch


[PATCH][OBVIOUS] Coverage: remove -fkeep-inline-functions from coverage_flags.

2017-11-14 Thread Martin Liška
Hello.

This is a partial revert of my commit r254257, because having
-fkeep-inline-functions causes many unresolved symbols, and it would force
us to add many objects, e.g. for libcommon.a.

I'm going to install the patch.

gcc/ChangeLog:

2017-11-15  Martin Liska  

* configure.ac: Remove -fkeep-inline-functions from coverage_flags.
* configure: Regenerate.
---
 gcc/configure| 4 ++--
 gcc/configure.ac | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)


diff --git a/gcc/configure b/gcc/configure
index fb40ead9204..05db8a24e24 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -7323,10 +7323,10 @@ fi
 if test "${enable_coverage+set}" = set; then :
   enableval=$enable_coverage; case "${enableval}" in
   yes|noopt)
-coverage_flags="-fprofile-arcs -ftest-coverage -frandom-seed=\$@ -O0 -fkeep-inline-functions -fkeep-static-functions"
+coverage_flags="-fprofile-arcs -ftest-coverage -frandom-seed=\$@ -O0 -fkeep-static-functions"
 ;;
   opt)
-coverage_flags="-fprofile-arcs -ftest-coverage -frandom-seed=\$@ -O2 -fkeep-inline-functions -fkeep-static-functions"
+coverage_flags="-fprofile-arcs -ftest-coverage -frandom-seed=\$@ -O2 -fkeep-static-functions"
 ;;
   no)
 # a.k.a. --disable-coverage
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 0e5167695a2..d7967d995b1 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -728,10 +728,10 @@ AC_ARG_ENABLE(coverage,
 		 default is noopt])],
 [case "${enableval}" in
   yes|noopt)
-coverage_flags="-fprofile-arcs -ftest-coverage -frandom-seed=\$@ -O0 -fkeep-inline-functions -fkeep-static-functions"
+coverage_flags="-fprofile-arcs -ftest-coverage -frandom-seed=\$@ -O0 -fkeep-static-functions"
 ;;
   opt)
-coverage_flags="-fprofile-arcs -ftest-coverage -frandom-seed=\$@ -O2 -fkeep-inline-functions -fkeep-static-functions"
+coverage_flags="-fprofile-arcs -ftest-coverage -frandom-seed=\$@ -O2 -fkeep-static-functions"
 ;;
   no)
 # a.k.a. --disable-coverage



Re: [PATCH] Fix usage of REG_BR_PROBs (PR target/82927).

2017-11-14 Thread Martin Liška
On 11/14/2017 02:59 PM, Joseph Myers wrote:
> On Tue, 14 Nov 2017, Martin Liška wrote:
> 
>> Hello.
>>
>> Quite obvious fix for SH target. Joseph can you please continue with testing?
>> I don't have a machine to test the patch.
> 
> I don't have SH hardware, but can confirm that, in conjunction with 
> Martin's glibc patch to work around strncpy warnings (where the proper fix 
> and another patch for one case are still under discussion), this fixes the 
> sh4-linux-gnu glibc build with build-many-glibcs.py.  (The glibc testsuite 
> build fails - errors in string/bug-strncat1.c, string/tester.c at least - 
> but the failures I see look like more fallout from Martin's GCC changes, 
> probably cases where the testsuite needs to disable the warnings, not 
> anything SH-specific.)
> 

Hello.

Good, I'm planning to install a slightly simplified version of the patch.

Martin
From 8dcdd9c110e39da619f0fa0e73d024186df66515 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 15 Nov 2017 08:09:18 +0100
Subject: [PATCH] Use proper probability (PR target/82927)

gcc/ChangeLog:

2017-11-15  Martin Liska  

	PR target/82927
	* config/sh/sh-mem.cc: Use proper probability for
	REG_BR_PROB_NOTE.
---
 gcc/config/sh/sh-mem.cc | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/config/sh/sh-mem.cc b/gcc/config/sh/sh-mem.cc
index 8fce9799921..4c33260d84e 100644
--- a/gcc/config/sh/sh-mem.cc
+++ b/gcc/config/sh/sh-mem.cc
@@ -183,8 +183,12 @@ expand_block_move (rtx *operands)
   return false;
 }
 
-static const int prob_unlikely = REG_BR_PROB_BASE / 10;
-static const int prob_likely = REG_BR_PROB_BASE / 4;
+static const int prob_unlikely
+  = profile_probability::from_reg_br_prob_base (REG_BR_PROB_BASE / 10)
+    .to_reg_br_prob_note ();
+static const int prob_likely
+  = profile_probability::from_reg_br_prob_base (REG_BR_PROB_BASE / 4)
+    .to_reg_br_prob_note ();
 
 /* Emit code to perform a strcmp.
 
-- 
2.14.3



[DWARF] mark partial fn versions and OMP frags as partial in dwarf2+ debug info

2017-11-14 Thread Alexandre Oliva
debug info: partial noentry functions: infra

This is the first patch of a set that addresses two different but
somewhat related issues.

On the one hand, after partial inlining, the non-inlined function
fragment is output in a way that debug info consumers can't distinguish
from the entire function: debug info lists the entire function as
abstract origin for the fragment, but nothing that indicates the
fragment does not stand for the entire function.  So, if a debugger is
asked to set a breakpoint at the entry point of the function, it might
very well set one at the entry point of the fragment, which is likely
not where users expect to stop.

On the other hand, OpenMP blocks are split out into artificial functions
that do not indicate their executable code is part of another function.
The artificial functions are nested within the original function, but
that's hardly enough: ideally, debug info consumers should be able to
tell that, if they stop within one of these functions, they're
abstractly within the original function.

This patch introduces a new DWARF attribute to indicate that a function
is a partial copy of its abstract origin, specifically, that its entry
point does not correspond to the entry point of the abstract origin.
This attribute can then be used to mark the out-of-line portion of
partial inlines, and OpenMP blocks split out into artificial functions.


This patchset was regstrapped on x86_64-linux-gnu and i686-linux-gnu.

Ok to install the first patch? (infrastructure)

Ok to install the second patch? (function versioning)

Ok to install the third patch? (OpenMP fragments)


for  include/ChangeLog

* dwarf2.def (DW_AT_GNU_partial_noentry): New.

for  gcc/ChangeLog

* tree-core.h (tree_function_decl): Drop unused
tm_clone_flag.  Add partial_copy_flag.
* tree.h (DECL_FUNCTION_PARTIAL_COPY): New.
* dwarf2out.c (checksum_attributes): Add at_partial_noentry.
(collect_checksum_attributes): Set it.
(die_checksum_ordered): Checksum it.
(gen_subprogram_die): Keep the old die if the partial copy
flag matches the partial noentry attribute.  Set the attribute
as needed.
---
 gcc/dwarf2out.c|   11 ++-
 gcc/tree-core.h|2 +-
 gcc/tree.h |   10 ++
 include/dwarf2.def |3 +++
 4 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 76a538f1ff97..db0bc12a1f46 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -6944,6 +6944,7 @@ struct checksum_attributes
   dw_attr_node *at_virtuality;
   dw_attr_node *at_visibility;
   dw_attr_node *at_vtable_elem_location;
+  dw_attr_node *at_partial_noentry;
 };
 
 /* Collect the attributes that we will want to use for the checksum.  */
@@ -7108,6 +7109,9 @@ collect_checksum_attributes (struct checksum_attributes *attrs, dw_die_ref die)
 	case DW_AT_vtable_elem_location:
 	  attrs->at_vtable_elem_location = a;
 	  break;
+	case DW_AT_GNU_partial_noentry:
+	  attrs->at_partial_noentry = a;
+	  break;
 	default:
 	  break;
 	}
@@ -7183,6 +7187,7 @@ die_checksum_ordered (dw_die_ref die, struct md5_ctx *ctx, int *mark)
   CHECKSUM_ATTR (attrs.at_type);
   CHECKSUM_ATTR (attrs.at_friend);
   CHECKSUM_ATTR (attrs.at_alignment);
+  CHECKSUM_ATTR (attrs.at_partial_noentry);
 
   /* Checksum the child DIEs.  */
   c = die->die_child;
@@ -21933,7 +21938,9 @@ gen_subprogram_die (tree decl, dw_die_ref context_die)
       if (old_die && old_die->die_parent == NULL)
 	add_child_die (context_die, old_die);
 
-      if (old_die && get_AT_ref (old_die, DW_AT_abstract_origin))
+      if (old_die && get_AT_ref (old_die, DW_AT_abstract_origin)
+	  && (DECL_FUNCTION_PARTIAL_COPY (decl)
+	      == get_AT_flag (old_die, DW_AT_GNU_partial_noentry)))
 	{
 	  /* If we have a DW_AT_abstract_origin we have a working
 	     cached version.  */
@@ -21943,6 +21950,8 @@ gen_subprogram_die (tree decl, dw_die_ref context_die)
 	{
 	  subr_die = new_die (DW_TAG_subprogram, context_die, decl);
 	  add_abstract_origin_attribute (subr_die, origin);
+	  if (DECL_FUNCTION_PARTIAL_COPY (decl))
+	    add_AT_flag (subr_die, DW_AT_GNU_partial_noentry, true);
 	  /* This is where the actual code for a cloned function is.
 	     Let's emit linkage name attribute for it.  This helps
 	     debuggers to e.g, set breakpoints into
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index f74f1453de6d..507016db23e9 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1784,7 +1784,7 @@ struct GTY(()) tree_function_decl {
   unsigned pure_flag : 1;
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
-  unsigned tm_clone_flag : 1;
+  unsigned partial_copy_flag : 1;
   unsigned versioned_function : 1;
   /* No bits left.  */
 };
diff --git a/gcc/tree.h b/gcc/tree.h
index 

[RFA][PATCH] patch 4/n Refactor bits of vrp_visit_assignment_or_call

2017-11-14 Thread Jeff Law

So the next group of changes is focused on breaking down evrp into an
analysis engine and the actual optimization pass.  The analysis engine
can be embedded into other dom walker passes quite easily.  I've done it
for the sprintf warnings as well as the main DOM optimizer locally.

Separating analysis from optimization for edge ranges and PHI ranges is
easy.  Doing so for statements isn't terribly hard either, but does
require a tiny bit of refactoring elsewhere.

Which brings us to this patch.

If we look at evrp_dom_walker::before_dom_children we'll see this in the
statement processing code:

  else if (stmt_interesting_for_vrp (stmt))
    {
      edge taken_edge;
      value_range vr = VR_INITIALIZER;
      extract_range_from_stmt (stmt, &taken_edge, &output, &vr);
      if (output
	  && (vr.type == VR_RANGE || vr.type == VR_ANTI_RANGE))
	{
	  update_value_range (output, &vr);
	  vr = *get_value_range (output);

	  /* Mark stmts whose output we fully propagate for removal.  */

etc.

Conceptually this fragment is part of the analysis side.  But the
subsequent code (optimization side) wants to know the "output" of the
statement.  I'm not keen on calling extract_range_from_stmt on both the
analysis side and the optimization side.

So this patch factors out a bit of extract_range_from_stmt and its child
vrp_visit_assignment_or_call into a routine that will return the proper
SSA_NAME.  So the analysis side calls extract_range_from_stmt and the
optimization side calls the new "get_output_for_vrp".

And of course to avoid duplication we use get_output_for_vrp from within
vrp_visit_assignment_or_call.


Bootstrapped and regression tested on x86_64.  OK for the trunk?

Jeff


[committed][PATCH] Fix probe into the red zone on aarch64

2017-11-14 Thread Jeff Law

Testing within Red Hat of the aarch64 stack clash bits turned up an
additional problem, one I probably should have expected.

aarch64 is a bit odd in that we may need to emit a probe in the residual
alloca space to enforce certain probing rules related to the outgoing
argument space.

When the size of the alloca space is known (and nonzero) at compile
time, we can just probe *sp as we know it's just been allocated and has
no live data.

If the size of the alloca space is not known at compile time, we have to
account for the possibility that it is zero.  So right now we actually
probe into the red zone (the existence of which is ABI dependent).

I didn't expect it would matter that much in practice, but it turns out
that certain code in process tear down within glibc has a dynamic
allocation where the size is not compile-time constant and is often zero
at runtime.

As a result valgrind complains regularly on stack-clash protected code
for aarch64.  Given the utility of clean valgrind runs and the relative
low cost of a probe relative to alloca setup it seems best to just have
a guarded probe of *sp when the size of the alloca space is not known at
compile time.

Bootstrapped and regression tested on aarch64.  Also verified that glibc
built with -fstack-clash-protection builds and does not regress its
testsuite and that valgrind's testsuite builds and runs with that just
built glibc.

Installing on the trunk.

Jeff

commit 0618a201f59699d48fd68edac10d9ad9da6b4c54
Author: law 
Date:   Wed Nov 15 06:30:31 2017 +

* explow.c (anti_adjust_stack_and_probe_stack_clash): Avoid probing
the red zone for stack_clash_protection_final_dynamic_probe targets
when the total dynamic stack size is zero bytes.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@254753 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index c404eb8e5a7..08642663d95 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,9 @@
 2017-11-14  Jeff Law  
 
+   * explow.c (anti_adjust_stack_and_probe_stack_clash): Avoid probing
+   the red zone for stack_clash_protection_final_dynamic_probe targets
+   when the total dynamic stack size is zero bytes.
+
 	* tree-ssa-threadupdate.c (thread_through_all_blocks): Thread
 	blocks in post order.
 
diff --git a/gcc/explow.c b/gcc/explow.c
index 662865d2808..96334b2b448 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1999,6 +1999,13 @@ anti_adjust_stack_and_probe_stack_clash (rtx size)
   if (size != CONST0_RTX (Pmode)
       && targetm.stack_clash_protection_final_dynamic_probe (residual))
     {
+      /* SIZE could be zero at runtime and in that case *sp could hold
+	 live data.  Furthermore, we don't want to probe into the red
+	 zone.
+
+	 Go ahead and just guard a probe at *sp on SIZE != 0 at runtime
+	 if SIZE is not a compile time constant.  */
+
       /* Ideally we would just probe at *sp.  However, if SIZE is not
	  a compile-time constant, but is zero at runtime, then *sp
	  might hold live data.  So probe at *sp if we know that
@@ -2011,9 +2018,12 @@ anti_adjust_stack_and_probe_stack_clash (rtx size)
 	}
       else
 	{
-	  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
-					   -GET_MODE_SIZE (word_mode)));
+	  rtx label = gen_label_rtx ();
+	  emit_cmp_and_jump_insns (size, CONST0_RTX (GET_MODE (size)),
+				   EQ, NULL_RTX, Pmode, 1, label);
+	  emit_stack_probe (stack_pointer_rtx);
 	  emit_insn (gen_blockage ());
+	  emit_label (label);
 	}
     }
 }


Re: [PATCH] Move bswap pass into gimple-ssa-store-merging.c

2017-11-14 Thread Richard Biener
On November 14, 2017 10:25:28 PM GMT+01:00, Jakub Jelinek wrote:
>Hi!
>
>I'll be working on further store-merging improvements next two days
>and hope to use some of the bswap pass APIs to handle stuff like:
>void foo (char *__restrict p, char *__restrict q)
>{
>  p[0] = q[3];
>  p[1] = q[2];
>  p[2] = q[1];
>  p[3] = q[0];
>}
>or
>  p[4] = data >> 8;
>  p[5] = data;
>etc.  As a precondition for that, this patch just moves the whole bswap
>pass from tree-ssa-math-opts.c, without too many changes (mostly just
>putting stuff into anon namespace, removing unneeded static keywords, and
>moving some test from execute to gate).  Bootstrapped/regtested on
>x86_64-linux and i686-linux, ok for trunk?  (Wouldn't commit it anyway
>until further patches that actually need it are approved).

OK. 

Richard. 

>2017-11-14  Jakub Jelinek  
>
>	* tree-ssa-math-opts.c (nop_stats, bswap_stats, struct symbolic_number,
>	BITS_PER_MARKER, MARKER_MASK, MARKER_BYTE_UNKNOWN, HEAD_MARKER, CMPNOP,
>	CMPXCHG, do_shift_rotate, verify_symbolic_number_p,
>	init_symbolic_number, find_bswap_or_nop_load, perform_symbolic_merge,
>	find_bswap_or_nop_1, find_bswap_or_nop, pass_data_optimize_bswap,
>	class pass_optimize_bswap, bswap_replace,
>	pass_optimize_bswap::execute): Moved to ...
>	* gimple-ssa-store-merging.c: ... this file.
>	Include optabs-tree.h.
>	(nop_stats, bswap_stats, do_shift_rotate, verify_symbolic_number_p,
>	init_symbolic_number, find_bswap_or_nop_load, perform_symbolic_merge,
>	find_bswap_or_nop_1, find_bswap_or_nop, bswap_replace): Put into
>	anonymous namespace, remove static keywords.
>	(pass_optimize_bswap::gate): Test BITS_PER_UNIT == 8 here...
>	(pass_optimize_bswap::execute): ... rather than here.
>
>--- gcc/tree-ssa-math-opts.c.jj2017-11-06 17:24:15.0 +0100
>+++ gcc/tree-ssa-math-opts.c   2017-11-14 17:08:28.831697803 +0100
>@@ -167,18 +167,6 @@ static struct
> 
> static struct
> {
>-  /* Number of hand-written 16-bit nop / bswaps found.  */
>-  int found_16bit;
>-
>-  /* Number of hand-written 32-bit nop / bswaps found.  */
>-  int found_32bit;
>-
>-  /* Number of hand-written 64-bit nop / bswaps found.  */
>-  int found_64bit;
>-} nop_stats, bswap_stats;
>-
>-static struct
>-{
>   /* Number of widening multiplication ops inserted.  */
>   int widen_mults_inserted;
> 
>@@ -1934,1070 +1922,6 @@ make_pass_cse_sincos (gcc::context *ctxt
>   return new pass_cse_sincos (ctxt);
> }
> 
>-/* A symbolic number structure is used to detect byte permutation and
>selection
>-   patterns of a source.  To achieve that, its field N contains an
>artificial
>-   number consisting of BITS_PER_MARKER sized markers tracking where
>does each
>-   byte come from in the source:
>-
>-   0 - target byte has the value 0
>-   FF- target byte has an unknown value (eg. due to sign
>extension)
>-   1..size - marker value is the byte index in the source (0 for lsb).
>-
>-   To detect permutations on memory sources (arrays and structures), a
>symbolic
>-   number is also associated:
>-   - a base address BASE_ADDR and an OFFSET giving the address of the
>source;
>-   - a range which gives the difference between the highest and lowest
>accessed
>- memory location to make such a symbolic number;
>-   - the address SRC of the source element of lowest address as a
>convenience
>- to easily get BASE_ADDR + offset + lowest bytepos;
>-   - number of expressions N_OPS bitwise ored together to represent
>- approximate cost of the computation.
>-
>-   Note 1: the range is different from size as size reflects the size
>of the
>-   type of the current expression.  For instance, for an array char
>a[],
>-   (short) a[0] | (short) a[3] would have a size of 2 but a range of 4
>while
>-   (short) a[0] | ((short) a[0] << 1) would still have a size of 2 but
>this
>-   time a range of 1.
>-
>-   Note 2: for non-memory sources, range holds the same value as size.
>-
>-   Note 3: SRC points to the SSA_NAME in case of non-memory source. 
>*/
>-
>-struct symbolic_number {
>-  uint64_t n;
>-  tree type;
>-  tree base_addr;
>-  tree offset;
>-  HOST_WIDE_INT bytepos;
>-  tree src;
>-  tree alias_set;
>-  tree vuse;
>-  unsigned HOST_WIDE_INT range;
>-  int n_ops;
>-};
>-
>-#define BITS_PER_MARKER 8
>-#define MARKER_MASK ((1 << BITS_PER_MARKER) - 1)
>-#define MARKER_BYTE_UNKNOWN MARKER_MASK
>-#define HEAD_MARKER(n, size) \
>-  ((n) & ((uint64_t) MARKER_MASK << (((size) - 1) * BITS_PER_MARKER)))
>-
>-/* The number which the find_bswap_or_nop_1 result should match in
>-   order to have a nop.  The number is masked according to the size of
>-   the symbolic number before using it.  */
>-#define CMPNOP (sizeof (int64_t) < 8 ? 0 : \
>-  (uint64_t)0x08070605 << 32 | 0x04030201)
>-
>-/* The number which the find_bswap_or_nop_1 result should match in
>-   order to have a byte swap.  The number is masked according 

Re: [PATCH 02/14] Support for adding and stripping location_t wrapper nodes

2017-11-14 Thread Trevor Saunders
On Fri, Nov 10, 2017 at 04:45:17PM -0500, David Malcolm wrote:
> This patch provides a mechanism in tree.c for adding a wrapper node
> for expressing a location_t, for those nodes for which
> !CAN_HAVE_LOCATION_P, along with a new method of cp_expr.
> 
> It's called in later patches in the kit via that new method.
> 
> In this version of the patch, I use NON_LVALUE_EXPR for wrapping
> constants, and VIEW_CONVERT_EXPR for other nodes.
> 
> I also turned off wrapper nodes for EXCEPTIONAL_CLASS_P, for the sake
> of keeping the patch kit more minimal.
> 
> The patch also adds a STRIP_ANY_LOCATION_WRAPPER macro for stripping
> such nodes, used later on in the patch kit.

I happened to start reading this series near the end and was rather
confused by this macro, since it changes variables in an unhygienic
way.  Did you consider just defining an inline function to return the
actual decl?  It seems like it's not used that often, so the slight
extra syntax shouldn't be that big a deal compared to the explicitness.

Other than that the series seems reasonable, and I look forward to
having wrappers in more places.  I seem to remember something I wanted
to warn about that they would make much easier.

Thanks

Trev



Re: [PATCH v2] [libcc1] Rename C{,P}_COMPILER_NAME and remove triplet from them

2017-11-14 Thread Alexandre Oliva
On Nov 13, 2017, Sergio Durigan Junior  wrote:

> On Tuesday, September 26 2017, I wrote:
>> Ping^2.

> Ping^3.

> I'm sending the updated ChangeLog/patch.  I'm also removing gdb-patches
> from the Cc list.

> libcc1/ChangeLog:
> 2017-09-01  Sergio Durigan Junior  
>   Pedro Alves  

>   * Makefile.am: Remove references to c-compiler-name.h and
>   cp-compiler-name.h
>   * Makefile.in: Regenerate.
>   * compiler-name.hh: New file.
>   * libcc1.cc: Don't include c-compiler-name.h.  Include
>   compiler-name.hh.
>   * libcp1.cc: Don't include cp-compiler-name.h.  Include
>   compiler-name.hh.

I agree with Pedro's rationale.

I'm not sure I have authority to approve changes to libcc1, but since
this hasn't got a review for so long, and I'm the one who last touched
it, I kind of feel responsible and able, so I suggest that, if nobody
objects till Thursday, this is ok to install.

Thanks!

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [SFN+LVU+IEPM v4 7/9] [LVU] Introduce location views

2017-11-14 Thread Alexandre Oliva
On Nov 13, 2017, Richard Biener  wrote:

> What does final.c need langhooks for?

Thanks for catching this.

At some point I had a check on whether there could be being stmt markers
emitted by the language, but that's long gone, in part because of LTO;
that's now computed dynamically, on a per-function basis, so the check
was modified to match, but the include added back then lingered.
Consider it gone (I'm dropping it in the SFN and SLI branches, so it
won't be in the next version of the patchset).

> I'd appreciate a DWARF person reviewing the dwarf bits, some new
> static fns seem to lack their toplevel comment.

Thanks for pointing this out.  I've added the following comments:


/* Return true iff we're to emit .loc directives for the assembler to
   generate line number sections.  */

>> +output_asm_line_debug_info (void)


/* Add a view list attribute to DIE.  It must have a DW_AT_location
   attribute, because the view list complements the location list.  */

>> +add_AT_view_list (dw_die_ref die, enum dwarf_attribute attr_kind)


/* Return a pointer to the location list referenced by the attribute.
   If the named attribute is a view list, look up the corresponding
   DW_AT_location attribute and return its location list.  */

>> AT_loc_list_ptr (dw_attr_node *a)


/* Return the location attribute value associated with a view list
   attribute value.  */

>> +view_list_to_loc_list_val_node (dw_val_node *val)


> The middle-end pieces are ok.

Thanks!


-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


[committed][PATCH] Change order of processing blocks/threads in tree-ssa-threadupdate.c

2017-11-14 Thread Jeff Law
With my local patches to remove jump threading from VRP I was seeing a
fairly obvious jump threading path left in the CFG after DOM.  This
missed jump thread ultimately caused a false positive uninitialized warning.

Interestingly enough when I looked at the dumps it appeared DOM was
finding the appropriate threads, the threads weren't being cancelled and
we weren't doing any undesirable duplications for joiners.

What was happening is we had these two jump threading paths

  A: (190, 192) (192, 194)
  B: (192, 194) (194, 202)

These are simple jump threading paths.  No joiners or empty block skipping.

The key here is that the target edge of path A is the starting edge of
path B and that path A starts at a block with a smaller index.  There's
a variety of reasons, mostly due to limitations in the duplication and
ssa updating code that may prevent this from being a single jump
threading path.  Anyway...

So we process A first.  That creates a duplicate of 192, removes the
control statement in the duplicate and wires the duplicate to reach 194.
 But note that we already know that if we traverse 192->194, then we
must reach 202.  So this new edge 192'->194 is actually threadable too,
but we're not going to see it until the next DOM pass.

Obviously we'd prefer to just DTRT and not leave the jump thread in the
IL.  That's easily accomplished by handling non-loop jump threads in a
post-order traversal.

When we do that, we'll first handle thread path B which will redirect
the 192->194 edge to 192->194' and the right things "just happen" when
we process thread path A and we avoid the false positive uninit warning.


Bootstrapped and regression tested on x86_64.  Wstringop-truncation is
ping-ponging, but not due to this patch AFAICT.  Also verified by visual
inspection that the first DOM pass fully threaded the code in question
when using a local branch that has my removal of threading from tree-vrp
patches installed and bootstrapping that branch.

Installing on the trunk.

Jeff


Re: [PATCH, AArch64] Adjust tuning parameters for Falkor

2017-11-14 Thread Luis Machado
> I think the best thing is to leave this tuning structure in place and
> just change default_opt_level to -1 to disable it at -O3.
> 
> Thanks,
> Andrew
> 

Indeed that seems to be more appropriate if re-enabling prefetches in the
future is a possibility.

How about the following?

Thanks,
Luis

2017-11-15  Luis Machado  

gcc/
* config/aarch64/aarch64.c
(qdf24xx_prefetch_tune) <default_opt_level>: Set to -1.
(qdf24xx_tunings) <autoprefetcher_model>: Set to
tune_params::AUTOPREFETCHER_WEAK.
---
 gcc/ChangeLog| 7 +++
 gcc/config/aarch64/aarch64.c | 4 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b80a421..0e05f9e 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2017-11-15  Luis Machado  
+
+   * config/aarch64/aarch64.c
+   (qdf24xx_prefetch_tune) <default_opt_level>: Set to -1.
+   (qdf24xx_tunings) <autoprefetcher_model>: Set to
+   tune_params::AUTOPREFETCHER_WEAK.
+
 2017-11-14  Carl Love  
 
* config/rs6000/rs6000.c (swap_endian_selector_for_mode): Remove
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0c67e2b..8779cad 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -508,7 +508,7 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =
   32,  /* l1_cache_size  */
   64,  /* l1_cache_line_size  */
   1024,/* l2_cache_size  */
-  3/* default_opt_level  */
+  -1   /* default_opt_level  */
 };
 
 static const cpu_prefetch_tune thunderxt88_prefetch_tune =
@@ -817,7 +817,7 @@ static const struct tune_params qdf24xx_tunings =
   2,   /* min_div_recip_mul_sf.  */
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
-  tune_params::AUTOPREFETCHER_STRONG,  /* autoprefetcher_model.  */
+  tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
  &qdf24xx_prefetch_tune
 };
-- 
2.7.4



Re: [PATCH] Fix use-after-free in the strlen pass (PR tree-optimization/82977)

2017-11-14 Thread Jeff Law
On 11/14/2017 02:30 PM, Jakub Jelinek wrote:
> On Tue, Nov 14, 2017 at 02:24:28PM -0700, Martin Sebor wrote:
>> On 11/14/2017 02:04 PM, Jakub Jelinek wrote:
>>> Hi!
>>>
>>> strlen_to_stridx.get (rhs1) returns an address into the hash_map, and
>>> strlen_to_stridx.put (lhs, *ps); (in order to be efficient) doesn't make a
>>> copy of the argument just in case, first inserts the slot into it which
>>> may cause reallocation, and only afterwards runs the copy ctor to assign
>>> the value into the new slot.  So, passing it a reference to something
>>> in the hash_map is wrong.  Fixed thusly, bootstrapped/regtested on
>>> x86_64-linux and i686-linux, ok for trunk?
>>
>> This seems like an unnecessary gotcha that should be possible to
>> avoid in the hash_map.  The corresponding standard containers
>> require it to work and so it's surprising when it doesn't in GCC.
>>
>> I've been looking at how this is implemented and it seems to me
>> that a fix should be doable by having the hash_map check to see if
>> the underlying table needs to expand and if so, create a temporary
>> copy of the element before reallocating it.
> 
> That would IMHO just slow down and enlarge the hash_map for all users,
> even when most of them don't really need it.
> While it is reasonable for STL containers to make sure it works, we
> aren't using STL containers and can pose additional restrictions.
But when we make our containers behave differently than the STL it makes
it much easier for someone to make a mistake such as this one.

IMHO this kind of difference in behavior is silly and long term just
makes our jobs harder.

I'd vote for fixing our containers.

jeff




Re: [PATCH, AArch64] Adjust tuning parameters for Falkor

2017-11-14 Thread Andrew Pinski
On Tue, Nov 14, 2017 at 6:00 PM, Luis Machado  wrote:
> Disabling software prefetching and switching the autoprefetcher to weak 
> improves
> CPU2017 rate and speed benchmarks for both int and fp sets on Falkor.
>
> SPECrate 2017 fp is up 0.38%
> SPECspeed 2017 fp is up 0.54%
> SPECrate 2017 int is up 3.02%
> SPECspeed 2017 int is up 3.16%
>
> There are only a couple individual regressions. The biggest one being about 4%
> in parest.
>
> For SPEC2006, we've noticed the following:
>
> SPECint is up 0.91%
> SPECfp is stable
>
> In the case of SPEC2006 we noticed both a big regression in mcf (about 20%)
> and a big improvement for hmmer (about 40%).
>
> Since the overall result is positive, we would like to make these new tuning
> settings the default for Falkor.
>
> We may revisit the software prefetcher setting in the future, in case we
> can adjust it enough so it provides us a good balance between improvements and
> regressions (mcf). But for now it is best if it stays off.
>
> I understand the freeze is happening soon, so it would be great to have this
> in before then.
>
> OK?
>
> Thanks,
> Luis
>
> 2017-11-14  Luis Machado  
>
> * config/aarch64/aarch64.c (qdf24xx_prefetch_tune): Remove.
> (qdf24xx_tunings): Replace qdf24xx_prefetch_tune with
> generic_prefetch_tune and tune_params::AUTOPREFETCHER_STRONG with
> tune_params::AUTOPREFETCHER_WEAK.
> ---
>  gcc/ChangeLog|  7 +++
>  gcc/config/aarch64/aarch64.c | 13 ++---
>  2 files changed, 9 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index b80a421..4dbfda0 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,10 @@
> +2017-11-14  Luis Machado  
> +
> +   * config/aarch64/aarch64.c (qdf24xx_prefetch_tune): Remove.
> +   (qdf24xx_tunings): Replace qdf24xx_prefetch_tune with
> +   generic_prefetch_tune and tune_params::AUTOPREFETCHER_STRONG with
> +   tune_params::AUTOPREFETCHER_WEAK.
> +
>  2017-11-14  Carl Love  
>
> * config/rs6000/rs6000.c (swap_endian_selector_for_mode): Remove
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 0c67e2b..171a230 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -502,15 +502,6 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =
>-1   /* default_opt_level  */
>  };
>
> -static const cpu_prefetch_tune qdf24xx_prefetch_tune =
> -{
> -  4,   /* num_slots  */
> -  32,  /* l1_cache_size  */
> -  64,  /* l1_cache_line_size  */
> -  1024,/* l2_cache_size  */
> -  3/* default_opt_level  */

I think the best thing is to leave this tuning structure in place and
just change default_opt_level to -1 to disable it at -O3.

Thanks,
Andrew

> -};
> -
>  static const cpu_prefetch_tune thunderxt88_prefetch_tune =
>  {
>8,   /* num_slots  */
> @@ -817,9 +808,9 @@ static const struct tune_params qdf24xx_tunings =
>2,   /* min_div_recip_mul_sf.  */
>2,   /* min_div_recip_mul_df.  */
>0,   /* max_case_values.  */
> -  tune_params::AUTOPREFETCHER_STRONG,  /* autoprefetcher_model.  */
> +  tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
>(AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
> -  &qdf24xx_prefetch_tune
> +  &generic_prefetch_tune
>  };
>
>  /* Tuning structure for the Qualcomm Saphira core.  Default to falkor values
> --
> 2.7.4
>







[PATCH, AArch64] Adjust tuning parameters for Falkor

2017-11-14 Thread Luis Machado
Disabling software prefetching and switching the autoprefetcher to weak improves
CPU2017 rate and speed benchmarks for both int and fp sets on Falkor.

SPECrate 2017 fp is up 0.38%
SPECspeed 2017 fp is up 0.54%
SPECrate 2017 int is up 3.02%
SPECspeed 2017 int is up 3.16%

There are only a couple individual regressions. The biggest one being about 4%
in parest.

For SPEC2006, we've noticed the following:

SPECint is up 0.91%
SPECfp is stable

In the case of SPEC2006 we noticed both a big regression in mcf (about 20%)
and a big improvement for hmmer (about 40%).

Since the overall result is positive, we would like to make these new tuning
settings the default for Falkor.

We may revisit the software prefetcher setting in the future, in case we
can adjust it enough so it provides us a good balance between improvements and
regressions (mcf). But for now it is best if it stays off.

I understand the freeze is happening soon, so it would be great to have this
in before then.

OK?

Thanks,
Luis

2017-11-14  Luis Machado  

* config/aarch64/aarch64.c (qdf24xx_prefetch_tune): Remove.
(qdf24xx_tunings): Replace qdf24xx_prefetch_tune with
generic_prefetch_tune and tune_params::AUTOPREFETCHER_STRONG with
tune_params::AUTOPREFETCHER_WEAK.
---
 gcc/ChangeLog|  7 +++
 gcc/config/aarch64/aarch64.c | 13 ++---
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b80a421..4dbfda0 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2017-11-14  Luis Machado  
+
+   * config/aarch64/aarch64.c (qdf24xx_prefetch_tune): Remove.
+   (qdf24xx_tunings): Replace qdf24xx_prefetch_tune with
+   generic_prefetch_tune and tune_params::AUTOPREFETCHER_STRONG with
+   tune_params::AUTOPREFETCHER_WEAK.
+
 2017-11-14  Carl Love  
 
* config/rs6000/rs6000.c (swap_endian_selector_for_mode): Remove
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0c67e2b..171a230 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -502,15 +502,6 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =
   -1   /* default_opt_level  */
 };
 
-static const cpu_prefetch_tune qdf24xx_prefetch_tune =
-{
-  4,   /* num_slots  */
-  32,  /* l1_cache_size  */
-  64,  /* l1_cache_line_size  */
-  1024,/* l2_cache_size  */
-  3/* default_opt_level  */
-};
-
 static const cpu_prefetch_tune thunderxt88_prefetch_tune =
 {
   8,   /* num_slots  */
@@ -817,9 +808,9 @@ static const struct tune_params qdf24xx_tunings =
   2,   /* min_div_recip_mul_sf.  */
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
-  tune_params::AUTOPREFETCHER_STRONG,  /* autoprefetcher_model.  */
+  tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
-  &qdf24xx_prefetch_tune
+  &generic_prefetch_tune
 };
 
 /* Tuning structure for the Qualcomm Saphira core.  Default to falkor values
-- 
2.7.4



Add __builtin_tgmath for better tgmath.h implementation (bug 81156)

2017-11-14 Thread Joseph Myers
Various implementations of C99/C11  have the property that
their macro expansions contain many copies of the macro arguments, so
resulting in exponential blowup of the size of macro expansions where
a call to such a macro contains other such calls in the macro
arguments.

This patch adds a (C-only) language feature __builtin_tgmath designed
to avoid this problem by implementing the  function
selection rules directly in the compiler.  The effect is that
type-generic macros can be defined simply as

#define pow(a, b) __builtin_tgmath (powf, pow, powl, \
cpowf, cpow, cpowl, a, b)

as in the example added to the manual, with each macro argument
expanded exactly once.  The details of __builtin_tgmath are as
described in the manual.  This is C-only since C++ uses function
overloading and just defines  to include  and
.

__builtin_tgmath handles C99/C11 type-generic macros, and _FloatN,
_FloatNx and decimal floating-point types (following the proposed
resolution to the floating-point TS DR#9 that makes the rules for
finding a common type from arguments to a type-generic macro follow
the usual arithmetic conversions after adjustment of integer arguments
to _Decimal64 or double - or to _Complex double in the case of GNU
complex integer arguments).

Type-generic macros for functions from TS 18661 that round their
results to a narrower type are handled, but there are still some
unresolved questions regarding such macros so further changes in that
regard may be needed in future.  The current implementation follows an
older version of the DR#13 resolution (allowing a function for a
wide-enough argument type to be selected if no exactly-matching
function is available), but with appropriate calls to __builtin_tgmath
is still fully compatible with the latest version of the resolution
(not yet in the DR log), and allowing such not-exactly-matching
argument types to be chosen in that case avoids needing another
special case to treat integers as _Float64 instead of double in
certain cases.

Regarding other possible language/library features, not currently
implemented in GCC:

* Imaginary types could be naturally supported by allowing cases where
  the type-generic type is an imaginary type T and arguments or return
  types may be T (as at present), or the corresponding real type to T
  (as at present), or (new) the corresponding real type if T is real
  or imaginary but T if T is complex.  (tgmath.h would need a series
  of functions such as

  static inline _Imaginary double
  __sin_imag (_Imaginary double __x)
  {
return _Imaginary_I * sinh (__imag__ __x);
  }

  to be used in __builtin_tgmath calls.)

* __builtin_tgmath would use the constant rounding direction in the
  presence of support for the FENV_ROUND / FENV_DEC_ROUND pragmas.
  Support for those would also require a new __builtin_ to
  cause a non-type-generic call to use the constant rounding
  direction (it seems cleaner to add a new __builtin_ when
  required than to make __builtin_tgmath handle a non-type-generic
  case with only one function argument).

* TS 18661-5 __STDC_TGMATH_OPERATOR_EVALUATION__ would require new
  __builtin_ that evaluates with excess range and precision
  like arithmetic operators do.

* The proposed C bindings for IEEE 754-2018 augmented arithmetic
  operations involve struct return types.  As currently implemented
  __builtin_tgmath does not handle those, but support could be added.

There are many error cases that the implementation diagnoses.  I've
tried to ensure reasonable error messages for erroneous uses of
__builtin_tgmath, but the errors for erroneous uses of the resulting
type-generic macros (that is, when the non-function arguments have
inappropriate types) are more important as they are more likely to be
seen by users.

GCC's own tgmath.h, as used for some targets, is updated in this
patch.  I've tested those changes minimally, via adjusting
gcc.dg/c99-tgmath-* locally to use that tgmath.h version.  I've also
run the glibc testsuite (which has much more thorough tests of
correctness of tgmath.h function selection) with a glibc patch to use
__builtin_tgmath in glibc's tgmath.h.

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  Applied to 
mainline.

gcc:
2017-11-15  Joseph Myers  

PR c/81156
* doc/extend.texi (Other Builtins): Document __builtin_tgmath.
* ginclude/tgmath.h (__tg_cplx, __tg_ldbl, __tg_dbl, __tg_choose)
(__tg_choose_2, __tg_choose_3, __TGMATH_REAL_1_2)
(__TGMATH_REAL_2_3): Remove macros.
(__TGMATH_CPLX, __TGMATH_CPLX_2, __TGMATH_REAL, __TGMATH_REAL_2)
(__TGMATH_REAL_3, __TGMATH_CPLX_ONLY): Define using
__builtin_tgmath.
(frexp, ldexp, nexttoward, scalbn, scalbln): Define using
__TGMATH_REAL_2.
(remquo): Define using __TGMATH_REAL_3.

gcc/c:
2017-11-15  Joseph Myers  

PR c/81156
* c-parser.c 

Go patch committed: Remove LHS/RHS setting for variables

2017-11-14 Thread Ian Lance Taylor
This patch to the Go frontend by Than McIntosh removes the LHS/RHS
context determination for variable references.  It used to be needed
for a different backend but it's no longer necessary.  Bootstrapped
and ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian


2017-11-14  Than McIntosh  

* go-gcc.cc (var_expression): Remove Varexpr_context parameter.
Index: gcc/go/go-gcc.cc
===================================================================
--- gcc/go/go-gcc.cc(revision 254090)
+++ gcc/go/go-gcc.cc(working copy)
@@ -276,7 +276,7 @@ class Gcc_backend : public Backend
   { return this->make_expression(null_pointer_node); }
 
   Bexpression*
-  var_expression(Bvariable* var, Varexpr_context, Location);
+  var_expression(Bvariable* var, Location);
 
   Bexpression*
   indirect_expression(Btype*, Bexpression* expr, bool known_valid, Location);
@@ -1256,7 +1256,7 @@ Gcc_backend::zero_expression(Btype* btyp
 // An expression that references a variable.
 
 Bexpression*
-Gcc_backend::var_expression(Bvariable* var, Varexpr_context, Location location)
+Gcc_backend::var_expression(Bvariable* var, Location location)
 {
   tree ret = var->get_tree(location);
   if (ret == error_mark_node)
Index: gcc/go/gofrontend/MERGE
===================================================================
--- gcc/go/gofrontend/MERGE (revision 254729)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-d028451131e92bab5379defb04ead87ca978ed25
+cb5dc1ce98857884a2215c461dd1d7de530f9f5e
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/backend.h
===================================================================
--- gcc/go/gofrontend/backend.h (revision 254090)
+++ gcc/go/gofrontend/backend.h (working copy)
@@ -254,7 +254,7 @@ class Backend
 
   // Create a reference to a variable.
   virtual Bexpression*
-  var_expression(Bvariable* var, Varexpr_context in_lvalue_pos, Location) = 0;
+  var_expression(Bvariable* var, Location) = 0;
 
   // Create an expression that indirects through the pointer expression EXPR
   // (i.e., return the expression for *EXPR). KNOWN_VALID is true if the 
pointer
Index: gcc/go/gofrontend/expressions.cc
===================================================================
--- gcc/go/gofrontend/expressions.cc(revision 254126)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -771,7 +771,7 @@ Var_expression::do_get_backend(Translate
 go_unreachable();
 
   Bexpression* ret =
-  context->backend()->var_expression(bvar, this->in_lvalue_pos_, loc);
+  context->backend()->var_expression(bvar, loc);
   if (is_in_heap)
 ret = context->backend()->indirect_expression(btype, ret, true, loc);
   return ret;
@@ -898,10 +898,7 @@ Temporary_reference_expression::do_get_b
 {
   Gogo* gogo = context->gogo();
   Bvariable* bvar = this->statement_->get_backend_variable(context);
-  Varexpr_context ve_ctxt = (this->is_lvalue_ ? VE_lvalue : VE_rvalue);
-
-  Bexpression* ret = gogo->backend()->var_expression(bvar, ve_ctxt,
- this->location());
+  Bexpression* ret = gogo->backend()->var_expression(bvar, this->location());
 
   // The backend can't always represent the same set of recursive types
   // that the Go frontend can.  In some cases this means that a
@@ -972,7 +969,7 @@ Set_and_use_temporary_expression::do_get
   Location loc = this->location();
   Gogo* gogo = context->gogo();
   Bvariable* bvar = this->statement_->get_backend_variable(context);
-  Bexpression* lvar_ref = gogo->backend()->var_expression(bvar, VE_lvalue, 
loc);
+  Bexpression* lvar_ref = gogo->backend()->var_expression(bvar, loc);
 
   Named_object* fn = context->function();
   go_assert(fn != NULL);
@@ -980,7 +977,7 @@ Set_and_use_temporary_expression::do_get
   Bexpression* bexpr = this->expr_->get_backend(context);
   Bstatement* set = gogo->backend()->assignment_statement(bfn, lvar_ref,
   bexpr, loc);
-  Bexpression* var_ref = gogo->backend()->var_expression(bvar, VE_rvalue, loc);
+  Bexpression* var_ref = gogo->backend()->var_expression(bvar, loc);
   Bexpression* ret = gogo->backend()->compound_expression(set, var_ref, loc);
   return ret;
 }
@@ -1084,11 +1081,11 @@ Sink_expression::do_get_backend(Translat
gogo->backend()->temporary_variable(fn_ctx, context->bblock(), bt, NULL,
false, loc, );
   Bexpression* var_ref =
-  gogo->backend()->var_expression(this->bvar_, VE_lvalue, loc);
+  gogo->backend()->var_expression(this->bvar_, loc);
   var_ref = gogo->backend()->compound_expression(decl, var_ref, loc);
   return var_ref;
 }
-  return gogo->backend()->var_expression(this->bvar_, VE_lvalue, loc);
+  return gogo->backend()->var_expression(this->bvar_, loc);
 }
 
 // Ast dump for sink 

Re: [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets (PR 82455)

2017-11-14 Thread Jeff Law
On 11/13/2017 06:21 PM, Martin Sebor wrote:
> I have a different concern with the general idea of moving these
> kinds of warnings into passes of their own.  It would unavoidably
> result in duplicating some code from the optimization passes (at
> a minimum, the GIMPLE traversal, but likely more than that).  It
> would also require pretty tight coupling between the warning pass
> and the optimization passes as the latter can defeat the work of
> the former (as happens in this case).  But most significantly,
> the separation isn't feasible in cases where the optimization
> pass computes and maintains non-trivial internal state (like,
> say, tree-object-size or tree-ssa-strlen).
Something like object-size, string lengths, value data, etc should be
independent analysis modules.  You then build an optimizer on top of the
module and/or a warning pass on top of the module.  The optimizer and
warning pass might in fact share analysis and just do different things
with the underlying data produced.

It's not perfect and we're not likely to ever be perfect, but it's a
hell of a lot better than what we're doing now.   Yes there'll be phase
ordering problems, problems with other passes hiding or exposing
questionable code, possibly duplication between the optimization client
and warning client, etc, etc.  We have to look at each problem and
decide how best to proceed.

I do think that the frameworks we build should have mechanisms to allow
transformation of code that is clearly wrong and exhibits undefined
behavior.

> 
> That said, it would make perfect sense to me to have the warning
> code live in separate files from the optimization code, and keep
> the separation of concerns and responsibilities crisp and well
> defined by their APIs.
Where that's feasible it seems reasonable to me.  I'm obviously biased
based on recent experiences within VRP, which has grown over time into a
horrid mess, and my forays into tree-ssa-uninit.c, which has a damn cool
little predicate analysis engine just waiting for someone to extract and
re-use.

Jeff


Re: [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets (PR 82455)

2017-11-14 Thread Jeff Law
On 11/14/2017 01:42 AM, Richard Biener wrote:

> 
> I suspect once you're dealing with C++ code you run into the issue
> that even early inlining exposes code with forwprop run on it
> before running forwprop again on the inlined-into body.
> 
> So the IPA issues start very early.  Of course if you are doing
> path-sensitive processing then processing call/return "edges" as if
> they were CFG edges shouldn't be too hard.
> 
> Then the only issue remaining is that there are very many
> paths in a program compared to # edges or # blocks which means
> you'll quickly run into compile-time issues.
Yup.  There's a reason why static analysis is usually not on by default.
It's bloody expensive.

> 
> Static analyzers are hard ;)  But I appreciate somebody finally
> trying that route.  Ideally we'd do the static analysis in parallel
> to the compilation given we'd need an "early" LTO phase before
> early inlining.  Thus do a LTO out then in parallel do WPA
> static analysis with diagnostics.
> 
No doubt.  I wouldn't even say we're going that route, at least not in
any real sense.  The walkers would probably be structured differently
than what we're doing and have to hook in much earlier.  You have to
build both the callgraph walker as well as the CFG walker and the
ability to pass data back and forth between them.  You'd also have to
finish all the deferred folding stuff so that what you analyze is closer
to the actual source.

We're just building on the existing infrastructure we have to give
better function scoped warnings.

jeff


Re: [PATCH] Fix use-after-free in the strlen pass (PR tree-optimization/82977)

2017-11-14 Thread Jeff Law
On 11/14/2017 02:30 PM, Martin Sebor wrote:
> On 11/14/2017 02:10 PM, Jeff Law wrote:
>> On 11/14/2017 02:04 PM, Jakub Jelinek wrote:
>>> Hi!
>>>
>>> strlen_to_stridx.get (rhs1) returns an address into the hash_map, and
>>> strlen_to_stridx.put (lhs, *ps); (in order to be efficient) doesn't
>>> make a
>>> copy of the argument just in case, first inserts the slot into it which
>>> may cause reallocation, and only afterwards runs the copy ctor to assign
>>> the value into the new slot.  So, passing it a reference to something
>>> in the hash_map is wrong.  Fixed thusly, bootstrapped/regtested on
>>> x86_64-linux and i686-linux, ok for trunk?
>>>
>>> 2017-11-14  Jakub Jelinek  
>>>
>>> PR tree-optimization/82977
>>> * tree-ssa-strlen.c (strlen_optimize_stmt): Pass a reference to a
>>> copy
>>> constructed temporary to strlen_to_stridx.put.
>> I've been seeing a couple new tests flip between pass and fail recently.
>>  I wonder if this is the ultimate cause.
> 
> I've been noticing it for quite a while, even before the commit,
> so I suspect something else is going on in addition to this bug.
I'm referring specifically to the Wstringop-truncation tests.  They're
ping-ponging between PASS/FAIL here with alarming regularity and no
sense of rhyme or reason.

If I had to guess I'd guess uninit memory, dangling pointer or the like,
which is precisely the kind of problem the patch is meant to fix.

jeff


Re: [AARCH64] implements neon vld1_*_x2 intrinsics

2017-11-14 Thread Kugan Vivekanandarajah
Ping?

Thanks,
Kugan

On 7 November 2017 at 15:10, Kugan Vivekanandarajah
 wrote:
> Hi,
>
> Attached patch implements the  vld1_*_x2 intrinsics as defined by the
> neon document.
>
> Bootstrap for the latest patch is ongoing on aarch64-linux-gnu. Is
> this OK for trunk if no regressions?
>
> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2017-11-06  Kugan Vivekanandarajah  
>
> * config/aarch64/aarch64-simd.md (aarch64_ld1x2): New.
> (aarch64_ld1x2): Likewise.
> (aarch64_simd_ld1_x2): Likewise.
> (aarch64_simd_ld1_x2): Likewise.
> * config/aarch64/arm_neon.h (vld1_u8_x2): New.
> (vld1_s8_x2): Likewise.
> (vld1_u16_x2): Likewise.
> (vld1_s16_x2): Likewise.
> (vld1_u32_x2): Likewise.
> (vld1_s32_x2): Likewise.
> (vld1_u64_x2): Likewise.
> (vld1_s64_x2): Likewise.
> (vld1_f16_x2): Likewise.
> (vld1_f32_x2): Likewise.
> (vld1_f64_x2): Likewise.
> (vld1_p8_x2): Likewise.
> (vld1_p16_x2): Likewise.
> (vld1_p64_x2): Likewise.
> (vld1q_u8_x2): Likewise.
> (vld1q_s8_x2): Likewise.
> (vld1q_u16_x2): Likewise.
> (vld1q_s16_x2): Likewise.
> (vld1q_u32_x2): Likewise.
> (vld1q_s32_x2): Likewise.
> (vld1q_u64_x2): Likewise.
> (vld1q_s64_x2): Likewise.
> (vld1q_f16_x2): Likewise.
> (vld1q_f32_x2): Likewise.
> (vld1q_f64_x2): Likewise.
> (vld1q_p8_x2): Likewise.
> (vld1q_p16_x2): Likewise.
> (vld1q_p64_x2): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2017-11-06  Kugan Vivekanandarajah  
>
> * gcc.target/aarch64/advsimd-intrinsics/vld1x2.c: New test.


Re: libstdc++ PATCH to harmonize noexcept

2017-11-14 Thread Jonathan Wakely

On 14/11/17 13:56 -0500, Jason Merrill wrote:

While working on an unrelated issue I noticed that the compiler didn't
like some of these declarations after preprocessing, when they aren't
protected by system-header permissiveness.

I thought about limiting the permissiveness to only extern "C"
functions, but I believe that system headers are adding more C++
declarations, so we'd likely run into this issue again.

Shouldn't we build libstdc++ with -Wsystem-headers?  Maybe along with
-Wno-error=system-headers?


It's been suggested before:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50871




[C++ Patch] PR 82235 (Copy ctor is not found for copying array of an object when it's marked explicit)

2017-11-14 Thread Mukesh Kapoor

Hi,

This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82235
For the following test case

struct Foo {
    Foo() {}
    explicit Foo(const Foo& aOther) {}
};
struct Bar {
    Foo m[1];
};
void test() {
    Bar a;
    Bar b = a;
}

the compiler issues an error when the compiler-generated copy
constructor of class Bar calls the explicit copy constructor of class
Foo.  The fix is to implement ISO C++17 16.3.1.4 (over.match.copy)
correctly.


Bootstrapped and tested with 'make check' on x86_64-linux.

Mukesh

/gcc/cp
2017-11-14  Mukesh Kapoor   

PR c++/82235
* init.c (build_aggr_init): Allow explicit constructors
for the following special case:
ISO C++/17 16.3.1.4 (over.match.copy):
When initializing a temporary to be bound to the first
parameter of a constructor where the parameter is of type
"reference to possibly cv-qualified T" and the constructor
is called with a single argument in the context of
direct-initialization of an object of type "cv2 T", explicit
conversion functions are also considered.

Added function ref_first_parm_of_constructor() to check for
this case. In build_aggr_init(), don't set LOOKUP_ONLYCONVERTING
in flags if ref_first_parm_of_constructor() returns true.

/testsuite
2017-11-14  Mukesh Kapoor   

PR c++/82235
* g++.dg/cpp0x/explicit13.C: New

Index: gcc/cp/init.c
===
--- gcc/cp/init.c   (revision 254412)
+++ gcc/cp/init.c   (working copy)
@@ -1615,6 +1615,26 @@ expand_member_init (tree name)
   return NULL_TREE;
 }
 
+/* Return true if current_function_decl is a constructor
+   and its first argument is a reference type and it is
+   a direct initialization. 
+   If FLAGS is 0 then it is a direct initialization, the (init) form.  */
+
+static inline bool
+ref_first_parm_of_constructor (int flags)
+{
+  bool res = false;
+  if (current_function_decl != NULL_TREE)
+{
+  tree parmlist = FUNCTION_FIRST_USER_PARMTYPE (current_function_decl);
+  tree firstparm = TREE_VALUE (parmlist);
+  res = ((flags == 0) && DECL_CONSTRUCTOR_P (current_function_decl)
+   && (TREE_CODE (firstparm) == REFERENCE_TYPE));
+}
+
+  return res;
+}
+
 /* This is like `expand_member_init', only it stores one aggregate
value into another.
 
@@ -1728,10 +1748,20 @@ build_aggr_init (tree exp, tree init, int flags, t
   return stmt_expr;
 }
 
+   /* Don't set LOOKUP_ONLYCONVERTING for the following special case:
+   ISO C++/17 16.3.1.4 (over.match.copy):
+   When initializing a temporary to be bound to the first
+   parameter of a constructor where the parameter is of type
+   "reference to possibly cv-qualified T" and the constructor
+   is called with a single argument in the context of
+   direct-initialization of an object of type "cv2 T", explicit
+   conversion functions are also considered.  */
+
   if (init && init != void_type_node
   && TREE_CODE (init) != TREE_LIST
   && !(TREE_CODE (init) == TARGET_EXPR
   && TARGET_EXPR_DIRECT_INIT_P (init))
+  && !ref_first_parm_of_constructor (flags)
   && !DIRECT_LIST_INIT_P (init))
 flags |= LOOKUP_ONLYCONVERTING;
 
Index: gcc/testsuite/g++.dg/cpp0x/explicit13.C
===
--- gcc/testsuite/g++.dg/cpp0x/explicit13.C (revision 0)
+++ gcc/testsuite/g++.dg/cpp0x/explicit13.C (working copy)
@@ -0,0 +1,17 @@
+// PR c++/82235
+// { dg-do compile { target c++11 } }
+
+struct Foo {
+  Foo() {}
+  explicit Foo(const Foo& aOther) {}
+};
+
+struct Bar {
+  Foo m[1];
+};
+
+void test()
+{
+  Bar a;
+  Bar b = a;
+}


Re: [PATCH] Fix use-after-free in the strlen pass (PR tree-optimization/82977)

2017-11-14 Thread Martin Sebor

On 11/14/2017 02:30 PM, Jakub Jelinek wrote:

On Tue, Nov 14, 2017 at 02:24:28PM -0700, Martin Sebor wrote:

On 11/14/2017 02:04 PM, Jakub Jelinek wrote:

Hi!

strlen_to_stridx.get (rhs1) returns an address into the hash_map, and
strlen_to_stridx.put (lhs, *ps); (in order to be efficient) doesn't make a
copy of the argument just in case, first inserts the slot into it which
may cause reallocation, and only afterwards runs the copy ctor to assign
the value into the new slot.  So, passing it a reference to something
in the hash_map is wrong.  Fixed thusly, bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?


This seems like an unnecessary gotcha that should be possible to
avoid in the hash_map.  The corresponding standard containers
require it to work and so it's surprising when it doesn't in GCC.

I've been looking at how this is implemented and it seems to me
that a fix should be doable by having the hash_map check to see if
the underlying table needs to expand and if so, create a temporary
copy of the element before reallocating it.


That would IMHO just slow down and enlarge the hash_map for all users,
even when most of them don't really need it.
While it is reasonable for STL containers to make sure it works, we
aren't using STL containers and can pose additional restrictions.


How about at least detecting the problem then?  The attached patch
catches the bug while running the Wstringop-truncation tests and
passes x86_64 bootstrap.

Martin

gcc/ChangeLog:

	* gcc/hash-table.h (find_slot_with_hash, expand): Add argument.
	(find_slot): Adjust.
	* gcc/hash-map.h (put): Adjust.
	* gcc/hash-set.h (add): Adjust.

diff --git a/gcc/hash-map.h b/gcc/hash-map.h
index 73f1c54..b8bff72 100644
--- a/gcc/hash-map.h
+++ b/gcc/hash-map.h
@@ -134,7 +134,7 @@ public:
   bool put (const Key &k, const Value &v)
 {
   hash_entry *e = m_table.find_slot_with_hash (k, Traits::hash (k),
-		   INSERT);
+		   INSERT, &v);
   bool existed = !hash_entry::is_empty (*e);
   if (!existed)
 	e->m_key = k;
diff --git a/gcc/hash-set.h b/gcc/hash-set.h
index d2247d3..475a358 100644
--- a/gcc/hash-set.h
+++ b/gcc/hash-set.h
@@ -44,7 +44,7 @@ public:
 
   bool add (const Key &k)
 {
-  Key *e = m_table.find_slot_with_hash (k, Traits::hash (k), INSERT);
+  Key *e = m_table.find_slot_with_hash (k, Traits::hash (k), INSERT, &k);
   bool existed = !Traits::is_empty (*e);
   if (!existed)
 	*e = k;
diff --git a/gcc/hash-table.h b/gcc/hash-table.h
index 64d3157..fc39ad93 100644
--- a/gcc/hash-table.h
+++ b/gcc/hash-table.h
@@ -411,7 +411,8 @@ public:
 
   value_type *find_slot (const value_type &value, insert_option insert)
 {
-  return find_slot_with_hash (value, Descriptor::hash (value), insert);
+  return find_slot_with_hash (value, Descriptor::hash (value), insert,
+				  &value);
 }
 
   /* This function searches for a hash table slot containing an entry
@@ -422,7 +423,8 @@ public:
  write the value you want into the returned slot.  When inserting an
  entry, NULL may be returned if memory allocation fails. */
   value_type *find_slot_with_hash (const compare_type &comparable,
-				   hashval_t hash, enum insert_option insert);
+				   hashval_t hash, enum insert_option insert,
+				   const void *iptr = NULL);
 
   /* This function deletes an element with the given COMPARABLE value
  from hash table starting with the given HASH.  If there is no
@@ -504,7 +506,7 @@ private:
   value_type *alloc_entries (size_t n CXX_MEM_STAT_INFO) const;
   value_type *find_empty_slot_for_expand (hashval_t);
   bool too_empty_p (unsigned int);
-  void expand ();
+  void expand (const void* = NULL);
   static bool is_deleted (value_type &v)
   {
 return Descriptor::is_deleted (v);
@@ -706,11 +708,12 @@ hash_table<Descriptor, Allocator>::too_empty_p (unsigned int elts)
of the table after the call will be about 50%.  Naturally the hash
table must already exist.  Remember also that the place of the
table entries is changed.  If memory allocation fails, this function
-   will abort.  */
+   will abort.  If PTR points into the existing table, the function
+   will abort in checking mode.  */
 
 template<typename Descriptor, template<typename Type> class Allocator>
 void
-hash_table<Descriptor, Allocator>::expand ()
+hash_table<Descriptor, Allocator>::expand (const void *ptr /* = NULL */)
 {
   value_type *oentries = m_entries;
   unsigned int oindex = m_size_prime_index;
@@ -718,6 +721,15 @@ hash_table<Descriptor, Allocator>::expand ()
   value_type *olimit = oentries + osize;
   size_t elts = elements ();
 
+#if CHECKING_P
+  /* Verify that the pointer doesn't point into the table to detect
+ insertions of existing elements.  */
+  uintptr_t iptr = (uintptr_t)ptr;
+  uintptr_t ibeg = (uintptr_t)oentries;
+  uintptr_t iend = (uintptr_t)olimit;
+  gcc_checking_assert (iptr < ibeg || iend < iptr);
+#endif
+
   /* Resize only when table after removal of unused elements is either
  too full or too empty.  */
   unsigned int 

Re: [Patch] Don't call linker when creating pre-compiled header just because you saw a linker option

2017-11-14 Thread Steve Ellcey
On Tue, 2017-11-14 at 20:04 +0100, Richard Biener wrote:
> 
> I think the intent is to link even for just - lfoo while it makes
> sense to ignore -L/path - Wl,... Certainly counts as possible input. 
> 
> It seems you even break - lfoo. 
> 
> Richard. 

Good point.  I guess I need to take a different approach to determining
whether I am creating a PCH or an executable.  Here is a new patch that
also passes the testsuite with no regressions.  Originally in this patch
I was setting create_pch_flag whenever I saw an '@c-header' compiler
being called, but that broke on things like 'gcc foo.h main.c', so now I
only skip the link phase when creating PCHs is the only thing being done.

Steve Ellcey
sell...@cavium.com



2017-11-14  Steve Ellcey  

* gcc.c (create_pch_flag): New variable.
(driver::prepare_infiles): Set create_pch_flag
when we are only creating precompiled headers.
(driver::maybe_run_linker): Do not link if
create_pch_flag is set.
(driver::finalize): Reset create_pch_flag.


diff --git a/gcc/gcc.c b/gcc/gcc.c
index 43e6d59..9df32e3 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -208,6 +208,9 @@ int is_cpp_driver;
 /* Flag set to nonzero if an @file argument has been supplied to gcc.  */
 static bool at_file_supplied;
 
+/* Flag set to nonzero if we are generating a precompiled header.  */
+static bool create_pch_flag;
+
 /* Definition of string containing the arguments given to configure.  */
 #include "configargs.h"
 
@@ -8074,6 +8077,8 @@ driver::prepare_infiles ()
 {
   size_t i;
   int lang_n_infiles = 0;
+  bool found_pch_input_file = false;
+  bool found_non_pch_input_file = false;
 
   if (n_infiles == added_libraries)
 fatal_error (input_location, "no input files");
@@ -8102,8 +8107,16 @@ driver::prepare_infiles ()
       strlen (name),
       infiles[i].language);
 
-  if (compiler && !(compiler->combinable))
-   combine_inputs = false;
+  if (compiler)
+   {
+     if (!(compiler->combinable))
+   combine_inputs = false;
+
+     if (strcmp(compiler->suffix, "@c-header") == 0)
+   found_pch_input_file = true;
+     else
+   found_non_pch_input_file = true;
+   }
 
   if (lang_n_infiles > 0 && compiler != input_file_compiler
      && infiles[i].language && infiles[i].language[0] != '*')
@@ -8129,6 +8142,9 @@ driver::prepare_infiles ()
 fatal_error (input_location,
     "cannot specify -o with -c, -S or -E with multiple files");
 
+  if (found_pch_input_file && !found_non_pch_input_file)
+create_pch_flag = true;
+
   /* No early exit needed from main; we can continue.  */
   return false;
 }
@@ -8289,6 +8305,10 @@ driver::maybe_run_linker (const char *argv0) const
   int linker_was_run = 0;
   int num_linker_inputs;
 
+  /* If we are creating a precompiled header, do not run the linker.  */
+  if (create_pch_flag)
+return;
+
   /* Determine if there are any linker input files.  */
   num_linker_inputs = 0;
   for (i = 0; (int) i < n_infiles; i++)
@@ -10059,6 +10079,7 @@ driver::finalize ()
 
   is_cpp_driver = 0;
   at_file_supplied = 0;
+  create_pch_flag = 0;
   print_help_list = 0;
   print_version = 0;
   verbose_only_flag = 0;


[Ping] [C++ Patch/RFC] PR 82593 ("Internal compiler error: in process_init_constructor_array, at cp/typeck2.c:1294")

2017-11-14 Thread Paolo Carlini

Hi,

any feedback about this - apparently rather simple - issue?

On 03/11/2017 18:33, Paolo Carlini wrote:

Hi,

this ICE on valid (given GNU's designated initializers) is rather 
simple to analyze: for the testcase, the gcc_assert in 
process_init_constructor_array triggers because at that time INDEX1 is 
still a CONST_DECL, not an INTEGER_CST. As regards fixing the problem, 
I immediately noticed earlier today that fold_non_dependent_expr is 
definitely able to fold the CONST_DECL to the expected zero 
INTEGER_CST - and that also passes testing - but I'm not sure that in 
the big picture folding at that time is correct. Thanks in advance for 
any feedback!


    https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00242.html

Thanks!
Paolo




Re: [PATCH] PR fortran/78240 -- kludge of the day

2017-11-14 Thread Steve Kargl
On Tue, Nov 14, 2017 at 05:21:41PM -0500, Fritz Reese wrote:
> On Tue, Nov 14, 2017 at 4:58 PM, Janus Weil  wrote:
> > Error: The module or main program array ‘x’ at (1) must have constant shape
> > pr78240.f90:11:19:
> >
> >integer x(n)/1/   ! { dg-error "Nonconstant array" }
> >1
> > Error: Nonconstant array section at (1) in DATA statement
> > [...]
> 
> ... does anyone know how to tell dejagnu to expect multiple errors on
> a single line?

You can use an OR in the dg-error directive.

  integer x(n) /1/  ! { dg-error "module or main|Nonconstant array" }

You can also use a prune directive.

! { dg-prune-output "module or main" }

See https://gcc.gnu.org/wiki/TestCaseWriting and examples
in gfortran.dg.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow


Re: [PATCH] PR fortran/78240 -- kludge of the day

2017-11-14 Thread Janus Weil
2017-11-14 23:21 GMT+01:00 Fritz Reese :
> On Tue, Nov 14, 2017 at 4:58 PM, Janus Weil  wrote:
>> Hi guys,
>>
>> I see this new test case failing on x86_64-linux-gnu:
>>
>> FAIL: gfortran.dg/pr78240.f90   -O  (test for excess errors)
>>
>>
>> $ gfortran-8 pr78240.f90
>> pr78240.f90:11:12:
>>
>>integer x(n)/1/   ! { dg-error "Nonconstant array" }
>> 1
>> Error: Variable ‘n’ cannot appear in the expression at (1)
>> pr78240.f90:11:14:
>>
>>integer x(n)/1/   ! { dg-error "Nonconstant array" }
>>   1
>> Error: The module or main program array ‘x’ at (1) must have constant shape
>> pr78240.f90:11:19:
>>
>>integer x(n)/1/   ! { dg-error "Nonconstant array" }
>>1
>> Error: Nonconstant array section at (1) in DATA statement
>> [...]
>
> ... does anyone know how to tell dejagnu to expect multiple errors on
> a single line?

I think this is covered here:

https://gcc.gnu.org/wiki/TestCaseWriting

Cheers,
Janus


Re: [PATCH] PR fortran/78240 -- kludge of the day

2017-11-14 Thread Fritz Reese
On Tue, Nov 14, 2017 at 4:58 PM, Janus Weil  wrote:
> Hi guys,
>
> I see this new test case failing on x86_64-linux-gnu:
>
> FAIL: gfortran.dg/pr78240.f90   -O  (test for excess errors)
>
>
> $ gfortran-8 pr78240.f90
> pr78240.f90:11:12:
>
>integer x(n)/1/   ! { dg-error "Nonconstant array" }
> 1
> Error: Variable ‘n’ cannot appear in the expression at (1)
> pr78240.f90:11:14:
>
>integer x(n)/1/   ! { dg-error "Nonconstant array" }
>   1
> Error: The module or main program array ‘x’ at (1) must have constant shape
> pr78240.f90:11:19:
>
>integer x(n)/1/   ! { dg-error "Nonconstant array" }
>1
> Error: Nonconstant array section at (1) in DATA statement
> [...]

... does anyone know how to tell dejagnu to expect multiple errors on
a single line?

---
Fritz Reese


[PATCH, rs6000] (v2) GIMPLE folding for vector compares

2017-11-14 Thread Will Schmidt

Hi,
  Add support for gimple folding of vec_cmp_{eq,ge,gt,le,ne}
for the integer data types.

As part of this change, several define_insn stanzas have been added/updated
in vsx.md that specify the "ne: -> not: + eq:" combinations to allow for the
generation of the desired vcmpne[bhw] instructions, where we otherwise would
have generated a vcmpeq + vnor combination.  The defines also obsoleted the
need for the UNSPEC versions of the same, so this ends up being just an
update to those existing defines.

Several entries have been added to the switch statement in
builtin_function_type to identify the builtins having unsigned arguments.

A handful of existing tests required updates to their specified optimization
levels to continue to generate the desired code.  builtins-3-p9.c in particular
has been updated to reflect improved code gen with the higher specified
optimization level.
Testcase coverage is otherwise handled by the already-in-tree
gcc.target/powerpc/fold-vec-cmp-*.c tests.

Per feedback from the prior version, v2 changes also include:
  * Reworked the actual folding to use a VEC_COND_EXPR.  For cleanliness, I
  moved this to a new fold_build_vec_cmp() helper function, which itself
  is based on build_vec_cmp() as found in typeck.c.
  * Added an additional fold_compare_helper() function to further factor out
  the steps that are common to all of the vector compare operations.

Testing is currently underway on P6 and newer. OK for trunk?

Thanks,
-Will


2017-11-14  Will Schmidt  
[gcc]
* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
folding of vector compares.
(fold_build_vec_cmp): New helper function.
(fold_compare_helper): New helper function.
(builtin_function_type): Add compare builtins to the list of functions
having unsigned arguments.
* config/rs6000/vsx.md (vcmpneb, vcmpneh, vcmpnew): Update to specify 
the not+eq combination.

[testsuite]
* gcc.target/powerpc/builtins-3-p9.c: Add -O1, update
expected codegen checks.
* gcc.target/powerpc/vec-cmp-sel.c: Mark vars as volatile.
* gcc.target/powerpc/vsu/vec-cmpne-0.c: Add -O1.
* gcc.target/powerpc/vsu/vec-cmpne-1.c: Add -O1.
* gcc.target/powerpc/vsu/vec-cmpne-2.c: Add -O1.
* gcc.target/powerpc/vsu/vec-cmpne-3.c: Add -O1.
* gcc.target/powerpc/vsu/vec-cmpne-4.c: Add -O1.
* gcc.target/powerpc/vsu/vec-cmpne-5.c: Add -O1.
* gcc.target/powerpc/vsu/vec-cmpne-6.c: Add -O1.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 2c80a2f..0317324 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16206,10 +16206,40 @@ rs6000_builtin_valid_without_lhs (enum rs6000_builtins fn_code)
 default:
   return false;
 }
 }
 
+/*  Helper function to handle the gimple folding of a vector compare
+operation.  This sets up true/false vectors, and uses the
+VEC_COND_EXPR operation.
+'code' indicates which comparison is to be made. (EQ, GT, ...).
+'type' indicates the type of the result.  */
+static tree
+fold_build_vec_cmp (tree_code code, tree type,
+   tree arg0, tree arg1)
+{
+  tree cmp_type = build_same_sized_truth_vector_type (type);
+  tree zero_vec = build_zero_cst (type);
+  tree minus_one_vec = build_minus_one_cst (type);
+  tree cmp = fold_build2 (code, cmp_type, arg0, arg1);
+  return fold_build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec);
+}
+
+/* Helper function to handle the in-between steps for the
+   vector compare built-ins.  */
+static void
+fold_compare_helper (gimple_stmt_iterator *gsi, tree_code code, gimple *stmt)
+{
+  tree arg0 = gimple_call_arg (stmt, 0);
+  tree arg1 = gimple_call_arg (stmt, 1);
+  tree lhs = gimple_call_lhs (stmt);
+  gimple *g = gimple_build_assign (lhs,
+   fold_build_vec_cmp (code, TREE_TYPE (lhs), arg0, arg1));
+  gimple_set_location (g, gimple_location (stmt));
+  gsi_replace (gsi, g, true);
+}
+
 /* Fold a machine-dependent built-in in GIMPLE.  (For folding into
a constant, use rs6000_fold_builtin.)  */
 
 bool
 rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
@@ -16701,10 +16731,67 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
gimple_set_location (g, gimple_location (stmt));
gsi_replace (gsi, g, true);
return true;
   }
 
+/* Vector compares; EQ, NE, GE, GT, LE.  */
+case ALTIVEC_BUILTIN_VCMPEQUB:
+case ALTIVEC_BUILTIN_VCMPEQUH:
+case ALTIVEC_BUILTIN_VCMPEQUW:
+case P8V_BUILTIN_VCMPEQUD:
+  {
+   fold_compare_helper (gsi, EQ_EXPR, stmt);
+   return true;
+  }
+
+case P9V_BUILTIN_CMPNEB:
+case P9V_BUILTIN_CMPNEH:
+case P9V_BUILTIN_CMPNEW:
+  {
+   fold_compare_helper (gsi, NE_EXPR, stmt);
+   return true;
+  }
+
+case VSX_BUILTIN_CMPGE_16QI:
+case 

Re: [PATCH] Fixes for PR68356, PR81210, and PR81693

2017-11-14 Thread Mike Stump
On Nov 14, 2017, at 3:33 AM, Dominique d'Humières  wrote:
> 
>> Le 13 nov. 2017 à 18:40, Mike Stump  a écrit :
>> On Nov 12, 2017, at 6:58 AM, H.J. Lu  wrote:
>>> 
>>> On Sun, Nov 12, 2017 at 6:22 AM, Dominique d'Humières
>>>  wrote:
 The following patch fixes PR68356, PR81210, and PR81693 on darwin.
 ...
>>> 
>>> I wrote these tests.  These tests don't align stack to 16 bytes and
>>> should be skipped on Darwin.
>> 
>> So, sounds like something based on:
>> 
>> /* { dg-skip-if "sp not aligned to 16 bytes" { *-*-darwin } } */
>> 
>> then.  Ok with that change.
> 
> Thanks for the review.
> 
> Am I correct to understand that this applies to pr25967-*.c only?
> 
> I’d like to keep the PR numbers. Is it OK?

Feel free to keep the PR, but, I'd close the other two as dups of the first, 
and then just list the first.  The issue is that all of these test cases are 1 
problem, so there should only be 1 PR.  HJ's comment applies to all test cases 
in your patch, and the problem appears to be the single non-16 byte stack on 
all of them.  Because of that alone, the test cases should be skipped for that 
one reason, no matter the existence of other reasons why that test case should 
be skipped.



Re: [PATCH] PR fortran/78240 -- kludge of the day

2017-11-14 Thread Janus Weil
Hi guys,

I see this new test case failing on x86_64-linux-gnu:

FAIL: gfortran.dg/pr78240.f90   -O  (test for excess errors)


$ gfortran-8 pr78240.f90
pr78240.f90:11:12:

   integer x(n)/1/   ! { dg-error "Nonconstant array" }
1
Error: Variable ‘n’ cannot appear in the expression at (1)
pr78240.f90:11:14:

   integer x(n)/1/   ! { dg-error "Nonconstant array" }
  1
Error: The module or main program array ‘x’ at (1) must have constant shape
pr78240.f90:11:19:

   integer x(n)/1/   ! { dg-error "Nonconstant array" }
   1
Error: Nonconstant array section at (1) in DATA statement


Cheers,
Janus




2017-11-14 1:32 GMT+01:00 Steve Kargl :
> On Mon, Nov 13, 2017 at 04:42:31PM -0500, Fritz Reese wrote:
>> On Thu, Nov 9, 2017 at 5:02 PM, Steve Kargl
>>  wrote:
>> > The following patch fixes PR fortran/78240.  It seems
>> > to me to be inelegant, but it does pass regression
>> > testing. [...] OK to commit?
>>
>> Upon closer analysis, the patch is insufficient to fix the PR. I will
>> explain below. At the bottom of this letter I explain and have
>> attached a new patch which fixes some more subtle issues in the code
>> which causes the PR.
>
> Thanks for taking a look.  You're correct that I should
> have looked more closely at the memory management that
> you noted.
>
>> Thanks for pointing me to this issue and allowing me time to review
>> it. The new patch passes all regression tests including its two tests
>> (one for each issue above). OK to commit this one?
>
> Yes.  Thanks for the patch.
>
> --
> steve


Re: [PATCH] Fix use-after-free in the strlen pass (PR tree-optimization/82977)

2017-11-14 Thread Martin Sebor

On 11/14/2017 02:10 PM, Jeff Law wrote:

On 11/14/2017 02:04 PM, Jakub Jelinek wrote:

Hi!

strlen_to_stridx.get (rhs1) returns an address into the hash_map, and
strlen_to_stridx.put (lhs, *ps); (in order to be efficient) doesn't make a
copy of the argument just in case, first inserts the slot into it which
may cause reallocation, and only afterwards runs the copy ctor to assign
the value into the new slot.  So, passing it a reference to something
in the hash_map is wrong.  Fixed thusly, bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?

2017-11-14  Jakub Jelinek  

PR tree-optimization/82977
* tree-ssa-strlen.c (strlen_optimize_stmt): Pass a reference to a copy
constructed temporary to strlen_to_stridx.put.

I've been seeing a couple new tests flip between pass and fail recently.
 I wonder if this is the ultimate cause.


I've been noticing it for quite a while, even before the commit,
so I suspect something else is going on in addition to this bug.

Martin


Re: [PATCH] Fix use-after-free in the strlen pass (PR tree-optimization/82977)

2017-11-14 Thread Jakub Jelinek
On Tue, Nov 14, 2017 at 02:24:28PM -0700, Martin Sebor wrote:
> On 11/14/2017 02:04 PM, Jakub Jelinek wrote:
> > Hi!
> > 
> > strlen_to_stridx.get (rhs1) returns an address into the hash_map, and
> > strlen_to_stridx.put (lhs, *ps); (in order to be efficient) doesn't make a
> > copy of the argument just in case, first inserts the slot into it which
> > may cause reallocation, and only afterwards runs the copy ctor to assign
> > the value into the new slot.  So, passing it a reference to something
> > in the hash_map is wrong.  Fixed thusly, bootstrapped/regtested on
> > x86_64-linux and i686-linux, ok for trunk?
> 
> This seems like an unnecessary gotcha that should be possible to
> avoid in the hash_map.  The corresponding standard containers
> require it to work and so it's surprising when it doesn't in GCC.
> 
> I've been looking at how this is implemented and it seems to me
> that a fix should be doable by having the hash_map check to see if
> the underlying table needs to expand and if so, create a temporary
> copy of the element before reallocating it.

That would IMHO just slow down and enlarge the hash_map for all users,
even when most of them don't really need it.
While it is reasonable for STL containers to make sure it works, we
aren't using STL containers and can pose additional restrictions.

Jakub


[PATCH] Move bswap pass into gimple-ssa-store-merging.c

2017-11-14 Thread Jakub Jelinek
Hi!

I'll be working on further store-merging improvements next two days
and hope to use some of the bswap pass APIs to handle stuff like:
void foo (char *__restrict p, char *__restrict q)
{
  p[0] = q[3];
  p[1] = q[2];
  p[2] = q[1];
  p[3] = q[0];
}
or
  p[4] = data >> 8;
  p[5] = data;
etc.  As a precondition for that, this patch just moves the whole bswap
pass from tree-ssa-math-opts.c, without too many changes (mostly just
putting stuff into anon namespace, removing unneeded static keywords, and
moving some test from execute to gate).  Bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?  (Wouldn't commit it anyway
until further patches that actually need it are approved).

2017-11-14  Jakub Jelinek  

* tree-ssa-math-opts.c (nop_stats, bswap_stats, struct symbolic_number,
BITS_PER_MARKER, MARKER_MASK, MARKER_BYTE_UNKNOWN, HEAD_MARKER, CMPNOP,
CMPXCHG, do_shift_rotate, verify_symbolic_number_p,
init_symbolic_number, find_bswap_or_nop_load, perform_symbolic_merge,
find_bswap_or_nop_1, find_bswap_or_nop, pass_data_optimize_bswap,
class pass_optimize_bswap, bswap_replace,
pass_optimize_bswap::execute): Moved to ...
* gimple-ssa-store-merging.c: ... this file.
Include optabs-tree.h.
(nop_stats, bswap_stats, do_shift_rotate, verify_symbolic_number_p,
init_symbolic_number, find_bswap_or_nop_load, perform_symbolic_merge,
find_bswap_or_nop_1, find_bswap_or_nop, bswap_replace): Put into
anonymous namespace, remove static keywords.
(pass_optimize_bswap::gate): Test BITS_PER_UNIT == 8 here...
(pass_optimize_bswap::execute): ... rather than here.

--- gcc/tree-ssa-math-opts.c.jj 2017-11-06 17:24:15.0 +0100
+++ gcc/tree-ssa-math-opts.c	2017-11-14 17:08:28.831697803 +0100
@@ -167,18 +167,6 @@ static struct
 
 static struct
 {
-  /* Number of hand-written 16-bit nop / bswaps found.  */
-  int found_16bit;
-
-  /* Number of hand-written 32-bit nop / bswaps found.  */
-  int found_32bit;
-
-  /* Number of hand-written 64-bit nop / bswaps found.  */
-  int found_64bit;
-} nop_stats, bswap_stats;
-
-static struct
-{
   /* Number of widening multiplication ops inserted.  */
   int widen_mults_inserted;
 
@@ -1934,1070 +1922,6 @@ make_pass_cse_sincos (gcc::context *ctxt
   return new pass_cse_sincos (ctxt);
 }
 
-/* A symbolic number structure is used to detect byte permutation and selection
-   patterns of a source.  To achieve that, its field N contains an artificial
-   number consisting of BITS_PER_MARKER sized markers tracking where does each
-   byte come from in the source:
-
-   0  - target byte has the value 0
-   FF - target byte has an unknown value (eg. due to sign extension)
-   1..size - marker value is the byte index in the source (0 for lsb).
-
-   To detect permutations on memory sources (arrays and structures), a symbolic
-   number is also associated:
-   - a base address BASE_ADDR and an OFFSET giving the address of the source;
-   - a range which gives the difference between the highest and lowest accessed
- memory location to make such a symbolic number;
-   - the address SRC of the source element of lowest address as a convenience
- to easily get BASE_ADDR + offset + lowest bytepos;
-   - number of expressions N_OPS bitwise ored together to represent
- approximate cost of the computation.
-
-   Note 1: the range is different from size as size reflects the size of the
-   type of the current expression.  For instance, for an array char a[],
-   (short) a[0] | (short) a[3] would have a size of 2 but a range of 4 while
-   (short) a[0] | ((short) a[0] << 1) would still have a size of 2 but this
-   time a range of 1.
-
-   Note 2: for non-memory sources, range holds the same value as size.
-
-   Note 3: SRC points to the SSA_NAME in case of non-memory source.  */
-
-struct symbolic_number {
-  uint64_t n;
-  tree type;
-  tree base_addr;
-  tree offset;
-  HOST_WIDE_INT bytepos;
-  tree src;
-  tree alias_set;
-  tree vuse;
-  unsigned HOST_WIDE_INT range;
-  int n_ops;
-};
-
-#define BITS_PER_MARKER 8
-#define MARKER_MASK ((1 << BITS_PER_MARKER) - 1)
-#define MARKER_BYTE_UNKNOWN MARKER_MASK
-#define HEAD_MARKER(n, size) \
-  ((n) & ((uint64_t) MARKER_MASK << (((size) - 1) * BITS_PER_MARKER)))
-
-/* The number which the find_bswap_or_nop_1 result should match in
-   order to have a nop.  The number is masked according to the size of
-   the symbolic number before using it.  */
-#define CMPNOP (sizeof (int64_t) < 8 ? 0 : \
-  (uint64_t)0x08070605 << 32 | 0x04030201)
-
-/* The number which the find_bswap_or_nop_1 result should match in
-   order to have a byte swap.  The number is masked according to the
-   size of the symbolic number before using it.  */
-#define CMPXCHG (sizeof (int64_t) < 8 ? 0 : \
-  (uint64_t)0x01020304 << 32 | 0x05060708)
-
-/* Perform a SHIFT or ROTATE operation by COUNT bits on symbolic
-   

Re: [PATCH] Fix use-after-free in the strlen pass (PR tree-optimization/82977)

2017-11-14 Thread Martin Sebor

On 11/14/2017 02:04 PM, Jakub Jelinek wrote:

Hi!

strlen_to_stridx.get (rhs1) returns an address into the hash_map, and
strlen_to_stridx.put (lhs, *ps); (in order to be efficient) doesn't make a
copy of the argument just in case, first inserts the slot into it which
may cause reallocation, and only afterwards runs the copy ctor to assign
the value into the new slot.  So, passing it a reference to something
in the hash_map is wrong.  Fixed thusly, bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?


This seems like an unnecessary gotcha that should be possible to
avoid in the hash_map.  The corresponding standard containers
require it to work and so it's surprising when it doesn't in GCC.

I've been looking at how this is implemented and it seems to me
that a fix should be doable by having the hash_map check to see if
the underlying table needs to expand and if so, create a temporary
copy of the element before reallocating it.

Martin



[PATCH] Small expand_mul_overflow improvement (PR target/82981)

2017-11-14 Thread Jakub Jelinek
Hi!

For targets that don't have a {,u}mulv<mode>4 insn we try 3 different
expansions of the basic signed * signed -> signed or unsigned * unsigned ->
unsigned overflow computation.  The first one is done if 
   if (GET_MODE_2XWIDER_MODE (mode).exists (&wmode)
   && targetm.scalar_mode_supported_p (wmode))
and we emit a WIDEN_MULT_EXPR followed by extraction of the hipart
from it (for testing overflow if both unsigned and signed) and
lowpart (result of the multiplication and for signed overflow testing
where we use MSB of it).  This case is meant for use by smaller modes,
e.g. subword, where it is generally pretty efficient.  Unfortunately
on some targets, e.g. mips 64-bit, where there is no widening mult
optab it can be expanded as a libcall on the full wmode operands,
which is slow and causes problems e.g. to some freestanding environments
like Linux kernel that don't bother to link in libgcc.a or replacement
thereof.  Then there is another case, usually pretty large, with usually
two (but sometimes one) multiplications, and various conditionals, shifts, etc.
meant primarily for the widest supported mode.  And the last fallback
is just doing the multiplication and never computing overflow; hopefully it
is never used, at least on sane targets.

This patch attempts to check if we'd emit WIDEN_MULT_EXPR as a libcall
and in that case tries to use the other possibilities, and only falls
back to the WIDEN_MULT_EXPR with a libcall if we'd otherwise use the
last fallback without overflow computation.

In addition to it, it adds support for targets that have supported
MULT_HIGHPART_EXPR for that mode, by doing pretty much what the
WIDEN_MULT_EXPR case does, but instead of doing one multiplication
to compute both lowpart and highpart and then shifts to split those
we use one multiplication to compute the lowpart and one MULT_HIGHPART_EXPR
to compute the highpart.  In theory this method doesn't have to be always
faster than the one with hmode, because the MULT_HIGHPART_EXPR case does
2 multiplications plus one comparison, while the hmode code does sometimes
just one, but it is significantly shorter, fewer conditionals/branches
so I think it should be generally a win (if it turns out not to be the
case on some target, we could restrict it to -Os only or whatever).

And lastly, the MULT_HIGHPART_EXPR case can actually be the optimal code
if we are only checking for the overflow and don't actually need the
multiplication value, it is an unsigned multiply, and there is no code
using the result afterwards; in that case the low-part multiply can be DCEd
and only the highpart multiply + comparison will remain.  So, the patch
adds check for single IMAGPART_EXPR use and other conditions and uses
the MULT_HIGHPART_EXPR code in preference of the WIDEN_MULT_EXPR in that
case.

Bootstrapped/regtested on x86_64-linux and i686-linux, tested on the
testcase with cross to mips, ok for trunk?

2017-11-14  Jakub Jelinek  

PR target/82981
* internal-fn.c: Include gimple-ssa.h, tree-phinodes.h and
ssa-iterators.h.
(can_widen_mult_without_libcall): New function.
(expand_mul_overflow): If only checking unsigned mul overflow,
not result, and can do efficiently MULT_HIGHPART_EXPR, emit that.
Don't use WIDEN_MULT_EXPR if it would involve a libcall, unless
no other way works.  Add MULT_HIGHPART_EXPR + MULT_EXPR support.
(expand_DIVMOD): Formatting fix.
* expmed.h (expand_mult): Add NO_LIBCALL argument.
* expmed.c (expand_mult): Likewise.  Use OPTAB_WIDEN rather
than OPTAB_LIB_WIDEN if NO_LIBCALL is true, and allow it to fail.

--- gcc/internal-fn.c.jj	2017-10-23 10:13:08.0 +0200
+++ gcc/internal-fn.c   2017-11-14 16:48:25.414403348 +0100
@@ -46,6 +46,9 @@ along with GCC; see the file COPYING3.
 #include "recog.h"
 #include "builtins.h"
 #include "optabs-tree.h"
+#include "gimple-ssa.h"
+#include "tree-phinodes.h"
+#include "ssa-iterators.h"
 
 /* The names of each internal function, indexed by function number.  */
 const char *const internal_fn_name_array[] = {
@@ -1172,6 +1175,35 @@ expand_neg_overflow (location_t loc, tre
 }
 }
 
+/* Return true if UNS WIDEN_MULT_EXPR with result mode WMODE and operand
+   mode MODE can be expanded without using a libcall.  */
+
+static bool
+can_widen_mult_without_libcall (scalar_int_mode wmode, scalar_int_mode mode,
+   rtx op0, rtx op1, bool uns)
+{
+  if (find_widening_optab_handler (umul_widen_optab, wmode, mode)
+  != CODE_FOR_nothing)
+return true;
+
+  if (find_widening_optab_handler (smul_widen_optab, wmode, mode)
+  != CODE_FOR_nothing)
+return true;
+
+  rtx_insn *last = get_last_insn ();
+  if (CONSTANT_P (op0))
+op0 = convert_modes (wmode, mode, op0, uns);
+  else
+op0 = gen_raw_REG (wmode, LAST_VIRTUAL_REGISTER + 1);
+  if (CONSTANT_P (op1))
+op1 = convert_modes (wmode, mode, op1, uns);
+  else
+op1 = gen_raw_REG 

Re: [PATCH] Fix use-after-free in the strlen pass (PR tree-optimization/82977)

2017-11-14 Thread Jeff Law
On 11/14/2017 02:04 PM, Jakub Jelinek wrote:
> Hi!
> 
> strlen_to_stridx.get (rhs1) returns an address into the hash_map, and
> strlen_to_stridx.put (lhs, *ps); (in order to be efficient) doesn't make a
> copy of the argument just in case, first inserts the slot into it which
> may cause reallocation, and only afterwards runs the copy ctor to assign
> the value into the new slot.  So, passing it a reference to something
> in the hash_map is wrong.  Fixed thusly, bootstrapped/regtested on
> x86_64-linux and i686-linux, ok for trunk?
> 
> 2017-11-14  Jakub Jelinek  
> 
>   PR tree-optimization/82977
>   * tree-ssa-strlen.c (strlen_optimize_stmt): Pass a reference to a copy
>   constructed temporary to strlen_to_stridx.put.
I've been seeing a couple of new tests flip between pass and fail recently.
I wonder if this is the ultimate cause.

My brain isn't working at the moment (sick and too little sleep), so not
going to try and review it right now.

jeff


[PATCH] Fix use-after-free in the strlen pass (PR tree-optimization/82977)

2017-11-14 Thread Jakub Jelinek
Hi!

strlen_to_stridx.get (rhs1) returns an address into the hash_map, and
strlen_to_stridx.put (lhs, *ps); (in order to be efficient) doesn't make a
copy of the argument just in case, first inserts the slot into it which
may cause reallocation, and only afterwards runs the copy ctor to assign
the value into the new slot.  So, passing it a reference to something
in the hash_map is wrong.  Fixed thusly, bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?

2017-11-14  Jakub Jelinek  

PR tree-optimization/82977
* tree-ssa-strlen.c (strlen_optimize_stmt): Pass a reference to a copy
constructed temporary to strlen_to_stridx.put.

--- gcc/tree-ssa-strlen.c.jj	2017-11-13 09:31:30.0 +0100
+++ gcc/tree-ssa-strlen.c   2017-11-14 10:28:30.583110162 +0100
@@ -2968,7 +2968,7 @@ strlen_optimize_stmt (gimple_stmt_iterat
 
tree rhs1 = gimple_assign_rhs1 (stmt);
if (stridx_strlenloc *ps = strlen_to_stridx.get (rhs1))
- strlen_to_stridx.put (lhs, *ps);
+ strlen_to_stridx.put (lhs, stridx_strlenloc (*ps));
   }
 else if (TREE_CODE (lhs) != SSA_NAME && !TREE_SIDE_EFFECTS (lhs))
{

Jakub


Re: [PATCH, rs6000] Repair vec_xl, vec_xst, vec_xl_be, vec_xst_be built-in functions

2017-11-14 Thread Bill Schmidt
Hi,

I had a pasto in the function prototype for vec_xst_be.  Fixed patch is below.

Thanks,
Bill

On 11/14/17 8:11 AM, Bill Schmidt wrote:

> Please hold review on this until I investigate something that Carl brought 
> up.  Thanks,
> and sorry for the noise!
>
> -- Bill
>> On Nov 13, 2017, at 10:30 AM, Bill Schmidt  
>> wrote:
>>
>> Hi,
>>
>> Some previous patches to add support for vec_xl_be and fill in gaps in 
>> testing
>> for vec_xl and vec_xst resulted in incorrect semantics for these built-in
>> functions.  My fault for not reviewing them in detail.  Essentially the 
>> vec_xl
>> and vec_xl_be semantics were reversed, and vec_xst received semantics that
>> should be associated with vec_xst_be.  Also, vec_xst_be has not yet been
>> implemented.  Also, my attempt to add P8-specific code for the element-
>> reversing loads and stores had the same problem with wrong semantics.  This 
>> patch addresses these issues in a uniform manner.
>>
>> Most of the work is done by adjusting the rs6000-c.c mapping between 
>> overloaded
>> function names and type-specific implementations, and adding missing 
>> interfaces
>> there.  I've also removed the previous implementation function
>> altivec_expand_xl_be_builtin, and moved the byte-reversal into define_expands
>> for the code gen patterns with some simplification.  These generate single
>> instructions for P9 (lxvh8x, lxvb16x, etc.) and lxvw4x + permute for P8.
>>
>> I've verified the code generation by hand, but have not included test cases
>> in this patch because Carl Love has been independently working on adding a
>> full set of test cases for these built-ins.  The code does pass those 
>> relevant
>> tests that are already in the test suite.  I hope it's okay to install this
>> patch while waiting for the full tests; I'll of course give high priority to
>> fixing any possible fallout from those tests.
>>
>> Bootstrapped and tested on powerpc64le-linux-gnu (POWER8 and POWER9) with no
>> regressions.  Is this okay for trunk?
>>
>> Thanks,
>> Bill
>>
[gcc]

2017-11-14  Bill Schmidt  

* config/rs6000/altivec.h (vec_xst_be): New #define.
* config/rs6000/altivec.md (altivec_vperm_<mode>_direct): Rename
and externalize from *altivec_vperm_<mode>_internal.
* config/rs6000/rs6000-builtin.def (XL_BE_V16QI): Remove macro
instantiation.
(XL_BE_V8HI): Likewise.
(XL_BE_V4SI): Likewise.
(XL_BE_V4SI): Likewise.
(XL_BE_V2DI): Likewise.
(XL_BE_V4SF): Likewise.
(XL_BE_V2DF): Likewise.
(XST_BE): Add BU_VSX_OVERLOAD_X macro instantiation.
* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Correct
all array entries with these keys: VSX_BUILTIN_VEC_XL,
VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_VEC_XST.  Add entries for key
VSX_BUILTIN_VEC_XST_BE.
* config/rs6000/rs6000.c (altivec_expand_xl_be_builtin): Remove.
(altivec_expand_builtin): Remove handling for VSX_BUILTIN_XL_BE_*
built-ins.
(altivec_init_builtins): Replace conditional calls to def_builtin
for __builtin_vsx_ld_elemrev_{v8hi,v16qi} and
__builtin_vsx_st_elemrev_{v8hi,v16qi} based on TARGET_P9_VECTOR
with unconditional calls.  Remove calls to def_builtin for
__builtin_vsx_le_be_<mode>.  Add a call to def_builtin for
__builtin_vec_xst_be.
* config/rs6000/vsx.md (vsx_ld_elemrev_v8hi): Convert define_insn
to define_expand, and add alternate RTL generation for P8.
(*vsx_ld_elemrev_v8hi_internal): New define_insn based on
vsx_ld_elemrev_v8hi.
(vsx_ld_elemrev_v16qi): Convert define_insn to define_expand, and
add alternate RTL generation for P8.
(*vsx_ld_elemrev_v16qi_internal): New define_insn based on
vsx_ld_elemrev_v16qi.
(vsx_st_elemrev_v8hi): Convert define_insn
to define_expand, and add alternate RTL generation for P8.
(*vsx_st_elemrev_v8hi_internal): New define_insn based on
vsx_st_elemrev_v8hi.
(vsx_st_elemrev_v16qi): Convert define_insn to define_expand, and
add alternate RTL generation for P8.
(*vsx_st_elemrev_v16qi_internal): New define_insn based on
vsx_st_elemrev_v16qi.

[gcc/testsuite]

2017-11-14  Bill Schmidt  

* gcc.target/powerpc/swaps-p8-26.c: Modify expected code
generation.


Index: gcc/config/rs6000/altivec.h
===
--- gcc/config/rs6000/altivec.h (revision 254629)
+++ gcc/config/rs6000/altivec.h (working copy)
@@ -357,6 +357,7 @@
 #define vec_xl __builtin_vec_vsx_ld
 #define vec_xl_be __builtin_vec_xl_be
 #define vec_xst __builtin_vec_vsx_st
+#define vec_xst_be __builtin_vec_xst_be
 
 /* Note, xxsldi and xxpermdi were added as __builtin_vsx_<xxx> functions
instead of __builtin_vec_<xxx>  */
Index: gcc/config/rs6000/altivec.md

[PR c++/82878] pass-by-invisiref in lambda

2017-11-14 Thread Nathan Sidwell
This patch fixes 82878, which turned out to be a problem in genericizing 
the call inside a __closure::_FUN () that we synthesize when we're 
providing a non-member fn pointer for a non-capturing lambda.  That call 
is marked as CALL_FROM_THUNK_P, but I elided the check for that when 
fixing 78495.


I think the 78495 fix is incorrect.  That's when genericizing the call 
from an inheriting ctor to the target ctor.  That too was setting 
CALL_FROM_THUNK_P.


I think the right fix for that was to not set CALL_FROM_THUNK_P.  We're 
generating that call inside a function whose parms have yet to be bashed 
into invisrefs.  So not thunky enough.


This patch restores the deleted cp_genericize_r code and elides the 
setting of CALL_FROM_THUNK_P during call generation.


Jason, does this look right?

nathan

--
Nathan Sidwell
2017-11-14  Nathan Sidwell  

	PR c++/82878
	PR c++/78495
	* call.c (build_call_a): Don't set CALL_FROM_THUNK_P for inherited
	ctor.
	* cp-gimplify.c	(cp_genericize_r): Restore THUNK dereference
	inhibition check removed in previous c++/78495 change.

	PR c++/82878
	* g++.dg/cpp0x/pr82878.C: New.
	* g++.dg/cpp1z/inh-ctor38.C: Check moves too.

Index: cp/call.c
===
--- cp/call.c	(revision 254686)
+++ cp/call.c	(working copy)
@@ -376,18 +376,10 @@ build_call_a (tree function, int n, tree
 
   TREE_HAS_CONSTRUCTOR (function) = (decl && DECL_CONSTRUCTOR_P (decl));
 
-  if (current_function_decl && decl
-  && flag_new_inheriting_ctors
-  && DECL_INHERITED_CTOR (current_function_decl)
-  && (DECL_INHERITED_CTOR (current_function_decl)
-	  == DECL_CLONED_FUNCTION (decl)))
-/* Pass arguments directly to the inherited constructor.  */
-CALL_FROM_THUNK_P (function) = true;
-
   /* Don't pass empty class objects by value.  This is useful
  for tags in STL, which are used to control overload resolution.
  We don't need to handle other cases of copying empty classes.  */
-  else if (! decl || ! DECL_BUILT_IN (decl))
+  if (! decl || ! DECL_BUILT_IN (decl))
 for (i = 0; i < n; i++)
   {
 	tree arg = CALL_EXPR_ARG (function, i);
Index: cp/cp-gimplify.c
===
--- cp/cp-gimplify.c	(revision 254686)
+++ cp/cp-gimplify.c	(working copy)
@@ -1078,6 +1078,14 @@ cp_genericize_r (tree *stmt_p, int *walk
   && omp_var_to_track (stmt))
 omp_cxx_notice_variable (wtd->omp_ctx, stmt);
 
+  /* Don't dereference parms in a thunk, pass the references through. */
+  if ((TREE_CODE (stmt) == CALL_EXPR && CALL_FROM_THUNK_P (stmt))
+  || (TREE_CODE (stmt) == AGGR_INIT_EXPR && AGGR_INIT_FROM_THUNK_P (stmt)))
+{
+  *walk_subtrees = 0;
+  return NULL;
+}
+
   /* Dereference invisible reference parms.  */
   if (wtd->handle_invisiref_parm_p && is_invisiref_parm (stmt))
 {
Index: testsuite/g++.dg/cpp0x/pr82878.C
===
--- testsuite/g++.dg/cpp0x/pr82878.C	(revision 0)
+++ testsuite/g++.dg/cpp0x/pr82878.C	(working copy)
@@ -0,0 +1,20 @@
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-O" }
+// pr 82878 erroneously unwrapped a reference parm in the lambda::_FUN
+// thunk.
+
+struct A {
+  ~A();
+  operator int ();
+};
+
+void baz ();
+
+void
+bar (A b)
+{
+  void (*lam) (A) = [](A) { baz (); };
+
+  if (auto c = b)
+lam (c);
+}
Index: testsuite/g++.dg/cpp1z/inh-ctor38.C
===
--- testsuite/g++.dg/cpp1z/inh-ctor38.C	(revision 254686)
+++ testsuite/g++.dg/cpp1z/inh-ctor38.C	(working copy)
@@ -1,17 +1,19 @@
 // { dg-do run { target c++11 } }
 // PR78495 failed to propagate pass-by-value struct to base ctor.
 
+static int moves = 0;
+
 struct Ptr {
   void *ptr = 0;
 
   Ptr() {}
   Ptr(Ptr const&) = delete;
-  Ptr(Ptr&& other) : ptr (other.ptr) {}
+  Ptr(Ptr&& other) : ptr (other.ptr) {moves++;}
 };
 
 struct Base {
   Ptr val;
-  Base(Ptr val_) : val(static_cast<Ptr&&>(val_)) {}
+  Base(Ptr val_);
 };
 
 struct Derived: Base {
@@ -27,5 +29,13 @@ void *Foo () {
 }
 
 int main () {
-  return Foo () != 0;
+  if (Foo ())
+return 1;
+
+  if (moves != 2)
+return 2;
+
+  return 0;
 }
+
+Base::Base(Ptr val_) : val(static_cast<Ptr&&>(val_)) {}


Re: [patch] backwards threader cleanups

2017-11-14 Thread David Malcolm
On Tue, 2017-11-14 at 14:08 -0500, Aldy Hernandez wrote:
> Howdy!
> 
> For some upcoming work I need some pass local data that I don't want
> to 
> be passing around as an argument.  We have enough of those in the 
> threader as it is.  So I moved the current pass local data into its
> own 
> class, and basically classified the entire pass, thus avoiding a lot
> of 
> arguments.
> 
> In doing this, I also noticed that not only was the prototype in the 
> header file wrong, but it wasn't used anywhere. I have removed the 
> useless header.
> 
> The net result is fewer lines of code, even without taking into
> account 
> the deleted header file.
> 
> Oh yeah, we had no way of clearing a hash set.  I've fixed this 
> oversight :).
> 
> Tested on x86-64 Linux.
> 
> OK for trunk?

Some nitpicks below...

> gcc/
> 
>   * hash-set.h (hash_set::empty): New.
>   * tree-ssa-threadbackward.h: Remove.
>   * tree-ssa-threadbackward.c (class thread_jumps): New.
>   Move max_threaded_paths into class.
>   (fsm_find_thread_path): Remove arguments that are now in class.
>   (profitable_jump_thread_path): Rename to...
>   (thread_jumps::profitable_jump_thread_path): ...this.
>   (convert_and_register_jump_thread_path): Rename to...
>   (thread_jumps::convert_and_register_current_path): ...this.
>   (check_subpath_and_update_thread_path): Rename to...
>   (thread_jumps::check_subpath_and_update_thread_path): ...this.
>   (register_jump_thread_path_if_profitable): Rename to...
>   (thread_jumps::register_jump_thread_path_if_profitable): ...this.
>   (handle_phi): Rename to...
>   (thread_jumps::handle_phi): ...this.
>   (handle_assignment): Rename to...
>   (thread_jumps::handle_assignment): ...this.
>   (fsm_find_control_statement_thread_paths): Rename to...
>   (thread_jumps::fsm_find_control_statement_thread_paths): ...this.
>   (find_jump_threads_backwards): Rename to...
>   (thread_jumps::find_jump_threads_backwards): ...this.
>   Initialize path local data.
>   (pass_thread_jumps::execute): Call find_jump_threads_backwards
>   from within thread_jumps class.
>   (pass_early_thread_jumps::execute): Same.
> 
> diff --git a/gcc/hash-set.h b/gcc/hash-set.h
> index d2247d39571..8ce796d1c48 100644
> --- a/gcc/hash-set.h
> +++ b/gcc/hash-set.h
> @@ -80,6 +80,10 @@ public:
>  
>size_t elements () const { return m_table.elements (); }
>  
> +  /* Clear the hash table.  */
> +
> +  void empty () { m_table.empty (); }
> +
>class iterator
>{
>public:
> diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
> index 12bc80350f5..a1454f31bec 100644
> --- a/gcc/tree-ssa-threadbackward.c
> +++ b/gcc/tree-ssa-threadbackward.c
> @@ -38,7 +38,35 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-inline.h"
>  #include "tree-vectorizer.h"
>  
> -static int max_threaded_paths;
> +class thread_jumps
> +{
> + private:
> +  /* The maximum number of BB we are allowed to thread.  */
> +  int max_threaded_paths;
> +  /* Hash to keep track of seen bbs.  */
> +  hash_set<basic_block> visited_bbs;
> +  /* The current path we're analyzing.  */
> +  auto_vec<basic_block> path;
> +  /* Tracks if we have recursed through a loop PHI node.  */
> +  bool seen_loop_phi;
> +  /* Indicate that we could increase code size to improve the
> + code path.  */
> +  bool speed_p;
> +
> +  edge profitable_jump_thread_path (basic_block bbi, tree name, tree arg,
> + bool *creates_irreducible_loop);
> +  void convert_and_register_current_path (edge taken_edge);
> +  void register_jump_thread_path_if_profitable (tree name, tree arg,
> + basic_block def_bb);
> +  void handle_assignment (gimple *stmt, tree name, basic_block def_bb);
> +  void handle_phi (gphi *phi, tree name, basic_block def_bb);
> +  void fsm_find_control_statement_thread_paths (tree name);
> +  bool check_subpath_and_update_thread_path (basic_block last_bb,
> +  basic_block new_bb,
> +  int *next_path_length);
> + public:
> +  void find_jump_threads_backwards (basic_block bb, bool speed_p);
> +};

Nitpick:
  https://gcc.gnu.org/codingconventions.html#Class_Form
says that:

"When defining a class, first [...]
declare all public member functions,
[...]
then declare all non-public member functions, and
then declare all non-public member variables."

Should the class have a ctor?  I see in
  thread_jumps::find_jump_threads_backwards
below that you have:

> +  /* Initialize the pass local data that changes at each iteration.  */
> +  path.truncate (0);
> +  path.safe_push (bb);
> +  visited_bbs.empty ();
> +  seen_loop_phi = false;
> +  this->speed_p = speed_p;

(Is this a self-assign from this->speed_p? should the "speed_p" param
be renamed, e.g. to "speed_p_")

>max_threaded_paths = PARAM_VALUE 

Remove computation of path_in_freq in tree-ssa-threadupdate

2017-11-14 Thread Jan Hubicka
Hi,
this patch removes a now-unused variable.  Bootstrapped/regtested x86_64-linux,
committed.

Honza

* tree-ssa-threadupdate.c (compute_path_counts): Remove
unused path_in_freq_ptr parameter.
(ssa_fix_duplicate_block_edges): Do not pass around path_in_freq.
Index: tree-ssa-threadupdate.c
===
--- tree-ssa-threadupdate.c (revision 254719)
+++ tree-ssa-threadupdate.c (working copy)
@@ -691,8 +691,7 @@ static bool
 compute_path_counts (struct redirection_data *rd,
 ssa_local_info_t *local_info,
 profile_count *path_in_count_ptr,
-profile_count *path_out_count_ptr,
-int *path_in_freq_ptr)
+profile_count *path_out_count_ptr)
 {
   edge e = rd->incoming_edges->e;
  vec<jump_thread_edge *> *path = THREAD_PATH (e);
@@ -700,7 +699,6 @@ compute_path_counts (struct redirection_
   profile_count nonpath_count = profile_count::zero ();
   bool has_joiner = false;
   profile_count path_in_count = profile_count::zero ();
-  int path_in_freq = 0;
 
   /* Start by accumulating incoming edge counts to the path's first bb
  into a couple buckets:
@@ -740,7 +738,6 @@ compute_path_counts (struct redirection_
 source block.  */
  gcc_assert (ein_path->last ()->e == elast);
  path_in_count += ein->count ();
- path_in_freq += EDGE_FREQUENCY (ein);
}
   else if (!ein_path)
{
@@ -751,10 +748,6 @@ compute_path_counts (struct redirection_
}
 }
 
-  /* This is needed due to insane incoming frequencies.  */
-  if (path_in_freq > BB_FREQ_MAX)
-path_in_freq = BB_FREQ_MAX;
-
   /* Now compute the fraction of the total count coming into the first
  path bb that is from the current threading path.  */
   profile_count total_count = e->dest->count;
@@ -843,7 +836,6 @@ compute_path_counts (struct redirection_
 
   *path_in_count_ptr = path_in_count;
   *path_out_count_ptr = path_out_count;
-  *path_in_freq_ptr = path_in_freq;
   return has_joiner;
 }
 
@@ -954,7 +946,6 @@ ssa_fix_duplicate_block_edges (struct re
   edge elast = path->last ()->e;
   profile_count path_in_count = profile_count::zero ();
   profile_count path_out_count = profile_count::zero ();
-  int path_in_freq = 0;
 
   /* First determine how much profile count to move from original
  path to the duplicate path.  This is tricky in the presence of
@@ -963,8 +954,7 @@ ssa_fix_duplicate_block_edges (struct re
  non-joiner case the path_in_count and path_out_count should be the
  same.  */
   bool has_joiner = compute_path_counts (rd, local_info,
-					 &path_in_count, &path_out_count,
-					 &path_in_freq);
+					 &path_in_count, &path_out_count);
 
   for (unsigned int count = 0, i = 1; i < path->length (); i++)
 {


Re: [PATCH, i386] Refactor -mprefer-avx[128|256] options into common -mprefer-vector-width=[none|128|256|512]

2017-11-14 Thread Sandra Loosemore

On 11/13/2017 10:25 AM, Shalnov, Sergey wrote:


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4427328..bc9eb85 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -26087,15 +26087,24 @@ before a transfer of control flow out of the function 
to minimize
 the AVX to SSE transition penalty as well as remove unnecessary 
@code{zeroupper}
 intrinsics.
 
-@item -mprefer-avx128

-@opindex mprefer-avx128
-This option instructs GCC to use 128-bit AVX instructions instead of
-256-bit AVX instructions in the auto-vectorizer.
+@item -mprefer-vector-width=@var{opt}
+@opindex mprefer-vector-width
+This option instructs GCC to use @var{opt}-bit vector width in instructions
+instead of default on the selected platform.
 
-@item -mprefer-avx256

-@opindex mprefer-avx256
-This option instructs GCC to use 256-bit AVX instructions instead of
-512-bit AVX instructions in the auto-vectorizer.
+@table @samp
+@item none
+No extra limitations applied to GCC over than defined by the selected platform.
+


s/over than/other than/  ???

-Sandra


PING: [PATCH] Use rcrt1.o%s/grcrt1.o%s to relocate static PIE

2017-11-14 Thread H.J. Lu
On Fri, Nov 3, 2017 at 10:48 AM, H.J. Lu  wrote:
> On Wed, Nov 1, 2017 at 9:39 AM, H.J. Lu  wrote:
>> On Wed, Nov 1, 2017 at 9:32 AM, Rich Felker  wrote:
>>> On Sun, Oct 15, 2017 at 06:16:57AM -0700, H.J. Lu wrote:
 crt1.o is used to create dynamic and non-PIE static executables.  Static
 PIE needs to link with Pcrt1.o, instead of crt1.o, to relocate static PIE
 at run-time.  When -pg is used with -static-pie, gPcrt1.o should be used.

 Tested on x86-64.  OK for master?
>>>
>>> Is there a reason you didn't follow the existing naming practice here?
>>> Openbsd and musl libc have both had static pie for a long time now and
>>> have used rcrt1.o as the name.
>>
>> I wasn't aware of rcrt1.o and there is no reference to rcrt1.o in GCC at all.
>> Does the FSF GCC support static PIE for musl libc? If not, is there a GCC
>> bug for it?
>>
>> BTW, I don't mind replacing Pcrt1.o/gPcrt1.o with rcrt1.o/grcrt1.o.
>>
>
> Here is the updated patch to use rcrt1.o/grcrt1.o.
>
> OK for trunk?
>
> Thanks.
>

PING.



-- 
H.J.


Fix accounting of call edges in inliner

2017-11-14 Thread Jan Hubicka
Hi,
this patch makes call time computations consistent again.
Bootstrapped/regtested x86_64-linux, comitted.

Honza
* ipa-inline.c (edge_badness): Dump sreal frequency.
(compute_inlined_call_time): Match natural implementation ...
* ipa-fnsummary.c (estimate_edge_size_and_time): ... here; remove
forgotten division by CGRAPH_FREQ_BASE.
Index: ipa-inline.c
===
--- ipa-inline.c	(revision 254724)
+++ ipa-inline.c	(working copy)
@@ -670,8 +670,7 @@ compute_inlined_call_time (struct cgraph
 
   /* This calculation should match one in ipa-inline-analysis.c
  (estimate_edge_size_and_time).  */
-  time -= (sreal) edge->frequency ()
-  * ipa_call_summaries->get (edge)->call_stmt_time / CGRAPH_FREQ_BASE;
+  time -= (sreal)ipa_call_summaries->get (edge)->call_stmt_time * freq;
   time += caller_time;
   if (time <= 0)
 time = ((sreal) 1) >> 8;
@@ -1164,7 +1163,7 @@ edge_badness (struct cgraph_edge *edge,
   " overall growth %i (current) %i (original)"
   " %i (compensated)\n",
   badness.to_double (),
- (double)edge->frequency () / CGRAPH_FREQ_BASE,
+  edge->sreal_frequency ().to_double (),
   edge->count.ipa ().initialized_p () ? edge->count.ipa 
().to_gcov_type () : -1,
   caller->count.ipa ().initialized_p () ? caller->count.ipa 
().to_gcov_type () : -1,
   compute_uninlined_call_time (edge,
Index: ipa-fnsummary.c
===
--- ipa-fnsummary.c (revision 254719)
+++ ipa-fnsummary.c (working copy)
@@ -2581,8 +2581,7 @@ estimate_edge_size_and_time (struct cgra
   if (prob == REG_BR_PROB_BASE)
 *time += ((sreal)call_time) * e->sreal_frequency ();
   else
-*time += ((sreal)call_time * prob) * e->sreal_frequency ()
- / CGRAPH_FREQ_BASE;
+*time += ((sreal)call_time * prob) * e->sreal_frequency ();
 }
 
 


[patch] backwards threader cleanups

2017-11-14 Thread Aldy Hernandez

Howdy!

For some upcoming work I need some pass local data that I don't want to 
be passing around as an argument.  We have enough of those in the 
threader as it is.  So I moved the current pass local data into its own 
class, and basically classified the entire pass, thus avoiding a lot of 
arguments.


In doing this, I also noticed that not only was the prototype in the 
header file wrong, but it wasn't used anywhere. I have removed the 
useless header.


The net result is fewer lines of code, even without taking into account 
the deleted header file.


Oh yeah, we had no way of clearing a hash set.  I've fixed this 
oversight :).


Tested on x86-64 Linux.

OK for trunk?
gcc/

	* hash-set.h (hash_set::empty): New.
	* tree-ssa-threadbackward.h: Remove.
	* tree-ssa-threadbackward.c (class thread_jumps): New.
	Move max_threaded_paths into class.
	(fsm_find_thread_path): Remove arguments that are now in class.
	(profitable_jump_thread_path): Rename to...
	(thread_jumps::profitable_jump_thread_path): ...this.
	(convert_and_register_jump_thread_path): Rename to...
	(thread_jumps::convert_and_register_current_path): ...this.
	(check_subpath_and_update_thread_path): Rename to...
	(thread_jumps::check_subpath_and_update_thread_path): ...this.
	(register_jump_thread_path_if_profitable): Rename to...
	(thread_jumps::register_jump_thread_path_if_profitable): ...this.
	(handle_phi): Rename to...
	(thread_jumps::handle_phi): ...this.
	(handle_assignment): Rename to...
	(thread_jumps::handle_assignment): ...this.
	(fsm_find_control_statement_thread_paths): Rename to...
	(thread_jumps::fsm_find_control_statement_thread_paths): ...this.
	(find_jump_threads_backwards): Rename to...
	(thread_jumps::find_jump_threads_backwards): ...this.
	Initialize path local data.
	(pass_thread_jumps::execute): Call find_jump_threads_backwards
	from within thread_jumps class.
	(pass_early_thread_jumps::execute): Same.

diff --git a/gcc/hash-set.h b/gcc/hash-set.h
index d2247d39571..8ce796d1c48 100644
--- a/gcc/hash-set.h
+++ b/gcc/hash-set.h
@@ -80,6 +80,10 @@ public:
 
   size_t elements () const { return m_table.elements (); }
 
+  /* Clear the hash table.  */
+
+  void empty () { m_table.empty (); }
+
   class iterator
   {
   public:
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 12bc80350f5..a1454f31bec 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -38,7 +38,35 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-inline.h"
 #include "tree-vectorizer.h"
 
-static int max_threaded_paths;
+class thread_jumps
+{
+ private:
+  /* The maximum number of BB we are allowed to thread.  */
+  int max_threaded_paths;
+  /* Hash to keep track of seen bbs.  */
+  hash_set<basic_block> visited_bbs;
+  /* The current path we're analyzing.  */
+  auto_vec<basic_block> path;
+  /* Tracks if we have recursed through a loop PHI node.  */
+  bool seen_loop_phi;
+  /* Indicate that we could increase code size to improve the
+ code path.  */
+  bool speed_p;
+
+  edge profitable_jump_thread_path (basic_block bbi, tree name, tree arg,
+bool *creates_irreducible_loop);
+  void convert_and_register_current_path (edge taken_edge);
+  void register_jump_thread_path_if_profitable (tree name, tree arg,
+		basic_block def_bb);
+  void handle_assignment (gimple *stmt, tree name, basic_block def_bb);
+  void handle_phi (gphi *phi, tree name, basic_block def_bb);
+  void fsm_find_control_statement_thread_paths (tree name);
+  bool check_subpath_and_update_thread_path (basic_block last_bb,
+	 basic_block new_bb,
+	 int *next_path_length);
+ public:
+  void find_jump_threads_backwards (basic_block bb, bool speed_p);
+};
 
 /* Simple helper to get the last statement from BB, which is assumed
to be a control statement.   Return NULL if the last statement is
@@ -61,14 +89,15 @@ get_gimple_control_stmt (basic_block bb)
 
 /* Return true if the CFG contains at least one path from START_BB to
END_BB.  When a path is found, record in PATH the blocks from
-   END_BB to START_BB.  VISITED_BBS is used to make sure we don't fall
-   into an infinite loop.  Bound the recursion to basic blocks
-   belonging to LOOP.  */
+   END_BB to START_BB.  LOCAL_VISITED_BBS is used to make sure we
+   don't fall into an infinite loop.  Bound the recursion to basic
+   blocks belonging to LOOP.  */
 
 static bool
 fsm_find_thread_path (basic_block start_bb, basic_block end_bb,
		  vec<basic_block> &path,
-		  hash_set<basic_block> &visited_bbs, loop_p loop)
+		  hash_set<basic_block> &local_visited_bbs,
+		  loop_p loop)
 {
   if (loop != start_bb->loop_father)
 return false;
@@ -79,12 +108,13 @@ fsm_find_thread_path (basic_block start_bb, basic_block end_bb,
   return true;
 }
 
-  if (!visited_bbs.add (start_bb))
+  if (!local_visited_bbs.add (start_bb))
 {
   edge e;
   edge_iterator ei;
   FOR_EACH_EDGE (e, ei, start_bb->succs)
-	if (fsm_find_thread_path (e->dest, end_bb, path, visited_bbs, loop))
+	if (fsm_find_thread_path (e->dest, end_bb, path, local_visited_bbs, loop))

Re: [Patch] Don't call linker when creating pre-compiled header just because you saw a linker option

2017-11-14 Thread Richard Biener
On November 14, 2017 7:59:18 PM GMT+01:00, Steve Ellcey  
wrote:
>I am building GCC (and binutils) with --with-sysroot and I build glibc
>in
>that sysroot too.  I want to test GCC using the glibc libraries I just
>built, but by default GCC uses the dynamic linker that is in /lib and
>that
>is loading the "normal" libraries from /lib or /lib64 at runtime and
>not
>the ones from my sysroot.  To address this I ran the GCC testsuite
>with:
>
>make check
>RUNTESTFLAGS="CFLAGS_FOR_TARGET='-Wl,--dynamic-linker=/mylocation/lib/ld-linux-aarch64.so.1
>-Wl,-rpath=/mylocation/lib64'"
>
>But when I do this, I noticed that a number of pch tests fail.  What I
>found
>is that when I run the pch testsuite, it executes:
>
>/home/sellcey/tot/obj/gcc/gcc/xgcc -B/home/sellcey/tot/obj/gcc/gcc/
>./common-1.h -fno-diagnostics-show-caret -fdiagnostics-color=never -O0
>-g -Wl,--dynamic-linker=/mylocation/lib/ld-linux-aarch64.so.1
>-Wl,-rpath=/mylocation/lib64 -o common-1.h.gch
>
>And this causes the linker to run and try to create an executable
>instead of 
>creating a pre-compiled header.  If I run the same command without the
>-Wl flags then GCC creates the pre-compiled header that I need for
>testing.
>
>I tracked this down to driver::maybe_run_linker where it sees the
>linker
>flags and increments num_linker_inputs, this causes the routine to call
>the linker.  I would like to have this function ignore linker options
>and only count linker input files.  This way the linker is only called
>when there is an actual file to link.
>
>I tested this with the GCC testsuite and got no regressions, OK to
>checkin?

I think the intent is to link even for just -lfoo, while it makes sense to
ignore -L/path and -Wl,...  -lfoo certainly counts as possible input.

It seems you even break -lfoo.

Richard. 

>
>2017-11-14  Steve Ellcey  
>
>   * gcc.c (driver::maybe_run_linker): Ignore linker options
>   when checking for linker inputs.
>
>
>diff --git a/gcc/gcc.c b/gcc/gcc.c
>index 43e6d59..61c7561 100644
>--- a/gcc/gcc.c
>+++ b/gcc/gcc.c
>@@ -8289,11 +8289,14 @@ driver::maybe_run_linker (const char *argv0)
>const
>   int linker_was_run = 0;
>   int num_linker_inputs;
> 
>-  /* Determine if there are any linker input files.  */
>+  /* Determine if there are any linker input files, but ignore
>+ linker options.  If we have options but no input files we
>+ do not want to call the linker.  */
>   num_linker_inputs = 0;
>   for (i = 0; (int) i < n_infiles; i++)
>-if (explicit_link_files[i] || outfiles[i] != NULL)
>-  num_linker_inputs++;
>+if ((explicit_link_files[i] || outfiles[i] != NULL)
>+&& (outfiles[i] == NULL || outfiles[i][0] != '-'))
>+num_linker_inputs++;
> 
>   /* Run ld to link all the compiler output files.  */
> 



[Patch] Don't call linker when creating pre-compiled header just because you saw a linker option

2017-11-14 Thread Steve Ellcey
I am building GCC (and binutils) with --with-sysroot and I build glibc in
that sysroot too.  I want to test GCC using the glibc libraries I just
built, but by default GCC uses the dynamic linker that is in /lib and that
is loading the "normal" libraries from /lib or /lib64 at runtime and not
the ones from my sysroot.  To address this I ran the GCC testsuite with:

make check 
RUNTESTFLAGS="CFLAGS_FOR_TARGET='-Wl,--dynamic-linker=/mylocation/lib/ld-linux-aarch64.so.1
 -Wl,-rpath=/mylocation/lib64'"

But when I do this, I noticed that a number of pch tests fail.  What I found
is that when I run the pch testsuite, it executes:

/home/sellcey/tot/obj/gcc/gcc/xgcc -B/home/sellcey/tot/obj/gcc/gcc/ 
./common-1.h -fno-diagnostics-show-caret -fdiagnostics-color=never -O0 -g 
-Wl,--dynamic-linker=/mylocation/lib/ld-linux-aarch64.so.1 
-Wl,-rpath=/mylocation/lib64 -o common-1.h.gch

And this causes the linker to run and try to create an executable instead of 
creating a pre-compiled header.  If I run the same command without the
-Wl flags then GCC creates the pre-compiled header that I need for testing.

I tracked this down to driver::maybe_run_linker where it sees the linker
flags and increments num_linker_inputs, this causes the routine to call
the linker.  I would like to have this function ignore linker options
and only count linker input files.  This way the linker is only called
when there is an actual file to link.

I tested this with the GCC testsuite and got no regressions, OK to checkin?


2017-11-14  Steve Ellcey  

* gcc.c (driver::maybe_run_linker): Ignore linker options
when checking for linker inputs.


diff --git a/gcc/gcc.c b/gcc/gcc.c
index 43e6d59..61c7561 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -8289,11 +8289,14 @@ driver::maybe_run_linker (const char *argv0) const
   int linker_was_run = 0;
   int num_linker_inputs;
 
-  /* Determine if there are any linker input files.  */
+  /* Determine if there are any linker input files, but ignore
+ linker options.  If we have options but no input files we
+ do not want to call the linker.  */
   num_linker_inputs = 0;
   for (i = 0; (int) i < n_infiles; i++)
-if (explicit_link_files[i] || outfiles[i] != NULL)
-  num_linker_inputs++;
+    if ((explicit_link_files[i] || outfiles[i] != NULL)
+	&& (outfiles[i] == NULL || outfiles[i][0] != '-'))
+      num_linker_inputs++;
 
   /* Run ld to link all the compiler output files.  */
 


libstdc++ PATCH to harmonize noexcept

2017-11-14 Thread Jason Merrill
While working on an unrelated issue I noticed that the compiler didn't
like some of these declarations after preprocessing, when they aren't
protected by system-header permissiveness.

I thought about limiting the permissiveness to only extern "C"
functions, but I believe that system headers are adding more C++
declarations, so we'd likely run into this issue again.

Shouldn't we build libstdc++ with -Wsystem-headers?  Maybe along with
-Wno-error=system-headers?

Jonathan approved these changes elsewhere.

Jason
commit abe66184d116f85f10108191b081f48dd0cfe3aa
Author: Jason Merrill 
Date:   Tue Nov 14 13:48:57 2017 -0500

Correct noexcept mismatch in declarations.

* include/bits/fs_ops.h (permissions): Add noexcept.
* include/bits/fs_fwd.h (copy, copy_file): Remove noexcept.
(permissions): Add noexcept.
* include/std/string_view (find_first_of): Add noexcept.

diff --git a/libstdc++-v3/include/bits/fs_fwd.h 
b/libstdc++-v3/include/bits/fs_fwd.h
index f408a39b974..66b0d20f027 100644
--- a/libstdc++-v3/include/bits/fs_fwd.h
+++ b/libstdc++-v3/include/bits/fs_fwd.h
@@ -300,11 +300,11 @@ _GLIBCXX_END_NAMESPACE_CXX11
 
   void copy(const path& __from, const path& __to, copy_options __options);
   void copy(const path& __from, const path& __to, copy_options __options,
-   error_code&) noexcept;
+   error_code&);
 
   bool copy_file(const path& __from, const path& __to, copy_options __option);
   bool copy_file(const path& __from, const path& __to, copy_options __option,
-error_code&) noexcept;
+error_code&);
 
   path current_path();
 
@@ -319,7 +319,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
   file_time_type last_write_time(const path&);
   file_time_type last_write_time(const path&, error_code&) noexcept;
 
-  void permissions(const path&, perms, perm_options, error_code&);
+  void permissions(const path&, perms, perm_options, error_code&) noexcept;
 
  path proximate(const path& __p, const path& __base = current_path());
   path proximate(const path& __p, const path& __base, error_code& __ec);
diff --git a/libstdc++-v3/include/bits/fs_ops.h 
b/libstdc++-v3/include/bits/fs_ops.h
index 075d61e2a63..22422bd1f4d 100644
--- a/libstdc++-v3/include/bits/fs_ops.h
+++ b/libstdc++-v3/include/bits/fs_ops.h
@@ -253,7 +253,7 @@ namespace filesystem
 
   void
   permissions(const path& __p, perms __prms, perm_options __opts,
- error_code& __ec);
+ error_code& __ec) noexcept;
 
   inline path proximate(const path& __p, error_code& __ec)
   { return proximate(__p, current_path(), __ec); }
diff --git a/libstdc++-v3/include/std/string_view 
b/libstdc++-v3/include/std/string_view
index 1900b867841..49b6c378370 100644
--- a/libstdc++-v3/include/std/string_view
+++ b/libstdc++-v3/include/std/string_view
@@ -323,7 +323,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return this->find(__c, __pos); }
 
   constexpr size_type
-  find_first_of(const _CharT* __str, size_type __pos, size_type __n) const;
+  find_first_of(const _CharT* __str, size_type __pos, size_type __n) const 
noexcept;
 
   constexpr size_type
   find_first_of(const _CharT* __str, size_type __pos = 0) const noexcept


Re: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned memory access

2017-11-14 Thread Tamar Christina
On 11/14/2017 17:48, James Greenhalgh wrote:
> On Tue, Nov 14, 2017 at 04:05:12PM +, Tamar Christina wrote:
> > Hi James,
> > 
> > I have split off the aarch64 bit off from the generic parts and processed 
> > your feedback.
> > 
> > Attached is the reworked patch.
> > 
> > Ok for trunk?
> 
> Thanks for the respin, I'm a bit confused by this comment.

Sorry, the comment was a bit nonsensical.  I updated the last part but
didn't re-read the comment as a whole.

I've respun the patch again.

Thanks,
Tamar
> 
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index 
> > e89c8156976cecf200cd67c1e938c8156c1240c4..bc197cc973a2bbc2c263809c7e8ccfbc86309c26
> >  100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -13974,6 +13974,28 @@ aarch64_expand_movmem (rtx *operands)
> >base = copy_to_mode_reg (Pmode, XEXP (src, 0));
> >src = adjust_automodify_address (src, VOIDmode, base, 0);
> >  
> > +  /* Optimize routines for MEM to REG copies.
> > + This particular function only handles copying to two
> > + destination types: 1) a regular register and 2) the stack.
> > + When writing to a regular register we may end up writing too much in 
> > cases
> > + where the register already contains a live value or when no data 
> > padding is
> > + happening so we disallow regular registers to use this new code path. 
> >  */
> 
> I'm struggling to understand when you'd end up with a struct held in
> a partial register going through this code path. Maybe the restriction
> makes sense, can you write a testcase, or show some sample code where
> this goes wrong (or simplify the comment).
> 
> Thanks,
> James
> 

-- 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e89c8156976cecf200cd67c1e938c8156c1240c4..2597ec70ff2c49c8276622d3f9d6fe394a1f80c1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13974,6 +13974,26 @@ aarch64_expand_movmem (rtx *operands)
   base = copy_to_mode_reg (Pmode, XEXP (src, 0));
   src = adjust_automodify_address (src, VOIDmode, base, 0);
 
+  /* Optimize routines for MEM to REG copies.
+ This particular function only handles copying to two
+ destination types: 1) a regular register and 2) the stack and only when
+ the amount to copy fits in less than 8 bytes.  */
+  if (n < 8 && !REG_P (XEXP (operands[0], 0)))
+{
+   machine_mode dest_mode
+	 = smallest_mode_for_size (n * BITS_PER_UNIT, MODE_INT);
+   rtx result = gen_reg_rtx (dest_mode);
+   emit_insn (gen_move_insn (result, GEN_INT (0)));
+
+   src = adjust_address (src, dest_mode, 0);
+   emit_insn (gen_move_insn (result, src));
+   src = aarch64_progress_pointer (src);
+
+   dst = adjust_address (dst, dest_mode, 0);
+   emit_insn (gen_move_insn (dst, result));
+   return true;
+}
+
   /* Simple cases.  Copy 0-3 bytes, as (if applicable) a 2-byte, then a
  1-byte chunk.  */
   if (n < 4)
diff --git a/gcc/testsuite/gcc.target/aarch64/structs.c b/gcc/testsuite/gcc.target/aarch64/structs.c
new file mode 100644
index ..2be8fae9479500cdbd5f22c6ab4f6adcdd7fcbce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/structs.c
@@ -0,0 +1,226 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2" } */
+/* Copyright 1996, 1999, 2007 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.
+
+   Please email any bugs, comments, and/or additions to this file to:
+   bug-...@prep.ai.mit.edu  */
+
+struct struct1 { char a;};
+struct struct2 { char a, b;};
+struct struct3 { char a, b, c; };
+struct struct4 { char a, b, c, d; };
+struct struct5 { char a, b, c, d, e; };
+struct struct6 { char a, b, c, d, e, f; };
+struct struct7 { char a, b, c, d, e, f, g; };
+struct struct8 { char a, b, c, d, e, f, g, h; };
+struct struct9 { char a, b, c, d, e, f, g, h, i; };
+struct struct10 { char a, b, c, d, e, f, g, h, i, j; };
+struct struct11 { char a, b, c, d, e, f, g, h, i, j, k; };
+struct struct12 { char a, b, c, d, e, f, g, h, i, j, k, l; };
+struct struct16 { char a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p; };
+
+struct struct1 foo1 = {'1'},  L1;
+struct struct2 foo2 = { 'a', 'b'},  L2;
+struct struct3 foo3 = { 'A', 'B', 'C'},  L3;
+struct struct4 foo4 = {'1', '2', '3', 

Re: [PATCH libquadmath, pr68686] tgammaq(x) is always negative for noninteger x < 0

2017-11-14 Thread Steve Kargl
Ed,

gfortran uses libquadmath for support of its REAL(16)
intrinsic subprograms.  I checked the list of intrinsics
and tgamma is currently not in the list.  I don't have
time to try Fortran's C interop feature to see if an
ordinary user can access __float128.  While your patch
is probably useful for GCC, I doubt that gfortran in
its current state will use tgammal.  You'll need either
Jakub or Joseph (jsm28) to give an OK.

-- 
steve

On Mon, Nov 13, 2017 at 01:27:42PM -0500, Ed Smith-Rowland wrote:
> Here is a patch for tgammaq for negative argument pr68686.
> 
> I know about depending on ports from upstream but this was done recently 
> and this (tgammaq) was left out.
> 
> This patch is basically a one-liner.
> 
> I have test cases but libquadmath doesn't have a testsuite.
> 
> One test just shows alternating signs for -0.5Q, -1.5Q, -2.5Q, etc.
> 
> Another test verifies the Gamma reflection formula:
> 
>   tgamma(1-x) * tgamma(x) - pi / sinq(pi*x) is tiny.
> 
> Builds on x86_64-linux and tests correctly offline.
> 
> 
> OK?
> 
> 

> 2017-11-10  Edward Smith-Rowland  <3dw...@verizon.net>
> 
>   PR libquadmath/68686
>   * math/tgammaq.c: Correct sign for negative argument.

> diff --git a/libquadmath/math/tgammaq.c b/libquadmath/math/tgammaq.c
> index a07d583..3080094 100644
> --- a/libquadmath/math/tgammaq.c
> +++ b/libquadmath/math/tgammaq.c
> @@ -47,7 +47,9 @@ tgammaq (__float128 x)
>  /* x == -Inf.  According to ISO this is NaN.  */
>  return x - x;
>  
> -  /* XXX FIXME.  */
>   res = expq (lgammaq (x));
> -  return signbitq (x) ? -res : res;
> +  if (x > 0.0Q || ((int)(-x) & 1) == 1)
> +return res;
> +  else
> +return -res;
>  }

> #include <assert.h>
> #include <quadmath.h>
> 
> void
> test_alt_signs()
> {
>   __float128 result = tgammaq(-1.5Q);
>   assert(result > 0.0Q);
> 
>   result = tgammaq(-2.5Q);
>   assert(result < 0.0Q);
> 
>   result = tgammaq(-3.5Q);
>   assert(result > 0.0Q);
> 
>   result = tgammaq(-4.5Q);
>   assert(result < 0.0Q);
> }
> 
> /*
>  * Return |\Gamma(x) \Gamma(1 - x) - \frac{\pi}{sin(\pi x)}|
>  */
> __float128
> abs_delta(__float128 x)
> {
>   return fabsq(tgammaq(x) * tgammaq(1.0Q - x) - M_PIq / sinq(M_PIq * x));
> }
> 
> /*
>  * Test reflection: \Gamma(x) \Gamma(1 - x) = \frac{\pi}{sin(\pi x)}
>  */
> void
> test_reflection()
> {
>   assert(abs_delta(+1.5Q) < 100 * FLT128_EPSILON);
>   assert(abs_delta(+0.5Q) < 100 * FLT128_EPSILON);
>   assert(abs_delta(-0.5Q) < 100 * FLT128_EPSILON);
>   assert(abs_delta(-1.5Q) < 100 * FLT128_EPSILON);
>   assert(abs_delta(-2.5Q) < 100 * FLT128_EPSILON);
> }
> 
> int
> main()
> {
>   test_alt_signs();
> 
>   test_reflection();
> 
>   return 0;
> }


-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow


Re: [v3 PATCH] Implement LWG 2733 and LWG 2759

2017-11-14 Thread Jonathan Wakely

On 14/11/17 19:51 +0200, Ville Voutilainen wrote:

On 14 November 2017 at 19:06, Jonathan Wakely  wrote:

Both files can use remove_cv_t instead of remove_cv::type (and there's
no need to std-qualify it in std::gcd).

They could also use is_integral_v and is_same_v but that's
pre-existing, and less important because there's no 'typename' that
can be avoided by changing it.


Here's round 2.


OK for trunk, thanks.




Re: potential bug in libstc++ ?

2017-11-14 Thread Jonathan Wakely

On 14/11/17 15:04 +, Jonathan Wakely wrote:

On 14 November 2017 at 14:35, Sergey Nenakhov wrote:

Hello.

Excuse me if I'm posting to the wrong mailing list.


libstdc++ has its own mailing list (which Richard CC'd).


I've grabbed gcc-7.2.0
sources and noticed a strange statement in the file
gcc-7.2.0\libstdc++-v3\include\bits\locale_conv.h, which seems like a bug
to me:
Lines 434 and 435 of that file are:
if (__nbytes < 1)
  __nbytes == 1;

Should not it be ?
if (__nbytes < 1)
  __nbytes = 1;

(= vs ==, in case it didn't catch your eye).


Yes, thanks, I'll take care of it.


Fixed on trunk by this patch.

Tested powerpc64le-linux, committed to trunk.

commit 1dad4a0c23b915635ea5ac27cb096b3bfafed670
Author: Jonathan Wakely 
Date:   Tue Nov 14 17:39:32 2017 +

Fix typo in std::wbuffer_convert

* include/bits/locale_conv.h (wbuffer_convert::_M_conv_get): Fix typo.
* testsuite/22_locale/conversions/buffer/3.cc: New test.

diff --git a/libstdc++-v3/include/bits/locale_conv.h b/libstdc++-v3/include/bits/locale_conv.h
index 47c8dee53cb..b8f77dcaca9 100644
--- a/libstdc++-v3/include/bits/locale_conv.h
+++ b/libstdc++-v3/include/bits/locale_conv.h
@@ -431,7 +431,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
 	streamsize __nbytes = sizeof(_M_get_buf) - _M_unconv;
 	__nbytes = std::min(__nbytes, _M_buf->in_avail());
 	if (__nbytes < 1)
-	  __nbytes == 1;
+	  __nbytes = 1;
 	__nbytes = _M_buf->sgetn(_M_get_buf + _M_unconv, __nbytes);
 	if (__nbytes < 1)
 	  return false;
diff --git a/libstdc++-v3/testsuite/22_locale/conversions/buffer/3.cc b/libstdc++-v3/testsuite/22_locale/conversions/buffer/3.cc
new file mode 100644
index 000..99a679dc124
--- /dev/null
+++ b/libstdc++-v3/testsuite/22_locale/conversions/buffer/3.cc
@@ -0,0 +1,58 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do run { target c++11 } }
+
+#include <locale>
+#include <streambuf>
+#include <testsuite_hooks.h>
+
+struct streambuf : std::streambuf
+{
+  int_type underflow() override
+  {
+    if (c != '\0')
+    {
+      this->setg(&c, &c, &c + 1);
+      return *this->gptr();
+    }
+    c = '\0';
+    return traits_type::eof();
+  }
+
+private:
+  char c = 'a';
+};
+
+struct codecvt : std::codecvt<wchar_t, char, std::mbstate_t> { };
+
+void
+test01()
+{
+  // https://gcc.gnu.org/ml/libstdc++/2017-11/msg00022.html
+  streambuf sb;
+  std::wbuffer_convert<codecvt> conv(&sb);
+  VERIFY( sb.in_avail() == 0 );
+  wchar_t c = conv.sgetc();
+  VERIFY( c == L'a' );
+}
+
+int
+main()
+{
+  test01();
+}


Re: [v3 PATCH] Implement LWG 2733 and LWG 2759

2017-11-14 Thread Ville Voutilainen
On 14 November 2017 at 19:06, Jonathan Wakely  wrote:
> Both files can use remove_cv_t instead of remove_cv::type (and there's
> no need to std-qualify it in std::gcd).
>
> They could also use is_integral_v and is_same_v but that's
> pre-existing, and less important because there's no 'typename' that
> can be avoided by changing it.

Here's round 2.
diff --git a/libstdc++-v3/include/experimental/numeric 
b/libstdc++-v3/include/experimental/numeric
index c8597fc..f037d8e 100644
--- a/libstdc++-v3/include/experimental/numeric
+++ b/libstdc++-v3/include/experimental/numeric
@@ -55,10 +55,12 @@ inline namespace fundamentals_v2
 constexpr common_type_t<_Mn, _Nn>
 gcd(_Mn __m, _Nn __n)
 {
-  static_assert(is_integral<_Mn>::value, "gcd arguments are integers");
-  static_assert(is_integral<_Nn>::value, "gcd arguments are integers");
-  static_assert(!is_same<_Mn, bool>::value, "gcd arguments are not bools");
-  static_assert(!is_same<_Nn, bool>::value, "gcd arguments are not bools");
+  static_assert(is_integral_v<_Mn>, "gcd arguments are integers");
+  static_assert(is_integral_v<_Nn>, "gcd arguments are integers");
+  static_assert(!is_same_v<remove_cv_t<_Mn>, bool>,
+   "gcd arguments are not bools");
+  static_assert(!is_same_v<remove_cv_t<_Nn>, bool>,
+   "gcd arguments are not bools");
   return std::__detail::__gcd(__m, __n);
 }
 
@@ -67,10 +69,12 @@ inline namespace fundamentals_v2
 constexpr common_type_t<_Mn, _Nn>
 lcm(_Mn __m, _Nn __n)
 {
-  static_assert(is_integral<_Mn>::value, "lcm arguments are integers");
-  static_assert(is_integral<_Nn>::value, "lcm arguments are integers");
-  static_assert(!is_same<_Mn, bool>::value, "lcm arguments are not bools");
-  static_assert(!is_same<_Nn, bool>::value, "lcm arguments are not bools");
+  static_assert(is_integral_v<_Mn>, "lcm arguments are integers");
+  static_assert(is_integral_v<_Nn>, "lcm arguments are integers");
+  static_assert(!is_same_v<remove_cv_t<_Mn>, bool>,
+   "lcm arguments are not bools");
+  static_assert(!is_same_v<remove_cv_t<_Nn>, bool>,
+   "lcm arguments are not bools");
   return std::__detail::__lcm(__m, __n);
 }
 } // namespace fundamentals_v2
diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
index 2b80419..a3a447d 100644
--- a/libstdc++-v3/include/std/numeric
+++ b/libstdc++-v3/include/std/numeric
@@ -131,10 +131,12 @@ namespace __detail
 constexpr common_type_t<_Mn, _Nn>
 gcd(_Mn __m, _Nn __n)
 {
-  static_assert(is_integral<_Mn>::value, "gcd arguments are integers");
-  static_assert(is_integral<_Nn>::value, "gcd arguments are integers");
-  static_assert(!is_same<_Mn, bool>::value, "gcd arguments are not bools");
-  static_assert(!is_same<_Nn, bool>::value, "gcd arguments are not bools");
+  static_assert(is_integral_v<_Mn>, "gcd arguments are integers");
+  static_assert(is_integral_v<_Nn>, "gcd arguments are integers");
+  static_assert(!is_same_v<remove_cv_t<_Mn>, bool>,
+   "gcd arguments are not bools");
+  static_assert(!is_same_v<remove_cv_t<_Nn>, bool>,
+   "gcd arguments are not bools");
   return __detail::__gcd(__m, __n);
 }
 
@@ -143,10 +145,12 @@ namespace __detail
 constexpr common_type_t<_Mn, _Nn>
 lcm(_Mn __m, _Nn __n)
 {
-  static_assert(is_integral<_Mn>::value, "lcm arguments are integers");
-  static_assert(is_integral<_Nn>::value, "lcm arguments are integers");
-  static_assert(!is_same<_Mn, bool>::value, "lcm arguments are not bools");
-  static_assert(!is_same<_Nn, bool>::value, "lcm arguments are not bools");
+  static_assert(is_integral_v<_Mn>, "lcm arguments are integers");
+  static_assert(is_integral_v<_Nn>, "lcm arguments are integers");
+  static_assert(!is_same_v<remove_cv_t<_Mn>, bool>,
+   "lcm arguments are not bools");
+  static_assert(!is_same_v<remove_cv_t<_Nn>, bool>,
+   "lcm arguments are not bools");
   return __detail::__lcm(__m, __n);
 }
 
diff --git a/libstdc++-v3/testsuite/26_numerics/gcd/gcd_neg.cc 
b/libstdc++-v3/testsuite/26_numerics/gcd/gcd_neg.cc
index 30524a1b..63a8afa 100644
--- a/libstdc++-v3/testsuite/26_numerics/gcd/gcd_neg.cc
+++ b/libstdc++-v3/testsuite/26_numerics/gcd/gcd_neg.cc
@@ -26,14 +26,29 @@ test01()
   std::gcd(true, 1);// { dg-error "from here" }
   std::gcd(1, true);// { dg-error "from here" }
   std::gcd(true, true); // { dg-error "from here" }
+  std::gcd(true, 1);// { dg-error "from here" }
+  std::gcd(1, true);// { dg-error "from here" }
+  std::gcd(true, true); // { dg-error "from here" }
+  std::gcd(true, 1);// { dg-error "from here" }
+  std::gcd(1, true);// { dg-error "from here" }
+  std::gcd(true, true); // { 

Re: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned memory access

2017-11-14 Thread James Greenhalgh
On Tue, Nov 14, 2017 at 04:05:12PM +, Tamar Christina wrote:
> Hi James,
> 
> I have split off the aarch64 bit off from the generic parts and processed 
> your feedback.
> 
> Attached is the reworked patch.
> 
> Ok for trunk?

Thanks for the respin, I'm a bit confused by this comment.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> e89c8156976cecf200cd67c1e938c8156c1240c4..bc197cc973a2bbc2c263809c7e8ccfbc86309c26
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13974,6 +13974,28 @@ aarch64_expand_movmem (rtx *operands)
>base = copy_to_mode_reg (Pmode, XEXP (src, 0));
>src = adjust_automodify_address (src, VOIDmode, base, 0);
>  
> +  /* Optimize routines for MEM to REG copies.
> + This particular function only handles copying to two
> + destination types: 1) a regular register and 2) the stack.
> + When writing to a regular register we may end up writing too much in 
> cases
> + where the register already contains a live value or when no data 
> padding is
> + happening so we disallow regular registers to use this new code path.  
> */

I'm struggling to understand when you'd end up with a struct held in
a partial register going through this code path. Maybe the restriction
makes sense, can you write a testcase, or show some sample code where
this goes wrong (or simplify the comment).

Thanks,
James



Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices

2017-11-14 Thread Martin Sebor

On 11/14/2017 05:28 AM, Richard Biener wrote:

On Mon, Nov 13, 2017 at 6:37 PM, Martin Sebor  wrote:

Richard, this thread may have been conflated with the one Re:
[PATCH] enhance -Warray-bounds to detect out-of-bounds offsets
(PR 82455) They are about different things.

I'm still looking for approval of:

  https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01208.html


Sorry, I pointed to an outdated version.  This is the latest
version:

  https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01304.html

My bad...



+  tree maxbound
+ = build_int_cst (sizetype, ~(1LLU << (TYPE_PRECISION (sizetype) - 1)));

this looks possibly bogus.  Can you instead use

  up_bound_p1
= wide_int_to_tree (sizetype, wi::div_trunc (wi::max_value
(TYPE_PRECISION (sizetype), SIGNED), wi::to_wide (eltsize)));

please?  Note you are _not_ computing the proper upper bound here because that
is what you compute plus low_bound.

+  up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);

+
+  tree arg = TREE_OPERAND (ref, 0);
+  tree_code code = TREE_CODE (arg);
+  if (code == COMPONENT_REF)
+ {
+  HOST_WIDE_INT off;
+  if (tree base = get_addr_base_and_unit_offset (ref, &off))
+{
+  tree size = TYPE_SIZE_UNIT (TREE_TYPE (base));
+  if (TREE_CODE (size) == INTEGER_CST)
+ up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size);

I think I asked this multiple times now, but given 'ref' is the variable
array-ref a.b.c[i], when you call get_addr_base_and_unit_offset (ref, &off)
you always get a NULL_TREE return value.

So I asked you to pass it 'arg' instead ... which gets you the offset of
a.b.c, which looks like what you intended to get anyway.

I also wonder what you compute here - you are looking at the size of 'base'
but that is the size of 'a'.  You don't even use the computed offset!  Which
means you could have used get_base_address instead!?  Also the type
of 'base' may be completely off given MEM[&blk + 8].b.c[i] would return blk
as base which might be an array of chars and not in any way related to
the type of the innermost structure we access with COMPONENT_REFs.

Why are you only looking at COMPONENT_REF args anyways?  You
don't want to handle a.b[3][i]?

That is, I'd have expected you do

   if (get_addr_base_and_unit_offset (ref, &off))
 up_bound_p1 = wide_int_to_tree (sizetype,
wi::sub (wi::to_wide (up_bound_p1), off));

Richard.


Thanks
Martin



The difficulty with a testcase like

struct { struct A { int b[1]; } a[5]; } x;

 x.a[i].b[j]

is that b is not considered an array at struct end since one of my
recent changes to array_at_struct_end_p (basically it disallows having
a flex array as the last member of an array).

It would still stand for non-array components with variable offset
but you can't create C testcases for that.

So yes, for the specific case within the array_at_struct_end_p condition
get_addr_base_and_unit_offset is enough.  IIRC the condition was
a bit more than just get_addr_base_and_unit_offset.  up_bound !=
INTEGER_CST for example.  So make the above

int foo (int n, int i)
{
 struct { struct A { int b[n]; } a[5]; } x;
 return x.a[i].b[PTRDIFF_MAX/2];
}

with appropriately adjusted constant.  Does that give you the testcase
you want?



Thank you for the test case.  It is diagnosed the same way
irrespective of which of the two functions is used so it serves
to confirm my understanding that the only difference between
the two functions is bits vs bytes.

Unless you have another test case that does demonstrate that
get_ref_base_and_extent is necessary/helpful, is the last patch
okay to commit?

(Again, to be clear, I'm happy to change or enhance the patch if
I can verify that the change handles cases that the current patch
misses.)



As of "it works, catches corner-cases, ..." - yes, it does, but it
adds code that needs to be maintained, may contain bugs, is
executed even for valid code.



Understood.  I don't claim the enhancement is free of any cost
whatsoever.  But it is teeny by most standards and it doesn't
detect just excessively large indices but also negative indices
into last member arrays (bug 68325) and out-of-bounds indices
(bug 82583).  The "excessively large" part does come largely
for free with the other checks.

Martin







Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation for SKX

2017-11-14 Thread Uros Bizjak
On Tue, Nov 14, 2017 at 3:26 PM, Peryt, Sebastian
 wrote:
> Attached is fixed patch.
>
> Sebastian
>
>
>> -Original Message-
>> From: H.J. Lu [mailto:hjl.to...@gmail.com]
>> Sent: Tuesday, November 14, 2017 1:18 PM
>> To: Peryt, Sebastian 
>> Cc: Jakub Jelinek ; gcc-patches@gcc.gnu.org; Uros Bizjak
>> ; Kirill Yukhin ; Lu, Hongjiu
>> 
>> Subject: Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation
>> for SKX
>>
>> On Tue, Nov 14, 2017 at 3:18 AM, Peryt, Sebastian 
>> wrote:
>> > I have updated tests and changelog according to Jakub's suggestions.
>> > Please find attached v2 of my patch.
>> >
>> >
>> > 14.11.2017  Sebastian Peryt  
>> >
>> > gcc/
>> >
>> > PR target/82941
>> > PR target/82942
>> > * config/i386/i386.c (pass_insert_vzeroupper): Modify gate 
>> > condition
>> > to return true on Xeon and not on Xeon Phi.
>> > (ix86_check_avx256_register): Changed to ...
>> > (ix86_check_avx_upper_register): ... this. Add extra check for
>> > VALID_AVX512F_REG_OR_XI_MODE.
>> > (ix86_avx_u128_mode_needed): Changed
>> > ix86_check_avx256_register to ix86_check_avx_upper_register.
>> > (ix86_check_avx256_stores): Changed to ...
>> > (ix86_check_avx_upper_stores): ... this. Changed
>> > ix86_check_avx256_register to ix86_check_avx_upper_register.
>> > (ix86_avx_u128_mode_after): Changed
>> > avx_reg256_found to avx_upper_reg_found. Changed
>> > ix86_check_avx256_stores to ix86_check_avx_upper_stores.
>> > (ix86_avx_u128_mode_entry): Changed
>> > ix86_check_avx256_register to ix86_check_avx_upper_register.
>> > (ix86_avx_u128_mode_exit): Ditto.
>> > * config/i386/i386.h: (host_detect_local_cpu): New define.

OK.

Thanks,
Uros.


Re: Tweak vector::_M_realloc_insert for code size

2017-11-14 Thread Jonathan Wakely

On 11/11/17 19:14 +0100, Marc Glisse wrote:

Hello,

operator new can clobber memory, it is hard to teach the compiler 
otherwise since it is replaceable. Here I cache a couple values before 
the call to the allocator. I checked the result on this simple 
example:


#include <vector>
void f(std::vector<int>& v){ v.push_back(0); }

The patch does not affect the fast path where no reallocation is 
needed. Compiling with -O3 (everything gets inlined into f), I see a 
nice decrease in code size on x86_64


$ size old.o new.o
   text    data     bss     dec     hex filename
    462       0       0     462     1ce old.o
    376       0       0     376     178 new.o

Even at -O2 where _M_realloc_insert is not inlined, I get a slight 
decrease in code size (490 -> 470). On x86, that's 531 -> 519 (465 -> 
387 at -O3).


I'm not going to modify every function like that, I just happened to 
be looking at this example for other reasons, and the size gain is 
larger than I expected, so I am posting the patch.


That's a nice improvement, OK for trunk. Thanks.
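The caching idea can be sketched outside the library (a hedged illustration, not the actual _M_realloc_insert code): values needed after the allocation are read into locals before the call, so the optimizer does not have to reload them across the opaque, replaceable operator new.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hedged sketch of the pattern Marc describes: cache members in locals
// *before* the opaque allocation, since a replaceable ::operator new may
// clobber any reachable memory as far as the optimizer can prove.
struct buffer
{
  int *first = nullptr;
  std::size_t size = 0;

  void grow_and_append (int value)
  {
    // Cached before the call; the compiler can keep these in registers
    // across ::operator new instead of reloading them from *this.
    int *const old_first = first;
    const std::size_t old_size = size;

    int *const storage
      = static_cast<int *> (::operator new ((old_size + 1) * sizeof (int)));

    for (std::size_t i = 0; i < old_size; ++i)
      storage[i] = old_first[i];
    storage[old_size] = value;

    ::operator delete (old_first);
    first = storage;
    size = old_size + 1;
  }
};
```

The struct name and member layout here are invented for illustration; only the cache-before-call shape mirrors the patch.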



Re: [v3 PATCH] Implement LWG 2733 and LWG 2759

2017-11-14 Thread Jonathan Wakely

On 14/11/17 18:14 +0200, Ville Voutilainen wrote:

   Implement LWG 2733 and LWG 2759
   * include/experimental/numeric (gcd): Reject cv-qualified bool.
   (lcm): Likewise.
   * include/std/numeric (gcd): Likewise.
   (lcm): Likewise.
   * testsuite/26_numerics/gcd/gcd_neg.cc: Add tests and adjust.
   * testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.


Both files can use remove_cv_t instead of remove_cv::type (and there's
no need to std-qualify it in std::gcd).

They could also use is_integral_v and is_same_v but that's
pre-existing, and less important because there's no 'typename' that
can be avoided by changing it.
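For illustration, the suggested spelling would look something like this (a hedged sketch of the C++14 alias template form, not the committed patch; the function name is invented):

```cpp
#include <type_traits>

// Hedged sketch: the same checks written with the remove_cv_t alias
// template (C++14) instead of typename remove_cv<...>::type.
template <typename _Mn, typename _Nn>
inline bool check_gcd_args ()
{
  static_assert (std::is_integral<_Mn>::value, "gcd arguments are integers");
  static_assert (std::is_integral<_Nn>::value, "gcd arguments are integers");
  static_assert (!std::is_same<std::remove_cv_t<_Mn>, bool>::value,
                 "gcd arguments are not bools");
  static_assert (!std::is_same<std::remove_cv_t<_Nn>, bool>::value,
                 "gcd arguments are not bools");
  return true;
}
```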


Re: [PATCH], make Float128 built-in functions work with -mabi=ieeelongdouble

2017-11-14 Thread Segher Boessenkool
Hi!

On Fri, Nov 10, 2017 at 06:13:19PM -0500, Michael Meissner wrote:
> This patch updates the float128 built-in functions that get/set exponents, get
> mantissa, and do tests to work with _Float128 on the current system, and with
> long double when -mabi=ieeelongdouble is used.
> 
> The issue is when long double == IEEE, we use TFmode instead of KFmode.  I
> decided to fix this inside of rs6000_expand_builtin, adding a switch statement
> for the KFmode float128 built-ins, and switching them to the TFmode variant if
> -mabi=ieeelongdouble.

>   (scalar_extract_expq): Change float128 built-in functions to
>   accomidate having both KFmode and TFmode functions.  Use the
>   KFmode variant as the default.

"accommodate".

>   * config/rs6000/vsx.md (xsxexpqp_): Change float128 built-in
>   function insns to support both TFmode and KFmode varaints.

"variants".

Other than those trivialities, looks fine.  Please commit.  Thanks!


Segher


Patch ping^2

2017-11-14 Thread Jakub Jelinek
On Mon, Nov 06, 2017 at 05:22:36PM +0100, Jakub Jelinek wrote:
> I'd like to ping the:
> 
>   http://gcc.gnu.org/ml/gcc-patches/2017-10/msg01895.html
>   PR debug/82718
>   Fix DWARF5 .debug_loclist handling with hot/cold partitioning
> 
> patch.  Thanks

Ping^2.

Jakub


Re: [PATCH] Zero vptr in dtor for -fsanitize=vptr.

2017-11-14 Thread Jason Merrill
OK.

On Mon, Nov 6, 2017 at 3:27 AM, Martin Liška  wrote:
> On 11/03/2017 04:21 PM, Jason Merrill wrote:
>> On Fri, Nov 3, 2017 at 10:25 AM, Martin Liška  wrote:
>>> On 10/27/2017 09:44 PM, Nathan Sidwell wrote:
 On 10/27/2017 02:34 PM, Jakub Jelinek wrote:

> But when singly inheriting a polymorphic base and thus mapped to the same
> vptr all but the last dtor will not be in charge, right?

 Correct.

> So, if using build_clobber_this for this, instead of clobbering what we
> clobber we'd just clear the single vptr (couldn't clobber the rest, even
> if before the store, because that would make the earlier other vptr stores
> dead).

 ok (I'd not looked at the patch to see if in chargeness was signficant)

 nathan

>>>
>>> Hello.
>>>
>>> I'm sending v2 which only zeros vptr of object.
>>>
>>> Ready to be installed after finishing tests?
>>
>> Surely we also want to check TYPE_CONTAINS_VPTR_P.
>>
>> Jason
>>
>
> Done that in attached patch.
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?
> Martin
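For context, the behaviour under review — clearing the object's vptr at the end of the destructor so -fsanitize=vptr can catch virtual calls on already-destroyed objects — amounts to roughly this hand-written equivalent (a hedged sketch; the real patch emits the store in the C++ front end, and only when TYPE_CONTAINS_VPTR_P holds):

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hedged sketch of what the generated dtor epilogue amounts to under
// -fsanitize=vptr: the (single) vptr slot at offset 0 is zeroed once the
// destructor body has run, so a later virtual call through a dangling
// pointer is diagnosed instead of dispatching through a stale vtable.
struct Base
{
  long payload = 42;

  virtual ~Base ()
  {
    // ... user destructor body would run first ...
    void *zero = nullptr;
    std::memcpy (this, &zero, sizeof zero);   // clear the vptr slot
  }
};
```

Placement-destroying an object in raw storage and inspecting the first pointer-sized bytes shows the slot is now null.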


[v3 PATCH] Implement LWG 2733 and LWG 2759

2017-11-14 Thread Ville Voutilainen
Tested on Linux-x64.

2017-11-14  Ville Voutilainen  

Implement LWG 2733 and LWG 2759
* include/experimental/numeric (gcd): Reject cv-qualified bool.
(lcm): Likewise.
* include/std/numeric (gcd): Likewise.
(lcm): Likewise.
* testsuite/26_numerics/gcd/gcd_neg.cc: Add tests and adjust.
* testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.
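The hole these asserts close can be seen in isolation (a hedged illustration, not part of the patch itself): is_same compares the raw template parameter, so an explicitly cv-qualified bool argument type slipped past the old checks.

```cpp
#include <type_traits>

// Hedged illustration of why LWG 2733/2759 strip cv-qualifiers first:
// a plain is_same check on the template parameter does not reject a
// cv-qualified bool, because const bool and bool are distinct types.
static_assert (!std::is_same<const bool, bool>::value,
               "raw comparison does not see through const");
static_assert (std::is_same<typename std::remove_cv<const volatile bool>::type,
                            bool>::value,
               "remove_cv reveals the underlying bool");
```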
diff --git a/libstdc++-v3/include/experimental/numeric 
b/libstdc++-v3/include/experimental/numeric
index c8597fc..6a48c72 100644
--- a/libstdc++-v3/include/experimental/numeric
+++ b/libstdc++-v3/include/experimental/numeric
@@ -57,8 +57,10 @@ inline namespace fundamentals_v2
 {
   static_assert(is_integral<_Mn>::value, "gcd arguments are integers");
   static_assert(is_integral<_Nn>::value, "gcd arguments are integers");
-  static_assert(!is_same<_Mn, bool>::value, "gcd arguments are not bools");
-  static_assert(!is_same<_Nn, bool>::value, "gcd arguments are not bools");
+  static_assert(!is_same<typename remove_cv<_Mn>::type, bool>::value,
+   "gcd arguments are not bools");
+  static_assert(!is_same<typename remove_cv<_Nn>::type, bool>::value,
+   "gcd arguments are not bools");
   return std::__detail::__gcd(__m, __n);
 }
 
@@ -69,8 +71,10 @@ inline namespace fundamentals_v2
 {
   static_assert(is_integral<_Mn>::value, "lcm arguments are integers");
   static_assert(is_integral<_Nn>::value, "lcm arguments are integers");
-  static_assert(!is_same<_Mn, bool>::value, "lcm arguments are not bools");
-  static_assert(!is_same<_Nn, bool>::value, "lcm arguments are not bools");
+  static_assert(!is_same<typename remove_cv<_Mn>::type, bool>::value,
+   "lcm arguments are not bools");
+  static_assert(!is_same<typename remove_cv<_Nn>::type, bool>::value,
+   "lcm arguments are not bools");
   return std::__detail::__lcm(__m, __n);
 }
 } // namespace fundamentals_v2
diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
index 2b80419..e3276e4 100644
--- a/libstdc++-v3/include/std/numeric
+++ b/libstdc++-v3/include/std/numeric
@@ -133,8 +133,10 @@ namespace __detail
 {
   static_assert(is_integral<_Mn>::value, "gcd arguments are integers");
   static_assert(is_integral<_Nn>::value, "gcd arguments are integers");
-  static_assert(!is_same<_Mn, bool>::value, "gcd arguments are not bools");
-  static_assert(!is_same<_Nn, bool>::value, "gcd arguments are not bools");
+  static_assert(!is_same<typename remove_cv<_Mn>::type, bool>::value,
+   "gcd arguments are not bools");
+  static_assert(!is_same<typename remove_cv<_Nn>::type, bool>::value,
+   "gcd arguments are not bools");
   return __detail::__gcd(__m, __n);
 }
 
@@ -145,8 +147,10 @@ namespace __detail
 {
   static_assert(is_integral<_Mn>::value, "lcm arguments are integers");
   static_assert(is_integral<_Nn>::value, "lcm arguments are integers");
-  static_assert(!is_same<_Mn, bool>::value, "lcm arguments are not bools");
-  static_assert(!is_same<_Nn, bool>::value, "lcm arguments are not bools");
+  static_assert(!is_same<typename remove_cv<_Mn>::type, bool>::value,
+   "lcm arguments are not bools");
+  static_assert(!is_same<typename remove_cv<_Nn>::type, bool>::value,
+   "lcm arguments are not bools");
   return __detail::__lcm(__m, __n);
 }
 
diff --git a/libstdc++-v3/testsuite/26_numerics/gcd/gcd_neg.cc 
b/libstdc++-v3/testsuite/26_numerics/gcd/gcd_neg.cc
index 30524a1b..d5de28f 100644
--- a/libstdc++-v3/testsuite/26_numerics/gcd/gcd_neg.cc
+++ b/libstdc++-v3/testsuite/26_numerics/gcd/gcd_neg.cc
@@ -26,6 +26,20 @@ test01()
   std::gcd(true, 1);// { dg-error "from here" }
   std::gcd(1, true);// { dg-error "from here" }
   std::gcd(true, true); // { dg-error "from here" }
+  std::gcd(true, 1);// { dg-error "from here" }
+  std::gcd(1, true);// { dg-error "from here" }
+  std::gcd(true, true); // { dg-error "from here" }
+  std::gcd(true, 1);// { dg-error "from here" }
+  std::gcd(1, true);// { dg-error "from here" }
+  std::gcd(true, true); // { dg-error "from here" }
+  std::gcd(true, 1);// { dg-error "from here" }
+  std::gcd(1, true);// { dg-error "from here" }
+  std::gcd(true, true); // { dg-error "from here" }
+  std::gcd(true, 1);// { dg-error "from here" }
+  std::gcd(1, true);// { dg-error "from here" }
+  std::gcd(true, true); // { dg-error "from here" }
   std::gcd(0.1, 1); // { dg-error "from here" }
   std::gcd(1, 0.1); // { dg-error "from here" }
   std::gcd(0.1, 0.1);   // { dg-error "from here" }
@@ -34,6 +48,6 @@ test01()
 // { dg-error "integers" "" { target *-*-* } 134 }
 // { dg-error "integers" "" { target *-*-* } 135 }
 // { dg-error "not bools" "" { target *-*-* } 136 }
-// { dg-error "not bools" "" { target *-*-* } 137 }
+// { dg-error "not bools" "" { target *-*-* } 138 }
 // { dg-prune-output "deleted 

Re: [PATCH] Zero vptr in dtor for -fsanitize=vptr.

2017-11-14 Thread Martin Liška

PING^1

On 11/06/2017 09:27 AM, Martin Liška wrote:

On 11/03/2017 04:21 PM, Jason Merrill wrote:

On Fri, Nov 3, 2017 at 10:25 AM, Martin Liška  wrote:

On 10/27/2017 09:44 PM, Nathan Sidwell wrote:

On 10/27/2017 02:34 PM, Jakub Jelinek wrote:


But when singly inheriting a polymorphic base and thus mapped to the same
vptr all but the last dtor will not be in charge, right?


Correct.


So, if using build_clobber_this for this, instead of clobbering what we
clobber we'd just clear the single vptr (couldn't clobber the rest, even
if before the store, because that would make the earlier other vptr stores
dead).


ok (I'd not looked at the patch to see if in chargeness was signficant)

nathan



Hello.

I'm sending v2 which only zeros vptr of object.

Ready to be installed after finishing tests?


Surely we also want to check TYPE_CONTAINS_VPTR_P.

Jason



Done that in attached patch.
Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin





Re: Cleanup predict.c

2017-11-14 Thread Martin Liška

On 11/14/2017 10:20 AM, Jan Hubicka wrote:

@@ -4670,11 +4671,12 @@ expand_call_inline (basic_block bb, gimp
  
if (dump_file && (dump_flags & TDF_DETAILS))

  {
-  fprintf (dump_file, "Inlining ");
-  print_generic_expr (dump_file, id->src_fn);
-  fprintf (dump_file, " to ");
-  print_generic_expr (dump_file, id->dst_fn);
-  fprintf (dump_file, " with frequency %i\n", cg_edge->frequency ());
+  fprintf (dump_file, "Inlining %s to %s with frequency %4.2f\n",
+  xstrdup_for_dump (id->src_node->dump_name ()),
+  xstrdup_for_dump (id->dst_node->dump_name ()),
+  cg_edge->sreal_frequency ().to_double ());
+  id->src_node->dump (dump_file);
+  id->dst_node->dump (dump_file);
  }


Hi.

You don't have to call xstrdup_for_dump for functions symtab_node::dump_name and
symtab_node::dump_asm_name. Allocation is done from GGC memory.

Martin


RE: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned memory access

2017-11-14 Thread Tamar Christina
Hi James,

I have split the aarch64 bit off from the generic parts and processed your
feedback.

Attached is the reworked patch.

Ok for trunk?

Thanks,
Tamar


gcc/
2017-11-14  Tamar Christina  

* config/aarch64/aarch64.c (aarch64_expand_movmem):
Add MEM to REG optimized case.

gcc/testsuite/
2017-11-14  Tamar Christina  

* gcc.target/aarch64/structs.c: New test.

> -Original Message-
> From: James Greenhalgh [mailto:james.greenha...@arm.com]
> Sent: Wednesday, August 30, 2017 16:56
> To: Tamar Christina 
> Cc: Richard Sandiford ; GCC Patches  patc...@gcc.gnu.org>; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> 
> Subject: Re: [PATCH][GCC][AARCH64]Bad code-gen for
> structure/block/unaligned memory access
> 
> Hi Tamar,
> 
> I think the AArch64 parts of this patch can be substantially simplified.
> 
> On Mon, Jul 03, 2017 at 07:17:51AM +0100, Tamar Christina wrote:
> > diff --git a/gcc/config/aarch64/aarch64.c
> > b/gcc/config/aarch64/aarch64.c index
> >
> ab1bdc0233afe7a3c41501cb724a5c4c719123b8..1cf6c4b891571745f740d7dbd0
> 37
> > 54bbba9264a6 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -13466,7 +13466,6 @@
> aarch64_copy_one_block_and_progress_pointers
> > (rtx *src, rtx *dst,
> >
> >  /* Expand movmem, as if from a __builtin_memcpy.  Return true if
> > we succeed, otherwise return false.  */
> > -
> >  bool
> >  aarch64_expand_movmem (rtx *operands)  { @@ -13498,16 +13497,55
> @@
> > aarch64_expand_movmem (rtx *operands)
> >base = copy_to_mode_reg (Pmode, XEXP (src, 0));
> >src = adjust_automodify_address (src, VOIDmode, base, 0);
> >
> > +  /* Optimize routines for MEM to REG copies.
> > + This particular function only handles copying to two
> > + destination types: 1) a regular register and 2) the stack.
> > + When writing to a regular register we may end up writting too much in
> cases
> > + where the register already contains a live value or when no data
> padding is
> > + happening so we disallow it.  */
> > +  if (n < 8 && !REG_P (XEXP (operands[0], 0)))
> 
> I don't understand how this comment lines up with the condition on this
> code path. On the one hand you say you permit regular registers, but on the
> other hand you say you disallow regular registers because you may overwrite.
> 
> 
> > +   {
> > + machine_mode mov_mode, dest_mode
> > +   = smallest_mode_for_size (n * BITS_PER_UNIT, MODE_INT);
> > + rtx result = gen_reg_rtx (dest_mode);
> > + emit_insn (gen_move_insn (result, GEN_INT (0)));
> > +
> > + unsigned int shift_cnt = 0;
> 
> Any reason not to just initialise the shift_cnt in place here?
> 
> > + for (; n > shift_cnt; shift_cnt += GET_MODE_SIZE (mov_mode))
> 
> 
>  for (unsigned int shift_cnt = 0;
>   shift_cnt <=  n;
>   shift_cnt += GET_MODE_SIZE (mov_mode))
> 
> Having the loop counter first in the comparison is personal preference.
> 
> mov_mode looks uninitialised, but I guess by the time you check the loop
> condition you have actually initialized it.
> 
> > +   {
> > +int nearest = 0;
> 
> This isn't required, we could do the loop below with one
> 
> > +/* Find the mode to use.  */
> > +for (unsigned max = 1; max <= (n - shift_cnt); max *= 2)
> > + nearest = max;
> 
> Wrong formatting.
> 
> So, when this loop terminates for the first time (shift_cnt == 0) nearest is 
> the
> first power of two greater than n.
> 
> > +
> > + mov_mode = smallest_mode_for_size (nearest * BITS_PER_UNIT,
> > +MODE_INT);
> 
> That means this is always a mode of at least the right size, which in turn
> means that we can't execute round the loop again (GET_MODE_SIZE
> (mov_mode) will always be greater than n). So, we can be sure this loop only
> executes once, and we can fold mov_mode and dest_mode to be equal.
> 
> > + rtx reg = gen_reg_rtx (mov_mode);
> > +
> > + src = adjust_address (src, mov_mode, 0);
> > + emit_insn (gen_move_insn (reg, src));
> > + src = aarch64_progress_pointer (src);
> > +
> > + reg = gen_rtx_ZERO_EXTEND (dest_mode, reg);
> > + reg = gen_rtx_ASHIFT (dest_mode, reg,
> > +   GEN_INT (shift_cnt * BITS_PER_UNIT));
> > + result = gen_rtx_IOR (dest_mode, reg, result);
> > +   }
> > +
> > +dst = adjust_address (dst, dest_mode, 0);
> > +emit_insn (gen_move_insn (dst, result));
> > +return true;
> > +  }
> > +
> >/* Simple cases.  Copy 0-3 bytes, as (if applicable) a 2-byte, then a
> >   1-byte chunk.  */
> >if (n < 4)
> >  {
> >if (n >= 2)
> > -   {
> > - aarch64_copy_one_block_and_progress_pointers (, ,
> HImode);
> > - n -= 2;
> > -   }
> >
> > +  {
> > +   aarch64_copy_one_block_and_progress_pointers (, ,
> HImode);
> > 
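The transformation being reviewed assembles the small copy in a single integer register; in C terms the ZERO_EXTEND / ASHIFT / IOR sequence is roughly the following (a hedged illustration of the idea, not aarch64_expand_movmem itself):

```cpp
#include <cstdint>

// Hedged illustration of the ZERO_EXTEND + ASHIFT + IOR chain built in
// the reviewed loop: each source chunk is widened, shifted to its byte
// position, and OR-ed into the accumulated result register.
uint32_t load3_bytes (const unsigned char *src)
{
  uint32_t result = 0;
  result |= (uint32_t) src[0] << 0;    // chunk at byte offset 0
  result |= (uint32_t) src[1] << 8;    // chunk at byte offset 1
  result |= (uint32_t) src[2] << 16;   // chunk at byte offset 2
  return result;
}
```

The real code picks chunk modes with smallest_mode_for_size rather than working byte-by-byte; this sketch fixes the chunk size at one byte for clarity.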

[PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2017-11-14 Thread Tamar Christina
Hi All,

This patch allows larger bitsizes to be used as copy size
when the target does not have SLOW_UNALIGNED_ACCESS.

fun3:
        adrp    x2, .LANCHOR0
        add     x2, x2, :lo12:.LANCHOR0
        mov     x0, 0
        sub     sp, sp, #16
        ldrh    w1, [x2, 16]
        ldrb    w2, [x2, 18]
        add     sp, sp, 16
        bfi     x0, x1, 0, 8
        ubfx    x1, x1, 8, 8
        bfi     x0, x1, 8, 8
        bfi     x0, x2, 16, 8
        ret

is turned into

fun3:
        adrp    x0, .LANCHOR0
        add     x0, x0, :lo12:.LANCHOR0
        sub     sp, sp, #16
        ldrh    w1, [x0, 16]
        ldrb    w0, [x0, 18]
        strh    w1, [sp, 8]
        strb    w0, [sp, 10]
        ldr     w0, [sp, 8]
        add     sp, sp, 16
        ret

which avoids the bfi's for a simple 3 byte struct copy.

Regression tested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu and no 
regressions.

This patch is just split off from the previous combined patch with AArch64,
with a testcase added.

I assume Jeff's ACK from 
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01523.html is still
valid as the code did not change.

Thanks,
Tamar


gcc/
2017-11-14  Tamar Christina  

* expr.c (copy_blkmode_to_reg): Fix bitsize for targets
with fast unaligned access.
* doc/sourcebuild.texi (no_slow_unalign): New.

gcc/testsuite/
2017-11-14  Tamar Christina  

* gcc.dg/struct-simple.c: New.
* lib/target-supports.exp
(check_effective_target_no_slow_unalign): New.

-- 
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 1646d0a99911aa7b2e66762e5907fbb0454ed00d..1df7a82fbd516b9bf07908bb800e441110b28ca4 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2178,8 +2178,12 @@ Target supports @code{wchar_t} that is compatible with @code{char32_t}.
 
 @item comdat_group
 Target uses comdat groups.
+
+@item no_slow_unalign
+Target does not have slow unaligned access.
 @end table
 
+
 @subsubsection Local to tests in @code{gcc.target/i386}
 
 @table @code
diff --git a/gcc/expr.c b/gcc/expr.c
index 2f8432d92ccac17c0a548faf4a16eff0656cef1b..afcea8fef58155d0a81c10cd485ba8af888d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
 
   n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
   dst_words = XALLOCAVEC (rtx, n_regs);
-  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
+  bitsize = BITS_PER_WORD;
+  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
+bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
 
   /* Copy the structure BITSIZE bits at a time.  */
   for (bitpos = 0, xbitpos = padding_correction;
diff --git a/gcc/testsuite/gcc.dg/struct-simple.c b/gcc/testsuite/gcc.dg/struct-simple.c
new file mode 100644
index ..9f218851d897421b217b0926d29845b6192982fb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/struct-simple.c
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-require-effective-target no_slow_unalign } */
+/* { dg-additional-options "-fdump-rtl-final" } */
+
+/* Copyright 1996, 1999, 2007 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.
+
+   Please email any bugs, comments, and/or additions to this file to:
+   bug-...@prep.ai.mit.edu  */
+
+#include <stdio.h>
+
+struct struct3 { char a, b, c; };
+struct struct3 foo3 = { 'A', 'B', 'C'},  L3;
+
+struct struct3  fun3()
+{
+  return foo3;
+}
+
+#ifdef PROTOTYPES
+void Fun3(struct struct3 foo3)
+#else
+void Fun3(foo3)
+ struct struct3 foo3;
+#endif
+{
+  L3 = foo3;
+}
+
+int main()
+{
+  struct struct3 x = fun3();
+
+  printf("a:%c, b:%c, c:%c\n", x.a, x.b, x.c);
+}
+
+/* { dg-final { scan-rtl-dump-not {zero_extract:.+\[\s*foo3\s*\]} "final" } } */
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index b6f9e51c4817cf8235c8e33b14e2763308eb482a..690d8960002a3c38462bbe5524b419c205ba4da9 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6037,6 +6037,30 @@ proc check_effective_target_unaligned_stack { } {
 return $et_unaligned_stack_saved
 }
 
+# Return 1 if the target plus current options does not have
+# slow unaligned access.
+#
+# This won't 

[GCC][PATCH][AArch64] Add negative tests for dotprod and set minimum version to v8.2 in the target bit.

2017-11-14 Thread Tamar Christina
Hi All,

Dot Product is intended to only be available for Armv8.2-a and newer.
While this restriction is reflected in the intrinsics, the patterns
themselves were missing the Armv8.2-a bit.

This means that using -march=armv8.1-a+dotprod incorrectly got the
auto-vectorizer to generate dot product instructions.

Regtested on aarch64-none-elf and no issues.

Ok for trunk?

Thanks,
Tamar

gcc/
2017-11-14  Tamar Christina  

* config/aarch64/aarch64.h (TARGET_DOTPROD): Add AARCH64_ISA_V8_2.

gcc/testsuite/
2017-11-14  Tamar Christina  

* gcc.target/aarch64/advsimd-intrinsics/vect-dot-s8_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vect-dot-s8_2.c: New.

-- 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 93d29b84d47b7017661a2129d61e7d740bbf7c93..b6805775079626417e05f6f5a74289670330243d 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -192,7 +192,7 @@ extern unsigned aarch64_architecture_version;
 #define TARGET_SIMD_F16INST (TARGET_SIMD && AARCH64_ISA_F16)
 
 /* Dot Product is an optional extension to AdvSIMD enabled through +dotprod.  */
-#define TARGET_DOTPROD (TARGET_SIMD && AARCH64_ISA_DOTPROD)
+#define TARGET_DOTPROD (TARGET_SIMD && AARCH64_ISA_DOTPROD && AARCH64_ISA_V8_2)
 
 /* ARMv8.3-A features.  */
 #define TARGET_ARMV8_3	(AARCH64_ISA_V8_3)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vect-dot-s8_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vect-dot-s8_1.c
new file mode 100644
index ..3a1664b0740fa6fb32f99f3cc40025afce2292e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vect-dot-s8_1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-O3 -march=armv8.1-a+dotprod" } */
+
+#define N 64
+#define TYPE signed
+
+#include "vect-dot-qi.h"
+
+/* { dg-final { scan-assembler-not {sdot\tv[0-9]+\.4s, v[0-9]+\.16b, v[0-9]+\.16b} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vect-dot-s8_2.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vect-dot-s8_2.c
new file mode 100644
index ..f8c746cb094194c74bb280bf68a38dd435c46c64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vect-dot-s8_2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-O3 -march=armv8.3-a+dotprod" } */
+
+#define N 64
+#define TYPE signed
+
+#include "vect-dot-qi.h"
+
+/* { dg-final { scan-assembler-times {sdot\tv[0-9]+\.4s, v[0-9]+\.16b, v[0-9]+\.16b} 4 } } */
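For reference, the kind of reduction these tests vectorize looks roughly like this (a hedged reconstruction; the vect-dot-qi.h header itself is not shown in this thread):

```cpp
// Hedged reconstruction of a dot-product kernel of the shape the
// vect-dot-qi.h header presumably provides; with -O3 and a dot-product
// capable -march (Armv8.2-a or later, per this patch) the AArch64
// vectorizer can turn the widening multiply-accumulate into SDOT.
#define N 64
typedef signed char qi;

int
dot_prod (const qi *a, const qi *b)
{
  int sum = 0;
  for (int i = 0; i < N; i++)
    sum += a[i] * b[i];
  return sum;
}
```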



[PATCH][GCC][ARM] Add Armv8.3-a to AArch32.

2017-11-14 Thread Tamar Christina
Hi All,

This patch adds Armv8.3-a as an architecture to the compiler with
the feature set inherited from Armv8.2-a.

Bootstrapped regtested on arm-none-linux-gnueabihf and no issues.

gcc/
2017-11-14  Tamar Christina  

* config/arm/arm-cpus.in (armv8_3, ARMv8_3a, armv8.3-a): New
* config/arm/arm-tables.opt (armv8.3-a): New.
* doc/invoke.texi (ARM Options): Add armv8.3-a

Ok for trunk?

Thanks,
Tamar.

-- 
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 0820ad74c2e75f209bf09ef598c23cdd6116d29d..281ec162db8c982128462d8efac2be1d21959cf7 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -114,9 +114,12 @@ define feature iwmmxt2
 # Architecture rel 8.1.
 define feature armv8_1
 
-# Architecutre rel 8.2.
+# Architecture rel 8.2.
 define feature armv8_2
 
+# Architecture rel 8.3.
+define feature armv8_3
+
 # M-Profile security extensions.
 define feature cmse
 
@@ -238,6 +241,7 @@ define fgroup ARMv7em ARMv7m armv7em
 define fgroup ARMv8a  ARMv7ve armv8
 define fgroup ARMv8_1aARMv8a crc32 armv8_1
 define fgroup ARMv8_2aARMv8_1a armv8_2
+define fgroup ARMv8_3aARMv8_2a armv8_3
 define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv
 define fgroup ARMv8m_main ARMv7m armv8 cmse
 define fgroup ARMv8r  ARMv8a
@@ -579,6 +583,20 @@ begin arch armv8.2-a
  option dotprod add FP_ARMv8 DOTPROD
 end arch armv8.2-a
 
+begin arch armv8.3-a
+ tune for cortex-a53
+ tune flags CO_PROC
+ base 8A
+ profile A
+ isa ARMv8_3a
+ option simd add FP_ARMv8 NEON
+ option fp16 add fp16 FP_ARMv8 NEON
+ option crypto add FP_ARMv8 CRYPTO
+ option nocrypto remove ALL_CRYPTO
+ option nofp remove ALL_FP
+ option dotprod add FP_ARMv8 DOTPROD
+end arch armv8.3-a
+
 begin arch armv8-m.base
  tune for cortex-m23
  base 8M_BASE
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 4e508b1555a77628ff6e7cfea39c98b87caa840a..f7937256cd79296ba33d109232bcf0d6f7b03917 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -452,19 +452,22 @@ EnumValue
 Enum(arm_arch) String(armv8.2-a) Value(28)
 
 EnumValue
-Enum(arm_arch) String(armv8-m.base) Value(29)
+Enum(arm_arch) String(armv8.3-a) Value(29)
 
 EnumValue
-Enum(arm_arch) String(armv8-m.main) Value(30)
+Enum(arm_arch) String(armv8-m.base) Value(30)
 
 EnumValue
-Enum(arm_arch) String(armv8-r) Value(31)
+Enum(arm_arch) String(armv8-m.main) Value(31)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(32)
+Enum(arm_arch) String(armv8-r) Value(32)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(33)
+Enum(arm_arch) String(iwmmxt) Value(33)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(34)
 
 Enum
 Name(arm_fpu) Type(enum fpu_type)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2ef88e081f982f5619132cc33ce23c3fb542ae11..b03388080e7f14411468c4f4d39b82478255ef7e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15473,7 +15473,7 @@ Permissible names are:
 @samp{armv6}, @samp{armv6j}, @samp{armv6k}, @samp{armv6kz}, @samp{armv6t2},
 @samp{armv6z}, @samp{armv6zk},
 @samp{armv7}, @samp{armv7-a}, @samp{armv7ve}, 
-@samp{armv8-a}, @samp{armv8.1-a}, @samp{armv8.2-a},
+@samp{armv8-a}, @samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a},
 @samp{armv7-r},
 @samp{armv8-r},
 @samp{armv6-m}, @samp{armv6s-m},



[PATCH][GCC][ARM] Restrict TARGET_DOTPROD to baseline Armv8.2-a.

2017-11-14 Thread Tamar Christina
Hi All,

Dot Product is intended to only be available for Armv8.2-a and newer.
While this restriction is reflected in the intrinsics, the patterns
themselves were missing the Armv8.2-a bit.

While GCC would prevent invalid options e.g. `-march=armv8.1-a+dotprod`
we should prevent the pattern from being able to expand at all.

Regtested on arm-none-eabi and no issues.

Ok for trunk?

Thanks,
Tamar

gcc/
2017-11-14  Tamar Christina  

* config/arm/arm.h (TARGET_DOTPROD): Add arm_arch8_2.

-- 
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 9567f6df73a960ab08b3766fcf3677629658a5ab..b189951c934e327c88cc5893e9629515c9c39013 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -210,10 +210,11 @@ extern tree arm_fp16_type_node;
 /* FPU supports ARMv8.1 Adv.SIMD extensions.  */
 #define TARGET_NEON_RDMA (TARGET_NEON && arm_arch8_1)
 
-/* Supports for Dot Product AdvSIMD extensions.  */
+/* Supports the Dot Product AdvSIMD extensions.  */
 #define TARGET_DOTPROD (TARGET_NEON	\
 			&& bitmap_bit_p (arm_active_target.isa,		\
-	isa_bit_dotprod))
+	isa_bit_dotprod)		\
+			&& arm_arch8_2)
 
 /* FPU supports the floating point FP16 instructions for ARMv8.2 and later.  */
 #define TARGET_VFP_FP16INST \



Re: [RFC PATCH] Merge libsanitizer from upstream

2017-11-14 Thread Jakub Jelinek
On Tue, Nov 14, 2017 at 06:40:32PM +0300, Maxim Ostapenko wrote:
> > But on the upstream sanitizer repo,
> > sanitizer_common/sanitizer_syscall_linux_arm.inc was added on Nov 8th,
> > and also checks for retval >= -4095, hence handling the clone() error
> > gracefully. So... can we merge again our sanitizer sources to at least
> > http://llvm.org/viewvc/llvm-project?view=revision=317640 ?
> 
> I think we could just cherry-pick this commit from upstream. I can handle
> this if it's not too late given the fact that GCC is on stage 3 now.

stage3 starts only during the weekend.

Jakub


Re: [RFC PATCH] Merge libsanitizer from upstream

2017-11-14 Thread Maxim Ostapenko

Hi,

On 13/11/17 15:47, Christophe Lyon wrote:

On 30 October 2017 at 16:21, Maxim Ostapenko  wrote:

On 30/10/17 17:08, Christophe Lyon wrote:

On 30/10/2017 11:12, Maxim Ostapenko wrote:

Hi,

sorry for the late response.

On 20/10/17 13:45, Christophe Lyon wrote:

Hi,

On 19 October 2017 at 13:17, Jakub Jelinek  wrote:

On Thu, Oct 19, 2017 at 02:07:24PM +0300, Maxim Ostapenko wrote:

Is the patch (the merge + this incremental) ok for trunk?

I think the patch is OK, just wondering about two things:

Richi just approved the patch on IRC, so I'll commit, then we can deal
with
follow-ups.


Does anyone else run these tests on arm?
Since you applied this patch, I'm seeing lots of new errors and
timeouts.
I have been ignoring regression reports for *san because of randomness
in the results, but the timeouts are a major inconvenience in testing
because it increases latency a lot in getting results, or worse I get no
result at all because the validation job is killed before completion.

Looking at some intermediate logs, I have noticed:
==24797==AddressSanitizer CHECK failed:
/libsanitizer/asan/asan_poisoning.cc:34
"((AddrIsAlignedByGranularity(addr))) != (0)" (0x0, 0x0)
  #0 0x408d7d65 in AsanCheckFailed /libsanitizer/asan/asan_rtl.cc:67
  #1 0x408ecd5d in __sanitizer::CheckFailed(char const*, int, char
const*, unsigned long long, unsigned long long)
/libsanitizer/sanitizer_common/sanitizer_termination.cc:77
  #2 0x408d22d5 in __asan::PoisonShadow(unsigned long, unsigned
long, unsigned char) /libsanitizer/asan/asan_poisoning.cc:34
  #3 0x4085409b in __asan_register_globals
/libsanitizer/asan/asan_globals.cc:368
  #4 0x109eb in _GLOBAL__sub_I_00099_1_ten

(/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-linux-gnueabi/gcc3/gcc/testsuite/gcc/alloca_big_alignment.exe+0x109eb)

in MANY (193 in gcc) tests.

and many others (152 in gcc) just time out individually (eg
c-c++-common/asan/alloca_instruments_all_paddings.c) with no error in
the logs besides Dejagnu's
WARNING: program timed out.


Since I'm using an apparently unusual setup, maybe I have to update it
to cope with the new version,
so I'd like to know if others are seeing the same problems on arm?

I'm using qemu -R 0 to execute the test programs, encapsulated by
proot (similar to chroot, but does not require root privileges).

Am I missing something obvious?


I've caught the same error on my Arndale board. The issue seems to be
quite obvious: after merge, ASan requires globals array to be aligned by
shadow granularity.


Thanks for confirming. I've spent a lot of time investigating the timeout
issues, that led to zombie processes and servers needing reboot. I've
finally identified that going back to qemu-2.7 avoid the timeout issues
(I've reported a qemu bug).


This trivial patch seems to fix the issue. Could you check it on your
setup?


I was just about to finally start looking at this sanity check problem, so
thank you very much for sharing your patch.
I manually tested it on the subset of my configs and it solves the
assertion failure, thanks!
However, I can notice many regressions compared to before the merge:
c-c++-common/asan/alloca_instruments_all_paddings.c
c-c++-common/asan/alloca_loop_unpoisoning.c
c-c++-common/asan/alloca_safe_access.c
c-c++-common/asan/asan-interface-1.c
c-c++-common/asan/halt_on_error-1.c
c-c++-common/asan/pr59063-1.c
c-c++-common/asan/pr59063-2.c
c-c++-common/asan/pr63316.c
c-c++-common/asan/pr63888.c
c-c++-common/asan/pr70712.c
c-c++-common/asan/pr71480.c
c-c++-common/asan/pr79944.c
c-c++-common/asan/pr80308.c
c-c++-common/asan/swapcontext-test-1.c
gcc.dg/asan/nosanitize-and-inline.c
gcc.dg/asan/pr79196.c
gcc.dg/asan/pr80166.c
gcc.dg/asan/pr81186.c
gcc.dg/asan/use-after-scope-11.c
gcc.dg/asan/use-after-scope-4.c
gcc.dg/asan/use-after-scope-6.c
gcc.dg/asan/use-after-scope-7.c
gcc.dg/asan/use-after-scope-goto-1.c
gcc.dg/asan/use-after-scope-goto-2.c
gcc.dg/asan/use-after-scope-switch-1.c
gcc.dg/asan/use-after-scope-switch-2.c
gcc.dg/asan/use-after-scope-switch-3.c
gcc.dg/asan/use-after-scope-switch-4.c

out of which only
c-c++-common/asan/swapcontext-test-1.c
c-c++-common/asan/halt_on_error-1.c
print something in gcc.log

Do they pass for you?


Ah, I see. The problem is that after this merge LSan was enabled for ARM.
LSan sets atexit handler that calls internal_clone function that's not
supported in QEMU.
That's why these tests pass on board, but fail under QEMU.
Could you try set ASAN_OPTIONS=detect_leaks=0 in your environment?


Hi,

I have a followup on this issue, after investigating a bit more.

I filed a bug report against QEMU, and indeed it seems that it rejects
clone() as called by the sanitizers on purpose, because it cannot support
CLONE_UNTRACED.

That being said, I was wondering why the same tests worked "better"
with qemu-aarch64 (as opposed to qemu-arm). And I noticed that on aarch64,
we have 

Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation for SKX

2017-11-14 Thread H.J. Lu
On Tue, Nov 14, 2017 at 6:20 AM, Jakub Jelinek  wrote:
> On Tue, Nov 14, 2017 at 06:14:10AM -0800, H.J. Lu wrote:
>> Just a thought, should we have a separate patch to add -mprefer-vzeroupper
>> to cover all bases in the future, like
>>
>>   /* opt_pass methods: */
>>   virtual bool gate (function *)
>> {
>>   return TARGET_AVX && (!TARGET_AVX512ER || TARGET_PREFER_VZEROUPPER)
>>  && TARGET_VZEROUPPER && flag_expensive_optimizations
>>  && !optimize_size;
>> }
>
> If we have it, then shouldn't explicit -mprefer-vzeroupper or
> -mno-prefer-vzeroupper override whatever other optimization
> conditions there are (i.e. everything other than
> TARGET_AVX and TARGET_VZEROUPPER)?  I.e. use
> !TARGET_AVX512ER && flag_expensive_optimizations && !optimize_size
> only when the explicit bit is not set for it?
>

I opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82990

to track it.

-- 
H.J.


Re: [build, libgcc, libgo] Adapt Solaris 12 references

2017-11-14 Thread Ian Lance Taylor via gcc-patches
On Tue, Nov 14, 2017 at 2:09 AM, Rainer Orth
 wrote:
>
>> With the change in the Solaris release model (no more major releases
>> like Solaris 12 but only minor ones like 11.4), the Solaris 12
>> references in GCC need to be adapted.
>>
>> The following patch does this, consisting mostly of comment changes.
>>
>> Only a few changes bear comment:
>>
>> * Solaris 11.4 introduced __cxa_atexit, so we have to enable it on
>>   *-*-solaris2.11.  Not a problem for native builds which check for the
>>   actual availability of the function.
>>
>> * gcc.dg/torture/pr60092.c was xfailed on *-*-solaris2.11*, but the
>>   underlying bug was fixed in Solaris 12/11.4.  However, now 11.3 and
>>   11.4 have the same configure triplet.  To avoid noise on the newest
>>   release, I've removed the xfail.
>>
>> I've left a few references to Solaris 12 builds in
>> libstdc++-v3/acinclude.m4 because those hadn't been renamed
>> retroactively, of course.
>>
>> install.texi needs some work, too, but I'll address this separately
>> because there's more than just the version change.
>>
>> Bootstrapped without regressions on {i386-pc, sparc-sun}-solaris2.1[01]
>> (both Solaris 11.3 and 11.4).  I believe I need approval only for the
>> libgo parts.
>
> how should we proceed with the libgo part of this patch?  I can checkin
> the rest (which contains functional changes) now and omit the libgo
> part (either for now or completely, given that it consists only of
> comment changes).

Sorry, I've fallen behind a bit on gccgo/libgo patch review.  I've now
committed your patch to libgo trunk.


>> I'm going to backport the patch to the gcc-7 and gcc-6 branches after a
>> bit of soak time.
>
> What's the procedure for libgo here?  IIUC, only the trunk version of
> libgo is imported from upstream, while changes to branches can go in
> directly.

That is correct.  Backporting gcc/go/gofrontend and libgo patches to
release branches is always fine with me if the release managers are OK
with it.

Ian


RE: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation for SKX

2017-11-14 Thread Peryt, Sebastian
Attached is the fixed patch.

Sebastian


> -Original Message-
> From: H.J. Lu [mailto:hjl.to...@gmail.com]
> Sent: Tuesday, November 14, 2017 1:18 PM
> To: Peryt, Sebastian 
> Cc: Jakub Jelinek ; gcc-patches@gcc.gnu.org; Uros Bizjak
> ; Kirill Yukhin ; Lu, Hongjiu
> 
> Subject: Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation
> for SKX
> 
> On Tue, Nov 14, 2017 at 3:18 AM, Peryt, Sebastian 
> wrote:
> > I have updated tests and changelog according to Jakub's suggestions.
> > Please find attached v2 of my patch.
> >
> >
> > 14.11.2017  Sebastian Peryt  
> >
> > gcc/
> >
> > PR target/82941
> > PR target/82942
> > * config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
> > to return true on Xeon and not on Xeon Phi.
> > (ix86_check_avx256_register): Changed to ...
> > (ix86_check_avx_upper_register): ... this. Add extra check for
> > VALID_AVX512F_REG_OR_XI_MODE.
> > (ix86_avx_u128_mode_needed): Changed
> > ix86_check_avx256_register to ix86_check_avx_upper_register.
> > (ix86_check_avx256_stores): Changed to ...
> > (ix86_check_avx_upper_stores): ... this. Changed
> > ix86_check_avx256_register to ix86_check_avx_upper_register.
> > (ix86_avx_u128_mode_after): Changed
> > avx_reg256_found to avx_upper_reg_found. Changed
> > ix86_check_avx256_stores to ix86_check_avx_upper_stores.
> > (ix86_avx_u128_mode_entry): Changed
> > ix86_check_avx256_register to ix86_check_avx_upper_register.
> > (ix86_avx_u128_mode_exit): Ditto.
> > * config/i386/i386.h: (host_detect_local_cpu): New define.
> 
> @@ -2497,7 +2497,7 @@ public:
>/* opt_pass methods: */
>virtual bool gate (function *)
>  {
> -  return TARGET_AVX && !TARGET_AVX512F
> +  return TARGET_AVX && !TARGET_AVX512PF && !TARGET_AVX512ER
> ^  Please 
> remove  this.
> 
> From glibc commit:
> 
> commit 4cb334c4d6249686653137ec273d081371b3672d
> Author: H.J. Lu 
> Date:   Tue Apr 18 14:01:45 2017 -0700
> 
> x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396]
> 
> On Skylake server, AVX512 load/store instructions in memcpy/memset may
> lead to lower CPU turbo frequency in certain situations.  Use of AVX2
> in memcpy/memset has been observed to have improved overall performance
> in many workloads due to the higher frequency.
> 
> Since AVX512ER is unique to Xeon Phi, this patch sets Prefer_No_AVX512
> if AVX512ER isn't available so that AVX2 versions of memcpy/memset are
> used on Skylake server.
> 
> Only AVX512ER is really unique to Xeon Phi.
> 
>&& TARGET_VZEROUPPER && flag_expensive_optimizations
>&& !optimize_size;
>  }
> 
> > 14.11.2017  Sebastian Peryt  
> >
> > gcc/testsuite/
> >
> > PR target/82941
> > PR target/82942
> > * gcc.target/i386/pr82941-1.c: New test.
> > * gcc.target/i386/pr82941-2.c: New test.
> > * gcc.target/i386/pr82942-1.c: New test.
> > * gcc.target/i386/pr82942-2.c: New test.
> >
> >
> > Thanks,
> > Sebastian
> >
> >> -Original Message-
> >> From: Jakub Jelinek [mailto:ja...@redhat.com]
> >> Sent: Tuesday, November 14, 2017 10:51 AM
> >> To: Peryt, Sebastian 
> >> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak ; Kirill
> >> Yukhin ; Lu, Hongjiu 
> >> Subject: Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper
> >> generation for SKX
> >>
> >> On Tue, Nov 14, 2017 at 09:45:12AM +, Peryt, Sebastian wrote:
> >> > Hi,
> >> >
> >> > This patch fixes PR82941 and PR82942 by adding vzeroupper
> >> > generation on
> >> SKX.
> >> > Bootstrapped and tested.
> >> >
> >> > 14.11.2017  Sebastian Peryt  
> >> >
> >> > gcc/
> >>
> >> In that case the ChangeLog entry should list the PRs, i.e.
> >>   PR target/82941
> >>   PR target/82942
> >> > * config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
> >> > to return true on Xeon and not on Xeon Phi.
> >> > (ix86_check_avx256_register): Changed to ...
> >> > (ix86_check_avx_upper_register): ... this.
> >> > (ix86_check_avx_upper_register): Add extra check for
> >> > VALID_AVX512F_REG_OR_XI_MODE.
> >>
> >> The way this is usually written is instead:
> >>   (ix86_check_avx256_register): Changed to ...
> >>   (ix86_check_avx_upper_register): ... this.  Add extra check for
> >>   VALID_AVX512F_REG_OR_XI_MODE.
> >> i.e. don't duplicate the function name, just continue mentioning further
> changes.
> >>
> >> > (ix86_avx_u128_mode_needed): Changed

Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation for SKX

2017-11-14 Thread Jakub Jelinek
On Tue, Nov 14, 2017 at 06:14:10AM -0800, H.J. Lu wrote:
> Just a thought, should we have a separate patch to add -mprefer-vzeroupper
> to cover all bases in the future, like
> 
>   /* opt_pass methods: */
>   virtual bool gate (function *)
> {
>   return TARGET_AVX && (!TARGET_AVX512ER || TARGET_PREFER_VZEROUPPER)
>  && TARGET_VZEROUPPER && flag_expensive_optimizations
>  && !optimize_size;
> }

If we have it, then shouldn't explicit -mprefer-vzeroupper or
-mno-prefer-vzeroupper override whatever other optimization
conditions there are (i.e. everything other than
TARGET_AVX and TARGET_VZEROUPPER)?  I.e. use
!TARGET_AVX512ER && flag_expensive_optimizations && !optimize_size
only when the explicit bit is not set for it?

Jakub


Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation for SKX

2017-11-14 Thread H.J. Lu
On Tue, Nov 14, 2017 at 4:17 AM, H.J. Lu  wrote:
> On Tue, Nov 14, 2017 at 3:18 AM, Peryt, Sebastian
>  wrote:
>> I have updated tests and changelog according to Jakub's suggestions.
>> Please find attached v2 of my patch.
>>
>>
>> 14.11.2017  Sebastian Peryt  
>>
>> gcc/
>>
>> PR target/82941
>> PR target/82942
>> * config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
>> to return true on Xeon and not on Xeon Phi.
>> (ix86_check_avx256_register): Changed to ...
>> (ix86_check_avx_upper_register): ... this. Add extra check for
>> VALID_AVX512F_REG_OR_XI_MODE.
>> (ix86_avx_u128_mode_needed): Changed
>> ix86_check_avx256_register to ix86_check_avx_upper_register.
>> (ix86_check_avx256_stores): Changed to ...
>> (ix86_check_avx_upper_stores): ... this. Changed
>> ix86_check_avx256_register to ix86_check_avx_upper_register.
>> (ix86_avx_u128_mode_after): Changed
>> avx_reg256_found to avx_upper_reg_found. Changed
>> ix86_check_avx256_stores to ix86_check_avx_upper_stores.
>> (ix86_avx_u128_mode_entry): Changed
>> ix86_check_avx256_register to ix86_check_avx_upper_register.
>> (ix86_avx_u128_mode_exit): Ditto.
>> * config/i386/i386.h: (host_detect_local_cpu): New define.
>
> @@ -2497,7 +2497,7 @@ public:
>/* opt_pass methods: */
>virtual bool gate (function *)
>  {
> -  return TARGET_AVX && !TARGET_AVX512F
> +  return TARGET_AVX && !TARGET_AVX512PF && !TARGET_AVX512ER
> ^
>  Please remove  this.
>
> From glibc commit:
>
> commit 4cb334c4d6249686653137ec273d081371b3672d
> Author: H.J. Lu 
> Date:   Tue Apr 18 14:01:45 2017 -0700
>
> x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396]
>
> On Skylake server, AVX512 load/store instructions in memcpy/memset may
> lead to lower CPU turbo frequency in certain situations.  Use of AVX2
> in memcpy/memset has been observed to have improved overall performance
> in many workloads due to the higher frequency.
>
> Since AVX512ER is unique to Xeon Phi, this patch sets Prefer_No_AVX512
> if AVX512ER isn't available so that AVX2 versions of memcpy/memset are
> used on Skylake server.
>
> Only AVX512ER is really unique to Xeon Phi.
>
>&& TARGET_VZEROUPPER && flag_expensive_optimizations
>&& !optimize_size;
>  }

Just a thought, should we have a separate patch to add -mprefer-vzeroupper
to cover all bases in the future, like

  /* opt_pass methods: */
  virtual bool gate (function *)
{
  return TARGET_AVX && (!TARGET_AVX512ER || TARGET_PREFER_VZEROUPPER)
 && TARGET_VZEROUPPER && flag_expensive_optimizations
 && !optimize_size;
}

>> 14.11.2017  Sebastian Peryt  
>>
>> gcc/testsuite/
>>
>> PR target/82941
>> PR target/82942
>> * gcc.target/i386/pr82941-1.c: New test.
>> * gcc.target/i386/pr82941-2.c: New test.
>> * gcc.target/i386/pr82942-1.c: New test.
>> * gcc.target/i386/pr82942-2.c: New test.
>>
>>
>> Thanks,
>> Sebastian
>>
>>> -Original Message-
>>> From: Jakub Jelinek [mailto:ja...@redhat.com]
>>> Sent: Tuesday, November 14, 2017 10:51 AM
>>> To: Peryt, Sebastian 
>>> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak ; Kirill Yukhin
>>> ; Lu, Hongjiu 
>>> Subject: Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation
>>> for SKX
>>>
>>> On Tue, Nov 14, 2017 at 09:45:12AM +, Peryt, Sebastian wrote:
>>> > Hi,
>>> >
>>> > This patch fixes PR82941 and PR82942 by adding vzeroupper generation on
>>> SKX.
>>> > Bootstrapped and tested.
>>> >
>>> > 14.11.2017  Sebastian Peryt  
>>> >
>>> > gcc/
>>>
>>> In that case the ChangeLog entry should list the PRs, i.e.
>>>   PR target/82941
>>>   PR target/82942
>>> > * config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
>>> > to return true on Xeon and not on Xeon Phi.
>>> > (ix86_check_avx256_register): Changed to ...
>>> > (ix86_check_avx_upper_register): ... this.
>>> > (ix86_check_avx_upper_register): Add extra check for
>>> > VALID_AVX512F_REG_OR_XI_MODE.
>>>
>>> The way this is usually written is instead:
>>>   (ix86_check_avx256_register): Changed to ...
>>>   (ix86_check_avx_upper_register): ... this.  Add extra check for
>>>   VALID_AVX512F_REG_OR_XI_MODE.
>>> i.e. don't duplicate the function name, just continue mentioning further 
>>> changes.
>>>
>>> > (ix86_avx_u128_mode_needed): Changed
>>> > ix86_check_avx256_register to ix86_check_avx_upper_register.
>>> > (ix86_check_avx256_stores): Changed to ...
>>> > 

Re: [Patch AArch64] Stop generating BSL for simple integer code

2017-11-14 Thread James Greenhalgh
On Wed, Oct 04, 2017 at 05:44:07PM +0100, James Greenhalgh wrote:
> 
> On Thu, Jul 27, 2017 at 06:49:01PM +0100, James Greenhalgh wrote:
> >
> > On Mon, Jun 12, 2017 at 02:44:40PM +0100, James Greenhalgh wrote:
> > > [Sorry for the re-send. I spotted that the attributes were not right for 
> > > the
> > >  new pattern I was adding. The change between this and the first version 
> > > was:
> > >
> > >   +  [(set_attr "type" "neon_bsl,neon_bsl,neon_bsl,multiple")
> > >   +   (set_attr "length" "4,4,4,12")]
> > > ]
> > >
> > > ---
> > >
> > > Hi,
> > >
> > > In this testcase, all argument registers and the return register
> > > will be general purpose registers:
> > >
> > >   long long
> > >   foo (long long a, long long b, long long c)
> > >   {
> > > return ((a ^ b) & c) ^ b;
> > >   }
> > >
> > > However, due to the implementation of aarch64_simd_bsl_internal
> > > we'll match that pattern and emit a BSL, necessitating moving all those
> > > arguments and results to the Advanced SIMD registers:
> > >
> > >   fmov    d2, x0
> > >   fmov    d0, x2
> > >   fmov    d1, x1
> > >   bsl     v0.8b, v2.8b, v1.8b
> > >   fmov    x0, d0
> > >
> > > To fix this, we turn aarch64_simd_bsldi_internal in to an insn_and_split 
> > > that
> > > knows to split back to integer operations if the register allocation
> > > falls that way.
> > >
> > > We could have used an unspec, but then we lose some of the nice
> > > simplifications that can be made from explicitly spelling out the 
> > > semantics
> > > of BSL.
> >
> > Off list, Richard and I found considerable issues with this patch. From
> > idioms failing to match in the testcase, to subtle register allocation bugs,
> > to potential for suboptimal code generation.
> >
> > That's not good!
> >
> > This spin of the patch corrects those issues by adding a second split
> > pattern, allocating a temporary register if we're permitted to, or
> > properly using tied output operands if we're not, and by generally playing
> > things a bit safer around the possibility of register overlaps.
> >
> > The testcase is expanded to an execute test, hopefully giving a little
> > more assurance that we're doing the right thing - now testing the BSL idiom
> > generation in both general-purpose and floating-point registers and 
> > comparing
> > the results. Hopefully we're now a bit more robust!
> >
> > Bootstrapped and tested on aarch64-none-linux-gnu and cross-tested on
> > aarch64-none-elf with no issues.
> >
> > OK for trunk?
> 
> I'd like to ping this patch for review (I would like the explicit review as
> an earlier revision had bugs), attached is a rebase of this patch on trunk.
> 
> OK? If so, please commit it on my behalf as I will be out of office for a
> few weeks for some time off.

IMHO, this has had plenty of time for feedback from other AArch64 maintainers.
I've used my maintainer privileges and pushed this as r254727 after a
successful bootstrap and test cycle.

Thanks,
James

> gcc/
> 
> 2017-10-04  James Greenhalgh  
> 
>   * config/aarch64/aarch64-simd.md
>   (aarch64_simd_bsl_internal): Remove DImode.
>   (*aarch64_simd_bsl_alt): Likewise.
>   (aarch64_simd_bsldi_internal): New.
>   (aarch64_simd_bsldi_alt): Likewise.
> 
> gcc/testsuite/
> 
> 2017-10-04  James Greenhalgh  
> 
>   * gcc.target/aarch64/bsl-idiom.c: New.
>   * gcc.target/aarch64/copysign-bsl.c: New.
> 

> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 70e9339..1888d2f 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -2322,13 +2322,13 @@
>  ;; in *aarch64_simd_bsl_alt.
>  
>  (define_insn "aarch64_simd_bsl_internal"
> -  [(set (match_operand:VSDQ_I_DI 0 "register_operand" "=w,w,w")
> - (xor:VSDQ_I_DI
> -(and:VSDQ_I_DI
> -  (xor:VSDQ_I_DI
> +  [(set (match_operand:VDQ_I 0 "register_operand" "=w,w,w")
> + (xor:VDQ_I
> +(and:VDQ_I
> +  (xor:VDQ_I
>  (match_operand: 3 "register_operand" "w,0,w")
> -(match_operand:VSDQ_I_DI 2 "register_operand" "w,w,0"))
> -  (match_operand:VSDQ_I_DI 1 "register_operand" "0,w,w"))
> +(match_operand:VDQ_I 2 "register_operand" "w,w,0"))
> +  (match_operand:VDQ_I 1 "register_operand" "0,w,w"))
> (match_dup: 3)
>   ))]
>"TARGET_SIMD"
> @@ -2346,14 +2346,14 @@
>  ;; permutations of commutative operations, we have to have a separate 
> pattern.
>  
>  (define_insn "*aarch64_simd_bsl_alt"
> -  [(set (match_operand:VSDQ_I_DI 0 "register_operand" "=w,w,w")
> - (xor:VSDQ_I_DI
> -(and:VSDQ_I_DI
> -  (xor:VSDQ_I_DI
> -(match_operand:VSDQ_I_DI 3 "register_operand" "w,w,0")
> -(match_operand:VSDQ_I_DI 2 "register_operand" "w,0,w"))
> -   (match_operand:VSDQ_I_DI 1 "register_operand" "0,w,w"))
> -   (match_dup:VSDQ_I_DI 2)))]
> +  [(set (match_operand:VDQ_I 

Re: [PATCH, rs6000] Repair vec_xl, vec_xst, vec_xl_be, vec_xst_be built-in functions

2017-11-14 Thread Bill Schmidt
Please hold review on this until I investigate something that Carl brought up.  
Thanks,
and sorry for the noise!

-- Bill


> On Nov 13, 2017, at 10:30 AM, Bill Schmidt  
> wrote:
> 
> Hi,
> 
> Some previous patches to add support for vec_xl_be and fill in gaps in testing
> for vec_xl and vec_xst resulted in incorrect semantics for these built-in
> functions.  My fault for not reviewing them in detail.  Essentially the vec_xl
> and vec_xl_be semantics were reversed, and vec_xst received semantics that
> should be associated with vec_xst_be.  Also, vec_xst_be has not yet been
> implemented.  Also, my attempt to add P8-specific code for the element-
> reversing loads and stores had the same problem with wrong semantics.  This 
> patch addresses these issues in a uniform manner.
> 
> Most of the work is done by adjusting the rs6000-c.c mapping between 
> overloaded
> function names and type-specific implementations, and adding missing 
> interfaces
> there.  I've also removed the previous implementation function
> altivec_expand_xl_be_builtin, and moved the byte-reversal into define_expands
> for the code gen patterns with some simplification.  These generate single
> instructions for P9 (lxvh8x, lxvq16x, etc.) and lvxw4x + permute for P8.
> 
> I've verified the code generation by hand, but have not included test cases
> in this patch because Carl Love has been independently working on adding a
> full set of test cases for these built-ins.  The code does pass those relevant
> tests that are already in the test suite.  I hope it's okay to install this
> patch while waiting for the full tests; I'll of course give high priority to
> fixing any possible fallout from those tests.
> 
> Bootstrapped and tested on powerpc64le-linux-gnu (POWER8 and POWER9) with no
> regressions.  Is this okay for trunk?
> 
> Thanks,
> Bill
> 
> 
> [gcc]
> 
> 2017-11-13  Bill Schmidt  
> 
>   * config/rs6000/altivec.h (vec_xst_be): New #define.
>   * config/rs6000/altivec.md (altivec_vperm__direct): Rename
>   and externalize from *altivec_vperm__internal.
>   * config/rs6000/rs6000-builtin.def (XL_BE_V16QI): Remove macro
>   instantiation.
>   (XL_BE_V8HI): Likewise.
>   (XL_BE_V4SI): Likewise.
>   (XL_BE_V4SI): Likewise.
>   (XL_BE_V2DI): Likewise.
>   (XL_BE_V4SF): Likewise.
>   (XL_BE_V2DF): Likewise.
>   (XST_BE): Add BU_VSX_OVERLOAD_X macro instantiation.
>   * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Correct
>   all array entries with these keys: VSX_BUILTIN_VEC_XL,
>   VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_VEC_XST.  Add entries for key
>   VSX_BUILTIN_VEC_XST_BE.
>   (altivec_expand_xl_be_builtin): Remove.
>   (altivec_expand_builtin): Remove handling for VSX_BUILTIN_XL_BE_*
>   built-ins.  Replace conditional calls to def_builtin for
>   __builtin_vsx_ld_elemrev_{v8hi,v16qi} and
>   __builtin_vsx_st_elemrev_{v8hi,v16qi} based on TARGET_P9_VECTOR
>   with unconditional calls.  Remove calls to def_builtin for
>   __builtin_vsx_le_be_.  Add a call to def_builtin for
>   __builtin_vec_xst_be.
>   * config/rs6000/vsx.md (vsx_ld_elemrev_v8hi): Convert define_insn
>   to define_expand, and add alternate RTL generation for P8.
>   (*vsx_ld_elemrev_v8hi_internal): New define_insn based on
>   vsx_ld_elemrev_v8hi.
>   (vsx_ld_elemrev_v16qi): Convert define_insn to define_expand, and
>   add alternate RTL generation for P8.
>   (*vsx_ld_elemrev_v16qi_internal): New define_insn based on
>   vsx_ld_elemrev_v16qi.
>   (vsx_st_elemrev_v8hi): Convert define_insn
>   to define_expand, and add alternate RTL generation for P8.
>   (*vsx_st_elemrev_v8hi_internal): New define_insn based on
>   vsx_st_elemrev_v8hi.
>   (vsx_st_elemrev_v16qi): Convert define_insn to define_expand, and
>   add alternate RTL generation for P8.
>   (*vsx_st_elemrev_v16qi_internal): New define_insn based on
>   vsx_st_elemrev_v16qi.
> 
> [gcc/testsuite]
> 
> 2017-11-13  Bill Schmidt  
> 
>   * gcc.target/powerpc/swaps-p8-26.c: Modify expected code
>   generation.
> 
> 
> Index: gcc/config/rs6000/altivec.h
> ===
> --- gcc/config/rs6000/altivec.h   (revision 254629)
> +++ gcc/config/rs6000/altivec.h   (working copy)
> @@ -357,6 +357,7 @@
> #define vec_xl __builtin_vec_vsx_ld
> #define vec_xl_be __builtin_vec_xl_be
> #define vec_xst __builtin_vec_vsx_st
> +#define vec_xst_be __builtin_vec_xst_be
> 
> /* Note, xxsldi and xxpermdi were added as __builtin_vsx_ functions
>instead of __builtin_vec_  */
> Index: gcc/config/rs6000/altivec.md
> ===
> --- gcc/config/rs6000/altivec.md  (revision 254629)
> +++ gcc/config/rs6000/altivec.md  (working copy)
> @@ -2130,7 

Re: [PATCH] Fix usage of REG_BR_PROBs (PR target/82927).

2017-11-14 Thread Joseph Myers
On Tue, 14 Nov 2017, Martin Liška wrote:

> Hello.
> 
> Quite obvious fix for SH target. Joseph can you please continue with testing?
> I don't have a machine to test the patch.

I don't have SH hardware, but can confirm that, in conjunction with 
Martin's glibc patch to work around strncpy warnings (where the proper fix 
and another patch for one case are still under discussion), this fixes the 
sh4-linux-gnu glibc build with build-many-glibcs.py.  (The glibc testsuite 
build fails - errors in string/bug-strncat1.c, string/tester.c at least - 
but the failures I see look like more fallout from Martin's GCC changes, 
probably cases where the testsuite needs to disable the warnings, not 
anything SH-specific.)

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH] Fix usage of REG_BR_PROBs (PR target/82927).

2017-11-14 Thread Martin Liška
Hello.

Quite obvious fix for SH target. Joseph can you please continue with testing?
I don't have a machine to test the patch.

Ready for trunk?
Thanks,
Martin

gcc/ChangeLog:

2017-11-14  Martin Liska  

PR target/82927
* config/sh/sh-mem.cc: Initialize probabilities.
(sh_expand_cmpstr): Use to_reg_br_prob_note function for them.
(sh_expand_cmpnstr): Likewise.
(sh_expand_strlen): Likewise.
(sh_expand_setmem): Likewise.
---
 gcc/config/sh/sh-mem.cc | 75 +
 1 file changed, 45 insertions(+), 30 deletions(-)


diff --git a/gcc/config/sh/sh-mem.cc b/gcc/config/sh/sh-mem.cc
index 8fce9799921..39d11fe6458 100644
--- a/gcc/config/sh/sh-mem.cc
+++ b/gcc/config/sh/sh-mem.cc
@@ -183,8 +183,11 @@ expand_block_move (rtx *operands)
   return false;
 }
 
-static const int prob_unlikely = REG_BR_PROB_BASE / 10;
-static const int prob_likely = REG_BR_PROB_BASE / 4;
+static const profile_probability prob_unlikely
+  = profile_probability::from_reg_br_prob_base (REG_BR_PROB_BASE / 10);
+static const profile_probability prob_likely
+  = profile_probability::from_reg_br_prob_base (REG_BR_PROB_BASE / 4);
+
 
 /* Emit code to perform a strcmp.
 
@@ -219,19 +222,19 @@ sh_expand_cmpstr (rtx *operands)
   emit_insn (gen_iorsi3 (tmp1, s1_addr, s2_addr));
   emit_insn (gen_tstsi_t (tmp1, GEN_INT (3)));
   jump = emit_jump_insn (gen_branch_false (L_loop_byte));
-  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely.to_reg_br_prob_note ());
 }
   else if (addr1_alignment < 4 && addr2_alignment >= 4)
 {
   emit_insn (gen_tstsi_t (s1_addr, GEN_INT (3)));
   jump = emit_jump_insn (gen_branch_false (L_loop_byte));
-  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely.to_reg_br_prob_note ());
 }
   else if (addr1_alignment >= 4 && addr2_alignment < 4)
 {
   emit_insn (gen_tstsi_t (s2_addr, GEN_INT (3)));
   jump = emit_jump_insn (gen_branch_false (L_loop_byte));
-  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely.to_reg_br_prob_note ());
 }
 
   addr1 = adjust_automodify_address (addr1, SImode, s1_addr, 0);
@@ -255,7 +258,7 @@ sh_expand_cmpstr (rtx *operands)
 
   emit_insn (gen_cmpstr_t (tmp0, tmp3));
   jump = emit_jump_insn (gen_branch_true (L_end_loop_long));
-  add_int_reg_note (jump, REG_BR_PROB, prob_unlikely);
+  add_int_reg_note (jump, REG_BR_PROB, prob_unlikely.to_reg_br_prob_note ());
 
   emit_insn (gen_cmpeqsi_t (tmp1, tmp2));
 
@@ -264,7 +267,7 @@ sh_expand_cmpstr (rtx *operands)
   emit_move_insn (s2_addr, plus_constant (Pmode, s2_addr, 4));
 
   jump = emit_jump_insn (gen_branch_true (L_loop_long));
-  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely.to_reg_br_prob_note ());
   /* end loop.  */
 
   /* Fallthu, substract words.  */
@@ -303,13 +306,13 @@ sh_expand_cmpstr (rtx *operands)
 
   emit_insn (gen_cmpeqsi_t (tmp2, const0_rtx));
   jump = emit_jump_insn (gen_branch_true (L_end_loop_byte));
-  add_int_reg_note (jump, REG_BR_PROB, prob_unlikely);
+  add_int_reg_note (jump, REG_BR_PROB, prob_unlikely.to_reg_br_prob_note ());
 
   emit_insn (gen_cmpeqsi_t (tmp1, tmp2));
   if (flag_delayed_branch)
 emit_insn (gen_zero_extendqisi2 (tmp2, gen_lowpart (QImode, tmp2)));
   jump = emit_jump_insn (gen_branch_true (L_loop_byte));
-  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely.to_reg_br_prob_note ());
   /* end loop.  */
 
   emit_label (L_end_loop_byte);
@@ -378,19 +381,22 @@ sh_expand_cmpnstr (rtx *operands)
 	  emit_insn (gen_iorsi3 (tmp1, s1_addr, s2_addr));
 	  emit_insn (gen_tstsi_t (tmp1, GEN_INT (3)));
 	  jump = emit_jump_insn (gen_branch_false (L_loop_byte));
-	  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+	  add_int_reg_note (jump, REG_BR_PROB,
+prob_likely.to_reg_br_prob_note ());
 	}
 	  else if (addr1_alignment < 4 && addr2_alignment >= 4)
 	{
 	  emit_insn (gen_tstsi_t (s1_addr, GEN_INT (3)));
 	  jump = emit_jump_insn (gen_branch_false (L_loop_byte));
-	  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+	  add_int_reg_note (jump, REG_BR_PROB,
+prob_likely.to_reg_br_prob_note ());
 	}
 	  else if (addr1_alignment >= 4 && addr2_alignment < 4)
 	{
 	  emit_insn (gen_tstsi_t (s2_addr, GEN_INT (3)));
 	  jump = emit_jump_insn (gen_branch_false (L_loop_byte));
-	  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+	  add_int_reg_note (jump, REG_BR_PROB,
+prob_likely.to_reg_br_prob_note ());
 	}
 
 	  /* word count. Do we have iterations ?  */
@@ -414,11 +420,13 @@ sh_expand_cmpnstr (rtx *operands)
 
 	  emit_insn (gen_cmpstr_t (tmp0, tmp3));
 	  jump = emit_jump_insn 

Re: [libsanitizer] Fix Mac OS X 10.7 bootstrap (PR sanitizer/82824)

2017-11-14 Thread Jakub Jelinek
On Tue, Nov 14, 2017 at 01:57:21PM +0100, Rainer Orth wrote:
> >> --- a/libsanitizer/configure.ac
> >> +++ b/libsanitizer/configure.ac
> >> @@ -140,6 +140,24 @@ case "$host" in
> >>  esac
> >>  AM_CONDITIONAL(USING_MAC_INTERPOSE, $MAC_INTERPOSE)
> >>  
> >> +# lsan_common_mac.cc needs VM_MEMORY_OS_ALLOC_ONCE which was only
> >> +# introduced in Mac OS X 10.9/Darwin 13.
> >> +case "$host" in
> >> +  *-*-darwin[[1-9]] | *-*-darwin[[1-9]].* | *-*-darwin1[[0-2]]*)
> >> +LSAN_COMMON=false ;;
> >> +  *) LSAN_COMMON=true ;;
> >> +esac
> >> +AM_CONDITIONAL(LSAN_COMMON_SUPPORTED, $LSAN_COMMON)
> >> +
> >> +# Before Xcode 4.5, the Darwin linker doesn't properly support undefined
> >> +# weak symbols.
> >> +case "$host" in
> >
> > $host here and above?  Shouldn't that be $target instead?
> 
> AFAIU $host and $target are identical in target libraries, and the
> majority of cases uses $host right now.

Didn't know that.  Though, e.g. libsanitizer uses $target except for this
single MAC_INTERPOSE case.

> > That said, I think it would be better to move this stuff except for the
> > AM_CONDITIONAL into configure.tgt similarly to how *_SUPPORTED is handled.
> 
> My reasoning was to have purely static based stuff like supported status
> or additional compiler/linker flags in configure.tgt, while other tests
> which might need configure checks go into configure.ac.

But the checks you've added are clearly static, just from parsing the
triplet.

> > And the preexisting:
> > case "$host" in
> >   *-*-darwin*) MAC_INTERPOSE=true ; enable_static=no ;;
> >   *) MAC_INTERPOSE=false ;;
> > esac
> > AM_CONDITIONAL(USING_MAC_INTERPOSE, $MAC_INTERPOSE)
> > looks wrong too.
> 
> Wrong in which way?  Just the location?

I meant for using $host instead of $target (perhaps a non-issue) and
for being in configure.ac instead of configure.tgt, where in configure.ac
we could have just the AM_CONDITIONAL.

Jakub


[PATCH] Fix CFG cleanup iteration order

2017-11-14 Thread Richard Biener

This fixes a latent bug I ran into with the RPO value-numbering work,
which, like all SSA propagators and the existing FRE/PRE, does not perform
elimination in unreachable code regions.  Those unreachable code regions
are controlled by if (0 != 0) style conditions and CFG cleanup is supposed
to deal with invalid IL in them by removing them before doing any
"real" work.  My original attempt at doing this was flawed since
(as I ran into with a Go testcase) removing dead EH edges done by the
early sweep inspects the possibly throwing stmt and can end up
ICEing on released SSA names.

The following patch makes the whole thing more bullet-proof by
making sure to never visit blocks in such dead regions by performing
a DFS walk and removing dead outgoing edges in PRE order.
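The traversal Richard describes — keep one edge iterator per block on an explicit stack, and act on each block the first time it is reached — can be sketched without GCC's CFG types. The adjacency-list graph and the name `preorder` below are invented for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Iterative pre-order walk: each stack entry is (block, index of the next
// outgoing edge to look at), mirroring the edge_iterator kept on the stack
// in cleanup_control_flow_pre below.
std::vector<int>
preorder (const std::vector<std::vector<int>> &succs, int entry)
{
  std::vector<int> order;
  std::vector<bool> visited (succs.size (), false);
  std::vector<std::pair<int, std::size_t>> stack;

  visited[entry] = true;
  order.push_back (entry);            // a block is processed on first visit,
  stack.emplace_back (entry, 0);      // i.e. before any of its successors

  while (!stack.empty ())
    {
      auto &top = stack.back ();
      if (top.second == succs[top.first].size ())
        {
          stack.pop_back ();          // all outgoing edges handled
          continue;
        }
      int dest = succs[top.first][top.second++];
      if (!visited[dest])
        {
          visited[dest] = true;
          order.push_back (dest);     // PRE order: visit, then descend
          stack.emplace_back (dest, 0);
        }
    }
  return order;
}
```

For a diamond CFG 0→{1,2}, 1→3, 2→3 this yields 0, 1, 3, 2: a region whose only incoming edge has been removed at its dominating condition is simply never pushed, which is the point of doing the edge removal in PRE order.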

This adds a little overhead but should also speedup processing given
we previously visited most blocks twice by means of the removed

- /* During a first iteration on the CFG only remove trivially
-dead edges but mark other conditions for re-evaluation.  */
- if (first_p)
-   {
- val = const_binop (gimple_cond_code (stmt), 
boolean_type_node,
-gimple_cond_lhs (stmt),
-gimple_cond_rhs (stmt));
- if (! val)
-   bitmap_set_bit (cfgcleanup_altered_bbs, bb->index);


Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2017-11-14  Richard Biener  

* tree-cfgcleanup.c (cleanup_control_expr_graph): Remove first_p
parameter and handling.
(cleanup_control_flow_bb): Likewise.
(cleanup_control_flow_pre): New helper performing a DFS walk
to call cleanup_control_flow_bb in PRE order.
(cleanup_tree_cfg_1): Do the first phase of cleanup_control_flow_bb
via cleanup_control_flow_pre.

Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c   (revision 254718)
+++ gcc/tree-cfgcleanup.c   (working copy)
@@ -122,8 +122,7 @@ convert_single_case_switch (gswitch *swt
at block BB.  */
 
 static bool
-cleanup_control_expr_graph (basic_block bb, gimple_stmt_iterator gsi,
-   bool first_p)
+cleanup_control_expr_graph (basic_block bb, gimple_stmt_iterator gsi)
 {
   edge taken_edge;
   bool retval = false;
@@ -146,25 +145,14 @@ cleanup_control_expr_graph (basic_block
   switch (gimple_code (stmt))
{
case GIMPLE_COND:
- /* During a first iteration on the CFG only remove trivially
-dead edges but mark other conditions for re-evaluation.  */
- if (first_p)
-   {
- val = const_binop (gimple_cond_code (stmt), boolean_type_node,
-gimple_cond_lhs (stmt),
-gimple_cond_rhs (stmt));
- if (! val)
-   bitmap_set_bit (cfgcleanup_altered_bbs, bb->index);
-   }
- else
-   {
- code_helper rcode;
- tree ops[3] = {};
- if (gimple_simplify (stmt, &rcode, ops, NULL, no_follow_ssa_edges,
-  no_follow_ssa_edges)
- && rcode == INTEGER_CST)
-   val = ops[0];
-   }
+ {
+   code_helper rcode;
+   tree ops[3] = {};
+   if (gimple_simplify (stmt, &rcode, ops, NULL, no_follow_ssa_edges,
+no_follow_ssa_edges)
+   && rcode == INTEGER_CST)
+ val = ops[0];
+ }
  break;
 
case GIMPLE_SWITCH:
@@ -235,7 +223,7 @@ cleanup_call_ctrl_altering_flag (gimple
true if anything changes.  */
 
 static bool
-cleanup_control_flow_bb (basic_block bb, bool first_p)
+cleanup_control_flow_bb (basic_block bb)
 {
   gimple_stmt_iterator gsi;
   bool retval = false;
@@ -258,7 +246,7 @@ cleanup_control_flow_bb (basic_block bb,
   || gimple_code (stmt) == GIMPLE_SWITCH)
 {
   gcc_checking_assert (gsi_stmt (gsi_last_bb (bb)) == stmt);
-  retval |= cleanup_control_expr_graph (bb, gsi, first_p);
+  retval |= cleanup_control_expr_graph (bb, gsi);
 }
   else if (gimple_code (stmt) == GIMPLE_GOTO
   && TREE_CODE (gimple_goto_dest (stmt)) == ADDR_EXPR
@@ -732,6 +720,45 @@ cleanup_tree_cfg_bb (basic_block bb)
   return false;
 }
 
+/* Do cleanup_control_flow_bb in PRE order.  */
+
+static bool
+cleanup_control_flow_pre ()
+{
+  bool retval = false;
+
+  auto_vec<edge_iterator> stack (n_basic_blocks_for_fn (cfun) + 1);
+  auto_sbitmap visited (last_basic_block_for_fn (cfun));
+  bitmap_clear (visited);
+
+  stack.quick_push (ei_start (ENTRY_BLOCK_PTR_FOR_FN (cfun)->succs));
+
+  while (! stack.is_empty ())
+{
+  /* Look at the edge on the top of the stack.  */
+  edge_iterator ei = stack.last ();
+  basic_block dest = ei_edge (ei)->dest;
+
+  if (dest != EXIT_BLOCK_PTR_FOR_FN 

Re: [libsanitizer] Fix Mac OS X 10.7 bootstrap (PR sanitizer/82824)

2017-11-14 Thread Rainer Orth
Hi Jakub,

>> --- a/libsanitizer/configure.ac
>> +++ b/libsanitizer/configure.ac
>> @@ -140,6 +140,24 @@ case "$host" in
>>  esac
>>  AM_CONDITIONAL(USING_MAC_INTERPOSE, $MAC_INTERPOSE)
>>  
>> +# lsan_common_mac.cc needs VM_MEMORY_OS_ALLOC_ONCE which was only
>> +# introduced in Mac OS X 10.9/Darwin 13.
>> +case "$host" in
>> +  *-*-darwin[[1-9]] | *-*-darwin[[1-9]].* | *-*-darwin1[[0-2]]*)
>> +LSAN_COMMON=false ;;
>> +  *) LSAN_COMMON=true ;;
>> +esac
>> +AM_CONDITIONAL(LSAN_COMMON_SUPPORTED, $LSAN_COMMON)
>> +
>> +# Before Xcode 4.5, the Darwin linker doesn't properly support undefined
>> +# weak symbols.
>> +case "$host" in
>
> $host here and above?  Shouldn't that be $target instead?

AFAIU $host and $target are identical in target libraries, and the
majority of cases uses $host right now.

> That said, I think it would be better to move this stuff except for the
> AM_CONDITIONAL into configure.tgt similarly to how *_SUPPORTED is handled.

My reasoning was to have purely static based stuff like supported status
or additional compiler/linker flags in configure.tgt, while other tests
which might need configure checks go into configure.ac.

> And the preexisting:
> case "$host" in
>   *-*-darwin*) MAC_INTERPOSE=true ; enable_static=no ;;
>   *) MAC_INTERPOSE=false ;;
> esac
> AM_CONDITIONAL(USING_MAC_INTERPOSE, $MAC_INTERPOSE)
> looks wrong too.

Wrong in which way?  Just the location?

>> --- a/libsanitizer/lsan/lsan_common.h
>> +++ b/libsanitizer/lsan/lsan_common.h
>> @@ -20,6 +20,7 @@
>>  #include "sanitizer_common/sanitizer_stoptheworld.h"
>>  #include "sanitizer_common/sanitizer_symbolizer.h"
>>  
>> +#ifndef CAN_SANITIZE_LEAKS
>
> Obviously better would be to move the Darwin version checks here, but if
> upstream refuses to do that, we can do it this way too.

Kostya suggested the same.  I see now that
sanitizer_common/sanitizer_platform_interceptors.h already uses
__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__, so I'll give it a try.
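A version-gated default along those lines could look roughly like this — a sketch only; whether upstream would accept this exact guard in lsan_common.h is an open question (1090 is the encoded form of 10.9):

```cpp
// Hypothetical sketch: default CAN_SANITIZE_LEAKS off on Darwin targets
// whose deployment target predates 10.9, where VM_MEMORY_OS_ALLOC_ONCE
// is not available.
#ifndef CAN_SANITIZE_LEAKS
# if defined(__APPLE__) \
     && defined(__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__) \
     && __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ < 1090
#  define CAN_SANITIZE_LEAKS 0
# else
#  define CAN_SANITIZE_LEAKS 1
# endif
#endif
```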

> Note we already have similar thing in ubsan/ubsan_platform.h, where we
> want to override CAN_SANITIZE_UB from command line and need to patch that
> header for that, because upstream doesn't allow that :(.

I already mentioned that as a precedent for my lsan_common.h change.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [RFA][PATCH] 3/n Refactoring tree-vrp.c, step #3 introduce vr-values.c

2017-11-14 Thread Richard Biener
On Fri, Nov 10, 2017 at 11:57 PM, Jeff Law  wrote:
>
> And here's the monster that pulls the vr_values class out of tree-vrp.c.
>  I've also pulled out the transitive closure of free routines that are
> used strictly by vr-values.c.
>
> Like the gimple-ssa-evrp.c change, this exposes various functions and
> data structures that are still shared via tree-vrp.h.  It's more than
> I'd like.  I expect some may ultimately turn into free routines exported
> by vr-values.h.  Bug again, the idea here is to be as much of a
> cut-n-paste as possible, so I haven't tried to rationalize the exported
> goop.
>
> Also like the gimple-ssa-evrp.c change the bits in vr-values.c are
> straight copies of the bits that were in tree-vrp.c.
>
> Now we can start with deeper cleanups and further breakdown of evrp into
> analysis & optimization components.
>
>
> Bootstrapped and regression tested on x86_64.  OK for the trunk?

Ok.

Richard.

> jeff
>
> commit 2305155d7ad9561b267458c58071b212ee758c2a
> Author: Jeff Law 
> Date:   Wed Nov 8 11:53:16 2017 -0500
>
> * vr-values.c: New file with contents extracted from tree-vrp.c.
> * Makefile.in (OBJS): Add vr-values.o
> * tree-vrp.h (set_value_range_to_nonnull): Prototype.
> (set_value_range, set_and_canonicalize_value_range): Likewise.
> (vrp_bitmap_equal_p, range_is_nonnull): Likewise.
> (value_range_constant_singleton, symbolic_range_p): Likewise.
> (compare_values, compare_values_warnv, vrp_val_is_min): Likewise.
> (vrp_val_is_max, copy_value_range, set_value_range_to_value): 
> Likewise.
> (extract_range_from_binary_expr_1, vrp_val_min, vrp_val_max): 
> Likewise.
> (set_value_range_to_null, range_int_cst_p, operand_less_p): 
> Likewise.
> (find_case_label_range, find_case_label_index): Likewise.
> (zero_nonzero_bits_from_vr, overflow_comparison_p): Likewise.
> (range_int_cst_singleton_p, value_inside_range): Likewise.
> (get_single_symbol): Likewise.
> (switch_update): Move structure definition here.
> (to_remove_edges, to_update_switch_stmts): Provide externs.
> * tree-vrp.c: Move all methods for vr-values class to vr-values.c
> (vrp_val_max, vrp_val_min, vrp_val_is_max): Make externally 
> visible.
> (vrp_val_is_min, set_value_range): Likewise.
> (set_and_canonicalize_value_range, copy_value_range): Likewise.
> (set_value_range_to_value, set_value_range_to_nonnull): Likewise.
> (set_value_range_to_null, vrp_bitmap_equal_p): Likewise.
> (range_is_nonnull, range_int_cst_p): Likewise.
> (range_int_cst_singleton_p, symbolic_range_p): Likewise.
> (get_single_symbol, operand_less_p): Likewise
> (compare_values_warnv, compare_values): Likewise.
> (value_inside_range, value_range_constant_singleton): Likewise.
> (zero_nonzero_bits_from_vr): Likewise.
> (extract_range_from_binary_expr_1): Likewise.
> (overflow_comparison_p): Likewise.
> (to_remove_edges, to_update_switch_stmts): Likewise.
> (find_case_label_index, find_case_label_range): Likewise.
> (switch_update, set_value_range_to_nonnegative): Remove.
> (set_value_range_to_truthvalue): Likewise.
> (symbolic_range_based_on_p, gimple_assign_nonzero_p): Likewise.
> (gimple_stmt_nonzero_p, compare_ranges): Likewise.
> (compare_range_with_value, vrp_valueize, vrp_valueize_1): 
> Likewise.
> (find_case_label_ranges, test_for_singularity): Likewise.
> (range_fits_type_p, simplify_conversion_using_ranges): Likewise.
> (x_vr_values): Move to its remaining use site.
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 9e5b9340fe8..778b0ada7c3 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -5,6 +5,49 @@
> with memcpy.
> (find_subframework_file): Same.
>
> +2017-11-10  Jeff Law  
> +
> +   * vr-values.c: New file with contents extracted from tree-vrp.c.
> +   * Makefile.in (OBJS): Add vr-values.o
> +   * tree-vrp.h (set_value_range_to_nonnull): Prototype.
> +   (set_value_range, set_and_canonicalize_value_range): Likewise.
> +   (vrp_bitmap_equal_p, range_is_nonnull): Likewise.
> +   (value_range_constant_singleton, symbolic_range_p): Likewise.
> +   (compare_values, compare_values_warnv, vrp_val_is_min): Likewise.
> +   (vrp_val_is_max, copy_value_range, set_value_range_to_value): 
> Likewise.
> +   (extract_range_from_binary_expr_1, vrp_val_min, vrp_val_max): 
> Likewise.
> +   (set_value_range_to_null, range_int_cst_p, operand_less_p): Likewise.
> +   (find_case_label_range, find_case_label_index): Likewise.
> +   

Re: PR82808

2017-11-14 Thread Richard Biener
On Tue, Nov 14, 2017 at 10:31 AM, Prathamesh Kulkarni
 wrote:
> On 3 November 2017 at 15:38, Richard Biener  
> wrote:
>> On Fri, Nov 3, 2017 at 6:15 AM, Prathamesh Kulkarni
>>  wrote:
>>> Hi Martin,
>>> As mentioned in PR, the issue here for propagating value of 'm' from
>>> f_c1 to foo() is that the jump function operation is FLOAT_EXPR, and
>>> the type of input param 'm' is int, so fold_unary() doesn't do the
>>> conversion to real_type. The attached patch fixes that by calling
>>> fold_convert if operation is FLOAT_EXPR / FIX_TRUNC_EXPR /
>>> CONVERT_EXPR and converts it to the type of corresponding parameter in
>>> callee.
>>>
>>> There are still two issues:
>>> a) Using NOP_EXPR for early_exit in ipa_get_jf_pass_through_result.
>>> I suppose we need to change to some other code to indicate that there
>>> is no operation ?
>>> b) Patch does not passing param_type from all callers.
>>> I suppose we could fix these incrementally ?
>>>
>>> Bootstrap+tested on x86_64-unknown-linux-gnu.
>>> OK for trunk ?
>>
>> This doesn't look like a well designed fix.  Both fold_unary and fold_binary
>> calls get a possibly bogus type and you single out only a few ops.
>>
>> Either _fully_ list a set of operations that are know to have matching
>> input/output types or always require param_type to be non-NULL.
>>
>> For a) simply remove the special-casing and merge it with CONVERT_EXPR
>> handling (however it will end up looking).
>>
>> Please don't use fold_convert, using fold_unary is fine.
> Hi Richard,
> Sorry for the delay. In the attached version, parm_type is made non
> NULL in ipa_get_jf_pass_through_result().
>
> ipa_get_jf_pass_through_result() is called from two
> places - propagate_vals_across_pass_through() and ipa_value_from_jfunc().
> However it appears ipa_value_from_jfunc() is called from multiple
> functions and it's
> hard to detect parm_type in the individual callers. So I have passed
> correct parm_type from propagate_vals_across_pass_through(), and kept
> the old behavior for ipa_value_from_jfunc().
>
> Would it be OK to fix that incrementally ?

I don't think it is good to do that.  If we can't get the correct type
then we have to only support known-good operations.  There's the
ipa_node_params->descriptors array where the type should be attached to, no?

So use TREE_TYPE (ipa_get_param (info, idx))?
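The underlying issue is simply that a conversion jump function cannot be folded without knowing the destination type; reduced to plain C++ (with invented names):

```cpp
#include <cassert>

// A pass-through jump function with a FLOAT_EXPR-style operation: the
// known caller value is an int, but producing the callee's constant
// requires the callee parameter's type (here double).  That type must be
// supplied from the parameter descriptor -- it cannot be recovered from
// the operand alone, which is why fold_unary was handed a bogus type.
double
apply_float_expr (int caller_value)
{
  return static_cast<double> (caller_value);   // FLOAT_EXPR: int -> real
}
```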

Other than that Martin should have a look as I'm not really familiar
with this IPA API.

Richard.

> Patch is bootstrapped+tested on x86_64-unknown-linux-gnu.
>
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Prathamesh


Re: RFA (hash-map): PATCH to support GTY((cache)) with hash_map

2017-11-14 Thread Richard Biener
On Fri, Sep 15, 2017 at 11:45 PM, Jason Merrill  wrote:
> The hash_map interface is a lot more convenient than that of
> hash_table for cases where it makes sense, but there hasn't been a way
> to get the ggc_cache_remove behavior with a hash_map.  In other words,
> not marking elements during the initial ggc marking phase, but maybe
> marking them during the clear_caches phase based on keep_cache_entry.
>
> This patch implements that by:
>
> Adding a ggc_maybe_mx member function to ggc_remove, and overriding
> that instead of ggc_mx in ggc_cache_remove.
> Calling H::ggc_maybe_mx instead of H::ggc_mx in gt_ggc_mx (hash_table *).
> Calling H::ggc_mx in gt_cleare_caches (hash_table *) rather than
> relying on an extern declaration of a plain function that cannot be
> declared for hash_map::hash_entry.
> Adding ggc_maybe_mx and keep_cache_entry to hash_map::hash_entry.
> Adding gt_cleare_cache for hash_map.
> Adding a boolean constant to the hash-map traits indicating whether we
> want the cache behavior above.
>
> I then define a typedef tree_cache_map to use this functionality, and
> use it in a few places in the C++ front end.
>
> Tested x86_64-pc-linux-gnu, OK for trunk?
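Stripped of the GC machinery, the cache semantics Jason describes — entries do not keep their values alive by themselves, and a later sweep asks a keep_cache_entry-style predicate whether each one survives — reduce to a predicate-driven erase pass. The names below are invented, not the GGC API:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Sweep a "cache" map: during the initial marking phase nothing here keeps
// a value alive; in the later clear-caches phase each entry is either kept
// (and would then be marked) or dropped, as keep_cache_entry decides.
template <typename K, typename V, typename Keep>
void
clear_cache (std::unordered_map<K, V> &cache, Keep keep_cache_entry)
{
  for (auto it = cache.begin (); it != cache.end (); )
    if (keep_cache_entry (it->second))
      ++it;                    // entry survives this collection
    else
      it = cache.erase (it);   // dead entry: remove from the cache
}
```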

Ok.

Richard.


[PATCH] New lang hook

2017-11-14 Thread Nathan Sidwell
This patch addresses c++/82836 & c++/82737.  The root cause was a bad 
assumption I made when moving the mangling alias machinery to its own 
hash table.


I had thought that once we SET_DECL_ASSEMBLER_NAME, it never becomes 
unset (or changed).  That is false.  There are paths in the compiler 
that set it back to zero, and at least one path where we remangled 
because of a bad assumption about the templatedness of a friend.


Previously, we placed mangling aliases in the global namespace mapping 
an identifier (the mangled name) to the decl.  Resetting the assembler 
name didn't break this map -- but may have led to unneeded aliases, I guess.


Moving the alias machinery to its own hash-map allowed me to make 
namespace hashes a simple hash, keyed by DECL_NAME (rather than a 
hash-map from identifier->decl).


Finally converting the alias hash-map to a hash-table keyed by 
DECL_ASSEMBLER_NAME shrunk that table too.  But exposed this problem.


I did contemplate reverting that change, but tried this first, and it 
seems good.


1) rename the args of COPY_DECL_ASSEMBLER_NAME from DECL1 & DECL2 to the 
more semantic SRC_DECL and DST_DECL.  Same for COPY_DECL_RTL.  These 
macros smell rather memcpy-like, but the args are the wrong way round, 
so the more clues the better. (My comment in 82737 was misled by this.)


2) change SET_DECL_ASSEMBLER_NAME to a function call, similar to 
DECL_ASSEMBLER_NAME.


3) have that call forward to a new lang hook,
overwrite_decl_assembler_name, if it is changing the name.
(set_decl_assembler_name already having been taken.)


4) the new lang hook default simply stores the new name via 
DECL_ASSEMBLER_NAME_RAW.


5) the C++ FE overrides this.  If the current name is in the alias map 
and maps to this decl, we take it out of the map.  Then set the new name.


6) excitingly, mangle_decl can be called with a non-null 
DECL_ASSEMBLER_NAME, so that function's use of SET_DECL_ASSEMBLER_NAME 
works just fine.
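The shape of steps 2–5 — a setter that forwards to an overridable hook, whose front-end override also evicts a stale alias-map entry before storing the new name — can be sketched with ordinary function pointers. Every name below is invented for illustration:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct decl
{
  std::string asm_name;   // stands in for DECL_ASSEMBLER_NAME_RAW
};

std::unordered_map<std::string, decl *> alias_map;  // mangling aliases

// Default hook: just store the new name (step 4).
void
default_overwrite_name (decl &d, const std::string &name)
{
  d.asm_name = name;
}

// The overridable "lang hook" (step 3).
void (*overwrite_name_hook) (decl &, const std::string &)
  = default_overwrite_name;

// SET_DECL_ASSEMBLER_NAME analogue: forward only on a real change (step 2).
void
set_asm_name (decl &d, const std::string &name)
{
  if (d.asm_name != name)
    overwrite_name_hook (d, name);
}

// Front-end override (step 5): if the old name maps to this decl,
// take it out of the alias map before storing the new name.
void
fe_overwrite_name (decl &d, const std::string &name)
{
  auto it = alias_map.find (d.asm_name);
  if (it != alias_map.end () && it->second == &d)
    alias_map.erase (it);
  d.asm_name = name;
}
```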


booted on all languages.

ok?

nathan
--
Nathan Sidwell
2017-11-13  Nathan Sidwell  

	PR c++/82836
	PR c++/82737
	* tree.h (COPY_DECL_RTL): Rename parms for clarity.
	(SET_DECL_ASSEMBLER_NAME): Forward to
	overwrite_decl_assembler_name.
	(COPY_DECL_ASSEMBLER_NAME): Rename parms for clarity.
	(overwrite_decl_assembler_name): Declare.
	* tree.c (overwrite_decl_assembler_name): New.
	* langhooks-def.h (lhd_overwrite_decl_assembler_name): Declare.
	(LANG_HOOKS_OVERWRITE_DECL_ASSEMBLER_NAME): Provide default.
	(LANG_HOOKS_INITIALIZER): Add it.
	* langhooks.h (struct lang_hooks): Add overwrite_decl_assembler_name.
	* langhooks.c (lhd_overwrite_decl_assembler_name): Default
	implementation.

	gcc/cp/
	* cp-objcp-common.h (LANG_HOOKS_OVERWRITE_DECL_ASSEMBLER_NAME):
	Override.
	* cp-tree.h (overwrite_mangling): Declare.
	* decl2.c (struct mangled_decl_hash): Entries are deletable.
	(overwrite_mangling): New.

	gcc/testsuite/
	* g++.dg/pr82836.C: New.

Index: tree.c
===
--- tree.c	(revision 254703)
+++ tree.c	(working copy)
@@ -674,6 +674,17 @@ decl_assembler_name (tree decl)
   return DECL_ASSEMBLER_NAME_RAW (decl);
 }
 
+/* The DECL_ASSEMBLER_NAME_RAW of DECL is being explicitly set to NAME
+   (which may be NULL).  Inform the FE, if this is changing the
+   name.  */
+
+void
+overwrite_decl_assembler_name (tree decl, tree name)
+{
+  if (DECL_ASSEMBLER_NAME_RAW (decl) != name)
+lang_hooks.overwrite_decl_assembler_name (decl, name);
+}
+
 /* When the target supports COMDAT groups, this indicates which group the
DECL is associated with.  This can be either an IDENTIFIER_NODE or a
decl, in which case its DECL_ASSEMBLER_NAME identifies the group.  */
Index: tree.h
===
--- tree.h	(revision 254703)
+++ tree.h	(working copy)
@@ -2528,11 +2528,11 @@ extern void decl_value_expr_insert (tree
 #define DECL_RTL_SET_P(NODE) \
   (HAS_RTL_P (NODE) && DECL_WRTL_CHECK (NODE)->decl_with_rtl.rtl != NULL)
 
-/* Copy the RTL from NODE1 to NODE2.  If the RTL was not set for
-   NODE1, it will not be set for NODE2; this is a lazy copy.  */
-#define COPY_DECL_RTL(NODE1, NODE2) \
-  (DECL_WRTL_CHECK (NODE2)->decl_with_rtl.rtl \
-   = DECL_WRTL_CHECK (NODE1)->decl_with_rtl.rtl)
+/* Copy the RTL from SRC_DECL to DST_DECL.  If the RTL was not set for
+   SRC_DECL, it will not be set for DST_DECL; this is a lazy copy.  */
+#define COPY_DECL_RTL(SRC_DECL, DST_DECL) \
+  (DECL_WRTL_CHECK (DST_DECL)->decl_with_rtl.rtl \
+   = DECL_WRTL_CHECK (SRC_DECL)->decl_with_rtl.rtl)
 
 /* The DECL_RTL for NODE, if it is set, or NULL, if it is not set.  */
 #define DECL_RTL_IF_SET(NODE) (DECL_RTL_SET_P (NODE) ? DECL_RTL (NODE) : NULL)
@@ -2723,19 +2723,21 @@ extern void decl_value_expr_insert (tree
 
 /* Set the DECL_ASSEMBLER_NAME for NODE to NAME.  */
 #define SET_DECL_ASSEMBLER_NAME(NODE, NAME) \
-  (DECL_ASSEMBLER_NAME_RAW (NODE) = (NAME))
+  

Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices

2017-11-14 Thread Richard Biener
On Mon, Nov 13, 2017 at 6:37 PM, Martin Sebor  wrote:
> Richard, this thread may have been conflated with the one Re:
> [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets
> (PR 82455) They are about different things.
>
> I'm still looking for approval of:
>
>   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01208.html

+  tree maxbound
+ = build_int_cst (sizetype, ~(1LLU << (TYPE_PRECISION (sizetype) - 1)));

this looks possibly bogus.  Can you instead use

  up_bound_p1
    = wide_int_to_tree (sizetype,
			wi::div_trunc (wi::max_value (TYPE_PRECISION (sizetype),
						      SIGNED),
				       wi::to_wide (eltsize)));

please?  Note you are _not_ computing the proper upper bound here because that
is what you compute plus low_bound.

+  up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);

+
+  tree arg = TREE_OPERAND (ref, 0);
+  tree_code code = TREE_CODE (arg);
+  if (code == COMPONENT_REF)
+ {
+  HOST_WIDE_INT off;
+  if (tree base = get_addr_base_and_unit_offset (ref, &off))
+{
+  tree size = TYPE_SIZE_UNIT (TREE_TYPE (base));
+  if (TREE_CODE (size) == INTEGER_CST)
+ up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size);

I think I asked this multiple times now but given 'ref' is the
variable array-ref
a.b.c[i] when you call get_addr_base_and_unit_offset (ref, &off) you always
get a NULL_TREE return value.

So I asked you to pass it 'arg' instead ... which gets you the offset of
a.b.c, which looks like what you intended to get anyway.

I also wonder what you compute here - you are looking at the size of 'base'
but that is the size of 'a'.  You don't even use the computed offset!  Which
means you could have used get_base_address instead!?  Also the type
of 'base' may be completely off given MEM[&blk + 8].b.c[i] would return blk
as base which might be an array of chars and not in any way related to
the type of the innermost structure we access with COMPONENT_REFs.

Why are you only looking at COMPONENT_REF args anyways?  You
don't want to handle a.b[3][i]?

That is, I'd have expected you do

   if (get_addr_base_and_unit_offset (ref, &off))
     up_bound_p1 = wide_int_to_tree (sizetype,
				     wi::sub (wi::to_wide (up_bound_p1), off));

Richard.
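Concretely, the bound Richard suggests is just the largest representable object size, truncation-divided by the element size (the later subtraction of the enclosing offset then tightens it). A minimal arithmetic sketch, with invented names:

```cpp
#include <cassert>
#include <cstdint>

// Largest valid index-plus-one for elements of eltsize bytes inside an
// object of at most max_object_size bytes: any larger index would push
// the byte offset past the limit.  Truncating division, like wi::div_trunc.
std::uint64_t
up_bound_p1 (std::uint64_t max_object_size, std::uint64_t eltsize)
{
  return max_object_size / eltsize;
}
```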

> Thanks
> Martin
>
>
>>> The difficulty with a testcase like
>>>
>>> struct { struct A { int b[1]; } a[5]; } x;
>>>
>>>  x.a[i].b[j]
>>>
>>> is that b is not considered an array at struct end since one of my
>>> recent changes to array_at_struct_end_p (basically it disallows having
>>> a flex array as the last member of an array).
>>>
>>> It would still stand for non-array components with variable offset
>>> but you can't create C testcases for that.
>>>
>>> So yes, for the specific case within the array_at_struct_end_p condition
>>> get_addr_base_and_unit_offset is enough.  IIRC the condition was
>>> a bit more than just get_addr_base_and_unit_offset.  up_bound !=
>>> INTEGER_CST for example.  So make the above
>>>
>>> void foo (int n, int i)
>>> {
>>>  struct { struct A { int b[n]; } a[5]; } x;
>>>  return x.a[i].b[PTRDIFF_MAX/2];
>>> }
>>>
>>> with appropriately adjusted constant.  Does that give you the testcase
>>> you want?
>>
>>
>> Thank you for the test case.  It is diagnosed the same way
>> irrespective of which of the two functions is used so it serves
>> to confirm my understanding that the only difference between
>> the two functions is bits vs bytes.
>>
>> Unless you have another test case that does demonstrate that
>> get_ref_base_and_extent is necessary/helpful, is the last patch
>> okay to commit?
>>
>> (Again, to be clear, I'm happy to change or enhance the patch if
>> I can verify that the change handles cases that the current patch
>> misses.)
>>
>>>
>>> As of "it works, catches corner-cases, ..." - yes, it does, but it
>>> adds code that needs to be maintained, may contain bugs, is
>>> executed even for valid code.
>>
>>
>> Understood.  I don't claim the enhancement is free of any cost
>> whatsoever.  But it is teeny by most standards and it doesn't
>> detect just excessively large indices but also negative indices
>> into last member arrays (bug 68325) and out-of-bounds indices
>> (bug 82583).  The "excessively large" part does come largely
>> for free with the other checks.
>>
>> Martin
>
>


Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation for SKX

2017-11-14 Thread H.J. Lu
On Tue, Nov 14, 2017 at 3:18 AM, Peryt, Sebastian
 wrote:
> I have updated tests and changelog according to Jakub's suggestions.
> Please find attached v2 of my patch.
>
>
> 14.11.2017  Sebastian Peryt  
>
> gcc/
>
> PR target/82941
> PR target/82942
> * config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
> to return true on Xeon and not on Xeon Phi.
> (ix86_check_avx256_register): Changed to ...
> (ix86_check_avx_upper_register): ... this. Add extra check for
> VALID_AVX512F_REG_OR_XI_MODE.
> (ix86_avx_u128_mode_needed): Changed
> ix86_check_avx256_register to ix86_check_avx_upper_register.
> (ix86_check_avx256_stores): Changed to ...
> (ix86_check_avx_upper_stores): ... this. Changed
> ix86_check_avx256_register to ix86_check_avx_upper_register.
> (ix86_avx_u128_mode_after): Changed
> avx_reg256_found to avx_upper_reg_found. Changed
> ix86_check_avx256_stores to ix86_check_avx_upper_stores.
> (ix86_avx_u128_mode_entry): Changed
> ix86_check_avx256_register to ix86_check_avx_upper_register.
> (ix86_avx_u128_mode_exit): Ditto.
> * config/i386/i386.h: (host_detect_local_cpu): New define.

@@ -2497,7 +2497,7 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *)
 {
-  return TARGET_AVX && !TARGET_AVX512F
+  return TARGET_AVX && !TARGET_AVX512PF && !TARGET_AVX512ER
^
 Please remove this.

From glibc commit:

commit 4cb334c4d6249686653137ec273d081371b3672d
Author: H.J. Lu 
Date:   Tue Apr 18 14:01:45 2017 -0700

x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396]

On Skylake server, AVX512 load/store instructions in memcpy/memset may
lead to lower CPU turbo frequency in certain situations.  Use of AVX2
in memcpy/memset has been observed to have improved overall performance
in many workloads due to the higher frequency.

Since AVX512ER is unique to Xeon Phi, this patch sets Prefer_No_AVX512
if AVX512ER isn't available so that AVX2 versions of memcpy/memset are
used on Skylake server.

Only AVX512ER is really unique to Xeon Phi.

   && TARGET_VZEROUPPER && flag_expensive_optimizations
   && !optimize_size;
 }

> 14.11.2017  Sebastian Peryt  
>
> gcc/testsuite/
>
> PR target/82941
> PR target/82942
> * gcc.target/i386/pr82941-1.c: New test.
> * gcc.target/i386/pr82941-2.c: New test.
> * gcc.target/i386/pr82942-1.c: New test.
> * gcc.target/i386/pr82942-2.c: New test.
>
>
> Thanks,
> Sebastian
>
>> -Original Message-
>> From: Jakub Jelinek [mailto:ja...@redhat.com]
>> Sent: Tuesday, November 14, 2017 10:51 AM
>> To: Peryt, Sebastian 
>> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak ; Kirill Yukhin
>> ; Lu, Hongjiu 
>> Subject: Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation
>> for SKX
>>
>> On Tue, Nov 14, 2017 at 09:45:12AM +, Peryt, Sebastian wrote:
>> > Hi,
>> >
>> > This patch fixes PR82941 and PR82942 by adding vzeroupper generation on
>> SKX.
>> > Bootstrapped and tested.
>> >
>> > 14.11.2017  Sebastian Peryt  
>> >
>> > gcc/
>>
>> In that case the ChangeLog entry should list the PRs, i.e.
>>   PR target/82941
>>   PR target/82942
>> > * config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
>> > to return true on Xeon and not on Xeon Phi.
>> > (ix86_check_avx256_register): Changed to ...
>> > (ix86_check_avx_upper_register): ... this.
>> > (ix86_check_avx_upper_register): Add extra check for
>> > VALID_AVX512F_REG_OR_XI_MODE.
>>
>> The way this is usually written is instead:
>>   (ix86_check_avx256_register): Changed to ...
>>   (ix86_check_avx_upper_register): ... this.  Add extra check for
>>   VALID_AVX512F_REG_OR_XI_MODE.
>> i.e. don't duplicate the function name, just continue mentioning further 
>> changes.
>>
>> > (ix86_avx_u128_mode_needed): Changed
>> > ix86_check_avx256_register to ix86_check_avx_upper_register.
>> > (ix86_check_avx256_stores): Changed to ...
>> > (ix86_check_avx_upper_stores): ... this.
>> > (ix86_check_avx_upper_stores): Changed
>> > ix86_check_avx256_register to ix86_check_avx_upper_register.
>>
>> Likewise.
>>
>> > gcc/testsuite/
>> > * gcc.target/i386/pr82941.c: New test.
>> > * gcc.target/i386/pr82942.c: New test.
>>
>> Shouldn't there be also a test that if using -march=knl and another one if 
>> using -
>> mavx512f -mavx512er that we don't emit any vzeroupper?
>>
>>   Jakub



-- 
H.J.


Re: [PATCH] Fixes for PR68356, PR81210, and PR81693

2017-11-14 Thread Dominique d'Humières


> On Nov 13, 2017, at 18:40, Mike Stump  wrote:
> 
> On Nov 12, 2017, at 6:58 AM, H.J. Lu  wrote:
>> 
>> On Sun, Nov 12, 2017 at 6:22 AM, Dominique d'Humières
>>  wrote:
>>> The following patch fixes PR68356, PR81210, and PR81693 on darwin.
>>> ...
>> 
>> I wrote these tests.  These tests don't align stack to 16 bytes and
>> should be skipped on Darwin.
> 
> So, sounds like something based on:
> 
>  /* { dg-skip-if "sp not aligned to 16 bytes" { *-*-darwin } } */
> 
> then.  Ok with that change.

Thanks for the review.

Am I correct in understanding that this applies to pr25967-*.c only?

I'd like to keep the PR numbers. Is that OK?

TIA

Dominique

RE: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation for SKX

2017-11-14 Thread Peryt, Sebastian
I have updated tests and changelog according to Jakub's suggestions.
Please find attached v2 of my patch.


14.11.2017  Sebastian Peryt  

gcc/

PR target/82941
PR target/82942
* config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
to return true on Xeon and not on Xeon Phi.
(ix86_check_avx256_register): Changed to ...
(ix86_check_avx_upper_register): ... this. Add extra check for
VALID_AVX512F_REG_OR_XI_MODE.
(ix86_avx_u128_mode_needed): Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_check_avx256_stores): Changed to ...
(ix86_check_avx_upper_stores): ... this. Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_avx_u128_mode_after): Changed
avx_reg256_found to avx_upper_reg_found. Changed
ix86_check_avx256_stores to ix86_check_avx_upper_stores.
(ix86_avx_u128_mode_entry): Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_avx_u128_mode_exit): Ditto.
* config/i386/i386.h (host_detect_local_cpu): New define.

14.11.2017  Sebastian Peryt  

gcc/testsuite/

PR target/82941
PR target/82942
* gcc.target/i386/pr82941-1.c: New test.
* gcc.target/i386/pr82941-2.c: New test.
* gcc.target/i386/pr82942-1.c: New test.
* gcc.target/i386/pr82942-2.c: New test.


Thanks,
Sebastian

> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Tuesday, November 14, 2017 10:51 AM
> To: Peryt, Sebastian 
> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak ; Kirill Yukhin
> ; Lu, Hongjiu 
> Subject: Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation
> for SKX
> 
> On Tue, Nov 14, 2017 at 09:45:12AM +, Peryt, Sebastian wrote:
> > Hi,
> >
> > This patch fixes PR82941 and PR82942 by adding vzeroupper generation on
> SKX.
> > Bootstrapped and tested.
> >
> > 14.11.2017  Sebastian Peryt  
> >
> > gcc/
> 
> In that case the ChangeLog entry should list the PRs, i.e.
>   PR target/82941
>   PR target/82942
> > * config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
> > to return true on Xeon and not on Xeon Phi.
> > (ix86_check_avx256_register): Changed to ...
> > (ix86_check_avx_upper_register): ... this.
> > (ix86_check_avx_upper_register): Add extra check for
> > VALID_AVX512F_REG_OR_XI_MODE.
> 
> The way this is usually written is instead:
>   (ix86_check_avx256_register): Changed to ...
>   (ix86_check_avx_upper_register): ... this.  Add extra check for
>   VALID_AVX512F_REG_OR_XI_MODE.
> i.e. don't duplicate the function name, just continue mentioning further 
> changes.
> 
> > (ix86_avx_u128_mode_needed): Changed
> > ix86_check_avx256_register to ix86_check_avx_upper_register.
> > (ix86_check_avx256_stores): Changed to ...
> > (ix86_check_avx_upper_stores): ... this.
> > (ix86_check_avx_upper_stores): Changed
> > ix86_check_avx256_register to ix86_check_avx_upper_register.
> 
> Likewise.
> 
> > gcc/testsuite/
> > * gcc.target/i386/pr82941.c: New test.
> > * gcc.target/i386/pr82942.c: New test.
> 
> Shouldn't there be also a test that if using -march=knl and another one if 
> using -
> mavx512f -mavx512er that we don't emit any vzeroupper?
> 
>   Jakub


0001-VZEROUPPER_v2.patch
Description: 0001-VZEROUPPER_v2.patch
