[PATCH] Fix ia32 ICE while compiling glibc (PR target/93174)

2020-01-07 Thread Jakub Jelinek
Hi!

Joseph reported ia32 glibc build ICEs: the
*adddi3_doubleword_cc_overflow_1 pattern allows a memory output with a matching
input, but the addcarry* patterns it splits to don't; for some strange
reason they only allow a register output.  As addcarry* emits an adc
instruction, which is very similar to add, I don't see the point of doing
anything in the predicates/constraints other than what add does.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2020-01-07  Jakub Jelinek  

PR target/93174
* config/i386/i386.md (addcarry<mode>_0): Use nonimmediate_operand
predicate for output operand instead of register_operand.
(addcarry<mode>, *addcarry<mode>_1): Likewise.  Add alternative with
memory destination and non-memory operands[2].

* gcc.c-torture/compile/pr93174.c: New test.

--- gcc/config/i386/i386.md.jj  2020-01-07 10:54:45.924177633 +0100
+++ gcc/config/i386/i386.md 2020-01-07 18:12:48.043555173 +0100
@@ -6786,13 +6786,13 @@ (define_insn "addcarry"
  (plus:SWI48
(match_operator:SWI48 5 "ix86_carry_flag_operator"
  [(match_operand 3 "flags_reg_operand") (const_int 0)])
-   (match_operand:SWI48 1 "nonimmediate_operand" "%0"))
- (match_operand:SWI48 2 "nonimmediate_operand" "rm")))
+   (match_operand:SWI48 1 "nonimmediate_operand" "%0,0"))
+ (match_operand:SWI48 2 "nonimmediate_operand" "r,rm")))
  (plus:<DWI>
    (zero_extend:<DWI> (match_dup 2))
    (match_operator:<DWI> 4 "ix86_carry_flag_operator"
      [(match_dup 3) (const_int 0)]
-   (set (match_operand:SWI48 0 "register_operand" "=r")
+   (set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
(plus:SWI48 (plus:SWI48 (match_op_dup 5
 [(match_dup 3) (const_int 0)])
(match_dup 1))
@@ -6812,7 +6812,7 @@ (define_expand "addcarry_0"
   (match_operand:SWI48 1 "nonimmediate_operand")
   (match_operand:SWI48 2 "x86_64_general_operand"))
 (match_dup 1)))
-  (set (match_operand:SWI48 0 "register_operand")
+  (set (match_operand:SWI48 0 "nonimmediate_operand")
   (plus:SWI48 (match_dup 1) (match_dup 2)))])]
   "ix86_binary_operator_ok (PLUS, mode, operands)")
 
@@ -6830,7 +6830,7 @@ (define_insn "*addcarry_1"
(match_operand: 6 "const_scalar_int_operand" "")
(match_operator: 4 "ix86_carry_flag_operator"
  [(match_dup 3) (const_int 0)]
-   (set (match_operand:SWI48 0 "register_operand" "=r")
+   (set (match_operand:SWI48 0 "nonimmediate_operand" "=rm")
(plus:SWI48 (plus:SWI48 (match_op_dup 5
 [(match_dup 3) (const_int 0)])
(match_dup 1))
--- gcc/testsuite/gcc.c-torture/compile/pr93174.c.jj	2020-01-07 18:16:29.460185015 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr93174.c	2020-01-07 18:15:29.716094376 +0100
@@ -0,0 +1,14 @@
+/* PR target/93174 */
+
+unsigned long long a[2];
+void bar (void);
+
+void
+foo (int c)
+{
+  int e = c >> 2;
+  a[0] += c;
+  a[1] = a[0] < c;
+  while (e--)
+    bar ();
+}

Jakub



Re: [PATCH][vect] Keep track of DR_OFFSET advance in dr_vec_info rather than data_reference

2020-01-07 Thread Richard Biener
On Tue, 7 Jan 2020, Andre Vieira (lists) wrote:

> Hello,
> 
> Should we try to get this refactoring in stage 3 or park it for next stage 1?

I only looked at the patch now and I think it nicely simplifies things.
The patch is OK now with the usual function comment added before

 }

+inline tree
+get_dr_vinfo_offset (dr_vec_info *dr_info, bool check_outer = false)
+{
+  innermost_loop_behavior *base;
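
(The quoted hunk is cut short in the archive; based on the description in
Andre's mail below, the body plausibly continues along these lines - a hedged
reconstruction, not necessarily the committed code:)

+  if (check_outer)
+    base = vect_dr_behavior (dr_info);
+  else
+    base = &dr_info->dr->innermost;
+
+  tree offset = base->offset;
+  if (!dr_info->offset)
+    return offset;
+
+  /* Add the per-loop-vectorization offset tracked in the dr_vec_info.  */
+  return fold_build2 (PLUS_EXPR, sizetype,
+		      fold_convert (sizetype, offset),
+		      fold_convert (sizetype, dr_info->offset));
+}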

Thanks,
Richard.

> Cheers,
> Andre
> 
> On 10/12/2019 13:36, Andre Vieira (lists) wrote:
> > Hi,
> > 
> > This patch aims at refactoring the vectorizer code to no longer need to
> > reset DR_OFFSET for epilogue vectorization and instead keep track of
> > DR_OFFSET changes per dr_vec_info and just update it as needed.  I added a
> > member of type 'tree' called 'offset' to dr_vec_info, which keeps track of
> > the changes to the data_reference's offset per dr_vec_info and thus per
> > instance of loop vectorization.  To get the current loop's DR_OFFSET I
> > introduced a function 'get_dr_vinfo_offset' which will add the dr_vec_info's
> > offset to either the data_reference's innermost offset or the offset of the
> > 'innermost_loop_behavior' returned by 'vect_dr_behavior' depending on
> > whether 'get_dr_vinfo_offset's second parameter 'check_outer' is true.  This
> > is to mimic the behavior of using the outer loop relative
> > 'innermost_loop_behavior' in 'vect_create_addr_base_for_vector_ref'.
> > 
> > I regression tested vect.exp on aarch64 and x86_64. I also regression tested
> > libgomp on aarch64 and x86_64, no changes, but there were quite a few test
> > failures with the commit I based this patch on...
> > 
> > Is this OK for trunk or shall I wait until libgomp stabilizes?
> > 
> > 2019-12-10  Andre Vieira  
> > 
> >     * tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref): Use
> >      get_dr_vinfo_offset
> >      * tree-vect-loop.c (update_epilogue_loop_vinfo):  Remove 
> > orig_drs_init
> >      parameter and its use to reset DR_OFFSET's.
> >      (vect_transform_loop): Remove orig_drs_init argument.
> >      * tree-vect-loop-manip.c (vect_update_init_of_dr): Update the 
> > offset
> >      member of dr_vec_info rather than the offset of the associated
> >      data_reference's innermost_loop_behavior.
> >      (vect_update_init_of_dr): Pass dr_vec_info instead of 
> > data_reference.
> >      (vect_do_peeling): Remove orig_drs_init parameter and its
> >      construction.
> >      * tree-vect-stmts.c (check_scan_store): Replace use of DR_OFFSET
> >      with get_dr_vinfo_offset.
> >      (vectorizable_store): Likewise.
> >      (vectorizable_load): Likewise.
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH] ipa-inline: Adjust condition for caller_growth_limits

2020-01-07 Thread luoxhu


On 2020/1/7 23:40, Jan Hubicka wrote:
>>
>>
>> On 2020/1/7 16:40, Jan Hubicka wrote:
 On Mon, 2020-01-06 at 01:03 -0600, Xiong Hu Luo wrote:
> Inline should return failure either (newsize > param_large_function_insns)
> OR (newsize > limit).  Sometimes newsize is larger than
> param_large_function_insns, but smaller than limit, inline doesn't return
> failure even if the new function is a large function.
> Therefore, param_large_function_insns and param_large_function_growth 
> should be
> OR instead of AND, otherwise --param large-function-growth won't
> work correctly with --param large-function-insns.
>
> gcc/ChangeLog:
>
>   2020-01-06  Luo Xiong Hu  
>
>   ipa-inline-analysis.c (estimate_growth): Fix typo.
>   ipa-inline.c (caller_growth_limits): Use OR instead of AND.
 OK
>>>
>>> This patch also seems wrong.  Inliner should return failure when
>>>newsize > param_large_function_insns && newsize > limit
>>> The reason is same as with large_unit_insns.  We used to have only
>>> growth limits but it has turned out that this gets too
>>> restrictive/random for very small units/growths.
>>>
>>> So the logic is meant to be that inlining is OK either
>>>- if function is reasonably small (large_function_insns limit is not
>>>  met)
>>>- or it does not grow too much (large-function-growth isnot met)
>>
>> Sorry for checking too fast.  Will revert once confirmed.
> 
> Yes, i can confirm that the origina test was as intended and it is OK to
> revert the patch (I am currently traveling)

Partially reverted in r279986, keeping the typo fix as it is obvious.

> 
>> large_function_insns is default to 2700, and large-function-growth is
>> default to 100, so inline a large function with insn 2700 to another large
>> function with insn 2700 is allowed here (limit reach to 5400), I observed
>> HUGE performance drop in some cases for this behavior(large function inline
>> large function due to register spilling) compared with not inlined, that's
>> why this patch comes out.
>>
>>  From performance point of view,it seems benefitial to inline small function 
>> to
>> large function, but negative when inline large function to large 
>> function(causing
>> register spill)?  2700 seems too large for defining a function to be large 
>> and and
>> 5400 also seems too big for growth limit?
>>
>> This inline happens at "Inlining one function called once", it's out of
>> inline_small_functions and since newsize > param_large_function_insns, so I
>> suppose it won't change behavior for inlining small functions?
> 
> large-function-growth/insns is not meant to improve performance. The
> reason why they exists is the fact that compiler is non-linar
> in some cases and thus producing very large functions makes it slow / or
> give up on some optimizations.
> 
> The problem that sometimes inlining a larger function into a cold path
> of other function leads to slowdown is rather old problem (see PR49194).
> In general most of logic in inliner is about determining when inlining
> is going to be useful. It is hard to tell when it is *not* going to be
> useful, so I am bit unsure what to do about these cases in general.
> 
> Recently I experimented path which disables inlining functions called
> once when callee and caller frequency differ (i.e. one is hot other is
> cold) etc, but it did not lead to consistent perormance improvements.
> 
> What situation did you run into?

Thanks.  So the caller could be {hot, cold} x {large, small}, and the same for
the callee, which produces up to 4 * 4 = 16 combinations.  Agreed that it is
hard to define "useful", and "useful" certainly doesn't always translate into
performance improvements. :)

My case is A1(1) calls A2(2), A2(2) calls A3(3).  A1, A2 and A3 are
specialized cloned nodes with different inputs; they are all hot, called once,
and each has about 1000+ insns.  By default, the large-function-growth/insns
limits are both not reached, so A1 will inline A2 and A2 will inline A3, which
is 40% slower than no-inline.

If I adjust large-function-growth/insns to allow only A2 to inline A3, the
performance is 20% slower than no-inline.
All the inlinings are generated in functions called once.
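
(For reference, the two parameters under discussion can be tightened on the
command line; the values below are purely illustrative, not recommendations:)

gcc -O2 --param large-function-insns=800 --param large-function-growth=40 test.c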


Xionghu

> 
> Honza
> 



*ping* [patch, fortran] Fix PR 65428, ICE on nested empty array constructors

2020-01-07 Thread Thomas Koenig

On 02.01.20 at 23:35, Thomas Koenig wrote:

Hello world,

the attached patch fixes an ICE where an array constructor
containing an empty array constructor with a type didn't
get the type from the inner constructor.

The solution is to stash the type away in yet another variable
and only use it if the constructor turns out to be empty, and
the type has not been set some other way.

Regression-tested. OK for trunk?


Ping?


Re: [PATCH] PR libstdc++/92124 on hashtable

2020-01-07 Thread François Dumont

On 1/6/20 4:17 PM, Jonathan Wakely wrote:

On 07/11/19 20:28 +0100, François Dumont wrote:
From what I understood from the recent fix, the unordered containers need
to be updated the same way.


I hope you'll appreciate the usage of rvalue forwarding. Containers 


Yes, I think it makes sense.

node values are moved as soon as _M_assign is called with an rvalue
reference to the source container.


Additionally, this patch removes usages of lambdas in _Hashtable.

If you confirm it I'll check for the same on _Rb_tree.

    * include/bits/hashtable.h (_Hashtable<>::__alloc_node_gen_t): New
    template alias.
    (_Hashtable<>::__mv_if_value_type_mv_noexcept): New.
    (_Hashtable<>::__fwd_value): New.
    (_Hashtable<>::_M_assign_elements<>): Remove _NodeGenerator template
    parameter.
    (_Hashtable<>::_M_assign<>): Add _Ht template parameter.
    (_Hashtable<>::operator=(const _Hashtable<>&)): Adapt.
    (_Hashtable<>::_M_move_assign): Adapt.
    (_Hashtable<>::_Hashtable(const _Hashtable&)): Adapt.
    (_Hashtable<>::_Hashtable(const _Hashtable&, const 
allocator_type&)):

    Adapt.
    (_Hashtable<>::_Hashtable(_Hashtable&&, const allocator_type&)):
    Adapt.
    * testsuite/23_containers/unordered_set/92124.cc: New.

Tested under Linux x86_64.

Ok to commit ?

François



diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index ab579a7059e..c2b2219d471 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -255,6 +255,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

  using __reuse_or_alloc_node_gen_t =
__detail::_ReuseOrAllocNode<__node_alloc_type>;
+  using __alloc_node_gen_t =
+    __detail::_AllocNode<__node_alloc_type>;

  // Simple RAII type for managing a node containing an element
  struct _Scoped_node
@@ -280,6 +282,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__node_type* _M_node;
  };

+  template<typename _Tp>
+    static constexpr
+    typename conditional<__move_if_noexcept_cond<value_type>::value,
+			 const _Tp&, _Tp&&>::type
+    __mv_if_value_type_mv_noexcept(_Tp& __x) noexcept
+    { return std::move(__x); }


This is only used in one place. Adding a function doesn't seem
worthwhile, you can just do this where you use it:

  using _Fwd_Ht = typename
    conditional<__move_if_noexcept_cond<value_type>::value,
		const _Ht&, _Ht>::type;
  _M_assign(std::forward<_Fwd_Ht>(__ht), __alloc_gen);



+  template<typename _Ht>
+    static constexpr
+    typename conditional<!is_lvalue_reference<_Ht>::value,
+			 value_type&&, const value_type&>::type


I think I'd prefer to reverse the condition, i.e.

  typename conditional<is_lvalue_reference<_Ht>::value,
		       const value_type&, value_type&&>::type


+    __fwd_value(_Ht&&, value_type& __val) noexcept
+    { return std::move(__val); }


Since this doesn't use the first parameter, it can just be removed:

  template<typename _Ht>
    static constexpr
    typename conditional<is_lvalue_reference<_Ht>::value,
			 const value_type&, value_type&&>::type
    __fwd_value(value_type& __val) noexcept
    { return std::move(__val); }

That simplifies the usage from:

  __fwd_value(std::forward<_Ht>(__ht), __ht_n->_M_v()))

so it becomes:

  __fwd_value<_Ht>(__ht_n->_M_v()))


Maybe __fwd_value<_Ht> should be __fwd_value_for<_Ht> so it's clear
how it depends on _Ht (because otherwise it looks like it is
forwarding as _Ht&& like std::forward<_Ht> would).

What do you think?


The simpler the better. So here is the cleaned-up patch.

Regarding the comment, I also rewrote it a little because the copy/move of
elements doesn't depend on _NodeGenerator anymore.


    PR libstdc++/92124
    * include/bits/hashtable.h (_Hashtable<>::__alloc_node_gen_t): New
    template alias.
    (_Hashtable<>::__fwd_value_for): New.
    (_Hashtable<>::_M_assign_elements<>): Remove _NodeGenerator template
    parameter.
    (_Hashtable<>::_M_assign<>): Add _Ht template parameter.
    (_Hashtable<>::operator=(const _Hashtable<>&)): Adapt.
    (_Hashtable<>::_M_move_assign): Adapt.
    (_Hashtable<>::_Hashtable(const _Hashtable&)): Adapt.
    (_Hashtable<>::_Hashtable(const _Hashtable&, const allocator_type&)):
    Adapt.
    (_Hashtable<>::_Hashtable(_Hashtable&&, const allocator_type&)):
    Adapt.
    * testsuite/23_containers/unordered_set/92124.cc: New.

Tested under Linux x86_64.

Ok to commit ?

François


diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 06b3cca6437..48eaa7600f3 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -255,6 +255,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   using __reuse_or_alloc_node_gen_t =
 	__detail::_ReuseOrAllocNode<__node_alloc_type>;
+  using __alloc_node_gen_t =
+	__detail::_AllocNode<__node_alloc_type>;
 
   // Simple RAII type for managing a node containing an element
   struct _Scoped_node
@@ -280,6 +282,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__node_type* _M_node;
   };
 
+  template<typename _Ht>

Re: [PATCH GCC11]Improve uninitialized warning with value range info

2020-01-07 Thread bin.cheng
Sorry, here is the patch.
--
Sender:bin.cheng 
Sent At:2020 Jan. 8 (Wed.) 12:58
Recipient:GCC Patches 
Subject:[PATCH GCC11]Improve uninitialized warning with value range info


Hi,

Function use_pred_not_overlap_with_undef_path_pred of
pass_late_warn_uninitialized checks if the predicate of a variable use
overlaps with the predicate of an undefined control flow path.
For now, it only checks an ssa_var comparing against a constant; this can be
improved to handle an ssa_var comparing against another ssa_var with value
range info, as described in the comment:

+ /* Check value range info of rhs, do following transforms:
+  flag_var < [min, max]  ->  flag_var < max
+  flag_var > [min, max]  ->  flag_var > min
+
+We can also transform LE_EXPR/GE_EXPR to LT_EXPR/GT_EXPR:
+  flag_var <= [min, max] ->  flag_var < [min, max+1]
+  flag_var >= [min, max] ->  flag_var > [min-1, max]
+if no overflow/wrap.  */
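
(A self-contained sketch of the transform described in that comment; the
names and types here are illustrative, not those used in tree-ssa-uninit.c:)

#include <climits>

enum cmp_code { LT, GT, LE, GE };
struct value_range_ { long min, max; };

/* Rewrite "flag_var CODE [min, max]" into a comparison against a single
   constant, refusing the LE/GE widening when it would overflow/wrap.  */
static bool
fold_cmp_against_range (cmp_code &code, const value_range_ &r, long &cst)
{
  switch (code)
    {
    case LE:		/* flag_var <= [min, max] -> flag_var < max + 1 */
      if (r.max == LONG_MAX)
	return false;
      code = LT; cst = r.max + 1; return true;
    case GE:		/* flag_var >= [min, max] -> flag_var > min - 1 */
      if (r.min == LONG_MIN)
	return false;
      code = GT; cst = r.min - 1; return true;
    case LT:		/* flag_var < [min, max] -> flag_var < max */
      cst = r.max; return true;
    case GT:		/* flag_var > [min, max] -> flag_var > min */
      cst = r.min; return true;
    }
  return false;
}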

This change can avoid some false warnings.  Bootstrapped and tested on x86_64;
any comments?

Thanks,
bin

2020-01-08  Bin Cheng  

 * tree-ssa-uninit.c (find_var_cmp_const): New function.
 (use_pred_not_overlap_with_undef_path_pred): Call above.

check-uninit_warning-with-vrinfo-20200108.txt
Description: Binary data


[PATCH] Align __patchable_function_entries to POINTER_SIZE

2020-01-07 Thread Fangrui Song via gcc-patches

Fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93194
>From 60f489f2bf2b32afd1bdbb2405bb028dcedf82cc Mon Sep 17 00:00:00 2001
From: Fangrui Song 
Date: Tue, 7 Jan 2020 20:46:26 -0800
Subject: [PATCH] Align __patchable_function_entries to POINTER_SIZE
To: gcc-patches@gcc.gnu.org

---
 gcc/ChangeLog   | 5 +
 gcc/targhooks.c | 1 +
 2 files changed, 6 insertions(+)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 44ae44c8260..74fdb773dc9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2020-01-08  Fangrui Song  
+
+	* targhooks.c (default_print_patchable_function_entry): Align to
+	POINTER_SIZE.
+
 2020-01-08  Luo Xiong Hu  
 
 	* ipa-inline.c (caller_growth_limits): Restore the AND.
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 4819bb8058f..a3f83918f8e 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1812,6 +1812,7 @@ default_print_patchable_function_entry (FILE *file,
 
   switch_to_section (get_section ("__patchable_function_entries",
   SECTION_WRITE | SECTION_RELRO, NULL));
+  assemble_align (POINTER_SIZE);
   fputs (asm_op, file);
   assemble_name_raw (file, buf);
   fputc ('\n', file);
-- 
2.24.0
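
(For context, a hedged illustration of the issue: each element recorded in
__patchable_function_entries is a code address, i.e. pointer-sized, which is
why the section needs POINTER_SIZE alignment.  A minimal way to emit such
entries, using the documented GCC function attribute:)

/* Each patchable function gets NOPs at its entry, and the entry address
   is recorded in the __patchable_function_entries section; without the
   assemble_align call added above, that record could land at a
   misaligned offset.  */
__attribute__((patchable_function_entry(2, 0)))
void traced (void)
{
}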



Re: [PATCH] Rename condition_variable_any::wait_on_* methods

2020-01-07 Thread Thomas Rodgers


Tested x86_64-pc-linux-gnu, committed to trunk.

Jonathan Wakely writes:

> On 10/12/19 22:38 -0800, Thomas Rodgers wrote:
>>User-agent: mu4e 1.3.4; emacs 26.2
>>  * include/std/condition_variable
>>  (condition_variable_any::wait_on(_Lock&, stop_token, _Predicate): Rename
>>  to match current draft standard.
>>  (condition_variable_any::wait_on_until(_Lock&, stop_token,
>>  const chrono::time_point<>&, _Predicate): Likewise.
>>  (condition_variable_any::wait_on_for(_Lock&, stop_token,
>>  const chrono::duration<>&, _Predicate(: Likewise.
>
> The closing paren here is an opening one. OK for trunk with that
> fixed.
>
> Optional tweaks ...
>
> Since the names wait_on, wait_on_until and wait_on_for are
> unambiguous, you could just list them without parameters i.e.
>
>   * include/std/condition_variable (condition_variable_any::wait_on)
>(condition_variable_any::wait_on_until)
>(condition_variable_any::wait_on_for): Rename to match current
>draft standard.
>
>>  * testsuite/30_threads/condition_variable_any/stop_token/wait_on.cc 
>> (main):
>
> We don't generally both to say which functions were modified in tests,
> so the (main) isn't needed here, but is harmless.
>
>>  Adjust tests to account for renamed methods.



Go patch committed: Fix loopdepth tracking in array slice

2020-01-07 Thread Ian Lance Taylor
This Go frontend patch by Cherry Zhang fixes the loopdepth tracking in
array slicing expressions in escape analysis.  In the gc compiler, for
slicing an array, its AST has an implicit address operation node.
There isn't such a node in the gofrontend AST.  During escape
analysis, we create a fake node to mimic the gc compiler's behavior.
For the fake node, the loopdepth was not tracked correctly, causing
miscompilation.  Since this is an address operation, do the same thing
as we do for the address operator.  This fixes
https://golang.org/issue/36404.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to trunk and GCC 9 branch.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 279979)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-6fa9657df508ff4d7760cf1abfad3611ba808561
+e0790a756e9ba77e2d3d6ef5d0abbb11dd71211b
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/escape.cc
===
--- gcc/go/gofrontend/escape.cc (revision 279815)
+++ gcc/go/gofrontend/escape.cc (working copy)
@@ -1906,7 +1906,8 @@ Escape_analysis_assign::expression(Expre
 this->context_->track(addr_node);
 
 Node::Escape_state* addr_state = addr_node->state(this->context_, 
NULL);
-addr_state->loop_depth = array_state->loop_depth;
+if (array_state->loop_depth != 0)
+  addr_state->loop_depth = array_state->loop_depth;
   }
   }
   break;


Go patch committed: Stop using __go_runtime_error

2020-01-07 Thread Ian Lance Taylor
This patch to the Go frontend and libgo stops using
__go_runtime_error.  We use specific panic functions instead, which
are mostly already in the runtime package.  We also correct "defer
nil" to panic when we execute the defer, rather than throw when we
queue it.  Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 279962)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-86d223eaccecff72b44cd23a014bc028b658055e
+6fa9657df508ff4d7760cf1abfad3611ba808561
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 279962)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -5270,13 +5270,13 @@ Unary_expression::do_get_backend(Transla
 Bexpression* compare =
 gogo->backend()->binary_expression(OPERATOR_EQEQ, tbexpr,
nil, loc);
-Bexpression* crash =
-gogo->runtime_error(RUNTIME_ERROR_NIL_DEREFERENCE,
-loc)->get_backend(context);
+   Expression* crash = Runtime::make_call(Runtime::PANIC_MEM,
+  loc, 0);
+   Bexpression* bcrash = crash->get_backend(context);
 Bfunction* bfn = context->function()->func_value()->get_decl();
 bexpr = gogo->backend()->conditional_expression(bfn, btype,
 compare,
-crash, ubexpr,
+bcrash, ubexpr,
 loc);
 known_valid = true;
 break;
@@ -7060,12 +7060,12 @@ Binary_expression::do_get_backend(Transl
  Bexpression* compare =
gogo->backend()->binary_expression(OPERATOR_LT, right, zero_expr,
   loc);
- const int errcode = RUNTIME_ERROR_SHIFT_BY_NEGATIVE;
- Bexpression* crash =
-   gogo->runtime_error(errcode, loc)->get_backend(context);
+ Expression* crash = Runtime::make_call(Runtime::PANIC_SHIFT,
+loc, 0);
+ Bexpression* bcrash = crash->get_backend(context);
  Bfunction* bfn = context->function()->func_value()->get_decl();
  ret = gogo->backend()->conditional_expression(bfn, btype, compare,
-   crash, ret, loc);
+   bcrash, ret, loc);
}
 }
 
@@ -7081,15 +7081,14 @@ Binary_expression::do_get_backend(Transl
   gogo->backend()->binary_expression(OPERATOR_EQEQ,
  right, zero_expr, loc);
 
- // __go_runtime_error(RUNTIME_ERROR_DIVISION_BY_ZERO)
- int errcode = RUNTIME_ERROR_DIVISION_BY_ZERO;
- Bexpression* crash = gogo->runtime_error(errcode,
-  loc)->get_backend(context);
+ Expression* crash = Runtime::make_call(Runtime::PANIC_DIVIDE,
+loc, 0);
+ Bexpression* bcrash = crash->get_backend(context);
 
- // right == 0 ? (__go_runtime_error(...), 0) : ret
+ // right == 0 ? (panicdivide(), 0) : ret
   Bfunction* bfn = context->function()->func_value()->get_decl();
   ret = gogo->backend()->conditional_expression(bfn, btype,
-check, crash,
+check, bcrash,
ret, loc);
}
 
@@ -8071,8 +8070,7 @@ Bound_method_expression::do_flatten(Gogo
 
   if (nil_check != NULL)
 {
-  Expression* crash = gogo->runtime_error(RUNTIME_ERROR_NIL_DEREFERENCE,
- loc);
+  Expression* crash = Runtime::make_call(Runtime::PANIC_MEM, loc, 0);
   // Fix the type of the conditional expression by pretending to
   // evaluate to RET either way through the conditional.
   crash = Expression::make_compound(crash, ret, loc);
@@ -8886,11 +8884,8 @@ Builtin_call_expression::flatten_append(
   Expression* zero = Expression::make_integer_ul(0, int_type, loc);
   Expression* cond = Expression::make_binary(OPERATOR_LT, len2,
  zero, loc);
-  Expression* arg =
-

Re: [PATCH] libstdcxx: Update ctype_base.h from NetBSD upstream

2020-01-07 Thread Jason Thorpe


> On Jan 7, 2020, at 7:43 AM, Jonathan Wakely  wrote:
> 
> For Jason and Krister's benefit, that last comment was referring to
> an earlier suggestion to not try to support old NetBSD releases, see
> https://gcc.gnu.org/ml/libstdc++/2020-01/msg00026.html
> 
>> I think we need the netbsd target maintainers (CC'd) to decide whether
>> GCC should still support older releases or drop that support for GCC
>> 10. Usually I'd say we need a period of deprecation, but if GCC
>> doesn't currently build on NetBSD then maybe that's unnecessary.

The affected NetBSD versions are NetBSD 6 and earlier, which are EOL from the 
NetBSD perspective, so I think this is OK.

-- thorpej



Re: [committed] Fix UB in gfc_trans_omp_clauses (PR fortran/93162)

2020-01-07 Thread Thomas Schwinge
Hi!

On 2020-01-07T08:20:44+0100, Jakub Jelinek  wrote:
> When I wrote the code, for map clause the arguments couldn't contain any
> REF_COMPONENT (nor REF_UNQUIRY nor REF_SUBSTRING) and therefore it was
> ok (although unclean) to just look at u.ar.type

Jakub, thanks for fixing this.

> but now that REF_COMPONENT
> can appear there (so far for OpenACC only, although OpenMP 5.0 also allows
> it), that is no longer the case.

Do I understand correctly that this relates to the r279631 "Fortran
polymorphic class-type support for OpenACC" changes?

> Not really sure if the code doesn't need
> further changes (how will e.g. var%comp(:) be handled in the mapping
> clauses?), but this conditional is clearly wrong.

Julian and/or Tobias, will you please review that at some point (in the
context of the OpenACC usage/support recently introduced), and add test
cases etc.  (... but this need not be done right now.)


Regards
 Thomas


> 2020-01-07  Jakub Jelinek  
>
>   PR fortran/93162
>   * trans-openmp.c (gfc_trans_omp_clauses): Check for REF_ARRAY type
>   before testing u.ar.type == AR_FULL.
>
> --- gcc/fortran/trans-openmp.c.jj 2020-01-04 00:14:28.511400132 +0100
+++ gcc/fortran/trans-openmp.c	2020-01-06 17:01:10.538816323 +0100
> @@ -2495,7 +2495,9 @@ gfc_trans_omp_clauses (stmtblock_t *bloc
> tree decl = gfc_trans_omp_variable (n->sym, false);
> if (DECL_P (decl))
>   TREE_ADDRESSABLE (decl) = 1;
> -   if (n->expr == NULL || n->expr->ref->u.ar.type == AR_FULL)
> +   if (n->expr == NULL
> +   || (n->expr->ref->type == REF_ARRAY
> +   && n->expr->ref->u.ar.type == AR_FULL))
>   {
> tree present = gfc_omp_check_optional_argument (decl, true);
> if (n->sym->ts.type == BT_CLASS)


signature.asc
Description: PGP signature


Re: [patch] relax aarch64 stack-clash tests depedence on alloca.h

2020-01-07 Thread Richard Sandiford
Olivier Hainque  writes:
> Hi Andrew,
>
>> On 6 Jan 2020, at 23:24, Andrew Pinski  wrote:
>> Just one small suggestion:
>
> Sure
>
>> Instead of:
>> -  char* pStr = alloca(SIZE);
>> +  char* pStr = __builtin_alloca(SIZE);
>> 
>> Why not just do:
>> -#include 
>> +#define alloca __builtin_alloca
>
> Yes, good idea.
>
> Revised patch attached, where I also added a comment
> explaining why we are doing this.
>
> Re-tested on aarch64-linux.
>
> Is this one ok ?
>
> Thanks,
>
> Olivier
>
> 2020-01-06  Olivier Hainque  
>   Alexandre Oliva  
>
>   * gcc.target/aarch64/stack-check-alloca.h: Remove
>   #include alloca.h. #define alloca __builtin_alloca
>   instead.
>   * gcc.target/aarch64/stack-check-alloca-1.c: Add
>   { dg-require-effective-target alloca }.
>   * gcc.target/aarch64/stack-check-alloca-2.c: Likewise.
>   * gcc.target/aarch64/stack-check-alloca-3.c: Likewise.
>   * gcc.target/aarch64/stack-check-alloca-4.c: Likewise.
>   * gcc.target/aarch64/stack-check-alloca-5.c: Likewise.
>   * gcc.target/aarch64/stack-check-alloca-6.c: Likewise.
>   * gcc.target/aarch64/stack-check-alloca-7.c: Likewise.
>   * gcc.target/aarch64/stack-check-alloca-8.c: Likewise.
>   * gcc.target/aarch64/stack-check-alloca-9.c: Likewise.
>   * gcc.target/aarch64/stack-check-alloca-10.c: Likewise.

OK, thanks.

Richard


Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [1/2]

2020-01-07 Thread Richard Sandiford
Thanks for the update.  The new patch looks really good, just some
minor comments.

Stam Markianos-Wright  writes:
> [...]
> Also I've update the filenames of all our tests to make them a bit clearer:
>
> C tests:
>
> __ bfloat16_scalar_compile_1.c to bfloat16_scalar_compile_3.c: Compilation of 
> scalar moves/loads/stores with "-march8.2-a+bf16", "-march8.2-a and +bf16 
> target 
> pragma", "-march8.2-a" (now does not error out at all). There now include 
> register asms to check more MOV alternatives.
>
> __ bfloat16_scalar_compile_4.c: The _Complex error test.
>
> __ bfloat16_simd_compile_1.c to bfloat16_simd_compile_3.c: Likewise to 
> x_scalar_x, but also include (vector) 0x1234.. compilation (no assembler 
> scan).

Sounds good to me, although TBH the "_compile" feels a bit redundant.

> I had also done a small c++ test, but have chosen to shift that to the [2/2] 
> patch because it is currently being blocked by target_invalid_conversion.

OK.  Does that include the mangling test?

> [...]
 - a test that involves moving constants, for both scalars and vectors.
 You can create zero scalar constants in C++ using bfloat16_t() etc.
 For vectors it's possible to do things like:

   typedef short v2bf __attribute__((vector_size(4)));
   v2hi foo (void) { return (v2hi) 0x12345678; }

 The same sort of things should work for bfloat16x4_t and bfloat16x8_t.
>>>
>>> Leaving this as an open issue for now because I'm not 100% sure what we
>>> should/shouldn't be allowing past the tree-level target hooks.
>>>
>>> If we do want to block this we would do this in the [2/2] patch.
>>> I will come back to it and create a scan-assembler test when I'm more clear 
>>> on
>>> what we should and shouldn't allow at the higher level :)
>> 
>> FWIW, I'm not sure we should go out of our way to disallow this.
>> Preventing bfloat16_t() in C++ would IMO be unnatural.  And the
>> "(vector) vector-sized-integer" syntax specifically treats the vector
>> as a bundle of bits without really caring what the element type is.
>> Even if we did manage to forbid the conversion in that context,
>> it would still be possible to achieve the same thing using:
>> 
>> v2hi
>> foo (void)
>> {
>>   union { v2hi v; unsigned int i; } u;
>>   u.i = 0x12345678;
>>   return u.v;
>> }
>> 
> Added the compilation of "(vector) vector-sized-integer" in the vector tests.
>
> But target_invalid_conversion in the [2/2] patch is a complication to this 
> (as 
> with bfloat_16t() in c++.
>
> I was under the impression that the original intent of bfloat was for it to 
> be 
> storage only, with any initialisation happening through the float32 convert 
> intrinsic.
>
> Either I'd be happy to allow it, but it does feel like we'd slightly be going 
> against what's the ACLE currently.
> However, looking back at it now, it only mentions using ACLE intrinsics over 
> C 
> operators, so I'd be happy to allow this for vectors.
>
> For scalars though, if we e.g. were to allow:
>
> bfloat16_t (0x1234);
>
> on a single bfloat, I don't see how we could still block conversions like:
>
> bfloat16_t scalar1 = 0.1;
> bfloat16_t scalar2 = 0;
> bfloat16_t scalar3 = is_a_float;
>
> Agreed that the union {} would still always slip through, though.

It wasn't clear sorry, but I meant literally "bfloat16_t()", i.e.
construction with zero initialisation.  I agree we don't want to
support "bfloat16_t(0.25)" etc.

> [...]
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c 
>>> b/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c
>>> new file mode 100644
>>> index 000..f2bef671deb
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c
>>> @@ -0,0 +1,51 @@
>>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>>> +/* { dg-add-options arm_v8_2a_bf16_neon }  */
>>> +/* { dg-additional-options "-O3 --save-temps" } */
>>> +/* { dg-final { check-function-bodies "**" "" } } */
>>> +
>>> +#include 
>>> +
>>> +/*
>>> +**stacktest1:
>>> +** ...
>>> +** str h0, \[sp, [0-9]+\]
>>> +** ldr h0, \[sp, [0-9]+\]
>>> +** ...
>>> +** ret
>>> +*/
>>> +bfloat16_t stacktest1 (bfloat16_t __a)
>>> +{
>>> +  volatile bfloat16_t b = __a;
>>> +  return b;
>>> +}
>>> +
>>> +/*
>>> +**stacktest2:
>>> +** ...
>>> +** str d0, \[sp, [0-9]+\]
>>> +** ldr d0, \[sp, [0-9]+\]
>>> +** ...
>>> +** ret
>>> +*/
>>> +bfloat16x4_t stacktest2 (bfloat16x4_t __a)
>>> +{
>>> +  volatile bfloat16x4_t b = __a;
>>> +  return b;
>>> +}
>>> +
>>> +/*
>>> +**stacktest3:
>>> +** ...
>>> +** str q0, \[sp\]
>>> +** ldr q0, \[sp\]
>>> +** ...
>>> +** ret
>>> +*/
>>> +bfloat16x8_t stacktest3 (bfloat16x8_t __a)
>>> +{
>>> +  volatile bfloat16x8_t b = __a;
>>> +  return b;
>>> +}
>> 
>> Might be a daft question, but why do we have an offset for the first
>> two and not for the last one?  Might be worth hard-coding 


Re: [PATCH] Relax invalidation of TOP N counters in PGO.

2020-01-07 Thread Jan Hubicka
> On 1/7/20 11:08 AM, Jan Hubicka wrote:
> > > On 1/6/20 8:03 PM, Jan Hubicka wrote:
> > > > > > > OK
> > > > > > Actually I am not so sure about this patch - how do we ensure
> > > > > > reproducibility in this case?
> > > > > ISTM that anyone trying to have reproducible builds shouldn't be using
> > > > > PGO based optimizations.
> > > > 
> > > > OpenSUSE does that. Builds are supposed to be reproducible + PGO is used
> > > > for number of core packages (like GCC itself).  This was motivation of
> > > > original Martin's change to drop TOPN values from profile more
> > > > agressively.
> > > 
> > > I agree with Honza that we should provide as reproducible builds as 
> > > possible.
> > > On the other hand, we should not be rigid and allow a reasonable sample 
> > > loss.
> > > That's why I added a parameter that be tweaked at openSUSE side.
> > > 
> > > Note that original motivation for the patch was your PR :) So that I'm 
> > > surprised
> > > I don't like the patch much.
> > 
> > I know. However I wonder if this can be done in reproducible way.
> 
> My patch deals with TOP N counters for which we expect quite a high
> frequency to trigger an optimization. That said, such values should
> not be put out of a TOP N counter.
> 
> > Problem is that small values from profile run polutes the on-disk
> > counter and eventually renders it invalid.
> 
> Considering my patch, how will these small values lead to an invalid counter?
> 
> >  If libgcov before meerging
> > or streaming dropped alues smaller than TOTAL_COUNT/N from the histogram
> > it may be enough to fit in the number of entires? (at least in the
> > testcase they seem very dominated by one value).
> 
> This will have probably the same effect as the patch I've suggested.

It will not.  If you do that before merging/streaming, it will remain
reproducible (i.e. no matter in what order you merge the data, you will
always get the same histogram).

I may be missing something, but I do not see how you can recover
reproducibility from the info you stream without invalidating the counter
at merge time.
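
(A hedged, self-contained sketch of the pruning idea under discussion; the
real TOPN counters live in libgcov and look different:)

#include <cstdint>
#include <map>

static const unsigned N = 4;	// TOPN counter capacity (illustrative)

// Drop histogram entries whose count is below total/N before merging or
// streaming.  The point made above is that pruning at this stage keeps
// the surviving set independent of the order in which profiles are
// merged, so the on-disk result stays reproducible.
void
prune_topn (std::map<std::uint64_t, std::uint64_t> &histogram,
	    std::uint64_t total)
{
  for (auto it = histogram.begin (); it != histogram.end (); )
    if (it->second < total / N)
      it = histogram.erase (it);
    else
      ++it;
}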

Honza


Re: [PATCH] Make warn_inline Optimization option.

2020-01-07 Thread Jan Hubicka
> On Tue, Jan 7, 2020 at 3:26 PM Jan Hubicka  wrote:
> >
> > > Err - Optimization also lists it in some -help section?  It's a Warning
> > > option and certainly we don't handle per-function Warnings in general
> > > (with LTO) even though we have #pragma GCC diagnostic, no?
> > >
> > > I'm not sure why we force warn_inline to zero with -O0, it seems much
> > > better to guard false warnings in some other way for -O0?
> >
> > Well, we can do that with warn_inline, but in general we do want to
> > stream late warnings (so things like -Wmaybe-uninitialized works sort of
> > as expected for -flto). So I guess we want way to mark option as part of
> > TARGET_OPTIMIZATION_NODE even though it is not realy an optimization
> > option but parameter, warning or semantic change.
> 
> Given all warning options can be enabled/disabled via #pragma GCC diagnostic
> all Warning annotated options should be implicitely 'Optimization' for
> the purpose
> of LTO streaming then?

Well, perhaps they can be marked, but for late warnings this does
not work:

__attribute__ ((warning ("haha"))) void test () { }
#pragma GCC diagnostic ignored "-Wattribute-warning"
void test2 () { test (); }

We have many warning options but only a few of them are late - it would be
nice to have them explicitly marked somehow IMO (by design and to avoid
streaming a lot of useless flags).

honza

> 
> Richard.
> 
> > Honza


Re: [PATCH] libstdcxx: Update ctype_base.h from NetBSD upstream

2020-01-07 Thread Jonathan Wakely

On 07/01/20 15:40 +, Jonathan Wakely wrote:

On 07/01/20 01:30 +0100, Kamil Rytarowski wrote:

On 07.01.2020 01:26, Jonathan Wakely wrote:

On 06/01/20 23:20 +0100, Kamil Rytarowski wrote:

On 06.01.2020 16:24, Jonathan Wakely wrote:

On 22/12/19 09:36 +1000, Gerald Pfeifer wrote:

Hi Matthew,

On Mon, 4 Feb 2019, Matthew Bauer wrote:

The ctype_base.h file in libstdc++-v3 is out of date for NetBSD. They
have changed their ctype.h definition. It was updated in their intree
libstdc++-v3 but not in the GCC one. My understanding is this is a
straightforward rewrite. I've attached my own patch, but the file can
be obtained directly here:

http://cvsweb.netbsd.org/bsdweb.cgi/src/external/gpl3/gcc/dist/libstdc%2b%2b-v3/config/os/bsd/netbsd/ctype_base.h



With the attached patch, libstdc++-v3 can successfully be built with
NetBSD headers (along with --disable-libcilkrts).


I noticed this has not been applied yet, nor seen a follow-up?, and
also
noticed it went to the gcc-patches list, but not libstd...@gcc.gnu.org.

Let me re-address this to libstd...@gcc.gnu.org in the hope the
maintainers there will have a look.

Gerald



diff --git a/libstdc++-v3/config/os/bsd/netbsd/ctype_base.h
b/libstdc++-v3/config/os/bsd/netbsd/ctype_base.h
index ff3ec893974..21eccf9fde1 100644
--- a/libstdc++-v3/config/os/bsd/netbsd/ctype_base.h
+++ b/libstdc++-v3/config/os/bsd/netbsd/ctype_base.h
@@ -38,40 +38,46 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  /// @brief  Base class for ctype.
  struct ctype_base
  {
-    // Non-standard typedefs.
-    typedef const unsigned char*    __to_type;

    // NB: Offsets into ctype::_M_table force a particular size
    // on the mask type. Because of this, we don't use an enum.
-    typedef unsigned char  mask;

#ifndef _CTYPE_U
-    static const mask upper    = _U;
-    static const mask lower = _L;
-    static const mask alpha = _U | _L;
-    static const mask digit = _N;
-    static const mask xdigit = _N | _X;
-    static const mask space = _S;
-    static const mask print = _P | _U | _L | _N | _B;
-    static const mask graph = _P | _U | _L | _N;
-    static const mask cntrl = _C;
-    static const mask punct = _P;
-    static const mask alnum = _U | _L | _N;
+    // Non-standard typedefs.
+    typedef const unsigned char*    __to_type;
+
+    typedef unsigned char    mask;
+
+    static const mask upper    = _U;
+    static const mask lower    = _L;
+    static const mask alpha    = _U | _L;
+    static const mask digit    = _N;
+    static const mask xdigit    = _N | _X;
+    static const mask space    = _S;
+    static const mask print    = _P | _U | _L | _N | _B;
+    static const mask graph    = _P | _U | _L | _N;
+    static const mask cntrl    = _C;
+    static const mask punct    = _P;
+    static const mask alnum    = _U | _L | _N;
#else
-    static const mask upper    = _CTYPE_U;
-    static const mask lower = _CTYPE_L;
-    static const mask alpha = _CTYPE_U | _CTYPE_L;
-    static const mask digit = _CTYPE_N;
-    static const mask xdigit = _CTYPE_N | _CTYPE_X;
-    static const mask space = _CTYPE_S;
-    static const mask print = _CTYPE_P | _CTYPE_U | _CTYPE_L |
_CTYPE_N | _CTYPE_B;
-    static const mask graph = _CTYPE_P | _CTYPE_U | _CTYPE_L |
_CTYPE_N;
-    static const mask cntrl = _CTYPE_C;
-    static const mask punct = _CTYPE_P;
-    static const mask alnum = _CTYPE_U | _CTYPE_L | _CTYPE_N;
+    typedef const unsigned short*    __to_type;
+
+    typedef unsigned short    mask;


I seem to recall looking at this previously and noting that the change
to ctype_base::mask is an ABI break. It means that code compiled with
old versions of GCC or on old versions of NetBSD will not be ABI
compatible with code compiled by a new GCC on a new version of NetBSD.

If the NetBSD maintainers are OK with that, then we can go ahead and
change it.




We are fine with ABI breaks as we bump libstdc++ major on each upgrade
in base.


That affects the libstdc++ built from NetBSD ports collection, but if
somebody builds GCC themselves using the upstream sources (which is
what these patches are trying to change) then they don't get that
bumped major version.

So if somebody builds GCC 9 themselves, and then builds GCC 10
themselves on the same machine, they'll get two versions of
libstdc++.so that are not ABI compatible.
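
(A hedged illustration of the break, not taken from the thread: the size of
ctype_base::mask is baked into every object compiled against the headers.)

#include <locale>
#include <cstdio>

int
main ()
{
  // Old NetBSD headers: mask is unsigned char, so this prints 1.
  // Updated headers: mask is unsigned short, so this prints 2.
  // Two objects disagreeing on this cannot exchange mask values safely,
  // and the values of the static constants differ as well.
  std::printf ("%zu\n", sizeof (std::ctype_base::mask));
}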




I know, but we don't support this in NetBSD. We ship with our own major
numbers for GCC. If someone wants to maintain GCC out of base or out of
pkgsrc, they will likely have more problems than ABI breakage.


But that's exactly what upstream GCC sources are (out of pkgsrc), and
that's where I'm being asked to include this patch.

NetBSD is free to not support it, of course, but we (GCC) have
different schedules and different priorities.

And if older versions of NetBSD are not supported, can we just drop
the `#ifndef _CTYPE_U` branch in config/os/bsd/netbsd/ctype_base.h ?

It seems 

Re: [PING^5] Re: [PATCH 1/2] Add a pass to automatically add ptwrite instrumentation

2020-01-07 Thread Andi Kleen
Andi Kleen  writes:

Ping!

> Andi Kleen  writes:
>
> Ping!
>
>> Andi Kleen  writes:
>>
>> Ping!
>>
>>> Andi Kleen  writes:
>>>
>>> Ping!
>>>
 Andi Kleen  writes:

 Ping!

> From: Andi Kleen 
>
> [v4: Rebased on current tree. Avoid some redundant log statements
> for locals and a few other fixes.  Fix some comments. Improve
> documentation. Did some studies on the debug information quality,
> see below]
>
> Add a new pass to automatically instrument changes to variables
> with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte
> field into an Processor Trace log, which allows low over head
> logging of information. Essentially it's a hardware accelerated
> printf.


Re: [PATCH] ipa-inline: Adjust condition for caller_growth_limits

2020-01-07 Thread Jan Hubicka
> 
> 
> On 2020/1/7 16:40, Jan Hubicka wrote:
> >> On Mon, 2020-01-06 at 01:03 -0600, Xiong Hu Luo wrote:
> >>> Inline should return failure either (newsize > param_large_function_insns)
> >>> OR (newsize > limit).  Sometimes newsize is larger than
> >>> param_large_function_insns, but smaller than limit, inline doesn't return
> >>> failure even if the new function is a large function.
> >>> Therefore, param_large_function_insns and param_large_function_growth 
> >>> should be
> >>> OR instead of AND, otherwise --param large-function-growth won't
> >>> work correctly with --param large-function-insns.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>   2020-01-06  Luo Xiong Hu  
> >>>
> >>>   ipa-inline-analysis.c (estimate_growth): Fix typo.
> >>>   ipa-inline.c (caller_growth_limits): Use OR instead of AND.
> >> OK
> > 
> > This patch also seems wrong.  Inliner should return failure when
> >   newsize > param_large_function_insns && newsize > limit
> > The reason is same as with large_unit_insns.  We used to have only
> > growth limits but it has turned out that this gets too
> > restrictive/random for very small units/growths.
> > 
> > So the logic is meant to be that inlining is OK either
> >   - if function is reasonably small (large_function_insns limit is not
> > met)
> >   - or it does not grow too much (large-function-growth isnot met)
> 
> Sorry for checking too fast.  Will revert once confirmed.

Yes, I can confirm that the original test was as intended and it is OK to
revert the patch (I am currently traveling).

> large_function_insns is default to 2700, and large-function-growth is 
> default to 100, so inline a large function with insn 2700 to another large 
> function with insn 2700 is allowed here (limit reach to 5400), I observed 
> HUGE performance drop in some cases for this behavior(large function inline 
> large function due to register spilling) compared with not inlined, that's 
> why this patch comes out.
> 
> From performance point of view,it seems benefitial to inline small function 
> to 
> large function, but negative when inline large function to large 
> function(causing
> register spill)?  2700 seems too large for defining a function to be large 
> and and
> 5400 also seems too big for growth limit?
> 
> This inline happens at "Inlining one function called once", it's out of 
> inline_small_functions and since newsize > param_large_function_insns, so I  
> suppose it won't change behavior for inlining small functions? 

large-function-growth/insns is not meant to improve performance.  The
reason why they exist is the fact that the compiler is non-linear
in some cases, and thus producing very large functions makes it slow or
makes it give up on some optimizations.

The problem that sometimes inlining a larger function into a cold path
of another function leads to a slowdown is a rather old problem (see PR49194).
In general most of the logic in the inliner is about determining when
inlining is going to be useful. It is hard to tell when it is *not* going
to be useful, so I am a bit unsure what to do about these cases in general.

Recently I experimented with a patch which disables inlining of functions
called once when the callee and caller frequencies differ (i.e. one is hot,
the other is cold) etc., but it did not lead to consistent performance
improvements.

What situation did you run into?

Honza


Re: [PATCH] libstdcxx: Update ctype_base.h from NetBSD upstream

2020-01-07 Thread Jonathan Wakely

On 07/01/20 01:30 +0100, Kamil Rytarowski wrote:

On 07.01.2020 01:26, Jonathan Wakely wrote:

On 06/01/20 23:20 +0100, Kamil Rytarowski wrote:

On 06.01.2020 16:24, Jonathan Wakely wrote:

On 22/12/19 09:36 +1000, Gerald Pfeifer wrote:

Hi Matthew,

On Mon, 4 Feb 2019, Matthew Bauer wrote:

The ctype_base.h file in libstdc++-v3 is out of date for NetBSD. They
have changed their ctype.h definition. It was updated in their in-tree
libstdc++-v3 but not in the GCC one. My understanding is this is a
straightforward rewrite. I've attached my own patch, but the file can
be obtained directly here:

http://cvsweb.netbsd.org/bsdweb.cgi/src/external/gpl3/gcc/dist/libstdc%2b%2b-v3/config/os/bsd/netbsd/ctype_base.h



With the attached patch, libstdc++-v3 can successfully be built with
NetBSD headers (along with --disable-libcilkrts).


I noticed this has not been applied yet, nor seen a follow-up?, and
also
noticed it went to the gcc-patches list, but not libstd...@gcc.gnu.org.

Let me re-address this to libstd...@gcc.gnu.org in the hope the
maintainers there will have a look.

Gerald



diff --git a/libstdc++-v3/config/os/bsd/netbsd/ctype_base.h
b/libstdc++-v3/config/os/bsd/netbsd/ctype_base.h
index ff3ec893974..21eccf9fde1 100644
--- a/libstdc++-v3/config/os/bsd/netbsd/ctype_base.h
+++ b/libstdc++-v3/config/os/bsd/netbsd/ctype_base.h
@@ -38,40 +38,46 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  /// @brief  Base class for ctype.
  struct ctype_base
  {
-    // Non-standard typedefs.
-    typedef const unsigned char*    __to_type;

    // NB: Offsets into ctype::_M_table force a particular size
    // on the mask type. Because of this, we don't use an enum.
-    typedef unsigned char  mask;

#ifndef _CTYPE_U
-    static const mask upper    = _U;
-    static const mask lower = _L;
-    static const mask alpha = _U | _L;
-    static const mask digit = _N;
-    static const mask xdigit = _N | _X;
-    static const mask space = _S;
-    static const mask print = _P | _U | _L | _N | _B;
-    static const mask graph = _P | _U | _L | _N;
-    static const mask cntrl = _C;
-    static const mask punct = _P;
-    static const mask alnum = _U | _L | _N;
+    // Non-standard typedefs.
+    typedef const unsigned char*    __to_type;
+
+    typedef unsigned char    mask;
+
+    static const mask upper    = _U;
+    static const mask lower    = _L;
+    static const mask alpha    = _U | _L;
+    static const mask digit    = _N;
+    static const mask xdigit    = _N | _X;
+    static const mask space    = _S;
+    static const mask print    = _P | _U | _L | _N | _B;
+    static const mask graph    = _P | _U | _L | _N;
+    static const mask cntrl    = _C;
+    static const mask punct    = _P;
+    static const mask alnum    = _U | _L | _N;
#else
-    static const mask upper    = _CTYPE_U;
-    static const mask lower = _CTYPE_L;
-    static const mask alpha = _CTYPE_U | _CTYPE_L;
-    static const mask digit = _CTYPE_N;
-    static const mask xdigit = _CTYPE_N | _CTYPE_X;
-    static const mask space = _CTYPE_S;
-    static const mask print = _CTYPE_P | _CTYPE_U | _CTYPE_L |
_CTYPE_N | _CTYPE_B;
-    static const mask graph = _CTYPE_P | _CTYPE_U | _CTYPE_L |
_CTYPE_N;
-    static const mask cntrl = _CTYPE_C;
-    static const mask punct = _CTYPE_P;
-    static const mask alnum = _CTYPE_U | _CTYPE_L | _CTYPE_N;
+    typedef const unsigned short*    __to_type;
+
+    typedef unsigned short    mask;


I seem to recall looking at this previously and noting that the change
to ctype_base::mask is an ABI break. It means that code compiled with
old versions of GCC or on old versions of NetBSD will not be ABI
compatible with code compiled by a new GCC on a new version of NetBSD.

If the NetBSD maintainers are OK with that, then we can go ahead and
change it.




We are fine with ABI breaks as we bump libstdc++ major on each upgrade
in base.


That affects the libstdc++ built from NetBSD ports collection, but if
somebody builds GCC themselves using the upstream sources (which is
what these patches are trying to change) then they don't get that
bumped major version.

So if somebody builds GCC 9 themselves, and then builds GCC 10
themselves on the same machine, they'll get two versions of
libstdc++.so that are not ABI compatible.




I know, but we don't support this in NetBSD. We ship with our own major
numbers for GCC. If someone wants to maintain GCC out of base or out of
pkgsrc, they will likely have more problems than ABI breakage.


But that's exactly what upstream GCC sources are (out of pkgsrc), and
that's where I'm being asked to include this patch.

NetBSD is free to not support it, of course, but we (GCC) have
different schedules and different priorities.

And if older versions of NetBSD are not supported, can we just drop
the `#ifndef _CTYPE_U` branch in config/os/bsd/netbsd/ctype_base.h ?

It seems wrong to keep support for old systems in one file 

Go patch committed: Avoid write barrier for a[i] = a[i][:v]

2020-01-07 Thread Ian Lance Taylor
This patch to the Go frontend avoids generating a write barrier for
code that appears in the Go1.14beta1 runtime package in
(*pageAlloc).sysGrow:
s.summary[l] = s.summary[l][:needIdxLimit]
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 279956)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-d0a102eea2262e3fca89b1eb342fd03328c4aa16
+86d223eaccecff72b44cd23a014bc028b658055e
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 279956)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -183,6 +183,24 @@ Expression::is_same_variable(Expression*
  bu->operand()));
 }
 
+  Array_index_expression* aie = a->array_index_expression();
+  if (aie != NULL)
+{
+  Array_index_expression* bie = b->array_index_expression();
+  return (aie->end() == NULL
+ && bie->end() == NULL
+ && Expression::is_same_variable(aie->array(), bie->array())
+ && Expression::is_same_variable(aie->start(), bie->start()));
+}
+
+  Numeric_constant aval;
+  if (a->numeric_constant_value(&aval))
+    {
+      Numeric_constant bval;
+      if (b->numeric_constant_value(&bval))
+	return aval.equals(bval);
+    }
+
   return false;
 }
 


Re: [patch] relax aarch64 stack-clash tests depedence on alloca.h

2020-01-07 Thread Olivier Hainque
Hi Andrew,

> On 6 Jan 2020, at 23:24, Andrew Pinski  wrote:
> Just one small suggestion:

Sure

> Instead of:
> -  char* pStr = alloca(SIZE);
> +  char* pStr = __builtin_alloca(SIZE);
> 
> Why not just do:
> -#include 
> +#define alloca __builtin_alloca

Yes, good idea.

Revised patch attached, where I also added a comment
explaining why we are doing this.

Re-tested on aarch64-linux.

Is this one ok ?

Thanks,

Olivier

2020-01-06  Olivier Hainque  
Alexandre Oliva  

* gcc.target/aarch64/stack-check-alloca.h: Remove
#include alloca.h. #define alloca __builtin_alloca
instead.
* gcc.target/aarch64/stack-check-alloca-1.c: Add
{ dg-require-effective-target alloca }.
* gcc.target/aarch64/stack-check-alloca-2.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-3.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-4.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-5.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-6.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-7.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-8.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-9.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-10.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-1.c 
b/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-1.c
index 7fc189f..e963ee6 100644
--- a/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fstack-clash-protection --param 
stack-clash-protection-guard-size=16" } */
 /* { dg-require-effective-target supports_stack_clash_protection } */
+/* { dg-require-effective-target alloca } */
 
 #define SIZE y
 #include "stack-check-alloca.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-10.c 
b/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-10.c
index 7c42206..eb85843 100644
--- a/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-10.c
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-10.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fstack-clash-protection --param 
stack-clash-protection-guard-size=16" } */
 /* { dg-require-effective-target supports_stack_clash_protection } */
+/* { dg-require-effective-target alloca } */
 
 #define SIZE 127.5 * 64 * 1024
 #include "stack-check-alloca.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-2.c 
b/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-2.c
index 69fdd16..cc8216d 100644
--- a/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-2.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fstack-clash-protection --param 
stack-clash-protection-guard-size=16" } */
 /* { dg-require-effective-target supports_stack_clash_protection } */
+/* { dg-require-effective-target alloca } */
 
 #define SIZE 0
 #include "stack-check-alloca.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-3.c 
b/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-3.c
index fba3a7a..f5e51fa 100644
--- a/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-3.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fstack-clash-protection --param 
stack-clash-protection-guard-size=16" } */
 /* { dg-require-effective-target supports_stack_clash_protection } */
+/* { dg-require-effective-target alloca } */
 
 #define SIZE 100
 #include "stack-check-alloca.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-4.c 
b/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-4.c
index d53f30a..c903f4d 100644
--- a/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-4.c
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-4.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fstack-clash-protection --param 
stack-clash-protection-guard-size=16" } */
 /* { dg-require-effective-target supports_stack_clash_protection } */
+/* { dg-require-effective-target alloca } */
 
 #define SIZE 2 * 1024
 #include "stack-check-alloca.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-5.c 
b/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-5.c
index e0ff99f..691ec23 100644
--- a/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-5.c
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-5.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fstack-clash-protection --param 
stack-clash-protection-guard-size=16" } */
 /* { dg-require-effective-target supports_stack_clash_protection } */
+/* { dg-require-effective-target alloca } */
 
 #define SIZE 63 * 1024
 #include "stack-check-alloca.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-6.c 
b/gcc/testsuite/gcc.target/aarch64/stack-check-alloca-6.c

[committed, amdgcn] Add more modes for vector comparisons

2020-01-07 Thread Andrew Stubbs

This patch adds vector comparison patterns for many more modes.

These test failures now pass:

FAIL: gcc.dg/vect/pr65947-2.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
FAIL: gcc.dg/vect/pr65947-5.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
FAIL: gcc.dg/vect/pr65947-9.c scan-tree-dump-times vect "LOOP VECTORIZED" 1
FAIL: gcc.dg/vect/vect-21.c scan-tree-dump-times vect "vectorized 3 loops" 1
FAIL: gcc.dg/vect/vect-23.c scan-tree-dump-times vect "vectorized 3 loops" 1
FAIL: gcc.dg/vect/vect-24.c scan-tree-dump-times vect "vectorized 3 loops" 1
FAIL: gcc.dg/vect/vect-cond-reduc-4.c scan-tree-dump-times vect "LOOP 
VECTORIZED" 2
FAIL: gcc.dg/vect/vect-cselim-1.c scan-tree-dump-times vect "vectorized 
2 loops" 1
FAIL: gcc.dg/vect/vect-version-1.c scan-tree-dump vect "applying loop 
versioning to outer loop 1"


Andrew
[amdgcn] Add more modes for vector comparisons

2020-01-07  Andrew Stubbs  

	gcc/
	* config/gcn/gcn-valu.md (VEC_1REG_INT_ALT): Delete iterator.
	(VEC_ALLREG_ALT): New iterator.
	(VEC_ALLREG_INT_MODE): New iterator.
	(VCMP_MODE): New iterator.
	(VCMP_MODE_INT): New iterator.
	(vec_cmpu<mode>di): Use VCMP_MODE_INT.
	(vec_cmpv64qidi): New define_expand.
	(vec_cmp<mode>di_exec): Use VCMP_MODE.
	(vec_cmpu<mode>di_exec): New define_expand.
	(vec_cmpv64qidi_exec): New define_expand.
	(vec_cmp<mode>di_dup): Use VCMP_MODE.
	(vec_cmp<mode>di_dup_exec): Use VCMP_MODE.
	(vcond): Rename ...
	(vcond): ... to this.
	(vcond_exec): Rename ...
	(vcond_exec): ... to this.
	(vcondu): Rename ...
	(vcondu): ... to this.
	(vcondu_exec): Rename ...
	(vcondu_exec): ... to this.
	* config/gcn/gcn.c (print_operand): Fix 8 and 16 bit suffixes.
	* config/gcn/gcn.md (expander): Add sign_extend and zero_extend.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 7dd7bb96918..7c3de8cbc7e 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -26,8 +26,6 @@
 
 (define_mode_iterator VEC_1REG_INT_MODE
 		  [V64SI])
-(define_mode_iterator VEC_1REG_INT_ALT
-		  [V64SI])
 (define_mode_iterator VEC_ALL1REG_INT_MODE
 		  [V64QI V64HI V64SI])
 (define_mode_iterator VEC_ALL1REG_INT_ALT
@@ -44,6 +42,12 @@
 (define_mode_iterator VEC_ALLREG_MODE
 		  [V64QI V64HI V64SI V64HF V64SF; Single reg
 		   V64DI V64DF])		; Double reg
+(define_mode_iterator VEC_ALLREG_ALT
+		  [V64QI V64HI V64SI V64HF V64SF; Single reg
+		   V64DI V64DF])		; Double reg
+(define_mode_iterator VEC_ALLREG_INT_MODE
+		  [V64QI V64HI V64SI	; Single reg
+		   V64DI])			; Double reg
 
 (define_mode_attr scalar_mode
   [(V64QI "qi") (V64HI "hi") (V64SI "si")
@@ -2521,12 +2525,15 @@
 ;; }}}
 ;; {{{ Vector comparison/merge
 
+(define_mode_iterator VCMP_MODE [V64HI V64SI V64DI V64HF V64SF V64DF])
+(define_mode_iterator VCMP_MODE_INT [V64HI V64SI V64DI])
+
 (define_insn "vec_cmpdi"
   [(set (match_operand:DI 0 "register_operand"	  "=cV,cV,  e, e,Sg,Sg")
 	(match_operator 1 "comparison_operator"
-	  [(match_operand:VEC_1REG_MODE 2 "gcn_alu_operand"
+	  [(match_operand:VCMP_MODE 2 "gcn_alu_operand"
 		  "vSv, B,vSv, B, v,vA")
-	   (match_operand:VEC_1REG_MODE 3 "gcn_vop3_operand"
+	   (match_operand:VCMP_MODE 3 "gcn_vop3_operand"
 		  "  v, v,  v, v,vA, v")]))
(clobber (match_scratch:DI 4			  "= X, X, cV,cV, X, X"))]
   ""
@@ -2543,8 +2550,8 @@
 (define_expand "vec_cmpudi"
   [(match_operand:DI 0 "register_operand")
(match_operator 1 "comparison_operator"
- [(match_operand:VEC_1REG_INT_MODE 2 "gcn_alu_operand")
-  (match_operand:VEC_1REG_INT_MODE 3 "gcn_vop3_operand")])]
+ [(match_operand:VCMP_MODE_INT 2 "gcn_alu_operand")
+  (match_operand:VCMP_MODE_INT 3 "gcn_vop3_operand")])]
   ""
   {
 /* Unsigned comparisons use the same patterns as signed comparisons,
@@ -2555,13 +2562,30 @@
 DONE;
   })
 
+; There's no instruction for 8-bit vector comparison, so we need to extend.
+(define_expand "vec_cmpv64qidi"
+  [(match_operand:DI 0 "register_operand")
+   (match_operator 1 "comparison_operator"
+ [(any_extend:V64SI (match_operand:V64QI 2 "gcn_alu_operand"))
+  (any_extend:V64SI (match_operand:V64QI 3 "gcn_vop3_operand"))])]
+  "can_create_pseudo_p ()"
+  {
+    rtx sitmp1 = gen_reg_rtx (V64SImode);
+    rtx sitmp2 = gen_reg_rtx (V64SImode);
+
+    emit_insn (gen_<expander>v64qiv64si2 (sitmp1, operands[2]));
+    emit_insn (gen_<expander>v64qiv64si2 (sitmp2, operands[3]));
+    emit_insn (gen_vec_cmpv64sidi (operands[0], operands[1], sitmp1, sitmp2));
+    DONE;
+  })
+
 (define_insn "vec_cmpdi_exec"
   [(set (match_operand:DI 0 "register_operand"	   "=cV,cV,  e, e,Sg,Sg")
 	(and:DI
 	  (match_operator 1 "comparison_operator"
-	[(match_operand:VEC_1REG_MODE 2 "gcn_alu_operand"
+	[(match_operand:VCMP_MODE 2 "gcn_alu_operand"
 		   "vSv, B,vSv, B, v,vA")
-	 (match_operand:VEC_1REG_MODE 3 "gcn_vop3_operand"
+	 (match_operand:VCMP_MODE 3 "gcn_vop3_operand"
 		   "  v, v,  v, v,vA, v")])
 	  (match_operand:DI 4 

Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [2/2]

2020-01-07 Thread Richard Sandiford
Stam Markianos-Wright  writes:
> On 12/19/19 10:08 AM, Richard Sandiford wrote:
>> Stam Markianos-Wright  writes:
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index f57469b6e23..f40f6432fd4 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -21661,6 +21661,68 @@ aarch64_stack_protect_guard (void)
>>> return NULL_TREE;
>>>   }
>>>   
>>> +/* Return the diagnostic message string if conversion from FROMTYPE to
>>> +   TOTYPE is not allowed, NULL otherwise.  */
>>> +
>>> +static const char *
>>> +aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
>>> +{
>>> +  static char templ[100];
>>> +  if ((GET_MODE_INNER (TYPE_MODE (fromtype)) == BFmode
>>> +   || GET_MODE_INNER (TYPE_MODE (totype)) == BFmode)
>>> +   && TYPE_MODE (fromtype) != TYPE_MODE (totype))
>>> +  {
>>> +snprintf (templ, sizeof (templ), \
>>> +  "incompatible types when assigning to type '%s' from type '%s'",
>>> +  IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (totype))),
>>> +  IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (fromtype))));
>>> +return N_(templ);
>>> +  }
>>> +  /* Conversion allowed.  */
>>> +  return NULL;
>>> +}
>>> +
>> 
>> This won't handle translation properly.  We also have no guarantee that
>> the formatted string will fit in 100 characters since at least one of
>> the type names is unconstrained.  (Also, not all types have names.)
>> 
>
> Hi Richard. I'm sending an email here to show you what I have done here, too 
> :)
>
> Currently I have the following:
>
> static const char *
> aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
> {
>static char templ[100];
>if (TYPE_MODE (fromtype) != TYPE_MODE (totype)
>&& ((TYPE_MODE (fromtype) == BFmode && !VECTOR_TYPE_P (fromtype))
> || (TYPE_MODE (totype) == BFmode && !VECTOR_TYPE_P (totype

Just:

if (TYPE_MODE (fromtype) != TYPE_MODE (totype)
 && (TYPE_MODE (fromtype) == BFmode || TYPE_MODE (totype) == BFmode))

should be enough.  Types that have BFmode can't also be vectors.

>  {
>if (TYPE_NAME (fromtype) != NULL && TYPE_NAME (totype) != NULL)
>   {
> snprintf (templ, sizeof (templ),
>   "incompatible types when assigning to type '%s' from type '%s'",
>   IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (totype))),
>   IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (fromtype))));
> return N_(templ);
>   }
>else
>   {
> snprintf (templ, sizeof (templ),
>   "incompatible types for assignment");
> return N_(templ);
>   }

This still has the problem I mentioned above though: DECL_NAMEs are
supplied by the user and can be arbitrary lengths, so there's no
guarantee that the error message fits in the 100-character buffer.
We would get a truncated message if the buffer isn't big enough.
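
To make the truncation concrete, a stand-alone illustration (the type
names here are invented for the example):

#include <stdio.h>

int
main (void)
{
  static char templ[100];
  const char *totype = "a_rather_long_user_defined_vector_type_name_t";
  const char *fromtype = "another_long_user_defined_vector_type_name_t";
  int needed = snprintf (templ, sizeof (templ),
			 "incompatible types when assigning to "
			 "type '%s' from type '%s'", totype, fromtype);
  /* needed exceeds 99 here, so templ holds a silently cut-off message.  */
  printf ("needed %d bytes, buffer holds %zu\n", needed, sizeof (templ));
  return 0;
}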

As far as translation goes: the arguments to diagnostic functions
like "error" are untranslated strings, which the diagnostic functions
then translate internally.  po/exgettext scans the source tree looking
for strings that need to be translatable and collects them all in po/gcc.pot.
Constant format strings in calls to known diagnostic functions get picked
up automatically (see ABOUT-GCC-NLS), but others need to be marked with
N_().  This N_() is simply a no-op wrapper macro that marks the argument
as needing translation.  It has no effect if the argument isn't a
constant string.

The interface of this hook is to return an untranslated diagnostic string
that gets passed to error.  A better interface would be to let the hook
raise its own error and return a boolean result, but that isn't what
we have.

So in the above, it's "incompatible types for assignment" that needs to
be wrapped in N_().  Wrapping templ has no effect.

This is also why the first arm doesn't work for translation.  It constructs
and returns an arbitrary new string that won't have been entered into
gcc.pot (and can't be, since it depends on the names of the user types).
So the error function will have no chance to translate it.  And it would
be a layering violation to try to translate it here.

So the hook basically has to return fixed strings marked with N_().
I don't think it should mention assignment though, since the conversions
could occur in any context (initialisation, function calls, etc.).  If
"invalid conversion" seems too terse, maybe we could have things like:

  "invalid conversion to %"

and:

  "invalid conversion to %"

>  }
>/* Conversion allowed.  */
>return NULL;
> }
>
> This blocks the conversion only if the two types are of different modes and 
> one 
> of them is a BFmode scalar.
>
> Doing it like this seems to block all scalar-sized assignments:
>
> C:
>
> typedef bfloat16_t vbf __attribute__((vector_size(2)));
> vbf foo3 (void) { return (vbf) 0x1234; }
>
> bfloat16_t foo1 (void) { return (bfloat16_t) 0x1234; }
>
> bfloat16_t scalar1_3 = 0;
> 
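
Putting the suggestions above together, the hook would end up with
roughly this shape (a sketch only, with a placeholder message string;
the point is returning a fixed string marked with N_()):

static const char *
aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
{
  if (TYPE_MODE (fromtype) != TYPE_MODE (totype)
      && (TYPE_MODE (fromtype) == BFmode || TYPE_MODE (totype) == BFmode))
    return N_("invalid conversion between types with differing modes");
  /* Conversion allowed.  */
  return NULL;
}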

Re: [wwwdocs] Document -fcommon default change

2020-01-07 Thread Wilco Dijkstra
Hi,

>On 1/6/20 7:10 AM, Jonathan Wakely wrote:
>> GCC now defaults to -fno-common.  As a result, global
>> variable accesses are more efficient on various targets.  In C, global
>> variables with multiple tentative definitions will result in linker
>> errors.
>
> This is better.  I'd also s/will/now/, since we're talking about the 
> present behavior of GCC 10, not some future behavior.

Thanks for the suggestions, I've reworded it as:

GCC now defaults to -fno-common.  As a result, global
variable accesses are more efficient on various targets.  In C, global
variables with multiple tentative definitions now result in linker errors.
With -fcommon such definitions are silently merged during
linking.
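
A minimal two-file illustration of the change:

/* a.c */
int x;			/* tentative definition: no extern, no initializer */

/* b.c */
int x;			/* second tentative definition of the same name */
int main (void) { return x; }

/* gcc-9 a.c b.c            -> links, definitions merged in common
   gcc-10 a.c b.c           -> linker error: multiple definition of 'x'
   gcc-10 -fcommon a.c b.c  -> links as before */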

Also changed the anchor name (I think it was a copy/paste from another entry).
Here the updated version:


diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
index 
d6108269e977df2af29bd5c9149cc2136654ce05..45af7fa333cfff2155ff0346fe36855aa6ff940a
 100644
--- a/htdocs/gcc-10/changes.html
+++ b/htdocs/gcc-10/changes.html
@@ -140,6 +140,13 @@ a work-in-progress.
 In C2X mode, -fno-fp-int-builtin-inexact is
 enabled by default.
   
+
+  GCC now defaults to -fno-common.  As a result, global
+  variable accesses are more efficient on various targets.  In C, global
+  variables with multiple tentative definitions now result in linker 
errors.
+  With -fcommon such definitions are silently merged during
+  linking.
+  
 
 
 C++
diff --git a/htdocs/gcc-10/porting_to.html b/htdocs/gcc-10/porting_to.html
index 
3256e8a35d00ce1352c169a1c6df6d8f120889ee..7d45a962d014fecece9bd52a13ca1799153672fe
 100644
--- a/htdocs/gcc-10/porting_to.html
+++ b/htdocs/gcc-10/porting_to.html
@@ -29,9 +29,25 @@ and provide solutions. Let us know if you have suggestions 
for improvements!
 Preprocessor issues
 -->
 
-
+
+Default to -fno-common
+
+
+  A common mistake in C is omitting extern when declaring a global
+  variable in a header file.  If the header is included by several files it
+  results in multiple definitions of the same variable.  In previous GCC
+  versions this error is ignored.  GCC 10 defaults to -fno-common,
+  which means a linker error will now be reported.
+  To fix this, use extern in header files when declaring global
+  variables, and ensure each global is defined in exactly one C file.
+  As a workaround, legacy C code can be compiled with -fcommon.
+
+  
+  int x;  // tentative definition - avoid in header files
+
+  extern int y;  // correct declaration in a header file
+  
 
 Fortran language issues
 


Re: [Patch 0/X] HWASAN v3

2020-01-07 Thread Martin Liška

On 12/12/19 4:18 PM, Matthew Malcomson wrote:

Hello.

I've just sent a few comments related to v3 of the patch set.
Based on my (limited) HWASAN knowledge the patch seems reasonable to me.
I haven't looked much at the newly introduced RTL hooks,
but these seem to me isolated to the aarch64 port.

I can also verify that the patchset works on my aarch64 linux machine and
hwasan.exp and asan.exp tests succeed.


I haven't gotten ASAN_MARK to print as HWASAN_MARK when using memory tagging,
since I'm not sure the way I found to implement this would be acceptable.  The
inlined patch below works but it requires a special declaration instead of just
an ~#include~.


Knowing that, I would not bother with the printing of HWASAN_MARK.

Thanks for the series,
Martin


Re: [PATCH 6/X] [libsanitizer] Add hwasan pass and associated gimple changes

2020-01-07 Thread Martin Liška

On 12/12/19 4:19 PM, Matthew Malcomson wrote:

- if (is_store && !param_asan_instrument_writes)
+ if (is_store
+ && (!param_asan_instrument_writes || !param_hwasan_instrument_writes))
return;
- if (!is_store && !param_asan_instrument_reads)
+ if (!is_store
+ && (!param_asan_instrument_reads || !param_hwasan_instrument_reads))
return;


I know it's very unlikely, but one can use -fsanitize=address and
--param hwasan-instrument-reads=0 which will drop instrumentation of reads
for ASAN.

Similarly for other parameters.
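
One way to keep the two sets of parameters independent (a sketch of the
condition only; hwasan_sanitize_p () stands for whatever predicate the
series uses to detect memory-tagging mode):

  if (is_store
      && (hwasan_sanitize_p ()
	  ? !param_hwasan_instrument_writes
	  : !param_asan_instrument_writes))
    return;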

Martin


Re: [C++ Patch] Improve build_new locations

2020-01-07 Thread Jason Merrill

On 1/7/20 5:46 AM, Paolo Carlini wrote:

Hi,

On 06/01/20 21:47, Jason Merrill wrote:

On 1/2/20 4:23 AM, Paolo Carlini wrote:

@@ -19320,8 +19320,8 @@ tsubst_copy_and_build (tree t,
    tree op1 = tsubst (TREE_OPERAND (t, 1), args, complain, 
in_decl);

  tree op2 = RECUR (TREE_OPERAND (t, 2));
-    ret = build_new (_vec, op1, op2, _vec,
- NEW_EXPR_USE_GLOBAL (t),
+    ret = build_new (input_location, _vec, op1, op2,


Hmm, using input_location bothers me even though it does have the 
right value in this function.  Let's use a local variable instead; 
maybe change the existing loc variable to save_loc and use the name 
loc for the location we want to use.


I see, I wondered about that myself: when I started working on these 
location issues I forwarded simple EXPR_LOCATION (t) in a few places, 
which worked just fine. Then I noticed that in many existing places we 
were exploiting the fact that input_location is changed on top and 
started using it myself too. I suppose some of those places too could be 
changed to simple EXPR_LOCATION (t); I can experiment with that.


Anyway, the below implements the save_loc bit and regtests fine on 
x86_64-linux, as usual.


OK.



[C++ PATCH] PR c++/47877 - -fvisibility-inlines-hidden and member templates.

2020-01-07 Thread Jason Merrill
DECL_VISIBILITY_SPECIFIED is also true if an enclosing scope has explicit
visibility, and we don't want that to override -fvisibility-inlines-hidden.
So check for the attribute specifically on the function, like we already do
for template argument visibility restriction.

Tested x86_64-pc-linux-gnu, applying to trunk.

* decl2.c (determine_visibility): -fvisibility-inlines-hidden beats
explicit class visibility for a template.
---
 gcc/tree.h  |  4 ++--
 gcc/cp/decl2.c  |  5 +++--
 .../ext/visibility/fvisibility-inlines-hidden-5.C   | 13 +
 3 files changed, 18 insertions(+), 4 deletions(-)
 create mode 100644 
gcc/testsuite/g++.dg/ext/visibility/fvisibility-inlines-hidden-5.C

diff --git a/gcc/tree.h b/gcc/tree.h
index a6e4e9e6b8c..9ca9ab58ec0 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -2903,8 +2903,8 @@ extern void decl_value_expr_insert (tree, tree);
 #define DECL_VISIBILITY(NODE) \
   (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility)
 
-/* Nonzero means that the decl had its visibility specified rather than
-   being inferred.  */
+/* Nonzero means that the decl (or an enclosing scope) had its
+   visibility specified rather than being inferred.  */
 #define DECL_VISIBILITY_SPECIFIED(NODE) \
   (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.visibility_specified)
 
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 9fb1d81a881..a641667991f 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -2629,6 +2629,7 @@ determine_visibility (tree decl)
   tree attribs = (TREE_CODE (decl) == TYPE_DECL
  ? TYPE_ATTRIBUTES (TREE_TYPE (decl))
  : DECL_ATTRIBUTES (decl));
+  tree attr = lookup_attribute ("visibility", attribs);
   
   if (args != error_mark_node)
{
@@ -2636,7 +2637,7 @@ determine_visibility (tree decl)
 
  if (!DECL_VISIBILITY_SPECIFIED (decl))
{
- if (!DECL_VISIBILITY_SPECIFIED (pattern)
+ if (!attr
  && determine_hidden_inline (decl))
DECL_VISIBILITY (decl) = VISIBILITY_HIDDEN;
  else
@@ -2650,7 +2651,7 @@ determine_visibility (tree decl)
  if (args
  /* Template argument visibility outweighs #pragma or namespace
 visibility, but not an explicit attribute.  */
- && !lookup_attribute ("visibility", attribs))
+ && !attr)
{
  int depth = TMPL_ARGS_DEPTH (args);
  if (DECL_VISIBILITY_SPECIFIED (decl))
diff --git a/gcc/testsuite/g++.dg/ext/visibility/fvisibility-inlines-hidden-5.C 
b/gcc/testsuite/g++.dg/ext/visibility/fvisibility-inlines-hidden-5.C
new file mode 100644
index 000..2937f35d9a6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/visibility/fvisibility-inlines-hidden-5.C
@@ -0,0 +1,13 @@
+// PR c++/47877
+// { dg-options "-fvisibility-inlines-hidden" }
+// { dg-require-visibility "" }
+// { dg-final { scan-hidden "_ZN3Foo3barIS_EEvv" } }
+
+struct __attribute__((visibility("default"))) Foo {
+  template <class T> inline void bar() {};
+};
+
+int main()
+{
+  Foo().bar<Foo>();
+}

base-commit: 31bfdb85acb36bfdae772c7fa253abb6d6e8c4ca
-- 
2.18.1



Re: [PATCH 4/X] [libsanitizer][options] Add hwasan flags and argument parsing

2020-01-07 Thread Martin Liška

On 12/12/19 4:19 PM, Matthew Malcomson wrote:

+-param=hwasan-stack=
+Common Joined UInteger Var(param_hwasan_stack) Init(1) IntegerRange(0, 1) Param
+Enable hwasan stack protection.
+
+-param=hwasan-random-frame-tag=
+Common Joined UInteger Var(param_hwasan_random_frame_tag) Init(1) 
IntegerRange(0, 1) Param
+Use random base tag for each frame, as opposed to base always zero.
+
+-param=hwasan-instrument-allocas=
+Common Joined UInteger Var(param_hwasan_protect_allocas) Init(1) 
IntegerRange(0, 1) Param
+Enable hwasan allocas/VLAs protection.
+
+-param=hwasan-instrument-reads=
+Common Joined UInteger Var(param_hwasan_instrument_reads) Init(1) 
IntegerRange(0, 1) Param
+Enable hwasan load operations protection.
+
+-param=hwasan-instrument-writes=
+Common Joined UInteger Var(param_hwasan_instrument_writes) Init(1) 
IntegerRange(0, 1) Param
+Enable hwasan store operations protection.
+
+-param=hwasan-memintrin=
+Common Joined UInteger Var(param_hwasan_memintrin) Init(1) IntegerRange(0, 1) 
Param
+Enable hwasan builtin functions protection.


Similarly to the majority of the ASAN parameters, all of these should have
the Optimization keyword.
That means one can have different parameter values for each function (one can
do that with LTO right now).
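
Concretely, that just means appending the keyword to each record, e.g.:

-param=hwasan-instrument-reads=
Common Joined UInteger Var(param_hwasan_instrument_reads) Init(1) IntegerRange(0, 1) Param Optimization
Enable hwasan load operations protection.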

Martin


[committed, amdgcn] Disallow 'B' constraints on addc/subb

2020-01-07 Thread Andrew Stubbs
This patch fixes some failures to assemble I encountered while fixing 
other issues (not yet posted).


These were caused by emitting 'B' immediates that the ISA manual 
indicates should work, but the assembler rejects. I've confirmed that 
they are, in fact, not handled by the hardware; if I encode the 
instruction manually the GPU appears to decode the illegal instructions 
correctly, but then gives wrong outputs.


The solution is to remove the 'B' immediates from the addc and subc 
patterns. This has a knock-on effect for all the V64DImode patterns that 
split into the other patterns, so these have been adjusted to use a new 
'Db' constraint that allows 'B' for the low-part, and only 'A' for the 
high-part.


As a bonus, this patch fixes four test failures:

FAIL: gcc.dg/vect/vect-alias-check-10.c execution test
FAIL: gcc.dg/vect/vect-alias-check-11.c execution test
FAIL: gcc.dg/vect/vect-alias-check-12.c execution test
FAIL: gcc.dg/vect/vect-live-slp-3.c execution test

These execution failures were more likely caused by some early-clobber
issues (since the tests assembled fine), but the constraint review and
rework has addressed that issue too.


Andrew
Disallow 'B' constraints on amdgcn addc/subb

2020-01-07  Andrew Stubbs  

	gcc/
	* config/gcn/constraints.md (DA): Update description and match.
	(DB): Likewise.
	(Db): New constraint.
	* config/gcn/gcn-protos.h (gcn_inline_constant64_p): Add second
	parameter.
	* config/gcn/gcn.c (gcn_inline_constant64_p): Add 'mixed' parameter.
	Implement 'Db' mixed immediate type.
	* config/gcn/gcn-valu.md (addcv64si3): Rework constraints.
	(addcv64si3_dup): Delete.
	(subcv64si3): Rework constraints.
	(addv64di3): Rework constraints.
	(addv64di3_exec): Rework constraints.
	(subv64di3): Rework constraints.
	(addv64di3_dup): Delete.
	(addv64di3_dup_exec): Delete.
	(addv64di3_zext): Rework constraints.
	(addv64di3_zext_exec): Rework constraints.
	(addv64di3_zext_dup): Rework constraints.
	(addv64di3_zext_dup_exec): Rework constraints.
	(addv64di3_zext_dup2): Rework constraints.
	(addv64di3_zext_dup2_exec): Rework constraints.
	(addv64di3_sext_dup2): Rework constraints.
	(addv64di3_sext_dup2_exec): Rework constraints.

diff --git a/gcc/config/gcn/constraints.md b/gcc/config/gcn/constraints.md
index f2de943ba16..dd6615b0fd7 100644
--- a/gcc/config/gcn/constraints.md
+++ b/gcc/config/gcn/constraints.md
@@ -53,12 +53,17 @@
 	(match_test "gcn_constant64_p (op)")))
 
 (define_constraint "DA"
-  "Splittable inline immediate 64-bit parameter"
+  "Immediate 64-bit parameter, low and high part match 'A'"
   (and (match_code "const_int,const_double,const_vector")
-   (match_test "gcn_inline_constant64_p (op)")))
+   (match_test "gcn_inline_constant64_p (op, 0)")))
+
+(define_constraint "Db"
+  "Immediate 64-bit parameter, low part matches 'B', high part matches 'A'"
+  (and (match_code "const_int,const_double,const_vector")
+   (match_test "gcn_inline_constant64_p (op, 1)")))
 
 (define_constraint "DB"
-  "Splittable immediate 64-bit parameter"
+  "Immediate 64-bit parameter, low and high part match 'B'"
   (match_code "const_int,const_double,const_vector"))
 
 (define_constraint "U"
diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h
index 92a6aa77e84..e4dadd37f21 100644
--- a/gcc/config/gcn/gcn-protos.h
+++ b/gcc/config/gcn/gcn-protos.h
@@ -51,7 +51,7 @@ extern int gcn_hard_regno_nregs (int regno, machine_mode mode);
 extern void gcn_hsa_declare_function_name (FILE *file, const char *name,
 	   tree decl);
 extern HOST_WIDE_INT gcn_initial_elimination_offset (int, int);
-extern bool gcn_inline_constant64_p (rtx);
+extern bool gcn_inline_constant64_p (rtx, bool);
 extern bool gcn_inline_constant_p (rtx);
 extern int gcn_inline_fp_constant_p (rtx, bool);
 extern reg_class gcn_mode_code_base_reg_class (machine_mode, addr_space_t,
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index e301a4356ec..7dd7bb96918 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -1090,20 +1090,21 @@
   [(set_attr "type" "vop2,vop3b")
(set_attr "length" "8,8")])
 
-; This pattern does not accept SGPR because VCC read already counts as an
-; SGPR use and number of SGPR operands is limited to 1.
+; v_addc does not accept an SGPR because the VCC read already counts as an
+; SGPR use and the number of SGPR operands is limited to 1.  It does not
+; accept "B" immediate constants due to a related bus conflict.
 
 (define_insn "addcv64si3"
-  [(set (match_operand:V64SI 0 "register_operand" "=v,v")
+  [(set (match_operand:V64SI 0 "register_operand""=v,   v")
 	(plus:V64SI
 	  (plus:V64SI
 	(vec_merge:V64SI
 	  (vec_duplicate:V64SI (const_int 1))
 	  (vec_duplicate:V64SI (const_int 0))
-	  (match_operand:DI 3 "register_operand" " cV,Sv"))
-	(match_operand:V64SI 1 "gcn_alu_operand" "%vA,vA"))
-	  (match_operand:V64SI 2 "gcn_alu_operand"   " vB,vB")))
-   (set (match_operand:DI 4 

Re: [PATCH] Make warn_inline Optimization option.

2020-01-07 Thread Richard Biener
On Tue, Jan 7, 2020 at 3:26 PM Jan Hubicka  wrote:
>
> > Err - Optimization also lists it in some -help section?  It's a Warning
> > option and certainly we don't handle per-function Warnings in general
> > (with LTO) even though we have #pragma GCC diagnostic, no?
> >
> > I'm not sure why we force warn_inline to zero with -O0, it seems much
> > better to guard false warnings in some other way for -O0?
>
> Well, we can do that with warn_inline, but in general we do want to
> stream late warnings (so things like -Wmaybe-uninitialized work sort of
> as expected for -flto). So I guess we want a way to mark an option as part
> of TARGET_OPTIMIZATION_NODE even though it is not really an optimization
> option but a parameter, warning or semantic change.

Given all warning options can be enabled/disabled via #pragma GCC diagnostic
all Warning annotated options should be implicitely 'Optimization' for
the purpose
of LTO streaming then?

Richard.

> Honza


Re: [C++ coroutines 6/6] Testsuite.

2020-01-07 Thread JunMa

On 2019/11/20 9:11 PM, JunMa wrote:

On 2019/11/20 7:22 PM, Iain Sandoe wrote:

Hello JunMa,

JunMa  wrote:


On 2019/11/17 6:28 PM, Iain Sandoe wrote:
I find that the patches do not support 'for co_await'. And it is
quite simple to implement range-based 'for co_await' on top of your
patches, since it just needs a little more work on the range-for
source-to-source transform. Any reason for that?

If I understand your question correctly,

for example TS n4775, there was:

[stmt.stmt]
   ….
   for co_await ( for-range-declaration : for-range-initializer ) statement


yes?

This was removed by a later committee resolution, and was *not* merged
to the C++20 Working Draft (I am currently working to n4835).

So, the reason it is not implemented in GCC at present is that it is
not clear exactly what form it might take if introduced in some future
proposal for enhanced coroutines.

hope that answers the question,
thanks
Iain

Hi Iain,
    Thanks, that makes sense.

Regards
JunMa

Hi Iain

In the current implementation of allocating storage for the coroutine,
it does not follow the rules in n4835, which look up the allocation
function in the scope of the promise type first. Is there any reason?
If not, would you implement this following the rules of n4835? Thanks.
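
For reference, a sketch of the rule (all names illustrative):

// n4835 [dcl.fct.def.coroutine]: the allocation function for the
// coroutine frame is looked up first in the scope of the promise type,
// and only then in the global scope.
#include <coroutine>
#include <cstdlib>

struct task {
  struct promise_type {
    // Must be found and used for the frame allocation.
    static void *operator new (std::size_t sz) { return std::malloc (sz); }
    static void operator delete (void *p) { std::free (p); }

    task get_return_object () { return {}; }
    std::suspend_never initial_suspend () noexcept { return {}; }
    std::suspend_never final_suspend () noexcept { return {}; }
    void return_void () {}
    void unhandled_exception () {}
  };
};

task f () { co_return; }  // frame storage comes from promise_type::operator new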

Regards
JunMa


Re: [PATCH] Fix redundant load missed by fre [tree-optimization 92980]

2020-01-07 Thread Richard Biener
On Wed, Dec 18, 2019 at 10:59 AM Andrew Pinski  wrote:
>
> On Wed, Dec 18, 2019 at 1:18 AM Hongtao Liu  wrote:
> >
> > On Wed, Dec 18, 2019 at 4:26 PM Segher Boessenkool
> >  wrote:
> > >
> > > On Wed, Dec 18, 2019 at 10:37:11AM +0800, Hongtao Liu wrote:
> > > > Hi:
> > > >   This patch is to simplify A * C + (-D) -> (A - D/C) * C when C is a
> > > > power of 2 and D mod C == 0.

Looks like a subset of what fold_plusminus_mult_expr does?

> > > >   bootstrap and make check is ok.
> > >
> > > Why would this be a good idea?  It is not reducing the number of
> > > operators or similar?
> > >
> > It helps VN, so that fre will delete redundant load.

So it's basically a canonicalization.  What you have to watch for is
code doing the reverse (extract_muldiv and the associate: cases in
fold-const.c are a bad example here).
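
For instance (a hypothetical shape, not the PR testcase):

int
f (int *a, int i)
{
  int x = *(int *) ((char *) a + (long) i * 4 - 4);  /* A*C + (-D) */
  int y = a[i - 1];                                  /* (A - D/C) * C */
  return x + y;  /* one load once the first form is canonicalized */
}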

> It is basically doing a factoring and undoing an optimization that was
> done in the front-end (see pointer_int_sum in c-common.c).
> But I think the optimization in the front-end should be removed.  It
> dates from 1992, a time when GCC did not anything on the tree level
> and there was no GCSE (PRE) and the CSE was limited.

Agreed (as this is a premature point).  But not at this point - I think I tried
this and there's quite some fallout.

Also, always watch for undefined overflow issues.

Richard.

> Thanks,
> Andrew Pinski
>
>
> > >
> > > Segher
> >
> >
> >
> > --
> > BR,
> > Hongtao


Re: [PATCH] Make warn_inline Optimization option.

2020-01-07 Thread Jan Hubicka
> Err - Optimization also lists it in some -help section?  It's a Warning
> option and certainly we don't handle per-function Warnings in general
> (with LTO) even though we have #pragma GCC diagnostic, no?
> 
> I'm not sure why we force warn_inline to zero with -O0, it seems much
> better to guard false warnings in some other way for -O0?

Well, we can do that with warn_inline, but in general we do want to
stream late warnings (so things like -Wmaybe-uninitialized work sort of
as expected for -flto). So I guess we want a way to mark an option as part
of TARGET_OPTIMIZATION_NODE even though it is not really an optimization
option but a parameter, warning or semantic change.

Honza


Re: [testsuite][arm] Remove xfail for vect-epilogues test

2020-01-07 Thread Richard Biener
On Tue, 7 Jan 2020, Andre Vieira (lists) wrote:

> What about the previous patch fixing the existing testism by skipping for
> big-endian arm using existing target supports?

I thought this was an update containing both - so yes, that's OK as well.

Richard.

> On 07/01/2020 07:31, Richard Biener wrote:
> > On Fri, 13 Dec 2019, Andre Vieira (lists) wrote:
> > 
> >> Consequently whilst looking at the list I noticed these two were missing.
> >>
> >> This OK for trunk?
> > 
> > Yes.
> > 
> > Thanks,
> > Richard.
> > 
> >> gcc/ChangeLog:
> >> 2019-12-13  Andre Vieira  
> >>
> >>  * doc/sourcebuild.texi (arm_little_endian, arm_nothumb):
> >>  Documented existing target checks.
> >>
> >> On 13/12/2019 13:05, Andre Vieira (lists) wrote:
> >>> Thanks for pointing it out Rainer. Thanks to that reminder I noticed there
> >>> is a different way to achieve this without adding that extra target check.
> >>>
> >>> This OK?
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>> 2019-12-12  Andre Vieira  
> >>>
> >>>       * gcc.dg/vect/vect-epilogues.c: XFAIL for arm big endian.
> >>>
> >>> On 12/12/2019 17:41, Rainer Orth wrote:
>  Hi Andre,
> 
> > gcc/testsuite/ChangeLog:
> >
> > 2019-12-12  Andre Vieira  
> >
> >   * gcc.dg/vect/vect-epilogues.c: XFAIL for arm big endian.
> >   * lib/target-supports.exp
> > (check_effective_target_arm_big_endian):
> >   New target selector.
> 
>  as always, this needs documenting in sourcebuild.texi.
> 
>   Rainer
> 
> >>
> > 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Go patch committed: Avoid a couple of compiler crashes

2020-01-07 Thread Ian Lance Taylor
This patch to the Go frontend avoids a couple of compiler crashes.
These came up while building 1.14beta1 while the code was still
invalid.  The general policy is to not bother committing invalid test
cases that cause compiler crashes.  Bootstrapped and ran Go tests on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 279848)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-9163fa28b89222cd851c0d24bd6a1384d1379c55
+d0a102eea2262e3fca89b1eb342fd03328c4aa16
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 279847)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -11766,10 +11766,17 @@ Call_expression::do_add_conversions()
   Typed_identifier_list::const_iterator pp = fntype->parameters()->begin();
   bool is_interface_method =
 this->fn_->interface_field_reference_expression() != NULL;
+  size_t argcount = this->args_->size();
   if (!is_interface_method && fntype->is_method())
 {
   // Skip the receiver argument, which cannot be interface.
   pa++;
+  argcount--;
+}
+  if (argcount != fntype->parameters()->size())
+{
+  go_assert(saw_errors());
+  return;
 }
   for (; pa != this->args_->end(); ++pa, ++pp)
 {
@@ -11895,6 +11902,8 @@ Call_expression::is_erroneous_call()
 Type*
 Call_expression::do_type()
 {
+  if (this->is_error_expression())
+return Type::make_error_type();
   if (this->type_ != NULL)
 return this->type_;
 


Re: [testsuite][arm] Remove xfail for vect-epilogues test

2020-01-07 Thread Andre Vieira (lists)
What about the previous patch fixing the existing testism by skipping 
for big-endian arm using existing target supports?


On 07/01/2020 07:31, Richard Biener wrote:

On Fri, 13 Dec 2019, Andre Vieira (lists) wrote:


Consequently whilst looking at the list I noticed these two were missing.

This OK for trunk?


Yes.

Thanks,
Richard.


gcc/ChangeLog:
2019-12-13  Andre Vieira  

 * doc/sourcebuild.texi (arm_little_endian, arm_nothumb):
 Documented existing target checks.

On 13/12/2019 13:05, Andre Vieira (lists) wrote:

Thanks for pointing it out Rainer. Thanks to that reminder I noticed there
is a different way to achieve this without adding that extra target check.

This OK?

gcc/testsuite/ChangeLog:

2019-12-12  Andre Vieira  

      * gcc.dg/vect/vect-epilogues.c: XFAIL for arm big endian.

On 12/12/2019 17:41, Rainer Orth wrote:

Hi Andre,


gcc/testsuite/ChangeLog:

2019-12-12  Andre Vieira  

  * gcc.dg/vect/vect-epilogues.c: XFAIL for arm big endian.
  * lib/target-supports.exp
(check_effective_target_arm_big_endian):
  New target selector.


as always, this needs documenting in sourcebuild.texi.

 Rainer







Re: [PATCH][vect] Keep track of DR_OFFSET advance in dr_vec_info rather than data_reference

2020-01-07 Thread Andre Vieira (lists)

Hello,

Should we try to get this refactoring in stage 3 or park it for next 
stage 1?


Cheers,
Andre

On 10/12/2019 13:36, Andre Vieira (lists) wrote:

Hi,

This patch aims at refactoring the vectorizer code to no longer need to 
reset DR_OFFSET for epilogue vectorization and instead keep track of 
DR_OFFSET changes per dr_vec_info and just update it as needed.  I added 
a member of type 'tree' called 'offset' to dr_vec_info, which keeps 
track of the changes to the data_reference's offset per dr_vec_info and 
thus per instance of loop vectorization.  To get the current loop's 
DR_OFFSET I introduced a function 'get_dr_vinfo_offset' which will add 
the dr_vec_info's offset to either the data_reference's innermost offset 
or the offset of the 'innermost_loop_behavior' returned by 
'vect_dr_behavior' depending on whether 'get_dr_vinfo_offset's second 
parameter 'check_outer' is true.  This is to mimic the behavior of using 
the outer loop relative 'innermost_loop_behavior' in 
'vect_create_addr_base_for_vector_ref'.
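
In other words, something with this shape (a sketch following the
description above, not necessarily the exact committed form):

inline tree
get_dr_vinfo_offset (dr_vec_info *dr_info, bool check_outer = false)
{
  innermost_loop_behavior *base;
  if (check_outer)
    base = vect_dr_behavior (dr_info);
  else
    base = &dr_info->dr->innermost;

  tree offset = base->offset;
  if (!dr_info->offset)
    return offset;

  offset = fold_convert (sizetype, offset);
  return fold_build2 (PLUS_EXPR, TREE_TYPE (dr_info->offset),
		      dr_info->offset, offset);
}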


I regression tested vect.exp on aarch64 and x86_64. I also regression 
tested libgomp on aarch64 and x86_64, no changes, but there were quite a 
few test failures with the commit I based this patch on...


Is this OK for trunk or shall I wait until libgomp stabilizes?

2019-12-10  Andre Vieira  

     * tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref): Use
     get_dr_vinfo_offset.
     * tree-vect-loop.c (update_epilogue_loop_vinfo): Remove orig_drs_init
     parameter and its use to reset DR_OFFSETs.
     (vect_transform_loop): Remove orig_drs_init argument.
     * tree-vect-loop-manip.c (vect_update_init_of_dr): Update the offset
     member of dr_vec_info rather than the offset of the associated
     data_reference's innermost_loop_behavior.
     (vect_update_init_of_dr): Pass dr_vec_info instead of data_reference.
     (vect_do_peeling): Remove orig_drs_init parameter and its
     construction.
     * tree-vect-stmts.c (check_scan_store): Replace use of DR_OFFSET
     with get_dr_vinfo_offset.
     (vectorizable_store): Likewise.
     (vectorizable_load): Likewise.


Re: [PATCH] std::experimental::simd

2020-01-07 Thread Dr. Matthias Kretz
On Dienstag, 7. Januar 2020 12:16:57 CET Andrew Pinski wrote:
> On Tue, Jan 7, 2020 at 3:01 AM Matthias Kretz  wrote:
> > Is there any chance left we can get this done for 10.1? If not, can we
> > please get it ready for 10.2 ASAP?
> > 
> > Cheers,
> > 
> >   Matthias
> > 
> > On Montag, 14. Oktober 2019 14:12:12 CET Matthias Kretz wrote:
> > > Let me try again to get this patch ready. It will need a few
> > > iterations...
> > > This patch is without documentation and testsuite. I can add them on
> > > request but would prefer a follow-up patch after getting this one
> > > right.
> > > 
> > > I recommend to review starting from simd.h + simd_scalar.h, then
> > > simd_builtin.h, simd_x86.h, and simd_fixed_size.h. I sure when we get
> > > this
> > > far we are a few iterations further.
> > > 
> > > Regarding the license. The license header is currently just a copy from
> > > my
> > > repo, but we can change it to the libstdc++ license. The paperwork with
> > > the
> > > FSF is done.
> 
> Seems like it would be better if we put the x86 and aarch64/arm
> specific parts in their own headers.

Yes. I'm already working on it. It makes me unhappy in some of the generic 
parts of the code, but I think it's still a worthwhile reorganization. Last 
state is here: https://github.com/VcDevel/std-simd/tree/master/experimental/bits
I'll prepare a new patch.

> Also all of the x86 conversion should be removed as
> __builtin_convertvector  is supported now.

simd_x86_conversions.h is about PR85048 (and more missing optimizations). I'd 
prefer to implement simd_x86_conversions.h in the compiler, but I'd need some 
guidance. I'd like the first release of std::experimental::simd to have high 
performance - because that's the main reason for using it. I'd rather wait a 
release than taint the impression of its usefulness.
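
For context, the builtin Andrew refers to (element-wise conversion
between equal-length vector types, available since GCC 9):

typedef int   v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

v4sf
to_float (v4si x)
{
  return __builtin_convertvector (x, v4sf);
}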

> libstdc++v3 is only ever supported by the version that comes with the
> compiler.

Right, that's an artifact of having active users of this code. I'll clean it 
up.

Thanks for the feedback,
  Matthias

-- 
──┬
 Dr. Matthias Kretz   │ SDE — Software Development for Experiments
 Senior Software Engineer,│  +49 6159 713084
 SIMD Expert, │  m.kr...@gsi.de
 ISO C++ Committee Member │  mattkretz.github.io
──┴

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Ursula Weyrich, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Georg Schütte


Re: [PATCH] Fix a bug that propagation in recursive function uses wrong aggregate lattice (PR ipa/93084)

2020-01-07 Thread Martin Jambor
Hi,

On Fri, Jan 03 2020, Feng Xue OS wrote:
> When checking a self-recursively generated value for an aggregate jump
> function, the wrong aggregate lattice was used, which can cause infinite
> constant propagation. This patch fixes this issue.
>
> 2020-01-03  Feng Xue  
>
> PR ipa/93084
> * ipa-cp.c (self_recursively_generated_p): Find matched aggregate
> lattice for a value to check.
> (propagate_vals_across_arith_jfunc): Add an assertion to ensure
> finite propagation in self-recursive scc.

as far as I am concerned, the patch looks good.

Thanks,

Martin


Re: [PATCH] Mark param_max_fields_for_field_sensitive with Optimization keyword.

2020-01-07 Thread Richard Biener
On Fri, Jan 3, 2020 at 1:26 PM Martin Liška  wrote:
>
> Hi.
>
> One another fix where -Ox sets a parameter that
> is not marked with Optimize keyword. Fixed by
> adding the keyword.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

As said in the PR this looks wrong, possibly leading to wrong code.

Richard.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2020-01-03  Martin Liska  
>
> PR tree-optimization/92860
> * params.opt: (param_max_fields_for_field_sensitive):
> Mark with Optimization keyword.
> * tree-ssa-structalias.c (use_field_sensitive): Make
> it a function.
> (get_constraint_for_ptr_offset): Call it.
> (create_variable_info_for_1): Likewise.
> (init_alias_vars): Do not initialize use_field_sensitive.
> ---
>   gcc/params.opt |  2 +-
>   gcc/tree-ssa-structalias.c | 15 ++-
>   2 files changed, 11 insertions(+), 6 deletions(-)
>
>


Re: [PATCH] Make warn_inline Optimization option.

2020-01-07 Thread Richard Biener
On Thu, Jan 2, 2020 at 10:04 AM Martin Liška  wrote:
>
> Hi.
>
> The patch is about using Optimization for warn_inline as
> it's affected by -O0.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

Err - Optimization also lists it in some -help section?  It's a Warning
option and certainly we don't handle per-function Warnings in general
(with LTO) even though we have #pragma GCC diagnostic, no?

I'm not sure why we force warn_inline to zero with -O0, it seems much
better to guard false warnings in some other way for -O0?

Richard.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2019-12-11  Martin Liska  
>
> PR tree-optimization/92860
> * common.opt: Make in Optimization option
> as it is affected by -O0, which is an Optimization
> option.
> * tree-inline.c (tree_inlinable_function_p):
> Use opt_for_fn for warn_inline.
> (expand_call_inline): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2019-12-11  Martin Liska  
>
> PR tree-optimization/92860
> * gcc.dg/pr92860.c: New test.
> ---
>   gcc/common.opt |  2 +-
>   gcc/testsuite/gcc.dg/pr92860.c | 13 +
>   gcc/tree-inline.c  |  4 ++--
>   3 files changed, 16 insertions(+), 3 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.dg/pr92860.c
>
>


[PATCH][Arm] ACLE intrinsics for AdvSIMD bfloat16 dot product

2020-01-07 Thread Dennis Zhang
Hi all,

This patch is part of a series adding support for Armv8.6-A features.
It depends on the patch enabling Arm BFmode 
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html

This patch adds intrinsics for the brain half-precision floating-point dot product.
ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Regression tested for arm-none-linux-gnueabi-armv8-a.

Is it OK for trunk please?

Thanks,
Dennis

gcc/ChangeLog:

2020-01-03  Dennis Zhang  

* config/arm/arm_neon.h (vbfdot_f32, vbfdotq_f32): New.
(vbfdot_lane_f32, vbfdotq_laneq_f32): New.
(vbfdot_laneq_f32, vbfdotq_lane_f32): New.
* config/arm/arm_neon_builtins.def (vbfdot): New.
(vbfdot_lanev4bf, vbfdot_lanev8bf): New.
* config/arm/iterators.md (VSF2BF): New mode attribute.
* config/arm/neon.md (neon_vbfdot): New.
(neon_vbfdot_lanev4bf): New.
(neon_vbfdot_lanev8bf): New.

gcc/testsuite/ChangeLog:

2020-01-03  Dennis Zhang  

* gcc.target/arm/simd/bf16_dot_1.c: New test.
* gcc.target/arm/simd/bf16_dot_2.c: New test.
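
For reference, a minimal use of the new intrinsic (semantics per the
ACLE document linked above; requires -march=armv8.2-a+bf16):

#include <arm_neon.h>

/* Each 32-bit lane of the result accumulates the product of one pair of
   adjacent bfloat16 elements of a and b into r.  */
float32x2_t
bf16_dot (float32x2_t r, bfloat16x4_t a, bfloat16x4_t b)
{
  return vbfdot_f32 (r, a, b);
}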

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 7433559f00020a4f7878dff22ddc2b9d40bb2e06..1d9e7d40ccdd86e9ece300b9e08c78bcffe915a6 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18745,6 +18745,59 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
 #pragma GCC pop_options
 #endif
 
+/* AdvSIMD Brain half-precision float-point (Bfloat16) intrinsics.  */
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdot_f32 (float32x2_t __r, bfloat16x4_t __a, bfloat16x4_t __b)
+{
+  return __builtin_neon_vbfdotv2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdotq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vbfdotv4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdot_lane_f32 (float32x2_t __r, bfloat16x4_t __a, bfloat16x4_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vbfdot_lanev4bfv2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdotq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
+		   const int __index)
+{
+  return __builtin_neon_vbfdot_lanev8bfv4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdot_laneq_f32 (float32x2_t __r, bfloat16x4_t __a, bfloat16x8_t __b,
+		  const int __index)
+{
+  return __builtin_neon_vbfdot_lanev8bfv2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdotq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
+		  const int __index)
+{
+  return __builtin_neon_vbfdot_lanev4bfv4sf (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index bcccf93f7fa2750e9006e5856efecbec0fb331b9..367fd21f5546c6b5a49d79df2822537cbb98e1f7 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -373,3 +373,7 @@ VAR2 (MAC_LANE_PAIR, vcmlaq_lane0, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane90, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane180, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane270, v4sf, v8hf)
+
+VAR2 (TERNOP, vbfdot, v2sf, v4sf)
+VAR2 (MAC_LANE_PAIR, vbfdot_lanev4bf, v2sf, v4sf)
+VAR2 (MAC_LANE_PAIR, vbfdot_lanev8bf, v2sf, v4sf)
\ No newline at end of file
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 439021fa0733ac31706287c4f98d62b080afc3a1..eb001131dc5cb7bed2afe428664d7c863595c60c 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -834,6 +834,8 @@
 (define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")])
 (define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")])
 
+(define_mode_attr VSF2BF [(V2SF "V4BF") (V4SF "V8BF")])
+
 ;;
 ;; Code attributes
 ;;
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 3e7ebd7464d4d42eac6a525b5f1b39eae08c9086..248c5f622421d7e8197adb23d7f28588840ff772 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -6556,3 +6556,51 @@ if (BYTES_BIG_ENDIAN)
  "vabd. %0, %1, %2"
  [(set_attr "type" "neon_fp_abd_s")]
 )
+
+(define_insn "neon_vbfdot"
+  [(set 

Re: [PATCH][RFC] Introduce BIT_FIELD_INSERT

2020-01-07 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, 7 Jan 2020, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Tue, 7 Jan 2020, Andrew Pinski wrote:
>> >
>> >> On Mon, Jan 6, 2020 at 11:36 PM Richard Biener  wrote:
>> >> >
>> >> > On Mon, 16 Dec 2019, Andrew Pinski wrote:
>> >> >
>> >> > > On Thu, Nov 15, 2018 at 12:31 AM Richard Biener  
>> >> > > wrote:
>> >> > > >
>> >> > > > On Thu, 15 Nov 2018, Richard Biener wrote:
>> >> > > >
>> >> > > > > So - can you fix it please?  Also note that the VECTOR_CST case
>> >> > > > > (as in BIT_FIELD_REF) seems to be inconsistent here and counts
>> >> > > > > "bits" in a different way?
>> >> > > >
>> >> > > > And bonus points for documenting BIT_FIELD_REF and BIT_INSERT_EXPR
>> >> > > > in generic.texi, together with those "details".
>> >> > >
>> >> > > This is the fix:
>> >> > > diff --git a/gcc/fold-const.c b/gcc/fold-const.c
>> >> > > index 8e9e299..a919b63 100644
>> >> > > --- a/gcc/fold-const.c
>> >> > > +++ b/gcc/fold-const.c
>> >> > > @@ -12301,6 +12301,8 @@ fold_ternary_loc (location_t loc, enum
>> >> > > tree_code code, tree type,
>> >> > > {
>> >> > >   unsigned HOST_WIDE_INT bitpos = tree_to_uhwi (op2);
>> >> > >   unsigned bitsize = TYPE_PRECISION (TREE_TYPE (arg1));
>> >> > > + if (BYTES_BIG_ENDIAN)
>> >> > > +   bitpos = TYPE_PRECISION (type) - bitpos - bitsize;
>> >> > >   wide_int tem = (wi::to_wide (arg0)
>> >> > >   & wi::shifted_mask (bitpos, bitsize, true,
>> >> > >   TYPE_PRECISION (type)));
>> >> >
>> >> > I guess you need to guard against BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN
>> >> > as well.
>> >> 
>> >> Yes I will add that check.
>> >> 
>> >> >  Also the above only works reliably for mode-precision
>> >> > integers?  We might want to disallow BIT_FIELD_REF/BIT_INSERT_EXPR
>> >> > on non-mode-precision entities in the GIMPLE/GENERIC verifiers.
>> >> 
>> >> You added that check already for gimple in r268332 due to PR88739.
>> >> BIT_FIELD_REF around tree-cfg.c:3083
>> >> BIT_INSERT_EXPR  around tree-cfg.c:4324
>> >
>> > Ah, good ;)  Note neither BIT_FIELD_REF nor BIT_INSERT_EXPR are
>> > documented in generic.texi and BIT_FIELD_REF is documented in tree.def
>> > as operating on structs/unions (well, memory).  And for register args
>> > we interpret it as storing the register to memory and interpreting
>> > the bit positions in memory bit terms (with the store doing endian
>> > fiddling).
>> 
>> Ah, was going to ask what the semantics were. :-)  That sounds good
>> because it's essentially the same as for non-paradoxical subregs.
>> We have routines like subreg_lsb_1 and subreg_offset_from_lsb that
>> convert byte offsets to shift amounts, so maybe we should move them
>> to code that's common to both gimple and rtl.  The BYTES_BIG_ENDIAN !=
>> WORDS_BIG_ENDIAN calculation is quite subtle, so I don't think we should
>> open-code it everwhere we need it.
>> 
>> What about the subbyte part of the bit value?  Does that always count
>> from the lsb of the containing byte?  E.g. for the four bytes:
>> 
>>   0x12, 0x34, 0x56, 0x78
>> 
>> what does bit offset == 3, bit size == 7 mean for big-endian?
>> 
>> > But for vector (register only?) accesses we interpret
>> > it as specifying lane numbers but at least BIT_FIELD_REF verifying
>> > doesn't barf on bit/sizes not corresponding to exact vector lanes
>> > (and I know we introduce non-matching ones via at least VIEW_CONVERT
>> > "merging" into BIT_FIELD_REFs).
>> 
>> GCC's vector lane numbering is equivalent to array index numbering for
>> all endiannesses, so these cases should still be ok for BYTES_BIG_ENDIAN
>> == WORDS_BIG_ENDIAN, at least on byte boundaries.  Not sure about
>> subbyte boundaries -- guess it depends on the answer to the question
>> above.
>
> I was thinking about, say a SImode extract at offset == 16, size == 32 of 
> a V4SImode vector.  Is that to be interpreted as some weird shifted vector
> lane or as a memory "bit" location after storing the vector to
> memory?  The issue I see here is that once RTL expansion decides to
> spill and interpret the offset/size in non-lane terms will there ever
> be a mismatch between both?

I think they'll be the same for BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN
(or even without for BITS_PER_WORD >= 64).  E.g.:

   memory:  0x01 0x23 | 0x45 0x67 | 0x89 0xab | 0xcd 0xef
                   ^^^^^^^^^
   big-endian: 0x2345, little-endian: 0x4523

                                lane
   big-endian register:     0x0123456789abcdef (lsb)
                                ^^^^
                                        lane
   little-endian register:  0xefcdab8967452301 (lsb)
                                        ^^^^

Thanks,
Richard


Re: [RFC] IVOPTs select cand with preferred D-form access

2020-01-07 Thread Kewen.Lin
on 2020/1/7 下午7:25, Richard Biener wrote:
> On Tue, 7 Jan 2020, Kewen.Lin wrote:
> 
>> on 2020/1/7 下午5:14, Richard Biener wrote:
>>> On Mon, 6 Jan 2020, Kewen.Lin wrote:
>>>
 We are thinking whether it can be handled in IVOPTs instead of one RTL 
 pass.

 During IVOPTs selecting IV cands, it doesn't know the loop will be 
 unrolled so
 it doesn't count the possible step cost in with X-form.  If we can teach 
 it to
 consider the case, the IV cands which play with D-form can be preferred.
 Currently unrolling (incomplete) happens in RTL, it looks we have to 
 predict
 the loop whether unroll in IVOPTs.  Since there is some parameter checks 
 on RTL
 insn counts and target hooks, it seems not easy to get that.  Besides, we 
 need
 to check the step is valid to put into D-form field (eg: DQ-form requires 
 divide
 16 exactly), to ensure no extra ADDIs needed.

 I'm not sure whether it's a good idea to implement in IVOPTs, but I did 
 some
 changes in IVOPTs to prove it's doable to get expected codes, the patch is 
 attached.

 Any comments/suggestions are highly appreiciated!
>>>
>>> Is the unrolled code better than the not unrolled code (assuming
>>> optimal IV choice)?  Then IMHO IVOPTs should drive the unrolling,
>>> either by actually doing it or by forcing it via the loop->unroll
>>> setting.  I don't think second-guessing the RTL unroller at this
>>> point is going to work.  Alternatively turn X-form into D-form during
>>> RTL unrolling?
>>>
>>
>> Hi Richard,
>>
>> Thanks for the comments!
>>
>> Yes, unrolled version is better on Power9 for both forms, but D-form 
>> unrolled is better than X-form unrolled.  If we drive unrolling in 
>> IVOPTs, not sure it will be a concern that IVOPTs becomes too heavy? or 
>> too rude with forced UF if imprecise? Do we still have the plan to 
>> introduce one middle-end unroll pass, does it help if yes?
> 
> I have the opinion that an isolated unrolling pass is not wanted.
> Instead unrolling should be driven by some profitability metric
> which in your case is better induction variable optimization.
> In the "usual" case it is better scheduling where then scheduling
> should drive unrolling.

OK, that makes sense.  I've heard that some compilers consider the unrolling
factor for vectorization and some for modulo scheduling.

> 
>> The quoted 
>> RTL patch is to propose one RTL pass after RTL loop passes, it also 
>> sounds good to check whether RTL unrolling is a good place!
> 
> Why would you need a new RTL pass?  I'd do it during the unroll
> transform itself, ideally on the not unrolled body because that's
> likely simpler than updating N copies?

Good question; I don't have a good understanding of it.  But from the notes
of the patch, I guess the new pass doesn't only handle the cases exposed
by unrolling, but also ones that occur without unrolling.

Quoted from its note: "This new pass scans existing rtl expressions and
replaces X-form loads and stores with rtl expressions that favor selection
of the D-form instructions in contexts for which the D-form instructions
are preferred.  The new pass runs after the RTL loop optimizations since
loop unrolling often introduces opportunities for beneficial replacements
of X-form addressing instructions."
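
For illustration, here is a sketch of the two IV shapes (my example in
plain C, not the patch's output; 8-byte elements assumed).  With a single
pointer IV, the unrolled copies become constant displacements and map
naturally onto D-form (base + displacement) addressing; with an index IV,
each access needs X-form (reg + reg) addressing:

/* X-form flavor: index IV, each access computes base + index.  */
void
step_xform (double *a, const double *b, long n)
{
  for (long i = 0; i < n; i += 2)
    {
      a[i] += b[i];
      a[i + 1] += b[i + 1];
    }
}

/* D-form flavor: one pointer IV per array; the two unrolled copies
   use constant displacements 0(p)/8(p) and 0(q)/8(q).  */
void
step_dform (double *a, const double *b, long n)
{
  double *p = a;
  const double *q = b;
  for (long i = 0; i < n; i += 2, p += 2, q += 2)
    {
      p[0] += q[0];
      p[1] += q[1];
    }
}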

BR,
Kewen



Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [2/2]

2020-01-07 Thread Stam Markianos-Wright


On 12/19/19 10:08 AM, Richard Sandiford wrote:
> Stam Markianos-Wright  writes:
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index f57469b6e23..f40f6432fd4 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -21661,6 +21661,68 @@ aarch64_stack_protect_guard (void)
>> return NULL_TREE;
>>   }
>>   
>> +/* Return the diagnostic message string if conversion from FROMTYPE to
>> +   TOTYPE is not allowed, NULL otherwise.  */
>> +
>> +static const char *
>> +aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
>> +{
>> +  static char templ[100];
>> +  if ((GET_MODE_INNER (TYPE_MODE (fromtype)) == BFmode
>> +   || GET_MODE_INNER (TYPE_MODE (totype)) == BFmode)
>> +   && TYPE_MODE (fromtype) != TYPE_MODE (totype))
>> +  {
>> +snprintf (templ, sizeof (templ),
>> +  "incompatible types when assigning to type '%s' from type '%s'",
>> +  IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (totype))),
>> +  IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (fromtype))));
>> +return N_(templ);
>> +  }
>> +  /* Conversion allowed.  */
>> +  return NULL;
>> +}
>> +
> 
> This won't handle translation properly.  We also have no guarantee that
> the formatted string will fit in 100 characters since at least one of
> the type names is unconstrained.  (Also, not all types have names.)
> 

Hi Richard. I'm sending an email here to show you what I have done here, too :)

Currently I have the following:

static const char *
aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
{
  static char templ[100];
  if (TYPE_MODE (fromtype) != TYPE_MODE (totype)
      && ((TYPE_MODE (fromtype) == BFmode && !VECTOR_TYPE_P (fromtype))
          || (TYPE_MODE (totype) == BFmode && !VECTOR_TYPE_P (totype))))
    {
      if (TYPE_NAME (fromtype) != NULL && TYPE_NAME (totype) != NULL)
        {
          snprintf (templ, sizeof (templ),
                    "incompatible types when assigning to type '%s' from type '%s'",
                    IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (totype))),
                    IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (fromtype))));
          return N_(templ);
        }
      else
        {
          snprintf (templ, sizeof (templ),
                    "incompatible types for assignment");
          return N_(templ);
        }
    }
  /* Conversion allowed.  */
  return NULL;
}

This blocks the conversion only if the two types are of different modes and one 
of them is a BFmode scalar.

Doing it like this seems to block all scalar-sized assignments:

C:

typedef bfloat16_t vbf __attribute__((vector_size(2)));
vbf foo3 (void) { return (vbf) 0x1234; }

bfloat16_t foo1 (void) { return (bfloat16_t) 0x1234; }

bfloat16_t scalar1_3 = 0;
bfloat16_t scalar1_4 = 0.1;
bfloat16_t scalar1_5 = is_a_float;

bfloat16x4_t vector2_8 = { 0.0, 0, n2, is_a_float }; // (blocked on each element assignment)


C++:

bfloat16_t c1 (void) { return bfloat16_t (0x1234); }

bfloat16_t c2 (void) { return bfloat16_t (0.1); }


But then it allows vector initialisation from binary:

C:
bfloat16x4_t foo1 (void) { return (bfloat16x4_t) 0x1234567812345678; }

C++:
bfloat16x4_t foo1 (void) { return bfloat16x4_t (0x1234567812345678); }
typedef bfloat16_t v2bf __attribute__((vector_size(4)));
v2bf foo3 (void) { return v2bf (0x12345678); }

I also need to check with a colleague who is on holiday if any of this impacts 
the vector-reinterpret intrinsics that he was working on...

Let me know of your thoughts!

Cheers,
Stam

> Unfortunately the interface of the current hook doesn't allow for good
> diagnostics.  We'll just have to return a fixed string.
>
> Formatting nit: braced block should be indented two spaces more
> than the "if (...)".
> 
> Same comment for the other hooks.

Done. Will be in next revision

> 
>> +/* Return the diagnostic message string if the unary operation OP is
>> +   not permitted on TYPE, NULL otherwise.  */
>> +
>> +static const char *
>> +aarch64_invalid_unary_op (int op, const_tree type)
>> +{
>> +  static char templ[100];
>> +  /* Reject all single-operand operations on BFmode except for &.  */
>> +  if (GET_MODE_INNER (TYPE_MODE (type)) == BFmode && op != ADDR_EXPR)
>> +  {
>> +snprintf (templ, sizeof (templ),
>> +  "operation not permitted on type '%s'",
>> +  IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))));
>> +return N_(templ);
>> +  }
>> +  /* Operation allowed.  */
>> +  return NULL;
>> +}
> 
> The problem with testing TYPE_MODE is that we'll then miss things
> that don't have a dedicated mode.  E.g. it'd be interesting to
> test what happens for arithmetic on:
> 
>typedef bfloat16_t v16bf __attribute__((vector_size(32)));
> 
> Probably better to use element_mode instead.

Done. Will be in next revision

> 
>> diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_typecheck.c 
>> b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_typecheck.c
>> new file mode 100644
>> index 

Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [1/2]

2020-01-07 Thread Stam Markianos-Wright
On 23/12/2019 16:57, Richard Sandiford wrote:
> Stam Markianos-Wright  writes:
>> On 12/19/19 10:01 AM, Richard Sandiford wrote:
 +
 +#pragma GCC push_options
 +#pragma GCC target ("arch=armv8.2-a+bf16")
 +#ifdef __ARM_FEATURE_BF16_SCALAR_ARITHMETIC
 +
 +typedef __bf16 bfloat16_t;
 +
 +
 +#endif
 +#pragma GCC pop_options
 +
 +#endif
>>>
>>> Are you sure we need the #ifdef?  The target pragma should guarantee
>>> that the macro's defined.
>>>
>>> But the validity of the typedef shouldn't depend on target options,
>>> so AFAICT this should just be:
>>>
>>> typedef __bf16 bfloat16_t;
>>
>> Ok so it's a case of "what do we want to happen if the user tries to use
>> bfloats without +bf16 enabled".
>>
>> So the intent of the ifdef was to not have bfloat16_t be visible if the macro
>> wasn't defined (i.e. not having any bf16 support), but I see now that this 
>> was
>> being negated by the target macro, anyway! Oops, my bad for not really
>> understanding that, sorry!
>>
>> If we have the types always visible, then the user may use them, resulting 
>> in an
>> ICE.
>>
>> But even if the #ifdef worked, this still doesn't stop the user from trying
>> to use __bf16 or __Bfloat16x4_t, __Bfloat16x8_t, which would still produce
>> an ICE, so it's not a perfect solution anyway...
> 
> Right.  Or they could use #pragma GCC target to switch to a different
> non-bf16 target after including arm_bf16.h.
> 
>> One other thing I tried was the below change to aarch64-builtins.c which 
>> stops
>> __bf16 or the vector types from being registered at all:
>>
>> --- a/gcc/config/aarch64/aarch64-builtins.c
>> +++ b/gcc/config/aarch64/aarch64-builtins.c
>> @@ -759,26 +759,32 @@ aarch64_init_simd_builtin_types (void)
>>   aarch64_simd_types[Float64x1_t].eltype = double_type_node;
>>   aarch64_simd_types[Float64x2_t].eltype = double_type_node;
>>
>> -  /* Init Bfloat vector types with underlying __bf16 type.  */
>> -  aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node;
>> -  aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node;
>> +  if (TARGET_BF16_SIMD)
>> +{
>> +  /* Init Bfloat vector types with underlying __bf16 type.  */
>> +  aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node;
>> +  aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node;
>> +}
>>
>>   for (i = 0; i < nelts; i++)
>> {
>>   tree eltype = aarch64_simd_types[i].eltype;
>>   machine_mode mode = aarch64_simd_types[i].mode;
>>
>> -  if (aarch64_simd_types[i].itype == NULL)
>> +  if (eltype != NULL)
>>{
>> - aarch64_simd_types[i].itype
>> -   = build_distinct_type_copy
>> - (build_vector_type (eltype, GET_MODE_NUNITS (mode)));
>> - SET_TYPE_STRUCTURAL_EQUALITY (aarch64_simd_types[i].itype);
>> -   }
>> + if (aarch64_simd_types[i].itype == NULL)
>> +   {
>> + aarch64_simd_types[i].itype
>> +   = build_distinct_type_copy
>> +   (build_vector_type (eltype, GET_MODE_NUNITS (mode)));
>> + SET_TYPE_STRUCTURAL_EQUALITY (aarch64_simd_types[i].itype);
>> +   }
>>
>> -  tdecl = add_builtin_type (aarch64_simd_types[i].name,
>> -   aarch64_simd_types[i].itype);
>> -  TYPE_NAME (aarch64_simd_types[i].itype) = tdecl;
>> + tdecl = add_builtin_type (aarch64_simd_types[i].name,
>> +   aarch64_simd_types[i].itype);
>> + TYPE_NAME (aarch64_simd_types[i].itype) = tdecl;
>> +   }
>> }
>>
>> #define AARCH64_BUILD_SIGNED_TYPE(mode)  \
>> @@ -1240,7 +1246,8 @@ aarch64_general_init_builtins (void)
>>
>>   aarch64_init_fp16_types ();
>>
>> -  aarch64_init_bf16_types ();
>> +  if (TARGET_BF16_FP)
>> +aarch64_init_bf16_types ();
>>
>>   if (TARGET_SIMD)
>> aarch64_init_simd_builtins ();
>>
>>
>>
>> But the problem in that case was that the types could not be re-enabled 
>> using
>> a target pragma like:
>>
>> #pragma GCC push_options
>> #pragma GCC target ("+bf16")
>>
>> Inside the test.
>>
>> (i.e. the pragma caused the ifdef to be TRUE, but __bf16 was still not being
>> enabled afaict?)
>>
>> So I'm not sure what to do, presumably we do want some guard around the type 
>> so
>> as not to just ICE if the type is used without +bf16?
> 
> Other header files work both ways: you get the same definitions regardless
> of what the target was when the header file was included.  Then we need
> to raise an error if the user tries to do something that the current
> target doesn't support.
> 
> I suppose for bf16 we could either (a) try to raise an error whenever
> BF-related moves are emitted without the required target feature or
> (b) handle __bf16 types like __fp16 types.  The justification for
> (b) is that there aren't really any new instructions for moves;
> __bf16 is mostly a software 

Re: [PATCH][RFC] Introduce BIT_FIELD_INSERT

2020-01-07 Thread Richard Biener
On Tue, 7 Jan 2020, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Tue, 7 Jan 2020, Andrew Pinski wrote:
> >
> >> On Mon, Jan 6, 2020 at 11:36 PM Richard Biener  wrote:
> >> >
> >> > On Mon, 16 Dec 2019, Andrew Pinski wrote:
> >> >
> >> > > On Thu, Nov 15, 2018 at 12:31 AM Richard Biener  
> >> > > wrote:
> >> > > >
> >> > > > On Thu, 15 Nov 2018, Richard Biener wrote:
> >> > > >
> >> > > > > So - can you fix it please?  Also note that the VECTOR_CST case
> >> > > > > (as in BIT_FIELD_REF) seems to be inconsistent here and counts
> >> > > > > "bits" in a different way?
> >> > > >
> >> > > > And bonus points for documenting BIT_FIELD_REF and BIT_INSERT_EXPR
> >> > > > in generic.texi, together with those "details".
> >> > >
> >> > > This is the fix:
> >> > > diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> >> > > index 8e9e299..a919b63 100644
> >> > > --- a/gcc/fold-const.c
> >> > > +++ b/gcc/fold-const.c
> >> > > @@ -12301,6 +12301,8 @@ fold_ternary_loc (location_t loc, enum
> >> > > tree_code code, tree type,
> >> > > {
> >> > >   unsigned HOST_WIDE_INT bitpos = tree_to_uhwi (op2);
> >> > >   unsigned bitsize = TYPE_PRECISION (TREE_TYPE (arg1));
> >> > > + if (BYTES_BIG_ENDIAN)
> >> > > +   bitpos = TYPE_PRECISION (type) - bitpos - bitsize;
> >> > >   wide_int tem = (wi::to_wide (arg0)
> >> > >   & wi::shifted_mask (bitpos, bitsize, true,
> >> > >   TYPE_PRECISION (type)));
> >> >
> >> > I guess you need to guard against BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN
> >> > as well.
> >> 
> >> Yes I will add that check.
> >> 
> >> >  Also the above only works reliably for mode-precision
> >> > integers?  We might want to disallow BIT_FIELD_REF/BIT_INSERT_EXPR
> >> > on non-mode-precision entities in the GIMPLE/GENERIC verifiers.
> >> 
> >> You added that check already for gimple in r268332 due to PR88739.
> >> BIT_FIELD_REF around tree-cfg.c:3083
> >> BIT_INSERT_EXPR  around tree-cfg.c:4324
> >
> > Ah, good ;)  Note neither BIT_FIELD_REF nor BIT_INSERT_EXPR are
> > documented in generic.texi and BIT_FIELD_REF is documented in tree.def
> > as operating on structs/unions (well, memory).  And for register args
> > we interpret it as storing the register to memory and interpreting
> > the bit positions in memory bit terms (with the store doing endian
> > fiddling).
> 
> Ah, was going to ask what the semantics were. :-)  That sounds good
> because it's essentially the same as for non-paradoxical subregs.
> We have routines like subreg_lsb_1 and subreg_offset_from_lsb that
> convert byte offsets to shift amounts, so maybe we should move them
> to code that's common to both gimple and rtl.  The BYTES_BIG_ENDIAN !=
> WORDS_BIG_ENDIAN calculation is quite subtle, so I don't think we should
> open-code it everywhere we need it.
> 
> What about the subbyte part of the bit value?  Does that always count
> from the lsb of the containing byte?  E.g. for the four bytes:
> 
>   0x12, 0x34, 0x56, 0x78
> 
> what does bit offset == 3, bit size == 7 mean for big-endian?
> 
> > But for vector (register only?) accesses we interpret
> > it as specifying lane numbers but at least BIT_FIELD_REF verifying
> > doesn't barf on bit/sizes not corresponding to exact vector lanes
> > (and I know we introduce non-matching ones via at least VIEW_CONVERT
> > "merging" into BIT_FIELD_REFs).
> 
> GCC's vector lane numbering is equivalent to array index numbering for
> all endiannesses, so these cases should still be ok for BYTES_BIG_ENDIAN
> == WORDS_BIG_ENDIAN, at least on byte boundaries.  Not sure about
> subbyte boundaries -- guess it depends on the answer to the question
> above.

I was thinking about, say a SImode extract at offset == 16, size == 32 of 
a V4SImode vector.  Is that to be interpreted as some weird shifted vector
lane or as a memory "bit" location after storing the vector to
memory?  The issue I see here is that once RTL expansion decides to
spill and interpret the offset/size in non-lane terms will there ever
be a mismatch between both?

> For BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN, any sequence that crosses
> a word boundary can lead to ranges that aren't contiguous in registers,
> unless the range starts and ends on a word boundary.  This would include
> some of those vector cases, but it could also include bitfield references
> to multiword integers.
> 
> Subregs that cross a word boundary must start and end on a word boundary,
> but I guess that would be too draconian for gimple.

Yeah.  

Richard.


Re: [RFC] IVOPTs select cand with preferred D-form access

2020-01-07 Thread Richard Biener
On Tue, 7 Jan 2020, Kewen.Lin wrote:

> on 2020/1/7 5:14 PM, Richard Biener wrote:
> > On Mon, 6 Jan 2020, Kewen.Lin wrote:
> > 
> >> We are thinking whether it can be handled in IVOPTs instead of one RTL 
> >> pass.
> >>
> >> During IVOPTs selecting IV cands, it doesn't know the loop will be 
> >> unrolled so
> >> it doesn't count the possible step cost in with X-form.  If we can teach 
> >> it to
> >> consider the case, the IV cands which plays with D-form can be preferred.
> >> Currently unrolling (incomplete) happens in RTL, it looks we have to 
> >> predict
> >> the loop whether unroll in IVOPTs.  Since there is some parameter checks 
> >> on RTL
> >> insn counts and target hooks, it seems not easy to get that.  Besides, we 
> >> need
> >> to check the step is valid to put into D-form field (eg: DQ-form requires 
> >> divide
> >> 16 exactly), to ensure no extra ADDIs needed.
> >>
> >> I'm not sure whether it's a good idea to implement in IVOPTs, but I did 
> >> some
> >> changes in IVOPTs to prove it's doable to get expected codes, the patch is 
> >> attached.
> >>
> >> Any comments/suggestions are highly appreciated!
> > 
> > Is the unrolled code better than the not unrolled code (assuming
> > optimal IV choice)?  Then IMHO IVOPTs should drive the unrolling,
> > either by actually doing it or by forcing it via the loop->unroll
> > setting.  I don't think second-guessing the RTL unroller at this
> > point is going to work.  Alternatively turn X-form into D-form during
> > RTL unrolling?
> > 
> 
> Hi Richard,
> 
> Thanks for the comments!
> 
> Yes, unrolled version is better on Power9 for both forms, but D-form 
> unrolled is better than X-form unrolled.  If we drive unrolling in 
> IVOPTs, not sure it will be a concern that IVOPTs becomes too heavy? or 
> too rude with forced UF if imprecise? Do we still have the plan to 
> introduce one middle-end unroll pass, does it help if yes?

I have the opinion that an isolated unrolling pass is not wanted.
Instead unrolling should be driven by some profitability metric
which in your case is better induction variable optimization.
In the "usual" case it is better scheduling where then scheduling
should drive unrolling.

> The quoted 
> RTL patch is to propose one RTL pass after RTL loop passes, it also 
> sounds good to check whether RTL unrolling is a good place!

Why would you need a new RTL pass?  I'd do it during the unroll
transform itself, ideally on the not unrolled body because that's
likely simpler than updating N copies?

Richard.

Re: [PATCH] std::experimental::simd

2020-01-07 Thread Andrew Pinski
On Tue, Jan 7, 2020 at 3:01 AM Matthias Kretz  wrote:
>
> Is there any chance left we can get this done for 10.1? If not, can we please
> get it ready for 10.2 ASAP?
>
> Cheers,
>   Matthias
>
> On Montag, 14. Oktober 2019 14:12:12 CET Matthias Kretz wrote:
> > Let me try again to get this patch ready. It will need a few iterations...
> > This patch is without documentation and testsuite. I can add them on request
> > but would prefer a follow-up patch after getting this one right.
> >
> > I recommend to review starting from simd.h + simd_scalar.h, then
> > simd_builtin.h, simd_x86.h, and simd_fixed_size.h. I'm sure when we get this
> > far we'll be a few iterations further.
> >
> > Regarding the license. The license header is currently just a copy from my
> > repo, but we can change it to the libstdc++ license. The paperwork with the
> > FSF is done.

Seems like it would be better if we put the x86 and aarch64/arm
specific parts in their own headers.
Also all of the x86 conversions should be removed, as
__builtin_convertvector is supported now.
libstdc++v3 is only ever supported by the version that comes with the compiler.
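
For reference, a minimal example of that builtin:

/* __builtin_convertvector (GCC 9 and later) converts element-wise
   between vector types with the same number of elements.  */
typedef int   v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

v4sf
ints_to_floats (v4si x)
{
  return __builtin_convertvector (x, v4sf);
}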

Thanks,
Andrew

> >
> >
> >   * include/Makefile.am: Add new header.
> >   * include/Makefile.in: Regenerate.
> >   * include/experimental/simd: New header for Parallelism TS 2.
> >   * include/experimental/bits/simd.h: Definition of the public simd
> >   interfaces and general implementation helpers.
> >   * include/experimental/bits/simd_builtin.h: Implementation of the
> >   _VecBuiltin simd_abi.
> >   * include/experimental/bits/simd_combine.h: Preliminary
> >   implementation of the _Combine simd_abi.
> >   * include/experimental/bits/simd_converter.h: Generic simd
> >   conversions.
> >   * include/experimental/bits/simd_detail.h: Internal macros for the
> >   simd implementation.
> >   * include/experimental/bits/simd_fixed_size.h: Simd fixed_size ABI
> >   specific implementations.
> >   * include/experimental/bits/simd_math.h: Math overloads for simd.
> >   * include/experimental/bits/simd_neon.h: Simd NEON specific
> >   implementations.
> >   * include/experimental/bits/simd_scalar.h: Simd scalar ABI
> >   specific implementations.
> >   * include/experimental/bits/simd_x86.h: Simd x86 specific
> >   implementations.
> >   * include/experimental/bits/simd_x86_conversions.h: x86 specific
> >   conversion optimizations.
>
>
> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  std::experimental::simd  https://github.com/VcDevel/std-simd
> ──


Re: [PATCH 3/3] Check array contiguity for OpenACC/Fortran

2020-01-07 Thread Tobias Burnus

Hi Julian, hi Thomas,

in terms of the check, it looks fine to me – but I am not sure about the 
spec.


At least the following test case seems to work fine:

integer :: A(10,10), out(12)
A = reshape([(i, i=0,100)], shape(A))
!$omp target map(A(3:6,3:5), out)
!$acc parallel copy(A(3:6,3:5), out)
  out = reshape(A(3:6,3:5), shape(out))
  A(3:6,3:5) = 4
!$acc end parallel
!$omp end target
print '(4i3)', out
print '(10i3)', A
end

The reason that it works is that the stride is included in the length 
calculation:
#pragma omp target num_teams(1) thread_limit(0) map(tofrom:MEM[(c_char *)_6] [len: _5])

For the section (with A(3:6,3:5) -> parm(1:4,1:3)) it has:
  parm.1.dim[0].lbound = 1;
  parm.1.dim[0].ubound = 4;
  parm.1.dim[0].stride = 1;
  parm.1.dim[1].lbound = 1;
  parm.1.dim[1].ubound = 3;
  parm.1.dim[1].stride = 10;
And instead of doing '(4-1+1) * (3-1+1)' = 12 (i.e. multiplying the extents),
the code does: 'stride * (3-1+1)' = 30.
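
Spelled out as a sketch (variable names are mine, not the generated
code's):

/* The two candidate length computations for parm(1:4,1:3), using the
   descriptor fields quoted above (dim0 stride 1, dim1 stride 10).  */
int
section_length_demo (void)
{
  int extent0 = 4 - 1 + 1;                  /* 4 elements in dim 0    */
  int extent1 = 3 - 1 + 1;                  /* 3 elements in dim 1    */
  int extent_product = extent0 * extent1;   /* 12: assumes contiguity */
  int stride_aware = 10 * extent1;          /* 30: spans the gaps     */
  return stride_aware - extent_product;     /* 18 elements of "gap"   */
}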

Cheers,

Tobias

PS: It also works if one puts the stride on the ptr, i.e.:

integer,target :: A(10,10), out(12)
integer, pointer :: ptr(:,:)
A = reshape([(i, i=0,100)], shape(A))
ptr => A(3:6,3:5)
!$omp target map(ptr, out)
!$acc parallel copy(ptr, out)
  out = reshape(ptr, shape(out))
  ptr = 4
!$acc end parallel
!$omp end target
print '(4i3)', out
print '(10i3)', A
end

On 1/4/20 3:25 AM, Julian Brown wrote:

This patch tightens up error checking for array references used in
OpenACC clauses such that they must now be contiguous. I believe this
matches up to the spec (as of 2.6). I've tried to make it so an error
only triggers if the compiler is sure that a given array expression must
be non-contiguous at compile time, although this errs on the side
of potentially allowing the user to shoot themselves in the foot at
runtime: at least one existing test in the testsuite appears to expect
that behaviour.
[…]


Re: [PATCH][RFC] Introduce BIT_FIELD_INSERT

2020-01-07 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, 7 Jan 2020, Andrew Pinski wrote:
>
>> On Mon, Jan 6, 2020 at 11:36 PM Richard Biener  wrote:
>> >
>> > On Mon, 16 Dec 2019, Andrew Pinski wrote:
>> >
>> > > On Thu, Nov 15, 2018 at 12:31 AM Richard Biener  
>> > > wrote:
>> > > >
>> > > > On Thu, 15 Nov 2018, Richard Biener wrote:
>> > > >
>> > > > > So - can you fix it please?  Also note that the VECTOR_CST case
>> > > > > (as in BIT_FIELD_REF) seems to be inconsistent here and counts
>> > > > > "bits" in a different way?
>> > > >
>> > > > And bonus points for documenting BIT_FIELD_REF and BIT_INSERT_EXPR
>> > > > in generic.texi, together with those "details".
>> > >
>> > > This is the fix:
>> > > diff --git a/gcc/fold-const.c b/gcc/fold-const.c
>> > > index 8e9e299..a919b63 100644
>> > > --- a/gcc/fold-const.c
>> > > +++ b/gcc/fold-const.c
>> > > @@ -12301,6 +12301,8 @@ fold_ternary_loc (location_t loc, enum
>> > > tree_code code, tree type,
>> > > {
>> > >   unsigned HOST_WIDE_INT bitpos = tree_to_uhwi (op2);
>> > >   unsigned bitsize = TYPE_PRECISION (TREE_TYPE (arg1));
>> > > + if (BYTES_BIG_ENDIAN)
>> > > +   bitpos = TYPE_PRECISION (type) - bitpos - bitsize;
>> > >   wide_int tem = (wi::to_wide (arg0)
>> > >   & wi::shifted_mask (bitpos, bitsize, true,
>> > >   TYPE_PRECISION (type)));
>> >
>> > I guess you need to guard against BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN
>> > as well.
>> 
>> Yes I will add that check.
>> 
>> >  Also the above only works reliably for mode-precision
>> > integers?  We might want to disallow BIT_FIELD_REF/BIT_INSERT_EXPR
>> > on non-mode-precision entities in the GIMPLE/GENERIC verifiers.
>> 
>> You added that check already for gimple in r268332 due to PR88739.
>> BIT_FIELD_REF around tree-cfg.c:3083
>> BIT_INSERT_EXPR  around tree-cfg.c:4324
>
> Ah, good ;)  Note neither BIT_FIELD_REF nor BIT_INSERT_EXPR are
> documented in generic.texi and BIT_FIELD_REF is documented in tree.def
> as operating on structs/unions (well, memory).  And for register args
> we interpret it as storing the register to memory and interpreting
> the bit positions in memory bit terms (with the store doing endian
> fiddling).

Ah, was going to ask what the semantics were. :-)  That sounds good
because it's essentially the same as for non-paradoxical subregs.
We have routines like subreg_lsb_1 and subreg_offset_from_lsb that
convert byte offsets to shift amounts, so maybe we should move them
to code that's common to both gimple and rtl.  The BYTES_BIG_ENDIAN !=
WORDS_BIG_ENDIAN calculation is quite subtle, so I don't think we should
open-code it everywhere we need it.

What about the subbyte part of the bit value?  Does that always count
from the lsb of the containing byte?  E.g. for the four bytes:

  0x12, 0x34, 0x56, 0x78

what does bit offset == 3, bit size == 7 mean for big-endian?

> But for vector (register only?) accesses we interpret
> it as specifying lane numbers but at least BIT_FIELD_REF verifying
> doesn't barf on bit/sizes not corresponding to exact vector lanes
> (and I know we introduce non-matching ones via at least VIEW_CONVERT
> "merging" into BIT_FIELD_REFs).

GCC's vector lane numbering is equivalent to array index numbering for
all endiannesses, so these cases should still be ok for BYTES_BIG_ENDIAN
== WORDS_BIG_ENDIAN, at least on byte boundaries.  Not sure about
subbyte boundaries -- guess it depends on the answer to the question
above.

For BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN, any sequence that crosses
a word boundary can lead to ranges that aren't contiguous in registers,
unless the range starts and ends on a word boundary.  This would include
some of those vector cases, but it could also include bitfield references
to multiword integers.

Subregs that cross a word boundary must start and end on a word boundary,
but I guess that would be too draconian for gimple.

Thanks,
Richard


Re: [PATCH] Some compute_objsize/gimple_call_alloc_size/maybe_warn_overflow cleanups (PR tree-optimization/92868)

2020-01-07 Thread Martin Sebor

On 12/18/19 1:49 AM, Jakub Jelinek wrote:

On Tue, Dec 17, 2019 at 09:53:43AM -0700, Martin Sebor wrote:

I appreciate a cleanup but I don't have the impression this patch
does clean anything up.  Because of all the formatting changes and
no tests the effect of the changes isn't as clear as it should be.
(I wish you would resist the urge to reformat existing code on this
scale while also making changes with an observable effect in
the same diff.)  But thanks to the detailed explanation above
I think I can safely say that the builtins.c changes are not in
line with what I would like to see.


As written, there were two real changes in gimple_call_alloc_size,
one in maybe_warn_overflow and the rest formatting fixes (which I really
can't ignore, e.g. semicolon after if (...) { ... }; ?).
 From the above, it seems you are talking about just one change in
gimple_call_alloc_size (the setting of rng1[0] to 0 on overflow).

So, let's talk first about the first real change in gimple_call_alloc_size.
Without it, you get garbage on testcase like:
/* { dg-do compile { target lp64 } } */
/* { dg-options "-O2 -Wall" } */

__attribute__((noipa, alloc_size (1)))
char *
foo (int a)
{
   return (char *) __builtin_malloc (a);
}

void
bar (char *q)
{
   char *p = foo (-13);
   if (!p)
 return;
   __builtin_memcpy (p, q, (__SIZE_TYPE__) 0x10002ULL);
}

alloc_size_test.c: In function ‘bar’:
alloc_size_test.c:17:3: warning: ‘__builtin_memcpy’ writing 4294967298 bytes 
into a region of size 4294967283 [-Wstringop-overflow=]
17 |   __builtin_memcpy (p, q, (__SIZE_TYPE__) 0x10002ULL);
   |   ^~~
alloc_size_test.c:14:13: note: at offset 0 to an object with size 4294967283 
allocated by ‘foo’ here
14 |   char *p = foo (-13);
   | ^
alloc_size_test.c:14:13: warning: argument 1 value ‘-13’ is negative 
[-Walloc-size-larger-than=]
alloc_size_test.c:6:1: note: in a call to allocation function ‘foo’ declared 
here
 6 | foo (int a)
   | ^~~

The first warning is wrong, there is no object with size 4294967283 aka
-13U, the allocation will almost certainly fail


The warning is based on the alloc_size argument to foo.  It cannot
infer whether or not the allocation will succeed based on foo's body.

But since no size can be negative the warning should conservatively
treat it as if it was zero.  That's what I would like to see fixed.


and there will be nothing
wrong in the testcase, but if it wouldn't fail, it would need to allocate
-13UL aka 18446744073709551603 bytes.  And the reason is that
gimple_call_alloc_size leaves bogus rng1[0] and rng1[1], of -13 with
precision 32, which the caller than extends to siz_prec, but with UNSIGNED
extension, which is reasonable assumption that it is given size (UNSIGNED)
wide_ints.

The second hunk: there can still be several cases, e.g. both the low bound and
the upper bound could overflow, such as on calloc (SIZE_MAX / 2 + 2, SIZE_MAX /
2 + 2).  In this case, the code would produce a range of (say, let's assume
ilp32 in this case) [0x40010001, 0xffffffff].  What the caller
will do with such a thing is hard to predict; there is a conversion from
UNSIGNED siz_prec.  If you don't want the function to return the real range of
possible values, but something else, it should be documented what it is and not
called a range.  For the reasons you stated, perhaps for warnings (but never for code
generation!) it could be useful to have both range and likely range, where
the latter would be on the assumption that no overflow happens and so e.g.
for [64, SIZE_MAX] * [64, SIZE_MAX] it could return range of [0, SIZE_MAX]
and likely range of [64 * 64, SIZE_MAX].


For standard functions like calloc, gimple_call_alloc_size should
return the valid size of the allocated object in the range [0,
SIZE_MAX] (it should really be [0, MAXOBJSIZE]).  The size of
the object that would be allocated by calloc (SIZE_MAX / 2 - 2,
SIZE_MAX / 2 - 2) is outside that range so calloc must fail and
gimple_call_alloc_size should set the size to zero.
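
A minimal sketch of that conservative rule (my illustration, not GCC's
implementation):

#include <stddef.h>

/* Saturate a calloc-style size product to zero when it cannot be
   represented in size_t: such an allocation must fail, so zero is the
   only valid object size to report.  */
size_t
conservative_alloc_size (size_t nmemb, size_t size)
{
  size_t total;
  if (__builtin_mul_overflow (nmemb, size, &total))
    return 0;
  return total;
}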

For user-defined functions that take signed arguments, negative
values should be treated as zero and the warning should be issued
for any access to the object.

Martin



And for the real change in maybe_warn_overflow, you can just try in the
debugger, or use gprof, to see when that if (early out) will ever trigger; I'd
bet very rarely.
char *
foo (void)
{
   char *p = __builtin_calloc (32, 32);
   __builtin_memset (p, ' ', 32 * 32);
   return p;
}
gdb --args ./cc1 -quiet -O2 -Wall test.c
b tree-ssa-strlen.c:2051
r
p destsize
$1 = 
p len
$2 = 
p debug_generic_stmt (destsize)
1024
p debug_generic_stmt (len)
1024

Jakub



Re: [PATCH] std::experimental::simd

2020-01-07 Thread Matthias Kretz
Is there any chance left we can get this done for 10.1? If not, can we please 
get it ready for 10.2 ASAP?

Cheers,
  Matthias

On Montag, 14. Oktober 2019 14:12:12 CET Matthias Kretz wrote:
> Let me try again to get this patch ready. It will need a few iterations...
> This patch is without documentation and testsuite. I can add them on request
> but would prefer a follow-up patch after getting this one right.
> 
> I recommend to review starting from simd.h + simd_scalar.h, then
> simd_builtin.h, simd_x86.h, and simd_fixed_size.h. I'm sure when we get this
> far we'll be a few iterations further.
> 
> Regarding the license. The license header is currently just a copy from my
> repo, but we can change it to the libstdc++ license. The paperwork with the
> FSF is done.
> 
> 
>   * include/Makefile.am: Add new header.
>   * include/Makefile.in: Regenerate.
>   * include/experimental/simd: New header for Parallelism TS 2.
>   * include/experimental/bits/simd.h: Definition of the public simd
>   interfaces and general implementation helpers.
>   * include/experimental/bits/simd_builtin.h: Implementation of the
>   _VecBuiltin simd_abi.
>   * include/experimental/bits/simd_combine.h: Preliminary
>   implementation of the _Combine simd_abi.
>   * include/experimental/bits/simd_converter.h: Generic simd
>   conversions.
>   * include/experimental/bits/simd_detail.h: Internal macros for the
>   simd implementation.
>   * include/experimental/bits/simd_fixed_size.h: Simd fixed_size ABI
>   specific implementations.
>   * include/experimental/bits/simd_math.h: Math overloads for simd.
>   * include/experimental/bits/simd_neon.h: Simd NEON specific
>   implementations.
>   * include/experimental/bits/simd_scalar.h: Simd scalar ABI
>   specific implementations.
>   * include/experimental/bits/simd_x86.h: Simd x86 specific
>   implementations.
>   * include/experimental/bits/simd_x86_conversions.h: x86 specific
>   conversion optimizations.


-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 std::experimental::simd  https://github.com/VcDevel/std-simd
──


Re: [C++ Patch] Improve build_new locations

2020-01-07 Thread Paolo Carlini

Hi,

On 06/01/20 21:47, Jason Merrill wrote:

On 1/2/20 4:23 AM, Paolo Carlini wrote:

@@ -19320,8 +19320,8 @@ tsubst_copy_and_build (tree t,
    tree op1 = tsubst (TREE_OPERAND (t, 1), args, complain, in_decl);

  tree op2 = RECUR (TREE_OPERAND (t, 2));
-    ret = build_new (&placement_vec, op1, op2, &init_vec,
- NEW_EXPR_USE_GLOBAL (t),
+    ret = build_new (input_location, &placement_vec, op1, op2,


Hmm, using input_location bothers me even though it does have the 
right value in this function.  Let's use a local variable instead; 
maybe change the existing loc variable to save_loc and use the name 
loc for the location we want to use.


I see, I wondered about that myself: when I started working on these 
location issues I forwarded simple EXPR_LOCATION (t) in a few places, 
which worked just fine. Then I noticed that in many existing places we 
were exploiting the fact that input_location is changed on top and 
started using it myself too. I suppose some of those places too could be 
changed to simple EXPR_LOCATION (t), I can experiment with that.


Anyway, the below implements the save_loc bit and regtests fine on 
x86_64-linux, as usual.


Thanks, Paolo.

/

Index: gcc/cp/cp-tree.h
===
--- gcc/cp/cp-tree.h(revision 279829)
+++ gcc/cp/cp-tree.h(working copy)
@@ -6720,9 +6720,10 @@ extern tree throw_bad_array_new_length   (void);
 extern bool type_has_new_extended_alignment(tree);
 extern unsigned malloc_alignment   (void);
 extern tree build_new_constexpr_heap_type  (tree, tree, tree);
-extern tree build_new  (vec<tree, va_gc> **, tree, tree,
-vec<tree, va_gc> **, int,
- tsubst_flags_t);
+extern tree build_new  (location_t,
+vec<tree, va_gc> **, tree,
+tree, vec<tree, va_gc> **,
+int, tsubst_flags_t);
 extern tree get_temp_regvar(tree, tree);
 extern tree build_vec_init (tree, tree, tree, bool, int,
  tsubst_flags_t);
Index: gcc/cp/init.c
===
--- gcc/cp/init.c   (revision 279829)
+++ gcc/cp/init.c   (working copy)
@@ -2396,8 +2396,8 @@ decl_constant_value (tree decl)
creates and returns a NEW_EXPR.  */
 
 static tree
-build_raw_new_expr (vec<tree, va_gc> *placement, tree type, tree nelts,
-   vec<tree, va_gc> *init, int use_global_new)
+build_raw_new_expr (location_t loc, vec<tree, va_gc> *placement, tree type,
+   tree nelts, vec<tree, va_gc> *init, int use_global_new)
 {
   tree init_list;
   tree new_expr;
@@ -2413,9 +2413,9 @@ static tree
   else
 init_list = build_tree_list_vec (init);
 
-  new_expr = build4 (NEW_EXPR, build_pointer_type (type),
-build_tree_list_vec (placement), type, nelts,
-init_list);
+  new_expr = build4_loc (loc, NEW_EXPR, build_pointer_type (type),
+build_tree_list_vec (placement), type, nelts,
+init_list);
   NEW_EXPR_USE_GLOBAL (new_expr) = use_global_new;
   TREE_SIDE_EFFECTS (new_expr) = 1;
 
@@ -3775,8 +3775,9 @@ build_new_1 (vec<tree, va_gc> **placement, tree ty
rather than just "new".  This may change PLACEMENT and INIT.  */
 
 tree
-build_new (vec<tree, va_gc> **placement, tree type, tree nelts,
-  vec<tree, va_gc> **init, int use_global_new, tsubst_flags_t complain)
+build_new (location_t loc, vec<tree, va_gc> **placement, tree type,
+  tree nelts, vec<tree, va_gc> **init, int use_global_new,
+  tsubst_flags_t complain)
 {
   tree rval;
   vec<tree, va_gc> *orig_placement = NULL;
@@ -3826,7 +3827,7 @@ tree
  || (nelts && type_dependent_expression_p (nelts))
  || (nelts && *init)
  || any_type_dependent_arguments_p (*init))
-   return build_raw_new_expr (*placement, type, nelts, *init,
+   return build_raw_new_expr (loc, *placement, type, nelts, *init,
   use_global_new);
 
   orig_placement = make_tree_vector_copy (*placement);
@@ -3852,10 +3853,11 @@ tree
 
   if (nelts)
 {
+  location_t nelts_loc = cp_expr_loc_or_loc (nelts, loc);
   if (!build_expr_type_conversion (WANT_INT | WANT_ENUM, nelts, false))
 {
   if (complain & tf_error)
-   permerror (cp_expr_loc_or_input_loc (nelts),
+   permerror (nelts_loc,
   "size in array new must have integral type");
   else
 return error_mark_node;
@@ -3871,8 +3873,7 @@ tree
 less than zero. ... If the expression is a constant expression,
 the program is ill-fomed.  */
   if (TREE_CODE (cst_nelts) == INTEGER_CST
- && !valid_array_size_p (cp_expr_loc_or_input_loc (nelts),
- 

Re: [RFC] IVOPTs select cand with preferred D-form access

2020-01-07 Thread Kewen.Lin
on 2020/1/7 5:14 PM, Richard Biener wrote:
> On Mon, 6 Jan 2020, Kewen.Lin wrote:
> 
>> We are thinking whether it can be handled in IVOPTs instead of one RTL pass.
>>
>> During IVOPTs selecting IV cands, it doesn't know the loop will be unrolled 
>> so
>> it doesn't count the possible step cost in with X-form.  If we can teach it 
>> to
>> consider the case, the IV cands which plays with D-form can be preferred.
>> Currently unrolling (incomplete) happens in RTL, it looks we have to predict
>> the loop whether unroll in IVOPTs.  Since there is some parameter checks on 
>> RTL
>> insn counts and target hooks, it seems not easy to get that.  Besides, we 
>> need
>> to check the step is valid to put into D-form field (eg: DQ-form requires 
>> divide
>> 16 exactly), to ensure no extra ADDIs needed.
>>
>> I'm not sure whether it's a good idea to implement in IVOPTs, but I did some
>> changes in IVOPTs to prove it's doable to get expected codes, the patch is 
>> attached.
>>
>> Any comments/suggestions are highly appreciated!
> 
> Is the unrolled code better than the not unrolled code (assuming
> optimal IV choice)?  Then IMHO IVOPTs should drive the unrolling,
> either by actually doing it or by forcing it via the loop->unroll
> setting.  I don't think second-guessing the RTL unroller at this
> point is going to work.  Alternatively turn X-form into D-form during
> RTL unrolling?
> 

Hi Richard,

Thanks for the comments!

Yes, the unrolled version is better on Power9 for both forms, but D-form
unrolled is better than X-form unrolled.  If we drive unrolling in IVOPTs,
I'm not sure whether it will be a concern that IVOPTs becomes too heavy, or
too rude with a forced UF if imprecise?  Do we still have the plan to
introduce one middle-end unroll pass; does it help if yes?
The quoted RTL patch proposes one RTL pass after the RTL loop passes; it
also sounds good to check whether RTL unrolling is a good place!


BR,
Kewen



Re: [PATCH 4/4] Make total scalarization also copy padding (PR 92486)

2020-01-07 Thread Richard Biener
On Tue, 17 Dec 2019, Martin Jambor wrote:

> Hi,
> 
> PR 92486 shows that DSE, when seeing a "normal" gimple aggregate
> assignment coming from a C struct assignment and one representing a
> folded memcpy, can kill the latter and keep in place only the former,
> which does not copy padding - at least when SRA decides to totally
> scalarize a least one of the aggregates (when not doing total
> scalarization, SRA cares about padding)
> 
> Mind you, SRA would not totally scalarize an aggregate if it saw that
> it takes part in a gimple assignment which is a folded memcpy (see how
> type_changing_p is set in contains_vce_or_bfcref_p) but it doesn't
> because of the DSE decisions.
> 
> I was asked to modify SRA to take padding into account - and to copy
> it around - when totally scalarizing, which is what the patch below
> does.
> 
> I believe the patch is correct in the sense that it will not cause
> miscompilations but after I have seen inlining propagate the useless
> (and small and ugly and certainly damaging) accesses far and wide, I
> am more convinced that before that this is not the correct approach
> and DSE should simply be able to discern between the two assignments
> too - and that the semantics of a "normal" gimple assignment should
> not include copying of padding.
> 
> But if the decision will be to make gimple aggregate copies always a solid
> block copy, the patch can do it, and has passed bootstrap and testing
> on x86_64-linux and a very similar one on aarch64-linux and
> i686-linux.  I suppose that at least the way how it figures out the
> type for copying will need change, but even then I'd rather not commit
> it.

I think both GENERIC and GIMPLE always had assignments being
block-copies.  I know we've changed memcpy folding to be more
explicit recently to avoid this very same issue but clearly
having a GIMPLE stmt like

 *p = *q;

decomposing into loads/stores of multiple discontiguous memory
ranges looks very wrong and would be quite awkward to correctly
represent throughout the compiler (and the alias infrastructure).

So even if the C standard says that for aggregates the elements are
copied and the contents of padding are unspecified, the actual GIMPLE
primitive should always copy everything unless we can prove the
padding is not used.  The frontends could always choose to
make not copying the padding explicit (but usually copying it
_is_ more efficient, unless you add artificially large padding).
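
For example (a sketch; the exact padding depends on the ABI):

/* On typical LP64 ABIs, struct s has 3 bytes of padding after C.
   As a GIMPLE block copy, *P = *Q moves all 8 bytes, padding
   included; a member-wise copy would leave P's padding bytes
   untouched.  */
struct s
{
  char c;   /* offset 0             */
            /* offsets 1-3: padding */
  int i;    /* offset 4             */
};

void
copy_whole (struct s *p, const struct s *q)
{
  *p = *q;
}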

From an SRA analysis perspective I wonder if you can produce a
fix that doesn't depend on the earlier patches in this series?
It might be as "simple" as, in completely_scalarize, for
FIELD_DECLs with following padding to artificially enlarge
the field (I wonder if we can do w/o the 'ref' arg to
scalarize_elem for that - we'd build a MEM_REF then?).

Thanks,
Richard.

> Thanks,
> 
> Martin
> 
> 2019-12-13  Martin Jambor  
> 
>   PR tree-optimization/92486
>   * tree-sra.c: Include langhooks.h.
>   (total_scalarization_fill_padding): New function.
>   (total_skip_all_accesses_until_pos): Also create accesses for padding.
>   (total_should_skip_creating_access): Pass new parameters to
>   total_skip_all_accesses_until_pos, update how much area is already
>   covered in cases of success.
>   (totally_scalarize_subtree): Track how much of an aggregate is
>   covered, create accesses for trailing padding.
> ---
>  gcc/tree-sra.c | 102 -
>  1 file changed, 92 insertions(+), 10 deletions(-)
> 
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 9f087e5c27a..753bf63c33c 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -99,7 +99,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "dbgcnt.h"
>  #include "builtins.h"
>  #include "tree-sra.h"
> -
> +#include "langhooks.h"
>  
>  /* Enumeration of all aggregate reductions we can do.  */
>  enum sra_mode { SRA_MODE_EARLY_IPA,   /* early call regularization */
> @@ -2983,6 +2983,54 @@ create_total_scalarization_access (struct access 
> *parent, HOST_WIDE_INT from,
>return access;
>  }
>  
> +/* Create accesses to cover padding within PARENT, spanning from FROM to TO,
> +   link it in between its children in between LAST_PTR and NEXT_SIBLING.  */
> +
> +static struct access **
> +total_scalarization_fill_padding (struct access *parent, HOST_WIDE_INT from,
> +   HOST_WIDE_INT to, struct access **last_ptr,
> +   struct access *next_sibling)
> +{
> +  do
> +{
> +  /* Diff cannot be bigger than max_scalarization_size in
> +  analyze_all_variable_accesses.  */
> +  HOST_WIDE_INT diff = to - from;
> +  gcc_assert (diff >= BITS_PER_UNIT);
> +  HOST_WIDE_INT stsz = 1 << floor_log2 (diff);
> +  tree type;
> +  scalar_int_mode mode;
> +
> +  while (true)
> + {
> +   type = lang_hooks.types.type_for_size (stsz, 1);
> +   if (type
> +   && is_a  (TYPE_MODE 

Re: Add a compatible_vector_types_p target hook

2020-01-07 Thread Richard Sandiford
Richard Sandiford  writes:
> Richard Biener  writes:
>> On December 14, 2019 11:43:48 AM GMT+01:00, Richard Sandiford 
>>  wrote:
>>>Richard Biener  writes:
 On December 13, 2019 10:12:40 AM GMT+01:00, Richard Sandiford
>>> wrote:
>Richard Biener  writes:
>The AArch64 port emits an error if calls pass values of SVE type
>>>to
>>>an
>unprototyped function.  To do that we need to know whether the
>value
>really is an SVE type rathr than a plain vector.
>
>For varargs the ABI is the same for 256 bits+.  But we'll have the
>same problem there once we support -msve-vector-bits=128, since
>>>the
>layout of SVE and Advanced SIMD vectors differ for big-endian.

 But then why don't you have different modes?
>>>
>>>Yeah, true, modes will probably help for the Advanced SIMD/SVE
>>>difference.  But from a vector value POV, a vector of 4 ints is a
>>>vector
>>>of 4 ints, so even distinguishing based on the mode is artificial.
>>
>> True. 
>>
>>>SVE is AFAIK the first target to have different modes for
>>>potentially
>>>the "same" vector type, and I had to add new infrastructure to
>>>allow
>>>targets to define multiple modes of the same size.  So the fact
>>>that
>>>gimple distinguishes otherwise identical vectors based on mode is a
>>>relatively recent thing.  AFAIK it just fell out in the wash rather
>>>than being deliberately planned.  It happens to be convenient in
>>>this
>>>context, but it hasn't been important until now.
>>>
>>>The hook doesn't seem any worse than distinguishing based on the
>mode.
>>>Another way to avoid this would have been to define separate SVE
>modes
>>>for the predefined vectors.  The big downside of that is that we'd
>end
>>>up doubling the number of SVE patterns.
>>>
>>>Extra on-the-side metadata is going to be easy to drop
>>>accidentally,
>>>and this is something we need for correctness rather than
>optimisation.
>>
>> Still selecting the ABI during call expansion only and based on
>values types at that point is fragile.
>
>Agreed.  But it's fragile in general, not just for this case. 
>>>Changing
>something as fundamental as that would be a lot of work and seems
>likely
>to introduce accidental ABI breakage.
>
>> The frontend are in charge of specifying the actual argument type
>>>and
>> at that point the target may fix the ABI. The ABI can be recorded
>>>in
>> the calls fntype, either via its TYPE_ARG_TYPES or in more awkward
>> ways for varargs functions (in full generality that would mean
>> attaching varargs ABI meta to each call).
>>
>> The alternative is to have an actual argument type vector
>>>associated
>> with each call.
>
>I think multiple pieces of gimple code would then have to cope with
>that
>as a special case.  E.g. if:
>
>   void foo (int, ...);
>
>   type1 a;
>   b = VIEW_CONVERT_EXPR <type2> (a);
>   if (a)
> foo (1, a);
>   else
> foo (1, b);
>
>gets converted to:
>
>   if (a)
> foo (1, a);
>   else
> foo (1, a);
>
>on the basis that type1 and type2 are "the same" despite having
>different calling conventions, we have to be sure that the calls
>are not treated as equivalent:
>
>   foo (1, a);
>
>Things like IPA clones would also need to handle this specially.
>Anything that generates new calls based on old ones will need
>to copy this information too.
>
>This also sounds like it would be fragile and seems a bit too
>invasive for stage 3.

 But we are already relying on this to work (fntype non-propagation)
>>>because function pointer conversions are dropped on the floor. 

 The real change would be introducing (per call) fntype for calls to
>>>unprototyped functions and somehow dealing with varargs. 
>>>
>>>It looks like this itself relies on useless_type_conversion_p,
>>>is that right?  E.g. we have things like:
>>>
>>>bool
>>>func_checker::compare_gimple_call (gcall *s1, gcall *s2)
>>>{
>>>  ...
>>>  tree fntype1 = gimple_call_fntype (s1);
>>>  tree fntype2 = gimple_call_fntype (s2);
>>>  if ((fntype1 && !fntype2)
>>>  || (!fntype1 && fntype2)
>>>  || (fntype1 && !types_compatible_p (fntype1, fntype2)))
>>>return return_false_with_msg ("call function types are not
>>>compatible");
>>>
>>>and useless_type_conversion_p has:
>>>
>>>  else if ((TREE_CODE (inner_type) == FUNCTION_TYPE
>>> || TREE_CODE (inner_type) == METHOD_TYPE)
>>>&& TREE_CODE (inner_type) == TREE_CODE (outer_type))
>>>{
>>>  tree outer_parm, inner_parm;
>>>
>>>  /* If the return types are not compatible bail out.  */
>>>  if (!useless_type_conversion_p (TREE_TYPE (outer_type),
>>>   TREE_TYPE (inner_type)))
>>> 

Re: [PATCH] Use dump_asm_name for Callers/Calls in dump.

2020-01-07 Thread Martin Liška

On 1/7/20 11:12 AM, Martin Jambor wrote:

I try to use just the symtab_order numbers whenever I can to avoid
confusion


Which is fine.  Apparently there are just a few usages of manual printing
of a symtab node and order, like:

  fprintf (f,
   "%*s%s/%i %s\n%*s  freq:%4.2f",
   indent, "", callee->name (), callee->order,

I can replace these with symtab_node::dump_{asm_}name.


but - at least if we care about the non-asm name at all -
shouldn't the dump have both the name and asm_name?


That will probably make output of functions like cgraph_node::dump
harder to read.


If not, then I
guess get_dump_name should print asm name too to get consistency
everywhere in the IPA dumps not just the graph dump bit.


One alternative would be to always print asm names? Or we can drive that
with a flag? All tools like objdump, as, etc. do a similar thing; they
have --demangle (--no-demangle) options.

Martin



Martin




Re: [PATCH 2/3] Don't allow mixed component and non-component accesses for OpenACC/Fortran

2020-01-07 Thread Tobias Burnus

Hi Julian,

please hold back the commit.

Actually, it does not seem to work if one modifies the test
case a bit. The following code compiles – but I think it
should not:

type one
  integer i, j
end type
type two
  type(one)  A, B
end type

type(two) x
!$acc enter data copyin(x%A, x%A%i)
!$acc enter data copyin(x%A, x%A%i, x%A%i)
end

At least the last line has x%A%i twice but it is accepted.

I am not sure whether the line before is valid or not
(x%A%i is copied twice; might make a difference if it were
a pointer but it isn't a pointer here.) – In C/C++, with OpenMP
and, hence, OpenACC, the first one is accepted but the second
one is rejected. C example:

struct one {
  int i, j;
};

struct two {
  struct one  A, B;
};

void foo() {
  struct two x;
/* Accepted: x.A + x.A.i, even though both map 'x.A.i'.  */
#pragma omp target enter data map(to:x.A, x.A.i)
#pragma acc enter data copyin(x.A, x.A.i)

/* Rejected: 'x.A.i' mapped twice */
#pragma omp target enter data map(to:x.A, x.A.i, x.A.i)
#pragma acc enter data copyin(x.A, x.A.i, x.A.i)
}


Tobias



Patch ping

2020-01-07 Thread Jakub Jelinek
Hi!

I'd like to ping a few patches:

PR target/93009 - avx512* wrong-code fix
  http://gcc.gnu.org/ml/gcc-patches/2019-12/msg01488.html

PR target/93069 - avx512* rejects-valid fix (rejected by assembler)
  http://gcc.gnu.org/ml/gcc-patches/2019-12/msg01606.html

PR tree-optimization/92868 - compute_objsize/gimple_call_alloc_size
 /maybe_warn_overflow fixes
  http://gcc.gnu.org/ml/gcc-patches/2019-12/msg01164.html

Thanks

Jakub



[committed][AArch64] Use type attributes to mark types that use the SVE PCS

2020-01-07 Thread Richard Sandiford
The SVE port needs to maintain a different type identity for
GNU vectors and "SVE vectors", since the types use different ABIs.
Until now we've done that using pointer equality between the
TYPE_MAIN_VARIANT and the built-in SVE type.

However, as Richard B noted, that doesn't work well for LTO,
where we stream both GNU and SVE types from a file instead of
creating them directly.  We need a mechanism for distinguishing
the types using streamed type information.

This patch does that using a new type attribute.  This attribute
is only meant to be used for the built-in SVE types and shouldn't
be user-visible.  The patch tries to ensure this by including a space
in the attribute name, like we already do for things like "fn spec"
and "omp declare simd".
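
In essence the check becomes attribute-based; a simplified sketch (the
real builtin_type_p in the patch below also decodes the register counts
stored as the attribute's arguments):

/* Simplified: a type is a built-in SVE type iff it carries the
   internal "SVE type" attribute.  */
static bool
sve_builtin_type_p (const_tree type)
{
  return lookup_attribute ("SVE type", TYPE_ATTRIBUTES (type)) != NULL_TREE;
}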

Tested on aarch64-linux-gnu, applied as r279953.

Richard


2020-01-07  Richard Sandiford  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_sve::svbool_type_p)
(aarch64_sve::nvectors_if_data_type): Replace with...
(aarch64_sve::builtin_type_p): ...this.
* config/aarch64/aarch64-sve-builtins.cc: Include attribs.h.
(find_vector_type): Delete.
(add_sve_type_attribute): New function.
(lookup_sve_type_attribute): Likewise.
(register_builtin_types): Add an "SVE type" attribute to each type.
(register_tuple_type): Likewise.
(svbool_type_p, nvectors_if_data_type): Delete.
(mangle_builtin_type): Use lookup_sve_type_attribute.
(builtin_type_p): Likewise.  Add an overload that returns the
number of constituent vector and predicate registers.
* config/aarch64/aarch64.c (aarch64_sve_argument_p): Delete.
(aarch64_returns_value_in_sve_regs_p): Use aarch64_sve::builtin_type_p
instead of aarch64_sve_argument_p.
(aarch64_takes_arguments_in_sve_regs_p): Likewise.
(aarch64_pass_by_reference): Likewise.
(aarch64_function_value_1): Likewise.
(aarch64_return_in_memory): Likewise.
(aarch64_layout_arg): Likewise.

gcc/testsuite/
* g++.target/aarch64/sve/acle/general-c++/mangle_5.C: New test.
* gcc.target/aarch64/sve/pcs/asm_1.c: Likewise.
* gcc.target/aarch64/sve/pcs/asm_2.c: Likewise.
* gcc.target/aarch64/sve/pcs/asm_3.c: Likewise.

Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h 2020-01-06 12:58:16.457794543 +
+++ gcc/config/aarch64/aarch64-protos.h 2020-01-07 10:17:56.612716250 +
@@ -706,8 +706,7 @@ tree aarch64_builtin_vectorized_function
   void handle_arm_sve_h ();
   tree builtin_decl (unsigned, bool);
   bool builtin_type_p (const_tree);
-  bool svbool_type_p (const_tree);
-  unsigned int nvectors_if_data_type (const_tree);
+  bool builtin_type_p (const_tree, unsigned int *, unsigned int *);
   const char *mangle_builtin_type (const_tree);
   tree resolve_overloaded_builtin (location_t, unsigned int,
   vec *);
Index: gcc/config/aarch64/aarch64-sve-builtins.cc
===
--- gcc/config/aarch64/aarch64-sve-builtins.cc  2020-01-06 12:58:16.457794543 +
+++ gcc/config/aarch64/aarch64-sve-builtins.cc  2020-01-07 10:17:56.616716227 +
@@ -47,6 +47,7 @@
 #include "gimple-fold.h"
 #include "langhooks.h"
 #include "stringpool.h"
+#include "attribs.h"
 #include "aarch64-sve-builtins.h"
 #include "aarch64-sve-builtins-base.h"
 #include "aarch64-sve-builtins-shapes.h"
@@ -418,18 +419,31 @@ static hash_table 0;
+  if (tree attr = lookup_sve_type_attribute (type))
+{
+  tree num_zr_node = TREE_VALUE (attr);
+  tree num_pr_node = TREE_CHAIN (num_zr_node);
+  *num_zr = tree_to_uhwi (TREE_VALUE (num_zr_node));
+  *num_pr = tree_to_uhwi (TREE_VALUE (num_pr_node));
+  return true;
+}
+  return false;
 }
 
 /* Implement TARGET_VERIFY_TYPE_CONTEXT for SVE types.  */
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c  2020-01-06 12:58:16.457794543 +
+++ gcc/config/aarch64/aarch64.c  2020-01-07 10:17:56.620716201 +
@@ -1246,6 +1246,7 @@ static const struct attribute_spec aarch
affects_type_identity, handler, exclude } */
   { "aarch64_vector_pcs", 0, 0, false, true,  true,  true,
  handle_aarch64_vector_pcs_attribute, NULL },
+  { "SVE type",  3, 3, false, true,  false, true,  NULL, NULL },
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
 };
 
@@ -2042,37 +2043,15 @@ aarch64_hard_regno_mode_ok (unsigned reg
true, set *NUM_ZR and *NUM_PR to the number of required Z and P registers
respectively.  */
 
-static bool
-aarch64_sve_argument_p (const_tree type, unsigned int *num_zr,
-   unsigned int *num_pr)
-{
-  if (aarch64_sve::svbool_type_p (type))
-{
-  

Re: [PATCH] Relax invalidation of TOP N counters in PGO.

2020-01-07 Thread Martin Liška

On 1/7/20 11:08 AM, Jan Hubicka wrote:

On 1/6/20 8:03 PM, Jan Hubicka wrote:

OK

Actually I am not so sure about this patch - how do we ensure
reproducibility in this case?

ISTM that anyone trying to have reproducible builds shouldn't be using
PGO based optimizations.


OpenSUSE does that. Builds are supposed to be reproducible + PGO is used
for a number of core packages (like GCC itself).  This was the motivation for
Martin's original change to drop TOPN values from the profile more
aggressively.


I agree with Honza that we should provide builds that are as reproducible as possible.
On the other hand, we should not be rigid and should allow a reasonable sample loss.
That's why I added a parameter that can be tweaked on the openSUSE side.

Note that the original motivation for the patch was your PR :) So I'm surprised
you don't like the patch much.


I know. However I wonder if this can be done in a reproducible way.


My patch deals with TOP N counters for which we expect quite a high
frequency in order to trigger an optimization. That said, such values should
not be evicted from a TOP N counter.


The problem is that small values from the profile run pollute the on-disk
counter and eventually render it invalid.


Considering my patch, how will these small values lead to an invalid counter?


If libgcov dropped values smaller than TOTAL_COUNT/N from the histogram
before merging or streaming, it may be enough to fit in the number of
entries? (At least in the testcase they seem very dominated by one
value.)


This will probably have the same effect as the patch I've suggested.



We could also keep more values on disk than are used by the gcov counter.
But I see it is hard to get that with the current way libgcov is done.


Yes, it's possible to extend it.

Martin



Honza





Re: [PATCH] Use dump_asm_name for Callers/Calls in dump.

2020-01-07 Thread Martin Jambor
Hi,

On Tue, Jan 07 2020, Martin Liška wrote:
> Hi.
>
> I would like to make cgraph_node names consistent in cgraph_node::dump.
>

I try to use just the symtab_order numbers whenever I can to avoid
confusion, but - at least if we care about the non-asm name at all -
shouldn't the dump have both the name and asm_name?  If not, then I
guess get_dump_name should print the asm name too, to get consistency
everywhere in the IPA dumps, not just in the graph dump bit.

Martin


> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?
> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2020-01-02  Martin Liska  
>
>   * cgraph.c (cgraph_node::dump): Use systematically
>   dump_asm_name.
> ---
>   gcc/cgraph.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
>
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index 13238d3d442..aa2c476842a 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -2036,7 +2036,7 @@ cgraph_node::dump (FILE *f)
>profile_count sum = profile_count::zero ();
>for (edge = callers; edge; edge = edge->next_caller)
>  {
> -  fprintf (f, "%s ", edge->caller->dump_name ());
> +  fprintf (f, "%s ", edge->caller->dump_asm_name ());
>edge->dump_edge_flags (f);
>if (edge->count.initialized_p ())
>   sum += edge->count.ipa ();
> @@ -2045,7 +2045,7 @@ cgraph_node::dump (FILE *f)
>fprintf (f, "\n  Calls: ");
>for (edge = callees; edge; edge = edge->next_callee)
>  {
> -  fprintf (f, "%s ", edge->callee->dump_name ());
> +  fprintf (f, "%s ", edge->callee->dump_asm_name ());
>edge->dump_edge_flags (f);
>  }
>fprintf (f, "\n");


Re: [patch] Let libstdc++ know that VxWorks has_nanosleep

2020-01-07 Thread Olivier Hainque



> On 6 Jan 2020, at 16:21, Jonathan Wakely  wrote:
>> 
>> Overall, this really seems like a good setting for
>> --enable-libstdcxx-time auto.
> 
> OK, that makes sense. Thanks!

Sure, thank you for your feedback :)


Re: [PATCH] Relax invalidation of TOP N counters in PGO.

2020-01-07 Thread Jan Hubicka
> On 1/6/20 8:03 PM, Jan Hubicka wrote:
> > > > > OK
> > > > Actually I am not so sure about this patch - how do we ensure
> > > > reproducibility in this case?
> > > ISTM that anyone trying to have reproducible builds shouldn't be using
> > > PGO based optimizations.
> > 
> > OpenSUSE does that. Builds are supposed to be reproducible + PGO is used
> > for a number of core packages (like GCC itself).  This was the motivation for
> > Martin's original change to drop TOPN values from the profile more
> > aggressively.
> 
> I agree with Honza that we should provide builds that are as reproducible as possible.
> On the other hand, we should not be rigid and should allow a reasonable sample loss.
> That's why I added a parameter that can be tweaked on the openSUSE side.
> 
> Note that the original motivation for the patch was your PR :) So I'm
> surprised you don't like the patch much.

I know. However I wonder if this can be done in a reproducible way.
The problem is that small values from the profile run pollute the on-disk
counter and eventually render it invalid.  If libgcov dropped values
smaller than TOTAL_COUNT/N from the histogram before merging or streaming,
it may be enough to fit in the number of entries? (At least in the
testcase they seem very dominated by one value.)
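
A minimal sketch of that pruning, assuming a plain array of counts (the
actual TOP N storage in libgcov is organized differently, and the names
here are illustrative):

/* Drop entries whose count is below TOTAL / N before merging or
   streaming, freeing their slots for the dominant values.  */
static void
prune_topn (gcov_type *counts, unsigned n, gcov_type total)
{
  for (unsigned i = 0; i < n; i++)
    if (counts[i] * (gcov_type) n < total)
      counts[i] = 0;
}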

We could also keep more values on disk than are used by the gcov counter.
But I see it is hard to get that with the current way libgcov is done.

Honza


Re: [PATCH][RFC] Introduce BIT_FIELD_INSERT

2020-01-07 Thread Richard Biener
On Tue, 7 Jan 2020, Andrew Pinski wrote:

> On Mon, Jan 6, 2020 at 11:36 PM Richard Biener  wrote:
> >
> > On Mon, 16 Dec 2019, Andrew Pinski wrote:
> >
> > > On Thu, Nov 15, 2018 at 12:31 AM Richard Biener  wrote:
> > > >
> > > > On Thu, 15 Nov 2018, Richard Biener wrote:
> > > >
> > > > > On Wed, 14 Nov 2018, Andrew Pinski wrote:
> > > > >
> > > > > > On Fri, May 13, 2016 at 3:51 AM Richard Biener  
> > > > > > wrote:
> > > > > > >
> > > > > > >
> > > > > > > The following patch adds BIT_FIELD_INSERT, an operation to
> > > > > > > facilitate doing bitfield inserts on registers (as opposed
> > > > > > > to currently where we'd have a BIT_FIELD_REF store).
> > > > > > >
> > > > > > > Originally this was developed as part of bitfield lowering
> > > > > > > where bitfield stores were lowered into read-modify-write
> > > > > > > cycles and the modify part, instead of doing shifting and masking,
> > > > > > > be kept in a more high-level form to ease combining them.
> > > > > > >
> > > > > > > A second use case (the above is still valid) is vector element
> > > > > > > inserts which we currently can only do via memory or
> > > > > > > by extracting all components and re-building the vector using
> > > > > > > a CONSTRUCTOR.  For this second use case I added code
> > > > > > > re-writing the BIT_FIELD_REF stores the C family FEs produce
> > > > > > > into BIT_FIELD_INSERT when update-address-taken can otherwise
> > > > > > > re-write a decl into SSA form (the testcase shows we miss
> > > > > > > a similar opportunity with the MEM_REF form of a vector insert,
> > > > > > > I plan to fix that for the final submission).
> > > > > > >
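
For illustration, the second use case in GNU C; the GIMPLE in the
comment is approximate, and set_elem is a made-up name:

typedef int v4si __attribute__ ((vector_size (16)));

v4si
set_elem (v4si v, int x)
{
  v[2] = x;  /* roughly: v = BIT_FIELD_INSERT <v, x, 64>  */
  return v;
}
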
> > > > > > > One speciality of BIT_FIELD_INSERT as opposed to BIT_FIELD_REF
> > > > > > > is that the size of the insertion is given implicitly via the
> > > > > > > type size/precision of the value to insert.  That avoids
> > > > > > > introducing ways to have quaternary ops in folding and GIMPLE 
> > > > > > > stmts.
> > > > > > >
> > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > > > >
> > > > > > > Richard.
> > > > > > >
> > > > > > > 2011-06-16  Richard Guenther  
> > > > > > >
> > > > > > > PR tree-optimization/29756
> > > > > > > * tree.def (BIT_FIELD_INSERT): New tcc_expression tree 
> > > > > > > code.
> > > > > > > * expr.c (expand_expr_real_2): Handle BIT_FIELD_INSERT.
> > > > > > > * fold-const.c (operand_equal_p): Likewise.
> > > > > > > (fold_ternary_loc): Add constant folding of 
> > > > > > > BIT_FIELD_INSERT.
> > > > > > > * gimplify.c (gimplify_expr): Handle BIT_FIELD_INSERT.
> > > > > > > * tree-inline.c (estimate_operator_cost): Likewise.
> > > > > > > * tree-pretty-print.c (dump_generic_node): Likewise.
> > > > > > > * tree-ssa-operands.c (get_expr_operands): Likewise.
> > > > > > > * cfgexpand.c (expand_debug_expr): Likewise.
> > > > > > > * gimple-pretty-print.c (dump_ternary_rhs): Likewise.
> > > > > > > * gimple.c (get_gimple_rhs_num_ops): Handle 
> > > > > > > BIT_FIELD_INSERT.
> > > > > > > * tree-cfg.c (verify_gimple_assign_ternary): Verify 
> > > > > > > BIT_FIELD_INSERT.
> > > > > > >
> > > > > > > * tree-ssa.c (non_rewritable_lvalue_p): We can rewrite
> > > > > > > vector inserts using BIT_FIELD_REF on the lhs.
> > > > > > > (execute_update_addresses_taken): Do it.
> > > > > > >
> > > > > > > * gcc.dg/tree-ssa/vector-6.c: New testcase.
> > > > > > >
> > > > > > > Index: trunk/gcc/expr.c
> > > > > > > ===
> > > > > > > *** trunk.orig/gcc/expr.c   2016-05-12 13:40:30.704262951 
> > > > > > > +0200
> > > > > > > --- trunk/gcc/expr.c2016-05-12 15:40:32.481225744 +0200
> > > > > > > *** expand_expr_real_2 (sepops ops, rtx targ
> > > > > > > *** 9358,9363 
> > > > > > > --- 9358,9380 
> > > > > > > target = expand_vec_cond_expr (type, treeop0, treeop1, 
> > > > > > > treeop2, target);
> > > > > > > return target;
> > > > > > >
> > > > > > > + case BIT_FIELD_INSERT:
> > > > > > > +   {
> > > > > > > +   unsigned bitpos = tree_to_uhwi (treeop2);
> > > > > > > +   unsigned bitsize;
> > > > > > > +   if (INTEGRAL_TYPE_P (TREE_TYPE (treeop1)))
> > > > > > > + bitsize = TYPE_PRECISION (TREE_TYPE (treeop1));
> > > > > > > +   else
> > > > > > > + bitsize = tree_to_uhwi (TYPE_SIZE (TREE_TYPE 
> > > > > > > (treeop1)));
> > > > > > > +   rtx op0 = expand_normal (treeop0);
> > > > > > > +   rtx op1 = expand_normal (treeop1);
> > > > > > > +   rtx dst = gen_reg_rtx (mode);
> > > > > > > +   emit_move_insn (dst, op0);
> > > > > > > +   store_bit_field (dst, bitsize, bitpos, 0, 0,
> > > > > > > +TYPE_MODE (TREE_TYPE (treeop1)), op1, 
> > > > > > > false);
> > > > > > > +   return dst;
> > > > > > > +   }
> > > > > > 

Re: [PATCH 2/3] Don't allow mixed component and non-component accesses for OpenACC/Fortran

2020-01-07 Thread Tobias Burnus

Hi Julian,

@Jakub: I suggest enabling it also for OpenMP as prep work. Any 
objections/comments?
@Thomas: This all assumes that "copy(a,a.comp)" is not valid in OpenACC; 
does this match your understanding of the spec?


On 1/4/20 3:25 AM, Julian Brown wrote:


In C/C++, it's not permitted to mix full struct-type accesses and
"attach/detach" accesses to the same struct's members within a single
directive.  This patch adds the same restriction to the Fortran front-end,
and adjusts some existing tests to match.  A test is also added for
detection of duplicate derived-type member accesses within a directive.
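
For illustration, the C/C++ shape of the restriction being ported to
Fortran (names made up; the directive below is the error case, so the
snippet is not meant to compile cleanly):

struct s { int *p; int n; };

void
f (struct s v)
{
  #pragma acc enter data copyin(v, v.p[0:v.n]) /* error: mixes v and v.p */
}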

Tested with offloading to NVPTX. OK for mainline?

Looks good to me, with the comments below addressed.

- the old symbol. gfc_new is used in symbol.c to flag new symbols.  */
+ the old symbol. gfc_new is used in symbol.c to flag new symbols.
+ comp_mark is used to indicate variables which have component accesses
+ in OpenACC directive clauses.  */

"OpenMP/OpenACC" – assuming …

+ /* Allow multiple components of the same (e.g. derived-type)
+variable here.  Duplicate components are detected elsewhere.  */
  if (openacc && n->expr && n->expr->expr_type == EXPR_VARIABLE)

…

+ if (ref->type == REF_COMPONENT)
+   component_ref_p = true;
+ if (openacc && ((!component_ref_p && n->sym->comp_mark)
+ || (component_ref_p && n->sym->mark)))
+   gfc_error ("Symbol %qs has mixed component and non-component "
+  "accesses at %L", n->sym->name, >where);


To match C/C++, I think it makes sense to apply the same check also for 
OpenMP.


Currently, that's only a nop as the OpenMP 4.5 feature is not yet 
implemented (and, hence, component access is rejected during parsing) 
but the condition also applies to OpenMP.


Thanks,

Tobias



Re: [PATCH] ipa-inline: Adjust condition for caller_growth_limits

2020-01-07 Thread luoxhu



On 2020/1/7 16:40, Jan Hubicka wrote:
>> On Mon, 2020-01-06 at 01:03 -0600, Xiong Hu Luo wrote:
>>> Inline should return failure either (newsize > param_large_function_insns)
>>> OR (newsize > limit).  Sometimes newsize is larger than
>>> param_large_function_insns, but smaller than limit, inline doesn't return
>>> failure even if the new function is a large function.
>>> Therefore, param_large_function_insns and param_large_function_growth 
>>> should be
>>> OR instead of AND, otherwise --param large-function-growth won't
>>> work correctly with --param large-function-insns.
>>>
>>> gcc/ChangeLog:
>>>
>>> 2020-01-06  Luo Xiong Hu  
>>>
>>> ipa-inline-analysis.c (estimate_growth): Fix typo.
>>> ipa-inline.c (caller_growth_limits): Use OR instead of AND.
>> OK
> 
> This patch also seems wrong.  Inliner should return failure when
>   newsize > param_large_function_insns && newsize > limit
> The reason is same as with large_unit_insns.  We used to have only
> growth limits but it has turned out that this gets too
> restrictive/random for very small units/growths.
> 
> So the logic is meant to be that inlining is OK either
>   - if function is reasonably small (large_function_insns limit is not
> met)
>   - or it does not grow too much (large-function-growth is not met)

Sorry for checking it in too fast.  Will revert once confirmed.
large-function-insns defaults to 2700 and large-function-growth defaults to
100, so inlining a large function with 2700 insns into another large
function with 2700 insns is allowed here (the limit reaches 5400).  I observed
a HUGE performance drop in some cases from this behavior (a large function
inlining a large function causes register spilling) compared with not
inlining; that's why this patch came about.

From a performance point of view, it seems beneficial to inline a small
function into a large function, but negative to inline a large function into
a large function (causing register spills)?  2700 seems too large for
defining a function as large, and 5400 also seems too big for the growth limit?

This inlining happens at "Inlining one function called once"; it's outside of
inline_small_functions, and since newsize > param_large_function_insns, I
suppose it won't change behavior for inlining small functions?

> 
> @item large-function-insns
> The limit specifying really large functions.  For functions larger than
> this limit after inlining, inlining is constrained by @option{--param
> large-function-growth}.  This parameter is useful primarily to avoid
> extreme compilation time caused by non-linear algorithms used by the
> back end.
> 
> 
> Honza
>> jeff
>>



Re: [PATCH] ipa-inline: Adjust condition for caller_growth_limits

2020-01-07 Thread Martin Liška

On 1/7/20 9:40 AM, Jan Hubicka wrote:

On Mon, 2020-01-06 at 01:03 -0600, Xiong Hu Luo wrote:

Inline should return failure either (newsize > param_large_function_insns)
OR (newsize > limit).  Sometimes newsize is larger than
param_large_function_insns, but smaller than limit, inline doesn't return
failure even if the new function is a large function.
Therefore, param_large_function_insns and param_large_function_growth should be
OR instead of AND, otherwise --param large-function-growth won't
work correctly with --param large-function-insns.

gcc/ChangeLog:

2020-01-06  Luo Xiong Hu  

ipa-inline-analysis.c (estimate_growth): Fix typo.
ipa-inline.c (caller_growth_limits): Use OR instead of AND.

OK


This patch also seems wrong.  Inliner should return failure when
  newsize > param_large_function_insns && newsize > limit
The reason is same as with large_unit_insns.  We used to have only
growth limits but it has turned out that this gets too
restrictive/random for very small units/growths.

So the logic is meant to be that inlining is OK either
  - if function is reasonably small (large_function_insns limit is not
met)
  - or it does not grow too much (large-function-growth is not met)

@item large-function-insns
The limit specifying really large functions.  For functions larger than
this limit after inlining, inlining is constrained by @option{--param
large-function-growth}.  This parameter is useful primarily to avoid
extreme compilation time caused by non-linear algorithms used by the
back end.


Hello.

The patch was already installed and caused quite significant changes in SPEC
benchmarks:
https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report

May I revert the patch?

Thanks,
Martin




Honza

jeff





Re: [PATCH 1/3] Add OpenACC test for sub-references being pointer or allocatable variables

2020-01-07 Thread Tobias Burnus

Hi Julian,

looks good to me – thanks!

Tobias

On 1/4/20 3:25 AM, Julian Brown wrote:

Hi,

This test (by Tobias Burnus, mildly edited) checks whether
the final component of a derived-type access has pointer or allocatable
type for manual deep copy attach/detach operations.  This is just checking
existing behaviour.

This arose from discussion of the manual deep copy patch here:

   https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01439.html

OK?

Thanks,

Julian

ChangeLog

2020-01-04  Tobias Burnus  

gcc/testsuite/
* gfortran.dg/goacc/strided-alloc-ptr.f90: New test.
---
  .../gfortran.dg/goacc/strided-alloc-ptr.f90   | 34 +++
  1 file changed, 34 insertions(+)
  create mode 100644 gcc/testsuite/gfortran.dg/goacc/strided-alloc-ptr.f90

diff --git a/gcc/testsuite/gfortran.dg/goacc/strided-alloc-ptr.f90 b/gcc/testsuite/gfortran.dg/goacc/strided-alloc-ptr.f90
new file mode 100644
index 000..755fd1c164b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/strided-alloc-ptr.f90
@@ -0,0 +1,34 @@
+implicit none
+type t
+  integer, allocatable :: i, j(:)
+  integer, pointer :: k, ll(:)
+end type t
+type(t) :: x(2)
+
+!$acc enter data copyin(x)
+
+!$acc enter data copyin(x(:)%i)
+! { dg-error "Component to the right of a part reference with nonzero rank must not have the 
ALLOCATABLE attribute" "" { target "*-*-*" } 10 }
+! { dg-error ".x. in MAP clause at .1. is not a proper array section"  "" { target 
"*-*-*" } 10 }
+
+!$acc enter data copyin(x(:)%j(3))
+! { dg-error "Component to the right of a part reference with nonzero rank must not have the 
ALLOCATABLE attribute" "" { target "*-*-*" } 14 }
+! { dg-error ".x. in MAP clause at .1. is not a proper array section"  "" { target 
"*-*-*" } 14 }
+
+!$acc enter data copyin(x(:)%j)
+! { dg-error "Component to the right of a part reference with nonzero rank must not have the 
ALLOCATABLE attribute" "" { target "*-*-*" } 18 }
+! { dg-error ".x. in MAP clause at .1. is not a proper array section"  "" { target 
"*-*-*" } 18 }
+
+
+!$acc enter data copyin(x(:)%k)
+! { dg-error "Component to the right of a part reference with nonzero rank must not have the POINTER 
attribute" "" { target "*-*-*" } 23 }
+! { dg-error ".x. in MAP clause at .1. is not a proper array section"  "" { target 
"*-*-*" } 23 }
+
+!$acc enter data copyin(x(:)%ll(3))
+! { dg-error "Component to the right of a part reference with nonzero rank must not have the POINTER 
attribute" "" { target "*-*-*" } 27 }
+! { dg-error ".x. in MAP clause at .1. is not a proper array section"  "" { target 
"*-*-*" } 27 }
+
+!$acc enter data copyin(x(:)%ll)
+! { dg-error "Component to the right of a part reference with nonzero rank must not have the POINTER 
attribute" "" { target "*-*-*" } 31 }
+! { dg-error ".x. in MAP clause at .1. is not a proper array section"  "" { target 
"*-*-*" } 31 }
+end


*ping* – Re: [Patch] OpenACC – support "if" + "if_present" clauses with "host_data"

2020-01-07 Thread Tobias Burnus

First *ping* (two weeks old but all during the holiday season).

Side note, for what it is worth: Thomas K regarded the Fortran part as 
reasonable/OK – but those OpenACC changes also affect 
C/C++/omp-low.c/libgomp :-) – cf. 
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01596.html


Cheers,

Tobias

On 12/24/19 3:23 PM, Tobias Burnus wrote:
On the front-end side, adding "if" and "if_present" to the "acc 
host_data" directive is simple, as other directives already support 
those clauses.


The 'if_present' status has to be passed alongside the use_device_ptr 
flag; for this a new map kind has been introduced, using the gap in the 
gomp_map_kind enum (16 was still free; it is now used for 
GOMP_MAP_USE_DEVICE_PTR_IF_PRESENT, while use_device_ptr has 14).
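
A sketch of the enum slice described above (the two values are the ones
given in the text; all other members are elided):

enum gomp_map_kind
{
  /* ... other map kinds ...  */
  GOMP_MAP_USE_DEVICE_PTR = 14,
  /* 16 was the free slot.  */
  GOMP_MAP_USE_DEVICE_PTR_IF_PRESENT = 16
  /* ... remaining map kinds ...  */
};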


There was some back and forth on whether a new enum value should be used or a 
flag (passed in the device argument, as OpenACC only uses host and 
device and not a device number). The initial version used a new enum value, 
then a flag, and now again a new enum value.


On the libgomp side, one simply needs to skip the if-not-present error 
and otherwise handle it as use_device_ptr.


Built on x86-64-gnu-linux without offloading and with nvptx offloading.
OK for the trunk?

Cheers,

Tobias

PS: History: The initial version (not public) used an enum, but Thomas 
was wondering whether a flag was not the better solution, cf. the 
discussion at https://gcc.gnu.org/ml/gcc/2018-12/msg00118.html – 
Hence, the first public version switched to a flag and was submitted 
for the OG8 (openacc-gcc-8-branch GIT) branch at 
https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01559.html and has been 
committed as Rev. b52c8d006581b2a657ac3d81ab2527bb398b6615 (this 
commit is also part of OG9). A minor fix was committed to OG8?/OG9 as 
Rev. 995f9680a46c3a7246fe465faa847f8009e47ed8.


Thomas now wonders whether the flag will be future-safe; hence, it now 
uses an enum again, as in the original version.


PPS: Changes to OG9: Using again an enum; Fortran run-time test and 
rediffs.




Re: [PATCH] Relax invalidation of TOP N counters in PGO.

2020-01-07 Thread Martin Liška

On 1/6/20 8:03 PM, Jan Hubicka wrote:

OK

Actually I am not so sure about this patch - how do we ensure
reproducibility in this case?

ISTM that anyone trying to have reproducible builds shouldn't be using
PGO based optimizations.


OpenSUSE does that. Builds are supposed to be reproducible + PGO is used
for a number of core packages (like GCC itself).  This was the motivation for
Martin's original change to drop TOPN values from the profile more
aggressively.


I agree with Honza that we should provide builds that are as reproducible as possible.
On the other hand, we should not be rigid and should allow a reasonable sample loss.
That's why I added a parameter that can be tweaked on the openSUSE side.

Note that the original motivation for the patch was your PR :) So I'm surprised
you don't like the patch much.

Martin



Honza


jeff





Re: [PATCH][RFC] Introduce BIT_FIELD_INSERT

2020-01-07 Thread Andrew Pinski
On Mon, Jan 6, 2020 at 11:36 PM Richard Biener  wrote:
>
> On Mon, 16 Dec 2019, Andrew Pinski wrote:
>
> > On Thu, Nov 15, 2018 at 12:31 AM Richard Biener  wrote:
> > >
> > > On Thu, 15 Nov 2018, Richard Biener wrote:
> > >
> > > > On Wed, 14 Nov 2018, Andrew Pinski wrote:
> > > >
> > > > > On Fri, May 13, 2016 at 3:51 AM Richard Biener  
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > > The following patch adds BIT_FIELD_INSERT, an operation to
> > > > > > facilitate doing bitfield inserts on registers (as opposed
> > > > > > to currently where we'd have a BIT_FIELD_REF store).
> > > > > >
> > > > > > Originally this was developed as part of bitfield lowering
> > > > > > where bitfield stores were lowered into read-modify-write
> > > > > > cycles and the modify part, instead of doing shifting and masking,
> > > > > > be kept in a more high-level form to ease combining them.
> > > > > >
> > > > > > A second use case (the above is still valid) is vector element
> > > > > > inserts which we currently can only do via memory or
> > > > > > by extracting all components and re-building the vector using
> > > > > > a CONSTRUCTOR.  For this second use case I added code
> > > > > > re-writing the BIT_FIELD_REF stores the C family FEs produce
> > > > > > into BIT_FIELD_INSERT when update-address-taken can otherwise
> > > > > > re-write a decl into SSA form (the testcase shows we miss
> > > > > > a similar opportunity with the MEM_REF form of a vector insert,
> > > > > > I plan to fix that for the final submission).
> > > > > >
> > > > > > One speciality of BIT_FIELD_INSERT as opposed to BIT_FIELD_REF
> > > > > > is that the size of the insertion is given implicitly via the
> > > > > > type size/precision of the value to insert.  That avoids
> > > > > > introducing ways to have quaternary ops in folding and GIMPLE stmts.
> > > > > >
> > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > 2011-06-16  Richard Guenther  
> > > > > >
> > > > > > PR tree-optimization/29756
> > > > > > * tree.def (BIT_FIELD_INSERT): New tcc_expression tree code.
> > > > > > * expr.c (expand_expr_real_2): Handle BIT_FIELD_INSERT.
> > > > > > * fold-const.c (operand_equal_p): Likewise.
> > > > > > (fold_ternary_loc): Add constant folding of 
> > > > > > BIT_FIELD_INSERT.
> > > > > > * gimplify.c (gimplify_expr): Handle BIT_FIELD_INSERT.
> > > > > > * tree-inline.c (estimate_operator_cost): Likewise.
> > > > > > * tree-pretty-print.c (dump_generic_node): Likewise.
> > > > > > * tree-ssa-operands.c (get_expr_operands): Likewise.
> > > > > > * cfgexpand.c (expand_debug_expr): Likewise.
> > > > > > * gimple-pretty-print.c (dump_ternary_rhs): Likewise.
> > > > > > * gimple.c (get_gimple_rhs_num_ops): Handle 
> > > > > > BIT_FIELD_INSERT.
> > > > > > * tree-cfg.c (verify_gimple_assign_ternary): Verify 
> > > > > > BIT_FIELD_INSERT.
> > > > > >
> > > > > > * tree-ssa.c (non_rewritable_lvalue_p): We can rewrite
> > > > > > vector inserts using BIT_FIELD_REF on the lhs.
> > > > > > (execute_update_addresses_taken): Do it.
> > > > > >
> > > > > > * gcc.dg/tree-ssa/vector-6.c: New testcase.
> > > > > >
> > > > > > Index: trunk/gcc/expr.c
> > > > > > ===
> > > > > > *** trunk.orig/gcc/expr.c   2016-05-12 13:40:30.704262951 +0200
> > > > > > --- trunk/gcc/expr.c2016-05-12 15:40:32.481225744 +0200
> > > > > > *** expand_expr_real_2 (sepops ops, rtx targ
> > > > > > *** 9358,9363 
> > > > > > --- 9358,9380 
> > > > > > target = expand_vec_cond_expr (type, treeop0, treeop1, 
> > > > > > treeop2, target);
> > > > > > return target;
> > > > > >
> > > > > > + case BIT_FIELD_INSERT:
> > > > > > +   {
> > > > > > +   unsigned bitpos = tree_to_uhwi (treeop2);
> > > > > > +   unsigned bitsize;
> > > > > > +   if (INTEGRAL_TYPE_P (TREE_TYPE (treeop1)))
> > > > > > + bitsize = TYPE_PRECISION (TREE_TYPE (treeop1));
> > > > > > +   else
> > > > > > + bitsize = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (treeop1)));
> > > > > > +   rtx op0 = expand_normal (treeop0);
> > > > > > +   rtx op1 = expand_normal (treeop1);
> > > > > > +   rtx dst = gen_reg_rtx (mode);
> > > > > > +   emit_move_insn (dst, op0);
> > > > > > +   store_bit_field (dst, bitsize, bitpos, 0, 0,
> > > > > > +TYPE_MODE (TREE_TYPE (treeop1)), op1, 
> > > > > > false);
> > > > > > +   return dst;
> > > > > > +   }
> > > > > > +
> > > > > >   default:
> > > > > > gcc_unreachable ();
> > > > > >   }
> > > > > > Index: trunk/gcc/fold-const.c
> > > > > > ===
> > > > > > *** trunk.orig/gcc/fold-const.c 2016-05-12 13:40:30.704262951 +0200
> 

[PATCH] Use dump_asm_name for Callers/Calls in dump.

2020-01-07 Thread Martin Liška

Hi.

I would like to make cgraph_node names consistent in cgraph_node::dump.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2020-01-02  Martin Liska  

* cgraph.c (cgraph_node::dump): Use systematically
dump_asm_name.
---
 gcc/cgraph.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 13238d3d442..aa2c476842a 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2036,7 +2036,7 @@ cgraph_node::dump (FILE *f)
   profile_count sum = profile_count::zero ();
   for (edge = callers; edge; edge = edge->next_caller)
 {
-  fprintf (f, "%s ", edge->caller->dump_name ());
+  fprintf (f, "%s ", edge->caller->dump_asm_name ());
   edge->dump_edge_flags (f);
   if (edge->count.initialized_p ())
 	sum += edge->count.ipa ();
@@ -2045,7 +2045,7 @@ cgraph_node::dump (FILE *f)
   fprintf (f, "\n  Calls: ");
   for (edge = callees; edge; edge = edge->next_callee)
 {
-  fprintf (f, "%s ", edge->callee->dump_name ());
+  fprintf (f, "%s ", edge->callee->dump_asm_name ());
   edge->dump_edge_flags (f);
 }
   fprintf (f, "\n");



Re: [PATCH] Small ccp optimization for x * x (PR tree-optimization/93156)

2020-01-07 Thread Richard Biener
On Tue, 7 Jan 2020, Jakub Jelinek wrote:

> Hi!
> 
> In x * x = ((x & -2) + (x & 1))^2 = (x & -2)^2 + 2 * (x & -2) * (x & 1) + 
> (x & 1)^2
> the first two terms are divisible by 4 and the last one is 0 or 1, so
> (x * x) % 4U <= 1
> and thus bit ccp can assume the second least significant bit of any square
> is 0.
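
A quick exhaustive check of the claim for a narrow type (illustrative
only; not part of the patch or the testsuite):

#include <assert.h>

int
main (void)
{
  for (unsigned int x = 0; x < 256; x++)
    {
      unsigned char sq = (unsigned char) (x * x);
      assert ((sq & 2) == 0);  /* bit 1 of any square is clear */
    }
  return 0;
}
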
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2020-01-07  Jakub Jelinek  
> 
>   PR tree-optimization/93156
>   * tree-ssa-ccp.c (bit_value_binop): For x * x note that the second
>   least significant bit is always clear.
> 
>   * gcc.dg/tree-ssa/pr93156.c: New test.
> 
> --- gcc/tree-ssa-ccp.c.jj 2020-01-01 12:15:48.026609558 +0100
> +++ gcc/tree-ssa-ccp.c  2020-01-06 15:39:16.807109578 +0100
> @@ -1650,6 +1650,17 @@ bit_value_binop (enum tree_code code, tr
>  TYPE_SIGN (TREE_TYPE (rhs2)), TYPE_PRECISION (TREE_TYPE 
> (rhs2)),
>  value_to_wide_int (r2val), r2val.mask);
>  
> +  /* (x * x) & 2 == 0.  */
> +  if (code == MULT_EXPR && rhs1 == rhs2 && TYPE_PRECISION (type) > 1)
> +{
> +  widest_int m = 2;
> +  if (wi::sext (mask, TYPE_PRECISION (type)) != -1)
> + value = wi::bit_and_not (value, m);
> +  else
> + value = 0;
> +  mask = wi::bit_and_not (mask, m);
> +}
> +
>if (wi::sext (mask, TYPE_PRECISION (type)) != -1)
>  {
>val.lattice_val = CONSTANT;
> --- gcc/testsuite/gcc.dg/tree-ssa/pr93156.c.jj  2020-01-06 15:26:57.750286597 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr93156.c  2020-01-06 15:40:32.987957791 +0100
> @@ -0,0 +1,23 @@
> +/* PR tree-optimization/93156 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "return 0;" 3 "optimized" } } */
> +
> +int
> +foo (int x)
> +{
> +  return (x * x) & 2;
> +}
> +
> +unsigned long long
> +bar (unsigned long long x)
> +{
> +  return (x * x) & 2;
> +}
> +
> +int
> +baz (int x)
> +{
> +  x &= -2;
> +  return (x * x) & 3;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH] Make cgraph_edge::resolve_speculation static

2020-01-07 Thread Martin Jambor
Hi,

On Mon, Jan 06 2020, Jan Hubicka wrote:
>> Hi,
>> 
>> throughout this year a few of us got burnt by the fact that
>> cgraph_edge::resolve_speculation method sometimes removed and
>> deallocated its this pointer, sometimes making the this pointer of a few
>> other methods of the class also suddenly invalid.
>> 
>> We postponed dealing with the issue because simply making these methods
>> static would be a bit ugly and hoped that someone would come up with
>> something better.  Well, that did not happen and so the semi-mechanical
>> patch below does exactly that and fixing a few (potential) problems I
>> encountered: I made the iteration over edges in
>> function_and_variable_visibility cope with such edge removal and fixed
>> local variable hiding in cgraph_node::set_call_stmt_including_clones and
>> cgraph_node::create_edge_including_clones.  I did not unify calls to
>> resolve_speculation and make_direct in redirect_to_unreachable in this
>> patch but I believe that is a logical follow-up.
>> 
>> The patch has passed bootstrap and LTO bootstrap and testing on an
>> x86_64-linux.  What do you think?
> It looks reasonable to me, but if we do not want member functions to
> deallocate THIS, then we probably want to make the remove()
> methods also static?
>

Right, remove is not quite as treacherous but I guess we do.  Updated
patch is below, it has also passed bootstrap, LTO bootstrap and testing
on an x86_64-linux.  (On a related note, varpool_node::remove should be
static too, but that is something for another patch.)
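
The resulting shape of the API, reduced to two members (a sketch per the
ChangeLog below, not a verbatim copy of cgraph.h):

class cgraph_edge
{
public:
  /* Static, takes the edge explicitly, and returns the edge that
     survives, so callers never touch a freed "this".  */
  static cgraph_edge *set_call_stmt (cgraph_edge *e, gcall *new_stmt,
				     bool update_speculative = true);
  /* Remove (and deallocate) edge E.  */
  static void remove (cgraph_edge *e);
};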

Thanks,

Martin


2020-01-06  Martin Jambor  

* cgraph.h (cgraph_edge): Make remove, set_call_stmt, make_direct,
resolve_speculation and redirect_call_stmt_to_callee static.  Change
return type of set_call_stmt to cgraph_edge *.
* auto-profile.c (afdo_indirect_call): Adjust call to
redirect_call_stmt_to_callee.
* cgraph.c (cgraph_edge::set_call_stmt): Make return cgraph-edge *,
make the this pointer explicit, adjust self-recursive calls and the
call to make_direct.  Return the resulting edge.
(cgraph_edge::remove): Make this pointer explicit.
(cgraph_edge::resolve_speculation): Likewise, adjust call to remove.
(cgraph_edge::make_direct): Likewise, adjust call to
resolve_speculation.
(cgraph_edge::redirect_call_stmt_to_callee): Likewise, also adjust
call to set_call_stmt.
(cgraph_update_edges_for_call_stmt_node): Update call to
set_call_stmt and remove.
* cgraphclones.c (cgraph_node::set_call_stmt_including_clones):
Renamed edge to master_edge.  Adjusted calls to set_call_stmt.
(cgraph_node::create_edge_including_clones): Moved "first" definition
of edge to the block where it was used.  Adjusted calls to
set_call_stmt.
(cgraph_node::remove_symbol_and_inline_clones): Adjust call to
cgraph_edge::remove.
* cgraphunit.c (walk_polymorphic_call_targets): Adjusted calls to
make_direct and redirect_call_stmt_to_callee.
* ipa-fnsummary.c (redirect_to_unreachable): Adjust calls to
resolve_speculation and make_direct.
* ipa-inline-transform.c (inline_transform): Adjust call to
redirect_call_stmt_to_callee.
(check_speculations_1): Adjust call to resolve_speculation.
* ipa-inline.c (resolve_noninline_speculation): Adjust call to
resolve_speculation.
(inline_small_functions): Adjust call to resolve_speculation.
(ipa_inline): Likewise.
* ipa-prop.c (ipa_make_edge_direct_to_target): Adjust call to
make_direct.
* ipa-visibility.c (function_and_variable_visibility): Make iteration
safe with regards to edge removal, adjust calls to
redirect_call_stmt_to_callee.
* ipa.c (walk_polymorphic_call_targets): Adjust calls to make_direct
and redirect_call_stmt_to_callee.
* multiple_target.c (create_dispatcher_calls): Adjust call to
redirect_call_stmt_to_callee.
(redirect_to_specific_clone): Likewise.
* tree-cfgcleanup.c (delete_unreachable_blocks_update_callgraph):
Adjust calls to cgraph_edge::remove.
* tree-inline.c (copy_bb): Adjust call to set_call_stmt.
(redirect_all_calls): Adjust call to redirect_call_stmt_to_callee.
(expand_call_inline): Adjust call to cgraph_edge::remove.
---
 gcc/auto-profile.c |   2 +-
 gcc/cgraph.c   | 129 +++--
 gcc/cgraph.h   |  39 ++-
 gcc/cgraphclones.c |  16 ++---
 gcc/cgraphunit.c   |  11 ++--
 gcc/ipa-fnsummary.c|   4 +-
 gcc/ipa-inline-transform.c |   4 +-
 gcc/ipa-inline.c   |   6 +-
 gcc/ipa-prop.c |   2 +-
 gcc/ipa-visibility.c   |   8 ++-
 gcc/ipa.c  |   4 +-
 gcc/multiple_target.c  |   4 +-
 gcc/tree-cfgcleanup.c  |   4 +-
 gcc/tree-inline.c  |   

Re: [PATCH] Document cloning for the target_clone attribute.

2020-01-07 Thread Martin Liška

On 1/6/20 3:04 PM, Martin Sebor wrote:

On 1/6/20 3:05 AM, Martin Liška wrote:

Hi.

The patch explains what happens when
a target_clone function calls a function without
the attribute.

Ready for trunk?
Thanks,
Martin

gcc/ChangeLog:

2020-01-06  Martin Liska  

 PR ipa/83411
 * doc/extend.texi: Explain cloning for target_clone
 attribute.
---
  gcc/doc/extend.texi | 6 ++
  1 file changed, 6 insertions(+)


diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 05ec62c552b..184146abd10 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3768,6 +3768,12 @@ the @code{ifunc} attribute above) that dynamically 
selects a clone
  suitable for current architecture.  The resolver is created only if there
  is a usage of a function with @code{target_clones} attribute.

+Note that any subsequent call of a function without @code{target_clone}
+from a @code{target_clone} caller will not lead to copying
+(target clone) of the called function.
+If you want to enforce such behaviour,
+we recommend to use @code{flatter} attribute for the caller function.
+
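
An illustrative example of the documented behavior (names made up):
without "flatten", compute is compiled once and every clone of caller
calls that single copy; with "flatten", compute is inlined into each
clone and thus specialized per target.

static double
compute (const double *x, int n)
{
  double s = 0.0;
  for (int i = 0; i < n; i++)
    s += x[i] * x[i];
  return s;
}

__attribute__ ((target_clones ("avx2", "default"), flatten))
double
caller (const double *x, int n)
{
  return compute (x, n);
}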


Hello.



Should that be "the @code{flatten} attribute?"  (Article and
an 'n' rather than an 'r' in "flatten".)


Yep.



Also, (assuming I'm reading it correctly) would the rest of
the sentence be clearer if it instead read:

   we recommend declaring the calling function with the @code{flatten}
   attribute?


Sure, seems better to me.
I've just installed the patch with the suggested changes.

Martin



Martin





Re: [RFC] IVOPTs select cand with preferred D-form access

2020-01-07 Thread Richard Biener
On Mon, 6 Jan 2020, Kewen.Lin wrote:

> Hi all,
> 
> Recently I've been investigating an issue related to the use of D-form/X-form
> vector memory accesses; it's the same as what the patch
> https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01879.html 
> was intended to deal with.  Power9 introduces DQ-form instructions for vector
> memory access, and we prefer to use the DQ-form when unrolling a loop.  As in
> the example in the above link, it can save a number of ADDIs and GPRs for indexing.
> 
> Or for below example:
> 
>   extern void dummy (double, unsigned n);
> 
>   void
>   func (double *x, double *y, unsigned m, unsigned n)
>   {
> double sacc;
> for (unsigned j = 1; j < m; j++)
> {
>   sacc = 0.0;
>   for (unsigned i = 1; i < n; i++)
> sacc = sacc + x[i] * y[i];
>   dummy (sacc, n);
> }
>   }
> 
> Core loop with X-form (lxvx):
> 
>   mtctr   r10
>   lxvxvs12,r31,r9
>   lxvxvs0,r30,r9
>   addir10,r9,16
>   addir9,r9,32
>   xvmaddadp vs32,vs12,vs0
>   lxvxvs12,r31,r10
>   lxvxvs0,r30,r10
>   xvmaddadp vs11,vs12,vs0
>   lxvxvs12,r31,r9
>   lxvxvs0,r30,r9
>   addir9,r10,32
>   xvmaddadp vs32,vs12,vs0
>   lxvxvs12,r31,r9
>   lxvxvs0,r30,r9
>   addir9,r10,48
>   xvmaddadp vs11,vs12,vs0
>   lxvxvs12,r31,r9
>   lxvxvs0,r30,r9
>   addir9,r10,64
>   xvmaddadp vs32,vs12,vs0
>   lxvxvs12,r31,r9
>   lxvxvs0,r30,r9
>   addir9,r10,80
>   xvmaddadp vs11,vs12,vs0
>   lxvxvs12,r31,r9
>   lxvxvs0,r30,r9
>   addir9,r10,96
>   xvmaddadp vs32,vs12,vs0
>   lxvxvs12,r31,r9
>   lxvxvs0,r30,r9
>   addir9,r10,112
>   xvmaddadp vs11,vs12,vs0
>   bdnz190 
> 
> vs.
> 
> Core loop with D-form (lxv)
>   mtctr   r8
>   lxv vs12,0(r9)
>   lxv vs0,0(r10)
>   addir7,r9,16  // r7, r8 can be eliminated further with r9, r10
>   addir8,r10,16 // 2 or 4 addi vs. 8 addi above
>   addir9,r9,128
>   addir10,r10,128  
>   xvmaddadp vs32,vs12,vs0
>   lxv vs12,-112(r9)
>   lxv vs0,-112(r10)
>   xvmaddadp vs11,vs12,vs0
>   lxv vs12,16(r7)
>   lxv vs0,16(r8)
>   xvmaddadp vs32,vs12,vs0
>   lxv vs12,32(r7)
>   lxv vs0,32(r8)
>   xvmaddadp vs11,vs12,vs0
>   lxv vs12,48(r7)
>   lxv vs0,48(r8)
>   xvmaddadp vs32,vs12,vs0
>   lxv vs12,64(r7)
>   lxv vs0,64(r8)
>   xvmaddadp vs11,vs12,vs0
>   lxv vs12,80(r7)
>   lxv vs0,80(r8)
>   xvmaddadp vs32,vs12,vs0
>   lxv vs12,96(r7)
>   lxv vs0,96(r8)
>   xvmaddadp vs11,vs12,vs0
>   bdnz1b0 
> 
> We are thinking about whether it can be handled in IVOPTs instead of an RTL pass.
> 
> When IVOPTs selects IV candidates, it doesn't know the loop will be unrolled, so
> it doesn't count in the possible step cost of the X-form.  If we can teach it to
> consider the case, the IV candidates which play well with the D-form can be
> preferred.  Currently unrolling happens (incompletely) in RTL, so it looks like
> we have to predict in IVOPTs whether the loop will be unrolled.  Since there are
> some parameter checks on RTL insn counts and target hooks, that seems not easy
> to get.  Besides, we need to check that the step is valid to put into the D-form
> displacement field (e.g. the DQ-form requires it to be exactly divisible by 16),
> to ensure no extra ADDIs are needed.
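
A sketch of that step/offset validity test; the multiple-of-16
requirement is from the text above, while the exact range is an
assumption about the 12-bit scaled DQ displacement field:

static bool
dq_form_offset_p (long long off)
{
  return (off % 16) == 0 && off >= -32768 && off <= 32752;
}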
> 
> I'm not sure whether it's a good idea to implement this in IVOPTs, but I did
> some changes in IVOPTs to prove it's doable to get the expected code; the
> patch is attached.
> 
> Any comments/suggestions are highly appreciated!

Is the unrolled code better than the not unrolled code (assuming
optimal IV choice)?  Then IMHO IVOPTs should drive the unrolling,
either by actually doing it or by forcing it via the loop->unroll
setting.  I don't think second-guessing the RTL unroller at this
point is going to work.  Alternatively turn X-form into D-form during
RTL unrolling?

Thanks,
Richard.

> BR,
> Kewen
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH] ipa-inline: Adjust condition for caller_growth_limits

2020-01-07 Thread Jan Hubicka
> On Mon, 2020-01-06 at 01:03 -0600, Xiong Hu Luo wrote:
> > Inline should return failure either (newsize > param_large_function_insns)
> > OR (newsize > limit).  Sometimes newsize is larger than
> > param_large_function_insns, but smaller than limit, inline doesn't return
> > failure even if the new function is a large function.
> > Therefore, param_large_function_insns and param_large_function_growth 
> > should be
> > OR instead of AND, otherwise --param large-function-growth won't
> > work correctly with --param large-function-insns.
> > 
> > gcc/ChangeLog:
> > 
> > 2020-01-06  Luo Xiong Hu  
> > 
> > ipa-inline-analysis.c (estimate_growth): Fix typo.
> > ipa-inline.c (caller_growth_limits): Use OR instead of AND.
> OK

This patch also seems wrong.  Inliner should return failure when
 newsize > param_large_function_insns && newsize > limit
The reason is same as with large_unit_insns.  We used to have only
growth limits but it has turned out that this gets too
restrictive/random for very small units/growths.

So the logic is meant to be that inlining is OK either
 - if function is reasonably small (large_function_insns limit is not
   met)
 - or it does not grow too much (large-function-growth is not met)
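
A sketch of the intended check in caller_growth_limits, failing only
when both bounds are exceeded (simplified; the surrounding code is not
shown):

  if (newsize > opt_for_fn (to->decl, param_large_function_insns)
      && newsize > limit)
    {
      e->inline_failed = CIF_LARGE_FUNCTION_GROWTH_LIMIT;
      return false;
    }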

@item large-function-insns  
The limit specifying really large functions.  For functions larger than
this limit after inlining, inlining is constrained by @option{--param
large-function-growth}.  This parameter is useful primarily to avoid
extreme compilation time caused by non-linear algorithms used by the
back end.


Honza
> jeff
> 


Re: [PATCH] popcount{64,32}c pattern matching fixes (PR tree-optimization/93098)

2020-01-07 Thread Richard Biener
On Tue, 31 Dec 2019, Jakub Jelinek wrote:

> On Tue, Dec 31, 2019 at 05:47:54PM +0100, Richard Biener wrote:
> > >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > Ok. 
> 
> Thanks.
> 
> > >One thing I haven't done anything about yet is that there is
> > >FAIL: gcc.dg/tree-ssa/popcount4ll.c scan-tree-dump-times optimized ".POPCOUNT" 1
> > >before/after this patch with -m32/-march=skylake-avx512.  That is because
> > >the popcountll effective target tests that we don't emit a call for
> > >__builtin_popcountll, which we don't on ia32 skylake-avx512, but
> > >direct_internal_fn_supported_p isn't true - that is because we expand the
> > >double word popcount using 2 word popcounts + addition.  Shall the match.pd
> > >case handle that case too by allowing the optimization even if there is a
> > >type with half the precision for which direct_internal_fn_supported_p holds?
> > 
> > You mean emitting a single builtin call,
> > or an add of two ifns? 
> 
> I meant to do in the match.pd condition what expand_unop will do, i.e.
> - && direct_internal_fn_supported_p (IFN_POPCOUNT, type,
> -OPTIMIZE_FOR_BOTH))
> + && (direct_internal_fn_supported_p (IFN_POPCOUNT, type,
> + OPTIMIZE_FOR_BOTH)
> + /* expand_unop can handle double-word popcount using
> +two word popcounts and addition.  */
> + || (TREE_CODE (type) == INTEGRAL_TYPE
> + && TYPE_PRECISION (type) == 2 * BITS_PER_WORD
> + && (optab_handler (popcount_optab, word_mode)
> + != CODE_FOR_nothing
> or so.

OK, that would work for me (maybe add a predicate to the optabs code
close to the actual expander).
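
A sketch of such a predicate; the name and placement are illustrative:

static bool
can_expand_doubleword_popcount_p (tree type)
{
  /* Mirror what expand_unop can do: two word_mode popcounts plus
     an addition.  */
  return (INTEGRAL_TYPE_P (type)
	  && TYPE_PRECISION (type) == 2 * BITS_PER_WORD
	  && optab_handler (popcount_optab, word_mode) != CODE_FOR_nothing);
}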

Richard.