Re: [PING v2][PATCH] Make function clone name numbering independent.

2018-09-04 Thread Michael Ploujnikov
Hi Martin,

On 2018-09-03 06:01 AM, Martin Jambor wrote:
> Hi,
> 
> On Fri, Aug 31 2018, Michael Ploujnikov wrote:
>> I've done some more digging into the current uses of
>> numbered_clone_function_name and checked if any tests fail if I change
>> it to suffixed_function_name:
>>
>>   - gcc/cgraphclones.c:  DECL_NAME (new_decl) = numbered_clone_function_name 
>> (thunk->decl, "artificial_thunk");
>> - no new tests fail, inconclusive
>> - and despite the comment on redirect_callee_duplicating_thunks
>>   about "one or more" duplicates it doesn't seem like
>>   duplicate_thunk_for_node would be called more than once for each
>>   node, assuming each node is named uniquely, but I'm far from an
>>   expert in this area
> 
> The comment means that if there is a chain of thunks, the method clones
> all of them.  Nevertheless, you need name numbering here for the same
> reason why you need them for constprop.
> 
> 
>>   - gcc/omp-expand.c:  DECL_NAME (kern_fndecl) = 
>> numbered_clone_function_name (kern_fndecl, "kernel");
>> - no new tests fail, inconclusive
>> - I didn't see (and couldn't figure out a way to get) any of the
>>   existing omp/acc tests actually exercise this codepath
> 
> I guess this one should not need it.  Build with
> --enable-offload-targets=hsa and run gomp.exp to try yourself.  I can
> run run-time HSA tests for you if you want.
> 
> Martin
> 

I've tried building with numbered_clone_function_name replaced by
suffixed_function_name and with --enable-offload-targets=hsa and
didn't see any errors in gomp.exp. I don't have a readily available
HSA setup so if you could do a quick test, I would really appreciate
it!

- Michael





Re: [PING v2][PATCH] Make function clone name numbering independent.

2018-09-04 Thread Michael Ploujnikov
Hi Martin, Richard,

Thanks for your responses.

On 2018-09-03 09:15 AM, Martin Jambor wrote:
> Hi,
> 
> On Mon, Sep 03 2018, Richard Biener wrote:
>> On Mon, Sep 3, 2018 at 12:02 PM Martin Jambor  wrote:
>>>
>>> Hi,
>>>
>>> On Fri, Aug 31 2018, Michael Ploujnikov wrote:
 I've done some more digging into the current uses of
 numbered_clone_function_name and checked if any tests fail if I change
 it to suffixed_function_name:

   - gcc/cgraphclones.c:  DECL_NAME (new_decl) = 
 numbered_clone_function_name (thunk->decl, "artificial_thunk");
 - no new tests fail, inconclusive
 - and despite the comment on redirect_callee_duplicating_thunks
   about "one or more" duplicates it doesn't seem like
   duplicate_thunk_for_node would be called more than once for each
   node, assuming each node is named uniquely, but I'm far from an
   expert in this area
>>>
>>> The comment means that if there is a chain of thunks, the method clones
>>> all of them.  Nevertheless, you need name numbering here for the same
>>> reason why you need them for constprop.
>>
>> The remaining question I had with the patch was if maybe all callers
>> can handle assigning
>> the numbering themselves, thus do sth like
>>
>>for-each-clone-of (i, fn)
>>   DECL_NAME (...) = numbered_clone_function_name (...,
>> "artificial_thunk", i);
>>
>> which would make the map of name -> number unnecessary.

Please see my comment below.

>>
> 
> That is what I would prefer too but it also has downsides.  Callers
> would have to do their book keeping and the various cloning interfaces
> would get another parameter when already they have quite a few
> (introducing some cloning context classes seems like an overkill to me
> :-).

I'm fine with doing it the way you guys are suggesting, but just to
explain my original thinking: The way I see it is that there will
always be some kind of name -> number mapping, even if it's managed by
each individual user and even if it's actually cgraph_node ->
number. Needless to say, I'm far from an expert in all of the involved
areas (this is my first contribution to GCC), so the most effective
approach I could come up with was doing it all in one place. Now with your
expert advice and as I get a better understanding of how things like
sra and other clones are created, I'm starting to see that a more
tailored approach is possible and probably the way it should have been
done in the first place.
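
For illustration, here is a small standalone sketch (not GCC code;
clone_name_central and clone_name_caller_numbered are made-up helpers)
contrasting a centralized name-to-counter map with the caller-managed
numbering Richard suggests above:

// Illustrative sketch only -- not GCC code; helper names are hypothetical.
#include <cstdio>
#include <map>
#include <string>

// Approach A: one shared map keyed by "base.suffix"; every cloner asks it
// for the next free number for that combination.
static std::map<std::string, unsigned> clone_counters;

static std::string
clone_name_central (const std::string &base, const std::string &suffix)
{
  unsigned n = clone_counters[base + "." + suffix]++;
  return base + "." + suffix + "." + std::to_string (n);
}

// Approach B: the caller keeps its own index (e.g. its loop counter over
// the clones it creates) and passes it in, so no global state is needed.
static std::string
clone_name_caller_numbered (const std::string &base,
                            const std::string &suffix, unsigned num)
{
  return base + "." + suffix + "." + std::to_string (num);
}

int
main ()
{
  // Centralized: numbers depend on how many clones were created before.
  std::printf ("%s\n", clone_name_central ("foo", "constprop").c_str ());
  std::printf ("%s\n", clone_name_central ("foo", "constprop").c_str ());

  // Caller-managed: numbers depend only on this caller's own index.
  for (unsigned i = 0; i < 2; ++i)
    std::printf ("%s\n",
                 clone_name_caller_numbered ("bar", "constprop", i).c_str ());
  return 0;
}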

> 
> If we want to get rid of .constprop discrepancies,

Could you please clarify to me what exactly you mean by the
".constprop discrepancies"?

> something like the
> following (only very mildly tested) patch should be enough (note that
> IPA-CP currently does not really need the new hash because it creates

What do you mean "it doesn't need the new hash"? My
independent-cloneids-1.c test shows that a function can have more than
one .constprop clone, but I'm probably just not understanding
something.

> clones in only one pass through the call graph, but that is something
> likely to change).  We could then adjust other cloners that we care
> about.
> 
> However, if clones of thunks are one of them, then the optional
> parameter will also proliferate to cgraph_node::create_clone(),
> cgraph_edge::redirect_callee_duplicating_thunks() and
> duplicate_thunk_for_node().  create_version_clone_with_body would almost
> certainly need it because of IPA-SRA too (IPA-SRA can of course always
> pass zero).

I'll try to do this and to convert the other users in the next version
of the patch, but I might need some help with understanding how some
of those passes work!

> 
> Martin
> 
> 
> 
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 2b00f0165fa..703c3cfea7b 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -964,11 +964,15 @@ public:
>cgraph_node *new_inlined_to,
>bitmap args_to_skip, const char *suffix = NULL);
>  
> -  /* Create callgraph node clone with new declaration.  The actual body will
> - be copied later at compilation stage.  */
> +  /* Create callgraph node clone with new declaration.  The actual body will 
> be
> + copied later at compilation stage.  The name of the new clone will be
> + constructed from the name of the original name, SUFFIX and a number 
> which
> + can either be NUM_SUFFIX if non-negative or a unique incremented integer
> + otherwise.  */
>cgraph_node *create_virtual_clone (vec<cgraph_edge *> redirect_callers,
>vec<ipa_replace_map *, va_gc> *tree_map,
> -  bitmap args_to_skip, const char * suffix);
> +  bitmap args_to_skip, const char * suffix,
> +  int num_suffix = -1);
>  
>/* cgraph node being removed from symbol table; see if its entry can be
> replaced by other inline clone.  */
> @@ -2376,8 +2380,9 @@ basic_block init_lowered_empty_function (tree, bool, 

Re: C++ PATCH for build_throw

2018-09-04 Thread Jason Merrill
OK.

On Tue, Sep 4, 2018 at 7:01 PM, Marek Polacek  wrote:
> It occurred to me that we should be able to use treat_lvalue_as_rvalue_p in 
> the
> second context where we're supposed to use a move operation as well.  Except
> that for a throw-expression, the operand may not be a function parameter.
> While at it, also make use of CP_TYPE_VOLATILE_P.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2018-09-04  Marek Polacek  
>
> * cp-tree.h (treat_lvalue_as_rvalue_p): Declare.
> * except.c (build_throw): Use it.  Use CP_TYPE_VOLATILE_P.
> * typeck.c (treat_lvalue_as_rvalue_p): No longer static.  Add PARM_OK
> parameter.
> (maybe_warn_pessimizing_move): Adjust treat_lvalue_as_rvalue_p call.
> (check_return_expr): Likewise.
>
> diff --git gcc/cp/cp-tree.h gcc/cp/cp-tree.h
> index 43e452cc1a3..df441fca304 100644
> --- gcc/cp/cp-tree.h
> +++ gcc/cp/cp-tree.h
> @@ -7354,6 +7354,7 @@ extern tree cp_perform_integral_promotions  (tree, 
> tsubst_flags_t);
>  extern tree finish_left_unary_fold_expr  (tree, int);
>  extern tree finish_right_unary_fold_expr (tree, int);
>  extern tree finish_binary_fold_expr  (tree, tree, int);
> +extern bool treat_lvalue_as_rvalue_p(tree, bool);
>
>  /* in typeck2.c */
>  extern void require_complete_eh_spec_types (tree, tree);
> diff --git gcc/cp/except.c gcc/cp/except.c
> index f85ae047cfc..2db90eedcf7 100644
> --- gcc/cp/except.c
> +++ gcc/cp/except.c
> @@ -676,12 +676,9 @@ build_throw (tree exp)
>   /* Under C++0x [12.8/16 class.copy], a thrown lvalue is sometimes
>  treated as an rvalue for the purposes of overload resolution
>  to favor move constructors over copy constructors.  */
> - if (/* Must be a local, automatic variable.  */
> - VAR_P (exp)
> - && DECL_CONTEXT (exp) == current_function_decl
> - && ! TREE_STATIC (exp)
> + if (treat_lvalue_as_rvalue_p (exp, /*parm_ok*/false)
>   /* The variable must not have the `volatile' qualifier.  */
> - && !(cp_type_quals (TREE_TYPE (exp)) & TYPE_QUAL_VOLATILE))
> + && !CP_TYPE_VOLATILE_P (TREE_TYPE (exp)))
> {
>   tree moved = move (exp);
>   exp_vec = make_tree_vector_single (moved);
> diff --git gcc/cp/typeck.c gcc/cp/typeck.c
> index ab088a946b3..84cf4c478aa 100644
> --- gcc/cp/typeck.c
> +++ gcc/cp/typeck.c
> @@ -9180,14 +9180,15 @@ can_do_nrvo_p (tree retval, tree functype)
>  }
>
>  /* Returns true if we should treat RETVAL, an expression being returned,
> -   as if it were designated by an rvalue.  See [class.copy.elision].  */
> +   as if it were designated by an rvalue.  See [class.copy.elision].
> +   PARM_P is true if a function parameter is OK in this context.  */
>
> -static bool
> -treat_lvalue_as_rvalue_p (tree retval)
> +bool
> +treat_lvalue_as_rvalue_p (tree retval, bool parm_ok)
>  {
>return ((cxx_dialect != cxx98)
>   && ((VAR_P (retval) && !DECL_HAS_VALUE_EXPR_P (retval))
> - || TREE_CODE (retval) == PARM_DECL)
> + || (parm_ok && TREE_CODE (retval) == PARM_DECL))
>   && DECL_CONTEXT (retval) == current_function_decl
>   && !TREE_STATIC (retval));
>  }
> @@ -9240,7 +9241,7 @@ maybe_warn_pessimizing_move (tree retval, tree functype)
> }
>   /* Warn if the move is redundant.  It is redundant when we would
>  do maybe-rvalue overload resolution even without std::move.  */
> - else if (treat_lvalue_as_rvalue_p (arg))
> + else if (treat_lvalue_as_rvalue_p (arg, /*parm_ok*/true))
> {
>   auto_diagnostic_group d;
>   if (warning_at (loc, OPT_Wredundant_move,
> @@ -9525,7 +9526,7 @@ check_return_expr (tree retval, bool *no_warning)
>   Note that these conditions are similar to, but not as strict as,
>  the conditions for the named return value optimization.  */
>bool converted = false;
> -  if (treat_lvalue_as_rvalue_p (retval)
> +  if (treat_lvalue_as_rvalue_p (retval, /*parm_ok*/true)
>   /* This is only interesting for class type.  */
>   && CLASS_TYPE_P (functype))
> {


C++ PATCH for build_throw

2018-09-04 Thread Marek Polacek
It occurred to me that we should be able to use treat_lvalue_as_rvalue_p in the
second context where we're supposed to use a move operation as well.  Except
that for a throw-expression, the operand may not be a function parameter.
While at it, also make use of CP_TYPE_VOLATILE_P.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2018-09-04  Marek Polacek  

* cp-tree.h (treat_lvalue_as_rvalue_p): Declare.
* except.c (build_throw): Use it.  Use CP_TYPE_VOLATILE_P.
* typeck.c (treat_lvalue_as_rvalue_p): No longer static.  Add PARM_OK
parameter.
(maybe_warn_pessimizing_move): Adjust treat_lvalue_as_rvalue_p call.
(check_return_expr): Likewise.

diff --git gcc/cp/cp-tree.h gcc/cp/cp-tree.h
index 43e452cc1a3..df441fca304 100644
--- gcc/cp/cp-tree.h
+++ gcc/cp/cp-tree.h
@@ -7354,6 +7354,7 @@ extern tree cp_perform_integral_promotions  (tree, 
tsubst_flags_t);
 extern tree finish_left_unary_fold_expr  (tree, int);
 extern tree finish_right_unary_fold_expr (tree, int);
 extern tree finish_binary_fold_expr  (tree, tree, int);
+extern bool treat_lvalue_as_rvalue_p(tree, bool);
 
 /* in typeck2.c */
 extern void require_complete_eh_spec_types (tree, tree);
diff --git gcc/cp/except.c gcc/cp/except.c
index f85ae047cfc..2db90eedcf7 100644
--- gcc/cp/except.c
+++ gcc/cp/except.c
@@ -676,12 +676,9 @@ build_throw (tree exp)
  /* Under C++0x [12.8/16 class.copy], a thrown lvalue is sometimes
 treated as an rvalue for the purposes of overload resolution
 to favor move constructors over copy constructors.  */
- if (/* Must be a local, automatic variable.  */
- VAR_P (exp)
- && DECL_CONTEXT (exp) == current_function_decl
- && ! TREE_STATIC (exp)
+ if (treat_lvalue_as_rvalue_p (exp, /*parm_ok*/false)
  /* The variable must not have the `volatile' qualifier.  */
- && !(cp_type_quals (TREE_TYPE (exp)) & TYPE_QUAL_VOLATILE))
+ && !CP_TYPE_VOLATILE_P (TREE_TYPE (exp)))
{
  tree moved = move (exp);
  exp_vec = make_tree_vector_single (moved);
diff --git gcc/cp/typeck.c gcc/cp/typeck.c
index ab088a946b3..84cf4c478aa 100644
--- gcc/cp/typeck.c
+++ gcc/cp/typeck.c
@@ -9180,14 +9180,15 @@ can_do_nrvo_p (tree retval, tree functype)
 }
 
 /* Returns true if we should treat RETVAL, an expression being returned,
-   as if it were designated by an rvalue.  See [class.copy.elision].  */
+   as if it were designated by an rvalue.  See [class.copy.elision].
+   PARM_P is true if a function parameter is OK in this context.  */
 
-static bool
-treat_lvalue_as_rvalue_p (tree retval)
+bool
+treat_lvalue_as_rvalue_p (tree retval, bool parm_ok)
 {
   return ((cxx_dialect != cxx98)
  && ((VAR_P (retval) && !DECL_HAS_VALUE_EXPR_P (retval))
- || TREE_CODE (retval) == PARM_DECL)
+ || (parm_ok && TREE_CODE (retval) == PARM_DECL))
  && DECL_CONTEXT (retval) == current_function_decl
  && !TREE_STATIC (retval));
 }
@@ -9240,7 +9241,7 @@ maybe_warn_pessimizing_move (tree retval, tree functype)
}
  /* Warn if the move is redundant.  It is redundant when we would
 do maybe-rvalue overload resolution even without std::move.  */
- else if (treat_lvalue_as_rvalue_p (arg))
+ else if (treat_lvalue_as_rvalue_p (arg, /*parm_ok*/true))
{
  auto_diagnostic_group d;
  if (warning_at (loc, OPT_Wredundant_move,
@@ -9525,7 +9526,7 @@ check_return_expr (tree retval, bool *no_warning)
  Note that these conditions are similar to, but not as strict as,
 the conditions for the named return value optimization.  */
   bool converted = false;
-  if (treat_lvalue_as_rvalue_p (retval)
+  if (treat_lvalue_as_rvalue_p (retval, /*parm_ok*/true)
  /* This is only interesting for class type.  */
  && CLASS_TYPE_P (functype))
{


Re: [PATCH][4/4][v2] RPO-style value-numbering for FRE/PRE

2018-09-04 Thread Jeff Law
On 09/04/2018 04:12 PM, Gerald Pfeifer wrote:
> On Fri, 24 Aug 2018, Richard Biener wrote:
>> Comments are still welcome - I've re-bootstrapped and tested the series
>> on x86_64-unknown-linux-gnu for all languages and will talk about
>> this work at the Cauldron in more detail.
> 
> Is there any chance you can test this on i586 as well?
> 
> Since around that commit (August 27th) my i586 builds are failing
> with something like
> 
>   during GIMPLE pass: pre
>   cp-demangle.c: In function ‘d_print_comp’:
>   cp-demangle.c:5711:1: internal compiler error: Segmentation fault
>   5711 | d_print_comp (struct d_print_info *dpi, int options,
>| ^~~~
> 
> when doing a regular bootstrap on i586-unknown-freebsd10.4 (with
> clang 3.4.1 as system compiler).
> 
> Alas, when I add -save-temps, the internal compiler error goes away,
> and the backtrace using gdb installed on that system I share isn't 
> very useful, either.  (When I replace cp-demangle.c by cp-demangle.i
> in the invocation the error also disappears.)
> 
> On the other hand, this ICE has been consistent across a week of
> daily builds now.
An FYI: my i686 builds have been running OK.  But given what you've
described, this could well be an uninitialized read, dangling pointer,
out-of-bounds write, or some similar kind of bug.

Jeff


Re: [patch,nvptx] Basic -misa support for nvptx

2018-09-04 Thread Cesar Philippidis
On 09/02/2018 07:57 AM, Cesar Philippidis wrote:
> On 09/01/2018 12:04 PM, Tom de Vries wrote:
>> On 08/31/2018 04:14 PM, Cesar Philippidis wrote:
> 
>>> Is this patch OK for trunk?
>>>
>>
>> Well, how did you test this (
>> https://gcc.gnu.org/contribute.html#patches : "Bootstrapping and
>> testing. State the host and target combinations you used to do proper
>> testing as described above, and the results of your testing.") ?
> 
> I tested the standalone nvptx compiler. I'll retest with libgomp with
> -misa=sm_35. Bootstrapping won't help much here, unfortunately.
>>> +++ b/gcc/testsuite/gcc.target/nvptx/atomic_fetch-1.c
>>> @@ -0,0 +1,24 @@
>>> +/* Test the nvptx atomic instructions for __atomic_fetch_OP for SM_35
>>> +   targets.  */
>>> +
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -misa=sm_35" } */
>>> +
>>> +int
>>> +main()
>>> +{
>>> +  unsigned long long a = ~0;
>>> +  unsigned b = 0xa;
>>> +
>>> +  __atomic_fetch_add (&a, b, 0);
>>> +  __atomic_fetch_and (&a, b, 0);
>>> +  __atomic_fetch_or (&a, b, 0);
>>> +  __atomic_fetch_xor (&a, b, 0);
>>> +  
>>> +  return a;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler "atom.add.u64" } } */
>>> +/* { dg-final { scan-assembler "atom.b64.and" } } */
>>> +/* { dg-final { scan-assembler "atom.b64.or" } } */
>>> +/* { dg-final { scan-assembler "atom.b64.xor" } } */
>>> -- 2.17.1
>>>
>>
>> Hmm, the add.u64 vs b64.and looks odd (and the scan-assembler-not
>> testcase does not use this difference, so that needs to be fixed, or for
>> bonus points, changed into a scan-assembler testcase).
>>
>> The documentation uses "op.type", we should fix the compiler to emit
>> that consistently. Separate patch that fixes that pre-approved.
> 
> ACK. I think there are a lot of other cases like that in the BE.
> 
>> This is ok (with, as I mentioned above, the SI part split off into a
>> separate patch), on the condition that you test libgomp with
>> -foffload=-misa=sm_35.

Adding -foffload=-misa=sm_35 didn't work because the host gcc doesn't
support the -misa flag. When I forced the nvptx BE to set TARGET_SM35 to
always be true, I ran into problems with SM_30 code linking against
SM_35 code. Therefore, I don't think this patch is ready for trunk yet.

By the way, is -misa really necessary for atomic_fetch_?
Looking at the PTX documentation, I see:

PTX ISA version 3.1 introduces the following new features:

* Support for sm_35 target architecture.
* Extends atomic and reduction instructions to perform 64-bit {and, or,
xor} operations, and 64-bit integer {min, max} operations.

Is there a table which lists which GPUs are compatible with which
instructions?

Thanks,
Cesar


Re: [PATCH][4/4][v2] RPO-style value-numbering for FRE/PRE

2018-09-04 Thread Gerald Pfeifer
On Fri, 24 Aug 2018, Richard Biener wrote:
> Comments are still welcome - I've re-bootstrapped and tested the series
> on x86_64-unknown-linux-gnu for all languages and will talk about
> this work at the Cauldron in more detail.

Is there any chance you can test this on i586 as well?

Since around that commit (August 27th) my i586 builds are failing
with something like

  during GIMPLE pass: pre
  cp-demangle.c: In function ‘d_print_comp’:
  cp-demangle.c:5711:1: internal compiler error: Segmentation fault
  5711 | d_print_comp (struct d_print_info *dpi, int options,
   | ^~~~

when doing a regular bootstrap on i586-unknown-freebsd10.4 (with
clang 3.4.1 as system compiler).

Alas, when I add -save-temps, the internal compiler error goes away,
and the backtrace using gdb installed on that system I share isn't 
very useful, either.  (When I replace cp-demangle.c by cp-demangle.i
in the invocation the error also disappears.)

On the other hand, this ICE has been consistent across a week of
daily builds now.

Gerald

/scratch/tmp/gerald/OBJ-0904-1640/./gcc/xgcc 
-B/scratch/tmp/gerald/OBJ-0904-1640/./gcc/ 
-B/home/gerald/gcc-ref10-i386/i586-unknown-freebsd10.4/bin/ 
-B/home/gerald/gcc-ref10-i386/i586-unknown-freebsd10.4/lib/ -isystem 
/home/gerald/gcc-ref10-i386/i586-unknown-freebsd10.4/include -isystem 
/home/gerald/gcc-ref10-i386/i586-unknown-freebsd10.4/sys-include -fno-checking 
-DHAVE_CONFIG_H -I.. 
-I/scratch/tmp/gerald/GCC-HEAD/libstdc++-v3/../libiberty 
-I/scratch/tmp/gerald/GCC-HEAD/libstdc++-v3/../include -D_GLIBCXX_SHARED 
-I/scratch/tmp/gerald/OBJ-0904-1640/i586-unknown-freebsd10.4/libstdc++-v3/include/i586-unknown-freebsd10.4
 
-I/scratch/tmp/gerald/OBJ-0904-1640/i586-unknown-freebsd10.4/libstdc++-v3/include
 
-I/scratch/tmp/gerald/GCC-HEAD/libstdc++-v3/libsupc++ -g -O2 -DIN_GLIBCPP_V3 
-Wno-error 
-c cp-demangle.c  -fPIC -DPIC -o cp-demangle.o

[wwwdocs] Fix main page

2018-09-04 Thread Gerald Pfeifer
Something must have gone awry with our main page during the last
stages of the HTML 5 conversion.  I noticed this yesterday and
fixed it on the spot with the patch below.

More CSS, though not exactly beautiful.  As a next step, I'll see to 
reduce the use of tables for our main page; it should not be necessary 
any longer with modern HTML and CSS.  Anyone who wants to lend a helping 
hand is definitely welcome!

Gerald

Index: gcc.css
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc.css,v
retrieving revision 1.62
retrieving revision 1.64
diff -u -r1.62 -r1.64
--- gcc.css 3 Sep 2018 18:55:18 -   1.62
+++ gcc.css 3 Sep 2018 19:43:07 -   1.64
@@ -21,6 +21,7 @@
 .top  { vertical-align:top; }
 
 .width33  { width:33%; }
+.border0  { border-width:0; }
 
 .no-margin-top { margin-top:0; }
 .twocolumns { column-count:2; }
@@ -43,10 +44,13 @@
 
 table.nav {
   padding-left: 32px;
-  border-width: 0;
   border-spacing: 0pt;
 }
 
+table.nav td {
+  border-width: 0;
+}
+
 table.navitem {
   width: 100%;
   border-spacing: 0pt;
Index: style.mhtml
===
RCS file: /cvs/gcc/wwwdocs/htdocs/style.mhtml,v
retrieving revision 1.148
retrieving revision 1.151
diff -u -r1.148 -r1.151
--- style.mhtml 1 Sep 2018 23:39:53 -   1.148
+++ style.mhtml 3 Sep 2018 19:34:55 -   1.151
@@ -70,10 +70,10 @@
 
 
  
+  
   
 
- 
+  
  >
 >
 
@@ -83,7 +83,7 @@
  
 
-  
+   
   
 
   
Index: index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.1096
retrieving revision 1.1099
diff -u -r1.1096 -r1.1099
--- index.html  1 Sep 2018 23:42:00 -   1.1096
+++ index.html  3 Sep 2018 19:28:26 -   1.1099
@@ -46,14 +46,14 @@
 
 
 
-
+
 
 
 
-
+
 News
 
 
@@ -110,7 +110,7 @@
 
 
 
-
+
 Supported Releases
 
 


Re: C++ PATCH/RFC for c++/87109, wrong overload with ref-qualifiers

2018-09-04 Thread Jason Merrill
On Tue, Sep 4, 2018 at 3:02 PM, Marek Polacek  wrote:
> On Thu, Aug 30, 2018 at 09:22:26AM -0400, Jason Merrill wrote:
>> On Wed, Aug 29, 2018 at 8:03 PM, Marek Polacek  wrote:
>> > I've now gotten to the point where I question the validity of this PR, so 
>> > it's
>> > probably a good time to stop and ask for some advice.
>> >
>> > As discussed in 
>> > , we
>> > choose the wrong overload for f1:
>> >
>> > struct C { };
>> > struct A {
>> >   operator C() &;
>> >   operator C() &&;
>> > };
>> >
>> > C f1(A a)
>> > {
>> >return a; // should call operator C()&, but calls operator C()&&
>> > }
>> >
>> > Since we're returning a local variable, we know it's about to be destroyed,
>> > so even though it's got a name, not for very much longer, so we activate 
>> > move
>> > semantics.  So we perform overload resolution with 'a' turned into
>> > *NON_LVALUE_EXPR<(A&) >, an xvalue.  We need to convert 'a' from A to C,
>> > which is taking place in build_user_type_conversion_1.  It will see two
>> > candidates:
>> >
>> >   A::operator C() &&
>> >   A::operator C() &
>> >
>> > when adding these candidates in add_function_candidate we process the
>> > ref-qualifiers by tweaking the type of the implicit object parameter by 
>> > turning
>> > it into a reference type.  Then we create an implicit conversion sequence
>> > for converting the type of the argument to the type of the parameter,
>> > so A to A&.  That succeeds in the first case (an xvalue binding to an 
>> > rvalue
>> > reference) but fails in the second case (an xvalue binding to an lvalue
>> > reference).  And thus we end up using the first overload.
>> >
>> > But why is this invalid, again?  [class.copy.elision] says "or if the type 
>> > of
>> > the first parameter of the selected constructor is not an rvalue reference 
>> > to
>> > the object's type (possibly cv-qualified), overload resolution is performed
>> > again, considering the object as an lvalue." but I don't see how that 
>> > applies
>> > here.  (Constructors can't have ref-qualifiers anyway.)
>> >
>> > Thoughts?
>>
>> Where that rule comes in is when we choose the constructor for C:
>> since we've already called operator C()&&, we choose C(C&&), which
>> does not have a first parameter of "rvalue reference to cv A", so it
>> should be rejected.
>
> I see.  Turned out there were two problems: we weren't getting into the
> if (flags & LOOKUP_PREFER_RVALUE) block in build_over_call; this I fixed
> by setting rvaluedness_matches_p in build_user_type_conversion_1 for ck_rvalue
> if appropriate.
>
> And then the constructor argument check failed to check the returned object's
> type.  The tricky part is getting the type.  I realized we can go to the
> beginning of the conversion sequence and extract u.expr.  But we can't simply
> look at its type, if it's DECL_CONV_FN_P, we need to dig deeper.  You'll see
> that I don't handle other things, and I don't know if I need to.  I tried to
> handle arbitrary expressions, but it evolved into something that just seemed
> too fragile.
>
> E.g. for this testcase u.expr will look like:
> TARGET_EXPR  &) >)>
> and taking its type would yield "struct C".  I actually think that returning
> error_mark_node when we see any DECL_CONV_FN_P would work just as well.
>
> Can you think of something better than this?

Hmm, I think we could push that back even farther, and bail out where
you're already changing build_user_type_conversion_1; if we're doing
this two-step initialization, then we aren't going to end up with a
suitable constructor.

Jason


Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-09-04 Thread Steve Ellcey
On Tue, 2018-09-04 at 17:20 +, Wilco Dijkstra wrote:
> 
> Hi Steve,
> 
> The latest version compiles the examples I used correctly, so it looks fine
> from that perspective (but see comments below). However the key point of
> the ABI is to enable better code generation when calling a vector function,
> and that will likely require further changes that may conflict with this 
> patch.
> Do you have patches for that work outstanding? It seems best to do this in
> one go.
> 
> Also did you check there is no regression in code generation for non-vector
> functions?

Yes, I checked for regressions in non-vector functions.  The way this
is written, there should be zero changes to any generated code that
does not involve vector functions.

I have some changes to improve code generation for SIMD functions but
it is not ready to go yet.  The main change I made to improve code
generation is to modify TARGET_HARD_REGNO_CALL_PART_CLOBBERED to accept
an instruction pointer so we could check to see if the call being made
is to a 'normal' function or a 'simd' function and return the correct
value.  Unfortunately, there are many places where we don't have the
call insn available and have to make the pessimistic guess.

I also changed get_call_reg_set_usage so that it could return different
default sets of what registers changed based on whether or not the
function being called was normal or simd.  Again, there are some places
where we have to make a pessimistic guess.
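
For illustration only, a minimal sketch of the idea described above --
none of these names or signatures are GCC's actual target hooks, and the
register masks are placeholders: the set of call-clobbered vector
registers depends on which ABI the callee follows, and when the call site
is not known we fall back to the pessimistic (normal-ABI) answer.

// Purely illustrative; types, names and masks are hypothetical.
#include <cstdint>
#include <cstdio>

enum class callee_abi { unknown, normal, simd };

// Placeholder bitmasks over vector registers V0..V31 (bit i = Vi clobbered).
constexpr std::uint32_t clobbered_by_normal_call = 0xffffffffu;
constexpr std::uint32_t clobbered_by_simd_call   = 0x000000ffu;

static std::uint32_t
call_clobbered_vector_regs (callee_abi abi)
{
  switch (abi)
    {
    case callee_abi::simd:
      return clobbered_by_simd_call;
    case callee_abi::normal:
    case callee_abi::unknown:   // no call insn available: pessimistic guess
    default:
      return clobbered_by_normal_call;
    }
}

int
main ()
{
  std::printf ("normal:  %08x\n",
               (unsigned) call_clobbered_vector_regs (callee_abi::normal));
  std::printf ("simd:    %08x\n",
               (unsigned) call_clobbered_vector_regs (callee_abi::simd));
  std::printf ("unknown: %08x\n",
               (unsigned) call_clobbered_vector_regs (callee_abi::unknown));
  return 0;
}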

> 
> 
> +/* Return 1 if the register is used by the epilogue.  We need to say the
> +   return register is used, but only after epilogue generation is complete.
> +   Note that in the case of sibcalls, the values "used by the epilogue" are
> +   considered live at the start of the called function.
> +
> +   For SIMD functions we need to return 1 for FP registers that are saved and
> +   restored by a function but not zero in call_used_regs.  If we do not do
> +   this optimizations may remove the restore of the register.  */
> +
> +int
> +aarch64_epilogue_uses (int regno)
> +{
> +  if (epilogue_completed && (regno) == LR_REGNUM)
> +return 1;
> +  if (aarch64_simd_decl_p (cfun->decl) && FP_SIMD_SAVED_REGNUM_P
> (regno))
> +return 1;
> +  return 0;
> +}
> 
> I'm not convinced this is a good idea. It suggests GCC doesn't have the 
> correct set
> of caller/callee-save registers for vector functions (I don't see a change to 
> update
> CALL_USED_REGISTERS or aarch64_hard_regno_call_part_clobbered), which could
> lead to all kinds of interesting issues.

The problem is that there is no single correct set of caller/callee-
saved registers anymore.  There is one set for normal functions and a
different set for simd functions.  GCC isn't set up to have two
different caller/callee saved ABIs on one target, there is only
one CALL_USED_REGISTERS macro used to set one call_used_regs
array.  Should V16-V23 be set in CALL_USED_REGISTERS?  For 'normal'
functions, yes.  For SIMD functions, no.

> 
> +/* Return false for non-leaf SIMD functions in order to avoid
> +   shrink-wrapping them.  Doing this will lose the necessary
> +   save/restore of FP registers.  */
> +
> +bool
> +aarch64_use_simple_return_insn_p (void)
> +{
> +  if (aarch64_simd_decl_p (cfun->decl) && !crtl->is_leaf)
> +return false;
> +
> +  return true;
> +}
> 
> Such as this...
> 
> Wilco


Re: [PATCH] DWARF: Allow hard frame pointer even if frame pointer isn't used

2018-09-04 Thread Jason Merrill
OK.

On Tue, Sep 4, 2018 at 11:13 AM, H.J. Lu  wrote:
> On Mon, Sep 3, 2018 at 10:01 AM, H.J. Lu  wrote:
>> On Mon, Sep 3, 2018 at 9:44 AM, Michael Matz  wrote:
>>> Hi,
>>>
>>> On Mon, 3 Sep 2018, H.J. Lu wrote:
>>>
 > So, what's the testcase testing then?  Before the patch it doesn't ICE,
 > after the patch it doesn't ICE.  What should I look out for so I can see
 > that what the testcase is producing without the patch is wrong?

 Before the patch, debug info is wrong since it uses hard frame pointer
 which isn't set up for the function.  You can do "readelf -w" on .o file to
 verify the debug info.
>>>
>>> Yeah, that's what I thought as well, but it's correct:
>>>
>>> % ./gcc/cc1plus -quiet -O2 -g -fno-omit-frame-pointer -fvar-tracking x.cc
>>> % gcc -c x.s
>>> % readelf -wfi x.o
>>> ...
>>>  <1><8a>: Abbrev Number: 9 (DW_TAG_subprogram)
>>> <8b>   DW_AT_specification: <0x3a>
>>> <8f>   DW_AT_decl_line   : 6
>>> <90>   DW_AT_decl_column : 5
>>> <91>   DW_AT_object_pointer: <0xa7>
>>> <95>   DW_AT_low_pc  : 0x0
>>> <9d>   DW_AT_high_pc : 0x3
>>>DW_AT_frame_base  : 1 byte block: 9c 
>>> (DW_OP_call_frame_cfa)
>>>DW_AT_GNU_all_call_sites: 1
>>> ...
>>>  <2>: Abbrev Number: 11 (DW_TAG_formal_parameter)
>>>DW_AT_name: d
>>> <101>   DW_AT_decl_file   : 1
>>> <102>   DW_AT_decl_line   : 6
>>> <103>   DW_AT_decl_column : 63
>>> <104>   DW_AT_type: <0x78>
>>> <108>   DW_AT_location: 2 byte block: 91 8  (DW_OP_fbreg: 8)
>>> ...
>>>   DW_CFA_def_cfa: r7 (rsp) ofs 8
>>>   DW_CFA_offset: r16 (rip) at cfa-8
>>>   DW_CFA_nop
>>>   DW_CFA_nop
>>> ...
>>>
>>> So, argument 'd' is supposed to be at DW_AT_frame_base + 8, which is
>>> %rsp+8+8, aka %rsp+16, which is correct given that it's the eighth argument
>>> (including the implicit this parameter).
>>
>> Can we use DW_AT_frame_base when the frame pointer isn't available?
>> If yes,
>>
>>  gcc_assert ((SUPPORTS_STACK_ALIGNMENT
>>&& (elim == hard_frame_pointer_rtx
>>|| elim == stack_pointer_rtx))
>>   || elim == (frame_pointer_needed
>>   ? hard_frame_pointer_rtx
>>   : stack_pointer_rtx));
>>
>> should be changed to
>>
>>   gcc_assert (elim == hard_frame_pointer_rtx
>>   || elim == stack_pointer_rtx);
>>
>> This will also fix:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86593
>>
>
> Since hard frame pointer is encoded with DW_OP_fbreg which uses the
> DW_AT_frame_base attribute, not hard frame pointer directly, we should
> allow hard frame pointer when generating DWARF info even if frame pointer
> isn't used.
>
> OK for trunk?
>
> --
> H.J.


Re: [PATCH] Ignore properly -mdirect-move (PR target/87164).

2018-09-04 Thread Segher Boessenkool
Hi!

On Tue, Sep 04, 2018 at 04:02:23PM +0200, Martin Liška wrote:
> Option mdirect-move should be Deprecated, which means the option value is
> ignored and the user can't influence rs6000_isa_flags.
> 
> Patch can bootstrap on ppc64le-redhat-linux (gcc110 and gcc112) and survives 
> regression tests.
> 
> Ready to be installed?

So the mask is still defined then, and mask for that variable?  Sounds fine
then, thanks.

The other issue is still there: options.texi says:
@item Ignore
This option is ignored apart from printing any warning specified using
@code{Warn}.

So Warn is explicitly allowed with Ignore, not forbidden.  If you want to
change that, you'll have to change the documentation as well ;-)

I'll test it later, might be tomorrow.  Thanks,


Segher


Re: [PATCH] Fix PR87177 (and dup)

2018-09-04 Thread H.J. Lu
On Mon, Sep 3, 2018 at 6:50 AM, Richard Biener  wrote:
>
> This reverts vuse_ssa_val back to its previous state.  I need to think
> about how to limit alias walking for regions but will do so as followup.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
> Richard.
>
> 2018-09-03  Richard Biener  
>
> PR tree-optimization/87177
> * tree-ssa-sccvn.c (vuse_ssa_val): Revert previous change, keep
> cleanup.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87219

-- 
H.J.


Re: C++ PATCH/RFC for c++/87109, wrong overload with ref-qualifiers

2018-09-04 Thread Marek Polacek
On Thu, Aug 30, 2018 at 09:22:26AM -0400, Jason Merrill wrote:
> On Wed, Aug 29, 2018 at 8:03 PM, Marek Polacek  wrote:
> > I've now gotten to the point where I question the validity of this PR, so 
> > it's
> > probably a good time to stop and ask for some advice.
> >
> > As discussed in , 
> > we
> > choose the wrong overload for f1:
> >
> > struct C { };
> > struct A {
> >   operator C() &;
> >   operator C() &&;
> > };
> >
> > C f1(A a)
> > {
> >return a; // should call operator C()&, but calls operator C()&&
> > }
> >
> > Since we're returning a local variable, we know it's about to be destroyed,
> > so even though it's got a name, not for very much longer, so we activate 
> > move
> > semantics.  So we perform overload resolution with 'a' turned into
> > *NON_LVALUE_EXPR<(A&) >, an xvalue.  We need to convert 'a' from A to C,
> > which is taking place in build_user_type_conversion_1.  It will see two
> > candidates:
> >
> >   A::operator C() &&
> >   A::operator C() &
> >
> > when adding these candidates in add_function_candidate we process the
> > ref-qualifiers by tweaking the type of the implicit object parameter by 
> > turning
> > it into a reference type.  Then we create an implicit conversion sequence
> > for converting the type of the argument to the type of the parameter,
> > so A to A&.  That succeeds in the first case (an xvalue binding to an rvalue
> > reference) but fails in the second case (an xvalue binding to an lvalue
> > reference).  And thus we end up using the first overload.
> >
> > But why is this invalid, again?  [class.copy.elision] says "or if the type 
> > of
> > the first parameter of the selected constructor is not an rvalue reference 
> > to
> > the object's type (possibly cv-qualified), overload resolution is performed
> > again, considering the object as an lvalue." but I don't see how that 
> > applies
> > here.  (Constructors can't have ref-qualifiers anyway.)
> >
> > Thoughts?
> 
> Where that rule comes in is when we choose the constructor for C:
> since we've already called operator C()&&, we choose C(C&&), which
> does not have a first parameter of "rvalue reference to cv A", so it
> should be rejected.

I see.  Turned out there were two problems: we weren't getting into the
if (flags & LOOKUP_PREFER_RVALUE) block in build_over_call; this I fixed
by setting rvaluedness_matches_p in build_user_type_conversion_1 for ck_rvalue
if appropriate.

And then the constructor argument check failed to check the returned object's
type.  The tricky part is getting the type.  I realized we can go to the
beginning of the conversion sequence and extract u.expr.  But we can't simply
look at its type, if it's DECL_CONV_FN_P, we need to dig deeper.  You'll see
that I don't handle other things, and I don't know if I need to.  I tried to
handle arbitrary expressions, but it evolved into something that just seemed
too fragile.

E.g. for this testcase u.expr will look like:
TARGET_EXPR )>
and taking its type would yield "struct C".  I actually think that returning
error_mark_node when we see any DECL_CONV_FN_P would work just as well.

Can you think of something better than this?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2018-09-04  Marek Polacek  

PR c++/87109
* call.c (build_user_type_conversion_1): If LOOKUP_PREFER_RVALUE, set
rvaluedness_matches_p on ck_rvalue.
(source_expr): New.
(build_over_call): Check if the returned object's type is the same as
the argument this ctor received.

* g++.dg/cpp0x/ref-qual19.C: New test.

diff --git gcc/cp/call.c gcc/cp/call.c
index a1567026975..6f998e1201a 100644
--- gcc/cp/call.c
+++ gcc/cp/call.c
@@ -3919,7 +3919,11 @@ build_user_type_conversion_1 (tree totype, tree expr, 
int flags,
 harmless since we'd add it here anyway. */
  if (ics && MAYBE_CLASS_TYPE_P (totype) && ics->kind != ck_rvalue
  && !(convflags & LOOKUP_NO_TEMP_BIND))
-   ics = build_conv (ck_rvalue, totype, ics);
+   {
+ ics = build_conv (ck_rvalue, totype, ics);
+ if (flags & LOOKUP_PREFER_RVALUE)
+   ics->rvaluedness_matches_p = true;
+   }
 
  cand->second_conv = ics;
 
@@ -7722,6 +7726,16 @@ build_trivial_dtor_call (tree instance)
 instance, clobber);
 }
 
+/* The source expr for this standard conversion sequence.  */
+
+static tree
+source_expr (conversion *t)
+{
+  while (t->kind != ck_identity)
+t = next_conversion (t);
+  return t->u.expr;
+}
+
 /* Subroutine of the various build_*_call functions.  Overload resolution
has chosen a winning candidate CAND; build up a CALL_EXPR accordingly.
ARGS is a TREE_LIST of the unconverted arguments to the call.  FLAGS is a
@@ -7934,6 +7948,25 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
  || !TYPE_REF_IS_RVALUE (ptype)
  

Re: [PATCH v2] gcc: xtensa: fix NAND code in xtensa_expand_atomic

2018-09-04 Thread Max Filippov
On Tue, Sep 4, 2018 at 10:35 AM, augustine.sterl...@gmail.com
 wrote:
> On Tue, Sep 4, 2018 at 9:42 AM Max Filippov  wrote:
>>
>> NAND is ~(a1 & a2), but xtensa_expand_atomic does ~a1 & a2.
>> That fixes libatomic tests atomic-op-{1,2}.
>>
>> gcc/
>> 2018-09-04  Max Filippov  
>>
>> * config/xtensa/xtensa.c (xtensa_expand_atomic): Reorder AND and
>> XOR operations in NAND case.
>
>
> Approved.

Thanks. Applied to trunk and gcc-[678] branches.

-- 
Thanks.
-- Max


Re: [patch, fortan] PR87103 - [OOP] ICE in gfc_new_symbol() due to overlong symbol name

2018-09-04 Thread Bernhard Reutner-Fischer
On Tue, 4 Sep 2018 at 18:43, Andrew Benson  wrote:
>
> As suggested by Janus, PR87103 is easily fixed by the attached patch which
> increases GFC_MAX_SYMBOL_LEN to 76 (sufficient to hold the maximum allowed F08
> symbol length of 63, plus a null terminator, plus the "__tmp_class_" prefix).

This is so much wrong.
Note that this will be fixed properly by the changes contained in the
https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/aldot/fortran-fe-stringpool
branch.
There we keep the GFC_MAX_SYMBOL_LEN at 63 proper but use an internal
buffer double that size which in turn is sufficient to hold all
compiler-generated identifiers.
See gfc_get_string() even in current TOT.
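
For illustration, a minimal sketch (not gfortran code; names and sizes are
only a model) of that idea: keep the user-visible limit at 63 but size
internal buffers generously enough that compiler-generated prefixes such
as "__tmp_class_" always fit.

// Illustrative sketch only -- not gfortran code.
#include <cstdio>
#include <cstring>

#define MAX_SYMBOL_LEN 63                        /* user-visible F2008 limit */
#define MAX_MANGLED_LEN (2 * MAX_SYMBOL_LEN + 1) /* room for generated prefixes */

static void
make_tmp_class_name (char *buf, const char *user_name)
{
  /* "__tmp_class_" (12 chars) + a 63-char user name fits easily in 127.  */
  std::snprintf (buf, MAX_MANGLED_LEN + 1, "__tmp_class_%s", user_name);
}

int
main ()
{
  char name[MAX_SYMBOL_LEN + 1];
  std::memset (name, 'a', MAX_SYMBOL_LEN);
  name[MAX_SYMBOL_LEN] = '\0';

  char mangled[MAX_MANGLED_LEN + 1];
  make_tmp_class_name (mangled, name);
  std::printf ("%zu chars: %s\n", std::strlen (mangled), mangled);
  return 0;
}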

Maybe we should bite the bullet and start to merge the stringpool
changes now instead of this hack?

thanks and cheers,


Re: [PATCH v2] gcc: xtensa: fix NAND code in xtensa_expand_atomic

2018-09-04 Thread augustine.sterl...@gmail.com
On Tue, Sep 4, 2018 at 9:42 AM Max Filippov  wrote:

> NAND is ~(a1 & a2), but xtensa_expand_atomic does ~a1 & a2.
> That fixes libatomic tests atomic-op-{1,2}.
>
> gcc/
> 2018-09-04  Max Filippov  
>
> * config/xtensa/xtensa.c (xtensa_expand_atomic): Reorder AND and
> XOR operations in NAND case.
>

Approved.


Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-09-04 Thread Wilco Dijkstra
Hi Steve,

The latest version compiles the examples I used correctly, so it looks fine
from that perspective (but see comments below). However the key point of 
the ABI is to enable better code generation when calling a vector function,
and that will likely require further changes that may conflict with this patch.
Do you have patches for that work outstanding? It seems best to do this in
one go.

Also did you check there is no regression in code generation for non-vector
functions? 


+/* Return 1 if the register is used by the epilogue.  We need to say the
+   return register is used, but only after epilogue generation is complete.
+   Note that in the case of sibcalls, the values "used by the epilogue" are
+   considered live at the start of the called function.
+
+   For SIMD functions we need to return 1 for FP registers that are saved and
+   restored by a function but not zero in call_used_regs.  If we do not do 
+   this optimizations may remove the restore of the register.  */
+
+int
+aarch64_epilogue_uses (int regno)
+{
+  if (epilogue_completed && (regno) == LR_REGNUM)
+    return 1;
+  if (aarch64_simd_decl_p (cfun->decl) && FP_SIMD_SAVED_REGNUM_P (regno))
+    return 1;
+  return 0;
+}

I'm not convinced this is a good idea. It suggests GCC doesn't have the
correct set of caller/callee-save registers for vector functions (I don't
see a change to update CALL_USED_REGISTERS or
aarch64_hard_regno_call_part_clobbered), which could lead to all kinds of
interesting issues.

+/* Return false for non-leaf SIMD functions in order to avoid
+   shrink-wrapping them.  Doing this will lose the necessary
+   save/restore of FP registers.  */
+
+bool
+aarch64_use_simple_return_insn_p (void)
+{
+  if (aarch64_simd_decl_p (cfun->decl) && !crtl->is_leaf)
+    return false;
+
+  return true;
+}

Such as this...

Wilco


Re: [PATCH] Optimise sqrt reciprocal multiplications

2018-09-04 Thread Kyrill Tkachov



On 04/09/18 15:31, Richard Biener wrote:

On Tue, 4 Sep 2018, Kyrill Tkachov wrote:


Hi Richard,

On 31/08/18 12:07, Richard Biener wrote:

On Thu, 30 Aug 2018, Kyrill Tkachov wrote:


Ping.

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01496.html

Thanks,
Kyrill

On 23/08/18 18:09, Kyrill Tkachov wrote:

Hi Richard,

On 23/08/18 11:13, Richard Sandiford wrote:

Kyrill  Tkachov  writes:

Hi all,

This patch aims to optimise sequences involving uses of 1.0 / sqrt (a)
under -freciprocal-math and -funsafe-math-optimizations.
In particular consider:

x = 1.0 / sqrt (a);
r1 = x * x;  // same as 1.0 / a
r2 = a * x; // same as sqrt (a)

If x, r1 and r2 are all used further on in the code, this can be
transformed into:
tmp1 = 1.0 / a
tmp2 = sqrt (a)
tmp3 = tmp1 * tmp2
x = tmp3
r1 = tmp1
r2 = tmp2

Nice optimisation :-)  Someone who knows the pass better should review,
but:

Thanks for the review.


There seems to be an implicit assumption that this is a win even
when the r1 and r2 assignments are only conditionally executed.
That's probably true, but it might be worth saying explicitly.

I'll admit I had not considered that case.
I think it won't make a difference in practice, as the really expensive
operations here are the sqrt and the division; they are on the executed
path in either case, and their becoming independent should be a benefit
of its own.


+/* Return TRUE if USE_STMT is a multiplication of DEF by A.  */
+
+static inline bool
+is_mult_by (gimple *use_stmt, tree def, tree a)
+{
+  if (gimple_code (use_stmt) == GIMPLE_ASSIGN
+  && gimple_assign_rhs_code (use_stmt) == MULT_EXPR)
+{
+  tree op0 = gimple_assign_rhs1 (use_stmt);
+  tree op1 = gimple_assign_rhs2 (use_stmt);
+
+  return (op0 == def && op1 == a)
+  || (op0 == a && op1 == def);
+}
+  return 0;
+}

Seems like is_square_of could now be a light-weight wrapper around this.

Indeed, I've done the wrapping now.


@@ -652,6 +669,180 @@ execute_cse_reciprocals_1 (gimple_stmt_iterator
*def_gsi, tree def)
 occ_head = NULL;
   }
   +/* Transform sequences like
+   x = 1.0 / sqrt (a);
+   r1 = x * x;
+   r2 = a * x;
+   into:
+   tmp1 = 1.0 / a;
+   tmp2 = sqrt (a);
+   tmp3 = tmp1 * tmp2;
+   x = tmp3;
+   r1 = tmp1;
+   r2 = tmp2;
+   depending on the uses of x, r1, r2.  This removes one multiplication
and
+   allows the sqrt and division operations to execute in parallel.
+   DEF_GSI is the gsi of the initial division by sqrt that defines
+   DEF (x in the example above).  */
+
+static void
+optimize_recip_sqrt (gimple_stmt_iterator *def_gsi, tree def)
+{
+  use_operand_p use_p;
+  imm_use_iterator use_iter;
+  gimple *stmt = gsi_stmt (*def_gsi);
+  tree x = def;
+  tree orig_sqrt_ssa_name = gimple_assign_rhs2 (stmt);
+  tree div_rhs1 = gimple_assign_rhs1 (stmt);
+
+  if (TREE_CODE (orig_sqrt_ssa_name) != SSA_NAME
+  || TREE_CODE (div_rhs1) != REAL_CST
+  || !real_equal (&TREE_REAL_CST (div_rhs1), &dconst1))
+return;
+
+  gimple *sqrt_stmt = SSA_NAME_DEF_STMT (orig_sqrt_ssa_name);
+  if (!is_gimple_call (sqrt_stmt)
+  || !gimple_call_lhs (sqrt_stmt))
+return;
+
+  gcall *call = as_a <gcall *> (sqrt_stmt);

Very minor, but:

gcall *sqrt_stmt
  = dyn_cast <gcall *> (SSA_NAME_DEF_STMT (orig_sqrt_ssa_name));
if (!sqrt_stmt || !gimple_call_lhs (sqrt_stmt))
  return;

would avoid the need for the separate as_a<>, and would mean that
we only call gimple_call_* on gcalls.

Ok.


+  if (has_other_use)
+{
+  /* Using the two temporaries tmp1, tmp2 from above
+ the original x is now:
+ x = tmp1 * tmp2.  */
+  gcc_assert (mult_ssa_name);
+  gcc_assert (sqr_ssa_name);
+  gimple_stmt_iterator gsi2 = gsi_for_stmt (stmt);
+
+  tree new_ssa_name
+= make_temp_ssa_name (TREE_TYPE (a), NULL,
"recip_sqrt_transformed");
+  gimple *new_stmt
+= gimple_build_assign (new_ssa_name, MULT_EXPR,
+   mult_ssa_name, sqr_ssa_name);
+  gsi_insert_before (&gsi2, new_stmt, GSI_SAME_STMT);
+  gcc_assert (gsi_stmt (gsi2) == stmt);
+  gimple_assign_set_rhs_from_tree (&gsi2, new_ssa_name);
+  fold_stmt (&gsi2);
+  update_stmt (stmt);

In this case we're replacing the statement in its original position,
so there's no real need to use a temporary.  It seems better to
change the rhs_code, rhs1 and rhs2 of stmt in-place, with the same
lhs as before.

Yes, that's cleaner.


@@ -762,6 +953,23 @@ pass_cse_reciprocals::execute (function *fun)
 if (optimize_bb_for_size_p (bb))
   continue;

Seems unnecessary to skip the new optimisation when optimising for size.
Like you say, it saves a multiplication overall. Also:

Indeed.


+  if (flag_unsafe_math_optimizations)
+{
+  for (gimple_stmt_iterator gsi = gsi_after_labels (bb);
+   !gsi_end_p (gsi);
+   gsi_next (&gsi))
+{
+  gimple *stmt = gsi_stmt (gsi);
+
+  if (gimple_has_lhs (stmt)
+  && (def = SINGLE_SSA_TREE_OPERAND (stmt, SSA_OP_DEF)) != NULL
+  && 

[patch, fortan] PR87103 - [OOP] ICE in gfc_new_symbol() due to overlong symbol name

2018-09-04 Thread Andrew Benson
As suggested by Janus, PR87103 is easily fixed by the attached patch which 
increases GFC_MAX_SYMBOL_LEN to 76 (sufficient to hold the maximum allowed F08 
symbol length of 63, plus a null terminator, plus the "__tmp_class_" prefix).

Bootstraps and regtests successfully.

OK for trunk?

-Andrew

On Saturday, August 25, 2018 1:50:57 PM PDT Andrew Benson wrote:
> I just opened PR for this: 87103
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87103
> 
> The following code causes an ICE with gfortan 9.0.0 (r263855):
> 
> module a
>   type :: nameThatIsVeryLongButNotTooLongThatItShouldBeInvalid
>   end type nameThatIsVeryLongButNotTooLongThatItShouldBeInvalid
> contains
>   subroutine b(self)
> class(nameThatIsVeryLongButNotTooLongThatItShouldBeInvalid), intent(in)
> :: self
> select type (self)
> class is (nameThatIsVeryLongButNotTooLongThatItShouldBeInvalid)
> end select
>   end subroutine b
> end module a
> 
> $ gfortran -v
> Using built-in specs.
> COLLECT_GCC=gfortran
> COLLECT_LTO_WRAPPER=/home/abenson/Galacticus/Tools/libexec/gcc/x86_64-pc-
> linux-gnu/9.0.0/lto-wrapper
> Target: x86_64-pc-linux-gnu
> Configured with: ../gcc-trunk/configure
> --prefix=/home/abenson/Galacticus/Tools --enable-languages=c,c++,fortran
> --disable-multilib : (reconfigured) ../gcc- trunk/configure
> --prefix=/home/abenson/Galacticus/Tools --enable-languages=c,c+ +,fortran
> --disable-multilib : (reconfigured) ../gcc-trunk/configure --prefix=/
> home/abenson/Galacticus/Tools --disable-multilib --enable-languages=c,c+
> +,fortran,lto --no-create --no-recursion : (reconfigured)
> ../gcc-trunk/configure --prefix=/home/abenson/Galacticus/Tools
> --disable-multilib --enable- languages=c,c++,fortran,lto --no-create
> --no-recursion : (reconfigured) ../gcc- trunk/configure
> --prefix=/home/abenson/Galacticus/Tools --disable-multilib --
> enable-languages=c,c++,fortran,lto --no-create --no-recursion
> Thread model: posix
> gcc version 9.0.0 20180825 (experimental) (GCC)
> 
> 
> $ gfortran -c tmp1.F90 -o tmp1.o
> f951: internal compiler error: new_symbol(): Symbol name too long
> 0x7b9711 gfc_internal_error(char const*, ...)
> ../../gcc-trunk/gcc/fortran/error.c:1362
> 0x84a838 gfc_new_symbol(char const*, gfc_namespace*)
> ../../gcc-trunk/gcc/fortran/symbol.c:3132
> 0x84ab97 gfc_get_sym_tree(char const*, gfc_namespace*, gfc_symtree**, bool)
> ../../gcc-trunk/gcc/fortran/symbol.c:3373
> 0x7e565d select_type_set_tmp
> ../../gcc-trunk/gcc/fortran/match.c:6114
> 0x7ee260 gfc_match_class_is()
> ../../gcc-trunk/gcc/fortran/match.c:6470
> 0x808c79 match_word
> ../../gcc-trunk/gcc/fortran/parse.c:65
> 0x80a4f1 decode_statement
> ../../gcc-trunk/gcc/fortran/parse.c:462
> 0x80d04c next_free
> ../../gcc-trunk/gcc/fortran/parse.c:1234
> 0x80d04c next_statement
> ../../gcc-trunk/gcc/fortran/parse.c:1466
> 0x810861 parse_select_type_block
> ../../gcc-trunk/gcc/fortran/parse.c:4236
> 0x810861 parse_executable
> ../../gcc-trunk/gcc/fortran/parse.c:5330
> 0x8118e6 parse_progunit
> ../../gcc-trunk/gcc/fortran/parse.c:5697
> 0x811bfc parse_contained
> ../../gcc-trunk/gcc/fortran/parse.c:5574
> 0x812a4c parse_module
> ../../gcc-trunk/gcc/fortran/parse.c:5944
> 0x812f75 gfc_parse_file()
> ../../gcc-trunk/gcc/fortran/parse.c:6247
> 0x859d3f gfc_be_parse_file
> ../../gcc-trunk/gcc/fortran/f95-lang.c:204
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.
> 
> 
> The problem seems to be that gfc_new_symbol() is passed a name
> '__tmp_class_namethatisverylongbutnottoolongthatitshouldbeinvalid' in this
> case which exceeds GFC_MAX_SYMBOL_LEN=64. My understanding is that the code
> is valid since the symbol name itself,
> 'namethatisverylongbutnottoolongthatitshouldbeinvalid', is less than 64
> characters.
> 
> I'm not sure what the correct fix for this is - should the name length limit
> in gfc_new_symbol() just be increased by enough to allow the '__tmp_class_'
> prefix?
> 
> This appears to be a regression as the same code (with the same long-but-
> legal) symbol names compiled under earlier versions of gfortran. I'll
> attempt to rollback to previous versions and see if I can identify where
> the change occurred.
> 
> -Andrew
2018-09-04  Andrew Benson  

	* gfortran.h: Increase GFC_MAX_SYMBOL_LEN to 76, sufficient to hold
	the maximum allowed F08 symbol length of 63, plus a null
	terminator, plus the "__tmp_class_" prefix.
Index: gcc/fortran/gfortran.h
===
--- gcc/fortran/gfortran.h	(revision 264085)
+++ gcc/fortran/gfortran.h	(working copy)
@@ -54,7 +54,9 @@ not after.
 
 /* Major control parameters.  */
 
-#define GFC_MAX_SYMBOL_LEN 63   /* Must be at least 63 for F2003.  */
+/* Must be at least 63 for F2003, +1 

[PATCH v2] gcc: xtensa: fix NAND code in xtensa_expand_atomic

2018-09-04 Thread Max Filippov
NAND is ~(a1 & a2), but xtensa_expand_atomic does ~a1 & a2.
That fixes libatomic tests atomic-op-{1,2}.

gcc/
2018-09-04  Max Filippov  

* config/xtensa/xtensa.c (xtensa_expand_atomic): Reorder AND and
XOR operations in NAND case.
---
Changes v1->v2:
- put final inversion result into the new_rtx avoiding unneeded move

 gcc/config/xtensa/xtensa.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.c b/gcc/config/xtensa/xtensa.c
index 7cfe64d42895..080bb4ad765d 100644
--- a/gcc/config/xtensa/xtensa.c
+++ b/gcc/config/xtensa/xtensa.c
@@ -1614,9 +1614,9 @@ xtensa_expand_atomic (enum rtx_code code, rtx target, rtx 
mem, rtx val,
   break;
 
 case MULT: /* NAND */
-  tmp = expand_simple_binop (SImode, XOR, old, ac.modemask,
+  tmp = expand_simple_binop (SImode, AND, old, val,
 NULL_RTX, 1, OPTAB_DIRECT);
-  tmp = expand_simple_binop (SImode, AND, tmp, val,
+  tmp = expand_simple_binop (SImode, XOR, tmp, ac.modemask,
 new_rtx, 1, OPTAB_DIRECT);
   break;
 
-- 
2.11.0

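A quick standalone check (not part of the patch above) of the identity the
fix restores: an atomic NAND must compute ~(old & val), which is not the
same as the (~old) & val that the previous expansion produced.

// Minimal illustration of ~(a & b) vs (~a) & b; values are arbitrary.
#include <cstdint>
#include <cstdio>

int
main ()
{
  std::uint32_t old_val = 0xf0f0f0f0u;
  std::uint32_t val     = 0xffff0000u;

  std::uint32_t correct = ~(old_val & val);   // NAND, as __atomic_fetch_nand
  std::uint32_t buggy   = (~old_val) & val;   // what the old expansion did

  std::printf ("~(old & val) = 0x%08x\n", (unsigned) correct);  // 0x0f0fffff
  std::printf ("(~old) & val = 0x%08x\n", (unsigned) buggy);    // 0x0f0f0000
  return 0;
}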


Re: [PATCH] Fix up -mno-xsave handling (PR target/87198)

2018-09-04 Thread Uros Bizjak
On Tue, Sep 4, 2018 at 4:28 PM, Jakub Jelinek  wrote:
> Hi!
>
> The -mxsave{opt,s,c} options turn on automatically -mxsave option and
> the patterns rely on TARGET_XSAVE{OPT,S,C} implying TARGET_XSAVE,
> but if somebody uses e.g. -mxsave{opt,s,c} -mno-xsave (or something that 
> implies
> -mno-xsave), TARGET_XSAVE{OPT,S,C} remains set and TARGET_XSAVE is clear.
>
> Fixed by making sure OPTION_MASK_ISA_XSAVE_UNSET unsets also xsave{s,c}
> (xsaveopt was already there).
>
> This patch doesn't try to address PR87171, which I believe needs further
> discussions and doesn't touch the same code anyway, so I think it should be
> resolved independently.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
> release branches?
>
> 2018-09-04  Jakub Jelinek  
>
> PR target/87198
> * common/config/i386/i386-common.c (OPTION_MASK_ISA_XSAVEOPT_SET,
> OPTION_MASK_ISA_XSAVES_SET, OPTION_MASK_ISA_XSAVEC_SET): Use
> OPTION_MASK_ISA_XSAVE_SET instead of OPTION_MASK_ISA_XSAVE.
> (OPTION_MASK_ISA_XSAVE_UNSET): Add OPTION_MASK_ISA_XSAVES_UNSET
> and OPTION_MASK_ISA_XSAVEC_UNSET.
>
> * gcc.target/i386/pr87198.c: New test.

OK.

Thanks,
Uros.

> --- gcc/common/config/i386/i386-common.c.jj 2018-07-17 12:48:23.388587985 
> +0200
> +++ gcc/common/config/i386/i386-common.c2018-09-03 11:45:03.543653449 
> +0200
> @@ -59,7 +59,7 @@ along with GCC; see the file COPYING3.
>  #define OPTION_MASK_ISA_FXSR_SET OPTION_MASK_ISA_FXSR
>  #define OPTION_MASK_ISA_XSAVE_SET OPTION_MASK_ISA_XSAVE
>  #define OPTION_MASK_ISA_XSAVEOPT_SET \
> -  (OPTION_MASK_ISA_XSAVEOPT | OPTION_MASK_ISA_XSAVE)
> +  (OPTION_MASK_ISA_XSAVEOPT | OPTION_MASK_ISA_XSAVE_SET)
>  #define OPTION_MASK_ISA_AVX512F_SET \
>(OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_AVX2_SET)
>  #define OPTION_MASK_ISA_AVX512CD_SET \
> @@ -95,9 +95,9 @@ along with GCC; see the file COPYING3.
>  #define OPTION_MASK_ISA_PREFETCHWT1_SET OPTION_MASK_ISA_PREFETCHWT1
>  #define OPTION_MASK_ISA_CLFLUSHOPT_SET OPTION_MASK_ISA_CLFLUSHOPT
>  #define OPTION_MASK_ISA_XSAVES_SET \
> -  (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_XSAVE)
> +  (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_XSAVE_SET)
>  #define OPTION_MASK_ISA_XSAVEC_SET \
> -  (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_XSAVE)
> +  (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_XSAVE_SET)
>  #define OPTION_MASK_ISA_CLWB_SET OPTION_MASK_ISA_CLWB
>
>  /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
> @@ -185,7 +185,8 @@ along with GCC; see the file COPYING3.
>  #define OPTION_MASK_ISA_FMA_UNSET OPTION_MASK_ISA_FMA
>  #define OPTION_MASK_ISA_FXSR_UNSET OPTION_MASK_ISA_FXSR
>  #define OPTION_MASK_ISA_XSAVE_UNSET \
> -  (OPTION_MASK_ISA_XSAVE | OPTION_MASK_ISA_XSAVEOPT_UNSET)
> +  (OPTION_MASK_ISA_XSAVE | OPTION_MASK_ISA_XSAVEOPT_UNSET \
> +   | OPTION_MASK_ISA_XSAVES_UNSET | OPTION_MASK_ISA_XSAVEC_UNSET)
>  #define OPTION_MASK_ISA_XSAVEOPT_UNSET OPTION_MASK_ISA_XSAVEOPT
>  #define OPTION_MASK_ISA_AVX2_UNSET \
>(OPTION_MASK_ISA_AVX2 | OPTION_MASK_ISA_AVX512F_UNSET)
> --- gcc/testsuite/gcc.target/i386/pr87198.c.jj  2018-09-03 12:07:48.277760935 
> +0200
> +++ gcc/testsuite/gcc.target/i386/pr87198.c 2018-09-03 12:07:27.624107043 
> +0200
> @@ -0,0 +1,13 @@
> +/* PR target/87198 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mxsavec -mno-xsave" } */
> +
> +#include 
> +
> +void
> +test_xsavec (void *__A, long long __B)
> +{
> +  _xsavec (__A, __B);
> +}
> +
> +/* { dg-error "target specific option mismatch" "" { target *-*-* } 0 } */
>
> Jakub

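For illustration, a deliberately simplified model (hypothetical mask names;
the real logic in i386-common.c is more involved) of the SET/UNSET idiom in
the patch quoted above, showing why -mno-xsave must also clear the bits of
the features that imply XSAVE:

// Hypothetical, simplified model of the option-mask handling.
#include <cstdio>

enum : unsigned
{
  ISA_XSAVE  = 1u << 0,
  ISA_XSAVEC = 1u << 1,
};

// -mxsavec turns on XSAVEC and, implicitly, XSAVE.
constexpr unsigned XSAVEC_SET = ISA_XSAVEC | ISA_XSAVE;

// Before the fix, -mno-xsave cleared only XSAVE; after it, the unset mask
// also covers the features that depend on XSAVE.
constexpr unsigned XSAVE_UNSET_BEFORE = ISA_XSAVE;
constexpr unsigned XSAVE_UNSET_AFTER  = ISA_XSAVE | ISA_XSAVEC;

int
main ()
{
  unsigned flags = 0;
  flags |= XSAVEC_SET;                            // -mxsavec
  unsigned before = flags & ~XSAVE_UNSET_BEFORE;  // then -mno-xsave, old way
  unsigned after  = flags & ~XSAVE_UNSET_AFTER;   // then -mno-xsave, fixed

  // Old: XSAVEC stays set without XSAVE (inconsistent); new: both cleared.
  std::printf ("before fix: xsave=%d xsavec=%d\n",
               !!(before & ISA_XSAVE), !!(before & ISA_XSAVEC));
  std::printf ("after  fix: xsave=%d xsavec=%d\n",
               !!(after & ISA_XSAVE), !!(after & ISA_XSAVEC));
  return 0;
}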

PING^1 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-09-04 Thread H.J. Lu
On Tue, Aug 28, 2018 at 11:04 AM, H.J. Lu  wrote:
> With -mavx, for
>
> [hjl@gnu-cfl-1 skx-2]$ cat foo.i
> extern float f;
> extern double d;
> extern int i;
>
> void
> foo (void)
> {
>   d = f;
>   f = i;
> }
>
> we need to generate
>
> vxorp[ds]   %xmmN, %xmmN, %xmmN
> ...
> vcvtss2sd   f(%rip), %xmmN, %xmmX
> ...
> vcvtsi2ss   i(%rip), %xmmN, %xmmY
>
> to avoid partial XMM register stall.  This patch adds a pass to generate
> a single
>
> vxorps  %xmmN, %xmmN, %xmmN
>
> at function entry, which is shared by all SF and DF conversions, instead
> of generating one
>
> vxorp[ds]   %xmmN, %xmmN, %xmmN
>
> for each SF/DF conversion.
>
> Performance impacts on SPEC CPU 2017 rate with 1 copy using
>
> -Ofast -march=native -mfpmath=sse -fno-associative-math -funroll-loops
>
> are
>
> 1. On Broadwell server:
>
> 500.perlbench_r (-0.82%)
> 502.gcc_r (0.73%)
> 505.mcf_r (-0.24%)
> 520.omnetpp_r (-2.22%)
> 523.xalancbmk_r (-1.47%)
> 525.x264_r (0.31%)
> 531.deepsjeng_r (0.27%)
> 541.leela_r (0.85%)
> 548.exchange2_r (-0.11%)
> 557.xz_r (-0.34%)
> Geomean: (-0.23%)
>
> 503.bwaves_r (0.00%)
> 507.cactuBSSN_r (-1.88%)
> 508.namd_r (0.00%)
> 510.parest_r (-0.56%)
> 511.povray_r (0.49%)
> 519.lbm_r (-1.28%)
> 521.wrf_r (-0.28%)
> 526.blender_r (0.55%)
> 527.cam4_r (-0.20%)
> 538.imagick_r (2.52%)
> 544.nab_r (-0.18%)
> 549.fotonik3d_r (-0.51%)
> 554.roms_r (-0.22%)
> Geomean: (0.00%)
>
> 2. On Skylake client:
>
> 500.perlbench_r (-0.29%)
> 502.gcc_r (-0.36%)
> 505.mcf_r (1.77%)
> 520.omnetpp_r (-0.26%)
> 523.xalancbmk_r (-3.69%)
> 525.x264_r (-0.32%)
> 531.deepsjeng_r (0.00%)
> 541.leela_r (-0.46%)
> 548.exchange2_r (0.00%)
> 557.xz_r (0.00%)
> Geomean: (-0.34%)
>
> 503.bwaves_r (0.00%)
> 507.cactuBSSN_r (-0.56%)
> 508.namd_r (0.87%)
> 510.parest_r (0.00%)
> 511.povray_r (-0.73%)
> 519.lbm_r (0.84%)
> 521.wrf_r (0.00%)
> 526.blender_r (-0.81%)
> 527.cam4_r (-0.43%)
> 538.imagick_r (2.55%)
> 544.nab_r (0.28%)
> 549.fotonik3d_r (0.00%)
> 554.roms_r (0.32%)
> Geomean: (0.12%)
>
> 3. On Skylake server:
>
> 500.perlbench_r (-0.55%)
> 502.gcc_r (0.69%)
> 505.mcf_r (0.00%)
> 520.omnetpp_r (-0.33%)
> 523.xalancbmk_r (-0.21%)
> 525.x264_r (-0.27%)
> 531.deepsjeng_r (0.00%)
> 541.leela_r (0.00%)
> 548.exchange2_r (-0.11%)
> 557.xz_r (0.00%)
> Geomean: (0.00%)
>
> 503.bwaves_r (0.58%)
> 507.cactuBSSN_r (0.00%)
> 508.namd_r (0.00%)
> 510.parest_r (0.18%)
> 511.povray_r (-0.58%)
> 519.lbm_r (0.25%)
> 521.wrf_r (0.40%)
> 526.blender_r (0.34%)
> 527.cam4_r (0.19%)
> 538.imagick_r (5.87%)
> 544.nab_r (0.17%)
> 549.fotonik3d_r (0.00%)
> 554.roms_r (0.00%)
> Geomean: (0.62%)
>
> On Skylake client, impacts on 538.imagick_r are
>
> size before:
>
>    text    data   bss      dec     hex filename
> 2555577   10876  5576  2572029  273efd imagick_r.exe
>
> size after:
>
>    text    data   bss      dec     hex filename
> 2511825   10876  5576  2528277  269415 imagick_r.exe
>
> number of vxorp[ds]:
>
> before  after   difference
> 14570   4515    -69%
>
> OK for trunk?
>
> Thanks.
>
>
> H.J.
> ---
> gcc/
>
> 2018-08-28  H.J. Lu  
> Sunil K Pandey  
>
> PR target/87007
> * config/i386/i386-passes.def: Add
> pass_remove_partial_avx_dependency.
> * config/i386/i386-protos.h
> (make_pass_remove_partial_avx_dependency): New.
> * config/i386/i386.c (make_pass_remove_partial_avx_dependency):
> New function.
> (pass_data_remove_partial_avx_dependency): New.
> (pass_remove_partial_avx_dependency): Likewise.
> (make_pass_remove_partial_avx_dependency): Likewise.
> * config/i386/i386.md (SF/DF conversion splitters): Disabled
> for TARGET_AVX.
>
> gcc/testsuite/
>
> 2018-08-28  H.J. Lu  
> Sunil K Pandey  
>
> PR target/87007
> * gcc.target/i386/pr87007.c: New file.


PING:

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01781.html



-- 
H.J.


Re: [PATCH] DWARF: add DW_AT_count to zero-length arrays

2018-09-04 Thread Tom de Vries
[ Adding Jason as addressee ]

On 08/28/2018 08:20 PM, Omar Sandoval wrote:
> On Fri, Aug 17, 2018 at 12:16:07AM -0700, Omar Sandoval wrote:
>> On Thu, Aug 16, 2018 at 11:54:53PM -0700, Omar Sandoval wrote:
>>> On Thu, Aug 16, 2018 at 10:27:48PM -0700, Andrew Pinski wrote:
 On Thu, Aug 16, 2018 at 9:29 PM Omar Sandoval  wrote:
>
> Hi,
>
> This fixes the issue that it is impossible to distinguish a zero-length 
> array
> type from a flexible array type given the DWARF produced by GCC (which I
> reported here [1]). We do so by adding a DW_AT_count attribute with a 
> value of
> zero only for zero-length arrays (this is what clang does in this case, 
> too).
>
> 1: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86985
>
> The reproducer from the PR now produces the expected output:
>
> $ ~/gcc-build/gcc/xgcc -B ~/gcc-build/gcc -g -c zero_length.c
> $ ~/gcc-build/gcc/xgcc -B ~/gcc-build/gcc -g -c flexible.c
> $ gdb -batch -ex 'ptype baz' zero_length.o
> type = struct {
> int foo;
> int bar[0];
> }
> $ gdb -batch -ex 'ptype baz' flexible.o
> type = struct {
> int foo;
> int bar[];
> }
>
> This was bootstrapped and tested on x86_64-pc-linux-gnu.
>
> I don't have commit rights (first time contributor), so if this change is 
> okay
> could it please be applied?
>>
>> [snip]
>>
>> Here's the patch with the is_c () helper instead.
>>
>> 2018-08-17  Omar Sandoval  
>>

I've added the PR number here.

>>  * dwarf2out.c (is_c): New.
>>  (add_subscript_info): Add DW_AT_count of 0 for C zero-length arrays.
>>
>> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
>> index 5a74131d332..189f9bb381f 100644
>> --- a/gcc/dwarf2out.c
>> +++ b/gcc/dwarf2out.c
>> @@ -3671,6 +3671,7 @@ static const char *get_AT_string (dw_die_ref, enum 
>> dwarf_attribute);
>>  static int get_AT_flag (dw_die_ref, enum dwarf_attribute);
>>  static unsigned get_AT_unsigned (dw_die_ref, enum dwarf_attribute);
>>  static inline dw_die_ref get_AT_ref (dw_die_ref, enum dwarf_attribute);
>> +static bool is_c (void);
>>  static bool is_cxx (void);
>>  static bool is_cxx (const_tree);
>>  static bool is_fortran (void);
>> @@ -5434,6 +5435,19 @@ get_AT_file (dw_die_ref die, enum dwarf_attribute 
>> attr_kind)
>>return a ? AT_file (a) : NULL;
>>  }
>>  
>> +/* Return TRUE if the language is C.  */
>> +
>> +static inline bool
>> +is_c (void)
>> +{
>> +  unsigned int lang = get_AT_unsigned (comp_unit_die (), DW_AT_language);
>> +
>> +  return (lang == DW_LANG_C || lang == DW_LANG_C89 || lang == DW_LANG_C99
>> +  || lang == DW_LANG_C11 || lang == DW_LANG_ObjC);
>> +
>> +
>> +}
>> +
>>  /* Return TRUE if the language is C++.  */
>>  
>>  static inline bool
>> @@ -20918,12 +20932,24 @@ add_subscript_info (dw_die_ref type_die, tree 
>> type, bool collapse_p)
>> dimension arr(N:*)
>>   Since the debugger is definitely going to need to know N
>>   to produce useful results, go ahead and output the lower
>> - bound solo, and hope the debugger can cope.  */
>> + bound solo, and hope the debugger can cope.
>> +
>> + For C and C++, if upper is NULL, this may be a zero-length array
>> + or a flexible array; we'd like to be able to distinguish between
>> + the two.  Set DW_AT_count to 0 for the former.  TYPE_SIZE is NULL
>> + for the latter.  */
>> 

I found inserting this comment here confusing, I've moved it down and
shortened it.

>>if (!get_AT (subrange_die, DW_AT_lower_bound))
>>  add_bound_info (subrange_die, DW_AT_lower_bound, lower, NULL);
>> -  if (upper && !get_AT (subrange_die, DW_AT_upper_bound))
>> -add_bound_info (subrange_die, DW_AT_upper_bound, upper, NULL);
>> +  if (!get_AT (subrange_die, DW_AT_upper_bound)
>> +  && !get_AT (subrange_die, DW_AT_count))
>> +{
>> +  if (upper)
>> +add_bound_info (subrange_die, DW_AT_upper_bound, upper, NULL);
>> +  else if ((is_c () || is_cxx ()) && TYPE_SIZE (type))
>> +add_bound_info (subrange_die, DW_AT_count,
>> +build_int_cst (TREE_TYPE (lower), 0), NULL);
>> +}
>>  }
>>  
>>/* Otherwise we have an array type with an unspecified length.  The
> 
> Ping. Does this version look okay for trunk?
> 

Looks ok to me (but I can't approve).

Also, I've added a testcase.

OK for trunk?

Thanks,
- Tom

[debug] DWARF: add DW_AT_count to zero-length arrays

2018-09-04  Omar Sandoval  
	Tom de Vries  

	PR debug/86985
	* dwarf2out.c (is_c): New function.
	(add_subscript_info): Add DW_AT_count of 0 for C zero-length arrays.

	* gcc.dg/guality/zero-length-array.c: New test.

---
 gcc/dwarf2out.c  | 26 ++--
 gcc/testsuite/gcc.dg/guality/zero-length-array.c | 21 +++
 2 files changed, 45 

Re: [ARM/FDPIC v2 04/21] [ARM] FDPIC: Add support for FDPIC for arm architecture

2018-09-04 Thread Richard Earnshaw (lists)
On 29/08/18 11:46, Kyrill Tkachov wrote:
> Hi Christophe,
> 
> On 13/07/18 17:10, christophe.l...@st.com wrote:
>> From: Christophe Lyon 
>>
>> The FDPIC register is hard-coded to r9, as defined in the ABI.
>>
>> We have to disable tailcall optimizations if we don't know if the
>> target function is in the same module. If not, we have to set r9 to
>> the value associated with the target module.
>>
>> When generating a symbol address, we have to take into account whether
>> it is a pointer to data or to a function, because different
>> relocations are needed.
>>
>> 2018-XX-XX  Christophe Lyon  
>>     Mickaël Guêné 
>>
>>     * config/arm/arm-c.c (__FDPIC__): Define new pre-processor macro
>>     in FDPIC mode.
>>     * config/arm/arm-protos.h (arm_load_function_descriptor): Declare
>>     new function.
>>     * config/arm/arm.c (arm_option_override): Define pic register to
>>     FDPIC_REGNUM.
>>     (arm_function_ok_for_sibcall) Disable sibcall optimization if we
>>     have no decl or go through PLT.
>>     (arm_load_pic_register): Handle TARGET_FDPIC.
>>     (arm_is_segment_info_known): New function.
>>     (arm_pic_static_addr): Add support for FDPIC.
>>     (arm_load_function_descriptor): New function.
>>     (arm_assemble_integer): Add support for FDPIC.
>>     * config/arm/arm.h (PIC_OFFSET_TABLE_REG_CALL_CLOBBERED):
>>     Define. (FDPIC_REGNUM): New define.
>>     * config/arm/arm.md (call): Add support for FDPIC.
>>     (call_value): Likewise.
>>     (*restore_pic_register_after_call): New pattern.
>>     (untyped_call): Disable if FDPIC.
>>     (untyped_return): Likewise.
>>     * config/arm/unspecs.md (UNSPEC_PIC_RESTORE): New.
>>
> 
> In general, you can use SYMBOL_REF_P to check RTXes for SYMBOL_REF code.
> 
>> Change-Id: Icee8484772f97ac6f3a9574df4aa4f25a8196786
>>
>> diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
>> index 4471f79..90733cc 100644
>> --- a/gcc/config/arm/arm-c.c
>> +++ b/gcc/config/arm/arm-c.c
>> @@ -202,6 +202,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)
>>    builtin_define ("__ARM_EABI__");
>>  }
>>
>> +  def_or_undef_macro (pfile, "__FDPIC__", TARGET_FDPIC);
>> +
>>    def_or_undef_macro (pfile, "__ARM_ARCH_EXT_IDIV__", TARGET_IDIV);
>>    def_or_undef_macro (pfile, "__ARM_FEATURE_IDIV", TARGET_IDIV);
>>
>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>> index 8537262..edebeb7 100644
>> --- a/gcc/config/arm/arm-protos.h
>> +++ b/gcc/config/arm/arm-protos.h
>> @@ -134,6 +134,7 @@ extern int arm_max_const_double_inline_cost (void);
>>  extern int arm_const_double_inline_cost (rtx);
>>  extern bool arm_const_double_by_parts (rtx);
>>  extern bool arm_const_double_by_immediates (rtx);
>> +extern rtx arm_load_function_descriptor (rtx funcdesc);
>>  extern void arm_emit_call_insn (rtx, rtx, bool);
>>  bool detect_cmse_nonsecure_call (tree);
>>  extern const char *output_call (rtx *);
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index c70be36..44c3b08 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -3466,6 +3466,14 @@ arm_option_override (void)
>>    if (flag_pic && TARGET_VXWORKS_RTP)
>>  arm_pic_register = 9;
>>
>> +  /* If in FDPIC mode then force arm_pic_register to be r9.  */
>> +  if (TARGET_FDPIC)
>> +    {
>> +  arm_pic_register = FDPIC_REGNUM;
>> +  if (TARGET_ARM_ARCH < 7)
>> +   error ("FDPIC mode is not supported on architectures older
>> than 7");
> 
> Armv7 rather than 7 please.
> 

What about R and M profiles, especially armv8-m.base?

R.

>> +    }
>> +
>>    if (arm_pic_register_string != NULL)
>>  {
>>    int pic_register = decode_reg_name (arm_pic_register_string);
>> @@ -7247,6 +7255,21 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
>>    if (cfun->machine->sibcall_blocked)
>>  return false;
>>
>> +  if (TARGET_FDPIC)
>> +    {
>> +  /* In FDPIC, never tailcall something for which we have no decl:
>> +    the target function could be in a different module, requiring
>> +    a different FDPIC register value.  */
>> +  if (decl == NULL)
>> +   return false;
>> +
>> +  /* Don't tailcall if we go through the PLT since the FDPIC
>> +    register is then corrupted and we don't restore it after
>> +    static function calls.  */
>> +  if (!targetm.binds_local_p (decl))
>> +   return false;
>> +    }
>> +
>>    /* Never tailcall something if we are generating code for Thumb-1.  */
>>    if (TARGET_THUMB1)
>>  return false;
>> @@ -7625,7 +7648,9 @@ arm_load_pic_register (unsigned long saved_regs
>> ATTRIBUTE_UNUSED)
>>  {
>>    rtx l1, labelno, pic_tmp, pic_rtx, pic_reg;
>>
>> -  if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)
>> +  if (crtl->uses_pic_offset_table == 0
>> +  || TARGET_SINGLE_PIC_BASE
>> +  || TARGET_FDPIC)
>>  return;
>>
>>    gcc_assert (flag_pic);
>> @@ -7693,28 

Re: [PATCH v3] Change default to -fno-math-errno

2018-09-04 Thread Wilco Dijkstra


ping




From: Wilco Dijkstra
Sent: 18 June 2018 15:01
To: GCC Patches
Cc: nd; Joseph Myers
Subject: [PATCH v3] Change default to -fno-math-errno
  

GCC currently defaults to -fmath-errno.  This generates code assuming math
functions set errno and the application checks errno.  Few applications
test errno, and various systems and math libraries no longer set errno since
it is optional.  GCC generates much faster code for simple math functions with
-fno-math-errno such as sqrt and lround (avoiding a call and PLT redirection).
Therefore it is reasonable to change the default to -fno-math-errno.  This is
already the case for non-C languages.  Only change the default for C99 and
later.

long f(float x) { return lroundf(x) + 1; }

by default:

f:
    str x30, [sp, -16]!
    bl  lroundf
    add x0, x0, 1
    ldr x30, [sp], 16
    ret

With -fno-math-errno:

f:
    fcvtas  x0, s0
    add x0, x0, 1
    ret

Passes regress on AArch64. OK for commit?

ChangeLog:
2018-06-18  Wilco Dijkstra  

    * common.opt (fmath-errno): Change default to 0.
    * opts.c (set_fast_math_flags): Force -fno-math-errno with -ffast-math.
    * c-family/c-opts.c (c_common_init_options_struct): Set flag_errno_math
    to special value.
    (c_common_post_options): Set flag_errno_math default based on language.

doc/
    * invoke.texi (-fmath-errno) Update documentation.

testsuite/

    * gcc.dg/errno-1.c: Add -fmath-errno.
    * gcc.dg/torture/pr68264.c: Likewise.
    * gcc.dg/tree-ssa/ssa-dse-15.c: Likewise.
    * gcc.target/aarch64/no-inline-lrint_1.c: Likewise.
--

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 
bbcb1bb1a9c606a262a1aa362471388987629d06..b13fc122f8baf6aae0b47010be92ffd961b0147a
 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -208,6 +208,9 @@ c_common_init_options_struct (struct gcc_options *opts)
 
   /* By default, C99-like requirements for complex multiply and divide.  */
   opts->x_flag_complex_method = 2;
+
+  /* Use a special value so the default can be set after option parsing.  */
+  opts->x_flag_errno_math = 2;
 }
 
 /* Common initialization before calling option handlers.  */
@@ -831,6 +834,11 @@ c_common_post_options (const char **pfilename)
   else if (!flag_gnu89_inline && !flag_isoc99)
 error ("-fno-gnu89-inline is only supported in GNU99 or C99 mode");
 
+  /* If -fmath-errno isn't set (or implied by another math option),
+ set the default to -fno-math-errno for C99 and later.  */
+  if (flag_errno_math == 2)
+    flag_errno_math = !flag_isoc99 || c_dialect_objc ();
+
   /* Default to ObjC sjlj exception handling if NeXT runtime.  */
   if (flag_objc_sjlj_exceptions < 0)
 flag_objc_sjlj_exceptions = flag_next_runtime;
diff --git a/gcc/common.opt b/gcc/common.opt
index 
d54e8e5601698c73d2c2bc8f354b55f3889a9186..e6fe11d1dcdbb39016242299dd3d76233e2bd976
 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1855,7 +1855,7 @@ Common Report Var(flag_lto_report_wpa) Init(0)
 Report various link-time optimization statistics for WPA only.
 
 fmath-errno
-Common Report Var(flag_errno_math) Init(1) Optimization SetByCombined
+Common Report Var(flag_errno_math) Init(0) Optimization SetByCombined
 Set errno after built-in math functions.
 
 fmax-errors=
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
42832c5a50013d776dff47c6d347f5bc4dbef2d8..aeda219ba0dbe169d9fb2b4358950f2ba32b5c78
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9783,24 +9783,16 @@ that depend on an exact implementation of IEEE or ISO 
rules/specifications
 for math functions. It may, however, yield faster code for programs
 that do not require the guarantees of these specifications.
 
-@item -fno-math-errno
-@opindex fno-math-errno
-Do not set @code{errno} after calling math functions that are executed
-with a single instruction, e.g., @code{sqrt}.  A program that relies on
-IEEE exceptions for math error handling may want to use this flag
-for speed while maintaining IEEE arithmetic compatibility.
+@item -fmath-errno
+@opindex fmath-errno
+Generate code that assumes math functions may set errno.  This disables
+inlining of simple math functions like @code{sqrt} and @code{lround}.
 
-This option is not turned on by any @option{-O} option since
-it can result in incorrect output for programs that depend on
-an exact implementation of IEEE or ISO rules/specifications for
-math functions. It may, however, yield faster code for programs
-that do not require the guarantees of these specifications.
-
-The default is @option{-fmath-errno}.
+A program which relies on math functions setting errno may need to
+use this flag.  However note various systems and math libraries never
+set errno.
 
-On Darwin systems, the math library never sets @code{errno}.  There is
-therefore no reason for the compiler to consider the possibility that
-it might, and @option{-fno-math-errno} is the default.

[PATCH] DWARF: Allow hard frame pointer even if frame pointer isn't used

2018-09-04 Thread H.J. Lu
On Mon, Sep 3, 2018 at 10:01 AM, H.J. Lu  wrote:
> On Mon, Sep 3, 2018 at 9:44 AM, Michael Matz  wrote:
>> Hi,
>>
>> On Mon, 3 Sep 2018, H.J. Lu wrote:
>>
>>> > So, what's the testcase testing then?  Before the patch it doesn't ICE,
>>> > after the patch it doesn't ICE.  What should I look out for so I can see
>>> > that what the testcase is producing without the patch is wrong?
>>>
>>> Before the patch, debug info is wrong since it uses hard frame pointer
>>> which isn't set up for the function.  You can do "readelf -w" on .o file to
>>> verify the debug info.
>>
>> Yeah, that's what I thought as well, but it's correct:
>>
>> % ./gcc/cc1plus -quiet -O2 -g -fno-omit-frame-pointer -fvar-tracking x.cc
>> % gcc -c x.s
>> % readelf -wfi x.o
>> ...
>>  <1><8a>: Abbrev Number: 9 (DW_TAG_subprogram)
>> <8b>   DW_AT_specification: <0x3a>
>> <8f>   DW_AT_decl_line   : 6
>> <90>   DW_AT_decl_column : 5
>> <91>   DW_AT_object_pointer: <0xa7>
>> <95>   DW_AT_low_pc  : 0x0
>> <9d>   DW_AT_high_pc : 0x3
>>DW_AT_frame_base  : 1 byte block: 9c 
>> (DW_OP_call_frame_cfa)
>>DW_AT_GNU_all_call_sites: 1
>> ...
>>  <2>: Abbrev Number: 11 (DW_TAG_formal_parameter)
>>DW_AT_name: d
>> <101>   DW_AT_decl_file   : 1
>> <102>   DW_AT_decl_line   : 6
>> <103>   DW_AT_decl_column : 63
>> <104>   DW_AT_type: <0x78>
>> <108>   DW_AT_location: 2 byte block: 91 8  (DW_OP_fbreg: 8)
>> ...
>>   DW_CFA_def_cfa: r7 (rsp) ofs 8
>>   DW_CFA_offset: r16 (rip) at cfa-8
>>   DW_CFA_nop
>>   DW_CFA_nop
>> ...
>>
>> So, argument 'd' is supposed to be at DW_AT_frame_base + 8, which is
>> %rsp+8+8, aka %rsp+16, which is correct given that it's the eighth argument
>> (including the implicit this parameter).
>
> Can we use DW_AT_frame_base when the frame pointer isn't available?
> If yes,
>
>  gcc_assert ((SUPPORTS_STACK_ALIGNMENT
>&& (elim == hard_frame_pointer_rtx
>|| elim == stack_pointer_rtx))
>   || elim == (frame_pointer_needed
>   ? hard_frame_pointer_rtx
>   : stack_pointer_rtx));
>
> should be changed to
>
>   gcc_assert (elim == hard_frame_pointer_rtx
>   || elim == stack_pointer_rtx);
>
> This will also fix:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86593
>

Since hard frame pointer is encoded with DW_OP_fbreg which uses the
DW_AT_frame_base attribute, not hard frame pointer directly, we should
allow hard frame pointer when generating DWARF info even if frame pointer
isn't used.

OK for trunk?

-- 
H.J.
From 6c16105e88c2635bd58fc904e7d28901a7f198f5 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 4 Sep 2018 08:01:33 -0700
Subject: [PATCH] DWARF: Allow hard frame pointer even if frame pointer isn't
 used

r251028

commit cd557ff63f388ad27c376d0a225e74d3594a6f9d
Author: hjl 
Date:   Thu Aug 10 15:29:05 2017 +

i386: Don't use frame pointer without stack access

When there is no stack access, there is no need to use frame pointer
even if -fno-omit-frame-pointer is used and caller's frame pointer is
unchanged.

frame pointer may not be available even if -fno-omit-frame-pointer is
used.  When this happened, arg pointer may be eliminated by hard frame
pointer.  Since hard frame pointer is encoded with DW_OP_fbreg which
uses the DW_AT_frame_base attribute, not hard frame pointer directly,
we should allow hard frame pointer when generating DWARF info even if
frame pointer isn't used.

gcc/

	PR debug/86593
	* dwarf2out.c (based_loc_descr): Allow hard frame pointer even
	if frame pointer isn't used.
	(compute_frame_pointer_to_fb_displacement): Likewise.

gcc/testsuite/

	PR debug/86593
	* g++.dg/pr86593.C: New test.
---
 gcc/dwarf2out.c| 25 -
 gcc/testsuite/g++.dg/pr86593.C | 11 +++
 2 files changed, 23 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr86593.C

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 77317ed2575..40cfdf56337 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -14325,13 +14325,13 @@ based_loc_descr (rtx reg, poly_int64 offset,
 
   if (elim != reg)
 	{
+	  /* Allow hard frame pointer here even if frame pointer
+	isn't used since hard frame pointer is encoded with
+	DW_OP_fbreg which uses the DW_AT_frame_base attribute,
+	not hard frame pointer directly.  */
 	  elim = strip_offset_and_add (elim, );
-	  gcc_assert ((SUPPORTS_STACK_ALIGNMENT
-		   && (elim == hard_frame_pointer_rtx
-			   || elim == stack_pointer_rtx))
-	  || elim == (frame_pointer_needed
-  ? hard_frame_pointer_rtx
-  : stack_pointer_rtx));
+	  gcc_assert (elim == hard_frame_pointer_rtx
+		  || elim == stack_pointer_rtx);
 
 	  /* If drap register is used to align stack, use frame
 	 pointer + offset to access stack variables.  If 

Re: [GCC][PATCH v2][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks

2018-09-04 Thread Sam Tebbs

Hi James,

Thanks for the feedback. Here is an update with the changes you proposed 
and an updated changelog.


gcc/
2018-09-04  Sam Tebbs  

PR target/85628
* config/aarch64/aarch64.md (*aarch64_bfxil):
Define.
* config/aarch64/constraints.md (Ulc): Define
* config/aarch64/aarch64-protos.h (aarch64_high_bits_all_ones_p):
Define.
* config/aarch64/aarch64.c (aarch64_high_bits_all_ones_p): New function.

gcc/testsuite
2018-09-04  Sam Tebbs  

PR target/85628
* gcc.target/aarch64/combine_bfxil.c: New file.
* gcc.target/aarch64/combine_bfxil_2.c: New file.


On 08/28/2018 11:53 PM, James Greenhalgh wrote:

Hm, I'm not very sure about the naming here; "left consecutive" isn't a
common phrase to denote the mask you're looking for (exact_log2 (-i) != -1
if I'm reading right), and is misleading 0x is 'left consecutive'
too, just with zeroes rather than ones.
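
(As a concrete check of the two predicates, an illustration not from the
original review: with a 64-bit HOST_WIDE_INT, both accept
i = 0xffffffffffff0000, since i | (i - 1) == -1 and -i == 0x10000, so
exact_log2 (-i) == 16; both reject i = 0x00000000ffff0000, since
i | (i - 1) == 0x00000000ffffffff and -i is not a power of two.)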


      * config/aarch64/aarch64.c (aarch64_is_left_consecutive):
  New function.

gcc/testsuite
2018-08-01  Sam Tebbs

      PR target/85628
      * gcc.target/aarch64/combine_bfxil.c: New file.
      * gcc.target/aarch64/combine_bfxil_2.c: New file.


diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
fa01475aa9ee579b6a3b2526295b622157120660..3cfa51b15af3e241672f1383cf881c12a44494a5
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1454,6 +1454,14 @@ aarch64_hard_regno_caller_save_mode (unsigned regno, 
unsigned,
  return SImode;
  }
  
+/* Implement IS_LEFT_CONSECUTIVE.  Check if I's bits are consecutive

What is IS_LEFT_CONSECUTIVE - I don't see it elsewhere in the GCC code, so
what does the comment refer to implementing?


+   ones from the MSB.  */
+bool
+aarch64_is_left_consecutive (HOST_WIDE_INT i)
+{
+  return (i | (i - 1)) == HOST_WIDE_INT_M1;

exact_log2(-i) != HOST_WIDE_INT_M1?


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index ef95fc829b83886e2ff00e4664e31af916e99b0c..b43e70285b3e6c45b830ce3790c38307b2294f81 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -575,4 +575,6 @@ rtl_opt_pass *make_pass_track_speculation (gcc::context *);
 
 poly_uint64 aarch64_regmode_natural_size (machine_mode);
 
+bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1de76e075471acaa68584595023e3878b10538e2..352515e9fc92c7b14a631c76bbe3d25030174bcf 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1474,6 +1474,13 @@ aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned,
 return SImode;
 }
 
+/* Return true if I's bits are consecutive ones from the MSB.  */
+bool
+aarch64_high_bits_all_ones_p (HOST_WIDE_INT i)
+{
+  return exact_log2(-i) != HOST_WIDE_INT_M1;
+}
+
 /* Implement TARGET_CONSTANT_ALIGNMENT.  Make strings word-aligned so
that strcpy from constants will be faster.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 955769a64d2030839cdb337321a808626188190e..88f66104db31320389f05cdd5d161db9992a77b8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5336,6 +5336,31 @@
   [(set_attr "type" "rev")]
 )
 
+(define_insn "*aarch64_bfxil"
+  [(set (match_operand:GPI 0 "register_operand" "=r,r")
+(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "r,0")
+		(match_operand:GPI 3 "const_int_operand" "n, Ulc"))
+	(and:GPI (match_operand:GPI 2 "register_operand" "0,r")
+		(match_operand:GPI 4 "const_int_operand" "Ulc, n"]
+  "(INTVAL (operands[3]) == ~INTVAL (operands[4]))
+  && (aarch64_high_bits_all_ones_p (INTVAL (operands[3]))
+|| aarch64_high_bits_all_ones_p (INTVAL (operands[4])))"
+  {
+switch (which_alternative)
+{
+  case 0:
+	operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[3])));
+	return "bfxil\\t%0, %1, 0, %3";
+  case 1:
+	operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[4])));
+	return "bfxil\\t%0, %2, 0, %3";
+  default:
+	gcc_unreachable ();
+}
+  }
+  [(set_attr "type" "bfm")]
+)
+
 ;; There are no canonicalisation rules for the position of the lshiftrt, ashift
 ;; operations within an IOR/AND RTX, therefore we have two patterns matching
 ;; each valid permutation.
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 72cacdabdac52dcb40b480f7a5bfbf4997c742d8..31fc3eafd8bba03cc773e226223a6293c6dde8d4 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -172,6 +172,13 @@
   A constraint that matches the immediate constant -1."
   (match_test "op == constm1_rtx"))
 
+(define_constraint "Ulc"
+ "@internal
+ A constraint that matches a constant integer whose bits are consecutive ones
+ from the MSB."
+ (and (match_code "const_int")
+  

Re: [RFH] split {generic,gimple}-match.c files

2018-09-04 Thread Martin Liška
On 09/03/2018 04:43 PM, Richard Biener wrote:
> On Mon, 3 Sep 2018, Martin Liška wrote:
> 
>> On 09/03/2018 04:00 PM, Richard Biener wrote:
>>> On Mon, 3 Sep 2018, Martin Liška wrote:
>>>
 On 09/03/2018 02:54 PM, Martin Liška wrote:
> On 09/03/2018 02:41 PM, Richard Biener wrote:
>> On Mon, 3 Sep 2018, Martin Liška wrote:
>>
>>> On 04/25/2018 01:42 PM, Richard Biener wrote:

 The following patch^Whack splits $subject files into three, one
 for the predicates (due to an implementation detail) and two for
 the rest - for now into similar LOC size files.

 I'd like to get help on the makefile changes to make them less
 verbose, somehow globbing the -[12p] parts.

 Also you can see the split point is manually chosen which means
 it will bitrot.  Timings for the stage2 compiles on a x86_64
 box are

 gimple-match-p.c   5s
 generic-match-p.c  3s
 gimple-match-1.c  85s
 generic-match-1.c 56s
 gimple-match-2.c  82s
 generic-match-2.c 31s

 the required header files are quite big (and of course everything
 needs to be exported without the analysis work becoming too 
 cumbersome),
 it's 3342 LOC for gimple-match-head.h and 1556 LOC for 
 generic-match-head.h

 The machine I tested is quite fast so the 80ish second timings are 
 still
 too slow I guess and thus splitting up into four files for gimple and
 three files for generic looks better.

 Note we lose some inlining/cloning capability in the splitting process
 (I see quite a bit of constprop/isra work being done on the generated 
 files).  I didn't try to measure the runtime impact though.

 The patch still needs quite some TLC, it really is a bit hacky but I'd
 like to get feedback on the approach and I didn't want to spend time
 on programatically finding optimal split points (so everything is 
 output
 in the same semi-random order as before).

 Richard.
> ...
>>> I took a look at gimple-match.c and what about doing a split in 
>>> following way:
>>> - all gimple_simplify_$number going into a separate header file (~12000 
>>> LOC)
>>> - all the function can be marked as static inline
>>> - all other gimple_simplify_$code can be split into arbitrary number of 
>>> parts
>>> - we have 287 such functions where each function only calls 
>>> gimple_simplify_$number and
>>>   on average there 10 of such calls
>>> - that would allow to remove most of gimple_simplify_$number functions 
>>> from the header file
>>>
>>> Richi do you think it will be viable?
>>
>> That relies on the cgraph code DCEing all unused gimple_simplify_$number
>> functions from the header fast as they are now effectively duplicated
>> into all parts, correct?  Also I'm not sure if we actually want to inline
>> them...  they are split out to get both code size and compile-time
>> under control.  Unfortunately we have still high max-inline-insns-single
>> which is used for inline marked functions.
>>
>> Eventually doing a "proper" partitioning algorithm is viable, that is,
>> partition based on gimple_simplify_$code and put gimple_simplify_$number
>> where they are used.  If they are used across different codes then
>> merge those partitions.  I guess you'll see that that'll merge the 
>> biggest _$code parititions :/ (MINUS_EXPR, PLUS_EXPR).
>
> Yes, that should be much better. I'm attaching a 'callgraph' that was 
> done by grepping.
> Function starting at the beginning of a line is function definition, with 
> an indentation
> one can see calls.
>
> Yes, PLUS and MINUS call ~20 gimple_simplify_$number calls.
>
> Well, generating some simple call graph format for the source file and a 
> source file
> annotation of nodes can be input for a partitioning tool that can do the 
> split.
>
> Issue with the generated files is that one needs to fix the most 
> offenders (*-match.c, insn-recog.c, insn-emit.c, ..)
> in order to see some improvement.
>
> Looking at insn-recog.c, maybe similar callgraph-based split can be done 
> for recog_$number functions?
>
> Martin
>
>>
>> Richard.
>>
>

 I'm sending SCC components for gimple-match.c. So there are 3 quite big 
 one and rest is small. It's questionable
 whether partitioning based on that will provide desired speed up?
>>>
>>> When I experimented split based on # of functions wasn't working well,
>>> only split based on # of lines did.  I'd still expect that eventually
>>> basing the split on the SCC components makes sense if you use say,
>>> the biggest 4 (but measure size in # 

[testsuite, i386] Don't xfail gcc.target/i386/addr-sel-1.c (PR target/86744)

2018-09-04 Thread Rainer Orth
gcc.target/i386/addr-sel-1.c had started to XPASS around 20180730.  As
suggested by Uros in the PR, I've just removed the xfail.  Tested on
i386-pc-solaris2.11 and x86_64-pc-linux-gnu (both 32 and 64-bit),
installed on mainline.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2018-09-03  Rainer Orth  

PR target/86744
* gcc.target/i386/addr-sel-1.c: Don't xfail "b\\+1" scan.

# HG changeset patch
# Parent  c8d5d2ef046fbb4630756bcb087cb4ef83625e91
Don't xfail gcc.target/i386/addr-sel-1.c (PR target/86744)

diff --git a/gcc/testsuite/gcc.target/i386/addr-sel-1.c b/gcc/testsuite/gcc.target/i386/addr-sel-1.c
--- a/gcc/testsuite/gcc.target/i386/addr-sel-1.c
+++ b/gcc/testsuite/gcc.target/i386/addr-sel-1.c
@@ -14,4 +14,4 @@ int f(int i)
 }
 
 /* { dg-final { scan-assembler "a\\+1" } } */
-/* { dg-final { scan-assembler "b\\+1" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler "b\\+1" } } */


Re: [PATCH] Merge Ignore and Deprecated in .opt files.

2018-09-04 Thread Segher Boessenkool
On Tue, Aug 21, 2018 at 03:14:29PM +0200, Martin Liška wrote:
> On 08/18/2018 12:24 AM, Segher Boessenkool wrote:
> > Removing Init is *wrong* as far as I see; it changes things, anyway.
> > Could you not have done this as a separate patch?
> 
> It's already in, but it should be fine. Note that I added check into
> opt-functions.awk where I can't use Var with Deprecated or Ignore attribute.
> Thus removal of Init should be fine. Or am I wrong?

rs6000 had

Target Undocumented Mask(DIRECT_MOVE) Var(rs6000_isa_flags) Ignore Warn(%qs is 
deprecated)

and you removed the Ignore.  Putting it back complains about Ignore+Warn
(which is explicitly allowed by the documentation!), and about Ignore+Var
(which we need here, and is no problem ever as far as I see).

Just putting Ignore back and disabling those two warnings seems to work
fine.  But I don't know what your patch really tried to do, so could you
look at this please?

[ This is PR87164, which you already assigned to yourself, thanks :-) ]


Segher


Re: VRP: abstract out wide int CONVERT_EXPR_P code

2018-09-04 Thread Aldy Hernandez




On 09/04/2018 10:20 AM, Richard Biener wrote:

On Tue, Sep 4, 2018 at 4:12 PM Aldy Hernandez  wrote:



I honestly don't remember exactly but I suppose precision mismatched somehow.

Your code was OK with not using widest_ints.


Ah, this got lost in the noise.  Sorry.

Will commit the original patch.


Yes, but for truncation of ~[0, 5] from say int to short it's not correct
to make the result ~[0, 5], is it?  At least the original code dropped
that to VARYING.  For the same reason truncating [3, 765] from
short to unsigned char isn't [3, 253].  But maybe I'm missing something.


Correct, but in that case we will realize that in wide_int_range_convert
and refuse to do the conversion:

/* If the conversion is not truncating we can convert the min and
   max values and canonicalize the resulting range.  Otherwise we
   can do the conversion if the size of the range is less than what
   the precision of the target type can represent.  */
if (outer_prec >= inner_prec
|| wi::rshift (wi::sub (vr0_max, vr0_min),
  wi::uhwi (outer_prec, inner_prec),
  inner_sign) == 0)


so you say the check is also sufficient for all anti-ranges?  I see the
current VRP code does canonicalization to ranges before running into
the CONVERT_EXPR_CODE case.


Yes.  exactract_range_into_wide_ints() will do the right thing.

Thanks again.
Aldy


Re: [PATCH] Optimise sqrt reciprocal multiplications

2018-09-04 Thread Richard Biener
On Tue, 4 Sep 2018, Kyrill Tkachov wrote:

> Hi Richard,
> 
> On 31/08/18 12:07, Richard Biener wrote:
> > On Thu, 30 Aug 2018, Kyrill Tkachov wrote:
> >
> >> Ping.
> >>
> >> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01496.html
> >>
> >> Thanks,
> >> Kyrill
> >>
> >> On 23/08/18 18:09, Kyrill Tkachov wrote:
> >>> Hi Richard,
> >>>
> >>> On 23/08/18 11:13, Richard Sandiford wrote:
>  Kyrill  Tkachov  writes:
> > Hi all,
> >
> > This patch aims to optimise sequences involving uses of 1.0 / sqrt (a)
> > under -freciprocal-math and -funsafe-math-optimizations.
> > In particular consider:
> >
> > x = 1.0 / sqrt (a);
> > r1 = x * x;  // same as 1.0 / a
> > r2 = a * x; // same as sqrt (a)
> >
> > If x, r1 and r2 are all used further on in the code, this can be
> > transformed into:
> > tmp1 = 1.0 / a
> > tmp2 = sqrt (a)
> > tmp3 = tmp1 * tmp2
> > x = tmp3
> > r1 = tmp1
> > r2 = tmp2
>  Nice optimisation :-)  Someone who knows the pass better should review,
>  but:
> >>>
> >>> Thanks for the review.
> >>>
>  There seems to be an implicit assumption that this is a win even
>  when the r1 and r2 assignments are only conditionally executed.
>  That's probably true, but it might be worth saying explicitly.
> >>>
> >>> I'll admit I had not considered that case.
> >>> I think it won't make a difference in practice, as the really expensive
> >>> operations here
> >>> are the sqrt and the division and they are on the executed path in either
> >>> case and them
> >>> becoming independent should be a benefit of its own.
> >>>
> > +/* Return TRUE if USE_STMT is a multiplication of DEF by A.  */
> > +
> > +static inline bool
> > +is_mult_by (gimple *use_stmt, tree def, tree a)
> > +{
> > +  if (gimple_code (use_stmt) == GIMPLE_ASSIGN
> > +  && gimple_assign_rhs_code (use_stmt) == MULT_EXPR)
> > +{
> > +  tree op0 = gimple_assign_rhs1 (use_stmt);
> > +  tree op1 = gimple_assign_rhs2 (use_stmt);
> > +
> > +  return (op0 == def && op1 == a)
> > +  || (op0 == a && op1 == def);
> > +}
> > +  return 0;
> > +}
>  Seems like is_square_of could now be a light-weight wrapper around this.
> >>>
> >>> Indeed, I've done the wrapping now.
> >>>
> > @@ -652,6 +669,180 @@ execute_cse_reciprocals_1 (gimple_stmt_iterator
> > *def_gsi, tree def)
> > occ_head = NULL;
> >   }
> >   +/* Transform sequences like
> > +   x = 1.0 / sqrt (a);
> > +   r1 = x * x;
> > +   r2 = a * x;
> > +   into:
> > +   tmp1 = 1.0 / a;
> > +   tmp2 = sqrt (a);
> > +   tmp3 = tmp1 * tmp2;
> > +   x = tmp3;
> > +   r1 = tmp1;
> > +   r2 = tmp2;
> > +   depending on the uses of x, r1, r2.  This removes one multiplication
> > and
> > +   allows the sqrt and division operations to execute in parallel.
> > +   DEF_GSI is the gsi of the initial division by sqrt that defines
> > +   DEF (x in the example abovs).  */
> > +
> > +static void
> > +optimize_recip_sqrt (gimple_stmt_iterator *def_gsi, tree def)
> > +{
> > +  use_operand_p use_p;
> > +  imm_use_iterator use_iter;
> > +  gimple *stmt = gsi_stmt (*def_gsi);
> > +  tree x = def;
> > +  tree orig_sqrt_ssa_name = gimple_assign_rhs2 (stmt);
> > +  tree div_rhs1 = gimple_assign_rhs1 (stmt);
> > +
> > +  if (TREE_CODE (orig_sqrt_ssa_name) != SSA_NAME
> > +  || TREE_CODE (div_rhs1) != REAL_CST
> > +  || !real_equal (&TREE_REAL_CST (div_rhs1), &dconst1))
> > +return;
> > +
> > +  gimple *sqrt_stmt = SSA_NAME_DEF_STMT (orig_sqrt_ssa_name);
> > +  if (!is_gimple_call (sqrt_stmt)
> > +  || !gimple_call_lhs (sqrt_stmt))
> > +return;
> > +
> > +  gcall *call = as_a <gcall *> (sqrt_stmt);
>  Very minor, but:
> 
> gcall *sqrt_stmt
>   = dyn_cast <gcall *> (SSA_NAME_DEF_STMT (orig_sqrt_ssa_name));
> if (!sqrt_stmt || !gimple_call_lhs (sqrt_stmt))
>   return;
> 
>  would avoid the need for the separate as_a<>, and would mean that
>  we only call gimple_call_* on gcalls.
> >>>
> >>> Ok.
> >>>
> > +  if (has_other_use)
> > +{
> > +  /* Using the two temporaries tmp1, tmp2 from above
> > + the original x is now:
> > + x = tmp1 * tmp2.  */
> > +  gcc_assert (mult_ssa_name);
> > +  gcc_assert (sqr_ssa_name);
> > +  gimple_stmt_iterator gsi2 = gsi_for_stmt (stmt);
> > +
> > +  tree new_ssa_name
> > += make_temp_ssa_name (TREE_TYPE (a), NULL,
> > "recip_sqrt_transformed");
> > +  gimple *new_stmt
> > += gimple_build_assign (new_ssa_name, MULT_EXPR,
> > +   mult_ssa_name, sqr_ssa_name);
> > +  gsi_insert_before (&gsi2, new_stmt, GSI_SAME_STMT);
> > +  gcc_assert (gsi_stmt (gsi2) == stmt);
> > +  

Re: [PATCH] Use complete_array_type on flexible array member initializers

2018-09-04 Thread Jeff Law
On 09/03/2018 06:35 AM, Bernd Edlinger wrote:
[ Big snip, dropping lots of context ]


>>>
>>> No I don't think so, because at that time BRACE_ENCLOSED_INITIALIZER_P
>>> property does no longer work, as I explained in the previous mail.
>>>
>>> And cp_complete_array_type does use that property:
>> [ ... ]
>> Jason's the expert here.I'll defer to his expertise.  It just seemed
>> a bit odd to me that we have a routine to "complete" an array that does
>> any special C++ handling (cp_complete_array_type) and we're not using it.
>>
> 
> Agreed, I have posted an update which uses cp_complete_array_type, since
> Jason raised the same point.
> 
> But since C++ does not need the recursion here, things are a lot easier.
> The patch still completes the array type _after_ the FE is completely done
> with the parsing, since I want to avoid destroying the information too early,
> as the FE is looking at TYPE_DOMAIN == NULL in various places.
FYI, I don't see a follow-up for this patch?  Did I miss it?  Or is it
under a different subject line?

jeff


[PATCH] Fix up -mno-xsave handling (PR target/87198)

2018-09-04 Thread Jakub Jelinek
Hi!

The -mxsave{opt,s,c} options turn on automatically -mxsave option and
the patterns rely on TARGET_XSAVE{OPT,S,C} implying TARGET_XSAVE,
but if somebody uses e.g. -mxsave{opt,s,c} -mno-xsave (or something that implies
-mno-xsave), TARGET_XSAVE{OPT,S,C} remains set and TARGET_XSAVE is clear.
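
Roughly (this is an illustration, not the literal option handling code),
what happened for -mxsavec -mno-xsave was:

  isa_flags |= OPTION_MASK_ISA_XSAVEC_SET;    /* -mxsavec: sets XSAVEC and XSAVE */
  isa_flags &= ~OPTION_MASK_ISA_XSAVE_UNSET;  /* -mno-xsave: previously cleared only
                                                 XSAVE and XSAVEOPT, leaving XSAVEC on */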

Fixed by making sure OPTION_MASK_ISA_XSAVE_UNSET unsets also xsave{s,c}
(xsaveopt was already there).

This patch doesn't try to address PR87171, which I believe needs further
discussions and doesn't touch the same code anyway, so I think it should be
resolved independently.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
release branches?

2018-09-04  Jakub Jelinek  

PR target/87198
* common/config/i386/i386-common.c (OPTION_MASK_ISA_XSAVEOPT_SET,
OPTION_MASK_ISA_XSAVES_SET, OPTION_MASK_ISA_XSAVEC_SET): Use
OPTION_MASK_ISA_XSAVE_SET instead of OPTION_MASK_ISA_XSAVE.
(OPTION_MASK_ISA_XSAVE_UNSET): Add OPTION_MASK_ISA_XSAVES_UNSET
and OPTION_MASK_ISA_XSAVEC_UNSET.

* gcc.target/i386/pr87198.c: New test.

--- gcc/common/config/i386/i386-common.c.jj 2018-07-17 12:48:23.388587985 
+0200
+++ gcc/common/config/i386/i386-common.c2018-09-03 11:45:03.543653449 
+0200
@@ -59,7 +59,7 @@ along with GCC; see the file COPYING3.
 #define OPTION_MASK_ISA_FXSR_SET OPTION_MASK_ISA_FXSR
 #define OPTION_MASK_ISA_XSAVE_SET OPTION_MASK_ISA_XSAVE
 #define OPTION_MASK_ISA_XSAVEOPT_SET \
-  (OPTION_MASK_ISA_XSAVEOPT | OPTION_MASK_ISA_XSAVE)
+  (OPTION_MASK_ISA_XSAVEOPT | OPTION_MASK_ISA_XSAVE_SET)
 #define OPTION_MASK_ISA_AVX512F_SET \
   (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_AVX2_SET)
 #define OPTION_MASK_ISA_AVX512CD_SET \
@@ -95,9 +95,9 @@ along with GCC; see the file COPYING3.
 #define OPTION_MASK_ISA_PREFETCHWT1_SET OPTION_MASK_ISA_PREFETCHWT1
 #define OPTION_MASK_ISA_CLFLUSHOPT_SET OPTION_MASK_ISA_CLFLUSHOPT
 #define OPTION_MASK_ISA_XSAVES_SET \
-  (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_XSAVE)
+  (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_XSAVE_SET)
 #define OPTION_MASK_ISA_XSAVEC_SET \
-  (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_XSAVE)
+  (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_XSAVE_SET)
 #define OPTION_MASK_ISA_CLWB_SET OPTION_MASK_ISA_CLWB
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
@@ -185,7 +185,8 @@ along with GCC; see the file COPYING3.
 #define OPTION_MASK_ISA_FMA_UNSET OPTION_MASK_ISA_FMA
 #define OPTION_MASK_ISA_FXSR_UNSET OPTION_MASK_ISA_FXSR
 #define OPTION_MASK_ISA_XSAVE_UNSET \
-  (OPTION_MASK_ISA_XSAVE | OPTION_MASK_ISA_XSAVEOPT_UNSET)
+  (OPTION_MASK_ISA_XSAVE | OPTION_MASK_ISA_XSAVEOPT_UNSET \
+   | OPTION_MASK_ISA_XSAVES_UNSET | OPTION_MASK_ISA_XSAVEC_UNSET)
 #define OPTION_MASK_ISA_XSAVEOPT_UNSET OPTION_MASK_ISA_XSAVEOPT
 #define OPTION_MASK_ISA_AVX2_UNSET \
   (OPTION_MASK_ISA_AVX2 | OPTION_MASK_ISA_AVX512F_UNSET)
--- gcc/testsuite/gcc.target/i386/pr87198.c.jj  2018-09-03 12:07:48.277760935 
+0200
+++ gcc/testsuite/gcc.target/i386/pr87198.c 2018-09-03 12:07:27.624107043 
+0200
@@ -0,0 +1,13 @@
+/* PR target/87198 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mxsavec -mno-xsave" } */
+
+#include 
+
+void
+test_xsavec (void *__A, long long __B)
+{
+  _xsavec (__A, __B);
+}
+
+/* { dg-error "target specific option mismatch" "" { target *-*-* } 0 } */

Jakub


Re: VRP: abstract out wide int CONVERT_EXPR_P code

2018-09-04 Thread Aldy Hernandez



On 09/04/2018 10:20 AM, Richard Biener wrote:

On Tue, Sep 4, 2018 at 4:12 PM Aldy Hernandez  wrote:




On 09/04/2018 08:49 AM, Richard Biener wrote:

On Tue, Sep 4, 2018 at 2:41 PM Aldy Hernandez  wrote:




On 09/04/2018 07:58 AM, Richard Biener wrote:

On Mon, Sep 3, 2018 at 1:32 PM Aldy Hernandez  wrote:




On 08/28/2018 05:27 AM, Richard Biener wrote:

On Mon, Aug 27, 2018 at 2:24 PM Aldy Hernandez  wrote:


Howdy!

Phew, I think this is the last abstraction.  This handles the unary
CONVERT_EXPR_P code.

It's the usual story-- normalize the symbolics to [-MIN,+MAX] and handle
everything generically.

Normalizing the symbolics brought about some nice surprises.  We now
handle a few things we were punting on before, which I've documented in
the patch, but can remove if so desired.  I wrote them mainly for myself:

/* NOTES: Previously we were returning VARYING for all symbolics, but
we can do better by treating them as [-MIN, +MAX].  For
example, converting [SYM, SYM] from INT to LONG UNSIGNED,
we can return: ~[0x800, 0x7fff].

We were also failing to convert ~[0,0] from char* to unsigned,
instead choosing to return VR_VARYING.  Now we return ~[0,0].  */

Tested on x86-64 by the usual bootstrap and regtest gymnastics,
including --enable-languages=all, because my past sins are still
haunting me.

OK?


The new wide_int_range_convert_tree looks odd given it returns
tree's.  I'd have expected an API that does the conversion resulting
in a wide_int range and the VRP code adapting to that by converting
the result to trees.


Rewritten as suggested.

A few notes.

First.  I am not using widest_int as was done previously.  We were
passing 0/false to the overflow args in force_fit_type so the call was
just degrading into wide_int_to_tree() anyhow.  Am I missing some
subtlety here, and must we use widest_int and then force_fit_type on the
caller?


The issue is that the wide-ints get "converted", so it's indeed subtle.


This seems like it should be documented-- at the very least in
wide_int_range_convert.


I don't think you need it there.


What do you mean "converted"?  I'd like to understand this better before
I write out the comment :).  Do we lose precision by keeping it in a
wide_int as opposed to a widest_int?


As you're using wide_int::from that "changes" precision now.


Ah, so it's wide_int_to_tree that wants to see the widest precision.  I see.


I honestly don't remember exactly but I suppose precision mismatched somehow.

Your code was OK with not using widest_ints.




Also, can we have the caller to wide_int_range_convert() use
wide_int_to_tree directly instead of passing through force_fit_type?


I think so.


It
seems force_fit_type with overflowable=0 and overflowed=false is just a
call to wide_int_to_tree anyhow.



+  || wi::rshift (wi::sub (vr0_max, vr0_min),
+wi::uhwi (outer_prec, inner_prec),
+inner_sign) == 0)

this was previously allowed only for VR_RANGEs - now it's also for
VR_ANTI_RANGE.
That looks incorrect.  Testcase welcome ;)


See change to extract_range_into_wide_ints.  VR_ANTI_RANGEs of constants
will remain as is, whereas VR_ANTI_RANGEs of symbols will degrade into
[-MIN,+MAX] and be handled generically.

Also, see comment:

+  /* We normalize everything to a VR_RANGE, but for constant
+anti-ranges we must handle them by leaving the final result
+as an anti range.  This allows us to convert things like
+~[0,5] seamlessly.  */


Yes, but for truncation of ~[0, 5] from say int to short it's not correct
to make the result ~[0, 5], is it?  At least the original code dropped
that to VARYING.  For the same reason truncating [3, 765] from
short to unsigned char isn't [3, 253].  But maybe I'm missing something.


Correct, but in that case we will realize that in wide_int_range_convert
and refuse to do the conversion:

/* If the conversion is not truncating we can convert the min and
   max values and canonicalize the resulting range.  Otherwise we
   can do the conversion if the size of the range is less than what
   the precision of the target type can represent.  */
if (outer_prec >= inner_prec
|| wi::rshift (wi::sub (vr0_max, vr0_min),
  wi::uhwi (outer_prec, inner_prec),
  inner_sign) == 0)


so you say the check is also sufficient for all anti-ranges?  I see the
current VRP code does canonicalization to ranges before running into
the CONVERT_EXPR_CODE case.





+  return (!wi::eq_p (min, wi::min_value (outer_prec, outer_sign))
+ || !wi::eq_p (max, wi::max_value (outer_prec, outer_sign)));

so you return false for VARYING?  Why?  I'd simply return true and
return VARYING in the new type in the path you return false.


Since symbols and VR_VARYING and other things get turned into a
[-MIN,+MAX] by extract_range_into_wide_ints, it's a lot easier to handle
things 

[PATCH] Improve x % c1 == c2 expansion (PR middle-end/82853)

2018-09-04 Thread Jakub Jelinek
Hi!

Improve expansion of x % c1 == c2 and x % c1 != c2 checks.

As mentioned in Hacker's Delight book, section 10-{16,17}, we can improve
generated code for modulo by constant, if we only use the result to equality
compare against some other constant (0 for all divisors, other constants
only in certain cases at least so far).

Right now for modulo we usually emit a highpart multiplication (to get
quotient) followed by normal multiplication by the quotient and subtraction
(plus the comparison).

As the comment in the code tries to explain, if c1 is odd and it is unsigned
modulo, we can expand it as (x - c2) * c3 <= c4 where c3 is modular
multiplicative inverse of c1 in the corresponding unsigned type and c4 is
either -1U / c1 or one less than that, depending on the c2 value.
If c1 is even, the patch uses r>> ctz (c1), but then supports only
a subset of c2 values (only if the highest unsigned c2 values % c1
don't yield 0).
The patch also supports signed modulo, again both for odd and even c1,
but only for c2 == 0.
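
As a concrete illustration of the simplest unsigned case (odd c1, c2 == 0;
an illustration only, not code from the patch, and it assumes a 32-bit
unsigned int), x % 5 == 0 turns into one multiply plus one unsigned
comparison:

  /* 0xcccccccd is the inverse of 5 modulo 2^32 (5 * 0xcccccccd == 4 * 2^32 + 1)
     and 0x33333333 == ~0U / 5.  */
  int
  mod5_eq_0 (unsigned int x)
  {
    return x * 0xcccccccdU <= 0x33333333U;  /* same as x % 5 == 0 */
  }

The c2 != 0 and even c1 cases described above additionally subtract c2,
adjust c4, or rotate right by ctz (c1).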

The optimization is done during expansion using TER, and the patch computes
cost of emitting normal modulo + cost of the c2 constant vs.
cost of emitting the new optimized code + cost of the c4 constant.

The patch doesn't try to optimize if x is used in division or another modulo
by the same constant in the same basic block, assuming that it is likely
optimized into division + modulo combined operation or at least the high
part multiply reused between the two.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

This optimization during those 2 bootstraps + regtests triggered 1239
times (counted cases where the cost of optimized sequence was smaller
than original), in particular 545x with unsigned modulo with odd c1, 51x with
signed modulo with odd c1, 454x with unsigned modulo with even c1 and
189x with signed modulo with even c1.

In the PR there is somewhat bigger test coverage, but I'm not sure if it
isn't too big to be included in the testsuite.  On a fast box each of the
6 tests takes 2-4 seconds to compile and runtime for each test is 2-18
seconds (if skipping the 128-bit tests 2-6 seconds), the non-32/64-bit tests
are 36-64KiB long, 128-bit tests 104-184KiB (plus it needs random ()).

2018-09-04  Jakub Jelinek  

PR middle-end/82853
* expr.h (maybe_optimize_mod_cmp): Declare.
* expr.c (mod_inv): New function.
(maybe_optimize_mod_cmp): New function.
(do_store_flag): Use it.
* cfgexpand.c (expand_gimple_cond): Likewise.

* gcc.target/i386/pr82853-1.c: New test.
* gcc.target/i386/pr82853-2.c: New test.

--- gcc/expr.h.jj   2018-08-29 23:36:15.806122967 +0200
+++ gcc/expr.h  2018-09-04 09:38:35.215881588 +0200
@@ -290,6 +290,8 @@ expand_normal (tree exp)
a string constant.  */
 extern tree string_constant (tree, tree *, tree *, tree *);
 
+extern enum tree_code maybe_optimize_mod_cmp (enum tree_code, tree *, tree *);
+
 /* Two different ways of generating switch statements.  */
 extern int try_casesi (tree, tree, tree, tree, rtx, rtx, rtx, 
profile_probability);
 extern int try_tablejump (tree, tree, tree, tree, rtx, rtx, 
profile_probability);
--- gcc/expr.c.jj   2018-08-29 23:36:15.806122967 +0200
+++ gcc/expr.c  2018-09-04 12:31:37.538106639 +0200
@@ -11491,6 +11491,241 @@ string_constant (tree arg, tree *ptr_off
   return init;
 }
 
+/* Compute the modular multiplicative inverse of A modulo M
+   using extended Euclid's algorithm.  Assumes A and M are coprime.  */
+static wide_int
+mod_inv (const wide_int , const wide_int )
+{
+  /* Verify the assumption.  */
+  gcc_checking_assert (wi::eq_p (wi::gcd (a, b), 1));
+
+  unsigned int p = a.get_precision () + 1;
+  gcc_checking_assert (b.get_precision () + 1 == p);
+  wide_int c = wide_int::from (a, p, UNSIGNED);
+  wide_int d = wide_int::from (b, p, UNSIGNED);
+  wide_int x0 = wide_int::from (0, p, UNSIGNED);
+  wide_int x1 = wide_int::from (1, p, UNSIGNED);
+
+  if (wi::eq_p (b, 1))
+return wide_int::from (1, p, UNSIGNED);
+
+  while (wi::gt_p (c, 1, UNSIGNED))
+{
+  wide_int t = d;
+  wide_int q = wi::divmod_trunc (c, d, UNSIGNED, &d);
+  c = t;
+  wide_int s = x0;
+  x0 = wi::sub (x1, wi::mul (q, x0));
+  x1 = s;
+}
+  if (wi::lt_p (x1, 0, SIGNED))
+x1 += d;
+  return x1;
+}
+
+/* Attempt to optimize unsigned (X % C1) == C2 (or (X % C1) != C2).
+   If C1 is odd to:
+   (X - C2) * C3 <= C4 (or >), where
+   C3 is modular multiplicative inverse of C1 and 1< ((1<> S) <= C4, where C3 is modular multiplicative
+   inverse of C1>>S and 1<>S)) >> S.
+
+   For signed (X % C1) == 0 if C1 is odd to (all operations in it
+   unsigned):
+   (X * C3) + C4 <= 2 * C4, where
+   C3 is modular multiplicative inverse of (unsigned) C1 and 1<> S <= (C4 >> (S - 1))
+   where C3 is modular multiplicative inverse of (unsigned)(C1>>S) and 1<>S)) & (-1<= c should be always
+false.  */
+  || tree_int_cst_le (treeop1, 

Re: VRP: abstract out wide int CONVERT_EXPR_P code

2018-09-04 Thread Richard Biener
On Tue, Sep 4, 2018 at 4:12 PM Aldy Hernandez  wrote:
>
>
>
> On 09/04/2018 08:49 AM, Richard Biener wrote:
> > On Tue, Sep 4, 2018 at 2:41 PM Aldy Hernandez  wrote:
> >>
> >>
> >>
> >> On 09/04/2018 07:58 AM, Richard Biener wrote:
> >>> On Mon, Sep 3, 2018 at 1:32 PM Aldy Hernandez  wrote:
> 
> 
> 
>  On 08/28/2018 05:27 AM, Richard Biener wrote:
> > On Mon, Aug 27, 2018 at 2:24 PM Aldy Hernandez  wrote:
> >>
> >> Howdy!
> >>
> >> Phew, I think this is the last abstraction.  This handles the unary
> >> CONVERT_EXPR_P code.
> >>
> >> It's the usual story-- normalize the symbolics to [-MIN,+MAX] and 
> >> handle
> >> everything generically.
> >>
> >> Normalizing the symbolics brought about some nice surprises.  We now
> >> handle a few things we were punting on before, which I've documented in
> >> the patch, but can remove if so desired.  I wrote them mainly for 
> >> myself:
> >>
> >> /* NOTES: Previously we were returning VARYING for all symbolics, but
> >>we can do better by treating them as [-MIN, +MAX].  For
> >>example, converting [SYM, SYM] from INT to LONG UNSIGNED,
> >>we can return: ~[0x800, 0x7fff].
> >>
> >>We were also failing to convert ~[0,0] from char* to unsigned,
> >>instead choosing to return VR_VARYING.  Now we return ~[0,0].  
> >> */
> >>
> >> Tested on x86-64 by the usual bootstrap and regtest gymnastics,
> >> including --enable-languages=all, because my past sins are still
> >> haunting me.
> >>
> >> OK?
> >
> > The new wide_int_range_convert_tree looks odd given it returns
> > tree's.  I'd have expected an API that does the conversion resulting
> > in a wide_int range and the VRP code adapting to that by converting
> > the result to trees.
> 
>  Rewritten as suggested.
> 
>  A few notes.
> 
>  First.  I am not using widest_int as was done previously.  We were
>  passing 0/false to the overflow args in force_fit_type so the call was
>  just degrading into wide_int_to_tree() anyhow.  Am I missing some
>  subtlety here, and must we use widest_int and then force_fit_type on the
>  caller?
> >>>
> >>> The issue is that the wide-ints get "converted", so it's indeed subtle.
> >>
> >> This seems like it should be documented-- at the very least in
> >> wide_int_range_convert.
> >
> > I don't think you need it there.
> >
> >> What do you mean "converted"?  I'd like to understand this better before
> >> I write out the comment :).  Do we lose precision by keeping it in a
> >> wide_int as opposed to a widest_int?
> >
> > As you're using wide_int::from that "changes" precision now.
>
> Ah, so it's wide_int_to_tree that wants to see the widest precision.  I see.

I honestly don't remember exactly but I suppose precision mismatched somehow.

Your code was OK with not using widest_ints.

> >
> >> Also, can we have the caller to wide_int_range_convert() use
> >> wide_int_to_tree directly instead of passing through force_fit_type?
> >
> > I think so.
> >
> >> It
> >> seems force_fit_type with overflowable=0 and overflowed=false is just a
> >> call to wide_int_to_tree anyhow.
> >>
> >>>
> >>> +  || wi::rshift (wi::sub (vr0_max, vr0_min),
> >>> +wi::uhwi (outer_prec, inner_prec),
> >>> +inner_sign) == 0)
> >>>
> >>> this was previously allowed only for VR_RANGEs - now it's also for
> >>> VR_ANTI_RANGE.
> >>> That looks incorrect.  Testcase welcome ;)
> >>
> >> See change to extract_range_into_wide_ints.  VR_ANTI_RANGEs of constants
> >> will remain as is, whereas VR_ANTI_RANGEs of symbols will degrade into
> >> [-MIN,+MAX] and be handled generically.
> >>
> >> Also, see comment:
> >>
> >> +  /* We normalize everything to a VR_RANGE, but for constant
> >> +anti-ranges we must handle them by leaving the final result
> >> +as an anti range.  This allows us to convert things like
> >> +~[0,5] seamlessly.  */
> >
> > Yes, but for truncation of ~[0, 5] from say int to short it's not correct
> > to make the result ~[0, 5], is it?  At least the original code dropped
> > that to VARYING.  For the same reason truncating [3, 765] from
> > short to unsigned char isn't [3, 253].  But maybe I'm missing something.
>
> Correct, but in that case we will realize that in wide_int_range_convert
> and refuse to do the conversion:
>
>/* If the conversion is not truncating we can convert the min and
>   max values and canonicalize the resulting range.  Otherwise we
>   can do the conversion if the size of the range is less than what
>   the precision of the target type can represent.  */
>if (outer_prec >= inner_prec
>|| wi::rshift (wi::sub (vr0_max, vr0_min),
>  wi::uhwi (outer_prec, inner_prec),
>  inner_sign) == 0)

so you say the 

Re: VRP: abstract out POINTER_TYPE_P handling

2018-09-04 Thread Aldy Hernandez




On 09/04/2018 10:12 AM, Richard Biener wrote:

On Tue, Sep 4, 2018 at 3:51 PM Aldy Hernandez  wrote:



Splitting this out makes my life a lot easier, with virtually no penalty
to VRP.  And we could always revisit it later.  But if you feel strongly
about it, I could inline the pointer handling into our code, and revisit
this later with you.


Yeah, please do so - I'm leaving for the Cauldron soon.  Looking at
the current wide-int-range.h header doesn't reveal a consistent API :/


Suggestions welcome :).

Just to make sure, was that an OK and approval, provided I make the
function inline?


It was to "inline the pointer handling into our code"


Booo


Re: VRP: abstract out POINTER_TYPE_P handling

2018-09-04 Thread Richard Biener
On Tue, Sep 4, 2018 at 3:51 PM Aldy Hernandez  wrote:
>
>
>
> On 09/04/2018 09:41 AM, Richard Biener wrote:
> > On Tue, Sep 4, 2018 at 3:21 PM Aldy Hernandez  wrote:
> >>
> >>
> >>
> >> On 09/04/2018 08:29 AM, Richard Biener wrote:
> >>> On Tue, Sep 4, 2018 at 2:09 PM Aldy Hernandez  wrote:
> 
> 
> 
>  On 09/04/2018 07:42 AM, Richard Biener wrote:
> > On Thu, Aug 30, 2018 at 9:39 AM Aldy Hernandez  wrote:
> >>
> >>
> >>
> >> On 08/29/2018 12:32 PM, David Malcolm wrote:
> >>> On Wed, 2018-08-29 at 06:54 -0400, Aldy Hernandez wrote:
>  Never say never.  Just when I thought I was done...
> 
>  It looks like I need the special casing we do for pointer types in
>  extract_range_from_binary_expr_1.
> 
>  The code is simple enough that we could just duplicate it anywhere
>  we
>  need it, but I hate code duplication and keeping track of multiple
>  versions of the same thing.
> 
>  No change in functionality.
> 
>  Tested on x86-64 Linux with all languages.
> 
>  OK?
> >>>
> >>> A couple of nits I spotted:
> >>>
> >>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> >>> index f20730a85ba..228f20b5203 100644
> >>> --- a/gcc/tree-vrp.c
> >>> +++ b/gcc/tree-vrp.c
> >>> @@ -1275,6 +1275,32 @@ set_value_range_with_overflow (value_range ,
> >>>  }
> >>>  }
> >>>
> >>> +/* Value range wrapper for wide_int_range_pointer.  */
> >>> +
> >>> +static void
> >>> +vrp_range_pointer (value_range *vr,
> >>> +enum tree_code code, tree type,
> >>> +value_range *vr0, value_range *vr1)
> >>>
> >>> Looking at the signature of the function, I wondered what the source
> >>> and destination of the information is...
> >>
> >> vr being the destination and vr0/vr1 being the sources are standard
> >> operating procedure within tree-vrp.c.  All the functions are basically
> >> that, that's why I haven't bothered.
> >>
> >>>
> >>> Could vr0 and vr1 be const?
> >>>
> >>> ...which would require extract_range_into_wide_ints to take a const
> >>> value_range *
> >>
> >> Yes, but that would require changing all of tree-vrp.c to take const
> >> value_range's.  For instance, range_int_cst_p and a slew of other
> >> functions throughout.
> >>
> >>>
> >>>   ... which would require range_int_cst_p to take a const 
> >>> value_range *
> >>>
> >>> (I *think* that's where the yak-shaving would end)
> >>>
> >>> +{
> >>> +  signop sign = TYPE_SIGN (type);
> >>> +  unsigned prec = TYPE_PRECISION (type);
> >>> +  wide_int vr0_min, vr0_max;
> >>> +  wide_int vr1_min, vr1_max;
> >>> +
> >>> +  extract_range_into_wide_ints (vr0, sign, prec, vr0_min, vr0_max);
> >>> +  extract_range_into_wide_ints (vr1, sign, prec, vr1_min, vr1_max);
> >>> +  wide_int_range_nullness n;
> >>> +  n = wide_int_range_pointer (code, sign, vr0_min, vr0_max, vr1_min,
> >>> vr1_max);
> >>> +  if (n == WIDE_INT_RANGE_UNKNOWN)
> >>> +set_value_range_to_varying (vr);
> >>> +  else if (n == WIDE_INT_RANGE_NULL)
> >>> +set_value_range_to_null (vr, type);
> >>> +  else if (n == WIDE_INT_RANGE_NONNULL)
> >>> +set_value_range_to_nonnull (vr, type);
> >>> +  else
> >>> +gcc_unreachable ();
> >>> +}
> >>> +
> >>>
> >>> Would it be better to use a "switch (n)" here, rather than a series of
> >>> "if"/"else if" for each enum value?
> >>
> >> I prefer ifs for a small amount of options, but having the compiler 
> >> warn
> >> on missing enum alternatives is nice, so I've changed it.  Notice
> >> there's more code now though :-(.
> >
> > I don't like the names *_range_pointer, please change them to sth more
> > specific like *_range_pointer_binary_op.
> 
>  Sure.
> 
> >
> > What's the advantage of "hiding" the resulting ranges behind the
> > wide_int_range_nullness enum rather than having regular range output?
> 
>  Our upcoming work has another representation for non-nulls in particular
>  ([-MIN,-1][1,+MAX]), since we don't have anti ranges.  I want to share
>  whatever VRP is currently doing for pointers, without having to depend
>  on the internals of vrp (value_range *).
> 
> > What's the value in separating pointer operations at all in the
> > wide-int interfaces?  IIRC the difference is that whenever unioning
> > is required that when it's a pointer result we should possibly
> > preserve non-nullness.  It's of course also doing less work overall.
> 
>  I don't follow.  What are you suggesting?
> >>>
> >>> I'm thinking out loud.  Here adding sort-of a "preferencing" to the
> >>> more general handling of the ops would do 

Re: VRP: abstract out wide int CONVERT_EXPR_P code

2018-09-04 Thread Aldy Hernandez



On 09/04/2018 08:49 AM, Richard Biener wrote:

On Tue, Sep 4, 2018 at 2:41 PM Aldy Hernandez  wrote:




On 09/04/2018 07:58 AM, Richard Biener wrote:

On Mon, Sep 3, 2018 at 1:32 PM Aldy Hernandez  wrote:




On 08/28/2018 05:27 AM, Richard Biener wrote:

On Mon, Aug 27, 2018 at 2:24 PM Aldy Hernandez  wrote:


Howdy!

Phew, I think this is the last abstraction.  This handles the unary
CONVERT_EXPR_P code.

It's the usual story-- normalize the symbolics to [-MIN,+MAX] and handle
everything generically.

Normalizing the symbolics brought about some nice surprises.  We now
handle a few things we were punting on before, which I've documented in
the patch, but can remove if so desired.  I wrote them mainly for myself:

/* NOTES: Previously we were returning VARYING for all symbolics, but
   we can do better by treating them as [-MIN, +MAX].  For
   example, converting [SYM, SYM] from INT to LONG UNSIGNED,
   we can return: ~[0x800, 0x7fff].

   We were also failing to convert ~[0,0] from char* to unsigned,
   instead choosing to return VR_VARYING.  Now we return ~[0,0].  */

Tested on x86-64 by the usual bootstrap and regtest gymnastics,
including --enable-languages=all, because my past sins are still
haunting me.

OK?


The new wide_int_range_convert_tree looks odd given it returns
tree's.  I'd have expected an API that does the conversion resulting
in a wide_int range and the VRP code adapting to that by converting
the result to trees.


Rewritten as suggested.

A few notes.

First.  I am not using widest_int as was done previously.  We were
passing 0/false to the overflow args in force_fit_type so the call was
just degrading into wide_int_to_tree() anyhow.  Am I missing some
subtlety here, and must we use widest_int and then force_fit_type on the
caller?


The issue is that the wide-ints get "converted", so it's indeed subtle.


This seems like it should be documented-- at the very least in
wide_int_range_convert.


I don't think you need it there.


What do you mean "converted"?  I'd like to understand this better before
I write out the comment :).  Do we lose precision by keeping it in a
wide_int as opposed to a widest_int?


As you're using wide_int::from that "changes" precision now.


Ah, so it's wide_int_to_tree that wants to see the widest precision.  I see.




Also, can we have the caller to wide_int_range_convert() use
wide_int_to_tree directly instead of passing through force_fit_type?


I think so.


It
seems force_fit_type with overflowable=0 and overflowed=false is just a
call to wide_int_to_tree anyhow.



+  || wi::rshift (wi::sub (vr0_max, vr0_min),
+wi::uhwi (outer_prec, inner_prec),
+inner_sign) == 0)

this was previously allowed only for VR_RANGEs - now it's also for
VR_ANTI_RANGE.
That looks incorrect.  Testcase welcome ;)


See change to extract_range_into_wide_ints.  VR_ANTI_RANGEs of constants
will remain as is, whereas VR_ANTI_RANGEs of symbols will degrade into
[-MIN,+MAX] and be handled generically.

Also, see comment:

+  /* We normalize everything to a VR_RANGE, but for constant
+anti-ranges we must handle them by leaving the final result
+as an anti range.  This allows us to convert things like
+~[0,5] seamlessly.  */


Yes, but for truncation of ~[0, 5] from say int to short it's not correct
to make the result ~[0, 5], is it?  At least the original code dropped
that to VARYING.  For the same reason truncating [3, 765] from
short to unsigned char isn't [3, 253].  But maybe I'm missing something.


Correct, but in that case we will realize that in wide_int_range_convert 
and refuse to do the conversion:


  /* If the conversion is not truncating we can convert the min and
 max values and canonicalize the resulting range.  Otherwise we
 can do the conversion if the size of the range is less than what
 the precision of the target type can represent.  */
  if (outer_prec >= inner_prec
  || wi::rshift (wi::sub (vr0_max, vr0_min),
 wi::uhwi (outer_prec, inner_prec),
 inner_sign) == 0)
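
As a concrete check of the guard quoted just above (the numbers are only
illustrative): truncating the 16-bit range [3, 765] to 8 bits is refused
because the range spans more than 2^8 values, while a narrower range such
as [3, 200] would still be converted.

#include <assert.h>
#include <stdint.h>

/* Rough scalar equivalent of the (vr0_max - vr0_min) >> outer_prec test
   for an unsigned source range that is being truncated.  */
static int
truncation_representable_p (uint64_t vr_min, uint64_t vr_max,
                            unsigned outer_prec)
{
  return ((vr_max - vr_min) >> outer_prec) == 0;
}

int
main (void)
{
  assert (!truncation_representable_p (3, 765, 8)); /* 762 >> 8 == 2: refuse.   */
  assert (truncation_representable_p (3, 200, 8));  /* 197 >> 8 == 0: convert.  */
  return 0;
}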





+  return (!wi::eq_p (min, wi::min_value (outer_prec, outer_sign))
+ || !wi::eq_p (max, wi::max_value (outer_prec, outer_sign)));

so you return false for VARYING?  Why?  I'd simply return true and
return VARYING in the new type in the path you return false.


Since symbols and VR_VARYING and other things get turned into a
[-MIN,+MAX] by extract_range_into_wide_ints, it's a lot easier to handle
things generically and return varying when the range spans the entire
domain.  It also keeps with the rest of the wide_int_range_* functions
that return false when we know it's going to be VR_VARYING.


Ah, OK, didn't know they did that.  Not sure if that's convenient though
given VR_VARYING has no representation in the wide-int-range "range"
so callers chaining ops would need to do sth like

if (!wide_int_range_* (, _min, 

[PATCH] Ignore properly -mdirect-move (PR target/87164).

2018-09-04 Thread Martin Liška
Hi.

The mdirect-move option should be marked Deprecated, which means its value
is ignored and the user can't influence rs6000_isa_flags.

Patch can bootstrap on ppc64le-redhat-linux (gcc110 and gcc112) and survives 
regression tests.

Ready to be installed?
Martin

gcc/ChangeLog:

2018-08-31  Martin Liska  

PR target/87164
* config/rs6000/rs6000.opt: Mark the option as Deprecated.
* optc-gen.awk: Allow 'Var' for Deprecated options in order
to generate a MASK value.
---
 gcc/config/rs6000/rs6000.opt | 3 +--
 gcc/optc-gen.awk | 2 --
 2 files changed, 1 insertion(+), 4 deletions(-)


diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 0abeeafc646..138ce26d03f 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -483,9 +483,8 @@ mcrypto
 Target Report Mask(CRYPTO) Var(rs6000_isa_flags)
 Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.
 
-; We can't use Ignore flag because DIRECT_MOVE mask is still used.
 mdirect-move
-Target Undocumented Mask(DIRECT_MOVE) Var(rs6000_isa_flags) Warn(%qs is deprecated)
+Target Undocumented Mask(DIRECT_MOVE) Var(rs6000_isa_flags) Deprecated
 
 mhtm
 Target Report Mask(HTM) Var(rs6000_isa_flags)
diff --git a/gcc/optc-gen.awk b/gcc/optc-gen.awk
index 9a79bb86243..3668b3ef0e4 100644
--- a/gcc/optc-gen.awk
+++ b/gcc/optc-gen.awk
@@ -336,8 +336,6 @@ for (i = 0; i < n_opts; i++) {
 			  alias_data = "NULL, NULL, OPT_SPECIAL_deprecated"
 if (warn_message != "NULL")
   print "#error Deprecated option with Warn"
-if (var_name(flags[i]) != "")
-  print "#error Deprecated option with Var"
 if (flag_set_p("Report", flags[i]))
   print "#error Deprecated option with Report"
   }



Re: VRP: abstract out POINTER_TYPE_P handling

2018-09-04 Thread Aldy Hernandez




On 09/04/2018 09:41 AM, Richard Biener wrote:

On Tue, Sep 4, 2018 at 3:21 PM Aldy Hernandez  wrote:




On 09/04/2018 08:29 AM, Richard Biener wrote:

On Tue, Sep 4, 2018 at 2:09 PM Aldy Hernandez  wrote:




On 09/04/2018 07:42 AM, Richard Biener wrote:

On Thu, Aug 30, 2018 at 9:39 AM Aldy Hernandez  wrote:




On 08/29/2018 12:32 PM, David Malcolm wrote:

On Wed, 2018-08-29 at 06:54 -0400, Aldy Hernandez wrote:

Never say never.  Just when I thought I was done...

It looks like I need the special casing we do for pointer types in
extract_range_from_binary_expr_1.

The code is simple enough that we could just duplicate it anywhere
we
need it, but I hate code duplication and keeping track of multiple
versions of the same thing.

No change in functionality.

Tested on x86-64 Linux with all languages.

OK?


A couple of nits I spotted:

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index f20730a85ba..228f20b5203 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -1275,6 +1275,32 @@ set_value_range_with_overflow (value_range ,
 }
 }

+/* Value range wrapper for wide_int_range_pointer.  */
+
+static void
+vrp_range_pointer (value_range *vr,
+enum tree_code code, tree type,
+value_range *vr0, value_range *vr1)

Looking at the signature of the function, I wondered what the source
and destination of the information is...


vr being the destination and vr0/vr1 being the sources are standard
operating procedure within tree-vrp.c.  All the functions are basically
that, that's why I haven't bothered.



Could vr0 and vr1 be const?

...which would require extract_range_into_wide_ints to take a const
value_range *


Yes, but that would require changing all of tree-vrp.c to take const
value_range's.  For instance, range_int_cst_p and a slew of other
functions throughout.



  ... which would require range_int_cst_p to take a const value_range *

(I *think* that's where the yak-shaving would end)

+{
+  signop sign = TYPE_SIGN (type);
+  unsigned prec = TYPE_PRECISION (type);
+  wide_int vr0_min, vr0_max;
+  wide_int vr1_min, vr1_max;
+
+  extract_range_into_wide_ints (vr0, sign, prec, vr0_min, vr0_max);
+  extract_range_into_wide_ints (vr1, sign, prec, vr1_min, vr1_max);
+  wide_int_range_nullness n;
+  n = wide_int_range_pointer (code, sign, vr0_min, vr0_max, vr1_min,
vr1_max);
+  if (n == WIDE_INT_RANGE_UNKNOWN)
+set_value_range_to_varying (vr);
+  else if (n == WIDE_INT_RANGE_NULL)
+set_value_range_to_null (vr, type);
+  else if (n == WIDE_INT_RANGE_NONNULL)
+set_value_range_to_nonnull (vr, type);
+  else
+gcc_unreachable ();
+}
+

Would it be better to use a "switch (n)" here, rather than a series of
"if"/"else if" for each enum value?


I prefer ifs for a small amount of options, but having the compiler warn
on missing enum alternatives is nice, so I've changed it.  Notice
there's more code now though :-(.


I don't like the names *_range_pointer, please change them to sth more
specific like *_range_pointer_binary_op.


Sure.



What's the advantage of "hiding" the resulting ranges behind the
wide_int_range_nullness enum rather than having regular range output?


Our upcoming work has another representation for non-nulls in particular
([-MIN,-1][1,+MAX]), since we don't have anti ranges.  I want to share
whatever VRP is currently doing for pointers, without having to depend
on the internals of vrp (value_range *).


What's the value in separating pointer operations at all in the
wide-int interfaces?  IIRC the difference is that whenever unioning
is required that when it's a pointer result we should possibly
preserve non-nullness.  It's of course also doing less work overall.


I don't follow.  What are you suggesting?


I'm thinking out loud.  Here adding sort-of a "preferencing" to the
more general handling of the ops would do the same trick, without
restricting that to "pointers".  For example if a pass would be interested
in knowing whether a divisor is zero it would also rather preserve
non-nullness and trade that for precision in other parts of the range,
say, if you had [-1, -1] [1, +INF] then the smallest unioning result
is [-1, +INF] but ~[0, 0] is also a valid result which preserves non-zeroness.

So for wide-int workers I don't believe in tagging sth as pointers.  Rather
than that ops might get one of

   enum vr_preference { VR_PREF_SMALL, VR_PREF_NONNULL, VR_PREF_NONNEGATIVE, 
... }

?


Neat.  I think this is worth exploring, but perhaps something to be
looked at as a further enhancement?





So - in the end I'm not convinced that adding this kind of interface
to the wide_int_ variants is worth it and I'd rather keep the existing
VRP code?


Again, I don't want to depend on vr_values or VRP in general.


Sure.  But the general handling of the ops (just treat POINTER_PLUS like PLUS)
should do range operations just fine.  It's only the preferencing you'd lose
and that preferencing looks like a more general feature than "lets have 

Re: VRP: abstract out POINTER_TYPE_P handling

2018-09-04 Thread Richard Biener
On Tue, Sep 4, 2018 at 3:21 PM Aldy Hernandez  wrote:
>
>
>
> On 09/04/2018 08:29 AM, Richard Biener wrote:
> > On Tue, Sep 4, 2018 at 2:09 PM Aldy Hernandez  wrote:
> >>
> >>
> >>
> >> On 09/04/2018 07:42 AM, Richard Biener wrote:
> >>> On Thu, Aug 30, 2018 at 9:39 AM Aldy Hernandez  wrote:
> 
> 
> 
>  On 08/29/2018 12:32 PM, David Malcolm wrote:
> > On Wed, 2018-08-29 at 06:54 -0400, Aldy Hernandez wrote:
> >> Never say never.  Just when I thought I was done...
> >>
> >> It looks like I need the special casing we do for pointer types in
> >> extract_range_from_binary_expr_1.
> >>
> >> The code is simple enough that we could just duplicate it anywhere
> >> we
> >> need it, but I hate code duplication and keeping track of multiple
> >> versions of the same thing.
> >>
> >> No change in functionality.
> >>
> >> Tested on x86-64 Linux with all languages.
> >>
> >> OK?
> >
> > A couple of nits I spotted:
> >
> > diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> > index f20730a85ba..228f20b5203 100644
> > --- a/gcc/tree-vrp.c
> > +++ b/gcc/tree-vrp.c
> > @@ -1275,6 +1275,32 @@ set_value_range_with_overflow (value_range ,
> > }
> > }
> >
> > +/* Value range wrapper for wide_int_range_pointer.  */
> > +
> > +static void
> > +vrp_range_pointer (value_range *vr,
> > +enum tree_code code, tree type,
> > +value_range *vr0, value_range *vr1)
> >
> > Looking at the signature of the function, I wondered what the source
> > and destination of the information is...
> 
>  vr being the destination and vr0/vr1 being the sources are standard
>  operating procedure within tree-vrp.c.  All the functions are basically
>  that, that's why I haven't bothered.
> 
> >
> > Could vr0 and vr1 be const?
> >
> > ...which would require extract_range_into_wide_ints to take a const
> > value_range *
> 
>  Yes, but that would require changing all of tree-vrp.c to take const
>  value_range's.  For instance, range_int_cst_p and a slew of other
>  functions throughout.
> 
> >
> >  ... which would require range_int_cst_p to take a const 
> > value_range *
> >
> > (I *think* that's where the yak-shaving would end)
> >
> > +{
> > +  signop sign = TYPE_SIGN (type);
> > +  unsigned prec = TYPE_PRECISION (type);
> > +  wide_int vr0_min, vr0_max;
> > +  wide_int vr1_min, vr1_max;
> > +
> > +  extract_range_into_wide_ints (vr0, sign, prec, vr0_min, vr0_max);
> > +  extract_range_into_wide_ints (vr1, sign, prec, vr1_min, vr1_max);
> > +  wide_int_range_nullness n;
> > +  n = wide_int_range_pointer (code, sign, vr0_min, vr0_max, vr1_min,
> > vr1_max);
> > +  if (n == WIDE_INT_RANGE_UNKNOWN)
> > +set_value_range_to_varying (vr);
> > +  else if (n == WIDE_INT_RANGE_NULL)
> > +set_value_range_to_null (vr, type);
> > +  else if (n == WIDE_INT_RANGE_NONNULL)
> > +set_value_range_to_nonnull (vr, type);
> > +  else
> > +gcc_unreachable ();
> > +}
> > +
> >
> > Would it be better to use a "switch (n)" here, rather than a series of
> > "if"/"else if" for each enum value?
> 
>  I prefer ifs for a small amount of options, but having the compiler warn
>  on missing enum alternatives is nice, so I've changed it.  Notice
>  there's more code now though :-(.
> >>>
> >>> I don't like the names *_range_pointer, please change them to sth more
> >>> specific like *_range_pointer_binary_op.
> >>
> >> Sure.
> >>
> >>>
> >>> What's the advantage of "hiding" the resulting ranges behind the
> >>> wide_int_range_nullness enum rather than having regular range output?
> >>
> >> Our upcoming work has another representation for non-nulls in particular
> >> ([-MIN,-1][1,+MAX]), since we don't have anti ranges.  I want to share
> >> whatever VRP is currently doing for pointers, without having to depend
> >> on the internals of vrp (value_range *).
> >>
> >>> What's the value in separating pointer operations at all in the
> >>> wide-int interfaces?  IIRC the difference is that whenever unioning
> >>> is required that when it's a pointer result we should possibly
> >>> preserve non-nullness.  It's of course also doing less work overall.
> >>
> >> I don't follow.  What are you suggesting?
> >
> > I'm thinking out loud.  Here adding sort-of a "preferencing" to the
> > more general handling of the ops would do the same trick, without
> > restricting that to "pointers".  For example if a pass would be interested
> > in knowing whether a divisor is zero it would also rather preserve
> > non-nullness and trade that for precision in other parts of the range,
> > say, if you had [-1, -1] [1, +INF] then the smallest unioning result
> > is [-1, +INF] but ~[0, 0] is also a valid result 

Re: [2/2] Add AddressSanitizer annotations to std::string.

2018-09-04 Thread Mikhail Kashkarov
^^^ gentle ping.


On 08/16/2018 02:28 PM, Mikhail Kashkarov wrote:
> ^^ gentle ping.
>
> On 07/16/2018 07:16 PM, Mikhail Kashkarov wrote:
>> Rebased and updated the patch (typos, added missing annotations),
>> and added ASan tests to verify the string annotations.
>>
>>
>> On 06/28/2018 11:09 AM, Mikhail Kashkarov wrote:
>>> ^ gentle ping.
>>>
>>>
>>> On 06/08/2018 05:54 PM, Mikhail Kashkarov wrote:
 Hello,

 I've updated patches for std::string sanitization and disabling CXX11
 string SSO usage for correct sanitization.

    >>       _M_destroy(_M_allocated_capacity);
    >>+    else
    >>+      __annotate_delete();
    >
    >Do these calls definitely optimize away completely when not
    >sanitizing? Even for -O1, -Os and -Og?
    >
    >For std::vector annotation I used macros to add these 
 annotations, so
    >there is no change to the generated code when annotations are
    >disabled. But it makes the code quite ugly.

 I've checked the asm code for string-inst.o and it looks like these
 calls are optimized away, but there are some light changes after the
 patch for .

    > Right, I was wondering specifically about the 
    > instantiations. I could be wrong but I don't think anything in
    >  creates, destroys or modifies a std::basic_string.

 I was confused by reinterpret_cast's on strings in fstream.tcc, looks
 like this is not needed, you are right.

    >>   // calling 4.0.x+ _S_create.
    >>   __p->_M_set_sharable();
    >>+#if _GLIBCXX_SANITIZER_ANNOTATE_STRING
    >>+  __p->_M_length = 0;
    >>+#endif
    >
    > Why is this needed? I think a comment explaining it would help 
 (like
    > the one above explaining why calling _M_set_sharable() is 
 needed).

 Checked current version without this change, looks like now it works,
 reverted.

 Short summary:
 The reason for changing the string layout under sanitization is to
 place the string's char buffer on the heap for correct alignment in
 32-bit environments, for both the pre-CXX11 and CXX11 string ABIs.

 | Sanitize string | string type | ABI is changed? | 32-bit | 64-bit |
 |-----------------+-------------+-----------------+--------+--------|
 | FULL            | SSO-string  | yes             | +      | +      |
 |                 | COW-string  | yes             | +      | +      |
 |-----------------+-------------+-----------------+--------+--------|
 | PARTIAL         | SSO-string  | no              | -+(*)  | +      |
 |                 | COW-string  | no              | -      | +      |
 (*) only long strings are sanitized for 32-bit

 Some functions look a bit messy with the new define unless the internal
 functions (operator=) are changed as well.  I'm also not sure whether the
 define that disables string SSO usage should also affect other parts
 that rely on the basic_string class size (the static_assert checks in
 exceptions/shim-facets).


 Any thoughts?

 On 05/29/2018 06:55 PM, Jonathan Wakely wrote:
> On 29/05/18 18:18 +0300, Kashkarov Mikhail wrote:
>> Jonathan Wakely  writes:
 --- a/libstdc++-v3/include/bits/fstream.tcc
 +++ b/libstdc++-v3/include/bits/fstream.tcc
 @@ -1081,6 +1081,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

     // Inhibit implicit instantiations for required 
 instantiations,
     // which are defined via explicit instantiations elsewhere.
 +#if !_GLIBCXX_SANITIZE_STRING
 #if _GLIBCXX_EXTERN_TEMPLATE
     extern template class basic_filebuf;
     extern template class basic_ifstream;
 @@ -1094,6 +1095,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     extern template class basic_fstream;
 #endif
 #endif
 +#endif // !_GLIBCXX_SANITIZE_STRING
>>> Why do we need to disable these explicit instantiation 
>>> declarations?
>>> Are they affected by the std::string layout changes? Is that just
>>> because of the constructors taking std::string, or something else?
>> The libstdc++ build is not sanitized, so macros that require
>> AddressSanitizer support will not be applied and these templates
>> will be instantiated without support for ASan annotations.
> Right, I was wondering specifically about the 
> instantiations. I could be wrong but I don't think anything in
>  creates, destroys or modifies a std::basic_string.
>
>
>
>
>
>>
>

-- 
Best regards,
Kashkarov Mikhail
Samsung R&D Institute Russia



Re: [PATCH] Add -Waligned-new to Option Summary

2018-09-04 Thread Jonathan Wakely

On 04/09/18 14:25 +0100, Jonathan Wakely wrote:

* doc/invoke.texi (Option Summary): Add -Waligned-new.

Committing to trunk and branches as obvious.





commit 55e8b8ff39837963039e6852fd9c7ed403e6eb49
Author: Jonathan Wakely 
Date:   Tue Sep 4 14:22:09 2018 +0100

   Add -Waligned-new to Option Summary

   * doc/invoke.texi (Option Summary): Add -Waligned-new.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index f911ae0d1af..27d822e83c3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -277,7 +277,7 @@ Objective-C and Objective-C++ Dialects}.
@xref{Warning Options,,Options to Request or Suppress Warnings}.
@gccoptlist{-fsyntax-only  -fmax-errors=@var{n}  -Wpedantic @gol
-pedantic-errors @gol
--w  -Wextra  -Wall  -Waddress  -Waggregate-return  @gol
+-w  -Wextra  -Wall  -Waddress  -Waggregate-return -Waligned-new @gol


I noticed this should have two spaces before the option, so did that.

commit bedaeaf331b53c01152d5856d3fd82412ee896be
Author: Jonathan Wakely 
Date:   Tue Sep 4 14:34:06 2018 +0100

Add whitespace before warning option added in previous commit

* doc/invoke.texi (Option Summary): Add whitespace.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 27d822e83c3..ca92028b9ea 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -277,7 +277,7 @@ Objective-C and Objective-C++ Dialects}.
 @xref{Warning Options,,Options to Request or Suppress Warnings}.
 @gccoptlist{-fsyntax-only  -fmax-errors=@var{n}  -Wpedantic @gol
 -pedantic-errors @gol
--w  -Wextra  -Wall  -Waddress  -Waggregate-return -Waligned-new @gol
+-w  -Wextra  -Wall  -Waddress  -Waggregate-return  -Waligned-new @gol
 -Walloc-zero  -Walloc-size-larger-than=@var{byte-size}
 -Walloca  -Walloca-larger-than=@var{byte-size} @gol
 -Wno-aggressive-loop-optimizations  -Warray-bounds  -Warray-bounds=@var{n} @gol


[PATCH] Add -Waligned-new to Option Summary

2018-09-04 Thread Jonathan Wakely

* doc/invoke.texi (Option Summary): Add -Waligned-new.

Committing to trunk and branches as obvious.


commit 55e8b8ff39837963039e6852fd9c7ed403e6eb49
Author: Jonathan Wakely 
Date:   Tue Sep 4 14:22:09 2018 +0100

Add -Waligned-new to Option Summary

* doc/invoke.texi (Option Summary): Add -Waligned-new.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index f911ae0d1af..27d822e83c3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -277,7 +277,7 @@ Objective-C and Objective-C++ Dialects}.
 @xref{Warning Options,,Options to Request or Suppress Warnings}.
 @gccoptlist{-fsyntax-only  -fmax-errors=@var{n}  -Wpedantic @gol
 -pedantic-errors @gol
--w  -Wextra  -Wall  -Waddress  -Waggregate-return  @gol
+-w  -Wextra  -Wall  -Waddress  -Waggregate-return -Waligned-new @gol
 -Walloc-zero  -Walloc-size-larger-than=@var{byte-size}
 -Walloca  -Walloca-larger-than=@var{byte-size} @gol
 -Wno-aggressive-loop-optimizations  -Warray-bounds  -Warray-bounds=@var{n} @gol


Re: VRP: abstract out POINTER_TYPE_P handling

2018-09-04 Thread Aldy Hernandez




On 09/04/2018 08:29 AM, Richard Biener wrote:

On Tue, Sep 4, 2018 at 2:09 PM Aldy Hernandez  wrote:




On 09/04/2018 07:42 AM, Richard Biener wrote:

On Thu, Aug 30, 2018 at 9:39 AM Aldy Hernandez  wrote:




On 08/29/2018 12:32 PM, David Malcolm wrote:

On Wed, 2018-08-29 at 06:54 -0400, Aldy Hernandez wrote:

Never say never.  Just when I thought I was done...

It looks like I need the special casing we do for pointer types in
extract_range_from_binary_expr_1.

The code is simple enough that we could just duplicate it anywhere
we
need it, but I hate code duplication and keeping track of multiple
versions of the same thing.

No change in functionality.

Tested on x86-64 Linux with all languages.

OK?


A couple of nits I spotted:

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index f20730a85ba..228f20b5203 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -1275,6 +1275,32 @@ set_value_range_with_overflow (value_range ,
}
}

+/* Value range wrapper for wide_int_range_pointer.  */
+
+static void
+vrp_range_pointer (value_range *vr,
+enum tree_code code, tree type,
+value_range *vr0, value_range *vr1)

Looking at the signature of the function, I wondered what the source
and destination of the information is...


vr being the destination and vr0/vr1 being the sources are standard
operating procedure within tree-vrp.c.  All the functions are basically
that, that's why I haven't bothered.



Could vr0 and vr1 be const?

...which would require extract_range_into_wide_ints to take a const
value_range *


Yes, but that would require changing all of tree-vrp.c to take const
value_range's.  For instance, range_int_cst_p and a slew of other
functions throughout.



 ... which would require range_int_cst_p to take a const value_range *

(I *think* that's where the yak-shaving would end)

+{
+  signop sign = TYPE_SIGN (type);
+  unsigned prec = TYPE_PRECISION (type);
+  wide_int vr0_min, vr0_max;
+  wide_int vr1_min, vr1_max;
+
+  extract_range_into_wide_ints (vr0, sign, prec, vr0_min, vr0_max);
+  extract_range_into_wide_ints (vr1, sign, prec, vr1_min, vr1_max);
+  wide_int_range_nullness n;
+  n = wide_int_range_pointer (code, sign, vr0_min, vr0_max, vr1_min,
vr1_max);
+  if (n == WIDE_INT_RANGE_UNKNOWN)
+set_value_range_to_varying (vr);
+  else if (n == WIDE_INT_RANGE_NULL)
+set_value_range_to_null (vr, type);
+  else if (n == WIDE_INT_RANGE_NONNULL)
+set_value_range_to_nonnull (vr, type);
+  else
+gcc_unreachable ();
+}
+

Would it be better to use a "switch (n)" here, rather than a series of
"if"/"else if" for each enum value?


I prefer ifs for a small amount of options, but having the compiler warn
on missing enum alternatives is nice, so I've changed it.  Notice
there's more code now though :-(.


I don't like the names *_range_pointer, please change them to sth more
specific like *_range_pointer_binary_op.


Sure.



What's the advantage of "hiding" the resulting ranges behind the
wide_int_range_nullness enum rather than having regular range output?


Our upcoming work has another representation for non-nulls in particular
([-MIN,-1][1,+MAX]), since we don't have anti ranges.  I want to share
whatever VRP is currently doing for pointers, without having to depend
on the internals of vrp (value_range *).


What's the value in separating pointer operations at all in the
wide-int interfaces?  IIRC the difference is that whenever unioning
is required that when it's a pointer result we should possibly
preserve non-nullness.  It's of course also doing less work overall.


I don't follow.  What are you suggesting?


I'm thinking out loud.  Here adding sort-of a "preferencing" to the
more general handling of the ops would do the same trick, without
restricting that to "pointers".  For example if a pass would be interested
in knowing whether a divisor is zero it would also rather preserve
non-nullness and trade that for precision in other parts of the range,
say, if you had [-1, -1] [1, +INF] then the smallest unioning result
is [-1, +INF] but ~[0, 0] is also a valid result which preserves non-zeroness.

So for wide-int workers I don't believe in tagging sth as pointers.  Rather
than that ops might get one of

  enum vr_preference { VR_PREF_SMALL, VR_PREF_NONNULL, VR_PREF_NONNEGATIVE, ... 
}

?


Neat.  I think this is worth exploring, but perhaps something to be 
looked at as a further enhancement?
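
To make the idea concrete, here is a very rough sketch of what such a
preference-driven union could look like.  This is not existing GCC code;
the type and function names are made up, and the enum is trimmed to two
values for illustration.

#include <algorithm>

enum vr_preference { VR_PREF_SMALL, VR_PREF_NONNULL };

/* Plain [min, max] range; anti == true means the complement ~[min, max].  */
struct irange { long min, max; bool anti; };

/* Union of two plain (non-anti) ranges.  With VR_PREF_NONNULL, prefer
   ~[0, 0] over the tighter-looking hull when neither input contains zero,
   e.g. [-1, -1] U [1, +INF] -> ~[0, 0] instead of [-1, +INF].  */
static irange
union_with_pref (const irange &a, const irange &b, vr_preference pref)
{
  bool a_nonzero = a.min > 0 || a.max < 0;
  bool b_nonzero = b.min > 0 || b.max < 0;
  if (pref == VR_PREF_NONNULL && a_nonzero && b_nonzero)
    return { 0, 0, true };
  return { std::min (a.min, b.min), std::max (a.max, b.max), false };
}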






So - in the end I'm not convinced that adding this kind of interface
to the wide_int_ variants is worth it and I'd rather keep the existing
VRP code?


Again, I don't want to depend on vr_values or VRP in general.


Sure.  But the general handling of the ops (just treat POINTER_PLUS like PLUS)
should do range operations just fine.  It's only the preferencing you'd lose
and that preferencing looks like a more general feature than "let's have this
function dealing with a small subset of binary ops on pointers"?


Something like MIN/MAX, that is 

Re: Keep std::deque algos specializations in Debug mode

2018-09-04 Thread Jonathan Wakely

On 25/08/18 22:44 +0200, François Dumont wrote:
The last optimizations that get disabled when Debug mode is enable are 
the algo specializations for std::deque iterators.


This patch move those algos in std namespace as they should even when 
Debug mode is enable so that they get considered even when calls are 
made with the namespace qualification. And it adds all the algos Debug 
implementations which forward to normal implementations to benefit 
from optimizations.


Note that I tried to use typename deque<>::iterator or typename
deque<>::const_iterator to define the Debug algos, but it didn't work:
gcc was just not considering those overloads.  I wonder why?


I added a test and manually checked that the behavior was correct.  Do you
see a way to automate this validation?


François




diff --git a/libstdc++-v3/include/bits/deque.tcc 
b/libstdc++-v3/include/bits/deque.tcc
index 125bcffb0c3..2a3f23a8588 100644
--- a/libstdc++-v3/include/bits/deque.tcc
+++ b/libstdc++-v3/include/bits/deque.tcc
@@ -980,22 +980,27 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
  this->_M_impl._M_finish._M_set_node(__new_nstart + __old_num_nodes - 1);
}

+_GLIBCXX_END_NAMESPACE_CONTAINER
+
  // Overload for deque::iterators, exploiting the "segmented-iterator
  // optimization".
  template
void
-fill(const _Deque_iterator<_Tp, _Tp&, _Tp*>& __first,
-const _Deque_iterator<_Tp, _Tp&, _Tp*>& __last, const _Tp& __value)
+fill(const _GLIBCXX_STD_C::_Deque_iterator<_Tp, _Tp&, _Tp*>& __first,
+const _GLIBCXX_STD_C::_Deque_iterator<_Tp, _Tp&, _Tp*>& __last,
+const _Tp& __value)
{
-  typedef typename _Deque_iterator<_Tp, _Tp&, _Tp*>::_Self _Self;
-
-  for (typename _Self::_Map_pointer __node = __first._M_node + 1;
-   __node < __last._M_node; ++__node)
-   std::fill(*__node, *__node + _Self::_S_buffer_size(), __value);
+  typedef typename _GLIBCXX_STD_C::_Deque_iterator<_Tp, _Tp&, _Tp*>::_Self
+   _Self;

  if (__first._M_node != __last._M_node)
{
  std::fill(__first._M_cur, __first._M_last, __value);
+
+ for (typename _Self::_Map_pointer __node = __first._M_node + 1;
+  __node != __last._M_node; ++__node)


Is there any particular reason to change this from using < to != for
the comparison?

(This change is part of the reason I asked for the ChangeLog, but you
didn't mention it in the ChangeLog).

Moving it inside the condition makes sense (not only does it avoid a
branch in the single-page case, but means we fill the elements in
order).



+   std::fill(*__node, *__node + _Self::_S_buffer_size(), __value);
+
  std::fill(__last._M_first, __last._M_cur, __value);
}
  else


The rest of the code changes look fine, I just wondered about that
bit.

I do have some comments on the new tests though ...




diff --git a/libstdc++-v3/testsuite/23_containers/deque/copy.cc 
b/libstdc++-v3/testsuite/23_containers/deque/copy.cc
new file mode 100644
index 000..9171200bc20
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/deque/copy.cc
@@ -0,0 +1,56 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include 
+#include 
+#include 
+
+#include 
+
+void test01()
+{
+  std::deque<char> d;
+  for (char c = 0; c != std::numeric_limits<char>::max(); ++c)
+    d.push_back(c);
+
+  std::deque<char> dest(std::numeric_limits<char>::max(), '\0');


These deques only have 127 or 255 elements (depending on whether char
is signed), which will fit on a single page of a deque (the
default is 512 bytes per page).

That means the tests don't exercise the logic for handling
non-contiguous blocks of memory.

Ideally we'd want to test multiple cases:

- a single page, with/without empty capacity at front/back
- multiple pages, with/without empty capacity at front/back

That would be 8 cases. I think we want to test at least a single
page and multiple pages.
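
Something along these lines for the multiple-pages case, as a sketch only:
it assumes the default 512-byte page, so a deque<int> page holds 128
elements, and it uses plain assert rather than the testsuite's VERIFY.

#include <algorithm>
#include <cassert>
#include <deque>

void
test_fill_multiple_pages()
{
  std::deque<int> d(1000, 1);         // spans several 128-element pages
  d.pop_front();                      // leave spare capacity at the front
  d.pop_back();                       // ... and at the back
  std::fill(d.begin(), d.end(), 7);   // exercises the segmented overload
  assert( std::count(d.begin(), d.end(), 7) == 998 );
}

int main() { test_fill_multiple_pages(); }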


--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/deque/fill.cc
@@ -0,0 +1,35 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, 

Re: VRP: abstract out wide int CONVERT_EXPR_P code

2018-09-04 Thread Richard Biener
On Tue, Sep 4, 2018 at 2:41 PM Aldy Hernandez  wrote:
>
>
>
> On 09/04/2018 07:58 AM, Richard Biener wrote:
> > On Mon, Sep 3, 2018 at 1:32 PM Aldy Hernandez  wrote:
> >>
> >>
> >>
> >> On 08/28/2018 05:27 AM, Richard Biener wrote:
> >>> On Mon, Aug 27, 2018 at 2:24 PM Aldy Hernandez  wrote:
> 
>  Howdy!
> 
>  Phew, I think this is the last abstraction.  This handles the unary
>  CONVERT_EXPR_P code.
> 
>  It's the usual story-- normalize the symbolics to [-MIN,+MAX] and handle
>  everything generically.
> 
>  Normalizing the symbolics brought about some nice surprises.  We now
>  handle a few things we were punting on before, which I've documented in
>  the patch, but can remove if so desired.  I wrote them mainly for myself:
> 
>  /* NOTES: Previously we were returning VARYING for all symbolics, but
>    we can do better by treating them as [-MIN, +MAX].  For
>    example, converting [SYM, SYM] from INT to LONG UNSIGNED,
>    we can return: ~[0x800, 0x7fff].
> 
>    We were also failing to convert ~[0,0] from char* to unsigned,
>    instead choosing to return VR_VARYING.  Now we return ~[0,0].  */
> 
>  Tested on x86-64 by the usual bootstrap and regtest gymnastics,
>  including --enable-languages=all, because my past sins are still
>  haunting me.
> 
>  OK?
> >>>
> >>> The new wide_int_range_convert_tree looks odd given it returns
> >>> tree's.  I'd have expected an API that does the conversion resulting
> >>> in a wide_int range and the VRP code adapting to that by converting
> >>> the result to trees.
> >>
> >> Rewritten as suggested.
> >>
> >> A few notes.
> >>
> >> First.  I am not using widest_int as was done previously.  We were
> >> passing 0/false to the overflow args in force_fit_type so the call was
> >> just degrading into wide_int_to_tree() anyhow.  Am I missing some
> >> subtlety here, and must we use widest_int and then force_fit_type on the
> >> caller?
> >
> > The issue is that the wide-ints get "converted", so it's indeed subtle.
>
> This seems like it should be documented-- at the very least in
> wide_int_range_convert.

I don't think you need it there.

> What do you mean "converted"?  I'd like to understand this better before
> I write out the comment :).  Do we lose precision by keeping it in a
> wide_int as opposed to a widest_int?

As you're using wide_int::from that "changes" precision now.

> Also, can we have the caller to wide_int_range_convert() use
> wide_int_to_tree directly instead of passing through force_fit_type?

I think so.

> It
> seems force_fit_type with overflowable=0 and overflowed=false is just a
> call to wide_int_to_tree anyhow.
>
> >
> > +  || wi::rshift (wi::sub (vr0_max, vr0_min),
> > +wi::uhwi (outer_prec, inner_prec),
> > +inner_sign) == 0)
> >
> > this was previously allowed only for VR_RANGEs - now it's also for
> > VR_ANTI_RANGE.
> > That looks incorrect.  Testcase welcome ;)
>
> See change to extract_range_into_wide_ints.  VR_ANTI_RANGEs of constants
> will remain as is, whereas VR_ANTI_RANGEs of symbols will degrade into
> [-MIN,+MAX] and be handled generically.
>
> Also, see comment:
>
> +  /* We normalize everything to a VR_RANGE, but for constant
> +anti-ranges we must handle them by leaving the final result
> +as an anti range.  This allows us to convert things like
> +~[0,5] seamlessly.  */

Yes, but for truncation of ~[0, 5] from say int to short it's not correct
to make the result ~[0, 5], is it?  At least the original code dropped
that to VARYING.  For the same reason truncating [3, 765] from
short to unsigned char isn't [3, 253].  But maybe I'm missing something.

> >
> > +  return (!wi::eq_p (min, wi::min_value (outer_prec, outer_sign))
> > + || !wi::eq_p (max, wi::max_value (outer_prec, outer_sign)));
> >
> > so you return false for VARYING?  Why?  I'd simply return true and
> > return VARYING in the new type in the path you return false.
>
> Since symbols and VR_VARYING and other things get turned into a
> [-MIN,+MAX] by extract_range_into_wide_ints, it's a lot easier to handle
> things generically and return varying when the range spans the entire
> domain.  It also keeps with the rest of the wide_int_range_* functions
> that return false when we know it's going to be VR_VARYING.

Ah, OK, didn't know they did that.  Not sure if that's convenient though
given VR_VARYING has no representation in the wide-int-range "range"
so callers chaining ops would need to do sth like

if (!wide_int_range_* (, _min, _max))
  {
out_min = wide_int::min (prec);
out_max = wide_int::max (prec);
  }
if (!wide_int_range_* (out_min, out_max, ))
...

to not lose cases where VARYING inputs produce non-VARYING results?

Richard.

> Aldy
>
> >
> > Richard.
> >
> >
> >
> >>
> >> Second.  I need extract_range_into_wide_ints to 

Re: VRP: abstract out wide int CONVERT_EXPR_P code

2018-09-04 Thread Aldy Hernandez




On 09/04/2018 07:58 AM, Richard Biener wrote:

On Mon, Sep 3, 2018 at 1:32 PM Aldy Hernandez  wrote:




On 08/28/2018 05:27 AM, Richard Biener wrote:

On Mon, Aug 27, 2018 at 2:24 PM Aldy Hernandez  wrote:


Howdy!

Phew, I think this is the last abstraction.  This handles the unary
CONVERT_EXPR_P code.

It's the usual story-- normalize the symbolics to [-MIN,+MAX] and handle
everything generically.

Normalizing the symbolics brought about some nice surprises.  We now
handle a few things we were punting on before, which I've documented in
the patch, but can remove if so desired.  I wrote them mainly for myself:

/* NOTES: Previously we were returning VARYING for all symbolics, but
  we can do better by treating them as [-MIN, +MAX].  For
  example, converting [SYM, SYM] from INT to LONG UNSIGNED,
  we can return: ~[0x800, 0x7fff].

  We were also failing to convert ~[0,0] from char* to unsigned,
  instead choosing to return VR_VARYING.  Now we return ~[0,0].  */

Tested on x86-64 by the usual bootstrap and regtest gymnastics,
including --enable-languages=all, because my past sins are still
haunting me.

OK?


The new wide_int_range_convert_tree looks odd given it returns
tree's.  I'd have expected an API that does the conversion resulting
in a wide_int range and the VRP code adapting to that by converting
the result to trees.


Rewritten as suggested.

A few notes.

First.  I am not using widest_int as was done previously.  We were
passing 0/false to the overflow args in force_fit_type so the call was
just degrading into wide_int_to_tree() anyhow.  Am I missing some
subtlety here, and must we use widest_int and then force_fit_type on the
caller?


The issue is that the wide-ints get "converted", so it's indeed subtle.


This seems like it should be documented-- at the very least in 
wide_int_range_convert.


What do you mean "converted"?  I'd like to understand this better before 
I write out the comment :).  Do we lose precision by keeping it in a 
wide_int as opposed to a widest_int?


Also, can we have the caller to wide_int_range_convert() use 
wide_int_to_tree directly instead of passing through force_fit_type?  It 
seems force_fit_type with overflowable=0 and overflowed=false is just a 
call to wide_int_to_tree anyhow.




+  || wi::rshift (wi::sub (vr0_max, vr0_min),
+wi::uhwi (outer_prec, inner_prec),
+inner_sign) == 0)

this was previously allowed only for VR_RANGEs - now it's also for
VR_ANTI_RANGE.
That looks incorrect.  Testcase welcome ;)


See change to extract_range_into_wide_ints.  VR_ANTI_RANGEs of constants 
will remain as is, whereas VR_ANTI_RANGEs of symbols will degrade into 
[-MIN,+MAX] and be handled generically.


Also, see comment:

+  /* We normalize everything to a VR_RANGE, but for constant
+anti-ranges we must handle them by leaving the final result
+as an anti range.  This allows us to convert things like
+~[0,5] seamlessly.  */



+  return (!wi::eq_p (min, wi::min_value (outer_prec, outer_sign))
+ || !wi::eq_p (max, wi::max_value (outer_prec, outer_sign)));

so you return false for VARYING?  Why?  I'd simply return true and
return VARYING in the new type in the path you return false.


Since symbols and VR_VARYING and other things get turned into a 
[-MIN,+MAX] by extract_range_into_wide_ints, it's a lot easier to handle 
things generically and return varying when the range spans the entire 
domain.  It also keeps with the rest of the wide_int_range_* functions 
that return false when we know it's going to be VR_VARYING.


Aldy



Richard.





Second.  I need extract_range_into_wide_ints to work with anti ranges of
constants.  It seems like only the unary handling of CONVERT_EXPR here
uses it that way, so I believe it should go into one patch.  If you
believe strongly otherwise, I could split out the 4 lines into a
separate patch, but I'd prefer not to.

Finally, I could use vr0_min.get_precision() instead of inner_precision,
but I think it's more regular and readable the way I have it.

Tested on all languages on x86-64 Linux.

OK for trunk?


Re: VRP: abstract out POINTER_TYPE_P handling

2018-09-04 Thread Richard Biener
On Tue, Sep 4, 2018 at 2:09 PM Aldy Hernandez  wrote:
>
>
>
> On 09/04/2018 07:42 AM, Richard Biener wrote:
> > On Thu, Aug 30, 2018 at 9:39 AM Aldy Hernandez  wrote:
> >>
> >>
> >>
> >> On 08/29/2018 12:32 PM, David Malcolm wrote:
> >>> On Wed, 2018-08-29 at 06:54 -0400, Aldy Hernandez wrote:
>  Never say never.  Just when I thought I was done...
> 
>  It looks like I need the special casing we do for pointer types in
>  extract_range_from_binary_expr_1.
> 
>  The code is simple enough that we could just duplicate it anywhere
>  we
>  need it, but I hate code duplication and keeping track of multiple
>  versions of the same thing.
> 
>  No change in functionality.
> 
>  Tested on x86-64 Linux with all languages.
> 
>  OK?
> >>>
> >>> A couple of nits I spotted:
> >>>
> >>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> >>> index f20730a85ba..228f20b5203 100644
> >>> --- a/gcc/tree-vrp.c
> >>> +++ b/gcc/tree-vrp.c
> >>> @@ -1275,6 +1275,32 @@ set_value_range_with_overflow (value_range ,
> >>>}
> >>>}
> >>>
> >>> +/* Value range wrapper for wide_int_range_pointer.  */
> >>> +
> >>> +static void
> >>> +vrp_range_pointer (value_range *vr,
> >>> +enum tree_code code, tree type,
> >>> +value_range *vr0, value_range *vr1)
> >>>
> >>> Looking at the signature of the function, I wondered what the source
> >>> and destination of the information is...
> >>
> >> vr being the destination and vr0/vr1 being the sources are standard
> >> operating procedure within tree-vrp.c.  All the functions are basically
> >> that, that's why I haven't bothered.
> >>
> >>>
> >>> Could vr0 and vr1 be const?
> >>>
> >>> ...which would require extract_range_into_wide_ints to take a const
> >>> value_range *
> >>
> >> Yes, but that would require changing all of tree-vrp.c to take const
> >> value_range's.  For instance, range_int_cst_p and a slew of other
> >> functions throughout.
> >>
> >>>
> >>> ... which would require range_int_cst_p to take a const value_range *
> >>>
> >>> (I *think* that's where the yak-shaving would end)
> >>>
> >>> +{
> >>> +  signop sign = TYPE_SIGN (type);
> >>> +  unsigned prec = TYPE_PRECISION (type);
> >>> +  wide_int vr0_min, vr0_max;
> >>> +  wide_int vr1_min, vr1_max;
> >>> +
> >>> +  extract_range_into_wide_ints (vr0, sign, prec, vr0_min, vr0_max);
> >>> +  extract_range_into_wide_ints (vr1, sign, prec, vr1_min, vr1_max);
> >>> +  wide_int_range_nullness n;
> >>> +  n = wide_int_range_pointer (code, sign, vr0_min, vr0_max, vr1_min,
> >>> vr1_max);
> >>> +  if (n == WIDE_INT_RANGE_UNKNOWN)
> >>> +set_value_range_to_varying (vr);
> >>> +  else if (n == WIDE_INT_RANGE_NULL)
> >>> +set_value_range_to_null (vr, type);
> >>> +  else if (n == WIDE_INT_RANGE_NONNULL)
> >>> +set_value_range_to_nonnull (vr, type);
> >>> +  else
> >>> +gcc_unreachable ();
> >>> +}
> >>> +
> >>>
> >>> Would it be better to use a "switch (n)" here, rather than a series of
> >>> "if"/"else if" for each enum value?
> >>
> >> I prefer ifs for a small amount of options, but having the compiler warn
> >> on missing enum alternatives is nice, so I've changed it.  Notice
> >> there's more code now though :-(.
> >
> > I don't like the names *_range_pointer, please change them to sth more
> > specific like *_range_pointer_binary_op.
>
> Sure.
>
> >
> > What's the advantage of "hiding" the resulting ranges behind the
> > wide_int_range_nullness enum rather than having regular range output?
>
> Our upcoming work has another representation for non-nulls in particular
> ([-MIN,-1][1,+MAX]), since we don't have anti ranges.  I want to share
> whatever VRP is currently doing for pointers, without having to depend
> on the internals of vrp (value_range *).
>
> > What's the value in separating pointer operations at all in the
> > wide-int interfaces?  IIRC the difference is that whenever unioning
> > is required that when it's a pointer result we should possibly
> > preserve non-nullness.  It's of course also doing less work overall.
>
> I don't follow.  What are you suggesting?

I'm thinking out loud.  Here adding sort-of a "preferencing" to the
more general handling of the ops would do the same trick, without
restricting that to "pointers".  For example if a pass would be interested
in knowing whether a divisor is zero it would also rather preserve
non-nullness and trade that for precision in other parts of the range,
say, if you had [-1, -1] [1, +INF] then the smallest unioning result
is [-1, +INF] but ~[0, 0] is also a valid result which preserves non-zeroness.

So for wide-int workers I don't believe in tagging sth as pointers.  Rather
than that ops might get one of

 enum vr_preference { VR_PREF_SMALL, VR_PREF_NONNULL, VR_PREF_NONNEGATIVE, ... }

?

> >
> > So - in the end I'm not convinced that adding this kind of interface
> > to the wide_int_ variants is worth it and I'd rather keep the existing
> > VRP code?
>
> 

Re: [PATCH] Optimise sqrt reciprocal multiplications

2018-09-04 Thread Kyrill Tkachov

Hi Richard,

On 31/08/18 12:07, Richard Biener wrote:
> On Thu, 30 Aug 2018, Kyrill Tkachov wrote:
>
>> Ping.
>>
>> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01496.html
>>
>> Thanks,
>> Kyrill
>>
>> On 23/08/18 18:09, Kyrill Tkachov wrote:
>>> Hi Richard,
>>>
>>> On 23/08/18 11:13, Richard Sandiford wrote:
 Kyrill  Tkachov  writes:
> Hi all,
>
> This patch aims to optimise sequences involving uses of 1.0 / sqrt (a)
> under -freciprocal-math and -funsafe-math-optimizations.
> In particular consider:
>
> x = 1.0 / sqrt (a);
> r1 = x * x;  // same as 1.0 / a
> r2 = a * x; // same as sqrt (a)
>
> If x, r1 and r2 are all used further on in the code, this can be
> transformed into:
> tmp1 = 1.0 / a
> tmp2 = sqrt (a)
> tmp3 = tmp1 * tmp2
> x = tmp3
> r1 = tmp1
> r2 = tmp2
 Nice optimisation :-)  Someone who knows the pass better should review,
 but:
>>>
>>> Thanks for the review.
>>>
 There seems to be an implicit assumption that this is a win even
 when the r1 and r2 assignments are only conditionally executed.
 That's probably true, but it might be worth saying explicitly.
>>>
>>> I'll admit I had not considered that case.
>>> I think it won't make a difference in practice, as the really expensive
>>> operations here
>>> are the sqrt and the division and they are on the executed path in either
>>> case and them
>>> becoming independent should be a benefit of its own.
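
For reference, the two shapes written out as standalone functions
(illustrative only; the pass itself works on GIMPLE, and the equivalence
relies on -funsafe-math-optimizations -freciprocal-math):

#include <math.h>

/* Before: one division, one sqrt, and two multiplications that depend
   on both of them through x.  */
void
before (double a, double *x, double *r1, double *r2)
{
  *x  = 1.0 / sqrt (a);
  *r1 = *x * *x;        /* == 1.0 / a   */
  *r2 = a * *x;         /* == sqrt (a)  */
}

/* After: the division and the sqrt no longer depend on each other and
   one multiplication is gone.  */
void
after (double a, double *x, double *r1, double *r2)
{
  double tmp1 = 1.0 / a;
  double tmp2 = sqrt (a);
  *x  = tmp1 * tmp2;
  *r1 = tmp1;
  *r2 = tmp2;
}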
>>>
> +/* Return TRUE if USE_STMT is a multiplication of DEF by A.  */
> +
> +static inline bool
> +is_mult_by (gimple *use_stmt, tree def, tree a)
> +{
> +  if (gimple_code (use_stmt) == GIMPLE_ASSIGN
> +  && gimple_assign_rhs_code (use_stmt) == MULT_EXPR)
> +{
> +  tree op0 = gimple_assign_rhs1 (use_stmt);
> +  tree op1 = gimple_assign_rhs2 (use_stmt);
> +
> +  return (op0 == def && op1 == a)
> +  || (op0 == a && op1 == def);
> +}
> +  return 0;
> +}
 Seems like is_square_of could now be a light-weight wrapper around this.
>>>
>>> Indeed, I've done the wrapping now.
>>>
> @@ -652,6 +669,180 @@ execute_cse_reciprocals_1 (gimple_stmt_iterator
> *def_gsi, tree def)
> occ_head = NULL;
>   }
>   +/* Transform sequences like
> +   x = 1.0 / sqrt (a);
> +   r1 = x * x;
> +   r2 = a * x;
> +   into:
> +   tmp1 = 1.0 / a;
> +   tmp2 = sqrt (a);
> +   tmp3 = tmp1 * tmp2;
> +   x = tmp3;
> +   r1 = tmp1;
> +   r2 = tmp2;
> +   depending on the uses of x, r1, r2.  This removes one multiplication
> and
> +   allows the sqrt and division operations to execute in parallel.
> +   DEF_GSI is the gsi of the initial division by sqrt that defines
> +   DEF (x in the example above).  */
> +
> +static void
> +optimize_recip_sqrt (gimple_stmt_iterator *def_gsi, tree def)
> +{
> +  use_operand_p use_p;
> +  imm_use_iterator use_iter;
> +  gimple *stmt = gsi_stmt (*def_gsi);
> +  tree x = def;
> +  tree orig_sqrt_ssa_name = gimple_assign_rhs2 (stmt);
> +  tree div_rhs1 = gimple_assign_rhs1 (stmt);
> +
> +  if (TREE_CODE (orig_sqrt_ssa_name) != SSA_NAME
> +  || TREE_CODE (div_rhs1) != REAL_CST
> +  || !real_equal (&TREE_REAL_CST (div_rhs1), &dconst1))
> +return;
> +
> +  gimple *sqrt_stmt = SSA_NAME_DEF_STMT (orig_sqrt_ssa_name);
> +  if (!is_gimple_call (sqrt_stmt)
> +  || !gimple_call_lhs (sqrt_stmt))
> +return;
> +
> +  gcall *call = as_a <gcall *> (sqrt_stmt);
 Very minor, but:

gcall *sqrt_stmt
  = dyn_cast <gcall *> (SSA_NAME_DEF_STMT (orig_sqrt_ssa_name));
if (!sqrt_stmt || !gimple_call_lhs (sqrt_stmt))
  return;

 would avoid the need for the separate as_a<>, and would mean that
 we only call gimple_call_* on gcalls.
>>>
>>> Ok.
>>>
> +  if (has_other_use)
> +{
> +  /* Using the two temporaries tmp1, tmp2 from above
> + the original x is now:
> + x = tmp1 * tmp2.  */
> +  gcc_assert (mult_ssa_name);
> +  gcc_assert (sqr_ssa_name);
> +  gimple_stmt_iterator gsi2 = gsi_for_stmt (stmt);
> +
> +  tree new_ssa_name
> += make_temp_ssa_name (TREE_TYPE (a), NULL,
> "recip_sqrt_transformed");
> +  gimple *new_stmt
> += gimple_build_assign (new_ssa_name, MULT_EXPR,
> +   mult_ssa_name, sqr_ssa_name);
> +  gsi_insert_before (&gsi2, new_stmt, GSI_SAME_STMT);
> +  gcc_assert (gsi_stmt (gsi2) == stmt);
> +  gimple_assign_set_rhs_from_tree (&gsi2, new_ssa_name);
> +  fold_stmt (&gsi2);
> +  update_stmt (stmt);
 In this case we're replacing the statement in its original position,
 so there's no real need to use a temporary.  It seems better to
 change the rhs_code, rhs1 and rhs2 of stmt in-place, with the same
 lhs 
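
As an aside, an in-place rewrite along those lines might look roughly
like the sketch below; this is only an illustration using the names
from the quoted patch and standard GIMPLE helpers, not the final code:

  /* Reuse the original statement: keep its lhs (x) and replace the
     division by the product of the two temporaries.  */
  gimple_assign_set_rhs_with_ops (def_gsi, MULT_EXPR,
                                  mult_ssa_name, sqr_ssa_name);
  fold_stmt_inplace (def_gsi);
  update_stmt (gsi_stmt (*def_gsi));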

Building old GCC with new GCC: stage1 32-bit libstdc++ fails to build after building 64-bit libstdc++

2018-09-04 Thread Matthew Krupcale
Hello,

I was recently trying to build an older multilib bootstrap GCC
(version 4.8.3) using a newer GCC (version 8.1), and I ran into an
issue where after building the 64-bit libstdc++ for the target, while
building the 32-bit libstdc++, I would get the following error[1]:

/builddir/build/BUILD/gcc-4.8.3-20140911/obj-x86_64-redhat-linux/./gcc/cc1plus:
/builddir/build/BUILD/gcc-4.8.3-20140911/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/src/.libs/libstdc++.so.6:
version `CXXABI_1.3.9' not found (required by
/builddir/build/BUILD/gcc-4.8.3-20140911/obj-x86_64-redhat-linux/./gcc/cc1plus)

What I think is happening is that the stage1 host GCC executable
cc1plus depends on the build libstdc++,

# objdump -p 
/builddir/build/BUILD/gcc-4.8.3-20140911/obj-x86_64-redhat-linux/./gcc/cc1plus
| tail -n 17 | head -n 5
Version References:
  required from libstdc++.so.6:
0x0bafd179 0x00 11 CXXABI_1.3.9
0x08922974 0x00 10 GLIBCXX_3.4
0x056bafd3 0x00 07 CXXABI_1.3

which defines CXXABI_1.3.9, among other things. However, the older
libstdc++ does not define CXXABI_1.3.9,

# objdump -p 
/builddir/build/BUILD/gcc-4.8.3-20140911/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/src/.libs/libstdc++.so
| grep 'CXXABI_1\.3\.9'

so after the target libstdc++ is built, because BASE_TARGET_EXPORTS
exports LD_LIBRARY_PATH which gives preference to the target
libraries, cc1plus fails to load.

There are also some other stage1 host GCC executables such as cc1 and
lto1 which also depend on CXXABI_1.3.9 from the build libstdc++. The
final, post-stage1 host GCC binaries cc1plus etc. do not depend on
libstdc++.so.6 at all, however, since they are linked using
-static-libstdc++.

To allow me to continue to build, I changed the BASE_TARGET_EXPORTS to
only export LD_LIBRARY_PATH with the target libraries excluding the
target libstdc++ so that the stage1 host programs would continue to
use the build libstdc++. Then in post-stage1, the target libstdc++
path is prepended to LD_LIBRARY_PATH.

I've attached two patches here: the first,
gcc48-libstdc++v3-CXXABI_1.3.9.patch, is the one that allowed me to
build[2] GCC 4.8.3 using GCC 8.1. The second,
gcc81-libstdc++v3-CXXABI_1.3.9.patch, is an analogous patch but for
GCC 8.1, which might allow future versions of GCC to build GCC 8.1
(not tested).

Now I also haven't tested what is the typical use case of building new
compilers with old ones with this patch, but it seems to me that it
makes sense for the stage1 host programs to use the build libstdc++ in
either case since they will depend on the build libstdc++ in either
case. This issue I had wouldn't show up when building new compilers
because presumably the new libstdc++ will define the versions required
by host programs from the old, build libstdc++.

Does this seem like a reasonable change in the general case, or have I
overlooked something?

Best,
Matthew Krupcale

[1] 
https://copr-be.cloud.fedoraproject.org/results/mkrupcale/gcc/fedora-28-x86_64/00789265-gcc/build.log.gz
[2] https://copr.fedorainfracloud.org/coprs/mkrupcale/gcc/build/789678/
diff --git a/Makefile.in b/Makefile.in
index bfbaf0341..9fb6ef128 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -291,7 +291,7 @@ BASE_TARGET_EXPORTS = \
 	WINDRES="$(WINDRES_FOR_TARGET)"; export WINDRES; \
 	WINDMC="$(WINDMC_FOR_TARGET)"; export WINDMC; \
 @if gcc-bootstrap
-	$(RPATH_ENVVAR)=`echo "$(TARGET_LIB_PATH)$$$(RPATH_ENVVAR)" | sed 's,::*,:,g;s,^:*,,;s,:*$$,,'`; export $(RPATH_ENVVAR); \
+	$(RPATH_ENVVAR)=`echo "$(TARGET_LIB_PATH_BUILD_LIBSTDC++)$$$(RPATH_ENVVAR)" | sed 's,::*,:,g;s,^:*,,;s,:*$$,,'`; export $(RPATH_ENVVAR); \
 @endif gcc-bootstrap
 	$(RPATH_ENVVAR)=`echo "$(HOST_LIB_PATH)$$$(RPATH_ENVVAR)" | sed 's,::*,:,g;s,^:*,,;s,:*$$,,'`; export $(RPATH_ENVVAR); \
 	TARGET_CONFIGDIRS="$(TARGET_CONFIGDIRS)"; export TARGET_CONFIGDIRS;
@@ -305,6 +305,18 @@ NORMAL_TARGET_EXPORTS = \
 	$(BASE_TARGET_EXPORTS) \
 	CXX="$(CXX_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS"; export CXX;
 
+# Use the target libstdc++ only after stage1 since the build libstdc++ is
+# required by some stage1 host modules (e.g. cc1, cc1plus, lto1) for
+# CXXABI_1.3.9
+POSTSTAGE1_RPATH_EXPORT = \
+@if target-libstdc++-v3-bootstrap
+	$(RPATH_ENVVAR)=`echo "$(TARGET_LIB_PATH_libstdc++-v3)$$$(RPATH_ENVVAR)" | sed 's,::*,:,g;s,^:*,,;s,:*$$,,'`; export $(RPATH_ENVVAR);
+@endif target-libstdc++-v3-bootstrap
+
+# Similar, for later GCC stages.
+POSTSTAGE1_TARGET_EXPORTS = \
+	$(POSTSTAGE1_RPATH_EXPORT)
+
 # Where to find GMP
 HOST_GMPLIBS = @gmplibs@
 HOST_GMPINC = @gmpinc@
@@ -576,6 +588,8 @@ all:
 # This is the list of directories that may be needed in RPATH_ENVVAR
 # so that programs built for the target machine work.
 TARGET_LIB_PATH = $(TARGET_LIB_PATH_libstdc++-v3)$(TARGET_LIB_PATH_libmudflap)$(TARGET_LIB_PATH_libsanitizer)$(TARGET_LIB_PATH_libssp)$(TARGET_LIB_PATH_libgomp)$(TARGET_LIB_PATH_libitm)$(TARGET_LIB_PATH_libatomic)$(HOST_LIB_PATH_gcc)
+# Use the build rather than the target 

Re: VRP: abstract out POINTER_TYPE_P handling

2018-09-04 Thread Aldy Hernandez




On 09/04/2018 07:42 AM, Richard Biener wrote:

On Thu, Aug 30, 2018 at 9:39 AM Aldy Hernandez  wrote:




On 08/29/2018 12:32 PM, David Malcolm wrote:

On Wed, 2018-08-29 at 06:54 -0400, Aldy Hernandez wrote:

Never say never.  Just when I thought I was done...

It looks like I need the special casing we do for pointer types in
extract_range_from_binary_expr_1.

The code is simple enough that we could just duplicate it anywhere
we
need it, but I hate code duplication and keeping track of multiple
versions of the same thing.

No change in functionality.

Tested on x86-64 Linux with all languages.

OK?


A couple of nits I spotted:

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index f20730a85ba..228f20b5203 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -1275,6 +1275,32 @@ set_value_range_with_overflow (value_range &vr,
   }
   }

+/* Value range wrapper for wide_int_range_pointer.  */
+
+static void
+vrp_range_pointer (value_range *vr,
+enum tree_code code, tree type,
+value_range *vr0, value_range *vr1)

Looking at the signature of the function, I wondered what the source
and destination of the information is...


vr being the destination and vr0/vr1 being the sources are standard
operating procedure within tree-vrp.c.  All the functions are basically
that, that's why I haven't bothered.



Could vr0 and vr1 be const?

...which would require extract_range_into_wide_ints to take a const
value_range *


Yes, but that would require changing all of tree-vrp.c to take const
value_range's.  For instance, range_int_cst_p and a slew of other
functions throughout.



... which would require range_int_cst_p to take a const value_range *

(I *think* that's where the yak-shaving would end)

+{
+  signop sign = TYPE_SIGN (type);
+  unsigned prec = TYPE_PRECISION (type);
+  wide_int vr0_min, vr0_max;
+  wide_int vr1_min, vr1_max;
+
+  extract_range_into_wide_ints (vr0, sign, prec, vr0_min, vr0_max);
+  extract_range_into_wide_ints (vr1, sign, prec, vr1_min, vr1_max);
+  wide_int_range_nullness n;
+  n = wide_int_range_pointer (code, sign, vr0_min, vr0_max, vr1_min,
vr1_max);
+  if (n == WIDE_INT_RANGE_UNKNOWN)
+set_value_range_to_varying (vr);
+  else if (n == WIDE_INT_RANGE_NULL)
+set_value_range_to_null (vr, type);
+  else if (n == WIDE_INT_RANGE_NONNULL)
+set_value_range_to_nonnull (vr, type);
+  else
+gcc_unreachable ();
+}
+

Would it be better to use a "switch (n)" here, rather than a series of
"if"/"else if" for each enum value?


I prefer ifs for a small amount of options, but having the compiler warn
on missing enum alternatives is nice, so I've changed it.  Notice
there's more code now though :-(.
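
Roughly, the switch form looks something like the sketch below (just an
illustration built from the names in the quoted hunk; the actual patch
may differ):

  switch (n)
    {
    case WIDE_INT_RANGE_UNKNOWN:
      set_value_range_to_varying (vr);
      return;
    case WIDE_INT_RANGE_NULL:
      set_value_range_to_null (vr, type);
      return;
    case WIDE_INT_RANGE_NONNULL:
      set_value_range_to_nonnull (vr, type);
      return;
    }
  gcc_unreachable ();

Leaving out a default label is what lets the compiler warn if a new
enumerator is added later.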


I don't like the names *_range_pointer, please change them to sth more
specific like *_range_pointer_binary_op.


Sure.



What's the advantage of "hiding" the resulting ranges behind the
wide_int_range_nullness enum rather than having regular range output?


Our upcoming work has another representation for non-nulls in particular 
([-MIN,-1][1,+MAX]), since we don't have anti ranges.  I want to share 
whatever VRP is currently doing for pointers, without having to depend 
on the internals of vrp (value_range *).



What's the value in separating pointer operations at all in the
wide-int interfaces?  IIRC the difference is that whenever unioning
is required that when it's a pointer result we should possibly
preserve non-nullness.  It's of course also doing less work overall.


I don't follow.  What are you suggesting?



So - in the end I'm not convinced that adding this kind of interface
to the wide_int_ variants is worth it and I'd rather keep the existing
VRP code?


Again, I don't want to depend on vr_values or VRP in general.

Aldy


[PATCH] Fix PR87211

2018-09-04 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-09-04  Richard Biener  

PR tree-optimization/87211
* tree-ssa-sccvn.c (visit_phi): When value-numbering to a
backedge value we're supposed to treat as VARYING also number
the PHI to VARYING in case it got a different value-number already.

* gcc.dg/torture/pr87211.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 264078)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -4180,7 +4180,8 @@ visit_phi (gimple *phi, bool *inserted,
   }
 
   /* If the value we want to use is the backedge and that wasn't visited
- yet drop to VARYING.  This only happens when not iterating.
+ yet or if we should take it as VARYING but it has a non-VARYING
+ value drop to VARYING.  This only happens when not iterating.
  If we value-number a virtual operand never value-number to the
  value from the backedge as that confuses the alias-walking code.
  See gcc.dg/torture/pr87176.c.  If the value is the same on a
@@ -4190,7 +4191,8 @@ visit_phi (gimple *phi, bool *inserted,
   && TREE_CODE (backedge_val) == SSA_NAME
   && sameval == backedge_val
   && (SSA_NAME_IS_VIRTUAL_OPERAND (backedge_val)
- || !SSA_VISITED (backedge_val)))
+ || !SSA_VISITED (backedge_val)
+ || SSA_VAL (backedge_val) != backedge_val))
 /* Note this just drops to VARYING without inserting the PHI into
the hashes.  */
 result = PHI_RESULT (phi);
Index: gcc/testsuite/gcc.dg/torture/pr87211.c
===
--- gcc/testsuite/gcc.dg/torture/pr87211.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr87211.c  (working copy)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+
+int a, b;
+int i(int *);
+int *c(int *d, int *e)
+{
+  for (; b;)
+d = e;
+  return d;
+}
+void f()
+{
+  for (;;)
+{
+  int *g[1];
+  int h = 0;
+  for (; h < 3; h++)
+   g[0] = 
+   == g[0] || i(c((int *)g, g[0]));
+}
+}


Re: VRP: abstract out wide int CONVERT_EXPR_P code

2018-09-04 Thread Richard Biener
On Mon, Sep 3, 2018 at 1:32 PM Aldy Hernandez  wrote:
>
>
>
> On 08/28/2018 05:27 AM, Richard Biener wrote:
> > On Mon, Aug 27, 2018 at 2:24 PM Aldy Hernandez  wrote:
> >>
> >> Howdy!
> >>
> >> Phew, I think this is the last abstraction.  This handles the unary
> >> CONVERT_EXPR_P code.
> >>
> >> It's the usual story-- normalize the symbolics to [-MIN,+MAX] and handle
> >> everything generically.
> >>
> >> Normalizing the symbolics brought about some nice surprises.  We now
> >> handle a few things we were punting on before, which I've documented in
> >> the patch, but can remove if so desired.  I wrote them mainly for myself:
> >>
> >> /* NOTES: Previously we were returning VARYING for all symbolics, but
> >>  we can do better by treating them as [-MIN, +MAX].  For
> >>  example, converting [SYM, SYM] from INT to LONG UNSIGNED,
> >>  we can return: ~[0x800, 0x7fff].
> >>
> >>  We were also failing to convert ~[0,0] from char* to unsigned,
> >>  instead choosing to return VR_VARYING.  Now we return ~[0,0].  */
> >>
> >> Tested on x86-64 by the usual bootstrap and regtest gymnastics,
> >> including --enable-languages=all, because my past sins are still
> >> haunting me.
> >>
> >> OK?
> >
> > The new wide_int_range_convert_tree looks odd given it returns
> > tree's.  I'd have expected an API that does the conversion resulting
> > in a wide_int range and the VRP code adapting to that by converting
> > the result to trees.
>
> Rewritten as suggested.
>
> A few notes.
>
> First.  I am not using widest_int as was done previously.  We were
> passing 0/false to the overflow args in force_fit_type so the call was
> just degrading into wide_int_to_tree() anyhow.  Am I missing some
> subtlety here, and must we use widest_int and then force_fit_type on the
> caller?

The issue is that the wide-ints get "converted", so it's indeed subtle.

+  || wi::rshift (wi::sub (vr0_max, vr0_min),
+wi::uhwi (outer_prec, inner_prec),
+inner_sign) == 0)

this was previously allowed only for VR_RANGEs - now it's also for
VR_ANTI_RANGE.
That looks incorrect.  Testcase welcome ;)

+  return (!wi::eq_p (min, wi::min_value (outer_prec, outer_sign))
+ || !wi::eq_p (max, wi::max_value (outer_prec, outer_sign)));

so you return false for VARYING?  Why?  I'd simply return true and
return VARYING in the new type in the path you return false.

Richard.



>
> Second.  I need extract_range_into_wide_ints to work with anti ranges of
> constants.  It seems like only the unary handling of CONVERT_EXPR here
> uses it that way, so I believe it should go into one patch.  If you
> believe strongly otherwise, I could split out the 4 lines into a
> separate patch, but I'd prefer not to.
>
> Finally, I could use vr0_min.get_precision() instead of inner_precision,
> but I think it's more regular and readable the way I have it.
>
> Tested on all languages on x86-64 Linux.
>
> OK for trunk?


Re: [PATCH v2 1/6] [MIPS] Split Loongson (MMI) from loongson3a

2018-09-04 Thread Paul Hua
Hi Terry,

Thanks for your comments.
>
> For the new files, I think the copyright year should be just 2018.
>
The loongson-mmi.md file is renamed from loongson.md, so I think its
copyright years should include those of the old file.

But in gs464e.md and gs264e.md the copyright year will be just 2018;
I will update them.

Paul


Re: VRP: abstract out POINTER_TYPE_P handling

2018-09-04 Thread Richard Biener
On Thu, Aug 30, 2018 at 9:39 AM Aldy Hernandez  wrote:
>
>
>
> On 08/29/2018 12:32 PM, David Malcolm wrote:
> > On Wed, 2018-08-29 at 06:54 -0400, Aldy Hernandez wrote:
> >> Never say never.  Just when I thought I was done...
> >>
> >> It looks like I need the special casing we do for pointer types in
> >> extract_range_from_binary_expr_1.
> >>
> >> The code is simple enough that we could just duplicate it anywhere
> >> we
> >> need it, but I hate code duplication and keeping track of multiple
> >> versions of the same thing.
> >>
> >> No change in functionality.
> >>
> >> Tested on x86-64 Linux with all languages.
> >>
> >> OK?
> >
> > A couple of nits I spotted:
> >
> > diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> > index f20730a85ba..228f20b5203 100644
> > --- a/gcc/tree-vrp.c
> > +++ b/gcc/tree-vrp.c
> > @@ -1275,6 +1275,32 @@ set_value_range_with_overflow (value_range &vr,
> >   }
> >   }
> >
> > +/* Value range wrapper for wide_int_range_pointer.  */
> > +
> > +static void
> > +vrp_range_pointer (value_range *vr,
> > +enum tree_code code, tree type,
> > +value_range *vr0, value_range *vr1)
> >
> > Looking at the signature of the function, I wondered what the source
> > and destination of the information is...
>
> vr being the destination and vr0/vr1 being the sources are standard
> operating procedure within tree-vrp.c.  All the functions are basically
> that, that's why I haven't bothered.
>
> >
> > Could vr0 and vr1 be const?
> >
> > ...which would require extract_range_into_wide_ints to take a const
> > value_range *
>
> Yes, but that would require changing all of tree-vrp.c to take const
> value_range's.  For instance, range_int_cst_p and a slew of other
> functions throughout.
>
> >
> >... which would require range_int_cst_p to take a const value_range *
> >
> > (I *think* that's where the yak-shaving would end)
> >
> > +{
> > +  signop sign = TYPE_SIGN (type);
> > +  unsigned prec = TYPE_PRECISION (type);
> > +  wide_int vr0_min, vr0_max;
> > +  wide_int vr1_min, vr1_max;
> > +
> > +  extract_range_into_wide_ints (vr0, sign, prec, vr0_min, vr0_max);
> > +  extract_range_into_wide_ints (vr1, sign, prec, vr1_min, vr1_max);
> > +  wide_int_range_nullness n;
> > +  n = wide_int_range_pointer (code, sign, vr0_min, vr0_max, vr1_min,
> > vr1_max);
> > +  if (n == WIDE_INT_RANGE_UNKNOWN)
> > +set_value_range_to_varying (vr);
> > +  else if (n == WIDE_INT_RANGE_NULL)
> > +set_value_range_to_null (vr, type);
> > +  else if (n == WIDE_INT_RANGE_NONNULL)
> > +set_value_range_to_nonnull (vr, type);
> > +  else
> > +gcc_unreachable ();
> > +}
> > +
> >
> > Would it be better to use a "switch (n)" here, rather than a series of
> > "if"/"else if" for each enum value?
>
> I prefer ifs for a small amount of options, but having the compiler warn
> on missing enum alternatives is nice, so I've changed it.  Notice
> there's more code now though :-(.

I don't like the names *_range_pointer, please change them to sth more
specific like *_range_pointer_binary_op.

What's the advantage of "hiding" the resulting ranges behind the
wide_int_range_nullness enum rather than having regular range output?
What's the value in separating pointer operations at all in the
wide-int interfaces?  IIRC the difference is that whenever unioning
is required that when it's a pointer result we should possibly
preserve non-nullness.  It's of course also doing less work overall.

So - in the end I'm not convinced that adding this kind of interface
to the wide_int_ variants is worth it and I'd rather keep the existing
VRP code?

Richard.

>
> Aldy


Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-09-04 Thread Kyrill Tkachov

Hi Steve,

On 20/08/18 18:37, Steve Ellcey wrote:

On Tue, 2018-08-07 at 12:15 -0500, Segher Boessenkool wrote:

> > +/* { dg-final { scan-assembler-not "\[ \t\]stp\tq\[01234567\]" } }
> > */
> That's [0-7] but maybe you find [01234567] more readable here.

Segher,  I fixed all the issues you pointed out except this one.  Since
there are some uses of non consecutive numbers in one of the tests I
decided to leave [01234567] instead of using [0-7].  Here is the
latest version of the patch, there are no semantic changes, just
syntactical ones to address the issues that you pointed out.

Steve Ellcey
sell...@cavium.com



One more comment below.
It looks sensible enough to me otherwise, but I haven't done a deep review of 
the logic.

Thanks,
Kyrill



2018-08-20  Steve Ellcey  

* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
New prototype.
(aarch64_epilogue_uses): Ditto.
* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_is_simd_call_p): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_store_pair): Handle E_TFmode.
(aarch64_gen_load_pair): Ditto.
(aarch64_save_callee_saves): Handle different mode sizes.
(aarch64_restore_callee_saves): Ditto.
(aarch64_components_for_bb): Check for simd function.
(aarch64_epilogue_uses): New function.
(aarch64_process_components): Check for simd function.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
(FP_SIMD_SAVED_REGNUM_P): New macro.
* config/aarch64/aarch64.md (V23_REGNUM) New constant.
(simple_return): New define_expand.
(load_pair_dw_tftf): New instruction.
(store_pair_dw_tftf): Ditto.
(loadwb_pair_): Ditto.
("storewb_pair_): Ditto.

2018-08-20  Steve Ellcey  

* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
* gcc.target/aarch64/torture/simd-abi-1.c: New test.
* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.


@@ -4469,6 +4536,9 @@ aarch64_restore_callee_saves (machine_mode mode,
   unsigned regno;
   unsigned regno2;
   poly_int64 offset;
+  HOST_WIDE_INT mode_size;
+
+  gcc_assert (GET_MODE_SIZE (mode).is_constant (&mode_size));
 
I think you want to use the gcc_unreachable approach here:


  if (!GET_MODE_SIZE (mode).is_constant (&mode_size))
gcc_unreachable ();

just in case the gcc_assert is compiled out.





Re: VRP: abstract out bitwise AND/OR optimizations

2018-09-04 Thread Richard Biener
On Wed, Aug 29, 2018 at 12:55 PM Aldy Hernandez  wrote:
>
> I thought I was done with this one as well, but it looks like we can
> push more things out to wide-int-range.*.
>
> I also pushed the optimization itself into wide_int_range_bit_{and,or},
> for maximal sharing.
>
> And finally, I split out getting the mask and the bounds into its own
> function (wide_int_range_get_mask_and_bounds), because I have an
> upcoming optimization that uses this information, and there's no sense
> duplicating work :).
>
> No change in functionality.
>
> Tested on x86-64 Linux with all languages.
>
> OK?

OK and sorry for the delay.

Richard.

>


Re: Relocation (= move+destroy)

2018-09-04 Thread Jonathan Wakely

On 01/09/18 17:00 +0200, Marc Glisse wrote:
_GLIBCXX_ASAN_ANNOTATE_REINIT: I am not familiar with those 
annotations. It was convenient in one function to move this annotation 
after _Destroy, to reduce code duplication. For consistency, I did the 
same in the whole file. As far as I understand, the macro makes it ok 
to access memory between _M_finish and _M_end_of_storage, and at the 
end of the block marks again the region after the new _M_finish as 
protected. Since _Destroy should stop at _M_finish, moving the macro 
looks safe. But maybe the position of the macro was chosen to reduce 
needless checking in ASAN?


I don't remember if I had a specific reason for doing the REINIT
before _Destroy, but your change looks fine.

I'll keep reviewing the rest, but my first impression is that this is
a nice optimisation, thanks for working on it.




Re: [PATCH] Fix PR87176

2018-09-04 Thread Richard Biener
On Mon, 3 Sep 2018, Richard Biener wrote:

> 
> The following fixes a wrong-code issue similar to that in PR87132 where
> the alias walk reaches into code parts that are only reachable over
> backedges.  This time not via PHI processing but by value-numbering
> a virtual PHI to its backedge value.  That doesn't play well with the
> way we do iteration for the memory state.
> 
> This also adds a testcase showing the desirable way of doing virtual
> PHI value-numbering (ideally the walking code would honor edge
> executability state but that would tie it even more to VN...).
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.

This is what I have applied.

Richard.

2018-09-04  Richard Biener  

PR tree-optimization/87176
* tree-ssa-sccvn.c (visit_phi): Remove redundant allsame
variable.  When value-numbering a virtual PHI node make sure
to not value-number to the backedge value.

* gcc.dg/torture/pr87176.c: New testcase.
* gcc.dg/torture/ssa-fre-1.c: Likewise.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 264069)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -4110,11 +4110,11 @@ static bool
 visit_phi (gimple *phi, bool *inserted, bool backedges_varying_p)
 {
   tree result, sameval = VN_TOP, seen_undef = NULL_TREE;
-  tree backedge_name = NULL_TREE;
+  tree backedge_val = NULL_TREE;
+  bool seen_non_backedge = false;
   tree sameval_base = NULL_TREE;
   poly_int64 soff, doff;
   unsigned n_executable = 0;
-  bool allsame = true;
   edge_iterator ei;
   edge e;
 
@@ -4139,11 +4139,13 @@ visit_phi (gimple *phi, bool *inserted,
++n_executable;
if (TREE_CODE (def) == SSA_NAME)
  {
-   if (e->flags & EDGE_DFS_BACK)
- backedge_name = def;
if (!backedges_varying_p || !(e->flags & EDGE_DFS_BACK))
  def = SSA_VAL (def);
+   if (e->flags & EDGE_DFS_BACK)
+ backedge_val = def;
  }
+   if (!(e->flags & EDGE_DFS_BACK))
+ seen_non_backedge = true;
if (def == VN_TOP)
  ;
/* Ignore undefined defs for sameval but record one.  */
@@ -4172,16 +4174,25 @@ visit_phi (gimple *phi, bool *inserted,
 && known_eq (soff, doff))
  continue;
  }
-   allsame = false;
+   sameval = NULL_TREE;
break;
  }
   }
 
   /* If the value we want to use is the backedge and that wasn't visited
- yet drop to VARYING.  */ 
-  if (backedge_name
-  && sameval == backedge_name
-  && !SSA_VISITED (backedge_name))
+ yet drop to VARYING.  This only happens when not iterating.
+ If we value-number a virtual operand never value-number to the
+ value from the backedge as that confuses the alias-walking code.
+ See gcc.dg/torture/pr87176.c.  If the value is the same on a
+ non-backedge everything is OK though.  */
+  if (backedge_val
+  && !seen_non_backedge
+  && TREE_CODE (backedge_val) == SSA_NAME
+  && sameval == backedge_val
+  && (SSA_NAME_IS_VIRTUAL_OPERAND (backedge_val)
+ || !SSA_VISITED (backedge_val)))
+/* Note this just drops to VARYING without inserting the PHI into
+   the hashes.  */
 result = PHI_RESULT (phi);
   /* If none of the edges was executable keep the value-number at VN_TOP,
   if only a single edge is executable use its value.  */
@@ -4212,7 +4223,7 @@ visit_phi (gimple *phi, bool *inserted,
   /* If all values are the same use that, unless we've seen undefined
  values as well and the value isn't constant.
  CCP/copyprop have the same restriction to not remove uninit warnings.  */
-  else if (allsame
+  else if (sameval
   && (! seen_undef || is_gimple_min_invariant (sameval)))
 result = sameval;
   else
Index: gcc/testsuite/gcc.dg/torture/pr87176.c
===
--- gcc/testsuite/gcc.dg/torture/pr87176.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr87176.c  (working copy)
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+
+int a, b, c;
+
+int main ()
+{
+  int d = a = 0;
+  while (1)
+{
+  a = a ^ 6;
+  if (!a)
+   break;
+  if (d)
+   goto L;
+  d = a;
+  for (b = 0; b < 2; b++)
+   {
+ const int *f[3] = {  };
+ const int **g[] = { [2] };
+ int h = ~d;
+ if (d)
+   L:
+   if (h > 1)
+ continue;
+   }
+}
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/torture/ssa-fre-1.c
===
--- gcc/testsuite/gcc.dg/torture/ssa-fre-1.c(nonexistent)
+++ gcc/testsuite/gcc.dg/torture/ssa-fre-1.c(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
+/* { dg-additional-options "-fstrict-aliasing 

Re: [PATCH] Update C Extensions docs for support in latest C++

2018-09-04 Thread Joseph Myers
On Tue, 4 Sep 2018, Jonathan Wakely wrote:

> The docs on 'long long' and hexadecimal floating-point literals are
> outdated w.r.t what C++ supports.
> 
> OK for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Fix a minor copyright year typo

2018-09-04 Thread wei xiao
Hi,

According to the commit log, this file was created in 2018, so the
copyright year should be corrected to 2018.
Is the patch below OK to fix this problem?

Thanks
Wei Xiao

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 62e32c6..416dbd1 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2018-09-04  Wei Xiao  
+
+   * gcc/config/i386/movdirintrin.h: Update copyright year.
+
 2018-09-03  Richard Biener  

PR tree-optimization/87177
diff --git a/gcc/config/i386/movdirintrin.h b/gcc/config/i386/movdirintrin.h
index 8b4d0b3..75a5552 100644
--- a/gcc/config/i386/movdirintrin.h
+++ b/gcc/config/i386/movdirintrin.h
@@ -1,4 +1,4 @@
-/* Copyright (C) 2017 Free Software Foundation, Inc.
+/* Copyright (C) 2018 Free Software Foundation, Inc.

This file is part of GCC.


[PATCH] Update C Extensions docs for support in latest C++

2018-09-04 Thread Jonathan Wakely

The docs on 'long long' and hexadecimal floating-point literals are
outdated w.r.t what C++ supports.

OK for trunk?


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 981e7063f97..8e31ec6b610 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2018-09-04  Jonathan Wakely  
+
+	* doc/extend.texi (Long Long, Hex Floats): Document support for
+	long long and hex floats in more recent versions of ISO C++.
+
 2018-09-03  Richard Biener  
 
 	PR tree-optimization/87200
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 9c3cfdbf022..4b606c007d8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -843,8 +843,8 @@ for targets with @code{long long} integer less than 128 bits wide.
 @cindex @code{LL} integer suffix
 @cindex @code{ULL} integer suffix
 
-ISO C99 supports data types for integers that are at least 64 bits wide,
-and as an extension GCC supports them in C90 mode and in C++.
+ISO C99 and ISO C++11 support data types for integers that are at least
+64 bits wide, and as an extension GCC supports them in C90 and C++98 modes.
 Simply write @code{long long int} for a signed integer, or
 @code{unsigned long long int} for an unsigned integer.  To make an
 integer constant of type @code{long long int}, add the suffix @samp{LL}
@@ -1121,11 +1121,11 @@ are supported by the DWARF debug information format.
 @section Hex Floats
 @cindex hex floats
 
-ISO C99 supports floating-point numbers written not only in the usual
-decimal notation, such as @code{1.55e1}, but also numbers such as
+ISO C99 and ISO C++17 support floating-point numbers written not only in
+the usual decimal notation, such as @code{1.55e1}, but also numbers such as
 @code{0x1.fp3} written in hexadecimal format.  As a GNU extension, GCC
 supports this in C90 mode (except in some cases when strictly
-conforming) and in C++.  In that format the
+conforming) and in C++98, C++11 and C++14 modes.  In that format the
 @samp{0x} hex introducer and the @samp{p} or @samp{P} exponent field are
 mandatory.  The exponent is a decimal number that indicates the power of
 2 by which the significant part is multiplied.  Thus @samp{0x1.f} is


Re: [PATCH] Add -Wabsolute-value

2018-09-04 Thread Martin Jambor
Hi,

On Fri, Aug 31 2018, Joseph Myers wrote:
> On Fri, 31 Aug 2018, Martin Jambor wrote:
>
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index ebc3ef43ce2..2950760fb2a 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -815,6 +815,10 @@ Wvector-operation-performance
>>  Common Var(warn_vector_operation_performance) Warning
>>  Warn when a vector operation is compiled outside the SIMD.
>>  
>> +Wabsolute-value
>> +Common Var(warn_absolute_value) Warning EnabledBy(Wextra)
>> +Warn on suspicious calls of standard functions computing absolute values.
>
> If it's C-specific I'd expect it to be in c.opt (in the appropriate 
> alphabetical position) and have "C ObjC" instead of Common.

I see, fixed.

>
>> +@item -Wabsolute-value
>> +@opindex Wabsolute-value
>> +@opindex Wno-absolute-value
>> +Warn when a wrong absolute value function seems to be used or when it
>> +does not have any effect because its argument is an unsigned type.
>> +This warning is specific to C language and can be suppressed with an
>> +explicit type cast.  This warning is also enabled by @option{-Wextra}.
>
> And then the @item would have @r{(C and Objective-C only)}, similar to 
> other such options (rather than saying "specific to C language" in the 
> text).
>

Likewise.

I have bootstrapped and tested the updated (and re-based) patch on
x86-64-linux.  OK for trunk now?

Thank you very much,

Martin
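
P.S. For reference, a couple of hypothetical snippets of the kind of
code the warning is meant to flag, going by the description above (they
are not taken from the new tests):

  #include <stdlib.h>

  double fd (double x)
  {
    return abs (x);   /* integer abs applied to a floating-point value;
                         fabs was probably intended  */
  }

  unsigned fu (unsigned x)
  {
    return abs (x);   /* taking the absolute value of an unsigned value
                         has no effect  */
  }

The idea is that both currently compile silently and would be diagnosed
under -Wabsolute-value (or -Wextra).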


2018-09-03  Martin Jambor  

gcc/
* doc/invoke.texi (Warning Options): Likewise.

gcc/c-family/
* c.opt (Wabsolute-value): New.

gcc/c/
* c-parser.c: (warn_for_abs): New function.
(c_parser_postfix_expression_after_primary): Call it.

testsuite/
* gcc.dg/warn-abs-1.c: New test.
* gcc.dg/dfp/warn-abs-2.c: Likewise.
---
 gcc/c-family/c.opt|   4 +
 gcc/c/c-parser.c  | 156 +-
 gcc/doc/invoke.texi   |   8 ++
 gcc/testsuite/gcc.dg/dfp/warn-abs-2.c |  28 +
 gcc/testsuite/gcc.dg/warn-abs-1.c |  67 +++
 5 files changed, 257 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/dfp/warn-abs-2.c
 create mode 100644 gcc/testsuite/gcc.dg/warn-abs-1.c

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 31a2b972919..092ec940d86 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -271,6 +271,10 @@ Warn if a subobject has an abi_tag attribute that the 
complete object type does
 Wpsabi
 C ObjC C++ ObjC++ LTO Var(warn_psabi) Init(1) Warning Undocumented 
LangEnabledBy(C ObjC C++ ObjC++,Wabi)
 
+Wabsolute-value
+C ObjC Var(warn_absolute_value) Warning EnabledBy(Wextra)
+Warn on suspicious calls of standard functions computing absolute values.
+
 Waddress
 C ObjC C++ ObjC++ Var(warn_address) Warning LangEnabledBy(C ObjC C++ 
ObjC++,Wall)
 Warn about suspicious uses of memory addresses.
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 69ed5ae9d2f..1766a256633 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -9101,6 +9101,144 @@ sizeof_ptr_memacc_comptypes (tree type1, tree type2)
   return comptypes (type1, type2) == 1;
 }
 
+/* Warn for patterns where abs-like function appears to be used incorrectly,
+   gracely ignore any non-abs-like function.  The warning location should be
+   LOC.  FNDECL is the declaration of called function, it must be a
+   BUILT_IN_NORMAL function.  ARG is the first and only argument of the
+   call.  */
+
+static void
+warn_for_abs (location_t loc, tree fndecl, tree arg)
+{
+  tree atype = TREE_TYPE (arg);
+
+  /* Casts from pointers (and thus arrays and fndecls) will generate
+ -Wint-conversion warnings.  Most other wrong types hopefully lead to type
+ mismatch errors.  TODO: Think about what to do with FIXED_POINT_TYPE_P
+ types and possibly other exotic types.  */
+  if (!INTEGRAL_TYPE_P (atype)
+  && !SCALAR_FLOAT_TYPE_P (atype)
+  && TREE_CODE (atype) != COMPLEX_TYPE)
+return;
+
+  enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl);
+
+  switch (fcode)
+{
+case BUILT_IN_ABS:
+case BUILT_IN_LABS:
+case BUILT_IN_LLABS:
+case BUILT_IN_IMAXABS:
+  if (!INTEGRAL_TYPE_P (atype))
+   {
+ if (SCALAR_FLOAT_TYPE_P (atype))
+   warning_at (loc, OPT_Wabsolute_value,
+   "using integer absolute value function %qD when "
+   "argument is of floating point type %qT",
+   fndecl, atype);
+ else if (TREE_CODE (atype) == COMPLEX_TYPE)
+   warning_at (loc, OPT_Wabsolute_value,
+   "using integer absolute value function %qD when "
+   "argument is of complex type %qT", fndecl, atype);
+ else
+   gcc_unreachable ();
+ return;
+   }
+  if (TYPE_UNSIGNED (atype))
+   warning_at (loc, OPT_Wabsolute_value,
+   "taking the absolute value of unsigned type %qT "
+   

Re: [PATCH v2 1/6] [MIPS] Split Loongson (MMI) from loongson3a

2018-09-04 Thread Terry Guo
On Tue, Sep 4, 2018 at 11:53 AM, Paul Hua  wrote:
> On Mon, Sep 3, 2018 at 8:29 PM Paul Hua  wrote:
>>
>>
>
> Hi:
>
> The v2 patch adds:
> * gcc/doc/invoke.texi (-mloongson-mmi): Document.
>
> Thanks
> Paul Hua

Hi Paul,

For the new files, I think the copyright year should be just 2018.

diff --git a/gcc/config/mips/loongson-mmi.md b/gcc/config/mips/loongson-mmi.md
new file mode 100644
index 000..ad23f67
--- /dev/null
+++ b/gcc/config/mips/loongson-mmi.md
@@ -0,0 +1,903 @@
+;; Machine description for Loongson MultiMedia extensions Instructions (MMI).
+;; Copyright (C) 2008-2018 Free Software Foundation, Inc.
+;; Contributed by CodeSourcery.
+;;

BR,
Terry


Re: [PATCH] genmatch: isolate reporting into a function

2018-09-04 Thread Richard Biener
On Mon, Sep 3, 2018 at 4:19 PM Martin Liška  wrote:
>
> On 09/03/2018 03:31 PM, Richard Biener wrote:
> > On Mon, Sep 3, 2018 at 2:09 PM Martin Liška  wrote:
> >>
> >> On 09/03/2018 10:19 AM, Richard Biener wrote:
> >>> On Mon, Sep 3, 2018 at 9:56 AM Martin Liška  wrote:
> 
>  Hi.
> 
>  This is small improvement for {gimple,generic}-match.c files.
>  The code path that reports application of a pattern is now guarded
>  with __builtin_expect. And reporting function lives in gimple.c.
> 
>  For gimple-match.o, difference is:
> 
>  bloaty /tmp/after.o -- /tmp/before.o
>   VM SIZE   FILE SIZE
>   ++ GROWING ++
>    [ = ]   0 .rela.debug_loc +58.5Ki  +0.5%
>    +0.7% +7.70Ki .text   +7.70Ki  +0.7%
>    [ = ]   0 .debug_info +3.53Ki  +0.6%
>    [ = ]   0 .rela.debug_ranges  +2.02Ki  +0.0%
>    [ = ]   0 .debug_loc  +1.86Ki  +0.7%
>    +0.7%+448 .eh_frame  +448  +0.7%
>    [ = ]   0 .rela.eh_frame +192  +0.7%
>    [ = ]   0 .rela.debug_line+48  +0.4%
>    [ = ]   0 .debug_str  +26  +0.0%
>    +6.9%  +9 .rodata.str1.1   +9  +6.9%
> 
>   -- SHRINKING   --
>   -97.5% -24.8Ki .rodata.str1.8  -24.8Ki -97.5%
>    [ = ]   0 .symtab -14.7Ki -26.1%
>    [ = ]   0 .strtab -3.57Ki  -2.2%
>    [ = ]   0 .rela.debug_info-2.81Ki  -0.0%
>    [ = ]   0 .debug_line -2.14Ki  -0.6%
>    [ = ]   0 .rela.text -816  -0.1%
>    [ = ]   0 .rela.text.unlikely-288  -0.1%
>    -0.1%-131 .text.unlikely -131  -0.1%
>    [ = ]   0 [Unmapped]  -23 -14.0%
>    [ = ]   0 .debug_abbrev-2  -0.1%
> 
>    -1.2% -16.8Ki TOTAL   +25.1Ki  +0.1%
> 
>  I'm testing the patch.
>  Thoughts?
> >>>
> >>> Looks good in principle but why have the function in gimple.c
> >>> rather than in gimple-match-head.c where it could be static?
> >>> Do we still end up inlining it even though it is guarded with
> >>> __builtin_expect?
> >>
> >> Done that transformation:
> >>
> >> #include "gimple-match-head.c"
> >> static void report_match_pattern (const char *match_file, unsigned int 
> >> match_file_line, const char *generated_file, unsigned int 
> >> generate_file_line)
> >> {
> >> fprintf (dump_file, "Applying pattern %s:%d, %s:%d\n",match_file, 
> >> match_file_line, generated_file, generate_file_line);
> >> }
> >>
> >> Yes, I can confirm it's inlined now.
> >
> > Hmm, but that was the point of the exercise?  Not inlining it?  Or was
> > the point to have
> > the __builtin_expect()?
>
> The point was __builtin_expect and I thought I can also save some space.
>
> >
> >> Ready to install after proper testing?
> >
> > Just occured to me you need a copy of that in generic-match-head.c.
> >
> > But then, why not just add the __builtin_expect()...
>
> Yes, let's add that. And it's questionable whether to split the string in:
>
> gimple_seq *lseq = seq;
> -   if (dump_file && (dump_flags & TDF_FOLDING)) fprintf 
> (dump_file, "Applying pattern match.pd:4858, %s:%d\n", __FILE__, __LINE__);
> +   if (__builtin_expect (dump_file && (dump_flags & 
> TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern %s:%d, %s:%d\n", 
> "match.pd", 4858, __FILE__, __LINE__);
> res_op->set_op (CFN_FNMA, type, 3);

I guess it makes sense to split it, so the patch is OK.

Richard.

> That does:
>
> bloaty /tmp/after.o -- /tmp/before.o
>  VM SIZE  FILE SIZE
>  ++ GROWING++
>   [ = ]   0 .rela.text +22.7Ki  +3.5%
>   +1.6% +17.8Ki .text  +17.8Ki  +1.6%
>   [ = ]   0 .rela.debug_ranges +3.89Ki  +0.0%
>   [ = ]   0 .debug_info+3.08Ki  +0.5%
>   +1.8% +1.09Ki .eh_frame  +1.09Ki  +1.8%
>   [ = ]   0 .debug_loc +1.01Ki  +0.4%
>   [ = ]   0 .rela.eh_frame+480  +1.9%
>   +0.1%+195 .text.unlikely+195  +0.1%
>   [ = ]   0 .rela.debug_line   +72  +0.6%
>   +6.9%  +9 .rodata.str1.1  +9  +6.9%
>   [ = ]   0 .debug_ranges   +1  +0.0%
>
>  -- SHRINKING  --
>  -97.4% -24.8Ki .rodata.str1.8 -24.8Ki -97.4%
>   [ = ]   0 .rela.debug_loc-14.5Ki  -0.1%
>   [ = ]   0 .symtab-14.4Ki -25.6%
>   [ = ]   0 .rela.debug_info   -3.45Ki  -0.0%
>   [ = ]   0 .strtab-2.48Ki  -1.5%
>   [ = ]   0 .debug_line-2.43Ki  -0.7%
>   [ = ]   0 [Unmapped] -15  -9.1%
>   [ = ]   0 .debug_abbrev  -12  -0.6%
>
>   -0.4% -5.68Ki TOTAL  -11.7Ki  -0.0%
>
> 

[PATCH] gcc: xtensa: fix NAND code in xtensa_expand_atomic

2018-09-04 Thread Max Filippov
NAND is ~(a1 & a2), but xtensa_expand_atomic does ~a1 & a2.
That fixes libatomic tests atomic-op-{1,2}.
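
To illustrate why the operand order matters, a small standalone example
(mine, not part of the patch):

  #include <stdio.h>

  int main (void)
  {
    unsigned a = 0x0f, b = 0x3c, mask = 0xff;

    /* NAND within the masked field: AND first, then invert.  */
    unsigned nand_ok  = (a & b) ^ mask;   /* ~(a & b) & 0xff == 0xf3 */

    /* What the old code computed: invert a1 before the AND.  */
    unsigned nand_bad = (a ^ mask) & b;   /* (~a & b) & 0xff == 0x30 */

    printf ("%#x vs %#x\n", nand_ok, nand_bad);
    return 0;
  }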

gcc/
2018-09-04  Max Filippov  

* config/xtensa/xtensa.c (xtensa_expand_atomic): Reorder AND and
XOR operations in NAND case.
---
 gcc/config/xtensa/xtensa.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.c b/gcc/config/xtensa/xtensa.c
index 7cfe64d42895..b88556625223 100644
--- a/gcc/config/xtensa/xtensa.c
+++ b/gcc/config/xtensa/xtensa.c
@@ -1614,10 +1614,10 @@ xtensa_expand_atomic (enum rtx_code code, rtx target, 
rtx mem, rtx val,
   break;
 
 case MULT: /* NAND */
-  tmp = expand_simple_binop (SImode, XOR, old, ac.modemask,
-NULL_RTX, 1, OPTAB_DIRECT);
-  tmp = expand_simple_binop (SImode, AND, tmp, val,
+  tmp = expand_simple_binop (SImode, AND, old, val,
 new_rtx, 1, OPTAB_DIRECT);
+  tmp = expand_simple_binop (SImode, XOR, tmp, ac.modemask,
+NULL_RTX, 1, OPTAB_DIRECT);
   break;
 
 default:
-- 
2.11.0



Re: [PATCHv2] Handle overlength string literals in the fortan FE

2018-09-04 Thread Bernd Edlinger
On 03/09/2018, 21:25 Janne Blomqvist wrote:
> On Fri, Aug 24, 2018 at 11:06 PM Bernd Edlinger 
> wrote:
>
>> Hi!
>>
>>
>> This is an alternative approach to handle overlength strings in the
>> Fortran FE.
>>
>
> Hi,
>
> can you explain a little more what problem this patch tries to solve?
> What is an "overlength" string?

In the middle-end STRING_CST objects have a TYPE_DOMAIN
which specifies how much memory the string constant uses,
and what kind of characters the string constant consists of,
and a TREE_STRING_LENGTH which specifies how many
bytes the string value contains.

Everything is fine if both sizes agree, or if the memory size
is larger than the string length, in which case the string is simply
padded with zero bytes to the full length.

But things get unnecessarily complicated if the memory size
is smaller than the string length.

In this situation we have two different use cases of STRING_CST
which have contradicting rules:

For string literals and flexible arrays the memory size is ignored
and the TREE_STRING_LENGTH is used to specify both the
string length and the memory size.  Fortran does not use those.

For STRING_CST used in a CONSTRUCTOR of a string object
the TREE_STRING_LENGTH is ignored, and only the part of the
string value is used that fits into the memory size, the situation
is similar to excess precision floating point values.
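
To make that concrete at the source level, here is a small C
illustration (mine, not from the patch; the Fortran cases are
analogous):

  /* Memory size (6) >= string length (3, counting the NUL): the tail
     is NUL padding.  */
  char padded[6] = "ab";

  /* "Overlength": the initializer, including its terminating NUL, is
     longer than the object.  GCC warns and keeps only the part that
     fits, so cut holds {'a', 'b', 'c'} with no terminating NUL.  */
  char cut[3] = "abcdef";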

Now it happens that the middle-end sees a STRING_CST with
overlength and wants to know if the string constant is properly
zero-terminated, and it is impossible to tell, since any nul byte
at the end of the string value might be part of the ignored excess
precision, but this depends on where the string constant actually
came from.

Therefore I started an effort to sanitize the STRING_CST via
an assertion in varasm.c, where most of the string constants
eventually come along, and it triggered in two Fortran test cases,
and a few other languages of course.

This is what this patch tries to fix.

Bernd.