Re: [RFC][PATCH][PR63586] Convert x+x+x+x into 4*x

2016-04-23 Thread kugan

Hi Richard,

As you said in the other email, I tried implementing this with
add_repeat_to_ops_vec, but the whole repeat vector is designed for
MULT_EXPR chains. I tried changing it, but that turned out not to be
straightforward without a lot of rewriting. Therefore I implemented it
based on your review here. Please tell me what you think.



+/* Transoform repeated addition of same values into multiply with
+   constant.  */

Transform


Done.



+static void
+transform_add_to_multiply (gimple_stmt_iterator *gsi, gimple *stmt,
+			   vec<operand_entry *> *ops)

split the long line


Done.



op_list looks redundant - ops[start]->op gives you the desired value
already and if you
use a vec<std::pair <int, int> > you can have a more C++ish start,end pair.

+  tree tmp = make_temp_ssa_name (TREE_TYPE (op), NULL, "reassocmul");
+  gassign *mul_stmt = gimple_build_assign (tmp, MULT_EXPR, op,
+					   build_int_cst (TREE_TYPE (op), count));

this won't work for floating point or complex numbers - you need to use sth like
fold_convert (TREE_TYPE (op), build_int_cst (integer_type_node, count));


Done.



For FP types you need to guard the transform with flag_unsafe_math_optimizations


Done.



+  gimple_set_location (mul_stmt, gimple_location (stmt));
+  gimple_set_uid (mul_stmt, gimple_uid (stmt));
+  gsi_insert_before (gsi, mul_stmt, GSI_SAME_STMT);

I think you do not want to set the stmt uid


The assert in reassoc_stmt_dominates_p (gcc_assert (gimple_uid (s1) &&
gimple_uid (s2))) was failing, so I added the uid of the adjacent
stmt and it seems to work.



and you want to insert the
stmt right
after the def of op (or at the original first add - though you can't
get your hands at


Done.


that easily).  You also don't want to set the location to the last stmt of the
whole add sequence - simply leave it unset.

+  oe = operand_entry_pool.allocate ();
+  oe->op = tmp;
+  oe->rank = get_rank (op) * count;

?  Why that?  oe->rank should be get_rank (tmp).

+  oe->id = 0;

other places use next_operand_entry_id++.  I think you want to simply
use add_to_ops_vec (oe, tmp); here for all of the above.


Done.



Please return whether you did any optimization and do the
qsort of the operand vector only if you did sth.


Done.

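The flow discussed above — collapse repeated addends into one multiply, then re-sort the operand vector only when something actually changed — can be sketched outside GCC. This is an illustrative sketch over a plain, already-sorted int array; the function name and representation are not GCC's operand_entry machinery:

```c
#include <assert.h>

/* Collapse runs of equal values in a SORTED operand vector of an
   addition chain: x + x + ... + x (count times) becomes count * x.
   Returns the resulting sum; *changed reports whether any run was
   collapsed (the caller would re-sort only in that case).  */
int sum_collapsing_runs(const int *ops, int n, int *changed)
{
    int total = 0;
    *changed = 0;
    for (int i = 0; i < n; ) {
        int j = i;
        while (j < n && ops[j] == ops[i])
            j++;                      /* find the run of equal operands */
        int count = j - i;
        if (count > 1)
            *changed = 1;             /* a multiply was emitted */
        total += count * ops[i];      /* x+x+...+x -> count * x */
        i = j;
    }
    return total;
}
```

For FP operands the same rewrite is only valid under -funsafe-math-optimizations, which is why the transform is guarded for float types.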


Testcase with FP math missing.  Likewise with complex or vector math.


Btw, does it handle associating

   x + 3 * x + x

to

   5 * x

?


Added this to the testcase and verified it is working.

Regression tested and bootstrapped on x86-64-linux-gnu with no new 
regressions.


Is this OK for trunk?

Thanks,
Kugan


gcc/testsuite/ChangeLog:

2016-04-24  Kugan Vivekanandarajah  

PR middle-end/63586
* gcc.dg/tree-ssa/pr63586-2.c: New test.
* gcc.dg/tree-ssa/pr63586.c: New test.
* gcc.dg/tree-ssa/reassoc-14.c: Adjust multiplication count.

gcc/ChangeLog:

2016-04-24  Kugan Vivekanandarajah  

PR middle-end/63586
* tree-ssa-reassoc.c (transform_add_to_multiply): New.
(reassociate_bb): Call transform_add_to_multiply.



diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr63586-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr63586-2.c
index e69de29..2774fbd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr63586-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr63586-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
+
+float f1_float (float x)
+{
+float y = x + x;
+y = y + x;
+y = y + x;
+y = y + x;
+y = y + x;
+y = y + x;
+y = y + x;
+return y;
+}
+
+float f1_float2 (float x)
+{
+float y = x + 3 * x + x;
+return y;
+}
+
+int f1_int (int x)
+{
+int y = x + 3 * x + x;
+return y;
+}
+
+/* { dg-final { scan-tree-dump-times "\\\*" 4 "reassoc1" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr63586.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr63586.c
index e69de29..a0f705b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr63586.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr63586.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc1" } */
+
+unsigned f1 (unsigned x)
+{
+unsigned y = x + x;
+y = y + x;
+y = y + x;
+y = y + x;
+y = y + x;
+y = y + x;
+y = y + x;
+return y;
+}
+
+unsigned f2 (unsigned x, unsigned z)
+{
+unsigned y = x + x;
+y = y + x;
+y = y + x;
+y = y + z;
+y = y + z;
+y = y + z;
+y = y + z;
+return y;
+}
+
+unsigned f3 (unsigned x, unsigned z, unsigned k)
+{
+unsigned y = x + x;
+y = y + x;
+y = y + x;
+y = y + z;
+y = y + z;
+y = y + z;
+y = y + k;
+return y;
+}
+
+unsigned f4 (unsigned x, unsigned z, unsigned k)
+{
+unsigned y = k + x;
+y = y + x;
+y = y + x;
+y = y + z;
+y = y + z;
+y = y + z;
+y = y + x;
+return y;
+}
+
+unsigned f5 (unsigned x, unsigned y, unsigned z)
+{
+return x + x + x + x + y + y + y + y 

Re: [Patch AArch64] Fix PR target/63874

2016-04-23 Thread Ramana Radhakrishnan
On Thu, Mar 31, 2016 at 5:30 PM, James Greenhalgh
 wrote:
> On Thu, Mar 31, 2016 at 02:11:49PM +0100, Ramana Radhakrishnan wrote:
>> Hi,
>>
>>   In this PR we have a situation where we aren't really detecting
>> weak references vs weak definitions. If one has a weak definition
>> that binds locally there's no reason not to put out PC relative
>> relocations.
>>
>> However if you have a genuine weak reference that is
>> known not to bind locally it makes very little sense
>> to put out an entry into the literal pool which doesn't always
>> work with DSOs and shared objects.
>>
>> Tested aarch64-none-linux-gnu bootstrap and regression test with no 
>> regressions
>>
>> This is not a regression and given what we've seen recently with protected
>> symbols and binds_locally_p I'd rather this were queued for GCC 7.
>>
>> Ok ?
>
> Based on the bugzilla report, this looks OK for GCC 7 to me. But I don't
> know the dark corners of the elf specification, so I'd rather leave the
> final review to Richard or Marcus.

Richard / Marcus, ping ?


Ramana
>
> Thanks,
> James
>
>> gcc/
>>
>> * config/aarch64/aarch64.c (aarch64_classify_symbol): Typo in comment fixed.
>>   Only force to memory if it is a weak external reference.
>>
>> gcc/testsuite
>>
>> * gcc.target/aarch64/pr63874.c: New test.
>


Re: [PATCH, fortran, v3] Use Levenshtein spelling suggestions in Fortran FE

2016-04-23 Thread Bernhard Reutner-Fischer
On March 7, 2016 3:57:16 PM GMT+01:00, David Malcolm  
wrote:
>On Sat, 2016-03-05 at 23:46 +0100, Bernhard Reutner-Fischer wrote:
>[...]
>
>> diff --git a/gcc/fortran/misc.c b/gcc/fortran/misc.c
>> index 405bae0..72ed311 100644
>> --- a/gcc/fortran/misc.c
>> +++ b/gcc/fortran/misc.c
>[...]
>
>> @@ -274,3 +275,41 @@ get_c_kind(const char *c_kind_name, CInteropKind_t kinds_table[])
>>  
>>return ISOCBINDING_INVALID;
>>  }
>> +
>> +
>> +/* For a given name TYPO, determine the best candidate from
>> CANDIDATES
>> +   perusing Levenshtein distance.  Frees CANDIDATES before
>> returning.  */
>> +
>> +const char *
>> +gfc_closest_fuzzy_match (const char *typo, char **candidates)
>> +{
>> +  /* Determine closest match.  */
>> +  const char *best = NULL;
>> +  char **cand = candidates;
>> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
>> +
>> +  while (cand && *cand)
>> +{
>> +  edit_distance_t dist = levenshtein_distance (typo, *cand);
>> +  if (dist < best_distance)
>> +{
>> +   best_distance = dist;
>> +   best = *cand;
>> +}
>> +  cand++;
>> +}
>> +  /* If more than half of the letters were misspelled, the
>> suggestion is
>> + likely to be meaningless.  */
>> +  if (best)
>> +{
>> +  unsigned int cutoff = MAX (strlen (typo), strlen (best)) / 2;
>> +
>> +  if (best_distance > cutoff)
>> +{
>> +  XDELETEVEC (candidates);
>> +  return NULL;
>> +}
>> +  XDELETEVEC (candidates);
>> +}
>> +  return best;
>> +}
>
>FWIW, there are two overloaded variants of levenshtein_distance in
>gcc/spellcheck.h, the first of which takes a pair of strlen values;
>your patch uses the second one:
>
>extern edit_distance_t
>levenshtein_distance (const char *s, int len_s,
> const char *t, int len_t);
>
>extern edit_distance_t
>levenshtein_distance (const char *s, const char *t);
>
>So one minor tweak you may want to consider here is to calculate
>  strlen (typo)
>once at the top of gfc_closest_fuzzy_match, and then pass it in to the
>4-arg variant of levenshtein_distance, which would avoid recalculating
>strlen (typo) for every candidate.

I pondered this back then but concluded to use the variant without len, 
because to use the 4-argument variant I would have had to store the 
candidates' strlen in the vector too, and I was not convinced the memory 
footprint for that would be justified. Maybe it is, but I would prefer the 
following tweak to the 4-argument variant:
If you would amend the 4 argument variant with a

  if (len_t == -1)
len_t = strlen (t);
before the
   if (len_s == 0)
 return len_t;
   if (len_t == 0)
 return len_s;

checks then I'd certainly use the 4 arg variant :)

WDYT?
>
>I can't comment on the rest of the patch (I'm not a Fortran expert),
>though it seems sane to 
>
>Hope this is constructive

It is, thanks for your thoughts!

cheers,

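For reference, the distance computation and the half-length cutoff discussed in this thread can be sketched as a standalone one-row DP. This is an illustrative C sketch, not GCC's spellcheck.h implementation; the function names are made up:

```c
#include <assert.h>
#include <string.h>

/* Classic Levenshtein distance with a single rolling row.
   Assumes strlen(t) < 128 for the stack buffer.  */
unsigned edit_distance(const char *s, const char *t)
{
    size_t m = strlen(s), n = strlen(t);
    unsigned row[128];
    for (size_t j = 0; j <= n; j++)
        row[j] = (unsigned) j;
    for (size_t i = 1; i <= m; i++) {
        unsigned diag = row[0];        /* value of row[i-1][j-1] */
        row[0] = (unsigned) i;
        for (size_t j = 1; j <= n; j++) {
            unsigned del = row[j] + 1;
            unsigned ins = row[j - 1] + 1;
            unsigned sub = diag + (s[i - 1] != t[j - 1]);
            diag = row[j];
            unsigned best = del < ins ? del : ins;
            row[j] = sub < best ? sub : best;
        }
    }
    return row[n];
}

/* The cutoff used in gfc_closest_fuzzy_match: reject a suggestion when
   more than half of the letters would have to change.  */
int acceptable_suggestion(const char *typo, const char *best, unsigned dist)
{
    size_t ls = strlen(typo), lb = strlen(best);
    unsigned cutoff = (unsigned) ((ls > lb ? ls : lb) / 2);
    return dist <= cutoff;
}
```

The 4-argument variant discussed above simply hoists the strlen calls out of the candidate loop.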


Re: [wwwdocs] Reduce use of MetaHTML for navigation

2016-04-23 Thread Gerald Pfeifer
On Sun, 17 Apr 2016, Gerald Pfeifer wrote:
> When I initially created this in the early 2000s, CSS barely 
> existed and was hardly used.  Now in 2016 it makes sense to use it
> fully (a first phase of conversion happened a few years ago)
> and reduce our dependency on MetaHTML even further.

After that change, we can now simplify things further.

Committed.

Gerald

Remove now obsolete nav-title-style and nav-body-style MetaHTML functions.

Index: style.mhtml
===
RCS file: /cvs/gcc/wwwdocs/htdocs/style.mhtml,v
retrieving revision 1.129
diff -u -r1.129 style.mhtml
--- style.mhtml 21 Apr 2016 23:04:36 -  1.129
+++ style.mhtml 23 Apr 2016 16:13:06 -
@@ -3,8 +3,6 @@
 
 
 
- class="td_title" 
- class="td_con" 
 
 ;;; The "install/" pages are HTML, not XHTML.
 


[PING^2] Re: [PATCH 1/4] Add gcc-auto-profile script

2016-04-23 Thread Andi Kleen
Andi Kleen  writes:

Ping^2 for the patch series!

> Andi Kleen  writes:
>
> Ping for the patch series!
>
>> From: Andi Kleen 
>>
>> Using autofdo is currently something difficult. It requires using the
>> model specific branches taken event, which differs on different CPUs.
>> The example shown in the manual requires a special patched version of
>> perf that is non standard, and also will likely not work everywhere.
>>
>> This patch adds a new gcc-auto-profile script that figures out the
>> correct event and runs perf. The script is installed on Linux systems.
>> Since maintaining the script would be somewhat tedious (needs changes
>> every time a new CPU comes out) I auto generated it from the online
>> Intel event database. The script to do that is in contrib and can be
>> rerun.
>>
>> Right now there is no test in configure for whether perf works. This
>> would vary depending on the build and target system, and since
>> it currently doesn't work under virtualization and needs an up-to-date
>> kernel it may often fail in common distribution build setups.
>>
>> So Linux just hardcodes installing the script, but it may fail at runtime.
>>
>> This is needed to actually make use of autofdo in a generic way
>> in the build system and in the test suite.
>>
>> So far the script is not installed.
>>
>> gcc/:
>> 2016-03-27  Andi Kleen  
>>
>>  * doc/invoke.texi: Document gcc-auto-profile
>>  * gcc-auto-profile: Create.
>>
>> contrib/:
>>
>> 2016-03-27  Andi Kleen  
>>
>>  * gen_autofdo_event.py: New file to regenerate
>>  gcc-auto-profile.
>> ---
>>  contrib/gen_autofdo_event.py | 155 
>> +++
>>  gcc/doc/invoke.texi  |  31 +++--
>>  gcc/gcc-auto-profile |  70 +++
>>  3 files changed, 251 insertions(+), 5 deletions(-)
>>  create mode 100755 contrib/gen_autofdo_event.py
>>  create mode 100755 gcc/gcc-auto-profile
>>
>> diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
>> new file mode 100755
>> index 000..db4db33
>> --- /dev/null
>> +++ b/contrib/gen_autofdo_event.py
>> @@ -0,0 +1,155 @@
>> +#!/usr/bin/python
>> +# generate Intel taken branches Linux perf event script for autofdo 
>> profiling
>> +
>> +# Copyright (C) 2016 Free Software Foundation, Inc.
>> +#
>> +# GCC is free software; you can redistribute it and/or modify it under
>> +# the terms of the GNU General Public License as published by the Free
>> +# Software Foundation; either version 3, or (at your option) any later
>> +# version.
>> +#
>> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> +# for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with GCC; see the file COPYING3.  If not see
>> +# .  */
>> +
>> +# run it with perf record -b -e EVENT program ...
>> +# The Linux Kernel needs to support the PMU of the current CPU, and
>> +# it will likely not work in VMs.
>> +# add --all to print for all cpus, otherwise for current cpu
>> +# add --script to generate shell script to run correct event
>> +#
>> +# requires internet (https) access. this may require setting up a proxy
>> +# with export https_proxy=...
>> +#
>> +import urllib2
>> +import sys
>> +import json
>> +import argparse
>> +import collections
>> +
>> +baseurl = "https://download.01.org/perfmon"
>> +
>> +target_events = (u'BR_INST_RETIRED.NEAR_TAKEN',
>> + u'BR_INST_EXEC.TAKEN',
>> + u'BR_INST_RETIRED.TAKEN_JCC',
>> + u'BR_INST_TYPE_RETIRED.COND_TAKEN')
>> +
>> +ap = argparse.ArgumentParser()
>> +ap.add_argument('--all', '-a', help='Print for all CPUs', 
>> action='store_true')
>> +ap.add_argument('--script', help='Generate shell script', 
>> action='store_true')
>> +args = ap.parse_args()
>> +
>> +eventmap = collections.defaultdict(list)
>> +
>> +def get_cpu_str():
>> +with open('/proc/cpuinfo', 'r') as c:
>> +vendor, fam, model = None, None, None
>> +for j in c:
>> +n = j.split()
>> +if n[0] == 'vendor_id':
>> +vendor = n[2]
>> +elif n[0] == 'model' and n[1] == ':':
>> +model = int(n[2])
>> +elif n[0] == 'cpu' and n[1] == 'family':
>> +fam = int(n[3])
>> +if vendor and fam and model:
>> +return "%s-%d-%X" % (vendor, fam, model), model
>> +return None, None
>> +
>> +def find_event(eventurl, model):
>> +print >>sys.stderr, "Downloading", eventurl
>> +u = urllib2.urlopen(eventurl)
>> +events = json.loads(u.read())
>> +u.close()
>> +
>> +found = 0
>> +for j in events:
>> +if j[u'EventName'] in 

[wwwdocs] Buildstat update for 4.8

2016-04-23 Thread Tom G. Christensen
Latest results for 4.8.x

-tgc

Testresults for 4.8.5:
  hppa1.1-hp-hpux11.11
  powerpc64le-unknown-linux-gnu
  s390x-ibm-linux-gnu (3)

Index: buildstat.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/buildstat.html,v
retrieving revision 1.15
diff -u -r1.15 buildstat.html
--- buildstat.html  1 Oct 2015 06:21:15 -   1.15
+++ buildstat.html  23 Apr 2016 15:44:14 -
@@ -59,6 +59,14 @@
 
 
 
+hppa1.1-hp-hpux11.11
+
+Test results:
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg02725.html">4.8.5</a>
+
+
+
+
 hppa2.0w-hp-hpux11.00
 
 Test results:
@@ -216,6 +224,7 @@
 powerpc64le-unknown-linux-gnu
 
 Test results:
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-09/msg00640.html">4.8.5</a>,
 <a href="https://gcc.gnu.org/ml/gcc-testresults/2015-06/msg02566.html">4.8.5</a>
 
 
@@ -235,6 +244,9 @@
 s390x-ibm-linux-gnu
 
 Test results:
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2016-01/msg01584.html">4.8.5</a>,
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2016-01/msg01575.html">4.8.5</a>,
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2016-01/msg01570.html">4.8.5</a>,
 <a href="https://gcc.gnu.org/ml/gcc-testresults/2015-07/msg00493.html">4.8.5</a>,
 <a href="https://gcc.gnu.org/ml/gcc-testresults/2015-07/msg00485.html">4.8.5</a>,
 <a href="https://gcc.gnu.org/ml/gcc-testresults/2015-07/msg00473.html">4.8.5</a>


[wwwdocs] Buildstat update for 5.x

2016-04-23 Thread Tom G. Christensen
Latest results for 5.x

-tgc

Testresults for 5.3.0:
  armv7l-unknown-linux-gnueabi
  arm-unknown-linux-gnueabi
  hppa-unknown-linux-gnu
  mips-unknown-linux-gnu
  mipsel-unknown-linux-gnu
  powerpc-unknown-linux-gnu
  sparc-sun-solaris2.10
  sparc64-sun-solaris2.10
  sparc-unknown-linux-gnu
  x86_64-unknown-linux-gnu (2)

Testresults for 5.2.0:
  arm-unknown-linux-gnueabi
  hppa-unknown-linux-gnu
  mips-unknown-linux-gnu
  mipsel-unknown-linux-gnu
  powerpc64-unknown-linux-gnu
  sparc-unknown-linux-gnu

Index: buildstat.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/buildstat.html,v
retrieving revision 1.6
diff -u -r1.6 buildstat.html
--- buildstat.html  6 Jan 2016 15:48:52 -   1.6
+++ buildstat.html  23 Apr 2016 15:25:04 -
@@ -31,9 +31,19 @@
 
 
 
+armv7l-unknown-linux-gnueabi
+
+Test results:
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2016-02/msg00279.html">5.3.0</a>
+
+
+
+
 arm-unknown-linux-gnueabi
 
 Test results:
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg01344.html">5.3.0</a>,
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-10/msg00378.html">5.2.0</a>,
 <a href="https://gcc.gnu.org/ml/gcc-testresults/2015-05/msg00965.html">5.1.0</a>
 
 
@@ -42,6 +52,8 @@
 hppa-unknown-linux-gnu
 
 Test results:
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg02075.html">5.3.0</a>,
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-10/msg01144.html">5.2.0</a>,
 <a href="https://gcc.gnu.org/ml/gcc-testresults/2015-05/msg00763.html">5.1.0</a>
 
 
@@ -91,6 +103,8 @@
 mips-unknown-linux-gnu
 
 Test results:
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg01134.html">5.3.0</a>,
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-10/msg01303.html">5.2.0</a>,
 <a href="https://gcc.gnu.org/ml/gcc-testresults/2015-05/msg01101.html">5.1.0</a>
 
 
@@ -99,6 +113,8 @@
 mipsel-unknown-linux-gnu
 
 Test results:
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg01133.html">5.3.0</a>,
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-10/msg01145.html">5.2.0</a>,
 <a href="https://gcc.gnu.org/ml/gcc-testresults/2015-05/msg01480.html">5.1.0</a>
 
 
@@ -123,7 +139,8 @@
 powerpc64-unknown-linux-gnu
 
 Test results:
-<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg02803.html">5.3.0</a>
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg02803.html">5.3.0</a>,
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-10/msg01557.html">5.2.0</a>
 
 
 
@@ -139,6 +156,7 @@
 powerpc-unknown-linux-gnu
 
 Test results:
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg02465.html">5.3.0</a>,
 <a href="https://gcc.gnu.org/ml/gcc-testresults/2015-09/msg00261.html">5.2.0</a>,
 <a href="https://gcc.gnu.org/ml/gcc-testresults/2015-05/msg01751.html">5.1.0</a>
 
@@ -148,6 +166,7 @@
 sparc-sun-solaris2.10
 
 Test results:
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg01463.html">5.3.0</a>,
 <a href="https://gcc.gnu.org/ml/gcc-testresults/2015-09/msg01141.html">5.2.0</a>
 
 
@@ -164,6 +183,7 @@
 sparc64-sun-solaris2.10
 
 Test results:
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg01462.html">5.3.0</a>,
 <a href="https://gcc.gnu.org/ml/gcc-testresults/2015-09/msg01142.html">5.2.0</a>
 
 
@@ -172,6 +192,8 @@
 sparc-unknown-linux-gnu
 
 Test results:
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg01345.html">5.3.0</a>,
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-10/msg00560.html">5.2.0</a>,
 <a href="https://gcc.gnu.org/ml/gcc-testresults/2015-05/msg00966.html">5.1.0</a>
 
 
@@ -196,6 +218,8 @@
 x86_64-unknown-linux-gnu
 
 Test results:
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg01427.html">5.3.0</a>,
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2015-12/msg00587.html">5.3.0</a>,
 <a href="https://gcc.gnu.org/ml/gcc-testresults/2015-09/msg00628.html">5.2.0</a>,
 <a href="https://gcc.gnu.org/ml/gcc-testresults/2015-08/msg02501.html">5.2.0</a>,
 <a href="https://gcc.gnu.org/ml/gcc-testresults/2015-08/msg01782.html">5.2.0</a>,


Re: [RFC][PATCH][PR40921] Convert x + (-y * z * z) into x - y * z * z

2016-04-23 Thread kugan



I am not sure I understand this. I tried doing it: if I add -1 and rhs1
for the NEGATE_EXPR to the ops list, then when it comes to rewrite_expr_tree
the constant will be sorted early, which makes it hard to generate:
  x + (-y * z * z) => x - y * z * z

Do you want to swap the constant in the MULT_EXPR chain (if present), as in
swap_ops_for_binary_stmt, and then create a NEGATE_EXPR?


In addition to linearize_expr handling you need to handle a -1 in the MULT_EXPR
chain specially at rewrite_expr_tree time by emitting a NEGATE_EXPR instead
of a MULT_EXPR (which also means you should sort the -1 "last").

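The scheme described above can be sketched numerically: the linearized MULT_EXPR chain carries a -1 entry, sorted last, standing for the folded NEGATE_EXPR, and the rewrite emits a negation instead of a multiply by -1. The plain int array and function name here are illustrative, not GCC's operand_entry vector:

```c
#include <assert.h>

/* ops is a linearized MULT_EXPR chain; a trailing -1 marks a folded
   NEGATE_EXPR.  The rewrite emits a negation rather than multiplying
   by -1, so x + (-y * z * z) can later fold to x - y * z * z.  */
int rewrite_mult_chain(const int *ops, int n)
{
    int negate = (n > 0 && ops[n - 1] == -1);  /* -1 sorted last */
    int prod = 1;
    for (int i = 0; i < n - (negate ? 1 : 0); i++)
        prod *= ops[i];
    return negate ? -prod : prod;              /* NEGATE_EXPR, not MULT_EXPR by -1 */
}
```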

Hi Richard,

Thanks. Here is an attempt which does this.

Regression tested and bootstrapped on x86-64-linux-gnu with no new 
regressions.


Is this OK for trunk?

Thanks,
Kugan

2016-04-23  Kugan Vivekanandarajah  

PR middle-end/40921
* gcc.dg/tree-ssa/pr40921.c: New test.

gcc/ChangeLog:

2016-04-23  Kugan Vivekanandarajah  

PR middle-end/40921
* tree-ssa-reassoc.c (try_special_add_to_ops): New.
(linearize_expr_tree): Call try_special_add_to_ops.
(reassociate_bb): Convert MULT_EXPR by (-1) to NEGATE_EXPR.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr40921.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr40921.c
index e69de29..f587a8f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr40921.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr40921.c
@@ -0,0 +1,20 @@
+
+/* { dg-do compile } */
+/* { dg-options "-O2  -fdump-tree-optimized -ffast-math" } */
+
+unsigned int foo (unsigned int x, unsigned int y, unsigned int z)
+{
+  return x + (-y * z*z);
+}
+
+float bar (float x, float y, float z)
+{
+  return x + (-y * z*z);
+}
+
+float bar2 (float x, float y, float z)
+{
+  return x + (-y * z*z * 5.0);
+}
+
+/* { dg-final { scan-tree-dump-times "_* = -y_" 0 "optimized" } } */
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index d23dabd..1b38207 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -4252,6 +4252,45 @@ acceptable_pow_call (gimple *stmt, tree *base, 
HOST_WIDE_INT *exponent)
   return true;
 }
 
+/* Try to derive and add operand entry for OP to *OPS.  Return false if
+   unsuccessful.  */
+
+static bool
+try_special_add_to_ops (vec<operand_entry *> *ops,
+			enum tree_code code,
+			tree op, gimple* def_stmt)
+{
+  tree base = NULL_TREE;
+  HOST_WIDE_INT exponent = 0;
+
+  if (TREE_CODE (op) != SSA_NAME)
+return false;
+
+  if (code == MULT_EXPR
+  && acceptable_pow_call (def_stmt, &base, &exponent))
+{
+  add_repeat_to_ops_vec (ops, base, exponent);
+  gimple_set_visited (def_stmt, true);
+  return true;
+}
+  else if (code == MULT_EXPR
+  && is_gimple_assign (def_stmt)
+  && gimple_assign_rhs_code (def_stmt) == NEGATE_EXPR
+  && !HONOR_SNANS (TREE_TYPE (op))
+  && (!HONOR_SIGNED_ZEROS (TREE_TYPE (op))
+  || !COMPLEX_FLOAT_TYPE_P (TREE_TYPE (op
+{
+  tree rhs1 = gimple_assign_rhs1 (def_stmt);
+  tree cst = build_minus_one_cst (TREE_TYPE (op));
+  add_to_ops_vec (ops, rhs1);
+  add_to_ops_vec (ops, cst);
+  gimple_set_visited (def_stmt, true);
+  return true;
+}
+
+  return false;
+}
+
 /* Recursively linearize a binary expression that is the RHS of STMT.
Place the operands of the expression tree in the vector named OPS.  */
 
@@ -4266,8 +4305,6 @@ linearize_expr_tree (vec<operand_entry *> *ops, gimple *stmt,
   bool binrhsisreassoc = false;
   enum tree_code rhscode = gimple_assign_rhs_code (stmt);
   struct loop *loop = loop_containing_stmt (stmt);
-  tree base = NULL_TREE;
-  HOST_WIDE_INT exponent = 0;
 
   if (set_visited)
 gimple_set_visited (stmt, true);
@@ -4303,26 +4340,11 @@ linearize_expr_tree (vec<operand_entry *> *ops, gimple *stmt,
 
   if (!binrhsisreassoc)
{
- if (rhscode == MULT_EXPR
- && TREE_CODE (binrhs) == SSA_NAME
- && acceptable_pow_call (binrhsdef, &base, &exponent))
-   {
- add_repeat_to_ops_vec (ops, base, exponent);
- gimple_set_visited (binrhsdef, true);
-   }
- else
+ if (!try_special_add_to_ops (ops, rhscode, binrhs, binrhsdef))
add_to_ops_vec (ops, binrhs);
 
- if (rhscode == MULT_EXPR
- && TREE_CODE (binlhs) == SSA_NAME
- && acceptable_pow_call (binlhsdef, &base, &exponent))
-   {
- add_repeat_to_ops_vec (ops, base, exponent);
- gimple_set_visited (binlhsdef, true);
-   }
- else
+ if (!try_special_add_to_ops (ops, rhscode, binlhs, binlhsdef))
add_to_ops_vec (ops, binlhs);
-
  return;
}
 
@@ -4360,14 +4382,7 @@ linearize_expr_tree (vec<operand_entry *> *ops, gimple *stmt,
   linearize_expr_tree (ops, SSA_NAME_DEF_STMT (binlhs),
   is_associative, set_visited);
 
-  if (rhscode == MULT_EXPR
-  && TREE_CODE (binrhs) == SSA_NAME
-  && acceptable_pow_call (SSA_NAME_DEF_STMT 

Re: [patch] removing aged ifdef 0 from boehm-gc/os_dep.c

2016-04-23 Thread Joseph Myers
On Wed, 13 Apr 2016, g...@glenstark.net wrote:

> This is my first effort at contributing to gcc, so I thought I would try
> with some of the easy stuff listed here:
> https://gcc.gnu.org/projects/beginner.html
> 
> Attached is a patch removing a block which has been #if 0'd out since
> 2006.  I tested the build afterward.
> 
> I look forward to your feedback.

As boehm-gc is an externally maintained project (see "Upstream packages" 
at ), such cleanups should be 
done upstream only, and get into GCC through merges from the upstream 
version.

-- 
Joseph S. Myers
jos...@codesourcery.com


New hashtable power 2 rehash policy

2016-04-23 Thread François Dumont

Hi

Here is a patch to introduce a new power-of-2 based rehash policy. 
It enhances performance as it avoids modulo and float operations. I have 
updated the performance benches and here is the result:


54075.cc	tr1 benches	 455r  446u  8s  0mem  0pf
54075.cc	std benches	 466r  460u  6s  0mem  0pf
54075.cc	std2 benches	 375r  369u  6s  0mem  0pf


std2 benches is the one using power of 2 bucket count.

Note that I made use of __detected_or_t to avoid duplicating all 
the code of _Rehash_base<>.


It brings a simplification of _Insert<>: it doesn't take a 
_Unique_keys template parameter anymore, which allowed removing a 
specialization.


It also improves behavior when we reach the maximum number of buckets: 
we won't keep trying to increase the number, as that is impossible.


Lastly, it fixes a small problem in the 54075.cc bench. We were using 
__uset_traits rather than __umset_traits in the definition of __umset, 
so results were not the expected ones.

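The two ingredients of the new policy can be sketched in plain C: bucket indexing by masking (what _Mask_range_hashing does) and the high-bit smearing that the _NextPower2 recursion unrolls at compile time. This is an illustrative sketch with made-up function names, not the libstdc++ code:

```c
#include <assert.h>
#include <stddef.h>

/* num % den without a division; valid only when den is a power of 2,
   which is what the masking range-hashing function relies on.  */
size_t mask_range(size_t num, size_t den)
{
    return num & (den - 1);
}

/* Smallest power of 2 >= n, via bit smearing (n must be >= 1 and the
   result representable; the overflow handling of _M_next_bkt is
   omitted here).  */
size_t next_pow2(size_t n)
{
    n--;
    for (size_t shift = 1; shift < sizeof(size_t) * 8; shift <<= 1)
        n |= n >> shift;               /* smear the highest set bit down */
    return n + 1;
}
```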

2016-04-22  François Dumont 

* include/bits/hashtable_policy.h
(_Prime_rehash_policy::__has_load_factor): New. Mark rehash policy
having load factor management.
(_Mask_range_hashing): New.
(_NextPower2): New.
(_Power2_rehash_policy): New.
(_Insert<>): Remove the last template parameter _Unique_keys. Use the same
implementation when keys are unique no matter if iterators are constant
or not.
* src/c++11/hashtable_c++0x.cc (_Prime_rehash_policy::_M_next_bkt):
Consider when the last prime number has been reached.
* testsuite/23_containers/unordered_set/hash_policy/power2_rehash.cc:
New.
* testsuite/performance/23_containers/insert/54075.cc: Add bench using
the new hash policy.
* testsuite/performance/23_containers/insert_erase/41975.cc: Likewise.

Tested under Linux x86_64. OK to commit?

François


Index: include/bits/hashtable_policy.h
===
--- include/bits/hashtable_policy.h	(revision 235348)
+++ include/bits/hashtable_policy.h	(working copy)
@@ -457,6 +457,8 @@
   /// smallest prime that keeps the load factor small enough.
   struct _Prime_rehash_policy
   {
+using __has_load_factor = std::true_type;
+
 _Prime_rehash_policy(float __z = 1.0) noexcept
 : _M_max_load_factor(__z), _M_next_resize(0) { }
 
@@ -501,6 +503,136 @@
 mutable std::size_t	_M_next_resize;
   };
 
+  /// Range hashing function assuming that second arg is a power of 2.
+  struct _Mask_range_hashing
+  {
+typedef std::size_t first_argument_type;
+typedef std::size_t second_argument_type;
+typedef std::size_t result_type;
+
+result_type
+operator()(first_argument_type __num,
+	   second_argument_type __den) const noexcept
+{ return __num & (__den - 1); }
+  };
+
+
+  /// Helper type to compute next power of 2.
+  template<typename _SizeT, int _N, bool _Increment>
+struct _NextPower2
+{
+  static _SizeT
+  _Get(_SizeT __n)
+  {
+	_SizeT __next = _NextPower2<_SizeT, (_N >> 1), false>::_Get(__n);
+	__next |= __next >> _N;
+	if (_Increment)
+	  ++__next;
+
+	return __next;
+  }
+};
+
+  template<typename _SizeT>
+struct _NextPower2<_SizeT, 1, false>
+{
+  static _SizeT
+  _Get(_SizeT __n)
+  {
+	--__n;
+	return __n |= __n >> 1;
+  }
+};
+
+  /// Rehash policy providing power of 2 bucket numbers. Ease modulo
+  /// operations.
+  struct _Power2_rehash_policy
+  {
+using __has_load_factor = std::true_type;
+
+_Power2_rehash_policy(float __z = 1.0) noexcept
+: _M_max_load_factor(__z), _M_next_resize(0) { }
+
+float
+max_load_factor() const noexcept
+{ return _M_max_load_factor; }
+
+// Return a bucket size no smaller than n (as long as n is not above the
+// highest power of 2).
+std::size_t
+_M_next_bkt(std::size_t __n) const
+{
+  constexpr auto __max_bkt
+	= std::size_t(1) << ((sizeof(std::size_t) << 3) - 1);
+
+      std::size_t __res = _NextPower2<std::size_t, (sizeof(std::size_t) << 2), true>::_Get(__n);
+
+  if (__res == 0)
+	__res = __max_bkt;
+
+  if (__res == __max_bkt)
+	// Set next resize to the max value so that we never try to rehash again
+	// as we already reach the biggest possible bucket number.
+	// Note that it might result in max_load_factor not being respected.
+	_M_next_resize = std::size_t(0) - 1;
+  else
+	_M_next_resize
+	  = __builtin_floor(__res * (long double)_M_max_load_factor);
+
+  return __res;
+}
+
+// Return a bucket count appropriate for n elements
+std::size_t
+_M_bkt_for_elements(std::size_t __n) const
+{ return __builtin_ceil(__n / (long double)_M_max_load_factor); }
+
+// __n_bkt is current bucket count, __n_elt is current element count,
+// and __n_ins is number of elements to be inserted.  Do we need to
+// increase bucket count?  If so, return make_pair(true, n), where n
+// is the new bucket count.  If not, return