date:20140428

Re: [PATCH 2/2] allow running mklog as a filter

2014-04-28 Thread Yury Gribov


> +# XXX We should probably accept /dev/stdin or maybe magic autodetection of
> +# being supposed to get the patch from stdin.
> +#

Can we just set $diff to '-' if @ARGV is empty?

> +# In any case if we got the diff on stdin then write the ChangeLog to stdout.

Hm, this breaks semantics: you only dump the CL instead of CL+diff just
because the diff comes from stdin. Perhaps we could append the contents of
@diff_lines here?

> +if ($diff == "-") {

This will work, but 'eq' is the preferred way to compare strings.

-Y

[committed] [PATCH, AARCH64] movcc for fcsel

2014-04-28 Thread Zhenqiang Chen

On 28 April 2014 18:16, Marcus Shawcroft  wrote:
> On 22 April 2014 10:36, Zhenqiang Chen  wrote:
>
>>> +float f1 (float a, float b, float c, float d)
>>> +{
>>> +  if (a > 0.0)
>>> +return c;
>>> +  else
>>> +return 2.0;
>>> +}
>>> +
>>> +double f2 (double a, double b, double c, double d)
>>> +{
>>> +  if (a > b)
>>> +return c;
>>> +  else
>>> +return d;
>>> +}
>
> OK, but please GNUize the test case, function names start in column 1
> and the test case file names should end in _1.c

Thanks! Patch with test case changes was committed @r209889.

-Zhenqiang

Re: [PATCH 1/2] teach mklog to get name / email from git config when available

2014-04-28 Thread Yury Gribov


Hi Trevor,

I think this looks rather useful.

> +if (-d .git) {

What about moving default name/addr (with finger, etc.) to else branch?

> +  chomp($gitname);
> +  chomp($gitaddr);

Missing whitespace before (.

-Y

[PATCH ARM]Handle REG addressing mode in output_move_neon explicitly

2014-04-28 Thread bin.cheng

Hi,
Function output_move_neon currently generates vld1.64 for a memory ref like
"dx <- [r1:SI]".  This is bogus, because vld1.64 requires at least 64-bit
alignment while the memory ref is only 32-bit aligned.  It works now because
GCC doesn't generate such insns in the first place, but that is going to
change if memset/memcpy calls are inlined using neon instructions.

This patch fixes the issue by generating ldr for such instructions.

Bootstrapped on cortex-a15 with neon.
Is it OK?

Thanks,
bin


2014-04-29  Bin Cheng  

* config/arm/arm.c (output_move_neon): Handle REG explicitly.

Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c(revision 209852)
+++ gcc/config/arm/arm.c(working copy)
@@ -18427,6 +18453,20 @@ output_move_neon (rtx *operands)
   /* FIXME: Not currently enabled in neon_vector_mem_operand.  */
   gcc_unreachable ();
 
+case REG:
+  /* We have to use vldm / vstm for too-large modes.  */
+  if (nregs > 1)
+   {
+ if (nregs > 4)
+   templ = "v%smia%%?\t%%m0, %%h1";
+ else
+   templ = "v%s1.64\t%%h1, %%A0";
+
+ ops[0] = mem;
+ ops[1] = reg;
+ break;
+   }
+  /* Fall through.  */
 case LABEL_REF:
 case PLUS:
   {
@@ -18460,14 +18500,7 @@ output_move_neon (rtx *operands)
   }
 
 default:
-  /* We have to use vldm / vstm for too-large modes.  */
-  if (nregs > 4)
-   templ = "v%smia%%?\t%%m0, %%h1";
-  else
-   templ = "v%s1.64\t%%h1, %%A0";
-
-  ops[0] = mem;
-  ops[1] = reg;
+  gcc_unreachable ();
 }
 
   sprintf (buff, templ, load ? "ld" : "st");

Re: [RFC][AARCH64] TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook

2014-04-28 Thread Kugan


On 28/04/14 21:01, Ramana Radhakrishnan wrote:
> On 04/26/14 11:57, Kugan wrote:
>> Attached patch implements TARGET_ATOMIC_ASSIGN_EXPAND_FENV for AARCH64.
>> With this, atomic test-case gcc.dg/atomic/c11-atomic-exec-5.c now PASS.
>>
>> This implementation is based on SPARC and i386 implementations.
>>
>> Regression tested on qemu-aarch64 for aarch64-none-linux-gnu with no new
>> regression. Is this OK for trunk?
> 
> Again like A32 please test on hardware to make sure this behaves
> correctly with c11-atomic-exec-5.c .
> 
> If you don't have access to hardware, let us know : we'll take it for a
> spin once you update the patch according to Marcus's comments.
> 

Thanks for the review. I have updated the patch. I also have updated
hold, clear and update to be exactly as in feholdexcpt.c, fclrexcpt.c
and feupdateenv.c of glibc/ports/sysdeps/aarch64/fpu.

I have limited real hardware access, so I just did a bootstrap and tested
c11-atomic-exec-5.c alone to make sure that it PASSes. I have also
regression tested again on qemu-aarch64 for aarch64-none-linux-gnu with
no new regressions. I would appreciate it if you could do the regression
testing on real hw.

As for the ARM version of the patch, I tested the previous version with
c11-atomic-exec-5.c and verified it on a Chromebook before I posted
the patch. I have now updated the patch based on your review, and the
full bootstrap and regression testing is now under way. I will post the
patch once the results are available.

Thanks,
Kugan

+2014-04-29  Kugan Vivekanandarajah  
+
+   * config/aarch64/aarch64.c (TARGET_ATOMIC_ASSIGN_EXPAND_FENV): New
+   define.
+   * config/aarch64/aarch64-protos.h (aarch64_atomic_assign_expand_fenv):
+   New function declaration.
+   * config/aarch64/aarch64-builtins.c (aarch64_builtins): Add
+   AARCH64_BUILTIN_GET_FPCR, AARCH64_BUILTIN_SET_FPCR,
+   AARCH64_BUILTIN_GET_FPSR and AARCH64_BUILTIN_SET_FPSR.
+   (aarch64_init_builtins): Initialize builtins
+   __builtin_aarch64_set_fpcr, __builtin_aarch64_get_fpcr,
+   __builtin_aarch64_set_fpsr and __builtin_aarch64_get_fpsr.
+   (aarch64_expand_builtin): Expand builtins __builtin_aarch64_set_fpcr,
+   __builtin_aarch64_get_fpcr, __builtin_aarch64_get_fpsr,
+   and __builtin_aarch64_set_fpsr.
+   (aarch64_atomic_assign_expand_fenv): New function.
+   * config/aarch64/aarch64.md (set_fpcr): New pattern.
+   (get_fpcr): Likewise.
+   (set_fpsr): Likewise.
+   (get_fpsr): Likewise.
+   (unspecv): Add UNSPECV_GET_FPCR and UNSPECV_SET_FPCR, UNSPECV_GET_FPSR
+   and UNSPECV_SET_FPSR.
+   * doc/extend.texi (AARCH64 Built-in Functions): Document
+   __builtin_aarch64_set_fpcr, __builtin_aarch64_get_fpcr,
+   __builtin_aarch64_set_fpsr and __builtin_aarch64_get_fpsr.


diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 55cfe0a..5cdc978 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -371,6 +371,12 @@ static aarch64_simd_builtin_datum aarch64_simd_builtin_data[] = {
 enum aarch64_builtins
 {
   AARCH64_BUILTIN_MIN,
+
+  AARCH64_BUILTIN_GET_FPCR,
+  AARCH64_BUILTIN_SET_FPCR,
+  AARCH64_BUILTIN_GET_FPSR,
+  AARCH64_BUILTIN_SET_FPSR,
+
   AARCH64_SIMD_BUILTIN_BASE,
 #include "aarch64-simd-builtins.def"
   AARCH64_SIMD_BUILTIN_MAX = AARCH64_SIMD_BUILTIN_BASE
@@ -752,6 +758,24 @@ aarch64_init_simd_builtins (void)
 void
 aarch64_init_builtins (void)
 {
+  tree ftype_set_fpr
+    = build_function_type_list (void_type_node, unsigned_type_node, NULL);
+  tree ftype_get_fpr
+    = build_function_type_list (unsigned_type_node, NULL);
+
+  aarch64_builtin_decls[AARCH64_BUILTIN_GET_FPCR]
+    = add_builtin_function ("__builtin_aarch64_get_fpcr", ftype_get_fpr,
+                            AARCH64_BUILTIN_GET_FPCR, BUILT_IN_MD, NULL, NULL_TREE);
+  aarch64_builtin_decls[AARCH64_BUILTIN_SET_FPCR]
+    = add_builtin_function ("__builtin_aarch64_set_fpcr", ftype_set_fpr,
+                            AARCH64_BUILTIN_SET_FPCR, BUILT_IN_MD, NULL, NULL_TREE);
+  aarch64_builtin_decls[AARCH64_BUILTIN_GET_FPSR]
+    = add_builtin_function ("__builtin_aarch64_get_fpsr", ftype_get_fpr,
+                            AARCH64_BUILTIN_GET_FPSR, BUILT_IN_MD, NULL, NULL_TREE);
+  aarch64_builtin_decls[AARCH64_BUILTIN_SET_FPSR]
+    = add_builtin_function ("__builtin_aarch64_set_fpsr", ftype_set_fpr,
+                            AARCH64_BUILTIN_SET_FPSR, BUILT_IN_MD, NULL, NULL_TREE);
+
   if (TARGET_SIMD)
 aarch64_init_simd_builtins ();
 }
@@ -964,6 +988,36 @@ aarch64_expand_builtin (tree exp,
 {
   tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   int fcode = DECL_FUNCTION_CODE (fndecl);
+  int icode;
+  rtx pat, op0;
+  tree arg0;
+
+  switch (fcode)
+{
+case AARCH64_BUILTIN_GET_FPCR:
+case AARCH64_BUILTIN_SET_FPCR:
+case AARCH64_BUILTIN_GET_FPSR:
+case AARCH64_BUILTIN_SET_FPSR:
+  if (

[wwwdocs] Mention generic functions and explicit lambda templates in gcc-4.9/changes

2014-04-28 Thread Adam Butcher


Hi,

The following patch adds details of support for generic functions and 
the explicit template parameter extension for generic lambdas present in 
GCC 4.9.


OK to commit?

Cheers,
Adam

Index: htdocs/gcc-4.9/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.68
diff -u -r1.68 changes.html
--- htdocs/gcc-4.9/changes.html 22 Apr 2014 11:28:09 -  1.68
+++ htdocs/gcc-4.9/changes.html 29 Apr 2014 02:10:39 -
@@ -273,12 +273,37 @@
 
   
   
-G++ supports C++1y 
polymorphic lambdas.

+G++ supports C++1y
+generic (polymorphic) lambdas.
 
 // a functional object that will increment any type
 auto incr = [](auto x) { return x++; };
 
   
+  
+As a GNU extension, G++ supports explicit template parameter
+syntax for generic lambdas.  This can be combined in the expected
+way with the standard auto syntax.
+
+// a functional object that will add two like-type objects
+auto add = [] <typename T> (T a, T b) { return a + b; };
+
+  
+  
+G++ supports unconstrained generic functions as specified
+by §4.1.2 and §5.1.1 of
+href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3889.pdf";>

+N3889: Concepts Lite Specification.  Briefly,
+auto may be used as a type-specifier in a parameter
+declaration of any function declarator in order to introduce an
+implicit function template parameter, akin to generic lambdas.
+
+// the following two function declarations are equivalent
+auto incr(auto x) { return x++; }
+template <typename T>
+auto incr(T x) { return x++; }
+
+  
 

   Runtime Library (libstdc++)
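
For readers who want to try the first feature the patch describes, here is a minimal sketch of a C++14 generic lambda next to a hand-written function template that behaves the same way. The names incr_lambda and incr_template are illustrative only and do not appear in the patch; note that x++ returns the value before the increment.

```cpp
#include <cassert>

// Generic (polymorphic) lambda: the `auto` parameter makes the closure's
// call operator a template, so it accepts any type that supports x++.
auto incr_lambda = [](auto x) { return x++; };

// Hand-written counterpart (illustrative name): a function template with
// a deduced return type, equivalent in behaviour to the lambda above.
template <typename T>
auto incr_template(T x) { return x++; }
```

Both forms take their argument by value, so the caller's object is never modified; the result is the argument's original value.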

[PATCH 2/2] allow running mklog as a filter

2014-04-28 Thread tsaunders

From: Trevor Saunders 

Hi,

I'd like to be able to suggest a git prepare-commit-msg hook that uses this
to populate the commit message at some point.  This doesn't do
that, but it's a step in that direction; what would remain is just writing a
shell script to pipe git diff to mklog and then put the result in the commit
template.  Until that's done this at least makes it so you don't need to
interact with a diff file at any point.

Trev


2014-04-28  Trevor Saunders


* mklog: if reading the patch on stdin write the ChangeLog to stdout.
---
 contrib/mklog | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/contrib/mklog b/contrib/mklog
index 5f5d98e..dfdd2a4 100755
--- a/contrib/mklog
+++ b/contrib/mklog
@@ -277,8 +277,16 @@ foreach my $clname (keys %cl_entries) {
print CLFILE "$clname:\n\n$hdrline\n\n$cl_entries{$clname}\n";
 }
 
-# Concatenate the ChangeLog template and the original .diff file.
-system ("cat $diff >>$temp && mv $temp $diff") == 0
-or die "Could not add the ChangeLog entry to $diff";
+# XXX We should probably accept /dev/stdin or maybe magic autodetection of
+# being supposed to get the patch from stdin.
+#
+# In any case if we got the diff on stdin then write the ChangeLog to stdout.
+if ($diff == "-") {
+   system("cat $temp");
+} else {
+   # Concatenate the ChangeLog template and the original .diff file.
+   system ("cat $diff >>$temp && mv $temp $diff") == 0
+   or die "Could not add the ChangeLog entry to $diff";
+}
 
 exit 0;
-- 
2.0.0.rc0

[PATCH 1/2] teach mklog to get name / email from git config when available

2014-04-28 Thread tsaunders

From: Trevor Saunders 

Hi,

finger gives the wrong data on my machines, and while I could fix it, it seems
nicer to use what's configured for the git repo we're in, if any; that way you
can use different defaults from the rest of the machine.

Trev

contrib/ChangeLog:

2014-04-28  Trevor Saunders  

* mklog: if in a git checkout try to get name and email from git.
---
 contrib/mklog | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/contrib/mklog b/contrib/mklog
index fb489b0..5f5d98e 100755
--- a/contrib/mklog
+++ b/contrib/mklog
@@ -38,6 +38,20 @@ $gcc_root = $0;
 $gcc_root =~ s/[^\\\/]+$/../;
 chdir $gcc_root;
 
+# if this is a git tree then take name and email from the git configuration
+if (-d .git) {
+  $gitname = `git config user.name`;
+  chomp($gitname);
+  if ($gitname) {
+ $name = $gitname;
+  }
+
+  $gitaddr = `git config user.email`;
+  chomp($gitaddr);
+  if ($gitaddr) {
+ $addr = $gitaddr;
+  }
+}
 
 #-
 # Program starts here. You should not need to edit anything below this
-- 
2.0.0.rc0

[PATCH 0/2] make using mklog a little nicer

2014-04-28 Thread tsaunders

Hi,

These patches make it a little nicer to use mklog, but they're not
particularly pretty.

Trev

Re: [PATCH (for next stage 1)] Add return type to gimple function dumps

2014-04-28 Thread David Malcolm

On Thu, 2014-04-24 at 15:46 -0600, Jeff Law wrote:
> On 03/10/14 13:22, David Malcolm wrote:
> > Gimple function dumps contain the types of parameters, but not of the
> > return type.
> >
> > The attached patch fixes this omission; here's an example of the
> > before/after diff:
> > $ diff -up /tmp/pr23401.c.004t.gimple.old /tmp/pr23401.c.004t.gimple.new
> > --- /tmp/pr23401.c.004t.gimple.old  2014-03-10 13:40:08.972063541 -0400
> > +++ /tmp/pr23401.c.004t.gimple.new  2014-03-10 13:39:49.346515464 -0400
> > @@ -1,3 +1,4 @@
> > +int
> >    (int i)
> >   {
> > int D.1731;
> >
> >
> > Successfully bootstrapped and regrtested on x86_64 Linux (Fedora 20).
> >
> > A couple of test cases needed tweaking, since they were counting the
> > number of occurrences of "int" in the gimple dump, which thus changed
> > for functions returning int (like the one above).
> >
> > OK for next stage 1?
> Conceptually OK.  As Richi notes, the work here is in fixing up the 
> testsuite.  I didn't see a reply to Richi's question, particularly WRT 
> the Fortran testsuite.

I'm attaching a revised version of the patch which adds the use of
TDF_SLIM (though it didn't appear to be necessary in the test I did of a
function returning a struct).

Successfully bootstrapped & regrtested on x86_64 Linux (Fedora 20),
using:
  --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto

I didn't see any new failures from this in the testsuite, in particular
gfortran.sum.  Here's a comparison of the before/after test results,
generated using my "jamais-vu" tool [1], with comments added by me
inline:

Comparing 16 common .sum files
--

 gcc/testsuite/ada/acats/acats.sum : total: 2320 PASS: 2320
 gcc/testsuite/g++/g++.sum : total: 90421 FAIL: 3 PASS: 86969 XFAIL: 445 UNSUPPORTED: 3004
 gcc/testsuite/gcc/gcc.sum : total: 110458 FAIL: 45 PASS: 108292 XFAIL: 265 XPASS: 33 UNSUPPORTED: 1823
 gcc/testsuite/gfortran/gfortran.sum : total: 45717 PASS: 45600 XFAIL: 52 UNSUPPORTED: 65
 gcc/testsuite/gnat/gnat.sum : total: 1255 PASS: 1234 XFAIL: 18 UNSUPPORTED: 3
 gcc/testsuite/go/go.sum : total: 7266 PASS: 7258 XFAIL: 1 UNTESTED: 6 UNSUPPORTED: 1
 gcc/testsuite/obj-c++/obj-c++.sum : total: 1450 PASS: 1354 XFAIL: 10 UNSUPPORTED: 86
 gcc/testsuite/objc/objc.sum : total: 2973 PASS: 2893 XFAIL: 6 UNSUPPORTED: 74
 x86_64-unknown-linux-gnu/boehm-gc/testsuite/boehm-gc.sum : total: 13 PASS: 12 UNSUPPORTED: 1
 x86_64-unknown-linux-gnu/libatomic/testsuite/libatomic.sum : total: 54 PASS: 54
 x86_64-unknown-linux-gnu/libffi/testsuite/libffi.sum : total: 1856 PASS: 1801 UNSUPPORTED: 55
 x86_64-unknown-linux-gnu/libgo/libgo.sum : total: 122 PASS: 122
 x86_64-unknown-linux-gnu/libgomp/testsuite/libgomp.sum : total: 2420 PASS: 2420
 x86_64-unknown-linux-gnu/libitm/testsuite/libitm.sum : total: 30 PASS: 26 XFAIL: 3 UNSUPPORTED: 1
 x86_64-unknown-linux-gnu/libjava/testsuite/libjava.sum : total: 2586 PASS: 2582 XFAIL: 4
 x86_64-unknown-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum : total: 10265 PASS: 10000 XFAIL: 41 UNSUPPORTED: 224

(...i.e. the totals were unchanged between unpatched/patched for all of
the .sum files; and yes, Fortran was tested.  Should there be a
gcj.sum?)

Tests that went away in gcc/testsuite/gcc/gcc.sum: 2


 PASS: gcc.dg/tree-ssa/pr23401.c scan-tree-dump-times gimple "int" 5
 PASS: gcc.dg/tree-ssa/pr27810.c scan-tree-dump-times gimple "int" 3

Tests appeared in gcc/testsuite/gcc/gcc.sum: 2
--

 PASS: gcc.dg/tree-ssa/pr23401.c scan-tree-dump-times gimple "int" 6
 PASS: gcc.dg/tree-ssa/pr27810.c scan-tree-dump-times gimple "int" 4


(...my comparison tool isn't smart enough yet to tie these "went
away"/"appeared" results together; they reflect the fixups from the
patch).

Tests that went away in gcc/testsuite/go/go.sum: 2
--

 PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of build) 
compilation,  -O2 -g
 PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of build) 
execution,  -O2 -g

Tests appeared in gcc/testsuite/go/go.sum: 2


 PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of build) 
compilation,  -O2 -g
 PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of build) 
execution,  -O2 -g

(...I hand edited the above, this main.go test embeds numerous paths,
which change between the two builds; so nothing really changed here).


Are the above results sane?

I'm not sure why I didn't see the failures Richi described; the patch
does appear to work (though again, should there be a gcj.sum? Did I miss
any frontends?)

OK for trunk?

Dave

[1] https://github.com/davidmalcolm/jamais-vu
>From d89adb2ac085741e569d2988b3edf2bf6b481024 Mon Sep 17 00:00:00 2001
From: David Malcolm 
Date: Mon, 10 Mar 2014 13:52:44 -0400
Subject: [PATCH] Dump the return type

Re: [patch 3/N] std::regex refactoring - _Executor DFS / BFS

2014-04-28 Thread Tim Shen

On Mon, Apr 28, 2014 at 4:18 PM, Jonathan Wakely  wrote:
> I thought I'd made a 5x speedup to the run-time of the regex matching,
> but I was comparing the wrong version and the improvement actually
> came from one of your patches yesterday - maybe this one:
> http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01725.html
>
> Nice work!

That's surprising. May I ask for the performance testcase?

> My changes don't seem to make anything worse, and the first one makes
> a big improvement to the worst case performance of std::wregex.

Glad to see that!

Thanks!


-- 
Regards,
Tim Shen

Re: [PATCH 1/2, x86] Add palignr support for AVX2.

2014-04-28 Thread Richard Henderson

On 04/28/2014 01:43 PM, Evgeny Stupachenko wrote:
> Agree on checks:
> 
>   /* PALIGNR of 2 128-bits registers takes only 1 instruction.
>      Requires SSSE3.  */
>   if (GET_MODE_SIZE (d->vmode) == 16)
>     {
>       if (!TARGET_SSSE3)
>         return false;
>     }
>   /* PALIGNR of 2 256-bits registers on AVX2 costs only 2 instructions:
>      PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.  */
>   else if (GET_MODE_SIZE (d->vmode) == 32)
>     {
>       if (!TARGET_AVX2)
>         return false;
>     }
>   else
>     return false;

Thanks, much better.


r~

Re: Update doc/gimple.texi to reflect change from union to class hierarchy

2014-04-28 Thread David Malcolm

On Sat, 2014-04-26 at 22:22 +0200, Gerald Pfeifer wrote:
> On Fri, 25 Apr 2014, David Malcolm wrote:
> > Successfully generates HTML, info and pdf via appropriate make
> > invocations; example of resulting HTML can be seen at the bottom of:
> > 
> > http://dmalcolm.fedorapeople.org/gcc/2014-04-25/Tuple-representation.html
> > 
> > The diagram is split over pages 178-180 of the generated PDF:
> > 
> > http://dmalcolm.fedorapeople.org/gcc/2014-04-25/gccint.pdf
> 
> When I just checked that PDF did not (yet?) have that table.

Sadly I see a discrepancy between the page numbering as reported by my
PDF viewer (evince 3.4.0 with poppler 0.18.4), vs the page numbering
that appears at the top of the pages themselves, due to the viewer
appearing to start page numbering at "1" with the first page - despite
15 pages of frontmatter.  In my viewer, the "Introduction" page has "1"
at the top of it, but appears as page 16 in the viewer's navigation UI.
I don't know if this is an issue with Evince/Poppler (this *used* to
work; in a past life I briefly worked on Evince), or with the toolchain
used to generate the PDF.

I was referring to the page numbers visible in the pages themselves.  In
my viewer, pages 178-180 appear as pages 194-196 in the navigation UI,
and the table does appear there.

Sorry for any confusion.

> > but I don't think there's a good way to avoid page breaks there.
> 
> Agreed.
> 
> > Bootstrap in progress [do pure doc fixes require a bootstrap?]
> 
> Not generally, no.
> 
> > OK for trunk, assuming bootstraps? (and eventually for the 4.9 branch,
> > after a few days?)
> 
> Yes and yes.
> 
> (I did not review the technical accuracy of the diagram in detail, but 
> (a) micromanagement is not helpful and (b) the current documentation 
> is plain wrong.)

Thanks.

> The one thing that confused me a bit at first was the line
>   +- gimple_statement_base
> where I then thought all the others should move their indentation
> left by four or so columns.  Which is not what you are trying to
> state in fact.  How about omitting the "+-" from that one?
> 
> Gerald

Sure; omitted.

Looking at the generated "info", it occurred to me that the diagram
should live in its own node, so I took the liberty of adding that also.

Successfully bootstrapped (again); and make info/html/pdf work as
expected.

Committed to trunk as r209879.  Will commit to 4.9 in a few days unless
anyone objects.

Dave

Re: [PATCH, rs6000] Improve atomic_load/store code gen for Power8 TI mode

2014-04-28 Thread Pat Haugen


On 04/09/2014 02:56 PM, David Edelsohn wrote:

> I have reverted this on trunk and asked Bill to revert this on the 4.8
> branch. This patch is too risky to apply this close to a freeze for
> 4.9.

I received approval off list for an updated variant of the patch for
4.8, so this patch has now been (re)committed to 4.8/4.9/trunk.


-Pat

Re: [PATCH 1/2, x86] Add palignr support for AVX2.

2014-04-28 Thread Evgeny Stupachenko

Agree on checks:

  /* PALIGNR of 2 128-bits registers takes only 1 instruction.
     Requires SSSE3.  */
  if (GET_MODE_SIZE (d->vmode) == 16)
    {
      if (!TARGET_SSSE3)
        return false;
    }
  /* PALIGNR of 2 256-bits registers on AVX2 costs only 2 instructions:
     PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.  */
  else if (GET_MODE_SIZE (d->vmode) == 32)
    {
      if (!TARGET_AVX2)
        return false;
    }
  else
    return false;


On Mon, Apr 28, 2014 at 9:32 PM, Richard Henderson  wrote:
> On 04/28/2014 09:48 AM, Evgeny Stupachenko wrote:
>> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
>> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>> +  /* PALIGNR of 2 256-bits registers on AVX2 costs only 2 instructions:
>> + PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.
>> + PALIGNR of 2 128-bits registers takes only 1 instrucion.  */
>> +  if (!TARGET_SSSE3 || (GET_MODE_SIZE (d->vmode) != 16 &&
>> +  GET_MODE_SIZE (d->vmode) != 32))
>> +return false;
>> +  /* Only AVX2 or higher support PALIGNR on 256-bits registers.  */
>> +  if (!TARGET_AVX2 && (GET_MODE_SIZE (d->vmode) == 32))
>>  return false;
>
> This is confusingly written.
>
> How about
>
>   if (GET_MODE_SIZE (d->vmode) == 16)
> {
>   if (!TARGET_SSSE3)
> return false;
> }
>   else if (GET_MODE_SIZE (d->vmode) == 32)
> {
>   if (!TARGET_AVX2)
> return false;
> }
>   else
> return false;
>
> With the comments added into the right places.
>
>
> r~
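
As a sanity check on the restructured test, the nested form can be compared against a single combined condition over every combination of inputs. The sketch below is only an illustration: mode_size, has_ssse3 and has_avx2 are hypothetical stand-ins for GET_MODE_SIZE (d->vmode), TARGET_SSSE3 and TARGET_AVX2, and none of the function names exist in GCC.

```cpp
#include <cassert>

// Nested form, mirroring the structure suggested in the review.
bool palignr_ok_nested(unsigned mode_size, bool has_ssse3, bool has_avx2)
{
  if (mode_size == 16)
    {
      if (!has_ssse3)
        return false;
    }
  else if (mode_size == 32)
    {
      if (!has_avx2)
        return false;
    }
  else
    return false;
  return true;
}

// Flat form: reject unless (SSSE3 and 128-bit) or (AVX2 and 256-bit).
bool palignr_ok_flat(unsigned mode_size, bool has_ssse3, bool has_avx2)
{
  if ((!has_ssse3 || mode_size != 16) && (!has_avx2 || mode_size != 32))
    return false;
  return true;
}

// Exhaustively compare the two forms over a few vector sizes (in bytes)
// and all four combinations of the ISA flags.
bool forms_agree()
{
  const unsigned sizes[] = {8, 16, 32, 64};
  for (unsigned size : sizes)
    for (int ssse3 = 0; ssse3 < 2; ++ssse3)
      for (int avx2 = 0; avx2 < 2; ++avx2)
        if (palignr_ok_nested(size, ssse3 != 0, avx2 != 0)
            != palignr_ok_flat(size, ssse3 != 0, avx2 != 0))
          return false;
  return true;
}
```

The two forms accept exactly the same inputs; the nested one simply leaves an obvious place for each comment.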

Re: [PATCH 1/2, x86] Add palignr support for AVX2.

2014-04-28 Thread Evgeny Stupachenko

On Mon, Apr 28, 2014 at 9:08 PM, H.J. Lu  wrote:
> On Mon, Apr 28, 2014 at 9:48 AM, Evgeny Stupachenko  
> wrote:
>> Hi,
>>
>> The patch enables use of "palignr with perm" instead of "2 pshufb, or
>> and perm" at AVX2 for some cases.
>>
>> Bootstrapped and passes make check on x86.
>>
>> Is it ok?
>>
>> 2014-04-28  Evgeny Stupachenko  
>>
>> * config/i386/i386.c (expand_vec_perm_1): Try AVX2 vpshufb.
>> * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
>> PALIGNR instruction.
>
> Can you add testcases to verify that AVX2 vpshufb and palignr are
> properly generated?

One of the next patches will have a test case.

>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 88142a8..ae80477 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -42807,6 +42807,8 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
>>return true;
>>  }
>>
>> +static bool expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d);
>> +
>>  /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to instantiate D
>> in a single instruction.  */
>>
>> @@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
>>if (expand_vec_perm_pshufb (d))
>>  return true;
>>
>> +  /* Try the AVX2 vpshufb.  */
>> +  if (expand_vec_perm_vpshufb2_vpermq (d))
>> +return true;
>> +
>>/* Try the AVX512F vpermi2 instructions.  */
>>rtx vec[64];
>>enum machine_mode mode = d->vmode;
>> @@ -43004,7 +43010,7 @@ expand_vec_perm_pshuflw_pshufhw (struct
>> expand_vec_perm_d *d)
>>  }
>>
>>  /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
>> -   the permutation using the SSSE3 palignr instruction.  This succeeds
>> +   the permutation using the SSSE3/AVX2 palignr instruction.  This succeeds
>> when all of the elements in PERM fit within one vector and we merely
>> need to shift them down so that a single vector permutation has a
>> chance to succeed.  */
>> @@ -43015,14 +43021,20 @@ expand_vec_perm_palignr (struct expand_vec_perm_d 
>> *d)
>>unsigned i, nelt = d->nelt;
>>unsigned min, max;
>>bool in_order, ok;
>> -  rtx shift, target;
>> +  rtx shift, shift1, target, tmp;
>>struct expand_vec_perm_d dcopy;
>>
>> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
>> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>> +  /* PALIGNR of 2 256-bits registers on AVX2 costs only 2 instructions:
>> + PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.
>> + PALIGNR of 2 128-bits registers takes only 1 instrucion.  */
>> +  if (!TARGET_SSSE3 || (GET_MODE_SIZE (d->vmode) != 16 &&
>
>^^^ This should
> be only on the next line.
>
>> +  GET_MODE_SIZE (d->vmode) != 32))
>> +return false;
>> +  /* Only AVX2 or higher support PALIGNR on 256-bits registers.  */
>> +  if (!TARGET_AVX2 && (GET_MODE_SIZE (d->vmode) == 32))
>  ^ No need for '(' here.
>
> Or you can use
>
> if ((!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>  && (!TARGET_AVX2 || GET_MODE_SIZE (d->vmode) != 32))
>
>>  return false;
>>
>> -  min = nelt, max = 0;
>> +  min = 2 * nelt, max = 0;
> ^^  Will this change 128bit vector code?

No. The only case where all elements in the permutation constant are equal
to or greater than "nelt" is canonicalized to the one-operand case
with all elements less than "nelt". So the change actually affects
nothing, but it is still more accurate.

>
>>for (i = 0; i < nelt; ++i)
>>  {
>>unsigned e = d->perm[i];
>> @@ -43041,9 +43053,34 @@ expand_vec_perm_palignr (struct expand_vec_perm_d 
>> *d)
>>
>>dcopy = *d;
>>shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
>> -  target = gen_reg_rtx (TImode);
>> -  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
>> - gen_lowpart (TImode, d->op0), shift));
>> +  shift1 = GEN_INT ((min - nelt / 2) *
>> +  GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
>> +
>> +  if (GET_MODE_SIZE (d->vmode) != 32)
>> +{
>> +  target = gen_reg_rtx (TImode);
>> +  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
>> + gen_lowpart (TImode, d->op0), shift));
>> +}
>> +  else
>> +{
>> +  target = gen_reg_rtx (V2TImode);
>> +  tmp = gen_reg_rtx (V4DImode);
>> +  emit_insn (gen_avx2_permv2ti (tmp,
>> +   gen_lowpart (V4DImode, d->op0),
>> +   gen_lowpart (V4DImode, d->op1),
>> +   GEN_INT (33)));
>> +  if (min < nelt / 2)
>> +emit_insn (gen_avx2_palignrv2ti (target,
>> +gen_lowpart (V2TImode, tmp),
>> +gen_lowpart (V2TImode, d->op0),
>> +shift

[google/gcc-4_9] PR debug/60929: Fix a few ICEs and other problems with -fdebug-types-sections

2014-04-28 Thread Cary Coutant

I've backported the following patch from trunk at r209812. Committed
on the google/gcc-4_9 branch at r209875.

Google ref: 14230806.

-cary

gcc/
* dwarf2out.c (should_move_die_to_comdat): A type definition
can contain a subprogram definition, but don't move it to a
comdat unit.
(clone_as_declaration): Copy DW_AT_abstract_origin attribute.
(generate_skeleton_bottom_up): Remove DW_AT_object_pointer attribute
from original DIE.
(clone_tree_hash): Rename to...
(clone_tree_partial): ...this; change callers.  Copy
DW_TAG_subprogram DIEs as declarations.
(copy_decls_walk): Don't copy children of a declaration into a
type unit.

gcc/testsuite/
* g++.dg/debug/dwarf2/dwarf4-nested.C: New test case.
* g++.dg/debug/dwarf2/dwarf4-typedef.C: Add
-fdebug-types-section flag.

Re: [patch 3/N] std::regex refactoring - _Executor DFS / BFS

2014-04-28 Thread Jonathan Wakely


I thought I'd made a 5x speedup to the run-time of the regex matching,
but I was comparing the wrong version and the improvement actually
came from one of your patches yesterday - maybe this one:
http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01725.html

Nice work!

My changes don't seem to make anything worse, and the first one makes
a big improvement to the worst case performance of std::wregex.

Re: [PATCH] Implement -fsanitize=float-divide-by-zero

2014-04-28 Thread Marc Glisse


On Mon, 28 Apr 2014, Marek Polacek wrote:


> This patch implements -fsanitize=float-divide-by-zero option that can
> be used to detect division by zero even when using floating types.
> Most of the code in ubsan_instrument_division was ready for this
> so this was mainly about handling REAL_TYPE there.


Ideally this would all be unneeded, you would compile your program with 
pragma stdc fenv_access on, on glibc you would call 
feenableexcept(FE_DIVBYZERO) at the beginning of the program, done (I may 
have listed the wrong function, I am always confused by those names).


But I guess we won't be there anytime soon, so in the mean time...


> Since division by a floating point zero can be a valid way of
> obtaining infinities and NaNs, I'm not 100% sure this ought to be
> enabled by default (that is, enabled when -fsanitize=undefined is
> specified).


Please don't enable it with "undefined". As you say, it is well defined 
(except when finite-math-only is in effect). If you want a meta-category 
for this kind of valid thing, maybe -fsanitize=unusual or 
-fsanitize=suspicious.
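
To make the "well defined" point concrete, here is a small sketch of what IEEE 754 arithmetic yields for division by a floating-point zero. It assumes an IEC 559 (IEEE 754) double and default floating-point options (no -ffast-math or -ffinite-math-only); the function name divide is purely illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// The results described below are only guaranteed for IEEE 754 doubles.
static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 doubles");

// Plain division, kept in a function so the operands are run-time values.
double divide(double num, double den)
{
  return num / den;
}
```

Under those assumptions divide(1.0, 0.0) is +infinity, divide(-1.0, 0.0) is -infinity and divide(0.0, 0.0) is a NaN, which is why a program may perform such divisions on purpose.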


--
Marc Glisse

Re: [patch 3/N] std::regex refactoring - _Executor DFS / BFS

2014-04-28 Thread Tim Shen

On Mon, Apr 28, 2014 at 3:29 PM, Jonathan Wakely  wrote:
> I'm testing the attached patch now. It compiles slightly faster
> (-ftime-report shows, as expected, that less time is spent in template
> instantiation).
>
> I'd also like to change __match_mode from a bool to an enum like:
>
>   enum _Match_mode { _S_exact_match, _S_prefix_match };
>
> Because I find it easier to read something like:
>
>   if (__match_mode == _S_exact_match)
> // ...
>
> rather than
>
>   if (__match_mode)
> // ...

Oh this is nice, good to know.

Thanks!


-- 
Regards,
Tim Shen

Re: Changes for if-convert to recognize simple conditional reduction.

2014-04-28 Thread Richard Henderson

On 04/17/2014 06:09 AM, Yuri Rumyantsev wrote:
> +  /* Build cond expression using COND and constant operand
> + of reduction rhs.  */
> +  c = fold_build_cond_expr (TREE_TYPE (rhs1),
> + unshare_expr (cond),
> + swap? zero: op1,
> + swap? op1: zero);

Do we recognize somewhere the canonical value for the comparison is -1, and
simplify this further?

E.g.

  if (A[i] != 0) num += 1;

  _vec_cmp = (_vec_A != _vec_ZERO);
  _vec_num -= _vec_cmp;


  if (A[i] != 0) num += x;

  _vec_cmp = (_vec_A != _vec_ZERO);
  _vec_cmp *= _vec_x;
  _vec_num -= _vec_cmp;
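
A scalar sketch of the same idea, one lane at a time, may help when checking the signs. It assumes the usual vector convention that a true comparison produces an all-ones lane (-1) and a false one produces 0; the function names are hypothetical and not taken from GCC.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One lane of a vector compare: -1 (all ones) when true, 0 when false.
std::int32_t cmp_ne_zero(std::int32_t a)
{
  return a != 0 ? -1 : 0;
}

// if (A[i] != 0) num += 1;   becomes   num -= cmp;
std::int32_t count_nonzero(const std::vector<std::int32_t>& a)
{
  std::int32_t num = 0;
  for (std::int32_t v : a)
    num -= cmp_ne_zero(v);      // subtracting -1 adds 1
  return num;
}

// if (A[i] != 0) num += x;   becomes   cmp *= x; num -= cmp;
std::int32_t add_x_where_nonzero(const std::vector<std::int32_t>& a,
                                 std::int32_t x)
{
  std::int32_t num = 0;
  for (std::int32_t v : a)
    {
      std::int32_t cmp = cmp_ne_zero(v);
      cmp *= x;                 // -x where the test held, 0 elsewhere
      num -= cmp;               // subtracting -x adds x
    }
  return num;
}
```

For the input {0, 3, 0, -7, 5} the first function counts three non-zero elements, and the second with x = 4 accumulates 12.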



r~

Re: [RFC] Add aarch64 support for ada

2014-04-28 Thread Richard Henderson

On 04/28/2014 08:00 AM, Eric Botcazou wrote:
> You can re-apply the gcc-interface/Makefile.in hunk 
> (I reverted it as well) but you first need to adjust it to the mainline.

Done, after re-bootstrapping on aarch64 Just to Be Sure.


r~

Re: [PATCH, rs6000] Improve TImode add/sub

2014-04-28 Thread David Edelsohn

On Mon, Apr 28, 2014 at 3:33 PM, Pat Haugen  wrote:
> On 04/16/2014 10:27 PM, David Edelsohn wrote:
>>>
>>> >Updated patch with above comments incorporated. Bootstrap/regtest on
>>> > BE/LE
>>> >with no new regressions. Ok for trunk?
>>
>> 2014-04-08  Pat Haugen
>>
>>  * config/rs6000/rs6000.md (addti3, subti3): New.
>>
>> gcc/testsuite:
>>  * gcc.target/powerpc/ti_math1.c: New.
>>  * gcc.target/powerpc/ti_math2.c: New.
>>
>> Okay.
>
> Is this also ok to backport to 4.8/4.9? Bootstrap/regtest on BE/LE with no
> new regressions.

The respective variants for 4.8 and 4.9/trunk are okay.

Thanks, David

Re: [patch 3/N] std::regex refactoring - _Executor DFS / BFS

2014-04-28 Thread Jonathan Wakely


On 28/04/14 15:24 -0400, Tim Shen wrote:

Worth a try. Will you make the change or shall I? It seems simpler to
do than to discuss.


Yes :-)

I'm testing the attached patch now. It compiles slightly faster
(-ftime-report shows, as expected, that less time is spent in template
instantiation).

I'd also like to change __match_mode from a bool to an enum like:

  enum _Match_mode { _S_exact_match, _S_prefix_match };

Because I find it easier to read something like:

  if (__match_mode == _S_exact_match)
// ...

rather than

  if (__match_mode)
// ...
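
A minimal sketch of the readability argument, using a toy literal matcher rather than the real _Executor. The names Match_mode, exact_match, prefix_match and match_literal are hypothetical stand-ins for illustration only.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the proposed _Match_mode enum.
enum Match_mode { exact_match, prefix_match };

// Toy matcher for a literal pattern: in exact mode the whole input must
// equal the pattern, in prefix mode the input only has to start with it.
bool match_literal(const std::string& pattern, const std::string& input,
                   Match_mode mode)
{
  assert(mode == exact_match || mode == prefix_match);
  if (input.compare(0, pattern.size(), pattern) != 0)
    return false;
  if (mode == exact_match)
    return input.size() == pattern.size();
  return true;
}
```

A call such as match_literal("ab", text, prefix_match) states its intent at the call site, which a bare true or false argument does not.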


diff --git a/libstdc++-v3/include/bits/regex_executor.h b/libstdc++-v3/include/bits/regex_executor.h
index 064e3df..617d1a4 100644
--- a/libstdc++-v3/include/bits/regex_executor.h
+++ b/libstdc++-v3/include/bits/regex_executor.h
@@ -88,7 +88,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _M_match()
   {
_M_current = _M_begin;
-   return _M_main();
+   return _M_main(true);
   }
 
   // Set matched when some prefix of the string matches the pattern.
@@ -96,33 +96,28 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _M_search_from_first()
   {
_M_current = _M_begin;
-   return _M_main();
+   return _M_main(false);
   }
 
   bool
   _M_search();
 
 private:
-  template
-   void
-   _M_rep_once_more(_StateIdT);
-
-  template
-   void
-   _M_dfs(_StateIdT __start);
-
-  template
-   bool
-   _M_main()
-   { return _M_main_dispatch<__match_mode>(__search_mode{}); }
-
-  template
-   bool
-   _M_main_dispatch(__dfs);
-
-  template
-   bool
-   _M_main_dispatch(__bfs);
+  void
+  _M_rep_once_more(bool __match_mode, _StateIdT);
+
+  void
+  _M_dfs(bool __match_mode, _StateIdT __start);
+
+  bool
+  _M_main(bool __match_mode)
+  { return _M_main_dispatch(__match_mode, __search_mode{}); }
+
+  bool
+  _M_main_dispatch(bool __match_mode, __dfs);
+
+  bool
+  _M_main_dispatch(bool __match_mode, __bfs);
 
   bool
   _M_is_word(_CharT __ch) const
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index a7351de..c0c75fa 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -45,7 +45,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   do
{
  _M_current = __cur;
- if (_M_main())
+ if (_M_main(false))
return true;
}
   // Continue when __cur == _M_end
@@ -78,13 +78,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   //
   template
-  template
 bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
-_M_main_dispatch(__dfs)
+_M_main_dispatch(bool __match_mode, __dfs)
 {
   _M_has_sol = false;
   _M_cur_results = _M_results;
-  _M_dfs<__match_mode>(_M_states._M_start);
+  _M_dfs(__match_mode, _M_states._M_start);
   return _M_has_sol;
 }
 
@@ -112,9 +111,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   //   O(_M_nfa.size() * match_results.size())
   template
-  template
 bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
-_M_main_dispatch(__bfs)
+_M_main_dispatch(bool __match_mode, __bfs)
 {
   _M_states._M_queue(_M_states._M_start, _M_results);
   bool __ret = false;
@@ -128,7 +126,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  for (auto& __task : __old_queue)
{
  _M_cur_results = std::move(__task.second);
- _M_dfs<__match_mode>(__task.first);
+ _M_dfs(__match_mode, __task.first);
}
  if (!__match_mode)
__ret |= _M_has_sol;
@@ -168,9 +166,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // we need to spare one more time for potential group capture.
   template
-  template
 void _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
-_M_rep_once_more(_StateIdT __i)
+_M_rep_once_more(bool __match_mode, _StateIdT __i)
 {
   const auto& __state = _M_nfa[__i];
   auto& __rep_count = _M_rep_count[__i];
@@ -179,7 +176,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  auto __back = __rep_count;
  __rep_count.first = _M_current;
  __rep_count.second = 1;
- _M_dfs<__match_mode>(__state._M_alt);
+ _M_dfs(__match_mode, __state._M_alt);
  __rep_count = __back;
}
   else
@@ -187,7 +184,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  if (__rep_count.second < 2)
{
  __rep_count.second++;
- _M_dfs<__match_mode>(__state._M_alt);
+ _M_dfs(__match_mode, __state._M_alt);
  __rep_count.second--;
}
}
@@ -195,9 +192,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
-  template
 void _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
-_M_dfs(_StateIdT __i)
+_M_dfs(bool __match_mode, _StateIdT __i)
 {
   if (_M_states._M_visited(__i))
return;
@@ -216,19

Re: [PATCH, rs6000] Improve TImode add/sub

2014-04-28 Thread Pat Haugen

On 04/16/2014 10:27 PM, David Edelsohn wrote:

>Updated patch with above comments incorporated. Bootstrap/regtest on BE/LE
>with no new regressions. Ok for trunk?

2014-04-08  Pat Haugen

 * config/rs6000/rs6000.md (addti3, subti3): New.

gcc/testsuite:
 * gcc.target/powerpc/ti_math1.c: New.
 * gcc.target/powerpc/ti_math2.c: New.

Okay.
Is this also ok to backport to 4.8/4.9? Bootstrap/regtest on BE/LE with 
no new regressions.

Re: [patch 3/N] std::regex refactoring - _Executor DFS / BFS

2014-04-28 Thread Tim Shen

On Mon, Apr 28, 2014 at 3:10 PM, Jonathan Wakely  wrote:
> Data members are accessed through the 'this' pointer, which can
> require an indirection and be harder to optimise than a function
> parameter.

It doesn't matter if this variable is not checked frequently. But
let's just use the function parameter and rely on the compiler to
optimize it.

> My concern is that making it a template parameter causes twice as much
> code to be instantiated, which makes executables bigger and can cause
> more I-cache misses.

Worth a try. Will you make the change or shall I? It seems simpler to
do than to discuss.

-- 
Regards,
Tim Shen

Re: [patch 3/N] std::regex refactoring - _Executor DFS / BFS

2014-04-28 Thread Jonathan Wakely


On 28/04/14 15:02 -0400, Tim Shen wrote:

If we want to change it to a runtime flag, it should be a class
member. Otherwise we have to pass it as a function parameter all the
time, and it may waste an instruction and one byte per recursive call.
It would surely make the code cleaner.


Data members are accessed through the 'this' pointer, which can
require an indirection and be harder to optimise than a function
parameter.


Am I optimizing prematurely?


I don't know yet - we need to measure.

My concern is that making it a template parameter causes twice as much
code to be instantiated, which makes executables bigger and can cause
more I-cache misses.
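
The trade-off under discussion can be sketched with a toy recursive function. This is an illustration only; walk_template and walk_runtime are hypothetical names, and nothing here measures the real _Executor.

```cpp
#include <cassert>
#include <cstddef>

// Variant 1: the flag is a template parameter, so the compiler emits a
// separate copy of the recursive function for each value of the flag.
template<bool match_mode>
std::size_t walk_template(std::size_t depth)
{
  assert(depth <= 64);           // the toy recursion is kept shallow
  if (depth == 0)
    return match_mode ? 1 : 2;
  return walk_template<match_mode>(depth - 1);
}

// Variant 2: the flag is an ordinary argument passed on every call;
// only one copy of the function exists.
std::size_t walk_runtime(bool match_mode, std::size_t depth)
{
  assert(depth <= 64);
  if (depth == 0)
    return match_mode ? 1 : 2;
  return walk_runtime(match_mode, depth - 1);
}
```

Both variants compute the same result; the difference is code size (two instantiations versus one) against one extra argument per call, and which of those matters more is what would have to be measured.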

[PATCH] Implement -fsanitize=float-divide-by-zero

2014-04-28 Thread Marek Polacek

This patch implements the -fsanitize=float-divide-by-zero option, which
can be used to detect division by zero in floating-point types as well.
Most of the code in ubsan_instrument_division was ready for this,
so this was mainly about handling REAL_TYPE there.

Since division by a floating point zero can be a valid way of
obtaining infinities and NaNs, I'm not 100% sure this ought to be
enabled by default (that is, enabled when -fsanitize=undefined is
specified).
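
To illustrate why this case is debatable, here is a small sketch, assuming IEEE 754 (IEC 60559) doubles and no -ffinite-math-only. The function names are hypothetical; divisor_is_zero only mirrors the op1 == 0.0 comparison that the patch builds for REAL_TYPE operands.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Mirrors the condition the patch builds for REAL_TYPE operands
// (op1 == 0.0); note that -0.0 also compares equal to zero.
bool divisor_is_zero(double den)
{
  return den == 0.0;
}

// Division routed through volatile objects so the compiler neither
// folds it away nor warns about a literal zero divisor.
double divide(double num, double den)
{
  // The results described in the text assume IEEE 754 doubles.
  assert(std::numeric_limits<double>::is_iec559);
  volatile double n = num, d = den;
  return n / d;
}
```

Under those assumptions divide(1.0, 0.0) is +inf, divide(-1.0, 0.0) is -inf and divide(0.0, 0.0) is NaN, all of which some numerical code relies on deliberately.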

Regtested/bootstrapped/ran bootstrap-ubsan on x86_64-linux, ok for
trunk?

2014-04-28  Marek Polacek  

* flag-types.h (enum sanitize_code): Add SANITIZE_FLOAT_DIVIDE and
or it into SANITIZE_UNDEFINED.
* opts.c (common_handle_option): Add -fsanitize=float-divide-by-zero.
c-family/
* c-ubsan.c (ubsan_instrument_division): Handle REAL_TYPEs.  Perform
INT_MIN / -1 sanitization only for integer types.
c/
* c-typeck.c (build_binary_op): Call ubsan_instrument_division
also when SANITIZE_FLOAT_DIVIDE is on.
cp/
* typeck.c (cp_build_binary_op): Call ubsan_instrument_division
even when SANITIZE_FLOAT_DIVIDE is on.  Set doing_div_or_mod even
for non-integer types.
testsuite/
* c-c++-common/ubsan/div-by-zero-5.c: Fix formatting.
* c-c++-common/ubsan/float-div-by-zero-1.c: New test.

diff --git gcc/c-family/c-ubsan.c gcc/c-family/c-ubsan.c
index e4f6f32..a039792 100644
--- gcc/c-family/c-ubsan.c
+++ gcc/c-family/c-ubsan.c
@@ -46,15 +46,21 @@ ubsan_instrument_division (location_t loc, tree op0, tree op1)
   gcc_assert (TYPE_MAIN_VARIANT (TREE_TYPE (op0))
  == TYPE_MAIN_VARIANT (TREE_TYPE (op1)));
 
-  /* TODO: REAL_TYPE is not supported yet.  */
-  if (TREE_CODE (type) != INTEGER_TYPE)
+  if (TREE_CODE (type) == INTEGER_TYPE
+  && (flag_sanitize & SANITIZE_DIVIDE))
+t = fold_build2 (EQ_EXPR, boolean_type_node,
+op1, build_int_cst (type, 0));
+  else if (TREE_CODE (type) == REAL_TYPE
+  && (flag_sanitize & SANITIZE_FLOAT_DIVIDE))
+t = fold_build2 (EQ_EXPR, boolean_type_node,
+op1, build_real (type, dconst0));
+  else
 return NULL_TREE;
 
-  t = fold_build2 (EQ_EXPR, boolean_type_node,
-   op1, build_int_cst (type, 0));
-
   /* We check INT_MIN / -1 only for signed types.  */
-  if (!TYPE_UNSIGNED (type))
+  if (TREE_CODE (type) == INTEGER_TYPE
+  && (flag_sanitize & SANITIZE_DIVIDE)
+  && !TYPE_UNSIGNED (type))
 {
   tree x;
   tt = fold_build2 (EQ_EXPR, boolean_type_node, op1,
diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 62c72df..8df544b 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -10995,7 +10995,8 @@ build_binary_op (location_t location, enum tree_code code,
return error_mark_node;
 }
 
-  if ((flag_sanitize & (SANITIZE_SHIFT | SANITIZE_DIVIDE))
+  if ((flag_sanitize & (SANITIZE_SHIFT | SANITIZE_DIVIDE
+   | SANITIZE_FLOAT_DIVIDE))
   && current_function_decl != 0
   && !lookup_attribute ("no_sanitize_undefined",
DECL_ATTRIBUTES (current_function_decl))
@@ -11006,7 +11007,8 @@ build_binary_op (location_t location, enum tree_code code,
   op1 = c_save_expr (op1);
   op0 = c_fully_fold (op0, false, NULL);
   op1 = c_fully_fold (op1, false, NULL);
-  if (doing_div_or_mod && (flag_sanitize & SANITIZE_DIVIDE))
+  if (doing_div_or_mod && (flag_sanitize & (SANITIZE_DIVIDE
+   | SANITIZE_FLOAT_DIVIDE)))
instrument_expr = ubsan_instrument_division (location, op0, op1);
   else if (doing_shift && (flag_sanitize & SANITIZE_SHIFT))
instrument_expr = ubsan_instrument_shift (location, code, op0, op1);
diff --git gcc/cp/typeck.c gcc/cp/typeck.c
index 9a80727..99b4ce6 100644
--- gcc/cp/typeck.c
+++ gcc/cp/typeck.c
@@ -4112,10 +4112,7 @@ cp_build_binary_op (location_t location,
  enum tree_code tcode0 = code0, tcode1 = code1;
  tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
  cop1 = maybe_constant_value (cop1);
-
- if (tcode0 == INTEGER_TYPE)
-   doing_div_or_mod = true;
-
+ doing_div_or_mod = true;
  warn_for_div_by_zero (location, cop1);
 
  if (tcode0 == COMPLEX_TYPE || tcode0 == VECTOR_TYPE)
@@ -4155,9 +4152,7 @@ cp_build_binary_op (location_t location,
   {
tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
cop1 = maybe_constant_value (cop1);
-
-   if (code0 == INTEGER_TYPE)
- doing_div_or_mod = true;
+   doing_div_or_mod = true;
warn_for_div_by_zero (location, cop1);
   }
 
@@ -4904,7 +4899,8 @@ cp_build_binary_op (location_t location,
   if (build_type == NULL_TREE)
 build_type = result_type;
 
-  if ((flag_sanitize & (SANITIZE_SHIFT | SANITIZE_DIVIDE))
+  if ((flag_sanitize & (SANITIZE_SHIFT | SANITIZE_DIVIDE
+   | SANITIZE_FLOAT_DIVIDE))
   && !p

Re: [patch 3/N] std::regex refactoring - _Executor DFS / BFS

2014-04-28 Thread Tim Shen

On Mon, Apr 28, 2014 at 12:51 PM, Jonathan Wakely  wrote:
> The next thing I plan to look at, which I haven't done yet, is to see
> if passing the __match_mode template parameter as a runtime function
> parameter makes any difference to the way the code is structured. Do
> you have any thoughts on that, before I waste time doing something
> that won't work?

The thing behind the template parameter is like this:

0) We have DFS and BFS executor;
1) To be DRY, I wrote one class for those two approaches. We have to
use some option variable (__match_mode) in a function (say
_Executor::_M_dfs) to distinguish one approach from another.
2) Repeatedly checking the flag at runtime hurts efficiency, so a template
flag is used.

However, it turns out that it's not clear whether __match_mode is
checked frequently (in _Executor::_M_main and the _S_opcode_accept
branch in _Executor::_M_dfs).

If we want to change it to a runtime flag, it should be a class
member. Otherwise we have to pass it as a function parameter all the
time, and it may waste an instruction and one byte per recursive call.
It would surely make the code cleaner.

Am I optimizing prematurely?

-- 
Regards,
Tim Shen

Re: [PATCH, PR60738] More LRA split for regno conflicting with single reg class operand

2014-04-28 Thread Wei Mi

Thanks. I will change

> + if (a != operand_a
> + && (LRA_SPLIT_FREQ_RATIO * freq >= a->freq))

to

> + if (a != operand_a
> + && (!ira_use_lra_p || LRA_SPLIT_FREQ_RATIO * freq >= a->freq))

Regards,
Wei.

On Mon, Apr 28, 2014 at 12:57 AM, Steven Bosscher  wrote:
> On Sat, Apr 26, 2014 at 5:35 AM, Wei Mi wrote:
>> Index: ira-lives.c
>> ===
>> --- ira-lives.c (revision 209253)
>> +++ ira-lives.c (working copy)
>> @@ -1025,7 +1025,11 @@ process_single_reg_class_operands (bool
>>  {
>>   ira_object_t obj = ira_object_id_map[px];
>>   a = OBJECT_ALLOCNO (obj);
>> - if (a != operand_a)
>> + /* If a is much hotter in some other region, don't add reg class
>> +cl into its conflict hardreg set. Let lra_split to do splitting
>> +here for operand_a.  */
>> + if (a != operand_a
>> + && (LRA_SPLIT_FREQ_RATIO * freq >= a->freq))
>> {
>>   /* We could increase costs of A instead of making it
>>  conflicting with the hard register.  But it works worse
>
> AFAICT this path is not LRA specific, so your patch may break ports
> still relying on reload.
>
> Ciao!
> Steven

Re: [PATCH] pedantic warning behavior when casting void* to ptr-to-func, 4.8 and 4.9

2014-04-28 Thread Jason Merrill


Applied, thanks.  Sorry for the delay.

Jason

Fwd: status of wide-int patch.

2014-04-28 Thread Mike Stump

FYI:

Begin forwarded message:

> From: Kenneth Zadeck 
> Subject: status of wide-int patch.
> Date: April 28, 2014 at 10:03:36 AM PDT
> To: gcc , Richard Sandiford , 
> Richard Biener , Mike Stump 
> 
> At this point we believe that we have addressed all of the concerns that
> the community has raised about the wide-int branch.   We have also had each of
> the sections of the branch approved by the area maintainers.
>
> We are awaiting a clean build on arm and are currently retesting x86-64,
> s390, and p7, but assuming that those are clean, we are ready to merge this
> branch into trunk in the next day or so.  Other port maintainers may wish
> to consider testing on the branch before we commit.   Otherwise we will fix any
> regressions after the merge.
> 
> Thanks for all of the help we have received along the way.
> 
> Kenny

Re: [PATCH 1/2, x86] Add palignr support for AVX2.

2014-04-28 Thread Richard Henderson

On 04/28/2014 09:48 AM, Evgeny Stupachenko wrote:
> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
> +  /* PALIGNR of 2 256-bits registers on AVX2 costs only 2 instructions:
> + PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.
> + PALIGNR of 2 128-bits registers takes only 1 instruction.  */
> +  if (!TARGET_SSSE3 || (GET_MODE_SIZE (d->vmode) != 16 &&
> +  GET_MODE_SIZE (d->vmode) != 32))
> +return false;
> +  /* Only AVX2 or higher support PALIGNR on 256-bits registers.  */
> +  if (!TARGET_AVX2 && (GET_MODE_SIZE (d->vmode) == 32))
>  return false;

This is confusingly written.

How about

  if (GET_MODE_SIZE (d->vmode) == 16)
{
  if (!TARGET_SSSE3)
return false;
}
  else if (GET_MODE_SIZE (d->vmode) == 32)
{
  if (!TARGET_AVX2)
return false;
}
  else
return false;

With the comments added into the right places.
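
A small sketch comparing the two formulations. The parameters ssse3, avx2 and size are hypothetical stand-ins for TARGET_SSSE3, TARGET_AVX2 and GET_MODE_SIZE (d->vmode); each function returns true when the palignr expansion may proceed. The check assumes that AVX2 is never enabled without SSSE3.

```cpp
#include <cassert>

// The two checks as written in the patch.
bool ok_patch(bool ssse3, bool avx2, unsigned size)
{
  if (!ssse3 || (size != 16 && size != 32))
    return false;
  if (!avx2 && size == 32)
    return false;
  return true;
}

// The nested form suggested in the review.
bool ok_nested(bool ssse3, bool avx2, unsigned size)
{
  if (size == 16)
    return ssse3;
  else if (size == 32)
    return avx2;
  else
    return false;
}

// Compare both forms over every combination in which AVX2 implies SSSE3.
void check_forms_agree()
{
  const unsigned sizes[] = { 8, 16, 32, 64 };
  for (int ssse3 = 0; ssse3 <= 1; ++ssse3)
    for (int avx2 = 0; avx2 <= ssse3; ++avx2)
      for (unsigned i = 0; i < sizeof sizes / sizeof sizes[0]; ++i)
        assert(ok_patch(ssse3 != 0, avx2 != 0, sizes[i])
               == ok_nested(ssse3 != 0, avx2 != 0, sizes[i]));
}
```

The two forms differ only for the combination "AVX2 without SSSE3" at size 32, where the patch's version rejects and the nested version accepts; if that combination cannot occur, the rewrite is purely a readability change.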


r~

Re: PR debug/60929: Fix a few ICEs and other problems with -fdebug-types-sections

2014-04-28 Thread Cary Coutant

What are the rules for backporting to 4.9.1? Should I backport this patch?

-cary


> 2014-04-25  Cary Coutant  
>
> gcc/
> PR debug/60929
> * dwarf2out.c (should_move_die_to_comdat): A type definition
> can contain a subprogram definition, but don't move it to a
> comdat unit.
> (clone_as_declaration): Copy DW_AT_abstract_origin attribute.
> (generate_skeleton_bottom_up): Remove DW_AT_object_pointer attribute
> from original DIE.
> (clone_tree_hash): Rename to...
> (clone_tree_partial): ...this; change callers.  Copy
> DW_TAG_subprogram DIEs as declarations.
> (copy_decls_walk): Don't copy children of a declaration into a
> type unit.
>
> gcc/testsuite/
> PR debug/60929
> * g++.dg/debug/dwarf2/dwarf4-nested.C: New test case.
> * g++.dg/debug/dwarf2/dwarf4-typedef.C: Add -fdebug-types-section 
> flag.

Re: [PATCH 1/2, x86] Add palignr support for AVX2.

2014-04-28 Thread H.J. Lu

On Mon, Apr 28, 2014 at 9:48 AM, Evgeny Stupachenko  wrote:
> Hi,
>
> The patch enables use of "palignr with perm" instead of "2 pshufb, or
> and perm" at AVX2 for some cases.
>
> Bootstrapped and passes make check on x86.
>
> Is it ok?
>
> 2014-04-28  Evgeny Stupachenko  
>
> * config/i386/i386.c (expand_vec_perm_1): Try AVX2 vpshufb.
> * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
> PALIGNR instruction.

Can you add testcases to verify that AVX2 vpshufb and palignr are
properly generated?

> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 88142a8..ae80477 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -42807,6 +42807,8 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
>return true;
>  }
>
> +static bool expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d);
> +
>  /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to instantiate D
> in a single instruction.  */
>
> @@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
>if (expand_vec_perm_pshufb (d))
>  return true;
>
> +  /* Try the AVX2 vpshufb.  */
> +  if (expand_vec_perm_vpshufb2_vpermq (d))
> +return true;
> +
>/* Try the AVX512F vpermi2 instructions.  */
>rtx vec[64];
>enum machine_mode mode = d->vmode;
> @@ -43004,7 +43010,7 @@ expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d)
>  }
>
>  /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
> -   the permutation using the SSSE3 palignr instruction.  This succeeds
> +   the permutation using the SSSE3/AVX2 palignr instruction.  This succeeds
> when all of the elements in PERM fit within one vector and we merely
> need to shift them down so that a single vector permutation has a
> chance to succeed.  */
> @@ -43015,14 +43021,20 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>unsigned i, nelt = d->nelt;
>unsigned min, max;
>bool in_order, ok;
> -  rtx shift, target;
> +  rtx shift, shift1, target, tmp;
>struct expand_vec_perm_d dcopy;
>
> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
> +  /* PALIGNR of 2 256-bits registers on AVX2 costs only 2 instructions:
> + PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.
> + PALIGNR of 2 128-bits registers takes only 1 instruction.  */
> +  if (!TARGET_SSSE3 || (GET_MODE_SIZE (d->vmode) != 16 &&

   ^^^ This (the &&) should be on the next line.

> +  GET_MODE_SIZE (d->vmode) != 32))
> +return false;
> +  /* Only AVX2 or higher support PALIGNR on 256-bits registers.  */
> +  if (!TARGET_AVX2 && (GET_MODE_SIZE (d->vmode) == 32))
 ^ No need for '(' here.

Or you can use

if ((!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
 && (!TARGET_AVX2 || GET_MODE_SIZE (d->vmode) != 32))

>  return false;
>
> -  min = nelt, max = 0;
> +  min = 2 * nelt, max = 0;
^^  Will this change 128bit vector code?

>for (i = 0; i < nelt; ++i)
>  {
>unsigned e = d->perm[i];
> @@ -43041,9 +43053,34 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>
>dcopy = *d;
>shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
> -  target = gen_reg_rtx (TImode);
> -  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
> - gen_lowpart (TImode, d->op0), shift));
> +  shift1 = GEN_INT ((min - nelt / 2) *
> +  GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
> +
> +  if (GET_MODE_SIZE (d->vmode) != 32)
> +{
> +  target = gen_reg_rtx (TImode);
> +  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
> + gen_lowpart (TImode, d->op0), shift));
> +}
> +  else
> +{
> +  target = gen_reg_rtx (V2TImode);
> +  tmp = gen_reg_rtx (V4DImode);
> +  emit_insn (gen_avx2_permv2ti (tmp,
> +   gen_lowpart (V4DImode, d->op0),
> +   gen_lowpart (V4DImode, d->op1),
> +   GEN_INT (33)));
> +  if (min < nelt / 2)
> +emit_insn (gen_avx2_palignrv2ti (target,
> +gen_lowpart (V2TImode, tmp),
> +gen_lowpart (V2TImode, d->op0),
> +shift));
> +  else
> +   emit_insn (gen_avx2_palignrv2ti (target,
> +gen_lowpart (V2TImode, d->op1),
> +gen_lowpart (V2TImode, tmp),
> +shift1));
> +}
>
>dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
>dcopy.one_operand_p = true;
>
>
> Evgeny



-- 
H.J.

Re: [patch 3/N] std::regex refactoring - _Executor DFS / BFS

2014-04-28 Thread Jonathan Wakely


On 28/04/14 11:40 -0400, Tim Shen wrote:

On Mon, Apr 28, 2014 at 10:46 AM, Jonathan Wakely  wrote:

This change splits _Executor::_M_main() into two overloaded
_M_main_dispatch() functions, choosing which to run based on the
__dfs_mode template parameter.

I think this gives a (very) small improvement in compilation time when
using regexes.

Splitting _M_main() allows the _M_match_queue and _M_visited members
(which are only used in BFS mode) to be replaced with an instantiation
of the new _States class template. _States<__dfs> only contains the
start state (so that it's not empty and doesn't waste space) but
_States<__bfs> contains the start state and also the match queue and
visited list.
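
The structure described above can be sketched roughly as follows. This is a simplified illustration, not the libstdc++ code: dfs_tag, bfs_tag, States and Executor are hypothetical names standing in for __dfs, __bfs, _States and _Executor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct dfs_tag { };
struct bfs_tag { };

// Primary template (used for DFS): only the start state is stored.
template<typename Mode>
struct States
{
  States(std::size_t start_state, std::size_t) : start(start_state) { }
  std::size_t start;
};

// BFS specialization: additionally carries the match queue and the
// visited list.
template<>
struct States<bfs_tag>
{
  States(std::size_t start_state, std::size_t n)
  : start(start_state), visited(n, false) { }
  std::size_t start;
  std::vector<std::size_t> queue;
  std::vector<bool> visited;
};

template<typename Mode>
struct Executor
{
  Executor(std::size_t start, std::size_t n) : states(start, n)
  { assert(start < n); }

  // Overload resolution on the tag picks the implementation at compile
  // time; the overload that is never called is never instantiated.
  int run() { return run_dispatch(Mode()); }
  int run_dispatch(dfs_tag) { return 1; }
  int run_dispatch(bfs_tag)
  { states.queue.push_back(states.start); return 2; }

  States<Mode> states;
};
```

Because member functions of a class template are only instantiated when used, Executor<dfs_tag> compiles even though its unused BFS overload refers to members that States<dfs_tag> does not have.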


This is great! I keep worrying about the ugly all-in-one _Executor
class; I've tried to specialize two versions of it, but that
introduced much duplicated code. Your abstract is nice!


Thanks. I'll clean up that patch and commit it soon too.

The next thing I plan to look at, which I haven't done yet, is to see
if passing the __match_mode template parameter as a runtime function
parameter makes any difference to the way the code is structured. Do
you have any thoughts on that, before I waste time doing something
that won't work?


As the visited list never changes size after construction I changed
its type from unique_ptr<vector<bool>> to unique_ptr<bool[]>, which
avoids an indirection, although now that that member is only present
when actually required it could just be vector<bool>. An array of bool
is simpler to access, but takes more heap memory. vector<bool> uses
less space but the compiler has to do all the masking to address
individual bits.

What about bitset? Anyway we should get rid of vector<bool>.


The size of a bitset is fixed at compile-time. If we could use
dynarray that might be nice, but we can't :-)
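
For illustration, the two storage choices for a fixed-size visited list (an array of bool versus a bit-packed vector) might look like this. Visited_array and Visited_bits are hypothetical names, and the per-state sizes in the comments describe typical implementations rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Visited list backed by one bool per state: plain loads and stores,
// typically one byte of heap per state.
struct Visited_array
{
  explicit Visited_array(std::size_t n) : size(n), flags(new bool[n]()) { }
  // Marks state i as visited and returns its previous value.
  bool test_and_set(std::size_t i)
  {
    assert(i < size);
    bool old = flags[i];
    flags[i] = true;
    return old;
  }
  std::size_t size;
  std::unique_ptr<bool[]> flags;
};

// Visited list backed by a bit-packed vector: one bit per state, at the
// price of shift-and-mask code on every access.
struct Visited_bits
{
  explicit Visited_bits(std::size_t n) : flags(n, false) { }
  bool test_and_set(std::size_t i)
  {
    assert(i < flags.size());
    bool old = flags[i];
    flags[i] = true;
    return old;
  }
  std::vector<bool> flags;
};
```

Both types expose the same interface, so the choice between them can be made purely on measured speed and memory use.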

[PATCH 1/2, x86] Add palignr support for AVX2.

2014-04-28 Thread Evgeny Stupachenko

Hi,

The patch enables the use of "palignr with perm" instead of "2 pshufb, or
and perm" on AVX2 in some cases.

Bootstrapped and passes make check on x86.

Is it ok?

2014-04-28  Evgeny Stupachenko  

* config/i386/i386.c (expand_vec_perm_1): Try AVX2 vpshufb.
* config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
PALIGNR instruction.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 88142a8..ae80477 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -42807,6 +42807,8 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
   return true;
 }

+static bool expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d);
+
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to instantiate D
in a single instruction.  */

@@ -42946,6 +42948,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_pshufb (d))
 return true;

+  /* Try the AVX2 vpshufb.  */
+  if (expand_vec_perm_vpshufb2_vpermq (d))
+return true;
+
   /* Try the AVX512F vpermi2 instructions.  */
   rtx vec[64];
   enum machine_mode mode = d->vmode;
@@ -43004,7 +43010,7 @@ expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d)
 }

 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
-   the permutation using the SSSE3 palignr instruction.  This succeeds
+   the permutation using the SSSE3/AVX2 palignr instruction.  This succeeds
when all of the elements in PERM fit within one vector and we merely
need to shift them down so that a single vector permutation has a
chance to succeed.  */
@@ -43015,14 +43021,20 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)
   unsigned i, nelt = d->nelt;
   unsigned min, max;
   bool in_order, ok;
-  rtx shift, target;
+  rtx shift, shift1, target, tmp;
   struct expand_vec_perm_d dcopy;

-  /* Even with AVX, palignr only operates on 128-bit vectors.  */
-  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
+  /* PALIGNR of 2 256-bits registers on AVX2 costs only 2 instructions:
+ PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.
+ PALIGNR of 2 128-bits registers takes only 1 instruction.  */
+  if (!TARGET_SSSE3 || (GET_MODE_SIZE (d->vmode) != 16 &&
+  GET_MODE_SIZE (d->vmode) != 32))
+return false;
+  /* Only AVX2 or higher support PALIGNR on 256-bits registers.  */
+  if (!TARGET_AVX2 && (GET_MODE_SIZE (d->vmode) == 32))
 return false;

-  min = nelt, max = 0;
+  min = 2 * nelt, max = 0;
   for (i = 0; i < nelt; ++i)
 {
   unsigned e = d->perm[i];
@@ -43041,9 +43053,34 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d)

   dcopy = *d;
   shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
-  target = gen_reg_rtx (TImode);
-  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
- gen_lowpart (TImode, d->op0), shift));
+  shift1 = GEN_INT ((min - nelt / 2) *
+  GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
+
+  if (GET_MODE_SIZE (d->vmode) != 32)
+{
+  target = gen_reg_rtx (TImode);
+  emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
+ gen_lowpart (TImode, d->op0), shift));
+}
+  else
+{
+  target = gen_reg_rtx (V2TImode);
+  tmp = gen_reg_rtx (V4DImode);
+  emit_insn (gen_avx2_permv2ti (tmp,
+   gen_lowpart (V4DImode, d->op0),
+   gen_lowpart (V4DImode, d->op1),
+   GEN_INT (33)));
+  if (min < nelt / 2)
+emit_insn (gen_avx2_palignrv2ti (target,
+gen_lowpart (V2TImode, tmp),
+gen_lowpart (V2TImode, d->op0),
+shift));
+  else
+   emit_insn (gen_avx2_palignrv2ti (target,
+gen_lowpart (V2TImode, d->op1),
+gen_lowpart (V2TImode, tmp),
+shift1));
+}

   dcopy.op0 = dcopy.op1 = gen_lowpart (d->vmode, target);
   dcopy.one_operand_p = true;


Evgeny

Re: [wide-int] Stricter type checking in wide_int constructor

2014-04-28 Thread Kenneth Zadeck


On 04/28/2014 12:25 PM, Mike Stump wrote:

On Apr 28, 2014, at 2:36 AM, Richard Sandiford  
wrote:

Ping.  FWIW this is the last patch I have lined up before the merge.
I repeated the asm comparison test I did a few months ago on one target
per config/ architecture and there were no unexpected changes.

Nice, thanks.

by "nice thanks", he meant "accepted".

Re: [wide-int] Stricter type checking in wide_int constructor

2014-04-28 Thread Mike Stump

On Apr 28, 2014, at 2:36 AM, Richard Sandiford  
wrote:
> Ping.  FWIW this is the last patch I have lined up before the merge.
> I repeated the asm comparison test I did a few months ago on one target
> per config/ architecture and there were no unexpected changes.

Nice, thanks.

Re: [PATCH] testsuite: Register loaded libs

2014-04-28 Thread Mike Stump

On Apr 28, 2014, at 12:32 AM, Sebastian Huber 
 wrote:
> On 2014-04-28 04:23, Mike Stump wrote:
>> On Apr 27, 2014, at 10:45 AM, Sebastian Huber 
>>  wrote:
>>> 2014-04-27  Sebastian Huber  
>>> 
>>> * testsuite/lib/libffi.exp (load_gcc_lib): Register loaded libs.
>> 
>> So, I didn’t see anything that strikes me as wrong, but, I’m curious why you 
>> want this?  I didn’t see any uses?
>> 
> 
> I would like to add tests for C/C++ compatibility of atomic operations.
> 
> http://gcc.gnu.org/ml/gcc/2014-04/msg00267.html
> 
> This patch unifies all copy and paste versions of load_gcc_lib.

Ah, thanks for the pointer…

Ok.

Re: [PATCH][RFC][wide-int] Fix some build errors on arm in wide-int branch and report ICE

2014-04-28 Thread Kenneth Zadeck


ok to commit.

kenny
On 04/28/2014 11:42 AM, Richard Sandiford wrote:

Kyrill Tkachov  writes:

With that patch bootstrap now still fails at dwarf2out.c with the same
message. I'm attaching a gzipped dwarf2out.ii

Thanks.  This is a nice proof of why clz_zero and ctz_zero were as bogus
as claimed.  It meant that the behaviour of floor_log2 depended on the
target and would return the wrong value if clz (0) was anything other
than the precision.

This patch makes the wide-int functions behave like the double_int
ones and pushes the target dependency back to the callers that care,
which is where it belongs.  The "new" *_DEFINED_VALUE_AT_ZERO checks
are really reinstating what's already on trunk.  There are other tree
uses of ctz that I think relied on the double_int behaviour.
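
A rough sketch of the convention being described, in which clz of zero is always the precision and floor_log2 is derived from it. The helpers clz_precision and floor_log2_sketch are hypothetical, portable stand-ins and not the wide-int implementation.

```cpp
#include <cassert>

// Count leading zeros of an unsigned value of the given precision (in
// bits), returning the precision itself for zero rather than any
// target-specific value.
int clz_precision(unsigned long long x, int precision)
{
  assert(precision >= 1 && precision <= 64);
  int count = 0;
  for (int bit = precision - 1; bit >= 0; --bit)
    {
      if ((x >> bit) & 1)
        break;
      ++count;
    }
  return count;
}

// floor_log2 built on top of it: -1 for zero, otherwise the index of
// the highest set bit.
int floor_log2_sketch(unsigned long long x, int precision)
{
  return precision - 1 - clz_precision(x, precision);
}
```

With this definition floor_log2_sketch(0, 32) is -1 on every target; if clz of zero returned some other target-defined value, the same formula would give a different, wrong answer for zero.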

Tests still ongoing, but could you check what the arm results are like
with this?

Thanks,
Richard


Index: gcc/builtins.c
===
--- gcc/builtins.c  2014-04-28 16:30:59.939239843 +0100
+++ gcc/builtins.c  2014-04-28 16:31:00.252238996 +0100
@@ -8080,6 +8080,7 @@ fold_builtin_bitop (tree fndecl, tree ar
/* Optimize for constant argument.  */
if (TREE_CODE (arg) == INTEGER_CST && !TREE_OVERFLOW (arg))
  {
+  tree type = TREE_TYPE (arg);
int result;
  
switch (DECL_FUNCTION_CODE (fndecl))

@@ -8089,11 +8090,17 @@ fold_builtin_bitop (tree fndecl, tree ar
  break;
  
  	CASE_INT_FN (BUILT_IN_CLZ):

- result = wi::clz (arg);
+ if (wi::ne_p (arg, 0))
+   result = wi::clz (arg);
+ else if (! CLZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), result))
+   result = TYPE_PRECISION (type);
  break;
  
  	CASE_INT_FN (BUILT_IN_CTZ):

- result = wi::ctz (arg);
+ if (wi::ne_p (arg, 0))
+   result = wi::ctz (arg);
+ else if (! CTZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), result))
+   result = TYPE_PRECISION (type);
  break;
  
  	CASE_INT_FN (BUILT_IN_CLRSB):

Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c  2014-04-28 16:30:59.941239838 +0100
+++ gcc/simplify-rtx.c  2014-04-28 16:31:00.254238990 +0100
@@ -1656,6 +1656,7 @@ simplify_const_unary_operation (enum rtx
wide_int result;
enum machine_mode imode = op_mode == VOIDmode ? mode : op_mode;
rtx_mode_t op0 = std::make_pair (op, imode);
+  int int_value;
  
  #if TARGET_SUPPORTS_WIDE_INT == 0

/* This assert keeps the simplification from producing a result
@@ -1686,7 +1687,11 @@ simplify_const_unary_operation (enum rtx
  break;
  
  	case CLZ:

- result = wi::shwi (wi::clz (op0), mode);
+ if (wi::ne_p (op0, 0))
+   int_value = wi::clz (op0);
+ else if (! CLZ_DEFINED_VALUE_AT_ZERO (mode, int_value))
+   int_value = GET_MODE_PRECISION (mode);
+ result = wi::shwi (int_value, mode);
  break;
  
  	case CLRSB:

@@ -1694,7 +1699,11 @@ simplify_const_unary_operation (enum rtx
  break;
  
  	case CTZ:

- result = wi::shwi (wi::ctz (op0), mode);
+ if (wi::ne_p (op0, 0))
+   int_value = wi::ctz (op0);
+ else if (! CTZ_DEFINED_VALUE_AT_ZERO (mode, int_value))
+   int_value = GET_MODE_PRECISION (mode);
+ result = wi::shwi (int_value, mode);
  break;
  
  	case POPCOUNT:

Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2014-04-28 16:30:59.941239838 +0100
+++ gcc/wide-int.cc 2014-04-28 16:31:00.254238990 +0100
@@ -1137,46 +1137,6 @@ wi::add_large (HOST_WIDE_INT *val, const
return canonize (val, len, prec);
  }
  
-/* This is bogus.  We should always return the precision and leave the

-   caller to handle target dependencies.  */
-static int
-clz_zero (unsigned int precision)
-{
-  unsigned int count;
-
-  enum machine_mode mode = mode_for_size (precision, MODE_INT, 0);
-  if (mode == BLKmode)
-mode_for_size (precision, MODE_PARTIAL_INT, 0);
-
-  /* Even if the value at zero is undefined, we have to come up
- with some replacement.  Seems good enough.  */
-  if (mode == BLKmode)
-count = precision;
-  else if (!CLZ_DEFINED_VALUE_AT_ZERO (mode, count))
-count = precision;
-  return count;
-}
-
-/* This is bogus.  We should always return the precision and leave the
-   caller to handle target dependencies.  */
-static int
-ctz_zero (unsigned int precision)
-{
-  unsigned int count;
-
-  enum machine_mode mode = mode_for_size (precision, MODE_INT, 0);
-  if (mode == BLKmode)
-mode_for_size (precision, MODE_PARTIAL_INT, 0);
-
-  /* Even if the value at zero is undefined, we have to come up
- with some replacement.  Seems good enough.  */
-  if (mode == BLKmode)
-count = precision;
-  else if (!CTZ_DEFINED_VALUE_AT_ZERO (mode, count))
-count = precision;
-  return count;
-}
-
  /* Subrout

Re: [PATCH][RFC][wide-int] Fix some build errors on arm in wide-int branch and report ICE

2014-04-28 Thread Richard Sandiford

Kyrill Tkachov  writes:
> With that patch bootstrap now still fails at dwarf2out.c with the same
> message. I'm attaching a gzipped dwarf2out.ii

Thanks.  This is a nice proof of why clz_zero and ctz_zero were as bogus
as claimed.  It meant that the behaviour of floor_log2 depended on the
target and would return the wrong value if clz (0) was anything other
than the precision.

This patch makes the wide-int functions behave like the double_int
ones and pushes the target dependency back to the callers that care,
which is where it belongs.  The "new" *_DEFINED_VALUE_AT_ZERO checks
are really reinstating what's already on trunk.  There are other tree
uses of ctz that I think relied on the double_int behaviour.
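
To make the floor_log2 point concrete, here is a minimal standalone sketch.  It is not the wide-int code: clz, floor_log2 and fold_clz below are simplified stand-ins on plain integers, and the target_defines_zero / target_value parameters are hypothetical substitutes for what CLZ_DEFINED_VALUE_AT_ZERO reports.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for wi::clz: count leading zero bits of X within PRECISION bits.
// For zero it returns PRECISION, independently of the target.
int clz (std::uint64_t x, int precision)
{
  assert (precision > 0 && precision <= 64);
  int count = 0;
  for (int bit = precision - 1; bit >= 0 && !((x >> bit) & 1); --bit)
    ++count;
  return count;
}

// floor_log2 built on clz.  It yields -1 for zero only because
// clz (0, precision) == precision.
int floor_log2 (std::uint64_t x, int precision)
{
  return precision - 1 - clz (x, precision);
}

// A caller that does care about the target applies the target rule itself.
// TARGET_DEFINES_ZERO and TARGET_VALUE are hypothetical parameters standing
// in for what CLZ_DEFINED_VALUE_AT_ZERO would report.
int fold_clz (std::uint64_t x, int precision,
              bool target_defines_zero, int target_value)
{
  if (x != 0)
    return clz (x, precision);
  return target_defines_zero ? target_value : precision;
}
```

If clz itself returned a target-specific value at zero (say 31), floor_log2 (0, 32) would come out as 0 instead of -1, which is the kind of target-dependent result described above.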

Tests still ongoing, but could you check what the arm results are like
with this?

Thanks,
Richard


Index: gcc/builtins.c
===
--- gcc/builtins.c  2014-04-28 16:30:59.939239843 +0100
+++ gcc/builtins.c  2014-04-28 16:31:00.252238996 +0100
@@ -8080,6 +8080,7 @@ fold_builtin_bitop (tree fndecl, tree ar
   /* Optimize for constant argument.  */
   if (TREE_CODE (arg) == INTEGER_CST && !TREE_OVERFLOW (arg))
 {
+  tree type = TREE_TYPE (arg);
   int result;
 
   switch (DECL_FUNCTION_CODE (fndecl))
@@ -8089,11 +8090,17 @@ fold_builtin_bitop (tree fndecl, tree ar
  break;
 
CASE_INT_FN (BUILT_IN_CLZ):
- result = wi::clz (arg);
+ if (wi::ne_p (arg, 0))
+   result = wi::clz (arg);
+ else if (! CLZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), result))
+   result = TYPE_PRECISION (type);
  break;
 
CASE_INT_FN (BUILT_IN_CTZ):
- result = wi::ctz (arg);
+ if (wi::ne_p (arg, 0))
+   result = wi::ctz (arg);
+ else if (! CTZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), result))
+   result = TYPE_PRECISION (type);
  break;
 
CASE_INT_FN (BUILT_IN_CLRSB):
Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c  2014-04-28 16:30:59.941239838 +0100
+++ gcc/simplify-rtx.c  2014-04-28 16:31:00.254238990 +0100
@@ -1656,6 +1656,7 @@ simplify_const_unary_operation (enum rtx
   wide_int result;
   enum machine_mode imode = op_mode == VOIDmode ? mode : op_mode;
   rtx_mode_t op0 = std::make_pair (op, imode);
+  int int_value;
 
 #if TARGET_SUPPORTS_WIDE_INT == 0
   /* This assert keeps the simplification from producing a result
@@ -1686,7 +1687,11 @@ simplify_const_unary_operation (enum rtx
  break;
 
case CLZ:
- result = wi::shwi (wi::clz (op0), mode);
+ if (wi::ne_p (op0, 0))
+   int_value = wi::clz (op0);
+ else if (! CLZ_DEFINED_VALUE_AT_ZERO (mode, int_value))
+   int_value = GET_MODE_PRECISION (mode);
+ result = wi::shwi (int_value, mode);
  break;
 
case CLRSB:
@@ -1694,7 +1699,11 @@ simplify_const_unary_operation (enum rtx
  break;
 
case CTZ:
- result = wi::shwi (wi::ctz (op0), mode);
+ if (wi::ne_p (op0, 0))
+   int_value = wi::ctz (op0);
+ else if (! CTZ_DEFINED_VALUE_AT_ZERO (mode, int_value))
+   int_value = GET_MODE_PRECISION (mode);
+ result = wi::shwi (int_value, mode);
  break;
 
case POPCOUNT:
Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2014-04-28 16:30:59.941239838 +0100
+++ gcc/wide-int.cc 2014-04-28 16:31:00.254238990 +0100
@@ -1137,46 +1137,6 @@ wi::add_large (HOST_WIDE_INT *val, const
   return canonize (val, len, prec);
 }
 
-/* This is bogus.  We should always return the precision and leave the
-   caller to handle target dependencies.  */
-static int
-clz_zero (unsigned int precision)
-{
-  unsigned int count;
-
-  enum machine_mode mode = mode_for_size (precision, MODE_INT, 0);
-  if (mode == BLKmode)
-mode_for_size (precision, MODE_PARTIAL_INT, 0);
-
-  /* Even if the value at zero is undefined, we have to come up
- with some replacement.  Seems good enough.  */
-  if (mode == BLKmode)
-count = precision;
-  else if (!CLZ_DEFINED_VALUE_AT_ZERO (mode, count))
-count = precision;
-  return count;
-}
-
-/* This is bogus.  We should always return the precision and leave the
-   caller to handle target dependencies.  */
-static int
-ctz_zero (unsigned int precision)
-{
-  unsigned int count;
-
-  enum machine_mode mode = mode_for_size (precision, MODE_INT, 0);
-  if (mode == BLKmode)
-mode_for_size (precision, MODE_PARTIAL_INT, 0);
-
-  /* Even if the value at zero is undefined, we have to come up
- with some replacement.  Seems good enough.  */
-  if (mode == BLKmode)
-count = precision;
-  else if (!CTZ_DEFINED_VALUE_AT_ZERO (mode, count))
-count = precision;
-  return count;
-}
-
 /* Subroutines of the multiplication and division operations.  Unpack
the

Re: [patch 3/N] std::regex refactoring - _Executor DFS / BFS

2014-04-28 Thread Tim Shen

On Mon, Apr 28, 2014 at 10:46 AM, Jonathan Wakely  wrote:
> This change splits _Executor::_M_main() into two overloaded
> _M_main_dispatch() functions, choosing which to run based on the
> __dfs_mode template parameter.
>
> I think this gives a (very) small improvement in compilation time when
> using regexes.
>
> Splitting _M_main() allows the _M_match_queue and _M_visited members
> (which are only used in BFS mode) to be replaced with an instantiation
> of the new _States class template. _States<__dfs> only contains the
> start state (so that it's not empty and doesn't waste space) but
> _States<__bfs> contains the start state and also the match queue and
> visited list.

This is great! I kept worrying about the ugly all-in-one _Executor
class; I tried to specialize two versions of it, but that
introduced a lot of duplicated code. Your abstraction is nice!

> As the visited list never changes size after construction I changed
> its type from unique_ptr<vector<bool>> to unique_ptr<bool[]>, which
> avoids an indirection, although now that that member is only present
> when actually required it could just be vector<bool>. An array of bool
> is simpler to access, but takes more heap memory. vector<bool> uses
> less space but the compiler has to do all the masking to address
> individual bits.

What about bitset? Anyway we should get rid of vector<bool>.

> I also changed this range-based for loop in _M_main (now in
> _M_main_dispatch(__bfs)) to make __task a reference, avoiding one
> copy, and to move from __task.second, avoiding a second copy:
>
>auto __old_queue = std::move(_M_states._M_match_queue);
>for (auto& __task : __old_queue)
>  {
>_M_cur_results = std::move(__task.second);
>_M_dfs<__match_mode>(__task.first);
>  }

I don't know why I wrote code that copies so much. Thanks.


-- 
Regards,
Tim Shen

Re: [patch 1/N] std::regex refactoring - _BracketMatcher

2014-04-28 Thread Tim Shen

On Mon, Apr 28, 2014 at 11:05 AM, Jonathan Wakely  wrote:
> There is a well-defined mapping from every unsigned char in the range
> [0,255] to char and back, so conversions between char and unsigned
> char are fine.  If we used a larger type then we would get the wrong result
> when char is signed, because (size_t)(char)-1 != (unsigned char)(char)-1

You are right. I've missed this.
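
For the record, a small standalone sketch of the difference between the two conversions.  The function names and the cached() helper are made up for illustration; only the conversions themselves come from the discussion above.

```cpp
#include <bitset>
#include <cassert>
#include <climits>
#include <cstddef>

// Index obtained by going through unsigned char: always in [0, UCHAR_MAX].
std::size_t index_via_unsigned_char (char c)
{
  return static_cast<unsigned char> (c);
}

// Index obtained by converting char directly to size_t.  Where char is
// signed, a negative value wraps to a huge index.
std::size_t index_via_size_t (char c)
{
  return static_cast<std::size_t> (c);
}

// Hypothetical cache lookup in the style of the bracket matcher's bitset.
bool cached (const std::bitset<UCHAR_MAX + 1>& cache, char c)
{
  std::size_t i = index_via_unsigned_char (c);
  assert (i < cache.size ());
  return cache[i];
}
```

Both conversions agree for non-negative characters; they only diverge for negative values of a signed char, which is exactly where an out-of-range bitset index would appear.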

> sizeof(wchar_t) is 4 on unix-like systems, but a cache for char16_t
> wouldn't be totally crazy.
>
> If you prefer I will not remove the uses of _CharT and _UnsignedCharT
> and won't assume the cache is only used for char. There are still some
> simplifications we can make.

Yes, I (now again) prefer _UnsignedCharT. But it's just a matter of
personal taste, I suppose?

> Thanks, I'll clean the patch up and commit it soon.

Thanks again :)


-- 
Regards,
Tim Shen

Re: [PATCH] Cleanup do_per_function, require less push/pop_cfun

2014-04-28 Thread Jan Hubicka

> On Wed, 23 Apr 2014, Richard Biener wrote:
> 
> > 
> > This avoids all the complex work on simple things like
> > clear_last_verified.  It also makes eventually inlining all
> > calls (for example the one with the small IPA pass hack)
> > less code-duplicating.
> > 
> > I had to remove the asserts in favor of frees of DOM info in 
> > release_function_body as the old code released DOM info in
> > various odd places.
> > 
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> > 
> > Honza, does this look ok to you?
> 
> I've heard nothing back so I assume it's ok and committed it.

The purpose of the assert was to make sure that the dominance trees
are not maintained for the whole program at once. release_function_body is
never called for a function that is "current", and thus the dominators should
always be freed.

In the beginning the data structure was global and thus it was not possible
to hold two. I think that is long fixed, but it seems a bit wasteful to
keep all the dominance trees around.  Do you know how dominators leak
there?

Honza
> 
> Richard.
> 
> > Thanks,
> > Richard.
> > 
> > 2014-04-23  Richard Biener  
> > 
> > * tree-pass.h (execute_pass_list): Adjust prototype.
> > * passes.c (pass_manager::execute_early_local_passes):
> > Adjust.
> > (do_per_function): Change callback signature, push all actual
> > work to the callbacks.
> > (do_per_function_toporder): Likewise.
> > (execute_function_dump): Adjust.
> > (execute_function_todo): Likewise.
> > (clear_last_verified): Likewise.
> > (verify_curr_properties): Likewise.
> > (update_properties_after_pass): Likewise.
> > (apply_ipa_transforms): Likewise.
> > (execute_pass_list_1): Split out from ...
> > (execute_pass_list): ... here.  Adjust.
> > (execute_ipa_pass_list): Likewise.
> > * cgraphunit.c (cgraph_add_new_function): Adjust.
> > (analyze_function): Likewise.
> > (expand_function): Likewise.
> > * cgraph.c (release_function_body): Free dominance info
> > here instead of asserting it was magically freed elsewhere.
> > 
> > Index: gcc/tree-pass.h
> > ===
> > *** gcc/tree-pass.h.orig2014-04-23 14:55:25.640624814 +0200
> > --- gcc/tree-pass.h 2014-04-23 15:40:56.443436802 +0200
> > *** extern gimple_opt_pass *make_pass_conver
> > *** 587,593 
> >   extern opt_pass *current_pass;
> >   
> >   extern bool execute_one_pass (opt_pass *);
> > ! extern void execute_pass_list (opt_pass *);
> >   extern void execute_ipa_pass_list (opt_pass *);
> >   extern void execute_ipa_summary_passes (ipa_opt_pass_d *);
> >   extern void execute_all_ipa_transforms (void);
> > --- 587,593 
> >   extern opt_pass *current_pass;
> >   
> >   extern bool execute_one_pass (opt_pass *);
> > ! extern void execute_pass_list (function *, opt_pass *);
> >   extern void execute_ipa_pass_list (opt_pass *);
> >   extern void execute_ipa_summary_passes (ipa_opt_pass_d *);
> >   extern void execute_all_ipa_transforms (void);
> > *** extern bool function_called_by_processed
> > *** 615,621 
> >   extern bool first_pass_instance;
> >   
> >   /* Declare for plugins.  */
> > ! extern void do_per_function_toporder (void (*) (void *), void *);
> >   
> >   extern void disable_pass (const char *);
> >   extern void enable_pass (const char *);
> > --- 615,621 
> >   extern bool first_pass_instance;
> >   
> >   /* Declare for plugins.  */
> > ! extern void do_per_function_toporder (void (*) (function *, void *), void 
> > *);
> >   
> >   extern void disable_pass (const char *);
> >   extern void enable_pass (const char *);
> > Index: gcc/passes.c
> > ===
> > *** gcc/passes.c.orig   2014-04-23 14:55:25.642624814 +0200
> > --- gcc/passes.c2014-04-23 15:41:02.414436391 +0200
> > *** opt_pass::opt_pass (const pass_data &dat
> > *** 132,138 
> >   void
> >   pass_manager::execute_early_local_passes ()
> >   {
> > !   execute_pass_list (pass_early_local_passes_1->sub);
> >   }
> >   
> >   unsigned int
> > --- 132,138 
> >   void
> >   pass_manager::execute_early_local_passes ()
> >   {
> > !   execute_pass_list (cfun, pass_early_local_passes_1->sub);
> >   }
> >   
> >   unsigned int
> > *** pass_manager::pass_manager (context *ctx
> > *** 1498,1524 
> >  call CALLBACK on the current function.  */
> >   
> >   static void
> > ! do_per_function (void (*callback) (void *data), void *data)
> >   {
> > if (current_function_decl)
> > ! callback (data);
> > else
> >   {
> > struct cgraph_node *node;
> > FOR_EACH_DEFINED_FUNCTION (node)
> > if (node->analyzed && gimple_has_body_p (node->decl)
> > && (!node->clone_of || node->decl != node->clone_of->decl))
> > ! {
> > !   push_cfun (DECL_STRUCT_FUNCTION (node->decl));
> > !   callback (data);

Re: [patch 1/N] std::regex refactoring - _BracketMatcher

2014-04-28 Thread Jonathan Wakely


On 28/04/14 10:15 -0400, Tim Shen wrote:
> On Mon, Apr 28, 2014 at 8:40 AM, Jonathan Wakely  wrote:
>> I've been looking through the regex code and have a few ideas for
>> simplifications or optimisations that I'd like to share.
>
> Thanks :)
>
>> This first patch is for _BracketMatcher. We only use std::bitset when
>> is_same<_CharT, char> so 8 * sizeof(_CharT) should be __CHAR_BIT__
>> instead. We also only use _UnsignedCharT when is_same<_CharT, char>
>> so that can just be simplified to unsigned char.
>
> Yes, since _UnsignedCharT is just used as indexes, we can always use a
> larger unsigned integer instead. Maybe "size_t" is a better choice?

There is a well-defined mapping from every unsigned char in the range
[0,255] to char and back, so conversions between char and unsigned
char are fine.  If we used a larger type then we would get the wrong result
when char is signed, because (size_t)(char)-1 != (unsigned char)(char)-1

> I'm not sure if we'll have a wchar_t cache (bitset<65536>) in the future ;)

sizeof(wchar_t) is 4 on unix-like systems, but a cache for char16_t
wouldn't be totally crazy.

If you prefer I will not remove the uses of _CharT and _UnsignedCharT
and won't assume the cache is only used for char. There are still some
simplifications we can make.

>> The contents of _BracketMatcher::_M_char_set are not sorted and can
>> contain duplicates in the current code. Making that a sorted, unique
>> list in _BracketMatcher::_M_ready() allows a binary search instead of
>> linear search. This improves worst case performance for pathological
>> regular expressions like std::wregex('['+std::wstring(1000, 'a')+"b]")
>> but I'm not sure if it helps in the common case.
>
> Trust me, in the common case, even if it's a bit slower, it won't be
> observable. So it's fine.
>
>> Finally, in the non-char case the _CacheT member is an unused empty
>> object, so having that as the first member requires 7 bytes of
>> padding. Re-ordering the members reduces the size of a non-char
>> _BracketMatcher by 8 bytes (but it's still a whopping 96 bytes).
>>
>> (For a char _BracketMatcher the bitset cache makes it 128 bytes,
>> this patch doesn't change that).
>
> This is OK too.


Thanks, I'll clean the patch up and commit it soon.
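
As an illustration of the member re-ordering point, here is a reduced sketch.  The struct and member names are invented and have nothing to do with the real _BracketMatcher layout; the only thing carried over is the idea that an empty first member forces padding in front of a pointer-aligned member.

```cpp
// Stand-in for the unused, empty cache object of the non-char case.
struct EmptyCache { };

// Empty member first: it occupies one byte and is followed by padding up
// to the alignment of the pointer.
struct CacheFirst
{
  EmptyCache cache;
  const void* traits;
  bool is_non_matching;
};

// Empty member last: it sits next to the bool inside what would otherwise
// be tail padding.
struct CacheLast
{
  const void* traits;
  bool is_non_matching;
  EmptyCache cache;
};

static_assert (sizeof (CacheLast) <= sizeof (CacheFirst),
               "re-ordering must not grow the struct");
```

The exact sizes depend on the ABI; on a typical LP64 target I would expect the second struct to be one pointer-sized unit smaller than the first, but the sketch only asserts that it is not larger.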

Re: [PATCH] Optionally trap on impossible devirtualization

2014-04-28 Thread Jan Hubicka

> On Mon, Apr 28, 2014 at 11:10:41AM +0200, Jakub Jelinek wrote:
> > On Mon, Apr 28, 2014 at 11:05:06AM +0200, Richard Biener wrote:
> > > On Fri, Apr 25, 2014 at 5:35 PM, Martin Jambor  wrote:
> > > > Hi,
> > > >
> > > > the patch below might be useful for testcase preparation and debugging
> > > > compiler bugs such as PR 60965.  When
> > > > -ftrap-on-impossible-devirtualization is supplied on the command line,
> > > > it makes the devirtualization produce __builtin_trap instead of
> > > > __builtin_unreachable when it comes to the conclusion that there is no
> > > > legal target of a virtual call.
> > > >
> > > > Apart from dealing with our bugs, it may be even useful to debug
> > > > compiled programs when a user triggers some sort of illegal
> > > > devirtualization, typically by missing a type check somewhere.
> > > > Currently the compiled program might simply take a wrong branch, with
> > > > the patch it will abort.
> > > >
> > > > Bootstrapped and tested (with the option on) on x86_64-linux, I have
> > > > also successfully LTO built Firefox with it.  If I add some
> > > > documentation, would you like to see this in trunk?
> > > 
> > > It's useful for debugging, so yes.  Not sure about the option name though.
> > > Maybe we should have a generic -ftrap-on-unreachable flag instead
> > > and handle all __builtin_unreachable () like that (for example by
> > > folding or by simply making __builtin_unreachable () an alias of
> > > __builtin_trap ()).
> > 
> > -fsanitize=unreachable should already do that.  With
> > -fsanitize=unreachable -fsanitize-undefined-trap-on-error
> > it should fold __builtin_unreachable () to __builtin_trap (), otherwise
> > to __ubsan_handle_builtin_unreachable () call.
> > 
> > So, from this POV, the new option is redundant.
> 
> That sounds like good news except that it does not work, at least not
> for me when I tried it on the testcase from comment #2 from PR 60965.
> The behavior of the executable is just the same, I do not get any
> traps.  Is this supposed to work at -O2?  If so, should I file a ubsan
> bug?  (If not, then I suppose some additional non-ubsan mechanism for
> this might be also useful.)

As I wrote in the other email, I think the problem is when the transformation
happens and whether we re-fold the statement.

Indeed, we ought to fix this. (It works in one of the PRs I looked into, but
only for mainline, not for 4.9.)

Honza
> 
> Thanks,
> 
> Martin

Re: [patch 2/N] std::regex refactoring - sub _Executor for lookahead

2014-04-28 Thread Tim Shen

On Mon, Apr 28, 2014 at 10:55 AM, Jonathan Wakely  wrote:
> Ah yes, I didn't think of that. But the size of _Executor is fixed,
> isn't it?  If it has a huge number of states or matches those will be
> on the heap anyway, in vectors/arrays.
>
> It could be huge if instantiated with a huge iterator type, as it
> stores three members of the iterator type, but I don't think users
> should be too surprised if they overflow the stack with freakishly
> large iterators :-)
>
> Am I still missing something?
>
> (I don't have a preference for whether to change this, but if we keep
> it on the heap we should add a comment, or I'll keep forgetting the
> rationale and try to change it again!)

Either way is OK, in fact. Let's just keep the code simple by applying
this patch. I can't imagine one could use nested lookahead. :)


-- 
Regards,
Tim Shen

Re: [RFC] Add aarch64 support for ada

2014-04-28 Thread Eric Botcazou

> Bootstrap and test succeeded, thanks.

Thanks, applied as such.  You can re-apply the gcc-interface/Makefile.in hunk 
(I reverted it as well) but you first need to adjust it to the mainline.


* exp_dbug.ads (Get_External_Name): Add 'False' default to Has_Suffix,
add 'Suffix' parameter and adjust comment.
(Get_External_Name_With_Suffix): Delete.
* exp_dbug.adb (Get_External_Name_With_Suffix): Merge into...
(Get_External_Name): ...here.  Add 'False' default to Has_Suffix, add
'Suffix' parameter.
(Get_Encoded_Name): Remove 2nd argument in call to Get_External_Name.
Call Get_External_Name instead of Get_External_Name_With_Suffix.
(Get_Secondary_DT_External_Name): Likewise.
* exp_cg.adb (Write_Call_Info): Likewise.
* exp_disp.adb (Export_DT): Likewise.
(Import_DT): Likewise.
* comperr.ads (Compiler_Abort): Remove Code parameter and add From_GCC
parameter with False default.
* comperr.adb (Compiler_Abort): Likewise.  Adjust accordingly.
* types.h (Fat_Pointer): Rename into...
(String_Pointer): ...this.  Add comment on interfacing rules.
* fe.h (Compiler_Abort): Adjust for above renaming.
(Error_Msg_N): Likewise.
(Error_Msg_NE): Likewise.
(Get_External_Name): Likewise.  Add third parameter.
(Get_External_Name_With_Suffix): Delete.
* gcc-interface/decl.c (STDCALL_PREFIX): Define.
(create_concat_name): Adjust call to Get_External_Name, remove call to
Get_External_Name_With_Suffix, use STDCALL_PREFIX, adjust for renaming.
* gcc-interface/trans.c (post_error): Likewise.
(post_error_ne): Likewise.
* gcc-interface/misc.c (internal_error_function): Likewise.


-- 
Eric Botcazou
Index: comperr.adb
===
--- comperr.adb	(revision 209859)
+++ comperr.adb	(working copy)
@@ -6,7 +6,7 @@
 --  --
 -- B o d y  --
 --  --
---  Copyright (C) 1992-2013, Free Software Foundation, Inc. --
+--  Copyright (C) 1992-2014, Free Software Foundation, Inc. --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -74,8 +74,8 @@ package body Comperr is
 
procedure Compiler_Abort
  (X: String;
-  Code : Integer := 0;
-  Fallback_Loc : String := "")
+  Fallback_Loc : String  := "";
+  From_GCC : Boolean := False)
is
   --  The procedures below output a "bug box" with information about
   --  the cause of the compiler abort and about the preferred method
@@ -206,7 +206,7 @@ package body Comperr is
  Write_Str (") ");
 
  if X'Length + Column > 76 then
-if Code < 0 then
+if From_GCC then
Write_Str ("GCC error:");
 end if;
 
@@ -235,11 +235,7 @@ package body Comperr is
 Write_Str (X);
  end if;
 
- if Code > 0 then
-Write_Str (", Code=");
-Write_Int (Int (Code));
-
- elsif Code = 0 then
+ if not From_GCC then
 
 --  For exception case, get exception message from the TSD. Note
 --  that it would be neater and cleaner to pass the exception
Index: comperr.ads
===
--- comperr.ads	(revision 209859)
+++ comperr.ads	(working copy)
@@ -6,7 +6,7 @@
 --  --
 -- S p e c  --
 --  --
---  Copyright (C) 1992-2013, Free Software Foundation, Inc. --
+--  Copyright (C) 1992-2014, Free Software Foundation, Inc. --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -31,8 +31,8 @@ package Comperr is
 
procedure Compiler_Abort
  (X: String;
-  Code : Integer := 0;
-  Fallback_Loc : String := "");
+  Fallback_Loc : String  := "";
+  From_GCC : Boolean := False);
pragma No_Return (Compiler_Abort);
--  Signals an internal compiler error. Never returns control. Depending on
--  processing may end up raising Unrecoverable_Error, or exiting directly.
@@ -46,10 +46,9 @@ package Comperr is
--  Note that this

Re: -fuse-caller-save - Enable for MIPS

2014-04-28 Thread Richard Sandiford

Tom de Vries  writes:
>> If so,
>> should -fuse-caller-save imply -fcaller-saves?
>
> I don't think it's strictly necessary, but I can make a patch if required.

Implying -fcaller-saves seems better to me, since "-O -fuse-caller-save"
looks like it should enable the new optimisation.  It's not my call though.

Thanks,
Richard

Re: [patch 2/N] std::regex refactoring - sub _Executor for lookahead

2014-04-28 Thread Jonathan Wakely


On 28/04/14 10:18 -0400, Tim Shen wrote:
> On Mon, Apr 28, 2014 at 8:45 AM, Jonathan Wakely  wrote:
>> Is there any reason this object is created on the heap?
>
> Say, _Executor's size is huge and an unusual user gets a stack
> overflow by repeatedly invoking this function?


Ah yes, I didn't think of that. But the size of _Executor is fixed,
isn't it?  If it has a huge number of states or matches those will be
on the heap anyway, in vectors/arrays.

It could be huge if instantiated with a huge iterator type, as it
stores three members of the iterator type, but I don't think users
should be too surprised if they overflow the stack with freakishly
large iterators :-)

Am I still missing something?

(I don't have a preference for whether to change this, but if we keep
it on the heap we should add a comment, or I'll keep forgetting the
rationale and try to change it again!)

Re: -fuse-caller-save - Enable for MIPS

2014-04-28 Thread Richard Sandiford

Tom de Vries  writes:
> On 28-04-14 12:26, Richard Sandiford wrote:
>> Tom de Vries  writes:
>>> On 27-04-14 12:27, Richard Sandiford wrote:
>>>> Tom de Vries  writes:
>>>>>mips_emit_call_insn (rtx pattern, rtx orig_addr, rtx addr, bool lazy_p)
>>>>>{
>>>>>  rtx insn, reg;
>>>>>
>>>>> -  insn = emit_call_insn (pattern);
>>>>> +  emit_call_insn (pattern);
>>>>> +  insn = last_call_insn ();
>>>>>
>>>>>  if (TARGET_MIPS16 && mips_use_pic_fn_addr_reg_p (orig_addr))
>>>>>{
>>>>
>>>> This change isn't necessary; emit_call_insn is defined to return a
>>>> CALL_INSN.
>>>>
>>>
>>> I dropped this change, as well as the change in the untyped_call expand, I
>>> realized it's unnecessary.
>>
>> Why was the untyped_call part unnecessary?
>>
>
> The define_expand "untyped_call" uses GEN_CALL, which uses
> define_expand "call", which uses mips_expand_call, which uses
> mips_emit_call_insn, which adds the required clobbers.

Ah, yeah.  In that case please keep mips_emit_call_insn static.

OK with that change, although please remove -O1 if the agreement
is that "-O1 -fuse-caller-save" should work.

Thanks,
Richard

[patch 3/N] std::regex refactoring - _Executor DFS / BFS

2014-04-28 Thread Jonathan Wakely


This change splits _Executor::_M_main() into two overloaded
_M_main_dispatch() functions, choosing which to run based on the
__dfs_mode template parameter.

I think this gives a (very) small improvement in compilation time when
using regexes.

Splitting _M_main() allows the _M_match_queue and _M_visited members
(which are only used in BFS mode) to be replaced with an instantiation
of the new _States class template. _States<__dfs> only contains the
start state (so that it's not empty and doesn't waste space) but
_States<__bfs> contains the start state and also the match queue and
visited list.
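
To show the dispatch idea in isolation, here is a reduced sketch.  Executor, Mode, run and run_dispatch are simplified stand-ins, not the names or signatures used in the patch; the functions are constexpr only so that the selection can be checked at compile time.

```cpp
#include <type_traits>

enum class Mode { Dfs, Bfs };

template<bool DfsMode>
class Executor
{
  using search_mode = std::integral_constant<bool, DfsMode>;
  using dfs = std::true_type;
  using bfs = std::false_type;

public:
  // Single entry point; overload resolution on the tag type picks the
  // implementation at compile time.
  static constexpr Mode run ()
  { return run_dispatch (search_mode{}); }

private:
  static constexpr Mode run_dispatch (dfs) { return Mode::Dfs; }
  static constexpr Mode run_dispatch (bfs) { return Mode::Bfs; }
};

static_assert (Executor<true>::run () == Mode::Dfs, "DFS overload expected");
static_assert (Executor<false>::run () == Mode::Bfs, "BFS overload expected");
```

The same tag types can select a class template specialization, which is how the mode-specific state ends up with BFS-only members.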

As the visited list never changes size after construction I changed
its type from unique_ptr<vector<bool>> to unique_ptr<bool[]>, which
avoids an indirection, although now that that member is only present
when actually required it could just be vector<bool>. An array of bool
is simpler to access, but takes more heap memory. vector<bool> uses
less space but the compiler has to do all the masking to address
individual bits.
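
A minimal sketch of a fixed-size visited list held as an array of bool.  VisitedList and test_and_set are hypothetical names; the class only illustrates the storage choice and is not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

class VisitedList
{
public:
  // The trailing () value-initializes the array, so every flag starts false.
  explicit VisitedList (std::size_t n)
  : size_ (n), flags_ (new bool[n]())
  { }

  // Returns whether state I had already been visited, and marks it visited.
  bool test_and_set (std::size_t i)
  {
    assert (i < size_);
    if (flags_[i])
      return true;
    flags_[i] = true;
    return false;
  }

private:
  std::size_t size_;
  std::unique_ptr<bool[]> flags_;
};
```

Each access is a plain byte load or store with no bit masking, at the cost of one byte per state instead of one bit.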

I also changed this range-based for loop in _M_main (now in
_M_main_dispatch(__bfs)) to make __task a reference, avoiding one
copy, and to move from __task.second, avoiding a second copy:

   auto __old_queue = std::move(_M_states._M_match_queue);
   for (auto& __task : __old_queue)
 {
   _M_cur_results = std::move(__task.second);
   _M_dfs<__match_mode>(__task.first);
 }
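
The same pattern outside the executor, as a self-contained sketch.  Results, Queue and drain are stand-ins chosen for illustration (the element types in the patch differ); the point is only the reference in the loop and the move from the pair's second member.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Results = std::vector<std::string>;            // stand-in results vector
using Queue = std::vector<std::pair<int, Results>>;  // (state id, results)

// Drains QUEUE, handing each task's results to VISIT without copying them.
template<typename Visit>
void drain (Queue& queue, Visit visit)
{
  // Steal the whole queue so that new tasks can be queued while iterating.
  Queue old_queue = std::move (queue);
  // A moved-from vector is valid but unspecified, so empty it explicitly.
  queue.clear ();
  assert (queue.empty ());
  for (auto& task : old_queue)                    // reference: no pair copy
    {
      Results current = std::move (task.second);  // move: no results copy
      visit (task.first, current);
    }
}
```

With `for (auto task : old_queue)` and a plain assignment from task.second, each iteration would copy the results twice instead.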

Thoughts?

diff --git a/libstdc++-v3/include/bits/regex_executor.h b/libstdc++-v3/include/bits/regex_executor.h
index c110b88..064e3df 100644
--- a/libstdc++-v3/include/bits/regex_executor.h
+++ b/libstdc++-v3/include/bits/regex_executor.h
@@ -42,8 +42,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*/
 
   /**
-   * @brief Takes a regex and an input string in and
-   * do the matching.
+   * @brief Takes a regex and an input string and does the matching.
*
* The %_Executor class has two modes: DFS mode and BFS mode, controlled
* by the template parameter %__dfs_mode.
@@ -52,6 +51,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	   bool __dfs_mode>
 class _Executor
 {
+  using __search_mode = integral_constant;
+  using __dfs = true_type;
+  using __bfs = false_type;
+
 public:
   typedef typename iterator_traits<_BiIter>::value_type _CharT;
   typedef basic_regex<_CharT, _TraitsT> _RegexT;
@@ -71,16 +74,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _M_re(__re),
   _M_nfa(*__re._M_automaton),
   _M_results(__results),
-  _M_match_queue(__dfs_mode ? nullptr
-		 : new vector>()),
   _M_rep_count(_M_nfa.size()),
-  _M_visited(__dfs_mode ? nullptr : new vector(_M_nfa.size())),
+  _M_states(_M_nfa._M_start(), _M_nfa.size()),
   _M_flags((__flags & regex_constants::match_prev_avail)
 	   ? (__flags
 		  & ~regex_constants::match_not_bol
 		  & ~regex_constants::match_not_bow)
-	   : __flags),
-  _M_start_state(_M_nfa._M_start())
+	   : __flags)
   { }
 
   // Set matched when string exactly match the pattern.
@@ -113,7 +113,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 	bool
-	_M_main();
+	_M_main()
+	{ return _M_main_dispatch<__match_mode>(__search_mode{}); }
+
+  template
+	bool
+	_M_main_dispatch(__dfs);
+
+  template
+	bool
+	_M_main_dispatch(__bfs);
 
   bool
   _M_is_word(_CharT __ch) const
@@ -144,6 +153,53 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   bool
   _M_lookahead(_State<_TraitsT> __state);
 
+   // Holds additional information used in BFS-mode.
+  template
+	struct _State_info;
+
+  template
+	struct _State_info<__bfs, _ResultsVec>
+	{
+	  explicit
+	  _State_info(_StateIdT __start, size_t __n)
+	  : _M_start(__start), _M_visited_states(new bool[__n]())
+	  { }
+
+	  bool _M_visited(_StateIdT __i)
+	  {
+	if (_M_visited_states[__i])
+	  return true;
+	_M_visited_states[__i] = true;
+	return false;
+	  }
+
+	  void _M_queue(_StateIdT __i, const _ResultsVec& __res)
+	  { _M_match_queue.emplace_back(__i, __res); }
+
+	  // Saves states that need to be considered for the next character.
+	  vector>	_M_match_queue;
+	  // Indicates which states are already visited.
+	  unique_ptr			_M_visited_states;
+	  // To record current solution.
+	  _StateIdT _M_start;
+	};
+
+  template
+	struct _State_info<__dfs, _ResultsVec>
+	{
+	  explicit
+	  _State_info(_StateIdT __start, size_t) : _M_start(__start)
+	  { }
+
+	  // Dummy implementations for DFS mode.
+	  bool _M_visited(_StateIdT) const { return false; }
+	  void _M_queue(_StateIdT, const _ResultsVec&) { }
+
+	  // To record current solution.
+	  _StateIdT _M_start;
+	};
+
+
 public:
   _ResultsVec   _M_cur_results;
   _BiIter   _M_current;
@@ -152,15 +208,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   const _RegexT&_M_re;
   const _NFAT&

Re: [PATCH] Optionally trap on impossible devirtualization

2014-04-28 Thread Martin Jambor

On Mon, Apr 28, 2014 at 11:10:41AM +0200, Jakub Jelinek wrote:
> On Mon, Apr 28, 2014 at 11:05:06AM +0200, Richard Biener wrote:
> > On Fri, Apr 25, 2014 at 5:35 PM, Martin Jambor  wrote:
> > > Hi,
> > >
> > > the patch below might be useful for testcase preparation and debugging
> > > compiler bugs such as PR 60965.  When
> > > -ftrap-on-impossible-devirtualization is supplied on the command line,
> > > it makes the devirtualization produce __builtin_trap instead of
> > > __builtin_unreachable when it comes to the conclusion that there is no
> > > legal target of a virtual call.
> > >
> > > Apart from dealing with our bugs, it may be even useful to debug
> > > compiled programs when a user triggers some sort of illegal
> > > devirtualization, typically by missing a type check somewhere.
> > > Currently the compiled program might simply take a wrong branch, with
> > > the patch it will abort.
> > >
> > > Bootstrapped and tested (with the option on) on x86_64-linux, I have
> > > also successfully LTO built Firefox with it.  If I add some
> > > documentation, would you like to see this in trunk?
> > 
> > It's useful for debugging, so yes.  Not sure about the option name though.
> > Maybe we should have a generic -ftrap-on-unreachable flag instead
> > and handle all __builtin_unreachable () like that (for example by
> > folding or by simply making __builtin_unreachable () an alias of
> > __builtin_trap ()).
> 
> -fsanitize=unreachable should already do that.  With
> -fsanitize=unreachable -fsanitize-undefined-trap-on-error
> it should fold __builtin_unreachable () to __builtin_trap (), otherwise
> to __ubsan_handle_builtin_unreachable () call.
> 
> So, from this POV, the new option is redundant.

That sounds like good news except that it does not work, at least not
for me when I tried it on the testcase from comment #2 from PR 60965.
The behavior of the executable is just the same, I do not get any
traps.  Is this supposed to work at -O2?  If so, should I file a ubsan
bug?  (If not, then I suppose some additional non-ubsan mechanism for
this might be also useful.)

Thanks,

Martin
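
A minimal sketch of the difference under discussion, outside of any devirtualization machinery: a code path the programmer believes is impossible either becomes undefined behavior (__builtin_unreachable) or a guaranteed abort (__builtin_trap). The macro names TRAP_ON_UNREACHABLE and IMPOSSIBLE are hypothetical and only stand in for the proposed switch; they are not GCC options. The two builtins themselves are existing GCC builtins.

```cpp
// Hypothetical build-time switch; it only illustrates the idea of the
// proposed option and is not a real GCC flag.
#ifdef TRAP_ON_UNREACHABLE
#  define IMPOSSIBLE() __builtin_trap ()         // abort at run time
#else
#  define IMPOSSIBLE() __builtin_unreachable ()  // undefined if ever reached
#endif

enum class Kind { Circle, Square };

// Every valid Kind is handled inside the switch, so the code after the
// switch is only reached when the program is already broken.
int
sides (Kind k)
{
  switch (k)
    {
    case Kind::Circle:
      return 0;
    case Kind::Square:
      return 4;
    }
  IMPOSSIBLE ();
}
```

With the trapping variant, a bogus Kind value stops the program at the faulty call instead of letting it continue down an arbitrary branch, which is the debugging aid described above.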

Re: [patch 2/N] std::regex refactoring - sub _Executor for lookahead

2014-04-28 Thread Tim Shen

On Mon, Apr 28, 2014 at 8:45 AM, Jonathan Wakely  wrote:
> Is there any reason this object is created on the heap?

Say, _Executor's size is huge and an unusual user gets a stack
overflow by invoking this function repeatedly?


-- 
Regards,
Tim Shen

Re: [patch 1/N] std::regex refactoring - _BracketMatcher

2014-04-28 Thread Tim Shen

On Mon, Apr 28, 2014 at 8:40 AM, Jonathan Wakely  wrote:
> I've been looking through the regex code and have a few ideas for
> simplifications or optimisations that I'd like to share.

Thanks :)

> This first patch is for _BracketMatcher. We only use std::bitset when
> is_same<_CharT, char> so 8 * sizeof(_CharT) should be __CHAR_BIT__
> instead. We also only use _UnsignedCharT when is_same<_CharT, char>
> so that can just be simplified to unsigned char.

Yes, since _UnsignedCharT is only used as an index, we can always use a
larger unsigned integer instead. Maybe "size_t" is a better choice?

I'm not sure if we'll have a wchar_t cache (bitset<65536>) in the future ;)

> The contents of _BracketMatcher::_M_char_set are not sorted and can
> contain duplicates in the current code. Making that a sorted, unique
> list in _BracketMatcher::_M_ready() allows a binary search instead of
> linear search. This improves worst case performance for pathological
> regular expressions like std::wregex('['+std::wstring(1000, 'a')+"b]")
> but I'm not sure if it helps in the common case.

Trust me, in the common case, even if it's a bit slower, it won't be
observable. So it's OK.

> Finally, in the non-char case the _CacheT member is an unused empty
> object, so having that as the first member requires 7 bytes of
> padding. Re-ordering the members reduces the size of a non-char
> _BracketMatcher by 8 bytes (but it's still a whopping 96 bytes).
>
> (For a char _BracketMatcher the bitset cache makes it 128 bytes,
> this patch doesn't change that).

This is OK too.

Rant: I can't understand why the committee doesn't keep things simple
by letting struct { } have size 0. I once implemented my own tuple by
specializing tuple<> as an empty struct. The catch is that a tuple
shouldn't have it as a member, or it will get extra padding; in the end
I used inheritance.


-- 
Regards,
Tim Shen
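
The padding effect mentioned above (an empty object stored as a member versus used as a base class) can be sketched as follows. The type names are made up for illustration and are not taken from libstdc++; the exact sizes are an assumption about mainstream ABIs, where the empty base optimization applies.

```cpp
struct Empty { };

// Empty object stored as the first member: it still occupies at least one
// byte, and the pointer after it must be aligned, so padding is inserted.
struct AsMember
{
  Empty tag;
  void* payload;
};

// Same data, but the empty type is a base class, so it can share storage
// with the derived object (empty base optimization).
struct AsBase : Empty
{
  void* payload;
};

static_assert (sizeof (Empty) >= 1, "complete objects have nonzero size");
static_assert (sizeof (AsMember) > sizeof (void*), "empty member costs space");
static_assert (sizeof (AsBase) == sizeof (void*), "empty base is free here");
```

On a typical 64-bit target AsMember is 16 bytes and AsBase is 8, which matches the "7 bytes of padding" figure quoted for the unused _CacheT member.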

Re: [PATCH] Add --enable-valgrind-annotations

2014-04-28 Thread Jakub Jelinek

On Mon, Apr 28, 2014 at 03:47:17PM +0200, Richard Biener wrote:
> On Tue, Mar 18, 2014 at 2:51 PM, Richard Biener  wrote:
> >
> > This is another patch (well, I've polished it a bit) that was sitting
> > in my local tree for some time.  I've not enabled the ggc-common.c
> > code (I merely want to get rid of the false positives).
> >
> > Queued for 4.10 unless I'm told otherwise.
> 
> There were no comments so far - thus, ok for trunk?

Ok.
> > 2014-03-18  Richard Biener  
> >
> > * configure.ac: Do valgrind header checks unconditionally.
> > Add --enable-valgrind-annotations.
> > * system.h: Guard valgrind header inclusion with
> > ENABLE_VALGRIND_ANNOTATIONS instead of ENABLE_VALGRIND_CHECKING.
> > * alloc-pool.c (pool_alloc, pool_free): Use
> > ENABLE_VALGRIND_ANNOTATIONS instead of ENABLE_VALGRIND_CHECKING
> > to guard possibly dead code.
> > * config.in: Regenerated.
> > * configure: Likewise.

Jakub

Re: [PATCH] Add --enable-valgrind-annotations

2014-04-28 Thread Richard Biener

On Tue, Mar 18, 2014 at 2:51 PM, Richard Biener  wrote:
>
> This is another patch (well, I've polished it a bit) that was sitting
> in my local tree for some time.  I've not enabled the ggc-common.c
> code (I merely want to get rid of the false positives).
>
> Queued for 4.10 unless I'm told otherwise.

There were no comments so far - thus, ok for trunk?

Thanks,
Richard.

> Richard.
>
> 2014-03-18  Richard Biener  
>
> * configure.ac: Do valgrind header checks unconditionally.
> Add --enable-valgrind-annotations.
> * system.h: Guard valgrind header inclusion with
> ENABLE_VALGRIND_ANNOTATIONS instead of ENABLE_VALGRIND_CHECKING.
> * alloc-pool.c (pool_alloc, pool_free): Use
> ENABLE_VALGRIND_ANNOTATIONS instead of ENABLE_VALGRIND_CHECKING
> to guard possibly dead code.
> * config.in: Regenerated.
> * configure: Likewise.
>
> Index: gcc/configure.ac
> ===
> *** gcc/configure.ac(revision 208642)
> --- gcc/configure.ac(working copy)
> *** dnl # an if statement.  This was the sou
> *** 512,538 
>   dnl # in converting to autoconf 2.5x!
>   AC_CHECK_HEADER(valgrind.h, have_valgrind_h=yes, have_valgrind_h=no)
>
> ! if test x$ac_valgrind_checking != x ; then
> !   # It is certainly possible that there's valgrind but no valgrind.h.
> !   # GCC relies on making annotations so we must have both.
> !   AC_MSG_CHECKING(for VALGRIND_DISCARD in <valgrind/memcheck.h>)
> !   AC_PREPROC_IFELSE([AC_LANG_SOURCE(
> ! [[#include <valgrind/memcheck.h>
>   #ifndef VALGRIND_DISCARD
>   #error VALGRIND_DISCARD not defined
>   #endif]])],
> [gcc_cv_header_valgrind_memcheck_h=yes],
> [gcc_cv_header_valgrind_memcheck_h=no])
> !   AC_MSG_RESULT($gcc_cv_header_valgrind_memcheck_h)
> !   AC_MSG_CHECKING(for VALGRIND_DISCARD in <memcheck.h>)
> !   AC_PREPROC_IFELSE([AC_LANG_SOURCE(
> ! [[#include <memcheck.h>
>   #ifndef VALGRIND_DISCARD
>   #error VALGRIND_DISCARD not defined
>   #endif]])],
> [gcc_cv_header_memcheck_h=yes],
> [gcc_cv_header_memcheck_h=no])
> !   AC_MSG_RESULT($gcc_cv_header_memcheck_h)
> AM_PATH_PROG_WITH_TEST(valgrind_path, valgrind,
> [$ac_dir/$ac_word --version | grep valgrind- >/dev/null 2>&1])
> if test "x$valgrind_path" = "x" \
> --- 512,547 
>   dnl # in converting to autoconf 2.5x!
>   AC_CHECK_HEADER(valgrind.h, have_valgrind_h=yes, have_valgrind_h=no)
>
> ! # It is certainly possible that there's valgrind but no valgrind.h.
> ! # GCC relies on making annotations so we must have both.
> ! AC_MSG_CHECKING(for VALGRIND_DISCARD in <valgrind/memcheck.h>)
> ! AC_PREPROC_IFELSE([AC_LANG_SOURCE(
> !   [[#include <valgrind/memcheck.h>
>   #ifndef VALGRIND_DISCARD
>   #error VALGRIND_DISCARD not defined
>   #endif]])],
> [gcc_cv_header_valgrind_memcheck_h=yes],
> [gcc_cv_header_valgrind_memcheck_h=no])
> ! AC_MSG_RESULT($gcc_cv_header_valgrind_memcheck_h)
> ! AC_MSG_CHECKING(for VALGRIND_DISCARD in <memcheck.h>)
> ! AC_PREPROC_IFELSE([AC_LANG_SOURCE(
> !   [[#include <memcheck.h>
>   #ifndef VALGRIND_DISCARD
>   #error VALGRIND_DISCARD not defined
>   #endif]])],
> [gcc_cv_header_memcheck_h=yes],
> [gcc_cv_header_memcheck_h=no])
> ! AC_MSG_RESULT($gcc_cv_header_memcheck_h)
> ! if test $gcc_cv_header_valgrind_memcheck_h = yes; then
> !   AC_DEFINE(HAVE_VALGRIND_MEMCHECK_H, 1,
> !   [Define if valgrind's valgrind/memcheck.h header is installed.])
> ! fi
> ! if test $gcc_cv_header_memcheck_h = yes; then
> !   AC_DEFINE(HAVE_MEMCHECK_H, 1,
> !   [Define if valgrind's memcheck.h header is installed.])
> ! fi
> !
> ! if test x$ac_valgrind_checking != x ; then
> AM_PATH_PROG_WITH_TEST(valgrind_path, valgrind,
> [$ac_dir/$ac_word --version | grep valgrind- >/dev/null 2>&1])
> if test "x$valgrind_path" = "x" \
> *** if test x$ac_valgrind_checking != x ; th
> *** 546,559 
> AC_DEFINE(ENABLE_VALGRIND_CHECKING, 1,
>   [Define if you want to run subprograms and generated programs
>  through valgrind (a memory checker).  This is extremely expensive.])
> -   if test $gcc_cv_header_valgrind_memcheck_h = yes; then
> - AC_DEFINE(HAVE_VALGRIND_MEMCHECK_H, 1,
> -   [Define if valgrind's valgrind/memcheck.h header is installed.])
> -   fi
> -   if test $gcc_cv_header_memcheck_h = yes; then
> - AC_DEFINE(HAVE_MEMCHECK_H, 1,
> -   [Define if valgrind's memcheck.h header is installed.])
> -   fi
>   fi
>   AC_SUBST(valgrind_path_defines)
>   AC_SUBST(valgrind_command)
> --- 555,560 
> *** gather_stats=`if test $enable_gather_det
> *** 592,597 
> --- 593,613 
>   AC_DEFINE_UNQUOTED(GATHER_STATISTICS, $gather_stats,
>   [Define to enable detailed memory allocation stats gathering.])
>
> + AC_ARG_ENABLE(valgrind-annotations,
> + [AS_HELP_STRING([--enable-valgrind-annotations],
> +   [enable valgrind runtime interaction])], [],
> + [enable_valgrind_annotations=no])
> + if test x$enable_valgrind_annotations != xno \
> + || test x$ac_valgrind_checking != x; t

Do slightly less work in jump threader

2014-04-28 Thread Jeff Law



Per Richi's request, don't iterate over virtual outputs when invalidating 
outputs of statements which do not produce useful outputs for jump 
threading.  Related to 60902.


Bootstrapped and regression tested on x86_64-unknown-linux-gnu. 
Installed on the trunk.




diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 61fd558..89930b0 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2014-04-28  Jeff Law  
+
+   PR tree-optimization/60902
+   * tree-ssa-threadedge.c
+   (record_temporary_equivalences_from_stmts_at_dest): Only iterate
+   over real defs when invalidating outputs from statements that do not
+   produce useful outputs for threading.
+
 2014-04-28  Richard Biener  
 
PR tree-optimization/60979
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 8a0103b..7621348 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -398,7 +398,7 @@ record_temporary_equivalences_from_stmts_at_dest (edge e,
  ssa_op_iter iter;
 
  if (backedge_seen)
-   FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_ALL_DEFS)
+   FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_DEF)
  {
/* This call only invalidates equivalences created by
   PHI nodes.  This is by design to keep the cost of

Re: [PATCH] pedantic warning behavior when casting void* to ptr-to-func, 4.8 and 4.9

2014-04-28 Thread Daniel Gutson

Sorry, ping for maintainer.

We do need this for 4.8.3.

Thanks,

Daniel.

On Tue, Apr 22, 2014 at 9:15 AM, Daniel Gutson
 wrote:
> Ping for maintainer please.
>
> Thanks,
>
>Daniel.
>
> On Tue, Apr 15, 2014 at 7:05 PM, Daniel Gutson
>  wrote:
>> On Tue, Apr 15, 2014 at 6:12 PM, Richard Sandiford
>>  wrote:
>>> cc:ing Jason, who's the C++ maintainer.
>>
>>
>> FWIW: I created http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60850
>>
>>>
>>> Daniel Gutson  writes:
 ping for maintainer.

 Could this be considered for 4.8.3 please?

 Thanks,

Daniel.


 On Tue, Apr 1, 2014 at 2:46 PM, Daniel Gutson
  wrote:
>
> I just realized I posted the patch in the wrong list.
>
>
> -- Forwarded message --
> From: Daniel Gutson 
> Date: Tue, Apr 1, 2014 at 10:43 AM
> Subject: [PATCH] pedantic warning behavior when casting void* to
> ptr-to-func, 4.8 and 4.9
> To: gcc Mailing List 
>
>
> Hi,
>
>I observed two different behaviors in gcc 4.8.2 and 4.9 regarding
> the same issue, IMO both erroneous.
>
> Regarding 4.8.2, #pragma GCC diagnostic ignored "-pedantic" doesn't
> work in cases such as:
> void* p = 0;
> #pragma GCC diagnostic ignored "-pedantic"
> F* f2 = reinterpret_cast<F*>(p);
>
> (see testcase in the patch).
>
> The attached patch attempts to fix this issue. Since I no longer have
> write access, please
> apply this for me if correct (is the 4.8 branch still alive for adding 
> fixes?).
>
> Regarding 4.9, gcc fails to complain at all when -pedantic is passed,
> even specifying -std=c++03.
> Please let me know if this is truly a bug, in which case I could also
> fix it for the latest version as well
> (if so, please let me know if I should look into trunk or any other 
> branch).
>
> Thanks,
>
>Daniel.
>
> 2014-03-31  Daniel Gutson  
>
> gcc/cp/
> * typeck.c (build_reinterpret_cast_1): Pass proper argument to
> warn() in pedantic.
>
> gcc/testsuite/g++.dg/
> * diagnostic/pedantic.C: New test case.
>>>
>>> --- gcc-4.8.2-orig/gcc/cp/typeck.c  2014-03-31 22:29:42.736367936 -0300
>>> +++ gcc-4.8.2/gcc/cp/typeck.c   2014-03-31 14:26:43.536747050 -0300
>>> @@ -6639,7 +6639,7 @@
>>>where possible, and it is necessary in some cases.  DR 195
>>>addresses this issue, but as of 2004/10/26 is still in
>>>drafting.  */
>>> -   warning (0, "ISO C++ forbids casting between pointer-to-function 
>>> and pointer-to-object");
>>> +   warning (OPT_Wpedantic, "ISO C++ forbids casting between 
>>> pointer-to-function and pointer-to-object");
>>>return fold_if_not_in_template (build_nop (type, expr));
>>>  }
>>>else if (TREE_CODE (type) == VECTOR_TYPE)
>>> --- gcc-4.8.2-orig/gcc/testsuite/g++.dg/diagnostic/pedantic.C   1969-12-31 
>>> 21:00:00.0 -0300
>>> +++ gcc-4.8.2/gcc/testsuite/g++.dg/diagnostic/pedantic.C2014-03-31 
>>> 17:24:42.532607344 -0300
>>> @@ -0,0 +1,12 @@
>>> +// { dg-do compile }
>>> +// { dg-options "-pedantic" }
>>> +typedef void F(void);
>>> +
>>> +void foo()
>>> +{
>>> +void* p = 0;
>>> +F* f1 = reinterpret_cast<F*>(p);// { dg-warning "ISO" }
>>> +#pragma GCC diagnostic ignored "-pedantic"
>>> +F* f2 = reinterpret_cast<F*>(p);
>>> +}
>>> +



-- 

Daniel F. Gutson
Chief Engineering Officer, SPD


San Lorenzo 47, 3rd Floor, Office 5

Córdoba, Argentina


Phone: +54 351 4217888 / +54 351 4218211

Skype: dgutson

[patch 1/N] std::regex refactoring - _BracketMatcher

2014-04-28 Thread Jonathan Wakely


Hi,

I've been looking through the regex code and have a few ideas for
simplifications or optimisations that I'd like to share.

This first patch is for _BracketMatcher. We only use std::bitset when
is_same<_CharT, char> so 8 * sizeof(_CharT) should be __CHAR_BIT__
instead. We also only use _UnsignedCharT when is_same<_CharT, char>
so that can just be simplified to unsigned char.

The contents of _BracketMatcher::_M_char_set are not sorted and can
contain duplicates in the current code. Making that a sorted, unique
list in _BracketMatcher::_M_ready() allows a binary search instead of
linear search. This improves worst case performance for pathological
regular expressions like std::wregex('['+std::wstring(1000, 'a')+"b]")
but I'm not sure if it helps in the common case.

Finally, in the non-char case the _CacheT member is an unused empty
object, so having that as the first member requires 7 bytes of
padding. Re-ordering the members reduces the size of a non-char
_BracketMatcher by 8 bytes (but it's still a whopping 96 bytes).

(For a char _BracketMatcher the bitset cache makes it 128 bytes,
this patch doesn't change that).

Thoughts?
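
For readers skimming the thread, the sort / unique / binary_search idea described above can be sketched in isolation. CharSet is a hypothetical stand-in for the character list of a bracket expression, not the real _BracketMatcher; only the three standard algorithms are taken from the patch.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the character list of a bracket expression.
struct CharSet
{
  std::vector<wchar_t> chars;

  // Call once after all characters have been added: sort the list and
  // drop duplicates so that lookups can use a binary search.
  void
  ready ()
  {
    std::sort (chars.begin (), chars.end ());
    chars.erase (std::unique (chars.begin (), chars.end ()), chars.end ());
  }

  // Only valid after ready (); O(log n) instead of a linear std::find.
  bool
  contains (wchar_t c) const
  { return std::binary_search (chars.begin (), chars.end (), c); }
};
```

For the pathological input mentioned above (a thousand 'a' characters followed by 'b'), the list shrinks to two elements after ready (), so every later lookup touches at most two entries.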


diff --git a/libstdc++-v3/include/bits/regex_compiler.h b/libstdc++-v3/include/bits/regex_compiler.h
index f5a198f..a9dd8d3 100644
--- a/libstdc++-v3/include/bits/regex_compiler.h
+++ b/libstdc++-v3/include/bits/regex_compiler.h
@@ -396,6 +396,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   void
   _M_ready()
   {
+	std::sort(_M_char_set.begin(), _M_char_set.end());
+	auto __end = std::unique(_M_char_set.begin(), _M_char_set.end());
+	_M_char_set.erase(__end, _M_char_set.end());
 	_M_make_cache(_IsChar());
 #ifdef _GLIBCXX_DEBUG
 	_M_is_ready = true;
@@ -405,10 +408,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 private:
   typedef typename is_same<_CharT, char>::type _IsChar;
   struct _Dummy { };
+  static constexpr size_t _S_cache_size() { return 1ul << __CHAR_BIT__; }
   typedef typename conditional<_IsChar::value,
-   std::bitset<1 << (8 * sizeof(_CharT))>,
+   std::bitset<_S_cache_size()>,
    _Dummy>::type _CacheT;
-  typedef typename make_unsigned<_CharT>::type _UnsignedCharT;
 
 private:
   bool
@@ -416,14 +419,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   bool
   _M_apply(_CharT __ch, true_type) const
-  { return _M_cache[static_cast<_UnsignedCharT>(__ch)]; }
+  { return _M_cache[static_cast<unsigned char>(__ch)]; }
 
   void
   _M_make_cache(true_type)
   {
-	for (int __i = 0; __i < _M_cache.size(); __i++)
-	  _M_cache[static_cast<_UnsignedCharT>(__i)] =
-	_M_apply(__i, false_type());
+	for (unsigned __i = 0; __i < _S_cache_size(); __i++)
+	  _M_cache[__i] = _M_apply(static_cast<char>(__i), false_type());
   }
 
   void
@@ -431,13 +433,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { }
 
 private:
-  _CacheT   _M_cache;
   std::vector<_CharT>   _M_char_set;
   std::vector<_StringT> _M_equiv_set;
   std::vector> _M_range_set;
   _CharClassT   _M_class_set;
   _TransT   _M_translator;
   const _TraitsT&   _M_traits;
+  _CacheT   _M_cache;
   bool  _M_is_non_matching;
 #ifdef _GLIBCXX_DEBUG
   bool  _M_is_ready;
diff --git a/libstdc++-v3/include/bits/regex_compiler.tcc b/libstdc++-v3/include/bits/regex_compiler.tcc
index 128dac1..36edfba 100644
--- a/libstdc++-v3/include/bits/regex_compiler.tcc
+++ b/libstdc++-v3/include/bits/regex_compiler.tcc
@@ -507,12 +507,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _BracketMatcher<_TraitsT, __icase, __collate>::
 _M_apply(_CharT __ch, false_type) const
 {
-  bool __ret = false;
-  if (std::find(_M_char_set.begin(), _M_char_set.end(),
-		_M_translator._M_translate(__ch))
-	  != _M_char_set.end())
-	__ret = true;
-  else
+  bool __ret = std::binary_search(_M_char_set.begin(), _M_char_set.end(),
+  _M_translator._M_translate(__ch));
+  if (!__ret)
 	{
 	  auto __s = _M_translator._M_transform(__ch);
 	  for (auto& __it : _M_range_set)

Re: [Patch ARM 1/3] Neon intrinsics TLC : Replace intrinsics with GNU C implementations where possible.

2014-04-28 Thread Ramana Radhakrishnan

On Mon, Apr 28, 2014 at 12:44 PM, Julian Brown  
wrote:

> On Mon, 28 Apr 2014 11:44:01 +0100
> Ramana Radhakrishnan  wrote:
>
>> I've special cased the ffast-math case for the _f32 intrinsics to
>> prevent the auto-vectorizer from coming along and vectorizing addv2sf
>> and addv4sf type operations which we don't want to happen by default.
>> Patch 1/3 causes apparent "regressions" in the rather ineffective
>> neon intrinsics tests that we currently carry soon hopefully to be
>> replaced by Christophe Lyon's rewrite that is being reviewed. On the
>> whole I deem this patch stack to be safe to go in if necessary. These
>> "regressions" are for -O0 with the vbic and vorn intrinsics which
>> don't now get combined and well, so be it.
>
> I think reimplementing these intrinsics in C is a mistake if we ever
> hope to make big-endian mode work properly, and "fixing" the generated
> header file by bypassing the generator makes it harder to accurately
> perform the sweeping changes that will probably be necessary to do that.

> Recall e.g. the discussion around:

>
> http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00161.html

Well, it would help if the generator were written in a better language 
than ML :) . While I don't mind a different language in the backend 
once in a while, the problem is that every time anyone needs to make a 
change to this file, we spend far more time relearning ML than actually 
making the change :(.

>
> Generally (though in this case it's merely an implementation detail)
> the NEON intrinsics and GCC's generic vector support cannot be expected
> to interwork properly (because of incompatible lane ordering). Of
> course we get away with it in little-endian mode though, and I guess
> the bridge has already been crossed by earlier patches.

Please note that I have been very careful about doing only those 
operations that will not be afflicted by big endian. I am not touching 
any of the lane-wise intrinsics or intrinsics that touch lane numbers. 
It is the intrinsics that have explicit lane numbering that have the 
issue and not the intrinsics I have touched. What's being done here is 
similar to how these particular intrinsics have been dealt with in the 
AArch64 backend and we don't see any issues with these intrinsics in the 
big endian mode and I would not expect these intrinsics to be more broken 
in big-endian than they are currently with this patch or this set of 
patches.

What specifically are you worried about with Patch 1/3 with respect to 
big endian in this case ? I agree that there may be issues with the 
specific "lane" extraction and vector lane numbering extensions that GCC 
has in big-endian mode vs Neon intrinsics but otherwise this change 
should *not* cause any issues in that space.

What specifically are you worried about with this patch other than 
losing the ability to auto-generate these intrinsics - the patch as is 
doesn't do anything but touch all those that operate on the entire 
vector and have no dependence at all on lane numbering ?

regards
Ramana

Re: [PATCH 00/89] Compile-time gimple-checking

2014-04-28 Thread Michael Matz

Hi,

On Fri, 25 Apr 2014, Richard Biener wrote:

> Btw, I agree we should stick to one style throughout the code-base.
> The advantage of the cast variant is that it can be made work with
> NULL pointers (in the dyn_cast <> case).

NULL pointers shouldn't even be cast at all, there should be sensible 
early-outs or conditions to avoid work on NULL.

> Oh, and you could avoid all the base-class changing stuff if you'd
> do the method like
> 
> class Foo
> {
>   template  T *as () { return as_a  (this); }
> }
> 
> Best (or worst) of both worlds.
> 
>   gimple_cond c = g->as ();
> 
> ;)  (you can even put those methods in a separate feature class you
> can simply inherit from)

Yeah, well, I can write ugly C++ as well, it's just that I don't want to 
:-/


Ciao,
Michael.

[patch 2/N] std::regex refactoring - sub _Executor for lookahead

2014-04-28 Thread Jonathan Wakely


Is there any reason this object is created on the heap?


diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index 7f89933..92ca590 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -145,13 +145,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _M_lookahead(_State<_TraitsT> __state)
 {
   _ResultsVec __what(_M_cur_results.size());
-  auto __sub = std::unique_ptr<_Executor>(new _Executor(_M_current,
-			_M_end,
-			__what,
-			_M_re,
-			_M_flags));
-  __sub->_M_start_state = __state._M_alt;
-  if (__sub->_M_search_from_first())
+  _Executor __sub(_M_current, _M_end, __what, _M_re, _M_flags);
+  __sub._M_start_state = __state._M_alt;
+  if (__sub._M_search_from_first())
 	{
 	  for (size_t __i = 0; __i < __what.size(); __i++)
 	if (__what[__i].matched)

Re: [PATCH] Typofixes and a trivial change

2014-04-28 Thread Patrick Palka

On Mon, Apr 28, 2014 at 4:57 AM, Richard Biener
 wrote:
> On Fri, Apr 25, 2014 at 5:03 PM, Patrick Palka  wrote:
>> I forgot the ChangeLog entry:
>
> Ok.
>
> Thanks,
> Richard.

Thanks for reviewing.  Could someone please commit this for me?  Thanks
in advance.

Re: Ping2: [PATCH] PR debug/16063. Add DW_AT_type to DW_TAG_enumeration.

2014-04-28 Thread Mark Wielaard

On Mon, 2014-04-28 at 14:23 +0200, Jakub Jelinek wrote:
> On Mon, Apr 28, 2014 at 01:17:32PM +0200, Mark Wielaard wrote:
> > Ping2. Please let me know if I should ping/cc other people to get this
> > reviewed.
> 
> Do you want to add DW_AT_type to DW_TAG_enumeration only if it has explicit
> underlying type (enum class foo: char { ... };) or even when the underlying
> type is computed implicitly (then you'd just use TREE_TYPE of the
> ENUMERAL_TYPE if non-NULL).

The debugger cares about the actual underlying type used if the language
can use multiple. Either explicitly assigned by the user or implicitly
as derived by the language/compile flags used. So the lang hook should
provide one in both cases, if appropriate.

Cheers,

Mark
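
As a small illustration of the two cases discussed above (an underlying type assigned explicitly by the user versus one chosen implicitly by the implementation), the standard std::underlying_type trait exposes the same information at the source level that DW_AT_type would expose to the debugger. The enum names other than foo are made up for this sketch.

```cpp
#include <type_traits>

// Explicit underlying type, as in the example from the thread.
enum class foo : char { a, b };

// No underlying type written: the implementation picks an integral type
// large enough for the enumerators.
enum bar { x, y };

// Explicit underlying type on an unscoped enum.
enum baz : unsigned short { p, q };

static_assert (std::is_same<std::underlying_type<foo>::type, char>::value,
	       "explicit underlying type is char");
static_assert (std::is_same<std::underlying_type<baz>::type,
			    unsigned short>::value,
	       "explicit underlying type is unsigned short");
static_assert (std::is_integral<std::underlying_type<bar>::type>::value,
	       "implicit underlying type is some integral type");
```

Which integral type is selected for bar depends on the implementation and compile flags, which is why a debugger cannot guess it without the extra attribute.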

Re: Ping2: [PATCH] PR debug/16063. Add DW_AT_type to DW_TAG_enumeration.

2014-04-28 Thread Jakub Jelinek

On Mon, Apr 28, 2014 at 01:17:32PM +0200, Mark Wielaard wrote:
> On Tue, 2014-04-22 at 12:31 +0200, Mark Wielaard wrote:
> > On Mon, 2014-04-14 at 23:19 +0200, Mark Wielaard wrote:
> > > On Fri, 2014-04-11 at 11:03 -0700, Cary Coutant wrote:
> > > > >> The DWARF bits are fine with me.
> > > > >
> > > > > Thanks. Who can approve the other bits?
> > > > 
> > > > You should probably get C and C++ front end approval. I'm not really
> > > > sure who needs to review patches in c-family/. Since the part in c/ is
> > > > so tiny, maybe all you need is a C++ front end maintainer. Both
> > > > Richard Henderson and Jason Merrill are global reviewers, so either of
> > > > them could approve the whole thing.
> > > 
> > > Thanks, I added them to the CC.
> > > 
> > > > > When approved should I wait till stage 1 opens before committing?
> > > > 
> > > > Yes. The PR you're fixing is an enhancement request, not a regression,
> > > > so it needs to wait.
> > > 
> > > Since stage one just opened up again this seems a good time to re-ask
> > > for approval then :) Rebased patch against current trunk attached.
> > 
> > Ping. Tom already pushed his patches to GDB that take advantage of the
> > new information if available.
> 
> Ping2. Please let me know if I should ping/cc other people to get this
> reviewed.

Do you want to add DW_AT_type to DW_TAG_enumeration only if it has explicit
underlying type (enum class foo: char { ... };) or even when the underlying
type is computed implicitly (then you'd just use TREE_TYPE of the
ENUMERAL_TYPE if non-NULL).

Jakub

[PATCH] Fix PR60979

2014-04-28 Thread Richard Biener


In this PR graphite scop detection ends up doing make_forwarder_block
on a block with incoming abnormal edges but doesn't want those to
"stay" on the forwarder.  In the end it is an implementation detail
of make_forwarder_block that fails (because what graphite wants
would be possible - just not using make_forwarder_block).

Rather than changing make_forwarder_block I chose to avoid the
situation from GRAPHITE (also makes backporting obvious).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-04-28  Richard Biener  

PR tree-optimization/60979
* graphite-scop-detection.c (scopdet_basic_block_info): Reject
SCOPs that end in a block with a successor with abnormal
predecessors.

* gcc.dg/graphite/pr60979.c: New testcase.

Index: gcc/graphite-scop-detection.c
===
*** gcc/graphite-scop-detection.c   (revision 209849)
--- gcc/graphite-scop-detection.c   (working copy)
*** scopdet_basic_block_info (basic_block bb
*** 474,481 
result.exits = false;
  
/* Mark bbs terminating a SESE region difficult, if they start
!a condition.  */
!   if (!single_succ_p (bb))
result.difficult = true;
else
result.exit = single_succ (bb);
--- 474,483 
result.exits = false;
  
/* Mark bbs terminating a SESE region difficult, if they start
!a condition or if the block it exits to cannot be split
!with make_forwarder_block.  */
!   if (!single_succ_p (bb)
! || bb_has_abnormal_pred (single_succ (bb)))
result.difficult = true;
else
result.exit = single_succ (bb);
Index: gcc/testsuite/gcc.dg/graphite/pr60979.c
===
*** gcc/testsuite/gcc.dg/graphite/pr60979.c (revision 0)
--- gcc/testsuite/gcc.dg/graphite/pr60979.c (working copy)
***
*** 0 
--- 1,37 
+ /* { dg-options "-O -fgraphite-identity" } */
+ 
+ #include <setjmp.h>
+ 
+ struct x;
+ 
+ typedef struct x **(*a)(struct x *);
+ 
+ struct x {
+ union {
+   struct {
+   union {
+   a *i;
+   } l;
+   int s;
+   } y;
+ } e;
+ };
+ 
+ jmp_buf c;
+ 
+ void
+ b(struct x *r)
+ {
+   int f;
+   static int w = 0;
+   volatile jmp_buf m;
+   f = (*(((struct x *)r)->e.y.l.i[2]((struct x *)r)))->e.y.s;
+   if (w++ != 0)
+ __builtin_memcpy((char *)m, (const char *)c, sizeof(jmp_buf));
+   if (setjmp (c) == 0) {
+   int z;
+   for (z = 0; z < 0; ++z)
+   ;
+   }
+   d((const char *)m);
+ }

Re: Changes for if-convert to recognize simple conditional reduction.

2014-04-28 Thread Richard Biener

On Thu, Apr 17, 2014 at 3:09 PM, Yuri Rumyantsev  wrote:
> Hi All,
>
> We implemented enhancement for if-convert phase to recognize the
> simplest conditional reduction and to transform it into vectorizable form,
> e.g. statement
> if (A[i] != 0) num+= 1; will be recognized.
> A new test-case is also provided.
>
> Bootstrapping and regression testing did not show any new failures.

Clever.  Can you add a testcase with a non-constant but invariant
reduction value and one with a variable reduction value as well?

+  if (!(is_cond_scalar_reduction (arg_0, &reduc, &op0, &op1)
+   || is_cond_scalar_reduction (arg_1, &reduc, &op0, &op1)))

Actually one of the args should be defined by a PHI node in the
loop header and the PHI result should be the PHI arg on the
latch edge, so I'd pass both PHI args to the predicate and do
the decision on what the reduction op is there (you do that
anyway).  The pattern matching is somewhat awkward

+  /* Consider only conditional reduction.  */
+  bb = gimple_bb (stmt);
+  if (!bb_has_predicate (bb))
+return false;
+  if (is_true_predicate (bb_predicate (bb)))
+return false;

should be replaced by matching the PHI structure

loop-header:
  reduc_1 = PHI <..., reduc_2>
  ...
  if (..)
reduc_3 = ...
  reduc_2 = PHI <reduc_1, reduc_3>

+  lhs = gimple_assign_lhs (stmt);
+  if (TREE_CODE (lhs) != SSA_NAME)
+return false;

always true, in fact lhs == arg.

+  if (SSA_NAME_VAR (lhs) == NULL)
+return false;

no need to check that (or later verify SSA_NAME_VAR equivalency), not
sure why you think you need that.

+  if (!single_imm_use (lhs, &use, &use_stmt))
+return false;
+  if (gimple_code (use_stmt) != GIMPLE_PHI)
+return false;

checking has_single_use (arg) is enough.  The above is error-prone
wrt debug statements.

+  if (reduction_op == PLUS_EXPR &&
+  TREE_CODE (r_op2) == SSA_NAME)

&& goes to the next line

+  if (TREE_CODE (r_op2) != INTEGER_CST && TREE_CODE (r_op2) != REAL_CST)
+return false;

any reason for this check?  The vectorizer can cope with
loop invariant non-constant values as well (at least).

+  /* Right operand is constant, check that left operand is equal to lhs.  */
+  if (SSA_NAME_VAR (lhs) !=  SSA_NAME_VAR (r_op1))
+return false;

see above - that looks weird.

Note that I think you may introduce undefined overflow in
unconditionally executing the increment.  So you need to
make sure to re-write the increment in unsigned arithmetic
(for integral types, that is).

Thanks,
Richard.
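
A source-level sketch of the overflow concern raised above, written by hand rather than taken from the patch: once the conditional update is turned into an unconditional add followed by a select, the add runs on every iteration, so it has to be done in unsigned arithmetic to avoid introducing signed overflow that the original loop never executed. The function names are hypothetical, and the example uses a variable reduction value to make the issue visible.

```cpp
#include <cstddef>

// Original form: the addition only happens when the condition holds.
int
sum_if_branchy (const int* a, const int* b, std::size_t n)
{
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (a[i] != 0)
      sum += b[i];
  return sum;
}

// If-converted form: the addition is executed unconditionally and the
// result is selected afterwards.  The add is done on unsigned values,
// where wraparound is well defined.
int
sum_if_select (const int* a, const int* b, std::size_t n)
{
  unsigned sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    {
      unsigned added = sum + static_cast<unsigned> (b[i]);
      sum = (a[i] != 0) ? added : sum;
    }
  return static_cast<int> (sum);
}
```

Both functions return the same value whenever the original loop is free of overflow; the second one simply never evaluates a signed addition that the first one would have skipped.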

> Is it OK for trunk?
>
> gcc/ChangeLog:
> 2014-04-17  Yuri Rumyantsev  
>
> * tree-if-conv.c (is_cond_scalar_reduction): New function.
> (convert_scalar_cond_reduction): Likewise.
> (predicate_scalar_phi): Add recognition and transformation
> of simple conditional reduction to be vectorizable.
>
> gcc/testsuite/ChangeLog:
> 2014-04-17  Yuri Rumyantsev  
>
> * gcc.dg/cond-reduc.c: New test.

Re: [PATCH] PR60092, add C11 aligned_alloc handling

2014-04-28 Thread Jakub Jelinek

On Mon, Apr 28, 2014 at 01:52:08PM +0200, Richard Biener wrote:
> On Thu, Feb 6, 2014 at 2:26 PM, Richard Biener  wrote:
> >
> > This adds a builtin for C11 aligned_alloc and support for it
> > in the alias and alignment tracking machinery.
> >
> > Bootstrap and regtest in progress on x86_64-unknown-linux-gnu.
> >
> > Ok for trunk?
> 
> Ping.

Ok, thanks.

> > 2014-02-06  Richard Biener  
> >
> > PR middle-end/60092
> > * builtins.def (DEF_C11_BUILTIN): Add.
> > (BUILT_IN_ALIGNED_ALLOC): Likewise.
> > * coretypes.h (enum function_class): Add function_c11_misc.
> > * tree-ssa-alias.c (ref_maybe_used_by_call_p_1): Handle
> > BUILT_IN_ALIGNED_ALLOC like BUILT_IN_MALLOC.
> > (call_may_clobber_ref_p_1): Likewise.
> > * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Likewise.
> > (mark_all_reaching_defs_necessary_1): Likewise.
> > (propagate_necessity): Likewise.
> > (eliminate_unnecessary_stmts): Likewise.
> > * tree-ssa-ccp.c (evaluate_stmt): Handle BUILT_IN_ALIGNED_ALLOC.
> >
> > ada/
> > * gcc-interface/utils.c: Define flag_isoc11.
> >
> > lto/
> > * lto-lang.c: Define flag_isoc11.
> >
> > * gcc.dg/tree-ssa/alias-32.c: New testcase.
> > * gcc.dg/vect/pr60092.c: Likewise.

Jakub

Re: -fuse-caller-save - Enable for MIPS

2014-04-28 Thread Tom de Vries


On 28-04-14 12:47, Tom de Vries wrote:

Hmm, is that just because -fcaller-saves is -O2 and above?


For -O1, after adding -fcaller-saves the optimization triggers, and the 
test case passes.


For -O0, adding -fcaller-saves doesn't make a difference, the optimization 
doesn't trigger.



If so,
should -fuse-caller-save imply -fcaller-saves?


I don't think it's strictly necessary, but I can make a patch if required.

Thanks,
- Tom

Re: [PATCH][testsuite] Fix gcc.dg/pr60114.c on arm/aarch64

2014-04-28 Thread Marek Polacek

On Mon, Apr 28, 2014 at 12:18:05PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> I noticed this test is failing on aarch64:
> 
> FAIL: gcc.dg/pr60114.c  (test for warnings, line 7)
> FAIL: gcc.dg/pr60114.c  (test for warnings, line 8)
> FAIL: gcc.dg/pr60114.c  (test for warnings, line 21)
> FAIL: gcc.dg/pr60114.c  (test for warnings, line 22)
> FAIL: gcc.dg/pr60114.c  (test for warnings, line 23)
> FAIL: gcc.dg/pr60114.c  (test for warnings, line 25)
> FAIL: gcc.dg/pr60114.c (test for excess errors)
> 
> The test was recently added with
> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00592.html
 
Sorry, I tested x86_64, both -m64 and -m32, but I don't test ARM.

> The offending code is of the form:
> 
> 
> const char z[] = {
>   [0] = 0x100, /* { dg-warning "9:overflow in implicit constant conversion" } */
>   [2] = 0x101, /* { dg-warning "9:overflow in implicit constant conversion" } */
> };
> 
> 
> On aarch64 (and arm) chars are unsigned by default so instead we get
> the warning "large integer implicitly truncated to unsigned type".
> 
> This patch explicitly uses signed chars in the test as suggested by richi in 
> the PR.
> 
> Ok for trunk?

Looks good.

Marek

Re: [PATCH] PR60092, add C11 aligned_alloc handling

2014-04-28 Thread Richard Biener

On Thu, Feb 6, 2014 at 2:26 PM, Richard Biener  wrote:
>
> This adds a builtin for C11 aligned_alloc and support for it
> in the alias and alignment tracking machinery.
>
> Bootstrap and regtest in progress on x86_64-unknown-linux-gnu.
>
> Ok for trunk?

Ping.

> Thanks,
> Richard.
>
> 2014-02-06  Richard Biener  
>
> PR middle-end/60092
> * builtins.def (DEF_C11_BUILTIN): Add.
> (BUILT_IN_ALIGNED_ALLOC): Likewise.
> * coretypes.h (enum function_class): Add function_c11_misc.
> * tree-ssa-alias.c (ref_maybe_used_by_call_p_1): Handle
> BUILT_IN_ALIGNED_ALLOC like BUILT_IN_MALLOC.
> (call_may_clobber_ref_p_1): Likewise.
> * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Likewise.
> (mark_all_reaching_defs_necessary_1): Likewise.
> (propagate_necessity): Likewise.
> (eliminate_unnecessary_stmts): Likewise.
> * tree-ssa-ccp.c (evaluate_stmt): Handle BUILT_IN_ALIGNED_ALLOC.
>
> ada/
> * gcc-interface/utils.c: Define flag_isoc11.
>
> lto/
> * lto-lang.c: Define flag_isoc11.
>
> * gcc.dg/tree-ssa/alias-32.c: New testcase.
> * gcc.dg/vect/pr60092.c: Likewise.
>
> Index: trunk/gcc/builtins.def
> ===
> *** trunk.orig/gcc/builtins.def 2014-02-06 12:46:01.085000256 +0100
> --- trunk/gcc/builtins.def  2014-02-06 13:13:06.499888349 +0100
> *** along with GCC; see the file COPYING3.
> *** 111,116 
> --- 111,123 
> DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,  \
>true, true, !flag_isoc99, ATTRS, targetm.libc_has_function 
> (function_c99_misc), true)
>
> + /* Like DEF_LIB_BUILTIN, except that the function is only a part of
> +the standard in C11 or above.  */
> + #undef DEF_C11_BUILTIN
> + #define DEF_C11_BUILTIN(ENUM, NAME, TYPE, ATTRS)  \
> +   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,  \
> +  true, true, !flag_isoc11, ATTRS, targetm.libc_has_function 
> (function_c11_misc), true)
> +
>   /* Like DEF_C99_BUILTIN, but for complex math functions.  */
>   #undef DEF_C99_COMPL_BUILTIN
>   #define DEF_C99_COMPL_BUILTIN(ENUM, NAME, TYPE, ATTRS)\
> *** DEF_C99_BUILTIN(BUILT_IN_ACOSH,
> *** 223,228 
> --- 230,236 
>   DEF_C99_BUILTIN(BUILT_IN_ACOSHF, "acoshf", BT_FN_FLOAT_FLOAT, 
> ATTR_MATHFN_FPROUNDING_ERRNO)
>   DEF_C99_BUILTIN(BUILT_IN_ACOSHL, "acoshl", 
> BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO)
>   DEF_C99_C90RES_BUILTIN (BUILT_IN_ACOSL, "acosl", 
> BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO)
> + DEF_C11_BUILTIN(BUILT_IN_ALIGNED_ALLOC, "aligned_alloc", 
> BT_FN_PTR_SIZE_SIZE, ATTR_MALLOC_NOTHROW_LIST)
>   DEF_LIB_BUILTIN(BUILT_IN_ASIN, "asin", BT_FN_DOUBLE_DOUBLE, 
> ATTR_MATHFN_FPROUNDING_ERRNO)
>   DEF_C99_C90RES_BUILTIN (BUILT_IN_ASINF, "asinf", BT_FN_FLOAT_FLOAT, 
> ATTR_MATHFN_FPROUNDING_ERRNO)
>   DEF_C99_BUILTIN(BUILT_IN_ASINH, "asinh", BT_FN_DOUBLE_DOUBLE, 
> ATTR_MATHFN_FPROUNDING)
> Index: trunk/gcc/coretypes.h
> ===
> *** trunk.orig/gcc/coretypes.h  2014-01-07 10:20:00.549453955 +0100
> --- trunk/gcc/coretypes.h   2014-02-06 13:10:03.472900950 +0100
> *** enum function_class {
> *** 194,200 
> function_c94,
> function_c99_misc,
> function_c99_math_complex,
> !   function_sincos
>   };
>
>   /* Memory model types for the __atomic* builtins.
> --- 194,201 
> function_c94,
> function_c99_misc,
> function_c99_math_complex,
> !   function_sincos,
> !   function_c11_misc
>   };
>
>   /* Memory model types for the __atomic* builtins.
> Index: trunk/gcc/tree-ssa-alias.c
> ===
> *** trunk.orig/gcc/tree-ssa-alias.c 2014-02-06 12:43:34.450010352 +0100
> --- trunk/gcc/tree-ssa-alias.c  2014-02-06 13:14:08.669884068 +0100
> *** ref_maybe_used_by_call_p_1 (gimple call,
> *** 1516,1521 
> --- 1516,1522 
> case BUILT_IN_FREE:
> case BUILT_IN_MALLOC:
> case BUILT_IN_POSIX_MEMALIGN:
> +   case BUILT_IN_ALIGNED_ALLOC:
> case BUILT_IN_CALLOC:
> case BUILT_IN_ALLOCA:
> case BUILT_IN_ALLOCA_WITH_ALIGN:
> *** call_may_clobber_ref_p_1 (gimple call, a
> *** 1826,1831 
> --- 1827,1833 
> /* Allocating memory does not have any side-effects apart from
>being the definition point for the pointer.  */
> case BUILT_IN_MALLOC:
> +   case BUILT_IN_ALIGNED_ALLOC:
> case BUILT_IN_CALLOC:
> case BUILT_IN_STRDUP:
> case BUILT_IN_STRNDUP:
> Index: trunk/gcc/tree-ssa-dce.c
> ===
> *** trunk.orig/gcc/tree-ssa-dce.c   2014-01-07 10:20:02.5

Re: [Patch ARM 1/3] Neon intrinsics TLC : Replace intrinsics with GNU C implementations where possible.

2014-04-28 Thread Julian Brown

On Mon, 28 Apr 2014 11:44:01 +0100
Ramana Radhakrishnan  wrote:

> I've special cased the ffast-math case for the _f32 intrinsics to 
> prevent the auto-vectorizer from coming along and vectorizing addv2sf 
> and addv4sf type operations which we don't want to happen by default.
> Patch 1/3 causes apparent "regressions" in the rather ineffective
> neon intrinsics tests that we currently carry soon hopefully to be
> replaced by Christophe Lyon's rewrite that is being reviewed. On the
> whole I deem this patch stack to be safe to go in if necessary. These
> "regressions" are for -O0 with the vbic and vorn intrinsics which
> don't now get combined and well, so be it.

I think reimplementing these intrinsics in C is a mistake if we ever
hope to make big-endian mode work properly, and "fixing" the generated
header file by bypassing the generator makes it harder to accurately
perform the sweeping changes that will probably be necessary to do that.
Recall e.g. the discussion around:

http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00161.html

Generally (though in this case it's merely an implementation detail)
the NEON intrinsics and GCC's generic vector support cannot be expected
to interwork properly (because of incompatible lane ordering). Of
course we get away with it in little-endian mode though, and I guess
the bridge has already been crossed by earlier patches.

Of course it's possible nobody actually wants to use big-endian NEON,
in which case it's probably time to declare it unsupported?

Julian

Re: [i386] Replace builtins with vector extensions

2014-04-28 Thread Marc Glisse


Ping
http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00590.html

(note that ARM seems to be doing the same thing for their neon 
intrinsics, see Ramana's patch series posted today)


On Fri, 11 Apr 2014, Marc Glisse wrote:


Hello,

the previous discussion on the topic was before we added all those #pragma 
target in *mmintrin.h:


http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00374.html

I believe that removes a large part of the arguments against it. Note that I 
only did a few of the more obvious intrinsics, I am waiting to see if this 
patch is accepted before doing more.


Bootstrap+testsuite on x86_64-linux-gnu.

2014-04-11  Marc Glisse  

* config/i386/xmmintrin.h (_mm_add_ps, _mm_sub_ps, _mm_mul_ps,
_mm_div_ps, _mm_store_ss, _mm_cvtss_f32): Use vector extensions
instead of builtins.
* config/i386/emmintrin.h (_mm_store_sd, _mm_cvtsd_f64, _mm_storeh_pd,
_mm_cvtsi128_si64, _mm_cvtsi128_si64x, _mm_add_pd, _mm_sub_pd,
_mm_mul_pd, _mm_div_pd, _mm_storel_epi64, _mm_movepi64_pi64,
_mm_loadh_pd, _mm_loadl_pd): Likewise.
(_mm_sqrt_sd): Fix comment.
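
A minimal sketch of the idea, assuming GNU C vector extensions (accepted by GCC and Clang). The type and function names here are hypothetical stand-ins, not the definitions from the patched headers: an element-wise add is written with the plain + operator on a vector type, and lane 0 is read by subscripting.

```c
/* GNU C vector extension type: four floats in 16 bytes.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Element-wise addition expressed with the generic + operator,
   in the spirit of what _mm_add_ps computes.  */
v4sf add_v4sf(v4sf a, v4sf b)
{
  return a + b;
}

/* Read lane 0 by subscripting, in the spirit of _mm_cvtss_f32.  */
float first_lane(v4sf a)
{
  return a[0];
}
```

Written this way the operations are ordinary arithmetic to the middle end, rather than opaque builtin calls.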


--
Marc Glisse

Re: [RFC] [Testsuite,ARM] Neon intrinsics executable tests

2014-04-28 Thread Ramana Radhakrishnan

On Mon, Apr 21, 2014 at 8:28 PM, Christophe Lyon
 wrote:
> Hi Ramana,
>
> Here is an updated patch, which adds a README file, some improved
> comments and a few more tests.
> The ChangeLog entry would list the following as new files:
> arm-neon-ref.h
> binary_op.inc
> compute-ref-data.h
> neon-intrinsics.exp
> README
> unary_op.inc
> vaba.c
> vabal.c
> vabd.c
> vabdl.c
> vabs.c
> vadd.c
> vaddhn.c
> vaddl.c
> vaddw.c
> vld1.c
>
> Comments?

LGTM - I'd like a testsuite maintainer to take a look.

Mike, do you have any opinions on the way in which the tests are being
structured?

Ramana
>
> Thanks,
>
> Christophe.
>
>
> On 15 April 2014 19:38, Christophe Lyon  wrote:
>> On 15 April 2014 16:18, Ramana Radhakrishnan
>>  wrote:
>>> On 04/14/14 23:16, Christophe Lyon wrote:

 Hi Ramana,

 Here is an updated version of my proposal to include tests for Neon
 intrinsics.

 wrt to my previous post, I have made a few changes:
 - renamed the test files, removing the "ref_" prefix.
 - removed the TEST_ prefix on some initialization macros
 - use the c-torture framework

 I have run it successfully on the following configurations:
  aarch64-none-linux-gnu
  aarch64-none-elf
  aarch64_be-none-elf
  arm-none-linux-gnueabihf
  armeb-none-linux-gnueabihf
  arm-none-linux-gnueabi
  armeb-none-linux-gnueabi
  arm-none-eabi
 using qemu for most of them and the Foundation Model for aarch64*elf
>>>
>>> I had a brief look at your patch and how does this run for AArch64 when
>>> you have such options in the testsuite ?
>>>
>>>
>>> +++ b/gcc/testsuite/gcc.target/arm/neon-intrinsics/vaba.c
>>>
>>> @@ -0,0 +1,145 @@
>>> +/* { dg-do run } */
>>> +/* { dg-require-effective-target arm_neon_hw { target { "arm* } } } */
>>>
>>> +/* { dg-add-options arm_neon } */
>>> +
>>>
>>
>> Good catch... in fact these lines are ignored when using c-torture, I
>> just forgot to clean them up.
>>
>>> Additionally a README would help in terms of how one should add new tests.
>> OK
>>
 Any comments?

 Thanks,

 Christophe.


 On 29 October 2013 19:09, Christophe Lyon 
 wrote:
>
> On 29 October 2013 03:24, Ramana Radhakrishnan  wrote:
>>
>> On 10/09/13 23:16, Christophe Lyon wrote:
>
>
>> Irrespective of our earlier conversations on this now I'm actually
>> wondering
>> if instead of doing this and integrating this in the GCC source base it
>> maybe easier to write a harness to test this cross on qemu or natively.
>> Additionally setting up an auto-tester to do this might be a more
>> productive
>> use of time rather than manually dejagnuizing this which appears to be a
>> tedious and slow process.
>
>
> This would be easy to set up, since the Makefile on gitorious is
> already targeting qemu. I used it occasionally on boards with
> minimal changes.
> This just means we'd have to agree on how to set up such an
> auto-tester, where do we send the results to, etc...
>>>
>>> If you are sufficiently motivated to do the transition, I'm not opposed
>>> to putting it into the testsuite as a basic regression testing framework
>>> for neon intrinsics.
>>>
>> I would really like to have all this converge to a good solution, so
>> yes I want to convert the whole testsuite to dejagnu.
>> I just want that we agree on the format before proceeding with the
>> other tests, that's why I've just posted a subset, hopefully
>> representative enough but easier to review.
>>
>>> I'll try and play with this in some more detail with a couple of patches
>>> I'm doing in the area of neon intrinsics so it may be useful to cross check.
>>
>> OK let me know if you have further comments.
>>
>> As of now I understand that you are OK with this patch, modulo the
>> removal of the 3 dg-* lines, correct?
>>
>>
>> Thanks,
>>
>> Christophe.
>>
>>>
>>> regards
>>> Ramana
>>>
>>>
>
>>> I'd like your feedback before continuing, as there are a lot more
>>> files to come.
>>>
>>> I have made some cleanup to help review, but the two .h files will
>>> need to grow as more intrinsics will be added (see the original ones).
>>
>>
>> Which one should I compare this with in terms of the original file ?
>
>
> I have kept the same file names.
>
>
>>> I'd like to keep the modifications at a minimal level, to save my time
>>> when adapting each test (there are currently 145 test files, so 143
>>> left:-).
>>
>>
>>
>> On to the patch itself.
>>
>> The prefix TEST_ seems a bit misleading in that it suggests this is
>> testing
>> something when in reality this is initializing stuff.
>
> In fact, TEST_ executes the  intrinsics, and copies the
> results to memory when relevant. But I can easily change TEST_ to
> something else.
>
> So in the sample

Re: [PATCH] Cleanup do_per_function, require less push/pop_cfun

2014-04-28 Thread Richard Biener

On Wed, 23 Apr 2014, Richard Biener wrote:

> 
> This avoids all the complex work on simple things like
> clear_last_verified.  It also makes eventually inlining all
> calls (for example the one with the small IPA pass hack)
> less code-duplicating.
> 
> I had to remove the asserts in favor of frees of DOM info in 
> release_function_body as the old code released DOM info in
> various odd places.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> Honza, does this look ok to you?

I've heard nothing back so I assume it's ok and committed it.
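
A toy sketch of the callback-signature change, not GCC code: every name below is hypothetical. The first iterator mimics the old shape, where the callback receives only its data pointer and finds the current function through a global that the iterator pushes and pops. The second passes the function explicitly, so simple callbacks need no global state.

```c
#include <stddef.h>

struct fn { int body_size; };

/* Stand-in for the global "current function" pointer.  */
static struct fn *current_fn;

/* Old shape: set the global around each callback invocation.  */
void for_each_fn_global(struct fn *fns, size_t n,
                        void (*cb)(void *), void *data)
{
  for (size_t i = 0; i < n; i++) {
    struct fn *saved = current_fn;   /* "push" */
    current_fn = &fns[i];
    cb(data);
    current_fn = saved;              /* "pop" */
  }
}

/* New shape: the function is an explicit argument.  */
void for_each_fn(struct fn *fns, size_t n,
                 void (*cb)(struct fn *, void *), void *data)
{
  for (size_t i = 0; i < n; i++)
    cb(&fns[i], data);
}

/* Example callbacks that accumulate body sizes into *(int *) DATA.  */
void add_size_global(void *data) { *(int *) data += current_fn->body_size; }
void add_size(struct fn *f, void *data) { *(int *) data += f->body_size; }
```

With the explicit form, a callback that does need the global context can still establish it itself, which is roughly what pushing the work into the callbacks means here.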

Richard.

> Thanks,
> Richard.
> 
> 2014-04-23  Richard Biener  
> 
>   * tree-pass.h (execute_pass_list): Adjust prototype.
>   * passes.c (pass_manager::execute_early_local_passes):
>   Adjust.
>   (do_per_function): Change callback signature, push all actual
>   work to the callbacks.
>   (do_per_function_toporder): Likewise.
>   (execute_function_dump): Adjust.
>   (execute_function_todo): Likewise.
>   (clear_last_verified): Likewise.
>   (verify_curr_properties): Likewise.
>   (update_properties_after_pass): Likewise.
>   (apply_ipa_transforms): Likewise.
>   (execute_pass_list_1): Split out from ...
>   (execute_pass_list): ... here.  Adjust.
>   (execute_ipa_pass_list): Likewise.
>   * cgraphunit.c (cgraph_add_new_function): Adjust.
>   (analyze_function): Likewise.
>   (expand_function): Likewise.
>   * cgraph.c (release_function_body): Free dominance info
>   here instead of asserting it was magically freed elsewhere.
> 
> Index: gcc/tree-pass.h
> ===
> *** gcc/tree-pass.h.orig  2014-04-23 14:55:25.640624814 +0200
> --- gcc/tree-pass.h   2014-04-23 15:40:56.443436802 +0200
> *** extern gimple_opt_pass *make_pass_conver
> *** 587,593 
>   extern opt_pass *current_pass;
>   
>   extern bool execute_one_pass (opt_pass *);
> ! extern void execute_pass_list (opt_pass *);
>   extern void execute_ipa_pass_list (opt_pass *);
>   extern void execute_ipa_summary_passes (ipa_opt_pass_d *);
>   extern void execute_all_ipa_transforms (void);
> --- 587,593 
>   extern opt_pass *current_pass;
>   
>   extern bool execute_one_pass (opt_pass *);
> ! extern void execute_pass_list (function *, opt_pass *);
>   extern void execute_ipa_pass_list (opt_pass *);
>   extern void execute_ipa_summary_passes (ipa_opt_pass_d *);
>   extern void execute_all_ipa_transforms (void);
> *** extern bool function_called_by_processed
> *** 615,621 
>   extern bool first_pass_instance;
>   
>   /* Declare for plugins.  */
> ! extern void do_per_function_toporder (void (*) (void *), void *);
>   
>   extern void disable_pass (const char *);
>   extern void enable_pass (const char *);
> --- 615,621 
>   extern bool first_pass_instance;
>   
>   /* Declare for plugins.  */
> ! extern void do_per_function_toporder (void (*) (function *, void *), void 
> *);
>   
>   extern void disable_pass (const char *);
>   extern void enable_pass (const char *);
> Index: gcc/passes.c
> ===
> *** gcc/passes.c.orig 2014-04-23 14:55:25.642624814 +0200
> --- gcc/passes.c  2014-04-23 15:41:02.414436391 +0200
> *** opt_pass::opt_pass (const pass_data &dat
> *** 132,138 
>   void
>   pass_manager::execute_early_local_passes ()
>   {
> !   execute_pass_list (pass_early_local_passes_1->sub);
>   }
>   
>   unsigned int
> --- 132,138 
>   void
>   pass_manager::execute_early_local_passes ()
>   {
> !   execute_pass_list (cfun, pass_early_local_passes_1->sub);
>   }
>   
>   unsigned int
> *** pass_manager::pass_manager (context *ctx
> *** 1498,1524 
>  call CALLBACK on the current function.  */
>   
>   static void
> ! do_per_function (void (*callback) (void *data), void *data)
>   {
> if (current_function_decl)
> ! callback (data);
> else
>   {
> struct cgraph_node *node;
> FOR_EACH_DEFINED_FUNCTION (node)
>   if (node->analyzed && gimple_has_body_p (node->decl)
>   && (!node->clone_of || node->decl != node->clone_of->decl))
> !   {
> ! push_cfun (DECL_STRUCT_FUNCTION (node->decl));
> ! callback (data);
> ! if (!flag_wpa)
> !   {
> ! free_dominance_info (CDI_DOMINATORS);
> ! free_dominance_info (CDI_POST_DOMINATORS);
> !   }
> ! pop_cfun ();
> ! ggc_collect ();
> !   }
>   }
>   }
>   
> --- 1498,1514 
>  call CALLBACK on the current function.  */
>   
>   static void
> ! do_per_function (void (*callback) (function *, void *data), void *data)
>   {
> if (current_function_decl)
> ! callback (cfun, data);
> else
>   {
> struct cgraph_node *node;
> FOR_EACH_DEFINED_FUNCTION (node)
>   if (node->analyzed && gimple_has_body_p (node->decl)
>

[PATCH] Improve VRP

2014-04-28 Thread Richard Biener


This improves VRP of induction variables tested against zero
and handles overflow detection in a less awkward way.  Instead of
dropping to +-INF on iteration, it drops to +INF-1 or -INF+1 and
lets the next iteration figure out the overflow status.
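
A toy model of the widening step, assuming plain int bounds with INT_MIN and INT_MAX standing in for -INF and +INF. The struct and function are hypothetical; the minimum handling follows the new comment in vrp_visit_phi_node, and the maximum handling is assumed to be symmetric.

```c
#include <limits.h>

struct range { int min, max; };

/* Combine the previous range OLD with the newly computed range NEW_R.
   A bound that moved inward keeps its old value (to avoid bouncing);
   a bound that moved outward jumps to one short of the extreme, so
   that the following iteration can decide whether overflow occurs.  */
struct range widen(struct range old, struct range new_r)
{
  struct range r = new_r;

  if (new_r.min > old.min)
    r.min = old.min;
  else if (new_r.min < old.min && new_r.min != INT_MIN)
    r.min = INT_MIN + 1;

  if (new_r.max < old.max)
    r.max = old.max;
  else if (new_r.max > old.max && new_r.max != INT_MAX)
    r.max = INT_MAX - 1;

  return r;
}
```

For example, widening [0, 10] with a new estimate of [0, 11] gives [0, INT_MAX - 1] in this model, at the cost of one more iteration to settle the final bound.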

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2014-04-28  Richard Biener  

* tree-vrp.c (vrp_var_may_overflow): Remove.
(vrp_visit_phi_node): Rather than bumping to +-INF possibly
with overflow immediately bump to one before that value and
let iteration figure out overflow status.

* gcc.dg/tree-ssa/vrp91.c: New testcase.
* gcc.dg/Wstrict-overflow-14.c: XFAIL.
* gcc.dg/Wstrict-overflow-15.c: Likewise.
* gcc.dg/Wstrict-overflow-18.c: Remove XFAIL.

Index: gcc/tree-vrp.c
===
*** gcc/tree-vrp.c.orig 2014-04-25 11:53:37.930478725 +0200
--- gcc/tree-vrp.c  2014-04-28 13:05:03.952338031 +0200
*** adjust_range_with_scev (value_range_t *v
*** 4026,4077 
  }
  }
  
- /* Return true if VAR may overflow at STMT.  This checks any available
-loop information to see if we can determine that VAR does not
-overflow.  */
- 
- static bool
- vrp_var_may_overflow (tree var, gimple stmt)
- {
-   struct loop *l;
-   tree chrec, init, step;
- 
-   if (current_loops == NULL)
- return true;
- 
-   l = loop_containing_stmt (stmt);
-   if (l == NULL
-   || !loop_outer (l))
- return true;
- 
-   chrec = instantiate_parameters (l, analyze_scalar_evolution (l, var));
-   if (TREE_CODE (chrec) != POLYNOMIAL_CHREC)
- return true;
- 
-   init = initial_condition_in_loop_num (chrec, l->num);
-   step = evolution_part_in_loop_num (chrec, l->num);
- 
-   if (step == NULL_TREE
-   || !is_gimple_min_invariant (step)
-   || !valid_value_p (init))
- return true;
- 
-   /* If we get here, we know something useful about VAR based on the
-  loop information.  If it wraps, it may overflow.  */
- 
-   if (scev_probably_wraps_p (init, step, stmt, get_chrec_loop (chrec),
-true))
- return true;
- 
-   if (dump_file && (dump_flags & TDF_DETAILS) != 0)
- {
-   print_generic_expr (dump_file, var, 0);
-   fprintf (dump_file, ": loop information indicates does not overflow\n");
- }
- 
-   return false;
- }
- 
  
  /* Given two numeric value ranges VR0, VR1 and a comparison code COMP:
  
--- 4026,4031 
*** vrp_visit_phi_node (gimple phi)
*** 8453,8484 
  && (cmp_min != 0 || cmp_max != 0))
goto varying;
  
!   /* If the new minimum is smaller or larger than the previous
!one, go all the way to -INF.  In the first case, to avoid
!iterating millions of times to reach -INF, and in the
!other case to avoid infinite bouncing between different
!minimums.  */
!   if (cmp_min > 0 || cmp_min < 0)
!   {
! if (!needs_overflow_infinity (TREE_TYPE (vr_result.min))
! || !vrp_var_may_overflow (lhs, phi))
!   vr_result.min = TYPE_MIN_VALUE (TREE_TYPE (vr_result.min));
! else if (supports_overflow_infinity (TREE_TYPE (vr_result.min)))
!   vr_result.min =
!   negative_overflow_infinity (TREE_TYPE (vr_result.min));
!   }
! 
!   /* Similarly, if the new maximum is smaller or larger than
!the previous one, go all the way to +INF.  */
!   if (cmp_max < 0 || cmp_max > 0)
!   {
! if (!needs_overflow_infinity (TREE_TYPE (vr_result.max))
! || !vrp_var_may_overflow (lhs, phi))
!   vr_result.max = TYPE_MAX_VALUE (TREE_TYPE (vr_result.max));
! else if (supports_overflow_infinity (TREE_TYPE (vr_result.max)))
!   vr_result.max =
!   positive_overflow_infinity (TREE_TYPE (vr_result.max));
!   }
  
/* If we dropped either bound to +-INF then if this is a loop
 PHI node SCEV may known more about its value-range.  */
--- 8407,8438 
  && (cmp_min != 0 || cmp_max != 0))
goto varying;
  
!   /* If the new minimum is larger than than the previous one
!retain the old value.  If the new minimum value is smaller
!than the previous one and not -INF go all the way to -INF + 1.
!In the first case, to avoid infinite bouncing between different
!minimums, and in the other case to avoid iterating millions of
!times to reach -INF.  Going to -INF + 1 also lets the following
!iteration compute whether there will be any overflow, at the
!expense of one additional iteration.  */
!   if (cmp_min < 0)
!   vr_result.min = lhs_vr->min;
!   else if (cmp_min > 0
!  && !vrp_val_is_min (vr_result.min))
!   vr_result.min
! = int_const_binop (PLUS_EXPR,
!vrp_val_min (TREE_TYPE (vr_result.min)),
!build_int_cst (TREE_TYPE (v

[PATCH][testsuite] Fix gcc.dg/pr60114.c on arm/aarch64

2014-04-28 Thread Kyrill Tkachov


Hi all,

I noticed this test is failing on aarch64:

FAIL: gcc.dg/pr60114.c  (test for warnings, line 7)
FAIL: gcc.dg/pr60114.c  (test for warnings, line 8)
FAIL: gcc.dg/pr60114.c  (test for warnings, line 21)
FAIL: gcc.dg/pr60114.c  (test for warnings, line 22)
FAIL: gcc.dg/pr60114.c  (test for warnings, line 23)
FAIL: gcc.dg/pr60114.c  (test for warnings, line 25)
FAIL: gcc.dg/pr60114.c (test for excess errors)

The test was recently added with 
http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00592.html


The offending code is of the form:


const char z[] = {
  [0] = 0x100, /* { dg-warning "9:overflow in implicit constant conversion" } */
  [2] = 0x101, /* { dg-warning "9:overflow in implicit constant conversion" } */
};


On aarch64 (and arm) chars are unsigned by default so instead we get the warning 
"large integer implicitly truncated to unsigned type".


This patch explicitly uses signed chars in the test as suggested by richi in 
the PR.
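
A small sketch of the signedness difference, using only <limits.h>. The helper names are hypothetical. Plain char is signed on x86_64 but unsigned by default on arm and aarch64, while signed char is signed everywhere, which is why spelling it out makes the expected warning text target-independent.

```c
#include <limits.h>

/* Nonzero when plain char is a signed type on this target.  */
int plain_char_is_signed(void)
{
  return CHAR_MIN < 0;
}

/* signed char is signed on every target, so this is always nonzero.  */
int signed_char_is_signed(void)
{
  return SCHAR_MIN < 0;
}
```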

Ok for trunk?

Thanks,
Kyrill

2014-04-28  Kyrylo Tkachov  

PR c/60983
* gcc.dg/pr60114.c: Use signed chars.

diff --git a/gcc/testsuite/gcc.dg/pr60114.c b/gcc/testsuite/gcc.dg/pr60114.c
index 83f9852..c656a95 100644
--- a/gcc/testsuite/gcc.dg/pr60114.c
+++ b/gcc/testsuite/gcc.dg/pr60114.c
@@ -3,7 +3,7 @@
 /* { dg-options "-Wconversion" } */
 
 struct S { int n, u[2]; };
-const char z[] = {
+const signed char z[] = {
   [0] = 0x100, /* { dg-warning "9:overflow in implicit constant conversion" } */
   [2] = 0x101, /* { dg-warning "9:overflow in implicit constant conversion" } */
 };
@@ -18,11 +18,11 @@ typedef int H[];
 void
 foo (void)
 {
-  char a[][3] = { { 0x100, /* { dg-warning "21:overflow in implicit constant conversion" } */
+  signed char a[][3] = { { 0x100, /* { dg-warning "28:overflow in implicit constant conversion" } */
 1, 0x100 }, /* { dg-warning "24:overflow in implicit constant conversion" } */
   { '\0', 0x100, '\0' } /* { dg-warning "27:overflow in implicit constant conversion" } */
 };
-  (const char []) { 0x100 }; /* { dg-warning "21:overflow in implicit constant conversion" } */
+  (const signed char []) { 0x100 }; /* { dg-warning "28:overflow in implicit constant conversion" } */
   (const float []) { 1e0, 1e1, 1e100 }; /* { dg-warning "32:conversion" } */
   struct S s1 = { 0x8000 }; /* { dg-warning "19:conversion of unsigned constant value to negative integer" } */
   struct S s2 = { .n = 0x8000 }; /* { dg-warning "24:conversion of unsigned constant value to negative integer" } */

Ping2: [PATCH] PR debug/16063. Add DW_AT_type to DW_TAG_enumeration.

2014-04-28 Thread Mark Wielaard

On Tue, 2014-04-22 at 12:31 +0200, Mark Wielaard wrote:
> On Mon, 2014-04-14 at 23:19 +0200, Mark Wielaard wrote:
> > On Fri, 2014-04-11 at 11:03 -0700, Cary Coutant wrote:
> > > >> The DWARF bits are fine with me.
> > > >
> > > > Thanks. Who can approve the other bits?
> > > 
> > > You should probably get C and C++ front end approval. I'm not really
> > > sure who needs to review patches in c-family/. Since the part in c/ is
> > > so tiny, maybe all you need is a C++ front end maintainer. Both
> > > Richard Henderson and Jason Merrill are global reviewers, so either of
> > > them could approve the whole thing.
> > 
> > Thanks, I added them to the CC.
> > 
> > > > When approved should I wait till stage 1 opens before committing?
> > > 
> > > Yes. The PR you're fixing is an enhancement request, not a regression,
> > > so it needs to wait.
> > 
> > Since stage one just opened up again this seems a good time to re-ask
> > for approval then :) Rebased patch against current trunk attached.
> 
> Ping. Tom already pushed his patches to GDB that take advantage of the
> new information if available.

Ping2. Please let me know if I should ping/cc other people to get this
reviewed.

Thanks,

Mark
commit 603223a974054aa52512ca08f36f1550692240e5
Author: Mark Wielaard 
Date:   Sun Mar 23 12:05:16 2014 +0100

PR debug/16063. Add DW_AT_type to DW_TAG_enumeration.

Add a new lang-hook that provides the underlying base type of an
ENUMERAL_TYPE. Including implementations for C and C++. Use this
enum_underlying_base_type lang-hook in dwarf2out.c to add a DW_AT_type
base type reference to a DW_TAG_enumeration.

gcc/
	* dwarf2out.c (gen_enumeration_type_die): Add DW_AT_type if
	enum_underlying_base_type defined and DWARF version > 3.
	* langhooks.h (struct lang_hooks_for_types): Add
	enum_underlying_base_type.
	* langhooks-def.h (LANG_HOOKS_ENUM_UNDERLYING_BASE_TYPE): New define.
	(LANG_HOOKS_FOR_TYPES_INITIALIZER): Add new lang hook.

gcc/c-family/
	* c-common.c (c_enum_underlying_base_type): New function.
	* c-common.h (c_enum_underlying_base_type): Add declaration.

gcc/c/
	* c-objc-common.h (LANG_HOOKS_ENUM_UNDERLYING_BASE_TYPE): Define.

gcc/cp/
	* cp-lang.c (cxx_enum_underlying_base_type): New function.
	(LANG_HOOKS_ENUM_UNDERLYING_BASE_TYPE): Define.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b25f1f6..766e0e7 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2014-03-21  Mark Wielaard  
+
+	PR debug/16063
+	* dwarf2out.c (gen_enumeration_type_die): Add DW_AT_type if
+	enum_underlying_base_type defined and DWARF version > 3.
+	* langhooks.h (struct lang_hooks_for_types): Add
+	enum_underlying_base_type.
+	* langhooks-def.h (LANG_HOOKS_ENUM_UNDERLYING_BASE_TYPE): New define.
+	(LANG_HOOKS_FOR_TYPES_INITIALIZER): Add new lang hook.
+
 2014-04-27  Richard Sandiford  
 
 	* cselib.c (find_slot_memmode): Delete.
diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
index fb0d102..e652c1b 100644
--- a/gcc/c-family/ChangeLog
+++ b/gcc/c-family/ChangeLog
@@ -1,3 +1,9 @@
+2014-03-21  Mark Wielaard  
+
+	PR debug/16063
+	* c-common.c (c_enum_underlying_base_type): New function.
+	* c-common.h (c_enum_underlying_base_type): Add declaration.
+
 2014-04-25  Marek Polacek  
 
 	PR c/18079
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 0ad955d..6862c6f 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -3906,6 +3906,14 @@ c_register_builtin_type (tree type, const char* name)
 
   registered_builtin_types = tree_cons (0, type, registered_builtin_types);
 }
+
+/* The C version of the enum_underlying_base_type langhook.  */
+tree
+c_enum_underlying_base_type (const_tree type)
+{
+  return c_common_type_for_size (TYPE_PRECISION (type), TYPE_UNSIGNED (type));
+}
+
 
 /* Print an error message for invalid operands to arith operation
CODE with TYPE0 for operand 0, and TYPE1 for operand 1.
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 57b7dce..25c3272 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -832,6 +832,7 @@ extern void c_common_finish (void);
 extern void c_common_parse_file (void);
 extern alias_set_type c_common_get_alias_set (tree);
 extern void c_register_builtin_type (tree, const char*);
+extern tree c_enum_underlying_base_type (const_tree);
 extern bool c_promoting_integer_type_p (const_tree);
 extern int self_promoting_args_p (const_tree);
 extern tree strip_pointer_operator (tree);
diff --git a/gcc/c/ChangeLog b/gcc/c/ChangeLog
index 80841af..1eb3782 100644
--- a/gcc/c/ChangeLog
+++ b/gcc/c/ChangeLog
@@ -1,3 +1,8 @@
+2014-03-21  Mark Wielaard  
+
+	PR debug/16063
+	* c-objc-common.h (LANG_HOOKS_ENUM_UNDERLYING_BASE_TYPE): Define.
+
 2014-04-25  Marek Polacek  
 
 	PR c/18079
diff --git a/gcc/c/c-objc-common.h b/gcc/c/c-objc-common.h
index 92cf60f..0651db7 100644
--- a/gcc/c/c-objc-common.h
+++ b/gcc/c/c-objc-commo

Re: [PATCH] Detect a pack-unpack pattern in GCC vectorizer and optimize it.

2014-04-28 Thread Richard Biener

On Thu, 24 Apr 2014, Cong Hou wrote:

> Given the following loop:
> 
> int a[N];
> short b[N*2];
> 
> for (int i = 0; i < N; ++i)
>   a[i] = b[i*2];
> 
> 
> After being vectorized, the access to b[i*2] will be compiled into
> several packing statements, while the type promotion from short to int
> will be compiled into several unpacking statements. With this patch,
> each pair of pack/unpack statements will be replaced by less expensive
> statements (with shift or bit-and operations).
> 
> On x86_64, the loop above will be compiled into the following assembly
> (with -O2 -ftree-vectorize):
> 
> movdqu 0x10(%rcx),%xmm3
> movdqu -0x20(%rcx),%xmm0
> movdqa %xmm0,%xmm2
> punpcklwd %xmm3,%xmm0
> punpckhwd %xmm3,%xmm2
> movdqa %xmm0,%xmm3
> punpcklwd %xmm2,%xmm0
> punpckhwd %xmm2,%xmm3
> movdqa %xmm1,%xmm2
> punpcklwd %xmm3,%xmm0
> pcmpgtw %xmm0,%xmm2
> movdqa %xmm0,%xmm3
> punpckhwd %xmm2,%xmm0
> punpcklwd %xmm2,%xmm3
> movups %xmm0,-0x10(%rdx)
> movups %xmm3,-0x20(%rdx)
> 
> 
> With this patch, the generated assembly is shown below:
> 
> movdqu 0x10(%rcx),%xmm0
> movdqu -0x20(%rcx),%xmm1
> pslld  $0x10,%xmm0
> psrad  $0x10,%xmm0
> pslld  $0x10,%xmm1
> movups %xmm0,-0x10(%rdx)
> psrad  $0x10,%xmm1
> movups %xmm1,-0x20(%rdx)
> 
> 
> Bootstrapped and tested on x86-64. OK for trunk?

This is an odd place to implement such a transform.  Also, whether
it is faster depends on the exact ISA you target - for
example ppc has constraints on the maximum number of shifts
carried out in parallel and the above has 4 in very short
succession.  Especially for the sign-extend path.

So this looks more like an opportunity for a post-vectorizer
transform on RTL or for the vectorizer special-casing
widening loads with a vectorizer pattern.
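
A scalar sketch of the shift-based replacement under discussion, one 32-bit lane at a time. The function names are hypothetical. It assumes that converting an out-of-range unsigned value to int32_t wraps and that >> on a negative int32_t is an arithmetic shift, both of which GCC documents but ISO C leaves implementation-defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference: a[i] = b[i*2] with the usual short-to-int promotion.  */
void widen_even_reference(int32_t *a, const int16_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = b[i * 2];
}

/* Same result via shifts: view each pair of shorts as one 32-bit
   word with the even element in the low half, shift it into the high
   half, then shift back arithmetically to sign-extend (the pslld /
   psrad by 16 pair in the quoted assembly).  */
void widen_even_shifts(int32_t *a, const int16_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    uint32_t word = (uint32_t) (uint16_t) b[i * 2]
                    | ((uint32_t) (uint16_t) b[i * 2 + 1] << 16);
    a[i] = (int32_t) (word << 16) >> 16;
  }
}
```

Each output element costs two shifts in this form, which is the shift pressure the reply is concerned about on some targets.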

Richard.

Re: [RFC][AARCH64] TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook

2014-04-28 Thread Ramana Radhakrishnan


On 04/26/14 11:57, Kugan wrote:

Attached patch implements TARGET_ATOMIC_ASSIGN_EXPAND_FENV for AARCH64.
With this, the atomic test-case gcc.dg/atomic/c11-atomic-exec-5.c now PASSes.

This implementation is based on SPARC and i386 implementations.

Regression tested on qemu-aarch64 for aarch64-none-linux-gnu with no new
regression. Is this OK for trunk?


Again, as with A32, please test on hardware to make sure this behaves 
correctly with c11-atomic-exec-5.c.


If you don't have access to hardware, let us know: we'll take it for a 
spin once you update the patch according to Marcus's comments.


regards
Ramana



Thanks,
Kugan

gcc/
+2014-04-27  Kugan Vivekanandarajah  
+
+   * config/aarch64/aarch64.c (TARGET_ATOMIC_ASSIGN_EXPAND_FENV): New
+   define.
+   * config/aarch64/aarch64-builtins.c (arm_builtins) : Add
+   AARCH64_BUILTIN_LDFPSCR and AARCH64_BUILTIN_STFPSCR.
+   (aarch64_init_builtins) : Initialize builtins
+   __builtins_aarch64_stfpscr and __builtins_aarch64_ldfpscr.
+   (aarch64_expand_builtin) : Expand builtins __builtins_aarch64_stfpscr
+   and __builtins_aarch64_ldfpscr.
+   (aarch64_atomic_assign_expand_fenv): New function.
+   * config/aarch64/aarch64.md (stfpscr): New pattern.
+   (ldfpscr) : Likewise.
+   (unspecv): Add UNSPECV_LDFPSCR and UNSPECV_STFPSCR.
+







--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.

Re: [Patch ARM 3/3] Neon intrinsics TLC - Remove unneeded ML from backend.

2014-04-28 Thread Ramana Radhakrishnan


On 04/28/14 11:48, Ramana Radhakrishnan wrote:

Patch 3/3 removes the ML to generate Neon intrinsics and the
documentation and updates the comments in the files to show that these
are now hand crafted rather than auto-generated. We've had these for
many years now and I think it's time we got rid of this. Not everyone
groks ML and it doesn't help that only one or 2 folks can actually do
this properly every time. Instead of having these bottlenecks and given
the fact that the intrinsics are pretty stable now, there's no point in
retaining the generator interface. I'd rather get rid of them. The only
bit left is neon-schedgen.ml, neon.ml and neon-testgen.ml. I think we
can safely remove neon-testgen.ml once Christophe's testsuite is done
and we'll probably just have to carry neon-schedgen.ml / neon.ml as it
still generates the neon descriptions for both a8 and a9.


James just pointed out that (my memory was wrong or I must have been 
looking in the wrong directory) he killed neon-schedgen.ml last year. So 
neon.ml can go as well once neon-testgen.ml dies.


regards
Ramana






  Ramana Radhakrishnan  

* config/arm/arm_neon.h: Update comment.
* config/arm/neon-docgen.ml: Delete.
* config/arm/neon-gen.ml: Delete.
* doc/arm-neon-intrinsics.texi: Update comment.




--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.

[Patch ARM 3/3] Neon intrinsics TLC - Remove unneeded ML from backend.

2014-04-28 Thread Ramana Radhakrishnan


Patch 3/3 removes the ML to generate Neon intrinsics and the
documentation and updates the comments in the files to show that these
are now hand crafted rather than auto-generated. We've had these for
many years now and I think it's time we got rid of this. Not everyone
groks ML and it doesn't help that only one or 2 folks can actually do
this properly every time. Instead of having these bottlenecks and given
the fact that the intrinsics are pretty stable now, there's no point in
retaining the generator interface. I'd rather get rid of them. The only
bit left is neon-schedgen.ml, neon.ml and neon-testgen.ml. I think we
can safely remove neon-testgen.ml once Christophe's testsuite is done
and we'll probably just have to carry neon-schedgen.ml / neon.ml as it
still generates the neon descriptions for both a8 and a9.


  Ramana Radhakrishnan  

* config/arm/arm_neon.h: Update comment.
* config/arm/neon-docgen.ml: Delete.
* config/arm/neon-gen.ml: Delete.
* doc/arm-neon-intrinsics.texi: Update comment.

--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.

From 9382d4c22ceb555fc74d8c90c75e6ce47faaffe0 Mon Sep 17 00:00:00 2001
From: Ramana Radhakrishnan 
Date: Thu, 24 Apr 2014 10:11:48 +0100
Subject: [PATCH 3/3]We have now reached the point where both neon-gen.ml
 and neon-docgen.ml are obsolete and are a pain to maintain for a number
 of bespoke handcrafted changes to arm_neon.h.

Given this there is no point in keeping this further in the source tree.

neon-testgen.ml is on its last legs and if clyon's work in getting
the neon execute tests in is completed, we will remove all of
gcc.target/arm/neon and neon-testgen.ml.

Ramana
---
 gcc/config/arm/arm_neon.h|   3 +-
 gcc/config/arm/neon-docgen.ml| 424 ---
 gcc/config/arm/neon-gen.ml   | 520 ---
 gcc/doc/arm-neon-intrinsics.texi |   2 -
 4 files changed, 1 insertion(+), 948 deletions(-)
 delete mode 100644 gcc/config/arm/neon-docgen.ml
 delete mode 100644 gcc/config/arm/neon-gen.ml

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index e146369..564e46b 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -1,5 +1,4 @@
-/* ARM NEON intrinsics include file. This file is generated automatically
-   using neon-gen.ml.  Please do not edit manually.
+/* ARM NEON intrinsics include file.
 
Copyright (C) 2006-2014 Free Software Foundation, Inc.
Contributed by CodeSourcery.
diff --git a/gcc/config/arm/neon-docgen.ml b/gcc/config/arm/neon-docgen.ml
deleted file mode 100644
index 5788a53..000
--- a/gcc/config/arm/neon-docgen.ml
+++ /dev/null
@@ -1,424 +0,0 @@
-(* ARM NEON documentation generator.
-
-   Copyright (C) 2006-2014 Free Software Foundation, Inc.
-   Contributed by CodeSourcery.
-
-   This file is part of GCC.
-
-   GCC is free software; you can redistribute it and/or modify it under
-   the terms of the GNU General Public License as published by the Free
-   Software Foundation; either version 3, or (at your option) any later
-   version.
-
-   GCC is distributed in the hope that it will be useful, but WITHOUT ANY
-   WARRANTY; without even the implied warranty of MERCHANTABILITY or
-   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
-   for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with GCC; see the file COPYING3.  If not see
-   .
-
-   This is an O'Caml program.  The O'Caml compiler is available from:
-
- http://caml.inria.fr/
-
-   Or from your favourite OS's friendly packaging system. Tested with version
-   3.09.2, though other versions will probably work too.
-
-   Compile with:
- ocamlc -c neon.ml
- ocamlc -o neon-docgen neon.cmo neon-docgen.ml
-
-   Run with:
- /path/to/neon-docgen /path/to/gcc/doc/arm-neon-intrinsics.texi
-*)
-
-open Neon
-
-(* The combined "ops" and "reinterp" table.  *)
-let ops_reinterp = reinterp @ ops
-
-(* Helper functions for extracting things from the "ops" table.  *)
-let single_opcode desired_opcode () =
-  List.fold_left (fun got_so_far ->
-  fun row ->
-match row with
-  (opcode, _, _, _, _, _) ->
-if opcode = desired_opcode then row :: got_so_far
-   else got_so_far
- ) [] ops_reinterp
-
-let multiple_opcodes desired_opcodes () =
-  List.fold_left (fun got_so_far ->
-  fun desired_opcode ->
-(single_opcode desired_opcode ()) @ got_so_far)
- [] desired_opcodes
-
-let ldx_opcode number () =
-  List.fold_left (fun got_so_far ->
-  fun row ->
-match row with
-  (opcode, _, _, _, _, _) ->
-match opcode with
-  Vldx n | Vldx

Re: -fuse-caller-save - Enable for MIPS

2014-04-28 Thread Tom de Vries


On 28-04-14 12:26, Richard Sandiford wrote:

Tom de Vries  writes:

On 27-04-14 12:27, Richard Sandiford wrote:

Tom de Vries  writes:

   mips_emit_call_insn (rtx pattern, rtx orig_addr, rtx addr, bool lazy_p)
   {
 rtx insn, reg;

-  insn = emit_call_insn (pattern);
+  emit_call_insn (pattern);
+  insn = last_call_insn ();

 if (TARGET_MIPS16 && mips_use_pic_fn_addr_reg_p (orig_addr))
   {


This change isn't necessary; emit_call_insn is defined to return a CALL_INSN.



I dropped this change, as well as the change in the untyped_call expand, I
realized it's unnecessary.


Why was the untyped_call part unnecessary?



The define_expand "untyped_call" uses GEN_CALL, which uses define_expand "call", 
which uses mips_expand_call, which uses mips_emit_call_insn, which adds the 
required clobbers.



I'm a bit surprised that it doesn't work at -O1 for a simple test
like this though.  What goes wrong?



AFAIU now the problem is that the optimization doesn't trigger for -O0
and -O1, because the register allocator behaves more conservatively.


Hmm, is that just because -fcaller-saves is -O2 and above?  If so,
should -fuse-caller-save imply -fcaller-saves?

Thanks,
Richard

[Patch ARM 2/3] Remove dead code from backend.

2014-04-28 Thread Ramana Radhakrishnan


This then left us in the happy position of being able to delete code
but I was worried about LTO streaming as these "builtins" are
essentially streamed out in LTO object code format. However since we
make no promises about LTO compatibility across releases, that's safe
but I structured the dead code elimination as Patch 2/3.

This will be committed separately in case folks want to backport Patch 
1/3 separately and want to assure their users of LTO compatibility 
within a release branch (if that even works) .




  Ramana Radhakrishnan  

* config/arm/arm_neon_builtins.def (vadd, vsub): Only define 
the v2sf and v4sf versions.

  (vand, vorr, veor, vorn, vbic): Remove.
* config/arm/neon.md (neon_vadd, neon_vsub, neon_vadd_unspec, 
neon_vsub_unspec): Adjust iterator.

  (neon_vorr, neon_vand, neon_vbic, neon_veor, neon_vorn): Remove.

--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.

commit dad8586bd8c799ad26b0c7ee6e1837b50b9ef9a3
Author: Ramana Radhakrishnan 
Date:   Thu Apr 24 16:00:08 2014 +0100

Remove Dead code.

diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index a00951a..85215b5 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -18,8 +18,7 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
-VAR10 (BINOP, vadd,
-   v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
+VAR2 (BINOP, vadd, v2sf, v4sf),
 VAR3 (BINOP, vaddl, v8qi, v4hi, v2si),
 VAR3 (BINOP, vaddw, v8qi, v4hi, v2si),
 VAR6 (BINOP, vhadd, v8qi, v4hi, v2si, v16qi, v8hi, v4si),
@@ -54,7 +53,7 @@ VAR8 (SHIFTIMM, vqshl_n, v8qi, v4hi, v2si, di, v16qi, v8hi, 
v4si, v2di),
 VAR8 (SHIFTIMM, vqshlu_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
 VAR3 (SHIFTIMM, vshll_n, v8qi, v4hi, v2si),
 VAR8 (SHIFTACC, vsra_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
-VAR10 (BINOP, vsub, v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
+VAR2 (BINOP, vsub, v2sf, v4sf),
 VAR3 (BINOP, vsubl, v8qi, v4hi, v2si),
 VAR3 (BINOP, vsubw, v8qi, v4hi, v2si),
 VAR8 (BINOP, vqsub, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di),
@@ -199,14 +198,4 @@ VAR5 (LOADSTRUCT, vld4_dup, v8qi, v4hi, v2si, v2sf, di),
 VAR9 (STORESTRUCT, vst4,
v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf),
 VAR7 (STORESTRUCTLANE, vst4_lane,
-   v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf),
-VAR10 (LOGICBINOP, vand,
-v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
-VAR10 (LOGICBINOP, vorr,
-v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
-VAR10 (BINOP, veor,
-v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
-VAR10 (LOGICBINOP, vbic,
-v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
-VAR10 (LOGICBINOP, vorn,
-v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
+   v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index aad420c..9ac393b 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1842,9 +1842,9 @@
 ; good for plain vadd, vaddq.
 
 (define_expand "neon_vadd"
-  [(match_operand:VDQX 0 "s_register_operand" "=w")
-   (match_operand:VDQX 1 "s_register_operand" "w")
-   (match_operand:VDQX 2 "s_register_operand" "w")
+  [(match_operand:VCVTF 0 "s_register_operand" "=w")
+   (match_operand:VCVTF 1 "s_register_operand" "w")
+   (match_operand:VCVTF 2 "s_register_operand" "w")
(match_operand:SI 3 "immediate_operand" "i")]
   "TARGET_NEON"
 {
@@ -1869,9 +1869,9 @@
 ; Used for intrinsics when flag_unsafe_math_optimizations is false.
 
 (define_insn "neon_vadd_unspec"
-  [(set (match_operand:VDQX 0 "s_register_operand" "=w")
-(unspec:VDQX [(match_operand:VDQX 1 "s_register_operand" "w")
- (match_operand:VDQX 2 "s_register_operand" "w")]
+  [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
+(unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
+ (match_operand:VCVTF 2 "s_register_operand" "w")]
  UNSPEC_VADD))]
   "TARGET_NEON"
   "vadd.\t%0, %1, %2"
@@ -2132,9 +2132,9 @@
 )
 
 (define_expand "neon_vsub"
-  [(match_operand:VDQX 0 "s_register_operand" "=w")
-   (match_operand:VDQX 1 "s_register_operand" "w")
-   (match_operand:VDQX 2 "s_register_operand" "w")
+  [(match_operand:VCVTF 0 "s_register_operand" "=w")
+   (match_operand:VCVTF 1 "s_register_operand" "w")
+   (match_operand:VCVTF 2 "s_register_operand" "w")
(match_operand:SI 3 "immediate_operand" "i")]
   "TARGET_NEON"
 {
@@ -2149,9 +2149,9 @@
 ; Used for intrinsics when flag_unsafe_math_optimizations is false.
 
 (define_insn "neon_vsub_unspec"
-  [(set (match_operand:VDQX 0 "s_register_operand" "=w")
-(unspec:VDQX [(match_operand:VDQX 1 "s_register_operand" "w")
- (match_operand:VDQX 2 "s_register_operand" "w")]
+  [(set (match_operand:VCVTF 0 "

Re: [PATCH] Make SRA create statements with the correct alias type

2014-04-28 Thread Richard Biener

On Fri, 25 Apr 2014, Martin Jambor wrote:

> Hi,
> 
> the patch below is inspired by PR 57297 (the most relevant comments
> are #4 and #5).  The problem is that currently SRA creates memory
> loads and stores with alias type of whatever happens to be in
> access->base.  However, at least when using placement or some nasty
> type-casting, it is possible that the same aggregate, represented by
> the same access structure, is accessed using different alias types in
> one function.  This might lead to bogus memory access reordering, at
> least in theory.  This patch therefore makes sure all SRA created
> accesses have the same alias type as the load/store they originated
> from.
> 
> Because load_assign_lhs_subreplacements did not look like it could
> accept one more parameter, I encapsulated all of them in a structure.
> I wrote this patch in December, I admit I don't remember what the new
> testcase aims for, but I assume I added it for a reason :-)
> 
> Bootstrapped and tested on x86_64-linux.  OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> 
> Martin
> 
> 
> 2014-04-24  Martin Jambor  
> 
>   * tree-sra.c (sra_modify_expr): Generate new memory accesses with
>   same alias type as the original statement.
>   (subreplacement_assignment_data): New type.
>   (handle_unscalarized_data_in_subtree): New type of parameter,
>   generate new memory accesses with same alias type as the original
>   statement.
>   (load_assign_lhs_subreplacements): Likewise.
>   (sra_modify_constructor_assign): Generate new memory accesses with
>   same alias type as the original statement.
> 
> testsuite/
>   * gcc.dg/tree-ssa/sra-14.c: New test.
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-14.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/sra-14.c
> new file mode 100644
> index 000..6cbc0b4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-14.c
> @@ -0,0 +1,70 @@
> +/* { dg-do run } */
> +/* { dg-options "-O1" } */
> +
> +struct S
> +{
> +  int i, j;
> +};
> +
> +struct Z
> +{
> +  struct S d, s;
> +};
> +
> +struct S __attribute__ ((noinline, noclone))
> +get_s (void)
> +{
> +  struct S s;
> +  s.i = 5;
> +  s.j = 6;
> +
> +  return s;
> +}
> +
> +struct S __attribute__ ((noinline, noclone))
> +get_d (void)
> +{
> +  struct S d;
> +  d.i = 0;
> +  d.j = 0;
> +
> +  return d;
> +}
> +
> +int __attribute__ ((noinline, noclone))
> +get_c (void)
> +{
> +  return 1;
> +}
> +
> +int __attribute__ ((noinline, noclone))
> +my_nop (int i)
> +{
> +  return i;
> +}
> +
> +int __attribute__ ((noinline, noclone))
> +foo (void)
> +{
> +  struct Z z;
> +  int i, c = get_c ();
> +
> +  z.d = get_d ();
> +  z.s = get_s ();
> +
> +  for (i = 0; i < c; i++)
> +{
> +  z.s.i = my_nop (z.s.i);
> +  z.s.j = my_nop (z.s.j);
> +}
> +
> +  return z.s.i + z.s.j;
> +}
> +
> +int main (int argc, char *argv[])
> +{
> +  if (foo () != 11)
> +__builtin_abort ();
> +  return 0;
> +}
> +
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 49bbee3..4a24e6a 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -2769,7 +2769,7 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, 
> bool write)
>  {
>location_t loc;
>struct access *access;
> -  tree type, bfr;
> +  tree type, bfr, orig_expr;
>  
>if (TREE_CODE (*expr) == BIT_FIELD_REF)
>  {
> @@ -2785,6 +2785,7 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, 
> bool write)
>if (!access)
>  return false;
>type = TREE_TYPE (*expr);
> +  orig_expr = *expr;
>  
>loc = gimple_location (gsi_stmt (*gsi));
>gimple_stmt_iterator alt_gsi = gsi_none ();
> @@ -2811,8 +2812,7 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, 
> bool write)
>   {
> tree ref;
>  
> -   ref = build_ref_for_model (loc, access->base, access->offset, access,
> -  NULL, false);
> +   ref = build_ref_for_model (loc, orig_expr, 0, access, NULL, false);
>  
> if (write)
>   {
> @@ -2863,7 +2863,7 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, 
> bool write)
>else
>   start_offset = chunk_size = 0;
>  
> -  generate_subtree_copies (access->first_child, access->base, 0,
> +  generate_subtree_copies (access->first_child, orig_expr, 
> access->offset,
>  start_offset, chunk_size, gsi, write, write,
>  loc);
>  }
> @@ -2877,53 +2877,70 @@ enum unscalarized_data_handling { SRA_UDH_NONE,  /* 
> Nothing done so far. */
> SRA_UDH_RIGHT, /* Data flushed to the RHS. */
> SRA_UDH_LEFT }; /* Data flushed to the LHS. */
>  
> +struct subreplacement_assignment_data
> +{
> +  /* Offset of the access representing the lhs of the assignment.  */
> +  HOST_WIDE_INT left_offset;
> +
> +  /* LHS and RHS of the original assignment.  */
> +  tree assignment_lhs, assignment_rhs;
> +
> +  /* Access representing the rhs of the

[Patch ARM 1/3] Neon intrinsics TLC : Replace intrinsics with GNU C implementations where possible.

2014-04-28 Thread Ramana Radhakrishnan

I've special cased the ffast-math case for the _f32 intrinsics to 
prevent the auto-vectorizer from coming along and vectorizing addv2sf 
and addv4sf type operations which we don't want to happen by default.
Patch 1/3 causes apparent "regressions" in the rather ineffective neon 
intrinsics tests that we currently carry, soon hopefully to be replaced 
by Christophe Lyon's rewrite that is being reviewed. On the whole I deem 
this patch stack to be safe to go in if necessary. These "regressions" 
are for -O0 with the vbic and vorn intrinsics which don't now get 
combined and well, so be it.


Given we're in stage 1 and that I think we're getting somewhere
with clyon's testsuite I feel that it is reasonably practical to just
carry the noise with these extra failures. Christophe and I will
test-drive his testsuite work in this space with these patches to see how 
the conversion process works and if there are any issues with these patches.



  Ramana Radhakrishnan  

* config/arm/arm_neon.h (vadd_s8): GNU C implementation
(vadd_s16): Likewise.
(vadd_s32): Likewise.
(vadd_f32): Likewise.
(vadd_u8): Likewise.
(vadd_u16): Likewise.
(vadd_u32): Likewise.
(vadd_s64): Likewise.
(vadd_u64): Likewise.
(vaddq_s8): Likewise.
(vaddq_s16): Likewise.
(vaddq_s32): Likewise.
(vaddq_s64): Likewise.
(vaddq_f32): Likewise.
(vaddq_u8): Likewise.
(vaddq_u16): Likewise.
(vaddq_u32): Likewise.
(vaddq_u64): Likewise.
(vmul_s8): Likewise.
(vmul_s16): Likewise.
(vmul_s32): Likewise.
(vmul_f32): Likewise.
(vmul_u8): Likewise.
(vmul_u16): Likewise.
(vmul_u32): Likewise.
(vmul_p8): Likewise.
(vmulq_s8): Likewise.
(vmulq_s16): Likewise.
(vmulq_s32): Likewise.
(vmulq_f32): Likewise.
(vmulq_u8): Likewise.
(vmulq_u16): Likewise.
(vmulq_u32): Likewise.
(vsub_s8): Likewise.
(vsub_s16): Likewise.
(vsub_s32): Likewise.
(vsub_f32): Likewise.
(vsub_u8): Likewise.
(vsub_u16): Likewise.
(vsub_u32): Likewise.
(vsub_s64): Likewise.
(vsub_u64): Likewise.
(vsubq_s8): Likewise.
(vsubq_s16): Likewise.
(vsubq_s32): Likewise.
(vsubq_s64): Likewise.
(vsubq_f32): Likewise.
(vsubq_u8): Likewise.
(vsubq_u16): Likewise.
(vsubq_u32): Likewise.
(vsubq_u64): Likewise.
(vand_s8): Likewise.
(vand_s16): Likewise.
(vand_s32): Likewise.
(vand_u8): Likewise.
(vand_u16): Likewise.
(vand_u32): Likewise.
(vand_s64): Likewise.
(vand_u64): Likewise.
(vandq_s8): Likewise.
(vandq_s16): Likewise.
(vandq_s32): Likewise.
(vandq_s64): Likewise.
(vandq_u8): Likewise.
(vandq_u16): Likewise.
(vandq_u32): Likewise.
(vandq_u64): Likewise.
(vorr_s8): Likewise.
(vorr_s16): Likewise.
(vorr_s32): Likewise.
(vorr_u8): Likewise.
(vorr_u16): Likewise.
(vorr_u32): Likewise.
(vorr_s64): Likewise.
(vorr_u64): Likewise.
(vorrq_s8): Likewise.
(vorrq_s16): Likewise.
(vorrq_s32): Likewise.
(vorrq_s64): Likewise.
(vorrq_u8): Likewise.
(vorrq_u16): Likewise.
(vorrq_u32): Likewise.
(vorrq_u64): Likewise.
(veor_s8): Likewise.
(veor_s16): Likewise.
(veor_s32): Likewise.
(veor_u8): Likewise.
(veor_u16): Likewise.
(veor_u32): Likewise.
(veor_s64): Likewise.
(veor_u64): Likewise.
(veorq_s8): Likewise.
(veorq_s16): Likewise.
(veorq_s32): Likewise.
(veorq_s64): Likewise.
(veorq_u8): Likewise.
(veorq_u16): Likewise.
(veorq_u32): Likewise.
(veorq_u64): Likewise.
(vbic_s8): Likewise.
(vbic_s16): Likewise.
(vbic_s32): Likewise.
(vbic_u8): Likewise.
(vbic_u16): Likewise.
(vbic_u32): Likewise.
(vbic_s64): Likewise.
(vbic_u64): Likewise.
(vbicq_s8): Likewise.
(vbicq_s16): Likewise.
(vbicq_s32): Likewise.
(vbicq_s64): Likewise.
(vbicq_u8): Likewise.
(vbicq_u16): Likewise.
(vbicq_u32): Likewise.
(vbicq_u64): Likewise.
(vorn_s8): Likewise.
(vorn_s16): Likewise.
(vorn_s32): Likewise.
(vorn_u8): Likewise.
(vorn_u16): Likewise.
(vorn_u32): Likewise.
(vorn_s64): Likewise.
(vorn_u64): Likewise.
(vornq_s8): Likewise.
(vornq_s16): Likewise.
(vornq_s32): Likewise.
(vornq_s64): Likewise.
(vornq_u8): Likewise.
(vornq_u16): Likewise.
(vornq_u32): Likewise.
(vornq_u64): Likewise.
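
As a rough illustration of what a "GNU C implementation" of an intrinsic
looks like, here is a minimal sketch using the generic GCC vector extension.
It is not taken from arm_neon.h: the sketch_ type and function names are
hypothetical stand-ins so that the example builds on any GCC target, and the
special-casing of the _f32 variants under -ffast-math is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit vector of two int32 lanes, built
   with the GCC vector extension rather than the real arm_neon.h types.  */
typedef int32_t sketch_int32x2_t __attribute__ ((vector_size (8)));

/* Intrinsic-style wrappers written with plain GNU C operators instead of
   calls to target builtins.  */
static inline sketch_int32x2_t
sketch_vadd_s32 (sketch_int32x2_t a, sketch_int32x2_t b)
{
  return a + b;
}

static inline sketch_int32x2_t
sketch_vbic_s32 (sketch_int32x2_t a, sketch_int32x2_t b)
{
  return a & ~b;   /* bit clear: a AND NOT b */
}

static inline sketch_int32x2_t
sketch_vorn_s32 (sketch_int32x2_t a, sketch_int32x2_t b)
{
  return a | ~b;   /* OR with the complement of b */
}

/* Returns 1 when every lane matches the scalar computation.  */
int
sketch_neon_ops_ok (void)
{
  sketch_int32x2_t a = { 12, -7 };
  sketch_int32x2_t b = { 10, 3 };
  sketch_int32x2_t sum = sketch_vadd_s32 (a, b);
  sketch_int32x2_t bic = sketch_vbic_s32 (a, b);
  sketch_int32x2_t orn = sketch_vorn_s32 (a, b);

  return sum[0] == 22 && sum[1] == -4
         && bic[0] == (12 & ~10) && bic[1] == (-7 & ~3)
         && orn[0] == (12 | ~10) && orn[1] == (-7 | ~3);
}
```

Written this way, the operations are visible to the middle end as ordinary
vector arithmetic.  It also suggests why vbic and vorn are the ones affected
at -O0: each is expressed as two separate operations that are only merged
into a single instruction once optimization is enabled.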



--
Ramana Radhakrishnan

[Patch ARM 0/3] Neon intrinsics TLC - Replace intrinsics with GNU C implementations where possible and remove dead code.

2014-04-28 Thread Ramana Radhakrishnan


Hi,

	I was investigating a performance issue with Neon intrinsics and 
realized this needed to happen.


	Patch 1/3 does this. I've special cased the ffast-math case for the 
_f32 intrinsics to prevent the auto-vectorizer from coming along and 
vectorizing addv2sf and addv4sf type operations which we don't want to 
happen by default. Patch 1/3 causes apparent "regressions" in the rather 
ineffective neon intrinsics tests that we currently carry soon hopefully 
to be replaced by Christophe Lyon's rewrite that is being reviewed. On 
the whole I deem this patch stack to be safe to go in if necessary. 
These "regressions" are for -O0 with the vbic and vorn intrinsics which 
don't now get combined and well, so be it.


	This then left us in the happy position of being able to delete code 
but I was worried about LTO streaming as these "builtins" are 
essentially streamed out in LTO object code format. However since we 
make no promises about LTO compatibility across releases, that's safe 
but I structured the dead code elimination as Patch 2/3. This will be 
committed separately in case folks want to backport Patch 1/3 separately 
and want to assure their users of LTO compatibility within a release 
branch (if that even works :)  ) .


	Patch 3/3 removes the ML to generate Neon intrinsics and the 
documentation and updates the comments in the files to show that these 
are now hand crafted rather than auto-generated. We've had these for 
many years now and I think it's time we got rid of this. Not everyone 
groks ML and it doesn't help that only one or 2 folks can actually do 
this properly every time. Instead of having these bottlenecks and given 
the fact that the intrinsics are pretty stable now, there's no point in 
retaining the generator interface. I'd rather get rid of them. The only 
bit left is neon-schedgen.ml, neon.ml and neon-testgen.ml. I think we 
can safely remove neon-testgen.ml once Christophe's testsuite is done 
and we'll probably just have to carry neon-schedgen.ml / neon.ml as it 
still generates the neon descriptions for both a8 and a9.


	The patch stack was caught up in the C++ type info mess recently and 
I've tested this on a cross arm-linux-gnueabihf testsuite run and it 
looks ok modulo the issues mentioned for Patch 1/3. I've deliberately 
resisted deleting the entire gcc.target/arm/neon and neon-testgen.ml in 
the hope that Christophe's testsuite will do the honours at that point 
:). Given we're in stage 1 and that I think we're getting somewhere 
with clyon's testsuite I feel that it is reasonably practical to just 
carry the noise with these extra failures. Christophe and I will 
test-drive his testsuite work in this space with these patches to see how 
the conversion process works and if there are any issues with these patches.


If there are issues I'm happy to hear about them.

Will apply to trunk in a couple of days if no regressions with clyon's 
testsuite for these intrinsics.



regards
Ramana
--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.

Simplify Solaris 2 configuration

2014-04-28 Thread Rainer Orth

After Solaris 9/x86 support is gone from mainline, the Solaris 2
configuration can be massively simplified:

* All Solaris 10+ configurations are bi-arch now.

* The sol2-bi.h configs can be merged into the base configs.  For x86,
  i386/x86-64.h can be included before i386/sol2.h, which then overrides
  settings which differ on Solaris.

* On Solaris 10+, /usr/ccs/lib only contains symlinks to files in
  /usr/lib, so no need to include it in library search paths and such.
  /usr/ccs/bin is different, though: before Solaris 11, as, ld and
  friends only live there.

* ${cpu}/sol2.h needs to be included before sol2.h to provide
  definitions of DEFAULT_ARCH32_P for the latter.

* In config/sparc/sol2.h, all definitions of ASM_CPU_DEFAULT_SPEC could
  go: even before my change, they've been overridden by
  config/sparc/sol2-bi.h anyway, so they served no purpose.

Initial test results looked good; I'll now be running a complete regtest
across the whole set of configurations (sparc-sun-solaris2.1[01] with
as/ld, gas/ld, gas/gld, as/gld, sparcv9-sun-solaris2.1[01] with as/ld,
gas/ld) and commit the patch unless problems show up there.

Rainer

2014-04-24  Rainer Orth  

* config/sol2-10.h (TARGET_LIBC_HAS_FUNCTION): Move ...
* config/sol2.h: ... here.
* config/sol2-10.h: Remove.

* config/sol2-bi.h (WCHAR_TYPE, WCHAR_TYPE_SIZE, WINT_TYPE)
(WINT_TYPE_SIZE, MULTILIB_DEFAULTS, DEF_ARCH32_SPEC)
(DEF_ARCH64_SPEC, ASM_CPU_DEFAULT_SPEC, LINK_ARCH64_SPEC_BASE)
(LINK_ARCH64_SPEC, ARCH_DEFAULT_EMULATION, TARGET_LD_EMULATION)
(LINK_ARCH_SPEC, SUBTARGET_EXTRA_SPECS): Move ...
* config/sol2.h: ... here.
(SECTION_NAME_FORMAT): Don't redefine.
(STARTFILE_ARCH32_SPEC): Rename to ...
(STARTFILE_ARCH_SPEC): ... this.
(ASM_OUTPUT_ALIGNED_COMMON): Move ...
* config/sparc/sol2.h: ... here.
(SECTION_NAME_FORMAT): Don't undef.
* config/i386/sol2.h (ASM_CPU_DEFAULT_SPEC)
(SUBTARGET_EXTRA_SPECS): Remove.
* config/sparc/sol2.h (ASM_CPU_DEFAULT_SPEC): Remove.

* config/i386/sol2-bi.h (TARGET_SUBTARGET_DEFAULT)
(MD_STARTFILE_PREFIX): Remove.
(SUBTARGET_OPTIMIZATION_OPTIONS, ASM_CPU32_DEFAULT_SPEC)
(ASM_CPU64_DEFAULT_SPEC, ASM_CPU_SPEC, ASM_SPEC, DEFAULT_ARCH32_P)
(ARCH64_SUBDIR, ARCH32_EMULATION, ARCH64_EMULATION)
(ASM_COMMENT_START, JUMP_TABLES_IN_TEXT_SECTION)
(ASM_OUTPUT_DWARF_PCREL, ASM_OUTPUT_ALIGNED_COMMON)
(USE_IX86_FRAME_POINTER, USE_X86_64_FRAME_POINTER): Move ...
* config/i386/sol2.h: ... here.
(TARGET_SUBTARGET_DEFAULT, SIZE_TYPE, PTRDIFF_TYPE): Remove.
* config/i386/sol2-bi.h: Remove.
* config/sol2.h (MD_STARTFILE_PREFIX): Remove.
(LINK_ARCH32_SPEC_BASE): Remove /usr/ccs/lib/libp, /usr/ccs/lib.

* config/i386/t-sol2-64: Rename to ...
* config/i386/t-sol2: ... this.
* config/sparc/t-sol2-64: Rename to ...
* config/sparc/t-sol2: ... this.

* config.gcc (*-*-solaris2*): Split sol2_tm_file into
sol2_tm_file_head, sol2_tm_file_tail.
Include ${cpu_type}/sol2.h before sol2.h.
Remove sol2-10.h.
(i[34567]86-*-solaris2* | x86_64-*-solaris2.1[0-9]*): Include
i386/x86-64.h between sol2_tm_file_head and sol2_tm_file_tail.
Remove i386/sol2-bi.h, sol2-bi.h from tm_file.
Reflect i386/t-sol2-64 renaming.
(sparc*-*-solaris2*): Remove sol2-bi.h from tm_file.
Reflect sparc/t-sol2-64 renaming.

# HG changeset patch
# Parent 1c62820f15f3becfe5eb38788cecb94a20babda9
Simplify Solaris 2 configuration

diff --git a/gcc/config.gcc b/gcc/config.gcc
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -808,8 +808,9 @@ case ${target} in
 *-*-solaris2*)
   # i?86-*-solaris2* needs to insert headers between cpu default and
   # Solaris 2 specific ones.
-  sol2_tm_file="dbxelf.h elfos.h ${cpu_type}/sysv4.h sol2.h ${cpu_type}/sol2.h"
-  sol2_tm_file="${sol2_tm_file} sol2-10.h"
+  sol2_tm_file_head="dbxelf.h elfos.h ${cpu_type}/sysv4.h"
+  sol2_tm_file_tail="${cpu_type}/sol2.h sol2.h"
+  sol2_tm_file="${sol2_tm_file_head} ${sol2_tm_file_tail}"
   use_gcc_stdint=wrap
   if test x$gnu_ld = xyes; then
 tm_file="usegld.h ${tm_file}"
@@ -1515,14 +1516,13 @@ i[34567]86-*-rtems*)
 	tmake_file="${tmake_file} i386/t-rtems"
 	;;
 i[34567]86-*-solaris2* | x86_64-*-solaris2.1[0-9]*)
-	tm_file="${tm_file} i386/unix.h i386/att.h ${sol2_tm_file}"
 	# Set default arch_32 to pentium4, tune_32 to generic like the other
 	# i386 targets, although config.guess defaults to i386-pc-solaris2*.
 	with_arch_32=${with_arch_32:-pentium4}
 	with_tune_32=${with_tune_32:-generic}
-	tm_file="${tm_file} i386/x86-64.h i386/sol2-bi.h sol2-bi.h"
+	tm_file="${tm_file} i386/unix.h i386/att.h ${sol2_tm_file_head} i386/x86-64.h ${sol2_tm_file_tail}"
 	tm_defines="${tm_defines} TARGET_BI_ARCH=1"
-	tmake_file="$tmake_file i386/t-sol2-64"
+	tma

Re: [PATCH] Add a couple of dialect and warning options regarding Objective-C instance variable scope

2014-04-28 Thread Dimitris Papavasiliou


On 04/25/2014 07:50 PM, Mike Stump wrote:

On Apr 25, 2014, at 9:34 AM, Dimitris Papavasiliou  wrote:


--Wreturn-type  -Wsequence-point  -Wshadow @gol
+-Wreturn-type  -Wsequence-point  -Wshadow  -Wshadow-ivar @gol


This has to be -Wno-shadow-ivar, we document the non-default…


+@item -Wshadow-ivar @r{(Objective-C only)}


Likewise.


+  /* Check wheter the local variable hides the instance variable. */


spelling, whether...


Fixed these.


+  a = private;/* { dg-warning "hides instance variable" "" { xfail *-*-* } 
} */
+  a = protected;  /* { dg-warning "hides instance variable" "" { xfail *-*-* } 
} */
+  a = public; /* { dg-warning "hides instance variable" "" { xfail *-*-* } 
} */


No, we don’t expect failures.  We makes the compiler do what we wants or it 
gets the hose again.  Then, we expect it to be perfect.  If you don’t want a 
warning, and none are produced, then just remove the /* … */, or write /* no 
warning */.


I've fixed these as per your request.  For the record though, this form 
of test seems to be fairly common in the test suites as this output 
indicates:


dimitris@debian:~/sandbox/gcc-build$ find ../gcc-source/gcc/testsuite/ 
-name "*.c" -o -name "*.C" -o -name "*.m" | xargs grep "xfail \*-\*-\*" 
| wc -l

354

Many of these seem to be in error or warning messages which are expected 
not to show up.  In any case if the messages do show up they'll trigger 
the excessive messages test so I suppose that's enough.



Also, synth up the ChangeLogs… :-), they are trivial enough.


Done.


And, just pop them all into one patch (cd ..; svn diff), 3 is 3x the work for 
me.


Attached.


Once we resolve the 3 warning tests above, this will be ok.


Actually, there were a few more { xfail *-*-* } in the other test cases. 
 I've removed these as well.


Dimitris

Index: gcc/c-family/c.opt
===
--- gcc/c-family/c.opt	(revision 209852)
+++ gcc/c-family/c.opt	(working copy)
@@ -685,7 +685,7 @@
 Warn if a selector has multiple methods
 
 Wshadow-ivar
-ObjC ObjC++ Var(warn_shadow_ivar) Init(1) Warning
+ObjC ObjC++ Var(warn_shadow_ivar) EnabledBy(Wshadow) Init(1) Warning
 Warn if a local declaration hides an instance variable
 
 Wsequence-point
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 209852)
+++ gcc/doc/invoke.texi	(working copy)
@@ -216,6 +216,8 @@
 -fobjc-gc @gol
 -fobjc-nilcheck @gol
 -fobjc-std=objc1 @gol
+-fno-local-ivars @gol
+-fivar-visibility=@var{public|protected|private|package} @gol
 -freplace-objc-classes @gol
 -fzero-link @gol
 -gen-decls @gol
@@ -261,7 +263,7 @@
 -Wparentheses  -Wpedantic-ms-format -Wno-pedantic-ms-format @gol
 -Wpointer-arith  -Wno-pointer-to-int-cast @gol
 -Wredundant-decls  -Wno-return-local-addr @gol
--Wreturn-type  -Wsequence-point  -Wshadow @gol
+-Wreturn-type  -Wsequence-point  -Wshadow  -Wno-shadow-ivar @gol
 -Wsign-compare  -Wsign-conversion -Wfloat-conversion @gol
 -Wsizeof-pointer-memaccess @gol
 -Wstack-protector -Wstack-usage=@var{len} -Wstrict-aliasing @gol
@@ -2976,6 +2978,22 @@
 The GNU runtime currently always retains calls to @code{objc_get_class("@dots{}")}
 regardless of command-line options.
 
+@item -fno-local-ivars
+@opindex fno-local-ivars
+@opindex flocal-ivars
+By default instance variables in Objective-C can be accessed as if
+they were local variables from within the methods of the class they're
+declared in.  This can lead to shadowing between instance variables
+and other variables declared either locally inside a class method or
+globally with the same name.  Specifying the @option{-fno-local-ivars}
+flag disables this behavior thus avoiding variable shadowing issues.
+
+@item -fivar-visibility=@var{public|protected|private|package}
+@opindex fivar-visibility
+Set the default instance variable visibility to the specified option
+so that instance variables declared outside the scope of any access
+modifier directives default to the specified visibility.
+
 @item -gen-decls
 @opindex gen-decls
 Dump interface declarations for all classes seen in the source file to a
@@ -4350,11 +4368,18 @@
 @item -Wshadow
 @opindex Wshadow
 @opindex Wno-shadow
-Warn whenever a local variable or type declaration shadows another variable,
-parameter, type, or class member (in C++), or whenever a built-in function
-is shadowed. Note that in C++, the compiler warns if a local variable
-shadows an explicit typedef, but not if it shadows a struct/class/enum.
+Warn whenever a local variable or type declaration shadows another
+variable, parameter, type, class member (in C++), or instance variable
+(in Objective-C) or whenever a built-in function is shadowed. Note
+that in C++, the compiler warns if a local variable shadows an
+explicit typedef, but not if it shadows a struct/class/enum.
 
+@item -Wno-shadow-ivar @r{(Objective-C only)}
+@opindex Wno-shadow-ivar
+@opindex Wshadow-ivar
+Do not warn wheneve

Re: [RFC][ARM] TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook

2014-04-28 Thread Ramana Radhakrishnan


On 04/26/14 11:26, Kugan wrote:

Hi,

Attached patch implements TARGET_ATOMIC_ASSIGN_EXPAND_FENV for ARM. With
this, atomic test-case gcc.dg/atomic/c11-atomic-exec-5.c now PASS.

This implementation is based on SPARC and i386 implementations.

Regression tested on qemu-arm for arm-none-linux-gnueabi with no new
regression. Is this OK for trunk?


Thanks for this patch. Can you please test this on hardware and make 
sure c11-atomic-exec-5.c works reliably ?


Testing on qemu is not enough for this patch, sorry :(.

Comments inline below.



Thanks,
Kugan

gcc/
+2014-04-27  Kugan Vivekanandarajah  
+
+   * config/arm/arm.c (TARGET_ATOMIC_ASSIGN_EXPAND_FENV): New define.
+   (arm_builtins) : Add ARM_BUILTIN_LDFPSCR and ARM_BUILTIN_STFPSCR.
+   (bdesc_2arg) : Add description for builtins __builtins_arm_stfpscr
+   and __builtins_arm_ldfpscr.


Rename ld and st as get and set intrinsics please like AArch64.

Add __builtin_arm_set_fpscr and __builtin_arm_get_fpscr.


+   (arm_init_builtins) : Initialize builtins __builtins_arm_stfpscr and
+   __builtins_arm_ldfpscr.


Likewise.


+   (arm_expand_builtin) : Expand builtins __builtins_arm_stfpscr and
+   __builtins_arm_ldfpscr.


Likewise.


+   (arm_atomic_assign_expand_fenv): New function.



+   * config/arm/vfp.md (stfpscr): New pattern.
+   (ldfpscr) : Likewise.
+   * config/arm/unspecs.md (unspecv): Add VUNSPEC_LDFPSCR and
+   VUNSPEC_STFPSCR.
+



Replace LD and ST with Get and Set in the builtin names please overall.


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 0240cc7..4f0ed58 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -59,6 +59,7 @@
 #include "params.h"
 #include "opts.h"
 #include "dumpfile.h"
+#include "gimple-expr.h"

 /* Forward definitions of types.  */
 typedef struct minipool_node    Mnode;
@@ -93,6 +94,7 @@ static int thumb_far_jump_used_p (void);
 static bool thumb_force_lr_save (void);
 static unsigned arm_size_return_regs (void);
 static bool arm_assemble_integer (rtx, unsigned int, int);
+static void arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update);
 static void arm_print_operand (FILE *, rtx, int);
 static void arm_print_operand_address (FILE *, rtx);
 static bool arm_print_operand_punct_valid_p (unsigned char code);
@@ -584,6 +586,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE arm_mangle_type

+#undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
+#define TARGET_ATOMIC_ASSIGN_EXPAND_FENV arm_atomic_assign_expand_fenv
+
 #undef TARGET_BUILD_BUILTIN_VA_LIST
 #define TARGET_BUILD_BUILTIN_VA_LIST arm_build_builtin_va_list
 #undef TARGET_EXPAND_BUILTIN_VA_START
@@ -23212,6 +23217,9 @@ enum arm_builtins
   ARM_BUILTIN_CRC32CH,
   ARM_BUILTIN_CRC32CW,

+  ARM_BUILTIN_LDFPSCR,
+  ARM_BUILTIN_STFPSCR,
+


s/LD/GET
s/ST/SET


 #undef CRYPTO1
 #undef CRYPTO2
 #undef CRYPTO3
@@ -24010,6 +24018,15 @@ static const struct builtin_description bdesc_2arg[] =
   IWMMXT_BUILTIN2 (iwmmxt_wmacuz, WMACUZ)
   IWMMXT_BUILTIN2 (iwmmxt_wmacsz, WMACSZ)

+
+#define FP_BUILTIN(L, U) \
+  {0, CODE_FOR_##L, "__builtin_arm_"#L, ARM_BUILTIN_##U, \
+   UNKNOWN, 0},
+
+  FP_BUILTIN (stfpscr, LDFPSCR)
+  FP_BUILTIN (ldfpscr, STFPSCR)
+#undef FP_BUILTIN
+
 #define CRC32_BUILTIN(L, U) \
   {0, CODE_FOR_##L, "__builtin_arm_"#L, ARM_BUILTIN_##U, \
UNKNOWN, 0},
@@ -24524,6 +24541,21 @@ arm_init_builtins (void)

   if (TARGET_CRC32)
 arm_init_crc32_builtins ();
+
+  if (TARGET_VFP)
+{
+  tree ftype_stfpscr
+   = build_function_type_list (void_type_node, unsigned_type_node, NULL);
+  tree ftype_ldfpscr
+   = build_function_type_list (unsigned_type_node, NULL);
+
+  arm_builtin_decls[ARM_BUILTIN_LDFPSCR]
+   = add_builtin_function ("__builtin_arm_ldfscr", ftype_ldfpscr,
+   ARM_BUILTIN_LDFPSCR, BUILT_IN_MD, NULL, NULL_TREE);
+  arm_builtin_decls[ARM_BUILTIN_STFPSCR]
+   = add_builtin_function ("__builtin_arm_stfscr", ftype_stfpscr,
+   ARM_BUILTIN_STFPSCR, BUILT_IN_MD, NULL, NULL_TREE);
+}
 }

 /* Return the ARM builtin for CODE.  */
@@ -25251,6 +25283,25 @@ arm_expand_builtin (tree exp,

   switch (fcode)
 {
+case ARM_BUILTIN_LDFPSCR:
+case ARM_BUILTIN_STFPSCR:
+  if (fcode == ARM_BUILTIN_LDFPSCR)
+   {
+ icode = CODE_FOR_ldfpscr;
+ target = gen_reg_rtx (SImode);
+ pat = GEN_FCN (icode) (target);
+   }
+  else
+   {
+ target = NULL_RTX;
+ icode = CODE_FOR_stfpscr;
+ arg0 = CALL_EXPR_ARG (exp, 0);
+ op0 = expand_normal (arg0);
+ pat = GEN_FCN (icode) (op0);
+   }
+  emit_insn (pat);
+  return target;
+
 case ARM_BUILTIN_TEXTRMSB:
 case ARM_BUILTIN_TEXTRMUB:
 case ARM_BUILTIN_TEXTRMSH:
@@ -31116,4 +31167,70 @@ arm_asan_shadow_offset (void)
   return (unsigned HOST_WIDE_INT) 1 << 29;
 }

+static

Re: -fuse-caller-save - Enable for MIPS

2014-04-28 Thread Richard Sandiford

Tom de Vries  writes:
> On 27-04-14 12:27, Richard Sandiford wrote:
>> Tom de Vries  writes:
>>>   mips_emit_call_insn (rtx pattern, rtx orig_addr, rtx addr, bool lazy_p)
>>>   {
>>> rtx insn, reg;
>>>
>>> -  insn = emit_call_insn (pattern);
>>> +  emit_call_insn (pattern);
>>> +  insn = last_call_insn ();
>>>
>>> if (TARGET_MIPS16 && mips_use_pic_fn_addr_reg_p (orig_addr))
>>>   {
>>
>> This change isn't necessary; emit_call_insn is defined to return a CALL_INSN.
>>
>
> I dropped this change, as well as the change in the untyped_call expand, I 
> realized it's unnecessary.

Why was the untyped_call part unnecessary?

>> I'm a bit surprised that it doesn't work at -O1 for a simple test
>> like this though.  What goes wrong?
>>
>
> AFAIU now the problem is that the optimization doesn't trigger for -O0
> and -O1, because the register allocator behaves more conservatively.

Hmm, is that just because -fcaller-saves is -O2 and above?  If so,
should -fuse-caller-save imply -fcaller-saves?

Thanks,
Richard

Re: [PING] [PATCH, AARCH64] movcc for fcsel

2014-04-28 Thread Marcus Shawcroft

On 22 April 2014 10:36, Zhenqiang Chen  wrote:

>> +float f1 (float a, float b, float c, float d)
>> +{
>> +  if (a > 0.0)
>> +return c;
>> +  else
>> +return 2.0;
>> +}
>> +
>> +double f2 (double a, double b, double c, double d)
>> +{
>> +  if (a > b)
>> +return c;
>> +  else
>> +return d;
>> +}

OK, but please GNUize the test case, function names start in column 1
and the test case file names should end in _1.c

/Marcus
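
For reference, a sketch of the layout being asked for, using the two quoted
functions unchanged apart from formatting: the return type goes on its own
line so that the function name starts in column 1.  Only the layout is
illustrated here; any dg- directives and the final _1.c file name are left
out because they are not shown in the quoted hunk.

```c
/* GNU style: return type on its own line, function name in column 1.  */

float
f1 (float a, float b, float c, float d)
{
  if (a > 0.0)
    return c;
  else
    return 2.0;
}

double
f2 (double a, double b, double c, double d)
{
  if (a > b)
    return c;
  else
    return d;
}
```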

[SPARC] Fix incorrect ASI used for casa on LEON3

2014-04-28 Thread Eric Botcazou

As discussed at http://gcc.gnu.org/ml/gcc/2014-04/msg00241.html, this changes 
the compiler to directly emit a 'casa' instruction with an appropriate ASI on 
LEON3 and adds the -muser-mode switch.

Tested on SPARC/Solaris and LEON3, applied on mainline, 4.9 and 4.8 branches.
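
As an illustration of the kind of source that ends up in the compare-and-swap
patterns touched here, below is a minimal C11 sketch.  The helper name
counter_add is hypothetical and not part of the patch; the assumption is that
on a LEON3 target the compare-exchange is what gets expanded to the casa
instruction, with the ASI chosen by -muser-mode.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically add DELTA to *COUNTER and return the
   new value, using an explicit compare-and-swap loop.  */
int
counter_add (atomic_int *counter, int delta)
{
  int expected = atomic_load (counter);

  /* On failure EXPECTED is refreshed with the current value of *COUNTER,
     so the loop simply retries with the updated value.  */
  while (!atomic_compare_exchange_strong (counter, &expected,
					   expected + delta))
    ;

  return expected + delta;
}
```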


2014-04-28  Eric Botcazou  

* configure.ac: Tweak GAS check for LEON instructions on SPARC.
* configure: Regenerate.
* config/sparc/sparc.opt (muser-mode): New option.
* config/sparc/sync.md (atomic_compare_and_swap_1): Do not enable
for LEON3.
(atomic_compare_and_swap_leon3_1): New instruction for LEON3.
* doc/invoke.texi (SPARC options): Document -muser-mode.


-- 
Eric Botcazou

Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 209819)
+++ doc/invoke.texi	(working copy)
@@ -993,6 +993,7 @@ See RS/6000 and PowerPC Options.
 -mhard-quad-float  -msoft-quad-float @gol
 -mstack-bias  -mno-stack-bias @gol
 -munaligned-doubles  -mno-unaligned-doubles @gol
+-muser-mode  -mno-user-mode @gol
 -mv8plus  -mno-v8plus  -mvis  -mno-vis @gol
 -mvis2  -mno-vis2  -mvis3  -mno-vis3 @gol
 -mcbcond -mno-cbcond @gol
@@ -20961,6 +20962,14 @@ Specifying this option avoids some rare
 generated by other compilers.  It is not the default because it results
 in a performance loss, especially for floating-point code.
 
+@item -muser-mode
+@itemx -mno-user-mode
+@opindex muser-mode
+@opindex mno-user-mode
+Do not generate code that can only run in supervisor mode.  This is relevant
+only for the @code{casa} instruction emitted for the LEON3 processor.  The
+default is @option{-mno-user-mode}.
+
 @item -mno-faster-structs
 @itemx -mfaster-structs
 @opindex mno-faster-structs
Index: configure.ac
===
--- configure.ac	(revision 209819)
+++ configure.ac	(working copy)
@@ -3647,7 +3647,7 @@ foo:
.align 4
smac %g2, %g3, %g1
umac %g2, %g3, %g1
-   cas [[%g2]], %g3, %g1],,
+   casa [[%g2]] 0xb, %g3, %g1],,
   [AC_DEFINE(HAVE_AS_LEON, 1,
 [Define if your assembler supports LEON instructions.])])
 ;;
Index: config/sparc/sparc.opt
===
--- config/sparc/sparc.opt	(revision 209819)
+++ config/sparc/sparc.opt	(working copy)
@@ -113,6 +113,10 @@ mrelax
 Target
 Optimize tail call instructions in assembler and linker
 
+muser-mode
+Target Report Mask(USER_MODE)
+Do not generate code that can only run in supervisor mode
+
 mcpu=
 Target RejectNegative Joined Var(sparc_cpu_and_features) Enum(sparc_processor_type) Init(PROCESSOR_V7)
 Use features of and schedule code for given CPU
Index: config/sparc/sync.md
===
--- config/sparc/sync.md	(revision 209819)
+++ config/sparc/sync.md	(working copy)
@@ -200,10 +200,27 @@ (define_insn "*atomic_compare_and_swapmode != DImode || TARGET_ARCH64)"
+  "TARGET_V9 && (mode != DImode || TARGET_ARCH64)"
   "cas\t%1, %2, %0"
   [(set_attr "type" "multi")])
 
+(define_insn "*atomic_compare_and_swap_leon3_1"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(match_operand:SI 1 "mem_noofs_operand" "+w"))
+   (set (match_dup 1)
+	(unspec_volatile:SI
+	  [(match_operand:SI 2 "register_operand" "r")
+	   (match_operand:SI 3 "register_operand" "0")]
+	  UNSPECV_CAS))]
+  "TARGET_LEON3"
+{
+  if (TARGET_USER_MODE)
+return "casa\t%1 0xa, %2, %0"; /* ASI for user data space.  */
+  else
+return "casa\t%1 0xb, %2, %0"; /* ASI for supervisor data space.  */
+}
+  [(set_attr "type" "multi")])
+
 (define_insn "*atomic_compare_and_swapdi_v8plus"
   [(set (match_operand:DI 0 "register_operand" "=h")
 	(match_operand:DI 1 "mem_noofs_operand" "+w"))

Re: [RFC][AARCH64] TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook

2014-04-28 Thread Marcus Shawcroft

Hi Kugan, Thanks for this, couple of comments inline:

On 26 April 2014 11:57, Kugan  wrote:

> gcc/
> +2014-04-27  Kugan Vivekanandarajah  
> +
> +   * config/aarch64/aarch64.c (TARGET_ATOMIC_ASSIGN_EXPAND_FENV): New
> +   define.
> +   * config/aarch64/aarch64-builtins.c (arm_builtins) : Add

aarch64_builtins ?

> +   AARCH64_BUILTIN_LDFPSCR and AARCH64_BUILTIN_STFPSCR.

AArch32 has the traditional combined FPSCR, but AArch64 splits this
register into FPSR and FPCR therefore I think AARCH64_BUILTIN_GET_FPCR
and AARCH64_BUILTIN_SET_FPCR are more appropriate names.  Likewise
subsequent references to FPSCR in this patch should change to FPCR.

> +   (aarch64_init_builtins) : Initialize builtins
> +   __builtins_aarch64_stfpscr and __builtins_aarch64_ldfpscr.
> +   (aarch64_expand_builtin) : Expand builtins __builtins_aarch64_stfpscr
> +   and __builtins_aarch64_ldfpscr.
> +   (aarch64_atomic_assign_expand_fenv): New function.
> +   * config/aarch64/aarch64.md (stfpscr): New pattern.
> +   (ldfpscr) : Likewise.
> +   (unspecv): Add UNSPECV_LDFPSCR and UNSPECV_STFPSCR.
> +

+  aarch64_builtin_decls[AARCH64_BUILTIN_LDFPSCR]
+= add_builtin_function ("__builtin_aarch64_ldfscr", ftype_ldfpscr,

I'd prefer __builtin_aarch64_get_fpcr and __builtin_aarch64_set_fpcr.

We should document them in doc/extend.texi

+  const unsigned HOST_WIDE_INT FE_ALL_EXCEPT = (FE_INVALID | FE_DIVBYZERO
+ | FE_OVERFLOW | FE_UNDERFLOW
+ | FE_INEXACT);

Indentation is funny here..

+  /* Genareate the equivalence of :

Spelling.

+  tree fenv_var = create_tmp_var (unsigned_type_node, NULL);
+  tree ldfpscr = aarch64_builtin_decls[AARCH64_BUILTIN_LDFPSCR];
+  tree stfpscr = aarch64_builtin_decls[AARCH64_BUILTIN_STFPSCR];

Move the declarations to the top of the function please.

+void aarch64_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update);
+

Drop the argument names and relocate to aarch64-protos.h please.

+UNSPECV_LDFPSCR ; load floating point status and control register.

It isn't a status register, how about:

UNSPECV_GET_FPCR ; Represent fetch of FPCR content.

Cheers
/Marcus

Re: [PATCH][RFC][wide-int] Fix some build errors on arm in wide-int branch and report ICE

2014-04-28 Thread Richard Sandiford

Kyrill Tkachov  writes:
> The attached patch allowed the build to proceed for me, but in stage 2 I 
> encountered an ICE:
>
> $TOP/gcc/dwarf2out.c: In function 'long unsigned int 
> _ZL11size_of_dieP10die_struct.isra.209(vec**, long 
> unsigned int)':
> $TOP/gcc/dwarf2out.c:7820:1: internal compiler error: in set_value_range, at 
> tree-vrp.c:452
>   size_of_die (dw_die_ref die)
>   ^
> 0xa825c1 set_value_range
>  $TOP/gcc/tree-vrp.c:452
> 0xa8a441 extract_range_basic
>  $TOP/gcc/tree-vrp.c:3679
> 0xa92c13 vrp_visit_assignment_or_call
>  $TOP/gcc/tree-vrp.c:6725
> 0xa947eb vrp_visit_stmt
>  $TOP/gcc/tree-vrp.c:7538
> 0x9d4d47 simulate_stmt
>  $TOP/gcc/tree-ssa-propagate.c:329
> 0x9d5047 simulate_block
>  $TOP/gcc/tree-ssa-propagate.c:452
> 0x9d5e23 ssa_propagate(ssa_prop_result (*)(gimple_statement_base*, 
> edge_def**, tree_node**), ssa_prop_result (*)(gimple_statement_base*))
>  $TOP/gcc/tree-ssa-propagate.c:859
> 0xa9a1e1 execute_vrp
>  $TOP/gcc/tree-vrp.c:9781
> 0xa9a4a3 execute
>  $TOP/gcc/tree-vrp.c:9872
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.
>
>
> Any ideas? The compiler was configured with: --enable-languages=c,c++,fortran 
> --with-cpu=cortex-a15 --with-float=hard --with-mode=thumb

Please can you try again with the current branch?  If it still fails, could
you send me the dwarf2out.ii file?

Thanks,
Richard

Re: [wide-int] Stricter type checking in wide_int constructor

2014-04-28 Thread Richard Sandiford

Ping.  FWIW this is the last patch I have lined up before the merge.
I repeated the asm comparison test I did a few months ago on one target
per config/ architecture and there were no unexpected changes.

Richard Sandiford  writes:
> At the moment we prohibit "widest_int = wide_int" and "offset_int = wide_int".
> These would be correct only if the wide_int had the same precision as
> widest_int and offset_int respectively, but since those precisions
> don't really correspond to a particular language-level precision,
> such cases should never occur in practice.
>
> We also prohibit "wide_int = HOST_WIDE_INT" on the basis that the wide_int
> would naturally get the precision of HOST_WIDE_INT, which is a host property.
>
> However, we allowed "wide_int = widest_int" and "wide_int = offset_int".
> This is always safe in the sense that, unlike "widest_int = wide_int"
> and "offset_int = wide_int", they would never trigger an assertion
> failure in themselves.  But I think in practice they're always going to
> be a mistake.  The only arithmetic you can do on the resulting wide_int
> is with wide_ints that have been assigned in the same way, in which case
> you should be doing the arithmetic on widest_int or offset_int instead.
> And if you don't want to do arithmetic, but simply want to access the value,
> you should use wide_int_ref instead.  This avoids unnecessary copying.
>
> This patch adds an extra STATIC_ASSERT to trap that case and fixes
> the minor fallout.  The to_mpz change contains another fix: we applied
> small_prec without checking whether len was the maximum value.
> Also, it seems safer to use alloca now that we have the extra-wide
> integer in vrp.
>
> Putting the STATIC_ASSERTs in their own scope is a bit clunky, but a lot
> of this code would be much cleaner with C++11...
>
> Tested on x86_64-linux-gnu and included in the asm comparison.  OK to install?
>
> Thanks,
> Richard
>
>
> Index: gcc/emit-rtl.c
> ===
> --- gcc/emit-rtl.c	2014-04-24 08:30:16.191670326 +0100
> +++ gcc/emit-rtl.c	2014-04-24 08:30:19.117694968 +0100
> @@ -535,7 +535,7 @@ lookup_const_wide_int (rtx wint)
> (if TARGET_SUPPORTS_WIDE_INT).  */
>  
>  rtx
> -immed_wide_int_const (const wide_int &v, enum machine_mode mode)
> +immed_wide_int_const (const wide_int_ref &v, enum machine_mode mode)
>  {
>unsigned int len = v.get_len ();
>unsigned int prec = GET_MODE_PRECISION (mode);
> Index: gcc/rtl.h
> ===
> --- gcc/rtl.h 2014-04-24 08:30:16.191670326 +0100
> +++ gcc/rtl.h 2014-04-24 08:30:19.117694968 +0100
> @@ -2008,7 +2008,7 @@ extern double_int rtx_to_double_int (con
>  #endif
>  extern void cwi_output_hex (FILE *, const_rtx);
>  #ifndef GENERATOR_FILE
> -extern rtx immed_wide_int_const (const wide_int &cst, enum machine_mode mode);
> +extern rtx immed_wide_int_const (const wide_int_ref &, enum machine_mode);
>  #endif
>  #if TARGET_SUPPORTS_WIDE_INT == 0
>  extern rtx immed_double_const (HOST_WIDE_INT, HOST_WIDE_INT,
> Index: gcc/tree-ssa-ccp.c
> ===
> --- gcc/tree-ssa-ccp.c	2014-04-24 08:30:16.191670326 +0100
> +++ gcc/tree-ssa-ccp.c	2014-04-24 08:30:19.118694976 +0100
> @@ -218,7 +218,8 @@ dump_lattice_value (FILE *outf, const ch
>   }
>else
>   {
> -   wide_int cval = wi::bit_and_not (wi::to_widest (val.value), val.mask);
> +   widest_int cval = wi::bit_and_not (wi::to_widest (val.value),
> +  val.mask);
> fprintf (outf, "%sCONSTANT ", prefix);
> print_hex (cval, outf);
> fprintf (outf, " (");
> @@ -1249,7 +1250,7 @@ bit_value_binop_1 (enum tree_code code,
>  case RROTATE_EXPR:
>if (r2mask == 0)
>   {
> -   wide_int shift = r2val;
> +   widest_int shift = r2val;
> if (shift == 0)
>   {
> *mask = r1mask;
> @@ -1286,7 +1287,7 @@ bit_value_binop_1 (enum tree_code code,
>is zero.  */
>if (r2mask == 0)
>   {
> -   wide_int shift = r2val;
> +   widest_int shift = r2val;
> if (shift == 0)
>   {
> *mask = r1mask;
> Index: gcc/tree-vrp.c
> ===
> --- gcc/tree-vrp.c	2014-04-24 08:30:16.191670326 +0100
> +++ gcc/tree-vrp.c	2014-04-24 08:30:19.119694985 +0100
> @@ -3860,7 +3860,8 @@ adjust_range_with_scev (value_range_t *v
> signop sgn = TYPE_SIGN (TREE_TYPE (step));
> bool overflow;
>  
> -   wide_int wtmp = wi::mul (wi::to_widest (step), nit, sgn, &overflow);
> +   widest_int wtmp = wi::mul (wi::to_widest (step), nit, sgn,
> +  &overflow);
> /* If the multiplication overflowed we can't do a meaningful
>adjustment.  Likewise if the result doesn't fit in the type
>

Re: [PATCH] Optionally trap on impossible devirtualization

2014-04-28 Thread Jakub Jelinek

On Mon, Apr 28, 2014 at 11:05:06AM +0200, Richard Biener wrote:
> On Fri, Apr 25, 2014 at 5:35 PM, Martin Jambor  wrote:
> > Hi,
> >
> > the patch below might be useful for testcase preparation and debugging
> > compiler bugs such as PR 60965.  When
> > -ftrap-on-impossible-devirtualization is supplied on the command line,
> > it makes the devirtualization produce __builtin_trap instead of
> > __builtin_unreachable when it comes to the conclusion that there is no
> > legal target of a virtual call.
> >
> > Apart from dealing with our bugs, it may be even useful to debug
> > compiled programs when a user triggers some sort of illegal
> > devirtualization, typically by missing a type check somewhere.
> > Currently the compiled program might simply take a wrong branch, with
> > the patch it will abort.
> >
> > Bootstrapped and tested (with the option on) on x86_64-linux, I have
> > also successfully LTO built Firefox with it.  If I add some
> > documentation, would you like to see this in trunk?
> 
> It's useful for debugging, so yes.  Not sure about the option name though.
> Maybe we should have a generic -ftrap-on-unreachable flag instead
> and handle all __builtin_unreachable () like that (for example by
> folding or by simply making __builtin_unreachable () alias to __builtin_trap ()).

-fsanitize=unreachable should already do that.  With
-fsanitize=unreachable -fsanitize-undefined-trap-on-error
it should fold __builtin_unreachable () to __builtin_trap (), otherwise
to __ubsan_handle_builtin_unreachable () call.

So, from this POV, the new option is redundant.

Jakub
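
A small sketch of what this means for user code.  The enum and the function
name are invented for the example.  Built normally, the
__builtin_unreachable () call just tells the optimizers the point cannot be
reached; built with -fsanitize=unreachable
-fsanitize-undefined-trap-on-error, reaching it should trap as described
above.

```c
/* Hypothetical example type.  */
enum shape { TRIANGLE, SQUARE };

/* Return the number of sides of S.  Every valid enumerator returns from
   inside the switch, so the statement after it is never reached for
   valid input.  */
int
shape_sides (enum shape s)
{
  switch (s)
    {
    case TRIANGLE:
      return 3;
    case SQUARE:
      return 4;
    }
  __builtin_unreachable ();
}
```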

Re: Examples of gimple statement API (was Re: [PATCH 03/89] Introduce gimple_bind and use it for accessors.)

2014-04-28 Thread Richard Biener

On Fri, Apr 25, 2014 at 5:28 PM, David Malcolm  wrote:
> On Fri, 2014-04-25 at 10:37 +0200, Richard Biener wrote:
>> On Thu, Apr 24, 2014 at 4:59 PM, David Malcolm  wrote:
>> > On Thu, 2014-04-24 at 09:09 -0400, Andrew MacLeod wrote:
>> >> On 04/24/2014 04:33 AM, Richard Biener wrote:
>> >> > On Wed, Apr 23, 2014 at 11:23 PM, Jeff Law  wrote:
>> >> >> On 04/23/14 15:13, David Malcolm wrote:
>> >> >>> On Wed, 2014-04-23 at 15:04 -0600, Jeff Law wrote:
>> >>  On 04/21/14 10:56, David Malcolm wrote:
>> >> > This updates all of the gimple_bind_* accessors in gimple.h from 
>> >> > taking
>> >> > a
>> >> > plain gimple to taking a gimple_bind (or const_gimple_bind), with 
>> >> > the
>> >> > checking happening at the point of cast.
>> >> >
>> >> > Various other types are strengthened from gimple to gimple_bind, and
>> >> > from
>> >> > plain vec to vec.
>> >> >>>
>> >> >>> [...]
>> >> >>>
>> >>  This is fine, with the same requested changes as #2; specifically 
>> >>  using
>> >>  an explicit cast rather than hiding the conversion in a method.  Once
>> >>  those changes are in place, it's good for 4.9.1.
>> >> >>> Thanks - presumably you mean
>> >> >>> "good for *trunk* after 4.9.1 is released"
>> >> >> Right.  Sorry for the confusion.
>> >> > Note I still want that less-typedefs (esp. the const_*) variants to be 
>> >> > explored.
>> >> > Changing this will touch all the code again, so I'd like to avoid that.
>> >> >
>> >> > That is, shouldn't we go back to 'gimple' being 'gimple_statement_base'
>> >> > and not 'gimple_statement_base *'?  The main reason that we have so
>> >> > many typedefs is that in C you had to use 'struct foo' each time you
>> >> > refer to foo as a type - I suppose it was then convenient to do the
>> >> > typedef to the pointer type.  With 'gimple' being not a pointer type
>> >> > we get const correctness in the way people would expect it to work.
>> >> > [no, I don't suggest you change 'tree' or 'const_tree' at this point, 
>> >> > just
>> >> > gimple (and maybe gimple_seq) as you are working on the 'gimple'
>> >> > type anyway].
>> >> >
>> >> >
>> >>
>> >> So if we change 'gimple' everywhere to be 'gimple *', can we just
>> >> abandon the 'gimple' typedef completely and go directly to using
>> >> something like gimple_stmt, or some other agreeable name instead?
>> >>
>> >> I think it's more descriptive and then frees up the generic 'gimple' name
>> >> should we decide to do something more with namespaces in the future...
>> >
>> > There have been a few different proposals as to what the resulting
>> > gimple API might look like, in various subthreads of this discussion,
>> > so I thought it might help the discussion to gather up the proposals,
>> > and to apply them to some specific code examples, to see what the
>> > results might look like.
>> >
>> > So here are a couple of code fragments, from gcc/graphite-sese-to-poly.c
>> > and gcc/tree-ssa-uninit.c respectively:
>> >
>> > Status quo
>> > ==
>> >
>> >static gimple
>> >detect_commutative_reduction (scop_p scop, gimple stmt, vec *in,
>> >  vec *out)
>> >{
>> >  if (scalar_close_phi_node_p (stmt))
>> >{
>> >  gimple def, loop_phi, phi, close_phi = stmt;
>> >  tree init, lhs, arg = gimple_phi_arg_def (close_phi, 0);
>> >
>> >  if (TREE_CODE (arg) != SSA_NAME)
>> >
>> >/* ...etc... */
>> >
>> >static unsigned int
>> >execute_late_warn_uninitialized (void)
>> >{
>> >  basic_block bb;
>> >  gimple_stmt_iterator gsi;
>> >  vec worklist = vNULL;
>> >  pointer_set_t *added_to_worklist;
>> >
>> > The currently-posted patch series
>> > =
>> > Here's the cumulative effect of the patch series I posted, using the
>> > casting methods of the base class (the "stmt->as_a_gimple_phi" call):
>> >
>> >   -static gimple
>> >   +static gimple_phi
>> >detect_commutative_reduction (scop_p scop, gimple stmt, vec *in,
>> > vec *out)
>> >{
>> >  if (scalar_close_phi_node_p (stmt))
>> >{
>> >   -  gimple def, loop_phi, phi, close_phi = stmt;
>> >   +  gimple def;
>> >   +  gimple_phi loop_phi, phi, close_phi = stmt->as_a_gimple_phi ();
>> >  tree init, lhs, arg = gimple_phi_arg_def (close_phi, 0);
>> >
>> >  if (TREE_CODE (arg) != SSA_NAME)
>> >
>> >/* ...etc... */
>> >
>> >execute_late_warn_uninitialized (void)
>> >{
>> >  basic_block bb;
>> >   -  gimple_stmt_iterator gsi;
>> >   -  vec worklist = vNULL;
>> >   +  gimple_phi_iterator gsi;
>> >   +  vec worklist = vNULL;
>> >  pointer_set_t *added_to_worklist;
>> >
>> > Direct use of is-a.h, retaining typedefs of pointers
>> > 
>> > The following patch shows what the above might look like using the patch
>> > series as posted, but eliminating the
