[wwwdocs] GCC 6 Release Notes for RTEMS

2015-09-03 Thread Sebastian Huber

Index: htdocs/gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.25
diff -u -r1.25 changes.html
--- htdocs/gcc-6/changes.html   25 Aug 2015 22:27:46 -  1.25
+++ htdocs/gcc-6/changes.html   4 Sep 2015 06:21:14 -
@@ -203,6 +203,23 @@

 

+
+  
+The RTEMS thread model implementation changed.  For the mutexes,
+self-contained objects defined in Newlib  are used
+instead of Classic API semaphores.  The keys for thread-specific data and
+the once function are directly defined via .
+Self-contained condition variables are provided via Newlib
+.  The RTEMS thread model now supports C++11
+threads.
+
+The OpenMP support now uses self-contained objects provided by Newlib
+ and offers significantly better performance compared
+to the POSIX configuration of libgomp.  It is possible to
+configure thread pools for each scheduler instance via the environment
+variable GOMP_RTEMS_THREAD_POOLS.
+  
+
 


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.



Re: [PATCH] Make ubsan tests less picky about ansi escape codes in diagnostics.

2015-09-03 Thread Yury Gribov

On 09/03/2015 07:45 PM, Jonathan Roelofs wrote:



On 9/3/15 10:17 AM, Jakub Jelinek wrote:

On Thu, Sep 03, 2015 at 10:15:02AM -0600, Jonathan Roelofs wrote:

+kcc, mrs

Ping

On 8/27/15 4:44 PM, Jonathan Roelofs wrote:

The attached patch makes the ubsan tests agnostic to ansi escape codes
in their diagnostic output.


It wouldn't hurt if you explained in detail what problem you are
trying to solve and why something that works for most people doesn't
work in your case.


Hi Jakub,

AFAICT, there are two ways to suppress the emission of color codes from
ubsan's diagnostics:

   1) Set an environment variable.
   2) Make the output stream not a tty.

#1 doesn't seem to be possible in DejaGnu without hacks.


AFAIR it can't be done for remote targets due to DejaGnu design limitations.


#2 doesn't work in our environment because DejaGnu attempts to make
itself appear to the program under test as if it were a tty. This might
be an artifact of the fact that all of our testing is remote testing
(though that is just blind speculation on my part).


AFAIK that's indeed the case.
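The tty dependence discussed above can be sketched in C. This is a minimal, hypothetical model of an "auto" color policy; the function `want_color` and its `setting` strings are assumptions, loosely mirroring the sanitizer runtimes' `color` option (always/never/auto). An explicit setting wins; otherwise color is used only when the descriptor is a terminal. Under DejaGnu the program typically runs on a pseudo-terminal allocated by Expect, so `isatty` reports true; a pipe does not.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h> /* isatty, pipe */

/* Hypothetical color policy: "always" and "never" override; anything else,
   including NULL, behaves like "auto" and colorizes only on a terminal. */
static bool want_color(int fd, const char *setting)
{
    if (setting != NULL && strcmp(setting, "always") == 0)
        return true;
    if (setting != NULL && strcmp(setting, "never") == 0)
        return false;
    /* isatty() returns 1 for a terminal, including a pseudo-terminal such as
       the one a DejaGnu/Expect harness allocates, and 0 otherwise. */
    return isatty(fd) == 1;
}
```

With "auto", writing diagnostics into a pipe yields no escape codes, which is exactly the behavior a harness that fakes a tty defeats.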

Added Max.

-Y


Re: [wwwdocs] Skeleton for GCC 6 release notes

2015-09-03 Thread Sebastian Huber



On 04/09/15 00:41, Gerald Pfeifer wrote:

Hi Sebastian,

On Thu, 3 Sep 2015, Sebastian Huber wrote:

>how can I add something to the release notes? I would like to mention
>some RTEMS changes.

is it possible that checking out https://gcc.gnu.org/about.html is all
you are looking for, or am I thinking too simple? :-)



I had searched the web and found this page before. Then I clicked on "browse
the repository" and landed in the GCC sources. This somehow confused me.


Using the CVS checkout leads to the right repository.




[PATCH] select isl-0.15 in download_prerequisites

2015-09-03 Thread VandeVondele Joost
For the recent fix of PR53852, isl-0.15 is needed, which is already available 
at ftp://gcc.gnu.org/pub/gcc/infrastructure/ . Thus, it seems to make sense to 
update the download_prerequisites script, as done with the attached patch.

OK for trunk?

Joost

Index: contrib/download_prerequisites
===
--- contrib/download_prerequisites	(revision 227480)
+++ contrib/download_prerequisites	(working copy)
@@ -43,7 +43,7 @@ ln -sf $MPC mpc || exit 1
 
 # Necessary to build GCC with the Graphite loop optimizations.
 if [ "$GRAPHITE_LOOP_OPT" = "yes" ] ; then
-  ISL=isl-0.14
+  ISL=isl-0.15
 
   wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$ISL.tar.bz2 || exit 1
   tar xjf $ISL.tar.bz2  || exit 1
contrib/ChangeLog:

2015-09-04  Joost VandeVondele  

* download_prerequisites: Update to isl-0.15.



Re: [libffi] Correct powerpc sysv stack argument accounting (#194)

2015-09-03 Thread Alan Modra
On Tue, Aug 04, 2015 at 08:23:46AM -0700, Richard Henderson wrote:
> Looks good, Alan.  Thanks.  After this gets merged, I guess it's
> worth merging back to gcc.

It's been a month since I created the pull request and posted
https://sourceware.org/ml/libffi-discuss/2015/msg00079.html

Given that things are going rather slow in libffi land at the moment,
perhaps I should merge this to gcc now?

-- 
Alan Modra
Australia Development Lab, IBM


Re: [wwwdocs] Skeleton for GCC 6 release notes

2015-09-03 Thread Gerald Pfeifer
Hi Sebastian,

On Thu, 3 Sep 2015, Sebastian Huber wrote:
> how can I add something to the release notes? I would like to mention 
> some RTEMS changes.

is it possible that checking out https://gcc.gnu.org/about.html is all
you are looking for, or am I thinking too simple? :-)

> The RTEMS thread model implementation changed. For the mutexes,
> self-contained objects defined in Newlib  are used instead
> of Classic API semaphores.  The keys and the once function are directly
> defined via .  Condition variables are provided via Newlib
> . The RTEMS thread model now supports C++11 threads.
>
> The OpenMP support now uses self-contained objects provided by Newlib
>  and offers significantly better performance compared to the
> POSIX configuration. It is possible to configure scheduler instance specific
> thread pools.

Absolutely, please go ahead and add this to the release notes.

I'll be happy to have a look and review, or advise if you have
any questions.

Gerald


Re: [patch] Clean up libstdc++ includes slightly.

2015-09-03 Thread Jonathan Wakely

I'm committing the __throw_bad_alloc() part on the branch too.

commit 02221ce47cade82036c7d78ed79e5fe536fdfcfd
Author: Jonathan Wakely 
Date:   Thu Sep 3 23:01:02 2015 +0100

	* include/std/shared_mutex (shared_timed_mutex::shared_timed_mutex):
	Replace throw with __throw_bad_alloc.

diff --git a/libstdc++-v3/include/std/shared_mutex b/libstdc++-v3/include/std/shared_mutex
index b72a822..7b216a5 100644
--- a/libstdc++-v3/include/std/shared_mutex
+++ b/libstdc++-v3/include/std/shared_mutex
@@ -74,7 +74,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   int __ret = pthread_rwlock_init(&_M_rwlock, NULL);
   if (__ret == ENOMEM)
-	throw bad_alloc();
+	__throw_bad_alloc();
   else if (__ret == EAGAIN)
 	__throw_system_error(int(errc::resource_unavailable_try_again));
   else if (__ret == EPERM)
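The hunk above dispatches on the value returned by `pthread_rwlock_init` (which, like other pthread functions, returns an error number instead of setting `errno`); the change only swaps `throw bad_alloc()` for `__throw_bad_alloc()` so the header still works with `-fno-exceptions`. A C sketch of the same dispatch, with a hypothetical enum standing in for the C++ exceptions. The quoted diff is cut off at the `EPERM` branch, so the mapping used for it here is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for what the C++ constructor reports. */
enum rwlock_init_error {
    RWLOCK_INIT_OK,            /* 0: lock initialized */
    RWLOCK_INIT_BAD_ALLOC,     /* ENOMEM -> __throw_bad_alloc() */
    RWLOCK_INIT_TRY_AGAIN,     /* EAGAIN -> resource_unavailable_try_again */
    RWLOCK_INIT_NOT_PERMITTED, /* EPERM: assumed mapping */
    RWLOCK_INIT_OTHER          /* any other error number */
};

/* Classify a return value of pthread_rwlock_init(). */
static enum rwlock_init_error classify_rwlock_init(int ret)
{
    switch (ret) {
    case 0:      return RWLOCK_INIT_OK;
    case ENOMEM: return RWLOCK_INIT_BAD_ALLOC;
    case EAGAIN: return RWLOCK_INIT_TRY_AGAIN;
    case EPERM:  return RWLOCK_INIT_NOT_PERMITTED;
    default:     return RWLOCK_INIT_OTHER;
    }
}
```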


[PATCH, rs6000] Add memory barriers to tbegin, tend, etc.

2015-09-03 Thread Peter Bergner
While debugging a transaction lock elision issue, we noticed that the
compiler was moving some loads/stores outside of the transaction body,
because the HTM instructions were not marked as memory barriers, which
is bad.  Looking deeper, I also noticed that neither Intel nor S390
have their HTM instructions marked as memory barriers, although
Andi did submit a patch last year:

https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02999.html

Richi and r~ both said the memory barrier should be part of the patterns:

https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02235.html

The following patch uses that suggestion by adding memory barriers to
all of our HTM instructions, which fixes the issue.  I have also added
a __TM_FENCE__ macro users can test to see whether the compiler treats
the HTM instructions as memory barriers or not, in case they want to
explicitly add memory barriers to their code when using older compilers.
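The `__TM_FENCE__` test described above might be used as follows. This is a sketch under assumptions: `__HTM__` is taken to be the macro GCC defines for `-mhtm`, the empty `asm` with a `"memory"` clobber is a compiler-only barrier (it emits no instruction), and `tx_begin`, `tx_end` and `increment` are hypothetical helpers. Without HTM the sketch falls back to a path that never starts a transaction, so it builds anywhere.

```c
#include <assert.h>

/* Compiler-only barrier: the "memory" clobber stops GCC from moving loads
   and stores across this point. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* If the compiler already treats the HTM builtins as memory barriers
   (__TM_FENCE__ defined), the extra barrier is unnecessary. */
#ifdef __TM_FENCE__
#define HTM_FENCE() ((void)0)
#else
#define HTM_FENCE() COMPILER_BARRIER()
#endif

#ifdef __HTM__
/* __builtin_tbegin returns nonzero when the transaction started; after an
   abort, execution resumes here with it returning 0. */
static int tx_begin(void)
{
    int started = __builtin_tbegin(0);
    HTM_FENCE(); /* keep transactional accesses below tbegin */
    return started;
}
static void tx_end(void)
{
    HTM_FENCE(); /* keep transactional accesses above tend */
    __builtin_tend(0);
}
#else
/* Hypothetical fallback for targets without HTM: never start a transaction. */
static int tx_begin(void) { return 0; }
static void tx_end(void) { }
#endif

/* Hypothetical example: update a counter transactionally when possible. */
static void increment(int *counter)
{
    if (tx_begin()) {
        ++*counter; /* must stay inside the transaction body */
        tx_end();
    } else {
        ++*counter; /* fallback path; real code would take a lock here */
    }
}
```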

On a glibc thread discussing this issue, Torvald also asked that I add
documentation describing the memory consistency semantics the HTM instructions
should have, so I added a blurb about that.  Torvald, is the text below
what you were looking for?

This has passed bootstrap/regtesting on powerpc64le-linux.  Is this ok
for mainline?

Since this is a correctness issue, I'd like to eventually backport this to
the release branches.  Is that ok once I've verified bootstrap/regtesting
on them?

Once this is committed, I can take a stab at fixing Intel and S390 similarly,
unless someone beats me to it (hint hint :).  I'd need help testing it though,
since I don't have access to Intel or S390 hardware that supports HTM.

Peter

* config/rs6000/htm.md (UNSPEC_HTM_FENCE): New.
(tabort, tabortc, tabortci, tbegin, tcheck, tend,
trechkpt, treclaim, tsr, ttest): Rename define_insns from this...
(*tabort, *tabortc, *tabortci, *tbegin, *tcheck, *tend,
*trechkpt, *treclaim, *tsr, *ttest): ...to this.  Add memory barrier.
(tabort, tabortc, tabortci, tbegin, tcheck, tend,
trechkpt, treclaim, tsr, ttest): New define_expands.
* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
__TM_FENCE__ for htm.
* doc/extend.texi: Update documentation for htm builtins.

Index: gcc/config/rs6000/htm.md
===
--- gcc/config/rs6000/htm.md(revision 227308)
+++ gcc/config/rs6000/htm.md(working copy)
@@ -45,96 +45,231 @@
UNSPECV_HTM_MTSPR
   ])
 
+;;
+;; UNSPEC usage
+;;
 
-(define_insn "tabort"
+(define_c_enum "unspec"
+  [UNSPEC_HTM_FENCE
+  ])
+
+(define_expand "tabort"
+  [(parallel
+ [(set (match_operand:CC 1 "cc_reg_operand" "=x")
+  (unspec_volatile:CC [(match_operand:SI 0 "base_reg_operand" "b")]
+  UNSPECV_HTM_TABORT))
+  (set (match_dup 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[2] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[2]) = 1;
+})
+
+(define_insn "*tabort"
   [(set (match_operand:CC 1 "cc_reg_operand" "=x")
(unspec_volatile:CC [(match_operand:SI 0 "base_reg_operand" "b")]
-   UNSPECV_HTM_TABORT))]
+   UNSPECV_HTM_TABORT))
+   (set (match_operand:BLK 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tabort. %0"
   [(set_attr "type" "htm")
(set_attr "length" "4")])
 
-(define_insn "tabortc"
+(define_expand "tabortc"
+  [(parallel
+ [(set (match_operand:CC 3 "cc_reg_operand" "=x")
+  (unspec_volatile:CC [(match_operand 0 "u5bit_cint_operand" "n")
+   (match_operand:GPR 1 "gpc_reg_operand" "r")
+   (match_operand:GPR 2 "gpc_reg_operand" "r")]
+  UNSPECV_HTM_TABORTXC))
+  (set (match_dup 4) (unspec:BLK [(match_dup 4)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[4] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[4]) = 1;
+})
+
+(define_insn "*tabortc"
   [(set (match_operand:CC 3 "cc_reg_operand" "=x")
(unspec_volatile:CC [(match_operand 0 "u5bit_cint_operand" "n")
 (match_operand:GPR 1 "gpc_reg_operand" "r")
 (match_operand:GPR 2 "gpc_reg_operand" "r")]
-   UNSPECV_HTM_TABORTXC))]
+   UNSPECV_HTM_TABORTXC))
+   (set (match_operand:BLK 4) (unspec:BLK [(match_dup 4)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tabortc. %0,%1,%2"
   [(set_attr "type" "htm")
(set_attr "length" "4")])
 
-(define_insn "tabortci"
+(define_expand "tabortci"
+  [(parallel
+ [(set (match_operand:CC 3 "cc_reg_operand" "=x")
+  (unspec_volatile:CC [(match_operand 0 "u5bit_cint_operand" "n")
+   (match_operand:GPR 1 "gpc_reg_operand" "r")
+   (match_operand 2 "s5bit_cint

Re: [Fortran, committed] XFAIL read_dir.f90 on FreeBSD

2015-09-03 Thread Janne Blomqvist
On Wed, Sep 2, 2015 at 6:03 PM, Steve Kargl
 wrote:
> On Wed, Sep 02, 2015 at 11:30:07AM +0300, Janne Blomqvist wrote:
>> On Wed, Sep 2, 2015 at 1:28 AM, Jerry DeLisle  wrote:
>> > On 09/01/2015 11:18 AM, Steve Kargl wrote:
>> >> On Tue, Sep 01, 2015 at 11:16:27AM -0700, Steve Kargl wrote:
>> >>> open(unit=10, file='junko.dir',iostat=ios,action='read',access='stream')
>> >>> if (ios.ne.0) call abort
>> >>> read(10, iostat=ios) c
>> >>> -   if (ios.ne.21) call abort
>> >>> +   if (ios.ne.21) then
>> >>> +  close(10)
>> >>
>> >> I forgot to mention that 'close(10, status="delete")' does not
>> >> work on a directory.  Should it?
>> >>
>> >>> +  call system('rmdir junko.dir')
>> >>> +  call abort
>> >>> +   end if
>> >>> +   close(10)
>> >>> call system('rmdir junko.dir')
>> >>
>> >
>> > Thanks for the touch-up, Steve.  I suspect other OSs will not work either.
>> > I assumed close with Status="delete" would not work on a directory.
>>
>> That's because libgfortran uses unlink(2), which only works for files,
>> not directories. One could change that to use remove(3), which works
>> for both.
>
> I suspect people who create directories and then
> want to delete them will use SYSTEM or the
> standard conforming equivalent.

Probably. Anyway, it's no big deal to fix it and shouldn't have any
negative side effects, so I committed the attached patch as obvious.
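The difference between `unlink(2)` and `remove(3)` that motivates this patch can be checked with a short C program. A sketch under assumptions: the helpers `make_temp_dir`, `delete_path` and `path_exists` and the `/tmp` template are hypothetical. `remove()` behaves like `unlink()` for files and like `rmdir()` for directories, whereas `unlink()` on a directory normally fails (on Linux with `EISDIR`).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>    /* remove, snprintf, fopen */
#include <stdlib.h>   /* mkdtemp */
#include <sys/stat.h> /* stat */
#include <unistd.h>   /* unlink */

/* Create a fresh, empty directory under /tmp; returns buf or NULL. */
static char *make_temp_dir(char *buf, size_t size)
{
    if (snprintf(buf, size, "/tmp/remove_demo_XXXXXX") >= (int)size)
        return NULL;
    return mkdtemp(buf);
}

/* Delete a regular file or an empty directory, as libgfortran now does
   for CLOSE with STATUS='DELETE'. */
static int delete_path(const char *path)
{
    return remove(path);
}

static int path_exists(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0;
}
```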

testsuite:

2015-09-04  Janne Blomqvist  

* gfortran.dg/read_dir.f90: Delete empty directory when closing
rather than calling rmdir, cleanup if open fails.


libgfortran:

2015-09-04  Janne Blomqvist  

* io/unix.h (delete_file): Remove prototype.
* io/unix.c (delete_file): Remove function.
* io/close.c (st_close): Replace delete_file and unlink with
remove.
* io/open.c (already_open): Replace unlink with remove.


>> Also, I suspect the reason why it fails on freebsd is that errno
>> EISDIR is not 21 there. Perhaps one should just check for ios /= 0?
>
> I checked.  FreeBSD's EISDIR is 21; however, ios == 0 in this
> case.  I haven't looked too deep.  FreeBSD is probably
> adhering to the unix philosophy of "everything is a file".

Hmm, Ok. Reading the POSIX spec for read()

http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html

it seems an implementation is allowed, but not required, to
return data when reading from a directory fd. The portable way would be to
use readdir().
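The portable `readdir()` approach mentioned above looks roughly like this. A minimal sketch: `count_entries` is a hypothetical helper that lists a directory through `opendir`/`readdir`/`closedir` instead of calling `read()` on a directory descriptor, whose behavior POSIX leaves to the implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h> /* opendir, readdir, closedir */
#include <stdio.h>
#include <stdlib.h> /* mkdtemp */
#include <string.h> /* strcmp */
#include <unistd.h> /* rmdir */

/* Count the entries of a directory, ignoring "." and "..".
   Returns -1 if the directory cannot be opened. */
static long count_entries(const char *path)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;
    long n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0)
            n++;
    closedir(d);
    return n;
}
```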


-- 
Janne Blomqvist
diff --git a/gcc/testsuite/gfortran.dg/read_dir.f90 b/gcc/testsuite/gfortran.dg/read_dir.f90
index 0e28f9f..4009ed6 100644
--- a/gcc/testsuite/gfortran.dg/read_dir.f90
+++ b/gcc/testsuite/gfortran.dg/read_dir.f90
@@ -7,13 +7,14 @@ program bug
integer ios
call system('[ -d junko.dir ] || mkdir junko.dir')
open(unit=10, file='junko.dir',iostat=ios,action='read',access='stream')
-   if (ios.ne.0) call abort
+   if (ios.ne.0) then
+  call system('rmdir junko.dir')
+  call abort
+   end if
read(10, iostat=ios) c
if (ios.ne.21) then 
-  close(10)
-  call system('rmdir junko.dir')
+  close(10, status='delete')
   call abort
end if
-   close(10)
-   call system('rmdir junko.dir')
+   close(10, status='delete')
 end program bug
diff --git a/libgfortran/io/close.c b/libgfortran/io/close.c
index 38855ee..1e10993 100644
--- a/libgfortran/io/close.c
+++ b/libgfortran/io/close.c
@@ -80,7 +80,7 @@ st_close (st_parameter_close *clp)
  if (status == CLOSE_DELETE)
 {
 #if HAVE_UNLINK_OPEN_FILE
- delete_file (u);
+ remove (u->filename);
 #else
  path = strdup (u->filename);
 #endif
@@ -92,7 +92,7 @@ st_close (st_parameter_close *clp)
 #if !HAVE_UNLINK_OPEN_FILE
   if (path != NULL)
{
- unlink (path);
+ remove (path);
  free (path);
}
 #endif
diff --git a/libgfortran/io/open.c b/libgfortran/io/open.c
index 4654de2..630bca6 100644
--- a/libgfortran/io/open.c
+++ b/libgfortran/io/open.c
@@ -664,7 +664,7 @@ already_open (st_parameter_open *opp, gfc_unit * u, unit_flags * flags)
  
 #if !HAVE_UNLINK_OPEN_FILE
   if (u->filename && u->flags.status == STATUS_SCRATCH)
-   unlink (u->filename);
+   remove (u->filename);
 #endif
  free (u->filename);
  u->filename = NULL;
diff --git a/libgfortran/io/unix.c b/libgfortran/io/unix.c
index fd5f277..5385d8b 100644
--- a/libgfortran/io/unix.c
+++ b/libgfortran/io/unix.c
@@ -1716,16 +1716,6 @@ flush_all_units (void)
 }
 
 
-/* delete_file()-- Given a unit structure, delete the file associated
- * with the unit.  Returns nonzero if something went wrong. */
-
-int
-delete_file (gfc_unit * u)
-{
-  return unlink (u->filename);
-}
-
-
 /* file_exists()-- Returns nonzero if the current filename exists on
  * the system */
 
diff --git a/libgfortran/io/unix.h b/libgfortran/io/unix.h
index 78a41f7..d1aa75d 100644
--- a/libgfortran/io/unix.h
+++ b/libgfortran/io/unix.h
@@ -141,9 +141,6 @@ internal_proto(

[PATCH, MIPS] Frame header optimization for MIPS O32 ABI

2015-09-03 Thread Steve Ellcey
Here is an update of my MIPS frame header optimization patch.  This is
actually only one part of the patch but I would like to get this approved
and checked in before proceeding with the second half.

The O32 ABI on MIPS requires that calling functions allocate space on the
stack for arguments that are passed in registers.  That way if the address
of an argument passed in a register is needed, the called function can write
it out to this space.  The new MIPS ABIs have the called functions allocate
this space and they are unaffected by this patch.

This patch looks at which functions a function calls, and if none of them
use the allocated stack space to store arguments, then the calling function
does not allocate this stack space.  In general this optimization is not going
to save any time, because the calling function will still need to allocate
stack space for the return address if nothing else, but it does save space.
Using a caller's allocated space to save the return address if it is not
needed for arguments will be the second part of this optimization.
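The space saving described above is simple arithmetic. A hypothetical C model of the outgoing-argument area of a calling function, not the patch's code (which lives in frame-header-opt.c and mips_compute_frame_info). It assumes four 4-byte argument registers ($a0-$a3), hence a 16-byte reserved area, an 8-byte stack alignment, and that arguments beyond the fourth word are passed on the stack above that area, so the area can only be dropped when every argument fits in registers.

```c
#include <assert.h>
#include <stdbool.h>

/* O32 assumptions: 4 argument registers of 4 bytes, 8-byte stack alignment. */
#define O32_ARG_REGS    4u
#define O32_WORD_BYTES  4u
#define O32_STACK_ALIGN 8u

static unsigned round_up(unsigned n, unsigned align)
{
    return (n + align - 1) / align * align;
}

/* Hypothetical model: bytes a calling function reserves for outgoing
   arguments.  max_arg_words is the most argument words passed to any callee;
   callee_uses_header says whether some callee may spill its register
   arguments into the caller-allocated space. */
static unsigned outgoing_args_bytes(unsigned max_arg_words, bool callee_uses_header)
{
    unsigned bytes = max_arg_words * O32_WORD_BYTES;
    unsigned header = O32_ARG_REGS * O32_WORD_BYTES;

    /* Stack-passed arguments sit above the 16-byte area, so it can only be
       dropped when every argument fits in registers. */
    if (callee_uses_header || max_arg_words > O32_ARG_REGS)
        return round_up(bytes > header ? bytes : header, O32_STACK_ALIGN);
    return 0; /* optimized: no callee writes into the reserved area */
}
```

For example, a function whose callees take at most two arguments and never spill them would reserve 0 bytes instead of 16.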

There is one major restriction: this optimization is not done for PIC code.
There is something about accessing global variables in PIC code on MIPS,
with its ghost instructions and global symbol accesses, that conflicts with
this optimization, so I skip it for PIC code.  I think it only needs to be
skipped in functions where the global pointer register is saved and restored,
but we don't know which those are until very late in the compilation (thus
the ghost instructions), and that is after we need to determine whether or
not we can do this optimization.

I did some testing with this optimization turned on by default at -O2 and
did not have any regressions, but this patch does not turn the option on
by default at -O2 (or any other optimization level).  While that may be a
reasonable thing to do, I would personally like to get this checked in and
have it available to more users before turning it on by default.

FYI: There is one reorganization bit to this patch, some types were moved
from mips.c to mips.h so that they would be visible to the new file
I created (frame-header-opt.c) that needs them.  Other platforms have
these types, particularly machine_function, in their headers so this
is not a departure from what other targets have already done.

Tested on MIPS with mips-mti-linux-gnu.  OK for checkin?

Steve Ellcey
sell...@imgtec.com


2015-09-03  Steve Ellcey  

* config.gcc (mips*-*-*): Add frame-header-opt.o to extra_objs.
* config/mips/frame-header-opt.c: New file.
* config/mips/mips-protos.h (mips_register_frame_header_opt):
Add prototype.
* config/mips/mips.c (mips_compute_frame_info): Check
optimize_call_stack flag.
(mips_option_override): Register new frame_header_opt pass.
(mips_frame_info, mips_int_mask, mips_shadow_set,
machine_function): Move these types to...
* config/mips/mips.h: here.
(machine_function): Add does_not_use_parm_stack_space and
optimize_call_stack fields.
* config/mips/t-mips (frame-header-opt.o): Add new make rule.
* doc/invoke.texi (-mframe-header-opt, -mno-frame-header-opt):
Document new flags.
* config/mips/mips.opt (mframe-header-opt): Add new option.


diff --git a/gcc/config.gcc b/gcc/config.gcc
index 5712547e..eea97de 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -420,6 +420,7 @@ microblaze*-*-*)
 mips*-*-*)
cpu_type=mips
extra_headers="loongson.h"
+   extra_objs="frame-header-opt.o"
extra_options="${extra_options} g.opt fused-madd.opt mips/mips-tables.opt"
;;
 nds32*)
diff --git a/gcc/config/mips/frame-header-opt.c b/gcc/config/mips/frame-header-opt.c
index e69de29..5db5385 100644
--- a/gcc/config/mips/frame-header-opt.c
+++ b/gcc/config/mips/frame-header-opt.c
@@ -0,0 +1,261 @@
+/* Analyze functions to determine if calling functions need to allocate
+   stack space for their called functions to write out their arguments
+   onto the stack.  This optimization is only applicable to TARGET_OLDABI
+   targets because calling functions on TARGET_NEWABI targets do not
+   allocate any stack space for arguments (the called function does it 
+   if needed).
+
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+

Re: RFC: Combine of compare & and oddity

2015-09-03 Thread Jeff Law

On 09/03/2015 01:14 PM, Richard Sandiford wrote:


If the (and ...) form is a better canonical form (IMO yes) then
I think it would be better to make it the canonical form across
the baord and update the existing ports to use it.  The criteria
could be something like no unjustifiable differences in gcc.dg,
g++.dg and gcc.c-torture .s output for -O2, which is relatively
easy to test.

I'd support this.

jeff


[patch] Clean up libstdc++ includes slightly.

2015-09-03 Thread Jonathan Wakely

This adjusts some missing or redundant includes, and replaces "throw
bad_alloc()" (which won't work with -fno-exceptions) with a call to
__throw_bad_alloc().

Tested powerpc64le-linux, committed to trunk.

commit ca17448c303cfd58191c64abe42a750c9590aa14
Author: Jonathan Wakely 
Date:   Thu Sep 3 21:02:41 2015 +0100

Clean up libstdc++ includes slightly.

	* include/bits/shared_ptr_base.h: Add required header.
	* include/std/condition_variable: Likewise.
	* include/std/mutex: Remove unused header.
	* include/std/shared_mutex: Remove redundant header.
	(shared_mutex::shared_mutex()): Replace throw with __throw_bad_alloc.

diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
index 820edcb..f2f577b 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -49,6 +49,7 @@
 #ifndef _SHARED_PTR_BASE_H
 #define _SHARED_PTR_BASE_H 1
 
+#include 
 #include 
 #include 
 
@@ -67,8 +68,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   class bad_weak_ptr : public std::exception
   {
   public:
-virtual char const*
-what() const noexcept;
+virtual char const* what() const noexcept;
 
 virtual ~bad_weak_ptr() noexcept;
   };
diff --git a/libstdc++-v3/include/std/condition_variable b/libstdc++-v3/include/std/condition_variable
index f7da017..fbed043 100644
--- a/libstdc++-v3/include/std/condition_variable
+++ b/libstdc++-v3/include/std/condition_variable
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #if defined(_GLIBCXX_HAS_GTHREADS) && defined(_GLIBCXX_USE_C99_STDINT_TR1)
 
diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index 790508c..fbf1740 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -44,7 +44,6 @@
 #include 
 #include 
 #include  // for std::swap
-#include 
 
 #ifdef _GLIBCXX_USE_C99_STDINT_TR1
 
diff --git a/libstdc++-v3/include/std/shared_mutex b/libstdc++-v3/include/std/shared_mutex
index ae5f199..69107cc 100644
--- a/libstdc++-v3/include/std/shared_mutex
+++ b/libstdc++-v3/include/std/shared_mutex
@@ -36,7 +36,6 @@
 #else
 
 #include 
-#include 
 #include 
 #include 
 
@@ -80,7 +79,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   int __ret = pthread_rwlock_init(&_M_rwlock, NULL);
   if (__ret == ENOMEM)
-	throw bad_alloc();
+	__throw_bad_alloc();
   else if (__ret == EAGAIN)
 	__throw_system_error(int(errc::resource_unavailable_try_again));
   else if (__ret == EPERM)


Re: [patch] libstdc++/65473 Make define libstdc++ version macros.

2015-09-03 Thread Jonathan Wakely

On 03/09/15 13:22 -0600, Martin Sebor wrote:

On 09/03/2015 04:58 AM, Jonathan Wakely wrote:

This change would allow including  to be used to check for
__GLIBCXX__ and detect whether youre using libstdc++ or not. Howard
Hinnant recommends including that header for libc++ because it has no
other effects in C++.

We could make every  header include  so that
any of them can be used, but I can't be bothered doing that change!
This makes it work for the one header that is recommended to be used,
but of course that doesn't help people using older versions of
libstdc++, who still need to include some other header.

Is this worth doing?


I'd say it's worth doing consistently, in every header.


OK, I'll add it to the others as well.


Re: [patch] libstdc++/65473 Make define libstdc++ version macros.

2015-09-03 Thread Martin Sebor

On 09/03/2015 04:58 AM, Jonathan Wakely wrote:

This change would allow including  to be used to check for
__GLIBCXX__ and detect whether youre using libstdc++ or not. Howard
Hinnant recommends including that header for libc++ because it has no
other effects in C++.

We could make every  header include  so that
any of them can be used, but I can't be bothered doing that change!
This makes it work for the one header that is recommended to be used,
but of course that doesn't help people using older versions of
libstdc++, who still need to include some other header.

Is this worth doing?


I'd say it's worth doing consistently, in every header. Users are
told by others (e.g., on various discussion forums) to expect to
be able to check what C++ library implementation they're using by
including any C++ standard header and testing the known version
macros.

Martin

PS Out of curiosity I looked to see which headers don't include
c++config.h.

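$ (for f in cassert ccomplex cctype cerrno cfenv cfloat cinttypes ciso646 climits clocale cmath csetjmp csignal cstdalign cstdarg cstdbool cstddef cstdint cstdio cstdlib cstring ctgmath ctime cuchar; do printf "  %-20s " "<$f>" && echo "#include <$f>" | ~/bin/gcc-5.1.0/bin/g++ -E -std=c++14 -xc++ - | grep -l "c++config\.h" | wc -l; done )

  <cassert>            0
  <ccomplex>           1
  <cctype>             1
  <cerrno>             0
  <cfenv>              1
  <cfloat>             0
  <cinttypes>          1
  <ciso646>            0
  <climits>            0
  <clocale>            1
  <cmath>              1
  <csetjmp>            1
  <csignal>            1
  <cstdalign>          1
  <cstdarg>            1
  <cstdbool>           1
  <cstddef>            1
  <cstdint>            1
  <cstdio>             1
  <cstdlib>            1
  <cstring>            1
  <ctgmath>            1
  <ctime>              1
  <cuchar>             <stdin>:1:18: fatal error: cuchar: No such file or directory
compilation terminated.
0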



Re: RFC: Combine of compare & and oddity

2015-09-03 Thread Segher Boessenkool
On Thu, Sep 03, 2015 at 07:53:12PM +0100, Wilco Dijkstra wrote:
> > > >>You will end up with a *lot* of target hooks like this.  It will also
> > > >>make testing harder (less coverage).  I am not sure that is a good idea.
> > > >
> > > >We certainly need a lot more target hooks in general so GCC can do the
> > > >right thing (rather than using costs inconsistently all over the place).
> > > >But that's a different discussion...
> > > Let's be very careful here, target hooks aren't always the solution.
> > > I'd rather see the costing models fixed and use those across the board.
> > >  But frankly, I don't know how to fix the costing models.
> > 
> > Combine doesn't currently use costs to decide how to simplify and
> > canonicalise things.  Simplifications are what is simpler RTL; combine's
> > job is to make fewer RTL instructions (which is not the same thing as
> > fewer machine instructions, or cheaper instructions).  Changing what is
> > canonical based on target hooks would be, uh, interesting.
> 
> Would it be reasonable to query the rtx_cost of a compare+and and if the cost
> is the same as an AND assume that that instruction does not need to be
> "improved" into the canonical form? That way it will use the compare+and
> pattern if it exists and still try the zero_extract/shift+and forms for
> targets that don't have a compare+and instruction.

At the point the canonicalisation is done you do not yet know if this
is a valid instruction at all.  Introducing more cost computations for
random things is not such a great idea, and for RTL that can never be
part of a machine instruction doubly so.

I think we really should just change what is the canonical form for such
a comparison.
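The two shapes under discussion test the same bits, which is why picking one canonical form is a question of representation, not semantics. A C sketch with hypothetical function names: `test_and` masks in place, like `(compare (and x mask) 0)`, and `test_extract` first extracts the bit-field and then tests it, roughly what a `zero_extract` comparison denotes. `forms_agree` checks every contiguous field of a 32-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low LEN bits set (1 <= len <= 32). */
static uint32_t low_mask(unsigned len)
{
    return len >= 32 ? UINT32_MAX : (((uint32_t)1 << len) - 1);
}

/* "and" form: test the field in place with a shifted mask. */
static bool test_and(uint32_t x, unsigned pos, unsigned len)
{
    return (x & (low_mask(len) << pos)) != 0;
}

/* "extract" form: move the field down to bit 0, then test it. */
static bool test_extract(uint32_t x, unsigned pos, unsigned len)
{
    return ((x >> pos) & low_mask(len)) != 0;
}

/* True if both forms agree for every field that fits in 32 bits. */
static bool forms_agree(uint32_t x)
{
    for (unsigned pos = 0; pos < 32; pos++)
        for (unsigned len = 1; pos + len <= 32; len++)
            if (test_and(x, pos, len) != test_extract(x, pos, len))
                return false;
    return true;
}
```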


Segher


Re: RFC: Combine of compare & and oddity

2015-09-03 Thread Richard Sandiford
Jeff Law  writes:
> On 09/03/2015 12:53 PM, Wilco Dijkstra wrote:
>>> Segher Boessenkool wrote:
>>> On Thu, Sep 03, 2015 at 10:09:36AM -0600, Jeff Law wrote:
>> You will end up with a *lot* of target hooks like this.  It will also
>> make testing harder (less coverage).  I am not sure that is a good idea.
>
> We certainly need a lot more target hooks in general so GCC can do the
> right thing (rather than using costs inconsistently all over the place).
> But that's a different discussion...
 Let's be very careful here, target hooks aren't always the solution.
 I'd rather see the costing models fixed and use those across the board.
   But frankly, I don't know how to fix the costing models.
>>>
>>> Combine doesn't currently use costs to decide how to simplify and
>>> canonicalise things.  Simplifications are what is simpler RTL; combine's
>>> job is to make fewer RTL instructions (which is not the same thing as
>>> fewer machine instructions, or cheaper instructions).  Changing what is
>>> canonical based on target hooks would be, uh, interesting.
>>
>> Would it be reasonable to query the rtx_cost of a compare+and and if the cost
>> is the same as an AND assume that that instruction does not need to be
>> "improved" into the canonical form? That way it will use the compare+and
>> pattern if it exists and still try the zero_extract/shift+and forms for
>> targets that don't have a compare+and instruction.
> Perhaps -- but you also have to make sure that you don't regress cases 
> where canonicalization in turn exposes simplifications due to related 
> insns in the chain.

I agree with Segher that the canonical form really shouldn't depend
on costs or target hooks.  It's just going to be a can of worms.
And patterns shouldn't match non-canonical rtl.

If the (and ...) form is a better canonical form (IMO yes) then
I think it would be better to make it the canonical form across
the board and update the existing ports to use it.  The criteria
could be something like no unjustifiable differences in gcc.dg,
g++.dg and gcc.c-torture .s output for -O2, which is relatively
easy to test.

Thanks,
Richard


Re: RFC: Combine of compare & and oddity

2015-09-03 Thread Segher Boessenkool
On Thu, Sep 03, 2015 at 12:42:30PM -0600, Jeff Law wrote:
> Note huge parts of combine are structured around the needs of processors 
> from the late 80s and early 90s.  The canonical forms selected for those 
> processors may not be optimal for today's processors.

Or, more precisely, for the backends for those processors.  Some of the
canonicalisation rules are very inconvenient for some backends.

> The problem (of course) is changing the canonical forms can be a ton of 
> work in both the backends as well as combine itself to ensure quality of 
> code doesn't regress.

Yes exactly.  Even more so than with other combine changes, before we
do such changes we need to evaluate 1) what this changes, on what targets;
and 2) how big the impact of that is.

Without a proposed patch, all I can say is "most targets will need changes".

> >>But the change from AND to zero_extract is already changing semantics...
> >
> >Oh?  It is not supposed to!
> Combine should never change semantics.  It can change form and may 
> change what happens to "don't care" bits.  But it should never change 
> visible semantics.

And in the reverse transform (in change_zero_ext), it is hard to tell
what those "don't care" bits are (so there are no such bits).


Segher


Re: [patch] libstdc++/66902 Make _S_debug_messages static.

2015-09-03 Thread Jonathan Wakely

On 26/08/15 21:22 +0100, Jonathan Wakely wrote:

This patch removes a public symbol from the .so, which is generally a
bad thing, but there should be no users of this anywhere (it's never
declared in any public header).

For targets using symbol versioning this isn't exported at all, as it
isn't in the linker script, so this really just makes other targets
consistent with the ones using versioned symbols.

Tested powerpc64le-linux and dragonfly-4.2, committed to trunk



commit d35fbf8937930554af62a7320806abecf7381175
Author: Jonathan Wakely 
Date:   Fri Jul 17 10:15:03 2015 +0100

   libstdc++/66902 Make _S_debug_messages static.
   
	PR libstdc++/66902
	* src/c++11/debug.cc (_S_debug_messages): Give internal linkage.

diff --git a/libstdc++-v3/src/c++11/debug.cc b/libstdc++-v3/src/c++11/debug.cc
index 997c0f3..c435de7 100644
--- a/libstdc++-v3/src/c++11/debug.cc
+++ b/libstdc++-v3/src/c++11/debug.cc
@@ -103,7 +103,7 @@ namespace

 namespace __gnu_debug
 {
-  const char* _S_debug_messages[] =
+  static const char* _S_debug_messages[] =
   {
 // General Checks
 "function requires a valid iterator range [%1.name;, %2.name;)",



Jason suggested making the array const, which still gives it internal
linkage but prevents accidentally changing it, so even better.

Tested powerpc64le-linux, committed to trunk.


commit 370c0be6b4c82c0769b9808f7a7b378dc49a1a8a
Author: Jonathan Wakely 
Date:   Thu Sep 3 19:56:16 2015 +0100

	PR libstdc++/66902
	* src/c++11/debug.cc (_S_debug_messages): Make array const.

diff --git a/libstdc++-v3/src/c++11/debug.cc b/libstdc++-v3/src/c++11/debug.cc
index c435de7..ac3ac67 100644
--- a/libstdc++-v3/src/c++11/debug.cc
+++ b/libstdc++-v3/src/c++11/debug.cc
@@ -103,7 +103,7 @@ namespace
 
 namespace __gnu_debug
 {
-  static const char* _S_debug_messages[] =
+  const char* const _S_debug_messages[] =
   {
 // General Checks
 "function requires a valid iterator range [%1.name;, %2.name;)",


Re: RFC: Combine of compare & and oddity

2015-09-03 Thread Jeff Law

On 09/03/2015 12:53 PM, Wilco Dijkstra wrote:

Segher Boessenkool wrote:
On Thu, Sep 03, 2015 at 10:09:36AM -0600, Jeff Law wrote:

You will end up with a *lot* of target hooks like this.  It will also
make testing harder (less coverage).  I am not sure that is a good idea.


We certainly need a lot more target hooks in general so GCC can do the
right thing (rather than using costs inconsistently all over the place).
But that's a different discussion...

Let's be very careful here, target hooks aren't always the solution.
I'd rather see the costing models fixed and use those across the board.
  But frankly, I don't know how to fix the costing models.


Combine doesn't currently use costs to decide how to simplify and
canonicalise things.  Simplifications are what is simpler RTL; combine's
job is to make fewer RTL instructions (which is not the same thing as
fewer machine instructions, or cheaper instructions).  Changing what is
canonical based on target hooks would be, uh, interesting.


Would it be reasonable to query the rtx_cost of a compare+and and if the cost
is the same as an AND assume that that instruction does not need to be
"improved" into the canonical form? That way it will use the compare+and
pattern if it exists and still try the zero_extract/shift+and forms for
targets that don't have a compare+and instruction.
Perhaps -- but you also have to make sure that you don't regress cases 
where canonicalization in turn exposes simplifications due to related 
insns in the chain.


Jeff


RE: RFC: Combine of compare & and oddity

2015-09-03 Thread Wilco Dijkstra
> Segher Boessenkool wrote:
> On Thu, Sep 03, 2015 at 10:09:36AM -0600, Jeff Law wrote:
> > >>You will end up with a *lot* of target hooks like this.  It will also
> > >>make testing harder (less coverage).  I am not sure that is a good idea.
> > >
> > >We certainly need a lot more target hooks in general so GCC can do the
> > >right thing (rather than using costs inconsistently all over the place).
> > >But that's a different discussion...
> > Let's be very careful here, target hooks aren't always the solution.
> > I'd rather see the costing models fixed and use those across the board.
> >  But frankly, I don't know how to fix the costing models.
> 
> Combine doesn't currently use costs to decide how to simplify and
> canonicalise things.  Simplifications are what is simpler RTL; combine's
> job is to make fewer RTL instructions (which is not the same thing as
> fewer machine instructions, or cheaper instructions).  Changing what is
> canonical based on target hooks would be, uh, interesting.

Would it be reasonable to query the rtx_cost of a compare+and and if the cost
is the same as an AND assume that that instruction does not need to be
"improved" into the canonical form? That way it will use the compare+and
pattern if it exists and still try the zero_extract/shift+and forms for
targets that don't have a compare+and instruction.

Wilco





Re: RFC: Combine of compare & and oddity

2015-09-03 Thread Jeff Law

On 09/03/2015 10:18 AM, Segher Boessenkool wrote:



But there are more efficient ways to emit single-bit and mask tests that apply
to most CPUs rather than doing something specific that works for just one target
only. For example, a single-bit test is a simple shift into the carry flag or into
the sign bit, and for mask tests, you shift out all the non-mask bits.


Most of those are quite target-specific.  Some others are already done,
and/or done by other passes.


But what combine does here is even more target-specific.


Combine puts everything (well, most things) through
make_compound_operation, on all targets.
Note huge parts of combine are structured around the needs of processors 
from the late 80s and early 90s.  The canonical forms selected for those 
processors may not be optimal for today's processors.


The problem (of course) is changing the canonical forms can be a ton of 
work in both the backends as well as combine itself to ensure quality of 
code doesn't regress.



Combine converts the merged instructions to what it thinks is the
canonical or cheapest form, and uses that.  It does not try multiple
options (the zero_ext* -> and+shift rewriting is not changing the
semantics of the pattern at all).


But the change from AND to zero_extract is already changing semantics...


Oh?  It is not supposed to!
Combine should never change semantics.  It can change form and may 
change what happens to "don't care" bits.  But it should never change 
visible semantics.


Jeff


RE: RFC: Combine of compare & and oddity

2015-09-03 Thread Wilco Dijkstra
> Oleg Endo wrote:
> On 04 Sep 2015, at 01:54, Segher Boessenkool  wrote:
> 
> > On Thu, Sep 03, 2015 at 05:25:43PM +0100, Kyrill Tkachov wrote:
> >>> void g(void);
> >>> void f(int *x) { if (*x & 2) g(); }
> >
> >> A testcase I was looking at is:
> >> int
> >> foo (int a)
> >> {
> >>  return (a & 7) != 0;
> >> }
> >>
> >> For me this generates:
> >>and w0, w0, 7
> >>cmp w0, wzr
> >>cset w0, ne
> >>ret
> >>
> >> when it could be:
> >>tst  w0, 7
> >>cset w0, ne
> >>ret
> >
> > Interesting, thanks.
> >
> > That testcase with 4 (instead of 7) results in a single ubfx (a zero_extract)
> > (this case is written differently before combine already, however).
> > With 6 it does what you want (combine does not handle it as an extract,
> > no matter what the docs say); and 7 is as you say (combine tries the extract,
> > there is no insn like that).

(a & 1) != 0 is optimized earlier to a & 1, (a & 2) != 0 to a real zero_extract 
(non-zero shift), so these cases don't need a compare. return (a & C) ? 2 : 3 
always uses a compare with zero, even for C=1.

> I've been through this on SH.  As it currently stands, to generate tst insns
> basically 4 different combine patterns are required:
>  - lsb (e.g. & 1)
>  - one bit (zero extract, e.g. & 2)
>  - n contiguous bits (zero extract, e.g. & 7)
>  - everything else (e.g. 4)

Also: (a & 255) and (a & 65535), which are converted into zero_extend by combine.
Interestingly, a subreg is used depending on whether the operand is a virtual or
physical reg - ((a + 1) & 255) == 0 vs (a & 65535) == 0:

Failed to match this instruction:
(set (reg:CC_ZESWP 66 cc)
(compare:CC_ZESWP (reg:HI 0 x0 [ xD.2661 ])
(const_int 0 [0])))

Failed to match this instruction:
(set (reg:CC_ZESWP 66 cc)
(compare:CC_ZESWP (subreg:QI (reg:SI 80 [ D.2778 ]) 0)
(const_int 0 [0])))

So that means another 2 patterns - and all that for one simple instruction...
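In C terms, masking with 255 or 65535 is the same as truncating to an 8- or 16-bit unsigned type and widening back, which is why combine can rewrite these ANDs as a zero_extend (of a subreg). A minimal sketch checking that on a few 32-bit inputs; the helper names are made up for illustration and are not GCC functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* (a & 255): the low byte, zero-extended back to 32 bits.  */
static uint32_t low8_and(uint32_t a)  { return a & 0xFFu; }
/* The same value via truncation to 8 bits and widening (zero_extend).  */
static uint32_t low8_ext(uint32_t a)  { return (uint32_t)(uint8_t)a; }

/* Likewise for (a & 65535) and a 16-bit truncation.  */
static uint32_t low16_and(uint32_t a) { return a & 0xFFFFu; }
static uint32_t low16_ext(uint32_t a) { return (uint32_t)(uint16_t)a; }

/* Spot-check the equivalence on a spread of inputs, including the
   ((a + 1) & 255) == 0 form from the dump above.  */
static void check_zero_extend_forms(void)
{
    const uint32_t samples[] = { 0u, 1u, 0xFFu, 0x100u, 0xFFFFu, 0x10000u,
                                 0x12345678u, 0xFFFFFFFFu };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint32_t a = samples[i];
        assert(low8_and(a) == low8_ext(a));
        assert(low16_and(a) == low16_ext(a));
        /* Unsigned wrap-around makes a + 1 well defined for 0xFFFFFFFF.  */
        assert((low8_and(a + 1) == 0) == ((uint8_t)(a + 1) == 0));
    }
}
```

Because the two forms compute the same value, a target that only has a pattern for one of them misses the other, which is the extra-pattern cost described above.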

Wilco




Re: patch for PR61578

2015-09-03 Thread Vladimir Makarov

On 09/03/2015 11:00 AM, Vladimir Makarov wrote:

On 09/02/2015 11:32 AM, Christophe Lyon wrote:

Hi Vladimir,



On 1 September 2015 at 21:39, Vladimir Makarov  
wrote:

   The following patch is for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61578

   The patch was bootstrapped and tested on x86 and x86-64.

   Committed as rev. 227382.


Since this patch, I can see:
   gcc.dg/vect/slp-perm-5.c (internal compiler error)
   gcc.dg/vect/slp-perm-5.c -flto -ffat-lto-objects (internal compiler error)


on arm* targets.

Christophe, I cannot reproduce it on my arm board (odroid xu4). Could
you provide more info (target, configure options)?  Thanks.




Re: [PING] Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function

2015-09-03 Thread Martin Sebor

You've committed empty gcc/builtins.c.orig file, I've removed it, but
please be more careful next time.


And c/ or cp/ prefixes don't belong to c/ChangeLog or cp/ChangeLog
(also fixed).

Jakub



Thank you for fixing that up.

Martin


Re: [gomp4.1] Depend clause support for offloading

2015-09-03 Thread Jakub Jelinek
Hi!

FYI, I've merged trunk into the gomp-4_1-branch, it has been a while since
that has been done.  make -C check RUNTESTFLAGS=gomp.exp and
make check-target-libgomp still pass without offloading and when offloading
to mic emul (the latter with the libgomp.c/for-5.c and libgomp.c++/for-13.C
LTO ICEs that have been failing for a while).

Jakub


Re: [PATCH] Make ubsan tests less picky about ansi escape codes in diagnostics.

2015-09-03 Thread Jonathan Roelofs



On 9/3/15 10:45 AM, Jonathan Roelofs wrote:



On 9/3/15 10:17 AM, Jakub Jelinek wrote:

On Thu, Sep 03, 2015 at 10:15:02AM -0600, Jonathan Roelofs wrote:

+kcc, mrs

Ping

On 8/27/15 4:44 PM, Jonathan Roelofs wrote:

The attached patch makes the ubsan tests agnostic to ansi escape codes
in their diagnostic output.


It wouldn't hurt if you explained in detail what is the problem you are
trying to solve and why something that works for most people doesn't work in
your case.


Hi Jakub,

AFAICT, there are two ways to suppress the emission of color codes from
ubsan's diagnostics:

   1) Set an environment variable.
   2) Make the output stream not a tty.

#1 doesn't seem to be possible in DejaGnu without hacks.
#2 doesn't work in our environment because DejaGnu attempts to make
itself appear to the program under test as if it were a tty. This might
be an artifact of the fact that all of our testing is remote testing
(though that is just blind speculation on my part: I'm not familiar with
how others have their testing set up, nor whether they do remote testing
of the sanitizer runtimes).

Moral of the story is: these tests fail in our environment, but only
because the regexes do not expect the presence of the ansi color codes,
and we can't trick the runtime into not emitting them.


Cheers,

Jon




Tested on an x86_64-linux-gnu target.

 2015-08-27  Jonathan Roelofs  

 * c-c++-common/ubsan/align-2.c: Don't be picky about ansi escape
 codes in diagnostics.
 * c-c++-common/ubsan/align-4.c: Ditto.
 * c-c++-common/ubsan/align-6.c: Ditto.
 * c-c++-common/ubsan/align-7.c: Ditto.
 * c-c++-common/ubsan/align-9.c: Ditto.
 * c-c++-common/ubsan/float-cast-overflow-2.c: Ditto.
 * c-c++-common/ubsan/float-cast-overflow-8.c: Ditto.
 * c-c++-common/ubsan/object-size-1.c: Ditto.
 * c-c++-common/ubsan/object-size-10.c: Ditto.
 * c-c++-common/ubsan/object-size-4.c: Ditto.
 * c-c++-common/ubsan/object-size-5.c: Ditto.
 * c-c++-common/ubsan/object-size-7.c: Ditto.
 * c-c++-common/ubsan/object-size-8.c: Ditto.
 * c-c++-common/ubsan/object-size-9.c: Ditto.
 * c-c++-common/ubsan/overflow-int128.c: Ditto.
 * c-c++-common/ubsan/pr63802.c: Ditto.


Oops, forgot the one in `gcc.dg/ubsan/object-size-9.c`.

2015-08-27  Jonathan Roelofs  

* c-c++-common/ubsan/align-2.c: Don't be picky about ansi escape
codes in diagnostics.
* c-c++-common/ubsan/align-4.c: Ditto.
* c-c++-common/ubsan/align-6.c: Ditto.
* c-c++-common/ubsan/align-7.c: Ditto.
* c-c++-common/ubsan/align-9.c: Ditto.
* c-c++-common/ubsan/float-cast-overflow-2.c: Ditto.
* c-c++-common/ubsan/float-cast-overflow-8.c: Ditto.
* c-c++-common/ubsan/object-size-1.c: Ditto.
* c-c++-common/ubsan/object-size-10.c: Ditto.
* c-c++-common/ubsan/object-size-4.c: Ditto.
* c-c++-common/ubsan/object-size-5.c: Ditto.
* c-c++-common/ubsan/object-size-7.c: Ditto.
* c-c++-common/ubsan/object-size-8.c: Ditto.
* c-c++-common/ubsan/object-size-9.c: Ditto.
* c-c++-common/ubsan/overflow-int128.c: Ditto.
* c-c++-common/ubsan/pr63802.c: Ditto.
* gcc.dg/ubsan/object-size-9.c: Ditto.

Jon



I do not have write access, so I'll need someone to commit this for me
if it is approved.


Jakub





--
Jon Roelofs
jonat...@codesourcery.com
CodeSourcery / Mentor Embedded
Index: gcc/testsuite/gcc.dg/ubsan/object-size-9.c
===
--- gcc/testsuite/gcc.dg/ubsan/object-size-9.c  (revision 454039)
+++ gcc/testsuite/gcc.dg/ubsan/object-size-9.c  (working copy)
@@ -19,6 +19,6 @@
 }
 
 /* { dg-output "load of address \[^\n\r]* with insufficient space for an object of type 'char'\[^\n\r]*(\n|\r\n|\r)" } */
-/* { dg-output "\[^\n\r]*note: pointer points here\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*note: \[^\n\r\]*pointer points here\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*\\^\[^\n\r]*(\n|\r\n|\r)" } */
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog (revision 454039)
+++ gcc/testsuite/ChangeLog (working copy)
@@ -1,3 +1,24 @@
+2015-08-27  Jonathan Roelofs  
+
+   * c-c++-common/ubsan/align-2.c: Don't be picky about ansi escape
+   codes in diagnostics.
+   * c-c++-common/ubsan/align-4.c: Ditto.
+   * c-c++-common/ubsan/align-6.c: Ditto.
+   * c-c++-common/ubsan/align-7.c: Ditto.
+   * c-c++-common/ubsan/align-9.c: Ditto.
+   * c-c++-common/ubsan/float-cast-overflow-2.c: Ditto.
+   * c-c++-common/ubsan/float-cast-overflow-8.c: Ditto.
+   * c-c++-common/ubsan/object-size-1.c: Ditto.
+   * c-c++-common/ubsan/object-size-10.c: Ditto.
+   * c-c++-common/ubsan/obje

Re: [PATCH] Add __builtin_argument_pointer

2015-09-03 Thread H.J. Lu
On Tue, Sep 1, 2015 at 7:52 AM, H.J. Lu  wrote:
> On Wed, Aug 19, 2015 at 3:35 PM, Segher Boessenkool
>  wrote:
>> On Wed, Aug 19, 2015 at 03:18:46PM -0700, H.J. Lu wrote:
>>> @deftypefn {Built-in Function} {void *} __builtin_argument_pointer (void)
>>> This function is similar to @code{__builtin_frame_address} with an
>>> argument of 0, but it returns the address of the incoming arguments to
>>> the current function rather than the address of its frame.
>>>
>>> The exact definition of this address depends upon the processor and the
>>> calling convention.  Usually some arguments are passed in registers and
>>> the rest on the stack, and this builtin returns the address of the
>>> first argument which would be passed on the stack.
>>> @end deftypefn
>>
>> That is fine by me.  Thanks!
>>
>>
>
> Here is a patch to add __builtin_argument_pointer.  OK for master?
>

I withdrew this patch.


-- 
H.J.


Re: [PING] Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function

2015-09-03 Thread Jakub Jelinek
On Wed, Sep 02, 2015 at 03:53:07PM -0600, Martin Sebor wrote:
> gcc/ChangeLog
> 2015-09-02  Martin Sebor  
> 
>   PR c/66516
>   * doc/extend.texi (Other Builtins): Document when the address
>   of a builtin function can be taken.
> 
> gcc/c-family/ChangeLog
> 2015-09-02  Martin Sebor  
> 
>   PR c/66516
>   * c-common.h (c_decl_implicit, reject_gcc_builtin): Declare new
>   functions.
>   * c-common.c (reject_gcc_builtin): Define.

You've committed empty gcc/builtins.c.orig file, I've removed it, but
please be more careful next time.

> gcc/c/ChangeLog
> 2015-09-02  Martin Sebor  
> 
>   PR c/66516
>   * c/c-typeck.c (convert_arguments, parser_build_unary_op)

And c/ or cp/ prefixes don't belong to c/ChangeLog or cp/ChangeLog
(also fixed).

Jakub


Re: RFC: Combine of compare & and oddity

2015-09-03 Thread Oleg Endo

On 04 Sep 2015, at 01:54, Segher Boessenkool  wrote:

> On Thu, Sep 03, 2015 at 05:25:43PM +0100, Kyrill Tkachov wrote:
>>> void g(void);
>>> void f(int *x) { if (*x & 2) g(); }
> 
>> A testcase I was looking at is:
>> int
>> foo (int a)
>> {
>>  return (a & 7) != 0;
>> }
>> 
>> For me this generates:
>>and w0, w0, 7
>>cmp w0, wzr
>>cset w0, ne
>>ret
>> 
>> when it could be:
>>tst  w0, 7
>>cset w0, ne
>>ret
> 
> Interesting, thanks.
> 
> That testcase with 4 (instead of 7) results in a single ubfx (a zero_extract)
> (this case is written differently before combine already, however).
> With 6 it does what you want (combine does not handle it as an extract,
> no matter what the docs say); and 7 is as you say (combine tries the extract,
> there is no insn like that).

I've been through this on SH.  As it currently stands, to generate tst insns 
basically 4 different combine patterns are required:
 - lsb (e.g. & 1)
 - one bit (zero extract, e.g. & 2)
 - n contiguous bits (zero extract, e.g. & 7)
 - everything else (e.g. 4)
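The four cases can be told apart from the mask constant alone. The helper below is a hypothetical sketch (classify_mask and the enum are invented names, not SH or combine code) that sorts a mask into lsb, single bit, contiguous run, or other. It only classifies the constant; it says nothing about which RTL form combine actually produces for a given mask on a given target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of the constant C in "(x & C) != 0".  */
enum mask_kind {
    MASK_LSB,         /* C == 1 */
    MASK_ONE_BIT,     /* a single bit other than bit 0, e.g. 2 */
    MASK_CONTIGUOUS,  /* a run of two or more adjacent bits, e.g. 7 or 14 */
    MASK_OTHER        /* anything else, e.g. 5 (and 0) */
};

static enum mask_kind classify_mask(uint32_t c)
{
    if (c == 1)
        return MASK_LSB;
    /* Power of two: exactly one bit set.  */
    if (c != 0 && (c & (c - 1)) == 0)
        return MASK_ONE_BIT;
    if (c != 0) {
        /* Strip trailing zeros; a contiguous run then has the form
           2^n - 1, so adding 1 clears every set bit.  */
        uint32_t run = c;
        while ((run & 1u) == 0)
            run >>= 1;
        if ((run & (run + 1)) == 0)
            return MASK_CONTIGUOUS;
    }
    return MASK_OTHER;
}

/* The examples from the list above, plus 14 (Wilco's case) and 5.  */
static void check_examples(void)
{
    assert(classify_mask(1) == MASK_LSB);
    assert(classify_mask(2) == MASK_ONE_BIT);
    assert(classify_mask(7) == MASK_CONTIGUOUS);
    assert(classify_mask(14) == MASK_CONTIGUOUS);
    assert(classify_mask(5) == MASK_OTHER);
}
```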

Cheers,
Oleg

Re: RFC: Combine of compare & and oddity

2015-09-03 Thread Segher Boessenkool
On Thu, Sep 03, 2015 at 05:25:43PM +0100, Kyrill Tkachov wrote:
> >void g(void);
> >void f(int *x) { if (*x & 2) g(); }

> A testcase I was looking at is:
> int
> foo (int a)
> {
>   return (a & 7) != 0;
> }
> 
> For me this generates:
> and w0, w0, 7
> cmp w0, wzr
> cset w0, ne
> ret
> 
> when it could be:
> tst  w0, 7
> cset w0, ne
> ret

Interesting, thanks.

That testcase with 4 (instead of 7) results in a single ubfx (a zero_extract)
(this case is written differently before combine already, however).
With 6 it does what you want (combine does not handle it as an extract,
no matter what the docs say); and 7 is as you say (combine tries the extract,
there is no insn like that).


Segher


Re: [PATCH] Make ubsan tests less picky about ansi escape codes in diagnostics.

2015-09-03 Thread Jonathan Roelofs



On 9/3/15 10:17 AM, Jakub Jelinek wrote:

On Thu, Sep 03, 2015 at 10:15:02AM -0600, Jonathan Roelofs wrote:

+kcc, mrs

Ping

On 8/27/15 4:44 PM, Jonathan Roelofs wrote:

The attached patch makes the ubsan tests agnostic to ansi escape codes
in their diagnostic output.


It wouldn't hurt if you explained in detail what is the problem you are
trying to solve and why something that works for most people doesn't work in
your case.


Hi Jakub,

AFAICT, there are two ways to suppress the emission of color codes from 
ubsan's diagnostics:


  1) Set an environment variable.
  2) Make the output stream not a tty.

#1 doesn't seem to be possible in DejaGnu without hacks.
#2 doesn't work in our environment because DejaGnu attempts to make 
itself appear to the program under test as if it were a tty. This might 
be an artifact of the fact that all of our testing is remote testing 
(though that is just blind speculation on my part: I'm not familiar with 
how others have their testing set up, nor whether they do remote testing 
of the sanitizer runtimes).


Moral of the story is: these tests fail in our environment, but only 
because the regexes do not expect the presence of the ansi color codes, 
and we can't trick the runtime into not emitting them.
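A small illustration of why the relaxed pattern tolerates color codes when the original does not. It uses POSIX regcomp/regexec from <regex.h> rather than the Tcl regexps that DejaGnu's dg-output actually uses, so the brackets need no backslashes. The escape sequences in colored_line are an assumed example, not captured sanitizer output.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if TEXT matches the POSIX extended regex PATTERN.  */
static bool matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    bool ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}

/* Modelled on the note: line before and after the patch.  */
static const char *const strict_pattern  = "note: pointer points here";
static const char *const relaxed_pattern = "note: [^\n\r]*pointer points here";

/* Hypothetical plain and colored diagnostic lines.  */
static const char *const plain_line   = "x.c:5:3: note: pointer points here";
static const char *const colored_line =
    "x.c:5:3: \x1b[1m\x1b[30mnote: \x1b[1m\x1b[0m\x1b[1mpointer points here\x1b[0m";

static void check_patterns(void)
{
    assert(matches(strict_pattern, plain_line));
    /* The escape codes between "note: " and "pointer" break the strict match. */
    assert(!matches(strict_pattern, colored_line));
    /* [^\n\r]* absorbs them, and still matches the uncolored line.  */
    assert(matches(relaxed_pattern, plain_line));
    assert(matches(relaxed_pattern, colored_line));
}
```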



Cheers,

Jon




Tested on an x86_64-linux-gnu target.

 2015-08-27  Jonathan Roelofs  

 * c-c++-common/ubsan/align-2.c: Don't be picky about ansi escape
 codes in diagnostics.
 * c-c++-common/ubsan/align-4.c: Ditto.
 * c-c++-common/ubsan/align-6.c: Ditto.
 * c-c++-common/ubsan/align-7.c: Ditto.
 * c-c++-common/ubsan/align-9.c: Ditto.
 * c-c++-common/ubsan/float-cast-overflow-2.c: Ditto.
 * c-c++-common/ubsan/float-cast-overflow-8.c: Ditto.
 * c-c++-common/ubsan/object-size-1.c: Ditto.
 * c-c++-common/ubsan/object-size-10.c: Ditto.
 * c-c++-common/ubsan/object-size-4.c: Ditto.
 * c-c++-common/ubsan/object-size-5.c: Ditto.
 * c-c++-common/ubsan/object-size-7.c: Ditto.
 * c-c++-common/ubsan/object-size-8.c: Ditto.
 * c-c++-common/ubsan/object-size-9.c: Ditto.
 * c-c++-common/ubsan/overflow-int128.c: Ditto.
 * c-c++-common/ubsan/pr63802.c: Ditto.

I do not have write access, so I'll need someone to commit this for me
if it is approved.


Jakub



--
Jon Roelofs
jonat...@codesourcery.com
CodeSourcery / Mentor Embedded


Re: [Patch, fortran] F2008 - implement pointer function assignment

2015-09-03 Thread Dominique d'Humières
Dear Paul,

I have tested your patch (with the two patches in pr67429) and got the following
regressions:

FAIL: gfortran.dg/bind_c_usage_12.f03   -O   (test for errors, line 33)
FAIL: gfortran.dg/bind_c_usage_12.f03   -O   (test for errors, line 51)
FAIL: gfortran.dg/bind_c_usage_12.f03   -O   (test for errors, line 61)
FAIL: gfortran.dg/bind_c_usage_12.f03   -O  (test for excess errors)
FAIL: gfortran.dg/derived_function_interface_1.f90   -O   (test for errors, line 41)
FAIL: gfortran.dg/derived_function_interface_1.f90   -O  (test for excess errors)
FAIL: gfortran.dg/error_recovery_3.f90   -O   (test for errors, line 9)
FAIL: gfortran.dg/error_recovery_3.f90   -O  (test for excess errors)
FAIL: gfortran.dg/func_decl_1.f90   -O   (test for errors, line 16)
FAIL: gfortran.dg/func_decl_1.f90   -O   (test for errors, line 22)
FAIL: gfortran.dg/func_decl_1.f90   -O  (test for excess errors)
FAIL: gfortran.dg/func_decl_4.f90   -O   (test for errors, line 20)
FAIL: gfortran.dg/func_decl_4.f90   -O  (test for excess errors)
FAIL: gfortran.dg/proc_assign_1.f90   -O   (test for errors, line 68)
FAIL: gfortran.dg/proc_assign_1.f90   -O   (test for errors, line 73)
FAIL: gfortran.dg/proc_assign_1.f90   -O  (internal compiler error)
FAIL: gfortran.dg/proc_assign_1.f90   -O  (test for excess errors)
FAIL: gfortran.dg/typebound_proc_23.f90   -O0  (internal compiler error)
FAIL: gfortran.dg/typebound_proc_23.f90   -O0  (test for excess errors)
UNRESOLVED: gfortran.dg/typebound_proc_23.f90   -O0  compilation failed to produce executable
FAIL: gfortran.dg/typebound_proc_23.f90   -O1  (internal compiler error)
FAIL: gfortran.dg/typebound_proc_23.f90   -O1  (test for excess errors)
UNRESOLVED: gfortran.dg/typebound_proc_23.f90   -O1  compilation failed to produce executable
FAIL: gfortran.dg/typebound_proc_23.f90   -O2  (internal compiler error)
FAIL: gfortran.dg/typebound_proc_23.f90   -O2  (test for excess errors)
UNRESOLVED: gfortran.dg/typebound_proc_23.f90   -O2  compilation failed to produce executable
FAIL: gfortran.dg/typebound_proc_23.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
FAIL: gfortran.dg/typebound_proc_23.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
UNRESOLVED: gfortran.dg/typebound_proc_23.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  compilation failed to produce executable
FAIL: gfortran.dg/typebound_proc_23.f90   -O3 -g  (internal compiler error)
FAIL: gfortran.dg/typebound_proc_23.f90   -O3 -g  (test for excess errors)
UNRESOLVED: gfortran.dg/typebound_proc_23.f90   -O3 -g  compilation failed to produce executable
FAIL: gfortran.dg/typebound_proc_23.f90   -Os  (internal compiler error)
FAIL: gfortran.dg/typebound_proc_23.f90   -Os  (test for excess errors)
UNRESOLVED: gfortran.dg/typebound_proc_23.f90   -Os  compilation failed to produce executable
FAIL: gfortran.dg/use_7.f90   -O   (test for errors, line 42)
FAIL: gfortran.dg/use_7.f90   -O   (test for errors, line 43)
FAIL: gfortran.dg/use_7.f90   -O   (test for errors, line 44)
FAIL: gfortran.dg/use_7.f90   -O   (test for errors, line 45)
FAIL: gfortran.dg/use_7.f90   -O   (test for errors, line 46)
FAIL: gfortran.dg/use_7.f90   -O  (test for excess errors)

The failures for typebound_proc_23.f90 and proc_assign_1.f90 are ICEs

f951: internal compiler error: Segmentation fault: 11

The failures for gfortran.dg/bind_c_usage_12.f03 are

/opt/gcc/_clean/gcc/testsuite/gfortran.dg/bind_c_usage_12.f03:33:25:

   integer(c_int) function int2() bind(c, name="jjj") ! { dg-error "No binding name is allowed" }
 1
Error: Syntax error in data declaration at (1)

/opt/gcc/_clean/gcc/testsuite/gfortran.dg/bind_c_usage_12.f03:51:25:

   integer(c_int) function int2() bind(c, name="kkk") ! { dg-error "No binding name is allowed" }
 1
Error: Syntax error in data declaration at (1)

/opt/gcc/_clean/gcc/testsuite/gfortran.dg/bind_c_usage_12.f03:61:25:

   integer(c_int) function int2() bind(c, name="mmm") ! { dg-error "No binding name is allowed" }
 1
Error: Syntax error in data declaration at (1)

I also see

/opt/gcc/work/gcc/testsuite/gfortran.dg/func_decl_1.f90:16:0: Error: Unclassifiable statement at (1)
/opt/gcc/work/gcc/testsuite/gfortran.dg/func_decl_1.f90:22:0: Error: Unclassifiable statement at (1)

/opt/gcc/work/gcc/testsuite/gfortran.dg/use_7.f90:42:0: Error: Unclassifiable statement at (1)
/opt/gcc/work/gcc/testsuite/gfortran.dg/use_7.f90:43:0: Error: Unclassifiable statement at (1)
/opt/gcc/work/gcc/testsuite/gfortran.dg/use_7.f90:44:0: Error: Unclassifiable statement at (1)
/opt/gcc/work/gcc/testsuite/gfortran.dg/use_7.f90:45:0: Error: Unclassifiable statement at (1)
/opt/gcc/work/gcc/testsuite/gfortran.dg/use_7.f90:46:0: Error: Unclassifiable statement at (1)

Cheers,

Dominique



RE: RFC: Combine of compare & and oddity

2015-09-03 Thread Wilco Dijkstra
> Kyrill Tkachov wrote:
> A testcase I was looking at is:
> int
> foo (int a)
> {
>return (a & 7) != 0;
> }
> 
> For me this generates:
>  and w0, w0, 7
>  cmp w0, wzr
>  cset w0, ne
>  ret
> 
> when it could be:
>  tst  w0, 7
>  cset w0, ne
>  ret

Indeed. And return (a & 14) != 0; generates a tst just fine (this should be an
actual zero_extract: ((a >> 1) & 7) != 0 - but somehow it doesn't get converted
into one...).
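To make the point concrete: with bits numbered from the least significant end (BITS_BIG_ENDIAN false, assumed here), (zero_extract x 3 1) selects bits 1..3, which are exactly the bits of the mask 14. The sketch below models zero_extract in C as a shift followed by an AND and checks the equivalence on a range of inputs; zero_extract here is a hypothetical C helper, not GCC's RTL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* C model of RTL (zero_extract X LEN POS) with little-endian bit
   numbering: LEN bits of X starting at bit POS.  Assumes 1 <= LEN < 32
   and POS + LEN <= 32.  */
static uint32_t zero_extract(uint32_t x, unsigned len, unsigned pos)
{
    return (x >> pos) & ((UINT32_C(1) << len) - 1);
}

/* (a & 14) != 0 tests bits 1..3, i.e. a 3-bit field at position 1.  */
static bool test_and_14(uint32_t a)      { return (a & 14u) != 0; }
static bool test_extract_3_1(uint32_t a) { return zero_extract(a, 3, 1) != 0; }

static void check_mask_14(void)
{
    for (uint32_t a = 0; a < 1024; a++)
        assert(test_and_14(a) == test_extract_3_1(a));
}
```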

Wilco





Re: RFC: Combine of compare & and oddity

2015-09-03 Thread Segher Boessenkool
On Thu, Sep 03, 2015 at 10:09:36AM -0600, Jeff Law wrote:
> >>You will end up with a *lot* of target hooks like this.  It will also
> >>make testing harder (less coverage).  I am not sure that is a good idea.
> >
> >We certainly need a lot more target hooks in general so GCC can do the
> >right thing (rather than using costs inconsistently all over the place).
> >But that's a different discussion...
> Let's be very careful here, target hooks aren't always the solution. 
> I'd rather see the costing models fixed and use those across the board. 
>  But frankly, I don't know how to fix the costing models.

Combine doesn't currently use costs to decide how to simplify and
canonicalise things.  Simplifications are what is simpler RTL; combine's
job is to make fewer RTL instructions (which is not the same thing as
fewer machine instructions, or cheaper instructions).  Changing what is
canonical based on target hooks would be, uh, interesting.


Segher


Re: RFC: Combine of compare & and oddity

2015-09-03 Thread Kyrill Tkachov


On 03/09/15 17:18, Segher Boessenkool wrote:

On Thu, Sep 03, 2015 at 03:59:00PM +0100, Wilco Dijkstra wrote:

However there are 2 issues with this, one is the spurious subreg,

Combine didn't make that up out of thin air; something already used
DImode here.  It could simplify it to SImode in this case, that is
true, don't know why it doesn't; it isn't necessarily faster code to
do so, it can be slower, it might not match, etc.

The relevant RTL instructions on AArch64 are:

[ You never gave a full test case, or I missed it, or cannot find it
   anymore -- but I can reproduce this now:

void g(void);
void f(int *x) { if (*x & 2) g(); }

]


(insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ])
 (and:SI (reg/v:SI 76 [ xD.2641 ])
 (const_int 2 [0x2]))) tmp5.c:122 452 {andsi3}
  (nil))
  (insn 26 25 27 2 (set (reg:CC 66 cc)
 (compare:CC (reg:SI 77 [ D.2705 ])
 (const_int 0 [0]))) tmp5.c:122 377 {*cmpsi}
  (expr_list:REG_DEAD (reg:SI 77 [ D.2705 ])
 (nil)))

I don't see anything using DI...

Yeah, I spoke too soon, sorry.  It looks like make_compound_operation came
up with it.


It's only a problem for AND-and-compare, no?

Yes, so it looks like some other backends match the odd pattern and then have
another pattern change it back into the canonical AND/TST form during the split
phase (maybe the subreg confuses register allocation or blocks other
optimizations).

A subreg of a pseudo is not anything special, don't worry about it,
register_operand and similar treat it just like any other register.


This all seems a lot of unnecessary complexity for a few special immediates
when there is a much simpler solution...

Feel free to post a patch!  I would love to have this all simplified.


But there are more efficient ways to emit single-bit and mask tests that apply
to most CPUs rather than doing something specific that works for just one target
only. For example, a single-bit test is a simple shift into the carry flag or into
the sign bit, and for mask tests, you shift out all the non-mask bits.
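The two shift techniques can be modelled in portable C on 32-bit unsigned values. This is a sketch of the idea only, not code from GCC or any backend; a real target would set a carry or sign flag and branch on it rather than compute a bool, and the function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reference form: AND with the mask, then compare against zero.  */
static bool test_and(uint32_t a, uint32_t mask)
{
    return (a & mask) != 0;
}

/* Single-bit test: shift bit BIT (0..31) up into the sign (top) bit,
   then look only at that bit.  */
static bool test_bit_via_sign(uint32_t a, unsigned bit)
{
    return ((a << (31 - bit)) & UINT32_C(0x80000000)) != 0;
}

/* Mask test for WIDTH (1..32) contiguous low bits, such as 7 (WIDTH 3):
   shift the non-mask bits out at the top; the result is non-zero exactly
   when some mask bit was set.  */
static bool test_low_mask_via_shift(uint32_t a, unsigned width)
{
    return (a << (32 - width)) != 0;
}

/* Compare both shift forms against the AND form on a range of inputs.  */
static void check_shift_forms(void)
{
    for (uint32_t a = 0; a < 4096; a++) {
        assert(test_bit_via_sign(a, 1) == test_and(a, 2));
        assert(test_low_mask_via_shift(a, 3) == test_and(a, 7));
    }
}
```

Neither form needs a separate AND result register, which is the appeal over and+cmp on targets without a tst-style instruction.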

Most of those are quite target-specific.  Some others are already done,
and/or done by other passes.

But what combine does here is even more target-specific.

Combine puts everything (well, most things) through
make_compound_operation, on all targets.


Combine converts the merged instructions to what it thinks is the
canonical or cheapest form, and uses that.  It does not try multiple
options (the zero_ext* -> and+shift rewriting is not changing the
semantics of the pattern at all).

But the change from AND to zero_extract is already changing semantics...

Oh?  It is not supposed to!


Or would it be better to let each target decide
on how to canonicalize bit tests and only try that alternative?

The question is how to write the pattern to be most convenient for all
targets.

The obvious choice is to try the 2 original instructions merged.

... without any simplification.  Yes, I've wanted combine to fall back
to that if the "simplified" version does not work out.  Not so easy to
do though.


Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well...

Combine does not try multiple options.

I'm not following - combine tries zero_extract and shift+AND - that's 2 options.
If that is feasible then adding a 3rd option should be possible.

The shift+and is *exactly the same* as the zero_extract, just written
differently.


We certainly need a lot more target hooks in general so GCC can do the
right thing (rather than using costs inconsistently all over the place).
But that's a different discussion...

This isn't about costs though.  That is a big other can of worms, indeed!


Anyway.  In that testcase I made, everything is simplified just fine on
aarch64, using *tbeqdi1; what am I missing?


A testcase I was looking at is:
int
foo (int a)
{
  return (a & 7) != 0;
}

For me this generates:
and w0, w0, 7
cmp w0, wzr
cset w0, ne
ret

when it could be:
tst  w0, 7
cset w0, ne
ret
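A minimal sketch of a regression test for this missed tst, assuming it would live under gcc.target/aarch64. The file placement and the exact scan-assembler regexps are assumptions, not something proposed in the thread; the expected mnemonics follow the desired output shown above.

```c
/* { dg-do compile } */
/* { dg-options "-O2" } */

/* (a & 7) != 0 should become tst + cset rather than and + cmp + cset.  */
int
foo (int a)
{
  return (a & 7) != 0;
}

/* { dg-final { scan-assembler "tst\tw\[0-9\]+, 7" } } */
/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]+, wzr" } } */
```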

Kyrill




Segher





Re: [PATCH 3/3] [gomp] Add thread attribute customization

2015-09-03 Thread Jakub Jelinek
On Thu, Sep 03, 2015 at 01:36:35PM +0200, Sebastian Huber wrote:
> On 03/09/15 13:10, Jakub Jelinek wrote:
> >On Thu, Sep 03, 2015 at 01:09:23PM +0200, Sebastian Huber wrote:
> >We have only thread attributes in this function: mutable_attr and attr.
> >The attr is initialized with &gomp_thread_attr and gomp_thread_attr is
> >supposed to be read-only by this function. Under certain conditions we
> >have to modify the initial attributes. Since gomp_thread_attr is
> >read-only, we have to copy it and then modify the copy. For this we need
> >some storage: mutable_attr.
> >>>So use local_thread_attr if you want to stress it, but IMHO thread_attr
> >>>is just fine.  I really don't like mutable_attr.
> >>Ok, if I don't rename thread_attr, is the patch ok?
> >Yes.
> 
> Thanks a lot for your kind review.
> 
> I committed the patches as:
> 
> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=227439
> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=227440
> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=227441
> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=227442

Unfortunately it broke stuff, here is a fix I've committed:

2015-09-03  Jakub Jelinek  

* configure.tgt: Add missing ;; in between nvptx and rtems
snippets.

--- libgomp/configure.tgt   (revision 227456)
+++ libgomp/configure.tgt   (working copy)
@@ -153,6 +153,7 @@ case "${target}" in
 
   nvptx*-*-*)
config_path="nvptx"
+   ;;
 
   *-*-rtems*)
# Use self-contained synchronization objects if provided by Newlib


Jakub


Re: RFC: Combine of compare & and oddity

2015-09-03 Thread Segher Boessenkool
On Thu, Sep 03, 2015 at 03:59:00PM +0100, Wilco Dijkstra wrote:
> > > However there are 2 issues with this, one is the spurious subreg,
> > 
> > Combine didn't make that up out of thin air; something already used
> > DImode here.  It could simplify it to SImode in this case, that is
> > true, don't know why it doesn't; it isn't necessarily faster code to
> > do so, it can be slower, it might not match, etc.
> 
> The relevant RTL instructions on AArch64 are:

[ You never gave a full test case, or I missed it, or cannot find it
  anymore -- but I can reproduce this now:

void g(void);
void f(int *x) { if (*x & 2) g(); }

]

> (insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ])
> (and:SI (reg/v:SI 76 [ xD.2641 ])
> (const_int 2 [0x2]))) tmp5.c:122 452 {andsi3}
>  (nil))
>  (insn 26 25 27 2 (set (reg:CC 66 cc)
> (compare:CC (reg:SI 77 [ D.2705 ])
> (const_int 0 [0]))) tmp5.c:122 377 {*cmpsi}
>  (expr_list:REG_DEAD (reg:SI 77 [ D.2705 ])
> (nil)))
> 
> I don't see anything using DI...

Yeah, I spoke too soon, sorry.  It looks like make_compound_operation came
up with it.

> > It's only a problem for AND-and-compare, no?
> 
> Yes, so it looks like some other backends match the odd pattern and then have another
> pattern change it back into the canonical AND/TST form during the split phase (maybe
> the subreg confuses register allocation or blocks other optimizations).

A subreg of a pseudo is not anything special, don't worry about it,
register_operand and similar treat it just like any other register.

> This all seems a lot of unnecessary complexity for a few special immediates when
> there is a much simpler solution...

Feel free to post a patch!  I would love to have this all simplified.

> > > But there are more efficient ways to emit single bit and mask tests that apply
> > > to most CPUs rather than doing something specific that works for just one target
> > > only. For example, a single bit test is a simple shift into the carry flag or into
> > > the sign bit, and for mask tests, you shift out all the non-mask bits.
> > 
> > Most of those are quite target-specific.  Some others are already done,
> > and/or done by other passes.
> 
> But what combine does here is even more target-specific.

Combine puts everything (well, most things) through
make_compound_operation, on all targets.

> > Combine converts the merged instructions to what it thinks is the
> > canonical or cheapest form, and uses that.  It does not try multiple
> > options (the zero_ext* -> and+shift rewriting is not changing the
> > semantics of the pattern at all).
> 
> But the change from AND to zero_extract is already changing semantics...

Oh?  It is not supposed to!

> > > Or would it be better to let each target decide
> > > on how to canonicalize bit tests and only try that alternative?
> > 
> > The question is how to write the pattern to be most convenient for all
> > targets.
> 
> The obvious choice is to try the 2 original instructions merged.

... without any simplification.  Yes, I've wanted combine to fall back
to that if the "simplified" version does not work out.  Not so easy to
do though.

> > > Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well...
> > 
> > Combine does not try multiple options.
> 
> I'm not following - combine tries zero_extract and shift+AND - that's 2 options.
> If that is feasible then adding a 3rd option should be possible.

The shift+and is *exactly the same* as the zero_extract, just written
differently.
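To make the equivalence concrete: with bit positions counted from the least significant bit (BITS_BIG_ENDIAN == 0, as on AArch64), an unsigned extract zero_extract (x, len, pos) is the value (x >> pos) & ((1 << len) - 1). The C model below uses a hypothetical helper name and assumes 32-bit operands with 1 <= len and pos + len <= 32. It shows that testing zero_extract (x, 1, 1) against zero is the same test as (x & 2) != 0:

```c
#include <stdint.h>

/* Model of RTL (zero_extract x len pos), pos counted from the LSB.
   Requires 1 <= len and pos + len <= 32. */
static uint32_t zero_extract32(uint32_t x, unsigned len, unsigned pos)
{
  uint32_t mask = (len == 32) ? UINT32_MAX : ((UINT32_C(1) << len) - 1);
  return (x >> pos) & mask;
}

/* Three spellings of "bit 1 of x is set" from the discussion. */
static int test_and(uint32_t x)       { return (x & 2) != 0; }
static int test_shift_and(uint32_t x) { return ((x >> 1) & 1) != 0; }
static int test_extract(uint32_t x)   { return zero_extract32(x, 1, 1) != 0; }
```

Any contiguous mask can be written the same way. For example, (x & 7) != 0 is zero_extract (x, 3, 0) != 0.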

> We certainly need a lot more target hooks in general so GCC can do the right thing
> (rather than using costs inconsistently all over the place). But that's a different
> discussion...

This isn't about costs though.  That is a big other can of worms, indeed!


Anyway.  In that testcase I made, everything is simplified just fine on
aarch64, using *tbeqdi1; what am I missing?


Segher


Re: [PATCH] Make ubsan tests less picky about ansi escape codes in diagnostics.

2015-09-03 Thread Jakub Jelinek
On Thu, Sep 03, 2015 at 10:15:02AM -0600, Jonathan Roelofs wrote:
> +kcc, mrs
> 
> Ping
> 
> On 8/27/15 4:44 PM, Jonathan Roelofs wrote:
> >The attached patch makes the ubsan tests agnostic to ansi escape codes
> >in their diagnostic output.

It wouldn't hurt if you explained in detail what is the problem you are
trying to solve and why something that works for most people doesn't work in
your case.

> >Tested on an x86_64-linux-gnu target.
> >
> > 2015-08-27  Jonathan Roelofs  
> >
> > * c-c++-common/ubsan/align-2.c: Don't be picky about ansi escape
> > codes in diagnostics.
> > * c-c++-common/ubsan/align-4.c: Ditto.
> > * c-c++-common/ubsan/align-6.c: Ditto.
> > * c-c++-common/ubsan/align-7.c: Ditto.
> > * c-c++-common/ubsan/align-9.c: Ditto.
> > * c-c++-common/ubsan/float-cast-overflow-2.c: Ditto.
> > * c-c++-common/ubsan/float-cast-overflow-8.c: Ditto.
> > * c-c++-common/ubsan/object-size-1.c: Ditto.
> > * c-c++-common/ubsan/object-size-10.c: Ditto.
> > * c-c++-common/ubsan/object-size-4.c: Ditto.
> > * c-c++-common/ubsan/object-size-5.c: Ditto.
> > * c-c++-common/ubsan/object-size-7.c: Ditto.
> > * c-c++-common/ubsan/object-size-8.c: Ditto.
> > * c-c++-common/ubsan/object-size-9.c: Ditto.
> > * c-c++-common/ubsan/overflow-int128.c: Ditto.
> > * c-c++-common/ubsan/pr63802.c: Ditto.
> >
> >I do not have write access, so I'll need someone to commit this for me
> >if it is approved.

Jakub


Re: [PATCH] 2015-07-31 Benedikt Huber Philipp Tomsich

2015-09-03 Thread pinskia




> On Sep 3, 2015, at 11:58 PM, Sebastian Pop  wrote:
> 
> On Wed, Aug 26, 2015 at 11:58 AM, Benedikt Huber
>  wrote:
>> ping
>> 
>> [PATCH v4][aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
>> 
>> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02698.html
>> 
>>> On 31 Jul 2015, at 19:05, Benedikt Huber 
>>>  wrote:
>>> 
>>>  * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and
>>>  rsqrtf.
>>>  * config/aarch64/aarch64-opts.h: -mrecip has a default value
>>>  depending on the core.
>>>  * config/aarch64/aarch64-protos.h: Declare.
>>>  * config/aarch64/aarch64-simd.md: Matching expressions for
>>>  frsqrte and frsqrts.
>>>  * config/aarch64/aarch64-tuning-flags.def: Added
>>>  MRECIP_DEFAULT_ENABLED.
>>>  * config/aarch64/aarch64.c: New functions. Emit rsqrt
>>>  estimation code in fast math mode.
>>>  * config/aarch64/aarch64.md: Added enum entries.
>>>  * config/aarch64/aarch64.opt: Added options -mrecip and
>>>  -mlow-precision-recip-sqrt.
>>>  * testsuite/gcc.target/aarch64/rsqrt-asm-check.c: Assembly scans
>>>  for frsqrte and frsqrts
>>>  * testsuite/gcc.target/aarch64/rsqrt.c: Functional tests for rsqrt.
> 
> The patch looks good to me.
> You will need an ARM/AArch64 maintainer to approve your patch: +Ramana

Except it is missing comments before almost all of the new functions.  Yes, the
aarch64 backend does not follow that rule, but that does not mean you should not follow it.

Thanks,
Andrew

> 
> Thanks,
> Sebastian
> 
>>> 
>>> Signed-off-by: Philipp Tomsich 
>>> ---
>>> gcc/ChangeLog                                      |  21
>>> gcc/config/aarch64/aarch64-builtins.c              | 104
>>> gcc/config/aarch64/aarch64-opts.h                  |   7 ++
>>> gcc/config/aarch64/aarch64-protos.h                |   2 +
>>> gcc/config/aarch64/aarch64-simd.md                 |  27 ++
>>> gcc/config/aarch64/aarch64-tuning-flags.def        |   1 +
>>> gcc/config/aarch64/aarch64.c                       | 106 +++-
>>> gcc/config/aarch64/aarch64.md                      |   3 +
>>> gcc/config/aarch64/aarch64.opt                     |   8 ++
>>> gcc/doc/invoke.texi                                |  19
>>> gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c |  63
>>> gcc/testsuite/gcc.target/aarch64/rsqrt.c           | 107 +
>>> 12 files changed, 463 insertions(+), 5 deletions(-)
>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c
>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
>>> 
>>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>>> index 3432adb..3bf3098 100644
>>> --- a/gcc/ChangeLog
>>> +++ b/gcc/ChangeLog
>>> @@ -1,3 +1,24 @@
>>> +2015-07-31  Benedikt Huber  
>>> + Philipp Tomsich  
>>> +
>>> + * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and
>>> + rsqrtf.
>>> + * config/aarch64/aarch64-opts.h: -mrecip has a default value
>>> + depending on the core.
>>> + * config/aarch64/aarch64-protos.h: Declare.
>>> + * config/aarch64/aarch64-simd.md: Matching expressions for
>>> + frsqrte and frsqrts.
>>> + * config/aarch64/aarch64-tuning-flags.def: Added
>>> + MRECIP_DEFAULT_ENABLED.
>>> + * config/aarch64/aarch64.c: New functions. Emit rsqrt
>>> + estimation code in fast math mode.
>>> + * config/aarch64/aarch64.md: Added enum entries.
>>> + * config/aarch64/aarch64.opt: Added options -mrecip and
>>> + -mlow-precision-recip-sqrt.
>>> + * testsuite/gcc.target/aarch64/rsqrt-asm-check.c: Assembly scans
>>> + for frsqrte and frsqrts
>>> + * testsuite/gcc.target/aarch64/rsqrt.c: Functional tests for rsqrt.
>>> +
>>> 2015-07-08  Jiong Wang  
>>> 
>>>  * config/aarch64/aarch64.c (aarch64_unspec_may_trap_p): New function.
>>> diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
>>> index b6c89b9..b4f443c 100644
>>> --- a/gcc/config/aarch64/aarch64-builtins.c
>>> +++ b/gcc/config/aarch64/aarch64-builtins.c
>>> @@ -335,6 +335,11 @@ enum aarch64_builtins
>>>  AARCH64_BUILTIN_GET_FPSR,
>>>  AARCH64_BUILTIN_SET_FPSR,
>>> 
>>> +  AARCH64_BUILTIN_RSQRT_DF,
>>> +  AARCH64_BUILTIN_RSQRT_SF,
>>> +  AARCH64_BUILTIN_RSQRT_V2DF,
>>> +  AARCH64_BUILTIN_RSQRT_V2SF,
>>> +  AARCH64_BUILTIN_RSQRT_V4SF,
>>>  AARCH64_SIMD_BUILTIN_BASE,
>>>  AARCH64_SIMD_BUILTIN_LANE_CHECK,
>>> #include "aarch64-simd-builtins.def"
>>> @@ -824,6 +829,42 @@ aarch64_init_crc32_builtins ()
>>> }
>>> 
>>> void
>>> +aarch64_add_builtin_rsqrt (void)

Here. 

>>> +{
>>> +  tree fndecl = NULL;
>>> +  tree ftype = NULL;
>>> +
>>> +  tree V2SF_type_node = build_vector_type (float_type_node, 2);
>>> +  tree V2DF_type_node = build_vector_type (double_type_node, 2);
>>> +  tree V4SF_type_node = build_vector_type (float_type_node, 4);
>>> +
>>> +  ftype = build_function_type_list (double_type_node, double_type_node, 
>

Re: [PATCH] Make ubsan tests less picky about ansi escape codes in diagnostics.

2015-09-03 Thread Jonathan Roelofs

+kcc, mrs

Ping

On 8/27/15 4:44 PM, Jonathan Roelofs wrote:

The attached patch makes the ubsan tests agnostic to ansi escape codes
in their diagnostic output.

Tested on an x86_64-linux-gnu target.

 2015-08-27  Jonathan Roelofs  

 * c-c++-common/ubsan/align-2.c: Don't be picky about ansi escape
 codes in diagnostics.
 * c-c++-common/ubsan/align-4.c: Ditto.
 * c-c++-common/ubsan/align-6.c: Ditto.
 * c-c++-common/ubsan/align-7.c: Ditto.
 * c-c++-common/ubsan/align-9.c: Ditto.
 * c-c++-common/ubsan/float-cast-overflow-2.c: Ditto.
 * c-c++-common/ubsan/float-cast-overflow-8.c: Ditto.
 * c-c++-common/ubsan/object-size-1.c: Ditto.
 * c-c++-common/ubsan/object-size-10.c: Ditto.
 * c-c++-common/ubsan/object-size-4.c: Ditto.
 * c-c++-common/ubsan/object-size-5.c: Ditto.
 * c-c++-common/ubsan/object-size-7.c: Ditto.
 * c-c++-common/ubsan/object-size-8.c: Ditto.
 * c-c++-common/ubsan/object-size-9.c: Ditto.
 * c-c++-common/ubsan/overflow-int128.c: Ditto.
 * c-c++-common/ubsan/pr63802.c: Ditto.

I do not have write access, so I'll need someone to commit this for me
if it is approved.
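For context, ANSI color output wraps diagnostic text in escape sequences of the form ESC '[', parameters, final byte (for example "\x1b[1m" for bold or "\x1b[0m" to reset), which a plain-text pattern in a test does not expect. The patch itself changes the test expectations. The C sketch below is not part of the patch or of the sanitizer runtime (the helper names are hypothetical); it only illustrates what such sequences look like by removing them before a comparison:

```c
#include <string.h>

/* Copy src to dst, dropping ANSI CSI escape sequences such as "\x1b[1m"
   or "\x1b[0m": ESC '[', then parameter/intermediate bytes (0x20-0x3F),
   then one final byte (0x40-0x7E).  dst must have room for
   strlen(src) + 1 bytes. */
static void strip_ansi_csi(const char *src, char *dst)
{
  while (*src != '\0')
    {
      if (src[0] == '\x1b' && src[1] == '[')
        {
          src += 2;
          while (*src >= 0x20 && *src <= 0x3F)
            src++;
          if (*src >= 0x40 && *src <= 0x7E)
            src++;
          continue;
        }
      *dst++ = *src++;
    }
  *dst = '\0';
}

/* Return nonzero if `colored`, with escape sequences removed, equals
   `plain`.  This sketch rejects inputs longer than 255 bytes. */
static int equal_ignoring_ansi(const char *colored, const char *plain)
{
  char buf[256];
  if (strlen(colored) >= sizeof buf)
    return 0;
  strip_ansi_csi(colored, buf);
  return strcmp(buf, plain) == 0;
}
```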


Cheers,

Jon




--
Jon Roelofs
jonat...@codesourcery.com
CodeSourcery / Mentor Embedded


Re: RFC: Combine of compare & and oddity

2015-09-03 Thread Jeff Law

On 09/03/2015 08:59 AM, Wilco Dijkstra wrote:

Segher Boessenkool wrote:
On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote:

Combine canonicalizes certain AND masks in a comparison with zero into extracts of the
widest register type. During matching these are expanded into a very inefficient sequence
that fails to match. For example (x & 2) == 0 is matched in combine like this:

Failed to match this instruction:
(set (reg:CC 66 cc)
 (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
 (const_int 1 [0x1])
 (const_int 1 [0x1]))
 (const_int 0 [0])))


Yes.  Some processors even have specific instructions to do this.


However there are 2 issues with this, one is the spurious subreg,


Combine didn't make that up out of thin air; something already used
DImode here.  It could simplify it to SImode in this case, that is
true, don't know why it doesn't; it isn't necessarily faster code to
do so, it can be slower, it might not match, etc.


The relevant RTL instructions on AArch64 are:

(insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ])
 (and:SI (reg/v:SI 76 [ xD.2641 ])
 (const_int 2 [0x2]))) tmp5.c:122 452 {andsi3}
  (nil))
  (insn 26 25 27 2 (set (reg:CC 66 cc)
 (compare:CC (reg:SI 77 [ D.2705 ])
 (const_int 0 [0]))) tmp5.c:122 377 {*cmpsi}
  (expr_list:REG_DEAD (reg:SI 77 [ D.2705 ])
 (nil)))

I don't see anything using DI...
Is this aarch64?  Might be the case that combine wants to do everything 
in word_mode for some reason or another.



Yes, so it looks like some other backends match the odd pattern and then have another
pattern change it back into the canonical AND/TST form during the split phase (maybe
the subreg confuses register allocation or blocks other optimizations). This all seems
a lot of unnecessary complexity for a few special immediates when there is a much
simpler solution...
subregs get in the way in various places.  So if we can avoid generating 
them, then that's good.



But there are more efficient ways to emit single bit and mask tests that apply
to most CPUs rather than doing something specific that works for just one target
only. For example, a single bit test is a simple shift into the carry flag or into the
sign bit, and for mask tests, you shift out all the non-mask bits.


Most of those are quite target-specific.  Some others are already done,
and/or done by other passes.
I wouldn't go that far.  Many targets have simple, cheap ways to do 
single bit testing.  And there are some targets where trying to shift a 
bit into the carry flag would be *horribly* bad performance-wise as they 
only have single bit shifters.
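The shift-based single-bit and mask tests described in the quoted text can be written in portable C as below. This is a sketch for illustration only, with hypothetical helper names. As Jeff notes, whether a shift is actually cheaper than a bit-test or AND instruction depends on the target, for example on machines whose shifter moves only one bit per instruction:

```c
#include <stdint.h>

/* Single-bit test: move bit k into the sign (top) bit and test it.
   Equivalent to (x & (1u << k)) != 0 for 0 <= k <= 31. */
static int bit_test_via_sign(uint32_t x, unsigned k)
{
  uint32_t shifted = x << (31 - k);
  return (shifted >> 31) != 0;
}

/* Contiguous mask test: shift out every bit outside the field of len
   bits starting at bit pos, then test for zero.  Equivalent to
   (x & mask) != 0.  Requires 1 <= len and pos + len <= 32. */
static int mask_test_via_shifts(uint32_t x, unsigned len, unsigned pos)
{
  uint32_t y = x << (32 - pos - len);  /* drop bits above the field */
  y >>= 32 - len;                      /* drop bits below the field */
  return y != 0;
}
```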







But what combine does here is even more target-specific. Shifts and AND setting flags
are universally supported on targets with a condition code register.
Bitfield test/extract instructions are rarer, and when they are supported, they
may well be more expensive.

I don't think it's anywhere near that clear cut.





So my question is, is it combine's job to try all possible permutations that
constitute a bit or mask test?


Combine converts the merged instructions to what it thinks is the
canonical or cheapest form, and uses that.  It does not try multiple
options (the zero_ext* -> and+shift rewriting is not changing the
semantics of the pattern at all).


But the change from AND to zero_extract is already changing semantics...

It shouldn't be changing semantics -- changing form != changing semantics.



Combine does not try multiple options.


I'm not following - combine tries zero_extract and shift+AND - that's 2 options.
If that is feasible then adding a 3rd option should be possible.
Combine will stop once it finds a match.  It may, in some circumstances, 
try more than one representation to find a match, but that's the 
exception rather than the rule.



You will end up with a *lot* of target hooks like this.  It will also
make testing harder (less coverage).  I am not sure that is a good idea.


We certainly need a lot more target hooks in general so GCC can do the right thing
(rather than using costs inconsistently all over the place). But that's a different
discussion...
Let's be very careful here, target hooks aren't always the solution. 
I'd rather see the costing models fixed and use those across the board. 
 But frankly, I don't know how to fix the costing models.


jeff


Re: [PATCH] 2015-07-31 Benedikt Huber Philipp Tomsich

2015-09-03 Thread Sebastian Pop
On Wed, Aug 26, 2015 at 11:58 AM, Benedikt Huber
 wrote:
> ping
>
> [PATCH v4][aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
>
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02698.html
>
>> On 31 Jul 2015, at 19:05, Benedikt Huber 
>>  wrote:
>>
>>   * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and
>>   rsqrtf.
>>   * config/aarch64/aarch64-opts.h: -mrecip has a default value
>>   depending on the core.
>>   * config/aarch64/aarch64-protos.h: Declare.
>>   * config/aarch64/aarch64-simd.md: Matching expressions for
>>   frsqrte and frsqrts.
>>   * config/aarch64/aarch64-tuning-flags.def: Added
>>   MRECIP_DEFAULT_ENABLED.
>>   * config/aarch64/aarch64.c: New functions. Emit rsqrt
>>   estimation code in fast math mode.
>>   * config/aarch64/aarch64.md: Added enum entries.
>>   * config/aarch64/aarch64.opt: Added options -mrecip and
>>   -mlow-precision-recip-sqrt.
>>   * testsuite/gcc.target/aarch64/rsqrt-asm-check.c: Assembly scans
>>   for frsqrte and frsqrts
>>   * testsuite/gcc.target/aarch64/rsqrt.c: Functional tests for rsqrt.
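For readers unfamiliar with the instructions named in the ChangeLog: on AArch64, FRSQRTE produces a low-precision estimate of 1/sqrt(d), and FRSQRTS computes the Newton-Raphson step factor (3 - a*b)/2. The scalar C model of the refinement loop below is only a sketch. The helper names are hypothetical, the initial estimate is passed in rather than modeled (real FRSQRTE precision is not reproduced), and it is not the code the patch emits:

```c
/* Model of FRSQRTS: returns (3 - a * b) / 2. */
static double frsqrts_model(double a, double b)
{
  return (3.0 - a * b) / 2.0;
}

/* Refine an estimate x0 of 1/sqrt(d) with `steps` Newton-Raphson
   iterations: x' = x * (3 - d*x*x) / 2.  Each step roughly squares the
   relative error, so an estimate within about 1% needs about three
   steps to approach double precision. */
static double rsqrt_refine(double d, double x0, int steps)
{
  double x = x0;
  for (int i = 0; i < steps; ++i)
    x = x * frsqrts_model(d * x, x);
  return x;
}
```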

The patch looks good to me.
You will need an ARM/AArch64 maintainer to approve your patch: +Ramana

Thanks,
Sebastian

>>
>> Signed-off-by: Philipp Tomsich 
>> ---
>> gcc/ChangeLog                                      |  21
>> gcc/config/aarch64/aarch64-builtins.c              | 104
>> gcc/config/aarch64/aarch64-opts.h                  |   7 ++
>> gcc/config/aarch64/aarch64-protos.h                |   2 +
>> gcc/config/aarch64/aarch64-simd.md                 |  27 ++
>> gcc/config/aarch64/aarch64-tuning-flags.def        |   1 +
>> gcc/config/aarch64/aarch64.c                       | 106 +++-
>> gcc/config/aarch64/aarch64.md                      |   3 +
>> gcc/config/aarch64/aarch64.opt                     |   8 ++
>> gcc/doc/invoke.texi                                |  19
>> gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c |  63
>> gcc/testsuite/gcc.target/aarch64/rsqrt.c           | 107 +
>> 12 files changed, 463 insertions(+), 5 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index 3432adb..3bf3098 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,24 @@
>> +2015-07-31  Benedikt Huber  
>> + Philipp Tomsich  
>> +
>> + * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and
>> + rsqrtf.
>> + * config/aarch64/aarch64-opts.h: -mrecip has a default value
>> + depending on the core.
>> + * config/aarch64/aarch64-protos.h: Declare.
>> + * config/aarch64/aarch64-simd.md: Matching expressions for
>> + frsqrte and frsqrts.
>> + * config/aarch64/aarch64-tuning-flags.def: Added
>> + MRECIP_DEFAULT_ENABLED.
>> + * config/aarch64/aarch64.c: New functions. Emit rsqrt
>> + estimation code in fast math mode.
>> + * config/aarch64/aarch64.md: Added enum entries.
>> + * config/aarch64/aarch64.opt: Added options -mrecip and
>> + -mlow-precision-recip-sqrt.
>> + * testsuite/gcc.target/aarch64/rsqrt-asm-check.c: Assembly scans
>> + for frsqrte and frsqrts
>> + * testsuite/gcc.target/aarch64/rsqrt.c: Functional tests for rsqrt.
>> +
>> 2015-07-08  Jiong Wang  
>>
>>   * config/aarch64/aarch64.c (aarch64_unspec_may_trap_p): New function.
>> diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
>> index b6c89b9..b4f443c 100644
>> --- a/gcc/config/aarch64/aarch64-builtins.c
>> +++ b/gcc/config/aarch64/aarch64-builtins.c
>> @@ -335,6 +335,11 @@ enum aarch64_builtins
>>   AARCH64_BUILTIN_GET_FPSR,
>>   AARCH64_BUILTIN_SET_FPSR,
>>
>> +  AARCH64_BUILTIN_RSQRT_DF,
>> +  AARCH64_BUILTIN_RSQRT_SF,
>> +  AARCH64_BUILTIN_RSQRT_V2DF,
>> +  AARCH64_BUILTIN_RSQRT_V2SF,
>> +  AARCH64_BUILTIN_RSQRT_V4SF,
>>   AARCH64_SIMD_BUILTIN_BASE,
>>   AARCH64_SIMD_BUILTIN_LANE_CHECK,
>> #include "aarch64-simd-builtins.def"
>> @@ -824,6 +829,42 @@ aarch64_init_crc32_builtins ()
>> }
>>
>> void
>> +aarch64_add_builtin_rsqrt (void)
>> +{
>> +  tree fndecl = NULL;
>> +  tree ftype = NULL;
>> +
>> +  tree V2SF_type_node = build_vector_type (float_type_node, 2);
>> +  tree V2DF_type_node = build_vector_type (double_type_node, 2);
>> +  tree V4SF_type_node = build_vector_type (float_type_node, 4);
>> +
>> +  ftype = build_function_type_list (double_type_node, double_type_node, NULL_TREE);
>> +  fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_df",
>> +ftype, AARCH64_BUILTIN_RSQRT_DF, BUILT_IN_MD, NULL, NULL_TREE);
>> +  aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_DF] = fndecl;
>> +
>> +  ftype = build_function_type_list (float_type_node, float_type_node, NULL_TREE);
>> +  fndecl = add_builtin_function ("__builtin_aarch64

Re: Reviving SH FDPIC target

2015-09-03 Thread Rich Felker
On Thu, Sep 03, 2015 at 02:58:39PM +, Joseph Myers wrote:
> On Wed, 2 Sep 2015, Rich Felker wrote:
> 
> > So if __fpscr_values was the only reason for patch 1/3 in the FDPIC
> > patchset, I think we can safely drop it. And patch 2/3 was already
> > committed, so 3/3, the one I was originally looking at, seems to be
> > all we need. It was approved at the time, so I'll proceed with merging
> > it with 5.2.0.
> 
> Well, obviously if trying dropping patch 1/3 you need to remove everything 
> related to use_initial_val (the feature added in patch 1/3) from patch 
> 3/3.

As far as I can tell, the only "use" of use_initial_val is defining
the pseudo-instruction in the md file, which causes the code in patch
1/3 to use it. I see no other references to it. As I understand, the
breakage from not having it (in the original 4.5-era patch) would be
when introducing references to __fpscr_values later, and no longer
having the GOT pointer, but that code is gone now.

Rich


Re: [PATCH, rs6000] Use hardware support for vector character multiply

2015-09-03 Thread Bill Schmidt
On Thu, 2015-09-03 at 11:36 -0400, David Edelsohn wrote:
> On Thu, Sep 3, 2015 at 11:20 AM, Bill Schmidt
>  wrote:
> > Hi,
> >
> > It was pointed out to me recently that multiplying two vector chars is
> > performed using scalarization, even though we have hardware support for
> > byte multiplies in vectors.  This patch adds an expansion for mulv16qi3
> > to correct this.
> >
> > The expansion is pretty simple.  We do a multiply-even and multiply-odd
> > to create halfword results, and then use a permute to extract the
> > low-order bytes of each result.  This particular form of a permute uses
> > a different set of input/output vector modes than have been used before,
> > so I added the altivec_vperm_v8hiv16qi insn to represent this.  (The two
> > source operands are vector halfword types, while the target operand is a
> > vector char type.)
> >
> > I've added two test variants, one to test the code generation, and one
> > executable test to check correctness.  One other test failed with this
> > change.  This turned out to be because PowerPC was excluded from the
> > check_effective_target_vect_char_mult target support test.  I resolved
> > this by adding check_effective_target_powerpc_altivec to that test.
> >
> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> > regressions.  Is this ok for trunk?
> >
> > Thanks,
> > Bill
> >
> >
> > [gcc]
> >
> > 2015-09-03  Bill Schmidt  
> >
> > * config/rs6000/altivec.md (altivec_vperm_v8hiv16qi): New
> > define_insn.
> > (mulv16qi3): New define_expand.
> >
> > [gcc/testsuite]
> >
> > 2015-09-03  Bill Schmidt  
> >
> > * gcc.target/powerpc/vec-mult-char-1.c: New test.
> > * gcc.target/powerpc/vec-mult-char-2.c: New test.
> 
> This is okay.
> 
> The "bool be = BYTES_BIG_ENDIAN" and use of "be" is not a common style in GCC.

OK.  I thought that helped with readability, but I will revise this to
use BYTES_BIG_ENDIAN directly in the expressions.

Thanks,
Bill

> 
> Thanks, David
> 
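The expansion described in the patch (vmulesb/vmulosb to form halfword products, then a vperm that keeps the low-order byte of each product) can be checked with a scalar C model. This sketch assumes big-endian element and byte numbering and uses the big-endian mask from the patch (2i+1 and 2i+17). It does not model the little-endian mask, and the function names are hypothetical:

```c
#include <stdint.h>

/* Emulate vmulesb / vmulosb with big-endian numbering: halfword i holds
   the signed product of byte lanes 2i (even) or 2i+1 (odd), stored high
   byte first. */
static void mul_even_odd_be(const int8_t a[16], const int8_t b[16],
                            uint8_t even[16], uint8_t odd[16])
{
  for (int i = 0; i < 8; ++i)
    {
      uint16_t pe = (uint16_t)(a[2 * i] * b[2 * i]);
      uint16_t po = (uint16_t)(a[2 * i + 1] * b[2 * i + 1]);
      even[2 * i] = (uint8_t)(pe >> 8);
      even[2 * i + 1] = (uint8_t)pe;
      odd[2 * i] = (uint8_t)(po >> 8);
      odd[2 * i + 1] = (uint8_t)po;
    }
}

/* Emulate vperm: byte j of the result is byte (mask[j] & 31) of the
   32-byte concatenation of v1 and v2. */
static void vperm_model(const uint8_t v1[16], const uint8_t v2[16],
                        const uint8_t mask[16], uint8_t out[16])
{
  for (int j = 0; j < 16; ++j)
    {
      unsigned idx = mask[j] & 31u;
      out[j] = idx < 16 ? v1[idx] : v2[idx - 16];
    }
}

/* mulv16qi3 as described in the patch, big-endian mask only. */
static void mulv16qi_model(const int8_t a[16], const int8_t b[16],
                           uint8_t out[16])
{
  uint8_t even[16], odd[16], mask[16];
  for (int i = 0; i < 8; ++i)
    {
      mask[2 * i] = (uint8_t)(2 * i + 1);      /* low byte of even product i */
      mask[2 * i + 1] = (uint8_t)(2 * i + 17); /* low byte of odd product i */
    }
  mul_even_odd_be(a, b, even, odd);
  vperm_model(even, odd, mask, out);
}
```

Only the low byte of each product is kept, so signed and unsigned byte multiplies give the same result here.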




Re: RFC: Combine of compare & and oddity

2015-09-03 Thread Jeff Law

On 09/03/2015 07:18 AM, Segher Boessenkool wrote:

On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote:

Combine canonicalizes certain AND masks in a comparison with zero into extracts of the
widest register type. During matching these are expanded into a very inefficient sequence
that fails to match. For example (x & 2) == 0 is matched in combine like this:

Failed to match this instruction:
(set (reg:CC 66 cc)
 (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
 (const_int 1 [0x1])
 (const_int 1 [0x1]))
 (const_int 0 [0])))


Yes.  Some processors even have specific instructions to do this.


However there are 2 issues with this, one is the spurious subreg,


Combine didn't make that up out of thin air; something already used
DImode here.  It could simplify it to SImode in this case, that is
true, don't know why it doesn't; it isn't necessarily faster code to
do so, it can be slower, it might not match, etc.
Right.  It may also be the case that this is a 64-bit target where the
underlying object is 32 bits, and combine wanted to do things in word_mode.


But yes, there's a reason why the subreg is in there and there are times 
when the subregs get in the way of the hand-written pattern matching 
that occurs in combine.c and elsewhere.


So it's generally useful to squash away the subregs when we can. 
However, it's also the case that the subregs can't always be squashed 
away -- so it's also helpful to dig into the transformations in 
combine.c that you want to fire and figure out if and how that code can 
be extended to handle the embedded subregs.





(*) I think that is another issue in combine - when both alternatives match you
want to select the lowest cost one, not the first one that matches.


That's recog, not combine.  And quite a few backends rely on "first match
wins", because it always has been that way.  It also is very easy to write
such patterns accidentally (sometimes even the exact same one twice in the
same machine description, etc.)

Note that it's also been documented that first match wins for 20+ years.
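The difference between "first match wins" and a cost-based choice can be shown with a toy pattern table. This is not GCC's recog, which genrecog generates from the machine description. The pattern names, predicates, and costs below are hypothetical:

```c
#include <stddef.h>

/* A toy "instruction pattern": a name, a predicate on the immediate of
   an AND-and-compare, and a made-up cost. */
struct toy_pattern {
  const char *name;
  int (*matches)(unsigned imm);
  int cost;
};

static int is_power_of_2(unsigned imm) { return imm != 0 && (imm & (imm - 1)) == 0; }
static int any_imm(unsigned imm) { (void)imm; return 1; }

/* Order matters: the bit-test pattern is listed first even though the
   generic AND-and-compare pattern is cheaper in this toy cost model. */
static const struct toy_pattern table[] = {
  { "bit_test", is_power_of_2, 2 },
  { "and_compare", any_imm, 1 },
};

#define TABLE_SIZE (sizeof table / sizeof table[0])

/* "First match wins": return the first pattern whose predicate holds. */
static const char *recog_first(unsigned imm)
{
  for (size_t i = 0; i < TABLE_SIZE; ++i)
    if (table[i].matches(imm))
      return table[i].name;
  return NULL;
}

/* Alternative policy: return the cheapest matching pattern. */
static const char *recog_cheapest(unsigned imm)
{
  const struct toy_pattern *best = NULL;
  for (size_t i = 0; i < TABLE_SIZE; ++i)
    if (table[i].matches(imm) && (best == NULL || table[i].cost < best->cost))
      best = &table[i];
  return best ? best->name : NULL;
}
```

With an immediate of 2 both entries match. First-match picks bit_test, while the cost-based policy picks and_compare, so backends written against first-match ordering could change their code generation if the policy changed.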






So my question is, is it combine's job to try all possible permutations that
constitute a bit or mask test?


Combine converts the merged instructions to what it thinks is the
canonical or cheapest form, and uses that.  It does not try multiple
options (the zero_ext* -> and+shift rewriting is not changing the
semantics of the pattern at all).
Right.  Once combine finds something that works, it's done and moves 
onto the next set of insns to combine.






Neither matches the AArch64 patterns for ANDS/TST (which is just compare and AND). If the
immediate is not a power of 2 or a power of 2 - 1 then it matches correctly as expected.

I don't understand how ((x >> 1) & 1) != 0 could be a useful expansion


It is zero_extract(x,1,1) really.  This is convenient for (old and embedded)
processors that have special bit-test instructions.  If we now want combine
to not do this, we'll have to update all backends that rely on it.


Would any backend actually rely on this given it only does some specific masks,
has a redundant shift with 0 for the mask case and the odd subreg as well?


Such backends match the zero_extract patterns, of course.  Random example:
the h8300 patterns for the "btst" instruction.
PA, m68k and almost certainly others.  I suspect it's fairly common in 
older ports.



Jeff


Re: [PATCH, rs6000] Use hardware support for vector character multiply

2015-09-03 Thread Bill Schmidt
On Thu, 2015-09-03 at 23:26 +0800, Andrew Pinski wrote:
> On Thu, Sep 3, 2015 at 11:20 PM, Bill Schmidt
>  wrote:
> > Hi,
> >
> > It was pointed out to me recently that multiplying two vector chars is
> > performed using scalarization, even though we have hardware support for
> > byte multiplies in vectors.  This patch adds an expansion for mulv16qi3
> > to correct this.
> >
> > The expansion is pretty simple.  We do a multiply-even and multiply-odd
> > to create halfword results, and then use a permute to extract the
> > low-order bytes of each result.  This particular form of a permute uses
> > a different set of input/output vector modes than have been used before,
> > so I added the altivec_vperm_v8hiv16qi insn to represent this.  (The two
> > source operands are vector halfword types, while the target operand is a
> > vector char type.)
> 
> This seems like something which should be done in vector generic
> rather than the back-end.  I am not blocking this patch but just
> suggesting an alternative way of doing this instead of a target
> specific patch.

Currently vector-generic checks whether the back end implements the
smul_optab for the specific vector type; if not, it scalarizes the code.
I'm not sure what else it should do.  Targets might implement the
character multiply in several different ways (directly, using
mult-even/mult-odd, using mult-hi/mult-lo), so anything other than
leaving the multiply in place could be wrong for some targets.  Am I
misunderstanding your point?

Thanks,
Bill

> 
> Thanks,
> Andrew
> 
> >
> > I've added two test variants, one to test the code generation, and one
> > executable test to check correctness.  One other test failed with this
> > change.  This turned out to be because PowerPC was excluded from the
> > check_effective_target_vect_char_mult target support test.  I resolved
> > this by adding check_effective_target_powerpc_altivec to that test.
> >
> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> > regressions.  Is this ok for trunk?
> >
> > Thanks,
> > Bill
> >
> >
> > [gcc]
> >
> > 2015-09-03  Bill Schmidt  
> >
> > * config/rs6000/altivec.md (altivec_vperm_v8hiv16qi): New
> > define_insn.
> > (mulv16qi3): New define_expand.
> >
> > [gcc/testsuite]
> >
> > 2015-09-03  Bill Schmidt  
> >
> > * gcc.target/powerpc/vec-mult-char-1.c: New test.
> > * gcc.target/powerpc/vec-mult-char-2.c: New test.
> >
> >
> > Index: gcc/config/rs6000/altivec.md
> > ===
> > --- gcc/config/rs6000/altivec.md  (revision 227416)
> > +++ gcc/config/rs6000/altivec.md  (working copy)
> > @@ -1957,6 +1957,16 @@
> >"vperm %0,%1,%2,%3"
> >[(set_attr "type" "vecperm")])
> >
> > +(define_insn "altivec_vperm_v8hiv16qi"
> > +  [(set (match_operand:V16QI 0 "register_operand" "=v")
> > +   (unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v")
> > +  (match_operand:V8HI 2 "register_operand" "v")
> > +  (match_operand:V16QI 3 "register_operand" "v")]
> > +  UNSPEC_VPERM))]
> > +  "TARGET_ALTIVEC"
> > +  "vperm %0,%1,%2,%3"
> > +  [(set_attr "type" "vecperm")])
> > +
> >  (define_expand "altivec_vperm__uns"
> >[(set (match_operand:VM 0 "register_operand" "=v")
> > (unspec:VM [(match_operand:VM 1 "register_operand" "v")
> > @@ -3161,6 +3171,34 @@
> >""
> >"")
> >
> > +(define_expand "mulv16qi3"
> > +  [(set (match_operand:V16QI 0 "register_operand" "=v")
> > +(mult:V16QI (match_operand:V16QI 1 "register_operand" "v")
> > +(match_operand:V16QI 2 "register_operand" "v")))]
> > +  "TARGET_ALTIVEC"
> > +  "
> > +{
> > +  rtx even = gen_reg_rtx (V8HImode);
> > +  rtx odd = gen_reg_rtx (V8HImode);
> > +  rtx mask = gen_reg_rtx (V16QImode);
> > +  rtvec v = rtvec_alloc (16);
> > +  bool be = BYTES_BIG_ENDIAN;
> > +  int i;
> > +
> > +  for (i = 0; i < 8; ++i) {
> > +RTVEC_ELT (v, 2 * i)
> > + = gen_rtx_CONST_INT (QImode, be ? 2 * i + 1 : 31 - 2 * i);
> > +RTVEC_ELT (v, 2 * i + 1)
> > + = gen_rtx_CONST_INT (QImode, be ? 2 * i + 17 : 15 - 2 * i);
> > +  }
> > +
> > +  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> > +  emit_insn (gen_altivec_vmulesb (even, operands[1], operands[2]));
> > +  emit_insn (gen_altivec_vmulosb (odd, operands[1], operands[2]));
> > +  emit_insn (gen_altivec_vperm_v8hiv16qi (operands[0], even, odd, mask));
> > +  DONE;
> > +}")
> > +
> >  (define_expand "altivec_negv4sf2"
> >[(use (match_operand:V4SF 0 "register_operand" ""))
> > (use (match_operand:V4SF 1 "register_operand" ""))]
> > Index: gcc/testsuite/gcc.target/powerpc/vec-mult-char-1.c
> > ===
> > --- gcc/testsuite/gcc.target/powerpc/vec-mult-char-1.c  (revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/vec-mult-char-1.c  (working copy)
> > @@ -0,0 +1,53 @@

[PATCH] PR67421, Cost instruction sequences when doing left wide shift

2015-09-03 Thread Jiong Wang

As Rainer reported at

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67421

Also, as described at

  https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01147.html

This patch relaxes the restriction on the wide left shift transformation.
Previously we always honored the target's own pattern, so whenever the
following check was true, we cancelled the transformation.

  have_insn_for (ASHIFT, mode)

However, it is better to compare the costs of the generated instruction
sequences to decide whether honoring the backend pattern is beneficial.
Normally the generic transformation will be better.
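For illustration only, here is a minimal C model of the generic expansion (not GCC code). It assumes a 32-bit word_mode, a signed 32-bit source widened to 64 bits, a constant shift count C between 1 and 31, and that >> on a negative int32_t is an arithmetic shift (implementation-defined in C, but what GCC does). The function names are hypothetical. Because the high word of the widened value consists entirely of sign bits, one arithmetic right shift produces dest_high, matching the "dest_high = src_low >> (word_size - C)" comment in the patch. One left shift produces dest_low.

```c
#include <stdint.h>

/* Reference: widen first, then do the double-word shift.  Shifting the
   unsigned representation avoids undefined behaviour for negative values.  */
static uint64_t
wide_shift_reference (int32_t src, unsigned c)
{
  return (uint64_t) (int64_t) src << c;
}

/* Word-sized expansion of the same shift; assumes 1 <= c <= 31 and an
   arithmetic >> for negative int32_t (hypothetical helper).  */
static uint64_t
wide_shift_split (int32_t src, unsigned c)
{
  /* dest_high = src_low >> (word_size - C): the vacated upper bits are
     filled with copies of the sign bit, as in the widened value.  */
  uint32_t dest_high = (uint32_t) (src >> (32 - c));
  /* dest_low = src_low << C (logical shift on the unsigned word).  */
  uint32_t dest_low = (uint32_t) src << c;
  return ((uint64_t) dest_high << 32) | dest_low;
}
```

With the patch, both this two-shift sequence and the one produced by the backend's double-word ASHIFT pattern are generated and compared with seq_cost. The backend sequence is kept whenever it is not more expensive.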

I haven't used GEN_FCN to invoke gen_* directly; instead I reused
"expand_variable_shift" to handle all the remaining work.

wide-shift-64 now passes on SPARC with the options "-mv8plus -mcpu=v9",
and arm32 also generates better code for wide-shift-64.

OK for trunk?

2015-09-03  Jiong Wang  

gcc/
  PR rtl-optimization/67421
  * expr.c (expand_expr_real_2): Cost instruction sequences when doing
  left wide shift transformation.

-- 
Regards,
Jiong

diff --git a/gcc/expr.c b/gcc/expr.c
index ee0c1f9..cf28f44 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8892,7 +8892,6 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	&& ! unsignedp
 	&& mode == GET_MODE_WIDER_MODE (word_mode)
 	&& GET_MODE_SIZE (mode) == 2 * GET_MODE_SIZE (word_mode)
-	&& ! have_insn_for (ASHIFT, mode)
 	&& TREE_CONSTANT (treeop1)
 	&& TREE_CODE (treeop0) == SSA_NAME)
 	  {
@@ -8908,6 +8907,7 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 		&& ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode))
 			>= GET_MODE_BITSIZE (word_mode)))
 		  {
+		rtx_insn *seq, *seq_old;
 		unsigned int high_off = subreg_highpart_offset (word_mode,
 mode);
 		rtx low = lowpart_subreg (word_mode, op0, mode);
@@ -8918,6 +8918,7 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	 - TREE_INT_CST_LOW (treeop1));
 		tree rshift = build_int_cst (TREE_TYPE (treeop1), ramount);
 
+		start_sequence ();
 		/* dest_high = src_low >> (word_size - C).  */
 		temp = expand_variable_shift (RSHIFT_EXPR, word_mode, low,
 		  rshift, dest_high, unsignedp);
@@ -8930,7 +8931,28 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 		if (temp != dest_low)
 		  emit_move_insn (dest_low, temp);
 
+		seq = get_insns ();
+		end_sequence ();
 		temp = target ;
+
+		if (have_insn_for (ASHIFT, mode))
+		  {
+			bool speed_p = optimize_insn_for_speed_p ();
+			start_sequence ();
+			rtx ret_old = expand_variable_shift (code, mode, op0,
+			 treeop1, target,
+			 unsignedp);
+
+			seq_old = get_insns ();
+			end_sequence ();
+			if (seq_cost (seq, speed_p)
+			>= seq_cost (seq_old, speed_p))
+			  {
+			seq = seq_old;
+			temp = ret_old;
+			  }
+		  }
+		  emit_insn (seq);
 		  }
 	  }
 	  }


Re: [PATCH, rs6000] Use hardware support for vector character multiply

2015-09-03 Thread David Edelsohn
On Thu, Sep 3, 2015 at 11:20 AM, Bill Schmidt
 wrote:
> Hi,
>
> It was pointed out to me recently that multiplying two vector chars is
> performed using scalarization, even though we have hardware support for
> byte multiplies in vectors.  This patch adds an expansion for mulv16qi3
> to correct this.
>
> The expansion is pretty simple.  We do a multiply-even and multiply-odd
> to create halfword results, and then use a permute to extract the
> low-order bytes of each result.  This particular form of a permute uses
> a different set of input/output vector modes than have been used before,
> so I added the altivec_vperm_v8hiv16qi insn to represent this.  (The two
> source operands are vector halfword types, while the target operand is a
> vector char type.)
>
> I've added two test variants, one to test the code generation, and one
> executable test to check correctness.  One other test failed with this
> change.  This turned out to be because PowerPC was excluded from the
> check_effective_target_vect_char_mult target support test.  I resolved
> this by adding check_effective_target_powerpc_altivec to that test.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> [gcc]
>
> 2015-09-03  Bill Schmidt  
>
> * config/rs6000/altivec.md (altivec_vperm_v8hiv16qi): New
> define_insn.
> (mulv16qi3): New define_expand.
>
> [gcc/testsuite]
>
> 2015-09-03  Bill Schmidt  
>
> * gcc.target/powerpc/vec-mult-char-1.c: New test.
> * gcc.target/powerpc/vec-mult-char-2.c: New test.

This is okay.

The "bool be = BYTES_BIG_ENDIAN" and use of "be" is not a common style in GCC.

Thanks, David


Re: [PATCH, rs6000] Use hardware support for vector character multiply

2015-09-03 Thread Andrew Pinski
On Thu, Sep 3, 2015 at 11:20 PM, Bill Schmidt
 wrote:
> Hi,
>
> It was pointed out to me recently that multiplying two vector chars is
> performed using scalarization, even though we have hardware support for
> byte multiplies in vectors.  This patch adds an expansion for mulv16qi3
> to correct this.
>
> The expansion is pretty simple.  We do a multiply-even and multiply-odd
> to create halfword results, and then use a permute to extract the
> low-order bytes of each result.  This particular form of a permute uses
> a different set of input/output vector modes than have been used before,
> so I added the altivec_vperm_v8hiv16qi insn to represent this.  (The two
> source operands are vector halfword types, while the target operand is a
> vector char type.)

This seems like something which should be done in vector generic
rather than the back-end.  I am not blocking this patch but just
suggesting an alternative way of doing this instead of a target
specific patch.

Thanks,
Andrew


Re: RFC: Combine of compare & and oddity

2015-09-03 Thread Jeff Law

On 09/02/2015 03:00 PM, Segher Boessenkool wrote:
> On Wed, Sep 02, 2015 at 01:59:58PM -0600, Jeff Law wrote:
>>> (set (reg:CC 66 cc)
>>>      (compare:CC (and:DI (lshiftrt:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
>>>                                       (const_int 1 [0x1]))
>>>                          (const_int 1 [0x1]))
>>>                  (const_int 0 [0])))
>>
>> Yea, this is an alternative form.  I don't offhand remember how/why
>> this form appears, but it certainly does.  I don't think any ports
>> handle this form (but I certainly haven't done any checks), but I believe
>> combine creates it primarily for internal purposes.
>
> Combine replaces zero_ext* with equivalent shift/and patterns and tries
> again, if things don't match.  Targets with more generic masking insns
> do not want to describe the very many cases that can be described with
> zero_ext* separately.
>
> rs6000 handles this exact pattern, btw.  And I'll be very happy if we can
> just drop it :-)

If I still cared, I'd probably look into this for the PA, which has some
rough similarities with the PPC architecture in its bit
insertion/extraction capabilities.  But the PA just isn't worth the
time :-)
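To see why the two RTL spellings are interchangeable for a single-bit test, here is a small C model (a sketch, not GCC code). It assumes BITS_BIG_ENDIAN is 0, so zero_extract positions count from the least significant bit. The helper names are hypothetical. The 32-bit input is widened to 64 bits, loosely mirroring the (subreg:DI (reg:SI ...)) operand. Only bit 1 is inspected, so the upper bits do not affect the result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of RTL (zero_extract X LEN POS) with BITS_BIG_ENDIAN == 0:
   LEN bits starting POS bits above the least significant bit.
   Assumes 1 <= len and len + pos <= 64.  */
static uint64_t
zero_extract (uint64_t x, unsigned len, unsigned pos)
{
  uint64_t mask = len == 64 ? ~UINT64_C (0) : (UINT64_C (1) << len) - 1;
  return (x >> pos) & mask;
}

/* Three spellings of the test "bit 1 of X is clear".  */
static bool
test_and (uint32_t x)
{
  return (x & 2) == 0;                      /* (and x 2) == 0 */
}

static bool
test_extract (uint32_t x)
{
  return zero_extract (x, 1, 1) == 0;       /* (zero_extract x 1 1) == 0 */
}

static bool
test_shift_and (uint32_t x)
{
  return ((x >> 1) & 1) == 0;               /* (and (lshiftrt x 1) 1) == 0 */
}
```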



>> @cindex @code{zero_extract}, canonicalization of
>> @cindex @code{sign_extract}, canonicalization of
>> @item
>> Equality comparisons of a group of bits (usually a single bit) with zero
>> will be written using @code{zero_extract} rather than the equivalent
>> @code{and} or @code{sign_extract} operations.
>
> Oh it's even documented, thanks.  I do still think we should think of
> changing this.

Doable, but I suspect the fallout would be significant across the older
ports.


Jeff


Re: [PATCH] fix PR53852: stop ISL after a given number of operations

2015-09-03 Thread Sebastian Pop
Richard Biener wrote:
> > * gcc.dg/graphite/uns-interchange-12.c: Adjust pattern to pass with
> > both isl-0.12 and isl-0.15.
> 
> Does it mean with 0.15 we now "time out" on some of the cases?  

"time out" will not trigger on the testcases modified in this patch.

> Or is this
> just a general difference between 0.12 and 0.15?  In which case, like for
> this testcase, is there a better way to verify whether the loops J and K were
> interchanged?

We have more "tiled by" with isl-0.15 than with isl-0.12, which means that the
pattern we are looking for is not stable enough between isl versions: I will
have to find and test for another pattern to check that loops have been blocked,
interchanged, etc., which in my opinion is hard, as we currently use different
schedulers for different versions of isl.

I have tuned the time out such that it will not trigger on the interchange
testcases.  It will trigger on the Fortran testcase pr42334-1.f, on which I
have seen DejaGnu timeout warnings, and I have also tried it on the reduced
testcase attached to PR53852, which times out with isl-0.15.  I have not
added PR53852's testcase, as people still using isl-0.12 would get another
testcase that uses large amounts of memory and compile time.

Sebastian


[PATCH, rs6000] Use hardware support for vector character multiply

2015-09-03 Thread Bill Schmidt
Hi,

It was pointed out to me recently that multiplying two vector chars is
performed using scalarization, even though we have hardware support for
byte multiplies in vectors.  This patch adds an expansion for mulv16qi3
to correct this.

The expansion is pretty simple.  We do a multiply-even and multiply-odd
to create halfword results, and then use a permute to extract the
low-order bytes of each result.  This particular form of a permute uses
a different set of input/output vector modes than have been used before,
so I added the altivec_vperm_v8hiv16qi insn to represent this.  (The two
source operands are vector halfword types, while the target operand is a
vector char type.)
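A scalar C model of the big-endian expansion may help when reviewing the mask. This is a sketch, not the patch itself. The 16-byte arrays stand in for the V16QI and V8HI registers in big-endian element order, and the helper names are hypothetical. Only the BYTES_BIG_ENDIAN mask values from the expander (2*i+1 and 2*i+17) are modelled; the little-endian mask is not.

```c
#include <stdint.h>

/* Multiply-even (odd == 0) or multiply-odd (odd == 1) signed bytes, as
   vmulesb / vmulosb do: eight 16-bit products of elements 2*i+odd,
   stored as big-endian halfwords (high-order byte first).  */
static void
mul_even_odd_sb (const int8_t a[16], const int8_t b[16], int odd,
                 uint8_t out[16])
{
  for (int i = 0; i < 8; i++)
    {
      uint16_t p = (uint16_t) (int16_t) (a[2 * i + odd] * b[2 * i + odd]);
      out[2 * i] = (uint8_t) (p >> 8);        /* high-order byte */
      out[2 * i + 1] = (uint8_t) (p & 0xff);  /* low-order byte */
    }
}

/* vperm: each result byte selects one of the 32 bytes of SRC1:SRC2,
   using the low five bits of the corresponding mask byte.  */
static void
vperm (const uint8_t src1[16], const uint8_t src2[16],
       const uint8_t mask[16], uint8_t out[16])
{
  for (int k = 0; k < 16; k++)
    {
      unsigned sel = mask[k] & 31;
      out[k] = sel < 16 ? src1[sel] : src2[sel - 16];
    }
}

/* Byte-wise multiply modulo 256, following the big-endian expansion.  */
static void
mulv16qi3_model (const int8_t a[16], const int8_t b[16], uint8_t out[16])
{
  uint8_t even[16], odd[16], mask[16];
  for (int i = 0; i < 8; i++)
    {
      mask[2 * i] = (uint8_t) (2 * i + 1);       /* low byte of even product i */
      mask[2 * i + 1] = (uint8_t) (2 * i + 17);  /* low byte of odd product i */
    }
  mul_even_odd_sb (a, b, 0, even);
  mul_even_odd_sb (a, b, 1, odd);
  vperm (even, odd, mask, out);
}
```

The low-order byte of a product does not depend on whether the operands are treated as signed or unsigned. For that reason, the same signed even/odd multiplies also give the right result for vector unsigned char. With the inputs a and b from vec-mult-char-1.c, the model reproduces expect_c.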

I've added two test variants, one to test the code generation, and one
executable test to check correctness.  One other test failed with this
change.  This turned out to be because PowerPC was excluded from the
check_effective_target_vect_char_mult target support test.  I resolved
this by adding check_effective_target_powerpc_altivec to that test.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this ok for trunk?

Thanks,
Bill


[gcc]

2015-09-03  Bill Schmidt  

* config/rs6000/altivec.md (altivec_vperm_v8hiv16qi): New
define_insn.
(mulv16qi3): New define_expand.

[gcc/testsuite]

2015-09-03  Bill Schmidt  

* gcc.target/powerpc/vec-mult-char-1.c: New test.
* gcc.target/powerpc/vec-mult-char-2.c: New test.


Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md(revision 227416)
+++ gcc/config/rs6000/altivec.md(working copy)
@@ -1957,6 +1957,16 @@
   "vperm %0,%1,%2,%3"
   [(set_attr "type" "vecperm")])
 
+(define_insn "altivec_vperm_v8hiv16qi"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+        (unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v")
+                       (match_operand:V8HI 2 "register_operand" "v")
+                       (match_operand:V16QI 3 "register_operand" "v")]
+                      UNSPEC_VPERM))]
+  "TARGET_ALTIVEC"
+  "vperm %0,%1,%2,%3"
+  [(set_attr "type" "vecperm")])
+
 (define_expand "altivec_vperm__uns"
   [(set (match_operand:VM 0 "register_operand" "=v")
(unspec:VM [(match_operand:VM 1 "register_operand" "v")
@@ -3161,6 +3171,34 @@
   ""
   "")
 
+(define_expand "mulv16qi3"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+        (mult:V16QI (match_operand:V16QI 1 "register_operand" "v")
+                    (match_operand:V16QI 2 "register_operand" "v")))]
+  "TARGET_ALTIVEC"
+  "
+{
+  rtx even = gen_reg_rtx (V8HImode);
+  rtx odd = gen_reg_rtx (V8HImode);
+  rtx mask = gen_reg_rtx (V16QImode);
+  rtvec v = rtvec_alloc (16);
+  bool be = BYTES_BIG_ENDIAN;
+  int i;
+
+  for (i = 0; i < 8; ++i)
+    {
+      RTVEC_ELT (v, 2 * i)
+        = gen_rtx_CONST_INT (QImode, be ? 2 * i + 1 : 31 - 2 * i);
+      RTVEC_ELT (v, 2 * i + 1)
+        = gen_rtx_CONST_INT (QImode, be ? 2 * i + 17 : 15 - 2 * i);
+    }
+
+  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_altivec_vmulesb (even, operands[1], operands[2]));
+  emit_insn (gen_altivec_vmulosb (odd, operands[1], operands[2]));
+  emit_insn (gen_altivec_vperm_v8hiv16qi (operands[0], even, odd, mask));
+  DONE;
+}")
+
 (define_expand "altivec_negv4sf2"
   [(use (match_operand:V4SF 0 "register_operand" ""))
(use (match_operand:V4SF 1 "register_operand" ""))]
Index: gcc/testsuite/gcc.target/powerpc/vec-mult-char-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-mult-char-1.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-mult-char-1.c  (working copy)
@@ -0,0 +1,53 @@
+/* { dg-do run { target { powerpc*-*-* && vmx_hw } } } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec" } */
+
+#include 
+
+extern void abort (void);
+
+vector unsigned char vmului(vector unsigned char v,
+   vector unsigned char i)
+{
+   return v * i;
+}
+
+vector signed char vmulsi(vector signed char v,
+ vector signed char i)
+{
+   return v * i;
+}
+
+int main ()
+{
+  vector unsigned char a = {2, 4, 6, 8, 10, 12, 14, 16,
+   18, 20, 22, 24, 26, 28, 30, 32};
+  vector unsigned char b = {3, 6, 9, 12, 15, 18, 21, 24,
+   27, 30, 33, 36, 39, 42, 45, 48};
+  vector unsigned char c = vmului (a, b);
+  vector unsigned char expect_c = {6, 24, 54, 96, 150, 216, 38, 128,
+  230, 88, 214, 96, 246, 152, 70, 0};
+
+  vector signed char d = {2, -4, 6, -8, 10, -12, 14, -16,
+ 18, -20, 22, -24, 26, -28, 30, -32};
+  vector signed char e = {3, 6, -9, -12, 15, 18, -21, -24,
+ 27, 30, -33, -36, 39, 42, -45, -48};
+  vector signed char f = vmulsi (d, e);
+  vector signed char expect_f

Re: RFC: Combine of compare & and oddity

2015-09-03 Thread Andrew Pinski
On Thu, Sep 3, 2015 at 10:59 PM, Wilco Dijkstra  wrote:
>> Segher Boessenkool wrote:
>> On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote:
>> > > > Combine canonicalizes certain AND masks in a comparison with zero into
>> > > > extracts of the widest register type. During matching these are
>> > > > expanded into a very inefficient sequence that fails to match. For
>> > > > example (x & 2) == 0 is matched in combine like this:
>> > > >
>> > > > Failed to match this instruction:
>> > > > (set (reg:CC 66 cc)
>> > > > (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
>> > > > (const_int 1 [0x1])
>> > > > (const_int 1 [0x1]))
>> > > > (const_int 0 [0])))
>> > >
>> > > Yes.  Some processors even have specific instructions to do this.
>> >
>> > However there are 2 issues with this, one is the spurious subreg,
>>
>> Combine didn't make that up out of thin air; something already used
>> DImode here.  It could simplify it to SImode in this case, that is
>> true, don't know why it doesn't; it isn't necessarily faster code to
>> do so, it can be slower, it might not match, etc.
>
> The relevant RTL instructions on AArch64 are:
>
> (insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ])
> (and:SI (reg/v:SI 76 [ xD.2641 ])
> (const_int 2 [0x2]))) tmp5.c:122 452 {andsi3}
>  (nil))
>  (insn 26 25 27 2 (set (reg:CC 66 cc)
> (compare:CC (reg:SI 77 [ D.2705 ])
> (const_int 0 [0]))) tmp5.c:122 377 {*cmpsi}
>  (expr_list:REG_DEAD (reg:SI 77 [ D.2705 ])
> (nil)))
>
> I don't see anything using DI...
>
>> > the other is that only a subset of legal zero_extracts are tried (only
>> > single bit and the degenerate case of zero_extract with shift of 0 -
>> > which I think should not be a zero_extract). All other AND immediate
>> > remain as AND.
>>
>> Yes.  I'm happy to see this weird special case "optimisation",
>> "canonicalisation" gone.
>>
>> > So to emit an AND on targets without such specific instructions, or where
>> > such instructions are more expensive than a simple AND (*), you need now
>> > at least 3 different backend patterns for any instruction that can emit
>> > AND immediate...
>>
>> It's only a problem for AND-and-compare, no?
>
> Yes, so it looks like some other backends match the odd pattern and then have
> another pattern change it back into the canonical AND/TST form during the
> split phase (maybe the subreg confuses register allocation or blocks other
> optimizations). This all seems a lot of unnecessary complexity for a few
> special immediates when there is a much simpler solution...
>
>> > (*) I think that is another issue in combine - when both alternatives match
>> > you want to select the lowest cost one, not the first one that matches.
>>
>> That's recog, not combine.  And quite a few backends rely on "first match
>> wins", because it always has been that way.  It also is very easy to write
>> such patterns accidentally (sometimes even the exact same one twice in the
>> same machine description, etc.)
>>
>> > > > Failed to match this instruction:
>> > > > (set (reg:CC 66 cc)
>> > > > (compare:CC (and:DI (lshiftrt:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
>> > > > (const_int 1 [0x1]))
>> > > > (const_int 1 [0x1]))
>> > > > (const_int 0 [0])))
>> > >
>> > > This is after r223067.  Combine tests only one "final" instruction; that
>> > > revision rewrites zero_ext* if it doesn't match and tries again.  This
>> > > helps for processors that can do more generic masks (like rs6000, and I
>> > > believe also aarch64?): without it, you need many more patterns to match
>> > > all the zero_ext{ract,end} cases.
>> >
>> > But there are more efficient ways to emit single bit and masks tests that
>> > apply to most CPUs rather than doing something specific that works for
>> > just one target only. For example single bit test is a simple shift into
>> > carry flag or into the sign bit, and for mask tests, you shift out all the
>> > non-mask bits.
>>
>> Most of those are quite target-specific.  Some others are already done,
>> and/or done by other passes.
>
> But what combine does here is even more target-specific. Shifts and AND
> setting flags are universally supported on targets with a condition code
> register. Bitfield test/extract instructions are rarer, and when they are
> supported, they may well be more expensive.
>
>> > So my question is, is it combine's job to try all possible permutations
>> > that constitute a bit or mask test?
>>
>> Combine converts the merged instructions to what it thinks is the
>> canonical or cheapest form, and uses that.  It does not try multiple
>> options (the zero_ext* -> and+shift rewriting is not changing the
>> semantics of the pattern at all).
>
> But the change from AND to zero_extract is already changing semantics...
>
>

Re: [patch] libstdc++/66998 Make std::experimental::not_fn SFINAE-friendly.

2015-09-03 Thread Jonathan Wakely

On 03/09/15 15:35 +0100, Jonathan Wakely wrote:

> Tested powerpc64le-linux, committed to trunk.


And gcc-5-branch.


commit dd64ea78da1f6e92ba011605ece7cc4bb08e41cc
Author: Jonathan Wakely 
Date:   Thu Sep 3 12:26:55 2015 +0100

   Make std::experimental::not_fn SFINAE-friendly.

PR libstdc++/66998
* include/experimental/functional (_Not_fn): Add exception
specifications and non-deduced return types.
(not_fn): Add exception specification and wrap pointer-to-member.
* testsuite/experimental/functional/not_fn.cc: Test in SFINAE context
and test pointer-to-member.


Re: [testsuite] Clean up effective_target cache

2015-09-03 Thread Christophe Lyon
On 3 September 2015 at 13:31, H.J. Lu  wrote:
> On Wed, Sep 2, 2015 at 7:02 AM, Christophe Lyon
>  wrote:
>> On 1 September 2015 at 16:04, Christophe Lyon
>>  wrote:
>>> On 25 August 2015 at 17:31, Mike Stump  wrote:
>>>> On Aug 25, 2015, at 1:14 AM, Christophe Lyon  wrote:
>>>>> Some subsets of the tests override ALWAYS_CXXFLAGS or
>>>>> TEST_ALWAYS_FLAGS and perform effective_target support tests using
>>>>> these modified flags.
>>>>
>>>>> This patch adds a new function 'clear_effective_target_cache', which
>>>>> is called at the end of every .exp file which overrides
>>>>> ALWAYS_CXXFLAGS or TEST_ALWAYS_FLAGS.
>>>>
>>>> So, a simple English directive somewhere that says, if one changes
>>>> ALWAYS_CXXFLAGS or TEST_ALWAYS_FLAGS then they should do a
>>>> clear_effective_target_cache at the end as the target cache can make
>>>> decisions based upon the flags, and those decisions need to be redone
>>>> when the flags change would be nice.
>>>>
>>>> I do wonder, do we need to reexamine when setting the flags?  I’m
>>>> thinking of a sequence like: non-thumb default, is_thumb, set flags
>>>> (thumb), is_thumb.  Anyway, safe to punt this until someone discovers it
>>>> or is reasonably sure it happens.
>>>>
>>>> Anyway, all looks good.  Ok.

>>> Here is what I have committed (r227372).
>>
>> Hmmm, in fact this was r227401.
>>
>
> It caused:
>
> ERROR: can't unset "et_cache(arm_neon_ok,value)": no such element in array
> ERROR: can't unset "et_cache(arm_neon_ok,value)": no such element in array
> ERROR: can't unset "et_cache(arm_neon_ok,value)": no such element in array
> ERROR: can't unset "et_cache(dfp,value)": no such element in array
> ERROR: can't unset "et_cache(fsanitize_address,value)": no such element in array
> ERROR: can't unset "et_cache(ia32,value)": no such element in array
> ERROR: can't unset "et_cache(ia32,value)": no such element in array
> ERROR: can't unset "et_cache(ia32,value)": no such element in array
> ERROR: can't unset "et_cache(ia32,value)": no such element in array
> ERROR: can't unset "et_cache(ia32,value)": no such element in array
> ERROR: can't unset "et_cache(ilp32,value)": no such element in array
> ERROR: can't unset "et_cache(ilp32,value)": no such element in array
> ERROR: can't unset "et_cache(ilp32,value)": no such element in array
> ERROR: can't unset "et_cache(ilp32,value)": no such element in array
> ERROR: can't unset "et_cache(label_values,value)": no such element in array
> ERROR: can't unset "et_cache(lp64,value)": no such element in array
> ERROR: can't unset "et_cache(lp64,value)": no such element in array
> ERROR: can't unset "et_cache(lp64,value)": no such element in array
> ERROR: can't unset "et_cache(ptr32plus,value)": no such element in array
> ERROR: can't unset "et_cache(ptr32plus,value)": no such element in array
> ...
>
> on Linux/x86-64:
>
> https://gcc.gnu.org/ml/gcc-testresults/2015-09/msg00167.html
>

I'll have a look.
That's the configuration I used to check before committing, but I am
going to re-check.

> --
> H.J.


Re: patch for PR61578

2015-09-03 Thread Vladimir Makarov

On 09/02/2015 11:32 AM, Christophe Lyon wrote:

> Hi Vladimir,
>
> On 1 September 2015 at 21:39, Vladimir Makarov  wrote:
>>    The following patch is for
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61578
>>
>>    The patch was bootstrapped and tested on x86 and x86-64.
>>
>>    Committed as rev. 227382.
>>
> Since this patch, I can see:
>    gcc.dg/vect/slp-perm-5.c (internal compiler error)
>    gcc.dg/vect/slp-perm-5.c -flto -ffat-lto-objects (internal compiler error)
>
> on arm* targets.
>
> Can you have a look?



Thanks for reporting this, Christophe.

Sure, I'll investigate it.  It is my fault, so I should fix it.


Re: Reviving SH FDPIC target

2015-09-03 Thread Joseph Myers
On Wed, 2 Sep 2015, Rich Felker wrote:

> So if __fpscr_values was the only reason for patch 1/3 in the FDPIC
> patchset, I think we can safely drop it. And patch 2/3 was already
> committed, so 3/3, the one I was originally looking at, seems to be
> all we need. It was approved at the time, so I'll proceed with merging
> it with 5.2.0.

Well, obviously if trying dropping patch 1/3 you need to remove everything 
related to use_initial_val (the feature added in patch 1/3) from patch 
3/3.

-- 
Joseph S. Myers
jos...@codesourcery.com


RE: RFC: Combine of compare & and oddity

2015-09-03 Thread Wilco Dijkstra
> Segher Boessenkool wrote:
> On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote:
> > > > Combine canonicalizes certain AND masks in a comparison with zero into
> > > > extracts of the widest register type. During matching these are
> > > > expanded into a very inefficient sequence that fails to match. For
> > > > example (x & 2) == 0 is matched in combine like this:
> > > >
> > > > Failed to match this instruction:
> > > > (set (reg:CC 66 cc)
> > > > (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
> > > > (const_int 1 [0x1])
> > > > (const_int 1 [0x1]))
> > > > (const_int 0 [0])))
> > >
> > > Yes.  Some processors even have specific instructions to do this.
> >
> > However there are 2 issues with this, one is the spurious subreg,
> 
> Combine didn't make that up out of thin air; something already used
> DImode here.  It could simplify it to SImode in this case, that is
> true, don't know why it doesn't; it isn't necessarily faster code to
> do so, it can be slower, it might not match, etc.

The relevant RTL instructions on AArch64 are:

(insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ])
(and:SI (reg/v:SI 76 [ xD.2641 ])
(const_int 2 [0x2]))) tmp5.c:122 452 {andsi3}
 (nil))
 (insn 26 25 27 2 (set (reg:CC 66 cc)
(compare:CC (reg:SI 77 [ D.2705 ])
(const_int 0 [0]))) tmp5.c:122 377 {*cmpsi}
 (expr_list:REG_DEAD (reg:SI 77 [ D.2705 ])
(nil)))

I don't see anything using DI...

> > the other is that only a subset of legal zero_extracts are tried (only
> > single bit and the degenerate case of zero_extract with shift of 0 -
> > which I think should not be a zero_extract). All other AND immediate
> > remain as AND.
> 
> Yes.  I'm happy to see this weird special case "optimisation",
> "canonicalisation" gone.
> 
> > So to emit an AND on targets without such specific instructions, or where
> > such instructions are more expensive than a simple AND (*), you need now
> > at least 3 different backend patterns for any instruction that can emit
> > AND immediate...
> 
> It's only a problem for AND-and-compare, no?

Yes, so it looks like some other backends match the odd pattern and then have
another pattern change it back into the canonical AND/TST form during the split
phase (maybe the subreg confuses register allocation or blocks other
optimizations). This all seems a lot of unnecessary complexity for a few
special immediates when there is a much simpler solution...

> > (*) I think that is another issue in combine - when both alternatives match
> > you want to select the lowest cost one, not the first one that matches.
> 
> That's recog, not combine.  And quite a few backends rely on "first match
> wins", because it always has been that way.  It also is very easy to write
> such patterns accidentally (sometimes even the exact same one twice in the
> same machine description, etc.)
> 
> > > > Failed to match this instruction:
> > > > (set (reg:CC 66 cc)
> > > > (compare:CC (and:DI (lshiftrt:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
> > > > (const_int 1 [0x1]))
> > > > (const_int 1 [0x1]))
> > > > (const_int 0 [0])))
> > >
> > > This is after r223067.  Combine tests only one "final" instruction; that
> > > revision rewrites zero_ext* if it doesn't match and tries again.  This
> > > helps for processors that can do more generic masks (like rs6000, and I
> > > believe also aarch64?): without it, you need many more patterns to match
> > > all the zero_ext{ract,end} cases.
> >
> > But there are more efficient ways to emit single bit and masks tests that
> > apply to most CPUs rather than doing something specific that works for
> > just one target only. For example single bit test is a simple shift into
> > carry flag or into the sign bit, and for mask tests, you shift out all the
> > non-mask bits.
> 
> Most of those are quite target-specific.  Some others are already done,
> and/or done by other passes.

But what combine does here is even more target-specific. Shifts and ANDs that
set flags are universally supported on targets with a condition code register.
Bitfield test/extract instructions are rarer, and when they are supported, they
may well be more expensive.
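To make the "shift into the sign bit" and "shift out the non-mask bits" alternatives concrete, here is a small C++ sketch showing that they compute the same predicates as the AND forms. The function names are illustrative only, and whether a target turns the sign-bit form into a flag-setting shift is up to its backend; this demonstrates the source-level equivalences, not what combine emits.

```cpp
#include <cassert>
#include <cstdint>

// Form reached via zero_extract / lshiftrt + AND: move bit 1 down, then mask.
static bool bit1_shift_and (std::uint32_t x) { return ((x >> 1) & 1u) != 0; }

// Canonical form: AND with an immediate (an AND/TST on many targets).
static bool bit1_and_imm (std::uint32_t x) { return (x & 2u) != 0; }

// Shift bit 1 into bit 31 (the sign bit) and test only that bit.
static bool bit1_via_sign (std::uint32_t x)
{
  return static_cast<std::uint32_t> (x << 30) >= 0x80000000u;
}

// Mask test for the contiguous mask 0x0ff0 (bits 4..11), AND form.
static bool mask_and_imm (std::uint32_t x) { return (x & 0x0ff0u) != 0; }

// Same mask test by shifting out every bit outside the mask.
static bool mask_via_shifts (std::uint32_t x)
{
  // Shifting left by 20 drops bits 12..31; shifting right by 24 then
  // drops the original bits 0..3, leaving only bits 4..11.
  return ((x << 20) >> 24) != 0;
}
```

Each pair returns the same result for every 32-bit input; only the instruction sequence a backend would pick differs.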

> > So my question is, is it combine's job to try all possible permutations that
> > constitute a bit or mask test?
> 
> Combine converts the merged instructions to what it thinks is the
> canonical or cheapest form, and uses that.  It does not try multiple
> options (the zero_ext* -> and+shift rewriting is not changing the
> semantics of the pattern at all).

But the change from AND to zero_extract is already changing semantics...

> > Or would it be better to let each target decide
> > on how to canonicalize bit tests and only try that alternative?
> 
> The question is how to write the pattern to be most convenient for all
> targets.

The obv

Ping Re: Pass -foffload targets from driver to libgomp at link time

2015-09-03 Thread Joseph Myers
Ping.  This patch is pending review.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PING] Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function

2015-09-03 Thread Joseph Myers
On Thu, 3 Sep 2015, Jason Merrill wrote:

> The C++ parts are OK.

The diagnostic should say "built-in" not "builtin" (see 
codingconventions.html).  The C parts are OK with that change (which will 
require testcases to be updated).

-- 
Joseph S. Myers
jos...@codesourcery.com


[hsa] Implement a number of atomic builtins

2015-09-03 Thread Martin Jambor
Hi,

The patch below implements expansion of a number of atomic builtin
calls into HSA atomic instructions.  Committed to the branch.
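The approach used for the "return the new value" variants (perform the atomic operation, which yields the old value, then recompute the new value non-atomically) can be illustrated with GCC's `__atomic` builtins. This is a plain C++ sketch, not HSA code, and the helper names are hypothetical. The recomputation is safe because it combines the operand with the old value returned by the atomic operation rather than re-reading memory, which matches what `__atomic_add_fetch` and friends return.

```cpp
#include <cassert>

// Hypothetical helper: emulate __atomic_add_fetch (returns the NEW value)
// using __atomic_fetch_add (returns the OLD value) plus a plain add.
static int
add_fetch_emulated (int *p, int val)
{
  int old = __atomic_fetch_add (p, val, __ATOMIC_SEQ_CST);
  return old + val;  // non-atomic recomputation from the old value
}

// Same idea for AND; SUB, OR and XOR follow the same pattern.
static int
and_fetch_emulated (int *p, int val)
{
  int old = __atomic_fetch_and (p, val, __ATOMIC_SEQ_CST);
  return old & val;
}
```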

Thanks,

Martin


2015-09-03  Martin Jambor  

* hsa-gen.c (gen_hsa_ternary_atomic_for_builtin): New function.
(gen_hsa_insns_for_call): Use it to implement appropriate builtin
calls.
---
 gcc/ChangeLog.hsa |   6 ++
 gcc/hsa-gen.c | 205 +-
 2 files changed, 209 insertions(+), 2 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index cbfc75a..9d569fe 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -3287,7 +3287,8 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
 /* Helper functions to create a single unary HSA operations out of calls to
builtins.  OPCODE is the HSA operation to be generated.  STMT is a gimple
call to a builtin.  HBB is the HSA BB to which the instruction should be
-   added and SSA_MAP is used to map gimple SSA names to HSA pseudoreisters.  */
+   added and SSA_MAP is used to map gimple SSA names to HSA
+   pseudoregisters.  */
 
 static void
 gen_hsa_unaryop_for_builtin (int opcode, gimple stmt, hsa_bb *hbb,
@@ -3304,6 +3305,97 @@ gen_hsa_unaryop_for_builtin (int opcode, gimple stmt, hsa_bb *hbb,
   gen_hsa_unary_operation (opcode, dest, op, hbb);
 }
 
+/* Helper function to create an HSA atomic binary operation instruction out of
+   calls to atomic builtins.  RET_ORIG is true if the built-in is the variant
+   that returns the value before applying the operation, and false if it should
+   return the value after applying the operation (if it returns a value at all).
+   ACODE is the atomic operation code, STMT is a gimple call to a builtin.  HBB
+   is the HSA BB to which the instruction should be added and SSA_MAP is used
+   to map gimple SSA names to HSA pseudoregisters.  */
+
+static void
+gen_hsa_ternary_atomic_for_builtin (bool ret_orig,
+   enum BrigAtomicOperation acode, gimple stmt,
+   hsa_bb *hbb, vec  *ssa_map)
+{
+  tree lhs = gimple_call_lhs (stmt);
+
+  tree type = TREE_TYPE (gimple_call_arg (stmt, 1));
+  BrigType16_t hsa_type  = hsa_type_for_scalar_tree_type (type, false);
+  BrigType16_t bit_type  = hsa_bittype_for_type (hsa_type);
+
+  hsa_op_reg *dest;
+  int nops, opcode;
+  if (lhs)
+{
+  if (ret_orig)
+   dest = hsa_reg_for_gimple_ssa (lhs, ssa_map);
+  else
+   dest = new hsa_op_reg (hsa_type);
+  opcode = BRIG_OPCODE_ATOMIC;
+  nops = 3;
+}
+  else
+{
+  dest = NULL;
+  opcode = BRIG_OPCODE_ATOMICNORET;
+  nops = 2;
+}
+
+  hsa_insn_atomic *atominsn = new hsa_insn_atomic (nops, opcode, acode,
+  bit_type);
+  hsa_op_address *addr;
+  addr = gen_hsa_addr (gimple_call_arg (stmt, 0), hbb, ssa_map);
+  hsa_op_base *op = hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 1),
+   hbb, ssa_map, NULL);
+
+  if (lhs)
+{
+  atominsn->set_op (0, dest);
+  atominsn->set_op (1, addr);
+  atominsn->set_op (2, op);
+}
+  else
+{
+  atominsn->set_op (0, addr);
+  atominsn->set_op (1, op);
+}
+  /* FIXME: Perhaps select a more relaxed memory model based on the last
+ argument of the built-in call.  */
+
+  hbb->append_insn (atominsn);
+
+  /* HSA does not natively support the variants that return the modified value,
+ so re-do the operation again non-atomically if that is what was
+ requested.  */
+  if (lhs && !ret_orig)
+{
+  int arith;
+  switch (acode)
+   {
+   case BRIG_ATOMIC_ADD:
+ arith = BRIG_OPCODE_ADD;
+ break;
+   case BRIG_ATOMIC_AND:
+ arith = BRIG_OPCODE_AND;
+ break;
+   case BRIG_ATOMIC_OR:
+ arith = BRIG_OPCODE_OR;
+ break;
+   case BRIG_ATOMIC_SUB:
+ arith = BRIG_OPCODE_SUB;
+ break;
+   case BRIG_ATOMIC_XOR:
+ arith = BRIG_OPCODE_XOR;
+ break;
+   default:
+ gcc_unreachable ();
+   }
+  hsa_op_reg *real_dest = hsa_reg_for_gimple_ssa (lhs, ssa_map);
+  gen_hsa_binary_operation (arith, real_dest, dest, op, hbb);
+}
+}
+
 /* Generate HSA instructions for the given call statement STMT.  Instructions
will be appended to HBB.  SSA_MAP maps gimple SSA names to HSA pseudo
registers.  */
@@ -3440,7 +3532,6 @@ specialop:
 case BUILT_IN_ATOMIC_LOAD_8:
 case BUILT_IN_ATOMIC_LOAD_16:
   {
-   /* XXX Ignore mem model for now.  */
BrigType16_t mtype = mem_type_for_type (hsa_type_for_scalar_tree_type
(TREE_TYPE (lhs), false));
hsa_op_address *addr = gen_hsa_addr (gimple_call_arg (stmt, 0),
@@ -3452,10 +3543,120 @@ specialop:
atominsn->set_op (0, dest);
atominsn->set_op (1, addr);
atominsn->memoryorder = BRIG_MEMORY_ORDER_SC_AC

[hsa] Represent atomic loads with atomic insn, introduce set_op

2015-09-03 Thread Martin Jambor
Hi,

Generation of atomic load instructions was still in its first, ancient
implementation, which did not generate atomic loads at all.  This is
fixed by the following patch.

The patch also introduces a method hsa_insn_basic::set_op, which we
plan to use almost everywhere to set instruction operands.  It has
the advantage of automatically keeping pseudoregisters in SSA form.
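The def/use bookkeeping that set_op performs can be sketched in isolation. This is a simplified C++ model, not the HSA branch's classes: `reg`, `insn` and the rule that the first `n_outputs` operands are outputs are hypothetical stand-ins for hsa_op_reg, hsa_insn_basic and hsa_opcode_op_output_p, and address operands are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct insn;

// Hypothetical pseudoregister: one defining instruction (SSA) and a use list.
struct reg
{
  insn *def = nullptr;
  std::vector<insn *> uses;
};

// Hypothetical instruction; by assumption, operands [0, n_outputs) are outputs.
struct insn
{
  std::vector<reg *> operands;
  std::size_t n_outputs;

  insn (std::size_t nops, std::size_t n_out)
    : operands (nops, nullptr), n_outputs (n_out) { }

  // Record a definition for an output operand or a use for an input
  // operand, then store the operand itself.
  void set_op (std::size_t index, reg *r)
  {
    if (index < n_outputs)
      r->def = this;
    else
      r->uses.push_back (this);
    operands[index] = r;
  }
};
```

Routing every operand assignment through one method is what keeps the def and use links from going stale when instructions are built piecemeal.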

Committed to the branch after rudimentary HSA testing.

Thanks,

Martin


2015-09-03  Martin Jambor  

* hsa.h (hsa_insn_mem): Move fields memoryorder and
memoryscope...
(hsa_insn_atomic): ...here.  Also add new operator.
* hsa-dump.c (dump_hsa_insn): Do not dump removed fields.
* hsa-gen.c (hsa_insn_mem): Remove initialization of removed
fields.
(hsa_insn_mem): Likewise.
(hsa_insn_atomic::hsa_insn_atomic): Initialize new fields.
(hsa_insn_atomic::new): New.
(gen_hsa_insns_for_call): Create atomic instruction for atomic
loads.
---
 gcc/ChangeLog.hsa | 20 +
 gcc/hsa-dump.c|  4 
 gcc/hsa-gen.c | 64 ---
 gcc/hsa.h | 15 +++--
 4 files changed, 76 insertions(+), 27 deletions(-)

diff --git a/gcc/hsa-dump.c b/gcc/hsa-dump.c
index 4d78519..af61ebc 100644
--- a/gcc/hsa-dump.c
+++ b/gcc/hsa-dump.c
@@ -796,10 +796,6 @@ dump_hsa_insn (FILE *f, hsa_insn_basic *insn, int *indent)
   fprintf (f, "%s", hsa_opcode_name (mem->opcode));
   if (addr->symbol)
fprintf (f, "_%s", hsa_seg_name (addr->symbol->segment));
-  if (mem->memoryorder != BRIG_MEMORY_ORDER_NONE)
-   fprintf (f, "_%s", hsa_memsem_name (mem->memoryorder));
-  if (mem->memoryscope != BRIG_MEMORY_SCOPE_NONE)
-   fprintf (f, "_%s", hsa_memscope_name (mem->memoryscope));
   if (mem->equiv_class != 0)
fprintf (f, "_equiv(%i)", mem->equiv_class);
   fprintf (f, "_%s ", hsa_type_name (mem->type));
diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 92df7e4..cbfc75a 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -888,6 +888,31 @@ hsa_insn_basic::hsa_insn_basic (unsigned nops, int opc)
 operands.safe_grow_cleared (nops);
 }
 
+/* Make OP the operand number INDEX of operands of this instruction.  If OP is a
+   register or an address containing a register, then either set the definition
+   of the register to this instruction if it is an output operand or add this
+   instruction to the uses if it is an input one.  */
+
+void
+hsa_insn_basic::set_op (int index, hsa_op_base *op)
+{
+  if (hsa_opcode_op_output_p (opcode, index))
+{
+  if (hsa_op_reg *reg = dyn_cast  (op))
+   reg->set_definition (this);
+}
+else
+  {
+   hsa_op_address *addr;
+   if (hsa_op_reg *reg = dyn_cast  (op))
+ reg->uses.safe_push (this);
+   else if ((addr = dyn_cast  (op))
+&& addr->reg)
+ addr->reg->uses.safe_push (this);
+  }
+  operands[index] = op;
+}
+
 /* Constructor of the class which is the bases of all instructions and directly
represents the most basic ones.  NOPS is the number of operands that the
operand vector will contain (and which will be cleared).  OPC is the opcode
@@ -1005,8 +1030,6 @@ hsa_insn_mem::hsa_insn_mem (int opc, BrigType16_t t, hsa_op_base *arg0,
   || opc == BRIG_OPCODE_EXPAND);
 
   equiv_class = 0;
-  memoryorder = BRIG_MEMORY_ORDER_NONE;
-  memoryscope = BRIG_MEMORY_SCOPE_NONE;
   operands[0] = arg0;
   operands[1] = arg1;
 }
@@ -1019,8 +1042,6 @@ hsa_insn_mem::hsa_insn_mem (unsigned nops, int opc, BrigType16_t t)
   : hsa_insn_basic (nops, opc, t)
 {
   equiv_class = 0;
-  memoryorder = BRIG_MEMORY_ORDER_NONE;
-  memoryscope = BRIG_MEMORY_SCOPE_NONE;
 }
 
 /* New operator to allocate memory instruction from pool alloc.  */
@@ -1031,9 +1052,9 @@ hsa_insn_mem::operator new (size_t)
   return hsa_allocp_inst_mem->vallocate ();
 }
 
-/* Constructor of class representing atomic instructions. OPC is the prinicpa;
-   opcode, aop is the specific atomic operation opcode.  T is the type of the
-   instruction.  */
+/* Constructor of class representing atomic instructions and signals. OPC is
+   the principal opcode, aop is the specific atomic operation opcode.  T is the
+   type of the instruction.  */
 
 hsa_insn_atomic::hsa_insn_atomic (int nops, int opc,
  enum BrigAtomicOperation aop,
@@ -1045,6 +1066,18 @@ hsa_insn_atomic::hsa_insn_atomic (int nops, int opc,
   opc == BRIG_OPCODE_SIGNAL ||
   opc == BRIG_OPCODE_SIGNALNORET);
   atomicop = aop;
+  /* TODO: Review the following defaults (together with the few overrides we
+ have in the code).  */
+  memoryorder = BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE;
+  memoryscope = BRIG_MEMORY_SCOPE_SYSTEM;
+}
+
+/* New operator to allocate signal instruction from pool alloc.  */
+
+void *
+hsa_insn_atomic::operator new (size_t)
+{
+  return h

[patch] libstdc++/66998 Make std::experimental::not_fn SFINAE-friendly.

2015-09-03 Thread Jonathan Wakely

Tested powerpc64le-linux, committed to trunk.


commit dd64ea78da1f6e92ba011605ece7cc4bb08e41cc
Author: Jonathan Wakely 
Date:   Thu Sep 3 12:26:55 2015 +0100

Make std::experimental::not_fn SFINAE-friendly.

PR libstdc++/66998
* include/experimental/functional (_Not_fn): Add exception
specifications and non-deduced return types.
(not_fn): Add exception specification and wrap pointer-to-member.
* testsuite/experimental/functional/not_fn.cc: Test in SFINAE context
and test pointer-to-member.

diff --git a/libstdc++-v3/include/experimental/functional b/libstdc++-v3/include/experimental/functional
index c6b9800..9db5fef 100644
--- a/libstdc++-v3/include/experimental/functional
+++ b/libstdc++-v3/include/experimental/functional
@@ -376,8 +376,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /// Generalized negator.
   template
-struct _Not_fn
+class _Not_fn
 {
+  _Fn _M_fn;
+
+public:
   template
explicit
_Not_fn(_Fn2&& __fn) : _M_fn(std::forward<_Fn2>(__fn)) { }
@@ -389,34 +392,43 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   ~_Not_fn() = default;
 
   template
-   decltype(auto)
+   auto
operator()(_Args&&... __args)
+   noexcept(noexcept(!_M_fn(std::forward<_Args>(__args)...)))
+   -> decltype(!_M_fn(std::forward<_Args>(__args)...))
{ return !_M_fn(std::forward<_Args>(__args)...); }
 
   template
-   decltype(auto)
+   auto
operator()(_Args&&... __args) const
+   noexcept(noexcept(!_M_fn(std::forward<_Args>(__args)...)))
+   -> decltype(!_M_fn(std::forward<_Args>(__args)...))
{ return !_M_fn(std::forward<_Args>(__args)...); }
 
   template
-   decltype(auto)
+   auto
operator()(_Args&&... __args) volatile
+   noexcept(noexcept(!_M_fn(std::forward<_Args>(__args)...)))
+   -> decltype(!_M_fn(std::forward<_Args>(__args)...))
{ return !_M_fn(std::forward<_Args>(__args)...); }
 
   template
-   decltype(auto)
+   auto
operator()(_Args&&... __args) const volatile
+   noexcept(noexcept(!_M_fn(std::forward<_Args>(__args)...)))
+   -> decltype(!_M_fn(std::forward<_Args>(__args)...))
{ return !_M_fn(std::forward<_Args>(__args)...); }
-
-private:
-  _Fn _M_fn;
 };
 
   /// [func.not_fn] Function template not_fn
-  template 
+  template
 inline auto
 not_fn(_Fn&& __fn)
-{ return _Not_fn>{std::forward<_Fn>(__fn)}; }
+noexcept(std::is_nothrow_constructible, _Fn&&>::value)
+{
+  using __maybe_type = _Maybe_wrap_member_pointer>;
+  return _Not_fn{std::forward<_Fn>(__fn)};
+}
 
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace fundamentals_v2
diff --git a/libstdc++-v3/testsuite/experimental/functional/not_fn.cc b/libstdc++-v3/testsuite/experimental/functional/not_fn.cc
index 8285ec4..4c137e8 100644
--- a/libstdc++-v3/testsuite/experimental/functional/not_fn.cc
+++ b/libstdc++-v3/testsuite/experimental/functional/not_fn.cc
@@ -20,6 +20,8 @@
 #include 
 #include 
 
+using std::experimental::not_fn;
+
 int func(int, char) { return 0; }
 
 struct F
@@ -33,8 +35,6 @@ struct F
 void
 test01()
 {
-  using std::experimental::not_fn;
-
   auto f1 = not_fn(func);
   VERIFY( f1(1, '2') == true );
 
@@ -50,8 +50,36 @@ test01()
   VERIFY( f5(1) == false );
 }
 
+template
+auto foo(F f, Arg arg) -> decltype(not_fn(f)(arg)) { return not_fn(f)(arg); }
+
+template
+auto foo(F f, Arg arg) -> decltype(not_fn(f)()) { return not_fn(f)(); }
+
+struct negator
+{
+bool operator()(int) const { return false; }
+void operator()() const {}
+};
+
+void 
+test02()
+{
+  foo(negator{}, 1); // PR libstdc++/66998
+}
+
+void
+test03()
+{
+  struct X { bool b; };
+  X x{ false };
+  VERIFY( not_fn(&X::b)(x) );
+}
+
 int
 main()
 {
   test01();
+  test02();
+  test03();
 }
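The essence of the change is replacing `decltype(auto)` with a trailing `decltype(...)` return type, so that an ill-formed call removes the overload during substitution instead of causing a hard error while the body is instantiated. The following is a minimal C++11 sketch of that technique; `negate_fn` and `make_negate` are hypothetical simplified stand-ins for `_Not_fn` and `not_fn` (no volatile overloads, no pointer-to-member wrapping). It also shows why `_M_fn` moved to the top of the class: a trailing return type is not a complete-class context, so the member must already be declared.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical simplified negator, modelled on the patched _Not_fn.
template<typename Fn>
class negate_fn
{
  Fn fn_;  // declared first: the trailing return type below names it

public:
  explicit negate_fn (Fn f) : fn_ (std::move (f)) { }

  // If !fn_(args...) is ill-formed, substitution fails here and this
  // overload drops out (SFINAE) instead of producing a hard error.
  template<typename... Args>
    auto operator() (Args&&... args) const
    noexcept (noexcept (!fn_ (std::forward<Args> (args)...)))
    -> decltype (!fn_ (std::forward<Args> (args)...))
    { return !fn_ (std::forward<Args> (args)...); }
};

template<typename Fn>
negate_fn<typename std::decay<Fn>::type>
make_negate (Fn&& f)
{
  return negate_fn<typename std::decay<Fn>::type> (std::forward<Fn> (f));
}

struct negator
{
  bool operator() (int) const { return false; }
  void operator() () const { }
};

// Mirrors test02: only the first overload is viable, because !void is
// ill-formed and the second overload is discarded during substitution.
template<typename F, typename Arg>
auto foo (F f, Arg arg) -> decltype (make_negate (f) (arg))
{ return make_negate (f) (arg); }

template<typename F, typename Arg>
auto foo (F f, Arg) -> decltype (make_negate (f) ())
{ return make_negate (f) (); }
```

With a `decltype(auto)` return type instead, forming `decltype (make_negate (f) ())` would require instantiating the body and fail hard on `!void`, which is the behaviour reported in PR libstdc++/66998.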




[patch] libstdc++/62039 Add concept checks to std::next and std::prev.

2015-09-03 Thread Jonathan Wakely

Marc suggested adding concept checks to std::prev; I've also done so
for std::next. Even though these checks are deprecated, they give us
somewhere to consider putting C++17 concept requirements.

Tested powerpc64le-linux, committed to trunk.

commit 48d52e4f02e6c392086dce8832319c68ebec68b9
Author: Jonathan Wakely 
Date:   Thu Sep 3 12:07:01 2015 +0100

Add concept checks to std::next and std::prev.

	PR libstdc++/62039
	* include/bits/stl_iterator_base_funcs.h (next, prev): Add concept
	checks.
	* testsuite/24_iterators/operations/prev_neg.cc: New.
	* testsuite/24_iterators/operations/next_neg.cc: New.

diff --git a/libstdc++-v3/include/bits/stl_iterator_base_funcs.h b/libstdc++-v3/include/bits/stl_iterator_base_funcs.h
index 516f8fc..0f77329 100644
--- a/libstdc++-v3/include/bits/stl_iterator_base_funcs.h
+++ b/libstdc++-v3/include/bits/stl_iterator_base_funcs.h
@@ -205,6 +205,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 next(_ForwardIterator __x, typename
 	 iterator_traits<_ForwardIterator>::difference_type __n = 1)
 {
+  // concept requirements
+  __glibcxx_function_requires(_ForwardIteratorConcept<
+  _ForwardIterator>)
   std::advance(__x, __n);
   return __x;
 }
@@ -214,6 +217,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 prev(_BidirectionalIterator __x, typename
 	 iterator_traits<_BidirectionalIterator>::difference_type __n = 1) 
 {
+  // concept requirements
+  __glibcxx_function_requires(_BidirectionalIteratorConcept<
+  _BidirectionalIterator>)
   std::advance(__x, -__n);
   return __x;
 }
diff --git a/libstdc++-v3/testsuite/24_iterators/operations/next_neg.cc b/libstdc++-v3/testsuite/24_iterators/operations/next_neg.cc
new file mode 100644
index 000..881307e
--- /dev/null
+++ b/libstdc++-v3/testsuite/24_iterators/operations/next_neg.cc
@@ -0,0 +1,42 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++11 -D_GLIBCXX_CONCEPT_CHECKS" }
+// { dg-do compile }
+
+#include 
+
+struct X {};
+
+namespace std
+{
+  template<>
+struct iterator_traits : iterator_traits
+{
+  using iterator_category = input_iterator_tag;
+  using reference = const X&;
+  using pointer = const X*;
+};
+}
+
+void
+test01()
+{
+  const X array[1] = { };
+  std::next(array);
+  // { dg-error "input_iterator" "" { target *-*-* } 220 }
+}
diff --git a/libstdc++-v3/testsuite/24_iterators/operations/prev_neg.cc b/libstdc++-v3/testsuite/24_iterators/operations/prev_neg.cc
new file mode 100644
index 000..513e0e8
--- /dev/null
+++ b/libstdc++-v3/testsuite/24_iterators/operations/prev_neg.cc
@@ -0,0 +1,42 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++11 -D_GLIBCXX_CONCEPT_CHECKS" }
+// { dg-do compile }
+
+#include 
+
+struct Y {};
+
+namespace std
+{
+  template<>
+struct iterator_traits : iterator_traits
+{
+  using iterator_category = forward_iterator_tag;
+  using reference = const Y&;
+  using pointer = const Y*;
+};
+}
+
+void
+test02()
+{
+  const Y array[1] = { };
+  std::prev(array + 1);
+  // { dg-error "forward_iterator" "" { target *-*-* } 220 }
+}
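For comparison, a rough C++11 analogue of these requirements can be written with `static_assert` on the iterator category tag. This is only a sketch under that assumption: `checked_next`, `checked_prev` and `has_category` are hypothetical names, they check only `iterator_traits<...>::iterator_category`, and `__glibcxx_function_requires` is implemented differently.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>

// Hypothetical helper: true if It's category is Tag or derived from it.
template<typename It, typename Tag>
struct has_category
  : std::is_base_of<Tag, typename std::iterator_traits<It>::iterator_category>
{ };

template<typename It>
It checked_next (It x, typename std::iterator_traits<It>::difference_type n = 1)
{
  static_assert (has_category<It, std::forward_iterator_tag>::value,
                 "checked_next requires a forward iterator");
  std::advance (x, n);
  return x;
}

template<typename It>
It checked_prev (It x, typename std::iterator_traits<It>::difference_type n = 1)
{
  static_assert (has_category<It, std::bidirectional_iterator_tag>::value,
                 "checked_prev requires a bidirectional iterator");
  std::advance (x, -n);
  return x;
}
```

With the `iterator_traits` specializations from next_neg.cc and prev_neg.cc, these helpers would likewise be rejected at compile time, since the tags no longer satisfy the requirement.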


[PATCH] Refactor dwarf2out_late_global_decl WRT early debug

2015-09-03 Thread Richard Biener

The following patch refactors dwarf2out_late_global_decl so that it only
adds location or const value attributes in the late dwarf phase.  It
adds LTO support by doing the early phase there as well (just
what it would have done on-the-fly when using dwarf2out_decl).

This change enables the other part of the patch, making sure we
finish all template stuff early (well, most of it - the part
requiring symbolic constants is left to the late phase).
The fact that this change requires the first one means that we fail to
create some early debug info (there's another unreviewed patch
from me addressing parts of that, but it doesn't address all
cases that happen during bootstrap - we're creating the stuff
"late" via the late_global_decl call from cgraphunit.c:analyze_functions).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

I'll try to dive into the above issue again when I return from vacation.

Richard.

2015-09-03  Richard Biener  

* dwarf2out.c (dwarf2out_late_global_decl): For LTO dispatch
to dwarf2out_early_global_decl first.  With early debug
just add locations or const value attributes on the early
created DIEs.
(append_entry_to_tmpl_value_parm_die_table): Assert we are
in early dwarf mode.
(schedule_generic_params_dies_gen): Likewise.
(gen_remaining_tmpl_value_param_die_attribute): Keep entries
we were not able to process.
(gen_scheduled_generic_parms_dies): Set early-dwarf and clear
vector after we're done.
(dwarf2out_finish): Do not call gen_scheduled_generic_parms_dies
here.
(dwarf2out_early_finish): Call gen_scheduled_generic_parms_dies
here and also process a first round of
gen_remaining_tmpl_value_param_die_attribute.
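The new gen_remaining_tmpl_value_param_die_attribute loop uses an in-place "process and keep the failures" compaction: entries that cannot be handled yet are copied down to index `j`, and the vector is truncated to `j`, so the late phase retries only those. A standalone sketch of the pattern, where `entry` and `try_process` are hypothetical stand-ins for die_arg_entry and tree_add_const_value_attribute (negative arguments pretend to be symbolic constants that cannot be resolved yet):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for die_arg_entry.
struct entry { int die; int arg; };

// Hypothetical stand-in for tree_add_const_value_attribute: returns false
// for entries that cannot be processed yet.
static bool try_process (const entry &e) { return e.arg >= 0; }

// Process every entry once; move the unprocessed ones to the front,
// preserving their order, and drop the rest.
static void
process_and_keep_failures (std::vector<entry> &v)
{
  std::size_t j = 0;
  for (std::size_t i = 0; i < v.size (); i++)
    if (!try_process (v[i]))
      v[j++] = v[i];
  v.erase (v.begin () + j, v.end ());  // analogue of vec::truncate (j)
}
```

This is similar in spirit to the erase/remove idiom, written as an explicit loop to mirror the patch.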

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 227446)
+++ gcc/dwarf2out.c (working copy)
@@ -21640,14 +21640,20 @@ dwarf2out_early_global_decl (tree decl)
 static void
 dwarf2out_late_global_decl (tree decl)
 {
-  /* Output any global decls we missed or fill-in any location
- information we were unable to determine on the first pass.
-
- Skip over functions because they were handled by the
- debug_hooks->function_decl() call in rest_of_handle_final.  */
-  if ((TREE_CODE (decl) != FUNCTION_DECL || !DECL_INITIAL (decl))
+  /* We have to generate early debug late for LTO.  */
+  if (in_lto_p)
+dwarf2out_early_global_decl (decl);
+
+/* Fill-in any location information we were unable to determine
+   on the first pass.  */
+  if (TREE_CODE (decl) == VAR_DECL
   && !POINTER_BOUNDS_P (decl))
-dwarf2out_decl (decl);
+{
+  dw_die_ref die = lookup_decl_die (decl);
+  if (die)
+   add_location_or_const_value_attribute (die, decl, false,
+  DW_AT_location);
+}
 }
 
 /* Output debug information for type decl DECL.  Called from toplev.c
@@ -22105,6 +22111,8 @@ append_entry_to_tmpl_value_parm_die_tabl
   if (!die || !arg)
 return;
 
+  gcc_assert (early_dwarf);
+
   if (!tmpl_value_parm_die_table)
 vec_alloc (tmpl_value_parm_die_table, 32);
 
@@ -22134,6 +22142,8 @@ schedule_generic_params_dies_gen (tree t
   if (!generic_type_p (t))
 return;
 
+  gcc_assert (early_dwarf);
+
   if (!generic_type_instances)
 vec_alloc (generic_type_instances, 256);
 
@@ -22149,11 +22159,21 @@ gen_remaining_tmpl_value_param_die_attri
 {
   if (tmpl_value_parm_die_table)
 {
-  unsigned i;
+  unsigned i, j;
   die_arg_entry *e;
 
+  /* We do this in two phases - first get the cases we can
+handle during early-finish, preserving those we cannot
+(containing symbolic constants where we don't yet know
+whether we are going to output the referenced symbols).
+For those we try again at late-finish.  */
+  j = 0;
   FOR_EACH_VEC_ELT (*tmpl_value_parm_die_table, i, e)
-   tree_add_const_value_attribute (e->die, e->arg);
+   {
+ if (!tree_add_const_value_attribute (e->die, e->arg))
+   (*tmpl_value_parm_die_table)[j++] = *e;
+   }
+  tmpl_value_parm_die_table->truncate (j);
 }
 }
 
@@ -22171,9 +22191,15 @@ gen_scheduled_generic_parms_dies (void)
   if (!generic_type_instances)
 return;
   
+  /* We end up "recursing" into schedule_generic_params_dies_gen, so
+ pretend this generation is part of "early dwarf" as well.  */
+  set_early_dwarf s;
+
   FOR_EACH_VEC_ELT (*generic_type_instances, i, t)
 if (COMPLETE_TYPE_P (t))
   gen_generic_params_dies (t);
+
+  generic_type_instances = NULL;
 }
 
 
@@ -25207,7 +25233,6 @@ dwarf2out_finish (const char *filename)
   producer->dw_attr_val.v.val_str->refcount--;
   producer->dw_attr_val.v.val_str = find_AT_string (producer_string);
 
-  gen_scheduled_generic_parms_dies ();
   gen_remaining_tmpl_value_param_die_attribute ();
 
   /* Add the name for the main input fi

Re: [PING] Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function

2015-09-03 Thread Jason Merrill

The C++ parts are OK.

Jason


Re: [gomp4.1] Depend clause support for offloading

2015-09-03 Thread Jakub Jelinek
On Wed, Sep 02, 2015 at 05:58:54PM +0200, Jakub Jelinek wrote:
> Here is the start of the async offloading support I've talked about,
> but nowait is not supported on the library side yet, only depend clause
> (and for that I haven't added a testcase yet).

Adding the testcase revealed two (small) issues; here is a fix for them
together with the testcase.

BTW, unless we want to add support (at least now) for running tasks in
between sending offloading target requests for memory allocation or data
movement and the offloading target signaling their completion (presumably
we would then need to do something like writev: merge as many requests as
possible into one meta-request and then await its completion), I think
that for now we can ignore nowait on target {update,{enter,exit} data}
on the library side if a depend clause is not also present.

I'll try to work on target {update,{enter,exit} data} nowait depend next
(in that case we need to copy the arrays and create some gomp_task).

2015-09-03  Jakub Jelinek  

* omp-low.c (lower_depend_clauses): Set TREE_ADDRESSABLE on array.
(lower_omp_target): Use gimple_omp_target_clauses_ptr instead of
gimple_omp_task_clauses_ptr.

* testsuite/libgomp.c/target-25.c: New test.

--- gcc/omp-low.c.jj2015-09-02 15:13:13.0 +0200
+++ gcc/omp-low.c   2015-09-03 15:24:15.153716381 +0200
@@ -12975,6 +12975,7 @@ lower_depend_clauses (tree *pclauses, gi
}
   tree type = build_array_type_nelts (ptr_type_node, n_in + n_out + 2);
   tree array = create_tmp_var (type);
+  TREE_ADDRESSABLE (array) = 1;
   tree r = build4 (ARRAY_REF, ptr_type_node, array, size_int (0), NULL_TREE,
   NULL_TREE);
   g = gimple_build_assign (r, build_int_cst (ptr_type_node, n_in + n_out));
@@ -13182,7 +13183,7 @@ lower_omp_target (gimple_stmt_iterator *
 {
   push_gimplify_context ();
   dep_bind = gimple_build_bind (NULL, NULL, make_node (BLOCK));
-  lower_depend_clauses (gimple_omp_task_clauses_ptr (stmt),
+  lower_depend_clauses (gimple_omp_target_clauses_ptr (stmt),
&dep_ilist, &dep_olist);
 }
 
--- libgomp/testsuite/libgomp.c/target-25.c.jj  2015-09-03 15:02:34.130651945 +0200
+++ libgomp/testsuite/libgomp.c/target-25.c 2015-09-03 15:49:52.077362256 +0200
@@ -0,0 +1,84 @@
+#include 
+#include 
+
+int
+main ()
+{
+  int x = 0, y = 0, z = 0, s = 11, t = 12, u = 13, w = 7, err;
+  #pragma omp parallel
+  #pragma omp single
+  {
+#pragma omp task depend(in: x)
+{
+  usleep (5000);
+  x = 1;
+}
+#pragma omp task depend(in: x)
+{
+  usleep (6000);
+  y = 2;
+}
+#pragma omp task depend(out: z)
+{
+  usleep (7000);
+  z = 3;
+}
+#pragma omp target map(tofrom: x) firstprivate (y) depend(inout: x, z)
+err = (x != 1 || y != 2 || z != 3);
+if (err)
+  abort ();
+#pragma omp task depend(in: x)
+{
+  usleep (5000);
+  x = 4;
+}
+#pragma omp task depend(in: x)
+{
+  usleep (4000);
+  y = 5;
+}
+#pragma omp task depend(in: z)
+{
+  usleep (3000);
+  z = 6;
+}
+#pragma omp target enter data nowait map (to: w)
+#pragma omp target enter data depend (inout: x, z) map (to: x, y, z)
+#pragma omp target map (alloc: x, y, z)
+{
+  err = (x != 4 || y != 5 || z != 6);
+  x = 7;
+  y = 8;
+  z = 9;
+}
+if (err)
+  abort ();
+#pragma omp taskwait
+#pragma omp target map (alloc: w)
+{
+  err = w != 7;
+  w = 17;
+}
+if (err)
+  abort (); 
+#pragma omp task depend(in: x)
+{
+  usleep (2000);
+  s = 14;
+}
+#pragma omp task depend(in: x)
+{
+  usleep (3000);
+  t = 15;
+}
+#pragma omp task depend(in: z)
+{
+  usleep (4000);
+  u = 16;
+}
+#pragma omp target exit data depend (inout: x, z) map (from: x, y, z, w)
+if (x != 7 || y != 8 || z != 9 || s != 14 || t != 15 || u != 16 || w != 17)
+  abort ();
+  }
+  return 0;
+}

Jakub


[PATCH][committed] Some trivial dwarf2out.c refactoring

2015-09-03 Thread Richard Biener

I split this bit out from the early-LTO debug work and the [1/n] patch
that is still pending review.

As I found that I need to preserve the old LTO behavior because of
tooling issues with old linkers, I am now concentrating on getting
the dwarf2out early and late phases refactored in a way that does not
break old LTO, rather than doing that together with early LTO debug.

Bootstrapped & tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-09-03  Richard Biener  

* dwarf2out.c (flush_limbo_die_list): Split out from ...
(dwarf2out_early_finish): ... here.
(dwarf2out_finish): Do not call dwarf2out_early_finish but
flush_limbo_die_list.  Assert we have no deferred asm names.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 227400)
+++ gcc/dwarf2out.c (working copy)
@@ -25127,6 +25127,62 @@ optimize_location_lists (dw_die_ref die)
   optimize_location_lists_1 (die, &htab);
 }
 
+/* Traverse the limbo die list, and add parent/child links.  The only
+   dies without parents that should be here are concrete instances of
+   inline functions, and the comp_unit_die.  We can ignore the comp_unit_die.
+   For concrete instances, we can get the parent die from the abstract
+   instance.  */
+
+static void
+flush_limbo_die_list (void)
+{
+  limbo_die_node *node, *next_node;
+
+  for (node = limbo_die_list; node; node = next_node)
+{
+  dw_die_ref die = node->die;
+  next_node = node->next;
+
+  if (die->die_parent == NULL)
+   {
+ dw_die_ref origin = get_AT_ref (die, DW_AT_abstract_origin);
+
+ if (origin && origin->die_parent)
+   add_child_die (origin->die_parent, die);
+ else if (is_cu_die (die))
+   ;
+ else if (seen_error ())
+   /* It's OK to be confused by errors in the input.  */
+   add_child_die (comp_unit_die (), die);
+ else
+   {
+ /* In certain situations, the lexical block containing a
+nested function can be optimized away, which results
+in the nested function die being orphaned.  Likewise
+with the return type of that nested function.  Force
+this to be a child of the containing function.
+
+It may happen that even the containing function got fully
+inlined and optimized out.  In that case we are lost and
+assign the empty child.  This should not be a big issue as
+the function is likely unreachable too.  */
+ gcc_assert (node->created_for);
+
+ if (DECL_P (node->created_for))
+   origin = get_context_die (DECL_CONTEXT (node->created_for));
+ else if (TYPE_P (node->created_for))
+   origin = scope_die_for (node->created_for, comp_unit_die ());
+ else
+   origin = comp_unit_die ();
+
+ add_child_die (origin, die);
+   }
+   }
+}
+
+  limbo_die_list = NULL;
+}
+
 /* Output stuff that dwarf requires at the end of every file,
and generate the DWARF-2 debugging info.  */
 
@@ -25137,7 +25193,11 @@ dwarf2out_finish (const char *filename)
   dw_die_ref main_comp_unit_die;
 
   /* Flush out any latecomers to the limbo party.  */
-  dwarf2out_early_finish ();
+  flush_limbo_die_list ();
+
+  /* We shouldn't have any symbols with delayed asm names for
+ DIEs generated after early finish.  */
+  gcc_assert (deferred_asm_name == NULL);
 
   /* PCH might result in DW_AT_producer string being restored from the
  header compilation, so always fill it with empty string initially
@@ -25483,7 +25543,7 @@ dwarf2out_finish (const char *filename)
 static void
 dwarf2out_early_finish (void)
 {
-  limbo_die_node *node, *next_node;
+  limbo_die_node *node;
 
   /* Add DW_AT_linkage_name for all deferred DIEs.  */
   for (node = deferred_asm_name; node; node = node->next)
@@ -25501,57 +25561,9 @@ dwarf2out_early_finish (void)
 }
   deferred_asm_name = NULL;
 
-  /* Traverse the limbo die list, and add parent/child links.  The only
- dies without parents that should be here are concrete instances of
- inline functions, and the comp_unit_die.  We can ignore the comp_unit_die.
- For concrete instances, we can get the parent die from the abstract
- instance.
-
- The point here is to flush out the limbo list so that it is empty
+  /* The point here is to flush out the limbo list so that it is empty
  and we don't need to stream it for LTO.  */
-  for (node = limbo_die_list; node; node = next_node)
-{
-  dw_die_ref die = node->die;
-  next_node = node->next;
-
-  if (die->die_parent == NULL)
-   {
- dw_die_ref origin = get_AT_ref (die, DW_AT_abstract_origin);
-
- if (origin && origin->die_parent)
-   add_child_die (origin->die_parent, die);
- else if (is_cu_die (die))
-   ;
-  

Re: [RFC] Try vector as a new representation for vector masks

2015-09-03 Thread Ilya Enkovich
2015-09-03 15:11 GMT+03:00 Richard Biener :
> On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich  wrote:
>> Adding CCs.
>>
>> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich :
>>> 2015-09-01 17:25 GMT+03:00 Richard Biener :
>>>
>>> Totally disabling old style vector comparison and bool patterns is a
>>> goal, but doing that would mean a lot of regressions for many targets.
>>> Do you want it to be tried, to estimate the amount of changes required
>>> and reveal possible issues? What would be the integration plan for these
>>> changes? Do you want to just introduce new vector in GIMPLE,
>>> disabling bool patterns and then resolving vectorization regressions on
>>> all targets, or allow them to live together, with targets then switching
>>> from bool patterns one by one and finally removing them? Not all
>>> targets are likely to be adapted fast, I suppose.
>
> Well, the frontends already create vec_cond exprs I believe.  So for
> bool patterns the vectorizer would have to do the same, but the
> comparison result in there would still use vec.  Thus the scalar
>
>  _Bool a = b < c;
>  _Bool c = a || d;
>  if (c)
>
> would become
>
>  vec a = VEC_COND ;
>  vec c = a | d;

This should be identical to

vec<_Bool> a = a < b;
vec<_Bool> c = a | d;

where vec<_Bool> has VxSI mode. And we should prefer it in case target
supports vector comparison into vec, right?

>
> when the target does not have vecs directly and otherwise
> vec directly (dropping the VEC_COND).
>
> Just the vector comparison inside the VEC_COND would always
> have vec type.

I don't really understand what you mean by 'doesn't have vecs
directly' here. Currently I have a hook to ask for a vec mode
and assume the target doesn't support it in case it returns VOIDmode. But
in such a case I have no mode to use for vec inside VEC_COND
either.

In the default implementation of the new target hook I always return an
integer vector mode (to have default behavior similar to the current
one). It should allow me to use vec for conditions in all
vec_conds. But we'd need some other trigger for bool patterns to apply.
Probably check the vec_cmp optab in check_bool_pattern and don't convert
in case the comparison is supported by the target? Or control it via an
additional hook.

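Here is what the "integer vector mode" default looks like in practice. With the GCC/Clang generic vector extension, comparing two integer vectors yields an integer vector. Each lane is -1 (all ones) for true and 0 for false, and such masks combine with ordinary bitwise operators.

The sketch below assumes that extension (the `vector_size` attribute). `v4si` and `lt_or` are illustrative names. It is the lane-wise form of Richard's scalar example `_Bool a = b < c; _Bool c = a || d;`, with the mask held in an integer vector rather than a vector of booleans.

```c
#include <assert.h>

/* Four 32-bit lanes; a comparison mask uses the same integer vector type.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Lane-wise (b < c) | d, where D is itself a 0 / -1 mask.  */
static v4si
lt_or (v4si b, v4si c, v4si d)
{
  v4si a = b < c; /* each lane: -1 if true, 0 if false */
  return a | d;   /* combining masks is a plain bitwise OR */
}
```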
>
> And the "bool patterns" I am talking about are those in
> tree-vect-patterns.c, not any targets instruction patterns.

I mean those too. BTW, bool patterns also pull the comparison into the
vec_cond, so we cannot have an SSA_NAME as the condition of a vec_cond. I
think with vector comparisons in place we should allow SSA_NAME as
conditions in VEC_COND for better CSE. That should require new vcond
optabs though.

Ilya

>
> Richard.
>
>>>
>>> Ilya


Re: RFC: Combine of compare & and oddity

2015-09-03 Thread Segher Boessenkool
On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote:
> > > Combine canonicalizes certain AND masks in a comparison with zero into
> > > extracts of the widest register type. During matching these are
> > > expanded into a very inefficient sequence that fails to match. For
> > > example (x & 2) == 0 is matched in combine like this:
> > >
> > > Failed to match this instruction:
> > > (set (reg:CC 66 cc)
> > > (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
> > > (const_int 1 [0x1])
> > > (const_int 1 [0x1]))
> > > (const_int 0 [0])))
> > 
> > Yes.  Some processors even have specific instructions to do this.
> 
> However there are 2 issues with this, one is the spurious subreg,

Combine didn't make that up out of thin air; something already used
DImode here.  It could simplify it to SImode in this case, that is
true, don't know why it doesn't; it isn't necessarily faster code to
do so, it can be slower, it might not match, etc.

> the other is that only a subset of legal zero_extracts are tried (only
> single bit and the degenerate case of zero_extract with shift of 0 - which
> I think should not be a zero_extract). All other AND immediate remain as
> AND.

Yes.  I'm happy to see this weird special case "optimisation",
"canonicalisation" gone.

> So to emit an AND on targets without such specific instructions, or where
> such instructions are more expensive than a simple AND (*), you need now at
> least 3 different backend patterns for any instruction that can emit AND
> immediate...

It's only a problem for AND-and-compare, no?

> (*) I think that is another issue in combine - when both alternatives match
> you want to select the lowest cost one, not the first one that matches.

That's recog, not combine.  And quite a few backends rely on "first match
wins", because it always has been that way.  It also is very easy to write
such patterns accidentally (sometimes even the exact same one twice in the
same machine description, etc.)

> > > Failed to match this instruction:
> > > (set (reg:CC 66 cc)
> > > (compare:CC (and:DI (lshiftrt:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
> > > (const_int 1 [0x1]))
> > > (const_int 1 [0x1]))
> > > (const_int 0 [0])))
> > 
> > This is after r223067.  Combine tests only one "final" instruction; that
> > revision rewrites zero_ext* if it doesn't match and tries again.  This
> > helps for processors that can do more generic masks (like rs6000, and I
> > believe also aarch64?): without it, you need many more patterns to match
> > all the zero_ext{ract,end} cases.
> 
> But there are more efficient ways to emit single bit and masks tests that
> apply to most CPUs rather than doing something specific that works for just
> one target only. For example single bit test is a simple shift into carry
> flag or into the sign bit, and for mask tests, you shift out all the
> non-mask bits.

Most of those are quite target-specific.  Some others are already done,
and/or done by other passes.

> So my question is, is it combine's job to try all possible permutations that
> constitute a bit or mask test?

Combine converts the merged instructions to what it thinks is the
canonical or cheapest form, and uses that.  It does not try multiple
options (the zero_ext* -> and+shift rewriting is not changing the
semantics of the pattern at all).

> Or would it be better to let each target decide
> on how to canonicalize bit tests and only try that alternative?

The question is how to write the pattern to be most convenient for all
targets.

> > > Neither matches the AArch64 patterns for ANDS/TST (which is just compare
> > > and AND). If the immediate is not a power of 2 or a power of 2 -1 then
> > > it matches correctly as expected.
> > >
> > > I don't understand how ((x >> 1) & 1) != 0 could be a useful expansion
> > 
> > It is zero_extract(x,1,1) really.  This is convenient for (old and embedded)
> > processors that have special bit-test instructions.  If we now want combine
> > to not do this, we'll have to update all backends that rely on it.
> 
> Would any backend actually rely on this given it only does some specific
> masks, has a redundant shift with 0 for the mask case and the odd subreg as
> well?

Such backends match the zero_extract patterns, of course.  Random example:
the h8300 patterns for the "btst" instruction.

> > They are common, and many processors had instructions for them, which is
> > (supposedly) why combine special-cased them.
> 
> Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well...

Combine does not try multiple options.

> > > It's trivial to change change_zero_ext to expand extracts always into AND
> > > and remove the redundant subreg.
> > 
> > Not really trivial...  Think about comparisons...
> > 
> > int32_t x;
> > ((x >> 31) & 1) > 0;
> > // is not the same as
> > (x & 0x8000) >

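For reference, the RTL forms in these dumps all express the same test of bit 1. The C sketch below rests on stated assumptions:

- `zero_extract32` is a hypothetical helper that models `(zero_extract x len pos)`.
- Bit 0 is the least significant bit, that is, `BITS_BIG_ENDIAN` is 0 as on AArch64.
- Values are 32-bit unsigned for simplicity, whereas the dumps use DImode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper modelling (zero_extract x len pos), bit 0 = LSB.
   Assumes len >= 1 and pos + len <= 32.  */
static uint32_t
zero_extract32 (uint32_t x, unsigned len, unsigned pos)
{
  uint32_t field_mask = len == 32 ? UINT32_MAX : (UINT32_C (1) << len) - 1;
  return (x >> pos) & field_mask;
}

/* The three spellings of "bit 1 of x is clear" seen in the combine dumps.  */
static int
bit1_forms_agree (uint32_t x)
{
  int as_and = (x & 2) == 0;
  int as_shift_and = ((x >> 1) & 1) == 0;
  int as_extract = zero_extract32 (x, 1, 1) == 0;
  return as_and == as_shift_and && as_shift_and == as_extract;
}
```

The same identity generalizes to a contiguous mask C = ((1 << len) - 1) << pos: `(x & C) != 0` is `zero_extract32 (x, len, pos) != 0`. The thread is about which of these equivalent spellings combine should hand to the backend.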
[PATCH] Side-step wide_int_to_tree issue

2015-09-03 Thread Richard Biener

In this particular place which otherwise triggers with

Index: gcc/tree.c
===
--- gcc/tree.c  (revision 227429)
+++ gcc/tree.c  (working copy)
@@ -1395,6 +1395,8 @@ wide_int_to_tree (tree type, const wide_
gcc_checking_assert (pcst.elt (l - 2) >= 0);
 }
 
+  gcc_assert (prec <= pcst.get_precision ());
+
   wide_int cst = wide_int::from (pcst, prec, sgn);
   unsigned int ext_len = get_int_cst_ext_nunits (type, cst);
 


Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-09-03  Richard Biener  

* varasm.c (output_constant): Use fold_convert instead of
wide_int_to_tree.

Index: gcc/varasm.c
===
--- gcc/varasm.c(revision 227429)
+++ gcc/varasm.c(working copy)
@@ -4699,7 +4699,7 @@ output_constant (tree exp, unsigned HOST
exp = build1 (ADDR_EXPR, saved_type, TREE_OPERAND (exp, 0));
   /* Likewise for constant ints.  */
   else if (TREE_CODE (exp) == INTEGER_CST)
-   exp = wide_int_to_tree (saved_type, exp);
+   exp = fold_convert (saved_type, exp);
 
 }
 


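The varasm.c change replaces `wide_int_to_tree` with `fold_convert`. The new assertion in tree.c rejects a target precision larger than the constant's own. Widening is where signedness matters: the extra bits must be filled according to the signedness of the value being widened.

Below is a C sketch on 64-bit host integers. `fit_to_precision` and `convert_const` are hypothetical helpers, not GCC's wide_int API, and precisions of at most 64 bits are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep the low PREC bits of V (1 <= PREC <= 64) and
   extend them back to 64 bits, replicating bit PREC-1 when IS_SIGNED.  */
static uint64_t
fit_to_precision (uint64_t v, unsigned prec, int is_signed)
{
  if (prec == 64)
    return v;
  uint64_t mask = (UINT64_C (1) << prec) - 1;
  v &= mask;
  if (is_signed && ((v >> (prec - 1)) & 1))
    v |= ~mask; /* sign-extend */
  return v;
}

/* Hypothetical helper: convert a constant of SRC_PREC bits and signedness
   SRC_SIGNED to DST_PREC bits and signedness DST_SIGNED.  The value is
   first interpreted in its own type, then fitted to the destination.  */
static uint64_t
convert_const (uint64_t bits, unsigned src_prec, int src_signed,
               unsigned dst_prec, int dst_signed)
{
  uint64_t value = fit_to_precision (bits, src_prec, src_signed);
  return fit_to_precision (value, dst_prec, dst_signed);
}
```

For example, take an 8-bit signed -1 (pattern 0xFF) converted to a 16-bit unsigned type. `convert_const (0xFF, 8, 1, 16, 0)` gives 0xFFFF. Extending the raw pattern with the destination's signedness, `fit_to_precision (0xFF, 16, 0)`, gives 0xFF instead. Narrowing only drops bits, so it does not depend on which signedness is used for extension.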
Re: [RFC] Try vector as a new representation for vector masks

2015-09-03 Thread Richard Biener
On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich  wrote:
> Adding CCs.
>
> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich :
>> 2015-09-01 17:25 GMT+03:00 Richard Biener :
>>> On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich  
>>> wrote:
 On 27 Aug 09:55, Richard Biener wrote:
> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich  
> wrote:
> >
> > Yes, I want to try it. But getting rid of bool patterns would mean
> > support for all targets currently supporting vec_cond. Would it be OK
> > to have vector mask co-exist with bool patterns for some time?
>
> No, I'd like to remove the bool patterns anyhow - the vectorizer should be
> able to figure out the correct vector type (or mask type) from the uses.
> Currently it simply looks at the stmts LHS type but as all stmt operands
> already have vector types it can as well compute the result type from
> those.  We'd want to have a helper function that does this result type
> computation as I figure it will be needed in multiple places.
>
> This is now on my personal TODO list (but that's already quite long for
> GCC 6), so if you manage to get to that...  see
> tree-vect-loop.c:vect_determine_vectorization_factor
> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
> vector type set from data-ref analysis already - there 'bool' loads
> correctly get VNQImode).  There is a basic-block / SLP part as well that
> would need to use the helper function (eventually with some SLP discovery
> order issue).
>
> > Thus first step would be to require vector for MASK_LOAD and
> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and
> > MASK_STORE).
>
> You can certainly try that first, but as soon as you hit complications 
> with
> needing to adjust bool patterns then I'd rather get rid of them.
>
> >
> > I can directly build a vector type with specified mode to avoid it.
> > Smth. like:
> >
> > mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode);
>
> Hmm, indeed, that might be a (good) solution.  Btw, in this case target
> attribute boundaries would be "ignored" (that is, TYPE_MODE wouldn't change
> depending on the active target).  There would also be no way for the user
> to declare vector in source (which is good because of that target
> attribute issue...).
>
> So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits) and
> adjusting truth_type_for is the way to go.
>
> I suggest you try modifying those parts first according to this scheme
> that will most likely uncover issues we missed.
>
> Thanks,
> Richard.
>

 I tried to implement this scheme and apply it for MASK_LOAD and 
 MASK_STORE.  There were no major issues (for now).

 build_truth_vector_type and get_mask_type_for_scalar_type were added to 
 build a mask type.  It is always a vector of bools but its mode is 
 determined by a target using number of units and currently used vector 
 length.

 As previously I fixed if-conversion to apply boolean masks for loads and 
 stores which automatically disables bool patterns for them and flow goes 
 by a mask path.  Vectorization factor computation is fixed to have a 
 separate computation for mask types.  Comparison is now handled separately 
 by vectorizer and is vectorized into vector comparison.

 Optabs for masked loads and stores were transformed into convert optabs.  
 Now it is checked using both value and mask modes.

 Optabs for comparison were added.  These are also convert optabs checking 
 value and result type.

 I had to introduce a significant number of new patterns in i386 target to
 support new optabs.  The reason was vector compare was never expanded 
 separately and always was a part of a vec_cond expansion.
>>>
>>> Indeed.
>>>
 As a result it's possible to use the same GIMPLE representation for both
 vector and scalar masks target type.  Here is an example I used as a 
 simple test:

   for (i=0; i>>>   {
 float t = a[i];
 if (t > 0.0f && t < 1.0e+2f)
   if (c[i] != 0)
 c[i] = 1;
   }

 Produced vector GIMPLE (before expand):

   vect_t_5.22_105 = MEM[base: _256, offset: 0B];
   mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
   mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 };
   mask__8.27_110 = mask__6.23_107 & mask__7.25_109;
   vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110);
   mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 

Re: [RFC] Try vector as a new representation for vector masks

2015-09-03 Thread Ilya Enkovich
Adding CCs.

2015-09-03 15:03 GMT+03:00 Ilya Enkovich :
> 2015-09-01 17:25 GMT+03:00 Richard Biener :
>> On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich  wrote:
>>> On 27 Aug 09:55, Richard Biener wrote:
 On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich  
 wrote:
 >
 > Yes, I want to try it. But getting rid of bool patterns would mean
 > support for all targets currently supporting vec_cond. Would it be OK
 > to have vector mask co-exist with bool patterns for some time?

 No, I'd like to remove the bool patterns anyhow - the vectorizer should be
 able to figure out the correct vector type (or mask type) from the uses.
 Currently it simply looks at the stmts LHS type but as all stmt operands
 already have vector types it can as well compute the result type from
 those.  We'd want to have a helper function that does this result type
 computation as I figure it will be needed in multiple places.

 This is now on my personal TODO list (but that's already quite long for
 GCC 6), so if you manage to get to that...  see
 tree-vect-loop.c:vect_determine_vectorization_factor
 which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
 vector type set from data-ref analysis already - there 'bool' loads
 correctly get VNQImode).  There is a basic-block / SLP part as well that
 would need to use the helper function (eventually with some SLP discovery
 order issue).

 > Thus first step would be to require vector for MASK_LOAD and
 > MASK_STORE and support it for i386 (the only user of MASK_LOAD and
 > MASK_STORE).

 You can certainly try that first, but as soon as you hit complications with
 needing to adjust bool patterns then I'd rather get rid of them.

 >
 > I can directly build a vector type with specified mode to avoid it. 
 > Smth. like:
 >
 > mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
 > mask_type = make_vector_type (bool_type_node, nunits, mask_mode);

 Hmm, indeed, that might be a (good) solution.  Btw, in this case target
 attribute boundaries would be "ignored" (that is, TYPE_MODE wouldn't change
 depending on the active target).  There would also be no way for the user
 to declare vector in source (which is good because of that target
 attribute issue...).

 So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits) and
 adjusting truth_type_for is the way to go.

 I suggest you try modifying those parts first according to this scheme
 that will most likely uncover issues we missed.

 Thanks,
 Richard.

>>>
>>> I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE. 
>>>  There were no major issues (for now).
>>>
>>> build_truth_vector_type and get_mask_type_for_scalar_type were added to 
>>> build a mask type.  It is always a vector of bools but its mode is 
>>> determined by a target using number of units and currently used vector 
>>> length.
>>>
>>> As previously I fixed if-conversion to apply boolean masks for loads and 
>>> stores which automatically disables bool patterns for them and flow goes by 
>>> a mask path.  Vectorization factor computation is fixed to have a separate 
>>> computation for mask types.  Comparison is now handled separately by 
>>> vectorizer and is vectorized into vector comparison.
>>>
>>> Optabs for masked loads and stores were transformed into convert optabs.  
>>> Now it is checked using both value and mask modes.
>>>
>>> Optabs for comparison were added.  These are also convert optabs checking 
>>> value and result type.
>>>
>>> I had to introduce a significant number of new patterns in i386 target to
>>> support new optabs.  The reason was vector compare was never expanded 
>>> separately and always was a part of a vec_cond expansion.
>>
>> Indeed.
>>
>>> As a result it's possible to use the same GIMPLE representation for both
>>> vector and scalar masks target type.  Here is an example I used as a simple 
>>> test:
>>>
>>>   for (i=0; i>>   {
>>> float t = a[i];
>>> if (t > 0.0f && t < 1.0e+2f)
>>>   if (c[i] != 0)
>>> c[i] = 1;
>>>   }
>>>
>>> Produced vector GIMPLE (before expand):
>>>
>>>   vect_t_5.22_105 = MEM[base: _256, offset: 0B];
>>>   mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
>>>   mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 };
>>>   mask__8.27_110 = mask__6.23_107 & mask__7.25_109;
>>>   vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110);
>>>   mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 };
>>>   mask__37.35_120 = mask__8.27_110 & mask__36.33_119;
>>>   MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 });
>>
>> Looks good to me.
>>
>>> Produce

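The vectorized loop in this thread only loads and stores `c[i]` in lanes whose mask is set. Below is a scalar C model of one 8-lane step of the GIMPLE shown in the thread. It assumes that `MASK_LOAD` reads nothing in inactive lanes (modelled here as 0) and that `MASK_STORE` writes only active lanes. `masked_step` and `LANES` are illustrative names. `c` is assumed to be an `int` array, since its type is not given.

```c
#include <assert.h>
#include <stdbool.h>

#define LANES 8 /* matches the 8-element vectors in the GIMPLE dump */

/* Scalar model of one vector step over a[0..7] and c[0..7].  */
static void
masked_step (const float *a, int *c)
{
  bool m8[LANES], m37[LANES];
  int v9[LANES];

  for (int l = 0; l < LANES; l++)
    {
      /* mask__8 = (t > 0.0) & (t < 1.0e+2)  */
      m8[l] = a[l] > 0.0f && a[l] < 1.0e+2f;
      /* vect__9 = MASK_LOAD (c, mask__8): inactive lanes are not read.  */
      v9[l] = m8[l] ? c[l] : 0;
      /* mask__37 = mask__8 & (vect__9 != 0)  */
      m37[l] = m8[l] && v9[l] != 0;
    }
  /* MASK_STORE (c, mask__37, { 1, ... }): only active lanes are written.  */
  for (int l = 0; l < LANES; l++)
    if (m37[l])
      c[l] = 1;
}
```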
Re: [wwwdocs] Skeleton for GCC 6 release notes

2015-09-03 Thread Sebastian Huber

Hello,

How can I add something to the release notes? I would like to mention
some RTEMS changes.


The RTEMS thread model implementation changed. For the mutexes,
self-contained objects defined in Newlib  are used instead
of Classic API semaphores.  The keys and the once function are directly
defined via .  Condition variables are provided via Newlib
. The RTEMS thread model now supports C++11 threads.


The OpenMP support now uses self-contained objects provided by Newlib
 and offers significantly better performance compared to
the POSIX configuration. It is possible to configure scheduler
instance-specific thread pools.





RE: RFC: Combine of compare & and oddity

2015-09-03 Thread Wilco Dijkstra
> Segher Boessenkool wrote:
> Hi Wilco,
> 
> On Wed, Sep 02, 2015 at 06:09:24PM +0100, Wilco Dijkstra wrote:
> > Combine canonicalizes certain AND masks in a comparison with zero into
> > extracts of the widest register type. During matching these are expanded
> > into a very inefficient sequence that fails to match. For example
> > (x & 2) == 0 is matched in combine like this:
> >
> > Failed to match this instruction:
> > (set (reg:CC 66 cc)
> > (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
> > (const_int 1 [0x1])
> > (const_int 1 [0x1]))
> > (const_int 0 [0])))
> 
> Yes.  Some processors even have specific instructions to do this.

However there are 2 issues with this, one is the spurious subreg, the other is 
that only a subset of legal zero_extracts are tried (only single bit and the
degenerate case of zero_extract with shift of 0 - which I think should not be a
zero_extract). All other AND immediate remain as AND. 

So to emit an AND on targets without such specific instructions, or where such
instructions are more expensive than a simple AND (*), you need now at least 3
different backend patterns for any instruction that can emit AND immediate...

(*) I think that is another issue in combine - when both alternatives match you
want to select the lowest cost one, not the first one that matches.

> > Failed to match this instruction:
> > (set (reg:CC 66 cc)
> > (compare:CC (and:DI (lshiftrt:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
> > (const_int 1 [0x1]))
> > (const_int 1 [0x1]))
> > (const_int 0 [0])))
> 
> This is after r223067.  Combine tests only one "final" instruction; that
> revision rewrites zero_ext* if it doesn't match and tries again.  This
> helps for processors that can do more generic masks (like rs6000, and I
> believe also aarch64?): without it, you need many more patterns to match
> all the zero_ext{ract,end} cases.

But there are more efficient ways to emit single bit and masks tests that apply
to most CPUs rather than doing something specific that works for just one target
only. For example single bit test is a simple shift into carry flag or into the 
sign bit, and for mask tests, you shift out all the non-mask bits.
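Both alternatives can be written in portable C:

- A single-bit test shifts the bit into the sign position and tests for a negative value.
- A contiguous-mask test shifts out the bits above and below the mask and tests for nonzero.

The sketch below assumes 32-bit values, and the function names are illustrative. Whether a given target actually emits shifts for these is target-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Is bit N of X set?  Shift bit N into bit 31 and test the sign.
   Assumes n <= 31.  The conversion to int32_t of a value above INT32_MAX
   is implementation-defined in C; GCC wraps it modulo 2^32.  */
static int
bit_set_via_sign (uint32_t x, unsigned n)
{
  return (int32_t) (x << (31 - n)) < 0;
}

/* Is (X & C) != 0 for C = ((1 << len) - 1) << pos?  Shift out the bits
   above the mask, then the bits below it.  Assumes len >= 1 and
   pos + len <= 32, so both shift counts stay in [0, 31].  */
static int
mask_nonzero_via_shifts (uint32_t x, unsigned len, unsigned pos)
{
  return ((x << (32 - pos - len)) >> (32 - len)) != 0;
}
```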

So my question is, is it combine's job to try all possible permutations that
constitute a bit or mask test? Or would it be better to let each target decide
on how to canonicalize bit tests and only try that alternative?

> > Neither matches the AArch64 patterns for ANDS/TST (which is just compare
> > and AND). If the immediate is not a power of 2 or a power of 2 -1 then it
> > matches correctly as expected.
> >
> > I don't understand how ((x >> 1) & 1) != 0 could be a useful expansion
> 
> It is zero_extract(x,1,1) really.  This is convenient for (old and embedded)
> processors that have special bit-test instructions.  If we now want combine
> to not do this, we'll have to update all backends that rely on it.

Would any backend actually rely on this given it only does some specific masks,
has a redundant shift with 0 for the mask case and the odd subreg as well?

> > (it even uses shifts by 0 at
> > times which are unlikely to ever match anything).
> 
> It matches fine on some targets.  It certainly looks silly, and could be
> expressed simpler.  Patch should be simple; watch this space / remind me /
> or file a PR.

I don't think this issue matters as it seems unlikely it ever matches.

> > Why does combine not try to match the obvious (x & C) != 0 case?
> > Single-bit and mask tests are very common, so this blocks efficient code
> > generation on many targets.
> 
> They are common, and many processors had instructions for them, which is
> (supposedly) why combine special-cased them.

Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well...

> > It's trivial to change change_zero_ext to expand extracts always into AND
> > and remove the redundant subreg.
> 
> Not really trivial...  Think about comparisons...
> 
> int32_t x;
> ((x >> 31) & 1) > 0;
> // is not the same as
> (x & 0x8000) > 0; // signed comparison
> 
> (You do not easily know what the COMPARE is used for).

Indeed, if you had a zero_extract that included the sign bit then you may have
to adjust the compare condition. However, it looks like that case can't
happen - (x & 0x8000) comparisons will have the AND optimized away much
earlier.

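The caveat about the COMPARE can be made concrete. Once the extracted field contains the sign bit, the two forms disagree under a signed `> 0`. The extract yields 0 or 1, while the AND yields 0 or the most negative value. The C sketch below assumes 32-bit two's complement and writes the sign-bit mask as `INT32_MIN` (0x80000000). The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* ((x >> 31) & 1) > 0 with a logical shift, mirroring zero_extract: the
   field is 0 or 1, so this is true exactly when the sign bit is set.  */
static int
extract_form (int32_t x)
{
  return (((uint32_t) x >> 31) & 1) > 0;
}

/* (x & sign_bit) > 0 as a signed comparison: the AND yields 0 or
   INT32_MIN, and neither is greater than zero.  */
static int
and_form (int32_t x)
{
  return (x & INT32_MIN) > 0;
}

/* The same AND followed by an unsigned comparison agrees with
   extract_form again.  */
static int
and_form_unsigned (int32_t x)
{
  return ((uint32_t) x & UINT32_C (0x80000000)) > 0;
}
```

With an unsigned or equality comparison the forms agree. This is why rewriting the extract as an AND needs to know how the condition is consumed.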
> > However wouldn't it make more sense to never special case certain AND
> > immediate in the first place?
> 
> Yes, but we need to make sure no targets regress (i.e. by rewriting patterns
> for those that do).  We need to know the impact of such a change, at the
> least.

The alternative would be to let the target decide how to canonicalize bit
tests. That seems like a better solution than trying multiple possibilities
which can never match on most targets.

Wilco




Re: [PATCH 3/3] [gomp] Add thread attribute customization

2015-09-03 Thread Sebastian Huber

On 03/09/15 13:10, Jakub Jelinek wrote:

On Thu, Sep 03, 2015 at 01:09:23PM +0200, Sebastian Huber wrote:

We have only thread attributes in this function: mutable_attr and attr. The
attr is initialized with &gomp_thread_attr and gomp_thread_attr is supposed
to be read-only by this function. Under certain conditions we have to modify
the initial attributes. Since gomp_thread_attr is read-only, we have to copy
it and then modify the copy. For this we need some storage: mutable_attr.

So use local_thread_attr if you want to stress it, but IMHO thread_attr
is just fine.  I really don't like mutable_attr.

Ok, if I don't rename thread_attr, is the patch ok?

Yes.


Thanks a lot for your kind review.

I committed the patches as:

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=227439
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=227440
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=227441
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=227442
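The idea discussed above can be sketched with plain POSIX calls: leave the shared read-only defaults untouched and build a modified copy only when customization is needed. Because `pthread_attr_t` is opaque, the sketch does not assign the structure. Instead it initializes a fresh local attribute object and copies over the properties it cares about (detach state and stack size). `default_attr`, `select_thread_attr` and `need_custom` are hypothetical names, not libgomp's.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared defaults, set up once and treated as read-only.  */
static pthread_attr_t default_attr;

/* Return the attributes to use for a new thread.  Without customization
   the shared defaults are used as-is; otherwise LOCAL is initialized from
   them and may then be modified.  The caller must pthread_attr_destroy
   LOCAL when this function returns it.  */
static pthread_attr_t *
select_thread_attr (pthread_attr_t *local, int need_custom)
{
  int detach;
  size_t stacksize;

  if (!need_custom)
    return &default_attr;

  pthread_attr_init (local);
  if (pthread_attr_getdetachstate (&default_attr, &detach) == 0)
    pthread_attr_setdetachstate (local, detach);
  if (pthread_attr_getstacksize (&default_attr, &stacksize) == 0)
    pthread_attr_setstacksize (local, stacksize);
  /* ... per-thread customization of *LOCAL would go here ...  */
  return local;
}
```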




Re: [testsuite] Clean up effective_target cache

2015-09-03 Thread H.J. Lu
On Wed, Sep 2, 2015 at 7:02 AM, Christophe Lyon
 wrote:
> On 1 September 2015 at 16:04, Christophe Lyon
>  wrote:
>> On 25 August 2015 at 17:31, Mike Stump  wrote:
>>> On Aug 25, 2015, at 1:14 AM, Christophe Lyon  
>>> wrote:
 Some subsets of the tests override ALWAYS_CXXFLAGS or
 TEST_ALWAYS_FLAGS and perform effective_target support tests using
 these modified flags.
>>>
 This patch adds a new function 'clear_effective_target_cache', which
 is called at the end of every .exp file which overrides
 ALWAYS_CXXFLAGS or TEST_ALWAYS_FLAGS.
>>>
>>> So, a simple English directive somewhere that says, if one changes 
>>> ALWAYS_CXXFLAGS or TEST_ALWAYS_FLAGS then they should do a 
>>> clear_effective_target_cache at the end as the target cache can make 
>>> decisions based upon the flags, and those decisions need to be redone when 
>>> the flags change would be nice.
>>>
>>> I do wonder, do we need to reexamine when setting the flags?  I’m thinking
>>> of a sequence like: non-thumb default, is_thumb, set flags (thumb),
>>> is_thumb.  Anyway, safe to punt this until someone discovers it or is
>>> reasonably sure it happens.
>>>
>>> Anyway, all looks good.  Ok.
>>>
>> Here is what I have committed (r227372).
>
> Hmmm, in fact this was r227401.
>

It caused:

ERROR: can't unset "et_cache(arm_neon_ok,value)": no such element in array
ERROR: can't unset "et_cache(arm_neon_ok,value)": no such element in array
ERROR: can't unset "et_cache(arm_neon_ok,value)": no such element in array
ERROR: can't unset "et_cache(dfp,value)": no such element in array
ERROR: can't unset "et_cache(fsanitize_address,value)": no such element in array
ERROR: can't unset "et_cache(ia32,value)": no such element in array
ERROR: can't unset "et_cache(ia32,value)": no such element in array
ERROR: can't unset "et_cache(ia32,value)": no such element in array
ERROR: can't unset "et_cache(ia32,value)": no such element in array
ERROR: can't unset "et_cache(ia32,value)": no such element in array
ERROR: can't unset "et_cache(ilp32,value)": no such element in array
ERROR: can't unset "et_cache(ilp32,value)": no such element in array
ERROR: can't unset "et_cache(ilp32,value)": no such element in array
ERROR: can't unset "et_cache(ilp32,value)": no such element in array
ERROR: can't unset "et_cache(label_values,value)": no such element in array
ERROR: can't unset "et_cache(lp64,value)": no such element in array
ERROR: can't unset "et_cache(lp64,value)": no such element in array
ERROR: can't unset "et_cache(lp64,value)": no such element in array
ERROR: can't unset "et_cache(ptr32plus,value)": no such element in array
ERROR: can't unset "et_cache(ptr32plus,value)": no such element in array
...

on Linux/x86-64:

https://gcc.gnu.org/ml/gcc-testresults/2015-09/msg00167.html

-- 
H.J.


Re: [Patch] Add to the libgfortran/newlib bodge to "detect" ftruncate support in ARM/AArch64/SH

2015-09-03 Thread Hans-Peter Nilsson
On Thu, 3 Sep 2015, James Greenhalgh wrote:
> On Sun, Aug 30, 2015 at 02:46:26PM +0100, Hans-Peter Nilsson wrote:
> > (Pruned the CC list a bit as lists are included anyway)
> >
> > On Fri, 28 Aug 2015, James Greenhalgh wrote:
> > > Give me a shout if you see issues in your build systems.
> >
> > Since you asked: I saw a build failure for cris-elf matching the
> > missing-kill-declaration issue, and I don't much like having to
> > take manual steps to force a new newlib version. It isn't being
> > automatically updated because there are regressions in my gcc
> > test-suite results.  I guess autodetecting the kill-declaration
> > issue in libgfortran is unnecessarily complicated, given
> > a fixed newlib trunk.  All in all, I appreciate you don't force
> > a new newlib on release branches.
>
> Hi,
>
> I could postpone the pain until after the release of GCC 6; by that
> point the newlib change will have had a little longer to make it into
> people's trees. On the other hand, this seems like the best way to fix
> the issue, and we are probably as well to do it now while we are still
> sitting in stage 1.

I do agree; the fault was with newlib for not declaring a
standard function that could actually be linked.  (It did cause
a linker *warning* for the stub function, but libtool didn't
care.)

> I don't want to cause you too much inconvenience, so if you'd like, I
> can revert the more comprehensive patch from trunk for now. I would be
> very keen to push it again, either late in GCC 6 development, or soon
> after the opening of GCC 7.
>
> Otherwise, if you're happy enough with the fix staying in place, I'll
> just leave it.

...that's why I said "All in all, I appreciate you don't force a
new newlib on release branches".  Maybe I wasn't clear: you
didn't; trunk is not a release branch.  Just wanted to cast a
vote for that decision to remain as-is.

> Sorry to have caused you any issues.

No worries.  If you hadn't asked to give a shout I'd have
remained silent - unless I'd have found other issues. :]

brgds, H-P


Re: [PATCH v2][GCC] Algorithmic optimization in match and simplify

2015-09-03 Thread Andre Vieira

On 01/09/15 15:01, Richard Biener wrote:

On Tue, Sep 1, 2015 at 3:40 PM, Andre Vieira
 wrote:

Hi Marc,

On 28/08/15 19:07, Marc Glisse wrote:


(not a review, I haven't even read the whole patch)

On Fri, 28 Aug 2015, Andre Vieira wrote:


2015-08-03  Andre Vieira  

   * match.pd: Added new patterns:
 ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2
 (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)

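The second pattern, `(X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)`, relies on shifts distributing over bitwise operations. Below is a brute-force C check over 8-bit `X` and `C0` and shift counts below 8. It assumes unsigned operands, so that `>>` is a logical shift. `check_shift_distributes` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if (X op C0) shift C1 == (X shift C1) op (C0 shift C1) for all
   8-bit X and C0, all shift counts C1 < 8, op in {|, ^, &} and shift in
   {<<, >>}; return 0 on the first counterexample.  */
static int
check_shift_distributes (void)
{
  for (uint32_t x = 0; x < 256; x++)
    for (uint32_t c0 = 0; c0 < 256; c0++)
      for (unsigned c1 = 0; c1 < 8; c1++)
        if (((x | c0) << c1) != ((x << c1) | (c0 << c1))
            || ((x ^ c0) << c1) != ((x << c1) ^ (c0 << c1))
            || ((x & c0) << c1) != ((x << c1) & (c0 << c1))
            || ((x | c0) >> c1) != ((x >> c1) | (c0 >> c1))
            || ((x ^ c0) >> c1) != ((x >> c1) ^ (c0 >> c1))
            || ((x & c0) >> c1) != ((x >> c1) & (c0 >> c1)))
          return 0;
  return 1;
}
```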


+(for op0 (rshift rshift lshift lshift bit_and bit_and)
+ op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor)
+ op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior)

You can nest for-loops, it seems clearer as:
(for op0 (rshift lshift bit_and)
(for op1 (bit_ior bit_xor)
 op2 (bit_xor bit_ior)



Will do, thank you for pointing it out.



+(simplify
+ (op2:c
+  (op1:c
+   (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)

I suspect you will want more :s (single_use) and less :c (canonicalization
should put constants in second position).


I can't find the definition of :s (single_use).


Sorry for that - didn't get around to updating it yet :/  It restricts matching to
sub-expressions that have a single use.  So

+  a &= 0xd123;
+  unsigned short tem = a ^ 0x6040;
+  a = tem | 0xc031; /* Simplify _not_ to ((a & 0xd123) | 0xe071).  */
... use of tem ...

we shouldn't do the simplification here because the expression
(a & 0xd123) ^ 0x6040 is kept live by 'tem'.


The GCC internals manual does point out
that canonicalization puts constants in the second position; I didn't see
that at first. Thank you for pointing it out.


+   C1 = wi::bit_and_not (C1,C2);

Space after ','.


Will do.


Having wide_int_storage in many places is surprising, I can't find similar
code anywhere else in gcc.




I tried looking for examples of something similar, I think I ended up using
wide_int because I was able to convert easily to and from it and it has the
"mask" and "wide_int_to_tree" functions. I welcome any suggestions on what I
should be using here for integer constant transformations and comparisons.


Using wide-ints is fine, but you shouldn't need 'wide_int_storage'
constructors - those
are indeed odd.  Is it just for the initializers of wide-ints?

+wide_int zero_mask = wi::zero (prec);
+wide_int C0 = wide_int_storage (@1);
+wide_int C1 = wide_int_storage (@2);
+wide_int C2 = wide_int_storage (@3);
...
+   zero_mask = wide_int_storage (wi::mask (C0.to_uhwi (), false, prec));

tree_to_uhwi (@1) should do the trick as well

+   C1 = wi::bit_and_not (C1,C2);
+   cst_emit = wi::bit_or (C1, C2);

the ops should be replacable with @2 and @3, the store to C1 obviously not
(but you can use a tree temporary and use wide_int_to_tree here to avoid
the back-and-forth for the case where C1 is not assigned to).

Note that transforms only doing association are prone to endless recursion
in case some other pattern does the reverse op...

Richard.



BR,
Andre




Thank you for all the comments, see reworked version:

Two new algorithmic optimisations:
  1. (((X op0 C0) op1 C1) op2 C2)
with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2
zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0)
and 0's otherwise.
if (op1 == '^') C1 &= ~C2 (Only changed if actually emitted)
if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2)
if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2)
  2. (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
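As a reading aid, the constant-combining step of transformation 1 can be sketched in plain C. This is a minimal sketch assuming unsigned operands of at most 32 bits; the enum and the functions zero_mask_for() and combine_constants() are hypothetical names, not code from the patch, and the real match.pd pattern works on wide_int rather than uint32_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operator encoding; these names are not from GCC.  */
enum op { OP_AND, OP_LSHIFT, OP_RSHIFT, OP_IOR, OP_XOR };

/* Bits guaranteed to be zero in (X op0 C0) for a PREC-bit unsigned X
   (1 <= PREC <= 32; for the shifts, C0 < PREC).  */
static uint32_t
zero_mask_for (enum op op0, uint32_t c0, unsigned prec)
{
  uint32_t all = prec == 32 ? UINT32_MAX : (UINT32_C (1) << prec) - 1;
  switch (op0)
    {
    case OP_AND:
      return ~c0 & all;                  /* bits cleared by the mask */
    case OP_LSHIFT:
      return (UINT32_C (1) << c0) - 1;   /* low C0 bits */
    case OP_RSHIFT:
      return all & ~(all >> c0);         /* high C0 bits (logical shift) */
    default:
      return 0;                          /* nothing known */
    }
}

/* Try to rewrite ((X op0 C0) op1 C1) op2 C2, where op1 and op2 are the two
   different operators from {|, ^}, as (X op0 C0) *OUT_OP *OUT_CST.
   Returns false if neither condition from the description holds.  */
static bool
combine_constants (enum op op1, enum op op2, uint32_t zero_mask,
                   uint32_t c1, uint32_t c2, enum op *out_op, uint32_t *out_cst)
{
  /* For (Y ^ C1) | C2, bits of C1 that C2 forces to 1 do not matter.  */
  if (op1 == OP_XOR)
    c1 &= ~c2;
  uint32_t c = (op2 == OP_IOR) ? (c1 | c2) : (c1 ^ c2);   /* C1 op2 C2 */

  if ((c1 & ~zero_mask) == 0)
    {
      *out_op = op2;
      *out_cst = c;
      return true;
    }
  if ((c2 & ~zero_mask) == 0)
    {
      *out_op = op1;
      *out_cst = c;
      return true;
    }
  return false;
}
```

Applied to the constants discussed in this thread, the rule as written gives ^ 0xFFFF for the (x << 24) example (0x00FF ^ 0xFF00), ^ 0xa001 for the shifted-xor example once transformation 2 has produced (x >> 1) ^ 0x2001, and | 0xe071 for the a & 0xd123 example used to motivate :s.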


This patch does two algorithmic optimisations that target patterns like:
(((x << 24) | 0x00FF) ^ 0xFF00) and (((x ^ 0x4002) >> 1) | 0x8000).


The transformation uses the knowledge of which bits are zero after 
operations like (X {&,<<,(unsigned)>>}) to combine constants, reducing 
run-time operations.
The two examples above would be transformed into (X << 24) ^ 0x 
and (X >> 1) ^ 0xa001 respectively.
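The two worked examples can be checked by brute force. This is a small sketch assuming a 32-bit unsigned x for the first example and a 16-bit unsigned x for the second: the 0x8000/0xa001 merge is only valid if bit 15 of x >> 1 is always zero, i.e. if bit 15 is the top bit. The first right-hand side uses the constant obtained by applying the rule above (0x00FF ^ 0xFF00 = 0xFFFF); the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Example 1, 32-bit unsigned x.  */
static uint32_t ex1_before (uint32_t x) { return ((x << 24) | 0x00FF) ^ 0xFF00; }
static uint32_t ex1_after (uint32_t x)  { return (x << 24) ^ 0xFFFF; }

/* Example 2, 16-bit unsigned x: transformation 2 moves the shift inward,
   giving ((x >> 1) ^ 0x2001) | 0x8000; transformation 1 then merges the
   constants because bit 15 of x >> 1 is always zero.  */
static uint16_t ex2_before (uint16_t x) { return (uint16_t) (((x ^ 0x4002) >> 1) | 0x8000); }
static uint16_t ex2_mid (uint16_t x)    { return (uint16_t) (((x >> 1) ^ 0x2001) | 0x8000); }
static uint16_t ex2_after (uint16_t x)  { return (uint16_t) ((x >> 1) ^ 0xa001); }

/* Check every 16-bit input.  ex1 only depends on the low 8 bits of x,
   so this loop is exhaustive for it as well.  */
static bool
examples_hold (void)
{
  for (uint32_t x = 0; x <= 0xFFFF; x++)
    {
      uint16_t h = (uint16_t) x;
      if (ex1_before (x) != ex1_after (x))
        return false;
      if (ex2_before (h) != ex2_mid (h) || ex2_mid (h) != ex2_after (h))
        return false;
    }
  return true;
}
```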


The second transformation enables more applications of the first. Also, 
some targets may benefit from delaying shift operations. I am aware that 
such an optimization, in combination with one or more optimizations that 
cause the reverse transformation, may lead to an infinite loop. However, 
no such behavior was detected during regression testing and 
bootstrapping on aarch64.


gcc/ChangeLog:

2015-08-03  Andre Vieira  

  * match.pd: Added new patterns:
(((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
(X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)

gcc/testsuite/ChangeLog:

2015-08-03  Andre Vieira  
Hale Wang  

  * gcc.dg/tree-ssa/forwprop-33.c: New test.
From 3d1d4d838fed9af45aea9fa99f8954585fee7c23 Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira 
Date: Wed, 2 Sep 2015 16:47:38 +0100
Subject: [PATCH] algorithmic optimization v2

---
 gcc/match.pd| 70 +
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c | 42 +
 2 files changed, 112 inserti

Re: [PATCH 3/3] [gomp] Add thread attribute customization

2015-09-03 Thread Jakub Jelinek
On Thu, Sep 03, 2015 at 01:09:23PM +0200, Sebastian Huber wrote:
> >>>We have only thread attributes in this function: mutable_attr and attr. The
> >>>attr is initialized with &gomp_thread_attr and gomp_thread_attr is supposed
> >>>to be read-only by this function. Under certain conditions we have to modify
> >>>the initial attributes. Since gomp_thread_attr is read-only, we have to copy
> >>>it and then modify the copy. For this we need some storage: mutable_attr.
> >So use local_thread_attr if you want to stress it, but IMHO thread_attr
> >is just fine.  I really don't like mutable_attr.
> 
> Ok, if I don't rename thread_attr, is the patch ok?

Yes.

Jakub


Re: [PATCH 3/3] [gomp] Add thread attribute customization

2015-09-03 Thread Sebastian Huber



On 03/09/15 13:05, Jakub Jelinek wrote:

On Thu, Sep 03, 2015 at 12:57:53PM +0200, Sebastian Huber wrote:

>On 03/09/15 12:19, Jakub Jelinek wrote:

> >>@@ -292,7 +292,7 @@ gomp_team_start (void (*fn) (void *), void *data, unsigned nthreads,

> >>>bool nested;
> >>>struct gomp_thread_pool *pool;
> >>>unsigned i, n, old_threads_used = 0;
> >>>-  pthread_attr_t thread_attr, *attr;
> >>>+  pthread_attr_t mutable_attr, *attr;

> >Just wonder why have you renamed this variable.  It is a thread attribute
> >after all, even after your changes.  mutable_attr doesn't make much sense to
> >me.

>
>We have only thread attributes in this function: mutable_attr and attr. The
>attr is initialized with &gomp_thread_attr and gomp_thread_attr is supposed
>to be read-only by this function. Under certain conditions we have to modify
>the initial attributes. Since gomp_thread_attr is read-only, we have to copy
>it and then modify the copy. For this we need some storage: mutable_attr.

So use local_thread_attr if you want to stress it, but IMHO thread_attr
is just fine.  I really don't like mutable_attr.


Ok, if I don't rename thread_attr, is the patch ok?

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



Re: [PATCH 3/3] [gomp] Add thread attribute customization

2015-09-03 Thread Jakub Jelinek
On Thu, Sep 03, 2015 at 12:57:53PM +0200, Sebastian Huber wrote:
> On 03/09/15 12:19, Jakub Jelinek wrote:
> >>@@ -292,7 +292,7 @@ gomp_team_start (void (*fn) (void *), void *data, 
> >>unsigned nthreads,
> >>>bool nested;
> >>>struct gomp_thread_pool *pool;
> >>>unsigned i, n, old_threads_used = 0;
> >>>-  pthread_attr_t thread_attr, *attr;
> >>>+  pthread_attr_t mutable_attr, *attr;
> >Just wonder why have you renamed this variable.  It is a thread attribute
> >after all, even after your changes.  mutable_attr doesn't make much sense to
> >me.
> 
> We have only thread attributes in this function: mutable_attr and attr. The
> attr is initialized with &gomp_thread_attr and gomp_thread_attr is supposed
> to be read-only by this function. Under certain conditions we have to modify
> the initial attributes. Since gomp_thread_attr is read-only, we have to copy
> it and then modify the copy. For this we need some storage: mutable_attr.

So use local_thread_attr if you want to stress it, but IMHO thread_attr
is just fine.  I really don't like mutable_attr.

Jakub


[patch] libstdc++/65473 Make define libstdc++ version macros.

2015-09-03 Thread Jonathan Wakely

This change would allow including  to be used to check for
__GLIBCXX__ and detect whether you're using libstdc++ or not. Howard
Hinnant recommends including that header for libc++ because it has no
other effects in C++.

We could make every  header include  so that
any of them can be used, but I can't be bothered doing that change!
This makes it work for the one header that is recommended to be used,
but of course that doesn't help people using older versions of
libstdc++, who still need to include some other header.

Is this worth doing?

commit 0ac33b5beb231efc94ce4f0288fad36047f0325e
Author: Jonathan Wakely 
Date:   Thu Sep 3 11:45:29 2015 +0100

Make  define libstdc++ version macros.

	PR libstdc++/65473
	* include/c/ciso646: Include  and improve comment.
	* include/c_global/ciso646: Likewise.
	* include/c_std/ciso646: Likewise.

diff --git a/libstdc++-v3/include/c/ciso646 b/libstdc++-v3/include/c/ciso646
index 125f166..fb537f5 100644
--- a/libstdc++-v3/include/c/ciso646
+++ b/libstdc++-v3/include/c/ciso646
@@ -27,6 +27,6 @@
  *  in your programs, rather than any of the "*.h" implementation files.
  *
  *  This is the C++ version of the Standard C Library header @c iso646.h,
- *  and its contents are (mostly) the same as that header, but are all
- *  contained in the namespace @c std.
+ *  which is empty in C++.
  */
+#include 
diff --git a/libstdc++-v3/include/c_global/ciso646 b/libstdc++-v3/include/c_global/ciso646
index 818db67..c59677a 100644
--- a/libstdc++-v3/include/c_global/ciso646
+++ b/libstdc++-v3/include/c_global/ciso646
@@ -27,7 +27,6 @@
  *  in your programs, rather than any of the @a *.h implementation files.
  *
  *  This is the C++ version of the Standard C Library header @c iso646.h,
- *  and its contents are (mostly) the same as that header, but are all
- *  contained in the namespace @c std (except for names which are defined
- *  as macros in C).
+ *  which is empty in C++.
  */
+#include 
diff --git a/libstdc++-v3/include/c_std/ciso646 b/libstdc++-v3/include/c_std/ciso646
index 08cdf24..ab44488 100644
--- a/libstdc++-v3/include/c_std/ciso646
+++ b/libstdc++-v3/include/c_std/ciso646
@@ -27,7 +27,6 @@
  *  in your programs, rather than any of the @a *.h implementation files.
  *
  *  This is the C++ version of the Standard C Library header @c iso646.h,
- *  and its contents are (mostly) the same as that header, but are all
- *  contained in the namespace @c std (except for names which are defined
- *  as macros in C).
+ *  which is empty in C++.
  */
+#include 


Re: [PR64164] drop copyrename, integrate into expand

2015-09-03 Thread Alan Lawrence

On 02/09/15 23:12, Alexandre Oliva wrote:

On Sep  2, 2015, Alan Lawrence  wrote:


One more failure to report, I'm afraid. On AArch64 Bigendian,
aapcs64/func-ret-4.c ICEs in simplify_subreg (line refs here are from
r227348):


Thanks.  The failure mode was different in the current, revamped git
branch aoliva/pr64164, but I've just fixed it there.

I'm almost ready to post a new patch, with a new, simpler, less fragile
and more maintainable approach to integrate cfgexpand and assign_parms'
RTL assignment, so if you could give it a spin on big and little endian
aarch64 natives, that would be very much appreciated!



On aarch64_be, that branch fixes the ICE - but func-ret-4.c fails on execution, 
and now func-ret-3.c does too! Also it causes a bunch of errors building newlib 
using cross-built binutils, which I haven't tracked down yet:


/work/alalaw01/src2/binutils-gdb/newlib/libc/locale/locale.c: In function 
'__get_locale_env':
/work/alalaw01/src2/binutils-gdb/newlib/libc/locale/locale.c:911:1: internal 
compiler error: in insert_value_copy_on_edge, at tree-outof-ssa.c:308

 __get_locale_env(struct _reent *p, int category)
 ^
0xb4ecc4 insert_value_copy_on_edge
/work/alalaw01/src2/gcc/gcc/tree-outof-ssa.c:307
0xb4ecc4 eliminate_phi
/work/alalaw01/src2/gcc/gcc/tree-outof-ssa.c:780
0xb4ecc4 expand_phi_nodes(ssaexpand*)
/work/alalaw01/src2/gcc/gcc/tree-outof-ssa.c:943
0x6e74a6 execute
/work/alalaw01/src2/gcc/gcc/cfgexpand.c:6242
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
make[7]: *** [lib_a-locale.o] Error 1

--Alan



Re: [PATCH 3/3] [gomp] Add thread attribute customization

2015-09-03 Thread Sebastian Huber

On 03/09/15 12:19, Jakub Jelinek wrote:

@@ -292,7 +292,7 @@ gomp_team_start (void (*fn) (void *), void *data, unsigned nthreads,
>bool nested;
>struct gomp_thread_pool *pool;
>unsigned i, n, old_threads_used = 0;
>-  pthread_attr_t thread_attr, *attr;
>+  pthread_attr_t mutable_attr, *attr;

Just wonder why have you renamed this variable.  It is a thread attribute
after all, even after your changes.  mutable_attr doesn't make much sense to
me.


We have only thread attributes in this function: mutable_attr and attr. 
The attr is initialized with &gomp_thread_attr and gomp_thread_attr is 
supposed to be read-only by this function. Under certain conditions we 
have to modify the initial attributes. Since gomp_thread_attr is 
read-only, we have to copy it and then modify the copy. For this we need 
some storage: mutable_attr.
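The copy-then-modify pattern described here can be sketched with plain POSIX threads. This is a minimal sketch, not the libgomp code: default_attr stands in for gomp_thread_attr, and init_default_attr() and start_thread() are hypothetical names. POSIX has no function that copies a pthread_attr_t, and struct assignment of the opaque type is not portable, so "copying" here means initializing a fresh local object and transferring the settings that matter (only the stack size in this sketch).

```c
#include <pthread.h>
#include <stddef.h>

/* Stand-in for a process-wide default such as gomp_thread_attr:
   initialized once, then treated as read-only.  */
static pthread_attr_t default_attr;

int
init_default_attr (void)
{
  return pthread_attr_init (&default_attr);
}

/* Start FN (ARG) in a new joinable thread.  If MIN_STACK is nonzero the
   thread needs attributes that differ from the shared default, so a local
   object is set up instead of modifying default_attr.  */
int
start_thread (pthread_t *tid, void *(*fn) (void *), void *arg,
              size_t min_stack)
{
  pthread_attr_t thread_attr, *attr = &default_attr;
  int rc;

  if (min_stack != 0)
    {
      size_t stacksize = min_stack, default_size;

      rc = pthread_attr_init (&thread_attr);
      if (rc != 0)
        return rc;
      /* Carry over the default stack size if it is already larger.  */
      if (pthread_attr_getstacksize (&default_attr, &default_size) == 0
          && default_size > stacksize)
        stacksize = default_size;
      rc = pthread_attr_setstacksize (&thread_attr, stacksize);
      if (rc != 0)
        {
          pthread_attr_destroy (&thread_attr);
          return rc;
        }
      attr = &thread_attr;
    }

  rc = pthread_create (tid, attr, fn, arg);

  /* Only the local object is ours to destroy; this does not affect the
     thread that was just created.  */
  if (attr == &thread_attr)
    pthread_attr_destroy (&thread_attr);
  return rc;
}
```

The pointer comparison at the end mirrors the "destroy mutable attributes if necessary" step from the ChangeLog: the shared object is never written after initialization.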


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



Re: [PATCH 3/3] [gomp] Add thread attribute customization

2015-09-03 Thread Jakub Jelinek
On Tue, Jul 28, 2015 at 01:04:59PM +0200, Sebastian Huber wrote:
> libgomp/ChangeLog
> 2015-07-28  Sebastian Huber  
> 
>   * config/posix/pool.h (gomp_adjust_thread_attr): New.
>   * config/rtems/pool.h (gomp_adjust_thread_attr): Likewise.
>   (gomp_thread_pool_reservoir): Add priority member.
>   * config/rtems/proc.c (allocate_thread_pool_reservoir): Add
>   priority.
>   (parse_thread_pools): Likewise.
>   * team.c (gomp_team_start): Rename thread_attr to mutable_attr.
>   Call configuration provided gomp_adjust_thread_attr(). Destroy
>   mutable attributes if necessary.
>   * libgomp.texi: Document GOMP_RTEMS_THREAD_POOLS.

Wonder if the RTEMS specific bits in libgomp.texi shouldn't be guarded
with
@ifset RTEMS
...
@end ifset
and --texinfo='@set RTEMS' be used when compiling it.
But perhaps it can be done incrementally, so the patch is ok.

> @@ -292,7 +292,7 @@ gomp_team_start (void (*fn) (void *), void *data, 
> unsigned nthreads,
>bool nested;
>struct gomp_thread_pool *pool;
>unsigned i, n, old_threads_used = 0;
> -  pthread_attr_t thread_attr, *attr;
> +  pthread_attr_t mutable_attr, *attr;

Just wonder why have you renamed this variable.  It is a thread attribute
after all, even after your changes.  mutable_attr doesn't make much sense to
me.

Jakub


Re: [PATCH 2/3] [gomp] Thread pool management

2015-09-03 Thread Jakub Jelinek
On Tue, Jul 28, 2015 at 01:04:58PM +0200, Sebastian Huber wrote:
> +#ifndef GOMP_POOL_H
> +#define GOMP_POOL_H 1
> +
> +#include 

Please use #include "libgomp.h" here.

> +#include 

Similarly.

> +
> +#include "libgomp.h"
> +#include 
> +#include 

#include "pool.h" (and perhaps if possible move after libgomp.h).

> --- a/libgomp/team.c
> +++ b/libgomp/team.c
> @@ -27,6 +27,7 @@
> creation and termination.  */
>  
>  #include "libgomp.h"
> +#include 

Likewise.

Ok with those changes.

Jakub


Re: [PATCH 1/3] [gomp] Add RTEMS configuration

2015-09-03 Thread Jakub Jelinek
On Thu, Sep 03, 2015 at 11:46:49AM +0200, Jakub Jelinek wrote:
> On Tue, Jul 28, 2015 at 01:04:57PM +0200, Sebastian Huber wrote:
> > libgomp/ChangeLog
> > 2015-07-28  Sebastian Huber  
> > 
> > * config/rtems/bar.c: New.
> > * config/rtems/bar.h: Likewise.
> > * config/rtems/mutex.c: Likewise.
> > * config/rtems/mutex.h: Likewise.
> > * config/rtems/sem.c: Likewise.
> > * config/rtems/sem.h: Likewise.
> > * configure.ac (*-*-rtems*): Check that Newlib provides a proper
> >  header file.
> > * configure.tgt (*-*-rtems*): Enable RTEMS configuration if
> > supported by Newlib.
> > * configure: Regenerate.
> 
> > --- /dev/null
> > +++ b/libgomp/config/rtems/bar.c
> > @@ -0,0 +1,255 @@
> > +
> > +static gomp_barrier_t *
> > +generation_to_barrier(int *addr)
> 
> Missing space before (.

Oh, and please use
#include "libgomp.h"
#include "bar.h"
instead of
#include 
#include 
(to help differentiate system headers from libgomp specific headers).

Jakub


Re: [Patch] PR67351 Implement << N & >> N optimizers

2015-09-03 Thread Richard Biener
On Thu, Sep 3, 2015 at 9:29 AM, Hurugalawadi, Naveen
 wrote:
> Hi,
>
> Thanks for all the review and comments.
>
>>> replace the precision test with wi::ltu_p (@1, TYPE_PRECISION (type)
>>> use element_precision instead of TYPE_PRECISION
>
> Please find attached the modified patch as per review comments.
>
> Please review the same and let me know if the patch is okay?

Ok with the tree_fits_uhwi_p checks removed (they are redundant)
and with

+/* { dg-final { scan-assembler-not "<<" } } */
+/* { dg-final { scan-assembler-not ">>" } } */

replaced with

/* { dg-final { scan-tree-dump-not "<<" "optimized" } } */

Thanks,
Richard.

> Regression Tested on AArch64 and X86_64.
>
> Thanks,
> Naveen
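The new patterns themselves are not quoted in this message, so the following is an assumption about what "<< N & >> N" refers to. Folds of this kind rely on the identities (x >> c) << c == x & (~0 << c) and, for unsigned x only, (x << c) >> c == x & (~0 >> c). The requested wi::ltu_p check against the precision matters because shifting by the full width or more is undefined in C; element_precision also covers vector element types. A small C check of both identities on uint32_t, with hypothetical function names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The caller must guarantee c < 32: shifting a 32-bit value by 32 or more
   is undefined behaviour, hence the "count < precision" guard.  */
static bool
rshift_lshift_is_mask (uint32_t x, unsigned c)
{
  return ((x >> c) << c) == (x & (UINT32_MAX << c));
}

/* Only valid for unsigned x: for signed types the left shift can overflow
   and the right shift is arithmetic on GCC.  */
static bool
lshift_rshift_is_mask (uint32_t x, unsigned c)
{
  return ((x << c) >> c) == (x & (UINT32_MAX >> c));
}

static bool
identities_hold (void)
{
  static const uint32_t samples[] = { 0, 1, 0x80000000u, 0xDEADBEEFu, UINT32_MAX };
  for (unsigned c = 0; c < 32; c++)
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
      if (!rshift_lshift_is_mask (samples[i], c)
          || !lshift_rshift_is_mask (samples[i], c))
        return false;
  return true;
}
```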


Re: [PATCH 00/10] removal of typedefs that hide pointerness episode 1

2015-09-03 Thread Richard Biener
On Thu, Sep 3, 2015 at 7:26 AM,   wrote:
> From: Trevor Saunders 
>
> Hi,
>
> Personally I think hiding that variables are pointers is confusing, and I
> believe consensus is we should move away from this style.  So this series
> starts to do that.

Btw, what happened to the promised gimple -> gimple * conversion?

> patches individually bootstrapped + regtested on x86_64-linux-gnu, ok?

Leaving some time for others to start bikeshedding.

Richard.

> Trev
>
> Trevor Saunders (10):
>   don't typedef alias_set_entry and unhide pointerness
>   dse.c: remove some typedefs that hide pointerness
>   var-tracking.c: remove typedef of location_chain
>   var-tracking.c: remove typedef of shared_hash
>   bt-load.c: remove typedefs that hide pointerness
>   tree-ssa-ter.c: remove typedefs that hide pointerness
>   tree-vrp.c: remove typedefs that hide pointerness
>   dwarf2cfi.c: remove typedef that hides pointerness
>   dwarf2out.c: remove typedefs that hide pointerness
>   tree-ssa-loop-im.c: remove typedefs that hide pointerness
>
>  gcc/alias.c|  31 +++--
>  gcc/bt-load.c  | 140 ++--
>  gcc/dse.c  | 115 -
>  gcc/dwarf2cfi.c|   5 +-
>  gcc/dwarf2out.c| 340 
> -
>  gcc/tree-ssa-loop-im.c |  98 +++---
>  gcc/tree-ssa-ter.c |  39 +++---
>  gcc/tree-vrp.c |  22 ++--
>  gcc/var-tracking.c | 192 ++--
>  9 files changed, 479 insertions(+), 503 deletions(-)
>
> --
> 2.4.0
>


Re: [PATCH 1/3] [gomp] Add RTEMS configuration

2015-09-03 Thread Jakub Jelinek
On Tue, Jul 28, 2015 at 01:04:57PM +0200, Sebastian Huber wrote:
> libgomp/ChangeLog
> 2015-07-28  Sebastian Huber  
> 
>   * config/rtems/bar.c: New.
>   * config/rtems/bar.h: Likewise.
>   * config/rtems/mutex.c: Likewise.
>   * config/rtems/mutex.h: Likewise.
>   * config/rtems/sem.c: Likewise.
>   * config/rtems/sem.h: Likewise.
>   * configure.ac (*-*-rtems*): Check that Newlib provides a proper
>header file.
>   * configure.tgt (*-*-rtems*): Enable RTEMS configuration if
>   supported by Newlib.
>   * configure: Regenerate.

> --- /dev/null
> +++ b/libgomp/config/rtems/bar.c
> @@ -0,0 +1,255 @@
> +
> +static gomp_barrier_t *
> +generation_to_barrier(int *addr)

Missing space before (.

Ok with that fixed.

Jakub


Re: [PATCH] fix PR53852: stop ISL after a given number of operations

2015-09-03 Thread Richard Biener
On Thu, Sep 3, 2015 at 12:34 AM, Sebastian Pop  wrote:
> 2015-09-02  Sebastian Pop  
>
> * config.in: Regenerate.
> * configure: Regenerate.
> * configure.ac (HAVE_ISL_CTX_MAX_OPERATIONS): Detect.
> * graphite-optimize-isl.c (optimize_isl): Stop computation when
> PARAM_MAX_ISL_OPERATIONS is reached.
> * params.def (PARAM_MAX_ISL_OPERATIONS): Add.
>
> * graphite-dependences.c (extend_schedule): Remove gcc_asserts on
> result equal to isl_stat_ok as the status now can be 
> isl_error_quota.
> (subtract_commutative_associative_deps): Same.
> (compute_deps): Same.
>
> testsuite/
> * gcc.dg/graphite/uns-interchange-12.c: Adjust pattern to pass 
> with
> both isl-0.12 and isl-0.15.

Does it mean with 0.15 we now "time out" on some of the cases?  Or is this
just a general difference between 0.12 and 0.15?  In which case, like for
this testcase, is there a better way to verify whether the loops J and K were
interchanged?

> * gcc.dg/graphite/uns-interchange-14.c: Same.
> * gcc.dg/graphite/uns-interchange-15.c: Same.
> * gcc.dg/graphite/uns-interchange-mvt.c: Same.
> ---
>  gcc/config.in  |  6 ++
>  gcc/configure  | 28 
>  gcc/configure.ac   | 11 +++
>  gcc/graphite-dependences.c | 83 
> +-
>  gcc/graphite-optimize-isl.c| 49 -
>  gcc/params.def |  5 ++
>  gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c |  2 +-
>  gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c |  2 +-
>  gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c |  2 +-
>  .../gcc.dg/graphite/uns-interchange-mvt.c  |  2 +-
>  10 files changed, 120 insertions(+), 70 deletions(-)
>
> diff --git a/gcc/config.in b/gcc/config.in
> index 22a4e6b..98c4647 100644
> --- a/gcc/config.in
> +++ b/gcc/config.in
> @@ -1332,6 +1332,12 @@
>  #endif
>
>
> +/* Define if isl_ctx_get_max_operations exists. */
> +#ifndef USED_FOR_TARGET
> +#undef HAVE_ISL_CTX_MAX_OPERATIONS
> +#endif
> +
> +
>  /* Define if isl_options_set_schedule_serialize_sccs exists. */
>  #ifndef USED_FOR_TARGET
>  #undef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
> diff --git a/gcc/configure b/gcc/configure
> index 0d31383..07d39f9 100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -28625,6 +28625,29 @@ rm -f core conftest.err conftest.$ac_objext \
>{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
> $ac_has_isl_options_set_schedule_serialize_sccs" >&5
>  $as_echo "$ac_has_isl_options_set_schedule_serialize_sccs" >&6; }
>
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: checking Checking for 
> isl_ctx_get_max_operations" >&5
> +$as_echo_n "checking Checking for isl_ctx_get_max_operations... " >&6; }
> +  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +#include 
> +int
> +main ()
> +{
> +isl_ctx_get_max_operations (isl_ctx_alloc ());
> +  ;
> +  return 0;
> +}
> +_ACEOF
> +if ac_fn_cxx_try_link "$LINENO"; then :
> +  ac_has_isl_ctx_get_max_operations=yes
> +else
> +  ac_has_isl_ctx_get_max_operations=no
> +fi
> +rm -f core conftest.err conftest.$ac_objext \
> +conftest$ac_exeext conftest.$ac_ext
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: 
> $ac_has_isl_ctx_get_max_operations" >&5
> +$as_echo "$ac_has_isl_ctx_get_max_operations" >&6; }
> +
>LIBS="$saved_LIBS"
>CXXFLAGS="$saved_CXXFLAGS"
>
> @@ -28639,6 +28662,11 @@ $as_echo "#define 
> HAVE_ISL_SCHED_CONSTRAINTS_COMPUTE_SCHEDULE 1" >>confdefs.h
>  $as_echo "#define HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS 1" 
> >>confdefs.h
>
>fi
> +  if test x"$ac_has_isl_ctx_get_max_operations" = x"yes"; then
> +
> +$as_echo "#define HAVE_ISL_CTX_MAX_OPERATIONS 1" >>confdefs.h
> +
> +  fi
>  fi
>
>  # Check for plugin support
> diff --git a/gcc/configure.ac b/gcc/configure.ac
> index 846651d..b6e8bed 100644
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -5790,6 +5790,13 @@ if test "x${ISLLIBS}" != "x" ; then
>[ac_has_isl_options_set_schedule_serialize_sccs=no])
>AC_MSG_RESULT($ac_has_isl_options_set_schedule_serialize_sccs)
>
> +  AC_MSG_CHECKING([Checking for isl_ctx_get_max_operations])
> +  AC_TRY_LINK([#include ],
> +  [isl_ctx_get_max_operations (isl_ctx_alloc ());],
> +  [ac_has_isl_ctx_get_max_operations=yes],
> +  [ac_has_isl_ctx_get_max_operations=no])
> +  AC_MSG_RESULT($ac_has_isl_ctx_get_max_operations)
> +
>LIBS="$saved_LIBS"
>CXXFLAGS="$saved_CXXFLAGS"
>
> @@ -5802,6 +5809,10 @@ if test "x${ISLLIBS}" != "x" ; then
>   AC_DEFINE(HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS, 1,
> [Define if isl_options_set_schedule_serialize_sccs exists.])
>fi
> +  if test x"$ac_has_isl_ctx_get_max_oper

Re: [PATCH] [gomp] Simplify thread pool initialization

2015-09-03 Thread Jakub Jelinek
On Wed, Jul 22, 2015 at 02:56:15PM +0200, Sebastian Huber wrote:
> 2015-07-22  Sebastian Huber  
> 
>   * team.c (gomp_new_thread_pool): Delete and move content to ...
>   (gomp_get_thread_pool): ... new function.  Allocate and
>   initialize thread pool on demand.
>   (get_last_team): Use gomp_get_thread_pool().
>   (gomp_team_start): Delete thread pool initialization.
> ---
>  libgomp/team.c | 56 +++-
>  1 file changed, 27 insertions(+), 29 deletions(-)
> 
> diff --git a/libgomp/team.c b/libgomp/team.c
> index 7671b05..5c56182 100644
> --- a/libgomp/team.c
> +++ b/libgomp/team.c
> @@ -134,22 +134,39 @@ gomp_thread_start (void *xdata)
>return NULL;
>  }
>  
> +/* Get the thread pool, allocate and initialize it on demand.  */
> +
> +static struct gomp_thread_pool *

Please make this
static inline struct gomp_thread_pool *
(unlike the gomp_new_thread_pool call, which was only ever on the cold path,
this one is on a hot path, so we want to inline it either fully or
at least partially).

Ok with that change.
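The hot/cold split being asked for can be sketched as follows. This is a simplified illustration, not the actual team.c code: struct thread_pool, current_pool, new_thread_pool() and get_thread_pool() are hypothetical, and the sketch relies on C11 _Thread_local plus the GCC extensions __builtin_expect and __attribute__ ((noinline)). The inline getter reduces the common case to one load and a well-predicted branch, while the allocation stays out of line so it does not bloat every caller.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for struct gomp_thread_pool.  */
struct thread_pool
{
  unsigned threads_used;
};

/* Per-thread pool pointer; NULL until first use.  */
static _Thread_local struct thread_pool *current_pool;

/* Cold path: runs at most once per thread, so keep it out of line.  */
static __attribute__ ((noinline)) struct thread_pool *
new_thread_pool (void)
{
  struct thread_pool *pool = calloc (1, sizeof *pool);
  current_pool = pool;
  return pool;
}

/* Hot path: called for every team start, so it is worth inlining.  */
static inline struct thread_pool *
get_thread_pool (void)
{
  struct thread_pool *pool = current_pool;
  if (__builtin_expect (pool == NULL, 0))
    pool = new_thread_pool ();
  return pool;
}
```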

Jakub


Re: [PATCH] Fix ICE when generating a vector shift by scalar

2015-09-03 Thread Richard Biener
On Wed, Sep 2, 2015 at 11:14 PM, Bill Schmidt
 wrote:
>
> On Wed, 2015-09-02 at 14:44 +0200, Richard Biener wrote:
>> On Tue, Sep 1, 2015 at 5:53 PM, Bill Schmidt
>>  wrote:
>> > On Tue, 2015-09-01 at 11:01 +0200, Richard Biener wrote:
>> >> On Mon, Aug 31, 2015 at 10:28 PM, Bill Schmidt
>> >>  wrote:
>> >> > Hi,
>> >> >
>> >> > The following simple test fails when attempting to convert a vector
>> >> > shift-by-scalar into a vector shift-by-vector.
>> >> >
>> >> >   typedef unsigned char v16ui __attribute__((vector_size(16)));
>> >> >
>> >> >   v16ui vslb(v16ui v, unsigned char i)
>> >> >   {
>> >> > return v << i;
>> >> >   }
>> >> >
>> >> > When this code is gimplified, the shift amount gets expanded to an
>> >> > unsigned int:
>> >> >
>> >> >   vslb (v16ui v, unsigned char i)
>> >> >   {
>> >> > v16ui D.2300;
>> >> > unsigned int D.2301;
>> >> >
>> >> > D.2301 = (unsigned int) i;
>> >> > D.2300 = v << D.2301;
>> >> > return D.2300;
>> >> >   }
>> >> >
>> >> > In expand_binop, the shift-by-scalar is converted into a shift-by-vector
>> >> > using expand_vector_broadcast, which produces the following rtx to be
>> >> > used to initialize a V16QI vector:
>> >> >
>> >> > (parallel:V16QI [
>> >> > (subreg/s/v:SI (reg:DI 155) 0)
>> >> > (subreg/s/v:SI (reg:DI 155) 0)
>> >> > (subreg/s/v:SI (reg:DI 155) 0)
>> >> > (subreg/s/v:SI (reg:DI 155) 0)
>> >> > (subreg/s/v:SI (reg:DI 155) 0)
>> >> > (subreg/s/v:SI (reg:DI 155) 0)
>> >> > (subreg/s/v:SI (reg:DI 155) 0)
>> >> > (subreg/s/v:SI (reg:DI 155) 0)
>> >> > (subreg/s/v:SI (reg:DI 155) 0)
>> >> > (subreg/s/v:SI (reg:DI 155) 0)
>> >> > (subreg/s/v:SI (reg:DI 155) 0)
>> >> > (subreg/s/v:SI (reg:DI 155) 0)
>> >> > (subreg/s/v:SI (reg:DI 155) 0)
>> >> > (subreg/s/v:SI (reg:DI 155) 0)
>> >> > (subreg/s/v:SI (reg:DI 155) 0)
>> >> > (subreg/s/v:SI (reg:DI 155) 0)
>> >> > ])
>> >> >
>> >> > The back end eventually chokes trying to generate a copy of the SImode
>> >> > expression into a QImode memory slot.
>> >> >
>> >> > This patch fixes this problem by ensuring that the shift amount is
>> >> > truncated to the inner mode of the vector when necessary.  I've added a
>> >> > test case verifying correct PowerPC code generation in this case.
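For anyone reproducing this without a PowerPC machine, the semantics at issue can be written out with GCC's generic vector extension. A hedged sketch: vslb() is the test case from the report, while vslb_broadcast() is a hypothetical helper that spells out the broadcast with a vector of the element type, so each lane of the shift-count vector is an unsigned char (the V16QI inner mode) rather than the widened SImode value seen in the RTL above.

```c
typedef unsigned char v16ui __attribute__ ((vector_size (16)));

/* Test case from the report: shift every lane by a scalar amount.  */
v16ui
vslb (v16ui v, unsigned char i)
{
  return v << i;
}

/* Same operation with the scalar explicitly broadcast into a vector whose
   elements have the vector's own element type.  */
v16ui
vslb_broadcast (v16ui v, unsigned char i)
{
  v16ui amounts = { i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i };
  return v << amounts;
}
```

Shift counts of 8 or more are out of range for 8-bit lanes, so the check below only uses 0 to 7.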
>> >> >
>> >> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
>> >> > regressions.  Is this ok for trunk?
>> >> >
>> >> > Thanks,
>> >> > Bill
>> >> >
>> >> >
>> >> > [gcc]
>> >> >
>> >> > 2015-08-31  Bill Schmidt  
>> >> >
>> >> > * optabs.c (expand_binop): Don't create a broadcast vector with 
>> >> > a
>> >> > source element wider than the inner mode.
>> >> >
>> >> > [gcc/testsuite]
>> >> >
>> >> > 2015-08-31  Bill Schmidt  
>> >> >
>> >> > * gcc.target/powerpc/vec-shift.c: New test.
>> >> >
>> >> >
>> >> > Index: gcc/optabs.c
>> >> > ===
>> >> > --- gcc/optabs.c(revision 227353)
>> >> > +++ gcc/optabs.c(working copy)
>> >> > @@ -1608,6 +1608,13 @@ expand_binop (machine_mode mode, optab binoptab, 
>> >> > r
>> >> >
>> >> >if (otheroptab && optab_handler (otheroptab, mode) != 
>> >> > CODE_FOR_nothing)
>> >> > {
>> >> > + /* The scalar may have been extended to be too wide.  Truncate
>> >> > +it back to the proper size to fit in the broadcast vector. 
>> >> >  */
>> >> > + machine_mode inner_mode = GET_MODE_INNER (mode);
>> >> > + if (GET_MODE_BITSIZE (inner_mode)
>> >> > + < GET_MODE_BITSIZE (GET_MODE (op1)))
>> >>
>> >> Does that work for modeless constants?  Btw, what do other targets do
>> >> here?  Do they
>> >> also choke or do they cope with the wide operand?
>> >
>> > Good question.  This works by serendipity more than by design.  Because
>> > a constant has a mode of VOIDmode, its bitsize is 0 and the TRUNCATE
>> > won't be generated.  It would be better for me to put in an explicit
>> > check for CONST_INT rather than relying on this, though.  I'll fix that.
>> >
>> > I am not sure what other targets do here; I can check.  However, do you
>> > think that's relevant?  I'm concerned that
>> >
>> > (parallel:V16QI [
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (reg:DI 155) 0)
>> > (subreg/s/v:SI (r

Re: [Patch] Add to the libgfortran/newlib bodge to "detect" ftruncate support in ARM/AArch64/SH

2015-09-03 Thread James Greenhalgh
On Sun, Aug 30, 2015 at 02:46:26PM +0100, Hans-Peter Nilsson wrote:
> (Pruned the CC list a bit as lists are included anyway)
> 
> On Fri, 28 Aug 2015, James Greenhalgh wrote:
> > On Fri, Aug 28, 2015 at 10:40:31AM +0100, James Greenhalgh wrote:
> > > On Tue, Aug 25, 2015 at 03:44:05PM +0100, FX wrote:
> > > > > 2015-08-25  James Greenhalgh  
> > > > >
> > > > >   * configure.ac: Auto-detect newlib function support unless we
> > > > >   know there are issues when configuring for a host.
> > > > >   * configure: Regenerate.
> > > >
> > > > Thanks for CC'ing the fortran list.
> > > >
> > > > Given that this is newlib-specific code, even though it's in libgfortran
> > > > configury, you should decide and commit what's best. I don't think we have
> > > > any newlib expert in the Fortran maintainers.
> > > >
> > > > Wait for 48 hours to see if anyone else objects, though.
> > >
> > > OK, it has been 48 hours and I haven't seen any objections. The newlib
> > > patch has now been committed.
> > >
> > > I agree with Marcus' suggestion that we put the more comprehensive patch
> > > (which requires the newlib fix) on trunk and my original patch (which does
> > > not) on the release branches.
> > >
> > > I'll go ahead with that later today.
> >
> > Now in place on trunk (r227301), gcc-5-branch (r227302) and gcc-4_9-branch
> > (r227304).
> >
> > Give me a shout if you see issues in your build systems.
> 
> Since you asked: I saw a build failure for cris-elf matching the
> missing-kill-declaration issue, and I don't much like having to
> take manual steps to force a new newlib version. It isn't being
> automatically updated because there are regressions in my gcc
> test-suite results.  I guess autodetecting the kill-declaration
> issue in libgfortran is unnecessarily complicated in the presence of
> a fixed newlib trunk.  All in all, I appreciate you don't force
> a new newlib on release branches.

Hi,

I could postpone the pain until after the release of GCC 6; by that
point the newlib change will have had a little longer to make it into
people's trees. On the other hand, this seems like the best way to fix
the issue, and we might as well do it now while we are still
in stage 1.

I don't want to cause you too much inconvenience, so if you'd like, I
can revert the more comprehensive patch from trunk for now. I would be
very keen to push it again, either late in GCC 6 development, or soon
after the opening of GCC 7.

Otherwise, if you're happy enough with the fix staying in place, I'll
just leave it.

Sorry to have caused you any issues.

James



Re: [PR65637][PATCH][5/5] Handle 2 preds for fin_bb in expand_omp_for_static_chunk

2015-09-03 Thread Jakub Jelinek
On Mon, Aug 31, 2015 at 02:02:57PM +0200, Tom de Vries wrote:
> 2015-08-31  Tom de Vries  
> 
>   PR tree-optimization/65637
>   * omp-low.c (expand_omp_for_static_chunk): Handle case that fin_bb has 2
>   predecessors.

Ok.
> 
>   * gcc.dg/autopar/reduc-3-chunk-size.c: New test.

But for the testcase similarly to previous patches, I'd call it
reduc-4.c and #include "reduc-3.c" instead of duplicating it.

Jakub


Re: [PR65637][PATCH][4/5] Fix inner loop phi in expand_omp_for_static_chunk

2015-09-03 Thread Jakub Jelinek
On Mon, Aug 31, 2015 at 02:00:10PM +0200, Tom de Vries wrote:
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -6885,6 +6885,22 @@ expand_omp_for_static_nochunk (struct omp_region *region,
>  }
>  }
>  

Please add a function comment.

> +static gphi *
> +find_phi_with_arg_on_edge (tree arg, edge e)
> +{
> +  basic_block bb = e->dest;
> +
> +  for (gphi_iterator gpi = gsi_start_phis (bb);
> +       !gsi_end_p (gpi);
> +       gsi_next (&gpi))
> +    {
> +      gphi *phi = gpi.phi ();
> +      if (PHI_ARG_DEF_FROM_EDGE (phi, e) == arg)
> +        return phi;
> +    }
> +
> +  return NULL;
> +}

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c/autopar-1-chunk-size.c
> @@ -0,0 +1,44 @@
> +/* { dg-do run } */
> +/* { dg-additional-options "-ftree-parallelize-loops=4 -ffast-math --param parloops-chunk-size=100" } */

Similarly to previous patch, just use autopar-2.c as filename
and #include "autopar-1.c" rather than duplicating it.

Ok with those changes.

Jakub
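For readers unfamiliar with the helper Jakub asks to have documented: find_phi_with_arg_on_edge in the patch above scans the PHI nodes of e->dest and returns the first one whose argument on edge e equals arg, or NULL. The sketch below models that search in plain C so it compiles on its own; struct phi_node, struct block and struct cfg_edge are hypothetical stand-ins, not GCC types. It mimics the way GCC's PHI_ARG_DEF_FROM_EDGE selects a PHI argument by the edge's dest_idx, and the comment above the function shows one possible wording for the requested function comment.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PREDS 4

/* Hypothetical, much-simplified stand-ins for GCC's gphi, basic_block
   and edge.  A PHI node stores one argument per incoming edge, indexed
   by that edge's position in the destination block's predecessor list.  */
struct phi_node
{
  int result;                 /* SSA name defined by the PHI.  */
  int args[MAX_PREDS];        /* One argument per predecessor edge.  */
};

struct block
{
  unsigned num_preds;
  unsigned num_phis;
  struct phi_node *phis;
};

struct cfg_edge
{
  struct block *dest;
  unsigned dest_idx;          /* Index of this edge among DEST's preds.  */
};

/* Return the first PHI node in the destination block of edge E whose
   argument on E is ARG, or NULL if there is no such PHI node.  */
static struct phi_node *
find_phi_with_arg_on_edge (int arg, const struct cfg_edge *e)
{
  struct block *bb = e->dest;

  assert (e->dest_idx < bb->num_preds);
  for (unsigned i = 0; i < bb->num_phis; i++)
    {
      struct phi_node *phi = &bb->phis[i];
      /* Model of PHI_ARG_DEF_FROM_EDGE (phi, e).  */
      if (phi->args[e->dest_idx] == arg)
        return phi;
    }
  return NULL;
}
```

Because the lookup is a linear scan, its cost grows with the number of PHI nodes in the destination block, which is fine for the small blocks this patch deals with.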


Re: [PR65637][PATCH][3/5] Fix gcc_assert in expand_omp_for_static_chunk

2015-09-03 Thread Jakub Jelinek
On Mon, Aug 31, 2015 at 01:55:40PM +0200, Tom de Vries wrote:
> Fix gcc_assert in expand_omp_for_static_chunk
> 
> 2015-08-31  Tom de Vries  
> 
>   PR tree-optimization/65637
>   * omp-low.c (expand_omp_for_static_chunk): Fix gcc_assert for the case
>   that head is NULL.
> 
>   * gcc.dg/autopar/pr46099-chunk-size.c: New test.
> ---
>  gcc/ChangeLog |  6 +++
>  gcc/omp-low.c |  2 +-
>  gcc/testsuite/gcc.dg/autopar/pr46099-chunk-size.c | 47 +++
>  3 files changed, 54 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/autopar/pr46099-chunk-size.c
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index a0123b1..5a273ba 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2015-05-18  Tom de Vries  
> +
> + PR tree-optimization/65637
> + * omp-low.c (expand_omp_for_static_chunk): Fix gcc_assert for the case
> + that head is NULL.
> +
>  2015-08-31  Tom de Vries  
>  
>   * tree-ssa-loop-manip.c (find_uses_to_rename_use)
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index c3dfc51..4e732ae 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -7326,7 +7326,7 @@ expand_omp_for_static_chunk (struct omp_region *region,
>            locus = redirect_edge_var_map_location (vm);
>            add_phi_arg (nphi, redirect_edge_var_map_def (vm), re, locus);
>          }
> -      gcc_assert (gsi_end_p (psi) && i == head->length ());
> +      gcc_assert (gsi_end_p (psi) && (head == NULL || i == head->length ()));
>        redirect_edge_var_map_clear (re);
>        while (1)
>          {

Ok.
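Why the extra head == NULL test matters: head holds the PHI arguments saved when edge re was redirected, and per the ChangeLog it can be NULL. The pre-patch assertion evaluated head->length () whenever the PHI walk had reached its end, so a NULL head would be dereferenced. The model below is a hypothetical, self-contained sketch (struct var_map_vec, safe_length, pair_phis and patched_check are illustrative names, not omp-low.c code) that walks a PHI count and a possibly-NULL vector in lockstep and then applies the patched check; short-circuit evaluation of || keeps the null pointer from being dereferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the vector of pending PHI arguments saved
   for a redirected edge.  Assumed here: NULL means "no pending entries"
   rather than an allocated but empty vector.  */
struct var_map_vec
{
  unsigned length;
};

/* NULL-tolerant length, in the spirit of GCC's vec_safe_length.  */
static unsigned
safe_length (const struct var_map_vec *v)
{
  return v ? v->length : 0;
}

/* Pair NUM_PHIS PHI nodes with the pending entries in HEAD, stopping at
   whichever runs out first; return how many pairs were processed.  */
static unsigned
pair_phis (unsigned num_phis, const struct var_map_vec *head)
{
  unsigned i = 0;
  while (i < num_phis && i < safe_length (head))
    i++;   /* The real code adds a PHI argument here.  */
  return i;
}

/* The patched sanity check: every PHI was consumed, and HEAD is either
   absent or was consumed exactly.  The head == NULL test short-circuits
   before head->length could dereference a null pointer.  */
static bool
patched_check (unsigned num_phis, const struct var_map_vec *head,
               unsigned i)
{
  return i == num_phis && (head == NULL || i == head->length);
}
```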

> diff --git a/gcc/testsuite/gcc.dg/autopar/pr46099-chunk-size.c b/gcc/testsuite/gcc.dg/autopar/pr46099-chunk-size.c
> new file mode 100644
> index 000..709841a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/autopar/pr46099-chunk-size.c

I'd name the testcase just pr46099-2.c.

> @@ -0,0 +1,47 @@
> +/* PR tree-optimization/46099.  */
> +/* { dg-do compile } */
> +/* { dg-options "-ftree-parallelize-loops=2 -fcompare-debug -O --param parloops-chunk-size=100" } */

But more importantly, if you haven't changed anything in the testcase
beyond dg-options, just
#include "pr46099.c"
here rather than duplicating the whole testcase.  Ok with that change.

Jakub
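The review suggestion above to #include "pr46099.c" rather than copy it amounts to a small wrapper testcase along these lines. This is a sketch assuming pr46099.c sits in the same gcc.dg/autopar directory and that, as is the usual convention with dg.exp, only the directives in the file being run take effect; the directive comments inside the included file are then just ordinary comments to the compiler. The directive lines are the ones from the submitted testcase.

```c
/* PR tree-optimization/46099.  */
/* { dg-do compile } */
/* { dg-options "-ftree-parallelize-loops=2 -fcompare-debug -O --param parloops-chunk-size=100" } */

#include "pr46099.c"
```

Keeping the loop bodies in one file means a later fix to pr46099.c is automatically exercised with and without the chunk-size parameter.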


Re: [libgo] Use stat_atim.go on Solaris 12+

2015-09-03 Thread Rainer Orth
Rainer Orth  writes:

> Solaris 12 changes the st_[amc]tim members of struct stat from
> timestruc_t to timespec_t for XPG7 compatibility, thus breaking the libgo
> build.  The following patch checks for this change and uses the common
> stat_atim.go if appropriate.
>
> Btw., I noticed that go/os/stat_atim.go and stat_dragonfly.go are identical;
> no idea why that would be useful.
>
> Bootstrapped without regressions on i386-pc-solaris2.1[12] and
> sparc-sun-solaris2.1[12].
>
> I had to regenerate aclocal.m4 since for some reason it had been built
> with automake 1.11.1 instead of the common 1.11.6, thus inhibiting
> Makefile.in regeneration.
>
> Ok for mainline now and the gcc 5 branch after some soak time?
>
>   Rainer
>
>
> 2015-02-10  Rainer Orth  
>
>   * configure.ac (have_stat_timespec): Check for timespec_t st_atim
>   in .
>   (HAVE_STAT_TIMESPEC): New conditional.
>   * configure: Regenerate.
>   * Makefile.am [LIBGO_IS_SOLARIS && HAVE_STAT_TIMESPEC]
>   (go_os_stat_file): Use go/os/stat_atim.go.
>   * aclocal.m4: Regenerate.
>   * Makefile.in: Regenerate.

This patch has remained unreviewed for a week.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH][2/5] Handle simple latch bb in expand_omp_for_static_chunk

2015-09-03 Thread Jakub Jelinek
On Mon, Aug 31, 2015 at 01:50:42PM +0200, Tom de Vries wrote:
> @@ -7351,14 +7357,25 @@ expand_omp_for_static_chunk (struct omp_region *region,
>  
>    if (!broken_loop)
>      {
> +      struct loop *loop = body_bb->loop_father;
>        struct loop *trip_loop = alloc_loop ();
>        trip_loop->header = iter_part_bb;
>        trip_loop->latch = trip_update_bb;
>        add_loop (trip_loop, iter_part_bb->loop_father);
>  
> +      if (loop != entry_bb->loop_father)
> +        {
> +          gcc_assert (loop->header == body_bb);
> +          gcc_assert (broken_loop

This is in a block code guarded with !broken_loop.
So, either you should just leave the "broken_loop || " out, or
you need to move it elsewhere, outside of the block guarded with
!broken_loop.

Jakub


  1   2   >