[PATCH] gcc: ada: delete old $(P) reference

2017-07-17 Thread Mike Frysinger
From: Mike Frysinger 

The P variable was deleted back in Nov 2015 (svn rev 231062),
but its expansion was missed.  Delete those now too.

2017-07-18  Mike Frysinger  

* gcc-interface/Makefile.in ($(P)): Delete.
---
 gcc/ada/gcc-interface/Makefile.in | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/gcc-interface/Makefile.in b/gcc/ada/gcc-interface/Makefile.in
index 1c172037d927..b485c18ec21e 100644
--- a/gcc/ada/gcc-interface/Makefile.in
+++ b/gcc/ada/gcc-interface/Makefile.in
@@ -2643,10 +2643,10 @@ gnatlink-re: ../stamp-tools gnatmake-re
 #  stamp target in the parent directory whenever gnat1 is rebuilt
 
 # Likewise for the tools
-../../gnatmake$(exeext): $(P) b_gnatm.o $(GNATMAKE_OBJS)
+../../gnatmake$(exeext): b_gnatm.o $(GNATMAKE_OBJS)
 	$(GCC_LINK) $(ALL_CFLAGS) -o $@ b_gnatm.o $(GNATMAKE_OBJS) $(TOOLS_LIBS) $(TOOLS1_LIBS)
 
-../../gnatlink$(exeext): $(P) b_gnatl.o $(GNATLINK_OBJS)
+../../gnatlink$(exeext): b_gnatl.o $(GNATLINK_OBJS)
 	$(GCC_LINK) $(ALL_CFLAGS) -o $@ b_gnatl.o $(GNATLINK_OBJS) $(TOOLS_LIBS) $(TOOLS1_LIBS)
 
 ../stamp-gnatlib-$(RTSDIR):
-- 
2.12.0



[PATCH v2][AArch64] Add vectorized Mersenne Twister

2017-07-17 Thread Michael Collison
This is the second version of a patch for AArch64 to add a vectorized
Mersenne Twister to libstdc++. The first version used intrinsics and included
"arm_neon.h". After feedback from the community this version uses only GCC
vector extensions and AArch64 SIMD data types.

This patch adds a vectorized implementation of the Mersenne Twister random
number generator. This implementation is approximately 2.6 times faster than
the non-vectorized implementation.

Sample code to use the new generator would look like this:

#include <iostream>
#include <random>
#include <ext/random>

int
main()
{
  __gnu_cxx::sfmt19937 mt(1729);

  std::uniform_int_distribution<> dist(0,1008);

  for (int i = 0; i < 16; ++i)
{
  std::cout << dist(mt) << " ";
}
}

Okay for trunk?

2017-07-16  Michael Collison  

Add optimized implementation of Mersenne Twister for AArch64.
* config/cpu/aarch64/opt/ext/opt_random.h: New file.
(__aarch64_recursion): New function.
(__aarch64_lsr_128): New function.
(__aarch64_lsl_128): New function.
(operator==): New function.
(simd_fast_mersenne_twister_engine): Implement
method _M_gen_rand.
* config/cpu/aarch64/opt/bits/opt_random.h: New file.
* include/ext/random (simd_fast_mersenne_twister_engine):
Add _M_state private array.



pr4218v2.patch


Re: [PATCH] Fix pr80044, -static and -pie insanity, and pr81170

2017-07-17 Thread Alan Modra
On Mon, Jul 17, 2017 at 06:01:47AM -0700, H.J. Lu wrote:
> On Mon, Jul 17, 2017 at 5:33 AM, Alan Modra  wrote:
> > On Sat, Jul 15, 2017 at 06:32:40AM -0700, H.J. Lu wrote:
> >> On Thu, Jun 22, 2017 at 8:28 AM, Alan Modra  wrote:
> >> > PR80044 notes that -static and -pie together behave differently when
> >> > gcc is configured with --enable-default-pie as compared to configuring
> >> > without (or --disable-default-pie).  This patch removes that
> >> > difference.  In both cases you now will have -static completely
> >> > overriding -pie.
> >> >
> >> > Fixing this wasn't quite as simple as you'd expect, due to poor
> >> > separation of functionality.  PIE_SPEC didn't just mean that -pie was
> >> > on explicitly or by default, but also -r and -shared were *not* on.
> >> > Fortunately the three files touched by this patch are the only places
> >> > PIE_SPEC and NO_PIE_SPEC are used, so it isn't too hard to see that
> >> > the reason PIE_SPEC and NO_PIE_SPEC are not inverses is the use of
> >> > PIE_SPEC in LINK_PIE_SPEC.  So, move the inelegant symmetry breaking
> >> > addition, to LINK_PIE_SPEC where it belongs.  Doing that showed
> >> > another problem in gnu-user.h, with PIE_SPEC and NO_PIE_SPEC selection
> >> > of crtbegin*.o not properly hooked into a chain of if .. elseif ..
> >> > conditions, which required both PIE_SPEC and NO_PIE_SPEC to exclude
> >> > -static and -shared.  Fixing that particular problem finally allows
> >> > PIE_SPEC to serve just one purpose, and NO_PIE_SPEC to disappear.
> >> >
> >> > Bootstrapped and regression tested powerpc64le-linux c,c++.  No
> >> > regressions and a bunch of --enable-default-pie failures squashed.
> >> > OK mainline and active branches?
> >> >
> >> > Incidentally, there is a fairly strong case to be made for adding
> >> > -static to the -shared, -pie, -no-pie chain of RejectNegative's in
> >> > common.opt.  Since git 0d6378a9e (svn r48039) 2001-11-15, -static has
> >> > done more than just the traditional "prevent linking with dynamic
> >> > libraries", as -static selects crtbeginT.o rather than crtbegin.o
> >> > on GNU systems.  Realizing this is what led me to close pr80044, which
> >> > I'd opened with the aim of making -pie -static work together (with the
> >> > traditional meaning of -static).  I don't think that is worth doing, but
> >> > mention pr80044 in the changelog due to fixing the insane output
> >> > produced by -pie -static with --disable-default-pie.
> >> >
> >>
> >> On x86-64, without --enable-default-pie, "-static -pie" and "-pie -static"
> >> never worked since both -static and -pie are passed to linker, which
> >> uses libc.a to build PIE.
> >
> > Yes, it's broken.
> 
> This behavior may be useful for static PIE when libc.a is compiled with
> -fPIE.

Building a PIE from static archives using -static -pie or -pie -static
right now is broken, even if the archives are compiled -fpie/PIE.
I've looked into fixing it, and decided it wasn't worth the effort.
There are multiple problems.  See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80044#c1

One reason why I believe it isn't worth fixing is that the meaning of
-static has changed over the years, from "link using static archives"
to "produce a static executable", and most certainly the meaning of
-static and -pie together is not clear.  I'll cite gold behaviour as
evidence: -static with -pie results in an error from gold.  See
https://sourceware.org/ml/binutils/2012-02/msg00119.html and following
discussion.

> >>  With --enable-default-pie, -static and -pie
> >> override each other.
> >
> > No they don't.  -static overrides -pie.
> >
> >>  What does your patch do on x86-64?  Make
> >> with and without --enable-default-pie behave the same?
> >
> > Yes, as I said in my original post first paragraph.
> >
> >>  Does it
> >> mean that both fail to create executable?
> >
> > I try to leave that sort of patch to those better qualified.
> > Bootstrap and regression testing on x86_64-linux both
> > --enable-default-pie and --disable-default-pie was complete June 23.
> >
> 
> What is the new behavior?  The old  --disable-default-pie or old
> --enable-default-pie?

You are asking questions to which the answer is given in the very
first paragraph posted in this thread, if you knew the current
--enable-default-pie behaviour.  -static overrides -pie.  I.e. current
--enable-default-pie behaviour is unchanged.

> Will static PIE be supported if libc is
> compiled with -fPIE by default?

I covered this above, if you're asking about -static and -pie
together.  Unsupported both before and after my patch.  You *can* link
a working PIE from -fPIE archives, if that is what you want, with
"-pie -Wl,-Bstatic", both before and after my patch.

I'll ask a question of you.  Have you reviewed the patch and found
anything wrong with it?

-- 
Alan Modra
Australia Development Lab, IBM


[PING] [PATCH] enhance overflow and truncation detection in strncpy and strncat (PR 81117)

2017-07-17 Thread Martin Sebor

Ping:

  https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00411.html

On 07/08/2017 02:45 PM, Martin Sebor wrote:

PR 81117 asks for improved detection of common misuses(*) of
strncpy and strncat.  The attached patch is my solution.  It
consists of three related sets of changes:

1) Adds a new option, -Wstringop-truncation, that diagnoses calls
to strncpy and stpncpy (and also strncat) that truncate the copy.
This helps highlight the common but incorrect assumption that
the first two functions NUL-terminate the copy (see, for example,
CWE-170).  For strncat, it helps detect cases of inadvertent
truncation of the source string by passing in a bound that's
less than or equal to its length.

2) Enhances -Wstringop-overflow to diagnose calls of the form
strncpy(D, S, N) where the bound N depends on a call to strlen(S).
This misuse is common in legacy code when, often in response to
the adoption of a secure coding initiative, while replacing uses
of strcpy with strncpy, the engineer either makes a mistake, or
doesn't have a good enough understanding of how the function works,
or does only the bare minimum to satisfy the requirement to avoid
using strcpy without actually improving anything.

3) Enhances -Wsizeof-pointer-memaccess to also warn about uses of
the functions to copy an array to a destination of an unknown size
that specify the size of the array as the bound.  Given the
pervasive [mis]use of strncpy to bound the copy to the size of
the destination, instances like this suggest a bug: a possible
buffer overflow due to an excessive bound (see, for example,
CWE-806).  In cases when the call is safe, it's equivalent to
the corresponding call to memcpy which is clearer and can be
more efficient.

Martin

PS By coincidence rather than by intent, the strncat warnings
are a superset of Clang's -Wstrncat-size.  AFAICS, Clang only
warns when the destination is an array of known size and
doesn't have a corresponding warning for strncpy.

[*] Here's some background into these misuses.

The purpose of the historical strncpy function introduced in V7
UNIX was to completely fill an array of chars with data, either
by copying an initial portion of a source string, or by clearing
it.  I.e., its purpose wasn't to create NUL-terminated strings.
An example of its use was to fill the directory entry d_name
array (dirent::d_name) with the name of a file.

The original purpose of the strncat function, on the other hand,
was to append a not necessarily NUL-terminated array of chars
to a string to form a NUL-terminated concatenation of the two.
An example use case is appending a directory entry (struct
dirent::d_name) that need not be NUL-terminated, to form
a pathname which does.

Largely due to a misunderstanding of the functions' purpose they
have become commonly used (and misused) to make "safe," bounded
string copies by safeguarding against accidentally overflowing
the destination.  This has led to great many bugs and security
vulnerabilities.





[PING #2] [PATCH] enhance -Wrestrict for sprintf %s arguments

2017-07-17 Thread Martin Sebor

Ping #2:

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00036.html


On 07/02/2017 02:00 PM, Martin Sebor wrote:

The attached patch enhances the -Wrestrict warning to detect more
than just trivial instances of overlapping copying by sprintf and
related functions.

The meat of the patch is relatively simple but because it introduces
dependencies between existing classes in the sprintf pass I had to
move the class definitions around.  That makes the changes look more
extensive than they really are.

The enhancement works by first determining the base object (or
pointer) from the destination of the sprintf call, the constant
offset into the object, and its size.  For each %s argument, it
then computes the same information.  If it determines that overlap
between the two is possible it stores the information for the
directive argument (including the size of the argument) for later
processing.  After the whole call/format string has been processed,
the patch then iterates over the stored directives and their
arguments and compares the size/length of the argument against
the function's overall output.  If they overlap it issues
a warning.

Tested on x86_64-linux.

-Wrestrict is not currently included in either -Wextra or -Wall
and this patch doesn't change it even though there have been
requests to add it to one of these two options.  I'd like to do
that as a separate step.

As the next step I'd also like to extend a limited form of the
-Wrestrict enhancement to other restrict-qualified built-ins (like
strcpy), and ultimately also to user-defined functions that make
use of restrict.  I think this might best be done in a separate
pass where the computed pointer information can be cached to avoid
recomputing it for each call.  But I don't expect the new pass (or
whatever form the enhancement ends up taking) to also handle sprintf
(at least not with the same accuracy it does now), because the sprintf
data for each format directive are not available outside the sprintf
pass.

Martin






Re: [PATCH][RFA/RFC] Stack clash mitigation patch 07/08

2017-07-17 Thread Jeff Law
On 07/17/2017 04:42 PM, Wilco Dijkstra wrote:
> Jeff Law wrote:
>> On 07/17/2017 05:27 AM, Wilco Dijkstra wrote:
> 
>>> A minimum guard size of 64KB seems reasonable even on systems with
>>> 4KB pages. However whatever the chosen guard size, you cannot defend
>>> against hostile code. An OS can of course increase the guard size well 
>>> beyond the minimum required, but that's simply reducing the probability -
>>> it is never going to block a big unchecked alloca.
>> That's a kernel issue and I'm not in a position to change that.  On
>> systems with a 64bit address space, I'm keen to see the kernel team
>> increase the guard size, but it's not something I can unilaterally make
>> happen.
> 
> Well if we want the stack guard to catch cases with high probability even if
> some or most of the code is unchecked, it must be made much larger. And the
> fact you can set the stack guard to zero in GLIBC is worrying as that would
> allow an attacker to trivially bypass the stack guard...
Well, the attacker would have to arrange to emit a call to change the
guard size -- if the program doesn't already have that code sequence,
then that's going to be reasonably hard (they'd have to control the
argument to that pthread call to set the guard or conjure up a ROP
sequence -- and if they've got a ROP chain started there are probably
more direct attacks).




>>> In 99% of the frames only one stack allocation is made. There are a few
>>> cases where the stack can be adjusted twice.
>> BUt we still have to deal with the cases where there are multiple
>> adjustments.  Punting in the case of multiple adjustments isn't the
>> right thing to do.  Some level of tracking is needed.
> 
> I didn't say we should punt, just that no tracking is required. The AArch64
> prolog code is extremely simple. The only adjustments that need to be checked
> are initial_adjust and final_adjust. Callee_adjust doesn't need any checks
> since it is limited by the range of STP (i.e. < 512) and if the locals are
> large, it is zero.
Hmm, so we can't ever get into a case where INITIAL_ADJUST and
CALLEE_ADJUST are both nonzero?  If so, then yes, that does simplify
things -- dealing with that case is a serious annoyance and I'd be
happy to add an assert to ensure it never happens.   That's where
knowing the architecture details (which I don't) really turns out to be
useful :-)

> 
> Anyway the only complex case is shrinkwrapping. We know that at least LR must
> be saved before a call, but with -fomit-frame-pointer it doesn't always end up
> at the bottom of the callee-saves. We could take its offset into account or
> force it at offset 0.
Right.  I'd roughly figured out that the LR save is not separately
wrapped, which is good to know in terms of cases that have to be supported.


> 
>>> To be safe I think we first need to probe and then allocate. Or are there
>>> going to be extra checks in asynchronous interrupt handlers that check
>>> whether SP is above the stack guard?
>> That hits the red zone, which is unfortunate.  But it's acceptable IMHO.
> 
> How do you mean? What will happen if SP points into the heap? Will the trap
> handler just write to it without any check?
In general writing below SP tends to be discouraged.  But there are
cases where, IMHO, it's called for.  I think this is one of them.

I'm not aware of any checks in the signal handler frame setup done by
the kernel, which is a significant issue.  I'd focused on making sure we
don't have a window where the stack pointer has been pushed past the
guard before probing.  However, as Richard E has pointed out, if the
kernel doesn't have checks, then the signal frame setup itself in the
kernel could be used to push through the guard depending on the size of
the signal frame and the state of the stack pointer relative to the end
of the guard.

Jeff


Re: [PATCH][RFA/RFC] Stack clash mitigation patch 07/08

2017-07-17 Thread Wilco Dijkstra
Jeff Law wrote:    
> On 07/17/2017 05:27 AM, Wilco Dijkstra wrote:

> > A minimum guard size of 64KB seems reasonable even on systems with
> > 4KB pages. However whatever the chosen guard size, you cannot defend
> > against hostile code. An OS can of course increase the guard size well 
> > beyond the minimum required, but that's simply reducing the probability -
> > it is never going to block a big unchecked alloca.
> That's a kernel issue and I'm not in a position to change that.  On
> systems with a 64bit address space, I'm keen to see the kernel team
> increase the guard size, but it's not something I can unilaterally make
> happen.

Well if we want the stack guard to catch cases with high probability even if some
or most of the code is unchecked, it must be made much larger. And the fact
you can set the stack guard to zero in GLIBC is worrying as that would allow an
attacker to trivially bypass the stack guard...

> > Outgoing args are not an area for concern even with a 4KB stack guard.
> They are a concern and as the size of the guard comes down, they become
> an increasing concern, IMHO.

At 4KB it's 1000 times more likely to have a frame that could skip the stack
guard than outgoing args. That's before we consider alloca which is even
more common.

> > In 99% of the frames only one stack allocation is made. There are a few
> > cases where the stack can be adjusted twice.
> BUt we still have to deal with the cases where there are multiple
> adjustments.  Punting in the case of multiple adjustments isn't the
> right thing to do.  Some level of tracking is needed.

I didn't say we should punt, just that no tracking is required. The AArch64
prolog code is extremely simple. The only adjustments that need to be checked
are initial_adjust and final_adjust. Callee_adjust doesn't need any checks
since it is limited by the range of STP (i.e. < 512) and if the locals are
large, it is zero.

Anyway the only complex case is shrinkwrapping. We know that at least LR must
be saved before a call, but with -fomit-frame-pointer it doesn't always end up
at the bottom of the callee-saves. We could take its offset into account or
force it at offset 0.

> > To be safe I think we first need to probe and then allocate. Or are there
> > going to be extra checks in asynchronous interrupt handlers that check
> > whether SP is above the stack guard?
> That hits the red zone, which is unfortunate.  But it's acceptable IMHO.

How do you mean? What will happen if SP points into the heap? Will the trap
handler just write to it without any check?

Wilco

Re: [PING^4] {C++ PATCH] Fix-it hints for wrong usage of 'friend' and 'auto'

2017-07-17 Thread Volker Reichelt
On 17 Jul, David Malcolm wrote:
> On Sun, 2017-07-16 at 12:13 +0200, Volker Reichelt wrote:
>> Hi,
>> 
>> this is PING^4 for a C++ patch that adds fix-it hints for wrong usage
>> of 'friend' and 'auto':
>> 
>> https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01575.html
> 
> I notice that the patch slightly changes the apparent indentation
> within the RID_AUTO hunk, but the pre-existing lines there are a
> mixture of pure spaces, and tabs+spaces; please double-check that any
> lines that the patch touches are converted to correctly follow our
> convention of tabs+spaces.

The indentation in that area needs quite some tabs vs. spaces cleanup.
I only fixed the lines that are touched by the code changes (and also
the comment above in the final commit).

> Other than that, OK for trunk.

Committed to trunk.

Regards,
Volker

> Sorry for not spotting this earlier.
> Dave
> 
>> Previous pings
>> 
>> https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01248.html
>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00207.html
>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00712.html (by
>> Gerald)
>> 
>> (thanks Gerald!)
>> 
>> Regards,
>> Volker
> 



[PATCH, alpha]: Fix bootstrap

2017-07-17 Thread Uros Bizjak
2017-07-17  Uros Bizjak  

* config/alpha/alpha.c: Include predict.h.

Bootstrapped on alphaev68-linux-gnu, committed to mainline SVN.

Uros.

Index: config/alpha/alpha.c
===
--- config/alpha/alpha.c(revision 250278)
+++ config/alpha/alpha.c(working copy)
@@ -29,6 +29,7 @@
 #include "memmodel.h"
 #include "gimple.h"
 #include "df.h"
+#include "predict.h"
 #include "tm_p.h"
 #include "ssa.h"
 #include "expmed.h"


[PATCH] PR libstdc++/81064 fix versioned namespace

2017-07-17 Thread François Dumont

Hi

Here is the patch to fix libstdc++ versioned namespace.

Now the versioned namespace is only at the std and __gnu_cxx levels, no
longer in nested namespaces.


   PR libstdc++/81064

* include/bits/algorithmfwd.h: Reorganize versioned namespace.
* include/bits/basic_string.h: Likewise.
* include/bits/c++config: Likewise.
* include/bits/deque.tcc: Likewise.
* include/bits/forward_list.h: Likewise.
* include/bits/forward_list.tcc: Likewise.
* include/bits/hashtable_policy.h: Likewise.
* include/bits/list.tcc: Likewise.
* include/bits/move.h: Likewise.
* include/bits/quoted_string.h: Likewise.
* include/bits/random.h: Likewise.
* include/bits/random.tcc: Likewise.
* include/bits/regex.h: Likewise.
* include/bits/regex.tcc: Likewise.
* include/bits/regex_automaton.h: Likewise.
* include/bits/regex_automaton.tcc: Likewise.
* include/bits/regex_compiler.h: Likewise.
* include/bits/regex_compiler.tcc: Likewise.
* include/bits/regex_constants.h: Likewise.
* include/bits/regex_error.h: Likewise.
* include/bits/regex_executor.h: Likewise.
* include/bits/regex_executor.tcc: Likewise.
* include/bits/regex_scanner.h: Likewise.
* include/bits/regex_scanner.tcc: Likewise.
* include/bits/specfun.h: Likewise.
* include/bits/stl_algo.h: Likewise.
* include/bits/stl_algobase.h: Likewise.
* include/bits/stl_bvector.h: Likewise.
* include/bits/stl_deque.h: Likewise.
* include/bits/stl_iterator.h: Likewise.
* include/bits/stl_iterator_base_funcs.h: Likewise.
* include/bits/stl_list.h: Likewise.
* include/bits/stl_map.h: Likewise.
* include/bits/stl_multimap.h: Likewise.
* include/bits/stl_multiset.h: Likewise.
* include/bits/stl_relops.h: Likewise.
* include/bits/stl_set.h: Likewise.
* include/bits/stl_vector.h: Likewise.
* include/bits/uniform_int_dist.h: Likewise.
* include/bits/unordered_map.h: Likewise.
* include/bits/unordered_set.h: Likewise.
* include/bits/vector.tcc: Likewise.
* include/c_global/cmath: Likewise.
* include/c_std/cmath: Likewise.
* include/decimal/decimal: Likewise.
* include/decimal/decimal.h: Likewise.
* include/experimental/algorithm: Likewise.
* include/experimental/any: Likewise.
* include/experimental/array: Likewise.
* include/experimental/bits/erase_if.h: Likewise.
* include/experimental/bits/fs_dir.h: Likewise.
* include/experimental/bits/fs_fwd.h: Likewise.
* include/experimental/bits/fs_ops.h: Likewise.
* include/experimental/bits/fs_path.h: Likewise.
* include/experimental/bits/shared_ptr.h: Likewise.
* include/experimental/bits/string_view.tcc: Likewise.
* include/experimental/chrono: Likewise.
* include/experimental/deque: Likewise.
* include/experimental/filesystem: Likewise.
* include/experimental/forward_list: Likewise.
* include/experimental/functional: Likewise.
* include/experimental/iterator: Likewise.
* include/experimental/list: Likewise.
* include/experimental/map: Likewise.
* include/experimental/memory: Likewise.
* include/experimental/memory_resource: Likewise.
* include/experimental/numeric: Likewise.
* include/experimental/optional: Likewise.
* include/experimental/propagate_const: Likewise.
* include/experimental/random: Likewise.
* include/experimental/ratio: Likewise.
* include/experimental/regex: Likewise.
* include/experimental/set: Likewise.
* include/experimental/source_location: Likewise.
* include/experimental/string: Likewise.
* include/experimental/string_view: Likewise.
* include/experimental/system_error: Likewise.
* include/experimental/tuple: Likewise.
* include/experimental/type_traits: Likewise.
* include/experimental/unordered_map: Likewise.
* include/experimental/unordered_set: Likewise.
* include/experimental/utility: Likewise.
* include/experimental/vector: Likewise.
* include/ext/bitmap_allocator.h: Likewise.
* include/ext/codecvt_specializations.h: Likewise.
* include/ext/rope: Likewise.
* include/ext/typelist.h: Likewise.
* include/std/chrono: Likewise.
* include/std/complex: Likewise.
* include/std/functional: Likewise.
* include/std/numeric: Likewise.
* include/std/string_view: Likewise.
* include/std/thread: Likewise.
* include/std/variant: Likewise.
* include/tr1/array: Likewise.
* include/tr1/bessel_function.tcc: Likewise.
* include/tr1/beta_function.tcc: Likewise.
* include/tr1/cmath: Likewise.
* include/tr1/complex: Likewise.
* include/tr1/ell_integral.tcc: Likewise.
* include/tr1/exp_integral.tcc: Likewise.
* include/tr1/functional: Likewise.
* include/tr1/functional_hash.h: Likewise.
* include/tr1/gamma.tcc: Likewise.
* include/tr1/hashtable.h: Likewise.
* include/tr1/hashtable_policy.h: Likewise.
* include/tr1/hyper

Re: [PATCH] match.pd: reassociate multiplications with constants

2017-07-17 Thread Alexander Monakov
On Mon, 17 Jul 2017, Marc Glisse wrote:
> > +/* Combine successive multiplications.  Similar to above, but handling
> > +   overflow is different.  */
> > +(simplify
> > + (mult (mult @0 INTEGER_CST@1) INTEGER_CST@2)
> > + (with {
> > +   bool overflow_p;
> > +   wide_int mul = wi::mul (@1, @2, TYPE_SIGN (type), &overflow_p);
> > +  }
> > +  (if (!overflow_p || TYPE_OVERFLOW_WRAPS (type))
> 
> I wonder if there are cases where this would cause trouble for saturating
> integers. The only case I can think of is when @2 is -1, but that's likely
> simplified to NEGATE_EXPR first.

Ah, yes, I think if @2 is -1 or 0 then we should not attempt this transform for
either saturating or sanitized types, just like in the first patch. I think
wrapping the 'with' with 'if (!integer_minus_onep (@2) && !integer_zerop (@2))'
works, since as you say it should become a negate/zero anyway?

Alexander


Re: [PATCH][RFA/RFC] Stack clash mitigation patch 07/08

2017-07-17 Thread Trevor Saunders
On Mon, Jul 17, 2017 at 09:51:40AM -0600, Jeff Law wrote:
> On 07/16/2017 12:36 PM, Trevor Saunders wrote:
> >>> On the other hand if probing is fast enough that it can be on by default
> >>> in gcc there should be much less of it.  Even if you did change the ABI
> >>> to require probing it seems unlikely that code violating that
> >>> requirement would hit problems other than this security concern, so I'd
> >>> expect there will be some non compliant asm out there.
> >> Certainly my goal is to enable it by default one day.  Even if that's
> >> ultimately done at the distro level or by a configure time switch.
> >>
> >> Certainly if/when we reach that point the amount of unprotected code
> >> will drop dramatically, but based on my experience I fully expect some
> >> folks to turn it off.  They'll say something like "hey, we've audited
> >> our code and we don't have any large stack or allocas, so I'm turning
> >> this thing off to get some un-measurable performance gain."  It'd be an
> >> uber-dumb thing to do, but it's going to happen.
> > 
> > I agree with that, though we could consider not having an option for it.
> > Of course people can compile there own compiler, but I think its pretty
> > clear if you do that the security problem is your fault too.
> I suspect not having an option is a non-starter.  But maybe I'm wrong on
> that one.

I can see conditions under which I'd support it depending on
architecture and impact, but I can't speak for anyone else, so you're
probably wise to expect it won't happen.  As an example it seems

> >> Tweaking the ABI to mandate touching *sp in the outgoing args area &
> >> alloca space is better because we likely wouldn't have an option to
> >> avoid that access.   So a well meaning, but clueless, developer couldn't
> >> turn the option off like they could stack checking.
> > 
> > However see the comment about assembly code, I can easily see someone
> > forgetting to touch *sp in hand written assembly.  Obviously much less of
> > that, but it kind of sounds like you want perfect here.
> Perfect would be good :-)
> 
> The odds of hand written assembly code having a large enough outgoing
> args size or dynamic frame to require touching is exceedingly small.

Probably true, but that's not perfect so we're back in the land of
balancing probabilities.

> >>> It seems to me pretty important to ask how many programs out there have
> >>> a caller that can push the stack into the guard page, but not past it.
> >> I've actually spent a lot of time thinking about that precise problem. You
> >> don't need large frames to do that -- you just need controlled heap
> >> leaks and/or controlled recursion.  ie, even if a function has no
> >> allocas and a small frame, it can put the stack pointer into the guard.
> > 
> > There may not be room for it on 32 bit platforms, but I think another
> > thing we learned here is that its a mistake to allow the heap to grow
> > into space the stack might use.  That would require the use of recursion
> > here.
> The heap grows in response to explicit requests and appropriate checks
> can be made to prevent jumping the guard due to heap growth.  It's the
implicit mechanisms for stack growth that cause headaches.

I'm aware the heap can't jump the guard on its own.  However my
understanding of some of the known exploits was that they used heap
leaks to grow the heap until the stack couldn't grow any larger to make
it easier to jump the guard.

Has anyone looked at how necessary it is to support stack growth on 64
bit arches? I'd hope we could just have a set size that gets faulted in,
but I'm probably wrong.

> >>> I think the largest
> >>> buffer Qualys found was less than 400k? So 1 256k guard page should
> >>> protect 95% of functions, and 1m or 2m  seems like enough to protect
> >>> against all non malicious programs.  I'm not sure, but this is a 64 bit
> >>> arch, so it seems like we should have the adress space for large guard
> >>> pages like that.
> >> I'm all for larger guards, particularly on 64 bit architectures.
> >>
> >> We use 64k pages on aarch64 for RHEL which implicitly gives us a minimum
> >> guard of 64k.  That would be a huge factor in any analysis I would do if
> >> the aarch64 maintainers choose not to fully protect their architecture
> >> and I was forced to make a recommendation for Red Hat and its customers.
> >>  I hope I don't have to sit down and do the analysis on this and make
> >> any kind of recommendation.
> > 
> > It's certainly easier to say it's not the compiler's job to fully protect
> > when you don't have to make that recommendation ;)
> Lots of factors come into play.  What I don't want to do is put Red Hat
> in a position where some customer gets hacked because we left open a
> known hole that we could have reasonably closed.

Yeah, that's understandable.  If I thought all of this should be fixed in
the compiler I'd agree with not wanting to leave any holes.

> >> The fact that Qualys found nothing larger than X (f

Re: std::forward_list optim for always equal allocator

2017-07-17 Thread Daniel Krügler
2017-07-17 22:10 GMT+02:00 François Dumont :
> Hi
>
> Here is the patch to implement the always equal alloc optimization for
> forward_list. With this version there is no ABI issue.
>
> I also prefer to implement the _Fwd_list_node_base move operator for
> consistency with the move constructor and used it where applicable.
>
>
> * include/bits/forward_list.h
> (_Fwd_list_node_base& operator=(_Fwd_list_node_base&&)): Implement.
> (_Fwd_list_impl(_Fwd_list_impl&&, _Node_alloc_type&&)): New.
> (_Fwd_list_base(_Fwd_list_base&&, _Node_alloc_type&&, std::true_type)):
> New, use latter.
> (forward_list(forward_list&&, _Node_alloc_type&&, std::false_type)):
> New.
> (forward_list(forward_list&&, _Node_alloc_type&&, std::true_type)):
> New.
> (forward_list(forward_list&&, const _Alloc&)): Adapt to use latters.
> * include/bits/forward_list.tcc
> (_Fwd_list_base(_Fwd_list_base&&, _Node_alloc_type&&)): Adapt to use
> _M_impl._M_head move assignment.
> (forward_list<>::merge(forward_list<>&&, _Comp)): Likewise.
>
> Tested under Linux x86_64, ok to commit ?

Out of curiosity: Shouldn't

_Fwd_list_node_base&
operator=(_Fwd_list_node_base&& __x);

be declared noexcept?

Thanks,

- Daniel


std::forward_list optim for always equal allocator

2017-07-17 Thread François Dumont

Hi

Here is the patch to implement the always equal alloc optimization 
for forward_list. With this version there is no ABI issue.


I also prefer to implement the _Fwd_list_node_base move operator 
for consistency with the move constructor and used it where applicable.



* include/bits/forward_list.h
(_Fwd_list_node_base& operator=(_Fwd_list_node_base&&)): Implement.
(_Fwd_list_impl(_Fwd_list_impl&&, _Node_alloc_type&&)): New.
(_Fwd_list_base(_Fwd_list_base&&, _Node_alloc_type&&, std::true_type)):
New, use latter.
(forward_list(forward_list&&, _Node_alloc_type&&, std::false_type)):
New.
(forward_list(forward_list&&, _Node_alloc_type&&, std::true_type)):
New.
(forward_list(forward_list&&, const _Alloc&)): Adapt to use latters.
* include/bits/forward_list.tcc
(_Fwd_list_base(_Fwd_list_base&&, _Node_alloc_type&&)): Adapt to use
_M_impl._M_head move assignment.
(forward_list<>::merge(forward_list<>&&, _Comp)): Likewise.

Tested under Linux x86_64, ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/forward_list.h b/libstdc++-v3/include/bits/forward_list.h
index 9ddbcb2..dec91ea 100644
--- a/libstdc++-v3/include/bits/forward_list.h
+++ b/libstdc++-v3/include/bits/forward_list.h
@@ -60,7 +60,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
 _Fwd_list_node_base(const _Fwd_list_node_base&) = delete;
 _Fwd_list_node_base& operator=(const _Fwd_list_node_base&) = delete;
-_Fwd_list_node_base& operator=(_Fwd_list_node_base&&) = delete;
+
+_Fwd_list_node_base&
+operator=(_Fwd_list_node_base&& __x)
+{
+  _M_next = __x._M_next;
+  __x._M_next = nullptr;
+  return *this;
+}
 
 _Fwd_list_node_base* _M_next = nullptr;
 
@@ -75,7 +82,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	  __end->_M_next = _M_next;
 	}
   else
-	__begin->_M_next = 0;
+	__begin->_M_next = nullptr;
   _M_next = __keep;
   return __end;
 }
@@ -180,7 +187,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	if (_M_node)
 	  return _Fwd_list_iterator(_M_node->_M_next);
 	else
-	  return _Fwd_list_iterator(0);
+	  return _Fwd_list_iterator(nullptr);
   }
 
   _Fwd_list_node_base* _M_node;
@@ -251,7 +258,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	if (this->_M_node)
 	  return _Fwd_list_const_iterator(_M_node->_M_next);
 	else
-	  return _Fwd_list_const_iterator(0);
+	  return _Fwd_list_const_iterator(nullptr);
   }
 
   const _Fwd_list_node_base* _M_node;
@@ -298,6 +305,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
 	_Fwd_list_impl(_Fwd_list_impl&&) = default;
 
+	_Fwd_list_impl(_Fwd_list_impl&& __fl, _Node_alloc_type&& __a)
+	: _Node_alloc_type(std::move(__a)), _M_head(std::move(__fl._M_head))
+	{ }
+
 	_Fwd_list_impl(_Node_alloc_type&& __a)
 	: _Node_alloc_type(std::move(__a))
 	{ }
@@ -323,15 +334,21 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   _Fwd_list_base(_Node_alloc_type&& __a)
   : _M_impl(std::move(__a)) { }
 
+  // When allocators are always equal.
+  _Fwd_list_base(_Fwd_list_base&& __lst, _Node_alloc_type&& __a,
+		 std::true_type)
+  : _M_impl(std::move(__lst._M_impl), std::move(__a))
+  { }
+
+  // When allocators are not always equal.
   _Fwd_list_base(_Fwd_list_base&& __lst, _Node_alloc_type&& __a);
 
   _Fwd_list_base(_Fwd_list_base&&) = default;
 
   ~_Fwd_list_base()
-  { _M_erase_after(&_M_impl._M_head, 0); }
+  { _M_erase_after(&_M_impl._M_head, nullptr); }
 
 protected:
-
   _Node*
   _M_get_node()
   {
@@ -448,7 +465,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   : _Base(_Node_alloc_type(__al))
   { }
 
-
   /**
*  @brief  Copy constructor with allocator argument.
*  @param  __list  Input list to copy.
@@ -458,14 +474,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   : _Base(_Node_alloc_type(__al))
   { _M_range_initialize(__list.begin(), __list.end()); }
 
-  /**
-   *  @brief  Move constructor with allocator argument.
-   *  @param  __list  Input list to move.
-   *  @param  __alAn allocator object.
-   */
-  forward_list(forward_list&& __list, const _Alloc& __al)
-  noexcept(_Node_alloc_traits::_S_always_equal())
-  : _Base(std::move(__list), _Node_alloc_type(__al))
+private:
+  forward_list(forward_list&& __list, _Node_alloc_type&& __al,
+		   std::false_type)
+  : _Base(std::move(__list), std::move(__al))
   {
 	// If __list is not empty it means its allocator is not equal to __a,
 	// so we need to move from each element individually.
@@ -474,6 +486,24 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 		 std::__make_move_if_noexcept_iterator(__list.end()));
   }
 
+  forward_list(forward_list&& __list, _Node_alloc_type&& __al,
+		   std::true_type)
+  noexcept
+  : _Base(std::move(__list), _Node_alloc_type(__al), std::true_type{})
+  { }
+
+public:
+  /**
+   *  @brief  Move constructor with allocator argument.
+   *  @param  __list

Re: [PATCH] match.pd: reassociate multiplications with constants

2017-07-17 Thread Marc Glisse

On Sat, 15 Jul 2017, Alexander Monakov wrote:


On Thu, 13 Jul 2017, Marc Glisse wrote:

X*big*big where abs(big*big)>abs(INT_MIN) can be optimized to 0


I'm not sure that would be a win, eliminating X prevents the compiler from
deducing that X must be zero (if overflow invokes undefined behavior).


This issue is common to many simplifications. As far as I know, we almost 
never use an operation to deduce a range for its argument (the only case I 
can think of is dereferencing a pointer). Though this case is a bit 
special since we would determine a constant value, not just a range, so we 
could imagine walking through the uses of X that are dominated by the 
multiplication and replacing X by 0 there... but that's quite a bit of 
work for something hopefully rare. I wonder if the new VRP that's in the 
works changes anything there.



the only hard case is when the product of the constants is -INT_MIN, which we
could turn into X<<31 for instance (sadly loses range info), or (-X)*INT_MIN
or whatever. That would make a nice follow-up, if you are interested.


Here's a patch that combines constants, but doesn't handle this case,
(I'm not sure we want to handle it, would it be useful in practice?)


Probably not worth it indeed.


and neither does substitute zero on overflow, per the above concern.

(slsr-4.c needs to be adjusted due to new simplifications)

* match.pd ((X * CST1) * CST2): Simplify to X * (CST1 * CST2)
if the product does not overflow.
testsuite:
* gcc.dg/tree-ssa/assoc-2.c: Enhance.
* gcc.dg/tree-ssa/slsr-4.c: Adjust.

diff --git a/gcc/match.pd b/gcc/match.pd
index 36045f1..7f384db 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -283,6 +283,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
|| mul != wi::min_value (TYPE_PRECISION (type), SIGNED))
 { build_zero_cst (type); })

+/* Combine successive multiplications.  Similar to above, but handling
+   overflow is different.  */
+(simplify
+ (mult (mult @0 INTEGER_CST@1) INTEGER_CST@2)
+ (with {
+   bool overflow_p;
+   wide_int mul = wi::mul (@1, @2, TYPE_SIGN (type), &overflow_p);
+  }
+  (if (!overflow_p || TYPE_OVERFLOW_WRAPS (type))


I wonder if there are cases where this would cause trouble for saturating 
integers. The only case I can think of is when @2 is -1, but that's likely 
simplified to NEGATE_EXPR first.



+   (mult @0 {wide_int_to_tree (type, mul); }
+


There are a number of possible extensions (handle vectors, handle a 
conversion, etc) but I guess reviewers probably won't make those a 
requirement for this patch. Missing space before wide_int_to_tree.



/* Optimize A / A to 1.0 if we don't care about
   NaNs or Infinities.  */
(simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
index a92c882..cc0e9d4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
@@ -5,4 +5,15 @@ int f0(int a, int b){
  return a * 33 * b * 55;
}

-/* { dg-final { scan-tree-dump-times "mult_expr" 2 "gimple" } } */
+int f1(int a){
+  a *= 33;
+  return a * 55;
+}
+
+int f2(int a, int b){
+  a *= 33;
+  return a * b * 55;
+}
+
+/* { dg-final { scan-tree-dump-times "mult_expr" 7 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "mult_expr" 5 "optimized" } } */


Ah, so that's why you had -fdump-tree-optimized in the previous patch...


diff --git a/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
index 17d7b4c..1e943b7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
@@ -23,13 +23,9 @@ f (int i)
  foo (y);
}

-/* { dg-final { scan-tree-dump-times "\\* 4" 1 "slsr" } } */
-/* { dg-final { scan-tree-dump-times "\\* 10" 1 "slsr" } } */
-/* { dg-final { scan-tree-dump-times "\\+ 20;" 1 "slsr" } } */
+/* { dg-final { scan-tree-dump-times "\\* 40" 1 "slsr" } } */
/* { dg-final { scan-tree-dump-times "\\+ 200" 1 "slsr" } } */
-/* { dg-final { scan-tree-dump-times "\\- 16;" 1 "slsr" } } */
/* { dg-final { scan-tree-dump-times "\\- 160" 1 "slsr" } } */
-/* { dg-final { scan-tree-dump-times "\\* 4" 1 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "\\* 10" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\* 40" 1 "optimized" } } */
/* { dg-final { scan-tree-dump-times "\\+ 200" 1 "optimized" } } */
/* { dg-final { scan-tree-dump-times "\\+ 40" 1 "optimized" } } */


--
Marc Glisse


Backports to 7.x

2017-07-17 Thread Jakub Jelinek
Hi!

I've bootstrapped/regtested and committed following 5 backports from
mainline to 7.x.

Jakub
2017-07-17  Jakub Jelinek  

Backported from mainline
2017-06-30  Jakub Jelinek  

PR target/81225
* config/i386/sse.md (vec_extract_lo_): For
V8FI, V16FI and VI8F_256 iterators, use  instead
of nonimmediate_operand and  instead of m for
the input operand.  For V8FI iterator, always split if input is a MEM.
For V16FI and V8SF_256 iterators, don't test if both operands are MEM
if .  For VI4F_256 iterator, use 
instead of register_operand and  instead of v for
the input operand.  Make sure both operands aren't MEMs for if not
.

* gcc.target/i386/pr81225.c: New test.

--- gcc/config/i386/sse.md  (revision 249843)
+++ gcc/config/i386/sse.md  (revision 249844)
@@ -7359,13 +7359,13 @@ (define_insn "vec_extract_lo__mask
 (define_insn "vec_extract_lo_"
   [(set (match_operand: 0 "" 
"=,v")
(vec_select:
- (match_operand:V8FI 1 "nonimmediate_operand" "v,m")
+ (match_operand:V8FI 1 "" 
"v,")
  (parallel [(const_int 0) (const_int 1)
 (const_int 2) (const_int 3)])))]
   "TARGET_AVX512F
&& ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
 {
-  if ( || !TARGET_AVX512VL)
+  if ( || (!TARGET_AVX512VL && !MEM_P (operands[1])))
 return "vextract64x4\t{$0x0, %1, 
%0|%0, %1, 0x0}";
   else
 return "#";
@@ -7515,14 +7515,15 @@ (define_expand "avx_vextractf128"
 (define_insn "vec_extract_lo_"
   [(set (match_operand: 0 "nonimmediate_operand" "=v,m")
(vec_select:
- (match_operand:V16FI 1 "nonimmediate_operand" "vm,v")
+ (match_operand:V16FI 1 ""
+",v")
  (parallel [(const_int 0) (const_int 1)
  (const_int 2) (const_int 3)
  (const_int 4) (const_int 5)
  (const_int 6) (const_int 7)])))]
   "TARGET_AVX512F
&& 
-   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+   && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
 {
   if ()
 return "vextract32x8\t{$0x0, %1, 
%0|%0, %1, 0x0}";
@@ -7546,11 +7547,12 @@ (define_split
 (define_insn "vec_extract_lo_"
   [(set (match_operand: 0 "" "=v,m")
(vec_select:
- (match_operand:VI8F_256 1 "nonimmediate_operand" "vm,v")
+ (match_operand:VI8F_256 1 ""
+   ",v")
  (parallel [(const_int 0) (const_int 1)])))]
   "TARGET_AVX
&&  && 
-   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+   && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
 {
   if ()
 return "vextract64x2\t{$0x0, %1, %0%{%3%}|%0%{%3%}, %1, 0x0}";
@@ -7610,12 +7612,16 @@ (define_split
   "operands[1] = gen_lowpart (mode, operands[1]);")
 
 (define_insn "vec_extract_lo_"
-  [(set (match_operand: 0 "" 
"=")
+  [(set (match_operand: 0 ""
+ "=,v")
(vec_select:
- (match_operand:VI4F_256 1 "register_operand" "v")
+ (match_operand:VI4F_256 1 ""
+   "v,")
  (parallel [(const_int 0) (const_int 1)
 (const_int 2) (const_int 3)])))]
-  "TARGET_AVX &&  && "
+  "TARGET_AVX
+   &&  && 
+   && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
 {
   if ()
 return "vextract32x4\t{$0x0, %1, 
%0|%0, %1, 0x0}";
--- gcc/testsuite/gcc.target/i386/pr81225.c (nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr81225.c (revision 249844)
@@ -0,0 +1,14 @@
+/* PR target/81225 */
+/* { dg-do compile } */
+/* { dg-options "-mavx512ifma -O3 -ffloat-store" } */
+
+long a[24];
+float b[4], c[24];
+int d;
+
+void
+foo ()
+{
+  for (d = 0; d < 24; d++)
+c[d] = (float) d ? : b[a[d]];
+}
2017-07-17  Jakub Jelinek  

Backported from mainline
2017-07-04  Jakub Jelinek  

PR c++/81258
* parser.c (cp_parser_decomposition_declaration): Diagnose invalid
forms of structured binding initializers.

* g++.dg/cpp1z/decomp21.C (foo): Adjust expected diagnostics.
* g++.dg/cpp1z/decomp30.C: New test.

--- gcc/cp/parser.c (revision 249946)
+++ gcc/cp/parser.c (revision 249947)
@@ -13210,6 +13210,16 @@ cp_parser_decomposition_declaration (cp_
   *init_loc = cp_lexer_peek_token (parser->lexer)->location;
   tree initializer = cp_parser_initializer (parser, &is_direct_init,
&non_constant_p);
+  if (initializer == NULL_TREE
+ || (TREE_CODE (initializer) == TREE_LIST
+ && TREE_CHAIN (initializer))
+ || (TREE_CODE (initializer) == CONSTRUCTOR
+ && CONSTRUCTOR_NELTS (initializer) != 1))
+   {
+ error_at (loc, "invalid initializer for structured binding "
+   "declaration");
+ initializer = error_mark_node;
+   }
 
   if (decl != error_mark_node)
{
--- gcc/tes

Re: [PATCH] prevent -Wall from resetting -Wstringop-overflow=2 to 1 (pr 81345)

2017-07-17 Thread Martin Sebor

On 07/10/2017 05:27 PM, Joseph Myers wrote:

On Mon, 10 Jul 2017, Martin Sebor wrote:


On 07/07/2017 10:58 AM, Joseph Myers wrote:

This patch is OK.



Thanks.  Committed in r250104.

Do you have any comments on or concerns with changing how
LangEnabledBy interprets the opt argument as I suggested below?

  IMO, it makes little sense for an option that takes an argument
  and that specifies a binary option like -Wall in LangEnabledBy
  to default to the binary value of the latter option.  I think
  it would be more intuitive and convenient for it to default to
  the value set by its Init directive for the positive form of
  the binary option and to zero for the negative form (or to empty
  for strings, if that's ever done).


I'm uneasy about the notion of -Wall implying an option that's
on-by-default at all.  If it's on-by-default, why should -Wall have
anything to do with it?  (Resetting from 2 to 1 is obviously wrong.)
Such an implication only makes sense to me if -Wall implies a *different*
(presumably higher) value than the default.  And in the case of Init(-1)
(for special logic to initialize an option), copying the Init value
certainly doesn't make sense in an implication.


That sounds like a good point for -Wstringop-overflow.  I think
there I copied the whole LangEnabledBy attribute from another
option without fully considering (or even understanding) its
effect.  I'm pretty sure copying options like this is uncommon,
and what I'd like to do is help avoid similar kinds of mistake
in the future.  This is one small tweak that can help.  Better
checking of the attributes for consistency and sanity would be
another, provided "boolean to integer conversions" were treated
as invalid by the option scripts.  The checking could also reject
the mistake of redundantly specifying -Wall for an option that's
on by default (or worse, having -Wall or -Wextra lower a numeric
option's default argument value).

Martin


[PATCH, rs6000] Rev 2, 1/2 Add x86 MMX intrinsics to GCC PPC64LE target

2017-07-17 Thread Steven Munroe
Corrected the problems Segher found in review and added changes to deal
with the fallout from the __builtin_cpu_supports warning on older
distros.

Tested on P8 LE and P6/P7/P8 BE. No new tests failures.

./gcc/ChangeLog:

2017-07-17  Steven Munroe  

* config.gcc (powerpc*-*-*): Add mmintrin.h.
* config/rs6000/mmintrin.h: New file.
* config/rs6000/x86intrin.h [__ALTIVEC__]: Include mmintrin.h.

Index: gcc/config/rs6000/mmintrin.h
===
--- gcc/config/rs6000/mmintrin.h(revision 0)
+++ gcc/config/rs6000/mmintrin.h(working copy)
@@ -0,0 +1,1456 @@
+/* Copyright (C) 2002-2017 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+   User Guide and Reference, version 9.0.  */
+
+#ifndef NO_WARN_X86_INTRINSICS
+/* This header is distributed to simplify porting x86_64 code that
+   makes explicit use of Intel intrinsics to powerpc64le.
+   It is the user's responsibility to determine if the results are
+   acceptable and make additional changes as necessary.
+   Note that much code that uses Intel intrinsics can be rewritten in
+   standard C or GNU C extensions, which are more portable and better
+   optimized across multiple targets.
+
+   In the specific case of X86 MMX (__m64) intrinsics, the PowerPC
+   target does not support a native __vector_size__ (8) type.  Instead
+   we typedef __m64 to a 64-bit unsigned long long, which is natively
+   supported in 64-bit mode.  This works well for the _si64 and some
+   _pi32 operations, but starts to generate long sequences for _pi16
+   and _pi8 operations.  For those cases it is better (faster and
+   smaller code) to transfer __m64 data to the PowerPC vector 128-bit
+   unit, perform the operation, and then transfer the result back to
+   the __m64 type. This implies that the direct register move
+   instructions, introduced with power8, are available for efficient
+   implementation of these transfers.
+
+   Most MMX intrinsic operations can be performed efficiently as
+   C language 64-bit scalar operations or optimized to use the newer
+   128-bit SSE/Altivec operations.  We recommend this for new
+   applications.  */
+#warning "Please read comment above.  Use -DNO_WARN_X86_INTRINSICS to disable 
this warning."
+#endif
+
+#ifndef _MMINTRIN_H_INCLUDED
+#define _MMINTRIN_H_INCLUDED
+
+#include 
+/* The Intel API is flexible enough that we must allow aliasing with other
+   vector types, and their scalar components.  */
+typedef __attribute__ ((__aligned__ (8))) unsigned long long __m64;
+
+typedef __attribute__ ((__aligned__ (8)))
+union
+  {
+__m64 as_m64;
+char as_char[8];
+signed char as_signed_char [8];
+short as_short[4];
+int as_int[2];
+long long as_long_long;
+float as_float[2];
+double as_double;
+  } __m64_union;
+
+/* Empty the multimedia state.  */
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_empty (void)
+{
+  /* nothing to do on PowerPC.  */
+}
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_m_empty (void)
+{
+  /* nothing to do on PowerPC.  */
+}
+
+/* Convert I to a __m64 object.  The integer is zero-extended to 64-bits.  */
+extern __inline __m64  __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_cvtsi32_si64 (int __i)
+{
+  return (__m64) (unsigned int) __i;
+}
+
+extern __inline __m64  __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_m_from_int (int __i)
+{
+  return _mm_cvtsi32_si64 (__i);
+}
+
+/* Convert the lower 32 bits of the __m64 object into an integer.  */
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_cvtsi64_si32 (__m64 __i)
+{
+  return ((int) __i);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_m_to_int (__m64 __i)
+{
+  return _mm_cvtsi64_si32 (__i);
+}
+
+#ifdef __pow

Re: PING: [PATCH v2 0/2] [testsuite, libgcc] PR80759 Fix FAILs on Solaris and Darwin

2017-07-17 Thread Uros Bizjak
On Mon, Jul 17, 2017 at 6:16 PM, Daniel Santos  wrote:
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00025.html
>
> Uros,
> Can you review changes for i386 please?

x86 part is OK, so considering that Mike is OK with the Darwin part,
if there are no comments from Ian, the patch is OK for mainline.

Thanks,
Uros.


[PATCH rs6000] Fix up BMI/BMI2 intrinsic DG tests

2017-07-17 Thread Steven Munroe
After a recent GCC change, the previously submitted BMI/BMI2 intrinsic
test started to fail with the following warning/error.

ppc_cpu_supports_hw_available122373.c: In function 'main':
ppc_cpu_supports_hw_available122373.c:9:10: warning:
__builtin_cpu_supports needs GLIBC (2.23 and newer) that exports
hardware capability bits

This does not occur on systems with the newer (2.23) GLIBC but is common
on older (stable) distros.

As this is coming from the bmi-check.h and bmi2-check.h includes (and
not the tests directly) it seems simpler to simply skip the test unless
__BUILTIN_CPU_SUPPORTS__ is defined.


[gcc/testsuite]

2017-07-17  Steven Munroe  

* gcc.target/powerpc/bmi-check.h (main): Skip unless
__BUILTIN_CPU_SUPPORTS__ defined.
* gcc.target/powerpc/bmi2-check.h (main): Skip unless
__BUILTIN_CPU_SUPPORTS__ defined.

Index: gcc/testsuite/gcc.target/powerpc/bmi-check.h
===
--- gcc/testsuite/gcc.target/powerpc/bmi-check.h(revision 250212)
+++ gcc/testsuite/gcc.target/powerpc/bmi-check.h(working copy)
@@ -13,6 +13,7 @@ do_test (void)
 int
 main ()
 {
+#ifdef __BUILTIN_CPU_SUPPORTS__
   /* Need 64-bit for 64-bit longs as single instruction.  */
   if ( __builtin_cpu_supports ("ppc64") )
 {
@@ -25,6 +26,6 @@ main ()
   else
 printf ("SKIPPED\n");
 #endif
-
+#endif /* __BUILTIN_CPU_SUPPORTS__ */
   return 0;
 }
Index: gcc/testsuite/gcc.target/powerpc/bmi2-check.h
===
--- gcc/testsuite/gcc.target/powerpc/bmi2-check.h   (revision 250212)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-check.h   (working copy)
@@ -13,6 +13,7 @@ do_test (void)
 int
 main ()
 {
+#ifdef __BUILTIN_CPU_SUPPORTS__
   /* The BMI2 test for pext test requires the Bit Permute doubleword
  (bpermd) instruction added in PowerISA 2.06 along with the VSX
  facility.  So we can test for arch_2_06.  */
@@ -27,7 +28,7 @@ main ()
   else
 printf ("SKIPPED\n");
 #endif
-
+#endif /* __BUILTIN_CPU_SUPPORTS__ */
   return 0;
 }
 




Re: [PATCH][RFA/RFC] Stack clash mitigation patch 02/08

2017-07-17 Thread Jeff Law
On 07/13/2017 04:54 PM, Jeff Law wrote:
> On 07/12/2017 07:44 PM, Segher Boessenkool wrote:

>> I don't really see why this is so complicated, and why the rs6000
>> target changes (a later patch) are so big.  Why isn't it just simple
>> patches to allocate_stack (and the prologue thing), that check the
>> flag and if it is set do some probes?
> Yea.  I wasn't happy with the size of the rs6000 patches either, which I
> mentioned at some point :-)  Some of the complexity is making sure we
> keep the backchain pointer correct and trying to do so as efficiently as
> possible.  But there's too much conceptual code duplication.
> 
> Essentially the code shows up 3 times in slightly different forms.
So in the V2 patch the PPC code is somewhat cleaner.  It's inherently
going to be more complex than the other ports because it has to handle
dynamic allocation and probing on its own rather than relying on the
generic code.

What I've done is broken out a few helper functions in that generic code
that the ppc backend can use.  There's helpers that compute the key
information we need (rounded size, last address, residual allocation,
etc), code to emit the start of the loop and code to emit the end of the
loop.

So the PPC dynamic code in rs6000.md looks something like this now

compute_data (...)
if (loop needed)
  {
emit_loop_start (...);

ppc magic to allocate & probe a stack page

emit_loop_end (...);
  }

if (residuals)
  {
fix operands[1] and let the rest of the expander run
  }



Ideally those three helpers could be used by other ports that have
backend specific expanders to handle dynamic stack allocations.

Jeff


Re: [PATCH][RFA/RFC] Stack clash mitigation patch 03/08

2017-07-17 Thread Jeff Law
On 07/17/2017 10:28 AM, Segher Boessenkool wrote:
> Hi,
> 
> Just some typos:
> 
> On Tue, Jul 11, 2017 at 03:20:38PM -0600, Jeff Law wrote:
>> +/* If debugging dumps are requested, dump infomation about how the
> 
> Typo ("information").
> 
>> +   target handled -fstack-check=clash for the prologue.
>> +
>> +   PROBES describes what if any probes were emitted.
>> +
>> +   RESIDUALS indicates if the prologue had any residual allocation
>> +   (ie total allocation was not a multiple of PROBE_INTERVAL.  */
> 
> Missing closing paren; i.e.
Thanks.  Fixed for the next update.

I nearly posted V2 last night, but wanted to take a stab at the aarch64
bits again :-)

jeff


Re: [PATCH, PR81030] Call compute_outgoing_frequencies at expand

2017-07-17 Thread Jan Hubicka
Hi,
thanks for looking into this issue.  It is quite weird indeed.
> Commenting out this bit makes the compilation succeed.

so my understanding is that RTL expansion confuses itself and redistributes
probabilities to two jumps while immediately optimizing out the second...

I think in such a case, instead of calling compute_outgoing_frequencies, which
will take the confused probability from the REG_BR_PROB_NOTE and produce an
inconsistent profile, it is better to take the existing probabilities from the
profile and put them back to RTL.

Of course it would be nice to teach RTL expansion not to make such mistakes
(because the patch below helps only in the case where the BB happens not to
split at all), but that would be quite difficult, I guess.

I am testing the attached patch.

Honza

Index: cfgbuild.c
===
--- cfgbuild.c  (revision 250246)
+++ cfgbuild.c  (working copy)
@@ -676,7 +676,14 @@ find_many_sub_basic_blocks (sbitmap bloc
else
  /* If nothing changed, there is no need to create new BBs.  */
  if (EDGE_COUNT (bb->succs) == n_succs[bb->index])
-   continue;
+   {
+ /* On rare occasions RTL expansion might have mistakenly assigned
+probabilities different from what is in the CFG.  This happens
+when we try to split a branch in two but optimize out the
+second branch along the way.  See PR81030.  */
+ update_br_prob_note (bb);
+ continue;
+   }
 
compute_outgoing_frequencies (bb);
   }
> 
> 
> III.
> 
> The bit added in find_many_sub_basic_blocks makes sense to me.
> 
> It just seems to me that we have been relying on find_many_sub_basic_blocks
> to call compute_outgoing_frequencies for all basic blocks during expand.
> This patch restores that situation, and fixes the ICE.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for trunk?
> 
> Thanks,
> - Tom

> Call compute_outgoing_frequencies at expand
> 
> 2017-07-17  Tom de Vries  
> 
>   PR middle-end/81030
>   * cfgbuild.c (find_many_sub_basic_blocks): Add and handle update_freq
>   parameter.
>   * cfgbuild.h (find_many_sub_basic_blocks): Add update_freq parameter.
>   * cfgexpand.c (pass_expand::execute): Call find_many_sub_basic_blocks
>   with update_freq == true.
> 
>   * gcc.dg/pr81030.c: New test.
> 
> ---
>  gcc/cfgbuild.c |  4 ++--
>  gcc/cfgbuild.h |  3 ++-
>  gcc/cfgexpand.c|  2 +-
>  gcc/testsuite/gcc.dg/pr81030.c | 29 +
>  4 files changed, 34 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/cfgbuild.c b/gcc/cfgbuild.c
> index 56a2cb9..78036fe 100644
> --- a/gcc/cfgbuild.c
> +++ b/gcc/cfgbuild.c
> @@ -594,7 +594,7 @@ compute_outgoing_frequencies (basic_block b)
> and create edges.  */
>  
>  void
> -find_many_sub_basic_blocks (sbitmap blocks)
> +find_many_sub_basic_blocks (sbitmap blocks, bool update_freq)
>  {
>basic_block bb, min, max;
>bool found = false;
> @@ -677,7 +677,7 @@ find_many_sub_basic_blocks (sbitmap blocks)
> }
>   else
> /* If nothing changed, there is no need to create new BBs.  */
> -   if (EDGE_COUNT (bb->succs) == n_succs[bb->index])
> +   if (!update_freq && EDGE_COUNT (bb->succs) == n_succs[bb->index])
>   continue;
>  
>   compute_outgoing_frequencies (bb);
> diff --git a/gcc/cfgbuild.h b/gcc/cfgbuild.h
> index afda5ac..0c58590 100644
> --- a/gcc/cfgbuild.h
> +++ b/gcc/cfgbuild.h
> @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
>  extern bool inside_basic_block_p (const rtx_insn *);
>  extern bool control_flow_insn_p (const rtx_insn *);
>  extern void rtl_make_eh_edge (sbitmap, basic_block, rtx);
> -extern void find_many_sub_basic_blocks (sbitmap);
> +extern void find_many_sub_basic_blocks (sbitmap block,
> + bool update_freq = false);
>  
>  #endif /* GCC_CFGBUILD_H */
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 3e1d24d..e030a86 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -6464,7 +6464,7 @@ pass_expand::execute (function *fun)
>  
>auto_sbitmap blocks (last_basic_block_for_fn (fun));
>bitmap_ones (blocks);
> -  find_many_sub_basic_blocks (blocks);
> +  find_many_sub_basic_blocks (blocks, true);
>purge_all_dead_edges ();
>  
>expand_stack_alignment ();
> diff --git a/gcc/testsuite/gcc.dg/pr81030.c b/gcc/testsuite/gcc.dg/pr81030.c
> new file mode 100644
> index 000..23da6e5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr81030.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1" } */
> +
> +void __assert_fail (const char *, const char *, unsigned int, const char *);
> +
> +int a, b, c, d, e, f, h;
> +unsigned char g;
> +
> +int main ()
> +{
> +  int i, *j = &b;
> +  if (a)
> +{
> +  if (h)
> + {
> +   int **k = &j;
> +   d = 0;
> +   *k = 

Re: [PATCH] gcc/doc: list what version each attribute was introduced in

2017-07-17 Thread Martin Sebor

On 07/06/2017 07:25 AM, Daniel P. Berrange wrote:

There are several hundred named attribute keys that have been
introduced over many GCC releases. Applications typically need
to be compilable with multiple GCC versions, so it is important
for developers to know when GCC introduced support for each
attribute.

This augments the texi docs that list attribute keys with
a note of what version introduced the feature. The version
information was obtained through archaeology of the GCC source
repository release tags, back to gcc-4_0_0-release. For
attributes added in 4.0.0 or later, an explicit version will
be noted. Any attribute that predates 4.0.0 will simply note
that it has existed prior to 4.0.0. It is thought there is
little need to go further back in time than 4.0.0 since few,
if any, apps will still be using such old compiler versions.


A few years ago I created something like this for command line
options.  I used a script to download the GCC manual for each
version, extract new command line options from the index, and
tabulate them.



Where a named attribute can be used in many contexts (ie the
'visibility' attribute can be used for both functions or
variables), it was assumed that the attribute was supported
in all use contexts at the same time.

Future patches that add new attributes to GCC should be
required to follow this new practice, by documenting the
version.


I think this is great material.  I don't have a strong opinion
on where it belongs (the manual, the Wiki, or somewhere else),
but given how tedious it can be to get at this information I
think having it available somewhere in some form could be quite
useful.  (I specifically needed it in a tabular form, with
option names on left and versions in columns.)

Martin

PS The version information isn't just used by engineers writing
code (where I agree that having a portable API to query it would
be more robust than copying it from a static page).  In my
experience, it's also used to drive decisions about adopting
different versions of GCC or versions of other compilers that
aim to be (near) 100% GCC compatible (such as Intel ICC).



[C++ PATCH] ctor privacy

2017-07-17 Thread Nathan Sidwell
While futzing around with ctor lookup I discovered this warning about 
overly-private classes.


Originally we'd allow a public copy-ctor to be sufficiently public, but 
as the comment says, you need an already-constructed object for that to 
work.  So this implements that check -- public copy or move ctors do not 
make the class sufficiently public.


I didn't implement the further suggested check of ignoring a ctor that 
takes an already constructed object -- be it copy ctor or not.


applied to trunk.

nathan
--
Nathan Sidwell
2017-07-17  Nathan Sidwell  

	* class.c (maybe_warn_about_overly_private_class): Ignore public
	copy ctors.

	* g++.dg/warn/ctor-dtor-privacy-3.C: New.

Index: cp/class.c
===
--- cp/class.c	(revision 250280)
+++ cp/class.c	(working copy)
@@ -2240,10 +2240,10 @@ maybe_warn_about_overly_private_class (t
   /* Warn about classes that have private constructors and no friends.  */
   if (TYPE_HAS_USER_CONSTRUCTOR (t)
   /* Implicitly generated constructors are always public.  */
-  && (!CLASSTYPE_LAZY_DEFAULT_CTOR (t)
-	  || !CLASSTYPE_LAZY_COPY_CTOR (t)))
+  && !CLASSTYPE_LAZY_DEFAULT_CTOR (t))
 {
   bool nonprivate_ctor = false;
+  tree copy_or_move = NULL_TREE;
 
   /* If a non-template class does not define a copy
 	 constructor, one is defined for it, enabling it to avoid
@@ -2260,13 +2260,15 @@ maybe_warn_about_overly_private_class (t
   else
 	for (ovl_iterator iter (CLASSTYPE_CONSTRUCTORS (t));
 	 !nonprivate_ctor && iter; ++iter)
-	  /* Ideally, we wouldn't count copy constructors (or, in
-	 fact, any constructor that takes an argument of the class
-	 type as a parameter) because such things cannot be used
-	 to construct an instance of the class unless you already
-	 have one.  But, for now at least, we're more
-	 generous.  */
-	  if (! TREE_PRIVATE (*iter))
+	  if (TREE_PRIVATE (*iter))
+	continue;
+	  else if (copy_fn_p (*iter) || move_fn_p (*iter))
+	/* Ideally, we wouldn't count any constructor that takes
+	   an argument of the class type as a parameter, because
+	   such things cannot be used to construct an instance of
+	   the class unless you already have one.  */
+	copy_or_move = *iter;
+	  else
 	nonprivate_ctor = true;
 
   if (!nonprivate_ctor)
@@ -2274,6 +2276,10 @@ maybe_warn_about_overly_private_class (t
 	  warning (OPT_Wctor_dtor_privacy,
 		   "%q#T only defines private constructors and has no friends",
 		   t);
+	  if (copy_or_move)
+	inform (DECL_SOURCE_LOCATION (copy_or_move),
+		"%q#D is public, but requires an existing %q#T object",
+		copy_or_move, t);
 	  return;
 	}
 }
Index: testsuite/g++.dg/warn/ctor-dtor-privacy-3.C
===
--- testsuite/g++.dg/warn/ctor-dtor-privacy-3.C	(revision 0)
+++ testsuite/g++.dg/warn/ctor-dtor-privacy-3.C	(working copy)
@@ -0,0 +1,20 @@
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-Wctor-dtor-privacy" } 
+
+class X // { dg-message "only defines private" }
+{
+public:
+  X (X const &); // { dg-message "requires an existing" }
+};
+
+class Y // { dg-message "only defines private" }
+{
+public:
+  Y (Y &&);  // { dg-message "requires an existing" }
+};
+
+class Z
+{
+public:
+  Z (int);
+};


Re: PING: [PATCH v2 0/2] [testsuite, libgcc] PR80759 Fix FAILs on Solaris and Darwin

2017-07-17 Thread Mike Stump
On Jul 17, 2017, at 9:16 AM, Daniel Santos  wrote:
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00025.html

> Mike or Iain,
> Can one of you review changes for Darwin please?

Ok.


Re: [PATCH][RFA/RFC] Stack clash mitigation patch 03/08

2017-07-17 Thread Segher Boessenkool
Hi,

Just some typos:

On Tue, Jul 11, 2017 at 03:20:38PM -0600, Jeff Law wrote:
> +/* If debugging dumps are requested, dump infomation about how the

Typo ("information").

> +   target handled -fstack-check=clash for the prologue.
> +
> +   PROBES describes what if any probes were emitted.
> +
> +   RESIDUALS indicates if the prologue had any residual allocation
> +   (ie total allocation was not a multiple of PROBE_INTERVAL.  */

Missing closing paren; i.e.


Segher


Re: [PATCH, GCC/ARM, ping] Remove ARMv8-M code for D17-D31

2017-07-17 Thread Thomas Preudhomme
My bad, I found an off-by-one error in the sizing of the bitmaps. Please find 
the fixed patch attached.


ChangeLog entry is unchanged:

*** gcc/ChangeLog ***

2017-06-13  Thomas Preud'homme  

* config/arm/arm.c (arm_option_override): Forbid ARMv8-M Security
Extensions with more than 16 double VFP registers.
(cmse_nonsecure_entry_clear_before_return): Remove second entry of
to_clear_mask and all code related to it.  Replace the remaining
entry by a sbitmap and adapt code accordingly.

Best regards,

Thomas

On 17/07/17 09:52, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 12/07/17 09:59, Thomas Preudhomme wrote:

Hi Richard,

On 07/07/17 15:19, Richard Earnshaw (lists) wrote:


Hmm, I think that's because really this is a partial conversion.  It
looks like doing this properly would involve moving that existing code
to use sbitmaps as well.  I think doing that would be better for
long-term maintenance perspectives, but I'm not going to insist that you
do it now.


There's also the assert later, but I've found a way to improve it slightly. 
While switching to auto_sbitmap I also changed the code slightly to allocate 
bitmaps directly at the right size. Since the change is probably bigger than 
what you had in mind, I'd appreciate it if you could give me an OK again. See 
the updated patch in attachment. The ChangeLog entry is unchanged:


2017-06-13  Thomas Preud'homme  

 * config/arm/arm.c (arm_option_override): Forbid ARMv8-M Security
 Extensions with more than 16 double VFP registers.
 (cmse_nonsecure_entry_clear_before_return): Remove second entry of
 to_clear_mask and all code related to it.  Replace the remaining
 entry by a sbitmap and adapt code accordingly.



As a result I'll let you take the call as to whether you keep this
version or go back to your earlier patch.  If you do decide to keep this
version, then see the comment below.


Given the changes I'm happier with how the patch looks now, and getting it in 
can be a nice incentive to change other ARMv8-M Security Extension related 
code later on.


Best regards,

Thomas
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 259597d8890ee84c5bd92b12b6f9f6521c8dcd2e..b79f0e701713d1958da3874da3a34f9c8f78bb5c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3620,6 +3620,11 @@ arm_option_override (void)
   if (use_cmse && !arm_arch_cmse)
 error ("target CPU does not support ARMv8-M Security Extensions");
 
+  /* We don't clear D16-D31 VFP registers for cmse_nonsecure_call functions
+ and ARMv8-M Baseline and Mainline do not allow such configuration.  */
+  if (use_cmse && LAST_VFP_REGNUM > LAST_LO_VFP_REGNUM)
+error ("ARMv8-M Security Extensions incompatible with selected FPU");
+
   /* Disable scheduling fusion by default if it's not armv7 processor
  or doesn't prefer ldrd/strd.  */
   if (flag_schedule_fusion == 2
@@ -24996,42 +25001,39 @@ thumb1_expand_prologue (void)
 void
 cmse_nonsecure_entry_clear_before_return (void)
 {
-  uint64_t to_clear_mask[2];
+  bool clear_vfpregs = TARGET_HARD_FLOAT && !TARGET_THUMB1;
+  int regno, maxregno = clear_vfpregs ? LAST_VFP_REGNUM : IP_REGNUM;
   uint32_t padding_bits_to_clear = 0;
   uint32_t * padding_bits_to_clear_ptr = &padding_bits_to_clear;
-  int regno, maxregno = IP_REGNUM;
+  auto_sbitmap to_clear_bitmap (LAST_VFP_REGNUM + 1);
   tree result_type;
   rtx result_rtl;
 
-  to_clear_mask[0] = (1ULL << (NUM_ARG_REGS)) - 1;
-  to_clear_mask[0] |= (1ULL << IP_REGNUM);
+  bitmap_clear (to_clear_bitmap);
+  bitmap_set_range (to_clear_bitmap, R0_REGNUM, NUM_ARG_REGS);
+  bitmap_set_bit (to_clear_bitmap, IP_REGNUM);
 
   /* If we are not dealing with -mfloat-abi=soft we will need to clear VFP
  registers.  We also check that TARGET_HARD_FLOAT and !TARGET_THUMB1 hold
  to make sure the instructions used to clear them are present.  */
-  if (TARGET_HARD_FLOAT && !TARGET_THUMB1)
+  if (clear_vfpregs)
 {
-  uint64_t float_mask = (1ULL << (D7_VFP_REGNUM + 1)) - 1;
-  maxregno = LAST_VFP_REGNUM;
-
-  float_mask &= ~((1ULL << FIRST_VFP_REGNUM) - 1);
-  to_clear_mask[0] |= float_mask;
+  int float_bits = D7_VFP_REGNUM - FIRST_VFP_REGNUM + 1;
 
-  float_mask = (1ULL << (maxregno - 63)) - 1;
-  to_clear_mask[1] = float_mask;
+  bitmap_set_range (to_clear_bitmap, FIRST_VFP_REGNUM, float_bits);
 
   /* Make sure we don't clear the two scratch registers used to clear the
 	 relevant FPSCR bits in output_return_instruction.  */
   emit_use (gen_rtx_REG (SImode, IP_REGNUM));
-  to_clear_mask[0] &= ~(1ULL << IP_REGNUM);
+  bitmap_clear_bit (to_clear_bitmap, IP_REGNUM);
   emit_use (gen_rtx_REG (SImode, 4));
-  to_clear_mask[0] &= ~(1ULL << 4);
+  bitmap_clear_bit (to_clear_bitmap, 4);
 }
 
   /* If the user has defined registers to be caller saved, these are no longer
  restored by the function before returning and must thus be cleared for
  

PING: [PATCH v2 0/2] [testsuite, libgcc] PR80759 Fix FAILs on Solaris and Darwin

2017-07-17 Thread Daniel Santos

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00025.html

Uros,
Can you review changes for i386 please?

Mike or Iain,
Can one of you review changes for Darwin please?  I'm not familiar with 
the platform, although Rainer tested on Darwin for me.


Ian,
Can you review changes to libgcc please?

Thank you all!
Daniel


On 07/02/2017 12:11 AM, Daniel Santos wrote:
This patchset addresses a number of testsuite issues for 
gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp, mostly occurring on Solaris 
and Darwin.  Additionally, it solves a bug in libgcc that caused link 
failures on Darwin when building with -mcall-ms2sysv-xlogues.  The 
issues are detailed in the notes for each patch.


I would particularly appreciate any feedback for Darwin as I am 
unfamiliar with the platform and Rainer and I have fashioned some of 
these changes by looking at other Darwin code in gcc.


 .../gcc.target/x86_64/abi/ms-sysv/do-test.S  | 200 
---

 .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.c  |  83 +++-
 .../gcc.target/x86_64/abi/ms-sysv/ms-sysv.exp| 153 +-
 libgcc/config.host   |   6 +-
 libgcc/config/i386/i386-asm.h|  89 +
 libgcc/config/i386/resms64.S |   2 +-
 libgcc/config/i386/resms64f.S|   2 +-
 libgcc/config/i386/resms64fx.S   |   2 +-
 libgcc/config/i386/resms64x.S|   2 +-
 libgcc/config/i386/savms64.S |   2 +-
 libgcc/config/i386/savms64f.S|   2 +-
 11 files changed, 274 insertions(+), 269 deletions(-)


Many thanks to Rainer for all of his help on this!

Thanks,
Daniel




[C++ PATCH] combine move assign/ctor check

2017-07-17 Thread Nathan Sidwell
We currently have separate 'has-move-assign' and 'has-move-ctor' 
functions, but they are always called together, so let's just have a 
single 'has-move-assign-or-move-ctor' function.


Applied to trunk.

nathan
--
Nathan Sidwell
2017-07-17  Nathan Sidwell  

	* class.c (type_has_user_declared_move_constructor,
	type_has_user_declared_move_assign): Combine into ...
	(classtype_has_user_move_assign_or_move_ctor_p): ... this new function.
	* cp-tree.h (type_has_user_declared_move_constructor,
	type_has_user_declared_move_assign): Combine into ...
	(classtype_has_user_move_assign_or_move_ctor_p): ... this. Declare.
	* method.c (maybe_explain_implicit_delete): Use it.
	(lazily_declare_fn): Use it.
	* tree.c (type_has_nontrivial_copy_init): Use it.

Index: cp/class.c
===
--- cp/class.c	(revision 250272)
+++ cp/class.c	(working copy)
@@ -5491,48 +5491,30 @@ type_has_move_assign (tree t)
   return false;
 }
 
-/* Returns true iff class T has a move constructor that was explicitly
-   declared in the class body.  Note that this is different from
-   "user-provided", which doesn't include functions that are defaulted in
-   the class.  */
+/* Returns true iff T, a class, has a user-declared move-assignment or
+   move-constructor.  Note that this is different from
+   "user-provided", which doesn't include functions that are defaulted
+   in the class.  */
 
 bool
-type_has_user_declared_move_constructor (tree t)
+classtype_has_user_move_assign_or_move_ctor_p (tree t)
 {
-  if (CLASSTYPE_LAZY_MOVE_CTOR (t))
-return false;
-
   if (!CLASSTYPE_METHOD_VEC (t))
 return false;
 
-  for (ovl_iterator iter (CLASSTYPE_CONSTRUCTORS (t)); iter; ++iter)
-{
-  tree fn = *iter;
-  if (move_fn_p (fn) && !DECL_ARTIFICIAL (fn))
+  if (!CLASSTYPE_LAZY_MOVE_CTOR (t))
+for (ovl_iterator iter (lookup_fnfields_slot_nolazy (t, ctor_identifier));
+	 iter; ++iter)
+  if (!DECL_ARTIFICIAL (*iter) && move_fn_p (*iter))
 	return true;
-}
-
-  return false;
-}
-
-/* Returns true iff class T has a move assignment operator that was
-   explicitly declared in the class body.  */
 
-bool
-type_has_user_declared_move_assign (tree t)
-{
-  if (CLASSTYPE_LAZY_MOVE_ASSIGN (t))
-return false;
-
-  for (ovl_iterator iter (lookup_fnfields_slot_nolazy
-			  (t, cp_assignment_operator_id (NOP_EXPR)));
-   iter; ++iter)
-{
-  tree fn = *iter;
-  if (move_fn_p (fn) && !DECL_ARTIFICIAL (fn))
+  if (!CLASSTYPE_LAZY_MOVE_ASSIGN (t))
+for (ovl_iterator iter (lookup_fnfields_slot_nolazy
+			(t, cp_assignment_operator_id (NOP_EXPR)));
+	 iter; ++iter)
+  if (!DECL_ARTIFICIAL (*iter) && move_fn_p (*iter))
 	return true;
-}
-
+  
   return false;
 }
 
Index: cp/cp-tree.h
===
--- cp/cp-tree.h	(revision 250272)
+++ cp/cp-tree.h	(working copy)
@@ -6025,8 +6025,7 @@ extern bool type_has_constexpr_default_c
 extern bool type_has_virtual_destructor		(tree);
 extern bool type_has_move_constructor		(tree);
 extern bool type_has_move_assign		(tree);
-extern bool type_has_user_declared_move_constructor (tree);
-extern bool type_has_user_declared_move_assign(tree);
+extern bool classtype_has_user_move_assign_or_move_ctor_p (tree);
 extern bool type_build_ctor_call		(tree);
 extern bool type_build_dtor_call		(tree);
 extern void explain_non_literal_class		(tree);
Index: cp/method.c
===
--- cp/method.c	(revision 250272)
+++ cp/method.c	(working copy)
@@ -1808,10 +1808,8 @@ maybe_explain_implicit_delete (tree decl
 	informed = false;
 	}
   else if (DECL_ARTIFICIAL (decl)
-	   && (sfk == sfk_copy_assignment
-		   || sfk == sfk_copy_constructor)
-	   && (type_has_user_declared_move_constructor (ctype)
-		   || type_has_user_declared_move_assign (ctype)))
+	   && (sfk == sfk_copy_assignment || sfk == sfk_copy_constructor)
+	   && classtype_has_user_move_assign_or_move_ctor_p (ctype))
 	{
 	  inform (DECL_SOURCE_LOCATION (decl),
 		  "%q#D is implicitly declared as deleted because %qT "
@@ -2372,10 +2370,8 @@ lazily_declare_fn (special_function_kind
   /* [class.copy]/8 If the class definition declares a move constructor or
  move assignment operator, the implicitly declared copy constructor is
  defined as deleted */
-  if ((sfk == sfk_copy_assignment
-   || sfk == sfk_copy_constructor)
-  && (type_has_user_declared_move_constructor (type)
-	  || type_has_user_declared_move_assign (type)))
+  if ((sfk == sfk_copy_assignment || sfk == sfk_copy_constructor)
+  && classtype_has_user_move_assign_or_move_ctor_p (type))
 DECL_DELETED_FN (fn) = true;
 
   /* A destructor may be virtual.  */
Index: cp/tree.c
===
--- cp/tree.c	(revision 250272)
+++ cp/tree.c	(working copy)
@@ -3976,8 +3976,7 @@ type_has_nontrivial_copy_

Re: [PATCH][PING] Improve extraction of changed file in contrib/mklog

2017-07-17 Thread Mike Stump
On Jul 17, 2017, at 12:22 AM, Yuri Gribov  wrote:
> 
> Currently mklog will fail to analyze lines like this in patches:
> diff -rupN gcc/gcc/testsuite/lib/profopt.exp
> gcc-compare-checks/gcc/testsuite/lib/profopt.exp
> (it fails with "Error: failed to parse diff for ... and ...").
> 
> This patch fixes it. Ok for trunk?

Ok.



Re: [PATCH][RFA/RFC] Stack clash mitigation patch 07/08

2017-07-17 Thread Jeff Law
On 07/16/2017 12:36 PM, Trevor Saunders wrote:
>>> On the other hand if probing is fast enough that it can be on by default
>>> in gcc there should be much less of it.  Even if you did change the ABI
>>> to require probing it seems unlikely that code violating that
>>> requirement would hit problems other than this security concern, so I'd
>>> expect there will be some non compliant asm out there.
>> Certainly my goal is to enable it by default one day.  Even if that's
>> ultimately done at the distro level or by a configure time switch.
>>
>> Certainly if/when we reach that point the amount of unprotected code
>> will drop dramatically, but based on my experience I fully expect some
>> folks to turn it off.  They'll say something like "hey, we've audited
>> our code and we don't have any large stack or allocas, so I'm turning
>> this thing off to get some un-measurable performance gain."  It'd be an
>> uber-dumb thing to do, but it's going to happen.
> 
> I agree with that, though we could consider not having an option for it.
> Of course people can compile their own compiler, but I think it's pretty
> clear that if you do that, the security problem is your fault too.
I suspect not having an option is a non-starter.  But maybe I'm wrong on
that one.

>> Tweaking the ABI to mandate touching *sp in the outgoing args area &
>> alloca space is better because we likely wouldn't have an option to
>> avoid that access.   So a well meaning, but clueless, developer couldn't
>> turn the option off like they could stack checking.
> 
> However see the comment about assembly code, I can easily see someone
> forgetting to touch *sp in hand-written assembly.  Obviously much less of
> that, but it kind of sounds like you want perfect here.
Perfect would be good :-)

The odds of hand-written assembly code having a large enough outgoing
args size or dynamic frame to require touching are exceedingly small.

>>> It seems to me pretty important to ask how many programs out there have
>>> a caller that can push the stack into the guard page, but not past it.
>> I've actually spent a lot of time thinking about that precise problem. You
>> don't need large frames to do that -- you just need controlled heap
>> leaks and/or controlled recursion.  ie, even if a function has no
>> allocas and a small frame, it can put the stack pointer into the guard.
> 
> There may not be room for it on 32 bit platforms, but I think another
> thing we learned here is that its a mistake to allow the heap to grow
> into space the stack might use.  That would require the use of recursion
> here.
The heap grows in response to explicit requests and appropriate checks
can be made to prevent jumping the guard due to heap growth.  It's the
implicit mechanisms for stack growth that cause headaches.



>>   I think the largest
>>> buffer Qualys found was less than 400k? So 1 256k guard page should
>>> protect 95% of functions, and 1m or 2m  seems like enough to protect
>>> against all non malicious programs.  I'm not sure, but this is a 64 bit
>>> arch, so it seems like we should have the adress space for large guard
>>> pages like that.
>> I'm all for larger guards, particularly on 64 bit architectures.
>>
>> We use 64k pages on aarch64 for RHEL which implicitly gives us a minimum
>> guard of 64k.  That would be a huge factor in any analysis I would do if
>> the aarch64 maintainers choose not to fully protect their architecture
>> and I was forced to make a recommendation for Red Hat and its customers.
>>  I hope I don't have to sit down and do the analysis on this and make
>> any kind of recommendation.
> 
> It's certainly easier to say it's not the compiler's job to fully protect
> when you don't have to make that recommendation ;)
Lots of factors come into play.  What I don't want to do is put Red Hat
in a position where some customer gets hacked because we left open a
known hole that we could have reasonably closed.


> 
>> The fact that Qualys found nothing larger than X (for any X) in the code
>> they scanned isn't relevant.  There could well be code out there they
>> did not look at that uses > X or code that is yet to be written that
>> uses > X.
> 
> If you want to protect all code someone could write I agree.  I've been
> thinking more about protecting most reasonable code and saying if you
> allocate a 50mb buffer on the stack that's your bug, and we don't need
> to make that safe.
Depends on your point of view.  I can't reasonably take that position
with my Red Hat hat on.  In that role, I would say that any architecture
for Red Hat Enterprise Linux needs to have a minimum level of protection
against stack clash style attacks and that would include being safe
against someone allocating very large structures on the stack. Such code
may be dumb, such code may be inefficient and not terribly portable.
But in the environments where RHEL is deployed, we can't just dismiss it
out-of-hand.

That level of protection does not necessarily extend to an unbound
alloca/vl

Re: [PATCH][RFA/RFC] Stack clash mitigation patch 07/08

2017-07-17 Thread Jakub Jelinek
On Mon, Jul 17, 2017 at 09:24:27AM -0600, Jeff Law wrote:
> >> While I'm not really comfortable with the 2k/2k split in general, I
> >> think I can support it from a Red Hat point of view -- largely because
> >> we use 64k pages in RHEL.  So our guard is actually 64k.  Having a
> >> hostile call chain push 2k into the guard isn't so bad in that case.
> > 
> > A minimum guard size of 64KB seems reasonable even on systems with
> > 4KB pages. However whatever the chosen guard size, you cannot defend
> > against hostile code. An OS can of course increase the guard size well 
> > beyond the minimum required, but that's simply reducing the probability -
> > it is never going to block a big unchecked alloca.
> That's a kernel issue and I'm not in a position to change that.  On
> systems with a 64bit address space, I'm keen to see the kernel team
> increase the guard size, but it's not something I can unilaterally make
> happen.

That is actually not true.  It is a kernel issue only for the initial
thread's stack.  For pthread_create created threads it is a matter of how
big the default guard size is (these days glibc uses a single page by
default) and how many apps override that guardsize to something different
(smaller, including 0, or larger).  And, while it might be acceptable to
have a larger guard size for the initial thread, if you have hundreds of
thousands of threads, every extra KB in the guard areas will count
significantly.

Jakub


Re: [PATCH][RFA/RFC] Stack clash mitigation patch 07/08

2017-07-17 Thread Jeff Law
On 07/17/2017 05:27 AM, Wilco Dijkstra wrote:
> Jeff Law wrote:
> 
>> So would a half-half (2k caller/2k callee) split like Florian has
>> proposed work for you?  ie, we simply declare a caller that pushes the
>> stack pointer 2k or more into the guard as broken?
> 
> My results show using a 4KB guard size is not ideal. 2KB of outgoing
> args is too large (it means 264 arguments) so it's a waste of the limited
> range.
Ideal depends on your point of view, obviously.  Can you interface
directly with Richard and determine what split you want?  Having me as
an intermediary just slows things down :-)



> 
>> While I'm not really comfortable with the 2k/2k split in general, I
>> think I can support it from a Red Hat point of view -- largely because
>> we use 64k pages in RHEL.  So our guard is actually 64k.  Having a
>> hostile call chain push 2k into the guard isn't so bad in that case.
> 
> A minimum guard size of 64KB seems reasonable even on systems with
> 4KB pages. However whatever the chosen guard size, you cannot defend
> against hostile code. An OS can of course increase the guard size well 
> beyond the minimum required, but that's simply reducing the probability -
> it is never going to block a big unchecked alloca.
That's a kernel issue and I'm not in a position to change that.  On
systems with a 64bit address space, I'm keen to see the kernel team
increase the guard size, but it's not something I can unilaterally make
happen.




> 
>> In fact, with a 64k guard, the dynamic area dwarfs the outgoing args as
>> the area of most concern.  And with that size guard we're far more
>> likely to see an attempted jump with an unconstrained alloca rather than
>> with a fairly large alloca.
> 
> Outgoing args are not an area for concern even with a 4KB stack guard.
They are a concern and as the size of the guard comes down, they become
an increasing concern, IMHO.

> 
> Unconstrained alloca's are indeed the most risky - if a program has a single
> alloca that can be controlled then it is 100% unsafe no matter how many
> million probes you do elsewhere.
Agreed.

> 
>> I believe Richard Earnshaw indicated that 4k pages were in use on some
>> aarch64 systems, so I didn't try to use a larger probe interval.  Though
>> I certainly considered it.  I think that's a decision that belongs in
>> the hands of the target maintainers.  Though I think it's best if you're
>> going to use a larger probe interval to mandate a suitable page size in
>> the ABI.
> 
> The probe interval doesn't have to be the same as the (minimum) page size.
Correct.  It has to be smaller than the guard.  So what guard does the
kernel guarantee?  I'm going to set things up assuming a single guard
page, y'all can modify as needed.

> 
>> Some (simpler) tracking is still needed because allocations happen in
>> potentially 3 places for aarch64.  There's almost certainly cases where
>> none of them individually are larger than PROBE_INTERVAL, but as a group
>> they could be.
> 
> In 99% of the frames only one stack allocation is made. There are a few
> cases where the stack can be adjusted twice.
But we still have to deal with the cases where there are multiple
adjustments.  Punting in the case of multiple adjustments isn't the
right thing to do.  Some level of tracking is needed.

> 
>> So how about this.
>>
>> First we have to disallow any individual allocation from allocating more
>> than PROBE_INTERVAL bytes.
>>
>> If the total frame size is less than X (where we're still discussing X),
>> we emit no probes.
> 
> ... and the outgoing args are smaller than Y.
> 
>> If the total frame size is greater than X, then after the first
>> allocation we probe the highest address within in the just allocated
>> area and we probe every PROBE_INTERVAL bytes thereafter as space is
>> allocated.
> 
> To be safe I think we first need to probe and then allocate. Or are there 
> going
> to be extra checks in asynchronous interrupt handlers that check whether SP is
> above the stack guard?
That hits the red zone, which is unfortunate.  But it's acceptable IMHO.


Jeff


Re: [PATCH] Increase expect match buffer size

2017-07-17 Thread Richard Biener
On Mon, 17 Jul 2017, Bernd Edlinger wrote:

> Hi,
> 
> 
> this is to mitigate a race condition in expect 5.45-7
> (which will be fixed in 5.45-8) see this thread for details:
> 
> https://gcc.gnu.org/ml/gcc/2017-07/msg00081.html
> 
> By increasing the match buffer size from 2000 to 10000
> bytes we can avoid the fall-out due to the bug in expect,
> while we have to wait for the next stable debian version.
> 
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?

Ok.

Richard.



Fix inconsistent section flags

2017-07-17 Thread Joerg Sonnenberger
Hello,
attached patch fixes inconsistent handling of section flags when using
the section attribute, i.e.:

__attribute__((section("writable1"))) int foo1;
__attribute__((section("readonly1"))) const int foo1c;
__attribute__((section("writable2"))) int foo2 = 42;
__attribute__((section("readonly2"))) const int foo2c = 42;

should give section attributes of "aw", "a", "aw", "a" in that order.
Currently, "foo1c" is classified as BSS though and therefore put into a
writable section.

Joerg
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 45611a9a858..eaf78b28bc1 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -969,11 +969,17 @@ decode_reg_name (const char *name)
 }
 
 
-/* Return true if DECL's initializer is suitable for a BSS section.  */
+/*
+ * Return true if DECL's initializer is suitable for a BSS section.
+ * If there is an explicit section name attribute, assume that it is not
+ * for a BSS section, independent of the name.
+ */
 
 bool
 bss_initializer_p (const_tree decl)
 {
+  if (DECL_SECTION_NAME (decl) != NULL)
+return false;
   return (DECL_INITIAL (decl) == NULL
  /* In LTO we have no errors in program; error_mark_node is used
 to mark offlined constructors.  */
@@ -6460,7 +6466,7 @@ categorize_decl_for_section (const_tree decl, int reloc)
ret = SECCAT_BSS;
   else if (! TREE_READONLY (decl)
   || TREE_SIDE_EFFECTS (decl)
-  || ! TREE_CONSTANT (DECL_INITIAL (decl)))
+  || (DECL_INITIAL(decl) != NULL && ! TREE_CONSTANT (DECL_INITIAL (decl))))
{
  /* Here the reloc_rw_mask is not testing whether the section should
 be read-only or not, but whether the dynamic link will have to
@@ -6504,6 +6510,7 @@ categorize_decl_for_section (const_tree decl, int reloc)
 no concept of a read-only thread-local-data section.  */
   if (ret == SECCAT_BSS
   || (flag_zero_initialized_in_bss
+  && DECL_INITIAL(decl) != NULL
   && initializer_zerop (DECL_INITIAL (decl))))
ret = SECCAT_TBSS;
   else


[wwwdocs] Document SPARC changes in upcoming 7.2 release

2017-07-17 Thread Eric Botcazou
Applied to the wwwdocs module.

-- 
Eric Botcazou

Index: gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.87
diff -u -r1.87 changes.html
--- gcc-7/changes.html	15 Jul 2017 16:51:18 -	1.87
+++ gcc-7/changes.html	17 Jul 2017 13:23:46 -
@@ -1231,7 +1231,6 @@
 are not listed here).
 
 
-

[PATCH] Increase expect match buffer size

2017-07-17 Thread Bernd Edlinger
Hi,


this is to mitigate a race condition in expect 5.45-7
(which will be fixed in 5.45-8) see this thread for details:

https://gcc.gnu.org/ml/gcc/2017-07/msg00081.html

By increasing the match buffer size from 2000 to 10000
bytes we can avoid the fall-out due to the bug in expect,
while we have to wait for the next stable debian version.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.
2017-07-18  Bernd Edlinger  

* lib/gcc-dg.exp: Increase expect's match buffer size.

Index: gcc/testsuite/lib/gcc-dg.exp
===
--- gcc/testsuite/lib/gcc-dg.exp	(revision 250150)
+++ gcc/testsuite/lib/gcc-dg.exp	(working copy)
@@ -43,6 +43,9 @@
   setenv LANG C.ASCII
 }
 
+# Avoid sporadic data-losses with expect
+match_max -d 10000
+
 # Ensure GCC_COLORS is unset, for the rare testcases that verify
 # how output is colorized.
 if [info exists ::env(GCC_COLORS) ] {


Re: [PATCH] Move static chain and non-local goto init after NOTE_INSN_FUNCTION_BEG (PR sanitize/81186).

2017-07-17 Thread Michael Matz
Hello,

On Mon, 17 Jul 2017, Martin Liška wrote:

> which does all the stack preparation (including the problematic call to 
> __asan_stack_malloc_N).
> 
> Note that this code still should be placed before parm_birth_note as we 
> can't say that params are ready before a fake stack is prepared.

Yes, understood.

> Then we generate code that loads the implicit chain argument:
> 
> (gdb) p debug_rtx_list(get_insns(), 100)
> (note 1 0 37 NOTE_INSN_DELETED)
> 
> (note 37 1 38 NOTE_INSN_FUNCTION_BEG)
> 
> (insn 38 37 39 (set (reg/f:DI 94 [ CHAIN.1 ])
> (reg:DI 39 r10 [ CHAIN.1 ])) 
> "/home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/asan/pr81186.c":9 -1
>  (nil))
> 
> (insn 39 38 0 (set (mem/c:DI (plus:DI (reg/f:DI 82 virtual-stack-vars)
> (const_int -584 [0xfffffffffffffdb8])) [0  S8 A64])
> (reg:DI 39 r10 [ CHAIN.1 ])) 
> "/home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/asan/pr81186.c":9 -1
>  (nil))
> 
> Which is problematic as using virtual-stack-vars which should point to 
> fake stack done by AddressSanitizer in __asan_stack_malloc_N.

If anything, then only the stack access is problematic, i.e. the last 
instruction.  I don't understand why that should be problematic, though.  
Probably because I don't know much about the ASAN implementation.  But why 
should there be something magic about using the non-asan stack?  Most 
local variable accesses are rewritten to be in terms of the fake stack, 
but those that aren't could use the normal stack just fine, can't they?

If that really is a problem then that could also be rectified by splitting 
the static_chain_decl in expand_function_start a bit, ala this:

  if (cfun->static_chain_decl) {
all code except the last "if (!optimize) store-into-stack"
  }
  emit_note; parm_birth_insn = ...
  if (cfun->static_chain_decl && !optimize) {
store into assign_stack_local
  }

(requires moving some local variable to an outer scope, but hey).

But what you say above mystifies me.  You claim that access via 
virtual-stack-vars is problematic before the shadow stack is created by 
ASAN.  But the whole parameter setup always uses such local stack storage 
whenever it needs.  And those definitely happen before the ASAN setup.  
See the subroutines of assign_parms, (e.g. assign_parm_setup_block and 
assign_parm_setup_stack).  You might need to use special function argument 
types or special ABIs to trigger this, though you should be able to find 
some cases to trigger also on i386 or x86_64.

So, if the stack access for the static chain is problematic I don't see 
why the stack accesses for the parameters are not.  And if they indeed are 
problematic, then something is confused within ASAN, and the fix for that 
confusion is not to move parm_birth_insn, but something else (I can't say 
what, as I don't know much about how ASAN is supposed to work in such 
situations).


Ciao,
Michael.

Re: [patch] Fix ICE on CONSTRUCTOR containing absolute addresses

2017-07-17 Thread Richard Biener
On Mon, Jul 17, 2017 at 12:48 PM, Eric Botcazou  wrote:
>> So this isn't about global
>>
>>  void *x[] = { &((struct Y *)0x12)->foo }
>>
>> but for a local one where supposedly variable indexing is valid (don't
>> we gimplify that)?
>
> Yes, it's local (it's OK if global because we do a full RTL expansion).
> Everything is valid, constant and passes initializer_constant_valid_p.
>
>> And
>>
>> +case INDIRECT_REF:
>> +  /* This deals with absolute addresses.  */
>> +  offset += tree_to_shwi (TREE_OPERAND (target, 0));
>> +  x = gen_rtx_MEM (QImode,
>> +  gen_rtx_SYMBOL_REF (Pmode, "origin of addresses"));
>>
>> simply translates 0x12 to &* + 0x12 (where origin == 0 somehow?).
>>
>> I suppose returing directly here and sth like
>>
>> value->base = gen_rtx_SYMBOL_REF (Pmode, "origin of addresses");
>> value->offset = offset + tree_to_shwi (...);
>> return;
>>
>> would be clearer.
>
> That's a matter of consistency, the LABEL_DECL case does something equivalent:
>
>   x = gen_rtx_MEM (FUNCTION_MODE,
>gen_rtx_LABEL_REF (Pmode, force_label_rtx (target)));

Ah, I see.

>> Or even
>>
>> value->base = tree-to-rtx (TREE_OPERAND (target, 0));
>> value->offset = offset;
>
> The callers expect the base to be SYMBOL_REF or LABEL_REF though.

Ok.  I suppose I'm mostly concerned about that magic "origin of addresses".

Patch is ok in its original form.  As said, handling MEM_REF would be nice,
FEs start to generate that (well C++ does).

Thanks,
Richard.

> --
> Eric Botcazou


Re: [PATCH] Fix pr80044, -static and -pie insanity, and pr81170

2017-07-17 Thread H.J. Lu
On Mon, Jul 17, 2017 at 5:33 AM, Alan Modra  wrote:
> On Sat, Jul 15, 2017 at 06:32:40AM -0700, H.J. Lu wrote:
>> On Thu, Jun 22, 2017 at 8:28 AM, Alan Modra  wrote:
>> > PR80044 notes that -static and -pie together behave differently when
>> > gcc is configured with --enable-default-pie as compared to configuring
>> > without (or --disable-default-pie).  This patch removes that
>> > difference.  In both cases you now will have -static completely
>> > overriding -pie.
>> >
>> > Fixing this wasn't quite as simple as you'd expect, due to poor
>> > separation of functionality.  PIE_SPEC didn't just mean that -pie was
>> > on explicitly or by default, but also -r and -shared were *not* on.
>> > Fortunately the three files touched by this patch are the only places
>> > PIE_SPEC and NO_PIE_SPEC are used, so it isn't too hard to see that
>> > the reason PIE_SPEC and NO_PIE_SPEC are not inverses is the use of
>> > PIE_SPEC in LINK_PIE_SPEC.  So, move the inelegant symmetry breaking
>> > addition, to LINK_PIE_SPEC where it belongs.  Doing that showed
>> > another problem in gnu-user.h, with PIE_SPEC and NO_PIE_SPEC selection
>> > of crtbegin*.o not properly hooked into a chain of if .. elseif ..
>> > conditions, which required both PIE_SPEC and NO_PIE_SPEC to exclude
>> > -static and -shared.  Fixing that particular problem finally allows
>> > PIE_SPEC to serve just one purpose, and NO_PIE_SPEC to disappear.
>> >
>> > Bootstrapped and regression tested powerpc64le-linux c,c++.  No
>> > regressions and a bunch of --enable-default-pie failures squashed.
>> > OK mainline and active branches?
>> >
>> > Incidentally, there is a fairly strong case to be made for adding
>> > -static to the -shared, -pie, -no-pie chain of RejectNegative's in
>> > common.opt.  Since git 0d6378a9e (svn r48039) 2001-11-15, -static has
>> > done more than just the traditional "prevent linking with dynamic
>> > libraries", as -static selects crtbeginT.o rather than crtbegin.o
>> > on GNU systems.  Realizing this is what led me to close pr80044, which
>> > I'd opened with the aim of making -pie -static work together (with the
>> > traditional meaning of -static).  I don't think that is worth doing, but
>> > mention pr80044 in the changelog due to fixing the insane output
>> > produced by -pie -static with --disable-default-pie.
>> >
>>
>> On x86-64, without --enable-default-pie, "-static -pie" and "-pie -static"
>> never worked since both -static and -pie are passed to linker, which
>> uses libc.a to build PIE.
>
> Yes, it's broken.

This behavior may be useful for static PIE when libc.a is compiled with
-fPIE.

>>  With --enable-default-pie, -static and -pie
>> override each other.
>
> No they don't.  -static overrides -pie.
>
>>  What does your patch do on x86-64?  Make
>> with and without --enable-default-pie behave the same?
>
> Yes, as I said in my original post first paragraph.
>
>>  Does it
>> mean that both fail to create executable?
>
> I try to leave that sort of patch to those better qualified.
> Bootstrap and regression testing on x86_64-linux both
> --enable-default-pie and --disable-default-pie was complete June 23.
>

What is the new behavior?  The old  --disable-default-pie or old
--enable-default-pie?  Will static PIE be supported if libc is
compiled with -fPIE by default?

-- 
H.J.


RE: [PATCH 4/7] [ARC] [LRA] Avoid emitting COND_EXEC during expand.

2017-07-17 Thread Claudiu Zissulescu
> This seems fine, your description "does not always work." is a bit
> of a tease :) it would be nice to know _why_ it doesn't work, or at
> least a description of what problem you're seeing.

Committed with additional comments. Thank you, 
Claudiu


RE: [PATCH 6/7] [ARC] Deprecate mexpand-adddi option.

2017-07-17 Thread Claudiu Zissulescu
> This looks fine, though the commit message tells me it's not a good
> idea, but it would be nice to know _why_ it's not good.  Might be nice
> to know for future reference.

Committed with additional comments. Thank you,
Claudiu


Re: [PATCH] Fix pr80044, -static and -pie insanity, and pr81170

2017-07-17 Thread Alan Modra
On Sat, Jul 15, 2017 at 06:32:40AM -0700, H.J. Lu wrote:
> On Thu, Jun 22, 2017 at 8:28 AM, Alan Modra  wrote:
> > PR80044 notes that -static and -pie together behave differently when
> > gcc is configured with --enable-default-pie as compared to configuring
> > without (or --disable-default-pie).  This patch removes that
> > difference.  In both cases you now will have -static completely
> > overriding -pie.
> >
> > Fixing this wasn't quite as simple as you'd expect, due to poor
> > separation of functionality.  PIE_SPEC didn't just mean that -pie was
> > on explicitly or by default, but also -r and -shared were *not* on.
> > Fortunately the three files touched by this patch are the only places
> > PIE_SPEC and NO_PIE_SPEC are used, so it isn't too hard to see that
> > the reason PIE_SPEC and NO_PIE_SPEC are not inverses is the use of
> > PIE_SPEC in LINK_PIE_SPEC.  So, move the inelegant symmetry breaking
> > addition, to LINK_PIE_SPEC where it belongs.  Doing that showed
> > another problem in gnu-user.h, with PIE_SPEC and NO_PIE_SPEC selection
> > of crtbegin*.o not properly hooked into a chain of if .. elseif ..
> > conditions, which required both PIE_SPEC and NO_PIE_SPEC to exclude
> > -static and -shared.  Fixing that particular problem finally allows
> > PIE_SPEC to serve just one purpose, and NO_PIE_SPEC to disappear.
> >
> > Bootstrapped and regression tested powerpc64le-linux c,c++.  No
> > regressions and a bunch of --enable-default-pie failures squashed.
> > OK mainline and active branches?
> >
> > Incidentally, there is a fairly strong case to be made for adding
> > -static to the -shared, -pie, -no-pie chain of RejectNegative's in
> > common.opt.  Since git 0d6378a9e (svn r48039) 2001-11-15, -static has
> > done more than just the traditional "prevent linking with dynamic
> > libraries", as -static selects crtbeginT.o rather than crtbegin.o
> > on GNU systems.  Realizing this is what led me to close pr80044, which
> > I'd opened with the aim of making -pie -static work together (with the
> > traditional meaning of -static).  I don't think that is worth doing, but
> > mention pr80044 in the changelog due to fixing the insane output
> > produced by -pie -static with --disable-default-pie.
> >
> 
> On x86-64, without --enable-default-pie, "-static -pie" and "-pie -static"
> never worked since both -static and -pie are passed to linker, which
> uses libc.a to build PIE.

Yes, it's broken.

>  With --enable-default-pie, -static and -pie
> override each other.

No they don't.  -static overrides -pie.

>  What does your patch do on x86-64?  Make
> with and without --enable-default-pie behave the same?

Yes, as I said in my original post first paragraph.

>  Does it
> mean that both fail to create executable?

I try to leave that sort of patch to those better qualified.
Bootstrap and regression testing on x86_64-linux both
--enable-default-pie and --disable-default-pie was complete June 23.

-- 
Alan Modra
Australia Development Lab, IBM


RE: [PATCH 5/7] [ARC] Enable indexed loads for elf targets.

2017-07-17 Thread Claudiu Zissulescu
> The change looks fine, but it would be nice if the commit message
> explained _why_ we are default off for Linux and on for Elf, I think
> more text in the commit message on this sort of thing will help future
> developers understand why things are the way they are.
> 

Committed with suggested annotations. Thank you for your review,
Claudiu


Re: [PATCH][PING] Improve extraction of changed file in contrib/mklog

2017-07-17 Thread Yuri Gribov
On Mon, Jul 17, 2017 at 11:37 AM, Trevor Saunders  wrote:
> On Mon, Jul 17, 2017 at 08:22:56AM +0100, Yuri Gribov wrote:
>> Hi,
>>
>> Currently mklog will fail to analyze lines like this in patches:
>> diff -rupN gcc/gcc/testsuite/lib/profopt.exp
>> gcc-compare-checks/gcc/testsuite/lib/profopt.exp
>> (it fails with "Error: failed to parse diff for ... and ...").
>>
>> This patch fixes it. Ok for trunk?
>
> I'm no perl expert, but it seems fine.  I don't think I can
> approve this, but I sort of remembered you being the one who owns this
> script, so I'm not sure you need approval.

I think it's more like being the only one who cares about the script)

Cc-ed Jeff who approved past mklog commits.

-Y


Re: [PATCH, committed] Fix PR81162

2017-07-17 Thread Bill Schmidt
Thomas, thanks for the heads-up!  I didn't realize we had this dependency.  
I'll move the test case shortly.

-- Bill

> On Jul 17, 2017, at 5:47 AM, Thomas Preudhomme 
>  wrote:
> 
> Hi Bill,
> 
> Is there a reason the new test is in gcc.dg rather than in gcc.dg/ubsan? The 
> test FAILs when there is no ubsan runtime support and fsanitize_undefined 
> effective target is not available in gcc.dg (one needs to load ubsan-dg for 
> this effective target to be defined).
> 
> Best regards,
> 
> Thomas
> 
> On 14/07/17 19:05, Bill Schmidt wrote:
>> Hi,
>> PR81162 identifies a bug in SLSR involving overflow that occurs when
>> replacing a NEGATE_EXPR with a PLUS_EXPR.  This is another example
>> of an unprofitable transformation that should be skipped anyway,
>> hence this simple patch.  Bootstrapped and tested on
>> powerpc64le-unknown-linux-gnu, committed.  Test case provided from
>> the bug report.
>> Thanks,
>> Bill
>> [gcc]
>> 2017-07-14  Bill Schmidt  
>>  PR tree-optimization/81162
>>  * gimple-ssa-strength-reduction.c (replace_mult_candidate): Don't
>>  replace a negate with an add.
>> [gcc/testsuite]
>> 2017-07-14  Bill Schmidt  
>>  PR tree-optimization/81162
>>  * gcc.dg/pr81162.c: New file.
>> Index: gcc/gimple-ssa-strength-reduction.c
>> ===
>> --- gcc/gimple-ssa-strength-reduction.c  (revision 250189)
>> +++ gcc/gimple-ssa-strength-reduction.c  (working copy)
>> @@ -2082,13 +2082,14 @@ replace_mult_candidate (slsr_cand_t c, tree basis_
>>   types but allows for safe negation without twisted logic.  */
>>if (wi::fits_shwi_p (bump)
>>&& bump.to_shwi () != HOST_WIDE_INT_MIN
>> -  /* It is not useful to replace casts, copies, or adds of
>> +  /* It is not useful to replace casts, copies, negates, or adds of
>>   an SSA name and a constant.  */
>>&& cand_code != SSA_NAME
>>&& !CONVERT_EXPR_CODE_P (cand_code)
>>&& cand_code != PLUS_EXPR
>>&& cand_code != POINTER_PLUS_EXPR
>> -  && cand_code != MINUS_EXPR)
>> +  && cand_code != MINUS_EXPR
>> +  && cand_code != NEGATE_EXPR)
>>  {
>>enum tree_code code = PLUS_EXPR;
>>tree bump_tree;
>> Index: gcc/testsuite/gcc.dg/pr81162.c
>> ===
>> --- gcc/testsuite/gcc.dg/pr81162.c   (nonexistent)
>> +++ gcc/testsuite/gcc.dg/pr81162.c   (working copy)
>> @@ -0,0 +1,17 @@
>> +/* PR tree-optimization/81162 */
>> +/* { dg-do run } */
>> +/* { dg-options "-fsanitize=undefined -O2" } */
>> +
>> +short s;
>> +int i1 = 1;
>> +int i2 = 1;
>> +unsigned char uc = 147;
>> +
>> +int main() {
>> +  s = (-uc + 2147483647) << 0;
>> +  if (9031239389974324562ULL >= (-((i1 && i2) + uc) ^ -21096) ) {
>> +return 0;
>> +  } else {
>> +return -1;
>> +  }
>> +}
> 



Re: [PATCH] Move static chain and non-local goto init after NOTE_INSN_FUNCTION_BEG (PR sanitize/81186).

2017-07-17 Thread Martin Liška
On 07/14/2017 03:42 PM, Michael Matz wrote:
> Hi,
> 
> On Thu, 13 Jul 2017, Martin Liška wrote:
> 
>> Hopefully following patch will fix that. I returned to the first version 
>> and saved/restored static_chain register before/after 
>> __asan_stack_malloc.
> 
> It should also work if you emit the parm_birth_note after the static chain 
> is set up (not before it), but before you store into the 
> nonlocal_goto_save_area.  With that you don't need to worry about 
> clobbering the incoming static chain with the asan setup.

Unfortunately it does not work. First asan_emit_stack_protection is executed, 
which creates:

#0  0x009850f4 in expand_used_vars () at ../../gcc/cfgexpand.c:2233
#1  0x00992ab7 in (anonymous namespace)::pass_expand::execute 
(this=0x28c02f0, fun=0x2c1b60b0) at ../../gcc/cfgexpand.c:6232
#2  0x00e0d3a8 in execute_one_pass (pass=0x28c02f0) at 
../../gcc/passes.c:2492

which does all the stack preparation (including the problematic call to 
__asan_stack_malloc_N).

Note that this code still should be placed before parm_birth_note as we
can't say that params are ready before a fake stack is prepared.

Then we generate code that loads the implicit chain argument:

(gdb) p debug_rtx_list(get_insns(), 100)
(note 1 0 37 NOTE_INSN_DELETED)

(note 37 1 38 NOTE_INSN_FUNCTION_BEG)

(insn 38 37 39 (set (reg/f:DI 94 [ CHAIN.1 ])
(reg:DI 39 r10 [ CHAIN.1 ])) 
"/home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/asan/pr81186.c":9 -1
 (nil))

(insn 39 38 0 (set (mem/c:DI (plus:DI (reg/f:DI 82 virtual-stack-vars)
(const_int -584 [0xfffffffffffffdb8])) [0  S8 A64])
(reg:DI 39 r10 [ CHAIN.1 ])) 
"/home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/asan/pr81186.c":9 -1
 (nil))

Which is problematic as using virtual-stack-vars which should point to fake 
stack done by AddressSanitizer
in __asan_stack_malloc_N.

That said, both parts (ASAN fake stack init and CHAIN load from the implicit
argument) are before parm birth actions that should be aware of each other.
Thus my previous patch preserves the r10 register on x86_64.

Thanks,
Martin

> 
> Can you test that?  It would better reflect the intent of this note (the 
> static chain being an implicit parameter, but the nonlocal_goto_save_area 
> setup not being such).
> 
> 
> Ciao,
> Michael.
> 



Re: [PATCH v12] add -fpatchable-function-entry=N,M option

2017-07-17 Thread Torsten Duwe
What is the next step now? Is anybody going to commit that patch?

Torsten

On Fri, Jul 07, 2017 at 02:57:55PM +0100, Richard Earnshaw (lists) wrote:
> On 06/07/17 15:03, Torsten Duwe wrote:
> +#if TARGET_HAVE_NAMED_SECTIONS

No, this is a hook.  You need to test targetm_common.have_named_sections.

OK with that change.

R.

On Fri, Jul 07, 2017 at 09:30:28PM +0200, Torsten Duwe wrote:
> Change since v11:
> 
> < +#if TARGET_HAVE_NAMED_SECTIONS
> 
> > +  if (record_p && targetm_common.have_named_sections)
> 
> (plus > +#include "common/common-target.h" )
> 
>   Torsten
> 
> 
> gcc/c-family/ChangeLog
> 2017-07-07  Torsten Duwe  
> 
>   * c-attribs.c (c_common_attribute_table): Add entry for
>   "patchable_function_entry".
> 
> gcc/lto/ChangeLog
> 2017-07-07  Torsten Duwe  
> 
>   * lto-lang.c (lto_attribute_table): Add entry for
>   "patchable_function_entry".
> 
> gcc/ChangeLog
> 2017-07-07  Torsten Duwe  
> 
>   * common.opt: Introduce -fpatchable-function-entry
>   command line option, and its variables function_entry_patch_area_size
>   and function_entry_patch_area_start.
>   * opts.c (common_handle_option): Add -fpatchable_function_entry_ case,
>   including a two-value parser.
>   * target.def (print_patchable_function_entry): New target hook.
>   * targhooks.h (default_print_patchable_function_entry): New function.
>   * targhooks.c (default_print_patchable_function_entry): Likewise.
>   * toplev.c (process_options): Switch off IPA-RA if
>   patchable function entries are being generated.
>   * varasm.c (assemble_start_function): Look at the
>   patchable-function-entry command line switch and current
>   function attributes and maybe generate NOP instructions by
>   calling the print_patchable_function_entry hook.
>   * doc/extend.texi: Document patchable_function_entry attribute.
>   * doc/invoke.texi: Document -fpatchable_function_entry
>   command line option.
>   * doc/tm.texi.in (TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY):
>   New target hook.
>   * doc/tm.texi: Likewise.
> 
> gcc/testsuite/ChangeLog
> 2017-07-07  Torsten Duwe  
> 
>   * c-c++-common/patchable_function_entry-default.c: New test.
>   * c-c++-common/patchable_function_entry-decl.c: Likewise.
>   * c-c++-common/patchable_function_entry-definition.c: Likewise.
> 
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index 626ffa1cde7..ecb00c1d5b9 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -142,6 +142,8 @@ static tree handle_bnd_variable_size_attribute (tree *, 
> tree, tree, int, bool *)
>  static tree handle_bnd_legacy (tree *, tree, tree, int, bool *);
>  static tree handle_bnd_instrument (tree *, tree, tree, int, bool *);
>  static tree handle_fallthrough_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_patchable_function_entry_attribute (tree *, tree, tree,
> +int, bool *);
>  
>  /* Table of machine-independent attributes common to all C-like languages.
>  
> @@ -351,6 +353,9 @@ const struct attribute_spec c_common_attribute_table[] =
> handle_bnd_instrument, false },
>{ "fallthrough", 0, 0, false, false, false,
> handle_fallthrough_attribute, false },
> +  { "patchable_function_entry",  1, 2, true, false, false,
> +   handle_patchable_function_entry_attribute,
> +   false },
>{ NULL, 0, 0, false, false, false, NULL, false }
>  };
>  
> @@ -3260,3 +3265,10 @@ handle_fallthrough_attribute (tree *, tree name, tree, 
> int,
>*no_add_attrs = true;
>return NULL_TREE;
>  }
> +
> +static tree
> +handle_patchable_function_entry_attribute (tree *, tree, tree, int, bool *)
> +{
> +  /* Nothing to be done here.  */
> +  return NULL_TREE;
> +}
> diff --git a/gcc/common.opt b/gcc/common.opt
> index e81165c488b..78cfa568a95 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -163,6 +163,13 @@ bool flag_stack_usage_info = false
>  Variable
>  int flag_debug_asm
>  
> +; How many NOP insns to place at each function entry by default
> +Variable
> +HOST_WIDE_INT function_entry_patch_area_size
> +
> +; And how far the real asm entry point is into this area
> +Variable
> +HOST_WIDE_INT function_entry_patch_area_start
>  
>  ; Balance between GNAT encodings and standard DWARF to emit.
>  Variable
> @@ -2030,6 +2037,10 @@ fprofile-reorder-functions
>  Common Report Var(flag_profile_reorder_functions)
>  Enable function reordering that improves code placement.
>  
> +fpatchable-function-entry=
> +Common Joined Optimization
> +Insert NOP instructions at each function entry.
> +
>  frandom-seed
>  Common Var(common_deferred_options) Defer
>  
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 5cb512fe575..9c171abc121 100644
> --- a/gcc/doc/extend.texi

Re: [PATCH GCC][13/13]Distribute loop with loop versioning under runtime alias check

2017-07-17 Thread Christophe Lyon
On 17 July 2017 at 12:06, Bin.Cheng  wrote:
> On Mon, Jul 10, 2017 at 10:32 AM, Christophe Lyon
>  wrote:
>> Hi Bin,
>>
>> On 30 June 2017 at 12:43, Bin.Cheng  wrote:
>>> On Wed, Jun 28, 2017 at 2:09 PM, Bin.Cheng  wrote:
 On Wed, Jun 28, 2017 at 1:29 PM, Richard Biener
  wrote:
> On Wed, Jun 28, 2017 at 1:46 PM, Bin.Cheng  wrote:
>> On Wed, Jun 28, 2017 at 11:58 AM, Richard Biener
>>  wrote:
>>> On Tue, Jun 27, 2017 at 4:07 PM, Bin.Cheng  
>>> wrote:
 On Tue, Jun 27, 2017 at 1:44 PM, Richard Biener
  wrote:
> On Fri, Jun 23, 2017 at 12:30 PM, Bin.Cheng  
> wrote:
>> On Tue, Jun 20, 2017 at 10:22 AM, Bin.Cheng  
>> wrote:
>>> On Mon, Jun 12, 2017 at 6:03 PM, Bin Cheng  
>>> wrote:
 Hi,
>> Rebased V3 for changes in previous patches.  Bootstap and test on
>> x86_64 and aarch64.
>
> why is ldist-12.c no longer distributed?  your comment says it 
> doesn't expose
> more "parallelism" but the point is to reduce memory bandwith 
> requirements
> which it clearly does.
>
> Likewise for -13.c, -14.c.  -4.c may be a questionable case but the 
> wording
> of "parallelism" still confuses me.
>
> Can you elaborate on that.  Now onto the patch:
 Given we don't model data locality or memory bandwidth, whether
 distribution enables loops that can be executed in parallel becomes the
 major criterion for distribution.  BTW, I think a good memory stream
 optimization model shouldn't consider small loops as in ldist-12.c,
 etc., appropriate for distribution.
>>>
>>> True.  But what means "parallel" here?  ldist-13.c if partitioned into 
>>> two loops
>>> can be executed "in parallel"
>> So if a loop by itself can be vectorized (or, so to speak, can be executed
>> in parallel), we tend to not distribute it into small ones.  But there
>> is one exception here, if the distributed small loops are recognized
>> as builtin functions, we still distribute it.  I assume it's generally
>> better to call builtin memory functions than vectorize it by GCC?
>
> Yes.
>
>>>
>
> +   Loop distribution is the dual of loop fusion.  It separates 
> statements
> +   of a loop (or loop nest) into multiple loops (or loop nests) with 
> the
> +   same loop header.  The major goal is to separate statements which 
> may
> +   be vectorized from those that can't.  This pass implements 
> distribution
> +   in the following steps:
>
> misses the goal of being a memory stream optimization, not only a 
> vectorization
> enabler.  distributing a loop can also reduce register pressure.
 I will revise the comment, but as explained, enabling more
 vectorization is the major criterion for distribution to some extent
 now.
>>>
>>> Yes, I agree -- originally it was written to optimize the stream 
>>> benchmark IIRC.
>> Let's see if any performance drop will be reported against this patch.
>> Let's see if we can create a cost model for it.
>
> Fine.
 I will run some benchmarks to see if there is breakage.
>
>>>
>
> You introduce ldist_alias_id in struct loop (probably in 01/n which I
> didn't look
> into yet).  If you don't use that please introduce it separately.
 Hmm, yes it is introduced in patch [01/n] and set in this patch.

>
> + /* Be conservative.  If data references are not well 
> analyzed,
> +or the two data references have the same base 
> address and
> +offset, add dependence and consider it alias to each 
> other.
> +In other words, the dependence can not be resolved by
> +runtime alias check.  */
> + if (!DR_BASE_ADDRESS (dr1) || !DR_BASE_ADDRESS (dr2)
> + || !DR_OFFSET (dr1) || !DR_OFFSET (dr2)
> + || !DR_INIT (dr1) || !DR_INIT (dr2)
> + || !DR_STEP (dr1) || !tree_fits_uhwi_p (DR_STEP 
> (dr1))
> + || !DR_STEP (dr2) || !tree_fits_uhwi_p (DR_STEP 
> (dr2))
> + || res == 0)
>
> ISTR a helper that computes whether we can handle a runtime alias 
> check for
> a specific case?
 I guess you mean runtime_alias_check_p that I factored out previously?
  Unfortunately, it's factored out for the vectorizer's usage and doesn't fit
 here straightforwardly.  Shall I try to further generalize the
 interface as independence patch to this one?
>>>
>>> That would be nice.
>>>
>
>

Re: [PATCH] Remove Java references in source code.

2017-07-17 Thread Martin Liška
On 07/15/2017 09:37 AM, Eric Botcazou wrote:
>> Patch removes some remaining references in gcc/ subfolder.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> You need to configure with --enable-languages=all for cleanups like this, the 
> patch trivially breaks the Ada compiler everywhere:

Hi.

Sorry for the breakage. Thanks!

Martin 

> 
> +===GNAT BUG DETECTED==+
> | 8.0.0 20170715 (experimental) [trunk revision 250221] (x86_64-suse-linux) 
> GCC error:|
> | in gimplify_save_expr, at gimplify.c:5805|
> | Error detected around /home/eric/svn/gcc/gcc/ada/butil.adb:40:31 |
> | Please submit a bug report; see https://gcc.gnu.org/bugs/ .  |
> 



Re: [PATCH][RFA/RFC] Stack clash mitigation patch 07/08

2017-07-17 Thread Wilco Dijkstra
Jeff Law wrote:

> So would a half-half (2k caller/2k callee) split like Florian has
> proposed work for you?  ie, we simply declare a caller that pushes the
> stack pointer 2k or more into the guard as broken?

My results show using a 4KB guard size is not ideal. 2KB of outgoing
args is too large (it means 264 arguments) so it's a waste of the limited
range.

> While I'm not really comfortable with the 2k/2k split in general, I
> think I can support it from a Red Hat point of view -- largely because
> we use 64k pages in RHEL.  So our guard is actually 64k.  Having a
> hostile call chain push 2k into the guard isn't so bad in that case.

A minimum guard size of 64KB seems reasonable even on systems with
4KB pages. However whatever the chosen guard size, you cannot defend
against hostile code. An OS can of course increase the guard size well 
beyond the minimum required, but that's simply reducing the probability -
it is never going to block a big unchecked alloca.

> In fact, with a 64k guard, the dynamic area dwarfs the outgoing args as
> the area of most concern.  And with that size guard we're far more
> likely to see an attempted jump with an unconstrained alloca rather than
> with a fairly large alloca.

Outgoing args are not an area for concern even with a 4KB stack guard.

Unconstrained alloca's are indeed the most risky - if a program has a single
alloca that can be controlled then it is 100% unsafe no matter how many
million probes you do elsewhere.

> And if there's an attacker controlled unconstrained alloca, then, well,
> we've lost because no matter the size of the guard, the attacker can
> jump it and there isn't a damn thing the callee can do about it.

Exactly. This means checking dynamic allocation by default is essential.

> I believe Richard Earnshaw indicated that 4k pages were in use on some
> aarch64 systems, so I didn't try to use a larger probe interval.  Though
> I certainly considered it.  I think that's a decision that belongs in
> the hands of the target maintainers.  Though I think it's best if you're
> going to use a larger probe interval to mandate a suitable page size in
> the ABI.

The probe interval doesn't have to be the same as the (minimum) page size.

> Some (simpler) tracking is still needed because allocations happen in
> potentially 3 places for aarch64.  There's almost certainly cases where
> none of them individually are larger than PROBE_INTERVAL, but as a group
> they could be.

In 99% of the frames only one stack allocation is made. There are a few
cases where the stack can be adjusted twice.

> So how about this.
>
> First we have to disallow any individual allocation from allocating more
> than PROBE_INTERVAL bytes.
>
> If the total frame size is less than X (where we're still discussing X),
> we emit no probes.

... and the outgoing args are smaller than Y.

> If the total frame size is greater than X, then after the first
> allocation we probe the highest address within in the just allocated
> area and we probe every PROBE_INTERVAL bytes thereafter as space is
> allocated.

To be safe I think we first need to probe and then allocate. Or are there going
to be extra checks in asynchronous interrupt handlers that check whether SP is
above the stack guard?

> PROBE_INTERVAL is currently 4k, but the aarch64 maintainers can choose
> to change that.  Note that significantly larger probe intervals may
> require tweaking the sequences -- I'm not familiar enough with the
> various immediate field limitations on aarch64 instructions to be sure
> either way on that issue.

On AArch64 a probe can reach up to 16MBytes using 2 instructions.

Wilco



[PATCH, PR81030] Call compute_outgoing_frequencies at expand

2017-07-17 Thread Tom de Vries

Hi,

this patch fixes PR81030, an P1 ICE in expand.


I.

For the test-case from the patch compiled at -O1 we have at .optimized:
...
  _21 = 0;
  _22 = c.5_6 != 0;
  _23 = _21 | _22;
  if (_23 != 0)
goto ; [73.27%] [count: INV]
  else
goto ; [26.73%] [count: INV]
;;succ:   7 [73.3% (guessed)]  (TRUE_VALUE,EXECUTABLE)
;;6 [26.7% (guessed)]  (FALSE_VALUE,EXECUTABLE)
...

While expanding this bit, we first swap the operands of the BIT_IOR_EXPR:
...
;; Generating RTL for gimple basic block 4
Swap operands in stmt:
_23 = _21 | _22;
Cost left opnd=0, right opnd=2
...

And then in expand_gimple_cond we substitute the def of _23 and replace 
the BIT_IOR_EXPR with TRUTH_ORIF_EXPR. So we really expand:

...
  if (_22 || _21)
...

In do_jump_1, we encounter the following TRUTH_ORIF_EXPR handling:
...
 case TRUTH_ORIF_EXPR:
 {
   /* Spread the probability evenly between the two conditions. So
  the first condition has half the total probability of being true.
  The second condition has the other half of the total probability,
  so its jump has a probability of half the total, relative to
  the probability we reached it (i.e. the first condition was
  false).  */
   profile_probability op0_prob = profile_probability::uninitialized ();
   profile_probability op1_prob = profile_probability::uninitialized ();
   if (prob.initialized_p ())
 {
   op0_prob = prob.apply_scale (1, 2);
   op1_prob = prob.apply_scale (1, 2) / op0_prob.invert ();
 }
   if (if_true_label == NULL)
 {
   drop_through_label = gen_label_rtx ();
   do_jump (op0, NULL, drop_through_label, op0_prob);
   do_jump (op1, if_false_label, NULL, op1_prob);
 }
   else
 {
   do_jump (op0, NULL, if_true_label, op0_prob);
   do_jump (op1, if_false_label, if_true_label, op1_prob);
 }
   break;
 }
...

This expands into two jumps. The intention is two conditional jumps, but 
since _21 is 0, the second becomes unconditional:

...
;; if (_23 != 0)

(insn 12 11 13 4 (set (reg:CCZ 17 flags)
(compare:CCZ (mem/c:SI (symbol_ref:DI ("c") [flags 0x2]) [1 cD.1821+0 S4 A32])
(const_int 0 [0]))) "test2.c":20 -1
 (nil))

(insn 13 12 14 4 (set (reg:QI 95)
(ne:QI (reg:CCZ 17 flags)
(const_int 0 [0]))) "test2.c":20 -1
 (nil))

(insn 14 13 15 4 (set (reg:CCZ 17 flags)
(compare:CCZ (reg:QI 95)
(const_int 0 [0]))) "test2.c":20 -1
 (nil))

(jump_insn 15 14 18 4 (set (pc)
(if_then_else (ne (reg:CCZ 17 flags)
(const_int 0 [0]))
(label_ref 0)
(pc))) "test2.c":20 -1
 (int_list:REG_BR_PROB 3664 (nil)))

(note 18 15 16 10 [bb 10] NOTE_INSN_BASIC_BLOCK)

(jump_insn 16 18 17 10 (set (pc)
(label_ref 0)) -1
 (nil))

(barrier 17 16 0)
...

The jump_insn 15 now contains a reg-note with the branch probability set 
to 3664.


Eventually, in pass_expand::execute we call 'cleanup_cfg 
(CLEANUP_NO_INSN_DEL)' during which we call checking_verify_flow_info, 
which complains that:

...
test2.c:26:1: error: verify_flow_info: REG_BR_PROB does not match cfg 
3664 7327

 }
 ^
during RTL pass: expand
test2.c:26:1: internal compiler error: verify_flow_info failed
...

In other words, this jump_insn has a branch probability note of 3664, 
but the corresponding edge claims a probability of 7327:

...
(jump_insn 15 14 18 5 (set (pc)
(if_then_else (ne (reg:CCZ 17 flags)
(const_int 0 [0]))
(label_ref:DI 32)
(pc))) "test2.c":20 617 {*jcc_1}
 (int_list:REG_BR_PROB 3664 (nil))
 -> 32)
;;succ:   10 [73.3% (guessed)]
;;6 [26.7% (guessed)]  (FALLTHRU)
...


II.

The ICE was introduced with r248945, by adding this bit to 
find_many_sub_basic_blocks:

...
  }
+   else
+ /* If nothing changed, there is no need to create new BBs.  */
+ if (EDGE_COUNT (bb->succs) == n_succs[bb->index])
+   continue;

compute_outgoing_frequencies (bb);
   }
...
Commenting out this bit makes the compilation succeed.


III.

The bit added in find_many_sub_basic_blocks makes sense to me.

It just seems to me that we have been relying on 
find_many_sub_basic_blocks to call compute_outgoing_frequencies for all 
basic blocks during expand. This patch restores that situation, and 
fixes the ICE.


Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom
Call compute_outgoing_frequencies at expand

2017-07-17  Tom de Vries  

	PR middle-end/81030
	* cfgbuild.c (find_many_sub_basic_blocks): Add and handle update_freq
	parameter.
	* cfgbuild.h (find_many_sub_basic_blocks): Add update_freq parameter.
	* cfgexpand.c (pass_expand::execute): Call find_many_sub_basic_blocks
	with update_freq == true.

	* gcc.dg/pr81030.c: New test.

---
 gcc/cfgbuild.c |  4 ++--
 gcc/cfgbuild.h |  3 ++-
 gcc/cfgexpand.c|  2 +-
 gcc/testsuite/gcc.dg/pr81030

RE: [PATCH 7/7] [ARC] Consolidate PIC implementation.

2017-07-17 Thread Claudiu Zissulescu
> Looks like a good clean up.
> 
> Thanks,
> Andrew

Committed, thank you for your review,
Claudiu


Re: [patch] Fix ICE on CONSTRUCTOR containing absolute addresses

2017-07-17 Thread Eric Botcazou
> So this isn't about global
> 
>  void *x[] = { &((struct Y *)0x12)->foo }
> 
> but for a local one where supposedly variable indexing is valid (don't
> we gimplify that)?

Yes, it's local (it's OK if global because we do a full RTL expansion).
Everything is valid, constant and passes initializer_constant_valid_p.

> And
> 
> +case INDIRECT_REF:
> +  /* This deals with absolute addresses.  */
> +  offset += tree_to_shwi (TREE_OPERAND (target, 0));
> +  x = gen_rtx_MEM (QImode,
> +  gen_rtx_SYMBOL_REF (Pmode, "origin of addresses"));
> 
> simply translates 0x12 to &* + 0x12 (where origin == 0 somehow?).
> 
> I suppose returning directly here and sth like
> 
> value->base = gen_rtx_SYMBOL_REF (Pmode, "origin of addresses");
> value->offset = offset + tree_to_shwi (...);
> return;
> 
> would be clearer.

That's a matter of consistency; the LABEL_DECL case does something equivalent:

  x = gen_rtx_MEM (FUNCTION_MODE,
   gen_rtx_LABEL_REF (Pmode, force_label_rtx (target)));

> Or even
> 
> value->base = tree-to-rtx (TREE_OPERAND (target, 0));
> value->offset = offset;

The callers expect the base to be SYMBOL_REF or LABEL_REF though.

-- 
Eric Botcazou


RE: [PATCH][ARC] Add support for naked functions.

2017-07-17 Thread Claudiu Zissulescu
> This all looks fine,
> 
> Thanks,
> Andrew
> 
> 

Committed. Thank you for your review,
Claudiu


Re: [PATCH, committed] Fix PR81162

2017-07-17 Thread Thomas Preudhomme

Hi Bill,

Is there a reason the new test is in gcc.dg rather than in gcc.dg/ubsan? The 
test FAILs when there is no ubsan runtime support and fsanitize_undefined 
effective target is not available in gcc.dg (one needs to load ubsan-dg for this 
effective target to be defined).


Best regards,

Thomas

On 14/07/17 19:05, Bill Schmidt wrote:

Hi,

PR81162 identifies a bug in SLSR involving overflow that occurs when
replacing a NEGATE_EXPR with a PLUS_EXPR.  This is another example
of an unprofitable transformation that should be skipped anyway,
hence this simple patch.  Bootstrapped and tested on
powerpc64le-unknown-linux-gnu, committed.  Test case provided from
the bug report.

Thanks,
Bill


[gcc]

2017-07-14  Bill Schmidt  

PR tree-optimization/81162
* gimple-ssa-strength-reduction.c (replace_mult_candidate): Don't
replace a negate with an add.

[gcc/testsuite]

2017-07-14  Bill Schmidt  

PR tree-optimization/81162
* gcc.dg/pr81162.c: New file.


Index: gcc/gimple-ssa-strength-reduction.c
===
--- gcc/gimple-ssa-strength-reduction.c (revision 250189)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -2082,13 +2082,14 @@ replace_mult_candidate (slsr_cand_t c, tree basis_
   types but allows for safe negation without twisted logic.  */
if (wi::fits_shwi_p (bump)
&& bump.to_shwi () != HOST_WIDE_INT_MIN
-  /* It is not useful to replace casts, copies, or adds of
+  /* It is not useful to replace casts, copies, negates, or adds of
 an SSA name and a constant.  */
&& cand_code != SSA_NAME
&& !CONVERT_EXPR_CODE_P (cand_code)
&& cand_code != PLUS_EXPR
&& cand_code != POINTER_PLUS_EXPR
-  && cand_code != MINUS_EXPR)
+  && cand_code != MINUS_EXPR
+  && cand_code != NEGATE_EXPR)
  {
enum tree_code code = PLUS_EXPR;
tree bump_tree;
Index: gcc/testsuite/gcc.dg/pr81162.c
===
--- gcc/testsuite/gcc.dg/pr81162.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/pr81162.c  (working copy)
@@ -0,0 +1,17 @@
+/* PR tree-optimization/81162 */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=undefined -O2" } */
+
+short s;
+int i1 = 1;
+int i2 = 1;
+unsigned char uc = 147;
+
+int main() {
+  s = (-uc + 2147483647) << 0;
+  if (9031239389974324562ULL >= (-((i1 && i2) + uc) ^ -21096) ) {
+return 0;
+  } else {
+return -1;
+  }
+}



Re: [PATCH][PING] Improve extraction of changed file in contrib/mklog

2017-07-17 Thread Trevor Saunders
On Mon, Jul 17, 2017 at 08:22:56AM +0100, Yuri Gribov wrote:
> Hi,
> 
> Currently mklog will fail to analyze lines like this in patches:
> diff -rupN gcc/gcc/testsuite/lib/profopt.exp
> gcc-compare-checks/gcc/testsuite/lib/profopt.exp
> (it fails with "Error: failed to parse diff for ... and ...").
> 
> This patch fixes it. Ok for trunk?

I'm no perl expert, but it seems fine.  I don't think I can
approve this, but I sort of remembered you being the one who owns this
script, so I'm not sure you need approval.

Trev

> 
> -Y




Re: [patch] Fix ICE on CONSTRUCTOR containing absolute addresses

2017-07-17 Thread Richard Biener
On Mon, Jul 17, 2017 at 10:51 AM, Eric Botcazou  wrote:
>> Apart from the MEM construction where I simply trust you this looks
>> ok.  Mind adding MEM_REF support for this case as well?
>
> Do you mean MEM_REF ?  Is that possible?

Yes.

>> Btw, why's simply output_constant_def (TREE_OPERAND (target, 0), 1);
>> not correct?
>
> If you do that, you get a symbol in the constant pool whose value (address) is
> arbitrary; here what we want is a fixed value.  That being said, given that
> the contents of the constant pool are hashed, there is very likely not much
> difference in the end, although that would be conceptually incorrect.
>
>> Isn't this about &*0x1?
>
> Yes, it's not the address of a constant, it's the address of an object whose
> base address is absolute, so &(abs_address)->field[index].  This kind of thing
> is not folded by build_fold_addr_expr.

So this isn't about global

 void *x[] = { &((struct Y *)0x12)->foo }

but for a local one where supposedly variable indexing is valid (don't
we gimplify
that)?

And

+case INDIRECT_REF:
+  /* This deals with absolute addresses.  */
+  offset += tree_to_shwi (TREE_OPERAND (target, 0));
+  x = gen_rtx_MEM (QImode,
+  gen_rtx_SYMBOL_REF (Pmode, "origin of addresses"));

simply translates 0x12 to &* + 0x12 (where origin == 0 somehow?).

I suppose returning directly here and sth like

value->base = gen_rtx_SYMBOL_REF (Pmode, "origin of addresses");
value->offset = offset + tree_to_shwi (...);
return;

would be clearer.  Or even

value->base = tree-to-rtx (TREE_OPERAND (target, 0));
value->offset = offset;

?


> --
> Eric Botcazou


Re: PING^2: Fwd: SSA range class and removal of VR_ANTI_RANGEs

2017-07-17 Thread Richard Biener
On Mon, Jul 17, 2017 at 8:51 AM, Aldy Hernandez  wrote:
> PING PING
>
> Hi folks.
>
> The following is another iteration of the SSA range class, taking into
> account many of the suggestions posted on this thread, especially the
> addition of a memory efficient class for storage, folding non-zero
> bits back into the range information, C++ suggestions by Martin, and
> some minor suggestions.
>
> Most importantly, I have included an irange_storage class that uses
> trailing_wide_ints<5>.  This produces far better results than my
> previous incarnation with wide_int[6] :).
>
> The storage class is basically this:
>
> class GTY((variable_size)) irange_storage
> {
>   friend class irange;
>  public:
> /* Maximum number of pairs of ranges allowed.  */
>   static const unsigned int max_pairs = 2;
>   /* These are the pair of subranges for the irange.  The last
>  wide_int allocated is a mask representing which bits in an
>  integer are known to be non-zero.  */
>   trailing_wide_ints trailing_bounds;
> }
>
> Compare this with mainline which has trailing_wide_ints<3>.  The extra
> 2 items in this patchset chew up two 64-bit words, for an additional
> 16 bytes per range in SSA_NAME_RANGE_INFO.  No additional storage is
> needed for SSA_NAMEs per se.
>
> I looked at Jakub's suggestion of compiling insn-recog.c.  Although I
> don't see 4M SSA_NAMES nodes created Jakub sees, I do see a little
> over a million when building with:
>
> ./cc1plus insn-recog.ii -fno-PIE -O2 -fno-exceptions -fno-rtti
> -fasynchronous-unwind-tables  -quiet -fsanitize=address,undefined
> -fmem-report
>
> I explored 3 different ways of measuring memory consumption:
>
> 1. /usr/bin/time -f "%M" , which measures maximum RSS usage.  This
> produced results within the noise.  The RSS usage differed slightly
> between runs, with no consistent difference between mainline and
> patch.
>
> 2. valgrind --tool=massif , no difference.  Perhaps the overhead
> of our GC hides any difference?
>
> 3. --enable-gather-detailed-mem-stats and -fmem-report ...
>
> Total Allocated before: 2351658176
> Total Allocated  after: 2353199328
> diff: 1541152 (0.06%)
>
> SSA_NAME nodes allocated: 1026694
>
> AFAICT with -fmem-report, a 2.35gig compilation consumes 1.5 more
> megs? This is total usage, and some of this gets cleaned up during GC,
> so the total impact is probably less.  Unless there is another
> preferred way of measuring memory usage, I think memory is a non-issue
> with this approach.
>
> Note, this is even before my pending patch avoiding generation of
> useless range information
> (https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01068.html).
>
> How does this look?

It's a change that on its own doesn't look worthwhile to me.

So please post the changes that will build ontop of this.  Like removing
anti-ranges from VRP or your on-demand value-range stuff.

Thanks,
Richard.

>
> Aldy


Re: [PING^4] {C++ PATCH] Fix-it hints for wrong usage of 'friend' and 'auto'

2017-07-17 Thread David Malcolm
On Sun, 2017-07-16 at 12:13 +0200, Volker Reichelt wrote:
> Hi,
> 
> this is PING^4 for a C++ patch that adds fix-it hints for wrong usage
> of 'friend' and 'auto':
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01575.html

I notice that the patch slightly changes the apparent indentation
within the RID_AUTO hunk, but the pre-existing lines there are a
mixture of pure spaces, and tabs+spaces; please double-check that any
lines that the patch touches are converted to correctly follow our
convention of tabs+spaces.

Other than that, OK for trunk.

Sorry for not spotting this earlier.
Dave

> Previous pings
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01248.html
> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00207.html
> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00712.html ; (by
> Gerald)
> 
> (thanks Gerald!)
> 
> Regards,
> Volker



Re: [PATCH PR81369/02]Conservatively not distribute loop with unknown niters

2017-07-17 Thread Richard Biener
On Fri, Jul 14, 2017 at 4:32 PM, Bin Cheng  wrote:
> Hi,
> This is a followup patch for previous fix to PR81369.  In that test case, GCC
> tries to distribute infinite loop, which doesn't make much sense.  This patch
> works conservatively by skipping loops with unknown niters.  It also 
> simplifies
> code a bit.
> Bootstrap and test on x86_64 and AArch64, is it OK?

Ok.

Thanks,
Richard.

> Thanks,
> bin
> 2017-07-12  Bin Cheng  
>
> PR target/81369
> * tree-loop-distribution.c (classify_partition): Only assert on
> number of iterations.
> (merge_dep_scc_partitions): Delete parameter.  Update function call.
> (distribute_loop): Remove code handling loop with unknown niters.
> (pass_loop_distribution::execute): Skip loop with unknown niters.


Re: [PATCH PR81369/01]Sort partitions by post order for all cases

2017-07-17 Thread Richard Biener
On Fri, Jul 14, 2017 at 4:31 PM, Bin Cheng  wrote:
> Hi,
> This patch fixes ICE reported by PR81369.  It simply sinks call to
> sort_partitions_by_post_order so that it's executed for all cases.
> This is necessary to schedule reduction partition as the last one.
> Bootstrap and test on x86_64 and AArch64.  Is it OK?

Ok.

Richard.

> Thanks,
> bin
> 2017-07-12  Bin Cheng  
>
> PR target/81369
> * tree-loop-distribution.c (merge_dep_scc_partitions): Sink call to
> function sort_partitions_by_post_order.
>
> gcc/testsuite/ChangeLog
> 2017-07-12  Bin Cheng  
>
> PR target/81369
> * gcc.dg/tree-ssa/pr81369.c: New.


Re: [PATCH] Fix ICE on _Fract division (PR tree-optimization/81428)

2017-07-17 Thread Richard Biener
On Thu, Jul 13, 2017 at 10:41 PM, Jakub Jelinek  wrote:
> Hi!
>
> _Fract types can't express 1, other spots that call build_one_cst already
> make sure that the type is integral or uses the !ALL_FRACT_MODE_P (TYPE_MODE 
> (type))
> check I've added in this patch.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux and tested with a
> cross to arm on the testcase.  Ok for trunk?

Ok.

> 2017-07-13  Jakub Jelinek  
>
> PR tree-optimization/81428
> * match.pd (X / X -> one): Don't optimize _Fract divisions, as 1
> can't be built for those types.
>
> * gcc.dg/fixed-point/pr81428.c: New test.
>
> --- gcc/match.pd.jj 2017-07-13 15:37:34.0 +0200
> +++ gcc/match.pd2017-07-13 15:46:11.194593051 +0200
> @@ -243,8 +243,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   /* X / X is one.  */
>   (simplify
>(div @0 @0)
> -  /* But not for 0 / 0 so that we can get the proper warnings and errors.  */
> -  (if (!integer_zerop (@0))
> +  /* But not for 0 / 0 so that we can get the proper warnings and errors.
> + And not for _Fract types where we can't build 1.  */
> +  (if (!integer_zerop (@0) && !ALL_FRACT_MODE_P (TYPE_MODE (type)))
> { build_one_cst (type); }))
>   /* X / abs (X) is X < 0 ? -1 : 1.  */
>   (simplify
> --- gcc/testsuite/gcc.dg/fixed-point/pr81428.c.jj   2017-07-13 
> 15:49:52.980806440 +0200
> +++ gcc/testsuite/gcc.dg/fixed-point/pr81428.c  2017-07-13 15:49:29.0 
> +0200
> @@ -0,0 +1,9 @@
> +/* PR tree-optimization/81428 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +void
> +foo (long _Fract *a, long _Fract *b)
> +{
> +  *b = *a / *a;
> +}
>
> Jakub


Re: [PATCH GCC][13/13]Distribute loop with loop versioning under runtime alias check

2017-07-17 Thread Bin.Cheng
On Mon, Jul 10, 2017 at 10:32 AM, Christophe Lyon
 wrote:
> Hi Bin,
>
> On 30 June 2017 at 12:43, Bin.Cheng  wrote:
>> On Wed, Jun 28, 2017 at 2:09 PM, Bin.Cheng  wrote:
>>> On Wed, Jun 28, 2017 at 1:29 PM, Richard Biener
>>>  wrote:
 On Wed, Jun 28, 2017 at 1:46 PM, Bin.Cheng  wrote:
> On Wed, Jun 28, 2017 at 11:58 AM, Richard Biener
>  wrote:
>> On Tue, Jun 27, 2017 at 4:07 PM, Bin.Cheng  wrote:
>>> On Tue, Jun 27, 2017 at 1:44 PM, Richard Biener
>>>  wrote:
 On Fri, Jun 23, 2017 at 12:30 PM, Bin.Cheng  
 wrote:
> On Tue, Jun 20, 2017 at 10:22 AM, Bin.Cheng  
> wrote:
>> On Mon, Jun 12, 2017 at 6:03 PM, Bin Cheng  wrote:
>>> Hi,
> Rebased V3 for changes in previous patches.  Bootstap and test on
> x86_64 and aarch64.

 why is ldist-12.c no longer distributed?  your comment says it doesn't 
 expose
 more "parallelism" but the point is to reduce memory bandwidth 
 requirements
 which it clearly does.

 Likewise for -13.c, -14.c.  -4.c may be a questionable case but the 
 wording
 of "parallelism" still confuses me.

 Can you elaborate on that.  Now onto the patch:
>>> Given we don't model data locality or memory bandwidth, whether
>>> distribution enables loops that can be executed paralleled becomes the
>>> major criteria for distribution.  BTW, I think a good memory stream
>>> optimization model shouldn't consider small loops as in ldist-12.c,
>>> etc., appropriate for distribution.
>>
>> True.  But what means "parallel" here?  ldist-13.c if partitioned into 
>> two loops
>> can be executed "in parallel"
> So if a loop by itself can be vectorized (or so called can be executed
> paralleled), we tend to no distribute it into small ones.  But there
> is one exception here, if the distributed small loops are recognized
> as builtin functions, we still distribute it.  I assume it's generally
> better to call builtin memory functions than vectorize it by GCC?

 Yes.

>>

 +   Loop distribution is the dual of loop fusion.  It separates 
 statements
 +   of a loop (or loop nest) into multiple loops (or loop nests) with 
 the
 +   same loop header.  The major goal is to separate statements which 
 may
 +   be vectorized from those that can't.  This pass implements 
 distribution
 +   in the following steps:

 misses the goal of being a memory stream optimization, not only a 
 vectorization
 enabler.  distributing a loop can also reduce register pressure.
>>> I will revise the comment, but as explained, enabling more
>>> vectorization is the major criteria for distribution to some extend
>>> now.
>>
>> Yes, I agree -- originally it was written to optimize the stream 
>> benchmark IIRC.
> Let's see if any performance drop will be reported against this patch.
> Let's see if we can create a cost model for it.

 Fine.
>>> I will run some benchmarks to see if there is breakage.

>>

 You introduce ldist_alias_id in struct loop (probably in 01/n which I
 didn't look
 into yet).  If you don't use that please introduce it separately.
>>> Hmm, yes it is introduced in patch [01/n] and set in this patch.
>>>

 + /* Be conservative.  If data references are not well 
 analyzed,
 +or the two data references have the same base address 
 and
 +offset, add dependence and consider it alias to each 
 other.
 +In other words, the dependence can not be resolved by
 +runtime alias check.  */
 + if (!DR_BASE_ADDRESS (dr1) || !DR_BASE_ADDRESS (dr2)
 + || !DR_OFFSET (dr1) || !DR_OFFSET (dr2)
 + || !DR_INIT (dr1) || !DR_INIT (dr2)
 + || !DR_STEP (dr1) || !tree_fits_uhwi_p (DR_STEP 
 (dr1))
 + || !DR_STEP (dr2) || !tree_fits_uhwi_p (DR_STEP 
 (dr2))
 + || res == 0)

 ISTR a helper that computes whether we can handle a runtime alias 
 check for
 a specific case?
>>> I guess you mean runtime_alias_check_p that I factored out previously?
>>>  Unfortunately, it's factored out vectorizer's usage and doesn't fit
>>> here straightforwardly.  Shall I try to further generalize the
>>> interface as independence patch to this one?
>>
>> That would be nice.
>>

 +  /* Depend on vectorizer to fold IFN_LOOP_DIST_ALIAS.  */
 +  if (flag_tree_loop_vectorize)
 +{

 so at this point I'd condition the who

Re: [RFC/SCCVN] Handle BIT_INSERT_EXPR in vn_nary_op_eq

2017-07-17 Thread Richard Biener
On Thu, Jul 13, 2017 at 6:18 AM, Andrew Pinski  wrote:
> On Wed, Jul 12, 2017 at 9:10 PM, Marc Glisse  wrote:
>> On Wed, 12 Jul 2017, Andrew Pinski wrote:
>>
>>> Hi,
>>>  Unlike most other expressions, BIT_INSERT_EXPR has an implicit
>>> operand of the precision/size of the second operand.  This means if we
>>> have an integer constant for the second operand and that compares to
>>> the same constant value, vn_nary_op_eq would return that these two
>>> expressions are the same.  But in the case I was looking into the
>>> integer constants had different types, one with 1 bit precision and
>>> the other with 2 bit precision which means the BIT_INSERT_EXPR were
>>> not equal at all.
>>>
>>> This patches the problem by checking to see if BIT_INSERT_EXPR's
>>> operand 1's (second operand) type  has different precision to return
>>> false.
>>>
>>> Is this the correct location or should we be checking for this
>>> differently?  If this is the correct location, is the patch ok?
>>> Bootstrapped and tested on aarch64-linux-gnu with no regressions (and
>>> also tested with a few extra patches to expose BIT_INSERT_EXPR).
>>>
>>> Thanks,
>>> Andrew Pinski
>>>
>>> ChangeLog:
>>> * tree-ssa-sccvn.c (vn_nary_op_eq): Check BIT_INSERT_EXPR's operand 1
>>> to see if the types precision matches.
>>
>>
>> Hello,
>>
>> since BIT_INSERT_EXPR is implicitly an expression with 4 arguments, it makes
>> sense that we may need a few such special cases. But shouldn't the hash
>> function be in sync with the equality comparator? Does operand_equal_p need
>> the same?
>
> The hash function does not need to be exactly the same.  The only
> requirement is that if vn_nary_op_eq returns true, then the hashes have
> to be the same.  Now we could improve the hash by using the precision
> which will allow us not to compare as much in some cases.
>
> Yes operand_equal_p needs the same handling; I did not notice that
> until you mention it..
> Right now it does:
> case BIT_INSERT_EXPR:
>   return OP_SAME (0) && OP_SAME (1) && OP_SAME (2);

Aww.  The issue is that operand_equal_p treats INTEGER_CSTs of different
type/precision but the same value as equal.

Revisiting that, while a good idea, shouldn't block a fix here.  So ...

Index: tree-ssa-sccvn.c
===
--- tree-ssa-sccvn.c(revision 250159)
+++ tree-ssa-sccvn.c(working copy)
@@ -2636,6 +2636,14 @@ vn_nary_op_eq (const_vn_nary_op_t const
 if (!expressions_equal_p (vno1->op[i], vno2->op[i]))
   return false;

+  /* BIT_INSERT_EXPR has an implicit operand as the type precision
+ of op1.  Need to check to make sure they are the same.  */
+  if (vno1->opcode == BIT_INSERT_EXPR)
+if (INTEGRAL_TYPE_P (TREE_TYPE (vno1->op[0]))
+   && TYPE_PRECISION (TREE_TYPE (vno1->op[1]))
+   != TYPE_PRECISION (TREE_TYPE (vno2->op[1])))
+  return false;
+

the case can be restricted to INTEGER_CST vno1->op[0] I think:

  if (vno1->opcode == BIT_INSERT_EXPR
  && TREE_CODE (vno1->op[0]) == INTEGER_CST
  && TYPE_PRECISION (

and yes, operand_equal_p needs a similar fix.  Can you re-post with that added?
Do you have a testcase?

Thanks,
Richard.

> Thanks,
> Andrew Pinski
>
>>
>> --
>> Marc Glisse


Re: [ARM, VXworks] Fix build

2017-07-17 Thread Eric Botcazou
> What I said.  looking at the contents of vxworks.h I see:
> 
> #define CC1_SPEC\
> "%{tstrongarm:-mlittle-endian -mcpu=strongarm ; \
>t4:-mlittle-endian -march=armv4 ;\
>t4be:  -mbig-endian -march=armv4 ;   \
>t4t:   -mthumb -mthumb-interwork -mlittle-endian -march=armv4t ; \
>t4tbe: -mthumb -mthumb-interwork -mbig-endian -march=armv4t ;\
>t5:-mlittle-endian -march=armv5 ;\
>t5be:  -mbig-endian -march=armv5 ;   \
>t5t:   -mthumb -mthumb-interwork -mlittle-endian -march=armv5 ;  \
>t5tbe: -mthumb -mthumb-interwork -mbig-endian -march=armv5 ; \
>txscale:   -mlittle-endian -mcpu=xscale ;\
>txscalebe: -mbig-endian -mcpu=xscale ;   \
> 
> : -march=armv4}"
> 
> Nothing in that list post-dates armv5t, and many of the mentioned target
> architectures are under threat of deprecation in GCC, in addition to the
> old ABI (in particular armv5 is a nonsense architecture - it should be
> armv5t).

OK, we have indeed changed the last one locally to -march=armv7-a.

> I also can't see how ARMv7 would work big-endian with the old ABI since
> there's no way of generating BE8 format images without at least some
> features from the new ABI (mapping symbols, forcing --be8 through to the
> linker, etc).  So "works fine" would appear to have quite a narrow
> definition.

Yes, VxWorks certainly doesn't support all the combinations.

> That's good news.  Does that mean we'll be able to drop the old stuff
> though?  I'd really like to make progress towards removing the old ABI
> support from GCC.

Yes, I'd think so, but Olivier knows better.

> Note that considering a port for deprecation is only the first step.  It
> does not mean that it has been deprecated.  Sometimes the only way to
> find out if a port really is wanted is to make such a threat...

No disagreement. ;-)

-- 
Eric Botcazou


Re: [PATCH, rs6000] Do not do gimple-folding of expressions that are missing their LHS

2017-07-17 Thread Richard Biener
On Wed, Jul 12, 2017 at 11:24 PM, Segher Boessenkool
 wrote:
> On Wed, Jul 12, 2017 at 01:34:01PM -0500, Bill Schmidt wrote:
>> Although I said this was invalid code, it really isn't -- it's legal code.  
>> It's more of an ice-on-stupid-code situation. :)  So probably you should 
>> remove the "invalid" language from the commentary.  Sorry for misleading you.
>
> We could fold this to nothing (if there are no side effects), but it
> would be better if we made stupid code slower instead of faster ;-)

Well, optimization opportunities are not always obvious to the writer.
Iff the builtins have no side-effects besides the return value, the backend
should mark them PURE or CONST and you wouldn't run into this situation.

But yes, simply folding to GIMPLE_NOP is the appropriate thing when you
want to paper over the above deficit in the folding routine.

  gsi_replace (si_p, gimple_build_nop (), false);

note you'll eventually wreck virtual operands so before do

  unlink_stmt_vdef (gsi_stmt (gsi));

which may have its own share of issues (folding may not look at SSA
immediate uses...).

So better fixup builtin attributes!

Richard.

>
> Segher


Re: [PATCH v1 0/3] dwarf: purge DIEs for unreferenced extern globals.

2017-07-17 Thread Richard Biener
On Wed, Jul 12, 2017 at 5:42 PM, Franklin “Snaipe” Mathieu
 wrote:
> From: Franklin “Snaipe” Mathieu 
>
> Hello GCC folks,
>
> This patch series addresses PR 81135 [1].
>
> * patch 1/3 is for trunk (built/tested on trunk@250093).
> * patch 2/3 is the gcc7 backport (built/tested on gcc-7-branch@249680).
> * patch 3/3 is the gcc6 backport (built/tested on gcc-6-branch@249671).
>
> [1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81135

Doing this from late_global_decl is bogus from a conceptual point of view.  It
should instead be done from dwarf2out_early_finish, where we also prune unused
types (prune_unused_types).  I'd be more comfortable if you'd integrate
removing unused global declarations into that function, given it already handles
uses in attributes and the like.

The other possibility is to address this from the FE side: do not call
early_global_decl on mere declarations, but walk them again after parsing
(when TREE_USED is properly set) and register them at that point.

Richard.

> Kind Regards,
>
> --
> Franklin "Snaipe" Mathieu
> Arista Networks, Ltd
>


Re: [PATCH] Fix PR81362: Vector peeling

2017-07-17 Thread Richard Biener
On Wed, Jul 12, 2017 at 5:11 PM, Robin Dapp  wrote:
> The attached patch fixes PR81362.
>
> npeel was erroneously overwritten by vect_peeling_hash_get_lowest_cost
> although the corresponding dataref is not used afterwards.  It should be
> safe to get rid of the npeel parameter since we use the returned
> peeling_info's npeel anyway.  Also removed the body_cost_vec parameter
> which is not used elsewhere.

Ok.

Thanks,
Richard.

> Regards
>  Robin
>
>
> --
>
> gcc/ChangeLog:
>
> 2017-07-12  Robin Dapp  
>
> * (vect_enhance_data_refs_alignment):
> Remove body_cost_vec from _vect_peel_extended_info.
> tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost):
> Do not set body_cost_vec.
> (vect_peeling_hash_choose_best_peeling): Remove body_cost_vec
> and npeel.


Re: [ARM, VXworks] Fix build

2017-07-17 Thread Richard Earnshaw (lists)
On 16/07/17 10:21, Eric Botcazou wrote:
>> The port is also *very* out-of-date.  Not only does it not use the EABI,
>> but it hasn't had support for any core added since ARMv5 (and ARMv6 was
>> announced in 2002)!
> 
> What do you mean exactly?  The port works fine on ARMv7.

What I said.  looking at the contents of vxworks.h I see:

#define CC1_SPEC\
"%{tstrongarm:-mlittle-endian -mcpu=strongarm ; \
   t4:-mlittle-endian -march=armv4 ;\
   t4be:  -mbig-endian -march=armv4 ;   \
   t4t:   -mthumb -mthumb-interwork -mlittle-endian -march=armv4t ; \
   t4tbe: -mthumb -mthumb-interwork -mbig-endian -march=armv4t ;\
   t5:-mlittle-endian -march=armv5 ;\
   t5be:  -mbig-endian -march=armv5 ;   \
   t5t:   -mthumb -mthumb-interwork -mlittle-endian -march=armv5 ;  \
   t5tbe: -mthumb -mthumb-interwork -mbig-endian -march=armv5 ; \
   txscale:   -mlittle-endian -mcpu=xscale ;\
   txscalebe: -mbig-endian -mcpu=xscale ;   \
: -march=armv4}"


Nothing in that list post-dates armv5t, and many of the mentioned target
architectures are under threat of deprecation in GCC, in addition to the
old ABI (in particular armv5 is a nonsense architecture - it should be
armv5t).

Now maybe you've changed the way all this works but there's not much
evidence of that in the ARM backend.

I also don't see any evidence of this port being tested from a search of
gcc-testresults.

I also can't see how ARMv7 would work big-endian with the old ABI since
there's no way of generating BE8 format images without at least some
features from the new ABI (mapping symbols, forcing --be8 through to the
linker, etc).  So "works fine" would appear to have quite a narrow
definition.

> 
>> I therefore propose that we consider this port for deprecation.
> 
> We have a port to VxWorks 7, which is EABI/AAPCS, ready to be contributed.
> 

That's good news.  Does that mean we'll be able to drop the old stuff
though?  I'd really like to make progress towards removing the old ABI
support from GCC.

Note that considering a port for deprecation is only the first step.  It
does not mean that it has been deprecated.  Sometimes the only way to
find out if a port really is wanted is to make such a threat...

R.


[patch,avr,committed] Remove dead avr_inform_devices

2017-07-17 Thread Georg-Johann Lay

https://gcc.gnu.org/r250264

Applied this patch which removes stuff dead since

https://gcc.gnu.org/r239246

Committed as obvious.

Johann

Remove stuff dead since r239246.

* config/avr/avr-arch.h (avr_inform_devices): Remove dead proto.
* config/avr/avr-devices.c (mcu_name, comparator, avr_mcus_str)
(avr_inform_devices): Remove dead stuff.
Index: config/avr/avr-arch.h
===
--- config/avr/avr-arch.h	(revision 250090)
+++ config/avr/avr-arch.h	(working copy)
@@ -195,7 +195,6 @@ extern const avr_arch_t *avr_arch;
 
 extern const avr_mcu_t avr_mcu_types[];
 
-extern void avr_inform_devices (void);
 extern void avr_inform_core_architectures (void);
 
 #endif /* AVR_ARCH_H */
Index: config/avr/avr-devices.c
===
--- config/avr/avr-devices.c	(revision 250090)
+++ config/avr/avr-devices.c	(working copy)
@@ -128,39 +128,6 @@ avr_mcu_types[] =
 
 #ifndef IN_GEN_AVR_MMCU_TEXI
 
-/* Copy-pastes from `gen-avr-mmcu-texi.c' follow...  */
-
-static const char*
-mcu_name[sizeof avr_mcu_types / sizeof avr_mcu_types[0]];
-
-static int
-comparator (const void *va, const void *vb)
-{
-  const char *a = *(const char* const*) va;
-  const char *b = *(const char* const*) vb;
-
-  while (*a && *b)
-{
-  /* Make letters smaller than digits so that `atmega16a' follows
- `atmega16' without `atmega161' etc. between them.  */
-  
-  if (ISALPHA (*a) && ISDIGIT (*b))
-return -1;
-
-  if (ISDIGIT (*a) && ISALPHA (*b))
-return 1;
-
-  if (*a != *b)
-return *a - *b;
-  
-  a++;
-  b++;
-}
-
-  return *a - *b;
-}
-
-
 static char*
 avr_archs_str (void)
 {
@@ -176,39 +143,6 @@ if (!mcu->macro)
 }
 
   
-static char*
-avr_mcus_str (void)
-{
-  size_t n_mcus = 0;
-  char *mcus = concat ("", NULL);
-
-  // Build array of proper devices' names.
-
-  for (const avr_mcu_t *mcu = avr_mcu_types; mcu->name; mcu++)
-if (mcu->macro)
-  mcu_name[n_mcus++] = mcu->name;
-
-  // Sort MCUs so that they are displayed in the same canonical order as
-  // in doc/avr-mcus.texi.
-
-  qsort (mcu_name, n_mcus, sizeof (char*), comparator);
-
-  for (size_t i = 0; i < n_mcus; i++)
-mcus = concat (mcus, " ", mcu_name[i], NULL);
-
-  return mcus;
-}
-
-
-void
-avr_inform_devices (void)
-{
-  char *mcus = avr_mcus_str ();
-  inform (input_location, "devices natively supported:%s", mcus);
-  free (mcus);
-}
-
-
 void
 avr_inform_core_architectures (void)
 {


Re: [PATCHv2][PING^2][PR 56727] Bypass PLT for recursive calls

2017-07-17 Thread Jan Hubicka
> Hi all,
> 
> This is a new version of previous patch
> (https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00020.html), fixed
> after Rainer's remarks.
Hi,
the patch looks OK, but I wonder why you included the can_be_discarded check?
If the function is in a comdat I believe the optimization can still happen.
Perhaps you only want to check DECL_EXTERNAL?

Honza

> 
> -Y



Re: C/C++ PATCH to implement -Wmultistatement-macros (PR c/80116)

2017-07-17 Thread Marek Polacek
On Mon, Jul 17, 2017 at 10:15:40AM +0200, Gerald Pfeifer wrote:
> On Sat, 10 Jun 2017, Gerald Pfeifer wrote:
> > I'm curious to see how many issues this is going to find in real-world
> > code out there!
> 
> This is a bit anecdotal, but your code did find one real issue in Wine:
> 
>  https://www.winehq.org/pipermail/wine-patches/2017-July/163551.html
> (and https://www.winehq.org/pipermail/wine-patches/2017-July/163550.html ) 
> 
> The first patch was accepted, the second (but earlier one) not.

Thanks, good to see that.  It's also found some bugs in glibc.  Would love
to hear about the kernel, too.

Marek


Re: Better merging of -fPIC/pic/PIE/pie in lto-wrapper

2017-07-17 Thread Jan Hubicka
> On Sat, Jul 8, 2017 at 1:03 PM, Jan Hubicka  wrote:
> > Hi,
> > PR lto/80838 is about lto+profiledbootstrapped compiler being slower than
> > profiledboostrapped compiler.
> >
> > This is caused by the fact that some of the object files linked into the
> > cc1plus binary are built with -fPIC, and lto-wrapper then decides to make
> > the whole binary PIC, which is very slow.  While we probably ought to
> > avoid linking PIC code into a static binary, I saw a similar issue with
> > firefox and other programs.
> >
> > I do not think we want to support mixed PIC/non-PIC symbols internally, 
> > because
> > it would need quite some work and I do not see any reasonable use cases.
> >
> > This patch makes merging more realistic/aggressive.  Linking -fPIC and
> > non-PIC code together results in a non-PIC binary and thus the
> > corresponding flags are dropped when mismatches occur.
> >
> > It would be nice to warn about it, but I do not know how to make warning
> > meaningful on targets that are PIC by default.
> >
> > Bootstrapped/regtested x86_64-linux, plan to commit it tomorrow after
> > lto-boottrapping if there are no complains.
> 
> Hum.  I wonder if/why we can't ask the linker about the output binary kind?
Hmm, you are right. I forgot I also implemented that :).
I believe we need both - disable PIC when the linker tells us we can, and merge
options so we make sane decisions between pic/PIC and pie/PIE. And of course we
have the wonderful non-plugin path.

I believe we got a code quality regression here because I forgot about clearing
flag_shlib for non-dynamic builds:
Index: lto-lang.c
===
--- lto-lang.c  (revision 250245)
+++ lto-lang.c  (working copy)
@@ -840,11 +840,13 @@
  flag_pie is 2.  */
   flag_pie = MAX (flag_pie, flag_pic);
   flag_pic = flag_pie;
+  flag_shlib = 0;
   break;
 
 case LTO_LINKER_OUTPUT_EXEC: /* Normal executable */
   flag_pic = 0;
   flag_pie = 0;
+  flag_shlib = 0;
   break;
 
 case LTO_LINKER_OUTPUT_UNKNOWN:

I will test this and verify that there is no code quality difference without
the option merging then.

Honza

> 
> Richard.


Re: [PATCH PR81374]Record the max index of basic block, rather than # of basic blocks

2017-07-17 Thread Richard Biener
On Mon, Jul 10, 2017 at 4:18 PM, Bin Cheng  wrote:
> Hi,
> This patch fixes an ICE in new loop distribution code.  When computing 
> topological
> order for basic blocks it should record the max index of basic block, rather 
> than
> number of basic blocks.  I didn't add new test because existing tests can 
> catch the
> ICE as well.
>
> Bootstrap and test on x86_64.  Is it OK?

Ok.

Thanks,
Richard.

> Thanks,
> bin
> 2017-07-10  Bin Cheng  
>
> PR tree-optimization/81374
> * tree-loop-distribution.c (pass_loop_distribution::execute): Record
> the max index of basic blocks, rather than number of basic blocks.


Re: [PATCH] Do not allow -fgnu-tm w/ -fsanitize={kernel-,}address (PR sanitizer/81302).

2017-07-17 Thread Jakub Jelinek
On Mon, Jul 17, 2017 at 11:18:02AM +0200, Richard Biener wrote:
> On Mon, Jul 10, 2017 at 11:35 AM, Martin Liška  wrote:
> > Hello
> >
> > Following patch disables -fgnu-tm with AddressSanitizer.
> > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> >
> > Ready to be installed?
> 
> I think it should be a sorry () but no strong opinion.

Certainly better sorry than error, but ideally it should be just supported.
Of course with nobody actively working on -fgnu-tm that might take a while.

Jakub


Re: [PATCH] Do not allow -fgnu-tm w/ -fsanitize={kernel-,}address (PR sanitizer/81302).

2017-07-17 Thread Richard Biener
On Mon, Jul 10, 2017 at 11:35 AM, Martin Liška  wrote:
> Hello
>
> Following patch disables -fgnu-tm with AddressSanitizer.
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

I think it should be a sorry () but no strong opinion.

Thanks,
Richard.

> Martin
>
> gcc/ChangeLog:
>
> 2017-07-04  Martin Liska  
>
> PR sanitizer/81302
> * opts.c (finish_options): Do not allow -fgnu-tm
> w/ -fsanitize={kernel-,}address.
> ---
>  gcc/opts.c | 8 
>  1 file changed, 8 insertions(+)
>
>


[patch,contrib,committed] Remove dead avr-tables.opt from files_and_dependencies.

2017-07-17 Thread Georg-Johann Lay

https://gcc.gnu.org/r250263

This is a small clean-up that removes a dependency on a file
which no longer exists (avr-tables.opt).

Committed as obvious.

Johann


* gcc_update (files_and_dependencies)
[gcc/config/avr/avr-tables.opt]: Remove dead entry.

Index: gcc_update
===
--- gcc_update  (revision 250261)
+++ gcc_update  (working copy)
@@ -82,7 +82,6 @@ gcc/fixinc/fixincl.x: gcc/fixinc/fixincl
 gcc/config/aarch64/aarch64-tune.md: 
gcc/config/aarch64/aarch64-cores.def gcc/config/aarch64/gentune.sh
 gcc/config/arm/arm-tune.md: gcc/config/arm/arm-cpus.in 
gcc/config/arm/parsecpu.awk
 gcc/config/arm/arm-tables.opt: gcc/config/arm/arm-cpus.in 
gcc/config/arm/parsecpu.awk
-gcc/config/avr/avr-tables.opt: gcc/config/avr/avr-mcus.def 
gcc/config/avr/genopt.sh
 gcc/config/avr/t-multilib: gcc/config/avr/avr-mcus.def 
gcc/config/avr/genmultilib.awk
 gcc/config/c6x/c6x-tables.opt: gcc/config/c6x/c6x-isas.def 
gcc/config/c6x/genopt.sh
 gcc/config/c6x/c6x-sched.md: gcc/config/c6x/c6x-sched.md.in 
gcc/config/c6x/gensched.sh


[PATCH][GCC][Arm][COMMITTED] Fix typo in error message when arm_neon.h included.

2017-07-17 Thread Tamar Christina
Hi All,

A typo in arm_neon.h suggests users use -mfloat-abi=softp instead of softfp.

Committed as r25026 under the GCC obvious rule.

gcc/
2017-07-17  Tamar Christina  

* config/arm/arm_neon.h: Fix softp typo.

Thanks,
Tamar

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 65f36e2c91ee0c2950954938f4818f28d7e7e12d..0d436e83d0f01f0c86f8d6a25f84466c841c7e11 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -28,7 +28,7 @@
 #define _GCC_ARM_NEON_H 1
 
 #ifndef __ARM_FP
-#error "NEON intrinsics not available with the soft-float ABI.  Please use -mfloat-abi=softp or -mfloat-abi=hard"
+#error "NEON intrinsics not available with the soft-float ABI.  Please use -mfloat-abi=softfp or -mfloat-abi=hard"
 #else
 
 #pragma GCC push_options


Re: [PATCH, GCC/ARM, ping] Remove ARMv8-M code for D17-D31

2017-07-17 Thread Thomas Preudhomme

Ping?

Best regards,

Thomas

On 12/07/17 09:59, Thomas Preudhomme wrote:

Hi Richard,

On 07/07/17 15:19, Richard Earnshaw (lists) wrote:


Hmm, I think that's because really this is a partial conversion.  It
looks like doing this properly would involve moving that existing code
to use sbitmaps as well.  I think doing that would be better for
long-term maintenance perspectives, but I'm not going to insist that you
do it now.


There's also the assert later, but I've found a way to improve it slightly. While
switching to auto_sbitmap I also changed the code slightly to allocate bitmaps
directly at the right size. Since the change is probably bigger than what you had
in mind, I'd appreciate it if you can give me an OK again. See the updated patch in
attachment. The ChangeLog entry is unchanged:


2017-06-13  Thomas Preud'homme  

 * config/arm/arm.c (arm_option_override): Forbid ARMv8-M Security
 Extensions with more than 16 double VFP registers.
 (cmse_nonsecure_entry_clear_before_return): Remove second entry of
 to_clear_mask and all code related to it.  Replace the remaining
 entry by a sbitmap and adapt code accordingly.



As a result I'll let you take the call as to whether you keep this
version or go back to your earlier patch.  If you do decide to keep this
version, then see the comment below.


Given the changes I'm happier with how the patch looks now, and making it go
in can be a nice incentive to change other ARMv8-M Security Extension related
code later on.


Best regards,

Thomas
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 259597d8890ee84c5bd92b12b6f9f6521c8dcd2e..4d09e891c288c7bf9b519ad7cd62bd38df661a02 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3620,6 +3620,11 @@ arm_option_override (void)
   if (use_cmse && !arm_arch_cmse)
 error ("target CPU does not support ARMv8-M Security Extensions");
 
+  /* We don't clear D16-D31 VFP registers for cmse_nonsecure_call functions
+ and ARMv8-M Baseline and Mainline do not allow such configuration.  */
+  if (use_cmse && LAST_VFP_REGNUM > LAST_LO_VFP_REGNUM)
+error ("ARMv8-M Security Extensions incompatible with selected FPU");
+
   /* Disable scheduling fusion by default if it's not armv7 processor
  or doesn't prefer ldrd/strd.  */
   if (flag_schedule_fusion == 2
@@ -24996,42 +25001,39 @@ thumb1_expand_prologue (void)
 void
 cmse_nonsecure_entry_clear_before_return (void)
 {
-  uint64_t to_clear_mask[2];
+  bool clear_vfpregs = TARGET_HARD_FLOAT && !TARGET_THUMB1;
+  int regno, maxregno = clear_vfpregs ? LAST_VFP_REGNUM : IP_REGNUM;
   uint32_t padding_bits_to_clear = 0;
   uint32_t * padding_bits_to_clear_ptr = &padding_bits_to_clear;
-  int regno, maxregno = IP_REGNUM;
+  auto_sbitmap to_clear_bitmap (LAST_VFP_REGNUM);
   tree result_type;
   rtx result_rtl;
 
-  to_clear_mask[0] = (1ULL << (NUM_ARG_REGS)) - 1;
-  to_clear_mask[0] |= (1ULL << IP_REGNUM);
+  bitmap_clear (to_clear_bitmap);
+  bitmap_set_range (to_clear_bitmap, R0_REGNUM, NUM_ARG_REGS);
+  bitmap_set_bit (to_clear_bitmap, IP_REGNUM);
 
   /* If we are not dealing with -mfloat-abi=soft we will need to clear VFP
  registers.  We also check that TARGET_HARD_FLOAT and !TARGET_THUMB1 hold
  to make sure the instructions used to clear them are present.  */
-  if (TARGET_HARD_FLOAT && !TARGET_THUMB1)
+  if (clear_vfpregs)
 {
-  uint64_t float_mask = (1ULL << (D7_VFP_REGNUM + 1)) - 1;
-  maxregno = LAST_VFP_REGNUM;
-
-  float_mask &= ~((1ULL << FIRST_VFP_REGNUM) - 1);
-  to_clear_mask[0] |= float_mask;
+  int float_bits = D7_VFP_REGNUM - FIRST_VFP_REGNUM + 1;
 
-  float_mask = (1ULL << (maxregno - 63)) - 1;
-  to_clear_mask[1] = float_mask;
+  bitmap_set_range (to_clear_bitmap, FIRST_VFP_REGNUM, float_bits);
 
   /* Make sure we don't clear the two scratch registers used to clear the
 	 relevant FPSCR bits in output_return_instruction.  */
   emit_use (gen_rtx_REG (SImode, IP_REGNUM));
-  to_clear_mask[0] &= ~(1ULL << IP_REGNUM);
+  bitmap_clear_bit (to_clear_bitmap, IP_REGNUM);
   emit_use (gen_rtx_REG (SImode, 4));
-  to_clear_mask[0] &= ~(1ULL << 4);
+  bitmap_clear_bit (to_clear_bitmap, 4);
 }
 
   /* If the user has defined registers to be caller saved, these are no longer
  restored by the function before returning and must thus be cleared for
  security purposes.  */
-  for (regno = NUM_ARG_REGS; regno < LAST_VFP_REGNUM; regno++)
+  for (regno = NUM_ARG_REGS; regno <= maxregno; regno++)
 {
   /* We do not touch registers that can be used to pass arguments as per
 	 the AAPCS, since these should never be made callee-saved by user
@@ -25041,29 +25043,45 @@ cmse_nonsecure_entry_clear_before_return (void)
   if (IN_RANGE (regno, IP_REGNUM, PC_REGNUM))
 	continue;
   if (call_used_regs[regno])
-	to_clear_mask[regno / 64] |= (1ULL << (regno % 64));
+	bitmap_set_bit (to_clear_bitmap, regno);
 }
 
   /* Make sure we do 

Re: [PATCH, GCC/testsuite/ARM, ping] Fix coprocessor intrinsic test failures on ARMv8-A

2017-07-17 Thread Thomas Preudhomme

Ping?

Best regards,

Thomas

On 12/07/17 14:31, Thomas Preudhomme wrote:

Coprocessor intrinsic tests in gcc.target/arm/acle test whether
__ARM_FEATURE_COPROC has the right bit defined before calling the
intrinsic. This allows testing both the correct setting of that macro
and the availability and correct working of the intrinsic. However the
__ARM_FEATURE_COPROC macro is no longer defined for ARMv8-A since
r249399.

This patch changes the testcases to skip that test for ARMv8-A and
ARMv8-R targets.  It also fixes some irregularity in the coprocessor
effective targets:
- add ldcl and stcl to the list of instructions listed as guarded by
   arm_coproc1_ok
- enable tests guarded by arm_coproc2_ok, arm_coproc3_ok and
   arm_coproc4_ok for Thumb-2 capable targets but disable for Thumb-1
   targets.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-07-04  Thomas Preud'homme  

 * gcc.target/arm/acle/cdp.c: Skip __ARM_FEATURE_COPROC check for
 ARMv8-A and ARMv8-R.
 * gcc.target/arm/acle/cdp2.c: Likewise.
 * gcc.target/arm/acle/ldc.c: Likewise.
 * gcc.target/arm/acle/ldc2.c: Likewise.
 * gcc.target/arm/acle/ldc2l.c: Likewise.
 * gcc.target/arm/acle/ldcl.c: Likewise.
 * gcc.target/arm/acle/mcr.c: Likewise.
 * gcc.target/arm/acle/mcr2.c: Likewise.
 * gcc.target/arm/acle/mcrr.c: Likewise.
 * gcc.target/arm/acle/mcrr2.c: Likewise.
 * gcc.target/arm/acle/mrc.c: Likewise.
 * gcc.target/arm/acle/mrc2.c: Likewise.
 * gcc.target/arm/acle/mrrc.c: Likewise.
 * gcc.target/arm/acle/mrrc2.c: Likewise.
 * gcc.target/arm/acle/stc.c: Likewise.
 * gcc.target/arm/acle/stc2.c: Likewise.
 * gcc.target/arm/acle/stc2l.c: Likewise.
 * gcc.target/arm/acle/stcl.c: Likewise.
 * lib/target-supports.exp:
 (check_effective_target_arm_coproc1_ok_nocache): Mention ldcl
 and stcl in the comment.
 (check_effective_target_arm_coproc2_ok_nocache): Allow Thumb-2 targets
 and disable Thumb-1 targets.
 (check_effective_target_arm_coproc3_ok_nocache): Likewise.
 (check_effective_target_arm_coproc4_ok_nocache): Likewise.

Tested by running all tests in gcc.target/arm/acle before and after this
patch for ARMv6-M, ARMv7-M, ARMv7E-M, ARMv3, ARMv4 (ARM state), ARMv4T
(Thumb state), ARMv5 (ARM state), ARMv5TE (ARM state), ARMv6 (ARM
state), ARMv6T2 (Thumb state) and ARMv8-A (both states). The only
changes are for ARMv8-A where tests FAILing are now PASSing again.

Is this ok for trunk?

Best regards,

Thomas
diff --git a/gcc/testsuite/gcc.target/arm/acle/cdp.c b/gcc/testsuite/gcc.target/arm/acle/cdp.c
index cebd8c4024ea1930f490f63e5267a33bac59a3a8..cfa922a797cddbf4a99f27ec156fd2d2fc9a460d 100644
--- a/gcc/testsuite/gcc.target/arm/acle/cdp.c
+++ b/gcc/testsuite/gcc.target/arm/acle/cdp.c
@@ -5,7 +5,8 @@
 /* { dg-require-effective-target arm_coproc1_ok } */
 
 #include "arm_acle.h"
-#if (__ARM_FEATURE_COPROC & 0x1) == 0
+#if (__ARM_ARCH < 8 || !defined (__ARM_ARCH_ISA_ARM)) \
+&& (__ARM_FEATURE_COPROC & 0x1) == 0
   #error "__ARM_FEATURE_COPROC does not have correct feature bits set"
 #endif
 
diff --git a/gcc/testsuite/gcc.target/arm/acle/cdp2.c b/gcc/testsuite/gcc.target/arm/acle/cdp2.c
index 945d435d2fb99962ff47d921d9cb3633cb75bb79..b18076c26274043be8ac71e6516b9b6eac3b4137 100644
--- a/gcc/testsuite/gcc.target/arm/acle/cdp2.c
+++ b/gcc/testsuite/gcc.target/arm/acle/cdp2.c
@@ -5,7 +5,8 @@
 /* { dg-require-effective-target arm_coproc2_ok } */
 
 #include "arm_acle.h"
-#if (__ARM_FEATURE_COPROC & 0x2) == 0
+#if (__ARM_ARCH < 8 || !defined (__ARM_ARCH_ISA_ARM)) \
+&& (__ARM_FEATURE_COPROC & 0x2) == 0
   #error "__ARM_FEATURE_COPROC does not have correct feature bits set"
 #endif
 
diff --git a/gcc/testsuite/gcc.target/arm/acle/ldc.c b/gcc/testsuite/gcc.target/arm/acle/ldc.c
index cd57343208fc5b17e5391d11d126d20e224d6566..10c879f4a15e7c293541c61dc974d972798ecedf 100644
--- a/gcc/testsuite/gcc.target/arm/acle/ldc.c
+++ b/gcc/testsuite/gcc.target/arm/acle/ldc.c
@@ -5,7 +5,8 @@
 /* { dg-require-effective-target arm_coproc1_ok } */
 
 #include "arm_acle.h"
-#if (__ARM_FEATURE_COPROC & 0x1) == 0
+#if (__ARM_ARCH < 8 || !defined (__ARM_ARCH_ISA_ARM)) \
+&& (__ARM_FEATURE_COPROC & 0x1) == 0
   #error "__ARM_FEATURE_COPROC does not have correct feature bits set"
 #endif
 
diff --git a/gcc/testsuite/gcc.target/arm/acle/ldc2.c b/gcc/testsuite/gcc.target/arm/acle/ldc2.c
index d7691e30d763d1e921817fd586b47888e1b5c78f..d561adacccf358a1dbfa9db253c9bc08847c7e33 100644
--- a/gcc/testsuite/gcc.target/arm/acle/ldc2.c
+++ b/gcc/testsuite/gcc.target/arm/acle/ldc2.c
@@ -5,7 +5,8 @@
 /* { dg-require-effective-target arm_coproc2_ok } */
 
 #include "arm_acle.h"
-#if (__ARM_FEATURE_COPROC & 0x2) == 0
+#if (__ARM_ARCH < 8 || !defined (__ARM_ARCH_ISA_ARM)) \
+&& (__ARM_FEATURE_COPROC & 0x2) == 0
   #error "__ARM_FEATURE_COPROC does not have correct feature bits set"
 #endif
 
diff --git a/gcc/testsuite/gcc.target/arm/acle/ldc2l.c b/gcc/testsuite

Re: [patch] Fix ICE on CONSTRUCTOR containing absolute addresses

2017-07-17 Thread Eric Botcazou
> Apart from the MEM construction where I simply trust you this looks
> ok.  Mind adding MEM_REF support for this case as well?

Do you mean MEM_REF ?  Is that possible?

> Btw, why's simply output_constant_def (TREE_OPERAND (target, 0), 1);
> not correct?

If you do that, you get a symbol in the constant pool whose value (address) is
arbitrary; here what we want is a fixed value.  That being said, given that
the contents of the constant pool are hashed, there is very likely not much
difference in the end, although that would be conceptually incorrect.

> Isn't this about &*0x1?

Yes, it's not the address of a constant, it's the address of an object whose 
base address is absolute, so &(abs_address)->field[index].  This kind of thing 
is not folded by build_fold_addr_expr.

-- 
Eric Botcazou


Re: [PATCH] Fix wrong-code aggregate propagate_with_phi bug (PR tree-optimization/81365)

2017-07-17 Thread Richard Biener
On Thu, 13 Jul 2017, Jakub Jelinek wrote:

> Hi!
> 
> As mentioned in the PR, for aggregate copies we fail to verify there
> are no loads in between the PHI and the aggregate copy that could alias with 
> the
> lhs of the copy, which is needed, because we want to hoist the aggregate
> copy onto predecessor edges of the bb with the PHI.
> 
> The following patch implements that, bootstrapped/regtested on x86_64-linux
> and i686-linux, ok for trunk and after a while 7.x?
> 
> 2017-07-13  Jakub Jelinek  
> 
>   PR tree-optimization/81365
>   * tree-ssa-phiprop.c (propagate_with_phi): When considering hoisting
>   aggregate moves onto bb predecessor edges, make sure there are no
>   loads that could alias the lhs in between the start of bb and the
>   loads from *phi.  If there are any debug stmts that could alias,
>   reset them.
> 
>   * g++.dg/torture/pr81365.C: New test.
> 
> --- gcc/tree-ssa-phiprop.c.jj 2017-05-22 10:50:11.0 +0200
> +++ gcc/tree-ssa-phiprop.c2017-07-11 16:52:41.012340615 +0200
> @@ -327,7 +327,7 @@ propagate_with_phi (basic_block bb, gphi
>if (!dominated_by_p (CDI_POST_DOMINATORS,
>  bb, gimple_bb (use_stmt)))
>   continue;
> - 
> +
>/* Check whether this is a load of *ptr.  */
>if (!(is_gimple_assign (use_stmt)
>   && gimple_assign_rhs_code (use_stmt) == MEM_REF
> @@ -356,6 +356,9 @@ propagate_with_phi (basic_block bb, gphi
>   insert aggregate copies on the edges instead.  */
>if (!is_gimple_reg_type (TREE_TYPE (TREE_TYPE (ptr
>   {
> +   if (!gimple_vdef (use_stmt))
> + goto next;
> +
> /* As we replicate the lhs on each incoming edge all
>used SSA names have to be available there.  */
> if (! for_each_index (gimple_assign_lhs_ptr (use_stmt),
> @@ -363,6 +366,51 @@ propagate_with_phi (basic_block bb, gphi
>   get_immediate_dominator (CDI_DOMINATORS,
>gimple_bb (phi
>   goto next;
> +
> +   gimple *vuse_stmt;
> +   imm_use_iterator vui;
> +   use_operand_p vuse_p;
> +   bool debug_use_seen = false;
> +   /* In order to move the aggregate copies earlier, make sure
> +  there are no statements that could read from memory
> +  aliasing the lhs in between the start of bb and use_stmt.
> +  As we require use_stmt to have a VDEF above, loads after
> +  use_stmt will use a different virtual SSA_NAME.  */
> +   FOR_EACH_IMM_USE_FAST (vuse_p, vui, vuse)
> + {
> +   vuse_stmt = USE_STMT (vuse_p);
> +   if (vuse_stmt == use_stmt)
> + continue;
> +   if (!dominated_by_p (CDI_DOMINATORS,
> +gimple_bb (vuse_stmt), bb))
> + continue;
> +   if (ref_maybe_used_by_stmt_p (vuse_stmt,
> + gimple_assign_lhs (use_stmt)))
> + {
> +   if (is_gimple_debug (vuse_stmt))

debug stmts do not have virtual operands so their handling is not
necessary here.

Ok with that change.

Thanks,
Richard.


> + debug_use_seen = true;
> +   else
> + goto next;
> + }
> + }
> +   /* Debug stmt uses should not prevent the transformation, but
> +  if we saw any, reset those debug stmts.  */
> +   if (debug_use_seen)
> + FOR_EACH_IMM_USE_STMT (vuse_stmt, vui, vuse)
> +   {
> + if (!is_gimple_debug (vuse_stmt))
> +   continue;
> + if (!dominated_by_p (CDI_DOMINATORS,
> +  gimple_bb (vuse_stmt), bb))
> +   continue;
> + if (ref_maybe_used_by_stmt_p (vuse_stmt,
> +   gimple_assign_lhs (use_stmt)))
> +   {
> + gimple_debug_bind_reset_value (vuse_stmt);
> + update_stmt (vuse_stmt);
> +   }
> +   }
> +
> phiprop_insert_phi (bb, phi, use_stmt, phivn, n);
>  
> /* Remove old stmt.  The phi is taken care of by DCE.  */
> --- gcc/testsuite/g++.dg/torture/pr81365.C.jj 2017-07-11 17:07:11.107130111 
> +0200
> +++ gcc/testsuite/g++.dg/torture/pr81365.C2017-07-11 17:06:52.0 
> +0200
> @@ -0,0 +1,39 @@
> +// PR tree-optimization/81365
> +// { dg-do run }
> +
> +struct A { unsigned a; };
> +
> +struct B {
> +  B (const A *x)
> +  {
> +__builtin_memcpy (b, x, 3 * sizeof (A));
> +__builtin_memcpy (c, x + 3, sizeof (A));
> +__builtin_memset (c + 1, 0, sizeof (A));
> +  }
> +  bool
> +  foo (unsigned x)
> +  {
> +A *it = c;
> +if (it->a == x || (++it)->a == x)
> +  {
> + A t(b[0]);
> + b[0] = *it;
> + *it = t;
> + return true;
> +  }
> +return false;
> +  }
> +  A b[3];
> +  A c[2];
> +};
> +
> +int
> +main ()
> +{
> 

Re: [PATCH, GCC/ARM] Rewire -mfpu=fp-armv8 as VFPv5 + D32 + DP

2017-07-17 Thread Thomas Preudhomme

[resend without HTML formatting]


On 14/07/17 16:29, Thomas Preudhomme wrote:

Hi Richard,


Hi,



I've committed the requested change as a separate patch to make it easier to
backport to earlier GCC versions. While looking into backporting r250206 I read
the ARM C Language Extension documentation again and realized that
__ARM_FEATURE_NUMERIC_MAXMIN is for *vector* min and max instructions and
intrinsics. Therefore the previous definition was correct. Sorry for the
mistake.


This reverts commit r250206.

2017-07-15  Thomas Preud'homme  

Revert:
2017-07-14  Thomas Preud'homme  

* config/arm/arm-c.c (arm_cpu_builtins): Define
__ARM_FEATURE_NUMERIC_MAXMIN solely based on TARGET_VFP5.

Best regards,


Thomas



Definition of __ARM_FEATURE_NUMERIC_MAXMIN checks for
TARGET_ARM_ARCH >= 8 and TARGET_NEON being true in addition to
TARGET_VFP5. However, instructions covered by this macro are part of
FPv5 which is available in ARMv7E-M architecture. This patch fixes the
macro to only check for TARGET_VFP5.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

 * config/arm/arm-c.c (arm_cpu_builtins): Define
 __ARM_FEATURE_NUMERIC_MAXMIN solely based on TARGET_VFP5.

Built and confirmed that the macro is now defined when building with
-march=armv7e-m+fpv5 -mfloat-abi=hard.

Best regards,

Thomas

On 14/07/17 15:43, Richard Earnshaw (lists) wrote:

On 14/07/17 09:20, Thomas Preudhomme wrote:

Hi,

fp-armv8 is currently defined as a double precision FPv5 with 32 D
registers *and* a special FP_ARMv8 bit. However FP for ARMv8 should only
bring 32 D registers on top of FPv5-D16 so this FP_ARMv8 bit is
spurious. As a consequence, many instruction patterns which are guarded
by TARGET_FPU_ARMV8 are unavailable to FPv5-D16 and FPv5-SP-D16.

This patch gets rid of TARGET_FPU_ARMV8 and rewire all uses to
expressions based on TARGET_VFP5, TARGET_VFPD32 and TARGET_VFP_DOUBLE.
It also redefines ISA_FP_ARMv8 to include the D32 capability to
distinguish it from FPv5-D16. Finally, it sets +fp.sp for ARMv8-R to
enable FPv5-SP-D16 (i.e. FP for ARMv8 with single precision only and 16 D
registers).

ChangeLog entry is as follows:

2017-07-07  Thomas Preud'homme  

 * config/arm/arm-isa.h (isa_bit_FP_ARMv8): Delete enumerator.
 (ISA_FP_ARMv8): Define as ISA_FPv5 and ISA_FP_D32.
 * config/arm/arm-cpus.in (armv8-r): Define fp.sp as enabling FPv5.
 (fp-armv8): Define it as FP_ARMv8 only.
 config/arm/arm.h (TARGET_FPU_ARMV8): Delete.
 (TARGET_VFP_FP16INST): Define using TARGET_VFP5 rather than
 TARGET_FPU_ARMV8.
 config/arm/arm.c (arm_rtx_costs_internal): Replace checks against
 TARGET_FPU_ARMV8 by checks against TARGET_VFP5.
 * config/arm/arm-builtins.c (arm_builtin_vectorized_function): Define
 first ARM_CHECK_BUILTIN_MODE definition using TARGET_VFP5 rather
 than TARGET_FPU_ARMV8.
 * config/arm/arm-c.c (arm_cpu_builtins): Likewise for
 __ARM_FEATURE_NUMERIC_MAXMIN macro definition.
 * config/arm/arm.md (cmov): Condition on TARGET_VFP5 rather than
 TARGET_FPU_ARMV8.
 * config/arm/neon.md (neon_vrint): Likewise.
 (neon_vcvt): Likewise.
 (neon_): Likewise.
 (3): Likewise.
 * config/arm/vfp.md (lsi2): Likewise.
 * config/arm/predicates.md (arm_cond_move_operator): Check against
 TARGET_VFP5 rather than TARGET_FPU_ARMV8 and fix spacing.

Testing:
   * Bootstrapped under ARMv8-A Thumb state and ran testsuite -> no
regression
   * built Spec2000 and Spec2006 with -march=armv8-a+fp16 and compared
objdump -> no code generation difference

Is this ok for trunk?


OK with changes mentioned below.

R.



Best regards,

Thomas

rewire_mfpu_fparmv8.patch


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
63ee880822c17eda55dd58438d61cbbba333b2c6..7504ed581c63a657a0dff48442633704bd252b2e 
100644

--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -3098,7 +3098,7 @@ arm_builtin_vectorized_function (unsigned int fn, tree 
type_out, tree type_in)

 NULL_TREE is returned if no such builtin is available.  */
  #undef ARM_CHECK_BUILTIN_MODE
  #define ARM_CHECK_BUILTIN_MODE(C)\
-  (TARGET_FPU_ARMV8   \
+  (TARGET_VFP5   \
 && flag_unsafe_math_optimizations \
 && ARM_CHECK_BUILTIN_MODE_1 (C))
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 
a3daa3220a2bc4220dffdb7ca08ca9419bdac425..9178937b6d9e0fe5d0948701390c4cf01f4f8c7d 
100644

--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -96,7 +96,7 @@ arm_cpu_builtins (struct cpp_reader* pfile)
 || TARGET_ARM_ARCH_ISA_THUMB >=2));
def_or_undef_macro (pfile, "__ARM_FEATURE_NUMERIC_MAXMIN",
-  TARGET_ARM_ARCH >= 8 && TARGET_NEON && TARGET_FPU_ARMV8);
+  TARGET_ARM_ARCH >= 8 && TARGET_NEON && TARGET_VFP5);


This looks wrong (though ACLE is misleading).  The MAXMIN property is
solely defined by having an FPv5 capable FPU.


def_or_und

[nvptx, committed, PR81069] Insert diverging jump alap in nvptx_single

2017-07-17 Thread Tom de Vries

Hi,

Consider nvptx_single:
...
/* Single neutering according to MASK.  FROM is the incoming block and
   TO is the outgoing block.  These may be the same block. Insert at
   start of FROM:

 if (tid.) goto end.

   and insert before ending branch of TO (if there is such an insn):

 end:
 
 

   We currently only use different FROM and TO when skipping an entire
   loop.  We could do more if we detected superblocks.  */

static void
nvptx_single (unsigned mask, basic_block from, basic_block to)
...

When compiling libgomp.oacc-fortran/nested-function-1.f90 at -O1, we 
observed the following pattern:

...
:
 goto bb3;

: (with single predecessor)
 
 
...

which was translated by nvptx_single into:
...

 if (tid.) goto end.
 goto bb3;

:
 
 end:
 
 
...

There is no benefit to be gained from doing the goto bb3 in neutered 
mode, and there is no need to, so we might as well insert the neutering 
branch as late as possible:

...

 goto bb3;

:
 if (tid.) goto end.
 
 end:
 
 
...

This patch implements inserting the neutering branch as late as possible.

[ As it happens, the actual code for 
libgomp.oacc-fortran/nested-function-1.f90 at -O1 was more complicated: 
there were other bbs in between bb2 and bb3. While this doesn't change
anything from a control flow graph point of view, it did trigger a bug 
in the ptx JIT compiler where it inserts the synchronization point for 
the diverging branch later than the immediate post-dominator point at 
the end label. Consequently, the condition broadcast was executed in 
divergent mode (which is known to give undefined results), resulting in 
a hang.
This patch also works around this ptx JIT compiler bug, for this 
test-case. ]


Built and tested on x86_64 with nvptx accelerator.

Committed.

Thanks,
- Tom
Insert diverging jump alap in nvptx_single

2017-07-17  Tom de Vries  

	PR target/81069
	* config/nvptx/nvptx.c (nvptx_single): Insert diverging branch as late
	as possible.

---
 gcc/config/nvptx/nvptx.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index daeec27..cb11686 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -3866,9 +3866,25 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
   rtx_insn *tail = BB_END (to);
   unsigned skip_mask = mask;
 
-  /* Find first insn of from block */
-  while (head != BB_END (from) && !INSN_P (head))
-head = NEXT_INSN (head);
+  while (true)
+{
+  /* Find first insn of from block.  */
+  while (head != BB_END (from) && !INSN_P (head))
+	head = NEXT_INSN (head);
+
+  if (from == to)
+	break;
+
+  if (!(JUMP_P (head) && single_succ_p (from)))
+	break;
+
+  basic_block jump_target = single_succ (from);
+  if (!single_pred_p (jump_target))
+	break;
+
+  from = jump_target;
+  head = BB_HEAD (from);
+}
 
   /* Find last insn of to block */
   rtx_insn *limit = from == to ? head : BB_HEAD (to);


Re: [PATCH] Fix qsort ordering violation in tree-vrp.c

2017-07-17 Thread Richard Biener
On Sat, 15 Jul 2017, Yuri Gribov wrote:

> Hi,
> 
> This is a follow-up on https://gcc.gnu.org/ml/gcc/2017-07/msg00078.html
> 
> compare_assert_loc in tree-vrp.c could return unpredictable results
> due to implicit conversion of unsigned subtraction to int here:
>   return ha - hb;
> 
> This could return inconsistent results for objects with the following hashes:
>   a: 0xdc8e4f72U
>   b: 0x912ab538U
>   c: 0x5ae66e3bU
> Then
>   a > b because (int)(0xdc8e4f72U - 0x912ab538U) > 0
>   b > c because (int)(0x912ab538U - 0x5ae66e3bU) > 0
> but
>   a < c because (int)(0xdc8e4f72U - 0x5ae66e3bU) == (int)0x81a7e137U < 0
> 
> Bug was found with https://github.com/yugr/sortcheck
> 
> Bootstrapped and regtested on x64, ok for trunk?

Ok.

Thanks,
Richard.


Re: Better merging of -fPIC/pic/PIE/pie in lto-wrapper

2017-07-17 Thread Richard Biener
On Sat, Jul 8, 2017 at 1:03 PM, Jan Hubicka  wrote:
> Hi,
> PR lto/80838 is about lto+profiledbootstrapped compiler being slower than
> profiledbootstrapped compiler.
>
> This is caused by the fact that some of the object files linked into the
> cc1plus binary are built with -fPIC, and lto-wrapper then decides to make
> the whole binary PIC, which is very slow.  We probably ought to avoid
> linking PIC code into a static binary, but I saw similar issues with
> firefox and other programs.
>
> I do not think we want to support mixed PIC/non-PIC symbols internally, 
> because
> it would need quite some work and I do not see any reasonable use cases.
>
> This patch makes merging more realistic/aggressive.  Linking -fPIC and
> non-PIC code together results in a non-PIC binary, and thus the
> corresponding flags are dropped when a mismatch occurs.
>
> It would be nice to warn about it, but I do not know how to make warning
> meaningful on targets that are PIC by default.
>
> Bootstrapped/regtested x86_64-linux, plan to commit it tomorrow after
> lto-bootstrapping if there are no complaints.

Hum.  I wonder if/why we can't ask the linker about the output binary kind?

Richard.

> Honza
>
> PR lto/80838
> * lto-wrapper.c (remove_option): New function.
> (merge_and_complain): Merge PIC/PIE options more realistically.
> Index: lto-wrapper.c
> ===
> --- lto-wrapper.c   (revision 250054)
> +++ lto-wrapper.c   (working copy)
> @@ -192,6 +192,20 @@ append_option (struct cl_decoded_option
>   sizeof (struct cl_decoded_option));
>  }
>
> +/* Remove option number INDEX from DECODED_OPTIONS, update
> +   DECODED_OPTIONS_COUNT.  */
> +
> +static void
> +remove_option (struct cl_decoded_option **decoded_options,
> +  int index, unsigned int *decoded_options_count)
> +{
> +  --*decoded_options_count;
> +  memmove (&(*decoded_options)[index],
> +  &(*decoded_options)[index + 1],
> +  sizeof (struct cl_decoded_option)
> +  * (*decoded_options_count - index));
> +}
> +
>  /* Try to merge and complain about options FDECODED_OPTIONS when applied
> ontop of DECODED_OPTIONS.  */
>
> @@ -202,6 +216,8 @@ merge_and_complain (struct cl_decoded_op
> unsigned int fdecoded_options_count)
>  {
>unsigned int i, j;
> +  struct cl_decoded_option *pic_option = NULL;
> +  struct cl_decoded_option *pie_option = NULL;
>
>/* ???  Merge options from files.  Most cases can be
>   handled by either unioning or intersecting
> @@ -238,10 +254,6 @@ merge_and_complain (struct cl_decoded_op
> case OPT_fdiagnostics_show_option:
> case OPT_fdiagnostics_show_location_:
> case OPT_fshow_column:
> -   case OPT_fPIC:
> -   case OPT_fpic:
> -   case OPT_fPIE:
> -   case OPT_fpie:
> case OPT_fcommon:
> case OPT_fgnu_tm:
>   /* Do what the old LTO code did - collect exactly one option
> @@ -255,6 +267,16 @@ merge_and_complain (struct cl_decoded_op
> append_option (decoded_options, decoded_options_count, foption);
>   break;
>
> +   /* Figure out what PIC/PIE level wins and merge the results.  */
> +   case OPT_fPIC:
> +   case OPT_fpic:
> + pic_option = foption;
> + break;
> +   case OPT_fPIE:
> +   case OPT_fpie:
> + pie_option = foption;
> + break;
> +
> case OPT_fopenmp:
> case OPT_fopenacc:
> case OPT_fcilkplus:
> @@ -286,18 +308,6 @@ merge_and_complain (struct cl_decoded_op
>  foption->orig_option_with_args_text);
>   break;
>
> -   case OPT_foffload_abi_:
> - for (j = 0; j < *decoded_options_count; ++j)
> -   if ((*decoded_options)[j].opt_index == foption->opt_index)
> - break;
> - if (j == *decoded_options_count)
> -   append_option (decoded_options, decoded_options_count, foption);
> - else if (foption->value != (*decoded_options)[j].value)
> -   fatal_error (input_location,
> -"Option %s not used consistently in all LTO input"
> -" files", foption->orig_option_with_args_text);
> - break;
> -
> case OPT_O:
> case OPT_Ofast:
> case OPT_Og:
> @@ -368,12 +378,70 @@ merge_and_complain (struct cl_decoded_op
>   (*decoded_options)[j].value = 1;
> }
>   break;
> +
> +
> +   case OPT_foffload_abi_:
> + for (j = 0; j < *decoded_options_count; ++j)
> +   if ((*decoded_options)[j].opt_index == foption->opt_index)
> + break;
> + if (j == *decoded_options_count)
> +   append_option (decoded_options, decoded_options_count, foption);
> + else if (foption->value != (*decoded_options)[j].value)
> +   fatal_error (input_location,
> +"Option %s not used consistently in all LTO 

Re: C/C++ PATCH to implement -Wmultistatement-macros (PR c/80116)

2017-07-17 Thread Gerald Pfeifer
On Sat, 10 Jun 2017, Gerald Pfeifer wrote:
> I'm curious to see how many issues this is going to find in real-world
> code out there!

This is a bit anecdotal, but your code did find one real issue in Wine:

 https://www.winehq.org/pipermail/wine-patches/2017-July/163551.html
(and https://www.winehq.org/pipermail/wine-patches/2017-July/163550.html ) 

The first patch was accepted, the second (but earlier one) not.

Gerald


Re: [patch] Fix ICE on CONSTRUCTOR containing absolute addresses

2017-07-17 Thread Richard Biener
On Fri, Jul 7, 2017 at 1:08 PM, Eric Botcazou  wrote:
> Hi,
>
> this fixes the following ICE in decode_addr_const:
>
> +===GNAT BUG DETECTED==+
> | 8.0.0 20170704 (experimental) [trunk revision 249942] (x86_64-suse-linux)
> GCC error:|
> | in decode_addr_const, at varasm.c:2880
>
> stemming from a CONSTRUCTOR containing absolute addresses hidden behind a
> COMPONENT_REF or similar references.
>
> Fixed by adding support for INDIRECT_REF  to decode_addr_const.
>
> Tested on x86_64-suse-linux, OK for the mainline?

Apart from the MEM construction, where I simply trust you, this looks ok.
Mind adding MEM_REF support for this case as well?

Btw, why's simply output_constant_def (TREE_OPERAND (target, 0), 1);
not correct?

Isn't this about &*0x1?

Thanks,
Richard.

>
> 2017-07-07  Eric Botcazou  
>
> * varasm.c (decode_addr_const): Deal with INDIRECT_REF .
>
>
> 2017-07-07  Eric Botcazou  
>
> * gnat.dg/aggr22.ad[sb]: New test.
>
> --
> Eric Botcazou


Re: [PATCH] Improve bswap on nop non-base_addr reshuffles (PR tree-optimization/81396)

2017-07-17 Thread Richard Biener
On Thu, 13 Jul 2017, Jakub Jelinek wrote:

> Hi!
> 
> As mentioned in the PR, the following testcase recently started using
> BIT_FIELD_REFs instead of MEM_REFs, and thus the bswap pass, while it
> properly determines the very long sequence of stmts is a nop transformation,
> throws that away and doesn't optimize it, and no other optimizations
> are able to optimize it away.
> 
> The patch attempts to not do anything if there is a simple identity
> copy, but if the nop reshuffle needs more than one operation, it will
> try to replace the final SSA_NAME BIT_IOR_EXPR assignment with assignment
> from the src value (typically SSA_NAME).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2017-07-13  Jakub Jelinek  
> 
>   PR tree-optimization/81396
>   * tree-ssa-math-opts.c (struct symbolic_number): Add n_ops field.
>   (init_symbolic_number): Initialize it to 1.
>   (perform_symbolic_merge): Add n_ops from both operands into the new
>   n_ops.
>   (find_bswap_or_nop): Don't consider n->n == cmpnop computations
>   without base_addr as useless if they need more than one operation.
>   (bswap_replace): Handle !bswap case for NULL base_addr.
> 
>   * gcc.dg/tree-ssa/pr81396.c: New test.
> 
> --- gcc/tree-ssa-math-opts.c.jj   2017-07-06 20:31:43.0 +0200
> +++ gcc/tree-ssa-math-opts.c  2017-07-13 19:27:02.985354778 +0200
> @@ -1968,6 +1968,7 @@ struct symbolic_number {
>tree alias_set;
>tree vuse;
>unsigned HOST_WIDE_INT range;
> +  int n_ops;
>  };

Ok if you document n_ops a bit in the comment above the structure.

Thanks,
Richard.

>  #define BITS_PER_MARKER 8
> @@ -2083,6 +2084,7 @@ init_symbolic_number (struct symbolic_nu
>  return false;
>n->range = size;
>n->n = CMPNOP;
> +  n->n_ops = 1;
>  
>if (size < 64 / BITS_PER_MARKER)
>  n->n &= ((uint64_t) 1 << (size * BITS_PER_MARKER)) - 1;
> @@ -2293,6 +2295,7 @@ perform_symbolic_merge (gimple *source_s
>   return NULL;
>  }
>n->n = n1->n | n2->n;
> +  n->n_ops = n1->n_ops + n2->n_ops;
>  
>return source_stmt;
>  }
> @@ -2588,7 +2591,7 @@ find_bswap_or_nop (gimple *stmt, struct
>  return NULL;
>  
>/* Useless bit manipulation performed by code.  */
> -  if (!n->base_addr && n->n == cmpnop)
> +  if (!n->base_addr && n->n == cmpnop && n->n_ops == 1)
>  return NULL;
>  
>n->range *= BITS_PER_UNIT;
> @@ -2747,6 +2750,36 @@ bswap_replace (gimple *cur_stmt, gimple
>   }
>src = val_tmp;
>  }
> +  else if (!bswap)
> +{
> +  gimple *g;
> +  if (!useless_type_conversion_p (TREE_TYPE (tgt), TREE_TYPE (src)))
> + {
> +   if (!is_gimple_val (src))
> + return false;
> +   g = gimple_build_assign (tgt, NOP_EXPR, src);
> + }
> +  else
> + g = gimple_build_assign (tgt, src);
> +  if (n->range == 16)
> + nop_stats.found_16bit++;
> +  else if (n->range == 32)
> + nop_stats.found_32bit++;
> +  else
> + {
> +   gcc_assert (n->range == 64);
> +   nop_stats.found_64bit++;
> + }
> +  if (dump_file)
> + {
> +   fprintf (dump_file,
> +"%d bit reshuffle in target endianness found at: ",
> +(int) n->range);
> +   print_gimple_stmt (dump_file, cur_stmt, 0);
> + }
> +  gsi_replace (&gsi, g, true);
> +  return true;
> +}
>else if (TREE_CODE (src) == BIT_FIELD_REF)
>  src = TREE_OPERAND (src, 0);
>  
> --- gcc/testsuite/gcc.dg/tree-ssa/pr81396.c.jj2017-07-13 
> 19:22:10.191954620 +0200
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr81396.c   2017-07-13 19:24:16.638399984 
> +0200
> @@ -0,0 +1,25 @@
> +/* PR tree-optimization/81396 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +typedef unsigned long long uint64_t;
> +
> +uint64_t
> +foo (uint64_t word)
> +{
> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ && __SIZEOF_LONG_LONG__ == 8
> +  const unsigned char *const ptr = (const unsigned char *) &word;
> +  return ((uint64_t) ptr[0]
> +   | ((uint64_t) ptr[1] << 8)
> +   | ((uint64_t) ptr[2] << (8 * 2))
> +   | ((uint64_t) ptr[3] << (8 * 3))
> +   | ((uint64_t) ptr[4] << (8 * 4))
> +   | ((uint64_t) ptr[5] << (8 * 5))
> +   | ((uint64_t) ptr[6] << (8 * 6))
> +   | ((uint64_t) ptr[7] << (8 * 7)));
> +#else
> +  return word;
> +#endif
> +}
> +
> +/* { dg-final { scan-tree-dump "return word_\[0-9]*\\(D\\);" "optimized" } } 
> */
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCHv4][PING][PR 57371] Remove useless floating point casts in comparisons

2017-07-17 Thread Yuri Gribov
Hi all,

This is an updated version of patch in
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00409.html . It prevents
optimization in presense of sNaNs (and qNaNs when comparison operator
is > >= < <=) to preserve FP exceptions.

Note that I had to use -fsignaling-nans in pr57371-5.c test because by
default this option is off and some existing patterns in match.pd
happily optimize NaN comparisons, even with sNaNs (!).

Bootstrapped and regtested on x64. Ok for trunk?

-Y


pr57371-4.patch
Description: Binary data


[RFC][PR 67336][PING^6] Verify pointers during stack unwind

2017-07-17 Thread Yuri Gribov
Hi all,

I've rebased the previous patch to trunk per Andrew's suggestion.
Original patch description/motivation/questions are in
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html

-Y


safe-unwind-2.patch
Description: Binary data
#include 
#include 

struct _Unwind_Context;

typedef int (*_Unwind_Trace_Fn)(struct _Unwind_Context *, void *vdata);

extern int _Unwind_Backtrace(_Unwind_Trace_Fn trace, void * trace_argument);
extern int _Unwind_Backtrace_Checked(_Unwind_Trace_Fn trace, void * trace_argument);

#ifdef CHECK_UNWIND
#define _Unwind_Backtrace _Unwind_Backtrace_Checked
#endif

extern void *_Unwind_GetIP (struct _Unwind_Context *context);

int simple_unwind (struct _Unwind_Context *context, void *vdata) {
  printf("Next frame: ");
  void *pc = _Unwind_GetIP(context);
  printf("%p\n", pc);
  return 0;
}

#define noinline __attribute__((noinline))

noinline int foo() {
  // Clobber stack to provoke errors in unwinder
  int x;
  void *p = &x;
  asm("" :: "r"(p));
  memset(p, 0xa, 128);

  printf("After clobbering stack\n");

  int ret = _Unwind_Backtrace(simple_unwind, 0);
  printf("After unwind: %d\n", ret);
  printf("We're going to fail now\n");

  return 0;
}

noinline int bar() {
  int x = foo();
  return x + 1;
}

int main() {
  bar();
  return 0;
}


[PATCH][PING] Improve extraction of changed file in contrib/mklog

2017-07-17 Thread Yuri Gribov
Hi,

Currently mklog will fail to analyze lines like this in patches:
diff -rupN gcc/gcc/testsuite/lib/profopt.exp
gcc-compare-checks/gcc/testsuite/lib/profopt.exp
(it fails with "Error: failed to parse diff for ... and ...").

This patch fixes it. Ok for trunk?

-Y


mklog-filename-fix-1.patch
Description: Binary data


[PATCHv2][PING^2][PR 56727] Bypass PLT for recursive calls

2017-07-17 Thread Yuri Gribov
Hi all,

This is a new version of previous patch
(https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00020.html), fixed
after Rainer's remarks.

-Y


pr56727-2.patch
Description: Binary data