Re: [PATCH] Make strlen range computations more conservative

2018-07-31 Thread Bernd Edlinger
> Certainly not every "strlen" has these semantics.  For example,
> this open-coded one doesn't:
> 
>   int len = 0;
>   for (int i = 0; s.a[i]; ++i)
>     ++len;
> 
> It computes 2 (with no warning for the out-of-bounds access).
> 

Yes, which is questionable as well, but that happens only
if the source code accesses the array via s.a[i],
not if it happens to use a char *, as this experiment shows:

$ cat y1.c
int len (const char *x)
{
   int len = 0;
   for (int i = 0; x[i]; ++i)
     ++len;
   return len;
}

const char a[3] = "123";

int main ()
{
   return len(a);
}

$ gcc -O3 y1.c
$  ./a.out ; echo $?
3

The loop is not optimized away.

$ cat y2.c
const char a[3] = "123";

int main ()
{
   int len = 0;
   for (int i = 0; a[i]; ++i)
     ++len;
   return len;
}

$ gcc -O3 y2.c
$ ./a.out ; echo $?
2


The point I am making is that it is impossible to know where the function
will be inlined, and whether the original code can then be broken in
surprising ways.  Most importantly, strlen is often used in
security-relevant ways.


> So if the standard doesn't guarantee it and different kinds
> of accesses behave differently, how do we explain what "works"
> and what doesn't without relying on GCC implementation details?
> 
> If we can't then the only language we have in common with users
> is the standard.  (This, by the way, is what the C memory model
> group is trying to address -- the language or feature that's
> missing from the standard that says when, if ever, these things
> might be valid.)

Sorry, but there are examples of undefined behaviour that GCC deliberately
does not exploit for code optimizations, only for warnings, for instance
the undefinedness of signed left-shift overflow.

I think the possible return values of strlen should also not be used
for code optimizations.

Your optimization assumes that the return value of strlen is always in
the range 0..size-1, even if the string is not NUL-terminated.  But that is
the only range of values that can _never_ be returned if the string is not
NUL-terminated, which is why this comparison is often used as a check for
NUL termination. (*)

In reality, for an unterminated array the return value is always in the
range size..infinity, or the function aborts; code like
assert(strlen(x) < sizeof(x)) relies on this basic knowledge.  The standard
should mention these magic powers of strlen and state that it will either
abort or return >= sizeof(x).  It does not help anybody to be unclear.
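A fixed-size array check such as strlen(x) < sizeof(x) is really asking
whether a NUL byte occurs inside the array.  As an illustration only (the
helper name below is hypothetical, not something from this thread), the
same question can be asked without reading past the end of the array by
bounding the search with memchr:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: true if a NUL byte occurs within buf[0, size),
// i.e. the question "strlen (x) < sizeof (x)" is usually meant to answer.
// memchr examines at most size bytes, so it never reads past the array.
bool nul_terminated_within(const char *buf, std::size_t size)
{
  return std::memchr(buf, '\0', size) != nullptr;
}
```

For example, with const char a[3] = {'1', '2', '3'} (the same bytes as the
y1.c/y2.c array), nul_terminated_within(a, sizeof a) is false.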


(*): This is even done here:
char *
__strcpy_chk (char *__restrict__ dest, const char *__restrict__ src,
              size_t slen)
{
  size_t len = strlen (src);
  if (len >= slen)
    __chk_fail ();
  return memcpy (dest, src, len + 1);
}

If you are right, __chk_fail will never be called, so why not optimize
it away?
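A standalone, testable version of the check quoted above may make the point
concrete.  This is a sketch under the assumption that reporting failure by
returning a null pointer is acceptable; checked_strcpy is a hypothetical
name and, unlike __strcpy_chk, it does not call __chk_fail.  If the compiler
could prove len < dest_size from the declared size of src, the comparison
would fold away, which is the kind of optimization argued against here.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical sketch modelled on the quoted __strcpy_chk logic: copy src
// (including its terminator) into dest only if it fits in dest_size bytes.
char *checked_strcpy(char *dest, const char *src, std::size_t dest_size)
{
  std::size_t len = std::strlen(src);
  if (len >= dest_size)          // the terminator would not fit
    return nullptr;              // __strcpy_chk would call __chk_fail here
  return static_cast<char *>(std::memcpy(dest, src, len + 1));
}
```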


Bernd.

Re: [PATCH][GCC][mid-end] Allow larger copies when not slow_unaligned_access and no padding.

2018-07-31 Thread Richard Biener
On Tue, 31 Jul 2018, Tamar Christina wrote:

> Hi Richard,
> 
> The 07/31/2018 11:21, Richard Biener wrote:
> > On Tue, 31 Jul 2018, Tamar Christina wrote:
> > 
> > > Ping 😊
> > > 
> > > > -Original Message-
> > > > From: gcc-patches-ow...@gcc.gnu.org 
> > > > On Behalf Of Tamar Christina
> > > > Sent: Tuesday, July 24, 2018 17:34
> > > > To: Richard Biener 
> > > > Cc: gcc-patches@gcc.gnu.org; nd ; l...@redhat.com;
> > > > i...@airs.com; amo...@gmail.com; berg...@vnet.ibm.com
> > > > Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when not
> > > > slow_unaligned_access and no padding.
> > > > 
> > > > Hi Richard,
> > > > 
> > > > Thanks for the review!
> > > > 
> > > > The 07/23/2018 18:46, Richard Biener wrote:
> > > > > On July 23, 2018 7:01:23 PM GMT+02:00, Tamar Christina
> > > >  wrote:
> > > > > >Hi All,
> > > > > >
> > > > > >This allows copy_blkmode_to_reg to perform larger copies when it is
> > > > > >safe to do so by calculating the bitsize per iteration doing the
> > > > > >maximum copy allowed that does not read more than the amount of bits
> > > > > >left to copy.
> > > > > >
> > > > > >Strictly speaking, this copying is only done if:
> > > > > >
> > > > > >  1. the target supports fast unaligned access, and
> > > > > >  2. no padding is being used.
> > > > > >
> > > > > >This should avoid the issues of the first patch (PR85123) but still
> > > > > >work for targets that are safe to do so.
> > > > > >
> > > > > >Original patch
> > > > > >https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01088.html
> > > > > >Previous respin
> > > > > >https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00239.html
> > > > > >
> > > > > >
> > > > > >This produces for the copying of a 3 byte structure:
> > > > > >
> > > > > >fun3:
> > > > > > adrp x1, .LANCHOR0
> > > > > > add x1, x1, :lo12:.LANCHOR0
> > > > > > mov x0, 0
> > > > > > sub sp, sp, #16
> > > > > > ldrh w2, [x1, 16]
> > > > > > ldrb w1, [x1, 18]
> > > > > > add sp, sp, 16
> > > > > > bfi x0, x2, 0, 16
> > > > > > bfi x0, x1, 16, 8
> > > > > > ret
> > > > > >
> > > > > >whereas before it was producing
> > > > > >
> > > > > >fun3:
> > > > > > adrp x0, .LANCHOR0
> > > > > > add x2, x0, :lo12:.LANCHOR0
> > > > > > sub sp, sp, #16
> > > > > > ldrh w1, [x0, #:lo12:.LANCHOR0]
> > > > > > ldrb w0, [x2, 2]
> > > > > > strh w1, [sp, 8]
> > > > > > strb w0, [sp, 10]
> > > > > > ldr w0, [sp, 8]
> > > > > > add sp, sp, 16
> > > > > > ret
> > > > > >
> > > > > >Cross compiled and regtested on
> > > > > >  aarch64_be-none-elf
> > > > > >  armeb-none-eabi
> > > > > >and no issues
> > > > > >
> > > > > >Bootstrapped and regtested
> > > > > > aarch64-none-linux-gnu
> > > > > > x86_64-pc-linux-gnu
> > > > > > powerpc64-unknown-linux-gnu
> > > > > > arm-none-linux-gnueabihf
> > > > > >
> > > > > >and found no issues.
> > > > > >
> > > > > >OK for trunk?
> > > > >
> > > > > How does this affect store-to-load forwarding when the source is
> > > > > initialized piecewise? IMHO we should avoid larger loads but generate
> > > > > larger stores when possible.
> > > > >
> > > > > How do non-x86 architectures behave with respect to STLF?
> > > > >
> > > > 
> > > > I should have made it more explicit in my cover letter, but this only
> > > > covers reg to reg copies.
> > > > So store-to-load forwarding shouldn't really come into play here,
> > > > unless I'm missing something.
> > > > 
> > > > The example in my patch shows that the loads from mem are mostly
> > > > unaffected.
> > > > 
> > > > For x86 the change is also quite significant, e.g. for a 5-byte struct
> > > > load it used to generate
> > > > 
> > > > fun5:
> > > > movl foo5(%rip), %eax
> > > > movl %eax, %edi
> > > > movzbl %al, %edx
> > > > movzbl %ah, %eax
> > > > movb %al, %dh
> > > > movzbl foo5+2(%rip), %eax
> > > > shrl $24, %edi
> > > > salq $16, %rax
> > > > movq %rax, %rsi
> > > > movzbl %dil, %eax
> > > > salq $24, %rax
> > > > movq %rax, %rcx
> > > > movq %rdx, %rax
> > > > movzbl foo5+4(%rip), %edx
> > > > orq %rsi, %rax
> > > > salq $32, %rdx
> > > > orq %rcx, %rax
> > > > orq %rdx, %rax
> > > > ret
> > > > 
> > > > instead of
> > > > 
> > > > fun5:
> > > > movzbl foo5+4(%rip), %eax
> > > > salq $32, %rax
> > > > movq %rax, %rdx
> > > > movl foo5(%rip), %eax
> > > > orq %rdx, %rax
> > > > ret
> > > > 
> > > > so the loads themselves are unaffected.
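The per-iteration size choice described in the cover letter can be
illustrated in bytes: each access is the largest power of two that fits in
the bytes still to copy, capped at the word size.  This is a simplified
model, not copy_blkmode_to_reg itself, which works in bits and also depends
on slow_unaligned_access and padding; plan_copy is a hypothetical name.  For
a 3-byte struct it gives 2 + 1, matching the ldrh/ldrb pair above, and for a
5-byte struct 4 + 1, matching the movl/movzbl pair.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: split an nbytes-long copy into accesses, each the
// largest power of two <= the bytes remaining and <= max_access.
// max_access must itself be a power of two (e.g. 8 for a 64-bit word).
std::vector<std::size_t> plan_copy(std::size_t nbytes, std::size_t max_access)
{
  std::vector<std::size_t> chunks;
  while (nbytes > 0)
    {
      std::size_t chunk = max_access;
      while (chunk > nbytes)   // never read beyond the bytes left to copy
        chunk /= 2;
      chunks.push_back(chunk);
      nbytes -= chunk;
    }
  return chunks;
}
```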
> > 
> > I see.  Few things:
> > 
> >    dst_words = XALLOCAVEC (rtx, n_regs);
> > +
> > +  slow_unaligned_access
> > +    = targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src)));
> > +
> >bitsize = MIN 

Re: [PR 83141] Prevent SRA from removing type changing assignment

2018-07-31 Thread H.J. Lu
On Tue, Dec 5, 2017 at 4:00 AM, Martin Jambor  wrote:
> On Tue, Dec 05 2017, Martin Jambor wrote:
>> On Tue, Dec 05 2017, Martin Jambor wrote:
>> Hi,
>>
>>> Hi,
>>>
>>> this is a followup to Richi's
>>> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02396.html to fix PR
>>> 83141.  The basic idea is simple, be just as conservative about type
>>> changing MEM_REFs as we are about actual VCEs.
>>>
>>> I have checked how that would affect compilation of SPEC 2006 and (non
>>> LTO) Mozilla Firefox and am happy to report that the difference was
>>> tiny.  However, I had to make the test less strict, otherwise testcase
>>> gcc.dg/guality/pr54970.c kept failing because it contains folded memcpy
>>> and expects us to track values across:
>>>
>>>   int a[] = { 1, 2, 3 };
>>>   /* ... */
>>>   __builtin_memcpy (&a, (int [3]) { 4, 5, 6 }, sizeof (a));
>>>  /* { dg-final { gdb-test 31 "a\[0\]" "4" } } */
>>>  /* { dg-final { gdb-test 31 "a\[1\]" "5" } } */
>>>  /* { dg-final { gdb-test 31 "a\[2\]" "6" } } */
>>>
>>> SRA is able to load replacement of a[0] directly from the temporary
>>> array which is apparently necessary to generate proper debug info.  I
>>> have therefore allowed the current transformation to go forward if the
>>> source does not contain any padding or if it is a read-only declaration.
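For readers unfamiliar with the term, padding here means bytes of an object
that belong to no member.  The sketch below only illustrates that notion on
two concrete structs using offsetof; it is not tree-sra.c's
type_contains_padding_p, which works on GCC's internal type trees.  The
expected values assume the usual rule that a member is placed at the next
offset that is a multiple of its alignment.

```cpp
#include <cstddef>

struct padded   { char c; int i; };          // i needs int alignment
struct unpadded { char a; char b; char c; }; // every member has alignment 1

// Interior padding between the first and second member of each struct.
constexpr std::size_t padded_gap   = offsetof(padded, i) - sizeof(char);
constexpr std::size_t unpadded_gap = offsetof(unpadded, b) - sizeof(char);

// True if the object has more bytes than its members occupy
// (interior or tail padding).
constexpr bool padded_has_padding =
    sizeof(padded) != sizeof(char) + sizeof(int);
```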
>>
>> Ah, the read-only test is of course bogus, it was a last minute addition
>> when I was apparently already too tired to think it through.  Please
>> disregard that line in the patch (it has passed bootstrap and testing
>> without it).
>>
>> Sorry for the noise,
>>
>> Martin
>>
>
> And for the record, below is the actual patch, after a fresh round of
> re-testing to double check I did not mess up anything else.  As before,
> I'd like to ask for review, especially of the type_contains_padding_p
> predicate and then would like to commit it to trunk.
>
> Thanks,
>
> Martin
>
>
> 2017-12-05  Martin Jambor  
>
> PR tree-optimization/83141
> * tree-sra.c (type_contains_padding_p): New function.
> (contains_vce_or_bfcref_p): Move up in the file, also test for
> MEM_REFs implicitly changing types with padding.  Remove inline
> keyword.
> (build_accesses_from_assign): Check contains_vce_or_bfcref_p
> before setting bit in should_scalarize_away_bitmap.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86763

H.J.


libgo patch committed: Use poll rather than pollset on AIX

2018-07-31 Thread Ian Lance Taylor
This patch by Tony Reix changes libgo's poller support on AIX to use
poll rather than pollset.  This may fix
https://golang.org/issue/26634.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux, not that that proves much.  Committed to mainline.

Ian
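The scheme the patch moves to (one poll() call over an array whose slot 0
is the read end of a wakeup pipe, written to when the descriptor set must
change) can be sketched with plain POSIX calls.  This is a minimal
illustration assuming a POSIX system; WakeablePoller and its methods are
hypothetical names, not libgo code, and the real netpoll also has to hand
ready descriptors back to the scheduler.

```cpp
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

// Hypothetical sketch: a poll()-based waiter whose pfds[0] is always the
// read end of a self-pipe, so another thread can interrupt a blocking poll().
class WakeablePoller
{
public:
  WakeablePoller() = default;
  WakeablePoller(const WakeablePoller&) = delete;
  WakeablePoller& operator=(const WakeablePoller&) = delete;
  ~WakeablePoller()
  {
    if (rdwake >= 0) close(rdwake);
    if (wrwake >= 0) close(wrwake);
  }

  // Create the wakeup pipe; the read end is non-blocking so it can be drained.
  bool init()
  {
    int p[2];
    if (pipe(p) < 0)
      return false;
    rdwake = p[0];
    wrwake = p[1];
    fcntl(rdwake, F_SETFL, fcntl(rdwake, F_GETFL, 0) | O_NONBLOCK);
    fcntl(rdwake, F_SETFD, FD_CLOEXEC);
    fcntl(wrwake, F_SETFD, FD_CLOEXEC);
    return true;
  }

  // Make a concurrent or subsequent wait() return promptly.
  void wakeup()
  {
    char b = 0;
    ssize_t n = write(wrwake, &b, 1);
    (void) n;  // error handling omitted in this sketch
  }

  // Poll pfds[0..n); slot 0 is overwritten with the wakeup pipe.
  // Returns poll()'s result and drains the pipe if it was readable.
  int wait(struct pollfd *pfds, nfds_t n, int timeout_ms)
  {
    pfds[0].fd = rdwake;
    pfds[0].events = POLLIN;
    pfds[0].revents = 0;
    int r = poll(pfds, n, timeout_ms);
    if (r > 0 && (pfds[0].revents & POLLIN))
      {
        char buf[64];
        while (read(rdwake, buf, sizeof buf) > 0)
          ;  // stops when the non-blocking read finds the pipe empty
      }
    return r;
  }

private:
  int rdwake = -1;
  int wrwake = -1;
};
```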
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 263179)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-a2e0ad16555b2698df8e71f4c0fe02e185715bc1
+8997a3afcc746824cb70b48b32d9c86b4814807d
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/netpoll.go
===
--- libgo/go/runtime/netpoll.go (revision 263179)
+++ libgo/go/runtime/netpoll.go (working copy)
@@ -169,8 +169,8 @@ func poll_runtime_pollWait(pd *pollDesc,
if err != 0 {
return err
}
-   // As for now only Solaris uses level-triggered IO.
-   if GOOS == "solaris" {
+   // As for now only Solaris and AIX use level-triggered IO.
+   if GOOS == "solaris" || GOOS == "aix" {
netpollarm(pd, mode)
}
for !netpollblock(pd, int32(mode), false) {
Index: libgo/go/runtime/netpoll_aix.go
===
--- libgo/go/runtime/netpoll_aix.go (revision 263179)
+++ libgo/go/runtime/netpoll_aix.go (working copy)
@@ -7,9 +7,7 @@ package runtime
 import "unsafe"
 
 // This is based on the former libgo/runtime/netpoll_select.c implementation
-// except that it uses AIX pollset_poll instead of select and is written in Go.
-
-type pollset_t int32
+// except that it uses poll instead of select and is written in Go.
 
 type pollfd struct {
fd  int32
@@ -22,25 +20,9 @@ const _POLLOUT = 0x0002
 const _POLLHUP = 0x2000
 const _POLLERR = 0x4000
 
-type poll_ctl struct {
-   cmd    int16
-   events int16
-   fd     int32
-}
-
-const _PS_ADD = 0x0
-const _PS_DELETE = 0x2
-
-//extern pollset_create
-func pollset_create(maxfd int32) pollset_t
-
 //go:noescape
-//extern pollset_ctl
-func pollset_ctl(ps pollset_t, pollctl_array *poll_ctl, array_length int32) int32
-
-//go:noescape
-//extern pollset_poll
-func pollset_poll(ps pollset_t, polldata_array *pollfd, array_length int32, timeout int32) int32
+//extern poll
+func libc_poll(pfds *pollfd, npfds uintptr, timeout uintptr) int32
 
 //go:noescape
 //extern pipe
@@ -55,9 +37,10 @@ func fcntl(fd, cmd int32, arg uintptr) u
 }
 
 var (
-   ps          pollset_t = -1
-   mpfds       map[int32]*pollDesc
-   pmtx        mutex
+   pfds        []pollfd
+   pds         []*pollDesc
+   mtxpoll     mutex
+   mtxset      mutex
rdwake  int32
wrwake  int32
needsUpdate bool
@@ -66,13 +49,7 @@ var (
 func netpollinit() {
var p [2]int32
 
-   if ps = pollset_create(-1); ps < 0 {
-   throw("runtime: netpollinit failed to create pollset")
-   }
-   // It is not possible to add or remove descriptors from
-   // the pollset while pollset_poll is active.
-   // We use a pipe to wakeup pollset_poll when the pollset
-   // needs to be updated.
+   // Create the pipe we use to wakeup poll.
if err := libc_pipe(&p[0]); err < 0 {
throw("runtime: netpollinit failed to create pipe")
}
@@ -84,127 +61,136 @@ func netpollinit() {
fcntl(rdwake, _F_SETFD, _FD_CLOEXEC)
 
fl = fcntl(wrwake, _F_GETFL, 0)
-   fcntl(wrwake, _F_SETFL, fl|_O_NONBLOCK)
fcntl(wrwake, _F_SETFD, _FD_CLOEXEC)
 
-   // Add the read side of the pipe to the pollset.
-   var pctl poll_ctl
-   pctl.cmd = _PS_ADD
-   pctl.fd = rdwake
-   pctl.events = _POLLIN
-   if pollset_ctl(ps, &pctl, 1) != 0 {
-   throw("runtime: netpollinit failed to register pipe")
-   }
-
-   mpfds = make(map[int32]*pollDesc)
+   // Pre-allocate array of pollfd structures for poll.
+   pfds = make([]pollfd, 1, 128)
+   // Poll the read side of the pipe.
+   pfds[0].fd = rdwake
+   pfds[0].events = _POLLIN
+
+   // Allocate index to pd array
+   pds = make([]*pollDesc, 1, 128)
+   pds[0] = nil
 }
 
 func netpolldescriptor() uintptr {
-   // ps is not a real file descriptor.
return ^uintptr(0)
 }
 
-func netpollopen(fd uintptr, pd *pollDesc) int32 {
-   // pollset_ctl will block if pollset_poll is active
-   // so wakeup pollset_poll first.
-   lock(&pmtx)
-   needsUpdate = true
-   unlock(&pmtx)
-   b := [1]byte{0}
-   write(uintptr(wrwake), unsafe.Pointer(&b[0]), 1)
-
-   var pctl poll_ctl
-   pctl.cmd = _PS_ADD
-   pctl.fd = int32(fd)
-   pctl.events = _POLLIN | _POLLOUT
-   if pollset_ctl(ps, &pctl, 1) != 0 {
-   return int32(errno())
+func netpollwakeup(

Re: [PATCH 11/11] rs6000 - add speculation_barrier pattern

2018-07-31 Thread Segher Boessenkool
On Tue, Jul 31, 2018 at 05:01:02PM -0500, Bill Schmidt wrote:
> > On Jul 27, 2018, at 4:37 AM, Richard Earnshaw  
> > wrote:
> > This patch reworks the existing rs6000_speculation_barrier pattern to
> > work with the new __builtin_speculation_safe_value() intrinsic.  The
> > change is trivial as it simply requires renaming the existing speculation
> > barrier pattern.
> > 
> > So the total patch is to delete 14 characters!

> I can't ack the patch, but I am happy with it.  Thank you for this work!

Looks fine to me, too.  I'm sure someone has tested it by now, too ;-)
Okay for trunk.  Thanks!


Segher


> > * config/rs6000/rs6000.md (speculation_barrier): Renamed from
> > rs6000_speculation_barrier.
> > * config/rs6000/rs6000.c (rs6000_expand_builtin): Adjust for
> > new barrier pattern name.


Re: [PATCH] Make strlen range computations more conservative

2018-07-31 Thread Martin Sebor

On 07/31/2018 09:48 AM, Jakub Jelinek wrote:

> On Tue, Jul 31, 2018 at 09:17:52AM -0600, Martin Sebor wrote:
>> On 07/31/2018 12:38 AM, Jakub Jelinek wrote:
>>> On Mon, Jul 30, 2018 at 09:45:49PM -0600, Martin Sebor wrote:
>>>> Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past
>>>> the end of subobjects by string functions.  With _FORTIFY_SOURCE=2
>>>> it calls abort.  This is the default on popular distributions,
>>>
>>> Note that _FORTIFY_SOURCE=2 is the mode that goes beyond what the standard
>>> requires, imposes extra requirements.  So from what this mode accepts or
>>> rejects we shouldn't determine what is or isn't considered valid.
>>
>> I'm not sure what the additional requirements are but the ones
>> I am referring to are the enforcing of struct member boundaries.
>> This is in line with the standard requirements of not accessing
>> [sub]objects via pointers derived from other [sub]objects.
>
> In the middle-end the distinction between what was originally a reference
> to subobjects and what was a reference to objects is quickly lost
> (whether through SCCVN or other optimizations).
> We've run into this many times with the __builtin_object_size already.
> So, if e.g.
> struct S { char a[3]; char b[5]; } s = { "abc", "defg" };
> ...
> strlen ((char *) &s) is well defined but
> strlen (s.a) is not in C, for the middle-end you might not figure out which
> one is which.


Yes, I'm aware of the middle-end transformation to MEM_REF
-- it's one of the reasons why detecting invalid accesses
by the middle end warnings, including -Warray-bounds,
-Wformat-overflow, -Wstringop-overflow, and even -Wrestrict,
is less than perfect.

But is strlen(s.a) also meant to be well-defined in the middle
end (with the semantics of computing the length of "abcdefg")?
And if so, what makes it well defined?

Certainly not every "strlen" has these semantics.  For example,
this open-coded one doesn't:

  int len = 0;
  for (int i = 0; s.a[i]; ++i)
    ++len;

It computes 2 (with no warning for the out-of-bounds access).

So if the standard doesn't guarantee it and different kinds
of accesses behave differently, how do we explain what "works"
and what doesn't without relying on GCC implementation details?

If we can't then the only language we have in common with users
is the standard.  (This, by the way, is what the C memory model
group is trying to address -- the language or feature that's
missing from the standard that says when, if ever, these things
might be valid.)

Martin


Re: [PATCH 10/11] x86 - add speculation_barrier pattern

2018-07-31 Thread H.J. Lu
On Sat, Jul 28, 2018 at 1:25 AM, Uros Bizjak  wrote:
> On Fri, Jul 27, 2018 at 11:37 AM, Richard Earnshaw
>  wrote:
>>
>> This patch adds a speculation barrier for x86, based on my
>> understanding of the required mitigation for that CPU, which is to use
>> an lfence instruction.
>>
>> This patch needs some review by an x86 expert and if adjustments are
>> needed, I'd appreciate it if they could be picked up by the port
>> maintainer.  This is supposed to serve as an example of how to deploy
>> the new __builtin_speculation_safe_value() intrinsic on this
>> architecture.
>>
>> * config/i386/i386.md (unspecv): Add UNSPECV_SPECULATION_BARRIER.
>> (speculation_barrier): New insn.
>
> The implementation is OK, but someone from Intel (CC'd) should clarify
> if lfence is the correct insn.
>

I checked with our people.  lfence is OK.
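The thread does not show the builtin at a call site.  A minimal sketch of
the bounds-check pattern it targets, assuming a GCC that provides
__builtin_speculation_safe_value and defines __HAVE_SPECULATION_SAFE_VALUE
on targets that support it (the fallback simply uses the plain index);
load_checked and table are hypothetical names:

```cpp
#if defined(__HAVE_SPECULATION_SAFE_VALUE)
// Assumed GCC behaviour: if the bounds check below was mis-speculated,
// the builtin yields 0 (the default failval) instead of idx.
#  define SPECULATION_SAFE(x) __builtin_speculation_safe_value(x)
#else
// Fallback: no mitigation, just the plain value.
#  define SPECULATION_SAFE(x) (x)
#endif

static const int table[16] = {0, 1, 4, 9, 16, 25, 36, 49,
                              64, 81, 100, 121, 144, 169, 196, 225};

int load_checked(unsigned idx)
{
  if (idx < 16)
    return table[SPECULATION_SAFE(idx)];
  return 0;
}
```

On x86 the barrier behind the builtin is the lfence discussed above.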

Thanks.

-- 
H.J.


Re: [PATCH] [AArch64, Falkor] Adjust Falkor's sign extend reg+reg address cost

2018-07-31 Thread James Greenhalgh
On Wed, Jul 25, 2018 at 01:35:23PM -0500, Luis Machado wrote:
> Adjust Falkor's register_sextend cost from 4 to 3.  This fixes a testsuite
> failure in gcc.target/aarch64/extend.c:ldr_sxtw where GCC was generating
> a sbfiz instruction rather than a load with sign extension.
> 
> No performance changes.

OK if this is what is best for your subtarget.

Thanks,
James

> 
> gcc/ChangeLog:
> 
> 2018-07-25  Luis Machado  
> 
>   * config/aarch64/aarch64.c (qdf24xx_addrcost_table)
>   : Set to 3.


Re: [PATCH] [AArch64, Falkor] Switch to using Falkor-specific vector costs

2018-07-31 Thread James Greenhalgh
On Wed, Jul 25, 2018 at 01:10:34PM -0500, Luis Machado wrote:
> The adjusted vector costs give Falkor a reasonable boost in performance for FP
> benchmarks (both CPU2017 and CPU2006) and don't change INT benchmarks that
> much: about 0.7% for CPU2017 FP and 1.54% for CPU2006 FP.
> 
> OK for trunk?

OK if this is what works best for your subtarget.

Thanks,
James

> 
> gcc/ChangeLog:
> 
> 2018-07-25  Luis Machado  
> 
>   * config/aarch64/aarch64.c (qdf24xx_vector_cost): New.
>   (qdf24xx_tunings) : Set to qdf24xx_vector_cost.
> ---
>  gcc/config/aarch64/aarch64.c | 22 +-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index fa01475..d443aee 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -430,6 +430,26 @@ static const struct cpu_vector_cost generic_vector_cost =
>1 /* cond_not_taken_branch_cost  */
>  };
>  
> +/* Qualcomm QDF24xx costs for vector insn classes.  */
> +static const struct cpu_vector_cost qdf24xx_vector_cost =
> +{
> +  1, /* scalar_int_stmt_cost  */
> +  1, /* scalar_fp_stmt_cost  */
> +  1, /* scalar_load_cost  */
> +  1, /* scalar_store_cost  */
> +  1, /* vec_int_stmt_cost  */
> +  3, /* vec_fp_stmt_cost  */
> +  2, /* vec_permute_cost  */
> +  1, /* vec_to_scalar_cost  */
> +  1, /* scalar_to_vec_cost  */
> +  1, /* vec_align_load_cost  */
> +  1, /* vec_unalign_load_cost  */
> +  1, /* vec_unalign_store_cost  */
> +  1, /* vec_store_cost  */
> +  3, /* cond_taken_branch_cost  */
> +  1  /* cond_not_taken_branch_cost  */
> +};
> +
>  /* ThunderX costs for vector insn classes.  */
>  static const struct cpu_vector_cost thunderx_vector_cost =
>  {
> @@ -890,7 +910,7 @@ static const struct tune_params qdf24xx_tunings =
>&qdf24xx_extra_costs,
>&qdf24xx_addrcost_table,
>&qdf24xx_regmove_cost,
> -  &generic_vector_cost,
> +  &qdf24xx_vector_cost,
>&generic_branch_cost,
>&generic_approx_modes,
>4, /* memmov_cost  */
> -- 
> 2.7.4
> 


Re: [PATCH] PR libstdc++/86751 default assignment operators for std::pair

2018-07-31 Thread Jonathan Wakely

On 31/07/18 18:40 +0100, Jonathan Wakely wrote:

On 31/07/18 20:14 +0300, Ville Voutilainen wrote:

On 31 July 2018 at 20:07, Jonathan Wakely  wrote:

The solution for PR 77537 causes ambiguities due to the extra copy
assignment operator taking a __nonesuch_no_braces parameter. The copy
and move assignment operators can be defined as defaulted to meet the
semantics required by the standard.

In order to preserve ABI compatibility (specifically argument passing
conventions for pair) we need a new base class that makes the
assignment operators non-trivial.

   PR libstdc++/86751
   * include/bits/stl_pair.h (__nonesuch_no_braces): Remove.
   (__pair_base): New class with non-trivial copy assignment operator.
   (pair): Derive from __pair_base. Define copy assignment and move
   assignment operators as defaulted.
   * testsuite/20_util/pair/86751.cc: New test.


Ville, this passes all our tests, but am I forgetting something that
means this isn't right?


Pairs of references?


I knew there was a reason.

We need better tests, since nothing failed when I made this change.

OK, let me rework the patch ...


Here's the patch I've committed. It adds a test for pairs of
references, so I don't try to define the assignment ops as defaulted
again :-)  Thanks for the quick feedback for these patches.

Tested powerpc64le-linux, committed to trunk.

This is a regression on all branches, but I'd like to leave it on
trunk for a short while before backporting it.


commit 988a9158fd074353621f4f216270109c767a4725
Author: Jonathan Wakely 
Date:   Tue Jul 31 17:26:04 2018 +0100

PR libstdc++/86751 default assignment operators for std::pair

The solution for PR 77537 causes ambiguities due to the extra copy
assignment operator taking a __nonesuch_no_braces parameter. By making
the base class non-assignable we don't need the extra deleted overload
in std::pair. The copy assignment operator will be implicitly deleted
(and the move assignment operator not declared) as needed. Without the
additional user-provided operator in std::pair the ambiguity is avoided.

PR libstdc++/86751
* include/bits/stl_pair.h (__pair_base): New class with deleted copy
assignment operator.
(pair): Derive from __pair_base.
(pair::operator=): Remove deleted overload.
* python/libstdcxx/v6/printers.py (StdPairPrinter): New pretty printer
so that new base class isn't shown in GDB.
* testsuite/20_util/pair/86751.cc: New test.
* testsuite/20_util/pair/ref_assign.cc: New test.
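The mechanism described in this commit message can be reproduced in a
reduced form.  The sketch below uses hypothetical names and a simplified
constructor; it is not the libstdc++ code.  A base class with a deleted copy
assignment makes any implicitly declared copy assignment in the derived
class deleted, while a user-provided operator= whose parameter type is
chosen with std::conditional (the same pattern as the move-assignment
operator in the diff) takes over when both members are copy assignable,
including reference members, which a defaulted operator= could not handle.

```cpp
#include <type_traits>

template<typename T1, typename T2> struct pair_sketch;

// Unusable parameter type for the "not copy assignable" case.
struct nonesuch_sketch
{
  nonesuch_sketch() = delete;
  nonesuch_sketch(const nonesuch_sketch&) = delete;
  void operator=(const nonesuch_sketch&) = delete;
};

// Base with a deleted copy assignment: an implicitly declared copy
// assignment in a derived class is therefore defined as deleted.
class pair_base_sketch
{
  template<typename, typename> friend struct pair_sketch;
  pair_base_sketch() = default;
  pair_base_sketch(const pair_base_sketch&) = default;
  pair_base_sketch& operator=(const pair_base_sketch&) = delete;
};

template<typename T1, typename T2>
struct pair_sketch : private pair_base_sketch
{
  T1 first;
  T2 second;

  pair_sketch(const T1& a, const T2& b) : first(a), second(b) { }
  pair_sketch(const pair_sketch&) = default;

  // When both members are copy assignable this is the copy assignment
  // operator and it assigns through reference members.  Otherwise its
  // parameter is const nonesuch_sketch&, and the implicit (deleted)
  // copy assignment is the one overload resolution selects.
  pair_sketch&
  operator=(typename std::conditional<
              std::is_copy_assignable<T1>::value
              && std::is_copy_assignable<T2>::value,
              const pair_sketch&, const nonesuch_sketch&>::type p)
  {
    first = p.first;
    second = p.second;
    return *this;
  }
};
```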

diff --git a/libstdc++-v3/include/bits/stl_pair.h b/libstdc++-v3/include/bits/stl_pair.h
index a2486ba8244..ea8bd981559 100644
--- a/libstdc++-v3/include/bits/stl_pair.h
+++ b/libstdc++-v3/include/bits/stl_pair.h
@@ -185,8 +185,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   struct __nonesuch_no_braces : std::__nonesuch {
 explicit __nonesuch_no_braces(const __nonesuch&) = delete;
   };
+#endif // C++11
 
-#endif
+  class __pair_base
+  {
+#if __cplusplus >= 201103L
+template friend struct pair;
+__pair_base() = default;
+~__pair_base() = default;
+__pair_base(const __pair_base&) = default;
+__pair_base& operator=(const __pair_base&) = delete;
+#endif // C++11
+  };
 
  /**
*  @brief Struct holding two objects of arbitrary type.
@@ -196,6 +206,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*/
   template
 struct pair
+: private __pair_base
 {
   typedef _T1 first_type;/// @c first_type is the first bound type
   typedef _T2 second_type;   /// @c second_type is the second bound type
@@ -374,19 +385,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return *this;
   }
 
-  pair&
-  operator=(typename conditional<
-		__not_<__and_,
-		  is_copy_assignable<_T2>>>::value,
-		const pair&, const __nonesuch_no_braces&>::type __p) = delete;
-
   pair&
   operator=(typename conditional<
 		__and_,
 		   is_move_assignable<_T2>>::value,
 		pair&&, __nonesuch_no_braces&&>::type __p)
   noexcept(__and_,
-	  is_nothrow_move_assignable<_T2>>::value)
+		  is_nothrow_move_assignable<_T2>>::value)
   {
 	first = std::forward(__p.first);
 	second = std::forward(__p.second);
diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 34d8b4e6606..43d459ec8ec 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -1229,6 +1229,39 @@ class StdExpPathPrinter:
 return self._iterator(self.val['_M_cmpts'])
 
 
+class StdPairPrinter:
+    "Print a std::pair object, with 'first' and 'second' as children"
+
+    def __init__(self, typename, val):
+        self.val = val
+
+    class _iter(Iterator):
+        "An iterator for std::pair types. Returns 'first' then 'second'."
+
+        def __init__(self, val):
+            self.val = val
+  

[PATCH] Don't unconditionally define feature test macros in

2018-07-31 Thread Jonathan Wakely

The macro definitions in  should depend on the same
preprocessor conditions as the original macros in other headers.
Otherwise  can define macros that imply the availability of
features that are not actually defined.

This fix is incomplete, as __cpp_lib_filesystem should depend on whether
libstdc++fs.a is supported, and several macros should only be defined
when _GLIBCXX_HOSTED is defined. Also, the feature test macros should
define their value as type long, but most are type int.
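Why the conditions matter is easiest to see from the user's side: portable
code often tests a feature-test macro and only then uses the feature.  In
the sketch below (Widget and read_value are hypothetical names), if
__cpp_lib_launder were defined on a configuration where <new> does not
declare std::launder, the code would fail to compile, which is the kind of
mismatch this patch avoids.

```cpp
#include <new>
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>
#  endif
#endif

struct Widget { int value; };

int read_value()
{
  alignas(Widget) unsigned char storage[sizeof(Widget)];
  Widget *w = ::new (static_cast<void *>(storage)) Widget{42};
#if defined(__cpp_lib_launder)
  // Only compiled when the library claims std::launder is available.
  w = std::launder(reinterpret_cast<Widget *>(storage));
#endif
  return w->value;
}
```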

* include/bits/c++config (_GLIBCXX_HAVE_BUILTIN_HAS_UNIQ_OBJ_REP)
(_GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE): Move definitions here.
(_GLIBCXX_HAVE_BUILTIN_LAUNDER): Likewise. Use !__is_identifier
instead of __has_builtin.
* include/std/type_traits (_GLIBCXX_HAVE_BUILTIN_HAS_UNIQ_OBJ_REP)
(_GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE): Remove definitions from here.
* include/std/version [!_GLIBCXX_HAS_GTHREADS]
(__cpp_lib_shared_timed_mutex, __cpp_lib_scoped_lock)
(__cpp_lib_shared_mutex): Don't define when Gthreads not in use.
[!_GLIBCXX_HAVE_BUILTIN_HAS_UNIQ_OBJ_REP]
(__cpp_lib_has_unique_object_representations): Don't define when
builtin not available.
[!_GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE] (__cpp_lib_is_aggregate):
Likewise.
[!_GLIBCXX_HAVE_BUILTIN_LAUNDER] (__cpp_lib_launder): Likewise.
* libsupc++/new (_GLIBCXX_HAVE_BUILTIN_LAUNDER): Remove definition
from here.

It would be nice if we had tests to check that every macro in
 matches the other definition of it (i.e. either both are
defined to the same value, or neither is defined).

Tested powerpc64le-linux, committed to trunk.

commit 273e2cbfc73cf52350db743ad9b0d0d8000ed17e
Author: Jonathan Wakely 
Date:   Tue Jul 31 18:32:31 2018 +0100

Don't unconditionally define feature test macros in 

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index c0b89f481d8..d499d32b51e 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -622,4 +622,22 @@ namespace std
 #define _GLIBCXX_USE_FLOAT128
 #endif
 
+#if __GNUC__ >= 7
+// Assume these are available if the compiler claims to be a recent GCC:
+# define _GLIBCXX_HAVE_BUILTIN_HAS_UNIQ_OBJ_REP 1
+# define _GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE 1
+# define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
+#elif defined(__is_identifier)
+// For non-GNU compilers:
+# if ! __is_identifier(__has_unique_object_representations)
+#  define _GLIBCXX_HAVE_BUILTIN_HAS_UNIQ_OBJ_REP 1
+# endif
+# if ! __is_identifier(__is_aggregate)
+#  define _GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE 1
+# endif
+# if ! __is_identifier(__builtin_launder)
+#  define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
+# endif
+#endif // GCC
+
 // End of prewritten config; the settings discovered at configure time follow.
diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index aaa554c6200..4f89723d468 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2956,15 +2956,6 @@ template 
 template 
   inline constexpr bool is_convertible_v = is_convertible<_From, _To>::value;
 
-#if __GNUC__ >= 7
-# define _GLIBCXX_HAVE_BUILTIN_HAS_UNIQ_OBJ_REP 1
-#elif defined(__is_identifier)
-// For non-GNU compilers:
-# if ! __is_identifier(__has_unique_object_representations)

[Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-07-31 Thread Steve Ellcey
Here is a new version of my patch to support the Aarch64 SIMD ABI [1]
in GCC.  I think this is complete enough to be considered for check-in.
I wrote a few new tests and put them in a new gcc.target/aarch64/torture
directory so they would be run with multiple optimization options.  I
also verified that there are no regressions in the GCC testsuite.

The significant difference between the standard ARM ABI and the SIMD
ABI is that in the normal ABI a callee saves only the lower 64 bits of
registers V8-V15, whereas in the SIMD ABI the callee must save all 128 bits
of registers V8-V23.
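To put numbers on that difference: the callee-saved V-register state grows
from 8 registers x 8 bytes to 16 registers x 16 bytes, i.e. from 64 to 256
bytes.  The constexpr helper below is just arithmetic on the register ranges
stated in this mail (callee_saved_bytes is a hypothetical name), not part of
the patch.

```cpp
#include <cstddef>

// Bytes of V-register state a callee must preserve, given an inclusive
// register range and how many bytes of each register are callee-saved.
constexpr std::size_t callee_saved_bytes(unsigned first, unsigned last,
                                         std::size_t bytes_per_reg)
{
  return (last - first + 1) * bytes_per_reg;
}

// Base ABI: low 64 bits (8 bytes) of V8-V15.
constexpr std::size_t base_abi_bytes = callee_saved_bytes(8, 15, 8);
// SIMD ABI as described above: all 128 bits (16 bytes) of V8-V23.
constexpr std::size_t simd_abi_bytes = callee_saved_bytes(8, 23, 16);

static_assert(base_abi_bytes == 64, "8 registers x 8 bytes");
static_assert(simd_abi_bytes == 256, "16 registers x 16 bytes");
```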

As I mentioned in my RFC, I intend to (eventually) follow this patch
with two more, one to define the TARGET_SIMD_CLONE* macros and one to
improve the GCC register allocation/usage when calling SIMD
functions.  Right now, a caller calling a SIMD function will save more
registers than it needs to because some of those registers will also be
saved by the callee.

Steve Ellcey
sell...@cavium.com

[1] https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi

Compiler ChangeLog:

2018-07-31  Steve Ellcey  

* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
New prototype.
(aarch64_epilogue_uses): Ditto.
* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_is_simd_call_p): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_store_pair): Handle E_TFmode.
(aarch64_gen_load_pair): Ditto.
(aarch64_save_callee_saves): Handle different mode sizes.
(aarch64_restore_callee_saves): Ditto.
(aarch64_components_for_bb): Check for simd function.
(aarch64_epilogue_uses): New function.
(aarch64_process_components): Ditto.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
(FP_SIMD_SAVED_REGNUM_P): New macro.
* config/aarch64/aarch64.md (V23_REGNUM): New constant.
(simple_return): New define_expand.
(load_pair_dw_tftf): New instruction.
(store_pair_dw_tftf): Ditto.
(loadwb_pair_): Ditto.
(storewb_pair_): Ditto.

Testsuite ChangeLog:

2018-07-31  Steve Ellcey  

* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
* gcc.target/aarch64/torture/simd-abi-1.c: New test.
* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index af5db9c..99c962f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -423,6 +423,7 @@ bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
+bool aarch64_use_simple_return_insn_p (void);
 const char *aarch64_mangle_builtin_type (const_tree);
 const char *aarch64_output_casesi (rtx *);
 
@@ -507,6 +508,8 @@ void aarch64_split_simd_move (rtx, rtx);
 /* Check for a legitimate floating point constant for FMOV.  */
 bool aarch64_float_const_representable_p (rtx);
 
+extern int aarch64_epilogue_uses (int);
+
 #if defined (RTX_CODE)
 void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode,
    rtx label_ref);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa01475..9e6827a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1027,6 +1027,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* Table of machine attributes.  */
+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, true,  false, false, false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -1405,6 +1414,26 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   return false;
 }
 
+/* Return true if this is a definition of a v

Re: [PATCH] Avoid infinite loop with duplicate anonymous union fields

2018-07-31 Thread Joseph Myers
On Wed, 1 Aug 2018, Bogdan Harjoc wrote:

> So array[0] < component < array[2], which loops (I removed the gdb p
> commands for field_array[1] and so on).

Is the key thing here that you end up with DECL_NAME (field) == NULL_TREE, 
but DECL_NAME (field_array[bot]) != NULL_TREE - and in this particular 
case of a bad ordering only, it's possible to loop without either top or 
bot being changed?  (But other details of the DECL_NAME ordering are 
needed to actually get to that particular point.)
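The mechanism above can be reproduced with a small, hypothetical model of the lookup loop. This is a simplified sketch, not the actual lookup_field code: names are integer keys, 0 stands for a NULL DECL_NAME, recursion into anonymous members is left out, and `model_lookup` returns -2 at the point where the loop would restart without changing either bound. With keys {5, 0, 20, ...} and a component between the first and third keys, the search narrows to bot = 0, top = 2, probes the NULL entry at index 1, finds nothing to skip at `bot`, and would spin forever.

```c
/* Hypothetical, simplified model of the lookup loop discussed here.
   NAMES is expected to be sorted with 0 entries (anonymous members)
   first; the hang needs an entry that breaks that assumption.
   Returns the index of COMPONENT, -1 if it is absent, or -2 where the
   modelled loop would make no progress.  */
static int
model_lookup (const unsigned long *names, int len, unsigned long component)
{
  int bot = 0, top = len;

  while (top - bot > 1)
    {
      int half = (top - bot + 1) >> 1;
      unsigned long name = names[bot + half];

      if (name == 0)
        {
          /* Step linearly over anonymous entries at the low end, then
             restart the search with the new lower bound.  */
          int old_bot = bot;
          while (bot < len && names[bot] == 0)
            bot++;
          if (bot == old_bot)
            return -2;  /* Neither bot nor top changed: infinite loop.  */
          continue;
        }

      if (name == component)
        return bot + half;
      if (name < component)
        bot += half;
      else
        top = bot + half;
    }

  return (bot < len && names[bot] == component) ? bot : -1;
}
```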

seen_error () is the idiomatic way of testing whether an error has been 
reported.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode

2018-07-31 Thread James Greenhalgh
On Thu, Jul 26, 2018 at 11:52:15AM -0500, Sam Tebbs wrote:



> > Thanks for making the changes and adding more test cases. I do however
> > see that you are only covering 2 out of 4 new
> > *aarch64_get_lane_zero_extenddi<> patterns. The
> > *aarch64_get_lane_zero_extendsi<> were already existing. I don't mind
> > those tests. I would just ask you to add the other two new patterns
> > as well. Also since the different versions of the instruction generate
> > same instructions (like foo_16qi and foo_8qi both give out the same
> > instruction), I would suggest using a -fdump-rtl-final (or any relevant
> > rtl dump) with the dg-options and using a scan-rtl-dump to scan the
> > pattern name. Something like:
> > /* { dg-do compile } */
> > /* { dg-options "-O3 -fdump-rtl-final" } */
> > ...
> > ...
> > /* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" 
> > "final" } } */
> >
> > Thanks
> > Sudi
> 
> Hi Sudi,
> 
> Thanks again. Here's an update that adds 4 more tests, so all 8 patterns
> generated are now tested for!

This is OK for trunk, thanks for the patch (and thanks Sudi for the review!)

Thanks,
James

> 
> Below is the updated changelog
> 
> gcc/
> 2018-07-26  Sam Tebbs  
> 
>      * config/aarch64/aarch64-simd.md
>      (*aarch64_get_lane_zero_extendsi):
>      Rename to...
> (*aarch64_get_lane_zero_extend): ... This.
>      Use GPI iterator instead of SI mode.
> 
> gcc/testsuite
> 2018-07-26  Sam Tebbs  
> 
>      * gcc.target/aarch64/extract_zero_extend.c: New file
> 



Re: [PATCH] Avoid infinite loop with duplicate anonymous union fields

2018-07-31 Thread Bogdan Harjoc
#define foo(a) did it, thanks!

As Joseph suspected, the hang/no hang result depended on the values of
DECL_NAME pointers:

- with #define foo(a) plus the testcase from bugzilla id 86690 and no
-save-temps, the "return s.a" that triggers lookup_field() will find
the sorted field_array containing:

(gdb) p component
$1 = (tree) 0x76300050

(gdb) p field_array[0].decl_minimal.name
$0 = (tree) 0x761cfd70
$1 = (tree) 0x0
$2 = (tree) 0x763000f0
$3 = (tree) 0x76300140
$4 = (tree) 0x76300190
$5 = (tree) 0x763001e0
$6 = (tree) 0x76300230
$7 = (tree) 0x76300280
$8 = (tree) 0x763002d0
$9 = (tree) 0x76300320

So array[0] < component < array[2], which loops (I removed the gdb p
commands for field_array[1] and so on).

- with same test-case and with -save-temps I get:

(gdb) p component
$1 = (tree) 0x76300c30

(gdb) p field_array[0].decl_minimal.name
$0 = (tree) 0x761cfd70
$1 = (tree) 0x0
$2 = (tree) 0x76300050
$3 = (tree) 0x763000a0
$4 = (tree) 0x763000f0
$5 = (tree) 0x76300140
$6 = (tree) 0x76300190
$7 = (tree) 0x763001e0
$8 = (tree) 0x76300230
$9 = (tree) 0x76300280

So component > array[9], and in this case the binary search doesn't
end up with bottom=0 and top=2, where it would hang earlier.

Component here is the s.a field, and with -save-temps, cc1 gets bin.i
as input (which it treats as preprocessed due to the extension)
instead of bin.c. So with preprocessed/unprocessed source, the order
in which builtin/user-defined names are allocated changes, resulting
in a hang or no-hang result.

I propose this testcase for gcc/testsuite/gcc.dg/: it reproduces
with or without -save-temps, and without any other magic 'scanf'
identifiers, because it keeps a0 first in the sorted array and a1
(which will be replaced with NULL) second:

int a0;

struct S
{
  int a1;
  union {
    int a0;
    int a1;
    int a2, a3, a4, a5, a6, a7, a8, a9;
    int a10, a11, a12, a13, a14, a15;
  };
};

int f()
{
  struct S s;
  return s.a0;
}

(gdb) p component
$1 = (tree) 0x76300c30

(gdb) p field_array[0].decl_minimal.name
$1 = (tree) 0x76300c30
$1 = (tree) 0x0
$2 = (tree) 0x76300050
$3 = (tree) 0x763000a0...

Thanks for the guidance,
Bogdan

On Tue, Jul 31, 2018 at 10:43 PM, Joseph Myers  wrote:
> On Tue, 31 Jul 2018, Bogdan Harjoc wrote:
>
>> With fresh git sources and contrib/gcc_update the tests pass:
>>
>> === gcc Summary ===
>>
>> # of expected passes 133500
>> # of expected failures 422
>> # of unsupported tests 2104
>>
>> gcc-build/gcc/xgcc  version 9.0.0 20180730 (experimental) (GCC)
>>
>> I wasn't able to reduce the input to avoid including  and as
>> it only reproduces without -save-temps, it's not clear how to write a
>> testcase for this one.
>
> Could you give more details of the paths through the code that are
> involved in the infinite loop, and the different paths you get without
> -save-temps?  Is this an issue of dependence on the values of pointers, or
> something like that?  Is it possible to produce a test with more instances
> of the problem in it, say, so that the probability of the problem showing
> up at least once with the test is much higher?
>
> A binary search should not result in an infinite loop simply because the
> elements of the array are not sorted; in that case it should just converge
> on an unpredictable element.  So more explanation of how the infinite loop
> occurs is needed.  (But if choice of an unpredictable element results in
> e.g. subsequent diagnostics varying depending on pointer values, that by
> itself is a problem that may justify avoiding this code in the case where
> the array may not be sorted.)
>
> --
> Joseph S. Myers
> jos...@codesourcery.com
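The termination argument can be seen in a textbook half-open binary search: every iteration removes `mid` from [lo, hi), so the loop runs at most about log2(n) + 1 times whatever the contents of the array. On unsorted input it simply converges on an unpredictable slot, possibly missing a key that is present. `bsearch_index` is a hypothetical helper for illustration, not GCC code.

```c
#include <stddef.h>

/* Textbook half-open binary search over KEYS[0..N).  Each iteration
   strictly shrinks [lo, hi), so it terminates on any input, sorted or
   not; on unsorted input it just lands on an unpredictable slot.
   *ITERATIONS receives the number of loop iterations taken.  */
static ptrdiff_t
bsearch_index (const int *keys, size_t n, int key, unsigned *iterations)
{
  size_t lo = 0, hi = n;

  *iterations = 0;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;  /* lo <= mid < hi */
      ++*iterations;
      if (keys[mid] == key)
        return (ptrdiff_t) mid;
      if (keys[mid] < key)
        lo = mid + 1;                   /* mid excluded: range shrinks.  */
      else
        hi = mid;                       /* mid excluded: range shrinks.  */
    }
  return -1;
}
```

A hang therefore needs a path that restarts the search without shrinking the range, not merely an unsorted array.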


Re: [PATCH 11/11] rs6000 - add speculation_barrier pattern

2018-07-31 Thread Bill Schmidt
Hi Richard,

I can't ack the patch, but I am happy with it.  Thank you for this work!

-- Bill

Bill Schmidt, Ph.D.
STSM, GCC Architect for Linux on Power
IBM Linux Technology Center
wschm...@linux.vnet.ibm.com

> On Jul 27, 2018, at 4:37 AM, Richard Earnshaw  
> wrote:
> 
> 
> This patch reworks the existing rs6000_speculation_barrier pattern to
> work with the new __builtin_speculation_safe_value() intrinsic.  The
> change is trivial as it simply requires renaming the existing speculation
> barrier pattern.
> 
> So the total patch is to delete 14 characters!
> 
>   * config/rs6000/rs6000.md (speculation_barrier): Renamed from
>   rs6000_speculation_barrier.
>   * config/rs6000/rs6000.c (rs6000_expand_builtin): Adjust for
>   new barrier pattern name.
> ---
> gcc/config/rs6000/rs6000.c  | 2 +-
> gcc/config/rs6000/rs6000.md | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
> 
> <0011-rs6000-add-speculation_barrier-pattern.patch>
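For readers unfamiliar with the intrinsic, a typical use is a bounds-checked array load. This is a sketch based on the GCC documentation for `__builtin_speculation_safe_value`; the `SPECULATION_SAFE` fallback macro, `table` and `load_checked` are illustrative names, and the code uses the plain index when the compiler does not define `__HAVE_SPECULATION_SAFE_VALUE`.

```c
#include <stddef.h>

#define TABLE_SIZE 500
static int table[TABLE_SIZE];

/* Use the builtin only where the compiler advertises support.  */
#ifdef __HAVE_SPECULATION_SAFE_VALUE
# define SPECULATION_SAFE(x) __builtin_speculation_safe_value (x)
#else
# define SPECULATION_SAFE(x) (x)
#endif

/* Bounds-checked load.  If the CPU speculatively executes the load
   after mispredicting the check, the builtin yields its fail value
   (0 by default) instead of the untrusted index.  */
static int
load_checked (size_t untrusted_index)
{
  if (untrusted_index < TABLE_SIZE)
    return table[SPECULATION_SAFE (untrusted_index)];
  return 0;
}
```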



Re: [gen/AArch64] Generate helpers for substituting iterator values into pattern names

2018-07-31 Thread James Greenhalgh
On Fri, Jul 13, 2018 at 04:15:41AM -0500, Richard Sandiford wrote:
> Given a pattern like:
> 
>   (define_insn "aarch64_frecpe" ...)
> 
> the SVE ACLE implementation wants to generate the pattern for a
> particular (non-constant) mode.  This patch automatically generates
> helpers to do that, specifically:
> 
>   // Return CODE_FOR_nothing on failure.
>   insn_code maybe_code_for_aarch64_frecpe (machine_mode);
> 
>   // Assert that the code exists.
>   insn_code code_for_aarch64_frecpe (machine_mode);
> 
>   // Return NULL_RTX on failure.
>   rtx maybe_gen_aarch64_frecpe (machine_mode, rtx, rtx);
> 
>   // Assert that generation succeeds.
>   rtx gen_aarch64_frecpe (machine_mode, rtx, rtx);
> 
> Many patterns don't have sensible names when all <...>s are removed.
> E.g. "2" would give a base name "2".  The new functions
> therefore require explicit opt-in, which should also help to reduce
> code bloat.
> 
> The (arbitrary) opt-in syntax I went for was to prefix the pattern
> name with '@', similarly to the existing '*' marker.
> 
> The patch also makes config/aarch64 use the new routines in cases where
> they obviously apply.  This was mostly straightforward, but it seemed
> odd that we defined:
> 
>aarch64_reload_movcp<...>
> 
> but then only used it with DImode, never SImode.  If we should be
> using Pmode instead of DImode, then that's a simple change,
> but should probably be a separate patch.
> 
> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  I think I can self-approve the gen* bits,
> but OK for the AArch64 parts?

For what it is worth, I like the change to AArch64, and would support it
when you get consensus around the new syntax from other targets.

You only have to look at something like:

> -  rtx (*gen) (rtx, rtx, rtx);
> -
> -  switch (src_mode)
> -{
> -case E_V8QImode:
> -  gen = gen_aarch64_simd_combinev8qi;
> -  break;
> -case E_V4HImode:
> -  gen = gen_aarch64_simd_combinev4hi;
> -  break;
> -case E_V2SImode:
> -  gen = gen_aarch64_simd_combinev2si;
> -  break;
> -case E_V4HFmode:
> -  gen = gen_aarch64_simd_combinev4hf;
> -  break;
> -case E_V2SFmode:
> -  gen = gen_aarch64_simd_combinev2sf;
> -  break;
> -case E_DImode:
> -  gen = gen_aarch64_simd_combinedi;
> -  break;
> -case E_DFmode:
> -  gen = gen_aarch64_simd_combinedf;
> -  break;
> -default:
> -  gcc_unreachable ();
> -}
> -
> -  emit_insn (gen (dst, src1, src2));
> +  emit_insn (gen_aarch64_simd_combine (src_mode, dst, src1, src2));

To understand this is a Good Thing for code maintainability.
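The contract of the generated helpers (a `maybe_` variant that reports failure and a plain variant that asserts) can be illustrated with a table-driven stand-in for the removed switch. Everything here is hypothetical: the `toy_` enums and functions are not what the generators emit. They only model the interface described in the quoted mail.

```c
#include <assert.h>

/* Hypothetical stand-ins for machine modes and instruction codes.  */
enum toy_mode { TOY_V8QI, TOY_V4HI, TOY_V2SI, TOY_SI, TOY_NUM_MODES };
enum toy_icode
{
  TOY_CODE_NOTHING = 0,         /* No pattern for this mode.  */
  TOY_CODE_COMBINE_V8QI,
  TOY_CODE_COMBINE_V4HI,
  TOY_CODE_COMBINE_V2SI
};

/* One entry per mode; modes without a pattern (here TOY_SI) are
   zero-initialized, i.e. TOY_CODE_NOTHING.  */
static const enum toy_icode toy_combine_codes[TOY_NUM_MODES] = {
  [TOY_V8QI] = TOY_CODE_COMBINE_V8QI,
  [TOY_V4HI] = TOY_CODE_COMBINE_V4HI,
  [TOY_V2SI] = TOY_CODE_COMBINE_V2SI,
};

/* Like maybe_code_for_*: report failure to the caller.  */
static enum toy_icode
toy_maybe_code_for_combine (enum toy_mode mode)
{
  return (unsigned) mode < TOY_NUM_MODES ? toy_combine_codes[mode]
                                         : TOY_CODE_NOTHING;
}

/* Like code_for_*: the pattern must exist, which replaces the
   gcc_unreachable () default of the old switch.  */
static enum toy_icode
toy_code_for_combine (enum toy_mode mode)
{
  enum toy_icode icode = toy_maybe_code_for_combine (mode);
  assert (icode != TOY_CODE_NOTHING);
  return icode;
}
```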

Thanks,
James


> 
> Any objections to this approach or syntax?
> 
> Richard


Re: [PATCH][AArch64] Implement new intrinsics vabsd_s64 and vnegd_s64

2018-07-31 Thread James Greenhalgh
On Fri, Jul 20, 2018 at 04:37:34AM -0500, Vlad Lazar wrote:
> Hi,
> 
> The patch adds implementations for the NEON intrinsics vabsd_s64 and vnegd_s64.
> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/docs/ihi0073/latest/arm-neon-intrinsics-reference-architecture-specification)
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu and there are no regressions.
> 
> OK for trunk?
> 
> +__extension__ extern __inline int64_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vnegd_s64 (int64_t __a)
> +{
> +  return -__a;
> +}

Does this give the correct behaviour for the minimum value of int64_t? That
would be undefined behaviour in C, but well-defined under ACLE.
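One way to get the ACLE result without relying on signed overflow is to negate in unsigned arithmetic, where wrap-around is defined, and convert back. This is a sketch, not the patch's code; `wrapping_neg_s64` and `wrapping_abs_s64` are illustrative names. The final conversion to `int64_t` is implementation-defined in ISO C, and GCC documents it as reduction modulo 2^N, so INT64_MIN maps to itself under both operations.

```c
#include <stdint.h>

/* Negate modulo 2^64: the subtraction happens in uint64_t, where
   overflow is well defined, so there is no signed-overflow UB.  The
   conversion back to int64_t is implementation-defined (GCC: modulo
   2^64).  */
static int64_t
wrapping_neg_s64 (int64_t a)
{
  return (int64_t) (0 - (uint64_t) a);
}

/* Absolute value with the same wrapping behaviour for INT64_MIN.  */
static int64_t
wrapping_abs_s64 (int64_t a)
{
  return a < 0 ? wrapping_neg_s64 (a) : a;
}
```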

Thanks,
James



Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-07-31 Thread Andrew Pinski
On Tue, Jul 31, 2018 at 2:43 PM James Greenhalgh
 wrote:
>
> On Thu, Jul 12, 2018 at 12:01:09PM -0500, Sudakshina Das wrote:
> > Hi Eric
> >
> > On 27/06/18 12:22, Wilco Dijkstra wrote:
> > > Eric Botcazou wrote:
> > >
> > >>> This test can easily be changed not to use optimize since it doesn't look
> > >>> like it needs it. We really need to test these builtins properly,
> > >>> otherwise they will continue to fail on most targets.
> > >>
> > >> As far as I can see PR target/84521 has been reported only for Aarch64
> > >> so I'd just leave the other targets alone (and avoid propagating FUD if
> > >> possible).
> > >
> > > It's quite obvious from PR84521 that this is an issue affecting all targets.
> > > Adding better generic tests for __builtin_setjmp can only be a good thing.
> > >
> > > Wilco
> > >
> >
> > This conversation seems to have died down and I would like to
> > start it again. I would agree with Wilco's suggestion about
> > keeping the test in the generic folder. I have removed the
> > optimize attribute and the effect is still the same. It passes
> > on AArch64 with this patch and it currently fails on x86
> > trunk (gcc version 9.0.0 20180712 (experimental) (GCC))
> > on -O1 and above.
>
>
> I don't see where the FUD comes in here; either this builtin has a defined
> semantics across targets and they are adhered to, or the builtin doesn't have
> well defined semantics, or the targets fail to implement those semantics.

The problem comes from the fact that the builtins are not documented at all.
See PR59039 for the issue on them not being documented.

Thanks,
Andrew


>
> I think this should go in as is. If other targets are unhappy with the
> failing test they should fix their target or skip the test if it is not
> appropriate.
>
> You may want to CC some of the maintainers of platforms you know to fail as
> a courtesy on the PR (add your testcase, and add failing targets and their
> maintainers to that PR) before committing so it doesn't come as a complete
> surprise.
>
> This is OK with some attempt to get target maintainers involved in the
> conversation before commit.
>
> Thanks,
> James
>
> > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> > index f284e74..9792d28 100644
> > --- a/gcc/config/aarch64/aarch64.h
> > +++ b/gcc/config/aarch64/aarch64.h
> > @@ -473,7 +473,9 @@ extern unsigned aarch64_architecture_version;
> >  #define EH_RETURN_STACKADJ_RTX   gen_rtx_REG (Pmode, R4_REGNUM)
> >  #define EH_RETURN_HANDLER_RTX  aarch64_eh_return_handler_rtx ()
> >
> > -/* Don't use __builtin_setjmp until we've defined it.  */
> > +/* Don't use __builtin_setjmp until we've defined it.
> > +   CAUTION: This macro is only used during exception unwinding.
> > +   Don't fall for its name.  */
> >  #undef DONT_USE_BUILTIN_SETJMP
> >  #define DONT_USE_BUILTIN_SETJMP 1
> >
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index 01f35f8..4266a3d 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -3998,7 +3998,7 @@ static bool
> >  aarch64_needs_frame_chain (void)
> >  {
> >/* Force a frame chain for EH returns so the return address is at FP+8.  */
> > -  if (frame_pointer_needed || crtl->calls_eh_return)
> > +  if (frame_pointer_needed || crtl->calls_eh_return || cfun->has_nonlocal_label)
> >  return true;
> >
> >/* A leaf function cannot have calls or write LR.  */
> > @@ -12218,6 +12218,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
> >expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
> >  }
> >
> > +/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE.  */
> > +static rtx
> > +aarch64_builtin_setjmp_frame_value (void)
> > +{
> > +  return hard_frame_pointer_rtx;
> > +}
> > +
> >  /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR.  */
> >
> >  static tree
> > @@ -17744,6 +17751,9 @@ aarch64_run_selftests (void)
> >  #undef TARGET_FOLD_BUILTIN
> >  #define TARGET_FOLD_BUILTIN aarch64_fold_builtin
> >
> > +#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE
> > +#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value
> > +
> >  #undef TARGET_FUNCTION_ARG
> >  #define TARGET_FUNCTION_ARG aarch64_function_arg
> >
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index a014a01..d5f33d8 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -6087,6 +6087,30 @@
> >DONE;
> >  })
> >
> > +;; This is broadly similar to the builtins.c except that it uses
> > +;; temporaries to load the incoming SP and FP.
> > +(define_expand "nonlocal_goto"
> > +  [(use (match_operand 0 "general_operand"))
> > +   (use (match_operand 1 "general_operand"))
> > +   (use (match_operand 2 "general_operand"))
> > +   (use (match_operand 3 "general_operand"))]
> > +  ""
> > +{
> > +rtx label_in = copy_to_reg (operands[1]);
> > +rtx fp_in = copy_to_reg (operands[3

Re: [AArch64] Add support for 16-bit FMOV immediates

2018-07-31 Thread James Greenhalgh
On Wed, Jul 18, 2018 at 12:47:27PM -0500, Richard Sandiford wrote:
> aarch64_float_const_representable_p was still returning false for
> HFmode, so we wouldn't use 16-bit FMOV immediate.  E.g. before the
> patch:
> 
> __fp16 foo (void) { return 0x1.1p-3; }
> 
> gave:
> 
>    mov     w0, 12352
>    fmov    h0, w0
> 
> with -march=armv8.2-a+fp16, whereas now it gives:
> 
>    fmov    h0, 1.328125e-1
> 
> Tested on aarch64-linux-gnu, both with and without SVE.  OK to install?

OK.

Thanks,
James
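Why 0x1.1p-3 now qualifies: AArch64 FMOV immediates encode values of the form ±(1 + m/16) × 2^e with 0 ≤ m ≤ 15 and -3 ≤ e ≤ 4, and 0x1.1p-3 is (1 + 1/16) × 2^-3 = 0.1328125. The brute-force check below is a sketch of that rule only, not a copy of `aarch64_float_const_representable_p`; `fmov_imm_representable` is a hypothetical name. It also accepts the 17.0, -1.0 and 16.0 values that the updated f16_mov_immediate_1.c test scans for.

```c
#include <stdbool.h>

/* Sketch: is X encodable as an AArch64 8-bit FP immediate, i.e.
   +/-(1 + m/16) * 2^e with 0 <= m <= 15 and -3 <= e <= 4?  All 128
   candidate magnitudes are exact in binary floating point, so plain
   equality comparisons are safe.  */
static bool
fmov_imm_representable (double x)
{
  double pow2 = 0.125;                          /* 2^-3 */
  for (int e = -3; e <= 4; e++, pow2 *= 2.0)
    for (int m = 0; m <= 15; m++)
      {
        double v = (16 + m) / 16.0 * pow2;      /* (1 + m/16) * 2^e */
        if (x == v || x == -v)
          return true;
      }
  return false;
}
```

Note that 0.0 is not in this set, and the largest and smallest magnitudes are 31.0 and 0.125.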

> 
> Richard
> 
> 
> 2018-07-18  Richard Sandiford  
> 
> gcc/
>   * config/aarch64/aarch64.c (aarch64_float_const_representable_p):
>   Allow HFmode constants if TARGET_FP_F16INST.
> 
> gcc/testsuite/
>   * gcc.target/aarch64/f16_mov_immediate_1.c: Expect fmov immediate
>   to be used.
>   * gcc.target/aarch64/f16_mov_immediate_2.c: Likewise.
>   * gcc.target/aarch64/f16_mov_immediate_3.c: Force +nofp16.
>   * gcc.target/aarch64/sve/single_1.c: Expect fmov immediate to be used
>   for .h.
>   * gcc.target/aarch64/sve/single_2.c: Likewise.
>   * gcc.target/aarch64/sve/single_3.c: Likewise.
>   * gcc.target/aarch64/sve/single_4.c: Likewise.
> 
> Index: gcc/config/aarch64/aarch64.c
> ===
> --- gcc/config/aarch64/aarch64.c  2018-07-18 18:45:26.0 +0100
> +++ gcc/config/aarch64/aarch64.c  2018-07-18 18:45:27.025332090 +0100
> @@ -14908,8 +14908,8 @@ aarch64_float_const_representable_p (rtx
>if (!CONST_DOUBLE_P (x))
>  return false;
>  
> -  /* We don't support HFmode constants yet.  */
> -  if (GET_MODE (x) == VOIDmode || GET_MODE (x) == HFmode)
> +  if (GET_MODE (x) == VOIDmode
> +  || (GET_MODE (x) == HFmode && !TARGET_FP_F16INST))
>  return false;
>  
>r = *CONST_DOUBLE_REAL_VALUE (x);
> Index: gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_1.c
> ===
> --- gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_1.c  2018-07-18 18:45:26.0 +0100
> +++ gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_1.c  2018-07-18 18:45:27.025332090 +0100
> @@ -44,6 +44,6 @@ __fp16 f5 ()
>return a;
>  }
>  
> -/* { dg-final { scan-assembler-times "mov\tw\[0-9\]+, #?19520"   3 } } */
> -/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.4h, 0xbc, lsl 8"  1 } } */
> -/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.4h, 0x4c, lsl 8"  1 } } */
> +/* { dg-final { scan-assembler-times {fmov\th[0-9]+, #?1\.7e\+1}  3 } } */
> +/* { dg-final { scan-assembler-times {fmov\th[0-9]+, #?-1\.0e\+0} 1 } } */
> +/* { dg-final { scan-assembler-times {fmov\th[0-9]+, #?1\.6e\+1}  1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_2.c
> ===
> --- gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_2.c  2018-07-18 18:45:26.0 +0100
> +++ gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_2.c  2018-07-18 18:45:27.025332090 +0100
> @@ -40,6 +40,4 @@ float16_t f3(void)
>  /* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.4h, 0x5c, lsl 8" 1 } } */
>  /* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.4h, 0x7c, lsl 8" 1 } } */
>  
> -/* { dg-final { scan-assembler-times "mov\tw\[0-9\]+, 19520"  1 } } */
> -/* { dg-final { scan-assembler-times "fmov\th\[0-9\], w\[0-9\]+"  1 } } */
> -
> +/* { dg-final { scan-assembler-times {fmov\th[0-9]+, #?1.7e\+1}   1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_3.c
> ===
> --- gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_3.c  2018-07-18 18:45:26.0 +0100
> +++ gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_3.c  2018-07-18 18:45:27.025332090 +0100
> @@ -1,6 +1,8 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2" } */
>  
> +#pragma GCC target "+nofp16"
> +
>  __fp16 f4 ()
>  {
>__fp16 a = 0.1;
> Index: gcc/testsuite/gcc.target/aarch64/sve/single_1.c
> ===
> --- gcc/testsuite/gcc.target/aarch64/sve/single_1.c   2018-07-18 18:45:26.0 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/single_1.c   2018-07-18 18:45:27.025332090 +0100
> @@ -36,7 +36,7 @@ TEST_LOOP (double, 3.0)
>  /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.s, #6\n} 1 } } */
>  /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #7\n} 1 } } */
>  /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #8\n} 1 } } */
> -/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, #15360\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tfmov\tz[0-9]+\.h, #1\.0e\+0\n} 1 } } */
>  /* { dg-final { scan-assembler-times {\tfmov\tz[0-9]+\.s, #2\.0e\+0\n} 1 } } */
>  /* { dg-final { scan-assembler-times {\tfmov\tz[0-9]+\.d, #3\.0e\+0\n

Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-07-31 Thread James Greenhalgh
On Thu, Jul 12, 2018 at 12:01:09PM -0500, Sudakshina Das wrote:
> Hi Eric
> 
> On 27/06/18 12:22, Wilco Dijkstra wrote:
> > Eric Botcazou wrote:
> > 
> >>> This test can easily be changed not to use optimize since it doesn't look
> >>> like it needs it. We really need to test these builtins properly,
> >>> otherwise they will continue to fail on most targets.
> >>
> >> As far as I can see PR target/84521 has been reported only for Aarch64 so I'd
> >> just leave the other targets alone (and avoid propagating FUD if possible).
> > 
> > It's quite obvious from PR84521 that this is an issue affecting all targets.
> > Adding better generic tests for __builtin_setjmp can only be a good thing.
> > 
> > Wilco
> > 
> 
> This conversation seems to have died down and I would like to
> start it again. I would agree with Wilco's suggestion about
> keeping the test in the generic folder. I have removed the
> optimize attribute and the effect is still the same. It passes
> on AArch64 with this patch and it currently fails on x86
> trunk (gcc version 9.0.0 20180712 (experimental) (GCC))
> on -O1 and above.


I don't see where the FUD comes in here; either this builtin has a defined
semantics across targets and they are adhered to, or the builtin doesn't have
well defined semantics, or the targets fail to implement those semantics.

I think this should go in as is. If other targets are unhappy with the
failing test they should fix their target or skip the test if it is not
appropriate.

You may want to CC some of the maintainers of platforms you know to fail as
a courtesy on the PR (add your testcase, and add failing targets and their
maintainers to that PR) before committing so it doesn't come as a complete
surprise.

This is OK with some attempt to get target maintainers involved in the
conversation before commit.

Thanks,
James
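For reference, the kind of function affected by the `cfun->has_nonlocal_label` check in the patch quoted below can be written in GNU C with a nested function that jumps back into its parent. This is a GCC-only illustration (nested functions are a GNU extension) with a made-up name, `nonlocal_goto_demo`. GCC also sets this flag for functions containing a `__builtin_setjmp` receiver, which is how the check reaches the PR84521 case.

```c
/* GNU C only: the nested function performs a nonlocal goto into its
   parent's frame, so the parent receives a nonlocal label.  */
static int
nonlocal_goto_demo (void)
{
  __label__ failed;     /* Must be declared for the nested function.  */

  void bail_out (void)
  {
    goto failed;        /* Nonlocal goto into the parent frame.  */
  }

  bail_out ();
  return 0;             /* Not reached.  */

 failed:
  return 1;
}
```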

> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index f284e74..9792d28 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -473,7 +473,9 @@ extern unsigned aarch64_architecture_version;
>  #define EH_RETURN_STACKADJ_RTX   gen_rtx_REG (Pmode, R4_REGNUM)
>  #define EH_RETURN_HANDLER_RTX  aarch64_eh_return_handler_rtx ()
>  
> -/* Don't use __builtin_setjmp until we've defined it.  */
> +/* Don't use __builtin_setjmp until we've defined it.
> +   CAUTION: This macro is only used during exception unwinding.
> +   Don't fall for its name.  */
>  #undef DONT_USE_BUILTIN_SETJMP
>  #define DONT_USE_BUILTIN_SETJMP 1
>  
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 01f35f8..4266a3d 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3998,7 +3998,7 @@ static bool
>  aarch64_needs_frame_chain (void)
>  {
>/* Force a frame chain for EH returns so the return address is at FP+8.  */
> -  if (frame_pointer_needed || crtl->calls_eh_return)
> +  if (frame_pointer_needed || crtl->calls_eh_return || 
> cfun->has_nonlocal_label)
>  return true;
>  
>/* A leaf function cannot have calls or write LR.  */
> @@ -12218,6 +12218,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx 
> nextarg ATTRIBUTE_UNUSED)
>expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
>  }
>  
> +/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE.  */
> +static rtx
> +aarch64_builtin_setjmp_frame_value (void)
> +{
> +  return hard_frame_pointer_rtx;
> +}
> +
>  /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR.  */
>  
>  static tree
> @@ -17744,6 +17751,9 @@ aarch64_run_selftests (void)
>  #undef TARGET_FOLD_BUILTIN
>  #define TARGET_FOLD_BUILTIN aarch64_fold_builtin
>  
> +#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE
> +#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value
> +
>  #undef TARGET_FUNCTION_ARG
>  #define TARGET_FUNCTION_ARG aarch64_function_arg
>  
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index a014a01..d5f33d8 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -6087,6 +6087,30 @@
>DONE;
>  })
>  
> +;; This is broadly similar to the builtins.c except that it uses
> +;; temporaries to load the incoming SP and FP.
> +(define_expand "nonlocal_goto"
> +  [(use (match_operand 0 "general_operand"))
> +   (use (match_operand 1 "general_operand"))
> +   (use (match_operand 2 "general_operand"))
> +   (use (match_operand 3 "general_operand"))]
> +  ""
> +{
> +rtx label_in = copy_to_reg (operands[1]);
> +rtx fp_in = copy_to_reg (operands[3]);
> +rtx sp_in = copy_to_reg (operands[2]);
> +
> +emit_move_insn (hard_frame_pointer_rtx, fp_in);
> +emit_stack_restore (SAVE_NONLOCAL, sp_in);
> +
> +emit_use (hard_frame_pointer_rtx);
> +emit_use (stack_pointer_rtx);
> +
> +emit_indirect_jump (label_in);
> +
> +DONE;
> +})
> +
>  ;; Helper for aarch64.c code.
>  (define_expand "set_clobber_cc"
>[(parallel [(set (match_operand 0)
> diff --git a/gcc/testsuite/gcc.c

Re: [PATCH][GCC][AArch64] Cleanup the AArch64 testsuite when stack-clash is on [Patch (6/6)]

2018-07-31 Thread James Greenhalgh
On Tue, Jul 24, 2018 at 05:28:03AM -0500, Tamar Christina wrote:
> Hi All,
> 
> This patch cleans up the testsuite when a run is done with stack clash
> protection turned on.
> 
> Concretely this switches off -fstack-clash-protection for a couple of tests:
> 
> * sve: We don't yet support stack-clash-protection and sve, so for now turn
>   these off.
> * assembler scan: some tests are quite fragile in that they check for exact
>   assembly output, e.g. check for exact amount of sub etc.  These won't
>   match now.
> * vla: Some of the ubsan tests use negative array indices. Because the arrays
>   weren't used before, the incorrect $sp wouldn't have been used. The correct
>   value is restored on ret.  Now, however, we probe the $sp, which causes a
>   segfault.
> * params: When testing the parameters we have to skip these on AArch64
>   because of our custom constraints on them.  We already test them
>   separately so this isn't a loss.
> 
> Note that the testsuite is not entirely clean due to a gdb failure caused by
> alloca with stack clash. On AArch64 we output an incorrect .loc directive, but
> this is already the case with the current implementation in GCC and is a bug
> unrelated to this patch series.
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
> with no issues.
> Both targets were tested with stack clash on and off by default.
> 
> Ok for trunk?

For each of the generic tests you skip because of mismatched bounds, I think
we should ensure we have an equivalent test checking that behaviour in
gcc.target/aarch64/ . If we have that, it might be good to cross-reference
them with a comment above your skip lines.

> * vla: Some of the ubsan tests use negative array indices. Because the arrays
>   weren't used before, the incorrect $sp wouldn't have been used. The correct
>   value is restored on ret.  Now, however, we probe the $sp, which causes a
>   segfault.

This is interesting behaviour; is it a desirable side effect of your changes?

Otherwise, this patch is OK.

Thanks,
James


> gcc/testsuite/
> 2018-07-24  Tamar Christina  
> 
>   PR target/86486
>   * gcc.dg/pr82788.c: Skip for AArch64.
>   * gcc.dg/guality/vla-1.c: Turn off stack-clash.
>   * gcc.target/aarch64/subsp.c: Likewise.
>   * gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise.
>   * gcc.target/aarch64/sve/mask_struct_store_3.c: Likewise.
>   * gcc.target/aarch64/sve/mask_struct_store_4.c: Likewise.
>   * gcc.dg/params/blocksort-part.c: Skip stack-clash checks
>   on AArch64.
>   * gcc.dg/stack-check-10.c: Add AArch64 specific checks.
>   * gcc.dg/stack-check-5.c: Add AArch64 specific checks.
>   * gcc.dg/stack-check-6a.c: Skip on AArch64, we don't support this.
>   * testsuite/lib/target-supports.exp
>   (check_effective_target_frame_pointer_for_non_leaf): AArch64 does not
>   require frame pointer for non-leaf functions.
> 
> > -----Original Message-----
> > From: Tamar Christina 
> > Sent: Wednesday, July 11, 2018 12:23
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd ; James Greenhalgh ;
> > Richard Earnshaw ; Marcus Shawcroft
> > 
> > Subject: [PATCH][GCC][AArch64] Cleanup the AArch64 testsuite when stack-
> > clash is on [Patch (6/6)]
> > 
> > Hi All,
> > 
> > This patch cleans up the testsuite when a run is done with stack clash
> > protection turned on.
> > 
> > Concretely this switches off -fstack-clash-protection for a couple of tests:
> > 
> > * sve: We don't yet support stack-clash-protection and sve, so for now turn
> > these off.
> > * assembler scan: some tests are quite fragile in that they check for exact
> >assembly output, e.g. check for exact amount of sub etc.  These won't
> >match now.
> > * vla: Some of the ubsan tests use negative array indices. Because the arrays
> >   weren't used before, the incorrect $sp wouldn't have been used. The correct
> >   value is restored on ret.  Now however we probe the $sp which causes a
> >   segfault.
> > * params: When testing the parameters we have to skip these on AArch64
> >   because of our custom constraints on them.  We already test them
> >   separately so this isn't a loss.
> > 
> > Note that the testsuite is not entirely clean due to a gdb failure caused by
> > alloca with stack clash. On AArch64 we output an incorrect .loc directive,
> > but this is already the case with the current implementation in GCC and is
> > a bug unrelated to this patch series.
> > 
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no issues.
> > Both targets were tested with stack clash on and off by default.
> > 
> > Ok for trunk?
> > 
> > Thanks,
> > Tamar
> > 
> > gcc/testsuite/
> > 2018-07-11  Tamar Christina  
> > 
> > PR target/86486
> > gcc.dg/pr82788.c: Skip for AArch64.
> > gcc.dg/guality/vla-1.c: Turn off stack-clash.
> > gcc.target/aarch64/subs

Re: [PATCH][GCC][AArch64] Set default values for stack-clash and do basic validation in back-end. [Patch (5/6)]

2018-07-31 Thread James Greenhalgh
On Tue, Jul 24, 2018 at 05:27:05AM -0500, Tamar Christina wrote:
> Hi All,
> 
> This patch is a cascade update from having to re-spin the configure patch 
> (no# 4 in the series).
> 
> This patch enforces that the default guard size for stack-clash protection for
> AArch64 be 64KB unless the user has overridden it via configure in which case
> the user value is used as long as that value is within the valid range.
> 
> It also does some basic validation to ensure that the guard size is only 4KB or
> 64KB and also enforces that for aarch64 the stack-clash probing interval is
> equal to the guard size.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> Target was tested with stack clash on and off by default.
> 
> Ok for trunk?

This is OK with the style changes below.

Thanks,
James

> gcc/
> 2018-07-24  Tamar Christina  
> 
>   PR target/86486
>   * config/aarch64/aarch64.c (aarch64_override_options_internal):
>   Add validation for stack-clash parameters and set defaults.
> 
> > -----Original Message-----
> > From: Tamar Christina 
> > Sent: Wednesday, July 11, 2018 12:23
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd ; James Greenhalgh ;
> > Richard Earnshaw ; Marcus Shawcroft
> > 
> > Subject: [PATCH][GCC][AArch64] Set default values for stack-clash and do
> > basic validation in back-end. [Patch (5/6)]
> > 
> > Hi All,
> > 
> > This patch enforces that the default guard size for stack-clash protection for
> > AArch64 be 64KB unless the user has overridden it via configure in which case
> > the user value is used as long as that value is within the valid range.
> > 
> > It also does some basic validation to ensure that the guard size is only 4KB or
> > 64KB and also enforces that for aarch64 the stack-clash probing interval is
> > equal to the guard size.
> > 
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > Target was tested with stack clash on and off by default.
> > 
> > Ok for trunk?
> > 
> > Thanks,
> > Tamar
> > 
> > gcc/
> > 2018-07-11  Tamar Christina  
> > 
> > PR target/86486
> > * config/aarch64/aarch64.c (aarch64_override_options_internal):
> > Add validation for stack-clash parameters.
> > 
> > --

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index e2c34cdfc96a1d3f99f7e4834c66a7551464a518..30c62c406e10793fe041d54c73316a6c8d7c229f 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -10916,6 +10916,37 @@ aarch64_override_options_internal (struct gcc_options *opts)
>opts->x_param_values,
>global_options_set.x_param_values);
>  
> +  /* If the user hasn't change it via configure then set the default to 64 KB

s/change/changed/

> + for the backend.  */
> +  maybe_set_param_value (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE,
> +  DEFAULT_STK_CLASH_GUARD_SIZE == 0
> +? 16 : DEFAULT_STK_CLASH_GUARD_SIZE,
> +  opts->x_param_values,
> +  global_options_set.x_param_values);
> +
> +  /* Validate the guard size.  */
> +  int guard_size = PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
> +  if (guard_size != 12 && guard_size != 16)
> +  error ("only values 12 (4 KB) and 16 (64 KB) are supported for guard "

Formatting is wrong, two spaces to indent error.

> +  "size.  Given value %d (%llu KB) is out of range.\n",

No \n on errors. s/out of range/invalid/

> +  guard_size, (1ULL << guard_size) / 1024ULL);
> +
> +  /* Enforce that interval is the same size as size so the mid-end does the
> + right thing.  */
> +  maybe_set_param_value (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL,
> +  guard_size,
> +  opts->x_param_values,
> +  global_options_set.x_param_values);
> +
> +  /* The maybe_set calls won't update the value if the user has explicitly set
> + one.  Which means we need to validate that probing interval and guard size
> + are equal.  */
> +  int probe_interval
> += PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL);
> +  if (guard_size != probe_interval)
> +error ("stack clash guard size '%d' must be equal to probing interval "
> +"'%d'\n", guard_size, probe_interval);

No \n on errors.

> +
>/* Enable sw prefetching at specified optimization level for
>   CPUS that have prefetch.  Lower optimization level threshold by 1
>   when profiling is enabled.  */
> 
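The validation added by the diff can be modelled outside the compiler. What follows is a minimal stand-alone sketch, not GCC code: the struct, its `*_set` flags and `validate_stack_clash_params` are hypothetical stand-ins for the `--param` machinery and `maybe_set_param_value`. As in the patch, both values are log2 of a byte count (12 for 4 KB, 16 for 64 KB), and a configure-time default of 0 means "not configured".

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of the two stack-clash --param values.  Both are
   log2 of a size in bytes: 12 means 4 KB and 16 means 64 KB.  */
struct stack_clash_params
{
  int guard_size;      /* guard size exponent */
  int probe_interval;  /* probing interval exponent */
  int guard_set;       /* nonzero if the user set guard_size explicitly */
  int interval_set;    /* nonzero if the user set probe_interval explicitly */
};

/* Returns 0 if the parameters are usable, 1 for an unsupported guard size
   and 2 for a probing interval that differs from the guard size.
   CONFIGURE_DEFAULT plays the role of DEFAULT_STK_CLASH_GUARD_SIZE, where
   0 means "not set at configure time".  */
static int
validate_stack_clash_params (struct stack_clash_params *p,
                             int configure_default)
{
  assert (p != NULL);

  /* Like maybe_set_param_value: only fill in a value the user left unset.  */
  if (!p->guard_set)
    p->guard_size = configure_default == 0 ? 16 : configure_default;

  if (p->guard_size != 12 && p->guard_size != 16)
    {
      fprintf (stderr, "only values 12 (4 KB) and 16 (64 KB) are supported "
               "for guard size; given value %d is invalid\n", p->guard_size);
      return 1;
    }

  /* Default the probing interval to the guard size...  */
  if (!p->interval_set)
    p->probe_interval = p->guard_size;

  /* ...but an explicit user value is left alone, so it must be checked.  */
  if (p->probe_interval != p->guard_size)
    {
      fprintf (stderr, "stack clash guard size %d must be equal to probing "
               "interval %d\n", p->guard_size, p->probe_interval);
      return 2;
    }
  return 0;
}
```

Unlike this sketch, the patch keeps going after reporting an invalid guard size, because GCC's `error` records the diagnostic without stopping compilation; the early return here only keeps the model small.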



Re: [PATCH] Alias -Warray-bounds to -Warray-bounds=1

2018-07-31 Thread Martin Sebor

I can't approve patches but this one seems to be in
the obvious category so I think it could be checked in
without formal approval.

It is however missing a couple of things: 1) a test case,
and 2) a reference to the bug it fixes in the ChangeLog
and in the test.

With that, if no one objects, I will commit the patch for
you.

Martin

On 07/25/2018 08:04 AM, Franz Sirl wrote:

Hi,

as discussed with Martin, this patch consolidates -Warray-bounds into an
alias of -Warray-bounds=1.

Bootstrapped on x86_64-linux, no regressions.

Please apply if it's OK.

Franz.







Re: Fold pointer range checks with equal spans

2018-07-31 Thread Marc Glisse

On Tue, 31 Jul 2018, Richard Biener wrote:


Also, when @2 == @0 + (@1+1) then the original condition is true but
((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not?
   (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2
-> -1 > @1 * 2

which is false.  So I can't really see how you can apply this transform in
general (it would be fine for generating alias checks but a bit more
pessimistic).

But maybe I am missing something?


It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true.
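The unsigned trick can be checked with ordinary integers. This is a minimal sketch under stated assumptions: `size_t` stands in for `sizetype`, the two addresses are plain integers, `span * 2` does not wrap, and `outside_span` is a made-up name. Under those assumptions the single unsigned comparison agrees with |a - b| > span as long as |a - b| + span stays far below SIZE_MAX. The case a - b == -(span + 1) from the example above yields (size_t) -1, which is greater than span * 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One unsigned comparison in place of the two-sided test |a - b| > span.
   When a - b lies in [-span, span], a - b + span lands in [0, 2 * span].
   When a - b < -span, the sum wraps around to a huge value, so it still
   compares greater than 2 * span (provided |a - b| + span stays far
   below SIZE_MAX).  */
static int
outside_span (size_t a, size_t b, size_t span)
{
  assert (span <= SIZE_MAX / 2);  /* span * 2 must not wrap */
  return a - b + span > span * 2;
}
```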


Hmm, so mathematically this is

 (@0 - @2) % modreduce + @1 > @1 * 2

then, but I don't want to think too much about this since Marc didn't
object here ;)


We already transform abs(x)<=3 into (unsigned)x+3u<=6u, that's the usual
way we do range checking so I didn't pay much attention to that part.
(tempted to say: "I didn't want to think too much about this since
Richard was going to do it anyway" ;-)
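The `abs (x) <= 3` canonicalization mentioned above can be spot-checked the same way. In this small sketch (the function names are made up), converting to `unsigned` maps x in [-3, -1] to values that wrap back into [0, 2] once 3u is added, while every x outside [-3, 3] ends up above 6u. The single comparison is therefore exact for every `int`, including INT_MIN, for which `abs` itself would be undefined.

```c
#include <assert.h>
#include <limits.h>

/* Range check -3 <= x <= 3 as one unsigned comparison.  */
static int
in_range_unsigned (int x)
{
  return (unsigned) x + 3u <= 6u;
}

/* The same check written with two signed comparisons.  */
static int
in_range_signed (int x)
{
  return x >= -3 && x <= 3;
}
```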

Turning multiple comparisons of the form P + cst CMP Q + cst into a
range check on P - Q sounds good (we don't really have to restrict to
the case where the range is symmetric). Actually, just turning P + cst
CMP Q + cst into P - Q CMP cst should do it, we should already have code
to handle range checking on integers (modulo the issue of CSE P-Q and
Q-P). But I don't know if a couple :s is sufficient to make this
transformation a good canonicalization.

If we start from a comparison of pointer_plus, I think it would make
sense to use pointer_diff.

I believe currently we try to use pointer operations (pointer_plus,
pointer_diff, lt) only for related pointers and we cast to some integer
type for wilder cases (implementation of std::less in C++ for instance).
On the other hand, in an alias check, the 2 pointers are possibly
unrelated, so maybe the code emitted for an alias check should be
changed.

--
Marc Glisse


Re: [PATCH 01/11] Add __builtin_speculation_safe_value

2018-07-31 Thread Ian Lance Taylor via gcc-patches
On Tue, Jul 31, 2018 at 12:25 PM, H.J. Lu  wrote:
> On Mon, Jul 30, 2018 at 6:16 AM, Richard Biener  wrote:
>> On Fri, 27 Jul 2018, Richard Earnshaw wrote:
>>
>>>
>>> This patch defines a new intrinsic function
>>> __builtin_speculation_safe_value.  A generic default implementation is
>>> defined which will attempt to use the backend pattern
>>> "speculation_safe_barrier".  If this pattern is not defined, or if it
>>> is not available, then the compiler will emit a warning, but
>>> compilation will continue.
>>>
>>> Note that the test spec-barrier-1.c will currently fail on all
>>> targets.  This is deliberate, the failure will go away when
>>> appropriate action is taken for each target backend.
>>
>> OK.
>>
>> Thanks,
>> Richard.
>>
>>> gcc:
>>>   * builtin-types.def (BT_FN_PTR_PTR_VAR): New function type.
>>>   (BT_FN_I1_I1_VAR, BT_FN_I2_I2_VAR, BT_FN_I4_I4_VAR): Likewise.
>>>   (BT_FN_I8_I8_VAR, BT_FN_I16_I16_VAR): Likewise.
>>>   * builtin-attrs.def (ATTR_NOVOPS_NOTHROW_LEAF_LIST): New attribute
>>>   list.
>>>   * builtins.def (BUILT_IN_SPECULATION_SAFE_VALUE_N): New builtin.
>>>   (BUILT_IN_SPECULATION_SAFE_VALUE_PTR): New internal builtin.
>>>   (BUILT_IN_SPECULATION_SAFE_VALUE_1): Likewise.
>>>   (BUILT_IN_SPECULATION_SAFE_VALUE_2): Likewise.
>>>   (BUILT_IN_SPECULATION_SAFE_VALUE_4): Likewise.
>>>   (BUILT_IN_SPECULATION_SAFE_VALUE_8): Likewise.
>>>   (BUILT_IN_SPECULATION_SAFE_VALUE_16): Likewise.
>>>   * builtins.c (expand_speculation_safe_value): New function.
>>>   (expand_builtin): Call it.
>>>   * doc/cpp.texi: Document predefine __HAVE_SPECULATION_SAFE_VALUE.
>>>   * doc/extend.texi: Document __builtin_speculation_safe_value.
>>>   * doc/md.texi: Document "speculation_barrier" pattern.
>>>   * doc/tm.texi.in: Pull in TARGET_SPECULATION_SAFE_VALUE and
>>>   TARGET_HAVE_SPECULATION_SAFE_VALUE.
>>>   * doc/tm.texi: Regenerated.
>>>   * target.def (have_speculation_safe_value, speculation_safe_value): New
>>>   hooks.
>>>   * targhooks.c (default_have_speculation_safe_value): New function.
>>>   (default_speculation_safe_value): New function.
>>>   * targhooks.h (default_have_speculation_safe_value): Add prototype.
>>>   (default_speculation_safe_value): Add prototype.
>>>
>
> I got
>
> ../../src-trunk/gcc/targhooks.c: In function ‘bool
> default_have_speculation_safe_value(bool)’:
> ../../src-trunk/gcc/targhooks.c:2319:43: error: unused parameter
> ‘active’ [-Werror=unused-parameter]
>  default_have_speculation_safe_value (bool active)
>   ~^~


Me too.

Committed this patch as obvious.

Ian


2018-07-31  Ian Lance Taylor  

* targhooks.c (default_have_speculation_safe_value): Add
ATTRIBUTE_UNUSED.
Index: gcc/targhooks.c
===
--- gcc/targhooks.c (revision 263179)
+++ gcc/targhooks.c (working copy)
@@ -2316,7 +2316,7 @@ default_preferred_else_value (unsigned,
 
 /* Default implementation of TARGET_HAVE_SPECULATION_SAFE_VALUE.  */
 bool
-default_have_speculation_safe_value (bool active)
+default_have_speculation_safe_value (bool active ATTRIBUTE_UNUSED)
 {
 #ifdef HAVE_speculation_barrier
   return active ? HAVE_speculation_barrier : true;
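`ATTRIBUTE_UNUSED` is needed because, when `HAVE_speculation_barrier` is not defined, the preprocessor removes the only use of `active`. In that case -Wunused-parameter (enabled by -Wextra together with -Wall) becomes an error under -Werror. Below is a reduced stand-alone sketch. The local `ATTRIBUTE_UNUSED` definition mirrors the macro in `include/ansidecl.h` for GCC-compatible compilers, `have_ssv_sketch` is a hypothetical name, and the `#else` branch is an assumption because the diff above is cut off before it.

```c
#include <assert.h>
#include <stdbool.h>

#ifndef ATTRIBUTE_UNUSED
#define ATTRIBUTE_UNUSED __attribute__ ((__unused__))
#endif

/* Hypothetical reduction of default_have_speculation_safe_value.
   HAVE_speculation_barrier is deliberately left undefined here, so the
   #else branch is compiled and 'active' is never read; the attribute
   keeps -Wunused-parameter quiet.  */
bool
have_ssv_sketch (bool active ATTRIBUTE_UNUSED)
{
#ifdef HAVE_speculation_barrier
  return active ? HAVE_speculation_barrier : true;
#else
  return false;  /* assumed fallback, not taken from the quoted diff */
#endif
}
```

Compiling this with `gcc -Wall -Wextra -Werror -c` should succeed. Deleting `ATTRIBUTE_UNUSED` should reproduce the kind of error reported above.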


Merge from trunk to gccgo branch

2018-07-31 Thread Ian Lance Taylor
I've merged trunk revision 263114 to the gccgo branch.

Ian


Re: [PATCH] libbacktrace: Move define of HAVE_ZLIB into check for -lz

2018-07-31 Thread Ian Lance Taylor via gcc-patches
On Tue, Jul 31, 2018 at 8:10 AM, Iain Buclaw  wrote:
> On 31 July 2018 at 16:33, Ian Lance Taylor  wrote:
>> On Sun, Jul 29, 2018 at 7:50 AM, Iain Buclaw  wrote:
>>>
>>> This is really to suppress the default action-if-found for
>>> AC_CHECK_LIBS.  Zlib is not a dependency of libbacktrace, and so it
>>> shouldn't be added to LIBS.  When looking at the check, saw that could
>>> remove the test for ac_cv_lib_z_compress also.
>>
>> Thanks, but this doesn't seem like quite the right approach, as seen
>> by the fact that HAVE_ZLIB_H was dropped from config.h.in.  I think
>> you need to keep the AC_DEFINE out of the AC_CHECK_LIB.  I would guess
>> that it would work to just change the default case of AC_CHECK_LIB to
>> [;] or something similarly innocuous.
>>
>> Ian
>
> May I ask you to look at the patch again?  There's two similarly named
> variables here, HAVE_LIBZ and HAVE_ZLIB.
>
> Only the unused HAVE_LIBZ has been dropped from config.h.in.  The one
> that matters has been left alone, or at least I'm pretty sure of.

Ah, got it.  Sorry about that.  The patch is OK with a ChangeLog entry.  Thanks.

Ian


Re: [PATCH] Make GO string literals properly NUL terminated

2018-07-31 Thread Ian Lance Taylor
On Tue, Jul 31, 2018 at 9:19 AM, Bernd Edlinger
 wrote:
> On 07/31/18 16:40, Ian Lance Taylor wrote:
>> On Tue, Jul 31, 2018 at 5:14 AM, Bernd Edlinger
>>  wrote:
>>>
>>> could someone please review this patch and check it in into the GO FE?
>>
>> I don't understand why the change is correct, and you didn't explain
>> it.  Go strings are not NUL terminated.  Go strings always have an
>> associated length.
>>
>
> Yes, sorry.  Effectively for go this change is a no-op.
> I'll elaborate a bit.
>
> This makes it easier for the middle-end to distinguish between nul-terminated
> and not nul terminated strings, especially if wide character strings
> may also come along.
>
> In C a not nul terminated string might be declared like
> char x[2] = "12";
> it is always a STRING_CST object of length 3, with value "12\0".
> The array_type is char[0..1]
>
> while a nul terminated string is declared like
> char x[3] = "12";
> it is also a STRING_CST object of length 3, with value "12\0"
> The array_type is char[0..2]
>
> Note however the array type is different.
> So with this convention one only needs to compare the array type
> size with the string length which is much easier than looking for
> a terminating wide character, which is rarely done right.
>
> At the end varasm.c filters the excess NUL byte away, but
> I would like to add a checking assertion there that this does not
> strip more than max. one wide character nul byte.

Thanks, I think I should probably let this be reviewed by someone
reviewing the larger patch.  The go-gcc.cc file lives in the GCC repo
and changes to it can be approved and committed by any GCC middle-end
or global maintainer.  It's not part of the code copied from another
repo, which lives in gcc/go/gofrontend.

Ian
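The C convention Bernd describes can be observed directly in a stand-alone program. In this small sketch (all names are made up), an initializer may exactly fill the array. When it does, the terminating NUL is simply not stored (C11 6.7.9), so the array type is what distinguishes the two declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Both are initialized from the literal "12" (three chars including the
   NUL), but only the second array has room for the terminator.  */
static const char not_terminated[2] = "12";  /* type char[2], no NUL */
static const char terminated[3] = "12";      /* type char[3], NUL stored */

/* Nonzero if the first N bytes of S contain a NUL, i.e. S holds a
   properly terminated string within its bounds.  */
static int
has_nul (const char *s, size_t n)
{
  return memchr (s, '\0', n) != NULL;
}
```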


Re: [C PATCH] Fix endless loop in the C FE initializer handling (PR c/85704)

2018-07-31 Thread Joseph Myers
On Tue, 31 Jul 2018, Jakub Jelinek wrote:

> On Tue, Jul 31, 2018 at 08:04:58PM +, Joseph Myers wrote:
> > On Tue, 24 Jul 2018, Jakub Jelinek wrote:
> > 
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
> > > release branches?
> > > 
> > > 2018-07-24  Jakub Jelinek  
> > > 
> > >   PR c/85704
> > >   * c-typeck.c (field_decl_cmp): New function.
> > >   (output_pending_init_elements): Use it for field comparisons
> > >   instead of pure bit_position comparisons.
> > > 
> > >   * gcc.c-torture/compile/pr85704.c: New test.
> > 
> > OK, though I'm a bit uneasy about both c-decl.c and c-typeck.c having 
> > static field_decl_cmp functions that do completely different comparisons.
> 
> So would you like me to rename it somehow?
> init_field_decl_cmp, or field_cmp, or fld_cmp, something else?

init_field_decl_cmp seems reasonable (makes clearer what this particular 
comparison is for).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [C PATCH] Fix endless loop in the C FE initializer handling (PR c/85704)

2018-07-31 Thread Jakub Jelinek
On Tue, Jul 31, 2018 at 08:04:58PM +, Joseph Myers wrote:
> On Tue, 24 Jul 2018, Jakub Jelinek wrote:
> 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
> > release branches?
> > 
> > 2018-07-24  Jakub Jelinek  
> > 
> > PR c/85704
> > * c-typeck.c (field_decl_cmp): New function.
> > (output_pending_init_elements): Use it for field comparisons
> > instead of pure bit_position comparisons.
> > 
> > * gcc.c-torture/compile/pr85704.c: New test.
> 
> OK, though I'm a bit uneasy about both c-decl.c and c-typeck.c having 
> static field_decl_cmp functions that do completely different comparisons.

So would you like me to rename it somehow?
init_field_decl_cmp, or field_cmp, or fld_cmp, something else?

Jakub


Re: [C PATCH] Fix endless loop in the C FE initializer handling (PR c/85704)

2018-07-31 Thread Joseph Myers
On Tue, 24 Jul 2018, Jakub Jelinek wrote:

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
> release branches?
> 
> 2018-07-24  Jakub Jelinek  
> 
>   PR c/85704
>   * c-typeck.c (field_decl_cmp): New function.
>   (output_pending_init_elements): Use it for field comparisons
>   instead of pure bit_position comparisons.
> 
>   * gcc.c-torture/compile/pr85704.c: New test.

OK, though I'm a bit uneasy about both c-decl.c and c-typeck.c having 
static field_decl_cmp functions that do completely different comparisons.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 5/5] Formatted printing for dump_* in the middle-end

2018-07-31 Thread Joseph Myers
On Tue, 31 Jul 2018, David Malcolm wrote:

> I didn't exhaustively check every callsite to the changed calls; I'm
> assuming that -Wformat during bootstrap has effectively checked that
> for me.  Though now I think about it, I note that we use
> HOST_WIDE_INT_PRINT_DEC in many places: is this guaranteed to be a
> valid input to pp_format on all of our configurations?

HOST_WIDE_INT_PRINT_DEC should not be considered safe with pp_format 
(although since r197049 we may have effectively stopped using %I64 on MinGW 
hosts, I'm not sure if there are current cases where it won't work).  
Rather, it is the job of pp_format to map the 'w' length specifier to 
HOST_WIDE_INT_PRINT_DEC etc.

I think it clearly makes for cleaner code to limit use of 
HOST_WIDE_INT_PRINT_* to as few places as possible and to prefer use of 
internal printf-like functions that accept formats such as %wd where 
possible.
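The following hypothetical, heavily simplified formatter illustrates this division of labour. It is not the pretty-printer's code. Callers write a fixed `%wd` directive, and only the formatter knows how the host prints a 64-bit value. Here `<inttypes.h>`'s `PRId64` plays the role that `HOST_WIDE_INT_PRINT_DEC` plays in GCC. The sketch handles a single `%wd` and copies the rest of the format verbatim.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: expand one "%wd" directive in FMT with VALUE,
   mapping the 'w' length modifier to the host's int64_t conversion
   internally, so callers never splice a host-specific macro into their
   format strings.  Text around the directive is copied literally.  */
static int
format_wd (char *buf, size_t size, const char *fmt, int64_t value)
{
  const char *dir = strstr (fmt, "%wd");
  if (dir == NULL)
    return snprintf (buf, size, "%s", fmt);
  /* Text before the directive, the value, then the rest of FMT.  */
  return snprintf (buf, size, "%.*s%" PRId64 "%s",
                   (int) (dir - fmt), fmt, value, dir + 3);
}
```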

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Avoid infinite loop with duplicate anonymous union fields

2018-07-31 Thread Joseph Myers
On Tue, 31 Jul 2018, Bogdan Harjoc wrote:

> With fresh git sources and contrib/gcc_update the tests pass:
> 
> === gcc Summary ===
> 
> # of expected passes 133500
> # of expected failures 422
> # of unsupported tests 2104
> 
> gcc-build/gcc/xgcc  version 9.0.0 20180730 (experimental) (GCC)
> 
> I wasn't able to reduce the input to avoid including  and as
> it only reproduces without -save-temps, it's not clear how to write a
> testcase for this one.

Could you give more details of the paths through the code that are 
involved in the infinite loop, and the different paths you get without 
-save-temps?  Is this an issue of dependence on the values of pointers, or 
something like that?  Is it possible to produce a test with more instances 
of the problem in it, say, so that the probability of the problem showing 
up at least once with the test is much higher?

A binary search should not result in an infinite loop simply because the 
elements of the array are not sorted; in that case it should just converge 
on an unpredictable element.  So more explanation of how the infinite loop 
occurs is needed.  (But if choice of an unpredictable element results in 
e.g. subsequent diagnostics varying depending on pointer values, that by 
itself is a problem that may justify avoiding this code in the case where 
the array may not be sorted.)
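The termination argument can be seen in a generic sketch. This is not the code in c-decl.c, and `bsearch_index` is a made-up name. Each iteration either returns or strictly shrinks the half-open interval [lo, hi), so the loop ends after at most about log2 (n) + 1 steps no matter what the array contains. On unsorted input it just stops at an unpredictable slot, and may miss a key that is present.

```c
#include <assert.h>
#include <stddef.h>

/* Binary search for KEY in A[0..N).  Returns the index found, or -1.  */
static ptrdiff_t
bsearch_index (const int *a, size_t n, int key)
{
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (a[mid] == key)
        return (ptrdiff_t) mid;
      if (a[mid] < key)
        lo = mid + 1;  /* mid + 1 > lo: the interval shrinks */
      else
        hi = mid;      /* mid < hi: the interval shrinks */
    }
  return -1;
}
```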

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Libraries' configure scripts should not read config-ml.in when multilib is disabled

2018-07-31 Thread Joseph Myers
On Mon, 30 Jul 2018, John Ericson wrote:

> That said, it is my tentative understanding that the point of having config-ml
> is to cordon-off all the necessarily-multilib-specific logic so it doesn't
> pollute everything else. When that script isn't run, I think the Makefiles
> already contain default "trivial values" for capitalized MULTI* variables
> (which are the only ones actually used by the build itself), yielding
> precisely that deduplication of code paths we both want.

Well, if there are unnecessary settings in Makefiles the obvious thing to 
do is to remove them.

I think ideally config-ml.in wouldn't contain any conditionals at all on 
whether multilibs are enabled, and would run just the same whether they 
are or not, because in the multilib-disabled case -print-multi-lib should 
print the appropriate line for exactly one multilib, and everything should 
just follow through from that setting (--disable-multilib should be 
equivalent to multilibs enabled but only one present).

(More clearly, I think the special handling in config-ml.in for various 
architecture-specific configure options should go away.  I don't know why 
it's there, but multilib-related configure options should result in 
-print-multi-lib doing the right thing, so top-level doesn't need to do 
further manipulations on top of that.)

To build libraries separately, you can configure at top level with some of 
the subdirectories or their configure scripts that you don't need removed 
to avoid them getting built as well, and with --disable-multilib, and with 
my proposal above for config-ml.in not itself checking for 
--disable-multilib you might also need CC_FOR_TARGET etc. set to a wrapper 
that overrides the output of -print-multi-lib.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 01/11] Add __builtin_speculation_safe_value

2018-07-31 Thread H.J. Lu
On Mon, Jul 30, 2018 at 6:16 AM, Richard Biener  wrote:
> On Fri, 27 Jul 2018, Richard Earnshaw wrote:
>
>>
>> This patch defines a new intrinsic function
>> __builtin_speculation_safe_value.  A generic default implementation is
>> defined which will attempt to use the backend pattern
>> "speculation_safe_barrier".  If this pattern is not defined, or if it
>> is not available, then the compiler will emit a warning, but
>> compilation will continue.
>>
>> Note that the test spec-barrier-1.c will currently fail on all
>> targets.  This is deliberate, the failure will go away when
>> appropriate action is taken for each target backend.
>
> OK.
>
> Thanks,
> Richard.
>
>> gcc:
>>   * builtin-types.def (BT_FN_PTR_PTR_VAR): New function type.
>>   (BT_FN_I1_I1_VAR, BT_FN_I2_I2_VAR, BT_FN_I4_I4_VAR): Likewise.
>>   (BT_FN_I8_I8_VAR, BT_FN_I16_I16_VAR): Likewise.
>>   * builtin-attrs.def (ATTR_NOVOPS_NOTHROW_LEAF_LIST): New attribute
>>   list.
>>   * builtins.def (BUILT_IN_SPECULATION_SAFE_VALUE_N): New builtin.
>>   (BUILT_IN_SPECULATION_SAFE_VALUE_PTR): New internal builtin.
>>   (BUILT_IN_SPECULATION_SAFE_VALUE_1): Likewise.
>>   (BUILT_IN_SPECULATION_SAFE_VALUE_2): Likewise.
>>   (BUILT_IN_SPECULATION_SAFE_VALUE_4): Likewise.
>>   (BUILT_IN_SPECULATION_SAFE_VALUE_8): Likewise.
>>   (BUILT_IN_SPECULATION_SAFE_VALUE_16): Likewise.
>>   * builtins.c (expand_speculation_safe_value): New function.
>>   (expand_builtin): Call it.
>>   * doc/cpp.texi: Document predefine __HAVE_SPECULATION_SAFE_VALUE.
>>   * doc/extend.texi: Document __builtin_speculation_safe_value.
>>   * doc/md.texi: Document "speculation_barrier" pattern.
>>   * doc/tm.texi.in: Pull in TARGET_SPECULATION_SAFE_VALUE and
>>   TARGET_HAVE_SPECULATION_SAFE_VALUE.
>>   * doc/tm.texi: Regenerated.
>>   * target.def (have_speculation_safe_value, speculation_safe_value): New
>>   hooks.
>>   * targhooks.c (default_have_speculation_safe_value): New function.
>>   (default_speculation_safe_value): New function.
>>   * targhooks.h (default_have_speculation_safe_value): Add prototype.
>>   (default_speculation_safe_value): Add prototype.
>>

I got

../../src-trunk/gcc/targhooks.c: In function ‘bool
default_have_speculation_safe_value(bool)’:
../../src-trunk/gcc/targhooks.c:2319:43: error: unused parameter
‘active’ [-Werror=unused-parameter]
 default_have_speculation_safe_value (bool active)
  ~^~

-- 
H.J.


[PATCH] change %G argument from gcall* to gimple*

2018-07-31 Thread Martin Sebor

The GCC internal %G directive takes a gcall* argument and prints
the call's inlining stack in diagnostics.  The argument type makes
it unsuitable for gimple expressions such as those diagnosed by
-Warray-bounds.

As the first step in adding inlining context to -Warray-bounds
warnings the attached patch changes the %G argument to accept
gimple* instead of gcall*.  (More work is needed for %G to
preserve the location range within diagnostics so this patch
just implements the first step.)

Martin
PR tree-optimization/86650 - -Warray-bounds missing inlining context

gcc/c/ChangeLog:

	PR tree-optimization/86650
	* c-objc-common.c (c_tree_printer): Adjust.

gcc/c-family/ChangeLog:

	PR tree-optimization/86650
	* c-format.c (local_gcall_ptr_node): Rename...
	 (local_gimple_ptr_node): ...to this.
	* c-format.h (T89_G): Adjust.

gcc/cp/ChangeLog:

	PR tree-optimization/86650
	* error.c (cp_printer): Adjust.

gcc/ChangeLog:

	PR tree-optimization/86650
	* gimple-pretty-print.c (percent_G_format): Simplify.
	* tree-diagnostic.c (default_tree_printer): Adjust.
	* tree-pretty-print.c (percent_K_format): Add argument.
	* tree-pretty-print.h: Add argument.
	* gimple-fold.c (gimple_fold_builtin_strncpy): Adjust.
	* gimple-ssa-warn-restrict.h (check_bounds_or_overlap): Replace
	gcall* argument with gimple*.
	* gimple-ssa-warn-restrict.c (check_call): Same.
	(wrestrict_dom_walker::before_dom_children): Same.
	(builtin_access::builtin_access): Same.
	(check_bounds_or_overlap): Same.
	* tree-ssa-ccp.c (pass_post_ipa_warn::execute): Adjust.
	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Adjust.

gcc/testsuite/ChangeLog:

	PR tree-optimization/86650
	* gcc.dg/format/gcc_diag-10.c: Adjust.

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index a0192dd..705bffb 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -56,7 +56,7 @@ struct function_format_info
 
 /* Initialized in init_dynamic_diag_info.  */
 static GTY(()) tree local_tree_type_node;
-static GTY(()) tree local_gcall_ptr_node;
+static GTY(()) tree local_gimple_ptr_node;
 static GTY(()) tree locus;
 
 static bool decode_format_attr (tree, function_format_info *, int);
@@ -691,7 +691,7 @@ static const format_char_info gcc_diag_char_table[] =
 
   /* Custom conversion specifiers.  */
 
-  /* G requires a "gcall*" argument at runtime.  */
+  /* G requires a "gimple*" argument at runtime.  */
   { "G",   1, STD_C89, { T89_G,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "","\"",   NULL },
   /* K requires a "tree" argument at runtime.  */
   { "K",   1, STD_C89, { T89_T,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "","\"",   NULL },
@@ -722,7 +722,7 @@ static const format_char_info gcc_tdiag_char_table[] =
   { "E", 1, STD_C89, { T89_T,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q+", "",   NULL },
   { "K", 1, STD_C89, { T89_T,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "", "\"",   NULL },
 
-  /* G requires a "gcall*" argument at runtime.  */
+  /* G requires a "gimple*" argument at runtime.  */
   { "G", 1, STD_C89, { T89_G,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "", "\"",   NULL },
 
   { "v",   0, STD_C89, { T89_I,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q#",  "",   NULL },
@@ -754,7 +754,7 @@ static const format_char_info gcc_cdiag_char_table[] =
   { "E",   1, STD_C89, { T89_T,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q+", "",   NULL },
   { "K",   1, STD_C89, { T89_T,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "", "\"",   NULL },
 
-  /* G requires a "gcall*" argument at runtime.  */
+  /* G requires a "gimple*" argument at runtime.  */
   { "G",   1, STD_C89, { T89_G,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "", "\"",   NULL },
 
   { "v",   0, STD_C89, { T89_I,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q#",  "",   NULL },
@@ -787,7 +787,7 @@ static const format_char_info gcc_cxxdiag_char_table[] =
   { "K", 1, STD_C89,{ T89_T,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "",   "\"",   NULL },
   { "v", 0,STD_C89, { T89_I,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q#",  "",   NULL },
 
-  /* G requires a "gcall*" argument at runtime.  */
+  /* G requires a "gimple*" argument at runtime.  */
   { "G", 1, STD_C89,{ T89_G,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "",   "\"",   NULL },
 
   /* These accept either an 'int' or an

Re: [PATCH] include more detail in -Warray-bounds (PR 86650)

2018-07-31 Thread Martin Sebor

Attached is a much scaled-down patch that only adds a note
to the -Warray-bounds warnings mentioning the declaration
to which the out-of-bounds index or offset applies.

Printing the inlining context needs a bit more work but
this small improvement can be made independently of it.

On 07/23/2018 05:49 PM, Martin Sebor wrote:

(David, I'm hoping for your help here.  Please see the end.)

While looking into a recent -Warray-bounds instance in Glibc
involving inlining of large functions it became apparent that
GCC could do a better job of pinpointing the source of
the problem.

The attached patch makes a few adjustments to both
the pretty printer infrastructure and to VRP to make this
possible.  The diagnostic pretty printer already has directives
to print the inlining context for both tree and gcall* arguments,
so most of the changes just adjust things to be able to pass in
gimple* arguments instead.

The only slightly interesting change is to print the declaration
to which the out-of-bounds array refers if one is known.

Tested on x86_64-linux with one regression.

The regression is in the gcc.dg/Warray-bounds.c test: the column
numbers of the warnings are off.  Adding the %G specifier to
the array bounds warnings in VRP has the unexpected effect of
expanding the extent of the underlining. For instance, for a test
case like this:

  int a[10];

  void f (void)
  {
    a[-1] = 0;
  }

from the expected:

   a[-1] = 0;
   ~^~~~

to this:

  a[-1] = 0;
   ~~^~~

David, do you have any idea how to avoid this?

Martin


PR tree-optimization/86650 - -Warray-bounds missing inlining context

gcc/ChangeLog:

	PR tree-optimization/86650
	* tree-vrp.c (vrp_prop::check_array_ref): Print an inform message.
	(vrp_prop::check_mem_ref): Same.

gcc/testsuite/ChangeLog:

	PR tree-optimization/86650
	* gcc.dg/Warray-bounds-33.c: New test.

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 7ab8898..e553f3a 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -4838,14 +4838,13 @@ vrp_prop::check_array_ref (location_t location, tree ref,
 
   tree artype = TREE_TYPE (TREE_OPERAND (ref, 0));
 
+  bool warned = false;
+
   /* Empty array.  */
   if (up_bound && tree_int_cst_equal (low_bound, up_bound_p1))
-{
-  warning_at (location, OPT_Warray_bounds,
-		  "array subscript %E is above array bounds of %qT",
-		  low_bound, artype);
-  TREE_NO_WARNING (ref) = 1;
-}
+warned = warning_at (location, OPT_Warray_bounds,
+			 "array subscript %E is above array bounds of %qT",
+			 low_bound, artype);
 
   if (TREE_CODE (low_sub) == SSA_NAME)
 {
@@ -4866,12 +4865,10 @@ vrp_prop::check_array_ref (location_t location, tree ref,
 	  : tree_int_cst_le (up_bound, up_sub))
   && TREE_CODE (low_sub) == INTEGER_CST
   && tree_int_cst_le (low_sub, low_bound))
-{
-  warning_at (location, OPT_Warray_bounds,
-		  "array subscript [%E, %E] is outside array bounds of %qT",
-		  low_sub, up_sub, artype);
-  TREE_NO_WARNING (ref) = 1;
-}
+	warned = warning_at (location, OPT_Warray_bounds,
+			 "array subscript [%E, %E] is outside "
+			 "array bounds of %qT",
+			 low_sub, up_sub, artype);
 }
   else if (up_bound
 	   && TREE_CODE (up_sub) == INTEGER_CST
@@ -4885,10 +4882,9 @@ vrp_prop::check_array_ref (location_t location, tree ref,
 	  dump_generic_expr (MSG_NOTE, TDF_SLIM, ref);
 	  fprintf (dump_file, "\n");
 	}
-  warning_at (location, OPT_Warray_bounds,
-		  "array subscript %E is above array bounds of %qT",
-		  up_sub, artype);
-  TREE_NO_WARNING (ref) = 1;
+  warned = warning_at (location, OPT_Warray_bounds,
+			   "array subscript %E is above array bounds of %qT",
+			   up_sub, artype);
 }
   else if (TREE_CODE (low_sub) == INTEGER_CST
&& tree_int_cst_lt (low_sub, low_bound))
@@ -4899,9 +4895,18 @@ vrp_prop::check_array_ref (location_t location, tree ref,
 	  dump_generic_expr (MSG_NOTE, TDF_SLIM, ref);
 	  fprintf (dump_file, "\n");
 	}
-  warning_at (location, OPT_Warray_bounds,
-		  "array subscript %E is below array bounds of %qT",
-		  low_sub, artype);
+  warned = warning_at (location, OPT_Warray_bounds,
+			   "array subscript %E is below array bounds of %qT",
+			   low_sub, artype);
+}
+
+  if (warned)
+{
+  ref = TREE_OPERAND (ref, 0);
+
+  if (DECL_P (ref))
+	inform (DECL_SOURCE_LOCATION (ref), "while referencing %qD", ref);
+
   TREE_NO_WARNING (ref) = 1;
 }
 }
@@ -4916,7 +4921,8 @@ vrp_prop::check_array_ref (location_t location, tree ref,
the address of the just-past-the-end element of an array).  */
 
 void
-vrp_prop::check_mem_ref (location_t location, tree ref, bool ignore_off_by_one)
+vrp_prop::check_mem_ref (location_t location, tree ref,
+			 bool ignore_off_by_one)
 {
   if (TREE_NO_WARNING (ref))
 return;
@@ -5134,16 +5140,21 @@ vrp_prop::check_mem_ref (location_t location, tree ref, bool ignore_off_by_one)
 	  offrange[1] = offrange[1] / eltsize;
 	}
 
+ 

Re: [PATCH] Fix PR middle-end/86705

2018-07-31 Thread Jozef Lawrynowicz

On 30/07/18 14:29, Richard Biener wrote:

On Sun, Jul 29, 2018 at 6:27 PM Jozef Lawrynowicz
 wrote:

pr45678-2.c ICEs for msp430-elf with -mlarge, because an alignment of
POINTER_SIZE is attempted. POINTER_SIZE with -mlarge is 20-bits, so further
code in the middle-end that expects this to be a power of 2 causes odd
alignments to be set, in this case eventually resulting in an ICE.
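To make the failure mode concrete: alignment values are expected to be powers of two, and a 20-bit `POINTER_SIZE` is not one. The helper below is a generic bit test written for illustration, not code taken from GCC.

```cpp
// True if x is a nonzero power of two, i.e. exactly one bit is set.
constexpr bool is_power_of_two(unsigned x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// With -mlarge on msp430, POINTER_SIZE is 20, which fails this test.
static_assert(!is_power_of_two(20), "20 is not a power of two");
static_assert(is_power_of_two(16) && is_power_of_two(32),
              "typical pointer widths are powers of two");
```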

The test ICEs on gcc-7-branch, gcc-8-branch, and current trunk. It
successfully builds on gcc-6-branch.
The failure is caused by r235172.

Successfully bootstrapped and regtested the attached patch for
x86-64-pc-linux-gnu, and msp430-elf with -mlarge, on trunk.

Ok for gcc-7-branch, gcc-8-branch and trunk?

I wonder if most (if not all) places you touch want to use
get_mode_alignment (Pmode) instead?  (or ptr_mode)

Anyhow, the patch is otherwise obvious though factoring
the thing might be nice (thus my suggestion above...)

Richard.


Thanks for the suggestion, using GET_MODE_ALIGNMENT does seem like a neater
idea.
After retesting, I went ahead and committed the below patch onto trunk, will
backport to gcc-7/8-branch later.

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index d6e3c38..7353d5d 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1257,10 +1257,10 @@ set_parm_rtl (tree parm, rtx x)
 	 allocate it, which means that in-frame portion is just a
 	 pointer.  ??? We've got a pseudo for sure here, do we
 	 actually dynamically allocate its spilling area if needed?
-	 ??? Isn't it a problem when POINTER_SIZE also exceeds
-	 MAX_SUPPORTED_STACK_ALIGNMENT, as on cris and lm32?  */
+	 ??? Isn't it a problem when Pmode alignment also exceeds
+	 MAX_SUPPORTED_STACK_ALIGNMENT, as can happen on cris and lm32?  */
   if (align > MAX_SUPPORTED_STACK_ALIGNMENT)
-	align = POINTER_SIZE;
+	align = GET_MODE_ALIGNMENT (Pmode);
 
   record_alignment_for_reg_var (align);
 }
@@ -1381,7 +1381,7 @@ expand_one_ssa_partition (tree var)
   /* If the variable alignment is very large we'll dynamicaly allocate
  it, which means that in-frame portion is just a pointer.  */
   if (align > MAX_SUPPORTED_STACK_ALIGNMENT)
-align = POINTER_SIZE;
+align = GET_MODE_ALIGNMENT (Pmode);
 
   record_alignment_for_reg_var (align);
 
@@ -1608,7 +1608,7 @@ expand_one_var (tree var, bool toplevel, bool really_expand)
   /* If the variable alignment is very large we'll dynamicaly allocate
 	 it, which means that in-frame portion is just a pointer.  */
   if (align > MAX_SUPPORTED_STACK_ALIGNMENT)
-	align = POINTER_SIZE;
+	align = GET_MODE_ALIGNMENT (Pmode);
 }
 
   record_alignment_for_reg_var (align);


Re: [PATCH] Avoid infinite loop with duplicate anonymous union fields

2018-07-31 Thread Richard Sandiford
Hi,

Thanks for submitting the patch.

Bogdan Harjoc  writes:
> With fresh git sources and contrib/gcc_update the tests pass:
>
> === gcc Summary ===
>
> # of expected passes 133500
> # of expected failures 422
> # of unsupported tests 2104
>
> gcc-build/gcc/xgcc  version 9.0.0 20180730 (experimental) (GCC)
>
> I wasn't able to reduce the input to avoid including  and as
> it only reproduces without -save-temps, it's not clear how to write a
> testcase for this one.

Adding -save-temps to the options is OK.  You just need to add:

  /* { dg-options "-save-temps" } */

to the test file, and put it in somewhere like gcc.dg.

FWIW, the failure reproduces for me with #include  replaced by:

  #define foo(a)

Seems it has to be a function macro that has an argument called "a".
No idea why :-)

Richard


Re: [PATCH] PR libstdc++/86751 default assignment operators for std::pair

2018-07-31 Thread Jonathan Wakely

On 31/07/18 20:14 +0300, Ville Voutilainen wrote:

On 31 July 2018 at 20:07, Jonathan Wakely  wrote:

The solution for PR 77537 causes ambiguities due to the extra copy
assignment operator taking a __nonesuch_no_braces parameter. The copy
and move assignment operators can be defined as defaulted to meet the
semantics required by the standard.

In order to preserve ABI compatibility (specifically argument passing
conventions for pair) we need a new base class that makes the
assignment operators non-trivial.

PR libstdc++/86751
* include/bits/stl_pair.h (__nonesuch_no_braces): Remove.
(__pair_base): New class with non-trivial copy assignment operator.
(pair): Derive from __pair_base. Define copy assignment and move
assignment operators as defaulted.
* testsuite/20_util/pair/86751.cc: New test.


Ville, this passes all our tests, but am I forgetting something that
means this isn't right?


Pairs of references?


I knew there was a reason.

We need better tests, since nothing failed when I made this change.

OK, let me rework the patch ...
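Ville's question points at the problem: a defaulted copy assignment operator is defined as deleted when the class has a reference member, whereas `std::pair<T1&, T2&>` must remain assignable and assign through its references. A sketch of the difference, using a hypothetical `defaulted_pair` that stands in for a pair whose assignment operators are simply defaulted:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical pair-like type whose assignment operators are just defaulted.
template<typename T1, typename T2>
struct defaulted_pair {
  T1 first;
  T2 second;
  defaulted_pair& operator=(const defaulted_pair&) = default;
  defaulted_pair& operator=(defaulted_pair&&) = default;
};

// A defaulted copy assignment is deleted when a member is a reference...
static_assert(!std::is_copy_assignable<defaulted_pair<int&, int&>>::value,
              "defaulted assignment is deleted for reference members");

// ...but std::pair of references must stay assignable.
static_assert(std::is_copy_assignable<std::pair<int&, int&>>::value,
              "std::pair of references is copy assignable");

void example_pair_of_refs() {
  int a = 1, b = 2, c = 3, d = 4;
  std::pair<int&, int&> p(a, b), q(c, d);
  p = q;                     // assigns through the references
  assert(a == 3 && b == 4);  // the referents changed, not the bindings
}
```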




Re: [PATCH] Make function clone name numbering independent.

2018-07-31 Thread Michael Ploujnikov
On 2018-07-26 01:27 PM, Michael Ploujnikov wrote:
> On 2018-07-24 09:57 AM, Michael Ploujnikov wrote:
>> On 2018-07-20 06:05 AM, Richard Biener wrote:
  /* Return a new assembler name for a clone with SUFFIX of a decl named
 NAME.  */
 @@ -521,14 +521,13 @@ tree
  clone_function_name_1 (const char *name, const char *suffix)
>>>
>>> pass this function the counter to use
>>>
  {
size_t len = strlen (name);
 -  char *tmp_name, *prefix;
 +  char *prefix;

prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
memcpy (prefix, name, len);
strcpy (prefix + len + 1, suffix);
prefix[len] = symbol_table::symbol_suffix_separator ();
 -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
>>>
>>> and keep using ASM_FORMAT_PRIVATE_NAME here.  You need to change
>>> the lto/lto-partition.c caller (just use zero as counter).
>>>
 -  return get_identifier (tmp_name);
 +  return get_identifier (prefix);
  }

  /* Return a new assembler name for a clone of DECL with SUFFIX.  */
 @@ -537,7 +536,17 @@ tree
  clone_function_name (tree decl, const char *suffix)
  {
tree name = DECL_ASSEMBLER_NAME (decl);
 -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
 +  const char *decl_name = IDENTIFIER_POINTER (name);
 +  char *numbered_name;
 +  unsigned int *suffix_counter;
 +  if (!clone_fn_ids) {
 +/* Initialize the per-function counter hash table if this is the 
 first call */
 +clone_fn_ids = hash_map::create_ggc (64);
 +  }
>>>
>>> I still do not like throwing memory at the problem in this way for the
>>> little benefit
>>> this change provides :/
>>>
>>> So no approval from me at this point...
>>>
>>> Richard.
>>
>> Can you give me an idea of the memory constraints that are involved?
>>
>> The highest memory usage increase that I could find was when compiling
>> a source file (from Linux) with roughly 10,000 functions. It showed a 2kB
>> increase over the before-patch use of 6936kB which is barely 0.03%.
>>
>> Using a single counter can result in more confusing namespacing when
>> you have .bar.clone.4 despite there only being 3 clones of .bar.
>>
>> From a practical point of view this change is helpful to anyone
>> diffing binary output such as forensic analysts, Debian Reproducible
>> Builds or even someone validating compiler output (before and after an input
>> source patch). The extra changes that this patch alleviates are a
>> distraction and could even be misleading. For example, applying a
>> source patch to the same Linux source produces the following binary
>> diff before my change:
>>
>> --- /tmp/output.o.objdump
>> +++ /tmp/patched-output.o.objdump
>> @@ -1,5 +1,5 @@
>>
>> -/tmp/uverbs_cmd/output.o: file format elf32-i386
>> +/tmp/uverbs_cmd/patched-output.o: file format elf32-i386
>>
>>
>>  Disassembly of section .text.get_order:
>> @@ -265,12 +265,12 @@
>> 3:   e9 fc ff ff ff  jmp4 
>>  4: R_386_PC32   .text.put_uobj_read
>>
>> -Disassembly of section .text.trace_kmalloc.constprop.3:
>> +Disassembly of section .text.trace_kmalloc.constprop.4:
>>
>> - :
>> + :
>> 0:   83 3d 04 00 00 00 00cmpl   $0x0,0x4
>>  2: R_386_32 __tracepoint_kmalloc
>> -   7:   74 34   je 3d 
>> 
>> +   7:   74 34   je 3d 
>> 
>> 9:   55  push   %ebp
>> a:   89 cd   mov%ecx,%ebp
>> c:   57  push   %edi
>> @@ -281,7 +281,7 @@
>>13:   8b 1d 10 00 00 00   mov0x10,%ebx
>>  15: R_386_32__tracepoint_kmalloc
>>19:   85 db   test   %ebx,%ebx
>> -  1b:   74 1b   je 38 
>> 
>> +  1b:   74 1b   je 38 
>> 
>>1d:   68 d0 00 00 00  push   $0xd0
>>22:   89 fa   mov%edi,%edx
>>24:   89 f0   mov%esi,%eax
>> @@ -292,7 +292,7 @@
>>31:   58  pop%eax
>>32:   83 3b 00cmpl   $0x0,(%ebx)
>>35:   5a  pop%edx
>> -  36:   eb e3   jmp1b 
>> 
>> +  36:   eb e3   jmp1b 
>> 
>>38:   5b  pop%ebx
>>39:   5e  pop%esi
>>3a:   5f  pop%edi
>> @@ -846,7 +846,7 @@
>>78:   b8 5f 00 00 00  mov$0x5f,%eax
>>  79: R_386_32.text.ib_uverbs_alloc_pd
>>7d:   e8 fc ff ff ff  call   7e 
>> -7e: R_386_PC32  .text.trace_kmalloc.constprop.3
>> +7e: R_386_PC32  .text.trace_kmalloc.constprop.4
>>82:   c7 45 d4 f4 ff ff ffmovl   $0xfff4,-0x2c(%ebp)
>>
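The difference between the two numbering schemes can be sketched with two hypothetical namer classes. The `std::map` is only a stand-in for the `hash_map` in the patch, and the `name.suffix.N` format is an assumption modelled on names such as `trace_kmalloc.constprop.3` in the diff above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Old scheme (modelled): one counter shared by every function.
class GlobalCloneNamer {
 public:
  std::string name(const std::string& fn, const std::string& suffix) {
    return fn + "." + suffix + "." + std::to_string(next_++);
  }

 private:
  unsigned next_ = 0;
};

// Proposed scheme (modelled): one counter per original function name.
class PerFunctionCloneNamer {
 public:
  std::string name(const std::string& fn, const std::string& suffix) {
    unsigned& n = counters_[fn];  // value-initialized to 0 on first use
    return fn + "." + suffix + "." + std::to_string(n++);
  }

 private:
  std::map<std::string, unsigned> counters_;
};

void example_clone_names() {
  // A clone of an unrelated function shifts the global numbering...
  GlobalCloneNamer g;
  g.name("foo", "constprop");
  assert(g.name("trace_kmalloc", "constprop") == "trace_kmalloc.constprop.1");

  // ...but leaves the per-function numbering untouched.
  PerFunctionCloneNamer p;
  p.name("foo", "constprop");
  assert(p.name("trace_kmalloc", "constprop") == "trace_kmalloc.constprop.0");
}
```

The cost is one map entry per function that gets cloned, which is the memory trade-off discussed in this thread.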

Re: [PATCH] PR libstdc++/86751 default assignment operators for std::pair

2018-07-31 Thread Ville Voutilainen
On 31 July 2018 at 20:07, Jonathan Wakely  wrote:
> The solution for PR 77537 causes ambiguities due to the extra copy
> assignment operator taking a __nonesuch_no_braces parameter. The copy
> and move assignment operators can be defined as defaulted to meet the
> semantics required by the standard.
>
> In order to preserve ABI compatibility (specifically argument passing
> conventions for pair) we need a new base class that makes the
> assignment operators non-trivial.
>
> PR libstdc++/86751
> * include/bits/stl_pair.h (__nonesuch_no_braces): Remove.
> (__pair_base): New class with non-trivial copy assignment operator.
> (pair): Derive from __pair_base. Define copy assignment and move
> assignment operators as defaulted.
> * testsuite/20_util/pair/86751.cc: New test.
>
>
> Ville, this passes all our tests, but am I forgetting something that
> means this isn't right?

Pairs of references?


[og8] More goacc_parlevel enhancements

2018-07-31 Thread Cesar Philippidis
I've committed this patch which contains all of the remaining
goacc_parlevel bug fixes present in trunk to og8.

The goal of the goacc parlevel changes is to replace the use of inline ptx
code with builtin functions so that certain OpenACC execution tests
that exercise the execution model can be target independent. For the
most part, these patches applied cleanly to og8; however, as I noted in
PR86757, there were a couple of og8-specific regressions involving tests
that started to fail when built with -O0. I believe that problem is caused by
the ganglocal memory changes.

Chung-Lin, we'll need to fix PR86757 before we push the gangprivate
changes upstream.

Julian, I'm not sure if the GCN port supports gangprivate memory. If it
does, you might be hit by this failure at -O0. But those tests have
already been xfailed, so you should be OK.

Cesar
[og8] More goacc_parlevel enhancements

2018-07-31  Cesar Philippidis  

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Adjust test.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.

	Backport from mainline:
	2018-05-02  Tom de Vries  

	PR libgomp/85411
	libgomp/
	* plugin/plugin-nvptx.c (nvptx_exec): Move parsing of
	GOMP_OPENACC_DIM ...
	* env.c (parse_gomp_openacc_dim): ... here.  New function.
	(initialize_env): Call parse_gomp_openacc_dim.
	(goacc_default_dims): Define.
	* libgomp.h (goacc_default_dims): Declare.
	* oacc-plugin.c (GOMP_PLUGIN_acc_default_dim): New function.
	* oacc-plugin.h (GOMP_PLUGIN_acc_default_dim): Declare.
	* libgomp.map: New version "GOMP_PLUGIN_1.2". Add
	GOMP_PLUGIN_acc_default_dim.
	* testsuite/libgomp.oacc-c-c++-common/loop-default-runtime.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default.h: New test.

	2018-05-04  Tom de Vries  
	PR libgomp/85639
	gcc/
	* builtins.c (expand_builtin_goacc_parlevel_id_size): Handle null target
	if ignore == 0.

	2018-05-07  Tom de Vries  
	PR testsuite/85677
	libgomp/
	* testsuite/lib/libgomp.exp (libgomp_init): Move inclusion of top-level
	include directory in ALWAYS_CFLAGS out of $blddir != "" condition.

[openacc] Move GOMP_OPENACC_DIM parsing out of nvptx plugin

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@259852
138bc75d-0d04-0410-961f-82ee72b054a4

[expand] Handle null target in expand_builtin_goacc_parlevel_id_size

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@259927
138bc75d-0d04-0410-961f-82ee72b054a4

[openacc, testsuite] Allow installed testing of libgomp to find gomp-constants.h

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@259992
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 300e13c..0097d5b 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6682,6 +6682,9 @@ expand_builtin_goacc_parlevel_id_size (tree exp, rtx target, int ignore)
   if (ignore)
 return target;
 
+  if (target == NULL_RTX)
+target = gen_reg_rtx (TYPE_MODE (TREE_TYPE (exp)));
+
   if (!targetm.have_oacc_dim_size ())
 {
   emit_move_insn (target, fallback_retval);
diff --git a/libgomp/env.c b/libgomp/env.c
index c99ba85..fab35b7 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -90,6 +90,7 @@ int gomp_debug_var;
 unsigned int gomp_num_teams_var;
 char *goacc_device_type;
 int goacc_device_num;
+int goacc_default_dims[GOMP_DIM_MAX];
 
 #ifndef LIBGOMP_OFFLOADED_ONLY
 
@@ -1066,6 +1067,36 @@ parse_acc_device_type (void)
 }
 
 static void
+parse_gomp_openacc_dim (void)
+{
+  /* The syntax is the same as for the -fopenacc-dim compilation option.  */
+  const char *var_name = "GOMP_OPENACC_DIM";
+  const char *env_var = getenv (var_name);
+  if (!env_var)
+return;
+
+  const char *pos = env_var;
+  int i;
+  for (i = 0; *pos && i != GOMP_DIM_MAX; i++)
+{
+  if (i && *pos++ != ':')
+	break;
+
+  if (*pos == ':')
+	continue;
+
+  const char *eptr;
+  errno = 0;
+  long val = strtol (pos, (char **)&eptr, 10);
+  if (errno || val < 0 || (unsigned)val != val)
+	break;
+
+  goacc_default_dims[i] = (int)val;
+  pos = eptr;
+}
+}
+
+static void
 handle_omp_display_env (unsigned long stacksize, int wait_policy)
 {
   const char *env;
@@ -1336,6 +1367,7 @@ initialize_env (void)
 goacc_device_num = 0;
 
   parse_acc_device_type ();
+  parse_gomp_openacc_dim ();
 
   goacc_runtime_initialize ();
 
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index a9aca74..607f4c2 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -44,6 +44,7 @@
 #include "config.h"
 #include "gstdint.h"
 #include "libgomp-plugin.h"
+#include "gomp-constants
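The loop in `parse_gomp_openacc_dim` accepts the `-fopenacc-dim` syntax: `G:W:V`, where any field may be empty. A standalone sketch of the same loop, assuming three dimensions (`kDimMax` is a hypothetical stand-in for `GOMP_DIM_MAX`). Unlike the patch's `(unsigned)val != val` check, this version rejects values above `INT_MAX`.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

constexpr int kDimMax = 3;  // assumption: gang, worker, vector

// Parse "G:W:V" (any field may be empty) into dims[]; entries that are
// skipped or never reached keep their previous values.
void parse_dims(const char* env_var, int dims[kDimMax]) {
  if (!env_var)
    return;

  const char* pos = env_var;
  for (int i = 0; *pos && i != kDimMax; i++) {
    if (i && *pos++ != ':')
      break;     // every field after the first starts with ':'
    if (*pos == ':')
      continue;  // empty field, e.g. the middle of "32::128"

    char* eptr;
    errno = 0;
    long val = std::strtol(pos, &eptr, 10);
    if (errno || val < 0 || val > INT_MAX)
      break;     // a negative or out-of-range value stops parsing
    dims[i] = static_cast<int>(val);
    pos = eptr;
  }
}

void example_parse_dims() {
  int dims[kDimMax] = {0, 0, 0};
  parse_dims("32::128", dims);
  assert(dims[0] == 32 && dims[1] == 0 && dims[2] == 128);
}
```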

Re: [Patch, fortran] A first small step towards CFI descriptor implementation

2018-07-31 Thread Janus Weil
Hi Paul,

2018-07-31 14:06 GMT+02:00 Paul Richard Thomas :
> Daniel Celis Garza and Damian Rouson have developed a runtime library
> and include file for the TS 29113 and F2018 C descriptors.
> https://github.com/sourceryinstitute/ISO_Fortran_binding
>
> The ordering of types is different to the current 'bt' enum in
> libgfortran.h. This patch interchanges BT_DERIVED and BT_CHARACTER to
> fix this.

is this ordering actually fixed by the F18 standard, or is there any
other reason why it needs to be like this? What's wrong with
gfortran's current ordering?

Cheers,
Janus


[PATCH] PR libstdc++/86751 default assignment operators for std::pair

2018-07-31 Thread Jonathan Wakely

The solution for PR 77537 causes ambiguities due to the extra copy
assignment operator taking a __nonesuch_no_braces parameter. The copy
and move assignment operators can be defined as defaulted to meet the
semantics required by the standard.

In order to preserve ABI compatibility (specifically argument passing
conventions for pair) we need a new base class that makes the
assignment operators non-trivial.
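The effect of the new base class can be checked with type traits. The sketch below models it with a hypothetical `nontrivial_assign_base` (using a protected operator rather than the patch's friend declaration). The derived type keeps defaulted, usable assignment operators but is no longer trivially copy assignable, which is the property the patch's comment says must be preserved.

```cpp
#include <type_traits>

// Hypothetical base modelled on __pair_base: its copy assignment is
// user-provided, so it is never trivial.
class nontrivial_assign_base {
 protected:
  nontrivial_assign_base() = default;
  nontrivial_assign_base(const nontrivial_assign_base&) = default;
  nontrivial_assign_base& operator=(const nontrivial_assign_base&) noexcept {
    return *this;
  }
};

// Pair-like types with defaulted assignment, without and with the base.
template<typename T1, typename T2>
struct plain_pair {
  T1 first;
  T2 second;
  plain_pair& operator=(const plain_pair&) = default;
  plain_pair& operator=(plain_pair&&) = default;
};

template<typename T1, typename T2>
struct based_pair : private nontrivial_assign_base {
  T1 first;
  T2 second;
  based_pair& operator=(const based_pair&) = default;
  based_pair& operator=(based_pair&&) = default;
};

// Both remain copy assignable...
static_assert(std::is_copy_assignable<plain_pair<int, int>>::value, "");
static_assert(std::is_copy_assignable<based_pair<int, int>>::value, "");
// ...but only the base makes the defaulted operator non-trivial.
static_assert(std::is_trivially_copy_assignable<plain_pair<int, int>>::value, "");
static_assert(!std::is_trivially_copy_assignable<based_pair<int, int>>::value, "");
```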

PR libstdc++/86751
* include/bits/stl_pair.h (__nonesuch_no_braces): Remove.
(__pair_base): New class with non-trivial copy assignment operator.
(pair): Derive from __pair_base. Define copy assignment and move
assignment operators as defaulted.
* testsuite/20_util/pair/86751.cc: New test.


Ville, this passes all our tests, but am I forgetting something that
means this isn't right?


commit 766fc07c06b774fc6a0bd30d5bd8add8e4185d69
Author: Jonathan Wakely 
Date:   Tue Jul 31 17:26:04 2018 +0100

PR libstdc++/86751 default assignment operators for std::pair

The solution for PR 77537 causes ambiguities due to the extra copy
assignment operator taking a __nonesuch_no_braces parameter. The copy
and move assignment operators can be defined as defaulted to meet the
semantics required by the standard.

In order to preserve ABI compatibility (specifically argument passing
conventions for pair) we need a new base class that makes the
assignment operators non-trivial.

PR libstdc++/86751
* include/bits/stl_pair.h (__nonesuch_no_braces): Remove.
(__pair_base): New class with non-trivial copy assignment operator.
(pair): Derive from __pair_base. Define copy assignment and move
assignment operators as defaulted.
* testsuite/20_util/pair/86751.cc: New test.

diff --git a/libstdc++-v3/include/bits/stl_pair.h 
b/libstdc++-v3/include/bits/stl_pair.h
index a2486ba8244..03261fef1ea 100644
--- a/libstdc++-v3/include/bits/stl_pair.h
+++ b/libstdc++-v3/include/bits/stl_pair.h
@@ -179,14 +179,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
   };
 
-  // PR libstdc++/79141, a utility type for preventing
-  // initialization of an argument of a disabled assignment
-  // operator from a pair of empty braces.
-  struct __nonesuch_no_braces : std::__nonesuch {
-explicit __nonesuch_no_braces(const __nonesuch&) = delete;
+#if !_GLIBCXX_INLINE_VERSION
+  class __pair_base
+  {
+template friend struct pair;
+__pair_base() = default;
+~__pair_base() = default;
+__pair_base(const __pair_base&) = default;
+// Ensure !is_trivially_copy_assignable> for ABI compatibility:
+__pair_base& operator=(const __pair_base&) noexcept { return *this; }
   };
-
-#endif
+#endif // !_GLIBCXX_INLINE_VERSION
+#endif // C++11
 
  /**
*  @brief Struct holding two objects of arbitrary type.
@@ -196,6 +200,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*/
   template
 struct pair
+#if __cplusplus >= 201103L && !_GLIBCXX_INLINE_VERSION
+// GLIBCXX_ABI Deprecated
+: private __pair_base
+#endif
 {
   typedef _T1 first_type;/// @c first_type is the first bound type
   typedef _T2 second_type;   /// @c second_type is the second bound type
@@ -363,35 +371,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 pair(piecewise_construct_t, tuple<_Args1...>, tuple<_Args2...>);
 
-  pair&
-  operator=(typename conditional<
-   __and_,
-  is_copy_assignable<_T2>>::value,
-   const pair&, const __nonesuch_no_braces&>::type __p)
-  {
-   first = __p.first;
-   second = __p.second;
-   return *this;
-  }
-
-  pair&
-  operator=(typename conditional<
-   __not_<__and_,
- is_copy_assignable<_T2>>>::value,
-   const pair&, const __nonesuch_no_braces&>::type __p) = delete;
-
-  pair&
-  operator=(typename conditional<
-   __and_,
-  is_move_assignable<_T2>>::value,
-   pair&&, __nonesuch_no_braces&&>::type __p)
-  noexcept(__and_,
- is_nothrow_move_assignable<_T2>>::value)
-  {
-   first = std::forward(__p.first);
-   second = std::forward(__p.second);
-   return *this;
-  }
+  pair& operator=(const pair&) = default;
+  pair& operator=(pair&&) = default;
 
   template
   typename enable_if<__and_,
diff --git a/libstdc++-v3/testsuite/20_util/pair/86751.cc 
b/libstdc++-v3/testsuite/20_util/pair/86751.cc
new file mode 100644
index 000..76a76c0d656
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/pair/86751.cc
@@ -0,0 +1,33 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation

Re: [PATCH] Fix DJGPP LTO with debug

2018-07-31 Thread Andris Pavenis

On 07/31/2018 04:04 PM, Richard Biener wrote:

On Sat, 28 Jul 2018, Andris Pavenis wrote:


On 07/27/2018 11:51 PM, DJ Delorie wrote:

Richard Biener  writes:

DJ, did you ever run the testsuite with a configuration that has LTO
enabled?  I don't see any djgpp results posted to gcc-testresults.
Quick googling doesn't yield anything useful with regard to how to
do actual testing with a cross so I only built a i686-pc-msdosdjgpp
cross cc1/lto1 from x86_64-linux which went fine.

CC's Andris, our current gcc maintainer within DJGPP.  I know he just
built 8.2 binaries for us, I don't know what his testing infrastructure
looks like.


No.

I tried to run some of the tests from custom scripts (e.g. when trying to
implement DJGPP support for libstdc++fs, not yet submitted upstream) with
the native compiler for DJGPP.

Otherwise there is no DejaGNU support for DJGPP, so there is no way to run
the testsuite with the native compiler.

I should perhaps try to find some way to run the testsuite using the
cross-compiler from Linux. Possibilities:
- trying to execute test programs under DosEmu (no longer possible with Linux
kernels 4.15+, as DosEmu does not support DPMI for them)
- trying to execute test programs under Dosbox. Question: how to configure
the testsuite to do that? I do not know.
- trying to run them through ssh on some 32-bit Windows system (older than
Windows 10, as DPMI support has been badly broken in 32-bit Windows 10
since March 2018)

So what about the patch?  Is it OK for trunk and GCC 8 branch?


It is OK for both (actually tested with gcc-8.2.0).

My comments about the patch, together with the results of the tests I
performed, can be found in Bugzilla:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86651

Andris



[og8] Add __builtin_goacc_parlevel_{id,size}

2018-07-31 Thread Cesar Philippidis
I've committed this patch to og8, which backports the first of Tom's
goacc_parlevel patches from mainline. I'll post a follow-up patch
which contains various bug fixes. I believe that this patch was
originally introduced in PR82428, or at least it resolves that PR.
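In the patch, `expand_builtin_goacc_parlevel_id_size` falls back to an id of 0 and a size of 1 when the target has no `oacc_dim_size` pattern, which amounts to a single execution unit. A host-side sketch of why that is a safe default for code that partitions work by id and size; `host_parlevel_id` and `host_parlevel_size` are hypothetical stand-ins, not the builtins themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins returning the expander's fallback values.
int host_parlevel_id(int /*dim*/) { return 0; }
int host_parlevel_size(int /*dim*/) { return 1; }

// Each execution unit strides through the data by the level's size
// (size must be positive).
long partial_sum(const std::vector<int>& v, int id, int size) {
  long s = 0;
  for (std::size_t i = static_cast<std::size_t>(id); i < v.size();
       i += static_cast<std::size_t>(size))
    s += v[i];
  return s;
}

void example_parlevel_fallback() {
  std::vector<int> v{1, 2, 3, 4, 5};
  // With the fallback, the single unit covers every element.
  assert(partial_sum(v, host_parlevel_id(0), host_parlevel_size(0)) == 15);
  // With (hypothetically) two units, the partial sums still add up.
  assert(partial_sum(v, 0, 2) + partial_sum(v, 1, 2) == 15);
}
```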

Cesar
[og8] Add __builtin_goacc_parlevel_{id,size}

2018-07-31  Cesar Philippidis  

	Backport from mainline:
	2018-05-02  Tom de Vries  

	PR libgomp/82428
	gcc/
	* builtins.def (DEF_GOACC_BUILTIN_ONLY): Define.
	* omp-builtins.def (BUILT_IN_GOACC_PARLEVEL_ID)
	(BUILT_IN_GOACC_PARLEVEL_SIZE): New builtin.
	* builtins.c (expand_builtin_goacc_parlevel_id_size): New function.
	(expand_builtin): Call expand_builtin_goacc_parlevel_id_size.
	* doc/extend.texi (Other Builtins): Add __builtin_goacc_parlevel_id and
	__builtin_goacc_parlevel_size.

	gcc/fortran/
	* f95-lang.c (DEF_GOACC_BUILTIN_ONLY): Define.

	gcc/testsuite/
	* c-c++-common/goacc/builtin-goacc-parlevel-id-size-2.c: New test.
	* c-c++-common/goacc/builtin-goacc-parlevel-id-size.c: New test.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-static-2.c: Use
	__builtin_goacc_parlevel_{id,size}.
	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-v-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/tile-1.c: Same.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@259850
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/builtins.c b/gcc/builtins.c
index a71555e..300e13c 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -71,6 +71,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "intl.h"
 #include "file-prefix-map.h" /* remap_macro_filename()  */
+#include "gomp-constants.h"
+#include "omp-general.h"
 
 struct target_builtins default_target_builtins;
 #if SWITCHABLE_TARGET
@@ -6628,6 +6630,71 @@ expand_stack_save (void)
   return ret;
 }
 
+/* Emit code to get the openacc gang, worker or vector id or size.  */
+
+static rtx
+expand_builtin_goacc_parlevel_id_size (tree exp, rtx target, int ignore)
+{
+  const char *name;
+  rtx fallback_retval;
+  rtx_insn *(*gen_fn) (rtx, rtx);
+  switch (DECL_FUNCTION_CODE (get_callee_fndecl (exp)))
+{
+case BUILT_IN_GOACC_PARLEVEL_ID:
+  name = "__builtin_goacc_parlevel_id";
+  fallback_retval = const0_rtx;
+  gen_fn = targetm.gen_oacc_dim_pos;
+  break;
+case BUILT_IN_GOACC_PARLEVEL_SIZE:
+  name = "__builtin_goacc_parlevel_size";
+  fallback_retval = const1_rtx;
+  gen_fn = targetm.gen_oacc_dim_size;
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  if (oacc_get_fn_attrib (current_function_decl) == NULL_TREE)
+{
+  error ("%qs only supported in OpenACC code", name);
+  return const0_rtx;
+}
+
+  tree arg = CALL_EXPR_ARG (exp, 0);
+  if (TREE_CODE (arg) != INTEGER_CST)
+{
+  error ("non-constant argument 0 to %qs", name);
+  return const0_rtx;
+}
+
+  int dim = TREE_INT_CST_LOW (arg);
+  switch (dim)
+{
+case GOMP_DIM_GANG:
+case GOMP_DIM_WORKER:
+case GOMP_DIM_VECTOR:
+  break;
+default:
+  error ("illegal argument 0 to %qs", name);
+  return const0_rtx;
+}
+
+  if (ignore)
+return target;
+
+  if (!targetm.have_oacc_dim_size ())
+{
+  emit_move_insn (target, fallback_retval);
+  return target;
+}
+
+  rtx reg = MEM_P (target) ? gen_reg_rtx (GET_MODE (target)) : target;
+  emit_insn (gen_fn (reg, GEN_INT (dim)));
+  if (reg != target)
+emit_move_insn (target, reg);
+
+  return target;
+}
 
 /* Expand an expression EXP that calls a built-in function,
with result going to TARGET if that's convenient
@@ -7758,6 +7825,10 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
 	 folding.  */
   bre

Re: [PATCH] Make GO string literals properly NUL terminated

2018-07-31 Thread Bernd Edlinger
On 07/31/18 16:40, Ian Lance Taylor wrote:
> On Tue, Jul 31, 2018 at 5:14 AM, Bernd Edlinger
>  wrote:
>>
>> could someone please review this patch and check it in into the GO FE?
> 
> I don't understand why the change is correct, and you didn't explain
> it.  Go strings are not NUL terminated.  Go strings always have an
> associated length.
> 

Yes, sorry.  Effectively for go this change is a no-op.
I'll elaborate a bit.

This makes it easier for the middle-end to distinguish between nul-terminated
and non-nul-terminated strings, especially since wide character strings
may also come along.

In C, a string that is not nul-terminated might be declared like
char x[2] = "12";
it is always a STRING_CST object of length 3, with value "12\0".
The array_type is char[0..1]

while a nul terminated string is declared like
char x[3] = "12";
it is also a STRING_CST object of length 3, with value "12\0"
The array_type is char[0..2]

Note, however, that the array type is different.
So with this convention one only needs to compare the array type
size with the string length, which is much easier than looking for
a terminating wide character, which is rarely done right.
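A sketch of the two approaches, assuming STRING_CST lengths are counted in bytes including the terminator; all function names are hypothetical. The byte-wise scan shows the kind of mistake a terminator search makes with wide strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Proposed convention (modelled): the STRING_CST always carries its NUL,
// so the array is nul-terminated iff its type is at least that large.
bool nul_terminated_by_size(std::size_t array_type_bytes,
                            std::size_t string_cst_bytes) {
  return array_type_bytes >= string_cst_bytes;
}

// Naive scan: any zero byte counts as a terminator (wrong for wide strings).
bool naive_has_nul(const unsigned char* bytes, std::size_t nbytes) {
  return std::memchr(bytes, 0, nbytes) != nullptr;
}

// Element-aware scan: look for an aligned element whose bytes are all zero.
// elem_size must be nonzero.
bool has_nul_element(const unsigned char* bytes, std::size_t nbytes,
                     std::size_t elem_size) {
  for (std::size_t off = 0; off + elem_size <= nbytes; off += elem_size) {
    bool all_zero = true;
    for (std::size_t k = 0; k < elem_size; ++k)
      if (bytes[off + k] != 0) {
        all_zero = false;
        break;
      }
    if (all_zero)
      return true;
  }
  return false;
}

void example_nul_checks() {
  // char x[2] = "12" vs char x[3] = "12": the STRING_CST is 3 bytes in both.
  assert(!nul_terminated_by_size(2, 3));
  assert(nul_terminated_by_size(3, 3));

  // One little-endian 4-byte character 'A' with no terminator.
  const unsigned char wide_a[4] = {0x41, 0, 0, 0};
  assert(naive_has_nul(wide_a, 4));        // false positive
  assert(!has_nul_element(wide_a, 4, 4));  // correct answer
}
```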

At the end varasm.c filters the excess NUL byte away, but
I would like to add a checking assertion there that this does not
strip more than one wide-character NUL.


Bernd.


[PATCH,nvptx] Truncate config/nvptx/oacc-parallel.c

2018-07-31 Thread Cesar Philippidis
Way back in the GCC 5 days when support for OpenACC was in its infancy,
we used to rely on having various GOACC_ thread functions in the runtime
to implement the execution model, or the lack thereof (that version of GCC
only supported vector level parallelism). However, beginning with GCC 6,
those external functions were replaced with internal functions that get
expanded by the nvptx BE directly.

This patch removes those stale libgomp functions from the nvptx libgomp
target. Is this OK for trunk, or does libgomp still need to maintain
backwards compatibility with GCC 5?

This patch has been bootstrapped and regtested for x86_64 with nvptx
offloading.

Thanks,
Cesar
[PATCH] [libgomp] Truncate config/nvptx/oacc-parallel.c

2018-XX-YY  Cesar Philippidis  
	Thomas Schwinge 

	libgomp/
	* config/nvptx/oacc-parallel.c: Truncate.

(cherry picked from gomp-4_0-branch r228836)
---
 libgomp/config/nvptx/oacc-parallel.c | 358 ---
 1 file changed, 358 deletions(-)

diff --git a/libgomp/config/nvptx/oacc-parallel.c b/libgomp/config/nvptx/oacc-parallel.c
index 5dc53da..e69de29 100644
--- a/libgomp/config/nvptx/oacc-parallel.c
+++ b/libgomp/config/nvptx/oacc-parallel.c
@@ -1,358 +0,0 @@
-/* OpenACC constructs
-
-   Copyright (C) 2014-2018 Free Software Foundation, Inc.
-
-   Contributed by Mentor Embedded.
-
-   This file is part of the GNU Offloading and Multi Processing Library
-   (libgomp).
-
-   Libgomp is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3, or (at your option)
-   any later version.
-
-   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
-   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
-   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
-   more details.
-
-   Under Section 7 of GPL version 3, you are granted additional
-   permissions described in the GCC Runtime Library Exception, version
-   3.1, as published by the Free Software Foundation.
-
-   You should have received a copy of the GNU General Public License and
-   a copy of the GCC Runtime Library Exception along with this program;
-   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-   .  */
-
-#include "libgomp_g.h"
-
-__asm__ (".visible .func (.param .u32 %out_retval) GOACC_tid (.param .u32 %in_ar1);\n"
-	 ".visible .func (.param .u32 %out_retval) GOACC_ntid (.param .u32 %in_ar1);\n"
-	 ".visible .func (.param .u32 %out_retval) GOACC_ctaid (.param .u32 %in_ar1);\n"
-	 ".visible .func (.param .u32 %out_retval) GOACC_nctaid (.param .u32 %in_ar1);\n"
-	 "// BEGIN GLOBAL FUNCTION DECL: GOACC_get_num_threads\n"
-	 ".visible .func (.param .u32 %out_retval) GOACC_get_num_threads;\n"
-	 "// BEGIN GLOBAL FUNCTION DECL: GOACC_get_thread_num\n"
-	 ".visible .func (.param .u32 %out_retval) GOACC_get_thread_num;\n"
-	 "// BEGIN GLOBAL FUNCTION DECL: abort\n"
-	 ".extern .func abort;\n"
-	 ".visible .func (.param .u32 %out_retval) GOACC_tid (.param .u32 %in_ar1)\n"
-	 "{\n"
-	 ".reg .u32 %ar1;\n"
-	 ".reg .u32 %retval;\n"
-	 ".reg .u64 %hr10;\n"
-	 ".reg .u32 %r22;\n"
-	 ".reg .u32 %r23;\n"
-	 ".reg .u32 %r24;\n"
-	 ".reg .u32 %r25;\n"
-	 ".reg .u32 %r26;\n"
-	 ".reg .u32 %r27;\n"
-	 ".reg .u32 %r28;\n"
-	 ".reg .u32 %r29;\n"
-	 ".reg .pred %r30;\n"
-	 ".reg .u32 %r31;\n"
-	 ".reg .pred %r32;\n"
-	 ".reg .u32 %r33;\n"
-	 ".reg .pred %r34;\n"
-	 ".local .align 8 .b8 %frame[4];\n"
-	 "ld.param.u32 %ar1,[%in_ar1];\n"
-	 "mov.u32 %r27,%ar1;\n"
-	 "st.local.u32 [%frame],%r27;\n"
-	 "ld.local.u32 %r28,[%frame];\n"
-	 "mov.u32 %r29,1;\n"
-	 "setp.eq.u32 %r30,%r28,%r29;\n"
-	 "@%r30 bra $L4;\n"
-	 "mov.u32 %r31,2;\n"
-	 "setp.eq.u32 %r32,%r28,%r31;\n"
-	 "@%r32 bra $L5;\n"
-	 "mov.u32 %r33,0;\n"
-	 "setp.eq.u32 %r34,%r28,%r33;\n"
-	 "@!%r34 bra $L8;\n"
-	 "mov.u32 %r23,%tid.x;\n"
-	 "mov.u32 %r22,%r23;\n"
-	 "bra $L7;\n"
-	 "$L4:\n"
-	 "mov.u32 %r24,%tid.y;\n"
-	 "mov.u32 %r22,%r24;\n"
-	 "bra $L7;\n"
-	 "$L5:\n"
-	 "mov.u32 %r25,%tid.z;\n"
-	 "mov.u32 %r22,%r25;\n"
-	 "bra $L7;\n"
-	 "$L8:\n"
-	 "{\n"
-	 "{\n"
-	 "call abort;\n"
-	 "}\n"
-	 "}\n"
-	 "$L7:\n"
-	 "mov.u32 %r26,%r22;\n"
-	 "mov.u32 %retval,%r26;\n"
-	 "st.param.u32 [%out_retval],%retval;\n"
-	 "ret;\n"
-	 "}\n"
-	 ".visible .func (.param .u32 %out_retval) GOACC_ntid (.param .u32 %in_ar1)\n"
-	 "{\n"
-	 ".reg .u32 %ar1;\n"
-	 ".reg .u32 %retval;\n"
-	 ".reg .u64 %hr10;\n"
-	 ".reg .u32 %r22;\n"
-	 ".reg .u32 %r23;\n"
-	 ".reg .u32 %r24;\n"
-	 ".reg .u32 %r25;\n"
-	 ".reg .u32 %r26;\n"
-	 ".reg .u32 %r27;\n"
-	 ".reg .u32 %r28;\n"
-	 ".reg .u32 %r29;\n"
-	 ".reg .pred %r30;\n"
-	 ".reg .u32 %r31;\n"
-	 ".reg .pred %r32;\n"
-	 ".reg .u32 %r33;\n"
-	 ".reg .pred %r34;\n"
-	 ".local .align 8 .b8 %frame[4];\n"
-	 "ld.param.u32 %ar1,[%in_ar1];\n"
-	 "mov.u32 %r27,%ar1;\n"
-	 "st.local.u32 [%frame],%r27;\n"
-	 "ld.local.u32

Re: [PATCH][GCC][mid-end] Allow larger copies when not slow_unaligned_access and no padding.

2018-07-31 Thread Tamar Christina
Hi Richard,

The 07/31/2018 11:21, Richard Biener wrote:
> On Tue, 31 Jul 2018, Tamar Christina wrote:
> 
> > Ping 😊
> > 
> > > -Original Message-
> > > From: gcc-patches-ow...@gcc.gnu.org 
> > > On Behalf Of Tamar Christina
> > > Sent: Tuesday, July 24, 2018 17:34
> > > To: Richard Biener 
> > > Cc: gcc-patches@gcc.gnu.org; nd ; l...@redhat.com;
> > > i...@airs.com; amo...@gmail.com; berg...@vnet.ibm.com
> > > Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when not
> > > slow_unaligned_access and no padding.
> > > 
> > > Hi Richard,
> > > 
> > > Thanks for the review!
> > > 
> > > The 07/23/2018 18:46, Richard Biener wrote:
> > > > On July 23, 2018 7:01:23 PM GMT+02:00, Tamar Christina
> > >  wrote:
> > > > >Hi All,
> > > > >
> > > > >This allows copy_blkmode_to_reg to perform larger copies when it is
> > > > >safe to do so by calculating the bitsize per iteration doing the
> > > > >maximum copy allowed that does not read more than the amount of bits
> > > > >left to copy.
> > > > >
> > > > >Strictly speaking, this copying is only done if:
> > > > >
> > > > >  1. the target supports fast unaligned access
> > > > >  2. no padding is being used.
> > > > >
> > > > >This should avoid the issues of the first patch (PR85123) but still
> > > > >work for targets that are safe to do so.
> > > > >
> > > > >Original patch
> > > > >https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01088.html
> > > > >Previous respin
> > > > >https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00239.html
> > > > >
> > > > >
> > > > >This produces for the copying of a 3 byte structure:
> > > > >
> > > > >fun3:
> > > > >   adrp x1, .LANCHOR0
> > > > >   add x1, x1, :lo12:.LANCHOR0
> > > > >   mov x0, 0
> > > > >   sub sp, sp, #16
> > > > >   ldrh w2, [x1, 16]
> > > > >   ldrb w1, [x1, 18]
> > > > >   add sp, sp, 16
> > > > >   bfi x0, x2, 0, 16
> > > > >   bfi x0, x1, 16, 8
> > > > >   ret
> > > > >
> > > > >whereas before it was producing
> > > > >
> > > > >fun3:
> > > > >   adrp x0, .LANCHOR0
> > > > >   add x2, x0, :lo12:.LANCHOR0
> > > > >   sub sp, sp, #16
> > > > >   ldrh w1, [x0, #:lo12:.LANCHOR0]
> > > > >   ldrb w0, [x2, 2]
> > > > >   strh w1, [sp, 8]
> > > > >   strb w0, [sp, 10]
> > > > >   ldr w0, [sp, 8]
> > > > >   add sp, sp, 16
> > > > >   ret
> > > > >
> > > > >Cross compiled and regtested on
> > > > >  aarch64_be-none-elf
> > > > >  armeb-none-eabi
> > > > >and no issues
> > > > >
> > > > >Bootstrapped and regtested
> > > > > aarch64-none-linux-gnu
> > > > > x86_64-pc-linux-gnu
> > > > > powerpc64-unknown-linux-gnu
> > > > > arm-none-linux-gnueabihf
> > > > >
> > > > >and found no issues.
> > > > >
> > > > >OK for trunk?
> > > >
> > > > How does this affect store-to-load forwarding when the source is
> > > > initialized piecewise? IMHO we should avoid larger loads but generate
> > > > larger stores when possible.
> > > >
> > > > How do non-x86 architectures behave with respect to STLF?
> > > >
> > > 
> > > I should have made it more explicit in my cover letter, but this only
> > > covers reg to reg copies.
> > > So store-to-load forwarding shouldn't really come into play here, unless
> > > I'm missing something.
> > > 
> > > The example in my patch shows that the loads from mem are mostly
> > > unaffected.
> > > 
> > > For x86 the change is also quite significant, e.g. for a 5 byte struct
> > > load it used to generate
> > > 
> > > fun5:
> > >   movl foo5(%rip), %eax
> > >   movl %eax, %edi
> > >   movzbl %al, %edx
> > >   movzbl %ah, %eax
> > >   movb %al, %dh
> > >   movzbl foo5+2(%rip), %eax
> > >   shrl $24, %edi
> > >   salq $16, %rax
> > >   movq %rax, %rsi
> > >   movzbl %dil, %eax
> > >   salq $24, %rax
> > >   movq %rax, %rcx
> > >   movq %rdx, %rax
> > >   movzbl foo5+4(%rip), %edx
> > >   orq %rsi, %rax
> > >   salq $32, %rdx
> > >   orq %rcx, %rax
> > >   orq %rdx, %rax
> > >   ret
> > > 
> > > instead of
> > > 
> > > fun5:
> > > movzbl  foo5+4(%rip), %eax
> > > salq $32, %rax
> > > movq %rax, %rdx
> > > movl foo5(%rip), %eax
> > > orq %rdx, %rax
> > > ret
> > > 
> > > so the loads themselves are unaffected.
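The chunking described in the cover letter can be sketched in C as below. This is a simplification resting on assumptions: the patch works in bits with integer machine modes and only takes this path when slow_unaligned_access is false and padding_correction is zero; here those checks are assumed to have passed, and the function name is hypothetical. For the 3-byte struct it picks 2 + 1 bytes (the ldrh/ldrb pair above) and for the 5-byte struct 4 + 1 bytes (the movl/movzbl pair).

```c
#include <stddef.h>

/* Hypothetical sketch, not copy_blkmode_to_reg itself: split a copy of
   TOTAL bytes into pieces, each the largest power of two that fits both
   the word size and the bytes still left to copy.  WORD_BYTES must be a
   power of two.  Returns the number of pieces written to CHUNKS.  */
static size_t
plan_block_copy (size_t total, size_t word_bytes,
                 size_t *chunks, size_t max_chunks)
{
  size_t n = 0;
  while (total > 0 && n < max_chunks)
    {
      size_t chunk = word_bytes;
      while (chunk > total)   /* never read past the remaining bytes */
        chunk /= 2;
      chunks[n++] = chunk;
      total -= chunk;
    }
  return n;
}
```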
> 
> I see.  Few things:
> 
>dst_words = XALLOCAVEC (rtx, n_regs);
> +
> +  slow_unaligned_access
> +    = targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src)));
> +
>bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> 
> please avoid the extra vertical space.
> 
> 
> +
> +  /* Find the largest integer mode that can be used to copy all or as
> +many bits as possible of the struct
> 
> likewise.

Done.

> 
> +  FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
> +   if (padding_correction == 0
> +   && !slow_unaligned_access
> 
> These conditions are invariant so p

Re: [PATCH] Make strlen range computations more conservative

2018-07-31 Thread Jakub Jelinek
On Tue, Jul 31, 2018 at 09:17:52AM -0600, Martin Sebor wrote:
> On 07/31/2018 12:38 AM, Jakub Jelinek wrote:
> > On Mon, Jul 30, 2018 at 09:45:49PM -0600, Martin Sebor wrote:
> > > Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past
> > > the end of subobjects by string functions.  With _FORTIFY_SOURCE=2
> > > it calls abort.  This is the default on popular distributions,
> > 
> > Note that _FORTIFY_SOURCE=2 is the mode that goes beyond what the standard
> > requires, imposes extra requirements.  So from what this mode accepts or
> > rejects we shouldn't determine what is or isn't considered valid.
> 
> I'm not sure what the additional requirements are but the ones
> I am referring to are the enforcing of struct member boundaries.
> This is in line with the standard requirements of not accessing
> [sub]objects via pointers derived from other [sub]objects.

In the middle-end the distinction between what was originally a reference
to subobjects and what was a reference to objects is quickly lost
(whether through SCCVN or other optimizations).
We've run into this many times with the __builtin_object_size already.
So, if e.g.
struct S { char a[3]; char b[5]; } s = { "abc", "defg" };
...
strlen ((char *) &s) is well defined but
strlen (s.a) is not in C; in the middle-end you might not be able to
figure out which one is which.
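A compilable version of the example, restricted to the well-defined call, looks like this (the wrapper function name is made up). Both calls would start at the same address, which is why the middle-end may not be able to tell them apart once the original access path is gone.

```c
#include <string.h>

struct S { char a[3]; char b[5]; };

/* s.a is initialized without a terminating NUL; s.b holds "defg\0".  */
struct S s = { "abc", "defg" };

/* Well defined: the scan runs over the object s as a whole, whose bytes
   are "abcdefg\0" (assuming, as on common ABIs, no padding between the
   char members).  strlen (s.a) would read past the end of s.a, which is
   undefined in C, so it is deliberately not called here.  */
static size_t
whole_object_strlen (void)
{
  return strlen ((const char *) &s);
}
```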

Jakub


[Committed] S/390: Don't emit prefetch instructions for clrmem

2018-07-31 Thread Andreas Krebbel
From: Andreas Krebbel 

gcc/ChangeLog:

2018-07-31  Andreas Krebbel  

* config/s390/s390.c (s390_expand_setmem): Make the unrolling
depend on whether prefetch instructions will be emitted or not.
Use TARGET_SETMEM_PFD for checking whether prefetch instructions
will be emitted or not.
* config/s390/s390.h (TARGET_SETMEM_PREFETCH_DISTANCE)
(TARGET_SETMEM_PFD): New macros.

gcc/testsuite/ChangeLog:

2018-07-31  Andreas Krebbel  

* gcc.target/s390/memset-1.c: Improve testcase.
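A plain C model of the new decision logic may help review. The struct and function names are hypothetical; the fields stand in for TARGET_Z10, the s390_tune >= PROCESSOR_2964_Z13 test, TARGET_MVCLE and val == const0_rtx, and the thresholds are copied from the diff below. Only the constant-length check is modelled, not the loop expansion itself.

```c
#include <stdbool.h>

#define SETMEM_PREFETCH_DISTANCE 1024  /* TARGET_SETMEM_PREFETCH_DISTANCE */

struct setmem_req
{
  bool z10;        /* TARGET_Z10 */
  bool tune_z13;   /* s390_tune >= PROCESSOR_2964_Z13 */
  bool mvcle;      /* TARGET_MVCLE */
  bool clear;      /* VAL == const0_rtx, i.e. clrmem */
  bool len_const;  /* LEN is a CONST_INT */
  long len;
};

/* Model of TARGET_SETMEM_PFD (VAL, LEN).  */
static bool
setmem_pfd (const struct setmem_req *r)
{
  return r->z10
         && (!r->tune_z13 || !r->clear)
         && (!r->len_const || r->len > SETMEM_PREFETCH_DISTANCE);
}

/* Model of the condition guarding the loop-free expansion.  */
static bool
setmem_unrolled (const struct setmem_req *r)
{
  if (!r->len_const)
    return false;
  bool short_enough
    = r->clear ? (r->len <= 256 * 4
                  || (r->len <= 256 * 5 && setmem_pfd (r)))
               : r->len <= 257 * 4;
  return short_enough && (!r->mvcle || r->len <= 256);
}
```

With these inputs, a 1100-byte clear is expanded without a loop when tuning for zEC12, but with a loop when tuning for z13, because no prefetch is issued for clrmem on z13.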
---
 gcc/config/s390/s390.c   | 22 +
 gcc/config/s390/s390.h   | 10 
 gcc/testsuite/gcc.target/s390/memset-1.c | 81 
 3 files changed, 84 insertions(+), 29 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index a579e9d..ec588a2 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -5499,12 +5499,15 @@ s390_expand_setmem (rtx dst, rtx len, rtx val)
 
   /* Expand setmem/clrmem for a constant length operand without a
  loop if it will be shorter that way.
- With a constant length and without pfd argument a
- clrmem loop is 32 bytes -> 5.3 * xc
- setmem loop is 36 bytes -> 3.6 * (mvi/stc + mvc) */
+ clrmem loop (with PFD)    is 30 bytes -> 5 * xc
+ clrmem loop (without PFD) is 24 bytes -> 4 * xc
+ setmem loop (with PFD)    is 38 bytes -> ~4 * (mvi/stc + mvc)
+ setmem loop (without PFD) is 32 bytes -> ~4 * (mvi/stc + mvc) */
   if (GET_CODE (len) == CONST_INT
-  && ((INTVAL (len) <= 256 * 5 && val == const0_rtx)
- || INTVAL (len) <= 257 * 3)
+  && ((val == const0_rtx
+  && (INTVAL (len) <= 256 * 4
+   || (INTVAL (len) <= 256 * 5 && TARGET_SETMEM_PFD(val,len))))
+ || (val != const0_rtx && INTVAL (len) <= 257 * 4))
   && (!TARGET_MVCLE || INTVAL (len) <= 256))
 {
   HOST_WIDE_INT o, l;
@@ -5618,12 +5621,11 @@ s390_expand_setmem (rtx dst, rtx len, rtx val)
 
   emit_label (loop_start_label);
 
-  if (TARGET_Z10
- && (GET_CODE (len) != CONST_INT || INTVAL (len) > 1024))
+  if (TARGET_SETMEM_PFD (val, len))
{
- /* Issue a write prefetch for the +4 cache line.  */
- rtx prefetch = gen_prefetch (gen_rtx_PLUS (Pmode, dst_addr,
-GEN_INT (1024)),
+ /* Issue a write prefetch.  */
+ rtx distance = GEN_INT (TARGET_SETMEM_PREFETCH_DISTANCE);
+ rtx prefetch = gen_prefetch (gen_rtx_PLUS (Pmode, dst_addr, distance),
   const1_rtx, const0_rtx);
  emit_insn (prefetch);
  PREFETCH_SCHEDULE_BARRIER_P (prefetch) = true;
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 71a12b8..c6aedcd 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -181,6 +181,16 @@ enum processor_flags
 
 #define TARGET_AVOID_CMP_AND_BRANCH (s390_tune == PROCESSOR_2817_Z196)
 
+/* Issue a write prefetch for the +4 cache line.  */
+#define TARGET_SETMEM_PREFETCH_DISTANCE 1024
+
+/* Expand to a C expression evaluating to true if a setmem to VAL of
+   length LEN should be emitted using prefetch instructions.  */
+#define TARGET_SETMEM_PFD(VAL,LEN) \
+  (TARGET_Z10  \
+   && (s390_tune < PROCESSOR_2964_Z13 || (VAL) != const0_rtx)  \
+   && (!CONST_INT_P (LEN) || INTVAL ((LEN)) > TARGET_SETMEM_PREFETCH_DISTANCE))
+
 /* Run-time target specification.  */
 
 /* Defaults for option flags defined only on some subtargets.  */
diff --git a/gcc/testsuite/gcc.target/s390/memset-1.c b/gcc/testsuite/gcc.target/s390/memset-1.c
index 7b43b97c..3e201df 100644
--- a/gcc/testsuite/gcc.target/s390/memset-1.c
+++ b/gcc/testsuite/gcc.target/s390/memset-1.c
@@ -2,16 +2,23 @@
without loop statements.  */
 
 /* { dg-do compile } */
-/* { dg-options "-O3 -mzarch" } */
+/* { dg-options "-O3 -mzarch -march=z13" } */
 
-/* 1 mvc */
+/* 1 stc */
+void
+*memset0(void *s, int c)
+{
+  return __builtin_memset (s, c, 1);
+}
+
+/* 1 stc 1 mvc */
 void
 *memset1(void *s, int c)
 {
   return __builtin_memset (s, c, 42);
 }
 
-/* 3 mvc */
+/* 3 stc 3 mvc */
 void
 *memset2(void *s, int c)
 {
@@ -25,55 +32,62 @@ void
   return __builtin_memset (s, c, 0);
 }
 
-/* mvc */
+/* 1 stc 1 mvc */
 void
 *memset4(void *s, int c)
 {
   return __builtin_memset (s, c, 256);
 }
 
-/* 2 mvc */
+/* 2 stc 2 mvc */
 void
 *memset5(void *s, int c)
 {
   return __builtin_memset (s, c, 512);
 }
 
-/* still 2 mvc through the additional first byte  */
+/* 2 stc 2 mvc - still due to the stc bytes */
 void
 *memset6(void *s, int c)
 {
   return __builtin_memset (s, c, 514);
 }
 
-/* 3 mvc */
+/* 3 stc 2 mvc */
 void
 *memset7(void *s, int c)
 {
   return __builtin_memset (s, c, 515);
 }
 
-/* still 3 mvc through the additional first byte  */
+/* 4 stc 4 mvc - 4 * 256 + 4 stc bytes */
 voi

Re: [PATCH] combine: Allow combining two insns to two insns

2018-07-31 Thread Richard Earnshaw (lists)
On 31/07/18 14:57, Segher Boessenkool wrote:
> Hi Christophe,
> 
> On Tue, Jul 31, 2018 at 02:34:06PM +0200, Christophe Lyon wrote:
>> Since this was committed, I've noticed regressions
>> on aarch64:
>> FAIL: gcc.dg/zero_bits_compound-1.c scan-assembler-not \\(and:
> 
> This went from
> and w0, w0, 255
> lsl w1, w0, 8

These are sequentially dependent.

> orr w0, w1, w0, lsl 20
> ret
> to
> and w1, w0, 255
> ubfiz   w0, w0, 8, 8

These can run in parallel.

So the change is a good one!  On a super-scalar machine we save a cycle.

R.

> orr w0, w0, w1, lsl 20
> ret
> so it's neither an improvement nor a regression, just different code.
> The testcase wants no ANDs in the RTL.
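To make "just different code" concrete, both sequences can be modelled in C on 32-bit values (ubfiz w0, w0, 8, 8 takes the low 8 bits and places them at bit 8). This is a hand model for illustration with hypothetical function names, not compiler output. It also reflects Richard's point: in the new sequence the and and the ubfiz both read only the incoming w0.

```c
#include <stdint.h>

/* Old: and w0, w0, 255; lsl w1, w0, 8; orr w0, w1, w0, lsl 20  */
static uint32_t
zero_bits_old (uint32_t w0)
{
  w0 &= 255;
  uint32_t w1 = w0 << 8;          /* must wait for the and */
  return w1 | (w0 << 20);
}

/* New: and w1, w0, 255; ubfiz w0, w0, 8, 8; orr w0, w0, w1, lsl 20  */
static uint32_t
zero_bits_new (uint32_t w0)
{
  uint32_t w1 = w0 & 255;         /* reads only the incoming w0 */
  uint32_t t = (w0 & 0xff) << 8;  /* ubfiz: also reads only w0 */
  return t | (w1 << 20);
}
```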
> 
> 
>> on arm-none-linux-gnueabi
>> FAIL: gfortran.dg/actual_array_constructor_1.f90   -O1  execution test
> 
> That sounds bad.  Open a PR, maybe?
> 
> 
>> On aarch64, I've also noticed a few others regressions but I'm not yet
>> 100% sure it's caused by this patch (bisect running):
>> gcc.target/aarch64/ashltidisi.c scan-assembler-times asr 4
> 
>  ushift_53_i:
> -   uxtw x1, w0
> -   lsl x0, x1, 53
> -   lsr x1, x1, 11
> +   lsr w1, w0, 11
> +   lsl x0, x0, 53
> ret
> 
>  shift_53_i:
> -   sxtw x1, w0
> -   lsl x0, x1, 53
> -   asr x1, x1, 11
> +   sbfx x1, x0, 11, 21
> +   lsl x0, x0, 53
> ret
> 
> Both are improvements afais.  The number of asr insns changes, sure.
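The ashltidisi.c hunks above can likewise be checked by modelling each instruction in C on 64-bit values. This is a hand model with hypothetical helper names; sign extension and arithmetic shifts are emulated with unsigned arithmetic to avoid implementation-defined signed shifts. It shows that the old and new sequences produce identical x0 and x1, so only the instruction mix (and hence the asr count the test scans for) changes.

```c
#include <stdint.h>

/* Sign-extend the low BITS bits of V (0 < BITS < 64).  */
static uint64_t
sext (uint64_t v, unsigned bits)
{
  uint64_t sign = 1ULL << (bits - 1);
  v &= (1ULL << bits) - 1;
  return (v ^ sign) - sign;
}

/* Arithmetic shift right by N (0 < N < 64), done on unsigned values.  */
static uint64_t
asr64 (uint64_t v, unsigned n)
{
  uint64_t r = v >> n;
  if (v >> 63)
    r |= ~(~0ULL >> n);
  return r;
}

struct pair { uint64_t x0, x1; };

/* ushift_53_i, old: uxtw x1, w0; lsl x0, x1, 53; lsr x1, x1, 11  */
static struct pair
ushift_old (uint64_t x0)
{
  uint64_t x1 = (uint32_t) x0;
  struct pair r = { x1 << 53, x1 >> 11 };
  return r;
}

/* ushift_53_i, new: lsr w1, w0, 11; lsl x0, x0, 53  */
static struct pair
ushift_new (uint64_t x0)
{
  struct pair r = { x0 << 53, (uint32_t) x0 >> 11 };
  return r;
}

/* shift_53_i, old: sxtw x1, w0; lsl x0, x1, 53; asr x1, x1, 11  */
static struct pair
shift_old (uint64_t x0)
{
  uint64_t x1 = sext (x0, 32);
  struct pair r = { x1 << 53, asr64 (x1, 11) };
  return r;
}

/* shift_53_i, new: sbfx x1, x0, 11, 21; lsl x0, x0, 53  */
static struct pair
shift_new (uint64_t x0)
{
  struct pair r = { x0 << 53, sext (x0 >> 11, 21) };
  return r;
}
```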
> 
> 
>> gcc.target/aarch64/sve/var_stride_2.c -march=armv8.2-a+sve
>> scan-assembler-times \\tadd\\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 10\\n 2
> 
> Skipping all the SVE tests, sorry.  Richard says they look like
> improvements, and exactly of the expected kind.  :-)
> 
> 
> Segher
> 



Re: [PATCH 2/5] dumpfile.c: eliminate special-casing of dump_file/alt_dump_file

2018-07-31 Thread Richard Biener
On Tue, Jul 31, 2018 at 5:34 PM David Malcolm  wrote:
>
> On Tue, 2018-07-31 at 14:53 +0200, Richard Biener wrote:
> > On Fri, Jul 27, 2018 at 11:48 PM David Malcolm 
> > wrote:
> > >
> > > With the addition of optinfo, the various dump_* calls had three
> > > parts:
> > > - optionally print to dump_file
> > > - optionally print to alt_dump_file
> > > - optionally make an optinfo_item and add it to the pending
> > >   optinfo, creating it for dump_*_loc calls.
> > >
> > > However, this split makes it difficult to implement the formatted
> > > dumps later in patch kit, so as enabling work towards that, this
> > > patch removes the above split, so that all dumping within the
> > > dump_* API goes through optinfo_item.
> > >
> > > In order to ensure that the dumps to dump_file and alt_dump_file
> > > are processed immediately (rather than being buffered within the
> > > pending optinfo for consolidation), this patch introduces the idea
> > > of "immediate" optinfo_item destinations vs "non-immediate"
> > > destinations.
> > >
> > > The patch also adds selftest coverage of what's printed, and of
> > > scopes.
> > >
> > > This adds two allocations per dump_* call when dumping is enabled.
> > > I'm assuming that this isn't a problem, as dump_enabled_p is
> > > normally false.  There are ways of optimizing it if it is an issue
> > > (by making optinfo_item instances become temporaries that borrow
> > > the underlying buffer), but they require nontrivial changes, so I'd
> > > prefer to leave that for another patch kit, if it becomes
> > > necessary.
> >
> > Yeah, I guess that's OK given we can consolidate quite some calls
> > after your patch anyways.
>
> We can, but FWIW my plan is to only touch the calls that I need to to
> implement the  "Higher-level reporting of vectorization problems" idea
> here:
>https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00446.html
>
> where the explicit dump calls become implicit within calls to things
> like:
>
>return opt_result::failure_at (stmt,
>   "not vectorized: different sized vector "
>   "types in statement, %T and %T\n",
>   vectype, nunits_vectype);
>
> But if you think it's worthwhile, I can do a big patch that uses these
> format codes throughout.
>
> > Using alloca + placement new would be possible
> > as well I guess?
>
> Maybe.  I think the underlying question here is "what should the
> lifetimes of the optinfo_items (and their text buffers) be?"
>
> In the initial version of the optinfo patch kit, I had optinfo_items
> being created in response to the various dump_* calls, and them being
> added to an optinfo (which takes ownership of them), before the optinfo
> is eventually emitted to various destinations; the optinfo is then
> deleted, deleting the owned items.
>
> This lifetime approach (having the optinfos own the optinfo_items) was
> necessary because one of the destinations was through the diagnostics
> system; they needed consolidation so that all of the items could be
> alive at the point of emission.  (I think the JSON output also required
> it at one point).
>
> Hence the above approach needs the items and thus their underlying text
> strings to live as long as the optinfo that owns them - the
> destinations assume that the optinfo_items are all alive at the point
> of emission.
>
> Hence this requires new/delete pairs for the items, and also the
> xstrdup around the text buffer, so that the items can own a copy.
>
> But the dump_file and alt_dump_file destinations don't need the items
> to be long-lived: they can be temporary wrappers.
>
> Similarly, the optimization record destination could simply work in
> terms of temporary items: when an optinfo_item is added, the
> corresponding JSON could be added immediately.
>
> So I think the only things that are requiring optinfo_items to be long-
> lived are:
> * the -fremarks idea from an earlier patch kit - and I'm not sure what
> our plans for that should be, in terms of how it should interact with
> alt_dump_file/-fopt-info
> * the selftests within dumpfile.c itself.
>
> So the other approach would be to rewrite dumpfile.c so that
> optinfo_item instances (or maybe "dump_item" instances) are implicitly
> temporary wrappers around a text buffer; the various emit destinations
> make no assumptions that the items will stick around; any that do need
> them to (e.g. for dumpfile.c's selftests) make a copy, perhaps with an
> optinfo_items_need_saving_p () function to guard adding a copy of each
> item into the optinfo.
>
> That would avoid the new/delete pair for all of the optinfo_item
> instances, and the xstrdup for each one, apart from during selftests.
>
> But it's a rewrite of this code (and has interactions with the rest of
> the kit, which is why I didn't do it).
>
> Is this something you'd want me to pursue as a followup?  (it's an
> optimizatio

Re: [PATCH 2/5] dumpfile.c: eliminate special-casing of dump_file/alt_dump_file

2018-07-31 Thread David Malcolm
On Tue, 2018-07-31 at 14:53 +0200, Richard Biener wrote:
> On Fri, Jul 27, 2018 at 11:48 PM David Malcolm 
> wrote:
> > 
> > With the addition of optinfo, the various dump_* calls had three
> > parts:
> > - optionally print to dump_file
> > - optionally print to alt_dump_file
> > - optionally make an optinfo_item and add it to the pending
> >   optinfo, creating it for dump_*_loc calls.
> > 
> > However, this split makes it difficult to implement the formatted
> > dumps later in patch kit, so as enabling work towards that, this
> > patch removes the above split, so that all dumping within the
> > dump_* API goes through optinfo_item.
> > 
> > In order to ensure that the dumps to dump_file and alt_dump_file
> > are processed immediately (rather than being buffered within the
> > pending optinfo for consolidation), this patch introduces the idea
> > of "immediate" optinfo_item destinations vs "non-immediate"
> > destinations.
> > 
> > The patch also adds selftest coverage of what's printed, and of
> > scopes.
> > 
> > This adds two allocations per dump_* call when dumping is enabled.
> > I'm assuming that this isn't a problem, as dump_enabled_p is
> > normally false.  There are ways of optimizing it if it is an issue
> > (by making optinfo_item instances become temporaries that borrow
> > the underlying buffer), but they require nontrivial changes, so I'd
> > prefer to leave that for another patch kit, if it becomes
> > necessary.
> 
> Yeah, I guess that's OK given we can consolidate quite some calls
> after your patch anyways.

We can, but FWIW my plan is to only touch the calls that I need to to
implement the  "Higher-level reporting of vectorization problems" idea
here:
   https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00446.html

where the explicit dump calls become implicit within calls to things
like:

   return opt_result::failure_at (stmt,
  "not vectorized: different sized vector "
  "types in statement, %T and %T\n",
  vectype, nunits_vectype);

But if you think it's worthwhile, I can do a big patch that uses these
format codes throughout.

> Using alloca + placement new would be possible
> as well I guess?

Maybe.  I think the underlying question here is "what should the
lifetimes of the optinfo_items (and their text buffers) be?"

In the initial version of the optinfo patch kit, I had optinfo_items
being created in response to the various dump_* calls, and them being
added to an optinfo (which takes ownership of them), before the optinfo
is eventually emitted to various destinations; the optinfo is then
deleted, deleting the owned items.

This lifetime approach (having the optinfos own the optinfo_items) was
necessary because one of the destinations was through the diagnostics
system; they needed consolidation so that all of the items could be
alive at the point of emission.  (I think the JSON output also required
it at one point).

Hence the above approach needs the items and thus their underlying text
strings to live as long as the optinfo that owns them - the
destinations assume that the optinfo_items are all alive at the point
of emission.

Hence this requires new/delete pairs for the items, and also the
xstrdup around the text buffer, so that the items can own a copy.

But the dump_file and alt_dump_file destinations don't need the items
to be long-lived: they can be temporary wrappers.

Similarly, the optimization record destination could simply work in
terms of temporary items: when an optinfo_item is added, the
corresponding JSON could be added immediately.

So I think the only things that are requiring optinfo_items to be long-
lived are:
* the -fremarks idea from an earlier patch kit - and I'm not sure what
our plans for that should be, in terms of how it should interact with
alt_dump_file/-fopt-info
* the selftests within dumpfile.c itself.

So the other approach would be to rewrite dumpfile.c so that
optinfo_item instances (or maybe "dump_item" instances) are implicitly
temporary wrappers around a text buffer; the various emit destinations
make no assumptions that the items will stick around; any that do need 
them to (e.g. for dumpfile.c's selftests) make a copy, perhaps with an
optinfo_items_need_saving_p () function to guard adding a copy of each
item into the optinfo.

That would avoid the new/delete pair for all of the optinfo_item
instances, and the xstrdup for each one, apart from during selftests.
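The borrow-versus-copy idea can be sketched in C as follows. Everything here is hypothetical (dump_item, item_borrow, item_save and item_release are not dumpfile.c APIs): a borrowed item merely points at the caller's text buffer and is valid only while that buffer lives, while a destination that needs the text afterwards, such as a selftest, makes its own copy, which is the only place the allocation and the xstrdup-style copy would be paid.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct dump_item
{
  const char *text;
  bool owned;  /* true if TEXT was allocated for this item */
};

/* Cheap wrapper around the caller's buffer: no allocation.  */
static struct dump_item
item_borrow (const char *buf)
{
  struct dump_item it = { buf, false };
  return it;
}

/* Copy only when the item must outlive the buffer.  Returns an item with
   NULL text if the allocation fails.  */
static struct dump_item
item_save (struct dump_item tmp)
{
  size_t len = strlen (tmp.text) + 1;
  char *copy = malloc (len);
  if (copy)
    memcpy (copy, tmp.text, len);
  struct dump_item it = { copy, copy != NULL };
  return it;
}

static void
item_release (struct dump_item *it)
{
  if (it->owned)
    free ((char *) it->text);
  it->text = NULL;
  it->owned = false;
}
```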

But it's a rewrite of this code (and has interactions with the rest of
the kit, which is why I didn't do it).

Is this something you'd want me to pursue as a followup?  (It's an
optimization of the dump_enabled_p branch.  Maybe it would become more
necessary for people using -fdump-optimization-record on large
codebases?)

> OK.

Thanks
Dave


> Richard.
> 
> > gcc/ChangeLog:
> > * dump-context.h: Include "pretty-print.h".
> > (dump_c

[PATCH, rs6000] refactor/cleanup in rs6000-string.c

2018-07-31 Thread Aaron Sawdey
Just teasing things apart a bit more in this function so I can add
vec/vsx code generation without making it enormous and
incomprehensible.

Bootstrap/regtest passes on powerpc64le, ok for trunk?

Thanks,
Aaron


2018-07-31  Aaron Sawdey  

* config/rs6000/rs6000-string.c (select_block_compare_mode): Move test
for word_mode_ok here instead of passing as argument.
(expand_block_compare): Change select_block_compare_mode() call.
(expand_strncmp_gpr_sequence): New function.
(expand_strn_compare): Make use of expand_strncmp_gpr_sequence.
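The double negation in the moved word_mode_ok test reads more easily after De Morgan's law: word_mode may be used on big-endian targets, on targets with ldbrx, or when word_mode is not DImode. A small C truth-table check, with the three target properties passed as plain booleans and hypothetical function names:

```c
#include <stdbool.h>

/* As written in the patch: !(!BYTES_BIG_ENDIAN && !TARGET_LDBRX
   && word_mode == DImode), with the macros replaced by parameters.  */
static bool
word_mode_ok_patch (bool big_endian, bool has_ldbrx, bool word_is_dimode)
{
  return !(!big_endian && !has_ldbrx && word_is_dimode);
}

/* The same predicate after De Morgan's law.  */
static bool
word_mode_ok_demorgan (bool big_endian, bool has_ldbrx, bool word_is_dimode)
{
  return big_endian || has_ldbrx || !word_is_dimode;
}
```

The only combination that rules out word_mode is little-endian, no ldbrx, and a DImode word_mode.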

Index: gcc/config/rs6000/rs6000-string.c
===
--- gcc/config/rs6000/rs6000-string.c   (revision 263039)
+++ gcc/config/rs6000/rs6000-string.c   (working copy)
@@ -238,13 +238,11 @@
 
OFFSET is the current read offset from the beginning of the block.
BYTES is the number of bytes remaining to be read.
-   ALIGN is the minimum alignment of the memory blocks being compared in bytes.
-   WORD_MODE_OK indicates using WORD_MODE is allowed, else SImode is
-   the largest allowable mode.  */
+   ALIGN is the minimum alignment of the memory blocks being compared in bytes.  */
 static machine_mode
 select_block_compare_mode (unsigned HOST_WIDE_INT offset,
   unsigned HOST_WIDE_INT bytes,
-  unsigned HOST_WIDE_INT align, bool word_mode_ok)
+  unsigned HOST_WIDE_INT align)
 {
   /* First see if we can do a whole load unit
  as that will be more efficient than a larger load + shift.  */
@@ -257,6 +255,11 @@
   /* The most we can read without potential page crossing.  */
   unsigned HOST_WIDE_INT maxread = ROUND_UP (bytes, align);
 
+  /* If we have an LE target without ldbrx and word_mode is DImode,
+ then we must avoid using word_mode.  */
+  int word_mode_ok = !(!BYTES_BIG_ENDIAN && !TARGET_LDBRX
+  && word_mode == DImode);
+
   if (word_mode_ok && bytes >= UNITS_PER_WORD)
 return word_mode;
   else if (bytes == GET_MODE_SIZE (SImode))
@@ -1382,16 +1385,11 @@
   else
 cond = gen_reg_rtx (CCmode);
 
-  /* If we have an LE target without ldbrx and word_mode is DImode,
- then we must avoid using word_mode.  */
-  int word_mode_ok = !(!BYTES_BIG_ENDIAN && !TARGET_LDBRX
-  && word_mode == DImode);
-
   /* Strategy phase.  How many ops will this take and should we expand it?  */
 
   unsigned HOST_WIDE_INT offset = 0;
   machine_mode load_mode =
-select_block_compare_mode (offset, bytes, base_align, word_mode_ok);
+select_block_compare_mode (offset, bytes, base_align);
   unsigned int load_mode_size = GET_MODE_SIZE (load_mode);
 
   /* We don't want to generate too much code.  The loop code can take
@@ -1445,8 +1443,7 @@
   while (bytes > 0)
 {
   unsigned int align = compute_current_alignment (base_align, offset);
-  load_mode = select_block_compare_mode (offset, bytes,
-align, word_mode_ok);
+  load_mode = select_block_compare_mode (offset, bytes, align);
   load_mode_size = GET_MODE_SIZE (load_mode);
   if (bytes >= load_mode_size)
cmp_bytes = load_mode_size;
@@ -1698,6 +1695,189 @@
   LABEL_NUSES (strncmp_label) += 1;
 }
 
+/* Generate the sequence of compares for strcmp/strncmp using gpr instructions.
+   BYTES_TO_COMPARE is the number of bytes to be compared.
+   BASE_ALIGN is the smaller of the alignment of the two strings.
+   ORIG_SRC1 is the unmodified rtx for the first string.
+   ORIG_SRC2 is the unmodified rtx for the second string.
+   TMP_REG_SRC1 is the register for loading the first string.
+   TMP_REG_SRC2 is the register for loading the second string.
+   RESULT_REG is the rtx for the result register.
+   EQUALITY_COMPARE_REST is a flag to indicate we need to make a cleanup call
+   to strcmp/strncmp if we have equality at the end of the inline comparison.
+   CLEANUP_LABEL is rtx for a label we generate if we need code to clean up
+   and generate the final comparison result.
+   FINAL_MOVE_LABEL is rtx for a label we can branch to when we can just 
+   set the final result.  */
+static void
+expand_strncmp_gpr_sequence(unsigned HOST_WIDE_INT bytes_to_compare,
+   unsigned int base_align,
+   rtx orig_src1, rtx orig_src2,
+   rtx tmp_reg_src1, rtx tmp_reg_src2, rtx result_reg,
+   bool equality_compare_rest, rtx &cleanup_label,
+   rtx final_move_label)
+{
+  unsigned int word_mode_size = GET_MODE_SIZE (word_mode);
+  machine_mode load_mode;
+  unsigned int load_mode_size;
+  unsigned HOST_WIDE_INT cmp_bytes = 0;
+  unsigned HOST_WIDE_INT offset = 0;
+  rtx src1_addr = force_reg (Pmode, XEXP (orig_src1, 0));
+  rtx src2_addr = force_reg (Pmode, XEXP (orig_src2, 0));
+
+  while (bytes_to_compare > 0)
+{
+  /* GPR compare se

[PATCH,nvptx] Remove use of CUDA unified memory in libgomp

2018-07-31 Thread Cesar Philippidis
At present, libgomp uses CUDA unified memory only as a buffer for passing
the struct containing the pointers to the data mappings to the
offloaded functions. I'm not sure why unified memory is needed here if
it is still being managed explicitly by the driver.

This patch removes the use of CUDA unified memory from the driver. I
don't recall observing any reduction in performance. Besides,
eventually, we'd like to eliminate the struct containing all pointers to
the offloaded data mappings and pass those pointers as individual
function arguments to cuLaunchKernel directly.
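For illustration, here is a host-only sketch of the per-stream FIFO of argument buffers that the patch's map_* comment describes. It is not the plugin code. malloc stands in for cuMemAlloc and a plain pointer for CUdeviceptr, the arg_buf/arg_queue names are hypothetical, there is no locking (the patch requires ptx_event_lock around push and pop), and the policy of reusing or growing the idle head buffer is an assumption, since the patch's map_push is not shown in full here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical host-side stand-in for struct cuda_map.  */
struct arg_buf
{
  void *d;              /* Argument storage (cuMemAlloc'd in the patch).  */
  size_t size;          /* Capacity of D in bytes.  */
  bool active;          /* True while a pending launch owns this buffer.  */
  struct arg_buf *next; /* Next pending launch, in FIFO order.  */
};

/* One queue per stream.  HEAD is the oldest pending launch; if HEAD is
   idle, it is the only buffer and is kept around for reuse.  */
struct arg_queue
{
  struct arg_buf *head;
};

static struct arg_buf *
arg_buf_create (size_t size)
{
  struct arg_buf *b = malloc (sizeof *b);
  assert (b);
  b->d = malloc (size);
  assert (b->d);
  b->size = size;
  b->active = false;
  b->next = NULL;
  return b;
}

static void
arg_buf_destroy (struct arg_buf *b)
{
  free (b->d);
  free (b);
}

static void
arg_queue_init (struct arg_queue *q, size_t size)
{
  q->head = arg_buf_create (size);
}

/* Reserve SIZE bytes of argument storage for a new launch.  */
static void *
arg_queue_push (struct arg_queue *q, size_t size)
{
  struct arg_buf *b = q->head;

  if (!b->active)
    {
      /* Queue is empty: reuse the idle head, growing it if needed.  */
      if (b->size < size)
        {
          arg_buf_destroy (b);
          b = q->head = arg_buf_create (size);
        }
      b->active = true;
      return b->d;
    }

  /* Launches are pending: append a fresh buffer at the tail.  */
  while (b->next)
    b = b->next;
  b->next = arg_buf_create (size);
  b->next->active = true;
  return b->next->d;
}

/* Release the buffer of the oldest pending launch.  */
static void
arg_queue_pop (struct arg_queue *q)
{
  struct arg_buf *next = q->head->next;

  assert (q->head->active);
  if (next == NULL)
    {
      q->head->active = false;  /* Keep the last buffer for reuse.  */
      return;
    }
  arg_buf_destroy (q->head);
  q->head = next;
}

static void
arg_queue_fini (struct arg_queue *q)
{
  assert (q->head->next == NULL && !q->head->active);
  arg_buf_destroy (q->head);
  q->head = NULL;
}
```

Under these assumptions the queue grows with the number of overlapping launches instead of wrapping around a single fixed-size page, at the cost of one allocation per launch that overlaps a still-pending one.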

Is this patch OK for trunk? I bootstrapped and regression tested it for
x86_64 with nvptx offloading.

Thanks,
Cesar
[PATCH] [nvptx] Remove use of CUDA unified memory in libgomp

2018-XX-YY  Cesar Philippidis  

	libgomp/
	* plugin/plugin-nvptx.c (struct cuda_map): New.
	(struct ptx_stream): Replace d, h, h_begin, h_end, h_next, h_prev,
	h_tail with (cuda_map *) map.
	(cuda_map_create): New function.
	(cuda_map_destroy): New function.
	(map_init): Update to use a linked list of cuda_map objects.
	(map_fini): Likewise.
	(map_pop): Likewise.
	(map_push): Likewise.  Return CUdeviceptr instead of void.
	(init_streams_for_device): Remove stale references to ptx_stream
	members.
	(select_stream_for_async): Likewise.
	(nvptx_exec): Update call to map_init.

(cherry picked from gomp-4_0-branch r242614)
---
 libgomp/plugin/plugin-nvptx.c | 167 +++---
 1 file changed, 90 insertions(+), 77 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 1237ea10..d79ddf1 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -200,20 +200,20 @@ cuda_error (CUresult r)
 static unsigned int instantiated_devices = 0;
 static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER;
 
+struct cuda_map
+{
+  CUdeviceptr d;
+  size_t size;
+  bool active;
+  struct cuda_map *next;
+};
+
 struct ptx_stream
 {
   CUstream stream;
   pthread_t host_thread;
   bool multithreaded;
-
-  CUdeviceptr d;
-  void *h;
-  void *h_begin;
-  void *h_end;
-  void *h_next;
-  void *h_prev;
-  void *h_tail;
-
+  struct cuda_map *map;
   struct ptx_stream *next;
 };
 
@@ -225,101 +225,114 @@ struct nvptx_thread
   struct ptx_device *ptx_dev;
 };
 
+static struct cuda_map *
+cuda_map_create (size_t size)
+{
+  struct cuda_map *map = GOMP_PLUGIN_malloc (sizeof (struct cuda_map));
+
+  assert (map);
+
+  map->next = NULL;
+  map->size = size;
+  map->active = false;
+
+  CUDA_CALL_ERET (NULL, cuMemAlloc, &map->d, size);
+  assert (map->d);
+
+  return map;
+}
+
+static void
+cuda_map_destroy (struct cuda_map *map)
+{
+  CUDA_CALL_ASSERT (cuMemFree, map->d);
+  free (map);
+}
+
+/* The following map_* routines manage the CUDA device memory that
+   contains the data mapping arguments for cuLaunchKernel.  Each
+   asynchronous PTX stream may have multiple pending kernel
+   invocations, which are launched in a FIFO order.  As such, the map
+   routines maintain a queue of cuLaunchKernel arguments.
+
+   Calls to map_push and map_pop must be guarded by ptx_event_lock.
+   Likewise, calls to map_init and map_fini are guarded by
+   ptx_dev_lock inside GOMP_OFFLOAD_init_device and
+   GOMP_OFFLOAD_fini_device, respectively.  */
+
 static bool
 map_init (struct ptx_stream *s)
 {
   int size = getpagesize ();
 
   assert (s);
-  assert (!s->d);
-  assert (!s->h);
-
-  CUDA_CALL (cuMemAllocHost, &s->h, size);
-  CUDA_CALL (cuMemHostGetDevicePointer, &s->d, s->h, 0);
 
-  assert (s->h);
+  s->map = cuda_map_create (size);
 
-  s->h_begin = s->h;
-  s->h_end = s->h_begin + size;
-  s->h_next = s->h_prev = s->h_tail = s->h_begin;
-
-  assert (s->h_next);
-  assert (s->h_end);
   return true;
 }
 
 static bool
 map_fini (struct ptx_stream *s)
 {
-  CUDA_CALL (cuMemFreeHost, s->h);
+  assert (s->map->next == NULL);
+  assert (!s->map->active);
+
+  cuda_map_destroy (s->map);
+
   return true;
 }
 
 static void
 map_pop (struct ptx_stream *s)
 {
-  assert (s != NULL);
-  assert (s->h_next);
-  assert (s->h_prev);
-  assert (s->h_tail);
-
-  s->h_tail = s->h_next;
-
-  if (s->h_tail >= s->h_end)
-s->h_tail = s->h_begin + (int) (s->h_tail - s->h_end);
+  struct cuda_map *next;
 
-  if (s->h_next == s->h_tail)
-s->h_prev = s->h_next;
+  assert (s != NULL);
 
-  assert (s->h_next >= s->h_begin);
-  assert (s->h_tail >= s->h_begin);
-  assert (s->h_prev >= s->h_begin);
+  if (s->map->next == NULL)
+{
+  s->map->active = false;
+  return;
+}
 
-  assert (s->h_next <= s->h_end);
-  assert (s->h_tail <= s->h_end);
-  assert (s->h_prev <= s->h_end);
+  next = s->map->next;
+  cuda_map_destroy (s->map);
+  s->map = next;
 }
 
-static void
-map_push (struct ptx_stream *s, size_t size, void **h, void **d)
+static CUdeviceptr
+map_push (struct ptx_stream *s, size_t size)
 {
-  int left;
-  int offset;
+  struct cuda_map *map = NULL, *t = NULL;
 
-  assert (s != NULL);
+  assert (s);
+  

Re: [PATCH] Make strlen range computations more conservative

2018-07-31 Thread Martin Sebor

On 07/31/2018 12:38 AM, Jakub Jelinek wrote:
> On Mon, Jul 30, 2018 at 09:45:49PM -0600, Martin Sebor wrote:
>> Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past
>> the end of subobjects by string functions.  With _FORTIFY_SOURCE=2
>> it calls abort.  This is the default on popular distributions,
>
> Note that _FORTIFY_SOURCE=2 is the mode that goes beyond what the standard
> requires, imposes extra requirements.  So from what this mode accepts or
> rejects we shouldn't determine what is or isn't considered valid.


I'm not sure what the additional requirements are, but the ones
I am referring to are the enforcement of struct member boundaries.
This is in line with the standard requirements of not accessing
[sub]objects via pointers derived from other [sub]objects.

The one area where Builtin Object Size doesn't faithfully reflect
subobject boundaries is arrays of arrays.  This was a serious
concern for the security group at my last company (see bug 44384).
We developed (proprietary) patches to mitigate the shortcoming.
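To make the member-versus-whole-object distinction concrete, here is a small illustration (not from the thread) using GCC's __builtin_object_size. Mode 0 measures to the end of the enclosing object; mode 1 measures to the end of the enclosing subobject, which is the bound glibc's _FORTIFY_SOURCE=2 passes to the checking string functions. The struct pair type and the checked_copy_into_a helper are hypothetical, and the helper asserts where the real checking functions would call __chk_fail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two adjacent member arrays: an overflow of obj.a lands in obj.b.  */
struct pair
{
  char a[4];
  char b[4];
};

static struct pair obj;

/* Mode 0: bytes from obj.a to the end of the whole object obj.  */
size_t
whole_object_bytes (void)
{
  return __builtin_object_size (obj.a, 0);
}

/* Mode 1: bytes from obj.a to the end of the member obj.a itself.  */
size_t
member_bytes (void)
{
  return __builtin_object_size (obj.a, 1);
}

/* Fortify-style copy into obj.a, bounded by the member (mode 1) rather
   than by the whole object (mode 0), so it can never spill into obj.b.  */
void
checked_copy_into_a (const char *src, size_t n)
{
  assert (n <= member_bytes ());
  memcpy (obj.a, src, n);
}
```

With mode 0 a 6-byte copy into obj.a would pass the check and silently overwrite obj.b; with mode 1 it is rejected.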

Martin



Re: [Patch, fortran] A first small step towards CFI descriptor implementation

2018-07-31 Thread Paul Richard Thomas
Hi Richard,

Ah yes, you are absolutely right. I will sit on it for a bit and do
the interchange at the descriptor conversion stage for now.

Thanks

Paul
On Tue, 31 Jul 2018 at 15:57, Richard Biener  wrote:
>
> On Tue, Jul 31, 2018 at 2:07 PM Paul Richard Thomas
>  wrote:
> >
> > Daniel Celis Garza and Damian Rouson have developed a runtime library
> > and include file for the TS 29113 and F2018 C descriptors.
> > https://github.com/sourceryinstitute/ISO_Fortran_binding
> >
> > The ordering of types is different to the current 'bt' enum in
> > libgfortran.h. This patch interchanges BT_DERIVED and BT_CHARACTER to
> > fix this.
> >
> > Regtests on FC28/x86_64. OK for trunk?
>
> That's an ABI change, correct?
>
> Richard.
>
> > Cheers
> >
> > Paul
> >
> > 2018-07-31  Paul Thomas  
> >
> > * gcc/fortran/libgfortran.h : In bt enum interchange BT_DERIVED
> > and BT_CHARACTER for CFI descriptor compatibility (TS 29113).



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


[PATCH,AIX] Optimize the time required for loading XCOFF data

2018-07-31 Thread REIX, Tony

Description:
 * This patch optimizes the time required for loading XCOFF data.

Tests:
 * AIX: Build: SUCCESS
   - build made by means of gmake on AIX.

ChangeLog:
  * xcoff.c: Optimize loading of XCOFF data.
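The lookup the patch switches to (sort the function entries by start address with qsort, then find the entry covering a PC with bsearch) can be sketched on its own. The func_range type and the helper names below are hypothetical simplifications of struct xcoff_func; the search comparator follows the patch's xcoff_func_search rule that an entry with a known size covers [pc, pc + size), while an entry of size 0 only matches its exact start address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for struct xcoff_func.  */
struct func_range
{
  uintptr_t pc;      /* Start address of the function.  */
  size_t size;       /* Size in bytes, or 0 if unknown.  */
  const char *name;  /* Function name.  */
};

/* qsort comparator: order entries by start address.  */
static int
func_range_compare (const void *v1, const void *v2)
{
  const struct func_range *f1 = v1;
  const struct func_range *f2 = v2;

  if (f1->pc < f2->pc)
    return -1;
  else if (f1->pc > f2->pc)
    return 1;
  else
    return 0;
}

/* bsearch comparator: VKEY points to the PC being looked up.  */
static int
func_range_search (const void *vkey, const void *ventry)
{
  uintptr_t pc = *(const uintptr_t *) vkey;
  const struct func_range *entry = ventry;

  if (pc < entry->pc)
    return -1;
  else if ((entry->size == 0 && pc > entry->pc)
           || (entry->size > 0 && pc >= entry->pc + entry->size))
    return 1;
  else
    return 0;
}

/* Sort once, after all function symbols have been collected.  */
void
sort_functions (struct func_range *funcs, size_t n)
{
  qsort (funcs, n, sizeof *funcs, func_range_compare);
}

/* Return the name of the function covering PC, or NULL if none does.  */
const char *
lookup_function (const struct func_range *funcs, size_t n, uintptr_t pc)
{
  const struct func_range *f;

  assert (n == 0 || funcs != NULL);
  f = bsearch (&pc, funcs, n, sizeof *funcs, func_range_search);
  return f ? f->name : NULL;
}
```

bsearch is only valid here because the sorted ranges do not overlap, so the comparator partitions the array consistently for every key.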
 
 
Regards,
 
 Tony Reix
 
 tony.r...@atos.net
 
 ATOS / Bull SAS
 ATOS Expert
 IBM Coop Architect & Technical Leader
  
Office : +33 (0) 4 76 29 72 67 
1 rue de Provence - 38432 Échirolles - France 
www.atos.net

Index: libbacktrace/xcoff.c
===
--- ./libbacktrace/xcoff.c	(revision 262803)
+++ ./libbacktrace/xcoff.c	(working copy)
@@ -338,27 +338,32 @@ struct xcoff_incl_vector
   size_t count;
 };
 
-/* Map a single PC value to a file/function/line.  */
+/* Function information.  */
 
-struct xcoff_line
+struct xcoff_func
 {
   /* PC.  */
   uintptr_t pc;
-  /* File name.  Many entries in the array are expected to point to
- the same file name.  */
+  /* The size of the function.  */
+  size_t size;
+  /* Function name.  */
+  const char *name;
+  /* File name.  */
   const char *filename;
-  /* Function name.  */
-  const char *function;
-  /* Line number.  */
-  int lineno;
+  /* Pointer to first lnno entry.  */
+  uintptr_t lnnoptr;
+  /* Base address of containing section.  */
+  uintptr_t sect_base;
+  /* Starting source line number.  */
+  int lnno;
 };
 
-/* A growable vector of line number information.  This is used while
-   reading the line numbers.  */
+/* A growable vector of function information.  This is used while
+   reading the function symbols.  */
 
-struct xcoff_line_vector
+struct xcoff_func_vector
 {
-  /* Memory.  This is an array of struct xcoff_line.  */
+  /* Memory.  This is an array of struct xcoff_func.  */
   struct backtrace_vector vec;
   /* Number of valid mappings.  */
   size_t count;
@@ -370,8 +375,16 @@ struct xcoff_fileline_data
 {
   /* The data for the next file we know about.  */
   struct xcoff_fileline_data *next;
-  /* Line number information.  */
-  struct xcoff_line_vector vec;
+  /* Functions information.  */
+  struct xcoff_func_vector func_vec;
+  /* Include files information.  */
+  struct xcoff_incl_vector incl_vec;
+  /* Line numbers information.  */
+  const unsigned char *linenos;
+  size_t linenos_size;
+  uint64_t lnnoptr0;
+  /* Loader address.  */
+  uintptr_t base_address;
 };
 
 /* An index of DWARF sections we care about.  */
@@ -509,6 +522,7 @@ xcoff_syminfo (struct backtrace_state *state ATTRI
 {
   struct xcoff_syminfo_data *edata;
   struct xcoff_symbol *sym = NULL;
+  const char *name;
 
   if (!state->threaded)
 {
@@ -547,7 +561,13 @@ xcoff_syminfo (struct backtrace_state *state ATTRI
   if (sym == NULL)
 callback (data, addr, NULL, 0, 0);
   else
-callback (data, addr, sym->name, sym->address, sym->size);
+{
+  name = sym->name;
+  /* AIX prepends a '.' to function entry points, remove it.  */
+  if (name && *name == '.')
+	++name;
+  callback (data, addr, name, sym->address, sym->size);
+}
 }
 
 /* Return the name of an XCOFF symbol.  */
@@ -640,43 +660,76 @@ xcoff_initialize_syminfo (struct backtrace_state *
   return 1;
 }
 
-/* Compare struct xcoff_line for qsort.  */
+/* Compare struct xcoff_func for qsort.  */
 
 static int
-xcoff_line_compare (const void *v1, const void *v2)
+xcoff_func_compare (const void *v1, const void *v2)
 {
-  const struct xcoff_line *ln1 = (const struct xcoff_line *) v1;
-  const struct xcoff_line *ln2 = (const struct xcoff_line *) v2;
+  const struct xcoff_func *fn1 = (const struct xcoff_func *) v1;
+  const struct xcoff_func *fn2 = (const struct xcoff_func *) v2;
 
-  if (ln1->pc < ln2->pc)
+  if (fn1->pc < fn2->pc)
 return -1;
-  else if (ln1->pc > ln2->pc)
+  else if (fn1->pc > fn2->pc)
 return 1;
   else
 return 0;
 }
 
-/* Find a PC in a line vector.  We always allocate an extra entry at
-   the end of the lines vector, so that this routine can safely look
-   at the next entry.  */
+/* Compare a PC against an xcoff_func for bsearch.  */
 
 static int
-xcoff_line_search (const void *vkey, const void *ventry)
+xcoff_func_search (const void *vkey, const void *ventry)
 {
   const uintptr_t *key = (const uintptr_t *) vkey;
-  const struct xcoff_line *entry = (const struct xcoff_line *) ventry;
+  const struct xcoff_func *entry = (const struct xcoff_func *) ventry;
   uintptr_t pc;
 
   pc = *key;
   if (pc < entry->pc)
 return -1;
-  else if ((entry + 1)->pc == (uintptr_t) -1 || pc >= (entry + 1)->pc)
+  else if ((entry->size == 0 && pc > entry->pc)
+	   || (entry->size > 0 && pc >= entry->pc + entry->size))
 return 1;
   else
 return 0;
 }
 
-/* Look for a PC in the line vector for one module.  On success,
+/* Compare struct xcoff_incl for qsort.  */
+
+static int
+xcoff_incl_compare (const void *v1, const void *v2)
+{
+  const struct xcoff_incl *in1 = (const struct xcoff_incl *) v1;
+  const struct xcoff_incl *in2 = (const struct xcoff_inc

[PATCH,nvptx] Remove use of 'struct map' from plugin (nvptx)

2018-07-31 Thread Cesar Philippidis
This is an old patch which removes the struct map from the nvptx plugin.
I believe at one point this was supposed to be used to manage async data
mappings, but in practice that never worked out.

Is this OK for trunk? I bootstrapped and regtested on x86_64 with nvptx
offloading.

Thanks,
Cesar
[PATCH] Remove use of 'struct map' from plugin (nvptx)

2018-XX-YY  Cesar Philippidis  
	James Norris 	

	libgomp/
	* plugin/plugin-nvptx.c (struct map): Remove.
	(map_init, map_pop): Remove use of struct map.
	(map_push): Likewise and change argument list.
	* testsuite/libgomp.oacc-c-c++-common/mapping-1.c: New test.

(cherry picked from gomp-4_0-branch r231616)
---
 libgomp/plugin/plugin-nvptx.c  | 33 +++-
 .../libgomp.oacc-c-c++-common/mapping-1.c  | 63 ++
 2 files changed, 69 insertions(+), 27 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/mapping-1.c

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index a92f054..1237ea10 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -225,13 +225,6 @@ struct nvptx_thread
   struct ptx_device *ptx_dev;
 };
 
-struct map
-{
-  int async;
-  size_t  size;
-  char mappings[0];
-};
-
 static bool
 map_init (struct ptx_stream *s)
 {
@@ -265,16 +258,12 @@ map_fini (struct ptx_stream *s)
 static void
 map_pop (struct ptx_stream *s)
 {
-  struct map *m;
-
   assert (s != NULL);
   assert (s->h_next);
   assert (s->h_prev);
   assert (s->h_tail);
 
-  m = s->h_tail;
-
-  s->h_tail += m->size;
+  s->h_tail = s->h_next;
 
   if (s->h_tail >= s->h_end)
 s->h_tail = s->h_begin + (int) (s->h_tail - s->h_end);
@@ -292,37 +281,27 @@ map_pop (struct ptx_stream *s)
 }
 
 static void
-map_push (struct ptx_stream *s, int async, size_t size, void **h, void **d)
+map_push (struct ptx_stream *s, size_t size, void **h, void **d)
 {
   int left;
   int offset;
-  struct map *m;
 
   assert (s != NULL);
 
   left = s->h_end - s->h_next;
-  size += sizeof (struct map);
 
   assert (s->h_prev);
   assert (s->h_next);
 
   if (size >= left)
 {
-  m = s->h_prev;
-  m->size += left;
-  s->h_next = s->h_begin;
-
-  if (s->h_next + size > s->h_end)
-	GOMP_PLUGIN_fatal ("unable to push map");
+  assert (s->h_next == s->h_prev);
+  s->h_next = s->h_prev = s->h_tail = s->h_begin;
 }
 
   assert (s->h_next);
 
-  m = s->h_next;
-  m->async = async;
-  m->size = size;
-
-  offset = (void *)&m->mappings[0] - s->h;
+  offset = s->h_next - s->h;
 
   *d = (void *)(s->d + offset);
   *h = (void *)(s->h + offset);
@@ -1291,7 +1270,7 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
   /* This reserves a chunk of a pre-allocated page of memory mapped on both
  the host and the device. HP is a host pointer to the new chunk, and DP is
  the corresponding device pointer.  */
-  map_push (dev_str, async, mapnum * sizeof (void *), &hp, &dp);
+  map_push (dev_str, mapnum * sizeof (void *), &hp, &dp);
 
   GOMP_PLUGIN_debug (0, "  %s: prepare mappings\n", __FUNCTION__);
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/mapping-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/mapping-1.c
new file mode 100644
index 000..593e7d4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/mapping-1.c
@@ -0,0 +1,63 @@
+/* { dg-do run } */
+
+#include 
+#include 
+#include 
+
+/* Exercise the kernel launch argument mapping.  */
+
+int
+main (int argc, char **argv)
+{
+  int a[256], b[256], c[256], d[256], e[256], f[256];
+  int i;
+  int n;
+
+  /* 48 is the size of the mappings for the first parallel construct.  */
+  n = sysconf (_SC_PAGESIZE) / 48 - 1;
+
+  i = 0;
+
+  for (i = 0; i < n; i++)
+{
+  #pragma acc parallel copy (a, b, c, d)
+	{
+	  int j;
+
+	  for (j = 0; j < 256; j++)
+	{
+	  a[j] = j;
+	  b[j] = j;
+	  c[j] = j;
+	  d[j] = j;
+	}
+	}
+}
+
+#pragma acc parallel copy (a, b, c, d, e, f)
+  {
+int j;
+
+for (j = 0; j < 256; j++)
+  {
+	a[j] = j;
+	b[j] = j;
+	c[j] = j;
+	d[j] = j;
+	e[j] = j;
+	f[j] = j;
+  }
+  }
+
+  for (i = 0; i < 256; i++)
+   {
+ if (a[i] != i) abort();
+ if (b[i] != i) abort();
+ if (c[i] != i) abort();
+ if (d[i] != i) abort();
+ if (e[i] != i) abort();
+ if (f[i] != i) abort();
+   }
+
+  exit (0);
+}
-- 
2.7.4



Re: [PATCH] libbacktrace: Move define of HAVE_ZLIB into check for -lz

2018-07-31 Thread Iain Buclaw
On 31 July 2018 at 16:33, Ian Lance Taylor  wrote:
> On Sun, Jul 29, 2018 at 7:50 AM, Iain Buclaw  wrote:
>>
>> This is really to suppress the default action-if-found for
>> AC_CHECK_LIB.  Zlib is not a dependency of libbacktrace, and so it
>> shouldn't be added to LIBS.  When looking at the check, I saw that I
>> could also remove the test for ac_cv_lib_z_compress.
>
> Thanks, but this doesn't seem like quite the right approach, as seen
> by the fact that HAVE_ZLIB_H was dropped from config.h.in.  I think
> you need to keep the AC_DEFINE out of the AC_CHECK_LIB.  I would guess
> that it would work to just change the default case of AC_CHECK_LIB to
> [;] or something similarly innocuous.
>
> Ian

May I ask you to look at the patch again?  There are two similarly named
variables here, HAVE_LIBZ and HAVE_ZLIB.

Only the unused HAVE_LIBZ has been dropped from config.h.in.  The one
that matters has been left alone, or at least I'm pretty sure it has.

Iain.


[PATCH] Improve libstdc++ docs w.r.t newer C++ standards

2018-07-31 Thread Jonathan Wakely

Instead of repeating all the old headers for every new standard I've
changed the docs to only list the new headers for each standard.

* doc/xml/manual/test.xml: Improve documentation on writing tests for
newer standards.
* doc/xml/manual/using.xml: Document all headers for C++11 and later.
* doc/html/*: Regenerate.

Committed to trunk.

commit b034d9c7df59641273bef998c6f9d4b7b02c4d83
Author: Jonathan Wakely 
Date:   Tue Jul 31 15:54:19 2018 +0100

Improve libstdc++ docs w.r.t newer C++ standards

Instead of repeating all the old headers for every new standard I've
changed the docs to only list the new headers for each standard.

* doc/xml/manual/test.xml: Improve documentation on writing tests for
newer standards.
* doc/xml/manual/using.xml: Document all headers for C++11 and later.
* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/test.xml b/libstdc++-v3/doc/xml/manual/test.xml
index c8c47d1bbdb..1725cdb69f3 100644
--- a/libstdc++-v3/doc/xml/manual/test.xml
+++ b/libstdc++-v3/doc/xml/manual/test.xml
@@ -763,12 +763,15 @@ cat 27_io/objects/char/3_xin.in | a.out
 
   
 Similarly, tests which depend on a newer standard than the default
-should use dg-options instead of an effective target,
-so that they are not skipped by default.
+must use dg-options instead of (or in addition to)
+an effective target, so that they are not skipped by default.
 For example, tests for C++17 features should use
 // { dg-options "-std=gnu++17" }
-and not
-// { dg-do run "c++1z" }
+before any dg-do such as:
+// { dg-do run "c++17" }
+The dg-options directive must come first, so that
+the -std flag has already been added to the options
+before checking the c++17 target.
   
 
 Examples of Test Directives
diff --git a/libstdc++-v3/doc/xml/manual/using.xml b/libstdc++-v3/doc/xml/manual/using.xml
index 67f9cf5216b..a5f2a2d074d 100644
--- a/libstdc++-v3/doc/xml/manual/using.xml
+++ b/libstdc++-v3/doc/xml/manual/using.xml
@@ -128,7 +128,7 @@
  must be available to all hosted implementations.  Actually, the
  word "files" is a misnomer, since the contents of the
  headers don't necessarily have to be in any kind of external
- file.  The only rule is that when one #include's a
+ file.  The only rule is that when one #includes a
  header, the contents of that header become available, no matter
  how.

@@ -140,16 +140,24 @@

  There are two main types of include files: header files related
  to a specific version of the ISO C++ standard (called Standard
- Headers), and all others (TR1, C++ ABI, and Extensions).
+ Headers), and all others (TS, TR1, C++ ABI, and Extensions).

 

- Two dialects of standard headers are supported, corresponding to
- the 1998 standard as updated for 2003, and the current 2011 standard.
+ Multiple dialects of standard headers are supported, corresponding to
+ the 1998 standard as updated for 2003, the 2011 standard, the 2014
+ standard, and so on.

 

- C++98/03 include files. These are available in the default compilation mode, i.e. -std=c++98 or -std=gnu++98.
+  and
+  and
+ 
+ show the C++98/03 include files.
+ These are available in the C++98 compilation mode,
+ i.e. -std=c++98 or -std=gnu++98.
+ Unless specified otherwise below, they are also available in later modes
+ (C++11, C++14 etc).

 
 
@@ -207,6 +215,7 @@
 
 valarray
 vector
+
 
 
 
@@ -248,14 +257,38 @@
 ctime
 cwchar
 cwctype
+
 
 
 
 
 
 
-C++11 include files. These are only available in C++11 compilation
+  The following header is deprecated
+  and might be removed from a future C++ standard.
+
+
+
+C++ 1998 Deprecated Library Header
+
+
+
+
+
+strstream
+
+
+
+
+
+
+ and
+ show the C++11 include files.
+These are available in C++11 compilation
 mode, i.e. -std=c++11 or -std=gnu++11.
+Including these headers in C++98/03 mode may result in compilation errors.
+Unless specified otherwise below, they are also available in later modes
+(C++14 etc).
 
 
 
@@ -271,73 +304,33 @@ mode, i.e. -std=c++11 or -std=gnu++11.
 
 
 
-algorithm
 array
-bitset
+atomic
 chrono
-complex
-
-
+codecvt
 condition_variable
-deque
-exception
-forward_list
-fstream
 
 
-functional
+forward_list
 future
 initalizer_list
-iomanip
-ios
-
-
-iosfwd
-iostream
-istream
-iterator
-limits
-
-
-list
-locale
-map
-memory
 mutex
-
-
-new
-numeric
-ostream
-queue
 random
 
 
 ratio
 regex
-set
-sstream
-stack
-
-
-stdexcept
-streambuf
-string
+scoped_allocator
 system_error
 thread
 
 
 tuple
+typeindex
 type_traits
-typeinfo
 unordered_map
 unordered_set
 
-
-utility
-valarray
-vector
-
 
 
 
@@ -356,39 +349,231 @@ mode, i.e. -std=c++11 or -std=gnu++11.
 
 
 
-cassert
 ccomplex
-cctype
-cerrno
 cfenv
-
-
-cfloat
 cinttypes
-ciso646
-climits
-clocale
-
-
-cmath
-csetjmp
-csigna

Re: [18/46] Make SLP_TREE_SCALAR_STMTS a vec

2018-07-31 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Jul 24, 2018 at 12:01 PM Richard Sandiford
>  wrote:
>>
>> This patch changes SLP_TREE_SCALAR_STMTS from a vec<gimple *> to
>> a vec<stmt_vec_info>.  It's longer than the previous conversions
>> but mostly mechanical.
>
> OK.  I don't remember exactly but vect_external_def SLP nodes have
> empty stmts vector then?  I realize we only have those for defs that
> are in the vectorized region.

Yeah, for this the thing we care about is that it's part of the
vectorisable region.  I'm not sure how much stuff we hang off
a vect_external_def SLP stmt_vec_info, but we do need at least
STMT_VINFO_DEF_TYPE as well as STMT_VINFO_STMT itself.

Thanks,
Richard

>
>>
>> 2018-07-24  Richard Sandiford  
>>
>> gcc/
>> * tree-vectorizer.h (_slp_tree::stmts): Change from a vec<gimple *>
>> to a vec<stmt_vec_info>.
>> * tree-vect-slp.c (vect_free_slp_tree): Update accordingly.
>> (vect_create_new_slp_node): Take a vec<stmt_vec_info> instead of a
>> vec<gimple *>.
>> (_slp_oprnd_info::def_stmts): Change from a vec<gimple *>
>> to a vec<stmt_vec_info>.
>> (bst_traits::value_type, bst_traits::value_type): Likewise.
>> (bst_traits::hash): Update accordingly.
>> (vect_get_and_check_slp_defs): Change the stmts parameter from
>> a vec<gimple *> to a vec<stmt_vec_info>.
>> (vect_two_operations_perm_ok_p, vect_build_slp_tree_1): Likewise.
>> (vect_build_slp_tree): Likewise.
>> (vect_build_slp_tree_2): Likewise.  Update uses of
>> SLP_TREE_SCALAR_STMTS.
>> (vect_print_slp_tree): Update uses of SLP_TREE_SCALAR_STMTS.
>> (vect_mark_slp_stmts, vect_mark_slp_stmts_relevant)
>> (vect_slp_rearrange_stmts, vect_attempt_slp_rearrange_stmts)
>> (vect_supported_load_permutation_p, vect_find_last_scalar_stmt_in_slp)
>> (vect_detect_hybrid_slp_stmts, vect_slp_analyze_node_operations_1)
>> (vect_slp_analyze_node_operations, vect_slp_analyze_operations)
>> (vect_bb_slp_scalar_cost, vect_slp_analyze_bb_1)
>> (vect_get_constant_vectors, vect_get_slp_defs)
>> (vect_transform_slp_perm_load, vect_schedule_slp_instance)
>> (vect_remove_slp_scalar_calls, vect_schedule_slp): Likewise.
>> (vect_analyze_slp_instance): Build up a vec of stmt_vec_infos
>> instead of gimple stmts.
>> * tree-vect-data-refs.c (vect_slp_analyze_node_dependences): Change
>> the stores parameter from a vec<gimple *> to a vec<stmt_vec_info>.
>> (vect_slp_analyze_instance_dependence): Update uses of
>> SLP_TREE_SCALAR_STMTS.
>> (vect_slp_analyze_and_verify_node_alignment): Likewise.
>> (vect_slp_analyze_and_verify_instance_alignment): Likewise.
>> * tree-vect-loop.c (neutral_op_for_slp_reduction): Likewise.
>> (get_initial_defs_for_reduction): Likewise.
>> (vect_create_epilog_for_reduction): Likewise.
>> (vectorize_fold_left_reduction): Likewise.
>> * tree-vect-stmts.c (vect_prologue_cost_for_slp_op): Likewise.
>> (vect_model_simple_cost, vectorizable_shift, vectorizable_load)
>> (can_vectorize_live_stmts): Likewise.
>>
>> Index: gcc/tree-vectorizer.h
>> ===
>> --- gcc/tree-vectorizer.h   2018-07-24 10:22:57.277070390 +0100
>> +++ gcc/tree-vectorizer.h   2018-07-24 10:23:00.401042649 +0100
>> @@ -138,7 +138,7 @@ struct _slp_tree {
>>/* Nodes that contain def-stmts of this node statements operands.  */
>>vec<slp_tree> children;
>>/* A group of scalar stmts to be vectorized together.  */
>> -  vec<gimple *> stmts;
>> +  vec<stmt_vec_info> stmts;
>>/* Load permutation relative to the stores, NULL if there is no
>>   permutation.  */
>>vec<unsigned> load_permutation;
>> Index: gcc/tree-vect-slp.c
>> ===
>> --- gcc/tree-vect-slp.c 2018-07-24 10:22:57.277070390 +0100
>> +++ gcc/tree-vect-slp.c 2018-07-24 10:23:00.401042649 +0100
>> @@ -66,11 +66,11 @@ vect_free_slp_tree (slp_tree node, bool
>>   statements would be redundant.  */
>>if (!final_p)
>>  {
>> -  gimple *stmt;
>> -  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt)
>> +  stmt_vec_info stmt_info;
>> +  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
>> {
>> - gcc_assert (STMT_VINFO_NUM_SLP_USES (vinfo_for_stmt (stmt)) > 0);
>> - STMT_VINFO_NUM_SLP_USES (vinfo_for_stmt (stmt))--;
>> + gcc_assert (STMT_VINFO_NUM_SLP_USES (stmt_info) > 0);
>> + STMT_VINFO_NUM_SLP_USES (stmt_info)--;
>> }
>>  }
>>
>> @@ -99,21 +99,21 @@ vect_free_slp_instance (slp_instance ins
>>  /* Create an SLP node for SCALAR_STMTS.  */
>>
>>  static slp_tree
>> -vect_create_new_slp_node (vec<gimple *> scalar_stmts)
>> +vect_create_new_slp_node (vec<stmt_vec_info> scalar_stmts)
>>  {
>>slp_tree node;
>> -  gimple *stmt = scalar_stmts[0];
>> +  stmt_vec_info stmt_info = scalar_stmts[0];
>>unsigned int nops;
>>
>> -  if (is_gimple_call (stmt))
>> +  if (gcall *stmt = dyn_ca

[PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-07-31 Thread Cesar Philippidis
The attached patch teaches libgomp how to use the CUDA thread occupancy
calculator built into the CUDA driver. Although both are based on the
CUDA thread occupancy spreadsheet distributed with CUDA, the built-in
occupancy calculator differs from the occupancy calculator in og8 in two
key ways. First, og8 launches twice as many gangs as the driver's
thread occupancy calculator. This was my attempt at preventing threads
from idling, and it operates on a principle similar to running
'make -jN', where N is twice the number of CPU threads. Second, whereas
og8 always attempts to maximize the CUDA block size, the driver may
select a smaller block, which effectively decreases num_workers.

In terms of performance, there isn't much difference between the CUDA
driver's occupancy calculator and og8's. On the tests that are affected,
the two are generally within a factor of two of one another, with some
tests running faster with the driver occupancy calculator and others
with og8's.

Unfortunately, support for the driver's occupancy calculator isn't
universal; it is only available in CUDA 6.5 (driver version 6050) and
newer. In this patch, I'm exploiting the fact that init_cuda_lib only
checks for errors on the last library function initialized. Therefore
the patch guards the use of

  cuOccupancyMaxPotentialBlockSizeWithFlags

by checking driver_version. If the driver occupancy calculator isn't
available, it falls back to the existing defaults. Maybe og8's thread
occupancy calculator would make a better default for older versions of
CUDA, but that's a patch for another day.
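The version guard described above can be sketched without a GPU by passing the occupancy query in as a function pointer. Everything named here is hypothetical: occupancy_fn only mirrors the two out-parameters of cuOccupancyMaxPotentialBlockSize (minimum grid size and block size), the warp size of 32 and the "one gang per suggested block, one worker per warp" mapping are assumptions rather than what the patch computes, and fake_occupancy is a test stub.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* CUDA 6.5, as reported by cuDriverGetVersion.  */
#define OCCUPANCY_MIN_DRIVER_VERSION 6050
/* Assumed warp size: one OpenACC worker per warp.  */
#define WARP_SIZE 32

/* Hypothetical query mirroring the two out-parameters of
   cuOccupancyMaxPotentialBlockSize; returns true on success.  */
typedef bool (*occupancy_fn) (int *min_grid_size, int *block_size);

struct launch_dims
{
  int num_gangs;
  int num_workers;
};

/* Pick default launch dimensions.  Use the driver's suggestion only if
   the driver is new enough and the query succeeds; otherwise keep the
   existing defaults in FALLBACK.  */
struct launch_dims
default_launch_dims (int driver_version, occupancy_fn query,
                     struct launch_dims fallback)
{
  int min_grid_size, block_size;
  struct launch_dims dims;

  if (driver_version < OCCUPANCY_MIN_DRIVER_VERSION || query == NULL
      || !query (&min_grid_size, &block_size))
    return fallback;

  assert (block_size >= WARP_SIZE);
  dims.num_gangs = min_grid_size;             /* One gang per block.  */
  dims.num_workers = block_size / WARP_SIZE;  /* One worker per warp.  */
  return dims;
}

/* Test stub standing in for the driver: suggests 80 blocks of 256
   threads.  */
bool
fake_occupancy (int *min_grid_size, int *block_size)
{
  *min_grid_size = 80;
  *block_size = 256;
  return true;
}
```

Because the version check is evaluated first, the query is never called on older drivers, which matters when, as described above, a missing driver entry point would not be caught at library initialization.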

Is this patch OK for trunk? I bootstrapped and regression tested it
using x86_64 with nvptx offloading.

Thanks,
Cesar
[nvptx] Use CUDA driver API to select default runtime launch geometry

2018-XX-YY  Cesar Philippidis  
	libgomp/
	* plugin/cuda/cuda.h (CUoccupancyB2DSize): New typedef.
	(cuDriverGetVersion): Declare.
	(cuOccupancyMaxPotentialBlockSizeWithFlags): Declare.
	* plugin/plugin-nvptx.c (CUDA_ONE_CALL): Add entries for
	cuDriverGetVersion and cuOccupancyMaxPotentialBlockSize.
	(ptx_device): Add driver_version member.
	(nvptx_open_device): Initialize it.
	(nvptx_exec): Use cuOccupancyMaxPotentialBlockSize to set the
	default num_gangs and num_workers when the driver supports it.
---
 libgomp/plugin/cuda/cuda.h|  5 +
 libgomp/plugin/plugin-nvptx.c | 37 -
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/libgomp/plugin/cuda/cuda.h b/libgomp/plugin/cuda/cuda.h
index 4799825..1fc694d 100644
--- a/libgomp/plugin/cuda/cuda.h
+++ b/libgomp/plugin/cuda/cuda.h
@@ -44,6 +44,7 @@ typedef void *CUevent;
 typedef void *CUfunction;
 typedef void *CUlinkState;
 typedef void *CUmodule;
+typedef size_t (*CUoccupancyB2DSize)(int);
 typedef void *CUstream;
 
 typedef enum {
@@ -123,6 +124,7 @@ CUresult cuCtxSynchronize (void);
 CUresult cuDeviceGet (CUdevice *, int);
 CUresult cuDeviceGetAttribute (int *, CUdevice_attribute, CUdevice);
 CUresult cuDeviceGetCount (int *);
+CUresult cuDriverGetVersion (int *);
 CUresult cuEventCreate (CUevent *, unsigned);
 #define cuEventDestroy cuEventDestroy_v2
 CUresult cuEventDestroy (CUevent);
@@ -170,6 +172,9 @@ CUresult cuModuleGetGlobal (CUdeviceptr *, size_t *, CUmodule, const char *);
 CUresult cuModuleLoad (CUmodule *, const char *);
 CUresult cuModuleLoadData (CUmodule *, const void *);
 CUresult cuModuleUnload (CUmodule);
+CUresult cuOccupancyMaxPotentialBlockSizeWithFlags (int *, int *, CUfunction,
+		CUoccupancyB2DSize, size_t,
+		int, unsigned int);
 CUresult cuStreamCreate (CUstream *, unsigned);
 #define cuStreamDestroy cuStreamDestroy_v2
 CUresult cuStreamDestroy (CUstream);
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index b6ec5f8..2647af6 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -63,6 +63,7 @@ CUDA_ONE_CALL (cuCtxSynchronize)	\
 CUDA_ONE_CALL (cuDeviceGet)		\
 CUDA_ONE_CALL (cuDeviceGetAttribute)	\
 CUDA_ONE_CALL (cuDeviceGetCount)	\
+CUDA_ONE_CALL (cuDriverGetVersion)	\
 CUDA_ONE_CALL (cuEventCreate)		\
 CUDA_ONE_CALL (cuEventDestroy)		\
 CUDA_ONE_CALL (cuEventElapsedTime)	\
@@ -94,6 +95,7 @@ CUDA_ONE_CALL (cuModuleGetGlobal)	\
 CUDA_ONE_CALL (cuModuleLoad)		\
 CUDA_ONE_CALL (cuModuleLoadData)	\
 CUDA_ONE_CALL (cuModuleUnload)		\
+CUDA_ONE_CALL (cuOccupancyMaxPotentialBlockSize) \
 CUDA_ONE_CALL (cuStreamCreate)		\
 CUDA_ONE_CALL (cuStreamDestroy)		\
 CUDA_ONE_CALL (cuStreamQuery)		\
@@ -423,6 +425,7 @@ struct ptx_device
   int max_threads_per_block;
   int max_threads_per_multiprocessor;
   int default_dims[GOMP_DIM_MAX];
+  int driver_version;
 
   struct ptx_image_data *images;  /* Images loaded on device.  */
   pthread_mutex_t image_lock; /* Lock for above list.  */
@@ -734,6 +737,7 @@ nvptx_open_device (int n)
   ptx_dev->ord = n;
   ptx_dev->dev = dev;
   ptx_dev->ctx_shared = false;
+  ptx_dev->driver_version = 0;
 
   r = CUDA_CALL_NOCHECK (cuCtxGetD

Re: [Patch, fortran] A first small step towards CFI descriptor implementation

2018-07-31 Thread Richard Biener
On Tue, Jul 31, 2018 at 2:07 PM Paul Richard Thomas
 wrote:
>
> Daniel Celis Garza and Damian Rouson have developed a runtime library
> and include file for the TS 29113 and F2018 C descriptors.
> https://github.com/sourceryinstitute/ISO_Fortran_binding
>
> The ordering of types is different to the current 'bt' enum in
> libgfortran.h. This patch interchanges BT_DERIVED and BT_CHARACTER to
> fix this.
>
> Regtests on FC28/x86_64. OK for trunk?

That's an ABI change, correct?

Richard.

> Cheers
>
> Paul
>
> 2018-07-31  Paul Thomas  
>
> * gcc/fortran/libgfortran.h : In bt enum interchange BT_DERIVED
> and BT_CHARACTER for CFI descriptor compatibility (TS 29113).


[PATCH] Replace safe bool idiom with explicit operator bool

2018-07-31 Thread Jonathan Wakely

* include/ext/pointer.h [__cplusplus >= 201103L]
(_Pointer_adapter::operator bool): Add explicit conversion operator
to replace safe bool idiom.

Tested powerpc64le-linux, committed to trunk.


commit 791062941cf5c4b93153bfd10d5bb9b0ac78d301
Author: Jonathan Wakely 
Date:   Tue Jul 31 11:33:03 2018 +0100

Replace safe bool idiom with explicit operator bool

* include/ext/pointer.h [__cplusplus >= 201103L]
(_Pointer_adapter::operator bool): Add explicit conversion operator
to replace safe bool idiom.

diff --git a/libstdc++-v3/include/ext/pointer.h b/libstdc++-v3/include/ext/pointer.h
index 318fbb11b08..ee5c30dfa64 100644
--- a/libstdc++-v3/include/ext/pointer.h
+++ b/libstdc++-v3/include/ext/pointer.h
@@ -356,6 +356,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return _Storage_policy::get()[__index]; }
 
   // To allow implicit conversion to "bool", for "if (ptr)..."
+#if __cplusplus >= 201103L
+  explicit operator bool() const { return _Storage_policy::get() != 0; }
+#else
 private:
   typedef element_type*(_Pointer_adapter::*__unspecified_bool_type)() const;
 
@@ -370,6 +373,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   inline bool
   operator!() const 
   { return (_Storage_policy::get() == 0); }
+#endif
   
   // Pointer differences
   inline friend std::ptrdiff_t 


Re: [PATCH] convert braced initializers to strings (PR 71625)

2018-07-31 Thread Martin Sebor

On 07/31/2018 07:38 AM, Jason Merrill wrote:
> On Tue, Jul 31, 2018 at 9:51 AM, Martin Sebor  wrote:
>> The middle-end contains code to determine the lengths of constant
>> character arrays initialized by string literals.  The code is used
>> in a number of optimizations and warnings.
>>
>> However, the code is unable to deal with constant arrays initialized
>> using the braced initializer syntax, as in
>>
>>   const char a[] = { '1', '2', '\0' };
>>
>> The attached patch extends the C and C++ front-ends to convert such
>> initializers into a STRING_CST form.
>>
>> The goal of this work is to both enable existing optimizations for
>> such arrays, and to help detect bugs due to using non-nul terminated
>> arrays where nul-terminated strings are expected.  The latter is
>> an extension of the GCC 8 -Wstringop-overflow and
>> -Wstringop-truncation warnings that help detect or prevent reading
>> past the end of dynamically created character arrays.  Future work
>> includes detecting potential past-the-end reads from uninitialized
>> local character arrays.
>
>>   && TYPE_MAIN_VARIANT (TREE_TYPE (valtype)) == char_type_node)
>
> Why? Don't we want this for other character types as well?


It suppresses narrowing warnings for things like

  signed char a[] = { 0xff };

(there are a couple of tests that exercise this).

At the same time, STRING_CST is supposed to be able to represent
strings of any integer type so there should be a way to make it
work.  On the flip side, recent discussions of changes in this
area suggest there may be bugs in the wide character handling of
STRING_CST so those would need to be fixed before relying on it
for robust support.

In any case, if you have a suggestion for how to make it work for
at least the narrow character types I'll adjust the patch.

Martin


Re: [PATCH] Make GO string literals properly NUL terminated

2018-07-31 Thread Ian Lance Taylor
On Tue, Jul 31, 2018 at 5:14 AM, Bernd Edlinger
 wrote:
>
> could someone please review this patch and check it in into the GO FE?

I don't understand why the change is correct, and you didn't explain
it.  Go strings are not NUL terminated.  Go strings always have an
associated length.

Ian


[PATCH], Improve PowerPC switch behavior on medium code model system

2018-07-31 Thread Michael Meissner
I noticed that the switch code on PowerPC little endian systems (with the medium
code model) did not follow the sequence the ABI recommends on page 69:

Table 2.36. Position-Independent Switch Code for Small/Medium Models
(preferred, with TOC-relative addressing)

The code we currently generate is:

.section ".toc","aw"
.align 3
.LC0:
.quad   .L4
.section ".text"

# ...

addis 10,2,.LC0@toc@ha
ld 10,.LC0@toc@l(10)
sldi 3,3,2
add 9,10,3
lwa 9,0(9)
add 9,9,10
mtctr 9
bctr
.L4:
.long .L2-.L4
.long .L12-.L4
.long .L11-.L4
.long .L10-.L4
.long .L9-.L4
.long .L8-.L4
.long .L7-.L4
.long .L6-.L4
.long .L5-.L4
.long .L3-.L4

While the suggested code would be something like:

addis 10,2,.L4@toc@ha
addi 10,10,.L4@toc@l
sldi 3,3,2
lwax 9,10,3
add 9,9,10
mtctr 9
bctr
.p2align 2
.align 2
.L4:
.long .L2-.L4
.long .L12-.L4
.long .L11-.L4
.long .L10-.L4
.long .L9-.L4
.long .L8-.L4
.long .L7-.L4
.long .L6-.L4
.long .L5-.L4
.long .L3-.L4

This patch adds an insn to load a LABEL_REF into a GPR.  This is needed so the
FWPROP1 pass can convert the load of the label address from the TOC to a
direct load to a GPR.

While working on the patch, I discovered that the LWA instruction did not
support indexed loads.  This was due to it using the 'Y' constraint, which
accepts DS-form offsettable addresses, but not X-form indexed addresses.  I
added the Z constraint so that the indexed form is accepted.

I am in the middle of doing SPEC 2006 runs on both power8 and power9 systems
with this change.  So far, after 2 runs out of 3, I'm seeing several minor wins on
power9 (1-2%, perlbench, gcc, sjeng, sphinx3) and no regressions.  On power8 I
see 3 minor wins (1-3%, perlbench, sjeng, omnetpp) and 1 minor regression (1%,
povray).

I have done bootstrap builds with/without the change and there were no
regressions in the test suite.  Can I check this change into the trunk?  It is
a simple enough change for back ports, if desired.

Note, I will be on vacation for 11 days starting this Saturday.  I will not be
actively checking my mail in that time period.  If I get the approval early
enough, I can check it in.  Otherwise, somebody else can check it in if they
monitor for failure, or we can wait until I get back around August 14th to check it
in.

2018-07-31  Michael Meissner  

* config/rs6000/predicates.md (label_ref_operand): New predicate
to recognize LABEL_REF.
* config/rs6000/rs6000.c (rs6000_output_addr_const_extra): Allow
LABEL_REF's inside of UNSPEC_TOCREL's.
* config/rs6000/rs6000.md (extendsi2): Allow reg+reg indexed
addressing.
(labelref): New insn to optimize loading a label address into
registers on a medium code system.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 263040)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -1662,6 +1662,10 @@ (define_predicate "small_toc_ref"
   return GET_CODE (op) == UNSPEC && XINT (op, 1) == UNSPEC_TOCREL;
 })
 
+;; Match a LABEL_REF operand
+(define_predicate "label_ref_operand"
+  (match_code "label_ref"))
+
 ;; Match the first insn (addis) in fusing the combination of addis and loads to
 ;; GPR registers on power8.
 (define_predicate "fusion_gpr_addis"
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 263040)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -20807,7 +20807,8 @@ rs6000_output_addr_const_extra (FILE *fi
 switch (XINT (x, 1))
   {
   case UNSPEC_TOCREL:
-   gcc_checking_assert (GET_CODE (XVECEXP (x, 0, 0)) == SYMBOL_REF
+   gcc_checking_assert ((GET_CODE (XVECEXP (x, 0, 0)) == SYMBOL_REF
+ || GET_CODE (XVECEXP (x, 0, 0)) == LABEL_REF)
 && REG_P (XVECEXP (x, 0, 1))
 && REGNO (XVECEXP (x, 0, 1)) == TOC_REGISTER);
output_addr_const (file, XVECEXP (x, 0, 0));
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 263040)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -998,7 +998,7 @@ (define_insn "extendsi2"
 "=r, r,   wl,wu,wj,wK, wH,wr")
 
(sign_extend:EXTSI (match_operand:SI 1 "lwa_operand"
-"Y,  r,   Z, Z, r, wK, wH,?wIwH")))]
+  

Re: [C++ PATCH] Implement P0595R1 - so far as __builtin_is_constant_evaluated rather than std::is_constant_evaluated magic builtin

2018-07-31 Thread Jason Merrill
On Mon, Jul 23, 2018 at 8:50 PM, Richard Biener
 wrote:
> On Mon, Jul 23, 2018 at 12:28 PM Jakub Jelinek  wrote:
>>
>> On Mon, Jul 23, 2018 at 12:17:42PM +0200, Richard Biener wrote:
>> > > Bootstrapped/regtested on x86_64-linux.
>> >
>> > Thanks for working on this.  I wonder if we can completely hide this
>> > from the middle-end, without requiring defining of c_dialect_cxx.
>> > There is the BUILT_IN_FRONTEND class so you could somewhere
>> > manually inject a decl in that class for C++?
>>
>> But then I couldn't handle folding of the builtin in the middle-end to false,
>> which is what I need (because in the FE it needs to be either folded to
>> true, or its folding deferred until later).
>> Or maybe in the C++ gimplification langhook?
>
> Yes, I was thinking the C++ langhook or its fully_fold routine.

fully_fold is too soon (until constexpr evaluation uses the
pre-genericize form), but the gimplification hook should work.

>> Seems we have a single BUILT_IN_FRONTEND builtin in the whole compiler,
>> __integer_pack, but it doesn't act as a normal builtin, given it is a
>> templatish magic.
>
> Yeah, I think at some point we considered removing BUILT_IN_FRONTEND ...
>
> Nowadays internal-use builtins can easily be internal-functions but of course
> this one will eventually be used from libstdc++.

Immediately, I'd think.

Jason


Re: [PATCH 5/5] Formatted printing for dump_* in the middle-end

2018-07-31 Thread Richard Biener
On Tue, Jul 31, 2018 at 4:21 PM Richard Biener
 wrote:
>
> On Tue, Jul 31, 2018 at 4:19 PM David Malcolm  wrote:
> >
> > On Tue, 2018-07-31 at 15:03 +0200, Richard Biener wrote:
> > > On Fri, Jul 27, 2018 at 11:49 PM David Malcolm 
> > > wrote:
> > > >
> > > > This patch converts dump_printf and dump_printf_loc from using
> > > > printf (and thus ATTRIBUTE_PRINTF) to using a new pretty-printer
> > > > based on pp_format, which supports formatting middle-end types.
> > > >
> > > > In particular, the following codes are implemented (in addition
> > > > to the standard pretty_printer ones):
> > > >
> > > >%E: gimple *:
> > > >Equivalent to: dump_gimple_expr (MSG_*, TDF_SLIM, stmt, 0)
> > > >%G: gimple *:
> > > >Equivalent to: dump_gimple_stmt (MSG_*, TDF_SLIM, stmt, 0)
> > > >%T: tree:
> > > >Equivalent to: dump_generic_expr (MSG_*, arg, TDF_SLIM).
> > > >
> > > > Hence it becomes possible to convert e.g.:
> > > >
> > > >   if (dump_enabled_p ())
> > > > {
> > > >   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > >"not vectorized: different sized vector "
> > > >"types in statement, ");
> > > >   dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> > > > vectype);
> > > >   dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
> > > >   dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> > > > nunits_vectype);
> > > >   dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> > > > }
> > > >
> > > > into a one-liner:
> > > >
> > > >   if (dump_enabled_p ())
> > > > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > >  "not vectorized: different sized vector "
> > > >  "types in statement, %T and %T\n",
> > > >  vectype, nunits_vectype);
> > > >
> > > > Unlike regular pretty-printers, this one captures optinfo_item
> > > > instances for the formatted chunks as appropriate, so that when
> > > > written out to a JSON optimization record, the relevant parts of
> > > > the message are labelled by type, and by source location (so that
> > > > e.g. %G is entirely equivalent to using dump_gimple_stmt).
> > > >
> > > > dump_printf and dump_printf_loc become marked with
> > > > ATTRIBUTE_GCC_DUMP_PRINTF, which the patch also implements.
> > > >
> > > > gcc/c-family/ChangeLog:
> > > > * c-format.c (enum format_type): Add
> > > > gcc_dump_printf_format_type.
> > > > (local_gimple_ptr_node): New decl.
> > > > (gcc_dump_printf_length_specs): New.
> > > > (gcc_dump_printf_flag_pairs): New.
> > > > (gcc_dump_printf_flag_specs): New.
> > > > (gcc_dump_printf_char_table): New.
> > > > (format_types_orig): Add entry for "gcc_dump_printf".
> > > > (init_dynamic_diag_info): Create local_gimple_ptr_node.
> > > > Set up length_char_specs and conversion_specs for
> > > > gcc_dump_printf_format_type.
> > > > (handle_format_attribute): Handle
> > > > gcc_dump_printf_format_type.
> > > > * c-format.h (T89_GIMPLE): New macro.
> > >
> > > Iff the c-family changes are necessary (are they?) then how does
> > > this work for non-c-family languages which do not link
> > > c-family/c-format.o?
> >
> > The c-family changes are necessary for bootstrap, so that -Wformat
> > works cleanly after changing dump_printf_loc etc from
> >   ATTRIBUTE_PRINTF_3;
> > to
> >   ATTRIBUTE_GCC_DUMP_PRINTF (3, 0);
> > i.e. they're just the changes to -Wformat to teach it how to verify the
> > new:
> >   __attribute__ ((__format__ (__gcc_dump_printf__, m ,n)))
> > (hence the cleanups to c-format.c earlier in the patch kit, to avoid
> > yet more copy-and-paste there for the new format decoder callback).
>
> Ah, thanks for clarifying.
>
> > The implementation itself is all within dumpfile.c, hence the non-c-
> > family languages ought to work.  My testing was with:
> >   --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto,jit,brig
> > (and with cloog and isl, fwiw).
> >
> > (I kept the alphabetization of the ChangeLog files from my generate-
> > changelog.py script, which put the gcc/c-family/ChangeLog before the
> > gcc/ChangeLog and thus may have made this confusing to read, sorry).
> >
> > I didn't exhaustively check every callsite to the changed calls; I'm
> > assuming that -Wformat during bootstrap has effectively checked that
> > for me.  Though now I think about it, I note that we use
> > HOST_WIDE_INT_PRINT_DEC in many places: is this guaranteed to be a
> > valid input to pp_format on all of our configurations?
>
> I hope so ... returning to the patch now.

The patch is OK if C family maintainers agree on their parts.

Thanks,
Richard.

> Richard.
>
> > Dave
> >
> > >
> > > > gcc/ChangeLog:
> > > > * dump-context.h: Include "dumpfile.h".
> > > > (dump_context::dump_printf_va): Convert final param from
> > > > va_list
> > > > to va_list 

Re: [PATCH] libbacktrace: Move define of HAVE_ZLIB into check for -lz

2018-07-31 Thread Ian Lance Taylor via gcc-patches
On Sun, Jul 29, 2018 at 7:50 AM, Iain Buclaw  wrote:
>
> This is really to suppress the default action-if-found for
> AC_CHECK_LIB.  Zlib is not a dependency of libbacktrace, and so it
> shouldn't be added to LIBS.  When looking at the check, I saw that I
> could also remove the test for ac_cv_lib_z_compress.

Thanks, but this doesn't seem like quite the right approach, as seen
by the fact that HAVE_ZLIB_H was dropped from config.h.in.  I think
you need to keep the AC_DEFINE out of the AC_CHECK_LIB.  I would guess
that it would work to just change the default case of AC_CHECK_LIB to
[;] or something similarly innocuous.

Ian


Re: [PATCH 5/5] Formatted printing for dump_* in the middle-end

2018-07-31 Thread Richard Biener
On Tue, Jul 31, 2018 at 4:19 PM David Malcolm  wrote:
>
> On Tue, 2018-07-31 at 15:03 +0200, Richard Biener wrote:
> > On Fri, Jul 27, 2018 at 11:49 PM David Malcolm 
> > wrote:
> > >
> > > This patch converts dump_printf and dump_printf_loc from using
> > > printf (and thus ATTRIBUTE_PRINTF) to using a new pretty-printer
> > > based on pp_format, which supports formatting middle-end types.
> > >
> > > In particular, the following codes are implemented (in addition
> > > to the standard pretty_printer ones):
> > >
> > >%E: gimple *:
> > >Equivalent to: dump_gimple_expr (MSG_*, TDF_SLIM, stmt, 0)
> > >%G: gimple *:
> > >Equivalent to: dump_gimple_stmt (MSG_*, TDF_SLIM, stmt, 0)
> > >%T: tree:
> > >Equivalent to: dump_generic_expr (MSG_*, arg, TDF_SLIM).
> > >
> > > Hence it becomes possible to convert e.g.:
> > >
> > >   if (dump_enabled_p ())
> > > {
> > >   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > >"not vectorized: different sized vector "
> > >"types in statement, ");
> > >   dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> > > vectype);
> > >   dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
> > >   dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> > > nunits_vectype);
> > >   dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> > > }
> > >
> > > into a one-liner:
> > >
> > >   if (dump_enabled_p ())
> > > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > >  "not vectorized: different sized vector "
> > >  "types in statement, %T and %T\n",
> > >  vectype, nunits_vectype);
> > >
> > > Unlike regular pretty-printers, this one captures optinfo_item
> > > instances for the formatted chunks as appropriate, so that when
> > > written out to a JSON optimization record, the relevant parts of
> > > the message are labelled by type, and by source location (so that
> > > e.g. %G is entirely equivalent to using dump_gimple_stmt).
> > >
> > > dump_printf and dump_printf_loc become marked with
> > > ATTRIBUTE_GCC_DUMP_PRINTF, which the patch also implements.
> > >
> > > gcc/c-family/ChangeLog:
> > > * c-format.c (enum format_type): Add
> > > gcc_dump_printf_format_type.
> > > (local_gimple_ptr_node): New decl.
> > > (gcc_dump_printf_length_specs): New.
> > > (gcc_dump_printf_flag_pairs): New.
> > > (gcc_dump_printf_flag_specs): New.
> > > (gcc_dump_printf_char_table): New.
> > > (format_types_orig): Add entry for "gcc_dump_printf".
> > > (init_dynamic_diag_info): Create local_gimple_ptr_node.
> > > Set up length_char_specs and conversion_specs for
> > > gcc_dump_printf_format_type.
> > > (handle_format_attribute): Handle
> > > gcc_dump_printf_format_type.
> > > * c-format.h (T89_GIMPLE): New macro.
> >
> > Iff the c-family changes are necessary (are they?) then how does
> > this work for non-c-family languages which do not link
> > c-family/c-format.o?
>
> The c-family changes are necessary for bootstrap, so that -Wformat
> works cleanly after changing dump_printf_loc etc from
>   ATTRIBUTE_PRINTF_3;
> to
>   ATTRIBUTE_GCC_DUMP_PRINTF (3, 0);
> i.e. they're just the changes to -Wformat to teach it how to verify the
> new:
>   __attribute__ ((__format__ (__gcc_dump_printf__, m ,n)))
> (hence the cleanups to c-format.c earlier in the patch kit, to avoid
> yet more copy-and-paste there for the new format decoder callback).

Ah, thanks for clarifying.

> The implementation itself is all within dumpfile.c, hence the non-c-
> family languages ought to work.  My testing was with:
>   --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto,jit,brig
> (and with cloog and isl, fwiw).
>
> (I kept the alphabetization of the ChangeLog files from my generate-
> changelog.py script, which put the gcc/c-family/ChangeLog before the
> gcc/ChangeLog and thus may have made this confusing to read, sorry).
>
> I didn't exhaustively check every callsite to the changed calls; I'm
> assuming that -Wformat during bootstrap has effectively checked that
> for me.  Though now I think about it, I note that we use
> HOST_WIDE_INT_PRINT_DEC in many places: is this guaranteed to be a
> valid input to pp_format on all of our configurations?

I hope so ... returning to the patch now.

Richard.

> Dave
>
> >
> > > gcc/ChangeLog:
> > > * dump-context.h: Include "dumpfile.h".
> > > (dump_context::dump_printf_va): Convert final param from
> > > va_list
> > > to va_list *.  Convert from ATTRIBUTE_PRINTF to
> > > ATTRIBUTE_GCC_DUMP_PRINTF.
> > > (dump_context::dump_printf_loc_va): Likewise.
> > > * dumpfile.c: Include "stringpool.h".
> > > (make_item_for_dump_printf_va): Delete.
> > > (make_item_for_dump_printf): Delete.
> > > (class dump_pretty_printer): New class.
> > >

Re: [PATCH 5/5] Formatted printing for dump_* in the middle-end

2018-07-31 Thread David Malcolm
On Tue, 2018-07-31 at 15:03 +0200, Richard Biener wrote:
> On Fri, Jul 27, 2018 at 11:49 PM David Malcolm 
> wrote:
> > 
> > This patch converts dump_printf and dump_printf_loc from using
> > printf (and thus ATTRIBUTE_PRINTF) to using a new pretty-printer
> > based on pp_format, which supports formatting middle-end types.
> > 
> > In particular, the following codes are implemented (in addition
> > to the standard pretty_printer ones):
> > 
> >%E: gimple *:
> >Equivalent to: dump_gimple_expr (MSG_*, TDF_SLIM, stmt, 0)
> >%G: gimple *:
> >Equivalent to: dump_gimple_stmt (MSG_*, TDF_SLIM, stmt, 0)
> >%T: tree:
> >Equivalent to: dump_generic_expr (MSG_*, arg, TDF_SLIM).
> > 
> > Hence it becomes possible to convert e.g.:
> > 
> >   if (dump_enabled_p ())
> > {
> >   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >"not vectorized: different sized vector "
> >"types in statement, ");
> >   dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> > vectype);
> >   dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
> >   dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> > nunits_vectype);
> >   dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> > }
> > 
> > into a one-liner:
> > 
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >  "not vectorized: different sized vector "
> >  "types in statement, %T and %T\n",
> >  vectype, nunits_vectype);
> > 
> > Unlike regular pretty-printers, this one captures optinfo_item
> > instances for the formatted chunks as appropriate, so that when
> > written out to a JSON optimization record, the relevant parts of
> > the message are labelled by type, and by source location (so that
> > e.g. %G is entirely equivalent to using dump_gimple_stmt).
> > 
> > dump_printf and dump_printf_loc become marked with
> > ATTRIBUTE_GCC_DUMP_PRINTF, which the patch also implements.
> > 
> > gcc/c-family/ChangeLog:
> > * c-format.c (enum format_type): Add
> > gcc_dump_printf_format_type.
> > (local_gimple_ptr_node): New decl.
> > (gcc_dump_printf_length_specs): New.
> > (gcc_dump_printf_flag_pairs): New.
> > (gcc_dump_printf_flag_specs): New.
> > (gcc_dump_printf_char_table): New.
> > (format_types_orig): Add entry for "gcc_dump_printf".
> > (init_dynamic_diag_info): Create local_gimple_ptr_node.
> > Set up length_char_specs and conversion_specs for
> > gcc_dump_printf_format_type.
> > (handle_format_attribute): Handle
> > gcc_dump_printf_format_type.
> > * c-format.h (T89_GIMPLE): New macro.
> 
> Iff the c-family changes are necessary (are they?) then how does
> this work for non-c-family languages which do not link
> c-family/c-format.o?

The c-family changes are necessary for bootstrap, so that -Wformat
works cleanly after changing dump_printf_loc etc from
  ATTRIBUTE_PRINTF_3;
to
  ATTRIBUTE_GCC_DUMP_PRINTF (3, 0);
i.e. they're just the changes to -Wformat to teach it how to verify the
new:
  __attribute__ ((__format__ (__gcc_dump_printf__, m ,n)))
(hence the cleanups to c-format.c earlier in the patch kit, to avoid
yet more copy-and-paste there for the new format decoder callback).

The implementation itself is all within dumpfile.c, hence the non-c-
family languages ought to work.  My testing was with:
  --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto,jit,brig
(and with cloog and isl, fwiw).

(I kept the alphabetization of the ChangeLog files from my generate-
changelog.py script, which put the gcc/c-family/ChangeLog before the
gcc/ChangeLog and thus may have made this confusing to read, sorry).

I didn't exhaustively check every callsite to the changed calls; I'm
assuming that -Wformat during bootstrap has effectively checked that
for me.  Though now I think about it, I note that we use
HOST_WIDE_INT_PRINT_DEC in many places: is this guaranteed to be a
valid input to pp_format on all of our configurations?

Dave

> 
> > gcc/ChangeLog:
> > * dump-context.h: Include "dumpfile.h".
> > (dump_context::dump_printf_va): Convert final param from
> > va_list
> > to va_list *.  Convert from ATTRIBUTE_PRINTF to
> > ATTRIBUTE_GCC_DUMP_PRINTF.
> > (dump_context::dump_printf_loc_va): Likewise.
> > * dumpfile.c: Include "stringpool.h".
> > (make_item_for_dump_printf_va): Delete.
> > (make_item_for_dump_printf): Delete.
> > (class dump_pretty_printer): New class.
> > (dump_pretty_printer::dump_pretty_printer): New ctor.
> > (dump_pretty_printer::emit_items): New member function.
> > (dump_pretty_printer::emit_any_pending_textual_chunks): New
> > member
> > function.
> > (dump_pretty_printer::emit_item): New member function.
> > (dump_pretty_printer::stash_it

Re: [C++2A] Implement P1008R1 - prohibit aggregates with user-declared constructors

2018-07-31 Thread Jason Merrill
On Mon, Jul 30, 2018 at 9:01 PM, Jakub Jelinek  wrote:
> Seems what is considered an aggregate type keeps changing in every single
> C++ version.

Indeed :/

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

This is OK.  I think we could also use a -Wc++2a-compat warning, and
an inform in 2a mode about why your initializer doesn't work any more,
but they don't need to go in right now.

Jason


Re: [PATCH] combine: Allow combining two insns to two insns

2018-07-31 Thread Segher Boessenkool
On Tue, Jul 31, 2018 at 05:39:37AM -0700, H.J. Lu wrote:
> For
> 
> ---
> #define N 16
> float f[N];
> double d[N];
> int n[N];
> 
> __attribute__((noinline)) void
> f3 (void)
> {
>   int i;
>   for (i = 0; i < N; i++)
> d[i] = f[i];
> }
> ---
> 
> r263067 improved -O3 -mavx2 -mtune=generic -m64 from
> 
> .cfi_startproc
> vmovaps f(%rip), %xmm2
> vmovaps f+32(%rip), %xmm3
> vinsertf128 $0x1, f+16(%rip), %ymm2, %ymm0
> vcvtps2pd %xmm0, %ymm1
> vextractf128 $0x1, %ymm0, %xmm0
> vmovaps %xmm1, d(%rip)
> vextractf128 $0x1, %ymm1, d+16(%rip)
> vcvtps2pd %xmm0, %ymm0
> vmovaps %xmm0, d+32(%rip)
> vextractf128 $0x1, %ymm0, d+48(%rip)
> vinsertf128 $0x1, f+48(%rip), %ymm3, %ymm0
> vcvtps2pd %xmm0, %ymm1
> vextractf128 $0x1, %ymm0, %xmm0
> vmovaps %xmm1, d+64(%rip)
> vextractf128 $0x1, %ymm1, d+80(%rip)
> vcvtps2pd %xmm0, %ymm0
> vmovaps %xmm0, d+96(%rip)
> vextractf128 $0x1, %ymm0, d+112(%rip)
> vzeroupper
> ret
> .cfi_endproc
> 
> to
> 
> .cfi_startproc
> vcvtps2pd f(%rip), %ymm0
> vmovaps %xmm0, d(%rip)
> vextractf128 $0x1, %ymm0, d+16(%rip)
> vcvtps2pd f+16(%rip), %ymm0
> vmovaps %xmm0, d+32(%rip)
> vextractf128 $0x1, %ymm0, d+48(%rip)
> vcvtps2pd f+32(%rip), %ymm0
> vextractf128 $0x1, %ymm0, d+80(%rip)
> vmovaps %xmm0, d+64(%rip)
> vcvtps2pd f+48(%rip), %ymm0
> vextractf128 $0x1, %ymm0, d+112(%rip)
> vmovaps %xmm0, d+96(%rip)
> vzeroupper
> ret
> .cfi_endproc

I cannot really read AVX, but that looks like better code alright :-)


Segher


Re: [PATCH 0/5] dump_printf support for middle-end types

2018-07-31 Thread David Malcolm
On Tue, 2018-07-31 at 14:50 +0200, Richard Biener wrote:
> On Fri, Jul 27, 2018 at 11:47 PM David Malcolm 
> wrote:
> > 
> > This patch kit converts dump_printf and dump_printf_loc from using
> > fprintf etc internally to using a new pretty-printer
> > based on pp_format, which supports formatting middle-end types.
> > 
> > In particular, the following codes are implemented (in addition
> > to the standard pretty_printer ones):
> > 
> >%E: gimple *:
> >Equivalent to: dump_gimple_expr (MSG_*, TDF_SLIM, stmt, 0)
> >%G: gimple *:
> >Equivalent to: dump_gimple_stmt (MSG_*, TDF_SLIM, stmt, 0)
> >%T: tree:
> >Equivalent to: dump_generic_expr (MSG_*, arg, TDF_SLIM).
> > 
> > Hence it becomes possible to convert e.g.:
> > 
> >   if (dump_enabled_p ())
> > {
> >   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >"not vectorized: different sized vector "
> >"types in statement, ");
> >   dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> > vectype);
> >   dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
> >   dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> > nunits_vectype);
> >   dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> > }
> >   return false;
> > 
> > into a single call to dump_printf_loc:
> > 
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >  "not vectorized: different sized vector "
> >  "types in statement, %T and %T\n",
> >  vectype, nunits_vectype);
> >   return false;
> > 
> > Unlike regular pretty-printers, this captures metadata for the
> > formatted chunks as appropriate, so that when written out to a
> > JSON optimization record, the relevant parts of the message are
> > labelled by type, and by source location (so that
> > e.g. %G is entirely equivalent to using dump_gimple_stmt).
> > 
> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > 
> > OK for trunk?
> 
> Nice!  I'm somehow missing 3/5?  Will look into the other ones now.

Thanks.

FWIW, 3/5 was:
  "[PATCH 3/5] C++: clean up cp_printer"
 https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01766.html

which Jason has already approved (and I've committed as r263046, after
a fresh bootstrap&regrtest).

Dave

> Richard.
> 
> > I'm hoping to use this in a v3 of:
> >   "[PATCH 0/5] [RFC v2] Higher-level reporting of vectorization problems"
> >  https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00446.html
> > where the above might become:
> >   return opt_result::failure_at (stmt,
> >  "not vectorized: different sized vector "
> >  "types in statement, %T and %T\n",
> >  vectype, nunits_vectype);
> > where opt_result::failure_at would encapsulate the "false", and
> > capture an opt_problem * (when dumps are enabled), for the reasons
> > discussed in that other kit.
> > 
> > David Malcolm (5):
> >   Simplify dump_context by adding a dump_loc member function
> >   dumpfile.c: eliminate special-casing of dump_file/alt_dump_file
> >   C++: clean up cp_printer
> >   c-family: clean up the data tables in c-format.c
> >   Formatted printing for dump_* in the middle-end
> > 
> >  gcc/c-family/c-format.c   |  159 +++--
> >  gcc/c-family/c-format.h   |1 +
> >  gcc/cp/error.c|   46 +-
> >  gcc/dump-context.h|   25 +-
> >  gcc/dumpfile.c| 1011 ++---
> >  gcc/dumpfile.h|   54 +-
> >  gcc/optinfo-emit-json.cc  |2 +-
> >  gcc/optinfo.cc|  135 +---
> >  gcc/optinfo.h |   38 +-
> >  gcc/testsuite/gcc.dg/format/gcc_diag-1.c  |   19 +-
> >  gcc/testsuite/gcc.dg/format/gcc_diag-10.c |   33 +-
> >  11 files changed, 998 insertions(+), 525 deletions(-)
> > 
> > --
> > 1.8.5.3
> > 


Re: [PATCH] combine: Allow combining two insns to two insns

2018-07-31 Thread Segher Boessenkool
Hi Christophe,

On Tue, Jul 31, 2018 at 02:34:06PM +0200, Christophe Lyon wrote:
> Since this was committed, I've noticed regressions
> on aarch64:
> FAIL: gcc.dg/zero_bits_compound-1.c scan-assembler-not \\(and:

This went from
and w0, w0, 255
lsl w1, w0, 8
orr w0, w1, w0, lsl 20
ret
to
and w1, w0, 255
ubfiz   w0, w0, 8, 8
orr w0, w0, w1, lsl 20
ret
so it's neither an improvement nor a regression, just different code.
The testcase wants no ANDs in the RTL.


> on arm-none-linux-gnueabi
> FAIL: gfortran.dg/actual_array_constructor_1.f90   -O1  execution test

That sounds bad.  Open a PR, maybe?


> On aarch64, I've also noticed a few others regressions but I'm not yet
> 100% sure it's caused by this patch (bisect running):
> gcc.target/aarch64/ashltidisi.c scan-assembler-times asr 4

 ushift_53_i:
-   uxtw    x1, w0
-   lsl x0, x1, 53
-   lsr x1, x1, 11
+   lsr w1, w0, 11
+   lsl x0, x0, 53
ret

 shift_53_i:
-   sxtw    x1, w0
-   lsl x0, x1, 53
-   asr x1, x1, 11
+   sbfx    x1, x0, 11, 21
+   lsl x0, x0, 53
ret

Both are improvements afaics.  The number of asr insns changes, sure.


> gcc.target/aarch64/sve/var_stride_2.c -march=armv8.2-a+sve
> scan-assembler-times \\tadd\\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 10\\n 2

Skipping all the SVE tests, sorry.  Richard says they look like
improvements, and exactly of the expected kind.  :-)


Segher


Re: [PATCH] convert braced initializers to strings (PR 71625)

2018-07-31 Thread Jason Merrill
On Tue, Jul 31, 2018 at 9:51 AM, Martin Sebor  wrote:
> The middle-end contains code to determine the lengths of constant
> character arrays initialized by string literals.  The code is used
> in a number of optimizations and warnings.
>
> However, the code is unable to deal with constant arrays initialized
> using the braced initializer syntax, as in
>
>   const char a[] = { '1', '2', '\0' };
>
> The attached patch extends the C and C++ front-ends to convert such
> initializers into a STRING_CST form.
>
> The goal of this work is to both enable existing optimizations for
> such arrays, and to help detect bugs due to using non-nul terminated
> arrays where nul-terminated strings are expected.  The latter is
> an extension of the GCC 8 -Wstringop-overflow and
> -Wstringop-truncation warnings that help detect or prevent reading
> past the end of dynamically created character arrays.  Future work
> includes detecting potential past-the-end reads from uninitialized
> local character arrays.

>   && TYPE_MAIN_VARIANT (TREE_TYPE (valtype)) == char_type_node)

Why? Don't we want this for other character types as well?
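For reference, the two initializer forms in Martin's description produce byte-for-byte identical arrays. That equivalence is what makes it safe to treat the braced form as a STRING_CST. The sketch below checks this in plain C, including the `unsigned char` case Jason asks about. It also shows the unterminated case that the warnings target. `is_nul_terminated` is a hypothetical helper, not something the patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Initialized with a string literal: the form already handled.  */
static const char lit[] = "12";

/* Same contents written with braces: the form the patch converts.  */
static const char braced[] = { '1', '2', '\0' };

/* The same bytes with another character type.  */
static const unsigned char ubraced[] = { '1', '2', '\0' };

/* No terminating nul: not a string, so strlen (unterminated) would
   read past the end of the array.  */
static const char unterminated[2] = { '1', '2' };

/* True if the SIZE bytes at A contain a nul, i.e. A can safely be
   passed to the str* functions.  Never reads past SIZE.  */
static bool is_nul_terminated (const void *a, size_t size)
{
  return memchr (a, '\0', size) != NULL;
}
```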

Jason


Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM

2018-07-31 Thread Kyrill Tkachov

Hi Thomas,

On 25/07/18 14:28, Thomas Preudhomme wrote:

Hi Kyrill,

Using memory_operand worked; the issues I encountered when using it in
earlier versions of the patch must have been due to the missing test
on address_operand in the preparation statements which I added later.
Please find an updated patch in attachment. ChangeLog entry is as
follows:

*** gcc/ChangeLog ***

2018-07-05  Thomas Preud'homme  

 * target-insns.def (stack_protect_combined_set): Define new standard
 pattern name.
 (stack_protect_combined_test): Likewise.
 * cfgexpand.c (stack_protect_prologue): Try new
 stack_protect_combined_set pattern first.
 * function.c (stack_protect_epilogue): Try new
 stack_protect_combined_test pattern first.
 * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
 parameters to control which register to use as PIC register and force
 reloading PIC register respectively.  Insert in the stream of insns if
 possible.
 (legitimize_pic_address): Expose above new parameters in prototype and
 adapt recursive calls accordingly.
 (arm_legitimize_address): Adapt to new legitimize_pic_address
 prototype.
 (thumb_legitimize_address): Likewise.
 (arm_emit_call_insn): Adapt to new require_pic_register prototype.
 * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
 change.
 * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
 prototype change.
 (stack_protect_combined_set): New insn_and_split pattern.
 (stack_protect_set): New insn pattern.
 (stack_protect_combined_test): New insn_and_split pattern.
 (stack_protect_test): New insn pattern.
 * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
 (UNSPEC_SP_TEST): Likewise.
 * doc/md.texi (stack_protect_combined_set): Document new standard
 pattern name.
 (stack_protect_set): Clarify that the operand for guard's address is
 legal.
 (stack_protect_combined_test): Document new standard pattern name.
 (stack_protect_test): Clarify that the operand for guard's address is
 legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  

 * gcc.target/arm/pr85434.c: New test.

Bootstrapped again for Arm and Thumb-2 and regtested with and without
-fstack-protector-all without any regression.
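Conceptually, the stack_protect_set / stack_protect_test pair works in two steps. In the prologue it stores a copy of the guard next to the local buffers. In the epilogue it compares that copy against the guard. The C model below shows only this check. It uses a made-up guard value and a frame modelled as a single byte array, so that the "overflow" stays well defined. The actual PR85434 issue cannot be shown at this level. That issue is a code-generation problem: the address of the guard gets spilled to the stack, where an overflow could reach it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guard value; real code reads it from a target-specific
   location such as __stack_chk_guard.  */
static const uintptr_t guard = 0x5a17c0deu;

#define BUF_SIZE 8

/* A local buffer immediately followed by the canary slot.  */
struct frame { unsigned char bytes[BUF_SIZE + sizeof (uintptr_t)]; };

/* Copy LEN bytes into the buffer without checking against BUF_SIZE
   (the bug being guarded against), then check the canary.  Returns
   false when the epilogue check would fail.  */
static bool copy_and_check (const unsigned char *src, size_t len)
{
  struct frame f;

  /* Prologue ("set"): put a copy of the guard after the buffer.  */
  memcpy (f.bytes + BUF_SIZE, &guard, sizeof guard);

  if (len > sizeof f.bytes)      /* stay inside the model itself */
    len = sizeof f.bytes;
  memcpy (f.bytes, src, len);

  /* Epilogue ("test"): compare the slot against the guard.  */
  uintptr_t slot;
  memcpy (&slot, f.bytes + BUF_SIZE, sizeof slot);
  return slot == guard;
}
```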


This looks ok to me now.
Thank you for your patience and addressing my comments from before.

Kyrill


Best regards,

Thomas
On Thu, 19 Jul 2018 at 17:34, Thomas Preudhomme
 wrote:

[Dropping Jeff Law from the list since he already commented on the
middle end parts]

Hi Kyrill,

On Thu, 19 Jul 2018 at 12:02, Kyrill Tkachov
 wrote:

Hi Thomas,

On 17/07/18 12:02, Thomas Preudhomme wrote:

Fixed in attached patch. ChangeLog entries are unchanged:

*** gcc/ChangeLog ***

2018-07-05  Thomas Preud'homme 

 PR target/85434
 * target-insns.def (stack_protect_combined_set): Define new standard
 pattern name.
 (stack_protect_combined_test): Likewise.
 * cfgexpand.c (stack_protect_prologue): Try new
 stack_protect_combined_set pattern first.
 * function.c (stack_protect_epilogue): Try new
 stack_protect_combined_test pattern first.
 * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
 parameters to control which register to use as PIC register and force
 reloading PIC register respectively.
 (legitimize_pic_address): Expose above new parameters in prototype and
 adapt recursive calls accordingly.
 (arm_legitimize_address): Adapt to new legitimize_pic_address
 prototype.
 (thumb_legitimize_address): Likewise.
 (arm_emit_call_insn): Adapt to new require_pic_register prototype.
 * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
 change.
 * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
 prototype change.
 (stack_protect_combined_set): New insn_and_split pattern.
 (stack_protect_set): New insn pattern.
 (stack_protect_combined_test): New insn_and_split pattern.
 (stack_protect_test): New insn pattern.
 * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
 (UNSPEC_SP_TEST): Likewise.
 * doc/md.texi (stack_protect_combined_set): Document new standard
 pattern name.
 (stack_protect_set): Clarify that the operand for guard's address is
 legal.
 (stack_protect_combined_test): Document new standard pattern name.
 (stack_protect_test): Clarify that the operand for guard's address is
 legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme 

 PR target/85434
 * gcc.target/arm/pr85434.c: New test.


Sorry for the delay. Some comments inline.

Kyrill

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index d6e3c382085..d1a893ac56e 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6105,8 +6105,18 @@ stack_protect_prologue (void)
   {
 tree guard_decl = targetm.stack_protect_guard ();
   

Re: [PATCH][c++] Fix DECL_BY_REFERENCE of clone parms

2018-07-31 Thread Jason Merrill
OK.

On Tue, Jul 31, 2018 at 7:22 PM, Richard Biener  wrote:
> On Mon, 30 Jul 2018, Tom de Vries wrote:
>
>> Hi,
>>
>> Consider test.C compiled at -O0 -g:
>> ...
>> class string {
>> public:
>>   string (const char *p) { this->p = p ; }
>>   string (const string &s) { this->p = s.p; }
>>
>> private:
>>   const char *p;
>> };
>>
>> class foo {
>> public:
>>   foo (string dir_hint) {}
>> };
>>
>> int
>> main (void)
>> {
>>   string s = "This is just a string";
>>   foo bar(s);
>>   return 0;
>> }
>> ...
>>
>> When parsing foo::foo, the dir_hint parameter gets a DECL_ARG_TYPE of
>> 'struct string & restrict'.  Then during finish_struct, we call
>> clone_constructors_and_destructors and create clones for foo::foo, and
>> set the DECL_ARG_TYPE in the same way.
>>
>> Later on, during finish_function, cp_genericize is called for the original
>> foo::foo, which sets the type of parm dir_hint to DECL_ARG_TYPE, and sets
>> DECL_BY_REFERENCE of dir_hint to 1.
>>
>> After that, during maybe_clone_body update_cloned_parm is called with:
>> ...
>> (gdb) call debug_generic_expr (parm.typed.type)
>> struct string & restrict
>> (gdb) call debug_generic_expr (cloned_parm.typed.type)
>> struct string
>> ...
>> The type of the cloned_parm is then set to the type of parm, but
>> DECL_BY_REFERENCE is not set.
>>
>> When doing cp_genericize for the clone later on,
>> TREE_ADDRESSABLE (TREE_TYPE ()) is no longer true for the updated type of
>> the parm, so DECL_BY_REFERENCE is not set there either.
>>
>> This patch fixes the problem by copying DECL_BY_REFERENCE in 
>> update_cloned_parm.
>>
>> Built and reg-tested on x86_64.
>>
>> OK for trunk?
>
> Thanks for tracking this down.  It looks OK to me but please leave
> Jason and Nathan a day to comment.
>
> Otherwise OK for trunk and also for branches after a while.
>
> Thanks,
> Richard.
>
>> Thanks,
>> - Tom
>>
>> [c++] Fix DECL_BY_REFERENCE of clone parms
>>
>> 2018-07-30  Tom de Vries  
>>
>>   PR debug/86687
>>   * optimize.c (update_cloned_parm): Copy DECL_BY_REFERENCE.
>>
>>   * g++.dg/guality/pr86687.C: New test.
>>
>> ---
>>  gcc/cp/optimize.c  |  2 ++
>>  gcc/testsuite/g++.dg/guality/pr86687.C | 28 
>>  2 files changed, 30 insertions(+)
>>
>> diff --git a/gcc/cp/optimize.c b/gcc/cp/optimize.c
>> index 0e9b84ed8a4..3923a5fc6c4 100644
>> --- a/gcc/cp/optimize.c
>> +++ b/gcc/cp/optimize.c
>> @@ -46,6 +46,8 @@ update_cloned_parm (tree parm, tree cloned_parm, bool first)
>>    /* We may have taken its address.  */
>>    TREE_ADDRESSABLE (cloned_parm) = TREE_ADDRESSABLE (parm);
>>
>> +  DECL_BY_REFERENCE (cloned_parm) = DECL_BY_REFERENCE (parm);
>> +
>>    /* The definition might have different constness.  */
>>    TREE_READONLY (cloned_parm) = TREE_READONLY (parm);
>>
>> diff --git a/gcc/testsuite/g++.dg/guality/pr86687.C b/gcc/testsuite/g++.dg/guality/pr86687.C
>> new file mode 100644
>> index 000..140a6fce596
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/guality/pr86687.C
>> @@ -0,0 +1,28 @@
>> +// PR debug/86687
>> +// { dg-do run }
>> +// { dg-options "-g" }
>> +
>> +class string {
>> +public:
>> +  string (int p) { this->p = p ; }
>> +  string (const string &s) { this->p = s.p; }
>> +
>> +  int p;
>> +};
>> +
>> +class foo {
>> +public:
>> +  foo (string dir_hint) {
>> +    p = dir_hint.p; // { dg-final { gdb-test . "dir_hint.p" 3 } }
>> +  }
>> +
>> +  int p;
>> +};
>> +
>> +int
>> +main (void)
>> +{
>> +  string s = 3;
>> +  foo bar(s);
>> +  return !(bar.p == 3);
>> +}
>>
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
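To make the failure mode concrete, here is a deliberately simplified C model of the fix. It does not use GCC's tree API; the struct and its fields are hypothetical, named after the flags in the hunk above. update_cloned_parm copies per-parameter properties from the original to the clone. Before the patch, DECL_BY_REFERENCE was the one property left behind. As a result, the clone got the reference type but still looked like a by-value parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a PARM_DECL, keeping only the properties
   update_cloned_parm copies in the hunk above.  */
struct parm_model
{
  const char *type;      /* TREE_TYPE, e.g. "struct string & restrict" */
  bool addressable;      /* TREE_ADDRESSABLE */
  bool by_reference;     /* DECL_BY_REFERENCE */
  bool readonly;         /* TREE_READONLY */
};

/* Return CLONED updated from PARM.  Without the by_reference line
   (the pre-patch behaviour) the clone would get the reference type
   but still claim to be passed by value.  */
static struct parm_model
update_cloned_parm_model (struct parm_model cloned,
                          const struct parm_model *parm)
{
  cloned.type = parm->type;
  cloned.addressable = parm->addressable;
  cloned.by_reference = parm->by_reference;   /* the line the patch adds */
  cloned.readonly = parm->readonly;
  return cloned;
}

/* Illustrative values: dir_hint after cp_genericize of the original
   foo::foo, and the clone as first created.  */
static const struct parm_model dir_hint =
  { "struct string & restrict", false, true, false };
static const struct parm_model stale_clone =
  { "struct string", false, false, false };
```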


Re: [PATCH 4/5] c-family: clean up the data tables in c-format.c

2018-07-31 Thread Marek Polacek
On Tue, Jul 31, 2018 at 02:56:29PM +0200, Richard Biener wrote:
> On Fri, Jul 27, 2018 at 11:48 PM David Malcolm  wrote:
> >
> > The format_char_info tables in c-format.c for our own formats contain
> > a lot of repetition.
> >
> > This patch adds a macro to express the conversion specifiers implemented
> > within pp_format, making it clearer which are custom ones added by the
> > various diagnostic_format_decoder callbacks.
> >
> > Doing so uncovered a few mistakes in the data (based on comparison with
> > the source of the diagnostic_format_decoder callbacks, and the notes
> > below), which the patch fixes:
> >
> > - gcc_diag_char_table didn't have 'Z', but it *is* implemented by pp_format.
> >
> > - removed erroneous 'G' and 'K' entries from gcc_diag_char_table: they're
> >   implemented by default_tree_printer (and thus in "tdiag") and by the
> >   C/C++ FEs, but not in pp_format.
> >
> > - removed "v" (lower case) from gcc_tdiag_char_table and
> >   gcc_cxxdiag_char_table
> >
> > Notes:
> >
> > pretty-print.h uses this for ATTRIBUTE_GCC_PPDIAG, used by pp_printf
> > and pp_verbatim:
> >
> > whereas diagnostic-core.h uses this for ATTRIBUTE_GCC_DIAG, used by
> > the various diagnostic functions:
> >
> > /* If we haven't already defined a front-end-specific diagnostics
> >style, use the generic one.  */
> >
> > Hence I'm assuming that __gcc_diag__ is for use when we don't
> > know what kind of diagnostic_format_decoder we have, and we can
> > only rely on pp_format's core functionality, where __gcc_tdiag__
> > is allowed to assume default_tree_printer.
> 
> OK if nobody objects.

Looks fine to me, too.

Marek
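The PP_FORMAT_CHAR_TABLE idea is to list the shared entries once in a macro and splice that macro into each table. It can be sketched in plain C with a toy table type; the names below are hypothetical and do not come from c-format.c. Because the core rows exist in only one place, the tables cannot drift apart the way the missing 'Z' entry did.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much simplified analogue of format_char_info.  */
struct conv_info
{
  const char *chars;   /* conversion characters covered by this entry */
  int takes_arg;       /* nonzero if the conversion consumes an argument */
};

/* Entries every formatter supports, listed once.  No trailing comma,
   so each table appends its own entries after it.  */
#define CORE_CONV_TABLE \
  { "di",  1 },         \
  { "oux", 1 },         \
  { "s",   1 },         \
  { "<>'", 0 }

/* Core conversions only (loosely, gcc_diag_char_table).  */
static const struct conv_info diag_table[] =
{
  CORE_CONV_TABLE,
  { NULL, 0 }
};

/* Core conversions plus a tree conversion (loosely, gcc_tdiag_char_table).  */
static const struct conv_info tdiag_table[] =
{
  CORE_CONV_TABLE,
  { "T", 1 },
  { NULL, 0 }
};

/* Return the entry of TABLE covering conversion character C, or NULL.  */
static const struct conv_info *
lookup_conv (const struct conv_info *table, char c)
{
  if (c == '\0')          /* strchr would match the terminator */
    return NULL;
  for (; table->chars != NULL; table++)
    if (strchr (table->chars, c) != NULL)
      return table;
  return NULL;
}
```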


Re: [PATCH] Fix DJGPP LTO with debug

2018-07-31 Thread Richard Biener
On Sat, 28 Jul 2018, Andris Pavenis wrote:

> On 07/27/2018 11:51 PM, DJ Delorie wrote:
> > Richard Biener  writes:
> > > DJ, did you ever run the testsuite with a configuration that has LTO
> > > enabled?  I don't see any djgpp results posted to gcc-testresults.
> > > Quick googling doesn't yield anything useful regarding how to
> > > do actual testing with a cross so I only built a i686-pc-msdosdjgpp
> > > cross cc1/lto1 from x86_64-linux which went fine.
> > CC's Andris, our current gcc maintainer within DJGPP.  I know he just
> > built 8.2 binaries for us, I don't know what his testing infrastructure
> > looks like.
> 
> 
> No.
> 
> I tried to run part of the tests from custom scripts with the native
> DJGPP compiler (e.g. when trying to implement DJGPP support for
> libstdc++fs, not yet submitted upstream).
> 
> Otherwise there is no DejaGNU support for DJGPP, so there is no way to run
> the testsuite with the native compiler.
> 
> I should perhaps find some way to run the testsuite using a
> cross-compiler from Linux. Possibilities:
> - execute the test programs under DosEmu (no longer possible with Linux
>   kernels 4.15+, as DosEmu does not support DPMI on them)
> - execute the test programs under Dosbox. Question: how do I configure
>   the testsuite to do that? I do not know.
> - run them through ssh on some 32-bit Windows system (older than
>   Windows 10, as DPMI support has been rather horribly broken in 32-bit
>   Windows 10 since March 2018)

So what about the patch?  Is it OK for trunk and GCC 8 branch?

Thanks,
Richard.


Re: [PATCH 5/5] Formatted printing for dump_* in the middle-end

2018-07-31 Thread Richard Biener
On Fri, Jul 27, 2018 at 11:49 PM David Malcolm  wrote:
>
> This patch converts dump_print and dump_printf_loc from using
> printf (and thus ATTRIBUTE_PRINTF) to using a new pretty-printer
> based on pp_format, which supports formatting middle-end types.
>
> In particular, the following codes are implemented (in addition
> to the standard pretty_printer ones):
>
>%E: gimple *:
>Equivalent to: dump_gimple_expr (MSG_*, TDF_SLIM, stmt, 0)
>%G: gimple *:
>Equivalent to: dump_gimple_stmt (MSG_*, TDF_SLIM, stmt, 0)
>%T: tree:
>Equivalent to: dump_generic_expr (MSG_*, arg, TDF_SLIM).
>
> Hence it becomes possible to convert e.g.:
>
>   if (dump_enabled_p ())
> {
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>"not vectorized: different sized vector "
>"types in statement, ");
>   dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, vectype);
>   dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
>   dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, nunits_vectype);
>   dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> }
>
> into a one-liner:
>
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "not vectorized: different sized vector "
>  "types in statement, %T and %T\n",
>  vectype, nunits_vectype);
>
> Unlike regular pretty-printers, this one captures optinfo_item
> instances for the formatted chunks as appropriate, so that when
> written out to a JSON optimization record, the relevant parts of
> the message are labelled by type, and by source location (so that
> e.g. %G is entirely equivalent to using dump_gimple_stmt).
>
> dump_printf and dump_printf_loc become marked with
> ATTRIBUTE_GCC_DUMP_PRINTF, which the patch also implements.
>
> gcc/c-family/ChangeLog:
> * c-format.c (enum format_type): Add gcc_dump_printf_format_type.
> (local_gimple_ptr_node): New decl.
> (gcc_dump_printf_length_specs): New.
> (gcc_dump_printf_flag_pairs): New.
> (gcc_dump_printf_flag_specs): New.
> (gcc_dump_printf_char_table): New.
> (format_types_orig): Add entry for "gcc_dump_printf".
> (init_dynamic_diag_info): Create local_gimple_ptr_node.
> Set up length_char_specs and conversion_specs for
> gcc_dump_printf_format_type.
> (handle_format_attribute): Handle gcc_dump_printf_format_type.
> * c-format.h (T89_GIMPLE): New macro.

If the c-family changes are necessary (are they?), then how does this
work for non-c-family languages which do not link c-family/c-format.o?

> gcc/ChangeLog:
> * dump-context.h: Include "dumpfile.h".
> (dump_context::dump_printf_va): Convert final param from va_list
> to va_list *.  Convert from ATTRIBUTE_PRINTF to
> ATTRIBUTE_GCC_DUMP_PRINTF.
> (dump_context::dump_printf_loc_va): Likewise.
> * dumpfile.c: Include "stringpool.h".
> (make_item_for_dump_printf_va): Delete.
> (make_item_for_dump_printf): Delete.
> (class dump_pretty_printer): New class.
> (dump_pretty_printer::dump_pretty_printer): New ctor.
> (dump_pretty_printer::emit_items): New member function.
> (dump_pretty_printer::emit_any_pending_textual_chunks): New member
> function.
> (dump_pretty_printer::emit_item): New member function.
> (dump_pretty_printer::stash_item): New member function.
> (dump_pretty_printer::format_decoder_cb): New member function.
> (dump_pretty_printer::decode_format): New member function.
> (dump_context::dump_printf_va): Reimplement in terms of
> dump_pretty_printer.
> (dump_context::dump_printf_loc_va): Convert final param from va_list
> to va_list *.
> (dump_context::begin_scope): Reimplement call to
> make_item_for_dump_printf.
> (dump_printf): Update for change to dump_printf_va.
> (dump_printf_loc): Likewise.
> (selftest::test_capture_of_dump_calls): Convert "stmt" from
> greturn * to gimple *.  Add a test_decl.  Add tests of dump_printf
> with %T, %E, and %G.
> * dumpfile.h (ATTRIBUTE_GCC_DUMP_PRINTF): New macro.
> (dump_printf): Replace ATTRIBUTE_PRINTF_2 with
> ATTRIBUTE_GCC_DUMP_PRINTF (2, 3).
> (dump_printf_loc): Replace ATTRIBUTE_PRINTF_3 with
> ATTRIBUTE_GCC_DUMP_PRINTF (3, 0).
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/format/gcc_diag-1.c: Fix typo.  Add test coverage for
> gcc_dump_printf.
> * gcc.dg/format/gcc_diag-10.c: Add gimple typedef.  Add test
> coverage for gcc_dump_printf.
> ---
>  gcc/c-family/c-format.c   |  60 -
>  gcc/c-family/c-format.h   |   1 +
>  gcc/dump-context.h|   7 +-
>  gcc/dumpfile.c| 
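The format-decoder arrangement described above can be sketched with a toy formatter. The formatter handles the core conversions itself and delegates extra ones, such as %T, to a callback. This is not pp_format, and all names are hypothetical. The sketch also shows why a va_list * is convenient. The C standard (<stdarg.h>, C11 7.16) lets a caller keep using its va_list after a callee has consumed arguments through a pointer to it. That is not allowed when the va_list is passed by value. This may be part of the reason for converting dump_printf_va's final parameter to va_list *, but that is an inference.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a middle-end type such as a tree.  */
struct node { const char *name; };

/* Callback for conversions the core formatter does not handle.  Gets a
   pointer to the caller's va_list so that arguments it consumes stay
   consumed.  Returns the snprintf-style length, or -1 if SPEC is
   unknown.  */
typedef int (*decoder_fn) (char *out, size_t size, char spec, va_list *ap);

/* Example decoder: %T prints a node's name.  */
static int node_decoder (char *out, size_t size, char spec, va_list *ap)
{
  if (spec != 'T')
    return -1;
  struct node *n = va_arg (*ap, struct node *);
  return snprintf (out, size, "%s", n->name);
}

/* Format FMT into OUT (SIZE > 0).  %d, %s and %% are handled here;
   any other conversion goes to DECODER.  Output is truncated to
   SIZE - 1 characters and always nul-terminated.  */
static void mini_format (char *out, size_t size, decoder_fn decoder,
                         const char *fmt, ...)
{
  va_list ap;
  size_t len = 0;

  va_start (ap, fmt);
  for (const char *p = fmt; *p != '\0' && len + 1 < size; p++)
    {
      int n;

      if (*p != '%')
        {
          out[len++] = *p;
          continue;
        }
      if (*++p == '\0')
        break;                 /* lone '%' at the end: ignore it */
      if (*p == 'd')
        n = snprintf (out + len, size - len, "%d", va_arg (ap, int));
      else if (*p == 's')
        n = snprintf (out + len, size - len, "%s", va_arg (ap, char *));
      else if (*p == '%')
        n = snprintf (out + len, size - len, "%%");
      else
        n = decoder (out + len, size - len, *p, &ap);

      /* snprintf reports the untruncated length; clamp to what fit.  */
      if (n > 0)
        len += (size_t) n < size - len ? (size_t) n : size - len - 1;
    }
  out[len] = '\0';
  va_end (ap);
}
```

Usage mirrors the one-liner in the patch description: `mini_format (buf, sizeof buf, node_decoder, "types %T and %T", a, b)`.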

Re: [PATCH] combine: Allow combining two insns to two insns

2018-07-31 Thread Richard Sandiford
Christophe Lyon  writes:
> On Mon, 30 Jul 2018 at 18:09, Segher Boessenkool
>  wrote:
>>
>> On Tue, Jul 24, 2018 at 05:18:41PM +, Segher Boessenkool wrote:
>> > This patch allows combine to combine two insns into two.  This helps
>> > in many cases, by reducing instruction path length, and also allowing
>> > further combinations to happen.  PR85160 is a typical example of code
>> > that it can improve.
>> >
>> > This patch does not allow such combinations if either of the original
>> > instructions was a simple move instruction.  In those cases combining
>> > the two instructions increases register pressure without improving the
>> > code.  With this move test, register pressure no longer increases
>> > noticeably as far as I can tell.
>> >
>> > (At first I also didn't allow either of the resulting insns to be a
>> > move instruction.  But that is actually a very good thing to have, as
>> > should have been obvious).
>> >
>> > Tested for many months; tested on about 30 targets.
>> >
>> > I'll commit this later this week if there are no objections.
>>
>> Done now, with the testcase at
>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01856.html .
>>
>
> Hi,
>
> Since this was committed, I've noticed regressions
> on aarch64:
> FAIL: gcc.dg/zero_bits_compound-1.c scan-assembler-not \\(and:
>
> on arm-none-linux-gnueabi
> FAIL: gfortran.dg/actual_array_constructor_1.f90   -O1  execution test
>
> On aarch64, I've also noticed a few others regressions but I'm not yet
> 100% sure it's caused by this patch (bisect running):
> gcc.target/aarch64/ashltidisi.c scan-assembler-times asr 4
> gcc.target/aarch64/sve/var_stride_2.c -march=armv8.2-a+sve scan-assembler-times \\tadd\\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 10\\n 2
> gcc.target/aarch64/sve/var_stride_4.c -march=armv8.2-a+sve scan-assembler-times \\tlsl\\tx[0-9]+, x[0-9]+, 10\\n 2
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmeq\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 7
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmeq\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 14
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmeq\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 5
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmeq\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 10
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
> gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve scan-assembler-times \\tfcmne\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
> gcc.target/aarch64/sve/vcond

Re: [PATCH 4/5] c-family: clean up the data tables in c-format.c

2018-07-31 Thread Richard Biener
On Fri, Jul 27, 2018 at 11:48 PM David Malcolm  wrote:
>
> The format_char_info tables in c-format.c for our own formats contain
> a lot of repetition.
>
> This patch adds a macro to express the conversion specifiers implemented
> within pp_format, making it clearer which are custom ones added by the
> various diagnostic_format_decoder callbacks.
>
> Doing so uncovered a few mistakes in the data (based on comparison with
> the source of the diagnostic_format_decoder callbacks, and the notes
> below), which the patch fixes:
>
> - gcc_diag_char_table didn't have 'Z', but it *is* implemented by pp_format.
>
> - removed erroneous 'G' and 'K' entries from gcc_diag_char_table: they're
>   implemented by default_tree_printer (and thus in "tdiag") and by the
>   C/C++ FEs, but not in pp_format.
>
> - removed "v" (lower case) from gcc_tdiag_char_table and
>   gcc_cxxdiag_char_table
>
> Notes:
>
> pretty-print.h uses this for ATTRIBUTE_GCC_PPDIAG, used by pp_printf
> and pp_verbatim:
>
> whereas diagnostic-core.h uses this for ATTRIBUTE_GCC_DIAG, used by
> the various diagnostic functions:
>
> /* If we haven't already defined a front-end-specific diagnostics
>style, use the generic one.  */
>
> Hence I'm assuming that __gcc_diag__ is for use when we don't
> know what kind of diagnostic_format_decoder we have, and we can
> only rely on pp_format's core functionality, where __gcc_tdiag__
> is allowed to assume default_tree_printer.

OK if nobody objects.

Thanks,
Richard.

> gcc/c-family/ChangeLog:
> * c-format.c (PP_FORMAT_CHAR_TABLE): New macro, based on existing
> table entries for gcc_diag_char_table, and the 'Z' entry from
> gcc_tdiag_char_table, changing the "chain" entry for 'Z' from
> &gcc_tdiag_char_table[0] to &gcc_diag_char_table[0].
> (gcc_diag_char_table): Use PP_FORMAT_CHAR_TABLE, implicitly
> adding missing "Z" for this table.  Remove erroneous "G" and "K"
> entries.
> (gcc_tdiag_char_table): Use PP_FORMAT_CHAR_TABLE.  Remove "v".
> (gcc_cdiag_char_table): Use PP_FORMAT_CHAR_TABLE.
> (gcc_cxxdiag_char_table): Use PP_FORMAT_CHAR_TABLE.  Remove "v".
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/format/gcc_diag-1.c (foo): Update the %v tests for
> tdiag and cxxdiag.
> * gcc.dg/format/gcc_diag-10.c (test_diag): Update tests of %G
> and %K.
> ---
>  gcc/c-family/c-format.c                   | 99 ++-
>  gcc/testsuite/gcc.dg/format/gcc_diag-1.c  |  4 +-
>  gcc/testsuite/gcc.dg/format/gcc_diag-10.c |  7 +--
>  3 files changed, 35 insertions(+), 75 deletions(-)
>
> diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
> index a0192dd..82841e4 100644
> --- a/gcc/c-family/c-format.c
> +++ b/gcc/c-family/c-format.c
> @@ -679,43 +679,40 @@ static const format_char_info asm_fprintf_char_table[] =
>{ NULL,  0, STD_C89, NOLENGTHS, NULL, NULL, NULL }
>  };
>
> +/* GCC-specific format_char_info arrays.  */
> +
> +/* The conversion specifiers implemented within pp_format, and thus supported
> +   by all pretty_printer instances within GCC.  */
> +
> +#define PP_FORMAT_CHAR_TABLE \
> +  { "di",  0, STD_C89, { T89_I,   BADLEN,  BADLEN,  T89_L,   T9L_LL,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q",  "",   NULL }, \
> +  { "ox",  0, STD_C89, { T89_UI,  BADLEN,  BADLEN,  T89_UL,  T9L_ULL, BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q",  "",   NULL }, \
> +  { "u",   0, STD_C89, { T89_UI,  BADLEN,  BADLEN,  T89_UL,  T9L_ULL, BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q",  "",   NULL }, \
> +  { "c",   0, STD_C89, { T89_I,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q",  "",   NULL }, \
> +  { "s",   1, STD_C89, { T89_C,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "pq", "cR", NULL }, \
> +  { "p",   1, STD_C89, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q",  "c",  NULL }, \
> +  { "r",   1, STD_C89, { T89_C,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "",   "//cR",   NULL }, \
> +  { "<",   0, STD_C89, NOARGUMENTS, "",  "<",   NULL }, \
> +  { ">",   0, STD_C89, NOARGUMENTS, "",  ">",   NULL }, \
> +  { "'" ,  0, STD_C89, NOARGUMENTS, "",  "",    NULL }, \
> +  { "R",   0, STD_C89, NOARGUMENTS, "",  "\\",  NULL }, \
> +  { "m",   0, STD_C89, NOARGUMENTS, "q", "",    NULL }, \
> +  { "Z",   1, STD_C89, { T89_I,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "",   "",   &gcc_diag_char_table[0] }
> +
>  static const format_char_info gcc_diag_char_table[] =
>  {
> -  /* C89 conversion specifiers.  */
> -  { "di",  0, STD_C89, { T89_I,   BADLEN,

Re: [PATCH 2/5] dumpfile.c: eliminate special-casing of dump_file/alt_dump_file

2018-07-31 Thread Richard Biener
On Fri, Jul 27, 2018 at 11:48 PM David Malcolm  wrote:
>
> With the addition of optinfo, the various dump_* calls had three parts:
> - optionally print to dump_file
> - optionally print to alt_dump_file
> - optionally make an optinfo_item and add it to the pending optinfo,
>   creating it for dump_*_loc calls.
>
> However, this split makes it difficult to implement the formatted dumps
> later in the patch kit, so as enabling work towards that, this patch removes
> the above split, so that all dumping within the dump_* API goes through
> optinfo_item.
>
> In order to ensure that the dumps to dump_file and alt_dump_file are
> processed immediately (rather than being buffered within the pending
> optinfo for consolidation), this patch introduces the idea of "immediate"
> optinfo_item destinations vs "non-immediate" destinations.
>
> The patch also adds selftest coverage of what's printed, and of scopes.
>
> This adds two allocations per dump_* call when dumping is enabled.
> I'm assuming that this isn't a problem, as dump_enabled_p is normally
> false.  There are ways of optimizing it if it is an issue (by making
> optinfo_item instances become temporaries that borrow the underlying
> buffer), but they require nontrivial changes, so I'd prefer to leave
> that for another patch kit, if it becomes necessary.

Yeah, I guess that's OK given we can consolidate quite some calls after
your patch anyways.  Using alloca + placement new would be possible
as well I guess?

OK.

Richard.
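The "immediate" vs "non-immediate" split can be sketched with a toy model in which every dump call becomes one item. Each item goes straight to the immediate destinations and is also buffered for the pending optinfo. The struct and function below are hypothetical. They illustrate only the routing, not the real dump_context. In particular, the real code allocates its items, which is where the two extra allocations per call come from.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ITEMS 16

/* IMMEDIATE stands in for dump_file / alt_dump_file: text is written
   as soon as an item is emitted.  PENDING stands in for the optinfo
   being built: items are only collected there, for later
   consolidation.  */
struct dump_model
{
  char immediate[256];
  const char *pending[MAX_ITEMS];   /* items must outlive the model */
  int n_pending;
  int optinfo_enabled;
};

/* Every dump call funnels through here, one item per call.  */
static void emit_item (struct dump_model *d, const char *item)
{
  /* Immediate destination: append the text right away.  */
  size_t used = strlen (d->immediate);
  snprintf (d->immediate + used, sizeof d->immediate - used, "%s", item);

  /* Non-immediate destination: buffer the item.  */
  if (d->optinfo_enabled && d->n_pending < MAX_ITEMS)
    d->pending[d->n_pending++] = item;
}
```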

> gcc/ChangeLog:
> * dump-context.h: Include "pretty-print.h".
> (dump_context::refresh_dumps_are_enabled): New decl.
> (dump_context::emit_item): New decl.
> (class dump_context): Add fields "m_test_pp" and
> "m_test_pp_flags".
> (temp_dump_context::temp_dump_context): Add param "test_pp_flags".
> (temp_dump_context::get_dumped_text): New decl.
> (class temp_dump_context): Add field "m_pp".
> * dumpfile.c (refresh_dumps_are_enabled): Convert to...
> (dump_context::refresh_dumps_are_enabled): ...and add a test for
> m_test_pp.
> (set_dump_file): Update for above change.
> (set_alt_dump_file): Likewise.
> (dump_loc): New overload, taking a pretty_printer *.
> (dump_context::dump_loc): Call end_any_optinfo.  Dump the location
> to any test pretty-printer.
> (make_item_for_dump_gimple_stmt): New function, adapted from
> optinfo::add_gimple_stmt.
> (dump_context::dump_gimple_stmt): Call it, and use the result,
> eliminating the direct usage of dump_file and alt_dump_file in
> favor of indirectly using them via emit_item.
> (make_item_for_dump_gimple_expr): New function, adapted from
> optinfo::add_gimple_expr.
> (dump_context::dump_gimple_expr): Call it, and use the result,
> eliminating the direct usage of dump_file and alt_dump_file in
> favor of indirectly using them via emit_item.
> (make_item_for_dump_generic_expr): New function, adapted from
> optinfo::add_tree.
> (dump_context::dump_generic_expr): Call it, and use the result,
> eliminating the direct usage of dump_file and alt_dump_file in
> favor of indirectly using them via emit_item.
> (make_item_for_dump_printf_va): New function, adapted from
> optinfo::add_printf_va.
> (make_item_for_dump_printf): New function.
> (dump_context::dump_printf_va): Call make_item_for_dump_printf_va,
> and use the result, eliminating the direct usage of dump_file and
> alt_dump_file in favor of indirectly using them via emit_item.
> (make_item_for_dump_dec): New function.
> (dump_context::dump_dec): Call it, and use the result,
> eliminating the direct usage of dump_file and alt_dump_file in
> favor of indirectly using them via emit_item.
> (make_item_for_dump_symtab_node): New function, adapted from
> optinfo::add_symtab_node.
> (dump_context::dump_symtab_node): Call it, and use the result,
> eliminating the direct usage of dump_file and alt_dump_file in
> favor of indirectly using them via emit_item.
> (dump_context::begin_scope): Reimplement, avoiding direct usage
> of dump_file and alt_dump_file in favor of indirectly using them
> via emit_item.
> (dump_context::emit_item): New member function.
> (temp_dump_context::temp_dump_context): Add param "test_pp_flags".
> Set up test pretty-printer on the underlying context.  Call
> refresh_dumps_are_enabled.
> (temp_dump_context::~temp_dump_context): Call
> refresh_dumps_are_enabled.
> (temp_dump_context::get_dumped_text): New member function.
> (selftest::verify_dumped_text): New function.
> (ASSERT_DUMPED_TEXT_EQ): New macro.
> (selftest::test_capture_of_dump_calls): Run all tests twice, with

Re: [PATCH 1/5] Simplify dump_context by adding a dump_loc member function

2018-07-31 Thread Richard Biener
On Fri, Jul 27, 2018 at 11:48 PM David Malcolm  wrote:
>
> This patch removes some duplicated code in dumpfile.c by
> reimplementing the various dump_foo_loc calls in terms of dump_foo.
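A toy model of this refactoring, for readers new to dumpfile.c: each _loc variant becomes "emit the location, then call the plain variant", so the per-stream checks live in one place. Everything in it (struct sink, dump_loc_sketch, the "line %d: " prefix) is hypothetical; the real code writes to dump_file/alt_dump_file and additionally begins an optinfo, which the sketch leaves out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one dump destination (dump_file or
   alt_dump_file) together with its enabling flags (pflags/alt_flags). */
struct sink {
  int enabled;      /* plays the role of a non-NULL FILE * */
  unsigned flags;   /* plays the role of pflags / alt_flags */
  char buf[256];    /* captured output */
};

static struct sink primary, alt;

/* Append TEXT to S if S is enabled for DUMP_KIND. */
static void sink_append (struct sink *s, unsigned dump_kind, const char *text)
{
  if (s->enabled && (dump_kind & s->flags))
    strncat (s->buf, text, sizeof s->buf - strlen (s->buf) - 1);
}

/* Analogue of dump_context::dump_loc: only the location prefix. */
static void dump_loc_sketch (unsigned dump_kind, int line)
{
  char prefix[32];
  snprintf (prefix, sizeof prefix, "line %d: ", line);
  sink_append (&primary, dump_kind, prefix);
  sink_append (&alt, dump_kind, prefix);
}

/* Analogue of a plain dump_foo call. */
static void dump_text_sketch (unsigned dump_kind, const char *text)
{
  sink_append (&primary, dump_kind, text);
  sink_append (&alt, dump_kind, text);
}

/* Analogue of dump_foo_loc after the patch: location, then dump_foo,
   with no per-stream logic of its own. */
static void dump_text_loc_sketch (unsigned dump_kind, int line,
                                  const char *text)
{
  dump_loc_sketch (dump_kind, line);
  dump_text_sketch (dump_kind, text);
}
```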

OK.

Richard.

> gcc/ChangeLog:
> * dump-context.h (dump_context::dump_loc): New decl.
> * dumpfile.c (dump_context::dump_loc): New member function.
> (dump_context::dump_gimple_stmt_loc): Reimplement using dump_loc
> and dump_gimple_stmt.
> (dump_context::dump_gimple_expr_loc): Likewise, using
> dump_gimple_expr.
> (dump_context::dump_generic_expr_loc): Likewise, using
> dump_generic_expr.
> (dump_context::dump_printf_loc_va): Likewise, using
> dump_printf_va.
> (dump_context::begin_scope): Explicitly use the global function
> "dump_loc", rather than the member function.
> ---
>  gcc/dump-context.h |   2 +
>  gcc/dumpfile.c | 119 ++---
>  2 files changed, 33 insertions(+), 88 deletions(-)
>
> diff --git a/gcc/dump-context.h b/gcc/dump-context.h
> index a191e3a..f6df0b4 100644
> --- a/gcc/dump-context.h
> +++ b/gcc/dump-context.h
> @@ -39,6 +39,8 @@ class dump_context
>
>~dump_context ();
>
> +  void dump_loc (dump_flags_t dump_kind, const dump_location_t &loc);
> +
>void dump_gimple_stmt (dump_flags_t dump_kind, dump_flags_t extra_dump_flags,
>  gimple *gs, int spc);
>
> diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
> index 176c9b8..3c8bc38 100644
> --- a/gcc/dumpfile.c
> +++ b/gcc/dumpfile.c
> @@ -474,6 +474,27 @@ dump_context::~dump_context ()
>delete m_pending;
>  }
>
> +/* Print LOC to the appropriate dump destinations, given DUMP_KIND.
> +   If optinfos are enabled, begin a new optinfo.  */
> +
> +void
> +dump_context::dump_loc (dump_flags_t dump_kind, const dump_location_t &loc)
> +{
> +  location_t srcloc = loc.get_location_t ();
> +
> +  if (dump_file && (dump_kind & pflags))
> +::dump_loc (dump_kind, dump_file, srcloc);
> +
> +  if (alt_dump_file && (dump_kind & alt_flags))
> +::dump_loc (dump_kind, alt_dump_file, srcloc);
> +
> +  if (optinfo_enabled_p ())
> +{
> +  optinfo &info = begin_next_optinfo (loc);
> +  info.handle_dump_file_kind (dump_kind);
> +}
> +}
> +
>  /* Dump gimple statement GS with SPC indentation spaces and
> EXTRA_DUMP_FLAGS on the dump streams if DUMP_KIND is enabled.  */
>
> @@ -504,25 +525,8 @@ dump_context::dump_gimple_stmt_loc (dump_flags_t dump_kind,
> dump_flags_t extra_dump_flags,
> gimple *gs, int spc)
>  {
> -  location_t srcloc = loc.get_location_t ();
> -  if (dump_file && (dump_kind & pflags))
> -{
> -  dump_loc (dump_kind, dump_file, srcloc);
> -  print_gimple_stmt (dump_file, gs, spc, dump_flags | extra_dump_flags);
> -}
> -
> -  if (alt_dump_file && (dump_kind & alt_flags))
> -{
> -  dump_loc (dump_kind, alt_dump_file, srcloc);
> -  print_gimple_stmt (alt_dump_file, gs, spc, dump_flags | extra_dump_flags);
> -}
> -
> -  if (optinfo_enabled_p ())
> -{
> -  optinfo &info = begin_next_optinfo (loc);
> -  info.handle_dump_file_kind (dump_kind);
> -  info.add_gimple_stmt (gs, spc, dump_flags | extra_dump_flags);
> -}
> +  dump_loc (dump_kind, loc);
> +  dump_gimple_stmt (dump_kind, extra_dump_flags, gs, spc);
>  }
>
>  /* Dump gimple statement GS with SPC indentation spaces and
> @@ -557,25 +561,8 @@ dump_context::dump_gimple_expr_loc (dump_flags_t dump_kind,
> gimple *gs,
> int spc)
>  {
> -  location_t srcloc = loc.get_location_t ();
> -  if (dump_file && (dump_kind & pflags))
> -{
> -  dump_loc (dump_kind, dump_file, srcloc);
> -  print_gimple_expr (dump_file, gs, spc, dump_flags | extra_dump_flags);
> -}
> -
> -  if (alt_dump_file && (dump_kind & alt_flags))
> -{
> -  dump_loc (dump_kind, alt_dump_file, srcloc);
> -  print_gimple_expr (alt_dump_file, gs, spc, dump_flags | extra_dump_flags);
> -}
> -
> -  if (optinfo_enabled_p ())
> -{
> -  optinfo &info = begin_next_optinfo (loc);
> -  info.handle_dump_file_kind (dump_kind);
> -  info.add_gimple_expr (gs, spc, dump_flags | extra_dump_flags);
> -}
> +  dump_loc (dump_kind, loc);
> +  dump_gimple_expr (dump_kind, extra_dump_flags, gs, spc);
>  }
>
>
> @@ -611,25 +598,8 @@ dump_context::dump_generic_expr_loc (dump_flags_t dump_kind,
>  dump_flags_t extra_dump_flags,
>  tree t)
>  {
> -  location_t srcloc = loc.get_location_t ();
> -  if (dump_file && (dump_kind & pflags))
> -{
> -  dump_loc (dump_kind, dump_file, srcloc);
> -  print_generic_expr (dump_file, t, dump_flags | extra_dump_flags);
> -}
> -
> -  if (alt_dump_file && (dump_kind & alt_flags))
> -{
> -  du

Re: [PATCH 0/5] dump_printf support for middle-end types

2018-07-31 Thread Richard Biener
On Fri, Jul 27, 2018 at 11:47 PM David Malcolm  wrote:
>
> This patch kit converts dump_printf and dump_printf_loc from using
> fprintf etc internally to using a new pretty-printer
> based on pp_format, which supports formatting middle-end types.
>
> In particular, the following codes are implemented (in addition
> to the standard pretty_printer ones):
>
>%E: gimple *:
>Equivalent to: dump_gimple_expr (MSG_*, TDF_SLIM, stmt, 0)
>%G: gimple *:
>Equivalent to: dump_gimple_stmt (MSG_*, TDF_SLIM, stmt, 0)
>%T: tree:
>Equivalent to: dump_generic_expr (MSG_*, TDF_SLIM, arg).
>
> Hence it becomes possible to convert e.g.:
>
>   if (dump_enabled_p ())
> {
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>"not vectorized: different sized vector "
>"types in statement, ");
>   dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, vectype);
>   dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
>   dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, nunits_vectype);
>   dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> }
>   return false;
>
> into a single call to dump_printf_loc:
>
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "not vectorized: different sized vector "
>  "types in statement, %T and %T\n",
>  vectype, nunits_vectype);
>   return false;
>
> Unlike regular pretty-printers, this captures metadata for the
> formatted chunks as appropriate, so that when written out to a
> JSON optimization record, the relevant parts of the message are
> labelled by type, and by source location (so that
> e.g. %G is entirely equivalent to using dump_gimple_stmt).
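A rough sketch of what labelling formatted chunks means, assuming nothing about the actual pp_format internals: a toy formatter that splits a message at each %T and tags every piece as either literal text or an argument. The struct chunk type, the "text"/"tree" labels and the use of plain strings in place of trees are all hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record of one formatted piece of a message, labelled by
   kind, loosely modelled on the per-chunk metadata the patch keeps. */
struct chunk {
  const char *kind;   /* "text" for literal text, "tree" for a %T argument */
  char text[64];
};

/* Split FMT into chunks.  Each "%T" consumes the next string from ARGS
   (standing in for a tree) and yields a "tree" chunk; runs of other
   characters yield "text" chunks.  Returns the number of chunks. */
static int format_chunks (const char *fmt, const char *const *args,
                          struct chunk *out, int max)
{
  int n = 0;
  const char *start = fmt;   /* start of the pending literal run */
  for (const char *p = fmt; ; p++)
    {
      if (*p != '\0' && !(p[0] == '%' && p[1] == 'T'))
        continue;
      /* Flush any literal text before this point. */
      if (p > start && n < max)
        {
          out[n].kind = "text";
          snprintf (out[n].text, sizeof out[n].text, "%.*s",
                    (int) (p - start), start);
          n++;
        }
      if (*p == '\0')
        break;
      if (n < max)
        {
          out[n].kind = "tree";
          snprintf (out[n].text, sizeof out[n].text, "%s", *args);
          n++;
        }
      args++;
      p++;             /* now at 'T'; the loop increment moves past it */
      start = p + 1;
    }
  return n;
}
```

A JSON writer could then emit each chunk with its kind, which is the kind of labelling the cover letter describes for optimization records.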
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>
> OK for trunk?

Nice!  I'm somehow missing 3/5?  Will look into the other ones now.

Richard.

> I'm hoping to use this in a v3 of:
>   "[PATCH 0/5] [RFC v2] Higher-level reporting of vectorization problems"
>  https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00446.html
> where the above might become:
>   return opt_result::failure_at (stmt,
>  "not vectorized: different sized vector "
>  "types in statement, %T and %T\n",
>  vectype, nunits_vectype);
> where opt_result::failure_at would encapsulate the "false", and
> capture an opt_problem * (when dumps are enabled), for the reasons
> discussed in that other kit.
>
> David Malcolm (5):
>   Simplify dump_context by adding a dump_loc member function
>   dumpfile.c: eliminate special-casing of dump_file/alt_dump_file
>   C++: clean up cp_printer
>   c-family: clean up the data tables in c-format.c
>   Formatted printing for dump_* in the middle-end
>
>  gcc/c-family/c-format.c   |  159 +++--
>  gcc/c-family/c-format.h   |1 +
>  gcc/cp/error.c|   46 +-
>  gcc/dump-context.h|   25 +-
>  gcc/dumpfile.c| 1011 ++---
>  gcc/dumpfile.h|   54 +-
>  gcc/optinfo-emit-json.cc  |2 +-
>  gcc/optinfo.cc|  135 +---
>  gcc/optinfo.h |   38 +-
>  gcc/testsuite/gcc.dg/format/gcc_diag-1.c  |   19 +-
>  gcc/testsuite/gcc.dg/format/gcc_diag-10.c |   33 +-
>  11 files changed, 998 insertions(+), 525 deletions(-)
>
> --
> 1.8.5.3
>


Re: [PR 80689] Copy small aggregates element-wise

2018-07-31 Thread Richard Biener
On Tue, Jul 24, 2018 at 3:47 PM Martin Jambor  wrote:
>
> Hi,
>
> I'd like to propose again a new variant of a fix that I sent here in
> November (https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00881.html) that
> avoids store-to-load forwarding stalls in the ImageMagick benchmark by
> expanding copies of very small simple aggregates element-wise rather
> than "by pieces."
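To make the distinction concrete at source level (a sketch only; the patch itself acts during RTL expansion in store_expr_with_bounds): a bulk copy of a small aggregate may be expanded into wide loads spanning several fields, and if those fields were just written by narrower stores, the load typically cannot be forwarded from the store buffer and stalls. Copying element by element keeps each load the same width as the store that produced it. struct pixel is a hypothetical type; it needs four element copies, so it would just fit the value of 4 proposed for x86_64.

```c
#include <stdint.h>
#include <string.h>

/* A small "simple mix of records and arrays" (hypothetical example type). */
struct pixel {
  float rgb[3];
  int32_t alpha;
};

/* Bulk copy: the compiler may expand this "by pieces", e.g. as one wide
   load/store pair that straddles several fields. */
static void copy_bulk (struct pixel *dst, const struct pixel *src)
{
  memcpy (dst, src, sizeof *dst);
}

/* Element-wise copy: four element copies, each load as wide as the
   field it reads. */
static void copy_elementwise (struct pixel *dst, const struct pixel *src)
{
  for (int i = 0; i < 3; i++)
    dst->rgb[i] = src->rgb[i];
  dst->alpha = src->alpha;
}
```

Both functions produce the same bytes; only the shape of the generated memory accesses is expected to differ.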
>
> I have adjusted the patch only a little.  Most notably, there is now only
> one controlling parameter: the maximum number of element copies an
> aggregate may require for this method to be used, rather than a size
> constraint.  On x86_64 that parameter is set to 4; on other
> architectures I leave it at zero, but it could help them too.
>
> I have benchmarked the patch on top of a recent trunk on an AMD Ryzen
> and an Intel Skylake machine using SPEC 2006 and SPEC 2017 CPU suites.
> The only non-noise difference was 538.imagick_r, which on Ryzen and -O2
> improved by 13% (generic march/mtune) and 20% (native march/mtune) and
> on Skylake by 7% and 9% with the same switches.
>
> I have bootstrapped the patch on x86_64-linux and (after changing the
> parameter default) also on ppc64-linux and aarch64-linux.  I have not
> done any benchmarking on non-x86_64 machines.
>
> I'll be grateful for any comments; eventually I'd like to get approval
> to commit it to trunk.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1dcdfb51c47..240934b07d8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11309,6 +11309,10 @@ away for the unroll-and-jam transformation to
be considered profitable.
 @item unroll-jam-max-unroll
 The maximum number of times the outer loop should be unrolled by
 the unroll-and-jam transformation.
+
+@item max-elems-for-elementwise-copy
+The maximum number of elements required to copy a structure and/or
+array element-wise (as opposed to a bulk memory move).

Maybe use 'considered' or 'allowed' instead of 'required'?  It reads
somewhat odd otherwise.

+ /* Copying smallish BLKmode structures with emit_block_move and thus
+by-pieces can result in store-to-load stalls.  So copy some simple
+small aggregates element or field-wise.  */
+ int count = 0;
+ if (GET_MODE (target) == BLKmode
+ && AGGREGATE_TYPE_P (TREE_TYPE (exp))
+ && !TREE_ADDRESSABLE (TREE_TYPE (exp))

do you remember why !TREE_ADDRESSABLE is needed here?

+ && (tree_to_shwi (TYPE_SIZE (TREE_TYPE (exp)))
+ <= (PARAM_VALUE (PARAM_MAX_ELEMS_FOR_ELEMENTWISE_COPY)
+ * MOVE_MAX_PIECES * BITS_PER_UNIT))

I guess that works to limit element counting.  You can use TYPE_SIZE_UNIT
to elide the BITS_PER_UNIT multiplication btw.  Specifying an
element limit to simple_mix_of_records_and_arrays_p instead of counting them
and checking afterwards would be another thing to do (also avoiding the store
in that function).
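The two forms of the size check agree because TYPE_SIZE is measured in bits and TYPE_SIZE_UNIT in units of BITS_PER_UNIT bits. A small self-check, with hypothetical stand-ins: BITS_PER_UNIT_SKETCH = 8, and a MOVE_MAX_PIECES of 16 chosen only for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in; BITS_PER_UNIT is 8 on the usual targets. */
enum { BITS_PER_UNIT_SKETCH = 8 };

/* Form used in the patch: compare the size in bits (TYPE_SIZE). */
static bool fits_bits (long size_bits, long max_elems, long move_max_pieces)
{
  return size_bits <= max_elems * move_max_pieces * BITS_PER_UNIT_SKETCH;
}

/* Suggested form: compare the size in units (TYPE_SIZE_UNIT),
   so no BITS_PER_UNIT multiplication is needed. */
static bool fits_units (long size_units, long max_elems, long move_max_pieces)
{
  return size_units <= max_elems * move_max_pieces;
}
```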

Should the whole thing be guarded with !optimize_insn_for_size ()?

As I said in the original threads I have an issue with splitting the stores
(not so much with splitting the loads).  I realize that it's a bit hard
to retrofit that in your code, you'd be better off passing a type to
emit_block_move[_hints] / move_by_pieces (not sure if the ImageMagick
testcase runs into that or emit_block_move_via_movmem?).  You'd
there keep the basic "piece" iteration but split the load part in case
it happens to "align" on several fields.  You'd then use store_bit_field
to compose a temporary reg for the larger store, as a last resort
making the store part smaller (for example when it's 8 bytes but
the upper 4 bytes are just padding).
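A source-level analogue of that suggestion, assuming a hypothetical struct pair of two 4-byte fields: the loads are split per field, but the values are assembled in a temporary and written back with a single 8-byte store. The proposal above would do this inside emit_block_move/move_by_pieces using store_bit_field; this sketch only illustrates the data flow.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte record made of two 4-byte fields. */
struct pair { uint32_t first; uint32_t second; };
_Static_assert (sizeof (struct pair) == sizeof (uint64_t),
                "sketch assumes no padding");

static void copy_split_loads_one_store (struct pair *dst,
                                        const struct pair *src)
{
  /* Two loads, each matching one field. */
  uint32_t a = src->first;
  uint32_t b = src->second;

  /* Compose the 8-byte value in a temporary.  Copying through memcpy at
     the fields' byte offsets reproduces the struct layout regardless of
     endianness. */
  uint64_t tmp;
  memcpy ((unsigned char *) &tmp + offsetof (struct pair, first), &a, sizeof a);
  memcpy ((unsigned char *) &tmp + offsetof (struct pair, second), &b, sizeof b);

  /* One 8-byte store of the composed value (whether this becomes a single
     machine store is up to the compiler). */
  memcpy (dst, &tmp, sizeof tmp);
}
```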

Alternatively targets could do some store merging using peepholes,
but I think that's not going to work very well because that usually
requires a scratch reg.

Sorry,
Richard.

> Thanks,
>
> Martin
>
>
>
> 2018-07-10  Martin Jambor  
>
> PR target/80689
> * tree-sra.h: New file.
> * ipa-prop.h: Moved declaration of build_ref_for_offset to
> tree-sra.h.
> * expr.c: Include params.h and tree-sra.h.
> (emit_move_elementwise): New function.
> (store_expr_with_bounds): Optionally use it.
> * ipa-cp.c: Include tree-sra.h.
> * params.def (PARAM_MAX_ELEMS_FOR_ELEMENTWISE_COPY): New.
> * doc/invoke.texi: Document max-elems-for-elementwise-copy.
> * config/i386/i386.c (ix86_option_override_internal): Set
> PARAM_MAX_ELEMS_FOR_ELEMENTWISE_COPY to 4.
> * tree-sra.c: Include tree-sra.h.
> (scalarizable_type_p): Renamed to
> simple_mix_of_records_and_arrays_p, made public, renamed the
> second parameter to allow_char_arrays, added count_p parameter.
> (extract_min_max_idx_from_array): New function.
> (completely_scalarize): Moved bits of the function to
> extract_min_max_idx_from_array.
>
> testsuite/
> * gcc.target/i386/pr80689-1.c: New test.
> --

Re: [PATCH] combine: Allow combining two insns to two insns

2018-07-31 Thread H.J. Lu
On Wed, Jul 25, 2018 at 1:28 AM, Richard Biener
 wrote:
> On Tue, Jul 24, 2018 at 7:18 PM Segher Boessenkool
>  wrote:
>>
>> This patch allows combine to combine two insns into two.  This helps
>> in many cases, by reducing instruction path length, and also allowing
>> further combinations to happen.  PR85160 is a typical example of code
>> that it can improve.
>>
>> This patch does not allow such combinations if either of the original
>> instructions was a simple move instruction.  In those cases combining
>> the two instructions increases register pressure without improving the
>> code.  With this move test, register pressure no longer increases
>> noticeably as far as I can tell.
>>
>> (At first I also didn't allow either of the resulting insns to be a
>> move instruction.  But that is actually a very good thing to have, as
>> should have been obvious).
>>
>> Tested for many months; tested on about 30 targets.
>>
>> I'll commit this later this week if there are no objections.
>
> Sounds good - but, _any_ testcase?  Please! ;)
>

Here is a testcase:

For

---
#define N 16
float f[N];
double d[N];
int n[N];

__attribute__((noinline)) void
f3 (void)
{
  int i;
  for (i = 0; i < N; i++)
d[i] = f[i];
}
---

r263067 improved -O3 -mavx2 -mtune=generic -m64 from

.cfi_startproc
vmovaps f(%rip), %xmm2
vmovaps f+32(%rip), %xmm3
vinsertf128 $0x1, f+16(%rip), %ymm2, %ymm0
vcvtps2pd %xmm0, %ymm1
vextractf128 $0x1, %ymm0, %xmm0
vmovaps %xmm1, d(%rip)
vextractf128 $0x1, %ymm1, d+16(%rip)
vcvtps2pd %xmm0, %ymm0
vmovaps %xmm0, d+32(%rip)
vextractf128 $0x1, %ymm0, d+48(%rip)
vinsertf128 $0x1, f+48(%rip), %ymm3, %ymm0
vcvtps2pd %xmm0, %ymm1
vextractf128 $0x1, %ymm0, %xmm0
vmovaps %xmm1, d+64(%rip)
vextractf128 $0x1, %ymm1, d+80(%rip)
vcvtps2pd %xmm0, %ymm0
vmovaps %xmm0, d+96(%rip)
vextractf128 $0x1, %ymm0, d+112(%rip)
vzeroupper
ret
.cfi_endproc

to

.cfi_startproc
vcvtps2pd f(%rip), %ymm0
vmovaps %xmm0, d(%rip)
vextractf128 $0x1, %ymm0, d+16(%rip)
vcvtps2pd f+16(%rip), %ymm0
vmovaps %xmm0, d+32(%rip)
vextractf128 $0x1, %ymm0, d+48(%rip)
vcvtps2pd f+32(%rip), %ymm0
vextractf128 $0x1, %ymm0, d+80(%rip)
vmovaps %xmm0, d+64(%rip)
vcvtps2pd f+48(%rip), %ymm0
vextractf128 $0x1, %ymm0, d+112(%rip)
vmovaps %xmm0, d+96(%rip)
vzeroupper
ret
.cfi_endproc

This is:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86752

H.J.


Re: [PATCH] arm: Testcase for PR86640

2018-07-31 Thread Kyrill Tkachov

Hi Segher,

On 31/07/18 13:14, Segher Boessenkool wrote:

Hi Kyrill,

As before, untested.  Is this okay for trunk, or will you handle it
yourself (or will someone else do it?)


This is ok.

Thanks,
Kyrill



Segher


2018-07-31  Segher Boessenkool  

gcc/testsuite/
PR target/86640
* gcc.target/arm/pr86640.c: New testcase.

---
  gcc/testsuite/gcc.target/arm/pr86640.c | 10 ++
  1 file changed, 10 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/arm/pr86640.c

diff --git a/gcc/testsuite/gcc.target/arm/pr86640.c b/gcc/testsuite/gcc.target/arm/pr86640.c
new file mode 100644
index 000..e104602
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr86640.c
@@ -0,0 +1,10 @@
+/* { dg-options "-O3" } */
+
+/* This ICEd with  -O3 -mfpu=neon -mfloat-abi=hard -march=armv7-a  .  */
+
+char fn1() {
+  long long b[5];
+  for (int a = 0; a < 5; a++)
+b[a] = ~0ULL;
+  return b[3];
+}




Re: [PATCH] combine: Allow combining two insns to two insns

2018-07-31 Thread Christophe Lyon
On Mon, 30 Jul 2018 at 18:09, Segher Boessenkool
 wrote:
>
> On Tue, Jul 24, 2018 at 05:18:41PM +, Segher Boessenkool wrote:
> > This patch allows combine to combine two insns into two.  This helps
> > in many cases, by reducing instruction path length, and also allowing
> > further combinations to happen.  PR85160 is a typical example of code
> > that it can improve.
> >
> > This patch does not allow such combinations if either of the original
> > instructions was a simple move instruction.  In those cases combining
> > the two instructions increases register pressure without improving the
> > code.  With this move test, register pressure no longer increases
> > noticeably as far as I can tell.
> >
> > (At first I also didn't allow either of the resulting insns to be a
> > move instruction.  But that is actually a very good thing to have, as
> > should have been obvious).
> >
> > Tested for many months; tested on about 30 targets.
> >
> > I'll commit this later this week if there are no objections.
>
> Done now, with the testcase at 
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01856.html .
>

Hi,

Since this was committed, I've noticed regressions
on aarch64:
FAIL: gcc.dg/zero_bits_compound-1.c scan-assembler-not \\(and:

on arm-none-linux-gnueabi
FAIL: gfortran.dg/actual_array_constructor_1.f90   -O1  execution test

On aarch64, I've also noticed a few other regressions, but I'm not yet
100% sure it's caused by this patch (bisect running):
gcc.target/aarch64/ashltidisi.c scan-assembler-times asr 4
gcc.target/aarch64/sve/var_stride_2.c -march=armv8.2-a+sve
scan-assembler-times \\tadd\\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 10\\n 2
gcc.target/aarch64/sve/var_stride_4.c -march=armv8.2-a+sve
scan-assembler-times \\tlsl\\tx[0-9]+, x[0-9]+, 10\\n 2
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmeq\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
#0\\.0\\n 7
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmeq\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
z[0-9]+\\.d\\n 14
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmeq\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
#0\\.0\\n 5
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmeq\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
z[0-9]+\\.s\\n 10
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
#0\\.0\\n 21
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
z[0-9]+\\.d\\n 42
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
#0\\.0\\n 15
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
z[0-9]+\\.s\\n 30
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
#0\\.0\\n 21
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
z[0-9]+\\.d\\n 42
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
#0\\.0\\n 15
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
z[0-9]+\\.s\\n 30
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
#0\\.0\\n 21
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
z[0-9]+\\.d\\n 42
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
#0\\.0\\n 15
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
z[0-9]+\\.s\\n 30
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
#0\\.0\\n 21
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
z[0-9]+\\.d\\n 42
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
#0\\.0\\n 15
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s,
z[0-9]+\\.s\\n 30
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmne\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
#0\\.0\\n 21
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmne\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d,
z[0-9]+\\.d\\n 42
gcc.target/aarch64/sve/vcond_4.c -march=armv8.2-a+sve
scan-assembler-times \\tfcmne\\tp[0-9]+\

Re: [PATCH] Fix the damage done by my other patch from yesterday to strlenopt-49.c

2018-07-31 Thread Bernd Edlinger
On 07/30/18 15:03, Richard Biener wrote:
> On Mon, 30 Jul 2018, Bernd Edlinger wrote:
> 
>> Hi,
>>
>> this is how I would like to handle the over length strings issue in the C FE.
>> If the string constant is exactly the right length and ends in one explicit
>> NUL character, shorten it by one character.
>>
>> I thought Martin would be working on it,  but as this is a really simple fix,
>> I would dare to send it to gcc-patches anyway, hope you don't mind...
>>
>> The patch is relative to the other patch here: 
>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html
>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
> 
> I'll leave this to FE maintainers but can I ask you to verify the
> (other) FEs do not leak this kind of invalid initializers to the
> middle-end?  I suggest to put this verification in
> output_constructor which otherwise happily truncates initializers
> with excess size.  There's also gimplification which might elide
> a = { "abcd", "cdse" }; to  a.x = "abcd"; a.y = "cdse"; but
> hopefully there the GIMPLE verifier (verify_gimple_assign_single)
> verifies this - well, it only dispatches to useless_type_conversion_p
> (lhs_type, rhs1_type) for this case, but non-flexarrays should be
> handled fine there.
> 

Okay, this is what I am currently playing with.
There is still a major fault in the Fortran FE, but I believe sanitizing
the middle-end is worth it.

IMHO sanitizing should have priority over new optimizations :(


Thanks
Bernd.
Index: gcc/varasm.c
===
--- gcc/varasm.c	(Revision 263072)
+++ gcc/varasm.c	(Arbeitskopie)
@@ -4774,6 +4774,29 @@ initializer_constant_valid_for_bitfield_p (tree va
   return false;
 }
 
+/* Check if a STRING_CST fits into the field.
+   Tolerate only the case when the NUL termination
+   does not fit into the field.   */
+
+bool
+check_string_literal (tree string, unsigned HOST_WIDE_INT size)
+{
+  tree eltype = TREE_TYPE (TREE_TYPE (string));
+  unsigned HOST_WIDE_INT elts = tree_to_uhwi (TYPE_SIZE_UNIT (eltype));
+  const char *p = TREE_STRING_POINTER (string);
+  int len = TREE_STRING_LENGTH (string);
+
+  if (elts != 1 && elts != 2 && elts != 4)
+return false;
+  if (len <= 0 || len % elts != 0)
+return false;
+  if ((unsigned)len != size && (unsigned)len != size + elts)
+return false;
+  if (memcmp (p + len - elts, "\0\0\0\0", elts) != 0)
+return false;
+  return true;
+}
+
 /* output_constructor outer state of relevance in recursive calls, typically
for nested aggregate bitfields.  */
 
@@ -4942,6 +4965,7 @@ output_constant (tree exp, unsigned HOST_WIDE_INT
 	case STRING_CST:
 	  thissize
 	= MIN ((unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp), size);
+	  gcc_checking_assert (check_string_literal (exp, thissize));
 	  assemble_string (TREE_STRING_POINTER (exp), thissize);
 	  break;
 	case VECTOR_CST:
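For experimenting with the acceptance rule outside the compiler, here is a standalone transcription of check_string_literal with the tree accessors replaced by plain parameters (p, len, elts and size are hypothetical stand-ins for TREE_STRING_POINTER, TREE_STRING_LENGTH, the element size and the field size). It accepts a literal whose stored length equals the field size, or exceeds it by exactly one terminating element, provided the last element is NUL.

```c
#include <stdbool.h>
#include <string.h>

static bool check_string_literal_sketch (const char *p, int len,
                                         unsigned long elts,
                                         unsigned long size)
{
  /* Only 1-, 2- and 4-byte character types are handled. */
  if (elts != 1 && elts != 2 && elts != 4)
    return false;
  /* The literal must consist of whole elements. */
  if (len <= 0 || len % elts != 0)
    return false;
  /* Either the whole literal fits, or exactly the terminator is cut off. */
  if ((unsigned long) len != size && (unsigned long) len != size + elts)
    return false;
  /* The last element must be a NUL of the element's width. */
  return memcmp (p + len - elts, "\0\0\0\0", elts) == 0;
}
```

For example, a literal "123" stored with its terminator (length 4) is accepted for a 4-byte field and, as the tolerated case, for a 3-byte field such as char a[3] = "123"; a 3-byte STRING_CST without the NUL is rejected.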


[PATCH] Make GO string literals properly NUL terminated

2018-07-31 Thread Bernd Edlinger
Hi,


Could someone please review this patch and check it in to the Go FE?


Thanks
Bernd.
2018-07-31  Bernd Edlinger  

	* go-gcc.cc (Gcc_backend::string_constant_expression): Make string
	literal properly NUL terminated.

diff -pur gcc/go/go-gcc.cc gcc/go/go-gcc.cc
--- gcc/go/go-gcc.cc	2018-06-28 19:46:36.0 +0200
+++ gcc/go/go-gcc.cc	2018-07-31 12:52:24.690236476 +0200
@@ -1394,7 +1394,7 @@ Gcc_backend::string_constant_expression(
 	  TYPE_QUAL_CONST);
   tree string_type = build_array_type(const_char_type, index_type);
   TYPE_STRING_FLAG(string_type) = 1;
-  tree string_val = build_string(val.length(), val.data());
+  tree string_val = build_string(val.length() + 1, val.data());
   TREE_TYPE(string_val) = string_type;
 
   return this->make_expression(string_val);


[PATCH] arm: Testcase for PR86640

2018-07-31 Thread Segher Boessenkool
Hi Kyrill,

As before, untested.  Is this okay for trunk, or will you handle it
yourself (or will someone else do it?)


Segher


2018-07-31  Segher Boessenkool  

gcc/testsuite/
PR target/86640
* gcc.target/arm/pr86640.c: New testcase.

---
 gcc/testsuite/gcc.target/arm/pr86640.c | 10 ++
 1 file changed, 10 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr86640.c

diff --git a/gcc/testsuite/gcc.target/arm/pr86640.c b/gcc/testsuite/gcc.target/arm/pr86640.c
new file mode 100644
index 000..e104602
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr86640.c
@@ -0,0 +1,10 @@
+/* { dg-options "-O3" } */
+
+/* This ICEd with  -O3 -mfpu=neon -mfloat-abi=hard -march=armv7-a  .  */
+
+char fn1() {
+  long long b[5];
+  for (int a = 0; a < 5; a++)
+b[a] = ~0ULL;
+  return b[3];
+}
-- 
1.8.3.1



[PATCH] [Ada] Make middle-end string literals NUL terminated

2018-07-31 Thread Bernd Edlinger
Hi!


This fixes a couple of STRING_CSTs that are not explicitly NUL terminated.
These were caught by a new check in varasm.c that I am currently working on.

Having a NUL terminated string does not change the binary output, but it
makes things easier for the middle-end.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.
2018-07-31  Bernd Edlinger  

	* gcc-interface/trans.c (gnat_to_gnu): Make string literal properly
	NUL terminated.
	* gcc-interface/utils2.c (expand_sloc): Likewise.

diff -pur gcc/ada/gcc-interface/trans.c gcc/ada/gcc-interface/trans.c
--- gcc/ada/gcc-interface/trans.c	2018-07-17 10:10:04.0 +0200
+++ gcc/ada/gcc-interface/trans.c	2018-07-31 11:16:27.350728886 +0200
@@ -6079,7 +6079,7 @@ gnat_to_gnu (Node_Id gnat_node)
 	 where GCC will want to treat it as a C string.  */
 	  string[i] = 0;
 
-	  gnu_result = build_string (length, string);
+	  gnu_result = build_string (length + 1, string);
 
 	  /* Strings in GCC don't normally have types, but we want
 	 this to not be converted to the array type.  */
diff -pur gcc/ada/gcc-interface/utils2.c gcc/ada/gcc-interface/utils2.c
--- gcc/ada/gcc-interface/utils2.c	2017-12-21 07:57:41.0 +0100
+++ gcc/ada/gcc-interface/utils2.c	2018-07-31 11:44:01.517117923 +0200
@@ -1844,7 +1844,7 @@ expand_sloc (Node_Id gnat_node, tree *fi
 }
 
   const int len = strlen (str);
-  *filename = build_string (len, str);
+  *filename = build_string (len + 1, str);
   TREE_TYPE (*filename) = build_array_type (char_type_node,
 	build_index_type (size_int (len)));
   *line = build_int_cst (NULL_TREE, line_number);
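Why the binary output should stay the same, as a toy model: assuming output_constant emits MIN (length, size) bytes of a STRING_CST and zero-fills the remainder of the field, a 7-character file name in an 8-element array produces the same 8 bytes whether the STRING_CST has length 7 (before) or 8 (after). emit_string_constant is a hypothetical helper, not a varasm.c function.

```c
#include <stddef.h>
#include <string.h>

/* Toy model: emit MIN (LEN, SIZE) bytes of STR into OUT, then zero-fill
   the rest of the SIZE-byte field.  Returns the number of bytes written. */
static size_t emit_string_constant (unsigned char *out, size_t size,
                                    const char *str, size_t len)
{
  size_t thissize = len < size ? len : size;
  memcpy (out, str, thissize);
  memset (out + thissize, 0, size - thissize);
  return size;
}
```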


[Patch, fortran] A first small step towards CFI descriptor implementation

2018-07-31 Thread Paul Richard Thomas
Daniel Celis Garza and Damian Rouson have developed a runtime library
and include file for the TS 29113 and F2018 C descriptors.
https://github.com/sourceryinstitute/ISO_Fortran_binding

The ordering of types is different to the current 'bt' enum in
libgfortran.h. This patch interchanges BT_DERIVED and BT_CHARACTER to
fix this.

Regtests on FC28/x86_64. OK for trunk?

Cheers

Paul

2018-07-31  Paul Thomas  

* gcc/fortran/libgfortran.h : In bt enum interchange BT_DERIVED
and BT_CHARACTER for CFI descriptor compatibility (TS 29113).
Index: gcc/fortran/libgfortran.h
===
*** gcc/fortran/libgfortran.h	(revision 262444)
--- gcc/fortran/libgfortran.h	(working copy)
*** typedef enum
*** 171,177 
 used in the run-time library for IO.  */
  typedef enum
  { BT_UNKNOWN = 0, BT_INTEGER, BT_LOGICAL, BT_REAL, BT_COMPLEX,
!   BT_DERIVED, BT_CHARACTER, BT_CLASS, BT_PROCEDURE, BT_HOLLERITH, BT_VOID,
BT_ASSUMED, BT_UNION
  }
  bt;
--- 171,177 
 used in the run-time library for IO.  */
  typedef enum
  { BT_UNKNOWN = 0, BT_INTEGER, BT_LOGICAL, BT_REAL, BT_COMPLEX,
!   BT_CHARACTER, BT_DERIVED, BT_CLASS, BT_PROCEDURE, BT_HOLLERITH, BT_VOID,
BT_ASSUMED, BT_UNION
  }
  bt;
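Since no enumerator other than BT_UNKNOWN has an explicit initialiser, the values follow from declaration order. After the interchange BT_CHARACTER is 5 and BT_DERIVED is 6 (previously the reverse), while every other enumerator keeps its value. A quick check reproducing the patched enum:

```c
/* The bt enum as it reads after the patch. */
typedef enum
{ BT_UNKNOWN = 0, BT_INTEGER, BT_LOGICAL, BT_REAL, BT_COMPLEX,
  BT_CHARACTER, BT_DERIVED, BT_CLASS, BT_PROCEDURE, BT_HOLLERITH, BT_VOID,
  BT_ASSUMED, BT_UNION
}
bt;
```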

