Re: Enabling -ftree-slp-vectorize on -O2/Os

2018-05-26 Thread Richard Biener
On May 27, 2018 1:25:25 AM GMT+02:00, Allan Sandfeld Jensen wrote:
>On Sunday, 27 May 2018 00:05:32 CEST Segher Boessenkool wrote:
>> On Sat, May 26, 2018 at 11:32:29AM +0200, Allan Sandfeld Jensen wrote:
>> > I brought this subject up earlier, and was told to suggest it again for
>> > gcc 9, so I have attached the preliminary changes.
>> > 
>> > My studies have shown that with generic x86-64 optimization it reduces
>> > binary size by around 0.5%, and when optimizing for x64 targets with
>> > SSE4 or better, it reduces binary size by 2-3% on average. The
>> > performance changes are negligible however*, and I haven't been able to
>> > detect changes in compile time big enough to penetrate general noise on
>> > my platform, but perhaps someone has a better setup for that?
>> > 
>> > * I believe that is because it currently works best on non-optimized code,
>> > it is better at big basic blocks doing all kinds of things than tightly
>> > written inner loops.
>> > 
>> > Anything else I should test or report?
>> 
>> What does it do on other architectures?
>> 
>I believe NEON would do the same as SSE4, but I can do a check. For 
>architectures without SIMD it essentially does nothing.

By default it combines integer ops where possible into word_mode registers. So 
yes, almost nothing. 

Richard. 

>'Allan
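
As a rough illustration of the word_mode remark above (a sketch, not taken from the patch): on a target without SIMD, four adjacent byte-wide bitwise operations can in principle be carried out as a single 32-bit operation. The function names below are hypothetical, and whether GCC actually performs this combination depends on the target and its cost model.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar form: four independent byte-wide ANDs in one basic block. */
static void and4_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    dst[0] = a[0] & b[0];
    dst[1] = a[1] & b[1];
    dst[2] = a[2] & b[2];
    dst[3] = a[3] & b[3];
}

/* Hand-written equivalent of the combined form: one word-sized AND.
   memcpy is used so the sketch has no alignment or aliasing problems. */
static void and4_word(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    uint32_t wa, wb, wr;
    memcpy(&wa, a, sizeof wa);
    memcpy(&wb, b, sizeof wb);
    wr = wa & wb;
    memcpy(dst, &wr, sizeof wr);
}

/* Returns 1 when both forms produce the same four bytes. */
static int and4_forms_agree(const uint8_t *a, const uint8_t *b)
{
    uint8_t r1[4], r2[4];
    assert(a != NULL && b != NULL);
    and4_bytes(r1, a, b);
    and4_word(r2, a, b);
    return memcmp(r1, r2, sizeof r1) == 0;
}
```

Bitwise operations are the easy case because no carries cross byte boundaries; byte-wide additions could not be merged this simply.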



[Bug libfortran/85906] Conditional jump depends on uninitialized value in write_decimal / write_integer

2018-05-26 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85906

--- Comment #7 from Jerry DeLisle  ---
Author: jvdelisle
Date: Sun May 27 03:22:11 2018
New Revision: 260802

URL: https://gcc.gnu.org/viewcvs?rev=260802&root=gcc&view=rev
Log:
2018-05-26  Jerry DeLisle  

Backport from trunk.
PR libgfortran/85906
* io/write.c (write_integer): Initialise the fnode format to
FMT_NONE, used for list directed write.
(BUF_STACK_SZ): Bump default buffer size up to avoid allocs on
small stuff.

2018-05-26  Jerry DeLisle  

Backport from trunk.
PR libgfortran/85840
* io/write.c (write_float_0): Use separate local variable for
the float string length.

Modified:
branches/gcc-8-branch/libgfortran/ChangeLog
branches/gcc-8-branch/libgfortran/io/write.c

[Bug fortran/85840] Memory leak in write.c

2018-05-26 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85840

--- Comment #13 from Jerry DeLisle  ---
Author: jvdelisle
Date: Sun May 27 03:22:11 2018
New Revision: 260802

URL: https://gcc.gnu.org/viewcvs?rev=260802&root=gcc&view=rev
Log:
2018-05-26  Jerry DeLisle  

Backport from trunk.
PR libgfortran/85906
* io/write.c (write_integer): Initialise the fnode format to
FMT_NONE, used for list directed write.
(BUF_STACK_SZ): Bump default buffer size up to avoid allocs on
small stuff.

2018-05-26  Jerry DeLisle  

Backport from trunk.
PR libgfortran/85840
* io/write.c (write_float_0): Use separate local variable for
the float string length.

Modified:
branches/gcc-8-branch/libgfortran/ChangeLog
branches/gcc-8-branch/libgfortran/io/write.c

Re: Enabling -ftree-slp-vectorize on -O2/Os

2018-05-26 Thread Segher Boessenkool
On Sun, May 27, 2018 at 01:25:25AM +0200, Allan Sandfeld Jensen wrote:
> On Sunday, 27 May 2018 00:05:32 CEST Segher Boessenkool wrote:
> > On Sat, May 26, 2018 at 11:32:29AM +0200, Allan Sandfeld Jensen wrote:
> > > I brought this subject up earlier, and was told to suggest it again for
> > > gcc 9, so I have attached the preliminary changes.
> > > 
> > > My studies have shown that with generic x86-64 optimization it reduces
> > > binary size by around 0.5%, and when optimizing for x64 targets with
> > > SSE4 or better, it reduces binary size by 2-3% on average. The
> > > performance changes are negligible however*, and I haven't been able to
> > > detect changes in compile time big enough to penetrate general noise on
> > > my platform, but perhaps someone has a better setup for that?
> > > 
> > > * I believe that is because it currently works best on non-optimized code,
> > > it is better at big basic blocks doing all kinds of things than tightly
> > > written inner loops.
> > > 
> > > Anything else I should test or report?
> > 
> > What does it do on other architectures?
> > 
> I believe NEON would do the same as SSE4, but I can do a check. For 
> architectures without SIMD it essentially does nothing.

Sorry, I wasn't clear.  What does it do to performance on other
architectures?  Is it (almost) always a win (or neutral)?  If not, it
doesn't belong in -O2, not for the generic options at least.

(We'll test it on Power soon, it's weekend now :-) ).


Segher


[Bug libstdc++/85930] [8/9 Regression] Misaligned reference created in shared_ptr_base.h with -fno-rtti

2018-05-26 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85930

Jonathan Wakely  changed:

           What    |Removed                       |Added

             Status|NEW                           |ASSIGNED
      Known to work|                              |7.3.0
           Assignee|unassigned at gcc dot gnu.org |redi at gcc dot gnu.org
   Target Milestone|---                           |8.2
            Summary|Misaligned reference          |[8/9 Regression] Misaligned
                   |created in                    |reference created in
                   |shared_ptr_base.h with        |shared_ptr_base.h with
                   |-fno-rtti                     |-fno-rtti
      Known to fail|                              |8.1.0, 9.0

[Bug libstdc++/85930] Misaligned reference created in shared_ptr_base.h with -fno-rtti

2018-05-26 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85930

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-05-27
 Ever confirmed|0   |1

[Bug c++/85940] New: Address of label breaks ISO C++ program despite non-GNU dialect and pedantic

2018-05-26 Thread hstong at ca dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85940

Bug ID: 85940
   Summary: Address of label breaks ISO C++ program despite
non-GNU dialect and pedantic
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: rejects-valid
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hstong at ca dot ibm.com
  Target Milestone: ---

The address of label extension breaks well-formed C++ programs.
GCC confuses the logical AND operator for the extension in the source below.
MSVC works.

### SOURCE (<stdin>):
bool f(bool x) {
x:
  return (bool()) && x;
}


### COMPILER INVOCATION COMMAND:
g++ -fsyntax-only -xc++ -std=c++11 -pedantic -


### ACTUAL OUTPUT:
<stdin>: In function 'bool f(bool)':
<stdin>:3:22: warning: taking the address of a label is non-standard
[-Wpedantic]
<stdin>:3:22: error: invalid cast to function type 'bool()'


### EXPECTED OUTPUT:
(Clean compile).


### COMPILER VERSION INFO (g++ -v):
Using built-in specs.
COLLECT_GCC=/opt/wandbox/gcc-head/bin/g++
COLLECT_LTO_WRAPPER=/opt/wandbox/gcc-head/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../source/configure --prefix=/opt/wandbox/gcc-head
--enable-languages=c,c++ --disable-multilib --without-ppl --without-cloog-ppl
--enable-checking=release --disable-nls --enable-lto
LDFLAGS=-Wl,-rpath,/opt/wandbox/gcc-head/lib,-rpath,/opt/wandbox/gcc-head/lib64,-rpath,/opt/wandbox/gcc-head/lib32
Thread model: posix
gcc version 9.0.0 20180525 (experimental) (GCC)
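
For context: in GNU C and GNU C++, a unary `&&` applied to a label name yields the label's address (the "labels as values" extension). In the reported source, `x` names both a parameter and a label, so `(bool()) && x` can be read either as a logical AND or as a cast to the function type `bool()` applied to `&&x`; the error message suggests GCC took the second reading. Below is a minimal sketch of the extension itself, written in GNU C rather than C++. The function and label names are made up, and it needs GCC or a compatible compiler such as Clang.

```c
#include <assert.h>

/* Jump through a table of label addresses (GNU extension, not ISO C). */
static int dispatch(int op)
{
    /* Unary && takes the address of a label in the current function. */
    static void *const targets[] = { &&do_zero, &&do_one };

    assert(op == 0 || op == 1);
    goto *targets[op];          /* computed goto */

do_zero:
    return 10;
do_one:
    return 20;
}
```

With `-pedantic`, GCC warns that taking the address of a label is non-standard, which is the same warning that appears in the report.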

Re: Enabling -ftree-slp-vectorize on -O2/Os

2018-05-26 Thread Allan Sandfeld Jensen
On Sunday, 27 May 2018 00:05:32 CEST Segher Boessenkool wrote:
> On Sat, May 26, 2018 at 11:32:29AM +0200, Allan Sandfeld Jensen wrote:
> > I brought this subject up earlier, and was told to suggest it again for
> > gcc 9, so I have attached the preliminary changes.
> > 
> > My studies have shown that with generic x86-64 optimization it reduces
> > binary size by around 0.5%, and when optimizing for x64 targets with
> > SSE4 or better, it reduces binary size by 2-3% on average. The
> > performance changes are negligible however*, and I haven't been able to
> > detect changes in compile time big enough to penetrate general noise on
> > my platform, but perhaps someone has a better setup for that?
> > 
> > * I believe that is because it currently works best on non-optimized code,
> > it is better at big basic blocks doing all kinds of things than tightly
> > written inner loops.
> > 
> > Anything else I should test or report?
> 
> What does it do on other architectures?
> 
> 
I believe NEON would do the same as SSE4, but I can do a check. For 
architectures without SIMD it essentially does nothing.

'Allan
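
To make the "big basic blocks" point concrete, here is a sketch of the kind of straight-line code that basic-block (SLP) vectorization looks for: several isomorphic operations on adjacent memory, with no loop involved. The struct and function are hypothetical. With `-O2 -ftree-slp-vectorize` on x86-64, GCC may turn the four additions into one packed add, but that depends on the target and cost model and is not guaranteed.

```c
#include <assert.h>

struct vec4 { float x, y, z, w; };

/* Four independent adds on adjacent fields in a single basic block:
   a typical SLP candidate, as opposed to a loop-vectorizer candidate. */
static void vec4_add(struct vec4 *out, const struct vec4 *a,
                     const struct vec4 *b)
{
    assert(out && a && b);
    out->x = a->x + b->x;
    out->y = a->y + b->y;
    out->z = a->z + b->z;
    out->w = a->w + b->w;
}
```

Comparing the output of `gcc -O2 -S` with and without `-ftree-slp-vectorize` on a function like this is a quick way to see what the option changes on a given target.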




Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP

2018-05-26 Thread Segher Boessenkool
On Sat, May 26, 2018 at 11:09:24AM +0100, Richard Sandiford wrote:
> On the wider point about changing the way call clobber information
> is represented: I agree it would be good to generalise what we have
> now.  But if possible I think we should avoid target hooks that take
> a specific call, and instead make it an inherent part of the call insn
> itself, much like CALL_INSN_FUNCTION_USAGE is now.  E.g. we could add
> a field that points to an ABI description, with -fipa-ra effectively
> creating ad-hoc ABIs.  That ABI description could start out with
> whatever we think is relevant now and could grow over time.

Somewhat related: there still is PR68150 open for problems with
HARD_REGNO_CALL_PART_CLOBBERED in postreload-gcse (it ignores it).


Segher


[Bug target/85918] Conversions to/from [unsigned] long long are not vectorized for AVX512DQ target

2018-05-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85918

--- Comment #3 from Jakub Jelinek  ---
Author: jakub
Date: Sat May 26 22:04:50 2018
New Revision: 260797

URL: https://gcc.gnu.org/viewcvs?rev=260797&root=gcc&view=rev
Log:
PR target/85918
* config/i386/i386.md (fixunssuffix, floatunssuffix): New code
attributes.
* config/i386/sse.md
(float2):
Rename to ...
   
(float2):
... this.
   
(float2):
Rename to ...
   
(float2):
... this.
(*floatv2div2sf2): Rename to ...
(*floatv2div2sf2): ... this.
(floatv2div2sf2_mask): Rename to ...
(floatv2div2sf2_mask): ... this.
(*floatv2div2sf2_mask_1): Rename to ...
(*floatv2div2sf2_mask_1): ... this.
(fix_truncv8dfv8si2): Rename
to ...
(fix_truncv8dfv8si2):
... this.
   
(fix_trunc2):
Rename to ...
   
(fix_trunc2):
... this.
   
(fix_trunc2):
Rename to ...
   
(fix_trunc2):
... this.
(fix_truncv2sfv2di2): Rename to ...
(fix_truncv2sfv2di2): ... this.
(vec_pack_ufix_trunc_): Use gen_fixuns_truncv8dfv8si2 instead of
gen_ufix_truncv8dfv8si2.
* config/i386/i386-builtin.def (__builtin_ia32_cvttpd2uqq256_mask,
__builtin_ia32_cvttpd2uqq128_mask, __builtin_ia32_cvttps2uqq256_mask,
__builtin_ia32_cvttps2uqq128_mask, __builtin_ia32_cvtuqq2ps256_mask,
__builtin_ia32_cvtuqq2ps128_mask, __builtin_ia32_cvtuqq2pd256_mask,
__builtin_ia32_cvtuqq2pd128_mask, __builtin_ia32_cvttpd2udq512_mask,
__builtin_ia32_cvtuqq2ps512_mask, __builtin_ia32_cvtuqq2pd512_mask,
__builtin_ia32_cvttps2uqq512_mask, __builtin_ia32_cvttpd2uqq512_mask):
Use fixuns instead ufix or floatuns instead ufloat in CODE_FOR_ names.

* gcc.target/i386/avx512dq-pr85918.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/avx512dq-pr85918.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386-builtin.def
trunk/gcc/config/i386/i386.md
trunk/gcc/config/i386/sse.md
trunk/gcc/testsuite/ChangeLog

Re: Enabling -ftree-slp-vectorize on -O2/Os

2018-05-26 Thread Segher Boessenkool
On Sat, May 26, 2018 at 11:32:29AM +0200, Allan Sandfeld Jensen wrote:
> I brought this subject up earlier, and was told to suggest it again for
> gcc 9, so I have attached the preliminary changes.
> 
> My studies have shown that with generic x86-64 optimization it reduces
> binary size by around 0.5%, and when optimizing for x64 targets with
> SSE4 or better, it reduces binary size by 2-3% on average. The
> performance changes are negligible however*, and I haven't been able to
> detect changes in compile time big enough to penetrate general noise on
> my platform, but perhaps someone has a better setup for that?
> 
> * I believe that is because it currently works best on non-optimized code,
> it is better at big basic blocks doing all kinds of things than tightly
> written inner loops.
> 
> Anything else I should test or report?

What does it do on other architectures?


Segher


[Bug target/85939] New: -mstackrealign does not realign stack with local __m64 variable

2018-05-26 Thread fw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85939

Bug ID: 85939
   Summary: -mstackrealign does not realign stack with local __m64
variable
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: fw at gcc dot gnu.org
  Target Milestone: ---
Target: i386-*-linux-gnu

Consider this test case:

#include 

int f1 (__m64 *);

int
f2 (void)
{
  __m64 v;
  return f1 (&v);
}

My understanding is that the i386 ABI requires 8-byte alignment for __m64
objects.  However, with “gcc -m32 -mstackrealign -O2”, I get this:

        .p2align 4,,15
        .globl  f2
        .type   f2, @function
f2:
.LFB504:
        .cfi_startproc
        subl    $40, %esp
        .cfi_def_cfa_offset 44
        leal    20(%esp), %eax
        pushl   %eax
        .cfi_def_cfa_offset 48
        call    f1
        addl    $44, %esp
        .cfi_def_cfa_offset 4
        ret
        .cfi_endproc
.LFE504:
        .size   f2, .-f2

This will not produce a correctly aligned object if the stack alignment is off
by 4 (or 12) bytes.
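
The arithmetic behind that statement can be checked with a small sketch. It assumes a caller that keeps %esp 16-byte aligned at the call instruction, so that %esp on entry to f2 is 4 below a multiple of 16 once the return address has been pushed. The helper names and the sample address are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the local `v` in f2, given %esp at function entry.
   Mirrors the listing: subl $40, %esp; leal 20(%esp), %eax. */
static uint32_t addr_of_v(uint32_t esp_on_entry)
{
    assert(esp_on_entry >= 40);
    return esp_on_entry - 40 + 20;
}

/* The report assumes __m64 objects need 8-byte alignment. */
static int is_8_byte_aligned(uint32_t addr)
{
    return addr % 8 == 0;
}
```

For an entry value such as 0xffffd00c (a multiple of 16, minus 4), `addr_of_v` gives an 8-byte-aligned address. Shifting the entry value by 4 or 12 bytes gives an address that is only 4-byte aligned, while a shift of 8 happens to stay aligned. Since the code never realigns %esp, the result depends entirely on the caller.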

[Bug web/85917] GCC 8 Changes page fails to mention change of default mode for C

2018-05-26 Thread Arfrever.FTA at GMail dot Com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85917

--- Comment #2 from Arfrever Frehtes Taifersar Arahesis  ---
Even such minor changes could be mentioned in that page for completeness.

[Bug target/85915] -mfunction-return=thunk causes multiple definition of `__x86_return_thunk'

2018-05-26 Thread Arfrever.FTA at GMail dot Com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85915

--- Comment #7 from Arfrever Frehtes Taifersar Arahesis  ---
Ebuild for GCC 7 branch is not available in Gentoo.
I guess that the relevant commit is:
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=258647
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=b67700aa0344abbbdb50d150c628475e747b316f
I have backported this commit and 6 earlier commits to GCC 7.3.0 and this
patched GCC 7.3.0 now seems to work (both with -mfunction-return=thunk and with
default -mfunction-return=keep).

I guess that the same ABI incompatibility will exist between vanilla GCC 7.3.0
and future GCC 7.4.0 as between vanilla GCC 7.3.0 and GCC 8.1.0.

If it is not possible to have ABI compatibility while still keeping fixed
behavior (not generating aliases for function return thunks), then I suggest
that documentation mention this ABI incompatibility.
The relevant places would be:
https://gcc.gnu.org/gcc-7/changes.html (mentioning ABI incompatibility between
GCC 7.3.0 and >=7.4.0)
https://gcc.gnu.org/gcc-7/porting_to.html (mentioning ABI incompatibility
between GCC 7.3.0 and >=7.4.0)
https://gcc.gnu.org/gcc-8/changes.html (mentioning ABI incompatibility between
GCC 7.3.0 and >=8)
https://gcc.gnu.org/gcc-8/porting_to.html (mentioning ABI incompatibility
between GCC 7.3.0 and >=8)

[PATCH] DWARF5: Don't generate DW_AT_loclists_base for split compile unit DIEs.

2018-05-26 Thread Mark Wielaard
The loclists_base attribute is used to point to the beginning of the
loclists index of a DWARF5 loclists table when using DW_FORM_loclistsx.
For split compile units the base is not given by the attribute, but is
either the first (and only) index in the .debug_loclists section, or
(when placed in a .dwp file) given by the DW_SECT_LOCLISTS row in the
.debug_cu_index section.

The loclists_base attribute is only valid for the full (or skeleton)
compile unit DIE in the main (relocatable) object. But GCC only ever
generates a loclists table index for the .debug_loclists section put
into the split DWARF .dwo file.

For split compile unit DIEs it is confusing (and not according to spec)
to also have a DW_AT_loclists_base attribute (which might be wrong,
since its relocatable offset won't actually be relocated).

gcc/ChangeLog

* dwarf2out.c (dwarf2out_finish): Remove generation of
DW_AT_loclists_base.
---

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index c05bfe4..103ded0 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -31292,11 +31292,17 @@ dwarf2out_finish (const char *)
   if (dwarf_split_debug_info)
 {
   if (have_location_lists)
{
- if (dwarf_version >= 5)
-   add_AT_loclistsptr (comp_unit_die (), DW_AT_loclists_base,
-   loc_section_label);
+ /* Since we generate the loclists in the split DWARF .dwo
+file itself, we don't need to generate a loclists_base
+attribute for the split compile unit DIE.  That attribute
+(and using relocatable sec_offset FORMs) isn't allowed
+for a split compile unit.  Only if the .debug_loclists
+section was in the main file, would we need to generate a
+loclists_base attribute here (for the full or skeleton
+unit DIE).  */
+
  /* optimize_location_lists calculates the size of the lists,
 so index them first, and assign indices to the entries.
 Although optimize_location_lists will remove entries from
 the table, it only does so for duplicates, and therefore


[Bug objc/50909] Process "#pragma options align=reset" correctly on Mac OS X

2018-05-26 Thread rudolf.chrispens at web dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50909

Rudolf  changed:

   What|Removed |Added

 CC||rudolf.chrispens at web dot de

--- Comment #12 from Rudolf  ---
This is still an issue... 7 YEARS LATER!?

I tried to change the permission on USB.h just to get around this bug...

with something like this:
#if defined(__clang__) || defined(__llvm__)
#pragma options align=reset
#else
#pragma pack()
#endif

but sadly could not do so atm.

I am using gcc 8.1.0. please fix this.

[Bug c++/58372] internal compiler error: ix86_compute_frame_layout

2018-05-26 Thread martchus at gmx dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58372

martchus at gmx dot net changed:

   What|Removed |Added

 CC||martchus at gmx dot net

--- Comment #13 from martchus at gmx dot net ---
I also came across this issue when compiling Qt 5.11.0. The error message you
get when compiling Qt does indeed look very similar. I configured my compiler
similarly to Bitterblue, but I'm already using GCC 8.1.0.

Since the official Qt binaries are apparently not affected, I assume the
problem is only present when using SJLJ.

[Bug fortran/85938] New: Spurious assert failure for matmul with reshaped array

2018-05-26 Thread stephan.kramer at imperial dot ac.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85938

Bug ID: 85938
   Summary: Spurious assert failure for matmul with reshaped array
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: stephan.kramer at imperial dot ac.uk
  Target Milestone: ---

The following program

program foo

  real, dimension(9) :: A
  real, dimension(3) :: b
  integer :: n = 3

  A = 1.0
  b = 1.0
  print *, matmul(reshape(A, (/ n, n /)), b)

end program

compiled with or without optimisation, produces the following runtime
assertion failure in libgfortran:

$ gfortran test.f90 
$ ./a.out 
a.out: ../../../src/libgfortran/generated/matmul_r4.c:651: matmul_r4_avx2:
Assertion `GFC_DESCRIPTOR_RANK (a) == 2 || GFC_DESCRIPTOR_RANK (b) == 2'
failed.

Program received signal SIGABRT: Process abort signal.

I also tried -fno-frontend-optimize to no avail.
Backtrace in gdb:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x76dfc231 in __GI_abort () at abort.c:79
#2  0x76df39da in __assert_fail_base (fmt=0x76f46d48 "%s%s%s:%u:
%s%sAssertion `%s' failed.\n%n", 
assertion=assertion@entry=0x77ba9ec8 "GFC_DESCRIPTOR_RANK (a) == 2 ||
GFC_DESCRIPTOR_RANK (b) == 2", 
file=file@entry=0x77baa300
"../../../src/libgfortran/generated/matmul_r4.c", line=line@entry=651, 
function=function@entry=0x77baa368 "matmul_r4_avx2") at assert.c:92
#3  0x76df3a52 in __GI___assert_fail (assertion=0x77ba9ec8
"GFC_DESCRIPTOR_RANK (a) == 2 || GFC_DESCRIPTOR_RANK (b) == 2", 
file=0x77baa300 "../../../src/libgfortran/generated/matmul_r4.c",
line=651, function=0x77baa368 "matmul_r4_avx2") at assert.c:101
#4  0x77a5c9a3 in ?? () from /usr/lib/x86_64-linux-gnu/libgfortran.so.5
#5  0x4c32 in foo () at test.f90:8
#6  0x4cf0 in main (argc=1, argv=0x7fffe38c) at test.f90:10
#7  0x76de7a87 in __libc_start_main (main=0x4cba ,
argc=1, argv=0x7fffe088, init=, fini=, 
rtld_fini=, stack_end=0x7fffe078) at
../csu/libc-start.c:310
#8  0x482a in _start ()

This is based on source and debugging symbols for Debian gcc-8 8.1.0-3
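
For reference, the expected result of the program is three values of 3.0: `reshape(A, (/ n, n /))` views the nine elements of A as a 3x3 matrix in column-major order, and `matmul` with the rank-1 array b is then an ordinary matrix-vector product. The C sketch below shows that computation. It is not libgfortran's implementation, and the function name is made up.

```c
#include <assert.h>
#include <stddef.h>

/* y = A * x, where A is n-by-n and stored column-major (Fortran order),
   i.e. element (i, j) lives at a[i + j * n]. */
static void matvec_colmajor(size_t n, const float *a, const float *x,
                            float *y)
{
    assert(a && x && y);
    for (size_t i = 0; i < n; i++) {
        float sum = 0.0f;
        for (size_t j = 0; j < n; j++)
            sum += a[i + j * n] * x[j];
        y[i] = sum;
    }
}
```

With all nine matrix elements and all three vector elements set to 1.0, every output element is 3.0.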

[Bug fortran/85840] Memory leak in write.c

2018-05-26 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85840

--- Comment #12 from Jerry DeLisle  ---
Fixed on trunk. I think this should be backported, as it appears to be a
regression on the 7 and 8 branches.

[Bug libfortran/85906] Conditional jump depends on uninitialized value in write_decimal / write_integer

2018-05-26 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85906

--- Comment #6 from Jerry DeLisle  ---
Fixed on trunk.  If anyone thinks this should be backported as a regression,
let me know.

[patch, committed, libgfortran] PR85906 - Conditional jump depends on uninitialized value in write_integer

2018-05-26 Thread Jerry DeLisle
I biffed the ChangeLog on this with a flip of two digits on the PR 
number (fixed).


Anyway, the following was committed as obvious to trunk.  The 
BUF_STACK_SZ I bumped up because I noticed on PR85840 test case that 
even small kind floats were asking for a buffer size of 323. This avoids 
a few allocs for every write operation.


2018-05-26  Jerry DeLisle  

PR libgfortran/85906
* io/write.c (write_integer): Initialise the fnode format to
FMT_NONE, used for list directed write.
(BUF_STACK_SZ): Bump default buffer size up to avoid allocs on
small stuff.


--- trunk/libgfortran/io/write.c        2018/05/26 17:30:52     260793
+++ trunk/libgfortran/io/write.c        2018/05/26 18:22:18     260795
@@ -1348,6 +1348,7 @@
 }
   f.u.integer.w = width;
   f.u.integer.m = -1;
+  f.format = FMT_NONE;
   write_decimal (dtp, &f, source, kind, (void *) gfc_itoa);
 }

@@ -1465,7 +1466,7 @@

 /* Floating point helper functions.  */

-#define BUF_STACK_SZ 256
+#define BUF_STACK_SZ 384

 static int
 get_precision (st_parameter_dt *dtp, const fnode *f, const char 
*source, int kind)
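
The BUF_STACK_SZ change relies on a common pattern: format into a fixed-size stack buffer when the request fits and fall back to the heap only for larger requests. The sketch below illustrates the pattern in general terms. The helper names are hypothetical and are not the functions used in io/write.c; only the constant 384 and the request size 323 come from the message above.

```c
#include <assert.h>
#include <stdlib.h>

#define BUF_STACK_SZ 384   /* new value from the patch; previously 256 */

/* Return the caller's stack buffer when it is large enough, otherwise a
   heap allocation of `needed` bytes (NULL if malloc fails). */
static char *pick_buffer(char *stack_buf, size_t needed)
{
    assert(stack_buf != NULL);
    if (needed <= BUF_STACK_SZ)
        return stack_buf;
    return malloc(needed);
}

/* Free the buffer only if it came from the heap. */
static void release_buffer(char *buf, char *stack_buf)
{
    if (buf != stack_buf)
        free(buf);
}
```

With the old size of 256, a request for 323 bytes would take the malloc path on every write; with 384 it stays on the stack.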


[Bug libfortran/85906] Conditional jump depends on uninitialized value in write_decimal / write_integer

2018-05-26 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85906

--- Comment #5 from Jerry DeLisle  ---
2018-05-26  Jerry DeLisle  

PR libgfortran/85906
* io/write.c (write_integer): Initialise the fnode format to
FMT_NONE, used for list directed write.
(BUF_STACK_SZ): Bump default buffer size up to avoid allocs on
small stuff.


--- trunk/libgfortran/io/write.c        2018/05/26 17:30:52     260793
+++ trunk/libgfortran/io/write.c        2018/05/26 18:22:18     260795
@@ -1348,6 +1348,7 @@
 }
   f.u.integer.w = width;
   f.u.integer.m = -1;
+  f.format = FMT_NONE;
   write_decimal (dtp, &f, source, kind, (void *) gfc_itoa);
 }

@@ -1465,7 +1466,7 @@

 /* Floating point helper functions.  */

-#define BUF_STACK_SZ 256
+#define BUF_STACK_SZ 384

 static int
 get_precision (st_parameter_dt *dtp, const fnode *f, const char *source, int
kind)

[Bug middle-end/85933] FAIL: gcc.dg/sso/p8.c -O3 -finline-functions (internal compiler error)

2018-05-26 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85933

Dominique d'Humieres  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-05-26
 Ever confirmed|0   |1

--- Comment #2 from Dominique d'Humieres  ---
> *** Bug 85937 has been marked as a duplicate of this bug. ***

Then confirmed.

Re: [PATCH] Warn for ignored ASM labels on typdef declarations PR 85444 (v.3)

2018-05-26 Thread Will Hawkins
Hello everyone!

I know every member of the community is very busy, but I am following
up on this patch to conform to the 'ping' etiquette.

Please let me know what comments you have about this patch and how I
can modify it to make sure that it meets standards.

Thanks for everything that you all do to make GCC the best compiler out there.

Will


On Fri, May 18, 2018 at 5:34 PM, Will Hawkins  wrote:
> Hello again!
>
> Thanks to the feedback of Mr. Myers and those on the PR, I have
> created a version 3 of this patch. This version introduces a new
> warning flag (enabled at Wall) -Wignored-asm-name that will flag cases
> where the user specifies an ASM name that the compiler ignores.
>
> Test cases included. Results from make bootstrap and/or make -k check
> are available upon request.
>
> Please let me know what I can do to make this better and bring it up
> to the standards of the community! Thanks again for the feedback on
> this patch during the previous two revisions!
>
> Sincerely,
> Will Hawkins
>
>
> 2018-05-18 Will Hawkins 
>
> PR c,c++/85444
> * gcc/c/c-decl.c: Warn about ignored asm label for
> typedef declaration
> * gcc/cp/decl.c: Warn about ignored asm label for
> typedef declaration
> * gcc/testsuite/gcc.dg/asm-pr85444.c: c testcase.
> * gcc/testsuite/g++.dg/asm-pr85444.C: c++ testcase.
>
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index c48d6dc..ab3a9af 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -595,6 +595,10 @@ Wignored-attributes
>  C C++ Var(warn_ignored_attributes) Init(1) Warning
>  Warn whenever attributes are ignored.
>
> +Wignored-asm-name
> +C C++ Var(warn_ignored_asm_name) Warning LangEnabledBy(C C++,Wall)
> +Warn whenever assembler names are specified but ignored.
> +
>  Wincompatible-pointer-types
>  C ObjC Var(warn_incompatible_pointer_types) Init(1) Warning
>  Warn when there is a conversion between pointers that have incompatible 
> types.
> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> index 3c4b18e..5a1ecd7 100644
> --- a/gcc/c/c-decl.c
> +++ b/gcc/c/c-decl.c
> @@ -5177,7 +5177,11 @@ finish_decl (tree decl, location_t init_loc, tree init,
>if (!DECL_FILE_SCOPE_P (decl)
>&& variably_modified_type_p (TREE_TYPE (decl), NULL_TREE))
>  add_stmt (build_stmt (DECL_SOURCE_LOCATION (decl), DECL_EXPR, decl));
> -
> +  if (asmspec_tree != NULL_TREE)
> +{
> +  warning (OPT_Wignored_asm_name, "asm-specifier is ignored in "
> +   "typedef declaration");
> +}
>rest_of_decl_compilation (decl, DECL_FILE_SCOPE_P (decl), 0);
>  }
>
> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> index 10e3079..4c3ee36 100644
> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -6981,6 +6981,11 @@ cp_finish_decl (tree decl, tree init, bool
> init_const_expr_p,
>/* Take care of TYPE_DECLs up front.  */
>if (TREE_CODE (decl) == TYPE_DECL)
>  {
> +  if (asmspec_tree != NULL_TREE)
> +{
> +  warning (OPT_Wignored_asm_name, "asm-specifier is ignored for "
> +   "typedef declarations");
> +}
>if (type != error_mark_node
>&& MAYBE_CLASS_TYPE_P (type) && DECL_NAME (decl))
>  {
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index ca3772b..63f81f4 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -286,7 +286,8 @@ Objective-C and Objective-C++ Dialects}.
>  -Wformat-y2k  -Wframe-address @gol
>  -Wframe-larger-than=@var{len}  -Wno-free-nonheap-object
> -Wjump-misses-init @gol
>  -Wif-not-aligned @gol
> --Wignored-qualifiers  -Wignored-attributes  -Wincompatible-pointer-types @gol
> +-Wignored-qualifiers  -Wignored-attributes  -Wignored-asm-name @gol
> +-Wincompatible-pointer-types @gol
>  -Wimplicit  -Wimplicit-fallthrough  -Wimplicit-fallthrough=@var{n} @gol
>  -Wimplicit-function-declaration  -Wimplicit-int @gol
>  -Winit-self  -Winline  -Wno-int-conversion  -Wint-in-bool-context @gol
> @@ -4523,6 +4524,14 @@ Warn when an attribute is ignored.  This is
> different from the
>  to drop an attribute, not that the attribute is either unknown, used in a
>  wrong place, etc.  This warning is enabled by default.
>
> +@item -Wignored-asm-name @r{(C and C++ only)}
> +@opindex Wignored-asm-name
> +@opindex Wno-ignored-asm-name
> +Warn when an assembler name is given but ignored. For C and C++, this
> +happens when a @code{typedef} declaration is given an assembler name.
> +
> +This warning is also enabled by @option{-Wall}.
> +
>  @item -Wmain
>  @opindex Wmain
>  @opindex Wno-main
> diff --git a/gcc/testsuite/g++.dg/asm-pr85444.C
> b/gcc/testsuite/g++.dg/asm-pr85444.C
> new file mode 100644
> index 000..f1f8f61
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/asm-pr85444.C
> @@ -0,0 +1,13 @@
> +/* Fix Bugzilla 85444 -- asm specifier on typedef silently ignored.
> +   { dg-do compile }
> +   { dg-options "-Wignored-asm-name" } */
> +
> +typedef struct
> +{
> +  int 

Re: PR80155: Code hoisting and register pressure

2018-05-26 Thread Bin.Cheng
On Fri, May 25, 2018 at 5:54 PM, Richard Biener  wrote:
> On May 25, 2018 6:57:13 PM GMT+02:00, Jeff Law  wrote:
>>On 05/25/2018 03:49 AM, Bin.Cheng wrote:
>>> On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni
>>>  wrote:
 On 23 May 2018 at 18:37, Jeff Law  wrote:
> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote:
>> On 23 May 2018 at 13:58, Richard Biener  wrote:
>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote:
>>>
 Hi,
 I am trying to work on PR80155, which exposes a problem with
>>code
 hoisting and register pressure on a leading embedded benchmark
>>for ARM
 cortex-m7, where code-hoisting causes an extra register spill.

 I have attached two test-cases which (hopefully) are
>>representative of
 the original test-case.
 The first one (trans_dfa.c) is bigger and somewhat similar to
>>the
 original test-case and trans_dfa_2.c is hand-reduced version of
 trans_dfa.c. There's 2 spills caused with trans_dfa.c
 and one spill with trans_dfa_2.c due to lesser amount of cases.
 The test-cases in the PR are probably not relevant.

 Initially I thought the spill was happening because of "too many
 hoistings" taking place in original test-case thus increasing
>>the
 register pressure, but it seems the spill is possibly caused
>>because
 expression gets hoisted out of a block that is on loop exit.

 For example, the following hoistings take place with
>>trans_dfa_2.c:

 (1) Inserting expression in block 4 for code hoisting:
 {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005)

 (2) Inserting expression in block 4 for code hoisting:
>>{plus_expr,_4,1} (0006)

 (3) Inserting expression in block 4 for code hoisting:
 {pointer_plus_expr,s_33,1} (0023)

 (4) Inserting expression in block 3 for code hoisting:
 {pointer_plus_expr,s_33,1} (0023)

 The issue seems to be hoisting of (*tab + 1) which consists of
>>first
 two hoistings in block 4
 from blocks 5 and 9, which causes the extra spill. I verified
>>that by
 disabling hoisting into block 4,
 which resulted in no extra spills.

 I wonder if that's because the expression (*tab + 1) is getting
 hoisted from blocks 5 and 9,
 which are on loop exit ? So the expression that was previously
 computed in a block on loop exit, gets hoisted outside that
>>block
 which possibly makes the allocator more defensive ? Similarly
 disabling hoisting of expressions which appeared in blocks on
>>loop
 exit in original test-case prevented the extra spill. The other
 hoistings didn't seem to matter.
>>>
>>> I think that's simply co-incidence.  The only thing that makes
>>> a block that also exits from the loop special is that an
>>> expression could be sunk out of the loop and hoisting (commoning
>>> with another path) could prevent that.  But that isn't what is
>>> happening here and it would be a pass ordering issue as
>>> the sinking pass runs only after hoisting (no idea why exactly
>>> but I guess there are cases where we want to prefer CSE over
>>> sinking).  So you could try if re-ordering PRE and sinking helps
>>> your testcase.
>> Thanks for the suggestions. Placing sink pass before PRE works
>> for both these test-cases! Sadly it still causes the spill for the
>> benchmark -:(
>> I will try to create a better approximation of the original
>> test-case.
>>>
>>> What I do see is a missed opportunity to merge the successors
>>> of BB 4.  After PRE we have
>>>
>>>  [local count: 159303558]:
>>> :
>>> pretmp_123 = *tab_37(D);
>>> _87 = pretmp_123 + 1;
>>> if (c_36 == 65)
>>>   goto ; [34.00%]
>>> else
>>>   goto ; [66.00%]
>>>
>>>  [local count: 54163210]:
>>> *tab_37(D) = _87;
>>> _96 = MEM[(char *)s_57 + 1B];
>>> if (_96 != 0)
>>>   goto ; [89.00%]
>>> else
>>>   goto ; [11.00%]
>>>
>>>  [local count: 105140348]:
>>> *tab_37(D) = _87;
>>> _56 = MEM[(char *)s_57 + 1B];
>>> if (_56 != 0)
>>>   goto ; [89.00%]
>>> else
>>>   goto ; [11.00%]
>>>
>>> here at least the stores and loads can be hoisted.  Note this
>>> may also point at the real issue of the code hoisting which is
>>> tearing apart the RMW operation?
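
A hand-written C sketch of the three shapes under discussion may make the RMW point easier to follow. All function and variable names here are hypothetical, and the integer return values merely stand in for the four branch targets of the dump; only the position of the load, the add and the store relative to the branch is meant to mirror it.

```c
#include <assert.h>
#include <stddef.h>

/* Source-like shape: each arm does its own read-modify-write of *tab.  */
static int step_original(int *tab, const char *s, int c)
{
    assert(tab != NULL && s != NULL);
    if (c == 65) {
        *tab = *tab + 1;            /* load, add and store stay together */
        return s[1] != 0 ? 1 : 2;
    } else {
        *tab = *tab + 1;
        return s[1] != 0 ? 3 : 4;
    }
}

/* Shape after PRE in the dump: the load and the add sit in the common
   predecessor, but each arm keeps its own store and its own load of
   s[1], so the sum has to stay live across the branch.  */
static int step_hoisted(int *tab, const char *s, int c)
{
    assert(tab != NULL && s != NULL);
    int pretmp = *tab;
    int sum = pretmp + 1;
    if (c == 65) {
        *tab = sum;
        return s[1] != 0 ? 1 : 2;
    } else {
        *tab = sum;
        return s[1] != 0 ? 3 : 4;
    }
}

/* The missed opportunity: the store and the s[1] load are identical in
   both arms, so they can be commoned too, which puts the
   read-modify-write back together.  */
static int step_merged(int *tab, const char *s, int c)
{
    assert(tab != NULL && s != NULL);
    *tab = *tab + 1;
    int next_nonzero = s[1] != 0;   /* loaded after the store, as in the dump */
    if (c == 65)
        return next_nonzero ? 1 : 2;
    return next_nonzero ? 3 : 4;
}
```

All three functions compute the same result; they differ only in how long the intermediate sum is kept in a register, which is the property the spill discussion is about.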
>> Indeed, this possibility seems much more likely than block being
>> on loop exit.
>> I will try to "hardcode" the load/store hoists into block 4 for
>> this specific test-case to check
>> if that prevents the spill.
> Even if it prevents the spill in this case, it's likely a good
> thing to
> do.  The 

[Bug middle-end/85933] FAIL: gcc.dg/sso/p8.c -O3 -finline-functions (internal compiler error)

2018-05-26 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85933

Eric Botcazou  changed:

   What|Removed |Added

 CC||dominiq at lps dot ens.fr

--- Comment #1 from Eric Botcazou  ---
*** Bug 85937 has been marked as a duplicate of this bug. ***

[Bug ada/85937] [9 Regression] Failures in the Ada tests

2018-05-26 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85937

Eric Botcazou  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Eric Botcazou  ---
.

*** This bug has been marked as a duplicate of bug 85933 ***

[patch, libgfortran, committed] Bug 85840 - Memory leak in write.c

2018-05-26 Thread Jerry DeLisle

The following committed as obvious after regression testing.

2018-05-26  Jerry DeLisle  

PR libgfortran/85840
* io/write.c (write_float_0): Use separate local variable for
the float string length.


Author: jvdelisle
Date: Sat May 26 17:30:52 2018
New Revision: 260793

URL: https://gcc.gnu.org/viewcvs?rev=260793&root=gcc&view=rev
Log:
2018-05-26  Jerry DeLisle  

PR libgfortran/85840
* io/write.c (write_float_0): Use separate local variable for
the float string length.

Modified:
trunk/libgfortran/io/write.c

--- trunk/libgfortran/io/write.c 2018/05/26 11:35:31 260792
+++ trunk/libgfortran/io/write.c 2018/05/26 17:30:52 260793
@@ -1566,19 +1566,19 @@
   char buf_stack[BUF_STACK_SZ];
   char str_buf[BUF_STACK_SZ];
   char *buffer, *result;
-  size_t buf_size, res_len;
+  size_t buf_size, res_len, flt_str_len;

   /* Precision for snprintf call.  */
   int precision = get_precision (dtp, f, source, kind);

   /* String buffer to hold final result.  */
   result = select_string (dtp, f, str_buf, &res_len, kind);
-
+
   buffer = select_buffer (dtp, f, precision, buf_stack, &buf_size, kind);
-
+
   get_float_string (dtp, f, source , kind, 0, buffer,
-   precision, buf_size, result, &res_len);
-  write_float_string (dtp, result, res_len);
+   precision, buf_size, result, &flt_str_len);
+  write_float_string (dtp, result, flt_str_len);

   if (buf_size > BUF_STACK_SZ)
 free (buffer);


[Bug fortran/85840] Memory leak in write.c

2018-05-26 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85840

--- Comment #11 from Jerry DeLisle  ---
Author: jvdelisle
Date: Sat May 26 17:30:52 2018
New Revision: 260793

URL: https://gcc.gnu.org/viewcvs?rev=260793&root=gcc&view=rev
Log:
2018-05-26  Jerry DeLisle  

PR libgfortran/85840
* io/write.c (write_float_0): Use separate local variable for
the float string length.

Modified:
trunk/libgfortran/io/write.c

[Bug ada/85937] New: [9 Regression] Failures in the Ada tests

2018-05-26 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85937

Bug ID: 85937
   Summary: [9 Regression] Failures in the Ada tests
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ada
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dominiq at lps dot ens.fr
CC: ebotcazou at gcc dot gnu.org
  Target Milestone: ---
  Host: x86_64-apple-darwin17
Target: x86_64-apple-darwin17
 Build: x86_64-apple-darwin17

Between revisions r260557 and r260760, the following errors appeared on darwin

FAIL: gnat.dg/sso/q10.adb   -O3 -finline-functions  (test for excess errors)
UNRESOLVED: gnat.dg/sso/q10.adb   -O3 -finline-functions  compilation failed to
produce executable
FAIL: gnat.dg/sso/q11.adb   -O3 -finline-functions  (test for excess errors)
UNRESOLVED: gnat.dg/sso/q11.adb   -O3 -finline-functions  compilation failed to
produce executable

with -m32, and

FAIL: gnat.dg/sso/p6.adb   -O3 -finline-functions  (test for excess errors)
UNRESOLVED: gnat.dg/sso/p6.adb   -O3 -finline-functions  compilation failed to
produce executable

with -m64. The errors are all

raised CONSTRAINT_ERROR : erroneous memory access

[Bug fortran/85840] Memory leak in write.c

2018-05-26 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85840

--- Comment #10 from Jerry DeLisle  ---
(In reply to Joshua Cogliati from comment #9)
--- snip ---
> I could look into either method of fixing this if you want.  (And for what
> it is worth, I do have copyright assignment paperwork from both myself and
> my employer for GCC filed)

First, thank you very much for the offer to help. Truthfully we do need more
contributors and it does take time to get familiar with the code base, and it
can be fun to work on.

This bug is my doing for sure.  I did a significant re-factoring of the code
and lost track of that variable, res_len. The intent is that it does not get
modified outside of those functions that are allocating the buffers.

In fact it is getting modified by build_float_string in write_float.def. The
build_float_string is being called within the macros of get_float_string. A
variable needs to be used by those macros, but I don't think it needs to be
passed up the call chain.  The simplest thing would be to give a different
variable to get_float_string.

@@ -1566,19 +1568,19 @@ write_float_0 (st_parameter_dt *dtp, const fnode *f,
const char *source, int kin
   char buf_stack[BUF_STACK_SZ];
   char str_buf[BUF_STACK_SZ];
   char *buffer, *result;
-  size_t buf_size, res_len;
+  size_t buf_size, res_len, flt_str_len;

   /* Precision for snprintf call.  */
   int precision = get_precision (dtp, f, source, kind);

   /* String buffer to hold final result.  */
   result = select_string (dtp, f, str_buf, &res_len, kind);

   buffer = select_buffer (dtp, f, precision, buf_stack, &buf_size, kind);

   get_float_string (dtp, f, source , kind, 0, buffer,
-   precision, buf_size, result, &res_len);
-  write_float_string (dtp, result, res_len);
+   precision, buf_size, result, &flt_str_len);
+  write_float_string (dtp, result, flt_str_len);

   if (buf_size > BUF_STACK_SZ)
 free (buffer);

I also noticed that the buffer size for this small float is 323 which seems
crazy big, but there are other use cases where we need to hold much higher
precision numbers of digits. It seems pointless to do an alloc for this case so
I also suggest we do this:  

@@ -1465,7 +1466,7 @@ write_character (st_parameter_dt *dtp, const char
*source, int kind, size_t leng

 /* Floating point helper functions.  */

-#define BUF_STACK_SZ 256
+#define BUF_STACK_SZ 384


This eliminates 3 allocs in one I/O operation. The purpose of all this alloc
logic is to eliminate them.  The stack size change will mask the problem
uncovered in this test case, so I have tested with and without it.

Since I pretty much have this figured out, I will go ahead and do the commit.
However, please don't hesitate to take on other bugs and join the gfortranners
club. Much appreciated.
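
The res_len problem described in this comment can be reduced to a small stand-alone C sketch. Everything in it is hypothetical (select_buf, format_value, release_buf, SKETCH_STACK_SZ and the allocation counter are illustrative names, not libgfortran code), and it assumes the result buffer is released with the same "size > stack size" test that the diff shows for buffer. It only demonstrates the pattern: a size variable that doubles as the "came from malloc" marker must not be reused as an output parameter for a string length.

```c
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_STACK_SZ 16      /* stand-in for BUF_STACK_SZ, kept tiny */

static int live_allocs;         /* heap buffers handed out and not yet freed */
static void *last_heap_buf;     /* lets a caller reclaim a leaked buffer */

/* Use the stack buffer when it is large enough, otherwise allocate.
   *size records the capacity and doubles as the "came from malloc" marker.  */
static char *select_buf(char *stack_buf, size_t needed, size_t *size)
{
    *size = needed;
    if (needed <= SKETCH_STACK_SZ)
        return stack_buf;
    char *p = malloc(needed);
    if (p == NULL)
        abort();
    live_allocs++;
    last_heap_buf = p;
    return p;
}

/* Format value into buf and report the string length through *len.  */
static void format_value(char *buf, size_t cap, int value, size_t *len)
{
    *len = (size_t) snprintf(buf, cap, "%d", value);
}

/* Free buf only if the recorded size says it was heap allocated.  */
static void release_buf(char *buf, size_t size)
{
    if (size > SKETCH_STACK_SZ) {
        free(buf);
        live_allocs--;
    }
}

/* Buggy shape: the capacity variable is reused for the string length,
   so release_buf sees a small value and skips the free.  */
static void write_value_buggy(int value, size_t needed)
{
    char stack_buf[SKETCH_STACK_SZ];
    size_t res_len;
    char *result = select_buf(stack_buf, needed, &res_len);
    format_value(result, needed, value, &res_len);   /* clobbers res_len */
    release_buf(result, res_len);
}

/* Fixed shape: the string length gets its own variable.  */
static void write_value_fixed(int value, size_t needed)
{
    char stack_buf[SKETCH_STACK_SZ];
    size_t res_len, flt_str_len;
    char *result = select_buf(stack_buf, needed, &res_len);
    format_value(result, needed, value, &flt_str_len);
    (void) flt_str_len;          /* a real caller would write this many bytes */
    release_buf(result, res_len);
}
```

Calling write_value_buggy with a size above SKETCH_STACK_SZ leaves live_allocs at 1, while write_value_fixed brings it back to 0. Raising the stack size, as the BUF_STACK_SZ change does, hides the leak in the same way that a small request hides it here, which is why testing both with and without that change is useful.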

Re: PING^2: [PATCH] Don't mark IFUNC resolver as only called directly

2018-05-26 Thread H.J. Lu
On Thu, May 24, 2018 at 1:47 PM, H.J. Lu  wrote:
> On Wed, May 23, 2018 at 8:35 AM, H.J. Lu  wrote:
>> On Wed, May 23, 2018 at 8:11 AM, Jan Hubicka  wrote:
 On Wed, May 23, 2018 at 2:01 AM, Jan Hubicka  wrote:
 >> On Tue, May 22, 2018 at 9:21 AM, Jan Hubicka  wrote:
 >> >> > >  class ipa_opt_pass_d;
 >> >> > >  typedef ipa_opt_pass_d *ipa_opt_pass;
 >> >> > > @@ -2894,7 +2896,8 @@ 
 >> >> > > cgraph_node::only_called_directly_or_aliased_p (void)
 >> >> > >   && !DECL_STATIC_CONSTRUCTOR (decl)
 >> >> > >   && !DECL_STATIC_DESTRUCTOR (decl)
 >> >> > >   && !used_from_object_file_p ()
 >> >> > > - && !externally_visible);
 >> >> > > + && !externally_visible
 >> >> > > + && !lookup_attribute ("ifunc", DECL_ATTRIBUTES 
 >> >> > > (decl)));
 >> >> >
 >> >> > How's it handled for our own generated resolver functions?  
 >> >> > That is,
 >> >> > isn't there sth cheaper than doing a lookup_attribute here?  I 
 >> >> > see
 >> >> > that make_dispatcher_decl nor 
 >> >> > ix86_get_function_versions_dispatcher
 >> >> > adds the 'ifunc' attribute (though they are TREE_PUBLIC there).
 >> >> 
 >> >>  Is there any drawback of setting force_output flag?
 >> >>  Honza
 >> >> >>>
 >> >> >>> Setting force_output may prevent some optimizations.  Can we add 
 >> >> >>> a bit
 >> >> >>> for IFUNC resolver?
 >> >> >>>
 >> >> >>
 >> >> >> Here is the patch to add ifunc_resolver to cgraph_node. Tested on 
 >> >> >> x86-64
 >> >> >> and i686.  Any comments?
 >> >> >>
 >> >> >
 >> >> > PING:
 >> >> >
 >> >> > https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00647.html
 >> >> >
 >> >>
 >> >> PING.
 >> > OK, but please extend the verifier that ifunc_resolver flag is 
 >> > equivalent to
 >> > lookup_attribute ("ifunc", DECL_ATTRIBUTES (decl))
 >> > so we are sure things stays in sync.
 >> >
 >>
 >> Like this
 >>
 >> diff --git a/gcc/symtab.c b/gcc/symtab.c
 >> index 80f6f910c3b..954920b6dff 100644
 >> --- a/gcc/symtab.c
 >> +++ b/gcc/symtab.c
 >> @@ -998,6 +998,13 @@ symtab_node::verify_base (void)
 >>error ("function symbol is not function");
 >>error_found = true;
 >>}
 >> +  else if ((lookup_attribute ("ifunc", DECL_ATTRIBUTES (decl))
 >> + != NULL)
 >> + != dyn_cast <cgraph_node *> (this)->ifunc_resolver)
 >> +  {
 >> +  error ("inconsistent `ifunc' attribute");
 >> +  error_found = true;
 >> +  }
 >>  }
 >>else if (is_a <varpool_node *> (this))
 >>  {
 >>
 >>
 >> Thanks.
 > Yes, thanks!
 > Honza

 I'd like to also fix it on GCC 8 branch for CET.  Should I backport my
 patch to GCC 8 after a few days or use the simple patch for GCC 8:

 https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00588.html
>>>
>>> I would backport this one so we don't unnecessarily diverge.
>>> Thanks!
>>> Honza
>>
>> This is the backport which I will check into GCC 8 branch next week.
>>
>
> This is the updated backport which I will check into GCC 8 branch next week.
>

This is the updated backport which I will check into GCC 8 branch next week.


-- 
H.J.
From 5ebddef01e810c1684ed0927c0dbb1239cf3c178 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 11 Apr 2018 12:31:21 -0700
Subject: [PATCH] Don't mark IFUNC resolver as only called directly

Since IFUNC resolver is called indirectly, don't mark IFUNC resolver as
only called directly.  This patch adds ifunc_resolver to cgraph_node,
sets ifunc_resolver for ifunc attribute and checks ifunc_resolver
instead of looking up ifunc attribute.
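
For readers unfamiliar with IFUNC, a minimal sketch of the situation this patch addresses follows. It assumes GCC (or a compatible compiler) on an ELF target with IFUNC support, such as GNU/Linux with glibc; it is not expected to build on Darwin, where the "ifunc is not supported in this configuration" error applies. The names answer, answer_impl and resolve_answer are hypothetical.

```c
/* The implementation the resolver selects.  A real resolver would
   usually choose between several candidates based on CPU features.  */
static int answer_impl(void)
{
    return 42;
}

/* The resolver.  Nothing in this translation unit calls it; the dynamic
   loader calls it indirectly when it binds the symbol `answer`.  Because
   it is static and has no visible direct caller, it is exactly the kind
   of function that could be mistaken for "only called directly".  */
static int (*resolve_answer(void))(void)
{
    return answer_impl;
}

/* `answer` is an indirect function whose address is whatever the
   resolver returns.  */
int answer(void) __attribute__((ifunc("resolve_answer")));
```

Since the resolver is reached through an indirect call, it needs whatever an indirectly called function needs on the target, which is the connection to PR target/85345 (missing ENDBR in the IFUNC resolver).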

gcc/

	Backport from mainline
	2018-05-26  H.J. Lu  

	PR target/85900
	PR target/85345
	* varasm.c (assemble_alias): Lookup ifunc attribute on error.

	2018-05-24  H.J. Lu  

	PR target/85900
	PR target/85345
	* varasm.c (assemble_alias): Check ifunc_resolver only on
	FUNCTION_DECL.

	2018-05-22  H.J. Lu  

	PR target/85345
	* cgraph.h (cgraph_node::create): Set ifunc_resolver for ifunc
	attribute.
	(cgraph_node::create_alias): Likewise.
	(cgraph_node::get_availability): Check ifunc_resolver instead
	of looking up ifunc attribute.
	* cgraphunit.c (maybe_diag_incompatible_alias): Likewise.
	* varasm.c (do_assemble_alias): Likewise.
	(assemble_alias): Likewise.
	(default_binds_local_p_3): Likewise.
	* cgraph.h (cgraph_node): Add ifunc_resolver.
	(cgraph_node::only_called_directly_or_aliased_p): Return false
	for IFUNC resolver.
	* lto-cgraph.c (input_node): Set ifunc_resolver for ifunc
	attribute.
	* symtab.c 

[Bug target/85918] Conversions to/from [unsigned] long long are not vectorized for AVX512DQ target

2018-05-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85918

--- Comment #2 from Jakub Jelinek  ---
Created attachment 44189
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44189&action=edit
gcc9-pr85918-2.patch

WIP fix for the rest, still need to write testcases and actually test it.

RISC-V ELF multilibs

2018-05-26 Thread Sebastian Huber
Hello,

I built a riscv64-rtems5 GCC (it uses gcc/config/riscv/t-elf-multilib). The 
following multilibs are built:

riscv64-rtems5-gcc -print-multi-lib
.;
rv32i/ilp32;@march=rv32i@mabi=ilp32
rv32im/ilp32;@march=rv32im@mabi=ilp32
rv32iac/ilp32;@march=rv32iac@mabi=ilp32
rv32imac/ilp32;@march=rv32imac@mabi=ilp32
rv32imafc/ilp32f;@march=rv32imafc@mabi=ilp32f
rv64imac/lp64;@march=rv64imac@mabi=lp64
rv64imafdc/lp64d;@march=rv64imafdc@mabi=lp64d

If I print out the builtin defines and search paths for the default settings 
and the -march=rv64imafdc and compare the results I get:

riscv64-rtems5-gcc -E -P -v -dD empty.c > def.txt 2>&1
riscv64-rtems5-gcc -E -P -v -dD empty.c -march=rv64imafdc > rv64imafdc.txt 2>&1
diff -u def.txt rv64imafdc.txt 
--- def.txt 2018-05-26 14:53:26.277760090 +0200
+++ rv64imafdc.txt  2018-05-26 14:53:47.705638409 +0200
@@ -4,8 +4,8 @@
 Configured with: ../gcc-7.3.0/configure --prefix=/opt/rtems/5 
--bindir=/opt/rtems/5/bin --exec_prefix=/opt/rtems/5 
--includedir=/opt/rtems/5/include --libdir=/opt/rtems/5/lib 
--libexecdir=/opt/rtems/5/libexec --mandir=/opt/rtems/5/share/man 
--infodir=/opt/rtems/5/share/info --datadir=/opt/rtems/5/share 
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=riscv64-rtems5 
--disable-libstdcxx-pch --with-gnu-as --with-gnu-ld --verbose --with-newlib 
--disable-nls --without-included-gettext --disable-win32-registry 
--enable-version-specific-runtime-libs --disable-lto 
--enable-newlib-io-c99-formats --enable-newlib-iconv 
--enable-newlib-iconv-encodings=big5,cp775,cp850,cp852,cp855,cp866,euc_jp,euc_kr,euc_tw,iso_8859_1,iso_8859_10,iso_8859_11,iso_8859_13,iso_8859_14,iso_8859_15,iso_8859_2,iso_8859_3,iso_8859_4,iso_8859_5,iso_8859_6,iso_8859_7,iso_8859_8,iso_8859_9,iso_ir_111,koi8_r,koi8_ru,koi8_u,koi8_uni,ucs_2,ucs_2_internal,ucs_2be,ucs_2le,ucs_4,ucs_4_internal,ucs_4be,ucs_4le,us_ascii,utf_16,utf_16be,utf_16le,utf_8,win_1250,win_1251,win_1252,win_1253,win_1254,win_1255,win_1256,win_1257,win_1258
 --enable-threads --disable-plugin --enable-libgomp --enable-languages=c,c++,ada
 Thread model: rtems
 gcc version 7.3.0 20180125 (RTEMS 5, RSB 
a3a6c34c150a357e57769a26a460c475e188438f, Newlib 3.0.0) (GCC) 
-COLLECT_GCC_OPTIONS='-E' '-P' '-v' '-dD' '-march=rv64gc' '-mabi=lp64d'
- /opt/rtems/5/libexec/gcc/riscv64-rtems5/7.3.0/cc1 -E -quiet -v -P -imultilib 
rv64imafdc/lp64d empty.c -march=rv64gc -mabi=lp64d -dD
+COLLECT_GCC_OPTIONS='-E' '-P' '-v' '-dD' '-march=rv64imafdc' '-mabi=lp64d'
+ /opt/rtems/5/libexec/gcc/riscv64-rtems5/7.3.0/cc1 -E -quiet -v -P -imultilib 
rv64imafdc/lp64d empty.c -march=rv64imafdc -mabi=lp64d -dD
 ignoring nonexistent directory 
"/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/../../../../riscv64-rtems5/sys-include"
 #include "..." search starts here:
 #include <...> search starts here:
@@ -338,4 +338,4 @@
 #define __ELF__ 1
 
COMPILER_PATH=/opt/rtems/5/libexec/gcc/riscv64-rtems5/7.3.0/:/opt/rtems/5/libexec/gcc/riscv64-rtems5/7.3.0/:/opt/rtems/5/libexec/gcc/riscv64-rtems5/:/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/:/opt/rtems/5/lib/gcc/riscv64-rtems5/:/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/../../../../riscv64-rtems5/bin/
 
LIBRARY_PATH=/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/rv64imafdc/lp64d/:/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/../../../../riscv64-rtems5/lib/rv64imafdc/lp64d/:/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/:/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/../../../../riscv64-rtems5/lib/:/lib/:/usr/lib/
-COLLECT_GCC_OPTIONS='-E' '-P' '-v' '-dD' '-march=rv64gc' '-mabi=lp64d'
+COLLECT_GCC_OPTIONS='-E' '-P' '-v' '-dD' '-march=rv64imafdc' '-mabi=lp64d'

This looks pretty much the same and the documentation says that G == IMAFD.

Why are the default multilib and one of the variants identical?

Most variants include the C extension. Would it be possible to add -march=rv32g 
and -march=rv64g variants?
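
To make the G == IMAFD point concrete, here is a small C sketch that expands the shorthand in an ISA string, so that rv64gc and rv64imafdc can be compared directly. The helper expand_g is hypothetical and written only for illustration; it is not how GCC parses -march, and it only handles a "g" that immediately follows the rv32/rv64 prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand a leading 'g' (as in "rv64gc") into "imafd".
   Returns 0 on success, or -1 if the string does not start with
   "rv32"/"rv64" or the output buffer is too small.  */
static int expand_g(const char *march, char *out, size_t out_size)
{
    assert(march != NULL && out != NULL);
    if (strncmp(march, "rv32", 4) != 0 && strncmp(march, "rv64", 4) != 0)
        return -1;

    size_t n = 0;
    for (const char *p = march; *p != '\0'; p++) {
        const char *piece = p;
        size_t len = 1;
        if (p == march + 4 && *p == 'g') {   /* first extension letter only */
            piece = "imafd";
            len = 5;
        }
        if (n + len >= out_size)             /* keep room for the terminator */
            return -1;
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
    return 0;
}
```

With this expansion, "rv64gc" becomes "rv64imafdc", which is the same string as the explicit variant in the multilib list above.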

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09

This message is not a business communication within the meaning of the EHUG.


Re: Enabling -ftree-slp-vectorize on -O2/Os

2018-05-26 Thread Florian Weimer
* Allan Sandfeld Jensen:

> Anything else I should test or report?

Interaction with -mstackrealign on i386, where it is required for
system libraries to support applications which use the legacy ABI
without stack alignment if you compile with -msse2 or -march=x86-64
-mtune=generic (and -mfpmath=sse).


[Bug target/85900] [9 Regression] ICEs after revision r260547 on darwin.

2018-05-26 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85900

H.J. Lu  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from H.J. Lu  ---
Fixed.

[Bug target/85345] Missing ENDBR in IFUNC resolver

2018-05-26 Thread hjl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85345

--- Comment #4 from hjl at gcc dot gnu.org  ---
Author: hjl
Date: Sat May 26 11:35:31 2018
New Revision: 260792

URL: https://gcc.gnu.org/viewcvs?rev=260792&root=gcc&view=rev
Log:
 Don't check ifunc_resolver on error

Since ifunc_resolver isn't set when an error is detected, we should
lookup ifunc attribute in this case.

PR target/85900
PR target/85345
* varasm.c (assemble_alias): Lookup ifunc attribute on error.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/varasm.c

[Bug target/85900] [9 Regression] ICEs after revision r260547 on darwin.

2018-05-26 Thread hjl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85900

--- Comment #7 from hjl at gcc dot gnu.org  ---
Author: hjl
Date: Sat May 26 11:35:31 2018
New Revision: 260792

URL: https://gcc.gnu.org/viewcvs?rev=260792&root=gcc&view=rev
Log:
 Don't check ifunc_resolver on error

Since ifunc_resolver isn't set when an error is detected, we should
lookup ifunc attribute in this case.

PR target/85900
PR target/85345
* varasm.c (assemble_alias): Lookup ifunc attribute on error.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/varasm.c

Re: [PATCH] Check ifunc_resolver only on FUNCTION_DECL

2018-05-26 Thread H.J. Lu
On Fri, May 25, 2018 at 4:48 AM, H.J. Lu  wrote:
> On Thu, May 24, 2018 at 04:43:25AM -0700, H.J. Lu wrote:
>> Since ifunc_resolver is only valid on FUNCTION_DECL, check ifunc_resolver
>> only on FUNCTION_DECL.
>>
>> Please test it on Darwin.
>>
>>
>> H.J.
>> ---
>>   PR target/85900
>>   PR target/85345
>>   * varasm.c (assemble_alias): Check ifunc_resolver only on
>>   FUNCTION_DECL.
>> ---
>>  gcc/varasm.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/varasm.c b/gcc/varasm.c
>> index 3bd9cbb69f0..bff43450a91 100644
>> --- a/gcc/varasm.c
>> +++ b/gcc/varasm.c
>> @@ -5917,7 +5917,8 @@ assemble_alias (tree decl, tree target)
>>  # else
>>if (!DECL_WEAK (decl))
>>   {
>> -   if (cgraph_node::get (decl)->ifunc_resolver)
>> +   if (TREE_CODE (decl) == FUNCTION_DECL
>> +   && cgraph_node::get (decl)->ifunc_resolver)
>>   error_at (DECL_SOURCE_LOCATION (decl),
>> "ifunc is not supported in this configuration");
>> else
>> --
>
> Please test it on Darwin.
>
> H.J.
> ---
> Since ifunc_resolver isn't set when an error is detected, we should
> lookup ifunc attribute in this case.
>
> PR target/85900
> PR target/85345
> * varasm.c (assemble_alias): Lookup ifunc attribute on error.
> ---
>  gcc/varasm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index 6b9f87b203f..4d332f50270 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -5917,8 +5917,9 @@ assemble_alias (tree decl, tree target)
>  # else
>if (!DECL_WEAK (decl))
> {
> + /* NB: ifunc_resolver isn't set when an error is detected.  */
>   if (TREE_CODE (decl) == FUNCTION_DECL
> - && cgraph_node::get (decl)->ifunc_resolver)
> + && lookup_attribute ("ifunc", DECL_ATTRIBUTES (decl)))
> error_at (DECL_SOURCE_LOCATION (decl),
>   "ifunc is not supported in this configuration");
>   else
> --
> 2.17.0
>

Dominique verified that it fixed all Darwin issues.  I am checking it
in.

-- 
H.J.


Re: [PATCH] Rename ufloat to floatuns and ufix_trunc to fixuns_trunc in a few patterns (PR target/85918)

2018-05-26 Thread Uros Bizjak
On Fri, May 25, 2018 at 11:09 PM, Jakub Jelinek  wrote:
> Hi!
>
> The optab is looking for floatuns2 and
> fixuns_trunc2, but some of the patterns are instead called
> ufloat2 or ufix_trunc2
> and thus are only used from intrinsics.
>
> We can't change all spots, in two spots we have intentionally an
> floatuns2 or fixuns_trunc2 expander that
> uses for AVX512+ a ufloat*/ufix* insn and in other cases something
> different, but for the cases I've changed we just give up before AVX512DQ.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2018-05-25  Jakub Jelinek  
>
> PR target/85918
> * config/i386/i386.md (fixunssuffix, floatunssuffix): New code
> attributes.
> * config/i386/sse.md
> 
> (float2):
> Rename to ...
> 
> (float2):
> ... this.
> 
> (float2):
> Rename to ...
> 
> (float2):
> ... this.
> (*floatv2div2sf2): Rename to ...
> (*floatv2div2sf2): ... this.
> (floatv2div2sf2_mask): Rename to ...
> (floatv2div2sf2_mask): ... this.
> (*floatv2div2sf2_mask_1): Rename to ...
> (*floatv2div2sf2_mask_1): ... this.
> (fix_truncv8dfv8si2): Rename
> to ...
> (fix_truncv8dfv8si2):
> ... this.
> 
> (fix_trunc2):
> Rename to ...
> 
> (fix_trunc2):
> ... this.
> 
> (fix_trunc2):
> Rename to ...
> 
> (fix_trunc2):
> ... this.
> (fix_truncv2sfv2di2): Rename to ...
> (fix_truncv2sfv2di2): ... this.
> (vec_pack_ufix_trunc_): Use gen_fixuns_truncv8dfv8si2 instead of
> gen_ufix_truncv8dfv8si2.
> * config/i386/i386-builtin.def (__builtin_ia32_cvttpd2uqq256_mask,
> __builtin_ia32_cvttpd2uqq128_mask, __builtin_ia32_cvttps2uqq256_mask,
> __builtin_ia32_cvttps2uqq128_mask, __builtin_ia32_cvtuqq2ps256_mask,
> __builtin_ia32_cvtuqq2ps128_mask, __builtin_ia32_cvtuqq2pd256_mask,
> __builtin_ia32_cvtuqq2pd128_mask, __builtin_ia32_cvttpd2udq512_mask,
> __builtin_ia32_cvtuqq2ps512_mask, __builtin_ia32_cvtuqq2pd512_mask,
> __builtin_ia32_cvttps2uqq512_mask, __builtin_ia32_cvttpd2uqq512_mask):
> Use fixuns instead ufix or floatuns instead ufloat in CODE_FOR_ names.
>
> * gcc.target/i386/avx512dq-pr85918.c: New test.

OK.

Thanks,
Uros.

> --- gcc/config/i386/i386.md.jj  2018-05-25 14:34:52.339390522 +0200
> +++ gcc/config/i386/i386.md 2018-05-25 20:41:43.913430614 +0200
> @@ -981,10 +981,12 @@ (define_code_attr trunsuffix [(ss_trunca
>  ;; Used in signed and unsigned fix.
>  (define_code_iterator any_fix [fix unsigned_fix])
>  (define_code_attr fixsuffix [(fix "") (unsigned_fix "u")])
> +(define_code_attr fixunssuffix [(fix "") (unsigned_fix "uns")])
>
>  ;; Used in signed and unsigned float.
>  (define_code_iterator any_float [float unsigned_float])
>  (define_code_attr floatsuffix [(float "") (unsigned_float "u")])
> +(define_code_attr floatunssuffix [(float "") (unsigned_float "uns")])
>
>  ;; All integer modes.
>  (define_mode_iterator SWI1248x [QI HI SI DI])
> --- gcc/config/i386/sse.md.jj   2018-05-25 14:35:23.122416638 +0200
> +++ gcc/config/i386/sse.md  2018-05-25 20:21:41.939050655 +0200
> @@ -4853,7 +4853,7 @@ (define_insn "float (set_attr "prefix" "maybe_vex")
> (set_attr "mode" "")])
>
> -(define_insn 
> "float2"
> +(define_insn 
> "float2"
>[(set (match_operand:VF2_AVX512VL 0 "register_operand" "=v")
> (any_float:VF2_AVX512VL
>   (match_operand: 1 "nonimmediate_operand" 
> "")))]
> @@ -4863,7 +4863,7 @@ (define_insn "float (set_attr "prefix" "evex")
> (set_attr "mode" "")])
>
> -;; For float insn patterns
> +;; For float insn patterns
>  (define_mode_attr qq2pssuff
>[(V8SF "") (V4SF "{y}")])
>
> @@ -4877,7 +4877,7 @@ (define_mode_attr sseintvecmode3
>[(V8SF "XI") (V4SF "OI")
> (V8DF "OI") (V4DF "TI")])
>
> -(define_insn 
> "float2"
> +(define_insn 
> "float2"
>[(set (match_operand:VF1_128_256VL 0 "register_operand" "=v")
>  (any_float:VF1_128_256VL
>(match_operand: 1 "nonimmediate_operand" 
> "")))]
> @@ -4887,7 +4887,7 @@ (define_insn "float (set_attr "prefix" "evex")
> (set_attr "mode" "")])
>
> -(define_insn "*floatv2div2sf2"
> +(define_insn "*floatv2div2sf2"
>[(set (match_operand:V4SF 0 "register_operand" "=v")
>  (vec_concat:V4SF
> (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" 
> "vm"))
> @@ -4898,7 +4898,7 @@ (define_insn "*floatv2div2s
> (set_attr "prefix" "evex")
> (set_attr "mode" "V4SF")])
>
> -(define_insn "floatv2div2sf2_mask"
> +(define_insn "floatv2div2sf2_mask"
>[(set (match_operand:V4SF 0 "register_operand" "=v")
>  (vec_concat:V4SF
>  (vec_merge:V2SF
> @@ -4914,7 +4914,7 @@ (define_insn "floatv2div2sf
> (set_attr "prefix" "evex")
> (set_attr "mode" 

Re: Enabling -ftree-slp-vectorize on -O2/Os

2018-05-26 Thread Richard Biener
On May 26, 2018 11:32:29 AM GMT+02:00, Allan Sandfeld Jensen 
 wrote:
>I brought this subject up earlier, and was told to suggest it again for
>gcc 9, so I have attached the preliminary changes.
>
>My studies have shown that with generic x86-64 optimization it reduces
>binary size by around 0.5%, and when optimizing for x64 targets with
>SSE4 or better, it reduces binary size by 2-3% on average. The
>performance changes are negligible however*, and I haven't been able to
>detect changes in compile time big enough to penetrate general noise on
>my platform, but perhaps someone has a better setup for that?
>
>* I believe that is because it currently works best on non-optimized
>code, it is better at big basic blocks doing all kinds of things than
>tightly written inner loops.
>
>Anything else I should test or report?

If you have access to SPEC CPU I'd like to see performance, size and
compile-time effects of the patch on that. Embedded folks may want to run their
favorite benchmark and report results as well.

Richard. 
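
As a point of reference for anyone wanting to try the option, here is a minimal sketch of the kind of straight-line code that basic-block (SLP) vectorization targets. The function name add4 is hypothetical, and whether the four additions are actually combined depends on the target and the cost model, so this is an input to experiment with rather than a guaranteed result.

```c
/* Four independent operations on adjacent elements within a single
   basic block.  There is no loop, so the loop vectorizer has nothing
   to work on, but the SLP vectorizer may merge the four scalar adds
   into one vector add.  The restrict qualifiers tell the compiler the
   arrays do not overlap.  */
void add4(int *restrict dst, const int *restrict a, const int *restrict b)
{
    dst[0] = a[0] + b[0];
    dst[1] = a[1] + b[1];
    dst[2] = a[2] + b[2];
    dst[3] = a[3] + b[3];
}
```

Comparing the assembly from `gcc -O2 -S` with and without `-ftree-slp-vectorize` on a file like this is one way to see the effect on code size for a given target.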

>Best regards
>'Allan
>
>
>diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>index beba295bef5..05851229354 100644
>--- a/gcc/doc/invoke.texi
>+++ b/gcc/doc/invoke.texi
>@@ -7612,6 +7612,7 @@ also turns on the following optimization flags:
> -fstore-merging @gol
> -fstrict-aliasing @gol
> -ftree-builtin-call-dce @gol
>+-ftree-slp-vectorize @gol
> -ftree-switch-conversion -ftree-tail-merge @gol
> -fcode-hoisting @gol
> -ftree-pre @gol
>@@ -7635,7 +7636,6 @@ by @option{-O2} and also turns on the following optimization flags:
> -floop-interchange @gol
> -floop-unroll-and-jam @gol
> -fsplit-paths @gol
>--ftree-slp-vectorize @gol
> -fvect-cost-model @gol
> -ftree-partial-pre @gol
> -fpeel-loops @gol
>@@ -8932,7 +8932,7 @@ Perform loop vectorization on trees. This flag is enabled by default at
> @item -ftree-slp-vectorize
> @opindex ftree-slp-vectorize
> Perform basic block vectorization on trees. This flag is enabled by default at
>-@option{-O3} and when @option{-ftree-vectorize} is enabled.
>+@option{-O2} or higher, and when @option{-ftree-vectorize} is enabled.
> 
> @item -fvect-cost-model=@var{model}
> @opindex fvect-cost-model
>diff --git a/gcc/opts.c b/gcc/opts.c
>index 33efcc0d6e7..11027b847e8 100644
>--- a/gcc/opts.c
>+++ b/gcc/opts.c
>@@ -523,6 +523,7 @@ static const struct default_options default_options_table[] =
> { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 },
> { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 },
> { OPT_LEVELS_2_PLUS, OPT_fstore_merging, NULL, 1 },
>+{ OPT_LEVELS_2_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
> 
> /* -O3 optimizations.  */
> { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
>@@ -539,7 +540,6 @@ static const struct default_options default_options_table[] =
> { OPT_LEVELS_3_PLUS, OPT_floop_unroll_and_jam, NULL, 1 },
> { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
> { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
>-{ OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
> { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
> { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
> { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },



[Bug target/85900] [9 Regression] ICEs after revision r260547 on darwin.

2018-05-26 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85900

--- Comment #6 from Dominique d'Humieres  ---
> This patch fixes the ICE and related problems I have spotted. Full testing
> in progress.

Back to "normal"!

Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP

2018-05-26 Thread Richard Sandiford
Steve Ellcey  writes:
> On Wed, 2018-05-16 at 22:11 +0100, Richard Sandiford wrote:
>> 
>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED is the only current way
>> of saying that an rtl instruction preserves the low part of a
>> register but clobbers the high part.  We would need something like
>> Alan H's CLOBBER_HIGH patches to do it using explicit clobbers.
>> 
>> Another approach would be to piggy-back on the -fipa-ra
>> infrastructure
>> and record that vector PCS functions only clobber Q0-Q7.  If -fipa-ra
>> knows that a function doesn't clobber Q8-Q15 then that should
>> override
>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  (I'm not sure whether it does
>> in practice, but it should :-)  And if it doesn't that's a bug that's
>> worth fixing for its own sake.)
>> 
>> Thanks,
>> Richard
>
> Alan,
>
> I have been looking at your CLOBBER_HIGH patches to see if they
> might be helpful in implementing the ARM SIMD Vector ABI in GCC.
> I have also been looking at the -fipa-ra flag and how it works.
>
> I was wondering if you considered using the ipa-ra infrastructure
> for the SVE work that you are currently trying to support with 
> the CLOBBER_HIGH macro?
>
> My current thought for the ABI work is to mark all the floating
> point / vector registers as caller saved (the lower half of V8-V15
> are currently callee saved) and remove
> TARGET_HARD_REGNO_CALL_PART_CLOBBERED.
> This should work but would be inefficient.
>
> The next step would be to split get_call_reg_set_usage up into
> two functions so that I don't have to pass in a default set of
> registers.  One function would return call_used_reg_set by
> default (but could return a smaller set if it had actual used
> register information) and the other would return
> regs_invalidated_by_call by default (but could also return a smaller set).
>
> Next I would add a 'largest mode used' array to call_cgraph_rtl_info
> structure in addition to the current function_used_regs register
> set.
>
> Then I could turn the get_call_reg_set_usage replacement functions
> into target specific functions and with the information in the
> call_cgraph_rtl_info structure and any simd attribute information on
> a function I could modify what registers are really being used/invalidated
> without being saved.
>
> If the called function only uses the bottom half of a register it would not
> be marked as used/invalidated.  If it uses the entire register and the
> function is not marked as simd, then the register would be marked as
> used/invalidated.  If the function was marked as simd the register would not
> be marked because a simd function would save both the upper and lower halves
> of a callee saved register (whereas a non simd function would only save the
> lower half).
>
> Does this sound like something that could be used in place of your 
> CLOBBER_HIGH patch?

One of the advantages of CLOBBER_HIGH is that it can be attached to
arbitrary instructions, not just calls.  The motivating example was
tlsdesc_small_, which isn't treated as a call but as a normal
instruction.  (And I don't think we want to change that, since it's much
easier for rtl optimisers to deal with normal instructions compared to
calls.  In general a call is part of a longer sequence of instructions
that includes setting up arguments, etc.)

The other use case (not implemented in the posted patches) would be
to represent the effect of syscalls, which clobber the "SVE part"
of all vector registers.  In that case the clobber would need to be
attached to an inline asm insn.

On the wider point about changing the way call clobber information
is represented: I agree it would be good to generalise what we have
now.  But if possible I think we should avoid target hooks that take
a specific call, and instead make it an inherent part of the call insn
itself, much like CALL_INSN_FUNCTION_USAGE is now.  E.g. we could add
a field that points to an ABI description, with -fipa-ra effectively
creating ad-hoc ABIs.  That ABI description could start out with
whatever we think is relevant now and could grow over time.
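
To make the ABI-description idea a bit more concrete, here is a minimal
sketch.  Every name in it (call_abi_desc, value_clobbered_by_call, the
field names) is hypothetical and not an existing GCC interface, and it
assumes at most 64 hard registers so that a register set fits in one
uint64_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-call ABI description; not an existing GCC type.
   Assumes at most 64 hard registers.  */
struct call_abi_desc
{
  uint64_t full_clobber;     /* Bit N set: register N is fully clobbered.  */
  uint64_t partial_clobber;  /* Bit N set: only the part of register N
                                above PRESERVED_BYTES is clobbered.  */
  unsigned preserved_bytes;  /* Low bytes the callee preserves in a
                                partially clobbered register.  */
};

/* Return true if a value of VALUE_BYTES bytes that is live in REGNO
   may be destroyed by a call that follows ABI.  */
bool
value_clobbered_by_call (const struct call_abi_desc *abi,
                         unsigned regno, unsigned value_bytes)
{
  uint64_t bit = (uint64_t) 1 << regno;

  if (abi->full_clobber & bit)
    return true;
  if (abi->partial_clobber & bit)
    return value_bytes > abi->preserved_bytes;
  return false;
}
```

Under this model a call insn would point at one such description.  The
base ABI, a simd ABI, and an -fipa-ra "ad-hoc" ABI would then differ
only in the sets and in preserved_bytes, rather than in a target hook
that inspects the call.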

Thanks,
Richard


Re: Why is REG_ALLOC_ORDER not defined on Aarch64

2018-05-26 Thread Richard Sandiford
Andrew Pinski  writes:
> On Fri, May 25, 2018 at 3:35 PM, Steve Ellcey  wrote:
>> I was curious if there was any reason that REG_ALLOC_ORDER is not
>> defined for Aarch64.  Has anyone tried this to see if it could help
>> performance?  It is defined for many other platforms.
>
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01815.html
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01822.html

It looks like the immediate reason for reverting was the effect of
listing the argument registers in reverse order.  I wonder how much that
actually helps with IRA and LRA?  They track per-register costs, and
would be able to increase the cost of a pseudo that conflicts with a
hard-register call argument.
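
As a toy model of that question (not IRA's actual code; pick_reg,
NUM_REGS and the cost arrays are made up for illustration): if the
allocator picks the cheapest register and only walks the allocation
order to break ties, then raising the cost of a conflicting argument
register steers the choice regardless of where that register sits in
the order.

```c
#define NUM_REGS 4

/* Return the register with the lowest cost.  Registers are tried in
   the order given by ALLOC_ORDER, so the order only decides between
   registers of equal cost.  */
int
pick_reg (const int alloc_order[NUM_REGS], const int cost[NUM_REGS])
{
  int best = alloc_order[0];

  for (int i = 1; i < NUM_REGS; i++)
    {
      int r = alloc_order[i];
      if (cost[r] < cost[best])
        best = r;
    }
  return best;
}
```

With all costs equal, the first entry of alloc_order wins.  Once one
register carries a higher cost it loses even when it is listed first.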

It just felt like it might have been a "best practice" idea passed down
from the old local.c and global.c days.

Thanks,
Richard


Enabling -ftree-slp-vectorize on -O2/Os

2018-05-26 Thread Allan Sandfeld Jensen
I brought this subject up earlier, and was told to suggest it again for gcc 9, 
so I have attached the preliminary changes.

My studies have shown that with generic x86-64 optimization it reduces binary 
size by around 0.5%, and when optimizing for x64 targets with SSE4 or 
better, it reduces binary size by 2-3% on average. The performance changes are 
negligible however*, and I haven't been able to detect changes in compile time 
big enough to penetrate general noise on my platform, but perhaps someone has 
a better setup for that?

* I believe that is because it currently works best on non-optimized code; it 
is better at big basic blocks doing all kinds of things than tightly written 
inner loops.
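
For readers unfamiliar with the pass, the following is a minimal
sketch of the kind of straight-line code that basic-block (SLP)
vectorization looks for.  The function name add4 is made up, and
whether GCC actually emits a single vector add here depends on the
target and the vectorizer cost model.

```c
/* Four independent adds on adjacent elements inside one basic block.
   With -ftree-slp-vectorize the compiler may combine them into one
   vector load/add/store sequence instead of four scalar ones.  */
void
add4 (int *restrict dst, const int *restrict a, const int *restrict b)
{
  dst[0] = a[0] + b[0];
  dst[1] = a[1] + b[1];
  dst[2] = a[2] + b[2];
  dst[3] = a[3] + b[3];
}
```

Comparing the output of "gcc -O2 -S" with and without
-ftree-slp-vectorize on such a function is a quick way to see the size
effect on a given target.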

Anything else I should test or report?

Best regards
'Allan


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index beba295bef5..05851229354 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7612,6 +7612,7 @@ also turns on the following optimization flags:
 -fstore-merging @gol
 -fstrict-aliasing @gol
 -ftree-builtin-call-dce @gol
+-ftree-slp-vectorize @gol
 -ftree-switch-conversion -ftree-tail-merge @gol
 -fcode-hoisting @gol
 -ftree-pre @gol
@@ -7635,7 +7636,6 @@ by @option{-O2} and also turns on the following optimization flags:
 -floop-interchange @gol
 -floop-unroll-and-jam @gol
 -fsplit-paths @gol
--ftree-slp-vectorize @gol
 -fvect-cost-model @gol
 -ftree-partial-pre @gol
 -fpeel-loops @gol
@@ -8932,7 +8932,7 @@ Perform loop vectorization on trees. This flag is enabled by default at
 @item -ftree-slp-vectorize
 @opindex ftree-slp-vectorize
 Perform basic block vectorization on trees. This flag is enabled by default at
-@option{-O3} and when @option{-ftree-vectorize} is enabled.
+@option{-O2} or higher, and when @option{-ftree-vectorize} is enabled.
 
 @item -fvect-cost-model=@var{model}
 @opindex fvect-cost-model
diff --git a/gcc/opts.c b/gcc/opts.c
index 33efcc0d6e7..11027b847e8 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -523,6 +523,7 @@ static const struct default_options default_options_table[] =
 { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_fstore_merging, NULL, 1 },
+{ OPT_LEVELS_2_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
 
 /* -O3 optimizations.  */
 { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
@@ -539,7 +540,6 @@ static const struct default_options default_options_table[] =
 { OPT_LEVELS_3_PLUS, OPT_floop_unroll_and_jam, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
-{ OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
 { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },





Re: [PING] [PATCH] Avoid excessive function type casts with splay-trees

2018-05-26 Thread Bernd Edlinger


On 05/17/18 16:37, Bernd Edlinger wrote:
> On 05/17/18 15:39, Richard Biener wrote:
>> On Thu, May 17, 2018 at 3:21 PM Bernd Edlinger 
>> 
>> wrote:
>>
>>> Ping...
>>
>> So this makes all traditional users go through the indirect
>> splay_tree_compare_wrapper
>> and friends (which is also exported for no good reason?).  And all users
>> are traditional
>> at the moment.
>>
> 
> all except gcc/typed-splay-tree.h which only works if VALUE_TYPE is
> compatible with uintptr_t but cannot check this requirement.
> This one worried me the most.
> 
> But not having to rewrite omp-low.c for instance where splay_tree_lookup
> and access to n->value are made all the time, made me think it
> will not work to rip out the old interface completely.
> 

Well, I think it will be best to split this patch in two parts:

One that adds just two utility functions for avoiding undefined
function type casts which can be used with the original C interface.
This first part is attached.
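
For illustration, the two helpers could look roughly like the sketch
below.  This is an assumption about their shape, not a copy of the
attached patch.  The two typedefs are stand-ins for the ones in
include/splay-tree.h so that the snippet compiles on its own.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the libiberty typedefs, so the sketch is
   self-contained.  */
typedef uintptr_t splay_tree_key;
typedef uintptr_t splay_tree_value;

/* Comparison function with the exact splay_tree_compare_fn signature:
   the keys are converted, not the function pointer.  */
int
splay_tree_compare_strings (splay_tree_key k1, splay_tree_key k2)
{
  return strcmp ((const char *) k1, (const char *) k2);
}

/* Deallocation function with the exact splay_tree_delete_value_fn
   signature, for values that are pointers obtained from malloc.  */
void
splay_tree_delete_pointers (splay_tree_value value)
{
  free ((void *) value);
}
```

The point of the wrappers is that strcmp and free are then called
through their real types, so callers no longer need the
"(void (*) (void))" function pointer casts.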

And another part that uses a similar approach as the splay-tree in
libgomp, but instead of creating a type-safe C interface it should
translate the complete code from splay-tree.c/.h into a template.
The second part, I plan to do at a later time.


Is this OK for trunk?


Thanks
Bernd.
include:
2018-05-26  Bernd Edlinger  

* splay-tree.h (splay_tree_compare_strings,
splay_tree_delete_pointers): Declare new utility functions.

libiberty:
2018-05-26  Bernd Edlinger  

* splay-tree.c (splay_tree_compare_strings,
splay_tree_delete_pointers): New utility functions.

gcc:
2018-05-26  Bernd Edlinger  

* tree-dump.c (dump_node): Use splay_tree_delete_pointers.

c-family:
2018-05-26  Bernd Edlinger  

* c-lex.c (get_fileinfo): Use splay_tree_compare_strings and
splay_tree_delete_pointers.

cp:
2018-05-26  Bernd Edlinger  

* decl2.c (start_static_storage_duration_function): Use
splay_tree_delete_pointers.
Index: gcc/c-family/c-lex.c
===
--- gcc/c-family/c-lex.c	(revision 260671)
+++ gcc/c-family/c-lex.c	(working copy)
@@ -103,11 +103,9 @@ get_fileinfo (const char *name)
   struct c_fileinfo *fi;
 
   if (!file_info_tree)
-file_info_tree = splay_tree_new ((splay_tree_compare_fn)
- (void (*) (void)) strcmp,
+file_info_tree = splay_tree_new (splay_tree_compare_strings,
  0,
- (splay_tree_delete_value_fn)
- (void (*) (void)) free);
+ splay_tree_delete_pointers);
 
   n = splay_tree_lookup (file_info_tree, (splay_tree_key) name);
   if (n)
Index: gcc/cp/decl2.c
===
--- gcc/cp/decl2.c	(revision 260671)
+++ gcc/cp/decl2.c	(working copy)
@@ -3595,8 +3595,7 @@ start_static_storage_duration_function (unsigned c
   priority_info_map = splay_tree_new (splay_tree_compare_ints,
 	  /*delete_key_fn=*/0,
 	  /*delete_value_fn=*/
-	  (splay_tree_delete_value_fn)
-	  (void (*) (void)) free);
+	  splay_tree_delete_pointers);
 
   /* We always need to generate functions for the
 	 DEFAULT_INIT_PRIORITY so enter it now.  That way when we walk
Index: gcc/tree-dump.c
===
--- gcc/tree-dump.c	(revision 260671)
+++ gcc/tree-dump.c	(working copy)
@@ -736,8 +736,7 @@ dump_node (const_tree t, dump_flags_t flags, FILE
   di.flags = flags;
   di.node = t;
   di.nodes = splay_tree_new (splay_tree_compare_pointers, 0,
-			 (splay_tree_delete_value_fn)
-			 (void (*) (void)) free);
+			 splay_tree_delete_pointers);
 
   /* Queue up the first node.  */
   queue (&di, t, DUMP_NONE);
Index: include/splay-tree.h
===
--- include/splay-tree.h	(revision 260671)
+++ include/splay-tree.h	(working copy)
@@ -147,7 +147,9 @@ extern splay_tree_node splay_tree_max (splay_tree)
 extern splay_tree_node splay_tree_min (splay_tree);
 extern int splay_tree_foreach (splay_tree, splay_tree_foreach_fn, void*);
 extern int splay_tree_compare_ints (splay_tree_key, splay_tree_key);
-extern int splay_tree_compare_pointers (splay_tree_key,	splay_tree_key);
+extern int splay_tree_compare_pointers (splay_tree_key, splay_tree_key);
+extern int splay_tree_compare_strings (splay_tree_key, splay_tree_key);
+extern void splay_tree_delete_pointers (splay_tree_value);
 
 #ifdef __cplusplus
 }
Index: libiberty/splay-tree.c
===
--- libiberty/splay-tree.c	(revision 260671)
+++ libiberty/splay-tree.c	(working copy)
@@ -31,6 +31,9 @@ Boston, MA 02110-1301, USA.  */
 #ifdef HAVE_STDLIB_H
 #include <stdlib.h>
 #endif
+#ifdef HAVE_STRING_H
+#include <string.h>
+#endif
 
 #include 
 
@@ -590,3 +593,19 @@ 

[Bug bootstrap/85921] /gcc/c-family/c-warn.c fails to build

2018-05-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85921

Jakub Jelinek  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

--- Comment #19 from Jakub Jelinek  ---
Worked around in 8.1+.  That doesn't mean the headers you are using aren't
seriously broken and will break other stuff.

[Bug bootstrap/85921] /gcc/c-family/c-warn.c fails to build

2018-05-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85921

--- Comment #18 from Jakub Jelinek  ---
Author: jakub
Date: Sat May 26 06:56:41 2018
New Revision: 260791

URL: https://gcc.gnu.org/viewcvs?rev=260791=gcc=rev
Log:
PR bootstrap/85921
* c-warn.c (diagnose_mismatched_attributes): Remove unnecessary
noinline variable to workaround broken kernel headers.

Modified:
branches/gcc-8-branch/gcc/c-family/ChangeLog
branches/gcc-8-branch/gcc/c-family/c-warn.c

[Bug bootstrap/85921] /gcc/c-family/c-warn.c fails to build

2018-05-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85921

--- Comment #17 from Jakub Jelinek  ---
Author: jakub
Date: Sat May 26 06:40:50 2018
New Revision: 260790

URL: https://gcc.gnu.org/viewcvs?rev=260790=gcc=rev
Log:
PR bootstrap/85921
* c-warn.c (diagnose_mismatched_attributes): Remove unnecessary
noinline variable to workaround broken kernel headers.

Modified:
trunk/gcc/c-family/ChangeLog
trunk/gcc/c-family/c-warn.c

[Bug c++/85936] New: GCC incorrectly implements [expr.prim.lambda.capture]/10.2

2018-05-26 Thread lebedev.ri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85936

Bug ID: 85936
   Summary: GCC incorrectly implements
[expr.prim.lambda.capture]/10.2
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: lebedev.ri at gmail dot com
  Target Milestone: ---

https://godbolt.org/g/qX6k2H

# 0 "" 3
template  void b(int, a);
template  void b(int, a &, c) {
  b(0, [=] { e; });
}
void d() { b(0, d, 0); }

gcc complains:
: error: 'void b(int, a) [with a = b(int, a&&, c) [with a = void (&)(); c =
int]::]', declared using local type 'b(int, a&&, c) [with a = void
(&)(); c = int]::', is used but never defined [-fpermissive]

That is incorrect, the code is perfectly valid.

Re: [PATCH] PR target/85358 patch v2: Add target hook to prevent default widening

2018-05-26 Thread Richard Biener
On May 25, 2018 8:49:47 PM GMT+02:00, Michael Meissner  
wrote:
>I redid the patch to make the target hook only apply for scalar float
>points,
>and I removed all of the integer only subcases.
>
>I have checked this on a little endian Power8 system, and verified that
>it
>bootstraps correctly and there are no regressions.  I have just started
>an
>x86_64 build.  Assuming that build has no regressions, can I check this
>into
>GCC 9?  This bug appears in GCC 8, and I would like to back port this
>patch to
>GCC 8 as well before GCC 8.2 goes out.

What happens if you hack genmodes to not claim IFmode has any wider 
relationship with other modes? 

Richard. 

>[gcc]
>2018-05-25  Michael Meissner  
>
>   PR target/85358
>   * target.def (default_fp_widening_p): New target hook to prevent
>   automatic widening between two floating point modes.
>   * optabs.c (expand_binop): Do not automatically widen a binary or
>   unary scalar floating point op if the backend says that the
>   widening should not occur.
>   (expand_twoval_unop): Likewise.
>   (expand_twoval_binop): Likewise.
>   (expand_unop): Likewise.
>   * config/rs6000/rs6000.c (TARGET_DEFAULT_FP_WIDENING_P): Define.
>   (rs6000_default_fp_widening_p): New target hook to prevent
>   automatic widening between IEEE 128-bit floating point and IBM
>   extended double floating point.
>   * doc/tm.texi (Target Hooks): Document new target hook
>   default_fp_widening_p.
>   * doc/tm.texi.in (Target Hooks): Likewise.
>
>[gcc/testsuite]
>2018-05-25  Michael Meissner  
>
>   PR target/85358
>   * gcc.target/powerpc/pr85358.c: New test.
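
The decision such a hook makes can be sketched as follows.  This is a
toy model only: the enum, the function names and the two-argument
signature are assumptions made for illustration, and the real hook in
the patch is not shown in this message.

```c
#include <stdbool.h>

/* Toy mode list.  The names echo rs6000 modes, but this is not GCC's
   machine_mode.  */
enum toy_fp_mode
{
  TOY_SF,  /* 32-bit float.  */
  TOY_DF,  /* 64-bit double.  */
  TOY_IF,  /* IBM extended double (128 bits).  */
  TOY_KF   /* IEEE 128-bit.  */
};

static bool
toy_is_128bit (enum toy_fp_mode mode)
{
  return mode == TOY_IF || mode == TOY_KF;
}

/* Hypothetical hook: may an operation in mode FROM be carried out
   automatically in mode TO?  */
bool
toy_default_fp_widening_p (enum toy_fp_mode to, enum toy_fp_mode from)
{
  /* The two 128-bit formats are different representations rather than
     a narrow/wide pair, so never widen one into the other.  */
  if (toy_is_128bit (to) && toy_is_128bit (from) && to != from)
    return false;
  return true;
}
```

A caller that tries wider modes when no pattern exists for the
requested mode would consult the hook before each attempt and skip the
candidates it rejects.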



Re: PR80155: Code hoisting and register pressure

2018-05-26 Thread Richard Biener
On May 25, 2018 9:25:51 PM GMT+02:00, Jeff Law  wrote:
>On 05/25/2018 11:54 AM, Richard Biener wrote:
>> On May 25, 2018 6:57:13 PM GMT+02:00, Jeff Law 
>wrote:
>>> On 05/25/2018 03:49 AM, Bin.Cheng wrote:
 On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni
  wrote:
> On 23 May 2018 at 18:37, Jeff Law  wrote:
>> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote:
>>> On 23 May 2018 at 13:58, Richard Biener 
>wrote:
 On Wed, 23 May 2018, Prathamesh Kulkarni wrote:

> Hi,
> I am trying to work on PR80155, which exposes a problem with
>>> code
> hoisting and register pressure on a leading embedded benchmark
>>> for ARM
> cortex-m7, where code-hoisting causes an extra register spill.
>
> I have attached two test-cases which (hopefully) are
>>> representative of
> the original test-case.
> The first one (trans_dfa.c) is bigger and somewhat similar to
>>> the
> original test-case and trans_dfa_2.c is hand-reduced version
>of
> trans_dfa.c. There's 2 spills caused with trans_dfa.c
> and one spill with trans_dfa_2.c due to lesser amount of
>cases.
> The test-cases in the PR are probably not relevant.
>
> Initially I thought the spill was happening because of "too
>many
> hoistings" taking place in original test-case thus increasing
>>> the
> register pressure, but it seems the spill is possibly caused
>>> because
> expression gets hoisted out of a block that is on loop exit.
>
> For example, the following hoistings take place with
>>> trans_dfa_2.c:
>
> (1) Inserting expression in block 4 for code hoisting:
> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005)
>
> (2) Inserting expression in block 4 for code hoisting:
>>> {plus_expr,_4,1} (0006)
>
> (3) Inserting expression in block 4 for code hoisting:
> {pointer_plus_expr,s_33,1} (0023)
>
> (4) Inserting expression in block 3 for code hoisting:
> {pointer_plus_expr,s_33,1} (0023)
>
> The issue seems to be hoisting of (*tab + 1) which consists of
>>> first
> two hoistings in block 4
> from blocks 5 and 9, which causes the extra spill. I verified
>>> that by
> disabling hoisting into block 4,
> which resulted in no extra spills.
>
> I wonder if that's because the expression (*tab + 1) is
>getting
> hoisted from blocks 5 and 9,
> which are on loop exit ? So the expression that was previously
> computed in a block on loop exit, gets hoisted outside that
>>> block
> which possibly makes the allocator more defensive ? Similarly
> disabling hoisting of expressions which appeared in blocks on
>>> loop
> exit in original test-case prevented the extra spill. The
>other
> hoistings didn't seem to matter.

 I think that's simply co-incidence.  The only thing that makes
 a block that also exits from the loop special is that an
 expression could be sunk out of the loop and hoisting
>(commoning
 with another path) could prevent that.  But that isn't what is
 happening here and it would be a pass ordering issue as
 the sinking pass runs only after hoisting (no idea why exactly
 but I guess there are cases where we want to prefer CSE over
 sinking).  So you could try if re-ordering PRE and sinking
>helps
 your testcase.
>>> Thanks for the suggestions. Placing sink pass before PRE works
>>> for both these test-cases! Sadly it still causes the spill for
>the
>>> benchmark -:(
>>> I will try to create a better approximation of the original
>>> test-case.

 What I do see is a missed opportunity to merge the successors
 of BB 4.  After PRE we have

  [local count: 159303558]:
 :
 pretmp_123 = *tab_37(D);
 _87 = pretmp_123 + 1;
 if (c_36 == 65)
   goto ; [34.00%]
 else
   goto ; [66.00%]

  [local count: 54163210]:
 *tab_37(D) = _87;
 _96 = MEM[(char *)s_57 + 1B];
 if (_96 != 0)
   goto ; [89.00%]
 else
   goto ; [11.00%]

  [local count: 105140348]:
 *tab_37(D) = _87;
 _56 = MEM[(char *)s_57 + 1B];
 if (_56 != 0)
   goto ; [89.00%]
 else
   goto ; [11.00%]

 here at least the stores and loads can be hoisted.  Note this
 may also point at the real issue of the code hoisting which is
 tearing apart the RMW operation?
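
In source terms, the shape under discussion is roughly the following.
This is a hand-written illustration, not trans_dfa_2.c; the function
names and return values are invented.  The first function keeps each
read-modify-write of *tab together inside its branch arm, the second
shows the effect of hoisting only the load and the add.

```c
/* Both arms increment *tab and then look at s[1].  */
int
step_branchy (int *tab, const char *s, int c)
{
  if (c == 65)
    {
      *tab = *tab + 1;
      return s[1] != 0 ? 1 : 2;
    }
  else
    {
      *tab = *tab + 1;
      return s[1] != 0 ? 3 : 4;
    }
}

/* Same behaviour after hoisting "*tab + 1" above the branch.  The
   stores stay in the arms, so the incremented value is now live
   across the branch and needs a register there.  */
int
step_hoisted (int *tab, const char *s, int c)
{
  int incremented = *tab + 1;

  if (c == 65)
    {
      *tab = incremented;
      return s[1] != 0 ? 1 : 2;
    }
  else
    {
      *tab = incremented;
      return s[1] != 0 ? 3 : 4;
    }
}
```

Hoisting the store and the s[1] load as well would make the two arms
identical up to the final test, which is the merge opportunity
mentioned above.
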
>>> Indeed, this possibility seems much more likely than block being
>>> on loop exit.
>>> I will try to "hardcode" the load/store hoists into 

Re: [PATCH] Remove useless noinline variable (PR bootstrap/85921)

2018-05-26 Thread Richard Biener
On May 25, 2018 11:03:50 PM GMT+02:00, Jakub Jelinek  wrote:
>Hi!
>
>The following variable only makes the code larger and less readable.
>In addition, with some broken kernel headers that redefine noinline
>it breaks bootstrap.
>
>Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok
>for
>trunk?

OK. 

Richard. 

>2018-05-25  Jakub Jelinek  
>
>   PR bootstrap/85921
>   * c-warn.c (diagnose_mismatched_attributes): Remove unnecessary
>   noinline variable to workaround broken kernel headers.
>
>--- gcc/c-family/c-warn.c.jj   2018-05-21 13:15:33.878575581 +0200
>+++ gcc/c-family/c-warn.c  2018-05-25 14:28:12.151050892 +0200
>@@ -2246,18 +2246,16 @@ diagnose_mismatched_attributes (tree old
>  newdecl);
> 
>   /* Diagnose inline __attribute__ ((noinline)) which is silly.  */
>-  const char *noinline = "noinline";
>-
>   if (DECL_DECLARED_INLINE_P (newdecl)
>   && DECL_UNINLINABLE (olddecl)
>-  && lookup_attribute (noinline, DECL_ATTRIBUTES (olddecl)))
>+  && lookup_attribute ("noinline", DECL_ATTRIBUTES (olddecl)))
>warned |= warning (OPT_Wattributes, "inline declaration of %qD follows "
>- "declaration with attribute %qs", newdecl, noinline);
>+ "declaration with attribute %<noinline%>", newdecl);
>   else if (DECL_DECLARED_INLINE_P (olddecl)
>  && DECL_UNINLINABLE (newdecl)
>  && lookup_attribute ("noinline", DECL_ATTRIBUTES (newdecl)))
>warned |= warning (OPT_Wattributes, "declaration of %q+D with attribute "
>- "%qs follows inline declaration", newdecl, noinline);
>+ "%<noinline%> follows inline declaration", newdecl);
> 
>   return warned;
> }
>
>   Jakub
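
The failure mode can be reproduced in a few lines.  The macro below is
an assumed stand-in for what a header that redefines noinline might
contain (the actual kernel header is not quoted in this thread), and
is_noinline_attribute is a made-up helper.

```c
#include <string.h>

/* Assumed stand-in for a header that redefines noinline.  */
#define noinline __attribute__ ((noinline))

/* With the macro in effect, the removed declaration

     const char *noinline = "noinline";

   would expand to

     const char *__attribute__ ((noinline)) = "noinline";

   which has no declarator name and does not parse.  */

/* Macros are not expanded inside string literals, so spelling the
   attribute name as a literal is unaffected by the redefinition.  */
int
is_noinline_attribute (const char *name)
{
  return strcmp (name, "noinline") == 0;
}
```

This is why replacing the variable with the literal "noinline" both
simplifies the code and sidesteps the broken headers.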