Re: Enabling -ftree-slp-vectorize on -O2/Os
On May 27, 2018 1:25:25 AM GMT+02:00, Allan Sandfeld Jensen wrote:
>On Sunday, 27 May 2018 00:05:32 CEST Segher Boessenkool wrote:
>> On Sat, May 26, 2018 at 11:32:29AM +0200, Allan Sandfeld Jensen wrote:
>> > I brought this subject up earlier, and was told to suggest it again for
>> > gcc 9, so I have attached the preliminary changes.
>> >
>> > My studies have shown that with generic x86-64 optimization it reduces
>> > binary size by around 0.5%, and when optimizing for x64 targets with
>> > SSE4 or better, it reduces binary size by 2-3% on average. The
>> > performance changes are negligible however*, and I haven't been able to
>> > detect changes in compile time big enough to penetrate general noise on
>> > my platform, but perhaps someone has a better setup for that?
>> >
>> > * I believe that is because it currently works best on non-optimized code,
>> > it is better at big basic blocks doing all kinds of things than tightly
>> > written inner loops.
>> >
>> > Anything else I should test or report?
>>
>> What does it do on other architectures?
>
>I believe NEON would do the same as SSE4, but I can do a check. For
>architectures without SIMD it essentially does nothing.

By default it combines integer ops where possible into word_mode registers. So yes, almost nothing.

Richard.

>'Allan
[Bug libfortran/85906] Conditional jump depends on uninitialized value in write_decimal / write_integer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85906

--- Comment #7 from Jerry DeLisle ---
Author: jvdelisle
Date: Sun May 27 03:22:11 2018
New Revision: 260802

URL: https://gcc.gnu.org/viewcvs?rev=260802&root=gcc&view=rev
Log:
2018-05-26  Jerry DeLisle

        Backport from trunk.
        PR libgfortran/85906
        * io/write.c (write_integer): Initialise the fnode format to FMT_NONE,
        used for list directed write.
        (BUF_STACK_SZ): Bump default buffer size up to avoid allocs on small
        stuff.

2018-05-26  Jerry DeLisle

        Backport from trunk.
        PR libgfortran/85840
        * io/write.c (write_float_0): Use separate local variable for the float
        string length.

Modified:
    branches/gcc-8-branch/libgfortran/ChangeLog
    branches/gcc-8-branch/libgfortran/io/write.c
[Bug fortran/85840] Memory leak in write.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85840

--- Comment #13 from Jerry DeLisle ---
Author: jvdelisle
Date: Sun May 27 03:22:11 2018
New Revision: 260802

URL: https://gcc.gnu.org/viewcvs?rev=260802&root=gcc&view=rev
Log:
2018-05-26  Jerry DeLisle

        Backport from trunk.
        PR libgfortran/85906
        * io/write.c (write_integer): Initialise the fnode format to FMT_NONE,
        used for list directed write.
        (BUF_STACK_SZ): Bump default buffer size up to avoid allocs on small
        stuff.

2018-05-26  Jerry DeLisle

        Backport from trunk.
        PR libgfortran/85840
        * io/write.c (write_float_0): Use separate local variable for the float
        string length.

Modified:
    branches/gcc-8-branch/libgfortran/ChangeLog
    branches/gcc-8-branch/libgfortran/io/write.c
Re: Enabling -ftree-slp-vectorize on -O2/Os
On Sun, May 27, 2018 at 01:25:25AM +0200, Allan Sandfeld Jensen wrote:
> On Sunday, 27 May 2018 00:05:32 CEST Segher Boessenkool wrote:
> > On Sat, May 26, 2018 at 11:32:29AM +0200, Allan Sandfeld Jensen wrote:
> > > I brought this subject up earlier, and was told to suggest it again for
> > > gcc 9, so I have attached the preliminary changes.
> > >
> > > My studies have shown that with generic x86-64 optimization it reduces
> > > binary size by around 0.5%, and when optimizing for x64 targets with
> > > SSE4 or better, it reduces binary size by 2-3% on average. The
> > > performance changes are negligible however*, and I haven't been able to
> > > detect changes in compile time big enough to penetrate general noise on
> > > my platform, but perhaps someone has a better setup for that?
> > >
> > > * I believe that is because it currently works best on non-optimized code,
> > > it is better at big basic blocks doing all kinds of things than tightly
> > > written inner loops.
> > >
> > > Anything else I should test or report?
> >
> > What does it do on other architectures?
>
> I believe NEON would do the same as SSE4, but I can do a check. For
> architectures without SIMD it essentially does nothing.

Sorry, I wasn't clear. What does it do to performance on other
architectures? Is it (almost) always a win (or neutral)? If not, it
doesn't belong in -O2, not for the generic options at least.

(We'll test it on Power soon, it's weekend now :-) ).

Segher
[Bug libstdc++/85930] [8/9 Regression] Misaligned reference created in shared_ptr_base.h with -fno-rtti
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85930

Jonathan Wakely changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
      Known to work|                              |7.3.0
           Assignee|unassigned at gcc dot gnu.org |redi at gcc dot gnu.org
   Target Milestone|---                           |8.2
            Summary|Misaligned reference          |[8/9 Regression] Misaligned
                   |created in                    |reference created in
                   |shared_ptr_base.h with        |shared_ptr_base.h with
                   |-fno-rtti                     |-fno-rtti
      Known to fail|                              |8.1.0, 9.0
[Bug libstdc++/85930] Misaligned reference created in shared_ptr_base.h with -fno-rtti
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85930

Jonathan Wakely changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-05-27
     Ever confirmed|0                           |1
[Bug c++/85940] New: Address of label breaks ISO C++ program despite non-GNU dialect and pedantic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85940

            Bug ID: 85940
           Summary: Address of label breaks ISO C++ program despite non-GNU
                    dialect and pedantic
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Keywords: rejects-valid
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hstong at ca dot ibm.com
  Target Milestone: ---

The address of label extension breaks well-formed C++ programs.
GCC confuses the logical AND operator for the extension in the source below.
MSVC works.

### SOURCE (<stdin>):
bool f(bool x) {
x:
  return (bool()) && x;
}

### COMPILER INVOCATION COMMAND:
g++ -fsyntax-only -xc++ -std=c++11 -pedantic -

### ACTUAL OUTPUT:
<stdin>: In function 'bool f(bool)':
<stdin>:3:22: warning: taking the address of a label is non-standard [-Wpedantic]
<stdin>:3:22: error: invalid cast to function type 'bool()'

### EXPECTED OUTPUT:
(Clean compile).

### COMPILER VERSION INFO (g++ -v):
Using built-in specs.
COLLECT_GCC=/opt/wandbox/gcc-head/bin/g++
COLLECT_LTO_WRAPPER=/opt/wandbox/gcc-head/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../source/configure --prefix=/opt/wandbox/gcc-head --enable-languages=c,c++ --disable-multilib --without-ppl --without-cloog-ppl --enable-checking=release --disable-nls --enable-lto LDFLAGS=-Wl,-rpath,/opt/wandbox/gcc-head/lib,-rpath,/opt/wandbox/gcc-head/lib64,-rpath,/opt/wandbox/gcc-head/lib32
Thread model: posix
gcc version 9.0.0 20180525 (experimental) (GCC)
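For readers who have not met the extension the report refers to, here is a minimal sketch in C of the two meanings of `&&`. It assumes a compiler that implements the GNU "labels as values" extension (GCC and Clang do); the function names are made up for illustration and nothing here is taken from the bug report itself.

```c
/* GNU extension (labels as values): unary && yields the address of a
   label as a void *.  This is not ISO C. */
int classify(int n)
{
    /* Table of label addresses, indexed by the result of the sign test. */
    static void *const targets[] = { &&negative, &&non_negative };
    goto *targets[n >= 0];

negative:
    return -1;
non_negative:
    return 1;
}

/* Ordinary binary && (logical AND).  The label `x` shares its name with
   the parameter, which is allowed because labels have their own
   namespace; the label is never jumped to. */
int both(int x, int y)
{
x:
    return y && x;
}
```

In `both`, the `&&` has a left operand, so it can only be the binary operator. The report concerns the C++ case where the left operand is a parenthesised expression that the parser takes for a cast.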
Re: Enabling -ftree-slp-vectorize on -O2/Os
On Sunday, 27 May 2018 00:05:32 CEST Segher Boessenkool wrote:
> On Sat, May 26, 2018 at 11:32:29AM +0200, Allan Sandfeld Jensen wrote:
> > I brought this subject up earlier, and was told to suggest it again for
> > gcc 9, so I have attached the preliminary changes.
> >
> > My studies have shown that with generic x86-64 optimization it reduces
> > binary size by around 0.5%, and when optimizing for x64 targets with
> > SSE4 or better, it reduces binary size by 2-3% on average. The
> > performance changes are negligible however*, and I haven't been able to
> > detect changes in compile time big enough to penetrate general noise on
> > my platform, but perhaps someone has a better setup for that?
> >
> > * I believe that is because it currently works best on non-optimized code,
> > it is better at big basic blocks doing all kinds of things than tightly
> > written inner loops.
> >
> > Anything else I should test or report?
>
> What does it do on other architectures?
>
I believe NEON would do the same as SSE4, but I can do a check. For
architectures without SIMD it essentially does nothing.

'Allan
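As background for the discussion, here is a minimal sketch in C of the kind of straight-line code that basic-block SLP vectorization looks for: several independent, identically shaped operations on adjacent elements inside one basic block, with no loop. The struct and function names are hypothetical, and whether GCC actually merges the four adds into one vector add depends on the target, the flags, and the cost model; none of this comes from the attached patch.

```c
#include <stdint.h>

/* Hypothetical 4-lane record; the names are illustrative only. */
struct vec4 {
    int32_t v[4];
};

/* Four isomorphic, independent adds on adjacent memory in a single basic
   block.  With -ftree-slp-vectorize on a SIMD target, a compiler may
   replace them with one vector load/add/store sequence. */
struct vec4 vec4_add(struct vec4 a, struct vec4 b)
{
    struct vec4 r;
    r.v[0] = a.v[0] + b.v[0];
    r.v[1] = a.v[1] + b.v[1];
    r.v[2] = a.v[2] + b.v[2];
    r.v[3] = a.v[3] + b.v[3];
    return r;
}
```

One way to see what a given compiler does with it is to compare the assembly from `gcc -O2 -S` with and without `-ftree-slp-vectorize`.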
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
On Sat, May 26, 2018 at 11:09:24AM +0100, Richard Sandiford wrote: > On the wider point about changing the way call clobber information > is represented: I agree it would be good to generalise what we have > now. But if possible I think we should avoid target hooks that take > a specific call, and instead make it an inherent part of the call insn > itself, much like CALL_INSN_FUNCTION_USAGE is now. E.g. we could add > a field that points to an ABI description, with -fipa-ra effectively > creating ad-hoc ABIs. That ABI description could start out with > whatever we think is relevant now and could grow over time. Somewhat related: there still is PR68150 open for problems with HARD_REGNO_CALL_PART_CLOBBERED in postreload-gcse (it ignores it). Segher
[Bug target/85918] Conversions to/from [unsigned] long long are not vectorized for AVX512DQ target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85918 --- Comment #3 from Jakub Jelinek --- Author: jakub Date: Sat May 26 22:04:50 2018 New Revision: 260797 URL: https://gcc.gnu.org/viewcvs?rev=260797=gcc=rev Log: PR target/85918 * config/i386/i386.md (fixunssuffix, floatunssuffix): New code attributes. * config/i386/sse.md (float2): Rename to ... (float2): ... this. (float2): Rename to ... (float2): ... this. (*floatv2div2sf2): Rename to ... (*floatv2div2sf2): ... this. (floatv2div2sf2_mask): Rename to ... (floatv2div2sf2_mask): ... this. (*floatv2div2sf2_mask_1): Rename to ... (*floatv2div2sf2_mask_1): ... this. (fix_truncv8dfv8si2): Rename to ... (fix_truncv8dfv8si2): ... this. (fix_trunc2): Rename to ... (fix_trunc2): ... this. (fix_trunc2): Rename to ... (fix_trunc2): ... this. (fix_truncv2sfv2di2): Rename to ... (fix_truncv2sfv2di2): ... this. (vec_pack_ufix_trunc_): Use gen_fixuns_truncv8dfv8si2 instead of gen_ufix_truncv8dfv8si2. * config/i386/i386-builtin.def (__builtin_ia32_cvttpd2uqq256_mask, __builtin_ia32_cvttpd2uqq128_mask, __builtin_ia32_cvttps2uqq256_mask, __builtin_ia32_cvttps2uqq128_mask, __builtin_ia32_cvtuqq2ps256_mask, __builtin_ia32_cvtuqq2ps128_mask, __builtin_ia32_cvtuqq2pd256_mask, __builtin_ia32_cvtuqq2pd128_mask, __builtin_ia32_cvttpd2udq512_mask, __builtin_ia32_cvtuqq2ps512_mask, __builtin_ia32_cvtuqq2pd512_mask, __builtin_ia32_cvttps2uqq512_mask, __builtin_ia32_cvttpd2uqq512_mask): Use fixuns instead ufix or floatuns instead ufloat in CODE_FOR_ names. * gcc.target/i386/avx512dq-pr85918.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/avx512dq-pr85918.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386-builtin.def trunk/gcc/config/i386/i386.md trunk/gcc/config/i386/sse.md trunk/gcc/testsuite/ChangeLog
Re: Enabling -ftree-slp-vectorize on -O2/Os
On Sat, May 26, 2018 at 11:32:29AM +0200, Allan Sandfeld Jensen wrote:
> I brought this subject up earlier, and was told to suggest it again for gcc 9,
> so I have attached the preliminary changes.
>
> My studies have shown that with generic x86-64 optimization it reduces binary
> size by around 0.5%, and when optimizing for x64 targets with SSE4 or
> better, it reduces binary size by 2-3% on average. The performance changes are
> negligible however*, and I haven't been able to detect changes in compile time
> big enough to penetrate general noise on my platform, but perhaps someone has
> a better setup for that?
>
> * I believe that is because it currently works best on non-optimized code, it
> is better at big basic blocks doing all kinds of things than tightly written
> inner loops.
>
> Anything else I should test or report?

What does it do on other architectures?

Segher
[Bug target/85939] New: -mstackrealign does not realign stack with local __m64 variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85939

            Bug ID: 85939
           Summary: -mstackrealign does not realign stack with local __m64
                    variable
           Product: gcc
           Version: 8.1.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fw at gcc dot gnu.org
  Target Milestone: ---
            Target: i386-*-linux-gnu

Consider this test case:

#include <mmintrin.h>

int f1 (__m64 *);

int
f2 (void)
{
  __m64 v;
  return f1 (&v);
}

My understanding is that the i386 ABI requires 8-byte alignment for __m64
objects. However, with "gcc -m32 -mstackrealign -O2", I get this:

        .p2align 4,,15
        .globl  f2
        .type   f2, @function
f2:
.LFB504:
        .cfi_startproc
        subl    $40, %esp
        .cfi_def_cfa_offset 44
        leal    20(%esp), %eax
        pushl   %eax
        .cfi_def_cfa_offset 48
        call    f1
        addl    $44, %esp
        .cfi_def_cfa_offset 4
        ret
        .cfi_endproc
.LFE504:
        .size   f2, .-f2

This will not produce a correctly aligned object if the stack alignment is off
by 4 (or 12) bytes.
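The alignment property in question can be checked at run time. The following is a small sketch in C, assuming a C11 compiler. It uses a hypothetical 8-byte-aligned struct in place of `__m64` so that it builds on any target, and the helper names are made up; it illustrates the check only and does not reproduce the bug.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for __m64: 8 bytes with 8-byte alignment requested through
   C11 _Alignas.  Hypothetical type, for illustration only. */
struct m64_like {
    _Alignas(8) unsigned char bytes[8];
};

/* Returns 1 if p is aligned to `alignment` bytes.  The alignment must be
   a power of two. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Mirrors the shape of f2 in the report: take the address of a local
   object and test whether it satisfies the type's alignment. */
int local_is_aligned(void)
{
    struct m64_like v = {{0}};
    return is_aligned(&v, _Alignof(struct m64_like));
}
```

With the stack pointer off by 4 on entry, `leal 20(%esp), %eax` in the listing yields an address for which a check like `is_aligned(p, 8)` would fail.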
[Bug web/85917] GCC 8 Changes page fails to mention change of default mode for C
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85917 --- Comment #2 from Arfrever Frehtes Taifersar Arahesis --- Even such minor changes could be mentioned in that page for completeness.
[Bug target/85915] -mfunction-return=thunk causes multiple definition of `__x86_return_thunk'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85915

--- Comment #7 from Arfrever Frehtes Taifersar Arahesis ---
Ebuild for GCC 7 branch is not available in Gentoo.

I guess that the relevant commit is:
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=258647
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=b67700aa0344abbbdb50d150c628475e747b316f

I have backported this commit and 6 earlier commits to GCC 7.3.0 and this
patched GCC 7.3.0 now seems to work (both with -mfunction-return=thunk and with
default -mfunction-return=keep).

I guess that the same ABI incompatibility will exist between vanilla GCC 7.3.0
and future GCC 7.4.0 as between vanilla GCC 7.3.0 and GCC 8.1.0.

If it is not possible to have ABI compatibility while still keeping fixed
behavior (not generating aliases for function return thunks), then I suggest
that documentation mention this ABI incompatibility.

The relevant places would be:
https://gcc.gnu.org/gcc-7/changes.html (mentioning ABI incompatibility between GCC 7.3.0 and >=7.4.0)
https://gcc.gnu.org/gcc-7/porting_to.html (mentioning ABI incompatibility between GCC 7.3.0 and >=7.4.0)
https://gcc.gnu.org/gcc-8/changes.html (mentioning ABI incompatibility between GCC 7.3.0 and >=8)
https://gcc.gnu.org/gcc-8/porting_to.html (mentioning ABI incompatibility between GCC 7.3.0 and >=8)
[PATCH] DWARF5: Don't generate DW_AT_loclists_base for split compile unit DIEs.
The loclists_base attribute is used to point to the beginning of the
loclists index of a DWARF5 loclists table when using DW_FORM_loclistsx.
For split compile units the base is not given by the attribute, but is
either the first (and only) index in the .debug_loclists section, or
(when placed in a .dwp file) given by the DW_SECT_LOCLISTS row in the
.debug_cu_index section.

The loclists_base attribute is only valid for the full (or skeleton)
compile unit DIE in the main (relocatable) object. But GCC only ever
generates a loclists table index for the .debug_loclists section put
into the split DWARF .dwo file.

For split compile unit DIEs it is confusing (and not according to spec)
to also have a DW_AT_loclists_base attribute (which might be wrong,
since its relocatable offset won't actually be relocated).

gcc/ChangeLog

        * dwarf2out.c (dwarf2out_finish): Remove generation of
        DW_AT_loclists_base.
---
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index c05bfe4..103ded0 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -31292,11 +31292,17 @@ dwarf2out_finish (const char *)
   if (dwarf_split_debug_info)
     {
       if (have_location_lists)
         {
-          if (dwarf_version >= 5)
-            add_AT_loclistsptr (comp_unit_die (), DW_AT_loclists_base,
-                                loc_section_label);
+          /* Since we generate the loclists in the split DWARF .dwo
+             file itself, we don't need to generate a loclists_base
+             attribute for the split compile unit DIE. That attribute
+             (and using relocatable sec_offset FORMs) isn't allowed
+             for a split compile unit. Only if the .debug_loclists
+             section was in the main file, would we need to generate a
+             loclists_base attribute here (for the full or skeleton
+             unit DIE). */
+
           /* optimize_location_lists calculates the size of the lists,
              so index them first, and assign indices to the entries.
              Although optimize_location_lists will remove entries from the
              table, it only does so for duplicates, and therefore
[Bug objc/50909] Process "#pragma options align=reset" correctly on Mac OS X
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50909

Rudolf changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rudolf.chrispens at web dot de

--- Comment #12 from Rudolf ---
This is still an issue... 7 YEARS LATER!?

I tried to change the permission on USB.h just to get around this bug... with
something like this:

#if defined(__clang__) || defined(__llvm__)
#pragma options align=reset
#else
#pragma pack()
#endif

but sadly could not do so atm. I am using gcc 8.1.0.

please fix this.
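For reference, the `#pragma pack()` fallback in the workaround behaves as sketched below in C. The struct names are hypothetical. The sketch assumes a compiler that implements the `#pragma pack` family (GCC, Clang, and MSVC do) and says nothing about how `#pragma options align=reset` is handled, which is the subject of this bug.

```c
#include <stddef.h>

/* Default layout: the compiler may insert padding after `tag`. */
struct natural {
    char tag;
    int value;
};

#pragma pack(1)   /* request 1-byte packing for the definitions below */
struct packed {
    char tag;
    int value;
};
#pragma pack()    /* restore the packing in effect when compilation began */

/* Declared after the reset, so it is laid out like `natural` again. */
struct after_reset {
    char tag;
    int value;
};
```

If the reset were not processed, every struct declared later in the translation unit would keep the packed layout, which is why an unhandled reset pragma in a system header is a problem.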
[Bug c++/58372] internal compiler error: ix86_compute_frame_layout
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58372

martchus at gmx dot net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |martchus at gmx dot net

--- Comment #13 from martchus at gmx dot net ---
I also came across this issue when compiling Qt 5.11.0. The error message you
get when compiling Qt looks very similar, indeed.

I configured my compiler similar to Bitterblue, but I'm already using GCC
8.1.0.

Since the official Qt binaries are apparently not affected, I assume that
problem is only present when using SJLJ.
[Bug fortran/85938] New: Spurious assert failure for matmul with reshaped array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85938

            Bug ID: 85938
           Summary: Spurious assert failure for matmul with reshaped array
           Product: gcc
           Version: 8.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: stephan.kramer at imperial dot ac.uk
  Target Milestone: ---

The following program

program foo
  real, dimension(9) :: A
  real, dimension(3) :: b
  integer :: n = 3
  A = 1.0
  b = 1.0
  print *, matmul(reshape(A, (/ n, n /)), b)
end program

compiled with, or without optimisation, produces the following runtime
assertion failure in libgfortran:

$ gfortran test.f90
$ ./a.out
a.out: ../../../src/libgfortran/generated/matmul_r4.c:651: matmul_r4_avx2: Assertion `GFC_DESCRIPTOR_RANK (a) == 2 || GFC_DESCRIPTOR_RANK (b) == 2' failed.

Program received signal SIGABRT: Process abort signal.

I also tried -fno-frontend-optimize to no avail. Backtrace in gdb:

(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x76dfc231 in __GI_abort () at abort.c:79
#2 0x76df39da in __assert_fail_base (fmt=0x76f46d48 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x77ba9ec8 "GFC_DESCRIPTOR_RANK (a) == 2 || GFC_DESCRIPTOR_RANK (b) == 2", file=file@entry=0x77baa300 "../../../src/libgfortran/generated/matmul_r4.c", line=line@entry=651, function=function@entry=0x77baa368 "matmul_r4_avx2") at assert.c:92
#3 0x76df3a52 in __GI___assert_fail (assertion=0x77ba9ec8 "GFC_DESCRIPTOR_RANK (a) == 2 || GFC_DESCRIPTOR_RANK (b) == 2", file=0x77baa300 "../../../src/libgfortran/generated/matmul_r4.c", line=651, function=0x77baa368 "matmul_r4_avx2") at assert.c:101
#4 0x77a5c9a3 in ?? () from /usr/lib/x86_64-linux-gnu/libgfortran.so.5
#5 0x4c32 in foo () at test.f90:8
#6 0x4cf0 in main (argc=1, argv=0x7fffe38c) at test.f90:10
#7 0x76de7a87 in __libc_start_main (main=0x4cba , argc=1, argv=0x7fffe088, init=, fini=, rtld_fini=, stack_end=0x7fffe078) at ../csu/libc-start.c:310
#8 0x482a in _start ()

This is based on source and debugging symbols for Debian gcc-8 8.1.0-3
[Bug fortran/85840] Memory leak in write.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85840 --- Comment #12 from Jerry DeLisle --- Fixed on trunk. I think this should be backported as it is a regression I think on 7 and 8 branches.
[Bug libfortran/85906] Conditional jump depends on uninitialized value in write_decimal / write_integer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85906 --- Comment #6 from Jerry DeLisle --- Fixed on trunk. If anyone thinks this should be backported as a regression, let me know.
[patch, committed, libgfortran] PR85906 - Conditional jump depends on uninitialized value in write_integer
I biffed the ChangeLog on this with a flip of two digits on the PR number (fixed).

Anyway, the following was committed as obvious to trunk. The BUF_STACK_SZ I bumped up because I noticed on PR85840 test case that even small kind floats were asking for a buffer size of 323. This avoids a few allocs for every write operation.

2018-05-26  Jerry DeLisle

        PR libgfortran/85906
        * io/write.c (write_integer): Initialise the fnode format to FMT_NONE,
        used for list directed write.
        (BUF_STACK_SZ): Bump default buffer size up to avoid allocs on small
        stuff.

--- trunk/libgfortran/io/write.c        2018/05/26 17:30:52     260793
+++ trunk/libgfortran/io/write.c        2018/05/26 18:22:18     260795
@@ -1348,6 +1348,7 @@
     }
   f.u.integer.w = width;
   f.u.integer.m = -1;
+  f.format = FMT_NONE;
   write_decimal (dtp, &f, source, kind, (void *) gfc_itoa);
 }

@@ -1465,7 +1466,7 @@

 /* Floating point helper functions.  */

-#define BUF_STACK_SZ 256
+#define BUF_STACK_SZ 384

 static int
 get_precision (st_parameter_dt *dtp, const fnode *f, const char *source, int kind)
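The reasoning behind the BUF_STACK_SZ bump can be sketched in C as a stack buffer with a heap fallback. The helper names below are hypothetical and are not the libgfortran functions; only the threshold value 384 and the request size 323 come from the message above.

```c
#include <stdlib.h>
#include <string.h>

#define STACK_BUF_SZ 384   /* same value as the patch; the name is made up */

/* Returns stack_buf when `needed` bytes fit in it, otherwise a heap block
   of `needed` bytes (NULL if the allocation fails). */
char *pick_buffer(char *stack_buf, size_t needed)
{
    if (needed <= STACK_BUF_SZ)
        return stack_buf;
    return malloc(needed);
}

/* Frees buf only when pick_buffer had to allocate it. */
void release_buffer(char *buf, const char *stack_buf)
{
    if (buf != stack_buf)
        free(buf);
}

/* Returns 1 when a request of `needed` bytes is served from the stack. */
int stays_on_stack(size_t needed)
{
    char stack_buf[STACK_BUF_SZ];
    char *buf = pick_buffer(stack_buf, needed);
    int on_stack = (buf == stack_buf);

    if (buf != NULL)
        memset(buf, 0, needed);   /* touch the buffer like a formatter would */
    release_buffer(buf, stack_buf);
    return on_stack;
}
```

Under this scheme a 323-byte request falls back to the heap with a 256-byte threshold and stays on the stack with 384, which is the saving the message describes.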
[Bug libfortran/85906] Conditional jump depends on uninitialized value in write_decimal / write_integer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85906

--- Comment #5 from Jerry DeLisle ---
2018-05-26  Jerry DeLisle

        PR libgfortran/85906
        * io/write.c (write_integer): Initialise the fnode format to FMT_NONE,
        used for list directed write.
        (BUF_STACK_SZ): Bump default buffer size up to avoid allocs on small
        stuff.

--- trunk/libgfortran/io/write.c        2018/05/26 17:30:52     260793
+++ trunk/libgfortran/io/write.c        2018/05/26 18:22:18     260795
@@ -1348,6 +1348,7 @@
     }
   f.u.integer.w = width;
   f.u.integer.m = -1;
+  f.format = FMT_NONE;
   write_decimal (dtp, &f, source, kind, (void *) gfc_itoa);
 }

@@ -1465,7 +1466,7 @@

 /* Floating point helper functions.  */

-#define BUF_STACK_SZ 256
+#define BUF_STACK_SZ 384

 static int
 get_precision (st_parameter_dt *dtp, const fnode *f, const char *source, int kind)
[Bug middle-end/85933] FAIL: gcc.dg/sso/p8.c -O3 -finline-functions (internal compiler error)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85933

Dominique d'Humieres changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-05-26
     Ever confirmed|0                           |1

--- Comment #2 from Dominique d'Humieres ---
> *** Bug 85937 has been marked as a duplicate of this bug. ***

Then confirmed.
Re: [PATCH] Warn for ignored ASM labels on typdef declarations PR 85444 (v.3)
Hello everyone! I know every member of the community is very busy, but I am following up on this patch to conform to the 'ping' etiquette. Please let me know what comments you have about this patch and how I can modify it to make sure that it meets standards. Thanks for everything that you all do to make GCC the best compiler out there. Will On Fri, May 18, 2018 at 5:34 PM, Will Hawkinswrote: > Hello again! > > Thanks to the feedback of Mr. Myers and those on the PR, I have > created a version 3 of this patch. This version introduces a new > warning flag (enabled at Wall) -Wignored-asm-name that will flag cases > where the user specifies an ASM name that the compiler ignores. > > Test cases included. Results from make bootstrap and/or make -k check > are available upon request. > > Please let me know what I can do to make this better and bring it up > to the standards of the community! Thanks again for the feedback on > this patch during the previous two revisions! > > Sincerely, > Will Hawkins > > > 2018-05-18 Will Hawkins > > PR c,c++/85444 > * gcc/c/c-decl.c: Warn about ignored asm label for > typedef declaration > * gcc/cp/decl.c: Warn about ignored asm label for > typedef declaration > * gcc/testsuite/gcc.dg/asm-pr85444.c: c testcase. > * gcc/testsuite/g++.dg/asm-pr85444.C: c++ testcase. > > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt > index c48d6dc..ab3a9af 100644 > --- a/gcc/c-family/c.opt > +++ b/gcc/c-family/c.opt > @@ -595,6 +595,10 @@ Wignored-attributes > C C++ Var(warn_ignored_attributes) Init(1) Warning > Warn whenever attributes are ignored. > > +Wignored-asm-name > +C C++ Var(warn_ignored_asm_name) Warning LangEnabledBy(C C++,Wall) > +Warn whenever assembler names are specified but ignored. > + > Wincompatible-pointer-types > C ObjC Var(warn_incompatible_pointer_types) Init(1) Warning > Warn when there is a conversion between pointers that have incompatible > types. 
> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c > index 3c4b18e..5a1ecd7 100644 > --- a/gcc/c/c-decl.c > +++ b/gcc/c/c-decl.c > @@ -5177,7 +5177,11 @@ finish_decl (tree decl, location_t init_loc, tree init, >if (!DECL_FILE_SCOPE_P (decl) >&& variably_modified_type_p (TREE_TYPE (decl), NULL_TREE)) > add_stmt (build_stmt (DECL_SOURCE_LOCATION (decl), DECL_EXPR, decl)); > - > + if (asmspec_tree != NULL_TREE) > +{ > + warning (OPT_Wignored_asm_name, "asm-specifier is ignored in " > + "typedef declaration"); > +} >rest_of_decl_compilation (decl, DECL_FILE_SCOPE_P (decl), 0); > } > > diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c > index 10e3079..4c3ee36 100644 > --- a/gcc/cp/decl.c > +++ b/gcc/cp/decl.c > @@ -6981,6 +6981,11 @@ cp_finish_decl (tree decl, tree init, bool > init_const_expr_p, >/* Take care of TYPE_DECLs up front. */ >if (TREE_CODE (decl) == TYPE_DECL) > { > + if (asmspec_tree != NULL_TREE) > +{ > + warning (OPT_Wignored_asm_name, "asm-specifier is ignored for " > + "typedef declarations"); > +} >if (type != error_mark_node >&& MAYBE_CLASS_TYPE_P (type) && DECL_NAME (decl)) > { > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index ca3772b..63f81f4 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -286,7 +286,8 @@ Objective-C and Objective-C++ Dialects}. > -Wformat-y2k -Wframe-address @gol > -Wframe-larger-than=@var{len} -Wno-free-nonheap-object > -Wjump-misses-init @gol > -Wif-not-aligned @gol > --Wignored-qualifiers -Wignored-attributes -Wincompatible-pointer-types @gol > +-Wignored-qualifiers -Wignored-attributes -Wignored-asm-name @gol > +-Wincompatible-pointer-types @gol > -Wimplicit -Wimplicit-fallthrough -Wimplicit-fallthrough=@var{n} @gol > -Wimplicit-function-declaration -Wimplicit-int @gol > -Winit-self -Winline -Wno-int-conversion -Wint-in-bool-context @gol > @@ -4523,6 +4524,14 @@ Warn when an attribute is ignored. 
This is > different from the > to drop an attribute, not that the attribute is either unknown, used in a > wrong place, etc. This warning is enabled by default. > > +@item -Wignored-asm-name @r{(C and C++ only)} > +@opindex Wignored-asm-name > +@opindex Wno-ignored-asm-name > +Warn when an assembler name is given but ignored. For C and C++, this > +happens when a @code{typdef} declaration is given an assembler name. > + > +This warning is also enabled by @option{-Wall}. > + > @item -Wmain > @opindex Wmain > @opindex Wno-main > diff --git a/gcc/testsuite/g++.dg/asm-pr85444.C > b/gcc/testsuite/g++.dg/asm-pr85444.C > new file mode 100644 > index 000..f1f8f61 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/asm-pr85444.C > @@ -0,0 +1,13 @@ > +/* Fix Bugzilla 8544 -- asm specifier on typedef silently ignored. > + { dg-do compile } > + { dg-options "-Wignored-asm-name" } */ > + > +typedef struct > +{ > + int
Re: PR80155: Code hoisting and register pressure
On Fri, May 25, 2018 at 5:54 PM, Richard Bienerwrote: > On May 25, 2018 6:57:13 PM GMT+02:00, Jeff Law wrote: >>On 05/25/2018 03:49 AM, Bin.Cheng wrote: >>> On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni >>> wrote: On 23 May 2018 at 18:37, Jeff Law wrote: > On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote: >> On 23 May 2018 at 13:58, Richard Biener wrote: >>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote: >>> Hi, I am trying to work on PR80155, which exposes a problem with >>code hoisting and register pressure on a leading embedded benchmark >>for ARM cortex-m7, where code-hoisting causes an extra register spill. I have attached two test-cases which (hopefully) are >>representative of the original test-case. The first one (trans_dfa.c) is bigger and somewhat similar to >>the original test-case and trans_dfa_2.c is hand-reduced version of trans_dfa.c. There's 2 spills caused with trans_dfa.c and one spill with trans_dfa_2.c due to lesser amount of cases. The test-cases in the PR are probably not relevant. Initially I thought the spill was happening because of "too many hoistings" taking place in original test-case thus increasing >>the register pressure, but it seems the spill is possibly caused >>because expression gets hoisted out of a block that is on loop exit. For example, the following hoistings take place with >>trans_dfa_2.c: (1) Inserting expression in block 4 for code hoisting: {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) (2) Inserting expression in block 4 for code hoisting: >>{plus_expr,_4,1} (0006) (3) Inserting expression in block 4 for code hoisting: {pointer_plus_expr,s_33,1} (0023) (4) Inserting expression in block 3 for code hoisting: {pointer_plus_expr,s_33,1} (0023) The issue seems to be hoisting of (*tab + 1) which consists of >>first two hoistings in block 4 from blocks 5 and 9, which causes the extra spill. I verified >>that by disabling hoisting into block 4, which resulted in no extra spills. 
I wonder if that's because the expression (*tab + 1) is getting hoisted from blocks 5 and 9, which are on loop exit ? So the expression that was previously computed in a block on loop exit, gets hoisted outside that >>block which possibly makes the allocator more defensive ? Similarly disabling hoisting of expressions which appeared in blocks on >>loop exit in original test-case prevented the extra spill. The other hoistings didn't seem to matter. >>> >>> I think that's simply co-incidence. The only thing that makes >>> a block that also exits from the loop special is that an >>> expression could be sunk out of the loop and hoisting (commoning >>> with another path) could prevent that. But that isn't what is >>> happening here and it would be a pass ordering issue as >>> the sinking pass runs only after hoisting (no idea why exactly >>> but I guess there are cases where we want to prefer CSE over >>> sinking). So you could try if re-ordering PRE and sinking helps >>> your testcase. >> Thanks for the suggestions. Placing sink pass before PRE works >> for both these test-cases! Sadly it still causes the spill for the >>benchmark -:( >> I will try to create a better approximation of the original >>test-case. >>> >>> What I do see is a missed opportunity to merge the successors >>> of BB 4. After PRE we have >>> >>> [local count: 159303558]: >>> : >>> pretmp_123 = *tab_37(D); >>> _87 = pretmp_123 + 1; >>> if (c_36 == 65) >>> goto ; [34.00%] >>> else >>> goto ; [66.00%] >>> >>> [local count: 54163210]: >>> *tab_37(D) = _87; >>> _96 = MEM[(char *)s_57 + 1B]; >>> if (_96 != 0) >>> goto ; [89.00%] >>> else >>> goto ; [11.00%] >>> >>> [local count: 105140348]: >>> *tab_37(D) = _87; >>> _56 = MEM[(char *)s_57 + 1B]; >>> if (_56 != 0) >>> goto ; [89.00%] >>> else >>> goto ; [11.00%] >>> >>> here at least the stores and loads can be hoisted. Note this >>> may also point at the real issue of the code hoisting which is >>> tearing apart the RMW operation? 
>> Indeed, this possibility seems much more likely than block being >>on loop exit. >> I will try to "hardcode" the load/store hoists into block 4 for >>this >> specific test-case to check >> if that prevents the spill. > Even if it prevents the spill in this case, it's likely a good >>thing to > do. The
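To make the transformation under discussion concrete, here is a hand-written sketch in C of code shaped like the GIMPLE quoted above, before and after the hoisting of `*tab + 1`. The function names and return values are invented for illustration; this is not the benchmark or the attached test case, and the register-pressure effect itself cannot be observed from C source.

```c
/* Before hoisting: each arm performs its own read-modify-write of *tab. */
int step_before(int *tab, const char *s, int c)
{
    if (c == 65) {
        *tab = *tab + 1;
        return s[1] != 0 ? 1 : 2;
    } else {
        *tab = *tab + 1;
        return s[1] != 0 ? 3 : 4;
    }
}

/* After hoisting: the load and the add are done once ahead of the branch,
   while the stores stay in the arms.  The temporary t is now live across
   the branch, which is the kind of extra live range the thread suspects
   of costing a register. */
int step_after(int *tab, const char *s, int c)
{
    int t = *tab + 1;

    if (c == 65) {
        *tab = t;
        return s[1] != 0 ? 1 : 2;
    } else {
        *tab = t;
        return s[1] != 0 ? 3 : 4;
    }
}
```

Both functions compute the same results; the difference is only in where the load and add sit relative to the branch, which is the "tearing apart the RMW operation" concern raised above.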
[Bug middle-end/85933] FAIL: gcc.dg/sso/p8.c -O3 -finline-functions (internal compiler error)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85933

Eric Botcazou changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dominiq at lps dot ens.fr

--- Comment #1 from Eric Botcazou ---
*** Bug 85937 has been marked as a duplicate of this bug. ***
[Bug ada/85937] [9 Regression] Failures in the Ada tests
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85937

Eric Botcazou changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #1 from Eric Botcazou ---
.

*** This bug has been marked as a duplicate of bug 85933 ***
[patch, libgfortran, committed] Bug 85840 - Memory leak in write.c
The following committed as obvious after regression testing.

2018-05-26  Jerry DeLisle

	PR libgfortran/85840
	* io/write.c (write_float_0): Use separate local variable for the
	float string length.

Author: jvdelisle
Date: Sat May 26 17:30:52 2018
New Revision: 260793

URL: https://gcc.gnu.org/viewcvs?rev=260793&root=gcc&view=rev
Log:
2018-05-26  Jerry DeLisle

	PR libgfortran/85840
	* io/write.c (write_float_0): Use separate local variable for the
	float string length.

Modified:
    trunk/libgfortran/io/write.c

--- trunk/libgfortran/io/write.c	2018/05/26 11:35:31	260792
+++ trunk/libgfortran/io/write.c	2018/05/26 17:30:52	260793
@@ -1566,19 +1566,19 @@
   char buf_stack[BUF_STACK_SZ];
   char str_buf[BUF_STACK_SZ];
   char *buffer, *result;
-  size_t buf_size, res_len;
+  size_t buf_size, res_len, flt_str_len;
 
   /* Precision for snprintf call.  */
   int precision = get_precision (dtp, f, source, kind);
 
   /* String buffer to hold final result.  */
   result = select_string (dtp, f, str_buf, &res_len, kind);
-  
+
   buffer = select_buffer (dtp, f, precision, buf_stack, &buf_size, kind);
-  
+
   get_float_string (dtp, f, source , kind, 0, buffer,
-                    precision, buf_size, result, &res_len);
-  write_float_string (dtp, result, res_len);
+                    precision, buf_size, result, &flt_str_len);
+  write_float_string (dtp, result, flt_str_len);
 
   if (buf_size > BUF_STACK_SZ)
     free (buffer);
[Bug fortran/85840] Memory leak in write.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85840

--- Comment #11 from Jerry DeLisle ---
Author: jvdelisle
Date: Sat May 26 17:30:52 2018
New Revision: 260793

URL: https://gcc.gnu.org/viewcvs?rev=260793&root=gcc&view=rev
Log:
2018-05-26  Jerry DeLisle

	PR libgfortran/85840
	* io/write.c (write_float_0): Use separate local variable for the
	float string length.

Modified:
    trunk/libgfortran/io/write.c
[Bug ada/85937] New: [9 Regression] Failures in the Ada tests
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85937

            Bug ID: 85937
           Summary: [9 Regression] Failures in the Ada tests
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ada
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dominiq at lps dot ens.fr
                CC: ebotcazou at gcc dot gnu.org
  Target Milestone: ---
              Host: x86_64-apple-darwin17
            Target: x86_64-apple-darwin17
             Build: x86_64-apple-darwin17

Between revisions r260557 and r260760, the following errors appeared on darwin

FAIL: gnat.dg/sso/q10.adb -O3 -finline-functions (test for excess errors)
UNRESOLVED: gnat.dg/sso/q10.adb -O3 -finline-functions compilation failed to produce executable
FAIL: gnat.dg/sso/q11.adb -O3 -finline-functions (test for excess errors)
UNRESOLVED: gnat.dg/sso/q11.adb -O3 -finline-functions compilation failed to produce executable

with -m32, and

FAIL: gnat.dg/sso/p6.adb -O3 -finline-functions (test for excess errors)
UNRESOLVED: gnat.dg/sso/p6.adb -O3 -finline-functions compilation failed to produce executable

with -m64. The errors are all

raised CONSTRAINT_ERROR : erroneous memory access
[Bug fortran/85840] Memory leak in write.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85840

--- Comment #10 from Jerry DeLisle ---
(In reply to Joshua Cogliati from comment #9)
--- snip ---
> I could look into either method of fixing this if you want. (And for what
> it is worth, I do have copyright assignment paperwork from both myself and
> my employer for GCC filed)

First, thank you very much for the offer to help. Truthfully we do need more
contributors and it does take time to get familiar with the code base, and it
can be fun to work on.

This bug is my doing for sure. I did a significant re-factoring of the code
and lost track of that variable, res_len. The intent is that it does not get
modified outside of those functions that are allocating the buffers. In fact
it is getting modified by build_float_string in write_float.def. The
build_float_string is being called within the macros of get_float_string.

A variable needs to be used by those macros, but I don't think it needs to be
passed up the call chain. The simplest thing would be to give a different
variable to get_float_string.

@@ -1566,19 +1568,19 @@ write_float_0 (st_parameter_dt *dtp, const fnode *f, const char *source, int kin
   char buf_stack[BUF_STACK_SZ];
   char str_buf[BUF_STACK_SZ];
   char *buffer, *result;
-  size_t buf_size, res_len;
+  size_t buf_size, res_len, flt_str_len;
 
   /* Precision for snprintf call.  */
   int precision = get_precision (dtp, f, source, kind);
 
   /* String buffer to hold final result.  */
   result = select_string (dtp, f, str_buf, &res_len, kind);
 
   buffer = select_buffer (dtp, f, precision, buf_stack, &buf_size, kind);
 
   get_float_string (dtp, f, source , kind, 0, buffer,
-                    precision, buf_size, result, &res_len);
-  write_float_string (dtp, result, res_len);
+                    precision, buf_size, result, &flt_str_len);
+  write_float_string (dtp, result, flt_str_len);
 
   if (buf_size > BUF_STACK_SZ)
     free (buffer);

I also noticed that the buffer size for this small float is 323 which seems
crazy big, but there are other use cases where we need to hold much higher
precision numbers of digits. It seems pointless to do an alloc for this case
so I also suggest we do this:

@@ -1465,7 +1466,7 @@ write_character (st_parameter_dt *dtp, const char *source, int kind, size_t leng
 
 /* Floating point helper functions.  */
 
-#define BUF_STACK_SZ 256
+#define BUF_STACK_SZ 384

This eliminates 3 allocs in one I/O operation. The purpose of all this alloc
logic is to eliminate them. The stack size change will mask the problem
uncovered in this test case, so I have tested with and without it.

Since I pretty much have this figured out, I will go ahead and do the commit.
However, please don't hesitate to take on other bugs and join the gfortranners
club. Much appreciated.
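To make the failure mode in comment #10 concrete, here is a minimal, self-contained sketch. It is not the libgfortran code: every `demo_` name and the tiny `DEMO_STACK_SZ` are hypothetical, and it assumes (as the thread suggests but does not show) that the decision to free a buffer is made later from the recorded size. When the formatter is handed the same variable that recorded the capacity, the capacity is overwritten by a short string length and the free is skipped.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_STACK_SZ 8 /* hypothetical; tiny so the heap path is easy to hit */

/* Hand back the caller's stack buffer when it is large enough, otherwise a
   heap block.  *capacity is the only record of which path was taken.  */
static char *
demo_select_buffer (char *stack_buf, size_t need, size_t *capacity)
{
  *capacity = need;
  return need <= DEMO_STACK_SZ ? stack_buf : malloc (need);
}

/* Stand-in for a formatter that reports the produced length through *len.  */
static void
demo_format (char *dst, const char *text, size_t *len)
{
  *len = strlen (text);
  memcpy (dst, text, *len);
}

/* Returns 1 if the capacity-based check would have leaked a heap buffer,
   0 if cleanup was correct, -1 on allocation failure.  The caller must pass
   need >= strlen (text).  share_len != 0 reproduces the bug pattern by
   letting the formatter overwrite the capacity variable.  */
int
demo_write (const char *text, size_t need, int share_len)
{
  char stack_buf[DEMO_STACK_SZ];
  size_t capacity, str_len;
  char *result = demo_select_buffer (stack_buf, need, &capacity);
  if (result == NULL)
    return -1;
  int on_heap = (result != stack_buf);

  demo_format (result, text, share_len ? &capacity : &str_len);

  int freed = 0;
  if (capacity > DEMO_STACK_SZ) /* same shape as the check in the patch */
    {
      free (result);
      freed = 1;
    }

  int leaked = on_heap && !freed;
  if (leaked)
    free (result); /* keep the demonstration itself leak-free */
  return leaked;
}
```

Calling `demo_write ("hi", 64, 1)` reports the leak, while `demo_write ("hi", 64, 0)` does not; using a separate length variable is the same remedy the patch applies with `flt_str_len`.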
Re: PING^2: [PATCH] Don't mark IFUNC resolver as only called directly
On Thu, May 24, 2018 at 1:47 PM, H.J. Luwrote: > On Wed, May 23, 2018 at 8:35 AM, H.J. Lu wrote: >> On Wed, May 23, 2018 at 8:11 AM, Jan Hubicka wrote: On Wed, May 23, 2018 at 2:01 AM, Jan Hubicka wrote: >> On Tue, May 22, 2018 at 9:21 AM, Jan Hubicka wrote: >> >> > > class ipa_opt_pass_d; >> >> > > typedef ipa_opt_pass_d *ipa_opt_pass; >> >> > > @@ -2894,7 +2896,8 @@ >> >> > > cgraph_node::only_called_directly_or_aliased_p (void) >> >> > > && !DECL_STATIC_CONSTRUCTOR (decl) >> >> > > && !DECL_STATIC_DESTRUCTOR (decl) >> >> > > && !used_from_object_file_p () >> >> > > - && !externally_visible); >> >> > > + && !externally_visible >> >> > > + && !lookup_attribute ("ifunc", DECL_ATTRIBUTES >> >> > > (decl))); >> >> > >> >> > How's it handled for our own generated resolver functions? >> >> > That is, >> >> > isn't there sth cheaper than doing a lookup_attribute here? I >> >> > see >> >> > that make_dispatcher_decl nor >> >> > ix86_get_function_versions_dispatcher >> >> > adds the 'ifunc' attribute (though they are TREE_PUBLIC there). >> >> >> >> Is there any drawback of setting force_output flag? >> >> Honza >> >> >>> >> >> >>> Setting force_output may prevent some optimizations. Can we add >> >> >>> a bit >> >> >>> for IFUNC resolver? >> >> >>> >> >> >> >> >> >> Here is the patch to add ifunc_resolver to cgraph_node. Tested on >> >> >> x86-64 >> >> >> and i686. Any comments? >> >> >> >> >> > >> >> > PING: >> >> > >> >> > https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00647.html >> >> > >> >> >> >> PING. >> > OK, but please extend the verifier that ifunc_resolver flag is >> > equivalent to >> > lookup_attribute ("ifunc", DECL_ATTRIBUTES (decl)) >> > so we are sure things stays in sync. 
>> > >> >> Like this >> >> diff --git a/gcc/symtab.c b/gcc/symtab.c >> index 80f6f910c3b..954920b6dff 100644 >> --- a/gcc/symtab.c >> +++ b/gcc/symtab.c >> @@ -998,6 +998,13 @@ symtab_node::verify_base (void) >>error ("function symbol is not function"); >>error_found = true; >>} >> + else if ((lookup_attribute ("ifunc", DECL_ATTRIBUTES (decl)) >> + != NULL) >> + != dyn_cast (this)->ifunc_resolver) >> + { >> + error ("inconsistent `ifunc' attribute"); >> + error_found = true; >> + } >> } >>else if (is_a (this)) >> { >> >> >> Thanks. > Yes, thanks! > Honza I'd like to also fix it on GCC 8 branch for CET. Should I backport my patch to GCC 8 after a few days or use the simple patch for GCC 8: https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00588.html >>> >>> I would backport this one so we don't unnecesarily diverge. >>> Thanks! >>> Honza >> >> This is the backport which I will check into GCC 8 branch next week. >> > > This is the updated backport which I will check into GCC 8 branch next week. > This is the updated backport which I will check into GCC 8 branch next week. -- H.J. From 5ebddef01e810c1684ed0927c0dbb1239cf3c178 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Wed, 11 Apr 2018 12:31:21 -0700 Subject: [PATCH] Don't mark IFUNC resolver as only called directly Since IFUNC resolver is called indirectly, don't mark IFUNC resolver as only called directly. This patch adds ifunc_resolver to cgraph_node, sets ifunc_resolver for ifunc attribute and checks ifunc_resolver instead of looking up ifunc attribute. gcc/ Backport from mainline 2018-05-26 H.J. Lu PR target/85900 PR target/85345 * varasm.c (assemble_alias): Lookup ifunc attribute on error. 2018-05-24 H.J. Lu PR target/85900 PR target/85345 * varasm.c (assemble_alias): Check ifunc_resolver only on FUNCTION_DECL. 2018-05-22 H.J. Lu PR target/85345 * cgraph.h (cgraph_node::create): Set ifunc_resolver for ifunc attribute. (cgraph_node::create_alias): Likewise. 
(cgraph_node::get_availability): Check ifunc_resolver instead of looking up ifunc attribute. * cgraphunit.c (maybe_diag_incompatible_alias): Likewise. * varasm.c (do_assemble_alias): Likewise. (assemble_alias): Likewise. (default_binds_local_p_3): Likewise. * cgraph.h (cgraph_node): Add ifunc_resolver. (cgraph_node::only_called_directly_or_aliased_p): Return false for IFUNC resolver. * lto-cgraph.c (input_node): Set ifunc_resolver for ifunc attribute. * symtab.c
[Bug target/85918] Conversions to/from [unsigned] long long are not vectorized for AVX512DQ target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85918

--- Comment #2 from Jakub Jelinek ---
Created attachment 44189
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44189&action=edit
gcc9-pr85918-2.patch

WIP fix for the rest, still need to write testcases and actually test it.
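For readers who have not seen the bug report, the kind of loop under discussion looks roughly like the sketch below. The function names are made up for illustration and this is not the testcase from the PR. Compiled with something like `-O3 -mavx512dq -mavx512vl`, these are conversions that AVX512DQ has vector instructions for; whether a given GCC actually vectorizes them is exactly what the bug tracks, so no particular code generation is claimed here.

```c
#include <stddef.h>

/* unsigned 64-bit integer -> double, one element at a time.  */
void
demo_u64_to_double (double *restrict dst,
                    const unsigned long long *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (double) src[i];
}

/* double -> unsigned 64-bit integer, truncating toward zero.  Inputs must
   be non-negative and in range, otherwise the conversion is undefined.  */
void
demo_double_to_u64 (unsigned long long *restrict dst,
                    const double *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (unsigned long long) src[i];
}
```

Inspecting the output of `-fopt-info-vec` or the generated assembly is one way to see whether the loops were vectorized on a given compiler.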
RISC-V ELF multilibs
Hello, I built a riscv64-rtems5 GCC (it uses gcc/config/riscv/t-elf-multilib). The following multilibs are built: riscv64-rtems5-gcc -print-multi-lib .; rv32i/ilp32;@march=rv32i@mabi=ilp32 rv32im/ilp32;@march=rv32im@mabi=ilp32 rv32iac/ilp32;@march=rv32iac@mabi=ilp32 rv32imac/ilp32;@march=rv32imac@mabi=ilp32 rv32imafc/ilp32f;@march=rv32imafc@mabi=ilp32f rv64imac/lp64;@march=rv64imac@mabi=lp64 rv64imafdc/lp64d;@march=rv64imafdc@mabi=lp64d If I print out the builtin defines and search paths for the default settings and the -march=rv64imafdc and compare the results I get: riscv64-rtems5-gcc -E -P -v -dD empty.c > def.txt 2>&1 riscv64-rtems5-gcc -E -P -v -dD empty.c -march=rv64imafdc > rv64imafdc.txt 2>&1 diff -u def.txt rv64imafdc.txt --- def.txt 2018-05-26 14:53:26.277760090 +0200 +++ rv64imafdc.txt 2018-05-26 14:53:47.705638409 +0200 @@ -4,8 +4,8 @@ Configured with: ../gcc-7.3.0/configure --prefix=/opt/rtems/5 --bindir=/opt/rtems/5/bin --exec_prefix=/opt/rtems/5 --includedir=/opt/rtems/5/include --libdir=/opt/rtems/5/lib --libexecdir=/opt/rtems/5/libexec --mandir=/opt/rtems/5/share/man --infodir=/opt/rtems/5/share/info --datadir=/opt/rtems/5/share --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=riscv64-rtems5 --disable-libstdcxx-pch --with-gnu-as --with-gnu-ld --verbose --with-newlib --disable-nls --without-included-gettext --disable-win32-registry --enable-version-specific-runtime-libs --disable-lto --enable-newlib-io-c99-formats --enable-newlib-iconv --enable-newlib-iconv-encodings=big5,cp775,cp850,cp852,cp855,cp866,euc_jp,euc_kr,euc_tw,iso_8859_1,iso_8859_10,iso_8859_11,iso_8859_13,iso_8859_14,iso_8859_15,iso_8859_2,iso_8859_3,iso_8859_4,iso_8859_5,iso_8859_6,iso_8859_7,iso_8859_8,iso_8859_9,iso_ir_111,koi8_r,koi8_ru,koi8_u,koi8_uni,ucs_2,ucs_2_internal,ucs_2be,ucs_2le,ucs_4,ucs_4_internal,ucs_4be,ucs_4le,us_ascii,utf_16,utf_16be,utf_16le,utf_8,win_1250,win_1251,win_1252,win_1253,win_1254,win_1255,win_1256,win_1257,win_1258 --enable-threads 
--disable-plugin --enable-libgomp --enable-languages=c,c++,ada Thread model: rtems gcc version 7.3.0 20180125 (RTEMS 5, RSB a3a6c34c150a357e57769a26a460c475e188438f, Newlib 3.0.0) (GCC) -COLLECT_GCC_OPTIONS='-E' '-P' '-v' '-dD' '-march=rv64gc' '-mabi=lp64d' - /opt/rtems/5/libexec/gcc/riscv64-rtems5/7.3.0/cc1 -E -quiet -v -P -imultilib rv64imafdc/lp64d empty.c -march=rv64gc -mabi=lp64d -dD +COLLECT_GCC_OPTIONS='-E' '-P' '-v' '-dD' '-march=rv64imafdc' '-mabi=lp64d' + /opt/rtems/5/libexec/gcc/riscv64-rtems5/7.3.0/cc1 -E -quiet -v -P -imultilib rv64imafdc/lp64d empty.c -march=rv64imafdc -mabi=lp64d -dD ignoring nonexistent directory "/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/../../../../riscv64-rtems5/sys-include" #include "..." search starts here: #include <...> search starts here: @@ -338,4 +338,4 @@ #define __ELF__ 1 COMPILER_PATH=/opt/rtems/5/libexec/gcc/riscv64-rtems5/7.3.0/:/opt/rtems/5/libexec/gcc/riscv64-rtems5/7.3.0/:/opt/rtems/5/libexec/gcc/riscv64-rtems5/:/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/:/opt/rtems/5/lib/gcc/riscv64-rtems5/:/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/../../../../riscv64-rtems5/bin/ LIBRARY_PATH=/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/rv64imafdc/lp64d/:/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/../../../../riscv64-rtems5/lib/rv64imafdc/lp64d/:/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/:/opt/rtems/5/lib/gcc/riscv64-rtems5/7.3.0/../../../../riscv64-rtems5/lib/:/lib/:/usr/lib/ -COLLECT_GCC_OPTIONS='-E' '-P' '-v' '-dD' '-march=rv64gc' '-mabi=lp64d' +COLLECT_GCC_OPTIONS='-E' '-P' '-v' '-dD' '-march=rv64imafdc' '-mabi=lp64d' This looks pretty much the same and the documentation says that G == IMAFD. Why is the default multilib and a variant identical? Most variants include the C extension. Would it be possible to add -march=rv32g and -march=rv64g variants? -- Sebastian Huber, embedded brains GmbH Address : Dornierstr. 
4, D-82178 Puchheim, Germany Phone : +49 89 189 47 41-16 Fax : +49 89 189 47 41-09 This message is not a business communication within the meaning of the EHUG.
Re: Enabling -ftree-slp-vectorize on -O2/Os
* Allan Sandfeld Jensen:

> Anything else I should test or report?

Interaction with -mstackrealign on i386, where it is required for
system libraries to support applications which use the legacy ABI
without stack alignment if you compile with -msse2 or -march=x86-64
-mtune=generic (and -mfpmath=sse).
[Bug target/85900] [9 Regression] ICEs after revision r260547 on darwin.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85900

H.J. Lu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from H.J. Lu ---
Fixed.
[Bug target/85345] Missing ENDBR in IFUNC resolver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85345

--- Comment #4 from hjl at gcc dot gnu.org ---
Author: hjl
Date: Sat May 26 11:35:31 2018
New Revision: 260792

URL: https://gcc.gnu.org/viewcvs?rev=260792&root=gcc&view=rev
Log:
Don't check ifunc_resolver on error

Since ifunc_resolver isn't set when an error is detected, we should
lookup ifunc attribute in this case.

	PR target/85900
	PR target/85345
	* varasm.c (assemble_alias): Lookup ifunc attribute on error.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/varasm.c
[Bug target/85900] [9 Regression] ICEs after revision r260547 on darwin.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85900

--- Comment #7 from hjl at gcc dot gnu.org ---
Author: hjl
Date: Sat May 26 11:35:31 2018
New Revision: 260792

URL: https://gcc.gnu.org/viewcvs?rev=260792&root=gcc&view=rev
Log:
Don't check ifunc_resolver on error

Since ifunc_resolver isn't set when an error is detected, we should
lookup ifunc attribute in this case.

	PR target/85900
	PR target/85345
	* varasm.c (assemble_alias): Lookup ifunc attribute on error.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/varasm.c
Re: [PATCH] Check ifunc_resolver only on FUNCTION_DECL
On Fri, May 25, 2018 at 4:48 AM, H.J. Lu wrote:
> On Thu, May 24, 2018 at 04:43:25AM -0700, H.J. Lu wrote:
>> Since ifunc_resolver is only valid on FUNCTION_DECL, check ifunc_resolver
>> only on FUNCTION_DECL.
>>
>> Please test it on Darwin.
>>
>>
>> H.J.
>> ---
>>       PR target/85900
>>       PR target/85345
>>       * varasm.c (assemble_alias): Check ifunc_resolver only on
>>       FUNCTION_DECL.
>> ---
>>  gcc/varasm.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/varasm.c b/gcc/varasm.c
>> index 3bd9cbb69f0..bff43450a91 100644
>> --- a/gcc/varasm.c
>> +++ b/gcc/varasm.c
>> @@ -5917,7 +5917,8 @@ assemble_alias (tree decl, tree target)
>>  # else
>>        if (!DECL_WEAK (decl))
>>         {
>> -         if (cgraph_node::get (decl)->ifunc_resolver)
>> +         if (TREE_CODE (decl) == FUNCTION_DECL
>> +             && cgraph_node::get (decl)->ifunc_resolver)
>>             error_at (DECL_SOURCE_LOCATION (decl),
>>                       "ifunc is not supported in this configuration");
>>           else
>> --
>
> Please test it on Darwin.
>
> H.J.
> ---
> Since ifunc_resolver isn't set when an error is detected, we should
> lookup ifunc attribute in this case.
>
>       PR target/85900
>       PR target/85345
>       * varasm.c (assemble_alias): Lookup ifunc attribute on error.
> ---
>  gcc/varasm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index 6b9f87b203f..4d332f50270 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -5917,8 +5917,9 @@ assemble_alias (tree decl, tree target)
>  # else
>        if (!DECL_WEAK (decl))
>         {
> +         /* NB: ifunc_resolver isn't set when an error is detected.  */
>           if (TREE_CODE (decl) == FUNCTION_DECL
> -             && cgraph_node::get (decl)->ifunc_resolver)
> +             && lookup_attribute ("ifunc", DECL_ATTRIBUTES (decl)))
>             error_at (DECL_SOURCE_LOCATION (decl),
>                       "ifunc is not supported in this configuration");
>           else
> --
> 2.17.0
>

Dominique verified that it fixed all Darwin issues. I am checking it in.

--
H.J.
Re: [PATCH] Rename ufloat to floatuns and ufix_trunc to fixuns_trunc in a few patterns (PR target/85918)
On Fri, May 25, 2018 at 11:09 PM, Jakub Jelinekwrote: > Hi! > > The optab is looking for floatuns2 and > fixuns_trunc2, but some of the patterns are instead called > ufloat2 or ufix_trunc2 > and thus are only used from intrinsics. > > We can't change all spots, in two spots we have intentionally an > floatuns2 or fixuns_trunc2 expander that > uses for AVX512+ a ufloat*/ufix* insn and in other cases something > different, but for the cases I've changed we just give up before AVX512DQ. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2018-05-25 Jakub Jelinek > > PR target/85918 > * config/i386/i386.md (fixunssuffix, floatunssuffix): New code > attributes. > * config/i386/sse.md > > (float2): > Rename to ... > > (float2): > ... this. > > (float2): > Rename to ... > > (float2): > ... this. > (*floatv2div2sf2): Rename to ... > (*floatv2div2sf2): ... this. > (floatv2div2sf2_mask): Rename to ... > (floatv2div2sf2_mask): ... this. > (*floatv2div2sf2_mask_1): Rename to ... > (*floatv2div2sf2_mask_1): ... this. > (fix_truncv8dfv8si2): Rename > to ... > (fix_truncv8dfv8si2): > ... this. > > (fix_trunc2): > Rename to ... > > (fix_trunc2): > ... this. > > (fix_trunc2): > Rename to ... > > (fix_trunc2): > ... this. > (fix_truncv2sfv2di2): Rename to ... > (fix_truncv2sfv2di2): ... this. > (vec_pack_ufix_trunc_): Use gen_fixuns_truncv8dfv8si2 instead of > gen_ufix_truncv8dfv8si2. > * config/i386/i386-builtin.def (__builtin_ia32_cvttpd2uqq256_mask, > __builtin_ia32_cvttpd2uqq128_mask, __builtin_ia32_cvttps2uqq256_mask, > __builtin_ia32_cvttps2uqq128_mask, __builtin_ia32_cvtuqq2ps256_mask, > __builtin_ia32_cvtuqq2ps128_mask, __builtin_ia32_cvtuqq2pd256_mask, > __builtin_ia32_cvtuqq2pd128_mask, __builtin_ia32_cvttpd2udq512_mask, > __builtin_ia32_cvtuqq2ps512_mask, __builtin_ia32_cvtuqq2pd512_mask, > __builtin_ia32_cvttps2uqq512_mask, __builtin_ia32_cvttpd2uqq512_mask): > Use fixuns instead ufix or floatuns instead ufloat in CODE_FOR_ names. 
> > * gcc.target/i386/avx512dq-pr85918.c: New test. OK. Thanks, Uros. > --- gcc/config/i386/i386.md.jj 2018-05-25 14:34:52.339390522 +0200 > +++ gcc/config/i386/i386.md 2018-05-25 20:41:43.913430614 +0200 > @@ -981,10 +981,12 @@ (define_code_attr trunsuffix [(ss_trunca > ;; Used in signed and unsigned fix. > (define_code_iterator any_fix [fix unsigned_fix]) > (define_code_attr fixsuffix [(fix "") (unsigned_fix "u")]) > +(define_code_attr fixunssuffix [(fix "") (unsigned_fix "uns")]) > > ;; Used in signed and unsigned float. > (define_code_iterator any_float [float unsigned_float]) > (define_code_attr floatsuffix [(float "") (unsigned_float "u")]) > +(define_code_attr floatunssuffix [(float "") (unsigned_float "uns")]) > > ;; All integer modes. > (define_mode_iterator SWI1248x [QI HI SI DI]) > --- gcc/config/i386/sse.md.jj 2018-05-25 14:35:23.122416638 +0200 > +++ gcc/config/i386/sse.md 2018-05-25 20:21:41.939050655 +0200 > @@ -4853,7 +4853,7 @@ (define_insn "float (set_attr "prefix" "maybe_vex") > (set_attr "mode" "")]) > > -(define_insn > "float2" > +(define_insn > "float2" >[(set (match_operand:VF2_AVX512VL 0 "register_operand" "=v") > (any_float:VF2_AVX512VL > (match_operand: 1 "nonimmediate_operand" > "")))] > @@ -4863,7 +4863,7 @@ (define_insn "float (set_attr "prefix" "evex") > (set_attr "mode" "")]) > > -;; For float insn patterns > +;; For float insn patterns > (define_mode_attr qq2pssuff >[(V8SF "") (V4SF "{y}")]) > > @@ -4877,7 +4877,7 @@ (define_mode_attr sseintvecmode3 >[(V8SF "XI") (V4SF "OI") > (V8DF "OI") (V4DF "TI")]) > > -(define_insn > "float2" > +(define_insn > "float2" >[(set (match_operand:VF1_128_256VL 0 "register_operand" "=v") > (any_float:VF1_128_256VL >(match_operand: 1 "nonimmediate_operand" > "")))] > @@ -4887,7 +4887,7 @@ (define_insn "float (set_attr "prefix" "evex") > (set_attr "mode" "")]) > > -(define_insn "*floatv2div2sf2" > +(define_insn "*floatv2div2sf2" >[(set (match_operand:V4SF 0 "register_operand" "=v") > (vec_concat:V4SF > 
(any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" > "vm")) > @@ -4898,7 +4898,7 @@ (define_insn "*floatv2div2s > (set_attr "prefix" "evex") > (set_attr "mode" "V4SF")]) > > -(define_insn "floatv2div2sf2_mask" > +(define_insn "floatv2div2sf2_mask" >[(set (match_operand:V4SF 0 "register_operand" "=v") > (vec_concat:V4SF > (vec_merge:V2SF > @@ -4914,7 +4914,7 @@ (define_insn "floatv2div2sf > (set_attr "prefix" "evex") > (set_attr "mode"
Re: Enabling -ftree-slp-vectorize on -O2/Os
On May 26, 2018 11:32:29 AM GMT+02:00, Allan Sandfeld Jensen wrote:
>I brought this subject up earlier, and was told to suggest it again for
>gcc 9, so I have attached the preliminary changes.
>
>My studies have shown that with generic x86-64 optimization it reduces
>binary size by around 0.5%, and when optimizing for x64 targets with
>SSE4 or better, it reduces binary size by 2-3% on average. The
>performance changes are negligible however*, and I haven't been able to
>detect changes in compile time big enough to penetrate general noise on
>my platform, but perhaps someone has a better setup for that?
>
>* I believe that is because it currently works best on non-optimized
>code, it is better at big basic blocks doing all kinds of things than
>tightly written inner loops.
>
>Anything else I should test or report?

If you have access to SPEC CPU I'd like to see performance, size and
compile-time effects of the patch on that. Embedded folks may want to run
their favorite benchmark and report results as well.

Richard.

>Best regards
>'Allan
>
>
>diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>index beba295bef5..05851229354 100644
>--- a/gcc/doc/invoke.texi
>+++ b/gcc/doc/invoke.texi
>@@ -7612,6 +7612,7 @@ also turns on the following optimization flags:
> -fstore-merging @gol
> -fstrict-aliasing @gol
> -ftree-builtin-call-dce @gol
>+-ftree-slp-vectorize @gol
> -ftree-switch-conversion -ftree-tail-merge @gol
> -fcode-hoisting @gol
> -ftree-pre @gol
>@@ -7635,7 +7636,6 @@ by @option{-O2} and also turns on the following optimization flags:
> -floop-interchange @gol
> -floop-unroll-and-jam @gol
> -fsplit-paths @gol
>--ftree-slp-vectorize @gol
> -fvect-cost-model @gol
> -ftree-partial-pre @gol
> -fpeel-loops @gol
>@@ -8932,7 +8932,7 @@ Perform loop vectorization on trees. This flag is enabled by default at
> @item -ftree-slp-vectorize
> @opindex ftree-slp-vectorize
> Perform basic block vectorization on trees. This flag is enabled by default at
>-@option{-O3} and when @option{-ftree-vectorize} is enabled.
>+@option{-O2} or higher, and when @option{-ftree-vectorize} is enabled.
>
> @item -fvect-cost-model=@var{model}
> @opindex fvect-cost-model
>diff --git a/gcc/opts.c b/gcc/opts.c
>index 33efcc0d6e7..11027b847e8 100644
>--- a/gcc/opts.c
>+++ b/gcc/opts.c
>@@ -523,6 +523,7 @@ static const struct default_options default_options_table[] =
>     { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 },
>     { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 },
>     { OPT_LEVELS_2_PLUS, OPT_fstore_merging, NULL, 1 },
>+    { OPT_LEVELS_2_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
>
>     /* -O3 optimizations.  */
>     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
>@@ -539,7 +540,6 @@ static const struct default_options default_options_table[] =
>     { OPT_LEVELS_3_PLUS, OPT_floop_unroll_and_jam, NULL, 1 },
>     { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
>     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
>-    { OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
>     { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
>     { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
>     { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
[Bug target/85900] [9 Regression] ICEs after revision r260547 on darwin.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85900

--- Comment #6 from Dominique d'Humieres ---
> This patch fixes the ICE and related problems I have spotted. Full testing
> in progress.

Back to "normal"!
Re: [Aarch64] Vector Function Application Binary Interface Specification for OpenMP
Steve Ellcey writes:
> On Wed, 2018-05-16 at 22:11 +0100, Richard Sandiford wrote:
>>
>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED is the only current way
>> of saying that an rtl instruction preserves the low part of a
>> register but clobbers the high part. We would need something like
>> Alan H's CLOBBER_HIGH patches to do it using explicit clobbers.
>>
>> Another approach would be to piggy-back on the -fipa-ra
>> infrastructure and record that vector PCS functions only clobber
>> Q0-Q7. If -fipa-ra knows that a function doesn't clobber Q8-Q15
>> then that should override TARGET_HARD_REGNO_CALL_PART_CLOBBERED.
>> (I'm not sure whether it does in practice, but it should :-)
>> And if it doesn't that's a bug that's worth fixing for its own sake.)
>>
>> Thanks,
>> Richard
>
> Alan,
>
> I have been looking at your CLOBBER_HIGH patches to see if they
> might be helpful in implementing the ARM SIMD Vector ABI in GCC.
> I have also been looking at the -fipa-ra flag and how it works.
>
> I was wondering if you considered using the ipa-ra infrastructure
> for the SVE work that you are currently trying to support with
> the CLOBBER_HIGH macro?
>
> My current thought for the ABI work is to mark all the floating
> point / vector registers as caller saved (the lower half of V8-V15
> are currently callee saved) and remove
> TARGET_HARD_REGNO_CALL_PART_CLOBBERED.
> This should work but would be inefficient.
>
> The next step would be to split get_call_reg_set_usage up into
> two functions so that I don't have to pass in a default set of
> registers. One function would return call_used_reg_set by
> default (but could return a smaller set if it had actual used
> register information) and the other would return
> regs_invalidated_by_call by default (but could also return a
> smaller set).
>
> Next I would add a 'largest mode used' array to call_cgraph_rtl_info
> structure in addition to the current function_used_regs register
> set.
>
> Then I could turn the get_call_reg_set_usage replacement functions
> into target specific functions and with the information in the
> call_cgraph_rtl_info structure and any simd attribute information on
> a function I could modify what registers are really being used/invalidated
> without being saved.
>
> If the called function only uses the bottom half of a register it would not
> be marked as used/invalidated. If it uses the entire register and the
> function is not marked as simd, then the register would be marked as
> used/invalidated. If the function was marked as simd the register would not
> be marked because a simd function would save both the upper and lower halves
> of a callee saved register (whereas a non simd function would only save the
> lower half).
>
> Does this sound like something that could be used in place of your
> CLOBBER_HIGH patch?

One of the advantages of CLOBBER_HIGH is that it can be attached to
arbitrary instructions, not just calls. The motivating example was
tlsdesc_small_, which isn't treated as a call but as a normal
instruction. (And I don't think we want to change that, since it's much
easier for rtl optimisers to deal with normal instructions compared to
calls. In general a call is part of a longer sequence of instructions
that includes setting up arguments, etc.)

The other use case (not implemented in the posted patches) would be
to represent the effect of syscalls, which clobber the "SVE part"
of all vector registers. In that case the clobber would need to be
attached to an inline asm insn.

On the wider point about changing the way call clobber information
is represented: I agree it would be good to generalise what we have
now. But if possible I think we should avoid target hooks that take
a specific call, and instead make it an inherent part of the call insn
itself, much like CALL_INSN_FUNCTION_USAGE is now. E.g. we could add
a field that points to an ABI description, with -fipa-ra effectively
creating ad-hoc ABIs. That ABI description could start out with
whatever we think is relevant now and could grow over time.

Thanks,
Richard
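The three-way rule in the quoted proposal (for a register whose low half is callee-saved under the base procedure call standard) can be written down as a small predicate. This is only a model of the reasoning in the message, not GCC code: the enum, the function name and the parameters are all hypothetical, and it deliberately ignores registers that are fully caller-saved.

```c
#include <stdbool.h>

/* Widest way the callee touches the register (hypothetical encoding).  */
enum demo_reg_use
{
  DEMO_REG_UNUSED,
  DEMO_REG_LOW_HALF,
  DEMO_REG_FULL
};

/* Return true if the caller must treat the register as invalidated across
   the call.  Assumption from the message: a non-simd callee saves only the
   low half of a callee-saved vector register, while a simd callee saves
   both halves.  */
bool
demo_call_invalidates_reg (enum demo_reg_use widest_use, bool callee_is_simd)
{
  if (widest_use != DEMO_REG_FULL)
    return false;          /* low half (or nothing) touched: it is preserved */
  return !callee_is_simd;  /* full width: only a simd callee preserves it */
}
```

Richard's objection still applies to any such per-call predicate: it describes calls only, whereas CLOBBER_HIGH can be attached to ordinary instructions.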
Re: Why is REG_ALLOC_ORDER not defined on Aarch64
Andrew Pinski writes:
> On Fri, May 25, 2018 at 3:35 PM, Steve Ellcey wrote:
>> I was curious if there was any reason that REG_ALLOC_ORDER is not
>> defined for Aarch64. Has anyone tried this to see if it could help
>> performance? It is defined for many other platforms.
>
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01815.html
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01822.html

It looks like the immediate reason for reverting was the effect of
listing the argument registers in reverse order. I wonder how much
that actually helps with IRA and LRA? They track per-register costs,
and would be able to increase the cost of a pseudo that conflicts
with a hard-register call argument.

It just felt like it might have been a "best practice" idea passed
down from the old local.c and global.c days.

Thanks,
Richard
Enabling -ftree-slp-vectorize on -O2/Os
I brought this subject up earlier, and was told to suggest it again for
gcc 9, so I have attached the preliminary changes.

My studies have shown that with generic x86-64 optimization it reduces
binary size by around 0.5%, and when optimizing for x64 targets with
SSE4 or better, it reduces binary size by 2-3% on average.  The
performance changes are negligible however*, and I haven't been able to
detect changes in compile time big enough to penetrate general noise on
my platform, but perhaps someone has a better setup for that?

* I believe that is because it currently works best on non-optimized
code; it is better at big basic blocks doing all kinds of things than at
tightly written inner loops.

Anything else I should test or report?

Best regards
'Allan

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index beba295bef5..05851229354 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7612,6 +7612,7 @@ also turns on the following optimization flags:
 -fstore-merging @gol
 -fstrict-aliasing @gol
 -ftree-builtin-call-dce @gol
+-ftree-slp-vectorize @gol
 -ftree-switch-conversion -ftree-tail-merge @gol
 -fcode-hoisting @gol
 -ftree-pre @gol
@@ -7635,7 +7636,6 @@ by @option{-O2} and also turns on the following optimization flags:
 -floop-interchange @gol
 -floop-unroll-and-jam @gol
 -fsplit-paths @gol
--ftree-slp-vectorize @gol
 -fvect-cost-model @gol
 -ftree-partial-pre @gol
 -fpeel-loops @gol
@@ -8932,7 +8932,7 @@ Perform loop vectorization on trees. This flag is enabled by default at
 @item -ftree-slp-vectorize
 @opindex ftree-slp-vectorize
 Perform basic block vectorization on trees. This flag is enabled by default at
-@option{-O3} and when @option{-ftree-vectorize} is enabled.
+@option{-O2} or higher, and when @option{-ftree-vectorize} is enabled.
 
 @item -fvect-cost-model=@var{model}
 @opindex fvect-cost-model
diff --git a/gcc/opts.c b/gcc/opts.c
index 33efcc0d6e7..11027b847e8 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -523,6 +523,7 @@ static const struct default_options default_options_table[] =
     { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_fstore_merging, NULL, 1 },
+    { OPT_LEVELS_2_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
 
     /* -O3 optimizations.  */
     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
@@ -539,7 +540,6 @@ static const struct default_options default_options_table[] =
     { OPT_LEVELS_3_PLUS, OPT_floop_unroll_and_jam, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
-    { OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
     { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
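For readers unfamiliar with basic-block (SLP) vectorization, here is a minimal sketch of the kind of straight-line code it targets: several isomorphic scalar statements on adjacent memory, with no loop involved. The function name `add4` is made up for this illustration, and whether GCC really merges the four additions into one vector operation depends on the target, the options, and the cost model.

```c
#include <assert.h>
#include <stddef.h>

/* Four independent, identically shaped statements on adjacent
   elements.  There is no loop here, so the loop vectorizer has
   nothing to do; this is a candidate for SLP vectorization, which
   may combine the four scalar adds into a single vector add.  */
void add4(int *restrict dst, const int *restrict a, const int *restrict b)
{
    assert(dst != NULL && a != NULL && b != NULL);

    dst[0] = a[0] + b[0];
    dst[1] = a[1] + b[1];
    dst[2] = a[2] + b[2];
    dst[3] = a[3] + b[3];
}
```

One way to see the effect of the proposed default is to compile such a file with `-O2 -S` once with `-ftree-slp-vectorize` and once with `-fno-tree-slp-vectorize`, then compare the generated assembly and object size.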
Re: [PING] [PATCH] Avoid excessive function type casts with splay-trees
On 05/17/18 16:37, Bernd Edlinger wrote:
> On 05/17/18 15:39, Richard Biener wrote:
>> On Thu, May 17, 2018 at 3:21 PM Bernd Edlinger wrote:
>>
>>> Ping...
>>
>> So this makes all traditional users go through the indirect
>> splay_tree_compare_wrapper and friends (which is also exported for
>> no good reason?).  And all users are traditional at the moment.
>>
>
> all except gcc/typed-splay-tree.h which only works if VALUE_TYPE is
> compatible with uint_ptr_t but cannot check this requirement.
> This one worried me the most.
>
> But not having to rewrite omp-low.c for instance where splay_tree_lookup
> and access to n->value are made all the time, made me think it
> will not work to rip out the old interface completely.
>

Well, I think it will be best to split this patch in two parts:

One that adds just two utility functions for avoiding undefined
function type casts which can be used with the original C interface.
This first part is attached.

And another part that uses a similar approach as the splay-tree in
libgomp, but instead of creating a type-safe C interface it should
translate the complete code from splay-tree.c/.h into a template.
The second part, I plan to do at a later time.

Is this OK for trunk?

Thanks
Bernd.

include:
2018-05-26  Bernd Edlinger

	* splay-tree.h (splay_tree_compare_strings,
	splay_tree_delete_pointers): Declare new utility functions.

libiberty:
2018-05-26  Bernd Edlinger

	* splay-tree.c (splay_tree_compare_strings,
	splay_tree_delete_pointers): New utility functions.

gcc:
2018-05-26  Bernd Edlinger

	* tree-dump.c (dump_node): Use splay_tree_delete_pointers.

c-family:
2018-05-26  Bernd Edlinger

	* c-lex.c (get_fileinfo): Use splay_tree_compare_strings and
	splay_tree_delete_pointers.

cp:
2018-05-26  Bernd Edlinger

	* decl2.c (start_static_storage_duration_function): Use
	splay_tree_delete_pointers.
Index: gcc/c-family/c-lex.c
===
--- gcc/c-family/c-lex.c	(revision 260671)
+++ gcc/c-family/c-lex.c	(working copy)
@@ -103,11 +103,9 @@ get_fileinfo (const char *name)
   struct c_fileinfo *fi;
 
   if (!file_info_tree)
-    file_info_tree = splay_tree_new ((splay_tree_compare_fn)
-                                     (void (*) (void)) strcmp,
+    file_info_tree = splay_tree_new (splay_tree_compare_strings,
                                      0,
-                                     (splay_tree_delete_value_fn)
-                                     (void (*) (void)) free);
+                                     splay_tree_delete_pointers);
 
   n = splay_tree_lookup (file_info_tree, (splay_tree_key) name);
   if (n)
Index: gcc/cp/decl2.c
===
--- gcc/cp/decl2.c	(revision 260671)
+++ gcc/cp/decl2.c	(working copy)
@@ -3595,8 +3595,7 @@ start_static_storage_duration_function (unsigned c
       priority_info_map = splay_tree_new (splay_tree_compare_ints,
                                           /*delete_key_fn=*/0,
                                           /*delete_value_fn=*/
-                                          (splay_tree_delete_value_fn)
-                                          (void (*) (void)) free);
+                                          splay_tree_delete_pointers);
 
       /* We always need to generate functions for the
          DEFAULT_INIT_PRIORITY so enter it now.  That way when we walk
Index: gcc/tree-dump.c
===
--- gcc/tree-dump.c	(revision 260671)
+++ gcc/tree-dump.c	(working copy)
@@ -736,8 +736,7 @@ dump_node (const_tree t, dump_flags_t flags, FILE
   di.flags = flags;
   di.node = t;
   di.nodes = splay_tree_new (splay_tree_compare_pointers, 0,
-                             (splay_tree_delete_value_fn)
-                             (void (*) (void)) free);
+                             splay_tree_delete_pointers);
 
   /* Queue up the first node.  */
   queue (&di, t, DUMP_NONE);
Index: include/splay-tree.h
===
--- include/splay-tree.h	(revision 260671)
+++ include/splay-tree.h	(working copy)
@@ -147,7 +147,9 @@ extern splay_tree_node splay_tree_max (splay_tree)
 extern splay_tree_node splay_tree_min (splay_tree);
 extern int splay_tree_foreach (splay_tree, splay_tree_foreach_fn, void*);
 extern int splay_tree_compare_ints (splay_tree_key, splay_tree_key);
-extern int splay_tree_compare_pointers (splay_tree_key, splay_tree_key);
+extern int splay_tree_compare_pointers (splay_tree_key, splay_tree_key);
+extern int splay_tree_compare_strings (splay_tree_key, splay_tree_key);
+extern void splay_tree_delete_pointers (splay_tree_value);
 
 #ifdef __cplusplus
 }
Index: libiberty/splay-tree.c
===
--- libiberty/splay-tree.c	(revision 260671)
+++ libiberty/splay-tree.c	(working copy)
@@ -31,6 +31,9 @@ Boston, MA 02110-1301, USA.  */
 #ifdef HAVE_STDLIB_H
 #include <stdlib.h>
 #endif
+#ifdef HAVE_STRING_H
+#include <string.h>
+#endif
 
 #include <stdio.h>
 
@@ -590,3 +593,19 @@
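The body of the last hunk is cut off in this archive, so the actual definitions of the two new helpers are not shown. As a rough sketch of the idea described in the mail (callbacks that already have the exact signature the tree expects, so no function-pointer cast is needed), something along the following lines would work. The `demo_` names and typedefs are hypothetical stand-ins, not the libiberty declarations, and the key/value types are assumed to be pointer-sized integers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the key and value types; assumed to be
   integers wide enough to hold a pointer.  */
typedef uintptr_t demo_key;
typedef uintptr_t demo_value;

/* Comparison callback with the signature the tree expects.  The keys
   are converted back to strings inside the function, instead of
   casting strcmp itself to an incompatible function type (calling
   through such a cast is undefined behavior).  */
int demo_compare_strings(demo_key k1, demo_key k2)
{
    assert(k1 != 0 && k2 != 0);
    return strcmp((const char *) k1, (const char *) k2);
}

/* Deallocation callback for values that hold a malloc'd pointer.
   Again the conversion happens inside, so free is never cast.  */
void demo_delete_pointers(demo_value value)
{
    free((void *) value);
}
```

The design point is that the conversion between integer and pointer moves into a small wrapper with the right prototype, so callers can pass the wrapper directly where they previously passed a doubly cast `strcmp` or `free`.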
[Bug bootstrap/85921] /gcc/c-family/c-warn.c fails to build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85921

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |FIXED

--- Comment #19 from Jakub Jelinek ---
Worked around in 8.1+.  That doesn't mean the headers you are using
aren't seriously broken and will break other stuff.
[Bug bootstrap/85921] /gcc/c-family/c-warn.c fails to build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85921

--- Comment #18 from Jakub Jelinek ---
Author: jakub
Date: Sat May 26 06:56:41 2018
New Revision: 260791

URL: https://gcc.gnu.org/viewcvs?rev=260791&root=gcc&view=rev
Log:
	PR bootstrap/85921
	* c-warn.c (diagnose_mismatched_attributes): Remove unnecessary
	noinline variable to workaround broken kernel headers.

Modified:
    branches/gcc-8-branch/gcc/c-family/ChangeLog
    branches/gcc-8-branch/gcc/c-family/c-warn.c
[Bug bootstrap/85921] /gcc/c-family/c-warn.c fails to build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85921

--- Comment #17 from Jakub Jelinek ---
Author: jakub
Date: Sat May 26 06:40:50 2018
New Revision: 260790

URL: https://gcc.gnu.org/viewcvs?rev=260790&root=gcc&view=rev
Log:
	PR bootstrap/85921
	* c-warn.c (diagnose_mismatched_attributes): Remove unnecessary
	noinline variable to workaround broken kernel headers.

Modified:
    trunk/gcc/c-family/ChangeLog
    trunk/gcc/c-family/c-warn.c
[Bug c++/85936] New: GCC incorrectly implements [expr.prim.lambda.capture]/10.2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85936

            Bug ID: 85936
           Summary: GCC incorrectly implements
                    [expr.prim.lambda.capture]/10.2
           Product: gcc
           Version: 8.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lebedev.ri at gmail dot com
  Target Milestone: ---

https://godbolt.org/g/qX6k2H

# 0 "" 3
template <typename a> void b(int, a);
template <typename a, typename c> void b(int, a &&e, c) {
  b(0, [=] { e; });
}
void d() { b(0, d, 0); }

gcc complains:

: error: 'void b(int, a) [with a = b(int, a&&, c) [with a = void (&)(); c = int]::<lambda()>]', declared using local type 'b(int, a&&, c) [with a = void (&)(); c = int]::<lambda()>', is used but never defined [-fpermissive]

That is incorrect; the code is perfectly valid.
Re: [PATCH] PR target/85358 patch v2: Add target hook to prevent default widening
On May 25, 2018 8:49:47 PM GMT+02:00, Michael Meissner wrote:
>I redid the patch to make the target hook only apply for scalar float
>points, and I removed all of the integer only subcases.
>
>I have checked this on a little endian Power8 system, and verified that
>it bootstraps correctly and there are no regressions.  I have just
>started an x86_64 build.  Assuming that build has no regressions, can I
>check this into GCC 9?  This bug appears in GCC 8, and I would like to
>back port this patch to GCC 8 as well before GCC 8.2 goes out.

What happens if you hack genmodes to not claim IFmode has any wider
relationship with other modes?

Richard.

>[gcc]
>2018-05-25  Michael Meissner
>
>	PR target/85358
>	* target.def (default_fp_widening_p): New target hook to automatic
>	widening between two floating point modes.
>	* optabs.c (expand_binop): Do not automatically widen a binary or
>	unary scalar floating point op if the backend says that the
>	widening should not occur.
>	(expand_twoval_unop): Likewise.
>	(expand_twoval_binop): Likewise.
>	(expand_unop): Likewise.
>	* config/rs6000/rs6000.c (TARGET_DEFAULT_FP_WIDENING_P): Define.
>	(rs6000_default_fp_widening_p): New target hook to prevent
>	automatic widening between IEEE 128-bit floating point and IBM
>	extended double floating point.
>	* doc/tm.texi (Target Hooks): Document new target hook
>	default_fp_widening_p.
>	* doc/tm.texi.in (Target Hooks): Likewise.
>
>[gcc/testsuite]
>2018-05-25  Michael Meissner
>
>	PR target/85358
>	* gcc.target/powerpc/pr85358.c: New test.
Re: PR80155: Code hoisting and register pressure
On May 25, 2018 9:25:51 PM GMT+02:00, Jeff Law wrote:
> On 05/25/2018 11:54 AM, Richard Biener wrote:
>> On May 25, 2018 6:57:13 PM GMT+02:00, Jeff Law wrote:
>>> On 05/25/2018 03:49 AM, Bin.Cheng wrote:
>>>> On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni wrote:
>>>>> On 23 May 2018 at 18:37, Jeff Law wrote:
>>>>>> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote:
>>>>>>> On 23 May 2018 at 13:58, Richard Biener wrote:
>>>>>>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> I am trying to work on PR80155, which exposes a problem with code
>>>>>>>>> hoisting and register pressure on a leading embedded benchmark for ARM
>>>>>>>>> cortex-m7, where code-hoisting causes an extra register spill.
>>>>>>>>>
>>>>>>>>> I have attached two test-cases which (hopefully) are representative of
>>>>>>>>> the original test-case.
>>>>>>>>> The first one (trans_dfa.c) is bigger and somewhat similar to the
>>>>>>>>> original test-case and trans_dfa_2.c is hand-reduced version of
>>>>>>>>> trans_dfa.c. There's 2 spills caused with trans_dfa.c
>>>>>>>>> and one spill with trans_dfa_2.c due to lesser amount of cases.
>>>>>>>>> The test-cases in the PR are probably not relevant.
>>>>>>>>>
>>>>>>>>> Initially I thought the spill was happening because of "too many
>>>>>>>>> hoistings" taking place in original test-case thus increasing the
>>>>>>>>> register pressure, but it seems the spill is possibly caused because
>>>>>>>>> expression gets hoisted out of a block that is on loop exit.
>>>>>>>>>
>>>>>>>>> For example, the following hoistings take place with trans_dfa_2.c:
>>>>>>>>>
>>>>>>>>> (1) Inserting expression in block 4 for code hoisting:
>>>>>>>>> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005)
>>>>>>>>>
>>>>>>>>> (2) Inserting expression in block 4 for code hoisting:
>>>>>>>>> {plus_expr,_4,1} (0006)
>>>>>>>>>
>>>>>>>>> (3) Inserting expression in block 4 for code hoisting:
>>>>>>>>> {pointer_plus_expr,s_33,1} (0023)
>>>>>>>>>
>>>>>>>>> (4) Inserting expression in block 3 for code hoisting:
>>>>>>>>> {pointer_plus_expr,s_33,1} (0023)
>>>>>>>>>
>>>>>>>>> The issue seems to be hoisting of (*tab + 1) which consists of first
>>>>>>>>> two hoistings in block 4
>>>>>>>>> from blocks 5 and 9, which causes the extra spill. I verified that by
>>>>>>>>> disabling hoisting into block 4,
>>>>>>>>> which resulted in no extra spills.
>>>>>>>>>
>>>>>>>>> I wonder if that's because the expression (*tab + 1) is getting
>>>>>>>>> hoisted from blocks 5 and 9,
>>>>>>>>> which are on loop exit ? So the expression that was previously
>>>>>>>>> computed in a block on loop exit, gets hoisted outside that block
>>>>>>>>> which possibly makes the allocator more defensive ? Similarly
>>>>>>>>> disabling hoisting of expressions which appeared in blocks on loop
>>>>>>>>> exit in original test-case prevented the extra spill. The other
>>>>>>>>> hoistings didn't seem to matter.
>>>>>>>>
>>>>>>>> I think that's simply co-incidence.  The only thing that makes
>>>>>>>> a block that also exits from the loop special is that an
>>>>>>>> expression could be sunk out of the loop and hoisting (commoning
>>>>>>>> with another path) could prevent that.  But that isn't what is
>>>>>>>> happening here and it would be a pass ordering issue as
>>>>>>>> the sinking pass runs only after hoisting (no idea why exactly
>>>>>>>> but I guess there are cases where we want to prefer CSE over
>>>>>>>> sinking).  So you could try if re-ordering PRE and sinking helps
>>>>>>>> your testcase.
>>>>>>> Thanks for the suggestions. Placing sink pass before PRE works
>>>>>>> for both these test-cases! Sadly it still causes the spill for the
>>>>>>> benchmark -:(
>>>>>>> I will try to create a better approximation of the original
>>>>>>> test-case.
>>>>>>>>
>>>>>>>> What I do see is a missed opportunity to merge the successors
>>>>>>>> of BB 4.  After PRE we have
>>>>>>>>
>>>>>>>>  [local count: 159303558]:
>>>>>>>> :
>>>>>>>> pretmp_123 = *tab_37(D);
>>>>>>>> _87 = pretmp_123 + 1;
>>>>>>>> if (c_36 == 65)
>>>>>>>>   goto ; [34.00%]
>>>>>>>> else
>>>>>>>>   goto ; [66.00%]
>>>>>>>>
>>>>>>>>  [local count: 54163210]:
>>>>>>>> *tab_37(D) = _87;
>>>>>>>> _96 = MEM[(char *)s_57 + 1B];
>>>>>>>> if (_96 != 0)
>>>>>>>>   goto ; [89.00%]
>>>>>>>> else
>>>>>>>>   goto ; [11.00%]
>>>>>>>>
>>>>>>>>  [local count: 105140348]:
>>>>>>>> *tab_37(D) = _87;
>>>>>>>> _56 = MEM[(char *)s_57 + 1B];
>>>>>>>> if (_56 != 0)
>>>>>>>>   goto ; [89.00%]
>>>>>>>> else
>>>>>>>>   goto ; [11.00%]
>>>>>>>>
>>>>>>>> here at least the stores and loads can be hoisted.  Note this
>>>>>>>> may also point at the real issue of the code hoisting which is
>>>>>>>> tearing apart the RMW operation?
>>>>>>> Indeed, this possibility seems much more likely than block being
>>>>>>> on loop exit.
>>>>>>> I will try to "hardcode" the load/store hoists into
Re: [PATCH] Remove useless noinline variable (PR bootstrap/85921)
On May 25, 2018 11:03:50 PM GMT+02:00, Jakub Jelinek wrote:
>Hi!
>
>The following variable only makes the code larger and less readable.
>In addition, with some broken kernel headers that redefine noinline
>it breaks bootstrap.
>
>Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok
>for trunk?

OK.

Richard.

>2018-05-25  Jakub Jelinek
>
>	PR bootstrap/85921
>	* c-warn.c (diagnose_mismatched_attributes): Remove unnecessary
>	noinline variable to workaround broken kernel headers.
>
>--- gcc/c-family/c-warn.c.jj	2018-05-21 13:15:33.878575581 +0200
>+++ gcc/c-family/c-warn.c	2018-05-25 14:28:12.151050892 +0200
>@@ -2246,18 +2246,16 @@ diagnose_mismatched_attributes (tree old
> 		       newdecl);
> 
>   /* Diagnose inline __attribute__ ((noinline)) which is silly.  */
>-  const char *noinline = "noinline";
>-
>   if (DECL_DECLARED_INLINE_P (newdecl)
>       && DECL_UNINLINABLE (olddecl)
>-      && lookup_attribute (noinline, DECL_ATTRIBUTES (olddecl)))
>+      && lookup_attribute ("noinline", DECL_ATTRIBUTES (olddecl)))
>     warned |= warning (OPT_Wattributes, "inline declaration of %qD follows "
>-		       "declaration with attribute %qs", newdecl, noinline);
>+		       "declaration with attribute %<noinline%>", newdecl);
>   else if (DECL_DECLARED_INLINE_P (olddecl)
> 	   && DECL_UNINLINABLE (newdecl)
> 	   && lookup_attribute ("noinline", DECL_ATTRIBUTES (newdecl)))
>     warned |= warning (OPT_Wattributes, "declaration of %q+D with attribute "
>-		       "%qs follows inline declaration", newdecl, noinline);
>+		       "%<noinline%> follows inline declaration", newdecl);
> 
>   return warned;
> }
>
>	Jakub
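To make the failure mode concrete, here is a small sketch of why a local variable named `noinline` is fragile. It assumes, purely for illustration, that some header has defined `noinline` as a macro expanding to an attribute; the exact definition in the headers from the bug report is not shown in this thread. The helper `is_noinline_attr` is a made-up name, not a GCC function.

```c
#include <assert.h>
#include <string.h>

/* Assumption for this demo: a system header has already turned
   "noinline" into a macro, roughly like this.  */
#define noinline __attribute__((noinline))

/* With the macro active, the declaration removed by the patch,
       const char *noinline = "noinline";
   would expand to
       const char *__attribute__((noinline)) = "noinline";
   which has no variable name left and does not parse.  */

/* String literals are not subject to macro expansion, so spelling
   the attribute name as a literal (as the patch does) keeps working
   regardless of how the identifier has been redefined.  */
int is_noinline_attr(const char *name)
{
    assert(name != NULL);
    return strcmp(name, "noinline") == 0;
}
```

Uncommenting the removed declaration under the `#define` above is a quick way to reproduce the kind of parse error that broke the bootstrap.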