[Bug tree-optimization/56354] [4.8 Regression] -O2 creates incorrect for loop code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56354

Jakub Jelinek jakub at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org 2013-02-16 08:07:59 UTC ---
Or perhaps you wanted to multiply by 4U.
[Bug c++/54276] Lambda in a Template Function Undefined Reference to local static
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54276 --- Comment #8 from Paolo Carlini paolo.carlini at oracle dot com 2013-02-16 09:24:11 UTC --- ... and 4.7.3 too.
[Bug sanitizer/56330] ICE: verify_gimple failed: gimple_bb (stmt) is set to a wrong basic block with -fsanitize=address
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56330 --- Comment #7 from Dodji Seketeli dodji at gcc dot gnu.org 2013-02-16 09:30:10 UTC --- FWIW, I have posted the patch for this to http://gcc.gnu.org/ml/gcc-patches/2013-02/msg00795.html
[Bug sanitizer/56330] ICE: verify_gimple failed: gimple_bb (stmt) is set to a wrong basic block with -fsanitize=address
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56330

--- Comment #8 from Dodji Seketeli dodji at gcc dot gnu.org 2013-02-16 09:33:01 UTC ---
Author: dodji
Date: Sat Feb 16 09:32:56 2013
New Revision: 196102

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=196102
Log:
[asan] Fix for PR asan/56330

gcc/
    * asan.c (get_mem_refs_of_builtin_call): White space and style
    cleanup.
    (instrument_mem_region_access): Do not forget to always put
    instrumentation of the of 'base' and 'base + len' in a "if (len !=
    0) statement, even for cases where either 'base' or 'base + len'
    are not instrumented -- because they have been previously
    instrumented.  Simplify the logic by putting all the statements
    instrument 'base + len' inside a sequence, and then insert that
    sequence right before the current insertion point.  Then, to
    instrument 'base + len', just get an iterator on that statement.
    And do not forget to update the pointer to iterator the function
    received as argument.

gcc/testsuite/
    * c-c++-common/asan/no-redundant-instrumentation-4.c: New test file.
    * c-c++-common/asan/no-redundant-instrumentation-5.c: Likewise.
    * c-c++-common/asan/no-redundant-instrumentation-6.c: Likewise.
    * c-c++-common/asan/no-redundant-instrumentation-7.c: Likewise.
    * c-c++-common/asan/no-redundant-instrumentation-8.c: Likewise.
    * c-c++-common/asan/pr56330.c: Likewise.
    * c-c++-common/asan/no-redundant-instrumentation-1.c (test1):
    Ensure the size argument of __builtin_memcpy is a constant.

Added:
    trunk/gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-4.c
    trunk/gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-5.c
    trunk/gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-6.c
    trunk/gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-7.c
    trunk/gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-8.c
    trunk/gcc/testsuite/c-c++-common/asan/pr56330.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/asan.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-1.c
[Bug middle-end/55030] [4.8 Regression]: gcc.c-torture/execute/builtins/memcpy-chk.c execution, -Os (et al)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55030

Eric Botcazou ebotcazou at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ebotcazou at gcc dot
                   |                            |gnu.org

--- Comment #10 from Eric Botcazou ebotcazou at gcc dot gnu.org 2013-02-16 09:37:22 UTC ---
I'm getting back to this because I think that we should reinstate the original
patch, now that the blockage patch has been installed.

I have run into the same issue as your original issue with a private port on
the 4.7 branch: the clobber causes the restoring of the frame pointer to be
deleted
  http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01172.html
Later reload allocates a stack slot to a pseudo that is set before the setjmp
and used after, but the frame pointer doesn't have a consistent value...

Clearly the frame pointer needs to be restored so the clobber is wrong. It was
there because the final blockage wasn't blocking enough, but the blockage patch
is supposed to have fixed that.
[Bug middle-end/55030] [4.8 Regression]: gcc.c-torture/execute/builtins/memcpy-chk.c execution, -Os (et al)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55030 --- Comment #11 from Eric Botcazou ebotcazou at gcc dot gnu.org 2013-02-16 09:44:31 UTC --- While we are at it, we could also revert the dse.c and cselib.c hunks of the blockage patch, which weren't strictly necessary. Jakub was really concerned about their impact on volatile asms.
[Bug sanitizer/56330] ICE: verify_gimple failed: gimple_bb (stmt) is set to a wrong basic block with -fsanitize=address
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56330

Dodji Seketeli dodji at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #9 from Dodji Seketeli dodji at gcc dot gnu.org 2013-02-16 09:58:22 UTC ---
This should now be fixed in trunk (4.8).
[Bug target/55190] [SH] ivopts causes loop setup bloat
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55190

Oleg Endo olegendo at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-02-16
     Ever Confirmed|0                           |1

--- Comment #1 from Oleg Endo olegendo at gcc dot gnu.org 2013-02-16 11:12:47 UTC ---
As of rev. 196091 this problem still exists.
[Bug target/54089] [SH] Refactor shift patterns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #29 from Oleg Endo olegendo at gcc dot gnu.org 2013-02-16 11:36:37 UTC ---
Another case taken from CSiBE / bzip2, where reusing the intermediate shift
result would be better:

void uInt64_from_UInt32s ( UInt64* n, UInt32 lo32, UInt32 hi32 )
{
   n->b[7] = (UChar)((hi32 >> 24) & 0xFF);
   n->b[6] = (UChar)((hi32 >> 16) & 0xFF);
   n->b[5] = (UChar)((hi32 >> 8)  & 0xFF);
   n->b[4] = (UChar) (hi32        & 0xFF);
/*
   n->b[3] = (UChar)((lo32 >> 24) & 0xFF);
   n->b[2] = (UChar)((lo32 >> 16) & 0xFF);
   n->b[1] = (UChar)((lo32 >> 8)  & 0xFF);
   n->b[0] = (UChar) (lo32        & 0xFF);
*/
}

on rev 196091 with -O2 -m4 compiles to:

        mov     r6,r0
        shlr16  r0
        shlr8   r0
        mov.b   r0,@(7,r4)
        mov     r6,r0
        shlr16  r0
        mov.b   r0,@(6,r4)
        mov     r6,r0
        shlr8   r0
        mov.b   r0,@(5,r4)
        mov     r6,r0
        mov.b   r0,@(4,r4)

which would be better as:

        mov     r6,r0
        mov.b   r0,@(4,r4)
        shlr8   r0
        mov.b   r0,@(5,r4)
        shlr8   r0
        mov.b   r0,@(6,r4)
        shlr8   r0
        mov.b   r0,@(7,r4)

this would require reordering of the mem stores, which should be OK to do if
the mem is not volatile.

Reordering the stores manually:

void uInt64_from_UInt32s ( UInt64* n, UInt32 lo32, UInt32 hi32 )
{
   n->b[4] = (UChar) (hi32        & 0xFF);
   n->b[5] = (UChar)((hi32 >> 8)  & 0xFF);
   n->b[6] = (UChar)((hi32 >> 16) & 0xFF);
   n->b[7] = (UChar)((hi32 >> 24) & 0xFF);
}

still results in:

        mov     r6,r0
        mov.b   r0,@(4,r4)
        mov     r6,r0
        shlr8   r0
        mov.b   r0,@(5,r4)
        mov     r6,r0
        shlr16  r0
        mov.b   r0,@(6,r4)
        mov     r6,r0
        shlr16  r0
        shlr8   r0
        mov.b   r0,@(7,r4)

... at least this case should be handled, I think.
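
For illustration, the store order and shift reuse that the preferred assembly above implies can be sketched at the source level. This is a minimal C++ sketch, not code from bzip2 or from the PR: `UInt64Bytes` and `store_hi32_reusing_shift` are hypothetical names, and whether a given GCC revision turns this form into the shorter SH sequence is not claimed here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the bzip2 UInt64 type (eight raw bytes).
struct UInt64Bytes {
    std::uint8_t b[8];
};

// Store the four bytes of hi32 into b[4..7], lowest byte first.
// One running value is shifted right by 8 per step, instead of
// recomputing hi32 >> 8, hi32 >> 16 and hi32 >> 24 from scratch.
void store_hi32_reusing_shift(UInt64Bytes* n, std::uint32_t hi32) {
    assert(n != nullptr);
    std::uint32_t t = hi32;
    for (int i = 4; i < 8; ++i) {
        n->b[i] = static_cast<std::uint8_t>(t & 0xFF);
        t >>= 8;  // reuse the intermediate shift result
    }
}
```

Calling `store_hi32_reusing_shift(&v, 0x11223344u)` leaves `v.b[4]` through `v.b[7]` holding 0x44, 0x33, 0x22, 0x11, the same bytes the original function writes.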
[Bug tree-optimization/56355] New: abs and multiplication
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56355

             Bug #: 56355
           Summary: abs and multiplication
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: gli...@gcc.gnu.org

#include <cmath>
#include <cstdlib>
typedef double T;
// typedef int T;
T f(T a, T b){ return std::abs(a)*std::abs(b); }
T g(T a){ return std::abs(a)*std::abs(a); }
T h(T a){ return std::abs(a*a); }

Compiled with g++ -O3 (-ffast-math doesn't help), g is properly optimized to
a*a but only at the RTL level, and the other 2 are not optimized at all. If I
make T a typedef for int, nothing is optimized.

For the first one, I would expect abs(a*b), and for the others just a*a.

Related to PR 31548.
[Bug c++/54835] [C++11] Explicit default constructors not respected during copy-list-initialization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54835 --- Comment #3 from Daniel Krügler daniel.kruegler at googlemail dot com 2013-02-16 11:57:21 UTC --- (In reply to comment #2) I'm not opposed to this behavior, but I think it would be a language change. Thanks Jason. I just see now http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1518 Unless I'm mistaken, this is actually the relevant issue. I suggest to defer this issue and mark it with CWG 1518. I'm not sure how this would be best done, so leave it to the administrators of this list.
[Bug tree-optimization/56355] abs and multiplication
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56355 --- Comment #1 from Marc Glisse glisse at gcc dot gnu.org 2013-02-16 12:07:28 UTC --- Actually, for g/h with double, using __builtin_fabs instead of std::abs does it, so it is just the usual lack of combine at the tree level. But there is still f, and the builtin approach doesn't help for int.
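
As a sanity check on the rewrites requested in this PR, the following C++ sketch places the three reported functions next to the forms the reporter expects. The names `f_expected` and `square` are hypothetical; the sketch only covers `double` under the default rounding mode and says nothing about what any GCC version actually emits.

```cpp
#include <cmath>

typedef double T;

// The three functions from the report.
T f(T a, T b) { return std::abs(a) * std::abs(b); }
T g(T a)      { return std::abs(a) * std::abs(a); }
T h(T a)      { return std::abs(a * a); }

// The forms the reporter would expect after optimization.
T f_expected(T a, T b) { return std::abs(a * b); }  // one abs instead of two
T square(T a)          { return a * a; }            // no abs at all
```

For example, `f(-3.0, 2.0)` and `f_expected(-3.0, 2.0)` both give 6.0, and `g(-1.5)`, `h(-1.5)` and `square(-1.5)` all give 2.25.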
[Bug ada/52123] [4.7/4.8 Regression] gcc bootstrap with ada fails on mingw target
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52123

--- Comment #13 from Daniel Starke daniel.f.starke at freenet dot de 2013-02-16 12:41:42 UTC ---
I just tried out to bootstrap r196092 on mingw32.
There is still one more cast patch missing to make it work for that target.

diff -uart gcc-4.8.0-r196092/gcc/ada/seh_init.c gcc-4.8.0/gcc/ada/seh_init.c
--- gcc-4.8.0-r196092/gcc/ada/seh_init.c    2013-02-16 08:26:53 +
+++ gcc-4.8.0/gcc/ada/seh_init.c    2013-02-06 12:01:20 +
@@ -198,7 +198,7 @@
 #endif
   Raise_From_Signal_Handler (exception, msg);
-  return 0; /* This is never reached, avoid compiler warning */
+  return (EXCEPTION_DISPOSITION)0; /* This is never reached, avoid compiler warning */
 }
 #endif /* !(defined (_WIN64) && defined (__SEH__)) */
[Bug libstdc++/56332] libstdc++-v3 does not support x86_64-pc-mingw64: No support for this host/target combination
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56332 --- Comment #6 from devurandom at gmx dot net 2013-02-16 13:15:27 UTC --- Ok... I assumed that in the cpu-vendor-os triplet the os part contains the reference to the c library and/or kernel, while vendor refers to the distribution that packaged the compiler (or is often just pc for i386). Apparently this was completely wrong. I'll ask the Gentoo maintainers to rename the package to something that is not plain wrong.
[Bug libstdc++/56332] libstdc++-v3 does not support x86_64-pc-mingw64: No support for this host/target combination
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56332 --- Comment #7 from devurandom at gmx dot net 2013-02-16 13:20:49 UTC --- P.S: Is relaxing the match to accept mingw*, because the library and compiler are called mingw(-w64), an option? That shouldn't hurt anyone and not make anything more complicated either.
[Bug ada/52123] [4.7/4.8 Regression] gcc bootstrap with ada fails on mingw target
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52123

--- Comment #14 from Eric Botcazou ebotcazou at gcc dot gnu.org 2013-02-16 13:45:48 UTC ---
> I just tried out to bootstrap r196092 on mingw32.
> There is still one more cast patch missing to make it work for that target.
>
> diff -uart gcc-4.8.0-r196092/gcc/ada/seh_init.c gcc-4.8.0/gcc/ada/seh_init.c
> --- gcc-4.8.0-r196092/gcc/ada/seh_init.c    2013-02-16 08:26:53 +
> +++ gcc-4.8.0/gcc/ada/seh_init.c    2013-02-06 12:01:20 +
> @@ -198,7 +198,7 @@
>  #endif
>    Raise_From_Signal_Handler (exception, msg);
> -  return 0; /* This is never reached, avoid compiler warning */
> +  return (EXCEPTION_DISPOSITION)0; /* This is never reached, avoid compiler warning */
>  }
>  #endif /* !(defined (_WIN64) && defined (__SEH__)) */

That's ugly, please use ATTRIBUTE_NORETURN instead.
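
The idea behind this suggestion can be sketched in standalone C++: once the raising function is declared as never returning, the caller needs neither a dummy `return` nor a cast. This is only an illustration under stated assumptions. `raise_from_handler`, `Disposition` and `handler` are hypothetical stand-ins for `Raise_From_Signal_Handler`, `EXCEPTION_DISPOSITION` and the SEH handler, and the C++11 `[[noreturn]]` attribute stands in for GCC's `ATTRIBUTE_NORETURN` macro, which is assumed here to expand to the GNU `noreturn` attribute.

```cpp
#include <stdexcept>

// Hypothetical stand-in for Raise_From_Signal_Handler: it never returns
// to its caller (in this sketch it always throws).
[[noreturn]] void raise_from_handler(const char* msg) {
    throw std::runtime_error(msg);
}

// Hypothetical stand-in for the EXCEPTION_DISPOSITION enum.
enum Disposition { ContinueExecution, ContinueSearch };

// Because the callee is declared noreturn, the compiler knows control
// never reaches the end of this function, so no "return 0;" (and no
// cast to the enum type) is needed to silence a missing-return warning.
Disposition handler(const char* msg) {
    raise_from_handler(msg);
}
```

Calling `handler("boom")` inside a `try` block ends in the `catch` clause for `std::runtime_error`; control never comes back from `handler` normally.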
[Bug c++/54835] [C++11][DR 1518] Explicit default constructors not respected during copy-list-initialization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54835

Jason Merrill jason at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |jason at gcc dot gnu.org
                   |gnu.org                     |
            Summary|[C++11] Explicit default    |[C++11][DR 1518] Explicit
                   |constructors not respected  |default constructors not
                   |during                      |respected during
                   |copy-list-initialization    |copy-list-initialization

--- Comment #4 from Jason Merrill jason at gcc dot gnu.org 2013-02-16 15:02:22 UTC ---
Ah, good point. I think we decided in Portland to go with the behavior you
expect; all that's left is the drafting (which is also for me to do). Thanks.
[Bug target/56110] Sub-optimal code: unnecessary CMP after AND
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56110

--- Comment #1 from Tilman Sauerbeck til...@code-monkey.de 2013-02-16 16:49:34 UTC ---
Changing the literal in the test function so that it fits in 8 bits makes gcc
go with the TST instruction instead of AND+CMP:

unsigned f2 (unsigned x, unsigned m)
{
    if (m & 0x80)
        x >>= 8;

    return x;
}

=>

        tst     r1, #128
        movne   r0, r0, lsr #8
        bx      lr

So I guess I shouldn't ask why gcc generates AND+CMP instead of ANDS, but why
it chooses not to use TST.
[Bug c/56356] New: DJGPP compiler crashing
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56356 Bug #: 56356 Summary: DJGPP compiler crashing Classification: Unclassified Product: gcc Version: 4.7.2 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassig...@gcc.gnu.org ReportedBy: fabrizio...@tiscali.it I am building gcc as cross-compiler for DJGPP, running on an Ubuntu 12.04 machine. I downloaded the sources of binutils 2.23.1 and gcc 4.7.2. binutils compiled with no problems. gcc's build proceeded for a while, so that the compiler was actually built, but, while compiling libstdc++, the cross-compiler crashed. This is the command (part of the build process) that caused the crash /home/fabrizio/dev/djgpp/cross/gcc/./gcc/xgcc -v -shared-libgcc -B/home/fabrizio/dev/djgpp/cross/gcc/./gcc -nostdinc++ -L/home/fabrizio/dev/djgpp/cross/gcc/i586-pc-msdosdjgpp/libstdc++-v3/src -L/home/fabrizio/dev/djgpp/cross/gcc/i586-pc-msdosdjgpp/libstdc++-v3/src/.libs -B/home/fabrizio/dev/djgpp/i586-pc-msdosdjgpp/bin/ -B/home/fabrizio/dev/djgpp/i586-pc-msdosdjgpp/lib/ -isystem /home/fabrizio/dev/djgpp/i586-pc-msdosdjgpp/include -isystem /home/fabrizio/dev/djgpp/i586-pc-msdosdjgpp/sys-include -I/home/fabrizio/dev/djgpp/cross/gcc-4.7.2/libstdc++-v3/../libgcc -I/home/fabrizio/dev/djgpp/cross/gcc/i586-pc-msdosdjgpp/libstdc++-v3/include/i586-pc-msdosdjgpp -I/home/fabrizio/dev/djgpp/cross/gcc/i586-pc-msdosdjgpp/libstdc++-v3/include -I/home/fabrizio/dev/djgpp/cross/gcc-4.7.2/libstdc++-v3/libsupc++ -fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -Wabi -fdiagnostics-show-location=once -ffunction-sections -fdata-sections -frandom-seed=strstream.lo -g -O2 -I/home/fabrizio/dev/djgpp/cross/gcc/i586-pc-msdosdjgpp/libstdc++-v3/include/backward -Wno-deprecated -c ../../../../../gcc-4.7.2/libstdc++-v3/src/c++98/strstream.cc -o strstream.o I even tried to use gdb to debug cc1plus. 
gdb --args /home/fabrizio/dev/djgpp/cross/gcc/./gcc/cc1plus -quiet -nostdinc++ -v -I /home/fabrizio/dev/djgpp/cross/gcc-4.7.2/libstdc++-v3/../libgcc -I /home/fabrizio/dev/djgpp/cross/gcc/i586-pc-msdosdjgpp/libstdc++-v3/include/i586-pc-msdosdjgpp -I /home/fabrizio/dev/djgpp/cross/gcc/i586-pc-msdosdjgpp/libstdc++-v3/include -I /home/fabrizio/dev/djgpp/cross/gcc-4.7.2/libstdc++-v3/libsupc++ -I /home/fabrizio/dev/djgpp/cross/gcc/i586-pc-msdosdjgpp/libstdc++-v3/include/backward -iprefix /home/fabrizio/dev/djgpp/cross/gcc/gcc/../lib/gcc/i586-pc-msdosdjgpp/4.7.2/ -isystem /home/fabrizio/dev/djgpp/cross/gcc/./gcc/include -isystem /home/fabrizio/dev/djgpp/cross/gcc/./gcc/include-fixed -remap -imacros /home/fabrizio/dev/djgpp/i586-pc-msdosdjgpp/bin/../include/sys/version.h -isystem /home/fabrizio/dev/djgpp/i586-pc-msdosdjgpp/include -isystem /home/fabrizio/dev/djgpp/i586-pc-msdosdjgpp/sys-include ../../../../../gcc-4.7.2/libstdc++-v3/src/c++98/strstream.cc -quiet -dumpbase strstream.cc -mtune=pentium -march=pentium -auxbase-strip strstream.o -g -O2 -Wall -Wextra -Wwrite-strings -Wcast-qual -Wabi -Wno-deprecated -version -fno-implicit-templates -fdiagnostics-show-location=once -ffunction-sections -fdata-sections -frandom-seed=strstream.lo -o /tmp/ccqldaCk.s And this is the result Program received signal SIGSEGV, Segmentation fault. 
eliminate_regs_1 (x=0x0, mem_mode=VOIDmode, insn=0x0, may_use_invariant=0 '\000', for_costs=0 '\000') at ../../gcc-4.7.2/gcc/reload1.c:2549
2549      enum rtx_code code = GET_CODE (x);
(gdb) ba
#0  eliminate_regs_1 (x=0x0, mem_mode=VOIDmode, insn=0x0, may_use_invariant=0 '\000', for_costs=0 '\000') at ../../gcc-4.7.2/gcc/reload1.c:2549
#1  0x007d42da in eliminate_regs (x=<optimised out>, mem_mode=mem_mode@entry=VOIDmode, insn=insn@entry=0x0) at ../../gcc-4.7.2/gcc/reload1.c:2960
#2  0x007ed561 in sdbout_parms (parms=0x75e8a440) at ../../gcc-4.7.2/gcc/sdbout.c:1278
#3  sdbout_end_prologue (line=<optimised out>, file=<optimised out>) at ../../gcc-4.7.2/gcc/sdbout.c:1594
#4  sdbout_begin_prologue (line=<optimised out>, file=<optimised out>) at ../../gcc-4.7.2/gcc/sdbout.c:1585
#5  0x006aea43 in final_start_function (first=<optimised out>, file=0x13751d0, optimize_p=<optimised out>) at ../../gcc-4.7.2/gcc/final.c:1543
#6  0x009aa10e in x86_output_mi_thunk (file=0x13751d0, thunk=<optimised out>, delta=-8, vcall_offset=0, function=<optimised out>) at ../../gcc-4.7.2/gcc/config/i386/i386.c:32312
#7  0x0061e2f2 in assemble_thunk (node=0x75e99000) at ../../gcc-4.7.2/gcc/cgraphunit.c:1641
#8  assemble_thunks_and_aliases (node=<optimised out>, node=<optimised out>) at ../../gcc-4.7.2/gcc/cgraphunit.c:1802
#9  0x0061e231 in assemble_thunks_and_aliases (node=0x75e80b40,
---Type <return> to continue, or q <return> to quit---
    node=0x75e80b40) at
[Bug libgomp/56357] New: [4.8 Regression] missing symbol references for libgomp when using -flto -fopenmp on mingw32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56357

             Bug #: 56357
           Summary: [4.8 Regression] missing symbol references for libgomp
                    when using -flto -fopenmp on mingw32
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: daniel.f.sta...@freenet.de
                CC: ja...@gcc.gnu.org

The combination of -flto and -fopenmp is broken for mingw32 in r196092.

#define ARRAY_SIZE 10

int main() {
    int i;
    volatile int a[ARRAY_SIZE];
    #pragma omp parallel for private(a)
    for (i = 0; i < ARRAY_SIZE; i++) {
        a[i] += i;
    }
    return 0;
}

This code works fine with gcc -fopenmp gcc-lto-gomp.c but results in errors
for gcc -fopenmp -flto gcc-lto-gomp.c.
Adding -lgomp does not solve this.

Error output:
C:\Users\Me\AppData\Local\Temp\cccobKR8.ltrans0.ltrans.o:cccobKR8.ltrans0.o:(.text+0x6): undefined reference to `omp_get_num_threads'
C:\Users\Me\AppData\Local\Temp\cccobKR8.ltrans0.ltrans.o:cccobKR8.ltrans0.o:(.text+0xd): undefined reference to `omp_get_thread_num'
C:\Users\Me\AppData\Local\Temp\cccobKR8.ltrans0.ltrans.o:cccobKR8.ltrans0.o:(.text.startup+0x26): undefined reference to `GOMP_parallel_start'
C:\Users\Me\AppData\Local\Temp\cccobKR8.ltrans0.ltrans.o:cccobKR8.ltrans0.o:(.text.startup+0x37): undefined reference to `GOMP_parallel_end'
could not unlink output file
collect2.exe: error: ld returned 1 exit status

This all works fine in gcc 4.7.2 with the same build configuration.

Using built-in specs.
COLLECT_GCC=D:\Programme\msys\gcc\bin\gcc.exe COLLECT_LTO_WRAPPER=d:/programme/msys/gcc/bin/../libexec/gcc/mingw32/4.8.0/lto-wrapper.exe Target: mingw32 Configured with: ../gcc-4.8.0-r196092/configure --enable-languages=c,ada,c++,fortran,objc,obj-c++ --disable-sjlj-exceptions --disable-nls --disable-shared --enable-static --enable-fully-dynamic-string --enable-libgomp --enable-lto --with-dwarf2 --disable-win32-registry --enable-version-specific-runtime-libs --enable-bootstrap --build=mingw32 --enable-abi=32 --enable-checking=release --prefix=/mingw Thread model: win32 gcc version 4.8.0 20130215 (experimental) (GCC)
[Bug tree-optimization/54073] [4.7 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Jake Stine jake.stine at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jake.stine at gmail dot com

--- Comment #16 from Jake Stine jake.stine at gmail dot com 2013-02-16 19:12:05 UTC ---
Hi,

I have done quite a bit of analysis on cmov performance across x86
architectures, so I will share here in case it helps:

Quick summary: Conditional moves on Intel Core/Xeon and AMD Bulldozer
architectures should probably be avoided as a rule.

History: Conditional moves were beneficial for the Intel Pentium 4, and also
(but less so) for AMD Athlon/Phenom chips. In the AMD Athlon/Phenom case the
performance of cmov vs cmp+branch is determined more by the alignment of the
target of the branch than by the prediction rate of the branch. The
instruction decoders would incur penalties on certain types of unaligned
branch targets (when taken), or when decoding sequences of instructions that
contained multiple branches within a 16-byte fetch window (taken or not). cmov
was sometimes handy for avoiding those.

With regard to more current Intel Core and AMD Bulldozer/Bobcat architecture:

I have found that use of conditional moves (cmov) is only beneficial if the
branch that the move is replacing is badly mis-predicted. In my tests, the
cmov only became clearly optimal when the branch was predicted correctly less
than 92% of the time, which is abysmal by modern branch predictor standards
and rarely occurs in practice. Above 97% prediction rates, cmov is typically
slower than cmp+branch. Inside loops that contain branches with prediction
rates approaching 100% (as is the case presented by the OP), cmov becomes a
severe performance bottleneck. This holds true for both Core and Bulldozer.
Bulldozer has less efficient branching than the i7, but is also severely
bottlenecked by its limited fetch/decode. Cmov requires executing more total
instructions, and that makes Bulldozer very unhappy.

Note that my tests involved relatively simple loops that did not suffer from
the added register pressure that cmov introduces. In practice, the prognosis
for cmov being optimal is even worse than what I've observed in a controlled
environment. Furthermore, to my knowledge the status of cmov vs. branch
performance on x86 will not be changing anytime soon. cmov will continue to be
a liability well into the next couple of architecture releases from Intel and
AMD. Piledriver will have added fetch/decode resources but should also have a
smaller mispredict penalty, so it's doubtful cmov will gain much advantage
there either.

Therefore I would recommend setting -fno-tree-loop-if-convert for all -march
matching Intel Core and AMD Bulldozer/Bobcat families.

There is one good use-case for cmov on x86: Mis-predicted conditions inside of
loops. Currently there's no way to force that behavior in situations where I,
the programmer, am fully aware that the condition is chaotic/random. A builtin
cmov or condition hint would be nice. For now I'm forced to address those
(fortunately infrequent) situations via inline asm.
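
To make the two code shapes under discussion concrete, here is a small C++ sketch of a loop with a data-dependent condition, written once with an explicit branch and once as a branch-free select. The function names are hypothetical. Neither form forces a particular instruction: as the comment above notes, there is no portable way to demand cmov, and the compiler may emit cmp+branch or cmov for either version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Branchy form: adds v[i] only when it exceeds limit.
std::int64_t sum_above_branch(const std::int32_t* v, std::size_t n,
                              std::int32_t limit) {
    assert(v != nullptr || n == 0);
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (v[i] > limit) sum += v[i];
    }
    return sum;
}

// Select form: the condition becomes a mask (0 or all ones), so the
// source contains no data-dependent branch in the loop body.
std::int64_t sum_above_select(const std::int32_t* v, std::size_t n,
                              std::int32_t limit) {
    assert(v != nullptr || n == 0);
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::int64_t mask = -static_cast<std::int64_t>(v[i] > limit);
        sum += v[i] & mask;  // v[i] when the condition holds, else 0
    }
    return sum;
}
```

Both functions return the same value for any input; for `{1, 5, -3, 10, 7}` with `limit` 4 the result is 22. Which one is faster depends on how predictable `v[i] > limit` is, which is exactly the trade-off described in the comment.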
[Bug c++/56358] New: [C++11] Erroneous interaction of typedef and inherited constructor declarations
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56358

             Bug #: 56358
           Summary: [C++11] Erroneous interaction of typedef and inherited
                    constructor declarations
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: daniel.krueg...@googlemail.com

The following code - compiled with the flags

-pedantic-errors -std=c++11 -Wall

is rejected by gcc 4.8.0 20130210 (experimental):

//--
template<class T>
struct A {};

template<class T>
struct B1 : A<T> {
  typedef A<T> super_t;
  using A<T>::A; // #7
};

template<class T>
struct B2 : A<T> {
  using A<T>::A;
  typedef A<T> super_t; // #13
};
//--

7|error: declaration of 'using A<T>::A' [-fpermissive]|
2|error: changes meaning of 'A' from 'struct A<T>' [-fpermissive]|
13|error: 'A' does not name a type|
13|note: (perhaps 'typename A<T>::A' was intended)

It could be related to bug 56323, but I currently have no way to verify this
hypothesis.

My understanding is that both definitions of B1 and B2 should be valid. Note
that the typedefs referring to A<T> are needed to produce the error, even
though at least in B1 their effect is completely unexpected.
[Bug middle-end/55030] [4.8 Regression]: gcc.c-torture/execute/builtins/memcpy-chk.c execution, -Os (et al)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55030

--- Comment #12 from Hans-Peter Nilsson hp at gcc dot gnu.org 2013-02-17 00:33:17 UTC ---
(In reply to comment #10)
> I'm getting back to this because I think that we should reinstate the original
> patch, now that the blockage patch has been installed.

*wake-up reactions*

> I have run into the same issue as your original issue with a private port on
> the 4.7 branch: the clobber causes the restoring of the frame pointer to be
> deleted
>   http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01172.html
> Later reload allocates a stack slot to a pseudo that is set before the setjmp
> and used after, but the frame pointer doesn't have a consistent value...

Yup, this far I remember.

> Clearly the frame pointer needs to be restored so the clobber is wrong. It was
> there because the final blockage wasn't blocking enough, but the blockage patch
> is supposed to have fixed that.

I've lost track. What was the original patch, and what do you mean by the
blockage patch (that has been installed)? I'm pretty sure there were several
follow-up patches, so I can't say I'm confident about reverting something from
just this subset. (To wit: if it's something that causes volatile asms to
again be treated differently from (other) blockages, then that's wrong, as a
volatile asm is the default blockage.) Can you please post a candidate
(reverting?) patch to gcc-patches@ *and CC me* (I'm far behind on reading gcc
lists).
[Bug c++/56359] New: [4.8 regression] Bogus error: no matching function for call to ...
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56359

             Bug #: 56359
           Summary: [4.8 regression] Bogus error: no matching function
                    for call to ...
    Classification: Unclassified
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: ppluzhni...@google.com

Google ref b/8213841

Test case reduced from nodejs/src/node_io_watcher.cc

typedef int (*InvocationCallback) (const int &);

template <typename target_t>
void SetPrototypeMethod (target_t, const char *, InvocationCallback);

class A
{
    void Initialize ();
protected:
    static int Stop (const int &);
    void Stop ();  // comment out to make the bug disappear.
};

void
A::Initialize ()
{
    SetPrototypeMethod (0, "stop", A::Stop);
}

Compiles fine with gcc-4.7, fails with SVN trunk @196104:

g++ -c t.ii
t.ii: In member function ‘void A::Initialize()’:
t.ii:17:43: error: no matching function for call to ‘SetPrototypeMethod(int, const char [5], <unresolved overloaded function type>)’
     SetPrototypeMethod (0, "stop", A::Stop);
                                           ^
t.ii:17:43: note: candidate is:
t.ii:4:6: note: template<class target_t> void SetPrototypeMethod(target_t, const char*, InvocationCallback)
 void SetPrototypeMethod (target_t, const char *, InvocationCallback);
      ^
t.ii:4:6: note: template argument deduction/substitution failed:
t.ii: In substitution of ‘template<class target_t> void SetPrototypeMethod(target_t, const char*, InvocationCallback) [with target_t = int]’:
t.ii:17:43: required from here
t.ii:10:16: error: ‘static int A::Stop(const int&)’ is protected
     static int Stop (const int &);
                ^
t.ii:17:43: error: within this context
     SetPrototypeMethod (0, "stop", A::Stop);
                                           ^
[Bug tree-optimization/56360] New: Loop invariant motion can introduce speculative store
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56360

             Bug #: 56360
           Summary: Loop invariant motion can introduce speculative store
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: i...@airs.com

When I compile this C++ program with -O3 -std=gnu++11 on x86_64:

#include<mutex>
#include<thread>

extern int a;
std::mutex m;

void nop2(int*);

// this function is ONLY called with i=0.
void foo(int i) {
  if (i) m.lock();
  nop2(&i);  // Only here to prevent optimiser from creating one big if().
  for (int c = 0; c < 1; ++c) {
    if (i) {
      ++a;  // Thus this is never executed.
    }
  }
  nop2(&i);  // Only here to prevent optimiser from creating one big if().
  if (i) m.unlock();
}

I see this in _Z3fooi:

    movl    a(%rip), %eax
    movl    12(%rsp), %ecx
    leaq    12(%rsp), %rdi
    leal    1(%rax), %edx
    testl   %ecx, %ecx
    cmovne  %edx, %eax
    movl    %eax, a(%rip)

This unconditionally stores a value in a. This is a speculative store, which
is invalid according to the C++ memory model.

This does not happen with -O2. From looking at the dumps, the bug appears to
be in the loop invariant motion pass.
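
The difference between the source and the generated code can be pictured at the source level. The C++ sketch below is an illustration only: `a_demo`, `conditional_store` and `speculative_store` are hypothetical names, and `speculative_store` is a hand-written rendering of the load, select, store pattern in the assembly above, not compiler output.

```cpp
int a_demo = 0;  // hypothetical stand-in for the shared global `a`

// What the source says: a_demo is written only when i != 0.
void conditional_store(int i) {
    if (i) ++a_demo;
}

// Source-level picture of the reported code: load the global, select
// between the old and the incremented value, then store the result back
// unconditionally. The store happens even when i == 0.
void speculative_store(int i) {
    int tmp = a_demo;
    a_demo = i ? tmp + 1 : tmp;
}
```

In a single thread the two functions are indistinguishable. With several threads, the second form can overwrite an update that another thread made to the global between the load and the store, even though the source never writes the global when `i` is 0, which is why the report calls the store invalid under the C++ memory model.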
[Bug tree-optimization/56360] Loop invariant motion can introduce speculative store
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56360 --- Comment #1 from Ian Lance Taylor ian at airs dot com 2013-02-17 04:21:28 UTC --- The speculative store can be disabled via --param allow-store-data-races=0. So perhaps the question is: shouldn't that be set by -std=gnu++11?
[Bug tree-optimization/56360] Loop invariant motion can introduce speculative store
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56360

--- Comment #2 from Andrew Pinski pinskia at gcc dot gnu.org 2013-02-17 04:55:40 UTC ---
(In reply to comment #1)
> The speculative store can be disabled via --param allow-store-data-races=0.
>
> So perhaps the question is: shouldn't that be set by -std=gnu++11?

As I understand it, the C++11 memory model is not fully there in 4.8.
[Bug tree-optimization/56360] Loop invariant motion can introduce speculative store
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56360 --- Comment #3 from Andrew Pinski pinskia at gcc dot gnu.org 2013-02-17 04:58:05 UTC --- http://gcc.gnu.org/wiki/Atomic/GCCMM/gcc4.8 describes what is left. bitfields is a big issue.
[Bug tree-optimization/56360] Loop invariant motion can introduce speculative store
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56360 --- Comment #4 from Ian Lance Taylor ian at airs dot com 2013-02-17 05:15:01 UTC --- Bitfields are an issue but I thought that speculative stores were fixed.
[Bug tree-optimization/56360] Loop invariant motion can introduce speculative store
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56360

--- Comment #5 from Andrew Pinski pinskia at gcc dot gnu.org 2013-02-17 05:46:09 UTC ---
> 7. Add flag for multi-threaded vs single threaded.

That item is still left, and it is what needs to turn on
--param allow-store-data-races=0.