RE: selective linking of floating point support for *printf / *scanf
From: Joseph S. Myers [mailto:jos...@codesourcery.com] Sent: Tuesday, September 02, 2014 11:29 PM Identifiers beginning with a single underscore are reserved with file scope. This means an application cannot provide an external definition of them, because such an external definition would have file scope. So it's fine for the implementation to define such identifiers and use them in the implementation of standard functions. Ah yes, I confused file scope with file scope with internal linkage. So there shouldn't be any problem, since _printf_float and _scanf_float are only used with external linkage and no macro refers to them. Best regards, Thomas
Re: Some questions about pass web
On Wed, Sep 3, 2014 at 7:35 AM, Carrot Wei car...@google.com wrote: Hi I have following questions about web (pseudo register renaming) pass: 1. It is well known that register renaming is a big help to register allocation, but in gcc's backend, the web pass is far before RA, there are about 20 passes between them. Does it mean register renaming can also heavily benefit other optimizations? And the passes between them usually don't generate more register renaming chances? I think one purpose is to break long dependency chain into short ones. For example, with below code use(i) i = i + 1; ... use(i) i = i + 1; ... use(i) i = i + 1; ... Pass fweb could change it into below form use(i) i0 = i + 1 ... use(i0) i1 = i0 + 1 ... use(i1) i = i0 + 2 ... Apparently, latter form has shorter chains, which makes df stuff more efficient. 2. It looks current web pass can't handle AUTOINC expressions, a reg operand is used as both use and def reference in an AUTOINC expression, so this def side should not be renamed. Pass web doesn't explicitly check this case, may rename the reg operand of AUTOINC expression. Is this expected because it is before auto_inc_dec pass? Last time I tried, there are several passes after loop_done and before auto-inc-dec can't handle auto-increment addressing mode, including fweb. 3. Are AUTOINC expressions only generated by auto_inc_dec pass? All passes before auto_inc_dec including expand should not generate AUTOINC expressions, otherwise it will break web. Yes. Yet other passes may generate auto-inc friendly instruction patterns thus auto-inc-dec can capture more opportunities. IVOPT is a typical example. Thanks, bin Could anybody help to answer these questions? thanks a lot Guozhi Wei
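To make the renaming idea above concrete, here is a minimal C sketch of the same transformation at source level. It is only an illustration: the function names are made up for this example, and the real pass works on RTL pseudos, not on C variables. The first function reuses one name for three independent live ranges; the second gives every definition its own name, which is roughly what the web pass does.

```c
#include <assert.h>

/* One variable `i` carries several live ranges that only share a name. */
int sum_single_name(int i)
{
    int total = 0;
    total += i;          /* use(i) */
    i = i + 1;
    total += i;          /* use(i) */
    i = i + 1;
    total += i;          /* use(i) */
    i = i + 1;
    return total + i;
}

/* Same computation after renaming: each definition gets a fresh name,
   so every def-use chain is short. */
int sum_renamed(int i)
{
    int total = 0;
    total += i;          /* use(i)  */
    int i0 = i + 1;
    total += i0;         /* use(i0) */
    int i1 = i0 + 1;
    total += i1;         /* use(i1) */
    int i2 = i1 + 1;
    return total + i2;
}

/* Hypothetical helper: confirm both forms agree on a range of inputs. */
void check_equivalence(int lo, int hi)
{
    for (int i = lo; i <= hi; i++)
        assert(sum_single_name(i) == sum_renamed(i));
}
```

Both functions compute the same value; only the naming of the intermediate results differs, which is why the transformation is safe.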
Re: GCC ARM: aligned access
On Mon, Sep 1, 2014 at 9:14 AM, Peng Fan van.free...@gmail.com wrote: On 09/01/2014 08:09 AM, Matt Thomas wrote: On Aug 31, 2014, at 11:32 AM, Joel Sherrill joel.sherr...@oarcorp.com wrote: Hi, I am writing some code and found that the system crashed. I found it was an unaligned access which causes a `data abort` exception. I wrote a piece of code and objdumped it. I am not sure whether this is right or not. command: arm-poky-linux-gnueabi-gcc -marm -mno-thumb-interwork -mabi=aapcs-linux -mword-relocations -march=armv7-a -mno-unaligned-access -ffunction-sections -fdata-sections -fno-common -ffixed-r9 -msoft-float -pipe -O2 -c 2.c -o 2.o arch is armv7-a and I used '-mno-unaligned-access'. I think this is totally expected. You were passed a u8 pointer which is aligned for that type (no restrictions likely). You cast it to a type with stricter alignment requirements. The code is just flawed. Some CPUs handle unaligned accesses but not your ARM. armv7 and armv6 arch except armv6-m support unaligned access. A u8 pointer is cast to a u32 pointer; should gcc take the alignment problem into consideration to avoid possible errors, because of -mno-unaligned-access? While armv7 and armv6 support unaligned access, that support has to be enabled by the underlying O/S. Not knowing the underlying environment, I can't say whether that support is enabled. One issue we had in NetBSD in moving to gcc4.8 was that the NetBSD/arm kernel didn't enable unaligned access for armv[67] CPUs. We quickly changed things so unaligned access is supported. Yeah. By setting a hardware bit in the ARM coprocessor, unaligned access will not cause a data abort exception. I just wonder, is the following correct? Should gcc take the responsibility to take care of a possibly unaligned pointer `u8 *data`, because -mno-unaligned-access is passed to gcc? I suppose not. It is an explicit type conversion; the compiler assumes you take the responsibility, I think.
Actually, you can dump the final RTL using -fdump-rtl-shorten and look at the memory alignment information. In my experiment, it's A32 with -mno-unaligned-access, which means the access is taken to be 32-bit aligned. Thanks, bin

int func(u8 *data) { return *(unsigned int *)data; }

func:
   0: e590 ldr r0, [r0]
   4: e12fff1e bx lr

Regards, Peng.
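Since the thread concludes that the cast itself is the problem, a short C sketch of the usual way to read a 32-bit value from a byte pointer without promising any alignment may help. This is an illustration, not code from the thread: the function names are made up here, and `u8` is assumed to be a typedef for `uint8_t`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t u8;   /* assumption: u8 is an unsigned 8-bit type */

/* Read a 32-bit value in host byte order from a possibly unaligned
   address.  memcpy makes no alignment promise, so the compiler has to
   emit an access sequence that is safe for the selected target options. */
uint32_t load_u32(const u8 *data)
{
    uint32_t value;
    assert(data != NULL);
    memcpy(&value, data, sizeof value);
    return value;
}

/* Byte-order independent variant: assemble a little-endian value from
   individual byte loads, which never need more than byte alignment. */
uint32_t load_u32_le(const u8 *data)
{
    assert(data != NULL);
    return (uint32_t)data[0]
         | ((uint32_t)data[1] << 8)
         | ((uint32_t)data[2] << 16)
         | ((uint32_t)data[3] << 24);
}
```

Unlike `*(unsigned int *)data`, neither function tells the compiler that `data` is 4-byte aligned, so options such as -mno-unaligned-access can actually influence the generated code.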
Re: Bounded array type?
On 09/02/2014 11:22 PM, James Nelson wrote: This is error-prone because even though a size parameter is given, the code in the function has no requirement to enforce it. With a bounded array type, the prototype looks like this: buf *foo(char buf[sz], size_t sz); GCC already has a syntax extension to support this: https://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html The compiler now knows how large `buf` is, and it can put bounds checks into the code (which may be disabled with -O3). We tried this, but it is hard to find information about it, see “Bounded Pointers”. Nowadays, there is -fsanitize=object-size, but I don't know if it uses VLA lengths: https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00923.html Historically, propagation of object sizes from malloc and VLAs to __builtin_object_size was rather incomplete. -- Florian Weimer / Red Hat Product Security
Re: Some questions about pass web
On Wed, Sep 3, 2014 at 1:35 AM, Carrot Wei wrote: 1. It is well known that register renaming is a big help to register allocation, but in gcc's backend, the web pass is far before RA, there are about 20 passes between them. Does it mean register renaming can also heavily benefit other optimizations? Yes - sometimes anyway. Most non-SSA data flow analyses can't look through pseudos that have multiple non-overlapping live ranges. Think constant/copy propagation, value numbering, etc. After passes that duplicate basic blocks, and not renaming registers, you get false dependencies that hide RA and scheduling opportunities. This is why pass_web is after code-duplication transformations like (RTL) loop unrolling but before the last RTL CPROP pass. The old RA couldn't do register renaming, so something before RA had to take care of it. Enter pass_web. But this is less relevant in GCC today, where RTL code transformations basically should be limited to simple local transformations, with the more difficult global transformations already done on GIMPLE (including live range splitting, part of out-of-SSA). On top of that, IRA knows how to do some forms of live range splitting (but not within loops, AFAIU, because a loop is a single region in IRA). If someone would give some TLC to the RTL loop-unroll.c code for IV-splitting/accumulator-expanding and make them enabled by default, I doubt pass_web would be doing much at all. And the passes between them usually don't generate more register renaming chances? Usually not. Most of them create new pseudos for newly inserted expressions. Some passes are actually harmed by pass_web. auto_inc_dec is one of those. 2. It looks current web pass can't handle AUTOINC expressions, a reg operand is used as both use and def reference in an AUTOINC expression, so this def side should not be renamed. Pass web doesn't explicitly check this case, may rename the reg operand of AUTOINC expression. 
Is this expected because it is before auto_inc_dec pass? You already found the DF_REF_READ_WRITE bits. pass_web also handles match_dup constraints. That should be enough. If it is not, then please file a bug report (and feel free to assign it to me). 3. Are AUTOINC expressions only generated by auto_inc_dec pass? All passes before auto_inc_dec including expand should not generate AUTOINC expressions, otherwise it will break web. IIRC, it used to be that only push/pop could be AUTOINC before auto_inc_dec. I'm not sure if this is still true today. Ciao! Steven
Re: Some questions about pass web
On Wed, Sep 3, 2014 at 9:17 AM, Bin.Cheng wrote: Last time I tried, there are several passes after loop_done and before auto-inc-dec can't handle auto-increment addressing mode, including fweb. It surprises me that pass_web can't handle AUTOINC. Perhaps I'm off my rocker, but it's always been my understanding that almost all passes handle AUTOINC just fine (or at least conservatively: punt if you see an AUTOINC), and that only CSE really doesn't know about AUTOINC at all. Ciao! Steven
gcc parallel make check
I've noticed that make -j -k check-fortran results in serialized checking, while make -j32 -k check-fortran runs in parallel. Somehow the explicit 'N' in -jN seems to be needed for the check target, while the other targets seem to do just fine. Is that a feature, or should I file a PR for that? Somewhat related: is there a rule of thumb for how the granularity of the parallel check is decided? E.g. check-fortran seems to be limited to about 5 parallel targets, which is few for a typical server (but of course a welcome speedup already). Thanks, Joost
Re: gcc parallel make check
On Wed, 3 Sep 2014, VandeVondele Joost wrote: I've noticed that make -j -k check-fortran results in a serialized checking, while make -j32 -k check-fortran goes parallel. Somehow the explicit 'N' in -jN seems to be needed for the check target, while the other targets seem to do just fine. Is that a feature, or should I file a PR for that... ? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53155 -- Marc Glisse
Re: gcc parallel make check
On Wed, Sep 03, 2014 at 09:15:51AM +, VandeVondele Joost wrote: I've noticed that make -j -k check-fortran results in a serialized checking, while make -j32 -k check-fortran goes parallel. Somehow the explicit 'N' in -jN seems to be needed for the check target, while the other targets seem to do just fine. Is that a feature, or should I file a PR for that... ? It is intentional. With -j it is essentially a fork bomb, just don't use it. Somewhat related is there a rule of thumb on how is the granularity of parallel check decided ? E.g. check-fortran seems to be limited to about ~5 parallel targets, which is few for a typical server (but of course a welcome speedup already). The splitting has some cost (e.g. lots of various checks are cached, with split jobs they need to be done in each separate goal), and the goal of the split is toplevel make check parallelization, not individual directory or language testing. For the latter perhaps more fine grained split could be useful, but how would one find out if it is a toplevel make check, or say make -C gcc check where you test many languages, or check-gfortran? Jakub
RE: gcc parallel make check
It is intentional. With -j it is essentially a fork bomb, just don't use it. well, silently ignoring it for just this target did cost me a lot of time, while an eventual fork bomb would have been dealt with much more quickly. Somewhat related is there a rule of thumb on how is the granularity of parallel check decided ? E.g. check-fortran seems to be limited to about ~5 parallel targets, which is few for a typical server (but of course a welcome speedup already). The splitting has some cost (e.g. lots of various checks are cached, with split jobs they need to be done in each separate goal), and the goal of the split is toplevel make check parallelization, not individual directory or language testing. For the latter perhaps more fine grained split could be useful, but how would one find out if it is a toplevel make check, or say make -C gcc check where you test many languages, or check-gfortran? the cost must be small compared to the possible gain... on a 32 core server, testing of fortran FE changes would be 4x larger. I notice that even on a full check, the Fortran tests are still running when the number of processes is already way below 32. However, the longest running (by a few minutes) are those: expect -- /usr/share/dejagnu/runtest.exp --tool gcc lto.exp weak.exp tls.exp ipa.exp tree-ssa.exp debug.exp dwarf2.exp fixed-point.exp vxworks.exp cilk-plus.exp vmx.exp pch.exp simulate-thread.exp x86_64-costmodel-vect.exp i386-costmodel-vect.exp spu-costmodel-vect.exp ppc-costmodel-vect.exp charset.exp noncompile.exp tsan.exp graphite.exp compat.exp expect -- /usr/share/dejagnu/runtest.exp --tool g++ lto.exp tls.exp gcov.exp debug.exp dwarf2.exp cilk-plus.exp pch.exp bprob.exp simulate-thread.exp vect.exp charset.exp tsan.exp graphite.exp compat.exp struct-layout-1.exp ubsan.exp tm.exp gomp.exp dfp.exp tree-prof.exp stackalign.exp plugin.exp guality.exp asan.exp ecos.exp so can those be run more independently ?
RE: gcc parallel make check
What did you expect for -j alone? an error? No, as is standard in GNU make, a new process for any target that can be processed (i.e. unlimited). ... check-fortran seems to be limited to about ~5 parallel targets ... Running the make with -j8 gives 7 directories gfortran[1-6]? in gcc/testsuite/. Note that the load balancing could be improved: few minutes with a single thread over ~20 minutes. I'd like to have roughly 32 directories (or as many as -jN allows).
Re: gcc parallel make check
On Wed, Sep 03, 2014 at 09:37:19AM +, VandeVondele Joost wrote: It is intentional. With -j it is essentially a fork bomb, just don't use it. well, silently ignoring it for just this target did cost me a lot of time, while an eventual fork bomb would have been dealt with much more quickly. Somewhat related is there a rule of thumb on how is the granularity of parallel check decided ? E.g. check-fortran seems to be limited to about ~5 parallel targets, which is few for a typical server (but of course a welcome speedup already). The splitting has some cost (e.g. lots of various checks are cached, with split jobs they need to be done in each separate goal), and the goal of the split is toplevel make check parallelization, not individual directory or language testing. For the latter perhaps more fine grained split could be useful, but how would one find out if it is a toplevel make check, or say make -C gcc check where you test many languages, or check-gfortran? the cost must be small compared to the possible gain... on a 32 core server, testing of fortran FE changes would be 4x larger. I notice that It depends. For make -j2 if you split check-gfortran alone into 32 pieces, check-gcc into 32 pieces, check-g++ into 32 pieces, libstdc++ into 32 pieces etc., it might be too much. The problem with a too fine-grained split, beyond some cost to start the testing in the goal and to run various cached tests, is also that once you want to split a single *.exp job into smaller parts, you need to use wildcards, and then it becomes a maintenance problem: you don't want to test anything more than once, or not at all, even if new tests with weirdo names are added later. even on a full check, the Fortran tests are still running when the number of processes is already way below 32.
However, the longest running (by a few minutes) are those: expect -- /usr/share/dejagnu/runtest.exp --tool gcc lto.exp weak.exp tls.exp ipa.exp tree-ssa.exp debug.exp dwarf2.exp fixed-point.exp vxworks.exp cilk-plus.exp vmx.exp pch.exp simulate-thread.exp x86_64-costmodel-vect.exp i386-costmodel-vect.exp spu-costmodel-vect.exp ppc-costmodel-vect.exp charset.exp noncompile.exp tsan.exp graphite.exp compat.exp expect -- /usr/share/dejagnu/runtest.exp --tool g++ lto.exp tls.exp gcov.exp debug.exp dwarf2.exp cilk-plus.exp pch.exp bprob.exp simulate-thread.exp vect.exp charset.exp tsan.exp graphite.exp compat.exp struct-layout-1.exp ubsan.exp tm.exp gomp.exp dfp.exp tree-prof.exp stackalign.exp plugin.exp guality.exp asan.exp ecos.exp so can those be run more independently ? It is a moving target, new tests are added every day. I'm trying to adjust it during stage3/stage4 occasionally, but it also very much depends on which target it is (e.g. i?86/x86_64 has many more tests in i386.exp than other targets in their gcc.target), how fast the compiler is on the target (e.g. on some targets -g is much slower than on others, etc.). Jakub
Re: gcc parallel make check
Hi Joost, VandeVondele Joost wrote: I've noticed that make -j -k check-fortran results in a serialized checking, while make -j32 -k check-fortran goes parallel. I have to admit that I don't know why that's the case. However, I can answer the next question, which is presumably related to this one: Somewhat related is there a rule of thumb on how is the granularity of parallel check decided ? DejaGNU is not able to run checks in parallel - thus, we have only makefile parallelization (check-gcc, check-gfortran). As that wasn't sufficient, Jakub (?) split the single tests into multiple ones, trying to ensure that those subtargets all take about the same time. See gcc/fortran/Make-lang.in, which has:

# For description see comment above check_gcc_parallelize in gcc/Makefile.in.
check_gfortran_parallelize = dg.exp=gfortran.dg/\[adAD\]* \
        dg.exp=gfortran.dg/\[bcBC\]* \
        dg.exp=gfortran.dg/\[nopNOP\]* \
        dg.exp=gfortran.dg/\[isuvISUV\]* \
        dg.exp=gfortran.dg/\[efhkqrxzEFHKQRXZ\]* \
        dg.exp=gfortran.dg/\[0-9gjlmtwyGJLMTWY\]*

Thus, you currently get 6 parallel check-gfortran checks - followed by one which tries to combine the results. I think Diego has some means to run GCC's tests in a vastly parallel way, which breaks due to a test-framework issue / gfortran.dg-dependency issues. See PR56408. Thus, you could ask him how he does it. Additionally, I wouldn't mind if some lispy person could look at the PR - my attempts failed, but, admittedly, I didn't spend much time on it. Tobias PS: There was/is the recurring thought of replacing DejaGNU by a different framework or to enhance it, but not much substantial work has happened, despite some occasional effort. At least DejaGNU is now back under maintenance, cf. http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=summary
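The wildcard split quoted above only works if every test file name falls into exactly one bucket, which is the maintenance concern Jakub raises elsewhere in the thread. As a rough sketch, the C code below mirrors the six character classes and reports which bucket a file name would land in. The function names and the test file names are invented for this illustration; the authoritative list is the one in gcc/fortran/Make-lang.in.

```c
#include <assert.h>
#include <string.h>

/* Character classes copied from check_gfortran_parallelize, with the
   0-9 range written out. */
static const char *const buckets[] = {
    "adAD",
    "bcBC",
    "nopNOP",
    "isuvISUV",
    "efhkqrxzEFHKQRXZ",
    "0123456789gjlmtwyGJLMTWY",
};
enum { NBUCKETS = sizeof buckets / sizeof buckets[0] };

/* Return the index of the first bucket whose class contains the first
   character of `name`, or -1 if no bucket would pick the test up. */
int bucket_for(const char *name)
{
    assert(name != NULL);
    if (name[0] == '\0')
        return -1;
    for (int b = 0; b < NBUCKETS; b++)
        if (strchr(buckets[b], name[0]) != NULL)
            return b;
    return -1;
}

/* Count how many buckets match character `c`.  Anything other than 1
   means a test would run twice (more than 1) or never (0). */
int match_count(char c)
{
    int n = 0;
    for (int b = 0; b < NBUCKETS; b++)
        if (c != '\0' && strchr(buckets[b], c) != NULL)
            n++;
    return n;
}
```

With these classes every letter and digit maps to exactly one bucket, but a name starting with any other character (an underscore, say) would match none of them and be silently skipped.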
RE: gcc parallel make check
I have to admit that I don't know why that's the case. Actually Marc answered that one (I had the wrong mail address for gcc@ so repeat here): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53155 See: gcc/fortran/Make-lang.in, which has: I'll have a look and do some testing what the gains/costs of a further split are. Joost
RE: gcc parallel make check
expect -- /usr/share/dejagnu/runtest.exp --tool gcc lto.exp weak.exp tls.exp ipa.exp tree-ssa.exp debug.exp dwarf2.exp fixed-point.exp vxworks.exp cilk-plus.exp vmx.exp pch.exp simulate-thread.exp x86_64-costmodel-vect.exp i386-costmodel-vect.exp spu-costmodel-vect.exp ppc-costmodel-vect.exp charset.exp noncompile.exp tsan.exp graphite.exp compat.exp expect -- /usr/share/dejagnu/runtest.exp --tool g++ lto.exp tls.exp gcov.exp debug.exp dwarf2.exp cilk-plus.exp pch.exp bprob.exp simulate-thread.exp vect.exp charset.exp tsan.exp graphite.exp compat.exp struct-layout-1.exp ubsan.exp tm.exp gomp.exp dfp.exp tree-prof.exp stackalign.exp plugin.exp guality.exp asan.exp ecos.exp so can those be run more independently ? It is a moving target, new tests are added every day. I'm trying to adjust it during stage3/stage4 occassionally, but it also very much depends on which target it is (e.g. i?86/x86_64 has many more tests in i386.exp then other targets in their gcc.target), how fast the compiler is on the target (e.g. on some targets -g is much slower than on others, etc.). could you point me to the right file (or example commit) for trying to adjust this ? I can try to do some testing and come back with some numbers.
Re: gcc parallel make check
On Wed, Sep 03, 2014 at 10:35:41AM +, VandeVondele Joost wrote: expect -- /usr/share/dejagnu/runtest.exp --tool gcc lto.exp weak.exp tls.exp ipa.exp tree-ssa.exp debug.exp dwarf2.exp fixed-point.exp vxworks.exp cilk-plus.exp vmx.exp pch.exp simulate-thread.exp x86_64-costmodel-vect.exp i386-costmodel-vect.exp spu-costmodel-vect.exp ppc-costmodel-vect.exp charset.exp noncompile.exp tsan.exp graphite.exp compat.exp expect -- /usr/share/dejagnu/runtest.exp --tool g++ lto.exp tls.exp gcov.exp debug.exp dwarf2.exp cilk-plus.exp pch.exp bprob.exp simulate-thread.exp vect.exp charset.exp tsan.exp graphite.exp compat.exp struct-layout-1.exp ubsan.exp tm.exp gomp.exp dfp.exp tree-prof.exp stackalign.exp plugin.exp guality.exp asan.exp ecos.exp so can those be run more independently ? It is a moving target, new tests are added every day. I'm trying to adjust it during stage3/stage4 occassionally, but it also very much depends on which target it is (e.g. i?86/x86_64 has many more tests in i386.exp then other targets in their gcc.target), how fast the compiler is on the target (e.g. on some targets -g is much slower than on others, etc.). could you point me to the right file (or example commit) for trying to adjust this ? I can try to do some testing and come back with some numbers. The splits are in the Makefiles, see check_gcc_parallelize var in gcc/Makefile.in (where there is a big comment with documentation), check_g++_parallelize var in gcc/cp/Make-lang.in, check_gfortran_parallelize var in gcc/fortran/Make-lang.in, or check-DEJAGNU goal in libstdc++-v3/testsuite/Makefile.am. Jakub
optimization for simd intrinsics vld2_dup_* on aarch64-none-elf
Hi,

I found a performance problem with some SIMD intrinsics (vld2_dup_*) on aarch64-none-elf. Currently the vld2_dup_* intrinsics are defined as follows:

    #define __LD2R_FUNC(rettype, structtype, ptrtype, \
                        regsuffix, funcsuffix, Q) \
      __extension__ static __inline rettype \
      __attribute__ ((__always_inline__)) \
      vld2 ## Q ## _dup_ ## funcsuffix (const ptrtype *ptr) \
      { \
        rettype result; \
        __asm__ ("ld2r {v16." #regsuffix ", v17." #regsuffix "}, %1\n\t" \
                 "st1 {v16." #regsuffix ", v17." #regsuffix "}, %0\n\t" \
                 : "=Q"(result) \
                 : "Q"(*(const structtype *)ptr) \
                 : "memory", "v16", "v17"); \
        return result; \
      }

It loads from memory into registers and then stores the registers back to memory as the result. Such code performs very poorly because of the redundant memory accesses and the restricted register allocation. Some intrinsics like vld2_* used to be similar to vld2_dup_*, but they are now implemented with builtin functions:

    __extension__ static __inline int16x4x2_t __attribute__ ((__always_inline__))
    vld2_s16 (const int16_t * __a)
    {
      int16x4x2_t ret;
      __builtin_aarch64_simd_oi __o;
      __o = __builtin_aarch64_ld2v4hi ((const __builtin_aarch64_simd_hi *) __a);
      ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0);
      ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1);
      return ret;
    }

Could vld2_dup_* also be written as builtins? If not, I think the inline assembler can be optimized as follows:

    #define __LD2R_FUNC(rettype, structtype, ptrtype, \
                        regsuffix, funcsuffix, Q) \
      __extension__ static __inline rettype \
      __attribute__ ((__always_inline__)) \
      vld2 ## Q ## _dup_ ## funcsuffix (const ptrtype *ptr) \
      { \
        rettype result; \
        __asm__ ("ld2r {%0.4h, %1.4h}, %2" \
                 : "=V16"(result.val[0]), "=V17"(result.val[1]) \
                 : "Q"(*(const structtype *)ptr) \
                 : "memory", "v16", "v17"); \
        return result; \
      }

This needs a new reg_class_name v16v17 and constraints V16 and V17 for them. For this, aarch64.h, aarch64.c and constraints.md would have to be modified.

-- Shanyao Chen
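For readers who do not have the instruction semantics at hand: LD2R loads one two-element structure and replicates each element across all lanes of its own register. The portable C sketch below models what vld2_dup_s16 is expected to return, so the redundant st1 round trip in the current macro is easier to see. The struct and function names are stand-ins invented for this example; they are not the arm_neon.h types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for int16x4x2_t: two vectors of four 16-bit lanes each. */
typedef struct {
    int16_t val[2][4];
} model_int16x4x2_t;

/* Reference model of vld2_dup_s16: read two consecutive elements and
   replicate element 0 across val[0] and element 1 across val[1]. */
model_int16x4x2_t model_vld2_dup_s16(const int16_t *ptr)
{
    model_int16x4x2_t result;
    assert(ptr != NULL);
    for (int lane = 0; lane < 4; lane++) {
        result.val[0][lane] = ptr[0];
        result.val[1][lane] = ptr[1];
    }
    return result;
}
```

Only two source elements are read, so a builtin-based implementation can leave the result in registers instead of storing it to memory and reloading it.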
Re: Bounded array type?
On Wed, 3 Sep 2014, Florian Weimer wrote: On 09/02/2014 11:22 PM, James Nelson wrote: This is error-prone because even though a size parameter is given, the code in the function has no requirement to enforce it. With a bounded array type, the prototype looks like this: buf *foo(char buf[sz], size_t sz); GCC already has a syntax extension to support this: https://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html But the size declared in a parameter declaration has no semantic significance; there is no requirement that the pointer passed does point to an array of that size. If you declare the size as [static sz] then that means it points to an array of at least that size, but it could be larger. Thus, any option for any sort of bounds checks based on parameter array sizes (constant or non-constant) would be an option that explicitly produces errors for valid C code. (You could always have a function attribute to enable checking based on parameter array sizes - such an attribute would declare that the function should never access the parameter array outside the bounds given by the size, even if the array passed by the caller is larger, and maybe also that the caller must not pass an array smaller than the size given.) -- Joseph S. Myers jos...@codesourcery.com
Re: Some questions about pass web
On 09/03/14 02:35, Steven Bosscher wrote: On Wed, Sep 3, 2014 at 9:17 AM, Bin.Cheng wrote: Last time I tried, there are several passes after loop_done and before auto-inc-dec can't handle auto-increment addressing mode, including fweb. It surprises me that pass_web can't handle AUTOINC. Perhaps I'm off my rocker, but it's always been my understanding that almost all passes handle AUTOINC just fine (or at least conservatively: punt if you see an AUTOINC), and that only CSE really doesn't know about AUTOINC at all. In the past autoinc instructions didn't appear until flow (just prior to combine) and that was documented behaviour. So anything which was run strictly prior to flow/combine wasn't autoinc aware. That may have changed somewhat with the autoinc rewrite. jeff
Re: optimization for simd intrinsics vld2_dup_* on aarch64-none-elf
Hi Shanyao,

On 03/09/14 16:02, shanyao chen wrote:

Hi, I found there is a performance problem with some simd intrinsics (vld2_dup_*) on aarch64-none-elf. Now the vld2_dup_* are defined as follows:

    #define __LD2R_FUNC(rettype, structtype, ptrtype, \
                        regsuffix, funcsuffix, Q) \
      __extension__ static __inline rettype \
      __attribute__ ((__always_inline__)) \
      vld2 ## Q ## _dup_ ## funcsuffix (const ptrtype *ptr) \
      { \
        rettype result; \
        __asm__ ("ld2r {v16." #regsuffix ", v17." #regsuffix "}, %1\n\t" \
                 "st1 {v16." #regsuffix ", v17." #regsuffix "}, %0\n\t" \
                 : "=Q"(result) \
                 : "Q"(*(const structtype *)ptr) \
                 : "memory", "v16", "v17"); \
        return result; \
      }

It loads from memory to registers, and then store the value of registers to memory as a result. Such code is terribly low in performance because of redundant memory visit and limited registers allocation. Some intinsics like vld2_* were similar to vld2_dup_*, but now they are realized by builtin functions.

    __extension__ static __inline int16x4x2_t __attribute__ ((__always_inline__))
    vld2_s16 (const int16_t * __a)
    {
      int16x4x2_t ret;
      __builtin_aarch64_simd_oi __o;
      __o = __builtin_aarch64_ld2v4hi ((const __builtin_aarch64_simd_hi *) __a);
      ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0);
      ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1);
      return ret;
    }

Could vld2_dup_* also be written as builtin ? If not, i think the inline assembler can be optimized as follows :

The arm port implements these with builtins, so it should be possible to implement them that way on aarch64 as well. Could you log an issue in bugzilla please, including some source code demonstrating the poor codegen if possible?

Thanks, Kyrill

    #define __LD2R_FUNC(rettype, structtype, ptrtype, \
                        regsuffix, funcsuffix, Q) \
      __extension__ static __inline rettype \
      __attribute__ ((__always_inline__)) \
      vld2 ## Q ## _dup_ ## funcsuffix (const ptrtype *ptr) \
      { \
        rettype result; \
        __asm__ ("ld2r {%0.4h, %1.4h}, %2" \
                 : "=V16"(result.val[0]), "=V17"(result.val[1]) \
                 : "Q"(*(const structtype *)ptr) \
                 : "memory", "v16", "v17"); \
        return result; \
      }

It need to add a reg_class_name v16v17 and add constraints V16 V17 for them. For this, aarch64.h, aarch64.c, constraints.md should be modified.
stack_pointer_delta related ICE in libgcc on 4.9.1
Trying to bootstrap m68k I hit an assert in emit_library_call_value_1 that wants to ensure that the stack is aligned properly. PUSH_ROUNDING(GET_MODE_SIZE(QImode)) for m5206 is currently 1, so the testcase below has stack_pointer_delta = 1 + 1 + 4, but emit_library_call_value_1() has this:

    /* Stack must be properly aligned now. */
    gcc_assert (!(stack_pointer_delta
                  & (PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT - 1)));

where 6 & 3 != 0, and it ICEs. I am not familiar with m68k so I would be glad for any help! Should the arg be partial? Doesn't look like it, no. Does m68k use a stack save area? From the looks of it, it doesn't. Is the alignment_pad for the QImode arg wrong? Should PUSH_ROUNDING be changed back to the !CF variant? Or is the alignment assert too strict? Perhaps m5206 is not TARGET_CAS and should not compile this linux-atomic in the first place? (Is emit_library_call_value_1 missing a do_pending_stack_adjust() before NO_DEFER_POP? It does not seem relevant for this case though.)

Slightly simplified reproducer:

$ cat x.i ; echo EOF; # see libgcc/config/m68k/linux-atomic.c
unsigned char __attribute__ ((visibility ("hidden")))
__sync_val_compare_and_swap_1 (unsigned char *ptr, unsigned char soldval, unsigned char snewval)
{
  unsigned *wordptr = (unsigned *) ((unsigned long) ptr & ~3);
  unsigned int mask, shift, woldval, wnewval;
  unsigned oldval, newval, cmpval;
  shift = (((unsigned long) ptr & 3) << 3) ^ 24;
  mask = 0xffu << shift;
  woldval = (soldval & 0xffu) << shift;
  wnewval = (snewval & 0xffu) << shift;
  cmpval = *wordptr;
  do {
    oldval = cmpval;
    if ((oldval & mask) != woldval)
      break;
    newval = (oldval & ~mask) | wnewval;
    {
      register unsigned *a0 asm ("a0") = wordptr;
      register unsigned d2 asm ("d2") = oldval;
      register unsigned d1 asm ("d1") = newval;
      register unsigned d0 asm ("d0") = 335;
      asm volatile ("trap #0"
                    : "=r" (d0), "=r" (d1), "=r" (a0)
                    : "r" (d0), "r" (d1), "r" (d2), "r" (a0)
                    : "memory", "a1");
      cmpval = d0;
    }
  } while (__builtin_expect (oldval != cmpval, 0));
  return (oldval >> shift) & 0xffu;
}
_Bool __attribute__ ((visibility ("hidden")))
__sync_bool_compare_and_swap_1 (unsigned char *ptr, unsigned char oldval, unsigned char newval) { return (__sync_val_compare_and_swap_1 (ptr, oldval, newval) == oldval); } EOF /home/me/src/oe/openembedded-core/build/tmp-glibc/sysroots/x86_64-linux/usr/bin/m68k-oe-linux.gcc-cross-initial-m68k/m68k-oe-linux-gcc -mcpu=5206 --sysroot=/home/me/src/oe/openembedded-core/build/tmp-glibc/sysroots/qemum68k -O2 -pipe -g -feliminate-unused-debug-types -O2 -g -Os -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -fPIC -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -Dinhibit_libc -fPIC -I. -I. -I/home/me/src/oe/openembedded-core/build/tmp-glibc/work/m5206-oe-linux/libgcc-initial/4.9.1-r0/gcc-4.9.1/build.m68k-oe-linux.m68k-oe-linux/m68k-oe-linux/libgcc/../.././gcc -I/home/me/src/oe/openembedded-core/build/tmp-glibc/work-shared/gcc-4.9.1-r0/gcc-4.9.1/libgcc -I/home/me/src/oe/openembedded-core/build/tmp-glibc/work-shared/gcc-4.9.1-r0/gcc-4.9.1/libgcc/. -I/home/me/src/oe/openembedded-core/build/tmp-glibc/work-shared/gcc-4.9.1-r0/gcc-4.9.1/libgcc/../gcc -I/home/me/src/oe/openembedded-core/build/tmp-glibc/work-shared/gcc-4.9.1-r0/gcc-4.9.1/libgcc/../include -DHAVE_CC_TLS -o o.o -MT linux-atomic.i -MD -MP -MF linux-atomic.dep -c x.i -fvisibility=hidden -DHIDE_EXPORTS -v Using built-in specs. 
COLLECT_GCC=/home/me/src/oe/openembedded-core/build/tmp-glibc/sysroots/x86_64-linux/usr/bin/m68k-oe-linux.gcc-cross-initial-m68k/m68k-oe-linux-gcc Target: m68k-oe-linux Configured with: /home/me/src/oe/openembedded-core/build/tmp-glibc/work-shared/gcc-4.9.1-r0/gcc-4.9.1/configure --build=x86_64-linux --host=x86_64-linux --target=m68k-oe-linux --prefix=/home/me/src/oe/openembedded-core/build/tmp-glibc/sysroots/x86_64-linux/usr --exec_prefix=/home/me/src/oe/openembedded-core/build/tmp-glibc/sysroots/x86_64-linux/usr --bindir=/home/me/src/oe/openembedded-core/build/tmp-glibc/sysroots/x86_64-linux/usr/bin/m68k-oe-linux.gcc-cross-initial-m68k --sbindir=/home/me/src/oe/openembedded-core/build/tmp-glibc/sysroots/x86_64-linux/usr/bin/m68k-oe-linux.gcc-cross-initial-m68k --libexecdir=/home/me/src/oe/openembedded-core/build/tmp-glibc/sysroots/x86_64-linux/usr/libexec/m68k-oe-linux.gcc-cross-initial-m68k --datadir=/home/me/src/oe/openembedded-core/build/tmp-glibc/sysroots/x86_64-linux/usr/share --sysconfdir=/home/me/src/oe/openembedded-core/build/tmp-glibc/sysroots/x86_64-linux/etc --sharedstatedir=/home/me/src/oe/openembedded-core/build/tmp-glibc/sysroots/x86_64-linux/com --localstatedir=/home/me/src/oe/openembedded-core/build/tmp-glibc/sysroots/x86_64-linux/var
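The arithmetic behind the failing assert in the report above can be checked in isolation. The C sketch below is a stand-alone model, not GCC code: the function names are invented, the 4-byte boundary is inferred from the mask of 3 quoted in the report, and rounding each push to 2 bytes is my assumption about what the non-ColdFire PUSH_ROUNDING variant does.

```c
#include <assert.h>

/* Round a push of `bytes` up to a multiple of `unit` (a power of two). */
int push_rounding(int bytes, int unit)
{
    assert(unit > 0 && (unit & (unit - 1)) == 0);
    return (bytes + unit - 1) & ~(unit - 1);
}

/* Same condition as the gcc_assert: the delta must be a multiple of the
   preferred boundary, given here in bytes (a power of two). */
int stack_is_aligned(int stack_pointer_delta, int boundary_bytes)
{
    return (stack_pointer_delta & (boundary_bytes - 1)) == 0;
}

/* Bytes pushed for the three arguments of the 1-byte compare-and-swap:
   two QImode values and one 4-byte pointer. */
int cas1_args_delta(int push_unit)
{
    return push_rounding(1, push_unit)
         + push_rounding(1, push_unit)
         + push_rounding(4, push_unit);
}
```

With a push unit of 1 the delta is 6 and the check fails, matching the ICE; with a push unit of 2 the delta becomes 8 and the check passes, which is consistent with the question of whether PUSH_ROUNDING should go back to the !CF variant.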
Re: Bounded array type?
On 09/03/2014 05:20 PM, Joseph S. Myers wrote: On Wed, 3 Sep 2014, Florian Weimer wrote: On 09/02/2014 11:22 PM, James Nelson wrote: This is error-prone because even though a size parameter is given, the code in the function has no requirement to enforce it. With a bounded array type, the prototype looks like this: buf *foo(char buf[sz], size_t sz); GCC already has a syntax extension to support this: https://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html But the size declared in a parameter declaration has no semantic significance; there is no requirement that the pointer passed does point to an array of that size. I believe this was different with the bounded pointer extension. But I might misremember how things worked. I've never used it (I think), I only recall reading some documentation which has now vanished. If you declare the size as [static sz] then that means it points to an array of at least that size, but it could be larger. GCC does not seem to enforce that. This compiles without errors: int foo(char [static 5]); int bar(char *p) { return foo(p); } This could be -- Florian Weimer / Red Hat Product Security
Re: Bounded array type?
On Wed, 3 Sep 2014, Florian Weimer wrote:

If you declare the size as [static sz] then that means it points to an array of at least that size, but it could be larger. GCC does not seem to enforce that. This compiles without errors:

[static] is about optimization (but GCC doesn't optimize using it either). It's only undefined behavior if a call with a too-small array is actually executed.

int foo(char [static 5]);
int bar(char *p) { return foo(p); }

That's perfectly valid, as long as every call to bar is with an argument that does in fact point to at least 5 chars (if a call doesn't, there's undefined behavior when that call is executed).

--
Joseph S. Myers
jos...@codesourcery.com
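To make the [static 5] point concrete, here is a minimal C99 sketch. The names sum5 and forward are made up for illustration; only the foo/bar shape comes from the thread. It shows that the size is a promise made by the caller, not something the callee or the compiler is required to check.

```c
#include <stddef.h>

/* [static 5] is a promise made by every caller: buf points to the first
   element of an array with at least 5 elements.  It is not a runtime check. */
int sum5(const char buf[static 5])
{
    int total = 0;
    for (size_t i = 0; i < 5; i++)
        total += buf[i];
    return total;
}

/* Same shape as bar() in the thread: a plain pointer is forwarded without
   any size information.  This is valid as long as p really points to at
   least 5 chars whenever the call is executed. */
int forward(const char *p)
{
    return sum5(p);
}
```

Calling forward with a 5-element array is fine. Passing a shorter array would be undefined behavior only when that call is executed, which is the distinction drawn above.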
Re: Some questions about pass web
On Wed, Sep 3, 2014 at 1:29 AM, Steven Bosscher stevenb@gmail.com wrote: On Wed, Sep 3, 2014 at 1:35 AM, Carrot Wei wrote: 1. It is well known that register renaming is a big help to register allocation, but in gcc's backend, the web pass is far before RA, there are about 20 passes between them. Does it mean register renaming can also heavily benefit other optimizations? Yes - sometimes anyway. Most non-SSA data flow analyses can't look through pseudos that have multiple non-overlapping live ranges. Think constant/copy propagation, value numbering, etc. After passes that duplicate basic blocks, and not renaming registers, you get false dependencies that hide RA and scheduling opportunities. This is why pass_web is after code-duplication transformations like (RTL) loop unrolling but before the last RTL CPROP pass. The old RA couldn't do register renaming, so something before RA had to take care of it. Enter pass_web. But this is less relevant in GCC today, where RTL code transformations basically should be limited to simple local transformations, with the more difficult global transformations already done on GIMPLE (including live range splitting, part of out-of-SSA). On top of that, IRA knows how to do some forms of live range splitting (but not within loops, AFAIU, because a loop is a single region in IRA). If someone would give some TLC to the RTL loop-unroll.c code for IV-splitting/accumulator-expanding and make them enabled by default, I doubt pass_web would be doing much at all. And the passes between them usually don't generate more register renaming chances? Usually not. Most of them create new pseudos for newly inserted expressions. Some passes are actually harmed by pass_web. auto_inc_dec is one of those. 2. It looks current web pass can't handle AUTOINC expressions, a reg operand is used as both use and def reference in an AUTOINC expression, so this def side should not be renamed. 
Pass web doesn't explicitly check this case, so it may rename the reg operand of an AUTOINC expression. Is this expected because it runs before the auto_inc_dec pass? You already found the DF_REF_READ_WRITE bits. pass_web also handles match_dup constraints. That should be enough. If it is not, then please file a bug report (and feel free to assign it to me). Bug entry filed: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63156 . Debugging shows that DF_REF_READ_WRITE is not set for the operand of post_inc, so this may be a dataflow problem. Thanks, Guozhi Wei
Re: [RFC] Don't inline builtin memory functions when ASan is enabled.
On Tue, Sep 2, 2014 at 7:32 AM, Maxim Ostapenko m.ostape...@partner.samsung.com wrote: Hi, At the moment, most GCC builtin memory functions (for example strcpy, stpcpy, wcpcpy, strdup, etc.) are not instrumented by GCC, although some of them are rather dangerous. If GCC inlines these builtin functions, we miss important checks on their arguments, and possible overflows won't be detected. I know that the Clang ASan team simply disables inlining of builtin functions in Clang if -fsanitize=address is enabled and relies on libsanitizer's hooks. Correct, that's what we do. The main benefit of this approach is that we won't miss overflows in builtins, which can significantly increase the safety of target programs. Also, some redundant checks will be removed for builtin functions that are instrumented but are not inlined for some reason. The potential disadvantage of this approach is decreased performance for sanitized programs. Does disabling inlining of builtin functions look sane in this case? If yes, I can provide a performance investigation and prepare the patch. What do you think? -Maxim
Re: stack_pointer_delta related ICE in libgcc on 4.9.1
On 09/03/14 09:56, Bernhard Reutner-Fischer wrote:

Trying to bootstrap m68k I hit an assert in emit_library_call_value_1 that wants to ensure that the stack is aligned properly. PUSH_ROUNDING(GET_MODE_SIZE(QImode)) for m5206 is currently 1, so the testcase below has stack_pointer_delta = 1 + 1 + 4, but emit_library_call_value_1() has this:

/* Stack must be properly aligned now. */
gcc_assert (!(stack_pointer_delta & (PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT - 1)));

where 6 & 3 != 0, and it ICEs. I am not familiar with m68k so I would be glad for any help!

Should the arg be partial? Doesn't look like, no.

No. m68k doesn't pass args in registers.

Does m68k use stack save area? From the looks it doesn't.

No. m68k does not pass args in registers, I believe that's a requirement for needing a stack save area.

Is the alignment_pad for QImode arg wrong?

Possibly. Clearly if the stack is to be aligned to larger than a byte and PUSH_ROUNDING has no adjustment for QImode, then padding is needed somewhere. And both the caller and callee need to agree on the padding.

Should PUSH_ROUNDING be changed back to the !CF variant?

Possibly. It's unfortunate the CF chips do something different than other m68k variants here. The change in behaviour would seem to imply it's impossible to mix traditional m68k code with CF code, though perhaps nobody cares about that.

Or is the alignment assert too strict?

I don't think so.

Perhaps m5206 is not TARGET_CAS and should not compile this linux-atomic in the first place?

No, I don't think so. You might ping Jim Wilson or Richard Sandiford who have both done coldfire work in the past. I really don't have any experience with the coldfire bits.

(is emit_library_call_value_1 missing a do_pending_stack_adjust() before NO_DEFER_POP ? Does not seem relevant for this case though)

Unsure. I haven't done significant work on the m68k in decades, so the rules around defer_pop have long since been dropped from my memory. If you can describe why you think it might be missing it'd be helpful for evaluation.

My recommendation would be to file a bug report with the reproducer. m68k isn't nearly as important today as it has been in the past, so getting developer time to hash through how all this should work for the coldfire may be difficult.

jeff
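The arithmetic behind the failing assert can be sketched in a few lines of C. The constants and function names below are illustrative stand-ins, not the real target macros: the 32-bit boundary is chosen only because it yields the mask of 3 mentioned in the report, and the "round up to even" behaviour of the non-ColdFire PUSH_ROUNDING is an assumption that should be checked against m68k.h.

```c
#include <stdbool.h>

/* Illustrative constants, not the real target macros: a 32-bit preferred
   stack boundary gives the mask of 3 seen in the report. */
enum { SKETCH_BITS_PER_UNIT = 8, SKETCH_PREFERRED_STACK_BOUNDARY = 32 };

/* ColdFire-style rounding as described in the report: no adjustment,
   so a QImode push accounts for exactly 1 byte. */
unsigned push_rounding_cf(unsigned bytes)
{
    return bytes;
}

/* Assumed non-ColdFire rounding: round up to an even number of bytes. */
unsigned push_rounding_even(unsigned bytes)
{
    return (bytes + 1u) & ~1u;
}

/* The condition the assert requires to hold for stack_pointer_delta. */
bool stack_delta_is_aligned(unsigned delta)
{
    unsigned mask = SKETCH_PREFERRED_STACK_BOUNDARY / SKETCH_BITS_PER_UNIT - 1;
    return (delta & mask) == 0;
}
```

With two 1-byte pushes and one 4-byte push, the ColdFire-style rounding gives a delta of 6, which fails the check. The even rounding gives 2 + 2 + 4 = 8, which passes, and this is why reverting PUSH_ROUNDING is one of the candidate fixes raised above.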
Re: stack_pointer_delta related ICE in libgcc on 4.9.1
On 9/3/2014 1:24 PM, Jeff Law wrote:

On 09/03/14 09:56, Bernhard Reutner-Fischer wrote:

Trying to bootstrap m68k I hit an assert in emit_library_call_value_1 that wants to ensure that the stack is aligned properly. PUSH_ROUNDING(GET_MODE_SIZE(QImode)) for m5206 is currently 1, so the testcase below has stack_pointer_delta = 1 + 1 + 4, but emit_library_call_value_1() has this:

/* Stack must be properly aligned now. */
gcc_assert (!(stack_pointer_delta & (PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT - 1)));

where 6 & 3 != 0, and it ICEs. I am not familiar with m68k so I would be glad for any help!

Should the arg be partial? Doesn't look like, no.

No. m68k doesn't pass args in registers.

Does m68k use stack save area? From the looks it doesn't.

No. m68k does not pass args in registers, I believe that's a requirement for needing a stack save area.

Is the alignment_pad for QImode arg wrong?

Possibly. Clearly if the stack is to be aligned to larger than a byte and PUSH_ROUNDING has no adjustment for QImode, then padding is needed somewhere. And both the caller and callee need to agree on the padding.

FWIW, for stack alignment RTEMS does not distinguish between any m68k or Coldfire variant. The note says that what comes from the allocator is sufficiently aligned. And that's on a 4-byte boundary. My recollection is that this was selected in the m68020 days to avoid the penalty of unaligned accesses -- not to avoid faults. I don't recall if Coldfires fault or handle the unaligned accesses, but either way there is a penalty incurred and you want to avoid it.

Should PUSH_ROUNDING be changed back to the !CF variant?

Possibly. It's unfortunate the CF chips do something different than other m68k variants here.

If that gives you 4-byte stack alignment, then yes. I think the same stack alignment rules should apply.

The change in behaviour would seem to imply it's impossible to mix traditional m68k code with CF code, though perhaps nobody cares about that.

I would bet that myself also. I know we don't care. But we provide source and our users compile it themselves with the best options. :)

Or is the alignment assert too strict?

I don't think so.

Perhaps m5206 is not TARGET_CAS and should not compile this linux-atomic in the first place?

No, I don't think so.

Coldfire does not have the CAS instruction per http://www.freescale.com/files/dsp/doc/ref_manual/CFPRM.pdf

You might ping Jim Wilson or Richard Sandiford who have both done coldfire work in the past. I really don't have any experience with the coldfire bits.

(is emit_library_call_value_1 missing a do_pending_stack_adjust() before NO_DEFER_POP ? Does not seem relevant for this case though)

Unsure. I haven't done significant work on the m68k in decades, so the rules around defer_pop have long since been dropped from my memory. If you can describe why you think it might be missing it'd be helpful for evaluation.

My recommendation would be to file a bug report with the reproducer. m68k isn't nearly as important today as it has been in the past, so getting developer time to hash through how all this should work for the coldfire may be difficult.

jeff

--
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985
Re: selective linking of floating point support for *printf / *scanf
On 2 September 2014 16:28, Joseph S. Myers jos...@codesourcery.com wrote: On Tue, 2 Sep 2014, Joey Ye wrote: Apparently newlib is not following this specification very well, as there are symbols like _abc_r defined everywhere in current newlib. I am not implying the spec should not be followed, but is newlib designed to have a loose spec for the single underscore? Identifiers beginning with a single underscore are reserved with file scope. This means an application cannot provide an external definition of them, because such an external definition would have file scope. So it's fine for the implementation to define such identifiers and use them in the implementation of standard functions. Hmm, this throws up another question: how, in GNU C, do extensions interact with the putatively unchanged parts of the standard? If a user program defines an assembler name for a global function which is different from the name used in the source code, is that assembler name used at file scope? It would seem to me it's only used at global/link scope. As such, is the use of _[a-z].* as assembly names then part of the user namespace?
Re: selective linking of floating point support for *printf / *scanf
On Wed, 3 Sep 2014, Joern Rennecke wrote: On 2 September 2014 16:28, Joseph S. Myers jos...@codesourcery.com wrote: On Tue, 2 Sep 2014, Joey Ye wrote: Apparently newlib is not following this specification very well, as there are symbols like _abc_r defined everywhere in current newlib. I am not implying the spec should not be followed, but is newlib designed to have a loose spec for the single underscore? Identifiers beginning with a single underscore are reserved with file scope. This means an application cannot provide an external definition of them, because such an external definition would have file scope. So it's fine for the implementation to define such identifiers and use them in the implementation of standard functions. Hmm, this throws up another question: how, in GNU C, do extensions interact with the putatively unchanged parts of the standard? If a user program defines an assembler name for a global function which is different from the name used in the source code, is that assembler name used at file scope? It would seem to me it's only used at global/link scope. As such, is the use of _[a-z].* as assembly names then part of the user namespace? I see no reason a standard header shouldn't be able to define _[a-z] static functions at file scope, so I think it should be presumed that such names as assembly names are part of the implementation namespace. That's certainly the case for names such as _a.1 that GCC can generate for block-scope static variables called _a: if you generate such assembler names yourself, you risk conflicting with ones generated by GCC for block-scope statics. -- Joseph S. Myers jos...@codesourcery.com
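For readers unfamiliar with the extension being discussed, the following is a small sketch of a GNU C asm label. The names user_function, call_through_c_name and app_link_name are hypothetical. The point is only that the assembler-level name is chosen independently of the C identifier, so it never appears as an identifier at file scope in the C sense.

```c
/* GNU C extension: the asm label is attached to a declaration and sets the
   symbol name used in the object file.  "app_link_name" is a made-up name;
   it deliberately avoids the _[a-z] pattern discussed above. */
int user_function(void) __asm__("app_link_name");

int user_function(void)
{
    return 42;
}

/* C code keeps using the source-level identifier. */
int call_through_c_name(void)
{
    return user_function();
}
```

Had the label been something like _a.1 instead, it could collide with a name the compiler generates on its own, which is the risk pointed out above.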
Re: Enable EBX for x86 in 32bits PIC code
On 2014-08-29 2:47 AM, Ilya Enkovich wrote:

Seems your patch doesn't cover all cases. Attached is a modified patch (with your changes included) and a test where double constant is wrongly rematerialized. I also see in ira dump that there is still a copy of PIC reg created:

Initialization of original PIC reg:

(insn 23 22 24 2 (set (reg:SI 127) (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 3 bx) (nil)))

...

Copy is created:

(insn 135 37 25 3 (set (reg:SI 138 [127]) (reg:SI 127)) 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 127) (nil)))

...

Copy is used:

(insn 119 25 122 3 (set (reg:DF 134) (mem/u/c:DF (plus:SI (reg:SI 138 [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI ("*.LC0") [flags 0x2]) ] UNSPEC_GOTOFF))) [5 S8 A64])) 128 {*movdf_internal} (expr_list:REG_EQUIV (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11]) (nil)))

The copy is created by a newer IRA optimization for function prologues. The patch in the attachment should solve the problem. I also added the code to prevent spilling the pic pseudo in LRA, which could theoretically happen before.

After reload we have new usage of r127 which is allocated to ecx which actually does not have any definition in this function at all.

(insn 151 42 44 4 (set (reg:SI 0 ax [147]) (plus:SI (reg:SI 2 cx [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI ("*.LC0") [flags 0x2]) ] UNSPEC_GOTOFF test.cc:44 213 {*leasi} (expr_list:REG_EQUAL (symbol_ref/u:SI ("*.LC0") [flags 0x2]) (nil)))

(insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129]) (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (mem/u/c:DF (reg:SI 0 ax [147]) [5 S8 A64]))) test.cc:44 790 {*fop_df_comm_sse} (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11])) (nil)))

Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc

Index: ira.c
===================================================================
--- ira.c (revision 214576)
+++ ira.c (working copy)
@@ -4887,7 +4887,7 @@ split_live_ranges_for_shrink_wrap (void)
   FOR_BB_INSNS (first, insn)
     {
       rtx dest = interesting_dest_for_shprep (insn, call_dom);
-      if (!dest)
+      if (!dest || dest == pic_offset_table_rtx)
         continue;
 
       rtx newreg = NULL_RTX;
Index: lra-assigns.c
===================================================================
--- lra-assigns.c (revision 214576)
+++ lra-assigns.c (working copy)
@@ -879,11 +879,13 @@ spill_for (int regno, bitmap spilled_pse
         }
       /* Spill pseudos. */
       EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
-        if ((int) spill_regno >= lra_constraint_new_regno_start
-            && ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno)
-            && ! bitmap_bit_p (&lra_split_regs, spill_regno)
-            && ! bitmap_bit_p (&lra_subreg_reload_pseudos, spill_regno)
-            && ! bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno))
+        if ((pic_offset_table_rtx != NULL
+             && spill_regno == REGNO (pic_offset_table_rtx))
+            || ((int) spill_regno >= lra_constraint_new_regno_start
+                && ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno)
+                && ! bitmap_bit_p (&lra_split_regs, spill_regno)
+                && ! bitmap_bit_p (&lra_subreg_reload_pseudos, spill_regno)
+                && ! bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno)))
           goto fail;
       insn_pseudos_num = 0;
       if (lra_dump_file != NULL)
@@ -1053,7 +1055,9 @@ setup_live_pseudos_and_spill_after_risky
       return;
     }
   for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
-    if (reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
+    if ((pic_offset_table_rtx == NULL_RTX
+         || i != (int) REGNO (pic_offset_table_rtx))
+        && reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
       sorted_pseudos[n++] = i;
   qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func);
   for (i = n - 1; i >= 0; i--)
@@ -1360,6 +1364,8 @@ assign_by_spills (void)
             }
           EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno)
             {
+              gcc_assert (pic_offset_table_rtx == NULL
+                          || conflict_regno != REGNO (pic_offset_table_rtx));
               if ((int) conflict_regno >= lra_constraint_new_regno_start)
                 sorted_pseudos[nfails++] = conflict_regno;
               if (lra_dump_file != NULL)
Re: stack_pointer_delta related ICE in libgcc on 4.9.1
Joel Sherrill joel.sherr...@oarcorp.com writes: On 9/3/2014 1:24 PM, Jeff Law wrote: On 09/03/14 09:56, Bernhard Reutner-Fischer wrote: Perhaps m5206 is not TARGET_CAS and should not compile this linux-atomic in the first place? No, I don't think so. Coldfire does not have the CAS instruction per http://www.freescale.com/files/dsp/doc/ref_manual/CFPRM.pdf On Linux it uses a kernel helper for atomic operations. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Compare Elimination problems
For a 16 bit CPU the cmpelim pass is changing

(insn 33 84 85 6 (parallel [ (set (reg:HI 1 r1) (ashift:HI (reg:HI 1 r1) (const_int 1 [0x1]))) (clobber (reg:CC_NOOV 7 flags)) ]) ../gcc/testsuite/gcc.c-torture/execute/960311-3.c:18 33 {ashlhi3}

(insn 34 87 35 6 (set (reg:CC_NOOV 7 flags) (compare:CC_NOOV (reg:SI 0 r0) (const_int 0 [0]))) ../gcc/testsuite/gcc.c-torture/execute/960311-3.c:20 39 {*comparesi3_nov}

(jump_insn 35 34 36 6 (set (pc) (if_then_else (ge (reg:CC_NOOV 7 flags)

to

(insn 33 84 85 6 (parallel [ (set (reg:HI 1 r1) (ashift:HI (reg:HI 1 r1) (const_int 1 [0x1]))) (set (reg:CC_NOOV 7 flags) (compare:CC_NOOV (ashift:HI (reg:HI 1 r1) (const_int 1 [0x1])) (const_int 0 [0]))) ]) ../gcc/testsuite/gcc.c-torture/execute/960311-3.c:18 29 {ashlhi3_cc}

(jump_insn 35 87 36 6 (set (pc) (if_then_else (ge (reg:CC_NOOV 7 flags)

(reg:HI r1) is a subreg of (reg:SI r0); however, cmpelim seems to be substituting the compare of (reg:HI r1 and 0) for the compare of (reg:SI r0 and 0)?

While I'm here, in i386.md some of the flag setting operations specify a mode and some don't. E.g.

(define_expand "cmp<mode>_1"
  [(set (reg:CC FLAGS_REG)
        (compare:CC (match_operand:SWI48 0 "nonimmediate_operand")

(define_insn "*add<mode>_3"
  [(set (reg FLAGS_REG)
        (compare

Can anyone explain the significance of this?

Thanks, Paul.
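Why the substitution matters can be illustrated with a small C sketch. It assumes, purely for illustration, that r1 holds the low 16 bits of the 32-bit value kept in the r0/r1 pair; the report does not say which half r1 is. All function names are made up. The sketch shows that a "ge 0" test on the shifted 16-bit half can disagree with a "ge 0" test on the full 32-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/* "ge 0" on a 32-bit value: sign bit (bit 31) clear. */
bool ge_zero_si(uint32_t value)
{
    return (value & UINT32_C(0x80000000)) == 0;
}

/* "ge 0" on a 16-bit value: sign bit (bit 15) clear. */
bool ge_zero_hi(uint16_t value)
{
    return (value & UINT16_C(0x8000)) == 0;
}

/* Shift only the low half left by one, as the HImode ashift does. */
uint16_t shift_low_half(uint16_t low)
{
    return (uint16_t)(low << 1);
}

/* Rebuild the 32-bit value from its two 16-bit halves. */
uint32_t combine(uint16_t high, uint16_t low)
{
    return ((uint32_t)high << 16) | low;
}
```

With a high half of 0 and a low half of 0x4000, the shifted low half is 0x8000. The 16-bit test then reports "negative" while the 32-bit value 0x00008000 is still non-negative, so flags taken from the HImode shift cannot stand in for the SImode compare under this assumption.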
gcc-4.9-20140903 is now available
Snapshot gcc-4.9-20140903 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20140903/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.9 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch revision 214893 You'll find: gcc-4.9-20140903.tar.bz2 Complete GCC MD5=24dfd67139fda4746d2deff18182611d SHA1=d5bf2b1fba133bef433d459a3add44fec262ab20 Diffs from 4.9-20140827 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.9 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
PATCH for Re: New GCC mirror
On Fri, 29 Aug 2014, ConcertPass Mirrors Admin wrote:

we set up a new GCC mirror for the community.

URL: http://mirrors.concertpass.com/gcc/
Organization/Contact: ConcertPass (ad...@mirrors.concertpass.com)
Location: United States, Michigan

Please, add it to your mirror list page.

Done thusly.

Note that your page claims this to be a "Sudo mirror"; you may want to make that read "GCC mirror", or better "gcc.gnu.org mirror". (Out of curiosity, this is not for the sake of search engine optimization, is it?)

Gerald

Index: mirrors.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/mirrors.html,v
retrieving revision 1.226
diff -u -r1.226 mirrors.html
--- mirrors.html 28 Jul 2014 23:02:56 - 1.226
+++ mirrors.html 31 Aug 2014 11:14:17 -
@@ -57,6 +57,7 @@
 <a href="http://mirrors-usa.go-parts.com/gcc/">http://mirrors-usa.go-parts.com/gcc/</a> |
 <a href="ftp://mirrors-usa.go-parts.com/gcc">ftp://mirrors-usa.go-parts.com/gcc/</a> |
 <a href="rsync://mirrors-usa.go-parts.com/gcc">rsync://mirrors-usa.go-parts.com/gcc/</a></li>
+<li>US, Michigan: <a href="http://mirrors.concertpass.com/gcc/">http://mirrors.concertpass.com/gcc/</a>, thanks to ad...@mirrors.concertpass.com.</li>
 </ul>
 
 <p>The archives there will be signed by one of the following GnuPG keys:</p>
[Bug target/62308] A bug with aarch64 big-endian
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62308

Yvan Roux <yroux at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org,
                   |                            |yroux at gcc dot gnu.org

--- Comment #4 from Yvan Roux <yroux at gcc dot gnu.org> ---
Yes, and there is no issue when we use reload instead of LRA (flag -mno-lra).
[Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

Zhenqiang Chen <zhenqiang.chen at arm dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zhenqiang.chen at arm dot com

--- Comment #20 from Zhenqiang Chen <zhenqiang.chen at arm dot com> ---
Here is a small case to show lra introduces one more register copy (tested with trunk and 4.9).

int isascii (int c)
{
  return c >= 0 && c < 128;
}

With options: -Os -mthumb -mcpu=cortex-m0, I got

isascii:
        mov     r3, #0
        mov     r2, #127
        mov     r1, r3          //???
        cmp     r2, r0
        adc     r1, r1, r3
        mov     r0, r1
        bx      lr

With options: -Os -mthumb -mcpu=cortex-m0 -mno-lra, I got

isascii:
        mov     r2, #127
        mov     r3, #0
        cmp     r2, r0
        adc     r3, r3, r3
        mov     r0, r3
        bx      lr
[Bug libstdc++/55409] std::list not properly wrapping access to custom allocator through allocator_traits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55409 --- Comment #7 from Freddie Chopin freddie_chopin at op dot pl --- Great (; Do you have some timeline? I'm not trying to rush you - I'm just working on a project in which this feature would be beneficial, so I'm wondering whether I should wait a bit (this particular requirement is not top-priority) or maybe just implement the allocator the old way for now. Thanks in advance!
[Bug middle-end/61848] [5 Regression] a previous declaration causes the section attribute to be lost
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61848 --- Comment #8 from Andrey Ryabinin ryabinin.a.a at gmail dot com --- Hi, may I ask what the status of this is? Besides section mismatches in the Linux kernel, it also breaks the kernel's modules. The variable __this_module doesn't get into the section .gnu.linkonce.this_module, so the module refuses to load.
[Bug fortran/61881] ICE in gfc_conv_intrinsic_to_class with assumed-rank CLASS(*)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61881

--- Comment #6 from Tobias Burnus <burnus at gcc dot gnu.org> ---
Author: burnus
Date: Wed Sep 3 06:41:37 2014
New Revision: 214843

URL: https://gcc.gnu.org/viewcvs?rev=214843&root=gcc&view=rev
Log:
Missed that file in r213079 of 2014-07-26

2014-09-03  Tobias Burnus  <bur...@net-b.de>

        PR fortran/61881
        PR fortran/61888
        PR fortran/57305
        * gfortran.dg/sizeof_4.f90: New.

Added:
    trunk/gcc/testsuite/gfortran.dg/sizeof_4.f90
Modified:
    trunk/gcc/testsuite/ChangeLog
[Bug fortran/61888] Wrong results with SIZEOF and assumed-rank arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61888

--- Comment #3 from Tobias Burnus <burnus at gcc dot gnu.org> ---
Author: burnus
Date: Wed Sep 3 06:41:37 2014
New Revision: 214843

URL: https://gcc.gnu.org/viewcvs?rev=214843&root=gcc&view=rev
Log:
Missed that file in r213079 of 2014-07-26

2014-09-03  Tobias Burnus  <bur...@net-b.de>

        PR fortran/61881
        PR fortran/61888
        PR fortran/57305
        * gfortran.dg/sizeof_4.f90: New.

Added:
    trunk/gcc/testsuite/gfortran.dg/sizeof_4.f90
Modified:
    trunk/gcc/testsuite/ChangeLog
[Bug fortran/57305] [OOP] ICE when calling SIZEOF on an unlimited polymorphic variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57305

--- Comment #17 from Tobias Burnus <burnus at gcc dot gnu.org> ---
Author: burnus
Date: Wed Sep 3 06:41:37 2014
New Revision: 214843

URL: https://gcc.gnu.org/viewcvs?rev=214843&root=gcc&view=rev
Log:
Missed that file in r213079 of 2014-07-26

2014-09-03  Tobias Burnus  <bur...@net-b.de>

        PR fortran/61881
        PR fortran/61888
        PR fortran/57305
        * gfortran.dg/sizeof_4.f90: New.

Added:
    trunk/gcc/testsuite/gfortran.dg/sizeof_4.f90
Modified:
    trunk/gcc/testsuite/ChangeLog
[Bug target/62663] m68k / coldfire : compiling with -msep-data breaks the code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62663 --- Comment #4 from Andreas Schwab sch...@linux-m68k.org --- Then this is most likely a linker bug, not setting up the GOT correctly.
[Bug fortran/63152] New: needless initialization of local pointer arrays.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63152

            Bug ID: 63152
           Summary: needless initialization of local pointer arrays.
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Joost.VandeVondele at mat dot ethz.ch

I've noticed that for this code:

SUBROUTINE S1()
  INTEGER, POINTER, DIMENSION(:) :: v
  INTERFACE
    SUBROUTINE foo(v)
      INTEGER, POINTER, DIMENSION(:) :: v
    END SUBROUTINE
  END INTERFACE
  CALL foo(v)
END SUBROUTINE S1

gfortran initializes the pointer (to zero) even if '-fno-init-local-zero':

s1 ()
{
  struct array1_integer(kind=4) v;

  v.data = 0B;
  foo (&v);
}

I don't think this is mandated (other compilers don't do it). I'm working on a patch.
[Bug fortran/63152] needless initialization of local pointer arrays.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63152

Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-09-03
                 CC|                            |Joost.VandeVondele at mat dot ethz.ch
           Assignee|unassigned at gcc dot gnu.org |Joost.VandeVondele at mat dot ethz.ch
     Ever confirmed|0                           |1

--- Comment #1 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> ---
Working on a patch.
[Bug c++/62224] [4.9 Regression] Possible regression in gcc-4.9-20140820
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62224

Markus Trippelsdorf <trippels at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #11 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
Here's a small testcase:

markus@x4 tmp % cat cppcodemodelinspectordialog.ii
namespace CppTools {
class A {
public:
  virtual void headerPaths () = 0;
};
namespace Internal {
class CppModelManager : CppTools::A {
  void headerPaths () { ensureUpdated (); }
  void ensureUpdated ();
};
}
}
CppTools::A *a;
void fn1 () { a->headerPaths (); }

(before r214208)
markus@x4 tmp % g++ -Wl,--no-undefined -shared -fPIC -O2 cppcodemodelinspectordialog.ii
markus@x4 tmp %

(after r214208)
markus@x4 tmp % g++ -Wl,--no-undefined -shared -fPIC -O2 cppcodemodelinspectordialog.ii
/tmp/ccMZQE0g.o:cppcodemodelinspectordialog.ii:function fn1(): error: undefined reference to 'CppTools::Internal::CppModelManager::ensureUpdated()'
/tmp/ccMZQE0g.o:cppcodemodelinspectordialog.ii:function CppTools::Internal::CppModelManager::headerPaths(): error: undefined reference to 'CppTools::Internal::CppModelManager::ensureUpdated()'
collect2: error: ld returned 1 exit status

(one can use -fno-devirtualize-speculatively as a workaround)
markus@x4 tmp % g++ -Wl,--no-undefined -fno-devirtualize-speculatively -shared -fPIC -O2 cppcodemodelinspectordialog.ii
markus@x4 tmp %
[Bug fortran/63153] New: pointers are not nullified with -finit-local-zero
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63153

            Bug ID: 63153
           Summary: pointers are not nullified with -finit-local-zero
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Joost.VandeVondele at mat dot ethz.ch

Scalar pointers are not nullified with -finit-local-zero. After the fix for PR63152, arrays with the pointer attribute might need this as well.

cat bug.f90
SUBROUTINE S1()
  INTEGER, POINTER :: w
  IF (ASSOCIATED(w)) CALL ABORT()
END SUBROUTINE S1

gfortran -fdump-tree-original -finit-local-zero -g -c bug.f90
cat bug.f90.003t.original
s1 ()
{
  integer(kind=4) * w;

  if (w != 0B)
    {
      _gfortran_abort ();
    }
  L.1:;
}
[Bug target/61330] [5 Regression] Thumb ICE for case 920507-1.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61330

--- Comment #9 from Yvan Roux <yroux at gcc dot gnu.org> ---
Author: yroux
Date: Wed Sep 3 07:23:01 2014
New Revision: 214847

URL: https://gcc.gnu.org/viewcvs?rev=214847&root=gcc&view=rev
Log:
gcc/
2014-09-03  Yvan Roux  <yvan.r...@linaro.org>

        Backport from trunk r214526.
        2014-08-26  Joseph Myers  <jos...@codesourcery.com>

        PR target/60606
        PR target/61330
        * varasm.c (make_decl_rtl): Clear DECL_ASSEMBLER_NAME and
        DECL_HARD_REGISTER and return for invalid register specifications.
        * cfgexpand.c (expand_one_var): If expand_one_hard_reg_var clears
        DECL_HARD_REGISTER, call expand_one_error_var.
        * config/arm/arm.c (arm_hard_regno_mode_ok): Do not allow
        CC_REGNUM with non-MODE_CC modes.
        (arm_regno_class): Return NO_REGS for PC_REGNUM.

gcc/testsuite/
2014-09-03  Yvan Roux  <yvan.r...@linaro.org>

        Backport from trunk r214526.
        2014-08-26  Joseph Myers  <jos...@codesourcery.com>

        PR target/60606
        PR target/61330
        * gcc.dg/torture/pr60606-1.c, gcc.target/arm/pr60606-2.c,
        gcc.target/arm/pr60606-3.c, gcc.target/arm/pr60606-4.c: New tests.

Added:
    branches/linaro/gcc-4_9-branch/gcc/testsuite/gcc.dg/torture/pr60606-1.c
    branches/linaro/gcc-4_9-branch/gcc/testsuite/gcc.target/arm/pr60606-2.c
    branches/linaro/gcc-4_9-branch/gcc/testsuite/gcc.target/arm/pr60606-3.c
    branches/linaro/gcc-4_9-branch/gcc/testsuite/gcc.target/arm/pr60606-4.c
Modified:
    branches/linaro/gcc-4_9-branch/gcc/ChangeLog.linaro
    branches/linaro/gcc-4_9-branch/gcc/cfgexpand.c
    branches/linaro/gcc-4_9-branch/gcc/config/arm/arm.c
    branches/linaro/gcc-4_9-branch/gcc/testsuite/ChangeLog.linaro
    branches/linaro/gcc-4_9-branch/gcc/varasm.c
[Bug target/60606] [ARM] ICE with asm (mov ..., pc)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60606

--- Comment #9 from Yvan Roux <yroux at gcc dot gnu.org> ---
Author: yroux
Date: Wed Sep 3 07:23:01 2014
New Revision: 214847

URL: https://gcc.gnu.org/viewcvs?rev=214847&root=gcc&view=rev
Log:
gcc/
2014-09-03  Yvan Roux  <yvan.r...@linaro.org>

        Backport from trunk r214526.
        2014-08-26  Joseph Myers  <jos...@codesourcery.com>

        PR target/60606
        PR target/61330
        * varasm.c (make_decl_rtl): Clear DECL_ASSEMBLER_NAME and
        DECL_HARD_REGISTER and return for invalid register specifications.
        * cfgexpand.c (expand_one_var): If expand_one_hard_reg_var clears
        DECL_HARD_REGISTER, call expand_one_error_var.
        * config/arm/arm.c (arm_hard_regno_mode_ok): Do not allow
        CC_REGNUM with non-MODE_CC modes.
        (arm_regno_class): Return NO_REGS for PC_REGNUM.

gcc/testsuite/
2014-09-03  Yvan Roux  <yvan.r...@linaro.org>

        Backport from trunk r214526.
        2014-08-26  Joseph Myers  <jos...@codesourcery.com>

        PR target/60606
        PR target/61330
        * gcc.dg/torture/pr60606-1.c, gcc.target/arm/pr60606-2.c,
        gcc.target/arm/pr60606-3.c, gcc.target/arm/pr60606-4.c: New tests.

Added:
    branches/linaro/gcc-4_9-branch/gcc/testsuite/gcc.dg/torture/pr60606-1.c
    branches/linaro/gcc-4_9-branch/gcc/testsuite/gcc.target/arm/pr60606-2.c
    branches/linaro/gcc-4_9-branch/gcc/testsuite/gcc.target/arm/pr60606-3.c
    branches/linaro/gcc-4_9-branch/gcc/testsuite/gcc.target/arm/pr60606-4.c
Modified:
    branches/linaro/gcc-4_9-branch/gcc/ChangeLog.linaro
    branches/linaro/gcc-4_9-branch/gcc/cfgexpand.c
    branches/linaro/gcc-4_9-branch/gcc/config/arm/arm.c
    branches/linaro/gcc-4_9-branch/gcc/testsuite/ChangeLog.linaro
    branches/linaro/gcc-4_9-branch/gcc/varasm.c
[Bug c++/62224] [4.9 Regression] Possible regression in gcc-4.9-20140820
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62224 --- Comment #12 from Chris Clayton chris2553 at googlemail dot com --- Sorry, you'll have to stick with me here while I figure out what that means. I think you are saying that prior to r214208, the symbols definedMacros() and headerPaths() were present but effectively no-ops. Post r214208 they now contain operations including calls to ensureUpdated(). Given that the symbol for ensureUpdated() appears to be present in libCppTools.so (along with the symbols for its two post-r214208 callers), does that suggest a problem with the linker, which is /usr/bin/ld from the latest version (2.24) of binutils? Or could it be anything to do with my system being a 32bit userspace on a 64bit kernel? I usually build packages as rpms and have the rpm binary wrapped in a script which prefixes the call to the actual rpm binary with setarch i386. I've been careful whilst investigating this problem to make sure that I prefix calls to qmake and make with setarch i386. I've built loads and loads of packages with this setup (including gcc). I'm just trying to figure out the next port of call with this problem. I note that the Debian folks have a bug logged but seem to be waiting on resolution via this bug report - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=759862.
[Bug c++/62224] [4.9 Regression] Possible regression in gcc-4.9-20140820
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62224 --- Comment #13 from Markus Trippelsdorf trippels at gcc dot gnu.org --- (In reply to Chris Clayton from comment #12)
> Sorry, you'll have to stick with me here while I figure out what that means. I think you are saying that prior to r214208, the symbols definedMacros() and headerPaths() were present but effectively no-ops. Post r214208 they now contain operations including calls to ensureUpdated(). Given that the symbol for ensureUpdated() appears to be present in libCppTools.so (along with the symbols for its two post-r214208 callers), does that suggest a problem with the linker, which is /usr/bin/ld from the latest version (2.24) of binutils?
No. This has nothing to do with libCppTools.so. As I wrote before, the build system of qt-creator must be changed to provide the missing symbol by simply adding cppmodelmanager.o to the libCppEditor.so link command.
[Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535 --- Comment #21 from Fredrik Hederstierna fredrik.hederstie...@securitas-direct.com --- I filed this previously, maybe its duplicate Bug 61578 - Code size increase for ARM thumb compared to 4.8.x when compiling with -Os BR Fredrik
[Bug target/62662] [4.9/5 Regression] Miscompilation of Qt on s390x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62662 --- Comment #5 from Andreas Krebbel krebbel at gcc dot gnu.org --- (In reply to Jakub Jelinek from comment #4) I agree that this is something we need to fix in the back-end. I was just curious about when this surfaced first and keep that info for the records.
[Bug bootstrap/61078] [5 Regression] ESA mode bootstrap failure since r209897
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61078 --- Comment #8 from Andreas Krebbel krebbel at gcc dot gnu.org --- Author: krebbel Date: Wed Sep 3 08:06:09 2014 New Revision: 214850 URL: https://gcc.gnu.org/viewcvs?rev=214850root=gccview=rev Log: 2014-09-03 Andreas Krebbel andreas.kreb...@de.ibm.com PR target/61078 * config/s390/s390.md (*negdi2_31): Add s390_split_ok_p check and add a second splitter to handle the remaining cases. 2014-09-03 Andreas Krebbel andreas.kreb...@de.ibm.com PR target/61078 * gcc.target/s390/pr61078.c: New testcase. Added: trunk/gcc/testsuite/gcc.target/s390/pr61078.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/s390/s390.md trunk/gcc/testsuite/ChangeLog
[Bug fortran/63152] needless initialization of local pointer arrays.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63152 Joost VandeVondele Joost.VandeVondele at mat dot ethz.ch changed: What|Removed |Added URL||https://gcc.gnu.org/ml/fort ||ran/2014-09/msg00016.html --- Comment #2 from Joost VandeVondele Joost.VandeVondele at mat dot ethz.ch --- WIP patch at URL
[Bug bootstrap/61078] [5 Regression] ESA mode bootstrap failure since r209897
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61078 Andreas Krebbel krebbel at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #9 from Andreas Krebbel krebbel at gcc dot gnu.org --- Fixed per comment 8
[Bug middle-end/61654] [4.9/5 Regression] ICE in release_function_body, at cgraph.c:1699
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61654 Martin Jambor jamborm at gcc dot gnu.org changed: What|Removed |Added Known to fail|4.10.0 |5.0 --- Comment #12 from Martin Jambor jamborm at gcc dot gnu.org --- I have proposed a fix on the mailing list: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00207.html
[Bug ipa/61986] ICE on valid code at -O3 on x86_64-linux-gnu in decide_about_value, at ipa-cp.c:3480
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61986 --- Comment #2 from Martin Jambor jamborm at gcc dot gnu.org --- I have proposed a fix on the mailing list: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00209.html
[Bug ipa/62015] [4.8/4.9/5 Regression] ipa-cp-clone uses a clone that is too specialized for the call context
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62015 --- Comment #3 from Martin Jambor jamborm at gcc dot gnu.org --- I have proposed a fix on the mailing list: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00210.html
[Bug regression/63150] [4.9/5 regression] FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times *
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63150 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Known to work||4.8.3 Target Milestone|--- |4.9.2 Summary|[4.9 regression] FAIL: |[4.9/5 regression] FAIL: |gcc.target/powerpc/pr53199. |gcc.target/powerpc/pr53199. |c scan-assembler-times *|c scan-assembler-times *
[Bug tree-optimization/63148] r187042 causes auto-vectorization failure for X86 for -m32.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63148 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #1 from Richard Biener rguenth at gcc dot gnu.org --- This has been fixed on the 4.8 branch already, I think this is a duplicate of PR60276. *** This bug has been marked as a duplicate of bug 60276 ***
[Bug tree-optimization/60276] [4.7 Regression] -O3 autovectorizer breaks on a particular loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60276 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added CC||doug.gilmore at imgtec dot com --- Comment #15 from Richard Biener rguenth at gcc dot gnu.org --- *** Bug 63148 has been marked as a duplicate of this bug. ***
[Bug c++/63140] wrong code generation probably due to optimization problem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63140 --- Comment #3 from Richard Biener rguenth at gcc dot gnu.org --- You might want to try -fsanitize=undefined and/or -fno-strict-overflow as it sounds like you may be invoking undefined behavior.
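For context on why -fno-strict-overflow can change behavior: signed integer overflow is undefined in C and C++, so the optimizer may assume it never happens. The sketch below is a hypothetical illustration only (the helper name add_would_overflow is made up and nothing here is taken from the reporter's code); it shows an overflow test written so that it never performs the overflowing addition itself.

```cpp
#include <climits>

// Hypothetical helper: reports whether a + b would overflow int,
// without ever evaluating a + b (which would be undefined on overflow).
bool add_would_overflow(int a, int b) {
    if (b > 0) {
        return a > INT_MAX - b;  // INT_MAX - b cannot overflow when b > 0
    }
    return a < INT_MIN - b;      // INT_MIN - b cannot overflow when b <= 0
}
```

A check written as `a + b < a` relies on wrap-around and is the kind of code whose behavior can differ between -O2 and -O2 -fno-strict-overflow.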
[Bug c++/62224] [4.9 Regression] Possible regression in gcc-4.9-20140820
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62224 --- Comment #14 from Markus Trippelsdorf trippels at gcc dot gnu.org --- (In reply to Markus Trippelsdorf from comment #13)
> (In reply to Chris Clayton from comment #12)
> > Sorry, you'll have to stick with me here while I figure out what that means. I think you are saying that prior to r214208, the symbols definedMacros() and headerPaths() were present but effectively no-ops. Post r214208 they now contain operations including calls to ensureUpdated(). Given that the symbol for ensureUpdated() appears to be present in libCppTools.so (along with the symbols for its two post-r214208 callers), does that suggest a problem with the linker, which is /usr/bin/ld from the latest version (2.24) of binutils?
> No. This has nothing to do with libCppTools.so. As I wrote before, the build system of qt-creator must be changed to provide the missing symbol by simply adding cppmodelmanager.o to the libCppEditor.so link command.
Out of curiosity, I have downloaded and tried to build qt-creator-3.2.0. The build failed exactly as you described in comment 0. The fix is simple, just add __attribute__ ((visibility ("default"))) to CppModelManager::ensureUpdated() in src/plugins/cpptools/cppmodelmanager.cpp:
    __attribute__ ((visibility ("default")))
    void CppModelManager::ensureUpdated()
    {
This will make _ZN8CppTools8Internal15CppModelManager13ensureUpdatedEv external for libCppTools.so and everything is fine.
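The effect of the attribute can be sketched in isolation. The example below is a minimal illustration, not code from qt-creator: the namespace demo and the class ModelManager are hypothetical stand-ins, and it assumes GCC or Clang on an ELF target. When a shared library is compiled with -fvisibility=hidden, a member marked visibility ("default") should still be exported, while a hidden one stays internal to the library.

```cpp
// Hypothetical stand-in for a class whose member must be callable from
// another shared library (names are illustrative, not from qt-creator).
namespace demo {

struct ModelManager {
    // Exported even when the library is compiled with -fvisibility=hidden.
    __attribute__((visibility("default"))) int ensureUpdated();

    // Explicitly hidden: usable inside the library, not exported from it.
    __attribute__((visibility("hidden"))) int internalHelper();

    int updates = 0;
};

int ModelManager::internalHelper() { return ++updates; }

// The exported entry point forwards to the hidden helper.
int ModelManager::ensureUpdated() { return internalHelper(); }

} // namespace demo
```

Building this into a shared object with g++ -shared -fPIC -fvisibility=hidden and listing the dynamic symbols with nm -D is one way to check which of the two members ends up exported.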
[Bug testsuite/53155] Not parallel: test for -j fails with new make
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53155 Joost VandeVondele Joost.VandeVondele at mat dot ethz.ch changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-09-03 CC||Joost.VandeVondele at mat dot ethz ||.ch Ever confirmed|0 |1 --- Comment #4 from Joost VandeVondele Joost.VandeVondele at mat dot ethz.ch --- still fails. Honestly, this made contributing my first patches much slower, as testing took ages to complete.
[Bug c++/63140] wrong code generation probably due to optimization problem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63140 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #4 from Jakub Jelinek jakub at gcc dot gnu.org --- Or -fno-aggressive-loop-optimizations. From your description it is hard to figure what exactly to look for in the assembly, so e.g. bisecting compiler where it stopped working is not easy.
[Bug tree-optimization/55334] [4.8/4.9/5 Regression] mgrid regression (ipa-cp disables vectorization)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55334 --- Comment #39 from Martin Jambor jamborm at gcc dot gnu.org --- (In reply to Vidya Praveen from comment #38)
> Until we fix this issue, could we have workaround posted by Martin Jambor (comment #29) applied again on 4.9 and trunk?
No, not on trunk please, as I said on IRC yesterday. Before we even consider this for the 4.9 branch, please verify that inlining does not cause the same problems with the benchmark (on the particular architecture you care for). It is certainly capable of doing that and we certainly do not want to switch inlining off :-)
[Bug tree-optimization/49444] IV-OPTs changes an unaligned loads into aligned loads incorrectly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49444 bin.cheng amker.cheng at gmail dot com changed: What|Removed |Added CC||amker.cheng at gmail dot com --- Comment #8 from bin.cheng amker.cheng at gmail dot com --- This should be fixed on trunk now, at least as of r211210 and r214864. For Andrew's test, the generated mips assembly for the kernel loop is as below.
$L3:
    lwl $5,1($16)
    lwl $4,5($16)
    lwl $3,9($16)
    lwr $5,4($16)
    lwr $4,8($16)
    lwr $3,12($16)
    lw $2,%gp_rel(ss)($28)
    addiu $16,$16,13
    sw $5,0($2)
    sw $4,4($2)
    jal g
    sw $3,8($2)
    bne $16,$17,$L3
    move $2,$0
For Richard's case (with an explicit conversion when calling foo), the generated mips assembly is as below.
foo:
    .frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
    .mask 0x,0
    .fmask 0x,0
    .set noreorder
    .set nomacro
    lwl $2,0($4)
    nop
    lwr $2,3($4)
    j $31
    nop
    .set macro
    .set reorder
    .end foo
    .size foo, .-foo
Apparently, lwl/lwr are generated for unaligned memory access. Thanks, bin
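For readers less familiar with the issue: lwl/lwr is the MIPS instruction pair for loading a word from an address that may not be word-aligned, whereas a plain lw requires alignment. The sketch below is a hypothetical illustration and not the test case attached to this bug; the function names are made up. It shows a portable way to express an unaligned 32-bit access in source code, which a compiler may lower to an lwl/lwr pair on targets without hardware unaligned loads.

```cpp
#include <cstdint>
#include <cstring>

// Load a 32-bit value from an address with no alignment guarantee.
// memcpy tells the compiler nothing about alignment, so it must not
// assume the pointer is suitable for an aligned word load.
std::uint32_t load_u32_unaligned(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Store counterpart, again without any alignment assumption.
void store_u32_unaligned(unsigned char* p, std::uint32_t v) {
    std::memcpy(p, &v, sizeof v);
}
```

Casting the byte pointer to std::uint32_t* and dereferencing it instead would promise an alignment that the data may not have.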
[Bug tree-optimization/55334] [4.8/4.9/5 Regression] mgrid regression (ipa-cp disables vectorization)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55334 --- Comment #40 from rguenther at suse dot de rguenther at suse dot de --- On Wed, 3 Sep 2014, jamborm at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55334
> --- Comment #39 from Martin Jambor jamborm at gcc dot gnu.org ---
> (In reply to Vidya Praveen from comment #38)
> > Until we fix this issue, could we have workaround posted by Martin Jambor (comment #29) applied again on 4.9 and trunk?
> No, not on trunk please. As I said on IRC yesterday. Before we even consider this for the 4.9 branch, please verify that inlining does not cause the same problems with the benchmark (on the particular architecture you care for). It is certainly capable of doing that and we certainly do not want to switch inlining off :-)
Inlining will certainly cause the same problem.
[Bug c++/63140] wrong code generation probably due to optimization problem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63140 --- Comment #5 from Ralf Hoffmann gcc at boomerangsworld dot de --- Thanks for the feedback, I am also suspecting I have some problem in my code regarding undefined behavior. What I do for testing is to compile my tool Worker (http://www.boomerangsworld.de/cms/worker/index.html, version 3.5.0) with
    make clean
    LDFLAGS=-fsanitize=undefined CPPFLAGS=-fsanitize=undefined ./configure
    make
and then start the program (src/worker), click on top left A button for the about dialog and click on the down arrow to scroll down the option list. It then either works, or the process hangs in the endless loop. I tried to use -fsanitize=undefined and it actually makes a difference. There is no compiler output pointing out some problem and also no runtime output when reaching the test point mentioned above. But with this option, it behaves normally and the endless loop does not occur. When using the options -fno-strict-overflow or -fno-aggressive-loop-optimizations the problem still occurs. I would like to help bisecting the compiler if you could give me a hint where to start. As far as I see, there is no git repo which would make it easier.
[Bug tree-optimization/49444] IV-OPTs changes an unaligned loads into aligned loads incorrectly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49444 --- Comment #9 from Richard Biener rguenth at gcc dot gnu.org --- Thus dup of PR61320?
[Bug c/62024] __atomic_always_lock_free is not a constant expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62024 --- Comment #8 from Marek Polacek mpolacek at gcc dot gnu.org --- Author: mpolacek Date: Wed Sep 3 11:16:29 2014 New Revision: 214871 URL: https://gcc.gnu.org/viewcvs?rev=214871root=gccview=rev Log: PR c/62024 * c-parser.c (c_parser_static_assert_declaration_no_semi): Strip no-op conversions. * g++.dg/cpp0x/pr62024.C: New test. * gcc.dg/pr62024.c: New test. Added: trunk/gcc/testsuite/g++.dg/cpp0x/pr62024.C trunk/gcc/testsuite/gcc.dg/pr62024.c Modified: trunk/gcc/c/ChangeLog trunk/gcc/c/c-parser.c trunk/gcc/testsuite/ChangeLog
[Bug c/62024] __atomic_always_lock_free is not a constant expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62024 Marek Polacek mpolacek at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #9 from Marek Polacek mpolacek at gcc dot gnu.org --- Should be fixed.
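As a rough illustration of the usage this PR is about, the sketch below uses __atomic_always_lock_free inside a static assertion. It is a C++ counterpart written for this note, not the contents of the new test files, and it assumes a target where int is always lock-free (true on common x86 and ARM configurations); the function name is hypothetical.

```cpp
// Assumption: on the build target, objects of type int are always
// lock-free. The second argument 0 means "use typical alignment".
static_assert(__atomic_always_lock_free(sizeof(int), 0),
              "int is expected to be lock-free on this target");

// Hypothetical helper exposing the same compile-time answer at run time.
bool int_is_always_lock_free() {
    return __atomic_always_lock_free(sizeof(int), 0);
}
```

In C the equivalent check would use _Static_assert, which is the context the fix in c-parser.c addresses.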
[Bug libstdc++/55409] std::list not properly wrapping access to custom allocator through allocator_traits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55409 Jonathan Wakely redi at gcc dot gnu.org changed: What|Removed |Added Target Milestone|--- |5.0 --- Comment #8 from Jonathan Wakely redi at gcc dot gnu.org --- It will be done for the GCC 5.0 release.
[Bug lto/62026] [4.9/5 Regression] Crash in lto_get_decl_name_mapping
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62026 --- Comment #8 from Martin Jambor jamborm at gcc dot gnu.org --- I'm sorry but I cannot reproduce the problem with the attached testcase. I will try the libxul link.
[Bug other/63155] New: [4.9/5 Regression] memory hog
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155
Bug ID: 63155
Summary: [4.9/5 Regression] memory hog
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: doko at gcc dot gnu.org
Created attachment 33441 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33441&action=edit preprocessed source
[forwarded from https://bugs.debian.org/759683]
Compiling the attached test case with the 4.9 branch r214759 and trunk r213954 takes about 90sec on x86_64 and 10GB of memory. It succeeds with the 4.8 branch in less than a second.
$ gcc -std=c99 -c testunity_Runner.i
From the Debian issue: Notice that replacing _setjmp (Unity.AbortFrame[Unity.CurrentAbortFrame]) in the main function by _setjmp (Unity.AbortFrame[0]) makes gcc work normally. After a few tests it seems that gcc does not like having a variable in here.
I don't see the crash reported in the Debian issue.
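The pattern named in the Debian report can be sketched at small scale. This is a hypothetical reduction written for illustration, not the attached testunity_Runner.i: only the member names AbortFrame and CurrentAbortFrame come from the report, while UnityLike, run_protected and the two test bodies are made up. The point is that the jmp_buf passed to setjmp is selected through a variable index.

```cpp
#include <csetjmp>

// Hypothetical miniature of the test framework state in the report.
struct UnityLike {
    std::jmp_buf AbortFrame[4];
    int CurrentAbortFrame;
};

static UnityLike Unity;  // zero-initialized, so CurrentAbortFrame == 0

// Runs body(); returns 0 if it finished normally, 1 if it aborted
// by jumping back to the frame saved here.
int run_protected(void (*body)()) {
    if (setjmp(Unity.AbortFrame[Unity.CurrentAbortFrame]) == 0) {
        body();
        return 0;
    }
    return 1;
}

// A body that aborts through the currently selected frame.
void aborting_body() {
    std::longjmp(Unity.AbortFrame[Unity.CurrentAbortFrame], 1);
}

// A body that completes normally.
void passing_body() {}
```

The original runner contains a very large number of such setjmp sites in one function, which is presumably what turns this harmless-looking construct into a memory problem.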
[Bug c++/63140] wrong code generation probably due to optimization problem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63140 --- Comment #6 from Jakub Jelinek jakub at gcc dot gnu.org --- There is a git mirror of the svn repo. Anyway, -fsanitize=undefined enables -fno-delete-null-pointer-checks, perhaps you could try that option alone if it makes a difference.
[Bug libstdc++/62259] atomic class doesn't enforce required alignment on powerpc64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62259 Ulrich Weigand uweigand at gcc dot gnu.org changed: What|Removed |Added CC||uweigand at gcc dot gnu.org --- Comment #1 from Ulrich Weigand uweigand at gcc dot gnu.org --- Indeed, when running a simple test program:
    #include <atomic>
    #include <stdio.h>
    struct twoints { int a; int b; };
    int main(void)
    {
      printf("%d\n", __alignof__ (twoints));
      printf("%d\n", __alignof__ (std::atomic<twoints>));
      return 0;
    }
we see that GCC only requires 4 bytes of alignment for the atomic type. However, with the equivalent C11 code using the _Atomic keyword
    #include <stdatomic.h>
    #include <stdio.h>
    struct twoints { int a; int b; };
    int main()
    {
      printf("%d\n", __alignof__ (struct twoints));
      printf("%d\n", __alignof__ (_Atomic (struct twoints)));
      return 0;
    }
we get an alignment requirement of 8 bytes for the atomic type. In the C case, this is done by the compiler front-end where it implements the _Atomic keyword. In the C++ case, it seems the compiler doesn't really get involved, as it's all done in plain C++ in standard library code ... I suspect the intent was that for C++, we likewise ought to have an increased alignment requirement for the type, but I'm not sure how to implement this in the library. Need some of the library experts to comment here.
[Bug tree-optimization/49444] IV-OPTs changes an unaligned loads into aligned loads incorrectly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49444 Andrew Pinski pinskia at gcc dot gnu.org changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE --- Comment #10 from Andrew Pinski pinskia at gcc dot gnu.org --- (In reply to Richard Biener from comment #9) Thus dup of PR61320? Yes. *** This bug has been marked as a duplicate of bug 61320 ***
[Bug bootstrap/61320] [4.10 regression] ICE in jcf-parse.c:1622 (parse_class_file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61320 Andrew Pinski pinskia at gcc dot gnu.org changed: What|Removed |Added CC||pinskia at gcc dot gnu.org --- Comment #69 from Andrew Pinski pinskia at gcc dot gnu.org --- *** Bug 49444 has been marked as a duplicate of this bug. ***
[Bug c/62294] [4.9 Regression] Missing passing argument [...] from incompatible pointer type warning.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62294 --- Comment #3 from Marek Polacek mpolacek at gcc dot gnu.org --- Author: mpolacek Date: Wed Sep 3 12:54:06 2014 New Revision: 214874 URL: https://gcc.gnu.org/viewcvs?rev=214874root=gccview=rev Log: PR c/62294 * c-typeck.c (convert_arguments): Get location of a parameter. Change error and warning calls to error_at and warning_at. Pass location of a parameter to it. (convert_for_assignment): Add parameter to WARN_FOR_ASSIGNMENT and WARN_FOR_QUALIFIERS. Pass expr_loc to those. * gcc.dg/pr56724-1.c: New test. * gcc.dg/pr56724-2.c: New test. * gcc.dg/pr62294.c: New test. * gcc.dg/pr62294.h: New file. Added: branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/pr56724-1.c branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/pr56724-2.c branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/pr62294.c branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/pr62294.h Modified: branches/gcc-4_9-branch/gcc/c/ChangeLog branches/gcc-4_9-branch/gcc/c/c-typeck.c branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
[Bug c/62294] [4.9 Regression] Missing passing argument [...] from incompatible pointer type warning.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62294 Marek Polacek mpolacek at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #4 from Marek Polacek mpolacek at gcc dot gnu.org --- Fixed. I'll add the new test to trunk as well.
[Bug c++/63140] wrong code generation probably due to optimization problem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63140 --- Comment #7 from Ralf Hoffmann gcc at boomerangsworld dot de --- Created attachment 33442 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=33442action=edit simplified example file 1 simple example containing the code piece which triggers the behavior
[Bug c++/63140] wrong code generation probably due to optimization problem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63140 --- Comment #8 from Ralf Hoffmann gcc at boomerangsworld dot de --- Created attachment 33443 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=33443action=edit aguixtest.cc file with helper functions, not related to the problem, but required to execute
[Bug c++/63140] wrong code generation probably due to optimization problem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63140 --- Comment #9 from Ralf Hoffmann gcc at boomerangsworld dot de --- Created attachment 33444 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=33444action=edit aguixtest.hh file with helper functions, not related to the problem, but required to execute
[Bug c++/63140] wrong code generation probably due to optimization problem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63140 --- Comment #10 from Ralf Hoffmann gcc at boomerangsworld dot de --- Created attachment 33445 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=33445action=edit build build script used to create executable test program
[Bug c++/63140] wrong code generation probably due to optimization problem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63140 --- Comment #11 from Ralf Hoffmann gcc at boomerangsworld dot de --- I managed to create a standalone test program. Attachment aguix.cc contains the stripped down critical code segments. The two other files aguixtest.cc and aguixtest.hh are just to make a runnable binary. The attached script build can be used to create the binary. The expected output is:
    wait4mess2 called
    waittime2: 5
    Worker: msg lock element lost!
    Worker: msg lock element lost!
    wait4mess2 called
(this is what the binary does with gcc 4.8.1) while with gcc 4.9.1 it will loop forever:
    wait4mess2 called
    waittime2: 5
    waittime2: 5
    waittime2: 5
    waittime2: 5
Compiled with -O1 instead of -O2 the example program crashes. Adding -fsanitize=undefined on the other hand will make it work again regardless of O1 or O2.
[Bug libstdc++/62259] atomic class doesn't enforce required alignment on powerpc64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62259 David Edelsohn dje at gcc dot gnu.org changed: What|Removed |Added Keywords||wrong-code Status|UNCONFIRMED |NEW Last reconfirmed||2014-09-03 CC||dje at gcc dot gnu.org, ||redi at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #2 from David Edelsohn dje at gcc dot gnu.org --- Confirmed.
[Bug c/62294] [4.9 Regression] Missing passing argument [...] from incompatible pointer type warning.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62294 --- Comment #5 from Marek Polacek mpolacek at gcc dot gnu.org --- Author: mpolacek Date: Wed Sep 3 13:20:43 2014 New Revision: 214876 URL: https://gcc.gnu.org/viewcvs?rev=214876root=gccview=rev Log: PR c/62294 * gcc.dg/pr62294.c: New test. * gcc.dg/pr62294.h: New file. Added: trunk/gcc/testsuite/gcc.dg/pr62294.c trunk/gcc/testsuite/gcc.dg/pr62294.h Modified: trunk/gcc/testsuite/ChangeLog
[Bug c/62294] [4.9 Regression] Missing passing argument [...] from incompatible pointer type warning.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62294 --- Comment #6 from Emmanuel Thomé Emmanuel.Thome at inria dot fr --- Thanks. E.
[Bug other/63155] [4.9/5 Regression] memory hog
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-09-03 Target Milestone|--- |4.9.2 Ever confirmed|0 |1 --- Comment #1 from Richard Biener rguenth at gcc dot gnu.org ---
Clearly caused by the correctness fix for setjmp to wire abnormal edges. For me it is out-of-ssa which uses too much memory while building the conflict graph. We have gigantic PHI nodes here:
    _10263(ab) = PHI _109925(D)(ab)(2),, _10592(ab)(1489)
It's fast when optimizing. At -O0 we have a _lot_ more anonymous SSA names.
-O1:
    bb 4:
    # _1(ab) = PHI _1902(3), _2(ab)(5)
    _1905 = _setjmp (_1(ab));
    if (_1905 == 0) goto bb 6; else goto bb 8;
    bb 5
    # _2(ab) = PHI _1895(D), single gigantic PHI
-O0:
    bb 4:
    # _1(ab) = PHI _398164(3), _2(ab)(5)
    # _632(ab) = PHI _397532(D)(ab)(3), _633(ab)(5)
    # _1263(ab) = PHI _397533(D)(ab)(3), _1264(ab)(5)
    # _1894(ab) = PHI _397534(D)(ab)(3), _1895(ab)(5)
    # _2525(ab) = PHI _397535(D)(ab)(3), _2526(ab)(5)
    ...
    # _396900(ab) = PHI _398160(D)(ab)(3), _396901(ab)(5)
    _398165 = _setjmp (_1(ab));
    if (_398165 == 0) goto bb 6; else goto bb 8;
    bb 5
    # _2(ab) = PHI _397531(D)(ab)(2)...
    # _396901(ab) = PHI _398160(D)(ab)(2), _3...
A gazillion gigantic PHIs, and very many PHIs in every block. It's into-SSA that introduces the difference for the PHI nodes, but it is already GIMPLIFICATION that introduces very many more temporaries, which is the underlying issue (lookup_tmp_var !optimize check).
    Index: gcc/gimplify.c
    ===
    --- gcc/gimplify.c (revision 214810)
    +++ gcc/gimplify.c (working copy)
    @@ -476,7 +476,7 @@ lookup_tmp_var (tree val, bool is_formal
     block, which means it will go into memory, causing much extra work in reload and final and poorer code generation, outweighing the extra memory allocation here. */
    - if (!optimize || !is_formal || TREE_SIDE_EFFECTS (val))
    + if (!is_formal || TREE_SIDE_EFFECTS (val))
     ret = create_tmp_from_val (val);
     else
     {
fixes it (but it means that changing the testcase to use more distinct user variables would produce the same issue even when optimizing).
[Bug tree-optimization/58526] Inlining looses restrict qualifier and leads to loop versioned vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58526 --- Comment #2 from Tobias Burnus burnus at gcc dot gnu.org --- See also RFC patch at https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00232.html
[Bug other/63155] [4.9/5 Regression] memory hog
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155 --- Comment #2 from Richard Biener rguenth at gcc dot gnu.org --- I wonder why we need to explicitly represent abnormal PHIs in the dispatcher. All incoming edges are abnormal and all SSA names have to be coalesced anyway. Thus we could instead have bb 5: /* Not: # _2(ab) = PHI _17(D)(ab)(2), _1(ab)(6), _1(ab)(7), _3(ab)(11), _3(ab)(12), _4(ab)(15), _4(ab)(16), _5(ab)(20), _5(ab)(21), _5(ab)(22) */ ABNORMAL_DISPATCHER (0); _2(ab) = D.12345; or simply rewrite all must-coalesce vars out-of-SSA? (or not into SSA in the first place) The question is whether accesses to them should be loads/stores (I think so) and if that will cause other similar issues. We'd have to factor abnormal edges into a block to a separate forwarder of course, with a load of all abnormal vars. Anyway, not sure why the gimplify code is disabled for -O0 (or why we don't re-use formal temps more aggressively as they become anonymous SSA names later anyway).
[Bug tree-optimization/55334] [4.8/4.9/5 Regression] mgrid regression (ipa-cp disables vectorization)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55334 --- Comment #41 from Richard Biener rguenth at gcc dot gnu.org --- New attempt: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00232.html
[Bug ipa/61986] ICE on valid code at -O3 on x86_64-linux-gnu in decide_about_value, at ipa-cp.c:3480
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61986 --- Comment #3 from Martin Jambor jamborm at gcc dot gnu.org --- Author: jamborm Date: Wed Sep 3 14:16:54 2014 New Revision: 214877 URL: https://gcc.gnu.org/viewcvs?rev=214877root=gccview=rev Log: 2014-09-03 Martin Jambor mjam...@suse.cz PR ipa/61986 * ipa-cp.c (find_aggregate_values_for_callers_subset): Chain created replacements in ascending order of offsets. (known_aggs_to_agg_replacement_list): Likewise. gcc/testsuite/ * gcc.dg/ipa/pr61986.c: New test. Added: trunk/gcc/testsuite/gcc.dg/ipa/pr61986.c Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-cp.c trunk/gcc/testsuite/ChangeLog
[Bug other/63155] [4.9/5 Regression] memory hog
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155 --- Comment #3 from Richard Biener rguenth at gcc dot gnu.org --- So the issue is that the setjmp argument needs two temporaries: D.2832 = Unity.CurrentAbortFrame; D.2833 = Unity.AbortFrame[D.2832]; bb 18: D.2834 = _setjmp (D.2833); and the EH edge going into the _setjmp call has to merge those through the abnormal dispatcher. And that way it receives all of them. Hmm. Huh. Without the abnormal dispatcher they should just get default defs everywhere (but still many PHI nodes). Maybe that would be more light-weight.
[Bug ipa/62015] [4.8/4.9/5 Regression] ipa-cp-clone uses a clone that is too specialized for the call context
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62015 --- Comment #4 from Martin Jambor jamborm at gcc dot gnu.org --- Author: jamborm Date: Wed Sep 3 14:26:38 2014 New Revision: 214878 URL: https://gcc.gnu.org/viewcvs?rev=214878root=gccview=rev Log: 2014-09-03 Martin Jambor mjam...@suse.cz PR ipa/62015 * ipa-cp.c (intersect_aggregates_with_edge): Handle impermissible pass-trough jump functions correctly. testsuite/ * g++.dg/ipa/pr62015.C: New test. Added: trunk/gcc/testsuite/g++.dg/ipa/pr62015.C Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-cp.c trunk/gcc/testsuite/ChangeLog
[Bug c++/57335] internal compiler error: in cxx_eval_bit_field_ref, at cp/semantics.c:6977
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57335 Paolo Carlini paolo.carlini at oracle dot com changed: What|Removed |Added Keywords||ice-on-valid-code --- Comment #3 from Paolo Carlini paolo.carlini at oracle dot com --- ... but we ICE with the testcase adjusted too.
[Bug ipa/61986] ICE on valid code at -O3 on x86_64-linux-gnu in decide_about_value, at ipa-cp.c:3480
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61986 --- Comment #4 from Martin Jambor jamborm at gcc dot gnu.org --- I can reproduce the bug on the 4.9 branch too and the code is the same in 4.8 as well (although the bug does not manifest for me there), so please keep this bug open until I commit the same fix to the two branches, which will happen right after my bootstrap and testing finishes.
[Bug libstdc++/62259] atomic class doesn't enforce required alignment on powerpc64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62259 --- Comment #3 from Jonathan Wakely redi at gcc dot gnu.org --- (In reply to saugustine from comment #0)
> My uneducated guess is that the template at atomic:189 should either use _M_i in calls to __atomic_is_lock_free (instead of nullptr) or should add alignment as necessary. Not sure how that is intended to be done. If I fix atomic to pass the pointer, then gcc chooses to call out to an atomic library function, which gcc doesn't provide.
GCC does provide it, in libatomic, so -latomic should work. But I just tried your suggested change and saw no effect: I didn't need libatomic and I still got a bus error. I suppose what we want is the equivalent of this, but the _Atomic keyword isn't valid in C++:
    --- a/libstdc++-v3/include/std/atomic
    +++ b/libstdc++-v3/include/std/atomic
    @@ -161,7 +161,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     struct atomic
     {
     private:
    - _Tp _M_i;
    + alignas(alignof(_Atomic _Tp)) _Tp _M_i;
     // TODO: static_assert(is_trivially_copyable<_Tp>::value, "");
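Since _Atomic is unavailable in C++, one conceivable library-only approach is to derive the alignment from the size of the type. The sketch below is an assumption-laden illustration, not the libstdc++ implementation or a proposed patch: the names atomic_align and aligned_box are hypothetical, and the rule "use sizeof as alignment when it is a power of two no larger than 16" is simply one plausible policy.

```cpp
#include <cstddef>

struct twoints { int a; int b; };

// Hypothetical policy: if sizeof(T) is a power of two no larger than 16
// and exceeds the natural alignment, align to sizeof(T); otherwise keep
// the natural alignment of T.
template <typename T>
struct atomic_align {
    static constexpr std::size_t size = sizeof(T);
    static constexpr bool pow2 = (size & (size - 1)) == 0;
    static constexpr std::size_t value =
        (pow2 && size <= 16 && size > alignof(T)) ? size : alignof(T);
};

// Hypothetical wrapper standing in for the _M_i member of std::atomic.
template <typename T>
struct aligned_box {
    alignas(atomic_align<T>::value) T value;
};
```

With 4-byte int, aligned_box<twoints> gets 8-byte alignment, matching what the C front end reports for _Atomic (struct twoints) in comment 1, while types that are already sufficiently aligned are left unchanged.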
[Bug fortran/62270] -Wlogical-not-parentheses warnings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62270 --- Comment #7 from Marek Polacek mpolacek at gcc dot gnu.org --- Author: mpolacek Date: Wed Sep 3 16:04:27 2014 New Revision: 214881 URL: https://gcc.gnu.org/viewcvs?rev=214881root=gccview=rev Log: PR fortran/62270 * interface.c (compare_parameter): Fix condition. * trans-expr.c (gfc_conv_procedure_call): Likewise. * gfortran.dg/pointer_intent_7.f90: Adjust dg-error. Modified: branches/gcc-4_9-branch/gcc/fortran/ChangeLog branches/gcc-4_9-branch/gcc/fortran/interface.c branches/gcc-4_9-branch/gcc/fortran/trans-expr.c branches/gcc-4_9-branch/gcc/testsuite/ChangeLog branches/gcc-4_9-branch/gcc/testsuite/gfortran.dg/pointer_intent_7.f90
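For background on what -Wlogical-not-parentheses reports: in C and C++ the expression !a == b parses as (!a) == b, which is rarely what was meant. The sketch below is a generic illustration with hypothetical function names; the actual conditions changed in interface.c and trans-expr.c are not quoted in this message and are not reproduced here.

```cpp
// What the author of "!a == b" usually intends: "a and b differ".
bool differs_intended(int a, int b) {
    return !(a == b);
}

// What "!a == b" actually computes: negate a first (yielding 0 or 1),
// then compare that result with b.
bool differs_as_parsed(int a, int b) {
    return (!a) == b;
}
```

For a = 2 and b = 3 the two functions disagree, which is why GCC asks for explicit parentheses.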