RE: [Patch, ARM, ping1] Fix PR target/56846
Thanks. Ccing release manager for their opinion. Best regards, Thomas -Original Message- From: Jonathan Wakely [mailto:jwak...@redhat.com] Sent: Wednesday, November 26, 2014 5:33 PM To: Thomas Preud'homme Cc: Tony Wang; gcc-patches@gcc.gnu.org; d...@debian.org; aph- g...@littlepinkcloud.com; Richard Earnshaw; Ramana Radhakrishnan; libstd...@gcc.gnu.org Subject: Re: [Patch, ARM, ping1] Fix PR target/56846 On 26/11/14 17:23 -, Thomas Preud'homme wrote: Ping? I'm OK with backporting it if a release manager approves it. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme Sent: Wednesday, November 19, 2014 6:00 PM To: Tony Wang; gcc-patches@gcc.gnu.org; d...@debian.org; aph- g...@littlepinkcloud.com; Richard Earnshaw; Ramana Radhakrishnan; libstd...@gcc.gnu.org Subject: RE: [Patch ARM] Fix PR target/56846 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Tony Wang Hi all, The bug is reported at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56846, and it’s about the problem that when exception handler is involved in the function, then _Unwind_Backtrace function will run into deadloop on arm target. The patch (in r215101) can be backported without any change on 4.8 and 4.9 branches. I checked in QEMU with and without the patch on both branches and it indeed solves the problem. Testsuite run without regression when compiled with arm-none-eabi cross compiler and executed on QEMU emulating Cortex-M3. I also bootstrapped gcc on x86_64-linux-gnu and run the testsuite without regressions. Is it ok for backport? Best regards, Thomas
Re: [PATCH][ARM] Add Cortex-A17 support
On Thu, Nov 13, 2014 at 5:25 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This patch adds support for the Cortex-A17 processor to the arm backend. Cortex-A17 is an ARMv7ve core with the same architectural features as the Cortex-A7, A12 and A15 cores. The -m{tune, cpu}=cortex-a17 option is added and a pipeline description for instruction scheduling is provided. This has given an uplift over -mcpu=cortex-a15 tuning on a number of benchmarks. The patch is fairly self-contained with the bulk of the diffstat being the pipeline description files. Bootstrapped and tested on arm-none-linux-gnueabihf. Ok for trunk? Ok with documentation update in invoke.texi and an entry for changes.html Ramana Thanks, Kyrill 2014-11-13 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm.md (generic_sched): Specify cortexa17 in 'no' list. Include cortex-a17.md. * config/arm/arm.c (arm_issue_rate): Specify 2 for cortexa17. * config/arm/arm-cores.def (cortex-a17): New entry. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Regenerate. * config/arm/bpabi.h (BE8_LINK_SPEC): Specify mcpu=cortex-a17. * config/arm/cortex-a17.md: New file. * config/arm/cortex-a17-neon.md: New file. * config/arm/driver-arm.c (arm_cpu_table): Add entry for cortex-a17. * config/arm/t-aprofile: Add cortex-a17 entries to MULTILIB_MATCHES.
Re: [PATCH][ARM] Use Cortex-A17 tuning parameters for Cortex-A12
On Thu, Nov 13, 2014 at 5:25 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, The Cortex-A12 very close to the Cortex-A17 and this patch updates the tuning struct parameters to match the Cortex-A17 ones. This has improved performance in a number of benchmarks that I tried. The instruction scheduling is also changed to match the Cortex-A17 scheduling Tested arm-none-eabi. Ok for trunk once the Cortex-A17 patch goes in? Ok. Ramana Thanks, Kyrill 2014-11-13 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm.c (arm_cortex_a12_tune): Update entries to match Cortex-A17 tuning parameters. * config/arm/arm-cores.def (cortex-a12): Schedule for cortex-a17.
Re: path fixing PPC bootstrap
Vladimir Makarov vmaka...@redhat.com writes: Mike Meissner pointed me out that my last patch broke PPC bootstrap. I submitted a quick fix for it and now I am sending the path after bootstrap on ppc is done successfully. Sorry for the inconvinience. 2014-11-25 Vladimir Makarov vmaka...@redhat.com * ira-lives.c (process_bb_node_lives): Make code with conditional REAL_PIC_OFFSET_TABLE_REGNUM. REAL_PIC_OFFSET_TABLE_REGNUM is really an i386-local thing though. There's no documentation for it in tm.texi and this is the first use of it in generic code. I suppose I'm just being lazy and not building and trying it out for myself, but could you describe the original fix a bit more? The comment says: /* Processing insn usage in call insn can create conflict with pic pseudo and pic hard reg and that is wrong. Check this situation and fix it at the end of the insn processing. */ but where does the conflict come from? Thanks, Richard
Re: [C/C++ PATCH] Don't convert RHS of a shift-expression to int (PR c/63862)
On Wed, 26 Nov 2014, Jakub Jelinek wrote: On Wed, Nov 26, 2014 at 10:20:39PM +0100, Richard Biener wrote: Well, if you want to aggressively prune unused bits then you could back-propagate the shift count value-range. And note that all the frontend shorten-optimizations should move to the middle-end. That said, instead of casting to int we could cast to unsigned char... As long as we don't have 256 bitsize integers, yes, for the purposes of GIMPLE. Though, as with type demotion, then it might be worthwhile to try to promote it to wider type if the shift instructions operate on wider shift count modes than QImode. True. I suppose the promotion pass could promote/demote the shift count to a mode appropriate for the target. But that's again orthogonal to optimizing the computation of the shift count based on its value-range which does not invoke undefined behavior (thus, shortening of those computations). We don't have a pass yet that allows this kind of back-propagation though of course a simple shortening pass wouldn't be hard to implement (or similar to what the FEs do now we can add some patterns to match.pd catching those opportunities). Richard.
RE: [Patch, ARM, ping1] Fix PR target/56846
On Thu, 27 Nov 2014, Thomas Preud'homme wrote: Thanks. Ccing release manager for their opinion. It doesn't look ARM specific and frankly I have not too much expertise in this area. The patch has been on trunk for more than two months though so I guess it is ok to backport. Richard. Best regards, Thomas -Original Message- From: Jonathan Wakely [mailto:jwak...@redhat.com] Sent: Wednesday, November 26, 2014 5:33 PM To: Thomas Preud'homme Cc: Tony Wang; gcc-patches@gcc.gnu.org; d...@debian.org; aph- g...@littlepinkcloud.com; Richard Earnshaw; Ramana Radhakrishnan; libstd...@gcc.gnu.org Subject: Re: [Patch, ARM, ping1] Fix PR target/56846 On 26/11/14 17:23 -, Thomas Preud'homme wrote: Ping? I'm OK with backporting it if a release manager approves it. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme Sent: Wednesday, November 19, 2014 6:00 PM To: Tony Wang; gcc-patches@gcc.gnu.org; d...@debian.org; aph- g...@littlepinkcloud.com; Richard Earnshaw; Ramana Radhakrishnan; libstd...@gcc.gnu.org Subject: RE: [Patch ARM] Fix PR target/56846 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Tony Wang Hi all, The bug is reported at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56846, and it’s about the problem that when exception handler is involved in the function, then _Unwind_Backtrace function will run into deadloop on arm target. The patch (in r215101) can be backported without any change on 4.8 and 4.9 branches. I checked in QEMU with and without the patch on both branches and it indeed solves the problem. Testsuite run without regression when compiled with arm-none-eabi cross compiler and executed on QEMU emulating Cortex-M3. I also bootstrapped gcc on x86_64-linux-gnu and run the testsuite without regressions. Is it ok for backport? Best regards, Thomas -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284 (AG Nuernberg) Maxfeldstrasse 5, 90409 Nuernberg, Germany
Re: [Ping] Port of VTV for Cygwin and MinGW
On 12.11.2014 19:40, Kai Tietz wrote: TerminateProcess is actually bad, as it doesn't call any of the atexit handlers. You simply nuke the process off. For cygwin this behavior is inacceptable. Why a classical abort, or a classical exit call cause for you that issues? It seems to me more related to some other thing you try to paper over by this. It turns out the test program made some trouble. I rewrote it to the attached program (virtual_func_test_min_AW.cpp). I changed obstack.c and vtv_rts.cc to the C-runtime functions. For testing I used a program just containing an abort and all three tests in the attached test program. The call stack, passed parameters and behavior matched at the crucial parts (tested again on MinGW 32/64bit). Regarding the question, why I reimplemented mprotect, I also haven't changed anything in the patch but answered the question. And this doesn't make it better. It is present in the static part of libgcc. Have you tried to declare it with extern C (for C++ case) and simply use it? Cygwin provides its own version too. So there seems to me no real need to re-implement it. You're right. I was stuck with the idea of importing it dynamically, but changed it to extern C now. Regards, Patrick * gcc/config/i386/cygwin.h (STARTFILE_SPEC): Add vtv_start.o, if -fvtable-verify=preinit/std is used. * gcc/config/i386/mingw-w64.h (STARTFILE_SPEC): Likewise. * gcc/config/i386/mingw32.h (STARTFILE_SPEC): Likewise. * gcc/config/i386/cygwin.h (ENDFILE_SPEC): Add vtv_end.o, if -fvtable-verify=preinit/std is used. * gcc/config/i386/mingw32.h (ENDFILE_SPEC): Likewise. * gcc/config/i386/cygwin.h (LIB_SPEC): Pass -lvtv and -lpsapi, if -fvtable-verify=preinit/std is used. * gcc/config/i386/mingw-w64.h (LIB_SPEC): Likewise. * gcc/config/i386/mingw32.h (LIB_SPEC): Likewise. * gcc/cp/vtable-class-hierarchy.c (vtv_generate_init_routine): Add check for not TARGET_PECOFF at the VTV_PREINIT_PRIORITY checks. * gcc/varasm.c (assemble_variable): Add code to properly set the comdat section and name for the .vtable_map_vars section in case the target is PE or COFF. * libgcc/Makefile.in: Move rules to build vtv_*.o out of the check for CUSTOM_CRTSTUFF. * libgcc/config.host (i[34567]86-*-cygwin*, x86_64-*-cygwin*, i[34567]86-*-mingw*) (x86_64-*-mingw*): Only add vtv_*.o to extra_parts if enable_vtable_verify. * libstdc++-v3/acinclude.m4: Define VTV_CYGMIN. * libstdc++-v3/configure: Regenerate. * libstdc++-v3/libsupc++/Makefile.am: Add vtv_sources only to libsupc___la_SOURCES and libsupc__convenience_la_SOURCES if VTV_CYGMIN is not set. * libstdc++-v3/libsupc++/Makefile.in: Regenerated. * libstdc++-v3/libsupc++/vtv_stubs.cc: Add none weak declaration of every function for Cygwin and MinGW. * libstdc++-v3/src/Makefile.am: Add libvtv.la to toolexeclib_LTLIBRARIES, if VTV_CYGMIN is set. Define libvtv_la_SOURCES, libvtv_la_LDFLAGS, libvtv_la_AM_CXXFLAGS and libvtv_la_LINK if VTV_CYGMIN is set. * libstdc++-v3/src/Makefile.in: Regenerate. * libvtv/Makefile.am : Add libvtv.la to toolexeclib_LTLIBRARIES, if VTV_CYGMIN is set. Define libvtv_la_LIBADD, libvtv_la_LDFLAGS, libvtv_stubs_la_LDFLAGS and libvtv_stubs_la_SOURCES if VTV_CYGMIN is set. Add obstac.c to libvtv_la_SOURCES if VTV_CYGMIN is set. * libvtv/Makefile.in : Regenerate. * libvtv/aclocal.m4 : Regenerate. * libvtv/configure : Regenerate. * libvtv/configure.ac : Add ACX_LT_HOST_FLAGS. Define VTV_CYGMIN. * libvtv/configure.tgt : (x86_64-*-cygwin*, i?86-*-cygwin*, x86_64-*-mingw*) (i?86-*-mingw*): Add to supported targets. * libvtv/vtv_fail.cc : Skip inclusion of execinfo.h on Cygwin and MinGW. (log_error_message): Skip calls to backtrace and backtrace_symbols_fd on Cygwin and MinGW. * libvtv/vtv_malloc.cc : Include windows.h and skip sys/mman.h inclusion on Cygwin and MinGW. Add sysconf port on Cygwin and MinGW. (obstack_chunk_alloc): Exchange call to mmap with call to VirtualAlloc on Cygwin and MinGW. (__vtv_malloc_init): Exchange call to sysconf with call to port of sysconf on Cygwin and MinGW. * libvtv/vtv_malloc.h : Declare mprotect and define PROT_READ and PROT_WRITE on Cygwin and MinGW. * libvtv/map.h : Include stdint.h on MinGW. * libvtv/rts.cc : Include windows.h, winternl.h and psapi.h, skip include of execinfo.h, sys/mman.h and link.h on Cygwin and MinGW. Add port of __fortify_fail on Cygwin and MinGW. Change ElfW (Addr) to uintptr_t on Cygwin and MinGW. (read_section_offset_and_length): Add port for Cygwin and MinGW (iterate_modules): New function. (vtv_unprotect_vtable_vars): Use iterate_modules instead of dl_iterate_phdr on Cygwin and MinGW. (vtv_protect_vtable_vars): Likewise. (count_all_pages): Likewise. (dl_iterate_phdr_count_pages): Don't build on Cygwin and MinGW. * libvtv/utils.cc : Include windows.h and skip execinfo.h inclusion on Cygwin and MinGW. (__vtv_open_log): Exchange call to getuid and getpid
Re: [C/C++ PATCH] Don't convert RHS of a shift-expression to int (PR c/63862)
On Thu, Nov 27, 2014 at 10:30:16AM +0100, Richard Biener wrote: On Wed, 26 Nov 2014, Jakub Jelinek wrote: On Wed, Nov 26, 2014 at 10:20:39PM +0100, Richard Biener wrote: Well, if you want to aggressively prune unused bits then you could back-propagate the shift count value-range. And note that all the frontend shorten-optimizations should move to the middle-end. That said, instead of casting to int we could cast to unsigned char... As long as we don't have 256 bitsize integers, yes, for the purposes of GIMPLE. Though, as with type demotion, then it might be worthwhile to try to promote it to wider type if the shift instructions operate on wider shift count modes than QImode. True. I suppose the promotion pass could promote/demote the shift count to a mode appropriate for the target. But that's again orthogonal to optimizing the computation of the shift count based on its value-range which does not invoke undefined behavior (thus, shortening of those computations). We don't have a pass yet that allows this kind of back-propagation though of course a simple shortening pass wouldn't be hard to implement (or similar to what the FEs do now we can add some patterns to match.pd catching those opportunities). My point is that because we don't have infrastructure for that in GCC 5, I think it would be best to keep the status quo in GIMPLE for this, and only when we have infrastructure to do something better, change it. As Marek wants to have the original (promoted) type in the FEs (which is of course desirable, not only for ubsan, but also for diagnostics), transforming that into the old way in c_gimplify_expr would keep the middle-end seeing what it was used to. Jakub
[PATCH] Fix PR64083
The following fixes an omission from when I introduced the mark_loop_for_removal abstraction. thread_through_all_blocks still uses the old way of killing a loop. The following patch fixes this by not killing it at all (it's just likely that the loop will break and a loop-fixup is scheduled anyway if we change the CFG and that fixup will deal with this just fine). Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2014-11-26 Richard Biener rguent...@suse.de PR tree-optimization/64083 * tree-ssa-threadupdate.c (thread_through_all_blocks): Do not forcibly mark loop for removal the wrong way. * gcc.dg/torture/pr64083.c: New testcase. Index: gcc/tree-ssa-threadupdate.c === --- gcc/tree-ssa-threadupdate.c (revision 218078) +++ gcc/tree-ssa-threadupdate.c (working copy) @@ -2428,16 +2428,8 @@ thread_through_all_blocks (bool may_peel /* Our path is still valid, thread it. */ if (e-aux) { - struct loop *loop = (*path)[0]-e-dest-loop_father; - if (thread_block ((*path)[0]-e-dest, false)) - { - /* This jump thread likely totally scrambled this loop. - So arrange for it to be fixed up. */ - loop-header = NULL; - loop-latch = NULL; - e-aux = NULL; - } + e-aux = NULL; else { delete_jump_thread_path (path); Index: gcc/testsuite/gcc.dg/torture/pr64083.c === --- gcc/testsuite/gcc.dg/torture/pr64083.c (revision 0) +++ gcc/testsuite/gcc.dg/torture/pr64083.c (working copy) @@ -0,0 +1,17 @@ +/* { dg-do compile } */ + +int a, b; +void +fn1 () +{ + int c = 0; + while (b) +{ + switch (c) +case 1: + fn1 (); + if (a) + c = 1; + b = 0; +} +}
[PATCH] Fix match-and-simplify regression
The following allows gimple_fold_stmt_to_constant_1 in SCCVN to follow SSA edges. This fixes regressions caused by no longer dispatching to fold_unary from it. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2014-11-27 Richard Biener rguent...@suse.de * tree-ssa-sccvn.c (try_to_simplify): Allow gimple_fold_stmt_to_constant_1 to follow SSA edges. Index: gcc/tree-ssa-sccvn.c === --- gcc/tree-ssa-sccvn.c(revision 218114) +++ gcc/tree-ssa-sccvn.c(working copy) @@ -3461,7 +3461,7 @@ try_to_simplify (gassign *stmt) return NULL_TREE; /* First try constant folding based on our current lattice. */ - tem = gimple_fold_stmt_to_constant_1 (stmt, vn_valueize); + tem = gimple_fold_stmt_to_constant_1 (stmt, vn_valueize, vn_valueize); if (tem (TREE_CODE (tem) == SSA_NAME || is_gimple_min_invariant (tem)))
Re: [Patch, ARM, ping1] Fix PR target/56846
On 27/11/14 09:34, Richard Biener wrote: On Thu, 27 Nov 2014, Thomas Preud'homme wrote: Thanks. Ccing release manager for their opinion. It doesn't look ARM specific and frankly I have not too much expertise in this area. The patch has been on trunk for more than two months though so I guess it is ok to backport. It is ARM specific because the whole thing sits in a #ifdef __ARM_EABI_UNWINDER__ in eh_personality.cc. regards Ramana Richard. Best regards, Thomas -Original Message- From: Jonathan Wakely [mailto:jwak...@redhat.com] Sent: Wednesday, November 26, 2014 5:33 PM To: Thomas Preud'homme Cc: Tony Wang; gcc-patches@gcc.gnu.org; d...@debian.org; aph- g...@littlepinkcloud.com; Richard Earnshaw; Ramana Radhakrishnan; libstd...@gcc.gnu.org Subject: Re: [Patch, ARM, ping1] Fix PR target/56846 On 26/11/14 17:23 -, Thomas Preud'homme wrote: Ping? I'm OK with backporting it if a release manager approves it. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme Sent: Wednesday, November 19, 2014 6:00 PM To: Tony Wang; gcc-patches@gcc.gnu.org; d...@debian.org; aph- g...@littlepinkcloud.com; Richard Earnshaw; Ramana Radhakrishnan; libstd...@gcc.gnu.org Subject: RE: [Patch ARM] Fix PR target/56846 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Tony Wang Hi all, The bug is reported at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56846, and it’s about the problem that when exception handler is involved in the function, then _Unwind_Backtrace function will run into deadloop on arm target. The patch (in r215101) can be backported without any change on 4.8 and 4.9 branches. I checked in QEMU with and without the patch on both branches and it indeed solves the problem. Testsuite run without regression when compiled with arm-none-eabi cross compiler and executed on QEMU emulating Cortex-M3. I also bootstrapped gcc on x86_64-linux-gnu and run the testsuite without regressions. Is it ok for backport? Best regards, Thomas
Re: [C/C++ PATCH] Don't convert RHS of a shift-expression to int (PR c/63862)
On Thu, 27 Nov 2014, Jakub Jelinek wrote: On Thu, Nov 27, 2014 at 10:30:16AM +0100, Richard Biener wrote: On Wed, 26 Nov 2014, Jakub Jelinek wrote: On Wed, Nov 26, 2014 at 10:20:39PM +0100, Richard Biener wrote: Well, if you want to aggressively prune unused bits then you could back-propagate the shift count value-range. And note that all the frontend shorten-optimizations should move to the middle-end. That said, instead of casting to int we could cast to unsigned char... As long as we don't have 256 bitsize integers, yes, for the purposes of GIMPLE. Though, as with type demotion, then it might be worthwhile to try to promote it to wider type if the shift instructions operate on wider shift count modes than QImode. True. I suppose the promotion pass could promote/demote the shift count to a mode appropriate for the target. But that's again orthogonal to optimizing the computation of the shift count based on its value-range which does not invoke undefined behavior (thus, shortening of those computations). We don't have a pass yet that allows this kind of back-propagation though of course a simple shortening pass wouldn't be hard to implement (or similar to what the FEs do now we can add some patterns to match.pd catching those opportunities). My point is that because we don't have infrastructure for that in GCC 5, I think it would be best to keep the status quo in GIMPLE for this, and only when we have infrastructure to do something better, change it. As Marek wants to have the original (promoted) type in the FEs (which is of course desirable, not only for ubsan, but also for diagnostics), transforming that into the old way in c_gimplify_expr would keep the middle-end seeing what it was used to. But then rather than using integer_type_node I'd convert to unsigned_type_node (if that doesn't work you know we have some nasty dependence on negative shift counts working) Richard.
Re: [Patch, ARM, ping1] Fix PR target/56846
On Thu, 27 Nov 2014, Ramana Radhakrishnan wrote: On 27/11/14 09:34, Richard Biener wrote: On Thu, 27 Nov 2014, Thomas Preud'homme wrote: Thanks. Ccing release manager for their opinion. It doesn't look ARM specific and frankly I have not too much expertise in this area. The patch has been on trunk for more than two months though so I guess it is ok to backport. It is ARM specific because the whole thing sits in a #ifdef __ARM_EABI_UNWINDER__ in eh_personality.cc. Ah, too little patch context then. Richard. regards Ramana Richard. Best regards, Thomas -Original Message- From: Jonathan Wakely [mailto:jwak...@redhat.com] Sent: Wednesday, November 26, 2014 5:33 PM To: Thomas Preud'homme Cc: Tony Wang; gcc-patches@gcc.gnu.org; d...@debian.org; aph- g...@littlepinkcloud.com; Richard Earnshaw; Ramana Radhakrishnan; libstd...@gcc.gnu.org Subject: Re: [Patch, ARM, ping1] Fix PR target/56846 On 26/11/14 17:23 -, Thomas Preud'homme wrote: Ping? I'm OK with backporting it if a release manager approves it. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme Sent: Wednesday, November 19, 2014 6:00 PM To: Tony Wang; gcc-patches@gcc.gnu.org; d...@debian.org; aph- g...@littlepinkcloud.com; Richard Earnshaw; Ramana Radhakrishnan; libstd...@gcc.gnu.org Subject: RE: [Patch ARM] Fix PR target/56846 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Tony Wang Hi all, The bug is reported at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56846, and it’s about the problem that when exception handler is involved in the function, then _Unwind_Backtrace function will run into deadloop on arm target. The patch (in r215101) can be backported without any change on 4.8 and 4.9 branches. I checked in QEMU with and without the patch on both branches and it indeed solves the problem. Testsuite run without regression when compiled with arm-none-eabi cross compiler and executed on QEMU emulating Cortex-M3. I also bootstrapped gcc on x86_64-linux-gnu and run the testsuite without regressions. Is it ok for backport? Best regards, Thomas -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284 (AG Nuernberg) Maxfeldstrasse 5, 90409 Nuernberg, Germany
Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
On 26/11/14 10:14, Christophe Lyon wrote: Hi Kyrill, Hi Christophe, On 21 November 2014 at 16:52, Marcus Shawcroft marcus.shawcr...@gmail.com wrote: On 17 November 2014 17:35, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic. 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/simd/vsqrt_f64_1.c OK /Marcus Your new test fails at the scan-assembly step because all the code is optimized away (even at -O1). Sorry about that, I could have sworn I saw it pass when I initially wrote it... In any case, I've committed this patch (r218117) as obvious to mark one of the variables as volatile to make sure it's not optimised away. I've confirmed that the scan-assembler test fails without this patch and passes with it. Thanks, Kyrill Christophe. diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c index 57fb6bb..7f99bd5 100644 --- a/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c +++ b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c @@ -11,7 +11,7 @@ extern void abort (void); int main (void) { - float64x1_t in = vcreate_f64(0x3febd3e560634d7bULL); + volatile float64x1_t in = vcreate_f64(0x3febd3e560634d7bULL); float64x1_t result = vsqrt_f64 (in); float64_t expected = 0.9325321502142351;
Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
On 27/11/14 10:03, Kyrill Tkachov wrote: On 26/11/14 10:14, Christophe Lyon wrote: Hi Kyrill, Hi Christophe, On 21 November 2014 at 16:52, Marcus Shawcroft marcus.shawcr...@gmail.com wrote: On 17 November 2014 17:35, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic. 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/simd/vsqrt_f64_1.c OK /Marcus Your new test fails at the scan-assembly step because all the code is optimized away (even at -O1). Sorry about that, I could have sworn I saw it pass when I initially wrote it... In any case, I've committed this patch (r218117) as obvious to mark one of the variables as volatile to make sure it's not optimised away. I've confirmed that the scan-assembler test fails without this patch and passes with it. With the following ChangeLog for completeness sake: 2014-11-27 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/simd/vsqrt_f64_1.c: Mark variable volatile. Thanks, Kyrill Christophe.
Re: [PATCH] Fix simd clone vectorization (PR tree-optimization/64024)
On Wed, 26 Nov 2014, Jakub Jelinek wrote: Hi! As discussed in the PR and on IRC, the problem here is that peeling for alignment can for some linear argument that during vect analysis passed simple_iv no longer pass it during vect transform phase. So, to fix this, this patch remembers the base and step values from simple_iv during vect analysis and uses them during transform phase (biased by what the peeling for alignment advanced of course). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. 2014-11-26 Jakub Jelinek ja...@redhat.com PR tree-optimization/64024 * tree-vectorizer.h (struct _stmt_vec_info): Remove simd_clone_fndecl field. Add simd_clone_info field. (STMT_VINFO_SIMD_CLONE_FNDECL): Remove. (STMT_VINFO_SIMD_CLONE_INFO): Define. * tree-vect-stmts.c (vectorizable_simd_clone_call): Adjust for STMT_VINFO_SIMD_CLONE_FNDECL becoming first element of STMT_VINFO_SIMD_CLONE_INFO vector. For linear arguments, remember base and linear_step from analysis phase and use it during transform phase, biased by the difference between LOOP_VINFO_NITERS{_UNCHANGED,} multiplied by linear_step. (free_stmt_vec_info): Release STMT_VINFO_SIMD_CLONE_INFO. * gcc.dg/vect/vect-simd-clone-13.c: New test. * gcc.dg/vect/vect-simd-clone-14.c: New test. --- gcc/tree-vectorizer.h.jj 2014-11-19 18:48:07.0 +0100 +++ gcc/tree-vectorizer.h 2014-11-26 12:56:00.899824766 +0100 @@ -602,8 +602,10 @@ typedef struct _stmt_vec_info { of this stmt. */ vecdr_p same_align_refs; - /* Selected SIMD clone's function decl. */ - tree simd_clone_fndecl; + /* Selected SIMD clone's function info. First vector element + is SIMD clone's function decl, followed by a pair of trees (base + step) + for linear arguments (pair of NULLs for other arguments). */ + vectree simd_clone_info; /* Classify the def of this stmt. */ enum vect_def_type def_type; @@ -677,7 +679,7 @@ typedef struct _stmt_vec_info { #define STMT_VINFO_RELATED_STMT(S) (S)-related_stmt #define STMT_VINFO_PATTERN_DEF_SEQ(S) (S)-pattern_def_seq #define STMT_VINFO_SAME_ALIGN_REFS(S) (S)-same_align_refs -#define STMT_VINFO_SIMD_CLONE_FNDECL(S) (S)-simd_clone_fndecl +#define STMT_VINFO_SIMD_CLONE_INFO(S) (S)-simd_clone_info #define STMT_VINFO_DEF_TYPE(S) (S)-def_type #define STMT_VINFO_GROUP_FIRST_ELEMENT(S) (S)-first_element #define STMT_VINFO_GROUP_NEXT_ELEMENT(S) (S)-next_element --- gcc/tree-vect-stmts.c.jj 2014-11-19 18:47:59.0 +0100 +++ gcc/tree-vect-stmts.c 2014-11-26 15:38:59.883409014 +0100 @@ -2715,12 +2715,40 @@ vectorizable_simd_clone_call (gimple stm else gcc_assert (thisarginfo.vectype != NULL_TREE); - if (thisarginfo.dt != vect_constant_def -thisarginfo.dt != vect_external_def -loop_vinfo -TREE_CODE (op) == SSA_NAME -simple_iv (loop, loop_containing_stmt (stmt), op, iv, false) -tree_fits_shwi_p (iv.step)) + /* For linear arguments, the analyze phase should have saved + the base and step in STMT_VINFO_SIMD_CLONE_INFO. */ + if (i * 2 + 3 = STMT_VINFO_SIMD_CLONE_INFO (stmt_info).length () +STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 2 + 2]) + { + gcc_assert (vec_stmt); + thisarginfo.linear_step + = tree_to_shwi (STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 2 + 2]); + thisarginfo.op + = STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 2 + 1]; + /* If loop has been peeled for alignment, we need to adjust it. */ + tree n1 = LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo); + tree n2 = LOOP_VINFO_NITERS (loop_vinfo); + if (n1 != n2) + { + tree bias = fold_build2 (MINUS_EXPR, TREE_TYPE (n1), n1, n2); + tree step = STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 2 + 2]; + tree opt = TREE_TYPE (thisarginfo.op); + bias = fold_convert (TREE_TYPE (step), bias); + bias = fold_build2 (MULT_EXPR, TREE_TYPE (step), bias, step); + thisarginfo.op + = fold_build2 (POINTER_TYPE_P (opt) +? POINTER_PLUS_EXPR : PLUS_EXPR, opt, +thisarginfo.op, bias); + } + } + else if (!vec_stmt + thisarginfo.dt != vect_constant_def + thisarginfo.dt != vect_external_def + loop_vinfo + TREE_CODE (op) == SSA_NAME + simple_iv (loop, loop_containing_stmt (stmt), op, + iv, false) + tree_fits_shwi_p (iv.step)) { thisarginfo.linear_step = tree_to_shwi (iv.step); thisarginfo.op = iv.base; @@ -2735,8 +2763,8 @@ vectorizable_simd_clone_call (gimple stm unsigned int badness = 0;
Re: [PATCH] Fix expansion with COMPOUND_LITERAL_EXPR (PR middle-end/64067)
On Wed, 26 Nov 2014, Jakub Jelinek wrote: Hi! The testcase shows that when expanding ARRAY_REF from a const static var with initializer that contains COMPOUND_LITERAL_EXPR, we can try to expand COMPOUND_LITERAL_EXPR even when modifier is not EXPAND_INITIALIZER (it is EXPAND_SUM in that testcase, but could be many others). While gimplification should get rid of all the compound literal exprs for automatic compound literals, COMPOUND_LITERAL_EXPR for static compound literals can survive in the initializers, thus this patch handles them always. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. (I wonder if the fallthru still makes sense or if we should hit a gcc_unreachable () for other uses of COMPOUND_LITERAL_EXPR). Thanks, Richard. 2014-11-26 Jakub Jelinek ja...@redhat.com PR middle-end/64067 * expr.c (expand_expr_addr_expr_1) case COMPOUND_LITERAL_EXPR: Handle it by returning address of COMPOUND_LITERAL_EXPR_DECL not only if modifier is EXPAND_INITIALIZER, but whenever COMPOUND_LITERAL_EXPR_DECL is non-NULL and TREE_STATIC. * gcc.c-torture/compile/pr64067.c: New test. --- gcc/expr.c.jj 2014-11-18 08:26:45.0 +0100 +++ gcc/expr.c2014-11-26 17:38:52.877675088 +0100 @@ -7677,11 +7677,13 @@ expand_expr_addr_expr_1 (tree exp, rtx t break; case COMPOUND_LITERAL_EXPR: - /* Allow COMPOUND_LITERAL_EXPR in initializers, if e.g. - rtl_for_decl_init is called on DECL_INITIAL with - COMPOUNT_LITERAL_EXPRs in it, they aren't gimplified. */ - if (modifier == EXPAND_INITIALIZER -COMPOUND_LITERAL_EXPR_DECL (exp)) + /* Allow COMPOUND_LITERAL_EXPR in initializers or coming from + initializers, if e.g. rtl_for_decl_init is called on DECL_INITIAL + with COMPOUND_LITERAL_EXPRs in it, or ARRAY_REF on a const static + array with address of COMPOUND_LITERAL_EXPR in DECL_INITIAL; + the initializers aren't gimplified. */ + if (COMPOUND_LITERAL_EXPR_DECL (exp) +TREE_STATIC (COMPOUND_LITERAL_EXPR_DECL (exp))) return expand_expr_addr_expr_1 (COMPOUND_LITERAL_EXPR_DECL (exp), target, tmode, modifier, as); /* FALLTHRU */ --- gcc/testsuite/gcc.c-torture/compile/pr64067.c.jj 2014-11-26 17:37:37.835865797 +0100 +++ gcc/testsuite/gcc.c-torture/compile/pr64067.c 2014-11-26 17:37:02.0 +0100 @@ -0,0 +1,10 @@ +/* PR middle-end/64067 */ + +struct S { int s; }; +int *const v[1] = { ((struct S) { .s = 42 }).s }; + +int * +foo (void) +{ + return v[0]; +} Jakub -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284 (AG Nuernberg) Maxfeldstrasse 5, 90409 Nuernberg, Germany
Re: [PATCH] Fix find_base_term in 32-bit -fpic code (PR lto/64025, take 2)
On Wed, 26 Nov 2014, Jakub Jelinek wrote: On Tue, Nov 25, 2014 at 09:20:13AM +0100, Jakub Jelinek wrote: Actually, thinking about it more, at least according to commutative_operand_precedence the canonical order is what we used to return (i.e. (something - _G_O_T_) + (symbol_ref) or (something - _G_O_T_) + (const (symbol_ref +- const)) So perhaps better fix is to follow find_base_value, which does something like: /* Guess which operand is the base address: If either operand is a symbol, then it is the base. If either operand is a CONST_INT, then the other is the base. */ if (CONST_INT_P (src_1) || CONSTANT_P (src_0)) return find_base_value (src_0); else if (CONST_INT_P (src_0) || CONSTANT_P (src_1)) return find_base_value (src_1); and do something similar in find_base_term too. I.e. perhaps even with higher precedence over REG_P with REG_POINTER (or lower, in these cases it doesn't really matter, neither argument is REG_P), choose first operand that is CONSTANT_P and not CONST_INT_P. Yeah, that makes sense. Here it is. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. 2014-11-26 Jakub Jelinek ja...@redhat.com PR lto/64025 * alias.c (find_base_term): Use std::swap. Prefer tmp2 if it is CONSTANT_P other than CONST_INT. --- gcc/alias.c.jj2014-11-21 10:17:17.0 +0100 +++ gcc/alias.c 2014-11-26 12:31:24.719485590 +0100 @@ -1756,11 +1756,11 @@ find_base_term (rtx x) if (REG_P (tmp1) REG_POINTER (tmp1)) ; else if (REG_P (tmp2) REG_POINTER (tmp2)) - { - rtx tem = tmp1; - tmp1 = tmp2; - tmp2 = tem; - } + std::swap (tmp1, tmp2); + /* If second argument is constant which has base term, prefer it +over variable tmp1. See PR64025. */ + else if (CONSTANT_P (tmp2) !CONST_INT_P (tmp2)) + std::swap (tmp1, tmp2); /* Go ahead and find the base term for both operands. If either base term is from a pointer or is a named object or a special address Jakub -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284 (AG Nuernberg) Maxfeldstrasse 5, 90409 Nuernberg, Germany
RE: [Patch, ARM, ping1] Fix PR target/56846
-Original Message- From: Richard Biener [mailto:rguent...@suse.de] Sent: Thursday, November 27, 2014 9:57 AM To: Ramana Radhakrishnan Cc: Thomas Preud'homme; 'Jonathan Wakely'; Jakub Jelinek; Tony Wang; gcc-patches@gcc.gnu.org; d...@debian.org; aph- g...@littlepinkcloud.com; Richard Earnshaw; libstd...@gcc.gnu.org Subject: Re: [Patch, ARM, ping1] Fix PR target/56846 On Thu, 27 Nov 2014, Ramana Radhakrishnan wrote: On 27/11/14 09:34, Richard Biener wrote: On Thu, 27 Nov 2014, Thomas Preud'homme wrote: Thanks. Ccing release manager for their opinion. It doesn't look ARM specific and frankly I have not too much expertise in this area. The patch has been on trunk for more than two months though so I guess it is ok to backport. It is ARM specific because the whole thing sits in a #ifdef __ARM_EABI_UNWINDER__ in eh_personality.cc. Ah, too little patch context then. Sorry, my bad. Below is the patch that were sent by Tony at that time with 20 lines of context: diff --git a/libstdc++-v3/libsupc++/eh_personality.cc b/libstdc++-v3/libsupc++/eh_personality.cc index f315a83..cb4467a 100644 --- a/libstdc++-v3/libsupc++/eh_personality.cc +++ b/libstdc++-v3/libsupc++/eh_personality.cc @@ -361,40 +361,46 @@ PERSONALITY_FUNCTION (int version, found_cleanup, found_handler } found_type; lsda_header_info info; const unsigned char *language_specific_data; const unsigned char *action_record; const unsigned char *p; _Unwind_Ptr landing_pad, ip; int handler_switch_value; void* thrown_ptr = 0; bool foreign_exception; int ip_before_insn = 0; #ifdef __ARM_EABI_UNWINDER__ _Unwind_Action actions; switch (state _US_ACTION_MASK) { case _US_VIRTUAL_UNWIND_FRAME: + // If the unwind state pattern is + // _US_VIRTUAL_UNWIND_FRAME | _US_FORCE_UNWIND + // then we don't need to search for any handler as it is not a real + // exception. Just unwind the stack. + if (state _US_FORCE_UNWIND) + CONTINUE_UNWINDING; actions = _UA_SEARCH_PHASE; break; case _US_UNWIND_FRAME_STARTING: actions = _UA_CLEANUP_PHASE; if (!(state _US_FORCE_UNWIND) ue_header-barrier_cache.sp == _Unwind_GetGR(context, UNWIND_STACK_REG)) actions |= _UA_HANDLER_FRAME; break; case _US_UNWIND_FRAME_RESUME: CONTINUE_UNWINDING; break; default: std::abort(); } actions |= state _US_FORCE_UNWIND; Best regards, Thomas
Re: [PATCH] Fix expansion with COMPOUND_LITERAL_EXPR (PR middle-end/64067)
On Thu, Nov 27, 2014 at 11:03:20AM +0100, Richard Biener wrote: The testcase shows that when expanding ARRAY_REF from a const static var with initializer that contains COMPOUND_LITERAL_EXPR, we can try to expand COMPOUND_LITERAL_EXPR even when modifier is not EXPAND_INITIALIZER (it is EXPAND_SUM in that testcase, but could be many others). While gimplification should get rid of all the compound literal exprs for automatic compound literals, COMPOUND_LITERAL_EXPR for static compound literals can survive in the initializers, thus this patch handles them always. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. (I wonder if the fallthru still makes sense or if we should hit a gcc_unreachable () for other uses of COMPOUND_LITERAL_EXPR). I think the fallthru can't work, because it will hit the gcc_unreachable (); on unhandled COMPOUND_LITERAL_EXPR in expand_expr_real_*. So changing the if in the patch into gcc_assert and removing /* FALLTHRU */ might be ok too (and removing == COMPOUND_LITERAL_EXPR from the fallthru if). 2014-11-26 Jakub Jelinek ja...@redhat.com PR middle-end/64067 * expr.c (expand_expr_addr_expr_1) case COMPOUND_LITERAL_EXPR: Handle it by returning address of COMPOUND_LITERAL_EXPR_DECL not only if modifier is EXPAND_INITIALIZER, but whenever COMPOUND_LITERAL_EXPR_DECL is non-NULL and TREE_STATIC. * gcc.c-torture/compile/pr64067.c: New test. --- gcc/expr.c.jj 2014-11-18 08:26:45.0 +0100 +++ gcc/expr.c 2014-11-26 17:38:52.877675088 +0100 @@ -7677,11 +7677,13 @@ expand_expr_addr_expr_1 (tree exp, rtx t break; case COMPOUND_LITERAL_EXPR: - /* Allow COMPOUND_LITERAL_EXPR in initializers, if e.g. -rtl_for_decl_init is called on DECL_INITIAL with -COMPOUNT_LITERAL_EXPRs in it, they aren't gimplified. */ - if (modifier == EXPAND_INITIALIZER - COMPOUND_LITERAL_EXPR_DECL (exp)) + /* Allow COMPOUND_LITERAL_EXPR in initializers or coming from +initializers, if e.g. rtl_for_decl_init is called on DECL_INITIAL +with COMPOUND_LITERAL_EXPRs in it, or ARRAY_REF on a const static +array with address of COMPOUND_LITERAL_EXPR in DECL_INITIAL; +the initializers aren't gimplified. */ + if (COMPOUND_LITERAL_EXPR_DECL (exp) + TREE_STATIC (COMPOUND_LITERAL_EXPR_DECL (exp))) return expand_expr_addr_expr_1 (COMPOUND_LITERAL_EXPR_DECL (exp), target, tmode, modifier, as); /* FALLTHRU */ Jakub
Re: [PATCH 1/3, ARM, libgcc, ping5] Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc
On Thu, Nov 13, 2014 at 4:03 PM, Thomas Preud'homme thomas.preudho...@arm.com wrote: [Taking over Tony's patch] Ping? Best regards, Thomas -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Tony Wang Sent: Thursday, August 21, 2014 7:15 AM To: gcc-patches@gcc.gnu.org Subject: [PATCH 1/3,ARM,libgcc]Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc Hi there, In libgcc the file ieee754-sf.S and ieee754-df.S have some function pairs which will be bundled into one .o file and sharing the same .text section. For example, the fmul and fdiv, the libgcc makefile will build them into one .o file and archived into libgcc.a. So when user only call single float point multiply functions, the fdiv function will also be linked, and as fmul and fdiv share the same .text section, linker option --gc-sections or -flot can't remove the dead code. So this optimization just separates the function pair(fmul/fdiv and dmul/ddiv) into different sections, following the naming pattern of -ffunction- sections(.text.__functionname), through which the unused sections of fdiv/ddiv can be eliminated through option --gcc-sections when users only use fmul/dmul.The solution is to add a conditional statement in the macro FUNC_START, which will conditional change the section of a function from .text to .text.__\name. when compiling with the L_arm_muldivsf3 or L_arm_muldivdf3 macro. GCC regression test has been done on QEMU for Cortex-M3. No new regressions when turn on this patch. The code reduction for thumb2 on cortex-m3 is: 1. When user only use single float point multiply: fmul+fdiv = fmul will have a code size reduction of 318 bytes. 2. When user only use double float point multiply: dmul+ddiv = dmul will have a code size reduction of 474 bytes. Ok for trunk? This is OK. Ramana BR, Tony Step 1: Provide another option: sp-scetion to control whether to split the section of a function pair into two part. gcc/libgcc/ChangeLog: 2014-08-21 Tony Wang tony.w...@arm.com * config/arm/lib1funcs.S (FUNC_START): Add conditional section redefine for macro L_arm_muldivsf3 and L_arm_muldivdf3 (SYM_END, ARM_SYM_START): Add macros used to expose function Symbols diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index b617137..0f87111 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -418,8 +418,12 @@ SYM (\name): #define THUMB_SYNTAX #endif -.macro FUNC_START name +.macro FUNC_START name sp_section= + .ifc \sp_section, function_section + .section.text.__\name,ax,%progbits + .else .text + .endif .globl SYM (__\name) TYPE (__\name) .align 0 @@ -429,14 +433,24 @@ SYM (\name): SYM (__\name): .endm +.macro ARM_SYM_START name + TYPE (\name) + .align 0 +SYM (\name): +.endm + +.macro SYM_END name + SIZE (\name) +.endm + /* Special function that will always be coded in ARM assembly, even if in Thumb-only compilation. */ #if defined(__thumb2__) /* For Thumb-2 we build everything in thumb mode. */ -.macro ARM_FUNC_START name - FUNC_START \name +.macro ARM_FUNC_START name sp_section= + FUNC_START \name \sp_section .syntax unified .endm #define EQUIV .thumb_set @@ -467,8 +481,12 @@ _L__\name: #ifdef __ARM_ARCH_6M__ #define EQUIV .thumb_set #else -.macro ARM_FUNC_START name +.macro ARM_FUNC_START name sp_section= + .ifc \sp_section, function_section + .section.text.__\name,ax,%progbits + .else .text + .endif .globl SYM (__\name) TYPE (__\name) .align 0
Re: [PATCH][ARM] Fix PR59593/PR63742: make consttable_{1,2} available to ARM
On Tue, Nov 11, 2014 at 3:40 PM, Thomas Preud'homme thomas.preudho...@arm.com wrote: Fixed the subject. Best regards, Thomas -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme Sent: Tuesday, November 11, 2014 3:31 PM To: gcc-patches@gcc.gnu.org; Ramana Radhakrishnan; Richard Earnshaw Subject: [PATCH][ARM] Fix PR59593/PR63742: arm *movhi_insn_arch4 pattern for big-endian Currently, constant pool entries for QImode, HImode and SImode values are filled as 32-bit quantities. This works fine for little endian system but gives some incorrect results for big endian system when the value is *accessed* with a mode smaller than 32-bit in size. Suppose the case of the value 0x1234 that is accessed as an HImode value. It would be output as 0x0 0x0 0x12 0x34 in a constant pool entry and the 16-bit load that would be done would lead to the value 0x0 in register. This patch makes the consttable_1 and consttable_2 pattern available to ARM as well so that values are output according to their mode. This is a different approach than the one proposed by Felix Yang at [1]. [1] https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00258.html ChangeLog entries are as follows: *** gcc/ChangeLog *** 2014-09-03 Thomas Preud'homme thomas.preudho...@arm.com PR target/59593 * config/arm/arm.c (dump_minipool): dispatch to consttable pattern based on mode size. * config/arm/arm.md (consttable_1): Move from config/arm/thumb1.md and make it TARGET_EITHER. (consttable_2): Move from config/arm/thumb1.md, make it TARGET_EITHER and move HFmode handling from consttable_4 to it. (consttable_4): Move HFmode handling to consttable_2 pattern. * config/arm/thumb1.md (consttable_1): Move to config/arm/arm.md. (consttable_2): Ditto. *** gcc/testsuite/ChangeLog *** 2014-10-08 Thomas Preud'homme thomas.preudho...@arm.com PR target/59593 * gcc.target/arm/constant-pool.c: New test. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index d8bfda3..63babd2 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -16608,7 +16608,7 @@ dump_minipool (rtx_insn *scan) fputc ('\n', dump_file); } - switch (mp-fix_size) + switch (GET_MODE_SIZE (mp-mode)) { #ifdef HAVE_consttable_1 case 1: diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index cd9ab6c..8ca88b6 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -10567,6 +10567,42 @@ [(set_attr type no_insn)] ) +(define_insn consttable_1 + [(unspec_volatile [(match_operand 0 )] VUNSPEC_POOL_1)] + TARGET_EITHER + * + making_const_table = TRUE; + assemble_integer (operands[0], 1, BITS_PER_WORD, 1); + assemble_zeros (3); + return \\; + + [(set_attr length 4) + (set_attr type no_insn)] +) + +(define_insn consttable_2 + [(unspec_volatile [(match_operand 0 )] VUNSPEC_POOL_2)] + TARGET_EITHER + * + { +rtx x = operands[0]; +making_const_table = TRUE; +switch (GET_MODE_CLASS (GET_MODE (x))) + { + case MODE_FLOAT: + arm_emit_fp16_const (x); + break; + default: + assemble_integer (operands[0], 2, BITS_PER_WORD, 1); + assemble_zeros (2); + break; + } +return \\; + } + [(set_attr length 4) + (set_attr type no_insn)] +) + (define_insn consttable_4 [(unspec_volatile [(match_operand 0 )] VUNSPEC_POOL_4)] TARGET_EITHER @@ -10577,15 +10613,12 @@ switch (GET_MODE_CLASS (GET_MODE (x))) { case MODE_FLOAT: - if (GET_MODE (x) == HFmode) - arm_emit_fp16_const (x); - else - { - REAL_VALUE_TYPE r; - REAL_VALUE_FROM_CONST_DOUBLE (r, x); - assemble_real (r, GET_MODE (x), BITS_PER_WORD); - } - break; + { + REAL_VALUE_TYPE r; + REAL_VALUE_FROM_CONST_DOUBLE (r, x); + assemble_real (r, GET_MODE (x), BITS_PER_WORD); + break; + } default: /* XXX: Sometimes gcc does something really dumb and ends up with a HIGH in a constant pool entry, usually because it's trying to diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md index 020d83b..80d470b 100644 --- a/gcc/config/arm/thumb1.md +++ b/gcc/config/arm/thumb1.md @@ -1735,33 +1735,6 @@ (set_attr conds clob)] ) -(define_insn consttable_1 - [(unspec_volatile [(match_operand 0 )] VUNSPEC_POOL_1)] - TARGET_THUMB1 - * - making_const_table = TRUE; - assemble_integer (operands[0], 1, BITS_PER_WORD, 1); - assemble_zeros (3); - return \\; - - [(set_attr length 4) - (set_attr type no_insn)] -) - -(define_insn consttable_2 - [(unspec_volatile [(match_operand 0 )] VUNSPEC_POOL_2)] - TARGET_THUMB1 - * - making_const_table = TRUE; - gcc_assert (GET_MODE_CLASS
Re: [PATCH][ARM] Add -mcpu=cortex-a17.cortex-a7
On Tue, Nov 18, 2014 at 10:40 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, Following up from adding Cortex-A17 support this patch adds a big.LITTLE option cortex-a17.cortex-a7. Similar to the existing cortex-a15.cortex-a7 support we schedule for Cortex-A7 and make the other tuning decisions as for Cortex-A17. Tested arm-none-eabi. Ok for trunk? Ok when the rest go in. Ramana Thanks, Kyrill 2014-11-18 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm-cores.def (cortex-a17.cortex-a7): New entry. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Regenerate. * config/arm/bpabi.h (BE8_LINK_SPEC): Add mcpu=cortex-a17.cortex-a7. * config/arm/t-aprofile: Add cortex-a17.cortex-a7 entry to MULTILIB_MATCHES.
Re: [PATCH][ARM] Optimize copysign/copysignf for soft-float using BFI
On Wed, Oct 29, 2014 at 10:20 AM, Jiong Wang jiong.w...@arm.com wrote: On 26/08/14 13:36, Richard Earnshaw wrote: On 29/07/14 15:49, Jiong Wang wrote: test done === no regression on the full toolchain test on arm-none-eabi. ok to install? Hmm, I think this is wrong for DF mode. The principle the patch works on is by tying the output to the value containing the sign bit, and then copying the rest of the other value into that value. However, for DF mode it only copies 31 of the 63 bits needed; the least significant 32 bits of the mantissa are not copied over. R. updated the patch. fixed the DF mode bug. no regression on arm-none-eabi multi-lib test. ok to trunk? gcc/ * config/arm/arm.md (copysignsf3): New define_expand for SImode. New pattern. (copysigndf3): New define_expand for DImode. Likewise. Ok for trunk if no regressions. Sorry about the slow review. Ramana gcc/testsuite/ * gcc.target/arm/copysign_softfloat_1.c: New copysign/copysignf testcase for soft-float.
Re: [PATCH, ARM] attribute target (thumb,arm) [0/6]
On Wed, Nov 19, 2014 at 2:54 PM, Christian Bruel christian.br...@st.com wrote: On 11/19/2014 03:18 PM, Ramana Radhakrishnan wrote: On Wed, Nov 19, 2014 at 1:24 PM, Christian Bruel christian.br...@st.com wrote: I think I missed the stage3, Anyway would it be OK for stage1 when it reopens ? Since you submitted this well during stage1 and given that these patches address comments from earlier in the review process we should aim to get these in for 5.0. If at some point during the review process it looks risky and there needs to be significant rework we can always stop. It's on the list of patches to be reviewed and I will find some dedicated time later this week to set down and review / play with the patches in an attempt to move this forward as it is a reasonably large chunk of work. Thanks, also I forgot to mention that you need https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01231.html to play with the attribute. Will be part of the same shot. Isn't that Ok'd for committing ? Why then isn't it applied ? Ramana Christian Thanks for continuing to work on these patches and addressing the earlier review comments. regards Ramana Christian
Re: [PING][PATCH] Change contrib/test_installed for testing cross compilers
Well, FWIW, I was able to do parallel out-of-tree testing using only contrib/test_installed and the Makefile from build/gcc/ (plus touch config.status and some fiddling around with paths). Again I needed both target_triplet and also target_list, but that may be particular to my setup!! --Alan Jeff Law wrote: On 11/24/14 09:51, Alan Lawrence wrote: Having just been experimenting with testing of installed compilers - yes something like this could be useful, however: to do cross-testing I found I also (a) had to set my target_list; so either an extra flag for that, or maybe just a generic 'extra_site_flags' parameter? (b) I had to set up some boards...so maybe could have got there with the --tmpdir flag, ok; (c) lost all the parallelism provided by the Makefile in build/gcc. It should be possible to use the (check-parallel-xxx rules from) Makefile in conjunction with the site.exp from contrib/test_installed, haven't got that far yet... This does leave me wondering (1) whether a one-step $ test_installed is feasible, or a two-stage setup and then run is inevitable; (2) whether having all that parallelism expressed in the Makefile is the best place for it. Not that I have an alternative proposal at this point... It might be inevitable to have a two stage setup. Red Hat does installed compiler testing and I'm sure increased parallelism for install testing would be appreciated by the team doing that work. As for the --target, change itself, seems reasonable. Jeff
Re: [PATCH][ARM] Add Cortex-A17 support
On 27/11/14 08:52, Ramana Radhakrishnan wrote: On Thu, Nov 13, 2014 at 5:25 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This patch adds support for the Cortex-A17 processor to the arm backend. Cortex-A17 is an ARMv7ve core with the same architectural features as the Cortex-A7, A12 and A15 cores. The -m{tune, cpu}=cortex-a17 option is added and a pipeline description for instruction scheduling is provided. This has given an uplift over -mcpu=cortex-a15 tuning on a number of benchmarks. The patch is fairly self-contained with the bulk of the diffstat being the pipeline description files. Bootstrapped and tested on arm-none-linux-gnueabihf. Ok for trunk? Ok with documentation update in invoke.texi and an entry for changes.html Hi Ramana, Thanks for the review. The invoke.texi changes were posted and approved at https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02683.html and I'm attaching the proposed changes.html patch. Kyrill Ramana Thanks, Kyrill 2014-11-13 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm.md (generic_sched): Specify cortexa17 in 'no' list. Include cortex-a17.md. * config/arm/arm.c (arm_issue_rate): Specify 2 for cortexa17. * config/arm/arm-cores.def (cortex-a17): New entry. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Regenerate. * config/arm/bpabi.h (BE8_LINK_SPEC): Specify mcpu=cortex-a17. * config/arm/cortex-a17.md: New file. * config/arm/cortex-a17-neon.md: New file. * config/arm/driver-arm.c (arm_cpu_table): Add entry for cortex-a17. * config/arm/t-aprofile: Add cortex-a17 entries to MULTILIB_MATCHES. Index: htdocs/gcc-5/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v retrieving revision 1.43 diff -U 3 -r1.43 changes.html --- htdocs/gcc-5/changes.html 27 Nov 2014 09:50:04 - 1.43 +++ htdocs/gcc-5/changes.html 27 Nov 2014 09:57:19 - @@ -398,6 +398,14 @@ configure option with either of code--with-tune/code or code--with-arch/code. /li + li Support for the Cortex-A17 processor has been added through the + code-mcpu=cortex-a17/code and code-mtune=cortex-a17/code options. + /li + li Initial big.LITTLE tuning support for the combination of Cortex-A17 + and Cortex-A7 was added through the code-mcpu=cortex-a17.cortex-a7 + /code and code-mtune=cortex-a17.cortex-a7/code options. + /li + /ul h3 id=x86IA-32/x86-64/h3
Re: [PATCH][ARM] Add Cortex-A17 support
On 27/11/14 11:09, Kyrill Tkachov wrote: On 27/11/14 08:52, Ramana Radhakrishnan wrote: On Thu, Nov 13, 2014 at 5:25 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This patch adds support for the Cortex-A17 processor to the arm backend. Cortex-A17 is an ARMv7ve core with the same architectural features as the Cortex-A7, A12 and A15 cores. The -m{tune, cpu}=cortex-a17 option is added and a pipeline description for instruction scheduling is provided. This has given an uplift over -mcpu=cortex-a15 tuning on a number of benchmarks. The patch is fairly self-contained with the bulk of the diffstat being the pipeline description files. Bootstrapped and tested on arm-none-linux-gnueabihf. Ok for trunk? Ok with documentation update in invoke.texi and an entry for changes.html Hi Ramana, Thanks for the review. The invoke.texi changes were posted and approved at https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02683.html and I'm attaching the proposed changes.html patch. Ok thanks. Ramana Kyrill Ramana Thanks, Kyrill 2014-11-13 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm.md (generic_sched): Specify cortexa17 in 'no' list. Include cortex-a17.md. * config/arm/arm.c (arm_issue_rate): Specify 2 for cortexa17. * config/arm/arm-cores.def (cortex-a17): New entry. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Regenerate. * config/arm/bpabi.h (BE8_LINK_SPEC): Specify mcpu=cortex-a17. * config/arm/cortex-a17.md: New file. * config/arm/cortex-a17-neon.md: New file. * config/arm/driver-arm.c (arm_cpu_table): Add entry for cortex-a17. * config/arm/t-aprofile: Add cortex-a17 entries to MULTILIB_MATCHES.
[RFC PATCH, i386]: Prefer %ebx in set_got patterns
Hello! Attached patch helps RA to choose the most appropriate PIC register by changing the register preference for set_got patterns. Using this patch, there should really be a reason for RA to avoid ABI mandated hard PIC reg. This patch avoids many mov %exx,%ebx in front of the calls, that happen with unpatched compiler even with Vladimir's latest RA patch to avoid duplicated PIC registers. As a smoke test, I have checked 32bit libgo.so.6.0.0 library, where now we have: [uros@omen7 .libs]$ grep thunk.bx aaa | wc -l 7693 [uros@omen7 .libs]$ grep thunk.ax aaa | wc -l 10 [uros@omen7 .libs]$ grep thunk.cx aaa | wc -l 4 [uros@omen7 .libs]$ grep thunk.dx aaa | wc -l 8 [uros@omen7 .libs]$ grep thunk.bp aaa | wc -l 497 [uros@omen7 .libs]$ grep thunk.si aaa | wc -l 145 [uros@omen7 .libs]$ grep thunk.di aaa | wc -l 198 2014-11-27 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (set_got): Use =b,?r constraint for operand 0. (set_got_labelled): Ditto. (set_got_rex64): Ditto. (set_rip_rex64): Ditto. (set_got_offset_rex64): Ditto. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Thoughts? Uros Index: config/i386/i386.md === --- config/i386/i386.md (revision 218111) +++ config/i386/i386.md (working copy) @@ -12101,7 +12101,7 @@ ix86_expand_prologue (); DONE;) (define_insn set_got - [(set (match_operand:SI 0 register_operand =r) + [(set (match_operand:SI 0 register_operand =b,?r) (unspec:SI [(const_int 0)] UNSPEC_SET_GOT)) (clobber (reg:CC FLAGS_REG))] !TARGET_64BIT @@ -12110,7 +12110,7 @@ (set_attr length 12)]) (define_insn set_got_labelled - [(set (match_operand:SI 0 register_operand =r) + [(set (match_operand:SI 0 register_operand =b,?r) (unspec:SI [(label_ref (match_operand 1))] UNSPEC_SET_GOT)) (clobber (reg:CC FLAGS_REG))] @@ -12120,7 +12120,7 @@ (set_attr length 12)]) (define_insn set_got_rex64 - [(set (match_operand:DI 0 register_operand =r) + [(set (match_operand:DI 0 register_operand =b,?r) (unspec:DI [(const_int 0)] UNSPEC_SET_GOT))] TARGET_64BIT lea{q}\t{_GLOBAL_OFFSET_TABLE_(%%rip), %0|%0, _GLOBAL_OFFSET_TABLE_[rip]} @@ -12129,7 +12129,7 @@ (set_attr mode DI)]) (define_insn set_rip_rex64 - [(set (match_operand:DI 0 register_operand =r) + [(set (match_operand:DI 0 register_operand =b,?r) (unspec:DI [(label_ref (match_operand 1))] UNSPEC_SET_RIP))] TARGET_64BIT lea{q}\t{%l1(%%rip), %0|%0, %l1[rip]} @@ -12138,7 +12138,7 @@ (set_attr mode DI)]) (define_insn set_got_offset_rex64 - [(set (match_operand:DI 0 register_operand =r) + [(set (match_operand:DI 0 register_operand =b,?r) (unspec:DI [(label_ref (match_operand 1))] UNSPEC_SET_GOT_OFFSET))]
[PATCH][AARCH64]Use selected cpu's tuning when no tuning parameter is specified.
Hi all, We have the following code in aarch64_override_options() function. /* The selected cpu may be an architecture, so lookup tuning by core ID. */ if (!selected_tune) selected_tune = all_cores[selected_cpu-core]; However, the logic here is not right any more according to our current code. selected_cpu will never be an architecture at this point. This patch will correct the behaviour and use selected_cpu's tuning directly if selected_tune is still not decided at this point. We have the following consequence after the change. Previously we have the following tuning settings with the following command line options: -march=armv8-a -- selected_cpu = generic, aarch64_tune_params = cortexa53_tunings Nothing specified and selected_cpu == generic -- aarch64_tune_params = cortexa53_tunings After the change, we got something different: -march=armv8-a -- selected_cpu = generic, aarch64_tune_params = generic_tunings Nothing specified and selected_cpu == generic -- aarch64_tune_params = generic_tunings All other configuration(-march, -mcpu, -mtune) combinations should be unaffected. aarch64-none-elf has been built and tested on the model, no issue. Is it Okay for trunk? gcc/ChangeLog: 2014-11-27 Renlin Li renlin...@arm.com * config/aarch64/aarch64.c (aarch64_parse_cpu): Don't define selected_tune. (aarch64_override_options): Use selected_cpu's tuning. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 1809513..0a8c303 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -6613,7 +6613,6 @@ aarch64_parse_cpu (void) if (strlen (cpu-name) == len strncmp (cpu-name, str, len) == 0) { selected_cpu = cpu; - selected_tune = cpu; aarch64_isa_flags = selected_cpu-flags; if (ext != NULL) @@ -6709,9 +6708,8 @@ aarch64_override_options (void) gcc_assert (selected_cpu); - /* The selected cpu may be an architecture, so lookup tuning by core ID. */ if (!selected_tune) -selected_tune = all_cores[selected_cpu-core]; +selected_tune = selected_cpu; aarch64_tune_flags = selected_tune-flags; aarch64_tune = selected_tune-core;
Re: [PATCH] pr61324 pr 63649 - fix crash in ipa_comdats
On Tue, Nov 25, 2014 at 09:59:46PM +0100, Jan Hubicka wrote: From: Trevor Saunders tsaund...@mozilla.com Hi, the interesting symbol in the test case for pr61324 is __GLOBAL__sub_I_s. It refers to nothing, and is called by nothing, however it is kept (I believe because of -fkeep-inline-functions). That means ipa_comdats never tries to put Aha, that explans why it is around. Ok, so pr 63649 is more interesting. We start out with static ctors for a and b. ipa-pure-const decides the static ctor for b is const, and so calls cgraph_node::set_const_flag (). That function sets DECL_STATIC_CONSTRUCTOR on the static ctor for b to 0, but doesn't actually remove the function (I guess it relies on something else doing that). Then ipa-comdats comes along and the story continues more or less the same as pr 61324. it in a comdat, and so it never ends up in the hash table. It seems like the simplest solution is to just check if symbol is not in the map before trying to get the comdat it should go in, but another approach might be to keep separate hash maps for comdat functions and functions that can't be in any comdat, and then iterate over only the functions that belong in a comdat. Well, -fkeep-inline-functions promise you that you can call any inline function from debugger. I suppose in this case you also want to be able to call static functions. Comdat pass may bundle the function into comdat that is later optimized away by linker, so I would say we just want to disable the whole comdat pass when -fkeep-inline-functions is used? So we might still want to do that, but I think fixing the above issue may resolve this one too? Trev Patch for that is preapproved. Honza boottstrapped + regtested x86_64-unknown-linux-gnu, ok? Trev gcc/ * ipa-comdats.c (ipa_commdats): check if map contains symbol before trying to put symbol in a comdat. diff --git a/gcc/ipa-comdats.c b/gcc/ipa-comdats.c index af2aef8..8695a7e 100644 --- a/gcc/ipa-comdats.c +++ b/gcc/ipa-comdats.c @@ -327,18 +327,18 @@ ipa_comdats (void) !symbol-alias symbol-real_symbol_p ()) { - tree group = *map.get (symbol); + tree *group = map.get (symbol); - if (group == error_mark_node) + if (!group || *group == error_mark_node) continue; if (dump_file) { fprintf (dump_file, Localizing symbol\n); symbol-dump (dump_file); - fprintf (dump_file, To group: %s\n, IDENTIFIER_POINTER (group)); + fprintf (dump_file, To group: %s\n, IDENTIFIER_POINTER (*group)); } symbol-call_for_symbol_and_aliases (set_comdat_group, -*comdat_head_map.get (group), +*comdat_head_map.get (*group), true); } } diff --git a/gcc/testsuite/g++.dg/pr61324.C b/gcc/testsuite/g++.dg/pr61324.C new file mode 100644 index 000..6102574 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr61324.C @@ -0,0 +1,13 @@ +// { dg-do compile } +// { dg-options -O -fkeep-inline-functions -fno-use-cxa-atexit } +void foo (); + +struct S +{ + ~S () + { +foo (); + } +}; + +S s; -- 2.1.3
Re: [PATCH] Fix regressions in libgomp testsuite: set flag_fat_lto_objects for offload
On 25 Nov 10:51, Richard Biener wrote: In the patch adding flag_generate_offload sounds like a good solution, I didn't like emitting fat LTO objects unconditionally just because we offload. Here is updated patch. Bootstrap and make check passed on i686-linux and x86_64-linux. Tests with offloading also passed with trunk binutils. OK for trunk? -- Ilya gcc/ * cgraphunit.c (ipa_passes): Handle flag_generate_offload. (symbol_table::compile): Set flag_generate_offload if there is something to offload. * collect2.c (scan_prog_file): Look for the offload info marker symbol. * common.opt (flag_generate_offload): New Variable declaration. * dwarf2out.c (dwarf2out_finish): Handle flag_generate_offload. * ipa-inline-analysis.c (inline_generate_summary): Do not skip if flag_generate_offload is set. * lto-streamer.c (gate_lto_out): Handle flag_generate_offload. * passes.c (ipa_write_summaries): Do not skip if flag_generate_offload is set. * toplev.c (compile_file): Emit offload marker if offload info has been previously emitted. * tree.c (free_lang_data): Do not skip if flag_generate_offload is set. diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index 2fd99a7..fed1a3e 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -2075,7 +2075,7 @@ ipa_passes (void) } /* Some targets need to handle LTO assembler output specially. */ - if (flag_generate_lto) + if (flag_generate_lto || flag_generate_offload) targetm.asm_out.lto_start (); if (!in_lto_p) @@ -2092,7 +2092,7 @@ ipa_passes (void) } } - if (flag_generate_lto) + if (flag_generate_lto || flag_generate_offload) targetm.asm_out.lto_end (); if (!flag_ltrans (in_lto_p || !flag_lto || flag_fat_lto_objects)) @@ -2176,10 +2176,10 @@ symbol_table::compile (void) /* Offloading requires LTO infrastructure. */ if (!in_lto_p g-have_offload) -flag_generate_lto = 1; +flag_generate_offload = 1; /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE. */ - if (flag_generate_lto) + if (flag_generate_lto || flag_generate_offload) lto_streamer_hooks_init (); /* Don't run the IPA passes if there was any error or sorry messages. */ diff --git a/gcc/collect2.c b/gcc/collect2.c index 9c3a1c5..2dcebcd 100644 --- a/gcc/collect2.c +++ b/gcc/collect2.c @@ -2392,12 +2392,16 @@ scan_prog_file (const char *prog_name, scanpass which_pass, if (found_lto) continue; - /* Look for the LTO info marker symbol, and add filename to + /* Look for the LTO or offload info marker symbol, and add filename to the LTO objects list if found. */ for (p = buf; (ch = *p) != '\0' ch != '\n'; p++) if (ch == ' ' p[1] == '_' p[2] == '_' -(strncmp (p + (p[3] == '_' ? 2 : 1), __gnu_lto_v1, 12) == 0) -ISSPACE (p[p[3] == '_' ? 14 : 13])) +(((strncmp (p + (p[3] == '_' ? 2 : 1), + __gnu_lto_v1, 12) == 0) + ISSPACE (p[p[3] == '_' ? 14 : 13])) + || ((strncmp (p + (p[3] == '_' ? 2 : 1), + __gnu_offload_v1, 16) == 0) +ISSPACE (p[p[3] == '_' ? 18 : 17] { add_lto_object (lto_objects, prog_name); diff --git a/gcc/common.opt b/gcc/common.opt index 41c8d4e..752d939 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -67,6 +67,10 @@ int *param_values Variable int flag_generate_lto +; Nonzero if we should write GIMPLE bytecode for offload compilation. +Variable +int flag_generate_offload = 0 + ; True to warn about any objects definitions whose size is larger ; than N bytes. Also want about function definitions whose returned ; values are larger than N bytes, where N is 'larger_than_size'. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 25f0e7d..4ee3102 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -24420,7 +24420,8 @@ dwarf2out_finish (const char *filename) /* When generating LTO bytecode we can not generate new assembler names at this point and all important decls got theirs via free-lang-data. */ - if ((!flag_generate_lto || DECL_ASSEMBLER_NAME_SET_P (decl)) + if (((!flag_generate_lto !flag_generate_offload) + || DECL_ASSEMBLER_NAME_SET_P (decl)) DECL_ASSEMBLER_NAME (decl) != DECL_NAME (decl)) { add_linkage_attr (node-die, decl); diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c index 2f2993c..9d62722 100644 --- a/gcc/ipa-inline-analysis.c +++ b/gcc/ipa-inline-analysis.c @@ -4031,7 +4031,7 @@ inline_generate_summary (void) /* When not optimizing, do not bother to analyze. Inlining is still done because edge redirection needs to happen there. */ - if (!optimize !flag_generate_lto !flag_wpa)
Re: [PATCH] Fix ubsan and C++14 constexpr ICEs (PR sanitizer/63956)
On Wed, Nov 26, 2014 at 12:03:45PM -0500, Jason Merrill wrote: On 11/20/2014 02:04 PM, Marek Polacek wrote: + if (fun == NULL_TREE) +switch (CALL_EXPR_IFN (t)) + { + case IFN_UBSAN_NULL: + case IFN_UBSAN_BOUNDS: +return void_node; + default: +break; + } Other IFNs should make the call non-constant. Ok. -/* { dg-error is not a constant expression { target c++ } 12 } */ +/* { dg-warning right shift count is negative { target c++ } 12 } */ This should be an xfail (pending the delayed folding work) instead of a different test. Just to check that I understand correctly, the point is that with delayed folding we'd give an error here? I added the xfails. +constexpr int n3 = fn7 ((const int *) 0, 8); // { dg-error is not a constant expression|constexpr call flows off } The flows off the end error is a bug and should not be tested for. I'm going to check in a fix. Fixed. Bootstrapped/regtested on ppc64-linux. 2014-11-27 Marek Polacek pola...@redhat.com PR sanitizer/63956 * ubsan.c (is_ubsan_builtin_p): Check also built-in class. cp/ * constexpr.c: Include ubsan.h. (cxx_eval_call_expression): Bail out for IFN_UBSAN_{NULL,BOUNDS} internal functions and for ubsan builtins. * error.c: Include internal-fn.h. (dump_expr): Add printing of internal functions. testsuite/ * c-c++-common/ubsan/shift-5.c: Add xfails. * g++.dg/ubsan/div-by-zero-1.C: Don't use -w. Add xfail. * g++.dg/ubsan/pr63956.C: New test. diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c index ef9ef70..45d5959 100644 --- gcc/cp/constexpr.c +++ gcc/cp/constexpr.c @@ -32,6 +32,7 @@ along with GCC; see the file COPYING3. If not see #include gimplify.h #include builtins.h #include tree-inline.h +#include ubsan.h static bool verify_constant (tree, bool, bool *, bool *); #define VERIFY_CONSTANT(X) \ @@ -1151,6 +1152,19 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t, constexpr_call *entry; bool depth_ok; + if (fun == NULL_TREE) +switch (CALL_EXPR_IFN (t)) + { + case IFN_UBSAN_NULL: + case IFN_UBSAN_BOUNDS: + return void_node; + default: + if (!ctx-quiet) + error_at (loc, call to internal function); + *non_constant_p = true; + return t; + } + if (TREE_CODE (fun) != FUNCTION_DECL) { /* Might be a constexpr function pointer. */ @@ -1171,6 +1185,10 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t, } if (DECL_CLONED_FUNCTION_P (fun)) fun = DECL_CLONED_FUNCTION (fun); + + if (is_ubsan_builtin_p (fun)) +return void_node; + if (is_builtin_fn (fun)) return cxx_eval_builtin_function_call (ctx, t, addr, non_constant_p, overflow_p); diff --git gcc/cp/error.c gcc/cp/error.c index 5dcc149..ff26fb9 100644 --- gcc/cp/error.c +++ gcc/cp/error.c @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. If not see #include tree-pretty-print.h #include c-family/c-objc.h #include ubsan.h +#include internal-fn.h #include new// For placement-new. @@ -2039,6 +2040,14 @@ dump_expr (cxx_pretty_printer *pp, tree t, int flags) tree fn = CALL_EXPR_FN (t); bool skipfirst = false; + /* Deal with internal functions. */ + if (fn == NULL_TREE) + { + pp_string (pp, internal_fn_name (CALL_EXPR_IFN (t))); + dump_call_expr_args (pp, t, flags, skipfirst); + break; + } + if (TREE_CODE (fn) == ADDR_EXPR) fn = TREE_OPERAND (fn, 0); diff --git gcc/testsuite/c-c++-common/ubsan/shift-5.c gcc/testsuite/c-c++-common/ubsan/shift-5.c index 6f9c52a..d779571 100644 --- gcc/testsuite/c-c++-common/ubsan/shift-5.c +++ gcc/testsuite/c-c++-common/ubsan/shift-5.c @@ -2,31 +2,41 @@ /* { dg-options -fsanitize=shift -w } */ /* { dg-shouldfail ubsan } */ -int x; int -foo (void) +foo (int x) { /* None of the following should pass. */ switch (x) { case 1 -1: -/* { dg-error case label does not reduce to an integer constant {target c } 12 } */ -/* { dg-error is not a constant expression { target c++ } 12 } */ +/* { dg-error case label does not reduce to an integer constant { target c } 11 } */ +/* { dg-error { xfail { *-*-* } } 11 } */ case -1 -1: -/* { dg-error case label does not reduce to an integer constant {target c } 15 } */ -/* { dg-error is not a constant expression { target c++ } 15 } */ +/* { dg-error case label does not reduce to an integer constant { target c } 14 } */ +/* { dg-error { xfail { *-*-* } } 14 } */ case 1 -1: -/* { dg-error case label does not reduce to an integer constant {target c } 18 } */ -/* { dg-error is not a constant expression { target c++ } 18 } */ +/* { dg-error case label does not reduce to an integer constant { target c }
[PING][PATCH] Ignore alignment by option
On 11/19/2014 06:01 PM, Marat Zakirov wrote: Hi all! Here is the patch which forces ASan to ignore alignment of memory access. It increases ASan overhead but it's still useful because some programs like linux kernel often cheat with alignment which may cause false negatives. --Marat gcc/ChangeLog: 2014-11-14 Marat Zakirov m.zaki...@samsung.com * asan.c (asan_expand_check_ifn): Ignore alignment by option. * doc/invoke.texi: Document. * params.def (asan-alignment-optimize): New. * params.h: Likewise. gcc/testsuite/ChangeLog: 2014-11-14 Marat Zakirov m.zaki...@samsung.com * c-c++-common/asan/red-align-3.c: New test. diff --git a/gcc/asan.c b/gcc/asan.c index 79dede7..4f86088 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -2518,6 +2518,12 @@ asan_expand_check_ifn (gimple_stmt_iterator *iter, bool use_calls) HOST_WIDE_INT size_in_bytes = is_scalar_access tree_fits_shwi_p (len) ? tree_to_shwi (len) : -1; + + if (!ASAN_ALIGNMENT_OPTIMIZE size_in_bytes 1) +{ + size_in_bytes = -1; + align = 1; +} if (use_calls) { diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 13270bc..8f43c06 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -10580,6 +10580,12 @@ is greater or equal to this number, use callbacks instead of inline checks. E.g. to disable inline code use @option{--param asan-instrumentation-with-call-threshold=0}. +@item asan-alignment-optimize +Enable asan optimization for aligned accesses. +It is enabled by default when using @option{-fsanitize=address} option. +To disable optimization for aligned accesses use +@option{--param asan-alignment-optimize=0}. + @item chkp-max-ctor-size Static constructors generated by Pointer Bounds Checker may become very large and significantly increase compile time at optimization level diff --git a/gcc/params.def b/gcc/params.def index d2d2add..fbccf46 100644 --- a/gcc/params.def +++ b/gcc/params.def @@ -1114,6 +1114,11 @@ DEFPARAM (PARAM_ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD, in function becomes greater or equal to this number, 7000, 0, INT_MAX) +DEFPARAM (PARAM_ASAN_ALIGNMENT_OPTIMIZE, + asan-alignment-optimize, + Enable asan optimization for aligned access, + 1, 0, 1) + DEFPARAM (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS, uninit-control-dep-attempts, Maximum number of nested calls to search for control dependencies diff --git a/gcc/params.h b/gcc/params.h index 4779e17..e2973d4 100644 --- a/gcc/params.h +++ b/gcc/params.h @@ -238,5 +238,7 @@ extern void init_param_values (int *params); PARAM_VALUE (PARAM_ASAN_USE_AFTER_RETURN) #define ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD \ PARAM_VALUE (PARAM_ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD) +#define ASAN_ALIGNMENT_OPTIMIZE \ + PARAM_VALUE (PARAM_ASAN_ALIGNMENT_OPTIMIZE) #endif /* ! GCC_PARAMS_H */ diff --git a/gcc/testsuite/c-c++-common/asan/ignore_align.c b/gcc/testsuite/c-c++-common/asan/ignore_align.c new file mode 100644 index 000..989958b --- /dev/null +++ b/gcc/testsuite/c-c++-common/asan/ignore_align.c @@ -0,0 +1,34 @@ +/* { dg-options -fdump-tree-sanopt --param asan-alignment-optimize=0 } */ +/* { dg-do compile } */ +/* { dg-skip-if { *-*-* } { -flto } { } } */ + +#ifdef __cplusplus +extern C { +#endif +void *memset (void *, int, __SIZE_TYPE__); +#ifdef __cplusplus +} +#endif + +struct dummy { + int a; + int b; + int c; + int d; +}; + +volatile struct dummy * new_p; +volatile struct dummy * old_p; + +void foo(void) +{ + *(volatile char *)(0x12); + *(volatile short int *)(0x12); + *(volatile unsigned int *)(0x12); + *(volatile unsigned long long *)(0x12); + *new_p = *old_p; +} + +/* { dg-final { scan-tree-dump-times 3 11 sanopt } } */ +/* { dg-final { scan-tree-dump-times 7 11 sanopt } } */ +/* { dg-final { cleanup-tree-dump sanopt } } */
[PATCH] Fix PR64088
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2014-11-27 Richard Biener rguent...@suse.de PR middle-end/64088 * fold-const.c (const_unop): Re-instantiate missing condition before calling fold_abs_const. * gcc.dg/torture/pr64088.c: New testcase. Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 218121) +++ gcc/fold-const.c(working copy) @@ -1481,7 +1489,9 @@ const_unop (enum tree_code code, tree ty } case ABS_EXPR: - return fold_abs_const (arg0, type); + if (TREE_CODE (arg0) == INTEGER_CST || TREE_CODE (arg0) == REAL_CST) + return fold_abs_const (arg0, type); + break; case CONJ_EXPR: if (TREE_CODE (arg0) == COMPLEX_CST) Index: gcc/testsuite/gcc.dg/torture/pr64088.c === --- gcc/testsuite/gcc.dg/torture/pr64088.c (revision 0) +++ gcc/testsuite/gcc.dg/torture/pr64088.c (working copy) @@ -0,0 +1,17 @@ +/* { dg-do compile } */ + +extern int abs (int); +extern void attenuate_curve(int*); +int a; +void +setup_tone_curves_center_boost () +{ + int b[0]; + a = 0; + for (; a 6; a++) +{ + int c = abs (a); + b[a] = c; +} + attenuate_curve (b); +}
[PATCH] Fix PR64091
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2014-11-27 Richard Biener rguent...@suse.de PR tree-optimization/64088 * tree-ssa-tail-merge.c (update_debug_stmt): After resetting the stmt break from the loop over use operands. * gcc.dg/torture/pr64091.c: New testcase. Index: gcc/tree-ssa-tail-merge.c === --- gcc/tree-ssa-tail-merge.c (revision 218117) +++ gcc/tree-ssa-tail-merge.c (working copy) @@ -1606,9 +1613,7 @@ update_debug_stmt (gimple stmt) { use_operand_p use_p; ssa_op_iter oi; - basic_block bbdef, bbuse; - gimple def_stmt; - tree name; + basic_block bbuse; if (!gimple_debug_bind_p (stmt)) return; @@ -1616,19 +1621,16 @@ update_debug_stmt (gimple stmt) bbuse = gimple_bb (stmt); FOR_EACH_PHI_OR_STMT_USE (use_p, stmt, oi, SSA_OP_USE) { - name = USE_FROM_PTR (use_p); - gcc_assert (TREE_CODE (name) == SSA_NAME); - - def_stmt = SSA_NAME_DEF_STMT (name); - gcc_assert (def_stmt != NULL); - - bbdef = gimple_bb (def_stmt); + tree name = USE_FROM_PTR (use_p); + gimple def_stmt = SSA_NAME_DEF_STMT (name); + basic_block bbdef = gimple_bb (def_stmt); if (bbdef == NULL || bbuse == bbdef || dominated_by_p (CDI_DOMINATORS, bbuse, bbdef)) continue; gimple_debug_bind_reset_value (stmt); update_stmt (stmt); + break; } } Index: gcc/testsuite/gcc.dg/torture/pr64091.c === --- gcc/testsuite/gcc.dg/torture/pr64091.c (revision 0) +++ gcc/testsuite/gcc.dg/torture/pr64091.c (working copy) @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-additional-options -g } */ + +extern int foo(void); + +int main(void) +{ + int i, a, b; + + if (foo()) +return 0; + + for (i = 0, a = 0, b = 0; i 3; i++, a++) + { +if (foo()) + break; + +if (b += a) + a = 0; + } + + if (!a) +return 2; + + b += a; + + return 0; +}
Re: [C++ PATCH] Detect UB in shifts in constexpr functions
On Wed, Nov 26, 2014 at 11:56:26AM -0500, Jason Merrill wrote: Please give diagnostics explaining what's wrong with the shift rather than the generic is not a constant expression. Done. + tree t = build_int_cst (unsigned_type_node, uprec - 1); + t = fold_build2 (MINUS_EXPR, unsigned_type_node, t, rhs); + tree ulhs = fold_convert (unsigned_type_for (lhstype), lhs); + t = fold_build2 (RSHIFT_EXPR, TREE_TYPE (ulhs), ulhs, t); + if (tree_int_cst_lt (integer_one_node, t)) This could also use a comment explaining the logic. Ok. How about the following then? Bootstrapped/regtested on ppc64-linux. 2014-11-27 Marek Polacek pola...@redhat.com * constexpr.c (cxx_eval_check_shift_p): New function. (cxx_eval_binary_expression): Call it. * g++.dg/cpp0x/constexpr-shift1.C: New test. * g++.dg/cpp1y/constexpr-shift1.C: New test. diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c index ef9ef70..d305298 100644 --- gcc/cp/constexpr.c +++ gcc/cp/constexpr.c @@ -1451,6 +1451,48 @@ verify_constant (tree t, bool allow_non_constant, bool *non_constant_p, return *non_constant_p; } +/* Return true if the shift operation on LHS and RHS is undefined. */ + +static bool +cxx_eval_check_shift_p (enum tree_code code, tree lhs, tree rhs) +{ + if ((code != LSHIFT_EXPR code != RSHIFT_EXPR) + || TREE_CODE (lhs) != INTEGER_CST + || TREE_CODE (rhs) != INTEGER_CST) +return false; + + tree lhstype = TREE_TYPE (lhs); + unsigned HOST_WIDE_INT uprec = TYPE_PRECISION (TREE_TYPE (lhs)); + + /* [expr.shift] The behavior is undefined if the right operand + is negative, or greater than or equal to the length in bits + of the promoted left operand. */ + if (tree_int_cst_sgn (rhs) == -1 || compare_tree_int (rhs, uprec) = 0) +return true; + + /* The value of E1 E2 is E1 left-shifted E2 bit positions; [...] + if E1 has a signed type and non-negative value, and E1x2^E2 is + representable in the corresponding unsigned type of the result type, + then that value, converted to the result type, is the resulting value; + otherwise, the behavior is undefined. */ + if (code == LSHIFT_EXPR !TYPE_UNSIGNED (lhstype)) +{ + if (tree_int_cst_sgn (lhs) == -1) + return true; + /* For signed x y the following: +(unsigned) x ((prec (lhs) - 1) - y) +if 1, is undefined. */ + tree t = build_int_cst (unsigned_type_node, uprec - 1); + t = fold_build2 (MINUS_EXPR, unsigned_type_node, t, rhs); + tree ulhs = fold_convert (unsigned_type_for (lhstype), lhs); + t = fold_build2 (RSHIFT_EXPR, TREE_TYPE (ulhs), ulhs, t); + if (tree_int_cst_lt (integer_one_node, t)) + return true; +} + + return false; +} + /* Subroutine of cxx_eval_constant_expression. Attempt to reduce the unary expression tree T to a compile time value. If successful, return the value. Otherwise issue a diagnostic @@ -1513,6 +1555,9 @@ cxx_eval_binary_expression (const constexpr_ctx *ctx, tree t, else r = build2_loc (loc, code, type, lhs, rhs); } + else if (cxx_eval_check_shift_p (code, lhs, rhs)) +error_at (loc, shift expression %q+E overflows, + build2_loc (loc, code, type, lhs, rhs)); VERIFY_CONSTANT (r); return r; } diff --git gcc/testsuite/g++.dg/cpp0x/constexpr-shift1.C gcc/testsuite/g++.dg/cpp0x/constexpr-shift1.C index e69de29..48c2ab1 100644 --- gcc/testsuite/g++.dg/cpp0x/constexpr-shift1.C +++ gcc/testsuite/g++.dg/cpp0x/constexpr-shift1.C @@ -0,0 +1,73 @@ +// { dg-do compile { target c++11 } } + +constexpr int +fn1 (int i, int j) +{ + return i j; // { dg-error shift expression } +} + +constexpr int i1 = fn1 (1, -1); + +constexpr int +fn2 (int i, int j) +{ + return i j; // { dg-error shift expression } +} + +constexpr int i2 = fn2 (1, 200); + +constexpr int +fn3 (int i, int j) +{ + return i j; // { dg-error shift expression } +} + +constexpr int i3 = fn3 (-1, 2); + +constexpr int +fn4 (int i, int j) +{ + return i j; // { dg-error shift expression } +} + +constexpr int i4 = fn4 (__INT_MAX__, 2); + +constexpr int +fn5 (int i, int j) +{ + return i j; +} + +constexpr int i5 = fn5 (__INT_MAX__, 1); + +constexpr int +fn6 (unsigned int i, unsigned int j) +{ + return i j; // { dg-error shift expression } +} + +constexpr int i6 = fn6 (1, -1); + +constexpr int +fn7 (int i, int j) +{ + return i j; // { dg-error shift expression } +} + +constexpr int i7 = fn7 (1, -1); + +constexpr int +fn8 (int i, int j) +{ + return i j; +} + +constexpr int i8 = fn8 (-1, 1); + +constexpr int +fn9 (int i, int j) +{ + return i j; // { dg-error shift expression } +} + +constexpr int i9 = fn9 (1, 200); diff --git gcc/testsuite/g++.dg/cpp1y/constexpr-shift1.C gcc/testsuite/g++.dg/cpp1y/constexpr-shift1.C index e69de29..a739fd2 100644 --- gcc/testsuite/g++.dg/cpp1y/constexpr-shift1.C +++ gcc/testsuite/g++.dg/cpp1y/constexpr-shift1.C @@ -0,0 +1,9 @@ +// { dg-do
Re: [PATCH] IPA ICF: memory leak fix
On 11/23/2014 10:01 AM, Markus Trippelsdorf wrote: On 2014.11.22 at 17:46 +0100, Markus Trippelsdorf wrote: On 2014.11.22 at 16:04 +0100, Martin Liška wrote: On 11/22/2014 10:09 AM, Markus Trippelsdorf wrote: On 2014.11.22 at 09:05 +0100, Martin Liška wrote: Following patch removes memory leak that was introduced by very first IPA ICF patch. I would like to thank David for hunting the leak. Patch an bootstrap on x86_86-linux-pc and no regression is introduced. I gave the patch a quick spin on gcc112: *** Error in `/home/trippels/gcc_build_dir/./prev-gcc/lto1': free(): invalid next size (fast): 0x01000a5fc160 *** === Backtrace: = /lib64/libc.so.6(+0xa3d9c)[0x3fff7b6b3d9c] /lib64/libc.so.6(+0xaf0b4)[0x3fff7b6bf0b4] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_ZN3vecIi7va_heap6vl_ptrE7releaseEv-0x1d4bc00)[0x1025dd88] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_ZN7ipa_icf12sem_function14equals_privateEPNS_8sem_itemER8hash_mapIP11symtab_nodeS2_22default_hashmap_traitsE-0x9c083c)[0x116586bc] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_ZN7ipa_icf12sem_function6equalsEPNS_8sem_itemER8hash_mapIP11symtab_nodeS2_22default_hashmap_traitsE-0x9c0578)[0x11658998] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_ZN7ipa_icf18sem_item_optimizer7executeEv-0x9b8774)[0x11660a84] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_ZN7ipa_icf12pass_ipa_icf7executeEP8function-0x9b0314)[0x11668efc] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_Z16execute_one_passP8opt_pass-0x1647588)[0x1098a0a8] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_Z21execute_ipa_pass_listP8opt_pass-0x1644c2c)[0x1098ca7c] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_Z8lto_mainv-0x1df20e4)[0x101b494c] /home/trippels/gcc_build_dir/./prev-gcc/lto1[0x10b599b8] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_ZN6toplev4mainEiPPc-0x1e8be70)[0x101507b8] /home/trippels/gcc_build_dir/./prev-gcc/lto1(main-0x1ec8d8c)[0x1015493c] /lib64/libc.so.6(+0x447ac)[0x3fff7b6547ac] /lib64/libc.so.6(__libc_start_main-0x19cbf4)[0x3fff7b6549d4] === Memory map: ... Thank you for testing, problem is that I should grow the vector by 1, because '0' is used as NULL value. Please try my fixed patch. This one survives bootstrap-lto. Thanks. But Firefox doesn't build: /home/trippels/gcc_test/usr/local/bin/c++ -fPIC -Wall -Wempty-body -Woverloaded-virtual -Wsign-compare -Wwrite-strings -Werror=endif-labels -Werror=int-to-pointer-cast -Werror=missing-braces -Werror=pointer-arith -Werror=return-type -Werror=sequence-point -Werror=unused-label -Werror=trigraphs -Werror=type-limits -Wno-invalid-offsetof -Wcast-align -flto=160 --param lto-partitions=160 -mcpu=power8 -ffunction-sections -fdata-sections -fno-exceptions -fno-strict-aliasing -frtti -fno-exceptions -fno-math-errno -std=gnu++0x -pthread -pipe -UDEBUG -DNDEBUG -O3 -DU_STATIC_IMPLEMENTATION -fvisibility=hidden -W -Wall -pedantic -Wpointer-arith -Wwrite-strings -Wno-long-long -Wno-unused -Wno-unused-parameter -lpthread -Wl,--hash-style=gnu,--as-needed,--gc-sections,--icf=all -Wl,-z,noexecstack -Wl,-z,text -Wl,--build-id -Wl,--gc-sections -o ../../bin/makeconv makeconv.o ucnvstat.o genmbcs.o gencnvex.o -L../../lib -licutu -L../../lib -licui18n -L../../lib -licuuc -L../../stubdata -licudata - l pthread -ldl -lm lto1: internal compiler error: in operator[], at vec.h:736 0x10122377 vecint, va_heap, vl_embed::operator[](unsigned int) ../../gcc/gcc/vec.h:736 0x10d1a0f3 vecint, va_heap, vl_embed::operator[](unsigned int) ../../gcc/gcc/ipa-icf.c:963 0x10d1a0f3 vecint, va_heap, vl_ptr::operator[](unsigned int) ../../gcc/gcc/vec.h:1202 0x10d1a0f3 ipa_icf::sem_function::bb_dict_test(auto_vecint, 0ul, int, int) ../../gcc/gcc/ipa-icf.c:970 0x10d1aa73 ipa_icf::sem_function::equals_private(ipa_icf::sem_item*, hash_mapsymtab_node*, ipa_icf::sem_item*, default_hashmap_traits) ../../gcc/gcc/ipa-icf.c:512 0x10d1afb3 ipa_icf::sem_function::equals(ipa_icf::sem_item*, hash_mapsymtab_node*, ipa_icf::sem_item*, default_hashmap_traits) ../../gcc/gcc/ipa-icf.c:384 0x10d1dba3 ipa_icf::sem_item_optimizer::subdivide_classes_by_equality(bool) ../../gcc/gcc/ipa-icf.c:1833 0x10d2397f ipa_icf::sem_item_optimizer::execute() ../../gcc/gcc/ipa-icf.c:1652 0x10d24a93 ipa_icf_driver ../../gcc/gcc/ipa-icf.c:2382 0x10d24a93 ipa_icf::pass_ipa_icf::execute(function*) ../../gcc/gcc/ipa-icf.c:2430 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See http://gcc.gnu.org/bugs.html for instructions. lto-wrapper: fatal error: /home/trippels/gcc_test/usr/local/bin/c++ returned 1 exit status compilation terminated. /home/trippels/bin/ld: fatal error: lto-wrapper failed collect2: error: ld returned 1 exit status make[8]: *** [../../bin/makeconv] Error 1 Hi. Well, this final version can run profiled-bootstrap and I capable of building FF with
RE: New patch: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.
On 18 November 2014 10:14, David Sherwood david.sherw...@arm.com wrote: Hi Christophe, Ah sorry. My mistake - it fixes this in bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59810 I did look at that PR, but since it has no testcase attached, I was unsure. And I am still not :-) PR 59810 is [AArch64] LDn/STn implementations are not ABI-conformant for bigendian. but the advsimd-intrinsics/vldX.c and vldX_lane.c now PASS with Alan's patches on aarch64_be, so I thought Alan's patches solve PR59810. What am I missing? Hi Christophe, I think probably this is our fault for making our lives way too difficult and artificially splitting all these patches up. :) Alan's patch: https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00952.html fixes some issues on aarch64_be, but also causes regressions. For example, Tests that now fail, but worked before: aarch64_be-elf-aem: gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects execution test aarch64_be-elf-aem: gcc.dg/vect/slp-perm-8.c execution test aarch64_be-elf-aem: gcc.dg/vect/vect-over-widen-1-big-array.c -flto -ffat-lto-objects execution test ... Tests that now work, but didn't before: aarch64_be-elf-aem: gcc.dg/vect/fast-math-vect-complex-3.c execution test aarch64_be-elf-aem: gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c execution test aarch64_be-elf-aem: gcc.dg/vect/no-scevccp-outer-10a.c execution test ... His patch is only half of the story and must be applied at the same time as the [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe. patch. With both patches applied the result looks much healthier: # Comparing 1 common sum files ## /bin/sh ./src/gcc/contrib/compare_tests /tmp/gxx-sum1.10051 /tmp/gxx-sum2.10051 Tests that now work, but didn't before: aarch64_be-elf-aem: gcc.dg/torture/pr52028.c -O3 -fomit-frame-pointer execution test aarch64_be-elf-aem: gcc.dg/torture/pr52028.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test aarch64_be-elf-aem: gcc.dg/torture/pr52028.c -O3 -fomit-frame-pointer -funroll-loops execution test ... with no new regressions. After applying both patches the aarch64_be gcc testsuite is on a parity with the aarch64 testsuite. Furthermore, after applying both of these patches: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe [AArch64] [BE] Fix vector load/stores to not use ld1/st1 it then becomes safe for us to remove the CCMC macro, which is the cause of unnecessary spills to the stack for certain auto-vectorised code. So really I suppose when I posted my second patch [AArch64] [BE] [2/2] Make large opaque integer modes endianness-safe I should have really just called this [AArch64] [BE] Remove CCMC for aarch64 in order to make it clear exactly what the purpose of these patches is. Kind Regards, David Sherwood.
[PATCH] Fix PR64084
Currently the order in which patterns in match.pd match is pretty random (well, it's deterministic of course). This is bad if there are multiple possible matches such as in z = x + 1; w = z + 0; where we have both z + 0 - z and (x + 1) + 0 - x + (1 + 0). Currently the latter happens to be applied and thus we simplify to z = x + 1; w = x + 1; which is bad for propagation engines which only handle constant or SSA name results (and also either generates redundant code or leaves dead code). Thus the following patch makes sure that if there are multiple matches possible then the pattern first appearing in match.pd is chosen. This slightly increases the number of instructions executed during the matching in the worst case but is certainly the best course of action here (other possibilities are to make the patterns disjunct by disabling the latter for the + 0 case). At least it matches how other generated matchers behave. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2014-11-27 Richard Biener rguent...@suse.de PR middle-end/64084 * genmatch.c (dt_node::gen_kids_1): New function, split out from dt_node::gen_kids. (decision_tree::cmp_node): DT_TRUE are generally not equal. (decision_tree::find_node): Treat DT_TRUE as barrier for node CSE on the same level. (dt_node::append_node): Do not keep DT_TRUE last. (dt_node::gen_kids): Emit code after each DT_TRUE node seen. * gcc.dg/tree-ssa/ssa-ccp-34.c: New testcase. * gcc.dg/tree-ssa/forwprop-31.c: Likewise. Index: gcc/genmatch.c === *** gcc/genmatch.c.orig 2014-11-14 12:13:31.029117775 +0100 --- gcc/genmatch.c 2014-11-27 15:03:24.085664272 +0100 *** struct dt_node *** 1098,1103 --- 1098,1106 virtual void gen (FILE *, bool) {} void gen_kids (FILE *, bool); + void gen_kids_1 (FILE *, bool, + vecdt_operand *, vecdt_operand *, vecdt_operand *, + vecdt_operand *, vecdt_operand *, vecdt_node *); }; /* Generic decision tree node used for DT_OPERAND and DT_MATCH. */ *** decision_tree::cmp_node (dt_node *n1, dt *** 1203,1211 if (!n1 || !n2 || n1-type != n2-type) return false; ! if (n1 == n2 || n1-type == dt_node::DT_TRUE) return true; if (n1-type == dt_node::DT_OPERAND) return cmp_operand ((as_adt_operand * (n1))-op, (as_adt_operand * (n2))-op); --- 1206,1217 if (!n1 || !n2 || n1-type != n2-type) return false; ! if (n1 == n2) return true; + if (n1-type == dt_node::DT_TRUE) + return false; + if (n1-type == dt_node::DT_OPERAND) return cmp_operand ((as_adt_operand * (n1))-op, (as_adt_operand * (n2))-op); *** decision_tree::cmp_node (dt_node *n1, dt *** 1220,1229 dt_node * decision_tree::find_node (vecdt_node * ops, dt_node *p) { ! for (unsigned i = 0; i ops.length (); ++i) ! if (decision_tree::cmp_node (ops[i], p)) ! return ops[i]; ! return NULL; } --- 1226,1246 dt_node * decision_tree::find_node (vecdt_node * ops, dt_node *p) { ! /* We can merge adjacent DT_TRUE. */ ! if (p-type == dt_node::DT_TRUE !!ops.is_empty () !ops.last ()-type == dt_node::DT_TRUE) ! return ops.last (); ! for (int i = ops.length () - 1; i = 0; --i) ! { ! /* But we can't merge across DT_TRUE nodes as they serve as ! pattern order barriers to make sure that patterns apply !in order of appearance in case multiple matches are possible. */ ! if (ops[i]-type == dt_node::DT_TRUE) ! return NULL; ! if (decision_tree::cmp_node (ops[i], p)) ! return ops[i]; ! } return NULL; } *** dt_node::append_node (dt_node *n) *** 1242,1256 kids.safe_push (n); n-level = this-level + 1; - unsigned len = kids.length (); - - if (len 1 kids[len - 2]-type == dt_node::DT_TRUE) - { - dt_node *p = kids[len - 2]; - kids[len - 2] = kids[len - 1]; - kids[len - 1] = p; - } - return n; } --- 1259,1264 *** dt_node::gen_kids (FILE *f, bool gimple) *** 2006,2012 auto_vecdt_operand * generic_fns; auto_vecdt_operand * preds; auto_vecdt_node * others; - dt_node *true_operand = NULL; for (unsigned i = 0; i kids.length (); ++i) { --- 2014,2019 *** dt_node::gen_kids (FILE *f, bool gimple) *** 2044,2054 || kids[i]-type == dt_node::DT_SIMPLIFY) others.safe_push (kids[i]); else if (kids[i]-type == dt_node::DT_TRUE) ! true_operand = kids[i]; else gcc_unreachable (); } char buf[128]; char *kid_opname = buf; --- 2051,2090 || kids[i]-type == dt_node::DT_SIMPLIFY)
Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
Ping. Thanks, Kyrill On 11/11/14 11:55, Kyrill Tkachov wrote: Hi all, This is the arm implementation of the macro fusion hook. It tries to fuse movw+movt operations together. It also tries to take lo_sum RTXs into account since those generate movt instructions as well. Bootstrapped and tested on arm-none-linux-gnueabihf. Ok for trunk? Thanks, Kyrill 2014-11-11 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm-protos.h (tune_params): Add fuseable_ops field. * config/arm/arm.c (arm_macro_fusion_p): New function. (arm_macro_fusion_pair_p): Likewise. (TARGET_SCHED_MACRO_FUSION_P): Define. (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise. (ARM_FUSE_NOTHING): Likewise. (ARM_FUSE_MOVW_MOVT): Likewise. (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune, arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune, arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune, arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune arm_cortex_a5_tune): Specify fuseable_ops value.
Re: [PATCH] Fix regressions in libgomp testsuite: set flag_fat_lto_objects for offload
On Thu, Nov 27, 2014 at 2:31 PM, Ilya Verbin iver...@gmail.com wrote: On 25 Nov 10:51, Richard Biener wrote: In the patch adding flag_generate_offload sounds like a good solution, I didn't like emitting fat LTO objects unconditionally just because we offload. Here is updated patch. Bootstrap and make check passed on i686-linux and x86_64-linux. Tests with offloading also passed with trunk binutils. OK for trunk? If it also works without the new __gnu_offload symbol then I'd prefer to re-use __gnu_LTO here. Ok with that change. Thanks, Richard. -- Ilya gcc/ * cgraphunit.c (ipa_passes): Handle flag_generate_offload. (symbol_table::compile): Set flag_generate_offload if there is something to offload. * collect2.c (scan_prog_file): Look for the offload info marker symbol. * common.opt (flag_generate_offload): New Variable declaration. * dwarf2out.c (dwarf2out_finish): Handle flag_generate_offload. * ipa-inline-analysis.c (inline_generate_summary): Do not skip if flag_generate_offload is set. * lto-streamer.c (gate_lto_out): Handle flag_generate_offload. * passes.c (ipa_write_summaries): Do not skip if flag_generate_offload is set. * toplev.c (compile_file): Emit offload marker if offload info has been previously emitted. * tree.c (free_lang_data): Do not skip if flag_generate_offload is set. diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index 2fd99a7..fed1a3e 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -2075,7 +2075,7 @@ ipa_passes (void) } /* Some targets need to handle LTO assembler output specially. */ - if (flag_generate_lto) + if (flag_generate_lto || flag_generate_offload) targetm.asm_out.lto_start (); if (!in_lto_p) @@ -2092,7 +2092,7 @@ ipa_passes (void) } } - if (flag_generate_lto) + if (flag_generate_lto || flag_generate_offload) targetm.asm_out.lto_end (); if (!flag_ltrans (in_lto_p || !flag_lto || flag_fat_lto_objects)) @@ -2176,10 +2176,10 @@ symbol_table::compile (void) /* Offloading requires LTO infrastructure. */ if (!in_lto_p g-have_offload) -flag_generate_lto = 1; +flag_generate_offload = 1; /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE. */ - if (flag_generate_lto) + if (flag_generate_lto || flag_generate_offload) lto_streamer_hooks_init (); /* Don't run the IPA passes if there was any error or sorry messages. */ diff --git a/gcc/collect2.c b/gcc/collect2.c index 9c3a1c5..2dcebcd 100644 --- a/gcc/collect2.c +++ b/gcc/collect2.c @@ -2392,12 +2392,16 @@ scan_prog_file (const char *prog_name, scanpass which_pass, if (found_lto) continue; - /* Look for the LTO info marker symbol, and add filename to + /* Look for the LTO or offload info marker symbol, and add filename to the LTO objects list if found. */ for (p = buf; (ch = *p) != '\0' ch != '\n'; p++) if (ch == ' ' p[1] == '_' p[2] == '_' -(strncmp (p + (p[3] == '_' ? 2 : 1), __gnu_lto_v1, 12) == 0) -ISSPACE (p[p[3] == '_' ? 14 : 13])) +(((strncmp (p + (p[3] == '_' ? 2 : 1), + __gnu_lto_v1, 12) == 0) + ISSPACE (p[p[3] == '_' ? 14 : 13])) + || ((strncmp (p + (p[3] == '_' ? 2 : 1), + __gnu_offload_v1, 16) == 0) +ISSPACE (p[p[3] == '_' ? 18 : 17] { add_lto_object (lto_objects, prog_name); diff --git a/gcc/common.opt b/gcc/common.opt index 41c8d4e..752d939 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -67,6 +67,10 @@ int *param_values Variable int flag_generate_lto +; Nonzero if we should write GIMPLE bytecode for offload compilation. +Variable +int flag_generate_offload = 0 + ; True to warn about any objects definitions whose size is larger ; than N bytes. Also want about function definitions whose returned ; values are larger than N bytes, where N is 'larger_than_size'. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 25f0e7d..4ee3102 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -24420,7 +24420,8 @@ dwarf2out_finish (const char *filename) /* When generating LTO bytecode we can not generate new assembler names at this point and all important decls got theirs via free-lang-data. */ - if ((!flag_generate_lto || DECL_ASSEMBLER_NAME_SET_P (decl)) + if (((!flag_generate_lto !flag_generate_offload) + || DECL_ASSEMBLER_NAME_SET_P (decl)) DECL_ASSEMBLER_NAME (decl) != DECL_NAME (decl)) { add_linkage_attr (node-die, decl); diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c index 2f2993c..9d62722
Re: [PATCH] IPA ICF: memory leak fix
On Thu, Nov 27, 2014 at 3:36 PM, Martin Liška mli...@suse.cz wrote: On 11/23/2014 10:01 AM, Markus Trippelsdorf wrote: On 2014.11.22 at 17:46 +0100, Markus Trippelsdorf wrote: On 2014.11.22 at 16:04 +0100, Martin Liška wrote: On 11/22/2014 10:09 AM, Markus Trippelsdorf wrote: On 2014.11.22 at 09:05 +0100, Martin Liška wrote: Following patch removes memory leak that was introduced by very first IPA ICF patch. I would like to thank David for hunting the leak. Patch an bootstrap on x86_86-linux-pc and no regression is introduced. I gave the patch a quick spin on gcc112: *** Error in `/home/trippels/gcc_build_dir/./prev-gcc/lto1': free(): invalid next size (fast): 0x01000a5fc160 *** === Backtrace: = /lib64/libc.so.6(+0xa3d9c)[0x3fff7b6b3d9c] /lib64/libc.so.6(+0xaf0b4)[0x3fff7b6bf0b4] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_ZN3vecIi7va_heap6vl_ptrE7releaseEv-0x1d4bc00)[0x1025dd88] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_ZN7ipa_icf12sem_function14equals_privateEPNS_8sem_itemER8hash_mapIP11symtab_nodeS2_22default_hashmap_traitsE-0x9c083c)[0x116586bc] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_ZN7ipa_icf12sem_function6equalsEPNS_8sem_itemER8hash_mapIP11symtab_nodeS2_22default_hashmap_traitsE-0x9c0578)[0x11658998] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_ZN7ipa_icf18sem_item_optimizer7executeEv-0x9b8774)[0x11660a84] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_ZN7ipa_icf12pass_ipa_icf7executeEP8function-0x9b0314)[0x11668efc] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_Z16execute_one_passP8opt_pass-0x1647588)[0x1098a0a8] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_Z21execute_ipa_pass_listP8opt_pass-0x1644c2c)[0x1098ca7c] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_Z8lto_mainv-0x1df20e4)[0x101b494c] /home/trippels/gcc_build_dir/./prev-gcc/lto1[0x10b599b8] /home/trippels/gcc_build_dir/./prev-gcc/lto1(_ZN6toplev4mainEiPPc-0x1e8be70)[0x101507b8] /home/trippels/gcc_build_dir/./prev-gcc/lto1(main-0x1ec8d8c)[0x1015493c] /lib64/libc.so.6(+0x447ac)[0x3fff7b6547ac] /lib64/libc.so.6(__libc_start_main-0x19cbf4)[0x3fff7b6549d4] === Memory map: ... Thank you for testing, problem is that I should grow the vector by 1, because '0' is used as NULL value. Please try my fixed patch. This one survives bootstrap-lto. Thanks. But Firefox doesn't build: /home/trippels/gcc_test/usr/local/bin/c++ -fPIC -Wall -Wempty-body -Woverloaded-virtual -Wsign-compare -Wwrite-strings -Werror=endif-labels -Werror=int-to-pointer-cast -Werror=missing-braces -Werror=pointer-arith -Werror=return-type -Werror=sequence-point -Werror=unused-label -Werror=trigraphs -Werror=type-limits -Wno-invalid-offsetof -Wcast-align -flto=160 --param lto-partitions=160 -mcpu=power8 -ffunction-sections -fdata-sections -fno-exceptions -fno-strict-aliasing -frtti -fno-exceptions -fno-math-errno -std=gnu++0x -pthread -pipe -UDEBUG -DNDEBUG -O3 -DU_STATIC_IMPLEMENTATION -fvisibility=hidden -W -Wall -pedantic -Wpointer-arith -Wwrite-strings -Wno-long-long -Wno-unused -Wno-unused-parameter -lpthread -Wl,--hash-style=gnu,--as-needed,--gc-sections,--icf=all -Wl,-z,noexecstack -Wl,-z,text -Wl,--build-id -Wl,--gc-sections -o ../../bin/makeconv makeconv.o ucnvstat.o genmbcs.o gencnvex.o -L../../lib -licutu -L../../lib -licui18n -L../../lib -licuuc -L../../stubdata -licudata - l pthread -ldl -lm lto1: internal compiler error: in operator[], at vec.h:736 0x10122377 vecint, va_heap, vl_embed::operator[](unsigned int) ../../gcc/gcc/vec.h:736 0x10d1a0f3 vecint, va_heap, vl_embed::operator[](unsigned int) ../../gcc/gcc/ipa-icf.c:963 0x10d1a0f3 vecint, va_heap, vl_ptr::operator[](unsigned int) ../../gcc/gcc/vec.h:1202 0x10d1a0f3 ipa_icf::sem_function::bb_dict_test(auto_vecint, 0ul, int, int) ../../gcc/gcc/ipa-icf.c:970 0x10d1aa73 ipa_icf::sem_function::equals_private(ipa_icf::sem_item*, hash_mapsymtab_node*, ipa_icf::sem_item*, default_hashmap_traits) ../../gcc/gcc/ipa-icf.c:512 0x10d1afb3 ipa_icf::sem_function::equals(ipa_icf::sem_item*, hash_mapsymtab_node*, ipa_icf::sem_item*, default_hashmap_traits) ../../gcc/gcc/ipa-icf.c:384 0x10d1dba3 ipa_icf::sem_item_optimizer::subdivide_classes_by_equality(bool) ../../gcc/gcc/ipa-icf.c:1833 0x10d2397f ipa_icf::sem_item_optimizer::execute() ../../gcc/gcc/ipa-icf.c:1652 0x10d24a93 ipa_icf_driver ../../gcc/gcc/ipa-icf.c:2382 0x10d24a93 ipa_icf::pass_ipa_icf::execute(function*) ../../gcc/gcc/ipa-icf.c:2430 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See http://gcc.gnu.org/bugs.html for instructions. lto-wrapper: fatal error: /home/trippels/gcc_test/usr/local/bin/c++ returned 1 exit status compilation terminated. /home/trippels/bin/ld: fatal error: lto-wrapper failed collect2: error: ld returned 1 exit
Re: PATCH: PR rtl-optimization/64037: Miscompilation with -Os and enum class : char parameter
On Tue, Nov 25, 2014 at 5:05 PM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Nov 25, 2014 at 7:01 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Nov 25, 2014 at 1:57 PM, H.J. Lu hongjiu...@intel.com wrote: Hi, The enclosed testcase fails on x86 when compiled with -Os since we pass a byte parameter with a byte load in caller and read it as an int in callee. The reason it only shows up with -Os is x86 backend encodes a byte load with an int load if -O isn't used. When a byte load is used, the upper 24 bits of the register have random value for none WORD_REGISTER_OPERATIONS targets. It happens because setup_incoming_promotions in combine.c has /* The mode and signedness of the argument before any promotions happen (equal to the mode of the pseudo holding it at that stage). */ mode1 = TYPE_MODE (TREE_TYPE (arg)); uns1 = TYPE_UNSIGNED (TREE_TYPE (arg)); /* The mode and signedness of the argument after any source language and TARGET_PROMOTE_PROTOTYPES-driven promotions. */ mode2 = TYPE_MODE (DECL_ARG_TYPE (arg)); uns3 = TYPE_UNSIGNED (DECL_ARG_TYPE (arg)); /* The mode and signedness of the argument as it is actually passed, after any TARGET_PROMOTE_FUNCTION_ARGS-driven ABI promotions. */ mode3 = promote_function_mode (DECL_ARG_TYPE (arg), mode2, uns3, TREE_TYPE (cfun-decl), 0); while they are actually passed in register by assign_parm_setup_reg in function.c: /* Store the parm in a pseudoregister during the function, but we may need to do it in a wider mode. Using 2 here makes the result consistent with promote_decl_mode and thus expand_expr_real_1. */ promoted_nominal_mode = promote_function_mode (data-nominal_type, data-nominal_mode, unsignedp, TREE_TYPE (current_function_decl), 2); where nominal_type and nominal_mode are set up with TREE_TYPE (parm) and TYPE_MODE (nominal_type). TREE_TYPE here is I think the bug is here, not in combine.c. Can you try going back in history for both snippets and see if they matched at some point? The bug was introduced by https://gcc.gnu.org/ml/gcc-cvs/2007-09/msg00613.html commit 5d93234932c3d8617ce92b77b7013ef6bede9508 Author: shinwell shinwell@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Thu Sep 20 11:01:18 2007 + gcc/ * combine.c: Include cgraph.h. (setup_incoming_promotions): Rework to allow more aggressive elimination of sign extensions when all call sites of the current function are known to lie within the current unit. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@128618 138bc75d-0d04-0410-961f-82ee72b054a4 Before this commit, combine.c has enum machine_mode mode = TYPE_MODE (TREE_TYPE (arg)); int uns = TYPE_UNSIGNED (TREE_TYPE (arg)); mode = promote_mode (TREE_TYPE (arg), mode, uns, 1); if (mode == GET_MODE (reg) mode != DECL_MODE (arg)) { rtx x; x = gen_rtx_CLOBBER (DECL_MODE (arg), const0_rtx); x = gen_rtx_fmt_e ((uns ? ZERO_EXTEND : SIGN_EXTEND), mode, x); record_value_for_reg (reg, first, x); } It matches function.c: /* This is not really promoting for a call. However we need to be consistent with assign_parm_find_data_types and expand_expr_real_1. */ promoted_nominal_mode = promote_mode (data-nominal_type, data-nominal_mode, unsignedp, 1); r128618 changed mode = promote_mode (TREE_TYPE (arg), mode, uns, 1); to mode3 = promote_mode (DECL_ARG_TYPE (arg), mode2, uns3, 1); It breaks none WORD_REGISTER_OPERATIONS targets. Hmm, I think that DECL_ARG_TYPE makes a difference only for non-WORD_REGISTER_OPERATIONS targets. But yeah, isolated the above change looks wrong. Your patch is ok for trunk if nobody objects within 24h and for branches after a week. Thanks, Richard. -- H.J.
Re: [PATCH] VRP: don't assume strict overflow semantics when checking if a loop wraps
On Fri, Nov 21, 2014 at 2:32 PM, Patrick Palka patr...@parcs.ath.cx wrote: On Fri, Nov 21, 2014 at 7:18 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Nov 21, 2014 at 12:29 PM, Patrick Palka patr...@parcs.ath.cx wrote: When adjusting the value range of an induction variable using SCEV, VRP calls scev_probably_wraps_p() with use_overflow_semantics=true. This parameter set to true makes scev_probably_wraps_p() assume that signed induction variables never wrap, so for these variables it always returns false (when strict overflow rules are in effect). This is wrong because if a signed induction variable really does overflow then we want to give it an INF(OVF) value range and not the (finite) estimation returned by SCEV. While this change shouldn't make a difference in code generation, it should help improve the coverage of -Wstrict-overflow warnings on induction variables like in the test case. OK after bootstrap + regtest on x86_64-unknown-linux-gnu? Hmm, I don't think the change won't affect code-generation. In fact we check for overflow ourselves in the most interesting case (the first block) - only the path adjusting min/max based on the init value and the max value of the type needs to know whether overflow may happen and fail or drop to +-INF(OVF). So I'd rather open-code the relevant cases and not call scev_probably_wraps_p at all. What kind of tests for overflow do you have in mind? max_loop_iterations() in this test case always return INT_MAX so there will be no overflow when computing the upper bound using the number of loop iterations. Do you mean to compare what max_loop_iterations() returns with the range that VRP has inferred for the induction variable? I'm talking about /* Try to use estimated number of iterations for the loop to constrain the final value in the evolution. */ if (TREE_CODE (step) == INTEGER_CST is_gimple_val (init) (TREE_CODE (init) != SSA_NAME || get_value_range (init)-type == VR_RANGE)) { widest_int nit; /* We are only entering here for loop header PHI nodes, so using the number of latch executions is the correct thing to use. */ if (max_loop_iterations (loop, nit)) which should be fine without the scev_probably_wraps check and the fallback tmin/tmax with the min/max of the type only being valid for TYPE_OVERFLOW_UNDEFINED types. At least it should boil down to that, no? Thanks, Richard. Richard. gcc/ * tree-vrp.c (adjust_range_with_scev): Call scev_probably_wraps_p with use_overflow_semantics=false. gcc/testsuite/ * gcc.dg/Wstrict-overflow-27.c: New test. --- gcc/testsuite/gcc.dg/Wstrict-overflow-27.c | 22 ++ gcc/tree-vrp.c | 2 +- 2 files changed, 23 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/Wstrict-overflow-27.c diff --git a/gcc/testsuite/gcc.dg/Wstrict-overflow-27.c b/gcc/testsuite/gcc.dg/Wstrict-overflow-27.c new file mode 100644 index 000..c1f27ab --- /dev/null +++ b/gcc/testsuite/gcc.dg/Wstrict-overflow-27.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options -fstrict-overflow -O2 -Wstrict-overflow } */ + +/* Warn about an overflow when folding i 0. */ + +void bar (unsigned *p); + +int +foo (unsigned *p) +{ + int i; + int sum = 0; + + for (i = 0; i *p; i++) +{ + if (i 0) /* { dg-warning signed overflow } */ + sum += 2; + bar (p); +} + + return sum; +} diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index a75138f..bf9ff61 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -4270,7 +4270,7 @@ adjust_range_with_scev (value_range_t *vr, struct loop *loop, dir == EV_DIR_UNKNOWN /* ... or if it may wrap. */ || scev_probably_wraps_p (init, step, stmt, get_chrec_loop (chrec), - true)) + /*use_overflow_semantics=*/false)) return; /* We use TYPE_MIN_VALUE and TYPE_MAX_VALUE here instead of -- 2.2.0.rc1.23.gf570943
Re: [Ping]Re: [PR63762][4.9] Backport the patch which fixes GCC generates UNPREDICTABLE STR with Rn = Rt for arm
On 26/11/14 18:12, H.J. Lu wrote: On Wed, Nov 26, 2014 at 10:09 AM, Renlin Li renlin...@arm.com wrote: On 26/11/14 12:16, H.J. Lu wrote: On Wed, Nov 26, 2014 at 4:07 AM, Renlin Li renlin...@arm.com wrote: On 20/11/14 16:17, Renlin Li wrote: Hi all, This is a backport for gcc-4_9-branch of the patch [PR63762]GCC generates UNPREDICTABLE STR with Rn = Rt for arm posted in: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02253.html arm-none-eabi has been test on the model, no new issues. bootstrapping and regression tested on x86, no new issues. Is it Okay for gcc-4_9-branch? gcc/ChangeLog: 2014-11-20 Renlin Li renlin...@arm.com PR middle-end/63762 * ira.c (ira): Update preferred class. gcc/testsuite/ChangeLog: 2014-11-20 Renlin Li renlin...@arm.com PR middle-end/63762 * gcc.dg/pr63762.c: New. Ping for it. Please verify if it is the real fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63661 If yes, please add a testcase for PR 63661 and mention it in your ChangeLog entry. Thanks. Hi H.J. Yes, I have verified that, this patch additionally fixes PR 63661. I observed the same behaviour as I saw on arm backend. It will be great if you can double check they are caused by exactly the same reason. I will ask our people to take a look. A new testcase has been added, ChangeLog has been updated to reflect the change. Updated patch has bee attached. Okay for gcc-4_9-branch? Regards, Renlin Li gcc/ChangeLog: 2014-11-26 Renlin Li renlin...@arm.com PR middle-end/63762 PR middle-end/63661 * ira.c (ira): Update preferred class. gcc/testsuite/ChangeLog: 2014-11-26 Renlin Li renlin...@arm.com PR middle-end/63661 PR middle-end/63762 * testsuite/gcc.dg/pr63661.c: New. * testsuite/gcc.dg/pr63762.c: New. pr63661.c should be moved to gcc.target/i386 and run it on PIC target. Thanks. Hi H.J. The patch has been adjusted according to your suggestion. gcc/ChangeLog: 2014-11-27 Renlin Li renlin...@arm.com PR middle-end/63762 PR target/63661 * ira.c (ira): Update preferred class. gcc/testsuite/ChangeLog: 2014-11-27 Renlin Li renlin...@arm.com PR middle-end/63762 PR target/63661 * testsuite/gcc.dg/pr63762.c: New. * testsuite/gcc.target/i386/pr63661.c: New. diff --git a/gcc/ira.c b/gcc/ira.c index 4d91d21..0c703c5 100644 --- a/gcc/ira.c +++ b/gcc/ira.c @@ -5347,7 +5347,18 @@ ira (FILE *f) ira_allocno_iterator ai; FOR_EACH_ALLOCNO (a, ai) - ALLOCNO_REGNO (a) = REGNO (ALLOCNO_EMIT_DATA (a)-reg); + { + int old_regno = ALLOCNO_REGNO (a); + int new_regno = REGNO (ALLOCNO_EMIT_DATA (a)-reg); + + ALLOCNO_REGNO (a) = new_regno; + + if (old_regno != new_regno) + setup_reg_classes (new_regno, reg_preferred_class (old_regno), + reg_alternate_class (old_regno), + reg_allocno_class (old_regno)); + } + } else { diff --git a/gcc/testsuite/gcc.dg/pr63762.c b/gcc/testsuite/gcc.dg/pr63762.c new file mode 100644 index 000..df11067 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr63762.c @@ -0,0 +1,77 @@ +/* PR middle-end/63762 */ +/* { dg-do assemble } */ +/* { dg-options -O2 } */ + +#include stdlib.h + +void *astFree (); +void *astMalloc (); +void astNegate (void *); +int astGetNegated (void *); +void astGetRegionBounds (void *, double *, double *); +int astResampleF (void *, ...); + +extern int astOK; + +int +MaskF (int inside, int ndim, const int lbnd[], const int ubnd[], + float in[], float val) +{ + + void *used_region; + float *c, *d, *out, *tmp_out; + double *lbndgd, *ubndgd; + int *lbndg, *ubndg, idim, ipix, nax, nin, nout, npix, npixg, result = 0; + if (!astOK) return result; + lbndg = astMalloc (sizeof (int)*(size_t) ndim); + ubndg = astMalloc (sizeof (int)*(size_t) ndim); + lbndgd = astMalloc (sizeof (double)*(size_t) ndim); + ubndgd = astMalloc (sizeof (double)*(size_t) ndim); + if (astOK) +{ + astGetRegionBounds (used_region, lbndgd, ubndgd); + npix = 1; + npixg = 1; + for (idim = 0; idim ndim; idim++) +{ + lbndg[ idim ] = lbnd[ idim ]; + ubndg[ idim ] = ubnd[ idim ]; + npix *= (ubnd[ idim ] - lbnd[ idim ] + 1); + if (npixg = 0) npixg *= (ubndg[ idim ] - lbndg[ idim ] + 1); +} + if (npixg = 0 astOK) +{ + if ((inside != 0) == (astGetNegated( used_region ) != 0)) +{ + c = in; + for (ipix = 0; ipix npix; ipix++) *(c++) = val; + result = npix; +} +} + else if (npixg 0 astOK) +{ + if ((inside != 0) == (astGetNegated (used_region) != 0)) +{ + tmp_out = astMalloc (sizeof (float)*(size_t) npix); + if (tmp_out) +{ + c = tmp_out; + for (ipix = 0; ipix npix; ipix++) *(c++) = val; + result =
Re: [PATCH][AArch64] Use std::swap instead of manually swapping
Ping. Thanks, Kyrill On 13/11/14 09:42, Kyrill Tkachov wrote: Hi all, Following the trend in i386 and alpha, this patch uses std::swap to perform swapping of values in the aarch64 backend instead of declaring temporaries. Tested and bootstrapped on aarch64-linux. Ok for trunk? Thanks, Kyrill 2014-11-13 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.c (aarch64_evpc_ext): Use std::swap instead of manual swapping implementation. (aarch64_expand_vec_perm_const_1): Likewise.
Re: [Ping]Re: [PR63762][4.9] Backport the patch which fixes GCC generates UNPREDICTABLE STR with Rn = Rt for arm
On Thu, Nov 27, 2014 at 7:32 AM, Renlin Li renlin...@arm.com wrote: On 26/11/14 18:12, H.J. Lu wrote: On Wed, Nov 26, 2014 at 10:09 AM, Renlin Li renlin...@arm.com wrote: On 26/11/14 12:16, H.J. Lu wrote: On Wed, Nov 26, 2014 at 4:07 AM, Renlin Li renlin...@arm.com wrote: On 20/11/14 16:17, Renlin Li wrote: Hi all, This is a backport for gcc-4_9-branch of the patch [PR63762]GCC generates UNPREDICTABLE STR with Rn = Rt for arm posted in: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02253.html arm-none-eabi has been test on the model, no new issues. bootstrapping and regression tested on x86, no new issues. Is it Okay for gcc-4_9-branch? gcc/ChangeLog: 2014-11-20 Renlin Li renlin...@arm.com PR middle-end/63762 * ira.c (ira): Update preferred class. gcc/testsuite/ChangeLog: 2014-11-20 Renlin Li renlin...@arm.com PR middle-end/63762 * gcc.dg/pr63762.c: New. Ping for it. Please verify if it is the real fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63661 If yes, please add a testcase for PR 63661 and mention it in your ChangeLog entry. Thanks. Hi H.J. Yes, I have verified that, this patch additionally fixes PR 63661. I observed the same behaviour as I saw on arm backend. It will be great if you can double check they are caused by exactly the same reason. I will ask our people to take a look. A new testcase has been added, ChangeLog has been updated to reflect the change. Updated patch has bee attached. Okay for gcc-4_9-branch? Regards, Renlin Li gcc/ChangeLog: 2014-11-26 Renlin Li renlin...@arm.com PR middle-end/63762 PR middle-end/63661 * ira.c (ira): Update preferred class. gcc/testsuite/ChangeLog: 2014-11-26 Renlin Li renlin...@arm.com PR middle-end/63661 PR middle-end/63762 * testsuite/gcc.dg/pr63661.c: New. * testsuite/gcc.dg/pr63762.c: New. pr63661.c should be moved to gcc.target/i386 and run it on PIC target. Thanks. Hi H.J. The patch has been adjusted according to your suggestion. gcc/ChangeLog: 2014-11-27 Renlin Li renlin...@arm.com PR middle-end/63762 PR target/63661 * ira.c (ira): Update preferred class. gcc/testsuite/ChangeLog: 2014-11-27 Renlin Li renlin...@arm.com PR middle-end/63762 PR target/63661 * testsuite/gcc.dg/pr63762.c: New. * testsuite/gcc.target/i386/pr63661.c: New. ^^^ No testsuite/ gcc.target/i386/pr63661.c should be checked into trunk first. -- H.J.
[PATCH][PR63661]Add a test case for trunk.
Hi all, PR63661 is effectively fixed by my previous patch here: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02253.html This patch add a test case for it. Okay for trunk? gcc/testsuite/ChangeLog: 2014-11-27 Renlin Li renlin...@arm.com PR target/63661 * gcc.target/i386/pr63661.c: New.diff --git a/gcc/testsuite/gcc.target/i386/pr63661.c b/gcc/testsuite/gcc.target/i386/pr63661.c new file mode 100644 index 000..8b55146 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr63661.c @@ -0,0 +1,77 @@ +/* PR target/63661 */ +/* { dg-do run } */ +/* { dg-require-effective-target fpic } */ +/* { dg-options -mtune=nehalem -fPIC -O2 } */ + +static void __attribute__((noinline,noclone,hot)) +foo (double a, double q, double *ff, double *gx, int e, int ni) +{ + union +{ + double n; + unsigned long long o; +} punner; + + punner.n = q; + __builtin_printf(B: 0x%016llx %g\n, punner.o, q); + + if(q != 5) +__builtin_abort(); +} + +static int __attribute__((noinline,noclone,hot)) +bar (int order, double q, double c[]) +{ + int ni, nn, i, e; + double g2, x2, de, s, ratio, ff; + + nn = 0; + e = order 1; + s = 0; + ratio = 0; + x2 = 0; + g2 = 0; + + if(q == 0.0) +return 0; + + if (order 5) +{ + ratio = 1.0 / q; + nn = order; +} + + ni = -nn; + + while(1) +{ + de = ratio - g2 - x2; + + foo (0, q, ff, g2, e, ni); + + if((int)de == 0) +break; +} + + s += 2 * nn * c[nn]; + + for (i = 0; i 1; i++) +{ + c[0] = nn; + for (; i 10; i++) +c[i] = 0.0; + c[0] /= s; +} + + return 0; +} + +int +main () +{ + double c[1000]; + + __builtin_printf(A: 0x%016llx\n, (unsigned long long)c); + bar (1, 5.0, c); + return 0; +}
Re: [Ping]Re: [PR63762][4.9] Backport the patch which fixes GCC generates UNPREDICTABLE STR with Rn = Rt for arm
On 27/11/14 15:37, H.J. Lu wrote: On Thu, Nov 27, 2014 at 7:32 AM, Renlin Li renlin...@arm.com wrote: On 26/11/14 18:12, H.J. Lu wrote: On Wed, Nov 26, 2014 at 10:09 AM, Renlin Li renlin...@arm.com wrote: On 26/11/14 12:16, H.J. Lu wrote: On Wed, Nov 26, 2014 at 4:07 AM, Renlin Li renlin...@arm.com wrote: On 20/11/14 16:17, Renlin Li wrote: Hi all, This is a backport for gcc-4_9-branch of the patch [PR63762]GCC generates UNPREDICTABLE STR with Rn = Rt for arm posted in: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02253.html arm-none-eabi has been test on the model, no new issues. bootstrapping and regression tested on x86, no new issues. Is it Okay for gcc-4_9-branch? gcc/ChangeLog: 2014-11-20 Renlin Li renlin...@arm.com PR middle-end/63762 * ira.c (ira): Update preferred class. gcc/testsuite/ChangeLog: 2014-11-20 Renlin Li renlin...@arm.com PR middle-end/63762 * gcc.dg/pr63762.c: New. Ping for it. Please verify if it is the real fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63661 If yes, please add a testcase for PR 63661 and mention it in your ChangeLog entry. Thanks. Hi H.J. Yes, I have verified that, this patch additionally fixes PR 63661. I observed the same behaviour as I saw on arm backend. It will be great if you can double check they are caused by exactly the same reason. I will ask our people to take a look. A new testcase has been added, ChangeLog has been updated to reflect the change. Updated patch has bee attached. Okay for gcc-4_9-branch? Regards, Renlin Li gcc/ChangeLog: 2014-11-26 Renlin Li renlin...@arm.com PR middle-end/63762 PR middle-end/63661 * ira.c (ira): Update preferred class. gcc/testsuite/ChangeLog: 2014-11-26 Renlin Li renlin...@arm.com PR middle-end/63661 PR middle-end/63762 * testsuite/gcc.dg/pr63661.c: New. * testsuite/gcc.dg/pr63762.c: New. pr63661.c should be moved to gcc.target/i386 and run it on PIC target. Thanks. Hi H.J. The patch has been adjusted according to your suggestion. gcc/ChangeLog: 2014-11-27 Renlin Li renlin...@arm.com PR middle-end/63762 PR target/63661 * ira.c (ira): Update preferred class. gcc/testsuite/ChangeLog: 2014-11-27 Renlin Li renlin...@arm.com PR middle-end/63762 PR target/63661 * testsuite/gcc.dg/pr63762.c: New. * testsuite/gcc.target/i386/pr63661.c: New. ^^^ No testsuite/ Hi H.J. gcc.target/i386/pr63661.c should be checked into trunk first. A separate patch is sent to mailing list for this. ChangeLog has been corrected. Regards, Renlin Li gcc/ChangeLog: 2014-11-27 Renlin Li renlin...@arm.com PR middle-end/63762 PR target/63661 * ira.c (ira): Update preferred class. gcc/testsuite/ChangeLog: 2014-11-27 Renlin Li renlin...@arm.com PR middle-end/63762 PR target/63661 * gcc.dg/pr63762.c: New. * gcc.target/i386/pr63661.c: New. diff --git a/gcc/ira.c b/gcc/ira.c index 4d91d21..0c703c5 100644 --- a/gcc/ira.c +++ b/gcc/ira.c @@ -5347,7 +5347,18 @@ ira (FILE *f) ira_allocno_iterator ai; FOR_EACH_ALLOCNO (a, ai) - ALLOCNO_REGNO (a) = REGNO (ALLOCNO_EMIT_DATA (a)-reg); + { + int old_regno = ALLOCNO_REGNO (a); + int new_regno = REGNO (ALLOCNO_EMIT_DATA (a)-reg); + + ALLOCNO_REGNO (a) = new_regno; + + if (old_regno != new_regno) + setup_reg_classes (new_regno, reg_preferred_class (old_regno), + reg_alternate_class (old_regno), + reg_allocno_class (old_regno)); + } + } else { diff --git a/gcc/testsuite/gcc.dg/pr63762.c b/gcc/testsuite/gcc.dg/pr63762.c new file mode 100644 index 000..df11067 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr63762.c @@ -0,0 +1,77 @@ +/* PR middle-end/63762 */ +/* { dg-do assemble } */ +/* { dg-options -O2 } */ + +#include stdlib.h + +void *astFree (); +void *astMalloc (); +void astNegate (void *); +int astGetNegated (void *); +void astGetRegionBounds (void *, double *, double *); +int astResampleF (void *, ...); + +extern int astOK; + +int +MaskF (int inside, int ndim, const int lbnd[], const int ubnd[], + float in[], float val) +{ + + void *used_region; + float *c, *d, *out, *tmp_out; + double *lbndgd, *ubndgd; + int *lbndg, *ubndg, idim, ipix, nax, nin, nout, npix, npixg, result = 0; + if (!astOK) return result; + lbndg = astMalloc (sizeof (int)*(size_t) ndim); + ubndg = astMalloc (sizeof (int)*(size_t) ndim); + lbndgd = astMalloc (sizeof (double)*(size_t) ndim); + ubndgd = astMalloc (sizeof (double)*(size_t) ndim); + if (astOK) +{ + astGetRegionBounds (used_region, lbndgd, ubndgd); + npix = 1; + npixg = 1; + for (idim = 0; idim ndim; idim++) +{ + lbndg[ idim ] = lbnd[ idim ]; + ubndg[ idim ] = ubnd[ idim ]; + npix *= (ubnd[ idim ] - lbnd[ idim ] + 1); + if (npixg = 0) npixg *= (ubndg[ idim ] - lbndg[
Re: [PATCH] Fix PR lto/64075
On Nov 26, 2014, at 1:35 AM, Ilya Enkovich enkovich@gmail.com wrote: This patch fixes LTO streamers which were not adjusted when function_code field was extended up to 12 bits. OK for trunk after bootstrap and check? Please could you burry this in a #define someplace. I think I have a patch that takes it up to 13, and don't look forward to the same issue. Also 12 is kinda short sighted for a port. 13 is plenty big enough for the next 10 years I suspect.
Re: [arm][patch] fix arm_neon_ok check on !arm_arch7
On Nov 26, 2014, at 4:24 AM, Andrew Stubbs andrew_stu...@mentor.com wrote: On 14/11/14 11:12, Andrew Stubbs wrote: On 07/11/14 10:35, Andrew Stubbs wrote: if armv6 never co-exist with NEON, personally I think your original patch is better because TARGET_NEON generally will be used when all options are processed. any way, this needs gate keeper's approval. Ping, Richard. Ping. Ping. Could you include a link or the patch. If the test suite, I'll review it if no one else steps up.
Re: [PATCH 8/9] Negative numbers added for sreal class.
On 11/21/2014 04:21 PM, Martin Liška wrote: On 11/21/2014 04:02 PM, Richard Biener wrote: On Fri, Nov 21, 2014 at 3:39 PM, Martin Liška mli...@suse.cz wrote: Hello. Ok, this is simplified, one can use sreal a = 12345 and it works ;) that's a new API, right? There is no max () and I think that using LONG_MIN here is asking for trouble (host dependence). The comment in the file says the max should be sreal (SREAL_MAX_SIG, SREAL_MAX_EXP) and the min sreal (-SREAL_MAX_SIG, SREAL_MAX_EXP)? Sure, sreal can store much bigger(smaller) numbers :) Where do you need sreal::to_double? The host shouldn't perform double calculations so it can be only for dumping? In which case the user should have used sreal::dump (), maybe with extra arguments. That new function was request from Honza, only for debugging purpose. I agree that dump should this kind of job. If no other problem, I will run tests once more and commit it. Thanks, Martin -#define SREAL_MAX_EXP (INT_MAX / 4) +#define SREAL_MAX_EXP (INT_MAX / 8) this change doesn't look necessary anymore? Btw, it's also odd that... #define SREAL_PART_BITS 32 ... #define SREAL_MIN_SIG ((uint64_t) 1 (SREAL_PART_BITS - 1)) #define SREAL_MAX_SIG (((uint64_t) 1 SREAL_PART_BITS) - 1) thus all m_sig values fit in 32bits but we still use a uint64_t m_sig ... (the implementation uses 64bit for internal computations, but still the storage is wasteful?) Of course the way normalize() works requires that storage to be 64bits to store unnormalized values. I'd say ok with the SREAL_MAX_EXP change reverted. Hi. You are right, this change was done because I used one bit for m_negative (bitfield), not needed any more. Final version attached. Thank you, Martin Thanks, Richard. Otherwise looks good to me and sorry for not noticing the above earlier. Thanks, Richard. Thanks, Martin }; extern void debug (sreal ref); @@ -76,12 +133,12 @@ inline sreal operator+= (sreal a, const sreal b) inline sreal operator-= (sreal a, const sreal b) { -return a = a - b; + return a = a - b; } inline sreal operator/= (sreal a, const sreal b) { -return a = a / b; + return a = a / b; } inline sreal operator*= (sreal a, const sreal b) -- 2.1.2 Hello. After IRC discussions, I decided to give sreal another refactoring where I use int64_t for m_sig. This approach looks much easier and straightforward. I would like to ask folk for comments? I am able to run profiled bootstrap on x86_64-linux-pc and ppc64-linux-pc and new regression is introduced. Thanks, Martin From bff0b4b803271788cd90cfd4032ed6d4e6e95707 Mon Sep 17 00:00:00 2001 From: mliska mli...@suse.cz Date: Wed, 26 Nov 2014 15:46:42 +0100 Subject: [PATCH] New sreal implementation which uses int64_t as m_sig. gcc/ChangeLog: 2014-11-27 Martin Liska mli...@suse.cz * sreal.c (sreal::shift_right): New implementation for int64_t as m_sig. (sreal::normalize): Likewise. (sreal::to_int): Likewise. (sreal::operator+): Likewise. (sreal::operator-): Likewise. (sreal::operator*): Likewise. (sreal::operator/): Likewise. (sreal::signedless_minus): Removed. (sreal::signedless_plus): Removed. * sreal.h (sreal::operator): New implementation for int64_t as m_sig. --- gcc/sreal.c | 129 +++- gcc/sreal.h | 52 ++-- 2 files changed, 61 insertions(+), 120 deletions(-) diff --git a/gcc/sreal.c b/gcc/sreal.c index 2b5e3ae..304feb0 100644 --- a/gcc/sreal.c +++ b/gcc/sreal.c @@ -91,7 +91,7 @@ sreal::shift_right (int s) m_exp += s; - m_sig += (uint64_t) 1 (s - 1); + m_sig += (int64_t) 1 (s - 1); m_sig = s; } @@ -100,43 +100,46 @@ sreal::shift_right (int s) void sreal::normalize () { + int64_t s = m_sig 0 ? -1 : 1; + uint64_t sig = m_sig == LONG_MIN ? LONG_MAX : std::abs (m_sig); + if (m_sig == 0) { - m_negative = 0; m_exp = -SREAL_MAX_EXP; } - else if (m_sig SREAL_MIN_SIG) + else if (sig SREAL_MIN_SIG) { do { - m_sig = 1; + sig = 1; m_exp--; + gcc_checking_assert (sig); } - while (m_sig SREAL_MIN_SIG); + while (sig SREAL_MIN_SIG); /* Check underflow. */ if (m_exp -SREAL_MAX_EXP) { m_exp = -SREAL_MAX_EXP; - m_sig = 0; + sig = 0; } } - else if (m_sig SREAL_MAX_SIG) + else if (sig SREAL_MAX_SIG) { int last_bit; do { - last_bit = m_sig 1; - m_sig = 1; + last_bit = sig 1; + sig = 1; m_exp++; } - while (m_sig SREAL_MAX_SIG); + while (sig SREAL_MAX_SIG); /* Round the number. */ - m_sig += last_bit; - if (m_sig SREAL_MAX_SIG) + sig += last_bit; + if (sig SREAL_MAX_SIG) { - m_sig = 1; + sig = 1; m_exp++; } @@ -144,9 +147,11 @@ sreal::normalize () if (m_exp SREAL_MAX_EXP) { m_exp = SREAL_MAX_EXP; - m_sig = SREAL_MAX_SIG; + sig = SREAL_MAX_SIG; } } + +
Re: [Build, Graphite] PR64017 - support ISL-0.14 (gcc/configure.ac and gcc/graphite*.c)
This patch bootstraps fine on x86_64-apple-darwin14 against isl 0.14 at r218098. Jack On Wed, Nov 26, 2014 at 10:55 AM, Tobias Burnus tobias.bur...@physik.fu-berlin.de wrote: This patch adds a configure check for isl_schedule_constraints_compute_schedule, which is new in ISL 0.13 - and uses it for conditional compilation. The graphite*c patch is based on the one of Jack Howarth, https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00906.html - without checking whether the ISL replacements make sense. With the patch, I have successfully bootstrapped GCC with ISL 0.12.2 and ISL 0.14 (both using an in-tree build). [I still have to regtest the two variants and I also want to do a system-ISL built with ISL 0.12.2] Is the patch OK for the trunk? Tobias PS: I'd be happy if some others could run additional tests. PPS: If the patch is in, can someone put ISL 0.14 to infrastructure? Then one can update contrib/download_prerequisites - such that the graphite failure is at least fixed for in-tree builds (cf. PR64017, PR62289).
[RFC diagnostics/fortran] Move gfc_warning (buffered) to the common diagnostics machinery
This is just a RFC so no Changelog yet. It bootstraps and passes the testsuite. I have three major questions: * Dodji: Do the common diagnostics part look reasonable? I tried to be as least invasive as possible. If you have comments or suggestions they are very welcome. * Fortran devs: Is this approach acceptable? The main idea is to have an output_buffer called pp_warning_buffer with the flush_p bit unset if we are buffering. When printing buffered warnings, use this output_buffer in the global_dc-printer instead of the (unbuffered one) used by the *_now variants. In principle this could support several buffered diagnostics, but Fortran only seems to buffer at most one. The ugliest part is how to handle warningcount and werrorcount. I could handle this in the common machinery in a better way by storing DK_WERROR in the diagnostic-kind and checking it after printing. This way we can first decrease both counters, then increase back the one not changed and store the kind somewhere. then if the diagnostic is canceled nothing is done, if it is flushed, then increase the appropriate counter (perhaps calling into the common part for any post action due to Wfatal-errors or -fmax-errors=). I can also hide the output_buffer switching inside two helper functions, but the helper function would need to use either a static variable or a global one to save and restore the tmp_buffer. I'm not sure that is better or worse (the current code uses a global pointer cur_error_buffer, so perhaps I should have used a similar approach). * Fortran devs #2: The testsuite is testing that the warning is eventually printed. However, I'm not sure it is testing when the warning is buffered and then discarded, is it? If not, how can I produce such a test? Thanks, Manuel. Index: gcc/pretty-print.c === --- gcc/pretty-print.c (revision 218090) +++ gcc/pretty-print.c (working copy) @@ -40,7 +40,8 @@ cur_chunk_array (), stream (stderr), line_length (), -digit_buffer () +digit_buffer (), +flush_p (true) { obstack_init (formatted_obstack); obstack_init (chunk_obstack); @@ -679,12 +680,25 @@ pp_wrapping_mode (pp) = oldmode; } -/* Flush the content of BUFFER onto the attached stream. */ +/* Flush the content of BUFFER onto the attached stream. This + function does nothing unless pp-output_buffer-flush_p. */ void pp_flush (pretty_printer *pp) { + pp_clear_state (pp); + if (!pp-buffer-flush_p) +return; pp_write_text_to_stream (pp); + fflush (pp_buffer (pp)-stream); +} + +/* Flush the content of BUFFER onto the attached stream independently + of the value of pp-output_buffer-flush_p. */ +void +pp_really_flush (pretty_printer *pp) +{ pp_clear_state (pp); + pp_write_text_to_stream (pp); fflush (pp_buffer (pp)-stream); } Index: gcc/pretty-print.h === --- gcc/pretty-print.h (revision 218090) +++ gcc/pretty-print.h (working copy) @@ -100,6 +100,11 @@ /* This must be large enough to hold any printed integer or floating-point value. */ char digit_buffer[128]; + + /* Nonzero means that text should be flushed when + appropriate. Otherwise, text is buffered until either + pp_pp_really_flush or pp_clear_output_area are called. */ + bool flush_p; }; /* The type of pretty-printer flags passed to clients. */ @@ -314,6 +319,7 @@ extern void pp_verbatim (pretty_printer *, const char *, ...) ATTRIBUTE_GCC_PPDIAG(2,3); extern void pp_flush (pretty_printer *); +extern void pp_really_flush (pretty_printer *); extern void pp_format (pretty_printer *, text_info *); extern void pp_output_formatted_text (pretty_printer *); extern void pp_format_verbatim (pretty_printer *, text_info *); Index: gcc/fortran/gfortran.h === --- gcc/fortran/gfortran.h (revision 218090) +++ gcc/fortran/gfortran.h (working copy) @@ -2443,7 +2443,6 @@ int dump_fortran_optimized; int warn_aliasing; - int warn_ampersand; int warn_function_elimination; int warn_implicit_interface; int warn_implicit_procedure; @@ -2691,6 +2690,7 @@ const char *gfc_print_wide_char (gfc_char_t); void gfc_warning (const char *, ...) ATTRIBUTE_GCC_GFC(1,2); +bool gfc_warning_1 (int opt, const char *, ...) ATTRIBUTE_GCC_GFC(2,3); void gfc_warning_now_1 (const char *, ...) ATTRIBUTE_GCC_GFC(1,2); bool gfc_warning_now (const char *, ...) ATTRIBUTE_GCC_GFC(1,2); bool gfc_warning_now (int opt, const char *, ...) ATTRIBUTE_GCC_GFC(2,3); @@ -2719,7 +2719,7 @@ void gfc_free_error (gfc_error_buf *); void gfc_get_errors (int *, int *); -void gfc_errors_to_warnings (int); +void gfc_errors_to_warnings (bool); /* arith.c */ void gfc_arith_init_1 (void); Index: gcc/fortran/error.c === --- gcc/fortran/error.c (revision
Re: [arm][patch] fix arm_neon_ok check on !arm_arch7
On 27/11/14 17:05, Mike Stump wrote: Could you include a link or the patch. If the test suite, I'll review it if no one else steps up. The original patch is here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01119.html Thanks Andrew
Re: [PATCH] Fix PR lto/64075
2014-11-27 19:55 GMT+03:00 Mike Stump mikest...@comcast.net: On Nov 26, 2014, at 1:35 AM, Ilya Enkovich enkovich@gmail.com wrote: This patch fixes LTO streamers which were not adjusted when function_code field was extended up to 12 bits. OK for trunk after bootstrap and check? Please could you burry this in a #define someplace. I think I have a patch that takes it up to 13, and don't look forward to the same issue. Also 12 is kinda short sighted for a port. 13 is plenty big enough for the next 10 years I suspect. I've committed the patch already.
Re: [RFC diagnostics/fortran] Move gfc_warning (buffered) to the common diagnostics machinery
On Thu, Nov 27, 2014 at 06:25:01PM +0100, Manuel López-Ibáñez wrote: The ugliest part is how to handle warningcount and werrorcount. I could handle this in the common machinery in a better way by storing DK_WERROR in the diagnostic-kind and checking it after printing. This way we can first decrease both counters, then increase back the one not changed and store the kind somewhere. then if the diagnostic is canceled nothing is done, if it is flushed, then increase the appropriate counter (perhaps calling into the common part for any post action due to Wfatal-errors or -fmax-errors=). I can also hide the output_buffer switching inside two helper functions, but the helper function would need to use either a static variable or a global one to save and restore the tmp_buffer. I'm not sure that is better or worse (the current code uses a global pointer cur_error_buffer, so perhaps I should have used a similar approach). Hi Manuel, First, thanks for all your work on the error/warning code in gfortran. Second, I'm dredging through my memories from a decade or so ago. Others are encouraged to jump in and correct me. The design of gfortran's warning/error code is largely unchanged from when g95 was originally imported into the tree. There have been, of course, some changes but the design was set. My understanding of how it all works(ed) is that gfortran's parsing runs a series of matchers over the input. A matcher may generate an error/warning, which is written into a buffer. But, instead of immediately reporting the issue gfortran may continue to run a series of matchers to see if the input is in fact validate. gfortran will report the last buffered error/warning message only after the last matcher fails. In running the series of matchers, there are times when a matcher may hit an error/warning condition and it is, in fact, a problem and gfortran wants to report the problem NOW. This is the origin of the *_now variants. I would need to go dredge through ChangeLogs, but I believe I may be responsible for the counting of error messages. I know that I at least set the default value for -fmax-error. The origins to the counting is that once gfortran encountered and reported an error, she would discard the problematic statement and continue parsing the input. This often leads to a series of spurious run-on error messages, which are all caused by the first error. Thus, if one does -fmax-error=1 and fixes the problem, then all the other problems disappear. Finally, gfortran's error/warning mechanism isn't immediately available at the start of execution. However, errors can occur before the mechanism is initialized. This is one reason why one finds fatal_error() spinkled throughout the code. * Fortran devs #2: The testsuite is testing that the warning is eventually printed. However, I'm not sure it is testing when the warning is buffered and then discarded, is it? If not, how can I produce such a test? I think that you can't check for a discarded buffered message. In fact, why bother? -- Steve
Re: [C/C++ PATCH] Don't convert RHS of a shift-expression to int (PR c/63862)
On Thu, Nov 27, 2014 at 10:56:06AM +0100, Richard Biener wrote: But then rather than using integer_type_node I'd convert to unsigned_type_node (if that doesn't work you know we have some nasty dependence on negative shift counts working) Yeah, I think adding the conversion into c_gimplify_expr is the best thing we can do until we have type demotion/promotion pass. Seems that unsigned_type_node there works well. (What happened to Kai's type elevation pass?) Bootstrapped/regtested on ppc64-linux, ok for trunk? 2014-11-27 Marek Polacek pola...@redhat.com PR c/63862 c-family/ * c-ubsan.c (ubsan_instrument_shift): Change the type of a MINUS_EXPR to op1_utype. * c-gimplify.c (c_gimplify_expr): Convert right operand of a shift expression to unsigned_type_node. c/ * c-typeck.c (build_binary_op) RSHIFT_EXPR, LSHIFT_EXPR: Don't convert the right operand to integer type. cp/ * typeck.c (cp_build_binary_op) RSHIFT_EXPR, LSHIFT_EXPR: Don't convert the right operand to integer type. testsuite/ * gcc.c-torture/execute/shiftopt-1.c: Don't XFAIL anymore. * c-c++-common/ubsan/shift-7.c: New test. diff --git gcc/c-family/c-gimplify.c gcc/c-family/c-gimplify.c index 85b4223..8e0eb4c 100644 --- gcc/c-family/c-gimplify.c +++ gcc/c-family/c-gimplify.c @@ -242,6 +242,24 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p ATTRIBUTE_UNUSED, switch (code) { +case LSHIFT_EXPR: +case RSHIFT_EXPR: + { + /* We used to convert the right operand of a shift-expression + to an integer_type_node in the FEs. But it is unnecessary + and not desirable for diagnostics and sanitizers. We keep + this here to not pessimize the code, but we convert to an + unsigned type, because negative shift counts are undefined + anyway. + We should get rid of this conversion when we have a proper + type demotion/promotion pass. */ + tree *op1_p = TREE_OPERAND (*expr_p, 1); + if (TREE_CODE (TREE_TYPE (*op1_p)) != VECTOR_TYPE +TYPE_MAIN_VARIANT (TREE_TYPE (*op1_p)) != unsigned_type_node) + *op1_p = convert (unsigned_type_node, *op1_p); + break; + } + case DECL_EXPR: /* This is handled mostly by gimplify.c, but we have to deal with not warning about int x = x; as it is a GCC extension to turn off diff --git gcc/c-family/c-ubsan.c gcc/c-family/c-ubsan.c index 90b03f2..96afc67 100644 --- gcc/c-family/c-ubsan.c +++ gcc/c-family/c-ubsan.c @@ -151,7 +151,7 @@ ubsan_instrument_shift (location_t loc, enum tree_code code, !TYPE_UNSIGNED (type0) flag_isoc99) { - tree x = fold_build2 (MINUS_EXPR, unsigned_type_node, uprecm1, + tree x = fold_build2 (MINUS_EXPR, op1_utype, uprecm1, fold_convert (op1_utype, op1)); tt = fold_convert_loc (loc, unsigned_type_for (type0), op0); tt = fold_build2 (RSHIFT_EXPR, TREE_TYPE (tt), tt, x); diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c index 67efb46..bf0f306 100644 --- gcc/c/c-typeck.c +++ gcc/c/c-typeck.c @@ -10513,11 +10513,6 @@ build_binary_op (location_t location, enum tree_code code, /* Use the type of the value to be shifted. */ result_type = type0; - /* Convert the non vector shift-count to an integer, regardless -of size of value being shifted. */ - if (TREE_CODE (TREE_TYPE (op1)) != VECTOR_TYPE - TYPE_MAIN_VARIANT (TREE_TYPE (op1)) != integer_type_node) - op1 = convert (integer_type_node, op1); /* Avoid converting op1 to result_type later. */ converted = 1; } @@ -10563,11 +10558,6 @@ build_binary_op (location_t location, enum tree_code code, /* Use the type of the value to be shifted. */ result_type = type0; - /* Convert the non vector shift-count to an integer, regardless -of size of value being shifted. */ - if (TREE_CODE (TREE_TYPE (op1)) != VECTOR_TYPE - TYPE_MAIN_VARIANT (TREE_TYPE (op1)) != integer_type_node) - op1 = convert (integer_type_node, op1); /* Avoid converting op1 to result_type later. */ converted = 1; } diff --git gcc/cp/typeck.c gcc/cp/typeck.c index 8b66acc..6ca346b 100644 --- gcc/cp/typeck.c +++ gcc/cp/typeck.c @@ -4295,10 +4295,6 @@ cp_build_binary_op (location_t location, right shift count = width of type); } } - /* Convert the shift-count to an integer, regardless of -size of value being shifted. */ - if (TYPE_MAIN_VARIANT (TREE_TYPE (op1)) != integer_type_node) - op1 = cp_convert (integer_type_node, op1, complain); /* Avoid converting op1 to result_type later. */ converted = 1; } @@ -4344,10 +4340,6 @@ cp_build_binary_op (location_t location,
Re: [RFC diagnostics/fortran] Move gfc_warning (buffered) to the common diagnostics machinery
Manuel López-Ibáñez wrote: * Fortran devs: Is this approach acceptable? The main idea is to have an output_buffer called pp_warning_buffer with the flush_p bit unset if we are buffering. When printing buffered warnings, use this output_buffer in the global_dc-printer instead of the (unbuffered one) used by the *_now variants. In principle this could support several buffered diagnostics, but Fortran only seems to buffer at most one. I think the approach is fine. As the _now version overrides the buffer, one might even do with a single buffer by clearing it, setting flush_p temporarily to true and printing the message. It only might collide with buffered warnings (for error_now) and errors (for warning_now), but I don't see whether that's an issue. For warnings/error_now it probably isn't, for errors/warning_now, it might. Thus, having two buffers is probably better. The ugliest part is how to handle warningcount and werrorcount. I could handle this in the common machinery in a better way by storing DK_WERROR in the diagnostic-kind and checking it after printing. I'm missing the but. Does it require more modifications in common code and thus you shy away? Or is there another reason? I can also hide the output_buffer switching inside two helper functions, but the helper function would need to use either a static variable or a global one to save and restore the tmp_buffer. I'm not sure that is better or worse (the current code uses a global pointer cur_error_buffer, so perhaps I should have used a similar approach). Me neither. The current approach is rather localized in error.c; thus, it doesn't really matter. * Fortran devs #2: The testsuite is testing that the warning is eventually printed. However, I'm not sure it is testing when the warning is buffered and then discarded, is it? If not, how can I produce such a test? Well, for nearly every Fortran program, at least for one line multiple attempts have to be done by the parser. Thus, if you set a break point in gfc_error, you will see syntax error messages which never make it to the screen. As the test suite checks for excess errors, nearly every test-suite file tests this. * * * With the patch in place, plus its gfc_error cousin, and after the follow-up conversion work, only the following would remain: * gfc_warning: a single occurence of two locations in the same error string (of ~120 calls) * gfc_error: approx. 34 times two locations (of ~1630 calls) * gfc_error_now_1: 11 times two locations * In scanner.c: - gfc_warning_now_1: 6 calls in scanner.c - gfc_warning: 3 - fprintf: 2 - gfc_error: 0 Some of those can probably be simply converted, others either need to remain, or one has to setup a proper location (currently, using gfc_warning_now would ICE), or one creates a work around and constructs manually the error string (at least for the fprintf cases). * * * Regarding the use of libcpp: Currently, it is only used with explicit preprocessing (-cpp, file extension) and it is used to output into a temporary file which is then read back. I experimented with token = cpp_get_token (cpp_in); const char *str = cpp_token_as_text (cpp_in, token); which kind of works okay. But I do not understand how one gets linebreaks and spacing correctly. For linebreaks, I can use flags BOL or a callback. But I'm still trying to understand how one can get the number of spacings. flags PREV_WHITE seems to record only a single one and seems to convert all of them (\t, \n and ' ') into a single type. For fixed-form Fortran, spaces are essential. On punch cards,* only the first 72 characters were read; as (some?) punch cards had additional 8 characters, those were used for comments (e.g. to enumerate the cards). Hence, there are codes out there, which assumes that everything beyond 72 characters is ignored. Thus, the number of spaces is crucial – at least with -fpreprocessed, if one wants to always goes through libcpp. Looking at the current code, it seems to use a line-break callback with SOURCE_COLUMN to reconstruct the indentation. I think saving the token src_column and the length of the token string and subtracting it from the new source location, will also work for mid-line spaces. However, that's really a kludge – and it still doesn't permit one to distiniguish between spaces and tabs. Finally, the token handling seems to get in trouble with print *, Hello world! where a single string extends over two lines. Taking everything together, I think using libcpp for reading Fortran files is a topic for GCC 6. Any suggestion how to properly handle the spacing? For the lexing itself, one probably needs a Fortran mode, which can recognize Fortran comments, continuation lines and some special properties about Fortran string. When implemented, one could then also turn of the traditional mode. Tobias * Disclaimer: I started with Fortran 95 and a rather object-oriented code. Hence, I might
Re: [C/C++ PATCH] Don't convert RHS of a shift-expression to int (PR c/63862)
On Thu, Nov 27, 2014 at 08:24:03PM +0100, Marek Polacek wrote: On Thu, Nov 27, 2014 at 10:56:06AM +0100, Richard Biener wrote: But then rather than using integer_type_node I'd convert to unsigned_type_node (if that doesn't work you know we have some nasty dependence on negative shift counts working) Yeah, I think adding the conversion into c_gimplify_expr is the best thing we can do until we have type demotion/promotion pass. Seems that unsigned_type_node there works well. (What happened to Kai's type elevation pass?) Bootstrapped/regtested on ppc64-linux, ok for trunk? LGTM. 2014-11-27 Marek Polacek pola...@redhat.com PR c/63862 c-family/ * c-ubsan.c (ubsan_instrument_shift): Change the type of a MINUS_EXPR to op1_utype. * c-gimplify.c (c_gimplify_expr): Convert right operand of a shift expression to unsigned_type_node. c/ * c-typeck.c (build_binary_op) RSHIFT_EXPR, LSHIFT_EXPR: Don't convert the right operand to integer type. cp/ * typeck.c (cp_build_binary_op) RSHIFT_EXPR, LSHIFT_EXPR: Don't convert the right operand to integer type. testsuite/ * gcc.c-torture/execute/shiftopt-1.c: Don't XFAIL anymore. * c-c++-common/ubsan/shift-7.c: New test. Jakub
Re: [patch c++]: Fix PR/53904
On Wed, Nov 26, 2014 at 9:52 AM, Kai Tietz ktiet...@googlemail.com wrote: Ok. Adjusted patch attached. Nevertheless we should use here unsigned HWI instead of possible truncation to signed int. I admit that it is unlikely to have more then 2^31 elements, but well Ok for apply with adjusted ChangeLog? Regards, Kai On Linux/x86, the testcase fails with output is: /export/gnu/import/git/gcc/gcc/testsuite/g++.dg/cpp0x/pr63904.C: In instantiation of 'struct foo0':^M /export/gnu/import/git/gcc/gcc/testsuite/g++.dg/cpp0x/pr63904.C:10:26: required from here^M /export/gnu/import/git/gcc/gcc/testsuite/g++.dg/cpp0x/pr63904.C:6:12: error: ISO C++ forbids zero-size array [-Wpedantic]^M FAIL: g++.dg/cpp0x/pr63904.C -std=c++11 (test for excess errors) H.J.
Re: PATCH: PR target/63833: REAL_PIC_OFFSET_TABLE_REGNUM is wrong for x86-64
On Wed, Nov 12, 2014 at 6:53 AM, H.J. Lu hongjiu...@intel.com wrote: We have been using the wrong register to hold GOT in 64-bit large model, which is used by the large model PLT. The only reason we haven't run into any problem is linker doesn't support the large model PLT. I am looking into linker issue. This patch corrects REAL_PIC_OFFSET_TABLE_REGNUM for 64-bit large model. OK to install? Thanks. H.J. --- 2014-11-12 H.J. Lu hongjiu...@intel.com PR target/63833 * config/i386/i386.h (REAL_PIC_OFFSET_TABLE_REGNUM): Use R15_REG for 64-bit. * config/i386/rdos64.h (REAL_PIC_OFFSET_TABLE_REGNUM): Removed. diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 53dfd22..0ccfb21 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -1244,7 +1244,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); the pic register when possible. The change is visible after the prologue has been emitted. */ -#define REAL_PIC_OFFSET_TABLE_REGNUM BX_REG +#define REAL_PIC_OFFSET_TABLE_REGNUM (TARGET_64BIT ? R15_REG : BX_REG) #define PIC_OFFSET_TABLE_REGNUM \ ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ diff --git a/gcc/config/i386/rdos64.h b/gcc/config/i386/rdos64.h index e6f089a..f902651 100644 --- a/gcc/config/i386/rdos64.h +++ b/gcc/config/i386/rdos64.h @@ -17,8 +17,5 @@ You should have received a copy of the GNU General Public License along with GCC; see the file COPYING3. If not see http://www.gnu.org/licenses/. */ -#undef REAL_PIC_OFFSET_TABLE_REGNUM -#define REAL_PIC_OFFSET_TABLE_REGNUM R15_REG - #undef DEFAULT_LARGE_SECTION_THRESHOLD #define DEFAULT_LARGE_SECTION_THRESHOLD 16 Hi Uros, The psABI http://www.x86-64.org/svn/trunk/x86-64-ABI/ has been fixed. Is this OK for trunk? Thanks. -- H.J.
Re: PATCH: PR target/63833: REAL_PIC_OFFSET_TABLE_REGNUM is wrong for x86-64
On Wed, Nov 12, 2014 at 3:53 PM, H.J. Lu hongjiu...@intel.com wrote: We have been using the wrong register to hold GOT in 64-bit large model, which is used by the large model PLT. The only reason we haven't run into any problem is linker doesn't support the large model PLT. I am looking into linker issue. This patch corrects REAL_PIC_OFFSET_TABLE_REGNUM for 64-bit large model. OK to install? Thanks. H.J. --- 2014-11-12 H.J. Lu hongjiu...@intel.com PR target/63833 * config/i386/i386.h (REAL_PIC_OFFSET_TABLE_REGNUM): Use R15_REG for 64-bit. * config/i386/rdos64.h (REAL_PIC_OFFSET_TABLE_REGNUM): Removed. OK, the ABI documentation has just been fixed. Thanks, Uros.
Re: [patch c++]: Fix PR/53904
Sorry, missed to add needed hunk to disable pedantic warnings for this testcase. Committed it as obvious fix at rev 218130. Kai
[PATCH][ARM] FreeBSD arm support, EABI, v2
Hi all, this is the second attempt. I reworked the issues Richard mentioned in the previous review. I also found one issue which will break build/bootstrap if I pass --enable-gnu-indirect-function, also fixed. One thing which came up is the way we generate code for the armv6*-*-freebsd* triplet versus the arm-*-freebsd* triplet. I think the thing which confuses is the fact that we have only two fixed triplets where we build a complete OS with. Means the whole OS is built with the same optimization, not only the kernel or one binary. For the armv6* we want to benefit from the cpu's functionality by default. We build all __ARM_ARCH = 6 with TARGET_CPU_arm1176jzs, on the other hand all __ARM_ARCH =5 will be built with TARGET_CPU_arm9. Now who becomes arm-*-freebsd* and who becomes armv6*-*-freebsd*? As tried above, we only know two triplets, so __ARM_ARCH = 6 becomes armv6*-*-freebsd* and __ARM_ARCH =5 becomes arm-*-freebsd*. armv8 is not yet in the portfolio and it will become something different, either arm64 or aarch64, I do not know. I'd like to keep this since our system compilers, clang and gcc-4.2.1 behave the same. If we have to change here, we would confuse people quite a lot. The whole thing is FreeBSD specific and does not touch others. As usual, bootstrapped, cross compiled, tested. Ok for trunk? TIA, Andreas toplevel: * configure.ac: Don't add ${libgcj} for arm*-*-freebsd*. * configure: Regenerate. gcc: * config.gcc (arm*-*-freebsd*): New configuration. * config/arm/freebsd.h: New file. * config.host: Add extra components for arm*-*-freebsd*. * config/arm/arm.h: Introduce MAX_SYNC_LIBFUNC_SIZE. * config/arm/arm.c (arm_init_libfuncs): Use MAX_SYNC_LIBFUNC_SIZE. libgcc: * config.host (arm*-*-freebsd*): Add new configuration for arm*-*-freebsd*. * config/arm/freebsd-atomic.c: New file. * config/arm/t-freebsd: Likewise. * config/arm/unwind-arm.h: Add __FreeBSD__ to the list of 'PC-relative indirect' OS's. libatomic: * configure.tgt: Exclude arm*-*-freebsd* from try_ifunc. libstdc++-v3: * configure.host: Add arm*-*-freebsd* port_specific_symbol_files. Index: configure === --- configure (revision 218129) +++ configure (working copy) @@ -3427,6 +3427,9 @@ alpha*-*-*vms*) noconfigdirs=$noconfigdirs ${libgcj} ;; + arm*-*-freebsd*) +noconfigdirs=$noconfigdirs ${libgcj} +;; arm-wince-pe) noconfigdirs=$noconfigdirs ${libgcj} ;; Index: configure.ac === --- configure.ac(revision 218129) +++ configure.ac(working copy) @@ -781,6 +781,9 @@ alpha*-*-*vms*) noconfigdirs=$noconfigdirs ${libgcj} ;; + arm*-*-freebsd*) +noconfigdirs=$noconfigdirs ${libgcj} +;; arm-wince-pe) noconfigdirs=$noconfigdirs ${libgcj} ;; Index: gcc/config/arm/arm.c === --- gcc/config/arm/arm.c(revision 218129) +++ gcc/config/arm/arm.c(working copy) @@ -2160,7 +2160,7 @@ { /* For Linux, we have access to kernel support for atomic operations. */ if (arm_abi == ARM_ABI_AAPCS_LINUX) -init_sync_libfuncs (2 * UNITS_PER_WORD); +init_sync_libfuncs (MAX_SYNC_LIBFUNC_SIZE); /* There are no special library functions unless we are using the ARM BPABI. */ Index: gcc/config/arm/arm.h === --- gcc/config/arm/arm.h(revision 218129) +++ gcc/config/arm/arm.h(working copy) @@ -766,6 +766,11 @@ #define PCC_BITFIELD_TYPE_MATTERS TARGET_AAPCS_BASED #endif +/* The maximum size of the sync library functions supported. */ +#ifndef MAX_SYNC_LIBFUNC_SIZE +#define MAX_SYNC_LIBFUNC_SIZE (2 * UNITS_PER_WORD); +#endif + /* Standard register usage. */ Index: gcc/config/arm/freebsd.h === --- gcc/config/arm/freebsd.h(revision 0) +++ gcc/config/arm/freebsd.h(working copy) @@ -0,0 +1,180 @@ +/* Definitions of target machine for GNU compiler, FreeBSD/arm version. + Copyright (C) 2002-2014 Free Software Foundation, Inc. + Contributed by Wasabi Systems, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described
Re: [RFC diagnostics/fortran] Move gfc_warning (buffered) to the common diagnostics machinery
On 27 November 2014 at 20:28, Tobias Burnus bur...@net-b.de wrote: I think the approach is fine. As the _now version overrides the buffer, one might even do with a single buffer by clearing it, setting flush_p temporarily to true and printing the message. It only might collide with buffered warnings (for error_now) and errors (for warning_now), but I don't see whether that's an issue. For warnings/error_now it probably isn't, for errors/warning_now, it might. Thus, having two buffers is probably better. Oh, I didn't notice that the _now versions override the buffered messages. Where do you see that? It seems that gfc_warning_1 saves and restores buffer_flag but does not touch the buffers (it prints directly to the stream). However, as you say, I may still need two buffers, one for errors and other for warnings. (If this were not necessary, it would be great! Is it really?). In fact, we will need more buffers, since there is gfc_push/pop_error. This is why I didn't start with gfc_error, it is a bit more complex than gfc_warning. The ugliest part is how to handle warningcount and werrorcount. I could handle this in the common machinery in a better way by storing DK_WERROR in the diagnostic-kind and checking it after printing. I'm missing the but. Does it require more modifications in common code and thus you shy away? Or is there another reason? I was going to write Exactly and a long explanation. But now I realize that in this particular case I can simply check DK_ERROR, since within gfc_warning that can only come from a warning converted to an error. That simplifies the function a bit: +/* Issue a warning. */ + +bool +gfc_warning_1 (int opt, const char *gmsgid, ...) +{ + va_list argp; + diagnostic_info diagnostic; + bool fatal_errors = global_dc-fatal_errors; + pretty_printer *pp = global_dc-printer; + output_buffer *tmp_buffer = pp-buffer; + + if (!pp_warning_buffer.flush_p) +{ + /* To prevent -fmax-errors= triggering. */ + --werrorcount; + pp-buffer = pp_warning_buffer; + global_dc-fatal_errors = false; +} + + va_start (argp, gmsgid); + diagnostic_set_info (diagnostic, gmsgid, argp, UNKNOWN_LOCATION, + DK_WARNING); + diagnostic.option_index = opt; + bool ret = report_diagnostic (diagnostic); + + if (ret !pp_warning_buffer.flush_p) +{ + warningcount_buffered = 0; + werrorcount_buffered = 0; + /* Undo the above --werrorcount if not Werror, otherwise werrorcount is +correct already. */ + diagnostic.kind == DK_ERROR + ? (++werrorcount_buffered) + : (++werrorcount, --warningcount, ++warningcount_buffered); + + pp-buffer = tmp_buffer; + global_dc-fatal_errors = fatal_errors; +} + + va_end (argp); + return ret; +} Well, for nearly every Fortran program, at least for one line multiple attempts have to be done by the parser. Thus, if you set a break point in gfc_error, you will see syntax error messages which never make it to the screen. As the test suite checks for excess errors, nearly every test-suite file tests this. Sure, but I would like to test specifically triggering and discarding the gfc_warning that I converted (or another single one that you suggest), if this were possible. Some of those can probably be simply converted, others either need to remain, or one has to setup a proper location (currently, using gfc_warning_now would ICE), or one creates a work around and constructs manually the error string (at least for the fprintf cases). Why does it ICE? At worst it should give a wrong location, but the computation of the offset is fairly similar to what Fortran already does. Could you give me a patch+testcase that ICEs? Taking everything together, I think using libcpp for reading Fortran files is a topic for GCC 6. Any suggestion how to properly handle the spacing? No idea. My knowledge of cpp is limited. I think you need to ask Dodji, Tom or Joseph (in IRC or a new thread). I was hoping it was going to be easier to lex Fortran with cpp since cpp can already preprocess it. Thanks for the comment. They are very helpful. Cheers, Manuel.
Re: [RFC PATCH, i386]: Prefer %ebx in set_got patterns
On Thu, Nov 27, 2014 at 3:19 AM, Uros Bizjak ubiz...@gmail.com wrote: Hello! Attached patch helps RA to choose the most appropriate PIC register by changing the register preference for set_got patterns. Using this patch, there should really be a reason for RA to avoid ABI mandated hard PIC reg. This patch avoids many mov %exx,%ebx in front of the calls, that happen with unpatched compiler even with Vladimir's latest RA patch to avoid duplicated PIC registers. As a smoke test, I have checked 32bit libgo.so.6.0.0 library, where now we have: [uros@omen7 .libs]$ grep thunk.bx aaa | wc -l 7693 [uros@omen7 .libs]$ grep thunk.ax aaa | wc -l 10 [uros@omen7 .libs]$ grep thunk.cx aaa | wc -l 4 [uros@omen7 .libs]$ grep thunk.dx aaa | wc -l 8 [uros@omen7 .libs]$ grep thunk.bp aaa | wc -l 497 [uros@omen7 .libs]$ grep thunk.si aaa | wc -l 145 [uros@omen7 .libs]$ grep thunk.di aaa | wc -l 198 2014-11-27 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (set_got): Use =b,?r constraint for operand 0. EBX is needed only if PLT is used. If PLT isn't used, we should try caller saved registers first. (set_got_labelled): Ditto. (set_got_rex64): Ditto. (set_rip_rex64): Ditto. (set_got_offset_rex64): Ditto. We shouldn't do it for 64-bit. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Thoughts? Uros -- H.J.
[PATCH, i386]: Use preferred_for_{size,speed} instead of Yx and Yd register constraints
Hello! Yx and Yd register constraints depend on optimize_function_for_speed. However, optimize_function_for_size is not stable during the compilation, so this can result in unrecognized insn (as was the case with *floatSWI48:modeMODEF:mode2_sse insn failure). The patch uses preferred_for_{size,speed} infrastructure instead to achieve the same functionality as with Yx and Yd register constraint. 2014-11-27 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (preferred_for_size): New attribute (*pushxf): Split Yx*r constraints to r,*r. Use preferred_for_size attribute to conditionally disable alternative 1. (*pushdf): Split Yd*r constraints to r,*r. Use preferred_for_size and prefered_for_speed attributes to conditionally disable alternative 1. (*movxf_internal): Split Yx*r constraints to r,*r. Use preferred_for_size attribute to conditionally disable alternatives 3 and 4. (*movdf_internal): Split Yd*r constraints to r,*r. Use preferred_for_size and prefered_for_speed attributes to conditionally disable alternatives 3 and 4. * config/i386/constraints.md (Yd, Yx): Remove register constraints. Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32} and was committed to mainline SVN. Uros. Index: config/i386/constraints.md === --- config/i386/constraints.md (revision 218129) +++ config/i386/constraints.md (working copy) @@ -105,8 +105,6 @@ ;; n MMX inter-unit moves from MMX register enabled ;; a Integer register when zero extensions with AND are disabled ;; p Integer register when TARGET_PARTIAL_REG_STALL is disabled -;; d Integer register when integer DFmode moves are enabled -;; x Integer register when integer XFmode moves are enabled ;; f x87 register when 80387 floating point arithmetic is enabled (define_register_constraint Yz TARGET_SSE ? SSE_FIRST_REG : NO_REGS @@ -137,15 +135,6 @@ ? NO_REGS : GENERAL_REGS @internal Any integer register when zero extensions with AND are disabled.) -(define_register_constraint Yd - TARGET_INTEGER_DFMODE_MOVES optimize_function_for_speed_p (cfun) - ? GENERAL_REGS : NO_REGS - @internal Any integer register when integer DFmode moves are enabled.) - -(define_register_constraint Yx - optimize_function_for_speed_p (cfun) ? GENERAL_REGS : NO_REGS - @internal Any integer register when integer XFmode moves are enabled.) - (define_register_constraint Yf (ix86_fpmath FPMATH_387) ? FLOAT_REGS : NO_REGS @internal Any x87 register when 80387 FP arithmetic is enabled.) Index: config/i386/i386.md === --- config/i386/i386.md (revision 218129) +++ config/i386/i386.md (working copy) @@ -816,6 +816,7 @@ ] (const_int 1))) +(define_attr preferred_for_size (const_int 1)) (define_attr preferred_for_speed (const_int 1)) ;; Describe a user's asm statement. @@ -2811,8 +2812,8 @@ }) (define_insn *pushxf - [(set (match_operand:XF 0 push_operand =,) - (match_operand:XF 1 general_no_elim_operand f,Yx*roF))] + [(set (match_operand:XF 0 push_operand =,,,) + (match_operand:XF 1 general_no_elim_operand f,r,*r,oF))] { /* This insn should be already split before reg-stack. */ @@ -2819,14 +2820,18 @@ gcc_unreachable (); } [(set_attr type multi) - (set_attr unit i387,*) + (set_attr unit i387,*,*,*) (set (attr mode) - (cond [(eq_attr alternative 1) + (cond [(eq_attr alternative 1,2,3) (if_then_else (match_test TARGET_64BIT) (const_string DI) (const_string SI)) ] - (const_string XF)))]) + (const_string XF))) + (set (attr preferred_for_size) + (cond [(eq_attr alternative 1) + (symbol_ref false)] + (symbol_ref true)))]) ;; %%% Kill this when call knows how to work this out. (define_split @@ -2842,18 +2847,26 @@ }) (define_insn *pushdf - [(set (match_operand:DF 0 push_operand =,,,) - (match_operand:DF 1 general_no_elim_operand f,Yd*roF,rmF,x))] + [(set (match_operand:DF 0 push_operand =,) + (match_operand:DF 1 general_no_elim_operand f,r,*r,oF,rmF,x))] { /* This insn should be already split before reg-stack. */ gcc_unreachable (); } - [(set_attr isa *,nox64,x64,sse2) + [(set_attr isa *,nox64,nox64,nox64,x64,sse2) (set_attr type multi) - (set_attr unit i387,*,*,sse) - (set_attr mode DF,SI,DI,DF)]) - + (set_attr unit i387,*,*,*,*,sse) + (set_attr mode DF,SI,SI,SI,DI,DF) + (set (attr preferred_for_size) + (cond [(eq_attr alternative 1) + (symbol_ref false)] + (symbol_ref true))) + (set (attr preferred_for_speed) + (cond [(eq_attr alternative 1) + (symbol_ref TARGET_INTEGER_DFMODE_MOVES)] + (symbol_ref true)))]) + ;; %%% Kill this when call knows how to work this out.
Re: [RFC diagnostics/fortran] Move gfc_warning (buffered) to the common diagnostics machinery
On 27 November 2014 at 22:33, Manuel López-Ibáñez lopeziba...@gmail.com wrote: On 27 November 2014 at 20:28, Tobias Burnus bur...@net-b.de wrote: I think the approach is fine. As the _now version overrides the buffer, one might even do with a single buffer by clearing it, setting flush_p temporarily to true and printing the message. It only might collide with buffered warnings (for error_now) and errors (for warning_now), but I don't see whether that's an issue. For warnings/error_now it probably isn't, for errors/warning_now, it might. Thus, having two buffers is probably better. Oh, I didn't notice that the _now versions override the buffered messages. Where do you see that? It seems that gfc_warning_1 saves and restores buffer_flag but does not touch the buffers (it prints directly to the stream). I meant gfc_warning_now_1. However, gfc_error_now_1 does reset error_buffer (and sets it as curr_buffer, why if without buffering it is not used?) . Hum, if this is really necessary, I'm afraid I just introduced a subtle bug, since the new gfc_error_now does not reset error_buffer, thus, potentially it may give an error that should have been discarded. Tobias, what do you think?
Re: PR 13631 Problems in messages
Hi Here is a revisited version. I finally go without compiling in C++11 as I prefer to use __builtin_alloca instead of using std::unique_ptr and heap memory. I also realized that I could use codecvt::max_length to know the max number of multibytes. Tested under Linux x86_64 with all locales installed. There are some failures but not coming from this patch, will surely be the subject of another mail. 2014-11-28 François Dumont fdum...@gcc.gnu.org DR libstdc++/13631 * include/bits/codecvt.h (__get_c_locale): New. * config/locale/gnu/messages_member.h, messages_member.cc: Prefer dgettext usage to gettext to allow usage of several catalogs at the same time. Add an internal cache to make catalog names to catalog ids. Add charset management. * testsuite/22_locale/messages/13631.cc: New. * testsuite/22_locale/messages/members/char/2.cc: Use also fr_FR locale for charset conversion to get the expected accentuated character. Do we need any abi tag ? Do we need to wait ? François On 24/11/2014 01:23, Jonathan Wakely wrote: On 24/11/14 00:13 +0100, François Dumont wrote: Hello As we are at doing some evolution in the ABI I would like to take the opportunity to merge branch libstdcxx_so_7-2. The first fix was I don't think we want to merge everything, but it's certainly worth looking to see if there are some fixes on that branch worth taking. It would have been better to do during stage 1 though :-\ about a messages facet issue. So here is the version for the trunk which is the one from the branch plus management of the charset part. This way messageswchar_t works too. There are still some uncovered points in this patch: - I had to make codecvt _M_c_locale_codecvt public to access it from the messages facet code. How do you want to handle it properly ? A friend function to access it or do I make messages facet friend ? - I haven't use the ABI tag yet. I know that there is a plan to tag locale facets, will it work for what I am doing ? Unless I'm missing something you're not making any ABI changes to std::messages, just making the definitiosn of some functions no longer inline. Note that I could use std::tuple instead of combination of std::pair and std::unique_ptr instead of wchar_buffer if we were building it in C++11 mode. Is it possible ? Yes, the symlink to the messages_members.cc file would need to be moved from src/c++98/Makefile.am to src/c++11/Makefile.am Index: include/bits/locale_facets_nonio.h === --- include/bits/locale_facets_nonio.h(revision 217816) +++ include/bits/locale_facets_nonio.h(working copy) @@ -1842,22 +1842,6 @@ */ virtual void do_close(catalog) const; - - // Returns a locale and codeset-converted string, given a char* message. - char* - _M_convert_to_char(const string_type __msg) const - { -// XXX -return reinterpret_castchar*(const_cast_CharT*(__msg.c_str())); - } - - // Returns a locale and codeset-converted string, given a char* message. - string_type - _M_convert_from_char(char*) const - { -// XXX -return string_type(); - } }; Those members are used by the ieee_1003.1-2001 locale. Index: config/locale/gnu/messages_members.cc === --- config/locale/gnu/messages_members.cc (revision 218027) +++ config/locale/gnu/messages_members.cc (working copy) @@ -31,54 +31,272 @@ #include locale #include bits/c++locale_internal.h +#include algorithm +#include utility + +#include ext/concurrence.h + +namespace +{ + using namespace std; + + class Catalogs + { + public: +typedef messages_base::catalog catalog_id; + +struct catalog_info +{ + catalog_info() + { } + + catalog_info(catalog_id __id, const char* __domain, locale __loc) + : _M_id(__id), _M_domain(__domain), _M_locale(__loc) + { } + + catalog_id _M_id; + const char* _M_domain; + locale _M_locale; +}; + +typedef pairconst char*, locale result_type; + +Catalogs() : _M_counter(0), _M_nb_entry(0) { } + +~Catalogs() +{ + if (_M_nb_entry) + { + for (size_t i = 0; i != _M_nb_entry; ++i) + delete[] _M_map[i]._M_domain; + delete[] _M_map; + } +} + +catalog_id +_M_add(const string __domain, locale __l) +{ + __gnu_cxx::__scoped_lock lock(_M_mutex); + + catalog_info* __new_map = new catalog_info[_M_nb_entry + 1]; + __try + { + copy(_M_map, _M_map + _M_nb_entry, __new_map); + char* __domain_copy = new char[__domain.size() + 1]; + __domain.copy(__domain_copy, __domain.size()); + __domain_copy[__domain.size()] = 0; + __new_map[_M_nb_entry] = catalog_info(_M_counter, __domain_copy, __l); + } + __catch(...) + { + delete[] __new_map; + __throw_exception_again; + } + + // The counter is not likely to
Re: [RFC diagnostics/fortran] Move gfc_warning (buffered) to the common diagnostics machinery
Manuel López-Ibáñez wrote: Oh, I didn't notice that the _now versions override the buffered messages. Where do you see that? I think I messed up a bit - they do not seem to get cleared. I was reading gfc_error_now_1, which has: error_buffer.flag = 1; error_buffer.index = 0; cur_error_buffer = error_buffer; and I misread the error_buffer.index = 0 as resetting the output. However, this index = 0 is completely unused as one has unbuffered output and it only applies to the next error. Actually, I wonder whether that can lead to printing an error message multiple times. Assume a buffered message, that's printed with gfc_error_check(), which sets flags = 0, but the buffer-message still contains the message. gfc_error_now_1 sets the flag to 1; it doesn't touch the buffer itself as it directly outputs the message. If one now calls gfc_error_check(), it outputs the previous buffer again. I wonder whether that's sometimes happens. I do recall seeing the same message multiple times without understanding why. Thus, I withdraw the comment. And regarding error push/pop: I also missed that point. Well, for nearly every Fortran program, at least for one line multiple attempts have to be done by the parser. Thus, if you set a break point in gfc_error, you will see syntax error messages which never make it to the screen. As the test suite checks for excess errors, nearly every test-suite file tests this. Sure, but I would like to test specifically triggering and discarding the gfc_warning that I converted (or another single one that you suggest), if this were possible. Hmm, theoretically, it would be possible. However, my feeling is that there is no such case. What would be needed is something which is ambiguous, the compiler assumes first the wrong kind of statement, fails, and retries is with a different kind of statement. That's simple. However, either one has already aborted (gfc_error or NO_MATCH) *before* reaching a gfc_warning – or the code is valid and, hence, the buffered warning is printed. Hence, sorry, I cannot deliver for gfc_warning. For error, I'd could create one. Some of those can probably be simply converted, others either need to remain, or one has to setup a proper location (currently, using gfc_warning_now would ICE), or one creates a work around and constructs manually the error string (at least for the fprintf cases). Why does it ICE? At worst it should give a wrong location, but the computation of the offset is fairly similar to what Fortran already does. Could you give me a patch+testcase that ICEs? Maybe it has been fixed by your latest patch, which added %L, but using printf '\tend' test.f90; gfortran -Wtabs test.f90 used to ICE for me. However, it no longer ICEs. Thus, the following patch could be committed: --- a/gcc/fortran/scanner.c +++ b/gcc/fortran/scanner.c @@ -1391 +1391 @@ gfc_gobble_whitespace (void) - gfc_warning_now_1 (Nonconforming tab character at %C); + gfc_warning_now (OPT_Wtabs, Nonconforming tab character at %C); I also confirm that the !$OMP warning now works correctly. However, the change - gfc_warning_now_1 (Line truncated at %L, gfc_current_locus); + gfc_warning_now (Line truncated at %L, gfc_current_locus); still triggers an ICE with gfortran -Wall test.f90 and with test.f (two locations in scanner.c; see attached test cases and attache patch). The backtrace for test.f90 is: Aborted 0xb6c38f crash_signal ../../gcc/toplev.c:359 0x1244620 linemap_position_for_loc_and_offset(line_maps*, unsigned int, unsigned int) ../../libcpp/line-map.c:648 0x63247a gfc_format_decoder ../../gcc/fortran/error.c:991 0x1231e8c pp_format(pretty_printer*, text_info*) ../../gcc/pretty-print.c:616 0x122f6ec diagnostic_report_diagnostic(diagnostic_context*, diagnostic_info*) ../../gcc/diagnostic.c:820 0x63380b gfc_warning_now(char const*, ...) ../../gcc/fortran/error.c:1133 0x69aca9 gfc_next_char_literal(gfc_instring) ../../gcc/fortran/scanner.c:1059 Besides those, there is additionally: gfc_warning_now_1 (%s:%d: Illegal preprocessor directive, which I haven't tested. Tobias print *, 'aa' end print *, '' diff --git a/gcc/fortran/scanner.c b/gcc/fortran/scanner.c index 884fe70..5b42777 100644 --- a/gcc/fortran/scanner.c +++ b/gcc/fortran/scanner.c @@ -777,6 +777,6 @@ skip_free_comments (void) else - gfc_warning_now_1 (!$OMP at %C starts a commented - line as it neither is followed - by a space nor is a - continuation line); + gfc_warning_now (!$OMP at %C starts a commented + line as it neither is followed + by a space nor is a + continuation line); } @@
Re: [PATCH] Fix regressions in libgomp testsuite: set flag_fat_lto_objects for offload
On 27 Nov 16:00, Richard Biener wrote: On Thu, Nov 27, 2014 at 2:31 PM, Ilya Verbin iver...@gmail.com wrote: On 25 Nov 10:51, Richard Biener wrote: In the patch adding flag_generate_offload sounds like a good solution, I didn't like emitting fat LTO objects unconditionally just because we offload. Here is updated patch. Bootstrap and make check passed on i686-linux and x86_64-linux. Tests with offloading also passed with trunk binutils. OK for trunk? If it also works without the new __gnu_offload symbol then I'd prefer to re-use __gnu_LTO here. Ok with that change. Do you mean a patch like this? Yes, it also works. Bootstrap and make check passed. OK? -- Ilya gcc/ * cgraphunit.c (ipa_passes): Handle flag_generate_offload. (symbol_table::compile): Set flag_generate_offload if there is something to offload. * common.opt (flag_generate_offload): New Variable declaration. * dwarf2out.c (dwarf2out_finish): Handle flag_generate_offload. * ipa-inline-analysis.c (inline_generate_summary): Do not skip if flag_generate_offload is set. * lto-streamer.c (gate_lto_out): Handle flag_generate_offload. * passes.c (ipa_write_summaries): Do not skip if flag_generate_offload is set. * toplev.c (compile_file): Emit LTO marker if offload info has been previously emitted. Do not emit lto_slim marker if flag_generate_offload is without flag_generate_lto. * tree.c (free_lang_data): Do not skip if flag_generate_offload is set. diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index 2fd99a7..fed1a3e 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -2075,7 +2075,7 @@ ipa_passes (void) } /* Some targets need to handle LTO assembler output specially. */ - if (flag_generate_lto) + if (flag_generate_lto || flag_generate_offload) targetm.asm_out.lto_start (); if (!in_lto_p) @@ -2092,7 +2092,7 @@ ipa_passes (void) } } - if (flag_generate_lto) + if (flag_generate_lto || flag_generate_offload) targetm.asm_out.lto_end (); if (!flag_ltrans (in_lto_p || !flag_lto || flag_fat_lto_objects)) @@ -2176,10 +2176,10 @@ symbol_table::compile (void) /* Offloading requires LTO infrastructure. */ if (!in_lto_p g-have_offload) -flag_generate_lto = 1; +flag_generate_offload = 1; /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE. */ - if (flag_generate_lto) + if (flag_generate_lto || flag_generate_offload) lto_streamer_hooks_init (); /* Don't run the IPA passes if there was any error or sorry messages. */ diff --git a/gcc/common.opt b/gcc/common.opt index 41c8d4e..752d939 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -67,6 +67,10 @@ int *param_values Variable int flag_generate_lto +; Nonzero if we should write GIMPLE bytecode for offload compilation. +Variable +int flag_generate_offload = 0 + ; True to warn about any objects definitions whose size is larger ; than N bytes. Also want about function definitions whose returned ; values are larger than N bytes, where N is 'larger_than_size'. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 25f0e7d..4ee3102 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -24420,7 +24420,8 @@ dwarf2out_finish (const char *filename) /* When generating LTO bytecode we can not generate new assembler names at this point and all important decls got theirs via free-lang-data. */ - if ((!flag_generate_lto || DECL_ASSEMBLER_NAME_SET_P (decl)) + if (((!flag_generate_lto !flag_generate_offload) + || DECL_ASSEMBLER_NAME_SET_P (decl)) DECL_ASSEMBLER_NAME (decl) != DECL_NAME (decl)) { add_linkage_attr (node-die, decl); diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c index 2f2993c..9d62722 100644 --- a/gcc/ipa-inline-analysis.c +++ b/gcc/ipa-inline-analysis.c @@ -4031,7 +4031,7 @@ inline_generate_summary (void) /* When not optimizing, do not bother to analyze. Inlining is still done because edge redirection needs to happen there. */ - if (!optimize !flag_generate_lto !flag_wpa) + if (!optimize !flag_generate_lto !flag_generate_offload !flag_wpa) return; function_insertion_hook_holder = diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c index e8347dc..af20330 100644 --- a/gcc/lto-streamer.c +++ b/gcc/lto-streamer.c @@ -328,7 +328,7 @@ lto_streamer_init (void) bool gate_lto_out (void) { - return ((flag_generate_lto || in_lto_p) + return ((flag_generate_lto || flag_generate_offload || in_lto_p) /* Don't bother doing anything if the program has errors. */ !seen_error ()); } diff --git a/gcc/passes.c b/gcc/passes.c index a3be0bb..74b40e5 100644 --- a/gcc/passes.c +++ b/gcc/passes.c @@ -2466,7 +2466,7 @@ ipa_write_summaries (bool offload_lto_mode) struct cgraph_node *node; struct cgraph_node **order; - if
Re: [PATCH][OpenMP] Fix named critical sections inside target functions
On 21 Nov 21:36, Jakub Jelinek wrote: On Fri, Nov 21, 2014 at 11:19:26PM +0300, Ilya Verbin wrote: '#pragma omp critical (name)' can be placed in the function, marked with '#pragma omp declare target', in this case the corresponding node should be marked as offloadable too. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Please add a testcase for this. Here is the testcase, based on critical-2.c, ok for trunk? gcc/ * omp-low.c (lower_omp_critical): Mark critical sections inside target functions as offloadable. libgomp/ * testsuite/libgomp.c/target-critical-1.c: New test. diff --git a/gcc/omp-low.c b/gcc/omp-low.c index dff1504..621958c 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -9387,16 +9387,6 @@ lower_omp_critical (gimple_stmt_iterator *gsi_p, omp_context *ctx) DECL_ARTIFICIAL (decl) = 1; DECL_IGNORED_P (decl) = 1; - /* If '#pragma omp critical' is inside target region, the symbol must -be marked for offloading. */ - omp_context *octx; - for (octx = ctx-outer; octx; octx = octx-outer) - if (is_targetreg_ctx (octx)) - { - varpool_node::get_create (decl)-offloadable = 1; - break; - } - varpool_node::finalize_decl (decl); critical_name_mutexes-put (name, decl); @@ -9404,6 +9394,20 @@ lower_omp_critical (gimple_stmt_iterator *gsi_p, omp_context *ctx) else decl = *n; + /* If '#pragma omp critical' is inside target region or +inside function marked as offloadable, the symbol must be +marked as offloadable too. */ + omp_context *octx; + if (cgraph_node::get (current_function_decl)-offloadable) + varpool_node::get_create (decl)-offloadable = 1; + else + for (octx = ctx-outer; octx; octx = octx-outer) + if (is_targetreg_ctx (octx)) + { + varpool_node::get_create (decl)-offloadable = 1; + break; + } + lock = builtin_decl_explicit (BUILT_IN_GOMP_CRITICAL_NAME_START); lock = build_call_expr_loc (loc, lock, 1, build_fold_addr_expr_loc (loc, decl)); diff --git a/libgomp/testsuite/libgomp.c/target-critical-1.c b/libgomp/testsuite/libgomp.c/target-critical-1.c new file mode 100644 index 000..84ad558 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/target-critical-1.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ + +#include omp.h +#include stdlib.h + +#define N 2000 + +#pragma omp declare target +int foo () +{ + int A[N]; + int i, nthreads; + int res = 0; + + #pragma omp parallel shared (A, nthreads) +{ + #pragma omp master + nthreads = omp_get_num_threads (); + + #pragma omp for + for (i = 0; i N; i++) + A[i] = 0; + + #pragma omp critical (crit1) +for (i = 0; i N; i++) + A[i]++; +} + + for (i = 0; i N; i++) +if (A[i] != nthreads) + res = 1; + + return res; +} +#pragma omp end declare target + +int main () +{ + int res1, res2; + + #pragma omp target map (from: res1, res2) +{ + int B[N]; + int i, nthreads; + + res1 = foo (); + + #pragma omp parallel shared (B, nthreads) + { + #pragma omp master + nthreads = omp_get_num_threads (); + + #pragma omp for + for (i = 0; i N; i++) + B[i] = 0; + + #pragma omp critical (crit2) + for (i = 0; i N; i++) + B[i]++; + } + + res2 = 0; + for (i = 0; i N; i++) + if (B[i] != nthreads) + res2 = 1; +} + + if (res1 || res2) +abort (); + + return 0; +} -- Ilya
Re: [PATCH][ARM] Add Cortex-A17 support
On Thursday 2014-11-27 11:09, Kyrill Tkachov wrote: Thanks for the review. The invoke.texi changes were posted and approved at https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02683.html and I'm attaching the proposed changes.html patch. Looks good. Note that once you have has been added and once was added. I _think_ the former may be more appropriate, but let's one of our native speakers comment in case. GeraldIndex: htdocs/gcc-5/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v retrieving revision 1.43 diff -U 3 -r1.43 changes.html --- htdocs/gcc-5/changes.html 27 Nov 2014 09:50:04 - 1.43 +++ htdocs/gcc-5/changes.html 27 Nov 2014 09:57:19 - @@ -398,6 +398,14 @@ configure option with either of code--with-tune/code or code--with-arch/code. /li + li Support for the Cortex-A17 processor has been added through the + code-mcpu=cortex-a17/code and code-mtune=cortex-a17/code options. + /li + li Initial big.LITTLE tuning support for the combination of Cortex-A17 + and Cortex-A7 was added through the code-mcpu=cortex-a17.cortex-a7 + /code and code-mtune=cortex-a17.cortex-a7/code options. + /li + /ul h3 id=x86IA-32/x86-64/h3
Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling
On 11 Oct 18:49, Ilya Verbin wrote: (run_gcc): Outline options handling into the new functions: find_and_merge_options, append_compiler_options, append_linker_options. ... @@ -625,18 +893,13 @@ run_gcc (unsigned argc, char *argv[]) /* Look at saved options in the IL files. */ for (i = 1; i argc; ++i) { ... + have_lto + = find_and_merge_options (fd, file_offset, LTO_SECTION_NAME_PREFIX, + fdecoded_options, fdecoded_options_count, + collect_gcc); + have_offload + = find_and_merge_options (fd, file_offset, OFFLOAD_SECTION_NAME_PREFIX, + offload_fdecoded_options, + offload_fdecoded_options_count, collect_gcc); close (fd); } I found a bug here, have_{lto,offload} must be set if at least one file contains lto/offload sections, but currently they are overwritten by the last file. Fix is bootstrapped and regtested on x86_64-linux. OK for trunk? gcc/ * lto-wrapper.c (run_gcc): Set have_lto and have_offload if at least one file contains sections with LTO and offload IR, respectively. diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c index 9a540b4..0f69457 100644 --- a/gcc/lto-wrapper.c +++ b/gcc/lto-wrapper.c @@ -921,13 +921,14 @@ run_gcc (unsigned argc, char *argv[]) continue; have_lto - = find_and_merge_options (fd, file_offset, LTO_SECTION_NAME_PREFIX, - fdecoded_options, fdecoded_options_count, - collect_gcc); + |= find_and_merge_options (fd, file_offset, LTO_SECTION_NAME_PREFIX, + fdecoded_options, fdecoded_options_count, + collect_gcc); have_offload - = find_and_merge_options (fd, file_offset, OFFLOAD_SECTION_NAME_PREFIX, - offload_fdecoded_options, - offload_fdecoded_options_count, collect_gcc); + |= find_and_merge_options (fd, file_offset, OFFLOAD_SECTION_NAME_PREFIX, + offload_fdecoded_options, + offload_fdecoded_options_count, + collect_gcc); close (fd); } -- Ilya
[PATCH] rs6000: fix for PR64093
A slight oversight. andmode3_imm is predicated by rs6000_gen_cell_microcode so we had better not generate it otherwise; and in that case, we need to force the constant to a register. Bootstrap and testsuite run are fine (of course, since this isn't tested at all); comparing testsuite runs between -mcpu=cell and default shows a few thousand fails (mainly ICEs in dfp), and no differences between with and without this patch. But this PR was a failure in Linux kernel build; I checked it did fail to build without this patch, and works with (I built cell_defconfig). Okay for trunk? Segher 2014-11-27 Segher Boessenkool seg...@kernel.crashing.org gcc/ PR target/64093 * config/rs6000/rs6000.md (andmode3): Don't generate andmode3_imm unless rs6000_gen_cell_microcode is true. --- gcc/config/rs6000/rs6000.md | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index c8c3a68..f3b5aae 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -2646,8 +2646,13 @@ (define_expand andmode3 if (logical_const_operand (operands[2], MODEmode) !any_mask_operand (operands[2], MODEmode)) { - emit_insn (gen_andmode3_imm (operands[0], operands[1], operands[2])); - DONE; + if (rs6000_gen_cell_microcode) + { + emit_insn (gen_andmode3_imm (operands[0], operands[1], operands[2])); + DONE; + } + else + operands[2] = force_reg (MODEmode, operands[2]); } if ((MODEmode == DImode !and64_2_operand (operands[2], MODEmode)) -- 1.8.1.4
[PATCH 0/6] various combine patches, v2
The first three of these are from the previous series, with hopefully all comments addressed. The fourth simplifies the code that breaks up an arith insn with a compare to zero, using the helpers that the third patch introduced. The fifth and the sixth are separate from the rest and each other; they fix PRs that people optimistically thought this patch series would fix. Well, now they are right ;-) Segher Boessenkool (6): combine: add regno field to LOG_LINKS combine: distribute_log_links for PARALLELs of SETs combine: handle I2 a parallel of two SETs combine: simplify the CC thing code combine: handle REG_UNUSED in reg_dead_at_p (PR59278) combine: allow combining two insns into two in some cases (PR52714) gcc/combine.c | 326 +- 1 file changed, 229 insertions(+), 97 deletions(-) -- 1.8.1.4
[PATCH 1/6] combine: add regno field to LOG_LINKS
With this new field in place, we can have LOG_LINKS for insns that set more than one register and distribute them properly in distribute_links. This then allows many more PARALLELs to be combined. Also split off new functions can_combine_{def,use}_p from the create_log_links function. v2: Add used_between_p check to the only case where I2 can be a multiple set and still be combined: we have a log_link for the first use of each def now, so try_combine can be called with I3 later than the first use of one of the sets, so we better check if we have multiple sets. 2014-11-27 Segher Boessenkool seg...@kernel.crashing.org gcc/ * combine.c (struct insn_link): New field `regno'. (alloc_insn_link): New parameter `regno'. Use it. (find_single_use): Check the new field. (can_combine_def_p, can_combine_use_p): New functions. Split off from ... (create_log_links): ... here. Correct data type of `regno'. Adjust call to alloc_insn_link. (adjust_for_new_dest): Find regno, use it in call to alloc_insn_link. (try_combine): Check reg_used_between_p when combining a PARALLEL as earlier insn. Adjust call to alloc_insn_link. (distribute_links): Check the new field. --- gcc/combine.c | 144 -- 1 file changed, 89 insertions(+), 55 deletions(-) diff --git a/gcc/combine.c b/gcc/combine.c index 1808f97..9b53a93 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -328,6 +328,7 @@ static int *uid_insn_cost; struct insn_link { rtx_insn *insn; + unsigned int regno; struct insn_link *next; }; @@ -346,12 +347,13 @@ static struct obstack insn_link_obstack; /* Allocate a link. */ static inline struct insn_link * -alloc_insn_link (rtx_insn *insn, struct insn_link *next) +alloc_insn_link (rtx_insn *insn, unsigned int regno, struct insn_link *next) { struct insn_link *l = (struct insn_link *) obstack_alloc (insn_link_obstack, sizeof (struct insn_link)); l-insn = insn; + l-regno = regno; l-next = next; return l; } @@ -686,7 +688,7 @@ find_single_use (rtx dest, rtx_insn *insn, rtx_insn **ploc) if (INSN_P (next) dead_or_set_p (next, dest)) { FOR_EACH_LOG_LINK (link, next) - if (link-insn == insn) + if (link-insn == insn link-regno == REGNO (dest)) break; if (link) @@ -982,6 +984,43 @@ delete_noop_moves (void) } +/* Return false if we do not want to (or cannot) combine DEF. */ +static bool +can_combine_def_p (df_ref def) +{ + /* Do not consider if it is pre/post modification in MEM. */ + if (DF_REF_FLAGS (def) DF_REF_PRE_POST_MODIFY) +return false; + + unsigned int regno = DF_REF_REGNO (def); + + /* Do not combine frame pointer adjustments. */ + if ((regno == FRAME_POINTER_REGNUM +(!reload_completed || frame_pointer_needed)) +#if !HARD_FRAME_POINTER_IS_FRAME_POINTER + || (regno == HARD_FRAME_POINTER_REGNUM + (!reload_completed || frame_pointer_needed)) +#endif +#if FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM + || (regno == ARG_POINTER_REGNUM fixed_regs[regno]) +#endif + ) +return false; + + return true; +} + +/* Return false if we do not want to (or cannot) combine USE. */ +static bool +can_combine_use_p (df_ref use) +{ + /* Do not consider the usage of the stack pointer by function call. */ + if (DF_REF_FLAGS (use) DF_REF_CALL_STACK_USAGE) +return false; + + return true; +} + /* Fill in log links field for all insns. */ static void @@ -1015,67 +1054,46 @@ create_log_links (void) FOR_EACH_INSN_DEF (def, insn) { - int regno = DF_REF_REGNO (def); + unsigned int regno = DF_REF_REGNO (def); rtx_insn *use_insn; if (!next_use[regno]) continue; - /* Do not consider if it is pre/post modification in MEM. */ - if (DF_REF_FLAGS (def) DF_REF_PRE_POST_MODIFY) -continue; + if (!can_combine_def_p (def)) + continue; - /* Do not make the log link for frame pointer. */ - if ((regno == FRAME_POINTER_REGNUM -(! reload_completed || frame_pointer_needed)) -#if !HARD_FRAME_POINTER_IS_FRAME_POINTER - || (regno == HARD_FRAME_POINTER_REGNUM - (! reload_completed || frame_pointer_needed)) -#endif -#if FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM - || (regno == ARG_POINTER_REGNUM fixed_regs[regno]) -#endif - ) -continue; + use_insn = next_use[regno]; + next_use[regno] = NULL; - use_insn = next_use[regno]; - if (BLOCK_FOR_INSN (use_insn) == bb) -{ - /* flow.c claimed: - - We don't build a LOG_LINK
[PATCH 4/6] combine: simplify the CC thing code
[What a great subject line]. This simplifies the code right before the new code of the previous patch to use one of the new helper functions that introduces. 2014-11-27 Segher Boessenkool seg...@kernel.crashing.org gcc/ * combine.c (try_combine): Use is_parallel_of_n_reg_sets some more. --- gcc/combine.c | 42 -- 1 file changed, 16 insertions(+), 26 deletions(-) diff --git a/gcc/combine.c b/gcc/combine.c index 6c94e53..83a0a10 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -2832,43 +2832,33 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, This undoes a previous combination and allows us to match a branch-and- decrement insn. */ - if (i1 == 0 GET_CODE (PATTERN (i2)) == PARALLEL - XVECLEN (PATTERN (i2), 0) = 2 - GET_CODE (XVECEXP (PATTERN (i2), 0, 0)) == SET + if (i1 == 0 + is_parallel_of_n_reg_sets (i2, 2) (GET_MODE_CLASS (GET_MODE (SET_DEST (XVECEXP (PATTERN (i2), 0, 0 == MODE_CC) GET_CODE (SET_SRC (XVECEXP (PATTERN (i2), 0, 0))) == COMPARE XEXP (SET_SRC (XVECEXP (PATTERN (i2), 0, 0)), 1) == const0_rtx - GET_CODE (XVECEXP (PATTERN (i2), 0, 1)) == SET - REG_P (SET_DEST (XVECEXP (PATTERN (i2), 0, 1))) rtx_equal_p (XEXP (SET_SRC (XVECEXP (PATTERN (i2), 0, 0)), 0), SET_SRC (XVECEXP (PATTERN (i2), 0, 1))) !reg_used_between_p (SET_DEST (XVECEXP (PATTERN (i2), 0, 0)), i2, i3) !reg_used_between_p (SET_DEST (XVECEXP (PATTERN (i2), 0, 1)), i2, i3)) { - for (i = XVECLEN (PATTERN (i2), 0) - 1; i = 2; i--) - if (GET_CODE (XVECEXP (PATTERN (i2), 0, i)) != CLOBBER) - break; - - if (i == 1) - { - /* We make I1 with the same INSN_UID as I2. This gives it -the same DF_INSN_LUID for value tracking. Our fake I1 will -never appear in the insn stream so giving it the same INSN_UID -as I2 will not cause a problem. */ + /* We make I1 with the same INSN_UID as I2. This gives it +the same DF_INSN_LUID for value tracking. Our fake I1 will +never appear in the insn stream so giving it the same INSN_UID +as I2 will not cause a problem. */ - i1 = gen_rtx_INSN (VOIDmode, NULL, i2, BLOCK_FOR_INSN (i2), -XVECEXP (PATTERN (i2), 0, 1), INSN_LOCATION (i2), --1, NULL_RTX); - INSN_UID (i1) = INSN_UID (i2); + i1 = gen_rtx_INSN (VOIDmode, NULL, i2, BLOCK_FOR_INSN (i2), +XVECEXP (PATTERN (i2), 0, 1), INSN_LOCATION (i2), +-1, NULL_RTX); + INSN_UID (i1) = INSN_UID (i2); - SUBST (PATTERN (i2), XVECEXP (PATTERN (i2), 0, 0)); - SUBST (XEXP (SET_SRC (PATTERN (i2)), 0), -SET_DEST (PATTERN (i1))); - unsigned int regno = REGNO (SET_DEST (PATTERN (i1))); - SUBST_LINK (LOG_LINKS (i2), - alloc_insn_link (i1, regno, LOG_LINKS (i2))); - } + SUBST (PATTERN (i2), XVECEXP (PATTERN (i2), 0, 0)); + SUBST (XEXP (SET_SRC (PATTERN (i2)), 0), +SET_DEST (PATTERN (i1))); + unsigned int regno = REGNO (SET_DEST (PATTERN (i1))); + SUBST_LINK (LOG_LINKS (i2), + alloc_insn_link (i1, regno, LOG_LINKS (i2))); } /* If I2 is a PARALLEL of two SETs of REGs (and perhaps some CLOBBERs), -- 1.8.1.4
[PATCH 2/6] combine: distribute_log_links for PARALLELs of SETs
Now that LOG_LINKS are per regno, we can distribute them on PARALLELs just fine. Do so. This makes PARALLELs not lose their LOG_LINKS early when e.g. a trivial reg-reg move is combined, so that they can be used in more useful combinations as well. 2014-11-27 Segher Boessenkool seg...@kernel.crashing.org gcc/ * combine.c (distribute_links): Handle multiple SETs. --- gcc/combine.c | 52 +--- 1 file changed, 37 insertions(+), 15 deletions(-) diff --git a/gcc/combine.c b/gcc/combine.c index 9b53a93..7e3f4e6 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -13817,24 +13817,46 @@ distribute_links (struct insn_link *links) next_link = link-next; - /* If the insn that this link points to is a NOTE or isn't a single -set, ignore it. In the latter case, it isn't clear what we -can do other than ignore the link, since we can't tell which -register it was for. Such links wouldn't be used by combine -anyway. - -It is not possible for the destination of the target of the link to -have been changed by combine. The only potential of this is if we -replace I3, I2, and I1 by I3 and I2. But in that case the -destination of I2 also remains unchanged. */ - - if (NOTE_P (link-insn) - || (set = single_set (link-insn)) == 0) + /* If the insn that this link points to is a NOTE, ignore it. */ + if (NOTE_P (link-insn)) + continue; + + set = 0; + rtx pat = PATTERN (link-insn); + if (GET_CODE (pat) == SET) + set = pat; + else if (GET_CODE (pat) == PARALLEL) + { + int i; + for (i = 0; i XVECLEN (pat, 0); i++) + { + set = XVECEXP (pat, 0, i); + if (GET_CODE (set) != SET) + continue; + + reg = SET_DEST (set); + while (GET_CODE (reg) == ZERO_EXTRACT +|| GET_CODE (reg) == STRICT_LOW_PART +|| GET_CODE (reg) == SUBREG) + reg = XEXP (reg, 0); + + if (!REG_P (reg)) + continue; + + if (REGNO (reg) == link-regno) + break; + } + if (i == XVECLEN (pat, 0)) + continue; + } + else continue; reg = SET_DEST (set); - while (GET_CODE (reg) == SUBREG || GET_CODE (reg) == ZERO_EXTRACT -|| GET_CODE (reg) == STRICT_LOW_PART) + + while (GET_CODE (reg) == ZERO_EXTRACT +|| GET_CODE (reg) == STRICT_LOW_PART +|| GET_CODE (reg) == SUBREG) reg = XEXP (reg, 0); /* A LOG_LINK is defined as being placed on the first insn that uses -- 1.8.1.4
Re: [RFC diagnostics/fortran] Move gfc_warning (buffered) to the common diagnostics machinery
On 27 November 2014 at 23:58, Tobias Burnus bur...@net-b.de wrote: Manuel López-Ibáñez wrote: Sure, but I would like to test specifically triggering and discarding the gfc_warning that I converted (or another single one that you suggest), if this were possible. Hmm, theoretically, it would be possible. However, my feeling is that there is no such case. What would be needed is something which is ambiguous, the compiler assumes first the wrong kind of statement, fails, and retries is with a different kind of statement. That's simple. However, either one has already aborted (gfc_error or NO_MATCH) *before* reaching a gfc_warning - or the code is valid and, hence, the buffered warning is printed. Then, I don't understand the point of gfc_warning. If gfc_warning is never reached, then we could have used gfc_warning_now. If the warning is printed, then gfc_warning was reached, thus we could have used gfc_warning_now. The only difference is that gfc_warning_now will print the warning as soon as gfc_warning is reached, but since there is at most one buffered warning and there can not be a previous error, then this should not make a difference. What am I missing here? Why does it ICE? At worst it should give a wrong location, but the computation of the offset is fairly similar to what Fortran already does. Could you give me a patch+testcase that ICEs? However, the change - gfc_warning_now_1 (Line truncated at %L, gfc_current_locus); + gfc_warning_now (Line truncated at %L, gfc_current_locus); still triggers an ICE with gfortran -Wall test.f90 and with test.f (two locations in scanner.c; see attached test cases and attache patch). The backtrace for test.f90 is: The problem seems to be that linemap_position_for_loc_and_offset cannot handle an offset larger than the column_hint. Fortran hardcodes that to 120. There are two bugs there: (1) we should not crash, at worst we should use offset=0 and (2) we should update the column_hint to the largest column seen. I'm testing a patch that seems to fix the problem. Cheers, Manuel.
[PATCH 3/6] combine: handle I2 a parallel of two SETs
If I2 is a PARALLEL of two SETs, split it into two instructions, I1 and I2. If there already was an I1, rename it to I0. If there already was an I0, don't do anything. This surprisingly simple patch is enough to let combine handle such PARALLELs properly. v2: Add some functions to make the checking for a suitable PARALLEL more readable (and more general). 2014-11-27 Segher Boessenkool seg...@kernel.crashing.org gcc/ * combine.c (is_parallel_of_n_reg_sets): New function. (can_split_parallel_of_n_reg_sets): New function. (try_combine): If I2 is a PARALLEL of two SETs, split it into two insns if possible. --- gcc/combine.c | 78 +++ 1 file changed, 78 insertions(+) diff --git a/gcc/combine.c b/gcc/combine.c index 7e3f4e6..6c94e53 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -2461,6 +2461,59 @@ update_cfg_for_uncondjump (rtx_insn *insn) } } +/* Return whether INSN is a PARALLEL of exactly N register SETs followed + by an arbitrary number of CLOBBERs. */ +static bool +is_parallel_of_n_reg_sets (rtx_insn *insn, int n) +{ + rtx pat = PATTERN (insn); + + if (GET_CODE (pat) != PARALLEL) +return false; + + int len = XVECLEN (pat, 0); + if (len n) +return false; + + int i; + for (i = 0; i n; i++) +if (GET_CODE (XVECEXP (pat, 0, i)) != SET + || !REG_P (SET_DEST (XVECEXP (pat, 0, i + return false; + for ( ; i len; i++) +if (GET_CODE (XVECEXP (pat, 0, i)) != CLOBBER) + return false; + + return true; +} + +/* Return whether INSN, a PARALLEL of N register SETs (and maybe some + CLOBBERs), can be split into individual SETs in that order, without + changing semantics. */ +static bool +can_split_parallel_of_n_reg_sets (rtx_insn *insn, int n) +{ + if (!insn_nothrow_p (insn)) +return false; + + rtx pat = PATTERN (insn); + + int i, j; + for (i = 0; i n; i++) +{ + if (side_effects_p (SET_SRC (XVECEXP (pat, 0, i + return false; + + rtx reg = SET_DEST (XVECEXP (pat, 0, i)); + + for (j = i + 1; j n; j++) + if (reg_referenced_p (reg, XVECEXP (pat, 0, j))) + return false; +} + + return true; +} + /* Try to combine the insns I0, I1 and I2 into I3. Here I0, I1 and I2 appear earlier than I3. I0 and I1 can be zero; then we combine just I2 into I3, or I1 and I2 into @@ -2817,6 +2870,31 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, alloc_insn_link (i1, regno, LOG_LINKS (i2))); } } + + /* If I2 is a PARALLEL of two SETs of REGs (and perhaps some CLOBBERs), + make those two SETs separate I1 and I2 insns, and make an I0 that is + the original I1. */ + if (i0 == 0 + is_parallel_of_n_reg_sets (i2, 2) + can_split_parallel_of_n_reg_sets (i2, 2) + !reg_used_between_p (SET_DEST (XVECEXP (PATTERN (i2), 0, 0)), i2, i3) + !reg_used_between_p (SET_DEST (XVECEXP (PATTERN (i2), 0, 1)), i2, i3)) +{ + /* If there is no I1, there is no I0 either. */ + i0 = i1; + + /* We make I1 with the same INSN_UID as I2. This gives it +the same DF_INSN_LUID for value tracking. Our fake I1 will +never appear in the insn stream so giving it the same INSN_UID +as I2 will not cause a problem. */ + + i1 = gen_rtx_INSN (VOIDmode, NULL, i2, BLOCK_FOR_INSN (i2), +XVECEXP (PATTERN (i2), 0, 0), INSN_LOCATION (i2), +-1, NULL_RTX); + INSN_UID (i1) = INSN_UID (i2); + + SUBST (PATTERN (i2), XVECEXP (PATTERN (i2), 0, 1)); +} #endif /* Verify that I2 and I1 are valid for combining. */ -- 1.8.1.4
[PATCH 5/6] combine: handle REG_UNUSED in reg_dead_at_p (PR59278)
Currently reg_dead_at_p returns 0 for registers that are set but never used. This patch repairs that oversight. This fixes PR59278. 2014-11-27 Segher Boessenkool seg...@kernel.crashing.org gcc/ PR rtl-optimization/59278 combine (reg_dead_at_p): Consider REG_UNUSED notes. --- gcc/combine.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gcc/combine.c b/gcc/combine.c index 83a0a10..91ddff4 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -12990,6 +12990,9 @@ reg_dead_at_p (rtx reg, rtx_insn *insn) { if (INSN_P (insn)) { + if (find_regno_note (insn, REG_UNUSED, reg_dead_regno)) + return 1; + note_stores (PATTERN (insn), reg_dead_at_p_1, NULL); if (reg_dead_flag) return reg_dead_flag == 1 ? 1 : 0; -- 1.8.1.4
[PATCH 6/6] combine: allow combining two insns into two in some cases (PR52714)
If one of the resulting insns is a noop set it is safe to allow the combination, for it will never lead to a loop: any following combination using that noop will delete it. In fact, in most cases combine deletes the noop immediately; for cc0 targets it does not in some cases, but it deletes another insn in that case (a following jump insn). This fixes PR52714. 2014-11-27 Segher Boessenkool seg...@kernel.crashing.org gcc/ PR rtl-optimization/52714 * combine.c (try_combine): Allow combining two insns into two new insns if at least one of those is a noop. --- gcc/combine.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/gcc/combine.c b/gcc/combine.c index 91ddff4..66007d8 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -3812,15 +3812,20 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, /* Similarly, check for a case where we have a PARALLEL of two independent SETs but we started with three insns. In this case, we can do the sets as two separate insns. This case occurs when some SET allows two - other insns to combine, but the destination of that SET is still live. */ + other insns to combine, but the destination of that SET is still live. - else if (i1 insn_code_number 0 asm_noperands (newpat) 0 + Also do this if we started with two insns and (at least) one of the + resulting sets is a noop; this noop will be deleted later. */ + + else if (insn_code_number 0 asm_noperands (newpat) 0 GET_CODE (newpat) == PARALLEL XVECLEN (newpat, 0) == 2 GET_CODE (XVECEXP (newpat, 0, 0)) == SET + GET_CODE (XVECEXP (newpat, 0, 1)) == SET + (i1 || set_noop_p (XVECEXP (newpat, 0, 0)) + || set_noop_p (XVECEXP (newpat, 0, 1))) GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART - GET_CODE (XVECEXP (newpat, 0, 1)) == SET GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != STRICT_LOW_PART ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)), -- 1.8.1.4
Re: [PATCH, ARM] Fix PR63718, Thumb1 bootstrap -- disable fuse-caller-save for Thumb1
This work around brings thumb1 bootstrap back. Though I cannot approve, I would like it in stage 3. A complete fix in thumb1 pattern will be welcomed. Thanks, Joey On Thu, Nov 20, 2014 at 7:54 PM, Tom de Vries tom_devr...@mentor.com wrote: Richard, This patch fixes PR63718, which currently breaks Thumb1 bootstrap. The problem is that in Thumb1 mode, we emit the epilogue in RTL, but the last insn - epilogue_insns - does not accurately model the corresponding insns emitted in the asm file. F.i., the asm file may contain an insn: ... pop {r0} while the corresponding RTL pattern looks like this: ... (jump_insn (unspec_volatile [ (return) ] VUNSPEC_EPILOGUE)) ... As a consequence, the epilogue may clobber registers without fuse-caller-save being able to analyze that. Adding the missing clobbers to epilogue_insns is not trivial, and probably not a good idea for stage3. The patch works around the problem by disabling fuse-caller-save in Thumb1 mode. Build and reg-tested on arm-none-eabi. OK for stage3? Thanks, - Tom
Re: [PATCH] rs6000: fix for PR64093
On Thu, Nov 27, 2014 at 7:21 PM, Segher Boessenkool seg...@kernel.crashing.org wrote: A slight oversight. andmode3_imm is predicated by rs6000_gen_cell_microcode so we had better not generate it otherwise; and in that case, we need to force the constant to a register. Bootstrap and testsuite run are fine (of course, since this isn't tested at all); comparing testsuite runs between -mcpu=cell and default shows a few thousand fails (mainly ICEs in dfp), and no differences between with and without this patch. But this PR was a failure in Linux kernel build; I checked it did fail to build without this patch, and works with (I built cell_defconfig). Okay for trunk? Segher 2014-11-27 Segher Boessenkool seg...@kernel.crashing.org gcc/ PR target/64093 * config/rs6000/rs6000.md (andmode3): Don't generate andmode3_imm unless rs6000_gen_cell_microcode is true. Okay. Thanks, David
Re: [arm][patch] fix arm_neon_ok check on !arm_arch7
On Nov 27, 2014, at 9:28 AM, Andrew Stubbs a...@codesourcery.com wrote: On 27/11/14 17:05, Mike Stump wrote: Could you include a link or the patch. If the test suite, I'll review it if no one else steps up. The original patch is here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01119.html Sorry, arm people will have to approve... Many reasonable choices, yet I bet only one is best.
Re: [RFC diagnostics/fortran] Move gfc_warning (buffered) to the common diagnostics machinery
Manuel López-Ibáñez wrote: On 27 November 2014 at 23:58, Tobias Burnus bur...@net-b.de wrote: Hmm, theoretically, it would be possible. However, my feeling is that there is no such case. What would be needed is something which is ambiguous, the compiler assumes first the wrong kind of statement, fails, and retries is with a different kind of statement. That's simple. However, either one has already aborted (gfc_error or NO_MATCH) *before* reaching a gfc_warning - or the code is valid and, hence, the buffered warning is printed. Then, I don't understand the point of gfc_warning. I think the idea is that a warning could be printed in the wrong code path (assumption of which kind of statement or expression one has) – and then one re-tries a different code path, see that it fits and drops the buffered message. Grepping through the 132 gfc_warning, I have the impression that this cannot happen, i.e. that the message is always shown. However, in order to be sure, I had to check all code paths and had to think about ways they become unreachable – with the chance that I miss something. If gfc_warning is never reached, then we could have used gfc_warning_now. If the warning is printed, then gfc_warning was reached, thus we could have used gfc_warning_now. I concur; the question is only whether my impression is correct. You could try to map gfc_warning to gfc_warning_now_1 and see whether something in the testsuite breaks. The only difference is that gfc_warning_now will print the warning as soon as gfc_warning is reached, but since there is at most one buffered warning and there can not be a previous error, then this should not make a difference. What am I missing here? (a) That, conceptionally, warning buffering makes sense, if you buffering errors for rewinding - even if it only occurs for error and not for warnings. (b) That I might have missed some code path. (c) That one might add such a code path in future. Except for (b), I do not see a reason not to map gfc_warning to gfc_warning_now. If one later needs buffering, one can still take gfc_error as model for implementing it as it surely needs buffering. FX, Steve et al.: Comments or additional facts? Why does it ICE? At worst it should give a wrong location, but the computation of the offset is fairly similar to what Fortran already does. Could you give me a patch+testcase that ICEs? However, the change - gfc_warning_now_1 (Line truncated at %L, gfc_current_locus); + gfc_warning_now (Line truncated at %L, gfc_current_locus); still triggers an ICE with gfortran -Wall test.f90 and with test.f (two locations in scanner.c; see attached test cases and attache patch). The backtrace for test.f90 is: The problem seems to be that linemap_position_for_loc_and_offset cannot handle an offset larger than the column_hint. Fortran hardcodes that to 120. There are two bugs there: (1) we should not crash, at worst we should use offset=0 and (2) we should update the column_hint to the largest column seen. I'm testing a patch that seems to fix the problem. Fortran (old code) chops off items from the beginning of the line if the column number is high (and also trimps the output line to the right). Thus, instead of print 'aa.' (1) is just show aaa' (1) I don't know how C/C++ handles this. If one doesn't chop off, the output can become extremely long – especially with topformflat 0. And then the ^ is also not helpful as one has to count the numbe of lines ... Tobias