Re: [PATCH][2/3] Fix PR54733 Optimize endian independent load/store
On Wed, 2 Apr 2014, Thomas Preud'homme wrote:

> Note that as it stands the patch does not work for arrays indexed with a variable (such as tab[a] | (tab[a+1] << 8)) because fold_const does not fold (a + 1) - a.

Uh? It does fold a+1-a for me. What it doesn't do is look through the definition of b in b-a. Richard+GSoC will supposedly soon provide a function that does that.

--
Marc Glisse
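For readers following the thread, the access pattern under discussion can be sketched as below. This is an illustration only: `load_le16` is a hypothetical name, not code from the patch, and whether a given GCC version recognizes the variable-index form as a single load is exactly what the thread is debating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assemble a 16-bit little-endian value from two
   adjacent bytes of tab, starting at a variable index a.  The two
   accesses are at indices a and a + 1, so recognizing them as adjacent
   requires seeing that (a + 1) - a == 1.  */
uint16_t load_le16(const unsigned char *tab, size_t a)
{
    assert(tab != NULL);
    return (uint16_t)(tab[a] | (tab[a + 1] << 8));
}
```

The result is the same on big- and little-endian hosts, which is what makes the load "endian independent".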
Re: Fix various x86 tests for --with-arch=bdver3 --with-cpu=bdver3
On Wed, Apr 2, 2014 at 12:27 AM, Joseph S. Myers jos...@codesourcery.com wrote:

> When I fixed various tests in http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01662.html for failures with --with-arch=bdver3, I missed that a so-configured compiler still defaults to -mtune=generic. If you override that as well with --with-cpu=bdver3, further failures appear, and this patch fixes some of them.
>
> Most of these changes add -mno-prefer-avx128 to AVX tests not expecting a -mprefer-avx128 default. In addition, some tests have -mtune=generic added where the behavior tested for depends on some tuning parameter that I identified: X86_TUNE_EXT_80387_CONSTANTS or X86_TUNE_SSE_LOAD0_BY_PXOR.
>
> Tested x86_64-linux-gnu. OK to commit?
>
> There are other failures this patch does not resolve in a --with-arch=bdver3 --with-cpu=bdver3 configuration. Some of these are AVX tests whose failures are not resolved by adding -mno-prefer-avx128 (and so this patch does not add -mno-prefer-avx128 to those tests); others may be cases where -mtune=generic is appropriate but I haven't identified the specific tuning parameter that shows code generation differences depending on tuning are correct and so a -mtune= option should be used.
> FAIL: gcc.target/i386/avx2-vpand-1.c scan-assembler vpand[ \\t]+[^\n]*%ymm[0-9]
> FAIL: gcc.target/i386/avx2-vpand-3.c scan-assembler-times vpand[ \\t]+[^\n]*%ymm[0-9] 1
> FAIL: gcc.target/i386/avx2-vpandn-1.c scan-assembler vpandn[ \\t]+[^\n]*%ymm[0-9]
> FAIL: gcc.target/i386/avx2-vpor-1.c scan-assembler vpor[ \\t]+[^\n]*%ymm[0-9]
> FAIL: gcc.target/i386/avx2-vpxor-1.c scan-assembler vpxor[ \\t]+[^\n]*%ymm[0-9]
> FAIL: gcc.target/i386/avx256-unaligned-load-2.c scan-assembler (sse2_loaddqu|vmovdqu[^\n\r]*movv16qi_internal)
> FAIL: gcc.target/i386/avx256-unaligned-load-2.c scan-assembler vinsert.128
> FAIL: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vmovdqa64[ \\t]+%zmm 2
> FAIL: gcc.target/i386/avx512f-vmovdqu32-1.c scan-assembler-times vmovdqu[36][24][ \\t]+[^\n]*\\)[^\n]*%zmm[0-9][^{] 1
> FAIL: gcc.target/i386/avx512f-vmovupd-1.c scan-assembler-times vmovupd[ \\t]+[^\n]*\\)[^\n]*%zmm[0-9][^{] 1
> FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9][^{] 4
> FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
> FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
> FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9][^{] 4
> FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
> FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
> FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9][^{] 3
> FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
> FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
> FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9][^{] 3
> FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
> FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
> FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9][^{] 4
> FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
> FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
> FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9][^{] 3
> FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
> FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
> FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9][^{] 4
> FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
> FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
> FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9][^{] 3
> FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
> FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
> FAIL: gcc.target/i386/pr49002-1.c scan-assembler vmovapd[\t ]*[^,]*,[\t ]*%xmm
> FAIL: gcc.target/i386/pr53712.c scan-assembler-times movdqu 1
> FAIL: gcc.target/i386/pr53907.c scan-assembler movdqa
> FAIL: gcc.target/i386/pr59539-1.c scan-assembler-times vmovdqu 1
> FAIL:
RE: [PATCH][2/3] Fix PR54733 Optimize endian independent load/store
From: Marc Glisse [mailto:marc.gli...@inria.fr]

> Uh? It does fold a+1-a for me. What it doesn't do is look through the definition of b in b-a. Richard+GSoC will supposedly soon provide a function that does that.

Oh right, it's a bit more complex here since the array index is converted to an offset first. So the operation is more like: ((a+1)*cst) - (a*cst). Any chances this might be handled at some point? Note that this might not be very frequent so it's not very important for this patch.

Thanks for the comment.

Best regards,

Thomas
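The expression ((a+1)*cst) - (a*cst) can be written out as a small sketch. The function names below are hypothetical and this is ordinary C arithmetic, not GCC's folding code; it only shows the identity the folder would need to prove. With unsigned `size_t` operands the difference is always the element size, because wraparound is well defined; with signed operands a folder also has to reason about overflow.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element idx in an array of elem_size-byte elements.  */
size_t byte_offset(size_t idx, size_t elem_size)
{
    return idx * elem_size;
}

/* Distance in bytes between elements a + 1 and a, i.e.
   ((a + 1) * cst) - (a * cst) with cst == elem_size.  */
size_t adjacent_distance(size_t a, size_t elem_size)
{
    assert(elem_size > 0);
    return byte_offset(a + 1, elem_size) - byte_offset(a, elem_size);
}
```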
RFA: RL78: Fix handling of (SUBREG (SYMBOL_REF))
Hi DJ,

  The patch below is to fix a snafu I made whilst fixing some problems with the RL78 port a while ago. GCC was generating (SUBREG (SYMBOL_REF) n) which made no sense to me, so I had the movqi expander just fail when it encountered them. Now that I have more idea about why they are created - installing symbolic values into bitfields or packed structure fields - I have found that it is necessary to support them. Failure is not an option as GCC will just silently omit generating any code at all.

  Tested with an rl78-elf toolchain without any regressions.

  OK to apply ?

Cheers
  Nick

gcc/ChangeLog
2014-04-01  Nick Clifton  ni...@redhat.com

        * config/rl78/rl78-expand.md (movqi): Handle (SUBREG (SYMBOL_REF))
        properly.

Index: gcc/config/rl78/rl78-expand.md
===================================================================
--- gcc/config/rl78/rl78-expand.md (revision 209009)
+++ gcc/config/rl78/rl78-expand.md (working copy)
@@ -30,18 +30,23 @@
     if (rl78_far_p (operands[0]) && rl78_far_p (operands[1]))
       operands[1] = copy_to_mode_reg (QImode, operands[1]);
 
-    /* FIXME: Not sure how GCC can generate (SUBREG (SYMBOL_REF)),
-       but it does.  Since this makes no sense, reject it here.  */
+    /* GCC can generate (SUBREG (SYMBOL_REF)) when it has to store a symbol
+       into a bitfield, or a packed ordinary field.  We can handle this
+       provided that the destination is a register.  If not, then load the
+       source into a register first.  */
     if (GET_CODE (operands[1]) == SUBREG
-        && GET_CODE (XEXP (operands[1], 0)) == SYMBOL_REF)
-      FAIL;
+        && GET_CODE (XEXP (operands[1], 0)) == SYMBOL_REF
+        && ! REG_P (operands[0]))
+      operands[1] = copy_to_mode_reg (QImode, operands[1]);
 
+
     /* Similarly for (SUBREG (CONST (PLUS (SYMBOL_REF)))).
        cf. g++.dg/abi/packed.C.  */
     if (GET_CODE (operands[1]) == SUBREG
         && GET_CODE (XEXP (operands[1], 0)) == CONST
         && GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == PLUS
-        && GET_CODE (XEXP (XEXP (XEXP (operands[1], 0), 0), 0)) == SYMBOL_REF)
-      FAIL;
+        && GET_CODE (XEXP (XEXP (XEXP (operands[1], 0), 0), 0)) == SYMBOL_REF
+        && ! REG_P (operands[0]))
+      operands[1] = copy_to_mode_reg (QImode, operands[1]);
 
     if (CONST_INT_P (operands[1])
         && ! IN_RANGE (INTVAL (operands[1]), (-1 << 8) + 1, (1 << 8) - 1))
       FAIL;
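As an illustration of the kind of source that stores a symbolic value into a packed structure field, a minimal sketch follows. All names are hypothetical, `__attribute__((packed))` is a GCC/Clang extension, and it is an assumption (not confirmed by the message) that this particular snippet makes the RL78 port emit (SUBREG (SYMBOL_REF)); the message itself points to g++.dg/abi/packed.C for a real testcase.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global whose address gets stored.  */
int target_object;

/* Packed layout: ptr sits at byte offset 1, so it is not naturally
   aligned and a target may have to write it one byte at a time.  */
struct __attribute__((packed)) packed_holder {
    char tag;
    int *ptr;
};

struct packed_holder holder;

/* Store a symbol address (a link-time constant) into the packed field.  */
void store_symbol(void)
{
    holder.tag = 1;
    holder.ptr = &target_object;
    assert(holder.ptr == &target_object);
}
```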
Re: [gomp4] Add tables generation
Hi!

On Thu, 20 Mar 2014 17:50:13 +0100, Bernd Schmidt ber...@codesourcery.com wrote:
> This is based on Michael Zolotukhin's patch 2/3 from a while ago. It adds functionality to build function/variable tables that will allow libgomp to look up offload target code based on the address of the corresponding host function. There are two alternatives, one based on named sections, and one based on a target hook when named sections are unavailable (as on ptx).
>
> Committed on gomp-4_0-branch.

I see regressions in the libgomp testsuite for configurations where offloading is not enabled:

    spawn [...]/build/gcc/xgcc -B[...]/build/gcc/ [...]/source/libgomp/testsuite/libgomp.c/for-3.c -B[...]/build/x86_64-unknown-linux-gnu/./libgomp/ -B[...]/build/x86_64-unknown-linux-gnu/./libgomp/.libs -I[...]/build/x86_64-unknown-linux-gnu/./libgomp -I[...]/source/libgomp/testsuite/.. -fmessage-length=0 -fno-diagnostics-show-caret -fdiagnostics-color=never -fopenmp -std=gnu99 -fopenmp -L[...]/build/x86_64-unknown-linux-gnu/./libgomp/.libs -lm -o ./for-3.exe
    /tmp/ccGnT0ei.o: In function `main':
    for-3.c:(.text+0x21032): undefined reference to `__OPENMP_TARGET__'
    collect2: error: ld returned 1 exit status

I suppose that's because even if...

> --- gcc/configure.ac (revision 208715)
> +++ gcc/configure.ac (working copy)
> @@ -887,6 +887,10 @@ AC_SUBST(enable_accelerator)
>  offload_targets=`echo $offload_targets | sed -e 's#,#:#'`
>  AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
>   [Define to hold the list of target names suitable for offloading.])
> +if test x$offload_targets != x; then
> +  AC_DEFINE(ENABLE_OFFLOADING, 1,
> +    [Define this to enable support for offloading.])
> +fi

... offloading is not enabled, this...

> --- gcc/omp-low.c (revision 208706)
> +++ gcc/omp-low.c (working copy)
> @@ -8671,19 +8672,22 @@ expand_omp_target (struct omp_region *re
>      }
>
>    gimple g;
> -  /* FIXME: This will be address of
> -     extern char __OPENMP_TARGET__[] __attribute__((visibility ("hidden")))
> -     symbol, as soon as the linker plugin is able to create it for us.  */
> -  tree openmp_target = build_zero_cst (ptr_type_node);
> +  tree openmp_target
> +    = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +                  get_identifier ("__OPENMP_TARGET__"), ptr_type_node);
> +  TREE_PUBLIC (openmp_target) = 1;
> +  DECL_EXTERNAL (openmp_target) = 1;
>    if (kind == GF_OMP_TARGET_KIND_REGION)
>      {
>        tree fnaddr = build_fold_addr_expr (child_fn);
> -      g = gimple_build_call (builtin_decl_explicit (start_ix), 7,
> -                             device, fnaddr, openmp_target, t1, t2, t3, t4);
> +      g = gimple_build_call (builtin_decl_explicit (start_ix), 7, device,
> +                             fnaddr, build_fold_addr_expr (openmp_target),
> +                             t1, t2, t3, t4);
>      }
>    else
> -    g = gimple_build_call (builtin_decl_explicit (start_ix), 6,
> -                           device, openmp_target, t1, t2, t3, t4);
> +    g = gimple_build_call (builtin_decl_explicit (start_ix), 6, device,
> +                           build_fold_addr_expr (openmp_target),
> +                           t1, t2, t3, t4);

... will now cause a reference to __OPENMP_TARGET__, but...

> --- libgcc/crtstuff.c (revision 208706)
> +++ libgcc/crtstuff.c (working copy)
> @@ -311,6 +311,15 @@ register_tm_clones (void)
>  }
>  #endif /* USE_TM_CLONE_REGISTRY */
>
> +#if defined(HAVE_GAS_HIDDEN) && defined(ENABLE_OFFLOADING)
> +void *_omp_func_table[0]
> +  __attribute__ ((__used__, visibility ("protected"),
> +                  section (".offload_func_table_section"))) = { };
> +void *_omp_var_table[0]
> +  __attribute__ ((__used__, visibility ("protected"),
> +                  section (".offload_var_table_section"))) = { };
> +#endif
> +
>  #if defined(INIT_SECTION_ASM_OP) || defined(INIT_ARRAY_SECTION_ASM_OP)
>
>  #ifdef OBJECT_FORMAT_ELF
> @@ -752,6 +761,23 @@ __do_global_ctors (void)
>  #error "What are you doing with crtstuff.c, then?"
>  #endif
>
> +#if defined(HAVE_GAS_HIDDEN) && defined(ENABLE_OFFLOADING)
> +void *_omp_funcs_end[0]
> +  __attribute__ ((__used__, visibility ("protected"),
> +                  section (".offload_func_table_section"))) = { };
> +void *_omp_vars_end[0]
> +  __attribute__ ((__used__, visibility ("protected"),
> +                  section (".offload_var_table_section"))) = { };
> +extern void *_omp_func_table[];
> +extern void *_omp_var_table[];
> +void *__OPENMP_TARGET__[] __attribute__ ((__visibility__ ("protected"))) =
> +{
> +  _omp_func_table, _omp_funcs_end,
> +  _omp_var_table, _omp_vars_end
> +};
> +#endif

... __OPENMP_TARGET__ is not being defined here for the !ENABLE_OFFLOADING case.

In http://news.gmane.org/find-root.php?message_id=%3C20130905082455.GH23437%40tucnak.redhat.com%3E, Jakub had suggested this to be a weak symbol, so we'd get NULL in this case, which would be what's needed here, I think?

Also, I'd suggest to rename __OPENMP_TARGET__ (and similar ones) to __GNU_OFFLOAD__ (or similar). As we're using this offloading stuff for
Re: [PATCH] Guard special installs in install-driver
On Tue, 1 Apr 2014, Mike Stump wrote:

> On Mar 31, 2014, at 4:50 AM, Richard Biener rguent...@suse.de wrote:
> >     -$(INSTALL_PROGRAM) xgcc$(exeext) $(DESTDIR)$(bindir)/$(GCC_INSTALL_NAME)$(exeext)
> > !   -rm -f $(DESTDIR)$(bindir)/$(target_noncanonical)-gcc-$(version)$(exeext)
> > !   -( cd $(DESTDIR)$(bindir) && \
> > !      $(LN) $(GCC_INSTALL_NAME)$(exeext) $(target_noncanonical)-gcc-$(version)$(exeext) )
> > !   -if [ ! -f gcc-cross$(exeext) ] ; then \
> >       rm -f $(DESTDIR)$(bindir)/$(target_noncanonical)-gcc-tmp$(exeext); \
> >       ( cd $(DESTDIR)$(bindir) && \
> >         $(LN) $(GCC_INSTALL_NAME)$(exeext) $(target_noncanonical)-gcc-tmp$(exeext) && \
> > --- 3205,3217 ----
> >   install-driver: installdirs xgcc$(exeext)
> >     -rm -f $(DESTDIR)$(bindir)/$(GCC_INSTALL_NAME)$(exeext)
> >     -$(INSTALL_PROGRAM) xgcc$(exeext) $(DESTDIR)$(bindir)/$(GCC_INSTALL_NAME)$(exeext)
> > !   -if [ "$(GCC_INSTALL_NAME)" != "$(target_noncanonical)-gcc-$(version)" ]; then \
> > !   -rm -f $(DESTDIR)$(bindir)/$(target_noncanonical)-gcc-$(version)$(exeext) \
> > !   -( cd $(DESTDIR)$(bindir) && \
> > !      $(LN) $(GCC_INSTALL_NAME)$(exeext) $(target_noncanonical)-gcc-$(version)$(exeext) ) \
> > !   fi
>
> Certainly safer for release like this, but, gotta wonder if we can avoid the ignoring of errors with the added check…

No idea ;) For my case I ended up without an installed driver as the rm of course succeeded but the rest not ...

> I’d have to work out why they did that in the first place and run a build and play a bit to be as sure as I’d like to be… but, a cross and a native build I think should test it adequately.

Work out why we install _two_ additional variants! (or rather why we install any additional variants to GCC_INSTALL_NAME at all ...).

Anyway, I now committed the patch. We can always followup with cleanups to this area later, possibly in stage1.

Richard.
Re: Fix various x86 tests for --with-arch=bdver3 --with-cpu=bdver3
On Wed, Apr 2, 2014 at 12:27 AM, Joseph S. Myers jos...@codesourcery.com wrote:

> There are other failures this patch does not resolve in a --with-arch=bdver3 --with-cpu=bdver3 configuration. Some of these are AVX tests whose failures are not resolved by adding -mno-prefer-avx128 (and so this patch does not add -mno-prefer-avx128 to those tests); others may be cases where -mtune=generic is appropriate but I haven't identified the specific tuning parameter that shows code generation differences depending on tuning are correct and so a -mtune= option should be used.
>
> FAIL: gcc.target/i386/avx2-vpand-1.c scan-assembler vpand[ \\t]+[^\n]*%ymm[0-9]
> FAIL: gcc.target/i386/avx2-vpand-3.c scan-assembler-times vpand[ \\t]+[^\n]*%ymm[0-9] 1
> FAIL: gcc.target/i386/avx2-vpandn-1.c scan-assembler vpandn[ \\t]+[^\n]*%ymm[0-9]
> FAIL: gcc.target/i386/avx2-vpor-1.c scan-assembler vpor[ \\t]+[^\n]*%ymm[0-9]
> FAIL: gcc.target/i386/avx2-vpxor-1.c scan-assembler vpxor[ \\t]+[^\n]*%ymm[0-9]
> FAIL: gcc.target/i386/avx256-unaligned-load-2.c scan-assembler (sse2_loaddqu|vmovdqu[^\n\r]*movv16qi_internal)
> FAIL: gcc.target/i386/avx256-unaligned-load-2.c scan-assembler vinsert.128
> FAIL: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vmovdqa64[ \\t]+%zmm 2
> FAIL: gcc.target/i386/avx512f-vmovdqu32-1.c scan-assembler-times vmovdqu[36][24][ \\t]+[^\n]*\\)[^\n]*%zmm[0-9][^{] 1
> FAIL: gcc.target/i386/avx512f-vmovupd-1.c scan-assembler-times vmovupd[ \\t]+[^\n]*\\)[^\n]*%zmm[0-9][^{] 1
> FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9][^{] 4
> FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
> FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
> FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9][^{] 4
> FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
> FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
> FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9][^{] 3
> FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
> FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
> FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9][^{] 3
> FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
> FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
> FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9][^{] 4
> FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
> FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
> FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9][^{] 3
> FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
> FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
> FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9][^{] 4
> FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
> FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
> FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9][^{] 3
> FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
> FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
> FAIL: gcc.target/i386/pr49002-1.c scan-assembler vmovapd[\t ]*[^,]*,[\t ]*%xmm
> FAIL: gcc.target/i386/pr53712.c scan-assembler-times movdqu 1
> FAIL: gcc.target/i386/pr53907.c scan-assembler movdqa
> FAIL: gcc.target/i386/pr59539-1.c scan-assembler-times vmovdqu 1
> FAIL: gcc.target/i386/pr59539-2.c scan-assembler-times vmovdqu 1

These are due to TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL tuning flag. Currently, this flag applies to all vector sizes (128, 256 and 512 bits), but I guess it is effective only for 128 bit sizes. Can you please review usage of this flag in i386/sse.md?

Thanks,
Uros.
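To make the kind of change discussed in this thread concrete, here is a hedged sketch of a DejaGnu-style test that pins its options so that the configured defaults do not matter. It is not one of the tests listed above: the function, the choice of -O3 -mavx, and the file as a whole are assumptions for illustration. Only -mno-prefer-avx128 and -mtune=generic come from the thread. Real tests in gcc.target/i386 additionally carry dg-final scan-assembler directives such as the patterns in the FAIL list.

```c
/* { dg-do compile } */
/* Pin the vector-width preference explicitly instead of relying on the
   default implied by --with-arch/--with-cpu.  A test whose expected
   output depends on a tuning parameter would add -mtune=generic here
   in the same way.  */
/* { dg-options "-O3 -mavx -mno-prefer-avx128" } */

#include <assert.h>

#define N 8

/* Hypothetical kernel: element-wise addition of two N-element arrays.  */
void add_vectors(const float *a, const float *b, float *out)
{
    for (int i = 0; i < N; i++)
        out[i] = a[i] + b[i];
}
```

The dg- comments are only interpreted by the GCC testsuite harness; to a plain compiler they are ordinary comments.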
Re: [4.8, PATCH 9/26] Backport Power8 and LE support: ABI call support
On Wed, 19 Mar 2014, Bill Schmidt wrote:

> Hi,
>
> This patch (diff-abi-calls) backports fixes to common code to support the new ELFv2 ABI. Copying Richard and Jakub for these bits.

Ok.

Thanks,
Richard.

> Thanks,
> Bill
>
>
> 2014-03-29  Bill Schmidt  wschm...@linux.vnet.ibm.com
>
>         Backport from mainline r204798:
>
>         2013-11-14  Ulrich Weigand  ulrich.weig...@de.ibm.com
>                     Alan Modra  amo...@gmail.com
>
>         * function.c (assign_parms): Use all.reg_parm_stack_space instead
>         of re-evaluating REG_PARM_STACK_SPACE target macro.
>         (locate_and_pad_parm): New parameter REG_PARM_STACK_SPACE.  Use it
>         instead of evaluating target macro REG_PARM_STACK_SPACE every time.
>         (assign_parm_find_entry_rtl): Update call.
>         * calls.c (initialize_argument_information): Update call.
>         (emit_library_call_value_1): Likewise.
>         * expr.h (locate_and_pad_parm): Update prototype.
>
>         Backport from mainline r204797:
>
>         2013-11-14  Ulrich Weigand  ulrich.weig...@de.ibm.com
>
>         * calls.c (store_unaligned_arguments_into_pseudos): Skip PARALLEL
>         arguments.
>
>         Backport from mainline r197003:
>
>         2013-03-23  Eric Botcazou  ebotca...@adacore.com
>
>         * calls.c (expand_call): Add missing guard to code handling return
>         of non-BLKmode structures in MSB.
>         * function.c (expand_function_end): Likewise.
>
>
> Index: gcc-4_8-branch/gcc/calls.c
> ===================================================================
> --- gcc-4_8-branch.orig/gcc/calls.c 2013-12-28 17:41:32.056627059 +0100
> +++ gcc-4_8-branch/gcc/calls.c 2013-12-28 17:50:43.356356135 +0100
> @@ -983,6 +983,7 @@ store_unaligned_arguments_into_pseudos (
>    for (i = 0; i < num_actuals; i++)
>      if (args[i].reg != 0 && ! args[i].pass_on_stack
> +        && GET_CODE (args[i].reg) != PARALLEL
>          && args[i].mode == BLKmode
>          && MEM_P (args[i].value)
>          && (MEM_ALIGN (args[i].value)
> @@ -1327,6 +1328,7 @@ initialize_argument_information (int num
>  #else
>                             args[i].reg != 0,
>  #endif
> +                           reg_parm_stack_space,
>                             args[i].pass_on_stack ? 0 : args[i].partial,
>                             fndecl, args_size, &args[i].locate);
>  #ifdef BLOCK_REG_PADDING
> @@ -3171,7 +3173,9 @@ expand_call (tree exp, rtx target, int i
>           group load/store machinery below.  */
>        if (!structure_value_addr
>            && !pcc_struct_value
> +          && TYPE_MODE (rettype) != VOIDmode
>            && TYPE_MODE (rettype) != BLKmode
> +          && REG_P (valreg)
>            && targetm.calls.return_in_msb (rettype))
>          {
>            if (shift_return_value (TYPE_MODE (rettype), false, valreg))
> @@ -3734,7 +3738,8 @@ emit_library_call_value_1 (int retval, r
>  #else
>                             argvec[count].reg != 0,
>  #endif
> -                           0, NULL_TREE, &args_size, &argvec[count].locate);
> +                           reg_parm_stack_space, 0,
> +                           NULL_TREE, &args_size, &argvec[count].locate);
>
>        if (argvec[count].reg == 0 || argvec[count].partial != 0
>            || reg_parm_stack_space > 0)
> @@ -3821,7 +3826,7 @@ emit_library_call_value_1 (int retval, r
>  #else
>                             argvec[count].reg != 0,
>  #endif
> -                           argvec[count].partial,
> +                           reg_parm_stack_space, argvec[count].partial,
>                             NULL_TREE, &args_size, &argvec[count].locate);
>        args_size.constant += argvec[count].locate.size.constant;
>        gcc_assert (!argvec[count].locate.size.var);
> Index: gcc-4_8-branch/gcc/function.c
> ===================================================================
> --- gcc-4_8-branch.orig/gcc/function.c 2013-12-28 17:41:32.056627059 +0100
> +++ gcc-4_8-branch/gcc/function.c 2013-12-28 17:50:43.362356165 +0100
> @@ -2507,6 +2507,7 @@ assign_parm_find_entry_rtl (struct assig
>      }
>
>    locate_and_pad_parm (data->promoted_mode, data->passed_type, in_regs,
> +                       all->reg_parm_stack_space,
>                         entry_parm ? data->partial : 0, current_function_decl,
>                         &all->stack_args_size, &data->locate);
>
> @@ -3485,11 +3486,7 @@ assign_parms (tree fndecl)
>    /* Adjust function incoming argument size for alignment and
>       minimum length.  */
>
> -#ifdef REG_PARM_STACK_SPACE
> -  crtl->args.size = MAX (crtl->args.size,
> -                         REG_PARM_STACK_SPACE (fndecl));
> -#endif
> -
> +  crtl->args.size = MAX (crtl->args.size, all.reg_parm_stack_space);
>    crtl->args.size = CEIL_ROUND (crtl->args.size,
>                                  PARM_BOUNDARY / BITS_PER_UNIT);
>
> @@ -3693,6 +3690,9 @@ gimplify_parameters (void)
>     IN_REGS is nonzero if the argument will be passed in registers.  It will
>     never be set if REG_PARM_STACK_SPACE is not defined.
>
> +   REG_PARM_STACK_SPACE is the number of bytes of stack space reserved
> +   for arguments which are passed in registers.
> +
Re: [4.8, PATCH 15/26] Backport Power8 and LE support: PR54537
On Wed, 19 Mar 2014, Bill Schmidt wrote:

> Hi,
>
> This patch (diff-pr54537) backports a fix for PR54537 which is unrelated but necessary. Copying Richard and Jakub for the common code.

Ok.

Thanks,
Richard.

> Thanks,
> Bill
>
>
> [libstdc++-v3]
>
> 2014-03-29  Bill Schmidt  wschm...@linux.vnet.ibm.com
>
>         Backport from mainline
>         2013-08-01  Fabien Chêne  fab...@gcc.gnu.org
>
>         PR c++/54537
>         * include/tr1/cmath: Remove pow(double,double) overload, remove a
>         duplicated comment about DR 550.  Add a comment to explain the issue.
>         * testsuite/tr1/8_c_compatibility/cmath/pow_cmath.cc: New.
>
> [gcc/cp]
>
> 2014-03-29  Bill Schmidt  wschm...@linux.vnet.ibm.com
>
>         Back port from mainline
>         2013-08-01  Fabien Chêne  fab...@gcc.gnu.org
>
>         PR c++/54537
>         * cp-tree.h: Check OVL_USED with OVERLOAD_CHECK.
>         * name-lookup.c (do_nonmember_using_decl): Make sure we have an
>         OVERLOAD before calling OVL_USED.  Call diagnose_name_conflict
>         instead of issuing an error without mentioning the conflicting
>         declaration.
>
> [gcc/testsuite]
>
> 2014-03-29  Bill Schmidt  wschm...@linux.vnet.ibm.com
>
>         Back port from mainline
>         2013-08-01  Fabien Chêne  fab...@gcc.gnu.org
>                     Peter Bergner  berg...@vnet.ibm.com
>
>         PR c++/54537
>         * g++.dg/overload/using3.C: New.
>         * g++.dg/overload/using2.C: Adjust.
>         * g++.dg/lookup/using9.C: Likewise.
>
>
> Index: gcc-4_8-test/gcc/cp/cp-tree.h
> ===================================================================
> --- gcc-4_8-test.orig/gcc/cp/cp-tree.h
> +++ gcc-4_8-test/gcc/cp/cp-tree.h
> @@ -331,7 +331,7 @@ typedef struct ptrmem_cst * ptrmem_cst_t
>  /* If set, this was imported in a using declaration.
>     This is not to confuse with being used somewhere, which
>     is not important for this node.  */
> -#define OVL_USED(NODE) TREE_USED (NODE)
> +#define OVL_USED(NODE) TREE_USED (OVERLOAD_CHECK (NODE))
>  /* If set, this OVERLOAD was created for argument-dependent lookup
>     and can be freed afterward.  */
>  #define OVL_ARG_DEPENDENT(NODE) TREE_LANG_FLAG_0 (OVERLOAD_CHECK (NODE))
> Index: gcc-4_8-test/gcc/cp/name-lookup.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/cp/name-lookup.c
> +++ gcc-4_8-test/gcc/cp/name-lookup.c
> @@ -2286,8 +2286,7 @@ push_overloaded_decl_1 (tree decl, int f
>                    && compparms (TYPE_ARG_TYPES (TREE_TYPE (fn)),
>                                  TYPE_ARG_TYPES (TREE_TYPE (decl)))
>                    && ! decls_match (fn, decl))
> -                error ("%q#D conflicts with previous using declaration %q#D",
> -                       decl, fn);
> +                diagnose_name_conflict (decl, fn);
>
>                dup = duplicate_decls (decl, fn, is_friend);
>                /* If DECL was a redeclaration of FN -- even an invalid
> @@ -2519,7 +2518,7 @@ do_nonmember_using_decl (tree scope, tre
>                if (new_fn == old_fn)
>                  /* The function already exists in the current namespace.  */
>                  break;
> -              else if (OVL_USED (tmp1))
> +              else if (TREE_CODE (tmp1) == OVERLOAD && OVL_USED (tmp1))
>                  continue; /* this is a using decl */
>                else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (new_fn)),
>                                    TYPE_ARG_TYPES (TREE_TYPE (old_fn))))
> @@ -2534,7 +2533,7 @@ do_nonmember_using_decl (tree scope, tre
>                      break;
>                    else
>                      {
> -                      error ("%qD is already declared in this scope", name);
> +                      diagnose_name_conflict (new_fn, old_fn);
>                        break;
>                      }
>                  }
> Index: gcc-4_8-test/gcc/testsuite/g++.dg/lookup/using9.C
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/g++.dg/lookup/using9.C
> +++ gcc-4_8-test/gcc/testsuite/g++.dg/lookup/using9.C
> @@ -21,11 +21,11 @@ void h()
>    f('h');
>    f(1);         // { dg-error "ambiguous" }
>    // { dg-message "candidate" "candidate note" { target *-*-* } 22 }
> -  void f(int);  // { dg-error "previous using declaration" }
> +  void f(int);  // { dg-error "previous declaration" }
>  }
>
>  void m()
>  {
>    void f(int);
> -  using B::f;   // { dg-error "already declared" }
> +  using B::f;   // { dg-error "previous declaration" }
>  }
> Index: gcc-4_8-test/gcc/testsuite/g++.dg/overload/using2.C
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/g++.dg/overload/using2.C
> +++ gcc-4_8-test/gcc/testsuite/g++.dg/overload/using2.C
> @@ -45,7 +45,7 @@ using std::C1;
>    extern "C" void exit (int) throw ();
>    extern "C" void *malloc (__SIZE_TYPE__) throw () __attribute__((malloc));
>
> -  void abort (void) throw ();
> +  void abort (void) throw (); // { dg-message "previous" }
>    void _exit (int) throw (); // { dg-error conflicts
Re: [PATCH][1/3] Fix PR54733 Optimize endian independent load/store
On Wed, Apr 2, 2014 at 2:54 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote:

> I took the lack of answer for this patch as an indication that the patch is too big. This is the first patch in a series of three. Its purpose is to create some new effective target for architecture having byte swap instructions and make use of them in the existing byte swap tests. One effective target is created for each size (16, 32 and 64) as not all architectures support byte swap of all sizes.

Sorry, I simply queued it in my review queue for stage1 ... it's definitely something that was high on my wish-list (including of also using general vector shuffles if available to support even more patterns).

Still on the queue, stay tuned ;)

Richard.

> Here is the gcc/testsuite/ChangeLog entry:
>
> 2014-04-01  Thomas Preud'homme  thomas.preudho...@arm.com
>
>         * lib/target-supports.exp: New effective targets for architectures
>         capable of performing byte swap.
>         * gcc.dg/optimize-bswapdi-1.c: Convert to new bswap target.
>         * gcc.dg/optimize-bswapdi-2.c: Likewise.
>         * gcc.dg/optimize-bswapsi-1.c: Likewise.
>
> The patch is attached to this email. Is this ok for stage1?
>
> Best regards,
>
> Thomas
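For context, the byte swap tests mentioned in the ChangeLog check that hand-written shift-and-mask code is recognized as a byte swap. A minimal sketch of such a pattern for the 32-bit case follows; the function names are hypothetical and this is not the content of gcc.dg/optimize-bswapsi-1.c, only an example of the same kind of idiom.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the four bytes of x using only shifts, masks and ORs.  A
   compiler with a byte swap recognition pass may turn this into a
   single byte swap instruction where the target has one.  */
uint32_t swap32_manual(uint32_t x)
{
    return ((x & 0x000000ffu) << 24)
         | ((x & 0x0000ff00u) << 8)
         | ((x & 0x00ff0000u) >> 8)
         | ((x & 0xff000000u) >> 24);
}

/* Small self-check: swapping twice must give back the input.  */
void check_swap32(void)
{
    assert(swap32_manual(0x11223344u) == 0x44332211u);
    assert(swap32_manual(swap32_manual(0xdeadbeefu)) == 0xdeadbeefu);
}
```

A separate effective target per width matters because a target may provide, say, a 32-bit swap instruction but no 16-bit or 64-bit one.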
Re: [PATCH][2/3] Fix PR54733 Optimize endian independent load/store
On Wed, Apr 2, 2014 at 9:04 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote:

> > From: Marc Glisse [mailto:marc.gli...@inria.fr]
> >
> > Uh? It does fold a+1-a for me. What it doesn't do is look through the definition of b in b-a. Richard+GSoC will supposedly soon provide a function that does that.
>
> Oh right, it's a bit more complex here since the array index is converted to an offset first. So the operation is more like: ((a+1)*cst) - (a*cst). Any chances this might be handled at some point? Note that this might not be very frequent so it's not very important for this patch.

"More like" isn't enough to answer this - do you have a testcase? (usually these end up in undefined-overflow and/or conversion-to-sizetype issues)

Richard.

> Thanks for the comment.
>
> Best regards,
>
> Thomas
RE: [PATCH][1/3] Fix PR54733 Optimize endian independent load/store
From: Richard Biener [mailto:richard.guent...@gmail.com]

> Sorry, I simply queued it in my review queue for stage1 ... it's definitely something that was high on my wish-list (including of also using general vector shuffles if available to support even more patterns).

Oh great. Anyway, having it split in 3 parts will ease the review for you.

Thanks.

Thomas
Re: [gomp4] Add tables generation
Hi!

On Thu, 20 Mar 2014 17:50:13 +0100, Bernd Schmidt ber...@codesourcery.com wrote:
> This is based on Michael Zolotukhin's patch 2/3 from a while ago. It adds functionality to build function/variable tables that will allow libgomp to look up offload target code based on the address of the corresponding host function. There are two alternatives, one based on named sections, and one based on a target hook when named sections are unavailable (as on ptx).
>
> Committed on gomp-4_0-branch.

> --- gcc/omp-low.c (revision 208706)
> +++ gcc/omp-low.c (working copy)
> @@ -8671,19 +8672,22 @@ expand_omp_target (struct omp_region *re
>      }
>
>    gimple g;
> -  /* FIXME: This will be address of
> -     extern char __OPENMP_TARGET__[] __attribute__((visibility ("hidden")))
> -     symbol, as soon as the linker plugin is able to create it for us.  */
> -  tree openmp_target = build_zero_cst (ptr_type_node);
> +  tree openmp_target
> +    = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +                  get_identifier ("__OPENMP_TARGET__"), ptr_type_node);
> +  TREE_PUBLIC (openmp_target) = 1;
> +  DECL_EXTERNAL (openmp_target) = 1;
>    if (kind == GF_OMP_TARGET_KIND_REGION)
>      {
>        tree fnaddr = build_fold_addr_expr (child_fn);
> -      g = gimple_build_call (builtin_decl_explicit (start_ix), 7,
> -                             device, fnaddr, openmp_target, t1, t2, t3, t4);
> +      g = gimple_build_call (builtin_decl_explicit (start_ix), 7, device,
> +                             fnaddr, build_fold_addr_expr (openmp_target),
> +                             t1, t2, t3, t4);
>      }
>    else
> -    g = gimple_build_call (builtin_decl_explicit (start_ix), 6,
> -                           device, openmp_target, t1, t2, t3, t4);
> +    g = gimple_build_call (builtin_decl_explicit (start_ix), 6, device,
> +                           build_fold_addr_expr (openmp_target),
> +                           t1, t2, t3, t4);

Committed in r209013:

commit 1f54e08135bd8be59438977b4edbc102e7cef2d7
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Wed Apr 2 08:28:54 2014 +

    Handle __OPENMP_TARGET__ symbol for OpenACC offloading functions, too.

    gcc/
    * omp-low.c (expand_oacc_offload): Handle __OPENMP_TARGET__ symbol.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@209013 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  5 +++++
 gcc/omp-low.c      | 14 ++++++++------
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 1d35b58..8983632 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2014-04-02  Thomas Schwinge  tho...@codesourcery.com
+
+        * omp-low.c (expand_oacc_offload): Handle __OPENMP_TARGET__
+        symbol.
+
 2014-03-20  Thomas Schwinge  tho...@codesourcery.com

         * gimple.h (enum gf_mask): Add GF_OMP_FOR_KIND_OACC_LOOP.
diff --git gcc/omp-low.c gcc/omp-low.c
index a7b93bc..01eda9d 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -5138,13 +5138,15 @@ expand_oacc_offload (struct omp_region *region)
     }

   gimple g;
-  /* FIXME: This will be address of
-     extern char __OPENMP_TARGET__[] __attribute__((visibility ("hidden")))
-     symbol, as soon as the linker plugin is able to create it for us.  */
-  tree openmp_target = build_zero_cst (ptr_type_node);
+  tree openmp_target
+    = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+                  get_identifier ("__OPENMP_TARGET__"), ptr_type_node);
+  TREE_PUBLIC (openmp_target) = 1;
+  DECL_EXTERNAL (openmp_target) = 1;
   tree fnaddr = build_fold_addr_expr (child_fn);
-  g = gimple_build_call (builtin_decl_explicit (start_ix),
-                         10, device, fnaddr, openmp_target, t1, t2, t3, t4,
+  g = gimple_build_call (builtin_decl_explicit (start_ix), 10, device,
+                         fnaddr, build_fold_addr_expr (openmp_target),
+                         t1, t2, t3, t4,
                          t_num_gangs, t_num_workers, t_vector_length);
   gimple_set_location (g, gimple_location (entry_stmt));
   gsi_insert_before (&gsi, g, GSI_SAME_STMT);

> +/* Create new symbol containing (address, size) pairs for omp-marked
> +   functions and global variables.  */
> +void
> +omp_finish_file (void)
> +{
> +  struct cgraph_node *node;
> +  struct varpool_node *vnode;
> +  const char *funcs_section_name = ".offload_func_table_section";
> +  const char *vars_section_name = ".offload_var_table_section";
> +  vec<tree, va_gc> *v_funcs, *v_vars;
> +
> +  vec_alloc (v_vars, 0);
> +  vec_alloc (v_funcs, 0);
> +
> +  [...]
> +  unsigned num_vars = vec_safe_length (v_vars);
> +  unsigned num_funcs = vec_safe_length (v_funcs);
> +  [...]
> +  if (targetm_common.have_named_sections)
> +    {
> +      [...]
> +    }
> +  else
> +    {
> +      for (unsigned i = 0; i < num_funcs; i++)
> +        {
> +          tree it = (*v_funcs)[i];
> +          targetm.record_offload_symbol (it);
> +        }
> +      for (unsigned i = 0; i < num_funcs; i++)
> +        {
> +          tree it =
Re: [gomp4] Add tables generation
Hi! On Wed, 02 Apr 2014 09:34:29 +0200, I wrote: On Thu, 20 Mar 2014 17:50:13 +0100, Bernd Schmidt ber...@codesourcery.com wrote: This is based on Michael Zolotukhin's patch 2/3 from a while ago. It adds functionality to build function/variable tables that will allow libgomp to look up offload target code based on the address of the corresponding host function. There are two alternatives, one based on named sections, and one based on a target hook when named sections are unavailable (as on ptx). Committed on gomp-4_0-branch. I see regressions in the libgomp testsuite for configurations where offloading is not enabled: spawn [...]/build/gcc/xgcc -B[...]/build/gcc/ [...]/source/libgomp/testsuite/libgomp.c/for-3.c -B[...]/build/x86_64-unknown-linux-gnu/./libgomp/ -B[...]/build/x86_64-unknown-linux-gnu/./libgomp/.libs -I[...]/build/x86_64-unknown-linux-gnu/./libgomp -I[...]/source/libgomp/testsuite/.. -fmessage-length=0 -fno-diagnostics-show-caret -fdiagnostics-color=never -fopenmp -std=gnu99 -fopenmp -L[...]/build/x86_64-unknown-linux-gnu/./libgomp/.libs -lm -o ./for-3.exe /tmp/ccGnT0ei.o: In function `main': for-3.c:(.text+0x21032): undefined reference to `__OPENMP_TARGET__' collect2: error: ld returned 1 exit status I suppose that's because [...] Workaround committed in r209015: commit 6a015f81a5fafe32cf45656e3de121f4088dbf41 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Wed Apr 2 08:29:17 2014 + Work around __OPENMP_TARGET__ not being defined for !ENABLE_OFFLOADING. libgcc/ * crtstuff.c [!ENABLE_OFFLOADING] (__OPENMP_TARGET__): Define to NULL. 
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@209015 138bc75d-0d04-0410-961f-82ee72b054a4 --- libgcc/ChangeLog.gomp | 10 ++ libgcc/crtstuff.c | 2 ++ 2 files changed, 12 insertions(+) diff --git libgcc/ChangeLog.gomp libgcc/ChangeLog.gomp new file mode 100644 index 000..7d08efa --- /dev/null +++ libgcc/ChangeLog.gomp @@ -0,0 +1,10 @@ +2014-04-02 Thomas Schwinge tho...@codesourcery.com + + * crtstuff.c [!ENABLE_OFFLOADING] (__OPENMP_TARGET__): Define to + NULL. + +Copyright (C) 2014 Free Software Foundation, Inc. + +Copying and distribution of this file, with or without modification, +are permitted in any medium without royalty provided the copyright +notice and this notice are preserved. diff --git libgcc/crtstuff.c libgcc/crtstuff.c index cda0bae..79af7f0 100644 --- libgcc/crtstuff.c +++ libgcc/crtstuff.c @@ -775,6 +775,8 @@ void *__OPENMP_TARGET__[] __attribute__ ((__visibility__ (protected))) = _omp_func_table, _omp_funcs_end, _omp_var_table, _omp_vars_end }; +#else +void **__OPENMP_TARGET__ __attribute__ ((__visibility__ (protected))) = NULL; #endif Also, I'd suggest to rename __OPENMP_TARGET__ (and similar ones) to __GNU_OFFLOAD__ (or similar). As we're using this offloading stuff for both OpenACC and OpenMP target, it makes sense to me to use a generic name; we still have the chance to do so now while this stuff is not yet in trunk. Grüße, Thomas pgpMH12KYLnx1.pgp Description: PGP signature
Re: [PATCH][LTO] Rework -flto-partition=, add =one case
On Tue, 1 Apr 2014, Jan Hubicka wrote: This reworks the option to use the Enum support we have now and adds a =one case (to eventually get rid of one LTO operation mode, =none ...). I was tempted to support -flto-partition=number and get rid of --param lto-partitions (thereby also supporting =1), Yep, I prefer to have one switch to choose the algorithm and another to set its parameter, as you do now. At the moment partitioning is quite a non-issue since only important IPA passes work on the whole thing, but that may change and we may want to play with different partitionings. (I have plans for that for incremental compilation and other things) Well, partitioning is important to get a parallel build. but that param specifies the maximum number of partitions and still uses the balanced algorithm, thus the result would be confusing (and of little use I suppose, as opposed to =1 which should give you the same answer as =none). =none still seems somewhat useful - for setups where you do multiple parallel compilations it will be faster than WHOPR and it helps developing IPA passes since you do not need to worry about WHOPR complexities at start. True, but as it ends up eating more memory your multiple parallel compilations may in the end be slower if they run into swap ;) And you can do simple IPA passes just where IPA-PTA sits now - at LTRANS level. But with the code to bring function bodies in on demand, this is less important. I believe with the pass manager being a bit more flexible, the code paths can be almost completely shared. I have a few patches on this and pass queue reorg for next stage1, so will try to push them out. Yeah, it would be nice to make the flow of compilation somewhat more obvious than it is now ... Richard.
[PATCH] Remove stale declaration
I noticed that we declare this function, but its definition was removed in 2009 by P. Bonzini, thus the decl serves no purpose.

Regtested/bootstrapped on x86_64-linux, ok for trunk?

2014-04-02  Marek Polacek  pola...@redhat.com

	* c-common.h (c_expand_expr): Remove declaration.

diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h
index 1099b10..24959d8 100644
--- gcc/c-family/c-common.h
+++ gcc/c-family/c-common.h
@@ -928,8 +928,6 @@ extern bool vector_targets_convertible_p (const_tree t1, const_tree t2);
 extern bool vector_types_convertible_p (const_tree t1, const_tree t2,
                                         bool emit_lax_note);
 extern tree c_build_vec_perm_expr (location_t, tree, tree, tree, bool = true);
-extern rtx c_expand_expr (tree, rtx, enum machine_mode, int, rtx *);
-
 extern void init_c_lex (void);
 extern void c_cpp_builtins (cpp_reader *);

	Marek
Re: [PATCH] Remove stale declaration
On Wed, Apr 2, 2014 at 12:36 PM, Marek Polacek pola...@redhat.com wrote:
> I noticed that we declare this function, but its definition was removed
> in 2009 by P. Bonzini, thus the decl serves no purpose.
>
> Regtested/bootstrapped on x86_64-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2014-04-02  Marek Polacek  pola...@redhat.com
>
> 	* c-common.h (c_expand_expr): Remove declaration.
>
> diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h
> index 1099b10..24959d8 100644
> --- gcc/c-family/c-common.h
> +++ gcc/c-family/c-common.h
> @@ -928,8 +928,6 @@ extern bool vector_targets_convertible_p (const_tree t1, const_tree t2);
>  extern bool vector_types_convertible_p (const_tree t1, const_tree t2,
>                                          bool emit_lax_note);
>  extern tree c_build_vec_perm_expr (location_t, tree, tree, tree, bool = true);
> -extern rtx c_expand_expr (tree, rtx, enum machine_mode, int, rtx *);
> -
>  extern void init_c_lex (void);
>  extern void c_cpp_builtins (cpp_reader *);
>
> 	Marek
Re: [committed, libjava] XFAIL sourcelocation (PR libgcj/55637) backported to 4.8.3
domi...@lps.ens.fr (Dominique Dhumieres) writes:

> r...@cebitec.uni-bielefeld.de (Rainer Orth) wrote:
>> Sure, patch preapproved.
>
> Committed as r208983:
>
> 2014-04-01  Dominique d'Humieres  domi...@lps.ens.fr
> 	    Rainer Orth  r...@cebitec.uni-bielefeld.de
>
> 	PR libgcj/55637
> 	* testsuite/libjava.lang/sourcelocation.xfail: New file.

Btw, the customary format for such a ChangeLog entry is

2014-04-01  Dominique d'Humieres  domi...@lps.ens.fr

	Backport from mainline
	2014-02-20  Rainer Orth  r...@cebitec.uni-bielefeld.de

	PR libgcj/55637
	* testsuite/libjava.lang/sourcelocation.xfail: New file.

This way, you can easily see when the original went in.

	Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH, ARM] Enable tail call optimization for long call
On 25/03/14 15:44, Richard Earnshaw wrote:
> On 24/03/14 11:26, Jiong Wang wrote:
>> This patch enables tail call optimization for long calls on ARM.
>>
>> Previously the check in arm_function_ok_for_sibcall was too strict and
>> the sibcall/sibcall_value expanders lacked the necessary support, so
>> tail-call opportunities for long calls were lost.
>>
>> OK for next stage 1?
>
> I think this is OK for EABI targets (since we can rely on the linker
> generating the right form of interworking veneer), but I'm less certain
> about other systems (do we still support COFF?).
>
> I think I'd prefer the patch to factor in TARGET_AAPCS_BASED and to
> assume that if that is true then arbitrary tail-calls are safe.

Hi Richard,

IMHO, this is actually a tail call optimization; we just need to make sure the register which holds the address is caller-saved, and then it will be OK.

I have updated the change log to fix the aarch64 typo.  The patch itself is unchanged, but I enclose it in this reply for completeness.

So, is it OK for next stage 1?

Thanks.

-- Jiong

gcc/
	* config/arm/predicates.md (call_insn_operand): Add long_call check.
	* config/arm/arm.md (sibcall, sibcall_value): Force the address to reg
	for long_call.
	* config/arm/arm.c (arm_function_ok_for_sibcall): Remove long_call
	restriction.

gcc/testsuite
	* gcc.target/arm/tail-long-call.c: New test.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d5f9ff3..8dcdfa8 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6087,11 +6087,6 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
   if (TARGET_VXWORKS_RTP && flag_pic && !targetm.binds_local_p (decl))
     return false;
 
-  /* Cannot tail-call to long calls, since these are out of range of
-     a branch instruction.  */
-  if (decl && arm_is_long_call_p (decl))
-    return false;
-
   /* If we are interworking and the function is not declared static
      then we can't tail-call it unless we know that it exists in this
      compilation unit (since it might be a Thumb routine).  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 2ddda02..fe285f0 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9444,8 +9444,10 @@
   "TARGET_32BIT"
   "
   {
-    if (!REG_P (XEXP (operands[0], 0))
-       && (GET_CODE (XEXP (operands[0], 0)) != SYMBOL_REF))
+    if ((!REG_P (XEXP (operands[0], 0))
+         && GET_CODE (XEXP (operands[0], 0)) != SYMBOL_REF)
+        || (GET_CODE (XEXP (operands[0], 0)) == SYMBOL_REF
+            && arm_is_long_call_p (SYMBOL_REF_DECL (XEXP (operands[0], 0)))))
      XEXP (operands[0], 0) = force_reg (SImode, XEXP (operands[0], 0));
 
     if (operands[2] == NULL_RTX)
@@ -9462,8 +9464,10 @@
   "TARGET_32BIT"
   "
   {
-    if (!REG_P (XEXP (operands[1], 0))
-       && (GET_CODE (XEXP (operands[1],0)) != SYMBOL_REF))
+    if ((!REG_P (XEXP (operands[1], 0))
+         && GET_CODE (XEXP (operands[1], 0)) != SYMBOL_REF)
+        || (GET_CODE (XEXP (operands[1], 0)) == SYMBOL_REF
+            && arm_is_long_call_p (SYMBOL_REF_DECL (XEXP (operands[1], 0)))))
      XEXP (operands[1], 0) = force_reg (SImode, XEXP (operands[1], 0));
 
     if (operands[3] == NULL_RTX)
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index ce5c9a8..3673343 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -673,5 +673,6 @@
        (match_code "reg" "0")))
 
 (define_predicate "call_insn_operand"
-  (ior (match_code "symbol_ref")
+  (ior (and (match_code "symbol_ref")
+            (match_test "!arm_is_long_call_p (SYMBOL_REF_DECL (op))"))
        (match_operand 0 "s_register_operand")))
diff --git a/gcc/testsuite/gcc.target/arm/tail-long-call.c b/gcc/testsuite/gcc.target/arm/tail-long-call.c
new file mode 100644
index 000..9b27468
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/tail-long-call.c
@@ -0,0 +1,12 @@
+/* { dg-skip-if "need at least armv5te" { *-*-* } { "-march=armv[234]*" "-mthumb" } { "" } } */
+/* { dg-options "-O2 -march=armv5te -marm" } */
+/* { dg-final { scan-assembler "bx" } } */
+/* { dg-final { scan-assembler-not "blx" } } */
+
+int lcal (int) __attribute__ ((long_call));
+
+int
+dec (int a)
+{
+  return lcal (a);
+}
Re: [PATCH][AARCH64] Support tail indirect function call
^Ping...

Regards,
Jiong

On 18/03/14 14:13, Jiong Wang wrote:
> Currently, indirect function calls prevent tail-call optimization on
> AArch64.
>
> This patch adapts the fix for PR arm/19599 to AArch64.
>
> Is it OK for next stage 1?
>
> Thanks.
> -- Jiong
>
> gcc/
> 	* config/aarch64/predicates.md (aarch64_call_insn_operand): New
> 	predicate.
> 	* config/aarch64/constraints.md (Ucs, Usf): New constraints.
> 	* config/aarch64/aarch64.md (*sibcall_insn, *sibcall_value_insn):
> 	Adjust for tailcalling through registers.
> 	* config/aarch64/aarch64.h (enum reg_class): New caller save
> 	register class.
> 	(REG_CLASS_NAMES): Likewise.
> 	(REG_CLASS_CONTENTS): Likewise.
> 	* config/aarch64/aarch64.c (aarch64_function_ok_for_sibcall): Allow
> 	tailcalling without decls.
>
> gcc/testsuite
> 	* gcc.target/aarch64/tail-indirect-call.c: New test.

-- Jiong
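For readers unfamiliar with the case under discussion, here is a minimal sketch of an indirect call in tail position. It is an illustration only, not the contents of the tail-indirect-call.c test named in the ChangeLog (that file is not shown in this message), and the names `apply` and `twice` are made up. Whether a given compiler actually turns the call into a plain register branch depends on its version and options.

```c
/* Hypothetical example: a call through a function pointer in tail
   position.  The names are illustrative and do not come from the patch.  */

typedef int (*unary_fn) (int);

/* Nothing remains to be done after FN returns, so the call is a
   candidate for sibling-call (tail-call) optimization.  */
int
apply (unary_fn fn, int x)
{
  return fn (x);
}

/* A simple callee to pass to APPLY.  */
int
twice (int x)
{
  return 2 * x;
}
```

Calling `apply (twice, 21)` returns the same value whether or not the call is optimized; only the generated branch sequence differs.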
[PATCH][LTO/PGO] Warn when both -flto and -fprofile-generate are enabled
It is a common mistake to enable both -flto and -fprofile-generate when building projects.  This is not a good idea, because memory use will skyrocket due to instrumentation.  So just warn the user.

OK for next stage1?

2014-04-02  Markus Trippelsdorf  mar...@trippelsdorf.de

	* common.opt (fprofile-generate): Add flag.
	* opts.c (finish_options): Add new warning.
	(common_handle_option): Set flag.

diff --git a/gcc/common.opt b/gcc/common.opt
index 62c72f0d2fbf..61e9adfa0df5 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1689,7 +1689,7 @@ Common Report Var(flag_profile_correction)
 Enable correction of flow inconsistent profile data input
 
 fprofile-generate
-Common
+Common Var(flag_profile_generate)
 Enable common options for generating profile info for profile feedback directed optimizations
 
 fprofile-generate=
diff --git a/gcc/opts.c b/gcc/opts.c
index fdc903f9271a..b62a0d626d94 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -833,6 +833,9 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
       error_at (loc, "only one -flto-partition value can be specified");
     }
 
+  if (opts->x_flag_generate_lto && opts->x_flag_profile_generate)
+    warning_at (loc, 0, "Enabling both -fprofile-generate and -flto is a bad idea.");
+
   /* We initialize opts->x_flag_split_stack to -1 so that targets can set a
      default value if they choose based on other options.  */
   if (opts->x_flag_split_stack == -1)
@@ -1728,6 +1731,7 @@ common_handle_option (struct gcc_options *opts,
 
     case OPT_fprofile_generate_:
       opts->x_profile_data_prefix = xstrdup (arg);
+      opts->x_flag_profile_generate = true;
       value = true;
       /* No break here - do -fprofile-generate processing. */
     case OPT_fprofile_generate:

-- 
Markus
Re: [PATCH][LTO/PGO] Warn when both -flto and -fprofile-generate are enabled
On Wed, Apr 02, 2014 at 01:50:31PM +0200, Markus Trippelsdorf wrote:
> +  if (opts->x_flag_generate_lto && opts->x_flag_profile_generate)
> +    warning_at (loc, 0, "Enabling both -fprofile-generate and -flto is a bad idea.");

s/Enabling/enabling/ + no dot at the end.

	Marek
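The two points raised in this review (lower-case first letter, no trailing period) can be sketched as a small standalone check. This is a deliberately simplistic illustration; `diag_style_ok` is a hypothetical helper, not a GCC function, and it ignores legitimate exceptions such as messages that begin with a proper name.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Toy check for the two points raised in the review: a diagnostic
   message should not start with an upper-case letter and should not end
   with a period.  This is a hypothetical helper, not a GCC function.  */
bool
diag_style_ok (const char *msg)
{
  size_t len = strlen (msg);

  if (len == 0)
    return false;
  if (isupper ((unsigned char) msg[0]))
    return false;
  if (msg[len - 1] == '.')
    return false;
  return true;
}
```

Under this check the message as first posted is rejected, while the same text with a lower-case initial and no final dot is accepted.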
[PATCHv2][LTO/PGO] Warn when both -flto and -fprofile-generate are enabled
It is a common mistake to enable both -flto and -fprofile-generate when building projects.  This is not a good idea, because memory use will skyrocket due to instrumentation.  So just warn the user.

OK for next stage1?

2014-04-02  Markus Trippelsdorf  mar...@trippelsdorf.de

	* common.opt (fprofile-generate): Add flag.
	* opts.c (finish_options): Add new warning.
	(common_handle_option): Set flag.

diff --git a/gcc/common.opt b/gcc/common.opt
index 62c72f0d2fbf..61e9adfa0df5 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1689,7 +1689,7 @@ Common Report Var(flag_profile_correction)
 Enable correction of flow inconsistent profile data input
 
 fprofile-generate
-Common
+Common Var(flag_profile_generate)
 Enable common options for generating profile info for profile feedback directed optimizations
 
 fprofile-generate=
diff --git a/gcc/opts.c b/gcc/opts.c
index fdc903f9271a..581d2e948483 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -833,6 +833,9 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
       error_at (loc, "only one -flto-partition value can be specified");
     }
 
+  if (opts->x_flag_generate_lto && opts->x_flag_profile_generate)
+    warning_at (loc, 0, "enabling both -fprofile-generate and -flto is a bad idea");
+
   /* We initialize opts->x_flag_split_stack to -1 so that targets can set a
      default value if they choose based on other options.  */
   if (opts->x_flag_split_stack == -1)
@@ -1728,6 +1731,7 @@ common_handle_option (struct gcc_options *opts,
 
     case OPT_fprofile_generate_:
       opts->x_profile_data_prefix = xstrdup (arg);
+      opts->x_flag_profile_generate = true;
       value = true;
       /* No break here - do -fprofile-generate processing. */
     case OPT_fprofile_generate:

-- 
Markus
Re: [PATCH][LTO/PGO] Warn when both -flto and -fprofile-generate are enabled
On Wed, Apr 2, 2014 at 1:50 PM, Markus Trippelsdorf mar...@trippelsdorf.de wrote:
> It is a common mistake to enable both -flto and -fprofile-generate when
> building projects.  This is not a good idea, because memory use will
> skyrocket due to instrumentation.  So just warn the user.
>
> OK for next stage1?

I'd rather see if we can fix the underlying issue.  For example as we are now instrumenting as IPA pass we can allocate a single counter array (if the number of global vars is the issue).  Basically split analysis and instrumentation into two phases for that.  Or even better, do profile instrumentation as real IPA pass.

Richard.

> 2014-04-02  Markus Trippelsdorf  mar...@trippelsdorf.de
>
> 	* common.opt (fprofile-generate): Add flag.
> 	* opts.c (finish_options): Add new warning.
> 	(common_handle_option): Set flag.
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 62c72f0d2fbf..61e9adfa0df5 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1689,7 +1689,7 @@ Common Report Var(flag_profile_correction)
>  Enable correction of flow inconsistent profile data input
>
>  fprofile-generate
> -Common
> +Common Var(flag_profile_generate)
>  Enable common options for generating profile info for profile feedback directed optimizations
>
>  fprofile-generate=
> diff --git a/gcc/opts.c b/gcc/opts.c
> index fdc903f9271a..b62a0d626d94 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -833,6 +833,9 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
>        error_at (loc, "only one -flto-partition value can be specified");
>      }
>
> +  if (opts->x_flag_generate_lto && opts->x_flag_profile_generate)
> +    warning_at (loc, 0, "Enabling both -fprofile-generate and -flto is a bad idea.");
> +
>    /* We initialize opts->x_flag_split_stack to -1 so that targets can set a
>       default value if they choose based on other options.  */
>    if (opts->x_flag_split_stack == -1)
> @@ -1728,6 +1731,7 @@ common_handle_option (struct gcc_options *opts,
>
>      case OPT_fprofile_generate_:
>        opts->x_profile_data_prefix = xstrdup (arg);
> +      opts->x_flag_profile_generate = true;
>        value = true;
>        /* No break here - do -fprofile-generate processing. */
>      case OPT_fprofile_generate:
> --
> Markus
Re: [PATCH][LTO/PGO] Warn when both -flto and -fprofile-generate are enabled
On Wed, Apr 2, 2014 at 2:07 PM, Richard Biener richard.guent...@gmail.com wrote:
> On Wed, Apr 2, 2014 at 1:50 PM, Markus Trippelsdorf mar...@trippelsdorf.de wrote:
>> It is a common mistake to enable both -flto and -fprofile-generate when
>> building projects.  This is not a good idea, because memory use will
>> skyrocket due to instrumentation.  So just warn the user.
>>
>> OK for next stage1?
>
> I'd rather see if we can fix the underlying issue.  For example as we
> are now instrumenting as IPA pass we can allocate a single counter array
> (if the number of global vars is the issue).  Basically split analysis
> and instrumentation into two phases for that.  Or even better, do
> profile instrumentation as real IPA pass.

Thus, isn't -coverage also facing the same issue?  Thus, is it really -fprofile-arcs already or only one of the value profiling pieces?

Richard.

> Richard.
>
>> 2014-04-02  Markus Trippelsdorf  mar...@trippelsdorf.de
>>
>> 	* common.opt (fprofile-generate): Add flag.
>> 	* opts.c (finish_options): Add new warning.
>> 	(common_handle_option): Set flag.
>>
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index 62c72f0d2fbf..61e9adfa0df5 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -1689,7 +1689,7 @@ Common Report Var(flag_profile_correction)
>>  Enable correction of flow inconsistent profile data input
>>
>>  fprofile-generate
>> -Common
>> +Common Var(flag_profile_generate)
>>  Enable common options for generating profile info for profile feedback directed optimizations
>>
>>  fprofile-generate=
>> diff --git a/gcc/opts.c b/gcc/opts.c
>> index fdc903f9271a..b62a0d626d94 100644
>> --- a/gcc/opts.c
>> +++ b/gcc/opts.c
>> @@ -833,6 +833,9 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
>>        error_at (loc, "only one -flto-partition value can be specified");
>>      }
>>
>> +  if (opts->x_flag_generate_lto && opts->x_flag_profile_generate)
>> +    warning_at (loc, 0, "Enabling both -fprofile-generate and -flto is a bad idea.");
>> +
>>    /* We initialize opts->x_flag_split_stack to -1 so that targets can set a
>>       default value if they choose based on other options.  */
>>    if (opts->x_flag_split_stack == -1)
>> @@ -1728,6 +1731,7 @@ common_handle_option (struct gcc_options *opts,
>>
>>      case OPT_fprofile_generate_:
>>        opts->x_profile_data_prefix = xstrdup (arg);
>> +      opts->x_flag_profile_generate = true;
>>        value = true;
>>        /* No break here - do -fprofile-generate processing. */
>>      case OPT_fprofile_generate:
>> --
>> Markus
Re: [PATCHv2][LTO/PGO] Warn when both -flto and -fprofile-generate are enabled
Markus Trippelsdorf mar...@trippelsdorf.de writes:

> diff --git a/gcc/opts.c b/gcc/opts.c
> index fdc903f9271a..581d2e948483 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -833,6 +833,9 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
>        error_at (loc, "only one -flto-partition value can be specified");
>      }
>
> +  if (opts->x_flag_generate_lto && opts->x_flag_profile_generate)
> +    warning_at (loc, 0, "enabling both -fprofile-generate and -flto is a bad idea");

This warning is not very helpful in this form.  Rather say something like `causes excessive memory consumption' if this is the problem.

	Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH] Simple enhancements to dumping in ipa.c and ipa-cp.c
Hi,

recently I've been looking into a number of bugs involving symtab_remove_unreachable_nodes in one way or another and I have always started by applying the hunk below.  I did this because distinguishing different symbol nodes only according to their names is just so inconvenient, especially when compiling C++.  The risk is minimal and therefore I'd like to propose it to trunk even at this late stage, although I can of course wait until the next stage1.

The other hunk is something that I think is also useful when looking into all failures of ipcp_verify_propagated_values like e.g. PR 60727.

I included the patch in a recent bootstrap and testing and it of course passes.  OK for trunk now?  Or later?

Thanks,

Martin

2014-04-01  Martin Jambor  mjam...@suse.cz

	* ipa-cp.c (ipcp_verify_propagated_values): Also dump symtab and
	mention gcc_unreachable before failing.
	* ipa.c (symtab_remove_unreachable_nodes): Also print order of
	removed symbols.

Index: src/gcc/ipa-cp.c
===================================================================
--- src.orig/gcc/ipa-cp.c
+++ src/gcc/ipa-cp.c
@@ -884,8 +884,9 @@ ipcp_verify_propagated_values (void)
             {
               if (dump_file)
                 {
+                  dump_symtab (dump_file);
                   fprintf (dump_file, "\nIPA lattices after constant "
-                           "propagation:\n");
+                           "propagation, before gcc_unreachable:\n");
                   print_all_lattices (dump_file, true, false);
                 }
 
Index: src/gcc/ipa.c
===================================================================
--- src.orig/gcc/ipa.c
+++ src/gcc/ipa.c
@@ -469,7 +469,7 @@ symtab_remove_unreachable_nodes (bool be
       if (!node->aux)
         {
           if (file)
-            fprintf (file, " %s", node->name ());
+            fprintf (file, " %s/%i", node->name (), node->order);
           cgraph_remove_node (node);
           changed = true;
         }
@@ -483,7 +483,7 @@ symtab_remove_unreachable_nodes (bool be
           if (node->definition)
             {
               if (file)
-                fprintf (file, " %s", node->name ());
+                fprintf (file, " %s/%i", node->name (), node->order);
               node->body_removed = true;
               node->analyzed = false;
               node->definition = false;
@@ -531,7 +531,7 @@ symtab_remove_unreachable_nodes (bool be
               && (!flag_ltrans || !DECL_EXTERNAL (vnode->decl)))
         {
           if (file)
-            fprintf (file, " %s", vnode->name ());
+            fprintf (file, " %s/%i", vnode->name (), vnode->order);
           varpool_remove_node (vnode);
           changed = true;
         }
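To illustrate why printing the order alongside the name helps, here is a small standalone sketch. It does not use GCC's symbol table types: `struct toy_node` and `format_node` are hypothetical stand-ins, and only the "%s/%i" format itself is taken from the patch.

```c
#include <stdio.h>

/* Toy stand-in for a symbol table node; the real GCC structures are
   far richer.  */
struct toy_node
{
  const char *name;
  int order;
};

/* Format NODE as "name/order" into BUF, mirroring the "%s/%i" format
   used in the patch.  Returns the value of snprintf.  */
int
format_node (char *buf, size_t size, const struct toy_node *node)
{
  return snprintf (buf, size, "%s/%i", node->name, node->order);
}
```

Two nodes that share a name, as can easily happen with C++ clones and overloads, then produce distinct dump strings as long as their orders differ.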
[PATCH] Disable IPA-SRA for always_inline functions
Hi,

when dealing with a PR yesterday I noticed that IPA-SRA was modifying an always_inline function, which is useless work since the function must then be inlined anyway.  Thus I'd like to propose the following simple change disabling it in such cases.

Included in a bootstrap and testing on x86_64-linux.  OK for trunk now or in the next stage1?

Thanks,

Martin

2014-04-01  Martin Jambor  mjam...@suse.cz

	* tree-sra.c (ipa_sra_preliminary_function_checks): Skip
	always_inline functions.

Index: src/gcc/tree-sra.c
===================================================================
--- src.orig/gcc/tree-sra.c
+++ src/gcc/tree-sra.c
@@ -4960,6 +4960,15 @@ ipa_sra_preliminary_function_checks (str
   if (TYPE_ATTRIBUTES (TREE_TYPE (node->decl)))
     return false;
 
+  if (lookup_attribute ("always_inline",
+                        DECL_ATTRIBUTES (node->decl)) != NULL)
+    {
+      if (dump_file)
+        fprintf (dump_file, "Allways inline function will be inlined "
+                 "anyway. \n");
+      return false;
+    }
+
   return true;
 }
Re: [PATCH] Simple enhancements to dumping in ipa.c and ipa-cp.c
On Wed, 2 Apr 2014, Martin Jambor wrote:

> Hi,
>
> recently I've been looking into a number of bugs involving
> symtab_remove_unreachable_nodes in one way or another and I have always
> started by applying the hunk below.  I did this because distinguishing
> different symbol nodes only according to their names is just so
> inconvenient, especially when compiling C++.  The risk is minimal and
> therefore I'd like to propose it to trunk even at this late stage,
> although I can of course wait until the next stage1.
>
> The other hunk is something that I think is also useful when looking
> into all failures of ipcp_verify_propagated_values like e.g. PR 60727.
>
> I included the patch in a recent bootstrap and testing and it of course
> passes.  OK for trunk now?  Or later?

I'll leave the actual changes for review by Honza; it's fine at this stage if he thinks the changes make sense and are consistent.

Thanks,
Richard.

> Thanks,
>
> Martin
>
> 2014-04-01  Martin Jambor  mjam...@suse.cz
>
> 	* ipa-cp.c (ipcp_verify_propagated_values): Also dump symtab and
> 	mention gcc_unreachable before failing.
> 	* ipa.c (symtab_remove_unreachable_nodes): Also print order of
> 	removed symbols.
>
> Index: src/gcc/ipa-cp.c
> ===================================================================
> --- src.orig/gcc/ipa-cp.c
> +++ src/gcc/ipa-cp.c
> @@ -884,8 +884,9 @@ ipcp_verify_propagated_values (void)
>              {
>                if (dump_file)
>                  {
> +                  dump_symtab (dump_file);
>                    fprintf (dump_file, "\nIPA lattices after constant "
> -                           "propagation:\n");
> +                           "propagation, before gcc_unreachable:\n");
>                    print_all_lattices (dump_file, true, false);
>                  }
>
> Index: src/gcc/ipa.c
> ===================================================================
> --- src.orig/gcc/ipa.c
> +++ src/gcc/ipa.c
> @@ -469,7 +469,7 @@ symtab_remove_unreachable_nodes (bool be
>        if (!node->aux)
>          {
>            if (file)
> -            fprintf (file, " %s", node->name ());
> +            fprintf (file, " %s/%i", node->name (), node->order);
>            cgraph_remove_node (node);
>            changed = true;
>          }
> @@ -483,7 +483,7 @@ symtab_remove_unreachable_nodes (bool be
>            if (node->definition)
>              {
>                if (file)
> -                fprintf (file, " %s", node->name ());
> +                fprintf (file, " %s/%i", node->name (), node->order);
>                node->body_removed = true;
>                node->analyzed = false;
>                node->definition = false;
> @@ -531,7 +531,7 @@ symtab_remove_unreachable_nodes (bool be
>                && (!flag_ltrans || !DECL_EXTERNAL (vnode->decl)))
>          {
>            if (file)
> -            fprintf (file, " %s", vnode->name ());
> +            fprintf (file, " %s/%i", vnode->name (), vnode->order);
>            varpool_remove_node (vnode);
>            changed = true;
>          }

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
Re: [PATCH] Disable IPA-SRA for always_inline functions
On Wed, 2 Apr 2014, Martin Jambor wrote:

> Hi,
>
> when dealing with a PR yesterday I noticed that IPA-SRA was modifying
> an always_inline function, which is useless work since the function
> must then be inlined anyway.  Thus I'd like to propose the following
> simple change disabling it in such cases.
>
> Included in a bootstrap and testing on x86_64-linux.  OK for trunk now
> or in the next stage1?

Ok for next stage1, but please short-cut the lookup_attribute with a DECL_DISREGARD_INLINE_LIMITS () check.  Maybe even abstract this away into a predicate on the cgraph node.

Thanks,
Richard.

> Thanks,
>
> Martin
>
> 2014-04-01  Martin Jambor  mjam...@suse.cz
>
> 	* tree-sra.c (ipa_sra_preliminary_function_checks): Skip
> 	always_inline functions.
>
> Index: src/gcc/tree-sra.c
> ===================================================================
> --- src.orig/gcc/tree-sra.c
> +++ src/gcc/tree-sra.c
> @@ -4960,6 +4960,15 @@ ipa_sra_preliminary_function_checks (str
>    if (TYPE_ATTRIBUTES (TREE_TYPE (node->decl)))
>      return false;
>
> +  if (lookup_attribute ("always_inline",
> +                        DECL_ATTRIBUTES (node->decl)) != NULL)
> +    {
> +      if (dump_file)
> +        fprintf (dump_file, "Allways inline function will be inlined "
> +                 "anyway. \n");
> +      return false;
> +    }
> +
>    return true;
>  }

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
Re: [PATCH][ARM] Handle simple SImode PLUS and MINUS operations in rtx costs
Pinging this for stage1, otherwise I'll forget about it and it'll fall through the cracks...

http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01276.html

Thanks,
Kyrill

On 24/03/14 17:21, Kyrill Tkachov wrote:
> Hi all,
>
> I noticed that we don't handle simple reg-to-reg arithmetic operations
> in the arm rtx cost functions.  We should be adding the cost of
> alu.arith to the costs of the operands.  This patch does that.  Since we
> don't have any cost tables yet that have a non-zero value for that field
> it shouldn't affect code-gen for any current cores.
>
> Bootstrapped and tested on arm-none-linux-gnueabihf.
>
> Ok for next stage1?
>
> Thanks,
> Kyrill
>
> 2014-03-24  Kyrylo Tkachov  kyrylo.tkac...@arm.com
>
> 	* config/arm/arm.c (arm_new_rtx_costs): Handle reg-to-reg PLUS and
> 	MINUS RTXs.
[PATCH] [ARM] [RFC] Fix longstanding push_minipool_fix ICE (PR49423, lp1296601)
Hi This patch fixes the push_minipool_fix ICE, which occurs when the ARM backend encounters a zero/sign extending load from a constant pool. I don't have a current test case for trunk, lp1296601 has a test case which affects the linaro-4.8 branch. As far as I know, there has been no fix for this on trunk. The approach taken in this patch is to extend each pattern where this can occur, so that it triggers a define_split to synthesise a constant move instead. Some but not all extend patterns have previously added pool_range attributes to work-around this problem, this patch removes those, and also fixes the remaining patterns. Some patterns have slightly more complex workarounds, which I have not yet analysed, but it seems worth posting the patch at this stage to get feedback on the general approach. Tested on arm-unknown-linux-gnueabihf (qemu), bootstrap in progress. If this looks good, I'll clean it up for a more detailed review. Thanks Charles 0001-initial-attempt-at-fixing-push_minipool_fix-ICE.patch Description: application/download
Re: [Patch, AArch64] Fix shuffle for big-endian.
Richard Henderson wrote:
> On 02/21/2014 08:30 AM, Tejas Belagod wrote:
>> +      /* If two vectors, we end up with a wierd mixed-endian mode on NEON.  */
>> +      if (BYTES_BIG_ENDIAN)
>> +        {
>> +          if (!d->one_vector_p && d->perm[i] > nunits)
>> +            {
>> +              /* Extract the offset.  */
>> +              elt = d->perm[i] & (nunits - 1);
>> +              /* Reverse the top half.  */
>> +              elt = nunits - 1 - elt;
>> +              /* Offset it by the bottom half.  */
>> +              elt += nunits;
>> +            }
>> +          else
>> +            elt = nunits - 1 - d->perm[i];
>> +        }
>
> Isn't this just
>
>   elt = d->perm[i] ^ (nunits - 1);
>
> all the time?  I.e. invert the index within the word, but leave the word
> index (nunits) unchanged.

Here is a revised patch.  OK for stage-1?

Thanks
Tejas.

2014-04-02  Tejas Belagod  tejas.bela...@yahoo.com

gcc/
	* config/aarch64/aarch64.c (aarch64_evpc_tbl): Reverse order of
	elements for big-endian.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e839539..d30b79c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8129,7 +8129,15 @@ aarch64_evpc_tbl (struct expand_vec_perm_d *d)
     return false;
 
   for (i = 0; i < nelt; ++i)
-    rperm[i] = GEN_INT (d->perm[i]);
+    {
+      int nunits = GET_MODE_NUNITS (vmode);
+
+      /* If big-endian and two vectors we end up with a wierd mixed-endian
+         mode on NEON.  Reverse the index within each word but not the word
+         itself.  */
+      rperm[i] = GEN_INT (BYTES_BIG_ENDIAN ? d->perm[i] ^ (nunits - 1)
+                          : d->perm[i]);
+    }
   sel = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rperm));
   sel = force_reg (vmode, sel);
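The equivalence suggested in the review can be checked with a small standalone sketch. It rests on two assumptions that are not stated in the thread: nunits is a power of two, and the branch in the first version is meant to select every index that refers to the second vector (modelled here as `perm >= nunits`). The function names are hypothetical and none of this is GCC code.

```c
#include <stdbool.h>

/* Index remapping modelled on the first version of the patch.  The
   comparison is taken to be ">=", which is an assumption.  */
unsigned
remap_branchy (unsigned perm, unsigned nunits, bool one_vector_p)
{
  unsigned elt;

  if (!one_vector_p && perm >= nunits)
    {
      elt = perm & (nunits - 1);   /* Extract the offset.  */
      elt = nunits - 1 - elt;      /* Reverse the top half.  */
      elt += nunits;               /* Offset it by the bottom half.  */
    }
  else
    elt = nunits - 1 - perm;
  return elt;
}

/* The single-expression form suggested in the review.  */
unsigned
remap_xor (unsigned perm, unsigned nunits)
{
  return perm ^ (nunits - 1);
}

/* Return true if both forms agree for every valid index, assuming
   NUNITS is a power of two.  */
bool
remaps_agree (unsigned nunits)
{
  /* One input vector: indices 0 .. nunits-1.  */
  for (unsigned i = 0; i < nunits; i++)
    if (remap_branchy (i, nunits, true) != remap_xor (i, nunits))
      return false;
  /* Two input vectors: indices 0 .. 2*nunits-1.  */
  for (unsigned i = 0; i < 2 * nunits; i++)
    if (remap_branchy (i, nunits, false) != remap_xor (i, nunits))
      return false;
  return true;
}
```

The XOR with `nunits - 1` flips only the low bits, which is exactly "reverse the position within a vector", while the bit that selects the first or second vector is left alone.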
Re: [PATCH] aarch64 support for libitm
On 04/01/2014 03:41 PM, Andrew Pinski wrote: On Tue, Apr 1, 2014 at 3:24 PM, Richard Henderson r...@redhat.com wrote: Comments? If approved, should this go in for 4.9, or wait for stage1? Certainly it's self-contained... On Cavium's Thunder processor the cache line size is going to be bigger than 64 bytes; what is your solution to improve performance on targets like Thunder? We can expand the number reasonably. The only thing it controls is layout of some of the internal data structures to attempt to put different locks on different lines. Is 128 big enough for Thunder? Honestly, I may well not even have it right for the processor we have in house. I didn't bother trying to track down docs to find out. Also I think the default page size for most Linux distros is going to be 64k on aarch64 including Redhat Linux so it makes sense not to define FIXED_PAGE_SIZE. Heh. It turns out these page size defines aren't used any more at all. During one of the rewrites we must have deleted the bits that used them. I'll get rid of all of them so as to be less confusing. I will implement the ILP32 version of this patch once it goes in; it needs a few changes in gtm_jmpbuf because long and pointers are 32-bit while the assembly always stores 64 bits. I can minimize those changes now by using unsigned long long... r~
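To make the "different locks on different lines" remark concrete, a minimal C11 sketch follows. It is not libitm code: the struct and macro names are hypothetical, and 128 is simply the value floated in this thread, not a verified figure for any processor.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* ASSUMPTION: 128 is the number discussed above, used here only as an
   example value.  */
#define EXAMPLE_CACHELINE_SIZE 128

/* Hypothetical lock slot.  Aligning the member pads the whole struct
   out to one full line, so adjacent array elements never share a line.  */
struct example_lock_slot
{
  alignas (EXAMPLE_CACHELINE_SIZE) unsigned long word;
};

static_assert (sizeof (struct example_lock_slot) == EXAMPLE_CACHELINE_SIZE,
               "one slot per cache line");

/* Byte distance between two adjacent slots in an array.  */
size_t
example_slot_stride (void)
{
  static struct example_lock_slot slots[2];
  return (size_t) ((char *) &slots[1] - (char *) &slots[0]);
}
```

Raising the constant only costs padding, which matches the point that the value affects data layout rather than correctness.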
Re: [PATCH][1/3] Fix PR54733 Optimize endian independent load/store
On Wed, 2 Apr 2014, Thomas Preud'homme wrote: + if { [is-effective-target bswap] + ![istarget x86_64-*-*] } { That x86_64-*-* test is wrong. x86_64-*-* and i?86-*-* should always be handled the same (if you then want to distinguish 32-bit and 64-bit multilibs, you check the appropriate effective-target there, depending on whether the condition is one on the ABI or which register size is being used, which affects how x32 should be counted). -- Joseph S. Myers jos...@codesourcery.com
[4.8, PATCH 27/26] Backport Power8 and LE support: Fixes for AIX test failures
Hi, This patch (diff-aix) adds to the 4.8 PowerPC backport patch series with a few backported fixes from trunk that repair test failures on AIX. Thanks, Bill [gcc] 2014-04-02 Bill Schmidt wschm...@linux.vnet.ibm.com Backport from mainline r205308 2013-11-23 David Edelsohn dje@gmail.com * config/rs6000/rs6000.c (IN_NAMED_SECTION): New macro. (rs6000_xcoff_select_section): Place decls with stricter alignment into named sections. (rs6000_xcoff_unique_section): Allow unique sections for uninitialized data with strict alignment. [gcc/testsuite] 2014-04-02 Bill Schmidt wschm...@linux.vnet.ibm.com Backport from mainline 2013-04-05 David Edelsohn dje@gmail.com * gcc.target/powerpc/sd-vsx.c: Skip on AIX. * gcc.target/powerpc/sd-pwr6.c: Same. Index: gcc-4_8-test2/gcc/config/rs6000/rs6000.c === --- gcc-4_8-test2.orig/gcc/config/rs6000/rs6000.c +++ gcc-4_8-test2/gcc/config/rs6000/rs6000.c @@ -29165,10 +29165,23 @@ rs6000_xcoff_asm_named_section (const ch name, suffix[smclass], flags SECTION_ENTSIZE); } +#define IN_NAMED_SECTION(DECL) \ + ((TREE_CODE (DECL) == FUNCTION_DECL || TREE_CODE (DECL) == VAR_DECL) \ +DECL_SECTION_NAME (DECL) != NULL_TREE) + static section * rs6000_xcoff_select_section (tree decl, int reloc, -unsigned HOST_WIDE_INT align ATTRIBUTE_UNUSED) +unsigned HOST_WIDE_INT align) { + /* Place variables with alignment stricter than BIGGEST_ALIGNMENT into + named section. */ + if (align BIGGEST_ALIGNMENT) +{ + resolve_unique_section (decl, reloc, true); + if (IN_NAMED_SECTION (decl)) + return get_named_section (decl, NULL, reloc); +} + if (decl_readonly_section (decl, reloc)) { if (TREE_PUBLIC (decl)) @@ -29206,10 +29219,12 @@ rs6000_xcoff_unique_section (tree decl, { const char *name; - /* Use select_section for private and uninitialized data. */ + /* Use select_section for private data and uninitialized data with + alignment = BIGGEST_ALIGNMENT. 
*/ if (!TREE_PUBLIC (decl) || DECL_COMMON (decl) - || DECL_INITIAL (decl) == NULL_TREE + || (DECL_INITIAL (decl) == NULL_TREE + DECL_ALIGN (decl) = BIGGEST_ALIGNMENT) || DECL_INITIAL (decl) == error_mark_node || (flag_zero_initialized_in_bss initializer_zerop (DECL_INITIAL (decl Index: gcc-4_8-test2/gcc/testsuite/gcc.target/powerpc/sd-pwr6.c === --- gcc-4_8-test2.orig/gcc/testsuite/gcc.target/powerpc/sd-pwr6.c +++ gcc-4_8-test2/gcc/testsuite/gcc.target/powerpc/sd-pwr6.c @@ -1,5 +1,5 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ -/* { dg-skip-if { powerpc*-*-darwin* } { * } { } } */ +/* { dg-skip-if { powerpc*-*-darwin* powerpc-ibm-aix* } { * } { } } */ /* { dg-require-effective-target powerpc_vsx_ok } */ /* { dg-options -O2 -mcpu=power6 -mhard-dfp } */ /* { dg-final { scan-assembler-not lfiwzx } } */ Index: gcc-4_8-test2/gcc/testsuite/gcc.target/powerpc/sd-vsx.c === --- gcc-4_8-test2.orig/gcc/testsuite/gcc.target/powerpc/sd-vsx.c +++ gcc-4_8-test2/gcc/testsuite/gcc.target/powerpc/sd-vsx.c @@ -1,5 +1,5 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ -/* { dg-skip-if { powerpc*-*-darwin* } { * } { } } */ +/* { dg-skip-if { powerpc*-*-darwin* powerpc-ibm-aix* } { * } { } } */ /* { dg-require-effective-target powerpc_vsx_ok } */ /* { dg-options -O2 -mcpu=power7 -mhard-dfp } */ /* { dg-final { scan-assembler-times lfiwzx 2 } } */
[Patch C++] PR57958 RFC
Hi, The following change fixes gimple production for lambda functions. In the patch I assumed that constructing a COMPOUND_EXPR for the return value of an auto-typed function that resolves to CLASS_TYPE_P is wrong. Tested on x86_64-pc-linux-gnu by applying to trunk, with no new regressions. Thanks, Dinar. fix1.patch Description: Binary data
Re: [PATCH] aarch64 support for libitm
On Apr 2, 2014, at 7:37 AM, Richard Henderson r...@redhat.com wrote: On 04/01/2014 03:41 PM, Andrew Pinski wrote: On Tue, Apr 1, 2014 at 3:24 PM, Richard Henderson r...@redhat.com wrote: Comments? If approved, should this go in for 4.9, or wait for stage1? Certainly it's self-contained... On Cavium's Thunder processor the cache line size is going to be bigger than 64 bytes; what is your solution to improve performance on targets like Thunder? We can expand the number reasonably. The only thing it controls is layout of some of the internal data structures to attempt to put different locks on different lines. Is 128 big enough for Thunder? Honestly, I may well not even have it right for the processor we have in house. I didn't bother trying to track down docs to find out. Yes, 128 should be enough. Thanks, Andrew Also I think the default page size for most Linux distros is going to be 64k on aarch64 including Redhat Linux so it makes sense not to define FIXED_PAGE_SIZE. Heh. It turns out these page size defines aren't used any more at all. During one of the rewrites we must have deleted the bits that used them. I'll get rid of all of them so as to be less confusing. I will implement the ILP32 version of this patch once it goes in; it needs a few changes in gtm_jmpbuf because long and pointers are 32-bit while the assembly always stores 64 bits. I can minimize those changes now by using unsigned long long... r~
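The "unsigned long long" idea for gtm_jmpbuf can be sketched as below. The struct is hypothetical and deliberately not the real libitm layout; it only shows why fixed 64-bit slots keep the offsets used by the assembly identical under LP64 and ILP32.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register-save area, NOT the real gtm_jmpbuf.  Each slot
   is unsigned long long, so it stays 64 bits wide even where long and
   pointers are 32 bits.  */
struct example_jmpbuf
{
  unsigned long long cfa;
  unsigned long long gr[12];
  unsigned long long pc;
};

static_assert (sizeof (unsigned long long) == 8,
               "sketch assumes a 64-bit unsigned long long");

/* Byte offset that hand-written assembly would use for slot I of gr.  */
size_t
example_gr_offset (size_t i)
{
  assert (i < 12);
  return offsetof (struct example_jmpbuf, gr) + i * sizeof (unsigned long long);
}
```

Had the fields been declared as unsigned long, the same offsets would shrink by half under ILP32 while the assembly kept storing 64 bits per register.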
Re: RFA: Fix PR rtl-optimization/60651
On 28 March 2014 10:20, Eric Botcazou ebotca...@adacore.com wrote: However, the first call is for blocks with incoming abnormal edges. If these are empty, the change as I wrote it yesterday is fine, but not when they are non-empty; in that case, we should indeed insert before the first instruction in that block. OK, so the issue is specific to empty basic blocks and boils down to inserting instructions in a FIFO manner into them. Actually, the issue also applies to abnormal edges where lcm did leave a set - but these are rare, and my last patch should handle these properly in any event, by no longer using the NOTE_INSN_BASIC_BLOCK itself unless the block is empty. This can be achieved by finding an insert-before position using NEXT_INSN on the basic block head; this amounts to the very same insertion place as inserting after the basic block head. Also, we will continue to set no location, and use the same bb, because both add_insn_before and add_insn_after (in contradiction to its block comment) will infer the basic block from the insn given (in the case of add_insn_before, I assume that the basic block doesn't start with a BARRIER - that would be invalid - and that the insn it starts with has a valid BLOCK_FOR_INSN setting the same way the basic block head has). This looks reasonable, but I think that we need more commentary because it's not straightforward to understand, so I would: 1. explicitly state that we enforce an order on the entities in addition to the order on priority, both in the code (for example create a 4th paragraph in the comment at the top of the file, before More details ...) and in the doc as you already did, but ordering the two orders for the sake of clarity: first the order on priority then, for the same priority, the order on the entities. Actually, all the patch provides is a partial order, just as I stated.
Providing the strict order you describe would require adding another loop nesting to the entity/basic block/seginfo loop, and it wouldn't really be useful for targets. To order by entity first, then by priority, could be useful for some targets, so that they can express a dependency chain of mode switching events to be computed in a single lcm pass without inflating the mode count (which determines how often we have to invoke the lcm machinery). However, that would require having separate buckets for each entity for each insert_insn_on_edge point. For epiphany, EPIPHANY_MSW_ENTITY_FPU_OMNIBUS (for -O0) and EPIPHANY_MSW_ENTITY_ROUND_KNOWN (used when optimizing) depend on EPIPHANY_MSW_ENTITY_AND, EPIPHANY_MSW_ENTITY_OR and EPIPHANY_MSW_ENTITY_CONFIG. The latter three only have two modes, and the former two use the enum attr_fp_mode values, the first of which is FP_MODE_ROUND_UNKNOWN. That value does not actually appear as a needed mode for these entities, hence the partial order is sufficient. EPIPHANY_MSW_ENTITY_FPU_OMNIBUS also depends on EPIPHANY_MSW_ENTITY_OR. 2. add a line in the head comment of new_seginfo saying that INSN may not be a NOTE_BASIC_BLOCK, unless BB is empty. 3. add a comment above the trick in optimize_mode_switching saying that it is both required to implement the FIFO insertion and valid because we know that the basic block was initially empty. Done. It's not clear to me whether this is a regression or not, so you'll also need to run it by the RMs. I don't think it's a regression. 2014-04-02 Joern Rennecke joern.renne...@embecosm.com gcc: PR rtl-optimization/60651 * mode-switching.c (optimize_mode_switching): Make sure to emit sets of a lower numbered entity before sets of a higher numbered entity to a mode of the same or lower priority. (new_seginfo): Document and enforce requirement that NOTE_INSN_BASIC_BLOCK only appears for empty blocks. * doc/tm.texi.in: Document ordering constraint for emitted mode sets. * doc/tm.texi: Regenerate.
gcc/testsuite: PR rtl-optimization/60651 * gcc.target/epiphany/mode-switch.c: New test. diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index f7024a7..b8ca17e 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -9778,6 +9778,8 @@ for @var{entity}. For any fixed @var{entity}, @code{mode_priority_to_mode} Generate one or more insns to set @var{entity} to @var{mode}. @var{hard_reg_live} is the set of hard registers live at the point where the insn(s) are to be inserted. +Sets of a lower numbered entity will be emitted before sets of a higher +numbered entity to a mode of the same or lower priority. @end defmac @node Target Attributes diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 6dcbde4..d793d26 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -7447,6 +7447,8 @@ for @var{entity}. For any fixed @var{entity}, @code{mode_priority_to_mode} Generate one or more insns to set @var{entity} to @var{mode}. @var{hard_reg_live} is the set of hard registers live at
Re: [PATCH] Simple enhancements to dumping in ipa.c and ipa-cp.c
On Wed, 2 Apr 2014, Martin Jambor wrote: Hi, recently I've been looking into a number of bugs involving symtab_remove_unreachable_nodes in one way or another and I have always started by applying the hunk below. I did this because distinguishing different symbol nodes only according to their names is just so inconvenient, especially when compiling C++. The risk is minimal and therefore I'd like to propose it to trunk even at this late stage, although I can of course wait until the next stage1. The other hunk is something that I think is also useful when looking into all failures of ipcp_verify_propagated_values like e.g. PR 60727. I included the patch in a recent bootstrap and testing and it of course passes. OK for trunk now? Or later? I'll leave the actual changes for review by Honza, it's fine at this stage if he thinks the changes make sense and are consistent. It seems fine to me... Thanks, Richard. Thanks, Martin 2014-04-01 Martin Jambor mjam...@suse.cz * ipa-cp.c (ipcp_verify_propagated_values): Also dump symtab and mention gcc_unreachable before failing. * ipa.c (symtab_remove_unreachable_nodes): Also print order of removed symbols. Index: src/gcc/ipa-cp.c === --- src.orig/gcc/ipa-cp.c +++ src/gcc/ipa-cp.c @@ -884,8 +884,9 @@ ipcp_verify_propagated_values (void) { if (dump_file) { + dump_symtab (dump_file); fprintf (dump_file, \nIPA lattices after constant - propagation:\n); + propagation, before gcc_unreachable:\n); This means before symtab_remove_unreachable_nodes? 
Honza print_all_lattices (dump_file, true, false); } Index: src/gcc/ipa.c === --- src.orig/gcc/ipa.c +++ src/gcc/ipa.c @@ -469,7 +469,7 @@ symtab_remove_unreachable_nodes (bool be if (!node-aux) { if (file) - fprintf (file, %s, node-name ()); + fprintf (file, %s/%i, node-name (), node-order); cgraph_remove_node (node); changed = true; } @@ -483,7 +483,7 @@ symtab_remove_unreachable_nodes (bool be if (node-definition) { if (file) - fprintf (file, %s, node-name ()); + fprintf (file, %s/%i, node-name (), node-order); node-body_removed = true; node-analyzed = false; node-definition = false; @@ -531,7 +531,7 @@ symtab_remove_unreachable_nodes (bool be (!flag_ltrans || !DECL_EXTERNAL (vnode-decl))) { if (file) - fprintf (file, %s, vnode-name ()); + fprintf (file, %s/%i, vnode-name (), vnode-order); varpool_remove_node (vnode); changed = true; } -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
Re: [PATCH] Simple enhancements to dumping in ipa.c and ipa-cp.c
Hi, On Wed, Apr 02, 2014 at 06:08:27PM +0200, Jan Hubicka wrote: On Wed, 2 Apr 2014, Martin Jambor wrote: Hi, recently I've been looking into a number of bugs involving symtab_remove_unreachable_nodes in one way or another and I have always started by applying the hunk below. I did this because distinguishing different symbol nodes only according to their names is just so inconvenient, especially when compiling C++. The risk is minimal and therefore I'd like to propose it to trunk even at this late stage, although I can of course wait until the next stage1. The other hunk is something that I think is also useful when looking into all failures of ipcp_verify_propagated_values like e.g. PR 60727. I included the patch in a recent bootstrap and testing and it of course passes. OK for trunk now? Or later? I'll leave the actual changes for review by Honza, it's fine at this stage if he thinks the changes make sense and are consistent. It seems fine to me... Thanks, I will commit it shortly then. Thanks, Richard. Thanks, Martin 2014-04-01 Martin Jambor mjam...@suse.cz * ipa-cp.c (ipcp_verify_propagated_values): Also dump symtab and mention gcc_unreachable before failing. * ipa.c (symtab_remove_unreachable_nodes): Also print order of removed symbols. Index: src/gcc/ipa-cp.c === --- src.orig/gcc/ipa-cp.c +++ src/gcc/ipa-cp.c @@ -884,8 +884,9 @@ ipcp_verify_propagated_values (void) { if (dump_file) { + dump_symtab (dump_file); fprintf (dump_file, \nIPA lattices after constant -propagation:\n); +propagation, before gcc_unreachable:\n); This means before symtab_remove_unreachable_nodes? No, there is literally a call to gcc_unreachable just below this dumping. I added this to grep for it easily when I have a number of dumps lying around because there is the same string in normal dumps too. Thanks, Martin
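For readers who have not seen the dump format, here is a small standalone C sketch of why printing name/order helps. It is not GCC code: the helper names are invented, and the only thing taken from the patch is the "%s/%i" shape of the label.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "name/order" into BUF, modelled on the "%s/%i" dump format in
   the patch.  Returns nonzero if the label fit without truncation.  */
int
example_symbol_label (char *buf, size_t size, const char *name, int order)
{
  assert (buf != NULL && name != NULL && size > 0);
  int n = snprintf (buf, size, "%s/%i", name, order);
  return n >= 0 && (size_t) n < size;
}

/* Two nodes sharing a name still get distinct labels as long as their
   order numbers differ.  */
int
example_labels_differ (const char *name, int order_a, int order_b)
{
  char a[64], b[64];

  if (!example_symbol_label (a, sizeof a, name, order_a)
      || !example_symbol_label (b, sizeof b, name, order_b))
    return 0;
  return strcmp (a, b) != 0;
}
```

This is the situation that comes up with C++, where several symbol nodes can print with the same name and only the order number tells them apart in a dump.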
Re: RFA: Fix PR rtl-optimization/60651
Hmm, the sanity check in new_seginfo caused a bootstrap failure building libjava on x86. There was a block with CODE_LABEL as basic block head, otherwise empty.
Skip some gcc.target/i386 tests for conflicting -march= options
If you test an x86_64 toolchain with -march=bdver3 in the multilib options, as noted in http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01662.html various test failures arise from tests whose own -march= in dg-options is overridden. This patch adds dg-skip-if to those tests to skip them for conflicting -march= options, as has been done before for other tests (obviously, if the option ordering is changed in future in DejaGnu, such skips may become obsolete or could be conditioned on DejaGnu version). (No doubt other -march= options would show up further tests needing such changes.) Tested x86_64-linux-gnu. OK to commit? 2014-04-02 Joseph Myers jos...@codesourcery.com * gcc.target/i386/funcspec-2.c, gcc.target/i386/funcspec-3.c, gcc.target/i386/funcspec-9.c, gcc.target/i386/isa-1.c, gcc.target/i386/memcpy-strategy-1.c, gcc.target/i386/memcpy-strategy-2.c, gcc.target/i386/memcpy-vector_loop-1.c, gcc.target/i386/memcpy-vector_loop-2.c, gcc.target/i386/memset-vector_loop-1.c, gcc.target/i386/memset-vector_loop-2.c, gcc.target/i386/sse2-init-v2di-2.c, gcc.target/i386/ssetype-1.c, gcc.target/i386/ssetype-2.c, gcc.target/i386/ssetype-5.c: Skip for -march= options different from those in dg-options. Index: gcc/testsuite/gcc.target/i386/memcpy-vector_loop-2.c === --- gcc/testsuite/gcc.target/i386/memcpy-vector_loop-2.c(revision 209023) +++ gcc/testsuite/gcc.target/i386/memcpy-vector_loop-2.c(working copy) @@ -1,4 +1,5 @@ /* { dg-do compile } */ +/* { dg-skip-if { i?86-*-* x86_64-*-* } { -march=* } { -march=atom } } */ /* { dg-options -O2 -march=atom -minline-all-stringops -mstringop-strategy=vector_loop } */ /* { dg-final { scan-assembler-times movdqa 4} } */ Index: gcc/testsuite/gcc.target/i386/ssetype-1.c === --- gcc/testsuite/gcc.target/i386/ssetype-1.c (revision 209023) +++ gcc/testsuite/gcc.target/i386/ssetype-1.c (working copy) @@ -1,6 +1,7 @@ /* { dg-do compile } */ /* This test checks for absolute memory operands. 
*/ /* { dg-require-effective-target nonpic } */ +/* { dg-skip-if { i?86-*-* x86_64-*-* } { -march=* } { -march=k8 } } */ /* { dg-options -O2 -msse2 -march=k8 } */ /* { dg-final { scan-assembler andpd\[^\\n\]*magic } } */ /* { dg-final { scan-assembler andnpd\[^\\n\]*magic } } */ Index: gcc/testsuite/gcc.target/i386/ssetype-5.c === --- gcc/testsuite/gcc.target/i386/ssetype-5.c (revision 209023) +++ gcc/testsuite/gcc.target/i386/ssetype-5.c (working copy) @@ -1,6 +1,7 @@ /* { dg-do compile } */ /* This test checks for absolute memory operands. */ /* { dg-require-effective-target nonpic } */ +/* { dg-skip-if { i?86-*-* x86_64-*-* } { -march=* } { -march=k8 } } */ /* { dg-options -O2 -msse2 -march=k8 } */ /* { dg-final { scan-assembler pand\[^\\n\]*magic } } */ /* { dg-final { scan-assembler pandn\[^\\n\]*magic } } */ Index: gcc/testsuite/gcc.target/i386/memset-vector_loop-2.c === --- gcc/testsuite/gcc.target/i386/memset-vector_loop-2.c(revision 209023) +++ gcc/testsuite/gcc.target/i386/memset-vector_loop-2.c(working copy) @@ -1,4 +1,5 @@ /* { dg-do compile } */ +/* { dg-skip-if { i?86-*-* x86_64-*-* } { -march=* } { -march=atom } } */ /* { dg-options -O2 -march=atom -minline-all-stringops -mstringop-strategy=vector_loop } */ /* { dg-final { scan-assembler-times movdqa 4} } */ Index: gcc/testsuite/gcc.target/i386/ssetype-2.c === --- gcc/testsuite/gcc.target/i386/ssetype-2.c (revision 209023) +++ gcc/testsuite/gcc.target/i386/ssetype-2.c (working copy) @@ -1,4 +1,5 @@ /* { dg-do compile } */ +/* { dg-skip-if { i?86-*-* x86_64-*-* } { -march=* } { -march=k8 } } */ /* { dg-options -O2 -msse2 -march=k8 } */ /* { dg-final { scan-assembler andpd } } */ /* { dg-final { scan-assembler andnpd } } */ Index: gcc/testsuite/gcc.target/i386/funcspec-9.c === --- gcc/testsuite/gcc.target/i386/funcspec-9.c (revision 209023) +++ gcc/testsuite/gcc.target/i386/funcspec-9.c (working copy) @@ -1,5 +1,6 @@ /* Test whether using target specific options, we can generate FMA4 code. 
*/ /* { dg-do compile } */ +/* { dg-skip-if { i?86-*-* x86_64-*-* } { -march=* } { -march=k8 } } */ /* { dg-options -O2 -march=k8 -mfpmath=sse -msse2 } */ extern void exit (int); Index: gcc/testsuite/gcc.target/i386/funcspec-2.c === --- gcc/testsuite/gcc.target/i386/funcspec-2.c (revision 209023) +++ gcc/testsuite/gcc.target/i386/funcspec-2.c (working copy) @@ -1,5 +1,6 @@ /* Test whether using target specific options, we
Re: [PATCH] [ARM] [RFC] Fix longstanding push_minipool_fix ICE (PR49423, lp1296601)
On 2 April 2014 14:29, Charles Baylis charles.bay...@linaro.org wrote: Tested on arm-unknown-linux-gnueabihf (qemu), bootstrap in progress. Bootstrapped successfully on a Chromebook (arm-unknown-linux-gnueabihf).
Re: RFA: Fix PR rtl-optimization/60651
On 2 April 2014 17:34, Joern Rennecke joern.renne...@embecosm.com wrote: Hmm, the sanity check in new_seginfo caused a bootstrap failure building libjava on x86. There was a block with CODE_LABEL as basic block head, otherwise empty. I've added the testcase - and a bit more detail on this issue - in the PR. I've attached an updated patch, which skips past the CODE_LABEL. And this one bootstraps on i686-pc-linux-gnu. 2014-04-02 Joern Rennecke joern.renne...@embecosm.com gcc: PR rtl-optimization/60651 * mode-switching.c (optimize_mode_switching): Make sure to emit sets of a lower numbered entity before sets of a higher numbered entity to a mode of the same or lower priority. When creating a seginfo for a basic block that starts with a code label, move the insertion point past the code label. (new_seginfo): Document and enforce requirement that NOTE_INSN_BASIC_BLOCK only appears for empty blocks. * doc/tm.texi.in: Document ordering constraint for emitted mode sets. * doc/tm.texi: Regenerate. gcc/testsuite: PR rtl-optimization/60651 * gcc.target/epiphany/mode-switch.c: New test. diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index f7024a7..b8ca17e 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -9778,6 +9778,8 @@ for @var{entity}. For any fixed @var{entity}, @code{mode_priority_to_mode} Generate one or more insns to set @var{entity} to @var{mode}. @var{hard_reg_live} is the set of hard registers live at the point where the insn(s) are to be inserted. +Sets of a lower numbered entity will be emitted before sets of a higher +numbered entity to a mode of the same or lower priority. @end defmac @node Target Attributes diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 6dcbde4..d793d26 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -7447,6 +7447,8 @@ for @var{entity}. For any fixed @var{entity}, @code{mode_priority_to_mode} Generate one or more insns to set @var{entity} to @var{mode}. 
@var{hard_reg_live} is the set of hard registers live at the point where the insn(s) are to be inserted. +Sets of a lower numbered entity will be emitted before sets of a higher +numbered entity to a mode of the same or lower priority. @end defmac @node Target Attributes diff --git a/gcc/mode-switching.c b/gcc/mode-switching.c index 88543b2..088156c 100644 --- a/gcc/mode-switching.c +++ b/gcc/mode-switching.c @@ -96,12 +96,18 @@ static void make_preds_opaque (basic_block, int); /* This function will allocate a new BBINFO structure, initialized - with the MODE, INSN, and basic block BB parameters. */ + with the MODE, INSN, and basic block BB parameters. + INSN may not be a NOTE_INSN_BASIC_BLOCK, unless it is en empty + basic block; that allows us later to insert instructions in a FIFO-like + manner. */ static struct seginfo * new_seginfo (int mode, rtx insn, int bb, HARD_REG_SET regs_live) { struct seginfo *ptr; + + gcc_assert (!NOTE_INSN_BASIC_BLOCK_P (insn) + || insn == BB_END (NOTE_BASIC_BLOCK (insn))); ptr = XNEW (struct seginfo); ptr-mode = mode; ptr-insn_ptr = insn; @@ -534,7 +540,13 @@ optimize_mode_switching (void) break; if (e) { - ptr = new_seginfo (no_mode, BB_HEAD (bb), bb-index, live_now); + rtx ins_pos = BB_HEAD (bb); + if (LABEL_P (ins_pos)) + ins_pos = NEXT_INSN (ins_pos); + gcc_assert (NOTE_INSN_BASIC_BLOCK_P (ins_pos)); + if (ins_pos != BB_END (bb)) + ins_pos = NEXT_INSN (ins_pos); + ptr = new_seginfo (no_mode, ins_pos, bb-index, live_now); add_seginfo (info + bb-index, ptr); bitmap_clear_bit (transp[bb-index], j); } @@ -733,7 +745,15 @@ optimize_mode_switching (void) { emitted = true; if (NOTE_INSN_BASIC_BLOCK_P (ptr-insn_ptr)) - emit_insn_after (mode_set, ptr-insn_ptr); + /* We need to emit the insns in a FIFO-like manner, + i.e. the first to be emitted at our insertion + point ends up first in the instruction steam. 
+ Because we made sure that NOTE_INSN_BASIC_BLOCK is + only used for initially empty basic blocks, we + can archive this by appending at the end of + the block. */ + emit_insn_after + (mode_set, BB_END (NOTE_BASIC_BLOCK (ptr-insn_ptr))); else emit_insn_before (mode_set, ptr-insn_ptr); } --- /dev/null 2014-03-19 18:18:19.244212660 + +++ b/gcc/testsuite/gcc.target/epiphany/mode-switch.c 2014-03-25 13:31:41.186140611 + @@ -0,0 +1,12 @@
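The FIFO point in this thread can be seen with a toy linked list. The sketch below is not GCC's insn chain: the types and function names are invented for illustration. It only shows that repeatedly inserting after a fixed head reverses the emission order, whereas inserting after the current end of an initially empty block preserves it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an insn chain; not GCC's rtx representation.  */
struct example_insn
{
  int id;
  struct example_insn *next;
};

/* Link NEW_INSN directly after POS.  */
void
example_insert_after (struct example_insn *pos, struct example_insn *new_insn)
{
  assert (pos != NULL && new_insn != NULL);
  new_insn->next = pos->next;
  pos->next = new_insn;
}

/* Walk to the last element of the chain starting at HEAD.  */
struct example_insn *
example_chain_end (struct example_insn *head)
{
  assert (head != NULL);
  while (head->next != NULL)
    head = head->next;
  return head;
}

/* Always insert right after HEAD: later insertions end up first.  */
void
example_emit_after_head (struct example_insn *head,
                         struct example_insn *pool, int n)
{
  for (int i = 0; i < n; i++)
    example_insert_after (head, &pool[i]);
}

/* Insert after the current end: the first insn emitted stays first.  */
void
example_emit_at_end (struct example_insn *head,
                     struct example_insn *pool, int n)
{
  for (int i = 0; i < n; i++)
    example_insert_after (example_chain_end (head), &pool[i]);
}
```

Emitting ids 1, 2, 3 with example_emit_after_head leaves the chain as 3, 2, 1; example_emit_at_end leaves it as 1, 2, 3, which is the order the mode sets need when a lower numbered entity must come first.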
[PATCH, libitm] Remove unused PAGE_SIZE macros
As recently pointed out in a thread porting libitm to aarch64, the PAGE_SIZE and FIXED_PAGE_SIZE macros are unused. Indeed, not all of the ports actually defined them at all. Removed, lest they cause further confusion. r~ * config/alpha/target.h (PAGE_SIZE, FIXED_PAGE_SIZE): Remove. * config/arm/target.h, config/sh/target.h: Likewise. * config/sparc/target.h, config/x86/target.h: Likewise. diff --git a/libitm/config/alpha/target.h b/libitm/config/alpha/target.h index 5e23c53..e33f1e1 100644 --- a/libitm/config/alpha/target.h +++ b/libitm/config/alpha/target.h @@ -32,10 +32,6 @@ typedef struct gtm_jmpbuf unsigned long f[8]; } gtm_jmpbuf; -/* Alpha generally uses a fixed page size of 8K. */ -#define PAGE_SIZE 8192 -#define FIXED_PAGE_SIZE1 - /* The size of one line in hardware caches (in bytes). */ #define HW_CACHELINE_SIZE 64 diff --git a/libitm/config/arm/target.h b/libitm/config/arm/target.h index 6a1458e..a909e14 100644 --- a/libitm/config/arm/target.h +++ b/libitm/config/arm/target.h @@ -33,10 +33,6 @@ typedef struct gtm_jmpbuf unsigned long pc; } gtm_jmpbuf; -/* ARM generally uses a fixed page size of 4K. */ -#define PAGE_SIZE 4096 -#define FIXED_PAGE_SIZE1 - /* ??? The size of one line in hardware caches (in bytes). */ #define HW_CACHELINE_SIZE 64 diff --git a/libitm/config/sh/target.h b/libitm/config/sh/target.h index 6f6ae5f..fbc804c 100644 --- a/libitm/config/sh/target.h +++ b/libitm/config/sh/target.h @@ -35,10 +35,6 @@ typedef struct gtm_jmpbuf #endif } gtm_jmpbuf; -/* SH generally uses a fixed page size of 4K. */ -#define PAGE_SIZE 4096 -#define FIXED_PAGE_SIZE1 - /* ??? The size of one line in hardware caches (in bytes). 
*/ #define HW_CACHELINE_SIZE 32 diff --git a/libitm/config/sparc/target.h b/libitm/config/sparc/target.h index b127fa4..309dac1 100644 --- a/libitm/config/sparc/target.h +++ b/libitm/config/sparc/target.h @@ -29,10 +29,6 @@ typedef struct gtm_jmpbuf unsigned long pc; } gtm_jmpbuf; -/* UltraSPARC processors generally use a fixed page size of 8K. */ -#define PAGE_SIZE 8192 -#define FIXED_PAGE_SIZE1 - /* The size of one line in hardware caches (in bytes). We use the primary cache line size documented for the UltraSPARC T1/T2. */ #define HW_CACHELINE_SIZE 16 diff --git a/libitm/config/x86/target.h b/libitm/config/x86/target.h index 392db48..78a58e7 100644 --- a/libitm/config/x86/target.h +++ b/libitm/config/x86/target.h @@ -52,10 +52,6 @@ typedef struct gtm_jmpbuf /* x86 doesn't require strict alignment for the basic types. */ #define STRICT_ALIGNMENT 0 -/* x86 uses a fixed page size of 4K. */ -#define PAGE_SIZE 4096 -#define FIXED_PAGE_SIZE 1 - /* The size of one line in hardware caches (in bytes). */ #define HW_CACHELINE_SIZE 64
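Should any code need the page size again, one option that avoids per-target constants is to ask the system at run time. The sketch below assumes a POSIX host; the function name is invented here, and nothing in libitm is claimed to do this.

```c
#include <assert.h>
#include <unistd.h>

/* Query the page size from the running system instead of hard-coding
   4K, 8K or 64K per target.  ASSUMPTION: a POSIX host where
   sysconf (_SC_PAGESIZE) is available.  */
long
example_runtime_page_size (void)
{
  long size = sysconf (_SC_PAGESIZE);
  assert (size > 0);
  return size;
}
```

That would sidestep the question of whether a given aarch64 distribution uses 4K or 64K pages.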
[commit, spu] Fix regression (ICE) in g++.dg/torture/pr57499.C
Hello, this fixes the following testsuite regression on spu-elf: FAIL: g++.dg/torture/pr57499.C -O1 (internal compiler error) which was caused by a code path in pad_bb that would simply crash if the very last active insn in a function happened to be a blockage. Tested on spu-elf, committed to mainline. Bye, Ulrich ChangeLog: * config/spu/spu.c (pad_bb): Do not crash when the last insn is CODE_FOR_blockage. Index: gcc/config/spu/spu.c === *** gcc/config/spu/spu.c(revision 208964) --- gcc/config/spu/spu.c(working copy) *** pad_bb(void) *** 2064,2070 } hbr_insn = insn; } ! if (INSN_CODE (insn) == CODE_FOR_blockage) { if (GET_MODE (insn) == TImode) PUT_MODE (next_insn, TImode); --- 2064,2070 } hbr_insn = insn; } ! if (INSN_CODE (insn) == CODE_FOR_blockage next_insn) { if (GET_MODE (insn) == TImode) PUT_MODE (next_insn, TImode); -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
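The shape of the fix, reduced to a toy example: when the blockage is the very last element there is no successor to update, so the successor has to be checked before it is touched. The types and names below are hypothetical and much simpler than pad_bb itself.

```c
#include <assert.h>
#include <stddef.h>

/* Toy insn with a mode marker; not the SPU backend's data structures.  */
struct example_insn
{
  int is_blockage;
  int mode;
  struct example_insn *next;
};

/* Hand the mode marker of each blockage on to its successor, if there
   is one.  Returns how many markers were moved.  */
int
example_propagate_mode (struct example_insn *first)
{
  int moved = 0;

  assert (first != NULL);
  for (struct example_insn *insn = first; insn != NULL; insn = insn->next)
    {
      struct example_insn *next_insn = insn->next;

      /* The "&& next_insn" guard is the point of the example: the last
         insn in the chain has no successor to write to.  */
      if (insn->is_blockage && next_insn != NULL)
        {
          next_insn->mode = insn->mode;
          moved++;
        }
    }
  return moved;
}
```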
[commit, spu] Fix regression (ICE) in gcc.dg/pr48335-2.c
Hello, this fixes the following regressions on spu-elf: FAIL: gcc.dg/pr48335-2.c (internal compiler error) FAIL: gcc.dg/pr48335-3.c (internal compiler error) which are caused by common code calling the insv pattern with a combination of bitoffset/bitsize that lies partially outside the underlying target mode, causing an assertion failure in spu_expand_insv. The original reason for the bad offset is that the test case actually has undefined behavior due to storing partially outside a struct via a misaligned pointer. Still, the compiler should not ICE, so I've fixed this similarly to what was done on s390, by just rejecting this in the insv expander and falling back to common code. Tested on spu-elf, committed to mainline. Bye, Ulrich ChangeLog: * config/spu/spu.md (insv): Fail if bitoffset+bitsize lies outside the target mode. Index: gcc/config/spu/spu.md === *** gcc/config/spu/spu.md (revision 208964) --- gcc/config/spu/spu.md (working copy) *** *** 2851,2857 (match_operand:SI 2 const_int_operand )) (match_operand 3 nonmemory_operand ))] ! { spu_expand_insv(operands); DONE; }) ;; Simplify a number of patterns that get generated by extv, extzv, ;; insv, and loads. --- 2851,2863 (match_operand:SI 2 const_int_operand )) (match_operand 3 nonmemory_operand ))] ! { ! if (INTVAL (operands[1]) + INTVAL (operands[2]) ! GET_MODE_BITSIZE (GET_MODE (operands[0]))) ! FAIL; ! spu_expand_insv(operands); ! DONE; ! }) ;; Simplify a number of patterns that get generated by extv, extzv, ;; insv, and loads. -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
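The check added to the expander amounts to a simple range test, sketched here in plain C. The helper name is made up; in the real pattern a failing check makes the expander FAIL so that common code handles the insertion.

```c
#include <assert.h>

/* Return nonzero if a bit-field of WIDTH bits starting at bit position
   POS lies entirely inside a container of MODE_BITS bits.  */
int
example_field_fits (int width, int pos, int mode_bits)
{
  assert (width >= 0 && pos >= 0 && mode_bits > 0);
  return width + pos <= mode_bits;
}
```

For a 32-bit container, an 8-bit field at position 24 fits, while a 16-bit field at position 24 would run past the end and is the kind of case the expander now rejects.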
Re: [patch] Fix texinfo warnings for doc/gcc.texi [was: Re: doc bugs]
*PING* Tobias Burnus wrote: H.J. Lu wrote: On Fri, Mar 28, 2014 at 12:41 PM, Mike Stump mikest...@comcast.net wrote: Since we are nearing release, I thought I'd mention I see: ../../gcc/gcc/doc/invoke.texi:1114: warning: node next `Overall Options' in menu `C Dialect Options' and in sectioning `Invoking G++' differ http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59055 I think one reason that there are (and were) that many warnings is that only recently texinfo gained support for diagnosing these issues. (Or maybe not that recent but distributions were slow in adapting newer texinfo versions.) Attached is a warning-removal patch. OK for the trunk? Regarding invoke.texi: It had (nearly) the same @menu twice, once under @chapter where it belongs to and once under a @section where it doesn't. Tobias
RFA: PATCH to add -fno-gnu-unique for c++/60731
Use of STB_GNU_UNIQUE to avoid problems with variable symbols shared between two RTLD_LOCAL plugins and a common library dependency causes problems with libraries that depend on dlclose/dlopen to reinitialize state. This patch adds a -fno-gnu-unique flag that such libraries can use.

Tested x86_64-pc-linux-gnu. OK for trunk?

commit e9f123743831274cff1c135cf65bb222507bab32
Author: Jason Merrill ja...@redhat.com
Date: Wed Apr 2 15:10:32 2014 -0400

    PR c++/60731
    * common.opt (-fno-gnu-unique): Add.
    * config/elfos.h (USE_GNU_UNIQUE_OBJECT): Check it.

diff --git a/gcc/common.opt b/gcc/common.opt
index 62c72f0..2259f29 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1267,6 +1267,10 @@ fgnu-tm
 Common Report Var(flag_tm)
 Enable support for GNU transactional memory
 
+fgnu-unique
+Common Report Var(flag_gnu_unique) Init(1)
+Use STB_GNU_UNIQUE if supported by the assembler
+
 floop-flatten
 Common Ignore
 Does nothing. Preserved for backward compatibility.
diff --git a/gcc/config/elfos.h b/gcc/config/elfos.h
index 1fce701..c1d5553 100644
--- a/gcc/config/elfos.h
+++ b/gcc/config/elfos.h
@@ -287,7 +287,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
 /* Write the extra assembler code needed to declare an object properly. */
 
 #ifdef HAVE_GAS_GNU_UNIQUE_OBJECT
-#define USE_GNU_UNIQUE_OBJECT 1
+#define USE_GNU_UNIQUE_OBJECT flag_gnu_unique
 #else
 #define USE_GNU_UNIQUE_OBJECT 0
 #endif
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index eca4e8f..2e78b8b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1070,6 +1070,7 @@ See S/390 and zSeries Options.
 -ffixed-@var{reg} -fexceptions @gol
 -fnon-call-exceptions -fdelete-dead-exceptions -funwind-tables @gol
 -fasynchronous-unwind-tables @gol
+-fno-gnu-unique @gol
 -finhibit-size-directive -finstrument-functions @gol
 -finstrument-functions-exclude-function-list=@var{sym},@var{sym},@dots{} @gol
 -finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{} @gol
@@ -22015,6 +22016,20 @@ Generate unwind table in DWARF 2 format, if supported by target machine. The
 table is exact at each instruction boundary, so it can be used for stack
 unwinding from asynchronous events (such as debugger or garbage collector).
 
+@item -fno-gnu-unique
+@opindex fno-gnu-unique
+On systems with recent GNU assembler and C library, the C++ compiler
+uses the @code{STB_GNU_UNIQUE} binding to make sure that definitions
+of template static data members and static local variables in inline
+functions are unique even in the presence of @code{RTLD_LOCAL}; this
+is necessary to avoid problems with a library used by two different
+@code{RTLD_LOCAL} plugins depending on a definition in one of them and
+therefore disagreeing with the other one about the binding of the
+symbol. But this causes @code{dlclose} to be ignored for affected
+DSOs; if your program relies on reinitialization of a DSO via
+@code{dlclose} and @code{dlopen}, you can use
+@option{-fno-gnu-unique}.
+
 @item -fpcc-struct-return
 @opindex fpcc-struct-return
 Return ``short'' @code{struct} and @code{union} values in memory like
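The elfos.h hunk turns USE_GNU_UNIQUE_OBJECT from a build-time constant into a value read from the option variable at run time. A minimal sketch of that pattern follows; the enum, the helper symbol_type_for and the plain int standing in for the generated option variable are assumptions made for illustration and are not GCC code.

```c
/* Stand-in for the variable generated from common.opt.  Init(1) in the
   patch means the feature is on unless -fno-gnu-unique is given.  */
int flag_gnu_unique = 1;

/* Assumption for this sketch: the assembler supports unique objects.  */
#define HAVE_GAS_GNU_UNIQUE_OBJECT 1

#ifdef HAVE_GAS_GNU_UNIQUE_OBJECT
#define USE_GNU_UNIQUE_OBJECT flag_gnu_unique
#else
#define USE_GNU_UNIQUE_OBJECT 0
#endif

/* Hypothetical result type: which kind of symbol would be emitted.  */
enum sym_type { SYM_OBJECT, SYM_GNU_UNIQUE_OBJECT };

/* Hypothetical helper: pick the symbol kind for an object that may or
   may not be a candidate for unique binding.  */
enum sym_type
symbol_type_for (int unique_candidate)
{
  if (USE_GNU_UNIQUE_OBJECT && unique_candidate)
    return SYM_GNU_UNIQUE_OBJECT;
  return SYM_OBJECT;
}
```

With the flag at its default the candidate gets the unique kind; clearing the flag, as -fno-gnu-unique would, makes the same call fall back to an ordinary object.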
Re: Skip some gcc.target/i386 tests for conflicting -march= options
On Wed, Apr 2, 2014 at 6:36 PM, Joseph S. Myers jos...@codesourcery.com wrote: If you test an x86_64 toolchain with -march=bdver3 in the multilib options, as noted in http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01662.html various test failures arise from tests whose own -march= in dg-options is overridden. This patch adds dg-skip-if to those tests to skip them for conflicting -march= options, as has been done before for other tests (obviously, if the option ordering is changed in future in DejaGnu, such skips may become obsolete or could be conditioned on DejaGnu version). (No doubt other -march= options would show up further tests needing such changes.) Tested x86_64-linux-gnu. OK to commit? 2014-04-02 Joseph Myers jos...@codesourcery.com * gcc.target/i386/funcspec-2.c, gcc.target/i386/funcspec-3.c, gcc.target/i386/funcspec-9.c, gcc.target/i386/isa-1.c, gcc.target/i386/memcpy-strategy-1.c, gcc.target/i386/memcpy-strategy-2.c, gcc.target/i386/memcpy-vector_loop-1.c, gcc.target/i386/memcpy-vector_loop-2.c, gcc.target/i386/memset-vector_loop-1.c, gcc.target/i386/memset-vector_loop-2.c, gcc.target/i386/sse2-init-v2di-2.c, gcc.target/i386/ssetype-1.c, gcc.target/i386/ssetype-2.c, gcc.target/i386/ssetype-5.c: Skip for -march= options different from those in dg-options. OK. Thanks, Uros.
Use -mno-prefer-avx128 in two more tests
Two of the tests that, as I noted in http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00036.html, did not get fixed for --with-arch=bdver3 --with-cpu=bdver3 by adding -mno-prefer-avx128 in fact also show failures for --with-arch=btver2 --with-tune=btver2, and in that case *are* fixed by adding -mno-prefer-avx128. Thus, while in those cases there may still be other tuning issues as noted in http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00052.html (btver2 doesn't enable the flag in question) I think it *is* correct to use -mno-prefer-avx128 for these two tests, and this patch adds it.

Tested x86_64-linux-gnu. OK to commit?

2014-04-02 Joseph Myers jos...@codesourcery.com

	* gcc.target/i386/avx2-vpand-3.c, gcc.target/i386/avx256-unaligned-load-2.c: Use -mno-prefer-avx128.

Index: gcc/testsuite/gcc.target/i386/avx2-vpand-3.c
===================================================================
--- gcc/testsuite/gcc.target/i386/avx2-vpand-3.c	(revision 209023)
+++ gcc/testsuite/gcc.target/i386/avx2-vpand-3.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-mavx2 -O2 -ftree-vectorize -save-temps" } */
+/* { dg-options "-mavx2 -mno-prefer-avx128 -O2 -ftree-vectorize -save-temps" } */
 /* { dg-require-effective-target avx2 } */
 
Index: gcc/testsuite/gcc.target/i386/avx256-unaligned-load-2.c
===================================================================
--- gcc/testsuite/gcc.target/i386/avx256-unaligned-load-2.c	(revision 209023)
+++ gcc/testsuite/gcc.target/i386/avx256-unaligned-load-2.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-load" } */
+/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-load -mno-prefer-avx128" } */
 
 void
 avx_test (char **cp, char **ep)

--
Joseph S. Myers jos...@codesourcery.com
Re: Use -mno-prefer-avx128 in two more tests
On Wed, Apr 2, 2014 at 10:09 PM, Joseph S. Myers jos...@codesourcery.com wrote: Two of the tests that, as I noted in http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00036.html, did not get fixed for --with-arch=bdver3 --with-cpu=bdver3 by adding -mno-prefer-avx128 in fact also show failures for --with-arch=btver2 --with-tune=btver2, and in that case *are* fixed by adding -mno-prefer-avx128. Thus, while in those cases there may still be other tuning issues as noted in http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00052.html (btver2 doesn't enable the flag in question) I think it *is* correct to use -mno-prefer-avx128 for these two tests, and this patch adds it. Tested x86_64-linux-gnu. OK to commit? 2014-04-02 Joseph Myers jos...@codesourcery.com * gcc.target/i386/avx2-vpand-3.c, gcc.target/i386/avx256-unaligned-load-2.c: Use -mno-prefer-avx128. OK. Thanks, Uros.
Re: RFA: RL78: Fix handling of (SUBREG (SYMBOL_REF))
This is OK. Thanks!
Re: [C++ patch] for C++/52369
2014-03-31 23:48 GMT+02:00 Jason Merrill ja...@redhat.com:
[...]
if (permerror (input_location,
               "default argument given for parameter %d "
               "of %q#D", i, newdecl))
  permerror (DECL_SOURCE_LOCATION (olddecl),
             "previous specification in %q#D here",
             olddecl);
should the second permerror be a note instead?

Yes.

OK to commit the attached patch? Tested on x86_64 linux, though this piece of code does not seem to be covered by the testsuite.

2014-04-02 Fabien Chêne fab...@gcc.gnu.org

	* cp/decl.c (duplicate_decls): Check for the return of permerror before emitting a note.

--
Fabien

Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 208997)
+++ gcc/cp/decl.c	(working copy)
@@ -1737,9 +1737,9 @@ duplicate_decls (tree newdecl, tree oldd
               if (permerror (input_location,
                              "default argument given for parameter %d "
                              "of %q#D", i, newdecl))
-                permerror (DECL_SOURCE_LOCATION (olddecl),
-                           "previous specification in %q#D here",
-                           olddecl);
+                inform (DECL_SOURCE_LOCATION (olddecl),
+                        "previous specification in %q#D here",
+                        olddecl);
             }
           else
             {
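The pattern in this patch is that the follow-up note is emitted only when the primary diagnostic was actually issued. A minimal sketch, using hypothetical stand-ins (report_permerror, report_note, permissive_suppress) rather than GCC's real diagnostic functions:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for a diagnostic machinery; these are not the
   GCC functions named in the patch.  */
bool permissive_suppress = false;  /* when true, the error is dropped */
int notes_emitted = 0;

/* Returns true only if the diagnostic was really printed.  */
bool
report_permerror (const char *msg)
{
  if (permissive_suppress)
    return false;
  fprintf (stderr, "error: %s\n", msg);
  return true;
}

/* A note never changes the error count; it only adds context.  */
void
report_note (const char *msg)
{
  fprintf (stderr, "note: %s\n", msg);
  notes_emitted++;
}

/* The note is tied to the return value of the primary diagnostic, so a
   suppressed error does not leave an orphaned note behind.  */
void
diagnose_duplicate_default_arg (void)
{
  if (report_permerror ("default argument given for parameter"))
    report_note ("previous specification here");
}
```

Using a note for the second message also avoids counting one user mistake as two errors.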
[BUILD] Ping for Jakub's --with-build-config=bootstrap-asan / bootstrap-ubsan patches
I would like to ping the following two patches from Jakub. As he wrote in PR60667: The http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01370.html fix is still waiting for review, you need that for both --with-build-config=bootstrap-ubsan and --with-build-config=bootstrap-asan. For --with-build-config=bootstrap-asan also the http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01433.html patch is needed, plus --with-build-config=bootstrap-asan will only work with --disable-werror for now (fix for that expected only in stage1). Tobias
[Patch, fortran] PR60717 - Wrong code with recursive procedure with unlimited polymorphic dummy argument
Dear All, This fix, of itself, is quite obvious. The offset was being set to zero for array segments, rather than that required for unity valued lvalues. I think that the fix could be used to clean up: trans-expr.c(gfc_trans_alloc_subarray_assign) trans-expr.c(gfc_trans_pointer_assign) trans-expr.c(fncall_realloc_result) trans-array.c(trans_associate_var) each of which contains calculation of the offset. However, I do not think that this is the stage to fix things that are not broken! I propose to keep the PR open as a reminder to look into this. Bootstrapped and regtested on X86_64/FC17 - OK for trunk and backporting to 4.8? Paul 2014-04-12 Paul Thomas pa...@gcc.gnu.org PR fortran/58771 * trans.h : Add 'use_offset' bitfield to gfc_se. * trans-array.c (gfc_conv_expr_descriptor) : Use 'use_offset' as a trigger to unconditionally recalculate the offset. trans-expr.c (gfc_conv_intrinsic_to_class) : Use it. (gfc_conv_procedure_call) : Ditto. 2014-04-02 Paul Thomas pa...@gcc.gnu.org PR fortran/58771 * gfortran.dg/unlimited_polymorphic_17.f90 : New test Index: gcc/fortran/trans-array.c === *** gcc/fortran/trans-array.c (revision 208997) --- gcc/fortran/trans-array.c (working copy) *** gfc_conv_expr_descriptor (gfc_se *se, gf *** 6807,6813 /* Set offset for assignments to pointer only to zero if it is not the full array. */ ! if (se-direct_byref info-ref info-ref-u.ar.type != AR_FULL) base = gfc_index_zero_node; else if (GFC_ARRAY_TYPE_P (TREE_TYPE (desc))) --- 6807,6813 /* Set offset for assignments to pointer only to zero if it is not the full array. */ ! if ((se-direct_byref || se-use_offset) info-ref info-ref-u.ar.type != AR_FULL) base = gfc_index_zero_node; else if (GFC_ARRAY_TYPE_P (TREE_TYPE (desc))) *** gfc_conv_expr_descriptor (gfc_se *se, gf *** 6899,6905 base = fold_build2_loc (input_location, MINUS_EXPR, TREE_TYPE (base), base, stride); } ! 
else if (GFC_ARRAY_TYPE_P (TREE_TYPE (desc))) { tmp = gfc_conv_array_lbound (desc, n); tmp = fold_build2_loc (input_location, MINUS_EXPR, --- 6899,6905 base = fold_build2_loc (input_location, MINUS_EXPR, TREE_TYPE (base), base, stride); } ! else if (GFC_ARRAY_TYPE_P (TREE_TYPE (desc)) || se-use_offset) { tmp = gfc_conv_array_lbound (desc, n); tmp = fold_build2_loc (input_location, MINUS_EXPR, *** gfc_conv_expr_descriptor (gfc_se *se, gf *** 6935,6942 gfc_get_dataptr_offset (loop.pre, parm, desc, offset, subref_array_target, expr); ! if ((se-direct_byref || GFC_ARRAY_TYPE_P (TREE_TYPE (desc))) !se-data_not_needed) { /* Set the offset. */ gfc_conv_descriptor_offset_set (loop.pre, parm, base); --- 6935,6943 gfc_get_dataptr_offset (loop.pre, parm, desc, offset, subref_array_target, expr); ! if (((se-direct_byref || GFC_ARRAY_TYPE_P (TREE_TYPE (desc))) !se-data_not_needed) + || (se-use_offset base != NULL_TREE)) { /* Set the offset. */ gfc_conv_descriptor_offset_set (loop.pre, parm, base); Index: gcc/fortran/trans-expr.c === *** gcc/fortran/trans-expr.c(revision 208997) --- gcc/fortran/trans-expr.c(working copy) *** gfc_conv_intrinsic_to_class (gfc_se *par *** 593,598 --- 593,599 else { parmse-ss = ss; + parmse-use_offset = 1; gfc_conv_expr_descriptor (parmse, e); gfc_add_modify (parmse-pre, ctree, parmse-expr); } *** gfc_conv_procedure_call (gfc_se * se, gf *** 4378,4383 --- 4379,4385 || CLASS_DATA (fsym)-attr.codimension)) { /* Pass a class array. */ + parmse.use_offset = 1; gfc_conv_expr_descriptor (parmse, e); /* If an ALLOCATABLE dummy argument has INTENT(OUT) and is Index: gcc/fortran/trans.h === *** gcc/fortran/trans.h (revision 208997) --- gcc/fortran/trans.h (working copy) *** typedef struct gfc_se *** 87,92 --- 87,96 args alias. */ unsigned force_tmp:1; + / * Unconditionally calculate offset for array segments in + gfc_conv_expr_descriptor. */ + unsigned use_offset:1; + unsigned want_coarray:1; /* Scalarization parameters. */ Index:
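As a rough illustration of why a zero offset is wrong for an array section whose lower bound is one, here is a simplified one-dimensional descriptor. The struct layout and the helper names are assumptions for this sketch and do not reproduce gfortran's real descriptor; the only idea carried over is that an element is addressed as data[offset + index * stride], so the offset has to cancel the lower bound.

```c
#include <stddef.h>

/* Simplified one-dimensional descriptor (hypothetical layout).  */
struct descriptor
{
  const int *data;    /* first element of the section */
  ptrdiff_t offset;   /* added to index * stride when addressing */
  ptrdiff_t lbound;   /* lower bound seen by the callee */
  ptrdiff_t stride;   /* distance between elements, in elements */
};

/* Offset that makes index LBOUND map onto data[0].  */
ptrdiff_t
descriptor_offset (ptrdiff_t lbound, ptrdiff_t stride)
{
  return -(lbound * stride);
}

/* Address an element the way the sketch assumes descriptors are used.  */
int
element (const struct descriptor *d, ptrdiff_t index)
{
  return d->data[d->offset + index * d->stride];
}
```

With a lower bound of 1 and stride 1 the helper yields -1; leaving the offset at zero would make every access land one element too far.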
Re: [C++ patch] for C++/52369
On 04/02/2014 04:21 PM, Fabien Chêne wrote: * cp/decl.c (duplicate_decls): Check for the return of permerror before emitting a note. You don't need cp/ within cp/ChangeLog. OK with that change. Jason
one more patch to fix PR60650
The following patch fixes the PR for new set of options. The details of the problem can be found on http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60650 The patch affects a sensitive part for LRA. Therefore I bootstrapped and tested it on x86-64, aarch64, arm, s390, and Ppc64. The results look ok. x86/x86-64 SPEC2000 testing shows no visible effect on performance and code size. Committed as rev. 209038. 2014-04-02 Vladimir Makarov vmaka...@redhat.com PR rtl-optimization/60650 * lra-constraints.c (process_alt_operands): Decrease reject for earlyclobber matching. 2014-04-02 Vladimir Makarov vmaka...@redhat.com PR rtl-optimization/60650 * gcc.target/arm/pr60650-2.c: New. Index: lra-constraints.c === --- lra-constraints.c (revision 208989) +++ lra-constraints.c (working copy) @@ -1747,12 +1747,27 @@ process_alt_operands (int only_alternati [GET_MODE (*curr_id-operand_loc[m])]); } - /* We prefer no matching alternatives because - it gives more freedom in RA. */ - if (operand_reg[nop] == NULL_RTX - || (find_regno_note (curr_insn, REG_DEAD, -REGNO (operand_reg[nop])) -== NULL_RTX)) + /* Prefer matching earlyclobber alternative as + it results in less hard regs required for + the insn than a non-matching earlyclobber + alternative. */ + if (curr_static_id-operand[m].early_clobber) + { + if (lra_dump_file != NULL) + fprintf + (lra_dump_file, +%d Matching earlyclobber alt: + reject--\n, +nop); + reject--; + } + /* Otherwise we prefer no matching + alternatives because it gives more freedom + in RA. 
*/ + else if (operand_reg[nop] == NULL_RTX +|| (find_regno_note (curr_insn, REG_DEAD, + REGNO (operand_reg[nop])) +== NULL_RTX)) { if (lra_dump_file != NULL) fprintf @@ -2143,7 +2158,7 @@ process_alt_operands (int only_alternati } /* If the operand is dying, has a matching constraint, and satisfies constraints of the matched operand -which failed to satisfy the own constraints, probably +which failed to satisfy the own constraints, most probably the reload for this operand will be gone. */ if (this_alternative_matches = 0 !curr_alt_win[this_alternative_matches] Index: testsuite/gcc.target/arm/pr60650-2.c === --- testsuite/gcc.target/arm/pr60650-2.c(revision 0) +++ testsuite/gcc.target/arm/pr60650-2.c(working copy) @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fno-omit-frame-pointer -march=armv7-a } */ + +int a, h, j; +long long d, e, i; +int f; +fn1 (void *p1, int p2) +{ +switch (p2) +case 8: +{ +register b = *(long long *) p1, c asm (r2); +asm (%0: =r (a), =r (c):r (b), r (0)); +*(long long *) p1 = c; +} +} + +fn2 () +{ +int k; +k = f; +while (1) +{ +fn1 (i, sizeof i); +e = d + k; +switch (d) +case 0: +( +{ +register l asm (r4); +register m asm (r0); +asm ( .err .endif\n\t: =r (h), =r (j):r (m), +r +(l));; +}); +} +}
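The decision added to process_alt_operands can be read as a small cost adjustment. The sketch below is a simplified model with a hypothetical struct and function name; the decrement for the earlyclobber case follows the patch, while the size of the penalty in the other branch is an assumption, since the excerpt does not show it.

```c
#include <stdbool.h>

/* Hypothetical summary of the operand being matched.  */
struct matched_operand
{
  bool early_clobber;  /* the matched operand is an earlyclobber */
  bool is_reg;         /* the operand is a register */
  bool dies_in_insn;   /* the register dies in this insn */
};

/* Adjust the reject score of an alternative with a matching constraint.
   Lower scores are preferred.  */
int
adjust_reject (int reject, const struct matched_operand *op)
{
  if (op->early_clobber)
    /* Matching an earlyclobber needs fewer hard registers.  */
    reject--;
  else if (!op->is_reg || !op->dies_in_insn)
    /* Otherwise prefer non-matching alternatives; the value 2 is an
       assumed penalty for this sketch.  */
    reject += 2;
  return reject;
}
```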
[PATCH, committed] Fix PR60733
PR60733 identifies a case where straight-line strength reduction produces code that doesn't satisfy SSA verification. For a PHI candidate, the insertion of an initializer for a stride calculation along an incoming arc was specified to be at the point of the feeding definition of the PHI along that arc. This is wrong and can place the initializer far earlier than its operands are guaranteed to be available. In this case, the initializer was placed earlier in the block than the definition of one of its operands. In fact, the initializer is only needed at the end of the feeding block for the PHI argument, and its operands are guaranteed to be available at that point. This patch changes the placement of the initializer to this location for PHI candidates. The nearest common dominator algorithm may still place the initializer at an earlier point, but only if it is safe to do so. Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions; committed. Thanks, Bill [gcc] 2014-04-02 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/60733 * gimple-ssa-strength-reduction.c (ncd_with_phi): Change required insertion point for PHI candidates to be the end of the feeding block for the PHI argument. [gcc/testsuite] 2014-04-02 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/60733 * gcc.dg/torture/pr60733.c: New test. Index: gcc/testsuite/gcc.dg/torture/pr60733.c === --- gcc/testsuite/gcc.dg/torture/pr60733.c (revision 0) +++ gcc/testsuite/gcc.dg/torture/pr60733.c (revision 0) @@ -0,0 +1,36 @@ +/* { dg-do run } */ + +int a, d, e, f, g, h, i, j, k; +unsigned short b; + +short +fn1 (int p1, int p2) +{ + return p1 * p2; +} + +int +main () +{ + for (; a; a--) +{ + int l = 0; + if (f = 0) + { + for (; h;) + e = 0; + for (; l != -6; l--) + { + j = fn1 (b--, d); + for (g = 0; g; g = 1) + ; + k = e ? 
2 : 0; + } + i = 0; + for (;;) + ; + } +} + d = 0; + return 0; +} Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 209023) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -3001,10 +3001,10 @@ ncd_with_phi (slsr_cand_t c, double_int incr, gimp { slsr_cand_t arg_cand = base_cand_from_table (arg); double_int diff = arg_cand-index - basis-index; + basic_block pred = gimple_phi_arg_edge (phi, i)-src; if ((incr == diff) || (!address_arithmetic_p incr == -diff)) - ncd = ncd_for_two_cands (ncd, gimple_bb (arg_cand-cand_stmt), -*where, arg_cand, where); + ncd = ncd_for_two_cands (ncd, pred, *where, NULL, where); } } }
Re: [PATCH] Disable IPA-SRA for always_inline functions
Hi, when dealing with a PR yesterday I noticed that IPA-SRA was modifying an always_inline function, which is useless work since the function must then be inlined anyway. Thus I'd like to propose the following simple change disabling it in such cases. Included in a bootstrap and testing on x86_64-linux. OK for trunk now or in the next stage1?

Actually are the attributes copied to the clone? The patch looks OK to me, even at this stage.

Honza

Thanks, Martin

2014-04-01 Martin Jambor mjam...@suse.cz

	* tree-sra.c (ipa_sra_preliminary_function_checks): Skip always_inline functions.

Index: src/gcc/tree-sra.c
===================================================================
--- src.orig/gcc/tree-sra.c
+++ src/gcc/tree-sra.c
@@ -4960,6 +4960,15 @@ ipa_sra_preliminary_function_checks (str
   if (TYPE_ATTRIBUTES (TREE_TYPE (node->decl)))
     return false;
 
+  if (lookup_attribute ("always_inline",
+                        DECL_ATTRIBUTES (node->decl)) != NULL)
+    {
+      if (dump_file)
+        fprintf (dump_file, "Allways inline function will be inlined "
+                 "anyway. \n");
+      return false;
+    }
+
   return true;
 }
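The check added here is an early exit keyed on an attribute lookup. A self-contained sketch of the same shape, with a hypothetical linked list of attributes and hypothetical function names in place of GCC's tree-based attribute lists:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute list; GCC stores attributes as trees.  */
struct attribute
{
  const char *name;
  const struct attribute *next;
};

/* Return the first attribute called NAME, or NULL if there is none.  */
const struct attribute *
find_attribute (const char *name, const struct attribute *list)
{
  for (; list != NULL; list = list->next)
    if (strcmp (list->name, name) == 0)
      return list;
  return NULL;
}

/* Preliminary check: transforming a function that will be inlined
   everywhere anyway is wasted work, so bail out early.  */
bool
worth_transforming (const struct attribute *decl_attributes)
{
  if (find_attribute ("always_inline", decl_attributes) != NULL)
    return false;
  return true;
}
```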
Re: [PATCH][LTO/PGO] Warn when both -flto and -fprofile-generate are enabled
On Wed, Apr 2, 2014 at 2:07 PM, Richard Biener richard.guent...@gmail.com wrote:
On Wed, Apr 2, 2014 at 1:50 PM, Markus Trippelsdorf mar...@trippelsdorf.de wrote:
It is a common mistake to enable both -flto and -fprofile-generate when building projects. This is not a good idea, because memory use will skyrocket due to instrumentation. So just warn the user. OK for next stage1?

I'd rather see if we can fix the underlying issue. For example as we are now instrumenting as IPA pass we can allocate a single counter array (if the number of global vars is the issue). Basically split analysis and instrumentation into two phases for that.

Or even better, do profile instrumentation as real IPA pass. Thus, isn't -coverage also facing the same issue? Thus, is it really -fprofile-arcs already or only one of the value profiling pieces?

Yep, -fprofile-arcs will cause similar issues. Implementing instrumentation as real IPA is on my TODO list, but pretty low, since it is quite some work; we need to stream CFG into summaries and make the instrumentation code independent of function bodies, that needs quite some reorg (at the moment we have no way to load cfg alone). Note that -fprofile-generate -flto gives you a bit more precise profiles than -fprofile-generate alone, this is because of COMDAT functions from static libraries that may be lost in the first case.

Honza

Richard.

Richard.

2014-04-02 Markus Trippelsdorf mar...@trippelsdorf.de

	* common.opt (fprofile-generate): Add flag.
	* opts.c (finish_options): Add new warning.
	(common_handle_option): Set flag.

diff --git a/gcc/common.opt b/gcc/common.opt
index 62c72f0d2fbf..61e9adfa0df5 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1689,7 +1689,7 @@ Common Report Var(flag_profile_correction)
 Enable correction of flow inconsistent profile data input
 
 fprofile-generate
-Common
+Common Var(flag_profile_generate)
 Enable common options for generating profile info for profile feedback directed optimizations
 
 fprofile-generate=
diff --git a/gcc/opts.c b/gcc/opts.c
index fdc903f9271a..b62a0d626d94 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -833,6 +833,9 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
       error_at (loc, "only one -flto-partition value can be specified");
     }
 
+  if (opts->x_flag_generate_lto && opts->x_flag_profile_generate)
+    warning_at (loc, 0, "Enabling both -fprofile-generate and -flto is a bad idea.");
+
   /* We initialize opts->x_flag_split_stack to -1 so that targets can set a
      default value if they choose based on other options. */
   if (opts->x_flag_split_stack == -1)
@@ -1728,6 +1731,7 @@ common_handle_option (struct gcc_options *opts,
     case OPT_fprofile_generate_:
       opts->x_profile_data_prefix = xstrdup (arg);
+      opts->x_flag_profile_generate = true;
       value = true;
       /* No break here - do -fprofile-generate processing. */
     case OPT_fprofile_generate:

--
Markus
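The proposed warning amounts to a cross-check between two independently set options once option processing is finished. A minimal sketch of that check, with a hypothetical options struct and function name instead of GCC's gcc_options and finish_options:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical subset of the option state.  */
struct options
{
  bool lto;               /* -flto was given */
  bool profile_generate;  /* -fprofile-generate was given */
};

/* Validate option combinations after all options have been parsed.
   Returns the number of warnings issued.  */
int
check_option_combinations (const struct options *opts)
{
  int warnings = 0;

  if (opts->lto && opts->profile_generate)
    {
      fprintf (stderr, "warning: enabling both -fprofile-generate "
               "and -flto is a bad idea\n");
      warnings++;
    }
  return warnings;
}
```

Doing the check after parsing, rather than in the handler of either option, makes it independent of the order in which the two flags appear on the command line.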
merged trunk into gimple-front-end
Hi, I just merged trunk r209020 into the gimple-front-end branch, please tell me if you see anything busted ;) I successfully bootstrapped the merge including building the gimple front end and its few tests passed. Trev
[Patch, moxie] Zero- and sign-extend values properly
This patch does three related things for the moxie port... 1. Changes char to be unsigned by default 2. Changes WCHAR_TYPE from long int to unsigned int 3. Zero- and sign-extends values properly, sometimes using the new sign-extension instructions. I am committing this change even at this late stage of the GCC release process because it only touches the moxie target directory. AG 2014-04-02 Anthony Green gr...@moxielogic.com * config/moxie/moxie.md (zero_extendqisi2, zero_extendhisi2) (extendqisi2, extendhisi2): Define. * config/moxie/moxie.h (DEFAULT_SIGNED_CHAR): Change to 0. (WCHAR_TYPE): Change to unsigned int. Index: gcc/config/moxie/moxie.h === --- gcc/config/moxie/moxie.h(revision 209042) +++ gcc/config/moxie/moxie.h(working copy) @@ -59,7 +59,7 @@ #define DOUBLE_TYPE_SIZE 64 #define LONG_DOUBLE_TYPE_SIZE 64 -#define DEFAULT_SIGNED_CHAR 1 +#define DEFAULT_SIGNED_CHAR 0 #undef SIZE_TYPE #define SIZE_TYPE unsigned int @@ -68,7 +68,7 @@ #define PTRDIFF_TYPE int #undef WCHAR_TYPE -#define WCHAR_TYPE long int +#define WCHAR_TYPE unsigned int #undef WCHAR_TYPE_SIZE #define WCHAR_TYPE_SIZE BITS_PER_WORD Index: gcc/config/moxie/moxie.md === --- gcc/config/moxie/moxie.md (revision 209042) +++ gcc/config/moxie/moxie.md (working copy) @@ -239,6 +239,56 @@ ldo.l %0, %1 [(set_attr length 2,2,6,2,6,2,6,6,6)]) +(define_insn_and_split zero_extendqisi2 + [(set (match_operand:SI 0 register_operand =r,r,r,r) + (zero_extend:SI (match_operand:QI 1 nonimmediate_operand 0,W,A,B)))] + + @ + ; + ld.b %0, %1 + lda.b %0, %1 + ldo.b %0, %1 + reload_completed + [(set (match_dup 2) (match_dup 1)) + (set (match_dup 0) (zero_extend:SI (match_dup 2)))] +{ + operands[2] = gen_lowpart (QImode, operands[0]); +} + [(set_attr length 0,2,6,6)]) + +(define_insn_and_split zero_extendhisi2 + [(set (match_operand:SI 0 register_operand =r,r,r,r) + (zero_extend:SI (match_operand:HI 1 nonimmediate_operand 0,W,A,B)))] + + @ + ; + ld.s %0, %1 + lda.s %0, %1 + ldo.s %0, %1 + reload_completed + [(set 
(match_dup 2) (match_dup 1)) + (set (match_dup 0) (zero_extend:SI (match_dup 2)))] +{ + operands[2] = gen_lowpart (HImode, operands[0]); +} + [(set_attr length 0,2,6,6)]) + +(define_insn extendqisi2 + [(set (match_operand:SI 0 register_operand =r) + (sign_extend:SI (match_operand:QI 1 nonimmediate_operand r)))] + + @ + sex.b %0, %1 + [(set_attr length 2)]) + +(define_insn extendhisi2 + [(set (match_operand:SI 0 register_operand =r) + (sign_extend:SI (match_operand:HI 1 nonimmediate_operand r)))] + + @ + sex.s %0, %1 + [(set_attr length 2)]) + (define_expand movqi [(set (match_operand:QI 0 general_operand ) (match_operand:QI 1 general_operand ))]
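For readers less familiar with the distinction the new patterns encode, the following sketch shows what zero- and sign-extension of 8-bit and 16-bit values to 32 bits compute. The function names are hypothetical and simply mirror the pattern names; the code describes the arithmetic, not the moxie instructions themselves.

```c
#include <stdint.h>

/* Zero-extension: the upper bits of the result are always clear.  */
uint32_t
zero_extend_qi (uint8_t v)
{
  return (uint32_t) v;
}

uint32_t
zero_extend_hi (uint16_t v)
{
  return (uint32_t) v;
}

/* Sign-extension: the top bit of the narrow value is replicated into
   the upper bits.  Flipping the sign bit and subtracting its weight
   does this without relying on implementation-defined conversions.  */
int32_t
sign_extend_qi (uint8_t v)
{
  return (int32_t) ((v ^ 0x80) - 0x80);
}

int32_t
sign_extend_hi (uint16_t v)
{
  return (int32_t) ((v ^ 0x8000) - 0x8000);
}
```

For the byte 0xF0, zero-extension gives 240 while sign-extension gives -16, which is why the choice of default signedness for char matters to the code the port generates.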
Fix ipa-devirt ICE
Hi, this patch fixes ICE on type inconsistent code. The ICE happens because of gcc_unreachable I forgot in code during development. I added way to mark calls as inconsistent that is useful to redirect them to UNREACHABLE. Bootstrapped/regtested x86_64-linux, comitted. Honza * testsuite/g++.dg/torture/pr60659.C: New testcase. * ipa-devirt.c (get_polymorphic_call_info): Do not ICE on type inconsistent code and instead mark the context inconsistent. (possible_polymorphic_call_targets): For inconsistent contexts return empty complete list. Index: testsuite/g++.dg/torture/pr60659.C === --- testsuite/g++.dg/torture/pr60659.C (revision 0) +++ testsuite/g++.dg/torture/pr60659.C (revision 0) @@ -0,0 +1,58 @@ +// { dg-do compile } +template typename _InputIterator void __distance (_InputIterator); +template typename _InputIterator +void distance (_InputIterator, _InputIterator p2) +{ + __distance (p2); +} + +namespace boost +{ +template class Iterator struct A +{ + typedef typename Iterator::difference_type type; +}; +template class T typename T::const_iterator end (T ); +template class T typename T::const_iterator begin (T ); +template class T struct D : Atypename T::const_iterator +{ +}; +template class T typename DT::type distance (const T p1) +{ + distance (boost::begin (p1), boost::end (p1)); + return 0; +} +template class IteratorT class B +{ +public: + typedef B type; + typedef IteratorT const_iterator; +}; +} + +typedef int storage_t[]; +struct F; +template template typename class struct G +{ + G (const G p1) { p1.m_fn1 ().m_fn1 (0); } + const F m_fn1 () const + { +const void *a; +a = data_m; +return *static_castconst F *(a); + } + storage_t *data_m; +}; + +struct F +{ + virtual F *m_fn1 (void *) const; +}; +template typename struct H; +struct C : GH +{ + typedef int difference_type; +}; +boost::BC AllTransVideos (); +int b = boost::distance (AllTransVideos ()); + Index: ipa-devirt.c === --- ipa-devirt.c(revision 208915) +++ ipa-devirt.c(working copy) @@ -1214,7 
+1214,13 @@ get_polymorphic_call_info (tree fndecl, not part of outer type. */ if (!contains_type_p (TREE_TYPE (base), context-offset + offset2, *otr_type)) - return base_pointer; + { + /* Use OTR_TOKEN = INT_MAX as a marker of probably type inconsistent +code sequences; we arrange the calls to be builtin_unreachable +later. */ + *otr_token = INT_MAX; + return base_pointer; + } get_polymorphic_call_info_for_decl (context, base, context-offset + offset2); return NULL; @@ -1288,8 +1294,10 @@ get_polymorphic_call_info (tree fndecl, if (!contains_type_p (context-outer_type, context-offset, *otr_type)) { - context-outer_type = NULL; - gcc_unreachable (); + /* Use OTR_TOKEN = INT_MAX as a marker of probably type inconsistent +code sequences; we arrange the calls to be builtin_unreachable +later. */ + *otr_token = INT_MAX; return base_pointer; } context-maybe_derived_type = false; @@ -1389,6 +1397,9 @@ devirt_variable_node_removal_hook (varpo temporarily change to one of base types. INCLUDE_DERIVER_TYPES make us to walk the inheritance graph for all derivations. + OTR_TOKEN == INT_MAX is used to mark calls that are provably + undefined and should be redirected to unreachable. + If COMPLETEP is non-NULL, store true if the list is complete. CACHE_TOKEN (if non-NULL) will get stored to an unique ID of entry in the target cache. If user needs to visit every target list @@ -1422,6 +1433,7 @@ possible_polymorphic_call_targets (tree bool complete; bool can_refer; + /* If ODR is not initialized, return empty incomplete list. */ if (!odr_hash.is_created ()) { if (completep) @@ -1431,11 +1443,28 @@ possible_polymorphic_call_targets (tree return nodes; } + /* If we hit type inconsistency, just return empty list of targets. */ + if (otr_token == INT_MAX) +{ + if (completep) + *completep = true; + if (nonconstruction_targetsp) + *nonconstruction_targetsp = 0; + return nodes; +} + type = get_odr_type (otr_type, true); /* Lookup the outer class type we want to walk. 
*/ - if (context.outer_type) -get_class_context (context, otr_type); + if (context.outer_type + !get_class_context (context, otr_type)) +{ + if (completep) +
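The fix replaces an assertion failure with a sentinel: a token value of INT_MAX marks a call as type inconsistent, and the target lookup then answers with an empty list that is nevertheless complete. A minimal sketch of that convention, with a hypothetical target_list type and lookup function standing in for the real ipa-devirt data structures:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TARGETS 8

/* Hypothetical result type for this sketch.  */
struct target_list
{
  size_t count;
  int targets[MAX_TARGETS];
};

/* Return the possible targets of a call identified by TOKEN.  If
   COMPLETEP is non-NULL, store whether the list is known to be
   complete.  */
struct target_list
possible_targets (int token, bool *completep)
{
  struct target_list result = { 0, { 0 } };

  /* Sentinel: the call was marked as inconsistent, so no target can
     ever be reached.  An empty list that is complete lets the caller
     redirect the call to "unreachable".  */
  if (token == INT_MAX)
    {
      if (completep)
        *completep = true;
      return result;
    }

  /* Placeholder for the real walk over the type hierarchy.  */
  result.targets[result.count++] = token;
  if (completep)
    *completep = false;
  return result;
}
```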
RE: [PATCH][2/3] Fix PR54733 Optimize endian independent load/store
From: Richard Biener [mailto:richard.guent...@gmail.com] More like isn't enough to answer this - do you have a testcase? (usually these end up in undefined-overflow and/or conversion-to-sizetype issues) I do. See attachment. This testcase needs to be compiled with patch 2/3 applied. As you can see from the patch, data[a] and data[a+1] will be converted to offsets by multiplying the index by the element size. Then later, when analyzing the ORing, a subtraction of these two indices will be done. So you have two fold_build calls and not one. I can't reproduce it with a simple expression such as (a+1)*1 - a*1 so maybe being done in two parts is the reason, you know better. Best regards, Thomas missed_folding.c Description: Binary data
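The arithmetic being discussed can be written out directly: each index is first scaled to a byte offset, and the two offsets are subtracted afterwards. The sketch below only shows the expected value, not how the folder works; the function names are hypothetical, and unsigned wraparound for very large indices is ignored.

```c
#include <stddef.h>

/* Convert an array index into a byte offset.  */
size_t
byte_offset (size_t index, size_t elem_size)
{
  return index * elem_size;
}

/* Distance in bytes between data[a + 1] and data[a].  Algebraically
   this is elem_size for every value of a, which is the simplification
   the optimizer needs to discover across two separately built
   expressions.  */
ptrdiff_t
offset_distance (size_t a, size_t elem_size)
{
  return (ptrdiff_t) byte_offset (a + 1, elem_size)
         - (ptrdiff_t) byte_offset (a, elem_size);
}
```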
RE: [PATCH][1/3] Fix PR54733 Optimize endian independent load/store
From: Joseph Myers [mailto:jos...@codesourcery.com] + if { [is-effective-target bswap] + && ![istarget x86_64-*-*] } { That x86_64-*-* test is wrong. x86_64-*-* and i?86-*-* should always be handled the same (if you then want to distinguish 32-bit and 64-bit multilibs, you check the appropriate effective-target there, depending on whether the condition is one on the ABI or which register size is being used, which affects how x32 should be counted). Indeed, it's a mistake. i?86 should be in there too. Please find attached an updated patch. Best regards, Thomas gcc32rm-84.3.1.part1.diff Description: Binary data