Re: [PATCH][4/4] Always 64bit-HWI cleanups
On May 27, 2014 3:58:06 AM CEST, David Edelsohn <dje@gmail.com> wrote:

> This patch (further) broke bootstrap on AIX.  AIX defaults to 32 bit,
> although it supports 64 bit HWI.
>
> /nasfarm/edelsohn/src/src/gcc/bitmap.c: In function 'int print_statistics(bitmap_descriptor_d**, output_info*)':
> /nasfarm/edelsohn/src/src/gcc/bitmap.c:2168:24: error: expected ')' before 'PRId64'
> /nasfarm/edelsohn/src/src/gcc/bitmap.c: In function 'void dump_bitmap_statistics()':
> /nasfarm/edelsohn/src/src/gcc/bitmap.c:2202:15: error: expected ')' before 'PRId64'
> /nasfarm/edelsohn/src/src/gcc/bt-load.c: In function 'void migrate_btr_defs(reg_class, int)':
> /nasfarm/edelsohn/src/src/gcc/bt-load.c:1414:34: error: expected ')' before 'PRId64'
> /nasfarm/edelsohn/src/src/gcc/cfg.c: In function 'void dump_edge_info(FILE*, edge, int, int)':
> /nasfarm/edelsohn/src/src/gcc/cfg.c:489:25: error: expected ')' before 'PRId64'
> /nasfarm/edelsohn/src/src/gcc/cfg.c: In function 'void dump_bb_info(FILE*, basic_block, int, int, bool, bool)':
> /nasfarm/edelsohn/src/src/gcc/cfg.c:737:33: error: expected ')' before 'PRId64'

This means AIX has inttypes.h but not the standard-mandated PRI?64
macros.  Does it have stdint.h and therein int64_t?

A possibility is to define the GCC fallbacks when the defines are not
defined, as opposed to only when inttypes.h is not available.

Can you investigate the situation on AIX some more?

Thanks,
Richard.

> bitmap.i looks like:
>
> int
> print_statistics (bitmap_descriptor_d **slot, output_info *i)
> {
>   bitmap_descriptor d = *slot;
>   char s[4096];
>
>   if (d->allocated)
>     {
>       const char *s1 = d->file;
>       const char *s2;
>       while ((s2 = strstr (s1, "gcc/")))
>         s1 = s2 + 4;
>       sprintf (s, "%s:%i (%s)", s1, d->line, d->function);
>       s[41] = 0;
>       fprintf ((&_iob[2]),
>                "%-41s %9u %15" PRId64 " %15" PRId64 " %15" PRId64 " %10" PRId64 " %10" PRId64 "\n",
>                s, d->created, d->allocated, d->peak, d->current,
>                d->nsearches, d->search_iter);
>       i->size += d->allocated;
>       i->count += d->created;
>     }
>   return 1;
> }
>
> - David
Re: [PATCH][4/4] Always 64bit-HWI cleanups
On Tue, May 27, 2014 at 08:26:35AM +0200, Richard Biener wrote:
> > /nasfarm/edelsohn/src/src/gcc/cfg.c: In function 'void dump_bb_info(FILE*, basic_block, int, int, bool, bool)':
> > /nasfarm/edelsohn/src/src/gcc/cfg.c:737:33: error: expected ')' before 'PRId64'
>
> This means aix has inttypes.h but not the standard mandated PRI?64
> macros.  Does it have stdint.h and therein int64_t?
>
> A possibility is to define the GCC fallbacks when the defines are not
> defined as opposed to only when inttypes.h is not available.

BTW, we should be prepared for C++11 udlits and use spaces around the
PRI* macros, in case some compiler treats those as udlits
unconditionally.

	Jakub
[PATCH, libgomp]: Require vect_simd_clones effective target for libgomp.fortran/declare-simd-[12].f90
Hello!

These tests require vect_simd_clones effective target, as the target
should be able to compile AVX clones.

2014-05-27  Uros Bizjak  <ubiz...@gmail.com>

	* testsuite/libgomp.fortran/declare-simd-1.f90: Require
	vect_simd_clones effective target.  Remove dg-additional-options
	directives.
	* testsuite/libgomp.fortran/declare-simd-2.f90: Ditto.

Tested on x86_64-linux-gnu CentOS 5.

Uros.

Index: testsuite/libgomp.fortran/declare-simd-1.f90
===
--- testsuite/libgomp.fortran/declare-simd-1.f90	(revision 210956)
+++ testsuite/libgomp.fortran/declare-simd-1.f90	(working copy)
@@ -1,6 +1,5 @@
 ! { dg-options "-fno-inline" }
-! { dg-additional-options "-msse2" { target sse2_runtime } }
-! { dg-additional-options "-mavx" { target avx_runtime } }
+! { dg-require-effective-target vect_simd_clones }
 module declare_simd_1_mod
   contains
Index: testsuite/libgomp.fortran/declare-simd-2.f90
===
--- testsuite/libgomp.fortran/declare-simd-2.f90	(revision 210956)
+++ testsuite/libgomp.fortran/declare-simd-2.f90	(working copy)
@@ -1,8 +1,7 @@
 ! { dg-do run }
 ! { dg-options "-fno-inline" }
- ! { dg-additional-sources declare-simd-3.f90 }
-! { dg-additional-options "-msse2" { target sse2_runtime } }
-! { dg-additional-options "-mavx" { target avx_runtime } }
+! { dg-additional-sources declare-simd-3.f90 }
+! { dg-require-effective-target vect_simd_clones }
 module declare_simd_2_mod
   contains
Re: [PATCH, libgomp]: Require vect_simd_clones effective target for libgomp.fortran/declare-simd-[12].f90
On Tue, May 27, 2014 at 08:58:18AM +0200, Uros Bizjak wrote:
> These tests require vect_simd_clones effective target, as the target
> should be able to compile AVX clones.
>
> 2014-05-27  Uros Bizjak  <ubiz...@gmail.com>
>
> 	* testsuite/libgomp.fortran/declare-simd-1.f90: Require
> 	vect_simd_clones effective target.  Remove dg-additional-options
> 	directives.
> 	* testsuite/libgomp.fortran/declare-simd-2.f90: Ditto.
>
> Tested on x86_64-linux-gnu CentOS 5.

Please don't remove the dg-additional-options there, that is completely
intentional there, only the simd clones are built for SSE2/AVX/AVX2, the
simd loops are built with whatever options the loop is compiled with,
and for the common case (AVX or later HW, but compiler not configured to
support only AVX or later) I want to test as much vectorization as
possible.

Requiring vect_simd_clones or the whitespace change is fine, though I'd
just use
! { dg-do run { target vect_simd_clones } }
instead of dg-require-effective-target.

> Index: testsuite/libgomp.fortran/declare-simd-1.f90
> ===
> --- testsuite/libgomp.fortran/declare-simd-1.f90	(revision 210956)
> +++ testsuite/libgomp.fortran/declare-simd-1.f90	(working copy)
> @@ -1,6 +1,5 @@
>  ! { dg-options "-fno-inline" }
> -! { dg-additional-options "-msse2" { target sse2_runtime } }
> -! { dg-additional-options "-mavx" { target avx_runtime } }
> +! { dg-require-effective-target vect_simd_clones }
>  module declare_simd_1_mod
>    contains
> Index: testsuite/libgomp.fortran/declare-simd-2.f90
> ===
> --- testsuite/libgomp.fortran/declare-simd-2.f90	(revision 210956)
> +++ testsuite/libgomp.fortran/declare-simd-2.f90	(working copy)
> @@ -1,8 +1,7 @@
>  ! { dg-do run }
>  ! { dg-options "-fno-inline" }
> - ! { dg-additional-sources declare-simd-3.f90 }
> -! { dg-additional-options "-msse2" { target sse2_runtime } }
> -! { dg-additional-options "-mavx" { target avx_runtime } }
> +! { dg-additional-sources declare-simd-3.f90 }
> +! { dg-require-effective-target vect_simd_clones }
>  module declare_simd_2_mod
>    contains

	Jakub
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
Hi Christophe and Andreas,

> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Thomas Preud'homme
>
> I suspect it's the same kind of problem m68k ran into. I already wrote
> a patch to reduce the impact of different bitfield layouts and it's in
> review right now. I'll send it to you tomorrow for testing.

Can you both try the attached patch to see if it solves the bug you
experienced? Andreas, I wrote it based on the correction you did, so it
should at least work for m68k.

ChangeLog is:

2014-05-26  Thomas Preud'homme  <thomas.preudho...@arm.com>

	* gcc.c-torture/execute/bswap-2.c: Add alignment constraints to
	bitfield to make it more portable.

Best regards,

Thomas

fix_bswap-2.diff
Description: Binary data
Re: [PATCH] Warn on and fold NULL checks against inline functions
On Mon, May 26, 2014 at 9:01 PM, Patrick Palka <patr...@parcs.ath.cx> wrote:
> Hi,
>
> This patch teaches the C++ frontend to warn on NULL checks against
> inline functions and it teaches the middle-end to fold NULL checks
> against inline functions. These two things are currently done for
> non-inline C++ functions, but inline functions are exceptional in that
> the C++ frontend marks them as weak, and NULL checks against weak
> symbols cannot be folded in general because the symbol may be mapped
> to NULL at link-time.
>
> But in the case of an inline function, the fact that it's a weak
> symbol is an implementation detail. And because it is not permitted to
> explicitly give an inline function the weak attribute (see
> handle_weak_attribute), in order to achieve $SUBJECT it suffices to
> assume that all inline functions are non-null, which is what this
> patch does.
>
> Bootstrap and regtest against x86_64-unknown-linux-gnu in progress.
> Is this patch OK assuming no regressions?

What about always_inline function wrappers as used in FORTIFY_SOURCE?
IIRC the actual referenced function may be declared weak and thus the
address can compare to NULL?  Sth like

extern void __foo (void) __attribute__((weak,asm("foo")));
extern inline __attribute__((always_inline,gnu_inline)) void foo (void)
{
  __foo ();
}
int main()
{
  if (foo == 0)
    return 0;
  abort ();
}

The issue is that the inline and alias may be hidden to the user and
thus he'll get unwanted and confusing warnings.

Richard.

> 2014-05-26  Patrick Palka  <patr...@parcs.ath.cx>
>
> 	* c-family/c-common.c (decl_with_nonnull_addr_p): Assume inline
> 	functions are non-null.
> 	* fold-const.c (tree_single_nonzero_warnv_p): Likewise.
---
 gcc/c-family/c-common.c         |  4 +++-
 gcc/fold-const.c                |  5 ++++-
 gcc/testsuite/g++.dg/inline-1.C | 14 ++++++++++++++
 3 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/inline-1.C

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 6ec14fc..d4747a0 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -4508,7 +4508,9 @@ decl_with_nonnull_addr_p (const_tree expr)
   return (DECL_P (expr)
 	  && (TREE_CODE (expr) == PARM_DECL
 	      || TREE_CODE (expr) == LABEL_DECL
-	      || !DECL_WEAK (expr)));
+	      || !DECL_WEAK (expr)
+	      || (TREE_CODE (expr) == FUNCTION_DECL
+		  && DECL_DECLARED_INLINE_P (expr))));
 }
 
 /* Prepare expr to be an argument of a TRUTH_NOT_EXPR,
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 188b179..2796079 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -16052,7 +16052,10 @@ tree_single_nonzero_warnv_p (tree t, bool *strict_overflow_p)
 	  || (DECL_CONTEXT (base)
 	      && TREE_CODE (DECL_CONTEXT (base)) == FUNCTION_DECL
 	      && auto_var_in_fn_p (base, DECL_CONTEXT (base))))
-	return !VAR_OR_FUNCTION_DECL_P (base) || !DECL_WEAK (base);
+	return !VAR_OR_FUNCTION_DECL_P (base)
+	       || !DECL_WEAK (base)
+	       || (TREE_CODE (base) == FUNCTION_DECL
+		   && DECL_DECLARED_INLINE_P (base));
 
       /* Constants are never weak.  */
       if (CONSTANT_CLASS_P (base))
diff --git a/gcc/testsuite/g++.dg/inline-1.C b/gcc/testsuite/g++.dg/inline-1.C
new file mode 100644
index 000..65beff1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/inline-1.C
@@ -0,0 +1,14 @@
+// { dg-options "-Waddress" }
+
+inline void
+foo (void)
+{
+}
+
+int
+bar (void)
+{
+  return foo == 0; // { dg-warning "never be NULL" }
+}
+
+// { dg-final { scan-assembler-not "foo" } }
-- 
2.0.0.rc2
Re: [PATCH, libgomp]: Require vect_simd_clones effective target for libgomp.fortran/declare-simd-[12].f90
On Tue, May 27, 2014 at 9:18 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> Please don't remove the dg-additional-options there, that is
> completely intentional there, only the simd clones are built for
> SSE2/AVX/AVX2, the simd loops are built with whatever options the loop
> is compiled with, and for the common case (AVX or later HW, but
> compiler not configured to support only AVX or later) I want to test
> as much vectorization as possible.
>
> Requiring vect_simd_clones or the whitespace change is fine, though
> I'd just use
> ! { dg-do run { target vect_simd_clones } }
> instead of dg-require-effective-target.

Thanks for the explanation. Following is the v2 patch that I plan to
commit after testing:

2014-05-27  Uros Bizjak  <ubiz...@gmail.com>

	* testsuite/libgomp.fortran/declare-simd-1.f90: Require
	vect_simd_clones effective target.
	* testsuite/libgomp.fortran/declare-simd-2.f90: Ditto.

Uros.

Index: testsuite/libgomp.fortran/declare-simd-1.f90
===
--- testsuite/libgomp.fortran/declare-simd-1.f90	(revision 210956)
+++ testsuite/libgomp.fortran/declare-simd-1.f90	(working copy)
@@ -1,3 +1,4 @@
+! { dg-do run { target vect_simd_clones } }
 ! { dg-options "-fno-inline" }
 ! { dg-additional-options "-msse2" { target sse2_runtime } }
 ! { dg-additional-options "-mavx" { target avx_runtime } }
Index: testsuite/libgomp.fortran/declare-simd-2.f90
===
--- testsuite/libgomp.fortran/declare-simd-2.f90	(revision 210956)
+++ testsuite/libgomp.fortran/declare-simd-2.f90	(working copy)
@@ -1,6 +1,6 @@
-! { dg-do run }
+! { dg-do run { target vect_simd_clones } }
 ! { dg-options "-fno-inline" }
- ! { dg-additional-sources declare-simd-3.f90 }
+! { dg-additional-sources declare-simd-3.f90 }
 ! { dg-additional-options "-msse2" { target sse2_runtime } }
 ! { dg-additional-options "-mavx" { target avx_runtime } }
Re: [PATCH, libgomp]: Require vect_simd_clones effective target for libgomp.fortran/declare-simd-[12].f90
On Tue, May 27, 2014 at 09:40:23AM +0200, Uros Bizjak wrote:
> Following is the v2 patch that I plan to commit after testing:
>
> 2014-05-27  Uros Bizjak  <ubiz...@gmail.com>
>
> 	* testsuite/libgomp.fortran/declare-simd-1.f90: Require
> 	vect_simd_clones effective target.
> 	* testsuite/libgomp.fortran/declare-simd-2.f90: Ditto.

Thanks.

	Jakub
[PATCH, fortran]: Include stdlib.h in intrinsics/getcwd.c
... to avoid an implicit declaration of function ‘free’ warning.

2014-05-27  Uros Bizjak  <ubiz...@gmail.com>

	* intrinsics/getcwd.c: Include <stdlib.h>.

Tested on x86_64-pc-linux-gnu. OK for mainline?

Uros.

Index: intrinsics/getcwd.c
===
--- intrinsics/getcwd.c	(revision 210956)
+++ intrinsics/getcwd.c	(working copy)
@@ -25,6 +25,7 @@
 
 #include "libgfortran.h"
 
+#include <stdlib.h>
 #include <string.h>
 #include <errno.h>
Re: [build, doc, testsuite] Centralise clearing hardware capabilities with Sun ld
Rainer Orth <r...@cebitec.uni-bielefeld.de> writes:
> Rainer Orth <r...@cebitec.uni-bielefeld.de> writes:
>> Prompted by the recent failures of c-c++-common/gomp/pr60823-2.c on
>> Solaris/x86 with Sun as
>>
>> 	http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00943.html
>>
>> I've reworked the clearing of hardware capabilities via linker maps
>> for Sun ld, which is currently replicated in several places all over
>> the testsuite.  I've chosen to use TEST_ALWAYS_FLAGS or
>> ALWAYS_CXXFLAGS to pass on the necessary linker flags.
>
> It turned out that more is needed to deal with the new gcc.dg/gomp and
> g++.dg/gomp failures:
>
> 	https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01185.html
>
> The following patch implements what I've outlined there: it introduces
> a Solaris-specific new -mclear-hwcap option, checking which version of
> the mapfile syntax works, if any.  This way, all the duplication in
> the testsuite and libitm can go.  To deal with the OpenMP declare simd
> failures, this option/mapfile is automatically activated with
> -fopenmp*.
>
> Initial testing has concluded successfully, also on a Solaris 10/x86
> system where ld only understands the old mapfile syntax.
>
> Mainline bootstraps on i386-pc-solaris2.11 (as/ld, gas/ld, gas/gld),
> i386-pc-solaris2.10 (as/ld, gas/ld), and sparc-sun-solaris2.11 (as/ld)
> are still running.  x86_64-unknown-linux-gnu didn't finish due to
> unrelated libgo breakage.  Also, an i386-pc-solaris2.9 bootstrap is
> running on the 4.9 branch, which is equally affected by the testsuite
> failures.
>
> Ok for mainline and 4.9 branch if those pass?

It's been a week since I submitted the patch, so far having received
approval for the testsuite parts only.  While I don't need approval for
the Solaris-only parts, it would be good if the doc and build changes
could be checked by the respective maintainers.  One might argue that
they are Solaris-only, thus falling under my maintainership, but better
safe than sorry.

Thanks.
	Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH, ARM] interrupt testcases for shink-wrapping
Hi,

Following up Richard's comments:
https://gcc.gnu.org/ml/gcc-patches/2014-05/msg00967.html

I investigated interrupt routines:

* gcc can shrink-wrap an interrupt routine (ARM mode).
  shrink-wrap-interrupt_1.c in the patch shows it.
* There are restrictions on shrink-wrapping interrupt routines,
  e.g. shrink-wrap-interrupt_2.c can not be shrink-wrapped.
* When an interrupt routine is shrink-wrapped, it uses
  *cond_simple_return to return from the function.  There is no dwarf
  info issue, so no need to change the code.

The patch only includes two test cases to track it.

OK for trunk?

Thanks!
-Zhenqiang

testsuite/ChangeLog:
2014-05-27  Zhenqiang Chen  <zhenqiang.c...@linaro.org>

	* gcc.target/arm/shrink-wrap-interrupt_1.c: New test.
	* gcc.target/arm/shrink-wrap-interrupt_2.c: New test.

diff --git a/gcc/testsuite/gcc.target/arm/shrink-wrap-interrupt_1.c b/gcc/testsuite/gcc.target/arm/shrink-wrap-interrupt_1.c
new file mode 100644
index 000..3432115
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/shrink-wrap-interrupt_1.c
@@ -0,0 +1,24 @@
+/* Verify that shrink-wrapping works for functions
+   with __attribute__ ((interrupt)).  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_nothumb } */
+/* { dg-options "-O2 -marm -mapcs -fdump-rtl-pro_and_epilogue" } */
+
+/* This test is not valid when -mthumb.  */
+
+extern void test (int, int) __attribute__ ((interrupt ("IRQ")));
+int p;
+
+void
+test (int a, int b)
+{
+  if (a > b)
+    {
+      p = a + 0x123456;
+      p = p * (b - 0x9879876);
+    }
+}
+
+/* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue" } } */
+/* { dg-final { cleanup-rtl-dump "pro_and_epilogue" } } */
diff --git a/gcc/testsuite/gcc.target/arm/shrink-wrap-interrupt_2.c b/gcc/testsuite/gcc.target/arm/shrink-wrap-interrupt_2.c
new file mode 100644
index 000..ecb793e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/shrink-wrap-interrupt_2.c
@@ -0,0 +1,24 @@
+/* Verify that shrink-wrapping works for functions
+   with __attribute__ ((interrupt)).  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_nothumb } */
+/* { dg-options "-O2 -marm -mapcs -fdump-rtl-pro_and_epilogue" } */
+
+/* This test is not valid when -mthumb.  */
+
+extern void test (int, int) __attribute__ ((interrupt ("IRQ")));
+int p;
+
+void
+test (int a, int b)
+{
+  if (a)
+    {
+      p = a + 0x123456;
+      p = p * (b - 0x9879876);
+    }
+}
+
+/* { dg-final { scan-rtl-dump-not "Performing shrink-wrapping" "pro_and_epilogue" } } */
+/* { dg-final { cleanup-rtl-dump "pro_and_epilogue" } } */
[PATCH] Fix an AARCH64/ARM performance regression
Hi,

I have been informed that the following check-in may cause a slight
performance regression on aarch64 and arm on trunk and gcc-4_9-branch:
see https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=205899

This can be seen with the following test example:

test.c:

typedef struct
{
  int *x;
  int *y;
  int *z;
  int *m;
  int *n;
  int *o;
} A;

typedef struct
{
  int x;
  int y;
  int z;
  int m;
  int n;
  int o;
} B;

A a[6];
B *b;

void test (int i)
{
  A *a1 = &a[i];
  B *b1 = &b[i];

  a1->x = &b1->x;
  a1->y = &b1->y;
  a1->z = &b1->z;
  a1->m = &b1->m;
  a1->n = &b1->n;
  a1->o = &b1->o;
}

When compiled with aarch64-elf-gcc -O3 -S we get the following
assembler code:

test:
	adrp	x1, b
	sxtw	x3, w0
	mov	w5, 24
	adrp	x2, a
	add	x2, x2, :lo12:a
	ldr	x4, [x1, #:lo12:b]
	lsl	x1, x3, 2
	sub	x1, x1, x3
	lsl	x1, x1, 4
	smaddl	x0, w0, w5, x4
	add	x3, x2, x1
	add	x7, x0, 8
	add	x6, x0, 16
	add	x8, x0, 4
	add	x5, x0, 12
	add	x4, x0, 20
	str	x0, [x2, x1]
	mov	x0, x3
	mov	x1, x3
	str	x8, [x3, 8]
	str	x7, [x0, 16]!
	str	x6, [x1, 32]!
	str	x5, [x0, 8]
	str	x4, [x1, 8]
	ret

Note the two mov instructions, and that two extra registers are used to
store the values. The mov instructions have not been there before the
patch.
After some investigation I found out how this happened. The difference
is first visible with -fdump-rtl-expand-slim:

    1: NOTE_INSN_DELETED
    4: NOTE_INSN_BASIC_BLOCK 2
    2: r83:SI=x0:SI
    3: NOTE_INSN_FUNCTION_BEG
    6: r74:DI=sign_extend(r83:SI)
    7: r85:DI=high(`b')
    8: r84:DI=r85:DI+low(`b')
          REG_EQUAL `b'
    9: r87:SI=0x18
   10: r86:DI=sign_extend(r83:SI)*sign_extend(r87:SI)
   11: r88:DI=[r84:DI]
   12: r76:DI=r88:DI+r86:DI
   13: r90:DI=high(`a')
   14: r89:DI=r90:DI+low(`a')
          REG_EQUAL `a'
   15: r91:DI=sign_extend(r83:SI)
   16: r92:DI=r91:DI
   17: r93:DI=r92:DI<<0x2
   18: r94:DI=r93:DI-r91:DI
          REG_EQUAL r91:DI*0x3
   19: r95:DI=r94:DI<<0x4
   20: r94:DI=r95:DI
          REG_EQUAL r91:DI*0x30
   21: r96:DI=r89:DI+r94:DI
   22: [r96:DI]=r76:DI
   23: r98:DI=high(`a')
   24: r97:DI=r98:DI+low(`a')
          REG_EQUAL `a'
   25: r99:DI=sign_extend(r83:SI)
   26: r100:DI=r99:DI
   27: r101:DI=r100:DI<<0x2
   28: r102:DI=r101:DI-r99:DI
          REG_EQUAL r99:DI*0x3
   29: r103:DI=r102:DI<<0x4
   30: r102:DI=r103:DI
          REG_EQUAL r99:DI*0x30
   31: r104:DI=r97:DI+r102:DI
   32: r105:DI=r76:DI+0x4
   33: [r104:DI+0x8]=r105:DI
   34: r107:DI=high(`a')
   35: r106:DI=r107:DI+low(`a')
          REG_EQUAL `a'
   36: r108:DI=sign_extend(r83:SI)
   37: r109:DI=r108:DI
   38: r110:DI=r109:DI<<0x2
   39: r111:DI=r110:DI-r108:DI
          REG_EQUAL r108:DI*0x3
   40: r112:DI=r111:DI<<0x4
   41: r111:DI=r112:DI
          REG_EQUAL r108:DI*0x30
   42: r113:DI=r106:DI+r111:DI
   43: r114:DI=r113:DI+0x10
   44: r115:DI=r76:DI+0x8
   45: [r114:DI]=r115:DI
   46: r117:DI=high(`a')
   47: r116:DI=r117:DI+low(`a')
          REG_EQUAL `a'
   48: r118:DI=sign_extend(r83:SI)
   49: r119:DI=r118:DI
   50: r120:DI=r119:DI<<0x2
   51: r121:DI=r120:DI-r118:DI
          REG_EQUAL r118:DI*0x3
   52: r122:DI=r121:DI<<0x4
   53: r121:DI=r122:DI
          REG_EQUAL r118:DI*0x30
   54: r123:DI=r116:DI+r121:DI
   55: r124:DI=r123:DI+0x10
   56: r125:DI=r76:DI+0xc
   57: [r124:DI+0x8]=r125:DI
   58: r127:DI=high(`a')
   59: r126:DI=r127:DI+low(`a')
          REG_EQUAL `a'
   60: r128:DI=sign_extend(r83:SI)
   61: r129:DI=r128:DI
   62: r130:DI=r129:DI<<0x2
   63: r131:DI=r130:DI-r128:DI
          REG_EQUAL r128:DI*0x3
   64: r132:DI=r131:DI<<0x4
   65: r131:DI=r132:DI
          REG_EQUAL r128:DI*0x30
   66: r133:DI=r126:DI+r131:DI
   67: r134:DI=r133:DI+0x20
   68: r135:DI=r76:DI+0x10
   69: [r134:DI]=r135:DI
   70: r137:DI=high(`a')
   71: r136:DI=r137:DI+low(`a')
          REG_EQUAL `a'
   72: r138:DI=sign_extend(r83:SI)
   73: r139:DI=r138:DI
   74: r140:DI=r139:DI<<0x2
   75: r141:DI=r140:DI-r138:DI
          REG_EQUAL r138:DI*0x3
   76: r142:DI=r141:DI<<0x4
   77: r141:DI=r142:DI
          REG_EQUAL r138:DI*0x30
   78: r143:DI=r136:DI+r141:DI
   79: r144:DI=r143:DI+0x20
   80: r145:DI=r76:DI+0x14
   81: [r144:DI+0x8]=r145:DI

Note the offset +0x8 on instructions 33, 57, 81 but not on 22, 45, 69.
This artefact was not there before the patch.

Well, this causes little ripples in the later rtl-passes. There is one
pass that does a pretty good job at compensating this effect:
forward-propagate. In simple cases the resulting code does not look any
different, because fwprop folds all offsets together, and the result
looks as before. However, in this example fwprop could not remove some
dead statements, and that made the
Re: [build, driver] RFC: Support compressed debug sections
Joseph S. Myers <jos...@codesourcery.com> writes:
> On Thu, 22 May 2014, Rainer Orth wrote:
>> Joseph S. Myers <jos...@codesourcery.com> writes:
>>> On Thu, 22 May 2014, Rainer Orth wrote:
>>>> 	* common.opt (compressed_debug_sections): New enum.
>>>> 	(gz, gz=): New options.
>>>> 	* opts.c (common_handle_option): Handle OPT_gz, OPT_gz_.
>>>
>>> Given that the options are completely handled via specs, why can't
>>> they just be Driver options (without Common) and so not mentioned in
>>> common_handle_option?
>>
>> If I do this, -gz still gets passed to e.g. cc1, which errors out
>> like this:
>>
>> cc1: error: unrecognised debug output level 'z'
>>
>> It seems my way of handling this is clearer than doing this in opts.c
>> (set_debug_level).
>
> Thanks for the explanation.  The driver changes are OK.

Thanks.  I still need approval for the doc and build parts, as well as
the Darwin and DJGPP changes.

For the latter two, I've included the patch in an
x86_64-apple-darwin11.4.2 build, verifying that -gz is rejected as
expected (no idea if there are any working gas and gold ports that
would support the option).

For DJGPP, I've tried an i386-pc-solaris2.11 x i586-pc-msdosdjgpp
cross.  While the specs additions look correct, trying to compile even
the most trivial program SEGVs:

ro@snoopy 319 > ./xgcc -B./ -o hello hello.c
<built-in>: internal compiler error: Segmentation Fault
0x87549df crash_signal
	/vol/gcc/src/hg/trunk/local/gcc/toplev.c:337
0x82b8270 contains_struct_check
	/vol/gcc/src/hg/trunk/local/gcc/tree.h:2841
0x82b8270 c_common_nodes_and_builtins()
	/vol/gcc/src/hg/trunk/local/gcc/c-family/c-common.c:5619
0x82437be c_init_decl_processing()
	/vol/gcc/src/hg/trunk/local/gcc/c/c-decl.c:3567
0x827fe1a c_objc_common_init()
	/vol/gcc/src/hg/trunk/local/gcc/c/c-objc-common.c:63
0x8756333 lang_dependent_init
	/vol/gcc/src/hg/trunk/local/gcc/toplev.c:1712
0x8756333 do_compile
	/vol/gcc/src/hg/trunk/local/gcc/toplev.c:1901

	Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH, committed] PR61319 The tests c-c++-common/ubsan/float-cast-overflow-(1|2|4).c fail on x86_64-apple-darwin*
Patch approved by Jakub in the PR thread.

Dominique

2014-05-27  Dominique d'Humieres  <domi...@lps.ens.fr>

	PR testsuite/61319
	* c-c++-common/ubsan/float-cast-overflow-1.c: Make the sign of
	-nan optional.
	* c-c++-common/ubsan/float-cast-overflow-2.c: Likewise.
	* c-c++-common/ubsan/float-cast-overflow-4.c: Likewise.

Index: gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-1.c
===
--- gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-1.c	(revision 210957)
+++ gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-1.c	(working copy)
@@ -97,7 +97,7 @@
 /* { dg-output "\[^\n\r]*value 128.5 is outside the range of representable values of type 'signed char'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value 132 is outside the range of representable values of type 'signed char'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value nan is outside the range of representable values of type 'signed char'\[^\n\r]*(\n|\r\n|\r)" } */
-/* { dg-output "\[^\n\r]*value -nan is outside the range of representable values of type 'signed char'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*value -?nan is outside the range of representable values of type 'signed char'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value inf is outside the range of representable values of type 'signed char'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value -inf is outside the range of representable values of type 'signed char'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value 256 is outside the range of representable values of type 'unsigned char'\[^\n\r]*(\n|\r\n|\r)" } */
@@ -107,7 +107,7 @@
 /* { dg-output "\[^\n\r]*value -1.5 is outside the range of representable values of type 'unsigned char'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value -1 is outside the range of representable values of type 'unsigned char'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value nan is outside the range of representable values of type 'unsigned char'\[^\n\r]*(\n|\r\n|\r)" } */
-/* { dg-output "\[^\n\r]*value -nan is outside the range of representable values of type 'unsigned char'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*value -?nan is outside the range of representable values of type 'unsigned char'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value inf is outside the range of representable values of type 'unsigned char'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value -inf is outside the range of representable values of type 'unsigned char'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value -32773 is outside the range of representable values of type 'short int'\[^\n\r]*(\n|\r\n|\r)" } */
@@ -117,7 +117,7 @@
 /* { dg-output "\[^\n\r]*value 32768.5 is outside the range of representable values of type 'short int'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value 32772 is outside the range of representable values of type 'short int'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value nan is outside the range of representable values of type 'short int'\[^\n\r]*(\n|\r\n|\r)" } */
-/* { dg-output "\[^\n\r]*value -nan is outside the range of representable values of type 'short int'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*value -?nan is outside the range of representable values of type 'short int'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value inf is outside the range of representable values of type 'short int'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value -inf is outside the range of representable values of type 'short int'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value 65536 is outside the range of representable values of type 'short unsigned int'\[^\n\r]*(\n|\r\n|\r)" } */
@@ -127,7 +127,7 @@
 /* { dg-output "\[^\n\r]*value -1.5 is outside the range of representable values of type 'short unsigned int'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value -1 is outside the range of representable values of type 'short unsigned int'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value nan is outside the range of representable values of type 'short unsigned int'\[^\n\r]*(\n|\r\n|\r)" } */
-/* { dg-output "\[^\n\r]*value -nan is outside the range of representable values of type 'short unsigned int'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*value -?nan is outside the range of representable values of type 'short unsigned int'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value inf is outside the range of representable values of type 'short unsigned int'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value -inf is outside the range of representable values of type 'short unsigned int'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value -2.14748e\\\+09 is outside the range of representable values of type 'int'\[^\n\r]*(\n|\r\n|\r)" } */
@@ -137,7 +137,7 @@
 /* { dg-output \[^\n\r]*value
Re: Make sure that all insn codes are in (0, LAST_INSN_CODE)
Segher Boessenkool <seg...@kernel.crashing.org> writes:
>> -  puts ("  LAST_INSN_CODE\n\
>> +  printf ("  LAST_INSN_CODE = %d\n\
>>  };\n\
>>  \n\
>> -#endif /* GCC_INSN_CODES_H */");
>> +#endif /* GCC_INSN_CODES_H */", last);
>
> You probably didn't intend to delete the newline at the end of the
> generated file?

Oops.  Updated patch below.

Thanks,
Richard

gcc/
	* gencodes.c (main): Make LAST_INSN_CODE higher than any insn
	code, rather than any named insn's code.

Index: gcc/gencodes.c
===
--- gcc/gencodes.c	2014-05-27 09:38:02.195506710 +0100
+++ gcc/gencodes.c	2014-05-27 09:39:16.002188824 +0100
@@ -50,6 +50,7 @@ gen_insn (rtx insn, int code)
 main (int argc, char **argv)
 {
   rtx desc;
+  int last = 1;
 
   progname = "gencodes";
@@ -82,13 +83,16 @@ enum insn_code {\n\
 	break;
 
       if (GET_CODE (desc) == DEFINE_INSN || GET_CODE (desc) == DEFINE_EXPAND)
-	gen_insn (desc, insn_code_number);
+	{
+	  gen_insn (desc, insn_code_number);
+	  last = insn_code_number + 1;
+	}
     }
 
-  puts ("  LAST_INSN_CODE\n\
+  printf ("  LAST_INSN_CODE = %d\n\
 };\n\
 \n\
-#endif /* GCC_INSN_CODES_H */");
+#endif /* GCC_INSN_CODES_H */\n", last);
 
   if (ferror (stdout) || fflush (stdout) || fclose (stdout))
     return FATAL_EXIT_CODE;
Re: Make sure that all insn codes are in (0, LAST_INSN_CODE)
On Tue, May 27, 2014 at 09:41:08AM +0100, Richard Sandiford wrote:
> 	* gencodes.c (main): Make LAST_INSN_CODE higher than any insn
> 	code, rather than any named insn's code.

Ok.

> --- gcc/gencodes.c	2014-05-27 09:38:02.195506710 +0100
> +++ gcc/gencodes.c	2014-05-27 09:39:16.002188824 +0100
> @@ -50,6 +50,7 @@ gen_insn (rtx insn, int code)
>  main (int argc, char **argv)
>  {
>    rtx desc;
> +  int last = 1;
> 
>    progname = "gencodes";
> @@ -82,13 +83,16 @@ enum insn_code {\n\
>  	break;
> 
>        if (GET_CODE (desc) == DEFINE_INSN || GET_CODE (desc) == DEFINE_EXPAND)
> -	gen_insn (desc, insn_code_number);
> +	{
> +	  gen_insn (desc, insn_code_number);
> +	  last = insn_code_number + 1;
> +	}
>      }
> 
> -  puts ("  LAST_INSN_CODE\n\
> +  printf ("  LAST_INSN_CODE = %d\n\
>  };\n\
>  \n\
> -#endif /* GCC_INSN_CODES_H */");
> +#endif /* GCC_INSN_CODES_H */\n", last);
> 
>    if (ferror (stdout) || fflush (stdout) || fclose (stdout))
>      return FATAL_EXIT_CODE;

	Jakub
Re: [PATCH, fortran]: Include stdlib.h in intrinsics/getcwd.c
Uros Bizjak wrote:
> ... to avoid an implicit declaration of function ‘free’ warning.
>
> 2014-05-27  Uros Bizjak  <ubiz...@gmail.com>
>
> 	* intrinsics/getcwd.c: Include <stdlib.h>.
>
> Tested on x86_64-pc-linux-gnu. OK for mainline?

OK - thanks for the patch.  (I think it also counts as obvious.)

Tobias
Re: [PATCH] Fix an AARCH64/ARM performance regression
On Tue, May 27, 2014 at 10:10 AM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: Hi, I have been informed, that the following check-in may cause a slight performance regression on aarch64 and arm on trunk and gcc-4_9-branch: See https://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=205899 This can be seen with the following test example: test.c: typedef struct { int *x; int *y; int *z; int *m; int *n; int *o; }A; typedef struct { int x; int y; int z; int m; int n; int o; }B; A a[6]; B *b; void test(int i) { A *a1 = a[i]; B *b1 = b[i]; a1-x = b1-x; a1-y = b1-y; a1-z = b1-z; a1-m = b1-m; a1-n = b1-n; a1-o = b1-o; } when compiled with aarch64-elf-gcc -O3 -S we get the following assembler code: test: adrp x1, b sxtw x3, w0 mov w5, 24 adrp x2, a addx2, x2, :lo12:a ldr x4, [x1, #:lo12:b] lsl x1, x3, 2 subx1, x1, x3 lsl x1, x1, 4 smaddl x0, w0, w5, x4 addx3, x2, x1 addx7, x0, 8 addx6, x0, 16 addx8, x0, 4 addx5, x0, 12 addx4, x0, 20 str x0, [x2, x1] mov x0, x3 mov x1, x3 str x8, [x3, 8] str x7, [x0, 16]! str x6, [x1, 32]! str x5, [x0, 8] str x4, [x1, 8] ret Note the two mov instructions, and that two extra registers are used to store the values. The mov instructions have not been there before the patch. 
After some investigation I found out how this happened: The difference is first visible with -fdump-rtl-expand-slim:

1: NOTE_INSN_DELETED
4: NOTE_INSN_BASIC_BLOCK 2
2: r83:SI=x0:SI
3: NOTE_INSN_FUNCTION_BEG
6: r74:DI=sign_extend(r83:SI)
7: r85:DI=high(`b')
8: r84:DI=r85:DI+low(`b') REG_EQUAL `b'
9: r87:SI=0x18
10: r86:DI=sign_extend(r83:SI)*sign_extend(r87:SI)
11: r88:DI=[r84:DI]
12: r76:DI=r88:DI+r86:DI
13: r90:DI=high(`a')
14: r89:DI=r90:DI+low(`a') REG_EQUAL `a'
15: r91:DI=sign_extend(r83:SI)
16: r92:DI=r91:DI
17: r93:DI=r92:DI<<0x2
18: r94:DI=r93:DI-r91:DI REG_EQUAL r91:DI*0x3
19: r95:DI=r94:DI<<0x4
20: r94:DI=r95:DI REG_EQUAL r91:DI*0x30
21: r96:DI=r89:DI+r94:DI
22: [r96:DI]=r76:DI
23: r98:DI=high(`a')
24: r97:DI=r98:DI+low(`a') REG_EQUAL `a'
25: r99:DI=sign_extend(r83:SI)
26: r100:DI=r99:DI
27: r101:DI=r100:DI<<0x2
28: r102:DI=r101:DI-r99:DI REG_EQUAL r99:DI*0x3
29: r103:DI=r102:DI<<0x4
30: r102:DI=r103:DI REG_EQUAL r99:DI*0x30
31: r104:DI=r97:DI+r102:DI
32: r105:DI=r76:DI+0x4
33: [r104:DI+0x8]=r105:DI
34: r107:DI=high(`a')
35: r106:DI=r107:DI+low(`a') REG_EQUAL `a'
36: r108:DI=sign_extend(r83:SI)
37: r109:DI=r108:DI
38: r110:DI=r109:DI<<0x2
39: r111:DI=r110:DI-r108:DI REG_EQUAL r108:DI*0x3
40: r112:DI=r111:DI<<0x4
41: r111:DI=r112:DI REG_EQUAL r108:DI*0x30
42: r113:DI=r106:DI+r111:DI
43: r114:DI=r113:DI+0x10
44: r115:DI=r76:DI+0x8
45: [r114:DI]=r115:DI
46: r117:DI=high(`a')
47: r116:DI=r117:DI+low(`a') REG_EQUAL `a'
48: r118:DI=sign_extend(r83:SI)
49: r119:DI=r118:DI
50: r120:DI=r119:DI<<0x2
51: r121:DI=r120:DI-r118:DI REG_EQUAL r118:DI*0x3
52: r122:DI=r121:DI<<0x4
53: r121:DI=r122:DI REG_EQUAL r118:DI*0x30
54: r123:DI=r116:DI+r121:DI
55: r124:DI=r123:DI+0x10
56: r125:DI=r76:DI+0xc
57: [r124:DI+0x8]=r125:DI
58: r127:DI=high(`a')
59: r126:DI=r127:DI+low(`a') REG_EQUAL `a'
60: r128:DI=sign_extend(r83:SI)
61: r129:DI=r128:DI
62: r130:DI=r129:DI<<0x2
63: r131:DI=r130:DI-r128:DI REG_EQUAL r128:DI*0x3
64: r132:DI=r131:DI<<0x4
65: r131:DI=r132:DI REG_EQUAL r128:DI*0x30
66: r133:DI=r126:DI+r131:DI
67: r134:DI=r133:DI+0x20
68: r135:DI=r76:DI+0x10
69: [r134:DI]=r135:DI
70: r137:DI=high(`a')
71: r136:DI=r137:DI+low(`a') REG_EQUAL `a'
72: r138:DI=sign_extend(r83:SI)
73: r139:DI=r138:DI
74: r140:DI=r139:DI<<0x2
75: r141:DI=r140:DI-r138:DI REG_EQUAL r138:DI*0x3
76: r142:DI=r141:DI<<0x4
77: r141:DI=r142:DI REG_EQUAL r138:DI*0x30
78: r143:DI=r136:DI+r141:DI
79: r144:DI=r143:DI+0x20
80: r145:DI=r76:DI+0x14
81: [r144:DI+0x8]=r145:DI

Note the offset +0x8 on the instructions 33, 57, 81 but not on 22, 45, 69. This artefact was not there before the patch. Well, this causes little ripples in the later rtl-passes. There is one pass that does a pretty good job at compensating
Re: [RFC] optimize x - y cmp 0 with undefined overflow
+ tree new_const + = fold_build2_loc (loc, reverse_op, TREE_TYPE (arg1), const2, const1); /* If the constant operation overflowed this can be simplified as a comparison against INT_MAX/INT_MIN. */ - if (TREE_CODE (lhs) == INTEGER_CST - && TREE_OVERFLOW (lhs)) + if (TREE_OVERFLOW (new_const)) well, either use int_const_binop above or retain the check (or use TREE_OVERFLOW_P). Bonus points for using wide-ints here and not relying on TREE_OVERFLOW. The check is useless (you get either NULL_TREE or INTEGER_CST here) but I'll use int_const_binop. + /* Transform comparisons of the form X - Y CMP 0 to X CMP Y. */ + if (TREE_CODE (arg0) == MINUS_EXPR + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg1)) any good reason for using TYPE_OVERFLOW_UNDEFINED on the type of arg1 instead of on the type of the MINUS (yes, they should match, but it really looks odd ... the overflow of the minus has to be undefined). For the sake of consistency with the X +- C1 CMP C2 case just above, but I can change both. Also for EQ_EXPR and NE_EXPR the transform is fine even when !TYPE_OVERFLOW_UNDEFINED (and we seem to perform it already somewhere). Please look where and try to add the undefined overflow case to it. Yes, but it's the same for the X +- C1 CMP C2 case, i.e. there are specific cases for X +- C1 EQ/NE C2 and X - Y EQ/NE 0 in fold_binary, so I'm not sure what you're asking. As for the VRP pieces I don't understand what is missing here (well, compare_range_with_value and/or compare_values might not handle this case? then better fix that to improve symbolic range handling in general?). Also I have a longstanding patch in my tree that does Yes, there is an explicit non-handling of symbolic ranges for PLUS_EXPR and MINUS_EXPR in VRP (extract_range_from_binary_expr_1) and the patch works around it by propagating the code instead of the ranges, which is far easier and sufficient here.
If you think that the way to go is to handle symbolic ranges for PLUS_EXPR and MINUS_EXPR instead, fine with me, I can try. -- Eric Botcazou
Re: [RFC] optimize x - y cmp 0 with undefined overflow
On Tue, May 27, 2014 at 11:25 AM, Eric Botcazou ebotca...@adacore.com wrote: + tree new_const + = fold_build2_loc (loc, reverse_op, TREE_TYPE (arg1), const2, const1); /* If the constant operation overflowed this can be simplified as a comparison against INT_MAX/INT_MIN. */ - if (TREE_CODE (lhs) == INTEGER_CST - && TREE_OVERFLOW (lhs)) + if (TREE_OVERFLOW (new_const)) well, either use int_const_binop above or retain the check (or use TREE_OVERFLOW_P). Bonus points for using wide-ints here and not relying on TREE_OVERFLOW. The check is useless (you get either NULL_TREE or INTEGER_CST here) but I'll use int_const_binop. + /* Transform comparisons of the form X - Y CMP 0 to X CMP Y. */ + if (TREE_CODE (arg0) == MINUS_EXPR + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg1)) any good reason for using TYPE_OVERFLOW_UNDEFINED on the type of arg1 instead of on the type of the MINUS (yes, they should match, but it really looks odd ... the overflow of the minus has to be undefined). For the sake of consistency with the X +- C1 CMP C2 case just above, but I can change both. Also for EQ_EXPR and NE_EXPR the transform is fine even when !TYPE_OVERFLOW_UNDEFINED (and we seem to perform it already somewhere). Please look where and try to add the undefined overflow case to it. Yes, but it's the same for the X +- C1 CMP C2 case, i.e. there are specific cases for X +- C1 EQ/NE C2 and X - Y EQ/NE 0 in fold_binary, so I'm not sure what you're asking. I'm asking to merge them (move them to fold_comparison). As for the VRP pieces I don't understand what is missing here (well, compare_range_with_value and/or compare_values might not handle this case? then better fix that to improve symbolic range handling in general?). Also I have a longstanding patch in my tree that does Yes, there is an explicit non-handling of symbolic ranges for PLUS_EXPR and MINUS_EXPR in VRP (extract_range_from_binary_expr_1) and the patch works around it by propagating the code instead of the ranges, which is far easier and sufficient here. If you think that the way to go is to handle symbolic ranges for PLUS_EXPR and MINUS_EXPR instead, fine with me, I can try. Yeah, it would be nice to see some support. The most interesting cases will be symbolic-singleton +- CST where the offset shrinks a constant offset in a symbolic A +- CST (thus we don't get into any overflow issues). Thus handling [a + 1, a + 1] - [1, 1] -> [a, a] for example. We get the offsetted singleton symbolic ranges from conditional asserts a lot. Thanks, Richard. -- Eric Botcazou
Re: [RFC] optimize x - y cmp 0 with undefined overflow
I'm asking to merge them (move them to fold_comparison). OK, I'll repost a first patch doing only the fold-const.c massaging. Yeah, it would be nice to see some support. The most interesting cases will be symbolic-singleton +- CST where the offset shrinks a constant offset in a symbolic A +- CST (thus we don't get into any overflow issues). Thus handling [a + 1, a + 1] - [1, 1] -> [a, a] for example. We get the offsetted singleton symbolic ranges from conditional asserts a lot. For the case at hand, you have [x + 1, INF] for y and you want to evaluate the range of y - x, so you really don't want to use the range of x... -- Eric Botcazou
Re: [RFC] optimize x - y cmp 0 with undefined overflow
On Tue, May 27, 2014 at 11:59 AM, Eric Botcazou ebotca...@adacore.com wrote: I'm asking to merge them (move them to fold_comparison). OK, I'll repost a first patch doing only the fold-const.c massaging. Yeah, it would be nice to see some support. The most interesting cases will be symbolic-singleton +- CST where the offset shrinks a constant offset in a symbolic A +- CST (thus we don't get into any overflow issues). Thus handling [a + 1, a + 1] - [1, 1] -> [a, a] for example. We get the offsetted singleton symbolic ranges from conditional asserts a lot. For the case at hand, you have [x + 1, INF] for y and you want to evaluate the range of y - x, so you really don't want to use the range of x... Ok. That's similar to the issue I try to address with the VRP snippet I posted yesterday. I'd say we still handle basic symbolic range ops in extract_range_from_binary_1 but in extract_range_from_binary_expr for a symbolic op0 we try to simplify it with both [op1, op1] and with the value-range of op1 until we get a non-varying range as result. Not sure if it's worth restricting that to the case where op0's value-range refers to op1 or vice versa, and eventually only use op1 symbolically then. Richard. -- Eric Botcazou
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
2014-03-01 1:40 GMT+04:00 Bernd Schmidt ber...@codesourcery.com: For your use case, I'd imagine the offload compiler would be built relatively normally as a full build with --enable-as-accelerator-for=x86_64-linux, which would install it into locations where the host will eventually be able to find it. Then the host compiler would be built with another new configure option (as yet unimplemented in my patch set) --enable-offload-targets=mic,... which would tell the host compiler about the pre-built offload target compilers. Hi Bernd, Could you please advise me how to build the offload gcc and target libraries during the build of host gcc? I see a possible way: to duplicate all modules in Makefile.def needed for regular build of gcc and libs. But it looks ugly. Is there a better solution? Thanks, -- Ilya
[PATCH] Try harder for conditional evaluation in VRP
Especially for ops with symbolic ranges it may be preferable to compare one range with an op such as in [x + 1, x + 1] > x instead of expanding the range of x on the rhs to sth unrelated. So, try harder. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard.

2014-05-27 Richard Biener rguent...@suse.de

	* tree-vrp.c (vrp_evaluate_conditional_warnv_with_ops_using_ranges): Try using literal operands when comparing value-ranges failed.

Index: gcc/tree-vrp.c
===================================================================
--- gcc/tree-vrp.c (revision 210931)
+++ gcc/tree-vrp.c (working copy)
@@ -6919,14 +6919,15 @@ vrp_evaluate_conditional_warnv_with_ops_
   vr0 = (TREE_CODE (op0) == SSA_NAME) ? get_value_range (op0) : NULL;
   vr1 = (TREE_CODE (op1) == SSA_NAME) ? get_value_range (op1) : NULL;

+  tree res = NULL_TREE;
   if (vr0 && vr1)
-    return compare_ranges (code, vr0, vr1, strict_overflow_p);
-  else if (vr0 && vr1 == NULL)
-    return compare_range_with_value (code, vr0, op1, strict_overflow_p);
-  else if (vr0 == NULL && vr1)
-    return (compare_range_with_value
+    res = compare_ranges (code, vr0, vr1, strict_overflow_p);
+  if (!res && vr0)
+    res = compare_range_with_value (code, vr0, op1, strict_overflow_p);
+  if (!res && vr1)
+    res = (compare_range_with_value
 	    (swap_tree_comparison (code), vr1, op0, strict_overflow_p));
-  return NULL;
+  return res;
 }

 /* Helper function for vrp_evaluate_conditional_warnv. */
Re: [PATCH][4/4] Always 64bit-HWI cleanups
On Tue, 27 May 2014, Richard Biener wrote: On May 27, 2014 3:58:06 AM CEST, David Edelsohn dje@gmail.com wrote: This patch (further) broke bootstrap on AIX. AIX defaults to 32 bit, although it supports 64 bit HWI. /nasfarm/edelsohn/src/src/gcc/bitmap.c: In function 'int print_statistics(bitmap_descriptor_d**, output_info*)': /nasfarm/edelsohn/src/src/gcc/bitmap.c:2168:24: error: expected ')' before 'PRId64' /nasfarm/edelsohn/src/src/gcc/bitmap.c: In function 'void dump_bitmap_statistics()': /nasfarm/edelsohn/src/src/gcc/bitmap.c:2202:15: error: expected ')' before 'PRId64' /nasfarm/edelsohn/src/src/gcc/bt-load.c: In function 'void migrate_btr_defs(reg_class, int)': /nasfarm/edelsohn/src/src/gcc/bt-load.c:1414:34: error: expected ')' before 'PRId64' /nasfarm/edelsohn/src/src/gcc/cfg.c: In function 'void dump_edge_info(FILE*, edge, int, int)': /nasfarm/edelsohn/src/src/gcc/cfg.c:489:25: error: expected ')' before 'PRId64' /nasfarm/edelsohn/src/src/gcc/cfg.c: In function 'void dump_bb_info(FILE*, basic_block, int, int, bool, bool)': /nasfarm/edelsohn/src/src/gcc/cfg.c:737:33: error: expected ')' before 'PRId64' This means aix has inttypes.h but not the standard mandated PRI?64 macros. Does it have stdint.h and therein int64_t? A possibility is to define the GCC fallbacks when the defines are not defined as opposed to only when inttypes.h is not available. Can you investigate some more the situation on aix?

Proposed patch below.

Richard.

Index: gcc/hwint.h
===================================================================
--- gcc/hwint.h (revision 210932)
+++ gcc/hwint.h (working copy)
@@ -135,21 +135,30 @@ typedef HOST_WIDE_INT __gcc_host_wide_in
 #define HOST_WIDE_INT_PRINT_HEX "%#" HOST_WIDE_INT_PRINT "x"
 #define HOST_WIDE_INT_PRINT_HEX_PURE "%" HOST_WIDE_INT_PRINT "x"

-/* Provide C99 inttypes.h style format definitions for 64bits.  */
-#ifndef HAVE_INTTYPES_H
-#undef PRId64
+/* Provide C99 inttypes.h style format definitions for 64bits.  As
+   64bit types are only conditionally supported also provide fallback
+   definitions for the 64bit variants even if inttypes.h is available.  */
+#ifndef PRId64
 #define PRId64 HOST_WIDE_INT_PRINT "d"
-#undef PRIi64
+#endif
+#ifndef PRIi64
 #define PRIi64 HOST_WIDE_INT_PRINT "i"
-#undef PRIo64
+#endif
+#ifndef PRIo64
 #define PRIo64 HOST_WIDE_INT_PRINT "o"
-#undef PRIu64
+#endif
+#ifndef PRIu64
 #define PRIu64 HOST_WIDE_INT_PRINT "u"
-#undef PRIx64
+#endif
+#ifndef PRIx64
 #define PRIx64 HOST_WIDE_INT_PRINT "x"
-#undef PRIX64
+#endif
+#ifndef PRIX64
 #define PRIX64 HOST_WIDE_INT_PRINT "X"
 #endif
+#ifndef HAVE_INTTYPES_H
+/* 32bit variants go here once we start using them.  */
+#endif

 /* Define HOST_WIDEST_FAST_INT to the widest integer type supported efficiently in hardware. (That is, the widest integer type that fits
Re: [PATCH][4/4] Always 64bit-HWI cleanups
On Tue, 27 May 2014, Jakub Jelinek wrote: On Tue, May 27, 2014 at 08:26:35AM +0200, Richard Biener wrote: /nasfarm/edelsohn/src/src/gcc/cfg.c: In function 'void dump_bb_info(FILE*, basic_block, int, int, bool, bool)': /nasfarm/edelsohn/src/src/gcc/cfg.c:737:33: error: expected ')' before 'PRId64' This means aix has inttypes.h but not the standard mandated PRI?64 macros. Does it have stdint.h and therein int64_t? A possibility is to define the GCC fallbacks when the defines are not defined as opposed to only when inttypes.h is not available. BTW, we should be prepared for C++11 udlit and use spaces around the PRI* macros, in case some compiler treats those as udlits unconditionally. Though that's not likely as it would break existing and valid code. Richard.
[PATCH] Delete temporary string within demangler even in failure cases.
Spotted that a call to demangle_template might allocate storage within a temporary string even if the call to demangle_template eventually returns failure. This will never cause the demangler to crash, but does leak memory; as a result I've not added any tests for this. Calling string_delete is safe even if nothing is allocated into the string: the string is initialised with string_init, so we know the internal pointers are NULL. I don't have gcc write privileges, so if this is approved could someone please commit for me. Thanks, Andrew

libiberty/ChangeLog

	* cplus-dem.c (do_type): Call string_delete even if the call to demangle_template fails.

---
 libiberty/cplus-dem.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/libiberty/cplus-dem.c b/libiberty/cplus-dem.c
index e948487..1c713aa 100644
--- a/libiberty/cplus-dem.c
+++ b/libiberty/cplus-dem.c
@@ -3656,7 +3656,10 @@ do_type (struct work_stuff *work, const char **mangled, string *result)
 	      string_delete (&temp);
 	    }
 	  else
-	    break;
+	    {
+	      string_delete (&temp);
+	      break;
+	    }
 	}
       else if (**mangled == 'Q')
 	{
-- 1.8.1.3
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
On 05/27/2014 12:17 PM, Ilya Verbin wrote: 2014-03-01 1:40 GMT+04:00 Bernd Schmidt ber...@codesourcery.com: For your use case, I'd imagine the offload compiler would be built relatively normally as a full build with --enable-as-accelerator-for=x86_64-linux, which would install it into locations where the host will eventually be able to find it. Then the host compiler would be built with another new configure option (as yet unimplemented in my patch set) --enable-offload-targets=mic,... which would tell the host compiler about the pre-built offload target compilers. Hi Bernd, Could you please advise me how to build the offload gcc and target libraries during the build of host gcc? There isn't a way to do this. For ptx, target libraries can't be built anyway. For your use case, I'd recommend building the offload gcc first, installing it, and then building the host gcc with --enable-offload-targets as described above. Bernd
Re: [patch,doc,avr] PR61044: Note that label differences are not supported
2014-05-26 20:47 GMT+04:00 Georg-Johann Lay a...@gjlay.de: This adds a note to the user manual that code with label differences is not supported. This is because adding an offset to a stub address as generated with gs() will in general not yield the address of the label+offset. This actually occurs only if gs() has something to do, e.g. on devices with more than 128KiB of flash; but I don't see any benefit from supporting the feature on small devices -- left out all the problems caused by relaxing and when gas resolves the label difference. Thus, add a note that it's not available on any AVR and mark the PR as invalid. Ok for trunk? Johann PR target/61044 * doc/extend.texi (Local Labels): Note that label differences are not supported for AVR. Index: doc/extend.texi Approved. Please, apply.
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
2014-05-27 14:59 GMT+04:00 Bernd Schmidt ber...@codesourcery.com: There isn't a way to do this. For ptx, target libraries can't be built anyway. For your use case, I'd recommend building the offload gcc first, installing it, and then building the host gcc with --enable-offload-targets as described above. Do I understand correctly that you recommend to build our offload gcc manually, rather than during configure/make? -- Ilya
[patch] libstdc++/61329 Add missing 'inline' in regex debug functions
Tested x86_64-linux, committed to trunk, will commit to 4.9 later today.

commit 9ea4330e4e3a7fe431206048f2e1efe5e9c28eaa
Author: Jonathan Wakely jwak...@redhat.com
Date: Tue May 27 11:48:50 2014 +0100

	PR libstdc++/61329
	* include/bits/regex_automaton.tcc (_State_base::_M_print): Add inline specifier.
	(_State_base::_M_dot): Likewise.

diff --git a/libstdc++-v3/include/bits/regex_automaton.tcc b/libstdc++-v3/include/bits/regex_automaton.tcc
index e0ac3f9..1c6cfdd 100644
--- a/libstdc++-v3/include/bits/regex_automaton.tcc
+++ b/libstdc++-v3/include/bits/regex_automaton.tcc
@@ -35,7 +35,7 @@ namespace __detail
 _GLIBCXX_BEGIN_NAMESPACE_VERSION

 #ifdef _GLIBCXX_DEBUG
-  std::ostream&
+  inline std::ostream&
   _State_base::_M_print(std::ostream& __ostr) const
   {
     switch (_M_opcode)
@@ -67,7 +67,7 @@
   }

   // Prints graphviz dot commands for state.
-  std::ostream&
+  inline std::ostream&
   _State_base::_M_dot(std::ostream& __ostr, _StateIdT __id) const
   {
     switch (_M_opcode)
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
On 05/27/2014 01:11 PM, Ilya Verbin wrote: 2014-05-27 14:59 GMT+04:00 Bernd Schmidt ber...@codesourcery.com: There isn't a way to do this. For ptx, target libraries can't be built anyway. For your use case, I'd recommend building the offload gcc first, installing it, and then building the host gcc with --enable-offload-targets as described above. Do I understand correctly that you recommend to build our offload gcc manually, rather than during configure/make? Well, configure/make it separately - build and install it first. If there was a way to generalize the Makefiles to allow target libraries to be built for multiple targets, that would be pretty good too, but it might be too much work for too little gain. Bernd
[patch,libgcc]: More fixes for PR61152
This adds runtime library exceptions for some more files that go into libgcc.c via tm.h, mostly due to v850. Ok for trunk? Johann gcc/ PR libgcc/61152 * config/dbx.h (License): Add Runtime Library Exception. * config/newlib-stdint.h (License): Same. * config/rtems.h (License): Same * config/initfini-array.h (License): Same * config/v850/v850.h (License): Same. * config/v850/v850-opts.h (License): Same * config/v850/rtems.h (License): Same. Index: config/dbx.h === --- config/dbx.h (revision 210958) +++ config/dbx.h (working copy) @@ -13,8 +13,13 @@ but WITHOUT ANY WARRANTY; without even t MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. -You should have received a copy of the GNU General Public License -along with GCC; see the file COPYING3. If not see +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see http://www.gnu.org/licenses/. */ /* This file causes gcc to prefer using DBX (stabs) debugging Index: config/newlib-stdint.h === --- config/newlib-stdint.h (revision 210958) +++ config/newlib-stdint.h (working copy) @@ -13,8 +13,13 @@ but WITHOUT ANY WARRANTY; without even t MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. -You should have received a copy of the GNU General Public License -along with GCC; see the file COPYING3. If not see +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. 
+ +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see http://www.gnu.org/licenses/. */ /* newlib uses 32-bit long in certain cases for all non-SPU Index: config/initfini-array.h === --- config/initfini-array.h (revision 210958) +++ config/initfini-array.h (working copy) @@ -14,8 +14,13 @@ or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. - You should have received a copy of the GNU General Public License - along with GCC; see the file COPYING3. If not see + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see http://www.gnu.org/licenses/. */ #ifdef HAVE_INITFINI_ARRAY_SUPPORT Index: config/rtems.h === --- config/rtems.h (revision 210958) +++ config/rtems.h (working copy) @@ -13,8 +13,13 @@ but WITHOUT ANY WARRANTY; without even t MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. -You should have received a copy of the GNU General Public License -along with GCC; see the file COPYING3. If not see +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see http://www.gnu.org/licenses/. */ /* The system headers under RTEMS are C++-aware. 
*/ Index: config/v850/v850-opts.h === --- config/v850/v850-opts.h (revision 210958) +++ config/v850/v850-opts.h (working copy) @@ -13,8 +13,13 @@ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. - You should have received a copy of the GNU General Public License - along with GCC; see the file COPYING3. If not see + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public
[PATCH] Fix float-cast test (PR testsuite/61319)
E.g. CentOS 5 doesn't define several macros in limits.h, so define them conditionally to make the test pass. Ok for trunk?

2014-05-27 Marek Polacek pola...@redhat.com

	PR testsuite/61319
	* c-c++-common/ubsan/float-cast.h: Conditionally define LLONG_MAX, LLONG_MIN, and ULLONG_MAX.

diff --git gcc/testsuite/c-c++-common/ubsan/float-cast.h gcc/testsuite/c-c++-common/ubsan/float-cast.h
index 166da8f..e76171a 100644
--- gcc/testsuite/c-c++-common/ubsan/float-cast.h
+++ gcc/testsuite/c-c++-common/ubsan/float-cast.h
@@ -1,5 +1,16 @@
 /* Various macros for -fsanitize=float-cast-overflow testing.  */

+/* E.g. on CentOS 5 these aren't defined in limits.h.  */
+#ifndef LLONG_MAX
+# define LLONG_MAX __LONG_LONG_MAX__
+#endif
+#ifndef LLONG_MIN
+# define LLONG_MIN (-__LONG_LONG_MAX__ - 1LL)
+#endif
+#ifndef ULLONG_MAX
+# define ULLONG_MAX (__LONG_LONG_MAX__ * 2ULL + 1ULL)
+#endif
+
 #define INT128_MAX (__int128) (((unsigned __int128) 1 << ((__SIZEOF_INT128__ * __CHAR_BIT__) - 1)) - 1)
 #define INT128_MIN (-INT128_MAX - 1)
 #define UINT128_MAX ((2 * (unsigned __int128) INT128_MAX) + 1)

	Marek
Re: [PATCH, fortran]: Include stdlib.h in intrinsics/getcwd.c
On Tue, May 27, 2014 at 09:44:22AM +0200, Uros Bizjak wrote: ... to avoid implicit declaration of function 'free' warning. 2014-05-27 Uros Bizjak ubiz...@gmail.com * intrinsics/getcwd.c: Include stdlib.h. Tested on x86_64-pc-linux-gnu. OK for mainline? Yes. It can also be committed to the 4.9 branch if you have the time. -- Steve
Re: [PATCH] Fix float-cast test (PR testsuite/61319)
On Tue, May 27, 2014 at 01:32:03PM +0200, Marek Polacek wrote: E.g. CentOS 5 doesn't define several macros in limits.h, so define them conditionally to make the test pass. Ok for trunk? 2014-05-27 Marek Polacek pola...@redhat.com PR testsuite/61319 * c-c++-common/ubsan/float-cast.h: Conditionally define LLONG_MAX, LLONG_MIN, and ULLONG_MAX. Ok, thanks. Jakub
[PATCH 1/3] Fix demangler testsuite crashes with CP_DEMANGLE_DEBUG defined
I tried running the demangler's testsuite with CP_DEMANGLE_DEBUG defined, but it crashed, with:

Program received signal SIGSEGV, Segmentation fault.
0x0040a8c3 in d_dump (dc=0x1, indent=12) at ../../src/libiberty/cp-demangle.c:567
567	  switch (dc->type)
(gdb) bt 3
#0 0x0040a8c3 in d_dump (dc=0x1, indent=12) at ../../src/libiberty/cp-demangle.c:567
#1 0x0040ae47 in d_dump (dc=0x7fffd098, indent=10) at ../../src/libiberty/cp-demangle.c:787
#2 0x0040ae47 in d_dump (dc=0x7fffd0c8, indent=8) at ../../src/libiberty/cp-demangle.c:787

Note dc=0x1, which is obviously a bogus pointer. This is the end of d_dump recursing for a component type that doesn't actually have subtrees:

787	  d_dump (d_left (dc), indent + 2);
788	  d_dump (d_right (dc), indent + 2);

This fixes the two cases the testsuite trips on.

libiberty/ 2014-05-26 Pedro Alves pal...@redhat.com

	* cp-demangle.c (d_dump): Handle DEMANGLE_COMPONENT_FUNCTION_PARAM and DEMANGLE_COMPONENT_NUMBER.

---
 libiberty/cp-demangle.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 68d8ee1..c0d2ffe 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -575,6 +575,9 @@ d_dump (struct demangle_component *dc, int indent)
     case DEMANGLE_COMPONENT_TEMPLATE_PARAM:
       printf ("template parameter %ld\n", dc->u.s_number.number);
       return;
+    case DEMANGLE_COMPONENT_FUNCTION_PARAM:
+      printf ("function parameter %ld\n", dc->u.s_number.number);
+      return;
     case DEMANGLE_COMPONENT_CTOR:
       printf ("constructor %d\n", (int) dc->u.s_ctor.kind);
       d_dump (dc->u.s_ctor.name, indent + 2);
@@ -760,6 +763,9 @@ d_dump (struct demangle_component *dc, int indent)
     case DEMANGLE_COMPONENT_CHARACTER:
       printf ("character '%c'\n", dc->u.s_character.character);
       return;
+    case DEMANGLE_COMPONENT_NUMBER:
+      printf ("number %ld\n", dc->u.s_number.number);
+      return;
     case DEMANGLE_COMPONENT_DECLTYPE:
       printf ("decltype\n");
       break;
-- 1.9.0
[PATCH 3/3] mangler/demangler dogfooding
This patch teaches g++ to try to demangle any symbol it generates/mangles, when checking is enabled in the build, as soon as the symbol is mangled. The idea here is to validate the demangling as close to the generator as possible. This allows catching demangler bugs/crashes much earlier in the cycle, when building libstdc++ and friends, when running g++'s testsuite, and even potentially earlier than that, when developers work on new C++11/14-and-beyond features, which influence mangling, validating against some random test that's not in the testsuite yet (and sometimes doesn't make it there either), rather than much later in production when the user is trying to debug the code, or the program tries to generate a backtrace. Both libstdc++ and the testsuite include a good set of tricky symbols to demangle, and the latter naturally includes all sorts of random, weird code full of corner cases. And, I assume g++ maintainers once in a while run WIP g++ through some piles of very complicated C++ code. It seems clear to me that ideally we should be able to do assert (demangle (mangle (symbol))); But, unfortunately, we can't yet. I built g++ and ran the testsuite with a variant of this patch that would print the mangled symbol if the demangling fails, but would otherwise continue without aborting, and I found out that: - we can't demangle ~40 symbols generated while building libstdc++ and friends. E.g.: _ZN9__gnu_cxx13new_allocatorINSt13__future_base13_State_baseV2EE9constructIS2_IEEEvPT_DpOT0_ _ZNSt15_Sp_counted_ptrIDnLN9__gnu_cxx12_Lock_policyE1EEC1ERKS3_ - we can't demangle around ~2600 (!) symbols generated while running the g++ testsuite. E.g., _Z1f1SIXadL3FooEEE _ZZ4mainENKUlRT_E_clI1SEEDaS0_ Of course, these all may well be mangler bugs rather than demangler bugs. It's possible that these are already known mangler bugs even, that have been fixed, but require a higher -fabi-version=X. I didn't investigate that. Bootstrapped and regtested on x86_64 Fedora 20.

gcc/ 2014-05-27 Pedro Alves pal...@redhat.com

	* cp/mangle.c: Include demangle.h.
	(finish_mangling_internal) [ENABLE_CHECKING]: Demangle the just mangled symbol.

---
 gcc/cp/mangle.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 4205fec..c4eb5dc 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3. If not see
 #include "target.h"
 #include "cgraph.h"
 #include "wide-int.h"
+#include "demangle.h"

 /* Debugging support.  */
@@ -3367,6 +3368,35 @@ finish_mangling_internal (const bool warn)
   /* Null-terminate the string.  */
   write_char ('\0');
+
+#if ENABLE_CHECKING
+  /* Make sure we can demangle what we just generated.  */
+  {
+    const char *str ATTRIBUTE_UNUSED;
+    const char *mangled_str;
+    int dmgl_opts;
+
+    dmgl_opts = (DMGL_VERBOSE
+		 | DMGL_ANSI
+		 | DMGL_GNU_V3
+		 | DMGL_RET_POSTFIX
+		 | DMGL_PARAMS);
+
+    mangled_str = (const char *) obstack_base (&mangle_obstack);
+    str = cplus_demangle_v3 (mangled_str, dmgl_opts);
+#if 0
+    /* XXX The above catches potential demangler crashes.  And,
+       ideally, we'd also abort if demangling fails.  However, we
+       can't do that because the demangler isn't able to demangle all
+       symbols we generate by default.  Some failures might be
+       demangler bugs, others unknown mangler bugs, and others known
+       mangler bugs fixed with a higher -fabi-version, which the
+       demangler doesn't have a workaround for.  */
+    if ((str != NULL) != (mangled_str[0] == '_' && mangled_str[1] == 'Z'))
+      internal_error ("demangling failed for: %s", mangled_str);
+#endif
+  }
+#endif
 }
-- 1.9.0
[PATCH 0/3] mangler/demangler dogfooding
There's been discussion on the GDB lists about demangler crashes bringing down GDB. Such demangler bugs should be fixed, of course. Patch 2 in this series fixes the one that is currently known. But while GDB demangles all symbols in every file it loads, always, in libstdc++ the demangler is only called in error situations, and then only to demangle the few symbols that appear in the backtrace. This means that mangler/demangler bugs can easily escape undetected until much later, when the user is trying to debug the program. So we had a discussion on the GDB lists about whether there is something that could be done to demangler testing to make it more effective in catching bugs that GDB would catch. Ideas in progress so far:

- Just throw more symbols at it once in a while. We'd like to try demangling the world once in a while. That is, demangle all symbols in a distro. Florian Weimer did this with the ELF symbols in Fedora 20 and rawhide, and didn't get any crashes, which is reassuring. We haven't yet tried extracting and demangling all DWARF symbols, but we'd like to get there too.

- Throw pseudo-random symbols at the demangler. That is, make it more robust wrt invalid input. Gary wrote a fuzz tester, and it already caught a few crashes.

- Teach g++ itself to try to demangle any symbol it generates/mangles, when checking is enabled in the build. The idea here is to validate the mangling/demangling as close to the generator as possible. Patch 3 in this series implements this. This is my favorite approach, as it catches demangler crashes much earlier in the cycle, when building libstdc++ and friends and when running g++'s testsuite, rather than much later in production when the user is trying to debug the code, or the program tries to generate a backtrace. Both libstdc++ and the testsuite include a good set of tricky symbols to demangle, and the latter naturally includes all sorts of random, weird code full of corner cases.
And, I assume g++ maintainers once in a while run WIP g++ through piles of very complicated C++ code.

Patch 2 fixes the demangler crashes I got when running the testsuite with patch 3 installed. And then patch 1 fixes crashes in the demangler debug code that I got while I was writing patch 2.

Pedro Alves (3):
  Fix demangler testsuite crashes with CP_DEMANGLE_DEBUG defined
  PR other/61321 - demangler crash on casts in template parameters
  mangler/demangler dogfooding

 gcc/cp/mangle.c                       | 30
 include/demangle.h                    |  4
 libiberty/cp-demangle.c               | 43 ---
 libiberty/cp-demint.c                 |  1 +
 libiberty/testsuite/demangle-expected | 23 +++
 5 files changed, 93 insertions(+), 8 deletions(-)
-- 
1.9.0
[PATCH 2/3] PR other/61321 - demangler crash on casts in template parameters
The fix for bug 59195 [C++ demangler handles conversion operator incorrectly], https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59195, unfortunately makes the demangler crash due to infinite recursion, in case of casts in template parameters. For example, with:

  template <int> struct A {};
  template <typename Y> void function_temp (A<sizeof ((Y) (999))>) {}
  template void function_temp<int> (A<sizeof (int)>);

The 'function_temp<int>' instantiation above mangles to:

  _Z13function_tempIiEv1AIXszcvT_Li999EEE

The demangler parses this as:

  typed name
    template
      name 'function_temp'
      template argument list
        builtin type int
    function type
      builtin type void
      argument list
        template (*)
          name 'A'
          template argument list
            unary operator
              operator sizeof
              unary operator
                cast
                  template parameter 0 (**)
                literal
                  builtin type int
                  name '999'

And after the fix for 59195, due to:

  static void
  d_print_cast (struct d_print_info *dpi, int options,
                const struct demangle_component *dc)
  {
  ...
    /* For a cast operator, we need the template parameters from
       the enclosing template in scope for processing the type.  */
    if (dpi->current_template != NULL)
      {
        dpt.next = dpi->templates;
        dpi->templates = &dpt;
        dpt.template_decl = dpi->current_template;
      }

when printing the template argument list of A (what should be sizeof (int)), the template parameter 0 (that is, T_, the '**' above) now refers to the first parameter of the template argument list of the 'A' template (the '*' above), exactly what we were already trying to print. This leads to infinite recursion, and stack exhaustion. The template parameter 0 should actually refer to the first parameter of the 'function_temp' template.

Where the comment in d_print_cast (above) says "cast operator", it's really talking about a conversion operator, like:

  struct A
  {
    template <typename U> explicit operator U ();
  };

We don't want to inject the template parameters from the enclosing template in scope when processing a cast expression, only when handling a conversion operator.
The problem is that DEMANGLE_COMPONENT_CAST is currently ambiguous, and means _both_ 'conversion operator' and 'cast expression'. Fix this by adding a new DEMANGLE_COMPONENT_CONVERSION component type, which does what DEMANGLE_COMPONENT_CAST does today, and making DEMANGLE_COMPONENT_CAST simply print its component subtree.

I think we could instead reuse DEMANGLE_COMPONENT_CAST and in d_print_comp_inner still do:

@@ -5001,9 +5013,9 @@ d_print_comp_inner (struct d_print_info *dpi, int options,
       d_print_comp (dpi, options, dc->u.s_extended_operator.name);
       return;
     case DEMANGLE_COMPONENT_CAST:
       d_append_string (dpi, "operator ");
-      d_print_cast (dpi, options, dc);
+      d_print_conversion (dpi, options, dc);
       return;

leaving the unary cast case below calling d_print_cast, but it seems to me that splitting the component types makes it easier to reason about the code.

g++'s testsuite actually generates three symbols that crash the demangler in the same way. I've added those as tests in the demangler testsuite as well. And then this fixes PR other/61233 too, which happens to be a demangler crash originally reported to GDB, at: https://sourceware.org/bugzilla/show_bug.cgi?id=16957

Bootstrapped and regtested on x86_64 Fedora 20. Also ran this through GDB's testsuite. GDB will require a small update to use DEMANGLE_COMPONENT_CONVERSION in one place it's using DEMANGLE_COMPONENT_CAST in its sources.

libiberty/
2014-05-27  Pedro Alves  pal...@redhat.com

	PR other/61321
	PR other/61233
	* demangle.h (enum demangle_component_type)
	<DEMANGLE_COMPONENT_CONVERSION>: New value.
	* cp-demangle.c (d_demangle_callback, d_make_comp): Handle
	DEMANGLE_COMPONENT_CONVERSION.
	(is_ctor_dtor_or_conversion): Handle DEMANGLE_COMPONENT_CONVERSION
	instead of DEMANGLE_COMPONENT_CAST.
	(d_operator_name): Return a DEMANGLE_COMPONENT_CONVERSION
	component if handling a conversion.
	(d_count_templates_scopes, d_print_comp_inner): Handle
	DEMANGLE_COMPONENT_CONVERSION.
	(d_print_comp_inner): Handle DEMANGLE_COMPONENT_CONVERSION
	instead of DEMANGLE_COMPONENT_CAST.
	(d_print_cast): Rename as ...
	(d_print_conversion): ... this.  Adjust comments.
	(d_print_cast): Rewrite - simply print the left subcomponent.
	* cp-demint.c (cplus_demangle_fill_component): Handle
	DEMANGLE_COMPONENT_CONVERSION.
	* testsuite/demangle-expected: Add tests.
---
 include/demangle.h      |  4
 libiberty/cp-demangle.c | 37 +++
 libiberty/cp-demint.c   |  1 +
RE: [PATCH] Fix an AARCH64/ARM performance regression
Hi,

On Tue, 27 May 2014 11:06:21, Richard Biener wrote:

  But the comment previously read

    /* A constant address in TO_RTX can have VOIDmode, we must not try
       to call force_reg for that case.  Avoid that case.  */

  and you are removing that check.  So I guess you want to retain

    GET_MODE (XEXP (to_rtx, 0)) != VOIDmode

well Richard, this code is definitely out-dated, with no test cases and no good explanation at all. Obviously force_reg has not been used in that code path for years. Actually I'd really like to clean up this code...

Furthermore nobody knows how to satisfy the condition MEM_P (to_rtx) && GET_MODE (XEXP (to_rtx, 0)) == VOIDmode. A MEM_REF referring to a SYMBOL_REF always seems to have a REG in XEXP (to_rtx, 0), which always is of type Pmode. (Which may be specific to the ARM target.) And even if the MEM_REF would directly have a pointer to a SYMBOL_REF as argument 0, it would also have pointer mode. I looked for gen_rtx_SYMBOL_REF, and they all seem to use Pmode. My feeling is that it is impossible to write an example where this condition is false, and it could even be something like a

  gcc_assert (GET_MODE (XEXP (to_rtx, 0)) != VOIDmode);

At least this condition seems not to trigger on normal SYMBOL_REFs. If you really don't like to touch that part, please let me know. I could leave it as it is, however that would not be my preference.

  or investigate why we don't need to avoid this case anymore.  In
  fact, that's probably the only change necessary, just drop the
  == BLKmode check?  Your lengthy description doesn't reveal if you
  investigated that.  It seems to me it tries to avoid adjusting
  MEM (symbol-ref) for example?  Maybe also for optimization reasons?

The example code can be modified to test the conditions.
For instance, to understand why the condition MEM_ALIGN (to_rtx) == GET_MODE_ALIGNMENT (mode1) is too strict, while MEM_ALIGN (to_rtx) >= GET_MODE_ALIGNMENT (mode1) generates better code, simply add an alignment attribute to the variable:

  A a[6] __attribute__((aligned(32)));

This did not reach the if block at all, and the generated code was using mov instructions even before my patch. But with this example we can see there is no reason to restrict this only to exactly aligned values. The proposed condition on this if statement has some similarity with the logic in store_field(). If the field is aligned in a certain way, the access will likely be a single move instruction, and it is superior code to have only one constant offset part folded together. This would naturally be different for bit-field accesses; that's why the rest of the conditions should not change.

  Adding a new comment before the block reflecting your analysis above
  would be nice as well.

Yes, I will add a comment, and post an updated version of the patch tomorrow.

Thanks
Bernd.
Re: [PATCH] Warn on and fold NULL checks against inline functions
On Tue, May 27, 2014 at 3:33 AM, Richard Biener richard.guent...@gmail.com wrote:

  On Mon, May 26, 2014 at 9:01 PM, Patrick Palka patr...@parcs.ath.cx wrote:

    Hi,

    This patch teaches the C++ frontend to warn on NULL checks against
    inline functions and it teaches the middle-end to fold NULL checks
    against inline functions.  These two things are currently done for
    non-inline C++ functions, but inline functions are exceptional in
    that the C++ frontend marks them as weak, and NULL checks against
    weak symbols cannot be folded in general because the symbol may be
    mapped to NULL at link-time.  But in the case of an inline
    function, the fact that it's a weak symbol is an implementation
    detail.  And because it is not permitted to explicitly give an
    inline function the weak attribute (see handle_weak_attribute), in
    order to achieve $SUBJECT it suffices to assume that all inline
    functions are non-null, which is what this patch does.  Bootstrap
    and regtest against x86_64-unknown-linux-gnu in progress.  Is this
    patch OK assuming no regressions?

  What about always_inline function wrappers as used in FORTIFY_SOURCE?
  IIRC the actual referenced function may be declared weak and thus the
  address can compare to NULL?  Sth like

    extern void __foo (void) __attribute__((weak,asm("foo")));
    extern inline __attribute__((always_inline,gnu_inline))
    void foo (void) { __foo (); }

    int main()
    {
      if (foo == 0)
        return 0;
      abort ();
    }

  The issue is that the inline and alias may be hidden to the user and
  thus he'll get unwanted and confusing warnings.

  Richard.

So as it stands the foo == 0 check in the above example gets folded with or without my patch. The issue here seems to be related to the use of gnu_inline. The address of an inline function marked gnu_inline shouldn't be considered non-NULL because a standalone function does not actually get emitted. Of course, one could inspect aliases, but in general it is not correct to assume such functions are non-NULL. Does this sound right?
This issue has slight overlap with the issue that my patch tries to fix. I could try to handle it as well.
Re: [patch,libgcc]: More fixes for PR61152
Georg-Johann Lay a...@gjlay.de writes:

  gcc/
  PR libgcc/61152
  * config/dbx.h (License): Add Runtime Library Exception.
  * config/newlib-stdint.h (License): Same.
  * config/rtems.h (License): Same.
  * config/initfini-array.h (License): Same.
  * config/v850/v850.h (License): Same.
  * config/v850/v850-opts.h (License): Same.
  * config/v850/rtems.h (License): Same.

This is OK.

Thanks.

Ian
Re: Create a library for tools like collect2 and lto-wrapper (2/2)
On 05/22/2014 05:56 PM, Joseph S. Myers wrote:

  On Thu, 22 May 2014, Bernd Schmidt wrote:

    The implementations of some functions like fork_execute are changed
    to those from collect2 and the calls in lto-wrapper adapted
    accordingly.  There are some minor changes in these functions: for
    example I avoid calling fatal_error, instead using the
    fatal/fatal_perror functions so that the cleanup routine is called
    properly (lto-wrapper does not use atexit as collect2 does).

  Avoiding fatal_error seems like the wrong direction.  Using the
  common diagnostics code from diagnostic.c is preferred to using
  custom code.

Hmm, honestly I would have gone the other way, as diagnostic.c seems rather unnecessarily heavyweight for the purposes of these little utilities.

  Thus, as an initial step I'd suggest instead converting lto-wrapper
  to use the common functionality where possible (fatal_error with %m
  replacing fatal_perror), making it use atexit in the process.  Don't
  move code using legacy fatal / fatal_perror functions to a common
  library.

But whatever. Here is such a patch. Bootstrapped and tested on x86_64-linux. Ok?

Bernd

	* lto-wrapper.c (fatal, fatal_perror): Remove functions.  All
	callers changed to use fatal_error.
	(main): Ensure lto_wrapper_cleanup is run atexit.

diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index f59d74e..f42b14e 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -111,43 +111,6 @@ fatal_signal (int signum)
   kill (getpid (), signum);
 }
 
-/* Just die.  CMSGID is the error message.  */
-
-static void __attribute__ ((format (printf, 1, 2)))
-fatal (const char * cmsgid, ...)
-{
-  va_list ap;
-
-  va_start (ap, cmsgid);
-  fprintf (stderr, "lto-wrapper: ");
-  vfprintf (stderr, _(cmsgid), ap);
-  fprintf (stderr, "\n");
-  va_end (ap);
-
-  lto_wrapper_cleanup ();
-  exit (FATAL_EXIT_CODE);
-}
-
-
-/* Die when sys call fails.  CMSGID is the error message.  */
-
-static void __attribute__ ((format (printf, 1, 2)))
-fatal_perror (const char *cmsgid, ...)
-{
-  int e = errno;
-  va_list ap;
-
-  va_start (ap, cmsgid);
-  fprintf (stderr, "lto-wrapper: ");
-  vfprintf (stderr, _(cmsgid), ap);
-  fprintf (stderr, ": %s\n", xstrerror (e));
-  va_end (ap);
-
-  lto_wrapper_cleanup ();
-  exit (FATAL_EXIT_CODE);
-}
-
-
 /* Execute a program, and wait for the reply.  ARGV are the arguments.
    The last one must be NULL.  */
 
@@ -174,7 +137,7 @@ collect_execute (char **argv)
 
   pex = pex_init (0, "lto-wrapper", NULL);
   if (pex == NULL)
-    fatal_perror ("pex_init failed");
+    fatal_error ("pex_init failed: %m");
 
   /* Do not use PEX_LAST here, we use our stdout for communicating with
      collect2 or the linker-plugin.  Any output from the sub-process
@@ -186,10 +149,10 @@ collect_execute (char **argv)
       if (err != 0)
 	{
 	  errno = err;
-	  fatal_perror (errmsg);
+	  fatal_error ("%s: %m", _(errmsg));
 	}
       else
-	fatal (errmsg);
+	fatal_error (errmsg);
     }
 
   return pex;
@@ -205,7 +168,7 @@ collect_wait (const char *prog, struct pex_obj *pex)
   int status;
 
   if (!pex_get_status (pex, 1, &status))
-    fatal_perror ("can't get program status");
+    fatal_error ("can't get program status: %m");
   pex_free (pex);
 
   if (status)
@@ -214,15 +177,15 @@ collect_wait (const char *prog, struct pex_obj *pex)
 	{
 	  int sig = WTERMSIG (status);
 	  if (WCOREDUMP (status))
-	    fatal ("%s terminated with signal %d [%s], core dumped",
+	    fatal_error ("%s terminated with signal %d [%s], core dumped",
 		   prog, sig, strsignal (sig));
 	  else
-	    fatal ("%s terminated with signal %d [%s]",
+	    fatal_error ("%s terminated with signal %d [%s]",
 		   prog, sig, strsignal (sig));
 	}
 
       if (WIFEXITED (status))
-	fatal ("%s returned %d exit status", prog, WEXITSTATUS (status));
+	fatal_error ("%s returned %d exit status", prog, WEXITSTATUS (status));
     }
 
   return 0;
@@ -238,7 +201,7 @@ maybe_unlink_file (const char *file)
     {
       if (unlink_if_ordinary (file)
	  && errno != ENOENT)
-	fatal_perror ("deleting LTRANS file %s", file);
+	fatal_error ("deleting LTRANS file %s: %m", file);
     }
   else if (verbose)
     fprintf (stderr, "[Leaving LTRANS %s]\n", file);
@@ -260,12 +223,12 @@ fork_execute (char **argv)
   at_args = concat ("@", args_name, NULL);
   args = fopen (args_name, "w");
   if (args == NULL)
-    fatal ("failed to open %s", args_name);
+    fatal_error ("failed to open %s", args_name);
 
   status = writeargv (&argv[1], args);
 
   if (status)
-    fatal ("could not write to temporary file %s", args_name);
+    fatal_error ("could not write to temporary file %s", args_name);
 
   fclose (args);
 
@@ -312,7 +275,7 @@ get_options_from_collect_gcc_options (const char *collect_gcc,
       do
 	{
 	  if (argv_storage[j] == '\0')
-	    fatal ("malformed COLLECT_GCC_OPTIONS");
+	    fatal_error ("malformed COLLECT_GCC_OPTIONS");
 	  else if (strncmp (&argv_storage[j], "'\\''", 4) == 0)
 	    {
 	      argv_storage[k++] = '\'';
@@ -452,8 +415,8 @@ merge_and_complain (struct
Re: RFA: cache enabled attribute by insn code
Hi, Commits 210964 and 210965 for this patch have broken GCC build on arm* targets. For instance on target arm-none-linux-gnueabi, when creating unwind-arm.o, I can see: /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra.c:1362 0x8e3bcd process_insn_for_elimination /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra-eliminations.c:1344 0x8e3bcd lra_eliminate(bool, bool) /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra-eliminations.c:1408 0x8dd753 lra_constraints(bool) /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra-constraints.c:4049 0x8cdc1a lra(_IO_FILE*) /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra.c:2332 0x8911f8 do_reload /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/ira.c:5444 0x891508 execute /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/ira.c:5605 Please submit a full bug report, On 23 May 2014 20:19, Jeff Law l...@redhat.com wrote: On 05/20/14 15:36, Richard Sandiford wrote: This is OK for the trunk (referring to the follow-up message which fixed EWRONGPATCH. Sorry, while working on the follow-up LRA patch, I realised I hadn't accounted for target changes that happen directly via target_reinit (rather than SWITCHABLE_TARGETS) and cases where reinit_regs is called to change just the register information. Both could potentially affect the enabled attribute. This version adds a recog_init function that clears the data if necessary. There are no other changes from first time. Is this still OK? Thanks for letting me know, that's a minor twiddle -- the patch is still OK. Jeff
Mark more constants readonly
I noticed that string constants built by the Fortran frontend don't set TREE_READONLY for STRING_CST (and subsequently noticed that nothing seems to set it for COMPLEX_CST). That was confusing the ptx backend I'm working on. The -fwritable-strings option for C was removed a while ago, so I tested the following patch to just set the flag unconditionally, and it passed bootstrap and test on x86_64-linux. Ok?

Bernd

	* tree.c (build_string, build_complex): Set TREE_READONLY.

diff --git a/gcc/tree.c b/gcc/tree.c
index 22b92f3..067f15a 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -1755,6 +1755,7 @@ build_string (int len, const char *str)
   memset (s, 0, sizeof (struct tree_typed));
   TREE_SET_CODE (s, STRING_CST);
   TREE_CONSTANT (s) = 1;
+  TREE_READONLY (s) = 1;
   TREE_STRING_LENGTH (s) = len;
   memcpy (s->string.str, str, len);
   s->string.str[len] = '\0';
@@ -1776,6 +1777,7 @@ build_complex (tree type, tree real, tree imag)
   TREE_IMAGPART (t) = imag;
   TREE_TYPE (t) = type ? type : build_complex_type (TREE_TYPE (real));
   TREE_OVERFLOW (t) = TREE_OVERFLOW (real) | TREE_OVERFLOW (imag);
+  TREE_READONLY (t) = 1;
   return t;
 }
Fix a function decl in gfortran
Compiling Fortran code with the ptx backend I'm working on results in assembler warnings about a mismatch between function calls and function decls. This seems to be a bug in the Fortran frontend, where a decl for _gfortran_caf_init is created with four arguments, while the library function is declared as

  _gfortran_caf_init (int *argc, char ***argv)

and calls to it are constructed (in create_main_function) with only two arguments. Bootstrapped and tested on x86_64-linux. Ok?

Bernd

diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 4a5ba1d..db46858 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -3260,7 +3260,7 @@ gfc_build_builtin_function_decls (void)
       gfor_fndecl_caf_init = gfc_build_library_function_decl (
 	get_identifier (PREFIX(caf_init)), void_type_node,
-	4, pint_type, pppchar_type, pint_type, pint_type);
+	2, pint_type, pppchar_type);
 
       gfor_fndecl_caf_finalize = gfc_build_library_function_decl (
 	get_identifier (PREFIX(caf_finalize)), void_type_node, 0);
Don't copy constants in build_constant_desc
When putting a constant into the constant pool, we make a copy of the tree in build_constant_desc. This caused me some problems with the ptx backend, which has a pass to modify the address spaces of all variables and constants to match the requirements of the backend. It would be nice for it to be able to find them all by walking the varpool and the gimple of all functions, and I ran into some testcases where these copies can slip through. After a while it occurred to me that the copy is probably a relic of old obstack times and not necessary anymore. And indeed, in 2.7.2.3 the only call to it looked like this:

  push_obstacks_nochange ();
  suspend_momentary ();
  p->exp = copy_constant (exp);
  pop_obstacks ();

It seems unlikely that anything in the compiler would want to modify a constant after it has been put into the constant pool, and even if so, it should have the responsibility of making the copy itself. So, as an experiment I've bootstrapped and tested the following patch (x86_64-linux), and that seemed to work. Ok to apply along with removal of the now unused copy_constant?

Bernd

diff --git a/gcc/varasm.c b/gcc/varasm.c
index 6781096..3469384 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -3205,7 +3205,7 @@ build_constant_desc (tree exp)
   tree decl;
 
   desc = ggc_alloc<constant_descriptor_tree> ();
-  desc->value = copy_constant (exp);
+  desc->value = exp;
 
   /* Create a string containing the label name, in LABEL.  */
   labelno = const_labelno++;
[C++ Patch] PR 57543
Hi,

here, in order to accept the code without an explicit this-> qualification, I propose to simply notice that the type of instance is current_class_type. Tested x86_64-linux. Thanks!

Paolo.

/cp
2014-05-27  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/57543
	* call.c (build_new_method_call_1): In an unevaluated context
	allow calling a member function of the same class.

/testsuite
2014-05-27  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/57543
	* g++.dg/cpp0x/decltype59.C: New.

Index: cp/call.c
===
--- cp/call.c	(revision 210956)
+++ cp/call.c	(working copy)
@@ -8000,6 +8000,20 @@ build_new_method_call_1 (tree instance, tree fns,
 	     we know we really need it.  */
 	  cand->first_arg = instance;
 	}
+      else if (cp_unevaluated_operand != 0
+	       && current_class_type
+	       && same_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (instance)),
+			       current_class_type))
+	{
+	  /* For example (c++/57543):
+
+	     template <typename> struct X
+	     {
+	       void foo();
+	       auto bar() -> decltype( X::foo() );
+	     };  */
+	  gcc_assert (cand->first_arg == instance);
+	}
       else
 	{
 	  if (complain & tf_error)
Index: testsuite/g++.dg/cpp0x/decltype59.C
===
--- testsuite/g++.dg/cpp0x/decltype59.C	(revision 0)
+++ testsuite/g++.dg/cpp0x/decltype59.C	(working copy)
@@ -0,0 +1,13 @@
+// PR c++/57543
+// { dg-do compile { target c++11 } }
+
+template <typename T> struct X
+{
+  void foo();
+  auto bar() -> decltype( X::foo() );
+};
+
+int main()
+{
+  X<int>().bar();
+}
[PATCH] Fix HWI AIX breakage
This fixes the AIX breakage by defining __STDC_FORMAT_MACROS earlier, before stdio.h on AIX gets to include inttypes.h. Committed as obvious.

Richard.

2014-05-27  Richard Biener  rguent...@suse.de

	* system.h (__STDC_FORMAT_MACROS): Define as very first thing.

Index: gcc/system.h
===
--- gcc/system.h	(revision 210972)
+++ gcc/system.h	(working copy)
@@ -22,6 +22,12 @@ along with GCC; see the file COPYING3.
 #ifndef GCC_SYSTEM_H
 #define GCC_SYSTEM_H
 
+/* Define this so that inttypes.h defines the PRI?64 macros even
+   when compiling with a C++ compiler.  Define it here so in the
+   event inttypes.h gets pulled in by another header it is already
+   defined.  */
+#define __STDC_FORMAT_MACROS
+
 /* We must include stdarg.h before stdio.h.  */
 #include <stdarg.h>
 
@@ -491,7 +497,6 @@ extern void *realloc (void *, size_t);
 #endif
 
 #ifdef HAVE_INTTYPES_H
-#define __STDC_FORMAT_MACROS
 #include <inttypes.h>
 #endif
Re: RFA: cache enabled attribute by insn code
On Tue, May 27, 2014 at 2:12 PM, Christophe Lyon christophe.l...@linaro.org wrote: Hi, Commits 210964 and 210965 for this patch have broken GCC build on arm* targets. For instance on target arm-none-linux-gnueabi, when creating unwind-arm.o, I can see: /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra.c:1362 0x8e3bcd process_insn_for_elimination /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra-eliminations.c:1344 0x8e3bcd lra_eliminate(bool, bool) /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra-eliminations.c:1408 0x8dd753 lra_constraints(bool) /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra-constraints.c:4049 0x8cdc1a lra(_IO_FILE*) /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra.c:2332 0x8911f8 do_reload /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/ira.c:5444 0x891508 execute /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/ira.c:5605 Please submit a full bug report, Can you file a proper bug report with a pre-processed file - information about configury options etc. to show what the issue is. Ramana On 23 May 2014 20:19, Jeff Law l...@redhat.com wrote: On 05/20/14 15:36, Richard Sandiford wrote: This is OK for the trunk (referring to the follow-up message which fixed EWRONGPATCH. Sorry, while working on the follow-up LRA patch, I realised I hadn't accounted for target changes that happen directly via target_reinit (rather than SWITCHABLE_TARGETS) and cases where reinit_regs is called to change just the register information. Both could potentially affect the enabled attribute. This version adds a recog_init function that clears the data if necessary. There are no other changes from first time. Is this still OK? Thanks for letting me know, that's a minor twiddle -- the patch is still OK. Jeff
Re: [PATCH] Try harder for conditional evaluation in VRP
On Tue, May 27, 2014 at 12:27 PM, Richard Biener wrote: * tree-vrp.c (vrp_evaluate_conditional_warnv_with_ops_using_ranges): Try using literal operands when comparing value-ranges failed. No test case? Ciao! Steven
Re: [PATCH] Delete temporary string within demangler even in failure cases.
On Tue, May 27, 2014 at 3:57 AM, Andrew Burgess aburg...@broadcom.com wrote: libiberty/ChangeLog * cplus-dem.c (do_type): Call string_delete even if the call to demangle_template fails. This is OK. Thanks. I have to ask: you know this code is not used, right? You're looking at the old demangler, for symbols generated by versions of g++ before GCC 3.4 (released 2004). The demangler for current versions of g++ is in cp-demangle.c. Ian
Re: [PATCH] Try harder for conditional evaluation in VRP
On Tue, 27 May 2014, Steven Bosscher wrote:

  On Tue, May 27, 2014 at 12:27 PM, Richard Biener wrote:

    * tree-vrp.c (vrp_evaluate_conditional_warnv_with_ops_using_ranges):
    Try using literal operands when comparing value-ranges failed.

  No test case?

Sorry ;)  Happens to patches I uncover in my dev tree. I'll try to come up with sth (I remember coding it when working on some PR ... but I've lost track of which one).

Richard.
Re: [PATCH 1/3] Fix demangler testsuite crashes with CP_DEMANGLE_DEBUG defined
On Tue, May 27, 2014 at 4:57 AM, Pedro Alves pal...@redhat.com wrote: libiberty/ 2014-05-26 Pedro Alves pal...@redhat.com * cp-demangle.c (d_dump): Handle DEMANGLE_COMPONENT_FUNCTION_PARAM and DEMANGLE_COMPONENT_NUMBER. This is OK. Thanks. Ian
[patch] VxWorks configuration adjustments for powerpc 8548 (e500v2)
Hello,

The VxWorks compilation of C sources using VxWorks header files needs the CPU macro to be defined. Configuration files in the compiler are set up to pick a default value depending on the cpu variant for which we are compiling. On powerpc, this is achieved with:

  #define CPP_SPEC		   \
  "%{!DCPU=*:			   \
     %{mcpu=403 : -DCPU=PPC403	;  \
       mcpu=405 : -DCPU=PPC405	;  \
       ...

in rs6000/vxworks.h. The current definition misses an entry for the e500v2 family of powerpcs, supported by VxWorks and by GCC with -mcpu=8548. The attached patch is a proposal to add the missing entry, accounting for a minor complication: the CPU value to set for e500v2 configurations differs between the regular and the AE versions of VxWorks. This is addressed by the use of an internal VXCPU_FOR_8548 configuration macro, with a default definition in rs6000/vxworks.h and an override in a vxworksae specific configuration file (rs6000/vxworksae.h).

vxworksae.h used to exist for this port. It was removed by rev 171779 for PR target/47109, as the file happened to be there only for TARGET_VERSION. This patch re-instates the file, which we'll need for other purposes in patches to be submitted later.

Tested by checking that

- the compiler builds after the patch, for both powerpc-wrs-vxworks and powerpc-wrs-vxworksae,
- the compiler produces the expected CPU defs for -mcpu=8548 in both cases,
- the behavior for other -mcpu selections is not altered.

OK for mainline?

Thanks much in advance for your feedback,

With Kind Regards,

Olivier

2014-05-27  Olivier Hainque  hain...@adacore.com

	* config/rs6000/vxworks.h (VXCPU_FOR_8548): New.  Default to
	PPC85XX.
	(CPP_SPEC): Add entry for -mcpu=8548.
	* config/rs6000/vxworksae.h: Reinstate.  Override VXCPU_FOR_8548.
	* config.gcc (powerpc-wrs-vxworksae, tm_file): Add back vxworksae.h.

vx8548.diff
Description: Binary data
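The shape of the change described above would presumably follow the existing entries in rs6000/vxworks.h; the fragment below is only a sketch of that mechanism (the exact spelling is in the attached vx8548.diff, which is not reproduced here):

```c
/* Sketch, not the attached patch: default CPU value for regular
   VxWorks, overridden in rs6000/vxworksae.h for the AE variant.  */
#ifndef VXCPU_FOR_8548
#define VXCPU_FOR_8548 "PPC85XX"
#endif

/* New arm in the CPP_SPEC cpu selection, alongside the mcpu=403/405
   entries shown above, expanding to -DCPU=<VXCPU_FOR_8548>.  */
```

Keeping the variant-specific value behind a macro means the AE header only has to redefine VXCPU_FOR_8548, not the whole CPP_SPEC.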
Re: Fix a function decl in gfortran
Bernd Schmidt wrote: Compiling Fortran code with the ptx backend I'm working on results in assembler warnings about mismatch between function calls and function decls. Bootstrapped and tested on x86_64-linux. Ok? OK. The change/bug is due to my fortran-caf - trunk patch at https://gcc.gnu.org/ml/fortran/2014-04/msg00091.html, which seemingly missed the function declaration when updating the calls and the library. Thanks! Tobias
Re: [patch i386]: Sibcall tail-call improvement and partial fix PR/60104
- Original Message -

  On 05/23/14 02:58, Kai Tietz wrote:

    Hello, yes the underlying issue is the same as for PR/46219.
    Nevertheless the patch doesn't solve this mentioned PR, as for now
    I used a pretty conservative checking of allowed memories.  By
    extending the x86_sibcall_memory_p_1 function to also allow
    register-arguments for memory, this problem can be solved.

  BTW, do you want to add 46219 to your list?

Yes, it makes sense; I put it on my list. The underlying issue is related to this issue here. I have an additional patch for fixing 46219. Sadly it causes troubles with stack-based memories in some rare cases. Still working on finding a sample to reproduce the issue outside of bootstrap. The point is that in memories, on checking for sibling-tail-calls, we still see pseudo-register variables. So we can't be sure if it requires the frame/stack-pointer or not.

  At the least, I think we should add the test from 46219 to the suite,
  xfailed if you don't tackle it as a part of this work.

Sure, I added this testcase to the testsuite-patch. I'm still waiting for comments on the accumulator-part, as there I added the stdarg-check.

  jeff
Re: [PATCH] Warn on and fold NULL checks against inline functions
On Tue, May 27, 2014 at 8:32 AM, Patrick Palka patr...@parcs.ath.cx wrote: On Tue, May 27, 2014 at 3:33 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, May 26, 2014 at 9:01 PM, Patrick Palka patr...@parcs.ath.cx wrote: Hi, This patch teaches the C++ frontend to warn on NULL checks against inline functions and it teaches the middle-end to fold NULL checks against inline functions. These two things are currently done for non-inline C++ functions, but inline functions are exceptional in that the C++ frontend marks them as weak, and NULL checks against weak symbols cannot be folded in general because the symbol may be mapped to NULL at link-time. But in the case of an inline function, the fact that it's a weak symbol is an implementation detail. And because it is not permitted to explicitly give an inline function the weak attribute (see handle_weak_attribute), in order to achieve $SUBJECT it suffices to assume that all inline functions are non-null, which is what this patch does. Bootstrap and regtest against x86_64-unknown-linux-gnu in progress. Is this patch OK assuming no regressions? What about always_inline function wrappers as used in FORTIFY_SOURCE? IIRC the actual referenced function may be declared weak and thus the address can compare to NULL? Sth like extern void __foo (void) __attribute__((weak,asm("foo"))); extern inline __attribute__((always_inline,gnu_inline)) void foo (void) { __foo (); } int main() { if (foo == 0) return 0; abort (); } The issue is that the inline and alias may be hidden to the user and thus he'll get unwanted and confusing warnings. Richard. So as it stands the foo == 0 check in the above example gets folded with or without my patch. The issue here seems to be related to the use of gnu_inline. The address of an inline function marked gnu_inline shouldn't be considered non-NULL because a standalone function does not actually get emitted. 
Of course, one could inspect aliases but in general it is not correct to assume such functions are non-NULL. Does this sound right? This issue has slight overlap with the issue that my patch tries to fix. I could try to handle it as well. Oh, disregard the above. I think I see what you mean now, Richard. Because __foo is an alias for foo, and __foo is weak, is it safe to assume that foo is non-NULL? That is a tricky question...
Re: RFA: cache enabled attribute by insn code
Christophe Lyon christophe.l...@linaro.org writes: Hi, Commits 210964 and 210965 for this patch have broken GCC build on arm* targets. Could you send me the .i? For instance on target arm-none-linux-gnueabi, when creating unwind-arm.o, I can see: /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra.c:1362 0x8e3bcd process_insn_for_elimination /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra-eliminations.c:1344 0x8e3bcd lra_eliminate(bool, bool) /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra-eliminations.c:1408 0x8dd753 lra_constraints(bool) /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra-constraints.c:4049 0x8cdc1a lra(_IO_FILE*) /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra.c:2332 0x8911f8 do_reload /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/ira.c:5444 0x891508 execute /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/ira.c:5605 Please submit a full bug report, Hmm, is this because of insn_enabled? If so, how did that work before the patch? LRA already assumed that the enabled attribute didn't depend on the operands. Does the following patch help? Thanks, Richard gcc/ * config/arm/iterators.md (shiftable_ops): New code iterator. (t2_binop0): New code attribute. * config/arm/arm.md (insn_enabled): Delete. (enabled): Remove insn_enabled test. (*arith_shiftsi): Split out... (*arith_multsi): ...this pattern and remove insn_enabled attribute. Index: gcc/config/arm/arm.md === --- gcc/config/arm/arm.md (revision 210972) +++ gcc/config/arm/arm.md (working copy) @@ -200,17 +200,9 @@ (const_string yes)] (const_string no))) -; Allows an insn to disable certain alternatives for reasons other than -; arch support. -(define_attr insn_enabled no,yes - (const_string yes)) - ; Enable all alternatives that are both arch_enabled and insn_enabled. 
(define_attr enabled no,yes - (cond [(eq_attr insn_enabled no) - (const_string no) - - (and (eq_attr predicable_short_it no) + (cond [(and (eq_attr predicable_short_it no) (and (eq_attr predicated yes) (match_test arm_restrict_it))) (const_string no) @@ -9876,39 +9868,38 @@ ;; Patterns to allow combination of arithmetic, cond code and shifts -(define_insn *arith_shiftsi - [(set (match_operand:SI 0 s_register_operand =r,r,r,r) -(match_operator:SI 1 shiftable_operator - [(match_operator:SI 3 shift_operator - [(match_operand:SI 4 s_register_operand r,r,r,r) - (match_operand:SI 5 shift_amount_operand M,M,M,r)]) - (match_operand:SI 2 s_register_operand rk,rk,r,rk)]))] +;; Separate out the (mult ...) case, since it doesn't allow register operands. +(define_insn *arith_multsi + [(set (match_operand:SI 0 s_register_operand =r,r) +(shiftable_ops:SI + (match_operator:SI 2 mult_operator + [(match_operand:SI 3 s_register_operand r,r) +(match_operand:SI 4 power_of_two_operand)]) + (match_operand:SI 1 s_register_operand rk,t2_binop0)))] TARGET_32BIT - %i1%?\\t%0, %2, %4%S3 + %i1%?\\t%0, %1, %3%S2 [(set_attr predicable yes) (set_attr predicable_short_it no) (set_attr shift 4) - (set_attr arch a,t2,t2,a) - ;; Thumb2 doesn't allow the stack pointer to be used for - ;; operand1 for all operations other than add and sub. In this case - ;; the minus operation is a candidate for an rsub and hence needs - ;; to be disabled. - ;; We have to make sure to disable the fourth alternative if - ;; the shift_operator is MULT, since otherwise the insn will - ;; also match a multiply_accumulate pattern and validate_change - ;; will allow a replacement of the constant with a register - ;; despite the checks done in shift_operator. 
- (set_attr_alternative insn_enabled -[(const_string yes) - (if_then_else - (match_operand:SI 1 add_operator ) - (const_string yes) (const_string no)) - (const_string yes) - (if_then_else - (match_operand:SI 3 mult_operator ) - (const_string no) (const_string yes))]) - (set_attr type alu_shift_imm,alu_shift_imm,alu_shift_imm,alu_shift_reg)]) + (set_attr arch a,t2) + (set_attr type alu_shift_imm,alu_shift_imm)]) +;; Handles cases other than the above. +(define_insn *arith_shiftsi + [(set (match_operand:SI 0 s_register_operand =r,r,r) +(shiftable_ops:SI + (match_operator:SI 2 shift_operator +[(match_operand:SI 3 s_register_operand r,r,r) + (match_operand:SI 4 shift_amount_operand M,M,r)]) +
Re: [PATCH] Redesign jump threading profile updates
On Thu, Apr 17, 2014 at 6:23 AM, Teresa Johnson tejohn...@google.com wrote: On Wed, Apr 16, 2014 at 10:39 PM, Jeff Law l...@redhat.com wrote: On 03/26/14 17:44, Teresa Johnson wrote: Recently I discovered that the profile updates being performed by jump threading were incorrect in many cases, particularly in the case where the threading path contains a joiner. Some of the duplicated blocks/edges were not getting any counts, leading to incorrect function splitting and other downstream optimizations, and there were other insanities as well. After making a few attempts to fix the handling I ended up completely redesigning the profile update code, removing a few places throughout the code where it was attempting to do some updates. The profile updates in that code is a mess. It's never really been looked at in any systematic way, what's there is ad-hoc and usually in response to someone mentioning the profile data was incorrectly updated. As we rely more and more on that data the ad-hoc updating is going to cause us more and more pain. So any work in this space is going to be greatly appreciated. I'll have to look at this in some detail. But I wanted you to know I was aware of the work and it's in my queue. Great, thanks for the update! I realize that it is not a trivial change so it would take some time to get through. Hopefully it should address the ongoing profile fixup issues. Teresa Ping. Thanks, Teresa Thanks! jeff -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: RFA: cache enabled attribute by insn code
Richard Sandiford rsand...@linux.vnet.ibm.com writes: Does the following patch help? Bah, it won't of course: %i1 needs to be the operator. Richard
RE: [PATCH][ARM] FAIL: gcc.target/arm/pr58041.c scan-assembler ldrb
On Tue, 13 Aug 2013, Kyrylo Tkachov wrote: On 08/09/13 11:01, Julian Brown wrote: On Thu, 8 Aug 2013 15:44:17 +0100 Kyrylo Tkachov kyrylo.tkac...@arm.com wrote: Hi all, The recently added gcc.target/arm/pr58041.c test exposed a bug in the backend. When compiling for NEON and with -mno-unaligned-access we end up generating the vld1.64 and vst1.64 instructions instead of doing the accesses one byte at a time like -mno-unaligned-access expects. This patch fixes that by enabling the NEON expander and insns that produce these instructions only when unaligned accesses are allowed. Bootstrapped on arm-linux-gnueabihf. Tested arm-none-eabi on qemu. Ok for trunk and 4.8? I'm not sure if this is right, FWIW -- do the instructions in question trap if the CPU is set to disallow unaligned accesses? I thought that control bit only affected ARM core loads stores, not NEON ones. Thinking again - the ARM-ARM says - the alignment check is for element size, so an alternative might be to use vld1.8 instead to allow for this at which point we might as well do something else with the test. I note that these patterns are not allowed for BYTES_BIG_ENDIAN so that might be a better alternative than completely disabling it. Looking at the section on unaligned accesses, it seems that the ldrb/strb-class instructions are the only ones that are unaffected by the SCTLR.A bit and do not produce alignment faults in any case. The NEON load/store instructions, including vld1.8 can still cause an alignment fault when SCTLR.A is set. So it seems we can only use the byte-wise core memory instructions for unaligned data. This change however has regressed gcc.dg/vect/vect-72.c on the arm-linux-gnueabi target, -march=armv5te, in particular in 4.8. Beforehand the code fragment in question produced was: .L14: sub r1, r3, #16 add r3, r3, #16 vld1.8 {q8}, [r1] cmp r3, r0 vst1.64 {d16-d17}, [r2:64]! 
bne .L14 Afterwards it is: .L14: vldr d16, [r3, #-16] vldr d17, [r3, #-8] add r3, r3, #16 cmp r3, r1 vst1.64 {d16-d17}, [r2:64]! bne .L14 and the second VLDR instruction traps with SIGILL (the value in R3 is 0x10b29, odd as you'd expect, pointing into `ib'). I don't know why, and especially why only the second of the two (regrettably I've been unable to track down an instruction reference detailed enough to specify what exceptions VLDR can produce and under what conditions). Interestingly enough the trap does not happen when the program is single-stepped under GDB (via gdbserver), however it then aborts once this copy loop has completed, as `ia' contains rubbish and fails the test. Is there a fix that needs backporting to 4.8, or is this an issue that was unknown so far? Hardware and Linux used: $ cat /proc/cpuinfo Processor : ARMv7 Processor rev 2 (v7l) processor : 0 BogoMIPS: 2013.49 processor : 1 BogoMIPS: 1963.08 Features: swp half thumb fastmult vfp edsp thumbee neon vfpv3 CPU implementer : 0x41 CPU architecture: 7 CPU variant : 0x1 CPU part: 0xc09 CPU revision: 2 Hardware: OMAP4430 Panda Board Revision: 0020 Serial : $ uname -a Linux panda2 2.6.35-903-omap4 #14-Ubuntu SMP PREEMPT Wed Oct 6 17:23:24 UTC 2010 armv7l GNU/Linux $ Maciej
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On 27 May 2014 09:23, Thomas Preud'homme thomas.preudho...@arm.com wrote: Hi Christophe and Andreas, From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme I suspect it's the same kind of problem m68k ran into. I already wrote a patch to reduce the impact of different bitfield layouts and it's in review right now. I'll send it to you tomorrow for testing. Can you both try the attached patch to see if it solves the bug you experienced? Andreas, I wrote it based on the correction you did, so it should at least work for m68k. It still doesn't work on armeb. Christophe.
Re: [PATCH][ARM] FAIL: gcc.target/arm/pr58041.c scan-assembler ldrb
On Tue, May 27, 2014 at 3:31 PM, Maciej W. Rozycki ma...@codesourcery.com wrote: On Tue, 13 Aug 2013, Kyrylo Tkachov wrote: On 08/09/13 11:01, Julian Brown wrote: On Thu, 8 Aug 2013 15:44:17 +0100 Kyrylo Tkachov kyrylo.tkac...@arm.com wrote: Hi all, The recently added gcc.target/arm/pr58041.c test exposed a bug in the backend. When compiling for NEON and with -mno-unaligned-access we end up generating the vld1.64 and vst1.64 instructions instead of doing the accesses one byte at a time like -mno-unaligned-access expects. This patch fixes that by enabling the NEON expander and insns that produce these instructions only when unaligned accesses are allowed. Bootstrapped on arm-linux-gnueabihf. Tested arm-none-eabi on qemu. Ok for trunk and 4.8? I'm not sure if this is right, FWIW -- do the instructions in question trap if the CPU is set to disallow unaligned accesses? I thought that control bit only affected ARM core loads stores, not NEON ones. Thinking again - the ARM-ARM says - the alignment check is for element size, so an alternative might be to use vld1.8 instead to allow for this at which point we might as well do something else with the test. I note that these patterns are not allowed for BYTES_BIG_ENDIAN so that might be a better alternative than completely disabling it. Looking at the section on unaligned accesses, it seems that the ldrb/strb-class instructions are the only ones that are unaffected by the SCTLR.A bit and do not produce alignment faults in any case. The NEON load/store instructions, including vld1.8 can still cause an alignment fault when SCTLR.A is set. So it seems we can only use the byte-wise core memory instructions for unaligned data. This change however has regressed gcc.dg/vect/vect-72.c on the arm-linux-gnueabi target, -march=armv5te, in particular in 4.8. And what are all the configure flags you are using in case some one has to reproduce this issue ? 
Beforehand the code fragment in question produced was: .L14: sub r1, r3, #16 add r3, r3, #16 vld1.8 {q8}, [r1] vld1 allows a misaligned load. cmp r3, r0 vst1.64 {d16-d17}, [r2:64]! bne .L14 Afterwards it is: .L14: vldr d16, [r3, #-16] vldr d17, [r3, #-8] add r3, r3, #16 cmp r3, r1 vst1.64 {d16-d17}, [r2:64]! bne .L14 and the second VLDR instruction traps with SIGILL (the value in R3 is 0x10b29, odd as you'd expect, pointing into `ib'). I don't know why, and especially why only the second of the two (regrettably I've been unable to track down an instruction reference detailed enough to specify what exceptions VLDR can produce and under what conditions). vldr will cause an unaligned access fault if the address is misaligned. The question is why the address is misaligned in this case. Is there a fix that needs backporting to 4.8, or is this an issue that was unknown so far? I haven't seen an issue with this so far. Ramana Hardware and Linux used: $ cat /proc/cpuinfo Processor : ARMv7 Processor rev 2 (v7l) processor : 0 BogoMIPS: 2013.49 processor : 1 BogoMIPS: 1963.08 Features: swp half thumb fastmult vfp edsp thumbee neon vfpv3 CPU implementer : 0x41 CPU architecture: 7 CPU variant : 0x1 CPU part: 0xc09 CPU revision: 2 Hardware: OMAP4430 Panda Board Revision: 0020 Serial : $ uname -a Linux panda2 2.6.35-903-omap4 #14-Ubuntu SMP PREEMPT Wed Oct 6 17:23:24 UTC 2010 armv7l GNU/Linux $ Maciej
Re: [PATCH] Try harder for conditional evaluation in VRP
On Tue, 27 May 2014, Richard Biener wrote: On Tue, 27 May 2014, Steven Bosscher wrote: On Tue, May 27, 2014 at 12:27 PM, Richard Biener wrote: * tree-vrp.c (vrp_evaluate_conditional_warnv_with_ops_using_ranges): Try using literal operands when comparing value-ranges failed. No test case? Sorry ;) Happens to patches I uncover in my dev tree. I'll try to come up with sth (I remember coding it when working on some PR ... but I've lost track of which one). Here is one (yeah, a bit artificial ...). Committed. Richard. 2014-05-27 Richard Biener rguent...@suse.de * gcc.dg/tree-ssa/vrp92.c: New testcase. Index: gcc/testsuite/gcc.dg/tree-ssa/vrp92.c === --- gcc/testsuite/gcc.dg/tree-ssa/vrp92.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/vrp92.c (working copy) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-vrp1-details" } */ + +void bar (void); +int foo (int i, int j) +{ + int res = 1; + if (i < j) +{ + /* We should be able to simplify the following conditional + during propagation. */ + if (i > j) + res = 0; +} + /* And compute res as having a value-range of [1,1]. */ + if (res) +return i; + return j; +} + +/* { dg-final { scan-tree-dump "res_.: \\[1, 1\\]" "vrp1" } } */ +/* { dg-final { scan-tree-dump-not "Threaded" "vrp1" } } */ +/* { dg-final { cleanup-tree-dump "vrp1" } } */
[PATCH] PR 41936 Memory leakage with allocatables and user-defined operators
This patch fixes a memory leakage with allocatables and user-defined operators. It is basically Mikael's patch, adjusted to take into account Tobias' comments. The patch fixes the memory leak in the test gfortran.dg/class_array_15.f03 and I have added a check for it. In the PR I asked whether this is sufficient or if I should also include the test in comment 0. Any opinion? Cheers, Dominique 2014-05-27 Dominique d'Humieres domi...@lps.ens.fr Mikael Morin mik...@gcc.gnu.org PR fortran/41936 * trans-expr.c (gfc_conv_expr_reference): Deallocate array components. 2014-05-27 Dominique d'Humieres domi...@lps.ens.fr Mikael Morin mik...@gcc.gnu.org PR fortran/41936 * gfortran.dg/class_array_15.f03: Check memory leaks. diff -up ../_clean/gcc/fortran/trans-expr.c gcc/fortran/trans-expr.c --- ../_clean/gcc/fortran/trans-expr.c 2014-05-22 10:05:53.0 +0200 +++ gcc/fortran/trans-expr.c 2014-05-25 00:50:21.0 +0200 @@ -6506,6 +6506,20 @@ gfc_conv_expr_reference (gfc_se * se, gf /* Take the address of that value. */ se->expr = gfc_build_addr_expr (NULL_TREE, var); + if (expr->ts.type == BT_DERIVED && expr->rank + && !gfc_is_finalizable (expr->ts.u.derived, NULL) + && expr->ts.u.derived->attr.alloc_comp + && expr->expr_type != EXPR_VARIABLE) +{ + tree tmp; + + tmp = build_fold_indirect_ref_loc (input_location, se->expr); + tmp = gfc_deallocate_alloc_comp (expr->ts.u.derived, tmp, expr->rank); + + /* The components shall be deallocated before + their containing entity. */ + gfc_prepend_expr_to_block (&se->post, tmp); +} } diff -up ../_clean/gcc/testsuite/gfortran.dg/class_array_15.f03 gcc/testsuite/gfortran.dg/class_array_15.f03 --- ../_clean/gcc/testsuite/gfortran.dg/class_array_15.f03 2013-01-06 22:34:50.0 +0100 +++ gcc/testsuite/gfortran.dg/class_array_15.f03 2014-05-04 10:24:06.0 +0200 @@ -1,4 +1,5 @@ ! { dg-do run } +! { dg-options "-fdump-tree-original" } ! ! Tests the fixes for three bugs with the same underlying cause. All are regressions ! 
that come about because class array elements end up with a different tree type @@ -114,3 +115,5 @@ subroutine pr54992 ! This test remains bh = bhGet(b,instance=2) if (loc (b) .ne. loc(bh%hostNode)) call abort end +! { dg-final { scan-tree-dump-times "builtin_free" 12 "original" } } +! { dg-final { cleanup-tree-dump "original" } }
Re: [PATCH] Warn on and fold NULL checks against inline functions
On Tue, May 27, 2014 at 4:06 PM, Patrick Palka patr...@parcs.ath.cx wrote: On Tue, May 27, 2014 at 8:32 AM, Patrick Palka patr...@parcs.ath.cx wrote: On Tue, May 27, 2014 at 3:33 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, May 26, 2014 at 9:01 PM, Patrick Palka patr...@parcs.ath.cx wrote: Hi, This patch teaches the C++ frontend to warn on NULL checks against inline functions and it teaches the middle-end to fold NULL checks against inline functions. These two things are currently done for non-inline C++ functions, but inline functions are exceptional in that the C++ frontend marks them as weak, and NULL checks against weak symbols cannot be folded in general because the symbol may be mapped to NULL at link-time. But in the case of an inline function, the fact that it's a weak symbol is an implementation detail. And because it is not permitted to explicitly give an inline function the weak attribute (see handle_weak_attribute), in order to achieve $SUBJECT it suffices to assume that all inline functions are non-null, which is what this patch does. Bootstrap and regtest against x86_64-unknown-linux-gnu in progress. Is this patch OK assuming no regressions? What about always_inline function wrappers as used in FORTIFY_SOURCE? IIRC the actual referenced function may be declared weak and thus the address can compare to NULL? Sth like extern void __foo (void) __attribute__((weak,asm("foo"))); extern inline __attribute__((always_inline,gnu_inline)) void foo (void) { __foo (); } int main() { if (foo == 0) return 0; abort (); } The issue is that the inline and alias may be hidden to the user and thus he'll get unwanted and confusing warnings. Richard. So as it stands the foo == 0 check in the above example gets folded with or without my patch. The issue here seems to be related to the use of gnu_inline. The address of an inline function marked gnu_inline shouldn't be considered non-NULL because a standalone function does not actually get emitted. 
Of course, one could inspect aliases, but in general it is not correct to assume such functions are non-NULL. Does this sound right? This issue has slight overlap with the issue that my patch tries to fix. I could try to handle it as well. Oh, disregard the above. I think I see what you mean now, Richard. Because __foo is an alias for foo, and __foo is weak, is it safe to assume that foo is non-NULL? That is a tricky question... No (I wasn't making a folding-validity remark). I want to make sure we won't see diagnostics on code like #include <string.h> int main() { if (malloc != 0) ... } just because it is built with -D_FORTIFY_SOURCE. Whether we may fold that check at all is another question, of course (and again that should _not_ differ between -D_FORTIFY_SOURCE or -U_FORTIFY_SOURCE, but it may well also be a glibc issue). And yes, aliases are tricky beasts ... Richard.
Re: Mark more constants readonly
On Tue, May 27, 2014 at 3:13 PM, Bernd Schmidt ber...@codesourcery.com wrote: I noticed that string constants built by the Fortran frontend don't set TREE_READONLY for STRING_CST (and subsequently noticed that nothing seems to set it for COMPLEX_CST). That was confusing the ptx backend I'm working on. The -fwritable-strings option for C was removed a while ago, so I tested the following patch to just set the flag unconditionally, and passed bootstrap and test on x86_64-linux. Ok? Hmm? Not so obvious. TREE_READONLY has no purpose on tcc_constant nodes if I read documentation correctly (tree.h). In fact base.readonly_flag is unused for tcc_constant (and would be redundant with TREE_CONSTANT). Richard. Bernd
Re: RFA: cache enabled attribute by insn code
On 27 May 2014 15:37, Ramana Radhakrishnan ramana@googlemail.com wrote: On Tue, May 27, 2014 at 2:12 PM, Christophe Lyon christophe.l...@linaro.org wrote: Hi, Commits 210964 and 210965 for this patch have broken GCC build on arm* targets. For instance on target arm-none-linux-gnueabi, when creating unwind-arm.o, I can see: /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra.c:1362 0x8e3bcd process_insn_for_elimination /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra-eliminations.c:1344 0x8e3bcd lra_eliminate(bool, bool) /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra-eliminations.c:1408 0x8dd753 lra_constraints(bool) /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra-constraints.c:4049 0x8cdc1a lra(_IO_FILE*) /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/lra.c:2332 0x8911f8 do_reload /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/ira.c:5444 0x891508 execute /tmp/5673443_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/ira.c:5605 Please submit a full bug report, Can you file a proper bug report with a pre-processed file - information about configury options etc. to show what the issue is. Filed PR 61331. Thanks.
Re: Don't copy constants in build_constant_desc
On Tue, May 27, 2014 at 3:26 PM, Bernd Schmidt ber...@codesourcery.com wrote: When putting a constant into the constant pool, we make a copy of the tree in build_constant_desc. This caused me some problems with the ptx backend, which has a pass to modify the address spaces of all variables and constants to match the requirements of the backend. It would be nice for it to be able to find them all by walking the varpool and the gimple of all functions, and I ran into some testcases where these copies can slip through. After a while it occurred to me that the copy is probably a relic of old obstack times and not necessary anymore. And indeed, in 2.7.2.3 the only call to it looked like this: push_obstacks_nochange (); suspend_momentary (); p-exp = copy_constant (exp); pop_obstacks (); It seems unlikely that anything in the compiler would want to modify a constant after it has been put into the constant pool, and even if so, it should have the responsibility of making the copy itself. So, as an experiment I've bootstrapped and tested the following patch (x86_64-linux), and that seemed to work. Ok to apply along with removal of the now unused copy_constant? Ok if nobody raises issues within 24h. Thanks, Richard. Bernd
Re: [PATCH, fortran]: Include stdlib.h in intrinsics/getcwd.c
On Tue, May 27, 2014 at 1:37 PM, Steve Kargl s...@troutmask.apl.washington.edu wrote: ... to avoid an implicit declaration of function 'free' warning. 2014-05-27 Uros Bizjak ubiz...@gmail.com * intrinsics/getcwd.c: Include stdlib.h. It can also be committed to the 4.9 branch if you have the time. There is no need for the stdlib.h include in the 4.9 branch; the call to free was introduced in 4.10. Uros.
Re: RFA: cache enabled attribute by insn code
Richard Sandiford rdsandif...@googlemail.com writes: Richard Sandiford rsand...@linux.vnet.ibm.com writes: Does the following patch help? Bah, it won't of course: %i1 needs to be the operator. Here's v2. I tested that it worked for simple tests like: int f1 (int x, int y) { return x + (y << 4); } int f2 (int x, int y) { return x - (y << 4); } int f3 (int x, int y) { return x & (y << 4); } int f4 (int x, int y) { return x | (y << 4); } int f5 (int x, int y) { return x ^ (y << 4); } int f6 (int x, int y) { return (y << 4) - x; } int g1 (int x, int y, int z) { return x + (y << z); } int g2 (int x, int y, int z) { return x - (y << z); } int g3 (int x, int y, int z) { return x & (y << z); } int g4 (int x, int y, int z) { return x | (y << z); } int g5 (int x, int y, int z) { return x ^ (y << z); } int g6 (int x, int y, int z) { return (y << z) - x; } as well as the testcase. Thanks, Richard gcc/ * config/arm/iterators.md (shiftable_ops): New code iterator. (t2_binop0, arith_shift_insn): New code attributes. * config/arm/arm.md (insn_enabled): Delete. (enabled): Remove insn_enabled test. (*arith_shiftsi): Split out... (*arith_multsi): ...this pattern and remove insn_enabled attribute. Index: gcc/config/arm/arm.md === --- gcc/config/arm/arm.md (revision 210972) +++ gcc/config/arm/arm.md (working copy) @@ -200,17 +200,9 @@ (const_string yes)] (const_string no))) -; Allows an insn to disable certain alternatives for reasons other than -; arch support. -(define_attr insn_enabled no,yes - (const_string yes)) - ; Enable all alternatives that are both arch_enabled and insn_enabled. 
(define_attr enabled no,yes - (cond [(eq_attr insn_enabled no) - (const_string no) - - (and (eq_attr predicable_short_it no) + (cond [(and (eq_attr predicable_short_it no) (and (eq_attr predicated yes) (match_test arm_restrict_it))) (const_string no) @@ -9876,39 +9868,38 @@ ;; Patterns to allow combination of arithmetic, cond code and shifts -(define_insn *arith_shiftsi - [(set (match_operand:SI 0 s_register_operand =r,r,r,r) -(match_operator:SI 1 shiftable_operator - [(match_operator:SI 3 shift_operator - [(match_operand:SI 4 s_register_operand r,r,r,r) - (match_operand:SI 5 shift_amount_operand M,M,M,r)]) - (match_operand:SI 2 s_register_operand rk,rk,r,rk)]))] +;; Separate out the (mult ...) case, since it doesn't allow register operands. +(define_insn *arith_multsi + [(set (match_operand:SI 0 s_register_operand =r,r) +(shiftable_ops:SI + (match_operator:SI 2 mult_operator + [(match_operand:SI 3 s_register_operand r,r) +(match_operand:SI 4 power_of_two_operand)]) + (match_operand:SI 1 s_register_operand rk,t2_binop0)))] TARGET_32BIT - %i1%?\\t%0, %2, %4%S3 + arith_shift_insn%?\\t%0, %1, %3%S2 [(set_attr predicable yes) (set_attr predicable_short_it no) (set_attr shift 4) - (set_attr arch a,t2,t2,a) - ;; Thumb2 doesn't allow the stack pointer to be used for - ;; operand1 for all operations other than add and sub. In this case - ;; the minus operation is a candidate for an rsub and hence needs - ;; to be disabled. - ;; We have to make sure to disable the fourth alternative if - ;; the shift_operator is MULT, since otherwise the insn will - ;; also match a multiply_accumulate pattern and validate_change - ;; will allow a replacement of the constant with a register - ;; despite the checks done in shift_operator. 
- (set_attr_alternative insn_enabled -[(const_string yes) - (if_then_else - (match_operand:SI 1 add_operator ) - (const_string yes) (const_string no)) - (const_string yes) - (if_then_else - (match_operand:SI 3 mult_operator ) - (const_string no) (const_string yes))]) - (set_attr type alu_shift_imm,alu_shift_imm,alu_shift_imm,alu_shift_reg)]) + (set_attr arch a,t2) + (set_attr type alu_shift_imm,alu_shift_imm)]) +;; Handles cases other than the above. +(define_insn *arith_shiftsi + [(set (match_operand:SI 0 s_register_operand =r,r,r) +(shiftable_ops:SI + (match_operator:SI 2 shift_operator +[(match_operand:SI 3 s_register_operand r,r,r) + (match_operand:SI 4 shift_amount_operand M,M,r)]) + (match_operand:SI 1 s_register_operand rk,t2_binop0,rk)))] + TARGET_32BIT GET_CODE (operands[3]) != MULT + arith_shift_insn%?\\t%0, %1, %3%S2 + [(set_attr predicable yes) + (set_attr predicable_short_it no) + (set_attr shift 4) + (set_attr arch a,t2,a) + (set_attr type alu_shift_imm,alu_shift_imm,alu_shift_reg)]) + (define_split [(set
Re: RFA: cache enabled attribute by insn code
On 27/05/14 15:08, Richard Sandiford wrote: Hmm, is this because of insn_enabled? If so, how did that work before the patch? LRA already assumed that the enabled attribute didn't depend on the operands. Huh! enabled can be applied to each alternative. Alternatives are selected based on the operands. If LRA can't cope with that we have a serious problem. In fact, a pattern that matches but has no enabled alternatives is meaningless and guaranteed to cause problems during register allocation. R.
Re: [PATCH][ARM] FAIL: gcc.target/arm/pr58041.c scan-assembler ldrb
On 27/05/14 15:31, Maciej W. Rozycki wrote: On Tue, 13 Aug 2013, Kyrylo Tkachov wrote: On 08/09/13 11:01, Julian Brown wrote: On Thu, 8 Aug 2013 15:44:17 +0100 Kyrylo Tkachov kyrylo.tkac...@arm.com wrote: Hi all, The recently added gcc.target/arm/pr58041.c test exposed a bug in the backend. When compiling for NEON and with -mno-unaligned-access we end up generating the vld1.64 and vst1.64 instructions instead of doing the accesses one byte at a time like -mno-unaligned-access expects. This patch fixes that by enabling the NEON expander and insns that produce these instructions only when unaligned accesses are allowed. Bootstrapped on arm-linux-gnueabihf. Tested arm-none-eabi on qemu. Ok for trunk and 4.8? I'm not sure if this is right, FWIW -- do the instructions in question trap if the CPU is set to disallow unaligned accesses? I thought that control bit only affected ARM core loads stores, not NEON ones. Thinking again - the ARM-ARM says - the alignment check is for element size, so an alternative might be to use vld1.8 instead to allow for this at which point we might as well do something else with the test. I note that these patterns are not allowed for BYTES_BIG_ENDIAN so that might be a better alternative than completely disabling it. Looking at the section on unaligned accesses, it seems that the ldrb/strb-class instructions are the only ones that are unaffected by the SCTLR.A bit and do not produce alignment faults in any case. The NEON load/store instructions, including vld1.8 can still cause an alignment fault when SCTLR.A is set. So it seems we can only use the byte-wise core memory instructions for unaligned data. This change however has regressed gcc.dg/vect/vect-72.c on the arm-linux-gnueabi target, -march=armv5te, in particular in 4.8. Beforehand the code fragment in question produced was: .L14: sub r1, r3, #16 add r3, r3, #16 vld1.8 {q8}, [r1] cmp r3, r0 vst1.64 {d16-d17}, [r2:64]! 
bne .L14 Afterwards it is: .L14: vldr d16, [r3, #-16] vldr d17, [r3, #-8] add r3, r3, #16 cmp r3, r1 vst1.64 {d16-d17}, [r2:64]! bne .L14 and the second VLDR instruction traps with SIGILL (the value in R3 is 0x10b29, odd as you'd expect, pointing into `ib'). I don't know why and especially why only the second of the two (regrettably I've been unable to track down an instruction reference that'd be detailed enough to specify what exceptions VLDR can produce and under what conditions). SIGILL means undefined instruction. It does not mean misaligned address (unless the kernel is messed up) -- that would be SIGBUS. Interestingly enough the trap does not happen when the program is single-stepped under GDB (via gdbserver), however it then aborts once this copy loop has completed as `ia' contains rubbish and fails the test. Is there a fix that needs backporting to 4.8 or is this an issue that was unknown so far? Hardware and Linux used: $ cat /proc/cpuinfo Processor : ARMv7 Processor rev 2 (v7l) processor : 0 BogoMIPS : 2013.49 processor : 1 BogoMIPS : 1963.08 Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 CPU implementer : 0x41 CPU architecture: 7 CPU variant : 0x1 CPU part : 0xc09 CPU revision : 2 Hardware : OMAP4430 Panda Board Revision : 0020 Serial: $ uname -a Linux panda2 2.6.35-903-omap4 #14-Ubuntu SMP PREEMPT Wed Oct 6 17:23:24 UTC 2010 armv7l GNU/Linux $ Maciej
[PATCH AArch64] Remove from arm_neon.h functions not in the spec
arm_neon.h contains a bunch of functions (for example, the wonderful vcgez_u* intrinsics - that's an unsigned comparison of greater-than-or-equal-to zero) that are not present in the current ARM Neon Intrinsics spec: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/index.html This patch just deletes those intrinsics. OK for trunk? Cheers, Alan gcc/ChangeLog: 2014-05-27 Alan Lawrence alan.lawre...@arm.com * config/aarch64/arm_neon.h (vfmaq_n_f64, vmlaq_n_f64, vmlsq_n_f64, vrsrtsq_f64, vtst_p16, vtstq_p16, vcge_p8, vcgeq_p8, vcgez_p8, vcgez_u8, vcgez_u16, vcgez_u32, vcgez_u64, vcgezq_p8, vcgezq_u8, vcgezq_u16, vcgezq_u32, vcgezq_u64, vcgezd_u64, vcgt_p8, vcgtq_p8, vcgtz_p8, vcgtz_u8, vcgtz_u16, vcgtz_u32, vcgtz_u64, vcgtzq_p8, vcgtzq_u8, vcgtzq_u16, vcgtzq_u32, vcgtzq_u64, vcgtzd_u64, vcle_p8, vcleq_p8, vclez_p8, vclez_u64, vclezq_p8, vclezd_u64, vclt_p8, vcltq_p8, vcltz_p8, vcltzq_p8, vcltzd_u64): Remove functions as they are not in the spec. diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 747a292..6f20337 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -5781,17 +5781,6 @@ vfmaq_n_f32 (float32x4_t a, float32x4_t b, float32_t c) return result; } -__extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) -vfmaq_n_f64 (float64x2_t a, float64x2_t b, float64_t c) -{ - float64x2_t result; - __asm__ ("fmla %0.2d, %2.2d, %3.d[0]" - : "=w"(result) - : "0"(a), "w"(b), "w"(c) - : /* No clobbers */); - return result; -} - __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vfms_f32 (float32x2_t a, float32x2_t b, float32x2_t c) { @@ -7243,18 +7232,6 @@ vmlaq_n_f32 (float32x4_t a, float32x4_t b, float32_t c) return result; } -__extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) -vmlaq_n_f64 (float64x2_t a, float64x2_t b, float64_t c) -{ - float64x2_t result; - float64x2_t t1; - __asm__ ("fmul %1.2d, %3.2d, %4.d[0]; fadd %0.2d, %0.2d, %1.2d" : 
"=w"(result), "=w"(t1) - : "0"(a), "w"(b), "w"(c) - : /* No clobbers */); - return result; -} - __extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) vmlaq_n_s16 (int16x8_t a, int16x8_t b, int16_t c) { @@ -7943,18 +7920,6 @@ vmlsq_n_f32 (float32x4_t a, float32x4_t b, float32_t c) return result; } -__extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) -vmlsq_n_f64 (float64x2_t a, float64x2_t b, float64_t c) -{ - float64x2_t result; - float64x2_t t1; - __asm__ ("fmul %1.2d, %3.2d, %4.d[0]; fsub %0.2d, %0.2d, %1.2d" - : "=w"(result), "=w"(t1) - : "0"(a), "w"(b), "x"(c) - : /* No clobbers */); - return result; -} - __extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) vmlsq_n_s16 (int16x8_t a, int16x8_t b, int16_t c) { @@ -11329,17 +11294,6 @@ vrsqrtss_f32 (float32_t a, float32_t b) return result; } -__extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) -vrsrtsq_f64 (float64x2_t a, float64x2_t b) -{ - float64x2_t result; - __asm__ ("frsqrts %0.2d,%1.2d,%2.2d" - : "=w"(result) - : "w"(a), "w"(b) - : /* No clobbers */); - return result; -} - __extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) vrsubhn_high_s16 (int8x8_t a, int16x8_t b, int16x8_t c) { @@ -16082,13 +16036,6 @@ vcge_f64 (float64x1_t __a, float64x1_t __b) } __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) -vcge_p8 (poly8x8_t __a, poly8x8_t __b) -{ - return (uint8x8_t) __builtin_aarch64_cmgev8qi ((int8x8_t) __a, - (int8x8_t) __b); -} - -__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) vcge_s8 (int8x8_t __a, int8x8_t __b) { return (uint8x8_t) __builtin_aarch64_cmgev8qi (__a, __b); @@ -16152,13 +16099,6 @@ vcgeq_f64 (float64x2_t __a, float64x2_t __b) } __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) -vcgeq_p8 (poly8x16_t __a, poly8x16_t __b) -{ - return (uint8x16_t) __builtin_aarch64_cmgev16qi ((int8x16_t) __a, - (int8x16_t) __b); -} - 
-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) vcgeq_s8 (int8x16_t __a, int8x16_t __b) { return (uint8x16_t) __builtin_aarch64_cmgev16qi (__a, __b); @@ -16252,14 +16192,6 @@ vcgez_f64 (float64x1_t __a) } __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) -vcgez_p8 (poly8x8_t __a) -{ - poly8x8_t __b = {0, 0, 0, 0, 0, 0, 0, 0}; - return (uint8x8_t) __builtin_aarch64_cmgev8qi ((int8x8_t) __a, - (int8x8_t) __b); -} - -__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) vcgez_s8 (int8x8_t __a) { int8x8_t __b = {0, 0, 0, 0, 0, 0, 0, 0}; @@ -16286,36 +16218,6 @@ vcgez_s64 (int64x1_t __a) return __a >= 0ll ? -1ll : 0ll; }
Re: [PATCH, ARM] interrupt testcases for shink-wrapping
On 05/27/14 02:07, Zhenqiang Chen wrote: Hi, Following up Richard's comments: https://gcc.gnu.org/ml/gcc-patches/2014-05/msg00967.html I investigated interrupt routines: * gcc can shrink-wrap an interrupt routine (ARM mode). shrink-wrap-interrupt_1.c in the patch shows it. * There are restrictions on when an interrupt routine can be shrink-wrapped; e.g. shrink-wrap-interrupt_2.c cannot be shrink-wrapped. * When an interrupt routine is shrink-wrapped, it uses *cond_simple_return to return from the function. There is no dwarf info issue, so no need to change the code. The patch only includes two test cases to track this. OK for trunk? Thanks! -Zhenqiang testsuite/ChangeLog: 2014-05-27 Zhenqiang Chen zhenqiang.c...@linaro.org * gcc.target/arm/shrink-wrap-interrupt_1.c: New test. * gcc.target/arm/shrink-wrap-interrupt_2.c: New test. OK. Jeff
Re: [PATCH][ARM] FAIL: gcc.target/arm/pr58041.c scan-assembler ldrb
On 27/05/14 15:47, Ramana Radhakrishnan wrote: On Tue, May 27, 2014 at 3:31 PM, Maciej W. Rozycki ma...@codesourcery.com wrote: On Tue, 13 Aug 2013, Kyrylo Tkachov wrote: On 08/09/13 11:01, Julian Brown wrote: On Thu, 8 Aug 2013 15:44:17 +0100 Kyrylo Tkachov kyrylo.tkac...@arm.com wrote: Hi all, The recently added gcc.target/arm/pr58041.c test exposed a bug in the backend. When compiling for NEON and with -mno-unaligned-access we end up generating the vld1.64 and vst1.64 instructions instead of doing the accesses one byte at a time like -mno-unaligned-access expects. This patch fixes that by enabling the NEON expander and insns that produce these instructions only when unaligned accesses are allowed. Bootstrapped on arm-linux-gnueabihf. Tested arm-none-eabi on qemu. Ok for trunk and 4.8? I'm not sure if this is right, FWIW -- do the instructions in question trap if the CPU is set to disallow unaligned accesses? I thought that control bit only affected ARM core loads stores, not NEON ones. Thinking again - the ARM-ARM says - the alignment check is for element size, so an alternative might be to use vld1.8 instead to allow for this at which point we might as well do something else with the test. I note that these patterns are not allowed for BYTES_BIG_ENDIAN so that might be a better alternative than completely disabling it. Looking at the section on unaligned accesses, it seems that the ldrb/strb-class instructions are the only ones that are unaffected by the SCTLR.A bit and do not produce alignment faults in any case. The NEON load/store instructions, including vld1.8 can still cause an alignment fault when SCTLR.A is set. So it seems we can only use the byte-wise core memory instructions for unaligned data. This change however has regressed gcc.dg/vect/vect-72.c on the arm-linux-gnueabi target, -march=armv5te, in particular in 4.8. And what are all the configure flags you are using in case some one has to reproduce this issue ? Second that. 
My recently built 4.8 (gcc version 4.8.2 20130531) for vect-72 with options: -O2 -ftree-vectorize -march=armv5te -mfpu=neon -mfloat-abi=hard -fno-vect-cost-model -fno-common gives code the same as your original one: .L14: sub r1, r3, #16 add r3, r3, #16 vld1.8 {q8}, [r1] cmp r3, r0 vst1.64 {d16-d17}, [r2:64]! bne .L14 ldr r3, .L22+12 add ip, r3, #128 add r2, r3, #129 Kyrill Beforehand the code fragment in question produced was: .L14: sub r1, r3, #16 add r3, r3, #16 vld1.8 {q8}, [r1] vld1 allows a misaligned load. cmp r3, r0 vst1.64 {d16-d17}, [r2:64]! bne .L14 Afterwards it is: .L14: vldr d16, [r3, #-16] vldr d17, [r3, #-8] add r3, r3, #16 cmp r3, r1 vst1.64 {d16-d17}, [r2:64]! bne .L14 and the second VLDR instruction traps with SIGILL (the value in R3 is 0x10b29, odd as you'd expect, pointing into `ib'). I don't know why and especially why only the second of the two (regrettably I've been unable to track down an instruction reference that'd be detailed enough to specify what exceptions VLDR can produce and under what conditions). vldr will cause an unaligned access fault if the address is misaligned. The question is why is the address misaligned in this case. Is there a fix that needs backporting to 4.8 or is this an issue that was unknown so far? I haven't seen an issue with this so far. Ramana Hardware and Linux used: $ cat /proc/cpuinfo Processor : ARMv7 Processor rev 2 (v7l) processor : 0 BogoMIPS: 2013.49 processor : 1 BogoMIPS: 1963.08 Features: swp half thumb fastmult vfp edsp thumbee neon vfpv3 CPU implementer : 0x41 CPU architecture: 7 CPU variant : 0x1 CPU part: 0xc09 CPU revision: 2 Hardware: OMAP4430 Panda Board Revision: 0020 Serial : $ uname -a Linux panda2 2.6.35-903-omap4 #14-Ubuntu SMP PREEMPT Wed Oct 6 17:23:24 UTC 2010 armv7l GNU/Linux $ Maciej
Re: [patch] VxWorks configuration adjustments for powerpc 8548 (e500v2)
On 05/27/14 15:57, Olivier Hainque wrote: Hello, The VxWorks compilation of C sources using vxworks header files needs the CPU macro to be defined. Configuration files in the compiler are set up to pick a default value depending on the cpu variant for which we are compiling. OK for mainline ? ok -- Nathan Sidwell
Re: RFA: cache enabled attribute by insn code
On Tue, May 27, 2014 at 04:15:47PM +0100, Richard Earnshaw wrote: On 27/05/14 15:08, Richard Sandiford wrote: Hmm, is this because of insn_enabled? If so, how did that work before the patch? LRA already assumed that the enabled attribute didn't depend on the operands. Huh! enabled can be applied to each alternative. Alternatives are selected based on the operands. If LRA can't cope with that we have a serious problem. In fact, a pattern that matches but has no enabled alternatives is meaningless and guaranteed to cause problems during register allocation. This is not LRA's fault, but the backend misusing the enabled attribute for something it wasn't designed for, and IMHO against the documentation of the attribute too. Just look at the original submission for why it was added. Jakub
Re: [PATCH AArch64] Remove from arm_neon.h functions not in the spec
No, hold that, vfmaq_n_f64 has been added back in the latest version (to which I linked). Hang on... --Alan Alan Lawrence wrote: arm_neon.h contains a bunch of functions (for example, the wonderful vcgez_u* intrinsics - that's an unsigned comparison of greater-than-or-equal-to zero) that are not present in the current ARM Neon Intrinsics spec: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/index.html This patch just deletes those intrinsics. OK for trunk? Cheers, Alan gcc/ChangeLog: 2014-05-27 Alan Lawrence alan.lawre...@arm.com * config/aarch64/arm_neon.h (vfmaq_n_f64, vmlaq_n_f64, vmlsq_n_f64, vrsrtsq_f64, vtst_p16, vtstq_p16, vcge_p8, vcgeq_p8, vcgez_p8, vcgez_u8, vcgez_u16, vcgez_u32, vcgez_u64, vcgezq_p8, vcgezq_u8, vcgezq_u16, vcgezq_u32, vcgezq_u64, vcgezd_u64, vcgt_p8, vcgtq_p8, vcgtz_p8, vcgtz_u8, vcgtz_u16, vcgtz_u32, vcgtz_u64, vcgtzq_p8, vcgtzq_u8, vcgtzq_u16, vcgtzq_u32, vcgtzq_u64, vcgtzd_u64, vcle_p8, vcleq_p8, vclez_p8, vclez_u64, vclezq_p8, vclezd_u64, vclt_p8, vcltq_p8, vcltz_p8, vcltzq_p8, vcltzd_u64): Remove functions as they are not in the spec.
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
2014-05-27 15:15 GMT+04:00 Bernd Schmidt ber...@codesourcery.com: On 05/27/2014 01:11 PM, Ilya Verbin wrote: Do I understand correctly that you recommend to build our offload gcc manually, rather than during configure/make? Well, configure/make it separately - build and install it first. If there was a way to generalize the Makefiles to allow target libraries to be built for multiple targets, that would be pretty good too, but it might be too much work for too little gain. Ok. And what about the mkoffload tool? Should it be built during the build of the host or offload compiler? Do you have any patches for Makefile that builds it? Thanks, -- Ilya
Re: RFA: PATCH to allow DECL_COMDAT_GROUP to be a DECL
On 05/19/2014 06:10 PM, Jan Hubicka wrote: Instead of adding an extra loop over the whole symtab, what about converting them at the time the symbols are inserted into the symbol table, or in function_and_variable_visibility, which already makes quite a few changes to various flags? Converting them during the initial analyze pass seems to do the trick: commit 062e714358ad384f1a69051f3b7f3902b9714904 Author: Jason Merrill ja...@redhat.com Date: Mon May 19 15:41:24 2014 -0400 PR c++/47202 gcc/cp/ * decl.c (cxx_comdat_group): Return a decl. * optimize.c (cdtor_comdat_group): Get its DECL_ASSEMBLER_NAME. gcc/ * cgraph.h (symtab_node::get_comdat_group_id): New. * cgraphunit.c (analyze_functions): Call it. * symtab.c (dump_symtab_node): Likewise. * tree.c (decl_comdat_group_id): New. * tree.h: Declare it. * lto-streamer-out.c (write_symbol): Use it. * trans-mem.c (ipa_tm_create_version_alias): Likewise. diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 8556e2d..44c8211 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -147,6 +147,13 @@ public: return comdat_group_; } + tree get_comdat_group_id () +{ + if (comdat_group_ && TREE_CODE (comdat_group_) != IDENTIFIER_NODE) + comdat_group_ = DECL_ASSEMBLER_NAME (comdat_group_); + return comdat_group_; +} + /* Set comdat group. */ void set_comdat_group (tree group) { diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index 3bd364b..e19b0a2 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -974,6 +974,8 @@ analyze_functions (void) node != first_analyzed && node != first_analyzed_var; node = node->next) { + /* Convert COMDAT group designators to IDENTIFIER_NODEs. */ + node->get_comdat_group_id (); if (decide_is_symbol_needed (node)) { enqueue_node (node); diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 9e97ff8..3502447 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -14484,18 +14484,17 @@ cp_missing_noreturn_ok_p (tree decl) return DECL_MAIN_P (decl); } -/* Return the COMDAT group into which DECL should be placed. 
*/ +/* Return the decl used to identify the COMDAT group into which DECL should + be placed. */ tree cxx_comdat_group (tree decl) { - tree name; - /* Virtual tables, construction virtual tables, and virtual table tables all go in a single COMDAT group, named after the primary virtual table. */ if (VAR_P (decl) && DECL_VTABLE_OR_VTT_P (decl)) -name = DECL_ASSEMBLER_NAME (CLASSTYPE_VTABLES (DECL_CONTEXT (decl))); +decl = CLASSTYPE_VTABLES (DECL_CONTEXT (decl)); /* For all other DECLs, the COMDAT group is the mangled name of the declaration itself. */ else @@ -14513,10 +14512,9 @@ cxx_comdat_group (tree decl) else break; } - name = DECL_ASSEMBLER_NAME (decl); } - return name; + return decl; } /* Returns the return type for FN as written by the user, which may include diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c index 4205fec..6b0c555 100644 --- a/gcc/cp/mangle.c +++ b/gcc/cp/mangle.c @@ -3517,10 +3517,15 @@ mangle_decl (const tree decl) DECL_IGNORED_P (alias) = 1; TREE_PUBLIC (alias) = TREE_PUBLIC (decl); DECL_VISIBILITY (alias) = DECL_VISIBILITY (decl); + DECL_EXTERNAL (alias) = DECL_EXTERNAL (decl); if (vague_linkage_p (decl)) DECL_WEAK (alias) = 1; if (TREE_CODE (decl) == FUNCTION_DECL) - cgraph_same_body_alias (cgraph_get_create_node (decl), alias, decl); + { + /* Don't create an alias to an unreferenced function. 
*/ + if (struct cgraph_node *n = cgraph_get_node (decl)) + cgraph_same_body_alias (n, alias, decl); + } else varpool_extra_name_alias (alias, decl); #endif diff --git a/gcc/cp/optimize.c b/gcc/cp/optimize.c index a58565c..3f89b81 100644 --- a/gcc/cp/optimize.c +++ b/gcc/cp/optimize.c @@ -170,6 +170,8 @@ cdtor_comdat_group (tree complete, tree base) complete_name = cxx_comdat_group (complete); if (base_name == NULL) base_name = cxx_comdat_group (base); + complete_name = DECL_ASSEMBLER_NAME (complete_name); + base_name = DECL_ASSEMBLER_NAME (base_name); gcc_assert (IDENTIFIER_LENGTH (complete_name) == IDENTIFIER_LENGTH (base_name)); grp_name = XALLOCAVEC (char, IDENTIFIER_LENGTH (complete_name) + 1); diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c index d1c8d9f..034d9d9 100644 --- a/gcc/lto-streamer-out.c +++ b/gcc/lto-streamer-out.c @@ -2329,7 +2329,7 @@ write_symbol (struct streamer_tree_cache_d *cache, size = 0; if (DECL_ONE_ONLY (t)) -comdat = IDENTIFIER_POINTER (DECL_COMDAT_GROUP (t)); +comdat = IDENTIFIER_POINTER (decl_comdat_group_id (t)); else comdat = ""; diff --git a/gcc/symtab.c b/gcc/symtab.c index 8abb7a1..79fdbc2 100644 --- a/gcc/symtab.c +++ b/gcc/symtab.c @@ -560,7 +560,7 @@ dump_symtab_base (FILE *f, symtab_node *node) fprintf (f, " comdat"); if (node->get_comdat_group ()) fprintf
Re: [patch i386]: Expand sibling-tail-calls via accumulator register
On 05/26/14 14:32, Kai Tietz wrote: - Original Message - On Mon, May 26, 2014 at 03:22:50PM -0400, Kai Tietz wrote: In any case, I still can't understand how limiting the choices of the register allocator can improve code rather than making it worse. If the accumulator is available there, why doesn't the RA choose it if it is beneficial? And why aren't other registers similarly suitable for that? Say r10, r11... I don't see it as limiting. The intent of this is more to have fixed patterns in the epilogue. And in fact the accumulator is the register that can be used as a scratch register for all i386 targets. Besides variadic functions, which aren't good candidates for sibling-call optimization anyway (on x86_64, due to the ABI). Well, for the x86_64 ABI we could consider using R11_REG instead of AX_REG. Is there any advantage in special-casing the x86_64 ABI? The R10 register isn't a good choice because it might be used as the DRAP register and therefore can't be loaded before the epilogue gets destroyed. It is limiting. If r11/rax and often also r10 can be chosen, telling the RA it can only choose rax is a limitation. No, it isn't. For sure. The code branch choosing the accumulator to call after the epilogue isn't used, as memories for sibling calls aren't allowed. This code branch will become active with my other sibling-tail-call patch. But the value we want may be sitting around in a convenient register (such as r11). So if we force the sibcall to use %rax, then we have to generate a copy. Yet if we have a constraint for the set of registers allowed here, then we give the register allocator the opportunity to coalesce away the copies. Jeff
Re: RFA: cache enabled attribute by insn code
On 27/05/14 16:27, Jakub Jelinek wrote: On Tue, May 27, 2014 at 04:15:47PM +0100, Richard Earnshaw wrote: On 27/05/14 15:08, Richard Sandiford wrote: Hmm, is this because of insn_enabled? If so, how did that work before the patch? LRA already assumed that the enabled attribute didn't depend on the operands. Huh! enabled can be applied to each alternative. Alternatives are selected based on the operands. If LRA can't cope with that we have a serious problem. In fact, a pattern that matches but has no enabled alternatives is meaningless and guaranteed to cause problems during register allocation. This is not LRA's fault, but the backend misusing the enabled attribute for something it wasn't designed for, and IMHO against the documentation of the attribute too. Just look at the original submission why it has been added. Jakub quote The @code{enabled} insn attribute may be used to disable certain insn alternatives for machine-specific reasons. quote The rest of the text just says what happens when this is done and then gives an example usage. It doesn't at any point, either explicitly or implicitly, say that this must be a static choice determined once-off at run time. That being said, I agree that this particular use case is pushing the boundaries -- but that's always a risk when the boundaries aren't clearly defined. A better solution here would be to get rid of all those match_operator patterns and replace them with iterators; but that's a lot of work, particularly for all the conditional operation patterns we have. It would probably also bloat the number of patterns quite alarmingly.
Re: [PATCH][ARM] FAIL: gcc.target/arm/pr58041.c scan-assembler ldrb
On Tue, 27 May 2014, Ramana Radhakrishnan wrote: Looking at the section on unaligned accesses, it seems that the ldrb/strb-class instructions are the only ones that are unaffected by the SCTLR.A bit and do not produce alignment faults in any case. The NEON load/store instructions, including vld1.8 can still cause an alignment fault when SCTLR.A is set. So it seems we can only use the byte-wise core memory instructions for unaligned data. This change however has regressed gcc.dg/vect/vect-72.c on the arm-linux-gnueabi target, -march=armv5te, in particular in 4.8. And what are all the configure flags you are using in case some one has to reproduce this issue ? There was nothing relevant to code generation among these flags except from --with-arch=armv5te, but this architecture selection I already included above. Beforehand the code fragment in question produced was: .L14: sub r1, r3, #16 add r3, r3, #16 vld1.8 {q8}, [r1] vld1 allows a misaligned load. cmp r3, r0 vst1.64 {d16-d17}, [r2:64]! bne .L14 Afterwards it is: .L14: vldr d16, [r3, #-16] vldr d17, [r3, #-8] add r3, r3, #16 cmp r3, r1 vst1.64 {d16-d17}, [r2:64]! bne .L14 and the second VLDR instruction traps with SIGILL (the value in R3 is 0x10b29, odd as you'd expect, pointing into `ib'). I don't know why and especially why only the second of the two (regrettably I've been unable to track down an instruction reference that'd be detailed enough to specify what exceptions VLDR can produce and under what conditions). vldr will cause an unaligned access fault if the address is misaligned. The question is why is the address misaligned in this case. The address is unaligned because odd indexes into `ib' are accessed, which is defined as: char ib[N+1]; I'd say the question is rather why VLDR is used to access that array where clearly it can't be. The loop copies a chunk of data in 64-bit quantities between two memory areas of which one is 64-bit aligned and the other is not. 
So while 32-bit or 64-bit aligned accesses can still be used to do it, they can't be for both areas at a time, unless some further data manipulation is done such as shifting and merging. Maciej
Re: RFA: cache enabled attribute by insn code
On Tue, May 27, 2014 at 04:40:13PM +0100, Richard Earnshaw wrote: quote The @code{enabled} insn attribute may be used to disable certain insn alternatives for machine-specific reasons. quote The rest of the text just says what happens when this is done and then gives an example usage. It doesn't any time, either explicitly or implicitly, say that this must be a static choice determined once-off at run time. It says: This definition should be based on other insn attributes and/or target flags. where from what I understand Andreas originally meant was that the other insn attributes would be again based on target flags or other such insn attributes. Yeah, it explicitly rules out just basing the enabled attribute on the operands, arm uses just another attribute based on the operands there, but it still goes against the intent of the attribute, and is definitely a nightmare for the register allocator. Decisions based on the operands should be through constraints. From what I can see, arm uses it just on a single insn, so I'd say it shouldn't be a big deal to rewrite it. Jakub
Re: [patch i386]: Expand sibling-tail-calls via accumulator register
Ok, so just the part of the patch that prevents variadic functions from being sibling-tail-called remains. Kai
Re: [patch i386]: Expand sibling-tail-calls via accumulator register
On 05/22/2014 02:33 PM, Kai Tietz wrote: * config/i386/i386.c (ix86_expand_call): Enforce for sibcalls on memory the use of accumulator-register. I don't like this at all. I'm fine with allowing memories that are fully symbolic, e.g. extern void (*foo)(void); void f(void) { foo(); } but any time you've got to use one insn to form the address in %eax, you might as well have also issued the memory load into %eax. And if the memory load is moved earlier, you no longer need to constrain to %eax, but let the register allocator choose the call-clobbered register to use. r~
[patch, mips, commited] Check in obvious patch to fix mips build.
The recent comdat group changes broke the mips build because mips.c was not including cgraph.h. I am checking in the following patch as obvious to fix the mips build. 2014-05-27 Steve Ellcey sell...@mips.com * config/mips/mips.c: Add include of cgraph.h. diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index eefcfd2..46e6b81 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -71,6 +71,7 @@ along with GCC; see the file COPYING3. If not see #include "opts.h" #include "tree-pass.h" #include "context.h" +#include "cgraph.h" /* True if X is an UNSPEC wrapper around a SYMBOL_REF or LABEL_REF. */ #define UNSPEC_ADDRESS_P(X)\
Re: RFA: cache enabled attribute by insn code
On 27/05/14 16:50, Jakub Jelinek wrote: On Tue, May 27, 2014 at 04:40:13PM +0100, Richard Earnshaw wrote: quote The @code{enabled} insn attribute may be used to disable certain insn alternatives for machine-specific reasons. quote The rest of the text just says what happens when this is done and then gives an example usage. It doesn't any time, either explicitly or implicitly, say that this must be a static choice determined once-off at run time. It says: This definition should be based on other insn attributes and/or target flags. pedantry - It only says *should*, not *must*. Should can imply a recommendation, not a requirement. - It only says *based on*, not *solely based on*. Based on implies that other information might be involved as well, without limitation. /pedantry where from what I understand Andreas originally meant was that the other insn attributes would be again based on target flags or other such insn attributes. Yeah, it explicitly rules out just basing the enabled attribute on the operands, arm uses just another attribute based on the operands there, but it still goes against the intent of the attribute, and is definitely a nightmare for the register allocator. Decisions based on the operands should be through constraints. From what I can see, arm uses it just on a single insn, so I'd say it shouldn't be a big deal to rewrite it. Pedantry aside, I'm not against making such a restriction; but it is a new restriction that isn't explicitly stated in the documentation today. R.
Re: RFA: cache enabled attribute by insn code
Richard Earnshaw rearn...@arm.com writes: On 27/05/14 16:27, Jakub Jelinek wrote: On Tue, May 27, 2014 at 04:15:47PM +0100, Richard Earnshaw wrote: On 27/05/14 15:08, Richard Sandiford wrote: Hmm, is this because of insn_enabled? If so, how did that work before the patch? LRA already assumed that the enabled attribute didn't depend on the operands. Huh! enabled can be applied to each alternative. Alternatives are selected based on the operands. If LRA can't cope with that we have a serious problem. In fact, a pattern that matches but has no enabled alternatives is meaningless and guaranteed to cause problems during register allocation. This is not LRA fault, but the backend misusing the enabled attribute for something it wasn't designed for, and IMHO against the documentation of the attribute too. Just look at the original submission why it has been added. Jakub quote The @code{enabled} insn attribute may be used to disable certain insn alternatives for machine-specific reasons. quote The rest of the text just says what happens when this is done and then gives an example usage. It doesn't any time, either explicitly or implicitly, say that this must be a static choice determined once-off at run time. OK, how about the doc patch below? That being said, I agree that this particular use case is pushing the boundaries -- but that's always a risk when the boundaries aren't clearly defined. A better solution here would be to get rid of all those match_operator patterns and replace them with iterators; but that's a lot of work, particularly for all the conditinal operation patterns we have. It would probably also bloat the number of patterns quite alarmingly. Yeah, which is why I was just going for the one place where it mattered. I think match_operator still has a place in cases where the insn pattern is entirely regular, which seems to be the case for most other uses of shiftable_operator. 
It's just that in this case there really were two separate cases per operator (plus vs. non-plus and mult vs. true shift). Thanks, Richard gcc/ * doc/md.texi: Document the restrictions on the enabled attribute. Index: gcc/doc/md.texi === --- gcc/doc/md.texi (revision 210972) +++ gcc/doc/md.texi (working copy) @@ -4094,11 +4094,11 @@ @subsection Disable insn alternatives using the @code{enabled} attribute @cindex enabled -The @code{enabled} insn attribute may be used to disable certain insn -alternatives for machine-specific reasons. This is useful when adding -new instructions to an existing pattern which are only available for -certain cpu architecture levels as specified with the @code{-march=} -option. +The @code{enabled} insn attribute may be used to disable insn +alternatives that are not available for the current subtarget. +This is useful when adding new instructions to an existing pattern +which are only available for certain cpu architecture levels as +specified with the @code{-march=} option. If an insn alternative is disabled, then it will never be used. The compiler treats the constraints for the disabled alternative as @@ -4112,6 +4112,9 @@ A definition of the @code{enabled} insn attribute. The attribute is defined as usual using the @code{define_attr} command. This definition should be based on other insn attributes and/or target flags. +It must not depend directly or indirectly on the current operands, +since the attribute is expected to be a static property of the subtarget. + The @code{enabled} attribute is a numeric attribute and should evaluate to @code{(const_int 1)} for an enabled alternative and to @code{(const_int 0)} otherwise.
Re: [patch, mips, commited] Check in obvious patch to fix mips build.
Steve Ellcey sell...@mips.com writes: The recent comdat group changes broke the mips build because mips.c was not including cgraph.h. I am checking in the following patch as obvious to fix the mips build. Wasn't that fixed by: 2014-05-26 Jan Hubicka hubi...@ucw.cz * tree.h (decl_comdat_group): Declare. * cgraph.h (symtab_in_same_comdat_p): Move offline to ... * tree.c (decl_comdat_group): Here. ? Thanks, Richard
Re: [RFC] optimize x - y cmp 0 with undefined overflow
I'm asking to merge them (move them to fold_comparison). Done (in the end the patch removes more lines than it adds :-). Tested on x86_64-suse-linux with no regressions.

2014-05-27  Eric Botcazou  ebotca...@adacore.com

	* fold-const.c (fold_comparison): Clean up and extend X +- C1 CMP C2
	to X CMP C2 -+ C1 transformation to EQ_EXPR/NE_EXPR.
	Add X - Y CMP 0 to X CMP Y transformation.
	(fold_binary_loc) EQ_EXPR/NE_EXPR: Remove same transformations.

2014-05-27  Eric Botcazou  ebotca...@adacore.com

	* gcc.dg/fold-compare-8.c: New test.
	* gcc.dg/Wstrict-overflow-25.c: Likewise.

-- Eric Botcazou

Index: fold-const.c
===
--- fold-const.c	(revision 210911)
+++ fold-const.c	(working copy)
@@ -8904,6 +8904,7 @@ static tree
 fold_comparison (location_t loc, enum tree_code code, tree type,
		 tree op0, tree op1)
 {
+  const bool equality_code = (code == EQ_EXPR || code == NE_EXPR);
   tree arg0, arg1, tem;
 
   arg0 = op0;
@@ -8920,28 +8921,24 @@ fold_comparison (location_t loc, enum tr
   if (tree_swap_operands_p (arg0, arg1, true))
     return fold_build2_loc (loc, swap_tree_comparison (code), type, op1, op0);
 
-  /* Transform comparisons of the form X +- C1 CMP C2 to X CMP C2 +- C1.  */
+  /* Transform comparisons of the form X +- C1 CMP C2 to X CMP C2 -+ C1.  */
   if ((TREE_CODE (arg0) == PLUS_EXPR || TREE_CODE (arg0) == MINUS_EXPR)
-      && (TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST
-	  && !TREE_OVERFLOW (TREE_OPERAND (arg0, 1))
-	  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg1)))
-      && (TREE_CODE (arg1) == INTEGER_CST
-	  && !TREE_OVERFLOW (arg1)))
+      && (equality_code || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0)))
+      && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST
+      && !TREE_OVERFLOW (TREE_OPERAND (arg0, 1))
+      && TREE_CODE (arg1) == INTEGER_CST
+      && !TREE_OVERFLOW (arg1))
     {
+      const enum tree_code
+	reverse_op = TREE_CODE (arg0) == PLUS_EXPR ? MINUS_EXPR : PLUS_EXPR;
       tree const1 = TREE_OPERAND (arg0, 1);
-      tree const2 = arg1;
+      tree const2 = fold_convert_loc (loc, TREE_TYPE (const1), arg1);
       tree variable = TREE_OPERAND (arg0, 0);
-      tree lhs;
-      int lhs_add;
-      lhs_add = TREE_CODE (arg0) != PLUS_EXPR;
-
-      lhs = fold_build2_loc (loc, lhs_add ? PLUS_EXPR : MINUS_EXPR,
-			     TREE_TYPE (arg1), const2, const1);
+      tree new_const = int_const_binop (reverse_op, const2, const1);
 
       /* If the constant operation overflowed this can be simplified as
	 a comparison against INT_MAX/INT_MIN.  */
-      if (TREE_CODE (lhs) == INTEGER_CST
-	  && TREE_OVERFLOW (lhs))
+      if (TREE_OVERFLOW (new_const))
	{
	  int const1_sgn = tree_int_cst_sgn (const1);
	  enum tree_code code2 = code;
@@ -8961,29 +8958,48 @@ fold_comparison (location_t loc, enum tr
 
	  /* We now can look at the canonicalized case
	       VARIABLE + 1 CODE2 INT_MIN
	     and decide on the result.  */
-	  if (code2 == LT_EXPR
-	      || code2 == LE_EXPR
-	      || code2 == EQ_EXPR)
-	    return omit_one_operand_loc (loc, type, boolean_false_node, variable);
-	  else if (code2 == NE_EXPR
-		   || code2 == GE_EXPR
-		   || code2 == GT_EXPR)
-	    return omit_one_operand_loc (loc, type, boolean_true_node, variable);
-	}
+	  switch (code2)
+	    {
+	    case EQ_EXPR:
+	    case LT_EXPR:
+	    case LE_EXPR:
+	      return
+		omit_one_operand_loc (loc, type, boolean_false_node, variable);
+
+	    case NE_EXPR:
+	    case GE_EXPR:
+	    case GT_EXPR:
+	      return
+		omit_one_operand_loc (loc, type, boolean_true_node, variable);
 
-      if (TREE_CODE (lhs) == TREE_CODE (arg1)
-	  && (TREE_CODE (lhs) != INTEGER_CST
-	      || !TREE_OVERFLOW (lhs)))
+	    default:
+	      gcc_unreachable ();
+	    }
+	}
+      else
	{
-	  if (code != EQ_EXPR && code != NE_EXPR)
+	  if (!equality_code)
	    fold_overflow_warning ("assuming signed overflow does not occur "
				   "when changing X +- C1 cmp C2 to "
-				   "X cmp C1 +- C2",
+				   "X cmp C2 -+ C1",
				   WARN_STRICT_OVERFLOW_COMPARISON);
-	  return fold_build2_loc (loc, code, type, variable, lhs);
+	  return fold_build2_loc (loc, code, type, variable, new_const);
	}
     }
 
+  /* Transform comparisons of the form X - Y CMP 0 to X CMP Y.  */
+  if (TREE_CODE (arg0) == MINUS_EXPR
+      && (equality_code || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0)))
+      && integer_zerop (arg1))
+    {
+      if (!equality_code)
+	fold_overflow_warning ("assuming signed overflow does not occur "
			       "when changing X - Y cmp 0 to X cmp Y",
			       WARN_STRICT_OVERFLOW_COMPARISON);
+      return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0),
+			      TREE_OPERAND (arg0, 1));
+    }
+
   /* For comparisons of pointers we can decompose it to a compile time
      comparison of the base objects and the offsets into the object.
      This requires at least one operand being an ADDR_EXPR or a
@@ -9111,8 +9127,7 @@ fold_comparison
Re: [RFC] optimize x - y cmp 0 with undefined overflow
On May 27, 2014 6:12:58 PM CEST, Eric Botcazou ebotca...@adacore.com wrote: I'm asking to merge them (move them to fold_comparison). Done (in the end the patch removes more lines than it adds :-). That's what I like! Tested on x86_64-suse-linux with no regressions. OK. Thanks, Richard. 2014-05-27 Eric Botcazou ebotca...@adacore.com * fold-const.c (fold_comparison): Clean up and extend X +- C1 CMP C2 to X CMP C2 -+ C1 transformation to EQ_EXPR/NE_EXPR. Add X - Y CMP 0 to X CMP Y transformation. (fold_binary_loc) EQ_EXPR/NE_EXPR: Remove same transformations. 2014-05-27 Eric Botcazou ebotca...@adacore.com * gcc.dg/fold-compare-8.c: New test. * gcc.dg/Wstrict-overflow-25.c: Likewise.
Re: RFA: cache enabled attribute by insn code
On 27/05/14 17:09, Richard Sandiford wrote: Richard Earnshaw rearn...@arm.com writes: On 27/05/14 16:27, Jakub Jelinek wrote: On Tue, May 27, 2014 at 04:15:47PM +0100, Richard Earnshaw wrote: On 27/05/14 15:08, Richard Sandiford wrote: Hmm, is this because of insn_enabled? If so, how did that work before the patch? LRA already assumed that the enabled attribute didn't depend on the operands. Huh! enabled can be applied to each alternative. Alternatives are selected based on the operands. If LRA can't cope with that we have a serious problem. In fact, a pattern that matches but has no enabled alternatives is meaningless and guaranteed to cause problems during register allocation. This is not LRA's fault, but the backend misusing the enabled attribute for something it wasn't designed for, and IMHO against the documentation of the attribute too. Just look at the original submission for why it has been added. Jakub quote The @code{enabled} insn attribute may be used to disable certain insn alternatives for machine-specific reasons. quote The rest of the text just says what happens when this is done and then gives an example usage. It doesn't at any time, either explicitly or implicitly, say that this must be a static choice determined once-off at run time. OK, how about the doc patch below? That being said, I agree that this particular use case is pushing the boundaries -- but that's always a risk when the boundaries aren't clearly defined. A better solution here would be to get rid of all those match_operator patterns and replace them with iterators; but that's a lot of work, particularly for all the conditional operation patterns we have. It would probably also bloat the number of patterns quite alarmingly. Yeah, which is why I was just going for the one place where it mattered. I think match_operator still has a place in cases where the insn pattern is entirely regular, which seems to be the case for most other uses of shiftable_operator.
It's just that in this case there really were two separate cases per operator (plus vs. non-plus and mult vs. true shift). Thanks, Richard

gcc/
	* doc/md.texi: Document the restrictions on the enabled attribute.

Index: gcc/doc/md.texi
===
--- gcc/doc/md.texi	(revision 210972)
+++ gcc/doc/md.texi	(working copy)
@@ -4094,11 +4094,11 @@
 @subsection Disable insn alternatives using the @code{enabled} attribute
 @cindex enabled
 
-The @code{enabled} insn attribute may be used to disable certain insn
-alternatives for machine-specific reasons.  This is useful when adding
-new instructions to an existing pattern which are only available for
-certain cpu architecture levels as specified with the @code{-march=}
-option.
+The @code{enabled} insn attribute may be used to disable insn
+alternatives that are not available for the current subtarget.
+This is useful when adding new instructions to an existing pattern
+which are only available for certain cpu architecture levels as
+specified with the @code{-march=} option.
 
 If an insn alternative is disabled, then it will never be used.  The
 compiler treats the constraints for the disabled alternative as
@@ -4112,6 +4112,9 @@
 A definition of the @code{enabled} insn attribute.  The attribute is
 defined as usual using the @code{define_attr} command.  This definition
 should be based on other insn attributes and/or target flags.
+It must not depend directly or indirectly on the current operands,
+since the attribute is expected to be a static property of the subtarget.
+

I'd reverse the two components of that sentence. Something like: The attribute must be a static property of the subtarget; that is, it must not depend on the current operands or any other dynamic context (for example, location of the insn within the body of a loop). It would be useful if we could precisely define 'static property' somewhere, so that it encompasses per-function changing of the ISA or optimization variant via __attribute__ annotations.
That does need to work, since it could be used for switching between ARM and Thumb. R.
Re: [patch i386]: Expand sibling-tail-calls via accumulator register
On 05/27/2014 08:39 AM, Jeff Law wrote: But the value we want may be sitting around in a convenient register (such as r11). So if we force the sibcall to use %rax, then we have to generate a copy. Yet if we have a constraint for the set of registers allowed here, then we give the register allocator the opportunity to coalesce away the copies. We do have an existing call-clobbered register class that we currently do use for indirect sibcalls. The problem that Kai is trying to work around is that he doesn't want to load the value into a register, but its address. But the register allocator only has BASE_REG_CLASS and INDEX_REG_CLASS, and has no possibility to force the address components into registers that are call clobbered. My thinking is that, with few exceptions, this doesn't help anything. If we have to load the full address into a register: lea ofs(base, index, scale), %eax ... call *0(%eax) we might as well include the memory load: mov ofs(base, index, scale), %eax ... call *%eax As far as I can see, the only time we can actually save an instruction is when the address is fully symbolic: mov foo, %eax ... call *%eax becomes call *foo r~
Re: RFA: cache enabled attribute by insn code
Richard Earnshaw rearn...@arm.com writes: On 27/05/14 17:09, Richard Sandiford wrote: Richard Earnshaw rearn...@arm.com writes: On 27/05/14 16:27, Jakub Jelinek wrote: On Tue, May 27, 2014 at 04:15:47PM +0100, Richard Earnshaw wrote: On 27/05/14 15:08, Richard Sandiford wrote: Hmm, is this because of insn_enabled? If so, how did that work before the patch? LRA already assumed that the enabled attribute didn't depend on the operands. Huh! enabled can be applied to each alternative. Alternatives are selected based on the operands. If LRA can't cope with that we have a serious problem. In fact, a pattern that matches but has no enabled alternatives is meaningless and guaranteed to cause problems during register allocation. This is not LRA's fault, but the backend misusing the enabled attribute for something it wasn't designed for, and IMHO against the documentation of the attribute too. Just look at the original submission for why it has been added. Jakub quote The @code{enabled} insn attribute may be used to disable certain insn alternatives for machine-specific reasons. quote The rest of the text just says what happens when this is done and then gives an example usage. It doesn't at any time, either explicitly or implicitly, say that this must be a static choice determined once-off at run time. OK, how about the doc patch below? That being said, I agree that this particular use case is pushing the boundaries -- but that's always a risk when the boundaries aren't clearly defined. A better solution here would be to get rid of all those match_operator patterns and replace them with iterators; but that's a lot of work, particularly for all the conditional operation patterns we have. It would probably also bloat the number of patterns quite alarmingly. Yeah, which is why I was just going for the one place where it mattered.
I think match_operator still has a place in cases where the insn pattern is entirely regular, which seems to be the case for most other uses of shiftable_operator. It's just that in this case there really were two separate cases per operator (plus vs. non-plus and mult vs. true shift). Thanks, Richard

gcc/
	* doc/md.texi: Document the restrictions on the enabled attribute.

Index: gcc/doc/md.texi
===
--- gcc/doc/md.texi	(revision 210972)
+++ gcc/doc/md.texi	(working copy)
@@ -4094,11 +4094,11 @@
 @subsection Disable insn alternatives using the @code{enabled} attribute
 @cindex enabled
 
-The @code{enabled} insn attribute may be used to disable certain insn
-alternatives for machine-specific reasons.  This is useful when adding
-new instructions to an existing pattern which are only available for
-certain cpu architecture levels as specified with the @code{-march=}
-option.
+The @code{enabled} insn attribute may be used to disable insn
+alternatives that are not available for the current subtarget.
+This is useful when adding new instructions to an existing pattern
+which are only available for certain cpu architecture levels as
+specified with the @code{-march=} option.
 
 If an insn alternative is disabled, then it will never be used.  The
 compiler treats the constraints for the disabled alternative as
@@ -4112,6 +4112,9 @@
 A definition of the @code{enabled} insn attribute.  The attribute is
 defined as usual using the @code{define_attr} command.  This definition
 should be based on other insn attributes and/or target flags.
+It must not depend directly or indirectly on the current operands,
+since the attribute is expected to be a static property of the subtarget.
+

I'd reverse the two components of that sentence. Something like: The attribute must be a static property of the subtarget; that is, it must not depend on the current operands or any other dynamic context (for example, location of the insn within the body of a loop). OK, how about the attached?
It would be useful if we could precisely define 'static property' somewhere, so that it encompasses per-function changing of the ISA or optimization variant via __attribute__ annotations. That does need to work, since it could be used for switching between ARM and Thumb. Yeah, the cache depends on the current target for SWITCHABLE_TARGETs, is invalidated by target_reinit for other targets, and is also invalidated by changes to the register classes. Thanks, Richard

gcc/
	* doc/md.texi: Document the restrictions on the enabled attribute.

Index: gcc/doc/md.texi
===
--- gcc/doc/md.texi	(revision 210972)
+++ gcc/doc/md.texi	(working copy)
@@ -4094,11 +4094,11 @@
 @subsection Disable insn alternatives using the @code{enabled} attribute
 @cindex enabled
 
-The @code{enabled} insn attribute may be used to disable certain insn
-alternatives for machine-specific reasons.  This is useful
Re: [patch, mips, commited] Check in obvious patch to fix mips build.
On Tue, 2014-05-27 at 17:12 +0100, Richard Sandiford wrote: Steve Ellcey sell...@mips.com writes: The recent comdat group changes broke the mips build because mips.c was not including cgraph.h. I am checking in the following patch as obvious to fix the mips build. Wasn't that fixed by: 2014-05-26 Jan Hubicka hubi...@ucw.cz * tree.h (decl_comdat_group): Declare. * cgraph.h (symtab_in_same_comdat_p): Move offline to ... * tree.c (decl_comdat_group): Here. ? Thanks, Richard It didn't fix the build for me. I still got this failure after that patch: /local/home/sellcey/nightly/src/gcc/gcc/config/mips/mips.c: In function 'void mips_start_unique_function(const char*)': /local/home/sellcey/nightly/src/gcc/gcc/config/mips/mips.c:6278:27: error: 'cgraph_create_node' was not declared in this scope make[2]: *** [mips.o] Error 1 When I grepped for cgraph_create_node I found it in cgraph.h. Steve Ellcey sell...@mips.com
Re: RFA: cache enabled attribute by insn code
On 27 May 2014 17:07, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Sandiford rdsandif...@googlemail.com writes: Richard Sandiford rsand...@linux.vnet.ibm.com writes: Does the following patch help? Bah, it won't of course: %i1 needs to be the operator. Here's v2. I tested that it worked for simple tests like: I confirm that the compiler now builds fine (target arm-none-linux-gnueabihf) with this patch applied. Thanks, Christophe.

int f1 (int x, int y) { return x + (y << 4); }
int f2 (int x, int y) { return x - (y << 4); }
int f3 (int x, int y) { return x & (y << 4); }
int f4 (int x, int y) { return x | (y << 4); }
int f5 (int x, int y) { return x ^ (y << 4); }
int f6 (int x, int y) { return (y << 4) - x; }
int g1 (int x, int y, int z) { return x + (y << z); }
int g2 (int x, int y, int z) { return x - (y << z); }
int g3 (int x, int y, int z) { return x & (y << z); }
int g4 (int x, int y, int z) { return x | (y << z); }
int g5 (int x, int y, int z) { return x ^ (y << z); }
int g6 (int x, int y, int z) { return (y << z) - x; }

as well as the testcase. Thanks, Richard

gcc/
	* config/arm/iterators.md (shiftable_ops): New code iterator.
	(t2_binop0, arith_shift_insn): New code attributes.
	* config/arm/arm.md (insn_enabled): Delete.
	(enabled): Remove insn_enabled test.
	(*arith_shiftsi): Split out...
	(*arith_multsi): ...this pattern and remove insn_enabled attribute.

Index: gcc/config/arm/arm.md
===
--- gcc/config/arm/arm.md	(revision 210972)
+++ gcc/config/arm/arm.md	(working copy)
@@ -200,17 +200,9 @@
 		 (const_string "yes")]
 		(const_string "no")))
 
-; Allows an insn to disable certain alternatives for reasons other than
-; arch support.
-(define_attr "insn_enabled" "no,yes"
-  (const_string "yes"))
-
 ; Enable all alternatives that are both arch_enabled and insn_enabled.
 (define_attr "enabled" "no,yes"
-  (cond [(eq_attr "insn_enabled" "no")
-	 (const_string "no")
-
-	 (and (eq_attr "predicable_short_it" "no")
+  (cond [(and (eq_attr "predicable_short_it" "no")
 	      (and (eq_attr "predicated" "yes")
 		   (match_test "arm_restrict_it")))
 	 (const_string "no")
@@ -9876,39 +9868,38 @@
 
 ;; Patterns to allow combination of arithmetic, cond code and shifts
 
-(define_insn "*arith_shiftsi"
-  [(set (match_operand:SI 0 "s_register_operand" "=r,r,r,r")
-	(match_operator:SI 1 "shiftable_operator"
-	  [(match_operator:SI 3 "shift_operator"
-	     [(match_operand:SI 4 "s_register_operand" "r,r,r,r")
-	      (match_operand:SI 5 "shift_amount_operand" "M,M,M,r")])
-	   (match_operand:SI 2 "s_register_operand" "rk,rk,r,rk")]))]
+;; Separate out the (mult ...) case, since it doesn't allow register operands.
+(define_insn "*arith_multsi"
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+	(shiftable_ops:SI
+	  (match_operator:SI 2 "mult_operator"
+	    [(match_operand:SI 3 "s_register_operand" "r,r")
+	     (match_operand:SI 4 "power_of_two_operand")])
+	  (match_operand:SI 1 "s_register_operand" "rk,<t2_binop0>")))]
   "TARGET_32BIT"
-  "%i1%?\\t%0, %2, %4%S3"
+  "<arith_shift_insn>%?\\t%0, %1, %3%S2"
   [(set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")
    (set_attr "shift" "4")
-   (set_attr "arch" "a,t2,t2,a")
-   ;; Thumb2 doesn't allow the stack pointer to be used for
-   ;; operand1 for all operations other than add and sub. In this case
-   ;; the minus operation is a candidate for an rsub and hence needs
-   ;; to be disabled.
-   ;; We have to make sure to disable the fourth alternative if
-   ;; the shift_operator is MULT, since otherwise the insn will
-   ;; also match a multiply_accumulate pattern and validate_change
-   ;; will allow a replacement of the constant with a register
-   ;; despite the checks done in shift_operator.
-   (set_attr_alternative "insn_enabled"
-	[(const_string "yes")
-	 (if_then_else
-	   (match_operand:SI 1 "add_operator" "")
-	   (const_string "yes") (const_string "no"))
-	 (const_string "yes")
-	 (if_then_else
-	   (match_operand:SI 3 "mult_operator" "")
-	   (const_string "no") (const_string "yes"))])
-   (set_attr "type" "alu_shift_imm,alu_shift_imm,alu_shift_imm,alu_shift_reg")])
+   (set_attr "arch" "a,t2")
+   (set_attr "type" "alu_shift_imm,alu_shift_imm")])
 
+;; Handles cases other than the above.
+(define_insn "*arith_shiftsi"
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r,r")
+	(shiftable_ops:SI
+	  (match_operator:SI 2 "shift_operator"
+	    [(match_operand:SI 3 "s_register_operand" "r,r,r")
+	     (match_operand:SI 4 "shift_amount_operand" "M,M,r")])
+	  (match_operand:SI 1 "s_register_operand" "rk,<t2_binop0>,rk")))]
+