[Committed] S/390: Add missing constraints in builtin patterns
gcc/ChangeLog: 2017-03-09 Andreas Krebbel* config/s390/vx-builtins.md ("vfee", "vfeez") ("vfenez"): Add missing constraints. --- gcc/config/s390/vx-builtins.md | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md index 1e9010a..6aff378 100644 --- a/gcc/config/s390/vx-builtins.md +++ b/gcc/config/s390/vx-builtins.md @@ -1351,9 +1351,9 @@ ; vfeeb, vfeeh, vfeef (define_insn "vfee" - [(set (match_operand:VI_HW_QHS0 "register_operand" "") - (unspec:VI_HW_QHS [(match_operand:VI_HW_QHS 1 "register_operand" "") - (match_operand:VI_HW_QHS 2 "register_operand" "") + [(set (match_operand:VI_HW_QHS0 "register_operand" "=v") + (unspec:VI_HW_QHS [(match_operand:VI_HW_QHS 1 "register_operand" "v") + (match_operand:VI_HW_QHS 2 "register_operand" "v") (const_int 0)] UNSPEC_VEC_VFEE))] "TARGET_VX" @@ -1362,9 +1362,9 @@ ; vfeezb, vfeezh, vfeezf (define_insn "vfeez" - [(set (match_operand:VI_HW_QHS0 "register_operand" "") - (unspec:VI_HW_QHS [(match_operand:VI_HW_QHS 1 "register_operand" "") - (match_operand:VI_HW_QHS 2 "register_operand" "") + [(set (match_operand:VI_HW_QHS0 "register_operand" "=v") + (unspec:VI_HW_QHS [(match_operand:VI_HW_QHS 1 "register_operand" "v") + (match_operand:VI_HW_QHS 2 "register_operand" "v") (const_int VSTRING_FLAG_ZS)] UNSPEC_VEC_VFEE))] "TARGET_VX" @@ -1423,9 +1423,9 @@ ; vfenezb, vfenezh, vfenezf (define_insn "vfenez" - [(set (match_operand:VI_HW_QHS0 "register_operand" "") - (unspec:VI_HW_QHS [(match_operand:VI_HW_QHS 1 "register_operand" "") - (match_operand:VI_HW_QHS 2 "register_operand" "") + [(set (match_operand:VI_HW_QHS0 "register_operand" "=v") + (unspec:VI_HW_QHS [(match_operand:VI_HW_QHS 1 "register_operand" "v") + (match_operand:VI_HW_QHS 2 "register_operand" "v") (const_int VSTRING_FLAG_ZS)] UNSPEC_VEC_VFENE))] "TARGET_VX" -- 2.9.1
Re: [PATCH][AArch64] Implement ALU_BRANCH fusion
Hi James, Thanks for the review and your comments. >> I'd need more detail on what types of instruction pairs you >> are trying to fuse. The documentation describes it as follows: a single-uop ALU instruction may fuse with an adjacent branch instruction in the same bundle. >> This comment looks incorrect - there is no vulcan_alu_basic reservation Modified as per the comment. Please let us know whether the description is sufficient. Thanks, Naveen
New German PO file for 'gcc' (version 7.1-b20170226)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the German team of translators. The file is available at: http://translationproject.org/latest/gcc/de.po (This file, 'gcc-7.1-b20170226.de.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
C++ PATCH for c++/79797, ICE with array NSDMI
In this testcase, ctx->object was the array subobject and 'this' refers to the containing object, so we failed. Fixed by generalizing PLACEHOLDER_EXPR handling more (in a way that replace_placeholders_r already handles). Tested x86_64-pc-linux-gnu, applying to trunk. commit a383ab56e2ea367509255f5a6b95fb5ad0b20f88 Author: Jason MerrillDate: Wed Mar 8 16:03:14 2017 -0500 PR c++/79797 - ICE with self-reference in array DMI. * constexpr.c (lookup_placeholder): Split out... (cxx_eval_constant_expression): ...from here. diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index f114da0..2510e23e 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -3827,6 +3827,35 @@ cxx_eval_switch_expr (const constexpr_ctx *ctx, tree t, return NULL_TREE; } +/* Find the object of TYPE under initialization in CTX. */ + +static tree +lookup_placeholder (const constexpr_ctx *ctx, bool lval, tree type) +{ + if (!ctx || !ctx->ctor || (lval && !ctx->object)) +return NULL_TREE; + + /* We could use ctx->object unconditionally, but using ctx->ctor when we + can is a minor optimization. */ + if (!lval && same_type_p (TREE_TYPE (ctx->ctor), type)) +return ctx->ctor; + + /* Since an object cannot have a field of its own type, we can search outward + from ctx->object to find the unique containing object of TYPE. */ + tree ob = ctx->object; + while (ob) +{ + if (same_type_ignoring_top_level_qualifiers_p (TREE_TYPE (ob), type)) + break; + if (handled_component_p (ob)) + ob = TREE_OPERAND (ob, 0); + else + ob = NULL_TREE; +} + + return ob; +} + /* Attempt to reduce the expression T to a constant value. On failure, issue diagnostic and return error_mark_node. */ /* FIXME unify with c_fully_fold */ @@ -4468,27 +4497,15 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, break; case PLACEHOLDER_EXPR: - if (!ctx || !ctx->ctor || (lval && !ctx->object) - || !(same_type_ignoring_top_level_qualifiers_p - (TREE_TYPE (t), TREE_TYPE (ctx->ctor - { - /* A placeholder without a referent. 
We can get here when -checking whether NSDMIs are noexcept, or in massage_init_elt; -just say it's non-constant for now. */ - gcc_assert (ctx->quiet); - *non_constant_p = true; - break; - } - else - { - /* Use of the value or address of the current object. We could -use ctx->object unconditionally, but using ctx->ctor when we -can is a minor optimization. */ - tree ctor = lval ? ctx->object : ctx->ctor; - return cxx_eval_constant_expression - (ctx, ctor, lval, -non_constant_p, overflow_p); - } + /* Use of the value or address of the current object. */ + if (tree ctor = lookup_placeholder (ctx, lval, TREE_TYPE (t))) + return cxx_eval_constant_expression (ctx, ctor, lval, +non_constant_p, overflow_p); + /* A placeholder without a referent. We can get here when +checking whether NSDMIs are noexcept, or in massage_init_elt; +just say it's non-constant for now. */ + gcc_assert (ctx->quiet); + *non_constant_p = true; break; case EXIT_EXPR: diff --git a/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr7.C b/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr7.C new file mode 100644 index 000..2134ac9 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr7.C @@ -0,0 +1,12 @@ +// PR c++/79797 +// { dg-do compile { target c++14 } } + +struct A +{ + A* x[1]{(A*)this}; +}; + +extern constexpr A a{}; + +#define SA(X) static_assert ((X), #X) +SA (a.x[0] == &a);
Re: C++ PATCH to fix ICE in strip_typedefs with -Wpadded (PR c++/79900)
OK. On Wed, Mar 8, 2017 at 6:00 PM, Marek Polacekwrote: > We crash on an assert in strip_typedefs, because we find ourselves in a > scenario where RESULT, the main variant of a struct, was modified in > finalize_record_size (its TYPE_ALIGN changed), but its variants (T in > strip_typedefs) weren't fixed-up yet; that will happen slightly later in > finalize_type_size. So there's a discrepancy that confuses the code. > > This patch implements what Jason suggested to me on IRC, i.e., skip the > attribute handling if RESULT is complete and T isn't. > > Bootstrapped/regtested on x86_64-linux, ok for trunk? > > 2017-03-08 Marek Polacek > > PR c++/79900 - ICE in strip_typedefs > * tree.c (strip_typedefs): Skip the attribute handling if T is > a variant type which hasn't been updated yet. > > * g++.dg/warn/Wpadded-1.C: New test. > > diff --git gcc/cp/tree.c gcc/cp/tree.c > index d3c63b8..b3a4a10 100644 > --- gcc/cp/tree.c > +++ gcc/cp/tree.c > @@ -1548,29 +1548,40 @@ strip_typedefs (tree t, bool *remove_attributes) > result = TYPE_MAIN_VARIANT (t); > } >gcc_assert (!typedef_variant_p (result)); > - if (TYPE_USER_ALIGN (t) != TYPE_USER_ALIGN (result) > - || TYPE_ALIGN (t) != TYPE_ALIGN (result)) > + > + if (COMPLETE_TYPE_P (result) && !COMPLETE_TYPE_P (t)) > + /* If RESULT is complete and T isn't, it's likely the case that T > + is a variant of RESULT which hasn't been updated yet. Skip the > + attribute handling. 
*/; > + else > { > - gcc_assert (TYPE_USER_ALIGN (t)); > - if (remove_attributes) > - *remove_attributes = true; > - else > + if (TYPE_USER_ALIGN (t) != TYPE_USER_ALIGN (result) > + || TYPE_ALIGN (t) != TYPE_ALIGN (result)) > { > - if (TYPE_ALIGN (t) == TYPE_ALIGN (result)) > - result = build_variant_type_copy (result); > + gcc_assert (TYPE_USER_ALIGN (t)); > + if (remove_attributes) > + *remove_attributes = true; > else > - result = build_aligned_type (result, TYPE_ALIGN (t)); > - TYPE_USER_ALIGN (result) = true; > + { > + if (TYPE_ALIGN (t) == TYPE_ALIGN (result)) > + result = build_variant_type_copy (result); > + else > + result = build_aligned_type (result, TYPE_ALIGN (t)); > + TYPE_USER_ALIGN (result) = true; > + } > + } > + > + if (TYPE_ATTRIBUTES (t)) > + { > + if (remove_attributes) > + result = apply_identity_attributes (result, TYPE_ATTRIBUTES (t), > + remove_attributes); > + else > + result = cp_build_type_attribute_variant (result, > + TYPE_ATTRIBUTES (t)); > } > } > - if (TYPE_ATTRIBUTES (t)) > -{ > - if (remove_attributes) > - result = apply_identity_attributes (result, TYPE_ATTRIBUTES (t), > - remove_attributes); > - else > - result = cp_build_type_attribute_variant (result, TYPE_ATTRIBUTES > (t)); > -} > + >return cp_build_qualified_type (result, cp_type_quals (t)); > } > > diff --git gcc/testsuite/g++.dg/warn/Wpadded-1.C > gcc/testsuite/g++.dg/warn/Wpadded-1.C > index e69de29..b3f0581 100644 > --- gcc/testsuite/g++.dg/warn/Wpadded-1.C > +++ gcc/testsuite/g++.dg/warn/Wpadded-1.C > @@ -0,0 +1,22 @@ > +// PR c++/79900 - ICE in strip_typedefs > +// { dg-do compile } > +// { dg-options "-Wpadded" } > + > +template struct A; > +template struct B { // { dg-warning "padding struct size to > alignment boundary" } > + long _M_off; > + int _M_state; > +}; > +template <> struct A { typedef B pos_type; }; > +enum _Ios_Openmode {}; > +struct C { > + typedef _Ios_Openmode openmode; > +}; > +template struct D { > + typedef typename _Traits::pos_type pos_type; > + 
pos_type m_fn1(pos_type, C::openmode); > +}; > +template class D ; > +template > +typename D<_CharT, _Traits>::pos_type D<_CharT, _Traits>::m_fn1(pos_type x, > +C::openmode) > { return x; } > > Marek
Re: [PATCH] -Wduplicated-branches -fopenmp ICE in inchash::add_expr (PR c++/79672)
On Thu, Mar 09, 2017 at 12:02:17AM +0100, Marek Polacek wrote: > Ping. > > On Wed, Mar 01, 2017 at 08:09:05PM +0100, Marek Polacek wrote: > > The following testcase ICEd with -Wduplicated-branches and -fopenmp > > because we tried to hash an omp_parallel expression that contained some > > TREE_VECs, but those aren't handled in inchash::add_expr. Handling > > that is easy and fixes the ICE. > > > > Bootstrapped/regtested on x86_64-linux, ok for trunk? > > > > 2017-03-01 Marek Polacek> > > > PR c++/79672 > > * tree.c (inchash::add_expr): Handle TREE_VEC. > > > > * g++.dg/warn/Wduplicated-branches2.C: Fix PR. > > * g++.dg/warn/Wduplicated-branches3.C: New test. Ok, thanks. Jakub
Re: [PATCH] -Wduplicated-branches -fopenmp ICE in inchash::add_expr (PR c++/79672)
Ping. On Wed, Mar 01, 2017 at 08:09:05PM +0100, Marek Polacek wrote: > The following testcase ICEd with -Wduplicated-branches and -fopenmp > because we tried to hash an omp_parallel expression that contained some > TREE_VECs, but those aren't handled in inchash::add_expr. Handling > that is easy and fixes the ICE. > > Bootstrapped/regtested on x86_64-linux, ok for trunk? > > 2017-03-01 Marek Polacek> > PR c++/79672 > * tree.c (inchash::add_expr): Handle TREE_VEC. > > * g++.dg/warn/Wduplicated-branches2.C: Fix PR. > * g++.dg/warn/Wduplicated-branches3.C: New test. > > diff --git gcc/testsuite/g++.dg/warn/Wduplicated-branches2.C > gcc/testsuite/g++.dg/warn/Wduplicated-branches2.C > index 4da2d54..7e14c5f 100644 > --- gcc/testsuite/g++.dg/warn/Wduplicated-branches2.C > +++ gcc/testsuite/g++.dg/warn/Wduplicated-branches2.C > @@ -1,4 +1,4 @@ > -// PR c/6427 > +// PR c/64279 > // { dg-do compile { target c++11 } } > // { dg-options "-Wduplicated-branches" } > > diff --git gcc/testsuite/g++.dg/warn/Wduplicated-branches3.C > gcc/testsuite/g++.dg/warn/Wduplicated-branches3.C > index e69de29..26dab85 100644 > --- gcc/testsuite/g++.dg/warn/Wduplicated-branches3.C > +++ gcc/testsuite/g++.dg/warn/Wduplicated-branches3.C > @@ -0,0 +1,18 @@ > +// PR c++/79672 > +// { dg-do compile } > +// { dg-options "-Wduplicated-branches -fopenmp" } > +// { dg-require-effective-target fopenmp } > + > +template void foo() > +{ > + if (N > 0) > + { > +#pragma omp parallel for > +for (int i = 0; i < 10; ++i) ; > + } > +} > + > +void bar() > +{ > + foo<0>(); > +} > diff --git gcc/tree.c gcc/tree.c > index 42c8a2d..8f87e7c 100644 > --- gcc/tree.c > +++ gcc/tree.c > @@ -7865,6 +7865,10 @@ add_expr (const_tree t, inchash::hash &hstate, > unsigned int flags) > inchash::add_expr (tsi_stmt (i), hstate, flags); > return; >} > +case TREE_VEC: > + for (int i = 0; i < TREE_VEC_LENGTH (t); ++i) > + inchash::add_expr (TREE_VEC_ELT (t, i), hstate, flags); > + return; > case FUNCTION_DECL: >/* When referring to a built-in 
FUNCTION_DECL, use the __builtin__ > form. >Otherwise nodes that compare equal according to operand_equal_p might > > Marek Marek
C++ PATCH to fix ICE in strip_typedefs with -Wpadded (PR c++/79900)
We crash on an assert in strip_typedefs, because we find ourselves in a scenario where RESULT, the main variant of a struct, was modified in finalize_record_size (its TYPE_ALIGN changed), but its variants (T in strip_typedefs) weren't fixed-up yet; that will happen slightly later in finalize_type_size. So there's a discrepancy that confuses the code. This patch implements what Jason suggested to me on IRC, i.e., skip the attribute handling if RESULT is complete and T isn't. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2017-03-08 Marek PolacekPR c++/79900 - ICE in strip_typedefs * tree.c (strip_typedefs): Skip the attribute handling if T is a variant type which hasn't been updated yet. * g++.dg/warn/Wpadded-1.C: New test. diff --git gcc/cp/tree.c gcc/cp/tree.c index d3c63b8..b3a4a10 100644 --- gcc/cp/tree.c +++ gcc/cp/tree.c @@ -1548,29 +1548,40 @@ strip_typedefs (tree t, bool *remove_attributes) result = TYPE_MAIN_VARIANT (t); } gcc_assert (!typedef_variant_p (result)); - if (TYPE_USER_ALIGN (t) != TYPE_USER_ALIGN (result) - || TYPE_ALIGN (t) != TYPE_ALIGN (result)) + + if (COMPLETE_TYPE_P (result) && !COMPLETE_TYPE_P (t)) + /* If RESULT is complete and T isn't, it's likely the case that T + is a variant of RESULT which hasn't been updated yet. Skip the + attribute handling. 
*/; + else { - gcc_assert (TYPE_USER_ALIGN (t)); - if (remove_attributes) - *remove_attributes = true; - else + if (TYPE_USER_ALIGN (t) != TYPE_USER_ALIGN (result) + || TYPE_ALIGN (t) != TYPE_ALIGN (result)) { - if (TYPE_ALIGN (t) == TYPE_ALIGN (result)) - result = build_variant_type_copy (result); + gcc_assert (TYPE_USER_ALIGN (t)); + if (remove_attributes) + *remove_attributes = true; else - result = build_aligned_type (result, TYPE_ALIGN (t)); - TYPE_USER_ALIGN (result) = true; + { + if (TYPE_ALIGN (t) == TYPE_ALIGN (result)) + result = build_variant_type_copy (result); + else + result = build_aligned_type (result, TYPE_ALIGN (t)); + TYPE_USER_ALIGN (result) = true; + } + } + + if (TYPE_ATTRIBUTES (t)) + { + if (remove_attributes) + result = apply_identity_attributes (result, TYPE_ATTRIBUTES (t), + remove_attributes); + else + result = cp_build_type_attribute_variant (result, + TYPE_ATTRIBUTES (t)); } } - if (TYPE_ATTRIBUTES (t)) -{ - if (remove_attributes) - result = apply_identity_attributes (result, TYPE_ATTRIBUTES (t), - remove_attributes); - else - result = cp_build_type_attribute_variant (result, TYPE_ATTRIBUTES (t)); -} + return cp_build_qualified_type (result, cp_type_quals (t)); } diff --git gcc/testsuite/g++.dg/warn/Wpadded-1.C gcc/testsuite/g++.dg/warn/Wpadded-1.C index e69de29..b3f0581 100644 --- gcc/testsuite/g++.dg/warn/Wpadded-1.C +++ gcc/testsuite/g++.dg/warn/Wpadded-1.C @@ -0,0 +1,22 @@ +// PR c++/79900 - ICE in strip_typedefs +// { dg-do compile } +// { dg-options "-Wpadded" } + +template struct A; +template struct B { // { dg-warning "padding struct size to alignment boundary" } + long _M_off; + int _M_state; +}; +template <> struct A { typedef B pos_type; }; +enum _Ios_Openmode {}; +struct C { + typedef _Ios_Openmode openmode; +}; +template struct D { + typedef typename _Traits::pos_type pos_type; + pos_type m_fn1(pos_type, C::openmode); +}; +template class D ; +template +typename D<_CharT, _Traits>::pos_type D<_CharT, 
_Traits>::m_fn1(pos_type x, +C::openmode) { return x; } Marek
Re: [PATCH, Fortran, Coarray, v1] Add support for failed images
Hi, On 5 March 2017 at 12:41, Andre Vehreschildwrote: > Hi Jerry, > > thanks for seconding my read of the standard and reviewing so quickly. > Committed as r245900. > I've noticed that the new test: gfortran.dg/coarray/fail_image_2.f08 -fcoarray=single -O2 -latomic execution test fails on arm and aarch64. I'm using qemu if it matters, and my gfortran.log has: spawn /XXX/qemu-wrapper.sh ./fail_image_2.exe FAIL: gfortran.dg/coarray/fail_image_2.f08 -fcoarray=single -O2 -latomic execution test that is, no obvious error message :-( Am I the only one seeing this? Thanks, Christophe > Regards, > Andre > > On Sat, 4 Mar 2017 15:06:25 -0800 > Jerry DeLisle wrote: > >> On 03/04/2017 09:58 AM, Andre Vehreschild wrote: >> > Hi all, >> > >> > attached patch polishes the one begun by Alessandro. It adds documentation >> > and fixes the style issues. Furthermore did I try to interpret the standard >> > according to the FAIL IMAGE statement. IMHO should it just quit the >> > executable without any error code. The caf_single library emits "FAIL >> > IMAGE" to stderr, while in coarray=single mode it just quits. What do you >> > think? >> > >> > Bootstraps and regtests ok on x86_64-linux/f25. Ok for trunk? (May be >> > later). >> > >> > Gruß, >> > Andre >> > >> >> From my read: >> >> "A failed image is usually associated with a hardware failure of the >> processor, memory system, or interconnection network" >> >> Since the FAIL IMAGE statement is intended to simulate such failure, I agree >> with your interpretation as well, it just stops execution. >> >> Yes OK for trunk now. >> >> Jerry > > > -- > Andre Vehreschild * Email: vehre ad gmx dot de
[PATCH rs6000] Fix PR79907
The following patch fixes an "insn does not match its constraints" ICE that occurred when copying a VR reg to a GPR when -mno-upper-regs-df is in effect. The regclass used for the wi/wj constraints was being incorrectly based on TARGET_UPPER_REGS_DF instead of TARGET_UPPER_REGS_DI. This patch fixes the initialization of the wi constraint, which in turn is used to set the value of the wj constraint. Bootstrap/regtest on powerpc64le and powerpc64 (-m32/-m64) with no new regressions. Ok for trunk? -Pat 2017-03-08 Pat Haugen  PR target/79907 * config/rs6000/rs6000.c (rs6000_init_hard_regno_mode_ok): Test TARGET_UPPER_REGS_DI when setting 'wi' constraint regclass. testsuite/ChangeLog: 2017-03-08 Pat Haugen * gcc.target/powerpc/pr79907.c: New. Index: config/rs6000/rs6000.c === --- config/rs6000/rs6000.c (revision 245961) +++ config/rs6000/rs6000.c (working copy) @@ -3182,7 +3182,7 @@ rs6000_init_hard_regno_mode_ok (bool glo else rs6000_constraints[RS6000_CONSTRAINT_ws] = FLOAT_REGS; - if (TARGET_UPPER_REGS_DF)/* DImode */ + if (TARGET_UPPER_REGS_DI)/* DImode */ rs6000_constraints[RS6000_CONSTRAINT_wi] = VSX_REGS; else rs6000_constraints[RS6000_CONSTRAINT_wi] = FLOAT_REGS; Index: testsuite/gcc.target/powerpc/pr79907.c === --- testsuite/gcc.target/powerpc/pr79907.c (nonexistent) +++ testsuite/gcc.target/powerpc/pr79907.c (working copy) @@ -0,0 +1,15 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ +/* { dg-options "-mcpu=power8 -O3 -mno-upper-regs-df" } */ + +int foo (short a[], int x) +{ + unsigned int i; + for (i = 0; i < 1000; i++) +{ + x = a[i]; + a[i] = (x <= 0 ? 0 : x); +} + return x; +}
Re: [PATCH][AArch64] Add neon_pairwise_add & neon_pairwise_add_q types
On Mon, Mar 06, 2017 at 05:09:44AM +, Hurugalawadi, Naveen wrote: > Hi, > > Please find attached the patch that adds "neon_pairwise_add" & > "neon_pairwise_add_qcrypto_pmull" for AArch64. > > The patch doesn't change spec but improve other benchmarks. > > Bootstrapped and Regression tested on aarch64-thunder-linux. > Please review the patch and let us know if its okay for Stage-1? The whitespace in various places in this patch is inconsistent with the whitespace around the modified line. For example: --- a/gcc/config/arm/cortex-a9-neon.md +++ b/gcc/config/arm/cortex-a9-neon.md @@ -52,6 +52,7 @@ (const_string "neon_int_2") (eq_attr "type" "neon_neg, neon_neg_q,\ neon_reduc_add, neon_reduc_add_q,\ + neon_pairwise_add, neon_pairwise_add_q,\ neon_reduc_add_long,\ neon_add_long, neon_sub_long") (const_string "neon_int_3") Where most lines use 8 spaces, and you've used tabs. While all these lines should really have been tabs to begin with, they didn't, and introducing inconsistency from line to line is worse than just following the existing style. With the whitespace fixed, this patch is OK for stage 1. Thanks, James > 2017-03-06 Julian Brown> Naveen H.S > > * config/aarch64/aarch64-simd.md (aarch64_reduc_plus_internal) > (aarch64_reduc_plus_internalv2si, aarch64_addp, aarch64_addpdi): > Use neon_pairwise_add/neon_pairwise_add_q as appropriate. > * config/aarch64/iterators.md (reduc_pairwise): New mode attribute. > * config/aarch64/thunderx.md (thunderx_neon_add, thunderx_neon_add_q): > Tweak for neon_pairwise_add split. > * config/aarch64/thunderx2t99.md (thunderx2t99_asimd_int): Add > neon_pairwise_add/neon_pairwise_add_q types. > * config/arm/cortex-a15-neon.md (cortex_a15_neon_type): Likewise. > * config/arm/cortex-a17-neon.md (cortex_a17_neon_type): Likewise. > * config/arm/cortex-a57.md (cortex_a57_neon_type): Likewise. > * config/arm/cortex-a8-neon.md (cortex_a8_neon_type): Likewise. > * config/arm/cortex-a9-neon.md (cortex_a9_neon_type): Likewise. 
> * config/arm/xgene1.md (xgene1_neon_arith): Likewise. > * config/arm/types.md (neon_pairwise_add, neon_pairwise_add_q): Add. > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index 338b9f8..878f86a 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -2101,7 +2101,7 @@ > UNSPEC_ADDV))] > "TARGET_SIMD" > "add\\t%0, %1." > - [(set_attr "type" "neon_reduc_add")] > + [(set_attr "type" "neon__add")] > ) > > (define_insn "aarch64_reduc_plus_internalv2si" > @@ -2110,7 +2110,7 @@ > UNSPEC_ADDV))] > "TARGET_SIMD" > "addp\\t%0.2s, %1.2s, %1.2s" > - [(set_attr "type" "neon_reduc_add")] > + [(set_attr "type" "neon_pairwise_add")] > ) > > (define_insn "reduc_plus_scal_" > @@ -4405,7 +4405,7 @@ >UNSPEC_ADDP))] >"TARGET_SIMD" >"addp\t%0, %1, %2" > - [(set_attr "type" "neon_reduc_add")] > + [(set_attr "type" "neon_pairwise_add")] > ) > > (define_insn "aarch64_addpdi" > @@ -4415,7 +4415,7 @@ >UNSPEC_ADDP))] >"TARGET_SIMD" >"addp\t%d0, %1.2d" > - [(set_attr "type" "neon_reduc_add")] > + [(set_attr "type" "neon_pairwise_add")] > ) > > ;; sqrt > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md > index c59d31e..c829cb5 100644 > --- a/gcc/config/aarch64/iterators.md > +++ b/gcc/config/aarch64/iterators.md > @@ -790,6 +790,12 @@ > (V2SF "p") (V4SF "v") > (V4HF "v") (V8HF "v")]) > > +(define_mode_attr reduc_pairwise [(V8QI "reduc") (V16QI "reduc") > + (V4HI "reduc") (V8HI "reduc") > + (V2SI "pairwise") (V4SI "reduc") > + (V2DI "pairwise") (V2DF "pairwise") > + (V2SF "pairwise") (V4SF "reduc")]) > + > (define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")]) > (define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")]) > > diff --git a/gcc/config/aarch64/thunderx.md b/gcc/config/aarch64/thunderx.md > index b67671d..95bfad4 100644 > --- a/gcc/config/aarch64/thunderx.md > +++ b/gcc/config/aarch64/thunderx.md > @@ -266,7 +266,8 @@ > > (define_insn_reservation "thunderx_neon_add" 4 
>(and (eq_attr "tune" "thunderx") > - (eq_attr "type" "neon_reduc_add, neon_reduc_minmax, > neon_fp_reduc_add_s, \ > + (eq_attr "type" "neon_reduc_add, neon_pairwise_add, > neon_reduc_minmax,\ > + neon_fp_reduc_add_s, \ > neon_fp_reduc_add_d, neon_fp_to_int_s, > neon_fp_to_int_d, \ >
Re: [PATCH][AArch64] Implement ALU_BRANCH fusion
On Mon, Mar 06, 2017 at 05:10:10AM +, Hurugalawadi, Naveen wrote: > Hi, > > Please find attached the patch that implements alu_branch fusion > for AArch64. > The patch doesn't change spec but improve other benchmarks. > > Bootstrapped and Regression tested on aarch64-thunder-linux. > Please review the patch and let us know if its okay for Stage-1? This description is insufficient for me to review this patch - in particular I'd need more detail on what types of instruction pairs you are trying to fuse. From inspection you will be trying to fuse any ALU operation with an unconditional direct branch. Is that what you intend? i.e. you are looking to fuse instruction sequences like: add x0, x1, #5 b .L3 csel x0, x1, x1, gt b .L4 Have I understood that right? > + if (aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_BRANCH) > + && any_uncondjump_p (curr)) > +{ > + /* These types correspond to the reservation "vulcan_alu_basic" for > + Broadcom Vulcan: these are ALU operations that produce a single uop > + during instruction decoding. */ This comment looks incorrect - there is no vulcan_alu_basic reservation in trunk GCC. Thanks, James
Re: [PATCH][AArch64] Add crc reservations for Thunderx2t99
On Mon, Mar 06, 2017 at 05:10:04AM +, Hurugalawadi, Naveen wrote: > Hi, > > Please find attached the patch that adds crc reservations for Thunderx2t99. > > Bootstrapped and Regression tested on aarch64-thunder-linux. > Please review the patch and let us know if its okay for Stage-1? OK for Stage 1. Thanks, James > 2017-03-06 Julian Brown> Naveen H.S > > * config/aarch64/thunderx2t99.md (thunderx2t99_crc): New Reservation. > diff --git a/gcc/config/aarch64/thunderx2t99.md > b/gcc/config/aarch64/thunderx2t99.md > index 2eb136b..936078c 100644 > --- a/gcc/config/aarch64/thunderx2t99.md > +++ b/gcc/config/aarch64/thunderx2t99.md > @@ -462,3 +462,10 @@ > (eq_attr "type" "crypto_sha1_fast,crypto_sha1_xor,crypto_sha1_slow,\ > crypto_sha256_fast,crypto_sha256_slow")) >"thunderx2t99_f1") > + > +;; CRC extension. > + > +(define_insn_reservation "thunderx2t99_crc" 4 > + (and (eq_attr "tune" "thunderx2t99") > + (eq_attr "type" "crc")) > + "thunderx2t99_i1")
Re: [PATCH][AArch64] Add aes and sha reservations for Thunderx2t99
On Mon, Mar 06, 2017 at 05:09:58AM +, Hurugalawadi, Naveen wrote: > Hi, > > Please find attached the patch that adds aes and sha reservations for > Thunderx2t99. > > Bootstrapped and Regression tested on aarch64-thunder-linux. > Please review the patch and let us know if its okay for Stage-1? > > Thanks, > Naveen > > 2017-03-06 Julian Brown> Naveen H.S > > * config/aarch64/thunderx2t99.md (thunderx2t99_crc): New Reservation. > diff --git a/gcc/config/aarch64/thunderx2t99.md > b/gcc/config/aarch64/thunderx2t99.md > index f807547..2eb136b 100644 > --- a/gcc/config/aarch64/thunderx2t99.md > +++ b/gcc/config/aarch64/thunderx2t99.md > @@ -443,7 +443,22 @@ > (eq_attr "type" "neon_store2_one_lane,neon_store2_one_lane_q")) >"thunderx2t99_ls01,thunderx2t99_f01") > > +;; Crypto extensions. > + > +; FIXME: Forwarding path for aese/aesmc or aesd/aesimc pairs? Do you need these if you've enabled instruction fusion? I'm not sure on the exact mechanics of how instruction fusion works, and whether you'd get further scheduling benefits from modeling a forwarding path if one exists. Regardless, this patch is OK for Stage 1. Thanks, James > +(define_insn_reservation "thunderx2t99_aes" 5 > + (and (eq_attr "tune" "thunderx2t99") > + (eq_attr "type" "crypto_aese,crypto_aesmc")) > + "thunderx2t99_f1") > + > (define_insn_reservation "thunderx2t99_pmull" 5 >(and (eq_attr "tune" "thunderx2t99") > (eq_attr "type" "crypto_pmull")) >"thunderx2t99_f1") > + > +(define_insn_reservation "thunderx2t99_sha" 7 > + (and (eq_attr "tune" "thunderx2t99") > + (eq_attr "type" "crypto_sha1_fast,crypto_sha1_xor,crypto_sha1_slow,\ > + crypto_sha256_fast,crypto_sha256_slow")) > + "thunderx2t99_f1")
Re: [PATCH][AArch64] Fix type for 1-element load
On Mon, Mar 06, 2017 at 05:09:51AM +, Hurugalawadi, Naveen wrote: > Hi, > > Please find attached the patch that fixes type for 1-element load in AArch64. > > Bootstrapped and Regression tested on aarch64-thunder-linux. > Please review the patch and let us know if its okay for Stage-1? > > Thanks, > Naveen > > 2017-03-06 Julian Brown> Naveen H.S > > * config/aarch64/aarch64-simd.md (aarch64_simd_vec_set): Fix > type for 1-element load. OK. Thanks, James > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index 878f86a..0443a86 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -561,7 +561,7 @@ > gcc_unreachable (); > } >} > - [(set_attr "type" "neon_from_gp, neon_ins, neon_load1_1reg")] > + [(set_attr "type" "neon_from_gp, neon_ins, neon_load1_one_lane")] > ) > > (define_insn "*aarch64_simd_vec_copy_lane"
[PATCH] Fix -fsanitize=address atomic builtin handling (PR sanitizer/79944)
Hi! As reported, we were using often completely bogus access sizes for atomic builtins, where the access size should be determined purely by their _{1,2,4,8,16} suffix, not by whatever type the passed pointer points to, pointer conversions are useless. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2017-03-08 Jakub JelinekPR sanitizer/79944 * asan.c (get_mem_refs_of_builtin_call): For BUILT_IN_ATOMIC* and BUILT_IN_SYNC*, determine the access type from the size suffix and always build a MEM_REF with that type. Handle forgotten BUILT_IN_SYNC_FETCH_AND_NAND_16 and BUILT_IN_SYNC_NAND_AND_FETCH_16. * c-c++-common/asan/pr79944.c: New test. --- gcc/asan.c.jj 2017-03-06 12:32:28.0 +0100 +++ gcc/asan.c 2017-03-08 12:24:11.151353229 +0100 @@ -603,218 +603,208 @@ get_mem_refs_of_builtin_call (const gcal case BUILT_IN_STRLEN: source0 = gimple_call_arg (call, 0); len = gimple_call_lhs (call); - break ; + break; /* And now the __atomic* and __sync builtins. These are handled differently from the classical memory memory access builtins above. */ case BUILT_IN_ATOMIC_LOAD_1: -case BUILT_IN_ATOMIC_LOAD_2: -case BUILT_IN_ATOMIC_LOAD_4: -case BUILT_IN_ATOMIC_LOAD_8: -case BUILT_IN_ATOMIC_LOAD_16: is_store = false; - /* fall through. 
*/ - + /* FALLTHRU */ case BUILT_IN_SYNC_FETCH_AND_ADD_1: -case BUILT_IN_SYNC_FETCH_AND_ADD_2: -case BUILT_IN_SYNC_FETCH_AND_ADD_4: -case BUILT_IN_SYNC_FETCH_AND_ADD_8: -case BUILT_IN_SYNC_FETCH_AND_ADD_16: - case BUILT_IN_SYNC_FETCH_AND_SUB_1: -case BUILT_IN_SYNC_FETCH_AND_SUB_2: -case BUILT_IN_SYNC_FETCH_AND_SUB_4: -case BUILT_IN_SYNC_FETCH_AND_SUB_8: -case BUILT_IN_SYNC_FETCH_AND_SUB_16: - case BUILT_IN_SYNC_FETCH_AND_OR_1: -case BUILT_IN_SYNC_FETCH_AND_OR_2: -case BUILT_IN_SYNC_FETCH_AND_OR_4: -case BUILT_IN_SYNC_FETCH_AND_OR_8: -case BUILT_IN_SYNC_FETCH_AND_OR_16: - case BUILT_IN_SYNC_FETCH_AND_AND_1: -case BUILT_IN_SYNC_FETCH_AND_AND_2: -case BUILT_IN_SYNC_FETCH_AND_AND_4: -case BUILT_IN_SYNC_FETCH_AND_AND_8: -case BUILT_IN_SYNC_FETCH_AND_AND_16: - case BUILT_IN_SYNC_FETCH_AND_XOR_1: -case BUILT_IN_SYNC_FETCH_AND_XOR_2: -case BUILT_IN_SYNC_FETCH_AND_XOR_4: -case BUILT_IN_SYNC_FETCH_AND_XOR_8: -case BUILT_IN_SYNC_FETCH_AND_XOR_16: - case BUILT_IN_SYNC_FETCH_AND_NAND_1: -case BUILT_IN_SYNC_FETCH_AND_NAND_2: -case BUILT_IN_SYNC_FETCH_AND_NAND_4: -case BUILT_IN_SYNC_FETCH_AND_NAND_8: - case BUILT_IN_SYNC_ADD_AND_FETCH_1: -case BUILT_IN_SYNC_ADD_AND_FETCH_2: -case BUILT_IN_SYNC_ADD_AND_FETCH_4: -case BUILT_IN_SYNC_ADD_AND_FETCH_8: -case BUILT_IN_SYNC_ADD_AND_FETCH_16: - case BUILT_IN_SYNC_SUB_AND_FETCH_1: -case BUILT_IN_SYNC_SUB_AND_FETCH_2: -case BUILT_IN_SYNC_SUB_AND_FETCH_4: -case BUILT_IN_SYNC_SUB_AND_FETCH_8: -case BUILT_IN_SYNC_SUB_AND_FETCH_16: - case BUILT_IN_SYNC_OR_AND_FETCH_1: -case BUILT_IN_SYNC_OR_AND_FETCH_2: -case BUILT_IN_SYNC_OR_AND_FETCH_4: -case BUILT_IN_SYNC_OR_AND_FETCH_8: -case BUILT_IN_SYNC_OR_AND_FETCH_16: - case BUILT_IN_SYNC_AND_AND_FETCH_1: -case BUILT_IN_SYNC_AND_AND_FETCH_2: -case BUILT_IN_SYNC_AND_AND_FETCH_4: -case BUILT_IN_SYNC_AND_AND_FETCH_8: -case BUILT_IN_SYNC_AND_AND_FETCH_16: - case BUILT_IN_SYNC_XOR_AND_FETCH_1: -case BUILT_IN_SYNC_XOR_AND_FETCH_2: -case BUILT_IN_SYNC_XOR_AND_FETCH_4: -case BUILT_IN_SYNC_XOR_AND_FETCH_8: 
-case BUILT_IN_SYNC_XOR_AND_FETCH_16: - case BUILT_IN_SYNC_NAND_AND_FETCH_1: -case BUILT_IN_SYNC_NAND_AND_FETCH_2: -case BUILT_IN_SYNC_NAND_AND_FETCH_4: -case BUILT_IN_SYNC_NAND_AND_FETCH_8: - case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_1: -case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_2: -case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_4: -case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_8: -case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_16: - case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_1: -case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_2: -case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_4: -case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_8: -case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_16: - case BUILT_IN_SYNC_LOCK_TEST_AND_SET_1: -case BUILT_IN_SYNC_LOCK_TEST_AND_SET_2: -case BUILT_IN_SYNC_LOCK_TEST_AND_SET_4: -case BUILT_IN_SYNC_LOCK_TEST_AND_SET_8: -case BUILT_IN_SYNC_LOCK_TEST_AND_SET_16: - case BUILT_IN_SYNC_LOCK_RELEASE_1: -case BUILT_IN_SYNC_LOCK_RELEASE_2: -case BUILT_IN_SYNC_LOCK_RELEASE_4: -case BUILT_IN_SYNC_LOCK_RELEASE_8: -case BUILT_IN_SYNC_LOCK_RELEASE_16: - case BUILT_IN_ATOMIC_EXCHANGE_1: -case BUILT_IN_ATOMIC_EXCHANGE_2: -case BUILT_IN_ATOMIC_EXCHANGE_4: -case BUILT_IN_ATOMIC_EXCHANGE_8: -case BUILT_IN_ATOMIC_EXCHANGE_16: - case
[committed] Fix taskloop handling inside of parallel construct body (PR c/79940)
Hi! We split OMP_TASKLOOP into 3 constructs, two GIMPLE_OMP_FOR with GIMPLE_OMP_PARALLEL sandwiched in between them, so that it is possible to compute the number of iterations etc. before calling GOMP_taskloop*. Using the original iterator in the outer gfor doesn't play very well if the taskloop region is nested in other OpenMP regions like parallel. This patch just creates a temporary for that. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk so far, queued for 6.x. 2017-03-08 Jakub Jelinek  PR c/79940 * gimplify.c (gimplify_omp_for): Replace index var in outer taskloop statement with an artificial variable and add OMP_CLAUSE_PRIVATE clause for it. * testsuite/libgomp.c/pr79940.c: New test. --- gcc/gimplify.c.jj 2017-02-21 09:03:57.0 +0100 +++ gcc/gimplify.c 2017-03-08 10:06:11.926501447 +0100 @@ -10232,8 +10232,9 @@ gimplify_omp_for (tree *expr_p, gimple_s gimple_omp_for_set_combined_into_p (gfor, true); for (i = 0; i < (int) gimple_omp_for_collapse (gfor); i++) { - t = unshare_expr (gimple_omp_for_index (gfor, i)); - gimple_omp_for_set_index (gforo, i, t); + tree type = TREE_TYPE (gimple_omp_for_index (gfor, i)); + tree v = create_tmp_var (type); + gimple_omp_for_set_index (gforo, i, v); t = unshare_expr (gimple_omp_for_initial (gfor, i)); gimple_omp_for_set_initial (gforo, i, t); gimple_omp_for_set_cond (gforo, i, @@ -10241,7 +10242,13 @@ gimplify_omp_for (tree *expr_p, gimple_s t = unshare_expr (gimple_omp_for_final (gfor, i)); gimple_omp_for_set_final (gforo, i, t); t = unshare_expr (gimple_omp_for_incr (gfor, i)); + gcc_assert (TREE_OPERAND (t, 0) == gimple_omp_for_index (gfor, i)); + TREE_OPERAND (t, 0) = v; gimple_omp_for_set_incr (gforo, i, t); + t = build_omp_clause (input_location, OMP_CLAUSE_PRIVATE); + OMP_CLAUSE_DECL (t) = v; + OMP_CLAUSE_CHAIN (t) = gimple_omp_for_clauses (gforo); + gimple_omp_for_set_clauses (gforo, t); } gimplify_seq_add_stmt (pre_p, gforo); } --- libgomp/testsuite/libgomp.c/pr79940.c.jj 2017-03-08 
10:47:26.179154442 +0100 +++ libgomp/testsuite/libgomp.c/pr79940.c 2017-03-08 10:46:46.0 +0100 @@ -0,0 +1,47 @@ +/* PR c/79940 */ + +int +main () +{ + int i, j, l, m; + int a[10000], b[10000], c[10000]; + for (i = 0; i < 10000; i++) +{ + a[i] = i; + b[i] = i & 31; +} +#pragma omp parallel shared(a, b, c) +#pragma omp single +#pragma omp taskloop shared(a, b, c) + for (i = 0; i < 10000; i++) +c[i] = a[i] + b[i]; +#pragma omp parallel +#pragma omp single + { +#pragma omp taskloop shared(a, b, c) lastprivate (i) +for (i = 0; i < 10000; i++) + c[i] += a[i] + b[i]; +l = i; + } +#pragma omp parallel +#pragma omp single +#pragma omp taskloop shared(a, b, c) collapse(2) + for (i = 0; i < 100; i++) +for (j = 0; j < 100; j++) + c[i * 100 + j] += a[i * 100 + j] + b[i * 100 + j]; +#pragma omp parallel +#pragma omp single + { +#pragma omp taskloop shared(a, b, c) lastprivate (i, j) +for (i = 0; i < 100; i++) + for (j = 0; j < 100; j++) + c[i * 100 + j] += a[i * 100 + j] + b[i * 100 + j]; +m = i * 100 + j; + } + for (i = 0; i < 10000; i++) +if (a[i] != i || b[i] != (i & 31) || c[i] != 4 * i + 4 * (i & 31)) + __builtin_abort (); + if (l != 10000 || m != 10100) +__builtin_abort (); + return 0; +} Jakub
[PATCH][AArch64] Emit tighter strong atomic compare-exchange loop when comparing against zero
Hi all, For the testcase in this patch where the value of x is zero we currently generate: foo: mov w1, 4 .L2: ldaxr w2, [x0] cmp w2, 0 bne .L3 stxr w3, w1, [x0] cbnz w3, .L2 .L3: cset w0, eq ret We currently cannot merge the cmp and b.ne inside the loop into a cbnz because we need the condition flags set for the return value of the function (i.e. the cset at the end). But if we re-jig the sequence in that case we can generate a tighter loop: foo: mov w1, 4 .L2: ldaxr w2, [x0] cbnz w2, .L3 stxr w3, w1, [x0] cbnz w3, .L2 .L3: cmp w2, 0 cset w0, eq ret So we add an explicit compare after the loop and inside the loop we use the fact that we're comparing against zero to emit a CBNZ. This means we may do the comparison twice (once inside the CBNZ, once at the CMP at the end), but there is now less code inside the loop. I've seen this sequence appear in glibc locking code so maybe it's worth adding the extra bit of complexity to the compare-exchange splitter to catch this case. Bootstrapped and tested on aarch64-none-linux-gnu. In previous iterations of the patch where I had gotten some logic wrong it would cause miscompiles of libgomp leading to timeouts in its testsuite but this version passes everything cleanly. Ok for GCC 8? (I know it's early, but might as well get it out in case someone wants to try it out) Thanks, Kyrill 2017-03-08 Kyrylo Tkachov  * config/aarch64/aarch64.c (aarch64_split_compare_and_swap): Emit CBNZ inside loop when doing a strong exchange and comparing against zero. Generate the CC flags after the loop. 2017-03-08 Kyrylo Tkachov * gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c: New test. 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 76a2de20dfcd4ea38fb7c58a9e8612509c5987bd..5fa8e197328ce4cb1718ff7d99b1ea95e02129a4 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -12095,6 +12095,17 @@ aarch64_split_compare_and_swap (rtx operands[]) mode = GET_MODE (mem); model = memmodel_from_int (INTVAL (model_rtx)); + /* When OLDVAL is zero and we want the strong version we can emit a tighter +loop: +.label1: + LD[A]XR rval, [mem] + CBNZ rval, .label2 + ST[L]XR scratch, newval, [mem] + CBNZ scratch, .label1 +.label2: + CMP rval, 0. */ + bool strong_zero_p = !is_weak && oldval == const0_rtx; + label1 = NULL; if (!is_weak) { @@ -12111,11 +12122,21 @@ aarch64_split_compare_and_swap (rtx operands[]) else aarch64_emit_load_exclusive (mode, rval, mem, model_rtx); - cond = aarch64_gen_compare_reg (NE, rval, oldval); - x = gen_rtx_NE (VOIDmode, cond, const0_rtx); - x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, - gen_rtx_LABEL_REF (Pmode, label2), pc_rtx); - aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x)); + if (strong_zero_p) +{ + x = gen_rtx_NE (VOIDmode, rval, const0_rtx); + x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, +gen_rtx_LABEL_REF (Pmode, label2), pc_rtx); + aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x)); +} + else +{ + cond = aarch64_gen_compare_reg (NE, rval, oldval); + x = gen_rtx_NE (VOIDmode, cond, const0_rtx); + x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, + gen_rtx_LABEL_REF (Pmode, label2), pc_rtx); + aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x)); +} aarch64_emit_store_exclusive (mode, scratch, mem, newval, model_rtx); @@ -12134,7 +12155,15 @@ aarch64_split_compare_and_swap (rtx operands[]) } emit_label (label2); - + /* If we used a CBNZ in the exchange loop emit an explicit compare with RVAL + to set the condition flags. If this is not used it will be removed by + later passes. 
*/ + if (strong_zero_p) +{ + cond = gen_rtx_REG (CCmode, CC_REGNUM); + x = gen_rtx_COMPARE (CCmode, rval, const0_rtx); + emit_insn (gen_rtx_SET (cond, x)); +} /* Emit any final barrier needed for a __sync operation. */ if (is_mm_sync (model)) aarch64_emit_post_barrier (model); diff --git a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c new file mode 100644 index ..b14a7c294376f03cd13077d18d865f83a04bd04e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +int +foo (int *a) +{ + int x = 0; + return __atomic_compare_exchange_n (a, &x, 4, 0, + __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE); +} + +/* { dg-final { scan-assembler-times "cbnz\\tw\[0-9\]+" 2 } } */
Re: [RFC] VEC_SELECT sanity checking in genrecog (arm, aarch64, mips)
On 08/03/17 15:51, Bill Schmidt wrote: On Mar 6, 2017, at 6:55 AM, Jakub Jelinek wrote: Nice. So just arm, aarch64 and rs6000 to go. Proposed patch for rs6000 is here: https://gcc.gnu.org/ml/gcc-patches/2017-03/msg00372.html. arm patch is here https://gcc.gnu.org/ml/gcc-patches/2017-03/msg00358.html Doing some more testing on the aarch64 version. Thanks, Kyrill Thanks, Bill
Re: [PATCH] Fix PRs 79955 and 79956 (hopefully)
On 03/08/2017 03:12 AM, Richard Biener wrote: The following avoids warning about completely out-of-bound accesses as uninitialized. For GCC8 we likely want to enhance path isolation to catch those cases (and eventually issue a diagnostic about them). And we finally want to fix complete peeling to not introduce those accesses. FWIW, I want to do isolation of o-o-b array accesses and this would likely fit into that reasonably well. jeff
Re: [RFC] VEC_SELECT sanity checking in genrecog (arm, aarch64, mips)
> On Mar 6, 2017, at 6:55 AM, Jakub Jelinek wrote: > > Nice. So just arm, aarch64 and rs6000 to go. Proposed patch for rs6000 is here: https://gcc.gnu.org/ml/gcc-patches/2017-03/msg00372.html. Thanks, Bill
[PATCH, rs6000] Fix incorrect mode usage for vec_select
Hi, As noted by Jakub in https://gcc.gnu.org/ml/gcc-patches/2017-03/msg00183.html, the PowerPC back end incorrectly uses vec_select with 2 elements for a mode that has only one. This is due to faulty mode iterator use: V1TImode was wrongly included in the VSX_LE mode iterator, but should instead have been in the VSX_LE_128 mode iterator. This patch fixes that, and with VSX_LE no longer including V1TImode, it is now redundant with VSX_D, so the patch removes VSX_LE altogether. Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Is this ok for trunk? I am uncertain whether we should backport the fix to gcc 5 and 6, since, although the code is technically incorrect, it works just fine. The fix is needed in trunk to permit the sanity checking that Jakub has proposed for genrecog. Thanks, Bill 2017-03-08 Bill Schmidt  * config/rs6000/rs6000.c (rs6000_gen_le_vsx_permute): Use rotate instead of vec_select for V1TImode. * config/rs6000/vsx.md (VSX_LE): Remove mode iterator that is no longer needed. (VSX_LE_128): Add V1TI to this mode iterator. (*vsx_le_perm_load_): Change to use VSX_D mode iterator. (*vsx_le_perm_store_): Likewise. (pre-reload splitter for VSX stores): Likewise. (post-reload splitter for VSX stores): Likewise. (*vsx_xxpermdi2_le_): Likewise. (*vsx_lxvd2x2_le_): Likewise. (*vsx_stxvd2x2_le_): Likewise. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 245952) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -10420,7 +10420,7 @@ rs6000_gen_le_vsx_permute (rtx source, machine_mod { /* Use ROTATE instead of VEC_SELECT on IEEE 128-bit floating point, and 128-bit integers if they are allowed in VSX registers. 
*/ - if (FLOAT128_VECTOR_P (mode) || mode == TImode) + if (FLOAT128_VECTOR_P (mode) || mode == TImode || mode == V1TImode) return gen_rtx_ROTATE (mode, source, GEN_INT (64)); else { Index: gcc/config/rs6000/vsx.md === --- gcc/config/rs6000/vsx.md(revision 245952) +++ gcc/config/rs6000/vsx.md(working copy) @@ -27,15 +27,12 @@ ;; Iterator for the 2 64-bit vector types (define_mode_iterator VSX_D [V2DF V2DI]) -;; Iterator for the 2 64-bit vector types + 128-bit types that are loaded with -;; lxvd2x to properly handle swapping words on little endian -(define_mode_iterator VSX_LE [V2DF V2DI V1TI]) - ;; Mode iterator to handle swapping words on little endian for the 128-bit ;; types that goes in a single vector register. (define_mode_iterator VSX_LE_128 [(KF "FLOAT128_VECTOR_P (KFmode)") (TF "FLOAT128_VECTOR_P (TFmode)") - (TI "TARGET_VSX_TIMODE")]) + (TI "TARGET_VSX_TIMODE") + V1TI]) ;; Iterator for the 2 32-bit vector types (define_mode_iterator VSX_W [V4SF V4SI]) @@ -387,8 +384,8 @@ ;; The patterns for LE permuted loads and stores come before the general ;; VSX moves so they match first. 
(define_insn_and_split "*vsx_le_perm_load_" - [(set (match_operand:VSX_LE 0 "vsx_register_operand" "=") -(match_operand:VSX_LE 1 "memory_operand" "Z"))] + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=") +(match_operand:VSX_D 1 "memory_operand" "Z"))] "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR" "#" "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR" @@ -501,8 +498,8 @@ (set_attr "length" "8")]) (define_insn "*vsx_le_perm_store_" - [(set (match_operand:VSX_LE 0 "memory_operand" "=Z") -(match_operand:VSX_LE 1 "vsx_register_operand" "+"))] + [(set (match_operand:VSX_D 0 "memory_operand" "=Z") +(match_operand:VSX_D 1 "vsx_register_operand" "+"))] "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR" "#" [(set_attr "type" "vecstore") @@ -509,8 +506,8 @@ (set_attr "length" "12")]) (define_split - [(set (match_operand:VSX_LE 0 "memory_operand" "") -(match_operand:VSX_LE 1 "vsx_register_operand" ""))] + [(set (match_operand:VSX_D 0 "memory_operand" "") +(match_operand:VSX_D 1 "vsx_register_operand" ""))] "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR && !reload_completed" [(set (match_dup 2) (vec_select: @@ -528,8 +525,8 @@ ;; The post-reload split requires that we re-permute the source ;; register in case it is still live. (define_split - [(set (match_operand:VSX_LE 0 "memory_operand" "") -(match_operand:VSX_LE 1 "vsx_register_operand" ""))] + [(set (match_operand:VSX_D 0 "memory_operand" "") +(match_operand:VSX_D 1 "vsx_register_operand" ""))] "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR && reload_completed" [(set (match_dup 1) (vec_select: @@
Re: [PATCH,testsuite] Add check_effective_target_rdynamic and use it in g++.dg/lto/pr69589_0.C.
Hi Toma, > g++.dg/lto/pr69589_0.C is currently failing for mips-mti-elf with the > following error: > > xg++: error: unrecognized command line option '-rdynamic' > > However, it passes just fine for mips-mti-linux-gnu. > I think that we should skip this test for mips-mti-elf. could it be that mips/sde.h is just missing -rdynamic handling in LINK_SPEC? At least some bare-metal targets using gld *do* support -rdynamic (cf. xtensa/elf.h and aarch64/aarch64-elf-raw.h)... That said, I like the general direction of this patch, even when the new keyword is only used in one place. > This patch achieves this by adding support for check_effective_target_rdynamic > and then using it in g++.dg/lto/pr69589_0.C. > > This patch also removes the existing dg-skip-if, as it is made redundant by > the > effective target check. The latter is also more convenient, as we won't have > to > keep tweaking the dg-skip-if for failing targets anymore. > > This is similar to my recent check_effective_target_gettimeofday patch, > which was greatly improved by Rainer Orth's review. > > Tested with mips-mti-elf and mips-mti-linux-gnu. > > Does this look good ? [...] > diff --git a/gcc/testsuite/lib/target-supports.exp > b/gcc/testsuite/lib/target-supports.exp > index 2766af4..f46f0ba 100644 > --- a/gcc/testsuite/lib/target-supports.exp > +++ b/gcc/testsuite/lib/target-supports.exp > @@ -1484,6 +1484,16 @@ proc check_effective_target_static_libgfortran { } { > } "-static"] > } > > +# Return 1 if we can use the -rdynamic option, 0 otherwise. > +# > +# When the target name changes, replace the cached result. I think this line isn't necessary. Just keep the first. Ok for mainline with that fixed. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] V2: Fix compile warning building testcase
On Wed, Mar 8, 2017 at 4:55 AM, Sam Thursfield wrote: > > Thanks a lot for reviewing the patch. I've attached a new version with some > comments below. > > On 08/03/17 04:03, Ian Lance Taylor via gcc-patches wrote: >> >> Thanks. The patch submission is fine but 1) you didn't say which >> version of GCC you are using; 2) I don't understand why it works. If >> BACKTRACE_SUPPORTED is 0, then I would expect that you would see a >> warning for test1, test2, test3, and test4. Your patch doesn't fix >> those, so it sounds like you are only seeing the warning for test5, so >> BACKTRACE_SUPPORTED must be 1. test5 is not compiled if >> BACKTRACE_SUPPORTS_DATA is 0, so BACKTRACE_SUPPORTS_DATA must be 1. >> So both BACKTRACE_SUPPORTED and BACKTRACE_SUPPORTS_DATA must be 1, so >> I don't understand what your patch fixes. > > I'm using GCC 6.3.0 here to compile. > > I see your point about the fix not making sense. The values of the > defines are this: > > #define BACKTRACE_SUPPORTED 0 > #define BACKTRACE_SUPPORTS_DATA 1 > > I noticed that test1(), test2(), test3() and test4() all have the > "unused" attribute: > > static int test1 (void) __attribute__ ((noinline, unused)); > static inline int test2 (void) __attribute__ ((always_inline, unused)); > static int test3 (void) __attribute__ ((noinline, unused)); > static inline int test4 (void) __attribute__ ((always_inline, unused)); > > So that will be why they don't trigger -Wunused-function :-) > > It seems to me now that it'd be neater to add that same attribute to > test5(), instead of using an #ifdef guard. Attached is a new patch that does > so. Makes sense. Thanks. Committed to mainline. Ian
Re: [RFC PATCH libiberty] Fix infinite recursion in demangler
On 03/08/2017 08:29 AM, Mark Wielaard wrote: On Wed, 2017-03-08 at 08:22 -0500, Nathan Sidwell wrote: https://gcc.gnu.org/ml/gcc-patches/2017-03/msg00089.html? Patch looks good to me (not a libiberty maintainer) One question: + if (dc == NULL || dc->d_printing > 1) Can dc ever be null here, or is this just extra defensiveness? Yes, dc can be NULL. There is a similar check (and setting of d_print_error) in d_print_comp_inner. thanks. in case you'd not noticed Ian T's okayed my review. so this is good to go nathan -- Nathan Sidwell
Re: [RFC PATCH libiberty] Fix infinite recursion in demangler
On Wed, 2017-03-08 at 08:22 -0500, Nathan Sidwell wrote: > https://gcc.gnu.org/ml/gcc-patches/2017-03/msg00089.html? > > Patch looks good to me (not a libiberty maintainer) > > One question: > + if (dc == NULL || dc->d_printing > 1) > > Can dc ever be null here, or is this just extra defensiveness? Yes, dc can be NULL. There is a similar check (and setting of d_print_error) in d_print_comp_inner. Cheers, Mark
[RFC PATCH libiberty] Fix infinite recursion in demangler
https://gcc.gnu.org/ml/gcc-patches/2017-03/msg00089.html? Patch looks good to me (not a libiberty maintainer) One question: + if (dc == NULL || dc->d_printing > 1) Can dc ever be null here, or is this just extra defensiveness? nathan -- Nathan Sidwell
Re: [PATCH] Fix ICE with -Walloca-larger-than=[>INT_MAX] (PR middle-end/79809)
On Wed, Mar 08, 2017 at 12:30:59PM +0100, Andreas Schwab wrote: > On Mär 02 2017, Marek Polacek <pola...@redhat.com> wrote: > > > diff --git gcc/testsuite/g++.dg/Walloca1.C gcc/testsuite/g++.dg/Walloca1.C > > index e69de29..23b97e8 100644 > > --- gcc/testsuite/g++.dg/Walloca1.C > > +++ gcc/testsuite/g++.dg/Walloca1.C > > @@ -0,0 +1,6 @@ > > +/* PR middle-end/79809 */ > > +/* { dg-do compile } */ > > +/* { dg-options "-Walloca-larger-than=4207115063 > > -Wvla-larger-than=1233877270 -O2" } */ > > + > > +int a; > > +char *b = static_cast<char *>(__builtin_alloca (a)); // { dg-warning > > "argument to .alloca. may be too large" } > > FAIL: g++.dg/Walloca1.C -std=gnu++11 (test for warnings, line 6) > FAIL: g++.dg/Walloca1.C -std=gnu++11 (test for excess errors) > Excess errors: > /daten/aranym/gcc/gcc-20170308/gcc/testsuite/g++.dg/Walloca1.C:6:48: warning: > unbounded use of 'alloca' [-Walloca-larger-than=] Ok, this should help: 2017-03-08 Marek Polacek <pola...@redhat.com> * g++.dg/Walloca1.C: Adjust dg-warning. diff --git gcc/testsuite/g++.dg/Walloca1.C gcc/testsuite/g++.dg/Walloca1.C index 23b97e8..818c6f0 100644 --- gcc/testsuite/g++.dg/Walloca1.C +++ gcc/testsuite/g++.dg/Walloca1.C @@ -3,4 +3,4 @@ /* { dg-options "-Walloca-larger-than=4207115063 -Wvla-larger-than=1233877270 -O2" } */ int a; -char *b = static_cast<char *>(__builtin_alloca (a)); // { dg-warning "argument to .alloca. may be too large" } +char *b = static_cast<char *>(__builtin_alloca (a)); // { dg-warning "argument to .alloca. may be too large|unbounded use of" } Marek
Re: [PATCH] V2: Fix compile warning building testcase
Hi Ian, Thanks a lot for reviewing the patch. I've attached a new version with some comments below. On 08/03/17 04:03, Ian Lance Taylor via gcc-patches wrote: Thanks. The patch submission is fine but 1) you didn't say which version of GCC you are using; 2) I don't understand why it works. If BACKTRACE_SUPPORTED is 0, then I would expect that you would see a warning for test1, test2, test3, and test4. Your patch doesn't fix those, so it sounds like you are only seeing the warning for test5, so BACKTRACE_SUPPORTED must be 1. test5 is not compiled if BACKTRACE_SUPPORTS_DATA is 0, so BACKTRACE_SUPPORTS_DATA must be 1. So both BACKTRACE_SUPPORTED and BACKTRACE_SUPPORTS_DATA must be 1, so I don't understand what your patch fixes. I'm using GCC 6.3.0 here to compile. I see your point about the fix not making sense. The values of the defines are this: #define BACKTRACE_SUPPORTED 0 #define BACKTRACE_SUPPORTS_DATA 1 I noticed that test1(), test2(), test3() and test4() all have the "unused" attribute: static int test1 (void) __attribute__ ((noinline, unused)); static inline int test2 (void) __attribute__ ((always_inline, unused)); static int test3 (void) __attribute__ ((noinline, unused)); static inline int test4 (void) __attribute__ ((always_inline, unused)); So that will be why they don't trigger -Wunused-function :-) It seems to me now that it'd be neater to add that same attribute to test5(), instead of using an #ifdef guard. Attached is a new patch that does so. Again, I've tested this using GCC 6.3.0 on powerpc-ibm-aix7.2.0.0. Sam -- Sam Thursfield, Codethink Ltd. Office telephone: +44 161 236 5575 From 87b52dee5ca3b8feca379877c873c22b673f1b7b Mon Sep 17 00:00:00 2001 From: Sam Thursfield Date: Wed, 8 Mar 2017 06:48:30 -0600 Subject: [PATCH] libbacktrace: Fix compile warning building testcase This warning occurred when the BACKTRACE_SUPPORTED flag is defined to 0. 
../../../gcc/libbacktrace/btest.c:624:1: error: 'test5' defined but not used [-Werror=unused-function] test5 (void) ^ cc1: all warnings being treated as errors The other test functions have the 'unused' attribute, presumably to avoid similar errors, so let's handle test5() the same way, instead of using a #ifdef guard. libbacktrace/: 2017-03-08 Sam Thursfield * btest.c (test5): Replace #ifdef guard with 'unused' attribute to fix compile warning when BACKTRACE_SUPPORTED isn't defined. --- libbacktrace/btest.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/libbacktrace/btest.c b/libbacktrace/btest.c index e28a3d8..92cb325 100644 --- a/libbacktrace/btest.c +++ b/libbacktrace/btest.c @@ -616,7 +616,7 @@ f33 (int f1line, int f2line) return failures; } -#if BACKTRACE_SUPPORTS_DATA +static int test5 (void) __attribute__ ((unused)); int global = 1; @@ -686,8 +686,6 @@ test5 (void) return failures; } -#endif /* BACKTRACE_SUPPORTS_DATA */ - static void error_callback_create (void *data ATTRIBUTE_UNUSED, const char *msg, int errnum)
Re: [PATCH] Fix ICE with -Walloca-larger-than=[>INT_MAX] (PR middle-end/79809)
On Mär 02 2017, Marek Polacek <pola...@redhat.com> wrote: > diff --git gcc/testsuite/g++.dg/Walloca1.C gcc/testsuite/g++.dg/Walloca1.C > index e69de29..23b97e8 100644 > --- gcc/testsuite/g++.dg/Walloca1.C > +++ gcc/testsuite/g++.dg/Walloca1.C > @@ -0,0 +1,6 @@ > +/* PR middle-end/79809 */ > +/* { dg-do compile } */ > +/* { dg-options "-Walloca-larger-than=4207115063 > -Wvla-larger-than=1233877270 -O2" } */ > + > +int a; > +char *b = static_cast<char *>(__builtin_alloca (a)); // { dg-warning > "argument to .alloca. may be too large" } FAIL: g++.dg/Walloca1.C -std=gnu++11 (test for warnings, line 6) FAIL: g++.dg/Walloca1.C -std=gnu++11 (test for excess errors) Excess errors: /daten/aranym/gcc/gcc-20170308/gcc/testsuite/g++.dg/Walloca1.C:6:48: warning: unbounded use of 'alloca' [-Walloca-larger-than=] Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
Re: [PATCH] Loop splitting breaks with loops of pointer type
On Wed, Mar 8, 2017 at 11:53 AM, Andrew Haley wrote: > Loop splitting is fine when the control variable is of integer type, > but when it is a pointer type the upper bound of the new loop is > calculated incorrectly. > The calculation should be guard_init + (end-beg), but instead we do > guard_init - (end-beg). > > Fixed thusly. Bootstrapped, regtested. > > OK? Ok. Thanks, Richard. > Andrew. > > > 2017-03-08 Andrew Haley > > PR tree-optimization/79894 > * tree-ssa-loop-split.c (compute_new_first_bound): When > calculating the new upper bound, (END-BEG) should be added, not > subtracted. > > Index: gcc/tree-ssa-loop-split.c > === > --- gcc/tree-ssa-loop-split.c (revision 245948) > +++ gcc/tree-ssa-loop-split.c (working copy) > @@ -436,7 +436,6 @@ >if (POINTER_TYPE_P (TREE_TYPE (guard_init))) > { >enddiff = gimple_convert (stmts, sizetype, enddiff); > - enddiff = gimple_build (stmts, NEGATE_EXPR, sizetype, enddiff); >newbound = gimple_build (stmts, POINTER_PLUS_EXPR, >TREE_TYPE (guard_init), >guard_init, enddiff); > > > 2017-03-08 Andrew Haley > > PR tree-optimization/79894 > * gcc.dg/tree-ssa/pr79943.c: New test. 
> > Index: gcc/testsuite/gcc.dg/tree-ssa/pr79943.c > === > --- gcc/testsuite/gcc.dg/tree-ssa/pr79943.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr79943.c (revision 0) > @@ -0,0 +1,40 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */ > +/* { dg-require-effective-target int32plus } */ > + > +#ifdef __cplusplus > +extern "C" void abort (void); > +#else > +extern void abort (void); > +#endif > + > +typedef struct { > + int n; > +} region_t; > + > +void set (region_t *region) __attribute__((noinline)); > +void doit (region_t *beg, region_t *end, region_t *limit) > + __attribute__((noinline)); > + > +region_t regions[10]; > + > +void > +set (region_t *region) { > + region->n = 1; > +} > + > +void > +doit (region_t *beg, region_t *end, region_t *limit) { > + for (region_t *cur = beg; cur < end; cur++) { > +if (cur < limit) { > + set(cur); > +} > + } > +} > + > +int > +main (void) { > + doit(&regions[0], &regions[2], &regions[10]); > + if (regions[1].n != 1) > +abort(); > +}
Re: [PATCH 2/5] Get bounds for a PARM_DECL (PR ipa/79761).
On Tue, Mar 7, 2017 at 5:07 PM, Martin Liškawrote: > On 03/07/2017 03:57 PM, Richard Biener wrote: >> On Thu, Mar 2, 2017 at 6:06 PM, marxin wrote: >>> gcc/ChangeLog: >>> >>> 2017-03-06 Martin Liska >>> >>> PR ipa/79761 >>> * tree-chkp.c (chkp_get_bound_for_parm): Get bounds for a param. >>> (chkp_find_bounds_1): Remove gcc_unreachable. >>> >>> gcc/testsuite/ChangeLog: >>> >>> 2017-03-06 Martin Liska >>> >>> PR ipa/79761 >>> * g++.dg/pr79761.C: New test. >>> --- >>> gcc/testsuite/g++.dg/pr79761.C | 34 ++ >>> gcc/tree-chkp.c| 3 +-- >>> 2 files changed, 35 insertions(+), 2 deletions(-) >>> create mode 100644 gcc/testsuite/g++.dg/pr79761.C >>> >>> diff --git a/gcc/testsuite/g++.dg/pr79761.C b/gcc/testsuite/g++.dg/pr79761.C >>> new file mode 100644 >>> index 000..b1f92d2b036 >>> --- /dev/null >>> +++ b/gcc/testsuite/g++.dg/pr79761.C >>> @@ -0,0 +1,34 @@ >>> +/* { dg-do compile { target { ! x32 } } } */ >>> +/* { dg-options "-fcheck-pointer-bounds -mmpx -mabi=ms" } */ >>> + >>> +struct Foo >>> +{ >>> + Foo() : a(1), b(1), c('a') {} >>> + int a; >>> + int b; >>> + char c; >>> +}; >>> + >>> +static Foo copy_foo(Foo) __attribute__((noinline, noclone)); >>> + >>> +static Foo copy_foo(Foo A) >>> +{ >>> + return A; >>> +} >>> + >>> +struct Bar : Foo >>> +{ >>> + Bar(Foo t) : Foo(copy_foo(t)) {} >>> +}; >>> + >>> +Foo F; >>> + >>> +int main (void) >>> +{ >>> + Bar B (F); >>> + >>> + if (B.a != 1 || B.b != 1 || B.c != 'a') >>> +__builtin_abort (); >>> + >>> + return 0; >>> +} >>> diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c >>> index 3d497f51ed8..d5683b1b9cf 100644 >>> --- a/gcc/tree-chkp.c >>> +++ b/gcc/tree-chkp.c >>> @@ -2353,7 +2353,7 @@ chkp_get_next_bounds_parm (tree parm) >>> static tree >>> chkp_get_bound_for_parm (tree parm) >>> { >>> - tree decl = SSA_NAME_VAR (parm); >>> + tree decl = TREE_CODE (parm) == PARM_DECL ? 
parm : SSA_NAME_VAR (parm); >>>tree bounds; >>> >>>gcc_assert (TREE_CODE (decl) == PARM_DECL); >>> @@ -3602,7 +3602,6 @@ chkp_find_bounds_1 (tree ptr, tree ptr_src, >>> gimple_stmt_iterator *iter) >>>break; >>> >>> case PARM_DECL: >>> - gcc_unreachable (); >>>bounds = chkp_get_bound_for_parm (ptr_src); >> >> But this is just useless work ... just do >> >> case PARM_DECL: >> /* Handled above but failed. */ >> break; > > Ok, let's return invalid bounds. Please see updated patch. Ok. > Martin > >> >> the SSA_NAME case is similarly redundantly calling >> chkp_get_registered_bounds. >> >> Richard. >> >>>break; >>> >>> -- >>> 2.11.1 >>> >>> >
Re: [PATCH 1/5] Fix *_CST ICEs connected to MPX.
On Tue, Mar 7, 2017 at 5:01 PM, Martin Liškawrote: > On 03/07/2017 03:53 PM, Richard Biener wrote: >> On Tue, Mar 7, 2017 at 3:48 PM, Martin Liška wrote: >>> On 03/07/2017 11:17 AM, Rainer Orth wrote: marxin writes: > diff --git a/gcc/testsuite/g++.dg/pr79769.C > b/gcc/testsuite/g++.dg/pr79769.C > new file mode 100644 > index 000..f9223db1b2d > --- /dev/null > +++ b/gcc/testsuite/g++.dg/pr79769.C > @@ -0,0 +1,4 @@ > +/* { dg-do compile { target { ! x32 } } } */ > +/* { dg-options "-fcheck-pointer-bounds -mmpx -mabi=ms" } */ ... and again: make this x86-only. Rainer >>> >>> Thanks. I'm sending v2 of the patch. >> >> Hmm, not sure why we should handle REAL_CST here explicitely for example. >> >> Why not, instead of internal_error in the default: case do >> >> bounds = chkp_get_invalid_op_bounds (); > > Because chkp_get_invalid_op_bounds() returns bounds that are always valid and > as it's > security extension, I would be strict here in order to not handle something > that can bypass > the checking. > >> >> there? For the testcase why do we invoke chkp_find_bounds_1 on sth that is >> a REAL_CST for example? > > It's called when setting bounds in a call expr: > > #0 chkp_find_bounds_1 (ptr=0x76a03720, ptr_src=0x76a03720, > iter=0x7fffd5d0) at ../../gcc/tree-chkp.c:3734 > #1 0x00ec7c7d in chkp_find_bounds (ptr=0x76a03720, > iter=0x7fffd5d0) at ../../gcc/tree-chkp.c:3768 > #2 0x00ec22e1 in chkp_add_bounds_to_call_stmt (gsi=0x7fffd5d0) > at ../../gcc/tree-chkp.c:1901 > #3 0x00ec9a1a in chkp_instrument_function () at > ../../gcc/tree-chkp.c:4344 > #4 0x00eca4cb in chkp_execute () at ../../gcc/tree-chkp.c:4528 So this happens for calls with mismatched arguments? 
I believe that if (BOUNDED_TYPE_P (type) || pass_by_reference (NULL, TYPE_MODE (type), type, true)) new_args.safe_push (chkp_find_bounds (call_arg, gsi)); may be misguided given pass_by_reference is not exposed at GIMPLE and thus for the pass_by_reference case it should fall back to "chkp_get_invalid_op_bounds" and thus eventually pass this down as a flag to chkp_find_bounds (or check on call_arg again). Note that here 'type' and call_arg may not match (asking for trouble). Plus 'fntype' is computed bogously (should use gimple_call_fntype). It later does sth like /* For direct calls fndecl is replaced with instrumented version. */ if (fndecl) { tree new_decl = chkp_maybe_create_clone (fndecl)->decl; gimple_call_set_fndecl (new_call, new_decl); /* In case of a type cast we should modify used function type instead of using type of new fndecl. */ if (gimple_call_fntype (call) != TREE_TYPE (fndecl)) { tree type = gimple_call_fntype (call); type = chkp_copy_function_type_adding_bounds (type); gimple_call_set_fntype (new_call, type); } else gimple_call_set_fntype (new_call, TREE_TYPE (new_decl)); but obviously if gimple_call_fntype and fndecl don't match cloning the mismatched decl isn't of very much help ... (it may just generate more mismatches with mismatching bounds...). > ... > > Martin > >> >> Richard. >> >>> Martin >
[PATCH] Loop splitting breaks with loops of pointer type
Loop splitting is fine when the control variable is of integer type, but when it is a pointer type the upper bound of the new loop is calculated incorrectly. The calculation should be guard_init + (end-beg), but instead we do guard_init - (end-beg). Fixed thusly. Bootstrapped, regtested. OK? Andrew. 2017-03-08 Andrew Haley PR tree-optimization/79894 * tree-ssa-loop-split.c (compute_new_first_bound): When calculating the new upper bound, (END-BEG) should be added, not subtracted. Index: gcc/tree-ssa-loop-split.c === --- gcc/tree-ssa-loop-split.c (revision 245948) +++ gcc/tree-ssa-loop-split.c (working copy) @@ -436,7 +436,6 @@ if (POINTER_TYPE_P (TREE_TYPE (guard_init))) { enddiff = gimple_convert (stmts, sizetype, enddiff); - enddiff = gimple_build (stmts, NEGATE_EXPR, sizetype, enddiff); newbound = gimple_build (stmts, POINTER_PLUS_EXPR, TREE_TYPE (guard_init), guard_init, enddiff); 2017-03-08 Andrew Haley PR tree-optimization/79894 * gcc.dg/tree-ssa/pr79943.c: New test. Index: gcc/testsuite/gcc.dg/tree-ssa/pr79943.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr79943.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr79943.c (revision 0) @@ -0,0 +1,40 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */ +/* { dg-require-effective-target int32plus } */ + +#ifdef __cplusplus +extern "C" void abort (void); +#else +extern void abort (void); +#endif + +typedef struct { + int n; +} region_t; + +void set (region_t *region) __attribute__((noinline)); +void doit (region_t *beg, region_t *end, region_t *limit) + __attribute__((noinline)); + +region_t regions[10]; + +void +set (region_t *region) { + region->n = 1; +} + +void +doit (region_t *beg, region_t *end, region_t *limit) { + for (region_t *cur = beg; cur < end; cur++) { +if (cur < limit) { + set(cur); +} + } +} + +int +main (void) { + doit(&regions[0], &regions[2], &regions[10]); + if (regions[1].n != 1) +abort(); +}
Re: stabilize store merging
On Wed, Mar 8, 2017 at 12:58 AM, Alexandre Oliva wrote: > Don't let pointer randomization change the order in which we process > store chains. This may cause SSA_NAMEs to be released in different > order, and if they're reused later, they may cause differences in SSA > partitioning, leading to differences in expand, and ultimately to > different code. > > bootstrap-debug-lean (-fcompare-debug) on i686-linux-gnu has failed in > haifa-sched.c since r245196 exposed the latent ordering problem in > store merging. In this case, the IR differences (different SSA names > selected for copies in out-of-SSA, resulting in some off-by-one > differences in pseudos) were not significant enough to be visible in > the compiler output. > > Regstrapped on x86_64-linux-gnu and i686-linux-gnu. Ok to install? Any reason you chose to maintain a new hash-map instead of at interesting places gather to-be-processed chains into a vector and sort that after IDs? Traversing the hash-map still doesn't get you a reliable order but only one depending on the chain creation and thus stmt walk order - plus the hash function and size - which may be good enough to fix the issue at hand of course. But it will for example still make testcase reduction fragile if shrinking the hash-map by removing unrelated code ends up processing things in different order. So - if we fix this can we fix it by properly sorting things? Thanks, Richard. > for gcc/ChangeLog > > * gimple-ssa-store-merging.c (INCLUDE_MAP): Define. > (struct imm_store_chain_info): Add seqno field and ctor parm. > (class pass_store_merging): Add m_store_seq field. > (pass_store_merging::terminate_and_process_all_chains): > Iterate over m_store_seq. > (pass_store_merging::terminate_all_aliasing_chains): > Likewise. > (pass_store_merging::terminate_and_release_chain): > Erase the m_store_seq entry. > (pass_store_merging::execute): Check for debug stmts first. Compute seqno, add the m_store_seq entry.
> --- > gcc/gimple-ssa-store-merging.c | 58 > +--- > 1 file changed, 42 insertions(+), 16 deletions(-) > > diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c > index 17ac94a..0bf7d89 100644 > --- a/gcc/gimple-ssa-store-merging.c > +++ b/gcc/gimple-ssa-store-merging.c > @@ -103,6 +103,7 @@ >[p + 4B] (16-bit) := 0xabcd; // val & 0x; */ > > #include "config.h" > +#define INCLUDE_MAP > #include "system.h" > #include "coretypes.h" > #include "backend.h" > @@ -675,11 +676,12 @@ merged_store_group::apply_stores () > > struct imm_store_chain_info > { > + unsigned seqno; >tree base_addr; >auto_vec m_store_info; >auto_vec m_merged_store_groups; > > - imm_store_chain_info (tree b_a) : base_addr (b_a) {} > + imm_store_chain_info (unsigned seq, tree b_a) : seqno (seq), base_addr > (b_a) {} >bool terminate_and_process_chain (); >bool coalesce_immediate_stores (); >bool output_merged_store (merged_store_group *); > @@ -718,6 +720,18 @@ public: > private: >hash_map m_stores; > > + /* Use m_store_seq to order elements in m_stores, and iterate over > + them in a predictable way. It maps sequence numbers to the base > + addresses stored as keys in m_stores. Using this order avoids > + extraneous differences in the compiler output just because of > + tree pointer variations (e.g. different chains end up in > + different positions of m_stores, so they are handled in different > + orders, so they allocate or release SSA names in different > + orders, and when they get reused, subsequent passes end up > + getting different SSA names, which may ultimately change > + decisions when going out of SSA). 
*/ > + std::map m_store_seq; > + >bool terminate_and_process_all_chains (); >bool terminate_all_aliasing_chains (imm_store_chain_info **, > bool, gimple *); > @@ -730,11 +744,16 @@ private: > bool > pass_store_merging::terminate_and_process_all_chains () > { > - hash_map ::iterator iter > -= m_stores.begin (); >bool ret = false; > - for (; iter != m_stores.end (); ++iter) > -ret |= terminate_and_release_chain ((*iter).second); > + for (std::map ::iterator next = m_store_seq.begin (), > +iter = next; iter != m_store_seq.end (); iter = next) > +{ > + next++; > + tree base_addr = (*iter).second; > + ret |= terminate_and_release_chain (*m_stores.get (base_addr)); > +} > + gcc_assert (m_stores.elements () == 0); > + gcc_assert (m_store_seq.empty ()); > >return ret; > } > @@ -799,15 +818,17 @@ pass_store_merging::terminate_all_aliasing_chains > (imm_store_chain_info > } > } > > -
[PATCH][ARM] PR target/79911: Invalid vec_select arguments
Hi all, This patch fixes the NEON patterns with Jakub's genrecog verification improvements: ../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select HImode and its operand QImode ../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select SImode and its operand QImode ../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select QImode and its operand HImode ../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select SImode and its operand HImode ../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select QImode and its operand SImode ../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select HImode and its operand SImode ../../gcc/config/arm/neon.md:1353:1: element mode mismatch between vec_select HImode and its operand QImode ../../gcc/config/arm/neon.md:1353:1: element mode mismatch between vec_select SImode and its operand QImode ../../gcc/config/arm/neon.md:1353:1: element mode mismatch between vec_select QImode and its operand HImode ../../gcc/config/arm/neon.md:1353:1: element mode mismatch between vec_select SImode and its operand HImode ../../gcc/config/arm/neon.md:1353:1: element mode mismatch between vec_select QImode and its operand SImode ../../gcc/config/arm/neon.md:1353:1: element mode mismatch between vec_select HImode and its operand SImode ../../gcc/config/arm/neon.md:1407:1: element mode mismatch between vec_select HImode and its operand QImode ../../gcc/config/arm/neon.md:1407:1: element mode mismatch between vec_select SImode and its operand QImode ../../gcc/config/arm/neon.md:1407:1: element mode mismatch between vec_select QImode and its operand HImode ../../gcc/config/arm/neon.md:1407:1: element mode mismatch between vec_select SImode and its operand HImode ../../gcc/config/arm/neon.md:1407:1: element mode mismatch between vec_select QImode and its operand SImode ../../gcc/config/arm/neon.md:1407:1: element mode mismatch between 
vec_select HImode and its operand SImode ../../gcc/config/arm/neon.md:1422:1: element mode mismatch between vec_select HImode and its operand QImode ../../gcc/config/arm/neon.md:1422:1: element mode mismatch between vec_select SImode and its operand QImode ../../gcc/config/arm/neon.md:1422:1: element mode mismatch between vec_select QImode and its operand HImode ../../gcc/config/arm/neon.md:1422:1: element mode mismatch between vec_select SImode and its operand HImode ../../gcc/config/arm/neon.md:1422:1: element mode mismatch between vec_select QImode and its operand SImode ../../gcc/config/arm/neon.md:1422:1: element mode mismatch between vec_select HImode and its operand SImode The problem is that the affected patterns use both the VQI iterator [V16QI V8HI V4SI] and the VW iterator [V8QI V4HI V2SI]. The intent was to use V8QI when VQI returned V16QI, V4HI when VQI returned V8HI etc. But using both iterators at the same time creates a cartesian product of combinations, so the iterators expanded to some patterns that are not valid, that is, patterns where the inner mode of the vec_select is not the same as the inner mode of its vector argument. This patch fixes that by using the appropriate V_HALF mode attribute instead of the VW iterator, so VQI is the only real mode iterator used in these patterns. I took care to leave the expanded name of each pattern the same so that their gen_* invocations from widen_ssum3 and widen_usum3 remain the same. With this patch the arm backend passes the genrecog validations. Bootstrapped and tested on arm-none-linux-gnueabihf. Ok for trunk? Thanks, Kyrill 2017-03-08 Kyrylo Tkachov PR target/79911 * config/arm/neon.md (vec_sel_widen_ssum_lo3): Rename to... (vec_sel_widen_ssum_lo3): ... This. Avoid mismatch between vec_select and vector argument. (vec_sel_widen_ssum_hi3): Rename to... (vec_sel_widen_ssum_hi3): ... This. Likewise. (vec_sel_widen_usum_lo3): Rename to... (vec_sel_widen_usum_lo3): ... This. (vec_sel_widen_usum_hi3): Rename to...
(vec_sel_widen_usum_hi3): ... This. diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index cf281df..50d89eb 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -1335,14 +1335,14 @@ (define_expand "widen_ssum3" } ) -(define_insn "vec_sel_widen_ssum_lo3" - [(set (match_operand: 0 "s_register_operand" "=w") - (plus: - (sign_extend: - (vec_select:VW +(define_insn "vec_sel_widen_ssum_lo3" + [(set (match_operand: 0 "s_register_operand" "=w") + (plus: + (sign_extend: + (vec_select: (match_operand:VQI 1 "s_register_operand" "%w") (match_operand:VQI 2 "vect_par_constant_low" ""))) - (match_operand: 3 "s_register_operand" "0")))] + (match_operand: 3 "s_register_operand" "0")))] "TARGET_NEON" { return BYTES_BIG_ENDIAN ? "vaddw.\t%q0, %q3, %f1" : @@ -1350,13 +1350,14 @@ (define_insn "vec_sel_widen_ssum_lo3" } [(set_attr
Re: [PATCH] Fix PRs 79955 and 79956 (hopefully)
On Wed, Mar 08, 2017 at 11:12:47AM +0100, Richard Biener wrote: > > The following avoids warning about completely out-of-bound accesses > as uninitialized. > > For GCC8 we likely want to enhance path isolation to catch those > cases (and eventually issue a diagnostic about them). And we > finally want to fix complete peeling to not introduce those > accesses. > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. LGTM, thanks. > 2017-03-08 Richard Biener> > PR tree-optimization/79955 > PR tree-optimization/79956 > * tree-ssa-uninit.c (warn_uninitialized_vars): Do not warn > for accesses that are completely outside of the variable. > > * gcc.dg/uninit-24.c: New testcase. Jakub
[PATCH] Fix PRs 79955 and 79956 (hopefully)
The following avoids warning about completely out-of-bound accesses as uninitialized. For GCC8 we likely want to enhance path isolation to catch those cases (and eventually issue a diagnostic about them). And we finally want to fix complete peeling to not introduce those accesses. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2017-03-08 Richard Biener PR tree-optimization/79955 PR tree-optimization/79956 * tree-ssa-uninit.c (warn_uninitialized_vars): Do not warn for accesses that are completely outside of the variable. * gcc.dg/uninit-24.c: New testcase. Index: gcc/tree-ssa-uninit.c === --- gcc/tree-ssa-uninit.c (revision 245968) +++ gcc/tree-ssa-uninit.c (working copy) @@ -287,6 +287,17 @@ warn_uninitialized_vars (bool warn_possi || TREE_NO_WARNING (base)) continue; + /* Do not warn if the access is fully outside of the +variable. */ + if (ref.size != -1 + && ref.max_size == ref.size + && (ref.offset + ref.size <= 0 + || (ref.offset >= 0 + && TREE_CODE (DECL_SIZE (base)) == INTEGER_CST + && compare_tree_int (DECL_SIZE (base), + ref.offset) <= 0))) + continue; + /* Limit the walking to a constant number of stmts after we overcommit quadratic behavior for small functions and O(n) behavior. */ Index: gcc/testsuite/gcc.dg/uninit-24.c === --- gcc/testsuite/gcc.dg/uninit-24.c(nonexistent) +++ gcc/testsuite/gcc.dg/uninit-24.c(working copy) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O -Wmaybe-uninitialized" } */ + +int foo (int x) +{ + int y; + if (x) +return *(&y + 1); /* { dg-bogus "may be used uninitialized" } */ + return 0; +}
Re: [PATCH] Small ubsan vector arith optimization to fix one testcase from PR sanitizer/79904
On Wed, Mar 08, 2017 at 11:06:38AM +0100, Jakub Jelinek wrote: > On Wed, Mar 08, 2017 at 11:03:53AM +0100, Richard Biener wrote: > > On Wed, 8 Mar 2017, Jakub Jelinek wrote: > > > > > On Wed, Mar 08, 2017 at 09:15:05AM +0100, Richard Biener wrote: > > > > Ok. Note that another option for the loopy case is to do > > > > > > > > for (;;) > > > > { > > > > vec >> by-one-elt; > > > > elt = BIT_FIELD_REF; > > > > } > > > > > > Indeed, that is a possibility, but I guess I'd need to construct > > > the result similarly if resv is non-NULL. Also, not sure about big endian > > > vectors and whether BIT_FIELD_REF with zero or size - elt_size is > > > more efficient there. > > > > > > In any case, the PR was about s390 without vectors enabled, so this > > > wouldn't > > > apply. > > > > > > > when whole-vector shifts are available (they are constructed by > > > > VEC_PERM_EXPR if vec_perm_const_ok for that mask). If you end up > > > > doing variable-index array accesses you can as well spill the > > > > vector to memory and use memory accesses on that. Not sure how > > > > to arrange that from this part of the expander. > > > > > > Shouldn't something else during the expansion force it into memory if it > > > is > > > more efficient to expand it that way? Apparently it is forced into memory > > > > Possibly - but it might end up spilling in the loop itself and thus be > > rather inefficient? > > Ok. That said, this is -fsanitize=undefined which slows down code anyway, > so making it more efficient can be done in GCC8 IMNSHO. Indeed -- I'd push out that to 8. Marek
Re: [PATCH] Small ubsan vector arith optimization to fix one testcase from PR sanitizer/79904
On Wed, Mar 08, 2017 at 11:03:53AM +0100, Richard Biener wrote: > On Wed, 8 Mar 2017, Jakub Jelinek wrote: > > > On Wed, Mar 08, 2017 at 09:15:05AM +0100, Richard Biener wrote: > > > Ok. Note that another option for the loopy case is to do > > > > > > for (;;) > > > { > > > vec >> by-one-elt; > > > elt = BIT_FIELD_REF; > > > } > > > > Indeed, that is a possibility, but I guess I'd need to construct > > the result similarly if resv is non-NULL. Also, not sure about big endian > > vectors and whether BIT_FIELD_REF with zero or size - elt_size is > > more efficient there. > > > > In any case, the PR was about s390 without vectors enabled, so this wouldn't > > apply. > > > > > when whole-vector shifts are available (they are constructed by > > > VEC_PERM_EXPR if vec_perm_const_ok for that mask). If you end up > > > doing variable-index array accesses you can as well spill the > > > vector to memory and use memory accesses on that. Not sure how > > > to arrange that from this part of the expander. > > > > Shouldn't something else during the expansion force it into memory if it is > > more efficient to expand it that way? Apparently it is forced into memory > > Possibly - but it might end up spilling in the loop itself and thus be > rather inefficient? Ok. That said, this is -fsanitize=undefined which slows down code anyway, so making it more efficient can be done in GCC8 IMNSHO. Jakub
Re: [PATCH] Small ubsan vector arith optimization to fix one testcase from PR sanitizer/79904
On Wed, 8 Mar 2017, Jakub Jelinek wrote: > On Wed, Mar 08, 2017 at 09:15:05AM +0100, Richard Biener wrote: > > Ok. Note that another option for the loopy case is to do > > > > for (;;) > > { > > vec >> by-one-elt; > > elt = BIT_FIELD_REF; > > } > > Indeed, that is a possibility, but I guess I'd need to construct > the result similarly if resv is non-NULL. Also, not sure about big endian > vectors and whether BIT_FIELD_REF with zero or size - elt_size is > more efficient there. > > In any case, the PR was about s390 without vectors enabled, so this wouldn't > apply. > > > when whole-vector shifts are available (they are constructed by > > VEC_PERM_EXPR if vec_perm_const_ok for that mask). If you end up > > doing variable-index array accesses you can as well spill the > > vector to memory and use memory accesses on that. Not sure how > > to arrange that from this part of the expander. > > Shouldn't something else during the expansion force it into memory if it is > more efficient to expand it that way? Apparently it is forced into memory Possibly - but it might end up spilling in the loop itself and thus be rather inefficient? > on s390 and the ICE is that the backend doesn't like something on it. Could be - as I said I didn't look into what the ICE actually is. Richard.
[PATCH] For broken exception handling in GDB on AIX platform
Hi, I got some review comments from Bernhard Reutner-Fischer, and I have updated the patch accordingly. This patch is for the bug opened here: https://sourceware.org/bugzilla/show_bug.cgi?id=21187 Please find the attachment below. Thanks and Regards, Nitish K Mishra. diff --git a/configure.ac b/configure.ac index 3ec86c1..c400251 100644 --- a/configure.ac +++ b/configure.ac @@ -471,6 +471,13 @@ ENABLE_LIBSTDCXX=default) noconfigdirs="$noconfigdirs target-libstdc++-v3" fi] +AC_ARG_ENABLE(staticlib, +AS_HELP_STRING([--disable-staticlib], + [do not link libstdc++ and libgcc library statically, default is static linking]), +ENABLE_STATICLIB=$enableval, +ENABLE_STATICLIB=yes) + + # If this is accelerator compiler and its target is intelmic we enable # target liboffloadmic by default. If this is compiler with offloading # for intelmic we enable host liboffloadmic by default. Otherwise @@ -1406,9 +1413,10 @@ if test -z "$LD"; then fi fi -# Check whether -static-libstdc++ -static-libgcc is supported. +# If ENABLE_STATICLIB is set for configuration, check whether -static-libstdc++ -static-libgcc is supported have_static_libs=no -if test "$GCC" = yes; then +if test "$ENABLE_STATICLIB" = yes; then + if test "$GCC" = yes; then saved_LDFLAGS="$LDFLAGS" LDFLAGS="$LDFLAGS -static-libstdc++ -static-libgcc" @@ -1424,6 +1432,7 @@ int main() {}], AC_LANG_POP(C++) LDFLAGS="$saved_LDFLAGS" + fi fi ACX_PROG_GNAT @@ -1741,6 +1750,9 @@ AC_ARG_WITH(stage1-ldflags, # trust that they are doing what they want. if test "$stage1_libs" = "" -a "$have_static_libs" = yes; then stage1_ldflags="-static-libstdc++ -static-libgcc" + else + # If static lib is disabled. + stage1_ldflags="" fi]) AC_SUBST(stage1_ldflags) @@ -1768,8 +1780,11 @@ AC_ARG_WITH(boot-ldflags, # In stages 2 and 3, default to linking libstdc++ and libgcc # statically. But if the user explicitly specified the libraries to # use, trust that they are doing what they want.
- if test "$poststage1_libs" = ""; then + if test "$poststage1_libs" = "" -a "$have_static_libs" = yes; then poststage1_ldflags="-static-libstdc++ -static-libgcc" + else + # If static library linking is disabled. + poststage1_ldflags="" fi]) AC_SUBST(poststage1_ldflags)
[Ada] Fix PR ada/79903
This (apparently) fixes the build of the runtime on RTEMS. Tested on x86-64/Linux, applied on mainline and 6 branch. 2017-03-08 Thanassis Tsiodras PR ada/79903 * socket.c (__gnat_gethostbyaddr): Add missing test for __rtems__. -- Eric Botcazou Index: socket.c === --- socket.c (revision 245628) +++ socket.c (working copy) @@ -202,7 +202,7 @@ __gnat_gethostbyaddr (const char *addr, struct hostent *rh; int ri; -#if defined(__linux__) || defined(__GLIBC__) +#if defined(__linux__) || defined(__GLIBC__) || defined(__rtems__) (void) gethostbyaddr_r (addr, len, type, ret, buf, buflen, &rh, h_errnop); #else rh = gethostbyaddr_r (addr, len, type, ret, buf, buflen, h_errnop);
Re: [PATCH] Small ubsan vector arith optimization to fix one testcase from PR sanitizer/79904
On Wed, Mar 08, 2017 at 09:15:05AM +0100, Richard Biener wrote: > Ok. Note that another option for the loopy case is to do > > for (;;) > { > vec >> by-one-elt; > elt = BIT_FIELD_REF; > } Indeed, that is a possibility, but I guess I'd need to construct the result similarly if resv is non-NULL. Also, not sure about big endian vectors and whether BIT_FIELD_REF with zero or size - elt_size is more efficient there. In any case, the PR was about s390 without vectors enabled, so this wouldn't apply. > when whole-vector shifts are available (they are constructed by > VEC_PERM_EXPR if vec_perm_const_ok for that mask). If you end up > doing variable-index array accesses you can as well spill the > vector to memory and use memory accesses on that. Not sure how > to arrange that from this part of the expander. Shouldn't something else during the expansion force it into memory if it is more efficient to expand it that way? Apparently it is forced into memory on s390 and the ICE is that the backend doesn't like something on it. Jakub
[Ada] Fix PR ada/79945
This adjusts the declaration of Default_Bit_Order for PowerPC/Linux to handle the new little-endian variant. And, on mainline only, this merges system.ads files for the big-endian & little-endian variants of ARM/Linux and MIPS/Linux, since their only difference is precisely the Default_Bit_Order setting. Tested on x86-64/Linux and PowerPC64/Linux, applied on mainline and 6 branch. 2017-03-08 Eric Botcazou PR ada/79945 * system-linux-ppc.ads (Default_Bit_Order): Use Standard's setting. * system-linux-arm.ads (Default_Bit_Order): Likewise. * system-linux-mips.ads (Default_Bit_Order): Likewise. * system-linux-armeb.ads: Delete. * system-linux-mipsel.ads: Likewise. * gcc-interface/Makefile.in (MIPS/Linux): Adjust. (ARM/Linux): Likewise. -- Eric Botcazou Index: gcc-interface/Makefile.in === --- gcc-interface/Makefile.in (revision 245767) +++ gcc-interface/Makefile.in (working copy) @@ -1871,12 +1871,7 @@ ifeq ($(strip $(filter-out mips% linux%, s-taspri.adshttp://www.gnu.org/licenses/>. -- --- -- --- GNAT was originally developed by the GNAT team at New York University. -- --- Extensive contributions were provided by Ada Core Technologies Inc. -- --- -- --- - -package System is - pragma Pure; - -- Note that we take advantage of the implementation permission to make - -- this unit Pure instead of Preelaborable; see RM 13.7.1(15). In Ada - -- 2005, this is Pure in any case (AI-362).
- - pragma No_Elaboration_Code_All; - -- Allow the use of that restriction in units that WITH this unit - - type Name is (SYSTEM_NAME_GNAT); - System_Name : constant Name := SYSTEM_NAME_GNAT; - - -- System-Dependent Named Numbers - - Min_Int : constant := Long_Long_Integer'First; - Max_Int : constant := Long_Long_Integer'Last; - - Max_Binary_Modulus: constant := 2 ** Long_Long_Integer'Size; - Max_Nonbinary_Modulus : constant := 2 ** Integer'Size - 1; - - Max_Base_Digits : constant := Long_Long_Float'Digits; - Max_Digits: constant := Long_Long_Float'Digits; - - Max_Mantissa : constant := 63; - Fine_Delta: constant := 2.0 ** (-Max_Mantissa); - - Tick : constant := 0.000_001; - - -- Storage-related Declarations - - type Address is private; - pragma Preelaborable_Initialization (Address); - Null_Address : constant Address; - - Storage_Unit : constant := 8; - Word_Size: constant := Standard'Word_Size; - Memory_Size : constant := 2 ** Word_Size; - - -- Address comparison - - function "<" (Left, Right : Address) return Boolean; - function "<=" (Left, Right : Address) return Boolean; - function ">" (Left, Right : Address) return Boolean; - function ">=" (Left, Right : Address) return Boolean; - function "=" (Left, Right : Address) return Boolean; - - pragma Import (Intrinsic, "<"); - pragma Import (Intrinsic, "<="); - pragma Import (Intrinsic, ">"); - pragma Import (Intrinsic, ">="); - pragma Import (Intrinsic, "="); - - -- Other System-Dependent Declarations - - type Bit_Order is (High_Order_First, Low_Order_First); - Default_Bit_Order : constant Bit_Order := High_Order_First; - pragma Warnings (Off, Default_Bit_Order); -- kill constant condition warning - - -- Priority-related Declarations (RM D.1) - - -- 0 .. 98 corresponds to the system priority range 1 .. 99. - -- - -- If the scheduling policy is SCHED_FIFO or SCHED_RR the runtime makes use - -- of the entire range provided by the system. 
- -- - -- If the scheduling policy is SCHED_OTHER the only valid system priority - -- is 1 and other values are simply ignored. - - Max_Priority : constant Positive := 97; - Max_Interrupt_Priority : constant Positive := 98; - - subtype Any_Priority is Integer range 0 .. 98; - subtype Priority is Any_Priority range 0 .. 97; - subtype Interrupt_Priority is Any_Priority range 98 .. 98; - - Default_Priority : constant Priority := 48; - -private - - type Address is mod Memory_Size; - Null_Address : constant Address := 0; - - -- - -- System Implementation Parameters -- - -- - - -- These parameters provide information about the target that is used - -- by the compiler. They are in the private part of System, where they - -- can be accessed using the special circuitry in the Targparm unit - -- whose source should be consulted for more detailed descriptions - -- of the individual switch
[committed] Fix avr build
Hi! Richard reported that after my PR79345 r245833 fix avr no longer builds: ../../gcc/config/avr/avr.md:1060:1: define_expand for setmemhi needs to have match_scratch numbers above all other operands The following patch ought to fix that, committed as obvious to trunk. 2017-03-08 Jakub Jelinek * config/avr/avr.md (setmemhi): Make sure match_dup operand number comes before match_scratch. --- gcc/config/avr/avr.md.jj2017-02-06 13:32:14.0 +0100 +++ gcc/config/avr/avr.md 2017-03-08 09:46:39.828773285 +0100 @@ -1062,8 +1062,8 @@ (define_expand "setmemhi" (match_operand 2 "const_int_operand" "")) (use (match_operand:HI 1 "const_int_operand" "")) (use (match_operand:HI 3 "const_int_operand" "")) - (clobber (match_scratch:HI 4 "")) - (clobber (match_dup 5))])] + (clobber (match_scratch:HI 5 "")) + (clobber (match_dup 4))])] "" { rtx addr0; @@ -1077,7 +1077,7 @@ (define_expand "setmemhi" FAIL; mode = u8_operand (operands[1], VOIDmode) ? QImode : HImode; -operands[5] = gen_rtx_SCRATCH (mode); +operands[4] = gen_rtx_SCRATCH (mode); operands[1] = copy_to_mode_reg (mode, gen_int_mode (INTVAL (operands[1]), mode)); addr0 = copy_to_mode_reg (Pmode, XEXP (operands[0], 0)); Jakub
[PATCH] Fix PR79920
The following fixes a bug in vect_transform_slp_perm_load which tries to be clever in computing sth like "ncopies" but fails to do that correctly (and in fact it can't be done). Instead just compute all loads/permutations manually. The fix is as simple as Index: tree-vect-slp.c === --- tree-vect-slp.c (revision 245279) +++ tree-vect-slp.c (working copy) @@ -3412,7 +3412,7 @@ vect_transform_slp_perm_load (slp_tree n int second_vec_index = -1; bool noop_p = true; - for (int j = 0; j < unroll_factor; j++) + for (int j = 0; j < vf; j++) { for (int k = 0; k < group_size; k++) { @@ -3486,7 +3486,7 @@ vect_transform_slp_perm_load (slp_tree n vect_create_mask_and_perm (stmt, mask_vec, first_vec_index, second_vec_index, gsi, node, vectype, dr_chain, -ncopies, vect_stmts_counter++); +1, vect_stmts_counter++); } index = 0; but it allows dead code to be removed and the now pointless helper vect_create_mask_and_perm to vanish. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2017-03-08 Richard Biener PR tree-optimization/79920 * tree-vect-slp.c (vect_create_mask_and_perm): Remove and inline with ncopies == 1 to ... (vect_transform_slp_perm_load): ... here. Properly compute all element loads by iterating VF times over the group. Do not handle ncopies (computed in a broken way) in vect_create_mask_and_perm. * gcc.dg/vect/pr79920.c: New testcase. Index: gcc/tree-vect-slp.c === *** gcc/tree-vect-slp.c (revision 245947) --- gcc/tree-vect-slp.c (working copy) *** vect_get_slp_defs (vec ops, slp_tr *** 3379,3444 } } - - /* Create NCOPIES permutation statements using the mask MASK_BYTES (by -building a vector of type MASK_TYPE from it) and two input vectors placed in -DR_CHAIN at FIRST_VEC_INDX and SECOND_VEC_INDX for the first copy and -shifting by STRIDE elements of DR_CHAIN for every copy. -(STRIDE is the number of vectorized stmts for NODE divided by the number of -copies).
-VECT_STMTS_COUNTER specifies the index in the vectorized stmts of NODE, where -the created stmts must be inserted. */ - - static inline void - vect_create_mask_and_perm (gimple *stmt, -tree mask, int first_vec_indx, int second_vec_indx, -gimple_stmt_iterator *gsi, slp_tree node, -tree vectype, vec dr_chain, -int ncopies, int vect_stmts_counter) - { - tree perm_dest; - gimple *perm_stmt = NULL; - int i, stride_in, stride_out; - tree first_vec, second_vec, data_ref; - - stride_out = SLP_TREE_NUMBER_OF_VEC_STMTS (node) / ncopies; - stride_in = dr_chain.length () / ncopies; - - /* Initialize the vect stmts of NODE to properly insert the generated - stmts later. */ - for (i = SLP_TREE_VEC_STMTS (node).length (); -i < (int) SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++) - SLP_TREE_VEC_STMTS (node).quick_push (NULL); - - perm_dest = vect_create_destination_var (gimple_assign_lhs (stmt), vectype); - for (i = 0; i < ncopies; i++) - { - first_vec = dr_chain[first_vec_indx]; - second_vec = dr_chain[second_vec_indx]; - - /* Generate the permute statement if necessary. */ - if (mask) - { - perm_stmt = gimple_build_assign (perm_dest, VEC_PERM_EXPR, - first_vec, second_vec, mask); - data_ref = make_ssa_name (perm_dest, perm_stmt); - gimple_set_lhs (perm_stmt, data_ref); - vect_finish_stmt_generation (stmt, perm_stmt, gsi); - } - else - /* If mask was NULL_TREE generate the requested identity transform. */ - perm_stmt = SSA_NAME_DEF_STMT (first_vec); - - /* Store the vector statement in NODE. */ - SLP_TREE_VEC_STMTS (node)[stride_out * i + vect_stmts_counter] - = perm_stmt; - - first_vec_indx += stride_in; - second_vec_indx += stride_in; - } - } - - /* Generate vector permute statements from a list of loads in DR_CHAIN. 
If ANALYZE_ONLY is TRUE, only check that it is possible to create valid permute statements for the SLP node NODE of the SLP instance --- 3379,3384 *** vect_transform_slp_perm_load (slp_tree n *** 3456,3462 int nunits, vec_index = 0; tree vectype = STMT_VINFO_VECTYPE (stmt_info); int group_size = SLP_INSTANCE_GROUP_SIZE (slp_node_instance); ! int unroll_factor, mask_element, ncopies; unsigned char *mask;
Re: [PATCH] Small ubsan vector arith optimization to fix one testcase from PR sanitizer/79904
On Tue, 7 Mar 2017, Jakub Jelinek wrote: > Hi! > > If any of the operands of the UBSAN_{ADD,SUB,MUL}_OVERFLOW ifn with > vector operand is a uniform vector, expanding it as VCE on the VECTOR_CST > followed by ARRAY_REF with variable index in the loop is unnecessarily > expensive and nothing fixes it afterwards. > This works around ICE on s390 at least for the uniform vectors. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Note that another option for the loopy case is to do for (;;) { vec >> by-one-elt; elt = BIT_FIELD_REF; } when whole-vector shifts are available (they are constructed by VEC_PERM_EXPR if vec_perm_const_ok for that mask). If you end up doing variable-index array accesses you can as well spill the vector to memory and use memory accesses on that. Not sure how to arrange that from this part of the expander. Unsure what the ICE you see with s390 is about... Thanks, Richard. > 2017-03-07 Jakub Jelinek > > PR sanitizer/79904 > * internal-fn.c (expand_vector_ubsan_overflow): If arg0 or arg1 > is a uniform vector, use uniform_vector_p return value instead of > building ARRAY_REF on folded VIEW_CONVERT_EXPR to array type. > > * gcc.dg/ubsan/pr79904.c: New test. 
> > --- gcc/internal-fn.c.jj 2017-02-23 08:48:40.0 +0100 > +++ gcc/internal-fn.c 2017-03-07 11:55:56.261465702 +0100 > @@ -1869,12 +1869,20 @@ expand_vector_ubsan_overflow (location_t >if (cnt > 4) > { > tree atype = build_array_type_nelts (eltype, cnt); > - op0 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg0); > - op0 = build4_loc (loc, ARRAY_REF, eltype, op0, cntv, > - NULL_TREE, NULL_TREE); > - op1 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg1); > - op1 = build4_loc (loc, ARRAY_REF, eltype, op1, cntv, > - NULL_TREE, NULL_TREE); > + op0 = uniform_vector_p (arg0); > + if (op0 == NULL_TREE) > + { > + op0 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg0); > + op0 = build4_loc (loc, ARRAY_REF, eltype, op0, cntv, > + NULL_TREE, NULL_TREE); > + } > + op1 = uniform_vector_p (arg1); > + if (op1 == NULL_TREE) > + { > + op1 = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, arg1); > + op1 = build4_loc (loc, ARRAY_REF, eltype, op1, cntv, > + NULL_TREE, NULL_TREE); > + } > if (resv) > { > res = fold_build1_loc (loc, VIEW_CONVERT_EXPR, atype, resv); > --- gcc/testsuite/gcc.dg/ubsan/pr79904.c.jj 2017-03-07 11:58:53.266120958 > +0100 > +++ gcc/testsuite/gcc.dg/ubsan/pr79904.c 2017-03-07 11:58:46.0 > +0100 > @@ -0,0 +1,11 @@ > +/* PR sanitizer/79904 */ > +/* { dg-do compile } */ > +/* { dg-options "-fsanitize=signed-integer-overflow -Wno-psabi" } */ > + > +typedef signed char V __attribute__((vector_size (8))); > + > +void > +foo (V *a) > +{ > + *a = *a * 3; > +} > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH] Increment value instead of a pointer in ADA macro processing.
> I prepared quite obvious fix for that and tested make check -k > RUNTESTFLAGS="dg.exp=dump-ada-spec-*" on x86_64-linux-gnu. Thanks for spotting and fixing it. Please put the fix on the 6 branch too. -- Eric Botcazou