Re: [PATCH version 4, rs6000] Add builtins to convert from float/double to int/long using current rounding mode
Hi Carl, Some comments, mostly trivial: On Thu, Sep 21, 2017 at 04:09:11PM -0700, Carl Love wrote: > --- a/gcc/config/rs6000/rs6000-builtin.def > +++ b/gcc/config/rs6000/rs6000-builtin.def > @@ -608,6 +608,16 @@ > CODE_FOR_ ## ICODE) /* ICODE */ > > > +/* Miscellaneous builtins for instructions added prior to ISA 2.04. These > + operate on floating point registers. */ "operate" should align with "Miscellaneous". > +#define BU_FP_MISC_1(ENUM, NAME, ATTR, ICODE) > \ > + RS6000_BUILTIN_1 (MISC_BUILTIN_ ## ENUM, /* ENUM */ \ > + "__builtin_" NAME, /* NAME */ \ > +RS6000_BTM_HARD_FLOAT, /* MASK */ \ > + (RS6000_BTC_ ## ATTR/* ATTR */ \ > + | RS6000_BTC_UNARY), \ > + CODE_FOR_ ## ICODE) /* ICODE */ RS6000_BTM_HARD_FLOAT needs to be indented one extra space, too. > +(define_insn "lrintsfsi2" > + [(set (match_operand:SI 0 "gpc_reg_operand" "=d") > + (unspec:SI [(match_operand:DF 1 "gpc_reg_operand" "d")] > +UNSPEC_FCTIW))] > + "TARGET_SF_FPR && TARGET_FPRND" > + "fctiw %0,%1" > + [(set_attr "type" "fp")]) Should this be match_operand:SF? Okay for trunk if that works. Thanks, Segher
Re: [PATCH, rs6000] folding of vector stores in GIMPLE
On Fri, Sep 22, 2017 at 02:58:54PM -0500, Bill Schmidt wrote: > OK. Will, for now, let's again use the (void *) solution for the time being, > and > add commentary recording this improvement for future work. Same would > go for the vec_vsx_ld/st variations once you get to those. > > Otherwise the patch looks ok to me. I'll defer to Segher for approval, of > course. It's okay for trunk with the suggested improvements. Thanks for the review! Segher
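[Editor's note] For readers following the "(void *) solution" referred to above, here is a minimal hedged sketch (not the committed patch; the names addr, arg0, loc and gsi are assumptions about the surrounding folding code) of how a gimple-folded vec_st can build the store with a void-pointer alias type:

/* Hedged sketch: emit MEM[(void *)addr] = arg0 while folding vec_st.
   The type of the MEM_REF's constant offset operand (here ptr_type_node)
   is what determines the alias set, so this yields the conservative
   "(void *)" access discussed in the thread.  addr (the masked address),
   arg0 (the vector value), loc and gsi are assumed to be set up by the
   surrounding gimple-folding code.  */
tree arg2_type = ptr_type_node;
tree ref = build2 (MEM_REF, TREE_TYPE (arg0), addr,
                   build_int_cst (arg2_type, 0));
gimple *g = gimple_build_assign (ref, arg0);
gimple_set_location (g, loc);
gsi_replace (gsi, g, true);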
Re: [PATCH, rs6000] testcase coverage for vector store builtins
On Thu, Sep 21, 2017 at 01:11:11PM -0500, Will Schmidt wrote: > Add testcase coverage for the vec_st (vector store) > intrinsic builtins. > > Tested across power platforms (p6 and newer). OK for trunk? Yep, looks fine. Thanks! Segher > 2017-09-21 Will Schmidt > > * gcc.target/powerpc/fold-vec-st-char.c: New. > * gcc.target/powerpc/fold-vec-st-double.c: New. > * gcc.target/powerpc/fold-vec-st-float.c: New. > * gcc.target/powerpc/fold-vec-st-int.c: New. > * gcc.target/powerpc/fold-vec-st-longlong.c: New. > * gcc.target/powerpc/fold-vec-st-pixel.c: New. > * gcc.target/powerpc/fold-vec-st-short.c: New.
Re: [Patch] Edit contrib/ files to download gfortran prerequisites
On Thu, 21 Sep 2017 09:40:49, Richard Biener wrote:
> On Wed, Sep 20, 2017 at 10:35 PM, Damian Rouson wrote:
>> Attached is a patch that adds the downloading of gfortran prerequisites OpenCoarrays and MPICH in the contrib/download_prerequisites script. The patch also provides a useful error message when neither wget or curl are available on the target platform. I tested this patch with several choices for the command-line options on macOS (including --md5 and --sha512) and Ubuntu Linux (including --sha512). A suggested ChangeLog entry is
>>
>> * contrib/download_prerequisites: Download OpenCoarrays and MPICH.
>> * contrib/prerequisites.sha5: Add sha512 message digests for OpenCoarrays and MPICH.
>> * contrib/prerequisites.md5: Add md5 message digests for OpenCoarrays and MPICH.
>>
>> OK for trunk? If so, I’ll ask Jerry to commit this. I don’t have commit rights.
>
> Can you make this optional similar to graphite/isl? Also I see no support in the toplevel build machinery to build/install the libs as part of GCC so how does that work in the end?

In the end gcc needs to know what to do and just do it, just like mpfr and libquadmath. I started looking at Makefile.def and I am guessing this is where the changes need to be made to enable the gcc machinery to actually build the two packages. I am not knowledgeable enough to figure this out and I recently got a new job, so I am now very time constrained as far as reverse engineering what is going on. I did manage to add GNU Build System autotools features to OpenCoarrays so we could be closer to integrating this in (./configure && make vs cmake).

We definitely need some guidance from someone regarding the changes required for the top-level gcc items. Also we are getting some libtool-related build failures with OpenCoarrays on Mac and some other platforms where we know gcc builds fine, so I suspect we need to build a libtool script or something to ensure that it works everywhere. It works fine on Fedora, which is what I use (naturally).

So if anyone would step up to the plate and advise on the toplevel machinery it would be greatly appreciated. Regards, Jerry
Re: [PATCH, rs6000 version 3] Add support for vec_xst_len_r() and vec_xl_len_r() builtins
Hi Carl, On Mon, Sep 18, 2017 at 11:31:07AM -0700, Carl Love wrote: > * gcc.target/powerpc/builtins-5-p9-runnable.c: Add new runable test file > for the new built-ins and the existing built-ins. Typo ("runable"). > (define_expand "altivec_lvsr" > - [(use (match_operand:V16QI 0 "register_operand" "")) > + [(use (match_operand:V16QI 0 "altivec_register_operand" "")) > (use (match_operand:V16QI 1 "memory_operand" ""))] Empty constraint strings in define_expand is the default, just leave them out. > +;; Expand for builtin xl_len_r > +(define_expand "xl_len_r" > + [(match_operand:V16QI 0 "vsx_register_operand" "=wa") > + (match_operand:DI 1 "register_operand" "b") > + (match_operand:DI 2 "register_operand" "r")] > + "" Non-empty constraints in an expander do not really make sense either :-) All the rest looks fine. Please fix up the expanders and commit. Thanks! Segher
Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.
On Fri, Sep 22, 2017 at 11:39 AM, Jim Wilson wrote: > On Fri, Sep 22, 2017 at 10:58 AM, Andrew Pinski wrote: >> Two overall comments: >> * What about splitting register_offset into two different elements, >> one for non 128bit modes and one for 128bit (and more; OI, etc.) modes >> so you get better address generation right away for the simd load >> cases rather than having LRA/reload having to reload the address into >> a register. > > I'm not sure if changing register_offset cost would make a difference, > since costs are usually used during optimization, not during address > generation. This is something that I didn't think to try though. I > can try taking a look at this. It does taken into account when fwprop is propagating the addition into the MEM (the tree level is always a_1 = POINTER_PLUS_EXPR; MEM_REF(a_1)). IV-OPTS will produce much better code if the address_cost is correct. It looks like no other pass (combine, etc.) would take that into account except for postreload CSE but maybe they should. > > I did try writing a patch to modify predicates to disallow reg offset > for 128bit modes, and that got complicated, as I had to split apart a > number of patterns in the aarch64-simd.md file that accept both VD and > VQ modes. I ended up with a patch 3-4 times as big as the one I > submitted, without any additional performance improvement, so it > wasn't worth the trouble. > >> * Maybe adding a testcase to the testsuite to show this change. > > Yes, I can add a testcase. > >> One extra comment: >> * should we change the generic tuning to avoid reg+reg for 128bit modes? > > Are there other targets with a similar problem? I only know that it > is a problem for Falkor. It might be a loss for some targets as it is > replacing one instruction with two. Well that is why I was suggesting the address cost model change. Because the cost model change actually might provide better code in the first place and still allow for reasonable generic code to be produced. Thanks, Andrew > > Jim
Re: [PATCH] Fix PR82144
On Sep 21, 2017, Richard Biener wrote:
> On Tue, 12 Sep 2017, Richard Biener wrote:
>>
>> The following avoids adding DW_AT_alignment twice by not doing it
>> for incomplete types.
>>
>> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>>
>> Alex, is that ok or do we want DW_AT_alignment for incomplete types as well?
>
> Alex, ping?

Sorry, still catching up after Cauldron and recovery from the bug I caught at the conference ;-) Since incomplete enum types are non-standard, there's no standard to guide us. Pointers (the primary viable use of such types) generally benefit from alignment information, but I don't see how the alignment could be meaningfully determined for such types. For debug information, we're probably better off leaving any tentative or uninitialized alignment information out of the DIE, so the patch looks good to me. Thanks for your patience!

Reviewed-by: Alexandre Oliva

>> PR middle-end/82144
>> * dwarf2out.c (gen_enumeration_type_die): Do not add alignment
>> attribute for incomplete types nor twice for complete ones.

--
Alexandre Oliva, freedom fighter  http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/  FSF Latin America board member
Free Software Evangelist | Red Hat Brasil  GNU Toolchain Engineer
Re: [C++] Fix PR bootstrap/81926
> This looks good. But let's call it debug_type_hash instead. OK with > that change. Thanks. It occured to me that we don't need to look up the type twice when we need to insert it so I have installed the attached patchlet as obvious after boostrapping/regtesting on x86-64/Linux. PR bootstrap/81926 * cp-objcp-common.c (cp_get_debug_type): Do only one lookup. -- Eric BotcazouIndex: cp-objcp-common.c === --- cp-objcp-common.c (revision 253049) +++ cp-objcp-common.c (working copy) @@ -162,13 +162,13 @@ cp_get_debug_type (const_tree type) types on the fly for the debug info only, they would not be attached to any GC root and always be swept, so we would make the contents of the debug info depend on the collection points. */ - struct tree_map in, *h; + struct tree_map in, *h, **slot; in.base.from = CONST_CAST_TREE (type); in.hash = htab_hash_pointer (type); - h = debug_type_hash->find_with_hash (&in, in.hash); - if (h) - return h->to; + slot = debug_type_hash->find_slot_with_hash (&in, in.hash, INSERT); + if (*slot) + return (*slot)->to; tree t = build_offset_type (TYPE_PTRMEMFUNC_OBJECT_TYPE (type), TREE_TYPE (TYPE_PTRMEMFUNC_FN_TYPE (type))); @@ -177,7 +177,7 @@ cp_get_debug_type (const_tree type) h->base.from = CONST_CAST_TREE (type); h->hash = htab_hash_pointer (type); h->to = t; - *debug_type_hash->find_slot_with_hash (h, h->hash, INSERT) = h; + *slot = h; return t; }
Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.
Hi Jim, This looks like a general issue with reg+reg addressing modes being generated in cases where it is not correct. I haven't looked at lmbench for a while, but it generated absolutely horrible code like: add x1, x0, #120 ldr v0, [x2, x1] add x1, x0, #128 ldr v1, [x2, x1] If this is still happening, we need to fix this in a general way as this is bad for any CPU even if reg+reg is fast. Reduced testcases for this would be welcome as it's likely many other targets are affected. A while ago I posted a patch to reassociate (x + C1) * C2 -> x * C2 + C3 to improve cases like this. Note we already support adjusting addressing costs. There are several other CPUs which increase the reg+reg costs. So that's where I would start - assuming reg+reg addressing mode was correctly used. Finally, if you really want to completely disable a particular addressing mode, it's best done in classify_address rather than changing the md patterns. My experience is that if you use anything other than the standard 'm' constraint, GCC reload starts to generate inefficient code even if the pattern should still apply. I have posted several patches to use 'm' more to get better spill code and efficient expansions if the offset happens to be too large. Wilco
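[Editor's note] A hedged sketch of the kind of source that tends to produce the sequence Wilco quotes (a constant added into a register that is then used as a register offset); this is an illustrative reduction, not a testcase from lmbench, and assumes indices of the form "common index plus small constants":

/* With indices i + 15 and i + 16, the scaled index i*8 is shared and each
   access adds its constant byte offset (120, 128) into a register that is
   then used in a reg+reg address.  Reassociating (i + 15) * 8 into
   i * 8 + 120 would instead let the constant become an immediate offset
   on the memory access itself.  */
void
copy_pair (double *dst, const double *src, long i)
{
  dst[0] = src[i + 15];
  dst[1] = src[i + 16];
}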
Re: [PATCH, rs6000] folding of vector stores in GIMPLE
On Sep 22, 2017, at 5:27 AM, Richard Biener wrote: > > On Thu, Sep 21, 2017 at 10:56 PM, Will Schmidt > wrote: >> Hi, >> >> Folding of vector stores in GIMPLE. >> >> - Add code to handle gimple folding for the vec_st (vector store) builtins. >> - Remove the now obsoleted folding code for vec_st from rs6000-c-c. >> >> There are two spots that I could use some feedback on. >> >> First - >> An early exit remains in place prevent folding of statements that do not >> have a LHS. To allow folding of the stores to get past the check, I have >> added a helper function (rs6000_builtin_valid_without_lhs) that allows >> those store intrinsics to proceed. I'm not sure the approach (or the name I >> chose) >> is the best choice, so I'll defer to recommendations on how to improve that. >> :-) It's fine, but please make the helper function static. >> >> Second - >> This code (as-is) is subject to a TBAA related issue (similar to what was >> noticed >> in the gimple folding code for loads. As-is, with a testcase such as : >> >> void testst_struct1b (vector double vd1, long long ll1, struct S *p) >>{ >>vec_st (vd1, ll1, (vector double *)p); >>} >> >> will generate gimple that looks like: >>MEM[(struct S *)D.3218] = vd1; >> >> If I rework the code, setting arg2_type to be ptr_type_node, i.e. >> + tree arg2_type = TREE_TYPE (arg2); >> to: >> + tree arg2_type = ptr_type_node; >> >> the generated gimple then looks like >> MEM[(void *)D.3218] = vd1; >> >> Which is probably OK, but I cannot say for certain. The generated .s >> content is at least equivalent. > > It looks safe. > > The question I had is whether vec_st (vd1, ll1, (vector double *)p) is > equivalent > to *(vector double *)((char *)p + ll1) = vd1; from a TBAA perspective. Thus > whether the type of the tird argument to vec_st defines the type of the access > (in C standards meaning). If so then what we do now is pessimizing (but > as you say previously (long long *) and (long *) were aliased together and > you got wrong-code with aliasing with regular stores of the "wrong" same > type). > > A proper fix would be to transition this type as seen from the frontend to > GIMPLE, for example in a similar way we do for MEM_REFs by piggy-backing > that on an extra argument, a constant zero pointer of the alias > pointer type to use > (which would also serve as a type indicator of the store itself). I'd use a > target specific internal function for this (not sure if we can have those > target > specific, but I guess if it's folded away then that's fine) and get away with > the overload set. OK. Will, for now, let's again use the (void *) solution for the time being, and add commentary recording this improvement for future work. Same would go for the vec_vsx_ld/st variations once you get to those. Otherwise the patch looks ok to me. I'll defer to Segher for approval, of course. Thanks, Bill > > Richard. > >> The resulting code is verified by testcases powerpc/fold-vec-st-*.c, which >> has been posted separately. >> >> regtest looks clean on power6 and newer. >> >> pending feedback, OK for trunk? >> >> Thanks, >> -Will >> >>[gcc] >> >>2017-09-21 Will Schmidt >> >>* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling >>for early folding of vector stores (ALTIVEC_BUILTIN_ST_*). >>(rs6000_builtin_valid_without_lhs): helper function. >>* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): >>Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_ST. 
>> >> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c >> index a49db97..4a363a1 100644 >> --- a/gcc/config/rs6000/rs6000-c.c >> +++ b/gcc/config/rs6000/rs6000-c.c >> @@ -6470,82 +6470,10 @@ altivec_resolve_overloaded_builtin (location_t loc, >> tree fndecl, >> convert (TREE_TYPE (stmt), arg0)); >> stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl); >> return stmt; >> } >> >> - /* Expand vec_st into an expression that masks the address and >> - performs the store. We need to expand this early to allow >> - the best aliasing, as by the time we get into RTL we no longer >> - are able to honor __restrict__, for example. We may want to >> - consider this for all memory access built-ins. >> - >> - When -maltivec=be is specified, or the wrong number of arguments >> - is provided, simply punt to existing built-in processing. */ >> - >> - if (fcode == ALTIVEC_BUILTIN_VEC_ST >> - && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG) >> - && nargs == 3) >> -{ >> - tree arg0 = (*arglist)[0]; >> - tree arg1 = (*arglist)[1]; >> - tree arg2 = (*arglist)[2]; >> - >> - /* Construct the masked address. Let existing error handling take >> -over if we don't have a constant offset. */ >> - arg1 = fold (arg1); >> - >> - if (TREE_CODE (arg1) == INTEGER_CST) >> - { >> -
Re: [PATCH][aarch64] Fix error calls in aarch64 code so they can be translated
On 09/22/2017 11:54 AM, Steve Ellcey wrote: In my view, the ideal message would use the same form as in the source code. But I suspect that's not easy to determine at all sites where the message is being formatted, so the next best thing is to avoid using the keyword and fall back on the general term. Martin PS If possible, I would also suggest adopting a style that is already in use elsewhere, for consistency. (That would suggest using % with no spaces and no "string" after arch=.) Here is a new version of my patch. I think it addresses these issues and is consistent. It does cause some regressions in tests that check for certain error messages, specifically: gcc.target/aarch64/spellcheck_1.c gcc.target/aarch64/spellcheck_2.c gcc.target/aarch64/spellcheck_3.c gcc.target/aarch64/target_attr_11.c gcc.target/aarch64/target_attr_12.c gcc.target/aarch64/target_attr_17.c But I thought we should try and get some agreement on the format of the messages before updating the tests. If we think these changes look good I can submit a final patch that includes the testsuite changes. Thank you for taking my suggestions! The changes look good to me. Just one minor nit. The following won't produce the kind of output we'd like it to: error (is_pragma ? G_("pragma % does not accept an argument") : G_("attribute % does not accept an argument"), str_to_check); It will end up printing: error: pragma ‘target(‘...’)’ does not accept an argument (note the single quotes around the ... where the syntax otherwise calls for a string). The %s directive should not use the 'q' flag when nested between %< and %>. -Wformat normally warns about this but in this case it doesn't see the format string because of the conditional (I think -Wformat can be extended to handle this case). warning: 'q' flag used within a quoted sequence [-Wformat=] To get the right output I think the format string should like something like this: "pragma % does not accept an argument" Martin
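[Editor's note] A hedged sketch of the style Martin describes; the exact message wording is an assumption, and the point is the conditional G_()-wrapped format strings with a plain %s (rather than %qs) inside the %<...%> quoted sequence:

/* Hedged sketch: both branches are wrapped in G_() so each full format
   string can be translated, and the keyword is quoted with %<...%> using
   a plain %s so no extra quotes appear inside the quoted sequence.
   is_pragma and str_to_check are the names used in the patch under
   review; the message text itself is illustrative.  */
error (is_pragma
       ? G_("pragma %<target(\"%s\")%> does not accept an argument")
       : G_("attribute %<target(\"%s\")%> does not accept an argument"),
       str_to_check);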
Re: [Patch, Fortran] PR 82143: add a -fdefault-real-16 flag
2017-09-22 11:44 GMT+02:00 Janus Weil : > 2017-09-22 9:11 GMT+02:00 Janne Blomqvist : >> And since the standard requires that double precision variables are >> twice as big as reals, in the absence of an explicit -fdefault-double= >> flag, would it make sense to have -fdefault-real=N imply >> -fdefault-double=[2*N or if that isn't supported on the target, the >> largest supported real kind]? > > That's basically the behavior I tried to implement in my current patch > (although I notice now that you're not necessarily getting the largest > real kind, if 16 is not supported). Attached is a new version of the patch, which improves this point. In the absence of further comments, I'll commit this by tomorrow. Cheers, Janus Index: gcc/fortran/invoke.texi === --- gcc/fortran/invoke.texi (revision 253108) +++ gcc/fortran/invoke.texi (working copy) @@ -119,8 +119,8 @@ by type. Explanations are in the following sectio @gccoptlist{-fall-intrinsics -fbackslash -fcray-pointer -fd-lines-as-code @gol -fd-lines-as-comments @gol -fdec -fdec-structure -fdec-intrinsic-ints -fdec-static -fdec-math @gol --fdefault-double-8 -fdefault-integer-8 @gol --fdefault-real-8 -fdollar-ok -ffixed-line-length-@var{n} @gol +-fdefault-double-8 -fdefault-integer-8 -fdefault-real-8 @gol +-fdefault-real-10 -fdefault-real-16 -fdollar-ok -ffixed-line-length-@var{n} @gol -ffixed-line-length-none -ffree-form -ffree-line-length-@var{n} @gol -ffree-line-length-none -fimplicit-none -finteger-4-integer-8 @gol -fmax-identifier-length -fmodule-private -ffixed-form -fno-range-check @gol @@ -404,6 +404,22 @@ the default width of @code{DOUBLE PRECISION} to 16 @code{-fdefault-double-8} is given, too. Unlike @option{-freal-4-real-8}, it does not promote variables with explicit kind declaration. +@item -fdefault-real-10 +@opindex @code{fdefault-real-10} +Set the default real type to a 10 byte wide type. This option also affects +the kind of non-double real constants like @code{1.0}, and does promote +the default width of @code{DOUBLE PRECISION} to 16 bytes if possible, unless +@code{-fdefault-double-8} is given. Unlike @option{-freal-4-real-10}, +it does not promote variables with explicit kind declaration. + +@item -fdefault-real-16 +@opindex @code{fdefault-real-16} +Set the default real type to a 16 byte wide type. This option also affects +the kind of non-double real constants like @code{1.0}, and does promote +the default width of @code{DOUBLE PRECISION} to 16 bytes if possible, unless +@code{-fdefault-double-8} is given. Unlike @option{-freal-4-real-16}, +it does not promote variables with explicit kind declaration. + @item -fdefault-double-8 @opindex @code{fdefault-double-8} Set the @code{DOUBLE PRECISION} type to an 8 byte wide type. Do nothing if this Index: gcc/fortran/lang.opt === --- gcc/fortran/lang.opt(revision 253108) +++ gcc/fortran/lang.opt(working copy) @@ -457,9 +457,17 @@ Fortran Var(flag_default_integer) Set the default integer kind to an 8 byte wide type. fdefault-real-8 -Fortran Var(flag_default_real) +Fortran Var(flag_default_real_8) Set the default real kind to an 8 byte wide type. +fdefault-real-10 +Fortran Var(flag_default_real_10) +Set the default real kind to an 10 byte wide type. + +fdefault-real-16 +Fortran Var(flag_default_real_16) +Set the default real kind to an 16 byte wide type. + fdollar-ok Fortran Var(flag_dollar_ok) Allow dollar signs in entity names. 
Index: gcc/fortran/module.c === --- gcc/fortran/module.c(revision 253108) +++ gcc/fortran/module.c(working copy) @@ -6741,7 +6741,7 @@ use_iso_fortran_env_module (void) "standard", symbol[i].name, &u->where)) continue; - if ((flag_default_integer || flag_default_real) + if ((flag_default_integer || flag_default_real_8) && symbol[i].id == ISOFORTRANENV_NUMERIC_STORAGE_SIZE) gfc_warning_now (0, "Use of the NUMERIC_STORAGE_SIZE named " "constant from intrinsic module " @@ -6808,7 +6808,7 @@ use_iso_fortran_env_module (void) if ((gfc_option.allow_std & symbol[i].standard) == 0) continue; - if ((flag_default_integer || flag_default_real) + if ((flag_default_integer || flag_default_real_8) && symbol[i].id == ISOFORTRANENV_NUMERIC_STORAGE_SIZE) gfc_warning_now (0, "Use of the NUMERIC_STORAGE_SIZE named constant " Index: gcc/fortran/trans-types.c === --- gcc/fortran/trans-types.c (revision 253108) +++ gcc/fortran/trans-types.c (working copy) @@ -530,7 +530,7 @@ gfc_init_kinds (void) } /* Choose the default real kind.
libgo patch committed: Add XCOFF support
This patch by Tony Reix adds XCOFF support to libgo and to the go and cgo commands. Bootstrapped and ran Go tests on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 253042) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -1fcb9bb3cefb97cfab1e623826a1cc3f6aadd5f7 +e0c1f0b645b12a544b484c0f477f8fb6f5980550 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/Makefile.am === --- libgo/Makefile.am (revision 253042) +++ libgo/Makefile.am (working copy) @@ -217,7 +217,8 @@ toolexeclibgodebug_DATA = \ debug/gosym.gox \ debug/macho.gox \ debug/pe.gox \ - debug/plan9obj.gox + debug/plan9obj.gox \ + debug/xcoff.gox toolexeclibgoencodingdir = $(toolexeclibgodir)/encoding @@ -722,6 +723,7 @@ PACKAGES = \ debug/macho \ debug/pe \ debug/plan9obj \ + debug/xcoff \ encoding \ encoding/ascii85 \ encoding/asn1 \ @@ -1293,6 +1295,7 @@ TEST_PACKAGES = \ debug/macho/check \ debug/pe/check \ debug/plan9obj/check \ + debug/xcoff/check \ encoding/ascii85/check \ encoding/asn1/check \ encoding/base32/check \ Index: libgo/go/cmd/cgo/gcc.go === --- libgo/go/cmd/cgo/gcc.go (revision 253025) +++ libgo/go/cmd/cgo/gcc.go (working copy) @@ -13,6 +13,7 @@ import ( "debug/elf" "debug/macho" "debug/pe" + "debug/xcoff" "encoding/binary" "errors" "flag" @@ -1289,6 +1290,10 @@ func (p *Package) gccMachine() []string return []string{"-mabi=64"} case "mips", "mipsle": return []string{"-mabi=32"} + case "ppc64": + if goos == "aix" { + return []string{"-maix64"} + } } return nil } @@ -1616,7 +1621,79 @@ func (p *Package) gccDebug(stdin []byte, return d, ints, floats, strs } - fatalf("cannot parse gcc output %s as ELF, Mach-O, PE object", gccTmp()) + if f, err := xcoff.Open(gccTmp()); err == nil { + defer f.Close() + d, err := f.DWARF() + if err != nil { + fatalf("cannot load DWARF output from %s: %v", gccTmp(), err) + } + bo := binary.BigEndian + for _, s := range f.Symbols { + switch { + case isDebugInts(s.Name): + if i := int(s.SectionNumber) - 1; 0 <= i && i < len(f.Sections) { + sect := f.Sections[i] + if s.Value < sect.Size { + if sdat, err := sect.Data(); err == nil { + data := sdat[s.Value:] + ints = make([]int64, len(data)/8) + for i := range ints { + ints[i] = int64(bo.Uint64(data[i*8:])) + } + } + } + } + case isDebugFloats(s.Name): + if i := int(s.SectionNumber) - 1; 0 <= i && i < len(f.Sections) { + sect := f.Sections[i] + if s.Value < sect.Size { + if sdat, err := sect.Data(); err == nil { + data := sdat[s.Value:] + floats = make([]float64, len(data)/8) + for i := range floats { + floats[i] = math.Float64frombits(bo.Uint64(data[i*8:])) + } + } + } + } + default: + if n := indexOfDebugStr(s.Name); n != -1 { + if i := int(s.SectionNumber) - 1; 0 <= i && i < len(f.Sections) { + sect := f.Sections[i] + if s.Value < sect.Size { + if sdat, err
Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.
On Fri, Sep 22, 2017 at 10:58 AM, Andrew Pinski wrote: > Two overall comments: > * What about splitting register_offset into two different elements, > one for non 128bit modes and one for 128bit (and more; OI, etc.) modes > so you get better address generation right away for the simd load > cases rather than having LRA/reload having to reload the address into > a register. I'm not sure if changing register_offset cost would make a difference, since costs are usually used during optimization, not during address generation. This is something that I didn't think to try though. I can try taking a look at this. I did try writing a patch to modify predicates to disallow reg offset for 128bit modes, and that got complicated, as I had to split apart a number of patterns in the aarch64-simd.md file that accept both VD and VQ modes. I ended up with a patch 3-4 times as big as the one I submitted, without any additional performance improvement, so it wasn't worth the trouble. > * Maybe adding a testcase to the testsuite to show this change. Yes, I can add a testcase. > One extra comment: > * should we change the generic tuning to avoid reg+reg for 128bit modes? Are there other targets with a similar problem? I only know that it is a problem for Falkor. It might be a loss for some targets as it is replacing one instruction with two. Jim
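[Editor's note] A hypothetical reduced test along the lines Jim mentions (names and the exact vector type are illustrative assumptions, not the testcase he will add): a quad-word store whose address is naturally a base register plus an index register.

/* Hedged sketch of a Falkor-style test: a 128-bit (quad-word) store
   through a base-plus-index address.  With the tuning change the compiler
   should emit an add followed by a single-register-addressed store
   instead of a reg+reg-addressed store.  */
#include <arm_neon.h>

void
store_q_reg_offset (uint8x16_t v, uint8_t *base, long off)
{
  vst1q_u8 (base + off, v);
}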
Re: [PATCH] Optimize x == -1 && y == -1 (PR middle-end/35691)
On September 22, 2017 6:59:48 PM GMT+02:00, Jakub Jelinek wrote: >Hi! > >While fixing libgcc2.c (__mulvDI3), I've noticed that while we >optimize x == 0 && y == 0 into (x | y) == 0, we don't optimize >analogical x == -1 && y == -1 into (x & y) == -1. > >The following patch does that, bootstrapped/regtested on x86_64-linux >and >i686-linux, ok for trunk? OK. Thanks, Richard. >2017-09-22 Jakub Jelinek > > PR middle-end/35691 > * match.pd: Simplify x == -1 & y == -1 into (x & y) == -1 > and x != -1 | y != -1 into (x & y) != -1. > > * gcc.dg/pr35691-1.c: Use -fdump-tree-forwprop1-details > instead of -fdump-tree-forwprop-details in dg-options. > * gcc.dg/pr35691-2.c: Likewise. > * gcc.dg/pr35691-3.c: New test. > * gcc.dg/pr35691-4.c: New test. > >--- gcc/match.pd.jj2017-09-15 20:42:29.0 +0200 >+++ gcc/match.pd 2017-09-22 13:34:29.413534762 +0200 >@@ -630,17 +630,26 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (if (TYPE_UNSIGNED (type)) >(bit_and @0 (bit_not (lshift { build_all_ones_cst (type); } @1) > >-/* PR35691: Transform >- (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. >- (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ > (for bitop (bit_and bit_ior) > cmp (eq ne) >+ /* PR35691: Transform >+(x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. >+(x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ > (simplify > (bitop (cmp @0 integer_zerop@2) (cmp @1 integer_zerop)) >(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) >- && INTEGRAL_TYPE_P (TREE_TYPE (@1)) >- && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE >(@1))) >-(cmp (bit_ior @0 (convert @1)) @2 >+ && INTEGRAL_TYPE_P (TREE_TYPE (@1)) >+ && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE >(@1))) >+(cmp (bit_ior @0 (convert @1)) @2))) >+ /* Transform: >+(x == -1 & y == -1) -> (x & typeof(x)(y)) == -1. >+(x != -1 | y != -1) -> (x & typeof(x)(y)) != -1. */ >+ (simplify >+ (bitop (cmp @0 integer_all_onesp@2) (cmp @1 integer_all_onesp)) >+ (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) >+ && INTEGRAL_TYPE_P (TREE_TYPE (@1)) >+ && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE >(@1))) >+(cmp (bit_and @0 (convert @1)) @2 > > /* Fold (A & ~B) - (A & B) into (A ^ B) - B. 
*/ > (simplify >--- gcc/testsuite/gcc.dg/pr35691-1.c.jj2016-11-09 15:22:31.0 >+0100 >+++ gcc/testsuite/gcc.dg/pr35691-1.c 2017-09-22 13:58:32.665455251 >+0200 >@@ -1,5 +1,5 @@ > /* { dg-do compile } */ >-/* { dg-options "-O2 -fdump-tree-forwprop-details" } */ >+/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */ > > int foo(int z0, unsigned z1) > { >--- gcc/testsuite/gcc.dg/pr35691-2.c.jj2016-11-09 15:22:32.0 >+0100 >+++ gcc/testsuite/gcc.dg/pr35691-2.c 2017-09-22 13:58:38.174386317 >+0200 >@@ -1,5 +1,5 @@ > /* { dg-do compile } */ >-/* { dg-options "-O2 -fdump-tree-forwprop-details" } */ >+/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */ > > int foo(int z0, unsigned z1) > { >--- gcc/testsuite/gcc.dg/pr35691-3.c.jj2017-09-22 13:55:36.186663576 >+0200 >+++ gcc/testsuite/gcc.dg/pr35691-3.c 2017-09-22 13:58:44.070312539 >+0200 >@@ -0,0 +1,12 @@ >+/* { dg-do compile } */ >+/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */ >+ >+int foo(int z0, unsigned z1) >+{ >+ int t0 = (z0 == -1); >+ int t1 = (z1 == -1U); >+ int t2 = (t0 & t1); >+ return t2; >+} >+ >+/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = >\\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */ >--- gcc/testsuite/gcc.dg/pr35691-4.c.jj2017-09-22 13:55:39.855617665 >+0200 >+++ gcc/testsuite/gcc.dg/pr35691-4.c 2017-09-22 13:58:50.132236685 >+0200 >@@ -0,0 +1,12 @@ >+/* { dg-do compile } */ >+/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */ >+ >+int foo(int z0, unsigned z1) >+{ >+ int t0 = (z0 != -1); >+ int t1 = (z1 != -1U); >+ int t2 = (t0 | t1); >+ return t2; >+} >+ >+/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = >\\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */ > > Jakub
Re: [PATCH][aarch64] Enable ifunc resolver attribute by default
On Thu, 2017-09-21 at 12:37 +, Joseph Myers wrote: > On Tue, 5 Sep 2017, Steve Ellcey wrote: > > > > > 2017-09-05 Steve Ellcey > > > > * config.gcc: Add new case statement to set > > default_gnu_indirect_function. Remove it from x86_64-*-linux*, > > i[34567]86-*, powerpc*-*-linux*spe*, powerpc*-*-linux*, > > s390-*-linux*, > > s390x-*-linux* case statements. Added aarch64 to the list of > > supported architectures. > This is OK subject to AArch64 maintainer approval, or in the absence of > AArch64 maintainer objections within 24 hours. Since I didn't see any objections I have checked this in. Steve Ellcey sell...@cavium.com
Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.
On Fri, Sep 22, 2017 at 8:59 AM, Jim Wilson wrote: > On Fri, Sep 22, 2017 at 8:49 AM, Jim Wilson wrote: >> On Falkor, because of an idiosyncracy of how the pipelines are designed, a >> quad-word store using a reg+reg addressing mode is almost twice as slow as an >> add followed by a quad-word store with a single reg addressing mode. So we >> get better performance if we disallow addressing modes using register offsets >> with quad-word stores. > > This was tested with a bootstrap and make check of course. Also, > gcc.c-torture/compile/20021212-1.c compiled with -O3 -mcpu=falkor > makes a nice testcase to demonstrate that the patch works. > > OK? Two overall comments: * What about splitting register_offset into two different elements, one for non 128bit modes and one for 128bit (and more; OI, etc.) modes so you get better address generation right away for the simd load cases rather than having LRA/reload having to reload the address into a register. * Maybe adding a testcase to the testsuite to show this change. One extra comment: * should we change the generic tuning to avoid reg+reg for 128bit modes? Thanks, Andrew > > Jim
Re: [C++ PATCH] Fix compile time hog in replace_placeholders (PR sanitizer/81929)
OK. On Thu, Sep 14, 2017 at 4:12 PM, Jakub Jelinek wrote: > Hi! > > When the expression replace_placeholders is called on contains > many SAVE_EXPRs that appear more than once in the tree, we hang walking them > over and over again, while it is sufficient to just walk it without > duplicates (not using cp_walk_tree_without_duplicates, because the callback > can cp_walk_tree* again and we want to use the same pointer set between > those). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2017-09-14 Jakub Jelinek > > PR sanitizer/81929 > * tree.c (struct replace_placeholders_t): Add pset field. > (replace_placeholders_r): Call cp_walk_tree with d->pset as > last argument instead of NULL. Formatting fix. > (replace_placeholders): Add pset variable, add its address > into data. Pass &pset instead of NULL to cp_walk_tree. > > * g++.dg/ubsan/pr81929.C: New test. > > --- gcc/cp/tree.c.jj2017-09-12 09:35:47.0 +0200 > +++ gcc/cp/tree.c 2017-09-14 17:38:07.717064412 +0200 > @@ -3063,6 +3063,7 @@ struct replace_placeholders_t > { >tree obj;/* The object to be substituted for a PLACEHOLDER_EXPR. > */ >bool seen; /* Whether we've encountered a PLACEHOLDER_EXPR. */ > + hash_set *pset;/* To avoid walking same trees multiple > times. */ > }; > > /* Like substitute_placeholder_in_expr, but handle C++ tree codes and > @@ -3085,8 +3086,8 @@ replace_placeholders_r (tree* t, int* wa > case PLACEHOLDER_EXPR: >{ > tree x = obj; > - for (; !(same_type_ignoring_top_level_qualifiers_p > -(TREE_TYPE (*t), TREE_TYPE (x))); > + for (; !same_type_ignoring_top_level_qualifiers_p (TREE_TYPE (*t), > + TREE_TYPE (x)); > x = TREE_OPERAND (x, 0)) > gcc_assert (TREE_CODE (x) == COMPONENT_REF); > *t = x; > @@ -3118,8 +3119,7 @@ replace_placeholders_r (tree* t, int* wa > valp = &TARGET_EXPR_INITIAL (*valp); > } > d->obj = subob; > - cp_walk_tree (valp, replace_placeholders_r, > - data_, NULL); > + cp_walk_tree (valp, replace_placeholders_r, data_, d->pset); > d->obj = obj; > } > *walk_subtrees = false; > @@ -3151,10 +3151,11 @@ replace_placeholders (tree exp, tree obj > return exp; > >tree *tp = &exp; > - replace_placeholders_t data = { obj, false }; > + hash_set pset; > + replace_placeholders_t data = { obj, false, &pset }; >if (TREE_CODE (exp) == TARGET_EXPR) > tp = &TARGET_EXPR_INITIAL (exp); > - cp_walk_tree (tp, replace_placeholders_r, &data, NULL); > + cp_walk_tree (tp, replace_placeholders_r, &data, &pset); >if (seen_p) > *seen_p = data.seen; >return exp; > --- gcc/testsuite/g++.dg/ubsan/pr81929.C.jj 2017-09-14 17:48:09.052611540 > +0200 > +++ gcc/testsuite/g++.dg/ubsan/pr81929.C2017-09-14 17:49:21.644711332 > +0200 > @@ -0,0 +1,14 @@ > +// PR sanitizer/81929 > +// { dg-do compile } > +// { dg-options "-std=c++14 -fsanitize=undefined" } > + > +struct S { S &operator<< (long); S foo (); S (); }; > + > +void > +bar () > +{ > + static_cast(S () << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 > + << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 > + << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 > + << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << > 0).foo (); > +} > > Jakub
Re: [PATCH][aarch64] Fix error calls in aarch64 code so they can be translated
> In my view, the ideal message would use the same form as in > the source code. But I suspect that's not easy to determine at > all sites where the message is being formatted, so the next best > thing is to avoid using the keyword and fall back on the general > term. > > Martin > > PS If possible, I would also suggest adopting a style that is > already in use elsewhere, for consistency. (That would suggest > using % with no spaces and no "string" after > arch=.) Here is a new version of my patch. I think it addresses these issues and is consistent. It does cause some regressions in tests that check for certain error messages, specifically: gcc.target/aarch64/spellcheck_1.c gcc.target/aarch64/spellcheck_2.c gcc.target/aarch64/spellcheck_3.c gcc.target/aarch64/target_attr_11.c gcc.target/aarch64/target_attr_12.c gcc.target/aarch64/target_attr_17.c But I thought we should try and get some agreement on the format of the messages before updating the tests. If we think these changes look good I can submit a final patch that includes the testsuite changes. Steve Ellcey sell...@cavium.com 2017-09-22 Steve Ellcey PR target/79868 * config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Change argument type on aarch64_process_target_attr call. * config/aarch64/aarch64-protos.h (aarch64_process_target_attr): Change argument type. * config/aarch64/aarch64.c (aarch64_attribute_info): Change field type. (aarch64_handle_attr_arch): Change argument type, use boolean argument to use different strings in error calls. (aarch64_handle_attr_cpu): Ditto. (aarch64_handle_attr_tune): Ditto. (aarch64_handle_attr_isa_flags): Ditto. (aarch64_process_one_target_attr): Ditto. (aarch64_process_target_attr): Ditto. (aarch64_option_valid_attribute_p): Change argument type on aarch64_process_target_attr call. diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c index 177e638..c9945db 100644 --- a/gcc/config/aarch64/aarch64-c.c +++ b/gcc/config/aarch64/aarch64-c.c @@ -165,7 +165,7 @@ aarch64_pragma_target_parse (tree args, tree pop_target) information that it specifies. */ if (args) { - if (!aarch64_process_target_attr (args, "pragma")) + if (!aarch64_process_target_attr (args, true)) return false; aarch64_override_options_internal (&global_options); diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index e67c2ed..4323e9e 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -445,7 +445,7 @@ bool aarch64_gen_adjusted_ldpstp (rtx *, bool, scalar_mode, RTX_CODE); void aarch64_init_builtins (void); -bool aarch64_process_target_attr (tree, const char*); +bool aarch64_process_target_attr (tree, bool); void aarch64_override_options_internal (struct gcc_options *); rtx aarch64_expand_builtin (tree exp, diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 1c14008..3bf91b3 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -67,6 +67,7 @@ #include "common/common-target.h" #include "selftest.h" #include "selftest-rtl.h" +#include "intl.h" /* This file should be included last. */ #include "target-def.h" @@ -9554,15 +9555,15 @@ struct aarch64_attribute_info const char *name; enum aarch64_attr_opt_type attr_type; bool allow_neg; - bool (*handler) (const char *, const char *); + bool (*handler) (const char *, bool); enum opt_code opt_num; }; /* Handle the ARCH_STR argument to the arch= target attribute. - PRAGMA_OR_ATTR is used in potential error messages. 
*/ + IS_PRAGMA is used in potential error messages. */ static bool -aarch64_handle_attr_arch (const char *str, const char *pragma_or_attr) +aarch64_handle_attr_arch (const char *str, bool is_pragma) { const struct processor *tmp_arch = NULL; enum aarch64_parse_opt_result parse_res @@ -9579,15 +9580,22 @@ aarch64_handle_attr_arch (const char *str, const char *pragma_or_attr) switch (parse_res) { case AARCH64_PARSE_MISSING_ARG: - error ("missing architecture name in 'arch' target %s", pragma_or_attr); + error (is_pragma + ? G_("missing name in % pragma") + : G_("missing name in % attribute")); break; case AARCH64_PARSE_INVALID_ARG: - error ("unknown value %qs for 'arch' target %s", str, pragma_or_attr); + error (is_pragma + ? G_("invalid name (%qs) in % pragma") + : G_("invalid name (%qs) in % attribute"), + str); aarch64_print_hint_for_arch (str); break; case AARCH64_PARSE_INVALID_FEATURE: - error ("invalid feature modifier %qs for 'arch' target %s", - str, pragma_or_attr); + error (is_pragma + ? G_("invalid value (%qs) in % pragma") + : G_("invalid value (%qs) in % attribute"), + str); break; default: gcc_unre
[RFA][PATCH] Stack clash protection 06/08 - V4
Enough changed since V3 that I think another review round on the ppc bits is warranted. wrap_frame_mem is gone. Allocation, attribute setting and notes/scheduler barriers are now in rs6000_emit_allocate_stack_1. handle_residual is gone, it's folded into rs6000_emit_allocate_stack_1. rs6000_emit_probe_stack_range_stack_clash now returns an insn with the first stack adjustment or just before the stack adjustment loop. This seems more correct given how its return value (sp_adjust) is used. The go test suite has been run with and without this patch. I've found os/signal seems to flip between pass and fail without rhyme or reason. It may be related to system load. TestCgoCallbackGC now passes for reasons unknown. I haven't run it enough to know if it is sensitive to load or other timing factors. I also flipped things so that clash protection is enabled by default and re-ran the tests. The idea being to see if I could exercise the path that uses SP_ADJUST a bit more. But that gave me the same results. While I think the change in the return value in rs6000_emit_probe_stack_range_stack_clash is more correct, I don't have a good way to test it. There were other minor improvements pointed out by Segher that were fixed as well. I kept the separation of the case where there's no stack space allocated. I did look at pushing that into emit_probe_stack_range_stack_clash, but doing so actually complicates things. It's cleaner to leave it as-is. OK for the trunk? Jeff commit b25ea3141981f833f3d1211f50209f349c2eae3e Author: Jeff Law Date: Fri Sep 22 11:47:26 2017 -0600 * config/rs6000/rs6000-protos.h (output_probe_stack_range): Update prototype for new argument. * config/rs6000/rs6000.c (rs6000_emit_allocate_stack_1): New function, mostly extracted from rs6000_emit_allocate_stack. (rs6000_emit_probe_stack_range_stack_clash): New function. (rs6000_emit_allocate_stack): Call rs6000_emit_probe_stack_range_stack_clash as needed. (rs6000_emit_probe_stack_range): Add additional argument to call to gen_probe_stack_range{si,di}. (output_probe_stack_range): New. (output_probe_stack_range_1): Renamed from output_probe_stack_range. (output_probe_stack_range_stack_clash): New. (rs6000_emit_prologue): Emit notes into dump file as requested. * rs6000.md (allocate_stack): Handle -fstack-clash-protection. (probe_stack_range): Operand 0 is now early-clobbered. Add additional operand and pass it to output_probe_stack_range. * lib/target-supports.exp (check_effective_target_supports_stack_clash_protection): Enable for rs6000 and powerpc targets. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index fdb7221848c..2bdb8c07372 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,22 @@ +2017-09-22 Jeff Law + + * config/rs6000/rs6000-protos.h (output_probe_stack_range): Update + prototype for new argument. + * config/rs6000/rs6000.c (rs6000_emit_allocate_stack_1): New function, + mostly extracted from rs6000_emit_allocate_stack. + (rs6000_emit_probe_stack_range_stack_clash): New function. + (rs6000_emit_allocate_stack): Call + rs6000_emit_probe_stack_range_stack_clash as needed. + (rs6000_emit_probe_stack_range): Add additional argument + to call to gen_probe_stack_range{si,di}. + (output_probe_stack_range): New. + (output_probe_stack_range_1): Renamed from output_probe_stack_range. + (output_probe_stack_range_stack_clash): New. + (rs6000_emit_prologue): Emit notes into dump file as requested. + * rs6000.md (allocate_stack): Handle -fstack-clash-protection. + (probe_stack_range): Operand 0 is now early-clobbered. 
+ Add additional operand and pass it to output_probe_stack_range. + 2017-09-22 Richard Sandiford Alan Hayward David Sherwood diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index 3f86aba947e..781349b850e 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -128,7 +128,7 @@ extern void rs6000_emit_sISEL (machine_mode, rtx[]); extern void rs6000_emit_sCOND (machine_mode, rtx[]); extern void rs6000_emit_cbranch (machine_mode, rtx[]); extern char * output_cbranch (rtx, const char *, int, rtx_insn *); -extern const char * output_probe_stack_range (rtx, rtx); +extern const char * output_probe_stack_range (rtx, rtx, rtx); extern void rs6000_emit_dot_insn (rtx dst, rtx src, int dot, rtx ccreg); extern bool rs6000_emit_set_const (rtx, rtx); extern int rs6000_emit_cmove (rtx, rtx, rtx, rtx); diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index e5ef63889b7..9e486f592b9 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -25458,6 +25458,201
[committed] PR82289: Computing peeling costs for irrelevant drs
This PR shows that we weren't filtering out irrelevant stmts in vect_get_peeling_costs_all_drs (unlike related loops in which we iterate over all datarefs). Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. Installed as obvious. Richard 2017-09-22 Richard Sandiford gcc/ PR tree-optimization/82289 * tree-vect-data-refs.c (vect_get_peeling_costs_all_drs): Check STMT_VINFO_RELEVANT_P. gcc/testsuite/ PR tree-optimization/82289 * gcc.dg/vect/pr82289.c: New test. Index: gcc/tree-vect-data-refs.c === --- gcc/tree-vect-data-refs.c 2017-09-22 17:44:23.043135080 +0100 +++ gcc/tree-vect-data-refs.c 2017-09-22 17:50:04.801574308 +0100 @@ -1326,6 +1326,9 @@ vect_get_peeling_costs_all_drs (vec 0) +{ + d = j; + return; +} + if (!h) +while (g) + ; + while (h < 1) +if (a) + { +fn1 (&h); +h = 0; + } + f[e] = &c; +} + while (1) +; +}
Re: range_int_cst_p handling in extract_range_from_binary_expr_1
Richard Biener writes: > On Wed, Sep 20, 2017 at 2:06 PM, Richard Sandiford > wrote: >> extract_range_from_binary_expr_1 had: >> >> if (range_int_cst_p (&vr0) >> && range_int_cst_p (&vr1) >> && TYPE_OVERFLOW_WRAPS (expr_type)) >> ... >> ... >> extract_range_from_multiplicative_op_1 (vr, code, &vr0, &vr1); >> >> but extract_range_from_multiplicative_op_1 also requires range_int_cst_p. >> I think we should bail out if either range isn't a constant. >> >> This might only be theoretical with current sources, but it's needed >> once polynomial constants are added. >> >> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linus-gnu. >> OK to install? > > extract_range_from_multiplicative_op_1 also allows anti-ranges but indeed > requires INTEGER_CSTs. Note it won't get anti-ranges because we handle > this case by ranges_from_anti_range. > > Thus ok if you also remove the apparent anti-range handling from > extract_range_from_multiplicative_op_1 (in the assert). Thanks, here's what I committed after testing as before. Richard 2017-09-22 Richard Sandiford Alan Hayward David Sherwood gcc/ * tree-vrp.c (extract_range_from_multiplicative_op_1): Assert for VR_RANGE only; don't allow VR_ANTI_RANGE. (extract_range_from_binary_expr_1): Don't call extract_range_from_multiplicative_op_1 if !range_int_cst_p. Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c 2017-09-21 22:35:16.63940 +0100 +++ gcc/tree-vrp.c 2017-09-22 17:46:45.207205032 +0100 @@ -1851,8 +1851,7 @@ extract_range_from_multiplicative_op_1 ( || code == ROUND_DIV_EXPR || code == RSHIFT_EXPR || code == LSHIFT_EXPR); - gcc_assert ((vr0->type == VR_RANGE - || (code == MULT_EXPR && vr0->type == VR_ANTI_RANGE)) + gcc_assert (vr0->type == VR_RANGE && vr0->type == vr1->type); rtype = vr0->type; @@ -2462,9 +2461,14 @@ extract_range_from_binary_expr_1 (value_ signop sign = TYPE_SIGN (expr_type); unsigned int prec = TYPE_PRECISION (expr_type); - if (range_int_cst_p (&vr0) - && range_int_cst_p (&vr1) - && TYPE_OVERFLOW_WRAPS (expr_type)) + if (!range_int_cst_p (&vr0) + || !range_int_cst_p (&vr1)) + { + set_value_range_to_varying (vr); + return; + } + + if (TYPE_OVERFLOW_WRAPS (expr_type)) { typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) vrp_int; typedef generic_wide_int
[PATCH] Optimize x == -1 && y == -1 (PR middle-end/35691)
Hi! While fixing libgcc2.c (__mulvDI3), I've noticed that while we optimize x == 0 && y == 0 into (x | y) == 0, we don't optimize analogical x == -1 && y == -1 into (x & y) == -1. The following patch does that, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2017-09-22 Jakub Jelinek PR middle-end/35691 * match.pd: Simplify x == -1 & y == -1 into (x & y) == -1 and x != -1 | y != -1 into (x & y) != -1. * gcc.dg/pr35691-1.c: Use -fdump-tree-forwprop1-details instead of -fdump-tree-forwprop-details in dg-options. * gcc.dg/pr35691-2.c: Likewise. * gcc.dg/pr35691-3.c: New test. * gcc.dg/pr35691-4.c: New test. --- gcc/match.pd.jj 2017-09-15 20:42:29.0 +0200 +++ gcc/match.pd2017-09-22 13:34:29.413534762 +0200 @@ -630,17 +630,26 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (TYPE_UNSIGNED (type)) (bit_and @0 (bit_not (lshift { build_all_ones_cst (type); } @1) -/* PR35691: Transform - (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. - (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ (for bitop (bit_and bit_ior) cmp (eq ne) + /* PR35691: Transform +(x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. +(x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ (simplify (bitop (cmp @0 integer_zerop@2) (cmp @1 integer_zerop)) (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) - && INTEGRAL_TYPE_P (TREE_TYPE (@1)) - && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))) -(cmp (bit_ior @0 (convert @1)) @2 + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))) +(cmp (bit_ior @0 (convert @1)) @2))) + /* Transform: +(x == -1 & y == -1) -> (x & typeof(x)(y)) == -1. +(x != -1 | y != -1) -> (x & typeof(x)(y)) != -1. */ + (simplify + (bitop (cmp @0 integer_all_onesp@2) (cmp @1 integer_all_onesp)) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))) +(cmp (bit_and @0 (convert @1)) @2 /* Fold (A & ~B) - (A & B) into (A ^ B) - B. 
*/ (simplify --- gcc/testsuite/gcc.dg/pr35691-1.c.jj 2016-11-09 15:22:31.0 +0100 +++ gcc/testsuite/gcc.dg/pr35691-1.c2017-09-22 13:58:32.665455251 +0200 @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-forwprop-details" } */ +/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */ int foo(int z0, unsigned z1) { --- gcc/testsuite/gcc.dg/pr35691-2.c.jj 2016-11-09 15:22:32.0 +0100 +++ gcc/testsuite/gcc.dg/pr35691-2.c2017-09-22 13:58:38.174386317 +0200 @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-forwprop-details" } */ +/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */ int foo(int z0, unsigned z1) { --- gcc/testsuite/gcc.dg/pr35691-3.c.jj 2017-09-22 13:55:36.186663576 +0200 +++ gcc/testsuite/gcc.dg/pr35691-3.c2017-09-22 13:58:44.070312539 +0200 @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */ + +int foo(int z0, unsigned z1) +{ + int t0 = (z0 == -1); + int t1 = (z1 == -1U); + int t2 = (t0 & t1); + return t2; +} + +/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */ --- gcc/testsuite/gcc.dg/pr35691-4.c.jj 2017-09-22 13:55:39.855617665 +0200 +++ gcc/testsuite/gcc.dg/pr35691-4.c2017-09-22 13:58:50.132236685 +0200 @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */ + +int foo(int z0, unsigned z1) +{ + int t0 = (z0 != -1); + int t1 = (z1 != -1U); + int t2 = (t0 | t1); + return t2; +} + +/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */ Jakub
Re: Don't query the frontend for unsupported types
Richard Biener writes: > On Thu, Sep 21, 2017 at 2:56 PM, Richard Sandiford > wrote: >> Richard Biener writes: >>> On September 20, 2017 2:36:03 PM GMT+02:00, Richard Sandiford >>> wrote: When forcing a constant of mode MODE into memory, force_const_mem asks the frontend to provide the type associated with that mode. In principle type_for_mode is allowed to return null, and although one use site correctly handled that, the other didn't. I think there's agreement that it's bogus to use type_for_mode for this kind of thing, since it forces frontends to handle types that don't exist in that language. See e.g. http://gcc.gnu.org/PR46805 where the Go frontend was forced to handle vector types even though Go doesn't have vector types. Also, the frontends use code like: else if (VECTOR_MODE_P (mode)) { machine_mode inner_mode = GET_MODE_INNER (mode); tree inner_type = c_common_type_for_mode (inner_mode, unsignedp); if (inner_type != NULL_TREE) return build_vector_type_for_mode (inner_type, mode); } and there's no guarantee that every vector mode M used by backend rtl has an associated vector type whose TYPE_MODE is M. I think really the type_for_mode hook should only return trees that _do_ have the requested TYPE_MODE, but PR46805 linked above shows that this is likely to have too many knock-on consequences. It doesn't make sense for force_const_mem to ask about vector modes that aren't valid for vector types, so this patch handles the condition there instead. This is needed for SVE multi-register modes, which are modelled as vector modes but are not usable as vector types. Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linus-gnu. OK to install? >>> >>> I think we should get rid of the use entirely. >> >> I first read this as not using type_for_mode at all in force_const_mem, >> which sounded like a good thing :-) > > That's what I meant ;) A mode doesn't really have a type... > > I tried it overnight on the usual >> at-least-one-target-per-CPU set and diffing the before and after >> assembly for the testsuite. And it looks like i686 relies on this >> to get an alignment of 16 rather than 4 for XFmode constants: >> GET_MODE_ALIGNMENT (XFmode) == 32 (as requested by i386-modes.def), >> but i386's CONSTANT_ALIGNMENT increases it to 128 for static constants. > > Then the issue is that CONSTANT_ALIGNMENT takes a tree and not a mode... > even worse than type_for_mode is a use of make_tree! Incidentially > ix86_constant_alignment _does_ look at the mode in the end... OK, I guess this means another target hook conversion. The patch below converts CONSTANT_ALIGNMENT with its current interface. The definition: #define CONSTANT_ALIGNMENT(EXP, ALIGN) \ (TREE_CODE (EXP) == STRING_CST \ && (ALIGN) < BITS_PER_WORD ? BITS_PER_WORD : (ALIGN)) was very common, so the patch adds a canned definition for that, called constant_alignment_word_strings. Some ports had a variation that used a port-local FASTEST_ALIGNMENT instead of BITS_PER_WORD; the patch uses constant_alignment_word_strings if FASTEST_ALIGNMENT was always BITS_PER_WORD and a port-local hook function otherwise. Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. Also tested by comparing the testsuite assembly output on at least one target per CPU directory. I don't think this comes under Jeff's preapproval due to the constant_alignment_word_strings thing, so: OK to install? 
If so, then I'll follow up with a separate hook for rtl modes, which varasm, default_constant_alignment and constant_alignment_word_strings can all use. Thanks, Richard 2017-09-22 Richard Sandiford gcc/ * target.def (constant_alignment): New hook. * defaults.h (CONSTANT_ALIGNMENT): Delete. * doc/tm.texi.in (CONSTANT_ALIGNMENT): Replace with... (TARGET_CONSTANT_ALIGNMENT): ...this new hook. * doc/tm.texi: Regenerate. * targhooks.h (default_constant_alignment): Declare. (constant_alignment_word_strings): Likewise. * targhooks.c (default_constant_alignment): New function. (constant_alignment_word_strings): Likewise. * builtins.c (get_object_alignment_2): Use targetm.constant_alignment instead of CONSTANT_ALIGNMENT. * varasm.c (align_variable, get_variable_align, build_constant_desc) (force_const_mem): Likewise. * config/aarch64/aarch64.h (CONSTANT_ALIGNMENT): Delete. * config/aarch64/aarch64.c (aarch64_constant_alignment): New function. (aarch64_classify_address): Call it instead of CONSTANT_ALIGNMENT. (TARGET_CONSTANT_ALIGNMENT): Redefine. * config/alpha/alpha.h (CONSTANT_ALIGNMENT): Delete commented-out definition. * config/arc/arc.h (CONSTANT_ALIGNMENT): Delete. * config/arc/arc.c (TA
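[Editor's note] A hedged sketch of the canned definition described above; the exact parameter and return types of the new hook are assumptions here, but the body mirrors the common CONSTANT_ALIGNMENT macro quoted in the message (align string constants to at least a word):

/* Hedged sketch of constant_alignment_word_strings as a targhooks.c
   helper: keep the requested alignment except for string constants,
   which are aligned to BITS_PER_WORD.  Types are assumed.  */
HOST_WIDE_INT
constant_alignment_word_strings (const_tree exp, HOST_WIDE_INT align)
{
  if (TREE_CODE (exp) == STRING_CST && align < BITS_PER_WORD)
    return BITS_PER_WORD;
  return align;
}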
Change permute index type to unsigned short
This patch changes the element type of (auto_)vec_perm_indices from unsigned char to unsigned short. This is needed for fixed-length 2048-bit SVE. (SVE is variable-length by default, but it's possible to ask for specific vector lengths if you want to.) Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. Also tested by comparing the testsuite assembly output on at least one target per CPU directory. OK to install? Richard 2017-09-22 Richard Sandiford gcc/ * target.h (vec_perm_indices): Use unsigned short rather than unsigned char. (auto_vec_perm_indices): Likewise. * config/aarch64/aarch64.c (aarch64_vectorize_vec_perm_const_ok): Use unsigned int rather than unsigned char. * config/arm/arm.c (arm_vectorize_vec_perm_const_ok): Likewise. Index: gcc/target.h === --- gcc/target.h2017-09-14 17:04:19.080694343 +0100 +++ gcc/target.h2017-09-22 17:35:22.486794044 +0100 @@ -193,11 +193,11 @@ enum vect_cost_model_location { /* The type to use for vector permutes with a constant permute vector. Each entry is an index into the concatenated input vectors. */ -typedef vec vec_perm_indices; +typedef vec vec_perm_indices; /* Same, but can be used to construct local permute vectors that are automatically freed. */ -typedef auto_vec auto_vec_perm_indices; +typedef auto_vec auto_vec_perm_indices; /* The target structure. This holds all the backend hooks. */ #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME; Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c2017-09-22 17:31:56.412840135 +0100 +++ gcc/config/aarch64/aarch64.c2017-09-22 17:35:22.483794044 +0100 @@ -13820,7 +13820,7 @@ aarch64_vectorize_vec_perm_const_ok (mac nelt = sel.length (); for (i = which = 0; i < nelt; ++i) { - unsigned char e = d.perm[i]; + unsigned int e = d.perm[i]; gcc_assert (e < 2 * nelt); which |= (e < nelt ? 1 : 2); } Index: gcc/config/arm/arm.c === --- gcc/config/arm/arm.c2017-09-22 17:31:56.414735941 +0100 +++ gcc/config/arm/arm.c2017-09-22 17:35:22.486794044 +0100 @@ -29261,7 +29261,7 @@ arm_vectorize_vec_perm_const_ok (machine nelt = GET_MODE_NUNITS (d.vmode); for (i = which = 0; i < nelt; ++i) { - unsigned char e = d.perm[i]; + unsigned int e = d.perm[i]; gcc_assert (e < 2 * nelt); which |= (e < nelt ? 1 : 2); }
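As a back-of-the-envelope check of why unsigned char stops being wide enough (arithmetic implied by the description above, not spelled out in the patch): a fixed-length 2048-bit SVE vector of byte elements has 2048 / 8 = 256 lanes, and a constant permute selects from the concatenation of two input vectors, so the indices range over 0 .. 2 * 256 - 1 = 511. That no longer fits in unsigned char (maximum 255) but fits easily in unsigned short.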
Update interface to TARGET_VECTORIZE_VEC_PERM_CONST_OK
This patch makes TARGET_VECTORIZE_VEC_PERM_CONST_OK take the permute vector in the form of a vec_perm_indices instead of an unsigned char *. It follows on from the recent patch that did the same in target-independent code. It was easy to make ARM and AArch64 use vec_perm_indices internally as well, and converting AArch64 helps with SVE. I did try doing the same for the other ports, but the surgery needed was much more invasive and much less obviously correct. Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. Also tested by comparing the testsuite assembly output on at least one target per CPU directory. OK to install? Richard 2017-09-22 Richard Sandiford gcc/ * target.def (vec_perm_const_ok): Change sel parameter to vec_perm_indices. * optabs-query.c (can_vec_perm_p): Update accordingly. * doc/tm.texi: Regenerate. * config/aarch64/aarch64.c (expand_vec_perm_d): Change perm to auto_vec_perm_indices and remove separate nelt field. (aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_zip) (aarch64_evpc_ext, aarch64_evpc_rev, aarch64_evpc_dup) (aarch64_evpc_tbl, aarch64_expand_vec_perm_const_1) (aarch64_expand_vec_perm_const): Update accordingly. (aarch64_vectorize_vec_perm_const_ok): Likewise. Change sel to vec_perm_indices. * config/arm/arm.c (expand_vec_perm_d): Change perm to auto_vec_perm_indices and remove separate nelt field. (arm_evpc_neon_vuzp, arm_evpc_neon_vzip, arm_evpc_neon_vrev) (arm_evpc_neon_vtrn, arm_evpc_neon_vext, arm_evpc_neon_vtbl) (arm_expand_vec_perm_const_1, arm_expand_vec_perm_const): Update accordingly. (arm_vectorize_vec_perm_const_ok): Likewise. Change sel to vec_perm_indices. * config/i386/i386.c (ix86_vectorize_vec_perm_const_ok): Change sel to vec_perm_indices. * config/ia64/ia64.c (ia64_vectorize_vec_perm_const_ok): Likewise. * config/mips/mips.c (mips_vectorize_vec_perm_const_ok): Likewise. * config/powerpcspe/powerpcspe.c (rs6000_vectorize_vec_perm_const_ok): Likewise. * config/rs6000/rs6000.c (rs6000_vectorize_vec_perm_const_ok): Likewise. Index: gcc/target.def === --- gcc/target.def 2017-09-22 17:31:36.935337179 +0100 +++ gcc/target.def 2017-09-22 17:31:56.428954480 +0100 @@ -1847,7 +1847,7 @@ DEFHOOK DEFHOOK (vec_perm_const_ok, "Return true if a vector created for @code{vec_perm_const} is valid.", - bool, (machine_mode, const unsigned char *sel), + bool, (machine_mode, vec_perm_indices), NULL) /* Return true if the target supports misaligned store/load of a Index: gcc/optabs-query.c === --- gcc/optabs-query.c 2017-09-14 17:04:19.080694343 +0100 +++ gcc/optabs-query.c 2017-09-22 17:31:56.428006577 +0100 @@ -367,7 +367,7 @@ can_vec_perm_p (machine_mode mode, bool if (direct_optab_handler (vec_perm_const_optab, mode) != CODE_FOR_nothing && (sel == NULL || targetm.vectorize.vec_perm_const_ok == NULL - || targetm.vectorize.vec_perm_const_ok (mode, &(*sel)[0]))) + || targetm.vectorize.vec_perm_const_ok (mode, *sel))) return true; } Index: gcc/doc/tm.texi === --- gcc/doc/tm.texi 2017-09-22 17:31:36.933441374 +0100 +++ gcc/doc/tm.texi 2017-09-22 17:31:56.428006577 +0100 @@ -5774,7 +5774,7 @@ correct for most targets. Return true if vector alignment is reachable (by peeling N iterations) for the given scalar type @var{type}. @var{is_packed} is false if the scalar access using @var{type} is known to be naturally aligned. 
@end deftypefn -@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST_OK (machine_mode, const unsigned char *@var{sel}) +@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST_OK (machine_mode, @var{vec_perm_indices}) Return true if a vector created for @code{vec_perm_const} is valid. @end deftypefn Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c2017-09-21 11:53:16.681759682 +0100 +++ gcc/config/aarch64/aarch64.c2017-09-22 17:31:56.412840135 +0100 @@ -141,8 +141,8 @@ static void aarch64_elf_asm_constructor static void aarch64_elf_asm_destructor (rtx, int) ATTRIBUTE_UNUSED; static void aarch64_override_options_after_change (void); static bool aarch64_vector_mode_supported_p (machine_mode); -static bool aarch64_vectorize_vec_perm_const_ok (machine_mode vmode, -const unsigned char *sel); +static bool aarch64_vectorize_vec_perm_const_ok (machine_mode, +vec_perm_indices); static int aarch64_address_cost (
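To show the new hook interface in isolation, a hedged sketch (not code from the patch; the body is only a placeholder check, it is the signature and the vec accessors that follow the change above):

  static bool
  example_vec_perm_const_ok (machine_mode vmode ATTRIBUTE_UNUSED,
                             vec_perm_indices sel)
  {
    unsigned int nelt = sel.length ();
    /* Indices select from two concatenated input vectors.  */
    for (unsigned int i = 0; i < nelt; ++i)
      if (sel[i] >= 2 * nelt)
        return false;
    return true;
  }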
Re: [PING][PATCH] Output DIEs for outlined OpenMP functions in correct lexical scope
On Thu, Sep 21, 2017 at 01:08:01PM -0700, Kevin Buettner wrote: > Ping. Ok, thanks. > On Mon, 7 Aug 2017 17:51:38 -0700 > Kevin Buettner wrote: > > > On Wed, 10 May 2017 17:24:27 +0200 > > Jakub Jelinek wrote: > > > > > What I don't like is that the patch is inconsistent, it sets DECL_CONTEXT > > > of the child function for all kinds of outlined functions, but then you > > > just > > > choose one of the many places and add it into the BLOCK tree. Any reason > > > why the DECL_CONTEXT change can't be done in a helper function together > > > with all the changes you've added into omp-expand.c, and then call it from > > > expand_omp_parallel (with the child_fn and entry_stmt arguments) so that > > > you can call it easily also for other constructs, either now or later on? > > > > > > > I've worked out a way to do the DECL_CONTEXT and the scope change > > together. The helper function should be usable for other constructs, > > though I have not tested this yet. > > > > > Also, is there any rationale on appending the FUNCTION_DECL to BLOCK_VARS > > > instead of prepending it there (which is cheaper)? Does the debugger > > > care about relative order of those artificial functions vs. other > > > variables in the lexical scope? > > > > To the best of my knowledge, the debugger doesn't care about the > > order. I've changed the code to prepend the FUNCTION_DECL to > > BLOCK_VARS instead. > > > > How does this new version (below) look? > > > > I've done a "make bootstrap" and "make -k check". No regressions found > > for x86_64. > > > > gcc/ChangeLog: > > > > * omp-expand.c (adjust_context_scope): New function. > > (expand_parallel_call): Call adjust_context_scope. > > --- > > gcc/omp-expand.c | 38 ++ > > 1 file changed, 38 insertions(+) > > > > diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c > > index d6755cd..9eb0a89 100644 > > --- a/gcc/omp-expand.c > > +++ b/gcc/omp-expand.c > > @@ -498,6 +498,42 @@ parallel_needs_hsa_kernel_p (struct omp_region *region) > >return false; > > } > > > > +/* Change DECL_CONTEXT of CHILD_FNDECL to that of the parent function. > > + Add CHILD_FNDECL to decl chain of the supercontext of the block > > + ENTRY_BLOCK - this is the block which originally contained the > > + code from which CHILD_FNDECL was created. > > + > > + Together, these actions ensure that the debug info for the outlined > > + function will be emitted with the correct lexical scope. */ > > + > > +static void > > +adjust_context_and_scope (tree entry_block, tree child_fndecl) > > +{ > > + if (entry_block != NULL_TREE && TREE_CODE (entry_block) == BLOCK) > > +{ > > + tree b = BLOCK_SUPERCONTEXT (entry_block); > > + > > + if (TREE_CODE (b) == BLOCK) > > +{ > > + tree parent_fndecl; > > + > > + /* Follow supercontext chain until the parent fndecl > > +is found. */ > > + for (parent_fndecl = BLOCK_SUPERCONTEXT (b); > > + TREE_CODE (parent_fndecl) == BLOCK; > > + parent_fndecl = BLOCK_SUPERCONTEXT (parent_fndecl)) > > + ; > > + > > + gcc_assert (TREE_CODE (parent_fndecl) == FUNCTION_DECL); > > + > > + DECL_CONTEXT (child_fndecl) = parent_fndecl; > > + > > + DECL_CHAIN (child_fndecl) = BLOCK_VARS (b); > > + BLOCK_VARS (b) = child_fndecl; > > + } > > +} > > +} > > + > > /* Build the function calls to GOMP_parallel_start etc to actually > > generate the parallel operation. REGION is the parallel region > > being expanded. BB is the block where to insert the code. 
WS_ARGS > > @@ -667,6 +703,8 @@ expand_parallel_call (struct omp_region *region, > > basic_block bb, > >tree child_fndecl = gimple_omp_parallel_child_fn (entry_stmt); > >t2 = build_fold_addr_expr (child_fndecl); > > > > + adjust_context_and_scope (gimple_block (entry_stmt), child_fndecl); > > + > >vec_alloc (args, 4 + vec_safe_length (ws_args)); > >args->quick_push (t2); > >args->quick_push (t1); Jakub
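To make the intent concrete, a hedged example of the kind of source affected (hypothetical test, not from the patch; the outlined function name is only indicative):

  extern void consume (int);

  int
  main (void)
  {
    int x = 42;
  #pragma omp parallel firstprivate (x)
    consume (x);
    return 0;
  }

With -fopenmp the parallel region is outlined into a child function (something like main._omp_fn.0). Previously its DIE was not nested in the lexical scope that contained the pragma; with adjust_context_and_scope its DECL_CONTEXT becomes main and the FUNCTION_DECL is prepended to BLOCK_VARS of that block, so the debugger sees the outlined function in the correct scope.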
Re: [patch][arm] (respin) Improve error checking in parsecpu.awk
On 9/22/17 4:58 PM, Vidya Praveen wrote: Hello, This patch by Richard Earnshaw was reverted along with the commit that preceded it as the preceding commit was causing cross-native builds to fail and I presumed this patch was related too. Now I am respinning as the issue that caused the cross-native failure have been fixed. This patch however is simply rebased and has no other changes. For reference, the ChangeLog of the preceding patch that broke cross-native build. [arm] auto-generate arm-isa.h from CPU descriptions This patch autogenerates arm-isa.h from new entries in arm-cpus.in. This has the primary advantage that it makes the description file more self-contained, but it also solves the 'array dimensioning' problem that Tamar recently encountered. It adds two new constructs to arm-cpus.in: features and fgroups. Fgroups are simply a way of naming a group of feature bits so that they can be referenced together. We follow the convention that feature bits are all lower case, while fgroups are (predominantly) upper case. This is helpful as in some contexts they share the same namespace. Most of the minor changes in this patch are related to adopting this new naming convention. * config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file. * config/arm/arm-isa.h: Delete. Move definitions to ... * arm-cpus.in: ... here. Use new feature and fgroup values. * config/arm/arm.c (arm_option_override): Use lower case for feature bit names. * config/arm/arm.h (TARGET_HARD_FLOAT): Likewise. (TARGET_VFP3, TARGET_VFP5, TARGET_FMA): Likewise. * config/arm/parsecpu.awk (END): Add new command 'isa'. (isa_pfx): Delete. (print_isa_bits_for): New function. (gen_isa): New function. (gen_comm_data): Use print_isa_bits_for. (define feature): New keyword. (define fgroup): New keyword. * config/arm/t-arm (OPTIONS_H_EXTRA): Add arm-isa.h (arm-isa.h): Add rule to generate file. * common/config/arm/arm-common.c: (arm_canon_arch_option): Use lower case for feature bit names. Tested by building cross/cross-native arm-none-linux-gnueabihf and baremetal cross build (arm-none-eabi) on x86_64. OK for trunk? Ok, please apply. Ramana Regards VP. gcc/ChangeLog: [arm] Improve error checking in parsecpu.awk This patch adds a bit more error checking to parsecpu.awk to ensure that statements are not missing arguments or have excess arguments beyond those permitted. It also slightly improves the handling of errors so that we terminate properly if parsing fails and be as helpful as we can while in the parsing phase. 2017-09-22 Richard Earnshaw * config/arm/parsecpu.awk (fatal): Note that we've encountered an error. Only quit immediately if parsing is complete. (BEGIN): Initialize fatal_err and parse_done. (begin fpu, end fpu): Check number of arguments. (begin arch, end arch): Likewise. (begin cpu, end cpu): Likewise. (cname, tune for, tune flags, architecture, fpu, option): Likewise. (optalias): Likewise.
Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.
On Fri, Sep 22, 2017 at 8:49 AM, Jim Wilson wrote: > On Falkor, because of an idiosyncrasy of how the pipelines are designed, a > quad-word store using a reg+reg addressing mode is almost twice as slow as an > add followed by a quad-word store with a single reg addressing mode. So we > get better performance if we disallow addressing modes using register offsets > with quad-word stores. This was tested with a bootstrap and make check of course. Also, gcc.c-torture/compile/20021212-1.c compiled with -O3 -mcpu=falkor makes a nice testcase to demonstrate that the patch works. OK? Jim
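For reference, a minimal sketch of the kind of loop affected (hypothetical example, not the cited torture test): with -O3 -mcpu=falkor the vectorized copy below would otherwise store each 128-bit vector with a reg+reg address such as str q0, [x1, x3], whereas the tuning flag makes the address be formed with a separate add first.

  void
  copy (double *restrict dst, const double *restrict src, long n)
  {
    for (long i = 0; i < n; i++)
      dst[i] = src[i];
  }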
[patch][arm] (respin) Improve error checking in parsecpu.awk
Hello, This patch by Richard Earnshaw was reverted along with the commit that preceded it as the preceding commit was causing cross-native builds to fail and I presumed this patch was related too. Now I am respinning as the issue that caused the cross-native failure have been fixed. This patch however is simply rebased and has no other changes. For reference, the ChangeLog of the preceding patch that broke cross-native build. [arm] auto-generate arm-isa.h from CPU descriptions This patch autogenerates arm-isa.h from new entries in arm-cpus.in. This has the primary advantage that it makes the description file more self-contained, but it also solves the 'array dimensioning' problem that Tamar recently encountered. It adds two new constructs to arm-cpus.in: features and fgroups. Fgroups are simply a way of naming a group of feature bits so that they can be referenced together. We follow the convention that feature bits are all lower case, while fgroups are (predominantly) upper case. This is helpful as in some contexts they share the same namespace. Most of the minor changes in this patch are related to adopting this new naming convention. * config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file. * config/arm/arm-isa.h: Delete. Move definitions to ... * arm-cpus.in: ... here. Use new feature and fgroup values. * config/arm/arm.c (arm_option_override): Use lower case for feature bit names. * config/arm/arm.h (TARGET_HARD_FLOAT): Likewise. (TARGET_VFP3, TARGET_VFP5, TARGET_FMA): Likewise. * config/arm/parsecpu.awk (END): Add new command 'isa'. (isa_pfx): Delete. (print_isa_bits_for): New function. (gen_isa): New function. (gen_comm_data): Use print_isa_bits_for. (define feature): New keyword. (define fgroup): New keyword. * config/arm/t-arm (OPTIONS_H_EXTRA): Add arm-isa.h (arm-isa.h): Add rule to generate file. * common/config/arm/arm-common.c: (arm_canon_arch_option): Use lower case for feature bit names. Tested by building cross/cross-native arm-none-linux-gnueabihf and baremetal cross build (arm-none-eabi) on x86_64. OK for trunk? Regards VP. gcc/ChangeLog: [arm] Improve error checking in parsecpu.awk This patch adds a bit more error checking to parsecpu.awk to ensure that statements are not missing arguments or have excess arguments beyond those permitted. It also slightly improves the handling of errors so that we terminate properly if parsing fails and be as helpful as we can while in the parsing phase. 2017-09-22 Richard Earnshaw * config/arm/parsecpu.awk (fatal): Note that we've encountered an error. Only quit immediately if parsing is complete. (BEGIN): Initialize fatal_err and parse_done. (begin fpu, end fpu): Check number of arguments. (begin arch, end arch): Likewise. (begin cpu, end cpu): Likewise. (cname, tune for, tune flags, architecture, fpu, option): Likewise. (optalias): Likewise. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 5b9217c..4885746 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -2154,6 +2154,17 @@ 2017-09-06 Richard Earnshaw + * config/arm/parsecpu.awk (fatal): Note that we've encountered an + error. Only quit immediately if parsing is complete. + (BEGIN): Initialize fatal_err and parse_done. + (begin fpu, end fpu): Check number of arguments. + (begin arch, end arch): Likewise. + (begin cpu, end cpu): Likewise. + (cname, tune for, tune flags, architecture, fpu, option): Likewise. + (optalias): Likewise. + +2017-09-06 Richard Earnshaw + * config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file. * config/arm/arm-isa.h: Delete. Move definitions to ... 
* arm-cpus.in: ... here. Use new feature and fgroup values. diff --git a/gcc/config/arm/parsecpu.awk b/gcc/config/arm/parsecpu.awk index d07d3fc..0b4fc68 100644 --- a/gcc/config/arm/parsecpu.awk +++ b/gcc/config/arm/parsecpu.awk @@ -32,7 +32,8 @@ function fatal (m) { print "error ("lineno"): " m > "/dev/stderr" -exit 1 +fatal_err = 1 +if (parse_done) exit 1 } function toplevel () { @@ -502,14 +503,18 @@ BEGIN { arch_name = "" fpu_name = "" lineno = 0 +fatal_err = 0 +parse_done = 0 if (cmd == "") fatal("Usage parsecpu.awk -v cmd=") } +# New line. Reset parse status and increment line count for error messages // { lineno++ parse_ok = 0 } +# Comments must be on a line on their own. /^#/ { parse_ok = 1 } @@ -552,12 +557,14 @@ BEGIN { } /^begin fpu / { +if (NF != 3) fatal("syntax: begin fpu ") toplevel() fpu_name = $3 parse_ok = 1 } /^end fpu / { +if (NF != 3) fatal("syntax: end fpu ") if (fpu_name != $3) fatal("mimatched end fpu") if (! (fpu_name in fpu_isa)) { fatal("fpu definition \"" fpu_name "\" lacks an \"isa\" stat
Re: [patch][arm] (respin) auto-generate arm-isa.h from CPU descriptions
On 22/09/17 16:54, Vidya Praveen wrote: Hello, This patch by Richard Earnshaw was reverted earlier as it was breaking cross-native builds. Respinning now with a minor change that fixes the build issue - adding arm-isa.h to GTM_H. Also remove a redundant dependency (TM_H includes GTM_H). Tested by building cross/cross-native arm-none-linux-gnueabihf and baremetal cross build (arm-none-eabi) on x86_64. OK for trunk? Ok, please apply. Ramana Regards VP. gcc/ChangeLog: [arm] auto-generate arm-isa.h from CPU descriptions This patch autogenerates arm-isa.h from new entries in arm-cpus.in. This has the primary advantage that it makes the description file more self-contained, but it also solves the 'array dimensioning' problem that Tamar recently encountered. It adds two new constructs to arm-cpus.in: features and fgroups. Fgroups are simply a way of naming a group of feature bits so that they can be referenced together. We follow the convention that feature bits are all lower case, while fgroups are (predominantly) upper case. This is helpful as in some contexts they share the same namespace. Most of the minor changes in this patch are related to adopting this new naming convention. 2017-09-22 Richard Earnshaw * config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file. * config/arm/arm-isa.h: Delete. Move definitions to ... * arm-cpus.in: ... here. Use new feature and fgroup values. * config/arm/arm.c (arm_option_override): Use lower case for feature bit names. * config/arm/arm.h (TARGET_HARD_FLOAT): Likewise. (TARGET_VFP3, TARGET_VFP5, TARGET_FMA): Likewise. * config/arm/parsecpu.awk (END): Add new command 'isa'. (isa_pfx): Delete. (print_isa_bits_for): New function. (gen_isa): New function. (gen_comm_data): Use print_isa_bits_for. (define feature): New keyword. (define fgroup): New keyword. * config/arm/t-arm (TM_H): Remove. (GTM_H): Add arm-isa.h. (arm-isa.h): Add rule to generate file. * common/config/arm/arm-common.c: (arm_canon_arch_option): Use lower case for feature bit names.
[patch][arm] (respin) auto-generate arm-isa.h from CPU descriptions
Hello, This patch by Richard Earnshaw was reverted earlier as it was breaking cross-native builds. Respinning now with a minor change that fixes the build issue - adding arm-isa.h to GTM_H. Also remove a redundant dependency (TM_H includes GTM_H). Tested by building cross/cross-native arm-none-linux-gnueabihf and baremetal cross build (arm-none-eabi) on x86_64. OK for trunk? Regards VP. gcc/ChangeLog: [arm] auto-generate arm-isa.h from CPU descriptions This patch autogenerates arm-isa.h from new entries in arm-cpus.in. This has the primary advantage that it makes the description file more self-contained, but it also solves the 'array dimensioning' problem that Tamar recently encountered. It adds two new constructs to arm-cpus.in: features and fgroups. Fgroups are simply a way of naming a group of feature bits so that they can be referenced together. We follow the convention that feature bits are all lower case, while fgroups are (predominantly) upper case. This is helpful as in some contexts they share the same namespace. Most of the minor changes in this patch are related to adopting this new naming convention. 2017-09-22 Richard Earnshaw * config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file. * config/arm/arm-isa.h: Delete. Move definitions to ... * arm-cpus.in: ... here. Use new feature and fgroup values. * config/arm/arm.c (arm_option_override): Use lower case for feature bit names. * config/arm/arm.h (TARGET_HARD_FLOAT): Likewise. (TARGET_VFP3, TARGET_VFP5, TARGET_FMA): Likewise. * config/arm/parsecpu.awk (END): Add new command 'isa'. (isa_pfx): Delete. (print_isa_bits_for): New function. (gen_isa): New function. (gen_comm_data): Use print_isa_bits_for. (define feature): New keyword. (define fgroup): New keyword. * config/arm/t-arm (TM_H): Remove. (GTM_H): Add arm-isa.h. (arm-isa.h): Add rule to generate file. * common/config/arm/arm-common.c: (arm_canon_arch_option): Use lower case for feature bit names. diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c index 38bd3a7..7cb99ec 100644 --- a/gcc/common/config/arm/arm-common.c +++ b/gcc/common/config/arm/arm-common.c @@ -574,7 +574,7 @@ arm_canon_arch_option (int argc, const char **argv) { /* The easiest and safest way to remove the default fpu capabilities is to look for a '+no..' option that removes - the base FPU bit (isa_bit_VFPv2). If that doesn't exist + the base FPU bit (isa_bit_vfpv2). If that doesn't exist then the best we can do is strip out all the bits that might be part of the most capable FPU we know about, which is "crypto-neon-fp-armv8". */ @@ -586,7 +586,7 @@ arm_canon_arch_option (int argc, const char **argv) ++ext) { if (ext->remove - && check_isa_bits_for (ext->isa_bits, isa_bit_VFPv2)) + && check_isa_bits_for (ext->isa_bits, isa_bit_vfpv2)) { arm_initialize_isa (fpu_isa, ext->isa_bits); bitmap_and_compl (target_isa, target_isa, fpu_isa); @@ -620,7 +620,7 @@ arm_canon_arch_option (int argc, const char **argv) { /* Clearing the VFPv2 bit is sufficient to stop any extention that builds on the FPU from matching. */ - bitmap_clear_bit (target_isa, isa_bit_VFPv2); + bitmap_clear_bit (target_isa, isa_bit_vfpv2); } /* If we don't have a selected architecture by now, something's @@ -692,8 +692,8 @@ arm_canon_arch_option (int argc, const char **argv) capable FPU variant that we do support. This is sufficient for multilib selection. 
*/ - if (bitmap_bit_p (target_isa_unsatisfied, isa_bit_VFPv2) - && bitmap_bit_p (fpu_isa, isa_bit_VFPv2)) + if (bitmap_bit_p (target_isa_unsatisfied, isa_bit_vfpv2) + && bitmap_bit_p (fpu_isa, isa_bit_vfpv2)) { std::list::iterator ipoint = extensions.begin (); diff --git a/gcc/config.gcc b/gcc/config.gcc index 555ed69..00225104 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -593,7 +593,7 @@ x86_64-*-*) tm_file="vxworks-dummy.h ${tm_file}" ;; arm*-*-*) - tm_p_file="arm/arm-flags.h arm/arm-isa.h ${tm_p_file} arm/aarch-common-protos.h" + tm_p_file="arm/arm-flags.h ${tm_p_file} arm/aarch-common-protos.h" tm_file="vxworks-dummy.h ${tm_file}" ;; mips*-*-* | sh*-*-* | sparc*-*-*) diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in index d009a9e..07de4c9 100644 --- a/gcc/config/arm/arm-cpus.in +++ b/gcc/config/arm/arm-cpus.in @@ -40,6 +40,210 @@ # names in the final compiler. The order within each group is preserved and # forms the order for the list within the compiler. +# Most objects in this file support forward references. The major +# exception is feature groups, which may only refer to previously +# defined features or feature groups. This is done to avoid the risk +# of feature groups recursively ref
[PATCH, AArch64] Disable reg offset in quad-word store for Falkor.
On Falkor, because of an idiosyncracy of how the pipelines are designed, a quad-word store using a reg+reg addressing mode is almost twice as slow as an add followed by a quad-word store with a single reg addressing mode. So we get better performance if we disallow addressing modes using register offsets with quad-word stores. Using lmbench compiled with -O2 -ftree-vectorize as my benchmark, I see a 13% performance increase on stream copy using this patch, and a 16% performance increase on stream scale using this patch. I also see a small performance increase on SPEC CPU2006 of around 0.2% for int and 0.4% for FP at -O3. gcc/ * config/aarch64/aarch64-protos.h (aarch64_movti_target_operand_p): New. * config/aarch64/aarch64-simd.md (aarch64_simd_mov): Use Utf. * config/aarch64/aarch64-tuning-flags.def (SLOW_REGOFFSET_QUADWORD_STORE): New. * config/aarch64/aarch64.c (qdf24xx_tunings): Add SLOW_REGOFFSET_QUADWORD_STORE to tuning flags. (aarch64_movti_target_operand_p): New. * config/aarch64/aarch64.md (movti_aarch64): Use Utf. (movtf_aarch64): Likewise. * config/aarch64/constraints.md (Utf): New. --- gcc/config/aarch64/aarch64-protos.h | 1 + gcc/config/aarch64/aarch64-simd.md | 4 ++-- gcc/config/aarch64/aarch64-tuning-flags.def | 4 gcc/config/aarch64/aarch64.c| 14 +- gcc/config/aarch64/aarch64.md | 6 +++--- gcc/config/aarch64/constraints.md | 6 ++ 6 files changed, 29 insertions(+), 6 deletions(-) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index e67c2ed..2dfd057 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -379,6 +379,7 @@ const char *aarch64_output_move_struct (rtx *operands); rtx aarch64_return_addr (int, rtx); rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT); bool aarch64_simd_mem_operand_p (rtx); +bool aarch64_movti_target_operand_p (rtx); rtx aarch64_simd_vect_par_cnst_half (machine_mode, bool); rtx aarch64_tls_get_addr (void); tree aarch64_fold_builtin (tree, int, tree *, bool); diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 70e9339..88bf210 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -133,9 +133,9 @@ (define_insn "*aarch64_simd_mov" [(set (match_operand:VQ 0 "nonimmediate_operand" - "=w, Umq, m, w, ?r, ?w, ?r, w") + "=w, Umq, Utf, w, ?r, ?w, ?r, w") (match_operand:VQ 1 "general_operand" - "m, Dz, w, w, w, r, r, Dn"))] + "m, Dz, w,w, w, r, r, Dn"))] "TARGET_SIMD && (register_operand (operands[0], mode) || aarch64_simd_reg_or_zero (operands[1], mode))" diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index f48642c..7d0b104 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -41,4 +41,8 @@ AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW) are not considered cheap. */ AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND) +/* Don't use a register offset in a memory address for a quad-word store. */ +AARCH64_EXTRA_TUNING_OPTION ("slow_regoffset_quadword_store", +SLOW_REGOFFSET_QUADWORD_STORE) + #undef AARCH64_EXTRA_TUNING_OPTION diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 5e26cb7..d6f1133 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -818,7 +818,7 @@ static const struct tune_params qdf24xx_tunings = 2, /* min_div_recip_mul_df. */ 0, /* max_case_values. 
*/ tune_params::AUTOPREFETCHER_STRONG, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + (AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE), /* tune_flags. */ &qdf24xx_prefetch_tune }; @@ -11821,6 +11821,18 @@ aarch64_simd_mem_operand_p (rtx op) || REG_P (XEXP (op, 0))); } +/* Return TRUE if OP uses an efficient memory address for quad-word target. */ +bool +aarch64_movti_target_operand_p (rtx op) +{ + if (! optimize_size + && (aarch64_tune_params.extra_tuning_flags + & AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE)) +return MEM_P (op) && ! (GET_CODE (XEXP (op, 0)) == PLUS + && ! CONST_INT_P (XEXP (XEXP (op, 0), 1))); + return MEM_P (op); +} + /* Emit a register copy from operand to operand, taking care not to early-clobber source registers in the process. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index f8cdb06..9c7e356 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1023,7 +1023,7 @@ (define_i
[committed] v2: C++: underline parameters in mismatching function calls
On Thu, 2017-09-21 at 09:20 -0700, Nathan Sidwell wrote: > On 09/20/2017 12:52 PM, David Malcolm wrote: > > When we have a type mismatch in a C++ function call, e.g. > > whereas underlining the mismatching things would make the messages > > easier to comprehend: > > > >test.c: In function 'int caller(int, int, float)': > >test.c:5:38: error: invalid conversion from 'int' to 'const > > char*' > >[-fpermissive] > > return callee (first, second, third); > > ^~ > >test.c:1:12: note: initializing argument 2 of 'int callee(int, > >const char*, float)' > > extern int callee (int one, const char *two, float three); > > Looks nice. Thanks for addressing what Jason & I said at the summit. > > > OK for trunk? > > A few nits, I think ... > > > > + location_t param_loc; > > + if (declarator && declarator->id_loc) > > +param_loc = make_location (declarator->id_loc, > > + decl_spec_token_start->location, > > + input_location); > > + else > > +param_loc = make_location (decl_spec_token_start->location, > > + decl_spec_token_start->location, > > + input_location); > > Are you implicitly relying on UNKNOWN_LOCATION being zero? Should > the > if condition be >if (declarator && declarator->id_loc != UNKNOWN_LOCATION) > ? > > The only difference between those 2 calls is the first param. I > find > such idiom confusing. Can you assign the arg to a temp and then > call > the function exactly once? No need for re-review on that change. > > > /* A collection of calls where argument 2 is of the wrong type. > > > > TODO: we should put the caret and underline for the diagnostic > > - at the second argument, rather than the close paren. > > - > > - TODO: we should highlight the second parameter of the callee, > > rather > > - than its name. */ > > + at the second argument, rather than the close paren. */ > > Yay, x-off a TODO! > > Looks good otherwise. > > nathan Thanks. I went ahead and committed the following to trunk (as r253096), which fixes the issues you pointed out above, and the FUNCTION_FIRST_USER_PARM one Jason pointed out. (Successfully bootstrapped®rtested on x86_64-pc-linux-gnu) gcc/cp/ChangeLog: * call.c (get_fndecl_argument_location): New function. (convert_like_real): Use it when complaining about argument type mismatches. * cp-tree.h (struct cp_parameter_declarator): Add "loc" field. * parser.c (make_parameter_declarator): Add "loc" param and use it to initialize the new field. (cp_parser_translation_unit): Add UNKNOWN_LOCATION for "loc" of the "no_parameters" parameter. (cp_parser_parameter_declaration_list): Set the location of the result of grokdeclarator to be the parameter's loc, assuming no errors. (cp_parser_parameter_declaration): Generate a location for the parameter and pass to make_parameter_declarator. gcc/testsuite/ChangeLog: * g++.dg/diagnostic/param-type-mismatch.C: Update expected results to reflect highlighting of parameters; add test coverage for callback parameters. --- gcc/cp/call.c | 26 +- gcc/cp/cp-tree.h | 2 + gcc/cp/parser.c| 32 +++- .../g++.dg/diagnostic/param-type-mismatch.C| 57 +- 4 files changed, 103 insertions(+), 14 deletions(-) diff --git a/gcc/cp/call.c b/gcc/cp/call.c index 4fa0d03..e83cf99 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -6579,6 +6579,30 @@ maybe_print_user_conv_context (conversion *convs) } } +/* Locate the parameter with the given index within FNDECL. + ARGNUM is zero based, -1 indicates the `this' argument of a method. + Return the location of the FNDECL itself if there are problems. 
*/ + +static location_t +get_fndecl_argument_location (tree fndecl, int argnum) +{ + int i; + tree param; + + /* Locate param by index within DECL_ARGUMENTS (fndecl). */ + for (i = 0, param = FUNCTION_FIRST_USER_PARM (fndecl); + i < argnum && param; + i++, param = TREE_CHAIN (param)) +; + + /* If something went wrong (e.g. if we have a builtin and thus no arguments), + return the location of FNDECL. */ + if (param == NULL) +return DECL_SOURCE_LOCATION (fndecl); + + return DECL_SOURCE_LOCATION (param); +} + /* Perform the conversions in CONVS on the expression EXPR. FN and ARGNUM are used for diagnostics. ARGNUM is zero based, -1 indicates the `this' argument of a method. INNER is nonzero when @@ -6680,7 +6704,7 @@ convert_like_real (conversion *convs, tree expr, tree fn, int argnum, complained = permerror (loc, "invalid conversion from %qH to %qI", TREE_TYPE (expr), totype); if (co
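A hedged sketch of the new callback coverage mentioned in the testsuite ChangeLog (hypothetical code, not the committed param-type-mismatch.C): passing a callback whose type does not match parameter 2 should now produce a note underlining that parameter of the callee rather than just naming the function.

  extern int accepts_callback (int one, int (*two) (const char *), float three);
  extern int wrong_callback (int);

  int
  test (int first, float third)
  {
    return accepts_callback (first, wrong_callback, third);
  }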
C++ patch ping
Hi! I'd like to ping the http://gcc.gnu.org/ml/gcc-patches/2017-09/msg00937.html - fix compile time hog in replace_placeholders patch. Thanks Jakub
Re: [PATCH][GRAPHITE] More TLC
On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener wrote: > > This simplifies canonicalize_loop_closed_ssa and does other minimal > TLC. It also adds a testcase I reduced from a stupid mistake I made > when reworking canonicalize_loop_closed_ssa. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with > -Ofast -march=haswell -floop-nest-optimize are > > 61 loop nests "optimized" > 45 loop nest transforms cancelled because of code generation issues > 21 loop nest optimizations timed out the 35 ISL "operations" we allow > > I say "optimized" because the usual transform I've seen is static tiling > as enforced by GRAPHITE according to --param loop-block-tile-size. > There's no way to automagically figure what kind of transform ISL did > Here is how to automate (without magic) the detection of the transform that isl did. The problem solved by isl is the minimization of strides in memory, and to do this, we need to tell the isl scheduler the validity dependence graph, in graphite-optimize-isl.c see the validity (RAW, WAR, WAW) and the proximity (RAR + validity) maps. The proximity does include the read after read, as the isl scheduler needs to minimize strides between consecutive reads. When you apply the schedule to the dependence graph, one can tell from the result the strides in memory, a good way to say whether a transform was beneficial is to sum up all memory strides, and make sure that the sum of all strides decreases after transform. We could add a printf with the sum of strides before and after transforms, and have the testcases check for that. (usually none with the schedule identical check confused by FILTER > stuff positioning). This is also the issue with most GRAPHITE testcases. > We can't really verify easily whether we performed loop interchange > or not. We can probably tell whether we applied loop fusion or > splitting (by counting loops). > > I'm not aware of any remaining ICEs / wrong-code issues with GRAPHITE. > > I'm aware that the current "black-box" granularity hinders > scheduling freedom (each GIMPLE BB is mapped to a ISL stmt, this > is too coarse to schedule say two writes in a BB independently > from each other). Quick experiments could be done by simply > splitting gimple BBs at some points. > > I'm aware that the SCOP detection algorithm assumes that it can > walk loop->next and find loops "in order" -- but while that's > true for the initial flow_loops_find result (DFS walk) it isn't > true for any later created / discovered loops. Sorting of > loop siblings in DFS order should be easy (and a general cfgloopanal > helper). > > Richard. > > 2017-09-22 Richard Biener > > * graphite-isl-ast-to-gimple.c (graphite_verify): Inline into > single caller. > (graphite_regenerate_ast_isl): Do not reset SCEV. Move debug > print of no dependency loops ... > * graphite.c (graphite_transform_loops): ... here. > (canonicalize_loop_closed_ssa_form): Work from inner to outer > loops. > (same_close_phi_node, remove_duplicate_close_phi, > make_close_phi_nodes_unique, defined_in_loop_p): Fold into ... > (canonicalize_loop_closed_ssa): ... here and simplify. > * graphite-optimize-isl.c: Include tree-vectorizer.h. > (optimize_isl): Use dump_printf_loc to tell when we stopped > optimizing because of an ISL timeout. > > * gcc.dg/graphite/scop-24.c: New testcase. > > The change looks good to me. Thanks, Sebastian
Re: [Patch, Fortran] PR 82143: add a -fdefault-real-16 flag
On Fri, Sep 22, 2017 at 10:11:55AM +0300, Janne Blomqvist wrote: > On Fri, Sep 22, 2017 at 8:02 AM, Janus Weil wrote: > > Thanks, Steve. I'm attaching the updated ChangeLog and the two test > > cases for the two new flags. Since this appears to be a somewhat > > controversial feature, I'll wait two more days to allow for others to > > contribute their feedback (positive or negative). I'll commit on > > Sunday if I hear nothing until then. > > Well, if you're actively soliciting bikeshedding, here goes: > > Since we're about to have several -fdefault-real-N flags, would it > make sense to instead make a single flag -fdefault-real=SOMEVALUE (and > for backwards compatibility, make -fdefault-real-8 an alias for > -fdefault-real=8)? And similarly for -fdefault-double=. > > And since the standard requires that double precision variables are > twice as big as reals, in the absence of an explicit -fdefault-double= > flag, would it make sense to have -fdefault-real=N imply > -fdefault-double=[2*N or if that isn't supported on the target, the > largest supported real kind]? (This is sort-of an extension of the > current -fdefault-real-8 behavior) The standard requires more than just double precision being twice as big as default real. Among other things, storage association rules require default logical, integer, and real to all occupy the same number of storage units. The promotion of DOUBLE PRECISION to REAL(16), if available, was my misguided attempt to enforce storage association. That is the essence of why I think these options should go away. But, I understand Janus's position that -fdefault-real-16 allows a developer to quickly test whether an algorithm works and/or is stable at higher precision. The unfortunate side-effect is that a developer finds that the result of the option appears to work, so the developer never reviews the actual code. It is not uncommon to find old code of the form REAL EPS, NEW, OLD PARAMETER(EPS=1.E-8) ... C TEST FOR CONVERGENCE IF (ABS(NEW-OLD).LT.EPS) THEN Is the convergence criterion still correct for -fdefault-real-16? Only a developer reviewing and properly porting the code to a higher precision can make that decision. -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow
libbacktrace patch committed: Replace lstat and readlink if not available
This patch to libbacktrace provides dummy versions of lstat and readlink if they are not available on the system. Bootstrapped and ran libbacktrace tests on x86_64-pc-linux-gnu both normally and with a hand-edited config.h. Committed to mainline. Ian 2017-09-22 Ian Lance Taylor PR sanitizer/77631 * configure.ac: Check for lstat and readlink. * elf.c (lstat, readlink): Provide dummy versions if real versions are not available. * configure, config.h.in: Rebuild. Index: configure.ac === --- configure.ac(revision 253093) +++ configure.ac(working copy) @@ -373,6 +373,7 @@ if test "$have_fcntl" = "yes"; then fi AC_CHECK_DECLS(strnlen) +AC_CHECK_FUNCS(lstat readlink) # Check for getexecname function. if test -n "${with_target_subdir}"; then Index: elf.c === --- elf.c (revision 253093) +++ elf.c (working copy) @@ -75,6 +75,35 @@ xstrnlen (const char *s, size_t maxlen) #endif +#ifndef HAVE_LSTAT + +/* Dummy version of lstat for systems that don't have it. */ + +static int +xlstat (const char *path ATTRIBUTE_UNUSED, struct stat *st ATTRIBUTE_UNUSED) +{ + return -1; +} + +#define lstat xlstat + +#endif + +#ifndef HAVE_READLINK + +/* Dummy version of readlink for systems that don't have it. */ + +static ssize_t +xreadlink (const char *path ATTRIBUTE_UNUSED, char *buf ATTRIBUTE_UNUSED, + size_t bufsz ATTRIBUTE_UNUSED) +{ + return -1; +} + +#define readlink xreadlink + +#endif + #ifndef HAVE_DL_ITERATE_PHDR /* Dummy version of dl_iterate_phdr for systems that don't have it. */
Re: [PATCH][GRAPHITE] Simplify move_sese_in_condition
On Fri, Sep 22, 2017 at 4:37 AM, Richard Biener wrote: > > This re-implements it avoiding the need to recompute dominators and in > a much simpler way. > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress, SPEC CPU > 2006 is happy. > > Richard. > > 2017-09-22 Richard Biener > > * sese.c: Include cfganal.h. > (if_region_set_false_region): Remove. > (create_if_region_on_edge): Likewise. > (move_sese_in_condition): Re-implement without destroying > dominators. > This is an excellent cleanup. Thanks! Sebastian
[PATCH] Fix __mulv[dt]i3 and expand_mul_overflow (PR target/82274)
Hi! An overflow isn't detected on multiplication of signed integer with all ones in the upper half of bits and all zeros in the lower half by itself, either with -ftrapv, or __builtin_mul_overflow{,_p}, or signed-integer-overflow sanitization. For libgcc, we have various cases we handle earlier (e.g. when one or both operands are properly sign-extended from half precision to the full precision), and for other operands we mostly abort, except for some special cases (when upper half is all 0s or all 1s, but the msb of lower half is different from that). When upper halves of both operands are all 1s, we mishandle the above mentioned special case: DWunion ww = {.ll = (UDWtype) (UWtype) uu.s.low * (UDWtype) (UWtype) vv.s.low}; ww.s.high -= uu.s.low; ww.s.high -= vv.s.low; if (__builtin_expect (ww.s.high >= 0, 1)) return ww.ll; ww.ll will be 0 and so will be ww.s.high, so we return 0, but the only case when multiplication result is 0 without overflow is when either of the operands is 0 (but in that case it is properly sign-extended from lower half and thus we handle it earlier). So, either we could do if (__builtin_expect (ww.s.high >= 0 && ww.ll, 1)) or what the patch does (which is as expensive as that IMHO, but without performing the multiplication in that case). The expand_mul_overflow fix is similar, in that case we know that either both operands have all upper half bits 1 and then 0 msb of lower half, or both operands have all upper half bits 0 and then 1 msb of lower half. In any case, we know that neither operand is 0, and thus if the result is 0, we certainly have overflow. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and release branches? 2017-09-22 Jakub Jelinek PR target/82274 * internal-fn.c (expand_mul_overflow): If both operands have the same highpart of -1 or 0 and the topmost bit of lowpart is different, overflow is if res <= 0 rather than res < 0. * libgcc2.c (__mulvDI3): If both operands have the same highpart of -1 and the topmost bit of lowpart is 0, multiplication overflows even if both lowparts are 0. * gcc.dg/pr82274-1.c: New test. * gcc.dg/pr82274-2.c: New test. --- gcc/internal-fn.c.jj2017-09-01 09:25:44.0 +0200 +++ gcc/internal-fn.c 2017-09-22 11:10:13.901438868 +0200 @@ -1770,8 +1770,8 @@ expand_mul_overflow (location_t loc, tre } /* At this point hipart{0,1} are both in [-1, 0]. If they are -the same, overflow happened if res is negative, if they are -different, overflow happened if res is positive. */ +the same, overflow happened if res is non-positive, if they +are different, overflow happened if res is positive. 
*/ if (op0_sign != 1 && op1_sign != 1 && op0_sign != op1_sign) emit_jump (hipart_different); else if (op0_sign == 1 || op1_sign == 1) @@ -1779,7 +1779,7 @@ expand_mul_overflow (location_t loc, tre NULL_RTX, NULL, hipart_different, profile_probability::even ()); - do_compare_rtx_and_jump (res, const0_rtx, LT, false, mode, + do_compare_rtx_and_jump (res, const0_rtx, LE, false, mode, NULL_RTX, NULL, do_error, profile_probability::very_unlikely ()); emit_jump (done_label); --- libgcc/libgcc2.c.jj 2017-01-01 12:45:50.0 +0100 +++ libgcc/libgcc2.c2017-09-22 10:17:56.010757022 +0200 @@ -375,7 +375,8 @@ __mulvDI3 (DWtype u, DWtype v) } else { - if (uu.s.high == (Wtype) -1 && vv.s.high == (Wtype) - 1) + if ((uu.s.high & vv.s.high) == (Wtype) -1 + && (uu.s.low | vv.s.low) != 0) { DWunion ww = {.ll = (UDWtype) (UWtype) uu.s.low * (UDWtype) (UWtype) vv.s.low}; --- gcc/testsuite/gcc.dg/pr82274-1.c.jj 2017-09-22 09:58:06.016739604 +0200 +++ gcc/testsuite/gcc.dg/pr82274-1.c2017-09-22 10:10:53.466071164 +0200 @@ -0,0 +1,16 @@ +/* PR target/82274 */ +/* { dg-do run } */ +/* { dg-shouldfail "trapv" } */ +/* { dg-options "-ftrapv" } */ + +int +main () +{ +#ifdef __SIZEOF_INT128__ + volatile __int128 m = -(((__int128) 1) << (__CHAR_BIT__ * __SIZEOF_INT128__ / 2)); +#else + volatile long long m = -(1LL << (__CHAR_BIT__ * __SIZEOF_LONG_LONG__ / 2)); +#endif + m = m * m; + return 0; +} --- gcc/testsuite/gcc.dg/pr82274-2.c.jj 2017-09-22 11:17:59.534630946 +0200 +++ gcc/testsuite/gcc.dg/pr82274-2.c2017-09-22 11:19:17.340661013 +0200 @@ -0,0 +1,26 @@ +/* PR target/82274 */ +/* { dg-do run } */ +/
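To spell out the case the old special branch missed (a worked example implied by the new test, using 32-bit Wtype halves inside a 64-bit DWtype): take u = v = -(1LL << 32), so both high halves are all ones and both low halves are zero. The true product is 2^64, which is not representable, yet the old code computed ww.ll = 0 * 0 = 0, subtracted the two zero low parts from ww.s.high, saw ww.s.high >= 0 and returned 0 instead of reaching the overflow abort. Requiring a non-zero low part in at least one operand on the libgcc side, and checking res <= 0 on the expander side, closes exactly this hole.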
[PATCH][GRAPHITE] More TLC
This simplifies canonicalize_loop_closed_ssa and does other minimal TLC. It also adds a testcase I reduced from a stupid mistake I made when reworking canonicalize_loop_closed_ssa. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. SPEC CPU 2006 is happy with it, current statistics on x86_64 with -Ofast -march=haswell -floop-nest-optimize are 61 loop nests "optimized" 45 loop nest transforms cancelled because of code generation issues 21 loop nest optimizations timed out the 35 ISL "operations" we allow I say "optimized" because the usual transform I've seen is static tiling as enforced by GRAPHITE according to --param loop-block-tile-size. There's no way to automagically figure what kind of transform ISL did (usually none with the schedule identical check confused by FILTER stuff positioning). This is also the issue with most GRAPHITE testcases. We can't really verify easily whether we performed loop interchange or not. We can probably tell whether we applied loop fusion or splitting (by counting loops). I'm not aware of any remaining ICEs / wrong-code issues with GRAPHITE. I'm aware that the current "black-box" granularity hinders scheduling freedom (each GIMPLE BB is mapped to a ISL stmt, this is too coarse to schedule say two writes in a BB independently from each other). Quick experiments could be done by simply splitting gimple BBs at some points. I'm aware that the SCOP detection algorithm assumes that it can walk loop->next and find loops "in order" -- but while that's true for the initial flow_loops_find result (DFS walk) it isn't true for any later created / discovered loops. Sorting of loop siblings in DFS order should be easy (and a general cfgloopanal helper). Richard. 2017-09-22 Richard Biener * graphite-isl-ast-to-gimple.c (graphite_verify): Inline into single caller. (graphite_regenerate_ast_isl): Do not reset SCEV. Move debug print of no dependency loops ... * graphite.c (graphite_transform_loops): ... here. (canonicalize_loop_closed_ssa_form): Work from inner to outer loops. (same_close_phi_node, remove_duplicate_close_phi, make_close_phi_nodes_unique, defined_in_loop_p): Fold into ... (canonicalize_loop_closed_ssa): ... here and simplify. * graphite-optimize-isl.c: Include tree-vectorizer.h. (optimize_isl): Use dump_printf_loc to tell when we stopped optimizing because of an ISL timeout. * gcc.dg/graphite/scop-24.c: New testcase. Index: gcc/graphite-isl-ast-to-gimple.c === --- gcc/graphite-isl-ast-to-gimple.c(revision 253091) +++ gcc/graphite-isl-ast-to-gimple.c(working copy) @@ -73,15 +73,6 @@ struct ast_build_info bool is_parallelizable; }; -/* Verifies properties that GRAPHITE should maintain during translation. */ - -static inline void -graphite_verify (void) -{ - checking_verify_loop_structure (); - checking_verify_loop_closed_ssa (true); -} - /* IVS_PARAMS maps isl's scattering and parameter identifiers to corresponding trees. */ @@ -2997,8 +2988,9 @@ graphite_regenerate_ast_isl (scop_p scop delete_loop (loop); } - graphite_verify (); - scev_reset (); + /* Verifies properties that GRAPHITE should maintain during translation. 
*/ + checking_verify_loop_structure (); + checking_verify_loop_closed_ssa (true); free (if_region->true_region); free (if_region->region); @@ -3008,19 +3000,6 @@ graphite_regenerate_ast_isl (scop_p scop isl_ast_node_free (root_node); timevar_pop (TV_GRAPHITE_CODE_GEN); - if (dump_file && (dump_flags & TDF_DETAILS)) -{ - loop_p loop; - int num_no_dependency = 0; - - FOR_EACH_LOOP (loop, 0) - if (loop->can_be_parallel) - num_no_dependency++; - - fprintf (dump_file, "%d loops carried no dependency.\n", - num_no_dependency); -} - return !t.codegen_error_p (); } Index: gcc/graphite-optimize-isl.c === --- gcc/graphite-optimize-isl.c (revision 253091) +++ gcc/graphite-optimize-isl.c (working copy) @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3. #include "tree-data-ref.h" #include "params.h" #include "dumpfile.h" +#include "tree-vectorizer.h" #include "graphite.h" @@ -156,9 +157,12 @@ optimize_isl (scop_p scop) if (!scop->transformed_schedule || isl_ctx_last_error (scop->isl_context) == isl_error_quota) { - if (dump_file && dump_flags) - fprintf (dump_file, "isl timed out --param max-isl-operations=%d\n", -max_operations); + location_t loc = find_loop_location + (scop->scop_info->region.entry->dest->loop_father); + dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc, + "loop nest not optimized, optimization timed out " + "after %d operations [--param ma
Re: [PATCH PR82163/V2]New interface checking LCSSA for single loop
On Fri, Sep 22, 2017 at 1:37 PM, Bin Cheng wrote: > Hi, > This is the V2 patch fixing PR82163. It rewrites verify_loop_closed_ssa by > checking > uses of all definitions inside of loop. This gives advantage that we can > check loop > closed ssa form for a specific loop, rather than for whole function. The > interface > is used in fixing this issue. > Bootstrap and test on x86_64, is it OK? Looks good to me. Thanks, Richard. > Thanks, > bin > 2017-09-21 Bin Cheng > > PR tree-optimization/82163 > * tree-ssa-loop-manip.h (verify_loop_closed_ssa): New parameter. > (checking_verify_loop_closed_ssa): New parameter. > * tree-ssa-loop-manip.c (check_loop_closed_ssa_use): Delete. > (check_loop_closed_ssa_stmt): Delete. > (check_loop_closed_ssa_def, check_loop_closed_ssa_bb): New functions. > (verify_loop_closed_ssa): Check loop closed ssa form for LOOP. > (tree_transform_and_unroll_loop): Check loop closed ssa form only for > changed loops. > > gcc/testsuite > 2017-09-21 Bin Cheng > > PR tree-optimization/82163 > * gcc.dg/tree-ssa/pr82163.c: New test.
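For context, a small illustration of the invariant being verified (illustrative code, not from the patch):

  extern int a[];
  extern void use (int);

  void
  f (int n)
  {
    int x = 0;
    for (int i = 0; i < n; i++)
      x = a[i];        /* definition inside the loop */
    use (x);           /* use after the loop */
  }

In loop-closed SSA the value of x computed inside the loop may only reach code after the loop through a PHI on the loop exit (x_lcssa = PHI <x_inner>). Walking every definition inside a given loop and checking that each use either stays in the loop or reaches it through such an exit PHI is therefore enough to verify the form for that loop alone, which is what the new per-loop interface relies on.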
[PATCH] Fix PR82291
The following fixes a goof in if-conversion which did nothing to false predicated blocks. But we have to at least remove all stores. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2017-09-22 Richard Biener PR tree-optimization/82291 * tree-if-conv.c (predicate_mem_writes): Make sure to remove writes in blocks predicated with false. * gcc.dg/torture/pr82291.c: New testcase. Index: gcc/tree-if-conv.c === --- gcc/tree-if-conv.c (revision 253090) +++ gcc/tree-if-conv.c (working copy) @@ -2197,7 +2197,7 @@ predicate_mem_writes (loop_p loop) gimple *stmt; int index; - if (is_true_predicate (cond) || is_false_predicate (cond)) + if (is_true_predicate (cond)) continue; swap = false; @@ -2210,97 +2210,107 @@ predicate_mem_writes (loop_p loop) vect_sizes.truncate (0); vect_masks.truncate (0); - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) - if (!gimple_assign_single_p (stmt = gsi_stmt (gsi))) - continue; - else if (gimple_plf (stmt, GF_PLF_2)) - { - tree lhs = gimple_assign_lhs (stmt); - tree rhs = gimple_assign_rhs1 (stmt); - tree ref, addr, ptr, mask; - gcall *new_stmt; - gimple_seq stmts = NULL; - int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs))); - ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs; - mark_addressable (ref); - addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (ref), -true, NULL_TREE, true, -GSI_SAME_STMT); - if (!vect_sizes.is_empty () - && (index = mask_exists (bitsize, vect_sizes)) != -1) - /* Use created mask. */ - mask = vect_masks[index]; - else - { - if (COMPARISON_CLASS_P (cond)) - mask = gimple_build (&stmts, TREE_CODE (cond), - boolean_type_node, - TREE_OPERAND (cond, 0), - TREE_OPERAND (cond, 1)); - else - { - gcc_assert (TREE_CODE (cond) == SSA_NAME); - mask = cond; - } - - if (swap) - { - tree true_val - = constant_boolean_node (true, TREE_TYPE (mask)); - mask = gimple_build (&stmts, BIT_XOR_EXPR, -TREE_TYPE (mask), mask, true_val); - } - gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT); - - mask = ifc_temp_var (TREE_TYPE (mask), mask, &gsi); - /* Save mask and its size for further use. */ - vect_sizes.safe_push (bitsize); - vect_masks.safe_push (mask); - } - ptr = build_int_cst (reference_alias_ptr_type (ref), -get_object_alignment (ref)); - /* Copy points-to info if possible. */ - if (TREE_CODE (addr) == SSA_NAME && !SSA_NAME_PTR_INFO (addr)) - copy_ref_info (build2 (MEM_REF, TREE_TYPE (ref), addr, ptr), -ref); - if (TREE_CODE (lhs) == SSA_NAME) - { - new_stmt - = gimple_build_call_internal (IFN_MASK_LOAD, 3, addr, - ptr, mask); - gimple_call_set_lhs (new_stmt, lhs); - gimple_set_vuse (new_stmt, gimple_vuse (stmt)); - } - else - { - new_stmt - = gimple_build_call_internal (IFN_MASK_STORE, 4, addr, ptr, + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);) + { + if (!gimple_assign_single_p (stmt = gsi_stmt (gsi))) + ; + else if (is_false_predicate (cond)) + { + unlink_stmt_vdef (stmt); + gsi_remove (&gsi, true); + release_defs (stmt); + continue; + } + else if (gimple_plf (stmt, GF_PLF_2)) + { + tree lhs = gimple_assign_lhs (stmt); + tree rhs = gimple_assign_rhs1 (stmt); + tree ref, addr, ptr, mask; + gcall *new_stmt; + gimple_seq stmts = NULL; + int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs))); + ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs; + mark_addressable (ref); + addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (ref), + true, NULL_TREE, true, + GSI_SAME_STMT); + if (!vect_sizes.is_empty () +
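Put differently (a reading of the fix, not wording from the PR): after if-conversion the statements of every block run unconditionally in the flattened loop body, so a store sitting in a block whose predicate folded to false would clobber memory the original loop never wrote. Since such stores are never turned into masked stores, removing them together with their virtual definitions is the minimum needed for correctness.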
Re: [PATCH try 2] [i386,testsuite,libgcc] Fix build breakage on Mac and test FAILS on Solaris caused by PR82196 patch
On Fri, Sep 22, 2017 at 1:27 PM, Uros Bizjak wrote: > On Fri, Sep 22, 2017 at 12:28 PM, Daniel Santos > wrote: >> On 09/22/2017 03:28 AM, Rainer Orth wrote: >>> Hi Daniel, >>> On 09/22/2017 02:18 AM, Rainer Orth wrote: > Hi Daniel, > >> On 09/21/2017 05:18 PM, Daniel Santos wrote: >>> So libgcc doesn't use a config.in. :( >> Scratch that, I forgot that we're using gcc/config.in via auto-host.h. >> So I only have to add this to gcc/configure.ac and it will be available >> for my libgcc header -- this is what I used to sniff out support for the >> .hidden directive. > Please don't go that route: it's totally the wrong direction. There's > work going on to further decouple libgcc from gcc-private headers and > configure results. libgcc already has its own configure tests for > assembler features, and its own config.in. What's wrong with adapting > libitm's avx test in libitm/acinclude.m4 (LIBITM_CHECK_AS_AVX) for > libgcc? Should be trivial... > > Rainer > Oops, I just saw your email after submitting my other patch. Yes, I am mistaken about config.in, sorry about that. I didn't see a config.h file, but examining further it looks like it outputs to auto-target.h. Also, I was looking for some HAVE_AS* macros, but they are named differently. >>> Right: though some are for assembler features, the macros are named >>> differently. >>> I had previously included gcc's auto-host.h since it was in the include path in order to use HAVE_AS_HIDDEN, so in order to decouple this I'll >>> HAVE_GAS_HIDDEN actually ;-) >>> need to add that check into libgcc/configure.ac as well. Again, shouldn't be that much code. Sound sane to you? >>> You could do that, but it was already used before your patches, so >>> please separate it from the current issue if you go down that route. >>> libgcc is still full of cleanup possibilities :-) >>> >>> Rainer >> >> OK, so I'm just adding HAVE_AS_AVX mostly as-is from libitm (we don't >> have $target_cpu so I'm using $target). I do have minor concerns about >> how this test will work on a cross-build -- I'm not an autotools expert >> and I don't understand which assembler it will invoke, but the results >> of the test failing only means we use .byte instead of the real >> mnemonic, so it really shouldn't be a problem. >> >> I've got tests started again, so presuming that *this* one passes, is it >> OK for the trunk? >> >> gcc/testsuite: >> * gcc.target/i386/pr82196-1.c: Simplify so that it doesn't break >> on Solaris or with -mno-omit-frame-pointer. > > No need to explain the change in the ChangeLog. Just say "(b): Remove > volatile asm." > >> * gcc.target/i386/pr82196-2.c: Likewise. >> >> libgcc: >> * configure.ac: Add check for HAVE_AS_AVX. >> * config.in: Regenerate. >> * configure: Likewise. >> * config/i386/i386-asm.h: Include auto-target.h from libgcc. >> (SSE_SAVE, SSE_RESTORE): Sniff HAVE_AS_AVX and directly emit raw >> .byte code when assembler doesn't support avx, correct >> out-of-date comments. > > (SSE_SAVE, SSE_RESTORE): Emit .byte sequence for !HAVE_AS_AVX. > Correct out-of-date comments. 
> >> >> gcc/testsuite/gcc.target/i386/pr82196-1.c | 5 ++- >> gcc/testsuite/gcc.target/i386/pr82196-2.c | 5 ++- >> libgcc/config.in | 3 ++ >> libgcc/config/i386/i386-asm.h | 45 ++- >> libgcc/configure | 39 +++ >> libgcc/configure.ac | 16 ++ >> 6 files changed, 100 insertions(+), 13 deletions(-) > > >> > > #ifdef MS2SYSV_STUB_AVX > # define MS2SYSV_STUB_PREFIX __avx_ > -# define MOVAPS vmovaps > +# ifdef HAVE_AS_AVX > +# define MOVAPS vmovaps > +# endif > > This is unecessarily complex. Please define MOVAPS unconditionaly, and ... Please disregard the above... It is OK, since the code also handles sse. > +# define BYTE .byte > +# define SSE_SAVE\ > + BYTE 0xc5, 0x78, 0x29, 0x78, 0xd0; /* vmovaps %xmm15,-0x30(%rax) */ \ > > Is there a reason for BYTE definition? Every known assembler supports > .byte directive. + BYTE 0xc5, 0xf8, 0x28, 0x76, 0x60; /* vmovaps 0x60(%rsi),%xmm6 */ +# endif /* MOVAPS */ #endif /* defined (MS2SYSV_STUB_ISA) && defined (MOVAPS) */ #endif /* I386_ASM_H */ Please update the #endif comment. Uros.
[PATCH PR82163/V2] New interface checking LCSSA for a single loop
Hi, This is the V2 patch fixing PR82163. It rewrites verify_loop_closed_ssa by checking uses of all definitions inside of loop. This gives advantage that we can check loop closed ssa form for a specific loop, rather than for whole function. The interface is used in fixing this issue. Bootstrap and test on x86_64, is it OK? Thanks, bin 2017-09-21 Bin Cheng PR tree-optimization/82163 * tree-ssa-loop-manip.h (verify_loop_closed_ssa): New parameter. (checking_verify_loop_closed_ssa): New parameter. * tree-ssa-loop-manip.c (check_loop_closed_ssa_use): Delete. (check_loop_closed_ssa_stmt): Delete. (check_loop_closed_ssa_def, check_loop_closed_ssa_bb): New functions. (verify_loop_closed_ssa): Check loop closed ssa form for LOOP. (tree_transform_and_unroll_loop): Check loop closed ssa form only for changed loops. gcc/testsuite 2017-09-21 Bin Cheng PR tree-optimization/82163 * gcc.dg/tree-ssa/pr82163.c: New test.From ef756285db3685bd97bbe7d144d58af477251416 Mon Sep 17 00:00:00 2001 From: Bin Cheng Date: Thu, 21 Sep 2017 12:41:32 +0100 Subject: [PATCH] pr82163-20170921.txt --- gcc/testsuite/gcc.dg/tree-ssa/pr82163.c | 23 + gcc/tree-ssa-loop-manip.c | 91 +++-- gcc/tree-ssa-loop-manip.h | 6 +-- 3 files changed, 80 insertions(+), 40 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr82163.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr82163.c b/gcc/testsuite/gcc.dg/tree-ssa/pr82163.c new file mode 100644 index 000..fef2b1d --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr82163.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ + +int a, b, c[4], d, e, f, g; + +void h () +{ + for (; a; a++) +{ + c[a + 3] = g; + if (b) +c[a] = f; + else +{ + for (; d; d++) +c[d + 3] = c[d]; + for (e = 1; e == 2; e++) +; + if (e) +break; +} +} +} diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c index d6ba305..6ad0b75 100644 --- a/gcc/tree-ssa-loop-manip.c +++ b/gcc/tree-ssa-loop-manip.c @@ -690,48 +690,62 @@ rewrite_virtuals_into_loop_closed_ssa (struct loop *loop) rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_VIRTUAL_USES, loop); } -/* Check invariants of the loop closed ssa form for the USE in BB. */ +/* Check invariants of the loop closed ssa form for the def in DEF_BB. */ static void -check_loop_closed_ssa_use (basic_block bb, tree use) +check_loop_closed_ssa_def (basic_block def_bb, tree def) { - gimple *def; - basic_block def_bb; + use_operand_p use_p; + imm_use_iterator iterator; + FOR_EACH_IMM_USE_FAST (use_p, iterator, def) +{ + if (is_gimple_debug (USE_STMT (use_p))) + continue; - if (TREE_CODE (use) != SSA_NAME || virtual_operand_p (use)) -return; + basic_block use_bb = gimple_bb (USE_STMT (use_p)); + if (is_a (USE_STMT (use_p))) + use_bb = EDGE_PRED (use_bb, PHI_ARG_INDEX_FROM_USE (use_p))->src; - def = SSA_NAME_DEF_STMT (use); - def_bb = gimple_bb (def); - gcc_assert (!def_bb - || flow_bb_inside_loop_p (def_bb->loop_father, bb)); + gcc_assert (flow_bb_inside_loop_p (def_bb->loop_father, use_bb)); +} } -/* Checks invariants of loop closed ssa form in statement STMT in BB. */ +/* Checks invariants of loop closed ssa form in BB. 
*/ static void -check_loop_closed_ssa_stmt (basic_block bb, gimple *stmt) +check_loop_closed_ssa_bb (basic_block bb) { - ssa_op_iter iter; - tree var; + for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi); + gsi_next (&bsi)) +{ + gphi *phi = bsi.phi (); - if (is_gimple_debug (stmt)) -return; + if (!virtual_operand_p (PHI_RESULT (phi))) + check_loop_closed_ssa_def (bb, PHI_RESULT (phi)); +} + + for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi); + gsi_next (&bsi)) +{ + ssa_op_iter iter; + tree var; + gimple *stmt = gsi_stmt (bsi); + + if (is_gimple_debug (stmt)) + continue; - FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_USE) -check_loop_closed_ssa_use (bb, var); + FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_DEF) + check_loop_closed_ssa_def (bb, var); +} } /* Checks that invariants of the loop closed ssa form are preserved. - Call verify_ssa when VERIFY_SSA_P is true. */ + Call verify_ssa when VERIFY_SSA_P is true. Note all loops are checked + if LOOP is NULL, otherwise, only LOOP is checked. */ DEBUG_FUNCTION void -verify_loop_closed_ssa (bool verify_ssa_p) +verify_loop_closed_ssa (bool verify_ssa_p, struct loop *loop) { - basic_block bb; - edge e; - edge_iterator ei; - if (number_of_loops (cfun) <= 1) return; @@ -740,20 +754,22 @@ verify_loop_closed_ssa (bool verify_ssa_p) timevar_push (TV_VERIFY_LOO
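As background for what the new check_loop_closed_ssa_def verifies (a rough illustration with hypothetical names, not taken from the patch): in loop-closed SSA form, an SSA name defined inside a loop may only be used inside that loop; any use after the loop has to go through a PHI node in the exit block, and for PHI arguments the check uses the source block of the corresponding edge.

/* In loop-closed SSA the value of 'sum' computed inside the loop is
   not used directly after the loop; an exit PHI such as
   sum_lcssa = PHI <sum_inner(exit_edge)> carries it out, and the
   return statement uses sum_lcssa.  The verifier walks every
   definition in the loop and checks each non-debug use this way.  */
int
sum_array (const int *p, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += p[i];      /* definition inside the loop */
  return sum;         /* use outside the loop, reached via the exit PHI */
}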
Re: [PATCH try 2] [i386,testsuite,libgcc] Fix build breakage on Mac and test FAILS on Solaris caused by PR82196 patch
On Fri, Sep 22, 2017 at 12:28 PM, Daniel Santos wrote: > On 09/22/2017 03:28 AM, Rainer Orth wrote: >> Hi Daniel, >> >>> On 09/22/2017 02:18 AM, Rainer Orth wrote: Hi Daniel, > On 09/21/2017 05:18 PM, Daniel Santos wrote: >> So libgcc doesn't use a config.in. :( > Scratch that, I forgot that we're using gcc/config.in via auto-host.h. > So I only have to add this to gcc/configure.ac and it will be available > for my libgcc header -- this is what I used to sniff out support for the > .hidden directive. Please don't go that route: it's totally the wrong direction. There's work going on to further decouple libgcc from gcc-private headers and configure results. libgcc already has its own configure tests for assembler features, and its own config.in. What's wrong with adapting libitm's avx test in libitm/acinclude.m4 (LIBITM_CHECK_AS_AVX) for libgcc? Should be trivial... Rainer >>> Oops, I just saw your email after submitting my other patch. Yes, I am >>> mistaken about config.in, sorry about that. I didn't see a config.h >>> file, but examining further it looks like it outputs to auto-target.h. >>> Also, I was looking for some HAVE_AS* macros, but they are named >>> differently. >> Right: though some are for assembler features, the macros are named >> differently. >> >>> I had previously included gcc's auto-host.h since it was in the include >>> path in order to use HAVE_AS_HIDDEN, so in order to decouple this I'll >> HAVE_GAS_HIDDEN actually ;-) >> >>> need to add that check into libgcc/configure.ac as well. Again, >>> shouldn't be that much code. Sound sane to you? >> You could do that, but it was already used before your patches, so >> please separate it from the current issue if you go down that route. >> libgcc is still full of cleanup possibilities :-) >> >> Rainer > > OK, so I'm just adding HAVE_AS_AVX mostly as-is from libitm (we don't > have $target_cpu so I'm using $target). I do have minor concerns about > how this test will work on a cross-build -- I'm not an autotools expert > and I don't understand which assembler it will invoke, but the results > of the test failing only means we use .byte instead of the real > mnemonic, so it really shouldn't be a problem. > > I've got tests started again, so presuming that *this* one passes, is it > OK for the trunk? > > gcc/testsuite: > * gcc.target/i386/pr82196-1.c: Simplify so that it doesn't break > on Solaris or with -mno-omit-frame-pointer. No need to explain the change in the ChangeLog. Just say "(b): Remove volatile asm." > * gcc.target/i386/pr82196-2.c: Likewise. > > libgcc: > * configure.ac: Add check for HAVE_AS_AVX. > * config.in: Regenerate. > * configure: Likewise. > * config/i386/i386-asm.h: Include auto-target.h from libgcc. > (SSE_SAVE, SSE_RESTORE): Sniff HAVE_AS_AVX and directly emit raw > .byte code when assembler doesn't support avx, correct > out-of-date comments. (SSE_SAVE, SSE_RESTORE): Emit .byte sequence for !HAVE_AS_AVX. Correct out-of-date comments. > > gcc/testsuite/gcc.target/i386/pr82196-1.c | 5 ++- > gcc/testsuite/gcc.target/i386/pr82196-2.c | 5 ++- > libgcc/config.in | 3 ++ > libgcc/config/i386/i386-asm.h | 45 ++- > libgcc/configure | 39 +++ > libgcc/configure.ac | 16 ++ > 6 files changed, 100 insertions(+), 13 deletions(-) > #ifdef MS2SYSV_STUB_AVX # define MS2SYSV_STUB_PREFIX __avx_ -# define MOVAPS vmovaps +# ifdef HAVE_AS_AVX +# define MOVAPS vmovaps +# endif This is unecessarily complex. Please define MOVAPS unconditionaly, and ... 
#elif defined(MS2SYSV_STUB_SSE) # define MS2SYSV_STUB_PREFIX __sse_ # define MOVAPS movaps #endif -#if defined (MS2SYSV_STUB_PREFIX) && defined (MOVAPS) +#if defined (MS2SYSV_STUB_PREFIX) # define MS2SYSV_STUB_BEGIN(base_name) \ HIDDEN_FUNC(PASTE2(MS2SYSV_STUB_PREFIX, base_name)) @@ -83,8 +86,10 @@ ASMNAME(fn): # define MS2SYSV_STUB_END(base_name) \ FUNC_END(PASTE2(MS2SYSV_STUB_PREFIX, base_name)) -/* Save SSE registers 6-15. off is the offset of rax to get to xmm6. */ -# define SSE_SAVE \ +/* If expanding for sse or avx and we have assembler support. */ +# ifdef MOVAPS ... check HAVE_AS_AVX here. +/* Save SSE registers 6-15 using rax as the base address. */ +# define SSE_SAVE \ MOVAPS %xmm15,-0x30(%rax); \ +# define BYTE .byte +# define SSE_SAVE\ + BYTE 0xc5, 0x78, 0x29, 0x78, 0xd0; /* vmovaps %xmm15,-0x30(%rax) */ \ Is there a reason for BYTE definition? Every known assembler supports .byte directive. Uros.
Re: [PATCH] haifa-sched: fix autopref_rank_for_schedule qsort comparator
On Tue, 19 Sep 2017, Alexander Monakov wrote: > * haifa-sched.c (autopref_rank_for_schedule): Order 'irrelevant' insns > first, always call autopref_rank_data otherwise. May I apply this patch now to unblock qsort checking? Further changes or adjustments can then go in independently at a later time. Thanks. Alexander > --- a/gcc/haifa-sched.c > +++ b/gcc/haifa-sched.c > @@ -5707,7 +5707,8 @@ autopref_rank_data (autopref_multipass_data_t data1, > static int > autopref_rank_for_schedule (const rtx_insn *insn1, const rtx_insn *insn2) > { > - for (int write = 0; write < 2; ++write) > + int r = 0; > + for (int write = 0; write < 2 && !r; ++write) > { >autopref_multipass_data_t data1 > = &INSN_AUTOPREF_MULTIPASS_DATA (insn1)[write]; > @@ -5716,21 +5717,20 @@ autopref_rank_for_schedule (const rtx_insn *insn1, > const rtx_insn *insn2) > >if (data1->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED) > autopref_multipass_init (insn1, write); > - if (data1->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT) > - continue; > >if (data2->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED) > autopref_multipass_init (insn2, write); > - if (data2->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT) > - continue; > > - if (!rtx_equal_p (data1->base, data2->base)) > - continue; > + int irrel1 = data1->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT; > + int irrel2 = data2->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT; > > - return autopref_rank_data (data1, data2); > + if (!irrel1 && !irrel2) > + r = autopref_rank_data (data1, data2); > + else > + r = irrel2 - irrel1; > } > > - return 0; > + return r; > } > > /* True if header of debug dump was printed. */ >
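For reference, the pattern the patch adopts, sketched as standalone code (hypothetical names, not GCC code): elements flagged as irrelevant are grouped to one side, and the real ranking function is consulted only when both elements are relevant, so the relation stays consistent, which the qsort checking mentioned above requires.

/* Standalone sketch of the comparator shape.  Effectively answering
   "equal" whenever only one side is irrelevant, as the old code did,
   can break transitivity and trip the consistency checks.  */
struct item { bool irrelevant; int rank; };

int
cmp_items (const item &a, const item &b)
{
  if (a.irrelevant || b.irrelevant)
    return (int) b.irrelevant - (int) a.irrelevant;  /* irrelevant first */
  return a.rank < b.rank ? -1 : a.rank > b.rank;     /* then by rank */
}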
Re: [PATCH try 2] [i386,testsuite,libgcc] Fix build breakage on Mac and test FAILS on Solaris caused by PR82196 patch
On Fri, Sep 22, 2017 at 05:28:00AM -0500, Daniel Santos wrote: > +/* If the assembler doesn't support AVX then directly emit machine code > + for the instructions above directly. */ Just a nit: too many "directly" words. Jakub
Re: Add a vect_get_dr_size helper function
On Fri, Sep 22, 2017 at 10:15 AM, Richard Sandiford wrote: > Richard Biener writes: >> On Thu, Sep 14, 2017 at 4:05 PM, Richard Sandiford >> wrote: >>> Richard Biener writes: On Thu, Sep 14, 2017 at 1:23 PM, Richard Sandiford wrote: > This patch adds a helper function for getting the number of > bytes accessed by a scalar data reference, which helps when general > modes have a variable size. > > Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. > OK to install? Can you put it into tree-data-ref.h? >>> >>> The idea (which I forgot to say) was to collect the uses within the >>> vectoriser into one place so that we can assert in only one place >>> that the reference is constant-sized. >>> >>> A general data_reference can be variable-sized, so the guarantees >>> wouldn't be the same elsewhere. >>> >>> Would putting it in tree-vectorizer.h be OK? >> >> Maybe name it vect_get_scalar_dr_size then? >> >> Ok with that. > > As discussed on IRC, I tried also using SCALAR_TYPE_MODE instead of > TYPE_MODE, to go with the new function name, but it turns out that > the mode can be a single-element vector instead of a "true" scalar. > > This version of the patch sticks with vect_get_scalar_dr_size as > the name, but adds a comment to say that "scalar" includes "scalar > equivalent". Since SCALAR_TYPE_MODE is too strong an assertion, > I went with tree_to_uhwi instead. > > As also discussed on IRC, it might be possible in current sources > to use SCALAR_TYPE_MODE (TREE_TYPE (STMT_VINFO_VECTYPE ...)) instead. > But SVE has extending loads and truncating stores, for which the DR size > will be different from the vector element size, so if we switch it now, > we'd have to switch back to the DR size again later. > > (Sorry again for labouring such a simple change!) > > Tested as before. OK to install? Ok. Richard. > Thanks, > Richard > > > 2017-09-22 Richard Sandiford > Alan Hayward > David Sherwood > > gcc/ > * tree-vectorizer.h (vect_get_scalar_dr_size): New function. > * tree-vect-data-refs.c (vect_update_misalignment_for_peel): Use it. > (vect_enhance_data_refs_alignment): Likewise. > > Index: gcc/tree-vectorizer.h > === > --- gcc/tree-vectorizer.h 2017-09-18 16:49:51.902170781 +0100 > +++ gcc/tree-vectorizer.h 2017-09-22 08:58:24.308199570 +0100 > @@ -1095,6 +1095,19 @@ vect_get_num_copies (loop_vec_info loop_ > / TYPE_VECTOR_SUBPARTS (vectype)); > } > > +/* Return the size of the value accessed by unvectorized data reference DR. > + This is only valid once STMT_VINFO_VECTYPE has been calculated for the > + associated gimple statement, since that guarantees that DR accesses > + either a scalar or a scalar equivalent. ("Scalar equivalent" here > + includes things like V1SI, which can be vectorized in the same way > + as a plain SI.) */ > + > +inline unsigned int > +vect_get_scalar_dr_size (struct data_reference *dr) > +{ > + return tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr; > +} > + > /* Source location */ > extern source_location vect_location; > > Index: gcc/tree-vect-data-refs.c > === > --- gcc/tree-vect-data-refs.c 2017-09-18 16:42:00.553203578 +0100 > +++ gcc/tree-vect-data-refs.c 2017-09-22 08:58:24.308199570 +0100 > @@ -955,7 +955,6 @@ vect_compute_data_ref_alignment (struct >return true; > } > > - > /* Function vect_update_misalignment_for_peel. 
> Sets DR's misalignment > - to 0 if it has the same alignment as DR_PEEL, > @@ -975,8 +974,8 @@ vect_update_misalignment_for_peel (struc >unsigned int i; >vec same_aligned_drs; >struct data_reference *current_dr; > - int dr_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr; > - int dr_peel_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF > (dr_peel; > + int dr_size = vect_get_scalar_dr_size (dr); > + int dr_peel_size = vect_get_scalar_dr_size (dr_peel); >stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr)); >stmt_vec_info peel_stmt_info = vinfo_for_stmt (DR_STMT (dr_peel)); > > @@ -1664,8 +1663,7 @@ vect_enhance_data_refs_alignment (loop_v > >vectype = STMT_VINFO_VECTYPE (stmt_info); >nelements = TYPE_VECTOR_SUBPARTS (vectype); > - mis = DR_MISALIGNMENT (dr) / GET_MODE_SIZE (TYPE_MODE ( > -TREE_TYPE (DR_REF (dr; > + mis = DR_MISALIGNMENT (dr) / vect_get_scalar_dr_size (dr); > if (DR_MISALIGNMENT (dr) != 0) > npeel_tmp = (negative ? (mis - nelements) > : (nelements - mis)) & (nelements - 1); > @@ -1937,8 +1935,7 @@ vect_enhance_data_refs_alignment (loop_v > updating DR_MISALIGNMENT values. The peeling factor is the >
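To illustrate the "scalar equivalent" case mentioned above (a hypothetical example, not part of the patch): a data reference can have a single-element vector type, whose TYPE_MODE on targets supporting it is a vector mode such as V1SImode rather than a scalar mode, which is why the helper goes through TYPE_SIZE_UNIT and tree_to_uhwi instead of asserting SCALAR_TYPE_MODE.

/* The loads and stores below access elements of a single-element
   vector type: 4 bytes in size, but not a "true" scalar as far as
   TYPE_MODE is concerned.  vect_get_scalar_dr_size still returns 4
   for such data references.  */
typedef int v1si __attribute__ ((vector_size (sizeof (int))));

v1si dst[64], src[64];

void
copy_v1si (void)
{
  for (int i = 0; i < 64; i++)
    dst[i] = src[i];
}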
Re: [PATCH, rs6000] folding of vector stores in GIMPLE
On Thu, Sep 21, 2017 at 10:56 PM, Will Schmidt wrote: > Hi, > > Folding of vector stores in GIMPLE. > > - Add code to handle gimple folding for the vec_st (vector store) builtins. > - Remove the now obsoleted folding code for vec_st from rs6000-c-c. > > There are two spots that I could use some feedback on. > > First - > An early exit remains in place prevent folding of statements that do not > have a LHS. To allow folding of the stores to get past the check, I have > added a helper function (rs6000_builtin_valid_without_lhs) that allows > those store intrinsics to proceed. I'm not sure the approach (or the name I > chose) > is the best choice, so I'll defer to recommendations on how to improve that. > :-) > > Second - > This code (as-is) is subject to a TBAA related issue (similar to what was > noticed > in the gimple folding code for loads. As-is, with a testcase such as : > > void testst_struct1b (vector double vd1, long long ll1, struct S *p) > { > vec_st (vd1, ll1, (vector double *)p); > } > > will generate gimple that looks like: > MEM[(struct S *)D.3218] = vd1; > > If I rework the code, setting arg2_type to be ptr_type_node, i.e. > + tree arg2_type = TREE_TYPE (arg2); > to: > + tree arg2_type = ptr_type_node; > > the generated gimple then looks like > MEM[(void *)D.3218] = vd1; > > Which is probably OK, but I cannot say for certain. The generated .s content > is at least equivalent. It looks safe. The question I had is whether vec_st (vd1, ll1, (vector double *)p) is equivalent to *(vector double *)((char *)p + ll1) = vd1; from a TBAA perspective. Thus whether the type of the tird argument to vec_st defines the type of the access (in C standards meaning). If so then what we do now is pessimizing (but as you say previously (long long *) and (long *) were aliased together and you got wrong-code with aliasing with regular stores of the "wrong" same type). A proper fix would be to transition this type as seen from the frontend to GIMPLE, for example in a similar way we do for MEM_REFs by piggy-backing that on an extra argument, a constant zero pointer of the alias pointer type to use (which would also serve as a type indicator of the store itself). I'd use a target specific internal function for this (not sure if we can have those target specific, but I guess if it's folded away then that's fine) and get away with the overload set. Richard. > The resulting code is verified by testcases powerpc/fold-vec-st-*.c, which > has been posted separately. > > regtest looks clean on power6 and newer. > > pending feedback, OK for trunk? > > Thanks, > -Will > > [gcc] > > 2017-09-21 Will Schmidt > > * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling > for early folding of vector stores (ALTIVEC_BUILTIN_ST_*). > (rs6000_builtin_valid_without_lhs): helper function. > * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): > Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_ST. > > diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c > index a49db97..4a363a1 100644 > --- a/gcc/config/rs6000/rs6000-c.c > +++ b/gcc/config/rs6000/rs6000-c.c > @@ -6470,82 +6470,10 @@ altivec_resolve_overloaded_builtin (location_t loc, > tree fndecl, > convert (TREE_TYPE (stmt), arg0)); >stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl); >return stmt; > } > > - /* Expand vec_st into an expression that masks the address and > - performs the store. 
We need to expand this early to allow > - the best aliasing, as by the time we get into RTL we no longer > - are able to honor __restrict__, for example. We may want to > - consider this for all memory access built-ins. > - > - When -maltivec=be is specified, or the wrong number of arguments > - is provided, simply punt to existing built-in processing. */ > - > - if (fcode == ALTIVEC_BUILTIN_VEC_ST > - && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG) > - && nargs == 3) > -{ > - tree arg0 = (*arglist)[0]; > - tree arg1 = (*arglist)[1]; > - tree arg2 = (*arglist)[2]; > - > - /* Construct the masked address. Let existing error handling take > -over if we don't have a constant offset. */ > - arg1 = fold (arg1); > - > - if (TREE_CODE (arg1) == INTEGER_CST) > - { > - if (!ptrofftype_p (TREE_TYPE (arg1))) > - arg1 = build1 (NOP_EXPR, sizetype, arg1); > - > - tree arg2_type = TREE_TYPE (arg2); > - if (TREE_CODE (arg2_type) == ARRAY_TYPE && c_dialect_cxx ()) > - { > - /* Force array-to-pointer decay for C++. */ > - arg2 = default_conversion (arg2); > - arg2_type = TREE_TYPE (arg2); > - } > - > - /* Find the built-in to make sure a compatible one exists; if not > -we fall back to default handling to get the error message.
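A rough illustration of the TBAA trade-off being discussed (hypothetical example, not one of the new tests; assumes a powerpc target with the VMX/VSX intrinsics enabled):

/* With a (void *) access type the folded store conflicts with every
   other memory access, so the int load below cannot be reused after
   the vec_st -- safe, but potentially pessimizing.  If the frontend's
   "vector double *" type were carried through to GIMPLE (e.g. via an
   extra alias-pointer argument, as suggested above), alias analysis
   could in principle keep the first load.  */
#include <altivec.h>

int
f (int *k, vector double v, long long off, vector double *p)
{
  int before = *k;
  vec_st (v, off, p);
  return before + *k;
}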
Re: [PATCH GCC] A simple implementation of loop interchange
On Mon, Sep 4, 2017 at 2:54 PM, Richard Biener wrote: > On Wed, Aug 30, 2017 at 6:32 PM, Bin.Cheng wrote: >> On Wed, Aug 30, 2017 at 3:19 PM, Richard Biener >> wrote: >>> On Wed, Aug 30, 2017 at 3:18 PM, Bin Cheng wrote: Hi, This patch implements a simple loop interchange pass in GCC, as described by its comments: +/* This pass performs loop interchange: for example, the loop nest + + for (int j = 0; j < N; j++) + for (int k = 0; k < N; k++) + for (int i = 0; i < N; i++) +c[i][j] = c[i][j] + a[i][k]*b[k][j]; + + is transformed to + + for (int i = 0; i < N; i++) + for (int j = 0; j < N; j++) + for (int k = 0; k < N; k++) +c[i][j] = c[i][j] + a[i][k]*b[k][j]; + + This pass implements loop interchange in the following steps: + + 1) Find perfect loop nest for each innermost loop and compute data + dependence relations for it. For above example, loop nest is + . + 2) From innermost to outermost loop, this pass tries to interchange + each loop pair. For above case, it firstly tries to interchange +and loop nest becomes . + Then it tries to interchange and loop nest becomes + . The overall effect is to move innermost + loop to the outermost position. For loop pair + to be interchanged, we: + 3) Check if data dependence relations are valid for loop interchange. + 4) Check if both loops can be interchanged in terms of transformation. + 5) Check if interchanging the two loops is profitable. + 6) Interchange the two loops by mapping induction variables. + + This pass also handles reductions in loop nest. So far we only support + simple reduction of inner loop and double reduction of the loop nest. */ Actually, this pass only does loop shift which outermosting inner loop to outer, rather than permutation. Also as a traditional loop optimizer, it only works for perfect loop nest. I put it just after loop distribution thus ideally loop split/distribution could create perfect nest for it. Unfortunately, we don't get any perfect nest from distribution for now because it only works for innermost loop. For example, the motivation case in spec2k6/bwaves is not handled on this pass alone. I have a patch extending distribution for (innermost) loop nest and with that patch bwaves case can be handled. Another point is I deliberately make both the cost model and code transformation (very) conservative. We can support more cases, or more transformations with great care when it is for sure known beneficial. IMHO, we already hit over-baked issues quite often and don't want to introduce more. As for code generation, this patch has an issue that invariant code in outer loop could be moved to inner loop. For the moment, we rely on the last lim pass to handle all INV generated during interchange. In the future, we may need to avoid that in interchange itself, or another lim pass just like the one after graphite optimizations. Boostrap and test on x86_64 and AArch64. Various benchmarks built and run successfully. Note this pass is disabled in patch, while the code is exercised by bootstrap/building programs with it enabled by default. Any comments? >>> >> Thanks for quick review. >>> +/* The same as above, but this one is only used for interchanging not >>> + innermost loops. */ >>> +#define OUTER_STRIDE_RATIO (2) >>> >>> please make all these knobs --params. >>> >>> +/* Enum type for loop reduction variable. */ >>> +enum reduction_type >>> +{ >>> + UNKNOWN_RTYPE = 0, >>> + SIMPLE_RTYPE, >>> + DOUBLE_RTYPE >>> +}; >>> >>> seeing this we should have some generic data structure / analysis for >>> reduction detection. 
This adds a third user (autopar and vectorizer >>> are the others). Just an idea. >>> >>> +/* Return true if E is abnormal edge. */ >>> + >>> +static inline bool >>> +abnormal_edge (edge e) >>> +{ >>> + return (e->flags & (EDGE_EH | EDGE_ABNORMAL | EDGE_IRREDUCIBLE_LOOP)); >>> +} >>> >>> bad name/comment for what it does. >>> >>> ... jumping to end of file / start of pass >>> >>> + /* Get the outer loop. */ >>> + loop = superloop_at_depth (loop, loop_depth (loop) - 1); >>> >>> loop_outer (loop)? >>> >>> + /* Only support rectangle loop nest, i.e, inner loop's niters doesn't >>> +depends on outer loop's IV. */ >>> + if (chrec_contains_symbols_defined_in_loop (niters, loop->num)) >>> + break; >>> >>> but you don't check for a three-nest whether niters depends on outer outer >>> loop's IV that way. Either the check is superfluous here or incomplete. >> It is checked
[PATCH try 2] [i386,testsuite,libgcc] Fix build breakage on Mac and test FAILS on Solaris caused by PR82196 patch
On 09/22/2017 03:28 AM, Rainer Orth wrote: > Hi Daniel, > >> On 09/22/2017 02:18 AM, Rainer Orth wrote: >>> Hi Daniel, >>> On 09/21/2017 05:18 PM, Daniel Santos wrote: > So libgcc doesn't use a config.in. :( Scratch that, I forgot that we're using gcc/config.in via auto-host.h. So I only have to add this to gcc/configure.ac and it will be available for my libgcc header -- this is what I used to sniff out support for the .hidden directive. >>> Please don't go that route: it's totally the wrong direction. There's >>> work going on to further decouple libgcc from gcc-private headers and >>> configure results. libgcc already has its own configure tests for >>> assembler features, and its own config.in. What's wrong with adapting >>> libitm's avx test in libitm/acinclude.m4 (LIBITM_CHECK_AS_AVX) for >>> libgcc? Should be trivial... >>> >>> Rainer >>> >> Oops, I just saw your email after submitting my other patch. Yes, I am >> mistaken about config.in, sorry about that. I didn't see a config.h >> file, but examining further it looks like it outputs to auto-target.h. >> Also, I was looking for some HAVE_AS* macros, but they are named >> differently. > Right: though some are for assembler features, the macros are named > differently. > >> I had previously included gcc's auto-host.h since it was in the include >> path in order to use HAVE_AS_HIDDEN, so in order to decouple this I'll > HAVE_GAS_HIDDEN actually ;-) > >> need to add that check into libgcc/configure.ac as well. Again, >> shouldn't be that much code. Sound sane to you? > You could do that, but it was already used before your patches, so > please separate it from the current issue if you go down that route. > libgcc is still full of cleanup possibilities :-) > > Rainer OK, so I'm just adding HAVE_AS_AVX mostly as-is from libitm (we don't have $target_cpu so I'm using $target). I do have minor concerns about how this test will work on a cross-build -- I'm not an autotools expert and I don't understand which assembler it will invoke, but the results of the test failing only means we use .byte instead of the real mnemonic, so it really shouldn't be a problem. I've got tests started again, so presuming that *this* one passes, is it OK for the trunk? gcc/testsuite: * gcc.target/i386/pr82196-1.c: Simplify so that it doesn't break on Solaris or with -mno-omit-frame-pointer. * gcc.target/i386/pr82196-2.c: Likewise. libgcc: * configure.ac: Add check for HAVE_AS_AVX. * config.in: Regenerate. * configure: Likewise. * config/i386/i386-asm.h: Include auto-target.h from libgcc. (SSE_SAVE, SSE_RESTORE): Sniff HAVE_AS_AVX and directly emit raw .byte code when assembler doesn't support avx, correct out-of-date comments. 
gcc/testsuite/gcc.target/i386/pr82196-1.c | 5 ++- gcc/testsuite/gcc.target/i386/pr82196-2.c | 5 ++- libgcc/config.in | 3 ++ libgcc/config/i386/i386-asm.h | 45 ++- libgcc/configure | 39 +++ libgcc/configure.ac | 16 ++ 6 files changed, 100 insertions(+), 13 deletions(-) Thanks, Daniel diff --git a/gcc/testsuite/gcc.target/i386/pr82196-1.c b/gcc/testsuite/gcc.target/i386/pr82196-1.c index ef858328f00..541d975480d 100644 --- a/gcc/testsuite/gcc.target/i386/pr82196-1.c +++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c @@ -1,7 +1,7 @@ /* { dg-do compile { target lp64 } } */ /* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */ -/* { dg-final { scan-assembler "call.*__sse_savms64_18" } } */ -/* { dg-final { scan-assembler "jmp.*__sse_resms64x_18" } } */ +/* { dg-final { scan-assembler "call.*__sse_savms64f?_12" } } */ +/* { dg-final { scan-assembler "jmp.*__sse_resms64f?x_12" } } */ void __attribute__((sysv_abi)) a() { } @@ -9,6 +9,5 @@ void __attribute__((sysv_abi)) a() { static void __attribute__((sysv_abi)) (*volatile a_noinfo)() = a; void __attribute__((ms_abi)) b() { - __asm__ __volatile__ ("" :::"rbx", "rbp", "r12", "r13", "r14", "r15"); a_noinfo (); } diff --git a/gcc/testsuite/gcc.target/i386/pr82196-2.c b/gcc/testsuite/gcc.target/i386/pr82196-2.c index 8fe58411d5e..7166d068bc1 100644 --- a/gcc/testsuite/gcc.target/i386/pr82196-2.c +++ b/gcc/testsuite/gcc.target/i386/pr82196-2.c @@ -1,7 +1,7 @@ /* { dg-do compile { target lp64 } } */ /* { dg-options "-mavx -mcall-ms2sysv-xlogues -O2" } */ -/* { dg-final { scan-assembler "call.*__avx_savms64_18" } } */ -/* { dg-final { scan-assembler "jmp.*__avx_resms64x_18" } } */ +/* { dg-final { scan-assembler "call.*__avx_savms64f?_12" } } */ +/* { dg-final { scan-assembler "jmp.*__avx_resms64f?x_12" } } */ void __attribute__((sysv_abi)) a() { } @@ -9,6 +9,5 @@ void __attribute__((sysv_abi)) a() { static void __attribute__((sysv_abi)) (*volatile a_noinfo)() = a; void __attribute__((ms_abi)) b() { - __asm__ __volat
Re: [PATCH] PR target/80556
> On 18 Sep 2017, at 22:08, Simon Wright wrote: > > On 18 Sep 2017, at 21:09, Iain Sandoe wrote: >> >>> If I propose this alternative patch, should it be a new post, or should I >>> continue this thread? >> >> thanks for the patch. >> >> The basic idea seems sound - as a workaround (as noted in comment #20 in the >> PR, we should really rationalise the libgcc/crts stuff to reflect the modern >> world, but these things take time...). >> >> The patch as you have it would apply to every version of Darwin. >> >> AFAICT from the published sources, i386 Darwin should be able to work with >> the libgcc unwinder (and all earlier Darwin *have* to) - so I’ve proposed a >> modified patch in the PR that makes the changes specific to m64 x86 and >> doesn’t make any alteration for PPC and/or Darwin < 10. > > That sounds like the right thing to do. I hadn't considered the older > hardware/os issues (I only have kit back to macOS 10.11, Darwin 15). So here’s the revised version with the comments slightly updated, checked Darwin10,15,16 x86_64 and i386 in progress, OK if i386 succeeds? Iain Sandoe CodeSourcery / Mentor Embedded / Siemens gcc/ 2017-09-xx Iain Sandoe PR target/80556 * config/i386/darwin.h (REAL_LIB_SPEC): New; put libSystem ahead of libgcc_eh for m64. * config/i386/darwin64.h: Likewise. diff --git a/gcc/config/i386/darwin.h b/gcc/config/i386/darwin.h index fccaf7e..321ed27 100644 --- a/gcc/config/i386/darwin.h +++ b/gcc/config/i386/darwin.h @@ -39,6 +39,32 @@ along with GCC; see the file COPYING3. If not see #endif #endif +/* WORKAROUND pr80556: + For x86_64 Darwin10 and later, the unwinder is in libunwind (redirected + from libSystem). This doesn't use the keymgr (see keymgr.c) and therefore + the calls that libgcc makes to obtain the KEYMGR_GCC3_DW2_OBJ_LIST are not + updated to include new images, and might not even be valid for a single + image. + Therefore, for 64b exes at least, we must use the libunwind implementation, + even when static-libgcc is specified. We put libSystem first so that + unwinder symbols are satisfied from there. */ +#undef REAL_LIBGCC_SPEC +#define REAL_LIBGCC_SPEC \ + "%{static-libgcc|static: \ + %{m64:%:version-compare(>= 10.6 mmacosx-version-min= -lSystem)} \ +-lgcc_eh -lgcc; \ + shared-libgcc|fexceptions|fgnu-runtime: \ + %:version-compare(!> 10.5 mmacosx-version-min= -lgcc_s.10.4) \ + %:version-compare(>< 10.5 10.6 mmacosx-version-min= -lgcc_s.10.5) \ + %:version-compare(!> 10.5 mmacosx-version-min= -lgcc_ext.10.4) \ + %:version-compare(>= 10.5 mmacosx-version-min= -lgcc_ext.10.5) \ + -lgcc ;\ + :%:version-compare(>< 10.3.9 10.5 mmacosx-version-min= -lgcc_s.10.4) \ + %:version-compare(>< 10.5 10.6 mmacosx-version-min= -lgcc_s.10.5) \ + %:version-compare(!> 10.5 mmacosx-version-min= -lgcc_ext.10.4) \ + %:version-compare(>= 10.5 mmacosx-version-min= -lgcc_ext.10.5) \ + -lgcc }" + /* Size of the Obj-C jump buffer. */ #define OBJC_JBLEN ((TARGET_64BIT) ? ((9 * 2) + 3 + 16) : (18)) diff --git a/gcc/config/i386/darwin64.h b/gcc/config/i386/darwin64.h index f2982ed..32cb789 100644 --- a/gcc/config/i386/darwin64.h +++ b/gcc/config/i386/darwin64.h @@ -21,6 +21,32 @@ along with GCC; see the file COPYING3. If not see #undef DARWIN_ARCH_SPEC #define DARWIN_ARCH_SPEC "%{m32:i386;:x86_64}" +/* WORKAROUND pr80556: + For x86_64 Darwin10 and later, the unwinder is in libunwind (redirected + from libSystem). 
This doesn't use the keymgr (see keymgr.c) and therefore + the calls that libgcc makes to obtain the KEYMGR_GCC3_DW2_OBJ_LIST are not + updated to include new images, and might not even be valid for a single + image. + Therefore, for 64b exes at least, we must use the libunwind implementation, + even when static-libgcc is specified. We put libSystem first so that + unwinder symbols are satisfied from there. */ +#undef REAL_LIBGCC_SPEC +#define REAL_LIBGCC_SPEC \ + "%{static-libgcc|static: \ + %{!m32:%:version-compare(>= 10.6 mmacosx-version-min= -lSystem)}\ +-lgcc_eh -lgcc; \ + shared-libgcc|fexceptions|fgnu-runtime: \ + %:version-compare(!> 10.5 mmacosx-version-min= -lgcc_s.10.4) \ + %:version-compare(>< 10.5 10.6 mmacosx-version-min= -lgcc_s.10.5) \ + %:version-compare(!> 10.5 mmacosx-version-min= -lgcc_ext.10.4) \ + %:version-compare(>= 10.5 mmacosx-version-min= -lgcc_ext.10.5) \ + -lgc
Re: [Patch, Fortran] PR 82143: add a -fdefault-real-16 flag
2017-09-22 9:11 GMT+02:00 Janne Blomqvist : > On Fri, Sep 22, 2017 at 8:02 AM, Janus Weil wrote: >> Thanks, Steve. I'm attaching the updated ChangeLog and the two test >> cases for the two new flags. Since this appears to be a somewhat >> controversial feature, I'll wait two more days to allow for others to >> contribute their feedback (positive or negative). I'll commit on >> Sunday if I hear nothing until then. > > Well, if you're actively soliciting bikeshedding, here goes: I'm not soliciting anything. I'm giving people a fair chance to comment. What you make out of that is completely up to you. > Since we're about to have several -fdefault-real-N flags, would it > make sense to instead make a single flag -fdefault-real=SOMEVALUE I don't think that's a good idea, for several reasons: 1) We're probably not going to have much more N values (adding a single one is tough enough of a job apparently, plus the list of supported real kinds is very much limited anyway). 2) The syntax you're proposing is inconsistent with other flags like -fdefault-integer-8, -fdefault-double-8. I'm sure we don't want to change all of them. 3) That kind of syntax is not even used in other flags like -ffree-line-length-n, where the range of possible values is potentially much larger. > And since the standard requires that double precision variables are > twice as big as reals, in the absence of an explicit -fdefault-double= > flag, would it make sense to have -fdefault-real=N imply > -fdefault-double=[2*N or if that isn't supported on the target, the > largest supported real kind]? That's basically the behavior I tried to implement in my current patch (although I notice now that you're not necessarily getting the largest real kind, if 16 is not supported). Cheers, Janus
[PATCH][GRAPHITE] Simplify move_sese_in_condition
This re-implements it avoding the need to recompute dominators and in a much simpler way. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress, SPEC CPU 2006 is happy. Richard. 2017-09-22 Richard Biener * sese.c: Include cfganal.h. (if_region_set_false_region): Remove. (create_if_region_on_edge): Likewise. (move_sese_in_condition): Re-implement without destroying dominators. Index: gcc/sese.c === --- gcc/sese.c (revision 253090) +++ gcc/sese.c (working copy) @@ -40,8 +40,9 @@ along with GCC; see the file COPYING3. #include "cfgloop.h" #include "tree-data-ref.h" #include "tree-scalar-evolution.h" -#include "sese.h" #include "tree-ssa-propagate.h" +#include "cfganal.h" +#include "sese.h" /* For a USE in BB, if BB is outside REGION, mark the USE in the LIVEOUTS set. */ @@ -333,99 +334,6 @@ get_false_edge_from_guard_bb (basic_bloc return NULL; } -/* Sets the false region of an IF_REGION to REGION. */ - -static void -if_region_set_false_region (ifsese if_region, sese_info_p region) -{ - free_dominance_info (CDI_DOMINATORS); - - basic_block condition = if_region_get_condition_block (if_region); - edge false_edge = get_false_edge_from_guard_bb (condition); - basic_block dummy = false_edge->dest; - edge entry_region = region->region.entry; - edge exit_region = region->region.exit; - basic_block before_region = entry_region->src; - basic_block last_in_region = exit_region->src; - hashval_t hash = htab_hash_pointer (exit_region); - loop_exit **slot -= current_loops->exits->find_slot_with_hash (exit_region, hash, NO_INSERT); - bool latch_p -= exit_region->dest->loop_father->latch == exit_region->src; - - entry_region->flags = false_edge->flags; - false_edge->flags = exit_region->flags; - - redirect_edge_pred (entry_region, condition); - redirect_edge_pred (exit_region, before_region); - redirect_edge_pred (false_edge, last_in_region); - redirect_edge_succ (false_edge, single_succ (dummy)); - delete_basic_block (dummy); - - exit_region->flags = EDGE_FALLTHRU; - - region->region.exit = false_edge; - - free (if_region->false_region); - if_region->false_region = region; - - if (slot) -{ - struct loop_exit *loop_exit = ggc_cleared_alloc (); - - memcpy (loop_exit, *((struct loop_exit **) slot), - sizeof (struct loop_exit)); - current_loops->exits->clear_slot (slot); - - hashval_t hash = htab_hash_pointer (false_edge); - slot = current_loops->exits->find_slot_with_hash (false_edge, hash, - INSERT); - loop_exit->e = false_edge; - *slot = loop_exit; - false_edge->src->loop_father->exits->next = loop_exit; -} - if (latch_p) -exit_region->dest->loop_father->latch = before_region; - - calculate_dominance_info (CDI_DOMINATORS); -} - -/* Creates an IFSESE with CONDITION on edge ENTRY. 
*/ - -static ifsese -create_if_region_on_edge (edge entry, tree condition) -{ - edge e; - edge_iterator ei; - sese_info_p sese_region = XNEW (struct sese_info_t); - sese_info_p true_region = XNEW (struct sese_info_t); - sese_info_p false_region = XNEW (struct sese_info_t); - ifsese if_region = XNEW (struct ifsese_s); - edge exit = create_empty_if_region_on_edge (entry, condition); - - if_region->region = sese_region; - if_region->region->region.entry = entry; - if_region->region->region.exit = exit; - - FOR_EACH_EDGE (e, ei, entry->dest->succs) -{ - if (e->flags & EDGE_TRUE_VALUE) - { - true_region->region.entry = e; - true_region->region.exit = single_succ_edge (e->dest); - if_region->true_region = true_region; - } - else if (e->flags & EDGE_FALSE_VALUE) - { - false_region->region.entry = e; - false_region->region.exit = single_succ_edge (e->dest); - if_region->false_region = false_region; - } -} - - return if_region; -} - /* Moves REGION in a condition expression: | if (1) | ; @@ -436,14 +344,32 @@ create_if_region_on_edge (edge entry, tr ifsese move_sese_in_condition (sese_info_p region) { - gcc_assert (! dom_info_available_p (cfun, CDI_POST_DOMINATORS)); + basic_block region_entry_dest = region->region.entry->dest; basic_block pred_block = split_edge (region->region.entry); - ifsese if_region; + basic_block merge_block = split_edge (region->region.exit); - region->region.entry = single_succ_edge (pred_block); - if_region = create_if_region_on_edge (single_pred_edge (pred_block), - integer_one_node); - if_region_set_false_region (if_region, region); + edge true_edge = make_edge (pred_block, merge_block, EDGE_TRUE_VALUE); + edge false_edge = find_edge (pred_block, region_entry_dest); + false_edge->flags &= ~EDGE_FALLTHRU; + false_edge->flags |= EDGE_FALSE_VALUE; + gimple_stmt_iterator gsi =
Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV
Hi Daniel, > On 09/22/2017 02:18 AM, Rainer Orth wrote: >> Hi Daniel, >> >>> On 09/21/2017 05:18 PM, Daniel Santos wrote: So libgcc doesn't use a config.in. :( >>> Scratch that, I forgot that we're using gcc/config.in via auto-host.h. >>> So I only have to add this to gcc/configure.ac and it will be available >>> for my libgcc header -- this is what I used to sniff out support for the >>> .hidden directive. >> Please don't go that route: it's totally the wrong direction. There's >> work going on to further decouple libgcc from gcc-private headers and >> configure results. libgcc already has its own configure tests for >> assembler features, and its own config.in. What's wrong with adapting >> libitm's avx test in libitm/acinclude.m4 (LIBITM_CHECK_AS_AVX) for >> libgcc? Should be trivial... >> >> Rainer >> > > Oops, I just saw your email after submitting my other patch. Yes, I am > mistaken about config.in, sorry about that. I didn't see a config.h > file, but examining further it looks like it outputs to auto-target.h. > Also, I was looking for some HAVE_AS* macros, but they are named > differently. Right: though some are for assembler features, the macros are named differently. > I had previously included gcc's auto-host.h since it was in the include > path in order to use HAVE_AS_HIDDEN, so in order to decouple this I'll HAVE_GAS_HIDDEN actually ;-) > need to add that check into libgcc/configure.ac as well. Again, > shouldn't be that much code. Sound sane to you? You could do that, but it was already used before your patches, so please separate it from the current issue if you go down that route. libgcc is still full of cleanup possibilities :-) Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[PING][PATCH][compare-elim] Merge zero-comparisons with normal ops
Ping. Updated patch posted here: https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00390.html
Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV
On 09/22/2017 02:18 AM, Rainer Orth wrote: > Hi Daniel, > >> On 09/21/2017 05:18 PM, Daniel Santos wrote: >>> So libgcc doesn't use a config.in. :( >> Scratch that, I forgot that we're using gcc/config.in via auto-host.h. >> So I only have to add this to gcc/configure.ac and it will be available >> for my libgcc header -- this is what I used to sniff out support for the >> .hidden directive. > Please don't go that route: it's totally the wrong direction. There's > work going on to further decouple libgcc from gcc-private headers and > configure results. libgcc already has its own configure tests for > assembler features, and its own config.in. What's wrong with adapting > libitm's avx test in libitm/acinclude.m4 (LIBITM_CHECK_AS_AVX) for > libgcc? Should be trivial... > > Rainer > Oops, I just saw your email after submitting my other patch. Yes, I am mistaken about config.in, sorry about that. I didn't see a config.h file, but examining further it looks like it outputs to auto-target.h. Also, I was looking for some HAVE_AS* macros, but they are named differently. I had previously included gcc's auto-host.h since it was in the include path in order to use HAVE_AS_HIDDEN, so in order to decouple this I'll need to add that check into libgcc/configure.ac as well. Again, shouldn't be that much code. Sound sane to you? Thanks, Daniel
[RFC PATCH 4/4] Extra async tests, not for merging
These tests show that changing the system clock has an effect on std::future::wait_until when using std::chrono::system_clock but not when using std::chrono::steady_clock. Unfortunately these tests have a number of downsides: 1. Nothing that is attempting to keep the clock set correctly (ntpd, systemd-timesyncd) can be running at the same time. 2. The test process requires the CAP_SYS_TIME capability. 3. Other processes running concurrently may misbehave when the clock darts back and forth. 4. They are slow to run. As such, I don't think they are suitable for merging. I include them here because I wanted to document how I had tested the changes in the previous commits. --- libstdc++-v3/testsuite/30_threads/async/async.cc | 76 1 file changed, 76 insertions(+) diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc b/libstdc++-v3/testsuite/30_threads/async/async.cc index 59905b4666f..0a3cb28fbda 100644 --- a/libstdc++-v3/testsuite/30_threads/async/async.cc +++ b/libstdc++-v3/testsuite/30_threads/async/async.cc @@ -25,6 +25,7 @@ #include #include +#include using namespace std; @@ -94,11 +95,86 @@ void test03() VERIFY( elapsed < std::chrono::seconds(20) ); } +void perturb_system_clock(const std::chrono::seconds &seconds) +{ + struct timeval tv; + if (gettimeofday(&tv, NULL)) +abort(); + + tv.tv_sec += seconds.count(); + if (settimeofday(&tv, NULL)) +abort(); +} + +void work04() +{ + std::this_thread::sleep_for(std::chrono::seconds(10)); +} + +// Ensure that advancing CLOCK_REALTIME doesn't make any difference +// when we're waiting on std::chrono::steady_clock. +void test04() +{ + auto const start = chrono::steady_clock::now(); + future f1 = async(launch::async, &work04); + + perturb_system_clock(chrono::seconds(20)); + + std::future_status status; + status = f1.wait_for(std::chrono::seconds(4)); + VERIFY( status == std::future_status::timeout ); + + status = f1.wait_until(start + std::chrono::seconds(6)); + VERIFY( status == std::future_status::timeout ); + + status = f1.wait_until(start + std::chrono::seconds(12)); + VERIFY( status == std::future_status::ready ); + + auto const elapsed = chrono::steady_clock::now() - start; + VERIFY( elapsed >= std::chrono::seconds(10) ); + VERIFY( elapsed < std::chrono::seconds(15) ); + + perturb_system_clock(chrono::seconds(-20)); +} + +void work05() +{ + std::this_thread::sleep_for(std::chrono::seconds(5)); + perturb_system_clock(chrono::seconds(60)); + std::this_thread::sleep_for(std::chrono::seconds(5)); +} + +// Ensure that advancing CLOCK_REALTIME does make a difference when +// we're waiting on std::chrono::system_clock. +void test05() +{ + auto const start = chrono::system_clock::now(); + auto const start_steady = chrono::steady_clock::now(); + + future f1 = async(launch::async, &work05); + future_status status; + status = f1.wait_until(start + std::chrono::seconds(60)); + VERIFY( status == std::future_status::timeout ); + + auto const elapsed_steady = chrono::steady_clock::now() - start_steady; + VERIFY( elapsed_steady >= std::chrono::seconds(5) ); + VERIFY( elapsed_steady < std::chrono::seconds(10) ); + + status = f1.wait_until(start + std::chrono::seconds(75)); + VERIFY( status == std::future_status::ready ); + + perturb_system_clock(chrono::seconds(-60)); +} + int main() { test01(); test02(); test03(); test03(); + if (geteuid() == 0) { +test04(); +test05(); + } return 0; } -- 2.11.0
[RFC PATCH 2/4] libstdc++ futex: Support waiting on std::chrono::steady_clock directly
The user-visible effect of this change is for std::future::wait_until to use CLOCK_MONOTONIC when passed a timeout of std::chrono::steady_clock::time_point type. This makes it immune to any changes made to the system clock CLOCK_REALTIME. Add an overload of __atomic_futex_unsigned::_M_load_and_text_until_impl that accepts a std::chrono::steady_clock, and correctly passes this through to __atomic_futex_unsigned_base::_M_futex_wait_until_steady which uses CLOCK_MONOTONIC for the timeout within the futex system call. These functions are mostly just copies of the std::chrono::system_clock versions with small tweaks. Prior to this commit, a std::chrono::steady timeout would be converted via std::chrono::system_clock which risks reducing or increasing the timeout if someone changes CLOCK_REALTIME whilst the wait is happening. (The commit immediately prior to this one increases the window of opportunity for that from a short period during the calculation of a relative timeout, to the entire duration of the wait.) I believe that I've added this functionality in a way that it doesn't break ABI compatibility, but that has made it more verbose and less type safe. I believe that it would be better to maintain the timeout as an instance of the correct clock type all the way down to a single _M_futex_wait_until function with an overload for each clock. The current scheme of separating out the seconds and nanoseconds early risks accidentally calling the wait function for the wrong clock. Unfortunately, doing this would break code that compiled against the old header. --- libstdc++-v3/include/bits/atomic_futex.h | 67 +++- libstdc++-v3/src/c++11/futex.cc | 33 2 files changed, 99 insertions(+), 1 deletion(-) diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h index afcfeb7720d..c2b3df03592 100644 --- a/libstdc++-v3/include/bits/atomic_futex.h +++ b/libstdc++-v3/include/bits/atomic_futex.h @@ -52,11 +52,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #if defined(_GLIBCXX_HAVE_LINUX_FUTEX) && ATOMIC_INT_LOCK_FREE > 1 struct __atomic_futex_unsigned_base { -// Returns false iff a timeout occurred. +// __s and __ns are measured against CLOCK_REALTIME. Returns false +// iff a timeout occurred. bool _M_futex_wait_until(unsigned *__addr, unsigned __val, bool __has_timeout, chrono::seconds __s, chrono::nanoseconds __ns); +// __s and __ns are measured against CLOCK_MONOTONIC. Returns +// false iff a timeout occurred. +bool +_M_futex_wait_until_steady(unsigned *__addr, unsigned __val, bool __has_timeout, + chrono::seconds __s, chrono::nanoseconds __ns); + // This can be executed after the object has been destroyed. static void _M_futex_notify_all(unsigned* __addr); }; @@ -86,6 +93,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // value if equal is false. // The assumed value is the caller's assumption about the current value // when making the call. +// __s and __ns are measured against CLOCK_REALTIME. unsigned _M_load_and_test_until(unsigned __assumed, unsigned __operand, bool __equal, memory_order __mo, bool __has_timeout, @@ -110,6 +118,36 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } } +// If a timeout occurs, returns a current value after the timeout; +// otherwise, returns the operand's value if equal is true or a different +// value if equal is false. +// The assumed value is the caller's assumption about the current value +// when making the call. +// __s and __ns are measured against CLOCK_MONOTONIC. 
+unsigned +_M_load_and_test_until_steady(unsigned __assumed, unsigned __operand, + bool __equal, memory_order __mo, bool __has_timeout, + chrono::seconds __s, chrono::nanoseconds __ns) +{ + for (;;) + { + // Don't bother checking the value again because we expect the caller + // to have done it recently. + // memory_order_relaxed is sufficient because we can rely on just the + // modification order (store_notify uses an atomic RMW operation too), + // and the futex syscalls synchronize between themselves. + _M_data.fetch_or(_Waiter_bit, memory_order_relaxed); + bool __ret = _M_futex_wait_until_steady((unsigned*)(void*)&_M_data, + __assumed | _Waiter_bit, + __has_timeout, __s, __ns); + // Fetch the current value after waiting (clears _Waiter_bit). + __assumed = _M_load(__mo); + if (!__ret || ((__operand == __assumed) == __equal)) + return __assumed; + // TODO adapt wait time + } +} + // Returns the operand's value if equal is true or a different value if // equal is false. // The assumed value is the caller's assumption about the current value @@ -140,6 +178,19 @@ _GL
[RFC PATCH 3/4] libstdc++ atomic_futex: Use std::chrono::steady_clock as reference clock
The user-visible effect of this change is that std::future::wait_for now uses std::chrono::steady_clock to determine the timeout. This makes it immune to changes made to the system clock. It also means that anyone using their own clock types with std::future::wait_until will have the timeout converted to std::chrono::steady_clock rather than std::chrono::system_clock. Now that use of both std::chrono::steady_clock and std::chrono::system_clock are correctly supported for the wait timeout, I believe that std::chrono::steady_clock is a better choice for the reference clock that is used by std::future::wait_for, and all other clocks are converted to, since it is guaranteed to advance steadily. The previous behaviour of converting to std::chrono::system_clock risks timeouts changing dramatically when the system clock is changed. --- libstdc++-v3/include/bits/atomic_futex.h | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h index c2b3df03592..7ba86dfeec7 100644 --- a/libstdc++-v3/include/bits/atomic_futex.h +++ b/libstdc++-v3/include/bits/atomic_futex.h @@ -71,7 +71,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template class __atomic_futex_unsigned : __atomic_futex_unsigned_base { -typedef chrono::system_clock __clock_t; +typedef chrono::steady_clock __clock_t; // This must be lock-free and at offset 0. atomic _M_data; @@ -169,7 +169,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION unsigned _M_load_and_test_until_impl(unsigned __assumed, unsigned __operand, bool __equal, memory_order __mo, - const chrono::time_point<__clock_t, _Dur>& __atime) + const chrono::time_point& __atime) { auto __s = chrono::time_point_cast(__atime); auto __ns = chrono::duration_cast(__atime - __s); @@ -229,7 +229,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_load_when_equal_until(unsigned __val, memory_order __mo, const chrono::time_point<_Clock, _Duration>& __atime) { - // DR 887 - Sync unknown clock to known clock. const typename _Clock::time_point __c_entry = _Clock::now(); const __clock_t::time_point __s_entry = __clock_t::now(); const auto __delta = __atime - __c_entry; @@ -241,7 +240,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template _GLIBCXX_ALWAYS_INLINE bool _M_load_when_equal_until(unsigned __val, memory_order __mo, - const chrono::time_point<__clock_t, _Duration>& __atime) + const chrono::time_point& __atime) { unsigned __i = _M_load(__mo); if ((__i & ~_Waiter_bit) == __val) -- 2.11.0
[RFC PATCH 1/4] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait
The futex system call supports waiting for an absolute time if FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT. Doing so provides two benefits: 1. The call to gettimeofday is not required in order to calculate a relative timeout. 2. If someone changes the system clock during the wait then the futex timeout will correctly expire earlier or later. Currently that only happens if the clock is changed prior to the call to gettimeofday. According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25. There is no attempt to detect the kernel version and fall back to the previous method. --- libstdc++-v3/src/c++11/futex.cc | 21 ++--- 1 file changed, 6 insertions(+), 15 deletions(-) diff --git a/libstdc++-v3/src/c++11/futex.cc b/libstdc++-v3/src/c++11/futex.cc index 64aada06969..217aeefe005 100644 --- a/libstdc++-v3/src/c++11/futex.cc +++ b/libstdc++-v3/src/c++11/futex.cc @@ -35,6 +35,9 @@ // Constants for the wait/wake futex syscall operations const unsigned futex_wait_op = 0; +const unsigned futex_wait_bitset_op = 9; +const unsigned futex_clock_realtime_flag = 256; +const unsigned futex_bitset_match_any = ~0; const unsigned futex_wake_op = 1; namespace std _GLIBCXX_VISIBILITY(default) @@ -58,22 +61,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } else { - struct timeval tv; - gettimeofday (&tv, NULL); - // Convert the absolute timeout value to a relative timeout struct timespec rt; - rt.tv_sec = __s.count() - tv.tv_sec; - rt.tv_nsec = __ns.count() - tv.tv_usec * 1000; - if (rt.tv_nsec < 0) - { - rt.tv_nsec += 10; - --rt.tv_sec; - } - // Did we already time out? - if (rt.tv_sec < 0) - return false; - - if (syscall (SYS_futex, __addr, futex_wait_op, __val, &rt) == -1) + rt.tv_sec = __s.count(); + rt.tv_nsec = __ns.count(); + if (syscall (SYS_futex, __addr, futex_wait_bitset_op | futex_clock_realtime_flag, __val, &rt, nullptr, futex_bitset_match_any) == -1) { _GLIBCXX_DEBUG_ASSERT(errno == EINTR || errno == EAGAIN || errno == ETIMEDOUT); -- 2.11.0
[RFC PATCH 0/4] Make std::future::wait_* use std::chrono::steady_clock when required
This is a first attempt to make std::future::wait_until and std::future::wait_for make correct use of std::chrono::steady_clock/CLOCK_MONOTONIC. It also makes std::future::wait_until react to changes to CLOCK_REALTIME during the wait, but only when passed a std::chrono::system_clock time point. Prior to these patches, the situation is similar to that described at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41861 for std::condition_variable, but with the benefit that this time the bug can be fixed entirely within libstdc++. (I've omitted the ChangeLog entries for now, since I don't believe that this is ready to be accepted in its current form.) Mike. Mike Crowe (4): libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait libstdc++ futex: Support waiting on std::chrono::steady_clock directly libstdc++ atomic_futex: Use std::chrono::steady_clock as reference clock Extra async tests, not for merging libstdc++-v3/include/bits/atomic_futex.h | 74 +-- libstdc++-v3/src/c++11/futex.cc | 48 +++ libstdc++-v3/testsuite/30_threads/async/async.cc | 76 3 files changed, 181 insertions(+), 17 deletions(-) -- 2.11.0
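From a user's point of view, the behaviour the series aims for can be summarised with a small example (a usage sketch, not one of the tests added in patch 4):

#include <chrono>
#include <future>

int main()
{
  std::promise<int> p;
  std::future<int> f = p.get_future();

  // Relative timeout: should expire roughly 100ms from now as measured by
  // steady_clock, regardless of any change to the system clock.
  f.wait_for(std::chrono::milliseconds(100));

  // Absolute timeouts: both clocks are meant to be handled correctly, and a
  // system_clock deadline should react to wall-clock changes during the wait.
  f.wait_until(std::chrono::steady_clock::now() + std::chrono::milliseconds(100));
  f.wait_until(std::chrono::system_clock::now() + std::chrono::milliseconds(100));
}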
Re: Add a vect_get_dr_size helper function
Richard Biener writes: > On Thu, Sep 14, 2017 at 4:05 PM, Richard Sandiford > wrote: >> Richard Biener writes: >>> On Thu, Sep 14, 2017 at 1:23 PM, Richard Sandiford >>> wrote: This patch adds a helper function for getting the number of bytes accessed by a scalar data reference, which helps when general modes have a variable size. Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? >>> >>> Can you put it into tree-data-ref.h? >> >> The idea (which I forgot to say) was to collect the uses within the >> vectoriser into one place so that we can assert in only one place >> that the reference is constant-sized. >> >> A general data_reference can be variable-sized, so the guarantees >> wouldn't be the same elsewhere. >> >> Would putting it in tree-vectorizer.h be OK? > > Maybe name it vect_get_scalar_dr_size then? > > Ok with that. As discussed on IRC, I tried also using SCALAR_TYPE_MODE instead of TYPE_MODE, to go with the new function name, but it turns out that the mode can be a single-element vector instead of a "true" scalar. This version of the patch sticks with vect_get_scalar_dr_size as the name, but adds a comment to say that "scalar" includes "scalar equivalent". Since SCALAR_TYPE_MODE is too strong an assertion, I went with tree_to_uhwi instead. As also discussed on IRC, it might be possible in current sources to use SCALAR_TYPE_MODE (TREE_TYPE (STMT_VINFO_VECTYPE ...)) instead. But SVE has extending loads and truncating stores, for which the DR size will be different from the vector element size, so if we switch it now, we'd have to switch back to the DR size again later. (Sorry again for labouring such a simple change!) Tested as before. OK to install? Thanks, Richard 2017-09-22 Richard Sandiford Alan Hayward David Sherwood gcc/ * tree-vectorizer.h (vect_get_scalar_dr_size): New function. * tree-vect-data-refs.c (vect_update_misalignment_for_peel): Use it. (vect_enhance_data_refs_alignment): Likewise. Index: gcc/tree-vectorizer.h === --- gcc/tree-vectorizer.h 2017-09-18 16:49:51.902170781 +0100 +++ gcc/tree-vectorizer.h 2017-09-22 08:58:24.308199570 +0100 @@ -1095,6 +1095,19 @@ vect_get_num_copies (loop_vec_info loop_ / TYPE_VECTOR_SUBPARTS (vectype)); } +/* Return the size of the value accessed by unvectorized data reference DR. + This is only valid once STMT_VINFO_VECTYPE has been calculated for the + associated gimple statement, since that guarantees that DR accesses + either a scalar or a scalar equivalent. ("Scalar equivalent" here + includes things like V1SI, which can be vectorized in the same way + as a plain SI.) */ + +inline unsigned int +vect_get_scalar_dr_size (struct data_reference *dr) +{ + return tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr; +} + /* Source location */ extern source_location vect_location; Index: gcc/tree-vect-data-refs.c === --- gcc/tree-vect-data-refs.c 2017-09-18 16:42:00.553203578 +0100 +++ gcc/tree-vect-data-refs.c 2017-09-22 08:58:24.308199570 +0100 @@ -955,7 +955,6 @@ vect_compute_data_ref_alignment (struct return true; } - /* Function vect_update_misalignment_for_peel. 
Sets DR's misalignment - to 0 if it has the same alignment as DR_PEEL, @@ -975,8 +974,8 @@ vect_update_misalignment_for_peel (struc unsigned int i; vec same_aligned_drs; struct data_reference *current_dr; - int dr_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr)))); - int dr_peel_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr_peel)))); + int dr_size = vect_get_scalar_dr_size (dr); + int dr_peel_size = vect_get_scalar_dr_size (dr_peel); stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr)); stmt_vec_info peel_stmt_info = vinfo_for_stmt (DR_STMT (dr_peel)); @@ -1664,8 +1663,7 @@ vect_enhance_data_refs_alignment (loop_v vectype = STMT_VINFO_VECTYPE (stmt_info); nelements = TYPE_VECTOR_SUBPARTS (vectype); - mis = DR_MISALIGNMENT (dr) / GET_MODE_SIZE (TYPE_MODE ( -TREE_TYPE (DR_REF (dr)))); + mis = DR_MISALIGNMENT (dr) / vect_get_scalar_dr_size (dr); if (DR_MISALIGNMENT (dr) != 0) npeel_tmp = (negative ? (mis - nelements) : (nelements - mis)) & (nelements - 1); @@ -1937,8 +1935,7 @@ vect_enhance_data_refs_alignment (loop_v updating DR_MISALIGNMENT values. The peeling factor is the vectorization factor minus the misalignment as an element count. */ - mis = DR_MISALIGNMENT (dr0); - mis /= GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr0)))); + mis = DR_MISALIGNMENT (dr0) / vect_get_scalar_dr_size (dr0);
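The "scalar equivalent" wording can be illustrated outside the vectorizer with GCC's vector extension: a one-element vector such as V1SI occupies exactly the size of its element, which is why a byte size taken from TYPE_SIZE_UNIT is valid even when the mode is technically a (single-element) vector mode. This snippet is an illustration only, not part of the patch:

// A single-element "vector" of int: same storage as a plain int.
typedef int v1si __attribute__((vector_size(sizeof(int))));

static_assert(sizeof(v1si) == sizeof(int),
              "a one-element vector is the same size as its element");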
Re: [Patch, fortran] Bug fixes, including regressions, for the associate construct
Dear All, I seem to have messed up associate_31.f90 totally. Not only is it not doing the intended test but it is failing at all levels of optimization. I will remove it first thing tomorrow. Strangely, it did not fail when I regtested before committing. Sorry about this. Paul On 21 September 2017 at 19:41, Paul Richard Thomas wrote: > Dear Jerry, > > Thanks! Committed as revision 253077. > > Cheers > > Paul > > On 21 September 2017 at 01:52, Jerry DeLisle wrote: >> On 09/20/2017 09:45 AM, Paul Richard Thomas wrote: >>> In the last update to the Parameterized Derived Types implementation, >>> I fixed PR60483 as a side effect. I then checked all the ASSOCIATE >>> bugs and noted that I was responsible for a number of regressions due >>> to a patch that I applied last year. I determined to fix them and >>> found that I couldn't stop. >>> >>> The PRs in the ChangeLogs are the low(ish) lying fruit and the changes >>> are fairly obvious or are described in the ChangeLog or the comments. >>> >>> Bootstrapped and regtested on FC23/x86_64 - OK for trunk and later on >>> 7-branch? >> >> Yes OK and thanks for the fixes. >> >> >> Cheers, >> >> Jerry >> > > > > -- > "If you can't explain it simply, you don't understand it well enough" > - Albert Einstein -- "If you can't explain it simply, you don't understand it well enough" - Albert Einstein
[PATCH] [i386, testsuite, libgcc] Fix build breakage on Mac and test FAILS on Solaris.
I've bootstrapped and reg-tested {,-m64} with this on 86_64-pc-linux-gnu, but I'm waiting for a reference test set to finish to compare them. I've verified that we're getting HAVE_AS_IX86_AVX in auto-host.h and I've also built and tested w/o to double-verify that the AVX code is correct. Now that I think of it, I didn't run the tests with -mno-omit-frame-pointer (to simulate Solaris), but since the tests pass as-is, I presume that it understands the 'f?' part of my regex. (A separate set of stubs are used when rbp is the frame pointer and these have 'f' appended to their names.) OK to commit once I get a clean compare? gcc: configure.ac: Add Check for HAVE_AS_IX86_AVX config.in: Regenerate. configure: Likewise. gcc/testsuite: gcc.target/i386/pr82196-1.c: Simplify so that it doesn't break on Solaris or with -mno-omit-frame-pointer. gcc.target/i386/pr82196-2.c: Likewise. libgcc: config/i386/i386-asm.h (SSE_SAVE, SSE_RESTORE): Sniff HAVE_AS_IX86_AVX and directly emit raw .byte code when assembler doesn't support avx, correct out-of-date comments. Thanks, Daniel Signed-off-by: Daniel Santos --- gcc/config.in | 6 + gcc/configure | 32 ++ gcc/configure.ac | 6 + gcc/testsuite/gcc.target/i386/pr82196-1.c | 5 ++-- gcc/testsuite/gcc.target/i386/pr82196-2.c | 5 ++-- libgcc/config/i386/i386-asm.h | 44 ++- 6 files changed, 85 insertions(+), 13 deletions(-) diff --git a/gcc/config.in b/gcc/config.in index 89d7108e8db..df2e518baa6 100644 --- a/gcc/config.in +++ b/gcc/config.in @@ -406,6 +406,12 @@ #endif +/* Define if your assembler supports avx extensions. */ +#ifndef USED_FOR_TARGET +#undef HAVE_AS_IX86_AVX +#endif + + /* Define if your assembler supports the Sun syntax for cmov. */ #ifndef USED_FOR_TARGET #undef HAVE_AS_IX86_CMOV_SUN_SYNTAX diff --git a/gcc/configure b/gcc/configure index 13f97cd3663..e982b86c25c 100755 --- a/gcc/configure +++ b/gcc/configure @@ -25881,6 +25881,38 @@ if test $gcc_cv_as_ix86_swap = yes; then $as_echo "#define HAVE_AS_IX86_SWAP 1" >>confdefs.h +fi + + +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for avx extensions" >&5 +$as_echo_n "checking assembler for avx extensions... " >&6; } +if test "${gcc_cv_as_ix86_avx+set}" = set; then : + $as_echo_n "(cached) " >&6 +else + gcc_cv_as_ix86_avx=no + if test x$gcc_cv_as != x; then +$as_echo 'vzeroupper' > conftest.s +if { ac_try='$gcc_cv_as $gcc_cv_as_flags -o conftest.o conftest.s >&5' + { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5 + (eval $ac_try) 2>&5 + ac_status=$? + $as_echo "$as_me:${as_lineno-$LINENO}: \$? 
= $ac_status" >&5 + test $ac_status = 0; }; } +then + gcc_cv_as_ix86_avx=yes +else + echo "configure: failed program was" >&5 + cat conftest.s >&5 +fi +rm -f conftest.o conftest.s + fi +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_ix86_avx" >&5 +$as_echo "$gcc_cv_as_ix86_avx" >&6; } +if test $gcc_cv_as_ix86_avx = yes; then + +$as_echo "#define HAVE_AS_IX86_AVX 1" >>confdefs.h + fi diff --git a/gcc/configure.ac b/gcc/configure.ac index 82711389281..a05f2ca10b2 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -4171,6 +4171,12 @@ foo: nop [AC_DEFINE(HAVE_AS_IX86_SWAP, 1, [Define if your assembler supports the swap suffix.])]) +gcc_GAS_CHECK_FEATURE([avx extensions], + gcc_cv_as_ix86_avx,,, + [vzeroupper],, + [AC_DEFINE(HAVE_AS_IX86_AVX, 1, +[Define if your assembler supports avx extensions.])]) + gcc_GAS_CHECK_FEATURE([different section symbol subtraction], gcc_cv_as_ix86_diff_sect_delta,,, [.section .rodata diff --git a/gcc/testsuite/gcc.target/i386/pr82196-1.c b/gcc/testsuite/gcc.target/i386/pr82196-1.c index ef858328f00..541d975480d 100644 --- a/gcc/testsuite/gcc.target/i386/pr82196-1.c +++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c @@ -1,7 +1,7 @@ /* { dg-do compile { target lp64 } } */ /* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */ -/* { dg-final { scan-assembler "call.*__sse_savms64_18" } } */ -/* { dg-final { scan-assembler "jmp.*__sse_resms64x_18" } } */ +/* { dg-final { scan-assembler "call.*__sse_savms64f?_12" } } */ +/* { dg-final { scan-assembler "jmp.*__sse_resms64f?x_12" } } */ void __attribute__((sysv_abi)) a() { } @@ -9,6 +9,5 @@ void __attribute__((sysv_abi)) a() { static void __attribute__((sysv_abi)) (*volatile a_noinfo)() = a; void __attribute__((ms_abi)) b() { - __asm__ __volatile__ ("" :::"rbx", "rbp", "r12", "r13", "r14", "r15"); a_noinfo (); } diff --git a/gcc/testsuite/gcc.target/i386/pr82196-2.c b/gcc/testsuite/gcc.target/i386/pr82196-2.c index 8fe58411d5e..7166d068bc1 100644 --- a/gcc/testsuite/gcc.target/i386/pr82196-2.c +++ b/gcc/testsuite/gcc.t
Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV
Hi Daniel, > On 09/21/2017 05:18 PM, Daniel Santos wrote: >> So libgcc doesn't use a config.in. :( > > Scratch that, I forgot that we're using gcc/config.in via auto-host.h. > So I only have to add this to gcc/configure.ac and it will be available > for my libgcc header -- this is what I used to sniff out support for the > .hidden directive. Please don't go that route: it's totally the wrong direction. There's work going on to further decouple libgcc from gcc-private headers and configure results. libgcc already has its own configure tests for assembler features, and its own config.in. What's wrong with adapting libitm's avx test in libitm/acinclude.m4 (LIBITM_CHECK_AS_AVX) for libgcc? Should be trivial... Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: Backtrace library [3/3]
On Thu, 21 Sep 2017, Matthias Klose wrote: > On 21.09.2017 17:50, Ian Lance Taylor via gcc-patches wrote: > > On Thu, Sep 21, 2017 at 4:52 AM, Thomas Schwinge > > wrote: > >> > >> On Fri, 14 Sep 2012 14:20:08 -0700, Ian Lance Taylor > >> wrote: > [libbacktrace] > >> > >>> I won't commit for a day or two in case anybody sees a > >>> problem in the implementation. > >> > >> Eventually got committed, trunk r191397. > >> > >> > >> I just happened to notice that contrib/gcc_update never got updated for > >> libbacktrace. OK to commit the following? > >> > >> commit 5f59657d84228734944dc6216e947f2ee627fcf7 > >> Author: Thomas Schwinge > >> Date: Thu Sep 21 13:49:47 2017 +0200 > >> > >> Handle libbacktrace in contrib/gcc_update > >> > >> contrib/ > >> * gcc_update (files_and_dependencies): Handle libbacktrace. > > > > Fine by me. Thanks. > > ok to backport to the active branches? This one specifically? Yes. Richard.
Re: [Patch, Fortran] PR 82143: add a -fdefault-real-16 flag
On Fri, Sep 22, 2017 at 8:02 AM, Janus Weil wrote: > Thanks, Steve. I'm attaching the updated ChangeLog and the two test > cases for the two new flags. Since this appears to be a somewhat > controversial feature, I'll wait two more days to allow for others to > contribute their feedback (positive or negative). I'll commit on > Sunday if I hear nothing until then. Well, if you're actively soliciting bikeshedding, here goes: Since we're about to have several -fdefault-real-N flags, would it make sense to instead make a single flag -fdefault-real=SOMEVALUE (and for backwards compatibility, make -fdefault-real-8 an alias for -fdefault-real=8)? And similarly for -fdefault-double=. And since the standard requires that double precision variables are twice as big as reals, in the absence of an explicit -fdefault-double= flag, would it make sense to have -fdefault-real=N imply -fdefault-double=[2*N or if that isn't supported on the target, the largest supported real kind]? (This is sort-of an extension of the current -fdefault-real-8 behavior) -- Janne Blomqvist
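To make the proposed defaulting rule concrete, the derivation could look roughly like the following hypothetical sketch (made-up helper name, kinds list supplied by the caller): pick 2*N if the target supports it, otherwise fall back to the largest supported real kind.

#include <algorithm>
#include <vector>

// Hypothetical sketch of the proposal: derive the default DOUBLE PRECISION
// kind from -fdefault-real=N when no explicit -fdefault-double= is given.
int derive_default_double_kind(int default_real_kind,
                               const std::vector<int>& supported_real_kinds)
{
  const int wanted = 2 * default_real_kind;  // double precision is twice real
  if (std::find(supported_real_kinds.begin(), supported_real_kinds.end(), wanted)
      != supported_real_kinds.end())
    return wanted;
  // Otherwise use the largest real kind the target provides.
  return *std::max_element(supported_real_kinds.begin(),
                           supported_real_kinds.end());
}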