Tweak partitioning for less streaming
Hi, this is something we noticed during experiments with Martin Liska. The streaming works a lot better if partitioning is based on original source order instead of reverse postorder, and both orders seem to work similarly badly code-quality-wise. This reduces the streaming needed by Firefox to 1/2. Bootstrapped/regtested ppc64-linux, committed. * lto-partition.c (lto_balanced_map): Always base order on source file order. Index: lto-partition.c === --- lto-partition.c (revision 202000) +++ lto-partition.c (working copy) @@ -449,11 +449,9 @@ lto_balanced_map (void) { int n_nodes = 0; int n_varpool_nodes = 0, varpool_pos = 0, best_varpool_pos = 0; - struct cgraph_node **postorder = -XCNEWVEC (struct cgraph_node *, cgraph_n_nodes); struct cgraph_node **order = XNEWVEC (struct cgraph_node *, cgraph_max_uid); struct varpool_node **varpool_order = NULL; - int i, postorder_len; + int i; struct cgraph_node *node; int total_size = 0, best_total_size = 0; int partition_size; @@ -468,24 +466,20 @@ lto_balanced_map (void) FOR_EACH_VARIABLE (vnode) gcc_assert (!vnode->symbol.aux); - /* Until we have better ordering facility, use toplogical order. - Include only nodes we will partition and compute estimate of program - size. Note that since nodes that are not partitioned might be put into - multiple partitions, this is just an estimate of real size. This is why - we keep partition_size updated after every partition is finalized. */ - postorder_len = ipa_reverse_postorder (postorder); - for (i = 0; i < postorder_len; i++) -{ - node = postorder[i]; - if (get_symbol_class ((symtab_node) node) == SYMBOL_PARTITION) - { - order[n_nodes++] = node; - total_size += inline_summary (node)->size; - } -} - free (postorder); + FOR_EACH_DEFINED_FUNCTION (node) +if (get_symbol_class ((symtab_node) node) == SYMBOL_PARTITION) + { + order[n_nodes++] = node; + total_size += inline_summary (node)->size; + } + /* Streaming works best when the source units do not cross partition + boundaries much. 
This is because importing a function from a source + unit tends to import a lot of global trees defined there. We should + get better about minimizing the function boundary, but until then + things work more smoothly if we order in source order. */ + qsort (order, n_nodes, sizeof (struct cgraph_node *), node_cmp); if (!flag_toplevel_reorder) { qsort (order, n_nodes, sizeof (struct cgraph_node *), node_cmp);
Re: [PATCH][2/n] 2nd try: Re-organize -fvect-cost-model, enable basic vectorization at -O2
Richard, I have some comments about the patch. > -ftree-vectorizer-verbose=This switch is deprecated. Use > -fopt-info instead. > > ftree-slp-vectorize > ! Common Report Var(flag_tree_slp_vectorize) Optimization > Enable basic block vectorization (SLP) on trees The code dealing with the interactions between -ftree-vectorize, O3, etc. is complicated and hard to understand. Is it better to change the meaning of -ftree-vectorize to mean -floop-vectorize only, and make it independent of -fslp-vectorize? > > + fvect-cost-model= > + Common Joined RejectNegative Enum(vect_cost_model) > Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT) > + Specifies the cost model for vectorization > + > + Enum > + Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown > vectorizer cost model %qs) > + > + EnumValue > + Enum(vect_cost_model) String(unlimited) Value(VECT_COST_MODEL_UNLIMITED) > + > + EnumValue > + Enum(vect_cost_model) String(dynamic) Value(VECT_COST_MODEL_DYNAMIC) > + > + EnumValue > + Enum(vect_cost_model) String(cheap) Value(VECT_COST_MODEL_CHEAP) Introducing the cheap model is a great change. > + > *** 173,179 > { > struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); > > ! if ((unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) == 0) > return false; > > if (dump_enabled_p ()) > --- 173,180 > { > struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); > > ! if (loop_vinfo->cost_model == VECT_COST_MODEL_CHEAP > ! || (unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) > == 0) > return false; > When the cost_model == cheap, the alignment peeling should also be disabled -- there will still be loops that are beneficial to vectorize without peeling -- at perhaps reduced net runtime gain. > struct gimple_opt_pass pass_slp_vectorize = > --- 206,220 > static bool > gate_vect_slp (void) > { > ! /* Apply SLP either according to whether the user specified whether to > ! run SLP or not, or according to whether the user specified whether > ! 
to do vectorization or not. */ > ! if (global_options_set.x_flag_tree_slp_vectorize) > ! return flag_tree_slp_vectorize != 0; > ! if (global_options_set.x_flag_tree_vectorize) > ! return flag_tree_vectorize != 0; > ! /* And if vectorization was enabled by default run SLP only at -O3. */ > ! return flag_tree_vectorize != 0 && optimize == 3; > } The logic can be greatly simplified if the SLP vectorizer is controlled independently -- easier for users to understand too. > ! @item -fvect-cost-model=@var{model} > @opindex fvect-cost-model > ! Alter the cost model used for vectorization. The @var{model} argument > ! should be one of @code{unlimited}, @code{dynamic} or @code{cheap}. > ! With the @code{unlimited} model the vectorized code-path is assumed > ! to be profitable while with the @code{dynamic} model a runtime check > ! will guard the vectorized code-path to enable it only for iteration > ! counts that will likely execute faster than when executing the original > ! scalar loop. The @code{cheap} model will disable vectorization of > ! loops where doing so would be cost prohibitive for example due to > ! required runtime checks for data dependence or alignment but otherwise > ! is equal to the @code{dynamic} model. > ! The default cost model depends on other optimization flags and is > ! either @code{dynamic} or @code{cheap}. > The vectorizer in theory will only vectorize a loop with a net runtime gain, so the 'cost' here should only mean code size and compile time. Cheap model: with this model, the compiler will vectorize loops that are considered beneficial for runtime performance with minimal code size increase and compile time cost. Unlimited model: the compiler will vectorize loops to maximize runtime gain without considering compile time cost or impact on code size. thanks, David
COMDAT missing profile handling (was Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition)
On Wed, Aug 21, 2013 at 8:25 AM, Jan Hubicka wrote: >> > >> > Because an offline COMDAT function will be produced for every COMDAT used, I >> > think >> > it is bad to produce any COMDAT (or any function reachable via calls with >> > non-0 >> > count) that has an empty profile (either because it got lost by COMDAT merging >> > or because of a reading mismatch). >> >> The approach this patch takes is to simply treat those functions the >> same as we would if we didn't feed back profile data in the first >> place, by using the frequencies. This is sufficient except when one is >> inlined, which is why I have the special handling in the inliner >> itself. > > Yes, my original plan was to have a per-function profile_status that > specifies whether the profile is read, guessed or absent and make function-local > decisions sanely with each setting. > > Here we read the function, we set profile to READ (with all counts being 0). > We should drop it to GUESSED when we see that there are non-0 count edges > calling the function in question, and probably we should see if it is obviously > hot (i.e. reachable by a hot call) and promote its function profile to HOT > then to get code placement less bad... >> > >> > Since new direct calls can be discovered later, inline may want to do that >> > again each time it inlines a non-0 count call of a COMDAT with 0 count... >> >> How about an approach like this: >> - Invoke init_and_estimate_bb_frequencies as I am doing to guess the >> profiles at profile read time for functions with 0 counts. > > I see, here we are out of sync. > We always used to go with estimated frequencies for functions with 0 counts, > but it seems that this code broke when prediction was moved before profiling. > (we also should keep edge probabilities from predict.c in that case) > > The estimated profile is already there before reading the profile in, so we > only do not want to overwrite it. Does the following work for you? It does get the estimated frequencies on the bbs. 
> > Index: tree-profile.c > === > --- tree-profile.c (revision 201838) > +++ tree-profile.c (working copy) > @@ -604,6 +604,34 @@ > >pop_cfun (); > } > + /* See if 0 count function has non-0 count callers. In this case we > + lost some profile. Drop its function profile to PROFILE_GUESSED. */ > + FOR_EACH_DEFINED_FUNCTION (node) > +{ > + struct cgraph_edge *e; > + bool called = false; > + if (node->count) > + continue; > + for (e = node->callers; e; e = e->next_caller) > + { > + if (e->count) > + called = true; > + if (cgraph_maybe_hot_edge_p (e)) > + break; > + } > + if (e || called > + && profile_status_for_function > + (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ) > + { > + if (dump_file) > + fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on > calls.\n", > +cgraph_node_name (node), node->symbol.order, > +e ? "function is hot" : "function is normal"); > + profile_status_for_function (DECL_STRUCT_FUNCTION > (node->symbol.decl)) > + = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT); > + node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL; > + } > +} > >del_node_map(); >return 0; > Index: predict.c > === > --- predict.c (revision 201838) > +++ predict.c (working copy) > @@ -2715,6 +2715,9 @@ >gcov_type count_max, true_count_max = 0; >basic_block bb; > > + if (!ENTRY_BLOCK_PTR->count) > +return 0; > + >FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb) > true_count_max = MAX (bb->count, true_count_max); > > >> - At inline time, invoke some kind of freqs_to_counts routine for any >> 0-count routine that is reached by non-zero call edges. It would take > > We should not need that since frequencies ought to be there. > >> the sum of all incoming call edge counts and synthesize counts for the >> bbs using the guessed profile frequencies applied earlier by >> init_and_estimate_bb_frequencies. Then the inliner can do its normal >> bb count scaling. > > Yes, i guess we should go this way. 
Still we will need to watch overly large > values. > Recursive inlining can probably easily produce quite a lot of nonsense here. > > We will also need to solve the problem that in this case cgraph edges will have 0 > profile. > We probably want to play the game there and just do the scaling for edge > count, > since IPA passes probably do not want to care about partial profiles. The problem I am having is that my new freqs_to_counts routine takes the sum of the incoming edge counts and computes the bb counts by scaling by the estimated bb frequency. But in the case where the caller was also missing profile data the edge count will have 0 profile
Re: [PATCH, i386] PR 57927: -march=core-avx2 different than -march=native on INTEL Haswell (i7-4700K)
On Tue, 27 Aug 2013 19:41:09 +0200 Uros Bizjak wrote: > On Tue, Aug 27, 2013 at 7:36 PM, H.J. Lu wrote: > > >>> As reported in [1] the host processor detection has not yet been updated > >>> to recognize Intel Ivy Bridge and Haswell processors. > >>> This small patch adds the detection of these processors and assumes > >>> core-avx2 as march for unknown processors of the PENTIUMPRO family that > >>> support AVX2. > >> > >> I have committed a slightly improved (attached) patch that uses > >> core-avx-i for IvyBridge and adds another IvyBridge model number. > >> While there, I also reordered a bunch of statements. > >> > >> Thanks, > >> Uros. > > > > Page C-3 in the Intel optimization guide shows: > > > > 06_3CH, 06_45H, 06_46H Intel microarchitecture Haswell > > 06_3AH, 06_3EH Intel microarchitecture Ivy Bridge > > 06_2AH, 06_2DH Intel microarchitecture Sandy Bridge > > 06_25H, 06_2CH, 06_2FH Intel microarchitecture Westmere > > 06_1AH, 06_1EH, 06_1FH, Intel microarchitecture Nehalem > > 06_2EH > > 06_17H, 06_1DH Enhanced Intel Core microarchitecture > > 06_0FH Intel Core microarchitecture > > > > At least, we should add 0x45 and 0x46 to Haswell. > > OK, the patch is pre-approved. Can these changes be backported to 4.8, and the Ivy Bridge parts to 4.7 as well? We've had a couple of reports of bad -march=native results for Haswell on 4.7. I can file PRs if you're interested. -- Ryan Hill (gcc-porting/toolchain/wxwidgets @ gentoo.org)
Re: v3 of GDB hooks for debugging GCC
On Tue, 2013-08-27 at 09:58 -0600, Jeff Law wrote: > On 08/26/2013 12:42 PM, David Malcolm wrote: > > The patch also adds me as a maintainer of gdbhooks.py in the MAINTAINERS > > file. (There doesn't seem to be any sort order to the maintainer part > > of that file, should there be?) > > > > Finally, I added a copyright header to the new file ("part of GCC", FSF > > assignee, GPLv3 or later). > > > > OK for trunk? > Yes. This is good. Thanks; committed to trunk as r202040.
Re: [PATCH 3/3] Support dumping type bindings in lambda diagnostics.
On 08/27/2013 02:46 PM, Adam Butcher wrote: Okay. As it stands, it means that you get an additional 'const' in diagnostics for lambdas not declared 'mutable'. Hmm, I guess it would be preferable to use 'mutable' or nothing when printing the lambda, just like when declaring one. Jason
Re: RFA: Consider int and same-size short as equivalent vector components
Hi, On 08/28/2013 12:29 AM, Joern Rennecke wrote: * c-common.h: (same_scalar_type_ignoring_signedness): Delete prototype. (vector_types_compatible_elements_p): Prototype. Sorry for nitpicking, but since we are now using C++ I think declaration is more correct than prototype. Paolo.
Re: RFA: Consider int and same-size short as equivalent vector components
Quoting Marc Glisse : On Mon, 26 Aug 2013, Joern Rennecke wrote: Quoting Marc Glisse : The issue seems larger than just short/int. On x86, (l I don't understand what you mean here. Could you post the actual code sample? typedef long vl __attribute__((vector_size(16))); void f(vl l){ (l Checking what tree.h has to say about the meaning of TYPE_VECTOR_OPAQUE, I found it says: /* Nonzero in a VECTOR_TYPE if the frontends should not emit warnings about missing conversions to other vector types of the same size. */ So, presumably, the front ends should accept any combination of an opaque vector with another vector (opaque or not) if the operation can be performed. I don't want to go into all the ways a conversion might be used to make an operation work at this time, but at least we should accept combinations of vectors that are essentially the same. I found that for our test cases, this not only requires replacing the two extant uses of same_scalar_type_ignoring_signedness, but also replacing/augmenting three equality comparisons. For the C frontend, these were two simple equality comparisons, which could be replaced straightforwardly; for the C++ frontend, I'm not sure if the same_type_ignoring_top_level_qualifiers_p check becomes redundant, and, if it does, whether it will stay redundant in the future. The safe option seemed to be to have both tests. I added the C++ front end to the CC list in case they have some insight / opinion to offer on this. I have successfully bootstrapped / regtested the attached patch on i686-pc-linux-gnu. 2013-08-27 Joern Rennecke gcc/c-family: * c-common.c (same_scalar_type_ignoring_signedness): Delete. (vector_types_compatible_elements_p): New function. * c-common.h: (same_scalar_type_ignoring_signedness): Delete prototype. (vector_types_compatible_elements_p): Prototype. gcc/c: * c-typeck.c (build_binary_op): Use vector_types_compatible_elements_p. gcc/cp: * typeck.c (cp_build_binary_op): Use vector_types_compatible_elements_p. 
gcc/testsuite: * c-c++-common/opaque-vector.c: New test. Index: c-family/c-common.c === --- c-family/c-common.c (revision 202008) +++ c-family/c-common.c (working copy) @@ -10690,20 +10690,36 @@ resolve_overloaded_builtin (location_t l } } -/* Ignoring their sign, return true if two scalar types are the same. */ +/* True if vector types T1 and T2 can be inputs to the same binary + operator without conversion. + We don't check the overall vector size here because some of our callers + want to give different error messages when the vectors are compatible + except for the element count. */ + bool -same_scalar_type_ignoring_signedness (tree t1, tree t2) +vector_types_compatible_elements_p (tree t1, tree t2) { + bool opaque = TYPE_VECTOR_OPAQUE (t1) || TYPE_VECTOR_OPAQUE (t2); + t1 = TREE_TYPE (t1); + t2 = TREE_TYPE (t2); + enum tree_code c1 = TREE_CODE (t1), c2 = TREE_CODE (t2); gcc_assert ((c1 == INTEGER_TYPE || c1 == REAL_TYPE || c1 == FIXED_POINT_TYPE) && (c2 == INTEGER_TYPE || c2 == REAL_TYPE || c2 == FIXED_POINT_TYPE)); + t1 = c_common_signed_type (t1); + t2 = c_common_signed_type (t2); /* Equality works here because c_common_signed_type uses TYPE_MAIN_VARIANT. */ - return c_common_signed_type (t1) -== c_common_signed_type (t2); + if (t1 == t2) +return true; + if (opaque && c1 == c2 + && (c1 == INTEGER_TYPE || c1 == REAL_TYPE) + && TYPE_PRECISION (t1) == TYPE_PRECISION (t2)) +return true; + return false; } /* Check for missing format attributes on function pointers. 
LTYPE is Index: c-family/c-common.h === --- c-family/c-common.h (revision 202008) +++ c-family/c-common.h (working copy) @@ -766,7 +766,7 @@ extern void warn_logical_operator (locat enum tree_code, tree, enum tree_code, tree); extern void check_main_parameter_types (tree decl); extern bool c_determine_visibility (tree); -extern bool same_scalar_type_ignoring_signedness (tree, tree); +extern bool vector_types_compatible_elements_p (tree, tree); extern void mark_valid_location_for_stdc_pragma (bool); extern bool valid_location_for_stdc_pragma_p (void); extern void set_float_const_decimal64 (void); Index: c/c-typeck.c === --- c/c-typeck.c(revision 202008) +++ c/c-typeck.c(working copy) @@ -9973,7 +9973,7 @@ build_binary_op (location_t location, en if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE) { tree intt; - if (TREE_TYPE (type0) != TREE_TYPE (type1)) + if (!vector_types_compatible_elements_p (type0, type1)) { error_at (location, "comparing vectors with different "
[wwwdocs] Use https for twitter.com link
No cookie, Gerald. I should have caught this permanent redirect (verified with wget among others). Change applied. Gerald Index: style.mhtml === RCS file: /cvs/gcc/wwwdocs/htdocs/style.mhtml,v retrieving revision 1.121 diff -u -3 -p -r1.121 style.mhtml --- style.mhtml 13 Aug 2013 00:46:33 - 1.121 +++ style.mhtml 27 Aug 2013 20:03:57 - @@ -158,7 +158,7 @@ Mailing lists http://gcc.gnu.org/onlinedocs/gcc/Contributors.html";>Contributors Steering Committee - http://twitter.com/gnutools";> + https://twitter.com/gnutools";> @gnutools
Re: opt-info message change for vectorizer
If this is the convention, we should probably have another patch to fix all the existing opt-info messages. thanks, David On Tue, Aug 27, 2013 at 1:23 PM, Mike Stump wrote: > On Aug 27, 2013, at 11:22 AM, Xinliang David Li wrote: >> Does this one look ok? > > We don't capitalize text after error:, warning: or note:. > >> thanks, >> >> David >> >> On Thu, Aug 22, 2013 at 4:20 PM, Xinliang David Li >> wrote: >>> Hi, In this patch, loop alignment peeling and loop versioning >>> transformations will be reported via -fopt-info by default. This will >>> help vectorizer size tuning. >>> >>> It also enhances the opt-info dump to include the current function name as >>> context. Otherwise, we may see duplicate messages from inline/cloned >>> instances. >>> >>> An example opt report: >>> >>> >>> >>> b.c:16:A::foo: note: Loop is vectorized >>> >>> b.c:16:A::foo: note: Loop is versioned to remove aliases for vectorization >>> >>> b.c:16:A::foo: note: Loop is peeled to enhance alignment for vectorization >>> >>> b.c:16:A::foo: note: Completely unroll loop 6 times >>> >>> b.c:12:A::foo: note: Completely unroll loop 6 times >>> >>> >>> Ok after testing? >>> >>> thanks, >>> >>> David >
wide-int branch: cleaned up comparison functions.
Removed the redundant implementations of several comparison functions by just forwarding the OO version to the static version. Added static versions of cmp, cmpu and cmps. kenny Index: gcc/wide-int.h === --- gcc/wide-int.h (revision 202033) +++ gcc/wide-int.h (working copy) @@ -443,12 +443,18 @@ public: template int cmp (const T &, signop) const; + template + static int cmp (const T1 &, const T2 &, signop); template int cmps (const T &) const; + template + static int cmps (const T1 &, const T2 &); template int cmpu (const T &) const; + template + static int cmpu (const T1 &, const T2 &); bool only_sign_bit_p (unsigned int) const; bool only_sign_bit_p () const; @@ -1137,38 +1143,7 @@ template inline bool wide_int_ro::operator == (const T &c) const { - bool result; - HOST_WIDE_INT ws[WIDE_INT_MAX_ELTS]; - const HOST_WIDE_INT *s; - unsigned int cl; - unsigned int p1, p2; - - p1 = precision; - - s = to_shwi1 (ws, &cl, &p2, c); - check_precision (&p1, &p2, true, false); - - if (p1 == 0) -/* There are prec 0 types and we need to do this to check their - min and max values. */ -result = (len == cl) && (val[0] == s[0]); - else if (p1 < HOST_BITS_PER_WIDE_INT) -{ - unsigned HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << p1) - 1; - result = (val[0] & mask) == (s[0] & mask); -} - else if (p1 == HOST_BITS_PER_WIDE_INT) -result = val[0] == s[0]; - else -result = eq_p_large (val, len, p1, s, cl); - - if (result) -gcc_assert (len == cl); - -#ifdef DEBUG_WIDE_INT - debug_vwa ("wide_int_ro:: %d = (%s == %s)\n", result, *this, s, cl, p2); -#endif - return result; + return wide_int_ro::eq_p (*this, c); } /* Return true if C1 == C2. 
If both parameters have nonzero precisions, @@ -1219,31 +1194,7 @@ template inline bool wide_int_ro::lts_p (const T &c) const { - bool result; - HOST_WIDE_INT ws[WIDE_INT_MAX_ELTS]; - const HOST_WIDE_INT *s; - unsigned int cl; - unsigned int p1, p2; - - p1 = precision; - s = to_shwi1 (ws, &cl, &p2, c); - check_precision (&p1, &p2, false, true); - - if (p1 <= HOST_BITS_PER_WIDE_INT - && p2 <= HOST_BITS_PER_WIDE_INT) -{ - gcc_assert (cl != 0); - HOST_WIDE_INT x0 = sext_hwi (val[0], p1); - HOST_WIDE_INT x1 = sext_hwi (s[0], p2); - result = x0 < x1; -} - else -result = lts_p_large (val, len, p1, s, cl, p2); - -#ifdef DEBUG_WIDE_INT - debug_vwa ("wide_int_ro:: %d = (%s lts_p %s\n", result, *this, s, cl, p2); -#endif - return result; + return wide_int_ro::lts_p (*this, c); } /* Return true if C1 < C2 using signed comparisons. */ @@ -1283,30 +1234,7 @@ template inline bool wide_int_ro::ltu_p (const T &c) const { - bool result; - HOST_WIDE_INT ws[WIDE_INT_MAX_ELTS]; - const HOST_WIDE_INT *s; - unsigned int cl; - unsigned int p1, p2; - - p1 = precision; - s = to_shwi1 (ws, &cl, &p2, c); - check_precision (&p1, &p2, false, true); - - if (p1 <= HOST_BITS_PER_WIDE_INT - && p2 <= HOST_BITS_PER_WIDE_INT) -{ - unsigned HOST_WIDE_INT x0 = zext_hwi (val[0], p1); - unsigned HOST_WIDE_INT x1 = zext_hwi (s[0], p2); - result = x0 < x1; -} - else -result = ltu_p_large (val, len, p1, s, cl, p2); - -#ifdef DEBUG_WIDE_INT - debug_vwa ("wide_int_ro:: %d = (%s ltu_p %s)\n", result, *this, s, cl, p2); -#endif - return result; + return wide_int_ro::ltu_p (*this, c); } /* Return true if C1 < C2 using unsigned comparisons. */ @@ -1524,27 +1452,37 @@ wide_int_ro::ge_p (const T1 &c1, const T return geu_p (c1, c2); } -/* Returns -1 if THIS < C, 0 if THIS == C and 1 if A > C using +/* Returns -1 if THIS < C, 0 if THIS == C and 1 if THIS > C using signed compares. 
*/ template inline int wide_int_ro::cmps (const T &c) const { + return wide_int_ro::cmps (*this, c); +} + +/* Returns -1 if C1 < C2, 0 if C1 == C2 and 1 if C1 > C2 using + signed compares. */ +template +inline int +wide_int_ro::cmps (const T1 &c1, const T2 &c2) +{ int result; - HOST_WIDE_INT ws[WIDE_INT_MAX_ELTS]; - const HOST_WIDE_INT *s; - unsigned int cl; - unsigned int prec; + HOST_WIDE_INT ws1[WIDE_INT_MAX_ELTS]; + HOST_WIDE_INT ws2[WIDE_INT_MAX_ELTS]; + const HOST_WIDE_INT *s1, *s2; /* Returned data */ + unsigned int cl1, cl2; /* array lengths */ + unsigned int p1, p2; /* precisions */ - s = to_shwi1 (ws, &cl, &prec, c); - if (prec == 0) -prec = precision; + s1 = to_shwi1 (ws1, &cl1, &p1, c1); + s2 = to_shwi1 (ws2, &cl2, &p2, c2); + check_precision (&p1, &p2, false, true); - if (precision <= HOST_BITS_PER_WIDE_INT - && prec <= HOST_BITS_PER_WIDE_INT) + if (p1 <= HOST_BITS_PER_WIDE_INT + && p2 <= HOST_BITS_PER_WIDE_INT) { - HOST_WIDE_INT x0 = sext_hwi (val[0], precision); - HOST_WIDE_INT x1 = sext_hwi (s[0], prec); + HOST_WIDE_INT x0 = sext_hwi (s1[0], p1); + HOST_WIDE_INT x1 =
Re: opt-info message change for vectorizer
On Aug 27, 2013, at 11:22 AM, Xinliang David Li wrote: > Does this one look ok? We don't capitalize text after error:, warning: or note:. > thanks, > > David > > On Thu, Aug 22, 2013 at 4:20 PM, Xinliang David Li wrote: >> Hi, In this patch, loop alignment peeling and loop versioning >> transformations will be reported via -fopt-info by default. This will >> help vectorizer size tuning. >> >> It also enhances the opt-info dump to include the current function name as >> context. Otherwise, we may see duplicate messages from inline/cloned >> instances. >> >> An example opt report: >> >> >> >> b.c:16:A::foo: note: Loop is vectorized >> >> b.c:16:A::foo: note: Loop is versioned to remove aliases for vectorization >> >> b.c:16:A::foo: note: Loop is peeled to enhance alignment for vectorization >> >> b.c:16:A::foo: note: Completely unroll loop 6 times >> >> b.c:12:A::foo: note: Completely unroll loop 6 times >> >> >> Ok after testing? >> >> thanks, >> >> David
[wide-int] fix mips build error
Since we use these in a cpp #if, we have to use #define for them. Index: wide-int.h === --- wide-int.h (revision 202032) +++ wide-int.h (working copy) @@ -249,15 +249,15 @@ along with GCC; see the file COPYING3. on any platform is 64 bits. When that changes, then it is likely that a target hook should be defined so that targets can make this value larger for those targets. */ -const int addr_max_bitsize = 64; +#define ADDR_MAX_BITSIZE 64 /* This is the internal precision used when doing any address arithmetic. The '4' is really 3 + 1. Three of the bits are for the number of extra bits needed to do bit addresses and single bit is allow everything to be signed without loosing any precision. Then everything is rounded up to the next HWI for efficiency. */ -const int addr_max_precision - = ((addr_max_bitsize + 4 + HOST_BITS_PER_WIDE_INT - 1) & ~(HOST_BITS_PER_WIDE_INT - 1)); +#define ADDR_MAX_PRECISION \ + ((ADDR_MAX_BITSIZE + 4 + HOST_BITS_PER_WIDE_INT - 1) & ~(HOST_BITS_PER_WIDE_INT - 1)) /* This is used to bundle an rtx and a mode together so that the pair can be used as the second operand of a wide int expression. If we @@ -4092,7 +4092,7 @@ fixed_wide_int ::operator -= (c /* A wide_int_ro that has a large enough precision to do any address math on the target. */ -typedef fixed_wide_int addr_wide_int; +typedef fixed_wide_int addr_wide_int; /* A wide_int_ro that has a large enough precision to do any math on the target. */ typedef fixed_wide_int max_wide_int; @@ -4132,14 +4132,14 @@ template <> inline const HOST_WIDE_INT* wide_int_ro::to_shwi1 (HOST_WIDE_INT *s ATTRIBUTE_UNUSED, unsigned int *l, unsigned int *p, - const fixed_wide_int &y) + const fixed_wide_int &y) { *p = y.get_precision (); *l = y.get_len (); return y.get_val (); } -#if addr_max_precision != MAX_BITSIZE_MODE_ANY_INT +#if ADDR_MAX_PRECISION != MAX_BITSIZE_MODE_ANY_INT template <> inline const HOST_WIDE_INT* wide_int_ro::to_shwi1 (HOST_WIDE_INT *s ATTRIBUTE_UNUSED,
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
On 08/27/2013 11:11 AM, Kirill Yukhin wrote: >> > What happened to the bmi andn alternative we discussed? > BMI is only supported for 4- and 8-byte integers, while > kandw is for HI/QI > We're talking about values in registers. Ignoring the high bits of the andn result still produces the correct results. r~
[PATCH 4/4] Support using 'auto' in a function parameter list to introduce an implicit template parameter.
* cp-tree.h (type_uses_auto_or_concept): Declare. (is_auto_or_concept): Declare. * decl.c (grokdeclarator): Allow 'auto' parameters with -std=gnu++1y or -std=c++1y. * type-utils.h: New header defining ... (find_type_usage): ... this new template based on pt.c (type_uses_auto) for searching a type tree given a predicate. * pt.c (type_uses_auto): Reimplement via type-utils.h (find_type_usage). (is_auto_or_concept): New function. (type_uses_auto_or_concept): New function. * parser.h (struct cp_parser): Add fully_implicit_function_template_p. * parser.c (cp_parser_new): Initialize fully_implicit_function_template_p. (cp_parser_lambda_expression): Copy and restore value of fully_implicit_function_template_p as per other parser fields. (cp_parser_parameter_declaration_list): Count generic parameters and call ... (add_implicit_template_parms): ... this new function to synthesize them with help from type-utils.h (find_type_usage), ... (tree_type_is_auto_or_concept): ... this new static function and ... (make_generic_type_name): ... this new static function. (cp_parser_direct_declarator): Account for implicit template parameters. (cp_parser_lambda_declarator_opt): Finish fully implicit template if necessary by calling ... (finish_fully_implicit_template): ... this new function. (cp_parser_member_declaration): Likewise. (cp_parser_function_definition_after_declarator): Likewise. 
--- gcc/cp/cp-tree.h| 2 + gcc/cp/decl.c | 7 +- gcc/cp/parser.c | 182 ++-- gcc/cp/parser.h | 6 ++ gcc/cp/pt.c | 35 +- gcc/cp/type-utils.h | 56 6 files changed, 264 insertions(+), 24 deletions(-) create mode 100644 gcc/cp/type-utils.h diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 876a72a..8d4ac94 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -5453,10 +5453,12 @@ extern tree make_auto (void); extern tree make_decltype_auto (void); extern tree do_auto_deduction (tree, tree, tree); extern tree type_uses_auto (tree); +extern tree type_uses_auto_or_concept (tree); extern void append_type_to_template_for_access_check (tree, tree, tree, location_t); extern tree splice_late_return_type(tree, tree); extern bool is_auto(const_tree); +extern bool is_auto_or_concept (const_tree); extern tree process_template_parm (tree, location_t, tree, bool, bool); extern tree end_template_parm_list (tree); diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 4076a24..df44a6e 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -10325,9 +10325,12 @@ grokdeclarator (const cp_declarator *declarator, if (ctype || in_namespace) error ("cannot use %<::%> in parameter declaration"); - if (type_uses_auto (type)) + if (type_uses_auto (type) && cxx_dialect < cxx1y) { - error ("parameter declared %"); + pedwarn (location_of (type), 0, + "use of % in parameter declaration " + "only available with " + "-std=c++1y or -std=gnu++1y"); type = error_mark_node; } diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 1f0c2c2..7147bfa 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3. If not see #include "plugin.h" #include "tree-pretty-print.h" #include "parser.h" +#include "type-utils.h" /* The lexer. 
*/ @@ -2063,6 +2064,11 @@ static vec *cp_parser_initializer_list static bool cp_parser_ctor_initializer_opt_and_function_body (cp_parser *, bool); +static tree add_implicit_template_parms + (cp_parser *, size_t, tree); +static tree finish_fully_implicit_template + (cp_parser *, tree); + /* Classes [gram.class] */ static tree cp_parser_class_name @@ -3385,6 +3391,9 @@ cp_parser_new (void) /* No template parameters apply. */ parser->num_template_parameter_lists = 0; + /* Not declaring an implicit function template. */ + parser->fully_implicit_function_template_p = false; + return parser; } @@ -8549,10 +8558,12 @@ cp_parser_lambda_expression (cp_parser* parser) = parser->num_template_parameter_lists; unsigned char in_statement = parser->in_statement; bool in_switch_statement_p = parser->in_switch_statement_p; +bool fully_implicit_function_template_p = parser->fully_implicit_function_template_p; parser->num_template_parameter_lists = 0; parser->in_statement = 0; parser->in_switch_statement_p = false; +parser->fully_implicit_func
[PATCH 3/4] Support dumping type bindings in lambda diagnostics.
* error.c (dump_function_decl): Use standard diagnostic flow to dump a lambda diagnostic, albeit without stating the function name or duplicating the parameter spec (which is dumped as part of the type). --- gcc/cp/error.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/gcc/cp/error.c b/gcc/cp/error.c index c82a0ce..27ff962 100644 --- a/gcc/cp/error.c +++ b/gcc/cp/error.c @@ -1380,14 +1380,7 @@ dump_function_decl (cxx_pretty_printer *pp, tree t, int flags) int do_outer_scope = ! (flags & TFF_UNQUALIFIED_NAME); tree exceptions; vec *typenames = NULL; - - if (DECL_NAME (t) && LAMBDA_FUNCTION_P (t)) -{ - /* A lambda's signature is essentially its "type", so defer. */ - gcc_assert (LAMBDA_TYPE_P (DECL_CONTEXT (t))); - dump_type (pp, DECL_CONTEXT (t), flags); - return; -} + bool lambda_p = false; flags &= ~(TFF_UNQUALIFIED_NAME | TFF_TEMPLATE_NAME); if (TREE_CODE (t) == TEMPLATE_DECL) @@ -1449,16 +1442,23 @@ dump_function_decl (cxx_pretty_printer *pp, tree t, int flags) else if (cname) { dump_type (pp, cname, flags); - pp_cxx_colon_colon (pp); + if (LAMBDA_TYPE_P (cname)) + lambda_p = true; + else + pp_cxx_colon_colon (pp); } else dump_scope (pp, CP_DECL_CONTEXT (t), flags); - dump_function_name (pp, t, flags); + /* A lambda's signature is essentially its "type", which has already been + dumped. */ + if (!lambda_p) +dump_function_name (pp, t, flags); if (!(flags & TFF_NO_FUNCTION_ARGUMENTS)) { - dump_parameters (pp, parmtypes, flags); + if (!lambda_p) + dump_parameters (pp, parmtypes, flags); if (TREE_CODE (fntype) == METHOD_TYPE) { -- 1.8.4
[PATCH 2/4] Don't generate lambda conversion op if arglist has parameter pack.
* lambda.c (maybe_add_lambda_conv_op): Optimize argvec building and early out if CALLOP contains a function parameter pack. --- gcc/cp/lambda.c | 60 ++--- 1 file changed, 44 insertions(+), 16 deletions(-) diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c index e9bc7c5..4d76f82 100644 --- a/gcc/cp/lambda.c +++ b/gcc/cp/lambda.c @@ -770,8 +770,51 @@ maybe_add_lambda_conv_op (tree type) return; } + argvec = make_tree_vector (); + + /* Non-template conversion operators are defined directly. Templates are + deferred. In the non-template case, the nullptr instance of the stateless + lambda type is added to ARGVEC for build_call_a. In the template case it + is bound via build_min. */ + if (!generic_lambda_p) +{ + arg = build1 (NOP_EXPR, TREE_TYPE (DECL_ARGUMENTS (callop)), + null_pointer_node); + argvec->quick_push (arg); +} + + /* Copy CALLOP's argument list (as per 'copy_list') as FN_ARGS in order to + declare the static member function "_FUN" below. For each arg append to + ARGVEC (converting from reference in the template call op case). Early out + if a parameter pack is found; conversion to function pointer is not + supported in this case. */ + tree fn_args = NULL_TREE; + { +tree src = DECL_CHAIN (DECL_ARGUMENTS (callop)); +tree tgt; + +while (src) + { + if (FUNCTION_PARAMETER_PACK_P (src)) + return; + + if (!fn_args) + fn_args = tgt = copy_node (src); + else + { + TREE_CHAIN (tgt) = copy_node (src); + tgt = TREE_CHAIN (tgt); + } + + mark_exp_read (tgt); + vec_safe_push (argvec, + generic_lambda_p ? convert_from_reference (tgt) : tgt); + + src = TREE_CHAIN (src); + } + } + tree fn_result = TREE_TYPE (TREE_TYPE (callop)); - tree fn_args = copy_list (DECL_CHAIN (DECL_ARGUMENTS (callop))); if (generic_lambda_p) { @@ -780,12 +823,6 @@ maybe_add_lambda_conv_op (tree type) implementation of the conversion operator. 
*/ tree instance = build_nop (type, null_pointer_node); - argvec = make_tree_vector (); - for (arg = fn_args; arg; arg = DECL_CHAIN (arg)) - { - mark_exp_read (arg); - vec_safe_push (argvec, convert_from_reference (arg)); - } tree objfn = build_min (COMPONENT_REF, NULL_TREE, instance, DECL_NAME (callop), NULL_TREE); @@ -802,15 +839,6 @@ maybe_add_lambda_conv_op (tree type) } else { - arg = build1 (NOP_EXPR, TREE_TYPE (DECL_ARGUMENTS (callop)), - null_pointer_node); - argvec = make_tree_vector (); - argvec->quick_push (arg); - for (arg = fn_args; arg; arg = DECL_CHAIN (arg)) - { - mark_exp_read (arg); - vec_safe_push (argvec, arg); - } call = build_call_a (callop, argvec->length (), argvec->address ()); CALL_FROM_THUNK_P (call) = 1; if (MAYBE_CLASS_TYPE_P (TREE_TYPE (call))) -- 1.8.4
Re: Lambda templates and implicit function templates.
Hi Jason, Here's an updated patch set. The fully_implicit_function_template_p field has been moved into cp_parser and the other comments addressed. I've done some testing with parameter packs also. They work okay with the explicit template parameter syntax for lambdas. Unfortunately, due to errors being thrown 'early' in grokdeclarator, I haven't been able to get 'auto...' (or reference/qualified variants) working yet. I think I need to defer processing the parameter pack internals of grokdeclarator until I have the synthesized template parameter (or generate one on the fly in-place --- but that's returning to the old 'on-demand' implementation which we moved away from). I don't know if it's the correct thing to do but the implementation currently omits the conversion to function pointer operator if the argument list contains a parameter pack. One other thing, assuming the 'auto...' syntax can be made to work, bug 41933 needs to be resolved for the expansion returned by the generic lambda in N3690 5.1.2.5 to compile. Currently (transforming the 'auto&&...' to an explicit 'T&&...') appears to yield the bug. In particular I get: error: expansion pattern ‘ts’ contains no argument packs sorry, unimplemented: use of ‘type_pack_expansion’ in template http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41933#c8 You look to have done some work on it. Any direction as to where to go from there? If the solution requires reworking variadics somehow might that perhaps alleviate my problems implementing 'auto...'? Wishful thinking probably! Cheers, Adam Patch summary (4): Support lambda templates. Don't generate lambda conversion op if arglist has parameter pack. Support dumping type bindings in lambda diagnostics. Support using 'auto' in a function parameter list to introduce an implicit template parameter. 
gcc/cp/cp-tree.h| 2 + gcc/cp/decl.c | 7 +- gcc/cp/decl2.c | 5 +- gcc/cp/error.c | 22 +++--- gcc/cp/lambda.c | 115 ++- gcc/cp/parser.c | 222 +--- gcc/cp/parser.h | 6 ++ gcc/cp/pt.c | 39 + gcc/cp/type-utils.h | 56 + 9 files changed, 413 insertions(+), 61 deletions(-) create mode 100644 gcc/cp/type-utils.h -- 1.8.4
[PATCH 1/4] Support lambda templates.
* parser.c (cp_parser_lambda_declarator_opt): Accept template parameter list with std=c++1y or std=gnu++1y. (cp_parser_lambda_body): Don't call 'expand_or_defer_fn' for lambda call operator template to avoid adding template result to symbol table. * lambda.c (lambda_function): Return template result if call operator is a template. (maybe_add_lambda_conv_op): Support conversion of a non-capturing lambda template to a function pointer. * decl2.c (check_member_template): Don't reject lambda call operator template in local [lambda] class. * pt.c (instantiate_class_template_1): Don't instantiate lambda call operator template when instantiating lambda class. --- gcc/cp/decl2.c | 5 ++-- gcc/cp/lambda.c | 87 - gcc/cp/parser.c | 40 -- gcc/cp/pt.c | 4 ++- 4 files changed, 110 insertions(+), 26 deletions(-) diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c index d5d2912..ac9dbd7 100644 --- a/gcc/cp/decl2.c +++ b/gcc/cp/decl2.c @@ -507,8 +507,9 @@ check_member_template (tree tmpl) || (TREE_CODE (decl) == TYPE_DECL && MAYBE_CLASS_TYPE_P (TREE_TYPE (decl { - /* The parser rejects template declarations in local classes. */ - gcc_assert (!current_function_decl); + /* The parser rejects template declarations in local classes +(with the exception of generic lambdas). */ + gcc_assert (!current_function_decl || LAMBDA_FUNCTION_P (decl)); /* The parser rejects any use of virtual in a function template. 
*/ gcc_assert (!(TREE_CODE (decl) == FUNCTION_DECL && DECL_VIRTUAL_P (decl))); diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c index a53e692..e9bc7c5 100644 --- a/gcc/cp/lambda.c +++ b/gcc/cp/lambda.c @@ -196,7 +196,7 @@ lambda_function (tree lambda) /*protect=*/0, /*want_type=*/false, tf_warning_or_error); if (lambda) -lambda = BASELINK_FUNCTIONS (lambda); +lambda = STRIP_TEMPLATE (get_first_fn (lambda)); return lambda; } @@ -759,6 +759,10 @@ maybe_add_lambda_conv_op (tree type) if (processing_template_decl) return; + bool generic_lambda_p += (DECL_TEMPLATE_INFO (callop) +&& DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (callop)) == callop); + if (DECL_INITIAL (callop) == NULL_TREE) { /* If the op() wasn't instantiated due to errors, give up. */ @@ -766,7 +770,54 @@ maybe_add_lambda_conv_op (tree type) return; } - stattype = build_function_type (TREE_TYPE (TREE_TYPE (callop)), + tree fn_result = TREE_TYPE (TREE_TYPE (callop)); + tree fn_args = copy_list (DECL_CHAIN (DECL_ARGUMENTS (callop))); + + if (generic_lambda_p) +{ + /* Construct the dependent member call for the static member function +'_FUN' and remove 'auto' from its return type to allow for simple +implementation of the conversion operator. 
*/ + + tree instance = build_nop (type, null_pointer_node); + argvec = make_tree_vector (); + for (arg = fn_args; arg; arg = DECL_CHAIN (arg)) + { + mark_exp_read (arg); + vec_safe_push (argvec, convert_from_reference (arg)); + } + + tree objfn = build_min (COMPONENT_REF, NULL_TREE, + instance, DECL_NAME (callop), NULL_TREE); + call = build_nt_call_vec (objfn, argvec); + + if (type_uses_auto (fn_result)) + { + ++processing_template_decl; + fn_result = finish_decltype_type + (call, /*id_expression_or_member_access_p=*/false, +tf_warning_or_error); + --processing_template_decl; + } +} + else +{ + arg = build1 (NOP_EXPR, TREE_TYPE (DECL_ARGUMENTS (callop)), + null_pointer_node); + argvec = make_tree_vector (); + argvec->quick_push (arg); + for (arg = fn_args; arg; arg = DECL_CHAIN (arg)) + { + mark_exp_read (arg); + vec_safe_push (argvec, arg); + } + call = build_call_a (callop, argvec->length (), argvec->address ()); + CALL_FROM_THUNK_P (call) = 1; + if (MAYBE_CLASS_TYPE_P (TREE_TYPE (call))) + call = build_cplus_new (TREE_TYPE (call), call, tf_warning_or_error); +} + + stattype = build_function_type (fn_result, FUNCTION_ARG_CHAIN (callop)); /* First build up the conversion op. */ @@ -794,6 +845,9 @@ maybe_add_lambda_conv_op (tree type) if (nested) DECL_INTERFACE_KNOWN (fn) = 1; + if (generic_lambda_p) +fn = add_inherited_template_parms (fn, DECL_TI_TEMPLATE (callop)); + add_method (type, fn, NULL_TREE); /* Generic thunk code fails for varargs; we'll complain in mark_used if @@ -820,8 +874,8 @@ maybe_add_lambda_conv_op (tree type) DECL_NOT_REALLY_EXTERN (fn) = 1; DECL_DECLARED_INLINE_P (fn) = 1; DECL_STATIC_FUNCTION
Re: [PATCH 3/3] Support dumping type bindings in lambda diagnostics.
Hi Jason, Was just about to compose a mail to gcc-patches... Been busy, then ill, now just about ready to submit a new set of diffs. On 27.08.2013 17:47, Jason Merrill wrote: This patch doesn't seem to depend on the others; go ahead and apply it. Okay. As it stands, it means that you get an additional 'const' in diagnostics for lambdas not declared 'mutable'. I didn't think this to be necessarily a bad thing (distinguishing between 'mutable' and 'plain' lambdas and pointing at the cause for possible user error attempting to write to by-value captures) but it's probably contentious. Do you want me to ditch the 'const' output prior to pushing? Cheers, Adam
Re: PING: Re: [patch] implement simd loops in trunk (OMP_SIMD)
On 08/19/13 13:18, Aldy Hernandez wrote: Well, apparently this last revision was approved and I didn't even know about it :). Tested one last time against current trunk and committed as revision 202029. My apologies to Jakub for the merge problems he will inherit on the gomp-4_0-branch. Aldy On 08/19/13 04:35, Jakub Jelinek wrote: On Thu, Aug 01, 2013 at 09:38:31AM -1000, Richard Henderson wrote: + if (simd + /* + || (fd->sched_kind == OMP_CLAUSE_SCHEDULE_STATIC + && !fd->have_ordered)*/) Debugging leftovers or what? gomp-4_0-branch contains also the || there, but for the trunk merge I think Aldy can leave that comment out, I'll deal with it when merging the trunk changes back to gomp-4_0-branch. Removed comment and the enclosed code altogether. + /* Enforce simdlen 1 in simd loops with data sharing clauses referencing + variable sized vars. That is unnecessarily hard to support and very + unlikely to result in vectorized code anyway. */ + if (is_simd) +for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c)) + switch (OMP_CLAUSE_CODE (c)) + { + case OMP_CLAUSE_REDUCTION: + if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c)) + max_vf = 1; + /* FALLTHRU */ + case OMP_CLAUSE_PRIVATE: + case OMP_CLAUSE_FIRSTPRIVATE: + case OMP_CLAUSE_LASTPRIVATE: + case OMP_CLAUSE_LINEAR: + if (is_variable_sized (OMP_CLAUSE_DECL (c))) + max_vf = 1; + break; + default: + continue; + } Comment is wrong, code is right, adjusting max_vf not simdlen. max_vf = 1 causes c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_SAFELEN); OMP_CLAUSE_SAFELEN_EXPR (c) = build_int_cst (integer_type_node, max_vf); so I think the comment should just be changed s/simdlen/safelen/, always get those 2 wrong... simdlen is for elemental functions, safelen for simd loops. Adjusted comment to document both things-- that max_vf is being set, and that this later triggers a safelen of 1: + /* Set max_vf=1 (which will later enforce safelen=1) in simd loops + with data sharing clauses referencing variable sized vars. 
That + is unnecessarily hard to support and very unlikely to result in + vectorized code anyway. */ + /* If max_vf is non-NULL, then we can use only vectorization factor + up to the max_vf we chose. So stick it into safelen clause. */ + if (max_vf) +{ + tree c = find_omp_clause (gimple_omp_for_clauses (ctx->stmt), + OMP_CLAUSE_SAFELEN); First, non-zero, not non-null -- max_vf is not a pointer. Ack. Fixed. Second, I don't understand why we're adjusting safelen. The VF we choose for optimizing the loop (even 1), does not change the fact that there are no dependencies between loop iterations larger than that. No, it adds dependencies. We choose some size of the per-simd lane arrays, and we must not allow larger vectorization factor than that, as every simd lane should have different object. The size is normally the maximum possible value for a target, so it should be big enough, but we really need it to be recorded, so that we don't generate invalid code. max_vf = 1 is for the cases where we punt on handling it properly, those loops for now can't be vectorized just because they had safelen clause (normal vectorization rules will probably also punt on those). In the future if e.g. privatized per-SIMD lane VLAs are common we can improve that, but right now I somehow doubt that would be the case. I'll let you two sort this out. Nothing done. Did you want to be adding a _max_vf_ attribute to record stuff that we learned while examining the omp loop? E.g. the max_vf=1 reduction above? Given the only adjustment we make to max_vf is to disable vectorization, did we in fact want to discard the simd attribute instead, making this a "regular" openmp loop? See above, it is not only about disabling it, we record the sizes of the arrays there. And, for SAFELEN clause with 1, we actually don't set loop->force_vect and set loop->safelen to 0 (i.e. normal loop). Similarly, you two can sort this out. 
+ if (__builtin_expect (N32 cond3 N31, 0)) goto ZERO_ITER_BB; + if (cond3 is <) + adj = STEP3 - 1; + else + adj = STEP3 + 1; + count3 = (adj + N32 - N31) / STEP3; + if (__builtin_expect (N22 cond2 N21, 0)) goto ZERO_ITER_BB; + if (cond2 is <) + adj = STEP2 - 1; + else + adj = STEP2 + 1; + count2 = (adj + N22 - N21) / STEP2; + if (__builtin_expect (N12 cond1 N11, 0)) goto ZERO_ITER_BB; CEIL_DIV_EXPR, instead of TRUNC_DIV_EXPR and adjusting by hand? Unless we can't generate the same code in the end because generically we don't know that the values involved must be negative for GT? This is just moving code around (and merging multiple copies o
Re: opt-info message change for vectorizer
yes -- the long unmangled names can be annoying -- that is why I chose to dump the short form of the function names -- combined with line numbers, it should be enough to get the full context. David On Tue, Aug 27, 2013 at 11:36 AM, Teresa Johnson wrote: > My only concern is whether the dump messages will get too long with > the full function name on the same line. The infrastructure that emits > inform() notes ensures that the function name is printed before each > block of messages related to that function (via an "In function foo:" > type message), but I had found before that the new dump infrastructure > doesn't do that. OTOH, your approach will make it much easier to grep > the output of a large build. Personally, I use grep on this type of > output enough to make the longer lines worth it. Either that or the > new dump infrastructure needs to be fixed to emit the function name > before each block of messages, a la inform(). > > Thanks, > Teresa > > On Tue, Aug 27, 2013 at 11:22 AM, Xinliang David Li > wrote: >> Does this one look ok? >> >> thanks, >> >> David >> >> On Thu, Aug 22, 2013 at 4:20 PM, Xinliang David Li >> wrote: >>> Hi, In this patch, loop alignment peeling and loop versioning >>> transformation will be reported via -fopt-info by default. This will >>> help vectorizer size tuning. >>> >>> It also enhances the opt-info dump to include current function name as >>> context. Otherwise, we may see duplicate messages from inline/cloned >>> instances. >>> >>> An example opt report: >>> >>> >>> >>> b.c:16:A::foo: note: Loop is vectorized >>> >>> b.c:16:A::foo: note: Loop is versioned to remove aliases for vectorization >>> >>> b.c:16:A::foo: note: Loop is peeled to enhance alignment for vectorization >>> >>> b.c:16:A::foo: note: Completely unroll loop 6 times >>> >>> b.c:12:A::foo: note: Completely unroll loop 6 times >>> >>> >>> Ok after testing? 
>>> >>> thanks, >>> >>> David > > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
On 27 Aug 22:11, Kirill Yukhin wrote: Hello, while pasting the patch I accidentally put in an extra brace. Please ignore it > +(define_insn "kxnor" > + [(set (match_operand:SWI12 0 "register_operand" "=r,!k") > + (not:SWI12 > + (xor:SWI12 > + (match_operand:SWI12 1 "register_operand" "0,k")) ;; <--- Extra ')' > + (match_operand:SWI12 2 "register_operand" "r,k")))] > + "TARGET_AVX512F" -- Thanks, K
Re: [PATCH i386 3/8] [AVX512] [1/n] Add AVX-512 patterns: VF iterator extended.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Problem is that some iterators depend on each other, so patches are not going to be tiny. Here is the 1st one. It extends the VF iterator - the biggest impact, I believe. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is - I am going to strip out ChangeLog lines from the big patch --- gcc/config/i386/i386.c | 62 +-- gcc/config/i386/i386.md | 1 + gcc/config/i386/sse.md | 283 +++- 3 files changed, 241 insertions(+), 105 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 8325919..5f50533 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -16538,8 +16538,8 @@ ix86_avx256_split_vector_move_misalign (rtx op0, rtx op1) gcc_unreachable (); case V32QImode: extract = gen_avx_vextractf128v32qi; - load_unaligned = gen_avx_loaddqu256; - store_unaligned = gen_avx_storedqu256; + load_unaligned = gen_avx_loaddquv32qi; + store_unaligned = gen_avx_storedquv32qi; mode = V16QImode; break; case V8SFmode: @@ -16642,10 +16642,56 @@ void ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[]) { rtx op0, op1, m; + rtx (*load_unaligned) (rtx, rtx); + rtx (*store_unaligned) (rtx, rtx); op0 = operands[0]; op1 = operands[1]; + if (GET_MODE_SIZE (mode) == 64) +{ + switch (GET_MODE_CLASS (mode)) + { + case MODE_VECTOR_INT: + case MODE_INT: + op0 = gen_lowpart (V16SImode, op0); + op1 = gen_lowpart (V16SImode, op1); + /* FALLTHRU */ + + case MODE_VECTOR_FLOAT: + switch (GET_MODE (op0)) + { + default: + gcc_unreachable (); + case V16SImode: + load_unaligned = gen_avx512f_loaddquv16si; + store_unaligned = gen_avx512f_storedquv16si; + break; + case V16SFmode: + load_unaligned = 
gen_avx512f_loadups512; + store_unaligned = gen_avx512f_storeups512; + break; + case V8DFmode: + load_unaligned = gen_avx512f_loadupd512; + store_unaligned = gen_avx512f_storeupd512; + break; + } + + if (MEM_P (op1)) + emit_insn (load_unaligned (op0, op1)); + else if (MEM_P (op0)) + emit_insn (store_unaligned (op0, op1)); + else + gcc_unreachable (); + break; + + default: + gcc_unreachable (); + } + + return; +} + if (TARGET_AVX && GET_MODE_SIZE (mode) == 32) { @@ -16678,7 +16724,7 @@ ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[]) op0 = gen_lowpart (V16QImode, op0); op1 = gen_lowpart (V16QImode, op1); /* We will eventually emit movups based on insn attributes. */ - emit_insn (gen_sse2_loaddqu (op0, op1)); + emit_insn (gen_sse2_loaddquv16qi (op0, op1)); } else if (TARGET_SSE2 && mode == V2DFmode) { @@ -16753,7 +16799,7 @@ ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[]) op0 = gen_lowpart (V16QImode, op0); op1 = gen_lowpart (V16QImode, op1); /* We will eventually emit movups based on insn attributes. 
*/ - emit_insn (gen_sse2_storedqu (op0, op1)); + emit_insn (gen_sse2_storedquv16qi (op0, op1)); } else if (TARGET_SSE2 && mode == V2DFmode) { @@ -27473,13 +27519,13 @@ static const struct builtin_description bdesc_special_args[] = { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_lfence, "__builtin_ia32_lfence", IX86_BUILTIN_LFENCE, UNKNOWN, (int) VOID_FTYPE_VOID }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_mfence, 0, IX86_BUILTIN_MFENCE, UNKNOWN, (int) VOID_FTYPE_VOID }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_storeupd, "__builtin_ia32_storeupd", IX86_BUILTIN_STOREUPD, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V2DF }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_storedqu, "__builtin_ia32_storedqu", IX86_BUILTIN_STOREDQU, UNKNOWN, (int) VOID_FTYPE_PCHAR_V16QI }, + { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_storedquv16qi, "__builtin_ia32_storedqu", IX86_BUILTIN_STOREDQU, UNKNOWN, (int) VOID_FTYPE_PCHAR_V16QI }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_movntv2df, "__builtin_ia32_movntpd", IX86_BUILTIN_MOVNTPD, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V2DF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_movntv2di, "__builtin_ia32_movntdq", IX86_BUILTIN_MOVNTDQ, UNKNOWN, (int) VOID_FTYPE_PV2DI_V2DI }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_movntisi, "__builtin_ia32_movnti", IX86_BUILTIN_MOVNTI, UNKNOWN
Re: wide-int branch updated
you are about an hour behind in reading your email. I had just committed a patch that is very close to this. On 08/27/2013 02:31 PM, Richard Sandiford wrote: Kenneth Zadeck writes: fixed fits_uhwi_p. tested on x86-64. kenny Index: gcc/wide-int.h === --- gcc/wide-int.h (revision 201985) +++ gcc/wide-int.h (working copy) @@ -1650,7 +1650,7 @@ wide_int_ro::fits_shwi_p () const inline bool wide_int_ro::fits_uhwi_p () const { - return len == 1 || (len == 2 && val[1] == 0); + return (len == 1 && val[0] >= 0) || (len == 2 && val[1] == 0); } With upper bits being undefined, it doesn't seem safe to check val[0] or val[1] like this. I was thinking along the lines of: inline bool wide_int_ro::fits_uhwi_p () const { if (precision <= HOST_BITS_PER_WIDE_INT) return true; if (len == 1) return val[0] >= 0; if (precision < HOST_BITS_PER_WIDE_INT * 2) return ((unsigned HOST_WIDE_INT) val[1] << (HOST_BITS_PER_WIDE_INT * 2 - precision)) == 0; return val[1] == 0; } Since we don't have a sign, everything HWI-sized or smaller fits in a uhwi without loss of precision. I've tested the above on x86_64-linux-gnu FWIW, in case it looks OK. Thanks, Richard
Re: opt-info message change for vectorizer
My only concern is whether the dump messages will get too long with the full function name on the same line. The infrastructure that emits inform() notes ensures that the function name is printed before each block of messages related to that function (via an "In function foo:" type message), but I had found before that the new dump infrastructure doesn't do that. OTOH, your approach will make it much easier to grep the output of a large build. Personally, I use grep on this type of output enough to make the longer lines worth it. Either that or the new dump infrastructure needs to be fixed to emit the function name before each block of messages, a la inform(). Thanks, Teresa On Tue, Aug 27, 2013 at 11:22 AM, Xinliang David Li wrote: > Does this one look ok? > > thanks, > > David > > On Thu, Aug 22, 2013 at 4:20 PM, Xinliang David Li wrote: >> Hi, In this patch, loop alignment peeling and loop versioning >> transformation will be reported via -fopt-info by default. This will >> help vectorizer size tuning. >> >> It also enhances the opt-info dump to include current function name as >> context. Otherwise, we may see duplicate messages from inline/cloned >> instances. >> >> An example opt report: >> >> >> >> b.c:16:A::foo: note: Loop is vectorized >> >> b.c:16:A::foo: note: Loop is versioned to remove aliases for vectorization >> >> b.c:16:A::foo: note: Loop is peeled to enhance alignment for vectorization >> >> b.c:16:A::foo: note: Completely unroll loop 6 times >> >> b.c:12:A::foo: note: Completely unroll loop 6 times >> >> >> Ok after testing? >> >> thanks, >> >> David -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: wide-int branch updated
Kenneth Zadeck writes: > fixed fits_uhwi_p. > > tested on x86-64. > > kenny > > Index: gcc/wide-int.h > === > --- gcc/wide-int.h(revision 201985) > +++ gcc/wide-int.h(working copy) > @@ -1650,7 +1650,7 @@ wide_int_ro::fits_shwi_p () const > inline bool > wide_int_ro::fits_uhwi_p () const > { > - return len == 1 || (len == 2 && val[1] == 0); > + return (len == 1 && val[0] >= 0) || (len == 2 && val[1] == 0); > } With upper bits being undefined, it doesn't seem safe to check val[0] or val[1] like this. I was thinking along the lines of: inline bool wide_int_ro::fits_uhwi_p () const { if (precision <= HOST_BITS_PER_WIDE_INT) return true; if (len == 1) return val[0] >= 0; if (precision < HOST_BITS_PER_WIDE_INT * 2) return ((unsigned HOST_WIDE_INT) val[1] << (HOST_BITS_PER_WIDE_INT * 2 - precision)) == 0; return val[1] == 0; } Since we don't have a sign, everything HWI-sized or smaller fits in a uhwi without loss of precision. I've tested the above on x86_64-linux-gnu FWIW, in case it looks OK. Thanks, Richard
Re: opt-info message change for vectorizer
Does this one look ok? thanks, David On Thu, Aug 22, 2013 at 4:20 PM, Xinliang David Li wrote: > Hi, In this patch, loop alignment peeling and loop versioning > transformation will be reported via -fopt-info by default. This will > help vectorizer size tuning. > > It also enhances the opt-info dump to include current function name as > context. Otherwise, we may see duplicate messeages from inline/cloned > instances. > > An example opt report: > > > > b.c:16:A::foo: note: Loop is vectorized > > b.c:16:A::foo: note: Loop is versioned to remove aliases for vectorization > > b.c:16:A::foo: note: Loop is peeled to enhance alignment for vectorization > > b.c:16:A::foo: note: Completely unroll loop 6 times > > b.c:12:A::foo: note: Completely unroll loop 6 times > > > Ok after testing? > > thanks, > > David
Re: [PATCH, i386] PR 57927: -march=core-avx2 different than -march=native on INTEL Haswell (i7-4700K)
On Tue, Aug 27, 2013 at 10:41 AM, Uros Bizjak wrote: > On Tue, Aug 27, 2013 at 7:36 PM, H.J. Lu wrote: > As reported in [1] the host processor detection has not yet been updated to recognize Intel Ivy Bridge and Haswell processors. This small patch adds the detection of these processors and assumes core-avx2 as march for unknown processors of the PENTIUMPRO family that support AVX2. >>> >>> I have committed slightly improved (attached) patch that uses >>> core-avx-i for IvyBridge and adds another IvyBridge model number. >>> While there, I also reordered a bunch of statements. >>> >>> Thanks, >>> Uros. >> >> Page C-3 in Intel optimization guide shows: >> >> 06_3CH, 06_45H, 06_46H Intel microarchitecture Haswell >> 06_3AH, 06_3EH Intel microarchitecture Ivy Bridge >> 06_2AH, 06_2DH Intel microarchitecture Sandy Bridge >> 06_25H, 06_2CH, 06_2FH Intel microarchitecture Westmere >> 06_1AH, 06_1EH, 06_1FH, Intel microarchitecture Nehalem >> 06_2EH >> 06_17H, 06_1DH Enhanced Intel Core microarchitecture >> 06_0FH Intel Core microarchitecture >> >> At least, we should add 0x45 and 0x46 to Haswell. > > OK, the patch is pre-approved. > This is what I checked in. Thanks. -- H.J. --- diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 9698dc9..54a4cf9 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2013-08-27 H.J. Lu + +* config/i386/driver-i386.c (host_detect_local_cpu): Update +Haswell processor detection. + 2013-08-27 Christian Widmer PR target/57927 diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c index 001d12f..4cb9907 100644 --- a/gcc/config/i386/driver-i386.c +++ b/gcc/config/i386/driver-i386.c @@ -673,6 +673,8 @@ const char *host_detect_local_cpu (int argc, const char **argv) cpu = "core-avx-i"; break; case 0x3c: +case 0x45: +case 0x46: /* Haswell. */ cpu = "core-avx2"; break;
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
Hello Richard, On 26 Aug 09:37, Richard Henderson wrote: > On 08/26/2013 09:13 AM, Kirill Yukhin wrote: > > +(define_split > > + [(set (match_operand:SWI12 0 "mask_reg_operand") > > + (any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand") > > +(match_operand:SWI12 2 "mask_reg_operand"))) > > + (clobber (reg:CC FLAGS_REG))] > > + "TARGET_AVX512F && reload_completed" > > + [(set (match_operand:SWI12 0 "mask_reg_operand") > > + (any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand") > > +(match_operand:SWI12 2 "mask_reg_operand")))]) > > You must use match_dup on the new pattern half of define_split. > This pattern must never have triggered during your tests, since > it should have resulted in garbage rtl, and an ICE. Thanks, fixed. > > + (match_operand:SWI12 2 "register_operand" "r,Yk")))] > > + "TARGET_AVX512F" > > + "@ > > + # > > + kandnw\t{%2, %1, %0|%0, %1, %2}" > > + [(set_attr "type" "*,msklog") > > + (set_attr "prefix" "*,vex") > > + (set_attr "mode" "")]) > > What happened to the bmi andn alternative we discussed? BMI only supported for 4- and 8- byte integers, while kandw - for HI/QI > > + and{l}\t{%k2, %k0|%k0, %k2} > > + #" > > + [(set_attr "type" "alu,alu,alu,msklog") > > + (set_attr "mode" "QI,QI,SI,HI")]) > > Why force the split? You can write the kand here... Done. > > + {w}\t{%2, %0|%0, %2} > > + #" > > + [(set_attr "type" "alu,alu,msklog") > > + (set_attr "mode" "HI")]) > > Likewise. Done. > The point being that with optimization enabled, we will have run the split and > gotten all of the performance benefit of eliding the clobber. But with > optimization disabled, we don't need the split for correctness. 
> > > +(define_insn "kunpckhi" > > + [(set (match_operand:HI 0 "register_operand" "=Yk") > > + (ior:HI > > + (ashift:HI > > + (match_operand:HI 1 "register_operand" "Yk") > > + (const_int 8)) > > + (zero_extend:HI (subreg:QI (match_operand:HI 2 "register_operand" > > "Yk") 0] > > + "TARGET_AVX512F" > > + "kunpckbw\t{%2, %1, %0|%0, %1, %2}" > > + [(set_attr "mode" "HI") > > + (set_attr "type" "msklog") > > + (set_attr "prefix" "vex")]) > > Don't write the subreg explicitly. Instead, use a match_operand:QI, which > will > match the whole (subreg (reg)) expression, and also something that the > combiner > could simplify out of that. Thanks, fixed. > > +(define_insn "*one_cmplhi2_1" > > + [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yk") > > + (not:HI (match_operand:HI 1 "nonimmediate_operand" "0,Yk")))] > > + "ix86_unary_operator_ok (NOT, HImode, operands)" > ... > > (define_insn "*one_cmplqi2_1" > > - [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r") > > - (not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))] > > + [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,*Yk") > > + (not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,*Yk")))] > > Forgotten ! for Yk alternatives. Thanks. Fixed. > > + "TARGET_AVX512F && !ANY_MASK_REG_P (operands [0])" > ... > > +;; Do not split instructions with mask registers. > > (define_split > ... > > + && (! ANY_MASK_REG_P (operands[0]) > > + || ! ANY_MASK_REG_P (operands[1]) > > + || ! ANY_MASK_REG_P (operands[2]))" > > This ugliness is why I suggested adding a general_reg_operand in our last > conversation. If introduced general_reg_operand predicate. Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. Is it ok? 
--
Thanks, K

---
 gcc/config/i386/constraints.md |   8 +-
 gcc/config/i386/i386.c         |  34 +++--
 gcc/config/i386/i386.h         |  40 --
 gcc/config/i386/i386.md        | 280 ++---
 gcc/config/i386/predicates.md  |   9 ++
 5 files changed, 311 insertions(+), 60 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 28e626f..92e0c05 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 ;;; Unused letters:
 ;;;     B     H           T
-;;;           h jk
+;;;           h j

 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -78,6 +78,12 @@
  "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS"
  "Second from top of 80387 floating-point stack (@code{%st(1)}).")

+(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS"
+"@internal Any mask register that can be used as predicate, i.e. k1-k7.")
+
+(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
+"@internal Any mask register.")
+
 ;; Vector registers (also used for plain floating point nowadays).
 (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"
  "Any MMX register.")
diff --git a/gc
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
On Mon, Aug 19, 2013 at 10:47 AM, Teresa Johnson wrote:
> On Mon, Aug 19, 2013 at 8:09 AM, Jan Hubicka wrote:
>>> Remember it isn't using dominance anymore. The latest patch was
>>> instead ensuring the most frequent path between hot blocks and the
>>> entry/exit are marked hot. That should be better than the dominance
>>> approach used in the earlier version.
>>
>> Indeed, that looks like a more reasonable approach.
>> Can you point me to the last version of the patch? The last one I remember
>> still walked dominators...
>
> I've included the latest patch below. I still use dominators in the
> post-cfg-optimization fixup (fixup_partitions), but not in the
> partition sanitizing done during the partitioning itself
> (sanitize_hot_paths). The former is looking for hot bbs newly
> dominated by cold bbs after cfg transformations.
>
>>> > We can commit it and work on better solution incrementally but it will
>>> > probably mean replacing it later. If you think it makes things easier
>>> > to work on it incrementally, I think the patch is OK.
>>>
>>> Yes, I think this is a big step forward from what is there now for
>>> splitting, which does the splitting purely based on bb count in
>>> isolation. I don't have a much better solution in mind yet.

Ping on this patch. Honza, did the latest version I sent last week look
ok to you?

I've included below a new version that adds the partitioning insanity
warnings we discussed (emitted to the dump only since, as I noted, there
are various optimization passes that provoke this due to not fixing up
profile data).

(Honza - I am also going to move the discussion we started in this
thread on the COMDAT missing profile handling patch to a different
thread. Should have an update on that shortly.)

The latest patch below was retested with bootstrap and regression
testing on x86-64-unknown-linux-gnu, and with a profiledbootstrap and
-freorder-blocks-and-partition forced on.

Ok for trunk?
Thanks, Teresa 2013-08-27 Teresa Johnson Steven Bosscher * cfgrtl.c (fixup_new_cold_bb): New routine. (commit_edge_insertions): Invoke fixup_partitions. (find_partition_fixes): New routine. (fixup_partitions): Ditto. (verify_hot_cold_block_grouping): Update comments. (rtl_verify_edges): Invoke find_partition_fixes. (rtl_verify_bb_pointers): Update comments. (rtl_verify_bb_layout): Ditto. * basic-block.h (probably_never_executed_edge_p): Declare. (fixup_partitions): Ditto. * cfgcleanup.c (try_optimize_cfg): Invoke fixup_partitions. * bb-reorder.c (sanitize_hot_paths): New function. (find_rarely_executed_basic_blocks_and_crossing_edges): Invoke sanitize_hot_paths. * predict.c (probably_never_executed_edge_p): New routine. * cfg.c (check_bb_profile): Add partition insanity warnings. Index: cfgrtl.c === --- cfgrtl.c(revision 202021) +++ cfgrtl.c(working copy) @@ -1358,6 +1358,43 @@ fixup_partition_crossing (edge e) } } +/* Called when block BB has been reassigned to the cold partition, + because it is now dominated by another cold block, + to ensure that the region crossing attributes are updated. */ + +static void +fixup_new_cold_bb (basic_block bb) +{ + edge e; + edge_iterator ei; + + /* This is called when a hot bb is found to now be dominated + by a cold bb and therefore needs to become cold. Therefore, + its preds will no longer be region crossing. Any non-dominating + preds that were previously hot would also have become cold + in the caller for the same region. Any preds that were previously + region-crossing will be adjusted in fixup_partition_crossing. */ + FOR_EACH_EDGE (e, ei, bb->preds) +{ + fixup_partition_crossing (e); +} + + /* Possibly need to make bb's successor edges region crossing, + or remove stale region crossing. */ + FOR_EACH_EDGE (e, ei, bb->succs) +{ + /* We can't have fall-through edges across partition boundaries. + Note that force_nonfallthru will do any necessary partition + boundary fixup by calling fixup_partition_crossing itself. 
*/ + if ((e->flags & EDGE_FALLTHRU) + && BB_PARTITION (bb) != BB_PARTITION (e->dest) + && e->dest != EXIT_BLOCK_PTR) +force_nonfallthru (e); + else +fixup_partition_crossing (e); +} +} + /* Attempt to change code to redirect edge E to TARGET. Don't do that on expense of adding new instructions or reordering basic blocks. @@ -1996,6 +2033,14 @@ commit_edge_insertions (void) { basic_block bb; + /* Optimization passes that invoke this routine can cause hot blocks + previously reached by both hot and cold blocks to become dominated only + by cold blocks. This will cause the verification below to fail, + and lead to now cold code in the hot section. In some cases this + may only be vis
Re: [PATCH] Convert more passes to new dump framework
+ Honza On Tue, Aug 27, 2013 at 10:56 AM, Teresa Johnson wrote: > Ping #3. > > Thanks, > Teresa > > On Mon, Aug 19, 2013 at 11:33 AM, Teresa Johnson wrote: >> Ping. >> Thanks, >> Teresa >> >> On Mon, Aug 12, 2013 at 6:54 AM, Teresa Johnson wrote: >>> On Tue, Aug 6, 2013 at 10:23 PM, Teresa Johnson >>> wrote: On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson wrote: > On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor wrote: >> Hi, >> >> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote: >>> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor wrote: >>> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote: >>> >> This patch ports messages to the new dump framework, >>> > >>> > It would be great this new framework was documented somewhere. I lost >>> > track of what was agreed it would be and from the uses in the >>> > vectorizer I was never quite sure how to utilize it in other passes. >>> >>> Cc'ing Sharad who implemented this - Sharad, is this documented on a >>> wiki or elsewhere? >> >> Thanks >> >>> >>> > >>> > I'd also like to point out two other minor things inline: >>> > >>> > [...] >>> > >>> >> 2013-08-06 Teresa Johnson >>> >> Dehao Chen >>> >> >>> >> * dumpfile.c (dump_loc): Add column number to output, make >>> >> newlines >>> >> consistent. >>> >> * dumpfile.h (OPTGROUP_OTHER): Add and enable under >>> >> OPTGROUP_ALL. >>> >> * ipa-inline-transform.c (clone_inlined_nodes): >>> >> (cgraph_node_opt_info): New function. >>> >> (cgraph_node_call_chain): Ditto. >>> >> (dump_inline_decision): Ditto. >>> >> (inline_call): Invoke dump_inline_decision. >>> >> * doc/invoke.texi: Document optall -fopt-info flag. >>> >> * profile.c (read_profile_edge_counts): Use new dump >>> >> framework. >>> >> (compute_branch_probabilities): Ditto. >>> >> * passes.c (pass_manager::register_one_dump_file): Use >>> >> OPTGROUP_OTHER >>> >> when pass not in any opt group. >>> >> * value-prof.c (check_counter): Use new dump framework. >>> >> (find_func_by_funcdef_no): Ditto. 
>>> >> (check_ic_target): Ditto. >>> >> * coverage.c (get_coverage_counts): Ditto. >>> >> (coverage_init): Setup new dump framework. >>> >> * ipa-inline.c (inline_small_functions): Set >>> >> is_in_ipa_inline. >>> >> * ipa-inline.h (is_in_ipa_inline): Declare. >>> >> >>> >> * testsuite/gcc.dg/pr40209.c: Use -fopt-info. >>> >> * testsuite/gcc.dg/pr26570.c: Ditto. >>> >> * testsuite/gcc.dg/pr32773.c: Ditto. >>> >> * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto. >>> >> >>> > >>> > [...] >>> > >>> >> Index: ipa-inline-transform.c >>> >> === >>> >> --- ipa-inline-transform.c (revision 201461) >>> >> +++ ipa-inline-transform.c (working copy) >>> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, >>> >> bool d >>> >> } >>> >> >>> >> >>> >> +#define MAX_INT_LENGTH 20 >>> >> + >>> >> +/* Return NODE's name and profile count, if available. */ >>> >> + >>> >> +static const char * >>> >> +cgraph_node_opt_info (struct cgraph_node *node) >>> >> +{ >>> >> + char *buf; >>> >> + size_t buf_size; >>> >> + const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, >>> >> 0); >>> >> + >>> >> + if (!bfd_name) >>> >> +bfd_name = "unknown"; >>> >> + >>> >> + buf_size = strlen (bfd_name) + 1; >>> >> + if (profile_info) >>> >> +buf_size += (MAX_INT_LENGTH + 3); >>> >> + >>> >> + buf = (char *) xmalloc (buf_size); >>> >> + >>> >> + strcpy (buf, bfd_name); >>> >> + >>> >> + if (profile_info) >>> >> +sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, >>> >> node->count); >>> >> + return buf; >>> >> +} >>> > >>> > I'm not sure if output of this function is aimed only at the user or >>> > if it is supposed to be used by gcc developers as well. If the >>> > latter, an incredibly useful thing is to also dump node->symbol.order >>> > too. We usually dump it after "/" sign separating it from node name. 
>>> > It is invaluable when examining decisions in C++ code where you can >>> > have lots of clones of a node (and also because existing dumps print >>> > it, it is easy to combine them). >>> >>> The output is useful for both power users doing performance tuning of >>> their application, and by gcc developers. Adding the id is not so >>> useful for the former, but I agree that it i
[PING][PATCH] Fix PR58139 by correctly initializing reg_raw_mode[]
I'd like to ping the following patch which fixes a wrong code bug on powerpc64-linux due to a lost dependency within the scheduler: Fix PR58139 by correctly initializing reg_raw_mode[] http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00910.html H.J., can you please try and SPEC test this patch on Intel to make sure we don't regress you? Thanks. Peter
Re: [PATCH] Convert more passes to new dump framework
Ping #3. Thanks, Teresa On Mon, Aug 19, 2013 at 11:33 AM, Teresa Johnson wrote: > Ping. > Thanks, > Teresa > > On Mon, Aug 12, 2013 at 6:54 AM, Teresa Johnson wrote: >> On Tue, Aug 6, 2013 at 10:23 PM, Teresa Johnson wrote: >>> On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson wrote: On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor wrote: > Hi, > > On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote: >> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor wrote: >> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote: >> >> This patch ports messages to the new dump framework, >> > >> > It would be great this new framework was documented somewhere. I lost >> > track of what was agreed it would be and from the uses in the >> > vectorizer I was never quite sure how to utilize it in other passes. >> >> Cc'ing Sharad who implemented this - Sharad, is this documented on a >> wiki or elsewhere? > > Thanks > >> >> > >> > I'd also like to point out two other minor things inline: >> > >> > [...] >> > >> >> 2013-08-06 Teresa Johnson >> >> Dehao Chen >> >> >> >> * dumpfile.c (dump_loc): Add column number to output, make >> >> newlines >> >> consistent. >> >> * dumpfile.h (OPTGROUP_OTHER): Add and enable under >> >> OPTGROUP_ALL. >> >> * ipa-inline-transform.c (clone_inlined_nodes): >> >> (cgraph_node_opt_info): New function. >> >> (cgraph_node_call_chain): Ditto. >> >> (dump_inline_decision): Ditto. >> >> (inline_call): Invoke dump_inline_decision. >> >> * doc/invoke.texi: Document optall -fopt-info flag. >> >> * profile.c (read_profile_edge_counts): Use new dump >> >> framework. >> >> (compute_branch_probabilities): Ditto. >> >> * passes.c (pass_manager::register_one_dump_file): Use >> >> OPTGROUP_OTHER >> >> when pass not in any opt group. >> >> * value-prof.c (check_counter): Use new dump framework. >> >> (find_func_by_funcdef_no): Ditto. >> >> (check_ic_target): Ditto. >> >> * coverage.c (get_coverage_counts): Ditto. 
>> >> (coverage_init): Setup new dump framework. >> >> * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline. >> >> * ipa-inline.h (is_in_ipa_inline): Declare. >> >> >> >> * testsuite/gcc.dg/pr40209.c: Use -fopt-info. >> >> * testsuite/gcc.dg/pr26570.c: Ditto. >> >> * testsuite/gcc.dg/pr32773.c: Ditto. >> >> * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto. >> >> >> > >> > [...] >> > >> >> Index: ipa-inline-transform.c >> >> === >> >> --- ipa-inline-transform.c (revision 201461) >> >> +++ ipa-inline-transform.c (working copy) >> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, >> >> bool d >> >> } >> >> >> >> >> >> +#define MAX_INT_LENGTH 20 >> >> + >> >> +/* Return NODE's name and profile count, if available. */ >> >> + >> >> +static const char * >> >> +cgraph_node_opt_info (struct cgraph_node *node) >> >> +{ >> >> + char *buf; >> >> + size_t buf_size; >> >> + const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, >> >> 0); >> >> + >> >> + if (!bfd_name) >> >> +bfd_name = "unknown"; >> >> + >> >> + buf_size = strlen (bfd_name) + 1; >> >> + if (profile_info) >> >> +buf_size += (MAX_INT_LENGTH + 3); >> >> + >> >> + buf = (char *) xmalloc (buf_size); >> >> + >> >> + strcpy (buf, bfd_name); >> >> + >> >> + if (profile_info) >> >> +sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, >> >> node->count); >> >> + return buf; >> >> +} >> > >> > I'm not sure if output of this function is aimed only at the user or >> > if it is supposed to be used by gcc developers as well. If the >> > latter, an incredibly useful thing is to also dump node->symbol.order >> > too. We usually dump it after "/" sign separating it from node name. >> > It is invaluable when examining decisions in C++ code where you can >> > have lots of clones of a node (and also because existing dumps print >> > it, it is easy to combine them). 
>> >> The output is useful for both power users doing performance tuning of >> their application, and by gcc developers. Adding the id is not so >> useful for the former, but I agree that it is very useful for compiler >> developers. In fact, in the google branch version we emit more verbose >> information (the lipo module id and the funcdef_no) to help uniquely >> identify the routines and
wide-int branch updated.
removed all knowledge of SHIFT_COUNT_TRUNCATED from wide-int both Richard Biener and Richard Sandiford had commented negatively about this. fixed bug with wide-int::fits_uhwi_p. kenny Index: gcc/fold-const.c === --- gcc/fold-const.c (revision 201985) +++ gcc/fold-const.c (working copy) @@ -992,9 +992,9 @@ int_const_binop_1 (enum tree_code code, /* It's unclear from the C standard whether shifts can overflow. The following code ignores overflow; perhaps a C standard interpretation ruling is needed. */ - res = op1.rshift (arg2, sign, GET_MODE_BITSIZE (TYPE_MODE (type)), TRUNC); + res = op1.rshift (arg2, sign, GET_MODE_BITSIZE (TYPE_MODE (type))); else - res = op1.lshift (arg2, GET_MODE_BITSIZE (TYPE_MODE (type)), TRUNC); + res = op1.lshift (arg2, GET_MODE_BITSIZE (TYPE_MODE (type))); break; case RROTATE_EXPR: Index: gcc/simplify-rtx.c === --- gcc/simplify-rtx.c (revision 201985) +++ gcc/simplify-rtx.c (working copy) @@ -3507,9 +3507,7 @@ rtx simplify_const_binary_operation (enum rtx_code code, enum machine_mode mode, rtx op0, rtx op1) { -#if TARGET_SUPPORTS_WIDE_INT == 0 unsigned int width = GET_MODE_PRECISION (mode); -#endif if (VECTOR_MODE_P (mode) && code != VEC_CONCAT @@ -3787,40 +3785,45 @@ simplify_const_binary_operation (enum rt break; case LSHIFTRT: - if (wide_int (std::make_pair (op1, mode)).neg_p ()) - return NULL_RTX; - - result = wop0.rshiftu (pop1, bitsize, TRUNC); - break; - case ASHIFTRT: - if (wide_int (std::make_pair (op1, mode)).neg_p ()) - return NULL_RTX; - - result = wop0.rshifts (pop1, bitsize, TRUNC); - break; - case ASHIFT: - if (wide_int (std::make_pair (op1, mode)).neg_p ()) - return NULL_RTX; - - result = wop0.lshift (pop1, bitsize, TRUNC); - break; - case ROTATE: - if (wide_int (std::make_pair (op1, mode)).neg_p ()) - return NULL_RTX; - - result = wop0.lrotate (pop1); - break; - case ROTATERT: - if (wide_int (std::make_pair (op1, mode)).neg_p ()) - return NULL_RTX; + { + wide_int wop1 = pop1; + if (wop1.neg_p ()) + return NULL_RTX; - 
result = wop0.rrotate (pop1); - break; + if (SHIFT_COUNT_TRUNCATED) + wop1 = wop1.umod_trunc (width); + switch (code) + { + case LSHIFTRT: + result = wop0.rshiftu (wop1, bitsize); + break; + + case ASHIFTRT: + result = wop0.rshifts (wop1, bitsize); + break; + + case ASHIFT: + result = wop0.lshift (wop1, bitsize); + break; + + case ROTATE: + result = wop0.lrotate (wop1); + break; + + case ROTATERT: + result = wop0.rrotate (wop1); + break; + + default: + gcc_unreachable (); + } + break; + } default: return NULL_RTX; } Index: gcc/wide-int.cc === --- gcc/wide-int.cc (revision 201985) +++ gcc/wide-int.cc (working copy) @@ -2404,7 +2404,12 @@ wide_int_ro::divmod_internal_2 (unsigned /* Do a truncating divide DIVISOR into DIVIDEND. The result is the - same size as the operands. SIGN is either SIGNED or UNSIGNED. */ + same size as the operands. SIGN is either SIGNED or UNSIGNED. If + COMPUTE_QUOTIENT is set, the quotient is computed and returned. If + it is not set, the result is undefined. If COMPUTE_REMAINDER is + set, the remaineder is returned in remainder. If it is not set, + the remainder is undefined. If OFLOW is not null, it is set to the + overflow value. */ wide_int_ro wide_int_ro::divmod_internal (bool compute_quotient, const HOST_WIDE_INT *dividend, @@ -2470,6 +2475,10 @@ wide_int_ro::divmod_internal (bool compu quotient.precision = dividend_prec; remainder->precision = dividend_prec; + /* Initialize the incoming overflow if it has been provided. */ + if (oflow) +*oflow = false; + /* If overflow is set, just get out. There will only be grief by continuing. */ if (overflow) Index: gcc/wide-int.h === --- gcc/wide-int.h (revision 201995) +++ gcc/wide-int.h (working copy) @@ -259,19 +259,6 @@ const int addr_max_bitsize = 64; const int addr_max_precision = ((addr_max_bitsize + 4 + HOST_BITS_PER_WIDE_INT - 1) & ~(HOST_BITS_PER_WIDE_INT - 1)); -enum ShiftOp { - NONE, - /* There are two uses for the wide-int shifting functions. 
The - first use is as an emulation of the target hardware. The - second use is as service routines for other optimizations. The - first case needs to be identified by passing TRUNC as the value - of ShiftOp so that shift amount is properly handled according to the - SHIFT_COUNT_TRUNCATED flag. For the second case, the shift - amount is always truncated by the bytesize of the mode of - THIS. */ - TRUNC -}; - /*
Re: [PATCH, i386] PR 57927: -march=core-avx2 different than -march=native on INTEL Haswell (i7-4700K)
On Tue, Aug 27, 2013 at 7:36 PM, H.J. Lu wrote:
>>> As reported in [1] the host processor detection has not yet been updated
>>> to recognize Intel Ivy Bridge and Haswell processors.
>>> This small patch adds the detection of these processors and assumes
>>> core-avx2 as march for unknown processors of the PENTIUMPRO family that
>>> support AVX2.
>>
>> I have committed slightly improved (attached) patch that uses
>> core-avx-i for IvyBridge and adds another IvyBridge model number.
>> While there, I also reordered a bunch of statements.
>>
>> Thanks,
>> Uros.
>
> Page C-3 in the Intel optimization guide shows:
>
> 06_3CH, 06_45H, 06_46H          Intel microarchitecture Haswell
> 06_3AH, 06_3EH                  Intel microarchitecture Ivy Bridge
> 06_2AH, 06_2DH                  Intel microarchitecture Sandy Bridge
> 06_25H, 06_2CH, 06_2FH          Intel microarchitecture Westmere
> 06_1AH, 06_1EH, 06_1FH, 06_2EH  Intel microarchitecture Nehalem
> 06_17H, 06_1DH                  Enhanced Intel Core microarchitecture
> 06_0FH                          Intel Core microarchitecture
>
> At least, we should add 0x45 and 0x46 to Haswell.

OK, the patch is pre-approved.

Thanks,
Uros.
Re: [PATCH, i386] PR 57927: -march=core-avx2 different than -march=native on INTEL Haswell (i7-4700K)
On Tue, Aug 27, 2013 at 10:28 AM, Uros Bizjak wrote:
> Hello!
>
>> As reported in [1] the host processor detection has not yet been updated
>> to recognize Intel Ivy Bridge and Haswell processors.
>> This small patch adds the detection of these processors and assumes
>> core-avx2 as march for unknown processors of the PENTIUMPRO family that
>> support AVX2.
>
> I have committed slightly improved (attached) patch that uses
> core-avx-i for IvyBridge and adds another IvyBridge model number.
> While there, I also reordered a bunch of statements.
>
> Thanks,
> Uros.

Page C-3 in the Intel optimization guide shows:

06_3CH, 06_45H, 06_46H          Intel microarchitecture Haswell
06_3AH, 06_3EH                  Intel microarchitecture Ivy Bridge
06_2AH, 06_2DH                  Intel microarchitecture Sandy Bridge
06_25H, 06_2CH, 06_2FH          Intel microarchitecture Westmere
06_1AH, 06_1EH, 06_1FH, 06_2EH  Intel microarchitecture Nehalem
06_17H, 06_1DH                  Enhanced Intel Core microarchitecture
06_0FH                          Intel Core microarchitecture

At least, we should add 0x45 and 0x46 to Haswell.

--
H.J.
Re: [PATCH, i386] PR 57927: -march=core-avx2 different than -march=native on INTEL Haswell (i7-4700K)
Hello!

> As reported in [1] the host processor detection has not yet been updated
> to recognize Intel Ivy Bridge and Haswell processors.
> This small patch adds the detection of these processors and assumes
> core-avx2 as march for unknown processors of the PENTIUMPRO family that
> support AVX2.

I have committed slightly improved (attached) patch that uses
core-avx-i for IvyBridge and adds another IvyBridge model number.
While there, I also reordered a bunch of statements.

Thanks,
Uros.

Index: driver-i386.c
===================================================================
--- driver-i386.c	(revision 202024)
+++ driver-i386.c	(working copy)
@@ -644,13 +644,18 @@ const char *host_detect_local_cpu (int argc, const
 	  /* Atom.  */
 	  cpu = "atom";
 	  break;
+	case 0x0f:
+	  /* Merom.  */
+	case 0x17:
+	case 0x1d:
+	  /* Penryn.  */
+	  cpu = "core2";
+	  break;
 	case 0x1a:
 	case 0x1e:
 	case 0x1f:
 	case 0x2e:
 	  /* Nehalem.  */
-	  cpu = "corei7";
-	  break;
 	case 0x25:
 	case 0x2c:
 	case 0x2f:
@@ -662,20 +667,23 @@ const char *host_detect_local_cpu (int argc, const
 	  /* Sandy Bridge.  */
 	  cpu = "corei7-avx";
 	  break;
-	case 0x17:
-	case 0x1d:
-	  /* Penryn.  */
-	  cpu = "core2";
+	case 0x3a:
+	case 0x3e:
+	  /* Ivy Bridge.  */
+	  cpu = "core-avx-i";
 	  break;
-	case 0x0f:
-	  /* Merom.  */
-	  cpu = "core2";
+	case 0x3c:
+	  /* Haswell.  */
+	  cpu = "core-avx2";
 	  break;
 	default:
 	  if (arch)
 	    {
 	      /* This is unknown family 0x6 CPU.  */
-	      if (has_avx)
+	      if (has_avx2)
+		/* Assume Haswell.  */
+		cpu = "core-avx2";
+	      else if (has_avx)
 		/* Assume Sandy Bridge.  */
 		cpu = "corei7-avx";
 	      else if (has_sse4_2)
Re: [PATCH 3/3] Support dumping type bindings in lambda diagnostics.
This patch doesn't seem to depend on the others; go ahead and apply it. Jason
Re: RFA: fix some avr stdint issues
Joern Rennecke wrote:
> This patch fixes the gcc.dg/c99-stdint-5.c and gcc.dg/c99-stdint-6.c
> excess error failures.

FYI, some of the problems with the c99-stdint tests are related to the
stdint.h implementation in use.  For example, some types in AVR-Libc's
stdint.h are not defined in the way the GCC test suite expects them:
AVR-Libc defines most C99 types by means of attribute mode.

https://savannah.nongnu.org/bugs/?34695#comment7

Johann
Re: [PING 3] [Patch RX] Add assembler option "-mcu" for generating assembler
Hi Sandeep,

> gas/config:
> 2013-07-18  Sandeep Kumar Singh
>
> 	* rx.h: Add option -mcpu for target variants RX100 and RX200.

Approved - please apply.

Cheers
Nick
Re: RFA: AVR: Support building AVR Linux targets
Hi Joseph,

> C. Draw up another patch that restricts the AVR patterns in config.gcc
>    to -none and -elf.
>
> A and C - I think both changes should be applied.

OK - the patch for item A is already applied.  Here is a proposed patch
for item C.  I have not applied the patch as obvious because I was not
sure whether it was better to accept avr-*-elf or avr-*-elf*.  I went
for the latter in case there were AVR ELF variants, but I defer to your
superior knowledge.

OK to apply ?

Cheers
Nick

gcc/ChangeLog
2013-08-27  Nick Clifton

	* config.gcc (AVR): Restrict allowed targets to RTEMS, ELF and
	NONE.

Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 202021)
+++ gcc/config.gcc	(working copy)
@@ -995,7 +995,7 @@
 	extra_gcc_objs="driver-avr.o avr-devices.o"
 	extra_objs="avr-devices.o avr-log.o"
 	;;
-avr-*-*)
+avr-*-elf* | avr-*-none)
 	tm_file="elfos.h avr/elf.h avr/avr-arch.h avr/avr.h dbxelf.h avr/avr-stdint.h"
 	if test x${with_avrlibc} != xno; then
 		tm_file="${tm_file} ${cpu_type}/avrlibc.h"
Re: v2 of GDB hooks for debugging GCC
> "David" == David Malcolm writes: David> Is there a precanned event provided by gdb that I can connect to for David> when the underlying code has changed and my caches need to be David> invalidated? Maybe not :( You could use the "exited" event as a decent approximation. Also, and I think we're really getting into the obscure stuff here, if you want to let users debug multiple versions of gcc at the same time, then your caches have to be parameterized by the inferior. I'm reasonably confident that nobody actually does this. Tom
Re: v3 of GDB hooks for debugging GCC
On 08/26/2013 12:42 PM, David Malcolm wrote:
> The patch also adds me as a maintainer of gdbhooks.py in the MAINTAINERS
> file.  (There doesn't seem to be any sort order to the maintainer part
> of that file, should there be?)
>
> Finally, I added a copyright header to the new file ("part of GCC", FSF
> assignee, GPLv3 or later).
>
> OK for trunk?

Yes.  This is good.

Thanks,
jeff
Re: [GOOGLE] Update AutoFDO annotation
Ok. David On Tue, Aug 27, 2013 at 7:36 AM, Dehao Chen wrote: > Patch updated. > > Thanks, > Dehao > > On Mon, Aug 26, 2013 at 4:11 PM, Xinliang David Li wrote: >> Can you add missing documentation on functions like ...:get_count_info >> -- documenting return value etc. Also it might be better to avoid >> using 'set' as the local variable name. Change it to something more >> specific. >> >> thanks, >> >> David >> >> On Thu, Aug 22, 2013 at 3:56 PM, Dehao Chen wrote: >>> This patch has 2 changes: >>> >>> 1. Now that we have discriminator for inlined callsite, we don't need >>> special handling for callsite location any more. >>> 2. If a source line is mapped to multiple BBs, only the first BB will >>> be annotated. >>> 3. Before actual annotation, mark everythin BB/edge as not annotated. >>> >>> Bootstrapped and passed regression test. >>> >>> OK for google branch? >>> >>> Thanks, >>> Dehao >>> >>> Index: gcc/auto-profile.c >>> === >>> --- gcc/auto-profile.c (revision 201927) >>> +++ gcc/auto-profile.c (working copy) >>> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3. If not see >>> #include >>> #include >>> #include >>> +#include >>> >>> #include "config.h" >>> #include "system.h" >>> @@ -100,6 +101,10 @@ typedef std::map icall_target >>> (execution_count, value_profile_histogram). */ >>> typedef std::pair count_info; >>> >>> +/* Set of inline_stack. Used to track if the profile is already used to >>> + annotate the program. */ >>> +typedef std::set location_set; >>> + >>> struct string_compare >>> { >>>bool operator() (const char *a, const char *b) const >>> @@ -202,7 +207,8 @@ class autofdo_source_profile { >>>const function_instance *get_function_instance_by_decl (tree decl) const; >>>/* Find profile info for a given gimple STMT. If found, store the profile >>> info in INFO, and return true; otherwise return false. 
*/ >>> - bool get_count_info (gimple stmt, count_info *info) const; >>> + bool get_count_info (gimple stmt, count_info *info, >>> + const location_set *set) const; >>>/* Find total count of the callee of EDGE. */ >>>gcov_type get_callsite_total_count (struct cgraph_edge *edge) const; >>> >>> @@ -284,17 +290,13 @@ static const char *get_original_name (const char * >>> >>> /* Return the combined location, which is a 32bit integer in which >>> higher 16 bits stores the line offset of LOC to the start lineno >>> - of DECL, The lower 16 bits stores the discrimnator of LOC if >>> - USE_DISCR is true, otherwise 0. */ >>> + of DECL, The lower 16 bits stores the discrimnator. */ >>> >>> static unsigned >>> -get_combined_location (location_t loc, tree decl, bool use_discr) >>> +get_combined_location (location_t loc, tree decl) >>> { >>> - if (use_discr) >>> -return ((LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl)) << 16) >>> - | get_discriminator_from_locus (loc); >>> - else >>> -return (LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl)) << 16; >>> + return ((LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl)) << 16) >>> + | get_discriminator_from_locus (loc); >>> } >>> >>> /* Return the function decl of a given lexical BLOCK. 
*/ >>> @@ -316,7 +318,7 @@ get_function_decl_from_block (tree block) >>> } >>> >>> static void >>> -get_inline_stack (gimple stmt, bool use_discr, inline_stack *stack) >>> +get_inline_stack (gimple stmt, inline_stack *stack) >>> { >>>location_t locus = gimple_location (stmt); >>>if (LOCATION_LOCUS (locus) == UNKNOWN_LOCATION) >>> @@ -337,14 +339,13 @@ static void >>> >>>tree decl = get_function_decl_from_block (block); >>>stack->push_back (std::make_pair ( >>> - decl, get_combined_location (locus, decl, level == 0 && use_discr))); >>> + decl, get_combined_location (locus, decl))); >>>locus = tmp_locus; >>>level++; >>> } >>>stack->push_back (std::make_pair ( >>>current_function_decl, >>> - get_combined_location (locus, current_function_decl, >>> - level == 0 && use_discr))); >>> + get_combined_location (locus, current_function_decl))); >>> } >>> >>> >>> @@ -523,14 +524,16 @@ const function_instance *autofdo_source_profile::g >>>return ret == map_.end() ? NULL : ret->second; >>> } >>> >>> -bool autofdo_source_profile::get_count_info (gimple stmt, >>> - count_info *info) const >>> +bool autofdo_source_profile::get_count_info (gimple stmt, count_info *info, >>> + const location_set *set) const >>> { >>>if (LOCATION_LOCUS (gimple_location (stmt)) == cfun->function_end_locus) >>> return false; >>> >>>inline_stack stack; >>> - get_inline_stack (stmt, true, &stack); >>> + get_inline_stack (stmt, &stack); >>> + if (set && set->find(stack) != set->end()) >>> +return false; >>>if (stack.size () == 0) >>> return false; >>>const function_instance *s = get_function_instance_by_inline_stack >>> (stack); >>> @@ -544,7 +
Re: [PATCH 1/2] Convert symtab, cgraph and varpool nodes into a real class hierarchy
On Aug 27, 2013, at 4:08 AM, Richard Biener wrote:
>> and converts:
>>   struct GTY(()) cgraph_node
>> to:
>>   struct GTY((user)) cgraph_node : public symtab_node_base

GTY didn't like single inheritance for me in wide-int.h.  I extended GTY
to support it better.  See the wide-int branch, if you need more beef
here.  It isn't flawless, but it did let me do things to make it work
nicely enough for my purposes.  In my case, all my data is in the base
class, the base class and the derived class have the same address
(single inheritance, no virtual bases), and I have no embedded pointers
in my data.  I don't use user; it seemed less ideal to me.

I also had to put:

void gt_ggc_mx(max_wide_int*) { }
void gt_pch_nx(max_wide_int*,void (*)(void*, void*), void*) { }
void gt_pch_nx(max_wide_int*) { }

in wide-int.cc and:

extern void gt_ggc_mx(max_wide_int*);
extern void gt_pch_nx(max_wide_int*,void (*)(void*, void*), void*);
extern void gt_pch_nx(max_wide_int*);

into wide-int.h to get it to go.  These I think are merely because I'm
using a typedef for a template with template args.

In the wide-int branch, it'd be nice if a GTY god could figure out how
to improve things so I don't need to do the above, and review/approve/fix
the GTY code to support single inheritance even better.

>> and:
>>   struct GTY(()) varpool_node
>> to:
>>   class GTY((user)) varpool_node : public symtab_node_base
>>
>> dropping the symtab_node_def union.
>>
>> Since gengtype is unable to cope with inheritance, we have to mark the
>> types with GTY((user)), and handcode the gty field-visiting functions.
>> Given the simple hierarchy, we don't need virtual functions for this.
Re: [GOOGLE] Update AutoFDO annotation
Patch updated. Thanks, Dehao On Mon, Aug 26, 2013 at 4:11 PM, Xinliang David Li wrote: > Can you add missing documentation on functions like ...:get_count_info > -- documenting return value etc. Also it might be better to avoid > using 'set' as the local variable name. Change it to something more > specific. > > thanks, > > David > > On Thu, Aug 22, 2013 at 3:56 PM, Dehao Chen wrote: >> This patch has 3 changes: >> >> 1. Now that we have discriminator for inlined callsite, we don't need >> special handling for callsite location any more. >> 2. If a source line is mapped to multiple BBs, only the first BB will >> be annotated. >> 3. Before actual annotation, mark every BB/edge as not annotated. >> >> Bootstrapped and passed regression test. >> >> OK for google branch? >> >> Thanks, >> Dehao >> >> Index: gcc/auto-profile.c >> === >> --- gcc/auto-profile.c (revision 201927) >> +++ gcc/auto-profile.c (working copy) >> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3. If not see >> #include >> #include >> #include >> +#include >> >> #include "config.h" >> #include "system.h" >> @@ -100,6 +101,10 @@ typedef std::map icall_target >> (execution_count, value_profile_histogram). */ >> typedef std::pair count_info; >> >> +/* Set of inline_stack. Used to track if the profile is already used to >> + annotate the program. */ >> +typedef std::set location_set; >> + >> struct string_compare >> { >>bool operator() (const char *a, const char *b) const >> @@ -202,7 +207,8 @@ class autofdo_source_profile { >>const function_instance *get_function_instance_by_decl (tree decl) const; >>/* Find profile info for a given gimple STMT. If found, store the profile >> info in INFO, and return true; otherwise return false. */ >> - bool get_count_info (gimple stmt, count_info *info) const; >> + bool get_count_info (gimple stmt, count_info *info, >> + const location_set *set) const; >>/* Find total count of the callee of EDGE. 
*/ >>gcov_type get_callsite_total_count (struct cgraph_edge *edge) const; >> >> @@ -284,17 +290,13 @@ static const char *get_original_name (const char * >> >> /* Return the combined location, which is a 32bit integer in which >> higher 16 bits stores the line offset of LOC to the start lineno >> - of DECL, The lower 16 bits stores the discrimnator of LOC if >> - USE_DISCR is true, otherwise 0. */ >> + of DECL, The lower 16 bits stores the discrimnator. */ >> >> static unsigned >> -get_combined_location (location_t loc, tree decl, bool use_discr) >> +get_combined_location (location_t loc, tree decl) >> { >> - if (use_discr) >> -return ((LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl)) << 16) >> - | get_discriminator_from_locus (loc); >> - else >> -return (LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl)) << 16; >> + return ((LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl)) << 16) >> + | get_discriminator_from_locus (loc); >> } >> >> /* Return the function decl of a given lexical BLOCK. */ >> @@ -316,7 +318,7 @@ get_function_decl_from_block (tree block) >> } >> >> static void >> -get_inline_stack (gimple stmt, bool use_discr, inline_stack *stack) >> +get_inline_stack (gimple stmt, inline_stack *stack) >> { >>location_t locus = gimple_location (stmt); >>if (LOCATION_LOCUS (locus) == UNKNOWN_LOCATION) >> @@ -337,14 +339,13 @@ static void >> >>tree decl = get_function_decl_from_block (block); >>stack->push_back (std::make_pair ( >> - decl, get_combined_location (locus, decl, level == 0 && use_discr))); >> + decl, get_combined_location (locus, decl))); >>locus = tmp_locus; >>level++; >> } >>stack->push_back (std::make_pair ( >>current_function_decl, >> - get_combined_location (locus, current_function_decl, >> - level == 0 && use_discr))); >> + get_combined_location (locus, current_function_decl))); >> } >> >> >> @@ -523,14 +524,16 @@ const function_instance *autofdo_source_profile::g >>return ret == map_.end() ? 
NULL : ret->second; >> } >> >> -bool autofdo_source_profile::get_count_info (gimple stmt, >> - count_info *info) const >> +bool autofdo_source_profile::get_count_info (gimple stmt, count_info *info, >> + const location_set *set) const >> { >>if (LOCATION_LOCUS (gimple_location (stmt)) == cfun->function_end_locus) >> return false; >> >>inline_stack stack; >> - get_inline_stack (stmt, true, &stack); >> + get_inline_stack (stmt, &stack); >> + if (set && set->find(stack) != set->end()) >> +return false; >>if (stack.size () == 0) >> return false; >>const function_instance *s = get_function_instance_by_inline_stack >> (stack); >> @@ -544,7 +547,7 @@ gcov_type autofdo_source_profile::get_callsite_tot >> { >>inline_stack stack; >>stack.push_back (std::make_pair(edge->callee->symbol.decl, 0)); >> - get_inline_stack (edge->call_stmt,
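The combined-location encoding the patch settles on (line offset from the function decl's starting line in the upper 16 bits, discriminator in the lower 16) can be sketched as plain bit twiddling. The helper names below are illustrative, not GCC's actual API:

```cpp
#include <cassert>
#include <cstdint>

// Pack a line offset and a discriminator into one 32-bit word, as the
// patch's get_combined_location does after dropping the use_discr flag.
inline uint32_t combine_location (uint32_t line, uint32_t decl_start_line,
                                  uint32_t discriminator)
{
  return ((line - decl_start_line) << 16) | (discriminator & 0xffff);
}

// Recover the two halves of the encoding.
inline uint32_t location_line_offset (uint32_t combined)
{
  return combined >> 16;
}

inline uint32_t location_discriminator (uint32_t combined)
{
  return combined & 0xffff;
}
```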
Re: Request to merge Undefined Behavior Sanitizer in (take 3)
On Thu, Aug 22, 2013 at 09:01:57PM +0200, Marek Polacek wrote: > On Thu, Aug 22, 2013 at 07:51:07PM +0200, Marek Polacek wrote: > > Ping. > > I'm withdrawing the ping for now. I'll have to deal with some bootstrap > comparison failures first (ugh!). Fixed with this patch: http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01344.html Also, I've changed the hash table to pointer map: http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01582.html Regtested, ran bootstrap-ubsan on x86_64-linux. Ok to merge ubsan into trunk? Marek
Re: [PING PATCH, PR 57748] Set mode of structures with zero sized arrays to be BLK
On Tue, Aug 27, 2013 at 04:03:42PM +0200, Martin Jambor wrote: > On Fri, Aug 23, 2013 at 05:29:23PM +0200, Jakub Jelinek wrote: > > On Fri, Aug 23, 2013 at 05:11:22PM +0200, Martin Jambor wrote: > > > Hi Jakub and/or Joseph, > > > > > > the reporter of this bug seems to be very anxious to have it fixed in > > > the repository. While Richi is away, do you think you could have a > > > look? It is very small. > > > > Isn't this ABI incompatible change (at least potential on various targets)? > > If so, then it shouldn't be applied to release branches, because it would > > create (potential?) ABI incompatibility between 4.8.[01] and 4.8.2. > > > > I don't know. I did a few attempts to observe a change in the calling > convention of a function accepting a zero sized array terminated > structure by value (on x86_64) but I did not succeed. On the other > hand, I assume there are many other ways how a mode can influence ABI. > So I'll try to have a look whether we can hack around this situation > in 4.8's expr.c instead. All I remember is that e.g. the number of regressions from PR20020 was big, and any kind of TYPE_MODE changes are just extremely risky. Perhaps x86_64 never uses the mode in its ABI decisions, but we have tons of other targets and it would surprise me if none of those were affected. > Nevertheless, is this patch ok for trunk? I'll defer that to Richard now that he is back ;) Jakub
Re: [PING PATCH, PR 57748] Set mode of structures with zero sized arrays to be BLK
Hi, On Fri, Aug 23, 2013 at 05:29:23PM +0200, Jakub Jelinek wrote: > On Fri, Aug 23, 2013 at 05:11:22PM +0200, Martin Jambor wrote: > > Hi Jakub and/or Joseph, > > > > the reporter of this bug seems to be very anxious to have it fixed in > > the repository. While Richi is away, do you think you could have a > > look? It is very small. > > Isn't this ABI incompatible change (at least potential on various targets)? > If so, then it shouldn't be applied to release branches, because it would > create (potential?) ABI incompatibility between 4.8.[01] and 4.8.2. > I don't know. I made a few attempts to observe a change in the calling convention of a function accepting a zero sized array terminated structure by value (on x86_64) but I did not succeed. On the other hand, I assume there are many other ways in which a mode can influence the ABI. So I'll try to have a look at whether we can hack around this situation in 4.8's expr.c instead. Nevertheless, is this patch ok for trunk? Thanks, Martin
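For reference, the kind of type at issue, a structure terminated by a zero-sized array, looks like the sketch below (the zero-length array is a GNU extension; whether such a struct is given an integer mode or BLKmode is exactly the ABI question debated above):

```cpp
#include <cstddef>

// A struct terminated by a zero-length trailing array (GNU extension).
// The trailing array contributes no size, so sizeof covers only the
// leading members; the data is expected to be allocated past the end.
struct zsa_terminated
{
  int len;
  int data[0];   // GNU zero-length array extension
};

inline std::size_t zsa_size ()
{
  return sizeof (struct zsa_terminated);
}
```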
[C++ Patch] PR 13981
Hi, in this very old issue the submitter requested some sort of hint in the diagnostic that the specific problem with the testcase has to do with class type B being incomplete. I think an inform can do. I feared that we would provide too many "false positives", in the sense that the inform would often not be that useful, depending on the target type, but in fact I tried and it *never* (X) triggers in our whole testsuite, thus I'm not that worried anymore ;) Tested x86_64-linux. Thanks, Paolo. (X) Without the class type check it would trigger 2 times in g++.old-deja for arrays of unknown bound. I think we don't want those. /cp 2013-08-27 Paolo Carlini PR c++/13981 * typeck.c (convert_for_assignment): Provide an inform for pointers to incomplete class types. /testsuite 2013-08-27 Paolo Carlini PR c++/13981 * g++.dg/diagnostic/pr13981.C: New. Index: cp/typeck.c === --- cp/typeck.c (revision 202020) +++ cp/typeck.c (working copy) @@ -7983,6 +7983,11 @@ convert_for_assignment (tree type, tree rhs, default: gcc_unreachable(); } + if (TYPE_PTR_P (rhstype) + && CLASS_TYPE_P (TREE_TYPE (rhstype)) + && !COMPLETE_TYPE_P (TREE_TYPE (rhstype))) + inform (input_location, "class type %qT is incomplete", + TREE_TYPE (rhstype)); } return error_mark_node; } Index: testsuite/g++.dg/diagnostic/pr13981.C === --- testsuite/g++.dg/diagnostic/pr13981.C (revision 0) +++ testsuite/g++.dg/diagnostic/pr13981.C (working copy) @@ -0,0 +1,12 @@ +// PR c++/13981 + +struct A {}; +struct B; + +void func( A *a ); + +int main() +{ + B *b = 0; + func(b); // { dg-error "cannot convert" } +} // { dg-message "is incomplete" "is incomplete" { target *-*-* } 11 }
[Patch, Fortran, F03] PR 55603: Memory leak with scalar allocatable function result
Hi all, here is a patch for PR 55603, which plugs a memory leak with scalar allocatable function results. To accomplish this, several things are done: 1) Allocatable scalar function results are passed as an argument now and returned by reference (just like array or character results, cf. gfc_return_by_reference). 2) A temporary variable is inserted when calling a function with an allocatable scalar result (see gfc_conv_procedure_call). 3) The temporary is nullified before and freed after the call (which is of course what plugs the memory leak in the end). The first point requires some additional adjustments (see create_function_arglist, gfc_trans_deferred_vars and gfc_conv_variable). In fact the patch is just a first step and does not handle more advanced cases yet (like polymorphic allocatable scalar results, finalization, etc). However, I wanted to get it out already, in order to get some feedback on my approach before moving on. Also Tobias mentioned that he has a draft patch for this PR, so he may have some additions to it. In any case, the patch fixes the original test case by Damian and regtests cleanly on x86_64-unknown-linux-gnu. Ok for trunk? Cheers, Janus 2013-08-27 Janus Weil PR fortran/55603 * trans-decl.c (create_function_arglist, gfc_trans_deferred_vars): Pass allocatable scalar results as argument. * trans-expr.c (gfc_conv_variable): Extra dereferencing for allocatable scalar results. (gfc_conv_procedure_call): Pass allocatable scalar results as argument. Create temporary, nullify and deallocate. * trans-types.c (gfc_return_by_reference): Treat allocatable scalars. 2013-08-27 Janus Weil PR fortran/55603 * gfortran.dg/allocatable_scalar_result_1.f90: New. [Attachments: pr55603.diff, allocatable_scalar_result_1.f90]
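The caller-side protocol the patch introduces (create a temporary, nullify it before the call, pass it by reference, free it afterwards) can be sketched in C-style code. All names here are illustrative, not the code gfortran actually generates:

```cpp
#include <cstdlib>

// Callee: allocates its result and returns it through a hidden
// reference argument, mirroring gfc_return_by_reference for
// allocatable scalar results.
static void alloc_result (int **result)
{
  *result = static_cast<int *> (std::malloc (sizeof (int)));
  **result = 42;
}

// Caller-side protocol from the patch, in miniature.
static int call_and_cleanup ()
{
  int *tmp = nullptr;     // nullify the temporary before the call
  alloc_result (&tmp);    // result passed by reference
  int value = *tmp;
  std::free (tmp);        // deallocate after the call: no leak
  return value;
}
```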
Re: [ubsan] Introduce pointer_sized_int_node
On Tue, Aug 27, 2013 at 01:48:29PM +0200, Richard Biener wrote: > > + TI_POINTER_SIZED_TYPE, > > I'd rather see TI_UINTPTR_TYPE and TI_INTPTR_TYPE (note they might > not be exactly of POINTER_SIZE but larger). We already have [u]intptr_type_node -- but only in c-family/, thus ubsan.c/asan.c cannot use those nodes. I can create both TI_SIGNED_POINTER_SIZED_TYPE and TI_UNSIGNED_POINTER_SIZED_TYPE, but we currently need only the latter... > TI_POINTER_SIZED_TYPE is ambiguous, too - what's its signedness? Unsigned. But yeah, one can't tell by just looking at the name. > All around the compiler we use sizetype and ssizetype to munge pointers > (well, not strictly correct as targets may define sizetype to be > larger/smaller > than actual pointers). I see. Marek
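The property a pointer-sized unsigned integer type must guarantee, lossless pointer round-tripping, is what uintptr_t provides at the source level. A minimal sketch with uintptr_t standing in for the proposed node (and, as Richard notes above, such a type may even be wider than an actual pointer on some targets):

```cpp
#include <cstdint>

// A pointer converted to a pointer-sized unsigned integer and back
// must compare equal to the original.
inline bool roundtrips (void *p)
{
  uintptr_t u = reinterpret_cast<uintptr_t> (p);
  return reinterpret_cast<void *> (u) == p;
}

inline bool check_roundtrip ()
{
  static int x;
  return roundtrips (&x);
}
```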
Re: ELF interposition and One Definition Rule
On 08/26/13 20:58, Jason Merrill wrote: I would be happy with an even stronger default that optimizes on the assumption that no interposition occurs; typically interposition is overriding a symbol found in a dynamic library (i.e. malloc) rather than a symbol defined in the same translation unit as the use. I had thought that that case (overriding malloc etc) was what this patch was dealing with. Perhaps I was confused. There's nothing particularly special about ctors, dtors and virtual function implementations. Their common feature, as Jan described, is that it's hard to take their address: * ctors don't have names. * dtors do have names, but you can't take their address. * virtual function implementations can't have their instance address taken -- you get the ptr-to-memfn object describing the vtable slot etc. There is (was?) a GNU extension to work around that, IIRC. So that stops a programmer calling these indirectly. However I don't really see the semantic difference between: fptr = &somefunc; ... fptr (...); and somefunc (...); So treating those 3 types of function specially because you can't use the former syntax with them seems odd. nathan -- Nathan Sidwell
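Nathan's equivalence in miniature: a call through a function pointer and a direct call reach the same function and must behave identically, so semantics alone give no reason to treat the two forms differently.

```cpp
// A direct call and an indirect call to the same function.
static int somefunc (int x) { return x + 1; }

inline int call_direct (int x)
{
  return somefunc (x);            // somefunc (...)
}

inline int call_indirect (int x)
{
  int (*fptr) (int) = &somefunc;  // fptr = &somefunc;
  return fptr (x);                // fptr (...)
}
```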
Re: [PATCH] Change the badness computation to ensure no integer-underflow
> >> > >> I've updated the patch (as attached) to use sreal to compute badness. > + badness = ((int)((double) edge->count / max_count > + * relbenefit / RELATIVE_TIME_BENEFIT_RANGE * INT_MIN / 2)) / growth; > + > >>> > >>> FP operations on the host are frowned upon if code generation depends > >>> on their outcome. They all should use sreal_* as predict already does. > >>> Other than that I wonder why the final division is int (so we get > >>> truncation > >>> and not rounding)? That increases the effect of doing the multiply by > >>> relbenefit in double. Sorry, I missed this patch - I need to update my scripts marking files interesting to me. I originally duplicated the trick from global.c that used to do this for ages. The rationale is that if you do only one FP operation, then you will not get a difference between hosts doing 80bit fp (x86) and 64bit fp (others) and you do the operation a lot faster than sreal does. While this is on slippery ground, controlled use of double should not imply host dependency. Badness computation is on the hot path of the inliner, which is the slowest WPA optimization we have (well, because it is the only one that does something really useful). I suppose I can try to get Martin to benchmark the patch on firefox in the afternoon. Honza > >>> > >>> Richard. > >>> > /* Be sure that insanity of the profile won't lead to increasing > counts > in the scalling and thus to overflow in the computation above. */ > gcc_assert (max_count >= edge->count);
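To make the host-dependency concern concrete, here is a hedged sketch of a badness-like key computed entirely in 64-bit integer (32.32 fixed-point) arithmetic, which evaluates identically on every host, whereas chained double operations can differ between 80-bit x87 and 64-bit SSE evaluation. Constants and names are illustrative, not the actual ipa-inline code:

```cpp
#include <cstdint>

// Deterministic, integer-only badness-like key.  Assumes count fits in
// 31 bits (so count << 32 does not overflow) and relbenefit <= range.
inline int64_t badness_fixed_point (int64_t count, int64_t max_count,
                                    int64_t relbenefit, int64_t range,
                                    int64_t growth)
{
  int64_t scaled = (count << 32) / max_count;  // count/max_count in 32.32
  int64_t keyed  = scaled * relbenefit / range;
  return -(keyed / growth);                    // more negative = better
}
```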
Re: [PATCH 1/2] Convert symtab, cgraph and varpool nodes into a real class hierarchy
> >> > >> Also all of the symbol table is reachable from the global symbol_table > >> dynamic array which is a GC root. So instead of walking ->next/previous > >> and edges you should have a custom marker for the symbol_table global > >> which does more efficient marking with loops. > > > > Indeed, good point! > > All of the pointers linking symbol table together (i.e. pointer to other > > symbol > > table entries) can be ignored for PCH mark (since we are explicitly > > allocated, > > and all we need is to mark the trees/lto data and other stuff we point to. > > > > These are annotated primarily to get PCH working, where we obviously need > > to keep them. > > > > It would be nice to progress on this patch soon :) > > Oh, and with a custom marker for the symbol_table GC root the actual > symbol / edge objects need not be GC allocated at all! Just make sure > to properly mark the GC objects they refer to. Let's play with this incrementally; first, nodes are streamed to PCH, so we need to get them out of that mess. Second, even edges refer to trees that need to be walked. There will be some surprises. My previous comment about not walking all node pointers was wrong; we do maintain a list of removed cgraph nodes linked by ->next and not referred to from any global array (or rather hash). So I guess in the first iteration we can go with a marker that marks all non-symtabs and walks the ->next linked chains. Thanks, Honza > > Richard. > > > Honza > >> > >> Richard.
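The marking strategy under discussion, loop over the chain from the root instead of recursing through every ->next pointer, in toy form (all types here are mock-ups, not GCC's):

```cpp
// A custom root walker: iterate the ->next chain and mark only the
// payload each node refers to, never recursing through the links.
struct mock_node
{
  mock_node *next;
  bool payload_marked;
};

inline int mark_chain (mock_node *head)
{
  int marked = 0;
  for (mock_node *n = head; n; n = n->next)  // loop, no recursion
    {
      n->payload_marked = true;              // mark referenced data only
      ++marked;
    }
  return marked;
}

inline int mark_three ()
{
  mock_node c = { nullptr, false };
  mock_node b = { &c, false };
  mock_node a = { &b, false };
  return mark_chain (&a);
}
```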
[ubsan] Use pointer map instead of hash table.
It turned out that for tree -> tree mapping we don't need the hash table at all; pointer map is much more convenient. So this patch weeds out the hash table out of ubsan and introduces pointer map instead. Quite a lot of code could go away--no need to set the alloc pools up etc. Regtested, ran bootstrap-ubsan on x86_64-linux. Applying to the ubsan branch. 2013-08-27 Marek Polacek * ubsan.c: Use pointer map instead of hash table. --- gcc/ubsan.c.mp 2013-08-27 12:31:04.136213922 +0200 +++ gcc/ubsan.c 2013-08-27 12:31:10.361236456 +0200 @@ -22,109 +22,42 @@ along with GCC; see the file COPYING3. #include "system.h" #include "coretypes.h" #include "tree.h" -#include "alloc-pool.h" #include "cgraph.h" #include "gimple.h" -#include "hash-table.h" +#include "pointer-set.h" #include "output.h" #include "toplev.h" #include "ubsan.h" #include "c-family/c-common.h" -/* This type represents an entry in the hash table; this hash table - maps from a TYPE to a ubsan type descriptor VAR_DECL for that type. */ -struct ubsan_typedesc -{ - /* This represents the type of a variable. */ - tree type; - - /* This is the VAR_DECL of the type. */ - tree decl; -}; - -static alloc_pool ubsan_typedesc_alloc_pool; - -/* Hash table for type descriptors. */ -struct ubsan_typedesc_hasher - : typed_noop_remove -{ - typedef ubsan_typedesc value_type; - typedef ubsan_typedesc compare_type; - - static inline hashval_t hash (const value_type *); - static inline bool equal (const value_type *, const compare_type *); -}; - -/* Hash a memory reference. */ - -inline hashval_t -ubsan_typedesc_hasher::hash (const ubsan_typedesc *data) -{ - return iterative_hash_object (TYPE_UID (data->type), 0); -} - -/* Compare two data types. */ - -inline bool -ubsan_typedesc_hasher::equal (const ubsan_typedesc *d1, - const ubsan_typedesc *d2) -{ - /* Here, the types should have identical __typekind, - __typeinfo and __typename. 
*/ - return d1->type == d2->type; -} - -static hash_table ubsan_typedesc_ht; +/* Map a TYPE to an ubsan type descriptor VAR_DECL for that type. */ +static pointer_map_t *typedesc_map; -/* Initializes an instance of ubsan_typedesc. */ +/* Insert DECL as the VAR_DECL for TYPE in the TYPEDESC_MAP. */ static void -ubsan_typedesc_init (ubsan_typedesc *data, tree type, tree decl) +insert_decl_for_type (tree decl, tree type) { - data->type = type; - data->decl = decl; + void **slot = pointer_map_insert (typedesc_map, type); + gcc_assert (*slot == NULL); + *slot = decl; } -/* This creates the alloc pool used to store the instances of - ubsan_typedesc that are stored in the hash table ubsan_typedesc_ht. */ +/* Find the VAR_DECL for TYPE in TYPEDESC_MAP. If TYPE does not + exist in the map, return NULL_TREE, otherwise, return the VAR_DECL + we found. */ -static alloc_pool -ubsan_typedesc_get_alloc_pool () -{ - if (ubsan_typedesc_alloc_pool == NULL) -ubsan_typedesc_alloc_pool = create_alloc_pool ("ubsan_typedesc", - sizeof (ubsan_typedesc), - 10); - return ubsan_typedesc_alloc_pool; -} - -/* Returns a reference to the hash table containing data type. - This function ensures that the hash table is created. */ - -static hash_table & -get_typedesc_hash_table () -{ - if (!ubsan_typedesc_ht.is_created ()) -ubsan_typedesc_ht.create (10); - - return ubsan_typedesc_ht; -} - -/* Allocates memory for an instance of ubsan_typedesc into the memory - pool returned by ubsan_typedesc_get_alloc_pool and initialize it. - TYPE describes a particular type, DECL is its VAR_DECL. */ - -static ubsan_typedesc * -ubsan_typedesc_new (tree type, tree decl) +static tree +lookup_decl_for_type (tree type) { - ubsan_typedesc *desc = -(ubsan_typedesc *) pool_alloc (ubsan_typedesc_get_alloc_pool ()); - - ubsan_typedesc_init (desc, type, decl); - return desc; + /* If the pointer map is not initialized yet, create it now. 
*/ + if (typedesc_map == NULL) +typedesc_map = pointer_map_create (); + void **slot = pointer_map_contains (typedesc_map, type); + return slot ? (tree) *slot : NULL_TREE; } -/* Helper routine, which encodes a value in the uptr type. +/* Helper routine, which encodes a value in the pointer_sized_int_node. Arguments with precision <= POINTER_SIZE are passed directly, the rest is passed by reference. T is a value we are to encode. */ @@ -281,24 +214,19 @@ get_ubsan_type_info_for_type (tree type) } /* Helper routine that returns ADDR_EXPR of a VAR_DECL of a type - descriptor. It first looks into the hash table; if not found, - create the VAR_DECL, put it into the hash table and return the + descriptor. It first looks into the pointer map; if not found, + create the VAR_DECL, put it into the pointer map and return the ADDR_EXPR of it. TYPE describes a particular type. */ tree ubsan_t
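The shape of the simplification, modelled with the standard library (the tree/decl types below are stand-ins, and GCC's pointer_map_t is its own implementation, not std::unordered_map): a plain pointer-keyed map replaces a hand-rolled hash table plus alloc pool when all that is needed is pointer-to-pointer mapping.

```cpp
#include <cassert>
#include <unordered_map>

using tree_like = const void *;
static std::unordered_map<tree_like, tree_like> typedesc_map;

// Mirror of the patch's insert_decl_for_type, including its gcc_assert
// that no duplicate entry exists.
inline void insert_decl_for_type (tree_like decl, tree_like type)
{
  bool inserted = typedesc_map.emplace (type, decl).second;
  assert (inserted);
}

// Mirror of lookup_decl_for_type: nullptr when the type is absent.
inline tree_like lookup_decl_for_type (tree_like type)
{
  auto it = typedesc_map.find (type);
  return it == typedesc_map.end () ? nullptr : it->second;
}

inline bool exercise_map ()
{
  static int type_a, decl_a;
  insert_decl_for_type (&decl_a, &type_a);
  return lookup_decl_for_type (&type_a) == &decl_a
         && lookup_decl_for_type (&decl_a) == nullptr;
}
```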
Re: [PATCH 1/n] Add conditional compare support
On Tue, Aug 27, 2013 at 1:56 PM, Richard Earnshaw wrote: > On 27/08/13 12:10, Richard Biener wrote: >> What's this for and what's the desired semantics? I don't like having >> extra tree codes for this. Is this for a specific instruction set >> feature? > > The background is to support the conditional compare instructions in ARM > (more effectively) and AArch64 at all. > > The current method used in ARM is to expand into a series of store-flag > instructions and then hope that combine can optimize them away (though > that fails far too often, particularly when the first instruction in the > sequence is combined into another pattern). To make it work at all the > compiler has to lie about the costs of various store-flag type > operations which overall risks producing worse code and means we also > have to support many more complex multi-instruction patterns than is > desirable. I really don't want to go down the same route for AArch64. > > The idea behind all this is to capture potential conditional compare > operations early enough in the mid end that we can keep track of them > until RTL expand time and then to emit the correct logic on all targets > depending on what is the best thing for that target. The current method > of lowering into store-flag sequences doesn't really cut it. It seems to me, then, that the initial instruction selection process (aka RTL expansion) needs to be improved. As we are expanding with the CFG around, it should be easy enough to detect AND/ORIF cases and do better here. Yeah, I suppose this asks to turn existing jump expansion optimizations upside down to optimize with the GIMPLE CFG in mind. The current way of LOGICAL_OP_NON_SHORT_CIRCUIT is certainly bogus - fold-const.c is way too early to decide this. Similar to the ongoing work of expanding / building up switch expressions in a GIMPLE pass (moving expand complexity up the pipeline), this asks for a GIMPLE phase that moves this decision down closer to RTL expansion. 
(We have tree-ssa-ifcombine.c that is a related GIMPLE transform pass) Richard. > R. >
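For concreteness, the source shape a conditional-compare instruction targets: two side-effect-free comparisons joined by &&. On AArch64 such a condition can be emitted as cmp; ccmp; b.cond instead of two separate compare-and-branch sequences (whether it is depends on the expander; this function just exhibits the pattern the mid end would need to preserve):

```cpp
// Candidate for a conditional compare: neither comparison has side
// effects, so short-circuit evaluation is not required and the two
// tests can be fused into cmp + ccmp on targets that have it.
inline bool both_match (int a, int b, int x, int y)
{
  return a == x && b == y;
}
```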
Re: [PATCH 1/2] Convert symtab, cgraph and varpool nodes into a real class hierarchy
On Tue, Aug 27, 2013 at 1:40 PM, Jan Hubicka wrote: >> > + while (x != xlimit) >> > +{ >> > + /* Code common to all symtab nodes. */ >> > + gt_ggc_m_9tree_node (x->decl); >> > + gt_ggc_mx_symtab_node_base (x->next); >> > + gt_ggc_mx_symtab_node_base (x->previous); >> > + gt_ggc_mx_symtab_node_base (x->next_sharing_asm_name); >> > + gt_ggc_mx_symtab_node_base (x->previous_sharing_asm_name); >> > + gt_ggc_mx_symtab_node_base (x->same_comdat_group); >> > + gt_ggc_m_20vec_ipa_ref_t_va_gc_ (x->ref_list.references); >> > + gt_ggc_m_9tree_node (x->alias_target); >> > + gt_ggc_m_18lto_file_decl_data (x->lto_file_data); >> >> Don't we have template specializations so we just can do >> >> gt_ggc_mark (x->decl); >> gt_ggc_mark (x->next); > Yep, that is what I hope for. >> ... >> etc? >> >> Also all of the symbol table is reachable from the global symbol_table >> dynamic array which is a GC root. So instead of walking ->next/previous >> and edges you should have a custom marker for the symbol_table global >> which does more efficient marking with loops. > > Indeed, good point! > All of the pointers linking symbol table together (i.e. pointer to other > symbol > table entries) can be ignored for PCH mark (since we are explicitely > allocated, > and all we need is to mark the trees/lto data and other stuff we point to. > > These are annotated primarily to get PCH working, where we obviously need > to keep them. > > It would be nice to progress on this patch soon :) Oh, and with a custom marker for the symbol_table GC root the actual symbol / edge objects need not be GC allocated at all! Just make sure to properly mark the GC objects they refer to. Richard. > Honza >> >> Richard.
Re: [PATCH] Change the badness computation to ensure no integer-underflow
On Tue, Jun 25, 2013 at 12:45 AM, Dehao Chen wrote: > The original patch has some flaw. The new patch is attached. > Bootstrapped and passed regression tests. Ok. Thanks, Richard. > Thanks, > Dehao > > On Mon, Jun 24, 2013 at 9:44 AM, Dehao Chen wrote: >> Hi, Richard, >> >> I've updated the patch (as attached) to use sreal to compute badness. >> >> Thanks, >> Dehao >> >> On Mon, Jun 24, 2013 at 5:41 AM, Richard Biener >> wrote: >>> On Sat, Jun 22, 2013 at 2:59 AM, Dehao Chen wrote: This patch prevents integer-underflow of badness computation in ipa-inline. Bootstrapped and passed regression tests. OK for trunk? Thanks, Dehao gcc/ChangeLog: 2013-06-21 Dehao Chen * ipa-inline.c (edge_badness): Fix integer underflow. Index: gcc/ipa-inline.c === --- gcc/ipa-inline.c (revision 200326) +++ gcc/ipa-inline.c (working copy) @@ -888,11 +888,9 @@ edge_badness (struct cgraph_edge *edge, bool dump) else if (max_count) { int relbenefit = relative_time_benefit (callee_info, edge, edge_time); - badness = - ((int) - ((double) edge->count * INT_MIN / 2 / max_count / RELATIVE_TIME_BENEFIT_RANGE) * - relbenefit) / growth; - + badness = ((int)((double) edge->count / max_count + * relbenefit / RELATIVE_TIME_BENEFIT_RANGE * INT_MIN / 2)) / growth; + >>> >>> FP operations on the host are frowned upon if code generation depends >>> on their outcome. They all should use sreal_* as predict already does. >>> Other than that I wonder why the final division is int (so we get truncation >>> and not rounding)? That increases the effect of doing the multiply by >>> relbenefit in double. >>> >>> Richard. >>> /* Be sure that insanity of the profile won't lead to increasing counts in the scalling and thus to overflow in the computation above. */ gcc_assert (max_count >= edge->count);
Re: Stream TYPE_FINAL_P and DECL_FINAL_P to LTO
> > + v = iterative_hash_host_wide_int (TYPE_TRANSPARENT_AGGR (t), v); > > + v = iterative_hash_host_wide_int (TYPE_FINAL_P (t), v); > > please use | (TYPE_FINAL_P (t) << 1) to speed this up like in other cases. Thank you, I missed that TYPE_TRANSPARENT_AGGR is also a flag. I will update the patch, also adding the cxx_constructor/cxx_destructor flags that landed on mainline in the meantime, and commit. (Those go basically wherever DECL_FINAL_P goes.) Honza
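The micro-optimization Richard suggests, pack several single-bit flags into one word with shifts and feed the hash once instead of one hash step per flag, in sketch form (the hash function below is a stand-in, not GCC's iterative_hash_host_wide_int):

```cpp
#include <cstdint>

// Stand-in for iterative_hash_host_wide_int: folds a 64-bit value into
// a running 32-bit seed.
inline uint32_t iterative_hash_hwi (uint64_t val, uint32_t seed)
{
  return seed * 1000003u ^ static_cast<uint32_t> (val ^ (val >> 32));
}

// Two flags packed into one word, hashed with a single step, in the
// style of "flag1 | (flag2 << 1)".
inline uint32_t hash_type_flags (bool transparent_aggr, bool final_p,
                                 uint32_t seed)
{
  uint64_t packed = (uint64_t) transparent_aggr
                    | ((uint64_t) final_p << 1);
  return iterative_hash_hwi (packed, seed);
}
```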
Re: [PATCH 1/n] Add conditional compare support
On 27/08/13 12:10, Richard Biener wrote: > What's this for and what's the desired semantics? I don't like having > extra tree codes for this. Is this for a specific instruction set > feature? The background is to support the conditional compare instructions in ARM (more effectively) and AArch64 at all. The current method used in ARM is to expand into a series of store-flag instructions and then hope that combine can optimize them away (though that fails far too often, particularly when the first instruction in the sequence is combined into another pattern). To make it work at all the compiler has to lie about the costs of various store-flag type operations which overall risks producing worse code and means we also have to support many more complex multi-instruction patterns than is desirable. I really don't want to go down the same route for AArch64. The idea behind all this is to capture potential conditional compare operations early enough in the mid end that we can keep track of them until RTL expand time and then to emit the correct logic on all targets depending on what is the best thing for that target. The current method of lowering into store-flag sequences doesn't really cut it. R.
Re: [ubsan] Introduce pointer_sized_int_node
On Mon, Aug 26, 2013 at 12:15 PM, Marek Polacek wrote: > I noticed I forgot to apply this old patch, already acked by Jason. > It introduces new pointer_sized_int_node, thus we can get rid of > uptr_type function in ubsan, and it allows us to do some clean-up > in asan.c, too. > > Tested x86_64-linux, applying to ubsan branch. > > 2013-08-26 Marek Polacek > > * tree.h (enum tree_index): Add TI_POINTER_SIZED_TYPE. > (pointer_sized_int_node): Define. > * tree.c (build_common_tree_nodes): Initialize > pointer_sized_int_node. > * ubsan.c (ubsan_encode_value): Use pointer_sized_int_node instead > calling uptr_type. > (uptr_type): Remove function. > * asan.c (asan_global_struct): Use pointer_sized_int_node instead > calling build_nonstandard_integer_type. > (initialize_sanitizer_builtins): Likewise. > (asan_finish_file): Likewise. > > --- gcc/tree.c.mp 2013-07-21 19:54:35.416986756 +0200 > +++ gcc/tree.c 2013-07-21 19:56:58.347562787 +0200 > @@ -9638,6 +9638,8 @@ build_common_tree_nodes (bool signed_cha > = build_pointer_type (build_type_variant (void_type_node, 1, 0)); >fileptr_type_node = ptr_type_node; > > + pointer_sized_int_node = build_nonstandard_integer_type (POINTER_SIZE, 1); > + >float_type_node = make_node (REAL_TYPE); >TYPE_PRECISION (float_type_node) = FLOAT_TYPE_SIZE; >layout_type (float_type_node); > --- gcc/ubsan.c.mp 2013-07-21 20:04:59.469653493 +0200 > +++ gcc/ubsan.c 2013-07-21 20:07:00.227178083 +0200 > @@ -123,14 +123,6 @@ ubsan_typedesc_new (tree type, tree decl >return desc; > } > > -/* Build the ubsan uptr type. */ > - > -static tree > -uptr_type (void) > -{ > - return build_nonstandard_integer_type (POINTER_SIZE, 1); > -} > - > /* Helper routine, which encodes a value in the uptr type. > Arguments with precision <= POINTER_SIZE are passed directly, > the rest is passed by reference. T is a value we are to encode. 
*/ > @@ -143,7 +135,7 @@ ubsan_encode_value (tree t) > { > case INTEGER_TYPE: >if (TYPE_PRECISION (type) <= POINTER_SIZE) > - return fold_build1 (NOP_EXPR, uptr_type (), t); > + return fold_build1 (NOP_EXPR, pointer_sized_int_node, t); >else > return build_fold_addr_expr (t); > case REAL_TYPE: > @@ -153,7 +145,7 @@ ubsan_encode_value (tree t) > { > tree itype = build_nonstandard_integer_type (bitsize, true); > t = fold_build1 (VIEW_CONVERT_EXPR, itype, t); > - return fold_convert (uptr_type (), t); > + return fold_convert (pointer_sized_int_node, t); > } > else > { > --- gcc/tree.h.mp 2013-07-21 19:54:35.441986868 +0200 > +++ gcc/tree.h 2013-07-21 19:56:05.128353854 +0200 > @@ -4227,6 +4227,7 @@ enum tree_index >TI_VA_LIST_FPR_COUNTER_FIELD, >TI_BOOLEAN_TYPE, >TI_FILEPTR_TYPE, > + TI_POINTER_SIZED_TYPE, I'd rather see TI_UINTPTR_TYPE and TI_INTPTR_TYPE (note they might not be exactly of POINTER_SIZE but larger). TI_POINTER_SIZED_TYPE is ambiguous, too - what's its signedness? All around the compiler we use sizetype and ssizetype to munge pointers (well, not strictly correct as targets may define sizetype to be larger/smaller than actual pointers). Richard. > >TI_DFLOAT32_TYPE, >TI_DFLOAT64_TYPE, > @@ -4383,6 +4384,7 @@ extern GTY(()) tree global_trees[TI_MAX] > #define va_list_fpr_counter_field > global_trees[TI_VA_LIST_FPR_COUNTER_FIELD] > /* The C type `FILE *'. */ > #define fileptr_type_node global_trees[TI_FILEPTR_TYPE] > +#define pointer_sized_int_node global_trees[TI_POINTER_SIZED_TYPE] > > #define boolean_type_node global_trees[TI_BOOLEAN_TYPE] > #define boolean_false_node global_trees[TI_BOOLEAN_FALSE] > --- gcc/asan.c.mp 2013-07-21 20:07:15.013237456 +0200 > +++ gcc/asan.c 2013-07-21 20:16:10.929376734 +0200 > @@ -1954,7 +1954,7 @@ asan_global_struct (void) > = build_decl (UNKNOWN_LOCATION, FIELD_DECL, > get_identifier (field_names[i]), > (i == 0 || i == 3) ? 
const_ptr_type_node > - : build_nonstandard_integer_type (POINTER_SIZE, 1)); > + : pointer_sized_int_node); >DECL_CONTEXT (fields[i]) = ret; >if (i) > DECL_CHAIN (fields[i - 1]) = fields[i]; > @@ -2039,8 +2039,7 @@ initialize_sanitizer_builtins (void) > ptr_type_node, ptr_type_node, NULL_TREE); >tree BT_FN_VOID_PTR_PTRMODE > = build_function_type_list (void_type_node, ptr_type_node, > - build_nonstandard_integer_type (POINTER_SIZE, > - 1), > NULL_TREE); > + pointer_sized_int_node, NULL_TREE); >tree BT_FN_VOID_INT > = build_function_type_list (void_type_node, integ
Re: [PATCH 1/2] Convert symtab, cgraph and varpool nodes into a real class hierarchy
> > + while (x != xlimit) > > +{ > > + /* Code common to all symtab nodes. */ > > + gt_ggc_m_9tree_node (x->decl); > > + gt_ggc_mx_symtab_node_base (x->next); > > + gt_ggc_mx_symtab_node_base (x->previous); > > + gt_ggc_mx_symtab_node_base (x->next_sharing_asm_name); > > + gt_ggc_mx_symtab_node_base (x->previous_sharing_asm_name); > > + gt_ggc_mx_symtab_node_base (x->same_comdat_group); > > + gt_ggc_m_20vec_ipa_ref_t_va_gc_ (x->ref_list.references); > > + gt_ggc_m_9tree_node (x->alias_target); > > + gt_ggc_m_18lto_file_decl_data (x->lto_file_data); > > Don't we have template specializations so we just can do > > gt_ggc_mark (x->decl); > gt_ggc_mark (x->next); Yep, that is what I hope for. > ... > etc? > > Also all of the symbol table is reachable from the global symbol_table > dynamic array which is a GC root. So instead of walking ->next/previous > and edges you should have a custom marker for the symbol_table global > which does more efficient marking with loops. Indeed, good point! All of the pointers linking the symbol table together (i.e. pointers to other symbol table entries) can be ignored for the PCH mark (since we are explicitly allocated, and all we need is to mark the trees/lto data and other stuff we point to). These are annotated primarily to get PCH working, where we obviously need to keep them. It would be nice to progress on this patch soon :) Honza > > Richard.
Re: Stream TYPE_FINAL_P and DECL_FINAL_P to LTO
On Sun, Aug 25, 2013 at 5:23 PM, Jan Hubicka wrote: > Hi, > this patch adds code to stream DECL_FINAL_P/TYPE_FINAL_P into LTO so we can > use it after the ipa-devirt code at LTO time is merged in. > (http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01007.html) > > Bootstrapped/regtested x86_64-linux, OK? > > Honza > > * lto-streamer-out.c (hash_tree): Add streaming of DECL_FINAL_P > and TYPE_FINAL_P > * lto/lto.c (compare_tree_sccs_1): Add TYPE_FINAL_P and DECL_FINAL_P > * tree-streamer-out.c (pack_ts_decl_with_vis_value_fields): > Add DECL_FINAL_P. > (pack_ts_type_common_value_fields): Add TYPE_FINAL_P. > * tree-streamer-in.c (unpack_ts_decl_with_vis_value_fields): > Add DECL_FINAL_P. > (unpack_ts_type_common_value_fields): Add TYPE_FINAL_P. > > Index: lto-streamer-out.c > === > --- lto-streamer-out.c (revision 201974) > +++ lto-streamer-out.c (working copy) > @@ -798,6 +798,9 @@ hash_tree (struct streamer_tree_cache_d > v); > v = iterative_hash_host_wide_int (DECL_TLS_MODEL (t), v); > } > + if (TREE_CODE (t) == FUNCTION_DECL) > + v = iterative_hash_host_wide_int (DECL_FINAL_P (t), > + v); >if (VAR_OR_FUNCTION_DECL_P (t)) > v = iterative_hash_host_wide_int (DECL_INIT_PRIORITY (t), v); > } > @@ -838,7 +841,10 @@ hash_tree (struct streamer_tree_cache_d > | (TYPE_USER_ALIGN (t) << 5) > | (TYPE_READONLY (t) << 6), v); >if (RECORD_OR_UNION_TYPE_P (t)) > - v = iterative_hash_host_wide_int (TYPE_TRANSPARENT_AGGR (t), v); > + { > + v = iterative_hash_host_wide_int (TYPE_TRANSPARENT_AGGR (t), v); > + v = iterative_hash_host_wide_int (TYPE_FINAL_P (t), v); please use | (TYPE_FINAL_P (t) << 1) to speed this up like in other cases. Otherwise ok. Thanks, Richard. 
> + } >else if (code == ARRAY_TYPE) > v = iterative_hash_host_wide_int (TYPE_NONALIASED_COMPONENT (t), v); >v = iterative_hash_host_wide_int (TYPE_PRECISION (t), v); > Index: lto/lto.c > === > --- lto/lto.c (revision 201974) > +++ lto/lto.c (working copy) > @@ -1892,6 +1892,7 @@ compare_tree_sccs_1 (tree t1, tree t2, t >compare_values (DECL_DISREGARD_INLINE_LIMITS); >compare_values (DECL_PURE_P); >compare_values (DECL_LOOPING_CONST_OR_PURE_P); > + compare_values (DECL_FINAL_P); >if (DECL_BUILT_IN_CLASS (t1) != NOT_BUILT_IN) > compare_values (DECL_FUNCTION_CODE); >if (DECL_STATIC_DESTRUCTOR (t1)) > @@ -1905,7 +1906,10 @@ compare_tree_sccs_1 (tree t1, tree t2, t >compare_values (TYPE_NO_FORCE_BLK); >compare_values (TYPE_NEEDS_CONSTRUCTING); >if (RECORD_OR_UNION_TYPE_P (t1)) > - compare_values (TYPE_TRANSPARENT_AGGR); > + { > + compare_values (TYPE_TRANSPARENT_AGGR); > + compare_values (TYPE_FINAL_P); > + } >else if (code == ARRAY_TYPE) > compare_values (TYPE_NONALIASED_COMPONENT); >compare_values (TYPE_PACKED); > Index: tree-streamer-out.c > === > --- tree-streamer-out.c (revision 201974) > +++ tree-streamer-out.c (working copy) > @@ -242,6 +242,8 @@ pack_ts_decl_with_vis_value_fields (stru >bp_pack_value (bp, DECL_TLS_MODEL (expr), 3); > } > > + if (TREE_CODE (expr) == FUNCTION_DECL) > +bp_pack_value (bp, DECL_FINAL_P (expr), 1); >if (VAR_OR_FUNCTION_DECL_P (expr)) > bp_pack_var_len_unsigned (bp, DECL_INIT_PRIORITY (expr)); > } > @@ -293,7 +295,10 @@ pack_ts_type_common_value_fields (struct >bp_pack_value (bp, TYPE_NO_FORCE_BLK (expr), 1); >bp_pack_value (bp, TYPE_NEEDS_CONSTRUCTING (expr), 1); >if (RECORD_OR_UNION_TYPE_P (expr)) > -bp_pack_value (bp, TYPE_TRANSPARENT_AGGR (expr), 1); > +{ > + bp_pack_value (bp, TYPE_TRANSPARENT_AGGR (expr), 1); > + bp_pack_value (bp, TYPE_FINAL_P (expr), 1); > +} >else if (TREE_CODE (expr) == ARRAY_TYPE) > bp_pack_value (bp, TYPE_NONALIASED_COMPONENT (expr), 1); >bp_pack_value (bp, TYPE_PACKED (expr), 1); > Index: 
tree-streamer-in.c > === > --- tree-streamer-in.c (revision 201974) > +++ tree-streamer-in.c (working copy) > @@ -275,6 +275,8 @@ unpack_ts_decl_with_vis_value_fields (st >DECL_TLS_MODEL (expr) = (enum tls_model) bp_unpack_value (bp, 3); > } > > + if (TREE_CODE (expr) == FUNCTION_DECL) > +DECL_FINAL_P (expr) = (unsigned) bp_unpack_value (bp, 1); >if (VAR_OR_FUNCTION_DECL_P (expr)) > { >priority_type p; > @@ -346,7 +348,10 @@ unpack_ts_type_common_value_fields (stru >
Re: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)
On Tue, Jul 30, 2013 at 4:47 PM, Zoran Jovanovic wrote: > Thank you for the reply. > I am in the process of modifying the patch according to some comments > received. > Currently I am considering the usage of DECL_BIT_FIELD_REPRESENTATIVE. > I see that they can be used during analysis phase for deciding which accesses > can be merged - only accesses with same representative will be merged. > I have more dilemmas with usage of representatives for lowering. If my > understanding is correct bit field representative can only be associated to a > field declaration, and not to a BIT_FIELD_REF node. As a consequence > optimization must use COMPONENT_REF to model new bit field access (which > should be an equivalent of several merged accesses). To use COMPONENT_REF a > new field declaration with appropriate bit size (equal to sum of bit sizes of > all merged bit field accesses) must be created and then corresponding bit > field representative could be attached. > Is my understanding correct? Is creating a new field declaration for every > set of merged bit field accesses acceptable? You should just use the DECL_BIT_FIELD_REPRESENTATIVE FIELD_DECL, not create any new FIELD_DECLs. Also you can use BIT_FIELD_REFs but you need to query the proper bit position / size from the DECL_BIT_FIELD_REPRESENTATIVE FIELD_DECL. Using the COMPONENT_REF is better though, I think. Full patch queued for review (but you might want to update it to not create new FIELD_DECLs). Richard. > > Regards, > Zoran Jovanovic > > > From: Richard Biener [richard.guent...@gmail.com] > Sent: Thursday, July 18, 2013 11:31 AM > To: Zoran Jovanovic; gcc-patches@gcc.gnu.org > Cc: Petar Jovanovic > Subject: Re: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc > inside) > > Zoran Jovanovic wrote: > >>Hello, >>This patch adds new optimization pass that combines several adjacent >>bit field accesses that copy values from one memory location to another >>into single bit field access. 
>> >>Example: >> >>Original code: >> D.1351; >> D.1350; >> D.1349; >> D.1349_2 = p1_1(D)->f1; >> p2_3(D)->f1 = D.1349_2; >> D.1350_4 = p1_1(D)->f2; >> p2_3(D)->f2 = D.1350_4; >> D.1351_5 = p1_1(D)->f3; >> p2_3(D)->f3 = D.1351_5; >> >>Optimized code: >> D.1358; >> D.1358_10 = BIT_FIELD_REF <*p1_1(D), 19, 13>; >> BIT_FIELD_REF <*p2_3(D), 19, 13> = D.1358_10; >> >>Algorithm works on basic block level and consists of following 3 major >>steps: >>1. Go trough basic block statements list. If there are statement pairs >>that implement copy of bit field content from one memory location to >>another record statements pointers and other necessary data in >>corresponding data structure. >>2. Identify records that represent adjacent bit field accesses and mark >>them as merged. >>3. Modify trees accordingly. > > All this should use BITFIELD_REPRESENTATIVE both to decide what accesses are > related and for the lowering. This makes sure to honor the appropriate memory > models. > > In theory only lowering is necessary and FRE and DSE will do the job of > optimizing - also properly accounting for alias issues that Joseph mentions. > The lowering and analysis is strongly related to SRA So I don't believe we > want a new pass for this. > > Richard. > >
Re: [PATCH 1/n] Add conditional compare support
On Wed, Aug 21, 2013 at 12:23 PM, Zhenqiang Chen wrote: > Hi, > > The attached patch is the basic support for conditional compare (CCMP). It > adds > a set of keywords on TREE to represent CCMP: > > DEFTREECODE (TRUTH_ANDIF_LT_EXPR, "truth_andif_lt_expr", tcc_ccomparison, 3) > DEFTREECODE (TRUTH_ANDIF_LE_EXPR, "truth_andif_le_expr", tcc_ccomparison, 3) > DEFTREECODE (TRUTH_ANDIF_GT_EXPR, "truth_andif_gt_expr", tcc_ccomparison, 3) > DEFTREECODE (TRUTH_ANDIF_GE_EXPR, "truth_andif_ge_expr", tcc_ccomparison, 3) > DEFTREECODE (TRUTH_ANDIF_EQ_EXPR, "truth_andif_eq_expr", tcc_ccomparison, 3) > DEFTREECODE (TRUTH_ANDIF_NE_EXPR, "truth_andif_ne_expr", tcc_ccomparison, 3) > DEFTREECODE (TRUTH_ORIF_LT_EXPR, "truth_orif_lt_expr", tcc_ccomparison, 3) > DEFTREECODE (TRUTH_ORIF_LE_EXPR, "truth_orif_le_expr", tcc_ccomparison, 3) > DEFTREECODE (TRUTH_ORIF_GT_EXPR, "truth_orif_gt_expr", tcc_ccomparison, 3) > DEFTREECODE (TRUTH_ORIF_GE_EXPR, "truth_orif_ge_expr", tcc_ccomparison, 3) > DEFTREECODE (TRUTH_ORIF_EQ_EXPR, "truth_orif_eq_expr", tcc_ccomparison, 3) > DEFTREECODE (TRUTH_ORIF_NE_EXPR, "truth_orif_ne_expr", tcc_ccomparison, 3) > > To distinguish others, the patch dumps CCMP as > > && to "?&&" > || to "?||" > > A CCMP operator has three operands: two are from the compare and the other > is from the result of previous compare. To reuse current codes, the result > of > previous compare is in TREE_OPERAND (ccmp, 2)/gimple_assign_rhs3. e.g. > > r = (a > b) && (c > d) > > with CCMP, it will be > > t1 = GT_EXPR (a, b) > r = TRUTH_ANDIF_GT_EXPR (c, d, t1) > > The patch does not include ops to expand CCMP to RTL. It just roll CCMP back > to BIT operators. So with a dummy "conditional_compare", we can test > the patch in any port. > In the patch, dummy conditional_compare for i386 and arm are added for test. > > Bootstrap on x86-64 and ARM Chromebook. > No ICE and runtime errors in regression tests. What's this for and what's the desired semantics? 
I don't like having extra tree codes for this. Is this for a specific instruction set feature? Richard. > Thanks! > -Zhenqiang > > ChangeLog: > 2013-08-21 Zhenqiang Chen > > * tree.def: Add conditional compare ops: TRUTH_ANDIF_LT_EXPR, > TRUTH_ANDIF_LE_EXPR, TRUTH_ANDIF_GT_EXPR, TRUTH_ANDIF_GE_EXPR, > TRUTH_ANDIF_EQ_EXPR, TRUTH_ANDIF_NE_EXPR, TRUTH_ORIF_LT_EXPR, > TRUTH_ORIF_LE_EXPR, TRUTH_ORIF_GT_EXPR, TRUTH_ORIF_GE_EXPR, > TRUTH_ORIF_EQ_EXPR, TRUTH_ORIF_NE_EXPR. > * tree.h: Add misc utils for conditional compare. > * tree.c (tree_node_structure_for_code, tree_code_size, > record_node_allocation_statistics, contains_placeholder_p, > find_placeholder_in_expr, substitute_in_expr, > substitute_placeholder_in_expr, stabilize_reference_1, build2_stat, > simple_cst_equal): Handle conditional compare. > (get_code_from_ccompare_expr): New added. > (generate_ccompare_code ): New added. > * c-family/c-pretty-print.c (pp_c_expression): Handle conditional > compare. > * cp/error.c (dump_expr): Likewise. > * cp/semantics.c (cxx_eval_constant_expression): Likewise. > (potential_constant_expression_1): Likewise. > (cxx_eval_ccmp_expression): New added. > * cfgexpand.c (expand_debug_expr): Handle conditional compare. > * expr.c (safe_from_p, expand_expr_real_2): Likewise. > (expand_ccmp_to_bitop): New added. > * gimple-pretty-print.c (dump_ccomparison_rhs): New added. > (dump_gimple_assign): Handle conditional compare. > * print-tree.c (print_node): Likewise > * tree-dump.c (dequeue_and_dump): Likewise. > * tree-pretty-print.c (dump_generic_node, op_code_prio): Likewise. > * gimple.c (recalculate_side_effects): Likewise. > (get_gimple_rhs_num_ops): Likewise > * gimplify.c (goa_stabilize_expr, gimplify_expr, gimple_boolify): > Likewise. > * tree-inline.c (estimate_operator_cost): Likewise. > * tree-ssa-operands.c (get_expr_operands): Likewise. > * tree-ssa-loop-niter.c (get_val_for): Likewise. > * tree-cfg.c (verify_gimple_assign_ternary): Likewise. 
> (verify_gimple_comparison_operands): New added. > (verify_gimple_comparison): Call verify_gimple_comparison_operands. > * fold-const.c (fold_truth_andor): Generate conditinal compare. > * lra-constraints.c (remove_inheritance_pseudos): > Initialize variable "set" to NULL_RTX.
Re: [PATCH 1/2] Convert symtab, cgraph and varpool nodes into a real class hierarchy
On Fri, Aug 16, 2013 at 2:57 AM, David Malcolm wrote: > This patch is the handwritten part of the conversion of these types > to C++; it requires the followup patch, which is autogenerated. > > It converts: > struct GTY(()) symtab_node_base > to: > class GTY((user)) symtab_node_base > > and converts: > struct GTY(()) cgraph_node > to: > struct GTY((user)) cgraph_node : public symtab_node_base > > and: > struct GTY(()) varpool_node > to: > class GTY((user)) varpool_node : public symtab_node_base > > dropping the symtab_node_def union. > > Since gengtype is unable to cope with inheritance, we have to mark the > types with GTY((user)), and handcode the gty field-visiting functions. > Given the simple hierarchy, we don't need virtual functions for this. > > Unfortunately doing so runs into various bugs in gengtype's handling > of GTY((user)), so the patch also includes workarounds for these bugs. > > gengtype walks the graph of the *types* in the code, and produces > functions in gtype-desc.[ch] for all types that are reachable from a > GTY root. > > However, it ignores the contents of GTY((user)) types when walking > this graph. > > Hence if you have a subgraph of types that are only reachable > via fields in GTY((user)) types, gengtype won't generate helper code > for those types. > > Ideally there would be some way to mark a GTY((user)) type to say > which types it references, to avoid having to mark these types as > GTY((user)). > > For now, work around this issue by providing explicit roots of the > missing types, of dummy variables (see the bottom of cgraph.c) > > gcc/ > > * cgraph.c (gt_ggc_mx): New. > (gt_pch_nx (symtab_node_base *)): New. > (gt_pch_nx (symtab_node_base *, gt_pointer_operator, void *)): New. > (dummy_call_site_hash): New. > (dummy_ipa_ref_list): New. > (dummy_cgraph_clone_info): New. > (dummy_ipa_replace_map_pointer): New. > (dummy_varpool_node_ptr): New. > > * cgraph.h (symtab_node_base): Convert to a class; > add GTY((user)). 
> (gt_ggc_mx): New. > (gt_pch_nx (symtab_node_base *p)): New. > (gt_pch_nx (symtab_node_base *p, gt_pointer_operator op, > void *cookie)): New. > (cgraph_node): Inherit from symtab_node; convert to GTY((user)). > (varpool_node): Likewise. > (symtab_node_def): Remove. > (is_a_helper ::test (symtab_node_def *)): Convert to... > (is_a_helper ::test (symtab_node_base *)): ...this. > (is_a_helper ::test (symtab_node_def *)): Convert to... > (is_a_helper ::test (symtab_node_base *)): ...this. > > * ipa-ref.h (symtab_node_def): Drop. > (symtab_node): Change underlying type from symtab_node_def to > symtab_node_base. > (const_symtab_node): Likwise. > > * is-a.h: Update examples in comment. > > * symtab.c (symtab_hash): Change symtab_node_def to symtab_node_base. > (assembler_name_hash): Likewise. > --- > gcc/cgraph.c | 218 > ++ > gcc/cgraph.h | 48 ++--- > gcc/ipa-ref.h | 6 +- > gcc/is-a.h| 6 +- > gcc/symtab.c | 4 +- > 5 files changed, 247 insertions(+), 35 deletions(-) > > diff --git a/gcc/cgraph.c b/gcc/cgraph.c > index 50d13ab..5cb6a31 100644 > --- a/gcc/cgraph.c > +++ b/gcc/cgraph.c > @@ -3035,4 +3035,222 @@ cgraph_get_body (struct cgraph_node *node) >return true; > } > > +/* GTY((user)) hooks for symtab_node_base (and its subclasses). > + We could use virtual functions for this, but given the presence of the > + "type" field and the trivial size of the class hierarchy, switches are > + perhaps simpler and faster. */ > + > +void gt_ggc_mx (symtab_node_base *x) > +{ > + /* Hand-written equivalent of the chain_next/chain_prev machinery, to > + avoid deep call-stacks. > + > + Locate the neighbors of x (within the linked-list) that haven't been > + marked yet, so that x through xlimit give a range suitable for marking. > + Note that x (on entry) itself has already been marked by the > + gtype-desc.c code, so we first try its successor. > + */ > + symtab_node_base * xlimit = x ? 
x->next : NULL; > + while (ggc_test_and_set_mark (xlimit)) > + xlimit = xlimit->next; > + if (x != xlimit) > +for (;;) > + { > +symtab_node_base * const xprev = x->previous; > +if (xprev == NULL) break; > +x = xprev; > +(void) ggc_test_and_set_mark (xprev); > + } > + while (x != xlimit) > +{ > + /* Code common to all symtab nodes. */ > + gt_ggc_m_9tree_node (x->decl); > + gt_ggc_mx_symtab_node_base (x->next); > + gt_ggc_mx_symtab_node_base (x->previous); > + gt_ggc_mx_symtab_node_base (x->next_sharing_asm_name); > + gt_ggc_mx_symtab_node_base (x->previous_sharing_asm_name); > + gt_ggc_mx_symtab_node_base (x->same_comdat_group); > + gt_ggc_m_20vec_ipa_ref
Re: [PATCH][4.8 backport] Fix PR57735
On Mon, Jul 22, 2013 at 6:27 PM, Kyrylo Tkachov wrote: > Hi all, > > The fix for PR57735 is in current trunk (for a different issue I think), just > needs a backport to 4.8. > It is r198462 by Richard Sandiford: > > 2013-04-30 Richard Sandiford > > * explow.c (plus_constant): Pass "mode" to immed_double_int_const. > Use gen_int_mode rather than GEN_INT. > > Ok to backport to the 4.8 branch? > > I've attached the testcase that exposed the ICE. I think the ChangeLog would > look like this: > > > 2013-07-22 Kyrylo Tkachov > > PR target/57735 > Backport from mainline > 2013-04-30 Richard Sandiford > > * explow.c (plus_constant): Pass "mode" to > immed_double_int_const. > Use gen_int_mode rather than GEN_INT. > > 2013-07-22 Kyrylo Tkachov > > PR target/57735 > * g++.dg/ext/pr57735.C: New test. Ok. Please also add the testcase to the trunk. Thanks, Richard. > > > Thanks, > Kyrill
Re: Fix PR middle-end/57370
On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman wrote: > I have a new patch that supersedes this. The new patch also fixes PR > tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and > no test regression on x86_64/linux. Ok for trunk? > > 2013-07-31 Easwaran Raman > > PR middle-end/57370 > * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset > of debug statements that cause inconsistent IR. Missing ChangeLog entry for the insert_stmt_after hunk which I do not like at all. The other hunks are ok, but we need to work harder to preserve debug stmts - simply removing all is not going to fly. Richard. > > testsuite/ChangeLog: > 2013-07-31 Easwaran Raman > > PR middle-end/57370 > PR tree-optimization/57393 > PR tree-optimization/58011 > * gfortran.dg/reassoc_12.f90: New testcase. > * gcc.dg/tree-ssa/reassoc-31.c: New testcase. > * gcc.dg/tree-ssa/reassoc-31.c: New testcase. > > > On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman wrote: >> Ping. >> >> On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman wrote: >>> A newly generated statement in build_and_add_sum function of >>> tree-ssa-reassoc.c has to be assigned the UID of its adjacent >>> statement. In one instance, it was assigned the wrong uid (of an >>> earlier phi statement) which messed up the IR and caused the test >>> program to hang. Bootstraps and no test regressions on x86_64/linux. >>> Ok for trunk? >>> >>> Thanks, >>> Easwaran >>> >>> >>> 2013-06-27 Easwaran Raman >>> >>> PR middle-end/57370 >>> * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a >>> phi >>> node for a non-phi gimple statement. >>> >>> testsuite/ChangeLog: >>> 2013-06-27 Easwaran Raman >>> >>> PR middle-end/57370 >>> * gfortran.dg/reassoc_12.f90: New testcase. >>> >>> >>> Index: gcc/testsuite/gfortran.dg/reassoc_12.f90 >>> === >>> --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) >>> +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) >>> @@ -0,0 +1,74 @@ >>> +! 
{ dg-do compile } >>> +! { dg-options "-O2 -ffast-math" } >>> +! PR middle-end/57370 >>> + >>> + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, & >>> + grad_deriv,npoints, sx) >>> +IMPLICIT REAL*8 (t) >>> +INTEGER, PARAMETER :: dp=8 >>> +REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, & >>> + e_ndrho_ndrho_rho >>> + DO ii=1,npoints >>> + IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN >>> +t1425 = t233 * t557 >>> +t1429 = beta * t225 >>> +t1622 = t327 * t1621 >>> +t1626 = t327 * t1625 >>> +t1632 = t327 * t1631 >>> +t1685 = t105 * t1684 >>> +t2057 = t1636 + t8 * (t2635 + t3288) >>> + END IF >>> + IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN >>> +t5469 = t5440 - t5443 - t5446 - t5449 - & >>> +t5451 - t5454 - t5456 + t5459 - & >>> +t5462 + t5466 - t5468 >>> +t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425 >>> +t5489 = 0.16e2_dp * t1429 * t1658 >>> +t5531 = 0.160e2_dp * t112 * t1626 >>> +t5533 = 0.160e2_dp * t112 * t1632 >>> +t5537 = 0.160e2_dp * t112 * t1622 >>> +t5541 = t5472 - t5478 - t5523 + t5525 + & >>> +t5531 + t5533 + t5535 + t5537 + & >>> +t5540 >>> +t5565 = t112 * t1685 >>> +t5575 = t5545 - t5548 + t5551 + t5553 - & >>> +t5558 + t5560 - t5562 + t5564 - & >>> +0.80e1_dp * t5565 + t5568 + t5572 + & >>> +t5574 >>> +t5611 = t5579 - t5585 + t5590 - t5595 + & >>> +t5597 - t5602 + t5604 + t5607 + & >>> +t5610 >>> +t5613 = t5469 + t5541 + t5575 + t5611 >>> +t6223 = t6189 - & >>> +0.36e0_dp * t83 * t84 * t5613 + & >>> +t6222 >>> +t6227 = - t8 * (t5305 + t6223) >>> +e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii) + & >>> + t6227 * sx >>> +t6352 = t5440 - t5443 - t5446 - t5449 - & >>> +t5451 - t5454 + & >>> +0.40e1_dp * t102 * t327 * t2057 * t557 - & >>> +t5456 + t5459 - t5462 + t5466 - & >>> +t5468 >>> +t6363 = t5480 - t5489 + & >>> +0.96e2_dp * t1054 * t640 * t3679 >>> +t6367 = t5472 - t5474 - t5478 - t5523 + & >>> +t5525 + t5531 + t5533 + t5535 + & >>> +t5537 - 0.20e1_dp * t102 * t105 * t6363 + & >>> +t5540 >>> +t6370 = t5545 - t5548 + 
t5551 + t5553
Re: [PATCH] Fix PR57521
On Mon, 24 Jun 2013, Richard Biener wrote: > > The following fixes a long-standing bug in tree if-conversion. > The transform phase relies on being able to extract edge predicates > by simply using the predicate under which its source block is > executed. That obviously isn't the correct one if the source > block ends in a condition itself. Thus the following patch > verifies that each predecessor edge of blocks we will if-convert > is non-critical. > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. ... to pick up where I left. That didn't work out, we need to handle the case of one edge being non-critical only to not regress. I've dug in history where all the strange existing tests come from - and they are all for bugs exhibiting this very same problem, just they paper over it in various interesting ways. So the following removes all of them, re-doing the above fix in a less conservative way. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2013-08-27 Richard Biener PR tree-optimization/57521 * tree-if-conv.c (if_convertible_bb_p): Verify that at least one edge is non-critical. (find_phi_replacement_condition): Make sure to use a non-critical edge. Cleanup and remove old bug workarounds. (bb_postdominates_preds): Remove. (if_convertible_loop_p_1): Do not compute post-dominators. (combine_blocks): Do not free post-dominators. (main_tree_if_conversion): Likewise. (pass_data_if_conversion): Add TODO_verify_ssa. * gcc.dg/torture/pr57521.c: New testcase. Index: gcc/tree-if-conv.c === *** gcc/tree-if-conv.c (revision 202017) --- gcc/tree-if-conv.c (working copy) *** if_convertible_stmt_p (gimple stmt, vec< *** 797,816 return true; } - /* Return true when BB post-dominates all its predecessors. 
*/ - - static bool - bb_postdominates_preds (basic_block bb) - { - unsigned i; - - for (i = 0; i < EDGE_COUNT (bb->preds); i++) - if (!dominated_by_p (CDI_POST_DOMINATORS, EDGE_PRED (bb, i)->src, bb)) - return false; - - return true; - } - /* Return true when BB is if-convertible. This routine does not check basic block's statements and phis. --- 797,802 *** if_convertible_bb_p (struct loop *loop, *** 868,877 return false; } ! if (EDGE_COUNT (bb->preds) == 2 ! && bb != loop->header ! && !bb_postdominates_preds (bb)) ! return false; return true; } --- 854,876 return false; } ! /* At least one incoming edge has to be non-critical as otherwise edge ! predicates are not equal to basic-block predicates of the edge ! source. */ ! if (EDGE_COUNT (bb->preds) > 1 ! && bb != loop->header) ! { ! bool found = false; ! FOR_EACH_EDGE (e, ei, bb->preds) ! if (EDGE_COUNT (e->src->succs) == 1) ! found = true; ! if (!found) ! { ! if (dump_file && (dump_flags & TDF_DETAILS)) ! fprintf (dump_file, "only critical predecessors\n"); ! return false; ! } ! } return true; } *** if_convertible_loop_p_1 (struct loop *lo *** 1084,1090 return false; calculate_dominance_info (CDI_DOMINATORS); - calculate_dominance_info (CDI_POST_DOMINATORS); /* Allow statements that can be handled during if-conversion. */ ifc_bbs = get_loop_body_in_if_conv_order (loop); --- 1083,1088 *** if_convertible_loop_p (struct loop *loop *** 1220,1227 if-conversion. */ static basic_block ! find_phi_replacement_condition (struct loop *loop, ! basic_block bb, tree *cond, gimple_stmt_iterator *gsi) { edge first_edge, second_edge; --- 1218,1224 if-conversion. */ static basic_block ! find_phi_replacement_condition (basic_block bb, tree *cond, gimple_stmt_iterator *gsi) { edge first_edge, second_edge; *** find_phi_replacement_condition (struct l *** 1231,1264 first_edge = EDGE_PRED (bb, 0); second_edge = EDGE_PRED (bb, 1); ! /* Use condition based on following criteria: ! 1) !S1: x = !c ? a : b; ! !S2: x = c ? b : a; ! 
!S2 is preferred over S1. Make 'b' first_bb and use its condition. ! ! 2) Do not make loop header first_bb. ! ! 3) !S1: x = !(c == d)? a : b; ! !S21: t1 = c == d; !S22: x = t1 ? b : a; ! !S3: x = (c == d) ? b : a; ! !S3 is preferred over S1 and S2*, Make 'b' first_bb and use !its condition. ! ! 4) If pred B is dominated by pred A then use pred B's condition. ! See PR23115. */ ! ! /* Select condition that is not TRUTH_NOT_EXPR. */ tmp_cond = bb_predicate (first_
Re: [PATCH 11/11] Make opt_pass and gcc::pipeline be GC-managed
On Sat, Aug 3, 2013 at 2:21 AM, Richard Henderson wrote: > On 08/02/2013 11:53 AM, David Malcolm wrote: >> FWIW I had a go at avoiding templates by attempting to tell gengtype to >> write out functions for all GTY((user)) types, regardless of whether it >> thinks they're referenced, with this: >> @@ -3697,7 +3697,8 @@ write_types (outf_p output_header, type_p structures, >> type_p param_structs, >>/* At last we emit the functions code. */ >>oprintf (output_header, "\n/* functions code */\n"); >>for (s = structures; s; s = s->next) >> -if (s->gc_used == GC_POINTED_TO || s->gc_used == GC_MAYBE_POINTED_TO) >> +if (s->gc_used == GC_POINTED_TO || s->gc_used == GC_MAYBE_POINTED_TO >> +|| s->kind == TYPE_USER_STRUCT) >>{ >> options_p opt; >> >> but I ran into various "incomplete structure" errors due to gengtype's >> lack of a full C/C++ parser. >> >> Hence templates seem the sanest option. > > I was actually contemplating going the other direction -- have gengtype > write out _less_, relying on the templates more. I hadn't gone very far > down that road yet... That was the original idea, btw (well, the original idea was to get rid of gengtype completely of course). Btw, I don't like moving the pass pipeline and pass objects into GC space. It doesn't make much sense as pass execution is quite constrained so passes should not allocate objects that extend the passes lifetime. Richard.
Re: [PATCH] Fix illegal cast to rtx (*insn_gen_fn) (rtx, ...)
On Tue, Aug 27, 2013 at 11:03:32AM +0200, Richard Biener wrote: > On Wed, Jul 10, 2013 at 3:14 AM, Stefan Kristiansson > wrote: > > The (static arg) generator functions are casted to a var arg > > function pointer, making the assumption that the ABI for passing > > the arguments will be the same as for static arguments. > > This isn't a valid assumption on all architectures, var args might for > > example be passed on the stack, even if there would be function argument > > registers still available. > > > > There exists a bugreport for the problem here: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12081 > > > > This patch is taken from the suggestion by Rask Ingemann Lambertsen > > and updated to the current svn tip of trunk. > > > > gcc/Changelog: > > > > 2013-07-10 Stefan Kristiansson > > > > PR target/12081 > > * recog.h (rtx (*insn_gen_fn) (rtx, ...)): Remove typedef. > > (genfun): Replace var arg function pointer with static argument > > function pointers. > > * optabs.h (GEN_FCN): Replace single define with ones of the > > form GEN_FCN?, where ? is the number of arguments. > > * reload1.c: Use GEN_FCN? instead of GEN_FCN > > Can't we use a template for all this to avoid the union and *[12345] mess? > Oleg Endo proposed a cleaner approach using functors here: http://gcc.gnu.org/ml/gcc-patches/2013-07/msg01315.html and AFAIK that (or something close to it) is already applied. Stefan
Re: [Patch, Fortran, OOP] PR 57305: ICE when calling SIZEOF on an unlimited polymorphic variable
On 26/08/2013 16:16, Janus Weil wrote: I'm slightly inclined to kindly invite the user to switch to STORAGE_SIZE+SIZE instead. Any other opinion? >>> >>> Since the SIZEOF intrinsic has been around for some time in gfortran >>> (before STORAGE_SIZE was available), I would say we should at least >>> continue to support it for backward compatibility. And I guess we >>> should also make it behave reasonably for all inputs. However, it >>> might be worth to add a note in the documentation that STORAGE_SIZE >>> and SIZE should be used instead in standard-conforming code. >>> >> I thought about it again, and what I'm actually in favor of is >> diagnosing by default _all_ extensions having a standard replacement. > > By 'diagnosing' you mean to give a warning? This can already be > accomplished by compiling with "-std=f2008 -Wintrinsics-std", I guess > (not by default, though). > Yes, it would boil down to using that by default. > Using only -std=f2008 currently results in errors like: > > undefined reference to `sizeof_' > ... which is the best we can do, as sizeof could be a user-defined external function. Mikael
[committed] Add testcases from PR578{60,61,75,76,77}
Hi!

These issues were caused by the LRA optional reloads support that got
reverted later on, but the testcases look to be useful for the testsuite,
so I went ahead and checked them all in.

2013-08-27  Jakub Jelinek

	PR rtl-optimization/57860
	PR rtl-optimization/57861
	PR rtl-optimization/57875
	PR rtl-optimization/57876
	PR rtl-optimization/57877
	* gcc.c-torture/execute/pr57860.c: New test.
	* gcc.c-torture/execute/pr57861.c: New test.
	* gcc.c-torture/execute/pr57875.c: New test.
	* gcc.c-torture/execute/pr57876.c: New test.
	* gcc.c-torture/execute/pr57877.c: New test.

--- gcc/testsuite/gcc.c-torture/execute/pr57860.c.jj	2013-08-27 11:16:33.262457070 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr57860.c	2013-08-27 11:15:19.0 +0200
@@ -0,0 +1,25 @@
+/* PR rtl-optimization/57860 */
+
+extern void abort (void);
+int a, *b = &a, c, d, e, *f = &e, g, *h = &d, k[1] = { 1 };
+
+int
+foo (int p)
+{
+  for (;; g++)
+    {
+      for (; c; c--);
+      *f = *h = p > ((0x1LL ^ a) & *b);
+      if (k[g])
+	return 0;
+    }
+}
+
+int
+main ()
+{
+  foo (1);
+  if (d != 1)
+    abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr57861.c.jj	2013-08-27 11:12:46.882684096 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr57861.c	2013-08-27 11:12:07.0 +0200
@@ -0,0 +1,33 @@
+/* PR rtl-optimization/57861 */
+
+extern void abort (void);
+short a = 1, f;
+int b, c, d, *g = &b, h, i, j;
+unsigned int e;
+
+static int
+foo (char p)
+{
+  int k;
+  for (c = 0; c < 2; c++)
+    {
+      i = (j = 0) || p;
+      k = i * p;
+      if (e < k)
+	{
+	  short *l = &f;
+	  a = d && h;
+	  *l = 0;
+	}
+    }
+  return 0;
+}
+
+int
+main ()
+{
+  *g = foo (a);
+  if (a != 0)
+    abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr57875.c.jj	2013-08-27 11:06:09.256786405 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr57875.c	2013-08-27 11:04:58.0 +0200
@@ -0,0 +1,21 @@
+/* PR rtl-optimization/57875 */
+
+extern void abort (void);
+int a[1], b, c, d, f, i;
+char e[1];
+
+int
+main ()
+{
+  for (; i < 1; i++)
+    if (!d)
+      {
+	if (!c)
+	  f = 2;
+	e[0] &= f ^= 0;
+      }
+  b = a[e[0] >> 1 & 1];
+  if (b != 0)
+    abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr57876.c.jj	2013-08-27 11:06:12.224775684 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr57876.c	2013-08-27 11:04:20.0 +0200
@@ -0,0 +1,27 @@
+/* PR rtl-optimization/57876 */
+
+extern void abort (void);
+int a, b = 1, c, *d = &c, f, *g, h, j;
+static int e;
+
+int
+main ()
+{
+  int i;
+  for (i = 0; i < 2; i++)
+    {
+      long long k = b;
+      int l;
+      for (f = 0; f < 8; f++)
+	{
+	  int *m = &e;
+	  j = *d;
+	  h = a * j - 1;
+	  *m = (h == 0) < k;
+	  g = &l;
+	}
+    }
+  if (e != 1)
+    abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr57877.c.jj	2013-08-27 11:06:15.310759088 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr57877.c	2013-08-27 11:03:56.0 +0200
@@ -0,0 +1,28 @@
+/* PR rtl-optimization/57877 */
+
+extern void abort (void);
+int a, b, *c = &b, e, f = 6, g, h;
+short d;
+
+static unsigned char
+foo (unsigned long long p1, int *p2)
+{
+  for (; g <= 0; g++)
+    {
+      short *i = &d;
+      int *j = &e;
+      h = *c;
+      *i = h;
+      *j = (*i == *p2) < p1;
+    }
+  return 0;
+}
+
+int
+main ()
+{
+  foo (f, &a);
+  if (e != 1)
+    abort ();
+  return 0;
+}

	Jakub
Re: [AArch64] Rewrite the vdup_lane intrinsics in C
On 9 August 2013 10:48, James Greenhalgh wrote:
> ---
> gcc/
>
> 2013-08-09  James Greenhalgh
>
>	* config/aarch64/aarch64-simd-builtins.def
>	(dup_lane_scalar): Remove.
>	* config/aarch64/aarch64-simd.md
>	(aarch64_simd_dup): Add 'w->w' alternative.
>	(aarch64_dup_lane): Allow for VALL.
>	(aarch64_dup_lane_scalar): Remove.
>	(aarch64_dup_lane_): New.
>	(aarch64_get_lane_signed): Add w->w alternative.
>	(aarch64_get_lane_unsigned): Likewise.
>	(aarch64_get_lane): Likewise.
>	* config/aarch64/aarch64.c (aarch64_evpc_dup): New.
>	(aarch64_expand_vec_perm_const_1): Use aarch64_evpc_dup.
>	* config/aarch64/iterators.md (VSWAP_WIDTH): New.
>	(VCON): Change container of V2SF.
>	(vswap_width_name): Likewise.
>	* config/aarch64/arm_neon.h
>	(__aarch64_vdup_lane_any): New.
>	(__aarch64_vdup_lane_<8,16,32,64>): Likewise.
>	(vdup_n_<8,16,32,64>): Convert to C implementation.
>	(vdup_lane_<8,16,32,64>): Likewise.
>
> gcc/testsuite/
>
> 2013-08-09  James Greenhalgh
>
>	* gcc.target/aarch64/scalar_intrinsics.c
>	(vdup_lane<8,16,32,64>): Force values to SIMD registers.

OK
/Marcus
Re: SSA identifiers
On Thu, Jun 27, 2013 at 8:52 PM, Andrew MacLeod wrote:
> On 06/27/2013 02:39 PM, Jakub Jelinek wrote:
>>
>> in tree-ssanames.c:release_ssa_names() :
>>
>>   if (! SSA_NAME_IN_FREE_LIST (var))
>>     {
>>       tree saved_ssa_name_var = SSA_NAME_VAR (var);
>>       int saved_ssa_name_version = SSA_NAME_VERSION (var);
>>       use_operand_p imm = &(SSA_NAME_IMM_USE_NODE (var));
>> <..>
>>       /* Hopefully this can go away once we have the new incremental
>>	    SSA updating code installed.  */
>>       SET_SSA_NAME_VAR_OR_IDENTIFIER (var, saved_ssa_name_var);
>>
>> I don't see a big issue with this, sure, you could
>>   tree saved_ssa_name_identifier
>>     = saved_ssa_name_var ? saved_ssa_name_var : SSA_NAME_IDENTIFIER (var);
>> and use that instead in SET_SSA_NAME_VAR_OR_IDENTIFIER.
>
> Yeah, I wasn't too concerned about this one; the out-of-ssa case looked like
> more of a possible issue.  Maybe neither is, they just popped out as
> inconsistent uses.

Restoring SSA_NAME_VAR_OR_IDENTIFIER is only for debugging.  Yes, it
probably should save SSA_NAME_VAR_OR_IDENTIFIER instead of SSA_NAME_VAR.

Richard.

> Andrew
>
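The save/clear/restore pattern under discussion can be illustrated with a deliberately tiny stand-in. This is not GCC's tree-ssanames.c code; the struct and field names below are invented for the sketch, standing in for SSA_NAME_VAR, SSA_NAME_VERSION and SSA_NAME_IN_FREE_LIST:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an SSA name: a handle with an associated
   declaration pointer and a version number.  */
struct ssa_name
{
  const char *var;	/* stands in for SSA_NAME_VAR */
  int version;		/* stands in for SSA_NAME_VERSION */
  int in_free_list;	/* stands in for SSA_NAME_IN_FREE_LIST */
};

/* Release NAME onto a hypothetical free list.  As in the code being
   discussed, the identifying fields are saved before the object is
   cleared for recycling and then written back, purely so debugging
   dumps can still see which variable the name belonged to.  */
static void
release_name (struct ssa_name *name)
{
  if (!name->in_free_list)
    {
      const char *saved_var = name->var;
      int saved_version = name->version;

      /* Clear the object as if it were being recycled...  */
      name->var = NULL;
      name->version = 0;

      /* ...then restore the identity for debugging only.  */
      name->var = saved_var;
      name->version = saved_version;
      name->in_free_list = 1;
    }
}
```

The point Richard makes maps onto this sketch: for debugging it would be equally valid (and more consistent) to save the identifier rather than the declaration, since the restored value is never used for correctness.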
Re: [PATCH] Improve jump threading using VRP information
On Thu, Jun 27, 2013 at 6:29 AM, Jeff Law wrote:
>
> Just something else I saw while analyzing dumps from an unrelated set of
> changes.
>
> It's relatively common to see sequences like this:
>
>   # parent_1 = PHI
>   _11 = single_tree_10(D) != 0;
>   _12 = parent_1 == 0B;
>   _13 = _11 & _12;
>   if (_13 != 0)
>     goto ;
>   else
>     goto ;
>
> Where VRP can deduce that the value of parent_6 has a nonzero value on one
> (or more) of the paths reaching this block (because those paths dereference
> parent_6).
>
> Obviously when VRP knows parent_6 is nonzero, then we know the current block
> will always transfer control to BB 7.
>
> Or something like this:
>
>   :
>   # prephitmp_49 = PHI <1(4), pretmp_48(5)>
>   prev_line.31_6 = prev_line;
>   _8 = prev_line.31_6 != line_7(D);
>   line_differs_9 = (unsigned char) _8;
>   _10 = prephitmp_49 | line_differs_9;
>   if (_10 != 0)
>     goto ;
>   else
>     goto ;
>
> We may not know the exact value of _10, but VRP can determine that when BB6
> is reached from BB4 that BB6 will always transfer control to BB8.
>
> I never coded up the bits to utilize VRP information to simplify statements
> for threading except for COND_EXPRs.  It wasn't clear how often it would be
> useful, thus I took the easy way out.  This picks up a hundred or so
> additional threading opportunities in my testfiles.  Not huge, but worth it
> IMHO.
>
> Bootstrapped and regression tested on x86_64-unknown-linux-gnu.
>
> OK for trunk?

Ok if it still applies.

Thanks,
Richard.

>	* tree-vrp.c (simplify_stmt_for_jump_threading): Try to
>	simplify assignments too.  If the RHS collapses to a singleton
>	range, then return the value for the range.
>
>	* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c
> new file mode 100644
> index 000..9d9473e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-vrp1-details" } */
> +
> +
> +struct basic_block_def;
> +typedef struct basic_block_def *basic_block;
> +enum gimple_code
> +{
> +  LAST_AND_UNUSED_GIMPLE_CODE
> +};
> +struct omp_region
> +{
> +  struct omp_region *outer;
> +  basic_block cont;
> +};
> +void
> +build_omp_regions_1 (basic_block bb, struct omp_region *parent,
> +		     unsigned char single_tree, enum gimple_code code)
> +{
> +  if (code == 25)
> +    parent = parent->outer;
> +  else if (code == 42)
> +    parent->cont = bb;
> +  if (single_tree && !parent)
> +    return;
> +  oof ();
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Threaded" 1 "vrp1" } } */
> +/* { dg-final { cleanup-tree-dump "vrp1" } } */
> +
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index ec7ef8f..683a3db 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -9109,15 +9109,27 @@ static vec equiv_stack;
>  static tree
>  simplify_stmt_for_jump_threading (gimple stmt, gimple within_stmt)
>  {
> -  /* We only use VRP information to simplify conditionals.  This is
> -     overly conservative, but it's unclear if doing more would be
> -     worth the compile time cost.  */
> -  if (gimple_code (stmt) != GIMPLE_COND)
> -    return NULL;
> +  if (gimple_code (stmt) == GIMPLE_COND)
> +    return vrp_evaluate_conditional (gimple_cond_code (stmt),
> +				     gimple_cond_lhs (stmt),
> +				     gimple_cond_rhs (stmt), within_stmt);
> +
> +  if (gimple_code (stmt) == GIMPLE_ASSIGN)
> +    {
> +      value_range_t new_vr = VR_INITIALIZER;
> +      tree lhs = gimple_assign_lhs (stmt);
> +
> +      if (TREE_CODE (lhs) == SSA_NAME
> +	  && (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
> +	      || POINTER_TYPE_P (TREE_TYPE (lhs))))
> +	{
> +	  extract_range_from_assignment (&new_vr, stmt);
> +	  if (range_int_cst_singleton_p (&new_vr))
> +	    return new_vr.min;
> +	}
> +    }
>
> -  return vrp_evaluate_conditional (gimple_cond_code (stmt),
> -				   gimple_cond_lhs (stmt),
> -				   gimple_cond_rhs (stmt), within_stmt);
> +  return NULL_TREE;
>  }
>
>  /* Blocks which have more than one predecessor and more than
>
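The patch's core idea — treat an assignment whose computed value range collapses to a single constant as if it were that constant, and feed it to the jump threader — can be sketched with a deliberately tiny stand-in. The real value_range_t in tree-vrp.c is far richer (anti-ranges, equivalences, tree operands); the names below are invented for illustration:

```c
#include <assert.h>

/* Hypothetical miniature value range: just a closed interval.  */
struct range
{
  long min;
  long max;
};

/* Mirror of the patch's logic: if the range computed for an
   assignment's RHS collapses to a single constant, that constant can
   be returned to the jump threader; otherwise nothing is known.
   Returns 1 and stores the constant on success, 0 otherwise.  */
static int
singleton_value (const struct range *r, long *value)
{
  if (r->min == r->max)
    {
      *value = r->min;
      return 1;
    }
  return 0;
}
```

In the real code this corresponds to extract_range_from_assignment followed by range_int_cst_singleton_p; the conditional case is still handled separately by vrp_evaluate_conditional.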
Re: Document __builtin_isinf_sign more precisely
On Wed, Jun 26, 2013 at 4:16 PM, Marc Glisse wrote:
> Ping http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00475.html

Ok.

Thanks,
Richard.

> On Sun, 9 Jun 2013, Marc Glisse wrote:
>
>> Hello,
>>
>> this patch documents that __builtin_isinf_sign returns +-1 for +-Inf.  This
>> builtin was created so it could be used by a libc that has a stronger
>> guarantee than the standard, and for glibc that means returning +-1, which
>> is what the code already does.
>>
>> 2013-06-10  Marc Glisse
>>
>>	PR middle-end/57219
>>	* doc/extend.texi (__builtin_isinf_sign): Restrict the return
>>	values to -1, 0 and 1.
>
> --
> Marc Glisse
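A quick usage sketch of the guarantee being documented, assuming a compiler that provides the builtin (GCC does): the return value is exactly -1, 0 or 1, not merely an arbitrary negative, zero or positive value.

```c
#include <assert.h>
#include <math.h>

/* Demonstrates the strengthened documentation: __builtin_isinf_sign
   returns 1 for +Inf, -1 for -Inf, and 0 for any finite value.  */
static void
check_isinf_sign (void)
{
  assert (__builtin_isinf_sign (INFINITY) == 1);
  assert (__builtin_isinf_sign (-INFINITY) == -1);
  assert (__builtin_isinf_sign (0.0) == 0);
  assert (__builtin_isinf_sign (42.5) == 0);
}
```

This is exactly the stronger contract a libc such as glibc can rely on when implementing isinf_sign-style interfaces on top of the builtin.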
Re: [patch] Fix error recovery issue with alias
On Tue, Jun 25, 2013 at 9:23 AM, Eric Botcazou wrote:
> Hi,
>
> this fixes a segfault on a malformed alias declaration, which is correctly
> flagged as an error by handle_alias_pairs:
>
>   error: 'Linker_Alias.Var' aliased to undefined symbol 'var2'
>
> but is nevertheless later processed as a valid alias by the cgraph machinery.
>
> Bootstrapped/regtested on x86_64-suse-linux, OK for the mainline?

Ok if it still applies.

Thanks,
Richard.

> 2013-06-25  Eric Botcazou
>
>	* cgraphunit.c (handle_alias_pairs): Reset the alias flag after the
>	error message is issued for an alias to an undefined symbol.
>
> 2013-06-25  Eric Botcazou
>
>	* gnat.dg/specs/linker_alias.ads: New test.
>
> --
> Eric Botcazou
[PATCH, i386] PR 57927: -march=core-avx2 different than -march=native on INTEL Haswell (i7-4700K)
Hello!

As reported in [1], the host processor detection has not yet been updated
to recognize Intel Ivy Bridge and Haswell processors.  This small patch
adds the detection of these processors and assumes core-avx2 as march for
unknown processors of the PENTIUMPRO family that support AVX2.

Best regards
Christian Widmer

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57927

2013-08-27  Christian Widmer

	PR target/57927
	* config/i386/driver-i386.c (host_detect_local_cpu): Add detection
	of Ivy Bridge and Haswell processors.  Assume core-avx2 for unknown
	AVX2 capable processors.

Index: gcc/config/i386/driver-i386.c
===================================================================
*** gcc/config/i386/driver-i386.c	(revision 202013)
--- gcc/config/i386/driver-i386.c	(working copy)
*** const char *host_detect_local_cpu (int a
*** 662,667 ****
--- 662,675 ----
 	      /* Sandy Bridge.  */
 	      cpu = "corei7-avx";
 	      break;
+	    case 0x3a:
+	      /* Ivy Bridge.  */
+	      cpu = "corei7-avx";
+	      break;
+	    case 0x3c:
+	      /* Haswell.  */
+	      cpu = "core-avx2";
+	      break;
 	    case 0x17:
 	    case 0x1d:
 	      /* Penryn.  */
*** const char *host_detect_local_cpu (int a
*** 675,682 ****
 	  if (arch)
 	    {
 	      /* This is unknown family 0x6 CPU.  */
!	      if (has_avx)
!		/* Assume Sandy Bridge.  */
 		cpu = "corei7-avx";
 	      else if (has_sse4_2)
 		{
--- 683,693 ----
 	  if (arch)
 	    {
 	      /* This is unknown family 0x6 CPU.  */
!	      if (has_avx2)
!		/* Assume Haswell.  */
!		cpu = "core-avx2";
!	      else if (has_avx)
!		/* Assume Sandy/Ivy Bridge.  */
 		cpu = "corei7-avx";
 	      else if (has_sse4_2)
 		{
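For context, the model numbers the driver switches on (0x3a for Ivy Bridge, 0x3c for Haswell) are assembled from CPUID leaf 1's EAX: for family 6 the extended-model bits are shifted in above the 4-bit base model field. A minimal decoder sketch — the helper name is invented, but the bit layout follows Intel's documented CPUID encoding:

```c
#include <assert.h>

/* Decode the display model from CPUID leaf 1 EAX.  For families 6 and
   15 the extended model field (bits 19:16) is combined with the base
   model field (bits 7:4); for other families the base model is used
   directly.  */
static unsigned int
cpu_model (unsigned int eax)
{
  unsigned int family = (eax >> 8) & 0xf;
  unsigned int model = (eax >> 4) & 0xf;

  if (family == 0x6 || family == 0xf)
    model |= ((eax >> 16) & 0xf) << 4;
  return model;
}
```

With this decoding, an Ivy Bridge signature such as 0x306a0 yields model 0x3a and a Haswell signature such as 0x306c0 yields 0x3c, which is what the new switch cases in the patch match on.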
Re: [RFC Patch, Aarch64] : Macros for profile code generation to enable gprof support
Hi Venkat,

On 3 August 2013 19:01, Venkataramanan Kumar wrote:
> This patch adds macros to support gprof in Aarch64.  The difference
> from the previous patch is that the compiler, while generating the
> "mcount" routine for an instrumented function, also passes the return
> address as argument.
>
> The "mcount" routine in glibc will be modified as follows.
>
> (-Snip-)
> #define MCOUNT \
> -void __mcount (void)						\
> +void __mcount (void* frompc)					\
> {								\
> -  mcount_internal ((u_long) RETURN_ADDRESS (1), (u_long) RETURN_ADDRESS (0)); \
> +  mcount_internal ((u_long) frompc, (u_long) RETURN_ADDRESS (0)); \
> }
> (-Snip-)
>
> If this is Ok I will send the patch to glibc as well.
>
> 2013-08-02  Venkataramanan Kumar
>
>	* config/aarch64/aarch64.h (MCOUNT_NAME): Define.
>	(NO_PROFILE_COUNTERS): Likewise.
>	(PROFILE_HOOK): Likewise.
>	(FUNCTION_PROFILER): Likewise.
>	* config/aarch64/aarch64.c (aarch64_function_profiler): Remove.
>
> regards,
> Venkat.

+  emit_library_call (fun, LCT_NORMAL, VOIDmode, 1,lr,Pmode); \
+}

GNU coding style requires spaces after the commas, but otherwise I have
no further comments on this patch.  Post the glibc patch please.

Thanks
/Marcus
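Under the proposed scheme the compiler materializes the caller's return address and passes it to `__mcount` directly, so glibc no longer needs the fragile `RETURN_ADDRESS (1)` walk. A rough sketch of the receiving side — `mcount_internal` here is a toy arc counter invented for illustration, not glibc's implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical arc recorder standing in for glibc's mcount_internal:
   it counts calls per (caller, callee) pair in a tiny fixed table.  */
#define MAX_ARCS 16
static struct { uintptr_t from, self; unsigned count; } arcs[MAX_ARCS];
static int n_arcs;

static void
mcount_internal (uintptr_t frompc, uintptr_t selfpc)
{
  for (int i = 0; i < n_arcs; i++)
    if (arcs[i].from == frompc && arcs[i].self == selfpc)
      {
	arcs[i].count++;
	return;
      }
  if (n_arcs < MAX_ARCS)
    {
      arcs[n_arcs].from = frompc;
      arcs[n_arcs].self = selfpc;
      arcs[n_arcs].count = 1;
      n_arcs++;
    }
}

/* With the patched ABI the compiler passes the caller's return address
   as the first argument, so only RETURN_ADDRESS (0) is still needed.  */
void
__mcount (void *frompc)
{
  mcount_internal ((uintptr_t) frompc,
		   (uintptr_t) __builtin_return_address (0));
}
```

The (caller, callee) arcs recorded this way are what gprof later uses to build its call graph.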
Re: [Patch,AArch64] Support SISD Shifts (SHL/USHR/SSHL/USHL/SSHR)
On 20 August 2013 16:04, Vidya Praveen wrote:
>> 2013-08-20  Vidya Praveen
>>
>>	* config/aarch64/aarch64.md (unspec): Add UNSPEC_SISD_SSHL,
>>	UNSPEC_SISD_USHL, UNSPEC_USHL_2S, UNSPEC_SSHL_2S, UNSPEC_SISD_NEG.
>>	(3_insn): Remove.
>>	(aarch64_ashl_sisd_or_int_3): New Pattern.
>>	(aarch64_lshr_sisd_or_int_3): Likewise.
>>	(aarch64_ashr_sisd_or_int_3): Likewise.
>>	(define_split for aarch64_lshr_sisd_or_int_di3): Likewise.
>>	(define_split for aarch64_lshr_sisd_or_int_si3): Likewise.
>>	(define_split for aarch64_ashr_sisd_or_int_di3): Likewise.
>>	(define_split for aarch64_ashr_sisd_or_int_si3): Likewise.
>>	(aarch64_sisd_ushl, aarch64_sisd_sshl): Likewise.
>>	(aarch64_ushl_2s, aarch64_sshl_2s, aarch64_sisd_neg_qi): Likewise.
>>	(ror3_insn): Likewise.
>>	* config/aarch64/predicates.md (aarch64_simd_register): New.
>>
>> gcc/testsuite/ChangeLog
>>
>> 2013-08-20  Vidya Praveen
>>
>>	* gcc.target/aarch64/scalar_shift_1.c: New.

OK
/Marcus
Re: [Patch, AArch64] Optimized implementation of vget_low_* in arm_neon.h.
On 20 August 2013 12:21, Tejas Belagod wrote:
> Hi,
>
> This patch replaces all inline asm implementations of vget_low_* in
> arm_neon.h with optimized implementations using other neon intrinsics.
>
> Tested with aarch64-none-elf.
>
> OK?

This is OK.
/Marcus
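For readers unfamiliar with the intrinsic family being rewritten: `vget_low_*` simply extracts the low 64-bit half of a 128-bit NEON vector. A portable sketch of those semantics using GCC's generic vector extension — the function name is invented, and the real arm_neon.h versions use NEON types on an AArch64 target:

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit and 64-bit vectors of int32_t via GCC's vector extension,
   mimicking int32x4_t and int32x2_t from arm_neon.h.  */
typedef int32_t v4si __attribute__ ((vector_size (16)));
typedef int32_t v2si __attribute__ ((vector_size (8)));

/* Illustrative equivalent of vget_low_s32: lanes 0 and 1 form the low
   half of the 128-bit register.  */
static v2si
my_vget_low_s32 (v4si a)
{
  v2si r = { a[0], a[1] };
  return r;
}
```

Expressing the operation through intrinsics (or, as here, plain vector code) instead of inline asm is what lets the compiler see through it and optimize across the call, which is the motivation for the patch.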