[PATCH] Fix cxx_eval_bit_field_ref (PR c++/49136)
Hi!

optimize_bit_field_compare during folding can create BIT_FIELD_REFs
that reference more than a single bitfield, then mask the right bits
from it.  The following patch changes cxx_eval_bit_field_ref to be
able to read the multiple fields from the constructor.

Bootstrapped/regtested on x86_64-linux and i686-linux, acked by Jason
in bugzilla, committed to trunk/4.6.

2011-05-25  Jakub Jelinek  ja...@redhat.com

        PR c++/49136
        * semantics.c (cxx_eval_bit_field_ref): Handle the case when
        BIT_FIELD_REF doesn't cover only a single field.

        * g++.dg/cpp0x/constexpr-bitfield2.C: New test.
        * g++.dg/cpp0x/constexpr-bitfield3.C: New test.

--- gcc/cp/semantics.c.jj 2011-05-20 08:14:06.0 +0200
+++ gcc/cp/semantics.c 2011-05-24 18:57:00.0 +0200
@@ -6442,6 +6442,9 @@ cxx_eval_bit_field_ref (const constexpr_
                         bool *non_constant_p)
 {
   tree orig_whole = TREE_OPERAND (t, 0);
+  tree retval, fldval, utype, mask;
+  bool fld_seen = false;
+  HOST_WIDE_INT istart, isize;
   tree whole = cxx_eval_constant_expression (call, orig_whole,
                                              allow_non_constant, addr,
                                              non_constant_p);
@@ -6462,12 +6465,47 @@ cxx_eval_bit_field_ref (const constexpr_
     return t;

   start = TREE_OPERAND (t, 2);
+  istart = tree_low_cst (start, 0);
+  isize = tree_low_cst (TREE_OPERAND (t, 1), 0);
+  utype = TREE_TYPE (t);
+  if (!TYPE_UNSIGNED (utype))
+    utype = build_nonstandard_integer_type (TYPE_PRECISION (utype), 1);
+  retval = build_int_cst (utype, 0);
   FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (whole), i, field, value)
     {
-      if (bit_position (field) == start)
+      tree bitpos = bit_position (field);
+      if (bitpos == start && DECL_SIZE (field) == TREE_OPERAND (t, 1))
         return value;
+      if (TREE_CODE (TREE_TYPE (field)) == INTEGER_TYPE
+          && TREE_CODE (value) == INTEGER_CST
+          && host_integerp (bitpos, 0)
+          && host_integerp (DECL_SIZE (field), 0))
+        {
+          HOST_WIDE_INT bit = tree_low_cst (bitpos, 0);
+          HOST_WIDE_INT sz = tree_low_cst (DECL_SIZE (field), 0);
+          HOST_WIDE_INT shift;
+          if (bit >= istart && bit + sz <= istart + isize)
+            {
+              fldval = fold_convert (utype, value);
+              mask = build_int_cst_type (utype, -1);
+              mask = fold_build2 (LSHIFT_EXPR, utype, mask,
+                                  size_int (TYPE_PRECISION (utype) - sz));
+              mask = fold_build2 (RSHIFT_EXPR, utype, mask,
+                                  size_int (TYPE_PRECISION (utype) - sz));
+              fldval = fold_build2 (BIT_AND_EXPR, utype, fldval, mask);
+              shift = bit - istart;
+              if (BYTES_BIG_ENDIAN)
+                shift = TYPE_PRECISION (utype) - shift - sz;
+              fldval = fold_build2 (LSHIFT_EXPR, utype, fldval,
+                                    size_int (shift));
+              retval = fold_build2 (BIT_IOR_EXPR, utype, retval, fldval);
+              fld_seen = true;
+            }
+        }
     }
-  gcc_unreachable();
+  if (fld_seen)
+    return fold_convert (TREE_TYPE (t), retval);
+  gcc_unreachable ();
   return error_mark_node;
 }

--- gcc/testsuite/g++.dg/cpp0x/constexpr-bitfield2.C.jj 2011-05-24 14:37:39.0 +0200
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-bitfield2.C 2011-05-24 14:36:43.0 +0200
@@ -0,0 +1,19 @@
+// PR c++/49136
+// { dg-do compile }
+// { dg-options "-std=c++0x" }
+
+struct day
+{
+  unsigned d : 5;
+  unsigned n : 3;
+  constexpr explicit day (int dd) : d(dd), n(7) {}
+};
+
+struct date {
+  int d;
+  constexpr date (day dd) : d(dd.n != 7 ? 7 : dd.d) {}
+};
+
+constexpr day d(0);
+constexpr date dt(d);
+static_assert (dt.d == 0, "Error");
--- gcc/testsuite/g++.dg/cpp0x/constexpr-bitfield3.C.jj 2011-05-24 14:37:43.0 +0200
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-bitfield3.C 2011-05-24 14:43:40.0 +0200
@@ -0,0 +1,33 @@
+// PR c++/49136
+// { dg-do compile }
+// { dg-options "-std=c++0x" }
+
+struct S
+{
+  unsigned : 1; unsigned s : 27; unsigned : 4;
+  constexpr S (unsigned int x) : s(x) {}
+};
+
+template <typename S>
+struct T
+{
+  unsigned int t;
+  constexpr T (S s) : t(s.s != 7 ? 0 : s.s) {}
+  constexpr T (S s, S s2) : t(s.s != s2.s ? 0 : s.s) {}
+};
+
+constexpr S s (7), s2 (7);
+constexpr T<S> t (s), t2 (s, s2);
+static_assert (t.t == 7, "Error");
+static_assert (t2.t == 7, "Error");
+
+struct U
+{
+  int a : 1; int s : 1;
+  constexpr U (int x, int y) : a (x), s (y) {}
+};
+
+constexpr U u (0, -1), u2 (-1, -1);
+constexpr T<U> t3 (u), t4 (u, u2);
+static_assert (t3.t == 0, "Error");
+static_assert (t4.t == -1, "Error");

        Jakub
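For readers following the patch above, here is a minimal standalone sketch of the mask-and-shift idea it describes: every constant field that lies entirely inside the referenced bit range is masked to its width, shifted to its position relative to the start of the range, and OR-ed into one unsigned word. This is not the GCC code. The names field_desc and merge_fields are hypothetical, the word is fixed at 32 bits, and only little-endian bit numbering is handled (the patch additionally adjusts the shift for BYTES_BIG_ENDIAN).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one bit-field with a constant value:
   bit offset inside the containing object, width in bits, and value.  */
struct field_desc
{
  unsigned bit;
  unsigned size;
  uint32_t value;
};

/* Merge all fields lying entirely inside [start, start + size) into one
   unsigned word (little-endian bit numbering only).  *seen is set to 1
   if at least one field contributed, mirroring fld_seen in the patch.  */
static uint32_t
merge_fields (const struct field_desc *f, unsigned n,
              unsigned start, unsigned size, int *seen)
{
  uint32_t retval = 0;

  assert (size >= 1 && size <= 32);
  *seen = 0;
  for (unsigned i = 0; i < n; i++)
    {
      unsigned bit = f[i].bit, sz = f[i].size;

      if (sz == 0 || sz > 32)
        continue;
      if (bit >= start && bit + sz <= start + size)
        {
          /* Keep only the low SZ bits of the value, so that a negative
             value of a signed field does not spill into its neighbours.  */
          uint32_t mask = sz == 32 ? UINT32_MAX : (UINT32_C (1) << sz) - 1;

          retval |= (f[i].value & mask) << (bit - start);
          *seen = 1;
        }
    }
  return retval;
}
```

With the layout of struct day from constexpr-bitfield2.C (d in bits 0..4 holding 0, n in bits 5..7 holding 7), merging over an 8-bit range starting at bit 0 gives 0xE0 under these assumptions.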
[PATCH] Fix a typo in i386 host_detect_local_cpu (PR target/49128)
Hi!

Committed as obvious.

2011-05-25  Jakub Jelinek  ja...@redhat.com

        PR target/49128
        * config/i386/driver-i386.c (host_detect_local_cpu): Fix a typo.

--- gcc/config/i386/driver-i386.c (revision 174170)
+++ gcc/config/i386/driver-i386.c (revision 174171)
@@ -696,7 +696,7 @@ const char *host_detect_local_cpu (int a
       const char *bmi = has_bmi ? " -mbmi" : " -mno-bmi";
       const char *tbm = has_tbm ? " -mtbm" : " -mno-tbm";
       const char *avx = has_avx ? " -mavx" : " -mno-avx";
-      const char *sse4_2 = has_sse4_2 ? " -msse4.2" : " -mno-msse4.2";
+      const char *sse4_2 = has_sse4_2 ? " -msse4.2" : " -mno-sse4.2";
       const char *sse4_1 = has_sse4_1 ? " -msse4.1" : " -mno-sse4.1";

       options = concat (options, cx16, sahf, movbe, ase, pclmul,

        Jakub
Fix PR 49014
Hello,

This patch fixes PR 49014, yet another case of an insn with a wrong
reservation.  Approved by Uros in the PR audit trail, bootstrapped and
regtested on x86-64/linux and committed to trunk.

Vlad, Bernd, I wonder if we can avoid having recog_memoized >= 0 insns
that do not have proper DFA reservations (that is, they do not change
the DFA state).  I see that existing practice allows this, as shown by
Bernd's patch for 48403, i.e. such insns do not count against
issue_rate.  I would be happy to fix sel-sched in the same way.
However, both sel-sched ICEs, as shown by PRs 48143 and 49014, really
uncover latent bugs in the backend.  So, is it possible to stop having
such insns if scheduling is desired, or otherwise distinguish the insns
that wrongly miss the proper DFA reservation?

Yours, Andrey

Index: gcc/ChangeLog
===================================================================
*** gcc/ChangeLog (revision 174171)
--- gcc/ChangeLog (working copy)
***************
*** 1,3 ****
--- 1,8 ----
+ 2011-05-25  Andrey Belevantsev  a...@ispras.ru
+
+       PR rtl-optimization/49014
+       * config/i386/athlon.md (athlon_ssecomi): Change type to ssecomi.
+
  2011-05-25  Jakub Jelinek  ja...@redhat.com

        PR target/49128
Index: gcc/config/i386/athlon.md
===================================================================
*** gcc/config/i386/athlon.md (revision 174171)
--- gcc/config/i386/athlon.md (working copy)
*************** (define_insn_reservation "athlon_ssecomi
*** 798,804 ****
                         "athlon-direct,athlon-fploadk8,athlon-fadd")
  (define_insn_reservation "athlon_ssecomi" 4
                         (and (eq_attr "cpu" "athlon,k8,generic64")
!                             (eq_attr "type" "ssecmp"))
                         "athlon-vector,athlon-fpsched,athlon-fadd")
  (define_insn_reservation "athlon_ssecomi_amdfam10" 3
                         (and (eq_attr "cpu" "amdfam10")
--- 798,804 ----
                         "athlon-direct,athlon-fploadk8,athlon-fadd")
  (define_insn_reservation "athlon_ssecomi" 4
                         (and (eq_attr "cpu" "athlon,k8,generic64")
!                             (eq_attr "type" "ssecomi"))
                         "athlon-vector,athlon-fpsched,athlon-fadd")
  (define_insn_reservation "athlon_ssecomi_amdfam10" 3
                         (and (eq_attr "cpu" "amdfam10")
Re: [patch][simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate
On 24/05/11 20:35, Joseph S. Myers wrote:
> On Tue, 24 May 2011, Andrew Stubbs wrote:
>
> > I've created this new, simpler patch that converts
> >
> >   (extend (mult a b))
> >
> > into
> >
> >   (mult (extend a) (extend b))
> >
> > regardless of what 'a' and 'b' might be. (These are then simplified
> > and superfluous extends removed, of course.)
>
> Are there some missing conditions here?  The two aren't equivalent in
> general - (extend:SI (mult:HI a b)) multiplies the HImode values in
> HImode (with modulo arithmetic on overflow) before extending the
> possibly wrapped result to SImode.  You'd need a and b themselves to
> be extended from narrower modes in such a way that if you interpret
> the extended values in the signedness of the outer extension, the
> result of the multiplication is exactly representable in the mode of
> the multiplication.  (For example, if both values are extended from
> QImode, and all extensions have the same signedness, that would be OK.
> There are cases that are OK where not all extensions have the same
> signedness, e.g. (sign_extend:DI (mult:SI a b)) where a and b are
> zero-extended from HImode or QImode, at least one from QImode, though
> there the outer extension is equivalent to a zero-extension.)

So, you're saying that promoting a regular multiply to a widening
multiply isn't a valid transformation anyway? I suppose that does make
sense. I knew something was too easy.

OK, I'll go try again. :)

Andrew
Re: [testsuite] remove XFAIL for all but ia64 for g++.dg/tree-ssa/pr43411.C
Janis Johnson jani...@codesourcery.com writes:

> Archived test results for 4.7.0 for most processors with C++ results have:
>
> XPASS: g++.dg/tree-ssa/pr43411.C scan-tree-dump-not optimized OBJ_TYPE_REF
>
> The only failures I could find were for ia64-linux and ia64-hpux.  This
> patch changes the xfail so it only applies to ia64-*-*.  OK for trunk?

Richard rejected a similar patch:

        http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00054.html

Perhaps Jan can suggest the correct approach?

        Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PING][PATCH 13/18] move TS_EXP to be a substructure of TS_TYPED
On Tue, May 24, 2011 at 7:34 PM, Nathan Froyd froy...@codesourcery.com wrote:
> On Mon, May 23, 2011 at 04:58:06PM +0200, Richard Guenther wrote:
> > On Mon, May 23, 2011 at 4:18 PM, Nathan Froyd froy...@codesourcery.com wrote:
> > > On 05/17/2011 11:31 AM, Nathan Froyd wrote:
> > > > On 05/10/2011 04:18 PM, Nathan Froyd wrote:
> > > > > On 03/10/2011 11:23 PM, Nathan Froyd wrote:
> > > > > > After all that, we can finally make tree_exp inherit from
> > > > > > typed_tree.  Quite anticlimactic.
> > > > >
> > > > > Ping.  http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00559.html
> > > >
> > > > Ping^2.
> > >
> > > Ping^3 to put it in Richi's INBOX. ;)
> >
> > Ok ;)
> >
> > Please check for sizeof () uses of the structs you touched so far.
> > ISTR a bug about fold-checking.
>
> That doesn't apply here, because I'm not renaming the struct.  But I did
> find some problems with LTO when I was rebootstrapping prior to
> committing; not sure how I missed these the first time through, maybe I
> was mistakenly compiling without LTO support.  Since we now have things
> being dumped to LTO that don't have TREE_CHAIN, we need to take care to
> not access TREE_CHAIN on such things, which the patch below does.
>
> Tested on x86_64-unknown-linux-gnu.  OK to commit?

Ok.

Please see if you can adjust the lto-streamer-in/out.c machinery to
consistently handle the new TS_ classes.

Thanks,
Richard.

> -Nathan
>
> gcc/
>       * tree.h (struct tree_exp): Inherit from struct tree_typed.
>       * tree.c (initialize_tree_contains_struct): Mark TS_EXP as TS_TYPED
>       instead of TS_COMMON.
>
> gcc/lto/
>       * lto.c (lto_ft_typed): New function.
>       (lto_ft_common): Call it.
>       (lto_ft_constructor): Likewise.
>       (lto_ft_expr): Likewise.
>       (lto_fixup_prevailing_decls): Check for TS_COMMON before accessing
>       TREE_CHAIN.
>
> diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> index d64ba18..1067b51 100644
> --- a/gcc/lto/lto.c
> +++ b/gcc/lto/lto.c
> @@ -254,14 +254,20 @@ remember_with_vars (tree t)
>
>  static void lto_fixup_types (tree);
>
> -/* Fix up fields of a tree_common T.  */
> +/* Fix up fields of a tree_typed T.  */
>
>  static void
> -lto_ft_common (tree t)
> +lto_ft_typed (tree t)
>  {
> -  /* Fixup our type.  */
>    LTO_FIXUP_TREE (TREE_TYPE (t));
> +}
> +
> +/* Fix up fields of a tree_common T.  */
>
> +static void
> +lto_ft_common (tree t)
> +{
> +  lto_ft_typed (t);
>    LTO_FIXUP_TREE (TREE_CHAIN (t));
>  }
>
> @@ -398,7 +404,7 @@ lto_ft_constructor (tree t)
>    unsigned HOST_WIDE_INT idx;
>    constructor_elt *ce;
>
> -  LTO_FIXUP_TREE (TREE_TYPE (t));
> +  lto_ft_typed (t);
>
>    for (idx = 0;
>         VEC_iterate(constructor_elt, CONSTRUCTOR_ELTS (t), idx, ce);
> @@ -415,7 +421,7 @@ static void
>  lto_ft_expr (tree t)
>  {
>    int i;
> -  lto_ft_common (t);
> +  lto_ft_typed (t);
>    for (i = TREE_OPERAND_LENGTH (t) - 1; i >= 0; --i)
>      LTO_FIXUP_TREE (TREE_OPERAND (t, i));
>  }
> @@ -2029,7 +2035,8 @@ lto_fixup_prevailing_decls (tree t)
>  {
>    enum tree_code code = TREE_CODE (t);
>    LTO_NO_PREVAIL (TREE_TYPE (t));
> -  LTO_NO_PREVAIL (TREE_CHAIN (t));
> +  if (CODE_CONTAINS_STRUCT (code, TS_COMMON))
> +    LTO_NO_PREVAIL (TREE_CHAIN (t));
>    if (DECL_P (t))
>      {
>        LTO_NO_PREVAIL (DECL_NAME (t));
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 3357d84..9cc99fe 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -380,6 +380,7 @@ initialize_tree_contains_struct (void)
>         case TS_COMPLEX:
>         case TS_SSA_NAME:
>         case TS_CONSTRUCTOR:
> +       case TS_EXP:
>           MARK_TS_TYPED (code);
>           break;
>
> @@ -388,7 +389,6 @@ initialize_tree_contains_struct (void)
>         case TS_TYPE_COMMON:
>         case TS_LIST:
>         case TS_VEC:
> -       case TS_EXP:
>         case TS_BLOCK:
>         case TS_BINFO:
>         case TS_STATEMENT_LIST:
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 805fe06..142237f 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -1917,7 +1917,7 @@ enum omp_clause_default_kind
>    (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_DEFAULT)->omp_clause.subcode.default_kind)
>
>  struct GTY(()) tree_exp {
> -  struct tree_common common;
> +  struct tree_typed typed;
>    location_t locus;
>    tree block;
>    tree GTY ((special ("tree_exp"),
Re: [PATCH] Expand pow(x,n) into multiplies in cse_sincos pass (PR46728, patch 2)
On Tue, May 24, 2011 at 10:35 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote:
> Here's a small patch to expand pow(x,n) for integer n using the
> powi(x,n) logic in the cse_sincos pass.  OK for trunk?
>
> For the next patch, I'll plan on expanding pow(x,n) for n in {0.5,
> 0.25, 0.75, 1./3., 1./6.}.  This logic will be added to
> gimple_expand_builtin_pow.

Ok.

Thanks,
Richard.

> Bill
>
>
> 2011-05-24  Bill Schmidt  wschm...@linux.vnet.ibm.com
>
>       PR tree-optimization/46728
>       * tree-ssa-math-opts.c (gimple_expand_builtin_pow): New.
>       (execute_cse_sincos): Add switch case for BUILT_IN_POW.
>
> Index: gcc/tree-ssa-math-opts.c
> ===================================================================
> --- gcc/tree-ssa-math-opts.c (revision 174129)
> +++ gcc/tree-ssa-math-opts.c (working copy)
> @@ -1024,6 +1024,39 @@ gimple_expand_builtin_powi (gimple_stmt_iterator *
>    return NULL_TREE;
>  }
>
> +/* ARG0 and ARG1 are the two arguments to a pow builtin call in GSI
> +   with location info LOC.  If possible, create an equivalent and
> +   less expensive sequence of statements prior to GSI, and return an
> +   expession holding the result.  */
> +
> +static tree
> +gimple_expand_builtin_pow (gimple_stmt_iterator *gsi, location_t loc,
> +                           tree arg0, tree arg1)
> +{
> +  REAL_VALUE_TYPE c, cint;
> +  HOST_WIDE_INT n;
> +
> +  /* If the exponent isn't a constant, there's nothing of interest
> +     to be done.  */
> +  if (TREE_CODE (arg1) != REAL_CST)
> +    return NULL_TREE;
> +
> +  /* If the exponent is equivalent to an integer, expand it into
> +     multiplies when profitable.  */
> +  c = TREE_REAL_CST (arg1);
> +  n = real_to_integer (&c);
> +  real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
> +
> +  if (real_identical (&c, &cint)
> +      && ((n >= -1 && n <= 2)
> +          || (flag_unsafe_math_optimizations
> +              && optimize_insn_for_speed_p ()
> +              && powi_cost (n) <= POWI_MAX_MULTS)))
> +    return gimple_expand_builtin_powi (gsi, loc, arg0, n);
> +
> +  return NULL_TREE;
> +}
> +
>  /* Go through all calls to sin, cos and cexpi and call execute_cse_sincos_1
>     on the SSA_NAME argument of each of them.  Also expand powi(x,n) into
>     an optimal number of multiplies, when n is a constant.  */
> @@ -1065,6 +1098,23 @@ execute_cse_sincos (void)
>                   cfg_changed |= execute_cse_sincos_1 (arg);
>                   break;
>
> +               CASE_FLT_FN (BUILT_IN_POW):
> +                 arg0 = gimple_call_arg (stmt, 0);
> +                 arg1 = gimple_call_arg (stmt, 1);
> +
> +                 loc = gimple_location (stmt);
> +                 result = gimple_expand_builtin_pow (&gsi, loc, arg0, arg1);
> +
> +                 if (result)
> +                   {
> +                     tree lhs = gimple_get_lhs (stmt);
> +                     gimple new_stmt = gimple_build_assign (lhs, result);
> +                     gimple_set_location (new_stmt, loc);
> +                     unlink_stmt_vdef (stmt);
> +                     gsi_replace (&gsi, new_stmt, true);
> +                   }
> +                 break;
> +
>                 CASE_FLT_FN (BUILT_IN_POWI):
>                   arg0 = gimple_call_arg (stmt, 0);
>                   arg1 = gimple_call_arg (stmt, 1);
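As an illustration of what "expand pow(x,n) into multiplies" amounts to, here is a small standalone sketch. It is not the GCC implementation: GCC's powi expansion picks an optimal multiplication sequence and applies the cost limits shown in the patch, whereas this sketch uses plain binary exponentiation. The names exponent_is_integer and powi_sketch, and the 1e9 bound on the exponent, are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 and store the integer in *N if the constant exponent C is
   exactly an integer (the analogue of the real_identical check in the
   patch).  The +-1e9 bound is an arbitrary choice that keeps the
   conversion to long well defined.  */
static int
exponent_is_integer (double c, long *n)
{
  assert (n != NULL);
  if (!(c >= -1e9 && c <= 1e9))
    return 0;
  *n = (long) c;
  return (double) *n == c;
}

/* Compute x**n using multiplications only (binary exponentiation).
   A negative exponent yields the reciprocal of the positive power.  */
static double
powi_sketch (double x, long n)
{
  unsigned long e = n < 0 ? 0UL - (unsigned long) n : (unsigned long) n;
  double result = 1.0;

  while (e != 0)
    {
      if (e & 1)
        result *= x;
      e >>= 1;
      if (e != 0)
        x *= x;
    }
  return n < 0 ? 1.0 / result : result;
}
```

A caller would first test the exponent with exponent_is_integer and fall back to the library pow when it returns 0; for instance an exponent of 2.0 qualifies while 0.5 does not.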
Re: C6X port 4/11: Backtracking scheduler
On Tue, 10 May 2011, Bernd Schmidt wrote:

> On C6X, every jump instruction has 5 delay slots which can be filled
> with normally scheduled instructions.  With an issue width of 8
> insns/cycle, this means that up to 40 insns can be issued after the
> jump insn before the jump's side-effect takes place.  I didn't
> particularly feel like using reorg.c to deal with this,

No kidding... multi-delay-slot bugs just waiting for you...

> hence these scheduler patches.

THANK YOU for these first steps!

brgds, H-P
Re: Patch ping #2
On Wed, May 25, 2011 at 12:00 AM, Eric Botcazou ebotca...@adacore.com wrote:
> > Yes, I mean when -fstack-usage is enabled.  Thus, make
> > -fstack-usage -Wframe-larger-than behave the same as
> > -fstack-usage -Wstack-usage.  Or do you say that -Wstack-usage
> > doesn't require -fstack-usage?
>
> Yes, -Wstack-usage doesn't require -fstack-usage.
>
> > If it doesn't then I hope -Wstack-usage does not have effects on
> > code generation?
>
> Neither -fstack-usage nor -Wstack-usage has any effect on code
> generation.  The former generates a .su file and the latter issues a
> warning.
>
> > And if not then why can't -Wframe-larger-than just be more precise
> > on some targets?
>
> -Wframe-larger-than is documented to work on any target and to be
> imprecise.  So we would need to list the targets for which it is
> precise (or the targets for which it isn't precise) and maintain it.
> By contrast, if -Wstack-usage returns something, then the answer is
> always precise.  Moreover I think that the common name is a big
> advantage.

Thanks for explaining.  The patch is ok.

Thanks,
Richard.

> --
> Eric Botcazou
Faster streaming of enums
Hi,
after fixing the 1-byte I/O function call and most of the hash table
overhead, the functions that handle ulebs and slebs show up at the top
of the profile.  We use them in many cases where we know the value
range of the operand will fit in 1 byte, in particular to handle enums.
This is also dangerous, since we generally assume enums to be in their
value range.

This patch adds I/O bits for enums and integers in range that should
inline well, and adds some sanity checking.  I converted only the tree
streamer tags, but if accepted, I will convert more.

Bootstrapped/regtested x86_64-linux, OK?

        * lto-streamer-out.c (output_record_start): Use lto_output_enum
        (lto_output_tree): Use output_record_start.
        * lto-streamer-in.c (input_record_start): Use lto_input_enum
        (lto_get_pickled_tree): Use input_record_start.
        * lto-section-in.c (lto_section_overrun): Turn into fatal error.
        (lto_value_range_error): New function.
        * lto-streamer.h (lto_value_range_error): Declare.
        (lto_output_int_in_range, lto_input_int_in_range): New functions.
        (lto_output_enum, lto_input_enum): New macros.

Index: lto-streamer-out.c
===================================================================
*** lto-streamer-out.c (revision 174175)
--- lto-streamer-out.c (working copy)
*************** output_sleb128 (struct output_block *ob,
*** 270,281 ****
  /* Output the start of a record with TAG to output block OB.  */

! static void
  output_record_start (struct output_block *ob, enum LTO_tags tag)
  {
!   /* Make sure TAG fits inside an unsigned int.  */
!   gcc_assert (tag == (enum LTO_tags) (unsigned) tag);
!   output_uleb128 (ob, tag);
  }

--- 270,279 ----
  /* Output the start of a record with TAG to output block OB.  */

! static inline void
  output_record_start (struct output_block *ob, enum LTO_tags tag)
  {
!   lto_output_enum (ob->main_stream, LTO_tags, LTO_NUM_TAGS, tag);
  }

*************** lto_output_tree (struct output_block *ob
*** 1401,1407 ****
         will instantiate two different nodes for the same object.  */
        output_record_start (ob, LTO_tree_pickle_reference);
        output_uleb128 (ob, ix);
!       output_uleb128 (ob, lto_tree_code_to_tag (TREE_CODE (expr)));
      }
    else if (lto_stream_as_builtin_p (expr))
      {
--- 1399,1405 ----
         will instantiate two different nodes for the same object.  */
        output_record_start (ob, LTO_tree_pickle_reference);
        output_uleb128 (ob, ix);
!       output_record_start (ob, lto_tree_code_to_tag (TREE_CODE (expr)));
      }
    else if (lto_stream_as_builtin_p (expr))
      {
Index: lto-streamer-in.c
===================================================================
*** lto-streamer-in.c (revision 174175)
--- lto-streamer-in.c (working copy)
*************** lto_input_string (struct data_in *data_i
*** 231,241 ****
  /* Return the next tag in the input block IB.  */

! static enum LTO_tags
  input_record_start (struct lto_input_block *ib)
  {
!   enum LTO_tags tag = (enum LTO_tags) lto_input_uleb128 (ib);
!   return tag;
  }

--- 231,240 ----
  /* Return the next tag in the input block IB.  */

! static inline enum LTO_tags
  input_record_start (struct lto_input_block *ib)
  {
!   return lto_input_enum (ib, LTO_tags, LTO_NUM_TAGS);
  }

*************** lto_get_pickled_tree (struct lto_input_b
*** 2558,2564 ****
    enum LTO_tags expected_tag;

    ix = lto_input_uleb128 (ib);
!   expected_tag = (enum LTO_tags) lto_input_uleb128 (ib);

    result = lto_streamer_cache_get (data_in->reader_cache, ix);
    gcc_assert (result
--- 2557,2563 ----
    enum LTO_tags expected_tag;

    ix = lto_input_uleb128 (ib);
!   expected_tag = input_record_start (ib);

    result = lto_streamer_cache_get (data_in->reader_cache, ix);
    gcc_assert (result
Index: lto-section-in.c
===================================================================
*** lto-section-in.c (revision 174175)
--- lto-section-in.c (working copy)
*************** lto_get_function_in_decl_state (struct l
*** 483,488 ****
  void
  lto_section_overrun (struct lto_input_block *ib)
  {
!   internal_error ("bytecode stream: trying to read %d bytes "
!                   "after the end of the input buffer", ib->p - ib->len);
  }
--- 483,498 ----
  void
  lto_section_overrun (struct lto_input_block *ib)
  {
!   fatal_error ("bytecode stream: trying to read %d bytes "
!                "after the end of the input buffer", ib->p - ib->len);
! }
!
! /* Report out of range value.  */
!
! void
! lto_value_range_error (const char *purpose, HOST_WIDE_INT val,
!                        HOST_WIDE_INT min, HOST_WIDE_INT max)
! {
!   fatal_error ("%s out of range: Range is %i to %i, value is %i",
!                purpose, (int)min, (int)max, (int)val);
  }
Index: lto-streamer.h
===================================================================
*** lto-streamer.h (revision 174175)
--- lto-streamer.h (working copy)
*************** extern int
[v3] libstdc++/49141
Hi,

committed to mainline and 4_6-branch.

Thanks,
Paolo.

2011-05-24  Paolo Carlini  paolo.carl...@oracle.com

        PR libstdc++/49141
        * testsuite/26_numerics/complex/cons/48760.cc: Use dg-require-c-std.
        * testsuite/26_numerics/complex/cons/48760_c++0x.cc: Likewise.
        * testsuite/26_numerics/headers/cmath/19322.cc: Likewise.

Index: testsuite/26_numerics/complex/cons/48760.cc
===================================================================
--- testsuite/26_numerics/complex/cons/48760.cc (revision 174112)
+++ testsuite/26_numerics/complex/cons/48760.cc (working copy)
@@ -1,3 +1,5 @@
+// { dg-require-c-std "" }
+
 // Copyright (C) 2011 Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
Index: testsuite/26_numerics/complex/cons/48760_c++0x.cc
===================================================================
--- testsuite/26_numerics/complex/cons/48760_c++0x.cc (revision 174112)
+++ testsuite/26_numerics/complex/cons/48760_c++0x.cc (working copy)
@@ -1,4 +1,5 @@
 // { dg-options "-std=gnu++0x" }
+// { dg-require-c-std "" }

 // Copyright (C) 2011 Free Software Foundation, Inc.
 //
Index: testsuite/26_numerics/headers/cmath/19322.cc
===================================================================
--- testsuite/26_numerics/headers/cmath/19322.cc (revision 174112)
+++ testsuite/26_numerics/headers/cmath/19322.cc (working copy)
@@ -1,4 +1,6 @@
-// Copyright (C) 2005, 2009 Free Software Foundation, Inc.
+// { dg-require-c-std "" }
+
+// Copyright (C) 2005, 2009, 2010, 2011 Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -15,11 +17,9 @@
 // with this library; see the file COPYING3.  If not see
 // <http://www.gnu.org/licenses/>.

-
 #include <cmath>
 #include <testsuite_hooks.h>

-#if _GLIBCXX_USE_C99_MATH
 // libstdc++/19322
 void test01()
 {
@@ -27,12 +27,9 @@
   VERIFY( !std::isnan(3.0) );
 }

-#endif

 int main()
 {
-#if _GLIBCXX_USE_C99_MATH
   test01();
-#endif
   return 0;
 }
Re: C6X port 9/11: Allow defining attributes in terms of another
On 05/25/2011 08:56 AM, Hans-Peter Nilsson wrote:
> On Tue, 10 May 2011, Bernd Schmidt wrote:
> > I've found it useful to use a construct such as the following:
> >
> > (define_attr "units64"
> >   "unknown,d,d_addr,l,m,s,dl,ds,dls,ls"
> >   (const_string "unknown"))
> >
> > (define_attr "units64p"
> >   "unknown,d,d_addr,l,m,s,dl,ds,dls,ls"
> >   (attr "units64"))
> >
> > to define one attribute in terms of another by default,
>
> So it's just the units64p default value taken from the units64
> default value or units64p gets its default value from the final
> units64 value?

units64p has the final value of units64, unless an insn explicitly
gives it a different value.  This is because C64X+ is really very
similar to C64X in most respects.  We then select which of the various
units definitions to use for a given CPU:

(define_attr "units"
  "unknown,d,d_addr,l,m,s,dl,ds,dls,ls"
  (cond [(eq_attr "cpu" "c62x")
           (attr "units62")
         (eq_attr "cpu" "c67x")
           (attr "units67")
         (eq_attr "cpu" "c67xp")
           (attr "units67p")
         (eq_attr "cpu" "c64x")
           (attr "units64")
         (eq_attr "cpu" "c64xp")
           (attr "units64p")
         (eq_attr "cpu" "c674x")
           (attr "units674")
        ]
        (const_string "unknown")))

> > allowing individual insn patterns to override the definition of
> > units64p where necessary.  This patch adds support for this in
> > genattrtab.
>
> I'm not sure I get it, and I think I would be helped by seeing
> the documentation update. ;)

I'm not sure where you're looking for added documentation for this
patch.  It just generalizes the define_attr mechanism a little to allow
one more kind of expression.

Bernd
Re: Faster streaming of enums
On Wed, May 25, 2011 at 11:45 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi, after fixing 1 byte i/o function call and most of hash table overhead, functions to handle ulebs and slebs shows top in profile. We use them in many cases where we know value range of the operand will fit in 1 byte. In particular to handle enums. This is also dangerous since we generally assume enums to be in their value range. This patch adds i/o bits for enums and integers in range that should inline well and add some sanity checking. I converted only tree streamer tags, but if accepted, I will convert more. Bootstrapped/regtested x86_64-linux, OK? * lto-streamer-out.c (output_record_start): Use lto_output_enum (lto_output_tree): Use output_record_start. * lto-streamer-in.c (input_record_start): Use lto_input_enum (lto_get_pickled_tree): Use input_record_start. * lto-section-in.c (lto_section_overrun): Turn into fatal error. (lto_value_range_error): New function. * lto-streamer.h (lto_value_range_error): Declare. (lto_output_int_in_range, lto_input_int_in_range): New functions. (lto_output_enum, lto_input_enum): New macros. Index: lto-streamer-out.c === *** lto-streamer-out.c (revision 174175) --- lto-streamer-out.c (working copy) *** output_sleb128 (struct output_block *ob, *** 270,281 /* Output the start of a record with TAG to output block OB. */ ! static void output_record_start (struct output_block *ob, enum LTO_tags tag) { ! /* Make sure TAG fits inside an unsigned int. */ ! gcc_assert (tag == (enum LTO_tags) (unsigned) tag); ! output_uleb128 (ob, tag); } --- 270,279 /* Output the start of a record with TAG to output block OB. */ ! static inline void output_record_start (struct output_block *ob, enum LTO_tags tag) { ! lto_output_enum (ob-main_stream, LTO_tags, LTO_NUM_TAGS, tag); } *** lto_output_tree (struct output_block *ob *** 1401,1407 will instantiate two different nodes for the same object. */ output_record_start (ob, LTO_tree_pickle_reference); output_uleb128 (ob, ix); ! 
output_uleb128 (ob, lto_tree_code_to_tag (TREE_CODE (expr))); } else if (lto_stream_as_builtin_p (expr)) { --- 1399,1405 will instantiate two different nodes for the same object. */ output_record_start (ob, LTO_tree_pickle_reference); output_uleb128 (ob, ix); ! output_record_start (ob, lto_tree_code_to_tag (TREE_CODE (expr))); I'd prefer lto_output_enum here as we don't really start a new output record but just emit something for a sanity check. } else if (lto_stream_as_builtin_p (expr)) { Index: lto-streamer-in.c === *** lto-streamer-in.c (revision 174175) --- lto-streamer-in.c (working copy) *** lto_input_string (struct data_in *data_i *** 231,241 /* Return the next tag in the input block IB. */ ! static enum LTO_tags input_record_start (struct lto_input_block *ib) { ! enum LTO_tags tag = (enum LTO_tags) lto_input_uleb128 (ib); ! return tag; } --- 231,240 /* Return the next tag in the input block IB. */ ! static inline enum LTO_tags input_record_start (struct lto_input_block *ib) { ! return lto_input_enum (ib, LTO_tags, LTO_NUM_TAGS); } *** lto_get_pickled_tree (struct lto_input_b *** 2558,2564 enum LTO_tags expected_tag; ix = lto_input_uleb128 (ib); ! expected_tag = (enum LTO_tags) lto_input_uleb128 (ib); result = lto_streamer_cache_get (data_in-reader_cache, ix); gcc_assert (result --- 2557,2563 enum LTO_tags expected_tag; ix = lto_input_uleb128 (ib); ! expected_tag = input_record_start (ib); Likewise use input_enum. result = lto_streamer_cache_get (data_in-reader_cache, ix); gcc_assert (result Index: lto-section-in.c === *** lto-section-in.c (revision 174175) --- lto-section-in.c (working copy) *** lto_get_function_in_decl_state (struct l *** 483,488 void lto_section_overrun (struct lto_input_block *ib) { ! internal_error (bytecode stream: trying to read %d bytes ! after the end of the input buffer, ib-p - ib-len); } --- 483,498 void lto_section_overrun (struct lto_input_block *ib) { ! fatal_error (bytecode stream: trying to read %d bytes ! 
after the end of the input buffer, ib-p - ib-len); ! } ! ! /* Report out of range value. */ ! ! void ! lto_value_range_error (const char *purpose, HOST_WIDE_INT val, ! HOST_WIDE_INT min, HOST_WIDE_INT max) ! { ! fatal_error (%s out of range: Range is %i to %i, value is %i, !
Re: Faster streaming of enums
> > *************** lto_output_tree (struct output_block *ob
> > *** 1401,1407 ****
> >          will instantiate two different nodes for the same object.  */
> >         output_record_start (ob, LTO_tree_pickle_reference);
> >         output_uleb128 (ob, ix);
> > !       output_uleb128 (ob, lto_tree_code_to_tag (TREE_CODE (expr)));
> >       }
> >     else if (lto_stream_as_builtin_p (expr))
> >       {
> > --- 1399,1405 ----
> >          will instantiate two different nodes for the same object.  */
> >         output_record_start (ob, LTO_tree_pickle_reference);
> >         output_uleb128 (ob, ix);
> > !       output_record_start (ob, lto_tree_code_to_tag (TREE_CODE (expr)));
>
> I'd prefer lto_output_enum here as we don't really start a new output
> record but just emit something for a sanity check.

OK, I wondered which is cleaner; I will update the patch.

> > + /* Output VAL into OBS and verify it is in range MIN...MAX that is supposed
> > +    to be compile time constant.
> > +    Be host independent, limit range to 31bits.  */
> > +
> > + static inline void
> > + lto_output_int_in_range (struct lto_output_stream *obs,
> > +                          HOST_WIDE_INT min,
> > +                          HOST_WIDE_INT max,
> > +                          HOST_WIDE_INT val)
> > + {
> > +   HOST_WIDE_INT range = max - min;
> > +
> > +   gcc_checking_assert (val >= min && val <= max && range > 0
> > +                        && range < 0x7fffffff);
> > +
> > +   val -= min;
> > +   lto_output_1_stream (obs, val & 255);
> > +   if (range >= 0xff)
> > +     lto_output_1_stream (obs, (val >> 8) & 255);
> > +   if (range >= 0xffff)
> > +     lto_output_1_stream (obs, (val >> 16) & 255);
> > +   if (range >= 0xffffff)
> > +     lto_output_1_stream (obs, (val >> 24) & 255);
>
> so you didn't want to create a bitpack_pack_int_in_range and use
> bitpacks for enums?  I suppose for some of the remaining cases
> packing them into existing bitpacks would be preferable?

Well, having both is on my TODO list.  Where we don't bitpack enums
with other values (that is the most common case for enums), this way we
produce less overhead and get an extra sanity check that the bits
unused by the enum are really 0.

I guess the final API should have both lto_output_enum and
lto_bitpack_output_enum.  I don't really care whether the first has the
implementation above or just creates its own bitpack to handle the
value.

> > + {
> > +   HOST_WIDE_INT range = max - min;
> > +   HOST_WIDE_INT val = lto_input_1_unsigned (ib);
> > +
> > +   gcc_checking_assert (range > 0);
>
> The assert doesn't match the one during output.

Hmm, OK, will match.

Honza
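To make the range-based encoding discussed above concrete, here is a standalone sketch of the same idea: subtract the minimum, then emit only as many bytes as the size of the range requires, and check the value again on input. It is not the lto-streamer.h code. The struct byte_buf type and the put_byte, get_byte, output_int_in_range and input_int_in_range names are hypothetical stand-ins for the LTO stream, and plain assert replaces gcc_checking_assert and lto_value_range_error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory byte stream standing in for the LTO stream.  */
struct byte_buf
{
  uint8_t data[16];
  size_t len;                   /* number of bytes written so far */
  size_t pos;                   /* number of bytes read so far */
};

static void
put_byte (struct byte_buf *b, uint8_t v)
{
  assert (b->len < sizeof b->data);
  b->data[b->len++] = v;
}

static uint8_t
get_byte (struct byte_buf *b)
{
  assert (b->pos < b->len);
  return b->data[b->pos++];
}

/* Write VAL, known to lie in MIN..MAX, using 1 to 4 bytes depending on
   the size of the range.  */
static void
output_int_in_range (struct byte_buf *b, int64_t min, int64_t max,
                     int64_t val)
{
  int64_t range = max - min;
  uint32_t v;

  assert (val >= min && val <= max && range > 0 && range < 0x7fffffff);
  v = (uint32_t) (val - min);
  put_byte (b, v & 255);
  if (range >= 0xff)
    put_byte (b, (v >> 8) & 255);
  if (range >= 0xffff)
    put_byte (b, (v >> 16) & 255);
  if (range >= 0xffffff)
    put_byte (b, (v >> 24) & 255);
}

/* Read back a value written by output_int_in_range with the same
   MIN and MAX, and check that it really lies in the range.  */
static int64_t
input_int_in_range (struct byte_buf *b, int64_t min, int64_t max)
{
  int64_t range = max - min;
  int64_t val;
  uint32_t v;

  assert (range > 0 && range < 0x7fffffff);
  v = get_byte (b);
  if (range >= 0xff)
    v |= (uint32_t) get_byte (b) << 8;
  if (range >= 0xffff)
    v |= (uint32_t) get_byte (b) << 16;
  if (range >= 0xffffff)
    v |= (uint32_t) get_byte (b) << 24;
  val = (int64_t) v + min;
  assert (val >= min && val <= max);
  return val;
}
```

In this sketch a value in 0..10 costs one byte and a value in -5..1000 costs two; both sides must agree on MIN and MAX, which is why the thread asks for the input-side assert to match the output-side one.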
+ { + HOST_WIDE_INT range = max - min; + HOST_WIDE_INT val = lto_input_1_unsigned (ib); + + gcc_checking_assert (range 0); The assert doesn't match the one during output. Hmm, OK, will match. Honza
Re: [patch][simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate
On Wed, 25 May 2011, Andrew Stubbs wrote:

> So, you're saying that promoting a regular multiply to a widening
> multiply isn't a valid transformation anyway? I suppose that does make
> sense. I knew

In general, yes.  RTL always has modulo semantics (except for division
and remainder by -1); all optimizations based on undefinedness of
overflow (in the absence of -fwrapv) happen at tree/GIMPLE level, where
signed and unsigned types are still distinct.  (So you could promote a
regular multiply of signed types at GIMPLE level in the absence of
-fwrapv/-ftrapv, but not at RTL level and not for unsigned types at
GIMPLE level.)

--
Joseph S. Myers
jos...@codesourcery.com
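The point about modulo semantics can be seen in a few lines of C. The sketch below models (zero_extend:SI (mult:HI a b)) and (mult:SI (zero_extend:SI a) (zero_extend:SI b)) with fixed-width unsigned types; the function names are made up for the example, and unsigned arithmetic is used so that the wraparound is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Model of (zero_extend:SI (mult:HI a b)): the product is formed
   modulo 2^16 and only then widened to 32 bits.  */
static uint32_t
extend_of_mult (uint16_t a, uint16_t b)
{
  uint16_t narrow = (uint16_t) ((uint32_t) a * (uint32_t) b);

  return narrow;
}

/* Model of (mult:SI (zero_extend:SI a) (zero_extend:SI b)): both
   operands are widened first, so the full product is kept.  */
static uint32_t
mult_of_extend (uint16_t a, uint16_t b)
{
  return (uint32_t) a * (uint32_t) b;
}

/* The rewrite is only valid for operand pairs where both forms agree.  */
static int
rewrite_is_safe (uint16_t a, uint16_t b)
{
  return extend_of_mult (a, b) == mult_of_extend (a, b);
}
```

For a = b = 300 the narrow form wraps to 24464 while the widened form gives 90000, so the two differ. When both operands fit in 8 bits (for example 255 * 255 = 65025) the product is representable in 16 bits and the forms agree, which matches the QImode condition described earlier in the thread.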
Don't ICE in function_and_variable_visibility on Tru64 UNIX (PR middle-end/49062)
Almost 400 c++ and libstdc++ testcases ICE on Tru64 UNIX since Jan's
patch

2011-05-06  Jan Hubicka  j...@suse.cz

        * cgraph.c (cgraph_add_thunk): Create real function node instead
        of alias node; finalize it and mark needed/reachale; arrange
        visibility to be right and add it into the corresponding same
        comdat group list.
        (dump_cgraph_node): Dump thunks.

as described in the PR.  He provided the following patch in private
mail.  I tested it on alpha-dec-osf5.1b by rebuilding cc1plus and
rerunning the g++ and libstdc++-v3 testsuites: all failures were gone.

Approved in private mail, committed to mainline.

        Rainer

2011-05-25  Jan Hubicka  j...@suse.cz

        PR middle-end/49062
        * ipa.c (function_and_variable_visibility): Only add to same
        comdat group list if DECL_ONE_ONLY.

diff --git a/gcc/ipa.c b/gcc/ipa.c
--- a/gcc/ipa.c
+++ b/gcc/ipa.c
@@ -897,7 +897,7 @@ function_and_variable_visibility (bool w
            {
              DECL_COMDAT (node->decl) = 1;
              DECL_COMDAT_GROUP (node->decl) = DECL_COMDAT_GROUP (decl_node->decl);
-             if (!node->same_comdat_group)
+             if (DECL_ONE_ONLY (decl_node->decl) && !node->same_comdat_group)
                {
                  node->same_comdat_group = decl_node;
                  if (!decl_node->same_comdat_group)

--
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: PATCH: Add pause intrinsic
On 05/24/2011 07:28 PM, H.J. Lu wrote: This patch implements pause intrinsic suggested by Andi. OK for trunk? What does full memory barrier here mean? +@table @code +@item void __builtin_ia32_pause (void) +Generates the @code{pause} machine instruction with full memory barrier. +@end table There a memory clobber, but no barrier instruction AFAICS. The doc needs to explain it a bit better. Andrew.
[PATCH] Ignore TYPE_DECLs for canonical type compute in LTO
Just figured that we'd get TYPE_DECLs and FUNCTION_DECLs in aggregate types. But we should treat layout-compatible structs as same, regardless of the above.

LTO profile-bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2011-05-25  Richard Guenther  rguent...@suse.de

        * gimple.c (iterative_hash_canonical_type): Skip non-FIELD_DECLs.
        (gimple_canonical_types_compatible_p): Likewise.

Index: gcc/gimple.c
===
--- gcc/gimple.c        (revision 174118)
+++ gcc/gimple.c        (working copy)
@@ -4376,10 +4382,11 @@ iterative_hash_canonical_type (tree type
       tree f;

       for (f = TYPE_FIELDS (type), nf = 0; f; f = TREE_CHAIN (f))
-        {
-          v = iterative_hash_canonical_type (TREE_TYPE (f), v);
-          nf++;
-        }
+        if (TREE_CODE (f) == FIELD_DECL)
+          {
+            v = iterative_hash_canonical_type (TREE_TYPE (f), v);
+            nf++;
+          }

       v = iterative_hash_hashval_t (nf, v);
     }
@@ -4688,6 +4695,13 @@ gimple_canonical_types_compatible_p (tre
              f1 && f2;
              f1 = TREE_CHAIN (f1), f2 = TREE_CHAIN (f2))
           {
+            /* Skip non-fields.  */
+            while (f1 && TREE_CODE (f1) != FIELD_DECL)
+              f1 = TREE_CHAIN (f1);
+            while (f2 && TREE_CODE (f2) != FIELD_DECL)
+              f2 = TREE_CHAIN (f2);
+            if (!f1 || !f2)
+              break;
             /* The fields must have the same name, offset and type.  */
             if (DECL_NONADDRESSABLE_P (f1) != DECL_NONADDRESSABLE_P (f2)
                 || !gimple_compare_field_offset (f1, f2)
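The skip-non-fields idea in the second hunk can be sketched outside GCC as follows. This is only an illustration under assumed names: `struct decl`, `decl_kind`, `type_id` and `fields_compatible` are hypothetical stand-ins for the tree chain, and only a type id is compared, not name or offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the members chained off an aggregate type.  */
enum decl_kind { KIND_FIELD, KIND_TYPE, KIND_FUNCTION };

struct decl
{
  enum decl_kind kind;
  int type_id;          /* Stand-in for the member's type.  */
  struct decl *next;    /* Stand-in for TREE_CHAIN.  */
};

/* Advance to the next real field, skipping type and function members.  */
static const struct decl *next_field (const struct decl *d)
{
  while (d && d->kind != KIND_FIELD)
    d = d->next;
  return d;
}

/* Two member chains are compatible if their fields, taken in order and
   ignoring non-field members, have matching types and counts.  */
bool fields_compatible (const struct decl *f1, const struct decl *f2)
{
  for (;;)
    {
      f1 = next_field (f1);
      f2 = next_field (f2);
      if (!f1 || !f2)
        break;
      if (f1->type_id != f2->type_id)
        return false;
      f1 = f1->next;
      f2 = f2->next;
    }
  /* Both chains must run out of fields at the same time.  */
  return f1 == NULL && f2 == NULL;
}
```

With this shape, a struct that carries a nested type declaration between two fields compares equal to one that has just the two fields, which is the layout-compatible behaviour the patch is after.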
Re: Faster streaming of enums
On Wed, 25 May 2011, Jan Hubicka wrote:

> > > *** lto_output_tree (struct output_block *ob
> > > *** 1401,1407 ****
> > >        will instantiate two different nodes for the same object.  */
> > >         output_record_start (ob, LTO_tree_pickle_reference);
> > >         output_uleb128 (ob, ix);
> > > !       output_uleb128 (ob, lto_tree_code_to_tag (TREE_CODE (expr)));
> > >       }
> > >     else if (lto_stream_as_builtin_p (expr))
> > >       {
> > > --- 1399,1405 ----
> > >        will instantiate two different nodes for the same object.  */
> > >         output_record_start (ob, LTO_tree_pickle_reference);
> > >         output_uleb128 (ob, ix);
> > > !       output_record_start (ob, lto_tree_code_to_tag (TREE_CODE (expr)));
> >
> > I'd prefer lto_output_enum here as we don't really start a new output
> > record but just emit something for a sanity check.
>
> OK, I wondered what is cleaner, will update the patch.
>
> > > + /* Output VAL into OBS and verify it is in range MIN...MAX that is supposed
> > > +    to be compile time constant.
> > > +    Be host independent, limit range to 31bits.  */
> > > +
> > > + static inline void
> > > + lto_output_int_in_range (struct lto_output_stream *obs,
> > > +                          HOST_WIDE_INT min,
> > > +                          HOST_WIDE_INT max,
> > > +                          HOST_WIDE_INT val)
> > > + {
> > > +   HOST_WIDE_INT range = max - min;
> > > +
> > > +   gcc_checking_assert (val >= min && val <= max && range > 0
> > > +                        && range < 0x7fffffff);
> > > +
> > > +   val -= min;
> > > +   lto_output_1_stream (obs, val & 255);
> > > +   if (range >= 0xff)
> > > +     lto_output_1_stream (obs, (val >> 8) & 255);
> > > +   if (range >= 0xffff)
> > > +     lto_output_1_stream (obs, (val >> 16) & 255);
> > > +   if (range >= 0xffffff)
> > > +     lto_output_1_stream (obs, (val >> 24) & 255);
> >
> > so you didn't want to create a bitpack_pack_int_in_range and use bitpacks
> > for enums? I suppose for some of the remaining cases packing them
> > into existing bitpacks would be preferable?
>
> Well, in my TODO list is to have both. Where we don't bitpack enums with
> other values (that is the most common case of enums) this way we produce less
> overhead and have an extra sanity check that the bits unused by the enum are
> really 0.
>
> I guess the final API should have both lto_output_enum and
> lto_bitpack_output_enum. I don't really care if the first has the
> implementation above or just creates its own bitpack to handle the value.

Ok.

> > > + {
> > > +   HOST_WIDE_INT range = max - min;
> > > +   HOST_WIDE_INT val = lto_input_1_unsigned (ib);
> > > +
> > > +   gcc_checking_assert (range > 0);
> >
> > The assert doesn't match the one during output.
>
> Hmm, OK, will match.

Patch is ok with the changes.

Thanks,
Richard.
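As a rough illustration of the range-limited encoding discussed in this thread, here is a self-contained C sketch with a matching decoder. It is not the LTO streamer API: `encode_in_range`, `decode_in_range` and `bytes_for_range` are hypothetical names, a plain byte buffer stands in for the output stream, and the byte-count thresholds follow my reading of the quoted patch. Both sides apply the same range check, which is the mismatch the review asked to fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes used for an offset in [0, range].  */
static size_t bytes_for_range (int64_t range)
{
  if (range >= 0xffffff)
    return 4;
  if (range >= 0xffff)
    return 3;
  if (range >= 0xff)
    return 2;
  return 1;
}

/* Write VAL, known to lie in [MIN, MAX], to OUT as a little-endian
   offset from MIN.  Returns the number of bytes written (1 to 4).  */
size_t encode_in_range (uint8_t *out, int64_t min, int64_t max, int64_t val)
{
  int64_t range = max - min;
  assert (val >= min && val <= max && range > 0 && range < 0x7fffffff);

  uint32_t off = (uint32_t) (val - min);
  size_t n = bytes_for_range (range);
  for (size_t i = 0; i < n; i++)
    out[i] = (uint8_t) ((off >> (8 * i)) & 0xff);
  return n;
}

/* Read back a value written by encode_in_range with the same MIN/MAX.  */
int64_t decode_in_range (const uint8_t *in, int64_t min, int64_t max)
{
  int64_t range = max - min;
  /* Same precondition as on the output side.  */
  assert (range > 0 && range < 0x7fffffff);

  size_t n = bytes_for_range (range);
  uint32_t off = 0;
  for (size_t i = 0; i < n; i++)
    off |= (uint32_t) in[i] << (8 * i);

  /* Sanity check: bits outside the range must not be set.  */
  assert ((int64_t) off <= range);
  return (int64_t) off + min;
}
```

Because the byte count depends only on the compile-time range, the reader needs no length prefix, and an out-of-range value on input is caught by the final assertion.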
Re: PATCH: Add pause intrinsic
On Wed, May 25, 2011 at 12:26 PM, Andrew Haley a...@redhat.com wrote: On 05/24/2011 07:28 PM, H.J. Lu wrote: This patch implements pause intrinsic suggested by Andi. OK for trunk? What does full memory barrier here mean? +@table @code +@item void __builtin_ia32_pause (void) +Generates the @code{pause} machine instruction with full memory barrier. +@end table There a memory clobber, but no barrier instruction AFAICS. The doc needs to explain it a bit better. The name also sounds odd to me (reminds me of Fortran PAUSE ...). Richard. Andrew.
4.6: do not divide by 0 on insane profile
Hi,
cgraph_decide_recursive_inlining may divide by 0 when a profile is read but is small enough that even a count of 0 is considered as possibly hot. This particularly happens when the profile was not really read after all. The problem is fixed on mainline differently. This patch just obviously plugs the symptom.

Bootstrapped/regtested x86_64-linux, committed.

Index: ChangeLog
===
--- ChangeLog   (revision 174182)
+++ ChangeLog   (working copy)
@@ -1,3 +1,9 @@
+2011-05-18  Jan Hubicka  j...@suse.cz
+
+        PR tree-optimization/44897
+        * ipa-inline.c (cgraph_decide_recursive_inlining): Do not divide
+        by zero for insane profiles.
+
 2011-05-24  Eric Botcazou  ebotca...@adacore.com

         * config/sparc/sparc.c (sparc_option_override): If not set by the user,
Index: ipa-inline.c
===
--- ipa-inline.c        (revision 173893)
+++ ipa-inline.c        (working copy)
@@ -895,7 +895,7 @@ cgraph_decide_recursive_inlining (struct
           continue;
         }

-      if (max_count)
+      if (max_count && node->count)
         {
           if (!cgraph_maybe_hot_edge_p (curr))
             {
Re: [patch ada]: Fix bootstrap for Ada
2011/5/24 Arnaud Charlet char...@adacore.com:

>>> I'm confused. The above looks wrong to me: it does not return an empty
>>> string, it returns a pointer to an uninitialized string, which cannot
>>> be right (and should generate a warning :-)
>>
>> No, static vars are implicitly zero initialized when not explicitly
>> initialized.
>
> Hmm I see. Still, the above code is not easy to read IMO.
>
> I'd suggest instead the following which is easier to read and understand:
>
> __gnat_to_canonical_file_list_next (void)
> {
>   static char empty[] = "";
>   return empty;
> }
>
> That's actually a change I was about to commit since we've done it recently
> at AdaCore, so OK with the above variant.
>
> Arno

Ok, applied the patch as you suggested at revision 174185. Not sure if this is more readable, but anyway.

Regards,
Kai
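For readers following the zero-initialization argument, a minimal C sketch of the two variants under discussion. The function names are invented for illustration, and the first variant is only my guess at the shape of the code being criticized, which is not quoted in this message.

```c
#include <assert.h>
#include <string.h>

/* Relies on the C rule that objects with static storage duration are
   zero-initialized when they have no explicit initializer, so buf[0]
   is '\0' and the result is a valid empty string.  */
char *empty_via_zero_init (void)
{
  static char buf[1];
  return buf;
}

/* The more readable variant suggested in the thread: the empty string
   is spelled out, so no knowledge of the implicit rule is needed.  */
char *empty_via_literal (void)
{
  static char empty[] = "";
  return empty;
}
```

Both return a pointer to a one-byte buffer holding the terminating null; the difference is purely one of readability.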
Re: [testsuite] remove XFAIL for all but ia64 for g++.dg/tree-ssa/pr43411.C
Am Wed 25 May 2011 11:04:06 AM CEST schrieb Richard Guenther richard.guent...@gmail.com: On Wed, May 25, 2011 at 10:38 AM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: Janis Johnson jani...@codesourcery.com writes: Archived test results for 4.7.0 for most processors with C++ results have: XPASS: g++.dg/tree-ssa/pr43411.C scan-tree-dump-not optimized OBJ_TYPE_REF The only failures I could find were for ia64-linux and ia64-hpux. This patch changes the xfail so it only applies to ia64-*-*. OK for trunk? Richard rejected a similar patch: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00054.html Perhaps Jan can suggest the correct approach? We should verify that the call to val is inlined in all functions. Maybe rename it to something larger and scan the optimized dump so that name doesn't appear. Indeed, this seems to be safest approach I can think of. If function is supposed to be optimized out completely by early passes, we should just search release_ssa. It is not the case here and dumping IPA info for inlining all instance would be bit tricky. Honza
Re: [pph] Regularize Streaming (issue4528096)
On Tue, May 24, 2011 at 22:42, Lawrence Crowl cr...@google.com wrote: For TEMPLATE_DECL, also stream DECL_MEMBER_TEMPLATE_P. We don't really need to handle this. DECL_MEMBER_TEMPLATE_P is using DECL_LANG_FLAG_1. All the lang_flag fields are already handled in pph_stream_unpack_value_fields (and its counterpart). Besides, this patch is writing DECL_MEMBER_TEMPLATE_P but it is not reading it back, this will cause stream synchronization problems The code edits do NOT conform with the gcc style. This is deliberate so that diff reports sensible differences. I will make a separate patch to fix the style. *gasp* I am horrified! ;) + case USING_DECL: + case VAR_DECL: { + /* FIXME pph: Should we merge DECL_INITIAL into lang_specific? */ Hm? + case TEMPLATE_DECL: { + pph_output_tree_or_ref_1 (stream, DECL_INITIAL (expr), ref_p, 3); + pph_stream_write_lang_specific (stream, expr, ref_p); pph_output_tree_or_ref_1 (stream, DECL_TEMPLATE_RESULT (expr), ref_p, 3); pph_output_tree_or_ref_1 (stream, DECL_TEMPLATE_PARMS (expr), ref_p, 3); pph_output_tree_or_ref_1 (stream, DECL_CONTEXT (expr), ref_p, 3); + pph_output_uchar (stream, DECL_MEMBER_TEMPLATE_P (expr)); There does not seem to be a read operation for DECL_MEMBER_TEMPLATE_P. Diego.
Re: PATCH: Add pause intrinsic
On Wed, May 25, 2011 at 3:26 AM, Andrew Haley a...@redhat.com wrote: On 05/24/2011 07:28 PM, H.J. Lu wrote: This patch implements pause intrinsic suggested by Andi. OK for trunk? What does full memory barrier here mean? +@table @code +@item void __builtin_ia32_pause (void) +Generates the @code{pause} machine instruction with full memory barrier. +@end table There a memory clobber, but no barrier instruction AFAICS. The doc needs to explain it a bit better. There are read/load memory barrier, write/store memory barrier and full/general memory barrier. You can find them at http://www.kernel.org/doc/Documentation/memory-barriers.txt Should I include a pointer to it? -- H.J.
Re: [patch][simplify-rtx] Fix 16-bit - 64-bit multiply and accumulate
On 24/05/11 20:35, Joseph S. Myers wrote: On Tue, 24 May 2011, Andrew Stubbs wrote: I've created this new, simpler patch that converts (extend (mult a b)) into (mult (extend a) (extend b)) regardless of what 'a' and 'b' might be. (These are then simplified and superfluous extends removed, of course.) Are there some missing conditions here? The two aren't equivalent in general - (extend:SI (mult:HI a b)) multiplies the HImode values in HImode (with modulo arithmetic on overflow) before extending the possibly wrapped result to SImode. Ok, I've now modified my patch to prevent it widening regular multiplies. It now converts (extend (mult (extend a) (extend b))) to (mult (newextend a) (newextend b)) But I also have it convert (extend (mult (shift a) (extend b))) to (mult (newextend (shift a)) (newextend b)) The latter case is to catch widening multiplies that extract a subreg using a shift. I don't understand why it doesn't just use subreg in the first place, but apparently it doesn't, and changing it to do that would no doubt break many existing machine descriptions. The latter case also happens to catch cases where an extend is represented by (ashiftrt (ashift x)), which is nice. I know that, potentially, not all shifted operands are going to be widening multiplies, but I *think* this should be safe because other random shift values are unlikely to match a real widening mult instruction (and if they do then the code would already be broken). If somebody knows a reason why this isn't safe then I think I'm going to need some help figuring out what conditions to use. OK? Andrew 2011-05-25 Bernd Schmidt ber...@codesourcery.com Andrew Stubbs a...@codesourcery.com gcc/ * simplify-rtx.c (simplify_unary_operation_1): Canonicalize widening multiplies. * doc/md.texi (Canonicalization of Instructions): Document widening multiply canonicalization. gcc/testsuite/ * gcc.target/arm/mla-2.c: New test. 
--- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5840,6 +5840,21 @@ Equality comparisons of a group of bits (usually a single bit) with zero will be written using @code{zero_extract} rather than the equivalent @code{and} or @code{sign_extract} operations. +@cindex @code{mult}, canonicalization of +@item +@code{(sign_extend:@var{m1} (mult:@var{m2} (sign_extend:@var{m2} @var{x}) +(sign_extend:@var{m2} @var{y})))} is converted to @code{(mult:@var{m1} +(sign_extend:@var{m1} @var{x}) (sign_extend:@var{m1} @var{y}))}, and likewise +for @code{zero_extend}. + +@item +@code{(sign_extend:@var{m1} (mult:@var{m2} (ashiftrt:@var{m2} +@var{x} @var{s}) (sign_extend:@var{m2} @var{y})))} is converted to +@code{(mult:@var{m1} (sign_extend:@var{m1} (ashiftrt:@var{m2} @var{x} @var{s})) +(sign_extend:@var{m1} @var{y}))}, and likewise for patterns using @code{zero_extend} +and @code{lshiftrt}. If the second operand of @code{mult} is also a shift, +then that is extended also. + @end itemize Further canonicalization rules are defined in the function --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -1000,6 +1000,34 @@ simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op) GET_CODE (XEXP (XEXP (op, 0), 1)) == LABEL_REF) return XEXP (op, 0); + /* Extending a widening multiplication should be canonicalized to + a wider widening multiplication. */ + if (GET_CODE (op) == MULT) + { + rtx lhs = XEXP (op, 0); + rtx rhs = XEXP (op, 1); + enum rtx_code lcode = GET_CODE (lhs); + enum rtx_code rcode = GET_CODE (rhs); + + /* Widening multiplies usually extend both operands, but sometimes + they use a shift to extract a portion of a register. We assume + it is safe to widen all such operands because other examples + won't match real instructions. 
*/ + if ((lcode == SIGN_EXTEND || lcode == ASHIFTRT) + (rcode == SIGN_EXTEND || rcode == ASHIFTRT)) + { + enum machine_mode lmode = GET_MODE (lhs); + enum machine_mode rmode = GET_MODE (lhs); + return simplify_gen_binary (MULT, mode, + simplify_gen_unary (SIGN_EXTEND, + mode, + lhs, lmode), + simplify_gen_unary (SIGN_EXTEND, + mode, + rhs, rmode)); + } + } + /* Check for a sign extension of a subreg of a promoted variable, where the promotion is sign-extended, and the target mode is the same as the variable's promotion. */ @@ -1071,6 +1099,34 @@ simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op) GET_MODE_SIZE (mode) = GET_MODE_SIZE (GET_MODE (XEXP (op, 0 return rtl_hooks.gen_lowpart_no_emit (mode, op); + /* Extending a widening multiplication should be canonicalized to + a wider widening multiplication. */ + if (GET_CODE (op) == MULT) + { + rtx lhs = XEXP (op, 0); + rtx rhs = XEXP (op, 1); + enum rtx_code lcode = GET_CODE (lhs); + enum rtx_code rcode = GET_CODE (rhs); + + /* Widening multiplies
Re: PATCH: Add pause intrinsic
On Wed, May 25, 2011 at 3:31 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, May 25, 2011 at 12:26 PM, Andrew Haley a...@redhat.com wrote: On 05/24/2011 07:28 PM, H.J. Lu wrote: This patch implements pause intrinsic suggested by Andi. OK for trunk? What does full memory barrier here mean? +@table @code +@item void __builtin_ia32_pause (void) +Generates the @code{pause} machine instruction with full memory barrier. +@end table There a memory clobber, but no barrier instruction AFAICS. The doc needs to explain it a bit better. The name also sounds odd to me (reminds me of Fortran PAUSE ...). __builtin_ia32_pause is the C intrinsic for x86 machine instruction. I don't think people will get confused by its name. -- H.J.
Re: RFA PR 48770
On 05/24/2011 03:34 PM, Jeff Law wrote: This has gone latent on the trunk, but the underlying issue hasn't been resolved. ira.c::update_equiv_regs can create REG_EQUIV notes for equivalences which are local to a block rather than the traditional function-wide equivalences we typically work with. This occurs when we have an insn that loads a pseudo from a MEM and the pseudo is used within only a single block and the MEM remains unchanged through the life of the pseudo. Starting with the assumption that we're going to create a block local pseudo under the rules noted above, consider this RTL: (set (reg X) (some address)) (set (reg Y) (mem (reg X))) (use Y) We're going to create an equivalence between (reg Y) and its memory location in update_equiv_regs. Assume IRA is able to allocate a hard reg for reg X, but not reg Y. reload's strategy in this situation will be to remove the insn which creates the equivalence between reg Y and the memory location. Uses of reg Y will be replaced with the equivalent memory location. That's all fine and good, except reload uses delete_dead_insn, which deletes the equivalencing insn, but also recursively tries to remove the prior insn if it becomes dead as a result of removing the equivalencing insn. Anyway, continuing with our example, reg X gets a hard reg, so our RTL will look something like (set (reg 0) (some address)) (set (reg Y) (mem (reg 0))) (use Y) Then we remove the equivalencing insn resulting in (set (reg 0) (some address) (use Y) And we recurse from delete_dead_insn and determine that the first insn was dead as well, so it gets removed leaving: (use Y) We then replace Y with its equivalent memory location (use (mem (reg 0)) At which point we lose because hard reg 0 is no longer initialized. The code in question is literally 20 years old and predates running any real dead code elimination after reload. ISTM the right thing to do is stop using delete_dead_insn in this code and let the post-reload DCE pass do its job. 
That allows us to continue to record the block local equivalence. Sounds like the right thing to do. OK. (Can we eliminate the other caller?) I've looked at code generation; it appears unchanged on i686-linux, which I think is the expected result. There are minor differences in assembly output on mips64-linux. If you want to look at it, I'm attaching a testcase - compile with -O2 -fno-reorder-blocks. Bernd typedef _Bool bool; typedef struct { volatile int counter; } atomic_t; struct list_head { struct list_head *next, *prev; }; typedef void (*ctor_fn_t)(void); struct kref { atomic_t refcount; }; struct kobject { const char *name; struct list_head entry; struct kobject *parent; struct kset *kset; struct kobj_type *ktype; struct sysfs_dirent *sd; struct kref kref; unsigned int state_initialized:1; unsigned int state_in_sysfs:1; unsigned int state_add_uevent_sent:1; unsigned int state_remove_uevent_sent:1; unsigned int uevent_suppress:1; }; static inline __attribute__((always_inline)) void _do_trace_module_get (void (*probe)(char *name, bool wait, unsigned long ip)) { return -38; } struct module_kobject { struct kobject kobj; struct module *mod; struct kobject *drivers_dir; struct module_param_attrs *mp; }; enum module_state { MODULE_STATE_LIVE, MODULE_STATE_COMING, MODULE_STATE_GOING, }; struct module { enum module_state state;struct list_head list;char name[(64 - sizeof(unsigned long))];struct module_kobject mkobj; struct module_attribute *modinfo_attrs; const char *version; const char *srcversion; struct kobject *holders_dir;const struct kernel_symbol *syms; const unsigned long *crcs; unsigned int num_syms;struct kernel_param *kp; unsigned int num_kp;unsigned int num_gpl_syms; const struct kernel_symbol *gpl_syms; const unsigned long *gpl_crcs; struct list_head modules_which_use_me;struct task_struct *waiter;void (*exit)(void); struct module_ref { unsigned int incs; unsigned int decs; } *refptr; ctor_fn_t *ctors; unsigned int num_ctors; }; static inline 
__attribute__((always_inline)) int bscnl_emit(char *buf, int buflen, int rbot, int rtop, int len) { if (len 0) len += scnprintf(buf + len, buflen - len, ,); if (rbot == rtop) len += scnprintf(buf + len, buflen - len, %d, rbot); else len += scnprintf(buf + len, buflen - len, %d-%d, rbot, rtop); return len; } int bitmap_scnlistprintf(char *buf, unsigned int buflen, const unsigned long *maskp, int nmaskbits) { int len = 0; int cur, rbot, rtop; if (buflen == 0) return 0; buf[0] = 0; rbot = cur = find_next_bit((maskp), (nmaskbits), 0); while (cur nmaskbits) { rtop = cur; cur = find_next_bit(maskp, nmaskbits, cur+1); if (cur = nmaskbits || cur rtop + 1) {len = bscnl_emit(buf, buflen, rbot, rtop, len);rbot = cur; } } return len; }
Re: [PATCH] Fix a typo in i386 host_detect_local_cpu (PR target/49128)
On Wed, May 25, 2011 at 12:15 AM, Jakub Jelinek ja...@redhat.com wrote:

> Hi!
>
> Committed as obvious.
>
> 2011-05-25  Jakub Jelinek  ja...@redhat.com
>
>         PR target/49128
>         * config/i386/driver-i386.c (host_detect_local_cpu): Fix a typo.
>
> --- gcc/config/i386/driver-i386.c       (revision 174170)
> +++ gcc/config/i386/driver-i386.c       (revision 174171)
> @@ -696,7 +696,7 @@ const char *host_detect_local_cpu (int a
>        const char *bmi = has_bmi ? " -mbmi" : " -mno-bmi";
>        const char *tbm = has_tbm ? " -mtbm" : " -mno-tbm";
>        const char *avx = has_avx ? " -mavx" : " -mno-avx";
> -      const char *sse4_2 = has_sse4_2 ? " -msse4.2" : " -mno-msse4.2";
> +      const char *sse4_2 = has_sse4_2 ? " -msse4.2" : " -mno-sse4.2";
>        const char *sse4_1 = has_sse4_1 ? " -msse4.1" : " -mno-sse4.1";
>
>        options = concat (options, cx16, sahf, movbe, ase, pclmul,

Thanks.

--
H.J.
Re: [patch][simplify-rtx] Fix 16-bit - 64-bit multiply and accumulate
On Wed, 25 May 2011, Andrew Stubbs wrote: I know that, potentially, not all shifted operands are going to be widening multiplies, but I *think* this should be safe because other random shift values are unlikely to match a real widening mult instruction (and if they do then the code would already be broken). If somebody knows a reason why this isn't safe then I think I'm going to need some help figuring out what conditions to use. Random supposition like that is not a sensible basis for modifying GCC. I haven't managed to produce an example of code demonstrating the problem, but that's probably because I'm not sufficiently familiar with all the RTL optimizers. Where is the guarantee that the inputs to these functions must represent real instructions, or that the outputs will only be used if they represent real instructions? Where are the assertions to ensure that wrong code is not quietly generated if this is not the case? Where is the documentation of what instruction patterns it is not permitted to put in .md files because they would violate the assumptions about what instructions you are permitted to represent in RTL? How have you checked there are no existing problematic instruction patterns? RTL has defined abstract semantics and RTL transformations should be ones that are valid in accordance with those semantics, with proper assertions if there are additional constraints on the input passed to a function. This means actually counting the numbers of variable bits in the operands to determine whether the multiplication could overflow. -- Joseph S. Myers jos...@codesourcery.com
Re: [ C++ 4.6 Patch] allow uninitialized const or reference members with -fpermissive
OK, thanks. Jason
Re: Prefixes for libgcc symbols (C6X 9.5/11)
On Fri, May 13, 2011 at 9:10 AM, Bernd Schmidt ber...@codesourcery.com wrote: On 05/13/2011 04:26 PM, Joseph S. Myers wrote: On Fri, 13 May 2011, Bernd Schmidt wrote: The following patch adds a target hook and a corresponding LIBGCC2_ macro which control the generation of library function names. It also makes libgcc-std.ver a generated file, built from libgcc-std.ver.in by replacing some placeholders with the correct prefixes. While I was there, I also added functionality to generate a version of this file with an extra underscore for the Blackfin port. But the linker was changed to use C symbol names in linker scripts and I was told that this script in GCC would be removed in consequence. http://sourceware.org/ml/binutils/2010-12/msg00375.html Oh well. Dropped. Any new target macro for use only in target libraries should, in my view, be poisoned in the host system.h from the start to ensure that no-one accidentally adds definitions to the host tm.h. This would be alongside the existing /* Target macros only used for code built for the target, that have moved to libgcc-tm.h. */ #pragma GCC poison DECLARE_LIBRARY_RENAMES Done. New patch below, now testing. I think it may have caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49160 -- H.J.
Re: [patch][simplify-rtx] Fix 16-bit - 64-bit multiply and accumulate
On 25/05/11 14:19, Joseph S. Myers wrote:

> RTL has defined abstract semantics and RTL transformations should be ones
> that are valid in accordance with those semantics, with proper assertions
> if there are additional constraints on the input passed to a function.
> This means actually counting the numbers of variable bits in the operands
> to determine whether the multiplication could overflow.

Ok, fair enough, so how can I identify a valid subreg extraction that is defined in terms of shifts?

The case that I care about is simple enough:

  (mult:SI (ashiftrt:SI (reg:SI rM) (const_int 16))
           (sign_extend:SI (subreg:HI (reg:SI rN) 0)))

I guess that's just equivalent to this:

  (mult:SI (sign_extend:SI (subreg:HI (reg:SI rM) 4))
           (sign_extend:SI (subreg:HI (reg:SI rN) 0)))

but it chooses not to represent it that way, which is less than helpful in this case.

So I could just scan for that exact pattern, or perhaps look for shift sizes that are half the size of the register, or some such thing, but is that general enough? Or is it too general again?

Is there anything else I've missed?

Andrew
Re: PATCH: Add pause intrinsic
On Tue, May 24, 2011 at 8:28 PM, H.J. Lu hjl.to...@gmail.com wrote: This patch implements pause intrinsic suggested by Andi. OK for trunk? gcc/ 2011-05-24 H.J. Lu hongjiu...@intel.com * config/i386/i386.c (ix86_builtins): Add IX86_BUILTIN_PAUSE. (bdesc_special_args): Add pause intrinsic. * config/i386/i386.md (UNSPEC_PAUSE): New. (pause): Likewise. (*pause): Likewise. * config/i386/ia32intrin.h (__pause): Likewise. * doc/extend.texi (X86 Built-in Functions): Add documentation for pause intrinsic. gcc/testsuite/ 2011-05-24 H.J. Lu hongjiu...@intel.com * gcc.target/i386/pause-1.c: New. OK. Thanks, Uros.
Re: [ARM] fix C++ EH interoperability
On 05/23/11 16:54, Andrew Haley wrote: On 05/23/2011 04:52 PM, Nathan Sidwell wrote: This patch fixes an interoperability issue with code generated by ARM's EABI compiler. This patch results has been tested for arm-linux, and independently tested by ARM with mixed RVCT-generated code confirming the defect has been fixed. ok? What did the Java test results look like? They are unchanged. nathan -- Nathan Sidwell
Re: Prefixes for libgcc symbols (C6X 9.5/11)
On 05/25/2011 01:37 PM, H.J. Lu wrote: I think it may have caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49160 Looks like it. Not quite sure how to fix it yet. Do you know what files such as i386/64/_divtc3.c are trying to achieve? Bernd
Re: [PATCH PR45098, 4/10] Iv init cost.
Sorry for being so late. I was just curious...

Tom de Vries vr...@codesourcery.com writes:

> The init cost of an iv will in general not be zero. It will be
> exceptional that the iv register happens to be initialized with the
> proper value at no cost. In general, there will at the very least be a
> regcopy or a const set.
>
> 2011-05-05  Tom de Vries  t...@codesourcery.com
>
>         PR target/45098
>         * tree-ssa-loop-ivopts.c (determine_iv_cost): Prevent
>         cost_base.cost == 0.
>
> Index: gcc/tree-ssa-loop-ivopts.c
> ===
> --- gcc/tree-ssa-loop-ivopts.c  (revision 173380)
> +++ gcc/tree-ssa-loop-ivopts.c  (working copy)
> @@ -4688,6 +4688,8 @@ determine_iv_cost (struct ivopts_data *d
>    base = cand->iv->base;
>    cost_base = force_var_cost (data, base, NULL);
> +  if (cost_base.cost == 0)
> +    cost_base.cost = COSTS_N_INSNS (1);
>    cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
>
>    cost = cost_step + adjust_setup_cost (data, cost_base.cost);

...why does this reasoning apply only to this call to force_var_cost?

Richard
Re: C6X port 9/11: Allow defining attributes in terms of another
On Wed, 25 May 2011, Bernd Schmidt wrote: I'm not sure where you're looking for added documentation for this patch. I guess no surprise that'd be md.texi node Defining Attributes, or an updated example in node Attr Example since the documentation for default basically just refers to it. Or perhaps better node Expressions where (attr x) is documented, since it says it's mostly useful for numeric attributes and not so for non-numeric attributes. Perhaps add after that sentence It can also be used to yield the value of another attribute, useful to e.g. set the value of the current attribute if they share a domain. You can probably find a better wording. :) It just generalizes the define_attr mechanism a little to allow one more kind of expression. Yes, the documentation is a bit terse, isn't it. But the idea that you can redirect to another attribute instead of referring to it in a conditional like in eq_attr seems to me new enough to warrant a line. brgds, H-P
Re: [patch][simplify-rtx] Fix 16-bit - 64-bit multiply and accumulate
On Wed, 25 May 2011, Andrew Stubbs wrote: On 25/05/11 14:19, Joseph S. Myers wrote: RTL has defined abstract semantics and RTL transformations should be ones that are valid in accordance with those semantics, with proper assertions if there are additional constraints on the input passed to a function. This means actually counting the numbers of variable bits in the operands to determine whether the multiplication could overflow. Ok, fair enough, so how can I identify a valid subreg extraction that is defined in terms of shifts? The shift must be by a positive constant amount, strictly less than the precision (GET_MODE_PRECISION) of the mode (of the value being shifted). If that applies, the relevant number of bits is the precision of the mode minus the number of bits of the shift. For an extension, just take the number of bits in the inner mode. Add the two numbers of bits; if the result does not exceed the number of bits in the mode (of the operands and the multiplication) then the multiplication won't overflow. As in your patch, either all the operands must be sign-extensions / arithmetic shifts (and then the result is equivalent to a widening signed multiply), or all must be zero-extensions / logical shifts (and the result is a widening unsigned multiply). -- Joseph S. Myers jos...@codesourcery.com
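The bit-counting rule described above can be sketched in plain C as follows. This is only a model of the arithmetic, not simplify-rtx code: `struct operand`, `significant_bits` and `mult_cannot_overflow` are hypothetical names, and the requirement that both operands agree in signedness (all sign-extensions and arithmetic shifts, or all zero-extensions and logical shifts) is assumed to be checked separately.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one multiplication operand.  */
enum operand_kind { OP_EXTEND, OP_SHIFT_RIGHT };

struct operand
{
  enum operand_kind kind;
  unsigned inner_bits;   /* OP_EXTEND: precision of the inner mode.  */
  unsigned shift_count;  /* OP_SHIFT_RIGHT: constant shift amount.  */
};

/* Number of variable bits an operand can contribute, or 0 if the
   operand does not qualify under the rule.  */
static unsigned significant_bits (struct operand op, unsigned mode_bits)
{
  switch (op.kind)
    {
    case OP_EXTEND:
      /* An extension contributes the bits of its inner mode.  */
      if (op.inner_bits == 0 || op.inner_bits >= mode_bits)
        return 0;
      return op.inner_bits;
    case OP_SHIFT_RIGHT:
      /* The shift must be by a positive constant strictly less than
         the precision of the mode being shifted.  */
      if (op.shift_count == 0 || op.shift_count >= mode_bits)
        return 0;
      return mode_bits - op.shift_count;
    }
  return 0;
}

/* True if the product of the two operands fits in MODE_BITS bits.  */
bool mult_cannot_overflow (struct operand lhs, struct operand rhs,
                           unsigned mode_bits)
{
  unsigned l = significant_bits (lhs, mode_bits);
  unsigned r = significant_bits (rhs, mode_bits);
  if (l == 0 || r == 0)
    return false;
  return l + r <= mode_bits;
}
```

For the case raised earlier in the thread, an SImode shift right by 16 contributes 16 bits and an extension from HImode contributes 16 bits, so the sum is 32 and the multiply is safe to treat as widening; a shift by only 8 would contribute 24 bits and fail the test.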
Re: Prefixes for libgcc symbols (C6X 9.5/11)
On 05/25/2011 01:45 PM, H.J. Lu wrote: On Wed, May 25, 2011 at 6:42 AM, Bernd Schmidt ber...@codesourcery.com wrote: On 05/25/2011 01:37 PM, H.J. Lu wrote: I think it may have caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49160 Looks like it. Not quite sure how to fix it yet. Do you know what files such as i386/64/_divtc3.c are trying to achieve? It provides backward compatibility with symbol versioning: [hjl@gnu-4 64]$ readelf -s /lib64/libgcc_s.so.1| grep __powitf2 52: 003e8a80d170 167 FUNCGLOBAL DEFAULT 12 __powitf2@@GCC_4.3.0 54: 003e8a80d170 167 FUNCGLOBAL DEFAULT 12 __powitf2@GCC_4.0.0 [hjl@gnu-4 64]$ That leaves me as clueless as before. Why does i386/64 need this but not other targets (such as i386/32), and why only those three functions (from the ones in libgcc)? Anyhow, below is one possible way of fixing it. Bernd PR bootstrap/49160 * libgcc2.h (__powisf2, __powidf2, __powitf2, __powixf2, __mulsc3, __muldc3, __mulxc3, __multc3, __divsc3, __divdc3, __divxc3, __divtc3): Wrap definitions in #ifndef. 
Index: gcc/libgcc2.h
===
--- gcc/libgcc2.h       (revision 174187)
+++ gcc/libgcc2.h       (working copy)
@@ -324,23 +324,48 @@ typedef int shift_count_type __attribute
 #define __parityDI2            __NDW(parity,2)

 #define __clz_tab              __N(clz_tab)
+#define __bswapsi2             __N(bswapsi2)
+#define __bswapdi2             __N(bswapdi2)
+#define __udiv_w_sdiv          __N(udiv_w_sdiv)
+#define __clear_cache          __N(clear_cache)
+#define __enable_execute_stack __N(enable_execute_stack)
+
+#ifndef __powisf2
 #define __powisf2              __N(powisf2)
+#endif
+#ifndef __powidf2
 #define __powidf2              __N(powidf2)
+#endif
+#ifndef __powitf2
 #define __powitf2              __N(powitf2)
+#endif
+#ifndef __powixf2
 #define __powixf2              __N(powixf2)
-#define __bswapsi2             __N(bswapsi2)
-#define __bswapdi2             __N(bswapdi2)
+#endif
+#ifndef __mulsc3
 #define __mulsc3               __N(mulsc3)
+#endif
+#ifndef __muldc3
 #define __muldc3               __N(muldc3)
+#endif
+#ifndef __mulxc3
 #define __mulxc3               __N(mulxc3)
+#endif
+#ifndef __multc3
 #define __multc3               __N(multc3)
+#endif
+#ifndef __divsc3
 #define __divsc3               __N(divsc3)
+#endif
+#ifndef __divdc3
 #define __divdc3               __N(divdc3)
+#endif
+#ifndef __divxc3
 #define __divxc3               __N(divxc3)
+#endif
+#ifndef __divtc3
 #define __divtc3               __N(divtc3)
-#define __udiv_w_sdiv          __N(udiv_w_sdiv)
-#define __clear_cache          __N(clear_cache)
-#define __enable_execute_stack __N(enable_execute_stack)
+#endif

 extern DWtype __muldi3 (DWtype, DWtype);
 extern DWtype __divdi3 (DWtype, DWtype);
Re: PATCH: PR target/49142: Invalid 8bit register operand
On Tue, May 24, 2011 at 5:54 PM, H.J. Lu hongjiu...@intel.com wrote:
> Hi,
>
> We are working on a new optimization, which turns off TARGET_MOVX.
> GCC generates:
>
> movb %ah, %dil
>
> But %ah can only be used with %[abcd][hl]. This patch adds
> QIreg_operand and uses it in *movqi_extv_1_rex64/*movqi_extzv_2_rex64.
> OK for trunk if there is no regression?

If this is the case, then please change q_regs_operand predicate to
accept just QI_REG_P registers.

Uros.
[PATCH PING] unreviewed tree-slimming patches
These patches:

(C, C++, middle-end)
[PATCH 14/18] move TS_STATEMENT_LIST to be a substructure of TS_TYPED
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00560.html

(C, Java, middle-end)
[PATCH 18/18] make TS_BLOCK a substructure of TS_BASE
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00564.html

are still pending review. Jason commented on the TS_STATEMENT_LIST patch,
but the discussion didn't come to a resolution. I forgot to CC the TS_BLOCK
patch to the Java folks the first time around.

Thanks,
-Nathan
C++ PATCH to cp_common_init_ts to fix crash in print_node
Trying to print a TYPE_ARGUMENT_PACK in the debugger with debug_tree
crashes because print_node assumes that all types have TS_COMMON. Fixed
thus.

Tested x86_64-pc-linux-gnu, applying to trunk as obvious.

commit 7e5c923a908bffb2d8f8404f6cc7fd81a85bf932
Author: Jason Merrill ja...@redhat.com
Date: Tue May 24 23:16:23 2011 -0400

    * cp-objcp-common.c (cp_common_init_ts): TYPE_ARGUMENT_PACK has
    TS_COMMON.

diff --git a/gcc/cp/cp-objcp-common.c b/gcc/cp/cp-objcp-common.c
index ed85491..df6b1dd 100644
--- a/gcc/cp/cp-objcp-common.c
+++ b/gcc/cp/cp-objcp-common.c
@@ -241,6 +241,7 @@ cp_common_init_ts (void)
   MARK_TS_COMMON (UNDERLYING_TYPE);
   MARK_TS_COMMON (BASELINK);
   MARK_TS_COMMON (TYPE_PACK_EXPANSION);
+  MARK_TS_COMMON (TYPE_ARGUMENT_PACK);
   MARK_TS_COMMON (DECLTYPE_TYPE);
   MARK_TS_COMMON (BOUND_TEMPLATE_TEMPLATE_PARM);
   MARK_TS_COMMON (UNBOUND_CLASS_TEMPLATE);
C++ PATCH for c++/48292 (variadics and member templates)
Several parts of the variadic template code have had trouble dealing with
partial instantiation; this is another one.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 0bbe297555a3e6585f1668266d965745df352ba4
Author: Jason Merrill ja...@redhat.com
Date: Tue May 24 23:20:29 2011 -0400

    PR c++/48292
    * pt.c (tsubst_decl) [PARM_DECL]: Handle partial instantiation of
    function parameter pack.
    (tsubst_pack_expansion): Likewise.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index bd9aeba..fc84314 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8711,7 +8711,12 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain,
 		 have the wrong value for a recursive call. Just make a
 		 dummy decl, since it's only used for its type. */
 	      arg_pack = tsubst_decl (parm_pack, args, complain);
-	      arg_pack = make_fnparm_pack (arg_pack);
+	      if (arg_pack && FUNCTION_PARAMETER_PACK_P (arg_pack))
+		/* Partial instantiation of the parm_pack, we can't build
+		   up an argument pack yet. */
+		arg_pack = NULL_TREE;
+	      else
+		arg_pack = make_fnparm_pack (arg_pack);
 	    }
 	}
       else
@@ -9801,14 +9806,14 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 	    if (DECL_TEMPLATE_PARM_P (t))
 	      SET_DECL_TEMPLATE_PARM_P (r);
 
-	    /* An argument of a function parameter pack is not a parameter
-	       pack. */
-	    FUNCTION_PARAMETER_PACK_P (r) = false;
-
 	    if (expanded_types)
 	      /* We're on the Ith parameter of the function parameter
 		 pack. */
 	      {
+		/* An argument of a function parameter pack is not a parameter
+		   pack. */
+		FUNCTION_PARAMETER_PACK_P (r) = false;
+
 		/* Get the Ith type. */
 		type = TREE_VEC_ELT (expanded_types, i);
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic109.C b/gcc/testsuite/g++.dg/cpp0x/variadic109.C
new file mode 100644
index 000..0ec69af
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic109.C
@@ -0,0 +1,17 @@
+// PR c++/48292
+// { dg-options -std=c++0x }
+
+template <typename... Args> int g(Args...);
+
+template <int N = 0>
+struct A
+{
+    template <typename... Args>
+    static auto f(Args... args) -> decltype(g(args...));
+};
+
+int main()
+{
+    A<>::f();
+    return 0;
+}
Re: Fix for libobjc/48177. Can I apply it to 4.6 as well ?
>> This patch fixes libobjc/48177. I applied it to trunk.
>>
>> I'd like to apply this patch to the 4.6 branch too. Do I need permission
>> from a Release Manager ?
>
> They are always welcome to chime in, though, in this case the libobjc
> maintainer can approve it.

Thanks Mike

I browsed the archives of gcc-patches for a while and as far as I can see,
you are right and other maintainers do approve patches for the 4.6 branch
(in their own areas) without waiting for a Release Manager to double-approve
each patch (and it makes sense).

So I applied it to the 4.6 branch too. :-)

Thanks
C++ PATCH for c++/45080 (lambda conversion in templates)
The lambda conversion operator isn't added to CLASSTYPE_DECL_LIST, so it
got lost on instantiation. But since we cut some corners building it up to
reduce runtime overhead, it's easier to just add it again at instantiation
time.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 3b93aba17af31a772141a871c3299250dbbda714
Author: Jason Merrill ja...@redhat.com
Date: Wed May 25 01:21:49 2011 -0400

    PR c++/45080
    * pt.c (instantiate_class_template_1): Call maybe_add_lambda_conv_op.
    * semantics.c (lambda_function): Check COMPLETE_OR_OPEN_TYPE_P.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index fc84314..bb4515b 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8566,6 +8566,9 @@ instantiate_class_template_1 (tree type)
 	}
     }
 
+  if (CLASSTYPE_LAMBDA_EXPR (type))
+    maybe_add_lambda_conv_op (type);
+
   /* Set the file and line number information to whatever is given for
      the class itself. This puts error messages involving generated
      implicit functions at a predictable point, and the same point
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 50f25f0..55ad117 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -8145,7 +8145,8 @@ lambda_function (tree lambda)
     type = lambda;
   gcc_assert (LAMBDA_TYPE_P (type));
   /* Don't let debug_tree cause instantiation. */
-  if (CLASSTYPE_TEMPLATE_INSTANTIATION (type) && !COMPLETE_TYPE_P (type))
+  if (CLASSTYPE_TEMPLATE_INSTANTIATION (type)
+      && !COMPLETE_OR_OPEN_TYPE_P (type))
     return NULL_TREE;
   lambda = lookup_member (type, ansi_opname (CALL_EXPR),
 			  /*protect=*/0, /*want_type=*/false);
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv5.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv5.C
new file mode 100644
index 000..53d8e99
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv5.C
@@ -0,0 +1,15 @@
+// PR c++/45080
+// { dg-options -std=c++0x }
+
+typedef void(*pfn)();
+
+template<typename=int>
+void f()
+{
+  pfn fn = []{};
+}
+
+void test()
+{
+  f();
+}
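As a rough, runnable variation on the test above (a sketch, not part of the patch): a captureless lambda written inside a function template should still convert to a plain function pointer once the template is instantiated. The name `call_through_pointer` and the return value are made up for illustration.

```cpp
#include <cassert>

// Function-pointer type the closure must convert to.
typedef int (*pfn)();

// The lambda's closure type is created inside a template, so its
// conversion operator has to be available after instantiation.
template <typename T = int>
int call_through_pointer()
{
  pfn fn = [] { return 42; };  // relies on the closure's conversion to pfn
  return fn();                 // call through the plain function pointer
}
```

Calling `call_through_pointer<>()` forces the instantiation, which is the situation the PR describes.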
C++ PATCH for c++/45418 (list-initialization of member array)
The code in perform_member_init for handling arrays of non-trivial classes
needed a tweak to handle list-initialization.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit ca84b75b33c26be3e9cf2894f4c8b08e3a5cac73
Author: Jason Merrill ja...@redhat.com
Date: Wed May 25 00:45:38 2011 -0400

    PR c++/45418
    * init.c (perform_member_init): Handle list-initialization
    of array of non-trivial class type.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 5f30275..6336dd7 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -549,6 +549,8 @@ perform_member_init (tree member, tree init)
 	{
 	  gcc_assert (TREE_CHAIN (init) == NULL_TREE);
 	  init = TREE_VALUE (init);
+	  if (BRACE_ENCLOSED_INITIALIZER_P (init))
+	    init = digest_init (type, init, tf_warning_or_error);
 	}
       if (init == NULL_TREE
 	  || same_type_ignoring_top_level_qualifiers_p (type,
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist50.C b/gcc/testsuite/g++.dg/cpp0x/initlist50.C
new file mode 100644
index 000..ef4e72c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist50.C
@@ -0,0 +1,21 @@
+// PR c++/45418
+// { dg-options -std=c++0x }
+
+struct A1 { };
+struct A2 {
+  A2();
+};
+
+template <class T> struct B {
+  T ar[1];
+  B(T t):ar({t}) {}
+};
+
+int main(){
+  B<int> bi{1};
+  A1 a1;
+  B<A1> ba1{a1};
+  A2 a2;
+  A2 a2r[1]{{a2}};
+  B<A2> ba2{a2};
+}
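A small self-contained sketch of the language feature involved, for illustration only: a member array of a class type with a user-provided constructor, list-initialized in a mem-initializer. `Item` and `Box` are hypothetical names, and the sketch uses the plain `ar{a, b}` spelling rather than the parenthesized `ar({t})` form from the test; the constructors are marked constexpr only so the result can be checked at compile time.

```cpp
// Element type with a user-provided constructor, so it is not trivial.
struct Item {
  int v;
  constexpr Item(int x) : v(x) {}
};

// The member array is list-initialized directly in the mem-initializer.
template <class T>
struct Box {
  T ar[2];
  constexpr Box(T a, T b) : ar{a, b} {}
};
```

Each array element is copy-initialized from the corresponding entry of the braced list, so `Item` does not need a default constructor.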
C++ PATCH for c++/48935 (ICE with invalid enum scope)
Checking constructor_name_p doesn't work for an enum, and there's no reason
to check it for non-classes anyway.

The change to cp_parser_diagnose_invalid_type_name is to avoid saying that
a scoped enum is a class; now it will print the actual tag used in defining
the type.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit bef993e717fdccbde6acd7bde7aed2770cc1a95f
Author: Jason Merrill ja...@redhat.com
Date: Wed May 25 01:44:53 2011 -0400

    PR c++/48935
    * parser.c (cp_parser_constructor_declarator_p): Don't check
    constructor_name_p for enums.
    (cp_parser_diagnose_invalid_type_name): Correct error message.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 3493e44..db2cb96 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -2534,7 +2534,7 @@ cp_parser_diagnose_invalid_type_name (cp_parser *parser,
 		  "%qT is a dependent scope",
 		  parser->scope, id, parser->scope);
       else if (TYPE_P (parser->scope))
-	error_at (location, "%qE in class %qT does not name a type",
+	error_at (location, "%qE in %q#T does not name a type",
 		  id, parser->scope);
       else
 	gcc_unreachable ();
@@ -19589,7 +19589,7 @@ cp_parser_constructor_declarator_p (cp_parser *parser, bool friend_p)
       /* If we have a class scope, this is easy; DR 147 says that S::S always
 	 names the constructor, and no other qualified name could. */
       if (constructor_p && nested_name_specifier
-	  && TYPE_P (nested_name_specifier))
+	  && CLASS_TYPE_P (nested_name_specifier))
 	{
 	  tree id = cp_parser_unqualified_id (parser,
 					      /*template_keyword_p=*/false,
diff --git a/gcc/testsuite/g++.dg/cpp0x/enum16.C b/gcc/testsuite/g++.dg/cpp0x/enum16.C
new file mode 100644
index 000..ebb4868
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/enum16.C
@@ -0,0 +1,6 @@
+// PR c++/48935
+// { dg-options -std=c++0x }
+
+enum class ENUM { a };
+
+ENUM::Type func() { return ENUM::a; } // { dg-error "does not name a type" }
diff --git a/gcc/testsuite/g++.dg/parse/error15.C b/gcc/testsuite/g++.dg/parse/error15.C
index 2352193..607a1db 100644
--- a/gcc/testsuite/g++.dg/parse/error15.C
+++ b/gcc/testsuite/g++.dg/parse/error15.C
@@ -12,7 +12,7 @@ namespace N
 N::A f2; // { dg-error "1:invalid use of template-name 'N::A' without an argument list" }
 N::INVALID f3;// { dg-error "1:'INVALID' in namespace 'N' does not name a type" }
-N::C::INVALID f4; // { dg-error "1:'INVALID' in class 'N::C' does not name a type" }
+N::C::INVALID f4; // { dg-error "1:'INVALID' in 'struct N::C' does not name a type" }
 N::K f6; // { dg-error "1:'K' in namespace 'N' does not name a type" }
 typename N::A f7; // { dg-error 13:invalid use of template-name 'N::A' without an argument list 13 { target *-*-* } 17 }
@@ -22,7 +22,7 @@ struct B
   N::A f2;// { dg-error "3:invalid use of template-name 'N::A' without an argument list" }
   N::INVALID f3; // { dg-error "3:'INVALID' in namespace 'N' does not name a type" }
-  N::C::INVALID f4; // { dg-error "3:'INVALID' in class 'N::C' does not name a type" }
+  N::C::INVALID f4; // { dg-error "3:'INVALID' in 'struct N::C' does not name a type" }
   N::K f6;// { dg-error "3:'K' in namespace 'N' does not name a type" }
   typename N::A f7; // { dg-error 15:invalid use of template-name 'N::A' without an argument list 15 { target *-*-* } 27 }
@@ -33,7 +33,7 @@ struct C
   N::A f2;// { dg-error "3:invalid use of template-name 'N::A' without an argument list" }
   N::INVALID f3; // { dg-error "3:'INVALID' in namespace 'N' does not name a type" }
-  N::C::INVALID f4; // { dg-error "3:'INVALID' in class 'N::C' does not name a type" }
+  N::C::INVALID f4; // { dg-error "3:'INVALID' in 'struct N::C' does not name a type" }
   N::K f6;// { dg-error "3:'K' in namespace 'N' does not name a type" }
   typename N::A f7; // { dg-error "15:invalid use of template-name 'N::A' without an argument list" }
 };
[v3] Use noexcept in thread and mutex
Hi,

tested x86_64-linux, committed to mainline.

Thanks,
Paolo.

2011-05-25  Paolo Carlini  paolo.carl...@oracle.com

	* include/std/thread: Use noexcept throughout per the FDIS.
	* include/std/mutex: Likewise.

Index: include/std/thread
===================================================================
--- include/std/thread (revision 174185)
+++ include/std/thread (working copy)
@@ -72,7 +72,7 @@
       native_handle_type _M_thread;
 
     public:
-      id() : _M_thread() { }
+      id() noexcept : _M_thread() { }
 
       explicit
       id(native_handle_type __id) : _M_thread(__id) { }
@@ -82,11 +82,11 @@
       friend class hash<thread::id>;
 
       friend bool
-      operator==(thread::id __x, thread::id __y)
+      operator==(thread::id __x, thread::id __y) noexcept
       { return __gthread_equal(__x._M_thread, __y._M_thread); }
 
       friend bool
-      operator<(thread::id __x, thread::id __y)
+      operator<(thread::id __x, thread::id __y) noexcept
       { return __x._M_thread < __y._M_thread; }
 
       template<class _CharT, class _Traits>
@@ -121,11 +121,11 @@
     id _M_id;
 
   public:
-    thread() = default;
+    thread() noexcept = default;
     thread(thread&) = delete;
     thread(const thread&) = delete;
 
-    thread(thread&& __t)
+    thread(thread&& __t) noexcept
     { swap(__t); }
 
     template<typename _Callable, typename... _Args>
@@ -145,7 +145,7 @@
 
     thread& operator=(const thread&) = delete;
 
-    thread& operator=(thread&& __t)
+    thread& operator=(thread&& __t) noexcept
     {
       if (joinable())
 	std::terminate();
@@ -154,11 +154,11 @@
     }
 
     void
-    swap(thread& __t)
+    swap(thread& __t) noexcept
     { std::swap(_M_id, __t._M_id); }
 
     bool
-    joinable() const
+    joinable() const noexcept
     { return !(_M_id == id()); }
 
     void
@@ -168,7 +168,7 @@
     detach();
 
     thread::id
-    get_id() const
+    get_id() const noexcept
     { return _M_id; }
 
     /** @pre thread is joinable
@@ -179,7 +179,7 @@
 
     // Returns a value that hints at the number of hardware thread contexts.
     static unsigned int
-    hardware_concurrency()
+    hardware_concurrency() noexcept
     { return 0; }
 
   private:
@@ -198,23 +198,23 @@
   inline thread::_Impl_base::~_Impl_base() = default;
 
   inline void
-  swap(thread& __x, thread& __y)
+  swap(thread& __x, thread& __y) noexcept
   { __x.swap(__y); }
 
   inline bool
-  operator!=(thread::id __x, thread::id __y)
+  operator!=(thread::id __x, thread::id __y) noexcept
   { return !(__x == __y); }
 
   inline bool
-  operator<=(thread::id __x, thread::id __y)
+  operator<=(thread::id __x, thread::id __y) noexcept
   { return !(__y < __x); }
 
   inline bool
-  operator>(thread::id __x, thread::id __y)
+  operator>(thread::id __x, thread::id __y) noexcept
   { return __y < __x; }
 
   inline bool
-  operator>=(thread::id __x, thread::id __y)
+  operator>=(thread::id __x, thread::id __y) noexcept
   { return !(__x < __y); }
 
   // DR 889.
@@ -250,12 +250,12 @@
     /// get_id
     inline thread::id
-    get_id() { return thread::id(__gthread_self()); }
+    get_id() noexcept { return thread::id(__gthread_self()); }
 
 #ifdef _GLIBCXX_USE_SCHED_YIELD
     /// yield
     inline void
-    yield()
+    yield() noexcept
     { __gthread_yield(); }
 #endif
 
Index: include/std/mutex
===================================================================
--- include/std/mutex (revision 174185)
+++ include/std/mutex (working copy)
@@ -1,6 +1,6 @@
 // <mutex> -*- C++ -*-
 
-// Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
+// Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
 // Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library. This library is free
@@ -70,9 +70,9 @@
     typedef __native_type* native_handle_type;
 
 #ifdef __GTHREAD_MUTEX_INIT
-    constexpr mutex() : _M_mutex(__GTHREAD_MUTEX_INIT) { }
+    constexpr mutex() noexcept : _M_mutex(__GTHREAD_MUTEX_INIT) { }
 #else
-    mutex()
+    mutex() noexcept
     {
       // XXX EAGAIN, ENOMEM, EPERM, EBUSY(may), EINVAL(may)
       __GTHREAD_MUTEX_INIT_FUNCTION(&_M_mutex);
@@ -95,7 +95,7 @@
     }
 
     bool
-    try_lock()
+    try_lock() noexcept
     {
       // XXX EINVAL, EAGAIN, EBUSY
       return !__gthread_mutex_trylock(&_M_mutex);
@@ -188,7 +188,7 @@
     }
 
     bool
-    try_lock()
+    try_lock() noexcept
     {
       // XXX EINVAL, EAGAIN, EBUSY
       return !__gthread_recursive_mutex_trylock(&_M_mutex);
@@ -247,7 +247,7 @@
     }
 
     bool
-    try_lock()
+    try_lock() noexcept
     {
       // XXX EINVAL, EAGAIN, EBUSY
       return !__gthread_mutex_trylock(&_M_mutex);
@@ -354,7 +354,7 @@
     }
 
     bool
-    try_lock()
+    try_lock() noexcept
     {
       // XXX EINVAL, EAGAIN, EBUSY
       return !__gthread_recursive_mutex_trylock(&_M_mutex);
@@ -464,7 +464,7 @@
   public:
     typedef
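What the added noexcept specifiers buy callers can be sketched with a toy class. This is not libstdc++ code: `Handle` and its members are hypothetical stand-ins that only mirror the shape of the changes (default and move construction, swap, and simple queries marked noexcept, while an operation that could fail is left unmarked).

```cpp
#include <type_traits>
#include <utility>

// Toy stand-in for a thread-like handle; all names are illustrative.
class Handle {
  int id_ = 0;
public:
  Handle() noexcept = default;
  explicit Handle(int id) noexcept : id_(id) {}
  Handle(const Handle&) = delete;
  Handle(Handle&& other) noexcept : id_(other.id_) { other.id_ = 0; }

  void swap(Handle& other) noexcept { std::swap(id_, other.id_); }
  bool joinable() const noexcept { return id_ != 0; }

  // Deliberately not noexcept: in a real class this could report failure.
  void join() { id_ = 0; }
};
```

Code such as containers or generic algorithms can query these guarantees with the `noexcept` operator or the `<type_traits>` helpers and choose a cheaper strategy when the operation is known not to throw.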
Re: Cgraph thunk reorg
Honza,

After we debugged this offline, I assume that you applied a version of the
patch to trunk?

Thanks, David

On Fri, May 13, 2011 at 3:14 PM, David Edelsohn dje@gmail.com wrote:
> Honza,
>
> Testing is not complete, but testcases that failed with DECL_ONE_ONLY
> error now are passing with the later version of the patch you sent.
>
> - David
>
> On Fri, May 13, 2011 at 10:49 AM, Jan Hubicka hubi...@ucw.cz wrote:
>> Hi,
>> please also try this patch
>>
>> Index: ipa.c
>> ===================================================================
>> --- ipa.c (revision 173723)
>> +++ ipa.c (working copy)
>> @@ -886,6 +886,9 @@ function_and_variable_visibility (bool w
>>  	  while (decl_node->thunk.thunk_p)
>>  	    decl_node = decl_node->callees->callee;
>>  
>> +	  DECL_COMDAT_GROUP (node->decl) = DECL_COMDAT_GROUP (decl_node->decl);
>> +	  DECL_COMDAT (node->decl) = DECL_COMDAT (decl_node->decl);
>> +
>>  	  /* Thunks have the same visibility as function they are attached to.
>>  	     For some reason C++ frontend don't seem to care. I.e. in
>>  	     g++.dg/torture/pr41257-2.C the thunk is not comdat while function
>> @@ -893,10 +896,8 @@ function_and_variable_visibility (bool w
>>  
>>  	     We also need to arrange the thunk into the same comdat group as
>>  	     the function it reffers to. */
>> -	  if (DECL_COMDAT (decl_node->decl))
>> +	  if (DECL_ONE_ONLY (decl_node->decl))
>>  	    {
>> -	      DECL_COMDAT (node->decl) = 1;
>> -	      DECL_COMDAT_GROUP (node->decl) = DECL_COMDAT_GROUP (decl_node->decl);
>>  	      if (!node->same_comdat_group)
>>  		{
>>  		  node->same_comdat_group = decl_node;
Re: Fix PR 49014
On 05/25/2011 08:21 AM, Andrey Belevantsev wrote:
> Vlad, Bernd, I wonder if we can avoid having recog_memoized >=0 insns that
> do not have proper DFA reservations (that is, they do not change the DFA
> state). I see that existing practice allows this as shown by Bernd's patch
> to 48403, i.e. such insns do not count against issue_rate. I would be
> happy to fix sel-sched in the same way. However, both sel-sched ICEs as
> shown by PRs 48143 and 49014 really uncover the latent bugs in the backend.
> So, is it possible to stop having such insns if scheduling is desired, or
> otherwise distinguish the insns that wrongly miss the proper DFA
> reservation?

Add a bool target podhook, targetm.sched.all_insns_have_reservations, and
add an assert in the scheduler if it is true.

I'm not sure what a good default value would be. Defining it to true would
almost certainly break a few ports initially (even assuming we override it
in sh where it's known not to be true), but I guess such an assertion
failure would be useful information for the target maintainers.

Or, if we want to enable extra checking on ports where not all insns have a
reservation, a new insn attribute (has_reservation) could be defined,
defined to evaluate to true by default in genattrtab, and
(set_attr has_reservation 0) added in the machine descriptions where
necessary.

Bernd
Re: PATCH: PR target/49142: Invalid 8bit register operand
On Wed, May 25, 2011 at 7:00 AM, Uros Bizjak ubiz...@gmail.com wrote:
> On Tue, May 24, 2011 at 5:54 PM, H.J. Lu hongjiu...@intel.com wrote:
>> Hi,
>>
>> We are working on a new optimization, which turns off TARGET_MOVX.
>> GCC generates:
>>
>> movb %ah, %dil
>>
>> But %ah can only be used with %[abcd][hl]. This patch adds
>> QIreg_operand and uses it in *movqi_extv_1_rex64/*movqi_extzv_2_rex64.
>> OK for trunk if there is no regression?
>
> If this is the case, then please change q_regs_operand predicate to
> accept just QI_REG_P registers.

I thought about it. It is a problem only with %[abcd]h. I am not sure if
changing q_regs_operand to accept just QI_REG_P registers will negatively
impact

(define_peephole2
  [(set (reg FLAGS_REG) (match_operand 0 "" ""))
   (set (match_operand:QI 1 "register_operand" "")
	(match_operator:QI 2 "ix86_comparison_operator"
	  [(reg FLAGS_REG) (const_int 0)]))
   (set (match_operand 3 "q_regs_operand" "")
	(zero_extend (match_dup 1)))]
  "(peep2_reg_dead_p (3, operands[1])
    || operands_match_p (operands[1], operands[3]))
   && ! reg_overlap_mentioned_p (operands[3], operands[0])"
  [(set (match_dup 4) (match_dup 0))
   (set (strict_low_part (match_dup 5))
	(match_dup 2))]

(define_peephole2
  [(set (reg FLAGS_REG) (match_operand 0 "" ""))
   (set (match_operand:QI 1 "register_operand" "")
	(match_operator:QI 2 "ix86_comparison_operator"
	  [(reg FLAGS_REG) (const_int 0)]))
   (parallel [(set (match_operand 3 "q_regs_operand" "")
		   (zero_extend (match_dup 1)))
	      (clobber (reg:CC FLAGS_REG))])]
  "(peep2_reg_dead_p (3, operands[1])
    || operands_match_p (operands[1], operands[3]))
   && ! reg_overlap_mentioned_p (operands[3], operands[0])"
  [(set (match_dup 4) (match_dup 0))
   (set (strict_low_part (match_dup 5))
	(match_dup 2))]

-- 
H.J.
Re: [RFA] [PR44618] [PowerPC] Wrong code for -frename-registers
On Mon, May 23, 2011 at 5:53 PM, edmar ed...@freescale.com wrote:
> I completed re-testing everything.
> It turns out I cannot reproduce the original error on gcc-4.4 (rev 173968)
> So, I am submitting only the patch that I tested for gcc-4.5/4.6/4.7
>
> Regression tested for e500mc target on:
> 4.5: Revision: 173928
> 4.6: Revision: 173936
> trunk: Revision: 173966
>
> The patch gcc.fix_rnreg4 applies directly to 4.6, 4.7 (1 line offset),
> and 4.5 (-632 lines offset)

Are you re-asking for approval?

The patch is okay.

Thanks, David

P.S. Please include the ChangeLog entry inline in the email message and
attach the patch to the email if it is large. No tar files.
Re: PATCH: Add pause intrinsic
On Wed, May 25, 2011 at 7:36 AM, Andrew Haley a...@redhat.com wrote:
> On 05/25/2011 01:34 PM, H.J. Lu wrote:
>> On Wed, May 25, 2011 at 3:26 AM, Andrew Haley a...@redhat.com wrote:
>>> On 05/24/2011 07:28 PM, H.J. Lu wrote:
>>>> This patch implements pause intrinsic suggested by Andi. OK
>>>> for trunk?
>>>
>>> What does "full memory barrier" here mean?
>>>
>>>> +@table @code
>>>> +@item void __builtin_ia32_pause (void)
>>>> +Generates the @code{pause} machine instruction with full memory barrier.
>>>> +@end table
>>>
>>> There a memory clobber, but no barrier instruction AFAICS. The
>>> doc needs to explain it a bit better.
>>
>> There are read/load memory barrier, write/store memory barrier and
>> full/general memory barrier. You can find them at
>>
>> http://www.kernel.org/doc/Documentation/memory-barriers.txt
>>
>> Should I include a pointer to it?
>
> No. I know perfectly well what memory barriers are. I'm not asking what
> "full memory barrier" means.
>
> What barrier instruction(s) does __builtin_ia32_pause() generate? All I
> see in the patch is "rep; nop". Is that really a full memory barrier?

It is a full memory barrier in the sense that compiler won't move
load/store across it. It is intended for kernel.

-- 
H.J.
[Patch ARM] Actually generate vorn and vbic instructions.
Hi,

A co-worker pointed out that we weren't generating vorn and vbic
instructions for Neon and I had a look. Tests are still running and I will
commit to trunk if there are no regressions.

cheers
Ramana

2011-05-25  Ramana Radhakrishnan  ramana.radhakrish...@linaro.org

	* config/arm/neon.md (orn<mode>3_neon): Canonicalize not.
	(orndi3_neon): Likewise.
	(bic<mode>3_neon): Likewise.

2011-05-25  Ramana Radhakrishnan  ramana.radhakrish...@linaro.org

	* gcc.target/arm/neon-vorn-vbic.c: New file.

Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md (revision 174174)
+++ gcc/config/arm/neon.md (working copy)
@@ -794,8 +794,8 @@
 
 (define_insn "orn<mode>3_neon"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
-	(ior:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
-		 (not:VDQ (match_operand:VDQ 2 "s_register_operand" "w"))))]
+	(ior:VDQ (not:VDQ (match_operand:VDQ 2 "s_register_operand" "w"))
+		 (match_operand:VDQ 1 "s_register_operand" "w")))]
   "TARGET_NEON"
   "vorn\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set_attr "neon_type" "neon_int_1")]
@@ -803,8 +803,8 @@
 
 (define_insn "orndi3_neon"
   [(set (match_operand:DI 0 "s_register_operand" "=w,?=&r,?&r")
-	(ior:DI (match_operand:DI 1 "s_register_operand" "w,r,0")
-		(not:DI (match_operand:DI 2 "s_register_operand" "w,0,r"))))]
+	(ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "w,0,r"))
+		(match_operand:DI 1 "s_register_operand" "w,r,0")))]
   "TARGET_NEON"
   "@
    vorn\t%P0, %P1, %P2
@@ -816,8 +816,8 @@
 
 (define_insn "bic<mode>3_neon"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
-	(and:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
-		 (not:VDQ (match_operand:VDQ 2 "s_register_operand" "w"))))]
+	(and:VDQ (not:VDQ (match_operand:VDQ 2 "s_register_operand" "w"))
+		 (match_operand:VDQ 1 "s_register_operand" "w")))]
   "TARGET_NEON"
   "vbic\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set_attr "neon_type" "neon_int_1")]
--- /dev/null 2011-05-18 14:49:12.916256701 +0100
+++ ./gcc/testsuite/gcc.target/arm/neon-vorn-vbic.c 2011-05-25 11:17:09.966726432 +0100
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-add-options arm_neon } */
+
+void bor (int *__restrict__ c, int *__restrict__ a, int *__restrict__ b)
+{
+  int i;
+  for (i=0;i<9;i++)
+    c[i] = b[i] | (~a[i]);
+}
+void bic (int *__restrict__ c, int *__restrict__ a, int *__restrict__ b)
+{
+  int i;
+  for (i=0;i<9;i++)
+    c[i] = b[i] & (~a[i]);
+}
+
+/* { dg-final { scan-assembler "vorn\\t" } } */
+/* { dg-final { scan-assembler "vbic\\t" } } */
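For reference, the per-element arithmetic that the two test loops perform can be written as scalar helpers. This is a plain C++ sketch of the operations only; `orn` and `bic` here are illustrative function names, not Neon intrinsics, and nothing in it depends on ARM.

```cpp
#include <cstdint>

// OR-NOT: the operation in the "bor" loop, b | ~a.
constexpr std::uint32_t orn(std::uint32_t b, std::uint32_t a) { return b | ~a; }

// Bit clear: the operation in the "bic" loop, b & ~a.
constexpr std::uint32_t bic(std::uint32_t b, std::uint32_t a) { return b & ~a; }
```

The machine-description change above only reorders the operands of the RTL patterns so that the complemented operand comes first; the computed values are the same either way because OR and AND are commutative.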
Re: PATCH: Add pause intrinsic
On Wed, May 25, 2011 at 4:47 PM, H.J. Lu hjl.to...@gmail.com wrote:
> On Wed, May 25, 2011 at 7:36 AM, Andrew Haley a...@redhat.com wrote:
>> On 05/25/2011 01:34 PM, H.J. Lu wrote:
>>> On Wed, May 25, 2011 at 3:26 AM, Andrew Haley a...@redhat.com wrote:
>>>> On 05/24/2011 07:28 PM, H.J. Lu wrote:
>>>>> This patch implements pause intrinsic suggested by Andi. OK
>>>>> for trunk?
>>>>
>>>> What does "full memory barrier" here mean?
>>>>
>>>>> +@table @code
>>>>> +@item void __builtin_ia32_pause (void)
>>>>> +Generates the @code{pause} machine instruction with full memory barrier.
>>>>> +@end table
>>>>
>>>> There a memory clobber, but no barrier instruction AFAICS. The
>>>> doc needs to explain it a bit better.
>>>
>>> There are read/load memory barrier, write/store memory barrier and
>>> full/general memory barrier. You can find them at
>>>
>>> http://www.kernel.org/doc/Documentation/memory-barriers.txt
>>>
>>> Should I include a pointer to it?
>>
>> No. I know perfectly well what memory barriers are. I'm not asking what
>> "full memory barrier" means.
>>
>> What barrier instruction(s) does __builtin_ia32_pause() generate? All I
>> see in the patch is "rep; nop". Is that really a full memory barrier?
>
> It is a full memory barrier in the sense that compiler won't move
> load/store across it. It is intended for kernel.

There is no such thing if you include accesses to automatic variables.

Richard.

> --
> H.J.
Re: PATCH: Add pause intrinsic
On 05/25/2011 03:47 PM, H.J. Lu wrote:
> On Wed, May 25, 2011 at 7:36 AM, Andrew Haley a...@redhat.com wrote:
>> On 05/25/2011 01:34 PM, H.J. Lu wrote:
>>> On Wed, May 25, 2011 at 3:26 AM, Andrew Haley a...@redhat.com wrote:
>>>> On 05/24/2011 07:28 PM, H.J. Lu wrote:
>>>>> This patch implements pause intrinsic suggested by Andi. OK
>>>>> for trunk?
>>>>
>>>> What does "full memory barrier" here mean?
>>>>
>>>>> +@table @code
>>>>> +@item void __builtin_ia32_pause (void)
>>>>> +Generates the @code{pause} machine instruction with full memory barrier.
>>>>> +@end table
>>>>
>>>> There a memory clobber, but no barrier instruction AFAICS. The
>>>> doc needs to explain it a bit better.
>>>
>>> There are read/load memory barrier, write/store memory barrier and
>>> full/general memory barrier. You can find them at
>>>
>>> http://www.kernel.org/doc/Documentation/memory-barriers.txt
>>>
>>> Should I include a pointer to it?
>>
>> No. I know perfectly well what memory barriers are. I'm not asking what
>> "full memory barrier" means.
>>
>> What barrier instruction(s) does __builtin_ia32_pause() generate? All I
>> see in the patch is "rep; nop". Is that really a full memory barrier?
>
> It is a full memory barrier in the sense that compiler won't move
> load/store across it. It is intended for kernel.

Right, so it is, in fact, not a full memory barrier. I thought not.
It's no more a full memory barrier than a simple asm volatile("").

The doc needs to explain that a bit better.

Andrew.
Re: PATCH: Add pause intrinsic
On Wed, May 25, 2011 at 4:54 PM, Andrew Haley a...@redhat.com wrote:
> On 05/25/2011 03:47 PM, H.J. Lu wrote:
>> On Wed, May 25, 2011 at 7:36 AM, Andrew Haley a...@redhat.com wrote:
>>> On 05/25/2011 01:34 PM, H.J. Lu wrote:
>>>> On Wed, May 25, 2011 at 3:26 AM, Andrew Haley a...@redhat.com wrote:
>>>>> On 05/24/2011 07:28 PM, H.J. Lu wrote:
>>>>>> This patch implements pause intrinsic suggested by Andi. OK
>>>>>> for trunk?
>>>>>
>>>>> What does "full memory barrier" here mean?
>>>>>
>>>>>> +@table @code
>>>>>> +@item void __builtin_ia32_pause (void)
>>>>>> +Generates the @code{pause} machine instruction with full memory barrier.
>>>>>> +@end table
>>>>>
>>>>> There a memory clobber, but no barrier instruction AFAICS. The
>>>>> doc needs to explain it a bit better.
>>>>
>>>> There are read/load memory barrier, write/store memory barrier and
>>>> full/general memory barrier. You can find them at
>>>>
>>>> http://www.kernel.org/doc/Documentation/memory-barriers.txt
>>>>
>>>> Should I include a pointer to it?
>>>
>>> No. I know perfectly well what memory barriers are. I'm not asking what
>>> "full memory barrier" means.
>>>
>>> What barrier instruction(s) does __builtin_ia32_pause() generate? All I
>>> see in the patch is "rep; nop". Is that really a full memory barrier?
>>
>> It is a full memory barrier in the sense that compiler won't move
>> load/store across it. It is intended for kernel.
>
> Right, so it is, in fact, not a full memory barrier. I thought not.
> It's no more a full memory barrier than a simple asm volatile("").
>
> The doc needs to explain that a bit better.

asm volatile ("" : : : "memory") in fact will work as a full memory barrier
because we are very very lazy in disambiguating against asms (but that
should change, at least a tiny bit). Function calls otoh are pretty well
optimized.

Richard.

> Andrew.
Re: PATCH: Add pause intrinsic
On 05/25/2011 03:57 PM, Richard Guenther wrote:
> asm volatile ("" : : : "memory") in fact will work as a full memory barrier

How? You surely need MFENCE or somesuch, unless all you care about is a
compiler barrier. That's what I think needs to be clarified.

Andrew.
Re: PATCH: Add pause intrinsic
On Wed, May 25, 2011 at 5:09 PM, Andrew Haley a...@redhat.com wrote:
> On 05/25/2011 03:57 PM, Richard Guenther wrote:
>> asm volatile ("" : : : "memory") in fact will work as a full memory barrier
>
> How? You surely need MFENCE or somesuch, unless all you care about is a
> compiler barrier. That's what I think needs to be clarified.

Well, yes, I'm talking about the compiler memory barrier.

Richard.

> Andrew.
Re: [testsuite] ignore irrelevant warning in two ARM tests
On 05/24/2011 05:49 PM, Mike Stump wrote: On May 24, 2011, at 3:42 PM, Janis Johnson wrote: Is this one OK for trunk and 4.6? The failure occurs for arm-none-eabi and for arm-none-linux-gnueabi. You should repeat all the original options from the main dg-options line, with -Wno-abi added, in the ARM EABI dg-options line, since only one dg-options line will be in effect. Oops, yet again. I'll do that. Ok with that change. Also, if there are many of these exceptions, it might be better to add the flags to shut it up to the base set of flags, and then to add it explicitly to any testcase that really does want to test the warning. These are the only tests I've found that get this message. Janis
Re: PATCH: Add pause intrinsic
Hi,

On Wed, 25 May 2011, Richard Guenther wrote:
>>> asm volatile ("" : : : "memory") in fact will work as a full memory barrier
>>
>> How? You surely need MFENCE or somesuch, unless all you care about is a
>> compiler barrier. That's what I think needs to be clarified.
>
> Well, yes, I'm talking about the compiler memory barrier.

Something that we conventionally call "optimization barrier" :) "memory
barrier" has a fixed meaning which we shouldn't use in this case, it's
confusing.

Ciao,
Michael.
Re: PATCH: Add pause intrinsic
On Wed, May 25, 2011 at 5:20 PM, Michael Matz m...@suse.de wrote:
> Hi,
>
> On Wed, 25 May 2011, Richard Guenther wrote:
>>>> asm volatile ("" : : : "memory") in fact will work as a full memory barrier
>>>
>>> How? You surely need MFENCE or somesuch, unless all you care about is a
>>> compiler barrier. That's what I think needs to be clarified.
>>
>> Well, yes, I'm talking about the compiler memory barrier.
>
> Something that we conventionally call "optimization barrier" :) "memory
> barrier" has a fixed meaning which we shouldn't use in this case, it's
> confusing.

Sure ;) And to keep the info in a suitable thread what I'd like to improve
here is to make us disambiguate memory loads/stores against asms that have
no memory outputs/inputs.

Richard.
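The distinction the thread converges on can be sketched in code. This is a hedged illustration rather than anything taken from the patch: the function names are made up, the empty asm with a "memory" clobber is the GNU extension quoted above, and `std::atomic_thread_fence` is used as a portable stand-in for "an actual fence such as MFENCE" (which instruction, if any, a compiler emits for it is target-dependent).

```cpp
#include <atomic>

// Optimization (compiler-only) barrier: keeps the compiler from moving
// memory accesses across this point, but emits no machine instruction.
inline void optimization_barrier() noexcept
{
#if defined(__GNUC__)
  asm volatile("" : : : "memory");
#else
  std::atomic_signal_fence(std::memory_order_seq_cst);
#endif
}

// Hardware memory barrier: also orders accesses as seen by other CPUs.
inline void hardware_barrier() noexcept
{
  std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Spin-wait hint: on x86 with a GNU-compatible compiler this uses the
// builtin discussed in the thread; elsewhere it does nothing.
inline void cpu_relax() noexcept
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  __builtin_ia32_pause();
#endif
}
```

A spin loop would typically call `cpu_relax()` between polls of an atomic flag; whether an additional compiler or hardware barrier is needed depends on how the flag itself is accessed.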
Re: PATCH: PR target/49142: Invalid 8bit register operand
On Wed, May 25, 2011 at 4:42 PM, H.J. Lu hjl.to...@gmail.com wrote:
> On Wed, May 25, 2011 at 7:00 AM, Uros Bizjak ubiz...@gmail.com wrote:
>> On Tue, May 24, 2011 at 5:54 PM, H.J. Lu hongjiu...@intel.com wrote:
>>> Hi,
>>>
>>> We are working on a new optimization, which turns off TARGET_MOVX.
>>> GCC generates:
>>>
>>> movb %ah, %dil
>>>
>>> But %ah can only be used with %[abcd][hl]. This patch adds
>>> QIreg_operand and uses it in *movqi_extv_1_rex64/*movqi_extzv_2_rex64.
>>> OK for trunk if there is no regression?
>>> and Replace q_regs_operand with QIreg_operand. (
>>
>> If this is the case, then please change q_regs_operand predicate to
>> accept just QI_REG_P registers.
>
> I thought about it. It is a problem only with %[abcd]h. I am not sure if
> changing q_regs_operand to accept just QI_REG_P registers will negatively
> impact

I see. The patch is OK then, but for consistency, please change the
predicate of *movqi_extv_1*movqi_extzv_2 as well. Oh, and the
register_operand check in type calculation can be removed.

Thanks,
Uros.
Re: [SPARC] Disable -fira-share-save-slots by default
On 05/24/11 16:30, Eric Botcazou wrote:
> The new save slot sharing algorithm has a documented limitation:
>
>    Future work:
>
>      In the fallback case we should iterate backwards across all
>      possible modes for the save, choosing the largest available one
>      instead of falling back to the smallest mode immediately.
>      (eg TF -> DF -> SF).

That's not new -- I wrote that circa 1992/1993.  The whole point behind
the changes from ~1992 was to try and use DFmode insns when caller-saving
FP regs on the sparc.  However, I didn't think it was worth the effort to
deal with that case (TF -> DF -> SF).  The code tries to build a
save/restore insn of MOVE_MAX_WORDS size and if that fails, then it drops
back to WORD_SIZE IIRC.

> that is annoying for the SPARC when it comes to floating-point code
> because the floating-point registers are single (SF) but there is
> fully-fledged support for double (DF) arithmetic in the architecture.
> So saving registers on an individual basis really pessimizes here.  For
> example, the size of the object generated for the Ada unit a-nlcefu.ads
> at -O2 decreases from 96080 to 95088 bytes when you pass
> -fno-ira-share-save-slots.

It's the new slot sharing code that doesn't have support for saving
larger hunks.  Having written the original code to handle larger saves
specifically to help sparc, I can certainly understand why the new code
is causing you grief :-)

> Experiments have shown that the impact on integer code is null in terms
> of code size and negligible in terms of stack usage (-fstack-usage
> reports 8/16 bytes increase for most functions).  Therefore this patch
> disables the option by default for the SPARC.
>
> Bootstrapped/regtested on SPARC/Solaris, applied on the mainline and
> 4.6 branch.
>
> Jeff, I'd like to apply it to the 4.5 branch as well, but I need your
> patch:
>
> 2011-01-21  Jeff Law  <l...@redhat.com>
>
>	PR rtl-optimization/41619
>	* caller-save.c (setup_save_areas): Break out code to determine
>	which hard regs are live across calls by examining the reload
>	chains so that it is always used.
>	Eliminate code which checked REG_N_CALLS_CROSSED.
>
> Do you have any objections to me backporting it to the branch?

No objections at all.  I don't believe there were any follow-up patches
and all the change did was make more of the paths through caller-save
consistent in how they determined what needed to be saved.

jeff
Re: RFA PR 48770
On 05/25/11 06:40, Bernd Schmidt wrote:
>> The code in question is literally 20 years old and predates running
>> any real dead code elimination after reload.  ISTM the right thing to
>> do is stop using delete_dead_insn in this code and let the post-reload
>> DCE pass do its job.  That allows us to continue to record the block
>> local equivalence.
>
> Sounds like the right thing to do.  OK.  (Can we eliminate the other
> caller?)

I didn't look too hard at the other call; looking at it now, I think we
can probably safely remove it and just delete the single insn which sets
the eliminable register.  I can either add that to the existing patch or
submit it as a follow-up.  I've got no strong preference on this issue.

> I've looked at code generation; it appears unchanged on i686-linux,
> which I think is the expected result.  There are minor differences in
> assembly output on mips64-linux.  If you want to look at it, I'm
> attaching a testcase - compile with -O2 -fno-reorder-blocks.

I'm a little surprised to hear there is a codegen difference, though I
can envision a variety of ways that could happen.  The undeleted insns
might interfere with the post-reload optimizers which run before dce for
example.  I'll take a quick look.

jeff
Re: New options to disable/enable any pass for any functions (issue4550056)
Ping.  The link to the message:

http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html

Thanks,

David

On Sun, May 22, 2011 at 4:17 PM, Xinliang David Li <davi...@google.com> wrote:
> Ping.
>
> David
>
> On Fri, May 20, 2011 at 9:06 AM, Xinliang David Li <davi...@google.com> wrote:
>> Ok to check in this one?
>>
>> Thanks,
>>
>> David
>>
>> On Wed, May 18, 2011 at 12:30 PM, Joseph S. Myers
>> <jos...@codesourcery.com> wrote:
>>> On Wed, 18 May 2011, David Li wrote:
>>>
>>>> +      error ("Unrecognized option %s", is_enable ? "-fenable" : "-fdisable");
>>>
>>>> +      error ("Unknown pass %s specified in %s",
>>>> +	     phase_name,
>>>> +	     is_enable ? "-fenable" : "-fdisable");
>>>
>>> Follow GNU Coding Standards for diagnostics (start with lowercase
>>> letter).
>>>
>>>> +      inform (UNKNOWN_LOCATION, "%s pass %s for functions in the range of [%u, %u]\n",
>>>> +	      is_enable? "Enable":"Disable", phase_name, new_range->start, new_range->last);
>>>
>>> Use separate calls to inform for the enable and disable cases, so
>>> that full sentences can be extracted for translation.
>>>
>>>> +	  error ("Invalid range %s in option %s",
>>>> +		 one_range,
>>>> +		 is_enable ? "-fenable" : "-fdisable");
>>>
>>> GNU Coding Standards.
>>>
>>>> +	      error ("Invalid range %s in option %s",
>>>
>>> Likewise.
>>>
>>>> +	  inform (UNKNOWN_LOCATION, "%s pass %s for functions in the range of [%u, %u]\n",
>>>> +		  is_enable? "Enable":"Disable", phase_name, new_range->start, new_range->last);
>>>
>>> Again needs GCS and i18n fixes.
>>>
>>> --
>>> Joseph S. Myers
>>> jos...@codesourcery.com
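The review comments above ask for lowercase diagnostics and for separate, complete sentences per case so that translators see whole messages. The sketch below illustrates that shape outside GCC; it is not the revised patch. The function `format_range_note` is hypothetical, `snprintf` stands in for GCC's `inform`, and the `_()` macro is a stand-in for a gettext-style translation marker.

```c
#include <stdio.h>

/* Stand-in for a gettext-style marker; a real build would map this to
   the translation lookup.  Assumption for this sketch only.  */
#define _(msgid) (msgid)

/* Format a note about a pass being enabled or disabled for a range of
   functions.  Each branch carries a complete, lowercase sentence, so a
   translator never sees a message assembled from "%s pass %s".  */
int
format_range_note (char *buf, size_t len, int is_enable,
		   const char *pass, unsigned start, unsigned last)
{
  if (is_enable)
    return snprintf (buf, len,
		     _("enable pass %s for functions in the range of [%u, %u]"),
		     pass, start, last);
  else
    return snprintf (buf, len,
		     _("disable pass %s for functions in the range of [%u, %u]"),
		     pass, start, last);
}
```

The cost is a duplicated format string; the benefit is that word order and grammar can differ per language without touching the code.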
Re: PATCH: PR target/49142: Invalid 8bit register operand
On Wed, May 25, 2011 at 8:30 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Wed, May 25, 2011 at 4:42 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> On Wed, May 25, 2011 at 7:00 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>> On Tue, May 24, 2011 at 5:54 PM, H.J. Lu <hongjiu...@intel.com> wrote:
>>>> Hi,
>>>>
>>>> We are working on a new optimization, which turns off TARGET_MOVX.
>>>> GCC generates:
>>>>
>>>> movb %ah, %dil
>>>>
>>>> But %ah can only be used with %[abcd][hl].  This patch adds
>>>> QIreg_operand and uses it in
>>>> *movqi_extv_1_rex64/*movqi_extzv_2_rex64.  OK for trunk if there is
>>>> no regression?
>>>>
>>>> and Replace q_regs_operand with QIreg_operand. (
>>>
>>> If this is the case, then please change q_regs_operand predicate to
>>> accept just QI_REG_P registers.
>>
>> I thought about it.  It is a problem only with %[abcd]h.  I am not
>> sure if changing q_regs_operand to accept just QI_REG_P registers will
>> negatively impact
>
> I see.  The patch is OK then, but for consistency, please change the
> predicate of *movqi_extv_1 and *movqi_extzv_2 as well.  Oh, and the
> register_operand check in type calculation can be removed.
>
> Thanks,
> Uros.

This is what I checked in.  Thanks.

--
H.J.
---
2011-05-25  H.J. Lu  <hongjiu...@intel.com>

	PR target/49142
	* config/i386/i386.md (*movqi_extv_1_rex64): Remove
	register_operand check and replace q_regs_operand with
	QIreg_operand in type calculation.
	(*movqi_extv_1): Likewise.
	(*movqi_extzv_2_rex64): Likewise.
	(*movqi_extzv_2): Likewise.

	* config/i386/predicates.md (QIreg_operand): New.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 49f1ee7..3b59024 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2487,10 +2487,9 @@
     }
 }
   [(set (attr "type")
-     (if_then_else (and (match_operand:QI 0 "register_operand" "")
-                        (ior (not (match_operand:QI 0 "q_regs_operand" ""))
-                             (ne (symbol_ref "TARGET_MOVX")
-                                 (const_int 0))))
+     (if_then_else (ior (not (match_operand:QI 0 "QIreg_operand" ""))
+                        (ne (symbol_ref "TARGET_MOVX")
+                            (const_int 0)))
         (const_string "imovx")
         (const_string "imov")))
    (set (attr "mode")
@@ -2514,10 +2513,9 @@
     }
 }
   [(set (attr "type")
-     (if_then_else (and (match_operand:QI 0 "register_operand" "")
-                        (ior (not (match_operand:QI 0 "q_regs_operand" ""))
-                             (ne (symbol_ref "TARGET_MOVX")
-                                 (const_int 0))))
+     (if_then_else (ior (not (match_operand:QI 0 "QIreg_operand" ""))
+                        (ne (symbol_ref "TARGET_MOVX")
+                            (const_int 0)))
         (const_string "imovx")
         (const_string "imov")))
    (set (attr "mode")
@@ -2552,7 +2550,7 @@
     }
 }
   [(set (attr "type")
-     (if_then_else (ior (not (match_operand:QI 0 "q_regs_operand" ""))
+     (if_then_else (ior (not (match_operand:QI 0 "QIreg_operand" ""))
                         (ne (symbol_ref "TARGET_MOVX")
                             (const_int 0)))
         (const_string "imovx")
@@ -2579,10 +2577,9 @@
     }
 }
   [(set (attr "type")
-     (if_then_else (and (match_operand:QI 0 "register_operand" "")
-                        (ior (not (match_operand:QI 0 "q_regs_operand" ""))
-                             (ne (symbol_ref "TARGET_MOVX")
-                                 (const_int 0))))
+     (if_then_else (ior (not (match_operand:QI 0 "QIreg_operand" ""))
+                        (ne (symbol_ref "TARGET_MOVX")
+                            (const_int 0)))
         (const_string "imovx")
         (const_string "imov")))
    (set (attr "mode")
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 8a89f70..1471f5a 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -82,6 +82,10 @@
   (and (match_code "reg")
        (match_test "REGNO (op) == FLAGS_REG")))
 
+;; Return true if op is one of QImode registers: %[abcd][hl].
+(define_predicate "QIreg_operand"
+  (match_test "QI_REG_P (op)"))
+
 ;; Return true if op is a QImode register operand other than
 ;; %[abcd][hl].
 (define_predicate "ext_QIreg_operand"
Re: RFA PR 48770
On 05/25/11 06:40, Bernd Schmidt wrote:
> I've looked at code generation; it appears unchanged on i686-linux,
> which I think is the expected result.  There are minor differences in
> assembly output on mips64-linux.  If you want to look at it, I'm
> attaching a testcase - compile with -O2 -fno-reorder-blocks.

I get the same code with and without the patch using a cross compiler.
Can you send me the differing .s files and dumps?

Thanks,
jeff
Re: Prefixes for libgcc symbols (C6X 9.5/11)
On Wed, May 25, 2011 at 6:52 AM, Bernd Schmidt <ber...@codesourcery.com> wrote:
> On 05/25/2011 01:45 PM, H.J. Lu wrote:
>> On Wed, May 25, 2011 at 6:42 AM, Bernd Schmidt <ber...@codesourcery.com> wrote:
>>> On 05/25/2011 01:37 PM, H.J. Lu wrote:
>>>> I think it may have caused:
>>>>
>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49160
>>>
>>> Looks like it.  Not quite sure how to fix it yet.  Do you know what
>>> files such as i386/64/_divtc3.c are trying to achieve?
>>
>> It provides backward compatibility with symbol versioning:
>>
>> [hjl@gnu-4 64]$ readelf -s /lib64/libgcc_s.so.1| grep __powitf2
>>     52: 003e8a80d170   167 FUNC    GLOBAL DEFAULT   12 __powitf2@@GCC_4.3.0
>>     54: 003e8a80d170   167 FUNC    GLOBAL DEFAULT   12 __powitf2@GCC_4.0.0
>> [hjl@gnu-4 64]$
>
> That leaves me as clueless as before.  Why does i386/64 need this but
> not other targets (such as i386/32), and why only those three functions
> (from the ones in libgcc)?
>
> Anyhow, below is one possible way of fixing it.

It fixed the libgcc failure.  Can you check it in?

Thanks.

--
H.J.
Re: Prefixes for libgcc symbols (C6X 9.5/11)
On 05/25/2011 04:38 PM, H.J. Lu wrote:
> On Wed, May 25, 2011 at 6:52 AM, Bernd Schmidt <ber...@codesourcery.com> wrote:
>> Anyhow, below is one possible way of fixing it.
>
> It fixed the libgcc failure.  Can you check it in?

I suppose it is reasonably obvious.  Done.

Bernd
Re: [PATCH PING] unreviewed tree-slimming patches
On Wed, 25 May 2011, Nathan Froyd wrote:

> These patches:
>
>   (C, C++, middle-end)
>   [PATCH 14/18] move TS_STATEMENT_LIST to be a substructure of TS_TYPED
>   http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00560.html
>
>   (C, Java, middle-end)
>   [PATCH 18/18] make TS_BLOCK a substructure of TS_BASE
>   http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00564.html
>
> are still pending review.  Jason commented on the TS_STATEMENT_LIST
> patch, but

The C changes are OK.

--
Joseph S. Myers
jos...@codesourcery.com
Re: New options to disable/enable any pass for any functions (issue4550056)
On Wed, 25 May 2011, Xinliang David Li wrote:

> Ping.  The link to the message:
>
> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html

I don't consider this an option handling patch.  Patches adding whole new
features involving new options should be reviewed by maintainers for the
part of the compiler relevant to those features (since there isn't a pass
manager maintainer, I guess that means middle-end).

--
Joseph S. Myers
jos...@codesourcery.com
Re: PATCH: Add pause intrinsic
On Wed, May 25, 2011 at 9:43 AM, Andrew Haley <a...@redhat.com> wrote:
> On 05/25/2011 04:32 PM, H.J. Lu wrote:
>> On Wed, May 25, 2011 at 8:27 AM, Richard Guenther
>> <richard.guent...@gmail.com> wrote:
>>> On Wed, May 25, 2011 at 5:20 PM, Michael Matz <m...@suse.de> wrote:
>>>> Hi,
>>>>
>>>> On Wed, 25 May 2011, Richard Guenther wrote:
>>>>
>>>>>>> asm volatile ("" : : : "memory") in fact will work as a full
>>>>>>> memory barrier
>>>>>>
>>>>>> How?  You surely need MFENCE or somesuch, unless all you care
>>>>>> about is a compiler barrier.  That's what I think needs to be
>>>>>> clarified.
>>>>>
>>>>> Well, yes, I'm talking about the compiler memory barrier.
>>>>> Something that we conventionally call optimization barrier :)
>>>>
>>>> "memory barrier" has a fixed meaning which we shouldn't use in this
>>>> case, it's confusing.
>>>
>>> Sure ;)
>>>
>>> And to keep the info in a suitable thread: what I'd like to improve
>>> here is to make us disambiguate memory loads/stores against asms that
>>> have no memory outputs/inputs.
>>
>> Please let me know how I should improve the document,
>
> Compiler memory barrier seems to be well-understood.  I suggest
>
> +Generates the @code{pause} machine instruction with a compiler memory
> barrier.
>
> It's clear enough.
>
> Andrew.

I checked in this.

Thanks.

--
H.J.
---
Index: doc/extend.texi
===================================================================
--- doc/extend.texi	(revision 174216)
+++ doc/extend.texi	(working copy)
@@ -8699,7 +8699,8 @@ The following built-in function is alway
 
 @table @code
 @item void __builtin_ia32_pause (void)
-Generates the @code{pause} machine instruction with full memory barrier.
+Generates the @code{pause} machine instruction with a compiler memory
+barrier.
 @end table
 
 The following floating point built-in functions are made available in the
Index: ChangeLog
===================================================================
--- ChangeLog	(revision 174216)
+++ ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2011-05-25  H.J. Lu  <hongjiu...@intel.com>
+
+	* doc/extend.texi (X86 Built-in Functions): Update pause
+	intrinsic.
+
 2011-05-25  Bernd Schmidt  <ber...@codesourcery.com>
 
 	PR bootstrap/49160
[PATCH] Fix VRP switch handling (PR tree-optimization/49161)
Hi!

The following testcase is miscompiled, because there are multiple
CASE_LABELs for the same target bb in a switch:

<bb 2>:
  switch (x_1(D)) <default: <L13>, case 3: l4, case 4: l1, case 6: l3>

l3:
  bar (-1);

l2:
l1:
l4:
  bar (0);

find_switch_asserts sorts by uids of CASE_LABELs and adds x_1(D) == 4 as
well as x_1(D) == 3 assertions on the same edge, instead of adding
properly x_1(D) >= 3 and x_1(D) <= 4 assertions.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk/4.6?

2011-05-25  Jakub Jelinek  <ja...@redhat.com>

	PR tree-optimization/49161
	* tree-vrp.c (struct case_info): New type.
	(compare_case_labels): Sort case_info structs instead of
	trees, and not primarily by CASE_LABEL uids but by
	label_for_block indexes.
	(find_switch_asserts): Put case labels into struct case_info
	array instead of TREE_VEC, adjust sorting, compare label_for_block
	values instead of CASE_LABELs.

	* gcc.c-torture/execute/pr49161.c: New test.

--- gcc/tree-vrp.c.jj	2011-05-20 08:14:08.0 +0200
+++ gcc/tree-vrp.c	2011-05-25 16:03:18.0 +0200
@@ -4673,28 +4673,35 @@ find_conditional_asserts (basic_block bb
   return need_assert;
 }
 
-/* Compare two case labels sorting first by the destination label uid
+struct case_info
+{
+  tree expr;
+  basic_block bb;
+};
+
+/* Compare two case labels sorting first by the destination bb index
    and then by the case value.  */
 
 static int
 compare_case_labels (const void *p1, const void *p2)
 {
-  const_tree const case1 = *(const_tree const*)p1;
-  const_tree const case2 = *(const_tree const*)p2;
-  unsigned int uid1 = DECL_UID (CASE_LABEL (case1));
-  unsigned int uid2 = DECL_UID (CASE_LABEL (case2));
+  const struct case_info *ci1 = (const struct case_info *) p1;
+  const struct case_info *ci2 = (const struct case_info *) p2;
+  int idx1 = ci1->bb->index;
+  int idx2 = ci2->bb->index;
 
-  if (uid1 < uid2)
+  if (idx1 < idx2)
     return -1;
-  else if (uid1 == uid2)
+  else if (idx1 == idx2)
     {
       /* Make sure the default label is first in a group.  */
-      if (!CASE_LOW (case1))
+      if (!CASE_LOW (ci1->expr))
 	return -1;
-      else if (!CASE_LOW (case2))
+      else if (!CASE_LOW (ci2->expr))
 	return 1;
       else
-        return tree_int_cst_compare (CASE_LOW (case1), CASE_LOW (case2));
+	return tree_int_cst_compare (CASE_LOW (ci1->expr),
+				     CASE_LOW (ci2->expr));
     }
   else
     return 1;
@@ -4715,8 +4722,8 @@ find_switch_asserts (basic_block bb, gim
   gimple_stmt_iterator bsi;
   tree op;
   edge e;
-  tree vec2;
-  size_t n = gimple_switch_num_labels(last);
+  struct case_info *ci;
+  size_t n = gimple_switch_num_labels (last);
 #if GCC_VERSION >= 4000
   unsigned int idx;
 #else
@@ -4731,36 +4738,38 @@ find_switch_asserts (basic_block bb, gim
     return false;
 
   /* Build a vector of case labels sorted by destination label.  */
-  vec2 = make_tree_vec (n);
+  ci = XNEWVEC (struct case_info, n);
   for (idx = 0; idx < n; ++idx)
-    TREE_VEC_ELT (vec2, idx) = gimple_switch_label (last, idx);
-  qsort (&TREE_VEC_ELT (vec2, 0), n, sizeof (tree), compare_case_labels);
+    {
+      ci[idx].expr = gimple_switch_label (last, idx);
+      ci[idx].bb = label_to_block (CASE_LABEL (ci[idx].expr));
+    }
+  qsort (ci, n, sizeof (struct case_info), compare_case_labels);
 
   for (idx = 0; idx < n; ++idx)
     {
       tree min, max;
-      tree cl = TREE_VEC_ELT (vec2, idx);
+      tree cl = ci[idx].expr;
+      basic_block cbb = ci[idx].bb;
 
       min = CASE_LOW (cl);
       max = CASE_HIGH (cl);
 
       /* If there are multiple case labels with the same destination
 	 we need to combine them to a single value range for the edge.  */
-      if (idx + 1 < n
-	  && CASE_LABEL (cl) == CASE_LABEL (TREE_VEC_ELT (vec2, idx + 1)))
+      if (idx + 1 < n && cbb == ci[idx + 1].bb)
 	{
 	  /* Skip labels until the last of the group.  */
 	  do {
 	    ++idx;
-	  } while (idx < n
-		   && CASE_LABEL (cl) == CASE_LABEL (TREE_VEC_ELT (vec2, idx)));
+	  } while (idx < n && cbb == ci[idx].bb);
 	  --idx;
 
 	  /* Pick up the maximum of the case label range.  */
-	  if (CASE_HIGH (TREE_VEC_ELT (vec2, idx)))
-	    max = CASE_HIGH (TREE_VEC_ELT (vec2, idx));
+	  if (CASE_HIGH (ci[idx].expr))
+	    max = CASE_HIGH (ci[idx].expr);
 	  else
-	    max = CASE_LOW (TREE_VEC_ELT (vec2, idx));
+	    max = CASE_LOW (ci[idx].expr);
 	}
 
       /* Nothing to do if the range includes the default label until we
@@ -4769,7 +4778,7 @@ find_switch_asserts (basic_block bb, gim
 	continue;
 
       /* Find the edge to register the assert expr on.  */
-      e = find_edge (bb, label_to_block (CASE_LABEL (cl)));
+      e = find_edge (bb, cbb);
 
       /* Register the necessary assertions for the operand in the
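For readers without the testsuite at hand, the following is a rough C sketch of source that could lower to a switch of the shape shown in the message, with several distinct labels falling through to one basic block. It is a guess at the pattern, not the committed gcc.c-torture/execute/pr49161.c; the recording helper `bar`, the log arrays, and the function name `foo` are assumptions made for this illustration, and the unused label l2 from the dump is left out.

```c
/* Hypothetical recorder so the behaviour can be checked.  */
int bar_log[8];
int bar_count;

void
bar (int v)
{
  if (bar_count < 8)
    bar_log[bar_count++] = v;
}

/* Labels l1 and l4 are different LABEL_DECLs but start the same basic
   block, so "case 3" and "case 4" share one edge.  The range for x on
   that edge is 3 <= x <= 4, not x == 3 and x == 4 at once.  */
void
foo (int x)
{
  switch (x)
    {
    case 3: goto l4;
    case 4: goto l1;
    case 6: goto l3;
    default: return;
    }
l3:
  bar (-1);
l1:
l4:
  bar (0);
}
```

With a correct compiler, `foo (3)` and `foo (4)` each record a single 0, `foo (6)` records -1 then 0, and any other argument records nothing.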
Re: PATCH: Add pause intrinsic
On Wed, May 25, 2011 at 10:19 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
> --
> H.J.
> ---
> Index: doc/extend.texi
> ===================================================================
> --- doc/extend.texi     (revision 174216)
> +++ doc/extend.texi     (working copy)
> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>  @table @code
>  @item void __builtin_ia32_pause (void)
> -Generates the @code{pause} machine instruction with full memory barrier.
> +Generates the @code{pause} machine instruction with a compiler memory
> +barrier.

What does the pause machine instruction do?  How is it different from a
normal nop?  Also, pause to me means it waits for input or an interrupt.

Thanks,
Andrew Pinski
Re: PATCH: Add pause intrinsic
On 05/25/2011 06:26 PM, Andrew Pinski wrote:
> On Wed, May 25, 2011 at 10:19 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> --
>> H.J.
>> ---
>> Index: doc/extend.texi
>> ===================================================================
>> --- doc/extend.texi     (revision 174216)
>> +++ doc/extend.texi     (working copy)
>> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>>  @table @code
>>  @item void __builtin_ia32_pause (void)
>> -Generates the @code{pause} machine instruction with full memory barrier.
>> +Generates the @code{pause} machine instruction with a compiler memory
>> +barrier.
>
> What is the pause machine instruction do?

That's documented by Intel in the architecture manual.  Surely we don't
have to explain it all.

Andrew.


PAUSE - Spin Loop Hint

Improves the performance of spin-wait loops.  When executing a "spin-wait
loop," a Pentium 4 or Intel Xeon processor suffers a severe performance
penalty when exiting the loop because it detects a possible memory order
violation.  The PAUSE instruction provides a hint to the processor that
the code sequence is a spin-wait loop.  The processor uses this hint to
avoid the memory order violation in most situations, which greatly
improves processor performance.  For this reason, it is recommended that
a PAUSE instruction be placed in all spin-wait loops.

An additional function of the PAUSE instruction is to reduce the power
consumed by a Pentium 4 processor while executing a spin loop.  The
Pentium 4 processor can execute a spin-wait loop extremely quickly,
causing the processor to consume a lot of power while it waits for the
resource it is spinning on to become available.  Inserting a pause
instruction in a spin-wait loop greatly reduces the processor's power
consumption.

This instruction was introduced in the Pentium 4 processors, but is
backward compatible with all IA-32 processors.  In earlier IA-32
processors, the PAUSE instruction operates like a NOP instruction.  The
Pentium 4 and Intel Xeon processors implement the PAUSE instruction as a
pre-defined delay.  The delay is finite and can be zero for some
processors.  This instruction does not change the architectural state of
the processor (that is, it performs essentially a delaying no-op
operation).

This instruction's operation is the same in non-64-bit modes and 64-bit
mode.
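As a usage illustration for the intrinsic discussed in this thread, here is a minimal spin-wait sketch. It is an assumption-laden example rather than anything from the patch: the macro `cpu_relax` and the function `spin_wait` are hypothetical names, and on non-x86 targets the sketch falls back to a plain compiler barrier because `__builtin_ia32_pause` is x86-only.

```c
/* Hint to the CPU that we are in a spin-wait loop.  On x86 this uses the
   pause builtin; elsewhere it degrades to a compiler-only barrier.  */
#if defined(__i386__) || defined(__x86_64__)
# define cpu_relax() __builtin_ia32_pause ()
#else
# define cpu_relax() __asm__ __volatile__ ("" : : : "memory")
#endif

/* Spin until *flag becomes nonzero or max_spins iterations have elapsed.
   Returns the number of iterations actually spent spinning.  */
unsigned
spin_wait (volatile int *flag, unsigned max_spins)
{
  unsigned spins = 0;

  while (!*flag && spins < max_spins)
    {
      cpu_relax ();
      spins++;
    }
  return spins;
}
```

The bound on the loop is only there to keep the example terminating; a real lock would typically spin for a while and then block or yield.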
Re: More fixes from static analysis checkers
On Thu, Mar 24, 2011 at 03:52:57PM -0600, Jeff Law wrote:
> We had a variety of functions which would fail to call va_end prior to
> returning.  I'm not aware of a host where this could cause a problem,
> but it's easy enough to fix and keeps the checkers quiet.

In case of def_fn_type, this added a second va_end if the function
doesn't fail.  This patch removes the first va_end, bootstrapped/regtested
on x86_64-linux and i686-linux, committed as obvious.

2011-05-25  Jakub Jelinek  <ja...@redhat.com>

	* c-common.c (def_fn_type): Remove extra va_end.

	* gcc-interface/utils.c (def_fn_type): Remove extra va_end.

--- gcc/c-family/c-common.c.jj	2011-05-24 23:34:16.0 +0200
+++ gcc/c-family/c-common.c	2011-05-25 16:50:57.0 +0200
@@ -4451,7 +4451,6 @@ def_fn_type (builtin_type def, builtin_t
 	goto egress;
       args[i] = t;
     }
-  va_end (list);
 
   t = builtin_types[ret];
   if (t == error_mark_node)
--- gcc/ada/gcc-interface/utils.c.jj	2011-05-11 19:38:55.0 +0200
+++ gcc/ada/gcc-interface/utils.c	2011-05-25 16:52:00.0 +0200
@@ -4965,7 +4965,6 @@ def_fn_type (builtin_type def, builtin_t
 	goto egress;
       args[i] = t;
     }
-  va_end (list);
 
   t = builtin_types[ret];
   if (t == error_mark_node)

	Jakub
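The pattern the fix restores, one va_start paired with exactly one va_end on every path, can be sketched in a standalone function. This is an illustration only: `sum_nonneg` is a hypothetical helper, and only the single-exit `egress` label mirrors the structure of def_fn_type.

```c
#include <stdarg.h>

/* Sum COUNT int arguments; return -1 as soon as a negative one is seen.
   Both the early-out and the normal path leave through "egress", so
   va_end runs exactly once per va_start.  */
int
sum_nonneg (int count, ...)
{
  va_list list;
  int i, total = 0;

  va_start (list, count);
  for (i = 0; i < count; i++)
    {
      int v = va_arg (list, int);
      if (v < 0)
	{
	  total = -1;
	  goto egress;
	}
      total += v;
    }

 egress:
  va_end (list);
  return total;
}
```

Placing a second va_end right after the loop, as the earlier change did in def_fn_type, would make the non-failing path call it twice, which is what the static checker flagged.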
Re: [patch, ARM] Fix PR42017, LR not used in leaf functions
On 2011/5/20 07:46 PM, Chung-Lin Tang wrote:

On 2011/5/20 07:41 PM, Ramana Radhakrishnan wrote:

On 17/05/11 14:10, Chung-Lin Tang wrote:

On 2011/5/13 04:26 PM, Richard Sandiford wrote:

Richard Sandiford <richard.sandif...@linaro.org> writes:

Chung-Lin Tang <clt...@codesourcery.com> writes:

My fix here simply adds 'reload_completed' as an additional condition for
EPILOGUE_USES to return true for LR_REGNUM.  I think this should be
valid, as correct LR save/restoring is handled by the epilogue/prologue
code; it should be safe for IRA to treat it as a normal call-used
register.

FWIW, epilogue_completed might be a more accurate choice.

I still stand by this, although I realise no other target does it.

Did a re-test of the patch just to be sure, as expected the test results
were also clear.  Attached is the updated patch.

Can you specify what you tested with this patch ?

Native bootstrap success, plus C/C++ and libstdc++ tests.  IIRC I also
saw one or two FAIL->PASS in the results too (forgot specific testcases)

So, it's interesting to note that the use of this was changed in 2007 by
zadeck as a part of the df merge.  I can't find the patch trail beyond
this on the lists.

http://gcc.gnu.org/viewvc/branches/dataflow-branch/gcc/config/arm/arm.h?r1=120281&r2=121501

It might be better to understand why this was done in the first place for
the ARM port as part of the Dataflow bring up and why folks wanted to
make this unconditional.

Digging through the repository, this is my explanation, FWIW:

1) The gen_prologue_uses() of LR were added back in Dec.2000 (r38467),
when ce2 was still the if-convert-after-reload pass, placed after
prologue-epilogue construction.  (hence the arm_expand_prologue() comment
about preventing ce2 using LR)

2) if-conversion after combine was added in Oct.2002 (r58547), which
became the new ce2 (pre-reload); ifcvt after reload became ce3.  The
comments in arm_expand_prologue() were not updated.

3) dataflow-branch work was circa 2007.  RTL-ifcvt seemed to be updated
during this time, hence removal of the LR-uses in arm_expand_prologue()
seems reasonable.  My guess here: ce2 was mistaken to be
ifcvt-after-combine (rather than the originally intended
ifcvt-after-reload, now ce3) by the comments; considering the
arm_expand_prologue() bits were updated, the comments may have been read
seriously.

4) Since ce2 was a pre-reload pass by then, the unconditionalizing of
EPILOGUE_USES was probably intended to be a supplemental change, to
support removing those gen_prologue_use()s.

I hope this is a reasonable explanation, but do note a lot of this is
guessing :)

I tried taking the last version of the dataflow-branch (circa 4.3) and
did cross-test run compares of EPILOGUE_USES with and without the
reload_completed conditionalization.  The C testsuite results were clean.

The LR-not-used symptoms seem triggered by this EPILOGUE_USES change
since then.  As the PR42017 submitter lists the affected GCC versions,
this regression has been present since post-4.2.

Given the above explanation, and considering that the tests on current
trunk are okay, plus we're in stage1 right now, is this
re-conditionalizing EPILOGUE_USES change okay to commit?

Thanks,
Chung-Lin
Re: Go patch committed: Update to current Go library
Ian Lance Taylor <i...@google.com> writes:

> I just committed a patch to godump.c which I think should fix this
> issue.  Let me know if it doesn't.

There are several issues now:

* While I get

// var ___iob [59+1]___FILE

  now, there's still

var __lastbuf *_FILE

  left, with commented

// type _FILE struct { _cnt int32; _ptr *uint8; _base *uint8; _flag uint8; _file uint8; __orientation INVALID-bit-field; __ionolock INVALID-bit-field; __seekable INVALID-bit-field; __extendedfd INVALID-bit-field; __xf_nocheck INVALID-bit-field; __filler INVALID-bit-field; }

  as before.

* The amd64 sysinfo.go contains several undefined types:

sysinfo.go:2886:53: error: use of undefined type '_fpchip_state'
sysinfo.go:2886:40: error: struct field type is incomplete
sysinfo.go:2886:53: error: use of undefined type '_fpchip_state'
sysinfo.go:2887:47: error: struct field type is incomplete
sysinfo.go:2892:32: error: use of undefined type '_fxsave_state'
sysinfo.go:2892:24: error: struct field type is incomplete

type _fpu struct { fp_reg_set struct { fpchip_state _fpchip_state; }; }
type _fpregset_t struct { fp_reg_set struct { fpchip_state _fpchip_state; }; }
type __kfpu_u struct { kfpu_fx _fxsave_state; }
type _kfpu_t struct { kfpu_u __kfpu_u; kfpu_status uint32; kfpu_xstatus uint32; }

  Both types are commented, due to the use of commented _upad128_t:

// type _upad128_t struct { _q INVALID-float-80; }
// type _fpchip_state struct { cw uint16; sw uint16; fctw uint8; __fx_rsvd uint8; fop uint16; rip uint64; rdp uint64; mxcsr uint32; mxcsr_mask uint32; st [7+1]struct { fpr_16 [4+1]uint16; }; xmm [15+1]_upad128_t; __fx_ign2 [5+1]_upad128_t; status uint32; xstatus uint32; }
// type _fxsave_state struct { fx_fcw uint16; fx_fsw uint16; fx_fctw uint16; fx_fop uint16; fx_rip uint64; fx_rdp uint64; fx_mxcsr uint32; fx_mxcsr_mask uint32; fx_st [7+1]struct { fpr_16 [4+1]uint16; }; fx_xmm [15+1]_upad128_t; __fx_ign2 [5+1]_upad128_t; }

  Unfortunately, this has a ripple effect and I need to omit several
  type declarations to make the problem go away:

+  grep -v '^type _fpu' | \
+  grep -v '^type _fpregset_t' | \
+  grep -v '^type _mcontext_t' | \
+  grep -v '^type _ucontext' | \
+  grep -v '^type __kfpu_u' | \
+  grep -v '^type _kfpu_t' | \

  sys/types.h has

typedef union {
	long double	_q;
	uint32_t	_l[4];
} upad128_t;

  I already have to provide a _upad128_t replacement for other uses, but
  it would really help to support this directly.

With those types and __lastbuf omitted from sysinfo.go, I can
successfully bootstrap on i386-pc-solaris2.1[01].  On Solaris 11/x86, the
libgo results are clean; on Solaris 10/x86 there are still 37 failures
for the amd64 multilib which I still need to debug.

Thanks.
        Rainer

--
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH] More pow(x,c) expansions in cse_sincos pass (PR46728, patch 3)
This patch adds logic to gimple_expand_builtin_pow () to optimize
pow(x,y), where y is one of 0.5, 0.25, 0.75, 1./3., or 1./6.  I noticed
that there were two missing calls to gimple_set_location () in my
previous patch, so I've corrected those here as well.

There's one TODO comment in this patch.  I don't believe the test for
TREE_SIDE_EFFECTS (arg0) should be necessary; but I'm not convinced it
was necessary in the code whence I copied it, either, so I left it in for
comment in case I'm misunderstanding something.

2011-05-25  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

	PR tree-optimization/46728
	* tree-ssa-math-opts.c (powi_as_mults_1): Add gimple_set_location.
	(powi_as_mults): Add gimple_set_location.
	(build_and_insert_call): New.
	(gimple_expand_builtin_pow): Add handling for pow(x,y) when y is
	0.5, 0.25, 0.75, 1./3., or 1./6.

Index: gcc/tree-ssa-math-opts.c
===================================================================
--- gcc/tree-ssa-math-opts.c	(revision 174199)
+++ gcc/tree-ssa-math-opts.c	(working copy)
@@ -965,6 +965,7 @@ powi_as_mults_1 (gimple_stmt_iterator *gsi, locati
     }
 
   mult_stmt = gimple_build_assign_with_ops (MULT_EXPR, ssa_target, op0, op1);
+  gimple_set_location (mult_stmt, loc);
   gsi_insert_before (gsi, mult_stmt, GSI_SAME_STMT);
 
   return ssa_target;
@@ -999,6 +1000,7 @@ powi_as_mults (gimple_stmt_iterator *gsi, location
   div_stmt = gimple_build_assign_with_ops (RDIV_EXPR, target,
 					   build_real (type, dconst1),
 					   result);
+  gimple_set_location (div_stmt, loc);
   gsi_insert_before (gsi, div_stmt, GSI_SAME_STMT);
 
   return target;
@@ -1024,6 +1026,34 @@ gimple_expand_builtin_powi (gimple_stmt_iterator *
   return NULL_TREE;
 }
 
+/* Build a gimple call statement that calls FN with argument ARG.
+   Set the lhs of the call statement to a fresh SSA name for
+   variable VAR.  If VAR is NULL, first allocate it.  Insert the
+   statement prior to GSI's current position, and return the fresh
+   SSA name.  */
+
+static tree
+build_and_insert_call (gimple_stmt_iterator *gsi, tree fn, tree arg,
+		       tree *var, location_t loc)
+{
+  gimple call_stmt;
+  tree ssa_target;
+
+  if (!*var)
+    {
+      *var = create_tmp_var (TREE_TYPE (arg), "powroot");
+      add_referenced_var (*var);
+    }
+
+  call_stmt = gimple_build_call (fn, 1, arg);
+  ssa_target = make_ssa_name (*var, NULL);
+  gimple_set_lhs (call_stmt, ssa_target);
+  gimple_set_location (call_stmt, loc);
+  gsi_insert_before (gsi, call_stmt, GSI_SAME_STMT);
+
+  return ssa_target;
+}
+
 /* ARG0 and ARG1 are the two arguments to a pow builtin call in GSI
    with location info LOC.  If possible, create an equivalent and
    less expensive sequence of statements prior to GSI, and return an
@@ -1035,6 +1065,8 @@ gimple_expand_builtin_pow (gimple_stmt_iterator *g
 {
   REAL_VALUE_TYPE c, cint;
   HOST_WIDE_INT n;
+  tree type, sqrtfn, target = NULL_TREE;
+  enum machine_mode mode;
 
   /* If the exponent isn't a constant, there's nothing of interest
      to be done.  */
@@ -1054,6 +1086,108 @@ gimple_expand_builtin_pow (gimple_stmt_iterator *g
 	      && powi_cost (n) <= POWI_MAX_MULTS)))
     return gimple_expand_builtin_powi (gsi, loc, arg0, n);
 
+  /* Attempt various optimizations using sqrt and cbrt.  */
+  type = TREE_TYPE (arg0);
+  mode = TYPE_MODE (type);
+  sqrtfn = mathfn_built_in (type, BUILT_IN_SQRT);
+
+  if (flag_unsafe_math_optimizations && sqrtfn != NULL_TREE)
+    {
+      REAL_VALUE_TYPE dconst1_4, dconst3_4;
+      tree cbrtfn;
+      bool hw_sqrt_exists;
+
+      /* Optimize pow(x,0.5) = sqrt(x).  */
+      if (REAL_VALUES_EQUAL (c, dconsthalf))
+	return build_and_insert_call (gsi, sqrtfn, arg0, &target, loc);
+
+      /* Optimize pow(x,0.25) = sqrt(sqrt(x)).  */
+      dconst1_4 = dconst1;
+      SET_REAL_EXP (&dconst1_4, REAL_EXP (&dconst1_4) - 2);
+      hw_sqrt_exists = optab_handler(sqrt_optab, mode) != CODE_FOR_nothing;
+
+      if (REAL_VALUES_EQUAL (c, dconst1_4) && hw_sqrt_exists)
+	{
+	  tree sqrt_arg0;
+
+	  /* sqrt(x)  */
+	  sqrt_arg0 = build_and_insert_call (gsi, sqrtfn, arg0, &target, loc);
+
+	  /* sqrt(sqrt(x))  */
+	  return build_and_insert_call (gsi, sqrtfn, sqrt_arg0, &target, loc);
+	}
+
+      /* Optimize pow(x,0.75) = sqrt(x) * sqrt(sqrt(x)).  */
+      real_from_integer (&dconst3_4, VOIDmode, 3, 0, 0);
+      SET_REAL_EXP (&dconst3_4, REAL_EXP (&dconst3_4) - 2);
+
+      if (optimize_function_for_speed_p (cfun)
+	  && !TREE_SIDE_EFFECTS (arg0) /* TODO: is this needed? */
+	  && REAL_VALUES_EQUAL (c, dconst3_4)
+	  && hw_sqrt_exists)
+	{
+	  tree sqrt_arg0, sqrt_sqrt, ssa_target;
+	  gimple mult_stmt;
+
+	  /* sqrt(x)  */
+	  sqrt_arg0 = build_and_insert_call (gsi, sqrtfn, arg0, &target, loc);
+
+	  /* sqrt(sqrt(x))
[pph] Reformat (issue4515140)
In pph_stream_read_tree and pph_stream_write_tree, reformat for style. This step was skipped in the last patch to make diffs more sensible. Index: gcc/cp/ChangeLog.pph 2011-05-25 Lawrence Crowl cr...@google.com * pph-streamer-in.c (pph_stream_read_tree): Reformat for style. * pph-streamer-out.c (pph_stream_write_tree): Reformat for style. Index: gcc/cp/pph-streamer-in.c === --- gcc/cp/pph-streamer-in.c(revision 174166) +++ gcc/cp/pph-streamer-in.c(working copy) @@ -820,39 +820,33 @@ pph_stream_read_tree (struct lto_input_b case PARM_DECL: case USING_DECL: case VAR_DECL: - { /* FIXME pph: Should we merge DECL_INITIAL into lang_specific? */ DECL_INITIAL (expr) = pph_input_tree (stream); - pph_stream_read_lang_specific (stream, expr); + pph_stream_read_lang_specific (stream, expr); break; -} case FUNCTION_DECL: -{ DECL_INITIAL (expr) = pph_input_tree (stream); pph_stream_read_lang_specific (stream, expr); - DECL_SAVED_TREE (expr) = pph_input_tree (stream); + DECL_SAVED_TREE (expr) = pph_input_tree (stream); break; - } case TYPE_DECL: -{ DECL_INITIAL (expr) = pph_input_tree (stream); pph_stream_read_lang_specific (stream, expr); - DECL_ORIGINAL_TYPE (expr) = pph_input_tree (stream); + DECL_ORIGINAL_TYPE (expr) = pph_input_tree (stream); break; -} case STATEMENT_LIST: -{ - HOST_WIDE_INT i, num_trees = pph_input_uint (stream); - for (i = 0; i num_trees; i++) - { - tree stmt = pph_input_tree (stream); - append_to_statement_list (stmt, expr); - } + { +HOST_WIDE_INT i, num_trees = pph_input_uint (stream); +for (i = 0; i num_trees; i++) + { + tree stmt = pph_input_tree (stream); + append_to_statement_list (stmt, expr); + } + } break; -} case ARRAY_TYPE: case BOOLEAN_TYPE: @@ -870,62 +864,48 @@ pph_stream_read_tree (struct lto_input_b case REFERENCE_TYPE: case VECTOR_TYPE: case VOID_TYPE: -{ pph_stream_read_lang_type (stream, expr); break; -} case QUAL_UNION_TYPE: case RECORD_TYPE: case UNION_TYPE: -{ - pph_stream_read_lang_type (stream, expr); -{ - TYPE_BINFO (expr) = 
pph_input_tree (stream); -} + pph_stream_read_lang_type (stream, expr); + TYPE_BINFO (expr) = pph_input_tree (stream); break; -} case OVERLOAD: -{ OVL_FUNCTION (expr) = pph_input_tree (stream); break; -} case IDENTIFIER_NODE: -{ - struct lang_identifier *id = LANG_IDENTIFIER_CAST (expr); - id-namespace_bindings = pph_stream_read_cxx_binding (stream); - id-bindings = pph_stream_read_cxx_binding (stream); - id-class_template_info = pph_input_tree (stream); - id-label_value = pph_input_tree (stream); + { +struct lang_identifier *id = LANG_IDENTIFIER_CAST (expr); +id-namespace_bindings = pph_stream_read_cxx_binding (stream); +id-bindings = pph_stream_read_cxx_binding (stream); +id-class_template_info = pph_input_tree (stream); +id-label_value = pph_input_tree (stream); + } break; -} case BASELINK: -{ BASELINK_BINFO (expr) = pph_input_tree (stream); BASELINK_FUNCTIONS (expr) = pph_input_tree (stream); BASELINK_ACCESS_BINFO (expr) = pph_input_tree (stream); break; -} case TEMPLATE_DECL: -{ DECL_INITIAL (expr) = pph_input_tree (stream); pph_stream_read_lang_specific (stream, expr); DECL_TEMPLATE_RESULT (expr) = pph_input_tree (stream); DECL_TEMPLATE_PARMS (expr) = pph_input_tree (stream); DECL_CONTEXT (expr) = pph_input_tree (stream); break; -} case TEMPLATE_INFO: -{ TI_TYPEDEFS_NEEDING_ACCESS_CHECKING (expr) = pph_stream_read_qual_use_vec (stream); break; -} case TREE_LIST: case TREE_BINFO: Index: gcc/cp/pph-streamer-out.c === --- gcc/cp/pph-streamer-out.c (revision 174166) +++ gcc/cp/pph-streamer-out.c (working copy) @@ -821,45 +821,39 @@ pph_stream_write_tree (struct output_blo case PARM_DECL: case USING_DECL: case VAR_DECL: - { /* FIXME pph: Should we merge DECL_INITIAL into lang_specific? 
*/ pph_output_tree_or_ref_1 (stream, DECL_INITIAL (expr), ref_p, 3); - pph_stream_write_lang_specific (stream, expr, ref_p); + pph_stream_write_lang_specific (stream, expr, ref_p); break; -} case FUNCTION_DECL: -{ pph_output_tree_or_ref_1 (stream, DECL_INITIAL (expr), ref_p, 3); pph_stream_write_lang_specific (stream, expr, ref_p); -
[v3] Small tweak to std::random_device
Hi,

committed to mainline.

Thanks,
Paolo.

2011-05-25  Paolo Carlini  paolo.carl...@oracle.com

	* include/bits/random.h (random_device::min, max): Specify constexpr.

Index: include/bits/random.h
===
--- include/bits/random.h (revision 174216)
+++ include/bits/random.h (working copy)
@@ -1544,12 +1544,12 @@
 #endif

-    result_type
-    min() const
+    static constexpr result_type
+    min()
     { return std::numeric_limits<result_type>::min(); }

-    result_type
-    max() const
+    static constexpr result_type
+    max()
     { return std::numeric_limits<result_type>::max(); }

     double
Re: [testsuite] remove XFAIL for all but ia64 for g++.dg/tree-ssa/pr43411.C
On May 25, 2011, at 1:38 AM, Rainer Orth wrote:
> Janis Johnson jani...@codesourcery.com writes:
> > Archived test results for 4.7.0 for most processors with C++ results have:
> >
> > XPASS: g++.dg/tree-ssa/pr43411.C scan-tree-dump-not optimized OBJ_TYPE_REF
> >
> > The only failures I could find were for ia64-linux and ia64-hpux. This
> > patch changes the xfail so it only applies to ia64-*-*. OK for trunk?
>
> Richard rejected a similar patch:

I see the two issues as orthogonal. One issue is to have an accurate expectation for the actual testcase on actual targets. The other is to modify the testcase to test something else. While one can use the XPASS as a way of keeping track of the issue of improving the testcase, I'd rather approve the fix to the expected state, and have people who want to track the other issue use a PR for that instead of the XPASS.

I think it would be nice to go even farther: set the expected state of all testcases on 6 platforms at the time of release to expected, file PRs for all failures (any unexpected result) so marked, and actually gate the release on no unexpected results.
Re: [PATCH PING] unreviewed tree-slimming patches
Nathan == Nathan Froyd froy...@codesourcery.com writes:

Nathan> (C, Java, middle-end)
Nathan> [PATCH 18/18] make TS_BLOCK a substructure of TS_BASE
Nathan> http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00564.html

The Java parts are ok.

I think these sorts of changes should be obvious once approved from a middle-end perspective, at least assuming that there are no regressions. I say this because I think that once the core change has been decided on, there is often just one way to go about fixing up the users; at least, in a case like this where the consequence amounts to deleting assignments.

I mentioned this idea before but I didn't see any discussion of it. I am happy to continue looking at patches like this if that is what the more active maintainers would prefer.

Tom
Re: [PATCH PING] unreviewed tree-slimming patches
On 05/25/2011 02:06 PM, Tom Tromey wrote:
> Nathan == Nathan Froyd froy...@codesourcery.com writes:
>
> Nathan> (C, Java, middle-end)
> Nathan> [PATCH 18/18] make TS_BLOCK a substructure of TS_BASE
> Nathan> http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00564.html
>
> The Java parts are ok.
>
> I think these sorts of changes should be obvious once approved from a
> middle-end perspective, at least assuming that there are no regressions.
>
> I mentioned this idea before but I didn't see any discussion of it. I
> am happy to continue looking at patches like this if that is what the
> more active maintainers would prefer.

I think Jason mentioned considering them approved after waiting a week. If we want to enshrine that as policy, I think that'd be reasonable. All in favor...?

-Nathan
Patch for libobjc/38307
I committed to trunk this libobjc patch by Richard Frith-Macdonald and David Ayers. The patch fixes some rare (but serious) problems with +initialize in multithreading programs. It's complicated and I refer to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38307 for more information. I approved the patch, and committed it to trunk (as Richard doesn't have write access). Richard/David, thanks a lot for your help! :-) Index: sendmsg.c === --- sendmsg.c (revision 174219) +++ sendmsg.c (working copy) @@ -41,6 +41,7 @@ see the files COPYING3 and COPYING.RUNTIME respect #include objc/thr.h #include objc-private/module-abi-8.h #include objc-private/runtime.h +#include objc-private/hash.h #include objc-private/sarray.h #include objc-private/selector.h /* For sel_is_mapped() */ #include runtime-info.h @@ -75,10 +76,14 @@ IMP (*__objc_msg_forward2) (id, SEL) = NULL; /* Send +initialize to class. */ static void __objc_send_initialize (Class); -static void __objc_install_dispatch_table_for_class (Class); +/* Forward declare some functions */ +static void __objc_install_dtable_for_class (Class cls); +static void __objc_prepare_dtable_for_class (Class cls); +static void __objc_install_prepared_dtable_for_class (Class cls); -/* Forward declare some functions. */ -static void __objc_init_install_dtable (id, SEL); +static struct sarray *__objc_prepared_dtable_for_class (Class cls); +static IMP __objc_get_prepared_imp (Class cls,SEL sel); + /* Various forwarding functions that are used based upon the return type for the selector. 
@@ -117,7 +122,7 @@ __objc_get_forward_imp (id rcv, SEL sel) { IMP result; if ((result = __objc_msg_forward (sel)) != NULL) -return result; + return result; } /* In all other cases, use the default forwarding functions built @@ -210,7 +215,7 @@ __objc_resolve_instance_method (Class class, SEL s { objc_mutex_lock (__objc_runtime_mutex); if (class-class_pointer-dtable == __objc_uninstalled_dtable) - __objc_install_dispatch_table_for_class (class-class_pointer); + __objc_install_dtable_for_class (class-class_pointer); objc_mutex_unlock (__objc_runtime_mutex); } resolveMethodIMP = sarray_get_safe (class-class_pointer-dtable, @@ -231,8 +236,94 @@ __objc_resolve_instance_method (Class class, SEL s return NULL; } -/* Given a class and selector, return the selector's - implementation. */ +/* Given a CLASS and selector, return the implementation corresponding + to the method of the selector. + + If CLASS is a class, the instance method is returned. + If CLASS is a meta class, the class method is returned. + + Since this requires the dispatch table to be installed, this function + will implicitly invoke +initialize for CLASS if it hasn't been + invoked yet. This also insures that +initialize has been invoked + when the returned implementation is called directly. + + The forwarding hooks require the receiver as an argument (if they are to + perform dynamic lookup in proxy objects etc), so this function has a + receiver argument to be used with those hooks. */ +static inline +IMP +get_implementation (id receiver, Class class, SEL sel) +{ + void *res; + + if (class-dtable == __objc_uninstalled_dtable) +{ + /* The dispatch table needs to be installed. */ + objc_mutex_lock (__objc_runtime_mutex); + + /* Double-checked locking pattern: Check +__objc_uninstalled_dtable again in case another thread +installed the dtable while we were waiting for the lock +to be released. 
*/ + if (class-dtable == __objc_uninstalled_dtable) + { + __objc_install_dtable_for_class (class); + } + + /* If the dispatch table is not yet installed, + we are still in the process of executing +initialize. + But the implementation pointer should be available + in the prepared ispatch table if it exists at all. */ + if (class-dtable == __objc_uninstalled_dtable) + { + assert (__objc_prepared_dtable_for_class (class) != 0); + res = __objc_get_prepared_imp (class, sel); + } + else + { + res = 0; + } + objc_mutex_unlock (__objc_runtime_mutex); + /* Call ourselves with the installed dispatch table and get +the real method. */ + if (!res) + res = get_implementation (receiver, class, sel); +} + else +{ + /* The dispatch table has been installed. */ + res = sarray_get_safe (class-dtable, (size_t) sel-sel_id); + if (res == 0) + { + /* The dispatch table has been installed, and the method +is not in the dispatch table. So the method just +doesn't exist for the class. */ + + /* Try going through the +resolveClassMethod: or ++resolveInstanceMethod:
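The comment in get_implementation above names the double-checked locking pattern. A minimal stand-alone sketch of that pattern follows, assuming POSIX threads and C11 atomics; it is not the libobjc code, and struct table, get_table, runtime_mutex and install_count are hypothetical names used only for illustration. The sketch also leaves out what makes the real patch complicated, namely the prepared dispatch table that is consulted while +initialize is still running.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a class's dispatch table.  */
struct table
{
  int n_methods;
};

static struct table the_table;
/* NULL plays the role of __objc_uninstalled_dtable.  */
static _Atomic (struct table *) installed_table = NULL;
static pthread_mutex_t runtime_mutex = PTHREAD_MUTEX_INITIALIZER;
/* Counts how often the installation step actually ran.  */
static int install_count = 0;

static struct table *
get_table (void)
{
  /* First check, without the lock: the common fast path.  */
  struct table *t = atomic_load_explicit (&installed_table,
                                          memory_order_acquire);
  if (t == NULL)
    {
      pthread_mutex_lock (&runtime_mutex);

      /* Second check, with the lock held: another thread may have
         installed the table while we were waiting for the lock.  */
      t = atomic_load_explicit (&installed_table, memory_order_relaxed);
      if (t == NULL)
        {
          the_table.n_methods = 3;
          install_count++;
          t = &the_table;
          /* Publish only after the table is fully set up.  */
          atomic_store_explicit (&installed_table, t,
                                 memory_order_release);
        }

      pthread_mutex_unlock (&runtime_mutex);
    }
  return t;
}
```

Calling get_table () any number of times, from any number of threads, should run the installation step once; the acquire/release pair is what makes the unlocked first check safe in C11 terms.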
[PATCH, rs6000] Tidy up dumping of register/memory move cost
The following fixes a problem when dumping register costs, where the incorrect 'from' value was being written out because the code modified the incoming parameter value. It also changes things so that register/memory costs are only dumped on the outermost call, eliminating intermediate output when a cost calculation requires going through memory or GPRs.

Bootstrap/regtest on powerpc64-linux with no new regressions. Ok for trunk?

-Pat

2011-05-25  Pat Haugen  pthau...@us.ibm.com

	* config/rs6000/rs6000.c (rs6000_register_move_cost): Preserve from
	parameter value for dump. Dump cost on outermost call only.
	(rs6000_memory_move_cost): Dump cost on outermost call only.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c (revision 174138)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -189,6 +189,8 @@ enum reg_class rs6000_regno_regclass[FIR
 /* Reload functions based on the type and the vector unit.  */
 static enum insn_code rs6000_vector_reload[NUM_MACHINE_MODES][2];
 
+static int dbg_cost_ctrl;
+
 /* Built in types.  */
 tree rs6000_builtin_types[RS6000_BTI_MAX];
 tree rs6000_builtin_decls[RS6000_BUILTIN_COUNT];
@@ -26428,26 +26430,31 @@ rs6000_register_move_cost (enum machine_
 {
   int ret;
 
+  if (TARGET_DEBUG_COST)
+    dbg_cost_ctrl++;
+
   /* Moves from/to GENERAL_REGS.  */
   if (reg_classes_intersect_p (to, GENERAL_REGS)
       || reg_classes_intersect_p (from, GENERAL_REGS))
     {
+      reg_class_t rclass = from;
+
       if (! reg_classes_intersect_p (to, GENERAL_REGS))
-        from = to;
+        rclass = to;
 
-      if (from == FLOAT_REGS || from == ALTIVEC_REGS || from == VSX_REGS)
-        ret = (rs6000_memory_move_cost (mode, from, false)
+      if (rclass == FLOAT_REGS || rclass == ALTIVEC_REGS || rclass == VSX_REGS)
+        ret = (rs6000_memory_move_cost (mode, rclass, false)
                + rs6000_memory_move_cost (mode, GENERAL_REGS, false));
 
       /* It's more expensive to move CR_REGS than CR0_REGS because of the
          shift.  */
-      else if (from == CR_REGS)
+      else if (rclass == CR_REGS)
         ret = 4;
 
       /* Power6 has slower LR/CTR moves so make them more expensive than
          memory in order to bias spills to memory .*/
       else if (rs6000_cpu == PROCESSOR_POWER6
-               && reg_classes_intersect_p (from, LINK_OR_CTR_REGS))
+               && reg_classes_intersect_p (rclass, LINK_OR_CTR_REGS))
         ret = 6 * hard_regno_nregs[0][mode];
 
       else
@@ -26471,10 +26478,14 @@ rs6000_register_move_cost (enum machine_
            + rs6000_register_move_cost (mode, from, GENERAL_REGS));
 
   if (TARGET_DEBUG_COST)
-    fprintf (stderr,
-             "rs6000_register_move_cost:, ret=%d, mode=%s, from=%s, to=%s\n",
-             ret, GET_MODE_NAME (mode), reg_class_names[from],
-             reg_class_names[to]);
+    {
+      if (dbg_cost_ctrl == 1)
+        fprintf (stderr,
+                 "rs6000_register_move_cost:, ret=%d, mode=%s, from=%s, to=%s\n",
+                 ret, GET_MODE_NAME (mode), reg_class_names[from],
+                 reg_class_names[to]);
+      dbg_cost_ctrl--;
+    }
 
   return ret;
 }
@@ -26488,6 +26499,9 @@ rs6000_memory_move_cost (enum machine_mo
 {
   int ret;
 
+  if (TARGET_DEBUG_COST)
+    dbg_cost_ctrl++;
+
   if (reg_classes_intersect_p (rclass, GENERAL_REGS))
     ret = 4 * hard_regno_nregs[0][mode];
   else if (reg_classes_intersect_p (rclass, FLOAT_REGS))
@@ -26498,9 +26512,13 @@ rs6000_memory_move_cost (enum machine_mo
     ret = 4 + rs6000_register_move_cost (mode, rclass, GENERAL_REGS);
 
   if (TARGET_DEBUG_COST)
-    fprintf (stderr,
-             "rs6000_memory_move_cost: ret=%d, mode=%s, rclass=%s, in=%d\n",
-             ret, GET_MODE_NAME (mode), reg_class_names[rclass], in);
+    {
+      if (dbg_cost_ctrl == 1)
+        fprintf (stderr,
+                 "rs6000_memory_move_cost: ret=%d, mode=%s, rclass=%s, in=%d\n",
+                 ret, GET_MODE_NAME (mode), reg_class_names[rclass], in);
+      dbg_cost_ctrl--;
+    }
 
   return ret;
 }
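The dbg_cost_ctrl counter in the rs6000 patch above is a nesting-depth guard: each cost function bumps it on entry, prints only when the depth is 1, and drops it again on exit. A toy sketch of the same idea, with made-up cost rules and hypothetical names (debug_enabled, dbg_depth, lines_printed, register_cost, memory_cost) standing in for the rs6000 ones:

```c
#include <stdio.h>

static int debug_enabled = 1;  /* stands in for TARGET_DEBUG_COST */
static int dbg_depth;          /* same role as dbg_cost_ctrl */
static int lines_printed;      /* test aid: number of dump lines emitted */

static int memory_cost (int rclass);

static int
register_cost (int from, int to)
{
  int ret;

  if (debug_enabled)
    dbg_depth++;

  /* Toy rule: a move between different classes goes through memory,
     which calls back into the other cost function.  */
  if (from != to)
    ret = memory_cost (from) + memory_cost (to);
  else
    ret = 2;

  if (debug_enabled)
    {
      /* Only the outermost call reports; FROM and TO are untouched,
         so the dump shows the caller's values.  */
      if (dbg_depth == 1)
        {
          fprintf (stderr, "register_cost: ret=%d, from=%d, to=%d\n",
                   ret, from, to);
          lines_printed++;
        }
      dbg_depth--;
    }

  return ret;
}

static int
memory_cost (int rclass)
{
  int ret;

  if (debug_enabled)
    dbg_depth++;

  ret = 4 + rclass;

  if (debug_enabled)
    {
      if (dbg_depth == 1)
        {
          fprintf (stderr, "memory_cost: ret=%d, rclass=%d\n", ret, rclass);
          lines_printed++;
        }
      dbg_depth--;
    }

  return ret;
}
```

With this guard, register_cost (0, 1) emits a single line even though it calls memory_cost twice internally, while a direct call to memory_cost still reports as before.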
Re: PATCH: PR target/49142: Invalid 8bit register operand
On Wed, May 25, 2011 at 6:20 PM, H.J. Lu hjl.to...@gmail.com wrote: We are working on a new optimization, which turns off TARGET_MOVX. GCC generates: movb %ah, %dil But %ah can only be used with %[abcd][hl]. This patch adds QIreg_operand and uses it in *movqi_extv_1_rex64/*movqi_extzv_2_rex64. OK for trunk if there is no regression? and Replace q_regs_operand with QIreg_operand. ( If this is the case, then please change q_regs_operand predicate to accept just QI_REG_P registers. I thought about it. It is a problem only with %[abcd]h. I am not sure if changing q_regs_operand to accept just QI_REG_P registers will negatively impact I see. The patch is OK then, but for consistency, please change the predicate of *movqi_extv_1*movqi_extzv_2 as well. Oh, and the register_operand check in type calculation can be removed. Thanks, Uros. This is what I checked in. Thanks. -- H.J. --- 2011-05-25 H.J. Lu hongjiu...@intel.com PR target/49142 * config/i386/i386.md (*movqi_extv_1_rex64): Remove register_operand check and replace q_regs_operand with QIreg_operand in type calculation. (*movqi_extv_1): Likewise. (*movqi_extzv_2_rex64): Likewise. (*movqi_extzv_2): Likewise. Er, I didn't mean to remove register_operand check from 32bit patterns... there, operand 0 can also be memory operand due to nonimmediate_operand constraint. Uros.
Re: PATCH: Add pause intrinsic
On Wed, 25 May 2011 11:26:51 +0100 Andrew Haley a...@redhat.com wrote:

> On 05/24/2011 07:28 PM, H.J. Lu wrote:
> > This patch implements pause intrinsic suggested by Andi. OK for trunk?
>
> What does full memory barrier here mean?
>
> > +@table @code
> > +@item void __builtin_ia32_pause (void)
> > +Generates the @code{pause} machine instruction with full memory barrier.
> > +@end table
>
> There's a memory clobber, but no barrier instruction AFAICS. The doc
> needs to explain it a bit better.

Perhaps the doc might explain why it is necessary to have a builtin for two independent roles: first, the full compiler memory barrier (which probably means to spill all the registers on the stack - definitely a task for a compiler); second, to pause the processor (which might also mean to flush or invalidate some data caches).

In particular, I would naively imagine that we might have a more generic builtin for the compiler memory barrier (which probably could be independent of the particular ia32 target), and in that case why can't we just implement the pause ia32 builtin as

    builtin_compiler_barrier(); asm ("pause")?

I find the above documentation too short and (being a non native English speaker) I would prefer it to be much longer. I am not able to suggest better phrasing (because I still do not entirely understand why that builtin_ia32_pause is useful or needed).

And if there was a builtin_compiler_barrier () I would believe it can have a lot of other uses. Any generated C code which wants some introspection or some garbage collection write barrier might want it too! [perhaps even I might find later such thing useful in C code generated by MELT]

Regards.
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***
Re: Create common hooks structure shared between driver and cc1
Here is a revised version of my patch http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01779.html to create the common hooks structure. Tested in the same way as the original patch. OK to commit? In the course of working on moving hooks to the new structure, I found that every target architecture except for moxie has at least one of the hooks that will move. I didn't want to create 35 separate makefile rules with manually maintained dependencies to build the associated new source files for those hooks (with, probably, many new t-* files to contain those rules and dependencies). So this patch makes $arch-common.o follow a similar approach to $arch.o: have a single common_out_file and common_out_object_file instead of the previous more general extra_common_objs, and have a shared makefile rule, with a standard set of dependencies, in gcc/Makefile.in. (This means the patch no longer touches pa/t-pa, which is again an unused file along with i386/t-crtpic and i386/t-svr3dbx; note that $arch/t-$arch is *only* automatically used if config.gcc leaves tmake_file completely empty, otherwise config.gcc needs to add such a file explicitly to tmake_file to cause it to be used.) There are actually two separate things I wanted to avoid replicating 35-fold: manually maintained rules, and manually maintained dependencies. For the latter, I looked again at Tom's reverted patch from March 2008 to use automatic dependency generation. 
Although there is now a fixed GNU make release (since last July), and although I'd like to see automatic dependency generation go in (I hope Paolo may follow up on it as per http://gcc.gnu.org/ml/gcc-patches/2008-03/msg01721.html), actually getting it working looks like a potential rathole and I couldn't figure out from the 2008 thread what the makefile feature was that caused problems with a GNU make bug and whether it would be possible to disable use of that feature (instead having stupid dependencies of all objects on all headers) when using older versions of make (which would I think be desirable, to avoid forcing everyone to upgrade make immediately). For the former (avoided in this patch by having one new rule instead of 35), as far as I can see the main reason extra rules are involved for target-specific files is just to build them directly in the gcc/ object directory, instead of with a .o name matching the path to the source directory ($objdir/config/i386/i386.o, etc.). So most of them could probably be eliminated (leaving only dependencies) by putting all objects in subdirectories (and making configure.ac create all those directories). Some involve extra compiler options, which the dependencies patch dealt with by using GNU make variable settings applying to individual makefile targets, and I suppose it might be possible to use that feature separately from automatic dependency generation - though the way those settings apply to dependencies of the target being built could cause problems, so maybe we should instead put $($@-CPPFLAGS) or similar in CPPFLAGS - though if it were that simple to avoid the replication of compilation commands, I'd expect this to be done already. 2011-05-25 Joseph Myers jos...@codesourcery.com * common/common-target-def.h, common/common-target.def, common/common-target.h, common/config/default-common.c, common/config/pa/pa-common.c: New files. 
* Makefile.in (common_out_file, common_out_object_file, COMMON_TARGET_H, COMMON_TARGET_DEF_H): New. (OBJS-libcommon-target): Include $(common_out_object_file). (prefix.o): Update dependencies. ($(common_out_object_file), common/common-target-hooks-def.h, s-common-target-hooks-def-h): New. (s-tm-texi): Also check timestamp on common-target.def. (build/genhooks.o): Update dependencies. * config.gcc (common_out_file, target_has_targetm_common): Define. * config/pa/som.h (ALWAYS_STRIP_DOTDOT): Replace with TARGET_ALWAYS_STRIP_DOTDOT. * configure.ac (common_out_object_file): Define. (common_out_file, common_out_object_file): Substitute. (common): Create directory. * configure: Regenerate. * doc/tm.texi.in (targetm_common): Document. (TARGET_ALWAYS_STRIP_DOTDOT): Add @hook entry. * doc/tm.texi: Regenerate. * genhooks.c (hook_array): Also include common/common-target.def. * prefix.c (tm.h): Don't include. (common/common-target.h): Include. (ALWAYS_STRIP_DOTDOT): Don't define. (update_path): Use targetm_common.always_strip_dotdot instead of ALWAYS_STRIP_DOTDOT. * system.h (ALWAYS_STRIP_DOTDOT): Poison. Index: gcc/doc/tm.texi === --- gcc/doc/tm.texi (revision 174109) +++ gcc/doc/tm.texi (working copy) @@ -99,6 +99,16 @@ initializer @code{TARGETCM_INITIALIZER} themselves, they should set @code{target_has_targetcm=yes} in
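The ChangeLog above replaces the ALWAYS_STRIP_DOTDOT macro with a targetm_common.always_strip_dotdot hook. The general macro-to-hook conversion can be sketched in a few lines of C. This is only an illustration under assumptions: struct common_hooks, COMMON_HOOKS_INITIALIZER, hooks and should_strip_dotdot are hypothetical names, and the ".." test is a placeholder rather than the real logic of update_path in prefix.c.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, cut-down hooks structure shared by the driver and cc1.
   Only the one hook named in the ChangeLog is modelled.  */
struct common_hooks
{
  bool always_strip_dotdot;
};

/* Default value; a target's common file would define the macro
   differently before the initializer is expanded.  */
#ifndef TARGET_ALWAYS_STRIP_DOTDOT
#define TARGET_ALWAYS_STRIP_DOTDOT false
#endif

#define COMMON_HOOKS_INITIALIZER { TARGET_ALWAYS_STRIP_DOTDOT }

static struct common_hooks hooks = COMMON_HOOKS_INITIALIZER;

/* Shared code tests the hook at run time instead of using #ifdef on a
   target macro, so it no longer needs the target headers.  The ".."
   check is a placeholder for the real path handling.  */
static bool
should_strip_dotdot (const char *path)
{
  return hooks.always_strip_dotdot && strstr (path, "..") != NULL;
}
```

The point of the design is that shared code such as prefix.c only sees the structure, which is why the patch can drop the tm.h include there and poison the old macro in system.h.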
Re: [PATCH][4.6] detect C++ errors to fix 2288 and 18770
On Sun, May 22, 2011 at 03:25:41PM -0700, H.J. Lu wrote: FWIW, I tried Janis's patch on 4.6 branch and I got /export/gnu/import/git/gcc/gcc/testsuite/g++.dg/parse/pr18770.C: In function 'void e1()':^M /export/gnu/import/git/gcc/gcc/testsuite/g++.dg/parse/pr18770.C:29:11: error: redeclaration of 'int k'^M /export/gnu/import/git/gcc/gcc/testsuite/g++.dg/parse/pr18770.C:27:12: error: 'int k' previously declared here^M /export/gnu/import/git/gcc/gcc/testsuite/g++.dg/parse/pr18770.C: In function 'void e4()':^M /export/gnu/import/git/gcc/gcc/testsuite/g++.dg/parse/pr18770.C:63:11: error: redeclaration of 'int i'^M /export/gnu/import/git/gcc/gcc/testsuite/g++.dg/parse/pr18770.C:61:14: error: 'int i' previously declared here^M FAIL: g++.dg/parse/pr18770.C prev (test for errors, line 14) FAIL: g++.dg/parse/pr18770.C redecl (test for errors, line 17) PASS: g++.dg/parse/pr18770.C prev (test for errors, line 27) PASS: g++.dg/parse/pr18770.C redecl (test for errors, line 29) FAIL: g++.dg/parse/pr18770.C prev (test for errors, line 37) FAIL: g++.dg/parse/pr18770.C redecl (test for errors, line 39) FAIL: g++.dg/parse/pr18770.C prev (test for errors, line 47) FAIL: g++.dg/parse/pr18770.C redecl (test for errors, line 53) PASS: g++.dg/parse/pr18770.C prev (test for errors, line 61) PASS: g++.dg/parse/pr18770.C redecl (test for errors, line 63) FAIL: g++.dg/parse/pr18770.C prev (test for errors, line 71) FAIL: g++.dg/parse/pr18770.C redecl (test for errors, line 73) PASS: g++.dg/parse/pr18770.C (test for excess errors) /export/gnu/import/git/gcc/gcc/testsuite/g++.old-deja/g++.jason/cond.C: In function 'int main()':^M /export/gnu/import/git/gcc/gcc/testsuite/g++.old-deja/g++.jason/cond.C:22:11: error: redeclaration of 'int i'^M /export/gnu/import/git/gcc/gcc/testsuite/g++.old-deja/g++.jason/cond.C:20:14: error: 'int i' previously declared here^M /export/gnu/import/git/gcc/gcc/testsuite/g++.old-deja/g++.jason/cond.C:27:11: error: redeclaration of 'int i'^M 
/export/gnu/import/git/gcc/gcc/testsuite/g++.old-deja/g++.jason/cond.C:25:14: error: 'int i' previously declared here^M /export/gnu/import/git/gcc/gcc/testsuite/g++.old-deja/g++.jason/cond.C:36:16: error: types may not be defined in conditions^M /export/gnu/import/git/gcc/gcc/testsuite/g++.old-deja/g++.jason/cond.C:39:3: error: 'A' was not declared in this scope^M /export/gnu/import/git/gcc/gcc/testsuite/g++.old-deja/g++.jason/cond.C:39:5: error: expected ';' before 'bar'^M /export/gnu/import/git/gcc/gcc/testsuite/g++.old-deja/g++.jason/cond.C:42:12: error: types may not be defined in conditions^M /export/gnu/import/git/gcc/gcc/testsuite/g++.old-deja/g++.jason/cond.C:42:40: error: 'one' was not declared in this scope^M /export/gnu/import/git/gcc/gcc/testsuite/g++.old-deja/g++.jason/cond.C:51:14: warning: declaration of 'int f()' has 'extern' and is initialized [enabled by default]^M /export/gnu/import/git/gcc/gcc/testsuite/g++.old-deja/g++.jason/cond.C:51:18: error: function 'int f()' is initialized like a variable^M /export/gnu/import/git/gcc/gcc/testsuite/g++.old-deja/g++.jason/cond.C:55:23: error: extended initializer lists only available with -std=c++0x or -std=gnu++0x^M FAIL: g++.old-deja/g++.jason/cond.C (test for errors, line 9) FAIL: g++.old-deja/g++.jason/cond.C (test for errors, line 11) FAIL: g++.old-deja/g++.jason/cond.C (test for errors, line 16) PASS: g++.old-deja/g++.jason/cond.C (test for errors, line 20) PASS: g++.old-deja/g++.jason/cond.C (test for errors, line 22) PASS: g++.old-deja/g++.jason/cond.C (test for errors, line 25) PASS: g++.old-deja/g++.jason/cond.C (test for errors, line 27) FAIL: g++.old-deja/g++.jason/cond.C (test for errors, line 30) FAIL: g++.old-deja/g++.jason/cond.C (test for errors, line 33) PASS: g++.old-deja/g++.jason/cond.C (test for errors, line 36) PASS: g++.old-deja/g++.jason/cond.C decl (test for errors, line 39) PASS: g++.old-deja/g++.jason/cond.C exp (test for errors, line 39) PASS: g++.old-deja/g++.jason/cond.C def 
(test for errors, line 42)
PASS: g++.old-deja/g++.jason/cond.C expected (test for errors, line 42)
PASS: g++.old-deja/g++.jason/cond.C extern (test for warnings, line 51)

The patch no longer catches all problems.

The patch just requires some shuffling of logic to catch issues now; below is a version that works for me on the trunk. This new checking does require modifying g++.dg/cpp0x/range-for5.C. The new logic of the patch claims that:

  int i;
  for (int i : a)
    {
      int i;
    }

is incorrect (the innermost `i' is an erroneous redeclaration). If you apply the expansion of range-based for loops from [stmt.ranged]p1, you'd get something like:

  for (...; ...; ...)
    {
      int i = ...;
      int i;
    }

which is bad. I believe [basic.scope.local]p4 says much the same thing.

Tested with g++ testsuite on x86_64-unknown-linux-gnu; tests in progress for libstdc++. OK to commit?

-Nathan

gcc/cp/
2011-xx-xx Janis
Re: PATCH: PR target/49142: Invalid 8bit register operand
On Wed, May 25, 2011 at 12:11 PM, Uros Bizjak ubiz...@gmail.com wrote: On Wed, May 25, 2011 at 6:20 PM, H.J. Lu hjl.to...@gmail.com wrote: We are working on a new optimization, which turns off TARGET_MOVX. GCC generates: movb %ah, %dil But %ah can only be used with %[abcd][hl]. This patch adds QIreg_operand and uses it in *movqi_extv_1_rex64/*movqi_extzv_2_rex64. OK for trunk if there is no regression? and Replace q_regs_operand with QIreg_operand. ( If this is the case, then please change q_regs_operand predicate to accept just QI_REG_P registers. I thought about it. It is a problem only with %[abcd]h. I am not sure if changing q_regs_operand to accept just QI_REG_P registers will negatively impact I see. The patch is OK then, but for consistency, please change the predicate of *movqi_extv_1*movqi_extzv_2 as well. Oh, and the register_operand check in type calculation can be removed. Thanks, Uros. This is what I checked in. Thanks. -- H.J. --- 2011-05-25 H.J. Lu hongjiu...@intel.com PR target/49142 * config/i386/i386.md (*movqi_extv_1_rex64): Remove register_operand check and replace q_regs_operand with QIreg_operand in type calculation. (*movqi_extv_1): Likewise. (*movqi_extzv_2_rex64): Likewise. (*movqi_extzv_2): Likewise. Er, I didn't mean to remove register_operand check from 32bit patterns... there, operand 0 can also be memory operand due to nonimmediate_operand constraint. Ooops. I am checking in this. Thanks. -- H.J. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index ed1834f..1afef8e 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,11 @@ 2011-05-25 H.J. Lu hongjiu...@intel.com + * config/i386/i386.md (*movqi_extv_1)): Put back + register_operand check in type calculation. + (*movqi_extzv_2): Likewise. + +2011-05-25 H.J. Lu hongjiu...@intel.com + * doc/extend.texi (X86 Built-in Functions): Update pause intrinsic. 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 1cdbe7e..13a1cde 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -2514,9 +2514,10 @@ } } [(set (attr type) - (if_then_else (ior (not (match_operand:QI 0 QIreg_operand )) - (ne (symbol_ref TARGET_MOVX) - (const_int 0))) + (if_then_else (and (match_operand:QI 0 register_operand ) + (ior (not (match_operand:QI 0 QIreg_operand )) +(ne (symbol_ref TARGET_MOVX) +(const_int 0 (const_string imovx) (const_string imov))) (set (attr mode) @@ -2578,9 +2579,10 @@ } } [(set (attr type) - (if_then_else (ior (not (match_operand:QI 0 QIreg_operand )) - (ne (symbol_ref TARGET_MOVX) - (const_int 0))) + (if_then_else (and (match_operand:QI 0 register_operand ) + (ior (not (match_operand:QI 0 QIreg_operand )) +(ne (symbol_ref TARGET_MOVX) +(const_int 0 (const_string imovx) (const_string imov))) (set (attr mode)
Re: PATCH: Add pause intrinsic
On Wed, May 25, 2011 at 12:17 PM, Basile Starynkevitch bas...@starynkevitch.net wrote:
On Wed, 25 May 2011 11:26:51 +0100 Andrew Haley a...@redhat.com wrote:
On 05/24/2011 07:28 PM, H.J. Lu wrote:

This patch implements pause intrinsic suggested by Andi. OK for trunk?

What does full memory barrier here mean?

+@table @code
+@item void __builtin_ia32_pause (void)
+Generates the @code{pause} machine instruction with full memory barrier.
+@end table

There's a memory clobber, but no barrier instruction AFAICS. The doc needs to explain it a bit better.

Perhaps the doc might explain why it is necessary to have a builtin for two independent roles: first, the full compiler memory barrier (which probably means to spill all the registers on the stack - definitely a task for a compiler); second, to pause the processor (which might also mean to flush or invalidate some data caches).

In particular, I would naively imagine that we might have a more generic builtin for the compiler memory barrier (which probably could be independent of the particular ia32 target), and in that case why can't we just implement the pause ia32 builtin as

builtin_compiler_barrier(); asm ("pause")?

We may need

builtin_compiler_barrier(); asm ("pause"); builtin_compiler_barrier();

-- H.J.
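For readers following the barrier-versus-pause question, here is a minimal sketch of the two roles being discussed. It is not the implementation of __builtin_ia32_pause; the helper names compiler_barrier, cpu_pause and spin_until_set are made up for illustration, and the code assumes a GNU-compatible compiler (GCC or Clang) that accepts extended asm. The "memory" clobber only constrains the compiler; the pause instruction itself is a spin-wait hint, not a hardware fence.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: a compiler-only barrier. The empty asm emits no
// instruction; the "memory" clobber tells the compiler that memory may have
// changed, so it must not keep memory values cached in registers or move
// memory accesses across this point.
static inline void compiler_barrier() {
  __asm__ __volatile__("" ::: "memory");
}

// Hypothetical helper: the x86 spin-wait hint. Putting the "memory" clobber
// on the same asm statement makes it act as a compiler barrier as well.
static inline void cpu_pause() {
#if defined(__i386__) || defined(__x86_64__)
  __asm__ __volatile__("pause" ::: "memory");
#else
  compiler_barrier();  // assumption: no pause equivalent on other targets
#endif
}

// Spin until the flag is set; returns how many times we had to pause.
static inline unsigned spin_until_set(const std::atomic<bool>& flag) {
  unsigned spins = 0;
  while (!flag.load(std::memory_order_acquire)) {
    cpu_pause();
    ++spins;
  }
  return spins;
}
```

With the clobber attached to the asm statement itself, the compiler treats that one statement as the point memory accesses may not be moved across, which is one way to read the "barrier; pause; barrier" suggestion.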
C++ PATCH for c++/46696 (error with defaulted op= and arrays)
Another case where we now need to check DECL_DEFAULTED_FN rather than DECL_ARTIFICIAL.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 3ac89bd9f5f81b4d3ff293b337e7e9163d3402dd
Author: Jason Merrill ja...@redhat.com
Date:   Wed May 25 12:05:03 2011 -0400

    PR c++/46696
    * typeck.c (cp_build_modify_expr): Check DECL_DEFAULTED_FN.

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 69b25d3..5fbb765 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -6748,7 +6748,7 @@ cp_build_modify_expr (tree lhs, enum tree_code modifycode, tree rhs,

       /* Allow array assignment in compiler-generated code.  */
       else if (!current_function_decl
-               || !DECL_ARTIFICIAL (current_function_decl))
+               || !DECL_DEFAULTED_FN (current_function_decl))
         {
           /* This routine is used for both initialization and assignment.
              Make sure the diagnostic message differentiates the context.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/defaulted29.C b/gcc/testsuite/g++.dg/cpp0x/defaulted29.C
new file mode 100644
index 000..5fcf5b0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/defaulted29.C
@@ -0,0 +1,20 @@
+// PR c++/46696
+// { dg-options "-std=c++0x" }
+
+struct A
+{
+  A& operator= (A const&);
+};
+
+struct B
+{
+  A ar[1];
+  B& operator= (B const&) = default;
+};
+
+int main()
+{
+  B x;
+  B y;
+  y = x;
+}
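As an illustration of what the testcase exercises (a sketch, not part of the patch): an explicitly defaulted copy assignment operator has to assign array members element by element, which is the "array assignment in compiler-generated code" the comment in cp_build_modify_expr refers to. The counter member and the function copy_and_count below are hypothetical additions so the effect can be observed; C++11 or later is assumed.

```cpp
#include <cassert>

struct A {
  int v = 0;
  int copies = 0;  // hypothetical counter: how often this object was assigned to
  A& operator=(A const& other) {
    v = other.v;
    ++copies;
    return *this;
  }
};

struct B {
  A ar[2];
  // Explicitly defaulted: the compiler generates memberwise assignment,
  // which for the array member means one A::operator= call per element.
  B& operator=(B const&) = default;
};

// Assigns one B to another and reports how many element assignments ran.
int copy_and_count() {
  B x;
  x.ar[0].v = 1;
  x.ar[1].v = 2;
  B y;
  y = x;
  return y.ar[0].copies + y.ar[1].copies;
}
```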
C++ PATCH for c++/47184 (list-initialized temporary in parenthesized initializer)
cp_parser_parameter_declaration is clever enough to tell that when we see

Type1 id(Type2

if the next token doesn't indicate a cast, we're dealing with a function declarator. But it was only checking for '('; now it needs to check for '{' as well.

After making that fix, I needed to change cp_parser_direct_declarator to not assume that we successfully parsed a parameter list until we see the closing ')'.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 365eff32e0004b7e3ac0794a2fbb5d6585f4b4d7
Author: Jason Merrill ja...@redhat.com
Date:   Wed May 25 11:44:48 2011 -0400

    PR c++/47184
    * parser.c (cp_parser_parameter_declaration): Recognize
    list-initialization.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index db2cb96..004ff05 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -14901,6 +14901,9 @@ cp_parser_direct_declarator (cp_parser* parser,
                   parser->num_template_parameter_lists
                     = saved_num_template_parameter_lists;

+                  /* Consume the `)'.  */
+                  cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN);
+
                   /* If all went well, parse the cv-qualifier-seq and the
                      exception-specification.  */
                   if (member_p || cp_parser_parse_definitely (parser))
@@ -14915,8 +14918,6 @@ cp_parser_direct_declarator (cp_parser* parser,
                       if (ctor_dtor_or_conv_p)
                         *ctor_dtor_or_conv_p = *ctor_dtor_or_conv_p < 0;
                       first = false;
-                      /* Consume the `)'.  */
-                      cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN);

                       /* Parse the cv-qualifier-seq.  */
                       cv_quals = cp_parser_cv_qualifier_seq_opt (parser);
@@ -16053,6 +16054,7 @@ cp_parser_parameter_declaration (cp_parser *parser,
              of some object of type "char" to "int".  */
           && !parser->in_type_id_in_expr_p
           && cp_parser_uncommitted_to_tentative_parse_p (parser)
+          && cp_lexer_next_token_is_not (parser->lexer, CPP_OPEN_BRACE)
           && cp_lexer_next_token_is_not (parser->lexer, CPP_OPEN_PAREN))
         cp_parser_commit_to_tentative_parse (parser);
       /* Parse the declarator.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist51.C b/gcc/testsuite/g++.dg/cpp0x/initlist51.C
new file mode 100644
index 000..9163dd3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist51.C
@@ -0,0 +1,15 @@
+// PR c++/47184
+// { dg-options "-std=c++0x" }
+
+struct S
+{
+  int a;
+};
+struct T
+{
+  T(S s) {}
+};
+int main()
+{
+  T t(S{1});
+}
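To make the ambiguity concrete, here is a small sketch modelled on the testcase. The member seen and the function list_init_temporary are hypothetical additions so that the result can be checked; C++11 or later is assumed. With the fix, "T t(S{1});" is parsed as an object definition whose initializer is a list-initialized temporary, rather than being committed to as the start of a parameter declaration.

```cpp
#include <cassert>

struct S {
  int a;
};

struct T {
  int seen;  // hypothetical member recording what the constructor received
  T(S s) : seen(s.a) {}
};

int list_init_temporary() {
  // After "T t(S" the parser may not yet commit to a parameter declaration:
  // '(' could start a functional cast, and '{' starts a list-initialized
  // temporary. Either way this line defines an object, not a function.
  T t(S{1});
  return t.seen;
}
```

By contrast, "T u(S());" is still a function declaration under the usual disambiguation rule, which is why the parser has to look at the token following the type name.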
C++ PATCHes for c++/46245 and c++/46145 (auto issues)
In 46245, we were complaining too soon about an auto parameter; we need to wait until after we splice in a late-specified return type.

In 46145, we were failing to complain about an auto typedef.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 0ca632627d749d168b602675ca48df9e88a1eac5
Author: Jason Merrill ja...@redhat.com
Date:   Wed May 25 13:03:13 2011 -0400

    PR c++/46145
    * decl.c (grokdeclarator): Complain about auto typedef.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 68dc999..db52184 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -9503,6 +9503,12 @@ grokdeclarator (const cp_declarator *declarator,
           memfn_quals = TYPE_UNQUALIFIED;
         }

+      if (type_uses_auto (type))
+        {
+          error ("typedef declared %<auto%>");
+          type = error_mark_node;
+        }
+
       if (decl_context == FIELD)
         decl = build_lang_decl (TYPE_DECL, unqualified_id, type);
       else
diff --git a/gcc/testsuite/g++.dg/cpp0x/auto9.C b/gcc/testsuite/g++.dg/cpp0x/auto9.C
index 142ef90..190bfa6 100644
--- a/gcc/testsuite/g++.dg/cpp0x/auto9.C
+++ b/gcc/testsuite/g++.dg/cpp0x/auto9.C
@@ -119,3 +119,6 @@ H<auto> h; // { dg-error "invalid" }

 void qq (auto); // { dg-error "auto" }
 void qr (auto*); // { dg-error "auto" }
+
+// PR c++/46145
+typedef auto autot; // { dg-error "auto" }

commit 2ab4982d07fd89b0a7bc42868aa655173a132af7
Author: Jason Merrill ja...@redhat.com
Date:   Wed May 25 12:22:13 2011 -0400

    PR c++/46245
    * decl.c (grokdeclarator): Complain later for auto parameter.
    * pt.c (splice_late_return_type): Handle use in a template
    type-parameter.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index cc09c1d..68dc999 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -8763,12 +8763,6 @@ grokdeclarator (const cp_declarator *declarator,
           || thread_p)
         error ("storage class specifiers invalid in parameter declarations");

-      if (type_uses_auto (type))
-        {
-          error ("parameter declared %<auto%>");
-          type = error_mark_node;
-        }
-
       /* Function parameters cannot be constexpr.  If we saw one, moan
          and pretend it wasn't there.  */
       if (constexpr_p)
@@ -9749,6 +9743,12 @@ grokdeclarator (const cp_declarator *declarator,
         if (ctype || in_namespace)
           error ("cannot use %<::%> in parameter declaration");

+        if (type_uses_auto (type))
+          {
+            error ("parameter declared %<auto%>");
+            type = error_mark_node;
+          }
+
         /* A parameter declared as an array of T is really a pointer to T.
            One declared as a function is really a pointer to a function.
            One declared as a member is really a pointer to member.  */
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index bb4515b..c3c759e 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -19315,7 +19315,12 @@ splice_late_return_type (tree type, tree late_return_type)
     return type;
   argvec = make_tree_vec (1);
   TREE_VEC_ELT (argvec, 0) = late_return_type;
-  if (processing_template_decl)
+  if (processing_template_parmlist)
+    /* For a late-specified return type in a template type-parameter, we
+       need to add a dummy argument level for its parmlist.  */
+    argvec = add_to_template_args
+      (make_tree_vec (processing_template_parmlist), argvec);
+  if (current_template_parms)
     argvec = add_to_template_args (current_template_args (), argvec);
   return tsubst (type, argvec, tf_warning_or_error, NULL_TREE);
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/auto23.C b/gcc/testsuite/g++.dg/cpp0x/auto23.C
new file mode 100644
index 000..49b5a0e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/auto23.C
@@ -0,0 +1,4 @@
+// PR c++/46245
+// { dg-options "-std=c++0x" }
+
+template<auto f()->int> struct A { };
diff --git a/gcc/testsuite/g++.dg/cpp0x/auto9.C b/gcc/testsuite/g++.dg/cpp0x/auto9.C
index ab90be5..142ef90 100644
--- a/gcc/testsuite/g++.dg/cpp0x/auto9.C
+++ b/gcc/testsuite/g++.dg/cpp0x/auto9.C
@@ -79,10 +79,10 @@ enum struct D : auto * { FF = 0 }; // { dg-error "must be an integral type|decl
 void
 bar ()
 {
-  try { } catch (auto i) { } // { dg-error "invalid use of" }
-  try { } catch (auto) { } // { dg-error "invalid use of" }
-  try { } catch (auto *i) { } // { dg-error "invalid use of" }
-  try { } catch (auto *) { } // { dg-error "invalid use of" }
+  try { } catch (auto i) { } // { dg-error "parameter declared" }
+  try { } catch (auto) { } // { dg-error "parameter declared" }
+  try { } catch (auto *i) { } // { dg-error "parameter declared" }
+  try { } catch (auto *) { } // { dg-error "parameter declared" }
 }

 void
C++ PATCH for c++/45698 (crash with variadics)
45698 was actually fixed in 4.5.0, but before I closed it I checked to see how the testcase was doing with the current compiler, and found that it was crashing again. This turned out to be because of Nathan's recent tree-slimming work; ARGUMENT_PACK_SELECT doesn't have TREE_TYPE, so we crash when we try to look at it in value_dependent_expression_p. But we shouldn't be treating it as an expression in the first place, since it could be either a type or value argument.

Fixed by looking through ARGUMENT_PACK_SELECT before we decide what sort of template argument we're dealing with.

While looking at this, I also noticed that print_node expects everything to have TREE_TYPE, which is no longer correct. And I made print_node more useful for ARGUMENT_PACK_SELECT.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 0b5532a57ea85765d6baed5eff0abaaabac1aaaf
Author: Jason Merrill ja...@redhat.com
Date:   Wed May 25 13:24:47 2011 -0400

    PR c++/45698
    * pt.c (dependent_template_arg_p): See through ARGUMENT_PACK_SELECT.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index c3c759e..c9c25cd 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -18759,6 +18759,9 @@ dependent_template_arg_p (tree arg)
   if (arg == error_mark_node)
     return true;

+  if (TREE_CODE (arg) == ARGUMENT_PACK_SELECT)
+    arg = ARGUMENT_PACK_SELECT_ARG (arg);
+
   if (TREE_CODE (arg) == TEMPLATE_DECL
       || TREE_CODE (arg) == TEMPLATE_TEMPLATE_PARM)
     return dependent_template_p (arg);
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic110.C b/gcc/testsuite/g++.dg/cpp0x/variadic110.C
new file mode 100644
index 000..86f1bb1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic110.C
@@ -0,0 +1,15 @@
+// PR c++/45698
+// { dg-options "-std=c++0x" }
+
+template <class... Ts> struct tuple { };
+
+template<class... Ts>
+struct A {
+  template<typename T> struct N { };
+  tuple<N<Ts>...> tup;
+};
+
+int main()
+{
+  A<int, double> a;
+}

commit 46cccd60afea40407a278f6d937373e0121c24ee
Author: Jason Merrill ja...@redhat.com
Date:   Wed May 25 13:32:05 2011 -0400

    * print-tree.c (print_node): Only look at TREE_TYPE if TS_TYPED.
    * cp/ptree.c (cxx_print_xnode): Handle ARGUMENT_PACK_SELECT.

diff --git a/gcc/cp/ptree.c b/gcc/cp/ptree.c
index a4c3ed5..5c9626e 100644
--- a/gcc/cp/ptree.c
+++ b/gcc/cp/ptree.c
@@ -221,6 +221,12 @@ cxx_print_xnode (FILE *file, tree node, int indent)
           fprintf (file, "pending_template");
         }
       break;
+    case ARGUMENT_PACK_SELECT:
+      print_node (file, "pack", ARGUMENT_PACK_SELECT_FROM_PACK (node),
+                  indent+4);
+      indent_to (file, indent + 3);
+      fprintf (file, "index %d", ARGUMENT_PACK_SELECT_INDEX (node));
+      break;
     default:
       break;
     }
diff --git a/gcc/print-tree.c b/gcc/print-tree.c
index 3b5edeb..58c9613 100644
--- a/gcc/print-tree.c
+++ b/gcc/print-tree.c
@@ -321,7 +321,7 @@ print_node (FILE *file, const char *prefix, tree node, int indent)
       if (indent <= 4)
         print_node_brief (file, "type", TREE_TYPE (node), indent + 4);
     }
-  else
+  else if (CODE_CONTAINS_STRUCT (code, TS_TYPED))
     {
       print_node (file, "type", TREE_TYPE (node), indent + 4);
       if (TREE_TYPE (node))
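For context on what the testcase instantiates, here is a sketch based on variadic110.C: the pack expansion N<Ts>... wraps each element of the pack in the member template N, and during that expansion the compiler selects one pack element at a time. The size member and the function wrapped_count are hypothetical additions so the expansion can be checked; C++11 or later is assumed.

```cpp
#include <cassert>
#include <cstddef>

template <class... Ts> struct tuple {
  // Hypothetical addition: expose how many arguments the tuple received.
  static constexpr std::size_t size = sizeof...(Ts);
};

template <class... Ts>
struct A {
  template <typename T> struct N { };
  // Pack expansion: for A<int, double> this is tuple<N<int>, N<double>>.
  tuple<N<Ts>...> tup;
};

std::size_t wrapped_count() {
  A<int, double> a;
  return decltype(a.tup)::size;
}
```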
[PATCH, testsuite] Additional tests for PR46728 (PR46728 patch 4)
Since I'm in the process of moving the lowering of pow and powi calls from expand into gimple, I wrote some tests to improve coverage in this area. Most of these look for specific code generation patterns in PowerPC assembly where the existence of a hardware floating square root can be guaranteed.

This patch is conditional on patch 3 of the PR46728 series; without it, test pr46728-16.c will fail, since the FMA will not be generated. All other tests currently pass.

OK to add to test suite on trunk?

Thanks,
Bill

2011-05-25 Bill Schmidt wschm...@linux.vnet.ibm.com

    * gcc.target/powerpc/pr46728-1.c: New.
    * gcc.target/powerpc/pr46728-2.c: New.
    * gcc.target/powerpc/pr46728-3.c: New.
    * gcc.target/powerpc/pr46728-4.c: New.
    * gcc.target/powerpc/pr46728-5.c: New.
    * gcc.dg/pr46728-6.c: New.
    * gcc.target/powerpc/pr46728-7.c: New.
    * gcc.target/powerpc/pr46728-8.c: New.
    * gcc.dg/pr46728-9.c: New.
    * gcc.target/powerpc/pr46728-10.c: New.
    * gcc.target/powerpc/pr46728-11.c: New.
    * gcc.dg/pr46728-12.c: New.
    * gcc.target/powerpc/pr46728-13.c: New.
    * gcc.target/powerpc/pr46728-14.c: New.
    * gcc.target/powerpc/pr46728-15.c: New.
    * gcc.target/powerpc/pr46728-16.c: New.
Index: gcc/testsuite/gcc.target/powerpc/pr46728-13.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-13.c (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-13.c (revision 0)
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 1.0 / 6.0);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 729.0, 64.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (convert_it (values[i]) != cbrt (sqrt (values[i])))
+      abort ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr46728-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-3.c (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-3.c (revision 0)
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it (double x)
+{
+  return pow (x, 0.75);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    if (convert_it (values[i]) != sqrt(values[i]) * sqrt (sqrt (values[i])))
+      abort ();
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "sqrt" 4 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-not "pow" { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/pr46728-14.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr46728-14.c (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr46728-14.c (revision 0)
@@ -0,0 +1,78 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm -mpowerpc-gpopt" } */
+
+#include <math.h>
+
+extern void abort (void);
+
+#define NVALS 6
+
+static double
+convert_it_1 (double x)
+{
+  return pow (x, 1.5);
+}
+
+static double
+convert_it_2 (double x)
+{
+  return pow (x, 2.5);
+}
+
+static double
+convert_it_3 (double x)
+{
+  return pow (x, -0.5);
+}
+
+static double
+convert_it_4 (double x)
+{
+  return pow (x, 10.5);
+}
+
+static double
+convert_it_5 (double x)
+{
+  return pow (x, -3.5);
+}
+
+int
+main (int argc, char *argv[])
+{
+  double values[NVALS] = { 3.0, 1.95, 2.227, 4.0, 256.0, .0008797 };
+  double PREC = .99;
+  unsigned i;
+
+  for (i = 0; i < NVALS; i++)
+    {
+      volatile double x, y;
+
+      x = sqrt (values[i]);
+      y = __builtin_powi (values[i], 1);
+      if (fabs (convert_it_1 (values[i]) / (x * y)) < PREC)
+        abort ();
+
+      x = sqrt (values[i]);
+      y = __builtin_powi (values[i], 2);
+      if (fabs (convert_it_2 (values[i]) / (x * y)) < PREC)
+        abort ();
+
+      x = sqrt (values[i]);
+      y = __builtin_powi (values[i], -1);
+      if (fabs (convert_it_3 (values[i]) / (x * y)) < PREC)
+        abort ();
+
+      x = sqrt (values[i]);
+      y = __builtin_powi (values[i], 10);
+      if (fabs (convert_it_4 (values[i]) / (x * y)) < PREC)
+        abort ();
+
+      x = sqrt (values[i]);
+      y = __builtin_powi (values[i], -4);
+      if (fabs (convert_it_5 (values[i]) / (x * y)) < PREC)
+        abort ();
+    }
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr46728-4.c
===================================================================
---
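The identities these tests rely on can be checked independently of the compiler transformation. The following is only a sketch of the arithmetic, not of the GCC lowering itself; the function names are hypothetical, and a relative tolerance is used instead of the exact comparisons in the tests, since exact equality only holds when the compiler really performs the rewrite under -ffast-math.

```cpp
#include <cassert>
#include <cmath>

// x^0.75 = x^(1/2) * x^(1/4)
double pow_three_quarters(double x) {
  return std::sqrt(x) * std::sqrt(std::sqrt(x));
}

// x^1.5 = x^(1/2) * x^1
double pow_one_and_half(double x) {
  return std::sqrt(x) * x;
}

// x^(1/6) = cbrt (sqrt (x))
double pow_one_sixth(double x) {
  return std::cbrt(std::sqrt(x));
}

// Relative comparison; the 1e-12 tolerance is an arbitrary choice.
bool nearly_equal(double a, double b) {
  return std::fabs(a - b) <= 1e-12 * std::fabs(b);
}
```

For example, 256^0.75 decomposes as sqrt(256) * sqrt(sqrt(256)) = 16 * 4 = 64.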
Re: PATCH: Add pause intrinsic
On Wed, 25 May 2011 12:31:17 -0700 H.J. Lu hjl.to...@gmail.com wrote:
On Wed, May 25, 2011 at 12:17 PM, Basile Starynkevitch bas...@starynkevitch.net wrote:

Perhaps the doc might explain why it is necessary to have a builtin for two independent roles: first, the full compiler memory barrier (which probably means to spill all the registers on the stack - definitely a task for a compiler); second, to pause the processor (which might also mean to flush or invalidate some data caches).

In particular, I would naively imagine that we might have a more generic builtin for the compiler memory barrier (which probably could be independent of the particular ia32 target), and in that case why can't we just implement the pause ia32 builtin as

builtin_compiler_barrier(); asm ("pause")?

We may need

builtin_compiler_barrier(); asm ("pause"); builtin_compiler_barrier();

I don't understand why the second builtin_compiler_barrier() after the asm ("pause") would be needed. Could you please explain why we should need it?

My feeling was that after the first builtin_compiler_barrier () and hence after the asm ("pause") no register would contain valid data, and the compiler would have to reload everything from memory. So why do you think the second is needed?

Or perhaps I completely misunderstood all the issues!

--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***