Re: [PATCH] teach emit_store_flag to use clz/ctz
On Fri, 27 Apr 2012, Paolo Bonzini wrote:

What about cost considerations? We only seem to have the general "branches are expensive" metric - but ctz/clz may be prohibitively expensive themselves, no?

Yeah, that's a general problem with this kind of trick. In general, however, clz/ctz is getting less and less expensive, so I don't think it is worrisome (at least at the beginning of stage 1). We can add rtx_costs checks later. Among the architectures I know, only i386 has an expensive bsf/bsr, but it also has sete/setne, which GCC will use instead of this trick. Looking at rtx_costs, nothing seems to mark clz/ctz as prohibitively expensive (Xtensa does, but only in the case when the optab handler will not exist). I realize, though, that this is not a particularly good statistic, since the compiler would not generate them out of its hat until now.

For the record: MIPS processors that implement CLZ/CLO (for some reason CTZ/CTO haven't been added to the architecture, but these operations can be cheaply transformed into CLZ/CLO) generally have a dedicated unit that causes no pipeline stall for these instructions even in the simplest pipeline designs like the M4K -- IOW they are issued at the usual one instruction per pipeline clock rate. Of course all MIPS processors have SLT too, so perhaps they won't benefit from your change either.

Maciej
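The remark that CTZ can be cheaply derived from CLZ can be illustrated with a minimal C sketch. The names clz32 and ctz32 are hypothetical, and the loop-based clz32 only stands in for a hardware CLZ instruction that is assumed to return 32 for a zero input; none of this is code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for a 32-bit CLZ instruction.  Assumption of this
   sketch: the instruction returns 32 when the input is zero.  */
static unsigned
clz32 (uint32_t x)
{
  unsigned n = 0;

  if (x == 0)
    return 32;
  while ((x & 0x80000000u) == 0)
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* Count trailing zeros on top of clz32: x & -x isolates the lowest set
   bit, and its bit position is 31 minus its leading-zero count.  */
static unsigned
ctz32 (uint32_t x)
{
  assert (x != 0);  /* the zero case is left undefined in this sketch */
  return 31 - clz32 (x & (0u - x));
}

/* Usage example.  */
static void
self_check (void)
{
  assert (clz32 (0) == 32);
  assert (clz32 (1) == 31);
  assert (ctz32 (8) == 3);
}
```

The extra cost over a native CTZ is one negate, one AND and one subtract, which is presumably what "cheaply transformed" refers to.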
[PATCH, Android] Stack protector enabling for Android target
Hi!

The patch enables stack protector for Android. Android targets don't contain the necessary information in features.h, so we explicitly enable stack protector for Android.

Bootstrapped and regtested on x86_64. Ok to commit?

Thanks,
Igor

2012-05-06  Igor Zamyatin  igor.zamya...@intel.com

        * configure.ac: Stack protector enabling for Android targets.
        * configure: Regenerate.

diff --git a/gcc/configure.ac b/gcc/configure.ac
index 86b4bea..c1012d6 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -4545,6 +4545,8 @@ AC_CACHE_CHECK(__stack_chk_fail in target C library,
       gcc_cv_libc_provides_ssp,
       [gcc_cv_libc_provides_ssp=no
     case $target in
+       *-android*)
+         gcc_cv_libc_provides_ssp=yes;;
        *-*-linux* | *-*-kfreebsd*-gnu | *-*-knetbsd*-gnu)
       [# glibc 2.4 and later provides __stack_chk_fail and
       # either __stack_chk_guard, or TLS access to stack guard canary.
Re: [PATCH] Atom: Scheduler improvements for better imul placement
Ping. Could x86 maintainer(s) look at these changes?

Thanks,
Igor

On Fri, Apr 20, 2012 at 4:04 PM, Igor Zamyatin izamya...@gmail.com wrote:
On Tue, Apr 17, 2012 at 12:27 AM, Igor Zamyatin izamya...@gmail.com wrote:
On Fri, Apr 13, 2012 at 4:20 PM, Andrey Belevantsev a...@ispras.ru wrote:
On 13.04.2012 14:18, Igor Zamyatin wrote:
On Thu, Apr 12, 2012 at 5:01 PM, Andrey Belevantsev a...@ispras.ru wrote:
On 12.04.2012 16:38, Richard Guenther wrote:
On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin izamya...@gmail.com wrote:
On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther richard.guent...@gmail.com wrote:
On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov amona...@ispras.ru wrote:

Can atom execute two IMUL in parallel? Or what exactly is the pipeline behavior?

As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycles latency instruction can be issued after IMUL without a stall. In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions.

It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
                    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                     atom-port-0")

;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")

The main idea is quite simple: if we are going to schedule an IMUL instruction (it is on the top of the ready list), we try to find the producer of another (independent) IMUL instruction that is in the ready list too. The goal is to schedule such a producer so that another IMUL gets into the ready list and we get scheduling of 2 successive IMUL instructions.

Why does that not happen without your patch? Does it never happen without your patch or does it merely not happen for one EEMBC benchmark (can you provide a testcase?)?

It does not happen because the scheduler by itself does not do such specific reordering. That said, it is easy to imagine the cases where this patch will make things worse rather than better.

Igor, why not try different subtler mechanisms like adjust_priority, which gets called when an insn is added to the ready list? E.g. increase the producer's priority.

The patch as is misses checks for NONDEBUG_INSN_P. Also, why bail out when you have more than one imul in the ready list? Don't you want to bump the priority of the other imul found?

Could you provide some examples when this patch would harm the performance?

I thought of the cases when the other ready insns can fill up the hole and that would be more beneficial because e.g. they would be on more critical paths than the producer of your second imul. I don't know enough of Atom to give an example -- maybe some long divisions?

Sched_reorder was chosen since it is used in other ports and looks most suitable for such a case, e.g. it provides access to the whole ready list. BTW, just increasing the producer's priority seems to be more risky in the performance sense - we can incorrectly start delaying some instructions.

Yes, but exactly because of the above example you can start incorrectly delaying other insns, too, as you force the insn to be the first in the list. While bumping priority still leaves the scheduler sorting heuristics in place and actually lowers that risk.

Thought ready list doesn't contain DEBUG_INSN... Is it so? If it contains them - this could be added easily.

It does, but I'm not sure the sched_reorder hook gets them or they are immediately removed -- I saw similar checks in one of the targets' hooks.

Done with DEBUG_INSN, also the 1-imul limit was removed. Patch attached.

Anyways, my main thought was that it is better to test on more benchmarks to alleviate the above concerns, so as long as the i386 maintainers are happy, I don't see major problems here. A good idea could be to generalize the patch to handle other long latency insns as second consumers, not only imuls, if this is relevant for Atom.

Yes, generalization of this approach is in plans. According to the Atom Software optimization guide there are several headrooms left here. As for trying on more benchmarks - the particular case is indeed quite rare. I attached the example where the patch helps to group imuls in pairs, which is profitable for Atom. Such and similar codes are not very common. But hopefully this approach could help avoid this and other glassjaws.

BTW, this patch also helps some EEMBC tests when funroll-loops is specified.

So, any feedback from i386 maintainers about this? :)

Changelog slightly changed

2012-04-10  Yuri Rumyantsev  yuri.s.rumyant...@intel.com

        *
Re: [PATCH] teach emit_store_flag to use clz/ctz
On Sat, May 5, 2012 at 11:52 PM, Maciej W. Rozycki ma...@linux-mips.org wrote:

For the record: MIPS processors that implement CLZ/CLO (for some reason CTZ/CTO haven't been added to the architecture, but these operations can be cheaply transformed into CLZ/CLO) generally have a dedicated unit that causes no pipeline stall for these instructions even in the simplest pipeline designs like the M4K -- IOW they are issued at the usual one instruction per pipeline clock rate.

Even on Octeon this is true. Though Octeon has seq/sneq too, so it does not matter in the end.

Note I originally was the one who proposed this optimization for PowerPC even before I saw what XLC did. See PR 10588 (which I filed 9 years ago), and it seems we are about to fix it soon.

Thanks,
Andrew Pinski
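For readers unfamiliar with this kind of trick, here is a minimal C sketch of one well-known way to compute a store flag with CLZ; it is not necessarily the exact sequence the patch emits. The names clz32_model, is_zero and is_equal are hypothetical, and the sketch assumes a CLZ that returns 32 for a zero input.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of a CLZ instruction that yields 32 for a zero input.
   (__builtin_clz is undefined for 0, so it is deliberately not used.)  */
static unsigned
clz32_model (uint32_t x)
{
  unsigned n = 32;

  while (x != 0)
    {
      x >>= 1;
      n--;
    }
  return n;
}

/* Branch-free x == 0: the count is 32 (bit 5 set) only when x is zero,
   so shifting right by 5 leaves exactly the flag.  */
static unsigned
is_zero (uint32_t x)
{
  return clz32_model (x) >> 5;
}

/* x == y reduces to (x ^ y) == 0.  */
static unsigned
is_equal (uint32_t x, uint32_t y)
{
  return is_zero (x ^ y);
}

/* Usage example.  */
static void
self_check (void)
{
  assert (is_zero (0) == 1);
  assert (is_zero (5) == 0);
  assert (is_equal (7, 7) == 1);
}
```

On a target with a flag-setting instruction such as sete/setne, SLT or seq/sneq, the direct instruction is the obvious choice, which is the point made in the replies above.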
[PATCH] Fix a missing truncate due with combine
Take the following testcase:

typedef unsigned long long uint64_t;
void f(uint64_t *a, uint64_t aa) __attribute__((noinline));
void f(uint64_t *a, uint64_t aa)
{
  uint64_t new_value = aa;
  uint64_t old_value = *a;
  int bit_size = 32;
  uint64_t mask = (uint64_t)(unsigned)(-1);
  uint64_t tmp = old_value & mask;
  new_value &= mask;
  /* On overflow we need to add 1 in the upper bits */
  if (tmp > new_value)
    new_value += 1ull<<bit_size;
  /* Add in the upper bits from the old value */
  new_value += old_value & ~mask;
  *a = new_value;
}
int main(void)
{
  uint64_t value, new_value, old_value;
  value = 0x10001;
  old_value = value;
  new_value = (value+1)&(uint64_t)(unsigned)(-1);
  f(&value, new_value);
  if (value != old_value+1)
    {
      __builtin_printf("FAIL.\n");
      __builtin_abort ();
    }
  __builtin_printf("0x%llx\n",value);
}

--- CUT ---

Combine is able to combine the following three instructions:

(insn 8 7 9 2 (set (reg/v:DI 194 [ new_value ])
        (and:DI (reg/v:DI 5 $5 [ aa ])
            (const_int 4294967295 [0xffffffff]))) t.c:10 152 {*anddi3}
     (expr_list:REG_DEAD (reg/v:DI 200 [ aa ])
        (nil)))
(insn 9 8 10 2 (set (reg:DI 201)
        (and:DI (reg/v:DI 195 [ old_value ])
            (const_int 4294967295 [0xffffffff]))) t.c:9 152 {*anddi3}
     (nil))
(insn 10 9 11 2 (set (reg:DI 202)
        (gtu:DI (reg:DI 201)
            (reg/v:DI 194 [ new_value ]))) t.c:12 473 {*sgtu_didi}
     (expr_list:REG_DEAD (reg:DI 201)
        (nil)))

--- CUT ---

Into:

(set (reg:DI 202)
    (ltu:DI (reg:SI 5 $5 [ aa+4 ])
        (subreg:SI (reg/v:DI 195 [ old_value ]) 4)))

Which is wrong when TRULY_NOOP_TRUNCATION_MODES_P is false, which is what happens on MIPS.

This patch fixes the problem by changing the call to gen_lowpart in simplify_comparison into gen_lowpart_or_truncate, which is what it should have been.

OK? Bootstrapped and tested on mips64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

ChangeLog:

        * combine.c (simplify_comparison): Use gen_lowpart_or_truncate instead
        of gen_lowpart when we had a truncating and.

        * gcc.c-torture/execute/20110418-1.c: New testcase.

Index: combine.c
===================================================================
--- combine.c (revision 187203)
+++ combine.c (working copy)
@@ -11199,8 +11199,8 @@ simplify_comparison (enum rtx_code code,
               tmode != GET_MODE (op0); tmode = GET_MODE_WIDER_MODE (tmode))
            if ((unsigned HOST_WIDE_INT) c0 == GET_MODE_MASK (tmode))
              {
-               op0 = gen_lowpart (tmode, inner_op0);
-               op1 = gen_lowpart (tmode, inner_op1);
+               op0 = gen_lowpart_or_truncate (tmode, inner_op0);
+               op1 = gen_lowpart_or_truncate (tmode, inner_op1);
                code = unsigned_condition (code);
                changed = 1;
                break;
Index: testsuite/gcc.c-torture/execute/20110418-1.c
===================================================================
--- testsuite/gcc.c-torture/execute/20110418-1.c (revision 0)
+++ testsuite/gcc.c-torture/execute/20110418-1.c (revision 0)
@@ -0,0 +1,29 @@
+typedef unsigned long long uint64_t;
+void f(uint64_t *a, uint64_t aa) __attribute__((noinline));
+void f(uint64_t *a, uint64_t aa)
+{
+  uint64_t new_value = aa;
+  uint64_t old_value = *a;
+  int bit_size = 32;
+    uint64_t mask = (uint64_t)(unsigned)(-1);
+    uint64_t tmp = old_value & mask;
+    new_value &= mask;
+    /* On overflow we need to add 1 in the upper bits */
+    if (tmp > new_value)
+        new_value += 1ull<<bit_size;
+    /* Add in the upper bits from the old value */
+    new_value += old_value & ~mask;
+    *a = new_value;
+}
+int main(void)
+{
+  uint64_t value, new_value, old_value;
+  value = 0x10001;
+  old_value = value;
+  new_value = (value+1)&(uint64_t)(unsigned)(-1);
+  f(&value, new_value);
+  if (value != old_value+1)
+    __builtin_abort ();
+  return 0;
+}
+
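Why a plain lowpart is not enough here can be shown with a small C model. This is a sketch under stated assumptions, not GCC code: it assumes a 64-bit target where 32-bit values must be kept sign-extended in 64-bit registers and where a 32-bit unsigned compare is executed as a full-width compare. The names reg64, lowpart_si, truncate_si and ltu_si are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit hardware register.  */
typedef uint64_t reg64;

/* "Lowpart": no instruction is emitted, the register is reused as is,
   including whatever sits in its upper 32 bits.  */
static reg64
lowpart_si (reg64 r)
{
  return r;
}

/* Explicit truncation: keep the low 32 bits and sign-extend them, so
   the register holds a canonical 32-bit value.  */
static reg64
truncate_si (reg64 r)
{
  reg64 low = r & UINT64_C (0xffffffff);
  return (low ^ UINT64_C (0x80000000)) - UINT64_C (0x80000000);
}

/* 32-bit unsigned "less than" done as a full-width compare.  Only
   valid when both inputs are canonical (sign-extended).  */
static int
ltu_si (reg64 a, reg64 b)
{
  return a < b;
}

/* Usage example: low 32 bits are 1 and 2, so 1 < 2 must hold.  */
static void
self_check (void)
{
  reg64 a = UINT64_C (0x100000001);
  reg64 b = 2;

  assert (ltu_si (lowpart_si (a), lowpart_si (b)) == 0);   /* wrong */
  assert (ltu_si (truncate_si (a), truncate_si (b)) == 1); /* right */
}
```

In this model the upper bits of a leak into the comparison unless the value is truncated first, which matches the kind of failure the testcase above is meant to catch.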
[Patch] Bump minimum required MPFR version
Hi,

in http://gcc.gnu.org/install/prerequisites.html we say that GCC requires at least MPFR 2.4.2, but in the toplevel configure.ac we only require 2.3.1, printing a warning that the result is likely to be buggy if the version is lower than 2.4.2.

The attached patch bumps the minimum version to 2.4.0. We started requiring 2.3.1, which was released on 2008-01-29, on 2009-04-08, that is, about 1 year and a few months after the release. MPFR 2.4.0 was released on 2009-01-26, so by now it's 3 years old. And by the time we release 4.8 it's most likely over 4 years old already.

For some background, the fortran frontend recently started using mpfr_fmod to fix some bugs in the constant folding of the MOD and MODULO intrinsics, effectively requiring at least MPFR 2.4.0 in order to build. Also, if this patch is accepted, the middle-end could be modified to constant fold BUILT_IN_FMOD{F,,L} relatively easily, something which isn't done today.

Ok for trunk?

2012-05-06  Janne Blomqvist  j...@gcc.gnu.org

        * configure.ac: Bump minimum MPFR version to 2.4.0.
        * configure: Regenerated.

--
Janne Blomqvist

mpfrbump.diff
Description: Binary data
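The three version thresholds discussed above (2.3.1, 2.4.0, 2.4.2) can be compared with a small C sketch. DEMO_VERSION_NUM and version_ok are hypothetical names; the packing is assumed to mirror the way MPFR encodes major, minor and patchlevel into one integer, and the sketch does not include mpfr.h or reproduce the actual configure test.

```c
#include <assert.h>

/* Hypothetical stand-in for a packed version number: major, minor and
   patchlevel in one integer, so versions compare with plain >=.  */
#define DEMO_VERSION_NUM(a, b, c) (((a) << 16) | ((b) << 8) | (c))

/* Nonzero if the installed version satisfies the minimum.  */
static int
version_ok (int have, int need)
{
  return have >= need;
}

/* Usage example with the versions named in the message.  */
static void
self_check (void)
{
  int min_proposed = DEMO_VERSION_NUM (2, 4, 0);
  int min_documented = DEMO_VERSION_NUM (2, 4, 2);

  assert (!version_ok (DEMO_VERSION_NUM (2, 3, 1), min_proposed));
  assert (version_ok (DEMO_VERSION_NUM (2, 4, 0), min_proposed));
  assert (!version_ok (DEMO_VERSION_NUM (2, 4, 0), min_documented));
}
```

A 2.4.0 installation passes the proposed check but not the documented 2.4.2 minimum, which is the gap between the check and the documentation that the message describes.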
Re: [PATCH] Fix a missing truncate due with combine
Which is wrong when TRULY_NOOP_TRUNCATION_MODES_P is false, which is what happens on MIPS. This patch fixes the problem by changing the call to gen_lowpart in simplify_comparison into gen_lowpart_or_truncate, which is what it should have been.

There is a similar transformation in the same function:

          /* If this AND operation is really a ZERO_EXTEND from a narrower
             mode, the constant fits within that mode, and this is either an
             equality or unsigned comparison, try to do this comparison in
             the narrower mode.

             Note that in:

             (ne:DI (and:DI (reg:DI 4) (const_int 0xffffffff)) (const_int 0))
             -> (ne:DI (reg:SI 4) (const_int 0))

             unless TRULY_NOOP_TRUNCATION allows it or the register is
             known to hold a value of the required mode the
             transformation is invalid.  */
          if ((equality_comparison_p || unsigned_comparison_p)
              && CONST_INT_P (XEXP (op0, 1))
              && (i = exact_log2 ((UINTVAL (XEXP (op0, 1))
                                   & GET_MODE_MASK (mode))
                                  + 1)) >= 0
              && const_op >> i == 0
              && (tmode = mode_for_size (i, MODE_INT, 1)) != BLKmode
              && (TRULY_NOOP_TRUNCATION_MODES_P (tmode, GET_MODE (op0))
                  || (REG_P (XEXP (op0, 0))
                      && reg_truncated_to_mode (tmode, XEXP (op0, 0)))))
            {
              op0 = gen_lowpart (tmode, XEXP (op0, 0));
              continue;
            }

and, in this case, it is simply not done if !TRULY_NOOP_TRUNCATION_MODES_P. I think that both transformations are equally profitable, so can we make them agree, one way or the other?

        * gcc.c-torture/execute/20110418-1.c: New testcase.

This needs to be updated a little bit. :-)

--
Eric Botcazou
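The arithmetic part of the condition quoted above (the mask must be one less than a power of two, and the comparison constant must fit below that bit) can be sketched in standalone C. demo_exact_log2 and narrow_width are hypothetical names reimplemented for illustration; the check for widths 8, 16 and 32 is an assumed stand-in for the mode_for_size test, and the TRULY_NOOP_TRUNCATION part is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Return log2 (x) if x is a power of two, otherwise -1.  */
static int
demo_exact_log2 (uint64_t x)
{
  int n = 0;

  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  while ((x >>= 1) != 0)
    n++;
  return n;
}

/* Return the width in bits of the narrower mode in which a comparison
   of (value & and_mask) against const_op could be done, or -1.  */
static int
narrow_width (uint64_t and_mask, uint64_t const_op)
{
  int i = demo_exact_log2 (and_mask + 1);

  /* Assumed stand-in for "an integer mode of exactly i bits exists".  */
  if (i != 8 && i != 16 && i != 32)
    return -1;
  /* The constant must fit within the narrower mode.  */
  if ((const_op >> i) != 0)
    return -1;
  return i;
}

/* Usage example.  */
static void
self_check (void)
{
  assert (narrow_width (0xffffffffu, 0) == 32);
  assert (narrow_width (0xff, 0x100) == -1);
  assert (narrow_width (0xf0, 0) == -1);
}
```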
Re: [patch] Fix cygwin ada install [was Re: Yet another issue with gcc current trunk with ada on cygwin]
OK, revision 184558 now reverted. Now on the 4.7 branch as well. -- Eric Botcazou
Re: [PATCH] Fix overzealous DSE on sparc
Ok, so I plan to push this sparc fix into mainline and the 4.7 branch after my testing is done. Eric, any objections? For the record, none. -- Eric Botcazou
[Ada] disable caret printing by default for Ada
Like for front-end warnings.

Tested on i586-suse-linux, applied on the mainline.

2012-05-06  Eric Botcazou  ebotca...@adacore.com

        * gcc-interface/misc.c (gnat_post_options): Disable caret by default.

--
Eric Botcazou

Index: gcc-interface/misc.c
===================================================================
--- gcc-interface/misc.c (revision 187074)
+++ gcc-interface/misc.c (working copy)
@@ -235,6 +235,10 @@ gnat_post_options (const char **pfilenam
   /* No psABI change warnings for Ada.  */
   warn_psabi = 0;
 
+  /* No caret by default for Ada.  */
+  if (!global_options_set.x_flag_diagnostics_show_caret)
+    global_dc->show_caret = false;
+
   optimize = global_options.x_optimize;
   optimize_size = global_options.x_optimize_size;
   flag_compare_debug = global_options.x_flag_compare_debug;
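The pattern used by the patch, changing a default only when the user did not set the flag explicitly, can be shown in a self-contained C sketch. The demo_options and demo_options_set structures and demo_post_options are hypothetical simplifications, not the real GCC option structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Current option values.  */
struct demo_options
{
  bool show_caret;
};

/* One flag per option: true if the user set it on the command line.  */
struct demo_options_set
{
  bool show_caret;
};

/* Front-end hook: turn the caret off unless the user asked for a
   specific setting.  */
static void
demo_post_options (struct demo_options *opts,
                   const struct demo_options_set *set)
{
  if (!set->show_caret)
    opts->show_caret = false;
}

/* Usage example.  */
static void
self_check (void)
{
  struct demo_options by_default = { true };
  struct demo_options_set nothing_set = { false };
  struct demo_options requested = { true };
  struct demo_options_set caret_set = { true };

  demo_post_options (&by_default, &nothing_set);
  assert (!by_default.show_caret);  /* default overridden */

  demo_post_options (&requested, &caret_set);
  assert (requested.show_caret);    /* explicit choice kept */
}
```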
[Ada] Missed vectorization opportunity for assignment loop
In Ada, we have issues with vectorization when full checks are enabled, most notably -gnato. This patch makes it possible to vectorize more loops at -O3.

Tested on i586-suse-linux, applied on the mainline.

2012-05-06  Eric Botcazou  ebotca...@adacore.com

        * gcc-interface/trans.c (Loop_Statement_to_gnu): Also handle invariant
        conditions with only one bound.
        (Raise_Error_to_gnu): Likewise.  New function extracted from...
        (gnat_to_gnu) <N_Raise_Constraint_Error>: ...here.  Call above function
        in regular mode only.

--
Eric Botcazou

Index: gcc-interface/trans.c
===================================================================
--- gcc-interface/trans.c (revision 187206)
+++ gcc-interface/trans.c (working copy)
@@ -2563,13 +2563,19 @@ Loop_Statement_to_gnu (Node_Id gnat_node
               i++)
            {
              tree low_ok
-               = build_binary_op (GE_EXPR, boolean_type_node,
-                                  convert (rci->type, gnu_low),
-                                  rci->low_bound);
+               = rci->low_bound
+                 ? build_binary_op (GE_EXPR, boolean_type_node,
+                                    convert (rci->type, gnu_low),
+                                    rci->low_bound)
+                 : boolean_true_node;
+
              tree high_ok
-               = build_binary_op (LE_EXPR, boolean_type_node,
-                                  convert (rci->type, gnu_high),
-                                  rci->high_bound);
+               = rci->high_bound
+                 ? build_binary_op (LE_EXPR, boolean_type_node,
+                                    convert (rci->type, gnu_high),
+                                    rci->high_bound)
+                 : boolean_true_node;
+
              tree range_ok
                = build_binary_op (TRUTH_ANDIF_EXPR, boolean_type_node,
                                   low_ok, high_ok);
@@ -2794,7 +2800,7 @@ finalize_nrv_r (tree *tp, int *walk_subt
       tree ret_val = TREE_OPERAND (TREE_OPERAND (t, 0), 1), init_expr;
 
       /* If this is the temporary created for a return value with variable
-        size in call_to_gnu, we replace the RHS with the init expression.  */
+        size in Call_to_gnu, we replace the RHS with the init expression.  */
       if (TREE_CODE (ret_val) == COMPOUND_EXPR
          && TREE_CODE (TREE_OPERAND (ret_val, 0)) == INIT_EXPR
          && TREE_OPERAND (TREE_OPERAND (ret_val, 0), 0)
@@ -3122,7 +3128,7 @@ build_return_expr (tree ret_obj, tree re
              && aggregate_value_p (operation_type, current_function_decl))
            {
              /* Recognize the temporary created for a return value with variable
-                size in call_to_gnu.  We want to eliminate it if possible.  */
+                size in Call_to_gnu.  We want to eliminate it if possible.  */
              if (TREE_CODE (ret_val) == COMPOUND_EXPR
                  && TREE_CODE (TREE_OPERAND (ret_val, 0)) == INIT_EXPR
                  && TREE_OPERAND (TREE_OPERAND (ret_val, 0), 0)
@@ -3583,7 +3589,7 @@ create_init_temporary (const char *prefi
    requires atomic synchronization.  */
 
 static tree
-call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
+Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
             bool atomic_sync)
 {
   const bool function_call = (Nkind (gnat_node) == N_Function_Call);
@@ -4751,6 +4757,134 @@ Compilation_Unit_to_gnu (Node_Id gnat_no
   invalidate_global_renaming_pointers ();
 }
 
+/* Subroutine of gnat_to_gnu to translate gnat_node, an N_Raise_xxx_Error,
+   to a GCC tree, which is returned.  GNU_RESULT_TYPE_P is a pointer to where
+   we should place the result type.  LABEL_P is true if there is a label to
+   branch to for the exception.  */
+
+static tree
+Raise_Error_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p)
+{
+  const Node_Kind kind = Nkind (gnat_node);
+  const int reason = UI_To_Int (Reason (gnat_node));
+  const Node_Id gnat_cond = Condition (gnat_node);
+  const bool with_extra_info
+    = Exception_Extra_Info
+      && !No_Exception_Handlers_Set ()
+      && !get_exception_label (kind);
+  tree gnu_result = NULL_TREE, gnu_cond = NULL_TREE;
+
+  *gnu_result_type_p = get_unpadded_type (Etype (gnat_node));
+
+  switch (reason)
+    {
+    case CE_Access_Check_Failed:
+      if (with_extra_info)
+        gnu_result = build_call_raise_column (reason, gnat_node);
+      break;
+
+    case CE_Index_Check_Failed:
+    case CE_Range_Check_Failed:
+    case CE_Invalid_Data:
+      if (Present (gnat_cond) && Nkind (gnat_cond) == N_Op_Not)
+        {
+          Node_Id gnat_range, gnat_index, gnat_type;
+          tree gnu_index, gnu_low_bound, gnu_high_bound;
+          struct range_check_info_d *rci;
+
+          switch (Nkind (Right_Opnd (gnat_cond)))
+            {
+            case N_In:
+              gnat_range = Right_Opnd (Right_Opnd (gnat_cond));
+              gcc_assert (Nkind (gnat_range) == N_Range);
+              gnu_low_bound = gnat_to_gnu (Low_Bound (gnat_range));
+              gnu_high_bound = gnat_to_gnu (High_Bound (gnat_range));
+              break;
+
+            case N_Op_Ge:
+              gnu_low_bound = gnat_to_gnu (Right_Opnd (Right_Opnd (gnat_cond)));
+              gnu_high_bound = NULL_TREE;
+              break;
+
+            case N_Op_Le:
+              gnu_low_bound = NULL_TREE;
+              gnu_high_bound = gnat_to_gnu (Right_Opnd (Right_Opnd (gnat_cond)));
+              break;
+
+            default:
+              goto common;
+            }
+
+          gnat_index = Left_Opnd (Right_Opnd (gnat_cond));
+          gnat_type =
[Ada] Fix internal error on renaming with private discriminated type
We failed to use the padded type for the renaming as in the non-private case.

Tested on i586-suse-linux, applied on the mainline.

2012-05-06  Eric Botcazou  ebotca...@adacore.com

        * gcc-interface/decl.c (gnat_to_gnu_entity) <object>: In the renaming
        case, use the padded type if the renamed object has an unconstrained
        type with default discriminant.

2012-05-06  Eric Botcazou  ebotca...@adacore.com

        * gnat.dg/specs/renamings.ads: Rename to...
        * gnat.dg/specs/renaming1.ads: ...this.
        * gnat.dg/specs/renaming2.ads: New test.
        * gnat.dg/specs/renaming2_pkg1.ads: New helper.
        * gnat.dg/specs/renaming2_pkg2.ads: Likewise.
        * gnat.dg/specs/renaming2_pkg3.ads: Likewise.
        * gnat.dg/specs/renaming2_pkg4.ad[sb]: Likewise.

--
Eric Botcazou

Index: gcc-interface/decl.c
===================================================================
--- gcc-interface/decl.c (revision 187206)
+++ gcc-interface/decl.c (working copy)
@@ -938,6 +938,14 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
                gnu_type = TREE_TYPE (gnu_expr);
              }
 
+           /* Or else, if the renamed object has an unconstrained type with
+              default discriminant, use the padded type.  */
+           else if (TYPE_IS_PADDING_P (TREE_TYPE (gnu_expr))
+                    && TREE_TYPE (TYPE_FIELDS (TREE_TYPE (gnu_expr)))
+                       == gnu_type
+                    && CONTAINS_PLACEHOLDER_P (TYPE_SIZE (gnu_type)))
+             gnu_type = TREE_TYPE (gnu_expr);
+
            /* Case 1: If this is a constant renaming stemming from a function
               call, treat it as a normal object whose initial value is what is
               being renamed.  RM 3.3 says that the result of evaluating a

-- { dg-do compile }

with Renaming2_Pkg1;

package Renaming2 is
  type T is null record;
  package Iter is new Renaming2_Pkg1.GP.Inner (T);
end Renaming2;

-- { dg-excess-errors "no code generated" }

with Renaming2_Pkg2;
with Renaming2_Pkg3;
with Renaming2_Pkg4;

package Renaming2_Pkg1 is

  package Impl is new
    Renaming2_Pkg3 (Base_Index_T => Positive, Value_T => Renaming2_Pkg2.Root);

  use Impl;

  package GP is new
    Renaming2_Pkg4 (Length_T => Impl.Length_T, Value_T => Renaming2_Pkg2.Root);

end Renaming2_Pkg1;

package Renaming2_Pkg2 is

  type Root is private;

private

  type Root (D : Boolean := False) is record
    case D is
      when True => N : Natural;
      when False => null;
    end case;
  end record;

end Renaming2_Pkg2;

-- { dg-excess-errors "no code generated" }

generic
  type Base_Index_T is range <>;
  type Value_T is private;
package Renaming2_Pkg3 is

  type T is private;

  subtype Length_T is Base_Index_T range 0 .. Base_Index_T'Last;

  function Value (L : Length_T) return Value_T;

  function Next return Length_T;

private

  type Obj_T is null record;

  type T is access Obj_T;

end Renaming2_Pkg3;

package body Renaming2_Pkg4 is

  package body Inner is

    function Next_Value return Value_T is
      Next_Value : Value_T renames Value (Next);
    begin
      return Next_Value;
    end Next_Value;

  end Inner;

end Renaming2_Pkg4;

-- { dg-excess-errors "no code generated" }

generic
  type Length_T is range <>;
  with function Next return Length_T is <>;
  type Value_T is private;
  with function Value (L : Length_T) return Value_T is <>;
package Renaming2_Pkg4 is

  generic
    type T is private;
  package Inner is

    type Slave_T is tagged null record;

    function Next_Value return Value_T;

  end Inner;

end Renaming2_Pkg4;
[Ada] Fix 'noreturn' for reraise of exception
This fixes a hole in the declaration of __gnat_reraise_zcx, so that the attached program now compiles without warnings.

Tested on i586-suse-linux, applied on the mainline.

2012-05-06  Tristan Gingold  ging...@adacore.com

        * gcc-interface/trans.c (gigi): Decorate reraise_zcx_decl.

2012-05-06  Tristan Gingold  ging...@adacore.com

        * gnat.dg/warn7.adb: New test.

--
Eric Botcazou

-- { dg-do compile }

procedure Warn7 is

   procedure Nested;
   pragma No_Return (Nested);

   procedure Nested is
   begin
      raise Constraint_Error;
   exception
      when Constraint_Error =>
         raise;
   end;

begin
   Nested;
end;

Index: gcc-interface/trans.c
===================================================================
--- gcc-interface/trans.c (revision 187208)
+++ gcc-interface/trans.c (working copy)
@@ -502,7 +502,12 @@ gigi (Node_Id gnat_root, int max_gnat_no
     = create_subprog_decl (get_identifier ("__gnat_reraise_zcx"), NULL_TREE,
                           ftype, NULL_TREE, false, true, true, true, NULL,
                           Empty);
+  /* Indicate that these never return.  */
   DECL_IGNORED_P (reraise_zcx_decl) = 1;
+  TREE_THIS_VOLATILE (reraise_zcx_decl) = 1;
+  TREE_SIDE_EFFECTS (reraise_zcx_decl) = 1;
+  TREE_TYPE (reraise_zcx_decl)
+    = build_qualified_type (TREE_TYPE (reraise_zcx_decl), TYPE_QUAL_VOLATILE);
 
   /* If in no exception handlers mode, all raise statements are redirected to
      __gnat_last_chance_handler.  No need to redefine raise_nodefer_decl since
@@ -550,6 +555,7 @@ gigi (Node_Id gnat_root, int max_gnat_no
        build_function_type_list (build_pointer_type (except_type_node),
                                  NULL_TREE),
        NULL_TREE, false, true, true, true, NULL, Empty);
+  DECL_IGNORED_P (get_excptr_decl) = 1;
 
   raise_nodefer_decl
     = create_subprog_decl
Re: [Patch] Bump minimum required MPFR version
On Sun, May 6, 2012 at 10:33 AM, Janne Blomqvist blomqvist.ja...@gmail.com wrote:

Hi,

in http://gcc.gnu.org/install/prerequisites.html we say that GCC requires at least MPFR 2.4.2, but in the toplevel configure.ac we only require 2.3.1, printing a warning that the result is likely to be buggy if the version is lower than 2.4.2.

The attached patch bumps the minimum version to 2.4.0. We started requiring 2.3.1, which was released on 2008-01-29, on 2009-04-08, that is, about 1 year and a few months after the release. MPFR 2.4.0 was released on 2009-01-26, so by now it's 3 years old. And by the time we release 4.8 it's most likely over 4 years old already.

For some background, the fortran frontend recently started using mpfr_fmod to fix some bugs in the constant folding of the MOD and MODULO intrinsics, effectively requiring at least MPFR 2.4.0 in order to build. Also, if this patch is accepted, the middle-end could be modified to constant fold BUILT_IN_FMOD{F,,L} relatively easily, something which isn't done today.

Ok for trunk?

Please make the check match documentation, thus 2.4.2, not 2.4.0.

Thanks,
Richard.

2012-05-06  Janne Blomqvist  j...@gcc.gnu.org

        * configure.ac: Bump minimum MPFR version to 2.4.0.
        * configure: Regenerated.

--
Janne Blomqvist
Re: [RFC] PR 53063 encode group options in .opt files
On Sat, 5 May 2012, Manuel López-Ibáñez wrote:

Thanks for the hints. This is what I am currently bootstrapping+regtesting. It builds and works on a few manual tests. OK if it passes?

2012-05-05  Manuel López-Ibáñez  m...@gcc.gnu.org

        PR c/53063
gcc/
        * doc/options.texi (EnabledBy): Document.
        * opts.c (finish_options): Call finish_options_generated instead of
        handling some options explicitly.
        * optc-gen.awk: Handle EnabledBy.
        * opth-gen.awk: Declare finish_options_generated.
        * common.opt (Wuninitialized): Use EnabledBy. Delete Init.
        (Wunused-but-set-variable): Likewise.
        (Wunused-function): Likewise.
        (Wunused-label): Likewise.
        (Wunused-value): Likewise.
        (Wunused-variable): Likewise.
        * opt-read.awk: Create opt_numbers array.

OK.

--
Joseph S. Myers
jos...@codesourcery.com
[C++ Patch] PR 53152
Hi,

this is about the caret not pointing to the operator in the error messages produced by op_error. To fix the problem I'm simply passing down from the parser the proper location, via build_x_* and build_op_new, and this appears to work fine. In this area - accurate locations - small issues remain (eg, build_m_component_ref) but I'd like to resolve first this specific PR and then, when time will allow, we'll incrementally make progress.

Booted and tested x86_64-linux.

Thanks,
Paolo.

///

2012-05-06  Paolo Carlini  paolo.carl...@oracle.com

        PR c++/53152
        * call.c (op_error, build_new_op_1, build_new_op): Add location_t
        parameter.
        (build_conditional_expr_1): Adjust.
        * typeck.c (build_x_indirect_ref, build_x_binary_op,
        build_x_unary_op): Add location_t parameter.
        (rationalize_conditional_expr, build_x_array_ref,
        build_x_compound_expr, cp_build_modify_expr, build_x_modify_expr):
        Adjust.
        * typeck2.c (build_x_arrow): Add location_t parameter.
        * semantics.c (finish_unary_op_expr): Likewise.
        (finish_increment_expr, handle_omp_for_class_iterator): Adjust.
        * decl2.c (grok_array_decl): Add location_t parameter.
        * parser.c (cp_parser_postfix_open_square_expression,
        cp_parser_postfix_dot_deref_expression, cp_parser_unary_expression,
        cp_parser_binary_expression, cp_parser_builtin_offsetof,
        do_range_for_auto_deduction, cp_convert_range_for,
        cp_parser_template_argument, cp_parser_omp_for_cond): Pass the
        location, adjust.
        * pt.c (tsubst_copy_and_build): Adjust.
        * tree.c (maybe_dummy_object): Likewise.
        * cp-tree.h: Update declarations.

Index: typeck.c
===================================================================
--- typeck.c (revision 187205)
+++ typeck.c (working copy)
@@ -2060,7 +2060,8 @@ rationalize_conditional_expr (enum tree_code code,
       gcc_assert (!TREE_SIDE_EFFECTS (op0)
                  && !TREE_SIDE_EFFECTS (op1));
       return
-       build_conditional_expr (build_x_binary_op ((TREE_CODE (t) == MIN_EXPR
+       build_conditional_expr (build_x_binary_op (input_location,
+                                                  (TREE_CODE (t) == MIN_EXPR
                                                    ? LE_EXPR : GE_EXPR),
                                                   op0, TREE_CODE (op0),
                                                   op1, TREE_CODE (op1),
@@ -2730,7 +2731,7 @@ build_ptrmemfunc_access_expr (tree ptrmem, tree me
    Must also handle REFERENCE_TYPEs for C++.  */
 
 tree
-build_x_indirect_ref (tree expr, ref_operator errorstring,
+build_x_indirect_ref (location_t loc, tree expr, ref_operator errorstring,
                       tsubst_flags_t complain)
 {
   tree orig_expr = expr;
@@ -2746,8 +2747,8 @@ tree
       expr = build_non_dependent_expr (expr);
     }
 
-  rval = build_new_op (INDIRECT_REF, LOOKUP_NORMAL, expr, NULL_TREE,
-                      NULL_TREE, /*overload=*/NULL, complain);
+  rval = build_new_op (loc, INDIRECT_REF, LOOKUP_NORMAL, expr,
+                      NULL_TREE, NULL_TREE, /*overload=*/NULL, complain);
   if (!rval)
     rval = cp_build_indirect_ref (expr, errorstring, complain);
 
@@ -3580,8 +3581,9 @@ convert_arguments (tree typelist, VEC(tree,gc) **v
    ARG2_CODE as ERROR_MARK.  */
 
 tree
-build_x_binary_op (enum tree_code code, tree arg1, enum tree_code arg1_code,
-                  tree arg2, enum tree_code arg2_code, tree *overload,
+build_x_binary_op (location_t loc, enum tree_code code, tree arg1,
+                  enum tree_code arg1_code, tree arg2,
+                  enum tree_code arg2_code, tree *overload,
                   tsubst_flags_t complain)
 {
   tree orig_arg1;
@@ -3603,7 +3605,7 @@ tree
   if (code == DOTSTAR_EXPR)
     expr = build_m_component_ref (arg1, arg2, complain);
   else
-    expr = build_new_op (code, LOOKUP_NORMAL, arg1, arg2, NULL_TREE,
+    expr = build_new_op (loc, code, LOOKUP_NORMAL, arg1, arg2, NULL_TREE,
                         overload, complain);
 
   /* Check for cases such as x+y<<z which users are likely to
@@ -3643,8 +3645,8 @@ build_x_array_ref (tree arg1, tree arg2, tsubst_fl
       arg2 = build_non_dependent_expr (arg2);
     }
 
-  expr = build_new_op (ARRAY_REF, LOOKUP_NORMAL, arg1, arg2, NULL_TREE,
-                      /*overload=*/NULL, complain);
+  expr = build_new_op (input_location, ARRAY_REF, LOOKUP_NORMAL, arg1,
+                      arg2, NULL_TREE, /*overload=*/NULL, complain);
 
   if (processing_template_decl && expr != error_mark_node)
     return build_min_non_dep (ARRAY_REF, expr, orig_arg1, orig_arg2,
@@ -4659,7 +4661,8 @@ pointer_diff (tree op0, tree op1, tree ptrtype)
    and XARG is the operand.  */
 
 tree
-build_x_unary_op (enum tree_code code, tree xarg, tsubst_flags_t complain)
+build_x_unary_op (location_t loc, enum tree_code code, tree xarg,
+
Re: [RFC] PR 53063 encode group options in .opt files
On 6 May 2012 13:56, Joseph S. Myers jos...@codesourcery.com wrote:
On Sat, 5 May 2012, Manuel López-Ibáñez wrote:

Thanks for the hints. This is what I am currently bootstrapping+regtesting. It builds and works on a few manual tests. OK if it passes?

2012-05-05  Manuel López-Ibáñez  m...@gcc.gnu.org

        PR c/53063
gcc/
        * doc/options.texi (EnabledBy): Document.
        * opts.c (finish_options): Call finish_options_generated instead of
        handling some options explicitly.
        * optc-gen.awk: Handle EnabledBy.
        * opth-gen.awk: Declare finish_options_generated.
        * common.opt (Wuninitialized): Use EnabledBy. Delete Init.
        (Wunused-but-set-variable): Likewise.
        (Wunused-function): Likewise.
        (Wunused-label): Likewise.
        (Wunused-value): Likewise.
        (Wunused-variable): Likewise.
        * opt-read.awk: Create opt_numbers array.

OK.

Unfortunately, there are some issues with moving Wuninitialized to the new system. Wuninitialized is enabled by both Wall and Wextra. Wextra enables it in the common part; however, Wall does it in the FE specific part (c-family, fortran, ada).

When enabled via Wall, opts_set does not get updated. What is the best way to enable a sub-option? Using handle_option_generated does not set opt_set either, so the test in finish_options_generated does not work as intended. (And the setting of -Wall gets overridden by the setting of -Wextra).

I could move the setting of Wall to something like what we do for Wextra. However, this seems to me a step backwards. I think your original idea was to drive everything through the *_handle_option functions. Ideally, Wuninitialized should be handled like Wimplicit, using handle_option_generated to enable suboptions. But I am not sure what is the best way to implement this. Or in other words, what kind of code we want to autogenerate to handle this transparently.

One idea could be to have an additional auto_handle_option() that is generated from the awk scripts and called after all other handle_option functions. This function will populate a switch with group options and the respective calls to handle_option_generated for sub-options.

Is this a good idea? Where would be the best place to call this function?

Cheers,
Manuel.
[Patch, Fortran] PR53255 - fix type-bound operator handling
Dear all, if one uses TYPE(extended), the overridden specific procedure (trace_ext, which overrides the TBP trace) associated with an operator (.tr.) is not called; the TBP of the base type is called instead. It works correctly for polymorphic types. Built and regtested on x86-64-linux. OK for the trunk? As it is a nasty wrong-code bug (but no regression), I wonder whether it should be backported - and, if so, to which version - 4.7 only? (Affected are GCC 4.5 to 4.8.) Tobias 2012-05-06 Tobias Burnus bur...@net-b.de PR fortran/53255 * resolve.c (resolve_typebound_static): Fix handling of overridden specific to generic operator. 2012-05-06 Tobias Burnus bur...@net-b.de PR fortran/53255 * gfortran.dg/typebound_operator_15.f90: New. diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c index e5a49bc..cacc033 100644 --- a/gcc/fortran/resolve.c +++ b/gcc/fortran/resolve.c @@ -5671,12 +5702,11 @@ resolve_typebound_static (gfc_expr* e, gfc_symtree** target, e->value.compcall.actual = NULL; /* If we find a deferred typebound procedure, check for derived types - that an over-riding typebound procedure has not been missed. */ - if (e->value.compcall.tbp->deferred - && e->value.compcall.name - && !e->value.compcall.tbp->non_overridable - && e->value.compcall.base_object - && e->value.compcall.base_object->ts.type == BT_DERIVED) + that an overriding typebound procedure has not been missed. */ + if (e->value.compcall.name + && !e->value.compcall.tbp->non_overridable + && e->value.compcall.base_object + && e->value.compcall.base_object->ts.type == BT_DERIVED) { gfc_symtree *st; gfc_symbol *derived; --- /dev/null 2012-05-04 18:48:20.115791170 +0200 +++ gcc/gcc/testsuite/gfortran.dg/typebound_operator_15.f90 2012-05-06 18:30:18.0 +0200 @@ -0,0 +1,78 @@ +! { dg-do run } +! +! PR fortran/53255 +! +! Contributed by Reinhold Bader. +! +! Before TYPE(ext)'s .tr. wrongly called the base type's trace +! instead of ext's trace_ext. +!
+module mod_base + implicit none + private + integer, public :: base_cnt = 0 + type, public :: base + private + real :: r(2,2) = reshape( (/ 1.0, 2.0, 3.0, 4.0 /), (/ 2, 2 /)) + contains + procedure, private :: trace + generic :: operator(.tr.) => trace + end type base +contains + complex function trace(this) +class(base), intent(in) :: this +base_cnt = base_cnt + 1 +!write(*,*) 'executing base' +trace = this%r(1,1) + this%r(2,2) + end function trace +end module mod_base + +module mod_ext + use mod_base + implicit none + private + integer, public :: ext_cnt = 0 + public :: base, base_cnt + type, public, extends(base) :: ext + private + real :: i(2,2) = reshape( (/ 1.0, 1.0, 1.0, 1.5 /), (/ 2, 2 /)) + contains + procedure, private :: trace => trace_ext + end type ext +contains + complex function trace_ext(this) +class(ext), intent(in) :: this + +! the following should be executed through invoking .tr. p below +!write(*,*) 'executing override' +ext_cnt = ext_cnt + 1 +trace_ext = .tr. this%base + (0.0, 1.0) * ( this%i(1,1) + this%i(2,2) ) + end function trace_ext + +end module mod_ext +program test_override + use mod_ext + implicit none + type(base) :: o + type(ext) :: p + real :: r + + ! Note: ext's .tr. (trace_ext) calls also base's trace + +! write(*,*) .tr. o +! write(*,*) .tr. p + if (base_cnt /= 0 .or. ext_cnt /= 0) call abort () + r = .tr. o + if (base_cnt /= 1 .or. ext_cnt /= 0) call abort () + r = .tr. p + if (base_cnt /= 2 .or. ext_cnt /= 1) call abort () + + if (abs(.tr. o - 5.0 ) < 1.0e-6 .and. abs( .tr. p - (5.0,2.5)) < 1.0e-6) & + then +if (base_cnt /= 4 .or. ext_cnt /= 2) call abort () +! write(*,*) 'OK' + else +call abort() +! write(*,*) 'FAIL' + end if +end program test_override
Re: [Patch] Bump minimum required MPFR version
On Sun, May 6, 2012 at 2:39 PM, Richard Guenther richard.guent...@gmail.com wrote: On Sun, May 6, 2012 at 10:33 AM, Janne Blomqvist blomqvist.ja...@gmail.com wrote: Hi, in http://gcc.gnu.org/install/prerequisites.html we say that GCC requires at least MPFR 2.4.2, but in the toplevel configure.ac we only require 2.3.1, printing a warning that the result is likely to be buggy if the version is lower than 2.4.2. The attached patch bumps the minimum version to 2.4.0. We started requiring 2.3.1, which was released on 2008-01-29, on 2009-04-08, that is, about 1 year and a few months after the release. MPFR 2.4.0 was released on 2009-01-26, so by now it's 3 years old. And by the time we release 4.8 it's most likely over 4 years old already. For some background, the fortran frontend recently started using mpfr_fmod to fix some bugs in the constant folding of the MOD and MODULO intrinsics, effectively requiring at least MPFR 2.4.0 in order to build. Also, if this patch is accepted the middle-end could be modified to constant fold BUILT_IN_FMOD{F,,L} relatively easily, something which isn't done today. Ok for trunk? Please make the check match documentation, thus 2.4.2, not 2.4.0. Something like the attached patch? FWIW, this removes the distinction we have between buggy, but builds and ok. Ok for trunk? 2012-05-06 Janne Blomqvist j...@gcc.gnu.org * configure.ac: Bump minimum MPFR version to 2.4.2. * configure: Regenerated. -- Janne Blomqvist mpfrbump2.diff Description: Binary data
PR 53249: Multiple address modes for same address space
x32 uses a mixture of MEM address modes for the same address space. Some MEMs have SImode addresses, some have DImode. This means that the currently common idiom: targetm.addr_space.address_mode (MEM_ADDR_SPACE (mem)) isn't trustworthy. We have to use the mode of the address if it has one, and only fall back on the above for VOIDmode (CONST_INT) addresses. We actually already have two (identical) functions to calculate such a mode. The patch below puts the function in a more general place and uses it instead of the above for rtl-level stuff. I'm not sure whether what x32 is doing is a good thing, but I like the patch anyway because (a) it removes a duplicated function and (b) it at least abstracts the concept away. Bootstrapped regression-tested on x86_64-linux-gnu. Also tested to make sure that there were no differences for cc1 .ii files for MIPS n32, o32 and n64. (I used MIPS to get LO_SUM coverage.) OK to install? Richard gcc/ PR middle-end/53249 * dwarf2out.h (get_address_mode): Move declaration to... * rtl.h: ...here. * dwarf2out.c (get_address_mode): Move definition to... * rtlanal.c: ...here. * var-tracking.c (get_address_mode): Delete. * combine.c (find_split_point): Use get_address_mode instead of targetm.addr_space.address_mode. * cselib.c (cselib_record_sets): Likewise. * dse.c (canon_address, record_store): Likewise. * emit-rtl.c (adjust_address_1, offset_address): Likewise. * expr.c (move_by_pieces, emit_block_move_via_loop, store_by_pieces) (store_by_pieces_1, expand_assignment, store_expr, store_constructor) (expand_expr_real_1): Likewise. * ifcvt.c (noce_try_cmove_arith): Likewise. * optabs.c (maybe_legitimize_operand_same_code): Likewise. * reload.c (find_reloads): Likewise. * sched-deps.c (sched_analyze_1, sched_analyze_2): Likewise. * sel-sched-dump.c (debug_mem_addr_value): Likewise. 
Index: gcc/dwarf2out.h === --- gcc/dwarf2out.h 2012-05-06 16:17:20.0 +0100 +++ gcc/dwarf2out.h 2012-05-06 16:17:20.316206160 +0100 @@ -228,7 +228,6 @@ typedef struct GTY(()) dw_loc_descr_stru (rtx, enum machine_mode mode, enum machine_mode mem_mode, enum var_init_status); extern bool loc_descr_equal_p (dw_loc_descr_ref, dw_loc_descr_ref); -extern enum machine_mode get_address_mode (rtx mem); extern dw_fde_ref dwarf2out_alloc_current_fde (void); extern unsigned long size_of_locs (dw_loc_descr_ref); Index: gcc/rtl.h === --- gcc/rtl.h 2012-05-06 16:17:20.0 +0100 +++ gcc/rtl.h 2012-05-06 16:17:20.294206160 +0100 @@ -1899,6 +1899,7 @@ typedef struct replace_label_data bool update_label_nuses; } replace_label_data; +extern enum machine_mode get_address_mode (rtx mem); extern int rtx_addr_can_trap_p (const_rtx); extern bool nonzero_address_p (const_rtx); extern int rtx_unstable_p (const_rtx); Index: gcc/dwarf2out.c === --- gcc/dwarf2out.c 2012-05-06 16:17:20.0 +0100 +++ gcc/dwarf2out.c 2012-05-06 16:17:20.315206160 +0100 @@ -10971,17 +10971,6 @@ parameter_ref_descriptor (rtx rtl) return ret; } -/* Helper function to get mode of MEM's address. */ - -enum machine_mode -get_address_mode (rtx mem) -{ - enum machine_mode mode = GET_MODE (XEXP (mem, 0)); - if (mode != VOIDmode) -return mode; - return targetm.addr_space.address_mode (MEM_ADDR_SPACE (mem)); -} - /* The following routine converts the RTL for a variable or parameter (resident in memory) into an equivalent Dwarf representation of a mechanism for getting the address of that same variable onto the top of a Index: gcc/rtlanal.c === --- gcc/rtlanal.c 2012-05-06 16:17:20.0 +0100 +++ gcc/rtlanal.c 2012-05-06 16:17:20.298206160 +0100 @@ -5279,3 +5279,17 @@ low_bitmask_len (enum machine_mode mode, return exact_log2 (m + 1); } + +/* Return the mode of MEM's address. 
*/ + +enum machine_mode +get_address_mode (rtx mem) +{ + enum machine_mode mode; + + gcc_assert (MEM_P (mem)); + mode = GET_MODE (XEXP (mem, 0)); + if (mode != VOIDmode) +return mode; + return targetm.addr_space.address_mode (MEM_ADDR_SPACE (mem)); +} Index: gcc/var-tracking.c === --- gcc/var-tracking.c 2012-05-06 16:17:20.0 +0100 +++ gcc/var-tracking.c 2012-05-06 16:17:20.306206160 +0100 @@ -4909,17 +4909,6 @@ find_use_val (rtx x, enum machine_mode m return NULL; } -/* Helper function to get mode of MEM's address. */ - -static inline enum machine_mode -get_address_mode (rtx mem) -{ - enum machine_mode mode = GET_MODE (XEXP (mem, 0)); - if (mode != VOIDmode) -return mode; -
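For readers skimming the flattened diff, the fallback rule that get_address_mode implements can be modelled in standalone C. This is only a sketch under stated assumptions: every toy_* name below is a hypothetical stand-in, not GCC's rtx, enum machine_mode or targetm.addr_space hook, and the per-address-space default is hard-coded to a DImode-like value purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's machine modes; not the real enum.  */
enum toy_mode { TOY_VOIDmode, TOY_SImode, TOY_DImode };

/* Hypothetical, much-simplified address and MEM.  An address whose mode
   is TOY_VOIDmode models a CONST_INT address, which carries no mode.  */
struct toy_addr { enum toy_mode mode; };
struct toy_mem  { const struct toy_addr *addr; int addr_space; };

/* Stand-in for the per-address-space default (assumed DImode here).  */
static enum toy_mode
toy_default_address_mode (int addr_space)
{
  (void) addr_space;
  return TOY_DImode;
}

/* Same shape as the patch: prefer the mode of the address itself and
   fall back to the address-space default only for modeless addresses.  */
static enum toy_mode
toy_get_address_mode (const struct toy_mem *mem)
{
  assert (mem != NULL && mem->addr != NULL);
  if (mem->addr->mode != TOY_VOIDmode)
    return mem->addr->mode;
  return toy_default_address_mode (mem->addr_space);
}
```

In this model a MEM with an SImode-like address reports that mode even though the address space defaults to the DImode-like one, which is the x32 situation the patch description is about.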
Re: [RFC] PR 53063 encode group options in .opt files
On Sun, 6 May 2012, Manuel López-Ibáñez wrote: Wuninitialized is enabled by both Wall and Wextra. Wextra enables it in the common part, however, Wall does it in the FE specific part (c-family, fortran, ada). When enabled via Wall, opts_set does not get updated. What is the best way to enable a sub-option? Using handle_option_generated does not set opt_set either, so the test in finish_options_generated does not work as intended. (And the setting of -Wall gets overridden by the setting of -Wextra). That's where the notion of distance comes in - if there's an explicit -Wuninitialized or -Wno-uninitialized option, the last one of those takes precedence, but otherwise the last -Wall / -Wno-all / -Wextra / -Wno-extra determines the setting of -Wuninitialized, but otherwise the default value applies. (I'd guess that -Werror=extra should count as a -Wextra variant - at the same distance from any options implied by -Wextra as -Wextra itself - though I'm not entirely sure.) (I don't think you actually need to record distance explicitly for these particular options. You do need to process them as they are seen, so that you can distinguish -Wall -Wno-extra and -Wno-extra -Wall.) I could move the setting of Wall to something like what we do for Wextra. However, this seems to me a step backwards. I think your original idea was to drive everything through the *_handle_option functions. Ideally, Wuninitialized should be handled like Wimplicit, using handle_option_generated to enable suboptions. But I am not sure what is the best way to implement this. Or in other words, what kind of code we want to autogenerate to handle this transparently. One idea could be to have an additional auto_handle_option() that is generated from the awk scripts and called after all other handle_option functions. This function will populate a switch with group options and the respective calls to handle_option_generated for sub-options. Is this a good idea? 
Where would be the best place to call this function? That certainly seems one reasonable way to handle implications. -- Joseph S. Myers jos...@codesourcery.com
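The precedence rule described above (an explicit -Wuninitialized or -Wno-uninitialized always wins, otherwise the last group option seen decides, otherwise the default applies) can be sketched as a small standalone C model. It is not the generated code being discussed: the toy_* names and the struct layout are hypothetical and do not reflect GCC's opts / opts_set handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical state for one sub-option; not GCC's opts/opts_set.  */
struct toy_subopt
{
  bool value;     /* current setting of the sub-option */
  bool explicit_; /* true once the sub-option itself was given */
};

/* Process one command-line argument, in the order it is seen.  */
static void
toy_handle_option (struct toy_subopt *s, const char *arg)
{
  assert (s != NULL && arg != NULL);
  if (strcmp (arg, "-Wuninitialized") == 0)
    {
      s->value = true;
      s->explicit_ = true;
    }
  else if (strcmp (arg, "-Wno-uninitialized") == 0)
    {
      s->value = false;
      s->explicit_ = true;
    }
  else if (!s->explicit_)
    {
      /* Group options only matter while no explicit setting exists;
         among themselves, the last one seen wins.  */
      if (strcmp (arg, "-Wall") == 0 || strcmp (arg, "-Wextra") == 0)
        s->value = true;
      else if (strcmp (arg, "-Wno-all") == 0
               || strcmp (arg, "-Wno-extra") == 0)
        s->value = false;
    }
}

/* Resolve the sub-option for a whole argument list.  */
static bool
toy_resolve (const char *const *args, int n, bool default_value)
{
  struct toy_subopt s = { default_value, false };
  for (int i = 0; i < n; i++)
    toy_handle_option (&s, args[i]);
  return s.value;
}
```

Because arguments are processed as they are seen, this model distinguishes "-Wall -Wno-extra" (sub-option off) from "-Wno-extra -Wall" (sub-option on) without recording any distance explicitly.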
Re: [C++ Patch] fix semi-random template specialization ICE
On Fri, May 4, 2012 at 4:48 AM, Martin Jambor mjam...@suse.cz wrote: Hi, On Thu, May 03, 2012 at 03:17:23PM -0300, Alexandre Oliva wrote: I've recently started getting “libstdc++-v3/include/functional:2057:63: internal compiler error: tree check: expected tree_vec, have error_mark in comp_template_args_with_info, at cp/pt.c:7038” on i686-linux-gnu, building libstdc++-v3/src/c++11/functexcept.cc -fPIC, at stage1 and on non-bootstrapped builds. The problem would not occur on x86_64-linux-gnu with the -m32 multilib. I suppose this is PR 53209. Thanks for dealing with this! Martin Jakub reported getting similar errors in the testsuite, but not in the libstdc++-v3 build. Bisection revealed that the patch that exposed the latent error was r186948, but I gather it only introduced more potentially-failing specializations in libstdc++-v3 at spots that wouldn't trigger the bug before. I couldn't pinpoint the exact source of randomness that causes the build to fail at precisely the same point on a given machine at a certain stage, but not on others. What I do know is that it occurs while iterating on a hash table, which, depending on how the hash is computed, may explain why we visit some nodes before others depending on environmentally-deterministic causes. Anyway, the problem is that, for some unsuitable candidate template specializations, tsubst returns error_mark_node, which tsubst_decl stores in argvec, and later on register_specialization gets this error_mark_node and tries to access it as a tree_vec. The trivial patch that avoids the misbehavior is returning error_mark_node as soon as we get that for argvec. Bootstrapped on i686-pc-linux-gnu and x86_64-linux-gnu, regstrapped on the latter. Ok to install? for gcc/cp/ChangeLog from Alexandre Oliva aol...@redhat.com * pt.c (tsubst_decl): Bail out if argvec is error_mark_node.
Index: gcc/cp/pt.c === --- gcc/cp/pt.c.orig 2012-04-30 15:34:44.018432544 -0300 +++ gcc/cp/pt.c 2012-04-30 15:34:47.988375071 -0300 @@ -10626,6 +10626,8 @@ tsubst_decl (tree t, tree args, tsubst_f tmpl = DECL_TI_TEMPLATE (t); gen_tmpl = most_general_template (tmpl); argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl); + if (argvec == error_mark_node) + RETURN (error_mark_node); hash = hash_tmpl_and_args (gen_tmpl, argvec); spec = retrieve_specialization (gen_tmpl, argvec, hash); } This does fix: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53209 Can someone review it? Thanks. -- H.J.
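The shape of the fix (return the error sentinel as soon as substitution produces it, before anything treats it as a vector) can be illustrated with a small standalone C model. Everything here is hypothetical: toy_node, toy_subst and toy_specialize merely stand in for tree, tsubst and tsubst_decl and share none of their real behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds; the sentinel plays the role of
   error_mark_node, the vector that of a TREE_VEC.  */
enum toy_kind { TOY_ERROR, TOY_VEC };
struct toy_node { enum toy_kind kind; size_t len; };

static struct toy_node toy_error_mark = { TOY_ERROR, 0 };

/* Hypothetical substitution step: yields the sentinel on failure.  */
static struct toy_node *
toy_subst (struct toy_node *args, int ok)
{
  return ok ? args : &toy_error_mark;
}

/* Returns the vector length, or -1 if substitution failed.  */
static long
toy_specialize (struct toy_node *args, int ok)
{
  struct toy_node *argvec = toy_subst (args, ok);

  /* Bail out before the sentinel is used as a vector.  */
  if (argvec == &toy_error_mark)
    return -1;

  /* Analogue of the tree check that fired in the reported ICE.  */
  assert (argvec->kind == TOY_VEC);
  return (long) argvec->len;
}
```

Without the early return, the failing case would reach the assertion with the sentinel, which mirrors the "expected tree_vec, have error_mark" failure quoted in the report.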
Re: PR 53249: Multiple address modes for same address space
On Sun, May 6, 2012 at 11:41 AM, Richard Sandiford rdsandif...@googlemail.com wrote: x32 uses a mixture of MEM address modes for the same address space. Some MEMs have SImode addresses, some have DImode. This means that the currently common idiom: targetm.addr_space.address_mode (MEM_ADDR_SPACE (mem)) isn't trustworthy. We have to use the mode of the address if it has one, and only fall back on the above for VOIDmode (CONST_INT) addresses. We actually already have two (identical) functions to calculate such a mode. The patch below puts the function in a more general place and uses it instead of the above for rtl-level stuff. I'm not sure whether what x32 is doing is a good thing, but I like the patch anyway because (a) it removes a duplicated function and (b) it at least abstracts the concept away. Bootstrapped regression-tested on x86_64-linux-gnu. Also tested to make sure that there were no differences for cc1 .ii files for MIPS n32, o32 and n64. (I used MIPS to get LO_SUM coverage.) OK to install? Richard gcc/ PR middle-end/53249 * dwarf2out.h (get_address_mode): Move declaration to... * rtl.h: ...here. * dwarf2out.c (get_address_mode): Move definition to... * rtlanal.c: ...here. * var-tracking.c (get_address_mode): Delete. * combine.c (find_split_point): Use get_address_mode instead of targetm.addr_space.address_mode. * cselib.c (cselib_record_sets): Likewise. * dse.c (canon_address, record_store): Likewise. * emit-rtl.c (adjust_address_1, offset_address): Likewise. * expr.c (move_by_pieces, emit_block_move_via_loop, store_by_pieces) (store_by_pieces_1, expand_assignment, store_expr, store_constructor) (expand_expr_real_1): Likewise. * ifcvt.c (noce_try_cmove_arith): Likewise. * optabs.c (maybe_legitimize_operand_same_code): Likewise. * reload.c (find_reloads): Likewise. * sched-deps.c (sched_analyze_1, sched_analyze_2): Likewise. * sel-sched-dump.c (debug_mem_addr_value): Likewise. Can you add a testcase? 
You can put the testcase from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53249#c4 in gcc.target/i386 with: /* { dg-do compile { target { ! { ia32 } } } } */ /* { dg-options "-O2 -mx32 -ftls-model=initial-exec -maddress-mode=short" } */ Thanks. -- H.J.
Re: PR 53249: Multiple address modes for same address space
On Sun, May 6, 2012 at 11:41 AM, Richard Sandiford rdsandif...@googlemail.com wrote: x32 uses a mixture of MEM address modes for the same address space. Some MEMs have SImode addresses, some have DImode. This means that the currently common idiom: targetm.addr_space.address_mode (MEM_ADDR_SPACE (mem)) isn't trustworthy. We have to use the mode of the address if it has one, and only fall back on the above for VOIDmode (CONST_INT) addresses. We actually already have two (identical) functions to calculate such a mode. The patch below puts the function in a more general place and uses it instead of the above for rtl-level stuff. I'm not sure whether what x32 is doing is a good thing, but I like the patch anyway because (a) it removes a duplicated function and (b) it at least abstracts the concept away. Bootstrapped regression-tested on x86_64-linux-gnu. Also tested to make sure that there were no differences for cc1 .ii files for MIPS n32, o32 and n64. (I used MIPS to get LO_SUM coverage.) OK to install? Richard gcc/ PR middle-end/53249 * dwarf2out.h (get_address_mode): Move declaration to... * rtl.h: ...here. * dwarf2out.c (get_address_mode): Move definition to... * rtlanal.c: ...here. * var-tracking.c (get_address_mode): Delete. * combine.c (find_split_point): Use get_address_mode instead of targetm.addr_space.address_mode. * cselib.c (cselib_record_sets): Likewise. * dse.c (canon_address, record_store): Likewise. * emit-rtl.c (adjust_address_1, offset_address): Likewise. * expr.c (move_by_pieces, emit_block_move_via_loop, store_by_pieces) (store_by_pieces_1, expand_assignment, store_expr, store_constructor) (expand_expr_real_1): Likewise. * ifcvt.c (noce_try_cmove_arith): Likewise. * optabs.c (maybe_legitimize_operand_same_code): Likewise. * reload.c (find_reloads): Likewise. * sched-deps.c (sched_analyze_1, sched_analyze_2): Likewise. * sel-sched-dump.c (debug_mem_addr_value): Likewise. 
Index: gcc/rtlanal.c === --- gcc/rtlanal.c 2012-05-06 16:17:20.0 +0100 +++ gcc/rtlanal.c 2012-05-06 16:17:20.298206160 +0100 @@ -5279,3 +5279,17 @@ low_bitmask_len (enum machine_mode mode, return exact_log2 (m + 1); } + +/* Return the mode of MEM's address. */ + +enum machine_mode +get_address_mode (rtx mem) +{ + enum machine_mode mode; + + gcc_assert (MEM_P (mem)); + mode = GET_MODE (XEXP (mem, 0)); + if (mode != VOIDmode) + return mode; + return targetm.addr_space.address_mode (MEM_ADDR_SPACE (mem)); +} Index: gcc/sel-sched-dump.c === --- gcc/sel-sched-dump.c 2012-05-06 16:17:20.0 +0100 +++ gcc/sel-sched-dump.c 2012-05-06 16:17:20.316206160 +0100 @@ -957,7 +957,7 @@ debug_mem_addr_value (rtx x) enum machine_mode address_mode; gcc_assert (MEM_P (x)); You should remove this assert since get_address_mode does it. - address_mode = targetm.addr_space.address_mode (MEM_ADDR_SPACE (x)); + address_mode = get_address_mode (x); t = shallow_copy_rtx (x); if (cselib_lookup (XEXP (t, 0), address_mode, 0, GET_MODE (t))) -- H.J.
[committed] Fix lower-subreg cost calculation
Georg-Johann Lay a...@gjlay.de writes: TARGET_RTX_COSTS gets called with x = (const_int 1) and outer = SET for example. How do I get SET_DEST from that information? I don't know if lower-subreg.c ever emits such cost requests, but several passes definitely do. Gah! I really should have remembered that insn_rtx_cost happily ignores both SETs and SET_DESTs, and skips straight to the SET_SRC. This caught me out when looking at the auto-inc-dec rewrite last year too. (The problem in that case was that insn_rtx_cost ignored the cost of MEMs in stores, and only took into account the cost of MEMs in loads.) While that probably ought to change, I felt like I was going down a rathole last time I looked at it, so this patch does what I should have done originally. For the record: I wondered whether rtlanal.c should base the default register-to-register copy cost for mode M on the lowest move_cost[M][c][c]. The problem is that move_cost has traditionally been used to choose between different classes in the same mode, rather than between modes, with 2 as the base cost. So I don't think it's suitable. Tested on x86_64-linux-gnu and with the upcoming MIPS costs. Installed. Sorry for the breakage. Richard gcc/ * lower-subreg.c (shift_cost): Use set_src_cost, avoiding the SET. (compute_costs): Likewise for the zero extension. Use set_rtx_cost to compute the cost of moves. Set the mode of the target register.
Index: gcc/lower-subreg.c === --- gcc/lower-subreg.c 2012-05-06 13:47:49.0 +0100 +++ gcc/lower-subreg.c 2012-05-06 14:56:47.851024108 +0100 @@ -135,13 +135,11 @@ struct cost_rtxes { shift_cost (bool speed_p, struct cost_rtxes *rtxes, enum rtx_code code, enum machine_mode mode, int op1) { - PUT_MODE (rtxes->target, mode); PUT_CODE (rtxes->shift, code); PUT_MODE (rtxes->shift, mode); PUT_MODE (rtxes->source, mode); XEXP (rtxes->shift, 1) = GEN_INT (op1); - SET_SRC (rtxes->set) = rtxes->shift; - return insn_rtx_cost (rtxes->set, speed_p); + return set_src_cost (rtxes->shift, speed_p); } /* For each X in the range [0, BITS_PER_WORD), set SPLITTING[X] @@ -189,11 +187,12 @@ compute_costs (bool speed_p, struct cost unsigned int i; int word_move_zero_cost, word_move_cost; + PUT_MODE (rtxes->target, word_mode); SET_SRC (rtxes->set) = CONST0_RTX (word_mode); - word_move_zero_cost = insn_rtx_cost (rtxes->set, speed_p); + word_move_zero_cost = set_rtx_cost (rtxes->set, speed_p); SET_SRC (rtxes->set) = rtxes->source; - word_move_cost = insn_rtx_cost (rtxes->set, speed_p); + word_move_cost = set_rtx_cost (rtxes->set, speed_p); if (LOG_COSTS) fprintf (stderr, "%s move: from zero cost %d, from reg cost %d\n", @@ -209,7 +208,7 @@ compute_costs (bool speed_p, struct cost PUT_MODE (rtxes->target, mode); PUT_MODE (rtxes->source, mode); - mode_move_cost = insn_rtx_cost (rtxes->set, speed_p); + mode_move_cost = set_rtx_cost (rtxes->set, speed_p); if (LOG_COSTS) fprintf (stderr, "%s move: original cost %d, split cost %d * %d\n", @@ -236,10 +235,8 @@ compute_costs (bool speed_p, struct cost /* The only case here to check to see if moving the upper part with a zero is cheaper than doing the zext itself. */ - PUT_MODE (rtxes->target, twice_word_mode); PUT_MODE (rtxes->source, word_mode); - SET_SRC (rtxes->set) = rtxes->zext; - zext_cost = insn_rtx_cost (rtxes->set, speed_p); + zext_cost = set_src_cost (rtxes->zext, speed_p); if (LOG_COSTS) fprintf (stderr, "%s %s: original cost %d, split cost %d + %d\n",
Re: PR 53249: Multiple address modes for same address space
H.J. Lu hjl.to...@gmail.com writes: Index: gcc/sel-sched-dump.c === --- gcc/sel-sched-dump.c 2012-05-06 16:17:20.0 +0100 +++ gcc/sel-sched-dump.c 2012-05-06 16:17:20.316206160 +0100 @@ -957,7 +957,7 @@ debug_mem_addr_value (rtx x) enum machine_mode address_mode; gcc_assert (MEM_P (x)); You should remove this assert since get_address_mode does it. I think it's better to keep it. Richard - address_mode = targetm.addr_space.address_mode (MEM_ADDR_SPACE (x)); + address_mode = get_address_mode (x); t = shallow_copy_rtx (x); if (cselib_lookup (XEXP (t, 0), address_mode, 0, GET_MODE (t)))
[committed] Add SET rtx costs for MIPS
This patch adds SET rtx costs to MIPS. Since FPR modes and GPR modes aren't tieable, the effect is to restore the original lower-subreg behaviour of splitting all multiword modes. Tested by setting LOG_COSTS to 1 and checking that the costs looked sensible. Also tested by compiling cc1 .ii files for -mabi=n32, -mabi=64, -mabi=32 and -mabi=32 -mfp64. The output was the same as when FORCE_LOWERING was set to 1, but different from unmodified trunk. Applied. Richard gcc/ * config/mips/mips.c (mips_set_reg_reg_piece_cost): New function. (mips_set_reg_reg_cost): Likewise. (mips_rtx_costs): Handle SET. Index: gcc/config/mips/mips.c === --- gcc/config/mips/mips.c 2012-05-06 13:47:49.0 +0100 +++ gcc/config/mips/mips.c 2012-05-06 14:10:25.636105001 +0100 @@ -3490,6 +3490,37 @@ mips_zero_extend_cost (enum machine_mode return COSTS_N_INSNS (1); } +/* Return the cost of moving between two registers of mode MODE, + assuming that the move will be in pieces of at most UNITS bytes. */ + +static int +mips_set_reg_reg_piece_cost (enum machine_mode mode, unsigned int units) +{ + return COSTS_N_INSNS ((GET_MODE_SIZE (mode) + units - 1) / units); +} + +/* Return the cost of moving between two registers of mode MODE. */ + +static int +mips_set_reg_reg_cost (enum machine_mode mode) +{ + switch (GET_MODE_CLASS (mode)) +{ +case MODE_CC: + return mips_set_reg_reg_piece_cost (mode, GET_MODE_SIZE (CCmode)); + +case MODE_FLOAT: +case MODE_COMPLEX_FLOAT: +case MODE_VECTOR_FLOAT: + if (TARGET_HARD_FLOAT) + return mips_set_reg_reg_piece_cost (mode, UNITS_PER_HWFPVALUE); + /* Fall through */ + +default: + return mips_set_reg_reg_piece_cost (mode, UNITS_PER_WORD); +} +} + /* Implement TARGET_RTX_COSTS. 
*/ static bool @@ -3877,6 +3908,15 @@ mips_rtx_costs (rtx x, int code, int out *total = mips_cost->fp_add; return false; +case SET: + if (register_operand (SET_DEST (x), VOIDmode) + && reg_or_0_operand (SET_SRC (x), VOIDmode)) + { + *total = mips_set_reg_reg_cost (GET_MODE (SET_DEST (x))); + return true; + } + return false; + default: return false; }
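The piece-cost arithmetic in mips_set_reg_reg_piece_cost is a ceiling division: a move of a mode is charged one instruction per piece of at most UNITS bytes. A standalone C sketch of that calculation follows. The toy_* names are hypothetical, and TOY_COSTS_N_INSNS assumes that one instruction is worth 4 cost units, which is how I understand GCC's COSTS_N_INSNS to be defined; treat that scale factor as an assumption.

```c
#include <assert.h>

/* Assumed scale: one instruction costs 4 units.  */
#define TOY_COSTS_N_INSNS(n) ((n) * 4)

/* Cost of moving MODE_SIZE bytes between registers in pieces of at
   most UNITS bytes: ceil (MODE_SIZE / UNITS) instructions.  */
static int
toy_set_reg_reg_piece_cost (unsigned int mode_size, unsigned int units)
{
  assert (units > 0);
  return TOY_COSTS_N_INSNS ((mode_size + units - 1) / units);
}
```

Under these assumptions an 8-byte value moved in 4-byte pieces costs two instructions, while the same value moved in 8-byte pieces costs one, which is why multiword modes look worth splitting when register moves are word-sized.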
[Fortran, patch] PR 52158 - Regression on character function with gfortran 4.7
Hello, my name is Alessandro. I'm new to GCC and, with help from Tobias Burnus and Paul Thomas, I'll try to add support for final subroutines. The patch is bootstrapped and tested on x86_64-unknown-linux-gnu - gcc version 4.8.0 20120506 (experimental) Best regards. gcc/fortran/ChangeLog 2012-05-06 Alessandro Fanfarillo fanfarillo@gmail.com Paul Thomas pa...@gcc.gnu.org Tobias Burnus bur...@net-b.de PR fortran/52158 * resolve.c (resolve_fl_derived0): Add a new condition in the if statement of the deferred-length character component error block. * trans-expr.c (gfc_conv_procedure_call): Add new checks in the if statement on component's attributes (regarding PR 45170). gcc/testsuite/ChangeLog 2012-05-06 Alessandro Fanfarillo fanfarillo@gmail.com Damian Rouson dam...@rouson.net PR fortran/45170 * gfortran.dg/deferred_type_param_3.f90: New. Patch.diff --- gcc-4.8/gcc/fortran/resolve.c 2012-05-06 19:29:21.794825508 +0200 +++ gcc-4.8-patched/gcc/fortran/resolve.c 2012-05-06 19:24:40.770831649 +0200 @@ -11666,7 +11666,7 @@ for ( ; c != NULL; c = c->next) { /* See PRs 51550, 47545, 48654, 49050, 51075 - and 45170. */ - if (c->ts.type == BT_CHARACTER && c->ts.deferred) + if (c->ts.type == BT_CHARACTER && c->ts.deferred && !c->attr.function) { gfc_error ("Deferred-length character component '%s' at %L is not yet supported", c->name, &c->loc); diff -urN gcc-4.8/gcc/fortran/trans-expr.c gcc-4.8-patched/gcc/fortran/trans-expr.c --- gcc-4.8/gcc/fortran/trans-expr.c 2012-05-06 19:29:21.878825505 +0200 +++ gcc-4.8-patched/gcc/fortran/trans-expr.c 2012-05-06 19:25:53.134830069 +0200 @@ -4175,7 +4175,9 @@ we take the character length of the first argument for the result.
For dummies, we have to look through the formal argument list for this function and use the character length found there.*/ - if (ts.deferred && (sym->attr.allocatable || sym->attr.pointer)) + if (ts.deferred && ((!comp && (sym->attr.allocatable + || sym->attr.pointer)) || (comp && (comp->attr.allocatable + || comp->attr.pointer)))) cl.backend_decl = gfc_create_var (gfc_charlen_type_node, "slen"); else if (!sym->attr.dummy) cl.backend_decl = VEC_index (tree, stringargs, 0); diff -urN gcc-4.8/gcc/testsuite/gfortran.dg/deferred_type_param_3.f90 gcc-4.8-patched/gcc/testsuite/gfortran.dg/deferred_type_param_3.f90 --- gcc-4.8/gcc/testsuite/gfortran.dg/deferred_type_param_3.f90 1970-01-01 01:00:00.0 +0100 +++ gcc-4.8-patched/gcc/testsuite/gfortran.dg/deferred_type_param_3.f90 2012-05-06 19:26:29.498829273 +0200 @@ -0,0 +1,21 @@ +! { dg-do compile } +! +! PR fortran/45170 +! +! Contributed by Damian Rouson + +module speaker_class + type speaker + contains +procedure :: speak + end type +contains + function speak(this) +class(speaker) ,intent(in) :: this +character(:) ,allocatable :: speak + end function + subroutine say_something(somebody) +class(speaker) :: somebody +print *,somebody%speak() + end subroutine +end module
Re: [committed] Add SET rtx costs for MIPS / [SH] PR 53250
On Sun, 2012-05-06 at 20:13 +0100, Richard Sandiford wrote: This patch adds SET rtx costs to MIPS. Since FPR modes and GPR modes aren't tieable, the effect is to restore the original lower-subreg behaviour of splitting all multiword modes. Tested by setting LOG_COSTS to 1 and checking that the costs looked sensible. Also tested by compiling cc1 .ii files for -mabi=n32, -mabi=64, -mabi=32 and -mabi=32 -mfp64. The output was the same as when FORCE_LOWERING was set to 1, but different from unmodified trunk. Applied. The attached patch does pretty much the same for the SH target. Tested also by setting LOG_COSTS to 1 and checking that multi-word modes are marked for splitting (except for DImode zero_extend lowering). Also verified that newlib compiles again. OK? Cheers, Oleg ChangeLog: PR target/53250 * config/sh/sh.c (sh_rtx_costs): Handle SET case to restore original behavior of lower-subreg. Index: gcc/config/sh/sh.c === --- gcc/config/sh/sh.c (revision 187212) +++ gcc/config/sh/sh.c (working copy) @@ -2999,6 +2999,27 @@ { switch (code) { + /* The lower-subreg pass decides whether to split multi-word regs + into individual regs by looking at the cost for a REG of certain + modes with the following patterns: + (set (reg) (reg)) + (set (reg) (const_int 0)) + On machines that support vector move operations a multi-word move + is the same cost as individual reg move. On SH there is no + vector-move, so we have to provide the correct cost in the number + of move insns to load/store the reg of the mode in question. */ +case SET: + if (register_operand (SET_DEST (x), VOIDmode) + && (register_operand (SET_SRC (x), VOIDmode) + || satisfies_constraint_Z (SET_SRC (x)))) + { + const enum machine_mode mode = GET_MODE (SET_DEST (x)); + *total = COSTS_N_INSNS (GET_MODE_SIZE (mode) + / mov_insn_size (mode, TARGET_SH2A)); + return true; +} + return false; + case CONST_INT: if (TARGET_SHMEDIA) {
Re: [C++ Patch] for c++/51214
2012/2/29 Jason Merrill ja...@redhat.com: On 02/28/2012 05:06 PM, Fabien Chêne wrote: I agree, this is not efficient but I didn't find a better place. Perhaps in cp_parser_enumerator_list; that would require adding an additional parameter to keep track of all the enum DECLs. Is that what you have in mind? I was thinking of finish_enum_value_list. OK great. I have tried to reuse the existing infrastructure to extend the CLASSTYPE_SORTED_FIELDS; unfortunately, it does not seem possible because the code uses a tree chain (chained with DECL_CHAIN), and this field is already used for enum values to store the enum type. Among various possibilities, in the end, I think it is clearer to handle the late-defined enum case separately. That is what I have done in the attached patch. Unqualified lookup works because when the type is not complete, the lookup uses the non-sorted case, which always works. OK, just make sure we have a test for that. I have added a check in forw_enum11.C for that. Bootstrapped and tested on x86_64-unknown-linux-gnu, OK to commit? gcc/testsuite/ChangeLog 2012-05-06 Fabien Chêne fab...@gcc.gnu.org PR c++/51214 * g++.dg/cpp0x/forw_enum11.C: New. gcc/cp/ChangeLog 2012-05-06 Fabien Chêne fab...@gcc.gnu.org PR c++/51214 * cp-tree.h (insert_late_enum_def_into_classtype_sorted_fields): Declare. * class.c (insert_into_classtype_sorted_fields): New. (add_enum_fields_to_record_type): New. (count_fields): Adjust the comment. (add_fields_to_record_type): Likewise. (finish_struct_1): Move the code that inserts the fields for the sorted case, into insert_into_classtype_sorted_fields, and call it. (insert_late_enum_def_into_classtype_sorted_fields): Define. * decl.c (finish_enum_value_list): Call insert_late_enum_def_into_classtype_sorted_fields if a late enum definition is encountered. -- Fabien pr51214.patch Description: Binary data
[Patch, Fortran, committed] PR41587 - fix diagnostic for pointer/alloc CLASS with non-deferred array spec
Rather obvious after finding it …

I first used t != FAILURE; however, that gives additional error
messages of the form:

   class(t0), pointer :: foo(3) ! { dg-error "must have a deferred shape" }
                              1
Error: Component 'foo' with CLASS at (1) must be allocatable or pointer

Thus, I decided to always call gfc_build_class_symbol.

Committed (Rev. 187214) after building and regtesting it on
x86-64-gnu-linux.

Tobias

2012-05-06  Tobias Burnus  bur...@net-b.de

	PR fortran/41587
	* decl.c (build_struct): Don't ignore FAILED status.

2012-05-06  Tobias Burnus  bur...@net-b.de

	PR fortran/41587
	* gfortran.dg/class_array_13.f90: New.

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 4da21c3..b527dd0 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -1653,17 +1653,20 @@ build_struct (const char *name, gfc_charlen *cl, gfc_expr **init,
     }
 
 scalar:
   if (c->ts.type == BT_CLASS)
     {
       bool delayed = (gfc_state_stack->sym == c->ts.u.derived)
                      || (!c->ts.u.derived->components
                          && !c->ts.u.derived->attr.zero_comp);
-      return gfc_build_class_symbol (&c->ts, &c->attr, &c->as, delayed);
+      gfc_try t2 = gfc_build_class_symbol (&c->ts, &c->attr, &c->as, delayed);
+
+      if (t != FAILURE)
+        t = t2;
     }
 
   return t;
 }
 
 
 /* Match a 'NULL()', and possibly take care of some side effects.  */
 
--- /dev/null 2012-05-04 18:48:20.115791170 +0200
+++ gcc/gcc/testsuite/gfortran.dg/class_array_13.f90 2012-05-06 18:48:31.0 +0200
@@ -0,0 +1,26 @@
+! { dg-do compile }
+! { dg-options "-fcoarray=single" }
+!
+! PR fortran/41587
+!
+
+type t0
+  integer :: j = 42
+end type t0
+
+type t
+  integer :: i
+  class(t0), allocatable :: foo(3) ! { dg-error "must have a deferred shape" }
+end type t
+
+type t2
+  integer :: i
+  class(t0), pointer :: foo(3) ! { dg-error "must have a deferred shape" }
+end type t2
+
+type t3
+  integer :: i
+  class(t0), allocatable :: foo[3] ! { dg-error "Upper bound of last coarray dimension must be '\\*'" }
+end type t3
+
+end
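The status handling in the decl.c hunk above can be pictured with a small standalone C sketch. The type and function names are hypothetical stand-ins (sketch_try only imitates gfortran's gfc_try); the point is just that the later step always runs, while an earlier failure is never overwritten by a later success.

```c
/* Standalone sketch; sketch_try is a hypothetical stand-in for gfc_try.  */
typedef enum { SKETCH_SUCCESS = 1, SKETCH_FAILURE } sketch_try;

/* Merge the status T of an earlier check with the status T2 of a later
   step that has already been executed unconditionally.  The result is
   SKETCH_SUCCESS only if both steps succeeded.  */
sketch_try
sketch_merge_status (sketch_try t, sketch_try t2)
{
  if (t != SKETCH_FAILURE)
    t = t2;
  return t;
}
```

With this shape the caller gets the side effects of the second step (here, the diagnostics it emits) regardless of the first result, which seems to be the intent of always calling gfc_build_class_symbol.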
[PATCH, i386]: Fix PR 53227, FAIL: gcc.target/i386/movbe-2.c scan-assembler-times movbe[ \t] 4
Hello!

Attached patch splits bswap patterns on 32bit targets by hand, as is
the case with all other DImode patterns.  The patch takes into account
memory operands, where it swaps high/low word load according to
bswap/movbe insn availability, and generates xchg %rX, %rY for reg-reg
swaps, avoiding a move to/from temporary register.

2012-05-06  Uros Bizjak  ubiz...@gmail.com

	PR target/53227
	* config/i386/i386.md (swap<mode>): Rename from *swap<mode>.
	(bswapdi2): Split from bswap<mode>2.  Use nonimmediate_operand
	predicate for operand 1.  Force operand 1 to register for
	TARGET_BSWAP.
	(bswapsi2): Ditto.
	(*bswapdi2_doubleword): New insn pattern.
	(*bswap<mode>2): Rename from *bswap<mode>2_1.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu.
Committed to mainline SVN.

Uros.

Index: i386.md
===
--- i386.md (revision 187211)
+++ i386.md (working copy)
@@ -2406,7 +2406,7 @@
    (set_attr "memory" "load")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*swap<mode>"
+(define_insn "swap<mode>"
   [(set (match_operand:SWI48 0 "register_operand" "+r")
         (match_operand:SWI48 1 "register_operand" "+r"))
    (set (match_dup 1)
@@ -12487,13 +12487,71 @@
    (set_attr "type" "bitmanip")
    (set_attr "mode" "SI")])
 
-(define_expand "bswap<mode>2"
-  [(set (match_operand:SWI48 0 "register_operand")
-        (bswap:SWI48 (match_operand:SWI48 1 "register_operand")))]
+(define_expand "bswapdi2"
+  [(set (match_operand:DI 0 "register_operand")
+        (bswap:DI (match_operand:DI 1 "nonimmediate_operand")))]
   ""
 {
-  if (<MODE>mode == SImode && !(TARGET_BSWAP || TARGET_MOVBE))
+  if (TARGET_64BIT && !TARGET_MOVBE)
+    operands[1] = force_reg (DImode, operands[1]);
+})
+
+(define_insn_and_split "*bswapdi2_doubleword"
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,m")
+        (bswap:DI
+          (match_operand:DI 1 "nonimmediate_operand" "0,m,r")))]
+  "!TARGET_64BIT
+   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 2)
+        (bswap:SI (match_dup 1)))
+   (set (match_dup 0)
+        (bswap:SI (match_dup 3)))]
+{
+  split_double_mode (DImode, &operands[0], 2, &operands[0], &operands[2]);
+
+  if (REG_P (operands[0]) && REG_P (operands[1]))
+    {
+      emit_insn (gen_swapsi (operands[0], operands[2]));
+      emit_insn (gen_bswapsi2 (operands[0], operands[0]));
+      emit_insn (gen_bswapsi2 (operands[2], operands[2]));
+      DONE;
+    }
+
+  if (!TARGET_MOVBE)
+    {
+      if (MEM_P (operands[0]))
+        {
+          emit_insn (gen_bswapsi2 (operands[3], operands[3]));
+          emit_insn (gen_bswapsi2 (operands[1], operands[1]));
+
+          emit_move_insn (operands[0], operands[3]);
+          emit_move_insn (operands[2], operands[1]);
+        }
+      if (MEM_P (operands[1]))
+        {
+          emit_move_insn (operands[2], operands[1]);
+          emit_move_insn (operands[0], operands[3]);
+
+          emit_insn (gen_bswapsi2 (operands[2], operands[2]));
+          emit_insn (gen_bswapsi2 (operands[0], operands[0]));
+        }
+      DONE;
+    }
+})
+
+(define_expand "bswapsi2"
+  [(set (match_operand:SI 0 "register_operand")
+        (bswap:SI (match_operand:SI 1 "nonimmediate_operand")))]
+  ""
+{
+  if (TARGET_MOVBE)
+    ;
+  else if (TARGET_BSWAP)
+    operands[1] = force_reg (SImode, operands[1]);
+  else
+    {
       rtx x = operands[0];
 
       emit_move_insn (x, operands[1]);
@@ -12519,7 +12577,7 @@
    (set_attr "prefix_extra" "*,1,1")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*bswap<mode>2_1"
+(define_insn "*bswap<mode>2"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
         (bswap:SWI48 (match_operand:SWI48 1 "register_operand" "0")))]
   "TARGET_BSWAP"
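The doubleword split above relies on the fact that a 64-bit byte swap is the same as exchanging the two 32-bit halves and byte-swapping each half. A portable C sketch of that identity follows; the function names are hypothetical and the code only models the arithmetic, not the RTL the pattern emits.

```c
#include <stdint.h>

/* Standalone sketch; function names are hypothetical.  */

/* Byte-swap a 32-bit word.  */
uint32_t
sketch_bswap32 (uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Byte-swap a 64-bit value using only 32-bit operations: exchange the
   low and high words, then byte-swap each word.  */
uint64_t
sketch_bswap64_by_halves (uint64_t x)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);
  uint32_t new_lo = sketch_bswap32 (hi);
  uint32_t new_hi = sketch_bswap32 (lo);

  return ((uint64_t) new_hi << 32) | new_lo;
}
```

In the reg-reg case of the pattern the word exchange corresponds to the swap insn and the two word-level swaps to the two bswapsi2 insns; in the memory cases the exchange is folded into which word each move reads or writes.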
[patch][m68k] Remove sched_branch_type, reduce genattrtab run time to reasonable numbers
Hello,

Since around trunk r135033, m68k has some scheduler attributes that are
computed by C functions in m68k.c.  Together with Richard Sandiford's
improvements to genattrtab optimizations, the run time for genattrtab
for m68k is 9 minutes on a fast machine (gcc110).

With the attached patch, genattrtab goes down to less than 2 minutes.
But the only thing the patch does is remove a write-only array,
sched_branch_type!  This array was apparently introduced to compute the
best type-attribute for four branch instructions, with a FIXME that
someone should implement the actual computations for the best type.
However, exactly four years have passed since this code was added, and
nobody has bothered to actually implement this better type attribute
assignment.  To me, it makes no sense to keep this code around, given
the problems it creates for genattrtab.

Tested by building a cross to m68k-linux.  OK for trunk?

Ciao!
Steven

PR52391_no_sched_branch_type.diff
Description: Binary data
Re: [committed] Add SET rtx costs for MIPS / [SH] PR 53250
Oleg Endo oleg.e...@t-online.de wrote:
> The attached patch does pretty much the same for the SH target.
> Tested also by setting LOG_COSTS to 1 and checking that multi-word modes
> are marked for splitting (except for DImode zero_extend lowering).
> Also verified that newlib compiles again.
>
> OK?
>
> Cheers,
> Oleg
>
> ChangeLog:
>
> 	PR target/53250
> 	* config/sh/sh.c (sh_rtx_costs): Handle SET case to restore
> 	original behavior of lower-subreg.

Looks fine, though a terser ChangeLog entry would be better.  We
usually don't include the how and why there.  Something like MIPS's

	(mips_rtx_costs): Handle SET.

is enough, I think.  OK with that change.  Thanks for fixing this!

Regards,
	kaz
Re: [PATCH] x86: emit tzcnt unconditionally
On Mon, Apr 30, 2012 at 10:09 AM, Uros Bizjak ubiz...@gmail.com wrote:
> On Fri, Apr 27, 2012 at 3:30 PM, Paolo Bonzini bonz...@gnu.org wrote:
>> tzcnt is encoded as rep;bsf and unlike lzcnt is a drop-in replacement
>> if we don't care about the flags (it has the same semantics for non-zero
>> values).
>>
>> Since bsf is usually slower, just emit tzcnt unconditionally.  However,
>> write it as rep;bsf unless -mbmi is in use, to cater for old assemblers.
>
> Please emit rep;bsf when optimize_insn_for_speed_p () is true.
>
>> Bootstrapped on a non-BMI x86_64-linux host, regtest in progress.
>> Ok for mainline?
>
> OK with the optimize_insn_for_speed_p conditional.

I have committed a similar patch, where we emit bsf when optimizing for
size (saving a whopping one byte) and rep;bsf for !TARGET_BMI.

The same functionality can be added to *ffs<mode>_1, since we don't care
what ends up in the register when the input operand is 0 (this is the
key difference between tzcnt and bsf).

2012-05-06  Uros Bizjak  ubiz...@gmail.com
	    Paolo Bonzini  bonz...@gnu.org

	* config/i386/i386.md (ctz<mode>2): Emit rep;bsf even for
	!TARGET_BMI and bsf when optimizing for size.
	(*ffs<mode>_1): Ditto.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

Uros.

Index: i386.md
===
--- i386.md (revision 187217)
+++ i386.md (working copy)
@@ -12112,9 +12112,22 @@
    (set (match_operand:SWI48 0 "register_operand" "=r")
         (ctz:SWI48 (match_dup 1)))]
   ""
-  "bsf{<imodesuffix>}\t{%1, %0|%0, %1}"
+{
+  if (optimize_function_for_size_p (cfun))
+    return "bsf{<imodesuffix>}\t{%1, %0|%0, %1}";
+  else if (TARGET_BMI)
+    return "tzcnt{<imodesuffix>}\t{%1, %0|%0, %1}";
+  else
+    /* tzcnt expands to rep;bsf and we can use it even if !TARGET_BMI.  */
+    return "rep; bsf{<imodesuffix>}\t{%1, %0|%0, %1}";
+}
   [(set_attr "type" "alu1")
    (set_attr "prefix_0f" "1")
+   (set (attr "prefix_rep")
+     (if_then_else
+       (match_test "optimize_function_for_size_p (cfun)")
+       (const_string "0")
+       (const_string "1")))
    (set_attr "mode" "<MODE>")])
 
 (define_insn "ctz<mode>2"
@@ -12123,14 +12136,21 @@
    (clobber (reg:CC FLAGS_REG))]
   ""
 {
-  if (TARGET_BMI)
+  if (optimize_function_for_size_p (cfun))
+    return "bsf{<imodesuffix>}\t{%1, %0|%0, %1}";
+  else if (TARGET_BMI)
     return "tzcnt{<imodesuffix>}\t{%1, %0|%0, %1}";
-  else
-    return "bsf{<imodesuffix>}\t{%1, %0|%0, %1}";
+  else
+    /* tzcnt expands to rep;bsf and we can use it even if !TARGET_BMI.  */
+    return "rep; bsf{<imodesuffix>}\t{%1, %0|%0, %1}";
 }
   [(set_attr "type" "alu1")
    (set_attr "prefix_0f" "1")
-   (set (attr "prefix_rep") (symbol_ref "TARGET_BMI"))
+   (set (attr "prefix_rep")
+     (if_then_else
+       (match_test "optimize_function_for_size_p (cfun)")
+       (const_string "0")
+       (const_string "1")))
    (set_attr "mode" "<MODE>")])
 
 (define_expand "clz<mode>2"
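To make the tzcnt/bsf remark above concrete, here is a small C model. It is only a sketch with hypothetical function names: sketch_tzcnt32 models the documented tzcnt result (the operand width for a zero input), while bsf leaves its destination undefined for zero, so the two agree only on non-zero inputs. The ffs wrapper shows why that difference does not matter there: zero is handled separately and the count is never consulted for it.

```c
#include <stdint.h>

/* Standalone sketch; function names are hypothetical.  */

/* Model of tzcnt on a 32-bit operand: the number of trailing zero bits,
   with the operand width (32) returned for a zero input.  */
unsigned
sketch_tzcnt32 (uint32_t x)
{
  unsigned n = 0;

  if (x == 0)
    return 32;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* ffs expressed through a trailing-zero count: 1-based index of the
   lowest set bit, or 0 if no bit is set.  The zero case never uses the
   count, so any instruction that is correct for non-zero inputs will do.  */
unsigned
sketch_ffs32 (uint32_t x)
{
  return x == 0 ? 0 : sketch_tzcnt32 (x) + 1;
}
```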
Fix the java-home OS include directory.
If the libjava configure option --enable-java-home is used, the OS
directory under include is always 'linux', because the name is
hardcoded.  That is, it is neither configurable with
'--with-os-directory' nor auto-detected, contrary to what the configure
help text suggests.

--
Steven

2012-05-07  Steven Drake  s...@netbsd.org

libjava:
	* Makefile.am (install-data-local): Use the $(OS) variable for the
	java-home os directory under include.

diff --git a/libjava/Makefile.am b/libjava/Makefile.am
index 1b71962..b40fa76 100644
--- a/libjava/Makefile.am
+++ b/libjava/Makefile.am
@@ -899,7 +899,7 @@ if CREATE_JAVA_HOME
 	cd $(DESTDIR)$(JRE_LIB_DIR)/security; \
 	ln -sf $$RELATIVE/classpath.security java.security; \
 	cd $$working_dir; \
-	$(mkinstalldirs) $(DESTDIR)$(SDK_INCLUDE_DIR)/linux; \
+	$(mkinstalldirs) $(DESTDIR)$(SDK_INCLUDE_DIR)/$(OS); \
 	$(mkinstalldirs) $(DESTDIR)$(JRE_LIB_DIR)/$(CPU)/client; \
 	$(mkinstalldirs) $(DESTDIR)$(JRE_LIB_DIR)/$(CPU)/server; \
 	$(mkinstalldirs) $(DESTDIR)$(SDK_LIB_DIR); \
@@ -935,9 +935,9 @@ if CREATE_JAVA_HOME
 	  DIRECTORY=$$(dirname $$($(DESTDIR)$(bindir)/`echo gcj | sed 's,^.*/,,;$(transform);s/$$/$(EXEEXT)/'` \
 	    -print-file-name=include/$$headername.h)); \
 	  RELATIVE=$$(relative $$DIRECTORY \
-	    $(DESTDIR)$(SDK_INCLUDE_DIR)/linux); \
+	    $(DESTDIR)$(SDK_INCLUDE_DIR)/$(OS)); \
 	  ln -sf $$RELATIVE/$$headername.h \
-	    $(DESTDIR)$(SDK_INCLUDE_DIR)/linux/$$headername.h; \
+	    $(DESTDIR)$(SDK_INCLUDE_DIR)/$(OS)/$$headername.h; \
 	done; \
 	RELATIVE=$$(relative $(DESTDIR)$(datadir)/java \
 	  $(DESTDIR)$(JVM_ROOT_DIR)/$(SDK_DIR));
Re: [PATCH] x86: emit tzcnt unconditionally
On Mon, May 07, 2012 at 01:04:33AM +0200, Uros Bizjak wrote:
> Index: i386.md
> ===
> --- i386.md (revision 187217)
> +++ i386.md (working copy)
> @@ -12112,9 +12112,22 @@
>     (set (match_operand:SWI48 0 "register_operand" "=r")
>          (ctz:SWI48 (match_dup 1)))]
>    ""
> -  "bsf{<imodesuffix>}\t{%1, %0|%0, %1}"
> +{
> +  if (optimize_function_for_size_p (cfun))
> +    return "bsf{<imodesuffix>}\t{%1, %0|%0, %1}";
> +  else if (TARGET_BMI)
> +    return "tzcnt{<imodesuffix>}\t{%1, %0|%0, %1}";
> +  else
> +    /* tzcnt expands to rep;bsf and we can use it even if !TARGET_BMI.  */
> +    return "rep; bsf{<imodesuffix>}\t{%1, %0|%0, %1}";
> +}

Shouldn't that be done only for generic tuning?  If somebody uses
-mtune=native, then emitting rep; bsf is overkill, the code is intended
to be run on a CPU without (or with TARGET_BMI with) tzcnt insn support.

	Jakub