Re: [RFC, ivopts] fix bugs in ivopts address cost computation
Hi,

For the following code change,

@@ -4212,11 +4064,6 @@ get_computation_cost_at (struct ivopts_d
       cost.cost += adjust_setup_cost (data,
				      add_cost (TYPE_MODE (ctype), speed));
 
-  /* Having offset does not affect runtime cost in case it is added to
-     symbol, but it increases complexity.  */
-  if (offset)
-    cost.complexity++;
-
   cost.cost += add_cost (TYPE_MODE (ctype), speed);
 
   aratio = ratio > 0 ? ratio : -ratio;

I think this shouldn't be removed. The offset may depend on where the reduction variable accumulation statement is inserted: a use rewritten before the accumulation and one rewritten after it see different offsets, so the cost of replacing the use point with the reduction variable should differ accordingly.

BTW, I personally think the current ivopts cost modelling basically works fine, although some tuning might be needed. The most difficult part is that the choice of reduction variable candidates interacts with register pressure cost, while the register cost estimate is not accurate enough at this stage because we don't have the back-end live-range interference graph at all. We can always find holes in particular cases or benchmarks, but we shouldn't aim only at an optimal result for those; the tuning needs to be backed by more comprehensive results.

Thanks,
-Jiangning

2012/6/6 Sandra Loosemore :
> My colleagues and I have been working on the GCC port for the Qualcomm
> Hexagon. Along the way I noticed that we were getting poor results
> from the ivopts pass no matter how we adjusted the target-specific RTX
> costs. In many cases ivopts was coming up with candidate use costs
> that seemed completely inconsistent with the target cost model. On
> further inspection, I found what appears to be a whole bunch of bugs
> in the way ivopts is computing address costs:
>
> (1) While the address cost computation is assuming in some situations
> that pre/post increment/decrement addressing will be used if
> supported by the target, it isn't actually using the target's address
> cost for such forms -- instead, just the cost of the form that would
> be used if autoinc weren't available/applicable.
>
> (2) The computation to determine which multiplier values are supported
> by target addressing modes is constructing an address rtx of the form
> (reg * ratio) to do the tests. This isn't a valid address RTX on
> Hexagon, although both (reg + reg * ratio) and (sym + reg * ratio)
> are. Because it's choosing the wrong address form to probe with, it
> thinks that the target doesn't support multipliers at all and is
> incorrectly tacking on an extra cost for them. I also note that it's
> assuming that the same set of ratios are supported by all three
> address forms that can potentially include them, and that all valid
> ratios have the same cost.
>
> (3) The computation to determine the range of valid constant offsets
> for address forms that can include them is probing the upper end of
> the range using constants of the form ((1<<N) - 1). But on Hexagon,
> offsets have to be aligned appropriately for the mode, so it's
> incorrectly rejecting all positive offsets for non-char modes. And
> again, it's assuming that the same range of offsets are supported by
> all address forms that can legitimately include them, and that all
> valid offsets have the same cost. The latter isn't true on Hexagon.
>
> (4) The cost adjustment for converting the symbol_present address to a
> var_present address seems overly optimistic in assuming that the
> symbol load will be hoisted outside the loop.
> I looked at a lot of
> code where this was not happening no matter how expensive I made the
> absolute addressing forms in the target-specific costs. Also, if
> subsequent passes actually do hoist the symbol load, this requires an
> additional register to be available over the entire loop, which ivopts
> isn't accounting for in any way. It seems to me that this adjustment
> shouldn't be attempted when the symbol_present form is a legitimate
> address (because subsequent passes are not doing the anticipated
> optimization in that case). There's also a bug present in the cost
> accounting: it's adding the full cost of the symbol + var addition,
> whereas it should be pro-rated across iterations (see the way this is
> handled for the non-address cases in get_computation_cost_at).
>
> (5) The documentation of TARGET_ADDRESS_COST says that when two
> (legitimate) address expressions have the same lowest cost, the one
> with the higher complexity is used. But, the ivopts code is doing the
> opposite of this and choosing the lower complexity as the tie-breaker.
> (Actually, it's also computing the complexity without regard to
> whether the address rtx is even legitimate on the target.)
>
> (6) The way get_address_costs is precomputing and caching a complete
> set of cost data for each addressing mode seems incorrect to me, in
> the general case. For example, consider MIPS where
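Regarding point (2) above, the probe itself is cheap to fix. Below is a minimal sketch, not taken from any posted patch, of testing a multiplier through a (reg + reg * ratio) address instead of a bare (reg * ratio); the helper name is invented, and it assumes the GCC 4.7-era RTL API (gen_raw_REG, memory_address_p) that ivopts itself uses for such probing.

/* Hypothetical helper: does MEM_MODE accept ratio RATIO when the
   scaled index appears in a base+index address?  Uses fake pseudos
   above LAST_VIRTUAL_REGISTER, as ivopts does for its own probes.  */
static bool
index_ratio_supported_p (enum machine_mode mem_mode, HOST_WIDE_INT ratio)
{
  rtx base = gen_raw_REG (Pmode, LAST_VIRTUAL_REGISTER + 1);
  rtx index = gen_raw_REG (Pmode, LAST_VIRTUAL_REGISTER + 2);
  rtx scaled = gen_rtx_MULT (Pmode, index, GEN_INT (ratio));
  /* (reg + reg * ratio) -- valid on Hexagon, unlike bare (reg * ratio).  */
  rtx addr = gen_rtx_PLUS (Pmode, base, scaled);
  return memory_address_p (mem_mode, addr);
}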
[PATCH][Testsuite] XFAIL scev-3/4.c and add scev-5.c
Hi,

This patch is to XFAIL scev-3.c and scev-4.c and to add scev-5.c. The bug is going to be fixed after Richard Guenther fixes a series of problems related to POINTER_PLUS_EXPR and sizetype precision.

ChangeLog for testsuite:

2012-03-21  Jiangning Liu

	PR tree-optimization/52563
	* gcc.dg/tree-ssa/scev-3.c: XFAIL on lp64.
	* gcc.dg/tree-ssa/scev-4.c: XFAIL on lp64.
	* gcc.dg/tree-ssa/scev-5.c: New.

Thanks,
-Jiangning

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
index 28d5c93..ed63a18 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
@@ -14,5 +14,5 @@ f(int k)
   }
 }
 
-/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" { xfail lp64 } } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
index 6c1e530..a538c32 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
@@ -19,5 +19,5 @@ f(int k)
   }
 }
 
-/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" { xfail lp64 } } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
new file mode 100644
index 000..b9de36a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+  long long i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i];
+    *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
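My reading of why lp64 currently loses on scev-3.c/scev-4.c, stated as an assumption rather than text from the PR: the 64-bit pointer arithmetic is built from a widened copy of the 32-bit IV, and SCEV gives up on the widening until the POINTER_PLUS_EXPR/sizetype fixes land. Roughly:

/* Illustrative GIMPLE shape on lp64 (hypothetical, not a real dump):

     i.0 = (long unsigned int) i;     <-- widening of the int IV
     D.x = i.0 * 4;
     a_p = &a + D.x;                  <-- POINTER_PLUS_EXPR

   The widening conversion is what the current analysis cannot see
   through, so "&a" survives in the loop body and the
   scan-tree-dump-times check fails.  scev-5.c presumably passes even
   on lp64 because its "long long i" IV is already pointer-width, so
   no such conversion appears.  */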
RE: [PATCH] Improve SCEV for array element
> -----Original Message-----
> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> Sent: Tuesday, March 06, 2012 9:12 PM
> To: Jiangning Liu
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Improve SCEV for array element
>
> On Fri, Jan 20, 2012 at 10:06 AM, Jiangning Liu wrote:
> >> It's definitely not ok at this stage but at most for next stage1.
> >
> > OK. I may wait until next stage1.
> >
> >> This is a very narrow pattern-match. It doesn't allow for &a[i].x for
> >> example, even if a[i] is a one-element structure. I think the
> >> canonical way of handling ADDR_EXPR is to use sth like
> >>
> >>   base = get_inner_reference (TREE_OPERAND (rhs1, 0), ..., &offset, ...);
> >>   base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base);
> >>   chrec1 = analyze_scalar_evolution (loop, base);
> >>   chrec2 = analyze_scalar_evolution (loop, offset);
> >>   chrec1 = chrec_convert (type, chrec1, at_stmt);
> >>   chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt);
> >>   res = chrec_fold_plus (type, chrec1, chrec2);
> >>
> >> where you probably need to handle scev_not_known when analyzing offset
> >> (which might be NULL). You also need to add bitpos to the base address
> >> (in bytes, of course). Note that the &MEM_REF case would naturally
> >> work with this as well.
> >
> > OK. New patch is like below, and bootstrapped on x86-32.
>
> You want instead of
>
> +  if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF
> +      || TREE_CODE (TREE_OPERAND (rhs1, 0)) == MEM_REF
> +      || TREE_CODE (TREE_OPERAND (rhs1, 0)) == COMPONENT_REF)
> +    {
>
>   if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == MEM_REF
>       || handled_component_p (TREE_OPERAND (rhs1, 0)))
>     {
>
> +      base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base);
> +      chrec1 = analyze_scalar_evolution (loop, base);
>
> can you please add a wrapper
>
> tree
> analyze_scalar_evolution_for_address_of (struct loop *loop, tree var)
> {
>   return analyze_scalar_evolution (loop, build_fold_addr_expr (var));
> }
>
> and call that instead of building the ADDR_EXPR there? We want
> to avoid building that tree node, but even such a simple wrapper would
> be prefered.
>
> +      if (bitpos)
>
>   if (bitpos != 0)
>
> +       chrec3 = build_int_cst (integer_type_node,
> +                               bitpos / BITS_PER_UNIT);
>
> please use size_int (bitpos / BITS_PER_UNIT) instead. Using
> integer_type_node is definitely wrong.
>
> Ok with those changes.

Richard,

Modified everything as you suggested, and the new code is like below. Bootstrapped on x86-32. OK for trunk now?

Thanks,
-Jiangning

ChangeLog:

2012-03-08  Jiangning Liu

	* tree-scalar-evolution.c (interpret_rhs_expr): Generate chrec for
	array reference and component reference.
	(analyze_scalar_evolution_for_address_of): New.

ChangeLog for testsuite:

2012-03-08  Jiangning Liu

	* gcc.dg/tree-ssa/scev-3.c: New.
	* gcc.dg/tree-ssa/scev-4.c: New.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
new file mode 100644
index 000..28d5c93
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+  int i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i];
+    *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
new file mode 100644
index 000..6c1e530
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+typedef struct {
+  int x;
+  int y;
+} S;
+
+int *a_p;
+S a[1000];
+
+f(int k)
+{
+  int i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i].y;
+    *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 2077c8d..c719984
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -266,6 +266,8 @@ along with GCC; see the file COPYING3. If not see
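The tree-scalar-evolution.c part of the diff is cut off above. For reference, the wrapper Richard asked for is fully spelled out in his review, so the committed version presumably matches this sketch:

/* Analyze the evolution of &VAR in LOOP, without making each caller
   build the ADDR_EXPR tree node by hand (per the review comment).  */
tree
analyze_scalar_evolution_for_address_of (struct loop *loop, tree var)
{
  return analyze_scalar_evolution (loop, build_fold_addr_expr (var));
}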
RE: [PATCH] Improve SCEV for array element
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Jiangning Liu > Sent: Friday, January 20, 2012 5:07 PM > To: 'Richard Guenther' > Cc: gcc-patches@gcc.gnu.org > Subject: RE: [PATCH] Improve SCEV for array element > > > It's definitely not ok at this stage but at most for next stage1. > > OK. I may wait until next stage1. > > > This is a very narrow pattern-match. It doesn't allow for &a[i].x > for > > example, even if a[i] is a one-element structure. I think the > > canonical way of handling ADDR_EXPR is to use sth like > > > > base = get_inner_reference (TREE_OPERAND (rhs1, 0), ..., &offset, > > ...); base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base); > > chrec1 = analyze_scalar_evolution (loop, base); > > chrec2 = analyze_scalar_evolution (loop, offset); > > chrec1 = chrec_convert (type, chrec1, at_stmt); > > chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt); > > res = chrec_fold_plus (type, chrec1, chrec2); > > > > where you probably need to handle scev_not_known when analyzing > offset > > (which might be NULL). You also need to add bitpos to the base > > address (in bytes, of course). Note that the &MEM_REF case would > > naturally work with this as well. > > OK. New patch is like below, and bootstrapped on x86-32. > > ChangeLog: > > 2012-01-20 Jiangning Liu > > * tree-scalar-evolution (interpret_rhs_expr): generate chrec > for > array reference and component reference. > > > ChangeLog for testsuite: > > 2012-01-20 Jiangning Liu > > * gcc.dg/tree-ssa/scev-3.c: New. > * gcc.dg/tree-ssa/scev-4.c: New. > Richard, PING... Is this patch OK after branch 4.7 is created and trunk is open again? Thanks, -Jiangning > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c > b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c > new file mode 100644 > index 000..28d5c93 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +int *a_p; > +int a[1000]; > + > +f(int k) > +{ > + int i; > + > + for (i=k; i<1000; i+=k) { > + a_p = &a[i]; > + *a_p = 100; > +} > +} > + > +/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c > b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c > new file mode 100644 > index 000..6c1e530 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c > @@ -0,0 +1,23 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +typedef struct { > + int x; > + int y; > +} S; > + > +int *a_p; > +S a[1000]; > + > +f(int k) > +{ > + int i; > + > + for (i=k; i<1000; i+=k) { > + a_p = &a[i].y; > + *a_p = 100; > +} > +} > + > +/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c > index 2077c8d..4e06b75 > --- a/gcc/tree-scalar-evolution.c > +++ b/gcc/tree-scalar-evolution.c > @@ -1712,16 +1712,61 @@ interpret_rhs_expr (struct loop *loop, gimple > at_stmt, >switch (code) > { > case ADDR_EXPR: > - /* Handle &MEM[ptr + CST] which is equivalent to > POINTER_PLUS_EXPR. 
> */ > - if (TREE_CODE (TREE_OPERAND (rhs1, 0)) != MEM_REF) > - { > - res = chrec_dont_know; > - break; > - } > + if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF > + || TREE_CODE (TREE_OPERAND (rhs1, 0)) == MEM_REF > + || TREE_CODE (TREE_OPERAND (rhs1, 0)) == COMPONENT_REF) > +{ > + enum machine_mode mode; > + HOST_WIDE_INT bitsize, bitpos; > + int unsignedp; > + int volatilep = 0; > + tree base, offset; > + tree chrec3; > + > + base = get_inner_reference (TREE_OPERAND (rhs1, 0), > + &bitsize, &bitpos, &offset, > + &mode, &unsignedp, &volatilep, false); > + > + if (TREE_CODE (base) == MEM_REF) > + { > + rhs2 = TREE_OPERAND (base, 1); > + rhs1 = TREE_
RE: [PATCH] Improve SCEV for array element
> It's definitely not ok at this stage but at most for next stage1.

OK. I may wait until next stage1.

> This is a very narrow pattern-match. It doesn't allow for &a[i].x for
> example, even if a[i] is a one-element structure. I think the canonical
> way of handling ADDR_EXPR is to use sth like
>
>   base = get_inner_reference (TREE_OPERAND (rhs1, 0), ..., &offset, ...);
>   base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base);
>   chrec1 = analyze_scalar_evolution (loop, base);
>   chrec2 = analyze_scalar_evolution (loop, offset);
>   chrec1 = chrec_convert (type, chrec1, at_stmt);
>   chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt);
>   res = chrec_fold_plus (type, chrec1, chrec2);
>
> where you probably need to handle scev_not_known when analyzing offset
> (which might be NULL). You also need to add bitpos to the base address
> (in bytes, of course). Note that the &MEM_REF case would naturally work
> with this as well.

OK. New patch is like below, and bootstrapped on x86-32.

ChangeLog:

2012-01-20  Jiangning Liu

	* tree-scalar-evolution.c (interpret_rhs_expr): Generate chrec for
	array reference and component reference.

ChangeLog for testsuite:

2012-01-20  Jiangning Liu

	* gcc.dg/tree-ssa/scev-3.c: New.
	* gcc.dg/tree-ssa/scev-4.c: New.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
new file mode 100644
index 000..28d5c93
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+  int i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i];
+    *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
new file mode 100644
index 000..6c1e530
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+typedef struct {
+  int x;
+  int y;
+} S;
+
+int *a_p;
+S a[1000];
+
+f(int k)
+{
+  int i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i].y;
+    *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 2077c8d..4e06b75
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1712,16 +1712,61 @@ interpret_rhs_expr (struct loop *loop, gimple at_stmt,
   switch (code)
     {
     case ADDR_EXPR:
-      /* Handle &MEM[ptr + CST] which is equivalent to POINTER_PLUS_EXPR.  */
-      if (TREE_CODE (TREE_OPERAND (rhs1, 0)) != MEM_REF)
-	{
-	  res = chrec_dont_know;
-	  break;
-	}
+      if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF
+	  || TREE_CODE (TREE_OPERAND (rhs1, 0)) == MEM_REF
+	  || TREE_CODE (TREE_OPERAND (rhs1, 0)) == COMPONENT_REF)
+	{
+	  enum machine_mode mode;
+	  HOST_WIDE_INT bitsize, bitpos;
+	  int unsignedp;
+	  int volatilep = 0;
+	  tree base, offset;
+	  tree chrec3;
+
+	  base = get_inner_reference (TREE_OPERAND (rhs1, 0),
+				      &bitsize, &bitpos, &offset,
+				      &mode, &unsignedp, &volatilep, false);
+
+	  if (TREE_CODE (base) == MEM_REF)
+	    {
+	      rhs2 = TREE_OPERAND (base, 1);
+	      rhs1 = TREE_OPERAND (base, 0);
+
+	      chrec1 = analyze_scalar_evolution (loop, rhs1);
+	      chrec2 = analyze_scalar_evolution (loop, rhs2);
+	      chrec1 = chrec_convert (type, chrec1, at_stmt);
+	      chrec2 = chrec_convert (TREE_TYPE (rhs2), chrec2, at_stmt);
+	      res = chrec_fold_plus (type, chrec1, chrec2);
+	    }
+	  else
+	    {
+	      base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base);
+	      chrec1 = analyze_scalar_evolution (loop, base);
+	      chrec1 = chrec_convert (type, chrec1, at_stmt);
+	      res = chrec1;
+	    }
 
-      rhs2 = TREE_OPERAND (TREE_OPERAND (rhs1, 0), 1);
-      rhs1 = TREE_OPERAND (TREE_OPERAND (rhs1, 0), 0);
-      /* Fall through.  */
+	  if (offset != NULL_TREE)
+	    {
+	      chrec2 = analyze_scalar_evolution (loop, offset);
+	      chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt);
+	      res = chrec_fold_plus (type, res, chrec2);
+	    }
+
+
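The hunk is truncated above. Judging from Richard's later review, which quotes the submitted code back, the missing tail presumably added the bitpos displacement along these lines (a sketch, not the posted text; the review subsequently asked for size_int (bitpos / BITS_PER_UNIT) instead of the integer_type_node constant):

+	  if (bitpos)
+	    {
+	      chrec3 = build_int_cst (integer_type_node,
+				      bitpos / BITS_PER_UNIT);
+	      chrec3 = chrec_convert (TREE_TYPE (chrec3), chrec3, at_stmt);
+	      res = chrec_fold_plus (type, res, chrec3);
+	    }
+	  break;
+	}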
[PATCH] Improve SCEV for array element
This code change intends to improve SCEV for array elements and reduce the common subexpressions in loops that may be introduced by multiple references to expressions like &a[i]. With this optimization the register pressure in loops can be reduced. The problem originally comes from a real benchmark; the test case only tries to detect the GIMPLE-level changes.

Bootstrapped on x86-32. OK for trunk?

ChangeLog:

2012-01-05  Jiangning Liu

	* tree-scalar-evolution.c (interpret_rhs_expr): Generate chrec for
	array reference.

ChangeLog for testsuite:

2012-01-05  Jiangning Liu

	* gcc.dg/scev-1.c: New.

diff --git a/gcc/testsuite/gcc.dg/scev-1.c b/gcc/testsuite/gcc.dg/scev-1.c
new file mode 100644
index 000..28d5c93
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/scev-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+  int i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i];
+    *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 2077c8d..de89fc4
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1712,6 +1712,42 @@ interpret_rhs_expr (struct loop *loop, gimple at_stmt,
   switch (code)
     {
     case ADDR_EXPR:
+      if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF)
+	{
+	  tree array_ref;
+	  tree var_decl, base, offset;
+	  tree low_bound;
+	  tree unit_size;
+	  tree index;
+
+	  array_ref = TREE_OPERAND (rhs1, 0);
+	  var_decl = TREE_OPERAND (array_ref, 0);
+	  index = TREE_OPERAND (array_ref, 1);
+
+	  low_bound = array_ref_low_bound (array_ref);
+	  unit_size = array_ref_element_size (array_ref);
+
+	  /* We assume all arrays have sizes that are a multiple of a byte.
+	     First subtract the lower bound, if any, in the type of the
+	     index, then convert to sizetype and multiply by the size of
+	     the array element.  */
+	  if (! integer_zerop (low_bound))
+	    index = fold_build2 (MINUS_EXPR, TREE_TYPE (index),
+				 index, low_bound);
+
+	  offset = size_binop (MULT_EXPR,
+			       fold_convert (sizetype, index),
+			       unit_size);
+
+	  base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), var_decl);
+	  chrec1 = analyze_scalar_evolution (loop, base);
+	  chrec2 = analyze_scalar_evolution (loop, offset);
+	  chrec1 = chrec_convert (type, chrec1, at_stmt);
+	  chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt);
+	  res = chrec_fold_plus (type, chrec1, chrec2);
+	  break;
+	}
+
       /* Handle &MEM[ptr + CST] which is equivalent to POINTER_PLUS_EXPR.  */
       if (TREE_CODE (TREE_OPERAND (rhs1, 0)) != MEM_REF)
 	{

Thanks,
-Jiangning
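To make the intent concrete, here is what the new ADDR_EXPR handling should compute for the test loop; the chrec below is an illustration of the expected result, not copied from an actual dump.

/* In scev-1.c the IV evolves as i = {k, +, k}_1 and ints are 4 bytes,
   so the address should analyze to one affine pointer evolution:

     &a[i]  =>  { (int *) &a + 4 * (sizetype) k, +, 4 * (sizetype) k }_1

   which lets repeated &a[i] uses share a single IV instead of
   recomputing the address inside the loop body.  */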
RE: [RFC] Use REG_EXPR in back-end (introduced by optimization to conditional and/or in ARM back-end)
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Jiangning Liu > Sent: Wednesday, December 21, 2011 2:48 PM > To: 'Richard Henderson' > Cc: gcc-patches@gcc.gnu.org; 'Richard Guenther' > Subject: RE: [RFC] Use REG_EXPR in back-end (introduced by optimization > to conditional and/or in ARM back-end) > > > > -Original Message- > > From: Richard Henderson [mailto:r...@redhat.com] > > Sent: Tuesday, November 22, 2011 9:55 AM > > To: Jiangning Liu > > Cc: gcc-patches@gcc.gnu.org; 'Richard Guenther' > > Subject: Re: [RFC] Use REG_EXPR in back-end (introduced by > optimization > > to conditional and/or in ARM back-end) > > > > On 11/21/2011 05:31 PM, Jiangning Liu wrote: > > > My question is essentially is "May I really use REG_EXPR in back- > end > > code?" > > > like the patch I gave below? > > > > I suppose. > > > > Another alternative is to use BImode for booleans. Dunno how much of > > that > > you'd be able to gleen from mere rtl expansion or if you'd need > boolean > > decls to be expanded with BImode. > > Hi Richard, > > The output of expand pass requires the operands must meet the > requirement of > general_operand for binary operations, i.e. all RTX operations on top > level > must meet the hardware instruction requirement. Generally for a > hardware not > directly supporting a single bit logic operation, we can't generate BI > mode > rtx dest. > > So if I simply generate BImode for NE in expand pass, like "subreg:SI > (ne:BI > (reg:SI xxx) (const_int 0)))", there would be an assertion failure due > to > failing to find an appropriate instruction code to operate on a single > bit. > > This looks like a GCC design level issue. How would you suggest > generating a > legitimate rtx expression containing BImode? > Can anybody give me useful suggestions on this issue? My question essentially is may I really use BImode to solve my original problem? Thanks, -Jiangning > Thanks, > -Jiangning > > > > > The later would probably need a target hook. I've often wondered how > > much > > ia64 would benefit from that too, being able to store bool variables > > directly > > in predicate registers. > > > > All of this almost certainly must wait until stage1 opens up again > > though... > > > > > > r~ > > > >
[RFC][patch] improve scev for array element
This code change intends to improve SCEV for array elements and reduce the common subexpressions in loops that may be introduced by multiple references to the expression &a[i]. In the end the register pressure in the loop may be reduced. The test case is simplified from a real benchmark and only wants to show the GIMPLE-level changes.

Any suggestions?

diff --git a/gcc/testsuite/gcc.dg/scev-1.c b/gcc/testsuite/gcc.dg/scev-1.c
new file mode 100644
index 000..28d5c93
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/scev-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+  int i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i];
+    *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 2077c8d..de89fc4
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1712,6 +1712,42 @@ interpret_rhs_expr (struct loop *loop, gimple at_stmt,
   switch (code)
     {
     case ADDR_EXPR:
+      if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF)
+	{
+	  tree array_ref;
+	  tree var_decl, base, offset;
+	  tree low_bound;
+	  tree unit_size;
+	  tree index;
+
+	  array_ref = TREE_OPERAND (rhs1, 0);
+	  var_decl = TREE_OPERAND (array_ref, 0);
+	  index = TREE_OPERAND (array_ref, 1);
+
+	  low_bound = array_ref_low_bound (array_ref);
+	  unit_size = array_ref_element_size (array_ref);
+
+	  /* We assume all arrays have sizes that are a multiple of a byte.
+	     First subtract the lower bound, if any, in the type of the
+	     index, then convert to sizetype and multiply by the size of
+	     the array element.  */
+	  if (! integer_zerop (low_bound))
+	    index = fold_build2 (MINUS_EXPR, TREE_TYPE (index),
+				 index, low_bound);
+
+	  offset = size_binop (MULT_EXPR,
+			       fold_convert (sizetype, index),
+			       unit_size);
+
+	  base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), var_decl);
+	  chrec1 = analyze_scalar_evolution (loop, base);
+	  chrec2 = analyze_scalar_evolution (loop, offset);
+	  chrec1 = chrec_convert (type, chrec1, at_stmt);
+	  chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt);
+	  res = chrec_fold_plus (type, chrec1, chrec2);
+	  break;
+	}
+
       /* Handle &MEM[ptr + CST] which is equivalent to POINTER_PLUS_EXPR.  */
       if (TREE_CODE (TREE_OPERAND (rhs1, 0)) != MEM_REF)
 	{

Another bigger case may be like below, and lower register pressure could be observed:

int a[512] ;
int b[512] ;
int *ax ;
int *bx ;
int *ay ;
int *by ;
int i ;
int j ;

int f(int k)
{
	for( j = 0 ; j < k ; j++) {
		for( i = j ; i < 512 ; i += k) {
			ax = &a[i+k] ;
			bx = &b[i+k] ;
			ay = &a[i] ;
			by = &b[i] ;

			*ax >>= 7 ;
			*bx >>= 7 ;
			a[i] = 8;
			b[i] = 8;
		}
	}
}

Thanks,
-Jiangning
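For the larger example, my reading of what the change enables, written as an illustration rather than taken from a dump:

/* With IVs j = {0, +, 1}_1 and i = {j, +, k}_2, the four addresses in
   the inner loop become affine chrecs over the same bases, e.g.

     &a[i]    =>  { &a + 4*j, +, 4*k }_2
     &a[i+k]  =>  { &a + 4*j + 4*k, +, 4*k }_2

   so ax/ay (and likewise bx/by) can be rewritten in terms of one
   pointer IV per array, reducing the number of simultaneously live
   pointers and hence register pressure.  */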
RE: [RFC] Use REG_EXPR in back-end (introduced by optimization to conditional and/or in ARM back-end)
> -----Original Message-----
> From: Richard Henderson [mailto:r...@redhat.com]
> Sent: Tuesday, November 22, 2011 9:55 AM
> To: Jiangning Liu
> Cc: gcc-patches@gcc.gnu.org; 'Richard Guenther'
> Subject: Re: [RFC] Use REG_EXPR in back-end (introduced by optimization
> to conditional and/or in ARM back-end)
>
> On 11/21/2011 05:31 PM, Jiangning Liu wrote:
> > My question essentially is "May I really use REG_EXPR in back-end code?"
> > like the patch I gave below?
>
> I suppose.
>
> Another alternative is to use BImode for booleans. Dunno how much of that
> you'd be able to glean from mere rtl expansion or if you'd need boolean
> decls to be expanded with BImode.

Hi Richard,

The output of the expand pass requires that the operands of binary operations meet the requirements of general_operand, i.e. all top-level RTX operations must match a hardware instruction. Generally, for hardware that does not directly support a single-bit logic operation, we can't generate a BImode rtx dest.

So if I simply generate BImode for NE in the expand pass, like "subreg:SI (ne:BI (reg:SI xxx) (const_int 0))", there would be an assertion failure due to failing to find an appropriate instruction code to operate on a single bit.

This looks like a GCC design-level issue. How would you suggest generating a legitimate rtx expression containing BImode?

Thanks,
-Jiangning

> The latter would probably need a target hook. I've often wondered how much
> ia64 would benefit from that too, being able to store bool variables directly
> in predicate registers.
>
> All of this almost certainly must wait until stage1 opens up again though...
>
>
> r~
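To make the failure mode concrete, here is a minimal sketch of the rtx shape in question. The function name and the SImode source/destination pseudos are invented for illustration; only the general RTL constructors are assumed.

/* Hypothetical: expand `dest = (src != 0)' keeping the boolean in
   BImode.  On a target with no single-bit logical/move patterns, the
   BImode set below matches no insn -- which is the assertion failure
   described above.  */
static void
expand_bool_ne_bimode (rtx dest_si, rtx src_si)
{
  rtx bool_val = gen_reg_rtx (BImode);
  emit_insn (gen_rtx_SET (VOIDmode, bool_val,
			  gen_rtx_NE (BImode, src_si, const0_rtx)));
  /* Consumers then see the value through a paradoxical subreg,
     cf. the mail's "subreg:SI (ne:BI (reg:SI xxx) (const_int 0))".  */
  emit_move_insn (dest_si, gen_rtx_SUBREG (SImode, bool_val, 0));
}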
RE: [RFC] Optimization to conditional and/or in ARM back-end
> -Original Message- > From: Andrew Pinski [mailto:pins...@gmail.com] > Sent: Tuesday, November 22, 2011 1:14 PM > To: Jiangning Liu > Cc: gcc-patches@gcc.gnu.org; Richard Guenther; Richard Henderson > Subject: Re: [RFC] Optimization to conditional and/or in ARM back-end > > On Sun, Nov 20, 2011 at 6:17 PM, Jiangning Liu > wrote: > > Hi, > > > > This patch is to implement a peephole like optimization in ARM back- > end. > > > > If we have an if condition expression like "((r3 != 0) & r1) != 0", > > So this is the same as: > int f1(int r1, int r3) > { > if (((r3 != 0) & r1) != 0) > return g(); > return 1; > } > --- CUT --- > Can't you do this instead: > Combine the following two instructions: > (insn 17 15 18 2 (parallel [ > (set (reg:SI 150) > (and:SI (ne:SI (reg:SI 0 r0 [ r1 ]) > (const_int 0 [0])) > (reg:SI 1 r1 [ r3 ]))) > (clobber (reg:CC 24 cc)) > ]) t24.c:11 274 {*cond_arith} > (expr_list:REG_UNUSED (reg:CC 24 cc) > (expr_list:REG_DEAD (reg:SI 1 r1 [ r3 ]) > (expr_list:REG_DEAD (reg:SI 0 r0 [ r1 ]) > (nil) > > (insn 18 17 19 2 (set (reg:CC 24 cc) > (compare:CC (reg:SI 150) > (const_int 0 [0]))) t24.c:11 211 {*arm_cmpsi_insn} > (expr_list:REG_DEAD (reg:SI 150) > (nil))) > > And then have a pattern which expands it to (note the r3 here is the > r3 in the function, likewise for r1): > andi r1, r1, #1 > cmp r3, #0 > it ne > cmpne r1, #0 > > Yes it is one extra instruction but it is still better than before > right? Hi Andrew, Yes, and even the code generated can be like cmp r3, #0 itt ne andine r1, r1, #1 cmpne r1, #0 But this is not what I want. For performance purpose, I want to remove that andi instruction as well. Also, actually if we don't modify r1, there might be other more optimization opportunities, for example register pressure can be reduced for some cases, I guess. Thanks, -Jiangning
RE: [RFC] Use which_alternative in preparation-statements of define_insn_and_split
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Tuesday, November 22, 2011 7:55 AM > To: Jiangning Liu > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [RFC] Use which_alternative in preparation-statements of > define_insn_and_split > > On 11/20/2011 07:34 PM, Jiangning Liu wrote: > > Hi, > > > > I find which_alternative can't really be used in preparation- > statements of > > define_insn_and_split, so can this be fixed like below? > > > > For example, I want to use which_alternative in the pattern below, > > > > (define_insn_and_split "*thumb2_movsicc_insn" > > [(set (match_operand:SI 0 "s_register_operand" "=r,r,r,r,r,r,r,r") > > (if_then_else:SI > > (match_operator 3 "arm_comparison_operator" > > [(match_operand 4 "cc_register" "") (const_int 0)]) > > (match_operand:SI 1 "arm_not_operand" "0,0,rI,K,rI,rI,K,K") > > (match_operand:SI 2 "arm_not_operand" "rI,K,0,0,rI,K,rI,K")))] > > "TARGET_THUMB2" > > "@ > >it\\t%D3\;mov%D3\\t%0, %2 > >it\\t%D3\;mvn%D3\\t%0, #%B2 > >it\\t%d3\;mov%d3\\t%0, %1 > >it\\t%d3\;mvn%d3\\t%0, #%B1 > >ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2 > >ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2 > >ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2 > >ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2" > > "&& reload_completed" > > [(cond_exec (match_dup 5) > > (set (match_dup 0) (match_dup 6)))] > > " > > { > > if (which_alternative >= 2 && which_alternative < 4) > > { > > ... > > } > > else if (which_alternative >= 4) > > Hmm, I guess. It seems a bit weird. > > It seems like you'd be better off *not* using define_insn_and_split, > actually, and instead using more specific tests on the separate > define_split than you would on the define_insn. > Yes. That would be alternative solution I can accept. But I still want to know why we don't want to support this? I don't see any GCC documentation saying not allowing this usage. Or you might be thinking this change is not safe enough? Thanks, -Jiangning > I don't feel strongly about it though. I won't object if some other > rtl maintainer wants to approve this. > > > r~
RE: [RFC] Use REG_EXPR in back-end (introduced by optimization to conditional and/or in ARM back-end)
The original subject doesn't catch the key point, so I changed the subject to get more people noticed. My question is essentially is "May I really use REG_EXPR in back-end code?" like the patch I gave below? Thanks, -Jiangning > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Jiangning Liu > Sent: Monday, November 21, 2011 10:18 AM > To: gcc-patches@gcc.gnu.org > Cc: 'Richard Guenther'; Richard Henderson > Subject: [RFC] Optimization to conditional and/or in ARM back-end > > Hi, > > This patch is to implement a peephole like optimization in ARM back-end. > > If we have an if condition expression like "((r3 != 0) & r1) != 0", > originally the binary code to be generated is like, > > cmp r3, #0 > ite eq > moveq r1, #0 > andne r1, r1, #1 > cmp r1, #0 > > But actually this expression could be transformed into "((r3 != x) & > (r1 != > 0)) != 0", if we know r2 is with bool type. This way we could generate > new > binary code like, > > cmp r3, #0 > it ne > cmpne r1, #0 > > The question is how to judge r2 is a bool variable in back-end. I'm > using > REG_EXPR in function arm_check_logic_with_bool_reg to check if r2 is a > bool, > this function is being invoked in pattern "*cond_". > > May I really use REG_EXPR this way in GCC back-end code? I posted a > related > topic at http://gcc.gnu.org/ml/gcc/2011-11/msg00417.html. > > If the answer is No, is there any suggestions about either how to judge > this > bool type or how to implement this optimization? > > Appreciate your comments in advance! > > Thanks, > -Jiangning > > > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > index 23a29c6..8b12d48 > --- a/gcc/config/arm/arm-protos.h > +++ b/gcc/config/arm/arm-protos.h > @@ -114,6 +114,7 @@ extern rtx arm_gen_load_multiple (int *, int, rtx, > int, > rtx, HOST_WIDE_INT *); > extern rtx arm_gen_store_multiple (int *, int, rtx, int, rtx, > HOST_WIDE_INT > *); > extern int arm_gen_movmemqi (rtx *); > extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx); > +extern bool arm_check_logic_with_bool_reg (RTX_CODE, rtx, rtx); > extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx, > HOST_WIDE_INT); > extern rtx arm_gen_compare_reg (RTX_CODE, rtx, rtx); > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index a429c19..e96f24a > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -10910,6 +10910,41 @@ arm_gen_movmemqi (rtx *operands) >return 1; > } > > +/* Check whether the expression "rc " can be > transformed > + to use conditional comparison or not. */ > +bool > +arm_check_logic_with_bool_reg (RTX_CODE rc, rtx bool_reg, rtx cmp1) { > + rtx cmp2; > + HOST_WIDE_INT dom_cc; > + > + if (!REG_P (bool_reg)) > +return false; > + > + if (REG_EXPR (bool_reg) == NULL) > +return false; > + > + if (TREE_CODE (REG_EXPR (bool_reg)) != VAR_DECL) > +return false; > + > + if (TREE_CODE (TREE_TYPE (REG_EXPR (bool_reg))) != BOOLEAN_TYPE) > +return false; > + > + cmp2 = gen_rtx_NE (GET_MODE (bool_reg), bool_reg, const0_rtx); > + > + if (rc == AND) > +dom_cc = DOM_CC_X_AND_Y; > + else if (rc == IOR) > +dom_cc = DOM_CC_X_OR_Y; > + else > +gcc_unreachable (); > + > + if (arm_select_dominance_cc_mode (cmp1, cmp2, dom_cc) == CCmode) > +return false; > + > + return true; > +} > + > + > /* Select a dominance comparison mode if possible for a test of the > general > form (OP (COND_OR (X) (Y)) (const_int 0)). We support three forms. 
> COND_OR == DOM_CC_X_AND_Y => (X && Y) > diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md > index a78ba88..e90a78e > --- a/gcc/config/arm/arm.md > +++ b/gcc/config/arm/arm.md > @@ -9172,6 +9172,64 @@ > (set_attr "length" "4,4,8")] > ) > > +; This pattern matches expression like example "((r1 != x) & r2) != 0". > +; Here r2 is a register with data type boolean. > +; This pattern can transform to code matching patterns cmp_and. > +; Likewise, code matching pattern cmp_ior could be generated, if it is > | > +; rather than & in the example. > +(define_insn_and_split "*cond_" > + [(set (match_operand 0 "cc_register" "") > + (compare > + (ior_and:SI > + (match_operator:
[RFC] Use which_alternative in preparation-statements of define_insn_and_split
Hi,

I find which_alternative can't really be used in the preparation-statements of define_insn_and_split, so can this be fixed like below?

For example, I want to use which_alternative in the pattern below,

(define_insn_and_split "*thumb2_movsicc_insn"
  [(set (match_operand:SI 0 "s_register_operand" "=r,r,r,r,r,r,r,r")
	(if_then_else:SI
	 (match_operator 3 "arm_comparison_operator"
	  [(match_operand 4 "cc_register" "") (const_int 0)])
	 (match_operand:SI 1 "arm_not_operand" "0,0,rI,K,rI,rI,K,K")
	 (match_operand:SI 2 "arm_not_operand" "rI,K,0,0,rI,K,rI,K")))]
  "TARGET_THUMB2"
  "@
   it\\t%D3\;mov%D3\\t%0, %2
   it\\t%D3\;mvn%D3\\t%0, #%B2
   it\\t%d3\;mov%d3\\t%0, %1
   it\\t%d3\;mvn%d3\\t%0, #%B1
   ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
   ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
   ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
   ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2"
  "&& reload_completed"
  [(cond_exec (match_dup 5)
	      (set (match_dup 0) (match_dup 6)))]
  "
  {
    if (which_alternative >= 2 && which_alternative < 4)
      {
	...
      }
    else if (which_alternative >= 4)
      {
	...
      }
  }"
  [(set_attr "length" "6,6,6,6,10,10,10,10")
   (set_attr "conds" "use")]
)

Thanks,
-Jiangning

Patch:

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index c94e743..df6a3df
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -3502,6 +3502,10 @@ try_split (rtx pat, rtx trial, int last)
 	split_branch_probability = INTVAL (XEXP (note, 0));
       probability = split_branch_probability;
 
+  extract_insn (trial);
+  if (!constrain_operands (reload_completed))
+    return trial;
+
   seq = split_insns (pat, trial);
   split_branch_probability = -1;
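A note on why the proposed hunk has the intended effect; this is my reading of the recog internals, not part of the patch:

/* extract_insn fills recog_data from TRIAL, and constrain_operands,
   when it succeeds, records the index of the matching constraint
   alternative in the global which_alternative -- which is what lets
   the split's preparation statements test it afterwards.  If the
   constraint check fails, try_split simply declines to split by
   returning TRIAL unchanged.  */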
[RFC] Optimization to conditional and/or in ARM back-end
Hi,

This patch is to implement a peephole-like optimization in the ARM back-end.

If we have an if condition expression like "((r3 != 0) & r1) != 0", originally the binary code to be generated is like,

	cmp	r3, #0
	ite	eq
	moveq	r1, #0
	andne	r1, r1, #1
	cmp	r1, #0

But actually this expression could be transformed into "((r3 != 0) & (r1 != 0)) != 0" if we know r1 has bool type. This way we could generate new binary code like,

	cmp	r3, #0
	it	ne
	cmpne	r1, #0

The question is how to judge that r1 is a bool variable in the back-end. I'm using REG_EXPR in the function arm_check_logic_with_bool_reg to check whether r1 is a bool; this function is invoked in the pattern "*cond_<code>".

May I really use REG_EXPR this way in GCC back-end code? I posted a related topic at http://gcc.gnu.org/ml/gcc/2011-11/msg00417.html.

If the answer is No, are there any suggestions about either how to judge this bool type or how to implement this optimization?

Appreciate your comments in advance!

Thanks,
-Jiangning

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 23a29c6..8b12d48
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -114,6 +114,7 @@ extern rtx arm_gen_load_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
 extern rtx arm_gen_store_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
 extern int arm_gen_movmemqi (rtx *);
 extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx);
+extern bool arm_check_logic_with_bool_reg (RTX_CODE, rtx, rtx);
 extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx,
						        HOST_WIDE_INT);
 extern rtx arm_gen_compare_reg (RTX_CODE, rtx, rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a429c19..e96f24a
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -10910,6 +10910,41 @@ arm_gen_movmemqi (rtx *operands)
   return 1;
 }
 
+/* Check whether the expression "<cmp1> rc <bool_reg>" can be transformed
+   to use conditional comparison or not.  */
+bool
+arm_check_logic_with_bool_reg (RTX_CODE rc, rtx bool_reg, rtx cmp1)
+{
+  rtx cmp2;
+  HOST_WIDE_INT dom_cc;
+
+  if (!REG_P (bool_reg))
+    return false;
+
+  if (REG_EXPR (bool_reg) == NULL)
+    return false;
+
+  if (TREE_CODE (REG_EXPR (bool_reg)) != VAR_DECL)
+    return false;
+
+  if (TREE_CODE (TREE_TYPE (REG_EXPR (bool_reg))) != BOOLEAN_TYPE)
+    return false;
+
+  cmp2 = gen_rtx_NE (GET_MODE (bool_reg), bool_reg, const0_rtx);
+
+  if (rc == AND)
+    dom_cc = DOM_CC_X_AND_Y;
+  else if (rc == IOR)
+    dom_cc = DOM_CC_X_OR_Y;
+  else
+    gcc_unreachable ();
+
+  if (arm_select_dominance_cc_mode (cmp1, cmp2, dom_cc) == CCmode)
+    return false;
+
+  return true;
+}
+
 /* Select a dominance comparison mode if possible for a test of the
    general form (OP (COND_OR (X) (Y)) (const_int 0)).  We support three forms.
    COND_OR == DOM_CC_X_AND_Y => (X && Y)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index a78ba88..e90a78e
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9172,6 +9172,64 @@
 	(set_attr "length" "4,4,8")]
 )
 
+; This pattern matches expressions like the example "((r1 != x) & r2) != 0".
+; Here r2 is a register with data type boolean.
+; This pattern can transform to code matching pattern cmp_and.
+; Likewise, code matching pattern cmp_ior could be generated, if it is |
+; rather than & in the example.
+(define_insn_and_split "*cond_" + [(set (match_operand 0 "cc_register" "") + (compare + (ior_and:SI + (match_operator:SI 4 "arm_comparison_operator" + [(match_operand:SI 2 "s_register_operand" "r,r") + (match_operand:SI 3 "arm_add_operand" "rI,L")]) + (match_operand:SI 1 "s_register_operand" "r,r")) +(const_int 0)))] + "TARGET_32BIT + && arm_check_logic_with_bool_reg (, operands[1], operands[4])" + "#" + "&& 1" + [(set (match_dup 7) + (compare +(match_op_dup 5 + [(match_op_dup 4 [(match_dup 2) (match_dup 3)]) + (match_op_dup 6 [(match_dup 1) (const_int 0)])]) +(const_int 0)))] + " + { +HOST_WIDE_INT dom_cc; + +operands[6] = gen_rtx_NE (SImode, operands[1], const0_rtx); + +if ( == AND) + { +dom_cc = DOM_CC_X_AND_Y; +operands[5] = gen_rtx_fmt_ee (, SImode, + operands[4], operands[6]); + } +else if ( == IOR) + { +dom_cc = DOM_CC_X_OR_Y; +operands[5] = gen_rtx_fmt_ee (, SImode, + operands[4], operands[6]); + } +else + gcc_unreachable (); + +operands[7] + = gen_rtx_REG (arm_select_dominance_cc_mode (operands[4], + operands[6], + dom_cc), +CC_REGNUM); + }" + [(set_attr "conds" "clob") + (set (attr "length") + (if
MAINTAINERS: add myself
Just committed the following: * MAINTAINERS (Write After Approval): Add myself. Index: MAINTAINERS === --- MAINTAINERS (revision 181466) +++ MAINTAINERS (working copy) @@ -419,6 +419,7 @@ Marc Lehmann p...@goof.com James Lemkejwle...@juniper.net Kriang Lerdsuwanakij lerds...@users.sourceforge.net +Jiangning Liu jiangning@arm.com Sa Liu sa...@de.ibm.com Ralph Loader r...@ihug.co.nz Gabor Loki l...@inf.u-szeged.hu
[arm-embedded] Backport mainline r178102 and partial r172017
Backport mainline 178102 and partial r172017 to ARM/embedded-4_6-branch. Committed. 2011-11-17 Jiangning Liu Backport r178102 from mainline 2011-08-26 Jiangning Liu * config/arm/arm.md (*ior_scc_scc): Enable for Thumb2 as well. (*ior_scc_scc_cmp): Likewise (*and_scc_scc): Likewise. (*and_scc_scc_cmp): Likewise. (*and_scc_scc_nodom): Likewise. (*cmp_ite0, *cmp_ite1, *cmp_and, *cmp_ior): Handle Thumb2. Partially backport r172017 from mainline 2011-04-06 Wei Guozhi * config/arm/constraints.md (Py): New constraint. testsuite: 2011-11-17 Jiangning Liu Backport r178102 from mainline 2011-08-26 Jiangning Liu * gcc.target/arm/thumb2-cond-cmp-1.c: New. * gcc.target/arm/thumb2-cond-cmp-2.c: Likewise. * gcc.target/arm/thumb2-cond-cmp-3.c: Likewise. * gcc.target/arm/thumb2-cond-cmp-4.c: Likewise.
[arm-embedded] Revert r179307 and backport mainline r180964
The original solution r179307 of stack red zone issue for PR38644 is completely reverted in ARM/embedded-4_6-branch. And backport mainline 180964 to ARM/embedded-4_6-branch. Committed. Undo changes committed in r179307. 2011-11-16 Jiangning Liu Backport r180964 from mainline 2011-11-04 Jiangning Liu PR rtl-optimization/38644 * config/arm/arm.c (thumb1_expand_epilogue): Add memory barrier for epilogue having stack adjustment. testsuite: 2011-11-16 Jiangning Liu Backport r180964 from mainline 2011-11-04 Jiangning Liu PR rtl-optimization/38644 * gcc.target/arm/stack-red-zone.c: New.
[PATCH, ARM] Fix stack red zone bug (PR38644)
Hi,

This patch is to fix PR38644 in the ARM back-end. OK for trunk?

For every detail, please refer to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644.

ChangeLog:

2011-11-2  Jiangning Liu

	PR rtl-optimization/38644
	* config/arm/arm.c (thumb1_expand_epilogue): Add memory barrier
	for epilogue having stack adjustment.

ChangeLog of testsuite:

2011-11-2  Jiangning Liu

	PR rtl-optimization/38644
	* gcc.target/arm/stack-red-zone.c: New.

Thanks,
-Jiangning

Patch:

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f1ada6f..1f6fc26
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22215,6 +22215,8 @@ thumb1_expand_epilogue (void)
   gcc_assert (amount >= 0);
   if (amount)
     {
+      emit_insn (gen_blockage ());
+
       if (amount < 512)
	 emit_insn (gen_addsi3 (stack_pointer_rtx, stack_pointer_rtx,
			        GEN_INT (amount)));
diff --git a/gcc/testsuite/gcc.target/arm/stack-red-zone.c b/gcc/testsuite/gcc.target/arm/stack-red-zone.c
new file mode 100644
index 000..b9f0f99
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/stack-red-zone.c
@@ -0,0 +1,12 @@
+/* No stack red zone. PR38644. */
+/* { dg-options "-mthumb -O2" } */
+/* { dg-final { scan-assembler "ldrb\[^\n\]*\\n\[\t \]*add\[\t \]*sp" } } */
+
+extern int doStreamReadBlock (int *, char *, int size, int);
+
+char readStream (int *s)
+{
+  char c = 0;
+  doStreamReadBlock (s, &c, 1, *s);
+  return c;
+}
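To illustrate what the blockage prevents, here is a hypothetical mis-scheduled Thumb-1 epilogue, written as a comment rather than taken from a real dump:

/* Without a barrier the scheduler may swap these two steps,

	add	sp, sp, #8	   @ release the frame first
	ldrb	r3, [sp, #-1]	   @ ...then load `c' from below sp

   and an interrupt arriving between them can clobber the byte still
   being read, since ARM has no stack red zone.  The test's
   scan-assembler pattern checks precisely that the ldrb stays before
   the "add sp".  */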
RE: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
> -Original Message- > From: Kai Tietz [mailto:ktiet...@googlemail.com] > Sent: Thursday, October 27, 2011 5:36 PM > To: Jiangning Liu > Cc: Michael Matz; Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org; > Richard Henderson > Subject: Re: [patch tree-optimization]: Improve handling of > conditional-branches on targets with high branch costs > > 2011/10/27 Jiangning Liu : > > > > > >> -Original Message- > >> From: Michael Matz [mailto:m...@suse.de] > >> Sent: Wednesday, October 26, 2011 11:47 PM > >> To: Kai Tietz > >> Cc: Jiangning Liu; Richard Guenther; Kai Tietz; gcc- > patc...@gcc.gnu.org; > >> Richard Henderson > >> Subject: Re: [patch tree-optimization]: Improve handling of > >> conditional-branches on targets with high branch costs > >> > >> Hi, > >> > >> On Wed, 26 Oct 2011, Kai Tietz wrote: > >> > >> > So you would mean that memory dereferencing shouldn't be > considered > >> as > >> > side-effect at all? > >> > >> No. I haven't said this at all. Of course it's a side-effect, but > >> we're > >> allowed to remove existing ones (under some circumstances). We're > not > >> allowed to introduce new ones, which means that this ... > >> > >> > So we would happily cause by code 'if (i && *i != 0) an crash, as > >> > memory-dereference has for you no side-effect? > >> > >> ... is not allowed. But in the original example the memread was on > the > >> left side, hence occured always, therefore we can move it to the > right > >> side, even though it might occur less often. > >> > >> > In you special case it might be valid that, if first (and C-fold- > >> const > >> > doesn't know if the side-effect condition is really the first, as > it > >> > might be a sub-sequence of a condition) condition might trap or > not, > >> to > >> > combine it. But branching has to cover the general cases. If you > >> find > >> > a way to determine that left-hand operand in fold_const's > branching > >> code > >> > is really the left-most condition in chain, then we can add such a > >> > special case, but I don't see here an easy way to determine it. > >> > >> Hmm? I don't see why it's necessary to check if it's the left-most > >> condition in a chain. If the left hand of '&&' is a memread it can > >> always > >> be moved to the right side (or the operator transformed into '&' > which > >> can > >> have the same effect), of course only if the original rhs is free of > >> side > >> effects, but then independed if the && was part of a larger > expression. > >> The memread will possibly be done fewer times than originally, but > as > >> said, that's okay. > >> > > > > Agree. The point is for the small case I gave RHS doesn't have side > effect > > at all, so the optimization of changing it to AND doesn't violate C > > specification. We need to recover something for this case, although > it did > > improve a lot for some particular benchmarks. > > > > Thanks, > > -Jiangning > > > >> > >> Ciao, > >> Michael. > > Hmm, so we can allow merging to AND, if the left-hand-side might trap > but has no-side-effects and rhs has neither trapping nor side-effects. > As for the case that left-hand side has side-effects but right-hand > not, we aren't allowed to do this AND/OR merge. For example 'if ((f = > foo ()) != 0 && f < 24)' we aren't allowed to make this > transformation. > > This shouldn't be that hard. We need to provide to simple_operand_p_2 > an additional argument for checking trapping or not. > Would it be OK if I file a tracker in bugzilla against this? > Regards, > Kai
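A sketch of what Kai's proposal could look like; the two-argument signature and the body are my assumptions based only on the sentence above, not a quote of any committed code:

/* Hypothetical: like simple_operand_p, but additionally reject
   operands with side effects, and trapping operands unless the
   caller explicitly allows them.  */
static bool
simple_operand_p_2 (tree exp, bool allow_traps)
{
  return (!TREE_SIDE_EFFECTS (exp)
	  && (allow_traps || !tree_could_trap_p (exp))
	  && simple_operand_p (exp));
}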
RE: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
> -Original Message- > From: Michael Matz [mailto:m...@suse.de] > Sent: Wednesday, October 26, 2011 11:47 PM > To: Kai Tietz > Cc: Jiangning Liu; Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org; > Richard Henderson > Subject: Re: [patch tree-optimization]: Improve handling of > conditional-branches on targets with high branch costs > > Hi, > > On Wed, 26 Oct 2011, Kai Tietz wrote: > > > So you would mean that memory dereferencing shouldn't be considered > as > > side-effect at all? > > No. I haven't said this at all. Of course it's a side-effect, but > we're > allowed to remove existing ones (under some circumstances). We're not > allowed to introduce new ones, which means that this ... > > > So we would happily cause by code 'if (i && *i != 0) an crash, as > > memory-dereference has for you no side-effect? > > ... is not allowed. But in the original example the memread was on the > left side, hence occured always, therefore we can move it to the right > side, even though it might occur less often. > > > In you special case it might be valid that, if first (and C-fold- > const > > doesn't know if the side-effect condition is really the first, as it > > might be a sub-sequence of a condition) condition might trap or not, > to > > combine it. But branching has to cover the general cases. If you > find > > a way to determine that left-hand operand in fold_const's branching > code > > is really the left-most condition in chain, then we can add such a > > special case, but I don't see here an easy way to determine it. > > Hmm? I don't see why it's necessary to check if it's the left-most > condition in a chain. If the left hand of '&&' is a memread it can > always > be moved to the right side (or the operator transformed into '&' which > can > have the same effect), of course only if the original rhs is free of > side > effects, but then independed if the && was part of a larger expression. > The memread will possibly be done fewer times than originally, but as > said, that's okay. > Agree. The point is for the small case I gave RHS doesn't have side effect at all, so the optimization of changing it to AND doesn't violate C specification. We need to recover something for this case, although it did improve a lot for some particular benchmarks. Thanks, -Jiangning > > Ciao, > Michael.
RE: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Michael Matz
> Sent: Tuesday, October 11, 2011 10:45 PM
> To: Kai Tietz
> Cc: Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org; Richard Henderson
> Subject: Re: [patch tree-optimization]: Improve handling of
> conditional-branches on targets with high branch costs
>
> Hi,
>
> On Tue, 11 Oct 2011, Kai Tietz wrote:
>
> > > Better make it a separate function the first tests your new
> > > conditions, and then calls simple_operand_p.
> >
> > Well, either I make it a new function and call it instead of
> > simple_operand_p,
>
> That's what I meant, yes.
>
> > >> @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_
> > >>            build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg),
> > >>                    ll_arg, rl_arg),
> > >>            build_int_cst (TREE_TYPE (ll_arg), 0));
> > >> -
> > >> -      if (LOGICAL_OP_NON_SHORT_CIRCUIT)
> > >> -        {
> > >> -          if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs)
> > >> -            return build2_loc (loc, code, truth_type, lhs, rhs);
> > >> -          return NULL_TREE;
> > >> -        }
> > >
> > > Why do you remove this hunk? Shouldn't you instead move the hunk you
> > > added to fold_truth_andor() here. I realize this needs some TLC to
> > > fold_truth_andor_1, because right now it early-outs for non-comparisons,
> > > but it seems the better place. I.e. somehow move the below code into the
> > > above branch, with the associated diddling on fold_truth_andor_1 that it
> > > gets called.
> >
> > This hunk is removed, as it is vain to do here.
>
> There is a fallthrough now, that wasn't there before. I don't know if
> it's harmless, I just wanted to mention it.

Yes, this part introduced different behavior for this small case,

int f(char *i, int j)
{
  if (*i && j!=2)
    return *i;
  else
    return j;
}

Before the fix, we have

  D.4710 = *i;
  D.4711 = D.4710 != 0;
  D.4712 = j != 2;
  D.4713 = D.4711 & D.4712;
  if (D.4713 != 0) goto <D.4714>; else goto <D.4715>;
  <D.4714>:
  D.4710 = *i;
  D.4716 = (int) D.4710;
  return D.4716;
  <D.4715>:
  D.4716 = j;
  return D.4716;

After the fix, we have

  D.4711 = *i;
  if (D.4711 != 0) goto <D.4712>; else goto <D.4713>;
  <D.4712>:
  if (j != 2) goto <D.4715>; else goto <D.4713>;
  <D.4715>:
  D.4711 = *i;
  D.4714 = (int) D.4711;
  return D.4714;
  <D.4713>:
  D.4714 = j;
  return D.4714;

Does this meet the original expectation? I find the code below in function fold_truth_andor makes the difference,

    /* Transform (A AND-IF B) into (A AND B), or (A OR-IF B)
       into (A OR B).  For sequence point consistancy, we need to check
       for trapping, and side-effects.  */
    else if (code == icode && simple_operand_p_2 (arg0)
	     && simple_operand_p_2 (arg1))
      return fold_build2_loc (loc, ncode, type, arg0, arg1);

For "*i != 0", simple_operand_p (*i) returns false. Originally this was not checked by the code, and I don't see that the patch originally intended to cover it. Can this be explained reasonably? I'm not arguing that this patch made things worse; I only want to understand the rationale behind this and to justify that the patch doesn't hurt anything else. Actually, on the contrary, I measured that this change accidentally made some benchmarks significantly improved, due to some other reasons.

Thanks,
-Jiangning

> > Btw richi asked for it, and I agree that new TRUTH-AND/OR packing is
> > better done at a single place in fold_truth_andor only.
>
> As fold_truthop is called twice by fold_truth_andor, the latter might
> indeed be the better place.
>
>
> Ciao,
> Michael.
RE: [PATCH] Fix stack red zone bug (PR38644)
> -Original Message- > From: Richard Sandiford > Date: Fri, Sep 30, 2011 at 8:46 PM > Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > To: Jiangning Liu > Cc: Jakub Jelinek , Richard Guenther > , Andrew Pinski , > gcc-patches@gcc.gnu.org > > > "Jiangning Liu" writes: > >> You seem to feel strongly about this because it's a wrong-code bug > that > >> is very easy to introduce and often very hard to detect. And I > >> defintely > >> sympathise with that. If we were going to to do it in a target- > >> independent > >> way, though, I think it would be better to scan patterns like > epilogue > >> and > >> automatically introduce barriers before assignments to > >> stack_pointer_rtx > >> (subject to the kind of hook in your patch). But I still don't > think > >> that's better than leaving the onus on the backend. The backend is > >> still responsible for much more complicated things like determning > >> the correct deallocation and register-restore sequence, and for > >> determining the correct CFI sequence. > >> > > > > I think middle-end in GCC is actually shared code rather than the > part > > exactly in the middle. A pass working on RTL can be a middle end just > > because the code can be shared for all targets, and some passes can > even > > work for both GIMPLE and RTL. > > > > Actually some optimizations need to work through "shared part" > (middle-end) > > plus "target specific part" (back-end). You are thinking the > interface > > between this "shared part" and "target specific part" should be using > > "barrier" as a properly model. To some extension I agree with this. > However, > > it doesn't mean the fix should be in back-end rather than middle end, > > because obviously this problem is a common ABI issue for all targets. > If we > > can abstract this issue to be a shared part, why shouldn't we do it > in > > middle end to reduce the onus of back-end? Back-end should handle the > target > > specific things rather than only the complicated things. > > And for avoidance of doubt, the automatic barrier insertion that I > described would be one way of doing it in target-independent code. > But... > > > If a complicated problem can be implemented in a "shared code" manner, > we > > still want to put it into middle end rather than back-end. I believe > those > > optimizations based on SSA form are complicated enough, but they are > all in > > middle end. This is the logic I'm seeing in GCC. > > The situation here is different. The target-independent rtl code is > being given a blob of instructions that the backend has generated for > the epilogue. There's no fine-tuning beyond that. E.g. we don't have > separate patterns for "restore registers", "deallocate stack", "return": > we just have one monolithic "epilogue" pattern. The target-independent > code has very little control. > > In contrast, after the tree optimisers have handed off the initial IL, > the tree optimisers are more or less in full control. There are very > few cases where we generate further trees outside the middle-end. The > only > case I know off-hand is the innards of va_start and va_arg, which can > be > generated by the backend. > > So let's suppose we had a similar situation there, where we wanted > va_arg do something special in a certain situation. If we had the > same three choices of: > > 1. use an on-the-side hook to represent the special something > 2. scan the code generated by the backend and automatically > inject the special something at an appropriate place > 3. 
require each backend to do it properly from the start > > (OK, slightly prejudiced wording :-)) I think we'd still choose 3. > Richard S., Although I implemented va_arg for a commercial compiler a long time ago, I have forgotten all the details. :-) I'm not sure va_arg is a good example to compare with this stack red zone case. > > For this particular issue, I don't think the hook interface I'm > > proposing is more complicated than the barrier. Instead, it is easier > > for a back-end implementer to be aware of the potential issue before > > really solving the stack red zone problem, because it is very clearly > > listed in the target hook list. > > The point for "model it in the IL" supporters like myself is that we > have both many backends and many rtl passes.
RE: [PATCH] Fix stack red zone bug (PR38644)
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Saturday, October 01, 2011 3:05 AM > To: Jiangning Liu > Cc: 'Jakub Jelinek'; 'Richard Guenther'; Andrew Pinski; gcc- > patc...@gcc.gnu.org > Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > > On 09/29/2011 06:13 PM, Jiangning Liu wrote: > > > > > >> -Original Message- > >> From: Jakub Jelinek [mailto:ja...@redhat.com] > >> Sent: Thursday, September 29, 2011 6:14 PM > >> To: Jiangning Liu > >> Cc: 'Richard Guenther'; Andrew Pinski; gcc-patches@gcc.gnu.org > >> Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > >> > >> On Thu, Sep 29, 2011 at 06:08:50PM +0800, Jiangning Liu wrote: > >>> As far as I know different back-ends are implementing different > >>> prologue/epilogue in GCC. If one day this part can be refined and > >> abstracted > >>> as well, I would say solving this stack-red-zone problem in shared > >>> prologue/epilogue code would be a perfect solution, and a barrier can > >> be > >>> inserted there. > >>> > >>> I'm not saying you are wrong on keeping the scheduler using a pure > >> barrier > >>> interface. From an engineering point of view, I only feel my proposal > is > >> so far > >>> so good, because this patch at least solves the problem for all > >> targets in a > >>> quite simple way. Maybe it can be improved in the future based on this. > >> > >> But you don't want to listen to any other alternative, other > >> backends are > >> happy with being able to put the best kind of barrier at the best > spot > >> in the epilogue and don't need a "generic" solution which won't > model > >> very > >> well the target diversity anyway. > > > > Jakub, > > > > I appreciate your attention to this issue. > > > > 1) Can you clarify who the "other back-ends" are? Do they cover most > of the > > back-ends supported by GCC right now? > > Your red-stack barrier issue is *exactly* the same as the frame pointer > barrier issue, which affects many backends. > > That is, if the frame pointer is initialized before the local stack > frame > is allocated, then one has to add a barrier such that memory references > based on the frame pointer are not scheduled before the local stack > frame > allocation. > > One example of this is in the i386 port, where the prologue looks like > > push %ebp > mov %esp, %ebp > sub $frame, %esp > > The rtl we emit for that subtraction looks like > > (define_insn "pro_epilogue_adjust_stack__add" > [(set (match_operand:P 0 "register_operand" "=r,r") > (plus:P (match_operand:P 1 "register_operand" "0,r") > (match_operand:P 2 "" "r,l"))) >(clobber (reg:CC FLAGS_REG)) >(clobber (mem:BLK (scratch)))] > > Note the final clobber, which is a memory scheduling barrier. > > Other targets use similar tricks. For instance arm "stack_tie". > > Honestly, I've found nothing convincing throughout this thread that > suggests to me that this problem should be handled generically. > Richard H., Thanks for your explanation and for giving an x86 example. The key point is that, if possible, fixing it in the middle end can benefit all ports directly and avoid the bug-fixing burden in the back-ends, rather than fixing this problem port by port. Actually, the debate now is whether the memory barrier is properly modelled throughout GCC rather than in a single component, because my current understanding is that the scheduler is not the only component using memory barriers. Thanks, -Jiangning > > r~
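For concreteness, here is a minimal sketch -- not code from any port -- of how a back-end's prologue or epilogue expander can emit the kind of scheduling barrier described above. The emit_clobber idiom is the same one quoted from expand_builtin_longjmp() later in this thread; the wrapper name is purely illustrative:

  /* Illustrative sketch only.  Emit a blockage so that no memory
     reference can be scheduled across the adjacent stack adjustment.
     (clobber (mem:BLK (scratch))) is treated by sched-deps as
     conflicting with every memory access.  */
  static void
  emit_stack_blockage (void)
  {
    emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode)));
  }

A back-end expanding its own epilogue would call this immediately before the insn that deallocates the local frame.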
RE: [PATCH] Fix stack red zone bug (PR38644)
> -Original Message- > From: Richard Sandiford [mailto:rdsandif...@googlemail.com] > Sent: Friday, September 30, 2011 4:15 PM > To: Jiangning Liu > Cc: 'Jakub Jelinek'; 'Richard Guenther'; Andrew Pinski; gcc- > patc...@gcc.gnu.org > Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > > "Jiangning Liu" writes: > >> -Original Message- > >> From: Jakub Jelinek [mailto:ja...@redhat.com] > >> Sent: Thursday, September 29, 2011 6:14 PM > >> To: Jiangning Liu > >> Cc: 'Richard Guenther'; Andrew Pinski; gcc-patches@gcc.gnu.org > >> Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > >> > >> On Thu, Sep 29, 2011 at 06:08:50PM +0800, Jiangning Liu wrote: > >> > As far as I know different back-ends are implementing different > >> > prologue/epilogue in GCC. If one day this part can be refined and > >> abstracted > >> > as well, I would say solving this stack-red-zone problem in shared > >> > prologue/epilogue code would be a perfect solution, and a barrier > can > >> be > >> > inserted there. > >> > > >> > I'm not saying you are wrong on keeping the scheduler using a pure > >> barrier > >> > interface. From an engineering point of view, I only feel my proposal > is > >> so far > >> > so good, because this patch at least solves the problem for all > >> targets in a > >> > quite simple way. Maybe it can be improved in the future based on this. > >> > >> But you don't want to listen to any other alternative, other > >> backends are > >> happy with being able to put the best kind of barrier at the best > spot > >> in the epilogue and don't need a "generic" solution which won't > model > >> very > >> well the target diversity anyway. > > > > Jakub, > > > > I appreciate your attention to this issue. > > > > 1) Can you clarify who the "other back-ends" are? Do they cover most > of the > > back-ends supported by GCC right now? > > Not answering for Jakub of course, but as a maintainer of a backend, I > know > MIPS doesn't have the required barrier at the moment. But that's a bug. > > Like others in this thread, I'm strongly of the opinion that this > should > be modelled directly in the IL. And it's already supposed to be > modelled > in the IL. Target-independent code emits the required barriers in > cases > where it rather than the backend patterns are responsible. E.g.: > > emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode))); > emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx)); > > emit_move_insn (hard_frame_pointer_rtx, fp); > emit_stack_restore (SAVE_NONLOCAL, stack); > > from expand_builtin_longjmp() and: > > if (sa != 0) > { > sa = validize_mem (sa); > /* These clobbers prevent the scheduler from moving >references to variable arrays below the code >that deletes (pops) the arrays. */ > emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode))); > emit_clobber (gen_rtx_MEM (BLKmode, stack_pointer_rtx)); > } > > from emit_stack_restore(). Backends that fail to follow suit are IMO > just buggy. > > FWIW, I intend to fix MIPS this weekend. Richard S., I appreciate your attention to this issue and your investigation of the MIPS target. I'm really glad to know you "found" a potential bug in MIPS through this discussion. To some extent this confirms my earlier hypothesis. > > You seem to feel strongly about this because it's a wrong-code bug that > is very easy to introduce and often very hard to detect. And I > definitely > sympathise with that. 
If we were going to do it in a target- > independent > way, though, I think it would be better to scan patterns like epilogue > and > automatically introduce barriers before assignments to > stack_pointer_rtx > (subject to the kind of hook in your patch). But I still don't think > that's better than leaving the onus on the backend. The backend is > still responsible for much more complicated things like determining > the correct deallocation and register-restore sequence, and for > determining the correct CFI sequence. > I think the middle-end in GCC is actually shared code rather than the part exactly in the middle. A pass working on RTL can be a middle end just because the code can be shared for all targets, and some passes can even work for both GIMPLE and RTL. Actually some optimizations need to work through the "shared part" (middle-end) plus the "target specific part" (back-end).
RE: [PATCH] Fix stack red zone bug (PR38644)
> -Original Message- > From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Thursday, September 29, 2011 6:14 PM > To: Jiangning Liu > Cc: 'Richard Guenther'; Andrew Pinski; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > > On Thu, Sep 29, 2011 at 06:08:50PM +0800, Jiangning Liu wrote: > > As far as I know different back-ends are implementing different > > prologue/epilogue in GCC. If one day this part can be refined and > abstracted > > as well, I would say solving this stack-red-zone problem in shared > > prologue/epilogue code would be a perfect solution, and a barrier can > be > > inserted there. > > > > I'm not saying you are wrong on keeping the scheduler using a pure > barrier > > interface. From an engineering point of view, I only feel my proposal is > so far > > so good, because this patch at least solves the problem for all > targets in a > > quite simple way. Maybe it can be improved in the future based on this. > > But you don't want to listen to any other alternative, other > backends are > happy with being able to put the best kind of barrier at the best spot > in the epilogue and don't need a "generic" solution which won't model > very > well the target diversity anyway. Jakub, I appreciate your attention to this issue. 1) Can you clarify who the "other back-ends" are? Do they cover most of the back-ends supported by GCC right now? 2) You need to give data to prove that the "other back-ends" are happy with the current solution. The fact is that over the years a bunch of bugs have been filed against this problem. Why? Are you implying that the "other back-ends" are happy with bugs or potential bugs? That seems weird to me. Also, the issue is not whether a back-end is able to implement the barrier or not. If you are really asking "able or not", I would say every back-end is able, but that doesn't mean "able" is correct, and it doesn't mean "able" is the most reasonable choice. Compared with the one I am proposing, I don't see that the current solution has any significant advantage beyond the "proper modelling" for the scheduler that you are arguing for. Instead, the solution I'm proposing is a safe one, and one that makes bugs easy to avoid. If the GCC infrastructure can't even guarantee safe compilation, why should we insist on "proper modelling" for the scheduler only? Hopefully you can reconsider this. -Jiangning > > Jakub
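To make the failure mode concrete for readers of the archive, here is a hypothetical example (not the testcase from the patch, which is not visible here) of the wrong code PR38644 describes. On a target without a red zone, if the scheduler moves the epilogue's last frame access below the stack-pointer adjustment, an interrupt arriving between the two can clobber the value:

  /* Hypothetical illustration of the PR38644 failure mode.  */
  extern void consume (int *p);

  int
  f (void)
  {
    int local = 42;
    consume (&local);
    /* This read of 'local' must not be scheduled below the epilogue's
       stack-pointer increment: once the frame is deallocated, a signal
       or interrupt handler may overwrite that memory on targets with
       no red zone.  */
    return local;
  }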
RE: [PATCH] Fix stack red zone bug (PR38644)
> -Original Message- > From: Richard Guenther [mailto:richard.guent...@gmail.com] > Sent: Wednesday, September 28, 2011 4:39 PM > To: Jiangning Liu > Cc: Andrew Pinski; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > > On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu > wrote: > > > > > >> -Original Message- > >> From: Richard Guenther [mailto:richard.guent...@gmail.com] > >> Sent: Tuesday, September 27, 2011 3:41 PM > >> To: Jiangning Liu > >> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org > >> Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > >> > >> On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu > > >> wrote: > >> >> Think of it this way. What the IR says is there is no barrier > >> between > >> >> those moves. You either have an implicit barrier (which is what > you > >> >> are proposing) or you have it explicitly. I think we all rather > >> have > >> >> more things explicit rather than implicit in the IR. And that > has > >> >> been the overall feeling for a few years now. > >> >> > >> > > >> > Sorry, I'm afraid I can't agree with you. Instead, I think using a > >> barrier to describe this kind of dependence is a kind of implicit > >> method. Please note that this is not a usual data dependence issue. > >> The stack pointer change doesn't have any dependence on memory > access > >> at all. > >> > >> It is similar to atomic instructions that require being an > >> optimization/memory barrier. Sure it is not a usual data > dependence > >> (otherwise we would handle > >> it already), so the targets have to explicitly express the > >> dependence > >> somehow, for which we only have barriers right now. > >> > > > > Richard, > > > > Thanks for your explanation. It's explicit to the back-end, while it's > implicit > > to the scheduler in the middle end, because a barrier can decide dependence in > > the scheduler, but barriers can be generated from several different > scenarios. > > It's unsafe and prone to introducing bugs if any one of the scenarios > requiring > > a barrier is missed in the back-end. > > > > Between the middle-end and the back-end, we should have interfaces that are > easy > > for the back-end to implement. After all, the middle-end by itself doesn't > constitute a > > compiler, and vice versa. The back-end needs the middle-end's help to make > sure > > the back-end is easy to implement and to reduce the possibility of > introducing > > bugs. > > > > Without an explicit hook as I'm proposing, back-end implementers have > to > > know all barrier-generating scenarios very clearly, > because there > > are no code tips or comments in the middle end telling the back-end the > list of > > all scenarios that require barriers. > > > > Yes, the barrier is a perfect interface for the scheduler in theory. But from > > an engineering point of view, I think it's better to explicitly define > an > > interface to describe the stack red zone and inform the back-end, or vice > versa. Unlike > > computers, people easily make mistakes if you don't tell them. > On > > this bug, the fact is that over the years different back-ends have made similar > mistakes. > > > > GCC is really a perfect platform for building new ports, and I saw a > lot of > > new back-ends. The current middle end is unsafe if a port doesn't > support > > a stack red zone and its back-end doesn't generate a barrier for it. Why > can't we > > explicitly clarify this in compiler code between the middle end and the back > end? > > What if another back-end (new or old) NOT supporting a stack red zone > > exposes a similar bug again? 
> > There are a gazillion things you have to explicitly get right in your > backends, > so I don't see why exposing proper scheduling barriers should be > special, > and there, why red-zones should be special (as opposed to other > occasions > where you need to emit barriers from the backend for the scheduler). > Richard, This is because: 1) The current scheduler is unsafe if the back-end doesn't generate a barrier for a port that doesn't support a stack red zone at all. 2) Implementing the barrier in the back-end is a burden for every new back-end implementation for ports that don't support a stack red zone at all. 3) There are many other ports that haven't reported bugs around this; I doubt they are all free of the bug. 4) There are over 300 TARG
RE: [PATCH] Fix stack red zone bug (PR38644)
Just realized ChangeLog needs to be changed as well. ChangeLog: * config/i386/i386.c (ix86_using_red_zone): Remove inline. (TARGET_STACK_USING_RED_ZONE): New. * config/rs6000/rs6000.c (rs6000_stack_using_red_zone): New. (TARGET_STACK_USING_RED_ZONE): New. (offset_below_red_zone_p): Change to use new hook TARGET_STACK_USING_RED_ZONE. * doc/tm.texi (TARGET_STACK_USING_RED_ZONE): New. * doc/tm.texi.in (TARGET_STACK_USING_RED_ZONE): New. * sched-deps.c (sched_analyze_1): If the stack pointer is being modified and stack red zone is not supported for ports, flush out all memory references as they may become invalid if moved across the stack adjustment. * target.def (stack_using_red_zone): New. * testsuite/gcc.target/arm/stack-red-zone.c: New. > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Jiangning Liu > Sent: Wednesday, September 28, 2011 2:24 PM > To: 'Uros Bizjak' > Cc: gcc-patches@gcc.gnu.org; j...@suse.cz; geo...@geoffk.org; > dje@gmail.com; r...@redhat.com; Richard Earnshaw; Matthew Gretton- > Dann > Subject: RE: [PATCH] Fix stack red zone bug (PR38644) > > > > -static inline bool > > > +extern bool > > > > static bool > > > > > ix86_using_red_zone (void) > > > { > > > return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; > > > @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) > > > #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail > > > #endif > > > > > ... > > > > > > > +/* Return true if the ABI allows red zone access. */ > > > +extern bool > > > > static bool > > > > > +rs6000_stack_using_red_zone (void) > > > +{ > > > + return (DEFAULT_ABI != ABI_V4); > > > +} > > > + > > > > Uros, > > Thanks for your review. Accept and new patch is as below. No change for > ChangeLog. > > Thanks, > -Jiangning > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index ff8c49f..e200974 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -2631,7 +2631,7 @@ static const char *const > cpu_names[TARGET_CPU_DEFAULT_max] = > > > /* Return true if a red-zone is in use. */ > > -static inline bool > +static bool > ix86_using_red_zone (void) > { >return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; > @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) > #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail > #endif > > +#undef TARGET_STACK_USING_RED_ZONE > +#define TARGET_STACK_USING_RED_ZONE ix86_using_red_zone > + > #undef TARGET_FUNCTION_VALUE > #define TARGET_FUNCTION_VALUE ix86_function_value > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index 1ab57e5..1e64d14 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -1537,6 +1537,9 @@ static const struct attribute_spec > rs6000_attribute_table[] = > #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail > #endif > > +#undef TARGET_STACK_USING_RED_ZONE > +#define TARGET_STACK_USING_RED_ZONE rs6000_stack_using_red_zone > + > /* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors > The PowerPC architecture requires only weak consistency among > processors--that is, memory accesses between processors need not be > @@ -20660,6 +20663,13 @@ rs6000_restore_saved_cr (rtx reg, int > using_mfcr_multiple) > } > } > > +/* Return true if the ABI allows red zone access. */ > +static bool > +rs6000_stack_using_red_zone (void) > +{ > + return (DEFAULT_ABI != ABI_V4); > +} > + > /* Return true if OFFSET from stack pointer can be clobbered by > signals. 
> V.4 doesn't have any stack cushion, AIX ABIs have 220 or 288 bytes > below stack pointer not cloberred by signals. */ > @@ -20667,7 +20677,7 @@ rs6000_restore_saved_cr (rtx reg, int > using_mfcr_multiple) > static inline bool > offset_below_red_zone_p (HOST_WIDE_INT offset) > { > - return offset < (DEFAULT_ABI == ABI_V4 > + return offset < (!TARGET_STACK_USING_RED_ZONE > ? 0 > : TARGET_32BIT ? -220 : -288); > } > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > index 335c1d1..c848806 100644 > --- a/gcc/doc/tm.texi > +++ b/gcc/doc/tm.texi > @@ -11332,6 +11332,22 @@ to the stack. Therefore, this hook should > return > true in general, but > false for naked functions. The default implementation always ret
RE: [PATCH] Fix stack red zone bug (PR38644)
> > -static inline bool > > +extern bool > > static bool > > > ix86_using_red_zone (void) > > { > > return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; > > @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) > > #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail > > #endif > > ... > > > > +/* Return true if the ABI allows red zone access. */ > > +extern bool > > static bool > > > +rs6000_stack_using_red_zone (void) > > +{ > > + return (DEFAULT_ABI != ABI_V4); > > +} > > + > Uros, Thanks for your review. Accept and new patch is as below. No change for ChangeLog. Thanks, -Jiangning diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index ff8c49f..e200974 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2631,7 +2631,7 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] = /* Return true if a red-zone is in use. */ -static inline bool +static bool ix86_using_red_zone (void) { return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE ix86_using_red_zone + #undef TARGET_FUNCTION_VALUE #define TARGET_FUNCTION_VALUE ix86_function_value diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 1ab57e5..1e64d14 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1537,6 +1537,9 @@ static const struct attribute_spec rs6000_attribute_table[] = #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE rs6000_stack_using_red_zone + /* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors The PowerPC architecture requires only weak consistency among processors--that is, memory accesses between processors need not be @@ -20660,6 +20663,13 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) } } +/* Return true if the ABI allows red zone access. */ +static bool +rs6000_stack_using_red_zone (void) +{ + return (DEFAULT_ABI != ABI_V4); +} + /* Return true if OFFSET from stack pointer can be clobbered by signals. V.4 doesn't have any stack cushion, AIX ABIs have 220 or 288 bytes below stack pointer not cloberred by signals. */ @@ -20667,7 +20677,7 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) static inline bool offset_below_red_zone_p (HOST_WIDE_INT offset) { - return offset < (DEFAULT_ABI == ABI_V4 + return offset < (!TARGET_STACK_USING_RED_ZONE ? 0 : TARGET_32BIT ? -220 : -288); } diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 335c1d1..c848806 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11332,6 +11332,22 @@ to the stack. Therefore, this hook should return true in general, but false for naked functions. The default implementation always returns true. @end deftypefn +@deftypefn {Target Hook} bool TARGET_STACK_USING_RED_ZONE (void) +This hook returns true if the target has a red zone (an area beyond the +current extent of the stack that cannot be modified by asynchronous events +on the processor). + +If this hook returns false then the compiler mid-end will not move an access +to memory in the stack frame past a stack adjustment insn. + +If this hook returns true then the compiler mid-end will assume that it is +safe to move an access to memory in the stack frame past a stack adjustment +insn. The target back-end must emit scheduling barrier insns when this is +unsafe. 
+ +The default is to return false which is safe and appropriate for most targets. +@end deftypefn + @deftypevr {Target Hook} {unsigned HOST_WIDE_INT} TARGET_CONST_ANCHOR On some architectures it can take multiple instructions to synthesize a constant. If there is another constant already in a register that diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 6783826..0fa9e10 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -11223,6 +11223,22 @@ to the stack. Therefore, this hook should return true in general, but false for naked functions. The default implementation always returns true. @end deftypefn +@hook TARGET_STACK_USING_RED_ZONE +This hook returns true if the target has a red zone (an area beyond the +current extent of the stack that cannot be modified by asynchronous events +on the processor). + +If this hook returns false then the compiler mid-end will not move an access +to memory in the stack frame past a stack adjustment insn. + +If this hook returns true then the compiler mid-end will assume that it is +safe to move an access to memory in the stack frame past a stack adjustment +insn. The target back-end must emit scheduling barrier insns when this is +unsafe. + +The default is to return false which is safe and appropriate for most targets. +@end deftypefn + @hook TARGET_CONST_ANCHOR On
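Note that the sched-deps.c hunk named in the ChangeLog is truncated out of every copy of the patch in this archive. A minimal sketch, assuming the stack_using_red_zone hook that the ChangeLog adds to target.def, of what the described sched_analyze_1 change could look like (flush_pending_lists is the existing sched-deps.c helper; the real patch's placement and details may differ):

  /* Sketch only: inside sched_analyze_1, at the point where DEST has
     been reduced to a REG.  If this insn modifies the stack pointer
     and the target has no red zone, make every pending memory
     reference depend on it, so that no frame access can be moved
     across the stack adjustment.  */
  if (dest == stack_pointer_rtx
      && !targetm.stack_using_red_zone ())
    flush_pending_lists (deps, insn, true, true);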
RE: [PATCH] Fix stack red zone bug (PR38644)
> -Original Message- > From: Richard Guenther [mailto:richard.guent...@gmail.com] > Sent: Tuesday, September 27, 2011 3:41 PM > To: Jiangning Liu > Cc: Andrew Pinski; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > > On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu > wrote: > >> Think of it this way. What the IR says is there is no barrier > between > >> those moves. You either have an implicit barrier (which is what you > >> are proposing) or you have it explicitly. I think we all rather > have > >> more things explicit rather than implicit in the IR. And that has > >> been the overall feeling for a few years now. > >> > > > > Sorry, I'm afraid I can't agree with you. Instead, I think using a > barrier to describe this kind of dependence is a kind of implicit > method. Please note that this is not a usual data dependence issue. > The stack pointer change doesn't have any dependence on memory access > at all. > > It is similar to atomic instructions that require being an > optimization/memory barrier. Sure it is not a usual data dependence > (otherwise we would handle > it already), so the targets have to explicitly express the dependence > somehow, for which we only have barriers right now. > Richard, Thanks for your explanation. It's explicit to the back-end, while it's implicit to the scheduler in the middle end, because a barrier can decide dependence in the scheduler, but barriers can be generated from several different scenarios. It's unsafe and prone to introducing bugs if any one of the scenarios requiring a barrier is missed in the back-end. Between the middle-end and the back-end, we should have interfaces that are easy for the back-end to implement. After all, the middle-end by itself doesn't constitute a compiler, and vice versa. The back-end needs the middle-end's help to make sure the back-end is easy to implement and to reduce the possibility of introducing bugs. Without an explicit hook as I'm proposing, back-end implementers have to know all barrier-generating scenarios very clearly, because there are no code tips or comments in the middle end telling the back-end the list of all scenarios that require barriers. Yes, the barrier is a perfect interface for the scheduler in theory. But from an engineering point of view, I think it's better to explicitly define an interface to describe the stack red zone and inform the back-end, or vice versa. Unlike computers, people easily make mistakes if you don't tell them. On this bug, the fact is that over the years different back-ends have made similar mistakes. GCC is really a perfect platform for building new ports, and I saw a lot of new back-ends. The current middle end is unsafe if a port doesn't support a stack red zone and its back-end doesn't generate a barrier for it. Why can't we explicitly clarify this in compiler code between the middle end and the back end? What if another back-end (new or old) NOT supporting a stack red zone exposes a similar bug again? Thanks, -Jiangning > Richard. > > No matter what the solution itself is, the problem itself is quite a > common one at the ISA level, so we should solve it in the middle-end, because > the middle-end is shared by all ports. > > > > My proposal avoids problems in the future. New ports and new back-end > implementations needn't explicitly define this hook, which is very safe. > But if one port wants to "unsafely" introduce a red zone, we can > explicitly define this hook in the back-end. This way, we would avoid the > bug at the earliest time. Do you really want to hide this problem in > the back-end silently? And wait for others to report bugs and then waste time > getting them fixed again? > > The fact I see is that over the years different ports have reported > similar issues around this, and somebody tried to fix them. Actually, > this kind of problem should be fixed at the design level. If the first > people to solve this bug had given a solution in the middle end, we would not > need to waste time filing those bugs in Bugzilla and fixing them > one by one. > > > > Thanks, > > -Jiangning
RE: [PATCH] Fix stack red zone bug (PR38644)
> Think of it this way. What the IR says is there is no barrier between > those moves. You either have an implicit barrier (which is what you > are proposing) or you have it explicitly. I think we all rather have > more things explicit rather than implicit in the IR. And that has > been the overall feeling for a few years now. > Sorry, I'm afraid I can't agree with you. Instead, I think using a barrier to describe this kind of dependence is a kind of implicit method. Please note that this is not a usual data dependence issue. The stack pointer change doesn't have any dependence on memory access at all. No matter what the solution itself is, the problem itself is quite a common one at the ISA level, so we should solve it in the middle-end, because the middle-end is shared by all ports. My proposal avoids problems in the future. New ports and new back-end implementations needn't explicitly define this hook, which is very safe. But if one port wants to "unsafely" introduce a red zone, we can explicitly define this hook in the back-end. This way, we would avoid the bug at the earliest time. Do you really want to hide this problem in the back-end silently? And wait for others to report bugs and then waste time getting them fixed again? The fact I see is that over the years different ports have reported similar issues around this, and somebody tried to fix them. Actually, this kind of problem should be fixed at the design level. If the first people to solve this bug had given a solution in the middle end, we would not need to waste time filing those bugs in Bugzilla and fixing them one by one. Thanks, -Jiangning
RE: [PATCH] Fix stack red zone bug (PR38644)
Fix a typo and CC x86/rs6000/arm ports maintainers. ChangeLog: * config/i386/i386.c (ix86_stack_using_red_zone): Change inline to be extern. (TARGET_STACK_USING_RED_ZONE): New. * config/rs6000/rs6000.c (rs6000_stack_using_red_zone): New. (TARGET_STACK_USING_RED_ZONE): New. (offset_below_red_zone_p): Change to use new hook TARGET_STACK_USING_RED_ZONE. * doc/tm.texi (TARGET_STACK_USING_RED_ZONE): New. * doc/tm.texi.in (TARGET_STACK_USING_RED_ZONE): New. * sched-deps.c (sched_analyze_1): If the stack pointer is being modified and stack red zone is not supported for ports, flush out all memory references as they may become invalid if moved across the stack adjustment. * target.def (stack_using_red_zone): New. * testsuite/gcc.target/arm/stack-red-zone.c: New. Thanks, -Jiangning diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index ff8c49f..1c16391 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2631,7 +2631,7 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] = /* Return true if a red-zone is in use. */ -static inline bool +extern bool ix86_using_red_zone (void) { return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE ix86_using_red_zone + #undef TARGET_FUNCTION_VALUE #define TARGET_FUNCTION_VALUE ix86_function_value diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 1ab57e5..411cb09 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1537,6 +1537,9 @@ static const struct attribute_spec rs6000_attribute_table[] = #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE rs6000_stack_using_red_zone + /* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors The PowerPC architecture requires only weak consistency among processors--that is, memory accesses between processors need not be @@ -20660,6 +20663,13 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) } } +/* Return true if the ABI allows red zone access. */ +extern bool +rs6000_stack_using_red_zone (void) +{ + return (DEFAULT_ABI != ABI_V4); +} + /* Return true if OFFSET from stack pointer can be clobbered by signals. V.4 doesn't have any stack cushion, AIX ABIs have 220 or 288 bytes below stack pointer not cloberred by signals. */ @@ -20667,7 +20677,7 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) static inline bool offset_below_red_zone_p (HOST_WIDE_INT offset) { - return offset < (DEFAULT_ABI == ABI_V4 + return offset < (!TARGET_STACK_USING_RED_ZONE ? 0 : TARGET_32BIT ? -220 : -288); } diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 335c1d1..c848806 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11332,6 +11332,22 @@ to the stack. Therefore, this hook should return true in general, but false for naked functions. The default implementation always returns true. @end deftypefn +@deftypefn {Target Hook} bool TARGET_STACK_USING_RED_ZONE (void) +This hook returns true if the target has a red zone (an area beyond the +current extent of the stack that cannot be modified by asynchronous events +on the processor). + +If this hook returns false then the compiler mid-end will not move an access +to memory in the stack frame past a stack adjustment insn. 
+ +If this hook returns true then the compiler mid-end will assume that it is +safe to move an access to memory in the stack frame past a stack adjustment +insn. The target back-end must emit scheduling barrier insns when this is +unsafe. + +The default is to return false which is safe and appropriate for most targets. +@end deftypefn + @deftypevr {Target Hook} {unsigned HOST_WIDE_INT} TARGET_CONST_ANCHOR On some architectures it can take multiple instructions to synthesize a constant. If there is another constant already in a register that diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 6783826..0fa9e10 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -11223,6 +11223,22 @@ to the stack. Therefore, this hook should return true in general, but false for naked functions. The default implementation always returns true. @end deftypefn +@hook TARGET_STACK_USING_RED_ZONE +This hook returns true if the target has a red zone (an area beyond the +current extent of the stack that cannot be modified by asynchronous events +on the processor). + +If this hook returns false then the compiler mid-end will not move an access +to memory in the stack frame past a stack adjustment insn. + +If this hook returns true then the compiler mid-end will assume that
RE: [PATCH] Fix stack red zone bug (PR38644)
> -Original Message- > From: Andrew Pinski [mailto:pins...@gmail.com] > Sent: Tuesday, September 27, 2011 5:31 AM > To: Jiangning Liu > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > > On Mon, Sep 26, 2011 at 3:26 AM, Jiangning Liu > wrote: > > This patch fixes PR38644, a 3-year-old bug. > > > > From the discussions on the mailing list and in Bugzilla, I think the middle > end fix > > is the common view. Although there are still some gaps on how to fix > it in > > the middle end, I think this patch at least moves the problem from the back- > end to > > the middle-end, which makes the GCC infrastructure stronger than before. > Otherwise, > > any back-end needs to "find" and fix this problem by itself. > > I don't see why you think that is the common view that the fix should > be in the middle-end. I and a few others think the back-end should be > where the barrier is emitted from. The middle-end should not have stack > accesses as being special in any way when it comes down to scheduling. > What happens when a new scheduler comes around? Then this has to be > fixed again in that new scheduler rather than having the barrier doing > the correct thing for all schedulers. > Hi Andrew, Thanks for your kind response! Sorry, but for this bug I didn't previously see your comments in either Bugzilla or the [RFC] I sent out a long time ago on the gcc mailing list. What I saw was a bunch of people agreeing that this problem should be fixed in the middle end. For example, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644#c48 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644#c49 My comments, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644#c35 I'd like to repeat my points here. As I mentioned, it shouldn't be the back-end's responsibility to "find" this problem, i.e. instructions moving across a stack pointer change. ISAs can be split into two classes: one class doesn't support a stack red zone, and the other class does. If we agree this is a general issue, I think this issue should be solved in the middle end rather than in the back-end. Yes, the back-end can provide a barrier to explicitly represent the dependence, but I think this is a kind of workaround, because this way we are hoping the back-end will detect this kind of dependence, while the dependence is common to every back-end that doesn't support a stack red zone. If we carefully analyze the implementations in the x86 and powerpc back-ends supporting a stack red zone, we may find they are doing almost the same things. In particular, I think retarget-ability is the most valuable treasure for GCC. Thinking of implementing a new port, do we want to write new code to "find" this problem, no matter whether the new port supports a stack red zone or not? If the answer is YES, it means we need to write code similar to what the x86-64 and powerpc back-ends are currently doing. Obviously, that is non-trivial work. I don't think that is good for GCC's retarget-ability. Why don't we abstract the common part in the middle-end with a very clean interface? Finally, it's OK with me if you say the scheduler can only treat dependence as a "clean" interface. But this point doesn't imply that the stack red zone issue must be handled in each back-end separately. If we want to keep the scheduler clean enough, a simpler solution is to add a pass in the middle-end that generates barriers before the scheduler; this pass would handle the general stack-red-zone issue using the interface I'm defining in the patch, but obviously this is a kind of over-design so far. 
Thanks, -Jiangning > Thanks, > Andrew Pinski
[PATCH] Fix stack red zone bug (PR38644)
This patch fixes PR38644, a 3-year-old bug. From the discussions on the mailing list and in Bugzilla, I think the middle end fix is the common view. Although there are still some gaps on how to fix it in the middle end, I think this patch at least moves the problem from the back-end to the middle-end, which makes the GCC infrastructure stronger than before. Otherwise, any back-end needs to "find" and fix this problem by itself. Since this bug has been pinged many times by customers, I think at least we can move on with this patch. If necessary, we can then follow up to build an even better solution in the middle end later on. The patch is tested on x86 and ARM. ChangeLog: * config/i386/i386.c (ix86_stack_using_red_zone): Change inline to be extern. (TARGET_STACK_USING_RED_ZONE): New. * config/rs6000/rs6000.c (rs6000_stack_using_red_zone): New. (TARGET_STACK_USING_RED_ZONE): New. (offset_below_red_zone_p): Change to use new hook TARGET_STACK_USING_RED_ZONE. * doc/tm.texi (TARGET_STACK_USING_RED_ZONE): New. * doc/tm.texi.in (TARGET_STACK_USING_RED_ZONE): New. * sched-deps.c (sched_analyze_1): If the stack pointer is being modified and stack red zone is not supported for ports, flush out all memory references as they may become invalid if moved across the stack adjustment. * target.def (stack_using_red_zone): New. * testsuite/gcc.target/arm/stack-red-zone.c: New. Thanks, -Jiangning diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index ff8c49f..ce486da 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2631,8 +2631,8 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] = /* Return true if a red-zone is in use. */ -static inline bool -ix86_using_red_zone (void) +extern bool +ix86_stack_using_red_zone (void) { return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; } @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE ix86_stack_using_red_zone + #undef TARGET_FUNCTION_VALUE #define TARGET_FUNCTION_VALUE ix86_function_value diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 1ab57e5..411cb09 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1537,6 +1537,9 @@ static const struct attribute_spec rs6000_attribute_table[] = #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE rs6000_stack_using_red_zone + /* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors The PowerPC architecture requires only weak consistency among processors--that is, memory accesses between processors need not be @@ -20660,6 +20663,13 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) } } +/* Return true if the ABI allows red zone access. */ +extern bool +rs6000_stack_using_red_zone (void) +{ + return (DEFAULT_ABI != ABI_V4); +} + /* Return true if OFFSET from stack pointer can be clobbered by signals. V.4 doesn't have any stack cushion, AIX ABIs have 220 or 288 bytes below stack pointer not cloberred by signals. */ @@ -20667,7 +20677,7 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) static inline bool offset_below_red_zone_p (HOST_WIDE_INT offset) { - return offset < (DEFAULT_ABI == ABI_V4 + return offset < (!TARGET_STACK_USING_RED_ZONE ? 0 : TARGET_32BIT ? 
-220 : -288); } diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 335c1d1..c848806 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11332,6 +11332,22 @@ to the stack. Therefore, this hook should return true in general, but false for naked functions. The default implementation always returns true. @end deftypefn +@deftypefn {Target Hook} bool TARGET_STACK_USING_RED_ZONE (void) +This hook returns true if the target has a red zone (an area beyond the +current extent of the stack that cannot be modified by asynchronous events +on the processor). + +If this hook returns false then the compiler mid-end will not move an access +to memory in the stack frame past a stack adjustment insn. + +If this hook returns true then the compiler mid-end will assume that it is +safe to move an access to memory in the stack frame past a stack adjustment +insn. The target back-end must emit scheduling barrier insns when this is +unsafe. + +The default is to return false which is safe and appropriate for most targets. +@end deftypefn + @deftypevr {Target Hook} {unsigned HOST_WIDE_INT} TARGET_CONST_ANCHOR On some architectures it can take multiple instructions to synthesize a constant. If there is another constant already in a register that diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 6783826..0fa9e10 100644 --- a/gcc/doc/tm.texi.in +++ b
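The new gcc.target/arm/stack-red-zone.c test named in the ChangeLog is likewise not visible in the archive. A hypothetical sketch of the kind of test it describes -- ARM has no red zone, so a frame access must not be scheduled below the epilogue's stack adjustment -- might look like this; the actual test and its final scan directive may differ:

  /* Hypothetical reconstruction; not the committed testcase.  */
  /* { dg-do compile } */
  /* { dg-options "-O2" } */

  extern int consume (char *);

  int
  f (void)
  {
    char buf[8];
    return consume (buf);
  }

  /* A dg-final scan-assembler directive would then verify that no load
     from the frame appears after the stack-pointer adjustment.  */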
RE: [PATCH, testsuite] Add loop unrolling command line options for some test cases
> -----Original Message-----
> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> Sent: Saturday, September 24, 2011 6:12 PM
> To: Jiangning Liu
> Cc: Mike Stump; gcc-patches@gcc.gnu.org; r...@cebitec.uni-bielefeld.de
> Subject: Re: [PATCH, testsuite] Add loop unrolling command line options
> for some test cases
>
> On Thu, Sep 22, 2011 at 6:22 AM, Jiangning Liu wrote:
> > Hi Mike,
> >
> > OK. I will wait 24 more hours. If no objections by then, I will get it
> > checked into trunk.
>
> I don't think you need -funroll-loops though.

The intention of those test cases is not hurt if -funroll-loops is added, right?

> > Thanks,
> > -Jiangning
> >
> >> -----Original Message-----
> >> From: Mike Stump [mailto:mikest...@comcast.net]
> >> Sent: Thursday, September 22, 2011 3:10 AM
> >> To: Jiangning Liu
> >> Cc: gcc-patches@gcc.gnu.org; r...@cebitec.uni-bielefeld.de
> >> Subject: Re: [PATCH, testsuite] Add loop unrolling command line
> >> options for some test cases
> >>
> >> On Sep 21, 2011, at 1:22 AM, Jiangning Liu wrote:
> >> > The fix is to explicitly turn on loop unroll and set max-unroll-times
> >> > to 8, which is larger than the unrolling times being detected in
> >> > the cases.
> >>
> >> Sounds reasonable to me. Ok, though, do watch for any comments by
> >> people that know more than I.
RE: [PATCH, testsuite] Add loop unrolling command line options for some test cases
Hi Mike,

OK. I will wait 24 more hours. If no objections by then, I will get it checked into trunk.

Thanks,
-Jiangning

> -----Original Message-----
> From: Mike Stump [mailto:mikest...@comcast.net]
> Sent: Thursday, September 22, 2011 3:10 AM
> To: Jiangning Liu
> Cc: gcc-patches@gcc.gnu.org; r...@cebitec.uni-bielefeld.de
> Subject: Re: [PATCH, testsuite] Add loop unrolling command line options
> for some test cases
>
> On Sep 21, 2011, at 1:22 AM, Jiangning Liu wrote:
> > The fix is to explicitly turn on loop unroll and set max-unroll-times
> > to 8, which is larger than the unrolling times being detected in the
> > cases.
>
> Sounds reasonable to me. Ok, though, do watch for any comments by
> people that know more than I.
[PATCH, testsuite] Add loop unrolling command line options for some test cases
Hi,

The following test cases check the predictive commoning optimization by detecting loop unrolling times. By default max-unroll-times is 8; if it is specified to be less than the unrolling times the tests expect, those cases fail. The fix is to explicitly turn on loop unrolling and set max-unroll-times to 8, which is larger than the unrolling times being detected in the cases. A sketch of the kind of loop involved follows the patch below.

ChangeLog:

2011-09-14  Jiangning Liu

	* gcc/testsuite/gcc.dg/tree-ssa/predcom-1.c: Explicitly turn on
	loop unroll and set max unroll times to 8.
	* gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c: Likewise.
	* gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c: Likewise.
	* gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c: Likewise.
	* gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c: Likewise.

Thanks,
-Jiangning

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-1.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-1.c
index 16bd5c9..f1e52e5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-do run } */
-/* { dg-options "-O2 -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details" } */

 void abort (void);

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
index 7275f28..27e53ee 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-do run } */
-/* { dg-options "-O2 -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details" } */

 void abort (void);

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c
index d500234..5dfe384 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details" } */

 int a[1000], b[1000];

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c
index 6f06b7f..c29a46a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-do run } */
-/* { dg-options "-O2 -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details" } */

 /* Test for predictive commoning of expressions, without reassociation.  */

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c
index 134fc37..29444ab 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-do run } */
-/* { dg-options "-O2 -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details" } */

 /* Test for predictive commoning of expressions, with reassociation.  */
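For background on why the unroll factor matters here, the loops these tests exercise carry stored values to later iterations. The following is an illustrative sketch, not the body of any predcom-*.c test above:

/* Illustrative sketch only.  The value stored to a[i + 2] is reloaded
   two iterations later as a[i].  Predictive commoning keeps such
   values in registers, and avoiding register copies on every
   iteration requires unrolling the loop by the reuse distance (2
   here) -- which is why the pcom dump these tests scan depends on
   max-unroll-times.  */
void
f (int *a, int *b, int n)
{
  int i;
  for (i = 0; i < n; i++)
    a[i + 2] = a[i] + b[i];
}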
[PATCH, testsuite, ARM] Change to expected failure for g++.dg/abi/local1.C on ARM target
Here is a patch changing the case g++.dg/abi/local1.C to an expected failure for the ARM target. In the "C++ ABI for the ARM architecture", it says:

  This runs contrary to §2.9.1 of [GC++ABI] which states:

    It is intended that two type_info pointers point to equivalent type
    descriptions if and only if the pointers are equal. An implementation
    must satisfy this constraint, e.g. by using symbol preemption, COMDAT
    sections, or other mechanisms.

  Fortunately, we can ignore this requirement without violating the C++
  standard provided that:

  * type_info::operator== and type_info::operator!= compare the strings
    returned by type_info::name(), not just the pointers to the RTTI
    objects and their names.
  * No reliance is placed on the address returned by type_info::name().
    (That is, t1.name() != t2.name() does not imply that t1 != t2).

The patch is:

diff --git a/gcc/testsuite/g++.dg/abi/local1.C b/gcc/testsuite/g++.dg/abi/local1.C
index 518193c..7f08a8f 100644
--- a/gcc/testsuite/g++.dg/abi/local1.C
+++ b/gcc/testsuite/g++.dg/abi/local1.C
@@ -1,4 +1,4 @@
-// { dg-do run }
+// { dg-do run { xfail { arm*-*-eabi* } } }
 // { dg-additional-sources "local1-a.cc" }

 #include

ChangeLog:

2011-09-14  Jiangning Liu

	* g++.dg/abi/local1.C: Change to XFAIL for ARM EABI target.

Thanks,
-Jiangning
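To see what the quoted ABI wording permits, consider this sketch; it is hypothetical code illustrating the rule, not part of local1.C:

// Under the ARM EABI, two type_info objects for the same type may live
// at different addresses (RTTI objects are not merged across shared
// objects), so an equality test must be able to fall back to comparing
// the name strings rather than only the pointers.
#include <typeinfo>
#include <cstring>

static bool
same_type (const std::type_info &a, const std::type_info &b)
{
  // Pointer equality is sufficient, but under the ARM EABI it is not
  // necessary; the name strings are the authoritative comparison.
  return &a == &b || std::strcmp (a.name (), b.name ()) == 0;
}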
RE: [PATCH, testsuite, ARM] change XFAIL to pass for ARM on a case testing tree-ssa-dom
PING...

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Jiangning Liu
> Sent: Friday, August 26, 2011 5:56 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH, testsuite, ARM] change XFAIL to pass for ARM on a case
> testing tree-ssa-dom
>
> Test case gcc.dg/tree-ssa/20040204-1.c can pass for -O1 after Richard
> Guenther fixed something in tree-ssa-dom. The "link_error" should be
> optimized away for ARM targets as well.
>
> The patch is:
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
> b/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
> index 45e44a1..470b585 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
> @@ -33,5 +33,5 @@ void test55 (int x, int y)
>     that the && should be emitted (based on BRANCH_COST).  Fix this
>     by teaching dom to look through && and register all components
>     as true.  */
> -/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-*" } } } } */
> +/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* arm*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-*" } } } } */
>  /* { dg-final { cleanup-tree-dump "optimized" } } */
>
> gcc/testsuite/ChangeLog:
>
> 2011-08-26  Jiangning Liu
>
>	PR tree-optimization/46021
>	* gcc.dg/tree-ssa/20040204-1.c: Don't XFAIL on arm*-*-*.
>
> Thanks,
> -Jiangning
[PATCH, testsuite, ARM] change XFAIL to pass for ARM on a case testing tree-ssa-dom
Test case gcc.dg/tree-ssa/20040204-1.c can pass for -O1 after Richard Guenther fixed something in tree-ssa-dom. The "link_error" should be optimized away for ARM targets as well.

The patch is:

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c b/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
index 45e44a1..470b585 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
@@ -33,5 +33,5 @@ void test55 (int x, int y)
    that the && should be emitted (based on BRANCH_COST).  Fix this
    by teaching dom to look through && and register all components
    as true.  */
-/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-*" } } } } */
+/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* arm*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-*" } } } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */

gcc/testsuite/ChangeLog:

2011-08-26  Jiangning Liu

	PR tree-optimization/46021
	* gcc.dg/tree-ssa/20040204-1.c: Don't XFAIL on arm*-*-*.

Thanks,
-Jiangning
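For readers unfamiliar with the idiom, the scan above relies on link_error never being emitted. The following is a schematic of that idiom, not the actual body of 20040204-1.c:

/* Schematic of the link_error idiom: link_error is declared but never
   defined, so the test only links (and the "optimized" dump only scans
   clean) if the optimizer proves every call unreachable and deletes
   it.  */
extern void link_error (void);

void
test (int x, int y)
{
  if (x == 0 && y == 0)
    {
      /* DOM should register x == 0 as known here and fold this away.  */
      if (x != 0)
        link_error ();
    }
}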
RE: [PATCH, ARM] Generate conditional compares in Thumb2 state
Attached is the new patch file. Please review!

ChangeLog:

2011-08-18  Jiangning Liu

	* config/arm/arm.md (*ior_scc_scc): Enable for Thumb2 as well.
	(*ior_scc_scc_cmp): Likewise.
	(*and_scc_scc): Likewise.
	(*and_scc_scc_cmp): Likewise.
	(*and_scc_scc_nodom): Likewise.
	(*cmp_ite0, *cmp_ite1, *cmp_and, *cmp_ior): Handle Thumb2.

The testsuite ChangeLog would be:

2011-08-18  Jiangning Liu

	* gcc.target/arm/thumb2-cond-cmp-1.c: New. Make sure conditional
	compare can be generated.
	* gcc.target/arm/thumb2-cond-cmp-2.c: Likewise.
	* gcc.target/arm/thumb2-cond-cmp-3.c: Likewise.
	* gcc.target/arm/thumb2-cond-cmp-4.c: Likewise.

Regression tests against the cortex-M0/M3/M4 profiles with the "-mthumb" option don't show any new failures.

Thanks,
-Jiangning

fix_cond_cmp_5.patch
Description: Binary data
RE: [PATCH, ARM] Generate conditional compares in Thumb2 state
Ramana,

I only see the following three combinations being meaningful:

* cmp l, lPy  // keep cmp, and length is 2, on thumb
* cmp r, rI   // keep cmp, and length is 4
* cmp r, L    // convert to cmn, and length is 4

According to the ARM ARM, for a negative immediate all encodings of cmp/cmn are 4 bytes long, i.e.

* CMP: encodings T2 and A1
* CMN: encodings T1 and A1

so we needn't distinguish Pw and Pv.

> Length is 8 bytes if you have Pw and L

For this case, the length should be 10 for Thumb2 instead.

Finally, if we want to describe all possibilities in constraints, we would have the following 9 combinations:

* cmp1 has operands (l,  l,  l,  r, r, r, r, r, r)
                    (lPy,lPy,lPy,rI,rI,rI,L, L, L)
* cmp2 has operands (l,  r,  r,  l, r, r, l, r, r)
                    (lPy,rI, L,  lPy,rI,L, lPy,rI,L)

So the length would be like below (I don't know how to write it in the attribute section yet):

if (TARGET_THUMB2)
  (set_attr "length" "6,8,8,8,10,10,8,10,10")
else
  (set_attr "length" "4,6,6,6,8,8,6,8,8")

Does it make sense?

Thanks,
-Jiangning

> -----Original Message-----
> From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org]
> Sent: Wednesday, August 10, 2011 6:40 PM
> To: Jiangning Liu
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH, ARM] Generate conditional compares in Thumb2 state
>
> On 10 August 2011 09:20, Jiangning Liu wrote:
> > PING...
> >
> > BTW, in patch fix_cond_cmp_2.patch, the file mode of thumb2.md is
> > carelessly changed, so please check attached new patch file
> > fix_cond_cmp_3.patch.
>
> Please do not top post.
>
> I've been away for the whole of last week and then been away from my
> desk for the first 2 days this week and am still catching up with email.
>
> I'm missing a Changelog entry for this. Do I assume it is the same?
>
> I've just noticed that the length attribute isn't correct for Thumb2
> state. When I last looked at this I must have missed the case where the
> cmp and the cmn are both 32 bit instructions.
>
> The length can vary between 6 and 10 bytes for the Thumb2 variant from
> my reading of the ARM-ARM.
>
> i.e. cmp reg, <8bitconstant>
>      it
>      cmn reg, <8bitconstant>
>
> Length = 6 bytes
>
> or even with
>      cmp reg, reg
>      it
>      cmn reg, reg
>
> All registers between r0 and r7.
>
> Length is 6 bytes if you have this for both l constraints.
> Length is 6 bytes - you should catch this with Pw and Pv respectively.
> Length is 8 bytes if you have Pw and L
> Length is 8 bytes if you have I and Pv
> Length is 10 bytes if you have I and L.
>
> Showing you an example with l and Py at this point of time and leaving
> the rest as homework. Yes it will duplicate a few cases but it helps
> getting the lengths absolutely right.
>
> (define_insn "*cmp_ite0"
>   [(set (match_operand 6 "dominant_cc_register" "")
>         (compare
>          (if_then_else:SI
>           (match_operator 4 "arm_comparison_operator"
>            [(match_operand:SI 0 "s_register_operand" "l,r,r,r,r")
>             (match_operand:SI 1 "arm_add_operand" "lPy,rI,L,rI,L")])
>           (match_operator:SI 5 "arm_comparison_operator"
>            [(match_operand:SI 2 "s_register_operand" "l,r,r,r,r")
>             (match_operand:SI 3 "arm_add_operand" "lPy,rI,rI,L,L")])
>           (const_int 0))
>          (const_int 0)))]
>   "TARGET_32BIT"
>   "*
>   {
>     static const char * const cmp1[5][2] =
>     {
>       {\"cmp\\t%2, %3\",
>        \"cmp\\t%0, %1\"},
>       {\"cmp\\t%2, %3\",
>        \"cmp\\t%0, %1\"},
>       {\"cmp\\t%2, %3\",
>        \"cmn\\t%0, #%n1\"},
>       {\"cmn\\t%2, #%n3\",
>        \"cmp\\t%0, %1\"},
>       {\"cmn\\t%2, #%n3\",
>        \"cmn\\t%0, #%n1\"}
>     };
>     static const char * const cmp2[5][2] =
>     {
>       {\"cmp%d5\\t%0, %1\",
>        \"cmp%d4\\t%2, %3\"},
>       {\"cmp%d5\\t%0, %1\",
>        \"cmp%d4\\t%2, %3\"},
>       {\"cmn%d5\\t%0, #%n1\",
>        \"cmp%d4\\t%2, %3\"},
>       {\"cmp%d5\\t%0, %1\",
>        \"cmn%d4\\t%2, #%n3\"},
>       {\"cmn%d5\\t%0, #%n1\",
>        \"cmn%d4\\t%2, #%n3\"}
>     };
>     static const char * const ite[2] =
>     {
>       \"it\\t%d5\",
>       \"it\\t%d4\"
>     };
>     int swap =
>     comparison_do
RE: [PATCH, ARM] Generate conditional compares in Thumb2 state
PING...

BTW, in patch fix_cond_cmp_2.patch, the file mode of thumb2.md was carelessly changed, so please check the attached new patch file fix_cond_cmp_3.patch.

Thanks,
-Jiangning

> -----Original Message-----
> From: Jiangning Liu [mailto:jiangning@arm.com]
> Sent: Monday, August 08, 2011 2:01 PM
> To: 'Ramana Radhakrishnan'
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH, ARM] Generate conditional compares in Thumb2 state
>
> In the attached new patch, arm_arch_thumb2 is changed to TARGET_THUMB2.
> I tried that; with my patch the command line option -mcpu=armv7-a9
> doesn't generate IT instructions any longer, unless the option
> "-mthumb" is added.
>
> All of my tests assume the command line option -mthumb, while
> cortex-M0, cortex-M3 and cortex-M4 are covered by the options
> -mcpu=cortex-m0, -march=armv7-m, and -march=armv7e-m respectively.
>
> As for the extra problem exposed by this specific case, may we treat it
> as a separate fix to decouple it from this one, and I can follow up
> later on? I think it is a general problem, not only for the particular
> pattern it/op/it/op. But I'm not sure how far we can go to optimize
> this kind of problem introduced by IT blocks. For this specific case, I
> see "if conversion" already generates a conditional move before the
> combine pass. So basically peephole rules may probably work for most of
> the general scenarios. My initial thought is to go over the rules
> introducing IT blocks and try to figure out all of the combinations
> where two of these kinds of rules can be in sequential order. Maybe we
> only need to handle the most common case like this one. Since I do see
> a bunch of rules that have something to do with this problem, I'd like
> to look into all of them to give a most reasonable solution in a
> separate fix.
>
> Does it make sense?
>
> Thanks,
> -Jiangning
>
> > -----Original Message-----
> > From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org]
> > Sent: Friday, August 05, 2011 9:20 AM
> > To: Jiangning Liu
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH, ARM] Generate conditional compares in Thumb2
> > state
> >
> > On 3 August 2011 08:48, Jiangning Liu wrote:
> > > This patch is to generate more conditional compare instructions in
> > > Thumb2 state. Given an example like below,
> > >
> > > int f(int i, int j)
> > > {
> > >   if ( (i == '+') || (j == '-') ) {
> > >     return i;
> > >   } else {
> > >     return j;
> > >   }
> > > }
> > >
> > > Without the patch, the compiler generates the following code:
> > >
> > >   sub   r2, r0, #43
> > >   rsbs  r3, r2, #0
> > >   adc   r3, r3, r2
> > >   cmp   r1, #45
> > >   it    eq
> > >   orreq r3, r3, #1
> > >   cmp   r3, #0
> > >   it    eq
> > >   moveq r0, r1
> > >   bx    lr
> > >
> > > With the patch, the compiler can generate a conditional compare
> > > like below:
> > >
> > >   cmp   r0, #43
> > >   it    ne
> > >   cmpne r1, #45
> > >   it    ne
> > >   movne r0, r1
> > >   bx    lr
> >
> > Nice improvement but there could be a single it block to handle both
> > and thus you could make this even better with
> >
> >   cmp   r0, #43
> >   itt   ne
> >   cmpne r1, #45
> >   movne r0, r1
> >
> > The way to do this would be to try and split this post-reload
> > unfortunately into the cmp instruction and the conditional compare
> > with the appropriate instruction length - Then the backend has a
> > chance of merging some of this into a single instruction.
> > Unfortunately that won't be very straight-forward but that's a
> > direction we probably ought to proceed with in this case.
> >
> > In a number of places:
> >
> > > + if (arm_arch_thumb2)
> >
> > Ah instead of this please use if (TARGET_THUMB2) - arm_arch_thumb2 is
> > true based on the architecture levels and not necessarily if the user
> > wants to generate Thumb code. I don't want an unnecessary IT
> > instruction being emitted in the ASM block in ARM state for v7-a and
> > above.
> >
> > > Tested against arm-none-eabi target and no regression found.
> >
> > Presumably for ARM and Thumb2 state ?
> >
> > cheers
> > Ramana

fix_cond_cmp_3.patch
Description: Binary data
RE: [PATCH, ARM] Generate conditional compares in Thumb2 state
In the attached new patch, arm_arch_thumb2 is changed to TARGET_THUMB2. I tried that; with my patch the command line option -mcpu=armv7-a9 doesn't generate IT instructions any longer, unless the option "-mthumb" is added.

All of my tests assume the command line option -mthumb, while cortex-M0, cortex-M3 and cortex-M4 are covered by the options -mcpu=cortex-m0, -march=armv7-m, and -march=armv7e-m respectively.

As for the extra problem exposed by this specific case, may we treat it as a separate fix to decouple it from this one, and I can follow up later on? I think it is a general problem, not only for the particular pattern it/op/it/op. But I'm not sure how far we can go to optimize this kind of problem introduced by IT blocks. For this specific case, I see "if conversion" already generates a conditional move before the combine pass. So basically peephole rules may probably work for most of the general scenarios. My initial thought is to go over the rules introducing IT blocks and try to figure out all of the combinations where two of these kinds of rules can be in sequential order. Maybe we only need to handle the most common case like this one. Since I do see a bunch of rules that have something to do with this problem, I'd like to look into all of them to give a most reasonable solution in a separate fix.

Does it make sense?

Thanks,
-Jiangning

> -----Original Message-----
> From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org]
> Sent: Friday, August 05, 2011 9:20 AM
> To: Jiangning Liu
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH, ARM] Generate conditional compares in Thumb2 state
>
> On 3 August 2011 08:48, Jiangning Liu wrote:
> > This patch is to generate more conditional compare instructions in
> > Thumb2 state. Given an example like below,
> >
> > int f(int i, int j)
> > {
> >   if ( (i == '+') || (j == '-') ) {
> >     return i;
> >   } else {
> >     return j;
> >   }
> > }
> >
> > Without the patch, the compiler generates the following code:
> >
> >   sub   r2, r0, #43
> >   rsbs  r3, r2, #0
> >   adc   r3, r3, r2
> >   cmp   r1, #45
> >   it    eq
> >   orreq r3, r3, #1
> >   cmp   r3, #0
> >   it    eq
> >   moveq r0, r1
> >   bx    lr
> >
> > With the patch, the compiler can generate a conditional compare like
> > below:
> >
> >   cmp   r0, #43
> >   it    ne
> >   cmpne r1, #45
> >   it    ne
> >   movne r0, r1
> >   bx    lr
>
> Nice improvement but there could be a single it block to handle both
> and thus you could make this even better with
>
>   cmp   r0, #43
>   itt   ne
>   cmpne r1, #45
>   movne r0, r1
>
> The way to do this would be to try and split this post-reload
> unfortunately into the cmp instruction and the conditional compare
> with the appropriate instruction length - Then the backend has a
> chance of merging some of this into a single instruction.
> Unfortunately that won't be very straight-forward but that's a
> direction we probably ought to proceed with in this case.
>
> In a number of places:
>
> > + if (arm_arch_thumb2)
>
> Ah instead of this please use if (TARGET_THUMB2) - arm_arch_thumb2 is
> true based on the architecture levels and not necessarily if the user
> wants to generate Thumb code. I don't want an unnecessary IT
> instruction being emitted in the ASM block in ARM state for v7-a and
> above.
>
> > Tested against arm-none-eabi target and no regression found.
>
> Presumably for ARM and Thumb2 state ?
>
> cheers
> Ramana

fix_cond_cmp_2.patch
Description: Binary data
[PATCH, ARM] Generate conditional compares in Thumb2 state
This patch is to generate more conditional compare instructions in Thumb2 state. Given an example like below,

int f(int i, int j)
{
  if ( (i == '+') || (j == '-') ) {
    return i;
  } else {
    return j;
  }
}

without the patch, the compiler generates the following code:

	sub	r2, r0, #43
	rsbs	r3, r2, #0
	adc	r3, r3, r2
	cmp	r1, #45
	it	eq
	orreq	r3, r3, #1
	cmp	r3, #0
	it	eq
	moveq	r0, r1
	bx	lr

With the patch, the compiler can generate a conditional compare like below:

	cmp	r0, #43
	it	ne
	cmpne	r1, #45
	it	ne
	movne	r0, r1
	bx	lr

The patch essentially inserts an *it* instruction for the following rules in arm.md:

* cmp_ite0
* cmp_ite1
* cmp_and
* cmp_ior

Tested against the arm-none-eabi target and no regression found.

The source code ChangeLog would be:

2011-07-29  Jiangning Liu

	* config/arm/arm.md (*ior_scc_scc): Enable for Thumb2 as well.
	(*ior_scc_scc_cmp): Likewise.
	(*and_scc_scc): Likewise.
	(*and_scc_scc_cmp): Likewise.
	(*and_scc_scc_nodom): Likewise.
	(*cmp_ite0, *cmp_ite1, *cmp_and, *cmp_ior): Handle Thumb2.

The testsuite ChangeLog would be:

2011-07-29  Jiangning Liu

	* gcc.target/arm/thumb2-cond-cmp-1.c: New. Make sure conditional
	compare can be generated.
	* gcc.target/arm/thumb2-cond-cmp-2.c: Likewise.
	* gcc.target/arm/thumb2-cond-cmp-3.c: Likewise.
	* gcc.target/arm/thumb2-cond-cmp-4.c: Likewise.

Thanks,
-Jiangning

fix_cond_cmp.patch
Description: Binary data
RE: [RFC] Add middle end hook for stack red zone size
Hi Jakub,

I appreciate your valuable comments!

I think the SPARC V9 ABI doesn't have a red zone defined, right? So stack_red_zone_size would be defined as zero by default, and the scheduler would block moving memory accesses across a stack adjustment no matter what the offset is. I don't see any risk here. Also, in my patch the function *abs* is used to avoid the opposite-stack-direction issue you mentioned.

Some people like you insist on the ABI diversity, and actually I agree with you on this. But part of the ABI definition is general for all targets. The point here is that memory accesses beyond the stack red zone should be avoided, which is the general part of the ABI that the compiler should guarantee. For this general part, the middle end should take the responsibility.

Thanks,
-Jiangning

> -----Original Message-----
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Monday, August 01, 2011 6:31 PM
> To: Jiangning Liu
> Cc: 'Joern Rennecke'; g...@gcc.gnu.org; gcc-patches@gcc.gnu.org;
> vmaka...@redhat.com; dje@gmail.com; Richard Henderson; Ramana
> Radhakrishnan; 'Ramana Radhakrishnan'
> Subject: Re: [RFC] Add middle end hook for stack red zone size
>
> On Mon, Aug 01, 2011 at 06:14:27PM +0800, Jiangning Liu wrote:
> > ARM. You are right, they were all fixed in back-ends in the past, but
> > we should fix the bug in a general way to make GCC infrastructure
> > stronger, rather than fixing the problem target-by-target and
> > case-by-case! If you further look into the back-end fixes in x86 and
> > PowerPC, you may find they look quite similar in back-ends.
>
> Red zone is only one difficulty, your patch is e.g. completely ignoring
> existence of biased stack pointers (e.g. SPARC -m64 has them).
> Some targets have stack growing in opposite direction, etc.
> We have really a huge amount of very diverse ABIs and making the
> middle-end grok what is an invalid stack access is difficult.
>
> 	Jakub
RE: [RFC] Add middle end hook for stack red zone size
The answer is: ARM can. However, if you look into the bugs PR30282, PR38644 and PR44199, you may find that historically several different cases in different ports reported similar failures, covering x86, PowerPC and ARM. You are right, they were all fixed in the back ends in the past, but we should fix the bug in a general way to make the GCC infrastructure stronger, rather than fixing the problem target-by-target and case-by-case! If you further look into the back-end fixes in x86 and PowerPC, you may find they look quite similar.

Thanks,
-Jiangning

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Jakub Jelinek
> Sent: Monday, August 01, 2011 5:12 PM
> To: Jiangning Liu
> Cc: 'Joern Rennecke'; g...@gcc.gnu.org; gcc-patches@gcc.gnu.org;
> vmaka...@redhat.com; dje@gmail.com; Richard Henderson; Ramana
> Radhakrishnan; 'Ramana Radhakrishnan'
> Subject: Re: [RFC] Add middle end hook for stack red zone size
>
> On Mon, Aug 01, 2011 at 11:44:04AM +0800, Jiangning Liu wrote:
> > It's quite necessary to solve the general problem in middle-end
> > rather than in back-end.
>
> That's what we disagree on. All back-ends but ARM are able to handle it
> right, why can't ARM too? The ABI rules for stack handling in the
> epilogues are simply too diverse and complex to be handled easily in
> the scheduler.
>
> 	Jakub
RE: [RFC] Add middle end hook for stack red zone size
Joern,

Thanks for your valuable feedback! This is only an RFC, and I will send out a formal patch along with a ChangeLog later on.

Basically, my patch only adds a new dependence in the scheduler, and it only blocks some instruction movements, so it poses NO RISK to compiler correctness. For whatever stack pointer changes you gave in different scenarios, the current code base should already work. My patch intends neither to replace old dependences, nor to maximize the scheduler capability in the presence of a stack red zone. It only blocks moving a memory access over a stack pointer adjustment if the distance is beyond the red zone size, which is an OS requirement due to the existence of interrupts.

Stack adjustment in the epilogue is a very general part of stack frame usage. It's quite necessary to solve the general problem in the middle end rather than in the back end. Also, the old patch you attached solves a data dependence between two memory accesses, but the stack pointer doesn't really have a data dependence with a memory access that doesn't use the stack pointer, so they are different stories.

An alternative solution without adding a blunt scheduling barrier is to insert an independent pass before the scheduler to create RTL barriers using the same interface stack_red_zone_size, but it would really be over-design to add a new pass only for this *small* functionality.

In my patch, the *abs* of the offset is used, so you are right that it's possible to get false positives, i.e. to be too conservative, but there won't be false negatives, because my code only adds new dependences. Since compilation is done function by function, it would be OK if the red zone size varies due to different ABIs. Could you please tell me on exactly what system supported by GCC the red zone size can be different for incoming and outgoing, and also how the scheduler guarantees correctness in the current code base? Anyway, I don't think my patch breaks the original solution.

Thanks,
-Jiangning

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Joern Rennecke
> Sent: Tuesday, July 26, 2011 10:33 AM
> To: Jiangning Liu
> Cc: g...@gcc.gnu.org; gcc-patches@gcc.gnu.org; vmaka...@redhat.com;
> dje@gmail.com; Richard Henderson; Ramana Radhakrishnan; 'Ramana
> Radhakrishnan'
> Subject: RE: [RFC] Add middle end hook for stack red zone size
>
> Quoting Jiangning Liu :
>
> > Hi,
> >
> > One month ago, I sent out this RFC to the *gcc-patches* mailing list,
> > but I haven't received any response yet. So I'm forwarding this mail
> > to the *gcc* mailing list. Can anybody here give me feedback?
>
> Well, I couldn't approve any patch, but I can point out some issues
> with your patch.
>
> First, it's missing a ChangeLog, and you don't state how you have
> tested it. And regarding the code in sched_analyze_1, I think you'll
> get false positives with alloca, and false negatives when registers are
> involved to compute offsets or to restore the stack pointer from.
>
> FWIW, I think generally blunt scheduling barriers should be avoided,
> and instead the dependencies made visible to the scheduler.
> E.g., I've been working with another architecture with a redzone, where
> at -fno-omit-frame-pointer, the prologue can put pretend_args into the
> redzone, then after stack adjustment and frame allocation, these
> arguments are accessed via the frame pointer.
>
> With the attached patchlet, alias analysis works for this situation
> too, so no blunt scheduling block is required.
>
> Likewise, with stack adjustments, they should not affect scheduling in
> general, but be considered to clobber the part of the frame that is
> being exposed to interrupt writes either before or after the
> adjustment.
> At the moment, each port that wants to have such selective scheduling
> blockages has to define a stack_adjust pattern with a memory clobber in
> a parallel, with a memref that shows the high water mark of possible
> interrupt stack writes.
> Prima facie it would seem convenient if you only had to tell the
> middle-end about the redzone size, and it could figure out the implicit
> clobbers when the stack is changed. However, when a big stack
> adjustment is being made, or the stack is realigned, or restored from
> the frame pointer / another register where it was saved due to
> realignment, the adjustment is not so obvious. I'm not sure if you can
> actually create a robust interface that's simpler to use than putting
> the right memory clobber in the stack adjust pattern. Note also that
> the redzone size can vary from function to function depending on
> ABI-altering attributes, in particular for interrupt functions, which
> can also have different incoming and outgoing redzone sizes. Plus, you
> can have an NMI / reset handler which can use the stack like an
> ordinary address register.
RE: [RFC] Add middle end hook for stack red zone size
Hi,

One month ago, I sent out this RFC to the *gcc-patches* mailing list, but I haven't received any response yet. So I'm forwarding this mail to the *gcc* mailing list. Can anybody here give me feedback?

Appreciate your help in advance!

-Jiangning

-----Original Message-----
From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org]
Sent: Tuesday, July 19, 2011 6:18 PM
To: Jiangning Liu
Cc: gcc-patches@gcc.gnu.org; vmaka...@redhat.com; dje@gmail.com;
Richard Henderson; Ramana Radhakrishnan
Subject: Re: [RFC] Add middle end hook for stack red zone size

2011/7/19 Jiangning Liu :
>
> I see a lot of feedback on other posts, but mine is still with ZERO
> response in the past 3 weeks, so I'm wondering if I made any mistake
> in my process? Who can help me?

It would be worth CC'ing the other relevant target maintainers as well
to get some feedback since the patch touches ARM, x86 and Powerpc. I've
added the maintainers for i386 and PPC to the CC list using the email
addresses from the MAINTAINERS file.

Thanks,
Ramana

>
> Thanks,
> -Jiangning
>
> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Jiangning Liu
> Sent: Tuesday, July 05, 2011 8:32 AM
> To: gcc-patches@gcc.gnu.org; rgue...@gcc.gnu.org
> Subject: RE: [RFC] Add middle end hook for stack red zone size
>
> PING...
>
> I just merged with the latest code base and generated a new patch as
> attached.
>
> Thanks,
> -Jiangning
>
>> -----Original Message-----
>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>> ow...@gcc.gnu.org] On Behalf Of Jiangning Liu
>> Sent: June 28, 2011 4:38 PM
>> To: gcc-patches@gcc.gnu.org
>> Subject: [RFC] Add middle end hook for stack red zone size
>>
>> This patch is to fix PR38644, which is a long-standing bug about
>> stack red zone access; PR30282 is correlated.
>>
>> Originally the red zone concept is not exposed to the middle end, and
>> the back end uses special logic to add extra memory barrier RTL to
>> ensure correct dependences in the middle end. This way different back
>> ends must handle the red zone problem by themselves. For example, the
>> X86 target introduced the function ix86_using_red_zone() to judge red
>> zone access, while POWER introduced offset_below_red_zone_p() to
>> judge it. Note that they have different semantics, but the logic in
>> the back-end caller sites uses them to decide whether to add memory
>> barrier RTL or not. If a back end handles this incorrectly, bugs are
>> introduced.
>>
>> Therefore, the correct method should be for the middle end to handle
>> red zone related things, avoiding the burden in different back ends.
>> To be specific for PR38644, this middle-end problem causes incorrect
>> behavior for the ARM target. This patch exposes the red zone concept
>> to the middle end by introducing a middle-end/back-end hook
>> TARGET_STACK_RED_ZONE_SIZE, defined in target.def with a default
>> value of 0. A back end may redefine this function to provide the
>> concrete red zone size according to specific ABI requirements.
>>
>> In the middle end, scheduling dependences are modified by using this
>> hook plus checking the stack frame pointer adjustment instruction, to
>> decide whether memory references need to be all flushed out or not.
>> In theory, if TARGET_STACK_RED_ZONE_SIZE is defined correctly, a back
>> end would not be required to specially handle this scheduling
>> dependence issue by introducing extra memory barrier RTL.
>>
>> In the back ends, the following changes are made to define the hook:
>> 1) For X86, TARGET_STACK_RED_ZONE_SIZE is redefined to be
>> ix86_stack_red_zone_size() in i386.c, which is a newly introduced
>> function.
>> 2) For POWER, TARGET_STACK_RED_ZONE_SIZE is redefined to be
>> rs6000_stack_red_zone_size() in rs6000.c, which is also a newly
>> defined function.
>> 3) For ARM and others, TARGET_STACK_RED_ZONE_SIZE is defined to be
>> default_stack_red_zone_size in targhooks.c, and this function returns
>> 0, which means ARM EABI and others don't support red zone access at
>> all.
>>
>> In summary, the relationship between ABI and red zone access is like
>> below:
>>
>> -----------------------------------------------------------------
>> | ARCH         | ARM  |      X86      |      POWER     | others |
>> |--------------|------|-------|-------|---------|------|--------|
>> | ABI          | EABI | MS_64 | other |   AIX   |  V4  |        |
>> |--------------|------|-------|-------|---------|------|--------|
>> | RED ZONE     |  No  |  YES  |  No   |   YES   |  No  |   No   |
>> |--------------|------|-------|-------|---------|------|--------|
>> | RED ZONE SIZE|  0   |  128  |   0   | 220/288 |  0   |   0    |
>> -----------------------------------------------------------------
>>
>> Thanks,
>> -Jiangning
RE: [RFC] Add middle end hook for stack red zone size
Hello...

Can *anybody* give me comments on this bug fix? Our customers have been complaining about this long-standing bug.

I see a lot of feedback on other posts, but mine is still with ZERO response in the past 3 weeks, so I'm wondering if I made any mistake in my process? Who can help me?

Thanks,
-Jiangning

-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
On Behalf Of Jiangning Liu
Sent: Tuesday, July 05, 2011 8:32 AM
To: gcc-patches@gcc.gnu.org; rgue...@gcc.gnu.org
Subject: RE: [RFC] Add middle end hook for stack red zone size

PING...

I just merged with the latest code base and generated a new patch as attached.

Thanks,
-Jiangning

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Jiangning Liu
> Sent: June 28, 2011 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [RFC] Add middle end hook for stack red zone size
>
> This patch is to fix PR38644, which is a long-standing bug about
> stack red zone access; PR30282 is correlated.
>
> Originally the red zone concept is not exposed to the middle end, and
> the back end uses special logic to add extra memory barrier RTL to
> ensure correct dependences in the middle end. This way different back
> ends must handle the red zone problem by themselves. For example, the
> X86 target introduced the function ix86_using_red_zone() to judge red
> zone access, while POWER introduced offset_below_red_zone_p() to judge
> it. Note that they have different semantics, but the logic in the
> back-end caller sites uses them to decide whether to add memory
> barrier RTL or not. If a back end handles this incorrectly, bugs are
> introduced.
>
> Therefore, the correct method should be for the middle end to handle
> red zone related things, avoiding the burden in different back ends.
> To be specific for PR38644, this middle-end problem causes incorrect
> behavior for the ARM target. This patch exposes the red zone concept
> to the middle end by introducing a middle-end/back-end hook
> TARGET_STACK_RED_ZONE_SIZE, defined in target.def with a default value
> of 0. A back end may redefine this function to provide the concrete
> red zone size according to specific ABI requirements.
>
> In the middle end, scheduling dependences are modified by using this
> hook plus checking the stack frame pointer adjustment instruction, to
> decide whether memory references need to be all flushed out or not. In
> theory, if TARGET_STACK_RED_ZONE_SIZE is defined correctly, a back end
> would not be required to specially handle this scheduling dependence
> issue by introducing extra memory barrier RTL.
>
> In the back ends, the following changes are made to define the hook:
> 1) For X86, TARGET_STACK_RED_ZONE_SIZE is redefined to be
> ix86_stack_red_zone_size() in i386.c, which is a newly introduced
> function.
> 2) For POWER, TARGET_STACK_RED_ZONE_SIZE is redefined to be
> rs6000_stack_red_zone_size() in rs6000.c, which is also a newly
> defined function.
> 3) For ARM and others, TARGET_STACK_RED_ZONE_SIZE is defined to be
> default_stack_red_zone_size in targhooks.c, and this function returns
> 0, which means ARM EABI and others don't support red zone access at
> all.
>
> In summary, the relationship between ABI and red zone access is like
> below:
>
> -----------------------------------------------------------------
> | ARCH         | ARM  |      X86      |      POWER     | others |
> |--------------|------|-------|-------|---------|------|--------|
> | ABI          | EABI | MS_64 | other |   AIX   |  V4  |        |
> |--------------|------|-------|-------|---------|------|--------|
> | RED ZONE     |  No  |  YES  |  No   |   YES   |  No  |   No   |
> |--------------|------|-------|-------|---------|------|--------|
> | RED ZONE SIZE|  0   |  128  |   0   | 220/288 |  0   |   0    |
> -----------------------------------------------------------------
>
> Thanks,
> -Jiangning
RE: [RFC] Add middle end hook for stack red zone size
PING...

I just merged with the latest code base and generated a new patch, as attached.

Thanks,
-Jiangning

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Jiangning Liu
> Sent: June 28, 2011 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [RFC] Add middle end hook for stack red zone size
>
> This patch is to fix PR38644, which is a long-standing bug about
> stack red zone access; PR30282 is correlated.
>
> Originally the red zone concept is not exposed to the middle end, and
> the back end uses special logic to add extra memory barrier RTL to
> ensure correct dependences in the middle end. This way different back
> ends must handle the red zone problem by themselves. For example, the
> X86 target introduced the function ix86_using_red_zone() to judge red
> zone access, while POWER introduced offset_below_red_zone_p() to judge
> it. Note that they have different semantics, but the logic in the
> back-end caller sites uses them to decide whether to add memory
> barrier RTL or not. If a back end handles this incorrectly, bugs are
> introduced.
>
> Therefore, the correct method should be for the middle end to handle
> red zone related things, avoiding the burden in different back ends.
> To be specific for PR38644, this middle-end problem causes incorrect
> behavior for the ARM target. This patch exposes the red zone concept
> to the middle end by introducing a middle-end/back-end hook
> TARGET_STACK_RED_ZONE_SIZE, defined in target.def with a default value
> of 0. A back end may redefine this function to provide the concrete
> red zone size according to specific ABI requirements.
>
> In the middle end, scheduling dependences are modified by using this
> hook plus checking the stack frame pointer adjustment instruction, to
> decide whether memory references need to be all flushed out or not. In
> theory, if TARGET_STACK_RED_ZONE_SIZE is defined correctly, a back end
> would not be required to specially handle this scheduling dependence
> issue by introducing extra memory barrier RTL.
>
> In the back ends, the following changes are made to define the hook:
> 1) For X86, TARGET_STACK_RED_ZONE_SIZE is redefined to be
> ix86_stack_red_zone_size() in i386.c, which is a newly introduced
> function.
> 2) For POWER, TARGET_STACK_RED_ZONE_SIZE is redefined to be
> rs6000_stack_red_zone_size() in rs6000.c, which is also a newly
> defined function.
> 3) For ARM and others, TARGET_STACK_RED_ZONE_SIZE is defined to be
> default_stack_red_zone_size in targhooks.c, and this function returns
> 0, which means ARM EABI and others don't support red zone access at
> all.
>
> In summary, the relationship between ABI and red zone access is like
> below:
>
> -----------------------------------------------------------------
> | ARCH         | ARM  |      X86      |      POWER     | others |
> |--------------|------|-------|-------|---------|------|--------|
> | ABI          | EABI | MS_64 | other |   AIX   |  V4  |        |
> |--------------|------|-------|-------|---------|------|--------|
> | RED ZONE     |  No  |  YES  |  No   |   YES   |  No  |   No   |
> |--------------|------|-------|-------|---------|------|--------|
> | RED ZONE SIZE|  0   |  128  |   0   | 220/288 |  0   |   0    |
> -----------------------------------------------------------------
>
> Thanks,
> -Jiangning

stack-red-zone-patch-38644-4.patch
Description: Binary data
[RFC] Add middle end hook for stack red zone size
This patch is to fix PR38644, which is a long-standing bug about stack red zone access; PR30282 is correlated.

Originally the red zone concept is not exposed to the middle end, and the back end uses special logic to add extra memory barrier RTL to ensure correct dependences in the middle end. This way different back ends must handle the red zone problem by themselves. For example, the X86 target introduced the function ix86_using_red_zone() to judge red zone access, while POWER introduced offset_below_red_zone_p() to judge it. Note that they have different semantics, but the logic in the back-end caller sites uses them to decide whether to add memory barrier RTL or not. If a back end handles this incorrectly, bugs are introduced.

Therefore, the correct method should be for the middle end to handle red zone related things, avoiding the burden in different back ends. To be specific for PR38644, this middle-end problem causes incorrect behavior for the ARM target. This patch exposes the red zone concept to the middle end by introducing a middle-end/back-end hook TARGET_STACK_RED_ZONE_SIZE, defined in target.def with a default value of 0. A back end may redefine this function to provide the concrete red zone size according to specific ABI requirements.

In the middle end, scheduling dependences are modified by using this hook plus checking the stack frame pointer adjustment instruction, to decide whether memory references need to be all flushed out or not. In theory, if TARGET_STACK_RED_ZONE_SIZE is defined correctly, a back end would not be required to specially handle this scheduling dependence issue by introducing extra memory barrier RTL.

In the back ends, the following changes are made to define the hook:
1) For X86, TARGET_STACK_RED_ZONE_SIZE is redefined to be ix86_stack_red_zone_size() in i386.c, which is a newly introduced function.
2) For POWER, TARGET_STACK_RED_ZONE_SIZE is redefined to be rs6000_stack_red_zone_size() in rs6000.c, which is also a newly defined function.
3) For ARM and others, TARGET_STACK_RED_ZONE_SIZE is defined to be default_stack_red_zone_size in targhooks.c, and this function returns 0, which means ARM EABI and others don't support red zone access at all.

In summary, the relationship between ABI and red zone access is like below:

-----------------------------------------------------------------
| ARCH         | ARM  |      X86      |      POWER     | others |
|--------------|------|-------|-------|---------|------|--------|
| ABI          | EABI | MS_64 | other |   AIX   |  V4  |        |
|--------------|------|-------|-------|---------|------|--------|
| RED ZONE     |  No  |  YES  |  No   |   YES   |  No  |   No   |
|--------------|------|-------|-------|---------|------|--------|
| RED ZONE SIZE|  0   |  128  |   0   | 220/288 |  0   |   0    |
-----------------------------------------------------------------

Thanks,
-Jiangning

stack-red-zone-patch-38644-3.patch
Description: Binary data
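Since the patch itself is an attachment, here is a sketch of the hook shapes described above; the names follow the text of this mail and the values follow the table, but they are reconstructed for illustration, not copied from the attached patch:

/* targhooks.c: conservative default -- no red zone, so the scheduler
   never moves a stack-frame access across a stack adjustment.  */
int
default_stack_red_zone_size (void)
{
  return 0;
}

/* rs6000.c: AIX ABIs keep 220 (32-bit) or 288 (64-bit) bytes below
   the stack pointer safe from signals; SVR4 (V.4) has no cushion.  */
static int
rs6000_stack_red_zone_size (void)
{
  if (DEFAULT_ABI == ABI_V4)
    return 0;
  return TARGET_32BIT ? 220 : 288;
}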