Re: [RFC, ivopts] fix bugs in ivopts address cost computation
Hi,

For the following code change,

@@ -4212,11 +4064,6 @@ get_computation_cost_at (struct ivopts_d
       cost.cost += adjust_setup_cost (data,
				      add_cost (TYPE_MODE (ctype), speed));
 
-  /* Having offset does not affect runtime cost in case it is added to
-     symbol, but it increases complexity.  */
-  if (offset)
-    cost.complexity++;
-
   cost.cost += add_cost (TYPE_MODE (ctype), speed);
 
   aratio = ratio > 0 ? ratio : -ratio;

I think this shouldn't be removed. The offset may depend on where the reduction variable accumulation statement is inserted: a use rewritten before the accumulation and one rewritten after it see different offsets, so the cost of replacing the use point with the reduction variable should differ accordingly.

BTW, I personally think the current ivopts cost modelling basically works fine, although some tuning might be needed. The most difficult part is that the choice of reduction variable candidates interacts with register pressure cost, while the register cost estimate is not accurate enough at this stage because we don't have the back-end live-range interference graph at all. We can always find holes in particular cases or benchmarks, but we shouldn't aim only at an optimal result for those; the tuning needs to be backed by more comprehensive results.

Thanks,
-Jiangning

2012/6/6 Sandra Loosemore :
> My colleagues and I have been working on the GCC port for the Qualcomm
> Hexagon. Along the way I noticed that we were getting poor results
> from the ivopts pass no matter how we adjusted the target-specific RTX
> costs. In many cases ivopts was coming up with candidate use costs
> that seemed completely inconsistent with the target cost model. On
> further inspection, I found what appears to be a whole bunch of bugs
> in the way ivopts is computing address costs:
>
> (1) While the address cost computation is assuming in some situations
> that pre/post increment/decrement addressing will be used if
> supported by the target, it isn't actually using the target's address
> cost for such forms -- instead, just the cost of the form that would
> be used if autoinc weren't available/applicable.
>
> (2) The computation to determine which multiplier values are supported
> by target addressing modes is constructing an address rtx of the form
> (reg * ratio) to do the tests. This isn't a valid address RTX on
> Hexagon, although both (reg + reg * ratio) and (sym + reg * ratio)
> are. Because it's choosing the wrong address form to probe with, it
> thinks that the target doesn't support multipliers at all and is
> incorrectly tacking on an extra cost for them. I also note that it's
> assuming that the same set of ratios are supported by all three
> address forms that can potentially include them, and that all valid
> ratios have the same cost.
>
> (3) The computation to determine the range of valid constant offsets
> for address forms that can include them is probing the upper end of
> the range using constants of the form ((1<<N) - 1). But on Hexagon,
> offsets have to be aligned appropriately for the mode, so it's
> incorrectly rejecting all positive offsets for non-char modes. And
> again, it's assuming that the same range of offsets are supported by
> all address forms that can legitimately include them, and that all
> valid offsets have the same cost. The latter isn't true on Hexagon.
>
> (4) The cost adjustment for converting the symbol_present address to a
> var_present address seems overly optimistic in assuming that the
> symbol load will be hoisted outside the loop.
> I looked at a lot of
> code where this was not happening no matter how expensive I made the
> absolute addressing forms in the target-specific costs. Also, if
> subsequent passes actually do hoist the symbol load, this requires an
> additional register to be available over the entire loop, which ivopts
> isn't accounting for in any way. It seems to me that this adjustment
> shouldn't be attempted when the symbol_present form is a legitimate
> address (because subsequent passes are not doing the anticipated
> optimization in that case). There's also a bug present in the cost
> accounting: it's adding the full cost of the symbol + var addition,
> whereas it should be pro-rated across iterations (see the way this is
> handled for the non-address cases in get_computation_cost_at).
>
> (5) The documentation of TARGET_ADDRESS_COST says that when two
> (legitimate) address expressions have the same lowest cost, the one
> with the higher complexity is used. But, the ivopts code is doing the
> opposite of this and choosing the lower complexity as the tie-breaker.
> (Actually, it's also computing the complexity without regard to
> whether the address rtx is even legitimate on the target.)
>
> (6) The way get_address_costs is precomputing and caching a complete
> set of cost data for each addressing mode seems incorrect to me, in
> the general case. For example, consider MIPS where
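Regarding point (2) above, the probe itself is cheap to fix. Below is a minimal sketch, not taken from any posted patch, of testing a multiplier through a (reg + reg * ratio) address instead of a bare (reg * ratio); the helper name is invented, and it assumes the GCC 4.7-era RTL API (gen_raw_REG, memory_address_p) that ivopts itself uses for such probing.

/* Hypothetical helper: does MEM_MODE accept ratio RATIO when the
   scaled index appears in a base+index address?  Uses fake pseudos
   above LAST_VIRTUAL_REGISTER, as ivopts does for its own probes.  */
static bool
index_ratio_supported_p (enum machine_mode mem_mode, HOST_WIDE_INT ratio)
{
  rtx base = gen_raw_REG (Pmode, LAST_VIRTUAL_REGISTER + 1);
  rtx index = gen_raw_REG (Pmode, LAST_VIRTUAL_REGISTER + 2);
  rtx scaled = gen_rtx_MULT (Pmode, index, GEN_INT (ratio));
  /* (reg + reg * ratio) -- valid on Hexagon, unlike bare (reg * ratio).  */
  rtx addr = gen_rtx_PLUS (Pmode, base, scaled);
  return memory_address_p (mem_mode, addr);
}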
[PATCH][Testsuite] XFAIL scev-3/4.c and add scev-5.c
Hi,

This patch is to XFAIL scev-3.c and scev-4.c and to add scev-5.c. The bug is going to be fixed after Richard Guenther fixes a series of problems related to POINTER_PLUS_EXPR and sizetype precision.

ChangeLog for testsuite:

2012-03-21  Jiangning Liu

	PR tree-optimization/52563
	* gcc.dg/tree-ssa/scev-3.c: XFAIL on lp64.
	* gcc.dg/tree-ssa/scev-4.c: XFAIL on lp64.
	* gcc.dg/tree-ssa/scev-5.c: New.

Thanks,
-Jiangning

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
index 28d5c93..ed63a18 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
@@ -14,5 +14,5 @@ f(int k)
   }
 }
 
-/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" { xfail lp64 } } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
index 6c1e530..a538c32 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
@@ -19,5 +19,5 @@ f(int k)
   }
 }
 
-/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" { xfail lp64 } } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
new file mode 100644
index 000..b9de36a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+  long long i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i];
+    *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
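My reading of why lp64 currently loses on scev-3.c/scev-4.c, stated as an assumption rather than text from the PR: the 64-bit pointer arithmetic is built from a widened copy of the 32-bit IV, and SCEV gives up on the widening until the POINTER_PLUS_EXPR/sizetype fixes land. Roughly:

/* Illustrative GIMPLE shape on lp64 (hypothetical, not a real dump):

     i.0 = (long unsigned int) i;     <-- widening of the int IV
     D.x = i.0 * 4;
     a_p = &a + D.x;                  <-- POINTER_PLUS_EXPR

   The widening conversion is what the current analysis cannot see
   through, so "&a" survives in the loop body and the
   scan-tree-dump-times check fails.  scev-5.c presumably passes even
   on lp64 because its "long long i" IV is already pointer-width, so
   no such conversion appears.  */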
RE: [PATCH] Improve SCEV for array element
> -----Original Message-----
> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> Sent: Tuesday, March 06, 2012 9:12 PM
> To: Jiangning Liu
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Improve SCEV for array element
>
> On Fri, Jan 20, 2012 at 10:06 AM, Jiangning Liu wrote:
> >> It's definitely not ok at this stage but at most for next stage1.
> >
> > OK. I may wait until next stage1.
> >
> >> This is a very narrow pattern-match. It doesn't allow for &a[i].x for
> >> example, even if a[i] is a one-element structure. I think the
> >> canonical way of handling ADDR_EXPR is to use sth like
> >>
> >>   base = get_inner_reference (TREE_OPERAND (rhs1, 0), ..., &offset, ...);
> >>   base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base);
> >>   chrec1 = analyze_scalar_evolution (loop, base);
> >>   chrec2 = analyze_scalar_evolution (loop, offset);
> >>   chrec1 = chrec_convert (type, chrec1, at_stmt);
> >>   chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt);
> >>   res = chrec_fold_plus (type, chrec1, chrec2);
> >>
> >> where you probably need to handle scev_not_known when analyzing offset
> >> (which might be NULL). You also need to add bitpos to the base address
> >> (in bytes, of course). Note that the &MEM_REF case would naturally
> >> work with this as well.
> >
> > OK. New patch is like below, and bootstrapped on x86-32.
>
> You want instead of
>
> +  if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF
> +      || TREE_CODE (TREE_OPERAND (rhs1, 0)) == MEM_REF
> +      || TREE_CODE (TREE_OPERAND (rhs1, 0)) == COMPONENT_REF)
> +    {
>
>   if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == MEM_REF
>       || handled_component_p (TREE_OPERAND (rhs1, 0)))
>     {
>
> +      base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base);
> +      chrec1 = analyze_scalar_evolution (loop, base);
>
> can you please add a wrapper
>
> tree
> analyze_scalar_evolution_for_address_of (struct loop *loop, tree var)
> {
>   return analyze_scalar_evolution (loop, build_fold_addr_expr (var));
> }
>
> and call that instead of building the ADDR_EXPR there? We want
> to avoid building that tree node, but even such a simple wrapper would
> be prefered.
>
> +      if (bitpos)
>
>   if (bitpos != 0)
>
> +       chrec3 = build_int_cst (integer_type_node,
> +                               bitpos / BITS_PER_UNIT);
>
> please use size_int (bitpos / BITS_PER_UNIT) instead. Using
> integer_type_node is definitely wrong.
>
> Ok with those changes.

Richard,

Modified everything as you suggested, and the new code is like below. Bootstrapped on x86-32. OK for trunk now?

Thanks,
-Jiangning

ChangeLog:

2012-03-08  Jiangning Liu

	* tree-scalar-evolution.c (interpret_rhs_expr): Generate chrec for
	array reference and component reference.
	(analyze_scalar_evolution_for_address_of): New.

ChangeLog for testsuite:

2012-03-08  Jiangning Liu

	* gcc.dg/tree-ssa/scev-3.c: New.
	* gcc.dg/tree-ssa/scev-4.c: New.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
new file mode 100644
index 000..28d5c93
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+  int i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i];
+    *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
new file mode 100644
index 000..6c1e530
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+typedef struct {
+  int x;
+  int y;
+} S;
+
+int *a_p;
+S a[1000];
+
+f(int k)
+{
+  int i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i].y;
+    *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 2077c8d..c719984
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -266,6 +266,8 @@ along with GCC; see the file COPYING3. If not see
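The tree-scalar-evolution.c part of the diff is cut off above. For reference, the wrapper Richard asked for is fully spelled out in his review, so the committed version presumably matches this sketch:

/* Analyze the evolution of &VAR in LOOP, without making each caller
   build the ADDR_EXPR tree node by hand (per the review comment).  */
tree
analyze_scalar_evolution_for_address_of (struct loop *loop, tree var)
{
  return analyze_scalar_evolution (loop, build_fold_addr_expr (var));
}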
RE: [PATCH] Improve SCEV for array element
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Jiangning Liu > Sent: Friday, January 20, 2012 5:07 PM > To: 'Richard Guenther' > Cc: gcc-patches@gcc.gnu.org > Subject: RE: [PATCH] Improve SCEV for array element > > > It's definitely not ok at this stage but at most for next stage1. > > OK. I may wait until next stage1. > > > This is a very narrow pattern-match. It doesn't allow for &a[i].x > for > > example, even if a[i] is a one-element structure. I think the > > canonical way of handling ADDR_EXPR is to use sth like > > > > base = get_inner_reference (TREE_OPERAND (rhs1, 0), ..., &offset, > > ...); base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base); > > chrec1 = analyze_scalar_evolution (loop, base); > > chrec2 = analyze_scalar_evolution (loop, offset); > > chrec1 = chrec_convert (type, chrec1, at_stmt); > > chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt); > > res = chrec_fold_plus (type, chrec1, chrec2); > > > > where you probably need to handle scev_not_known when analyzing > offset > > (which might be NULL). You also need to add bitpos to the base > > address (in bytes, of course). Note that the &MEM_REF case would > > naturally work with this as well. > > OK. New patch is like below, and bootstrapped on x86-32. > > ChangeLog: > > 2012-01-20 Jiangning Liu > > * tree-scalar-evolution (interpret_rhs_expr): generate chrec > for > array reference and component reference. > > > ChangeLog for testsuite: > > 2012-01-20 Jiangning Liu > > * gcc.dg/tree-ssa/scev-3.c: New. > * gcc.dg/tree-ssa/scev-4.c: New. > Richard, PING... Is this patch OK after branch 4.7 is created and trunk is open again? Thanks, -Jiangning > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c > b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c > new file mode 100644 > index 000..28d5c93 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +int *a_p; > +int a[1000]; > + > +f(int k) > +{ > + int i; > + > + for (i=k; i<1000; i+=k) { > + a_p = &a[i]; > + *a_p = 100; > +} > +} > + > +/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c > b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c > new file mode 100644 > index 000..6c1e530 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c > @@ -0,0 +1,23 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +typedef struct { > + int x; > + int y; > +} S; > + > +int *a_p; > +S a[1000]; > + > +f(int k) > +{ > + int i; > + > + for (i=k; i<1000; i+=k) { > + a_p = &a[i].y; > + *a_p = 100; > +} > +} > + > +/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c > index 2077c8d..4e06b75 > --- a/gcc/tree-scalar-evolution.c > +++ b/gcc/tree-scalar-evolution.c > @@ -1712,16 +1712,61 @@ interpret_rhs_expr (struct loop *loop, gimple > at_stmt, >switch (code) > { > case ADDR_EXPR: > - /* Handle &MEM[ptr + CST] which is equivalent to > POINTER_PLUS_EXPR. 
> */ > - if (TREE_CODE (TREE_OPERAND (rhs1, 0)) != MEM_REF) > - { > - res = chrec_dont_know; > - break; > - } > + if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF > + || TREE_CODE (TREE_OPERAND (rhs1, 0)) == MEM_REF > + || TREE_CODE (TREE_OPERAND (rhs1, 0)) == COMPONENT_REF) > +{ > + enum machine_mode mode; > + HOST_WIDE_INT bitsize, bitpos; > + int unsignedp; > + int volatilep = 0; > + tree base, offset; > + tree chrec3; > + > + base = get_inner_reference (TREE_OPERAND (rhs1, 0), > + &bitsize, &bitpos, &offset, > + &mode, &unsignedp, &volatilep, false); > + > + if (TREE_CODE (base) == MEM_REF) > + { > + rhs2 = TREE_OPERAND (base, 1); > + rhs1 = TREE_
RE: [PATCH] Improve SCEV for array element
> It's definitely not ok at this stage but at most for next stage1.

OK. I may wait until next stage1.

> This is a very narrow pattern-match. It doesn't allow for &a[i].x for
> example, even if a[i] is a one-element structure. I think the canonical
> way of handling ADDR_EXPR is to use sth like
>
>   base = get_inner_reference (TREE_OPERAND (rhs1, 0), ..., &offset, ...);
>   base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base);
>   chrec1 = analyze_scalar_evolution (loop, base);
>   chrec2 = analyze_scalar_evolution (loop, offset);
>   chrec1 = chrec_convert (type, chrec1, at_stmt);
>   chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt);
>   res = chrec_fold_plus (type, chrec1, chrec2);
>
> where you probably need to handle scev_not_known when analyzing offset
> (which might be NULL). You also need to add bitpos to the base address
> (in bytes, of course). Note that the &MEM_REF case would naturally work
> with this as well.

OK. New patch is like below, and bootstrapped on x86-32.

ChangeLog:

2012-01-20  Jiangning Liu

	* tree-scalar-evolution.c (interpret_rhs_expr): Generate chrec for
	array reference and component reference.

ChangeLog for testsuite:

2012-01-20  Jiangning Liu

	* gcc.dg/tree-ssa/scev-3.c: New.
	* gcc.dg/tree-ssa/scev-4.c: New.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
new file mode 100644
index 000..28d5c93
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+  int i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i];
+    *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
new file mode 100644
index 000..6c1e530
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+typedef struct {
+  int x;
+  int y;
+} S;
+
+int *a_p;
+S a[1000];
+
+f(int k)
+{
+  int i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i].y;
+    *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 2077c8d..4e06b75
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1712,16 +1712,61 @@ interpret_rhs_expr (struct loop *loop, gimple at_stmt,
   switch (code)
     {
     case ADDR_EXPR:
-      /* Handle &MEM[ptr + CST] which is equivalent to POINTER_PLUS_EXPR.  */
-      if (TREE_CODE (TREE_OPERAND (rhs1, 0)) != MEM_REF)
-	{
-	  res = chrec_dont_know;
-	  break;
-	}
+      if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF
+	  || TREE_CODE (TREE_OPERAND (rhs1, 0)) == MEM_REF
+	  || TREE_CODE (TREE_OPERAND (rhs1, 0)) == COMPONENT_REF)
+	{
+	  enum machine_mode mode;
+	  HOST_WIDE_INT bitsize, bitpos;
+	  int unsignedp;
+	  int volatilep = 0;
+	  tree base, offset;
+	  tree chrec3;
+
+	  base = get_inner_reference (TREE_OPERAND (rhs1, 0),
+				      &bitsize, &bitpos, &offset,
+				      &mode, &unsignedp, &volatilep, false);
+
+	  if (TREE_CODE (base) == MEM_REF)
+	    {
+	      rhs2 = TREE_OPERAND (base, 1);
+	      rhs1 = TREE_OPERAND (base, 0);
+
+	      chrec1 = analyze_scalar_evolution (loop, rhs1);
+	      chrec2 = analyze_scalar_evolution (loop, rhs2);
+	      chrec1 = chrec_convert (type, chrec1, at_stmt);
+	      chrec2 = chrec_convert (TREE_TYPE (rhs2), chrec2, at_stmt);
+	      res = chrec_fold_plus (type, chrec1, chrec2);
+	    }
+	  else
+	    {
+	      base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base);
+	      chrec1 = analyze_scalar_evolution (loop, base);
+	      chrec1 = chrec_convert (type, chrec1, at_stmt);
+	      res = chrec1;
+	    }
 
-      rhs2 = TREE_OPERAND (TREE_OPERAND (rhs1, 0), 1);
-      rhs1 = TREE_OPERAND (TREE_OPERAND (rhs1, 0), 0);
-      /* Fall through.  */
+	  if (offset != NULL_TREE)
+	    {
+	      chrec2 = analyze_scalar_evolution (loop, offset);
+	      chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt);
+	      res = chrec_fold_plus (type, res, chrec2);
+	    }
+
+
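The hunk is truncated above. Judging from Richard's later review, which quotes the submitted code back, the missing tail presumably added the bitpos displacement along these lines (a sketch, not the posted text; the review subsequently asked for size_int (bitpos / BITS_PER_UNIT) instead of the integer_type_node constant):

+	  if (bitpos)
+	    {
+	      chrec3 = build_int_cst (integer_type_node,
+				      bitpos / BITS_PER_UNIT);
+	      chrec3 = chrec_convert (TREE_TYPE (chrec3), chrec3, at_stmt);
+	      res = chrec_fold_plus (type, res, chrec3);
+	    }
+	  break;
+	}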
[PATCH] Improve SCEV for array element
This code change intends to improve SCEV for array elements and reduce the common subexpressions in loops that may be introduced by multiple references to expressions like &a[i]. With this optimization the register pressure in loops can be reduced. The problem originally comes from a real benchmark; the test case only tries to detect the GIMPLE-level changes.

Bootstrapped on x86-32. OK for trunk?

ChangeLog:

2012-01-05  Jiangning Liu

	* tree-scalar-evolution.c (interpret_rhs_expr): Generate chrec for
	array reference.

ChangeLog for testsuite:

2012-01-05  Jiangning Liu

	* gcc.dg/scev-1.c: New.

diff --git a/gcc/testsuite/gcc.dg/scev-1.c b/gcc/testsuite/gcc.dg/scev-1.c
new file mode 100644
index 000..28d5c93
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/scev-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+  int i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i];
+    *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 2077c8d..de89fc4
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1712,6 +1712,42 @@ interpret_rhs_expr (struct loop *loop, gimple at_stmt,
   switch (code)
     {
     case ADDR_EXPR:
+      if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF)
+	{
+	  tree array_ref;
+	  tree var_decl, base, offset;
+	  tree low_bound;
+	  tree unit_size;
+	  tree index;
+
+	  array_ref = TREE_OPERAND (rhs1, 0);
+	  var_decl = TREE_OPERAND (array_ref, 0);
+	  index = TREE_OPERAND (array_ref, 1);
+
+	  low_bound = array_ref_low_bound (array_ref);
+	  unit_size = array_ref_element_size (array_ref);
+
+	  /* We assume all arrays have sizes that are a multiple of a byte.
+	     First subtract the lower bound, if any, in the type of the
+	     index, then convert to sizetype and multiply by the size of
+	     the array element.  */
+	  if (! integer_zerop (low_bound))
+	    index = fold_build2 (MINUS_EXPR, TREE_TYPE (index),
+				 index, low_bound);
+
+	  offset = size_binop (MULT_EXPR,
+			       fold_convert (sizetype, index),
+			       unit_size);
+
+	  base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), var_decl);
+	  chrec1 = analyze_scalar_evolution (loop, base);
+	  chrec2 = analyze_scalar_evolution (loop, offset);
+	  chrec1 = chrec_convert (type, chrec1, at_stmt);
+	  chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt);
+	  res = chrec_fold_plus (type, chrec1, chrec2);
+	  break;
+	}
+
       /* Handle &MEM[ptr + CST] which is equivalent to POINTER_PLUS_EXPR.  */
       if (TREE_CODE (TREE_OPERAND (rhs1, 0)) != MEM_REF)
 	{

Thanks,
-Jiangning
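To make the intent concrete, here is what the new ADDR_EXPR handling should compute for the test loop; the chrec below is an illustration of the expected result, not copied from an actual dump.

/* In scev-1.c the IV evolves as i = {k, +, k}_1 and ints are 4 bytes,
   so the address should analyze to one affine pointer evolution:

     &a[i]  =>  { (int *) &a + 4 * (sizetype) k, +, 4 * (sizetype) k }_1

   which lets repeated &a[i] uses share a single IV instead of
   recomputing the address inside the loop body.  */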
RE: [RFC] Use REG_EXPR in back-end (introduced by optimization to conditional and/or in ARM back-end)
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Jiangning Liu > Sent: Wednesday, December 21, 2011 2:48 PM > To: 'Richard Henderson' > Cc: gcc-patches@gcc.gnu.org; 'Richard Guenther' > Subject: RE: [RFC] Use REG_EXPR in back-end (introduced by optimization > to conditional and/or in ARM back-end) > > > > -Original Message- > > From: Richard Henderson [mailto:r...@redhat.com] > > Sent: Tuesday, November 22, 2011 9:55 AM > > To: Jiangning Liu > > Cc: gcc-patches@gcc.gnu.org; 'Richard Guenther' > > Subject: Re: [RFC] Use REG_EXPR in back-end (introduced by > optimization > > to conditional and/or in ARM back-end) > > > > On 11/21/2011 05:31 PM, Jiangning Liu wrote: > > > My question is essentially is "May I really use REG_EXPR in back- > end > > code?" > > > like the patch I gave below? > > > > I suppose. > > > > Another alternative is to use BImode for booleans. Dunno how much of > > that > > you'd be able to gleen from mere rtl expansion or if you'd need > boolean > > decls to be expanded with BImode. > > Hi Richard, > > The output of expand pass requires the operands must meet the > requirement of > general_operand for binary operations, i.e. all RTX operations on top > level > must meet the hardware instruction requirement. Generally for a > hardware not > directly supporting a single bit logic operation, we can't generate BI > mode > rtx dest. > > So if I simply generate BImode for NE in expand pass, like "subreg:SI > (ne:BI > (reg:SI xxx) (const_int 0)))", there would be an assertion failure due > to > failing to find an appropriate instruction code to operate on a single > bit. > > This looks like a GCC design level issue. How would you suggest > generating a > legitimate rtx expression containing BImode? > Can anybody give me useful suggestions on this issue? My question essentially is may I really use BImode to solve my original problem? Thanks, -Jiangning > Thanks, > -Jiangning > > > > > The later would probably need a target hook. I've often wondered how > > much > > ia64 would benefit from that too, being able to store bool variables > > directly > > in predicate registers. > > > > All of this almost certainly must wait until stage1 opens up again > > though... > > > > > > r~ > > > >
[RFC][patch] improve scev for array element
This code change intends to improve SCEV for array elements and reduce the common subexpressions in loops that may be introduced by multiple references to the expression &a[i]. In the end the register pressure in the loop may be reduced. The test case is simplified from a real benchmark and only wants to show the GIMPLE-level changes.

Any suggestions?

diff --git a/gcc/testsuite/gcc.dg/scev-1.c b/gcc/testsuite/gcc.dg/scev-1.c
new file mode 100644
index 000..28d5c93
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/scev-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+  int i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i];
+    *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 2077c8d..de89fc4
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1712,6 +1712,42 @@ interpret_rhs_expr (struct loop *loop, gimple at_stmt,
   switch (code)
     {
     case ADDR_EXPR:
+      if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF)
+	{
+	  tree array_ref;
+	  tree var_decl, base, offset;
+	  tree low_bound;
+	  tree unit_size;
+	  tree index;
+
+	  array_ref = TREE_OPERAND (rhs1, 0);
+	  var_decl = TREE_OPERAND (array_ref, 0);
+	  index = TREE_OPERAND (array_ref, 1);
+
+	  low_bound = array_ref_low_bound (array_ref);
+	  unit_size = array_ref_element_size (array_ref);
+
+	  /* We assume all arrays have sizes that are a multiple of a byte.
+	     First subtract the lower bound, if any, in the type of the
+	     index, then convert to sizetype and multiply by the size of
+	     the array element.  */
+	  if (! integer_zerop (low_bound))
+	    index = fold_build2 (MINUS_EXPR, TREE_TYPE (index),
+				 index, low_bound);
+
+	  offset = size_binop (MULT_EXPR,
+			       fold_convert (sizetype, index),
+			       unit_size);
+
+	  base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), var_decl);
+	  chrec1 = analyze_scalar_evolution (loop, base);
+	  chrec2 = analyze_scalar_evolution (loop, offset);
+	  chrec1 = chrec_convert (type, chrec1, at_stmt);
+	  chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt);
+	  res = chrec_fold_plus (type, chrec1, chrec2);
+	  break;
+	}
+
       /* Handle &MEM[ptr + CST] which is equivalent to POINTER_PLUS_EXPR.  */
       if (TREE_CODE (TREE_OPERAND (rhs1, 0)) != MEM_REF)
 	{

Another bigger case may be like below, and lower register pressure could be observed:

int a[512] ;
int b[512] ;
int *ax ;
int *bx ;
int *ay ;
int *by ;
int i ;
int j ;

int f(int k)
{
	for( j = 0 ; j < k ; j++) {
		for( i = j ; i < 512 ; i += k) {
			ax = &a[i+k] ;
			bx = &b[i+k] ;
			ay = &a[i] ;
			by = &b[i] ;

			*ax >>= 7 ;
			*bx >>= 7 ;
			a[i] = 8;
			b[i] = 8;
		}
	}
}

Thanks,
-Jiangning
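For the larger example, my reading of what the change enables, written as an illustration rather than taken from a dump:

/* With IVs j = {0, +, 1}_1 and i = {j, +, k}_2, the four addresses in
   the inner loop become affine chrecs over the same bases, e.g.

     &a[i]    =>  { &a + 4*j, +, 4*k }_2
     &a[i+k]  =>  { &a + 4*j + 4*k, +, 4*k }_2

   so ax/ay (and likewise bx/by) can be rewritten in terms of one
   pointer IV per array, reducing the number of simultaneously live
   pointers and hence register pressure.  */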
RE: [RFC] Use REG_EXPR in back-end (introduced by optimization to conditional and/or in ARM back-end)
> -----Original Message-----
> From: Richard Henderson [mailto:r...@redhat.com]
> Sent: Tuesday, November 22, 2011 9:55 AM
> To: Jiangning Liu
> Cc: gcc-patches@gcc.gnu.org; 'Richard Guenther'
> Subject: Re: [RFC] Use REG_EXPR in back-end (introduced by optimization
> to conditional and/or in ARM back-end)
>
> On 11/21/2011 05:31 PM, Jiangning Liu wrote:
> > My question essentially is "May I really use REG_EXPR in back-end code?"
> > like the patch I gave below?
>
> I suppose.
>
> Another alternative is to use BImode for booleans. Dunno how much of that
> you'd be able to glean from mere rtl expansion or if you'd need boolean
> decls to be expanded with BImode.

Hi Richard,

The output of the expand pass requires that the operands of binary operations meet the requirements of general_operand, i.e. all top-level RTX operations must match a hardware instruction. Generally, for hardware that does not directly support a single-bit logic operation, we can't generate a BImode rtx dest.

So if I simply generate BImode for NE in the expand pass, like "subreg:SI (ne:BI (reg:SI xxx) (const_int 0))", there would be an assertion failure due to failing to find an appropriate instruction code to operate on a single bit.

This looks like a GCC design-level issue. How would you suggest generating a legitimate rtx expression containing BImode?

Thanks,
-Jiangning

> The latter would probably need a target hook. I've often wondered how much
> ia64 would benefit from that too, being able to store bool variables directly
> in predicate registers.
>
> All of this almost certainly must wait until stage1 opens up again though...
>
>
> r~
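To make the failure mode concrete, here is a minimal sketch of the rtx shape in question. The function name and the SImode source/destination pseudos are invented for illustration; only the general RTL constructors are assumed.

/* Hypothetical: expand `dest = (src != 0)' keeping the boolean in
   BImode.  On a target with no single-bit logical/move patterns, the
   BImode set below matches no insn -- which is the assertion failure
   described above.  */
static void
expand_bool_ne_bimode (rtx dest_si, rtx src_si)
{
  rtx bool_val = gen_reg_rtx (BImode);
  emit_insn (gen_rtx_SET (VOIDmode, bool_val,
			  gen_rtx_NE (BImode, src_si, const0_rtx)));
  /* Consumers then see the value through a paradoxical subreg,
     cf. the mail's "subreg:SI (ne:BI (reg:SI xxx) (const_int 0))".  */
  emit_move_insn (dest_si, gen_rtx_SUBREG (SImode, bool_val, 0));
}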
RE: [RFC] Optimization to conditional and/or in ARM back-end
> -Original Message- > From: Andrew Pinski [mailto:pins...@gmail.com] > Sent: Tuesday, November 22, 2011 1:14 PM > To: Jiangning Liu > Cc: gcc-patches@gcc.gnu.org; Richard Guenther; Richard Henderson > Subject: Re: [RFC] Optimization to conditional and/or in ARM back-end > > On Sun, Nov 20, 2011 at 6:17 PM, Jiangning Liu > wrote: > > Hi, > > > > This patch is to implement a peephole like optimization in ARM back- > end. > > > > If we have an if condition expression like "((r3 != 0) & r1) != 0", > > So this is the same as: > int f1(int r1, int r3) > { > if (((r3 != 0) & r1) != 0) > return g(); > return 1; > } > --- CUT --- > Can't you do this instead: > Combine the following two instructions: > (insn 17 15 18 2 (parallel [ > (set (reg:SI 150) > (and:SI (ne:SI (reg:SI 0 r0 [ r1 ]) > (const_int 0 [0])) > (reg:SI 1 r1 [ r3 ]))) > (clobber (reg:CC 24 cc)) > ]) t24.c:11 274 {*cond_arith} > (expr_list:REG_UNUSED (reg:CC 24 cc) > (expr_list:REG_DEAD (reg:SI 1 r1 [ r3 ]) > (expr_list:REG_DEAD (reg:SI 0 r0 [ r1 ]) > (nil) > > (insn 18 17 19 2 (set (reg:CC 24 cc) > (compare:CC (reg:SI 150) > (const_int 0 [0]))) t24.c:11 211 {*arm_cmpsi_insn} > (expr_list:REG_DEAD (reg:SI 150) > (nil))) > > And then have a pattern which expands it to (note the r3 here is the > r3 in the function, likewise for r1): > andi r1, r1, #1 > cmp r3, #0 > it ne > cmpne r1, #0 > > Yes it is one extra instruction but it is still better than before > right? Hi Andrew, Yes, and even the code generated can be like cmp r3, #0 itt ne andine r1, r1, #1 cmpne r1, #0 But this is not what I want. For performance purpose, I want to remove that andi instruction as well. Also, actually if we don't modify r1, there might be other more optimization opportunities, for example register pressure can be reduced for some cases, I guess. Thanks, -Jiangning
RE: [RFC] Use which_alternative in preparation-statements of define_insn_and_split
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Tuesday, November 22, 2011 7:55 AM > To: Jiangning Liu > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [RFC] Use which_alternative in preparation-statements of > define_insn_and_split > > On 11/20/2011 07:34 PM, Jiangning Liu wrote: > > Hi, > > > > I find which_alternative can't really be used in preparation- > statements of > > define_insn_and_split, so can this be fixed like below? > > > > For example, I want to use which_alternative in the pattern below, > > > > (define_insn_and_split "*thumb2_movsicc_insn" > > [(set (match_operand:SI 0 "s_register_operand" "=r,r,r,r,r,r,r,r") > > (if_then_else:SI > > (match_operator 3 "arm_comparison_operator" > > [(match_operand 4 "cc_register" "") (const_int 0)]) > > (match_operand:SI 1 "arm_not_operand" "0,0,rI,K,rI,rI,K,K") > > (match_operand:SI 2 "arm_not_operand" "rI,K,0,0,rI,K,rI,K")))] > > "TARGET_THUMB2" > > "@ > >it\\t%D3\;mov%D3\\t%0, %2 > >it\\t%D3\;mvn%D3\\t%0, #%B2 > >it\\t%d3\;mov%d3\\t%0, %1 > >it\\t%d3\;mvn%d3\\t%0, #%B1 > >ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2 > >ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2 > >ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2 > >ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2" > > "&& reload_completed" > > [(cond_exec (match_dup 5) > > (set (match_dup 0) (match_dup 6)))] > > " > > { > > if (which_alternative >= 2 && which_alternative < 4) > > { > > ... > > } > > else if (which_alternative >= 4) > > Hmm, I guess. It seems a bit weird. > > It seems like you'd be better off *not* using define_insn_and_split, > actually, and instead using more specific tests on the separate > define_split than you would on the define_insn. > Yes. That would be alternative solution I can accept. But I still want to know why we don't want to support this? I don't see any GCC documentation saying not allowing this usage. Or you might be thinking this change is not safe enough? Thanks, -Jiangning > I don't feel strongly about it though. I won't object if some other > rtl maintainer wants to approve this. > > > r~
RE: [RFC] Use REG_EXPR in back-end (introduced by optimization to conditional and/or in ARM back-end)
The original subject doesn't catch the key point, so I changed the subject to get more people noticed. My question is essentially is "May I really use REG_EXPR in back-end code?" like the patch I gave below? Thanks, -Jiangning > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Jiangning Liu > Sent: Monday, November 21, 2011 10:18 AM > To: gcc-patches@gcc.gnu.org > Cc: 'Richard Guenther'; Richard Henderson > Subject: [RFC] Optimization to conditional and/or in ARM back-end > > Hi, > > This patch is to implement a peephole like optimization in ARM back-end. > > If we have an if condition expression like "((r3 != 0) & r1) != 0", > originally the binary code to be generated is like, > > cmp r3, #0 > ite eq > moveq r1, #0 > andne r1, r1, #1 > cmp r1, #0 > > But actually this expression could be transformed into "((r3 != x) & > (r1 != > 0)) != 0", if we know r2 is with bool type. This way we could generate > new > binary code like, > > cmp r3, #0 > it ne > cmpne r1, #0 > > The question is how to judge r2 is a bool variable in back-end. I'm > using > REG_EXPR in function arm_check_logic_with_bool_reg to check if r2 is a > bool, > this function is being invoked in pattern "*cond_". > > May I really use REG_EXPR this way in GCC back-end code? I posted a > related > topic at http://gcc.gnu.org/ml/gcc/2011-11/msg00417.html. > > If the answer is No, is there any suggestions about either how to judge > this > bool type or how to implement this optimization? > > Appreciate your comments in advance! > > Thanks, > -Jiangning > > > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > index 23a29c6..8b12d48 > --- a/gcc/config/arm/arm-protos.h > +++ b/gcc/config/arm/arm-protos.h > @@ -114,6 +114,7 @@ extern rtx arm_gen_load_multiple (int *, int, rtx, > int, > rtx, HOST_WIDE_INT *); > extern rtx arm_gen_store_multiple (int *, int, rtx, int, rtx, > HOST_WIDE_INT > *); > extern int arm_gen_movmemqi (rtx *); > extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx); > +extern bool arm_check_logic_with_bool_reg (RTX_CODE, rtx, rtx); > extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx, > HOST_WIDE_INT); > extern rtx arm_gen_compare_reg (RTX_CODE, rtx, rtx); > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index a429c19..e96f24a > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -10910,6 +10910,41 @@ arm_gen_movmemqi (rtx *operands) >return 1; > } > > +/* Check whether the expression "rc " can be > transformed > + to use conditional comparison or not. */ > +bool > +arm_check_logic_with_bool_reg (RTX_CODE rc, rtx bool_reg, rtx cmp1) { > + rtx cmp2; > + HOST_WIDE_INT dom_cc; > + > + if (!REG_P (bool_reg)) > +return false; > + > + if (REG_EXPR (bool_reg) == NULL) > +return false; > + > + if (TREE_CODE (REG_EXPR (bool_reg)) != VAR_DECL) > +return false; > + > + if (TREE_CODE (TREE_TYPE (REG_EXPR (bool_reg))) != BOOLEAN_TYPE) > +return false; > + > + cmp2 = gen_rtx_NE (GET_MODE (bool_reg), bool_reg, const0_rtx); > + > + if (rc == AND) > +dom_cc = DOM_CC_X_AND_Y; > + else if (rc == IOR) > +dom_cc = DOM_CC_X_OR_Y; > + else > +gcc_unreachable (); > + > + if (arm_select_dominance_cc_mode (cmp1, cmp2, dom_cc) == CCmode) > +return false; > + > + return true; > +} > + > + > /* Select a dominance comparison mode if possible for a test of the > general > form (OP (COND_OR (X) (Y)) (const_int 0)). We support three forms. 
> COND_OR == DOM_CC_X_AND_Y => (X && Y) > diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md > index a78ba88..e90a78e > --- a/gcc/config/arm/arm.md > +++ b/gcc/config/arm/arm.md > @@ -9172,6 +9172,64 @@ > (set_attr "length" "4,4,8")] > ) > > +; This pattern matches expression like example "((r1 != x) & r2) != 0". > +; Here r2 is a register with data type boolean. > +; This pattern can transform to code matching patterns cmp_and. > +; Likewise, code matching pattern cmp_ior could be generated, if it is > | > +; rather than & in the example. > +(define_insn_and_split "*cond_" > + [(set (match_operand 0 "cc_register" "") > + (compare > + (ior_and:SI > + (match_operator:
[RFC] Use which_alternative in preparation-statements of define_insn_and_split
Hi,

I find which_alternative can't really be used in the preparation-statements of define_insn_and_split, so can this be fixed like below?

For example, I want to use which_alternative in the pattern below,

(define_insn_and_split "*thumb2_movsicc_insn"
  [(set (match_operand:SI 0 "s_register_operand" "=r,r,r,r,r,r,r,r")
	(if_then_else:SI
	 (match_operator 3 "arm_comparison_operator"
	  [(match_operand 4 "cc_register" "") (const_int 0)])
	 (match_operand:SI 1 "arm_not_operand" "0,0,rI,K,rI,rI,K,K")
	 (match_operand:SI 2 "arm_not_operand" "rI,K,0,0,rI,K,rI,K")))]
  "TARGET_THUMB2"
  "@
   it\\t%D3\;mov%D3\\t%0, %2
   it\\t%D3\;mvn%D3\\t%0, #%B2
   it\\t%d3\;mov%d3\\t%0, %1
   it\\t%d3\;mvn%d3\\t%0, #%B1
   ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
   ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
   ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
   ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2"
  "&& reload_completed"
  [(cond_exec (match_dup 5)
	      (set (match_dup 0) (match_dup 6)))]
  "
  {
    if (which_alternative >= 2 && which_alternative < 4)
      {
	...
      }
    else if (which_alternative >= 4)
      {
	...
      }
  }"
  [(set_attr "length" "6,6,6,6,10,10,10,10")
   (set_attr "conds" "use")]
)

Thanks,
-Jiangning

Patch:

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index c94e743..df6a3df
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -3502,6 +3502,10 @@ try_split (rtx pat, rtx trial, int last)
 	split_branch_probability = INTVAL (XEXP (note, 0));
       probability = split_branch_probability;
 
+  extract_insn (trial);
+  if (!constrain_operands (reload_completed))
+    return trial;
+
   seq = split_insns (pat, trial);
   split_branch_probability = -1;
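A note on why the proposed hunk has the intended effect; this is my reading of the recog internals, not part of the patch:

/* extract_insn fills recog_data from TRIAL, and constrain_operands,
   when it succeeds, records the index of the matching constraint
   alternative in the global which_alternative -- which is what lets
   the split's preparation statements test it afterwards.  If the
   constraint check fails, try_split simply declines to split by
   returning TRIAL unchanged.  */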
[RFC] Optimization to conditional and/or in ARM back-end
Hi,

This patch is to implement a peephole-like optimization in the ARM back-end.

If we have an if condition expression like "((r3 != 0) & r1) != 0", originally the binary code to be generated is like,

	cmp	r3, #0
	ite	eq
	moveq	r1, #0
	andne	r1, r1, #1
	cmp	r1, #0

But actually this expression could be transformed into "((r3 != 0) & (r1 != 0)) != 0" if we know r1 has bool type. This way we could generate new binary code like,

	cmp	r3, #0
	it	ne
	cmpne	r1, #0

The question is how to judge that r1 is a bool variable in the back-end. I'm using REG_EXPR in the function arm_check_logic_with_bool_reg to check whether r1 is a bool; this function is invoked in the pattern "*cond_<code>".

May I really use REG_EXPR this way in GCC back-end code? I posted a related topic at http://gcc.gnu.org/ml/gcc/2011-11/msg00417.html.

If the answer is No, are there any suggestions about either how to judge this bool type or how to implement this optimization?

Appreciate your comments in advance!

Thanks,
-Jiangning

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 23a29c6..8b12d48
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -114,6 +114,7 @@ extern rtx arm_gen_load_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
 extern rtx arm_gen_store_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
 extern int arm_gen_movmemqi (rtx *);
 extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx);
+extern bool arm_check_logic_with_bool_reg (RTX_CODE, rtx, rtx);
 extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx,
						        HOST_WIDE_INT);
 extern rtx arm_gen_compare_reg (RTX_CODE, rtx, rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a429c19..e96f24a
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -10910,6 +10910,41 @@ arm_gen_movmemqi (rtx *operands)
   return 1;
 }
 
+/* Check whether the expression "<cmp1> rc <bool_reg>" can be transformed
+   to use conditional comparison or not.  */
+bool
+arm_check_logic_with_bool_reg (RTX_CODE rc, rtx bool_reg, rtx cmp1)
+{
+  rtx cmp2;
+  HOST_WIDE_INT dom_cc;
+
+  if (!REG_P (bool_reg))
+    return false;
+
+  if (REG_EXPR (bool_reg) == NULL)
+    return false;
+
+  if (TREE_CODE (REG_EXPR (bool_reg)) != VAR_DECL)
+    return false;
+
+  if (TREE_CODE (TREE_TYPE (REG_EXPR (bool_reg))) != BOOLEAN_TYPE)
+    return false;
+
+  cmp2 = gen_rtx_NE (GET_MODE (bool_reg), bool_reg, const0_rtx);
+
+  if (rc == AND)
+    dom_cc = DOM_CC_X_AND_Y;
+  else if (rc == IOR)
+    dom_cc = DOM_CC_X_OR_Y;
+  else
+    gcc_unreachable ();
+
+  if (arm_select_dominance_cc_mode (cmp1, cmp2, dom_cc) == CCmode)
+    return false;
+
+  return true;
+}
+
 /* Select a dominance comparison mode if possible for a test of the
    general form (OP (COND_OR (X) (Y)) (const_int 0)).  We support three forms.
    COND_OR == DOM_CC_X_AND_Y => (X && Y)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index a78ba88..e90a78e
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9172,6 +9172,64 @@
 	(set_attr "length" "4,4,8")]
 )
 
+; This pattern matches expressions like the example "((r1 != x) & r2) != 0".
+; Here r2 is a register with data type boolean.
+; This pattern can transform to code matching pattern cmp_and.
+; Likewise, code matching pattern cmp_ior could be generated, if it is |
+; rather than & in the example.
+(define_insn_and_split "*cond_" + [(set (match_operand 0 "cc_register" "") + (compare + (ior_and:SI + (match_operator:SI 4 "arm_comparison_operator" + [(match_operand:SI 2 "s_register_operand" "r,r") + (match_operand:SI 3 "arm_add_operand" "rI,L")]) + (match_operand:SI 1 "s_register_operand" "r,r")) +(const_int 0)))] + "TARGET_32BIT + && arm_check_logic_with_bool_reg (, operands[1], operands[4])" + "#" + "&& 1" + [(set (match_dup 7) + (compare +(match_op_dup 5 + [(match_op_dup 4 [(match_dup 2) (match_dup 3)]) + (match_op_dup 6 [(match_dup 1) (const_int 0)])]) +(const_int 0)))] + " + { +HOST_WIDE_INT dom_cc; + +operands[6] = gen_rtx_NE (SImode, operands[1], const0_rtx); + +if ( == AND) + { +dom_cc = DOM_CC_X_AND_Y; +operands[5] = gen_rtx_fmt_ee (, SImode, + operands[4], operands[6]); + } +else if ( == IOR) + { +dom_cc = DOM_CC_X_OR_Y; +operands[5] = gen_rtx_fmt_ee (, SImode, + operands[4], operands[6]); + } +else + gcc_unreachable (); + +operands[7] + = gen_rtx_REG (arm_select_dominance_cc_mode (operands[4], + operands[6], + dom_cc), +CC_REGNUM); + }" + [(set_attr "conds" "clob") + (set (attr "length") + (if
MAINTAINERS: add myself
Just committed the following: * MAINTAINERS (Write After Approval): Add myself. Index: MAINTAINERS === --- MAINTAINERS (revision 181466) +++ MAINTAINERS (working copy) @@ -419,6 +419,7 @@ Marc Lehmann p...@goof.com James Lemkejwle...@juniper.net Kriang Lerdsuwanakij lerds...@users.sourceforge.net +Jiangning Liu jiangning@arm.com Sa Liu sa...@de.ibm.com Ralph Loader r...@ihug.co.nz Gabor Loki l...@inf.u-szeged.hu
[arm-embedded] Backport mainline r178102 and partial r172017
Backport mainline 178102 and partial r172017 to ARM/embedded-4_6-branch. Committed. 2011-11-17 Jiangning Liu Backport r178102 from mainline 2011-08-26 Jiangning Liu * config/arm/arm.md (*ior_scc_scc): Enable for Thumb2 as well. (*ior_scc_scc_cmp): Likewise (*and_scc_scc): Likewise. (*and_scc_scc_cmp): Likewise. (*and_scc_scc_nodom): Likewise. (*cmp_ite0, *cmp_ite1, *cmp_and, *cmp_ior): Handle Thumb2. Partially backport r172017 from mainline 2011-04-06 Wei Guozhi * config/arm/constraints.md (Py): New constraint. testsuite: 2011-11-17 Jiangning Liu Backport r178102 from mainline 2011-08-26 Jiangning Liu * gcc.target/arm/thumb2-cond-cmp-1.c: New. * gcc.target/arm/thumb2-cond-cmp-2.c: Likewise. * gcc.target/arm/thumb2-cond-cmp-3.c: Likewise. * gcc.target/arm/thumb2-cond-cmp-4.c: Likewise.
[arm-embedded] Revert r179307 and backport mainline r180964
The original solution r179307 of stack red zone issue for PR38644 is completely reverted in ARM/embedded-4_6-branch. And backport mainline 180964 to ARM/embedded-4_6-branch. Committed. Undo changes committed in r179307. 2011-11-16 Jiangning Liu Backport r180964 from mainline 2011-11-04 Jiangning Liu PR rtl-optimization/38644 * config/arm/arm.c (thumb1_expand_epilogue): Add memory barrier for epilogue having stack adjustment. testsuite: 2011-11-16 Jiangning Liu Backport r180964 from mainline 2011-11-04 Jiangning Liu PR rtl-optimization/38644 * gcc.target/arm/stack-red-zone.c: New.
[PATCH, ARM] Fix stack red zone bug (PR38644)
Hi,

This patch is to fix PR38644 in the ARM back-end. OK for trunk?

For every detail, please refer to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644.

ChangeLog:

2011-11-2  Jiangning Liu

	PR rtl-optimization/38644
	* config/arm/arm.c (thumb1_expand_epilogue): Add memory barrier
	for epilogue having stack adjustment.

ChangeLog of testsuite:

2011-11-2  Jiangning Liu

	PR rtl-optimization/38644
	* gcc.target/arm/stack-red-zone.c: New.

Thanks,
-Jiangning

Patch:

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f1ada6f..1f6fc26
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22215,6 +22215,8 @@ thumb1_expand_epilogue (void)
   gcc_assert (amount >= 0);
   if (amount)
     {
+      emit_insn (gen_blockage ());
+
       if (amount < 512)
	 emit_insn (gen_addsi3 (stack_pointer_rtx, stack_pointer_rtx,
			        GEN_INT (amount)));
diff --git a/gcc/testsuite/gcc.target/arm/stack-red-zone.c b/gcc/testsuite/gcc.target/arm/stack-red-zone.c
new file mode 100644
index 000..b9f0f99
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/stack-red-zone.c
@@ -0,0 +1,12 @@
+/* No stack red zone. PR38644. */
+/* { dg-options "-mthumb -O2" } */
+/* { dg-final { scan-assembler "ldrb\[^\n\]*\\n\[\t \]*add\[\t \]*sp" } } */
+
+extern int doStreamReadBlock (int *, char *, int size, int);
+
+char readStream (int *s)
+{
+  char c = 0;
+  doStreamReadBlock (s, &c, 1, *s);
+  return c;
+}
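To illustrate what the blockage prevents, here is a hypothetical mis-scheduled Thumb-1 epilogue, written as a comment rather than taken from a real dump:

/* Without a barrier the scheduler may swap these two steps,

	add	sp, sp, #8	   @ release the frame first
	ldrb	r3, [sp, #-1]	   @ ...then load `c' from below sp

   and an interrupt arriving between them can clobber the byte still
   being read, since ARM has no stack red zone.  The test's
   scan-assembler pattern checks precisely that the ldrb stays before
   the "add sp".  */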
RE: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
> -Original Message- > From: Kai Tietz [mailto:ktiet...@googlemail.com] > Sent: Thursday, October 27, 2011 5:36 PM > To: Jiangning Liu > Cc: Michael Matz; Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org; > Richard Henderson > Subject: Re: [patch tree-optimization]: Improve handling of > conditional-branches on targets with high branch costs > > 2011/10/27 Jiangning Liu : > > > > > >> -Original Message- > >> From: Michael Matz [mailto:m...@suse.de] > >> Sent: Wednesday, October 26, 2011 11:47 PM > >> To: Kai Tietz > >> Cc: Jiangning Liu; Richard Guenther; Kai Tietz; gcc- > patc...@gcc.gnu.org; > >> Richard Henderson > >> Subject: Re: [patch tree-optimization]: Improve handling of > >> conditional-branches on targets with high branch costs > >> > >> Hi, > >> > >> On Wed, 26 Oct 2011, Kai Tietz wrote: > >> > >> > So you would mean that memory dereferencing shouldn't be > considered > >> as > >> > side-effect at all? > >> > >> No. I haven't said this at all. Of course it's a side-effect, but > >> we're > >> allowed to remove existing ones (under some circumstances). We're > not > >> allowed to introduce new ones, which means that this ... > >> > >> > So we would happily cause by code 'if (i && *i != 0) an crash, as > >> > memory-dereference has for you no side-effect? > >> > >> ... is not allowed. But in the original example the memread was on > the > >> left side, hence occured always, therefore we can move it to the > right > >> side, even though it might occur less often. > >> > >> > In you special case it might be valid that, if first (and C-fold- > >> const > >> > doesn't know if the side-effect condition is really the first, as > it > >> > might be a sub-sequence of a condition) condition might trap or > not, > >> to > >> > combine it. But branching has to cover the general cases. If you > >> find > >> > a way to determine that left-hand operand in fold_const's > branching > >> code > >> > is really the left-most condition in chain, then we can add such a > >> > special case, but I don't see here an easy way to determine it. > >> > >> Hmm? I don't see why it's necessary to check if it's the left-most > >> condition in a chain. If the left hand of '&&' is a memread it can > >> always > >> be moved to the right side (or the operator transformed into '&' > which > >> can > >> have the same effect), of course only if the original rhs is free of > >> side > >> effects, but then independed if the && was part of a larger > expression. > >> The memread will possibly be done fewer times than originally, but > as > >> said, that's okay. > >> > > > > Agree. The point is for the small case I gave RHS doesn't have side > effect > > at all, so the optimization of changing it to AND doesn't violate C > > specification. We need to recover something for this case, although > it did > > improve a lot for some particular benchmarks. > > > > Thanks, > > -Jiangning > > > >> > >> Ciao, > >> Michael. > > Hmm, so we can allow merging to AND, if the left-hand-side might trap > but has no-side-effects and rhs has neither trapping nor side-effects. > As for the case that left-hand side has side-effects but right-hand > not, we aren't allowed to do this AND/OR merge. For example 'if ((f = > foo ()) != 0 && f < 24)' we aren't allowed to make this > transformation. > > This shouldn't be that hard. We need to provide to simple_operand_p_2 > an additional argument for checking trapping or not. > Would it be OK if I file a tracker in bugzilla against this? > Regards, > Kai
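A sketch of what Kai's proposal could look like; the two-argument signature and the body are my assumptions based only on the sentence above, not a quote of any committed code:

/* Hypothetical: like simple_operand_p, but additionally reject
   operands with side effects, and trapping operands unless the
   caller explicitly allows them.  */
static bool
simple_operand_p_2 (tree exp, bool allow_traps)
{
  return (!TREE_SIDE_EFFECTS (exp)
	  && (allow_traps || !tree_could_trap_p (exp))
	  && simple_operand_p (exp));
}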
RE: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
> -Original Message- > From: Michael Matz [mailto:m...@suse.de] > Sent: Wednesday, October 26, 2011 11:47 PM > To: Kai Tietz > Cc: Jiangning Liu; Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org; > Richard Henderson > Subject: Re: [patch tree-optimization]: Improve handling of > conditional-branches on targets with high branch costs > > Hi, > > On Wed, 26 Oct 2011, Kai Tietz wrote: > > > So you would mean that memory dereferencing shouldn't be considered > as > > side-effect at all? > > No. I haven't said this at all. Of course it's a side-effect, but > we're > allowed to remove existing ones (under some circumstances). We're not > allowed to introduce new ones, which means that this ... > > > So we would happily cause by code 'if (i && *i != 0) an crash, as > > memory-dereference has for you no side-effect? > > ... is not allowed. But in the original example the memread was on the > left side, hence occured always, therefore we can move it to the right > side, even though it might occur less often. > > > In you special case it might be valid that, if first (and C-fold- > const > > doesn't know if the side-effect condition is really the first, as it > > might be a sub-sequence of a condition) condition might trap or not, > to > > combine it. But branching has to cover the general cases. If you > find > > a way to determine that left-hand operand in fold_const's branching > code > > is really the left-most condition in chain, then we can add such a > > special case, but I don't see here an easy way to determine it. > > Hmm? I don't see why it's necessary to check if it's the left-most > condition in a chain. If the left hand of '&&' is a memread it can > always > be moved to the right side (or the operator transformed into '&' which > can > have the same effect), of course only if the original rhs is free of > side > effects, but then independed if the && was part of a larger expression. > The memread will possibly be done fewer times than originally, but as > said, that's okay. > Agree. The point is for the small case I gave RHS doesn't have side effect at all, so the optimization of changing it to AND doesn't violate C specification. We need to recover something for this case, although it did improve a lot for some particular benchmarks. Thanks, -Jiangning > > Ciao, > Michael.
RE: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Michael Matz
> Sent: Tuesday, October 11, 2011 10:45 PM
> To: Kai Tietz
> Cc: Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org; Richard Henderson
> Subject: Re: [patch tree-optimization]: Improve handling of
> conditional-branches on targets with high branch costs
>
> Hi,
>
> On Tue, 11 Oct 2011, Kai Tietz wrote:
>
> > > Better make it a separate function the first tests your new
> > > conditions, and then calls simple_operand_p.
> >
> > Well, either I make it a new function and call it instead of
> > simple_operand_p,
>
> That's what I meant, yes.
>
> > >> @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_
> > >>            build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg),
> > >>                    ll_arg, rl_arg),
> > >>            build_int_cst (TREE_TYPE (ll_arg), 0));
> > >> -
> > >> -      if (LOGICAL_OP_NON_SHORT_CIRCUIT)
> > >> -        {
> > >> -          if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs)
> > >> -            return build2_loc (loc, code, truth_type, lhs, rhs);
> > >> -          return NULL_TREE;
> > >> -        }
> > >
> > > Why do you remove this hunk? Shouldn't you instead move the hunk you
> > > added to fold_truth_andor() here. I realize this needs some TLC to
> > > fold_truth_andor_1, because right now it early-outs for non-comparisons,
> > > but it seems the better place. I.e. somehow move the below code into the
> > > above branch, with the associated diddling on fold_truth_andor_1 that it
> > > gets called.
> >
> > This hunk is removed, as it is vain to do here.
>
> There is a fallthrough now, that wasn't there before. I don't know if
> it's harmless, I just wanted to mention it.

Yes, this part introduced different behavior for this small case,

int f(char *i, int j)
{
  if (*i && j!=2)
    return *i;
  else
    return j;
}

Before the fix, we have

  D.4710 = *i;
  D.4711 = D.4710 != 0;
  D.4712 = j != 2;
  D.4713 = D.4711 & D.4712;
  if (D.4713 != 0) goto <D.4714>; else goto <D.4715>;
  <D.4714>:
  D.4710 = *i;
  D.4716 = (int) D.4710;
  return D.4716;
  <D.4715>:
  D.4716 = j;
  return D.4716;

After the fix, we have

  D.4711 = *i;
  if (D.4711 != 0) goto <D.4712>; else goto <D.4713>;
  <D.4712>:
  if (j != 2) goto <D.4715>; else goto <D.4713>;
  <D.4715>:
  D.4711 = *i;
  D.4714 = (int) D.4711;
  return D.4714;
  <D.4713>:
  D.4714 = j;
  return D.4714;

Does this meet the original expectation? I find the code below in function fold_truth_andor makes the difference,

    /* Transform (A AND-IF B) into (A AND B), or (A OR-IF B)
       into (A OR B).  For sequence point consistancy, we need to check
       for trapping, and side-effects.  */
    else if (code == icode && simple_operand_p_2 (arg0)
	     && simple_operand_p_2 (arg1))
      return fold_build2_loc (loc, ncode, type, arg0, arg1);

For "*i != 0", simple_operand_p (*i) returns false. Originally this was not checked by the code, and I don't see that the patch originally intended to cover it. Can this be explained reasonably? I'm not arguing that this patch made things worse; I only want to understand the rationale behind this and to justify that the patch doesn't hurt anything else. Actually, on the contrary, I measured that this change accidentally made some benchmarks significantly improved, due to some other reasons.

Thanks,
-Jiangning

> > Btw richi asked for it, and I agree that new TRUTH-AND/OR packing is
> > better done at a single place in fold_truth_andor only.
>
> As fold_truthop is called twice by fold_truth_andor, the latter might
> indeed be the better place.
>
>
> Ciao,
> Michael.
RE: [PATCH] Fix stack red zone bug (PR38644)
> -Original Message- > From: Richard Sandiford > Date: Fri, Sep 30, 2011 at 8:46 PM > Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > To: Jiangning Liu > Cc: Jakub Jelinek , Richard Guenther > , Andrew Pinski , > gcc-patches@gcc.gnu.org > > > "Jiangning Liu" writes: > >> You seem to feel strongly about this because it's a wrong-code bug > that > >> is very easy to introduce and often very hard to detect. And I > >> defintely > >> sympathise with that. If we were going to to do it in a target- > >> independent > >> way, though, I think it would be better to scan patterns like > epilogue > >> and > >> automatically introduce barriers before assignments to > >> stack_pointer_rtx > >> (subject to the kind of hook in your patch). But I still don't > think > >> that's better than leaving the onus on the backend. The backend is > >> still responsible for much more complicated things like determning > >> the correct deallocation and register-restore sequence, and for > >> determining the correct CFI sequence. > >> > > > > I think middle-end in GCC is actually shared code rather than the > part > > exactly in the middle. A pass working on RTL can be a middle end just > > because the code can be shared for all targets, and some passes can > even > > work for both GIMPLE and RTL. > > > > Actually some optimizations need to work through "shared part" > (middle-end) > > plus "target specific part" (back-end). You are thinking the > interface > > between this "shared part" and "target specific part" should be using > > "barrier" as a properly model. To some extension I agree with this. > However, > > it doesn't mean the fix should be in back-end rather than middle end, > > because obviously this problem is a common ABI issue for all targets. > If we > > can abstract this issue to be a shared part, why shouldn't we do it > in > > middle end to reduce the onus of back-end? Back-end should handle the > target > > specific things rather than only the complicated things. > > And for avoidance of doubt, the automatic barrier insertion that I > described would be one way of doing it in target-independent code. > But... > > > If a complicated problem can be implemented in a "shared code" manner, > we > > still want to put it into middle end rather than back-end. I believe > those > > optimizations based on SSA form are complicated enough, but they are > all in > > middle end. This is the logic I'm seeing in GCC. > > The situation here is different. The target-independent rtl code is > being given a blob of instructions that the backend has generated for > the epilogue. There's no fine-tuning beyond that. E.g. we don't have > separate patterns for "restore registers", "deallocate stack", "return": > we just have one monolithic "epilogue" pattern. The target-independent > code has very little control. > > In contrast, after the tree optimisers have handed off the initial IL, > the tree optimisers are more or less in full control. There are very > few cases where we generate further trees outside the middle-end. The > only > case I know off-hand is the innards of va_start and va_arg, which can > be > generated by the backend. > > So let's suppose we had a similar situation there, where we wanted > va_arg do something special in a certain situation. If we had the > same three choices of: > > 1. use an on-the-side hook to represent the special something > 2. scan the code generated by the backend and automatically > inject the special something at an appropriate place > 3. 
require each backend to do it properly from the start > > (OK, slightly prejudiced wording :-)) I think we'd still choose 3. > Richard S., Although I implemented va_arg for a commercial compiler a long time ago, I have forgotten all the details. :-) I'm not sure va_arg is a good example to compare with this stack red zone case. > > For this particular issue, I don't think the hook interface I'm > > proposing is more complicated than the barrier. Instead, it is easier > > for a back-end implementer to be aware of the potential issue before > > really solving the stack red zone problem, because it is very clearly > > listed in the target hook list. > > The point for "model it in the IL" supporters like myself is that we > have both many backends and many rtl passes.
RE: [PATCH] Fix stack red zone bug (PR38644)
> -Original Message- > From: Richard Henderson [mailto:r...@redhat.com] > Sent: Saturday, October 01, 2011 3:05 AM > To: Jiangning Liu > Cc: 'Jakub Jelinek'; 'Richard Guenther'; Andrew Pinski; gcc- > patc...@gcc.gnu.org > Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > > On 09/29/2011 06:13 PM, Jiangning Liu wrote: > > > > > >> -Original Message- > >> From: Jakub Jelinek [mailto:ja...@redhat.com] > >> Sent: Thursday, September 29, 2011 6:14 PM > >> To: Jiangning Liu > >> Cc: 'Richard Guenther'; Andrew Pinski; gcc-patches@gcc.gnu.org > >> Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > >> > >> On Thu, Sep 29, 2011 at 06:08:50PM +0800, Jiangning Liu wrote: > >>> As far as I know different back-ends are implementing different > >>> prologue/epilogue in GCC. If one day this part can be refined and > >> abstracted > >>> as well, I would say solving this stack-red-zone problem in shared > >>> prologue/epilogue code would be a perfect solution, and a barrier can > >> be > >>> inserted there. > >>> > >>> I'm not saying you are wrong on keeping the scheduler using a pure > >> barrier > >>> interface. From an engineering point of view, I only feel my proposal > is > >> so far > >>> so good, because this patch at least solves the problem for all > >> targets in a > >>> quite simple way. Maybe it can be improved in the future based on this. > >> > >> But you don't want to listen to any other alternative, other > >> backends are > >> happy with being able to put the best kind of barrier at the best > spot > >> in the epilogue and don't need a "generic" solution which won't > model > >> very > >> well the target diversity anyway. > > > > Jakub, > > > > I appreciate your attention to this issue. > > > > 1) Can you clarify who the "other back-ends" are? Do they cover most > of the > > back-ends supported by GCC right now? > > Your red-stack barrier issue is *exactly* the same as the frame pointer > barrier issue, which affects many backends. > > That is, if the frame pointer is initialized before the local stack > frame > is allocated, then one has to add a barrier such that memory references > based on the frame pointer are not scheduled before the local stack > frame > allocation. > > One example of this is in the i386 port, where the prologue looks like > > push %ebp > mov %esp, %ebp > sub $frame, %esp > > The rtl we emit for that subtraction looks like > > (define_insn "pro_epilogue_adjust_stack__add" > [(set (match_operand:P 0 "register_operand" "=r,r") > (plus:P (match_operand:P 1 "register_operand" "0,r") > (match_operand:P 2 "" "r,l"))) >(clobber (reg:CC FLAGS_REG)) >(clobber (mem:BLK (scratch)))] > > Note the final clobber, which is a memory scheduling barrier. > > Other targets use similar tricks. For instance arm "stack_tie". > > Honestly, I've found nothing convincing throughout this thread that > suggests to me that this problem should be handled generically. > Richard H., Thanks for your explanation and for giving an x86 example. The key point is that, if possible, fixing it in the middle end can benefit all ports directly and avoid the bug-fixing burden in the back-ends, rather than fixing this problem port by port. Actually, the debate now is whether the memory barrier is properly modelled throughout GCC rather than in a single component, because my current understanding is that the scheduler is not the only component using memory barriers. Thanks, -Jiangning > > r~
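For concreteness, here is a minimal sketch -- not code from any port -- of how a back-end's prologue or epilogue expander can emit the kind of scheduling barrier described above. The emit_clobber idiom is the same one quoted from expand_builtin_longjmp() later in this thread; the wrapper name is purely illustrative:

  /* Illustrative sketch only.  Emit a blockage so that no memory
     reference can be scheduled across the adjacent stack adjustment.
     (clobber (mem:BLK (scratch))) is treated by sched-deps as
     conflicting with every memory access.  */
  static void
  emit_stack_blockage (void)
  {
    emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode)));
  }

A back-end expanding its own epilogue would call this immediately before the insn that deallocates the local frame.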
RE: [PATCH] Fix stack red zone bug (PR38644)
> -Original Message- > From: Richard Sandiford [mailto:rdsandif...@googlemail.com] > Sent: Friday, September 30, 2011 4:15 PM > To: Jiangning Liu > Cc: 'Jakub Jelinek'; 'Richard Guenther'; Andrew Pinski; gcc- > patc...@gcc.gnu.org > Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > > "Jiangning Liu" writes: > >> -Original Message- > >> From: Jakub Jelinek [mailto:ja...@redhat.com] > >> Sent: Thursday, September 29, 2011 6:14 PM > >> To: Jiangning Liu > >> Cc: 'Richard Guenther'; Andrew Pinski; gcc-patches@gcc.gnu.org > >> Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > >> > >> On Thu, Sep 29, 2011 at 06:08:50PM +0800, Jiangning Liu wrote: > >> > As far as I know different back-ends are implementing different > >> > prologue/epilogue in GCC. If one day this part can be refined and > >> abstracted > >> > as well, I would say solving this stack-red-zone problem in shared > >> > prologue/epilogue code would be a perfect solution, and a barrier > can > >> be > >> > inserted there. > >> > > >> > I'm not saying you are wrong on keeping the scheduler using a pure > >> barrier > >> > interface. From an engineering point of view, I only feel my proposal > is > >> so far > >> > so good, because this patch at least solves the problem for all > >> targets in a > >> > quite simple way. Maybe it can be improved in the future based on this. > >> > >> But you don't want to listen to any other alternative, other > >> backends are > >> happy with being able to put the best kind of barrier at the best > spot > >> in the epilogue and don't need a "generic" solution which won't > model > >> very > >> well the target diversity anyway. > > > > Jakub, > > > > I appreciate your attention to this issue. > > > > 1) Can you clarify who the "other back-ends" are? Do they cover most > of the > > back-ends supported by GCC right now? > > Not answering for Jakub of course, but as a maintainer of a backend, I > know > MIPS doesn't have the required barrier at the moment. But that's a bug. > > Like others in this thread, I'm strongly of the opinion that this > should > be modelled directly in the IL. And it's already supposed to be > modelled > in the IL. Target-independent code emits the required barriers in > cases > where it rather than the backend patterns are responsible. E.g.: > > emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode))); > emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx)); > > emit_move_insn (hard_frame_pointer_rtx, fp); > emit_stack_restore (SAVE_NONLOCAL, stack); > > from expand_builtin_longjmp() and: > > if (sa != 0) > { > sa = validize_mem (sa); > /* These clobbers prevent the scheduler from moving >references to variable arrays below the code >that deletes (pops) the arrays. */ > emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode))); > emit_clobber (gen_rtx_MEM (BLKmode, stack_pointer_rtx)); > } > > from emit_stack_restore(). Backends that fail to follow suit are IMO > just buggy. > > FWIW, I intend to fix MIPS this weekend. Richard S., I appreciate your attention to this issue and your investigation of the MIPS target. I'm really glad to know you "found" a potential bug in MIPS through this discussion. To some extent this confirms my earlier hypothesis. > > You seem to feel strongly about this because it's a wrong-code bug that > is very easy to introduce and often very hard to detect. And I > definitely > sympathise with that. 
If we were going to do it in a target- > independent > way, though, I think it would be better to scan patterns like epilogue > and > automatically introduce barriers before assignments to > stack_pointer_rtx > (subject to the kind of hook in your patch). But I still don't think > that's better than leaving the onus on the backend. The backend is > still responsible for much more complicated things like determining > the correct deallocation and register-restore sequence, and for > determining the correct CFI sequence. > I think the middle-end in GCC is actually shared code rather than the part exactly in the middle. A pass working on RTL can be a middle end just because the code can be shared for all targets, and some passes can even work for both GIMPLE and RTL. Actually some optimizations need to work through the "shared part" (middle-end) plus the "target specific part" (back-end).
RE: [PATCH] Fix stack red zone bug (PR38644)
> -Original Message- > From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Thursday, September 29, 2011 6:14 PM > To: Jiangning Liu > Cc: 'Richard Guenther'; Andrew Pinski; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > > On Thu, Sep 29, 2011 at 06:08:50PM +0800, Jiangning Liu wrote: > > As far as I know different back-ends are implementing different > > prologue/epilogue in GCC. If one day this part can be refined and > abstracted > > as well, I would say solving this stack-red-zone problem in shared > > prologue/epilogue code would be a perfect solution, and a barrier can > be > > inserted there. > > > > I'm not saying you are wrong on keeping the scheduler using a pure > barrier > > interface. From an engineering point of view, I only feel my proposal is > so far > > so good, because this patch at least solves the problem for all > targets in a > > quite simple way. Maybe it can be improved in the future based on this. > > But you don't want to listen to any other alternative, other > backends are > happy with being able to put the best kind of barrier at the best spot > in the epilogue and don't need a "generic" solution which won't model > very > well the target diversity anyway. Jakub, I appreciate your attention to this issue. 1) Can you clarify who the "other back-ends" are? Do they cover most of the back-ends supported by GCC right now? 2) You need to give data to prove that the "other back-ends" are happy with the current solution. The fact is that over the years a bunch of bugs have been filed against this problem. Why? Are you implying that the "other back-ends" are happy with bugs or potential bugs? That seems weird to me. Also, the issue is not whether a back-end is able to implement the barrier or not. If you are really asking "able or not", I would say every back-end is able, but that doesn't mean "able" is correct, and it doesn't mean "able" is the most reasonable choice. Compared with the one I am proposing, I don't see that the current solution has any significant advantage beyond the "proper modelling" for the scheduler that you are arguing for. Instead, the solution I'm proposing is a safe one, and one that makes bugs easy to avoid. If the GCC infrastructure can't even guarantee safe compilation, why should we insist on "proper modelling" for the scheduler only? Hopefully you can reconsider this. -Jiangning > > Jakub
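To make the failure mode concrete for readers of the archive, here is a hypothetical example (not the testcase from the patch, which is not visible here) of the wrong code PR38644 describes. On a target without a red zone, if the scheduler moves the epilogue's last frame access below the stack-pointer adjustment, an interrupt arriving between the two can clobber the value:

  /* Hypothetical illustration of the PR38644 failure mode.  */
  extern void consume (int *p);

  int
  f (void)
  {
    int local = 42;
    consume (&local);
    /* This read of 'local' must not be scheduled below the epilogue's
       stack-pointer increment: once the frame is deallocated, a signal
       or interrupt handler may overwrite that memory on targets with
       no red zone.  */
    return local;
  }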
RE: [PATCH] Fix stack red zone bug (PR38644)
> -Original Message- > From: Richard Guenther [mailto:richard.guent...@gmail.com] > Sent: Wednesday, September 28, 2011 4:39 PM > To: Jiangning Liu > Cc: Andrew Pinski; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > > On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu > wrote: > > > > > >> -Original Message- > >> From: Richard Guenther [mailto:richard.guent...@gmail.com] > >> Sent: Tuesday, September 27, 2011 3:41 PM > >> To: Jiangning Liu > >> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org > >> Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > >> > >> On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu > > >> wrote: > >> >> Think of it this way. What the IR says is there is no barrier > >> between > >> >> those moves. You either have an implicit barrier (which is what > you > >> >> are proposing) or you have it explicitly. I think we all rather > >> have > >> >> more things explicit rather than implicit in the IR. And that > has > >> >> been the overall feeling for a few years now. > >> >> > >> > > >> > Sorry, I'm afraid I can't agree with you. Instead, I think using a > >> barrier to describe this kind of dependence is a kind of implicit > >> method. Please note that this is not a usual data dependence issue. > >> The stack pointer change doesn't have any dependence on memory > access > >> at all. > >> > >> It is similar to atomic instructions that require being an > >> optimization/memory barrier. Sure it is not a usual data > dependence > >> (otherwise we would handle > >> it already), so the targets have to explicitly express the > >> dependence > >> somehow, for which we only have barriers right now. > >> > > > > Richard, > > > > Thanks for your explanation. It's explicit to the back-end, while it's > implicit > > to the scheduler in the middle end, because a barrier can decide dependence in > > the scheduler, but barriers can be generated from several different > scenarios. > > It's unsafe and prone to introducing bugs if any one of the scenarios > requiring > > a barrier is missed in the back-end. > > > > Between the middle-end and the back-end, we should have interfaces that are > easy > > for the back-end to implement. After all, the middle-end by itself doesn't > constitute a > > compiler, and vice versa. The back-end needs the middle-end's help to make > sure > > the back-end is easy to implement and to reduce the possibility of > introducing > > bugs. > > > > Without an explicit hook as I'm proposing, back-end implementers have > to > > know all barrier-generating scenarios very clearly, > because there > > are no code tips or comments in the middle end telling the back-end the > list of > > all scenarios that require barriers. > > > > Yes, the barrier is a perfect interface for the scheduler in theory. But from > > an engineering point of view, I think it's better to explicitly define > an > > interface to describe the stack red zone and inform the back-end, or vice > versa. Unlike > > computers, people easily make mistakes if you don't tell them. > On > > this bug, the fact is that over the years different back-ends have made similar > mistakes. > > > > GCC is really a perfect platform for building new ports, and I saw a > lot of > > new back-ends. The current middle end is unsafe if a port doesn't > support > > a stack red zone and its back-end doesn't generate a barrier for it. Why > can't we > > explicitly clarify this in compiler code between the middle end and the back > end? > > What if another back-end (new or old) NOT supporting a stack red zone > > exposes a similar bug again? 
> > There are a gazillion things you have to explicitly get right in your > backends, > so I don't see why exposing proper scheduling barriers should be > special, > and there, why red-zones should be special (as opposed to other > occasions > where you need to emit barriers from the backend for the scheduler). > Richard, This is because: 1) The current scheduler is unsafe if the back-end doesn't generate a barrier for a port that doesn't support a stack red zone at all. 2) Implementing the barrier in the back-end is a burden for every new back-end implementation for ports that don't support a stack red zone at all. 3) There are many other ports that haven't reported bugs around this; I doubt they are all free of the bug. 4) There are over 300 TARG
RE: [PATCH] Fix stack red zone bug (PR38644)
Just realized ChangeLog needs to be changed as well. ChangeLog: * config/i386/i386.c (ix86_using_red_zone): Remove inline. (TARGET_STACK_USING_RED_ZONE): New. * config/rs6000/rs6000.c (rs6000_stack_using_red_zone): New. (TARGET_STACK_USING_RED_ZONE): New. (offset_below_red_zone_p): Change to use new hook TARGET_STACK_USING_RED_ZONE. * doc/tm.texi (TARGET_STACK_USING_RED_ZONE): New. * doc/tm.texi.in (TARGET_STACK_USING_RED_ZONE): New. * sched-deps.c (sched_analyze_1): If the stack pointer is being modified and stack red zone is not supported for ports, flush out all memory references as they may become invalid if moved across the stack adjustment. * target.def (stack_using_red_zone): New. * testsuite/gcc.target/arm/stack-red-zone.c: New. > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Jiangning Liu > Sent: Wednesday, September 28, 2011 2:24 PM > To: 'Uros Bizjak' > Cc: gcc-patches@gcc.gnu.org; j...@suse.cz; geo...@geoffk.org; > dje@gmail.com; r...@redhat.com; Richard Earnshaw; Matthew Gretton- > Dann > Subject: RE: [PATCH] Fix stack red zone bug (PR38644) > > > > -static inline bool > > > +extern bool > > > > static bool > > > > > ix86_using_red_zone (void) > > > { > > > return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; > > > @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) > > > #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail > > > #endif > > > > > ... > > > > > > > +/* Return true if the ABI allows red zone access. */ > > > +extern bool > > > > static bool > > > > > +rs6000_stack_using_red_zone (void) > > > +{ > > > + return (DEFAULT_ABI != ABI_V4); > > > +} > > > + > > > > Uros, > > Thanks for your review. Accept and new patch is as below. No change for > ChangeLog. > > Thanks, > -Jiangning > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index ff8c49f..e200974 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -2631,7 +2631,7 @@ static const char *const > cpu_names[TARGET_CPU_DEFAULT_max] = > > > /* Return true if a red-zone is in use. */ > > -static inline bool > +static bool > ix86_using_red_zone (void) > { >return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; > @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) > #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail > #endif > > +#undef TARGET_STACK_USING_RED_ZONE > +#define TARGET_STACK_USING_RED_ZONE ix86_using_red_zone > + > #undef TARGET_FUNCTION_VALUE > #define TARGET_FUNCTION_VALUE ix86_function_value > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index 1ab57e5..1e64d14 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -1537,6 +1537,9 @@ static const struct attribute_spec > rs6000_attribute_table[] = > #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail > #endif > > +#undef TARGET_STACK_USING_RED_ZONE > +#define TARGET_STACK_USING_RED_ZONE rs6000_stack_using_red_zone > + > /* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors > The PowerPC architecture requires only weak consistency among > processors--that is, memory accesses between processors need not be > @@ -20660,6 +20663,13 @@ rs6000_restore_saved_cr (rtx reg, int > using_mfcr_multiple) > } > } > > +/* Return true if the ABI allows red zone access. */ > +static bool > +rs6000_stack_using_red_zone (void) > +{ > + return (DEFAULT_ABI != ABI_V4); > +} > + > /* Return true if OFFSET from stack pointer can be clobbered by > signals. 
> V.4 doesn't have any stack cushion, AIX ABIs have 220 or 288 bytes > below stack pointer not cloberred by signals. */ > @@ -20667,7 +20677,7 @@ rs6000_restore_saved_cr (rtx reg, int > using_mfcr_multiple) > static inline bool > offset_below_red_zone_p (HOST_WIDE_INT offset) > { > - return offset < (DEFAULT_ABI == ABI_V4 > + return offset < (!TARGET_STACK_USING_RED_ZONE > ? 0 > : TARGET_32BIT ? -220 : -288); > } > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > index 335c1d1..c848806 100644 > --- a/gcc/doc/tm.texi > +++ b/gcc/doc/tm.texi > @@ -11332,6 +11332,22 @@ to the stack. Therefore, this hook should > return > true in general, but > false for naked functions. The default implementation always ret
RE: [PATCH] Fix stack red zone bug (PR38644)
> > -static inline bool > > +extern bool > > static bool > > > ix86_using_red_zone (void) > > { > > return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; > > @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) > > #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail > > #endif > > ... > > > > +/* Return true if the ABI allows red zone access. */ > > +extern bool > > static bool > > > +rs6000_stack_using_red_zone (void) > > +{ > > + return (DEFAULT_ABI != ABI_V4); > > +} > > + > Uros, Thanks for your review. Accept and new patch is as below. No change for ChangeLog. Thanks, -Jiangning diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index ff8c49f..e200974 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2631,7 +2631,7 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] = /* Return true if a red-zone is in use. */ -static inline bool +static bool ix86_using_red_zone (void) { return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE ix86_using_red_zone + #undef TARGET_FUNCTION_VALUE #define TARGET_FUNCTION_VALUE ix86_function_value diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 1ab57e5..1e64d14 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1537,6 +1537,9 @@ static const struct attribute_spec rs6000_attribute_table[] = #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE rs6000_stack_using_red_zone + /* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors The PowerPC architecture requires only weak consistency among processors--that is, memory accesses between processors need not be @@ -20660,6 +20663,13 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) } } +/* Return true if the ABI allows red zone access. */ +static bool +rs6000_stack_using_red_zone (void) +{ + return (DEFAULT_ABI != ABI_V4); +} + /* Return true if OFFSET from stack pointer can be clobbered by signals. V.4 doesn't have any stack cushion, AIX ABIs have 220 or 288 bytes below stack pointer not cloberred by signals. */ @@ -20667,7 +20677,7 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) static inline bool offset_below_red_zone_p (HOST_WIDE_INT offset) { - return offset < (DEFAULT_ABI == ABI_V4 + return offset < (!TARGET_STACK_USING_RED_ZONE ? 0 : TARGET_32BIT ? -220 : -288); } diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 335c1d1..c848806 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11332,6 +11332,22 @@ to the stack. Therefore, this hook should return true in general, but false for naked functions. The default implementation always returns true. @end deftypefn +@deftypefn {Target Hook} bool TARGET_STACK_USING_RED_ZONE (void) +This hook returns true if the target has a red zone (an area beyond the +current extent of the stack that cannot be modified by asynchronous events +on the processor). + +If this hook returns false then the compiler mid-end will not move an access +to memory in the stack frame past a stack adjustment insn. + +If this hook returns true then the compiler mid-end will assume that it is +safe to move an access to memory in the stack frame past a stack adjustment +insn. The target back-end must emit scheduling barrier insns when this is +unsafe. 
+ +The default is to return false which is safe and appropriate for most targets. +@end deftypefn + @deftypevr {Target Hook} {unsigned HOST_WIDE_INT} TARGET_CONST_ANCHOR On some architectures it can take multiple instructions to synthesize a constant. If there is another constant already in a register that diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 6783826..0fa9e10 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -11223,6 +11223,22 @@ to the stack. Therefore, this hook should return true in general, but false for naked functions. The default implementation always returns true. @end deftypefn +@hook TARGET_STACK_USING_RED_ZONE +This hook returns true if the target has a red zone (an area beyond the +current extent of the stack that cannot be modified by asynchronous events +on the processor). + +If this hook returns false then the compiler mid-end will not move an access +to memory in the stack frame past a stack adjustment insn. + +If this hook returns true then the compiler mid-end will assume that it is +safe to move an access to memory in the stack frame past a stack adjustment +insn. The target back-end must emit scheduling barrier insns when this is +unsafe. + +The default is to return false which is safe and appropriate for most targets. +@end deftypefn + @hook TARGET_CONST_ANCHOR On
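Note that the sched-deps.c hunk named in the ChangeLog is truncated out of every copy of the patch in this archive. A minimal sketch, assuming the stack_using_red_zone hook that the ChangeLog adds to target.def, of what the described sched_analyze_1 change could look like (flush_pending_lists is the existing sched-deps.c helper; the real patch's placement and details may differ):

  /* Sketch only: inside sched_analyze_1, at the point where DEST has
     been reduced to a REG.  If this insn modifies the stack pointer
     and the target has no red zone, make every pending memory
     reference depend on it, so that no frame access can be moved
     across the stack adjustment.  */
  if (dest == stack_pointer_rtx
      && !targetm.stack_using_red_zone ())
    flush_pending_lists (deps, insn, true, true);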
RE: [PATCH] Fix stack red zone bug (PR38644)
> -Original Message- > From: Richard Guenther [mailto:richard.guent...@gmail.com] > Sent: Tuesday, September 27, 2011 3:41 PM > To: Jiangning Liu > Cc: Andrew Pinski; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > > On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu > wrote: > >> Think of it this way. What the IR says is there is no barrier > between > >> those moves. You either have an implicit barrier (which is what you > >> are proposing) or you have it explicitly. I think we all rather > have > >> more things explicit rather than implicit in the IR. And that has > >> been the overall feeling for a few years now. > >> > > > > Sorry, I'm afraid I can't agree with you. Instead, I think using a > barrier to describe this kind of dependence is a kind of implicit > method. Please note that this is not a usual data dependence issue. > The stack pointer change doesn't have any dependence on memory access > at all. > > It is similar to atomic instructions that require being an > optimization/memory barrier. Sure it is not a usual data dependence > (otherwise we would handle > it already), so the targets have to explicitly express the dependence > somehow, for which we only have barriers right now. > Richard, Thanks for your explanation. It's explicit to the back-end, while it's implicit to the scheduler in the middle end, because a barrier can decide dependence in the scheduler, but barriers can be generated from several different scenarios. It's unsafe and prone to introducing bugs if any one of the scenarios requiring a barrier is missed in the back-end. Between the middle-end and the back-end, we should have interfaces that are easy for the back-end to implement. After all, the middle-end by itself doesn't constitute a compiler, and vice versa. The back-end needs the middle-end's help to make sure the back-end is easy to implement and to reduce the possibility of introducing bugs. Without an explicit hook as I'm proposing, back-end implementers have to know all barrier-generating scenarios very clearly, because there are no code tips or comments in the middle end telling the back-end the list of all scenarios that require barriers. Yes, the barrier is a perfect interface for the scheduler in theory. But from an engineering point of view, I think it's better to explicitly define an interface to describe the stack red zone and inform the back-end, or vice versa. Unlike computers, people easily make mistakes if you don't tell them. On this bug, the fact is that over the years different back-ends have made similar mistakes. GCC is really a perfect platform for building new ports, and I saw a lot of new back-ends. The current middle end is unsafe if a port doesn't support a stack red zone and its back-end doesn't generate a barrier for it. Why can't we explicitly clarify this in compiler code between the middle end and the back end? What if another back-end (new or old) NOT supporting a stack red zone exposes a similar bug again? Thanks, -Jiangning > Richard. > > No matter what the solution itself is, the problem itself is quite a > common one at the ISA level, so we should solve it in the middle-end, because > the middle-end is shared by all ports. > > > > My proposal avoids problems in the future. New ports and new back-end > implementations needn't explicitly define this hook, which is very safe. > But if one port wants to "unsafely" introduce a red zone, we can > explicitly define this hook in the back-end. This way, we would avoid the > bug at the earliest time. Do you really want to hide this problem in > the back-end silently? And wait for others to report bugs and then waste time > getting them fixed again? > > The fact I see is that over the years different ports have reported > similar issues around this, and somebody tried to fix them. Actually, > this kind of problem should be fixed at the design level. If the first > people to solve this bug had given a solution in the middle end, we would not > need to waste time filing those bugs in Bugzilla and fixing them > one by one. > > > > Thanks, > > -Jiangning
RE: [PATCH] Fix stack red zone bug (PR38644)
> Think of it this way. What the IR says is there is no barrier between > those moves. You either have an implicit barrier (which is what you > are proposing) or you have it explicitly. I think we all rather have > more things explicit rather than implicit in the IR. And that has > been the overall feeling for a few years now. > Sorry, I'm afraid I can't agree with you. Instead, I think using a barrier to describe this kind of dependence is a kind of implicit method. Please note that this is not a usual data dependence issue. The stack pointer change doesn't have any dependence on memory access at all. No matter what the solution itself is, the problem itself is quite a common one at the ISA level, so we should solve it in the middle-end, because the middle-end is shared by all ports. My proposal avoids problems in the future. New ports and new back-end implementations needn't explicitly define this hook, which is very safe. But if one port wants to "unsafely" introduce a red zone, we can explicitly define this hook in the back-end. This way, we would avoid the bug at the earliest time. Do you really want to hide this problem in the back-end silently? And wait for others to report bugs and then waste time getting them fixed again? The fact I see is that over the years different ports have reported similar issues around this, and somebody tried to fix them. Actually, this kind of problem should be fixed at the design level. If the first people to solve this bug had given a solution in the middle end, we would not need to waste time filing those bugs in Bugzilla and fixing them one by one. Thanks, -Jiangning
RE: [PATCH] Fix stack red zone bug (PR38644)
Fix a typo and CC x86/rs6000/arm ports maintainers. ChangeLog: * config/i386/i386.c (ix86_stack_using_red_zone): Change inline to be extern. (TARGET_STACK_USING_RED_ZONE): New. * config/rs6000/rs6000.c (rs6000_stack_using_red_zone): New. (TARGET_STACK_USING_RED_ZONE): New. (offset_below_red_zone_p): Change to use new hook TARGET_STACK_USING_RED_ZONE. * doc/tm.texi (TARGET_STACK_USING_RED_ZONE): New. * doc/tm.texi.in (TARGET_STACK_USING_RED_ZONE): New. * sched-deps.c (sched_analyze_1): If the stack pointer is being modified and stack red zone is not supported for ports, flush out all memory references as they may become invalid if moved across the stack adjustment. * target.def (stack_using_red_zone): New. * testsuite/gcc.target/arm/stack-red-zone.c: New. Thanks, -Jiangning diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index ff8c49f..1c16391 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2631,7 +2631,7 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] = /* Return true if a red-zone is in use. */ -static inline bool +extern bool ix86_using_red_zone (void) { return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE ix86_using_red_zone + #undef TARGET_FUNCTION_VALUE #define TARGET_FUNCTION_VALUE ix86_function_value diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 1ab57e5..411cb09 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1537,6 +1537,9 @@ static const struct attribute_spec rs6000_attribute_table[] = #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE rs6000_stack_using_red_zone + /* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors The PowerPC architecture requires only weak consistency among processors--that is, memory accesses between processors need not be @@ -20660,6 +20663,13 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) } } +/* Return true if the ABI allows red zone access. */ +extern bool +rs6000_stack_using_red_zone (void) +{ + return (DEFAULT_ABI != ABI_V4); +} + /* Return true if OFFSET from stack pointer can be clobbered by signals. V.4 doesn't have any stack cushion, AIX ABIs have 220 or 288 bytes below stack pointer not cloberred by signals. */ @@ -20667,7 +20677,7 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) static inline bool offset_below_red_zone_p (HOST_WIDE_INT offset) { - return offset < (DEFAULT_ABI == ABI_V4 + return offset < (!TARGET_STACK_USING_RED_ZONE ? 0 : TARGET_32BIT ? -220 : -288); } diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 335c1d1..c848806 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11332,6 +11332,22 @@ to the stack. Therefore, this hook should return true in general, but false for naked functions. The default implementation always returns true. @end deftypefn +@deftypefn {Target Hook} bool TARGET_STACK_USING_RED_ZONE (void) +This hook returns true if the target has a red zone (an area beyond the +current extent of the stack that cannot be modified by asynchronous events +on the processor). + +If this hook returns false then the compiler mid-end will not move an access +to memory in the stack frame past a stack adjustment insn. 
+ +If this hook returns true then the compiler mid-end will assume that it is +safe to move an access to memory in the stack frame past a stack adjustment +insn. The target back-end must emit scheduling barrier insns when this is +unsafe. + +The default is to return false which is safe and appropriate for most targets. +@end deftypefn + @deftypevr {Target Hook} {unsigned HOST_WIDE_INT} TARGET_CONST_ANCHOR On some architectures it can take multiple instructions to synthesize a constant. If there is another constant already in a register that diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 6783826..0fa9e10 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -11223,6 +11223,22 @@ to the stack. Therefore, this hook should return true in general, but false for naked functions. The default implementation always returns true. @end deftypefn +@hook TARGET_STACK_USING_RED_ZONE +This hook returns true if the target has a red zone (an area beyond the +current extent of the stack that cannot be modified by asynchronous events +on the processor). + +If this hook returns false then the compiler mid-end will not move an access +to memory in the stack frame past a stack adjustment insn. + +If this hook returns true then the compiler mid-end will assume that
RE: [PATCH] Fix stack red zone bug (PR38644)
> -Original Message- > From: Andrew Pinski [mailto:pins...@gmail.com] > Sent: Tuesday, September 27, 2011 5:31 AM > To: Jiangning Liu > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Fix stack red zone bug (PR38644) > > On Mon, Sep 26, 2011 at 3:26 AM, Jiangning Liu > wrote: > > This patch fixes PR38644, a 3-year-old bug. > > > > From the discussions on the mailing list and in Bugzilla, I think the middle > end fix > > is the common view. Although there are still some gaps on how to fix > it in > > the middle end, I think this patch at least moves the problem from the back- > end to > > the middle-end, which makes the GCC infrastructure stronger than before. > Otherwise, > > any back-end needs to "find" and fix this problem by itself. > > I don't see why you think that is the common view that the fix should > be in the middle-end. I and a few others think the back-end should be > where the barrier is emitted from. The middle-end should not have stack > accesses as being special in any way when it comes down to scheduling. > What happens when a new scheduler comes around? Then this has to be > fixed again in that new scheduler rather than having the barrier doing > the correct thing for all schedulers. > Hi Andrew, Thanks for your kind response! Sorry, but for this bug I didn't previously see your comments in either Bugzilla or the [RFC] I sent out a long time ago on the gcc mailing list. What I saw was a bunch of people agreeing that this problem should be fixed in the middle end. For example, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644#c48 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644#c49 My comments, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644#c35 I'd like to repeat my points here. As I mentioned, it shouldn't be the back-end's responsibility to "find" this problem, i.e. instructions moving across a stack pointer change. ISAs can be split into two classes: one class doesn't support a stack red zone, and the other class does. If we agree this is a general issue, I think this issue should be solved in the middle end rather than in the back-end. Yes, the back-end can provide a barrier to explicitly represent the dependence, but I think this is a kind of workaround, because this way we are hoping the back-end will detect this kind of dependence, while the dependence is common to every back-end that doesn't support a stack red zone. If we carefully analyze the implementations in the x86 and powerpc back-ends supporting a stack red zone, we may find they are doing almost the same things. In particular, I think retarget-ability is the most valuable treasure for GCC. Thinking of implementing a new port, do we want to write new code to "find" this problem, no matter whether the new port supports a stack red zone or not? If the answer is YES, it means we need to write code similar to what the x86-64 and powerpc back-ends are currently doing. Obviously, that is non-trivial work. I don't think that is good for GCC's retarget-ability. Why don't we abstract the common part in the middle-end with a very clean interface? Finally, it's OK with me if you say the scheduler can only treat dependence as a "clean" interface. But this point doesn't imply that the stack red zone issue must be handled in each back-end separately. If we want to keep the scheduler clean enough, a simpler solution is to add a pass in the middle-end that generates barriers before the scheduler; this pass would handle the general stack-red-zone issue using the interface I'm defining in the patch, but obviously this is a kind of over-design so far. 
Thanks, -Jiangning > Thanks, > Andrew Pinski
[PATCH] Fix stack red zone bug (PR38644)
This patch fixes PR38644, a 3-year-old bug. From the discussions on the mailing list and in Bugzilla, I think the middle end fix is the common view. Although there are still some gaps on how to fix it in the middle end, I think this patch at least moves the problem from the back-end to the middle-end, which makes the GCC infrastructure stronger than before. Otherwise, any back-end needs to "find" and fix this problem by itself. Since this bug has been pinged many times by customers, I think at least we can move on with this patch. If necessary, we can then follow up to build an even better solution in the middle end later on. The patch is tested on x86 and ARM. ChangeLog: * config/i386/i386.c (ix86_stack_using_red_zone): Change inline to be extern. (TARGET_STACK_USING_RED_ZONE): New. * config/rs6000/rs6000.c (rs6000_stack_using_red_zone): New. (TARGET_STACK_USING_RED_ZONE): New. (offset_below_red_zone_p): Change to use new hook TARGET_STACK_USING_RED_ZONE. * doc/tm.texi (TARGET_STACK_USING_RED_ZONE): New. * doc/tm.texi.in (TARGET_STACK_USING_RED_ZONE): New. * sched-deps.c (sched_analyze_1): If the stack pointer is being modified and stack red zone is not supported for ports, flush out all memory references as they may become invalid if moved across the stack adjustment. * target.def (stack_using_red_zone): New. * testsuite/gcc.target/arm/stack-red-zone.c: New. Thanks, -Jiangning diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index ff8c49f..ce486da 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2631,8 +2631,8 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] = /* Return true if a red-zone is in use. */ -static inline bool -ix86_using_red_zone (void) +extern bool +ix86_stack_using_red_zone (void) { return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; } @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE ix86_stack_using_red_zone + #undef TARGET_FUNCTION_VALUE #define TARGET_FUNCTION_VALUE ix86_function_value diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 1ab57e5..411cb09 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1537,6 +1537,9 @@ static const struct attribute_spec rs6000_attribute_table[] = #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE rs6000_stack_using_red_zone + /* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors The PowerPC architecture requires only weak consistency among processors--that is, memory accesses between processors need not be @@ -20660,6 +20663,13 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) } } +/* Return true if the ABI allows red zone access. */ +extern bool +rs6000_stack_using_red_zone (void) +{ + return (DEFAULT_ABI != ABI_V4); +} + /* Return true if OFFSET from stack pointer can be clobbered by signals. V.4 doesn't have any stack cushion, AIX ABIs have 220 or 288 bytes below stack pointer not cloberred by signals. */ @@ -20667,7 +20677,7 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) static inline bool offset_below_red_zone_p (HOST_WIDE_INT offset) { - return offset < (DEFAULT_ABI == ABI_V4 + return offset < (!TARGET_STACK_USING_RED_ZONE ? 0 : TARGET_32BIT ? 
-220 : -288); } diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 335c1d1..c848806 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11332,6 +11332,22 @@ to the stack. Therefore, this hook should return true in general, but false for naked functions. The default implementation always returns true. @end deftypefn +@deftypefn {Target Hook} bool TARGET_STACK_USING_RED_ZONE (void) +This hook returns true if the target has a red zone (an area beyond the +current extent of the stack that cannot be modified by asynchronous events +on the processor). + +If this hook returns false then the compiler mid-end will not move an access +to memory in the stack frame past a stack adjustment insn. + +If this hook returns true then the compiler mid-end will assume that it is +safe to move an access to memory in the stack frame past a stack adjustment +insn. The target back-end must emit scheduling barrier insns when this is +unsafe. + +The default is to return false which is safe and appropriate for most targets. +@end deftypefn + @deftypevr {Target Hook} {unsigned HOST_WIDE_INT} TARGET_CONST_ANCHOR On some architectures it can take multiple instructions to synthesize a constant. If there is another constant already in a register that diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 6783826..0fa9e10 100644 --- a/gcc/doc/tm.texi.in +++ b
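The new gcc.target/arm/stack-red-zone.c test named in the ChangeLog is likewise not visible in the archive. A hypothetical sketch of the kind of test it describes -- ARM has no red zone, so a frame access must not be scheduled below the epilogue's stack adjustment -- might look like this; the actual test and its final scan directive may differ:

  /* Hypothetical reconstruction; not the committed testcase.  */
  /* { dg-do compile } */
  /* { dg-options "-O2" } */

  extern int consume (char *);

  int
  f (void)
  {
    char buf[8];
    return consume (buf);
  }

  /* A dg-final scan-assembler directive would then verify that no load
     from the frame appears after the stack-pointer adjustment.  */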
RE: [PATCH, testsuite] Add loop unrolling command line options for some test cases
> -----Original Message-----
> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> Sent: Saturday, September 24, 2011 6:12 PM
> To: Jiangning Liu
> Cc: Mike Stump; gcc-patches@gcc.gnu.org; r...@cebitec.uni-bielefeld.de
> Subject: Re: [PATCH, testsuite] Add loop unrolling command line options
> for some test cases
>
> On Thu, Sep 22, 2011 at 6:22 AM, Jiangning Liu wrote:
> > Hi Mike,
> >
> > OK. I will wait 24 more hours. If no objections by then, I will get it
> > checked into trunk.
>
> I don't think you need -funroll-loops though.

The intention of those test cases is not hurt if -funroll-loops is added, right?

> > Thanks,
> > -Jiangning
> >
> >> -----Original Message-----
> >> From: Mike Stump [mailto:mikest...@comcast.net]
> >> Sent: Thursday, September 22, 2011 3:10 AM
> >> To: Jiangning Liu
> >> Cc: gcc-patches@gcc.gnu.org; r...@cebitec.uni-bielefeld.de
> >> Subject: Re: [PATCH, testsuite] Add loop unrolling command line
> >> options for some test cases
> >>
> >> On Sep 21, 2011, at 1:22 AM, Jiangning Liu wrote:
> >> > The fix is to explicitly turn on loop unroll and set max-unroll-times
> >> > to 8, which is larger than the unrolling times being detected in
> >> > the cases.
> >>
> >> Sounds reasonable to me. Ok, though, do watch for any comments by
> >> people that know more than I.
RE: [PATCH, testsuite] Add loop unrolling command line options for some test cases
Hi Mike,

OK. I will wait 24 more hours. If no objections by then, I will get it checked into trunk.

Thanks,
-Jiangning

> -----Original Message-----
> From: Mike Stump [mailto:mikest...@comcast.net]
> Sent: Thursday, September 22, 2011 3:10 AM
> To: Jiangning Liu
> Cc: gcc-patches@gcc.gnu.org; r...@cebitec.uni-bielefeld.de
> Subject: Re: [PATCH, testsuite] Add loop unrolling command line options
> for some test cases
>
> On Sep 21, 2011, at 1:22 AM, Jiangning Liu wrote:
> > The fix is to explicitly turn on loop unroll and set max-unroll-times
> > to 8, which is larger than the unrolling times being detected in the
> > cases.
>
> Sounds reasonable to me. Ok, though, do watch for any comments by
> people that know more than I.
[PATCH, testsuite] Add loop unrolling command line options for some test cases
Hi,

The following test cases check the predictive commoning optimization by detecting loop unrolling times. By default max-unroll-times is 8; if it is specified to be less than the unrolling times the tests expect, those cases fail. The fix is to explicitly turn on loop unrolling and set max-unroll-times to 8, which is larger than the unrolling times being detected in the cases. A sketch of the kind of loop involved follows the patch below.

ChangeLog:

2011-09-14  Jiangning Liu

	* gcc/testsuite/gcc.dg/tree-ssa/predcom-1.c: Explicitly turn on
	loop unroll and set max unroll times to 8.
	* gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c: Likewise.
	* gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c: Likewise.
	* gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c: Likewise.
	* gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c: Likewise.

Thanks,
-Jiangning

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-1.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-1.c
index 16bd5c9..f1e52e5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-do run } */
-/* { dg-options "-O2 -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details" } */

 void abort (void);

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
index 7275f28..27e53ee 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-do run } */
-/* { dg-options "-O2 -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details" } */

 void abort (void);

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c
index d500234..5dfe384 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details" } */

 int a[1000], b[1000];

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c
index 6f06b7f..c29a46a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-do run } */
-/* { dg-options "-O2 -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details" } */

 /* Test for predictive commoning of expressions, without reassociation.  */

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c
index 134fc37..29444ab 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-do run } */
-/* { dg-options "-O2 -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details" } */

 /* Test for predictive commoning of expressions, with reassociation.  */
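For background on why the unroll factor matters here, the loops these tests exercise carry stored values to later iterations. The following is an illustrative sketch, not the body of any predcom-*.c test above:

/* Illustrative sketch only.  The value stored to a[i + 2] is reloaded
   two iterations later as a[i].  Predictive commoning keeps such
   values in registers, and avoiding register copies on every
   iteration requires unrolling the loop by the reuse distance (2
   here) -- which is why the pcom dump these tests scan depends on
   max-unroll-times.  */
void
f (int *a, int *b, int n)
{
  int i;
  for (i = 0; i < n; i++)
    a[i + 2] = a[i] + b[i];
}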
[PATCH, testsuite, ARM] Change to expected failure for g++.dg/abi/local1.C on ARM target
Here is a patch changing the case g++.dg/abi/local1.C to an expected failure for the ARM target. In the "C++ ABI for the ARM architecture", it says:

  This runs contrary to §2.9.1 of [GC++ABI] which states:

    It is intended that two type_info pointers point to equivalent type
    descriptions if and only if the pointers are equal. An implementation
    must satisfy this constraint, e.g. by using symbol preemption, COMDAT
    sections, or other mechanisms.

  Fortunately, we can ignore this requirement without violating the C++
  standard provided that:

  * type_info::operator== and type_info::operator!= compare the strings
    returned by type_info::name(), not just the pointers to the RTTI
    objects and their names.
  * No reliance is placed on the address returned by type_info::name().
    (That is, t1.name() != t2.name() does not imply that t1 != t2).

The patch is:

diff --git a/gcc/testsuite/g++.dg/abi/local1.C b/gcc/testsuite/g++.dg/abi/local1.C
index 518193c..7f08a8f 100644
--- a/gcc/testsuite/g++.dg/abi/local1.C
+++ b/gcc/testsuite/g++.dg/abi/local1.C
@@ -1,4 +1,4 @@
-// { dg-do run }
+// { dg-do run { xfail { arm*-*-eabi* } } }
 // { dg-additional-sources "local1-a.cc" }

 #include

ChangeLog:

2011-09-14  Jiangning Liu

	* g++.dg/abi/local1.C: Change to XFAIL for ARM EABI target.

Thanks,
-Jiangning
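To see what the quoted ABI wording permits, consider this sketch; it is hypothetical code illustrating the rule, not part of local1.C:

// Under the ARM EABI, two type_info objects for the same type may live
// at different addresses (RTTI objects are not merged across shared
// objects), so an equality test must be able to fall back to comparing
// the name strings rather than only the pointers.
#include <typeinfo>
#include <cstring>

static bool
same_type (const std::type_info &a, const std::type_info &b)
{
  // Pointer equality is sufficient, but under the ARM EABI it is not
  // necessary; the name strings are the authoritative comparison.
  return &a == &b || std::strcmp (a.name (), b.name ()) == 0;
}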
RE: [PATCH, testsuite, ARM] change XFAIL to pass for ARM on a case testing tree-ssa-dom
PING...

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Jiangning Liu
> Sent: Friday, August 26, 2011 5:56 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH, testsuite, ARM] change XFAIL to pass for ARM on a case
> testing tree-ssa-dom
>
> Test case gcc.dg/tree-ssa/20040204-1.c can pass for -O1 after Richard
> Guenther fixed something in tree-ssa-dom. The "link_error" should be
> optimized away for ARM targets as well.
>
> The patch is:
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
> b/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
> index 45e44a1..470b585 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
> @@ -33,5 +33,5 @@ void test55 (int x, int y)
>     that the && should be emitted (based on BRANCH_COST).  Fix this
>     by teaching dom to look through && and register all components
>     as true.  */
> -/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-*" } } } } */
> +/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* arm*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-*" } } } } */
>  /* { dg-final { cleanup-tree-dump "optimized" } } */
>
> gcc/testsuite/ChangeLog:
>
> 2011-08-26  Jiangning Liu
>
>	PR tree-optimization/46021
>	* gcc.dg/tree-ssa/20040204-1.c: Don't XFAIL on arm*-*-*.
>
> Thanks,
> -Jiangning
[PATCH, testsuite, ARM] change XFAIL to pass for ARM on a case testing tree-ssa-dom
Test case gcc.dg/tree-ssa/20040204-1.c can pass for -O1 after Richard Guenther fixed something in tree-ssa-dom. The "link_error" should be optimized away for ARM targets as well.

The patch is:

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c b/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
index 45e44a1..470b585 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
@@ -33,5 +33,5 @@ void test55 (int x, int y)
    that the && should be emitted (based on BRANCH_COST).  Fix this
    by teaching dom to look through && and register all components
    as true.  */
-/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-*" } } } } */
+/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* arm*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-*" } } } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */

gcc/testsuite/ChangeLog:

2011-08-26  Jiangning Liu

	PR tree-optimization/46021
	* gcc.dg/tree-ssa/20040204-1.c: Don't XFAIL on arm*-*-*.

Thanks,
-Jiangning
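For readers unfamiliar with the idiom, the scan above relies on link_error never being emitted. The following is a schematic of that idiom, not the actual body of 20040204-1.c:

/* Schematic of the link_error idiom: link_error is declared but never
   defined, so the test only links (and the "optimized" dump only scans
   clean) if the optimizer proves every call unreachable and deletes
   it.  */
extern void link_error (void);

void
test (int x, int y)
{
  if (x == 0 && y == 0)
    {
      /* DOM should register x == 0 as known here and fold this away.  */
      if (x != 0)
        link_error ();
    }
}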
RE: [PATCH, ARM] Generate conditional compares in Thumb2 state
Attached is the new patch file. Please review!

ChangeLog:

2011-08-18  Jiangning Liu

	* config/arm/arm.md (*ior_scc_scc): Enable for Thumb2 as well.
	(*ior_scc_scc_cmp): Likewise.
	(*and_scc_scc): Likewise.
	(*and_scc_scc_cmp): Likewise.
	(*and_scc_scc_nodom): Likewise.
	(*cmp_ite0, *cmp_ite1, *cmp_and, *cmp_ior): Handle Thumb2.

The testsuite ChangeLog would be:

2011-08-18  Jiangning Liu

	* gcc.target/arm/thumb2-cond-cmp-1.c: New. Make sure conditional
	compare can be generated.
	* gcc.target/arm/thumb2-cond-cmp-2.c: Likewise.
	* gcc.target/arm/thumb2-cond-cmp-3.c: Likewise.
	* gcc.target/arm/thumb2-cond-cmp-4.c: Likewise.

Regression tests against the cortex-M0/M3/M4 profiles with the "-mthumb" option don't show any new failures.

Thanks,
-Jiangning

fix_cond_cmp_5.patch
Description: Binary data
RE: [PATCH, ARM] Generate conditional compares in Thumb2 state
Ramana,

I only see the following three combinations being meaningful:

* cmp l, lPy  // keep cmp, and length is 2, on thumb
* cmp r, rI   // keep cmp, and length is 4
* cmp r, L    // convert to cmn, and length is 4

According to the ARM ARM, for a negative immediate all encodings of cmp/cmn are 4 bytes long, i.e.

* CMP: encodings T2 and A1
* CMN: encodings T1 and A1

so we needn't distinguish Pw and Pv.

> Length is 8 bytes if you have Pw and L

For this case, the length should be 10 for Thumb2 instead.

Finally, if we want to describe all possibilities in constraints, we would have the following 9 combinations:

* cmp1 has operands (l,  l,  l,  r, r, r, r, r, r)
                    (lPy,lPy,lPy,rI,rI,rI,L, L, L)
* cmp2 has operands (l,  r,  r,  l, r, r, l, r, r)
                    (lPy,rI, L,  lPy,rI,L, lPy,rI,L)

So the length would be like below (I don't know how to write it in the attribute section yet):

if (TARGET_THUMB2)
  (set_attr "length" "6,8,8,8,10,10,8,10,10")
else
  (set_attr "length" "4,6,6,6,8,8,6,8,8")

Does it make sense?

Thanks,
-Jiangning

> -----Original Message-----
> From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org]
> Sent: Wednesday, August 10, 2011 6:40 PM
> To: Jiangning Liu
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH, ARM] Generate conditional compares in Thumb2 state
>
> On 10 August 2011 09:20, Jiangning Liu wrote:
> > PING...
> >
> > BTW, in patch fix_cond_cmp_2.patch, the file mode of thumb2.md is
> > carelessly changed, so please check attached new patch file
> > fix_cond_cmp_3.patch.
>
> Please do not top post.
>
> I've been away for the whole of last week and then been away from my
> desk for the first 2 days this week and am still catching up with email.
>
> I'm missing a Changelog entry for this. Do I assume it is the same?
>
> I've just noticed that the length attribute isn't correct for Thumb2
> state. When I last looked at this I must have missed the case where the
> cmp and the cmn are both 32 bit instructions.
>
> The length can vary between 6 and 10 bytes for the Thumb2 variant from
> my reading of the ARM-ARM.
>
> i.e. cmp reg, <8bitconstant>
>      it
>      cmn reg, <8bitconstant>
>
> Length = 6 bytes
>
> or even with
>      cmp reg, reg
>      it
>      cmn reg, reg
>
> All registers between r0 and r7.
>
> Length is 6 bytes if you have this for both l constraints.
> Length is 6 bytes - you should catch this with Pw and Pv respectively.
> Length is 8 bytes if you have Pw and L
> Length is 8 bytes if you have I and Pv
> Length is 10 bytes if you have I and L.
>
> Showing you an example with l and Py at this point of time and leaving
> the rest as homework. Yes it will duplicate a few cases but it helps
> getting the lengths absolutely right.
>
> (define_insn "*cmp_ite0"
>   [(set (match_operand 6 "dominant_cc_register" "")
>         (compare
>          (if_then_else:SI
>           (match_operator 4 "arm_comparison_operator"
>            [(match_operand:SI 0 "s_register_operand" "l,r,r,r,r")
>             (match_operand:SI 1 "arm_add_operand" "lPy,rI,L,rI,L")])
>           (match_operator:SI 5 "arm_comparison_operator"
>            [(match_operand:SI 2 "s_register_operand" "l,r,r,r,r")
>             (match_operand:SI 3 "arm_add_operand" "lPy,rI,rI,L,L")])
>           (const_int 0))
>          (const_int 0)))]
>   "TARGET_32BIT"
>   "*
>   {
>     static const char * const cmp1[5][2] =
>     {
>       {\"cmp\\t%2, %3\",
>        \"cmp\\t%0, %1\"},
>       {\"cmp\\t%2, %3\",
>        \"cmp\\t%0, %1\"},
>       {\"cmp\\t%2, %3\",
>        \"cmn\\t%0, #%n1\"},
>       {\"cmn\\t%2, #%n3\",
>        \"cmp\\t%0, %1\"},
>       {\"cmn\\t%2, #%n3\",
>        \"cmn\\t%0, #%n1\"}
>     };
>     static const char * const cmp2[5][2] =
>     {
>       {\"cmp%d5\\t%0, %1\",
>        \"cmp%d4\\t%2, %3\"},
>       {\"cmp%d5\\t%0, %1\",
>        \"cmp%d4\\t%2, %3\"},
>       {\"cmn%d5\\t%0, #%n1\",
>        \"cmp%d4\\t%2, %3\"},
>       {\"cmp%d5\\t%0, %1\",
>        \"cmn%d4\\t%2, #%n3\"},
>       {\"cmn%d5\\t%0, #%n1\",
>        \"cmn%d4\\t%2, #%n3\"}
>     };
>     static const char * const ite[2] =
>     {
>       \"it\\t%d5\",
>       \"it\\t%d4\"
>     };
>     int swap =
>     comparison_do
RE: [PATCH, ARM] Generate conditional compares in Thumb2 state
PING...

BTW, in patch fix_cond_cmp_2.patch, the file mode of thumb2.md was carelessly changed, so please check the attached new patch file fix_cond_cmp_3.patch.

Thanks,
-Jiangning

> -----Original Message-----
> From: Jiangning Liu [mailto:jiangning@arm.com]
> Sent: Monday, August 08, 2011 2:01 PM
> To: 'Ramana Radhakrishnan'
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH, ARM] Generate conditional compares in Thumb2 state
>
> In the attached new patch, arm_arch_thumb2 is changed to TARGET_THUMB2.
> I tried that; with my patch the command line option -mcpu=armv7-a9
> doesn't generate IT instructions any longer, unless the option
> "-mthumb" is added.
>
> All of my tests assume the command line option -mthumb, while
> cortex-M0, cortex-M3 and cortex-M4 are covered by the options
> -mcpu=cortex-m0, -march=armv7-m, and -march=armv7e-m respectively.
>
> As for the extra problem exposed by this specific case, may we treat it
> as a separate fix to decouple it from this one, and I can follow up
> later on? I think it is a general problem, not only for the particular
> pattern it/op/it/op. But I'm not sure how far we can go to optimize
> this kind of problem introduced by IT blocks. For this specific case, I
> see "if conversion" already generates a conditional move before the
> combine pass. So basically peephole rules may probably work for most of
> the general scenarios. My initial thought is to go over the rules
> introducing IT blocks and try to figure out all of the combinations
> where two of these kinds of rules can be in sequential order. Maybe we
> only need to handle the most common case like this one. Since I do see
> a bunch of rules that have something to do with this problem, I'd like
> to look into all of them to give a most reasonable solution in a
> separate fix.
>
> Does it make sense?
>
> Thanks,
> -Jiangning
>
> > -----Original Message-----
> > From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org]
> > Sent: Friday, August 05, 2011 9:20 AM
> > To: Jiangning Liu
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH, ARM] Generate conditional compares in Thumb2
> > state
> >
> > On 3 August 2011 08:48, Jiangning Liu wrote:
> > > This patch is to generate more conditional compare instructions in
> > > Thumb2 state. Given an example like below,
> > >
> > > int f(int i, int j)
> > > {
> > >   if ( (i == '+') || (j == '-') ) {
> > >     return i;
> > >   } else {
> > >     return j;
> > >   }
> > > }
> > >
> > > Without the patch, the compiler generates the following code:
> > >
> > >   sub   r2, r0, #43
> > >   rsbs  r3, r2, #0
> > >   adc   r3, r3, r2
> > >   cmp   r1, #45
> > >   it    eq
> > >   orreq r3, r3, #1
> > >   cmp   r3, #0
> > >   it    eq
> > >   moveq r0, r1
> > >   bx    lr
> > >
> > > With the patch, the compiler can generate a conditional compare
> > > like below:
> > >
> > >   cmp   r0, #43
> > >   it    ne
> > >   cmpne r1, #45
> > >   it    ne
> > >   movne r0, r1
> > >   bx    lr
> >
> > Nice improvement but there could be a single it block to handle both
> > and thus you could make this even better with
> >
> >   cmp   r0, #43
> >   itt   ne
> >   cmpne r1, #45
> >   movne r0, r1
> >
> > The way to do this would be to try and split this post-reload
> > unfortunately into the cmp instruction and the conditional compare
> > with the appropriate instruction length - Then the backend has a
> > chance of merging some of this into a single instruction.
> > Unfortunately that won't be very straight-forward but that's a
> > direction we probably ought to proceed with in this case.
> >
> > In a number of places:
> >
> > > + if (arm_arch_thumb2)
> >
> > Ah instead of this please use if (TARGET_THUMB2) - arm_arch_thumb2 is
> > true based on the architecture levels and not necessarily if the user
> > wants to generate Thumb code. I don't want an unnecessary IT
> > instruction being emitted in the ASM block in ARM state for v7-a and
> > above.
> >
> > > Tested against arm-none-eabi target and no regression found.
> >
> > Presumably for ARM and Thumb2 state ?
> >
> > cheers
> > Ramana

fix_cond_cmp_3.patch
Description: Binary data
RE: [PATCH, ARM] Generate conditional compares in Thumb2 state
In the attached new patch, arm_arch_thumb2 is changed to TARGET_THUMB2. I tried that; with my patch the command line option -mcpu=armv7-a9 doesn't generate IT instructions any longer, unless the option "-mthumb" is added.

All of my tests assume the command line option -mthumb, while cortex-M0, cortex-M3 and cortex-M4 are covered by the options -mcpu=cortex-m0, -march=armv7-m, and -march=armv7e-m respectively.

As for the extra problem exposed by this specific case, may we treat it as a separate fix to decouple it from this one, and I can follow up later on? I think it is a general problem, not only for the particular pattern it/op/it/op. But I'm not sure how far we can go to optimize this kind of problem introduced by IT blocks. For this specific case, I see "if conversion" already generates a conditional move before the combine pass. So basically peephole rules may probably work for most of the general scenarios. My initial thought is to go over the rules introducing IT blocks and try to figure out all of the combinations where two of these kinds of rules can be in sequential order. Maybe we only need to handle the most common case like this one. Since I do see a bunch of rules that have something to do with this problem, I'd like to look into all of them to give a most reasonable solution in a separate fix.

Does it make sense?

Thanks,
-Jiangning

> -----Original Message-----
> From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org]
> Sent: Friday, August 05, 2011 9:20 AM
> To: Jiangning Liu
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH, ARM] Generate conditional compares in Thumb2 state
>
> On 3 August 2011 08:48, Jiangning Liu wrote:
> > This patch is to generate more conditional compare instructions in
> > Thumb2 state. Given an example like below,
> >
> > int f(int i, int j)
> > {
> >   if ( (i == '+') || (j == '-') ) {
> >     return i;
> >   } else {
> >     return j;
> >   }
> > }
> >
> > Without the patch, the compiler generates the following code:
> >
> >   sub   r2, r0, #43
> >   rsbs  r3, r2, #0
> >   adc   r3, r3, r2
> >   cmp   r1, #45
> >   it    eq
> >   orreq r3, r3, #1
> >   cmp   r3, #0
> >   it    eq
> >   moveq r0, r1
> >   bx    lr
> >
> > With the patch, the compiler can generate a conditional compare like
> > below:
> >
> >   cmp   r0, #43
> >   it    ne
> >   cmpne r1, #45
> >   it    ne
> >   movne r0, r1
> >   bx    lr
>
> Nice improvement but there could be a single it block to handle both
> and thus you could make this even better with
>
>   cmp   r0, #43
>   itt   ne
>   cmpne r1, #45
>   movne r0, r1
>
> The way to do this would be to try and split this post-reload
> unfortunately into the cmp instruction and the conditional compare
> with the appropriate instruction length - Then the backend has a
> chance of merging some of this into a single instruction.
> Unfortunately that won't be very straight-forward but that's a
> direction we probably ought to proceed with in this case.
>
> In a number of places:
>
> > + if (arm_arch_thumb2)
>
> Ah instead of this please use if (TARGET_THUMB2) - arm_arch_thumb2 is
> true based on the architecture levels and not necessarily if the user
> wants to generate Thumb code. I don't want an unnecessary IT
> instruction being emitted in the ASM block in ARM state for v7-a and
> above.
>
> > Tested against arm-none-eabi target and no regression found.
>
> Presumably for ARM and Thumb2 state ?
>
> cheers
> Ramana

fix_cond_cmp_2.patch
Description: Binary data
[PATCH, ARM] Generate conditional compares in Thumb2 state
This patch is to generate more conditional compare instructions in Thumb2 state. Given an example like below,

int f(int i, int j)
{
  if ( (i == '+') || (j == '-') ) {
    return i;
  } else {
    return j;
  }
}

without the patch, the compiler generates the following code:

	sub	r2, r0, #43
	rsbs	r3, r2, #0
	adc	r3, r3, r2
	cmp	r1, #45
	it	eq
	orreq	r3, r3, #1
	cmp	r3, #0
	it	eq
	moveq	r0, r1
	bx	lr

With the patch, the compiler can generate a conditional compare like below:

	cmp	r0, #43
	it	ne
	cmpne	r1, #45
	it	ne
	movne	r0, r1
	bx	lr

The patch essentially inserts an *it* instruction for the following rules in arm.md:

* cmp_ite0
* cmp_ite1
* cmp_and
* cmp_ior

Tested against the arm-none-eabi target and no regression found.

The source code ChangeLog would be:

2011-07-29  Jiangning Liu

	* config/arm/arm.md (*ior_scc_scc): Enable for Thumb2 as well.
	(*ior_scc_scc_cmp): Likewise.
	(*and_scc_scc): Likewise.
	(*and_scc_scc_cmp): Likewise.
	(*and_scc_scc_nodom): Likewise.
	(*cmp_ite0, *cmp_ite1, *cmp_and, *cmp_ior): Handle Thumb2.

The testsuite ChangeLog would be:

2011-07-29  Jiangning Liu

	* gcc.target/arm/thumb2-cond-cmp-1.c: New. Make sure conditional
	compare can be generated.
	* gcc.target/arm/thumb2-cond-cmp-2.c: Likewise.
	* gcc.target/arm/thumb2-cond-cmp-3.c: Likewise.
	* gcc.target/arm/thumb2-cond-cmp-4.c: Likewise.

Thanks,
-Jiangning

fix_cond_cmp.patch
Description: Binary data
RE: [RFC] Add middle end hook for stack red zone size
Hi Jakub,

I appreciate your valuable comments!

I think the SPARC V9 ABI doesn't have a red zone defined, right? So stack_red_zone_size would be defined as zero by default, and the scheduler would block moving memory accesses across a stack adjustment no matter what the offset is. I don't see any risk here. Also, in my patch the function *abs* is used to avoid the opposite-stack-direction issue you mentioned.

Some people like you insist on the ABI diversity, and actually I agree with you on this. But part of the ABI definition is general for all targets. The point here is that memory accesses beyond the stack red zone should be avoided, which is the general part of the ABI that the compiler should guarantee. For this general part, the middle end should take the responsibility.

Thanks,
-Jiangning

> -----Original Message-----
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Monday, August 01, 2011 6:31 PM
> To: Jiangning Liu
> Cc: 'Joern Rennecke'; g...@gcc.gnu.org; gcc-patches@gcc.gnu.org;
> vmaka...@redhat.com; dje@gmail.com; Richard Henderson; Ramana
> Radhakrishnan; 'Ramana Radhakrishnan'
> Subject: Re: [RFC] Add middle end hook for stack red zone size
>
> On Mon, Aug 01, 2011 at 06:14:27PM +0800, Jiangning Liu wrote:
> > ARM. You are right, they were all fixed in back-ends in the past, but
> > we should fix the bug in a general way to make GCC infrastructure
> > stronger, rather than fixing the problem target-by-target and
> > case-by-case! If you further look into the back-end fixes in x86 and
> > PowerPC, you may find they look quite similar in back-ends.
>
> Red zone is only one difficulty, your patch is e.g. completely ignoring
> existence of biased stack pointers (e.g. SPARC -m64 has them).
> Some targets have stack growing in opposite direction, etc.
> We have really a huge amount of very diverse ABIs and making the
> middle-end grok what is an invalid stack access is difficult.
>
> 	Jakub
RE: [RFC] Add middle end hook for stack red zone size
The answer is: ARM can. However, if you look into the bugs PR30282, PR38644 and PR44199, you may find that historically several different cases in different ports reported similar failures, covering x86, PowerPC and ARM. You are right, they were all fixed in the back ends in the past, but we should fix the bug in a general way to make the GCC infrastructure stronger, rather than fixing the problem target-by-target and case-by-case! If you further look into the back-end fixes in x86 and PowerPC, you may find they look quite similar.

Thanks,
-Jiangning

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Jakub Jelinek
> Sent: Monday, August 01, 2011 5:12 PM
> To: Jiangning Liu
> Cc: 'Joern Rennecke'; g...@gcc.gnu.org; gcc-patches@gcc.gnu.org;
> vmaka...@redhat.com; dje@gmail.com; Richard Henderson; Ramana
> Radhakrishnan; 'Ramana Radhakrishnan'
> Subject: Re: [RFC] Add middle end hook for stack red zone size
>
> On Mon, Aug 01, 2011 at 11:44:04AM +0800, Jiangning Liu wrote:
> > It's quite necessary to solve the general problem in middle-end
> > rather than in back-end.
>
> That's what we disagree on. All back-ends but ARM are able to handle it
> right, why can't ARM too? The ABI rules for stack handling in the
> epilogues are simply too diverse and complex to be handled easily in
> the scheduler.
>
> 	Jakub
RE: [RFC] Add middle end hook for stack red zone size
Joern,

Thanks for your valuable feedback! This is only an RFC, and I will send out a formal patch along with a ChangeLog later on.

Basically, my patch only adds a new dependence in the scheduler, and it only blocks some instruction movements, so it poses NO RISK to compiler correctness. For whatever stack pointer changes you gave in different scenarios, the current code base should already work. My patch intends neither to replace old dependences, nor to maximize the scheduler capability in the presence of a stack red zone. It only blocks moving a memory access over a stack pointer adjustment if the distance is beyond the red zone size, which is an OS requirement due to the existence of interrupts.

Stack adjustment in the epilogue is a very general part of stack frame usage. It's quite necessary to solve the general problem in the middle end rather than in the back end. Also, the old patch you attached solves a data dependence between two memory accesses, but the stack pointer doesn't really have a data dependence with a memory access that doesn't use the stack pointer, so they are different stories.

An alternative solution without adding a blunt scheduling barrier is to insert an independent pass before the scheduler to create RTL barriers using the same interface stack_red_zone_size, but it would really be over-design to add a new pass only for this *small* functionality.

In my patch, the *abs* of the offset is used, so you are right that it's possible to get false positives, i.e. to be too conservative, but there won't be false negatives, because my code only adds new dependences. Since compilation is done function by function, it would be OK if the red zone size varies due to different ABIs. Could you please tell me on exactly what system supported by GCC the red zone size can be different for incoming and outgoing, and also how the scheduler guarantees correctness in the current code base? Anyway, I don't think my patch breaks the original solution.

Thanks,
-Jiangning

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Joern Rennecke
> Sent: Tuesday, July 26, 2011 10:33 AM
> To: Jiangning Liu
> Cc: g...@gcc.gnu.org; gcc-patches@gcc.gnu.org; vmaka...@redhat.com;
> dje@gmail.com; Richard Henderson; Ramana Radhakrishnan; 'Ramana
> Radhakrishnan'
> Subject: RE: [RFC] Add middle end hook for stack red zone size
>
> Quoting Jiangning Liu :
>
> > Hi,
> >
> > One month ago, I sent out this RFC to the *gcc-patches* mailing list,
> > but I haven't received any response yet. So I'm forwarding this mail
> > to the *gcc* mailing list. Can anybody here give me feedback?
>
> Well, I couldn't approve any patch, but I can point out some issues
> with your patch.
>
> First, it's missing a ChangeLog, and you don't state how you have
> tested it. And regarding the code in sched_analyze_1, I think you'll
> get false positives with alloca, and false negatives when registers are
> involved to compute offsets or to restore the stack pointer from.
>
> FWIW, I think generally blunt scheduling barriers should be avoided,
> and instead the dependencies made visible to the scheduler.
> E.g., I've been working with another architecture with a redzone, where
> at -fno-omit-frame-pointer, the prologue can put pretend_args into the
> redzone, then after stack adjustment and frame allocation, these
> arguments are accessed via the frame pointer.
>
> With the attached patchlet, alias analysis works for this situation
> too, so no blunt scheduling block is required.
>
> Likewise, with stack adjustments, they should not affect scheduling in
> general, but be considered to clobber the part of the frame that is
> being exposed to interrupt writes either before or after the
> adjustment.
> At the moment, each port that wants to have such selective scheduling
> blockages has to define a stack_adjust pattern with a memory clobber in
> a parallel, with a memref that shows the high water mark of possible
> interrupt stack writes.
> Prima facie it would seem convenient if you only had to tell the
> middle-end about the redzone size, and it could figure out the implicit
> clobbers when the stack is changed. However, when a big stack
> adjustment is being made, or the stack is realigned, or restored from
> the frame pointer / another register where it was saved due to
> realignment, the adjustment is not so obvious. I'm not sure if you can
> actually create a robust interface that's simpler to use than putting
> the right memory clobber in the stack adjust pattern. Note also that
> the redzone size can vary from function to function depending on
> ABI-altering attributes, in particular for interrupt functions, which
> can also have different incoming and outgoing redzone sizes. Plus, you
> can have an NMI / reset handler which can use the stack like an
> ordinary address register.
RE: [RFC] Add middle end hook for stack red zone size
Hi,

One month ago, I sent out this RFC to the *gcc-patches* mailing list, but I haven't received any response yet. So I'm forwarding this mail to the *gcc* mailing list. Can anybody here give me feedback?

Appreciate your help in advance!

-Jiangning

-----Original Message-----
From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org]
Sent: Tuesday, July 19, 2011 6:18 PM
To: Jiangning Liu
Cc: gcc-patches@gcc.gnu.org; vmaka...@redhat.com; dje@gmail.com;
Richard Henderson; Ramana Radhakrishnan
Subject: Re: [RFC] Add middle end hook for stack red zone size

2011/7/19 Jiangning Liu :
>
> I see a lot of feedback on other posts, but mine is still with ZERO
> response in the past 3 weeks, so I'm wondering if I made any mistake
> in my process? Who can help me?

It would be worth CC'ing the other relevant target maintainers as well
to get some feedback since the patch touches ARM, x86 and Powerpc. I've
added the maintainers for i386 and PPC to the CC list using the email
addresses from the MAINTAINERS file.

Thanks,
Ramana

>
> Thanks,
> -Jiangning
>
> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Jiangning Liu
> Sent: Tuesday, July 05, 2011 8:32 AM
> To: gcc-patches@gcc.gnu.org; rgue...@gcc.gnu.org
> Subject: RE: [RFC] Add middle end hook for stack red zone size
>
> PING...
>
> I just merged with the latest code base and generated a new patch as
> attached.
>
> Thanks,
> -Jiangning
>
>> -----Original Message-----
>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>> ow...@gcc.gnu.org] On Behalf Of Jiangning Liu
>> Sent: June 28, 2011 4:38 PM
>> To: gcc-patches@gcc.gnu.org
>> Subject: [RFC] Add middle end hook for stack red zone size
>>
>> This patch is to fix PR38644, which is a long-standing bug about
>> stack red zone access; PR30282 is correlated.
>>
>> Originally the red zone concept is not exposed to the middle end, and
>> the back end uses special logic to add extra memory barrier RTL to
>> ensure correct dependences in the middle end. This way different back
>> ends must handle the red zone problem by themselves. For example, the
>> X86 target introduced the function ix86_using_red_zone() to judge red
>> zone access, while POWER introduced offset_below_red_zone_p() to
>> judge it. Note that they have different semantics, but the logic in
>> the back-end caller sites uses them to decide whether to add memory
>> barrier RTL or not. If a back end handles this incorrectly, bugs are
>> introduced.
>>
>> Therefore, the correct method should be for the middle end to handle
>> red zone related things, avoiding the burden in different back ends.
>> To be specific for PR38644, this middle-end problem causes incorrect
>> behavior for the ARM target. This patch exposes the red zone concept
>> to the middle end by introducing a middle-end/back-end hook
>> TARGET_STACK_RED_ZONE_SIZE, defined in target.def with a default
>> value of 0. A back end may redefine this function to provide the
>> concrete red zone size according to specific ABI requirements.
>>
>> In the middle end, scheduling dependences are modified by using this
>> hook plus checking the stack frame pointer adjustment instruction, to
>> decide whether memory references need to be all flushed out or not.
>> In theory, if TARGET_STACK_RED_ZONE_SIZE is defined correctly, a back
>> end would not be required to specially handle this scheduling
>> dependence issue by introducing extra memory barrier RTL.
>>
>> In the back ends, the following changes are made to define the hook:
>> 1) For X86, TARGET_STACK_RED_ZONE_SIZE is redefined to be
>> ix86_stack_red_zone_size() in i386.c, which is a newly introduced
>> function.
>> 2) For POWER, TARGET_STACK_RED_ZONE_SIZE is redefined to be
>> rs6000_stack_red_zone_size() in rs6000.c, which is also a newly
>> defined function.
>> 3) For ARM and others, TARGET_STACK_RED_ZONE_SIZE is defined to be
>> default_stack_red_zone_size in targhooks.c, and this function returns
>> 0, which means ARM EABI and others don't support red zone access at
>> all.
>>
>> In summary, the relationship between ABI and red zone access is like
>> below:
>>
>> -----------------------------------------------------------------
>> | ARCH         | ARM  |      X86      |      POWER     | others |
>> |--------------|------|-------|-------|---------|------|--------|
>> | ABI          | EABI | MS_64 | other |   AIX   |  V4  |        |
>> |--------------|------|-------|-------|---------|------|--------|
>> | RED ZONE     |  No  |  YES  |  No   |   YES   |  No  |   No   |
>> |--------------|------|-------|-------|---------|------|--------|
>> | RED ZONE SIZE|  0   |  128  |   0   | 220/288 |  0   |   0    |
>> -----------------------------------------------------------------
>>
>> Thanks,
>> -Jiangning
RE: [RFC] Add middle end hook for stack red zone size
Hello...

Can *anybody* give me comments on this bug fix? Our customers have been complaining about this long-standing bug.

I see a lot of feedback on other posts, but mine is still with ZERO response in the past 3 weeks, so I'm wondering if I made any mistake in my process? Who can help me?

Thanks,
-Jiangning

-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
On Behalf Of Jiangning Liu
Sent: Tuesday, July 05, 2011 8:32 AM
To: gcc-patches@gcc.gnu.org; rgue...@gcc.gnu.org
Subject: RE: [RFC] Add middle end hook for stack red zone size

PING...

I just merged with the latest code base and generated a new patch as attached.

Thanks,
-Jiangning

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Jiangning Liu
> Sent: June 28, 2011 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [RFC] Add middle end hook for stack red zone size
>
> This patch is to fix PR38644, which is a long-standing bug about
> stack red zone access; PR30282 is correlated.
>
> Originally the red zone concept is not exposed to the middle end, and
> the back end uses special logic to add extra memory barrier RTL to
> ensure correct dependences in the middle end. This way different back
> ends must handle the red zone problem by themselves. For example, the
> X86 target introduced the function ix86_using_red_zone() to judge red
> zone access, while POWER introduced offset_below_red_zone_p() to judge
> it. Note that they have different semantics, but the logic in the
> back-end caller sites uses them to decide whether to add memory
> barrier RTL or not. If a back end handles this incorrectly, bugs are
> introduced.
>
> Therefore, the correct method should be for the middle end to handle
> red zone related things, avoiding the burden in different back ends.
> To be specific for PR38644, this middle-end problem causes incorrect
> behavior for the ARM target. This patch exposes the red zone concept
> to the middle end by introducing a middle-end/back-end hook
> TARGET_STACK_RED_ZONE_SIZE, defined in target.def with a default value
> of 0. A back end may redefine this function to provide the concrete
> red zone size according to specific ABI requirements.
>
> In the middle end, scheduling dependences are modified by using this
> hook plus checking the stack frame pointer adjustment instruction, to
> decide whether memory references need to be all flushed out or not. In
> theory, if TARGET_STACK_RED_ZONE_SIZE is defined correctly, a back end
> would not be required to specially handle this scheduling dependence
> issue by introducing extra memory barrier RTL.
>
> In the back ends, the following changes are made to define the hook:
> 1) For X86, TARGET_STACK_RED_ZONE_SIZE is redefined to be
> ix86_stack_red_zone_size() in i386.c, which is a newly introduced
> function.
> 2) For POWER, TARGET_STACK_RED_ZONE_SIZE is redefined to be
> rs6000_stack_red_zone_size() in rs6000.c, which is also a newly
> defined function.
> 3) For ARM and others, TARGET_STACK_RED_ZONE_SIZE is defined to be
> default_stack_red_zone_size in targhooks.c, and this function returns
> 0, which means ARM EABI and others don't support red zone access at
> all.
>
> In summary, the relationship between ABI and red zone access is like
> below:
>
> -----------------------------------------------------------------
> | ARCH         | ARM  |      X86      |      POWER     | others |
> |--------------|------|-------|-------|---------|------|--------|
> | ABI          | EABI | MS_64 | other |   AIX   |  V4  |        |
> |--------------|------|-------|-------|---------|------|--------|
> | RED ZONE     |  No  |  YES  |  No   |   YES   |  No  |   No   |
> |--------------|------|-------|-------|---------|------|--------|
> | RED ZONE SIZE|  0   |  128  |   0   | 220/288 |  0   |   0    |
> -----------------------------------------------------------------
>
> Thanks,
> -Jiangning
RE: [RFC] Add middle end hook for stack red zone size
PING...

I just merged with the latest code base and generated a new patch, as attached.

Thanks,
-Jiangning

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Jiangning Liu
> Sent: June 28, 2011 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [RFC] Add middle end hook for stack red zone size
>
> This patch is to fix PR38644, which is a long-standing bug about
> stack red zone access; PR30282 is correlated.
>
> Originally the red zone concept is not exposed to the middle end, and
> the back end uses special logic to add extra memory barrier RTL to
> ensure correct dependences in the middle end. This way different back
> ends must handle the red zone problem by themselves. For example, the
> X86 target introduced the function ix86_using_red_zone() to judge red
> zone access, while POWER introduced offset_below_red_zone_p() to judge
> it. Note that they have different semantics, but the logic in the
> back-end caller sites uses them to decide whether to add memory
> barrier RTL or not. If a back end handles this incorrectly, bugs are
> introduced.
>
> Therefore, the correct method should be for the middle end to handle
> red zone related things, avoiding the burden in different back ends.
> To be specific for PR38644, this middle-end problem causes incorrect
> behavior for the ARM target. This patch exposes the red zone concept
> to the middle end by introducing a middle-end/back-end hook
> TARGET_STACK_RED_ZONE_SIZE, defined in target.def with a default value
> of 0. A back end may redefine this function to provide the concrete
> red zone size according to specific ABI requirements.
>
> In the middle end, scheduling dependences are modified by using this
> hook plus checking the stack frame pointer adjustment instruction, to
> decide whether memory references need to be all flushed out or not. In
> theory, if TARGET_STACK_RED_ZONE_SIZE is defined correctly, a back end
> would not be required to specially handle this scheduling dependence
> issue by introducing extra memory barrier RTL.
>
> In the back ends, the following changes are made to define the hook:
> 1) For X86, TARGET_STACK_RED_ZONE_SIZE is redefined to be
> ix86_stack_red_zone_size() in i386.c, which is a newly introduced
> function.
> 2) For POWER, TARGET_STACK_RED_ZONE_SIZE is redefined to be
> rs6000_stack_red_zone_size() in rs6000.c, which is also a newly
> defined function.
> 3) For ARM and others, TARGET_STACK_RED_ZONE_SIZE is defined to be
> default_stack_red_zone_size in targhooks.c, and this function returns
> 0, which means ARM EABI and others don't support red zone access at
> all.
>
> In summary, the relationship between ABI and red zone access is like
> below:
>
> -----------------------------------------------------------------
> | ARCH         | ARM  |      X86      |      POWER     | others |
> |--------------|------|-------|-------|---------|------|--------|
> | ABI          | EABI | MS_64 | other |   AIX   |  V4  |        |
> |--------------|------|-------|-------|---------|------|--------|
> | RED ZONE     |  No  |  YES  |  No   |   YES   |  No  |   No   |
> |--------------|------|-------|-------|---------|------|--------|
> | RED ZONE SIZE|  0   |  128  |   0   | 220/288 |  0   |   0    |
> -----------------------------------------------------------------
>
> Thanks,
> -Jiangning

stack-red-zone-patch-38644-4.patch
Description: Binary data
[RFC] Add middle end hook for stack red zone size
This patch is to fix PR38644, which is a long-standing bug about stack red zone access; PR30282 is correlated.

Originally the red zone concept is not exposed to the middle end, and the back end uses special logic to add extra memory barrier RTL to ensure correct dependences in the middle end. This way different back ends must handle the red zone problem by themselves. For example, the X86 target introduced the function ix86_using_red_zone() to judge red zone access, while POWER introduced offset_below_red_zone_p() to judge it. Note that they have different semantics, but the logic in the back-end caller sites uses them to decide whether to add memory barrier RTL or not. If a back end handles this incorrectly, bugs are introduced.

Therefore, the correct method should be for the middle end to handle red zone related things, avoiding the burden in different back ends. To be specific for PR38644, this middle-end problem causes incorrect behavior for the ARM target. This patch exposes the red zone concept to the middle end by introducing a middle-end/back-end hook TARGET_STACK_RED_ZONE_SIZE, defined in target.def with a default value of 0. A back end may redefine this function to provide the concrete red zone size according to specific ABI requirements.

In the middle end, scheduling dependences are modified by using this hook plus checking the stack frame pointer adjustment instruction, to decide whether memory references need to be all flushed out or not. In theory, if TARGET_STACK_RED_ZONE_SIZE is defined correctly, a back end would not be required to specially handle this scheduling dependence issue by introducing extra memory barrier RTL.

In the back ends, the following changes are made to define the hook:
1) For X86, TARGET_STACK_RED_ZONE_SIZE is redefined to be ix86_stack_red_zone_size() in i386.c, which is a newly introduced function.
2) For POWER, TARGET_STACK_RED_ZONE_SIZE is redefined to be rs6000_stack_red_zone_size() in rs6000.c, which is also a newly defined function.
3) For ARM and others, TARGET_STACK_RED_ZONE_SIZE is defined to be default_stack_red_zone_size in targhooks.c, and this function returns 0, which means ARM EABI and others don't support red zone access at all.

In summary, the relationship between ABI and red zone access is like below:

-----------------------------------------------------------------
| ARCH         | ARM  |      X86      |      POWER     | others |
|--------------|------|-------|-------|---------|------|--------|
| ABI          | EABI | MS_64 | other |   AIX   |  V4  |        |
|--------------|------|-------|-------|---------|------|--------|
| RED ZONE     |  No  |  YES  |  No   |   YES   |  No  |   No   |
|--------------|------|-------|-------|---------|------|--------|
| RED ZONE SIZE|  0   |  128  |   0   | 220/288 |  0   |   0    |
-----------------------------------------------------------------

Thanks,
-Jiangning

stack-red-zone-patch-38644-3.patch
Description: Binary data
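Since the patch itself is an attachment, here is a sketch of the hook shapes described above; the names follow the text of this mail and the values follow the table, but they are reconstructed for illustration, not copied from the attached patch:

/* targhooks.c: conservative default -- no red zone, so the scheduler
   never moves a stack-frame access across a stack adjustment.  */
int
default_stack_red_zone_size (void)
{
  return 0;
}

/* rs6000.c: AIX ABIs keep 220 (32-bit) or 288 (64-bit) bytes below
   the stack pointer safe from signals; SVR4 (V.4) has no cushion.  */
static int
rs6000_stack_red_zone_size (void)
{
  if (DEFAULT_ABI == ABI_V4)
    return 0;
  return TARGET_32BIT ? 220 : 288;
}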