from:"irar at il dot ibm dot com"

[Bug tree-optimization/45714] [4.6 Regression] Vectorization of double pow function causes a segmentation fault

2010-09-20 Thread irar at il dot ibm dot com



--- Comment #7 from irar at il dot ibm dot com  2010-09-20 06:43 ---
Fixed.


-- 

irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45714

[Bug tree-optimization/45733] [4.6 Regression] ICE: verify_stmts failed: invalid conversion in gimple call with -fstrict-overflow -ftree-vectorize

2010-09-20 Thread irar at il dot ibm dot com



--- Comment #2 from irar at il dot ibm dot com  2010-09-20 12:17 ---
Looks like it is caused by revision 164367:

http://gcc.gnu.org/ml/gcc-cvs/2010-09/msg00661.html


-- 

irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matz at gcc dot gnu dot org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-09-20 12:17:14
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45733

[Bug tree-optimization/45733] [4.6 Regression] ICE: verify_stmts failed: invalid conversion in gimple call with -fstrict-overflow -ftree-vectorize

2010-09-20 Thread irar at il dot ibm dot com



--- Comment #3 from irar at il dot ibm dot com  2010-09-20 13:08 ---
For vector(2) void * we get vec_perm_v2di_u builtin declaration, because the
mode of vector(2) void * is unsigned V2DI.

I wonder if this can happen for every builtin call, and we should convert back
to the original type everywhere?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45733

[Bug tree-optimization/45714] [4.6 Regression] Vectorization of double pow function causes a segmentation fault

2010-09-19 Thread irar at il dot ibm dot com



--- Comment #3 from irar at il dot ibm dot com  2010-09-19 08:52 ---
gimple_bb (stmt) returns NULL for that statement (D.1575_33 = __builtin_pow
(D.1542_14, D.1574_32)).

We can avoid vectorization in such cases, but looks like it should be fixed to
return the actual basic block.

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45714

[Bug tree-optimization/45714] [4.6 Regression] Vectorization of double pow function causes a segmentation fault

2010-09-19 Thread irar at il dot ibm dot com



--- Comment #5 from irar at il dot ibm dot com  2010-09-19 10:08 ---
Right. This patch fixes it:

Index: tree-vect-stmts.c
===================================================================
--- tree-vect-stmts.c   (revision 164332)
+++ tree-vect-stmts.c   (working copy)
@@ -4478,6 +4478,7 @@ vect_transform_stmt (gimple stmt, gimple
     case call_vec_info_type:
       gcc_assert (!slp_node);
       done = vectorizable_call (stmt, gsi, vec_stmt);
+      stmt = gsi_stmt (*gsi);
       break;

     case reduc_vec_info_type:

I am going to test it now.

Thanks,
Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45714

[Bug tree-optimization/45470] [4.6 Regression] ICE: verify_flow_info failed: BB 2 can not throw but has an EH edge with -ftree-vectorize -fnon-call-exceptions

2010-09-12 Thread irar at il dot ibm dot com



--- Comment #9 from irar at il dot ibm dot com  2010-09-12 09:46 ---
OK, thanks. I am going to test this patch; it only checks data-refs and
function calls:

Index: tree-vect-data-refs.c
===================================================================
--- tree-vect-data-refs.c   (revision 164227)
+++ tree-vect-data-refs.c   (working copy)
@@ -2542,6 +2542,17 @@ vect_analyze_data_refs (loop_vec_info lo
       offset = unshare_expr (DR_OFFSET (dr));
       init = unshare_expr (DR_INIT (dr));

+      if (stmt_could_throw_p (stmt))
+        {
+          if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
+            {
+              fprintf (vect_dump, "not vectorized: statement can throw an "
+                       "exception ");
+              print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
+            }
+          return false;
+        }
+
       /* Update DR field in stmt_vec_info struct.  */

       /* If the dataref is in an inner-loop of the loop that is considered for
Index: tree-vect-stmts.c
===================================================================
--- tree-vect-stmts.c   (revision 164227)
+++ tree-vect-stmts.c   (working copy)
@@ -1343,6 +1343,9 @@ vectorizable_call (gimple stmt, gimple_s
   if (TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME)
     return false;

+  if (stmt_could_throw_p (stmt))
+    return false;
+
   vectype_out = STMT_VINFO_VECTYPE (stmt_info);

   /* Process function arguments.  */

Thanks,
Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45470

[Bug tree-optimization/45470] [4.6 Regression] ICE: verify_flow_info failed: BB 2 can not throw but has an EH edge with -ftree-vectorize -fnon-call-exceptions

2010-09-01 Thread irar at il dot ibm dot com



--- Comment #3 from irar at il dot ibm dot com  2010-09-01 09:06 ---
r163260 only made this BB vectorizable.

I checked lookup_stmt_eh_lp for the last stmt of the BB and EDGE_EH flags
before and after vectorization (basic block SLP), and in both cases
lookup_stmt_eh_lp returns 0 and there is an EH edge from the basic block.

I also tried to add a cleanup_eh pass after SLP. If it is somewhere before
pass_tree_loop_done, there is no ICE:

Index: passes.c
===================================================================
--- passes.c    (revision 163538)
+++ passes.c    (working copy)
@@ -925,6 +925,7 @@ init_optimization_passes (void)
          NEXT_PASS (pass_parallelize_loops);
          NEXT_PASS (pass_loop_prefetch);
          NEXT_PASS (pass_iv_optimize);
+         NEXT_PASS (pass_cleanup_eh);
          NEXT_PASS (pass_tree_loop_done);
        }
       NEXT_PASS (pass_cse_reciprocals);


If cleanup_eh is scheduled after tree_loop_done, there is an ICE:

Index: passes.c
===================================================================
--- passes.c    (revision 163538)
+++ passes.c    (working copy)
@@ -926,6 +926,7 @@ init_optimization_passes (void)
          NEXT_PASS (pass_loop_prefetch);
          NEXT_PASS (pass_iv_optimize);
          NEXT_PASS (pass_tree_loop_done);
+         NEXT_PASS (pass_cleanup_eh);
        }
       NEXT_PASS (pass_cse_reciprocals);
       NEXT_PASS (pass_reassoc);


-- 

irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-09-01 09:06:19
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45470

[Bug tree-optimization/45470] [4.6 Regression] ICE: verify_flow_info failed: BB 2 can not throw but has an EH edge with -ftree-vectorize -fnon-call-exceptions

2010-09-01 Thread irar at il dot ibm dot com



--- Comment #6 from irar at il dot ibm dot com  2010-09-01 11:54 ---
(In reply to comment #5)
> I see before SLP:
>
> <bb 2>:
>   MEM[(struct A *)this_1(D)].a = 0;
>   MEM[(struct A *)this_1(D)].b = 0;
>   MEM[(struct A *)this_1(D)].c = 0;
>   [LP 2] MEM[(struct A *)this_1(D) + 12B].a = 0;
>
> and after:
>
> <bb 2>:
>   vect_cst_.1_16 = { 0, 0, 0, 0 };
>   vect_p.5_17 = MEM[(struct A *)this_1(D)].a;
>   M*vect_p.5_17{misalignment: 0} = vect_cst_.1_16;
>
> so EH info has not been properly transferred.

How should it be done? Is it ok to assume that if one of the old stmts can
throw, then we can set TREE_THIS_NOTRAP for the new access to 0? (and then we
can call maybe_duplicate_eh_stmt (new_stmt, old_stmt)).

Or maybe it's better to avoid vectorization?...

Thanks,
Ira

> Now that only
> MEM[(struct A *)this_1(D) + 12B].a can throw internally but not
> MEM[(struct A *)this_1(D)].c = 0; is a fact that the frontend establishes.
>
> The following mitigates the problem by simply removing the dead EH edges.
>
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c (revision 163721)
> +++ gcc/tree-vect-slp.c (working copy)
> @@ -2474,6 +2474,9 @@ vect_schedule_slp (loop_vec_info loop_vi
>          }
>      }
>
> +  if (bb_vinfo)
> +    gimple_purge_dead_eh_edges (BB_VINFO_BB (bb_vinfo));
> +
>    return is_store;
>  }
>


-- 

irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.6.0                       |---


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45470

[Bug tree-optimization/41881] [4.5/4.6 regression] Complete unrolling (inner) versus vectorization of reduction

2010-08-11 Thread irar at il dot ibm dot com



--- Comment #7 from irar at il dot ibm dot com  2010-08-11 10:24 ---
(In reply to comment #6)
> I think that SLP doesn't handle reduction.
>

Not all kinds of reduction. We handle

#a1 = phi <a0, a2>
#b1 = phi <b0, b2>
...
a2 = a1 + x
b2 = b1 + y

Here we also have:
#a1 = phi <a0, a9>
...
a2 = a1 + x
...
a3 = a2 + y
...

a9 = a8 + z
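
For illustration only (not taken from the PR): a minimal C sketch of the two
shapes above. The function and parameter names are hypothetical. The first
loop carries two independent reductions, one PHI each; the second updates a
single accumulator several times per iteration, which is the chain that shows
up once an inner loop has been completely unrolled.

```c
/* Two independent reductions: each accumulator has its own PHI.  */
void
two_sums (const int *x, const int *y, int n, int *a_out, int *b_out)
{
  int a = 0, b = 0;
  for (int i = 0; i < n; i++)
    {
      a += x[i];   /* a2 = a1 + x */
      b += y[i];   /* b2 = b1 + y */
    }
  *a_out = a;
  *b_out = b;
}

/* Reduction chain: one accumulator, updated four times per iteration.
   For simplicity this sketch ignores a tail of fewer than 4 elements.  */
int
chained_sum (const int *x, int n)
{
  int a = 0;
  for (int i = 0; i + 3 < n; i += 4)
    {
      a += x[i];       /* a2 = a1 + x */
      a += x[i + 1];   /* a3 = a2 + y */
      a += x[i + 2];
      a += x[i + 3];   /* last link feeds the PHI */
    }
  return a;
}
```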


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41881

[Bug tree-optimization/45241] CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-10 Thread irar at il dot ibm dot com



--- Comment #4 from irar at il dot ibm dot com  2010-08-10 09:06 ---
I am testing the same patch as in comment #1.

Testcase that shows the problem:

int
foo(short x)
{
  short i, y;
  int sum;

  for (i = 0; i < x; i++)
    y = x * i;

  for (i = x; i > 0; i--)
    sum += y;

  return sum;
}


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45241

[Bug tree-optimization/45241] CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre

2010-08-10 Thread irar at il dot ibm dot com



--- Comment #5 from irar at il dot ibm dot com  2010-08-10 10:23 ---
(In reply to comment #1)
> This patch should be a valid fix, because the recognition of the dot_prod
> pattern is known to fail at this point if the stmt is outside the loop.
> (I am not sure whether we should not see this case in the vectorizer at this
> point -- should previous analysis already filter out?):
>

I don't understand this. Where do we check if the stmt (which one?) is outside
the loop? 


> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index 19f0ae6..5f81a73 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -259,6 +259,10 @@ vect_recog_dot_prod_pattern (gimple last_stmt, tree
> *type_in, tree *type_out)
>       inside the loop (in case we are analyzing an outer-loop).  */
>    if (!is_gimple_assign (stmt))
>      return NULL;
> +
> +  if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
> +    return NULL;
> +
>    stmt_vinfo = vinfo_for_stmt (stmt);
>    gcc_assert (stmt_vinfo);
>    if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_internal_def)
>

I was looking at PR 45239 and didn't notice that there is another PR, so I
didn't see this comment. I tested the same fix (successfully on
x86_64-suse-linux). You can commit it if you like (just please note that the
bug exists on 4.5 as well).

Thanks,
Ira


-- 

irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-08-10 10:24:00
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45241

[Bug lto/44152] ICE on compiling xshow.f of xplor-nih with -O3 -ffast-math -fwhopr

2010-07-27 Thread irar at il dot ibm dot com



--- Comment #4 from irar at il dot ibm dot com  2010-07-27 09:25 ---
I am testing a patch.


-- 

irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |irar at il dot ibm dot com
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2010-07-22 14:47:20         |2010-07-27 09:25:25
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44152

[Bug tree-optimization/44861] internal compiler error: in vectorizable_load, at tree-vect-stmts.c:3812

2010-07-08 Thread irar at il dot ibm dot com



--- Comment #1 from irar at il dot ibm dot com  2010-07-08 09:14 ---
The failure is in vectorizable_store():

  /* If accesses through a pointer to vectype do not alias the original
     memory reference we have a problem.  This should never happen.  */
  gcc_assert (alias_sets_conflict_p (get_alias_set (data_ref),
              get_alias_set (gimple_assign_lhs (stmt))));


Since the MEM_REF merge, the types struct Foo * and struct counted_base * pass
the types_compatible_p() test in vect_check_interleaving(). But in revision
161655 (the merge) the basic block gets vectorized and there is no ICE.


-- 

irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |richard dot guenther at
                   |                            |gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44861

[Bug tree-optimization/44710] New: If-conversion generates redundant statements

2010-06-29 Thread irar at il dot ibm dot com

Starting from revision 160625
(http://gcc.gnu.org/ml/gcc-patches/2010-06/msg01155.html) if-conversion
generates redundant statements for 

  for (i = 0; i < N; i++)
    if (arr[i] < limit)
      {
        pos = i + 1;
        limit = arr[i];
      }


  # pos_22 = PHI <pos_1(4), 1(2)>
  # i_23 = PHI <prephitmp.8_2(4), 0(2)>
  # limit_24 = PHI <limit_4(4), 1.28e+2(2)>
  # ivtmp.9_18 = PHI <ivtmp.9_17(4), 64(2)>
  limit_9 = arr[i_23];
  pos_10 = i_23 + 1;
  D.4534_12 = limit_9 < limit_24;   <---
  pretmp.7_3 = i_23 + 1;
  D.4535_20 = limit_9 >= limit_24;  <---
  pos_1 = [cond_expr] limit_9 >= limit_24 ? pos_22 : pos_10;
  limit_4 = [cond_expr] limit_9 >= limit_24 ? limit_24 : limit_9;
  prephitmp.8_2 = [cond_expr] limit_9 >= limit_24 ? pretmp.7_3 : pos_10;
  ivtmp.9_17 = ivtmp.9_18 - 1;
  D.4536_19 = D.4534_12 || D.4535_20;   <---
  if (ivtmp.9_17 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

The statements are removed by later dce pass, but they interfere with my
attempts to vectorize this loop.
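
For illustration only (not compiler output): a minimal C sketch of the loop
before and after a hand-written if-conversion, in which both guarded
assignments become selects on one shared predicate and no extra comparison is
left over. The function names, the explicit length n, and the element type
float are assumptions; the initial pos = 1 follows the PHI in the dump above.

```c
/* Original form: the update is guarded by a branch.  */
int
min_pos_branch (const float *arr, int n, float limit)
{
  int pos = 1;
  for (int i = 0; i < n; i++)
    if (arr[i] < limit)
      {
        pos = i + 1;
        limit = arr[i];
      }
  return pos;
}

/* Hand if-converted form: one predicate, reused by both selects.  */
int
min_pos_select (const float *arr, int n, float limit)
{
  int pos = 1;
  for (int i = 0; i < n; i++)
    {
      float v = arr[i];
      int keep = !(v < limit);      /* negated loop condition, computed once */
      pos = keep ? pos : i + 1;
      limit = keep ? limit : v;
    }
  return pos;
}
```

Both functions return the same position for the same input; the second form
has no control flow in the loop body, which is the shape a vectorizer needs.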


-- 
   Summary: If-conversion generates redundant statements
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: irar at il dot ibm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44710

[Bug tree-optimization/44710] If-conversion generates redundant statements

2010-06-29 Thread irar at il dot ibm dot com



--- Comment #1 from irar at il dot ibm dot com  2010-06-29 09:11 ---
Created an attachment (id=21036)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21036&action=view)
Full testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44710

[Bug tree-optimization/44711] New: PRE doesn't remove equivalent computations of induction variables

2010-06-29 Thread irar at il dot ibm dot com

For the following loop 

  for (i = 0; i < N; i++)
    if (arr[i] < limit)
      {
        pos = i + 1;
        limit = arr[i];
      }

PRE fails to eliminate redundant i_24 + 1 computation. 

Here is Richard's analysis from
http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02982.html:

So the reason is our heuristic in PRE to not introduce new IVs:

Found partial redundancy for expression {plus_expr,i_24,1} (0005)
Skipping insertion of phi for partial redundancy: Looks like an
induction variable
Inserted pretmp.4_2 = i_13 + 1;
 in predecessor 8
Found partial redundancy for expression {plus_expr,i_24,1} (0005)
Inserted pretmp.4_22 = i_24 + 1;
 in predecessor 7
Created phi prephitmp.5_21 = PHI <pretmp.4_22(7), pos_11(4)>
 in block 5
Found partial redundancy for expression {plus_expr,i_24,1} (0005)
Skipping insertion of phi for partial redundancy: Looks like an
induction variable
Replaced i_24 + 1 with prephitmp.5_21 in i_13 = i_24 + 1;
Removing unnecessary insertion:pretmp.4_2 = i_13 + 1;

we do not want to insert into block 3, so we are left with

<bb 3>:
  # pos_23 = PHI <pos_1(8), 1(2)>
  # i_24 = PHI <i_13(8), 0(2)>
  # limit_25 = PHI <limit_4(8), 1.28e+2(2)>
  limit_9 = arr[i_24];
  D.3841_10 = limit_9 < limit_25;
  if (D.3841_10 != 0)
    goto <bb 4>;
  else
    goto <bb 7>;

<bb 7>:
  pretmp.4_22 = i_24 + 1;
  goto <bb 5>;

<bb 4>:
  pos_11 = i_24 + 1;

<bb 5>:
  # pos_1 = PHI <pos_23(7), pos_11(4)>
  # limit_4 = PHI <limit_25(7), limit_9(4)>
  # prephitmp.5_21 = PHI <pretmp.4_22(7), pos_11(4)>
  i_13 = prephitmp.5_21;

where there is no full redundancy for i_24 + 1 now (that is,
we did some useless half-way PRE because of that IV
heuristic ...).


-- 
   Summary: PRE doesn't remove equivalent computations of induction
variables
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: irar at il dot ibm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44711

[Bug tree-optimization/44711] PRE doesn't remove equivalent computations of induction variables

2010-06-29 Thread irar at il dot ibm dot com



--- Comment #1 from irar at il dot ibm dot com  2010-06-29 11:00 ---
Created an attachment (id=21037)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21037&action=view)
Full testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44711

[Bug tree-optimization/44507] [4.5/4.6 Regression] vectorization ANDs array elements together incorrectly

2010-06-13 Thread irar at il dot ibm dot com



--- Comment #5 from irar at il dot ibm dot com  2010-06-13 10:29 ---
The bug is in creation of a neutral value for BIT_AND_EXPR. What is the correct
way to create it for all types? I found 
double-int.h:#define ALL_ONES (~((unsigned HOST_WIDE_INT) 0))
but it won't work for signed.

Thanks,
Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44507

[Bug tree-optimization/44507] [4.5/4.6 Regression] vectorization ANDs array elements together incorrectly

2010-06-13 Thread irar at il dot ibm dot com



--- Comment #7 from irar at il dot ibm dot com  2010-06-13 12:01 ---
(In reply to comment #6)
> (In reply to comment #5)
> > The bug is in creation of a neutral value for BIT_AND_EXPR. What is the
> > correct
> > way to create it for all types? I found
> > double-int.h:#define ALL_ONES (~((unsigned HOST_WIDE_INT) 0))
> > but it won't work for signed.
>
>   build_int_cst (type, -1)

OK, thanks.

>
> At least in tree-vect-slp.c:1669 this seems to be buggy.  The
> case for BIT_AND_EXPR should be separated from that of MULT_EXPR.

Right, this is buggy too, but the failure here is in reduction
(get_initial_def_for_reduction), not in SLP.

Is it safe to assume that operands of BIT_AND_EXPR are of integral type? If so,
I'll test the following patch:

Index: tree-vect-loop.c
===================================================================
--- tree-vect-loop.c    (revision 160524)
+++ tree-vect-loop.c    (working copy)
@@ -2871,12 +2871,15 @@ get_initial_def_for_reduction (gimple st
               *adjustment_def = init_val;
           }

-        if (code == MULT_EXPR || code == BIT_AND_EXPR)
+        if (code == MULT_EXPR)
           {
             real_init_val = dconst1;
             int_init_val = 1;
           }

+        if (code == BIT_AND_EXPR)
+          int_init_val = -1;
+
         if (SCALAR_FLOAT_TYPE_P (scalar_type))
           def_for_init = build_real (scalar_type, real_init_val);
         else
Index: tree-vect-slp.c
===================================================================
--- tree-vect-slp.c (revision 160524)
+++ tree-vect-slp.c (working copy)
@@ -1662,7 +1662,6 @@ vect_get_constant_vectors (slp_tree slp_
             break;

           case MULT_EXPR:
-          case BIT_AND_EXPR:
             if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (op)))
               neutral_op = build_real (TREE_TYPE (op), dconst1);
             else
@@ -1670,6 +1669,10 @@ vect_get_constant_vectors (slp_tree slp_

             break;

+          case BIT_AND_EXPR:
+            neutral_op = build_int_cst (TREE_TYPE (op), -1);
+            break;
+
           default:
             neutral_op = NULL;
         }

Thanks,
Ira
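
For illustration only (plain C, not GCC internals): a minimal sketch of why
the initial value of an AND reduction has to be all ones. The function name
and the init parameter are hypothetical. Starting the accumulator from 1, the
neutral value for multiplication, masks off every bit except bit 0; starting
from -1 leaves the result equal to the AND of the array elements.

```c
/* AND-reduce the first n elements of a, starting the accumulator at init.  */
int
and_reduce (const int *a, int n, int init)
{
  int acc = init;
  for (int i = 0; i < n; i++)
    acc &= a[i];
  return acc;
}
```

With a = { 0xF0, 0xFC }, and_reduce (a, 2, -1) gives 0xF0, which is the AND
of the two elements, while and_reduce (a, 2, 1) gives 0.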


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44507

[Bug tree-optimization/44183] Vectorizer may generate invalid memory access

2010-05-20 Thread irar at il dot ibm dot com



--- Comment #1 from irar at il dot ibm dot com  2010-05-20 07:13 ---
Do you mean that extract_even implementation does something illegal with this
last element? Misaligned load also accesses elements outside the array, but the
problem is in extract_even?

Other than doing something in the backend, we can reduce the number of vector
iterations in cases that may access elements outside array bounds for specific
targets...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44183

[Bug tree-optimization/44183] Vectorizer may generate invalid memory access

2010-05-20 Thread irar at il dot ibm dot com



--- Comment #3 from irar at il dot ibm dot com  2010-05-20 10:04 ---
I am curious what is the problem with that? These elements are not used, they
are just loaded... 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44183

[Bug tree-optimization/44183] Vectorizer may generate invalid memory access

2010-05-20 Thread irar at il dot ibm dot com



--- Comment #5 from irar at il dot ibm dot com  2010-05-20 10:24 ---
Even if we are talking about less than vector size from array boundary? And
that boundary is not (vector) aligned.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44183

[Bug middle-end/43901] [4.6 Regression] FAIL: gcc.c-torture/compile/pr42196-2.c

2010-05-10 Thread irar at il dot ibm dot com



--- Comment #16 from irar at il dot ibm dot com  2010-05-10 08:17 ---
Fixed.


-- 

irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43901

[Bug middle-end/43901] [4.6 Regression] FAIL: gcc.c-torture/compile/pr42196-2.c

2010-05-05 Thread irar at il dot ibm dot com



--- Comment #14 from irar at il dot ibm dot com  2010-05-05 09:02 ---

> It tries to get a _vector_ type of the same size.  In theory each
> vectorization method can choose whatever vector size suits them
> most (as for external defs they need to build up a vector of equivalent
> elements anyway).  So with AVX we can do V4DF -> V4SF vectorization,
> if the double is an external def the vectorization method could choose
> to create a vector with double size.  But the reasonable default for
> now is to force a same-sized vector type as that is what the vectorizer
> was tested for until now (well, until I get the followup patch cleaned
> up and posted again).

OK, thanks for the explanation.

 
> So yes, if we can return false we should probably do so instead of
> asserting (maybe assert that if we are supposed to create vectorized
> stmts and thus cannot fail that we indeed have a vector type here).

I'll prepare a patch.

Thanks,
Ira

 
> Richard.
 


-- 

irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |irar at il dot ibm dot com
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2010-05-02 10:44:22         |2010-05-05 09:02:26
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43901

[Bug middle-end/43901] [4.6 Regression] FAIL: gcc.c-torture/compile/pr42196-2.c

2010-05-03 Thread irar at il dot ibm dot com



--- Comment #12 from irar at il dot ibm dot com  2010-05-03 12:30 ---

> Well.  For loops we'd have disqualified it as there is no vector
> type for the external def (well, the stmt inside the loop).

I don't think that's true. With -fno-tree-pre we get the same ICE for loop
vectorization for:

#define N 64

union U
{
  __complex__ int ci;
  __complex__ float cf;
};

union U u[N];

void foo (double f1, double f2)
{
  int i;

  for (i=0; i<N; i++)
    {
      __real__ u[i].cf = f1;
      __imag__ u[i].cf = f2;
    }
}

> So we do not do this for SLP?  In that case
> yes, if we can return false at this point then we should replace this
> (and similar) asserts with return false.  Or we should fix
> the code that scans the BB initially and sets vector types properly?

The loop scan that sets vector types, only checks lhs types (or the smallest
type in stmt) in order to decide on vectorization factor. There is a similar
scan for BBs in vect_analyze_stmt (only to set vector types for stmts) and it
also looks only at lhs. 

The failure occurs in analysis, so it's ok to return false at this point. 
But I don't understand why external def has to have the same size as the lhs?
(And it is, of course, possible that both types are vectorizable, but still the
rhs type is bigger than the lhs type).

Thanks,
Ira

 
> Thanks,
> Richard.
 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43901

[Bug middle-end/43901] [4.6 Regression] FAIL: gcc.c-torture/compile/pr42196-2.c

2010-05-02 Thread irar at il dot ibm dot com



--- Comment #9 from irar at il dot ibm dot com  2010-05-02 11:08 ---
Thanks, Uros! I reproduced the ICE using your instructions.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43901

[Bug middle-end/43901] [4.6 Regression] FAIL: gcc.c-torture/compile/pr42196-2.c

2010-05-02 Thread irar at il dot ibm dot com



--- Comment #10 from irar at il dot ibm dot com  2010-05-02 12:12 ---
Looks like it's caused by:
r158157 | rguenth | 2010-04-09 13:40:14 +0300 (Fri, 09 Apr 2010) | 28 lines

The problem is in getting vectype for f1_2:

foo (int b, double f1, double f2, int c1, int c2)
{
...
  float D.1999;
  float D.1998;
...

<bb 3>:
  D.1998_3 = (float) f1_2(D);
  REALPART_EXPR <u.cf> = D.1998_3;
  D.1999_5 = (float) f2_4(D);
  IMAGPART_EXPR <u.cf> = D.1999_5;
  D.2012_10 = u.ci;
  goto <bb 5>;

An immediate fix would be to replace the assert in 

  /* If op0 is an external or constant def use a vector type with
     the same size as the output vector type.  */
  if (!vectype)
    vectype = get_same_sized_vectype (TREE_TYPE (op0), vectype_out);
  gcc_assert (vectype);

with 'return false', since get_same_sized_vectype currently just redirects to
get_vectype_for_scalar_type. But the comment (and the future intent) seems
incorrect for external defs, as f1 and f2 in this test.

Ira


-- 

irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenther at suse dot de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43901

[Bug middle-end/43901] [4.6 Regression] FAIL: gcc.c-torture/compile/pr42196-2.c

2010-05-01 Thread irar at il dot ibm dot com



--- Comment #4 from irar at il dot ibm dot com  2010-05-02 05:51 ---
I don't have access to ia64. I tried to change the types in the test to make
the basic blocks vectorizable on x86_64, but didn't get any error. So I still
need SLP dump in order to solve this.

Thanks,
Ira


-- 

irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.6.0                       |---


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43901

[Bug middle-end/43901] [4.6 Regression] FAIL: gcc.c-torture/compile/pr42196-2.c

2010-04-26 Thread irar at il dot ibm dot com



--- Comment #1 from irar at il dot ibm dot com  2010-04-27 05:53 ---
Could you please give some more information? It doesn't fail on x86_64-linux.
(For SLP dump please use -fdump-tree-slp-details).

Thanks,
Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43901

[Bug tree-optimization/43842] [4.6 Regression] ice in vect_create_epilog_for_reduction

2010-04-22 Thread irar at il dot ibm dot com



-- 

irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |irar at il dot ibm dot com
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2010-04-22 08:51:50         |2010-04-22 11:46:44
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43842

[Bug testsuite/43482] Fix *.log tests merged output containing ===

2010-04-22 Thread irar at il dot ibm dot com



--- Comment #6 from irar at il dot ibm dot com  2010-04-22 18:11 ---
Yes, sorry about that. I updated the ChangeLogs.
Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43482

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2010-04-21 Thread irar at il dot ibm dot com



--- Comment #8 from irar at il dot ibm dot com  2010-04-21 11:33 ---
Yes, it's possible to add this to SLP. But I don't understand how 
D.3154_3 = COMPLEX_EXPR <D.3163_8, D.3164_9>;
should be vectorized. D.3154_3 is complex and the rhs will be a vector
{D.3163_8, D.3164_9} (btw, we have to change float to double, otherwise, we
don't have complete vectors and this is not supported).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2010-04-21 Thread irar at il dot ibm dot com



--- Comment #10 from irar at il dot ibm dot com  2010-04-21 18:33 ---
Thanks. So, it is not always profitable and requires a cost model. 
I am now working on cost model for basic block vectorization, I can look at
this once we have one.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

[Bug tree-optimization/43771] [4.5/4.6 Regression] ICE on valid when compiling ParMetis with gcc 4.5.0 and -O3

2010-04-19 Thread irar at il dot ibm dot com



--- Comment #7 from irar at il dot ibm dot com  2010-04-19 07:48 ---
Fixed on 4.6, 4.5 and 4.4.


-- 

irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43771

[Bug tree-optimization/37027] SLP loop vectorization missing support for reductions

2010-04-19 Thread irar at il dot ibm dot com



--- Comment #5 from irar at il dot ibm dot com  2010-04-19 14:35 ---
Fixed.


-- 

irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37027

[Bug tree-optimization/43771] [4.5/4.6 Regression] ICE on valid when compiling ParMetis with gcc 4.5.0 and -O3

2010-04-18 Thread irar at il dot ibm dot com



-- 

irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |irar at il dot ibm dot com
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2010-04-16 21:16:37         |2010-04-18 08:12:31
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43771

[Bug tree-optimization/43692] small loop not vectorized

2010-04-08 Thread irar at il dot ibm dot com



--- Comment #1 from irar at il dot ibm dot com  2010-04-08 17:14 ---
It probably happens because the vectorization is not profitable. Try
-fno-vect-cost-model flag.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43692

[Bug tree-optimization/43692] small loop not vectorized

2010-04-08 Thread irar at il dot ibm dot com



--- Comment #3 from irar at il dot ibm dot com  2010-04-08 17:33 ---
Both loops get vectorized for me with -O3 on x86_64-suse-linux.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43692

[Bug tree-optimization/43692] small loop not vectorized

2010-04-08 Thread irar at il dot ibm dot com



--- Comment #5 from irar at il dot ibm dot com  2010-04-08 17:59 ---
In GCC 4.4 the smaller loop gets completely unrolled before the vectorizer.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43692

[Bug tree-optimization/43425] enhance scalar expansion to vectorize this loop

2010-03-28 Thread irar at il dot ibm dot com



--- Comment #2 from irar at il dot ibm dot com  2010-03-28 08:59 ---
I think PR 35229 covers this issue.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43425

[Bug tree-optimization/43431] Diagnostic message is not clear for vectorization profitability analysis

2010-03-28 Thread irar at il dot ibm dot com



--- Comment #1 from irar at il dot ibm dot com  2010-03-28 09:41 ---
(In reply to comment #0)

> What does this message mean?
> vector iteration cost = 2056 is divisible by scalar iteration cost = 4 by a
> factor greater than or equal to the vectorization factor = 4 .
> Is the vectorization not profitable?
> Why?

The cost of one vector iteration is 2056.
The cost of one scalar iteration is 4.
2056/4 = 514
514 >= 4 (= vectorization factor)
The vectorization is not profitable.

We want to vectorize only if one vector iteration cost is lower than one scalar
iteration cost multiplied by vectorization factor.

(Vector cost is so high here, because of the j,i access. We should vectorize
the outer loop, but we fail because of some unsupported features: unknown inner
loop bound, need for versioning (for alias) in outer loop.)
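
For illustration only: a minimal C sketch of the comparison described above.
The function and parameter names are hypothetical, and this is not the
vectorizer's actual cost model, which weighs more than this one inequality;
it only restates the rule that one vector iteration must cost less than the
vectorization factor times one scalar iteration.

```c
/* Nonzero if one vector iteration is cheaper than vf scalar iterations.  */
int
vector_iteration_profitable (int vec_iter_cost, int scalar_iter_cost, int vf)
{
  return vec_iter_cost < scalar_iter_cost * vf;
}
```

For the numbers in this report, vector_iteration_profitable (2056, 4, 4)
compares 2056 against 16 and returns 0.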


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43431

[Bug tree-optimization/43436] Missed vectorization: unhandled data-ref

2010-03-28 Thread irar at il dot ibm dot com



--- Comment #2 from irar at il dot ibm dot com  2010-03-28 10:58 ---
(In reply to comment #0)

> sub_hfyu_median_prediction.c:18: note: not vectorized: unhandled data-ref
>
> Looking with GDB at it, I get:
> (gdb) p debug_data_references (datarefs)
> (Data Ref:
>   stmt: D.2736_16 = *D.2735_15;
>   ref: *D.2735_15;
>   base_object: *src1_14(D);
>   Access function 0: {0B, +, 1}_1
> )
> (Data Ref:
>   stmt:
>   ref:
>   base_object:
> )
>
> I think it is the dst data ref that is NULL.  Might be an aliasing
> problem for the data dep analysis, but still, the data ref should be
> analyzed correctly first.

Data refs analysis fails because of the function call in the loop.

The vectorizer should check the return value of
compute_data_dependences_for_loop() and print some better error message though.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43436

[Bug tree-optimization/43436] Missed vectorization: unhandled data-ref

2010-03-28 Thread irar at il dot ibm dot com



--- Comment #3 from irar at il dot ibm dot com  2010-03-28 11:07 ---
(In reply to comment #1)

> hadamard8_diff.c:44: note: not vectorized: unhandled data-ref

There is a function call in this loop as well.

> hadamard8_diff.c:26: note: not vectorized: data ref analysis failed
> D.2771_12 = *D.2770_11;

Scalar evolution analysis fails here with:
failed: evolution of base is not affine.

  D.2768_8 = i_361 * stride_7(D);
  D.2769_9 = (long unsigned int) D.2768_8;
  D.2770_11 = src_10(D) + D.2769_9;
  D.2771_12 = *D.2770_11;

stride is a function parameter.
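
For illustration, a source loop of roughly the following shape would give an
address computation like the one in the dump: the index is multiplied by a
stride that is unknown at compile time. This is a hypothetical reduction, not
the code from hadamard8_diff.c; the function and variable names are made up.

```c
/* Hypothetical example: each access is src + i * stride, where stride is
   only known at run time because it is a function parameter.  */
int
sum_strided (const unsigned char *src, int stride, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += src[i * stride];
  return sum;
}
```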


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43436

[Bug tree-optimization/43543] Reorder the statements in the loop can vectorize it

2010-03-28 Thread irar at il dot ibm dot com



--- Comment #1 from irar at il dot ibm dot com  2010-03-28 11:16 ---
Looks similar to PR 32806.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43543

[Bug tree-optimization/43436] Missed vectorization: unhandled data-ref

2010-03-28 Thread irar at il dot ibm dot com



--- Comment #6 from irar at il dot ibm dot com  2010-03-28 18:05 ---
(In reply to comment #4)
> What about fixing the diagnostic message like this:
>

It would be nice to do the same for SLP (compute_data_dependences_for_bb) for
completeness.

Thanks,
Ira

> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index 37ae9b5..44248b3 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -1866,10 +1866,21 @@ vect_analyze_data_refs (loop_vec_info loop_vinfo,
> bb_vec_info bb_vinfo)
>
>    if (loop_vinfo)
>      {
> +      bool res;
> +
>        loop = LOOP_VINFO_LOOP (loop_vinfo);
> -      compute_data_dependences_for_loop (loop, true,
> -                                         &LOOP_VINFO_DATAREFS (loop_vinfo),
> -                                         &LOOP_VINFO_DDRS (loop_vinfo));
> +      res = compute_data_dependences_for_loop
> +       (loop, true, &LOOP_VINFO_DATAREFS (loop_vinfo),
> +        &LOOP_VINFO_DDRS (loop_vinfo));
> +
> +      if (!res)
> +        {
> +          if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
> +           fprintf (vect_dump, "not vectorized: loop contains function calls"
> +                    " or data references that cannot be analyzed");
> +          return false;
> +        }
> +
>        datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
>      }
>    else
>


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43436

[Bug tree-optimization/43436] Missed vectorization: unhandled data-ref

2010-03-28 Thread irar at il dot ibm dot com



--- Comment #7 from irar at il dot ibm dot com  2010-03-28 18:22 ---
(In reply to comment #5)
> When defining the missing function like this:
>
> static inline int mid_pred(int a, int b, int c)
> {
>     int t= (a-b)&((a-b)>>31);
>     a-=t;
>     b+=t;
>     b-= (b-c)&((b-c)>>31);
>     b+= (a-b)&((a-b)>>31);
>
>     return b;
> }
>
> The vectorization reports: not vectorized: unsupported use in stmt.

Yes, we have unsupported cycles for l and lt, since they don't match a regular
reduction pattern.

 
> When this function is defined like this:
> static inline int mid_pred(int a, int b, int c)
> {
>   if(a>b){
>     if(c>b){
>       if(c>a) b=a;
>       else    b=c;
>     }
>   }else{
>     if(b>c){
>       if(c>a) b=c;
>       else    b=a;
>     }
>   }
>   return b;
> }
>
> the vectorizer stops with: not vectorized: control flow in loop.
 

if-conversion fails with 
l_34 = *D.2750_33;
tree could trap...
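
For reference, the two quoted definitions of mid_pred are meant to compute the
same median of three. The sketch below checks that on a small range. It
assumes the operators in the quoted source are &, >> and >, and that right
shift of a negative int is arithmetic (as it is with GCC); the suffixed
function names and the checking helper are hypothetical.

```c
/* Branchless variant: relies on (x >> 31) being all ones for negative x.  */
int
mid_pred_branchless (int a, int b, int c)
{
  int t = (a - b) & ((a - b) >> 31);
  a -= t;                               /* now a >= b */
  b += t;
  b -= (b - c) & ((b - c) >> 31);       /* b = max (b, c) */
  b += (a - b) & ((a - b) >> 31);       /* b = min (a, b) */
  return b;
}

/* Variant with explicit control flow.  */
int
mid_pred_branchy (int a, int b, int c)
{
  if (a > b)
    {
      if (c > b)
        {
          if (c > a) b = a;
          else       b = c;
        }
    }
  else
    {
      if (b > c)
        {
          if (c > a) b = c;
          else       b = a;
        }
    }
  return b;
}

/* Returns 1 if both variants agree for all triples in [lo, hi].  */
int
mid_pred_variants_agree (int lo, int hi)
{
  for (int a = lo; a <= hi; a++)
    for (int b = lo; b <= hi; b++)
      for (int c = lo; c <= hi; c++)
        if (mid_pred_branchless (a, b, c) != mid_pred_branchy (a, b, c))
          return 0;
  return 1;
}
```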


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43436

[Bug tree-optimization/42652] vectorizer created unaligned vector insns

2010-02-22 Thread irar at il dot ibm dot com



--- Comment #17 from irar at il dot ibm dot com  2010-02-22 09:01 ---
Is there a way to pass alignment information similar to PR 39954?

Otherwise, a proper fix would be some inter-procedural analysis... In the
meantime, we can do intra-procedural analysis and fail when we reach a function
argument, i.e., use runtime checks. We already have several types of
versioning, so adding another one will complicate things even more, and will
not always be possible (because of code size constraints).
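
To illustrate what a runtime check with versioning means at the source level,
here is a hand-written sketch. It is not what the vectorizer emits; the
function names and the 16-byte requirement are assumptions for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical runtime test: is P aligned to 16 bytes?  */
int
is_aligned_16 (const void *p)
{
  return ((uintptr_t) p & 15) == 0;
}

/* Two versions of the same loop, selected at run time.  */
void
scale_by_two (float *p, size_t n)
{
  if (is_aligned_16 (p))
    {
      /* Version that could use aligned vector accesses.  */
      for (size_t i = 0; i < n; i++)
        p[i] *= 2.0f;
    }
  else
    {
      /* Fallback version: scalar code or unaligned accesses.  */
      for (size_t i = 0; i < n; i++)
        p[i] *= 2.0f;
    }
}
```

Both branches contain the whole loop, which is where the code size cost
mentioned above comes from.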

Thanks,
Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42652

[Bug tree-optimization/43074] [4.4/4.5 Regression] ICE in vectorizable_reduction, at tree-vect-loop.c:3491

2010-02-15 Thread irar at il dot ibm dot com



-- 

irar at il dot ibm dot com changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |irar at il dot ibm dot com
   |dot org |
 Status|UNCONFIRMED |ASSIGNED
 Ever Confirmed|0   |1
   Last reconfirmed|-00-00 00:00:00 |2010-02-15 12:39:54
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43074

[Bug tree-optimization/42846] GCC sometimes ignores information about pointer target alignment

2010-01-23 Thread irar at il dot ibm dot com



--- Comment #3 from irar at il dot ibm dot com  2010-01-24 07:39 ---
This has already been discussed in PR 41464.

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42846

[Bug tree-optimization/42652] vectorizer created unaligned vector insns

2010-01-18 Thread irar at il dot ibm dot com



--- Comment #13 from irar at il dot ibm dot com  2010-01-18 12:17 ---
Does something like this make sense? (With this patch we will never use peeling
for function parameters, unless the builtin returns OK to peel for packed
types). 

Index: tree-vect-data-refs.c
===
--- tree-vect-data-refs.c   (revision 155880)
+++ tree-vect-data-refs.c   (working copy)
@@ -1010,10 +1010,29 @@ vector_alignment_reachable_p (struct dat
   tree type = (TREE_TYPE (DR_REF (dr)));
   tree ba = DR_BASE_OBJECT (dr);
   bool is_packed = false;
+  tree tmp = TREE_TYPE (DR_BASE_ADDRESS (dr));

   if (ba)
     is_packed = contains_packed_reference (ba);

+  is_packed = is_packed || contains_packed_reference (DR_BASE_ADDRESS (dr));
+
+  if (!is_packed)
+{
+  while (tmp)
+{
+  is_packed = TYPE_PACKED (tmp);
+  if (is_packed)
+break;
+
+  tmp = TREE_TYPE (tmp);
+}
+}
+
+  if (TREE_CODE (DR_BASE_ADDRESS (dr)) == SSA_NAME
+      && TREE_CODE (SSA_NAME_VAR (DR_BASE_ADDRESS (dr))) == PARM_DECL)
+    is_packed = true;
+
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "Unknown misalignment, is_packed = %d",is_packed);
   if (targetm.vectorize.vector_alignment_reachable (type, is_packed))


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42652

[Bug tree-optimization/42652] vectorizer created unaligned vector insns

2010-01-13 Thread irar at il dot ibm dot com



--- Comment #10 from irar at il dot ibm dot com  2010-01-13 09:35 ---
Yes, I understand that we can't assume that an access is aligned if we can't
prove it's aligned. I don't understand how we can prove that a COMPONENT_REF is
aligned, i.e., if there is a way to check if a struct is packed, or we'd better
decide that we always use versioning for COMPONENT_REFs?

Thanks,
Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42652

[Bug tree-optimization/42709] [4.5 Regression] error: type mismatch in pointer plus expression

2010-01-13 Thread irar at il dot ibm dot com



-- 

irar at il dot ibm dot com changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |irar at il dot ibm dot com
   |dot org |
 Status|NEW |ASSIGNED
   Last reconfirmed|2010-01-12 16:12:10 |2010-01-13 11:36:55
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42709

[Bug tree-optimization/42652] vectorizer created unaligned vector insns

2010-01-12 Thread irar at il dot ibm dot com



--- Comment #8 from irar at il dot ibm dot com  2010-01-12 08:08 ---
So, to be on the safe side, we should assume that COMPONENT_REFs are not
naturally aligned and never use peeling for them?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42652

[Bug tree-optimization/42652] vectorizer created unaligned vector insns

2010-01-10 Thread irar at il dot ibm dot com



--- Comment #5 from irar at il dot ibm dot com  2010-01-10 08:22 ---
In vector_alignment_reachable_p() we check if an access is packed using
contains_packed_reference(). For packed accesses we return false, meaning
alignment is unreachable and peeling cannot be used.

In the attached testcase contains_packed_reference() returns false for
palette_5.
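
As a reminder of why packed accesses matter here, the sketch below shows how
the packed attribute changes the offset of a member. The struct layouts are
hypothetical and are not taken from the attached testcase.

```c
#include <stddef.h>

/* Hypothetical packed struct: entries starts right after the char.  */
struct packed_palette
{
  char tag;
  int entries[4];
} __attribute__ ((packed));

/* Same members without the attribute: padding is inserted before entries.  */
struct plain_palette
{
  char tag;
  int entries[4];
};

size_t
packed_entries_offset (void)
{
  return offsetof (struct packed_palette, entries);
}

size_t
plain_entries_offset (void)
{
  return offsetof (struct plain_palette, entries);
}
```

In the packed case the int array sits at offset 1, so no amount of peeling by
whole elements can bring its accesses to a vector-size boundary.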


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42652

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2010-01-10 Thread irar at il dot ibm dot com



--- Comment #43 from irar at il dot ibm dot com  2010-01-10 13:43 ---
Since -O2 -ftree-vectorize doesn't cause bad code, it has to be some other
optimization on top of vectorized code that causes the problem.

Bad code is generated when the alignment of 'reduce' is forced and the
reduction 'sum(reduce)' is vectorized. However, the result of the reduction is
correct, and the vector store element does not do any damage (as far as I can
see in debugger). So, the vector stores don't corrupt anything.

The part that goes wrong is in the scalar code that implements the decision on
whether to add the (correctly computed) reduction value to temp[9] and
temp[10]. The code that sets the condition (which, by the way, does not use
any vectorized code) does not use the values of reduce[9] and reduce[10], even
though the value of the condition depends on them:

reduce(1:3) = -1
reduce(4:6) = 0
reduce(7:8) = 5
reduce(9:10) = 10
...
WHERE (reduce > 6) temp = temp + sum(reduce)
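
The intended scalar semantics of this masked assignment can be sketched in C
as follows. This is only a model of the Fortran statement, assuming the
comparison is reduce > 6 (the one that selects elements 9 and 10); the
function name is hypothetical and indices are 0-based here.

```c
#define N 10

/* C model of: WHERE (reduce > 6) temp = temp + sum(reduce)  */
void
where_add_sum (int temp[N], const int reduce[N])
{
  int sum = 0;

  /* The reduction: this is the part that gets vectorized.  */
  for (int i = 0; i < N; i++)
    sum += reduce[i];

  /* The masked update: every element of reduce must be compared with 6.  */
  for (int i = 0; i < N; i++)
    if (reduce[i] > 6)
      temp[i] += sum;
}
```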


Here is the code for adding the result of the sum(reduce) to temp[9]:

L29:
lbz r11,152(r1)  # **
cmpwi cr7,r11,0  # reduce > 6 ?
beq cr7,L30
lwz r11,240(r1)  # load temp[9]
add r11,r11,r9   # temp[9] + sum(reduce) 
stw r11,240(r1)  # store temp[9]

** - The calculation of 152(r1) is based only on the value of reduce[8]! The
values of reduce[9] and reduce[10] are only used in the reduction calculation
and not compared to 6 at all.

In case we don't vectorize (but force the alignment), there is cmpwi cr7,r29,6
instruction, where r29 is reduce[9] (and the code is correct). The same happens
when the alignment of reduce is not forced and the reduction is vectorized
using peeling. 

I.e., as far as I can see, in the bad code, the comparisons of reduce[9] and
reduce[10] with 6 do not exist. I wonder which optimization can be responsible
for that?


Also, some values of reduce are copied to a temporary array and are further
compared with 6. In the version with peeling the values that are copied are
reduce[4:8]: there is no need to keep the first three, and the last two are
kept in registers and compared to 6 (and also used in the reduction epilogue).
In the bad version the kept values are reduce[3:8], and reduce[8] is put before
the values of reduce[3:7] (reduce[3:7] are in 276(r1) to 292(r1), and reduce[8]
is in 272(r1)). (And in the bad code the last two values, reduce[9] and
reduce[10], are only used in the reduction epilogue.)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2010-01-05 Thread irar at il dot ibm dot com



--- Comment #42 from irar at il dot ibm dot com  2010-01-05 09:09 ---
So, it's enough to force alignment of reduce only (and to vectorize its loop)
to get wrong code. On the other hand, the result of the vectorized loop is
correct, and the problem is in choosing the correct index of temp.

The assembly looks fine to me. So, for me the only way to proceed is to debug.
Dominique, is it possible to access your machine?

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug middle-end/41956] Segfault in vectorizer

2009-12-30 Thread irar at il dot ibm dot com



--- Comment #7 from irar at il dot ibm dot com  2009-12-30 10:16 ---
The bug is in SLP load permutation analysis. I am testing a patch.


-- 

irar at il dot ibm dot com changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |irar at il dot ibm dot com
   |dot org |
 Status|NEW |ASSIGNED
   Last reconfirmed|2009-12-29 17:42:02 |2009-12-30 10:16:22
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41956

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2009-12-23 Thread irar at il dot ibm dot com



--- Comment #40 from irar at il dot ibm dot com  2009-12-23 14:49 ---
(In reply to comment #39)
> I have regtested the patch in comment #31 and I have ~75 regressions on
> x86_64-apple-darwin10 in the gcc vect test suite (~100 on
> powerpc-apple-darwin9). Is this expected? and do you want the list?

Yes, it is expected. It is not a bug-fixing patch (nor are the rest of the
hacks I asked you to check); it disables a feature, alignment forcing, so some
tests are supposed to fail.

Thanks,
Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2009-12-22 Thread irar at il dot ibm dot com



--- Comment #30 from irar at il dot ibm dot com  2009-12-22 11:42 ---
We can try to verify the alignment issue by applying the two hacks I am
attaching. 

The first one disables alignment forcing for all the data-refs (and marks the
alignment as unknown). The loops are still vectorizable using peeling -
hopefully, they are also vectorizable on darwin. So, if the results are correct
and the two loops are vectorized, then the problem is in alignment. If the
results are incorrect, the problem is in vectorization.

The second one still forces alignment of the vectorized arrays, but not of the
other arrays. With -fdump-tree-vect-details (or verbosity 9) it prints "force
alignment of data-ref", so we can verify that the correct arrays were aligned
(reduce line 11 and temp line 5). So, here, the loops should be vectorized as
before and only the alignment of non-vectorized arrays will not be forced.

Dominique, could you please check this?

Thanks,
Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2009-12-22 Thread irar at il dot ibm dot com



--- Comment #31 from irar at il dot ibm dot com  2009-12-22 11:43 ---
Created an attachment (id=19370)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19370&action=view)
disable alignment forcing


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2009-12-22 Thread irar at il dot ibm dot com



--- Comment #32 from irar at il dot ibm dot com  2009-12-22 11:44 ---
Created an attachment (id=19371)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19371&action=view)
force alignment of vectorized arrays only


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2009-12-22 Thread irar at il dot ibm dot com



--- Comment #36 from irar at il dot ibm dot com  2009-12-23 07:54 ---
Thanks!
So, it is alignment of the vectorized arrays. I'd like to do two more checks:
1. Just force alignment of the two arrays (temp and reduce) and do not
vectorize.
2. Force alignment of reduce only (and vectorize both loops).
I am attaching the hacks. Could you please check this as well?

Thanks,
Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2009-12-22 Thread irar at il dot ibm dot com



--- Comment #37 from irar at il dot ibm dot com  2009-12-23 07:54 ---
Created an attachment (id=19377)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19377&action=view)
Force alignment but don't vectorize


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2009-12-22 Thread irar at il dot ibm dot com



--- Comment #38 from irar at il dot ibm dot com  2009-12-23 07:55 ---
Created an attachment (id=19378)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19378&action=view)
Force alignment of reduce only


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2009-12-20 Thread irar at il dot ibm dot com



--- Comment #23 from irar at il dot ibm dot com  2009-12-20 12:18 ---
The code that now gets vectorized is the summation of array 'reduce':
sum(reduce). It looks like the problem is with adding the reduction result to
the correct index of 'temp' (scalar code), and not with the reduction itself.
Could you please verify that by printing the reduction result?

Thanks,
Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2009-12-20 Thread irar at il dot ibm dot com



--- Comment #26 from irar at il dot ibm dot com  2009-12-20 13:46 ---
I think the problem is in alignment. We force alignment of temp.6 and temp.20 -
the arrays of relevant comparison results - even though we don't vectorize
their loop. The decision whether we can force alignment is made in
vect_can_force_dr_alignment_p(), and it seems that the only target specific
query there is comparison with MAX_STACK_ALIGNMENT.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2009-12-20 Thread irar at il dot ibm dot com



--- Comment #28 from irar at il dot ibm dot com  2009-12-20 13:59 ---
Hm, I don't know, but this is my best guess - we change something in the code
that goes wrong...

We also force alignment of reduce, but the reduction computation looks ok.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2009-12-16 Thread irar at il dot ibm dot com



--- Comment #21 from irar at il dot ibm dot com  2009-12-16 12:01 ---
Thanks. 
I'll be able to look at this only on Sunday due to holidays.

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2009-12-15 Thread irar at il dot ibm dot com



--- Comment #7 from irar at il dot ibm dot com  2009-12-15 08:25 ---
I can't reproduce it with current mainline on powerpc64-suse-linux. Could you
please attach the vectorizer dump? Does the good old version get vectorized? If
so, could you please attach it as well?

Thanks,
Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2009-12-15 Thread irar at il dot ibm dot com



--- Comment #11 from irar at il dot ibm dot com  2009-12-15 10:59 ---
It looks like it has to be my patch that enables vectorization of conditions:

r149806 | irar | 2009-07-20 14:59:10 +0300 (Mon, 20 Jul 2009) | 19 lines


* tree-vectorizer.h (vectorizable_condition): Add parameters.
* tree-vect-loop.c (vect_is_simple_reduction): Support COND_EXPR.
(get_initial_def_for_reduction): Likewise.
(vectorizable_reduction): Skip the check of first operand in case
of COND_EXPR. Add check that it is outer loop vectorization if
nested cycle was detected. Call vectorizable_condition() for
COND_EXPR. If reduction epilogue cannot be created do not fail for
nested cycles (if it is not double reduction). Assert that there
is only one type in the loop in case of COND_EXPR. Call
vectorizable_condition() to vectorize COND_EXPR.
* tree-vect-stmts.c (vectorizable_condition): Update comment.
Add parameters. Allow nested cycles if called from
vectorizable_reduction(). Use reduction vector variable if provided.
(vect_analyze_stmt): Call vectorizable_reduction() before
vectorizable_condition().
(vect_transform_stmt): Update call to vectorizable_condition().

I'll try to find out what's wrong with it.

Thanks,
Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2009-12-15 Thread irar at il dot ibm dot com



--- Comment #13 from irar at il dot ibm dot com  2009-12-15 13:07 ---
(In reply to comment #12)
> > Looks that it has to be my patch that enables vectorization of conditions:
> I am doing a clean bootstrap of C and FORTRAN of revision 149805 to see if the
> test works for it (allow for ~6h on my poor G5). Then I'll update to 149806.

1) Thanks. I got confused by the var names, but actually there is no COND_EXPR
there. But still, it can be this patch. So it is worth checking.

2) The vectorizer's code for powerpc64-suse-linux I got is identical to
darwin's except that the first has the calls:

  _gfortran_set_args (argc_1(D), argv_2(D));
  _gfortran_set_options (8, &options.36[0]);

at the beginning

and the second one has this bb:

<bb 43>:
  dt_parm.33.common.filename = &"where_2.f90"[1]{lb: 1 sz: 1};
  dt_parm.33.common.line = 20;
  dt_parm.33.common.flags = 128;
  dt_parm.33.common.unit = 6;
  _gfortran_st_write (&dt_parm.33);
  parm.34.dtype = 265;
  parm.34.dim[0].lbound = 1;
  parm.34.dim[0].ubound = 10;
  parm.34.dim[0].stride = 1;
  parm.34.data = &temp[0];
  parm.34.offset = -1;
  _gfortran_transfer_array (&dt_parm.33, &parm.34, 4, 0);
  _gfortran_st_write_done (&dt_parm.33);

(I am attaching my dump).

3) The only difference between the targets I am aware of is natural alignment,
but we don't do peeling, so it shouldn't make any difference here.

4) We do force alignment. In the revision range there is this patch
that may be somehow related:

r149853 | pbrook | 2009-07-21 15:35:38 +0300 (Tue, 21 Jul 2009) | 12 lines

2009-07-21  Paul Brook p...@codesourcery.com

gcc/
* tree-vectorizer.c (increase_alignment): Handle nested arrays.
Terminate debug dump with newline.

gcc/testsuite/
* gcc.dg/vect/section-anchors-nest-1.c: New test.
* lib/target-supports.exp (check_effective_target_section_anchors):
Add arm*-*-*.

5) Also looking at the assembly may help. Could you please attach it as well?

Thanks,
Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2009-12-15 Thread irar at il dot ibm dot com



--- Comment #14 from irar at il dot ibm dot com  2009-12-15 13:08 ---
Created an attachment (id=19311)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19311&action=view)
powerpc64-suse-linux vect dump


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64

2009-12-15 Thread irar at il dot ibm dot com



--- Comment #16 from irar at il dot ibm dot com  2009-12-15 13:35 ---
But in comment #5 you wrote that it passes with the print, right? So, does this
dump contain correct or incorrect code?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082

[Bug tree-optimization/42286] October 23rd change to tree-ssa-pre.c breaks calculix on powerpc with -ffast-math

2009-12-06 Thread irar at il dot ibm dot com



--- Comment #3 from irar at il dot ibm dot com  2009-12-06 13:25 ---
On powerpc64-suse-linux with current trunk calculix failed after a couple of
minutes with
-O3  -maltivec -ffast-math
-O3  -maltivec -ffast-math -fno-tree-vectorize
-O2  -maltivec -ffast-math
-O1  -maltivec -ffast-math

It is currently running for about an hour with
-O0  -maltivec -ffast-math

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42286

[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-30 Thread irar at il dot ibm dot com



--- Comment #20 from irar at il dot ibm dot com  2009-11-30 08:52 ---
Actually, PAREN_EXPRs are vectorizable (the support was added by you, Richard,
in your original PAREN_EXPR patch
http://gcc.gnu.org/viewcvs?limit_changes=0&view=revision&revision=132515 ).

The problem here is that vectorizable_assignment does not support multiple
types. The attached patch adds this support, but I don't know if the patch is
suitable for the current stage...

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108

[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-30 Thread irar at il dot ibm dot com



--- Comment #21 from irar at il dot ibm dot com  2009-11-30 08:54 ---
Created an attachment (id=19183)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19183&action=view)
Multiple types support patch


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108

[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-30 Thread irar at il dot ibm dot com



--- Comment #23 from irar at il dot ibm dot com  2009-11-30 12:20 ---
Applied:
http://gcc.gnu.org/viewcvs?limit_changes=0&view=revision&revision=154794

Thanks,
Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108

[Bug middle-end/42193] [4.5 Regression] 454.calculix in SPEC CPU 2006 failed to compile at -O3

2009-11-29 Thread irar at il dot ibm dot com



-- 

irar at il dot ibm dot com changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |irar at il dot ibm dot com
   |dot org |
 Status|NEW |ASSIGNED
   Last reconfirmed|2009-11-27 10:36:38 |2009-11-29 12:24:11
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42193

[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-23 Thread irar at il dot ibm dot com



--- Comment #18 from irar at il dot ibm dot com  2009-11-23 09:02 ---
I tried to vectorize eval.f90 with 4.3 and mainline on x86_64-suse-linux. In
both cases no loop gets vectorized in subroutine eval. The k loop is not
vectorizable because the step of x is unknown (function argument), and scalar
evolution analysis fails to analyze it. The j loop is not vectorized first of
all because of the k loop unknown loop bound (this is on our todo list).

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108

[Bug tree-optimization/41879] [4.5 Regression] 172.mgrid regression, vectorizer prevents predictive commoning

2009-11-11 Thread irar at il dot ibm dot com



--- Comment #5 from irar at il dot ibm dot com  2009-11-12 07:51 ---
(In reply to comment #4)
> I didn't check yet.  We'll work on a simple cost-model integration of
> predcom.

You mean, vectorizer cost model will take predcom into account?

If the vectorization is not profitable (vs. scalar without predcom), it can be
a matter of vectorizer cost model tuning (looks easier).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41879

[Bug tree-optimization/41879] [4.5 Regression] 172.mgrid regression, vectorizer prevents predictive commoning

2009-11-10 Thread irar at il dot ibm dot com



--- Comment #3 from irar at il dot ibm dot com  2009-11-10 10:02 ---
(In reply to comment #0)
> This causes mgrid score to drop
> by almost 40% on x86_64 and the vectorized code is pretty bad because it
> uses unaligned accesses.

Is the vectorized code worse than the scalar one even without predcom?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41879

[Bug tree-optimization/41464] vector loads are unnecessarily split into high and low loads

2009-09-27 Thread irar at il dot ibm dot com



--- Comment #4 from irar at il dot ibm dot com  2009-09-27 08:06 ---
(In reply to comment #1)
> The interesting thing is that data-ref analysis sees 128bit alignment but
> the vectorizer still produces
>   vect_var_.24_59 = M*vect_p.20_57{misalignment: 0};
>   D.2564_12 = *D.2563_11;
>   vect_var_.25_61 = vect_var_.24_59 * vect_cst_.26_60;
>   D.2565_13 = D.2564_12 * 2.299523162841796875e+0;
>   M*vect_p.27_64{misalignment: 0} = vect_var_.25_61;
> thus, unknown misalignment.
> (instantiate_scev
>   (instantiate_below = 3)
>   (evolution_loop = 1)
>   (chrec = {i_10(D), +, 4}_1)
>   (res = {i_10(D), +, 4}_1))
> base_address: i_10(D)
> offset from base address: 0
> constant offset from base address: 0
> step: 4
> aligned to: 128
> base_object: *i_10(D)
> Creating dr for *D.2562_7
>   (res = {f_6(D), +, 4}_1))
> base_address: f_6(D)
> offset from base address: 0
> constant offset from base address: 0
> step: 4
> aligned to: 128
> base_object: *f_6(D)
> t2.i:5: note: === vect_enhance_data_refs_alignment ===
> t2.i:5: note: Vectorizing an unaligned access.
> t2.i:5: note: Vectorizing an unaligned access.

"aligned to" refers to the offset misalignment and not to the misalignment of
the base.
Attribute aligned works only for arrays, i.e., declarations, and not for
pointer arguments. For pointers the vectorizer only checks TYPE_ALIGN_UNIT of
the base type.
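
A small illustration of the difference between a declaration and a pointer
argument. This is a sketch with hypothetical names, not the testcase from the
PR.

```c
#include <stdint.h>

/* The attribute on a declaration is honored: the array itself is placed
   on a 16-byte boundary.  */
static float table[64] __attribute__ ((aligned (16)));

int
table_is_16_byte_aligned (void)
{
  return ((uintptr_t) table % 16) == 0;
}

/* For a pointer parameter, the type only promises the natural alignment
   of float, whatever the caller happens to pass.  */
void
scale (float *p, int n)
{
  for (int i = 0; i < n; i++)
    p[i] *= 2.0f;
}
```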

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464

[Bug tree-optimization/41464] vector loads are unnecessarily split into high and low loads

2009-09-27 Thread irar at il dot ibm dot com



--- Comment #6 from irar at il dot ibm dot com  2009-09-27 09:56 ---
(In reply to comment #5)
  
> > "aligned to" refers to the offset misalignment and not to the misalignment
> > of base.
> Hmm, I believe it refers to base + offset + constant offset.
tree-data-ref.h:
  /* Alignment information.  ALIGNED_TO is set to the largest power of two
 that divides OFFSET.  */
  tree aligned_to;

tree-data-ref.c:
DR_ALIGNED_TO (dr) = size_int (highest_pow2_factor (offset_iv.base));
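
The idea of "the largest power of two that divides OFFSET" can be shown on
plain integers. The function below is a hypothetical stand-in, not GCC's
highest_pow2_factor, which works on trees; note also that the dump quoted
earlier reports 128 for a zero offset, which this sketch does not model.

```c
/* Hypothetical stand-in: isolate the lowest set bit of OFFSET, which is the
   largest power of two dividing it.  Returns 0 for OFFSET == 0.  */
unsigned long
largest_pow2_divisor (unsigned long offset)
{
  return offset & -offset;
}
```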


> > attribute aligned works only for arrays, i.e., declarations, and not for
> > pointer arguments.
> I have to check that - I believe that in principle it should work.
> > For pointers the vectorizer only checks TYPE_ALIGN_UNIT of
> > the base type.
> That should be ok.

But we need TYPE_ALIGN_UNIT to be 16, and we are checking the scalar type here,
so without user-defined alignment it will be 4.
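
At the source level, the difference between the natural alignment of the
scalar type and a user-defined one can be seen with _Alignof. This is only an
analogy to TYPE_ALIGN_UNIT, with hypothetical names.

```c
#include <stddef.h>

/* A float type with user-defined 16-byte alignment.  */
typedef float aligned_float __attribute__ ((aligned (16)));

size_t
plain_float_alignment (void)
{
  return _Alignof (float);
}

size_t
aligned_float_alignment (void)
{
  return _Alignof (aligned_float);
}
```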

Ira

> I guess I have to see what's going on here.
> Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464

[Bug target/41288] [4.5 Regression] gcc.target/x86_64/abi/test_struct_returning.c regressions on -apple-darwin at -m64

2009-09-07 Thread irar at il dot ibm dot com



--- Comment #9 from irar at il dot ibm dot com  2009-09-08 05:51 ---
Looks related to PR 39907.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41288

[Bug tree-optimization/41019] Variate_generator with mt19937 and normal_distribution produces wrong sequence for -O3.

2009-08-13 Thread irar at il dot ibm dot com



--- Comment #10 from irar at il dot ibm dot com  2009-08-13 11:34 ---
Reduced testcase:

#include <stdlib.h>
#include <stdio.h>

#define N 4

long int a[N];
int main ()
{
  int k;

  for (k = 0; k < N; ++k)
    a[k] = a[k] != 5 ? 12 : 10;

  for (k = 0; k < N; ++k)
    printf ("%u ", a[k]);

  printf ("\n");

  return 0;
}

% gcc -O3 t.c
% ./a.out
0 0 0 0

% gcc -O2 t.c
% ./a.out
12 12 12 12

If the type of 'a' is int, there is no problem. The vectorizer produces almost
the same code in both cases (except for number of iterations and types).
I am attaching the assembly for int and long int versions.

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41019

[Bug tree-optimization/41019] Variate_generator with mt19937 and normal_distribution produces wrong sequence for -O3.

2009-08-13 Thread irar at il dot ibm dot com



--- Comment #11 from irar at il dot ibm dot com  2009-08-13 11:36 ---
Created an attachment (id=18350)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18350&action=view)
The assembly for the long int version (wrong code)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41019

[Bug tree-optimization/41019] Variate_generator with mt19937 and normal_distribution produces wrong sequence for -O3.

2009-08-13 Thread irar at il dot ibm dot com



--- Comment #12 from irar at il dot ibm dot com  2009-08-13 11:37 ---
Created an attachment (id=18351)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18351&action=view)
The assembly for the int version (correct)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41019

[Bug tree-optimization/41019] Variate_generator with mt19937 and normal_distribution produces wrong sequence for -O3.

2009-08-12 Thread irar at il dot ibm dot com



--- Comment #6 from irar at il dot ibm dot com  2009-08-12 12:14 ---
Looks like a problem in data-ref analysis:

Creating dr for this_6(D)->_M_x[__k_87]
...
base_address: this_6(D)
offset from base address: 0
constant offset from base address: 0
step: 8
aligned to: 128
base_object: this_6(D)->_M_x[0]

And the vectorizer creates accesses relative to this_6(D) (base_address
above) with zero offset (instead of this_6(D)->_M_x[0] or with an offset of
_M_x).
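
To illustrate the difference, here is a minimal sketch in C. The struct
layout and the helper elem_addr are hypothetical and are not taken from the
testcase or from GCC; they only show that an address built from the object
pointer plus k * step matches the array element only when the offset of the
member is added as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: an array member that does not start at offset 0. */
struct engine_state
{
  size_t pos;
  unsigned long x[8];
};

/* Address of element k computed as base + member_offset + k * step,
   where step is the element size.  Hypothetical helper for illustration.  */
static unsigned long *
elem_addr (struct engine_state *e, size_t member_offset, size_t k)
{
  assert (e != NULL);
  char *base = (char *) e;
  return (unsigned long *) (base + member_offset
                            + k * sizeof (unsigned long));
}
```

With member_offset = offsetof (struct engine_state, x) the result equals
&e->x[k]; with member_offset = 0 it points somewhere else inside the object.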

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41019

[Bug tree-optimization/41008] [4.5 Regression] ICE in vect_is_simple_reduction, at tree-vect-loop.c:1708

2009-08-09 Thread irar at il dot ibm dot com



--- Comment #3 from irar at il dot ibm dot com  2009-08-09 12:15 ---
Fixed.


-- 

irar at il dot ibm dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41008

[Bug middle-end/37150] vectorizer misses some loops

2009-08-06 Thread irar at il dot ibm dot com



--- Comment #10 from irar at il dot ibm dot com  2009-08-06 10:49 ---
Yes. The problem is that only a basic implementation was added. To vectorize
this code several improvements must be made: support stmt group sizes greater
than the vector size, allow loads and stores to the same location, initiate SLP
analysis from groups of loads, support misaligned accesses, etc.

Finding a benchmark could really help to push these items to the top of
vectorizer's todo list.

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150

[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-28 Thread irar at il dot ibm dot com



--- Comment #41 from irar at il dot ibm dot com  2009-07-28 08:12 ---
That requires pattern recognition. MIN/MAX_EXPR are recognized by the first
phiopt pass, so MIN/MAXLOC should either be recognized there as well or in the
vectorizer. (The phiopt pass transforms the if clause to MIN/MAX_EXPR. The
vectorizer gets a COND_EXPR after the if-conversion pass.)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067

[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-27 Thread irar at il dot ibm dot com



--- Comment #34 from irar at il dot ibm dot com  2009-07-27 08:36 ---
(In reply to comment #33)
> Using the example from comment 23 with
...
> gfortran shows: test.f90:12: note: not vectorized: unsupported use in stmt.
> and needs 2.272s. (By comparison, 4.4 needs 3.688s.)

This is for the inner loop vectorization. For the outer loop we get:
tmp.f90:11: note: not vectorized: control flow in loop.
because of the if's. Maybe loop unswitching can help us. 
Vectorizable outer-loops look like this:

(pre-header)
   |
  header <---+
   |         |
  inner-loop |
   |         |
  tail ------+
   |
(exit-bb)


Does ifort vectorize the exact same implementation of minloc?

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067

[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-27 Thread irar at il dot ibm dot com



--- Comment #38 from irar at il dot ibm dot com  2009-07-27 12:44 ---
I am not sure that that kind of computation can be generated automatically,
since in general the order of calculation of cond_expr cannot be changed.

However, the loop can be split:

  for (i = 0; i < end; i++)
    if (arr[i] < limit)
      limit = arr[i];

  for (i = 0; i < end; i++)
    if (arr[i] == limit)
      {
        pos = i + 1;
        break;
      }

making the first loop vectorizable (inner-most loop vectorization).
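
A self-contained sketch of the split, for comparison with a single-loop
version. The function names and the choice to seed limit with arr[0] are
assumptions made for the example, not the code gfortran generates. Both
functions return the 1-based position of the first minimum.

```c
#include <assert.h>
#include <stddef.h>

/* Single loop: the value and the position are updated under one condition.  */
static size_t
minloc_fused (const int *arr, size_t end)
{
  assert (end > 0);
  int limit = arr[0];
  size_t pos = 1;
  for (size_t i = 1; i < end; i++)
    if (arr[i] < limit)
      {
        limit = arr[i];
        pos = i + 1;
      }
  return pos;
}

/* Split version: the first loop is a plain MIN reduction, the second
   loop searches for the first element equal to the minimum.  */
static size_t
minloc_split (const int *arr, size_t end)
{
  assert (end > 0);
  int limit = arr[0];
  size_t pos = 0;

  for (size_t i = 1; i < end; i++)
    if (arr[i] < limit)
      limit = arr[i];

  for (size_t i = 0; i < end; i++)
    if (arr[i] == limit)
      {
        pos = i + 1;
        break;
      }
  return pos;
}
```

The first loop of minloc_split carries only the minimum value, so it has the
shape of an ordinary MIN reduction; the search loop stays scalar.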

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067

[Bug tree-optimization/40801] internal compiler error: in vect_get_vec_def_for_stmt_copy, at tree-vect-stmts.c:1096

2009-07-26 Thread irar at il dot ibm dot com



--- Comment #5 from irar at il dot ibm dot com  2009-07-26 07:04 ---
Fixed.


-- 

irar at il dot ibm dot com changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40801

[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-26 Thread irar at il dot ibm dot com



--- Comment #32 from irar at il dot ibm dot com  2009-07-26 07:48 ---
(In reply to comment #30)
> Regarding the just committed inline version: It would be interesting to know
> whether it is vectorizable (with/without -ffinite-math-only [i.e.
> -ffast-math]).

It depends on where it is inlined. It has to be vectorized in an outer loop (see
my previous comment), so it needs another loop around it.

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067

[Bug tree-optimization/40770] Vectorization of complex types, vectorization of sincos missing

2009-07-20 Thread irar at il dot ibm dot com



--- Comment #7 from irar at il dot ibm dot com  2009-07-20 11:18 ---
AFAIU, querying for the component type of a complex type is not difficult to
implement.
I think that loop-based vectorization is preferable here, so we should stay
with a vectorization factor of 2 for doubles.

The next problem is to vectorize 
  D.1611_4 = IMAGPART_EXPR <sincostmp.1_1>;
and
  D.1612_6 = REALPART_EXPR <sincostmp.1_1>;

Currently, we support only loads and stores with IMAGPART/REALPART_EXPR,
vectorizing them as strided accesses, with extract odd and even operations for
loads. So, we will have to support interleaving of non-memory variables. 

Does __builtin_cexpi have a vector implementation? If so, does it return two
vectors?

If not, I guess, we need something like:

  sincostmp.1 = __builtin_cexpi (xd[i]);
  sincostmp.2 = __builtin_cexpi (xd[i+1]);
  v1 = VEC_EXTRACT_EVEN (sincostmp.1, sincostmp.2);
  v2 = VEC_EXTRACT_ODD (sincostmp.1, sincostmp.2);
  sf[i:i+1] = v1;
  cf[i:i+1] = v2;
  i = i + 2;

Or we can use the two vectors from vectorized __builtin_cexpi as parameters of
extract operations.
Does that make sense?
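
A scalar emulation of the even/odd extraction, as a rough sketch only. The
struct pair and the function make_pair are hypothetical stand-ins for a call
that returns a two-component value (they do not compute cexpi), and the four
lanes array plays the role of the two results laid out next to each other.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-component result, standing in for a complex value.  */
struct pair
{
  double re, im;
};

/* Hypothetical stand-in for the scalar call; NOT cexpi.  */
static struct pair
make_pair (double x)
{
  struct pair p = { 2.0 * x, x + 1.0 };
  return p;
}

/* Process two inputs per iteration and de-interleave the results.  */
static void
split_pairs (const double *xd, double *re_out, double *im_out, size_t n)
{
  assert (n % 2 == 0);
  for (size_t i = 0; i < n; i += 2)
    {
      struct pair t1 = make_pair (xd[i]);
      struct pair t2 = make_pair (xd[i + 1]);

      /* The two results side by side: re, im, re, im.  */
      double lanes[4] = { t1.re, t1.im, t2.re, t2.im };

      /* "Extract even": lanes 0 and 2.  */
      re_out[i] = lanes[0];
      re_out[i + 1] = lanes[2];

      /* "Extract odd": lanes 1 and 3.  */
      im_out[i] = lanes[1];
      im_out[i + 1] = lanes[3];
    }
}
```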

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40770

[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-20 Thread irar at il dot ibm dot com



--- Comment #28 from irar at il dot ibm dot com  2009-07-20 12:03 ---
I've just committed a patch that adds support for cond_expr in reductions in
nested cycles (http://gcc.gnu.org/ml/gcc-patches/2009-07/msg01124.html).

cond_expr cannot be vectorized in a reduction of an inner-most loop, because
such a reduction changes the order of computation, and that cannot be done for
cond_expr.
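
A small sketch of why the order matters, written as a scalar emulation and
not as what the vectorizer emits. The function names, the two-lane split and
the naive merge in the epilogue are all assumptions made for the example.
The reduction keeps the last element that is greater than a threshold.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar order: r ends up as the last element greater than t, or -1.  */
static int
last_above_scalar (const int *a, size_t n, int t)
{
  int r = -1;
  for (size_t i = 0; i < n; i++)
    r = a[i] > t ? a[i] : r;
  return r;
}

/* Two independent lanes (even and odd indices), merged naively at the
   end.  The merge cannot tell which lane saw its element later.  */
static int
last_above_two_lanes (const int *a, size_t n, int t)
{
  assert (n % 2 == 0);
  int lane[2] = { -1, -1 };
  for (size_t i = 0; i < n; i += 2)
    {
      lane[0] = a[i] > t ? a[i] : lane[0];
      lane[1] = a[i + 1] > t ? a[i + 1] : lane[1];
    }
  return lane[1] != -1 ? lane[1] : lane[0];
}
```

For the input { 1, 9, 8, 2 } with t = 5 the scalar loop returns 8, while the
two-lane version returns 9, because lane 1 stopped updating at index 1 and
lane 0 updated later, at index 2.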

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067

[Bug tree-optimization/40801] internal compiler error: in vect_get_vec_def_for_stmt_copy, at tree-vect-stmts.c:1096

2009-07-19 Thread irar at il dot ibm dot com



--- Comment #3 from irar at il dot ibm dot com  2009-07-19 09:35 ---
Testing a fix.

Ira


-- 

irar at il dot ibm dot com changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |irar at il dot ibm dot com
   |dot org |
 Status|NEW |ASSIGNED
   Last reconfirmed|2009-07-18 19:15:43 |2009-07-19 09:35:55
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40801

[Bug tree-optimization/40770] Vectorization of complex types, vectorization of sincos missing

2009-07-16 Thread irar at il dot ibm dot com



--- Comment #2 from irar at il dot ibm dot com  2009-07-16 12:29 ---
pr40770.c:20: note: ==> examining statement: sincostmp.21_1 = __builtin_cexpi
(D.1625_3);
pr40770.c:20: note: get vectype for scalar type:  complex double
pr40770.c:20: note: not vectorized: unsupported data-type complex double

make_vector_type returns NULL for this type.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40770

[Bug tree-optimization/40770] Vectorization of complex types, vectorization of sincos missing

2009-07-16 Thread irar at il dot ibm dot com



--- Comment #6 from irar at il dot ibm dot com  2009-07-16 17:31 ---
(In reply to comment #3)
> > make_vector_type returns NULL for this type.
> Yes - there is no vector type for complex double.  But the vectorizer
> could query for a vector type for the complex component type (double)
> and divide the vector element count by 2 (for complex) to get the
> vectorization factor which would be 1 here.

I see.

> Should SLP then be possible
> for that loop?

Not with the current implementation - SLP needs strided stores to start. Here
the stores are not even adjacent. I think it would be better to vectorize this
loop with regular loop-based vectorization to avoid permutations. I'll take a
better look on Sunday.

Ira

> Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40770
