[Bug tree-optimization/33869] [4.3 Regression] ICE verify_ssa failed (missing definition for SSA_NAME)

2007-10-29 Thread dorit at il dot ibm dot com


--- Comment #11 from dorit at il dot ibm dot com  2007-10-30 05:48 ---
(In reply to comment #6)

Richard, is this related to the issue you reported in 
http://gcc.gnu.org/ml/gcc-patches/2007-10/msg01127.html
(looks like the same error)?
Any idea why the fix you committed doesn't cover this case?
(I haven't looked into this PR yet, it just reminded me of that thread)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33869



[Bug tree-optimization/25371] -ftree-vectorize results in internal compiler error on AMD64

2007-07-01 Thread dorit at il dot ibm dot com


--- Comment #12 from dorit at il dot ibm dot com  2007-07-01 09:30 ---
> Subject: Re:  -ftree-vectorize results in internal compiler error on AMD64
> Zdenek's patch for cleaning the dataref analysis is also fixing this bug.
> http://gcc.gnu.org/ml/gcc-patches/2007-05/msg00634.html

So now that Zdenek's patch went in, can someone confirm that this problem no
longer occurs on x86_64?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25371



[Bug tree-optimization/24659] Conversions are not vectorized

2007-06-29 Thread dorit at il dot ibm dot com


--- Comment #19 from dorit at il dot ibm dot com  2007-06-29 16:46 ---
testing this patch for Altivec:

Index: config/rs6000/altivec.md
===
*** config/rs6000/altivec.md (revision 126053)
--- config/rs6000/altivec.md (working copy)
***
*** 147,152 
--- 147,156 
 (UNSPEC_VPERMHI  321)
 (UNSPEC_INTERHI  322)
 (UNSPEC_INTERLO  323)
+(UNSPEC_VUPKHS_V4SF   324)
+(UNSPEC_VUPKLS_V4SF   325)
+(UNSPEC_VUPKHU_V4SF   326)
+(UNSPEC_VUPKLU_V4SF   327)
  ])

  (define_constants
***
*** 2933,2935 
--- 2937,2995 
emit_insn (gen_altivec_vmrgl (operands[0], operands[1],
operands[2]));
DONE;
  }")
+
+ (define_expand "vec_unpacks_float_hi_v8hi"
+  [(set (match_operand:V4SF 0 "register_operand" "")
+ (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "")]
+  UNSPEC_VUPKHS_V4SF))]
+   "TARGET_ALTIVEC"
+   "
+ {
+   rtx tmp = gen_reg_rtx (V4SImode);
+
+   emit_insn (gen_vec_unpacks_hi_v8hi (tmp, operands[1]));
+   emit_insn (gen_altivec_vcfsx (operands[0], tmp, const0_rtx));
+   DONE;
+ }")
+
+ (define_expand "vec_unpacks_float_lo_v8hi"
+  [(set (match_operand:V4SF 0 "register_operand" "")
+ (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "")]
+  UNSPEC_VUPKLS_V4SF))]
+   "TARGET_ALTIVEC"
+   "
+ {
+   rtx tmp = gen_reg_rtx (V4SImode);
+
+   emit_insn (gen_vec_unpacks_lo_v8hi (tmp, operands[1]));
+   emit_insn (gen_altivec_vcfsx (operands[0], tmp, const0_rtx));
+   DONE;
+ }")
+
+ (define_expand "vec_unpacku_float_hi_v8hi"
+  [(set (match_operand:V4SF 0 "register_operand" "")
+ (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "")]
+  UNSPEC_VUPKHU_V4SF))]
+   "TARGET_ALTIVEC"
+   "
+ {
+   rtx tmp = gen_reg_rtx (V4SImode);
+
+   emit_insn (gen_vec_unpacku_hi_v8hi (tmp, operands[1]));
+   emit_insn (gen_altivec_vcfux (operands[0], tmp, const0_rtx));
+   DONE;
+ }")
+
+ (define_expand "vec_unpacku_float_lo_v8hi"
+  [(set (match_operand:V4SF 0 "register_operand" "")
+ (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "")]
+  UNSPEC_VUPKLU_V4SF))]
+   "TARGET_ALTIVEC"
+   "
+ {
+   rtx tmp = gen_reg_rtx (V4SImode);
+
+   emit_insn (gen_vec_unpacku_lo_v8hi (tmp, operands[1]));
+   emit_insn (gen_altivec_vcfux (operands[0], tmp, const0_rtx));
+   DONE;
+ }")
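
For readers less familiar with the Altivec patterns, here is a scalar sketch in C of what the signed expanders above compute per element: widen each short to int (the vec_unpacks_* step), then convert the int to float (the vcfsx step with a scale of 0). The function name is made up for illustration, and the sketch says nothing about which half of the vector counts as "hi" or "lo" on a given target.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of the two-step expansion: sign-extending unpack to int,
   followed by an int -> float conversion.  Hypothetical helper, not part
   of the patch.  */
void shorts_to_floats(const short *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        int widened = in[i];        /* models the vec_unpacks_* step */
        out[i] = (float) widened;   /* models the vcfsx step */
    }
}
```

The unsigned expanders follow the same shape, with a zero-extending unpack and vcfux instead.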


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24659



[Bug rtl-optimization/32084] gfortran 4.3 13%-18% slower for induct.f90 than gcc 4.0-based competitor

2007-06-27 Thread dorit at il dot ibm dot com


--- Comment #5 from dorit at il dot ibm dot com  2007-06-27 11:57 ---
(In reply to comment #4)
> (In reply to comment #3)
> > The problem is in -ftree-vectorize
> The difference is, that without -ftree-vectorize the inner loop (do k = 1, 9)
> is completely unrolled, but with vectorization, the loop is vectorized, but
> _not_ unrolled. Since the vectorization factor is only 2 for V2DF mode 
> vectors,
> we loose big time at this point.
> My best guess for unroller problems would be rtl-optimization.

Could it be the tree-level complete unroller? (does the vectorizer peel the
loop to handle a misaligned store by any chance? if so, and if the misalignment
amount is unknown, then the number of iterations of the vectorized loop is
unknown, in which case the complete unroller wouldn't work). In autovect-branch
the tree-level complete unroller is before the vectorizer - wonder what happens
there.

Another thing to consider is using -fvect-cost-model (it's very preliminary and
hasn't been tuned much, but this could be a good data point for whoever wants
to tune the vectorizer cost-model for x86_64).
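
To illustrate the peeling point above, here is a rough C sketch (the helper names and parameters are assumptions for illustration, not vectorizer code). The number of peeled iterations depends on the runtime address of the store, so the trip count left for the vectorized loop is not a compile-time constant either.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Scalar iterations to peel so that `addr` becomes aligned to
   `vector_bytes`.  Assumes addr is a multiple of elem_bytes and that
   vector_bytes is a multiple of elem_bytes.  */
size_t peel_count(uintptr_t addr, size_t elem_bytes, size_t vector_bytes)
{
    assert(elem_bytes > 0 && vector_bytes % elem_bytes == 0);
    assert(addr % elem_bytes == 0);
    size_t elems_per_vector = vector_bytes / elem_bytes;
    size_t misaligned_elems = (addr % vector_bytes) / elem_bytes;
    return (elems_per_vector - misaligned_elems) % elems_per_vector;
}

/* Iterations left for the vectorized loop after peeling `peel` of the
   original `n` scalar iterations, with vectorization factor `vf`.  */
size_t vector_iterations(size_t n, size_t peel, size_t vf)
{
    assert(vf > 0);
    return peel >= n ? 0 : (n - peel) / vf;
}
```

With 8-byte doubles in 16-byte vectors, an aligned store needs no peeling and a store at offset 8 needs one peeled iteration, so a loop of 8 iterations ends up with either 4 or 3 vector iterations depending on the address.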


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32084



[Bug tree-optimization/32378] can't determine dependence (distinct sections of an array)

2007-06-18 Thread dorit at il dot ibm dot com


--- Comment #4 from dorit at il dot ibm dot com  2007-06-18 11:08 ---
I see this in the vectorizer dump file (with mainline from a few days ago):

(compute_affine_dependence
  (stmt_a =
D.1423_50 = (*a_49(D))[D.1422_48])
  (stmt_b =
(*a_49(D))[D.1420_51] = D.1425_54)
Data ref a:
(Data Ref:
  stmt: D.1423_50 = (*a_49(D))[D.1422_48];
  ref: (*a_49(D))[D.1422_48];
  base_object: (*a_49(D))[0];
  Access function 0: {pretmp.48_45 + 1, +, 1}_1
  Access function 1: 0B
)
Data ref b:
(Data Ref:
  stmt: (*a_49(D))[D.1420_51] = D.1425_54;
  ref: (*a_49(D))[D.1420_51];
  base_object: (*a_49(D))[0];
  Access function 0: {0, +, 1}_1
  Access function 1: 0B
)
affine dependence test not usable: access function not affine or constant.
(dependence classified: scev_not_known)
)
(compute_affine_dependence
  (stmt_a =
D.1424_53 = (*b_52(D))[D.1420_51])
  (stmt_b =
(*a_49(D))[D.1420_51] = D.1425_54)
)

(the IR looks a bit different from PR32075, but the data-dependence analysis
fails with the same problem). pinskia - are you still planning to address this
issue?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32378



[Bug tree-optimization/32075] can't determine dependence between p->a[x+i] and p->a[x+i+1] where x is invariant but defined in the function

2007-06-18 Thread dorit at il dot ibm dot com


--- Comment #2 from dorit at il dot ibm dot com  2007-06-18 11:03 ---
I see this in the vectorizer dump file (with mainline from a few days ago):

(compute_affine_dependence
  (stmt_a =
D.3027_19 = p_7->a[D.3026_18])
  (stmt_b =
p_7->a[D.3025_17] = D.3027_19)
Data ref a:
(Data Ref:
  stmt: D.3027_19 = p_7->a[D.3026_18];
  ref: p_7->a[D.3026_18];
  base_object: p_7->a[0];
  Access function 0: {x1_5 + 1, +, 1}_2
  Access function 1: 0B
)
Data ref b:
(Data Ref:
  stmt: p_7->a[D.3025_17] = D.3027_19;
  ref: p_7->a[D.3025_17];
  base_object: p_7->a[0];
  Access function 0: {x1_5, +, 1}_2
  Access function 1: 0B
)
affine dependence test not usable: access function not affine or constant.
(dependence classified: scev_not_known)
)

(In reply to comment #1)
> Ok, I have a patch for this issue, I am going to test it with -ftree-vectorize

so how is that coming along? do you think it will also address PRs
32375/6/7/8/9 ?
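
For reference, a loop of roughly this shape produces access functions like the ones in the dump above. This is only a sketch reconstructed from the PR title, not the PR's testcase; the struct, the function name and the way x is computed are all made up.

```c
#include <assert.h>

#define N 16

struct s { int a[N]; };

/* x is loop-invariant but computed inside the function.  The read
   p->a[x + i + 1] and the write p->a[x + i] differ by the constant 1,
   so the dependence distance is 1 whatever the value of x is.  */
void shift_left(struct s *p, int base, int count)
{
    int x = base * 2;   /* invariant, defined in the function */
    assert(x >= 0 && x + count + 1 <= N);
    for (int i = 0; i < count; i++)
        p->a[x + i] = p->a[x + i + 1];
}
```

The symbolic part x cancels when the two access functions are subtracted, which is what one would want the dependence test to notice.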


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32075



[Bug target/32274] FAIL: gcc.dg/vect/pr32224.c

2007-06-13 Thread dorit at il dot ibm dot com


--- Comment #1 from dorit at il dot ibm dot com  2007-06-13 08:41 ---
Sorry about the breakage. Does it work for you if you change the testcase as
follows?:

Index: pr32224.c
===
--- pr32224.c   (revision 125641)
+++ pr32224.c   (working copy)
@@ -10,7 +10,7 @@

   for (i = 0; i < count; i++)
   {
-__asm__ ("bswap %q0": "=r" (*__dst):"0" (*(__src)));
+__asm__ ("checkme": "=r" (*__dst):"0" (*(__src)));
 __src++;
   }
 }


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32274



[Bug tree-optimization/32309] Unnecessary conversion from short to unsigend short breaks vectorization

2007-06-12 Thread dorit at il dot ibm dot com


--- Comment #4 from dorit at il dot ibm dot com  2007-06-12 17:46 ---
it's on my (long) todo list...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32309



[Bug tree-optimization/32224] [4.3 Regression] ICE in vect_analyze_operations, at tree-vect-analyze.c:374

2007-06-07 Thread dorit at il dot ibm dot com


--- Comment #4 from dorit at il dot ibm dot com  2007-06-07 18:40 ---
You're right. I'm testing this obvious patch:

Index: tree-vect-analyze.c
===
*** tree-vect-analyze.c (revision 125526)
--- tree-vect-analyze.c (working copy)
*** vect_determine_vectorization_factor (loo
*** 173,181 
  print_generic_expr (vect_dump, stmt, TDF_SLIM);
}

- if (TREE_CODE (stmt) != GIMPLE_MODIFY_STMT)
-   continue;
-
  gcc_assert (stmt_info);

  /* skip stmts which do not need to be vectorized.  */
--- 173,178 
*** vect_determine_vectorization_factor (loo
*** 187,192 
--- 184,199 
  continue;
}

+ if (TREE_CODE (stmt) != GIMPLE_MODIFY_STMT)
+   {
+ if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
+   {
+ fprintf (vect_dump, "not vectorized: irregular stmt.");
+ print_generic_expr (vect_dump, stmt, TDF_SLIM);
+   }
+ return false;
+   }
+
  if (!GIMPLE_STMT_P (stmt)
  && VECTOR_MODE_P (TYPE_MODE (TREE_TYPE (stmt))))
{


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32224



[Bug tree-optimization/32216] [4.3 Regression] ICE: verify_stmts failed (invalid reference prefix) with -ftree-vectorize

2007-06-06 Thread dorit at il dot ibm dot com


--- Comment #5 from dorit at il dot ibm dot com  2007-06-06 08:33 ---
(In reply to comment #4)
> (In reply to comment #3)
> > Probably something similar is required for the VEC_UNPACK_FLOAT_*_EXPR
> > tree-codes ?
> But these tree-codes are already there:

sorry, I guess I was looking at autovect-branch or something


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32216



[Bug tree-optimization/32216] [4.3 Regression] ICE: verify_stmts failed (invalid reference prefix) with -ftree-vectorize

2007-06-05 Thread dorit at il dot ibm dot com


--- Comment #3 from dorit at il dot ibm dot com  2007-06-06 03:28 ---
(In reply to comment #1)

veclower expands things when it wrongly concludes that they are not supported
by the target in vector mode. For demotion/promotion/conversion kinds of operations
this may be because it does not check the optab table using the right type. For
example, I had to add the following in expand_vector_operations_1:
"
  /* For widening/narrowing vector operations, the relevant type is of the
 arguments, not the widened result.  */
  if (code == WIDEN_SUM_EXPR
  || code == VEC_WIDEN_MULT_HI_EXPR
  || code == VEC_WIDEN_MULT_LO_EXPR
  || code == VEC_UNPACK_HI_EXPR
  || code == VEC_UNPACK_LO_EXPR
  || code == VEC_PACK_TRUNC_EXPR
  || code == VEC_PACK_SAT_EXPR)
type = TREE_TYPE (TREE_OPERAND (rhs, 0));
"

Probably something similar is required for the VEC_UNPACK_FLOAT_*_EXPR
tree-codes ?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32216



[Bug target/32107] New: bad codegen for vector initialization in Altivec

2007-05-27 Thread dorit at il dot ibm dot com
Compiling the following testcase:

#define vector __attribute__((__vector_size__(16) ))
float fa[100] __attribute__ ((__aligned__(16)));
vector float foo ()
{
  float f = fa[0];
  vector float vf = {f, f, f, f};
  return vf;
}

...with gcc -O2 -maltivec, we get:

ld      r9,0(r2)
lfs     f0,0(r9)
addi    r9,r1,-16
stfs    f0,-16(r1)
lvewx   v2,r0,r9
vspltw  v2,v2,0
blr

My problem is with the {lfs,stfs,lvewx} sequence: we load a value into f0, and
then store it (with stfs) into an aligned memory location, so that it could be
loaded from there into a vector (with lvewx). However, since the address from
which f0 was loaded is known to be aligned, we could directly do an lvewx from
there, and avoid the extra {lfs,stfs}, so the following should be enough:

ld  r9,0(r2)
lvewx   v2,r0,r9
vspltw  v2,v2,0
blr

The problem is that rs6000_expand_vector_init doesn't know that f0
originates from an aligned address. It gets the following as vals:

(parallel:V4SF [
(reg/v:SF 119 [ f ])
(reg/v:SF 119 [ f ])
(reg/v:SF 119 [ f ])
(reg/v:SF 119 [ f ])
])

We somehow want to expand 'f = fa[0]' and '{f,f,f,f}' together... if
expand_vector_init could get this as vals: '{fa[0],fa[0],fa[0],fa[0]}', it
could see that the original address is aligned. 
Alternatively, the prospects of getting rid of the redundant load and store
later on during some kind of a peephole optimization don't seem so high to
me... Thoughts?

This may be related to PR31334 (though there the issue is about initialization
with constants, so I'm not sure if the idea for a solution proposed there would
help us here).


-- 
   Summary: bad codegen for vector initialization in Altivec
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: dorit at il dot ibm dot com
 GCC build triplet: powerpc-linux
  GCC host triplet: powerpc-linux
GCC target triplet: powerpc-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32107



[Bug middle-end/31738] Fortran dot product vectorization is restricted

2007-05-16 Thread dorit at il dot ibm dot com


--- Comment #3 from dorit at il dot ibm dot com  2007-05-16 20:45 ---
(In reply to comment #2)
> Here is what happens in the three loops that don't get vectorized:
> (1) the loop in testvectdp2: 
...
> so the vectorizer is ok, except that in this case D.1437_32 doesn't seem to
> be used anywhere in the function, so this stmt looks dead to me, but for
> some reason it is not cleaned away before the vectorizer...  Still need to
> investigate why.

So looks like the stmt 
   D.1437_32 = prephitmp.192_37
became dead by pass pr31738a.f90.089t.copyprop3.

So the question is what's the most appropriate fix:
(1) fix copyprop3 to also clean away any dead code it creates?
(2) add a dce pass after copyprop3?
(3) work around it in the vectorizer. I think it should be easy - just move the
check of the uses of the reduction in the loop until after the vectorizer
analysis pass that marks relevant stmts.

If (3) sounds like the way to go - I can prepare a patch for that.
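
To make the situation concrete, here is a C-level model of the loop (the testcase itself is Fortran, and the names here are made up). The copy stands in for 'D.1437_32 = prephitmp.192_37': it is a second in-loop use of the reduction variable, but its value is never read.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product with an extra, dead use of the reduction variable.  */
double dot(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double unused_copy = sum;   /* extra in-loop use, like D.1437_32 */
        (void) unused_copy;         /* never read afterwards: dead code */
        sum += a[i] * b[i];         /* the reduction stmt itself */
    }
    return sum;
}
```

At the source level a compiler would normally drop the copy right away; the sketch only models what the IR looks like when the vectorizer sees it.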


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31738



[Bug tree-optimization/31946] New: missed vectorization due to too strict peeling-for-alignment policy

2007-05-16 Thread dorit at il dot ibm dot com
The vectorizer is too restrictive in the way it decides how many iterations
to peel from a loop in order to align a certain memory reference in the loop. It
considers only the first (potentially) misaligned store it encounters in the
loop. For this reason the testcases vect-multitypes-1.c, vect-multitypes-4.c
and vect-iv-4.c don't get vectorized. For example (using Vector Size of 16
bytes), in vect-multitypes-1.c we have:

short sa[N], sb[N];
int ia[N], ib[N];  
for (i = 0; i < n; i++) {
  ia[i+3] = ib[i];
  sa[i+3] = sb[i];
}

The current peeling-for-alignment scheme will consider the 'ia[i+3]' access for
peeling, and therefore will examine the option of using a peeling factor =
(4-3)%4 = 1. This will not align the access 'sa[i+3]', for which we need to
peel 5 iterations. As a result the loop doesn't get vectorized (because we
currently can't handle misaligned stores unless we align them by peeling).
However, if we had considered the 'sa[i+3]' access as well for peeling, we
would have examined the option of using a peeling factor = (8-3)%8 = 5, which
would align both accesses, and would allow us to vectorize the loop. So the
vectorizer needs to be extended to consider more peeling factors, and not just
one.
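
The arithmetic above can be checked with a small C sketch. The helper names are made up for illustration; this is not the vectorizer's code.

```c
#include <assert.h>

/* Peel factor that aligns an access starting `offset_elems` elements past
   an aligned base, when a vector holds `elems_per_vector` elements.  */
unsigned peel_factor(unsigned offset_elems, unsigned elems_per_vector)
{
    assert(elems_per_vector > 0);
    return (elems_per_vector - offset_elems % elems_per_vector)
           % elems_per_vector;
}

/* Nonzero if peeling `peel` iterations aligns that access.  */
int aligned_after_peel(unsigned offset_elems, unsigned elems_per_vector,
                       unsigned peel)
{
    assert(elems_per_vector > 0);
    return (offset_elems + peel) % elems_per_vector == 0;
}
```

With 16-byte vectors, ia[i+3] has 4 elements per vector and a peel factor of 1, while sa[i+3] has 8 elements per vector and a peel factor of 5. Peeling 1 iteration leaves sa[i+3] misaligned, whereas peeling 5 aligns both accesses.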


-- 
   Summary: missed vectorization due to too strict peeling-for-alignment policy
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31946



[Bug tree-optimization/31945] New: missing type vector conversions patterns on spu

2007-05-16 Thread dorit at il dot ibm dot com
Since the following patch:

2007-04-22  Uros Bizjak  <[EMAIL PROTECTED]>

PR tree-optimization/24659

GCC supports vectorization of float<-->double conversions. These can also be
modelled for the spu target by implementing the following patterns:
vec_pack_trunc_v2df
vec_unpacks_lo_v4sf
vec_unpacks_hi_v4sf

(also see PR24659)


-- 
   Summary: missing type vector conversions patterns on spu
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com
 GCC build triplet: spu
  GCC host triplet: spu
GCC target triplet: spu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31945



[Bug tree-optimization/25809] missed PRE optimization - move "invariant casts" out of loops

2007-05-08 Thread dorit at il dot ibm dot com


--- Comment #8 from dorit at il dot ibm dot com  2007-05-09 07:14 ---
> So I guess this should be handled somewhere else. I'll open a new
> missed-optimization PR instead (not against PRE this time). thanks.

This is now PR31873


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25809



[Bug tree-optimization/31873] New: missed optimization: we don't move "invariant casts" out of loops

2007-05-08 Thread dorit at il dot ibm dot com
This PR was originally opened against PRE (PR25809), but turns out PRE can't
solve this problem, so here's a new PR instead:

In testcases that have a reduction, like gcc.dg/vect/vect-reduc-2char.c and
gcc.dg/vect/vect-reduc-2short.c, the following casts appear:

signed char sdiff;
unsigned char ux, udiff; 
sdiff_0 = ...
loop:
   # sdiff_41 = PHI ;
   .
   ux_36 = 
   udiff_37 = (unsigned char) sdiff_41;  
   udiff_38 = ux_36 + udiff_37;
   sdiff_39 = (signed char) udiff_38;
end_loop

although these casts could be taken out of the loop altogether, i.e., the code
could be transformed into something like the following:

signed char sdiff;
unsigned char ux, udiff;
sdiff_0 = ...
udiff_1 = (unsigned char) sdiff_0;
loop:
   # udiff_3 = PHI ;
   .
   ux_36 = 
   udiff_2 = ux_36 + udiff_3;
end_loop
sdiff_39 = (signed char) udiff_2;

see this discussion thread:
http://gcc.gnu.org/ml/gcc-patches/2005-12/msg01827.html
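
As a source-level illustration of the two forms, here is a C sketch (function names and the input array are made up). It assumes, as GCC does, that converting an out-of-range value to signed char wraps modulo 256, so that the unsigned char -> signed char -> unsigned char round trip is the identity.

```c
#include <assert.h>
#include <stddef.h>

/* Casts inside the loop, as in the first listing.  */
signed char sum_casts_in_loop(signed char init, const unsigned char *x,
                              size_t n)
{
    assert(x != NULL || n == 0);
    signed char sdiff = init;
    for (size_t i = 0; i < n; i++) {
        unsigned char udiff = (unsigned char) sdiff;
        udiff = (unsigned char) (x[i] + udiff);
        sdiff = (signed char) udiff;
    }
    return sdiff;
}

/* Casts hoisted out of the loop, as in the second listing.  */
signed char sum_casts_hoisted(signed char init, const unsigned char *x,
                              size_t n)
{
    assert(x != NULL || n == 0);
    unsigned char udiff = (unsigned char) init;
    for (size_t i = 0; i < n; i++)
        udiff = (unsigned char) (x[i] + udiff);
    return (signed char) udiff;
}
```

Both functions return the same value for the same inputs; the second one keeps the accumulator in a single type throughout the loop.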


-- 
   Summary: missed optimization: we don't move "invariant casts" out of loops
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dorit at il dot ibm dot com
  GCC host triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31873



[Bug middle-end/31738] Fortran dot product vectorization is restricted

2007-05-08 Thread dorit at il dot ibm dot com


--- Comment #2 from dorit at il dot ibm dot com  2007-05-08 21:00 ---
Here is what happens in the three loops that don't get vectorized:

(1) the loop in testvectdp2: 
This is the loop we analyze:

  # prephitmp.192_37 = PHI 
  # i_1 = PHI <1(3), i_44(5)>
:;
  D.1437_32 = prephitmp.192_37;
  D.1438_33 = (int8) i_1;
  D.1439_34 = D.1438_33 + -1;
  D.1440_36 = (*a_35(D))[D.1439_34];
  D.1441_40 = (*b_39(D))[D.1439_34];
  D.1442_41 = D.1441_40 * D.1440_36;
  D.1443_42 = prephitmp.192_37 + D.1442_41;
  storetmp.191_38 = D.1443_42;
  c__lsm.199_17 = D.1443_42;
  i_44 = i_1 + 1;
  if (i_1 == D.1429_5)
goto  ();
  else
goto  ();

We recognize the reduction, but we think that it is used in the loop:
  pr31738.f90:14: note: reduction used in loop.

and indeed, prephitmp.192_37 is used in:
  D.1443_42 = prephitmp.192_37 + D.1442_41;
which is ok, because this is the reduction stmt,
but also used here:
  D.1437_32 = prephitmp.192_37;
which is indeed something that we normally don't allow.
so the vectorizer is ok, except that in this case D.1437_32 doesn't seem to be
used anywhere in the function, so this stmt looks dead to me, but for some
reason it is not cleaned away before the vectorizer...  Still need to
investigate why. 


(2) the loop in testvecm:
This looks like the problem reported in PR31756:

failed to compute offset or step for (*a.0_11)[D.1559_52]
create_data_ref: failed to create a dr for (*a.0_11)[D.1559_52]
pr31738.f90:24: note: not vectorized: unhandled data-ref

(3) the loop in testvecm2
Same story (the PR31756 problem):

failed to compute offset or step for (*a.0_10)[D.1509_52]
create_data_ref: failed to create a dr for (*a.0_10)[D.1509_52]
pr31738.f90:32: note: not vectorized: unhandled data-ref


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31738



[Bug middle-end/31699] [4.3 Regression] -march=opteron -ftree-vectorize generates wrong code

2007-05-02 Thread dorit at il dot ibm dot com


--- Comment #6 from dorit at il dot ibm dot com  2007-05-02 20:38 ---
patch: http://gcc.gnu.org/ml/gcc-patches/2007-05/msg00111.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31699



[Bug testsuite/31589] gcc.dg/vect failures due to missing target specifiers

2007-04-26 Thread dorit at il dot ibm dot com


--- Comment #4 from dorit at il dot ibm dot com  2007-04-27 05:44 ---
patch: http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01739.html

requires retesting on ia64 before I can commit it.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31589



[Bug middle-end/31699] [Regression 4.3] -march=opteron -ftree-vectorize generates wrong code

2007-04-26 Thread dorit at il dot ibm dot com


--- Comment #4 from dorit at il dot ibm dot com  2007-04-26 19:37 ---
I'm testing the attached patch. The problem is that we don't compute the peel
factor correctly (when peeling to align a store) when we have multiple
data-types in the loop (the computation assumes that VF is the number of
elements in a vector, but that doesn't hold for all the datarefs in the loop if
their types are of different sizes)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31699



[Bug middle-end/31699] [Regression 4.3] -march=opteron -ftree-vectorize generates wrong code

2007-04-26 Thread dorit at il dot ibm dot com


--- Comment #3 from dorit at il dot ibm dot com  2007-04-26 19:34 ---
Created an attachment (id=13450)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13450&action=view)
patch


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31699



[Bug fortran/31615] testsuite failure in gfortran.dg/vect/vect-5.f90

2007-04-25 Thread dorit at il dot ibm dot com


--- Comment #7 from dorit at il dot ibm dot com  2007-04-25 21:30 ---
> Are you going to submit/install your patch?

yes, I'll go ahead and submit the patch


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31615



[Bug fortran/31615] testsuite failure in gfortran.dg/vect/vect-5.f90

2007-04-18 Thread dorit at il dot ibm dot com


--- Comment #5 from dorit at il dot ibm dot com  2007-04-19 07:27 ---
(In reply to comment #4)
> (In reply to comment #3)
> > But then I wonder why we don't see the same failure on ia64?
> Because the failing part of the testcase is only done on ilp32 targets:
> ! { dg-final { scan-tree-dump-times "Alignment of access forced using
> versioning." 3 "vect" { target { ilp32 && vect_no_align
> } } } }

ah, ok. so, in that case we probably want to just change the '3' to '2' in the
above test:

Index: testsuite/gfortran.dg/vect/vect-5.f90
===
--- testsuite/gfortran.dg/vect/vect-5.f90   (revision 123954)
+++ testsuite/gfortran.dg/vect/vect-5.f90   (working copy)
@@ -38,7 +38,7 @@
 ! { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } }
 ! { dg-final { scan-tree-dump-times "Alignment of access forced using peeling"
1 "vect" { xfail { vect_no_align } } } }
 ! { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect"
{ xfail { vect_no_align } } } }
-! { dg-final { scan-tree-dump-times "Alignment of access forced using
versioning." 3 "vect" { target { ilp32 && vect_no_align } } } }
+! { dg-final { scan-tree-dump-times "Alignment of access forced using
versioning." 2 "vect" { target { ilp32 && vect_no_align } } } }

 ! We also expect to vectorize one loop for lp64 targets that support
 ! misaligned access:


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31615



[Bug fortran/31615] testsuite failure in gfortran.dg/vect/vect-5.f90

2007-04-18 Thread dorit at il dot ibm dot com


--- Comment #3 from dorit at il dot ibm dot com  2007-04-18 10:18 ---
> Created dump file using -fdump-tree-vect-details

thanks. So I don't understand why we expect to version for 3 different
data-references, since there are only 2 in the loop that is vectorized. But
then I wonder why we don't see the same failure on ia64?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31615



[Bug fortran/31615] testsuite failure in gfortran.dg/vect/vect-5.f90

2007-04-17 Thread dorit at il dot ibm dot com


--- Comment #1 from dorit at il dot ibm dot com  2007-04-18 06:42 ---
could you please provide the .vect dump file, as generated with
-fdump-tree-vect-details?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31615



[Bug testsuite/31589] gcc.dg/vect failures due to missing target specifiers

2007-04-17 Thread dorit at il dot ibm dot com


--- Comment #2 from dorit at il dot ibm dot com  2007-04-17 20:10 ---
> 2 more are under investigation:
> no-section-anchors-vect-69.c
> vect-reduc-dot-u16a.c

In the first testcase, the vectorizer can only prove that the data reference in
the third loop is aligned on 8 bytes. This is enough for targets like ia64 in
which the vector size is 8 bytes, and therefore we don't need to peel in order
to force alignment for this loop. So overall in this testcase we peel only
twice. On targets that require 16byte alignment, a guaranteed 8bytes alignment
is not enough, and therefore we peel this loop to align the data-reference (and
overall in the testcase we peel 3 times).  

I guess the way to solve this is to add a keyword that lists the targets with
8byte-wide-vectors and targets with 16byte-wide-vectors, or just hard code the
targets that are expected to fail/pass here. I'll sleep on it and supply a
patch soon.
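
The difference between the two kinds of targets for the first testcase boils down to a check like the following (a sketch only; the function name is made up and this is not how the testsuite or the vectorizer expresses it):

```c
#include <assert.h>

/* Nonzero if a data reference known to be aligned on `known_align` bytes
   still needs peeling on a target whose vectors are `vector_bytes` wide.
   Both arguments are assumed to be powers of two.  */
int needs_peeling(unsigned known_align, unsigned vector_bytes)
{
    assert(known_align > 0 && vector_bytes > 0);
    return known_align % vector_bytes != 0;
}
```

A guaranteed 8-byte alignment is enough for 8-byte vectors, but not for 16-byte vectors, which is why the expected number of peeled loops differs between targets.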


The second test needs the same fix as a lot of the other tests: 
add { target vect_pack_mod } to the check.  This is because the loop in main
has a cast from int to short in it.  However, in this testcase we already have
two target keywords that we are checking:
 { target { vect_short_mult && vect_widen_sum_hi_to_si } }, 
and I don't think the testsuite engine currently provides the flexibility to
add a third keyword, so I suggest just changing the loop slightly to avoid the
cast (it's not the point of this testcase anyway):

Index: vect-reduc-dot-u16a.c
===
--- vect-reduc-dot-u16a.c   (revision 123909)
+++ vect-reduc-dot-u16a.c   (working copy)
@@ -30,7 +30,7 @@
 int main (void)
 {
   unsigned int dot1;
-  int i;
+  unsigned short i;

   check_vect ();


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31589



[Bug tree-optimization/25809] missed PRE optimization - move "invariant casts" out of loops

2007-04-17 Thread dorit at il dot ibm dot com


--- Comment #7 from dorit at il dot ibm dot com  2007-04-17 19:31 ---
> so I will look into it. 

(for reference: http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01103.html).

So I guess this should be handled somewhere else. I'll open a new
missed-optimization PR instead (not against PRE this time). thanks.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25809



[Bug tree-optimization/25809] missed PRE optimization - move "invariant casts" out of loops

2007-04-16 Thread dorit at il dot ibm dot com


--- Comment #6 from dorit at il dot ibm dot com  2007-04-17 07:38 ---
> can you please send me the patch so that I could look at these failures before
> you close this PR?

I'm going over my inbox top down, so I just saw that you had already sent the
patch... so I will look into it. (thanks!)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25809



[Bug tree-optimization/25809] missed PRE optimization - move "invariant casts" out of loops

2007-04-16 Thread dorit at il dot ibm dot com


--- Comment #5 from dorit at il dot ibm dot com  2007-04-17 07:22 ---
> Doing cast motion actually causes about 25 *more* failures in the vectorizer
> testsuite.
> I'm closing this as won't fix since it seems there was no other reason to do
> this.

can you please send me the patch so that I could look at these failures before
you close this PR?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25809



[Bug fortran/31561] FAIL: gfortran.dg/vect/vect-4.f90

2007-04-14 Thread dorit at il dot ibm dot com


--- Comment #4 from dorit at il dot ibm dot com  2007-04-14 09:38 ---
> I think the only thing that really matters is that the loop is vectorized. I
> don't think the alignment details are important checking, even on platforms
> where they are relevant. So we should remove all scan-tree-dump-times except
> the first one, I guess.

I think it's not such a bad idea to check that the handling of alignment is
working as expected. Also, it's relevant both for targets that support
alignment, and for targets that don't, because on those that don't - we can
still vectorize misaligned accesses using loop-versioning. 

> I'm adding Ira and Dorit to the CC list, as they wrote and modified the
> original test. Ira, Dorit, I'm not sure how to proceed here, do you agree with
> the paragraph above about what is the right thing to do?

see - http://gcc.gnu.org/ml/gcc/2007-04/msg00479.html.
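
For readers not familiar with loop-versioning for alignment, the idea can be sketched at the source level as follows. The 16-byte vector size, the function name and the loop body are assumptions for illustration; the real transformation is done on the IR, and the aligned version would contain vector code.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define VECTOR_BYTES 16   /* assumed vector size */

/* Returns 1 if the "aligned" version of the loop ran, 0 for the fallback.  */
int add_one_versioned(float *dst, const float *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    /* One runtime test covers both pointers: the OR has zero low bits
       only if both addresses do.  */
    int all_aligned =
        (((uintptr_t) dst | (uintptr_t) src) % VECTOR_BYTES) == 0;
    if (all_aligned) {
        /* Version that can use aligned vector accesses.  */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] + 1.0f;
    } else {
        /* Scalar fallback, taken when the runtime check fails.  */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] + 1.0f;
    }
    return all_aligned;
}
```

Each data reference whose alignment is tested this way is what the "Alignment of access forced using versioning." dump message counts.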


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31561



[Bug target/31334] Bad codegen for vector initializer with constants prop'd into a vector initializer

2007-04-03 Thread dorit at il dot ibm dot com


--- Comment #8 from dorit at il dot ibm dot com  2007-04-03 20:46 ---
(In reply to comment #7)
> Something like:
> (define_insn_and_split "altivec_dup"
>   [(set (match_operand:V 0 "register_operand" "v")
> (vec_duplicate: (match_operand: 0 "r")))
>(clobber (match_operand:V 3 "memory_operand" "=Z"))]
>   "TARGET_ALTIVEC"
>   "#"
>   "&& reload_completed"
>   
> Which then will be generated from rs6000_expand_vector_init.  I can write this
> if you want, it is just I cannot test this until Monday.

yes, please... I'll be very happy to see this fixed. (thanks!!)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31334



[Bug target/25413] wrong alignment or incorrect address computation in vectorized code on Pentium 4 SSE

2007-04-03 Thread dorit at il dot ibm dot com


--- Comment #6 from dorit at il dot ibm dot com  2007-04-03 20:22 ---
So I see Devang had sent a patch for this over a year ago:
http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00167.html
I don't know what ever happened to it.
Maybe you want to give it a try? (you may need to implement the new target hook
for Pentium4). If you have problems applying the patch (it is a bit old) - I
could try to help update the patch (not before next week though).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413



[Bug tree-optimization/31460] if(a) a[i] = xxx; else a[i] = yyy; is not converted to if (a) ddd= xxx; else ddd = yyy; a[i] = ddd;

2007-04-03 Thread dorit at il dot ibm dot com


--- Comment #4 from dorit at il dot ibm dot com  2007-04-03 19:56 ---
yes, this is indeed a known problem (I don't know if there's a PR open for it).
It is one of the tree-ifcvt enhancements that Victor was going to tackle for
4.3 (item (2.3) in http://gcc.gnu.org/wiki/AutovectBranchOptimizations?). 
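
For reference, the transformation from the PR title looks like this at the source level. This is a sketch with made-up names; in particular, the separate cond array is an assumption, the PR title just writes 'if(a)'.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: a store on each side of the branch.  */
void store_in_branches(int *a, const int *cond, size_t n, int xxx, int yyy)
{
    assert(a != NULL && cond != NULL);
    for (size_t i = 0; i < n; i++) {
        if (cond[i])
            a[i] = xxx;
        else
            a[i] = yyy;
    }
}

/* Transformed form: select the value, then do one unconditional store.  */
void store_after_select(int *a, const int *cond, size_t n, int xxx, int yyy)
{
    assert(a != NULL && cond != NULL);
    for (size_t i = 0; i < n; i++) {
        int ddd = cond[i] ? xxx : yyy;
        a[i] = ddd;
    }
}
```

The second form has a single store per iteration, with the condition only affecting the stored value.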


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31460



[Bug target/31334] New: Bad codegen for vectorized induction with altivec

2007-03-24 Thread dorit at il dot ibm dot com
Turns out the code we are generating for vectorized induction on ppc is quite
terrible - the vector induction variable is advanced by a constant step in the
loop (e.g., {4,4,4,4} as in the testcase below). This is the sequence gcc
currently creates for altivec in order to generate the {4,4,4,4} vector:

li 0,4
stw 0,-48(1)
lvewx 0,0,9
vspltw 0,0,0

So, one thing to figure out is why we don't use the immediate form of the splat
(vspltiw). The other is why this sequence ends up being generated not only
before the loop (see insns marked with "<<<1" below), but also inside the
loop (see insns marked with "<<<2" below).

This is the testcase (it is basically the testcase
gcc.dg/vect/no-tree-scev-cprop-vect-iv-1.c with larger loop count to avoid
complete unrolling):

int main1 (int X)
{
  int s = X;
  int i;
  for (i = 0; i < 96; i++)
s += i;
  return s;
}

compiled as follows:
gcc -O2 -ftree-vectorize -maltivec -fno-tree-scev-cprop -S t.c


li 0,4  <<<1
stw 0,-48(1)<<<1
ld 9,[EMAIL PROTECTED](2)
li 0,23
mr 11,3
mtctr 0
lvx 1,0,9
addi 9,1,-48
vor 13,1,1
lvewx 0,0,9 <<<1
vspltw 0,0,0<<<1
vadduwm 1,1,0
.p2align 4,,15
.L2:
li 0,4  <<<2
addi 9,1,-48
vadduwm 13,13,1
stw 0,-48(1)<<<2
lvewx 0,0,9 <<<2
vspltw 0,0,0<<<2
vadduwm 1,1,0
bdnz .L2
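
For illustration only: a scalar C emulation of what the vectorized loop computes, with small arrays standing in for vector registers. The names `main1_vector_sketch`, `iv`, `step`, `acc` and the factor `VF` are assumptions made for this sketch; the real vectorizer output is the assembly above. The point it tries to make is that the step vector {4,4,4,4} is loop-invariant, so it only needs to be built once before the loop.

```c
#define VF 4   /* assumed vectorization factor: four ints per vector */

/* Scalar reference: the loop from the testcase.  */
int main1_scalar (int X)
{
  int s = X;
  int i;
  for (i = 0; i < 96; i++)
    s += i;
  return s;
}

/* Emulation of the vectorized loop.  */
int main1_vector_sketch (int X)
{
  int iv[VF] = {0, 1, 2, 3};     /* vector induction variable */
  int step[VF] = {4, 4, 4, 4};   /* invariant step, built once before the loop */
  int acc[VF] = {0, 0, 0, 0};    /* vector accumulator */
  int n, k, s;

  for (n = 0; n < 96 / VF; n++)
    for (k = 0; k < VF; k++)
      {
        acc[k] += iv[k];         /* one lane of the vector add into the sum */
        iv[k] += step[k];        /* one lane of the induction update */
      }

  /* Reduction epilogue: fold the lanes into the scalar result.  */
  s = X;
  for (k = 0; k < VF; k++)
    s += acc[k];
  return s;
}
```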


-- 
   Summary: Bad codegen for vectorized induction with altivec
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
         Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com
 GCC build triplet: powerpc-linux
  GCC host triplet: powerpc-linux
GCC target triplet: powerpc-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31334



[Bug tree-optimization/31333] New: ICE with -fno-tree-dominator-opts -ftree-vectorize -msse

2007-03-24 Thread dorit at il dot ibm dot com
The testcase gcc.dg/vect/no-tree-dom-vect-bug.c ICEs on i386-linux when
compiled as follows:

gcc no-tree-dom-vect-bug.c -O2 -fno-tree-dominator-opts -ftree-vectorize -msse

no-tree-dom-vect-bug.c: In function 'main1':
no-tree-dom-vect-bug.c:15: internal compiler error: in expand_simple_binop, at
optabs.c:1192

This happens somewhere between these passes:
no-tree-dom-vect-bug.c.036t.release_ssa
no-tree-dom-vect-bug.c.044t.apply_inline
(maybe when the vectorized main1 is inlined into main?) 

gdb traceback:
function=0x8703852 "expand_simple_binop") at ../../gcc/gcc/diagnostic.c:656
656   internal_error ("in %s, at %s:%d", function, trim_filename (file),
line);
(gdb) backtrace
#0  fancy_abort (file=0x8703282 "../../gcc/gcc/optabs.c", line=1192,
function=0x8703852 "expand_simple_binop") at ../../gcc/gcc/diagnostic.c:656
#1  0x08271467 in expand_simple_binop (mode=Variable "mode" is not available.
) at ../../gcc/gcc/optabs.c:1192
#2  0x081a2a3f in force_operand (value=0xb7d01d98, target=0xb7a9d960)
at ../../gcc/gcc/expr.c:6069
#3  0x08622a4d in move_invariant_reg (loop=0xa363f60, invno=0)
at ../../gcc/gcc/loop-invariant.c:1180
#4  0x086236bd in move_loop_invariants () at
../../gcc/gcc/loop-invariant.c:1242
#5  0x08621757 in rtl_move_loop_invariants () at ../../gcc/gcc/loop-init.c:256
#6  0x082775c6 in execute_one_pass (pass=0x87ad8a0) at
../../gcc/gcc/passes.c:1058
#7  0x082777b7 in execute_pass_list (pass=0x87ad8a0) at
../../gcc/gcc/passes.c:1110
#8  0x082777ca in execute_pass_list (pass=0x87ad7e0) at
../../gcc/gcc/passes.c:
#9  0x082777ca in execute_pass_list (pass=0x87aab60) at
../../gcc/gcc/passes.c:
#10 0x08356638 in tree_rest_of_compilation (fndecl=0xb7ce9a80)
at ../../gcc/gcc/tree-optimize.c:412
#11 0x0808db8c in c_expand_body (fndecl=0xb7ce9a80) at
../../gcc/gcc/c-common.c:4285
#12 0x084b4ab1 in cgraph_expand_function (node=0xb7c1b700)
at ../../gcc/gcc/cgraphunit.c:1015
#13 0x084b6e96 in cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1084
#14 0x0805b4df in c_write_global_declarations () at ../../gcc/gcc/c-decl.c:7930
#15 0x082f5bd6 in toplev_main (argc=20, argv=0xbfd26434) at
../../gcc/gcc/toplev.c:1063
#16 0x080d5e02 in main (argc=Cannot access memory at address 0x1
) at ../../gcc/gcc/main.c:35
(gdb) up
#2  0x081a2a3f in force_operand (value=0xb7d01d98, target=0xb7a9d960)
at ../../gcc/gcc/expr.c:6069
6069  return expand_simple_binop (GET_MODE (value), code, op1,
op2,
(gdb) p code
$1 = VEC_SELECT


-- 
   Summary: ICE with -fno-tree-dominator-opts -ftree-vectorize -msse
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com
 GCC build triplet: i386-linux
  GCC host triplet: i386-linux
GCC target triplet: i386-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31333



[Bug target/30784] [4.3 regression] ICE on loop vectorization (-O1 -march=athlon-xp -ftree-vectorize)

2007-03-24 Thread dorit at il dot ibm dot com


--- Comment #7 from dorit at il dot ibm dot com  2007-03-24 08:00 ---
patch: http://gcc.gnu.org/ml/gcc/2007-03/msg00918.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30784



[Bug target/30784] [4.3 regression] ICE on loop vectorization (-O1 -march=athlon-xp -ftree-vectorize)

2007-03-14 Thread dorit at il dot ibm dot com


--- Comment #5 from dorit at il dot ibm dot com  2007-03-14 12:29 ---
this is the testcase I have ICE-ing on powerpc64-yellowdog, when compiled with
-ftree-vectorize -maltivec -m64 -O2:

long stack_vars_sorted[32];
void partition_stack_vars (long stack_vars_num)
{
  long si, n = stack_vars_num;
  for (si = 0; si < n; ++si)
stack_vars_sorted[si] = si;
}

(extracted from cfgexpand.c which ICEs during bootstrap with vectorization)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30784



[Bug target/30784] [4.3 regression] ICE on loop vectorization (-O1 -march=athlon-xp -ftree-vectorize)

2007-03-14 Thread dorit at il dot ibm dot com


--- Comment #4 from dorit at il dot ibm dot com  2007-03-14 12:13 ---
I also saw this on powerpc64, on a different testcase (vectorizing longs with
-m64). It seems that constant propagation during dom3 propagates the vector
initializer into a BIT_FIELD_EXPR, which results in invalid gimple?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30784



[Bug tree-optimization/31041] [4.3 Regression] verify_stmts failed: invalid operand to binary operator with -O2 -ftree-vectorize

2007-03-05 Thread dorit at il dot ibm dot com


--- Comment #5 from dorit at il dot ibm dot com  2007-03-05 20:15 ---
I'm travelling now, but can prepare a fix when I'm back (next week).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31041



[Bug tree-optimization/30858] [4.3 Regression] ice for legal code with -O2 -ftree-vectorize

2007-02-21 Thread dorit at il dot ibm dot com


--- Comment #8 from dorit at il dot ibm dot com  2007-02-21 19:31 ---
> Is this acceptable ?

sure, thanks


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30858



[Bug tree-optimization/30858] [4.3 Regression] ice for legal code with -O2 -ftree-vectorize

2007-02-20 Thread dorit at il dot ibm dot com


--- Comment #6 from dorit at il dot ibm dot com  2007-02-20 22:56 ---
proposed patches - http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01734.html

> I have thrown most of Suse Linux 10.3 at it and it has crashed
> in a few places.

would you mind giving these patches a try? (to see what's the next ICE...?)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30858



[Bug tree-optimization/30858] [4.3 Regression] ice for legal code with -O2 -ftree-vectorize

2007-02-19 Thread dorit at il dot ibm dot com


--- Comment #5 from dorit at il dot ibm dot com  2007-02-19 14:12 ---
Looks like I wasn't careful enough with my fix for PR30771. Here is a fix for
that fix I'm now testing:

Index: tree-vect-analyze.c
===
--- tree-vect-analyze.c (revision 122128)
+++ tree-vect-analyze.c (working copy)
@@ -124,10 +124,11 @@

  /* Two cases of "relevant" phis: those that define an
 induction that is used in the loop, and those that
-define a reduction.  */
+directly define a reduction.  */
  if ((STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_loop
   && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
- || (STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction
+ || (STMT_VINFO_RELEVANT (stmt_info) ==
+  vect_used_directly_by_reduction
  && STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def))
 {
  gcc_assert (!STMT_VINFO_VECTYPE (stmt_info));
@@ -328,8 +329,12 @@
return false;
  }

- if (STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_loop
- && STMT_VINFO_DEF_TYPE (stmt_info) != vect_induction_def)
+ if ((STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_loop
+  && STMT_VINFO_DEF_TYPE (stmt_info) != vect_induction_def)
+ || (STMT_VINFO_RELEVANT (stmt_info) >
+   vect_used_directly_by_reduction
+  && STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def))
+
{
  /* Most likely a reduction-like computation that is used
 in the loop.  */
@@ -2313,9 +2318,11 @@
   if (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def)
{
  gcc_assert (relevant == vect_unused_in_loop && live_p);
- relevant = vect_used_by_reduction;
+ relevant = vect_used_directly_by_reduction;
  live_p = false;
}
+  else if (relevant == vect_used_directly_by_reduction)
+   relevant = vect_used_by_reduction;

   i = 0;
   FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
Index: tree-vectorizer.h
===
--- tree-vectorizer.h   (revision 122128)
+++ tree-vectorizer.h   (working copy)
@@ -175,6 +175,7 @@
 /* Indicates whether/how a variable is used in the loop.  */
 enum vect_relevant {
   vect_unused_in_loop = 0,
+  vect_used_directly_by_reduction,
   vect_used_by_reduction,
   vect_used_in_loop
 };
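
For illustration only: a standalone C sketch of how the new enum value is used in the hunks above. The enum is copied from the patch; the two helper functions are hypothetical and only mirror the conditions in the diff, they are not vectorizer code. Note that the `>` comparison in the second hunk relies on the order of the enumerators.

```c
/* The enum as it reads after the patch.  */
enum vect_relevant {
  vect_unused_in_loop = 0,
  vect_used_directly_by_reduction,
  vect_used_by_reduction,
  vect_used_in_loop
};

/* Hypothetical helper mirroring the third tree-vect-analyze.c hunk:
   a stmt that defines a reduction marks its operands as "directly"
   used; a stmt that is itself directly used by a reduction passes
   on the weaker vect_used_by_reduction.  */
enum vect_relevant
relevance_for_operands (enum vect_relevant relevant, int stmt_is_reduction_def)
{
  if (stmt_is_reduction_def)
    return vect_used_directly_by_reduction;
  if (relevant == vect_used_directly_by_reduction)
    return vect_used_by_reduction;
  return relevant;
}

/* Hypothetical helper mirroring the ordered comparison in the second
   hunk: true for vect_used_by_reduction and vect_used_in_loop.  */
int
beyond_direct_reduction_use (enum vect_relevant relevant)
{
  return relevant > vect_used_directly_by_reduction;
}
```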


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30858



[Bug c/30858] ice for legal code with -O2 -ftree-vectorize

2007-02-19 Thread dorit at il dot ibm dot com


--- Comment #3 from dorit at il dot ibm dot com  2007-02-19 12:56 ---
(In reply to comment #0)

Thanks for exercising the vectorizer and reporting these bugs!

> On the wider issue of the quality of the vectorizer, I
> have thrown most of Suse Linux 10.3 at it and it has crashed
> in a few places.

only a few? :-)

> I suspect there would be considerable value at getting some
> other distribution [ Debian ?], maybe on another type of 
> machine [ PPC64 ?], and flushing out a few more bugs in the optimizer.
> You would need to ensure that -ftree-vectorize was switched
> on for every compile.
> Just a suggestion.

I agree. We are working on a cost model these days to make the vectorizer less
greedy, hopefully as a step towards enabling vectorization on by default -
which would help in flushing bugs out.

(Just as a side comment - FYI - most of the vectorizer bugs you opened so far
in the last few days (30771, 30795, 30843) are related to features that were
added *very* recently).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30858



[Bug c/30858] ice for legal code with -O2 -ftree-vectorize

2007-02-19 Thread dorit at il dot ibm dot com


--- Comment #2 from dorit at il dot ibm dot com  2007-02-19 12:45 ---
Reduced testcase:

int foo (int ko)
{
 int j,i;
  for (j = 0; j < ko; j++)
   i += (i > 10) ? -5 : 7;
 return i;
}

Looking into it...


-- 

dorit at il dot ibm dot com changed:

   What    |Removed                     |Added
   --------+----------------------------+----------------------------
   Summary |ice for legal code with -O2 |ice for legal code with -O2
           |-ftree-vectorize            |-ftree-vectorize


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30858



[Bug c/30843] ice for legal code with -ftree-vectorize -O2

2007-02-19 Thread dorit at il dot ibm dot com


--- Comment #3 from dorit at il dot ibm dot com  2007-02-19 08:28 ---
> Looks like possibly some bad interaction between vectorization of induction 
> and
> vectorization of strided-access. Will investigate. 

I looked into it with Ira, and it looks like the problem is that during
transformation we remove each of the stores of the interleaved-group as we scan
the stmts, but we actually vectorize them all together only when we reach the
last store of the interleaved-group, at which point, we attempt to insert the
loop-update for the vectorized induction before one of the stores - and crash.  


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30843



[Bug c/30843] ice for legal code with -ftree-vectorize -O2

2007-02-18 Thread dorit at il dot ibm dot com


--- Comment #2 from dorit at il dot ibm dot com  2007-02-18 21:50 ---
I was able to reproduce it. Here's a reduced testcase:

void dacP98FillRGBMap( unsigned char *pBuffer )
{
unsigned long dw, dw1;
unsigned long *pdw = (unsigned long *)(pBuffer);

for( dw = 256, dw1 = 0; dw; dw--, dw1 += 0x01010101 ) {
  *pdw++ = dw1;
  *pdw++ = dw1;
  *pdw++ = dw1;
  *pdw++ = dw1;
}
}

Looks like possibly some bad interaction between vectorization of induction and
vectorization of strided-access. Will investigate. 


-- 

dorit at il dot ibm dot com changed:

   What    |Removed                     |Added
   --------+----------------------------+----------------------------
   Summary |ice for legal code with -   |ice for legal code with -
           |ftree-vectorize -O2         |ftree-vectorize -O2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30843



[Bug tree-optimization/30795] [4.3 Regression] ice for legal code with -ftree-vectorize -O2

2007-02-18 Thread dorit at il dot ibm dot com


--- Comment #4 from dorit at il dot ibm dot com  2007-02-18 16:42 ---
patch: http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01555.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30795



[Bug tree-optimization/30795] [4.3 Regression] ice for legal code with -ftree-vectorize -O2

2007-02-15 Thread dorit at il dot ibm dot com


--- Comment #3 from dorit at il dot ibm dot com  2007-02-15 10:21 ---
I'll look into it.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30795



[Bug tree-optimization/30771] ice for legal code with -O2 -ftree-vectorize

2007-02-12 Thread dorit at il dot ibm dot com


--- Comment #4 from dorit at il dot ibm dot com  2007-02-12 14:23 ---
I'm testing the patch below. (I wasn't able to reproduce the problem in the
attached testcase, but here's a reduced testcase for the problem that Richi
described - thanks!:

int a[128];
int
main()
{
  short i;

  for (i=0; i<64; i++){
a[i] = (int)i;
  }
  return 0;
}

)

Index: tree-vect-analyze.c
===
--- tree-vect-analyze.c (revision 121843)
+++ tree-vect-analyze.c (working copy)
@@ -97,8 +97,12 @@
   int nbbs = loop->num_nodes;
   block_stmt_iterator si;
   unsigned int vectorization_factor = 0;
+  tree scalar_type;
+  tree phi;
+  tree vectype;
+  unsigned int nunits;
+  stmt_vec_info stmt_info;
   int i;
-  tree scalar_type;

   if (vect_print_dump_info (REPORT_DETAILS))
 fprintf (vect_dump, "=== vect_determine_vectorization_factor ===");
@@ -107,12 +111,67 @@
 {
   basic_block bb = bbs[i];

+  for (phi = phi_nodes (bb); phi; phi = PHI_CHAIN (phi))
+   {
+ stmt_info = vinfo_for_stmt (phi);
+ if (vect_print_dump_info (REPORT_DETAILS))
+   {
+ fprintf (vect_dump, "==> examining phi: ");
+ print_generic_expr (vect_dump, phi, TDF_SLIM);
+   }
+
+ gcc_assert (stmt_info);
+
+ /* Two cases of "relevant" phis: those that define an 
+induction that is used in the loop, and those that
+define a reduction.  */
+ if ((STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_loop
+  && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+ || (STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction
+ && STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def))
+{
+ gcc_assert (!STMT_VINFO_VECTYPE (stmt_info));
+  scalar_type = TREE_TYPE (PHI_RESULT (phi));
+
+ if (vect_print_dump_info (REPORT_DETAILS))
+   {
+ fprintf (vect_dump, "get vectype for scalar type:  ");
+ print_generic_expr (vect_dump, scalar_type, TDF_SLIM);
+   }
+
+ vectype = get_vectype_for_scalar_type (scalar_type);
+ if (!vectype)
+   {
+ if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
+   {
+ fprintf (vect_dump,
+  "not vectorized: unsupported data-type ");
+ print_generic_expr (vect_dump, scalar_type, TDF_SLIM);
+   }
+ return false;
+   }
+ STMT_VINFO_VECTYPE (stmt_info) = vectype;
+
+ if (vect_print_dump_info (REPORT_DETAILS))
+   {
+ fprintf (vect_dump, "vectype: ");
+ print_generic_expr (vect_dump, vectype, TDF_SLIM);
+   }
+
+ nunits = TYPE_VECTOR_SUBPARTS (vectype);
+ if (vect_print_dump_info (REPORT_DETAILS))
+   fprintf (vect_dump, "nunits = %d", nunits);
+
+ if (!vectorization_factor
+ || (nunits > vectorization_factor))
+   vectorization_factor = nunits;
+   }
+   }
+
   for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
 {
  tree stmt = bsi_stmt (si);
- unsigned int nunits;
- stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
- tree vectype;
+ stmt_info = vinfo_for_stmt (stmt);

  if (vect_print_dump_info (REPORT_DETAILS))
{
@@ -269,10 +328,11 @@
return false;
  }

- if (STMT_VINFO_RELEVANT_P (stmt_info))
+ if (STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_loop
+ && STMT_VINFO_DEF_TYPE (stmt_info) != vect_induction_def)
{
  /* Most likely a reduction-like computation that is used
-in the loop.  */
+in the loop.  */
  if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
fprintf (vect_dump, "not vectorized: unsupported pattern.");
 return false;
@@ -2235,17 +2295,7 @@

 (case 2)
If STMT has been identified as defining a reduction variable, then
-  we have two cases:
-  (case 2.1)
-The last use of STMT is the reduction-variable, which is defined
-by a loop-header-phi. We don't want to mark the phi as live or
-relevant (because it does not need to be vectorized, it is handled
- as part of the vectorization of the reduction), so in this case we
-skip the call to vect_mark_relevant.
-  (case 2.2)
-The rest of the uses of STMT are defined in the loop body. For
- the def_stmt of these uses we want to set l

[Bug c++/30771] ice for legal code with -O2 -ftree-vectorize

2007-02-12 Thread dorit at il dot ibm dot com


--- Comment #3 from dorit at il dot ibm dot com  2007-02-12 10:11 ---
I'll look into it.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30771



[Bug tree-optimization/29145] unsafe use of restrict qualifier

2007-02-06 Thread dorit at il dot ibm dot com


--- Comment #11 from dorit at il dot ibm dot com  2007-02-06 08:18 ---
(In reply to comment #10)
> One thing I can think of that this description misses is that the two 
> pointers must be based-on *different* restrict-qualified pointers, unless 
> that case is already handled elsewhere.

yes, at the beginning of this function we check if the two pointers are the
same, and if so - we don't reach this part of the code. Since our
implementation of "based on" is the pointer itself (i.e. "is ptr_a based on
some restricted pointer ptr_b" is implemented as "is ptr_a a restricted
pointer"), we are safe. You're right though that when the implementation of
"based on" is extended, we'd need to compare the two restricted pointers (we
now compare the two ptr_a's, we'd need to compare the two ptr_b's). 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29145



[Bug tree-optimization/29145] unsafe use of restrict qualifier

2007-01-07 Thread dorit at il dot ibm dot com


--- Comment #8 from dorit at il dot ibm dot com  2007-01-07 20:22 ---
I'm testing this patch, that makes us more conservative, and concludes that two
pointers don't overlap only if both are "based on" restricted pointers, with
"based on" trivially implemented as the pointer used in the reference itself.
In addition, we check that the declarations of both pointers are in the
parameter list of the same function (to be safe w.r.t the scope of the pointer
declarations). Looks like this should be safe enough?

Most of the vectorizer testcases still get vectorized with this patch. Two
testcases that don't, however, are - vect-[74,80].c, for which we need a bit
less trivial implementation of "based on". We can start with this conservative
implementation, switch this PR to a "missed optimization", and gradually work
on relaxing the restrictions as much as we can.
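
For illustration only: two hypothetical C functions showing the distinction described above. In the first, both references go directly through restrict-qualified parameters of the same function, which is the case the conservative check is meant to accept. In the second, the store goes through a pointer derived from a restrict parameter; treating it as "based on" `dst` is the kind of less trivial analysis mentioned above. Whether the patch actually rejects the second form depends on how the base address of the reference is computed, which this sketch does not model.

```c
/* Both accesses use the restrict-qualified parameters themselves.  */
void copy_plus_one (int *restrict dst, const int *restrict src, int n)
{
  int i;
  for (i = 0; i < n; i++)
    dst[i] = src[i] + 1;
}

/* The store goes through q, which is derived from dst but is not
   itself a restrict-qualified parameter.  */
void copy_plus_one_derived (int *restrict dst, const int *restrict src, int n)
{
  int *q = dst + 1;
  int i;
  for (i = 0; i < n - 1; i++)
    q[i] = src[i] + 1;
}
```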


Index: tree-data-ref.c
===
--- tree-data-ref.c (revision 120551)
+++ tree-data-ref.c (working copy)
@@ -490,6 +490,7 @@
   tree addr_a = DR_BASE_ADDRESS (dra);
   tree addr_b = DR_BASE_ADDRESS (drb);
   tree type_a, type_b;
+  tree decl_a, decl_b;
   bool aliased;

   if (!addr_a || !addr_b)
@@ -547,14 +548,25 @@
 }

   /* An instruction writing through a restricted pointer is "independent" of any
- instruction reading or writing through a different pointer, in the same 
- block/scope.  */
-  else if ((TYPE_RESTRICT (type_a) && !DR_IS_READ (dra))
-  || (TYPE_RESTRICT (type_b) && !DR_IS_READ (drb)))
+ instruction reading or writing through a different restricted pointer, 
+ in the same block/scope.  */
+  else if (TYPE_RESTRICT (type_a)
+  &&  TYPE_RESTRICT (type_b) 
+  && (!DR_IS_READ (drb) || !DR_IS_READ (dra))
+  && TREE_CODE (DR_BASE_ADDRESS (dra)) == SSA_NAME
+  && (decl_a = SSA_NAME_VAR (DR_BASE_ADDRESS (dra)))
+  && TREE_CODE (decl_a) == PARM_DECL
+  && TREE_CODE (DECL_CONTEXT (decl_a)) == FUNCTION_DECL
+  && TREE_CODE (DR_BASE_ADDRESS (drb)) == SSA_NAME
+  && (decl_b = SSA_NAME_VAR (DR_BASE_ADDRESS (drb)))
+  && TREE_CODE (decl_b) == PARM_DECL
+  && TREE_CODE (DECL_CONTEXT (decl_b)) == FUNCTION_DECL
+  && DECL_CONTEXT (decl_a) == DECL_CONTEXT (decl_b)) 
 { 
   *differ_p = true;
   return true;
 }
+
   return false;
 }


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29145



[Bug tree-optimization/30038] Call to sin(x), cos(x) should be transformed to sincos(x)

2006-12-12 Thread dorit at il dot ibm dot com


--- Comment #16 from dorit at il dot ibm dot com  2006-12-12 20:59 ---
(In reply to comment #13)
Looks like what's blocking vectorization of the loop is:

sinc.f90:8: note: value used after loop.
sinc.f90:8: note: not vectorized: relevant stmt not supported: D.1408_32 =
(*radius_31)[D.1407_30]

i.e., there is a value computed in the loop that is also used after the loop
(coefficient__lsm.61_26), and the above stmt is in its def-use chain, as can be
seen from the loop snippet below: 


  # n_3 = PHI <1(3), n_70(5)>;
:;
...
  D.1408_32 = (*radius_31)[D.1407_30];
...
  D.1410_35 = reciptmp.60_24 * D.1408_32;
...
  D.1419_63 = D.1410_35 * pretmp.53_51;
  coefficient__lsm.61_26 = D.1419_63;
...
  (*tmp_49)[D.1426_67] = D.1419_63;
  n_70 = n_3 + 1;
  if (n_3 == D.1398_5) goto ; else goto ;

:;
  goto  ();

  # coefficient__lsm.61_54 = PHI ;
:;
  *coefficient_44 = coefficient__lsm.61_54;

:;
  return;


We currently support a computation that is used after the loop only if the
computation is a reduction. We have a patch in autovect branch that provides
the first step towards supporting this situation in general, but it needs more
work. How important do you think this feature is?

In the meantime you can try to use a different variable for the coefficient
inside the loop, and after the loop read the desired value from memory to set
the coefficient function argument (hopefully this will disconnect the use
outside the loop from the def inside the loop).
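
For illustration only: a C analogue of the suggested workaround (the PR's code is Fortran and is not reproduced here). All names and the parameter list are hypothetical. The first function writes the result through `coefficient` in every iteration, so after store motion the last value is live after the loop; the second keeps a loop-local variable and reads the final value back from memory afterwards. Whether this is enough to get the loop vectorized is not verified here.

```c
/* Shape that leaves a loop-computed value in use after the loop.  */
void fill_and_keep_last (double *tmp, const double *radius, double scale,
                         int n, double *coefficient)
{
  int i;
  for (i = 0; i < n; i++)
    {
      *coefficient = scale * radius[i];
      tmp[i] = *coefficient;
    }
}

/* Workaround sketch: a separate variable inside the loop, and a read
   from memory after the loop to set the output argument.  */
void fill_then_read_last (double *tmp, const double *radius, double scale,
                          int n, double *coefficient)
{
  int i;
  for (i = 0; i < n; i++)
    {
      double c = scale * radius[i];   /* loop-local value */
      tmp[i] = c;
    }
  if (n > 0)
    *coefficient = tmp[n - 1];        /* final value read back from memory */
}
```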


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30038



[Bug tree-optimization/30038] Call to sin(x), cos(x) should be transformed to sincos(x)

2006-12-07 Thread dorit at il dot ibm dot com


--- Comment #11 from dorit at il dot ibm dot com  2006-12-07 20:19 ---
(In reply to comment #10)
> Using the three patches:
...
> gfortran is able to use sincos - and does so for my example (comment #0; the
> example, however, cannot be vectorized).

 why? (what does -fdump-tree-vect-details say?)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30038



[Bug fortran/29779] [4.3 Regression] vectorizer fortran testcases failing

2006-12-06 Thread dorit at il dot ibm dot com


--- Comment #12 from dorit at il dot ibm dot com  2006-12-06 22:22 ---

> By the way, you wrote 2006-11-17:
> > Should be submitted this weekend
> Any new ETA?

It was already submitted: 
http://gcc.gnu.org/ml/gcc-patches/2006-12/msg00110.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29779



[Bug fortran/29779] [4.3 Regression] vectorizer fortran testcases failing

2006-11-16 Thread dorit at il dot ibm dot com


--- Comment #7 from dorit at il dot ibm dot com  2006-11-17 06:46 ---
(In reply to comment #6)
> This patch should fix the problem:

indeed it does, thanks!
are you going to submit it to mainline?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29779



[Bug tree-optimization/29777] missed optimization: model missing widen_mult* idioms for SSE

2006-11-09 Thread dorit at il dot ibm dot com


--- Comment #2 from dorit at il dot ibm dot com  2006-11-09 20:26 ---
> But these files can be succesfully vectorized using current (gcc version 4.3.0
> 20061109) version on i686:
> gcc -O2 -msse2 -ftree-vectorize -fdump-tree-vect-all vect-widen-mult-sum.c
> vect-widen-mult-sum.c:16: note: LOOP VECTORIZED.
> vect-widen-mult-sum.c:12: note: vectorized 1 loops in function.

Probably because the i386 port models the "vect_unpack" and "vect_int_mult"
idioms (see
target-supports.exp:check_effective_target_vect_widen_mult_hi_to_si()): i.e.,
instead of recognizing it's a widening multiplication and vectorizing it as
such, it's vectorized by first unpacking (widening) the shorts to ints, and
then doing int multiplication, which is probably less efficient. Sorry for the
lack of clarity.
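
For illustration only: a scalar C sketch of the two strategies contrasted above. The function names and the fixed length `N` are assumptions for this sketch, and the arrays of ints merely stand in for unpacked vectors. The first loop is the shape a widening-multiply idiom would target directly; the second widens the shorts first and then does int multiplication.

```c
#define N 64   /* hypothetical loop count */

/* 16-bit inputs, 32-bit products: a widening multiplication.  */
int dot_widen (const short *x, const short *y)
{
  int i, sum = 0;
  for (i = 0; i < N; i++)
    sum += (int) x[i] * (int) y[i];
  return sum;
}

/* Unpack-then-multiply: widen both inputs to int first, then do
   ordinary int multiplication on the widened copies.  */
int dot_unpack_then_mult (const short *x, const short *y)
{
  int wx[N], wy[N];
  int i, sum = 0;
  for (i = 0; i < N; i++)
    {
      wx[i] = x[i];
      wy[i] = y[i];
    }
  for (i = 0; i < N; i++)
    sum += wx[i] * wy[i];
  return sum;
}
```

Both give the same result; the difference is only in how many operations are needed to get there.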

> > The missing insns (that should be merged from autovect-branch and debugged):
> > vec_widen_umult_hi_v8hi
> > vec_widen_umult_lo_v8hi
> These patterns _are_ present in gcc version 4.3.0 20061109 (experimental) in
> sse.md.

I'm sorry - I meant vec_widen_smult_hi_v8hi and vec_widen_smult_lo_v8hi.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29777



[Bug middle-end/29779] New: vectorizer fortran testcases failing

2006-11-09 Thread dorit at il dot ibm dot com
Looks like sometime between Oct27 
(http://gcc.gnu.org/ml/gcc-testresults/2006-10/msg01336.html) 
and Oct30 
(http://gcc.gnu.org/ml/gcc-testresults/2006-10/msg01538.html) 
the fortran vectorizer testcases started ICEing on:

gfortran.dg/vect/vect-3.f90:0: warning: 'const' attribute directive ignored
gfortran.dg/vect/vect-3.f90:4: internal compiler error: in
vect_setup_realignment, at tree-vect-transform.c:2534

Should be related somehow to this code in rs6000.c:

  /* Initialize target builtin that implements
 targetm.vectorize.builtin_mask_for_load.  */

  decl = add_builtin_function ("__builtin_altivec_mask_for_load",
   v16qi_ftype_long_pcvoid,
   ALTIVEC_BUILTIN_MASK_FOR_LOAD,
   BUILT_IN_MD, NULL,
   tree_cons (get_identifier ("const"),
  NULL_TREE, NULL_TREE));


Does anybody know which patch caused this?


-- 
   Summary: vectorizer fortran testcases failing
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dorit at il dot ibm dot com
 GCC build triplet: ppc*-*-linux
  GCC host triplet: ppc*-*-linux
GCC target triplet: ppc*-*-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29779



[Bug tree-optimization/29778] New: missed optimization: model missing vec_pack/unpack idioms for ia64

2006-11-09 Thread dorit at il dot ibm dot com
We need to port the ia64 support for vectorization of multiple-datatypes from
autovect-branch. This is the patch missing from mainline (wasn't included in
http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00166.html because I couldn't test
this):

2005-12-02  Richard Henderson  <[EMAIL PROTECTED]>

* config/ia64/ia64.c (TARGET_VECTORIZE_BUILTIN_EXTRACT_EVEN): New.
(TARGET_VECTORIZE_BUILTIN_EXTRACT_ODD): New.
(TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN,
TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD, ia64_builtin_mul_widen_even,
ia64_builtin_mul_widen_odd, builtin_ia64_pmpy_r, builtin_ia64_pmpy_l,
IA64_BUILTIN_PMPY_R, IA64_BUILTIN_PMPY_L): New
(ia64_init_builtins): Initialize builtin_ia64_pmpy_[rl].
(ia64_expand_builtin): Expand them.
(ia64_expand_unpack): New.
* config/ia64/vect.md (smulv4hi3_highpart, umulv4hi3_highpart): New.
(vec_pack_ssat_v4hi): Rename from pack2_sss.
(vec_pack_usat_v4hi): Rename from pack2_uss.
(vec_pack_ssat_v2si): Rename from pack4_sss.
(vec_pack_mod_v4hi, vec_pack_mod_v2si): New.
(vec_interleave_lowv8qi): Rename from unpack1_l.
(vec_interleave_highv8qi): Rename from unpack1_h.
(vec_interleave_lowv4hi): Rename from unpack2_l.
(vec_interleave_highv4hi): Rename from unpack2_h.
(vec_interleave_lowv2si): Rename from unpack4_l.
(vec_interleave_highv2si): Rename from unpack4_h.
(vec_unpacku_hi_v8qi, vec_unpacks_hi_v8qi): New.
(vec_unpacku_lo_v8qi, vec_unpacks_lo_v8qi): New.
(vec_unpacku_hi_v4hi, vec_unpacks_hi_v4hi): New.
(vec_unpacku_lo_v4hi, vec_unpacks_lo_v4hi): New.
* config/ia64/ia64-protos.h (ia64_expand_unpack): Declare.

Once the above is merged, we can add ia64 to the lists of targets that support
the following functions in testsuite/lib/target-supports.exp:
check_effective_target_vect_sdot_hi
check_effective_target_vect_udot_qi
check_effective_target_vect_sdot_qi
check_effective_target_vect_widen_sum_qi_to_hi
check_effective_target_vect_widen_sum_hi_to_si


-- 
   Summary: missed optimization: model missing vec_pack/unpack
idioms for ia64
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com
 GCC build triplet: ia64-*-*
  GCC host triplet: ia64-*-*
GCC target triplet: ia64-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29778



[Bug tree-optimization/29777] New: missed optimization: model missing widen_mult* idioms for SSE

2006-11-09 Thread dorit at il dot ibm dot com
The patch that adds support for vectorization of multiple data-types
(http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00166.html) was missing a few
bits from the i386 port that rth contributed to autovect-branch a while back.
This is because a couple of testcases were failing with these features:

The testcases that failed (on assembler error) are two of the tests that require
"vect_widen_mult_hi_to_si":
testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
testsuite/gcc.dg/vect/vect-widen-mult-s16.c
testsuite/gcc.dg/vect/vect-widen-mult-sum.c

The missing insns (that should be merged from autovect-branch and debugged):
vec_widen_umult_hi_v8hi
vec_widen_umult_lo_v8hi

When these are back in, we'll want to add i?86-*-* and x86_64-*-* to the list
of targets that return true in the function "vect_widen_mult_hi_to_si" in
testsuite/lib/target-supports.exp.


-- 
   Summary: missed optimization: model missing widen_mult*  idioms
for SSE
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dorit at il dot ibm dot com
 GCC build triplet: i?86-*-* and x86_64-*-*
  GCC host triplet: i?86-*-* and x86_64-*-*
GCC target triplet: i?86-*-* and x86_64-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29777



[Bug tree-optimization/29145] unsafe use of restrict qualifier

2006-11-05 Thread dorit at il dot ibm dot com


--- Comment #6 from dorit at il dot ibm dot com  2006-11-05 15:48 ---
(In reply to comment #5)
> This was something that slipped in, IIRC. I was of Ian's viewpoint, that
> may_alias_p should handle it, and it shouldn't be special to data-references.

yes, it was originally added as a temporary hack until alias analysis did
something with restrict


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29145



[Bug middle-end/29269] New: missing documentation for "vcond" (vector conditional operation)

2006-09-28 Thread dorit at il dot ibm dot com
missing documentation for "vcond" (vector conditional operation).


-- 
   Summary: missing documentation for "vcond" (vector conditional
operation)
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: dorit at il dot ibm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29269



[Bug middle-end/29268] New: missed optimization: need to generalize realignment support in the vectorizer

2006-09-28 Thread dorit at il dot ibm dot com
Details in this thread:
http://gcc.gnu.org/ml/gcc/2006-09/msg00503.html

Need to add other ways to handle realignment, that are applicable to targets
that can't support the realign_load the way it is currently defined.
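
For illustration only: a scalar C emulation of the general idea behind a realigning load, i.e. producing a misaligned vector from two aligned ones. The function name, the parameters and `NELTS` are assumptions for this sketch; it does not reproduce the vectorizer's actual interface, and targets differ in how (or whether) they can express the selection step.

```c
#define NELTS 4   /* assumed number of int elements per vector */

/* aligned_base points at an aligned vector; the wanted data starts
   `offset` elements into it (0 <= offset < NELTS) and may spill into
   the next aligned vector.  */
void realign_load_sketch (const int *aligned_base, int offset, int *out)
{
  const int *lo = aligned_base;           /* first aligned vector */
  const int *hi = aligned_base + NELTS;   /* next aligned vector */
  int k;
  for (k = 0; k < NELTS; k++)
    {
      int idx = offset + k;
      /* Select each element from lo or hi depending on the offset.  */
      out[k] = (idx < NELTS) ? lo[idx] : hi[idx - NELTS];
    }
}
```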


-- 
   Summary: missed optimization: need to generalize realignment
support in the vectorizer
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29268



[Bug tree-optimization/29170] autovec cannot handle short+=short

2006-09-21 Thread dorit at il dot ibm dot com


--- Comment #4 from dorit at il dot ibm dot com  2006-09-21 19:30 ---
By the way, the testcase gets vectorized if you compile with -fwrapv.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29170



[Bug middle-end/29160] New: missed optimization: redundant casts prevent vectorization

2006-09-20 Thread dorit at il dot ibm dot com
Details in this thread:
http://gcc.gnu.org/ml/gcc/2006-09/msg00167.html

"
   A silly little testcase which the vectorizer doesn't vectorize:

unsigned char qa[128];
unsigned char qb[128];
unsigned char qc[128];
unsigned char qd[128];

void autovectqi (void)
{
int i;

for (i = 0; i < 128; i ++)
qd[i] = qa[i] ^ qb[i] + qc[i];
}

...

   If I change 'qb[i] + qc[i]' to e.g. 'qb[i] & qc[i]' the vectorizer works
fine.

autovecttest.c:11: note: not vectorized: relevant stmt not supported: D.1861_9
= (signed char) D.1860_8
"

Devang suggested the solution should be part of a "tree-combin" pass:
http://gcc.gnu.org/ml/gcc/2006-09/msg00182.html

Dorit suggested adding it as part of the vectorizer's pattern-recognition
engine:
http://gcc.gnu.org/ml/gcc/2006-09/msg00281.html
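
For illustration only: the cast reported in the dump appears to come from C's integer promotions, where 'qb[i] + qc[i]' is computed in int and then narrowed again. The sketch below (function names are hypothetical, not taken from the thread) compares the promoted form with a form that stays in 8 bits, which is the sense in which the casts are redundant for this loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* As in the testcase: operands are promoted to int, the add and xor are
   done in int, and the store truncates back to unsigned char.  */
void xor_add_promoted(unsigned char *d, const unsigned char *a,
                      const unsigned char *b, const unsigned char *c,
                      size_t n)
{
    assert(d && a && b && c);
    for (size_t i = 0; i < n; i++)
        d[i] = (unsigned char)(a[i] ^ (b[i] + c[i]));
}

/* Narrow form: every intermediate is truncated to 8 bits, which is what a
   vector of unsigned chars would compute lane by lane.  */
void xor_add_narrow(unsigned char *d, const unsigned char *a,
                    const unsigned char *b, const unsigned char *c,
                    size_t n)
{
    assert(d && a && b && c);
    for (size_t i = 0; i < n; i++) {
        unsigned char sum = (unsigned char)(b[i] + c[i]);
        d[i] = (unsigned char)(a[i] ^ sum);
    }
}

/* Returns 1 if both forms agree for every (b, c) byte pair, with all
   elements of a set to a_val.  */
int forms_agree(unsigned char a_val)
{
    unsigned char a[256], b[256], c[256], d1[256], d2[256];

    memset(a, a_val, sizeof a);
    for (int bv = 0; bv < 256; bv++) {
        for (int k = 0; k < 256; k++) {
            b[k] = (unsigned char)bv;
            c[k] = (unsigned char)k;
        }
        xor_add_promoted(d1, a, b, c, 256);
        xor_add_narrow(d2, a, b, c, 256);
        if (memcmp(d1, d2, 256) != 0)
            return 0;
    }
    return 1;
}
```

Calling forms_agree with a few values of a_val exercises all 65536 (b, c) pairs each time; the low 8 bits of the result do not depend on whether the addition was widened first.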


-- 
   Summary: missed optimization: redundant casts prevent
vectorization
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29160



[Bug middle-end/28684] Imprecise -funsafe-math-optimizations definition

2006-09-11 Thread dorit at il dot ibm dot com


--- Comment #4 from dorit at il dot ibm dot com  2006-09-11 10:57 ---
> You could help by looking at the source code (there are only a few dozens
> places mentioning flag_unsafe_math_optimizations) and auditing which places
> would be more suited to a new flag_reassociate_fp variable.

we'd be very interested in allowing the vectorizer to work under
flag_reassociate_fp rather than flag_unsafe_math_optimizations, so we'll give
this a try. 


-- 

dorit at il dot ibm dot com changed:

   What|Removed |Added

 CC||eres at il dot ibm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28684



[Bug tree-optimization/26969] [4.1 Regression] ICE with -O1 -funswitch-loops -ftree-vectorize

2006-08-31 Thread dorit at il dot ibm dot com


--- Comment #12 from dorit at il dot ibm dot com  2006-09-01 05:43 ---
oops - I didn't notice it was open against 4.1.
So hopefully porting Victor's patch to 4.1 would fix it.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26969



[Bug tree-optimization/26969] [4.1 Regression] ICE with -O1 -funswitch-loops -ftree-vectorize

2006-08-31 Thread dorit at il dot ibm dot com


--- Comment #10 from dorit at il dot ibm dot com  2006-08-31 08:22 ---
I think this can be closed?
(I opened a missed-optimization PR instead - PR28643)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26969



[Bug tree-optimization/27742] [4.2 regression] ICE with -ftree-vectorizer-verbose

2006-08-31 Thread dorit at il dot ibm dot com


--- Comment #9 from dorit at il dot ibm dot com  2006-08-31 08:08 ---
I have been unsuccessful in reproducing this problem on an i386-redhat-linux.

I don't get a failure compiling the testcase from comment 8.

I tried to compile the testcase from comment 7 and got the following errors:

g++ -O1 -g -ftree-vectorize -ftree-vectorizer-verbose=5 -S G2\[1].ii

G2[1].ii:2154: error: integer constant is too large for ‘long’ type
G2[1].ii:2154: error: integer constant is too large for ‘long’ type
G2[1].ii:2156: error: integer constant is too large for ‘long’ type
G2[1].ii:425: warning: ‘__malloc__’ attribute ignored
G2[1].ii:1662: warning: no matching push for ‘#pragma GCC visibility pop’
G2[1].ii:2065: error: ‘operator new’ takes type ‘size_t’ (‘unsigned int’) as
first parameter
G2[1].ii:2065: error: ‘operator new’ takes type ‘size_t’ (‘unsigned int’) as
first parameter
G2[1].ii:2065: error: ‘operator new’ takes type ‘size_t’ (‘unsigned int’) as
first parameter
G2[1].ii:2065: error: ‘operator new’ takes type ‘size_t’ (‘unsigned int’) as
first parameter
G2[1].ii:2065: error: ‘operator new’ takes type ‘size_t’ (‘unsigned int’) as
first parameter
G2[1].ii:2065: error: ‘operator new’ takes type ‘size_t’ (‘unsigned int’) as
first parameter
G2[1].ii: In static member function ‘static long int std::numeric_limits::min()’:
G2[1].ii:2154: warning: overflow in implicit constant conversion
G2[1].ii: In static member function ‘static long int std::numeric_limits::max()’:
G2[1].ii:2154: warning: overflow in implicit constant conversion
G2[1].ii: In static member function ‘static long unsigned int
std::numeric_limits::max()’:
G2[1].ii:2156: warning: overflow in implicit constant conversion


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27742



[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-09 Thread dorit at il dot ibm dot com


--- Comment #55 from dorit at il dot ibm dot com  2006-08-09 19:10 ---
Subject: Re:  [4.0/4.1 Regression] gcc 4 produces worse x87 code
 on all platforms than gcc 3

>
> Here's some questions I need to figure out:
> (1) Why do I have to throw the -funsafe-math-optimizations flag to
> enable this?
>-- I see where the .vect file warns of it, but it refers to an SSA line,
>   so I'm not sure what's going on.

This flag is needed in order to allow vectorization of reduction (summation
in your case) of floating-point data. This is because vectorization of
reduction changes the order of the computation, which may result in
different behavior (instead of summing this way:
((((((a0+a1)+a2)+a3)+a4)+a5)+a6)+a7, we sum this way:
(((a0+a2)+a4)+a6)+(((a1+a3)+a5)+a7)).
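
As a rough illustration of why the order matters (a sketch only; the function names and the two-lane layout are assumptions chosen to match the orders written above, not the vectorizer's actual output):

```c
#include <assert.h>
#include <stddef.h>

/* Scalar order: ((a0+a1)+a2)+a3 ...  */
float sum_sequential(const float *a, size_t n)
{
    assert(a != NULL || n == 0);
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Order of a two-lane vectorized reduction: even and odd elements are
   accumulated separately and the partial sums are combined at the end.  */
float sum_two_lanes(const float *a, size_t n)
{
    assert(a != NULL || n == 0);
    float even = 0.0f, odd = 0.0f;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        even += a[i];
        odd += a[i + 1];
    }
    if (i < n)              /* leftover element when n is odd */
        even += a[i];
    return even + odd;
}
```

With the input {1e30f, 1.0f, -1e30f, 1.0f}, the sequential order loses the first 1.0f when it is added to 1e30f and returns 1.0f, while the two-lane order cancels the large values first and returns 2.0f. Integer addition does not have this problem, which is why the flag is only needed for floating-point reductions.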

> (2) Is there any pragma or assertion, etc, that I can put in the code to
> notify the compiler that certain pointers point to 16-byte aligned data?
> -- Only the output array (C) is possibly misaligned in ATLAS
>

Not really, I'm afraid - there is something that's not entirely supported
in gcc yet - see details in PR20794.

dorit

> Thanks,
> Clint
>
>
> --
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
>


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827



[Bug tree-optimization/28643] redundant phi-node in latch-block prevents vectorization

2006-08-08 Thread dorit at il dot ibm dot com


--- Comment #3 from dorit at il dot ibm dot com  2006-08-08 07:38 ---
> Err, SSA copy prop should be enough, actually, since after copy-prop,
> the phi will have no users (and they shouldn't care about code with no
> uses that doesn't access memory).
> Though it's interesting that this redundant phi survives so long. What
> is creating it?

I think it's loop-unswitch


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28643



[Bug middle-end/28643] New: redundant phi-node in latch-block prevents vectorization

2006-08-07 Thread dorit at il dot ibm dot com
Since the fix for PR26969, we now fail to vectorize loops that have redundant
phi-nodes in their (otherwise empty) latch block.
The testcase committed with the PR fix is an example for such a case.
See http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00034.html for more details.


-- 
   Summary: redundant phi-node in latch-block prevents vectorization
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com
 GCC build triplet: powerpc-linux
  GCC host triplet: powerpc-linux
GCC target triplet: powerpc-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28643



[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-07 Thread dorit at il dot ibm dot com


--- Comment #43 from dorit at il dot ibm dot com  2006-08-07 20:35 ---
> I'm all for this.  info gcc says that w/o a guarantee of alignment, loops are
> duped, with an if selecting between vector and scalar loops, is this not
> accurate?  

yes

>I spent a day trying to get gcc to vectorize any of the generator's
> loops, and did not succeed (can you make it vectorize the provided benchmark
> code?).  

The aggressive unrolling in the provided example seems to be the first obstacle
to vectorizing the code.

> I also tried various unrollings of the inner loop, particularly no
> unrolling and unroll=2 (vector length).  I was unable to truly decipher the
> warning messages explaining the lack of vectorization, and I would truly
> welcome some help in fixing this.

I'd be happy to help decipher the vectorizer's dump file. please send the
un-unrolled version and the dump file generated by -fdump-tree-vect-details,
and I'll see if I can help.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827



[Bug middle-end/27770] [4.2 Regression] wrong code in spec tests for -ftree-vectorize -maltivec

2006-08-07 Thread dorit at il dot ibm dot com


--- Comment #25 from dorit at il dot ibm dot com  2006-08-07 07:09 ---
(In reply to comment #24)
> Fixed, a new different bug for the missed optimization should be opened.

It's PR28628.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27770



[Bug middle-end/28628] New: Not forcing alignment of arrays in structs with -fsection-anchors

2006-08-07 Thread dorit at il dot ibm dot com
Since the fix to PR27770, we now miss opportunities to align some arrays when
-fsection-anchors is enabled. The patch for PR27770 increases the alignment of
(global) arrays only. We have a few testcases though (e.g.
section-anchors-vect-69.c) that have global structs that contain fields that
are arrays. Aligning the beginning of these structs can sometimes align one/some
of their array fields. Since the new function cgraph_increase_alignment does
not attempt to do that, we have cases that will be vectorized less efficiently.
To solve this we need to extend the optimization to align global structs that
have array fields that could become aligned as a result.
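
A small sketch of the situation, for illustration: the struct layout, the 16-byte vector size and the names below are assumptions, and C11 _Alignas stands in for what raising the alignment of the declaration inside the compiler would achieve.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_ALIGN 16    /* assumed vector size in bytes */

/* Hypothetical global struct with array fields.  With 4-byte float and
   int, x sits at offset 0 and y at offset 36.  */
struct pair {
    float x[8];
    int   tag;
    float y[8];
};

/* Aligning the start of the struct aligns x as a side effect; y stays
   misaligned because its offset is not a multiple of VEC_ALIGN.  */
_Alignas(VEC_ALIGN) struct pair g;

/* Returns 1 if p is aligned on a vector boundary.  */
int field_is_aligned(const void *p)
{
    assert(p != NULL);
    return ((uintptr_t)p % VEC_ALIGN) == 0;
}
```

Here field_is_aligned(g.x) holds and field_is_aligned(g.y) does not, which matches the "one/some of their array fields" wording: aligning the struct helps only the fields whose offsets happen to be multiples of the vector size.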


-- 
   Summary: Not forcing alignment of arrays in structs with
-fsection-anchors
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com
 GCC build triplet: powerpc-linux
  GCC host triplet: powerpc-linux
GCC target triplet: powerpc-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28628



[Bug middle-end/27770] [4.2 Regression] wrong code in spec tests for -ftree-vectorize -maltivec

2006-07-23 Thread dorit at il dot ibm dot com


--- Comment #19 from dorit at il dot ibm dot com  2006-07-23 19:03 ---
> The fix we've agreed is best in principle is to speculatively increase
> the DECL_ALIGN of vectorisable variables before compiling functions.
> Dorit says that there is a patch related to this on the autovect branch,
> which I'll look at when I get back from Ottawa.
> Richard

Turns out the patch I was thinking about is only for the rs6000 port:
http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00266.html
so that's not much help.

Do we want to implement this as a separate pass? at which point of the
compilation? (doing it during ipa might be a problem if ipa is not enabled?) 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27770



[Bug tree-optimization/26197] [4.2 regression] ICE in is_old_name with vectorizer

2006-03-01 Thread dorit at il dot ibm dot com


--- Comment #13 from dorit at il dot ibm dot com  2006-03-01 12:35 ---
So I'll submit the patch to gcc-patches for approval. Can someone please check
if this patch actually solves this PR?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26197



[Bug tree-optimization/26197] [4.2 regression] ICE in is_old_name with vectorizer

2006-02-28 Thread dorit at il dot ibm dot com


--- Comment #11 from dorit at il dot ibm dot com  2006-02-28 08:26 ---
Created an attachment (id=10935)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10935&action=view)
tentative patch

I get a similar error message when trying to bootstrap mainline with
vectorization enabled:

/home/dorit/mainline_svn/build2/./prev-gcc/xgcc
-B/home/dorit/mainline_svn/build2/./prev-gcc/
-B/home/dorit/mainline_svn2/ppc64-yellowdog-linux/bin/ -c   -g -O2
-ftree-vectorize -maltivec -DIN_GCC   -W -Wall -Wwrite-strings
-Wstrict-prototypes -Wmissing-prototypes -pedantic -Wno-long-long
-Wno-variadic-macros -Wno-overlength-strings -Wold-style-definition
-Wmissing-format-attribute -Werror -fno-common   -DHAVE_CONFIG_H -I. -I.
-I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include
-I../../gcc/gcc/../libcpp/include  -I../../gcc/gcc/../libdecnumber
-I../libdecnumber ../../gcc/gcc/recog.c -o recog.o
../../gcc/gcc/recog.c: In function ‘constrain_operands’:
../../gcc/gcc/recog.c:2270: internal compiler error: tree check: expected
ssa_name, have struct_field_tag in verify_ssa, at tree-ssa.c:735
make[3]: *** [recog.o] Error 1
make[3]: Leaving directory `/home/dorit/mainline_svn/build2/gcc'
make[2]: *** [all-stage2-gcc] Error 2
make[2]: Leaving directory `/home/dorit/mainline_svn/build2'
make[1]: *** [stage2-bubble] Error 2
make[1]: Leaving directory `/home/dorit/mainline_svn/build2'
make: *** [bootstrap] Error 2

Following Zdenek's observations, I tried the attached patch. It solves this
failure above in recog.c, but it fails bootstrap with vectorization enabled
later on. (It does pass regular bootstrap on ppc-linux). So this patch needs to
be further examined, but I wonder if it fixes this PR (I can't reproduce it)?

About the patch: 

new_type_alias() originally looked like this:

TAG <- new tag for ptr;
if (var has subvars){
   foreach subvar
      add the subvar as may-alias of TAG.
}
else{
   get the may-aliases of var;
   if (|may-aliases| == 1)
      set the (single) may-alias of var as the new tag of ptr;
   else if (|may-aliases| == 0)
      add var as may-alias of the TAG;
   else /* |may-aliases| > 1 */
      add the may-aliases of var as may-aliases of TAG;
}

What I did is basically factored out the 'else' part into a separate function,
and called that function also in the 'if' part, for each subvar; this way, we
don't add the subvar as may-alias of TAG if the subvar itself has may-aliases,
but add its may-aliases instead:

new version of new_type_alias():

TAG <- new tag for ptr;
if (var has subvars){
   foreach subvar
      add_may_aliases_for_new_tag (TAG, subvar)
}
else{
   add_may_aliases_for_new_tag (TAG, var)
}

add_may_aliases_for_new_tag (TAG, var)
{
   get the may-aliases of var;
   if (|may-aliases| == 1)
      set the (single) may-alias of var as the new tag of ptr;
   else if (|may-aliases| == 0)
      add var as may-alias of the TAG;
   else /* |may-aliases| > 1 */
      add the may-aliases of var as may-aliases of TAG;
}

Makes sense to anyone?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26197



[Bug tree-optimization/26419] -ftree-vectorizer-verbose=n documentation is terse

2006-02-26 Thread dorit at il dot ibm dot com


--- Comment #3 from dorit at il dot ibm dot com  2006-02-26 11:05 ---
patch: http://gcc.gnu.org/ml/gcc-patches/2006-02/msg01905.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26419



[Bug tree-optimization/26420] -ftree-vectorizer-verbose=1 prints unvectorized loops information

2006-02-26 Thread dorit at il dot ibm dot com


--- Comment #2 from dorit at il dot ibm dot com  2006-02-26 11:01 ---
For -ftree-vectorizer-verbose=1 the vectorizer reports each loop that got
vectorized, and also the total number of loops that got vectorized, even if
that number is zero. If preferable, we can report that 0 loops got vectorized
only under -ftree-vectorizer-verbose=2, or higher. The patch below makes the
"vectorized 0 loops" be reported for verbosity level 2 and higher. Shall I
suggest the patch for mainline?

Index: tree-vectorizer.c
===
*** tree-vectorizer.c   (revision 111450)
--- tree-vectorizer.c   (working copy)
*** vectorize_loops (struct loops *loops)
*** 2047,2053 
num_vectorized_loops++;
  }

!   if (vect_print_dump_info (REPORT_VECTORIZED_LOOPS))
  fprintf (vect_dump, "vectorized %u loops in function.\n",
 num_vectorized_loops);

--- 2047,2058 
num_vectorized_loops++;
  }

!   if (num_vectorized_loops > 0
!   && vect_print_dump_info (REPORT_VECTORIZED_LOOPS))
! fprintf (vect_dump, "vectorized %u loops in function.\n",
!num_vectorized_loops);
!   else if (num_vectorized_loops == 0
!  && vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
  fprintf (vect_dump, "vectorized %u loops in function.\n",
 num_vectorized_loops);


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26420



[Bug tree-optimization/26360] [4.2 Regression] Autovectorization of char -> int loop gets ICE

2006-02-21 Thread dorit at il dot ibm dot com


--- Comment #3 from dorit at il dot ibm dot com  2006-02-21 22:02 ---
patch:
http://gcc.gnu.org/ml/gcc-patches/2006-02/msg01713.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26360



[Bug tree-optimization/26359] [4.2 Regression] Over optimization of loop when using -ftree-vectorize

2006-02-21 Thread dorit at il dot ibm dot com


--- Comment #3 from dorit at il dot ibm dot com  2006-02-21 22:01 ---
patch:
http://gcc.gnu.org/ml/gcc-patches/2006-02/msg01710.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26359



[Bug tree-optimization/26362] ICE on the autovect-branch (gfortran example)

2006-02-20 Thread dorit at il dot ibm dot com


--- Comment #2 from dorit at il dot ibm dot com  2006-02-20 17:09 ---
Actually there's this patch by rth that seems to fix this ICE; it's from a
while back, I don't think it was fully tested at the time, and I'm not sure it
provides all the missing bits/fixes for SSE support.

=== targhooks.c
==
--- targhooks.c  (revision 108004)
+++ targhooks.c  (local)
@@ -448,7 +448,8 @@
   tree type;
   enum machine_mode mode;
   block_stmt_iterator bsi;
-  tree th, tl, result, x;
+  tree t1, t2, result, x;
+  int i, n;

   /* If the first argument is a type, just check if support
  is available. Return a non NULL value if supported, NULL_TREE otherwise.
@@ -472,31 +473,37 @@
 return NULL;

   bsi = bsi_for_stmt (stmt);
-  
-  th = make_rename_temp (type, NULL);
-  x = build2 (VEC_INTERLEAVE_HIGH_EXPR, type, vec1, vec2);
-  x = build2 (MODIFY_EXPR, type, th, x);
-  th = make_ssa_name (th, x);
-  TREE_OPERAND (x, 0) = th;
-  bsi_insert_before (&bsi, x, BSI_SAME_STMT);

-  tl = make_rename_temp (type, NULL);
-  x = build2 (VEC_INTERLEAVE_LOW_EXPR, type, vec1, vec2);
-  x = build2 (MODIFY_EXPR, type, tl, x);
-  tl = make_ssa_name (tl, x);
-  TREE_OPERAND (x, 0) = tl;
-  bsi_insert_before (&bsi, x, BSI_SAME_STMT);
+  n = exact_log2 (GET_MODE_NUNITS (mode)) - 1;
+  for (i = 0; i < n; ++i)
+{
+  t1 = create_tmp_var (type, NULL);
+  add_referenced_tmp_var (t1);
+  x = build2 (VEC_INTERLEAVE_HIGH_EXPR, type, vec1, vec2);
+  x = build2 (MODIFY_EXPR, type, t1, x);
+  t1 = make_ssa_name (t1, x);
+  TREE_OPERAND (x, 0) = t1;
+  bsi_insert_before (&bsi, x, BSI_SAME_STMT);

-  result = make_rename_temp (type, NULL);
-  /* ??? Endianness issues?  */
+  t2 = create_tmp_var (type, NULL);
+  add_referenced_tmp_var (t2);
+  x = build2 (VEC_INTERLEAVE_LOW_EXPR, type, vec1, vec2);
+  x = build2 (MODIFY_EXPR, type, t2, x);
+  t2 = make_ssa_name (t2, x);
+  TREE_OPERAND (x, 0) = t2;
+  bsi_insert_before (&bsi, x, BSI_SAME_STMT);
+
+  if (BYTES_BIG_ENDIAN)
+vec1 = t1, vec2 = t2;
+  else
+vec1 = t2, vec2 = t1;
+}
+  
   x = build2 (odd_p ? VEC_INTERLEAVE_HIGH_EXPR : VEC_INTERLEAVE_LOW_EXPR,
-  type, th, tl);
-  x = build2 (MODIFY_EXPR, type, result, x);
-  result = make_ssa_name (result, x);
-  TREE_OPERAND (x, 0) = result;
-  bsi_insert_before (&bsi, x, BSI_SAME_STMT);
+  type, vec1, vec2);
+  x = build2 (MODIFY_EXPR, type, dest, x);

-  return result;
+  return x;
 }

 tree



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26362



[Bug tree-optimization/26362] ICE on the autovect-branch (gfortran example)

2006-02-20 Thread dorit at il dot ibm dot com


--- Comment #1 from dorit at il dot ibm dot com  2006-02-20 16:45 ---
Looks like the vectorizer detects a strided access in this testcase. Strided
accesses are not entirely supported for SSE right now (work in progress...),
but it is enabled, so currently all strided testcases break on SSE.


-- 

dorit at il dot ibm dot com changed:

   What|Removed |Added

 CC||rth at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26362



[Bug tree-optimization/26197] [4.2 regression] ICE in is_old_name, at tree-into-ssa.c:466

2006-02-19 Thread dorit at il dot ibm dot com


--- Comment #10 from dorit at il dot ibm dot com  2006-02-19 16:10 ---
so maybe if an SFT has may-aliases then new_type_alias should add the
may-aliases of the SFT as may-aliases of the new tag, instead of adding the SFT
as a may-alias of the new tag?

There's a comment in new_type_alias that's quite worrying:
"/* The following is based on code in add_stmt_operand to ensure that the
 same defs/uses/vdefs/vuses will be found"
So this code depends on code in add_stmt_operand, that may have changed by
now... 

Related or not, there's another bug in new_type_alias in PR26359.


-- 

dorit at il dot ibm dot com changed:

   What|Removed |Added

 CC||victork at il dot ibm dot
   ||com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26197



[Bug tree-optimization/26359] [4.2 Regression] Over optimization of loop when using -ftree-vectorize

2006-02-19 Thread dorit at il dot ibm dot com


--- Comment #2 from dorit at il dot ibm dot com  2006-02-19 15:34 ---
The problem is that during dce the call to is_hidden_global_store returns false
because the tag is not marked as global/static.

This seems to fix it:

Index: tree-ssa-alias.c
===
*** tree-ssa-alias.c(revision 110911)
--- tree-ssa-alias.c(working copy)
*** new_type_alias (tree ptr, tree var)
*** 2638,2643 
--- 2638,2651 
add_may_alias (tag, al);
}
  }
+ 
+   /* CHECKME:
+   DECL_CONTEXT (tag) = DECL_CONTEXT (var);
+   TREE_PUBLIC  (tag) = TREE_PUBLIC (var);
+   TREE_READONLY (tag) = TREE_READONLY (var);
+   */
+   MTAG_GLOBAL (tag) = DECL_EXTERNAL (var);
+   TREE_STATIC (tag) = TREE_STATIC (var);
  }

but I don't know if it's the right thing to do in the general case.


-- 

dorit at il dot ibm dot com changed:

   What|Removed |Added

 CC|        |dorit at il dot ibm dot com,
   ||victork at il dot ibm dot
   ||com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26359



[Bug tree-optimization/26360] [4.2 Regression] Autovectorization of char -> int loop gets ICE

2006-02-19 Thread dorit at il dot ibm dot com


--- Comment #2 from dorit at il dot ibm dot com  2006-02-19 08:50 ---
This happens because we actually rely on dce taking place after the vectorizer
to clean up dead code. When we detect a pattern (widening-summation in this
case) we create a "dummy" stmt ("pattern-stmt") that represents the pattern and
that will be vectorized instead of the original sequence of stmts (that
involves in this case type promotions etc). The def of that "pattern-stmt" is
not connected to any use. This "pattern-stmt" is never meant to remain in the
code in its scalar form (if the loop is vectorized, there will be a vectorized
form of that stmt in the loop, but the scalar "pattern-stmt" will always remain
dead). So, two ways to handle this - either (1) have a "special" dce pass after
the vectorizer that is not disabled by -fno-tree-dce if -ftree-vectorize is on,
or (2) have the vectorizer clean up these pattern-stmts itself when it's done
with the loop; the vectorizer actually does scan the loop after it's done (in
order to free various data structures), so it basically wouldn't cost anything
to do this extra cleanup; the question is - wouldn't it be nicer if a pass
could rely on dce taking place right after it, instead of trying to do some of
the job itself?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26360



[Bug tree-optimization/26197] [4.2 regression] ICE in is_old_name, at tree-into-ssa.c:466

2006-02-13 Thread dorit at il dot ibm dot com


--- Comment #6 from dorit at il dot ibm dot com  2006-02-13 16:23 ---
(In reply to comment #5)
> Probably related to
> http://gcc.gnu.org/ml/gcc-patches/2006-01/msg00446.html

Would you expect then that calling mark_new_vars_to_rename, like you did in
your patch, will fix this problem?

I wasn't able to reproduce this error on powerpc-linux and i686-pc-linux-gnu. I
do realize that there's a problem with the setting of virtual operands in the
vectorizer. The over-conservativeness in the vectorizer with respect to setting
aliasing information for vector pointers when accessing struct fields may be
responsible for this. I will try to look into this issue. In the meantime,
could someone that can reproduce this problem try out the
mark_new_vars_to_rename patch that Zdenek suggested in the link?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26197



[Bug tree-optimization/25918] gcc.dg/vect/vect-reduc-dot-s16.c scan-tree-dump-times vectorized 1 loops 1 and gcc.dg/vect/vect-reduc-pattern-2.c scan-tree-dump-times vectorized 2 loops 1 fail

2006-02-08 Thread dorit at il dot ibm dot com


--- Comment #7 from dorit at il dot ibm dot com  2006-02-08 14:19 ---
(In reply to comment #5)
Will take care of that.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25918



[Bug tree-optimization/25918] gcc.dg/vect/vect-reduc-dot-s16.c scan-tree-dump-times vectorized 1 loops 1 and gcc.dg/vect/vect-reduc-pattern-2.c scan-tree-dump-times vectorized 2 loops 1 fail

2006-02-08 Thread dorit at il dot ibm dot com


--- Comment #6 from dorit at il dot ibm dot com  2006-02-08 14:17 ---
(In reply to comment #4)
> ... This happens
> because the IA-64 port defines the widen_ssumv4hi3 pattern.  The IA-64 port is
> the only one that defines this pattern, and hence is probably the only port
> "broken" here.  All others will presumably fail to vectorize this loop.

that's correct. it's actually a combination of being able to support
widen_ssumv4hi3 and (non widening) multiplication of shorts. looks like we need
to split these loops into separate testcases, and for this particular loop
expect vectorization if vect_widen_sum and vect_short_mult (new keyword) are
supported. 

> and the testcase fails because we only expected 1 loop to be vectorized.
> I think the only thing wrong here is that the dg-final tests in the testcase
> are not precise enough to handle this case.

indeed. Will take care of that.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25918



[Bug tree-optimization/25918] gcc.dg/vect/vect-reduc-dot-s16.c scan-tree-dump-times vectorized 1 loops 1 and gcc.dg/vect/vect-reduc-pattern-2.c scan-tree-dump-times vectorized 2 loops 1 fail

2006-01-26 Thread dorit at il dot ibm dot com


--- Comment #1 from dorit at il dot ibm dot com  2006-01-26 09:07 ---
Can you please send the dump files generated by -fdump-tree-vect-details?

reduc-dot-s16.c needs the sdot_prodv4hi pattern, which is implemented for ia64,
so I'd expect one loop to be vectorized. I wonder what's the problem there.

In vect-reduc-pattern-2.c - does the vectorizer report vectorizing one loop?
The one loop (that sums shorts into an int accumulator) needs the
widen_ssumv4hi3 pattern to be vectorized, which is implemented for ia64. Does
that loop get vectorized?
The second loop however (that sums chars into an int accumulator) cannot be
vectorized on ia64 because the mode of the result of the widen_ssumv8qi3
pattern as implemented on ia64 is short, not int. If this is indeed the reason
for the failure we'd probably want to introduce finer keywords to represent the
available widening support (in target-supports.exp we currently have just a
"vect_widen_sum" keyword, which does not distinguish between char-to-short
summation and char-to-int summation).
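
For reference, the two kinds of loop being discussed look roughly like the sketch below. This is not the text of vect-reduc-pattern-2.c; the function names and signatures are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* short-to-int widening summation: each 16-bit element is widened before
   being added to a 32-bit accumulator.  */
int sum_shorts(const short *x, size_t n)
{
    assert(x != NULL || n == 0);
    int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* char-to-int widening summation: the accumulator is two widening steps
   away from the element type.  */
int sum_chars(const signed char *x, size_t n)
{
    assert(x != NULL || n == 0);
    int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}
```

The distinction matters because a char-to-short summation cannot simply be substituted for the second loop: summing 300 chars of value 127 gives 38100, which fits in an int accumulator but is out of range for a 16-bit signed short.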


-- 

dorit at il dot ibm dot com changed:

   What|Removed |Added

 CC|                    |dorit at il dot ibm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25918



[Bug tree-optimization/25911] [4.2 Regression] ice in vect_recog_dot_prod_pattern

2006-01-24 Thread dorit at il dot ibm dot com


--- Comment #5 from dorit at il dot ibm dot com  2006-01-24 09:10 ---
Patch:

Index: tree-vect-patterns.c
===
--- tree-vect-patterns.c(revision 109954)
+++ tree-vect-patterns.c(working copy)
@@ -243,7 +243,8 @@
   gcc_assert (stmt);
   stmt_vinfo = vinfo_for_stmt (stmt);
   gcc_assert (stmt_vinfo);
-  gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_loop_def);
+  if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_loop_def)
+return NULL;
   expr = TREE_OPERAND (stmt, 1);
   if (TREE_CODE (expr) != MULT_EXPR)
 return NULL;


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25911



[Bug tree-optimization/25809] New: missed PRE optimization - move "invariant casts" out of loops

2006-01-16 Thread dorit at il dot ibm dot com
In testcases that have reduction, like gcc.dg/vect/vect-reduc-2char.c and
gcc.dg/vect/vect-reduc-2short.c, the following casts appear:

signed char sdiff;
unsigned char ux, udiff; 
sdiff_0 = ...
loop:
   # sdiff_41 = PHI ;
   .
   ux_36 = 
   udiff_37 = (unsigned char) sdiff_41;  
   udiff_38 = ux_36 + udiff_37;
   sdiff_39 = (signed char) udiff_38;
end_loop

although these casts could be taken out of the loop altogether, i.e., transform
the code into something like the following:

signed char sdiff;
unsigned char ux, udiff;
sdiff_0 = ...
udiff_1 = (unsigned char) sdiff_0;
loop:
   # udiff_3 = PHI ;
   .
   ux_36 = 
   udiff_2 = ux_36 + udiff_3;
end_loop
sdiff_39 = (signed char) udiff_2;
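
The same two shapes written as plain C, as a sketch for comparison (function names and signatures are hypothetical). It assumes that converting an out-of-range value to signed char wraps modulo 256, which is implementation-defined in C but is what GCC documents.

```c
#include <assert.h>
#include <stddef.h>

/* Original shape: the accumulator is a signed char, so every iteration
   casts to unsigned char, adds, and casts back.  */
signed char reduce_casts_in_loop(const unsigned char *x, size_t n,
                                 signed char init)
{
    assert(x != NULL || n == 0);
    signed char sdiff = init;
    for (size_t i = 0; i < n; i++) {
        unsigned char udiff = (unsigned char)sdiff;
        udiff = (unsigned char)(x[i] + udiff);
        sdiff = (signed char)udiff;
    }
    return sdiff;
}

/* Transformed shape: one cast before the loop, one after; the loop body
   works on unsigned char only.  */
signed char reduce_casts_hoisted(const unsigned char *x, size_t n,
                                 signed char init)
{
    assert(x != NULL || n == 0);
    unsigned char udiff = (unsigned char)init;
    for (size_t i = 0; i < n; i++)
        udiff = (unsigned char)(x[i] + udiff);
    return (signed char)udiff;
}
```

Both functions return the same value for any input under that assumption, because the signed/unsigned round trip inside the loop does not change the 8-bit pattern carried from one iteration to the next.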

see this discussion thread:
http://gcc.gnu.org/ml/gcc-patches/2005-12/msg01827.html


-- 
   Summary: missed PRE optimization - move "invariant casts" out of
loops
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com
 GCC build triplet: ppc64-yellowdog-linux
  GCC host triplet: ppc64-yellowdog-linux
GCC target triplet: ppc64-yellowdog-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25809



[Bug libfortran/21468] vectorizing libfortran

2006-01-08 Thread dorit at il dot ibm dot com


--- Comment #10 from dorit at il dot ibm dot com  2006-01-08 13:49 ---
> Reopening since many of the intrinsics could still vectorize better.

It could help if you list specific functions that you expect to get vectorized. As
far as dotprod is concerned - if it's operating on floats, you need to use
-ffast-math or -funsafe-math-optimizations to enable vectorization. If it's
dotprod of integers - probably the recent patches I sent to support reduction
patterns (http://gcc.gnu.org/ml/gcc-patches/2005-12/msg01896.html) would be
required (this functionality is present in autovect; you can try to see if it's
vectorized any better with autovect-branch).


-- 

dorit at il dot ibm dot com changed:

   What|Removed |Added

 CC|                |dorit at il dot ibm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21468




[Bug testsuite/25590] FAIL: gcc.dg/tree-ssa/gen-vect-11.c scan-tree-dump-times vectorized 1 loops 1

2006-01-03 Thread dorit at il dot ibm dot com


--- Comment #7 from dorit at il dot ibm dot com  2006-01-04 07:36 ---
(sorry, didn't notice it was already diagnosed as such)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25590




[Bug testsuite/25590] FAIL: gcc.dg/tree-ssa/gen-vect-11.c scan-tree-dump-times vectorized 1 loops 1

2006-01-03 Thread dorit at il dot ibm dot com


--- Comment #6 from dorit at il dot ibm dot com  2006-01-04 07:33 ---
Maybe related to:

2005-12-26  Kazu Hirata  <[EMAIL PROTECTED]>

PR tree-optimization/25125
* convert.c (convert_to_integer): Don't narrow the type of a
PLUS_EXPR or MINUS_EXPR if !flag_wrapv and the unwidened type
is signed.

(indeed this testcase fails vectorization due to cast to unsigned char).
If that's the case, it should probably be xfailed and the PR should be "missed
optimization"


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25590




[Bug target/25413] wrong alignment or incorrect address computation in vectorized code on Pentium 4 SSE

2005-12-15 Thread dorit at il dot ibm dot com


--- Comment #3 from dorit at il dot ibm dot com  2005-12-15 12:50 ---
related discussion: http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413



[Bug target/25413] wrong alignment or incorrect address computation in vectorized code on Pentium 4 SSE

2005-12-15 Thread dorit at il dot ibm dot com


--- Comment #2 from dorit at il dot ibm dot com  2005-12-15 12:41 ---
The problem is that the vectorizer applies loop-peeling in order to align the
data reference *(m->c+i), and peeling only works correctly if the data is
naturally aligned (aligned on its type size). This is what the vectorizer
currently blindly assumes, but on the Pentium 4 doubles are not necessarily
64-bit aligned.

Coincidentally, Devang and I discussed this issue last week, and Devang actually
committed a patch to apple-ppc branch that works around the problem
(http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=108214). Devang's patch,
however, will not fix this PR: the patch he committed disables vectorization if
the vectorizer was able to compute the misalignment and discovered that it
is not a multiple of the type size. In this testcase the misalignment is
unknown at compile time.

To fix this problem we need to disable loop-peeling in the vectorizer if we
can't prove that the data is naturally aligned. Alternatively, if we can't
prove it either way, we can peel the loop but control the number of iterations
it will execute using a runtime test (i.e. have the prolog loop iterate the
entire loop-count if at runtime we discover that the data is not naturally
aligned).
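
The runtime test described above can be sketched roughly as follows. This is
not the vectorizer's code: prolog_iterations and its parameters are
hypothetical names, and the sketch assumes the vector size is a multiple of
the element size.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of iterations the scalar prolog loop should run before the
   vector loop starts.
   addr      - runtime address of the first element accessed
   elem_size - size of the element type in bytes (e.g. 8 for double)
   vec_size  - vector size in bytes (e.g. 16 for SSE)
   niters    - total loop count */
size_t prolog_iterations(uintptr_t addr, size_t elem_size,
                         size_t vec_size, size_t niters)
{
    /* Not naturally aligned: no number of peeled iterations can reach a
       vector boundary, so run the entire loop in the scalar prolog. */
    if (addr % elem_size != 0)
        return niters;

    /* Naturally aligned: peel until the next vector boundary. */
    size_t misalign = addr % vec_size;
    size_t peel = ((vec_size - misalign) % vec_size) / elem_size;
    return peel < niters ? peel : niters;
}
```

For a double at an address that is 4 mod 8, which the comment above says can
happen on the Pentium 4, the function returns the full loop count, so the
vector loop never executes on misaligned data.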


-- 

dorit at il dot ibm dot com changed:

   What|Removed |Added

 CC|                    |dorit at il dot ibm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413



[Bug target/24378] [4.1/4.2 Regression] gcc.dg/vect/pr24300.c (test for excess errors) fails

2005-12-14 Thread dorit at il dot ibm dot com


--- Comment #9 from dorit at il dot ibm dot com  2005-12-14 15:38 ---
Thanks for testing the patch. I finally submitted it:
http://gcc.gnu.org/ml/gcc-patches/2005-12/msg01071.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24378


