[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

--- Comment #18 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-25 08:02:16 UTC ---
Author: jakub
Date: Tue Oct 25 08:02:08 2011
New Revision: 180424

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=180424
Log:
    PR tree-optimization/50596
    * tree-vect-stmts.c (vect_mark_relevant): Only use FOR_EACH_IMM_USE_FAST if lhs is SSA_NAME.
    (vectorizable_store): If is_pattern_stmt_p look through VIEW_CONVERT_EXPR on lhs.
    * tree-vect-patterns.c (check_bool_pattern, adjust_bool_pattern): Use unsigned type instead of signed.
    (vect_recog_bool_pattern): Optimize also stores into bool memory in addition to casts from bool to integral types.
    (vect_mark_pattern_stmts): If pattern_stmt already has vinfo created, don't create it again.

    * gcc.dg/vect/vect-cond-10.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/vect-cond-10.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-patterns.c
    trunk/gcc/tree-vect-stmts.c
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #19 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-25 08:23:32 UTC ---
Bool stores are handled now too.
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

--- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-16 13:10:26 UTC ---
Author: jakub
Date: Sun Oct 16 13:10:20 2011
New Revision: 180057

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=180057
Log:
    PR tree-optimization/50596
    * tree-vectorizer.h (NUM_PATTERNS): Increase to 7.
    * tree-vect-patterns.c (vect_vect_recog_func_ptrs): Add vect_recog_bool_pattern.
    (check_bool_pattern, adjust_bool_pattern_cast, adjust_bool_pattern, vect_recog_bool_pattern): New functions.

    * gcc.dg/vect/vect-cond-9.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/vect-cond-9.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-patterns.c
    trunk/gcc/tree-vectorizer.h
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

--- Comment #17 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2011-10-16 13:47:22 UTC ---
Cool! Even

signed char k[1024];
void foo6() {
  for (int i=0; i!=N; ++i)
    k[i] = (a[i]<b[i]) & (c[i]<d[i]);
}

vectorizes! With bool k[1024]; it does not. I can survive though.
I will have to measure performance. I suspect that using int k[1024]; will be faster...

Anyhow, a great achievement.
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

--- Comment #13 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2011-10-07 07:35:40 UTC ---
Isn't PR50649 caused by your changes?
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

--- Comment #14 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2011-10-07 10:15:03 UTC ---
signed char k[1024];
void foo6() {
  for (int i=0; i!=N; ++i)
    k[i] = (a[i]<b[i] ? -1 : 0) & (c[i]<d[i] ? -1 : 0);
}

requires -fno-tree-pre to vectorize. Without it I get

not vectorized: relevant stmt not supported: prephitmp.214_16 = D.2173_10 < D.2174_11 ? iftmp.2_2 : 0;

BTW, in almost all of my code -fno-tree-pre always produces faster code when vectorization matters!
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

--- Comment #15 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-07 10:31:13 UTC ---
float a[1024], b[1024], c[1024], d[1024];
int j[1024];

void
f1 (void)
{
  int i;
  for (i = 0; i < 1024; ++i)
    {
      unsigned int x = a[i] < b[i] ? -1 : 0;
      unsigned int y = c[i] < d[i] ? -1 : 0;
      j[i] = (x & y) >> 31;
    }
}

vectorizes fine and generates quite good code IMHO. Something similar I'd like to achieve with the vect_recog_bool_pattern I'm working on even for some of your testcases.
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-06 11:57:54 UTC ---
Created attachment 25428
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25428
gcc47-vect-condexpr-mixed.patch

I believe at least some simple cases of bool could be handled in tree-vect-patterns.c by transforming bool lhs assignments with a comparison on the rhs into COND_EXPRs on char/short/int/long (depending on the comparison operand size); &/|/^ could be handled too, and finally either a cast to some integer type or a memory store.

Before trying to write it, I tried to write something simpler, in particular a pattern recognizer that allows vectorizing mixed size type COND_EXPRs (so far only with INTEGER_CST then/else). For the case where the COND_EXPR lhs type is wider than the comparison type I think the then/else values must be INTEGER_CSTs, otherwise we can't ensure that they fit into the narrower integer type. But for an lhs type narrower than the comparison type,

  lhs = cmp0 < cmp1 ? val1 : val2;

(where sizeof (lhs) < sizeof (cmp0)) could in theory be transformed (for itype an integer type with the same sign as val1's type, but the size of cmp0) into:

  val1' = (itype) val1;
  val2' = (itype) val2;
  lhs' = cmp0 < cmp1 ? val1' : val2';
  lhs = (__typeof (lhs)) lhs';

but we'd need more than one def_stmt for that.

This patch allows e.g. vectorization of:

float a[1024], b[1024];
unsigned char k[1024];

void
foo (void)
{
  int i;
  for (i = 0; i < 1024; ++i)
    k[i] = a[i] < b[i] ? -1 : 0;
}

on i?86/x86_64, which couldn't be vectorized previously.

Ira, does this sound reasonable? How should a testcase look like (I think it will be currently only vectorized on i?86/x86_64, as it needs mixed mode vcond support, which, while probably implementable for e.g. altivec, is currently i386 backend only feature)?

If this makes sense, I'll try to do the bool pattern recognition next.
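The narrowing transformation described in comment #9 can be modelled on scalars. The following is only a sketch, not code from the patch: the function names select_direct and select_widened are hypothetical, the comparison operands are assumed to be 4-byte floats, and uint32_t stands in for itype (unsigned like val1's type, same size as cmp0).

```c
#include <stdint.h>

/* Original form: a 4-byte float comparison selects a 1-byte value.
   Hypothetical name, for illustration only.  */
static unsigned char
select_direct (float cmp0, float cmp1, unsigned char val1, unsigned char val2)
{
  return cmp0 < cmp1 ? val1 : val2;
}

/* Transformed form: widen both values to itype (assumed uint32_t here),
   select at the comparison width, then narrow the result.  */
static unsigned char
select_widened (float cmp0, float cmp1, unsigned char val1, unsigned char val2)
{
  uint32_t val1w = (uint32_t) val1;             /* val1' = (itype) val1 */
  uint32_t val2w = (uint32_t) val2;             /* val2' = (itype) val2 */
  uint32_t lhsw = cmp0 < cmp1 ? val1w : val2w;  /* lhs' at comparison width */
  return (unsigned char) lhsw;                  /* lhs = (__typeof (lhs)) lhs' */
}
```

Both functions should return the same value for every input, because widening an unsigned char to uint32_t and narrowing it back is lossless.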
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

Ira Rosen <irar at il dot ibm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at il dot ibm.com

--- Comment #10 from Ira Rosen <irar at il dot ibm.com> 2011-10-06 12:31:29 UTC ---
(In reply to comment #9)
> Ira, does this sound reasonable?

Looks good to me. (You can probably use build_nonstandard_integer_type() instead of lang_hooks.types.type_for_mode).

> How should a testcase look like (I think it will be currently only
> vectorized on i?86/x86_64, as it needs mixed mode vcond support, which,
> while probably implementable for e.g. altivec, is currently i386 backend
> only feature)?

I am not sure I understand the question. Are you asking how to check that it gets vectorized only on i?86/x86_64? If so, you need a new proc in lib/target-supports.exp (something like vect_cond_mixed_types).
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #25428|0                           |1
        is obsolete|                            |

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-06 13:30:36 UTC ---
Created attachment 25429
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25429
/tmp/gcc47-vect-condexpr-mixed.patch

Thanks, here is an updated patch.
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-06 17:49:43 UTC ---
Author: jakub
Date: Thu Oct 6 17:49:36 2011
New Revision: 179626

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=179626
Log:
    PR tree-optimization/50596
    * tree-vectorizer.h (vect_is_simple_cond): New prototype.
    (NUM_PATTERNS): Change to 6.
    * tree-vect-patterns.c (vect_recog_mixed_size_cond_pattern): New function.
    (vect_vect_recog_func_ptrs): Add vect_recog_mixed_size_cond_pattern.
    (vect_mark_pattern_stmts): Don't create stmt_vinfo for def_stmt if it already has one, and don't set STMT_VINFO_VECTYPE in it if it is already set.
    * tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Handle COND_EXPR in pattern stmts.
    (vect_is_simple_cond): No longer static.

    * lib/target-supports.exp (check_effective_target_vect_cond_mixed): New.
    * gcc.dg/vect/vect-cond-8.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/vect-cond-8.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/lib/target-supports.exp
    trunk/gcc/tree-vect-patterns.c
    trunk/gcc/tree-vect-stmts.c
    trunk/gcc/tree-vectorizer.h
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-05 07:10:56 UTC ---
Until http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=176563

float a[1024], b[1024], c[1024], d[1024];
int j[1024];

void
foo (void)
{
  int i;
  for (i = 0; i < 1024; ++i)
    {
      int x = a[i] < b[i];
      int y = c[i] < d[i];
      j[i] = x c[i] y;
    }
}

didn't use any bool types, just int and float, still it couldn't vectorize:

pr50596-2.c:8: note: not vectorized: relevant stmt not supported: x_5 = D.2699_3 < D.2700_4;

I think we could use VEC_COND_EXPR <vect1 < vect2, { 1, 1, ... }, { 0, 0, ... }> for that (and hopefully the backends optimize that well, e.g. into anding the comparison mask with { 1, 1, ... } or doing a per-element right shift by element width - 1 on the mask).

With bool it would be nice if at least for non-stores we would pick the best suitable wider integer vector type (in this case, where the bools are set by a comparison operation and feed a & whose result is afterwards cast to int, the best is obviously an int vector).
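The two lowerings mentioned for the comparison mask (AND with 1, or a right shift by element width - 1) can be checked on a single lane. This is a scalar sketch only: the function names are hypothetical, one vector element is assumed to be 32 bits wide, and a vector comparison is assumed to produce all ones for true and all zeros for false.

```c
#include <stdint.h>

/* One 32-bit lane of a vector comparison: all ones if a < b, else zero.
   Hypothetical helper, for illustration only.  */
static uint32_t
lane_mask (float a, float b)
{
  return a < b ? UINT32_MAX : 0u;
}

/* Lowering 1: AND the mask with { 1, 1, ... }.  */
static uint32_t
mask_to_bool_and (uint32_t mask)
{
  return mask & 1u;
}

/* Lowering 2: logical right shift by element width - 1.  */
static uint32_t
mask_to_bool_shift (uint32_t mask)
{
  return mask >> 31;
}
```

For an all-ones or all-zeros mask both helpers give the same 0/1 result, which is why a backend would be free to pick whichever is cheaper.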
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-04 07:04:25 UTC ---
The first problem with vectorization of the ori function is similar to why the first loop below is not vectorized and the second is:

float a[1024], b[1024], c[1024], d[1024], e[1024];

void foo (void)
{
  for (int i = 0; i < 1024; i++)
    a[i] = b[i] < c[i] ? d[i] : e[i];
}

void bar (void)
{
  for (int i = 0; i < 1024; i++)
    {
      float d_ = d[i], e_ = e[i];
      a[i] = b[i] < c[i] ? d_ : e_;
    }
}

gcc doesn't think it is ok to load d[i] resp. e[i] unconditionally. In this exact case, where the loop bound is known and it is a static array of at least that size, it is probably fine, but if d or e was a pointer which might point to a smaller array, the d[i] or e[i] accesses might segfault.

That said, we still have control flow that even ifcvt doesn't fix up even with:

void f2 ()
{
  for (int i = 0; i != N; ++i)
    {
      float c_ = c[i], d_ = d[i];
      z[i] = a[i] < b[i] && c_ < d_;
    }
}

void f3 ()
{
  for (int i = 0; i != N; ++i)
    {
      float a_ = a[i], b_ = b[i], c_ = c[i], d_ = d[i];
      z[i] = a_ < b_ && c_ < d_;
    }
}

Note that even if there were no control flow, we'd still give up on bool not being vectorized. Bool is problematic, we'd have to use an unsigned char vector instead (if bool is QImode) for vcond. But it would be a vcond with different datamode and cmpmode size, we'd either need to do it using a V4SFmode/V8SFmode vcond, then VEC_PACK_TRUNC_EXPR them into V16QImode/V32QImode.

Anyway, I think handling _Bool/bool somehow is now much more urgent than it has been before, given Kai's/Richard's change to use _Bool/bool much more often in GIMPLE. If a bool SSA_NAME just feeds some COND_EXPR, we could just use some wider type, or we could use wider vcond etc.
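To illustrate why the unconditional loads matter, here is a hypothetical pointer variant of foo (the name select_lazy and its parameter list are not from the bug report). In the source form d[i] is evaluated only when b[i] < c[i], so a caller whose condition is never true may legitimately pass a d that cannot be dereferenced at all; hoisting the load as in bar would then fault.

```c
#include <stddef.h>

/* Hypothetical pointer variant of foo: d[i] is read only when
   b[i] < c[i] holds, and e[i] only when it does not.  */
static void
select_lazy (float *a, const float *b, const float *c,
             const float *d, const float *e, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = b[i] < c[i] ? d[i] : e[i];
}
```

If b[i] >= c[i] for every i, calling select_lazy with d == NULL is fine as written, whereas a version that first loads d[i] into a temporary would dereference the null pointer. That is the situation the compiler has to rule out before it may load both arms unconditionally.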
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

--- Comment #3 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2011-10-04 09:11:53 UTC ---
  for (int i = 0; i < 1024; i++)
    a[i] = b[i] < c[i] ? d[i] : e[i];

DOES vectorize with -ftree-loop-if-convert-stores, even with

float * a; float * b; float * c; float * d; float * e;
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011-10-04
                 CC|                            |rguenth at gcc dot gnu.org
     Ever Confirmed|0                           |1

--- Comment #4 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-10-04 11:09:58 UTC ---
I agree with the need to at least support vectorizing loads and stores of 1-bit unsigned precision values. We need to be careful with arithmetic and conversions though (which is why we reject bools right now).
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-04 11:13:58 UTC ---
(In reply to comment #4)
> I agree with the need to at least support vectorizing loads and stores of
> 1-bit unsigned precision values. We need to be careful with arithmetic
> and conversions though (which is why we reject bools right now).

We could represent the arithmetic and conversions (or at least subset thereof) using *COND_EXPRs etc. In any case, the bool representation is desirable for the scalar loop, so this isn't something we should be doing in ifcvt, it needs to be done in the vectorizer itself.
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

--- Comment #6 from rguenther at suse dot de <rguenther at suse dot de> 2011-10-04 11:26:51 UTC ---
On Tue, 4 Oct 2011, jakub at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596
>
> --- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-04 11:13:58 UTC ---
> (In reply to comment #4)
> > I agree with the need to at least support vectorizing loads and stores of
> > 1-bit unsigned precision values. We need to be careful with arithmetic
> > and conversions though (which is why we reject bools right now).
>
> We could represent the arithmetic and conversions (or at least subset thereof)
> using *COND_EXPRs etc. In any case, the bool representation is desirable for
> the scalar loop, so this isn't something we should be doing in ifcvt, it needs
> to be done in the vectorizer itself.

Sure. Note that in GIMPLE bool = (bool) int; isn't equivalent to bool = int != 0 but to a truncation to 1-bit precision. Thus for the truncation a BIT_AND is enough. I'm just worried about N-precision signed to mode-precision sign-extension (for the 1-bit case we can use a COND_EXPR, but for more bits it gets more difficult).

Richard.
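The difference between the two readings of a bool conversion can be shown with plain integer arithmetic. This is only a sketch of the two computations with hypothetical function names; it does not use GIMPLE itself, and note that a C-level cast to _Bool follows the != 0 reading, which is why both are spelled out by hand here.

```c
/* Truncation to 1-bit precision: only the lowest bit survives,
   so a BIT_AND with 1 is enough.  */
static unsigned
trunc_to_1bit (int v)
{
  return (unsigned) v & 1u;
}

/* The "!= 0" reading: any non-zero value becomes 1.  */
static unsigned
nonzero_test (int v)
{
  return v != 0;
}
```

The two agree for 0 and 1 but differ for other inputs such as 2, where the truncation gives 0 and the non-zero test gives 1.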
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> 2011-10-04 11:28:18 UTC ---
On Tue, 4 Oct 2011, jakub at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596
>
> --- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-04 11:13:58 UTC ---
> (In reply to comment #4)
> > I agree with the need to at least support vectorizing loads and stores of
> > 1-bit unsigned precision values. We need to be careful with arithmetic
> > and conversions though (which is why we reject bools right now).
>
> We could represent the arithmetic and conversions (or at least subset thereof)
> using *COND_EXPRs etc. In any case, the bool representation is desirable for
> the scalar loop, so this isn't something we should be doing in ifcvt, it needs
> to be done in the vectorizer itself.

Oh, and we don't handle expanding N-bit precision arithmetic on vector types properly - for scalars we do the necessary truncation at RTL expansion time. So I think we should give up for that case for now.
[Bug tree-optimization/50596] Problems in vectorization of condition expression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50596

--- Comment #1 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2011-10-03 08:40:53 UTC ---
I managed to vectorize this:

int j[1024];
void foo5() {
  for (int i=0; i!=N; ++i)
    j[i] = (a[i]<b[i] ? -1 : 0) & (c[i]<d[i] ? -1 : 0);
}

which is not bad, but still a funny syntax (at least for those who are not used to coding in native SSE).