On Wed, May 6, 2015 at 4:04 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Hi All, > > Here is a patch which gives us significant speed-up on HASWELL for > test containing masked stores. The main goal of that patch is attempt > to avoid HW hazard for maskmove instructions through inserting > additional check on zero mask and putting all masked store statements > into separate block on false edge.All MASK_STORE statements having the > same mask put into one block. Any comments will be appreciate.
Hmm. I'm not very happy with this "optimization" happening at the GIMPLE level - it feels more like a mdreorg thing... The testcase you add doesn't end up with invalid addresses - so what's the testcase you are inventing this for? Looking into the implementation I don't see where you are validating data dependences of any sort but you are moving stores (and possibly loads when sinking definition stmts of stored values). The code-sinking part should be handled by the existing pass. Your simple testcase contains a single masked store, so why does simply conditionalizing each masked store in mdreorg not work? It's a hazard (hopefully fixed eventually), thus not really worth optimizing 100%. The target hook name is awful. You don't need a extra flag in struct loop - the vectorizer scans all insns so it can perfectly well re-compute it. What this all feels like is more like a un-if-conversion pass which might be useful for aggressively if-converted vectorized code as well (thus lots of vec_cond expressions for example). Richard. > ChangeLog: > 2015-05-06 Yuri Rumyantsev <ysrum...@gmail.com> > > * cfgloop.h (has_mask_store): Add new field to struct loop. > * config/i386/i386.c: Include files stringpool.h and tree-ssanames.h. > (ix86_vectorize_zero_vector): New function. > (TARGET_VECTORIZE_ZERO_VECTOR): New target macro > * doc/tm.texi.in: Add @hook TARGET_VECTORIZE_ZERO_VECTOR. > * doc/tm.texi: Updated. > * target.def (zero_vector): New DEFHOOK. > * tree-if-conv.c (predicate_mem_writes): Set has_mask_store for loop. > * tree-vect-stmts.c : Include tree-into-ssa.h. > (optimize_mask_stores): New function. > * tree-vectorizer.c (vectorize_loops): Zero has_mask_store field for > non-vectorized loops and invoke optimize_mask_stores function. > > gcc/testsuite/ChangeLog: > * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.