On September 6, 2016 5:14:47 PM GMT+02:00, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote:
>Hi all,

s/contigous/contiguous/
s/ where where/ where/

+struct merged_store_group
+{
+  HOST_WIDE_INT start;
+  HOST_WIDE_INT width;
+  unsigned char *val;
+  unsigned int align;
+  auto_vec<struct store_immediate_info *> stores;
+  /* We record the first and last original statements in the sequence because
+     because we'll need their vuse/vdef and replacement position.  */
+  gimple *last_stmt;

s/ because because/ because/

Why aren't these two HWIs unsigned, likewise in store_immediate_info and in most other spots in the patch?

+      fprintf (dump_file, "Afer writing ");

s/Afer /After/

/access if prohibitively slow/s/ if /is /

I'd get rid of successful_p in imm_store_chain_info::output_merged_stores.

+unsigned int
+pass_store_merging::execute (function *fun)
+{
+  basic_block bb;
+  hash_set<gimple *> orig_stmts;
+
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      gimple_stmt_iterator gsi;
+      HOST_WIDE_INT num_statements = 0;
+      /* Record the original statements so that we can keep track of
+         statements emitted in this pass and not re-process new
+         statements.  */
+      for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+        {
+          gimple_set_visited (gsi_stmt (gsi), false);
+          num_statements++;
+        }
+
+      if (num_statements < 2)
+        continue;

What about debug statements? ISTM you should skip those.
(Isn't visited reset before entry of a pass?)

Maybe I missed the bikeshedding about the name but I'd have used -fmerge-stores instead.

Thanks,

>
>The v3 of this patch addresses feedback I received on the version
>posted at [1].
>The merged store buffer is now represented as a char array that we
>splat values onto with
>native_encode_expr and native_interpret_expr. This allows us to merge
>anything that native_encode_expr
>accepts, including floating point values and short vectors. So this
>version extends the functionality
>of the previous one in that it handles floating point values as well.
>
>The first phase of the algorithm that detects the contiguous stores is
>also slightly refactored according
>to feedback to read more fluently.
>
>Richi, I experimented with merging up to MOVE_MAX bytes rather than
>word size but I got worse results on aarch64.
>MOVE_MAX there is 16 (because it has load/store register pair
>instructions) but the 128-bit immediates that we ended
>synthesising were too complex. Perhaps the TImode immediate store RTL
>expansions could be improved, but for now
>I've left the maximum merge size to be BITS_PER_WORD.
>
>I've disabled the pass for PDP-endian targets as the merging code
>proved to be quite fiddly to get right for different
>endiannesses and I didn't feel comfortable writing logic for
>BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN targets without serious
>testing capabilities. I hope that's ok (I note the bswap pass also
>doesn't try to do anything on such targets).
>
>Tested on arm, aarch64, x86_64 and on big-endian arm and aarch64.
>
>How does this version look?
>Thanks,
>Kyrill
>
>[1] https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01512.html
>
>2016-09-06 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
>
>    PR middle-end/22141
>    * Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
>    * common.opt (fstore-merging): New Optimization option.
>    * opts.c (default_options_table): Add entry for
>    OPT_ftree_store_merging.
>    * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
>    * passes.def: Insert pass_tree_store_merging.
>    * tree-pass.h (make_pass_store_merging): Declare extern
>    prototype.
>    * gimple-ssa-store-merging.c: New file.
>    * doc/invoke.texi (Optimization Options): Document
>    -fstore-merging.
>
>2016-09-06 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
>           Jakub Jelinek <ja...@redhat.com>
>
>    PR middle-end/22141
>    * gcc.c-torture/execute/pr22141-1.c: New test.
>    * gcc.c-torture/execute/pr22141-2.c: Likewise.
>    * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
>    * gcc.target/aarch64/ldp_stp_4.c: Likewise.
>    * gcc.dg/store_merging_1.c: New test.
>    * gcc.dg/store_merging_2.c: Likewise.
>    * gcc.dg/store_merging_3.c: Likewise.
>    * gcc.dg/store_merging_4.c: Likewise.
>    * gcc.dg/store_merging_5.c: Likewise.
>    * gcc.dg/store_merging_6.c: Likewise.
>    * gcc.target/i386/pr22141.c: Likewise.
>    * gcc.target/i386/pr34012.c: Add -fno-store-merging to dg-options.
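
For readers following the thread, the char-array approach described in the quoted message (splatting each constant onto a byte buffer, then reading the buffer back as one wider value) can be illustrated outside of GCC with a rough sketch. This is not the patch's code: ConstStore, make_store and merge_stores are hypothetical names, and std::memcpy stands in for native_encode_expr, so the bytes land in host byte order. The sketch assumes the stores share a common base and only checks that they cover the buffer without gaps.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one constant store: a byte offset from a
// common base address plus the raw bytes of the stored value.
struct ConstStore {
  std::size_t offset;
  std::vector<unsigned char> bytes;
};

// Encode any trivially copyable value (integer, float, ...) into bytes.
// memcpy plays the role of native_encode_expr here (host byte order).
template <typename T>
ConstStore make_store(std::size_t offset, T value) {
  ConstStore s{offset, std::vector<unsigned char>(sizeof(T))};
  std::memcpy(s.bytes.data(), &value, sizeof(T));
  return s;
}

// Splat every store onto a buffer of `width` bytes that begins at byte
// offset `start`. Returns false if a store falls outside the buffer or
// if any byte of the buffer is left unwritten (a gap between stores).
bool merge_stores(const std::vector<ConstStore>& stores, std::size_t start,
                  std::size_t width, std::vector<unsigned char>& out) {
  out.assign(width, 0);
  std::vector<bool> covered(width, false);
  for (const ConstStore& s : stores) {
    if (s.offset < start) return false;
    std::size_t pos = s.offset - start;
    if (pos > width || s.bytes.size() > width - pos) return false;
    std::memcpy(out.data() + pos, s.bytes.data(), s.bytes.size());
    for (std::size_t i = 0; i < s.bytes.size(); ++i) covered[pos + i] = true;
  }
  for (bool c : covered)
    if (!c) return false;
  return true;
}
```

A caller would build the stores with make_store (for example a 16-bit store at offset 0 followed by two 8-bit stores at offsets 2 and 3), call merge_stores with start 0 and width 4, and then memcpy the resulting buffer into a std::uint32_t to obtain the single merged constant. Because the buffer holds raw bytes, the numeric value of that constant depends on the target's byte order, which is the part the message describes as fiddly.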