On Mon, Aug 22, 2016 at 5:28 PM, H.J. Lu <hjl.to...@gmail.com> wrote: > On Mon, Aug 22, 2016 at 4:31 AM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Fri, Aug 19, 2016 at 4:45 PM, H.J. Lu <hjl.to...@gmail.com> wrote: >>> On Fri, Aug 19, 2016 at 2:21 AM, Richard Biener >>> <richard.guent...@gmail.com> wrote: >>>> On Thu, Aug 18, 2016 at 5:16 PM, H.J. Lu <hjl.to...@gmail.com> wrote: >>>>> On Thu, Aug 18, 2016 at 1:18 AM, Richard Biener >>>>> <richard.guent...@gmail.com> wrote: >>>>>> On Wed, Aug 17, 2016 at 10:11 PM, H.J. Lu <hongjiu...@intel.com> wrote: >>>>>>> builtin_memset_gen_str returns a register used for memset, which only >>>>>>> supports integer registers. But a target may use vector registers in >>>>>>> memmset. This patch adds a TARGET_GEN_MEMSET_VALUE hook to duplicate >>>>>>> QImode value to mode derived from STORE_MAX_PIECES, which can be used >>>>>>> with vector instructions. The default hook is the same as the original >>>>>>> builtin_memset_gen_str. A target can override it to support vector >>>>>>> instructions for STORE_MAX_PIECES. >>>>>>> >>>>>>> Tested on x86-64 and i686. Any comments? >>>>>>> >>>>>>> H.J. >>>>>>> --- >>>>>>> gcc/ >>>>>>> >>>>>>> * builtins.c (builtin_memset_gen_str): Call >>>>>>> targetm.gen_memset_value. >>>>>>> (default_gen_memset_value): New function. >>>>>>> * target.def (gen_memset_value): New hook. >>>>>>> * targhooks.c: Inclue "expmed.h" and "builtins.h". >>>>>>> (default_gen_memset_value): New function. >>>>>> >>>>>> I see default_gen_memset_value in builtins.c but it belongs here. >>>>>> >>>>>>> * targhooks.h (default_gen_memset_value): New prototype. >>>>>>> * config/i386/i386.c (ix86_gen_memset_value): New function. >>>>>>> (TARGET_GEN_MEMSET_VALUE): New. >>>>>>> * config/i386/i386.h (STORE_MAX_PIECES): Likewise. >>>>>>> * doc/tm.texi.in: Add TARGET_GEN_MEMSET_VALUE hook. >>>>>>> * doc/tm.texi: Updated. >>>>>>> >>>>> >>>>> Like this? >>>> >>>> Aww, ok - I see builtins.c is a better place - sorry for the extra work. >>>> >>>> Richard. >>> >>> Here is the original patch with the updated ChangeLog. OK for trunk? >> >> So now actually looking at what you are doing. Why do we need a new >> hook? Is it that even if we make builtin_memset_gen_str handle vector-mode >> MODE properly you don't get the desired optimized code? Your patch doesn't > > It is hard to say what codes builtin_memset_gen_str will generate > if vector mode is supported. I suspect since broadcast instructions are > target specific, a target hook may still be needed to broadcast a QImode > input to a full vector register.
Well, we mainly seem to miss an optab that matches vec_duplicate (we have vec_init only which might be sufficient as well). >> seem to change the mode to vector but only STORE_MAX_PIECES and >> op_by_pieces_d::run will only consider integer modes. > > My patch doesn't change mode of STORE_MAX_PIECES. It changes > how TImode register from STORE_MAX_PIECES is generated: > > (insn 8 7 9 (set (reg:SI 90) > (zero_extend:SI (reg:QI 89))) c1.i:4 -1 > (nil)) > > (insn 9 8 10 (parallel [ > (set (reg:SI 91) > (mult:SI (reg:SI 90) > (const_int 16843009 [0x1010101]))) > (clobber (reg:CC 17 flags)) > ]) c1.i:4 -1 > (nil)) > > (insn 10 9 11 (set (reg:V4SI 92) > (vec_duplicate:V4SI (reg:SI 91))) c1.i:4 4197 {*vec_dupv4si} > (nil)) > > (insn 11 10 12 (set (subreg:V4SI (reg:TI 93) 0) > (reg:V4SI 92)) c1.i:4 -1 > (nil)) > > (insn 12 11 13 (set (mem:TI (reg/v/f:DI 87 [ dst ]) [0 MEM[(void > *)dst_2(D)]+0 S16 > A8]) > (reg:TI 93)) c1.i:4 -1 > (nil)) Yes, but if you change STORE_MAX_PIECES to also include OI/XImode then this exposes OI/XImode scalars by doing OI/XImode stores. Thus I am suggesting to instead allow to use vector integer modes as well for the by_pieces stuff (and maybe guard that fact with a target hook). Richard. > vs. > > (insn 8 7 9 (set (reg:DI 91) > (zero_extend:DI (reg:QI 89))) c1.i:4 -1 > (nil)) > > (insn 9 8 10 (set (subreg:DI (reg:TI 90) 0) > (reg:DI 91)) c1.i:4 -1 > (nil)) > > (insn 10 9 11 (set (subreg:DI (reg:TI 90) 8) > (const_int 0 [0])) c1.i:4 -1 > (nil)) > > (insn 11 10 12 (set (reg:DI 93) > (const_int 72340172838076673 [0x101010101010101])) c1.i:4 -1 > (nil)) > > (insn 12 11 13 (parallel [ > (set (reg:DI 92) > (mult:DI (subreg:DI (reg:TI 90) 8) > (reg:DI 93))) > (clobber (reg:CC 17 flags)) > ]) c1.i:4 -1 > (nil)) > > (insn 13 12 14 (set (reg:DI 95) > (const_int 72340172838076673 [0x101010101010101])) c1.i:4 -1 > (nil)) > > (insn 14 13 15 (parallel [ > (set (reg:DI 94) > (mult:DI (subreg:DI (reg:TI 90) 0) > (reg:DI 95))) > (clobber (reg:CC 17 flags)) > ]) c1.i:4 -1 > (nil)) > > (insn 15 14 16 (parallel [ > (set (reg:DI 96) > (plus:DI (reg:DI 92) > (reg:DI 94))) > (clobber (reg:CC 17 flags)) > ]) c1.i:4 -1 > (nil)) > > (insn 16 15 17 (set (reg:DI 98) > (const_int 72340172838076673 [0x101010101010101])) c1.i:4 -1 > (nil)) > > (insn 17 16 18 (parallel [ > (set (reg:TI 97) > (mult:TI (zero_extend:TI (subreg:DI (reg:TI 90) 0)) > (zero_extend:TI (reg:DI 98)))) > (clobber (reg:CC 17 flags)) > ]) c1.i:4 -1 > (nil)) > > (insn 18 17 19 (parallel [ > (set (reg:DI 99) > (plus:DI (reg:DI 96) > (subreg:DI (reg:TI 97) 8))) > (clobber (reg:CC 17 flags)) > ]) c1.i:4 -1 > (nil)) > > (insn 19 18 20 (set (subreg:DI (reg:TI 97) 8) > (reg:DI 99)) c1.i:4 -1 > (nil)) > > (insn 20 19 21 (set (reg:TI 97) > (reg:TI 97)) c1.i:4 -1 > (expr_list:REG_EQUAL (mult:TI (reg:TI 90) > (const_wide_int 0x1010101010101010101010101010101)) > (nil))) > > (insn 21 20 22 (set (mem:TI (reg/v/f:DI 87 [ dst ]) [0 MEM[(void > *)dst_2(D)]+0 S16 > A8]) > (reg:TI 97)) c1.i:4 -1 > (nil)) > > >> So isn't this a general missed optimization that TImode 0x01010101...01 * x >> is not optimized to a pxor or broadcast? > > In general, vector instructions aren't supported in integer computation. > TImode 0x01010101...01 * x will use integer registers not vector registers. > My target hook simply allows vector registers to be used in TImode for > STORE_MAX_PIECES. > > -- > H.J.