On Wed, Apr 10, 2013 at 08:14:30PM +0400, Michael Zolotukhin wrote:
> Hi,
> This patch adds a new algorithm of expanding movmem in x86 and a bit
> refactor existing implementation. This is a reincarnation of the patch
> that was sent wasn't checked couple of years ago - now I reworked it
> from scratch and divide into several more manageable parts.
>
Hi,

I am writing memcpy for libc. It avoids a computed jump and is much
faster on small strings (variant for Sandy Bridge attached).

> For now this algorithm isn't used, because cost_models are tuned to
> use existing ones. I believe the new algorithm will give better
> performance, but I'll leave cost-models tuning for a separate patch.
>
You must also check performance with a cold instruction cache.
Currently memcpy(x,y,128) takes 126 bytes of code, which is too much.

> Also, I changed get_mem_align_offset to make it handle MEM_REFs as
> well. Probably, there is another way of getting info about alignment -
> if so, please let me know.
>
Do not align for small sizes. The dependency this causes erases any
gains that you might get. Keep in mind that in 55% of cases the data
are already aligned.

Also, in my tests the best way to handle the prologue is to first copy
the last 16 bytes and then loop.

> Similar improvements could be done in expanding of memset, but that's
> in progress now and I'm going to proceed with it if this patch is ok.
>
> Bootstrap/make check/Specs2k are passing on i686 and x86_64.
>
> Is it ok for trunk?
>
> Changelog entry:
>
> 2013-04-10  Michael Zolotukhin  <michael.v.zolotuk...@gmail.com>
>
>     * config/i386/i386-opts.h (enum stringop_alg): Add vector_loop.
>     * config/i386/i386.c (expand_set_or_movmem_via_loop): Use
>     adjust_address instead of change_address to keep info about alignment.
>     (emit_strmov): Remove.
>     (emit_memmov): New function.
>     (expand_movmem_epilogue): Refactor to properly handle bigger sizes.
>     (expand_movmem_epilogue): Likewise and return updated rtx for
>     destination.
>     (expand_constant_movmem_prologue): Likewise and return updated rtx for
>     destination and source.
>     (decide_alignment): Refactor, handle vector_loop.
>     (ix86_expand_movmem): Likewise.
>     (ix86_expand_setmem): Likewise.
>     * config/i386/i386.opt (Enum): Add vector_loop to option stringop_alg.
>     * emit-rtl.c (get_mem_align_offset): Compute alignment for MEM_REF.
>
>
> --
> ---
> Best regards,
> Michael V. Zolotukhin,
> Software Engineer
> Intel Corporation.
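
For illustration, here is a minimal C sketch of the "copy the last 16
bytes first, then loop" idea from the reply above. It is not the
attached Sandy Bridge variant and not what the patch generates. The
names sketch_copy and copy16 are hypothetical, and the fixed-size
memcpy in copy16 only stands in for one unaligned 16-byte load/store
pair.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy exactly 16 bytes. A fixed-size memcpy stands in
   for one unaligned 16-byte load/store pair; compilers typically inline it. */
static void copy16(unsigned char *d, const unsigned char *s)
{
    memcpy(d, s, 16);
}

/* Hypothetical name. Assumes non-overlapping buffers, like memcpy.
   No alignment prologue is performed. */
void *sketch_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n < 16) {
        /* Small sizes: plain byte loop, no alignment work at all. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
        return dst;
    }

    /* Copy the last 16 bytes first; this covers any partial final block. */
    copy16(d + n - 16, s + n - 16);

    /* Then copy whole 16-byte blocks from the start. The loop stops before
       the final block, which the store above has already written. */
    for (size_t i = 0; i + 16 < n; i += 16)
        copy16(d + i, s + i);

    return dst;
}
```

The tail store may overlap the last block written by the loop, which is
harmless for non-overlapping buffers and removes the need for a
separate epilogue or a size-dependent jump.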