> Hi!
>
> As the testcase shows, for the memset (x, 0, y); snippet which handles
> y from 16 to 31 inclusive for some tunings, we generate:
> .L36:
>         movq    $0, (%rdi)
>         movq    $0, 8(%rdi)
>         movq    $0, -8(%rsi,%rdi)
Oops...

> which is correct only for y from 16 to 24 inclusive, if y is 25 to 31,
> we clear the first 16 bytes and last 8 bytes of the buffer, but would leave
> 1 to 7 bytes untouched in between that.
> With this patch we emit:
> .L36:
>         movq    $0, (%rdi)
>         movq    $0, 8(%rdi)
>         movq    $0, -16(%rsi,%rdi)
>         movq    $0, -8(%rsi,%rdi)
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2014-01-21  Jakub Jelinek  <[email protected]>
>
>         PR target/59003
>         * config/i386/i386.c (expand_small_movmem_or_setmem): If mode is
>         smaller than size, perform several stores or loads and stores
>         at dst + count - size to store or copy all of size bytes, rather
>         than just last modesize bytes.
>
>         * gcc.dg/tree-prof/pr59003.c: New test.

Yes, this is OK.  Thanks a lot for looking into that!
This PR was on my TODO list for way too long (mostly because
profiledbootstrap errors are not easy to debug on the compilation farm
machines, and I got a new devel machine only last week).

Honza
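
For anyone reading the thread later, the gap can be reproduced outside the compiler with a small model of the two store sequences. This is only a sketch, not the i386.c expander code: the helper names (store_zero_word, clear_before_patch, clear_after_patch, bytes_left_uncleared) and the constants are made up for illustration, and each 8-byte movq is modelled as an 8-byte memset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constants for this model: 8-byte stores (movq), a 16-byte
   block size, and a scratch buffer large enough for any n up to 31.  */
enum { MODE_SIZE = 8, BLOCK_SIZE = 16, BUF_MAX = 32 };

/* Models one "movq $0, off(dst)": zero MODE_SIZE bytes at dst + off.  */
static void store_zero_word(unsigned char *dst, size_t off)
{
    memset(dst + off, 0, MODE_SIZE);
}

/* Sequence quoted as generated before the patch:
   first 16 bytes, then only the last 8 bytes.  */
static void clear_before_patch(unsigned char *dst, size_t n)
{
    assert(n >= BLOCK_SIZE && n < 2 * BLOCK_SIZE);
    store_zero_word(dst, 0);
    store_zero_word(dst, 8);
    store_zero_word(dst, n - 8);
}

/* Sequence quoted as generated with the patch:
   first 16 bytes, then the last 16 bytes as two 8-byte stores.  */
static void clear_after_patch(unsigned char *dst, size_t n)
{
    assert(n >= BLOCK_SIZE && n < 2 * BLOCK_SIZE);
    store_zero_word(dst, 0);
    store_zero_word(dst, 8);
    store_zero_word(dst, n - 16);
    store_zero_word(dst, n - 8);
}

/* Fill a buffer with 0xFF, run the given sequence for length n, and
   count how many of the first n bytes were not zeroed.  */
static size_t bytes_left_uncleared(void (*clear)(unsigned char *, size_t),
                                   size_t n)
{
    unsigned char buf[BUF_MAX];
    size_t i, left = 0;

    memset(buf, 0xFF, sizeof buf);
    clear(buf, n);
    for (i = 0; i < n; i++)
        if (buf[i] != 0)
            left++;
    return left;
}
```

In this model, bytes_left_uncleared (clear_before_patch, n) is 0 for n from 16 to 24 and n - 24 for n from 25 to 31, which matches the 1 to 7 untouched bytes described above; with clear_after_patch it is 0 over the whole 16 to 31 range, because the two trailing stores overlap the leading ones instead of leaving a hole.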
