http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726
--- Comment #22 from Jan Hubicka <hubicka at ucw dot cz> 2012-06-22 22:45:35 UTC ---
> Yes. The question is what is "very small" and how can we possibly

What counts as "very small" is defined in the cost tables in i386.c. I simply ran a small benchmark testing the library and GCC implementations to fill them in. With new glibcs these tables may need updating. I updated them at some point to match glibc in SUSE 11.x.

PR 43052 is about memcmp. Memcpy/memset should behave more or less sanely. (That also reminds me that I should look again at the SSE memcpy/memset implementation for 4.8.)

> detect "very small". For this testcase we can derive an upper bound
> of the size, which is 8, but the size is not constant. I think unless
> we know we can expand the variable-size memcpy with, say, three
> CPU instructions inline there is no reason to not call memcpy.
>
> Thus if the CPU could do
>
>   tem = unaligned-load-8-bytes-from-src-and-ignore-faults;
>   mask = generate mask from size
>   store-unaligned-8-bytes-with-mask
>
> then expanding the memcpy call inline would be a win I suppose.
> AVX has VMASKMOV, but I'm not sure using that for sizes <= 16
> bytes is profitable? Note that from the specs
> of VMASKMOV it seems the memory operands need to be aligned and
> the mask does not support byte-granularity.
>
> Which would leave us to inline expanding the case of at most 2 byte
> memcpy. Of course currently there is no way to record an upper
> bound for the size (we do not retain value-range information - but
> we of course should).

My secret plan was to make VRP produce a value profiling histogram when the value is known to be within a small range. Should be quite easy to implement.

Honza
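
For illustration only, the quoted load/mask/store sequence can be approximated in portable C. This is a rough sketch, not what GCC emits: the helper name copy_upto8_masked is made up, it assumes a little-endian target (as on x86), and it assumes both buffers have 8 accessible bytes, which stands in for the "ignore faults" load that real hardware does not offer. It also emulates the masked store with a read-modify-write of the destination, so it is not equivalent to a true masked store when other threads touch the neighbouring bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, with n known to be at most 8.
   Assumptions (not from the PR): little-endian target, and both src
   and dst point to at least 8 accessible bytes.  */
static void
copy_upto8_masked (void *dst, const void *src, size_t n)
{
  uint64_t s, d, mask;

  /* Fixed-size memcpy calls; compilers typically turn each of these
     into a single unaligned 8-byte load or store.  */
  memcpy (&s, src, 8);
  memcpy (&d, dst, 8);

  /* Low n bytes set.  A shift by 64 would be undefined, so n >= 8 is
     handled separately.  */
  mask = (n >= 8) ? ~(uint64_t) 0 : (((uint64_t) 1 << (8 * n)) - 1);

  /* Merge: bytes selected by the mask come from src, the rest keep
     their old dst contents.  */
  d = (d & ~mask) | (s & mask);
  memcpy (dst, &d, 8);
}
```

The point of the sketch is only that the variable size turns into a mask computation rather than a branch or a loop; whether that beats calling memcpy is exactly what the cost tables are meant to decide.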