> On Tue, Jan 21, 2014 at 08:08:19PM +0100, Jan Hubicka wrote:
> > Yes, this is OK.
>
> Thanks.  BTW, I wonder if we got small expected_size_exp like in this
> case (6), if it is desirable to emit the large >= 32 size handling
> inline, if (say unless -minline-all-stringops) we couldn't just emit for
> that a library call.  Of course only if expected_size_exp is sufficiently
> smaller than 32 that the larger sizes wouldn't occur too frequently (at
> least according to profile info).

I think if we know we do not care about the speed of the large block size
case, we can simply inline rep;stosb, since it is shorter than a library
call.

With -minline-stringops-dynamically we already do what you suggest, and I
have a patch improving this a bit.  I was just holding it until the
wrong-code issues are solved, so it will wait for the next stage1.

I think in this case we should also use 16-byte moves on AMD targets.  It
seems the logic is just overly strict about the presence of vec_value; for
0 and 255 we can simply broadcast it ourselves.  I will look into it.
(Probably not for other values, since loading a constant from memory just
to store it twice is probably not a win.)

Honza
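
For illustration only, the size-based dispatch discussed above can be
sketched in plain C.  This is not the code GCC emits; it just shows the
shape of the idea: small blocks are handled inline, and anything at or
above the threshold goes to the library.  The names set_bytes,
INLINE_LIMIT and library_calls are hypothetical, and the value 32 is
taken from the ">= 32" boundary mentioned in the quoted mail.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold, taken from the ">= 32" boundary in the thread. */
#define INLINE_LIMIT 32

/* Counts library fallbacks; present only so the dispatch is observable. */
static unsigned long library_calls;

/* Hypothetical helper: fill n bytes at dst with value.
   Blocks smaller than INLINE_LIMIT are filled by a simple byte loop
   (standing in for an inline expansion); larger blocks call memset.  */
static void *set_bytes(void *dst, int value, size_t n)
{
    if (n < INLINE_LIMIT) {
        unsigned char *p = dst;
        for (size_t i = 0; i < n; i++)
            p[i] = (unsigned char)value;
        return dst;
    }

    /* Large block: assumed to be rare, so pay for the call.  */
    library_calls++;
    return memset(dst, value, n);
}
```

With an expected size of 6, as in the case under discussion, nearly all
calls would take the first branch and the library path would stay cold.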