http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57890
Evgeniy Dushistov <dushistov at mail dot ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization

--- Comment #2 from Evgeniy Dushistov <dushistov at mail dot ru> ---
> That would mean the expansion for memset is not optimal for the target which
> means this is a target issue rather than a C++ front-end or a middle-end
> issue.

I disagree. Bisecting shows that the faulty commit is this one (can anybody
add him to CC?):

2012-06-05  Richard Guenther  <rguent...@suse.de>

	PR tree-optimization/53081
	* tree-loop-distribution.c (generate_memset_builtin): Handle all
	kinds of byte-sized stores.
	(classify_partition): Likewise.
	(tree_loop_distribution): Adjust seed statements used for
	!flag_tree_loop_distribution.

	* gcc.dg/tree-ssa/ldist-19.c: New testcase.
	* gcc.c-torture/execute/builtins/builtins.exp: Always pass
	-fno-tree-loop-distribute-patterns.

Yes, gcc generates bad code for the builtin memset, and that is a target
issue. But the issue was already known in gcc 4.7
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55953): the builtin memset is
bad on at least arm, x86 and amd64 (I suppose these are the major CPUs that
gcc supports).

Why introduce new code in the tree optimizers in gcc 4.8 that produces more
builtin memset calls? Why not wait until the builtin memset is fixed?

If you look at gcc as a whole, this is a regression: "+15% CPU time for
simple loop".
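
For illustration, here is a minimal sketch of the kind of byte-store loop that the loop distribution pass may turn into a memset call. This is not the test case from this PR: the function name fill_bytes and its parameters are hypothetical, and whether the loop is actually rewritten depends on the gcc version and optimization level.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from the bug report: stores the same
   byte value into every element of buf.  A loop of this shape is a
   candidate for being recognized as a memset pattern. */
void fill_bytes(unsigned char *buf, size_t n, unsigned char value)
{
    for (size_t i = 0; i < n; ++i)
        buf[i] = value;
}
```

Comparing the assembly from "gcc -O3 -S" with and without -fno-tree-loop-distribute-patterns (the option named in the ChangeLog above) should show whether the loop was replaced by a memset call on a given target.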