On Fri, 10 Apr 2015, Linus Torvalds wrote:

> It turns out that gcc's -Os is just horrible nasty crap. It doesn't
> actually make good tradeoffs for code density, because it doesn't make
> any tradeoffs at all. It tries to choose small code, even when it's
> ridiculously bad small code.
>
> For example, a 24-byte static memcpy is best done as three quad-word
> load/store pairs. That's very cheap, and not at all unreasonable.
>
> But what does gcc do? It does a "rep movsl".
>
> Seriously. That's *shit*. It absolutely kills performance on some very
> critical code.
>
> I'm not making that up. Try "-O2" and "-Os" on the appended trivial
> code. Yes, the "rep movsl" is smaller, but it's incredibly expensive,
> particularly if the result is partially used afterwards.
>
> And I'm not a hater of "rep movs" - not at all. I think that "rep
> movsb" is basically a perfect way to tell the CPU "do an optimized
> memcpy with whatever cache situation you have". So I'm a big fan of
> the string instructions, but only when appropriate. And "appropriate"
> here very much includes "I don't know the memory copy size, so I'm
> going to call out to some complex generic code that does all kinds of
> size checks and tricks".
>
> Replacing three pairs of "mov" instructions with a "rep movs" is insane.
>
> (There are a couple of other examples of that kind of issues with
> "-Os". Like using "imul $15" instead of single shift-by-4 and
> subtract. Again, the "imul" is certainly smaller, but can have quite
> bad latency and throughput issues).
>
> So I'm no longer a fan of -Os. It disables too many obviously good
> code optimizations.

 I think the issue is that -Os is a binary yes/no option, with no
further tuning as to how desperate GCC is asked to be about saving code
size. That's what we'd probably have with speed optimisation too if
there was only a single -O GCC option -- equivalent to today's -O3.
However instead GCC has -O1, -O2, -O3 that turn on more and more
possibly insane optimisations gradually (plus a load of -f options for
further fine tuning).

 So a possible complementary solution for size saving could be keeping
-Os as it is, for compatibility with people's build recipes, and then
having, say, -Os1, -Os2, -Os3 enabling more and more insane
optimisations, on the size side for a change. In that case -Os3 would
be equivalent to today's -Os. There could be further fine-tune options
to control things like the string moves you mention.

 The thing here is someone would have to implement all of it and I
gather GCC folks have more than enough stuff to do already. I'm fairly
sure they wouldn't decline a patch though. ;)

  Maciej
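
The "appended trivial code" referred to in the quoted message is not reproduced in this thread. As a rough stand-in, the sketch below shows the two cases the quote describes: a fixed-size 24-byte copy and a multiply by 15. The struct and function names are hypothetical and chosen only for illustration; this is not the original test case.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 24-byte structure: three 64-bit words. */
struct triple {
	uint64_t a, b, c;
};

/*
 * Fixed-size 24-byte copy.  The size is known at compile time, so the
 * compiler is free to expand it inline, either as individual load/store
 * pairs or as a string instruction, depending on its size/speed settings.
 */
void copy_triple(struct triple *dst, const struct triple *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Multiply by 15, leaving the instruction choice to the compiler. */
unsigned long times15(unsigned long x)
{
	return x * 15;
}

/* The same value written by hand as a shift by 4 and a subtract. */
unsigned long times15_shift_sub(unsigned long x)
{
	return (x << 4) - x;
}
```

Assuming the file is saved as copy.c (a hypothetical name), running "gcc -O2 -S copy.c" and "gcc -Os -S copy.c" and comparing the generated assembly should show how the two levels treat the same source. The exact instructions emitted depend on the GCC version and the target, so the output may not match what is described in the quote.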