On Wed, 2017-07-12 at 21:40 -0700, Andi Kleen wrote:
> On Thu, Jul 13, 2017 at 06:28:43AM +0200, Mike Galbraith wrote:
> > On Wed, 2017-07-12 at 21:15 -0700, Andi Kleen wrote:
> > > On Thu, Jul 13, 2017 at 05:03:00AM +0200, Mike Galbraith wrote:
> > > > On Wed, 2017-07-12 at 15:30 -0700, Andi Kleen wrote:
> > > > > Josh Poimboeuf <jpoim...@redhat.com> writes:
> > > > > >
> > > > > > The ORC data format does have a few downsides compared to DWARF.  The
> > > > > > ORC unwind tables take up ~1MB more memory than DWARF eh_frame
> > > > > > tables.
> > > > >
> > > > > Can we have an option to just use dwarf instead? For people
> > > > > who don't want to waste a MB+ to solve a problem that doesn't
> > > > > exist (as proven by many years of opensuse kernel experience)
> > > >
> > > > Sure the dwarf unwinder works well for crashes, but at the price of
> > > > demolishing ftrace/perf utility.
> > >
> > > You mean the unwind performance?
> >
> > Yeah, it hurts.. massively, has even been known to kill big boxen.
>
> Why was that?
Presuming you mean the big box bit, danged if I know, I haven't
personally met that, only the massive overhead.

> > > That's a valid concern, but neither ORC nor dwarf are likely
> > > to address it. However most usages of ftrace/perf shouldn't be that
> > > dependent on unwind performance -- just lower the frequency of your
> > > events.
> > >
> > > The only possible win is if the win from not using FP code is
> > > significant enough. On the x86 side the only modern CPUs that should
> > > really care about this are Atoms.
> >
> > Nope, they all care.  Measure performance delta of fast/light stuff.
>
> Well if your test cares that much about function overhead you may want
> to try LTO. It can get rid of a lot of functions by doing cross file
> inlining.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=lto-411-2
>
> > Maybe I'm expecting too much good stuff to follow, but don't spoil it
> > for me, I think I'm looking at a real winner :)
>
> It's somewhat surprising. It would be good to understand why that
> happens. Is it icache misses, data cache misses for the stack, or
> simply more instructions executed, or worse tail calls?

No idea.  It was speculated that it was register loss, but I played
with that, saw nearly zero delta until I stole too many (the P.P.S.
below sketches that experiment).

	-Mike
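
P.S. For the archives: "lower the frequency of your events" above is
just perf's sampling rate knob, and the unwinder cost is easy to see
from userspace too. A rough illustration (./mybench is a stand-in for
whatever fast/light load you're measuring, not a real tool; see
perf-record(1) for the options):

	# frame-pointer callchains at a modest 99 Hz sample rate
	perf record -F 99 -g -- ./mybench

	# dwarf callchains: perf copies a chunk of user stack on every
	# sample and unwinds it in post-processing, which is where the
	# overhead piles up
	perf record -F 99 --call-graph dwarf -- ./mybench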
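
P.P.S. The register loss experiment was nothing fancier than taking
callee-saved registers away from the compiler and watching the delta.
A sketch of the idea only (bench.c stands in for the test load, and
the -ffixed-<reg> spellings assume gcc on x86-64):

	# baseline: no frame pointer, compiler gets every register
	gcc -O2 -fomit-frame-pointer -o bench bench.c

	# mimic the FP register cost without the push/pop: steal one
	gcc -O2 -fomit-frame-pointer -ffixed-r12 -o bench-1 bench.c

	# "stole too many": starve the allocator until the delta shows
	gcc -O2 -fomit-frame-pointer -ffixed-r12 -ffixed-r13 \
	    -ffixed-r14 -o bench-3 bench.c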