On Fri, Jul 22, 2016 at 5:22 PM, Linus Torvalds <torva...@linux-foundation.org> wrote: > > So without having yet looked at the code, I want people to understand > that to a very real degree, the stack tracer that the *oopsing* code > (ie what all the usual kernel fault handlers use) is very very special > code and needs to be handled very carefully, and needs to be extra > robust, even in the presence of stack corruption, and even in the > presence of the dwarf info being totally corrupted. Because we've very > much had both things happen. > > It is very possible that we should have two different stack tracers - > the stupid "for oopses only" code that doesn't necessarily give the > perfect trace, but is very anal and happily gives old stale addresses > (which can be very useful for seeing what happened just before the > "real" stack trace), and then a separate stack trace engine that is > clever and gets things right, and if that one faults it can depend on > the normal kernel fault handling picking up the pieces.
I think that Josh's code has the potential to be extremely robust *and* give more correct results when possible. One thing I intend to review when v2 shows up is that it's as conservative as it needs to be to avoid ever dereferencing an out-of-bounds pointer. And Josh's oops printer carefully walks and prints out all addresses on the stack (complete with question marks) even if the unwinder doesn't find them. > > Yes, the current stack tracer is crufty. No, it's not perfect. But it > is very well tested, and has held up. That should not be dismissed. > I think you may be giving the current tracer slightly more credit than it's due. In my stack guard page patchset, I fixed two separate issues, one of which caused recursive faults and one of which caused it to output nothing at all. So maybe *now* it's very robust :) But it's still an umaintainable mess IMO, and Josh's patchset helps a *lot*. --Andy