I've done a lot of profiling of Rust code using tools that are based around
stack traces. I can't emphasize how useful it is to have inlined stack
frames.

Here is a highly typical example from a profile I had lying around:

#1: 0x655A213: alloc (alloc.rs:75)
#2: 0x655A213: alloc (alloc.rs:151)
#3: 0x655A213: reserve_internal<(syntax::tokenstream::TokenTree,
syntax::tokenstream::IsJoint),alloc::alloc::Global> (raw_vec.rs:668)
#4: 0x655A213: reserve<(syntax::tokenstream::TokenTree,
syntax::tokenstream::IsJoint),alloc::alloc::Global> (raw_vec.rs:491)
#5: 0x655A213: reserve<(syntax::tokenstream::TokenTree,
syntax::tokenstream::IsJoint)> (vec.rs:457)
#6: 0x655A213: push<(syntax::tokenstream::TokenTree,
syntax::tokenstream::IsJoint)> (vec.rs:1023)
#7: 0x655A213: parse_token_trees_until_close_delim (tokentrees.rs:27)
#8: 0x655A213: syntax::parse::lexer::tokentrees::<impl
syntax::parse::lexer::StringReader>::parse_token_tree (tokentrees.rs:81)
#9: 0x655A17F: parse_token_trees_until_close_delim (tokentrees.rs:26)
#10: 0x655A17F: syntax::parse::lexer::tokentrees::<impl
syntax::parse::lexer::StringReader>::parse_token_tree (tokentrees.rs:81)
#11: 0x655A17F: parse_token_trees_until_close_delim (tokentrees.rs:26)
#12: 0x655A17F: syntax::parse::lexer::tokentrees::<impl
syntax::parse::lexer::StringReader>::parse_token_tree (tokentrees.rs:81)

That stack trace shows inlined frames. Notice that there are only three
distinct addresses. Without the inlined frames, there would only be three
frames, something like this:

#1: 0x655A213: alloc (alloc.rs:75)
#2: 0x655A213: reserve_internal<(syntax::tokenstream::TokenTree,
syntax::tokenstream::IsJoint),alloc::alloc::Global> (raw_vec.rs:668)
#3: 0x655A17F: parse_token_trees_until_close_delim (tokentrees.rs:26)

Here is a more extreme (but still not *so* unusual) example:

#1: 0x5794562: alloc (alloc.rs:75)
#2: 0x5794562: exchange_malloc (alloc.rs:185)
#3: 0x5794562: clone<rustc::mir::Rvalue> (boxed.rs:332)
#4: 0x5794562: clone (mod.rs:1759)
#5: 0x5794562: <rustc::mir::Statement as core::clone::Clone>::clone (
mod.rs:1733)
#6: 0x579E30A:
{{closure}}<core::slice::Iter<rustc::mir::Statement>,rustc::mir::Statement,(),closure>
(mod.rs:272)
#7: 0x579E30A: fold<rustc::mir::Statement,(),closure> (mod.rs:3150)
#8: 0x579E30A:
fold<core::slice::Iter<rustc::mir::Statement>,rustc::mir::Statement,(),closure>
(mod.rs:272)
#9: 0x579E30A:
for_each<core::iter::adapters::Cloned<core::slice::Iter<rustc::mir::Statement>>,closure>
(iterator.rs:604)
#10: 0x579E30A:
spec_extend<rustc::mir::Statement,core::iter::adapters::Cloned<core::slice::Iter<rustc::mir::Statement>>>
(vec.rs:1852)
#11: 0x579E30A:
spec_extend<rustc::mir::Statement,core::slice::Iter<rustc::mir::Statement>>
(vec.rs:1902)
#12: 0x579E30A: extend_from_slice<rustc::mir::Statement> (vec.rs:1351)
#13: 0x579E30A: to_vec<rustc::mir::Statement> (slice.rs:160)
#14: 0x579E30A: to_vec<rustc::mir::Statement> (slice.rs:380)
#15: 0x579E30A: clone<rustc::mir::Statement> (vec.rs:1642)
#16: 0x579E30A: <rustc::mir::BasicBlockData as core::clone::Clone>::clone (
mod.rs:1022)
#17: 0x564271A:
{{closure}}<core::slice::Iter<rustc::mir::BasicBlockData>,rustc::mir::BasicBlockData,(),closure>
(mod.rs:272)
#18: 0x564271A: fold<rustc::mir::BasicBlockData,(),closure> (mod.rs:3150)
#19: 0x564271A:
fold<core::slice::Iter<rustc::mir::BasicBlockData>,rustc::mir::BasicBlockData,(),closure>
(mod.rs:272)
#20: 0x564271A:
for_each<core::iter::adapters::Cloned<core::slice::Iter<rustc::mir::BasicBlockData>>,closure>
(iterator.rs:604)
#21: 0x564271A:
spec_extend<rustc::mir::BasicBlockData,core::iter::adapters::Cloned<core::slice::Iter<rustc::mir::BasicBlockData>>>
(vec.rs:1852)
#22: 0x564271A:
spec_extend<rustc::mir::BasicBlockData,core::slice::Iter<rustc::mir::BasicBlockData>>
(vec.rs:1902)
#23: 0x564271A: extend_from_slice<rustc::mir::BasicBlockData> (vec.rs:1351)
#24: 0x564271A: to_vec<rustc::mir::BasicBlockData> (slice.rs:160)
#25: 0x564271A: to_vec<rustc::mir::BasicBlockData> (slice.rs:380)
#26: 0x564271A: clone<rustc::mir::BasicBlockData> (vec.rs:1642)
#27: 0x564271A: clone<rustc::mir::BasicBlock,rustc::mir::BasicBlockData> (
indexed_vec.rs:499)
#28: 0x564271A: <rustc::mir::Mir as core::clone::Clone>::clone (mod.rs:90)

There are three "real" stack frames and 25 inlined stack frames. Those
inline stack frames are the difference between "I have a vague idea of what
is going on" and "I know exactly what is going on".

Any changes that take us that direction are very useful :)

Nick

On Sat, 6 Apr 2019 at 03:34, Nathan Froyd <nfr...@mozilla.com> wrote:

> TL;DR: We're making some changes to how inlined functions are handled
> in our crash reports on non-Windows platforms in bug 524410.  This
> change should mostly result in more understandable crash stacks for
> code that uses lots of inlining, and shouldn't make things any worse.
> Some crash signatures may change as a result.  If you have concerns,
> or you happen to have crash stacks that you're curious about whether
> they'd change under this new policy, please comment in bug 524410 or
> email me.
>
> For the grotty details, read on.
>
> Modern C++/Rust code relies on inlining for efficiency, and modern
> compilers have gotten very good at accommodating such code: it's not
> unusual for code to feature double-digit levels of inlining (A inlined
> into B inlined into C...inlined into J).  A simple Rust function that
> looks like:
>
>   slice.into_iter().map(|...| { ... })
>
> and you think of as spanning addresses BEGIN to END, might actually
> feature small ranges of instructions from a dozen different functions,
> and the compiler will (mostly) faithfully tell you a precise location
> for each range.  (Instruction A comes from some iterator code,
> instruction B comes from a bit of your closure, instruction C comes
> from some inlined function three levels deep inside of your
> closure...) Unfortunately, this faithfulness means that in the event
> of a crash, the crashing instruction might get attributed to some code
> in libstd with no indication of how that relates to the original code.
>
> Bug 524410, supporting inline functions in the symbol dumper, has been
> open for a decade now.  The idea is that compilers (on Unix platforms,
> not entirely sure this is true on Windows) will not only give you
> precise information about what function particular instruction ranges
> come from, they will also give you information about the chain of
> inlining that resulted in those particular instructions.  The symbol
> dumper ought to be able to emit enough information to reconstruct
> "frames" from inlined functions.  That is, if you have:
>
> ```
> addr0 ---+
> ...      | A
> addr1 ---+
> addr2 ---+   ---+         ------+
> ...      | B    |               |
> addr3 ---+      | operator+     |
> addr4 ---+      |               |
> ...      | C    |               |
> addr5 ---+   ---+               | DoTheThing
> addr6 ---+   ---+               |
> ...      | D    |               |
> addr7 ---+      | operator[]    |
> addr8 ---+      |               |
> ...      | E    |               |
> addr9 ---+   ---+         ------+
> ...
> ```
>
> (apologies if the ASCII art doesn't come through), and you're crashing
> at `addr6`, the status quo is that you know you crashed in `D`, and
> the next frame is whoever called `DoTheThing`.  The ideal state of
> affairs is that you're told that you crashed in `D`, called from
> `operator[]`, called from `DoTheThing`, and so forth.
>
> We're not there yet.  Changing the symbol dumper to emit this
> information requires changing the symbol file format, which requires
> some coordinated updates to several pieces of infrastructure (and
> probably others that I don't know about).  It also requires discussion
> with upstream Breakpad (and therefore breaking many more consumers
> than just Mozilla-internal ones) and/or forking Breakpad completely
> and/or rewriting all our tools (which is a special case of forking).
> We want to get to that end state, but it's a fair bit of work.
>
> The patches on the bug implement a truncated version of the above that
> doesn't require dumping the entire inlining hierarchy.  The idea is
> that, as much as possible, for addresses in some function, you want to
> attribute those addresses to source code lines for said function.  So
> instead of recording the most precise lines possible (A, B, C, D, E,
> etc. in the above), you want to simply attribute the entire range of
> B-E to `DoTheThing`.  This transformation loses information, but it
> tends to produce stack traces that make more sense to humans.  You can
> see some examples of the changes in:
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=524410#c22
> https://bugzilla.mozilla.org/show_bug.cgi?id=524410#c29
>
> which result in more sensible crash stacks, even if they don't
> immediately point out what's going wrong.
>
> This work has not landed yet, but should land sometime next week.  If
> you have concerns, or you happen to have crash stacks that you're
> curious about whether they'd change under this new policy, please
> comment in bug 524410 or email me.
>
> Thanks,
> -Nathan
> _______________________________________________
> Stability mailing list
> stabil...@mozilla.org
> https://mail.mozilla.org/listinfo/stability
>
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform

Reply via email to