Re: [lldb-dev] RFC: Processor Trace Support in LLDB

Walter via lldb-dev Fri, 18 Sep 2020 16:11:59 -0700

Thanks for your comments, I'll reply here:

> Is it possible to decode a small portion of an Intel PT trace file
quickly, say, in a few milliseconds? This would be useful if tracing were
done in ringbuffer mode, or if the event the user is interested in
debugging (along with its relevant execution history) is known to occur at
the end of the trace. The user could potentially choose which subset of the
trace to decode, and re-decode a different subset if more context is needed.


Yes, it's totally possible. I'll dig a little deeper into the Intel PT trace
structure. A trace is made of a bunch of packets, some of which are
synchronization packets (PSB packets). You can pick an arbitrary sync
packet and start decoding from that point on. This means that it's
possible to decode backwards (i.e. find the last sync packet, decode the
packets up to the end of the trace, then move to the previous sync packet,
decode until the former packet, and so on). This is very valuable and I'll
keep it in mind for the implementation.
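
As a rough illustration of the backwards walk described above, here is a minimal Python sketch. It only locates sync points in a raw buffer and yields the byte ranges in newest-first order; it does not decode anything. It assumes the PSB packet is the 16-byte pattern 0x02 0x82 repeated eight times. The function names are made up for this example, and a real implementation would rely on libipt's own synchronization routines rather than a raw byte search, since payload bytes could in principle match the pattern by coincidence.

```python
from typing import Iterator, List, Tuple

# Assumed PSB encoding: the two bytes 0x02 0x82 repeated eight times (16 bytes).
PSB = bytes([0x02, 0x82]) * 8


def find_psb_offsets(trace: bytes) -> List[int]:
    """Return the byte offset of every PSB pattern found in the raw trace."""
    offsets = []
    pos = trace.find(PSB)
    while pos != -1:
        offsets.append(pos)
        pos = trace.find(PSB, pos + len(PSB))
    return offsets


def segments_backwards(trace: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) byte ranges, newest first; each range starts at a PSB."""
    end = len(trace)
    for start in reversed(find_psb_offsets(trace)):
        yield start, end
        end = start
```

Each yielded range could be handed to the decoder on its own, so only the tail of the trace needs to be decoded to show the most recent instructions.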

> What mechanisms are available for discerning the root cause of a gap?
Does the Intel PT decoder have internal consistency checks that can
diagnose hardware bugs (or decoder bugs, for that matter)?

Yes. Whenever there's a decoding error, the libipt decoder notifies us of
what the error is. You can check the DecodeInstructions function in
https://reviews.llvm.org/D87589 if you are interested, although it's not a
light read.

> Also, when a gap occurs, perhaps it's possible that the instructions
leading up to the gap are not accurate. E.g., if the decoding process
desyncs from the trace file while disassembling, it's possible to
accidentally follow (or ignore) a branch. Are there measures to
detect/erase those inaccurate instructions prior to a gap?

I don't think this can happen. When an instruction can't be decoded, the
decoder moves to the next synchronization point and resumes decoding from
that point. Interestingly, it's possible to configure how often
synchronization packets are produced. IIRC you could even request one sync
packet per CPU cycle, leading to small gaps. If this configuration is not
specified, the CPU itself decides when to produce these packets, which tends
to be every few KB of data.
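
To make the resynchronization behavior concrete, here is a toy Python model (not libipt code). The trace is assumed to be already split into segments, one per sync packet, and an undecodable item is represented by None; both choices are simplifications made up for this sketch.

```python
from typing import List, Optional, Union

GAP = "<gap>"


def decode_with_resync(segments: List[List[Optional[int]]]) -> List[Union[int, str]]:
    """Toy decoder: on an undecodable item, emit GAP and skip to the next sync point."""
    decoded: List[Union[int, str]] = []
    for segment in segments:          # each segment starts at a sync packet
        for item in segment:
            if item is None:          # decoding error inside this segment
                decoded.append(GAP)
                break                 # drop the rest; resume at the next sync point
            decoded.append(item)
    return decoded
```

In this model everything between the error and the next sync point is lost, which is why more frequent sync packets would lead to smaller gaps.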

> Also, how should a gap be represented in the debugger output? E.g., if a
gap is encountered while dumping instructions, should the debugger print
<gap: instruction unknown>?

> Imho it's important to nail down a user interface metaphor for
navigating/exploring a trace before adding any 'dump'-like commands. I
don't think we've done that yet.

We definitely should let the user know about these events. When dumping traces
in this WIP diff https://reviews.llvm.org/D87730, I'm already showing the
user these gaps along with the reason decoding failed.

[4] 0x400529 <+28>: cmpl   $0x3, -0x8(%rbp)
[3] error -13. 'no memory mapped at this address'
[2] 0x40052d <+32>: jle    0x400521


There's nothing more useful to do besides showing this information somehow.
It's information lost, so I think it's fine as it is :)

However, the really interesting point to discuss is how we could implement
reverse debugging under these circumstances. Suppose you do a reverse-next
and the trace has a gap with a bunch of instructions before it. Should
we abort the reverse-next and tell the user that there's a gap, leaving the
user to move backwards in some other way if that's the intention?
Or should we just move backwards, skipping the gap and printing a
message that there's a gap? I'd probably choose the latter over the former,
but I imagine some people would prefer the first. I'd prefer to leave this
for a future discussion. One more data point: several
IDEs like VSCode already support reverse-debugging controls, so it would
make sense to make the default behavior of the LLDB implementation follow
those controls, and create some other commands for the folks who want
something different.
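
The two policies can be sketched in a few lines of Python. This is only an illustration of the trade-off, not a proposed design: TraceItem, reverse_step and the skip_gaps flag are hypothetical names, and the trace is modeled as a plain list where index 0 is the oldest item.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TraceItem:
    address: Optional[int] = None   # set for a decoded instruction
    error: Optional[str] = None     # set for a gap


def reverse_step(trace: List[TraceItem], pos: int, skip_gaps: bool) -> Tuple[int, Optional[str]]:
    """Move one instruction back from index `pos`.

    Returns (new_pos, message). With skip_gaps=False the step is aborted at a
    gap and the position is unchanged; with skip_gaps=True the gap is skipped
    and reported in the message.
    """
    message = None
    i = pos - 1
    while i >= 0 and trace[i].error is not None:
        if not skip_gaps:
            return pos, f"stopped: gap at index {i} ({trace[i].error})"
        message = f"skipped gap at index {i} ({trace[i].error})"
        i -= 1
    if i < 0:
        return pos, "start of trace reached"
    return i, message
```

Either way the user is told about the gap; the only difference is whether the position moves.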

> I'm not trying to hold up work: I think these 'dump' subcommands can be
hidden, or maybe they could print a 'for lldb developers only' warning
until we have a better idea of how users will want to explore a trace.

I envision these dump commands as the most inefficient way to explore a
trace, and I wouldn't add much more to them. I think that the best way to
explore is with reverse debugging (e.g. place a breakpoint, do
reverse-continue, stop at that breakpoint, move forward and backwards,
print the stack trace, move to another breakpoint, etc.). A trace has so
much information, but the user already has an idea of where they want to
look when root-causing a bug, so breakpoints are the easiest interface
for the user to tell LLDB what they are interested in.

> One potential UI metaphor is a slider: the user can see where (which
instruction index) in the decoded trace instruction stream they are, and
they can move the slider (jump backwards/forwards in the instruction
stream) as desired. Wherever they are stopped, they can get an accurate
backtrace, look at the call (or line, or instruction-level) execution
history, peek ahead at future calls, etc. (Reverse) stepping/continuing
could be seen as moving the slider more or less quickly. Maybe it'd be
useful to mark a spot to get back to it later.

We agree on this. I think that reverse debugging controls in the UI
are a very good start for that.

> I'm sure there are other ways to look at a trace. E.g. you could have a
view that shows how often each function/line is executed, or you could have
an annotated CFG view.

Yes! This becomes highly important, especially if there's timing
information associated. You could make visualizations over time, with
callstacks, statistics, etc. My intention is to eventually flesh out those
cool features.
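
As a small example of the kind of statistic mentioned above, this Python sketch counts how many decoded instructions fall in each function. It assumes the decoded trace is available as (address, function name) pairs, which is a made-up representation for illustration; note that it counts instructions, not calls.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def function_hit_counts(decoded: Iterable[Tuple[int, str]]) -> List[Tuple[str, int]]:
    """Count decoded instructions per function, most frequently executed first."""
    counts = Counter(function for _address, function in decoded)
    return counts.most_common()
```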


Thanks,
- Walter Erquinigo

On Fri, 18 Sep 2020 at 15:32, Vedant Kumar <v...@apple.com> wrote:

> Hi Walter & Greg,
>
> Thanks for sharing this RFC, and for your work in this area.
>
> On Sep 17, 2020, at 5:28 PM, Walter via lldb-dev <lldb-dev@lists.llvm.org>
> wrote:
>
> Hi all,
>
>
>
> Here I propose, along with Greg Clayton, Processor Trace support for LLDB. 
> I’m attaching a link to the document that contains this proposal if that’s 
> easier to read for you: 
> https://docs.google.com/document/d/1cOVTGp1sL_HBXjP9eB7qjVtDNr5xnuZvUUtv43G5eVI/edit#heading=h.t5mblb9ugv8f
>  Please make any comments on this mailing list.
>
>
>
> If you want to quickly know what Processor Trace can do, you can read this 
> https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace 
>
>
>
> Any comments are appreciated, especially the ones regarding the commands the 
> user will interact with.
>
>
>
> Thanks,
>
> Walter Erquinigo.
>
>
>
>
>
> # RFC: Processor Trace Support in LLDB
>
>
>
>
>
> # What is processor tracing?
>
>
>
> Processor tracing works by capturing information about the execution of a 
> process so that the control flow of the program can be reconstructed later. 
> Implementations of this are Intel Processor Trace for X86, x86_64 
> ([https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html](https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html))
>  and ARM CoreSight for some ARM devices 
> ([https://developer.arm.com/ip-products/system-ip/coresight-debug-and-trace](https://developer.arm.com/ip-products/system-ip/coresight-debug-and-trace)).
>
>
>
> As a clarifying example, with these technologies it’s possible to trace all 
> the threads of a process, and after the process has finished, reconstruct 
> every single instruction address each thread has executed. This could include 
> some additional information like timestamps, async CPU events, kernel 
> instructions, bus clock ratio changes, etc. On the other hand, memory and 
> registers are not traced as a way to limit the size of the trace.
>
>
>
>
>
> # Intel Processor Trace as the first implementation
>
>
>
> We’ll focus on Intel Processor Trace (Intel PT), but in a generic way so that 
> in the future similar technologies can be onboarded in LLDB.
>
>
>
> Intel PT has the following features:
>
>
>
>
>
>
>
> *   Control flow tracing in a highly encoded format
>
> *   3% to 5% slowdown when capturing
>
> *   No memory nor registers captured
>
> *   Kernel tracing support
>
> *   Timestamps of branches are produced, which can be used for profiling
>
> *   Adjustable size of trace buffer
>
> *   Supported on most Intel CPUs since 2015
>
> *   X86 and x86_64 only
>
> *   Official support only on Linux
>
> *   Basic support on Windows
>
> *   Decoding/analysis can be done on any operating system
>
>
>
> A very nice introduction to Intel PT can be found 
> [https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html](https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html)
>  and 
> [https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace](https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace).
>  Totally recommended to fully grasp the impact of this project.
>
>
>
> More technical details are in 
> [https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/perf-intel-pt.txt](https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/perf-intel-pt.txt).
>
>
>
> Even more technical details are in the processor manual 
> [https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3c-part-3-manual.pdf](https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3c-part-3-manual.pdf)
>
>
>
>
>
> # Basic Definitions
>
>
>
>
>
>
>
> *   Trace file: A trace file basically contains the information of the target 
> addresses of each branch or jump within the program execution in a highly 
> encoded format.
>
> *   Capturing: The act of tracing a process and producing a trace file.
>
> *   Decoding: Decoding outputs a sequential list of instructions given a 
> trace file and the images of a process. Decoding is generally an offline step 
> as it’s expensive.
>
>
> Is it possible to decode a small portion of an Intel PT trace file
> quickly, say, in a few milliseconds? This would be useful if tracing were
> done in ringbuffer mode, or if the event the user is interested in
> debugging (along with its relevant execution history) is known to occur at
> the end of the trace. The user could potentially choose which subset of the
> trace to decode, and re-decode a different subset if more context is needed.
>
> *   Trace buffer: In order to limit the size of the trace, an on-memory 
> circular buffer can be used, keeping the most recent branching information. 
> The trace file is a snapshot of this.
>
> *   Gap: Sporadically some branching information can be lost or be impossible 
> to decode, which creates a gap in the reconstructed control flow.
>
>
> What mechanisms are available for discerning the root cause of a gap? Does
> the Intel PT decoder have internal consistency checks that can diagnose
> hardware bugs (or decoder bugs, for that matter)?
>
> Also, when a gap occurs, perhaps it's possible that the instructions
> leading up to the gap are not accurate. E.g., if the decoding process
> desyncs from the trace file while disassembling, it's possible to
> accidentally follow (or ignore) a branch. Are there measures to
> detect/erase those inaccurate instructions prior to a gap?
>
> Also, how should a gap be represented in the debugger output? E.g., if a
> gap is encountered while dumping instructions, should the debugger print
> <gap: instruction unknown>?
>
> # New LLDB features
>
>
>
>
>
>
>
> *   Loading traces: We want to load traces, potentially from other computers, 
> and have LLDB symbolicate them. A flow like the following should be possible:
>
>
>
>
>
>     ```
>
>     $ trace load /path/to/trace
>
>     $ trace dump --instructions
>
>     pid: '1234', tid: '1981309'
>
>       a.out`main
>
>       [57] 0x400549 <+13>: movl   %eax, -0x4(%rbp)
>
>       a.out`bar()
>
>       [56] 0x40053b <+46>: retq
>
>       [55] 0x40053a <+45>: leave
>
>       [54] 0x400537 <+42>: movl   -0x4(%rbp), %eax
>
>       [53] 0x400535 <+40>: jle    0x400525                  ; <+24> at 
> main.cpp:7
>
>       [52] 0x400531 <+36>: cmpl   $0x3, -0x8(%rbp)
>
>       [51] 0x40052d <+32>: addl   $0x1, -0x8(%rbp)
>
>       [50] 0x40052a <+29>: addl   %eax, -0x4(%rbp)
>
>       a.out`foo()
>
>       [49] 0x400567 <+15>: retq
>
>       [48] 0x400566 <+14>: popq   %rbp
>
>       [47] 0x400563 <+11>: movl   -0x4(%rbp), %eax
>
>       [46] 0x40055c <+4>: movl   $0x2a, -0x4(%rbp)
>
>
>
>               ...
>
>           [1] 0x400559 <+1>: movq   %rsp, %rbp
>
>           [0] 0x400558 <+0>: pushq  %rbp
>
>
>
>
>
>       // Format:
>
>       // [instruction index] <instruction disassembly>
>
>     ```
>
>
>
> Notice the resemblance to loading a core file, but in this case we can get 
> the control flow, printed in reverse order in this example.
>
>
>
>
>
>
>
> *   Decoding: LLDB can use libipt 
> ([https://github.com/intel/libipt](https://github.com/intel/libipt)), which 
> is the low level Intel PT decoding library, to convert trace files into 
> instructions.
>
> *   Showing instructions: LLDB can output the list of instructions of the 
> control flow, as shown above
>
> *   Showing function calls: Similarly, LLDB can print a hierarchical view of 
> the function calls. A flow like this should be possible:
>
>
>
>
>
>     ```
>
>     $ trace load /path/to/trace
>
>     $ trace dump --function-calls
>
>     pid: '1234', tid: '1981309'
>
>       [50]     a.out`bar()         0x40052a
>
>       [45]       a.out`zaz()       0x400558
>
>       [40]     a.out`baz()         0x400559
>
>       [30]   a.out`foo()           0x400567
>
>       [0]  a.out`main              0x400000
>
>     ```
>
>
>
> This functionality allows LLDB to reconstruct the call stack at any point 
> and potentially do reverse debugging.
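
A minimal Python sketch of how a call stack could be rebuilt from the decoded instruction stream follows. It assumes each decoded instruction is tagged with the function that contains it and with a kind ("call", "ret" or "other"); that representation and the name call_stack_at are hypothetical, not part of the patches.

```python
from typing import List, Tuple

# (function containing the instruction, kind), where kind is "call", "ret" or "other"
Instruction = Tuple[str, str]


def call_stack_at(trace: List[Instruction], index: int) -> List[str]:
    """Rebuild the call stack (outermost first) at trace[index]."""
    if not trace:
        return []
    stack = [trace[0][0]]
    for i in range(1, index + 1):
        prev_kind = trace[i - 1][1]
        function = trace[i][0]
        if prev_kind == "call":
            stack.append(function)          # entered a callee
        elif prev_kind == "ret":
            if len(stack) > 1:
                stack.pop()                 # returned to a traced caller
            else:
                stack[0] = function         # returned past the start of the trace
    return stack
```

Because the trace may start in the middle of the program, frames that were entered before tracing began are unknown; the sketch simply replaces the bottom frame when a return goes past the first traced function.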
>
>
> Imho it's important to nail down a user interface metaphor for
> navigating/exploring a trace before adding any 'dump'-like commands. I
> don't think we've done that yet.
>
> I'm not trying to hold up work: I think these 'dump' subcommands can be
> hidden, or maybe they could print a 'for lldb developers only' warning
> until we have a better idea of how users will want to explore a trace.
>
> One potential UI metaphor is a slider: the user can see where (which
> instruction index) in the decoded trace instruction stream they are, and
> they can move the slider (jump backwards/forwards in the instruction
> stream) as desired. Wherever they are stopped, they can get an accurate
> backtrace, look at the call (or line, or instruction-level) execution
> history, peek ahead at future calls, etc. (Reverse) stepping/continuing
> could be seen as moving the slider more or less quickly. Maybe it'd be
> useful to mark a spot to get back to it later.
>
> I'm sure there are other ways to look at a trace. E.g. you could have a
> view that shows how often each function/line is executed, or you could have
> an annotated CFG view.
>
> (Stepping back a bit -- I realize these comments are somewhat
> forward-looking / potentially out of scope for your initial patches. Still,
> I feel it's worth thinking about early on.)
>
> *   Capturing: LLDB can also do the Intel PT capturing of a live process, so 
> that at any stop the user can do reverse stepping or simply inspect the 
> trace. A possible flow is:
>
>
>
>     ```
>
>     $ <stopped at main>
>
>     $ b main.cpp:50
>
>     $ trace start intel-pt // this initiates the tracing
>
>     $ continue
>
>     $ <stopped at main.cpp:50>
>
>     $ trace dump --instructions
>
>     pid: '1234', tid: '1981309'
>
>       a.out`main
>
>       [57] 0x400549 <+13>: movl   %eax, -0x4(%rbp)
>
>       a.out`bar()
>
>       [56] 0x40053b <+46>: retq
>
>       [55] 0x40053a <+45>: leave
>
>     ```
>
>
>
>
>
>
>
> *   Displaying time information: If the trace contains timing information, we 
> could also display it along with each instruction, e.g.
>
>
>
>
>
>     ```
>
>     a.out`bar()
>
>     [56: 1600284226]: 0x40053b <+46>: retq
>
>     ...
>
>     [4:  1600284200]: 0x40053a <+45>: leave
>
>     // Format:
>
>     // [instruction index: unix timestamp] <instruction disassembly>
>
>     ```
>
>
>
>
>
>
>
>     Furthermore, we could display the time spent in each function.
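
One simple way to estimate the time spent in each function, sketched in Python under assumptions: the decoded trace is given as (timestamp, function name) pairs in execution order, and the time between two consecutive instructions is attributed to the function of the earlier one. Both the representation and the name time_per_function are made up for this example, and the result is exclusive (self) time only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def time_per_function(timed: List[Tuple[int, str]]) -> Dict[str, int]:
    """Sum, per function, the time elapsed until the next decoded instruction."""
    totals: Dict[str, int] = defaultdict(int)
    for (t0, function), (t1, _next) in zip(timed, timed[1:]):
        totals[function] += t1 - t0
    return dict(totals)
```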
>
>
>
>
>
>
>
> # Future LLDB features
>
>
>
>
>
>
>
> *   Reverse Stepping: With the hierarchical reconstruction of the function 
> calls, along with the individual instructions, LLDB can offer reverse 
> stepping. Operations like reverse-next, reverse-step-out, reverse-continue 
> could work by traversing the trace. We plan to work on this once the features 
> presented above are in place.
>
> *   Trace-based profiling
>
> *   SB API of the mentioned features
>
>
>
>
>
> # Why is this useful?
>
>
>
>
>
>
>
> *   Bug root-causing:
>
>     *   For example, a crash in a production Release build ends up being 
> analyzed with logs, a coredump, and a stack trace. Logs are not 
> comprehensive, and a stack trace only contains the final state of the 
> program. Providing the user with the control flow of the last milliseconds 
> gives a tremendous amount of information that is game-changing in 
> root-causing issues. It could be said that the user goes from a single stack 
> trace to a list of stack traces.
>
>     *   Reverse stepping enables more efficient debugging, as it reduces the 
> number of iterations needed to root-cause a bug. More often than not, 
> reproducing a bug takes a considerable amount of time, and the user needs to 
> reproduce it several times until the correct breakpoints are hit. 
> Giving the user the information of what has 
> been executed so far can help them figure out where to place 
> a breakpoint, or very easily figure out what went wrong.
>
> *   Low cost: unlike other similar technologies, Intel PT has an almost 
> negligible performance cost regardless of whether the build is optimized or 
> not, making it appealing to a wide range of scenarios.
>
> *   This infrastructure can be used for enabling other tools like 
> non-sample-based profilers with instruction-level accuracy, security 
> analyzers that check if certain memory regions are executed, and trace 
> comparators, which could find bugs by comparing similar traces.
>
>
>
>
>
> # Goals of this document:
>
>
>
>
>
>
>
> *   Gather feedback on the basic Trace implementation, which would include 
> the following basic operations: loading, decoding, and dumping.
>
>
> All this sounds good to me with the caveat that, as mentioned above, we
> probably should indicate to users that the `trace dump` facility is not
> stable / likely to change.
>
> *   Create awareness about this work.
>
> *   Get a green light on the current set of patches implementing this feature 
> starting with https://reviews.llvm.org/D85705.
>
>
>
>
>
> # Non-Goals:
>
>
>
>
>
>
>
> *   Discuss how reverse-stepping will be implemented. This can be left for 
> another discussion. Once the Trace architecture is in place and robust, 
> reverse-stepping can then be discussed, as it’s a more controversial change 
> than this one.
>
> *   Explain thoroughly Intel PT.
>
>
>
>
>
> # Existing Tool Support
>
>
>
>
>
>
>
> *   GDB has a basic implementation of the features above 
> ([https://sourceware.org/gdb/onlinedocs/gdb/Process-Record-and-Replay.html](https://sourceware.org/gdb/onlinedocs/gdb/Process-Record-and-Replay.html))
>  and some ideas are taken from there.
>
> *   Perf is a standalone tool that can do capturing and decoding.
>
> *   The Linux kernel has full support for doing capturing at thread, logical 
> cpu or cgroup level.
>
> *   Intel developed a basic version of Intel PT support in LLDB as an 
> external plugin. 
> [https://reviews.llvm.org/D33674](https://reviews.llvm.org/D33674), 
> [https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b](https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b).
>
>
>
>
>
> # New Trace Commands
>
>
>
> Based on this patch 
> [https://reviews.llvm.org/D85705](https://reviews.llvm.org/D85705), there 
> would be a common Trace class along with plug-in implementations.
>
>
>
>
>
> ## Trace loading
>
>
>
>
>
> ### $ trace load /path/to/trace/settings/file.json
>
>
>
> As decoding a trace requires the images of the object files, the trace files 
> and some CPU information, it’s convenient to have a JSON file that describes 
> an entire trace session. The following JSON schema could be used.
>
>
>
>
>
> ```
>
> {
>
> "trace": {
>
>    … // plug-in specific information
>
>  },
>
>  "processes": [      // process information common to all trace plug-ins
>
>    {
>
>      "pid": integer,
>
>      "triple": string, // llvm-triple
>
>      "threads": [
>
>        {
>
>          "tid": integer,
>
>          "traceFile": string
>
>        }
>
>      ],
>
>      "modules": [
>
>        {
>
>          "systemPath": string, // original path of the module at runtime
>
>          "file"?: string, // copy of the file if not available at "systemPath"
>
>          "loadAddress": string, // string address in hex or decimal form
>
>          "uuid"?: string,
>
>        }
>
>      ]
>
>    }
>
>  ]
>
> }
>
> // Notes:
>
> // All paths are either absolute or relative to the settings file.
>
> ```
>
>
>
>
>
> **Corefiles:**
>
>
>
> We plan to extend this schema to support corefiles, but we will leave that out 
> of this discussion, as it can easily be seen as an extension of this basic 
> schema.
>
>
>
> **Implementation details:**
>
>
>
> To make our first implementation easier, we’ll ask for an individual trace 
> file per thread. This is the simpler collection mode for Intel PT.
>
>
>
> The entire json file will be translated into a Trace object, which contains 
> the trace information of each thread and process in it.
>
>
>
> Each process in the json file will be represented as a new Target. Similarly, 
> threads and modules for each target will be created following the json file. 
> This is very similar to what loading a minidump or coredump does.
>
>
>
> Each Target will be associated with a Trace, and multiple targets can share 
> the same Trace. The contract is that Trace is assumed to end at the current 
> PC of each thread of the target.
>
>
>
>
>
> ### $ trace schema <plug-in>
>
>
>
> This command prints the JSON schema of the trace settings file for the 
> provided plug-in. It would output something similar to this
>
>
>
>
>
> ```
>
> {
>
> "trace": {
>
>    "type": "intel-pt",
>
>    "pt_cpu": {
>
>      "vendor": "intel" | "unknown",
>
>      "family": integer,
>
>      "model": integer,
>
>      "stepping": integer
>
>    }
>
>  },
>
>  "processes": [
>
>    {
>
>      "pid": integer,
>
>      "triple": string, // llvm-triple
>
>      "threads": [
>
>        {
>
>          "tid": integer,
>
>          "traceFile": string
>
>        }
>
>      ],
>
>      "modules": [
>
>        {
>
>          "systemPath": string, // original path of the module at runtime
>
>          "file"?: string, // copy of the file if not available at "systemPath"
>
>          "loadAddress": string, // string address in hex or decimal form
>
>          "uuid"?: string,
>
>        }
>
>      ]
>
>    }
>
>  ]
>
> }
>
> // Notes:
>
> // All paths are either absolute or relative to the settings file.
>
> ```
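
For illustration, here is a hypothetical settings file that follows the schema above, together with a small Python check for the required fields. The CPU family/model/stepping numbers, the triple, and all paths are made-up example values, and check_settings is not part of the proposal; it just treats every field not marked with "?" in the schema as required.

```python
import json
from typing import Any, Dict, List

EXAMPLE = """
{
  "trace": {
    "type": "intel-pt",
    "pt_cpu": {"vendor": "intel", "family": 6, "model": 79, "stepping": 1}
  },
  "processes": [
    {
      "pid": 1234,
      "triple": "x86_64-unknown-linux-gnu",
      "threads": [{"tid": 1981309, "traceFile": "1981309.trace"}],
      "modules": [
        {"systemPath": "/tmp/a.out", "file": "a.out", "loadAddress": "0x400000"}
      ]
    }
  ]
}
"""


def check_settings(settings: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the required fields are present."""
    problems = []
    if "type" not in settings.get("trace", {}):
        problems.append("trace.type is missing")
    for p, process in enumerate(settings.get("processes", [])):
        for key in ("pid", "triple", "threads", "modules"):
            if key not in process:
                problems.append(f"processes[{p}].{key} is missing")
        for t, thread in enumerate(process.get("threads", [])):
            for key in ("tid", "traceFile"):
                if key not in thread:
                    problems.append(f"processes[{p}].threads[{t}].{key} is missing")
        for m, module in enumerate(process.get("modules", [])):
            for key in ("systemPath", "loadAddress"):   # "file" and "uuid" are optional
                if key not in module:
                    problems.append(f"processes[{p}].modules[{m}].{key} is missing")
    return problems
```

Calling check_settings(json.loads(EXAMPLE)) returns an empty list for the example above.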
>
>
>
>
>
>
>
> ### $ trace dump [--verbose] [-t tid1] [-t tid2] ...
>
>
>
> Print the trace information corresponding to the provided thread ids of the 
> currently selected target, which would mainly include the same information as 
> the trace settings file. If no tid is provided, the currently selected thread 
> is used. This would be useful for debugging. The information would be like
>
>
>
>   Modules:
>
>
>
>     <module info like systemPath, file, load address, uuid, size>
>
>
>
>   Threads:
>
>
>
>     <thread info like location of trace file, number of instructions (if 
> already decoded), number of function calls (if already decoded)>
>
>
>
> If “--verbose” is passed, the original settings.json file is printed as 
> well.
>
>
>
>
>
> ## Decoder-based commands
>
>
>
> The following commands require decoding the trace and are of the form “trace 
> dump <action> [-t <tid>]”. If tids are not specified, then the current 
> thread or the current target will be used.
>
>
>
>
>
> ### $ trace dump --instructions [-t <tid>] [-c <count> = 10] [-o 
> <offset> = 0]
>
>
>
> This command would print the last <count> instructions starting at the 
> given offset from the last instruction in the trace. The output would be 
> similar to that of the “disassembly” command and would include timing 
> information if available.
>
>
>
>
>
> ```
>
>     $ trace dump --instructions -c 5
>
>     pid: '1234', tid: '1981309'
>
>       a.out`main
>
>       [57] 0x400549 <+13>: movl   %eax, -0x4(%rbp)
>
>       a.out`bar()
>
>       [56] 0x40053b <+46>: retq
>
>       [55] 0x40053a <+45>: leave
>
>       [54] error -13. 'no memory mapped at this address'
>
>       a.out`foo()
>
>       [53] 0x400567 <+15>: retq
>
> ```
>
>
>
>
>
> Repeating the command would continue printing where it was left off in the 
> last run.
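
The count/offset semantics can be sketched in Python as follows. This is only a model of the windowing described above, assuming the decoded trace is a list indexed from the oldest item (index 0); dump_window is a hypothetical name. Repeating the command would correspond to calling it again with the offset advanced by the number of items already printed.

```python
from typing import List, Tuple


def dump_window(trace: List[str], count: int = 10, offset: int = 0) -> List[Tuple[int, str]]:
    """Return up to `count` (index, item) pairs, newest first, skipping the `offset` newest items."""
    newest = len(trace) - 1 - offset
    oldest = max(newest - count, -1)
    return [(i, trace[i]) for i in range(newest, oldest, -1)]
```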
>
>
>
> **Implementation details:**
>
>
>
> Each instruction output by the decoder is either an actual instruction or an 
> error. An error can be caused by a collection error (e.g. internal CPU 
> buffer overflow error) or a decoding error (e.g. the image of an object file 
> is missing while decoding). These errors represent gaps in the trace and the 
> user should know about them, so we print them accordingly in this dump.
>
>
>
> Each instruction (including errors) has an index in the decoded trace, and 
> serves as a checkpoint.
>
>
>
>
>
> ### $ trace dump --function-calls [-t <tid>] [-c <count> = 10] [-o 
> <offset> = 0] [--flat]
>
>
>
> This command would print the hierarchical list of function calls. Similar to 
> the “--instructions” command, it would show the last <count> function 
> calls with the given offset from the last instructions. Timing information 
> would be included if available.
>
>
>
>
>
> ```
>
>     $ trace dump --function-calls
>
>     pid: '1234', tid: '1981309'
>
>       [50]     a.out`bar()         0x40052a
>
>       [45]       a.out`zaz()       0x400558
>
>       [40]     a.out`baz()         0x400559
>
>       [30]   a.out`foo()           0x400567
>
>       [0]  a.out`main              0x400000
>
> ```
>
>
>
>
>
> Repeating the command would continue printing where it was left off in the 
> last run.
>
>
>
> If “--flat” is passed, then instead of a hierarchical view, a flat list 
> would be produced.
>
>
>
>
>
> ## Capturing command
>
>
>
>
>
> ### $ trace start <plugin_name> [-t <tid>] [--all] [-b 
> <buffer_size_in_KB>]
>
>
>
> This command will start tracing the given thread of the currently selected 
> target, or all the threads of that target if “--all” is passed. If “--all” is 
> passed, any thread created after this command will also be traced 
> automatically.
>
>
>
> In addition, the optional -b parameter can define the size of each trace buffer 
> to be created. I haven’t yet decided on a default, but 1 MB might be 
> acceptable, as it traces around 1 million instructions on average according 
> to Intel, and that’s more than enough for a useful analysis.
>
>
>
> For an initial implementation, the plugin_name parameter will be required 
> (e.g. intel-pt). Later a more automated mechanism for finding the right 
> plugin can be implemented.
>
>
>
> **Implementation notes:**
>
>
>
> There’s already a basic implementation in lldb as an external plugin. It’s in 
> [https://reviews.llvm.org/source/llvm-github/browse/master/lldb/tools/intel-features/intel-pt/](https://reviews.llvm.org/source/llvm-github/browse/master/lldb/tools/intel-features/intel-pt/)
>  created by 
> [https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b](https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b).
>  It hasn’t received much attention and has been mostly unmaintained since it 
> was created. It’s already capable of tracing a given thread and collecting 
> the trace buffer. We plan to reuse that logic, which is already working.
>
>
>
> A Trace object will be created and will be associated with the current Target.
>
>
>
> Any interaction with trace, like dumping instructions, will trigger a fetch 
> of the most recent trace buffer, unless it hasn’t changed.
>
>
>
> When multiple threads are traced, each one will have its own trace buffer, as 
> sharing one buffer in multiple threads requires knowing when each context 
> switch happened so that the decoded trace can be split correctly among 
> threads. This is beyond the scope of the initial version of this project.
>
>
>
>
>
> ### $ trace save /path/to/file.json [--copy-images]
>
>
>
> This creates a trace bundle with settings saved in the given json file for 
> the current process. By default, it doesn’t create any copy of the images 
> loaded on the process, unless the “--copy-images” parameter is specified. 
> That parameter is useful for analyzing the trace in a machine other than 
> where it was captured.
>
>
>
>
>
> # Remote Protocol Changes
>
>
>
> No remote protocol changes are required, as 
> [https://reviews.llvm.org/D33674](https://reviews.llvm.org/D33674) and 
> [https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b](https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b)
>  already created them some years ago.
>
>
>
>
>
> # Build Requirements
>
>
>
> In order to build LLDB with this support, it has to be linked with a build of 
> libipt [https://github.com/intel/libipt](https://github.com/intel/libipt), 
> which is the decoder.
>
>
>
>
>
> # Operating System Requirements for Collection/Tracing
>
>
>
> Collection can only be done on Linux if the file 
> /sys/bus/event_source/devices/intel_pt/type exists. The logic gating this 
> feature is already checked in and defined in 
> [https://reviews.llvm.org/D33674](https://reviews.llvm.org/D33674).
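
For illustration only, the same check can be sketched in Python (the actual gating logic lives in the C++ patch referenced above). The sketch assumes the sysfs file holds a single integer, the perf event type of the Intel PT PMU; the function name is made up, and the path is a parameter so the check can be exercised on machines without Intel PT.

```python
from pathlib import Path
from typing import Optional

INTEL_PT_TYPE_FILE = "/sys/bus/event_source/devices/intel_pt/type"


def intel_pt_perf_type(path: str = INTEL_PT_TYPE_FILE) -> Optional[int]:
    """Return the perf event type for Intel PT, or None if the file is absent or unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None
```

A return value of None would mean that collection is not available on the current machine.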
>
>
>
>
>
> # Testing
>
>
>
> It’s fortunately straightforward to test this feature. It’s possible to 
> capture traces with perf or with the future “trace start” / ”trace save” 
> commands and create trace bundles with their corresponding settings .json 
> file. Analyzing those traces should give the same results on any machine, 
> making testing deterministic. 
> [https://reviews.llvm.org/D85705](https://reviews.llvm.org/D85705) and 
> descendants already implement some deterministic tests.
>
> _______________________________________________
> lldb-dev mailing list
> lldb-dev@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>
>
> vedant
>


-- 
- Walter Erquínigo Pezo
