Re: [lldb-dev] RFC: Processor Trace Support in LLDB

Fangrui Song via lldb-dev Sun, 20 Sep 2020 11:25:02 -0700

On 2020-09-18, Eric Christopher via lldb-dev wrote:

Hi Walter,


I've only done a brief scan of the document but, in general, I'm in favor
of the goals, aim, and approach. Something I think would be good would be
to compare/contrast against rr in an "exploring alternatives" section of
the document. I think the document should also be made available/adapted to
be part of the documentation on "why lldb is implementing this feature/what
it can be used for".

Thanks so much for starting this and looking forward to the work and
collaboration.

-eric


Same. I am really excited that this work will open up possibilities for
reverse debugging, which is the most important factor impeding me from
migrating (from gdb) to lldb :)

For unit tests, a JSON trace record format is probably convenient, but
for practical usage we may need a more compact format, e.g. Cap'n Proto
as used by rr
(https://robert.ocallahan.org/2017/08/stabilizing-rr-trace-format.html).
I hope the framework can be easily adapted to such a compact format.

On Thu, Sep 17, 2020 at 8:28 PM Walter via lldb-dev <lldb-dev@lists.llvm.org>
wrote:

Hi all,



Here I propose, along with Greg Clayton, Processor Trace support for LLDB. I’m attaching a link to the 
document that contains this proposal in case that’s easier for you to read: 
https://docs.google.com/document/d/1cOVTGp1sL_HBXjP9eB7qjVtDNr5xnuZvUUtv43G5eVI/edit#heading=h.t5mblb9ugv8f
Please make any comments on this mailing list.



If you want to quickly learn what Processor Trace can do, you can read 
https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace



Any comments are appreciated, especially the ones regarding the commands the 
user will interact with.



Thanks,

Walter Erquinigo.





# RFC: Processor Trace Support in LLDB





# What is processor tracing?



Processor tracing works by capturing information about the execution of a 
process so that the control flow of the program can be reconstructed later. 
Implementations of this include Intel Processor Trace for x86 and x86_64 
([https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html](https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html))
 and ARM CoreSight for some ARM devices 
([https://developer.arm.com/ip-products/system-ip/coresight-debug-and-trace](https://developer.arm.com/ip-products/system-ip/coresight-debug-and-trace)).



As a clarifying example, with these technologies it’s possible to trace all the 
threads of a process, and after the process has finished, reconstruct every 
single instruction address each thread has executed. This could include some 
additional information like timestamps, async CPU events, kernel instructions, 
bus clock ratio changes, etc. On the other hand, memory and registers are not 
traced as a way to limit the size of the trace.





# Intel Processor Trace as the first implementation



We’ll focus on Intel Processor Trace (Intel PT), but in a generic way so that 
in the future similar technologies can be onboarded in LLDB.



Intel PT has the following features:







*   Control flow tracing in a highly encoded format

*   3% to 5% slowdown when capturing

*   No memory nor registers captured

*   Kernel tracing support

*   Timestamps of branches are produced, which can be used for profiling

*   Adjustable size of trace buffer

*   Supported on most Intel CPUs since 2015

*   x86 and x86_64 only

*   Official support only on Linux

*   Basic support on Windows

*   Decoding/analysis can be done on any operating system



A very nice introduction to Intel PT can be found at 
[https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html](https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html)
 and 
[https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace](https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace).
 Both are highly recommended for fully grasping the impact of this project.



More technical details are in 
[https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/perf-intel-pt.txt](https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/perf-intel-pt.txt).



Even more technical details are in the processor manual 
[https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3c-part-3-manual.pdf](https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3c-part-3-manual.pdf)





# Basic Definitions







*   Trace file: A trace file basically contains the information of the target 
addresses of each branch or jump within the program execution in a highly 
encoded format.

*   Capturing: The act of tracing a process and producing a trace file.

*   Decoding: Decoding outputs a sequential list of instructions given a trace 
file and the images of a process. Decoding is generally an offline step as it’s 
expensive.

*   Trace buffer: In order to limit the size of the trace, an on-memory 
circular buffer can be used, keeping the most recent branching information. The 
trace file is a snapshot of this.

*   Gap: Sporadically some branching information can be lost or be impossible 
to decode, which creates a gap in the reconstructed control flow.
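The trace buffer definition above can be illustrated with a small Python sketch. It is only a model of the idea: the class name `TraceBuffer` and the use of plain integers as branch targets are assumptions made for this example, not how Intel PT or LLDB actually store trace data.

```python
from collections import deque


class TraceBuffer:
    """Toy model of a circular trace buffer (hypothetical, for illustration)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry once it is full.
        self._entries = deque(maxlen=capacity)

    def record_branch(self, target_address):
        self._entries.append(target_address)

    def snapshot(self):
        # The "trace file" is a copy of whatever the buffer holds right now.
        return list(self._entries)


buf = TraceBuffer(capacity=3)
for address in (0x400558, 0x400559, 0x40055C, 0x400563):
    buf.record_branch(address)

# Only the three most recent branch targets survive.
print([hex(a) for a in buf.snapshot()])
```

The oldest branch target is overwritten once the buffer is full, which is why a snapshot only covers the most recent part of the execution.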





# New LLDB features







*   Loading traces: We want to load traces, potentially from other computers, 
and have LLDB symbolicate them. A flow like the following should be possible:

    ```
    $ trace load /path/to/trace
    $ trace dump --instructions
    pid: '1234', tid: '1981309'
      a.out`main
      [57] 0x400549 <+13>: movl   %eax, -0x4(%rbp)
      a.out`bar()
      [56] 0x40053b <+46>: retq
      [55] 0x40053a <+45>: leave
      [54] 0x400537 <+42>: movl   -0x4(%rbp), %eax
      [53] 0x400535 <+40>: jle    0x400525                  ; <+24> at main.cpp:7
      [52] 0x400531 <+36>: cmpl   $0x3, -0x8(%rbp)
      [51] 0x40052d <+32>: addl   $0x1, -0x8(%rbp)
      [50] 0x40052a <+29>: addl   %eax, -0x4(%rbp)
      a.out`foo()
      [49] 0x400567 <+15>: retq
      [48] 0x400566 <+14>: popq   %rbp
      [47] 0x400563 <+11>: movl   -0x4(%rbp), %eax
      [46] 0x40055c <+4>: movl   $0x2a, -0x4(%rbp)
      ...
      [1] 0x400559 <+1>: movq   %rsp, %rbp
      [0] 0x400558 <+0>: pushq  %rbp

      // Format:
      // [instruction index] <instruction disassembly>
    ```

    Notice the resemblance to loading a core file, but in this case we can get the 
control flow, printed in reverse order in this example.







*   Decoding: LLDB can use libipt 
([https://github.com/intel/libipt](https://github.com/intel/libipt)), which is 
the low level Intel PT decoding library, to convert trace files into 
instructions.

*   Showing instructions: LLDB can output the list of instructions of the 
control flow, as shown above

*   Showing function calls: Similarly, LLDB can print a hierarchical view of 
the function calls. A flow like this should be possible:

    ```
    $ trace load /path/to/trace
    $ trace dump --function-calls
    pid: '1234', tid: '1981309'
      [50]     a.out`bar()         0x40052a
      [45]       a.out`zaz()       0x400558
      [40]     a.out`baz()         0x400559
      [30]   a.out`foo()           0x400567
      [0]  a.out`main              0x400000
    ```

    This functionality allows LLDB to reconstruct the call stack at any point and 
potentially do reverse debugging.



*   Capturing: LLDB can also do the Intel PT capturing of a live process, so 
that at any stop the user can do reverse stepping or simply inspect the trace. 
A possible flow is:



    ```
    $ <stopped at main>
    $ b main.cpp:50
    $ trace start intel-pt // this initiates the tracing
    $ continue
    $ <stopped at main.cpp:50>
    $ trace dump --instructions
    pid: '1234', tid: '1981309'
      a.out`main
      [57] 0x400549 <+13>: movl   %eax, -0x4(%rbp)
      a.out`bar()
      [56] 0x40053b <+46>: retq
      [55] 0x40053a <+45>: leave
    ```







*   Displaying time information: If the trace contains timing information, we 
could also display it along with each instruction, e.g.

    ```
    a.out`bar()
    [56: 1600284226]: 0x40053b <+46>: retq
    ...
    [4:  1600284200]: 0x40053a <+45>: leave
    // Format:
    // [instruction index: unix timestamp] <instruction disassembly>
    ```

    Furthermore, we could display the time spent in each function.
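As a rough illustration of how per-function time could be derived from timestamped instructions, here is a minimal Python sketch. It assumes every decoded instruction carries a timestamp and a function name, and it attributes the gap between two consecutive instructions to the function executing the earlier one. The function name `self_time_per_function` and the tick values are made up for this example; this is not a description of what LLDB will implement.

```python
from collections import defaultdict


def self_time_per_function(instructions):
    """instructions: list of (timestamp, function_name) in execution order.

    Attributes the gap between two consecutive instructions to the function
    that was executing the earlier one.
    """
    totals = defaultdict(int)
    for (start, function), (end, _next_function) in zip(instructions, instructions[1:]):
        totals[function] += end - start
    return dict(totals)


# Hypothetical timestamps in arbitrary ticks.
decoded = [
    (100, "main"),
    (104, "foo()"),
    (110, "foo()"),
    (113, "main"),
    (120, "main"),
]
print(self_time_per_function(decoded))
```

This computes "self" time only. Time including callees would additionally need the call hierarchy.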







# Future LLDB features







*   Reverse Stepping: With the hierarchical reconstruction of the function 
calls, along with the individual instructions, LLDB can offer reverse stepping. 
Operations like reverse-next, reverse-step-out, reverse-continue could work by 
traversing the trace. We plan to work on this once the features presented above 
are in place.

*   Trace-based profiling

*   SB API for the features mentioned above





# Why is this useful?







*   Bug root-causing:

    *   For example, a crash in a production Release build ends up being 
analyzed with logs, a coredump, and a stack trace. Logs are not comprehensive, 
and a stack trace only contains the final state of the program. Providing the 
user with the control flow of the last milliseconds gives a tremendous amount 
of information that is game-changing in root-causing issues. It could be said 
that the user goes from a single stack trace to a list of stack traces.

    *   Reverse stepping enables more efficient debugging, as it reduces the 
number of iterations needed to root-cause a bug. More often than not, 
reproducing a bug takes a considerable amount of time, and the user needs to 
reproduce it several times until the correct breakpoints are hit. Giving the 
user the information of what has been executed so far can help them figure out 
where to place a breakpoint, or even directly figure out what went wrong.

*   Low cost: unlike other similar technologies, Intel PT has an almost 
negligible performance cost regardless of whether the build is optimized or 
not, making it appealing to a wide range of scenarios.

*   This infrastructure can be used for enabling other tools like 
non-sample-based profilers with instruction-level accuracy, security analyzers 
that check if certain memory regions are executed, and trace comparators, which 
could find bugs by comparing similar traces.





# Goals of this document:







*   Gather feedback on the basic Trace implementation, which would include the 
following basic operations: loading, decoding, and dumping.

*   Create awareness about this work.

*   Get a green light on the current set of patches implementing this feature 
starting with https://reviews.llvm.org/D85705.





# Non-Goals:







*   Discuss how reverse-stepping will be implemented. This can be left for 
another discussion. Once the Trace architecture is in place and robust, 
reverse-stepping can then be discussed, as it’s a more controversial change 
than this one.

*   Explain Intel PT thoroughly.





# Existing Tool Support







*   GDB has a basic implementation of the features above 
([https://sourceware.org/gdb/onlinedocs/gdb/Process-Record-and-Replay.html](https://sourceware.org/gdb/onlinedocs/gdb/Process-Record-and-Replay.html))
 and some ideas are taken from there.

*   Perf is a standalone tool that can do capturing and decoding.

*   The Linux kernel has full support for capturing at the thread, logical 
CPU, or cgroup level.

*   Intel developed a basic version of Intel PT support in LLDB as an external 
plugin. [https://reviews.llvm.org/D33674](https://reviews.llvm.org/D33674), 
[https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b](https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b).





# New Trace Commands



Based on this patch 
[https://reviews.llvm.org/D85705](https://reviews.llvm.org/D85705), there would 
be a common Trace class along with plug-in implementations.





## Trace loading





### $ trace load /path/to/trace/settings/file.json



As decoding a trace requires the images of the object files, the trace files 
and some CPU information, it’s convenient to have a JSON file that describes an 
entire trace session. The following JSON schema could be used.





```
{
  "trace": {
    … // plug-in specific information
  },
  "processes": [      // process information common to all trace plug-ins
    {
      "pid": integer,
      "triple": string, // llvm-triple
      "threads": [
        {
          "tid": integer,
          "traceFile": string
        }
      ],
      "modules": [
        {
          "systemPath": string, // original path of the module at runtime
          "file"?: string, // copy of the file if not available at "systemPath"
          "loadAddress": string, // string address in hex or decimal form
          "uuid"?: string,
        }
      ]
    }
  ]
}
// Notes:
// All paths are either absolute or relative to the settings file.
```
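To make the path rule in the notes concrete, here is a minimal Python sketch of a loader that reads such a settings file and resolves relative paths against the directory containing it. The helper name `load_trace_settings`, the file layout, and the example values are assumptions for illustration only. The real loading would happen inside LLDB.

```python
import json
import os
import tempfile


def load_trace_settings(settings_path):
    """Load a settings file and make its paths absolute (hypothetical helper)."""
    base_dir = os.path.dirname(os.path.abspath(settings_path))

    def resolve(path):
        if os.path.isabs(path):
            return path
        return os.path.normpath(os.path.join(base_dir, path))

    with open(settings_path) as f:
        settings = json.load(f)

    for process in settings["processes"]:
        for thread in process["threads"]:
            thread["traceFile"] = resolve(thread["traceFile"])
        for module in process["modules"]:
            # "file" is optional; fall back to the original runtime path.
            module["file"] = resolve(module.get("file", module["systemPath"]))
            # Base 0 accepts both "0x..." hex strings and plain decimal strings.
            module["loadAddress"] = int(module["loadAddress"], 0)
    return settings


# Example values below are invented for this sketch.
example = {
    "trace": {"type": "intel-pt"},
    "processes": [{
        "pid": 1234,
        "triple": "x86_64-unknown-linux-gnu",
        "threads": [{"tid": 1981309, "traceFile": "traces/1981309.trace"}],
        "modules": [{
            "systemPath": "/usr/bin/a.out",
            "file": "images/a.out",
            "loadAddress": "0x400000",
        }],
    }],
}
with tempfile.TemporaryDirectory() as bundle_dir:
    settings_path = os.path.join(bundle_dir, "trace.json")
    with open(settings_path, "w") as f:
        json.dump(example, f)
    loaded = load_trace_settings(settings_path)
    bundle_dir_abs = os.path.abspath(bundle_dir)
```

Resolving everything relative to the settings file is what lets a whole bundle directory be moved to another machine without editing the JSON.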





**Corefiles:**



We plan to extend this schema to support corefiles, but we will leave that out 
of this discussion, as it can easily be seen as an extension of this basic schema.



**Implementation details:**



To make our first implementation easier, we’ll ask for an individual trace file 
per thread. This is the simpler collection mode for Intel PT.



The entire json file will be translated into a Trace object, which contains the 
trace information of each thread and process in it.



Each process in the json file will be represented as a new Target. Similarly, 
threads and modules for each target will be created following the json file. 
This is very similar to what loading a minidump or coredump does.



Each Target will be associated with a Trace, and multiple targets can share the 
same Trace. The contract is that Trace is assumed to end at the current PC of 
each thread of the target.





### $ trace schema <plug-in>



This command prints the JSON schema of the trace settings file for the provided 
plug-in. It would output something similar to this:





```
{
  "trace": {
    "type": "intel-pt",
    "pt_cpu": {
      "vendor": "intel" | "unknown",
      "family": integer,
      "model": integer,
      "stepping": integer
    }
  },
  "processes": [
    {
      "pid": integer,
      "triple": string, // llvm-triple
      "threads": [
        {
          "tid": integer,
          "traceFile": string
        }
      ],
      "modules": [
        {
          "systemPath": string, // original path of the module at runtime
          "file"?: string, // copy of the file if not available at "systemPath"
          "loadAddress": string, // string address in hex or decimal form
          "uuid"?: string,
        }
      ]
    }
  ]
}
// Notes:
// All paths are either absolute or relative to the settings file.
```
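For illustration, the plug-in specific "trace" section of this schema could be checked with something like the following Python sketch. The function `validate_intel_pt_trace` and the sample CPU values are hypothetical. The sketch only mirrors the field names and types listed in the schema above.

```python
def validate_intel_pt_trace(trace):
    """Return a list of problems found in the "trace" section (empty if none)."""
    problems = []
    if trace.get("type") != "intel-pt":
        problems.append('"type" must be "intel-pt"')
    cpu = trace.get("pt_cpu")
    if not isinstance(cpu, dict):
        problems.append('"pt_cpu" must be an object')
        return problems
    if cpu.get("vendor") not in ("intel", "unknown"):
        problems.append('"pt_cpu.vendor" must be "intel" or "unknown"')
    for key in ("family", "model", "stepping"):
        value = cpu.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f'"pt_cpu.{key}" must be an integer')
    return problems


# Sample values are invented for this sketch.
good = {"type": "intel-pt",
        "pt_cpu": {"vendor": "intel", "family": 6, "model": 79, "stepping": 1}}
bad = {"type": "intel-pt",
       "pt_cpu": {"vendor": "amd", "family": "6", "model": 79}}
print(validate_intel_pt_trace(good))
print(validate_intel_pt_trace(bad))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a hand-written settings file at once.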







### $ trace dump [--verbose] [-t tid1] [-t tid2] ...



Print the trace information corresponding to the provided thread ids of the 
currently selected target, which would mainly include the same information as 
the trace settings file. If no tid is provided, the currently selected thread 
is used. This would be useful for debugging. The output would look like this:

  Modules:

    <module info like systemPath, file, load address, uuid, size>

  Threads:

    <thread info like location of trace file, number of instructions (if 
already decoded), number of function calls (if already decoded)>

If --verbose is passed, the original settings.json file is printed as well.





## Decoder-based commands



The following commands require decoding the trace and are of the form “trace dump 
<action> [-t <tid>]”. If tids are not specified, then the current thread 
or the current target will be used.





### $ trace dump --instructions [-t <tid>] [-c <count> = 10] [-o <offset> = 0]



This command would print the last <count> instructions starting at the given 
offset from the last instruction in the trace. The output would be similar to that of 
the “disassemble” command and would include timing information if available.





```
    $ trace dump --instructions -c 5
    pid: '1234', tid: '1981309'
      a.out`main
      [57] 0x400549 <+13>: movl   %eax, -0x4(%rbp)
      a.out`bar()
      [56] 0x40053b <+46>: retq
      [55] 0x40053a <+45>: leave
      [54] error -13. 'no memory mapped at this address'
      a.out`foo()
      [53] 0x400567 <+15>: retq
```





Repeating the command would continue printing where it was left off in the last 
run.
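The count/offset semantics and the "repeat to continue" behavior can be sketched in Python as follows. This is only a model under the assumption that instruction indices run from 0 to N-1 and that the offset counts back from the newest instruction. The function name `dump_window` is invented for this example.

```python
def dump_window(total_instructions, count=10, offset=0):
    """Return the instruction indices to print, newest first.

    offset counts back from the last instruction in the trace.
    """
    newest = total_instructions - 1 - offset
    oldest = max(newest - count + 1, 0)
    return list(range(newest, oldest - 1, -1))


# A trace with indices 0..57, as in the example output above.
first = dump_window(58, count=5)
# Repeating the command continues where the previous run stopped.
second = dump_window(58, count=5, offset=len(first))
print(first)
print(second)
```

Under this model, repeating the command is equivalent to advancing the offset by the number of items already printed.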



**Implementation details:**



Each instruction output by the decoder is either an actual instruction or an 
error. An error can be caused by a collection error (e.g. internal CPU 
buffer overflow error) or a decoding error (e.g. the image of an object file is 
missing while decoding). These errors represent gaps in the trace and the user 
should know about them, so we print them accordingly in this dump.



Each instruction (including errors) has an index in the decoded trace, and 
serves as a checkpoint.
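One possible way to model "instruction or error" items with indices is sketched below in Python. The `DecodedItem` type and its field names are assumptions made for this example and are not taken from the patches. The error line format follows the sample dump shown earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecodedItem:
    index: int
    address: Optional[int] = None      # set for real instructions
    error_code: Optional[int] = None   # set for errors (gaps)
    error_message: str = ""

    @property
    def is_error(self):
        return self.error_code is not None


def format_item(item):
    if item.is_error:
        return f"[{item.index}] error {item.error_code}. '{item.error_message}'"
    return f"[{item.index}] {item.address:#x}"


def gap_indices(items):
    """Indices of the items that represent gaps in the trace."""
    return [item.index for item in items if item.is_error]


trace = [
    DecodedItem(53, address=0x400567),
    DecodedItem(54, error_code=-13, error_message="no memory mapped at this address"),
    DecodedItem(55, address=0x40053A),
]
for item in reversed(trace):
    print(format_item(item))
```

Because errors consume an index like any instruction, a gap never shifts the indices of the instructions around it.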





### $ trace dump --function-calls [-t <tid>] [-c <count> = 10] [-o <offset> = 0] [--flat]



This command would print the hierarchical list of function calls. Similar to the 
“--instructions” command, it would show the last <count> function calls with 
the given offset from the last instruction. Timing information would be included if 
available.





```
    $ trace dump --function-calls
    pid: '1234', tid: '1981309'
      [50]     a.out`bar()         0x40052a
      [45]       a.out`zaz()       0x400558
      [40]     a.out`baz()         0x400559
      [30]   a.out`foo()           0x400567
      [0]  a.out`main              0x400000
```





Repeating the command would continue printing where it was left off in the last 
run.



If --flat is passed, then instead of a hierarchical view, a flat list 
would be produced.
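The difference between the hierarchical and the flat view can be sketched in Python as below. The sketch assumes the decoder can classify instructions as calls or returns and derives a nesting depth from them. The event tuples, `function_calls`, and `render` are hypothetical names for this example, and the instruction indices of the return events are made up.

```python
def function_calls(events):
    """events: list of (instruction_index, kind, function) in execution order,
    where kind is "call" or "return". Returns (index, depth, function) per call."""
    calls = []
    depth = 0
    for index, kind, function in events:
        if kind == "call":
            calls.append((index, depth, function))
            depth += 1
        elif kind == "return":
            # A trace may start in the middle of a call, so never go below 0.
            depth = max(depth - 1, 0)
    return calls


def render(calls, flat=False):
    lines = []
    for index, depth, function in reversed(calls):  # newest first
        indent = "" if flat else "  " * depth
        lines.append(f"[{index}] {indent}{function}")
    return lines


events = [
    (0, "call", "a.out`main"),
    (30, "call", "a.out`foo()"),
    (40, "call", "a.out`baz()"),
    (45, "call", "a.out`zaz()"),
    (47, "return", "a.out`zaz()"),
    (49, "return", "a.out`baz()"),
    (50, "call", "a.out`bar()"),
]
calls = function_calls(events)
print("\n".join(render(calls)))
print("\n".join(render(calls, flat=True)))
```

The flat view drops only the indentation. The order and the indices stay the same, so both views can share the same count/offset handling.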





## Capturing command





### $ trace start <plugin_name> [-t <tid>] [--all] [-b <buffer_size_in_KB>]



This command will start tracing the given thread of the currently selected 
target, or all the threads of that target if “--all” is passed. If “--all” is 
passed, any thread created after this command will also be traced automatically.



In addition, the optional -b parameter defines the size of each trace buffer to 
be created. I haven’t yet decided on a default, but 1 MB might be acceptable, as 
a buffer of that size traces around 1 million instructions on average according 
to Intel, and that’s more than enough for a useful analysis.



For an initial implementation, the plugin_name parameter will be required (e.g. 
intel-pt). Later a more automated mechanism for finding the right plugin can be 
implemented.



**Implementation notes:**



There’s already a basic implementation in lldb as an external plugin. It’s in 
[https://reviews.llvm.org/source/llvm-github/browse/master/lldb/tools/intel-features/intel-pt/](https://reviews.llvm.org/source/llvm-github/browse/master/lldb/tools/intel-features/intel-pt/)
 created by 
[https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b](https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b).
 It hasn’t received much attention and has been mostly unmaintained since it 
was created. It’s already capable of tracing a given thread and collecting the 
trace buffer. We plan to reuse that logic, which is already working.



A Trace object will be created and will be associated with the current Target.



Any interaction with trace, like dumping instructions, will trigger a fetch of 
the most recent trace buffer, unless it hasn’t changed.



When multiple threads are traced, each one will have its own trace buffer, as 
sharing one buffer in multiple threads requires knowing when each context 
switch happened so that the decoded trace can be split correctly among threads. 
This is beyond the scope of the initial version of this project.





### $ trace save /path/to/file.json [--copy-images]



This creates a trace bundle, with its settings saved in the given JSON file, for 
the current process. By default, it doesn’t copy the images loaded in 
the process, unless the “--copy-images” parameter is specified. That parameter 
is useful for analyzing the trace on a machine other than the one where it was captured.
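As a sketch of what copying images into a bundle could look like, the Python snippet below writes a settings file and, when requested, copies each module next to it and records the copy in the optional "file" field. The helper `save_trace_bundle`, the `images/` directory name, and the sample data are all assumptions for this example, not the behavior of an existing command.

```python
import json
import os
import shutil
import tempfile


def save_trace_bundle(settings, settings_path, copy_images=False):
    """Write settings to settings_path; optionally copy module images next to it.

    Hypothetical sketch of what a "--copy-images" option could do.
    """
    bundle_dir = os.path.dirname(os.path.abspath(settings_path))
    if copy_images:
        images_dir = os.path.join(bundle_dir, "images")
        os.makedirs(images_dir, exist_ok=True)
        for process in settings["processes"]:
            for module in process["modules"]:
                # Note: modules with the same base name would collide here.
                name = os.path.basename(module["systemPath"])
                shutil.copyfile(module["systemPath"], os.path.join(images_dir, name))
                # Paths in the settings file are relative to the file itself.
                module["file"] = os.path.join("images", name)
    with open(settings_path, "w") as f:
        json.dump(settings, f, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    fake_module = os.path.join(tmp, "a.out")
    with open(fake_module, "wb") as f:
        f.write(b"\x7fELF")  # stand-in for a real binary
    settings = {
        "trace": {"type": "intel-pt"},
        "processes": [{
            "pid": 1234,
            "triple": "x86_64-unknown-linux-gnu",
            "threads": [],
            "modules": [{"systemPath": fake_module, "loadAddress": "0x400000"}],
        }],
    }
    bundle_path = os.path.join(tmp, "bundle", "trace.json")
    os.makedirs(os.path.dirname(bundle_path))
    save_trace_bundle(settings, bundle_path, copy_images=True)
    copied = os.path.exists(os.path.join(tmp, "bundle", "images", "a.out"))
    saved_file_field = settings["processes"][0]["modules"][0]["file"]
```

Storing the copied image path relative to the settings file keeps the bundle self-contained, so the whole directory can be archived and loaded elsewhere.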





# Remote Protocol Changes



No new remote protocol changes are required, as 
[https://reviews.llvm.org/D33674](https://reviews.llvm.org/D33674) and 
[https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b](https://reviews.llvm.org/rG307db0f8974d1b28d7b237cb0d50895efc7f6e6b)
 already introduced the necessary ones some years ago.





# Build Requirements



In order to build LLDB with this support, it has to be linked with a build of 
libipt [https://github.com/intel/libipt](https://github.com/intel/libipt), 
which is the decoder.





# Operating System Requirements for Collection/Tracing



Collection can only be done on Linux, and only if the file 
/sys/bus/event_source/devices/intel_pt/type exists. The logic gating this 
feature is already checked in and defined in 
[https://reviews.llvm.org/D33674](https://reviews.llvm.org/D33674).
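The availability check described above amounts to probing that sysfs file. A minimal Python sketch, assuming the file holds the integer perf event type of the Intel PT PMU, could look like this. The function name and the `root` parameter are hypothetical and exist only so the check can be exercised against a fake directory tree. The actual gating logic lives in the patch referenced above.

```python
import os
import tempfile

INTEL_PT_TYPE_FILE = "sys/bus/event_source/devices/intel_pt/type"


def intel_pt_event_type(root="/"):
    """Return the perf event type number for Intel PT, or None if unavailable.

    root is a parameter only so the check can be run against a fake tree.
    """
    path = os.path.join(root, INTEL_PT_TYPE_FILE)
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


with tempfile.TemporaryDirectory() as fake_root:
    missing = intel_pt_event_type(fake_root)
    type_dir = os.path.join(fake_root, os.path.dirname(INTEL_PT_TYPE_FILE))
    os.makedirs(type_dir)
    with open(os.path.join(type_dir, "type"), "w") as f:
        f.write("8\n")  # made-up value for the fake tree
    present = intel_pt_event_type(fake_root)
print(missing, present)
```

On a real Linux machine, calling `intel_pt_event_type()` with the default root returns `None` when the kernel does not expose Intel PT.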





# Testing



It’s fortunately straightforward to test this feature. It’s possible to capture 
traces with perf or with the future “trace start” / “trace save” commands and 
create trace bundles with their corresponding settings .json file. Analyzing 
those traces should give the same results on any machine, making testing 
deterministic. 
[https://reviews.llvm.org/D85705](https://reviews.llvm.org/D85705) and 
its descendants already implement some deterministic tests.

_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
