Re: [lldb-dev] RFC: Processor Trace Support in LLDB

2020-09-18 Thread Vedant Kumar via lldb-dev
Hi Walter & Greg,

Thanks for sharing this RFC, and for your work in this area.

> On Sep 17, 2020, at 5:28 PM, Walter via lldb-dev wrote:
> 
> Hi all,
>  
> Here I propose, along with Greg Clayton, Processor Trace support for LLDB. 
> I’m attaching a link to the document that contains this proposal if that’s 
> easier to read for you: 
> https://docs.google.com/document/d/1cOVTGp1sL_HBXjP9eB7qjVtDNr5xnuZvUUtv43G5eVI/edit#heading=h.t5mblb9ugv8f
> Please make any comments in this mail list.
>  
>  
> If you want to quickly know what Processor Trace can do, you can read this 
> https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace
>  
> Any comments are appreciated, especially the ones regarding the commands the 
> user will interact with. 
>  
> Thanks,
> Walter Erquinigo.
>  
>  
> # RFC: Processor Trace Support in LLDB
>  
>  
> # What is processor tracing?
>  
> Processor tracing works by capturing information about the execution of a 
> process so that the control flow of the program can be reconstructed later. 
> Implementations of this are Intel Processor Trace for X86, x86_64 
> ([https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html](https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html))
>  and ARM CoreSight for some ARM devices 
> ([https://developer.arm.com/ip-products/system-ip/coresight-debug-and-trace](https://developer.arm.com/ip-products/system-ip/coresight-debug-and-trace)).
>  
>  
> As a clarifying example, with these technologies it’s possible to trace all 
> the threads of a process, and after the process has finished, reconstruct 
> every single instruction address each thread has executed. This could include 
> some additional information like timestamps, async CPU events, kernel 
> instructions, bus clock ratio changes, etc. On the other hand, memory and 
> registers are not traced as a way to limit the size of the trace.
>  
>  
> # Intel Processor Trace as the first implementation
>  
> We’ll focus on Intel Processor Trace (Intel PT), but in a generic way so that 
> in the future similar technologies can be onboarded in LLDB.
>  
> Intel PT has the following features:
>  
>  
>  
> *   Control flow tracing in a highly encoded format
> *   3% to 5% slowdown when capturing
> *   No memory nor registers captured
> *   Kernel tracing support
> *   Timestamps of branches are produced, which can be used for profiling
> *   Adjustable size of trace buffer
> *   Supported on most Intel CPUs since 2015
> *   X86 and x86_64 only
> *   Official support only on Linux
> *   Basic support on Windows
> *   Decoding/analysis can be done on any operating system
>  
> A very nice introduction to Intel PT can be found 
> [https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html](https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html)
>  and 
> [https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace](https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace).
>  Totally recommended to fully grasp the impact of this project.
>  
> More technical details are in 
> [https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/perf-intel-pt.txt](https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/perf-intel-pt.txt).
>  
>  
> Even more technical details are in the processor manual 
> [https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3c-part-3-manual.pdf](https:

Re: [lldb-dev] RFC: Processor Trace Support in LLDB

2020-09-18 Thread Walter via lldb-dev
Thanks for your comments, I'll reply here:

> Is it possible to decode a small portion of an Intel PT trace file
> quickly, say, in a few milliseconds? This would be useful if tracing were
> done in ringbuffer mode, or if the event the user is interested in
> debugging (along with its relevant execution history) is known to occur at
> the end of the trace. The user could potentially choose which subset of the
> trace to decode, and re-decode a different subset if more context is needed.

Yes, it's totally possible. I'll dig a little deeper into the Intel PT trace
structure. A trace is made of a bunch of packets, some of which are
synchronization packets (PSB packets). You can pick an arbitrary sync
packet and start decoding from that point on. This means that it's
possible to decode backwards (i.e. find the last sync packet, decode the
packets up to the end of the trace, then move to the previous sync packet,
decode up to the sync packet you started from last time, and so on). This is
very valuable and I'll keep it in mind for the implementation.
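
As a rough illustration of that backwards walk (a sketch, not libipt or
LLDB code): the snippet below splits a raw trace buffer at sync packets and
yields the segments from last to first, so that only the tail needs to be
decoded at first. It assumes the PSB packet is the 16-byte pattern 02 82
repeated eight times, which is how the Intel SDM describes it as far as I
can tell; the function names are hypothetical, and a plain byte search is a
simplification of what a real decoder does.

```python
# Sketch only: locate sync (PSB) packets in a raw trace buffer and visit the
# segments between them from the end of the trace towards the beginning.
# Assumption: PSB is the byte pair 0x02 0x82 repeated 8 times (16 bytes).
from typing import Iterator, List, Tuple

PSB = bytes([0x02, 0x82] * 8)


def find_sync_offsets(trace: bytes) -> List[int]:
    """Return the offset of every PSB pattern found in the buffer."""
    offsets = []
    pos = trace.find(PSB)
    while pos != -1:
        offsets.append(pos)
        pos = trace.find(PSB, pos + len(PSB))
    return offsets


def segments_backwards(trace: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) byte ranges, last segment first.

    Each range starts at a sync packet and ends where the following sync
    packet (or the end of the trace) begins, so it can be decoded alone.
    """
    end = len(trace)
    for start in reversed(find_sync_offsets(trace)):
        yield (start, end)
        end = start
```

A caller interested only in the most recent history would take the first
range from `segments_backwards`, hand it to the real decoder, and ask for
the next range only if more context is needed.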

> What mechanisms are available for discerning the root cause of a gap?
> Does the Intel PT decoder have internal consistency checks that can
> diagnose hardware bugs (or decoder bugs, for that matter)?

Yes. Whenever there's a decoding error, the libipt decoder notifies us of
what the error is. You can check the DecodeInstructions function in
https://reviews.llvm.org/D87589 if you are interested, although it's not a
light read.

> Also, when a gap occurs, perhaps it's possible that the instructions
> leading up to the gap are not accurate. E.g., if the decoding process
> desyncs from the trace file while disassembling, it's possible to
> accidentally follow (or ignore) a branch. Are there measures to
> detect/erase those inaccurate instructions prior to a gap?

I don't think this can happen. When an instruction can't be decoded, the
decoder moves to the next synchronization point and resumes decoding from
that point. Interestingly, it's possible to configure how often
synchronization packets are produced. IIRC you could even request one sync
packet per CPU cycle, leading to small gaps. If this configuration is not
specified, the CPU itself decides when to produce these packets, which tends
to be every few KB of data.
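
To make the resynchronization behavior concrete, here is a toy model (not
the Intel PT packet format and not the libipt API): the trace is a list of
hypothetical ("sync",), ("insn", address) and ("bad",) items. When a bad
item is hit, the decoder records a gap and drops everything until the next
sync item, so nothing decoded after the error and before the resync point
ends up in the output.

```python
# Toy model of "on error, skip to the next sync point and resume".
# The item kinds and the "gap" marker are made up for illustration.
from typing import List, Tuple, Union


def decode_with_resync(items: List[Tuple]) -> List[Union[int, str]]:
    """Return decoded addresses, with "gap" where decoding was lost."""
    decoded: List[Union[int, str]] = []
    skipping = False
    for item in items:
        kind = item[0]
        if skipping:
            # Ignore everything until the next synchronization item.
            if kind == "sync":
                skipping = False
            continue
        if kind == "bad":
            decoded.append("gap")
            skipping = True
        elif kind == "insn":
            decoded.append(item[1])
    return decoded
```

In this model, more frequent sync items mean fewer instructions are lost
per gap, which matches the point about configuring the sync packet
frequency.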

> Also, how should a gap be represented in the debugger output? E.g., if a
> gap is encountered while dumping instructions, should the debugger print
> ?
> Imho it's important to nail down a user interface metaphor for
> navigating/exploring a trace before adding any 'dump'-like commands. I
> don't think we've done that yet.

We definitely should let the user know of these events. When dumping traces
in this WIP diff https://reviews.llvm.org/D87730, I'm already showing the
user these gaps and the reason why decoding failed.

[4] 0x400529 <+28>: cmpl   $0x3, -0x8(%rbp)
[3] error -13. 'no memory mapped at this address'
[2] 0x40052d <+32>: jle    0x400521


There's nothing more useful to do besides showing this information somehow.
It's information lost, so I think it's fine as it is :)

However, the really interesting point to discuss is how we could implement
reverse debugging under these circumstances. Suppose you do reverse-next
and the trace has a gap, with a bunch of instructions before that gap.
Should we abort the reverse-next and tell the user that there's a gap, so
that the user has to move backwards in some other way if that's the
intention? Or should we just move backwards, skipping the gap and printing
an error message that there's a gap? I'd probably choose the latter over the
former, but I imagine some people would prefer the first. I'd prefer to
leave this for a future discussion. One more bit of information: several
IDEs like VSCode already support reverse-debugging controls, so it would
make sense to make the default behavior of the LLDB implementation follow
those controls, and create some other commands for the folks who want
something different.
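
The two options can be sketched on a simplified model, assuming the decoded
trace is a list where an instruction is its address and a gap is None. The
function below steps back a single instruction, which is simpler than a
real reverse-next; the name, the flag and the messages are all hypothetical
and nothing here is existing LLDB behavior.

```python
# Sketch of the two gap policies for a backwards step.
# Assumption: trace[i] is an instruction address, or None for a gap.
from typing import List, Optional, Tuple


def reverse_step(trace: List[Optional[int]], pos: int,
                 skip_gaps: bool) -> Tuple[int, str]:
    """Return (new position, message for the user)."""
    prev = pos - 1
    if prev < 0:
        return pos, "start of trace"
    if trace[prev] is not None:
        return prev, ""
    if not skip_gaps:
        # First policy: abort the step and report the gap.
        return pos, "gap in trace; not moving"
    # Second policy: move past the gap and warn about it.
    while prev >= 0 and trace[prev] is None:
        prev -= 1
    if prev < 0:
        return pos, "start of trace"
    return prev, "warning: skipped a gap in the trace"
```

With `skip_gaps=True` as the default, the behavior would line up with IDE
reverse-debugging controls, and the aborting variant could be exposed
through a separate command or setting.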

> I'm not trying to hold up work: I think these 'dump' subcommands can be
> hidden, or maybe they could print a 'for lldb developers only' warning
> until we have a better idea of how users will want to explore a trace.

I envision these dump commands as the most inefficient way to explore a
trace, and I wouldn't add much more to them. I think that the best way to
explore is with reverse debugging (e.g. place a breakpoint, do
reverse-continue, stop at that breakpoint, move forwards and backwards,
print the stack trace, move to another breakpoint, etc.). A trace has a lot
of information, but the user usually already has an idea of where they want
to look when root-causing a bug, so breakpoints are the easiest interface
for the user to tell LLDB what they are interested in.
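
On a decoded trace, a reverse-continue reduces to a backwards scan for the
nearest earlier instruction whose address has a breakpoint. A minimal
sketch, assuming the trace is already decoded into a list of addresses (the
function name and the data layout are hypothetical):

```python
# Sketch: find where a reverse-continue would stop in a decoded trace.
from typing import List, Optional, Set


def reverse_continue(addresses: List[int], pos: int,
                     breakpoints: Set[int]) -> Optional[int]:
    """Return the index of the closest breakpoint hit before pos, if any."""
    for i in range(pos - 1, -1, -1):
        if addresses[i] in breakpoints:
            return i
    return None
```

For example, with a loop body that executes address 0x400521 twice, two
consecutive calls stop at the second and then the first execution, and a
third call returns None because the start of the trace was reached.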

> One potential UI metaphor is a slider: the user can see where (which
> instruction index) in the decoded trace instruction stream they are, and
> they can move the slider (jump backwards/forwards in the instruction
> stream) as desired. Wherever they are stopped, they can get an accurate
> backtr

Re: [lldb-dev] RFC: Processor Trace Support in LLDB

2020-09-18 Thread Eric Christopher via lldb-dev
Hi Walter,

I've only done a brief scan of the document but, in general, I'm in favor
of the goals, aim, and approach. I think it would be good to
compare/contrast against rr in an "exploring alternatives" section of
the document. I think the document should also be made available/adapted to
be part of the documentation on "why lldb is implementing this feature/what
it can be used for/why".

Thanks so much for starting this and looking forward to the work and
collaboration.

-eric

On Thu, Sep 17, 2020 at 8:28 PM Walter via lldb-dev wrote:

> Hi all,
>
>
>
> Here I propose, along with Greg Clayton, Processor Trace support for LLDB. 
> I’m attaching a link to the document that contains this proposal if that’s 
> easier to read for you: 
> https://docs.google.com/document/d/1cOVTGp1sL_HBXjP9eB7qjVtDNr5xnuZvUUtv43G5eVI/edit#heading=h.t5mblb9ugv8f
>  
> .
>  Please make any comments in this mail list.
>
>
>
> If you want to quickly know what Processor Trace can do, you can read this 
> https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace 
> .
>
>
>
> Any comments are appreciated, especially the ones regarding the commands the 
> user will interact with.
>
>
>
> Thanks,
>
> Walter Erquinigo.
>
>
>
>
>
> # RFC: Processor Trace Support in LLDB
>
>
>
>
>
> # What is processor tracing?
>
>
>
> Processor tracing works by capturing information about the execution of a 
> process so that the control flow of the program can be reconstructed later. 
> Implementations of this are Intel Processor Trace for X86, x86_64 
> ([https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html](https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html))
>  and ARM CoreSight for some ARM devices 
> ([https://developer.arm.com/ip-products/system-ip/coresight-debug-and-trace](https://developer.arm.com/ip-products/system-ip/coresight-debug-and-trace)).
>
>
>
> As a clarifying example, with these technologies it’s possible to trace all 
> the threads of a process, and after the process has finished, reconstruct 
> every single instruction address each thread has executed. This could include 
> some additional information like timestamps, async CPU events, kernel 
> instructions, bus clock ratio changes, etc. On the other hand, memory and 
> registers are not traced as a way to limit the size of the trace.
>
>
>
>
>
> # Intel Processor Trace as the first implementation
>
>
>
> We’ll focus on Intel Processor Trace (Intel PT), but in a generic way so that 
> in the future similar technologies can be onboarded in LLDB.
>
>
>
> Intel PT has the following features:
>
>
>
>
>
>
>
> *   Control flow tracing in a highly encoded format
>
> *   3% to 5% slowdown when capturing
>
> *   No memory nor registers captured
>
> *   Kernel tracing support
>
> *   Timestamps of branches are produced, which can be used for profiling
>
> *   Adjustable size of trace buffer
>
> *   Supported on most Intel CPUs since 2015
>
> *   X86 and x86_64 only
>
> *   Official support only on Linux
>
> *   Basic support on Windows
>
> *   Decoding/analysis can be done on any operating system
>
>
>
> A very nice introduction to Intel PT can be found 
> [https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html](https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html)
>  and 
> [https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace](https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace).
>  Totally recommended to fully grasp the impact of this project.
>
>
>
> More technical details are in 
> [https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/perf-intel-pt.txt](https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/perf-intel-pt.txt).
>
>
>
> Even more technical details are in the processor manual 
> [https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3c-part-3-manual.pdf](https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3c-part-3-manual.pdf)
>
>
>
>
>
> # Basic Definitions
>
>
>
>
>
>
>
> *   Trace file: A trace file basically contains the information of the target 
> addresses of each branch or jump within the program execution in a highly 
> encoded format.
>
> *   Capturin