It feels to me that the Python-based approach could run into a dead end fairly quickly: a) you can only access the data when the target is stopped; b) the self-tracing means that the evaluation of these expressions would introduce noise in the data; c) the overhead of all the extra packets (?).
So, I would be in favor of an lldb-server based approach. I'm not telling you that you shouldn't do that, but I don't think that's an approach I would take...

pl

On 1 February 2016 at 08:58, Ravitheja Addepally <ravithejaw...@gmail.com> wrote:
> Ok, that is one option, but one of the aims of this activity is to make the data available for use by IDEs like Android Studio or Xcode or any other that may want to display this information in its environment. So, keeping that in consideration, would the complete Python-based approach be useful? Or would providing LLDB APIs to extract raw perf data from the target be useful?
>
> On Thu, Jan 21, 2016 at 10:00 PM, Greg Clayton <gclay...@apple.com> wrote:
>>
>> One thing to think about is that you can actually just run an expression in the program that is being debugged without needing to change anything in the GDB remote server. So this can all be done via Python commands and would require no changes to anything. You can run an expression to enable the buffer, and since LLDB supports multi-line expressions that can define their own local variables and local types, the expression could be something like:
>>
>> int perf_fd = (int)perf_event_open(...);
>> struct PerfData
>> {
>>     void *data;
>>     size_t size;
>> };
>> PerfData result = read_perf_data(perf_fd);
>> result
>>
>> The result is then a structure that you can access from your Python command (it will be an SBValue), and then you can read memory in order to get the perf data.
>>
>> You can also split things up into multiple calls, where you run perf_event_open() on its own and return the file descriptor:
>>
>> (int)perf_event_open(...)
>>
>> This expression will return the file descriptor.
>>
>> Then you could allocate memory via the SBProcess:
>>
>> (void *)malloc(1024);
>>
>> The result of this expression will be the buffer that you use...
>>
>> Then you can read 1024 bytes at a time into this newly created buffer.
>>
>> So a solution that is completely done in Python would be very attractive.
>>
>> Greg
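In SB API terms, the flow described above boils down to a handful of calls. A minimal sketch, assuming a stopped process and borrowing the 1024-byte malloc from the example above; the helper name is made up here, and the perf_event_open()/read_perf_data() step is deliberately left elided exactly as it is in the quoted example:

import lldb

def read_perf_buffer(frame, process):
    # Run an expression in the debugged program; the result comes back
    # as an SBValue that the Python command can inspect.
    buf = frame.EvaluateExpression("(void *)malloc(1024)")
    if buf.GetError().Fail():
        return None
    addr = buf.GetValueAsUnsigned()

    # An expression calling perf_event_open()/read_perf_data() would go
    # here to fill the buffer (elided, as in the example above).

    # Read the raw bytes back out of the target so Python can decode them.
    error = lldb.SBError()
    data = process.ReadMemory(addr, 1024, error)
    return data if error.Success() else None

From an interactive session this could be tried with "script read_perf_buffer(lldb.frame, lldb.process)" once the target is stopped.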
>> > On Jan 21, 2016, at 7:04 AM, Ravitheja Addepally <ravithejaw...@gmail.com> wrote:
>> >
>> > Hello,
>> > Regarding the questions in this thread, please find the answers below.
>> >
>> > How are you going to present this information to the user? (I know debugserver can report some performance data... Have you looked into how that works? Do you plan to reuse some parts of that infrastructure?) and How will you get the information from the server to the client?
>> >
>> > Currently I plan to show a list of the instructions that have been executed so far. I saw the implementation suggested by Pavel; the already present infrastructure is a little bit lacking in terms of the needs of the project, but I plan to follow a similar approach, i.e. to extract the raw trace data by querying the server (which can use perf_event_open to get the raw trace data from the kernel) and transport it through gdb packets (qXfer packets, https://sourceware.org/gdb/onlinedocs/gdb/Branch-Trace-Format.html#Branch-Trace-Format). At the client side the raw trace data could be passed on to a Python-based command that could decode the data. This also eliminates the dependency on libipt, since LLDB would not decode the data itself.
>> >
>> > There is also the question of this third party library. Do we take a hard dependency on libipt (probably a non-starter), or only use it if it's available (much better)?
>> >
>> > With the above mentioned way, LLDB would not need the library; whoever wants to use the Python command would have to install it separately, but LLDB won't need it.
>> >
>> > With the performance counters, the interface would still be perf_event_open, so if there was a perf_wrapper in LLDB server then it could be reused to configure and use the software performance counters as well; you would just need to pass different attributes in the perf_event_open system call. Plus, I think the perf_wrapper could be reused to get CoreSight information as well (see https://lwn.net/Articles/664236/ ).
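For a concrete sense of what "passing different attributes in the perf_event_open system call" means, below is a minimal, self-contained sketch that opens an ordinary hardware instruction counter through that same interface from plain Python (ctypes, x86-64 Linux). The constants and the truncated perf_event_attr layout are assumptions taken from linux/perf_event.h, not anything provided by LLDB; an Intel PT session would use the same call with the PT PMU's dynamic type (from /sys/bus/event_source/devices/intel_pt/type) and an mmap'd AUX buffer instead of a plain read():

import ctypes
import os

# Constants from linux/perf_event.h (x86-64 values; assumptions, not LLDB API).
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_INSTRUCTIONS = 1
SYS_perf_event_open = 298          # x86-64 syscall number

class perf_event_attr(ctypes.Structure):
    # Truncated to the first 64 bytes (PERF_ATTR_SIZE_VER0); the kernel
    # accepts shorter layouts as long as 'size' says how much was provided.
    _fields_ = [
        ("type", ctypes.c_uint32),
        ("size", ctypes.c_uint32),
        ("config", ctypes.c_uint64),
        ("sample_period", ctypes.c_uint64),
        ("sample_type", ctypes.c_uint64),
        ("read_format", ctypes.c_uint64),
        ("flags", ctypes.c_uint64),    # packed bitfield: bit 0 = disabled,
                                       # bit 5 = exclude_kernel, bit 6 = exclude_hv
        ("wakeup_events", ctypes.c_uint32),
        ("bp_type", ctypes.c_uint32),
        ("bp_addr", ctypes.c_uint64),
    ]

libc = ctypes.CDLL(None, use_errno=True)

attr = perf_event_attr()
attr.size = ctypes.sizeof(attr)
attr.type = PERF_TYPE_HARDWARE            # Intel PT would use the PT PMU's dynamic type instead
attr.config = PERF_COUNT_HW_INSTRUCTIONS  # "different attributes", same system call
attr.flags = (1 << 5) | (1 << 6)          # exclude_kernel | exclude_hv: count user space only

# pid=0 (this process), cpu=-1 (any CPU), group_fd=-1, flags=0.
fd = libc.syscall(SYS_perf_event_open, ctypes.byref(attr), 0, -1, -1, 0)
if fd < 0:
    raise OSError(ctypes.get_errno(), "perf_event_open failed")

sum(range(1000000))                       # do some work worth counting
count = int.from_bytes(os.read(fd, 8), "little")
print("instructions retired:", count)
os.close(fd)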
>> > On Wed, Oct 21, 2015 at 8:57 PM, Greg Clayton <gclay...@apple.com> wrote:
>> > One main benefit to doing this externally is that it allows this to be done remotely over any debugger connection. If you can run expressions to enable/disable/set up the memory buffer and access the buffer contents, then you don't need to add code into the debugger to actually do this.
>> >
>> > Greg
>> >
>> > > On Oct 21, 2015, at 11:54 AM, Greg Clayton <gclay...@apple.com> wrote:
>> > >
>> > > IMHO the best way to provide this information is to implement reverse debugging packets in a GDB server (lldb-server). You would enable this feature via some packet to lldb-server, and that enables the gathering of data that keeps the last N instructions run by all threads in some buffer that gets overwritten. The lldb-server enables it and gives a buffer to the perf_event_interface(). Then clients can ask the lldb-server to step back in any thread. Only when the data is requested do we actually use the data to implement the reverse stepping.
>> > >
>> > > Another way to do this would be to use a Python-based command that can be added to any target that supports this. The plug-in could install a set of LLDB commands. To see how to create new lldb command line commands in Python, see the section named "CREATE A NEW LLDB COMMAND USING A PYTHON FUNCTION" on the http://lldb.llvm.org/python-reference.html web page.
>> > >
>> > > Then you can have some commands like:
>> > >
>> > > intel-pt-start
>> > > intel-pt-dump
>> > > intel-pt-stop
>> > >
>> > > Each command could have options and arguments as desired. The "intel-pt-start" command could make an expression call to enable the feature in the target by running an expression that makes the perf_event_interface calls that would allocate some memory and hand it to the Intel PT stuff. The "intel-pt-dump" command could just give a raw dump of all the history for one or more threads (again, add options and arguments as needed to this command). The Python code could bridge to C and use the Intel libraries that know how to process the data.
>> > >
>> > > If this all goes well we can think about building it into LLDB as a built-in command.
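The command plumbing referred to above (the "CREATE A NEW LLDB COMMAND USING A PYTHON FUNCTION" section) comes down to a module along these lines; a minimal sketch in which the module name intel_pt and the command bodies are made up for illustration, while the registration mechanism is the documented one:

import lldb

def intel_pt_start(debugger, command, result, internal_dict):
    # Would run the expression that sets up the perf_event / Intel PT buffer.
    result.PutCString("intel-pt-start: not implemented, see thread")

def intel_pt_dump(debugger, command, result, internal_dict):
    # Would fetch the raw buffer and hand it to a decoder (e.g. libipt).
    result.PutCString("intel-pt-dump: not implemented, see thread")

def __lldb_init_module(debugger, internal_dict):
    # Called automatically when the module is imported with
    # 'command script import intel_pt.py'; registers the new commands.
    debugger.HandleCommand("command script add -f intel_pt.intel_pt_start intel-pt-start")
    debugger.HandleCommand("command script add -f intel_pt.intel_pt_dump intel-pt-dump")

Loading it with "command script import intel_pt.py" makes intel-pt-start and intel-pt-dump available as regular (lldb) commands; the real bodies would run the expressions or packet requests discussed in this thread.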
>> > >> On Oct 21, 2015, at 9:50 AM, Zachary Turner via lldb-dev <lldb-dev@lists.llvm.org> wrote:
>> > >>
>> > >> There are two different kinds of performance counters: OS performance counters and CPU performance counters. It sounds like you're talking about the latter, but it's worth considering whether this could be designed in a way to support both (i.e. even if you don't do both yourself, at least make the machinery reusable and applicable to both for when someone else wants to come through and add OS perf counters).
>> > >>
>> > >> There is also the question of this third party library. Do we take a hard dependency on libipt (probably a non-starter), or only use it if it's available (much better)?
>> > >>
>> > >> As Pavel said, how are you planning to present the information to the user? Through some sort of top level command like "perfcount instructions_retired"?
>> > >>
>> > >> On Wed, Oct 21, 2015 at 8:16 AM Pavel Labath via lldb-dev <lldb-dev@lists.llvm.org> wrote:
>> > >> [ Moving this discussion back to the list. I pressed the wrong button when replying. ]
>> > >>
>> > >> Thanks for the explanation, Ravi. It sounds like a very useful feature indeed. I've found a reference to the debugserver profile data in GDBRemoteCommunicationClient.cpp:1276, so maybe that will help with your investigation. Maybe also someone more knowledgeable can explain what those A packets are used for (?).
>> > >>
>> > >> On 21 October 2015 at 15:48, Ravitheja Addepally <ravithejaw...@gmail.com> wrote:
>> > >>> Hi,
>> > >>> Thanks for your reply. Some of the future processors to be released by Intel have hardware support for recording the instructions that were executed by the processor, and this recording process is quite fast and does not add too much computational load. This hardware is made accessible via the perf_event_interface, where one could map a region of memory for this purpose by passing it as an argument to the perf_event_interface. The recorded instructions are then written to the assigned memory region. This is basically the raw information that can be obtained from the hardware. It can be interpreted and presented to the user in the following ways ->
>> > >>>
>> > >>> 1) Instruction history - where the user gets basically a list of all instructions that were executed
>> > >>> 2) Function call history - it is also possible to get a list of all the functions called in the inferior
>> > >>> 3) Reverse debugging with limited information - in GDB this is only the functions executed.
>> > >>>
>> > >>> This raw information also needs to be decoded (even before you can disassemble it); there is already a library released by Intel called libipt which can do that. At the moment we plan to work with instruction history. I will look into the debugserver infrastructure and get back to you. I guess for the server-client communication we would rely on packets only. In case of concerns about too much data being transferred, we can limit the number of entries we report, because the amount of data recorded is anyway too big to present all at once, so we would have to resort to something like a viewport.
>> > >>>
>> > >>> Since a lot of instructions can be recorded this way, the function call history can be quite useful for debugging, especially since it is a lot faster to collect function traces this way.
>> > >>>
>> > >>> -ravi
>> > >>>
>> > >>> On Wed, Oct 21, 2015 at 3:14 PM, Pavel Labath <lab...@google.com> wrote:
>> > >>>>
>> > >>>> Hi,
>> > >>>>
>> > >>>> I am not really familiar with the perf_event interface (and I suspect others aren't either), so it might help if you explain what kind of information you plan to collect from there.
>> > >>>>
>> > >>>> As for the PtraceWrapper question, I think that really depends on bigger design decisions. My two main questions for a feature like this would be:
>> > >>>> - How are you going to present this information to the user? (I know debugserver can report some performance data... Have you looked into how that works? Do you plan to reuse some parts of that infrastructure?)
>> > >>>> - How will you get the information from the server to the client?
>> > >>>>
>> > >>>> pl
>> > >>>>
>> > >>>> On 21 October 2015 at 13:41, Ravitheja Addepally via lldb-dev <lldb-dev@lists.llvm.org> wrote:
>> > >>>>> Hello,
>> > >>>>> I want to implement support for reading performance measurement information using the perf_event_open system calls. The motive is to add support for the Intel PT hardware feature, which is available through the perf_event interface. I was thinking of implementing a new wrapper, like PtraceWrapper, in the NativeProcessLinux files. My query is: is this a correct place to start or not? In case it is not, could someone suggest another place to begin with?
>> > >>>>>
>> > >>>>> BR,
>> > >>>>> A Ravi Theja

_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev