Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Tamminen, Eero T
Hi,

On Thu, 2021-02-11 at 17:39 -0800, John Bates wrote:
> I recently opened issue 4262 to begin the
> discussion on integrating perfetto into mesa.
> 
> *Background*
> 
> System-wide tracing is an invaluable tool for developers to find and fix
> performance problems. The perfetto project enables a combined view of trace
> data from kernel ftrace, GPU driver and various manually-instrumented
> tracepoints throughout the application and system.

Unlike some other Linux tracing solutions, Perfetto appears to be for
Android / Chrome(OS?), and is not available in common Linux distro
repos.

So, why Perfetto instead of one of the other solutions, e.g. the ones
mentioned here:
https://tracingsummit.org/ts/2018/
?

And, if a tracing API is added to Mesa, shouldn't it also support
tracepoints for other tracing solutions?

I mean, code added to drivers themselves preferably should not have
anything perfetto/percetto specific.  Tracing system specific code
should be only in one place (even if it's just macros in common header).
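For illustration, a minimal sketch of keeping the tracing backend in one common header, assuming a compile-time switch. The macro names (`MESA_TRACE_BEGIN`/`MESA_TRACE_END`), the `HAVE_PERFETTO`/`MESA_TRACE_STDERR` defines and the `perfetto_backend_*` functions are hypothetical, not existing Mesa, Perfetto or percetto APIs.

```c
#include <stdio.h>

/* Hypothetical common tracing header: drivers call only these macros,
 * and the backend is picked in exactly one place at compile time. */

#if defined(HAVE_PERFETTO)
/* Hypothetical Perfetto/percetto backend entry points (not real APIs). */
void perfetto_backend_begin(const char *name);
void perfetto_backend_end(void);
#define MESA_TRACE_BEGIN(name) perfetto_backend_begin(name)
#define MESA_TRACE_END()       perfetto_backend_end()
#elif defined(MESA_TRACE_STDERR)
/* Trivial debugging backend that prints events to stderr. */
#define MESA_TRACE_BEGIN(name) fprintf(stderr, "B %s\n", (name))
#define MESA_TRACE_END()       fprintf(stderr, "E\n")
#else
/* No backend: the macros compile away entirely. */
#define MESA_TRACE_BEGIN(name) ((void)0)
#define MESA_TRACE_END()       ((void)0)
#endif

/* Example driver function: nothing here is backend specific. */
int compile_shader_stub(int n)
{
   MESA_TRACE_BEGIN("compile_shader");
   int result = n * 2; /* stand-in for real work */
   MESA_TRACE_END();
   return result;
}
```

Driver code only ever uses the macros, so adding or swapping a backend touches just this header.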


> This helps developers
> quickly answer questions like:
> 
>    - How long are frames taking?

That doesn't require any changes to Mesa.  Just set a uprobe on a suitable
buffer swap function [1], and parse the kernel ftrace events.  This way,
starting tracing doesn't even require restarting the traced processes.


[1] glXSwapBuffers, eglSwapBuffers, eglSwapBuffersWithDamageEXT,
anv_QueuePresentKHR[2], etc.

[2] Many apps resolve the "vkQueuePresentKHR" Vulkan API loader wrapper
function and call the backend function like "anv_QueuePresentKHR"
directly, so it's better to track the latter instead.
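A sketch of the uprobe approach, assuming tracefs is mounted at /sys/kernel/tracing and that the file offset of the swap function inside the library has already been looked up (e.g. from its ELF symbol table). The helper names and the `mesa/swap` event name are illustrative; the line format follows the kernel's documented `p[:[GRP/]EVENT] PATH:OFFSET` uprobe_events syntax.

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Build a uprobe definition line for tracefs "uprobe_events":
 *    p:GRP/EVENT PATH:OFFSET
 * 'offset' must be the file offset of the function in the ELF file.
 * Returns the number of characters written, or -1 if 'buf' is too small. */
int format_uprobe(char *buf, size_t len, const char *group,
                  const char *event, const char *path, uint64_t offset)
{
   int n = snprintf(buf, len, "p:%s/%s %s:0x%" PRIx64,
                    group, event, path, offset);
   if (n < 0 || (size_t)n >= len)
      return -1;
   return n;
}

/* Append the probe definition to uprobe_events (requires root). Opening
 * in append mode keeps existing probes; afterwards the event is enabled
 * via events/GRP/EVENT/enable and read from trace or trace_pipe. */
int install_uprobe(const char *line)
{
   FILE *f = fopen("/sys/kernel/tracing/uprobe_events", "a");
   if (!f)
      return -1;
   int ok = fprintf(f, "%s\n", line) > 0;
   return (fclose(f) == 0 && ok) ? 0 : -1;
}
```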


>    - What caused a particular frame drop?
>    - Is it CPU bound or GPU bound?

That doesn't require adding tracepoints to Mesa, just checking CPU & GPU
utilization (which is a lower-level thing).
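As a rough illustration of the CPU side, overall utilization can be derived from two samples of the aggregate `cpu` line in /proc/stat. This is a simplified sketch: it ignores the per-core lines, GPU utilization is driver specific and not covered, and the struct and function names are made up for the example.

```c
#include <stdio.h>

/* Aggregate CPU time counters from the first line of /proc/stat, in
 * USER_HZ ticks. guest/guest_nice are skipped because the kernel
 * already accounts them in user/nice. */
struct cpu_times {
   unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

/* Parse a "cpu  ..." line; returns 0 on success. */
int parse_cpu_line(const char *line, struct cpu_times *t)
{
   int n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                  &t->user, &t->nice, &t->system, &t->idle,
                  &t->iowait, &t->irq, &t->softirq, &t->steal);
   return n == 8 ? 0 : -1;
}

/* Read the aggregate line from /proc/stat; returns 0 on success. */
int read_cpu_times(struct cpu_times *t)
{
   char line[512];
   FILE *f = fopen("/proc/stat", "r");
   if (!f)
      return -1;
   int ok = fgets(line, sizeof line, f) != NULL;
   fclose(f);
   return ok ? parse_cpu_line(line, t) : -1;
}

/* Fraction of time (0..1) spent busy between two samples. */
double cpu_utilization(const struct cpu_times *a, const struct cpu_times *b)
{
   unsigned long long idle_a = a->idle + a->iowait;
   unsigned long long idle_b = b->idle + b->iowait;
   unsigned long long tot_a = idle_a + a->user + a->nice + a->system +
                              a->irq + a->softirq + a->steal;
   unsigned long long tot_b = idle_b + b->user + b->nice + b->system +
                              b->irq + b->softirq + b->steal;
   if (tot_b <= tot_a)
      return 0.0;
   return 1.0 - (double)(idle_b - idle_a) / (double)(tot_b - tot_a);
}
```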


>    - Did a CPU core frequency drop cause something to go slower than usual?

Note that nowadays actual CPU frequencies are often controlled by HW /
firmware, so you don't necessarily get any ftrace event for a frequency
change; you would need to poll MSR registers instead (which is a
privileged operation, and polling can easily miss changes).
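To illustrate the polling approach on x86: the average effective frequency over an interval can be estimated from the IA32_MPERF (0xE7) and IA32_APERF (0xE8) MSRs as base_frequency * dAPERF / dMPERF, read through the Linux msr driver. This is a sketch under those assumptions; the base frequency has to come from elsewhere, and it only yields an average, so short frequency dips inside the interval stay invisible.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

#define MSR_IA32_MPERF 0xE7 /* counts at the nominal (base) frequency in C0 */
#define MSR_IA32_APERF 0xE8 /* counts at the actual frequency in C0 */

/* Read one 64-bit MSR through the Linux msr driver (/dev/cpu/N/msr).
 * Needs root (or CAP_SYS_RAWIO) and the 'msr' kernel module loaded.
 * Returns 0 on success, -1 otherwise. */
int read_msr(int cpu, uint32_t reg, uint64_t *value)
{
   char path[64];
   snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
   int fd = open(path, O_RDONLY);
   if (fd < 0)
      return -1;
   ssize_t n = pread(fd, value, sizeof *value, reg);
   close(fd);
   return n == (ssize_t)sizeof *value ? 0 : -1;
}

/* Average effective frequency over an interval, from two APERF/MPERF
 * samples. base_khz is the nominal frequency, supplied by the caller.
 * Only non-idle time is covered, and any frequency changes inside the
 * interval are averaged away. */
uint64_t effective_khz(uint64_t aperf0, uint64_t mperf0,
                       uint64_t aperf1, uint64_t mperf1, uint64_t base_khz)
{
   uint64_t da = aperf1 - aperf0, dm = mperf1 - mperf0;
   if (dm == 0)
      return 0;
   return (uint64_t)((double)base_khz * (double)da / (double)dm);
}
```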


>    - Is something else running that is stealing CPU or GPU time? Could I
>    fix that with better thread/context priorities?
>    - Are all CPU cores being used effectively? Do I need sched_setaffinity
>    to keep my thread on a big or little core?

I don't think these require adding tracepoints to Mesa either...


>    - What’s the latency between CPU frame submit and GPU start?

I think this would require tracepoints in kernel GPU code more than in
Mesa?


- Eero


> *What Does Mesa + Perfetto Provide?*
> 
> Mesa is in a unique position to produce GPU trace data for several GPU
> vendors without requiring the developer to build and install additional
> tools like gfx-pps.
> 
> The key is making it easy for developers to use. Ideally, perfetto is
> eventually available by default in mesa so that if your system has perfetto
> traced running, you just need to run perfetto (perhaps along with setting
> an environment variable) with the mesa categories to see:
> 
>    - GPU processing timeline events.
>    - GPU counters.
>    - CPU events for potentially slow functions in mesa like shader compiles.
> 
> Example of what this data might look like (with fake GPU events):
> [image: percetto-gpu-example.png]
> 
> *Runtime Characteristics*
> 
>    - ~500KB additional binary size. Even with using only the basic features
>    of perfetto, it will increase the binary size of mesa by about 500KB.
>    - Background thread. Perfetto uses a background thread for communication
>    with the system tracing daemon (traced) to advertise trace data and get
>    notification of trace start/stop.
>    - Runtime overhead when disabled is designed to be optimal with one
>    predicted branch, typically a few CPU cycles per event. While enabled,
>    the overhead can be around 1 us per event.
> 
> *Integration Challenges*
> 
>    - The perfetto SDK is C++ and designed around macros, lambdas, inline
>    templates, etc. There are ongoing discussions on providing an official
>    perfetto C API, but it is not yet clear when this will land on the perfetto
>    roadmap.
>    - The perfetto SDK is an amalgamated .h and .cc that adds up to 100K
>    lines of code.
>    - Anything that includes perfetto.h takes a long time to compile.
>    - The current Perfetto SDK design is incompatible with being a shared
>    library behind a C API.
> 
> *Percetto*
> 
> The percetto library was recently implemented to provide an interim C API
> for perfetto. It provides efficient support for scoped trace events,
> multiple categories, counters,

[Mesa-dev] Small problem with my Mesa install (via Re: [ANNOUNCE] mesa 20.2.2)

2021-02-12 Thread Ed B
Dear Dylan / Mesa Developer(s),

I have just managed to find this contact email for someone who is something to 
do with Mesa development / repair :-)

I have a problem with my computer crashing, probably due to Mesa updating from
version 20.0.8 to 20.2.6. Please see my enquiry on the KDE Forum
https://forum.kde.org/viewtopic.php?f=63=169662 for further details.

I would be very grateful if you could point me to the right person / Forum
for dealing with this issue (you will notice I am not a software person).
And you are welcome to pass on this email.

For the person who can tell me, I would like to know -

Is this a "bug" which will be fixed?
Is this old hardware whose support was removed, and might that support be
reinstated?
Am I at a "dead end", and will I have to forgo further updates permanently? I
do hope not :-/

I didn't mention on the Forum that I was having memory "segmentation" faults.
I have looked those up and they seem to be very awkward to deal with, so I
am hoping it is "just" a software bug which is fixable.

Currently I am using Kubuntu 20.10. I have moved from Kubuntu 20.04, and tried Ubuntu
20.04 on the way, to no avail.

Many thanks,  Regards,  Ed G8BQR
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Mark Janes
I've recently been using GPUVis to look at trace events.  On Intel
platforms, GPUVis incorporates ftrace events from the i915 driver,
performance metrics from igt-gpu-tools, and userspace ftrace markers
that I locally hack up in Mesa.

It is very easy to compile the GPUVis UI.  Userspace instrumentation
requires a single C/C++ header.  You don't have to access an external
web service to analyze trace data (a big no-no for devs working on
preproduction hardware).

Is it possible to build and run the Perfetto UI locally?  Can it display
arbitrary trace events that are written to
/sys/kernel/tracing/trace_marker ?  Can it be extended to show i915 and
i915-perf-recorder events?

John Bates  writes:

> I recently opened issue 4262 to begin the
> discussion on integrating perfetto into mesa.
>
> *Background*
>
> System-wide tracing is an invaluable tool for developers to find and fix
> performance problems. The perfetto project enables a combined view of trace
> data from kernel ftrace, GPU driver and various manually-instrumented
> tracepoints throughout the application and system. This helps developers
> quickly answer questions like:
>
>- How long are frames taking?
>- What caused a particular frame drop?
>- Is it CPU bound or GPU bound?
>- Did a CPU core frequency drop cause something to go slower than usual?
>- Is something else running that is stealing CPU or GPU time? Could I
>fix that with better thread/context priorities?
>- Are all CPU cores being used effectively? Do I need sched_setaffinity
>to keep my thread on a big or little core?
>- What’s the latency between CPU frame submit and GPU start?
>
> *What Does Mesa + Perfetto Provide?*
>
> Mesa is in a unique position to produce GPU trace data for several GPU
> vendors without requiring the developer to build and install additional
> tools like gfx-pps.
>
> The key is making it easy for developers to use. Ideally, perfetto is
> eventually available by default in mesa so that if your system has perfetto
> traced running, you just need to run perfetto (perhaps along with setting
> an environment variable) with the mesa categories to see:
>
>- GPU processing timeline events.
>- GPU counters.
>- CPU events for potentially slow functions in mesa like shader compiles.
>
> Example of what this data might look like (with fake GPU events):
> [image: percetto-gpu-example.png]
>
> *Runtime Characteristics*
>
>- ~500KB additional binary size. Even with using only the basic features
>of perfetto, it will increase the binary size of mesa by about 500KB.
>- Background thread. Perfetto uses a background thread for communication
>with the system tracing daemon (traced) to advertise trace data and get
>notification of trace start/stop.
>- Runtime overhead when disabled is designed to be optimal with one
>predicted branch, typically a few CPU cycles per event. While enabled,
>the overhead can be around 1 us per event.
>
> *Integration Challenges*
>
>- The perfetto SDK is C++ and designed around macros, lambdas, inline
>templates, etc. There are ongoing discussions on providing an official
>perfetto C API, but it is not yet clear when this will land on the perfetto
>roadmap.
>- The perfetto SDK is an amalgamated .h and .cc that adds up to 100K
>lines of code.
>- Anything that includes perfetto.h takes a long time to compile.
>- The current Perfetto SDK design is incompatible with being a shared
>library behind a C API.
>
> *Percetto*
>
> The percetto library was recently
> implemented to provide an interim C API for perfetto. It provides efficient
> support for scoped trace events, multiple categories, counters, custom
> timestamps, and debug data annotations. Percetto also provides some
> features that are important to mesa, but not available yet with perfetto
> SDK:
>
>- Trace events from multiple perfetto instances in separate shared
>libraries (like mesa and virglrenderer) show correctly in a single process
>and thread view.
>- Counter tracks and macro API.
>
> Percetto is missing API for perfetto's GPU DataSource and counter support,
> but that feature could be implemented next if it is important for mesa.
> With the existing percetto API mesa could present GPU trace data as named
> 'slice' events and int64_t counters with custom timestamps as shown in the
> image above (based on this sample).
>
> *Mesa Integration Alternatives*
>
> Note: we have some pressing needs for performance analysis in Chrome OS, so
> I'm intentionally leaving out the alternative of waiting for an official
> perfetto C API. Of course, once that C API is available it would 

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread John Bates
On Fri, Feb 12, 2021 at 5:01 AM Tamminen, Eero T 
wrote:

>
> Unlike some other Linux tracing solutions, Perfetto appears to be for
> Android / Chrome(OS?), and is not available in common Linux distro
> repos.
>
> So, why Perfetto instead of one of the other solutions, e.g. the ones
> mentioned here:
> https://tracingsummit.org/ts/2018/
> ?
>
>
Good question. Perfetto is for Linux, Android, and Chrome OS. Not sure what
Linux distros provide it besides Android and Chrome OS. It provides
comprehensive tracing solutions from data collection and tools to
convenient web-based UI and analysis as well as interoperation with
other trace data providers. Looking at the tracing summit presentations,
for example, there appear to be some good additional tracing data sources
that could potentially feed into Perfetto trace daemon and UI. But none of
those particular projects are providing a comprehensive solution like
Perfetto is. Lots more detail at perfetto.dev.


> And, if a tracing API is added to Mesa, shouldn't it also support
> tracepoints for other tracing solutions?
>
> I mean, code added to drivers themselves preferably should not have
> anything perfetto/percetto specific.  Tracing system specific code
> should be only in one place (even if it's just macros in common header).


I agree it makes sense to keep the macro API implementation in a common
mesa header so that we have the option of changing out the backend. On the
other hand, it can get difficult to maintain more than one tracing backend,
especially when tracing usage goes beyond the simple TRACE_SCOPE(__func__)
macros. For example, with GPU timeline tracks, counters, etc. I would not
expect mesa devs to test their tracing code on more than one tracing
backend, so it would be likely for other backends to regress. So ideally we
could pick one.
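A TRACE_SCOPE(__func__)-style macro can still live in a common header so the backend stays swappable. A minimal sketch, assuming GCC/Clang's `cleanup` attribute to end the event when the enclosing scope exits; `trace_begin`/`trace_end` are hypothetical stand-ins for whichever backend is chosen, and here they only track nesting depth.

```c
/* Hypothetical backend hooks; here they just count nesting depth. */
static int trace_depth;
static void trace_begin(const char *name) { (void)name; trace_depth++; }
static void trace_end(const char **name) { (void)name; trace_depth--; }

/* TRACE_SCOPE(name): begin an event now and end it automatically when
 * the enclosing block exits (after any return value is computed). */
#define TRACE_CONCAT_(a, b) a##b
#define TRACE_CONCAT(a, b) TRACE_CONCAT_(a, b)
#define TRACE_SCOPE(name)                                           \
   const char *TRACE_CONCAT(trace_scope_, __LINE__)                 \
      __attribute__((cleanup(trace_end))) = (trace_begin(name), (name))

int traced_function(void)
{
   TRACE_SCOPE(__func__);
   return trace_depth; /* 1 while inside the scope */
}
```

Since `cleanup` is a GNU extension, an MSVC build would need a fallback such as explicit begin/end calls.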


>
>
> > This helps developers
> > quickly answer questions like:
> >
> >- How long are frames taking?
>
> That doesn't require any changes to Mesa.  Just set a uprobe on a suitable
> buffer swap function [1], and parse the kernel ftrace events.  This way,
> starting tracing doesn't even require restarting the traced processes.
>
>
> [1] glXSwapBuffers, eglSwapBuffers, eglSwapBuffersWithDamageEXT,
> anv_QueuePresentKHR[2], etc.
>
> [2] Many apps resolve the "vkQueuePresentKHR" Vulkan API loader wrapper
> function and call the backend function like "anv_QueuePresentKHR"
> directly, so it's better to track the latter instead.
>
>
> >- What caused a particular frame drop?
> >- Is it CPU bound or GPU bound?
>
> That doesn't require adding tracepoints to Mesa, just checking CPU & GPU
> utilization (which is a lower-level thing).
>
>
> >- Did a CPU core frequency drop cause something to go slower than usual?
>
> Note that nowadays actual CPU frequencies are often controlled by HW /
> firmware, so you don't necessarily get any ftrace event for a frequency
> change; you would need to poll MSR registers instead (which is a
> privileged operation, and polling can easily miss changes).
>
>
> >- Is something else running that is stealing CPU or GPU time? Could I
> >fix that with better thread/context priorities?
> >- Are all CPU cores being used effectively? Do I need sched_setaffinity
> >to keep my thread on a big or little core?
>
> I don't think these require adding tracepoints to Mesa either...
>
>
> >- What’s the latency between CPU frame submit and GPU start?
>
> I think this would require tracepoints in kernel GPU code more than in
> Mesa?
>
>
> - Eero
>
>
> > *What Does Mesa + Perfetto Provide?*
> >
> > Mesa is in a unique position to produce GPU trace data for several GPU
> > vendors without requiring the developer to build and install additional
> > tools like gfx-pps.
> >
> > The key is making it easy for developers to use. Ideally, perfetto is
> > eventually available by default in mesa so that if your system has perfetto
> > traced running, you just need to run perfetto (perhaps along with setting
> > an environment variable) with the mesa categories to see:
> >
> >- GPU processing timeline events.
> >- GPU counters.
> >- CPU events for potentially slow functions in mesa like shader compiles.
> >
> > Example of what this data might look like (with fake GPU events):
> > [image: percetto-gpu-example.png]
> >
> > *Runtime Characteristics*
> >
> >- ~500KB additional binary size. Even with using only the basic features
> >of perfetto, it will increase the binary size of mesa by about 500KB.
> >- Background thread. Perfetto uses a background thread for communication
> >with the system tracing daemon (traced) to advertise trace data and get
> >notification of trace start/stop.
> >- Runtime overhead when disabled is designed to be optimal with one
> >predicted branch, typically a few CPU cycles

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Alyssa Rosenzweig
My 2c for Mali/Panfrost --

For us, capturing GPU perf counters is orthogonal to rendering. It's
expected (e.g. with Arm's tools) to do this from a separate process.
Neither Mesa nor the DDK should require custom instrumentation for the
low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
Perfetto as it is. So for us I don't see the value in modifying Mesa for
tracing.

On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:
> (responding from correct address this time)
> 
> On Fri, Feb 12, 2021 at 12:03 PM Mark Janes  wrote:
> 
> > I've recently been using GPUVis to look at trace events.  On Intel
> > platforms, GPUVis incorporates ftrace events from the i915 driver,
> > performance metrics from igt-gpu-tools, and userspace ftrace markers
> > that I locally hack up in Mesa.
> >
> 
> GPUVis is great. I would love to see that data combined with
> userspace events without any need for local hacks. Perfetto provides
> on-demand trace events with lower overhead compared to ftrace, so for
> example it is acceptable to have production trace instrumentation that can
> be captured without dev builds. To do that with ftrace it may require a way
> to enable and disable the ftrace file writes to avoid the overhead when
> tracing is not in use. This is what Android does with systrace/atrace, for
> example: it uses Binder to notify processes about trace sessions. Perfetto
> does that in a more portable way.
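The gating described here (instrumentation stays compiled in, but costs roughly one predicted branch until a session starts) can be sketched as follows, assuming a single process-wide atomic flag toggled by some session listener. The names are hypothetical, and this is not necessarily how Perfetto or percetto implement it internally.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Set by a (hypothetical) session listener, e.g. a background thread,
 * when a trace starts or stops. Zero-initialized, i.e. disabled. */
static atomic_bool trace_enabled;

/* Hypothetical slow path, only reached while a session is active.
 * The plain counter is not thread-safe; it is only for illustration. */
static unsigned long events_emitted;
static void trace_emit_slow(const char *name)
{
   (void)name; /* a real backend would serialize the event here */
   events_emitted++;
}

#if defined(__GNUC__)
#define TRACE_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define TRACE_UNLIKELY(x) (x)
#endif

/* Disabled cost: one relaxed atomic load plus one well-predicted branch. */
#define TRACE_EVENT(name)                                              \
   do {                                                                \
      if (TRACE_UNLIKELY(atomic_load_explicit(&trace_enabled,          \
                                              memory_order_relaxed)))  \
         trace_emit_slow(name);                                        \
   } while (0)

void set_tracing(bool on)
{
   atomic_store_explicit(&trace_enabled, on, memory_order_relaxed);
}
```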
> 
> 
> >
> > It is very easy to compile the GPUVis UI.  Userspace instrumentation
> > requires a single C/C++ header.  You don't have to access an external
> > web service to analyze trace data (a big no-no for devs working on
> > preproduction hardware).
> >
> > Is it possible to build and run the Perfetto UI locally?
> 
> 
> Yes, local UI builds are possible.
> Also confirmed with the perfetto team that
> trace data is not uploaded unless you use the 'share' feature.
> 
> 
> >   Can it display
> > arbitrary trace events that are written to
> > /sys/kernel/tracing/trace_marker ?
> 
> 
> Yes, I believe it does support that via the linux.ftrace data source.
> We use that, for example, to overlay CPU sched data to show what process
> is on each core throughout the timeline. There are many ftrace event
> types in the perfetto protos.
> 
> 
> > Can it be extended to show i915 and
> > i915-perf-recorder events?
> >
> 
> It can be extended to consume custom data sources. One way this is done is
> via a bridge daemon, such as traced_probes, which is responsible for
> capturing data from ftrace and /proc during a trace session and sending it
> to traced. traced is the main perfetto tracing daemon that notifies all
> trace data sources to start/stop tracing and communicates with user tracing
> requests via the 'perfetto' command.
> 
> 
> 
> >
> > John Bates  writes:
> >
> > > I recently opened issue 4262 to begin the
> > > discussion on integrating perfetto into mesa.
> > >
> > > *Background*
> > >
> > > System-wide tracing is an invaluable tool for developers to find and fix
> > > performance problems. The perfetto project enables a combined view of trace
> > > data from kernel ftrace, GPU driver and various manually-instrumented
> > > tracepoints throughout the application and system. This helps developers
> > > quickly answer questions like:
> > >
> > >- How long are frames taking?
> > >- What caused a particular frame drop?
> > >- Is it CPU bound or GPU bound?
> > >- Did a CPU core frequency drop cause something to go slower than usual?
> > >- Is something else running that is stealing CPU or GPU time? Could I
> > >fix that with better thread/context priorities?
> > >- Are all CPU cores being used effectively? Do I need sched_setaffinity
> > >to keep my thread on a big or little core?
> > >- What’s the latency between CPU frame submit and GPU start?
> > >
> > > *What Does Mesa + Perfetto Provide?*
> > >
> > > Mesa is in a unique position to produce GPU trace data for several GPU
> > > vendors without requiring the developer to build and install additional
> > > tools like gfx-pps.
> > >
> > > The key is making it easy for developers to use. Ideally, perfetto is
> > > eventually available by default in mesa so that if your system has perfetto
> > > traced running, you just need to run perfetto (perhaps along with setting
> > > an environment variable) with the mesa categories to see:
> > >
> > >- GPU processing timeline events.
> > >- GPU counters.
> > >- CPU events for potentially slow functions in mesa like shader compiles.
> > >
> > > 

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Mark Janes
Rob Clark  writes:

> On Fri, Feb 12, 2021 at 5:01 AM Tamminen, Eero T
>  wrote:
>>
>> Hi,
>>
>> On Thu, 2021-02-11 at 17:39 -0800, John Bates wrote:
>> > I recently opened issue 4262 to begin the
>> > discussion on integrating perfetto into mesa.
>> >
>> > *Background*
>> >
>> > System-wide tracing is an invaluable tool for developers to find and fix
>> > performance problems. The perfetto project enables a combined view of trace
>> > data from kernel ftrace, GPU driver and various manually-instrumented
>> > tracepoints throughout the application and system.
>>
>> Unlike some other Linux tracing solutions, Perfetto appears to be for
>> Android / Chrome(OS?), and is not available in common Linux distro
>> repos.
>
> I don't think there is anything about perfetto that would not be
> usable in a generic linux distro.. and mesa support for perfetto would
> perhaps be a compelling reason for distros to add support
>
>> So, why Perfetto instead of one of the other solutions, e.g. the ones
>> mentioned here:
>> https://tracingsummit.org/ts/2018/
>> ?
>>
>> And, if a tracing API is added to Mesa, shouldn't it also support
>> tracepoints for other tracing solutions?
>
> perfetto does have systrace collectors
>
> And a general comment on perfetto vs other things.. we end up needing
> to support perfetto regardless (for android and CrOS).. we don't
> *need* to enable it on generic linux, but I think we should (but maybe
> using the mode that does not require a system server.. at least
> initially.. that may limit its ability to collect systrace and traces
> from other parts of the system, but that wouldn't depend on distros
> enabling perfetto system server).

Perfetto seems like an awful lot of infrastructure to capture trace
events.  Why not follow the example of GPUVis, and write generic
trace_markers to ftrace?  It limits impact to Mesa, while allowing any
trace visualizer to use the trace points.
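For comparison, writing generic markers to ftrace from userspace needs very little code. A minimal sketch, assuming tracefs is accessible to the process (usually that requires root or relaxed permissions); the function names are illustrative, and the payload format is arbitrary since each visualizer may expect its own conventions.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* One fd per process, opened lazily (not thread-safe as written).
 * -2: not tried yet, -1: unavailable. */
static int marker_fd = -2;

static int marker_open(void)
{
   if (marker_fd == -2) {
      marker_fd = open("/sys/kernel/tracing/trace_marker",
                       O_WRONLY | O_CLOEXEC);
      if (marker_fd < 0) /* older setups: tracefs under debugfs */
         marker_fd = open("/sys/kernel/debug/tracing/trace_marker",
                          O_WRONLY | O_CLOEXEC);
   }
   return marker_fd;
}

/* Write one printf-style marker; each write() becomes one ftrace event.
 * Returns the number of bytes written, or -1 if tracefs is unavailable. */
int trace_marker_printf(const char *fmt, ...)
{
   char buf[256];
   int fd = marker_open();
   if (fd < 0)
      return -1;
   va_list ap;
   va_start(ap, fmt);
   int n = vsnprintf(buf, sizeof buf, fmt, ap);
   va_end(ap);
   if (n < 0)
      return -1;
   if ((size_t)n >= sizeof buf)
      n = sizeof buf - 1; /* message was truncated */
   return (int)write(fd, buf, (size_t)n);
}
```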

>> I mean, code added to drivers themselves preferably should not have
>> anything perfetto/percetto specific.  Tracing system specific code
>> should be only in one place (even if it's just macros in common header).
>>
>>
>> > This helps developers
>> > quickly answer questions like:
>> >
>> >- How long are frames taking?
>>
>> That doesn't require any changes to Mesa.  Just set a uprobe on a suitable
>> buffer swap function [1], and parse the kernel ftrace events.  This way,
>> starting tracing doesn't even require restarting the traced processes.
>>
>
> But this doesn't tell you how long the GPU is spending doing what.  My
> rough idea is to hook up an optional callback to u_tracepoint so we
> can generate perfetto traces on the GPU timeline (i.e. with
> timestamps captured from the GPU), fwiw
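The optional-callback idea could look roughly like this. It is a hypothetical shape, not the actual u_tracepoint API: the driver reports resolved GPU timestamps, and a backend (perfetto or anything else) registers a callback to turn them into timeline events.

```c
#include <stdint.h>

/* Hypothetical hook, invoked for each GPU event once the driver has read
 * its timestamps back. NOT the real u_tracepoint API. */
typedef void (*gpu_event_cb)(void *user, const char *name,
                             uint64_t start_ns, uint64_t end_ns);

struct gpu_trace_hooks {
   gpu_event_cb cb; /* NULL when no tracing backend is attached */
   void *user;
};

static struct gpu_trace_hooks hooks;

void gpu_trace_set_callback(gpu_event_cb cb, void *user)
{
   hooks.cb = cb;
   hooks.user = user;
}

/* Called by the driver after resolving timestamps for a finished batch. */
void gpu_trace_report(const char *name, uint64_t start_ns, uint64_t end_ns)
{
   if (hooks.cb)
      hooks.cb(hooks.user, name, start_ns, end_ns);
}

/* Example consumer: accumulates total GPU busy time into *(uint64_t *)user.
 * A perfetto backend would emit a slice on a GPU track instead. */
void sum_duration_cb(void *user, const char *name,
                     uint64_t start_ns, uint64_t end_ns)
{
   (void)name;
   *(uint64_t *)user += end_ns - start_ns;
}
```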

I implemented a feature called INTEL_MEASURE based off of a tool that
Ken wrote.  It captures render/batch/frame timestamps in a BO, providing
durations on the GPU timeline.  It works for Iris and Anv.

The approach provides accurate gpu timing, with minimal stalling.  This
data could be presented in Perfetto or GPUVis.
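Turning raw GPU timestamps captured in a BO into durations is mostly a unit conversion. A generic sketch, not INTEL_MEASURE's actual code; it assumes the counter width (given as a mask) and the timestamp frequency are known for the GPU, both of which vary by hardware.

```c
#include <stdint.h>

/* Convert a pair of raw GPU timestamps (e.g. written into a buffer object
 * at the start and end of a batch) into a duration in nanoseconds.
 * 'mask' handles counters narrower than 64 bits (e.g. a 36-bit counter
 * uses (1ull << 36) - 1); 'freq_hz' is the GPU timestamp frequency and
 * must be nonzero. */
uint64_t gpu_ticks_to_ns(uint64_t start, uint64_t end, uint64_t mask,
                         uint64_t freq_hz)
{
   uint64_t ticks = (end - start) & mask; /* correct across one wrap */
   /* Split the multiply so ticks * 1e9 cannot overflow 64 bits. */
   return (ticks / freq_hz) * 1000000000ull +
          (ticks % freq_hz) * 1000000000ull / freq_hz;
}
```

For example, 12000 ticks of a 12 MHz counter come out as 1000000 ns (1 ms).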

> BR,
> -R
>
>> [1] glXSwapBuffers, eglSwapBuffers, eglSwapBuffersWithDamageEXT,
>> anv_QueuePresentKHR[2], etc.
>>
>> [2] Many apps resolve the "vkQueuePresentKHR" Vulkan API loader wrapper
>> function and call the backend function like "anv_QueuePresentKHR"
>> directly, so it's better to track the latter instead.
>>
>>
>> >- What caused a particular frame drop?
>> >- Is it CPU bound or GPU bound?
>>
>> That doesn't require adding tracepoints to Mesa, just checking CPU & GPU
>> utilization (which is a lower-level thing).
>>
>>
>> >- Did a CPU core frequency drop cause something to go slower than usual?
>>
>> Note that nowadays actual CPU frequencies are often controlled by HW /
>> firmware, so you don't necessarily get any ftrace event for a frequency
>> change; you would need to poll MSR registers instead (which is a
>> privileged operation, and polling can easily miss changes).
>>
>>
>> >- Is something else running that is stealing CPU or GPU time? Could I
>> >fix that with better thread/context priorities?
>> >- Are all CPU cores being used effectively? Do I need sched_setaffinity
>> >to keep my thread on a big or little core?
>>
>> I don't think these require adding tracepoints to Mesa either...
>>
>>
>> >- What’s the latency between CPU frame submit and GPU start?
>>
>> I think this would require tracepoints in kernel GPU code more than in
>> Mesa?
>>
>>
>> - Eero
>>
>>
>> > *What Does Mesa + Perfetto Provide?*
>> >
>> > Mesa is in a unique position to produce GPU trace data for several GPU
>> > vendors without requiring the developer to build and install additional
>> > tools like gfx-pps.
>> >
>> > The key is making it easy for developers to use. Ideally, perfetto is
>> > eventually available by default in mesa so that if your 

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Rob Clark
On Thu, Feb 11, 2021 at 5:40 PM John Bates  wrote:
>



> Runtime Characteristics
>
> ~500KB additional binary size. Even with using only the basic features of 
> perfetto, it will increase the binary size of mesa by about 500KB.

IMHO, that size is negligible.. looking at freedreno, a mesa build
*only* enabling freedreno is already ~6MB.. distros typically use
"megadriver" (ie. all the drivers linked into a single .so with hard
links for the different ${driver}_dri.so), which on my fedora laptop
is ~21M.  Maybe if anything is relevant it is how much of that
actually gets paged into RAM from disk, but I think 500K isn't a thing
to worry about too much.

> Background thread. Perfetto uses a background thread for communication with 
> the system tracing daemon (traced) to advertise trace data and get 
> notification of trace start/stop.

Mesa already tends to have plenty of threads.. some of that depends on
the driver, I think currently radeonsi is the threading king, but
there are several other drivers working on threaded_context and async
compile thread pool.

It is worth mentioning that, AFAIU, perfetto can operate in
self-server mode, which seems like it would be useful for distros
which do not have the system daemon.  I'm not sure if we lose that
with percetto?

> Runtime overhead when disabled is designed to be optimal with one predicted 
> branch, typically a few CPU cycles per event. While enabled, the overhead can 
> be around 1 us per event.
>
> Integration Challenges
>
> The perfetto SDK is C++ and designed around macros, lambdas, inline 
> templates, etc. There are ongoing discussions on providing an official 
> perfetto C API, but it is not yet clear when this will land on the perfetto 
> roadmap.
> The perfetto SDK is an amalgamated .h and .cc that adds up to 100K lines of 
> code.
> Anything that includes perfetto.h takes a long time to compile.
> The current Perfetto SDK design is incompatible with being a shared library 
> behind a C API.

So, C++ on its own isn't a showstopper, mesa has plenty of C++ code.
But maybe we should verify that MSVC is happy with it, otherwise we
need to take a bit more care in some parts of the codebase.

As far as compile time, I wonder if we can regenerate the .cc/.h with
only the gpu trace parts?  But I wouldn't expect the .h to be
something widely included.  For example, for gpu timeline traces in
freedreno, I'm expecting it to look like a freedreno_perfetto.cc with
extern "C" {} around the callbacks that would hook into the
u_tracepoint tracepoints.  That one file would pull in the perfetto
.h, and we'd just not build that file if perfetto was disabled.
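The extern "C" arrangement described above could look like the following header. The names are hypothetical; the point is that only the C++ file includes perfetto.h, and when perfetto is disabled the header provides inline no-ops, so C callers need no #ifdefs.

```c
/* example_perfetto.h (hypothetical): C-callable entry points that would
 * be implemented in a C++ file such as a freedreno_perfetto.cc. */
#ifndef EXAMPLE_PERFETTO_H
#define EXAMPLE_PERFETTO_H

#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

#ifdef HAVE_PERFETTO
/* Implemented in the C++ file, the only one that includes perfetto.h;
 * that file is only built when perfetto is enabled. */
int example_perfetto_enabled(void);
void example_perfetto_init(void);
void example_perfetto_render_pass(uint64_t start_ns, uint64_t end_ns);
#else
/* Perfetto disabled: inline no-ops, so callers need no #ifdefs. */
static inline int example_perfetto_enabled(void) { return 0; }
static inline void example_perfetto_init(void) {}
static inline void example_perfetto_render_pass(uint64_t start_ns,
                                                uint64_t end_ns)
{
   (void)start_ns;
   (void)end_ns;
}
#endif

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_PERFETTO_H */
```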

Overall having to add our own extern C wrappers in some places doesn't
seem like the *end* of the world.. a bit annoying, but we might end up
doing that regardless if other folks want the ability to hook in
something other than perfetto?



> Mesa Integration Alternatives

I'm kind of leaning towards the "just slurp in the .cc/.h" approach..
that is mostly because I expect to initially just add some basic gpu
timeline tracepoints, but over time iterate on adding more.. it would
be nice to not have to depend on a newer version of an external
library at each step.  That is ofc only my $0.02..

BR,
-R


Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Chia-I Wu
For virgl, where the biggest perf gaps often come from unnecessary CPU
waits or high latencies of fence signaling, being able to insert
userspace driver trace events and combine them with kernel ftrace
events is a big plus.  Admittedly, there are no HW counters and my
needs are simpler (inserting function begin/end and wait begin/end and
combining them with virtio-gpu and dma-fence ftrace events).
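Wait begin/end events like these are easy to correlate with dma-fence ftrace events if both use the same clock. A sketch, assuming ftrace's trace_clock has been set to "mono" so its timestamps are CLOCK_MONOTONIC; the `wait_span` type and its functions are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* CLOCK_MONOTONIC in ns; comparable with ftrace timestamps when
 * /sys/kernel/tracing/trace_clock is set to "mono". */
static uint64_t now_ns(void)
{
   struct timespec ts;
   clock_gettime(CLOCK_MONOTONIC, &ts);
   return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical record of one CPU wait (e.g. on a fence). */
struct wait_span {
   const char *what;
   uint64_t begin_ns, end_ns;
};

struct wait_span wait_begin(const char *what)
{
   struct wait_span s = { what, now_ns(), 0 };
   return s;
}

/* Returns the wait duration in ns; a real implementation would also emit
 * the begin/end pair to the tracing backend here. */
uint64_t wait_end(struct wait_span *s)
{
   s->end_ns = now_ns();
   return s->end_ns - s->begin_ns;
}
```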

On Fri, Feb 12, 2021 at 2:13 PM Alyssa Rosenzweig
 wrote:
>
> My 2c for Mali/Panfrost --
>
> For us, capturing GPU perf counters is orthogonal to rendering. It's
> expected (e.g. with Arm's tools) to do this from a separate process.
> Neither Mesa nor the DDK should require custom instrumentation for the
> low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
> Perfetto as it is. So for us I don't see the value in modifying Mesa for
> tracing.
>
> On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:
> > (responding from correct address this time)
> >
> > On Fri, Feb 12, 2021 at 12:03 PM Mark Janes  wrote:
> >
> > > I've recently been using GPUVis to look at trace events.  On Intel
> > > platforms, GPUVis incorporates ftrace events from the i915 driver,
> > > performance metrics from igt-gpu-tools, and userspace ftrace markers
> > > that I locally hack up in Mesa.
> > >
> >
> > GPUVis is great. I would love to see that data combined with
> > userspace events without any need for local hacks. Perfetto provides
> > on-demand trace events with lower overhead compared to ftrace, so for
> > example it is acceptable to have production trace instrumentation that can
> > be captured without dev builds. To do that with ftrace it may require a way
> > to enable and disable the ftrace file writes to avoid the overhead when
> > tracing is not in use. This is what Android does with systrace/atrace, for
> > example: it uses Binder to notify processes about trace sessions. Perfetto
> > does that in a more portable way.
> >
> >
> > >
> > > It is very easy to compile the GPUVis UI.  Userspace instrumentation
> > > requires a single C/C++ header.  You don't have to access an external
> > > web service to analyze trace data (a big no-no for devs working on
> > > preproduction hardware).
> > >
> > > Is it possible to build and run the Perfetto UI locally?
> >
> >
> > Yes, local UI builds are possible.
> > Also confirmed with the perfetto team that
> > trace data is not uploaded unless you use the 'share' feature.
> >
> >
> > >   Can it display
> > > arbitrary trace events that are written to
> > > /sys/kernel/tracing/trace_marker ?
> >
> >
> > Yes, I believe it does support that via the linux.ftrace data source.
> > We use that, for example, to overlay CPU sched data to show what process
> > is on each core throughout the timeline. There are many ftrace event
> > types in the perfetto protos.
> >
> >
> > > Can it be extended to show i915 and
> > > i915-perf-recorder events?
> > >
> >
> > It can be extended to consume custom data sources. One way this is done is
> > via a bridge daemon, such as traced_probes, which is responsible for
> > capturing data from ftrace and /proc during a trace session and sending it
> > to traced. traced is the main perfetto tracing daemon that notifies all
> > trace data sources to start/stop tracing and communicates with user tracing
> > requests via the 'perfetto' command.
> >
> >
> >
> > >
> > > John Bates  writes:
> > >
> > > > I recently opened issue 4262
> > > >  to begin the
> > > > discussion on integrating perfetto into mesa.
> > > >
> > > > *Background*
> > > >
> > > > System-wide tracing is an invaluable tool for developers to find and fix
> > > > performance problems. The perfetto project enables a combined view of
> > > trace
> > > > data from kernel ftrace, GPU driver and various manually-instrumented
> > > > tracepoints throughout the application and system. This helps developers
> > > > quickly answer questions like:
> > > >
> > > >- How long are frames taking?
> > > >- What caused a particular frame drop?
> > > >- Is it CPU bound or GPU bound?
> > > >- Did a CPU core frequency drop cause something to go slower than
> > > usual?
> > > >- Is something else running that is stealing CPU or GPU time? Could I
> > > >fix that with better thread/context priorities?
> > > >- Are all CPU cores being used effectively? Do I need
> > > sched_setaffinity
> > > >to keep my thread on a big or little core?
> > > >- What’s the latency between CPU frame submit and GPU start?
> > > >
> > > > *What Does Mesa + Perfetto Provide?*
> > > >
> > > > Mesa is in a unique position to produce GPU trace data 

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Rob Clark
yes, but that is a limitation of Mali which does not apply to a lot of
other drivers ;-)

But AFAIU typically you'd use perfetto with a sort of system server
collecting trace data from various different processes, so the fact
that the Mali perf counters come from somewhere else doesn't
really matter

And this is about more than just perf cntrs; I plan to wire up the
u_tracepoint stuff to perfetto events (or rather provide a way to hook
up individual tracepoints) so that we can see on a timeline things
like how long the binning pass took, how long tile passes take (broken
down into restore/draw/resolve).  I think we most definitely want
perfetto support in mesa.  It can be optional, but I'm hoping Linux
distros start enabling perfetto when they have a compelling reason to
(ie. mesa gpu perf analysis)

BR,
-R

On Fri, Feb 12, 2021 at 2:13 PM Alyssa Rosenzweig
 wrote:
>
> My 2c for Mali/Panfrost --
>
> For us, capturing GPU perf counters is orthogonal to rendering. It's
> expected (e.g. with Arm's tools) to do this from a separate process.
> Neither Mesa nor the DDK should require custom instrumentation for the
> low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
> Perfetto as it is. So for us I don't see the value in modifying Mesa for
> tracing.
>

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread John Bates
(responding from correct address this time)

On Fri, Feb 12, 2021 at 12:03 PM Mark Janes  wrote:

> I've recently been using GPUVis to look at trace events.  On Intel
> platforms, GPUVis incorporates ftrace events from the i915 driver,
> performance metrics from igt-gpu-tools, and userspace ftrace markers
> that I locally hack up in Mesa.
>

GPUVis is great. I would love to see that data combined with
userspace events without any need for local hacks. Perfetto provides
on-demand trace events with lower overhead compared to ftrace, so, for
example, it is acceptable to have production trace instrumentation that can
be captured without dev builds. Doing that with ftrace may require a way
to enable and disable the ftrace file writes to avoid the overhead when
tracing is not in use. This is what Android does with systrace/atrace: it
uses Binder to notify processes about trace sessions. Perfetto does that
in a more portable way.
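
The ftrace-based alternative discussed here (userspace writes to trace_marker, gated by an on/off switch so that disabled tracing costs little more than one predictable branch) might look roughly like the C sketch below. Everything in it is an assumption for illustration: the trace_* names are hypothetical, a global atomic_bool stands in for whatever out-of-band mechanism would flip tracing on and off when a session starts or stops, and the "B|pid|name" / "E|pid" payloads follow the systrace/atrace marker convention. __builtin_expect is a GCC/Clang builtin.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical switch: some out-of-band mechanism (Binder on Android, a
 * tracing-daemon notification, ...) would set it when a trace session
 * starts and clear it when the session ends. */
static atomic_bool trace_enabled = false;
static int trace_marker_fd = -1;

/* systrace/atrace-style "begin slice" payload: "B|<pid>|<name>". */
static int trace_format_begin(char *buf, size_t len, int pid, const char *name)
{
   return snprintf(buf, len, "B|%d|%s", pid, name);
}

/* systrace/atrace-style "end slice" payload: "E|<pid>". */
static int trace_format_end(char *buf, size_t len, int pid)
{
   return snprintf(buf, len, "E|%d", pid);
}

/* Open the ftrace marker file once. Returns false if tracefs is not
 * mounted at this path or the process lacks permission. */
static bool trace_open(void)
{
   trace_marker_fd = open("/sys/kernel/tracing/trace_marker", O_WRONLY | O_CLOEXEC);
   return trace_marker_fd >= 0;
}

/* Write an already formatted marker; n is the snprintf() result. */
static void trace_write(const char *buf, int n, size_t cap)
{
   if (trace_marker_fd < 0 || n <= 0)
      return;
   if ((size_t)n >= cap)
      n = (int)cap - 1;        /* snprintf truncated the payload */
   ssize_t ret = write(trace_marker_fd, buf, (size_t)n);
   (void)ret;                  /* tracing is best effort */
}

/* With tracing off, each call is one relaxed load plus a branch that is
 * hinted as taken (early return). */
static inline void trace_begin(const char *name)
{
   if (__builtin_expect(!atomic_load_explicit(&trace_enabled, memory_order_relaxed), 1))
      return;
   char buf[256];
   int n = trace_format_begin(buf, sizeof(buf), (int)getpid(), name);
   trace_write(buf, n, sizeof(buf));
}

static inline void trace_end(void)
{
   if (__builtin_expect(!atomic_load_explicit(&trace_enabled, memory_order_relaxed), 1))
      return;
   char buf[32];
   int n = trace_format_end(buf, sizeof(buf), (int)getpid());
   trace_write(buf, n, sizeof(buf));
}
```

The formatting helpers are kept separate from the write so they can be checked without root or a mounted tracefs; trace_open() has to succeed once before anything actually reaches ftrace.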


>
> It is very easy to compile the GPUVis UI.  Userspace instrumentation
> requires a single C/C++ header.  You don't have to access an external
> web service to analyze trace data (a big no-no for devs working on
> preproduction hardware).
>
> Is it possible to build and run the Perfetto UI locally?


Yes, local UI builds are possible. I also confirmed with the perfetto
team that trace data is not uploaded unless you use the 'share' feature.


>   Can it display
> arbitrary trace events that are written to
> /sys/kernel/tracing/trace_marker ?


Yes, I believe it does support that via the linux.ftrace data source. We
use that, for example, to overlay CPU sched data to show which process is
on each core throughout the timeline. There are many ftrace event types in
the perfetto protos.


> Can it be extended to show i915 and
> i915-perf-recorder events?
>

It can be extended to consume custom data sources. One way this is done is
via a bridge daemon such as traced_probes, which is responsible for
capturing data from ftrace and /proc during a trace session and sending it
to traced. traced is the main perfetto tracing daemon: it notifies all
trace data sources to start/stop tracing and receives user tracing
requests via the 'perfetto' command.
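
As a rough mental model of that start/stop fan-out, here is a toy C sketch. It is explicitly not the perfetto API (the real SDK is C++ and talks to traced over IPC); struct data_source, struct trace_service and the callbacks are hypothetical names used only to show the control flow: sources register once, and the service notifies every registered source when a session starts or stops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT the perfetto API: each data source exposes start/stop
 * callbacks, and a central service (the role traced plays) fans session
 * start/stop out to every registered source. */
struct data_source {
   const char *name;
   void (*on_start)(struct data_source *ds);
   void (*on_stop)(struct data_source *ds);
   bool active;
};

#define MAX_SOURCES 8

struct trace_service {
   struct data_source *sources[MAX_SOURCES];
   size_t count;
};

/* Register a source once, e.g. at process start-up. */
static bool service_register(struct trace_service *svc, struct data_source *ds)
{
   if (svc->count == MAX_SOURCES)
      return false;
   svc->sources[svc->count++] = ds;
   return true;
}

/* A user tracing request starts a session: notify every source. */
static void service_start_session(struct trace_service *svc)
{
   for (size_t i = 0; i < svc->count; i++)
      svc->sources[i]->on_start(svc->sources[i]);
}

/* End of the session: tell every source to stop producing data. */
static void service_stop_session(struct trace_service *svc)
{
   for (size_t i = 0; i < svc->count; i++)
      svc->sources[i]->on_stop(svc->sources[i]);
}

/* Example callbacks; a bridge like traced_probes would enable ftrace
 * events or start reading /proc here instead of flipping a flag. */
static void example_start(struct data_source *ds) { ds->active = true; }
static void example_stop(struct data_source *ds) { ds->active = false; }
```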



>
> John Bates  writes:
>
> > I recently opened issue 4262 to begin the
> > discussion on integrating perfetto into mesa.
> >
> > *Background*
> >
> > System-wide tracing is an invaluable tool for developers to find and fix
> > performance problems. The perfetto project enables a combined view of
> > trace data from kernel ftrace, GPU driver and various manually-instrumented
> > tracepoints throughout the application and system. This helps developers
> > quickly answer questions like:
> >
> >- How long are frames taking?
> >- What caused a particular frame drop?
> >- Is it CPU bound or GPU bound?
> >- Did a CPU core frequency drop cause something to go slower than usual?
> >- Is something else running that is stealing CPU or GPU time? Could I
> >fix that with better thread/context priorities?
> >- Are all CPU cores being used effectively? Do I need sched_setaffinity
> >to keep my thread on a big or little core?
> >- What’s the latency between CPU frame submit and GPU start?
> >
> > *What Does Mesa + Perfetto Provide?*
> >
> > Mesa is in a unique position to produce GPU trace data for several GPU
> > vendors without requiring the developer to build and install additional
> > tools like gfx-pps.
> >
> > The key is making it easy for developers to use. Ideally, perfetto is
> > eventually available by default in mesa so that if your system has perfetto
> > traced running, you just need to run perfetto (perhaps along with setting
> > an environment variable) with the mesa categories to see:
> >
> >- GPU processing timeline events.
> >- GPU counters.
> >- CPU events for potentially slow functions in mesa like shader compiles.
> >
> > Example of what this data might look like (with fake GPU events):
> > [image: percetto-gpu-example.png]
> >
> > *Runtime Characteristics*
> >
> >- ~500KB additional binary size. Even with using only the basic features
> >of perfetto, it will increase the binary size of mesa by about 500KB.
> >- Background thread. Perfetto uses a background thread for communication
> >with the system tracing daemon (traced) to advertise trace data and get
> >notification of trace start/stop.
> >- Runtime overhead when disabled is designed to be optimal with one
> >predicted branch, typically a few CPU cycles
> >

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Lionel Landwerlin

We're kind of in the same boat for Intel.

Access to GPU perf counters is exclusive to a single process if you want
to build a timeline of the work (because of preemption, etc.).


The best information we could add from mesa would be a timestamp of when a
particular draw call started.

But that's pretty much what timestamp queries are.

Were you thinking of particular GPU generated data you don't get from 
gfx-pps?


Thanks,

-Lionel


On 13/02/2021 00:12, Alyssa Rosenzweig wrote:

> My 2c for Mali/Panfrost --
>
> For us, capturing GPU perf counters is orthogonal to rendering. It's
> expected (e.g. with Arm's tools) to do this from a separate process.
> Neither Mesa nor the DDK should require custom instrumentation for the
> low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
> Perfetto as it is. So for us I don't see the value in modifying Mesa for
> tracing.


Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Rob Clark
On Fri, Feb 12, 2021 at 4:51 PM Mark Janes  wrote:
>
> Rob Clark  writes:
>
> > On Fri, Feb 12, 2021 at 5:01 AM Tamminen, Eero T
> >  wrote:
> >>
> >> Hi,
> >>
> >> On Thu, 2021-02-11 at 17:39 -0800, John Bates wrote:
> >> > I recently opened issue 4262 to begin the
> >> > discussion on integrating perfetto into mesa.
> >> >
> >> > *Background*
> >> >
> >> > System-wide tracing is an invaluable tool for developers to find and fix
> >> > performance problems. The perfetto project enables a combined view of
> >> > trace data from kernel ftrace, GPU driver and various manually-instrumented
> >> > tracepoints throughout the application and system.
> >>
> >> Unlike some other Linux tracing solutions, Perfetto appears to be for
> >> Android / Chrome(OS?), and not available in common Linux distro
> >> repos.
> >
> > I don't think there is anything about perfetto that would not be
> > usable in a generic linux distro.. and mesa support for perfetto would
> > perhaps be a compelling reason for distros to add support
> >
> >> So, why Perfetto instead of one of the other solutions, e.g. from ones
> >> mentioned here:
> >> https://tracingsummit.org/ts/2018/
> >> ?
> >>
> >> And, if tracing API is added to Mesa, shouldn't it support also
> >> tracepoints for other tracing solutions?
> >
> > perfetto does have systrace collectors
> >
> > And a general comment on perfetto vs other things.. we end up needing
> > to support perfetto regardless (for android and CrOS).. we don't
> > *need* to enable it on generic linux, but I think we should (but maybe
> > using the mode that does not require a system server.. at least
> > initially.. that may limit its ability to collect systrace and traces
> > from other parts of the system, but that wouldn't depend on distros
> > enabling perfetto system server).
>
> Perfetto seems like an awful lot of infrastructure to capture trace
> events.  Why not follow the example of GPUVis, and write generic
> trace_markers to ftrace?  It limits impact to Mesa, while allowing any
> trace visualizer to use the trace points.

I'm not really seeing how that would cover anything more than CPU-based
events.. which is kind of the smallest part of what I'm
interested in..

> >> I mean, code added to drivers themselves preferably should not have
> >> anything perfetto/percetto specific.  Tracing system specific code
> >> should be only in one place (even if it's just macros in common header).
> >>
> >>
> >> > This helps developers
> >> > quickly answer questions like:
> >> >
> >> >- How long are frames taking?
> >>
> >> That doesn't require any changes to Mesa.  Just set a uprobe on a suitable
> >> buffer swap function [1], and parse kernel ftrace events.  This way
> >> starting tracing doesn't even require restarting the tracked processes.
> >>
> >
> > But this doesn't tell you how long the GPU is spending doing what.  My
> > rough idea is to hook up an optional callback to u_tracepoint so we
> > can generate perfetto traces on the GPU timeline (ie. with
> > timestamps captured from GPU), fwiw
>
> I implemented a feature called INTEL_MEASURE based off of a tool that
> Ken wrote.  It captures render/batch/frame timestamps in a BO, providing
> durations on the GPU timeline.  It works for Iris and Anv.
>
> The approach provides accurate gpu timing, with minimal stalling.  This
> data could be presented in Perfetto or GPUVis.

have a look at u_trace.. it is basically this but implemented in a way
that is (hopefully) useful to other drivers..

(there is some small gallium dependency currently, although at some
point when I'm spending more time on vk optimization I'll hoist it out
of gallium/aux/util, unless someone else gets there first)
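
The last step of the approach described for INTEL_MEASURE and u_trace (GPU timestamps written into a BO, then begin/end pairs turned into durations on the GPU timeline) can be sketched in C as follows. This assumes the raw counter values have already been read back from the BO and that the counter frequency is known; the struct and function names are hypothetical, and the 19.2 MHz figure used when checking it is only an example value, not a claim about any particular GPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One begin/end pair of raw GPU timestamp counter values. */
struct gpu_ts_pair {
   uint64_t begin_ticks;
   uint64_t end_ticks;
};

/* Convert ticks to nanoseconds without overflowing 64-bit math: scale the
 * whole-second part and the sub-second remainder separately. */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
   return (ticks / freq_hz) * 1000000000ull +
          (ticks % freq_hz) * 1000000000ull / freq_hz;
}

/* Fill out_ns[i] with the GPU-side duration of pairs[i]. Unsigned
 * subtraction copes with a single 64-bit wrap; hardware counters narrower
 * than 64 bits would need masking first, which is not done here. */
static void gpu_durations_ns(const struct gpu_ts_pair *pairs, size_t n,
                             uint64_t freq_hz, uint64_t *out_ns)
{
   for (size_t i = 0; i < n; i++) {
      uint64_t delta = pairs[i].end_ticks - pairs[i].begin_ticks;
      out_ns[i] = ticks_to_ns(delta, freq_hz);
   }
}
```

For example, at an assumed 19.2 MHz counter, a span of 19200 ticks comes out as 1000000 ns (1 ms).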

BR,
-R

> > BR,
> > -R
> >
> >> [1] glXSwapBuffers, eglSwapBuffers, eglSwapBuffersWithDamageEXT,
> >> anv_QueuePresentKHR[2]..
> >>
> >> [2] Many apps resolve "vkQueuePresentKHR" Vulkan API loader wrapper
> >> function and call the backend function like "anv_QueuePresentKHR"
> >> directly, so it's better to track the latter instead.
> >>
> >>
> >> >- What caused a particular frame drop?
> >> >- Is it CPU bound or GPU bound?
> >>
> >> That doesn't require adding tracepoints to Mesa, just checking CPU & GPU
> >> utilization (which is a lower-level thing).
> >>
> >>
> >> >- Did a CPU core frequency drop cause something to go slower than
> >> > usual?
> >>
> >> Note that nowadays actual CPU frequencies are often controlled by HW /
> >> firmware, so you don't necessarily get any ftrace event from a freq
> >> change; you would need to poll MSR registers instead (which is a
> >> privileged operation, and polling can easily miss changes).
> >>
> >>
> >> >- Is something else running that is stealing CPU or GPU time? Could I
> >> >fix that with better thread/context priorities?
> >> >- Are all CPU cores being used effectively? Do I need sched_setaffinity
> 

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Rob Clark
On Fri, Feb 12, 2021 at 5:01 AM Tamminen, Eero T
 wrote:
>
> Hi,
>
> On Thu, 2021-02-11 at 17:39 -0800, John Bates wrote:
> > I recently opened issue 4262 to begin the
> > discussion on integrating perfetto into mesa.
> >
> > *Background*
> >
> > System-wide tracing is an invaluable tool for developers to find and fix
> > performance problems. The perfetto project enables a combined view of
> > trace data from kernel ftrace, GPU driver and various manually-instrumented
> > tracepoints throughout the application and system.
>
> Unlike some other Linux tracing solutions, Perfetto appears to be for
> Android / Chrome(OS?), and not available in common Linux distro
> repos.

I don't think there is anything about perfetto that would not be
usable in a generic linux distro.. and mesa support for perfetto would
perhaps be a compelling reason for distros to add support

> So, why Perfetto instead of one of the other solutions, e.g. from ones
> mentioned here:
> https://tracingsummit.org/ts/2018/
> ?
>
> And, if tracing API is added to Mesa, shouldn't it support also
> tracepoints for other tracing solutions?

perfetto does have systrace collectors

And a general comment on perfetto vs other things.. we end up needing
to support perfetto regardless (for android and CrOS).. we don't
*need* to enable it on generic linux, but I think we should (but maybe
using the mode that does not require a system server.. at least
initially.. that may limit its ability to collect systrace and traces
from other parts of the system, but that wouldn't depend on distros
enabling perfetto system server).

> I mean, code added to drivers themselves preferably should not have
> anything perfetto/percetto specific.  Tracing system specific code
> should be only in one place (even if it's just macros in common header).
>
>
> > This helps developers
> > quickly answer questions like:
> >
> >- How long are frames taking?
>
> That doesn't require any changes to Mesa.  Just set a uprobe on a suitable
> buffer swap function [1], and parse kernel ftrace events.  This way
> starting tracing doesn't even require restarting the tracked processes.
>

But this doesn't tell you how long the GPU is spending doing what.  My
rough idea is to hook up an optional callback to u_tracepoint so we
can generate perfetto traces on the GPU timeline (ie. with
timestamps captured from GPU), fwiw
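
One way to read the "optional callback" idea, sketched in C under stated assumptions: a tracepoint site calls a backend only if one has registered, so driver code would stay free of perfetto-specific calls (which would also keep tracing-system specifics in one place, as Eero asks). The hook table and function names are hypothetical and do not describe u_trace's actual interface; the example backend just records the last event instead of emitting perfetto data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook table (not Mesa's real u_trace interface): a tracing
 * backend fills it in; drivers only ever call gpu_trace_emit(). */
struct gpu_trace_hooks {
   void *ctx;
   void (*event)(void *ctx, const char *name,
                 uint64_t gpu_begin_ns, uint64_t gpu_end_ns);
};

static const struct gpu_trace_hooks *active_hooks = NULL;

/* Install a backend, or pass NULL to remove it. */
static void gpu_trace_set_hooks(const struct gpu_trace_hooks *hooks)
{
   active_hooks = hooks;
}

/* Called once GPU timestamps for a traced span (e.g. a binning pass) are
 * available. Without a backend this is only a NULL check. */
static void gpu_trace_emit(const char *name, uint64_t begin_ns, uint64_t end_ns)
{
   if (active_hooks != NULL && active_hooks->event != NULL)
      active_hooks->event(active_hooks->ctx, name, begin_ns, end_ns);
}

/* Example backend that records the most recent event; a perfetto backend
 * would presumably emit a slice on a GPU track here instead. */
struct last_event {
   const char *name;
   uint64_t duration_ns;
   unsigned count;
};

static void record_event(void *ctx, const char *name,
                         uint64_t begin_ns, uint64_t end_ns)
{
   struct last_event *le = ctx;
   le->name = name;
   le->duration_ns = end_ns - begin_ns;
   le->count++;
}
```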

BR,
-R

> [1] glXSwapBuffers, eglSwapBuffers, eglSwapBuffersWithDamageEXT,
> anv_QueuePresentKHR[2]..
>
> [2] Many apps resolve "vkQueuePresentKHR" Vulkan API loader wrapper
> function and call the backend function like "anv_QueuePresentKHR"
> directly, so it's better to track the latter instead.
>
>
> >- What caused a particular frame drop?
> >- Is it CPU bound or GPU bound?
>
> That doesn't require adding tracepoints to Mesa, just checking CPU & GPU
> utilization (which is a lower-level thing).
>
>
> >- Did a CPU core frequency drop cause something to go slower than
> > usual?
>
> Note that nowadays actual CPU frequencies are often controlled by HW /
> firmware, so you don't necessarily get any ftrace event from a freq
> change; you would need to poll MSR registers instead (which is a
> privileged operation, and polling can easily miss changes).
>
>
> >- Is something else running that is stealing CPU or GPU time? Could I
> >fix that with better thread/context priorities?
> >- Are all CPU cores being used effectively? Do I need sched_setaffinity
> >to keep my thread on a big or little core?
>
> I don't think these require adding tracepoints to Mesa either...
>
>
> >- What’s the latency between CPU frame submit and GPU start?
>
> I think this would require tracepoints in kernel GPU code more than in
> Mesa?
>
>
> - Eero
>
>
> > *What Does Mesa + Perfetto Provide?*
> >
> > Mesa is in a unique position to produce GPU trace data for several GPU
> > vendors without requiring the developer to build and install additional
> > tools like gfx-pps.
> >
> > The key is making it easy for developers to use. Ideally, perfetto is
> > eventually available by default in mesa so that if your system has perfetto
> > traced running, you just need to run perfetto (perhaps along with setting
> > an environment variable) with the mesa categories to see:
> >
> >- GPU processing timeline events.
> >- GPU counters.
> >- CPU events for potentially slow functions in mesa like shader compiles.
> >
> > Example of what this data might look like (with fake GPU events):
> > [image: percetto-gpu-example.png]
> >
> > *Runtime Characteristics*
> >
> >- ~500KB additional binary size. Even with using only the basic features
> >of perfetto, it will increase the binary size of mesa by about 500KB.
> >- Background thread. Perfetto uses a background thread for communication
> >with 

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Alyssa Rosenzweig
Sure, I definitely see the use case for virgl :)

On Fri, Feb 12, 2021 at 02:43:25PM -0800, Chia-I Wu wrote:
> For virgl, where the biggest perf gaps often come from unnecessary CPU
> waits or high latencies of fence signaling, being able to insert
> userspace driver trace events and combine them with kernel ftrace
> events is a big plus.  Admittedly, there are no HW counters and my
> needs are simpler (inserting function begin/end and wait begin/end and
> combining them with virtio-gpu and dma-fence ftrace events).
> 
> On Fri, Feb 12, 2021 at 2:13 PM Alyssa Rosenzweig
>  wrote:
> >
> > My 2c for Mali/Panfrost --
> >
> > For us, capturing GPU perf counters is orthogonal to rendering. It's
> > expected (e.g. with Arm's tools) to do this from a separate process.
> > Neither Mesa nor the DDK should require custom instrumentation for the
> > low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
> > Perfetto as it is. So for us I don't see the value in modifying Mesa for
> > tracing.
> >

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Rob Clark
On Fri, Feb 12, 2021 at 5:40 PM John Bates  wrote:
>
>
>
> On Fri, Feb 12, 2021 at 4:34 PM Rob Clark  wrote:
>>
>> On Thu, Feb 11, 2021 at 5:40 PM John Bates  wrote:
>> >
>>
>> 
>>
>> > Runtime Characteristics
>> >
>> > ~500KB additional binary size. Even with using only the basic features of 
>> > perfetto, it will increase the binary size of mesa by about 500KB.
>>
>> IMHO, that size is negligible.. looking at freedreno, a mesa build
>> *only* enabling freedreno is already ~6MB.. distros typically use
>> "megadriver" (ie. all the drivers linked into a single .so with hard
>> links for the different ${driver}_dri.so), which on my fedora laptop
>> is ~21M.  Maybe if anything is relevant it is how much of that
>> actually gets paged into RAM from disk, but I think 500K isn't a thing
>> to worry about too much.
>>
>> > Background thread. Perfetto uses a background thread for communication 
>> > with the system tracing daemon (traced) to advertise trace data and get 
>> > notification of trace start/stop.
>>
>> Mesa already tends to have plenty of threads.. some of that depends on
>> the driver, I think currently radeonsi is the threading king, but
>> there are several other drivers working on threaded_context and async
>> compile thread pool.
>>
>> It is worth mentioning that, AFAIU, perfetto can operate in
>> self-server mode, which seems like it would be useful for distros
>> which do not have the system daemon.  I'm not sure if we lose that
>> with percetto?
>
>
> Easy to add, but want to avoid a runtime arg because it would add ~300KB to 
> binary size. Okay if we have an alternate init function though.

I think I could imagine wanting mesa build params to control whether
we want self-server or system-server mode.. ie. if some distros add
system-server support they wouldn't need self-server mode and vice
versa
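
A sketch of what build-time selection could look like on the C side, assuming hypothetical macros MESA_TRACE_WANT_IN_PROCESS and MESA_TRACE_WANT_SYSTEM set from a build option (neither is an existing Mesa option). A C++ shim would then translate the bitmask into the Perfetto SDK's TracingInitArgs::backends field (perfetto::kInProcessBackend for self-server mode, perfetto::kSystemBackend for talking to traced) before calling perfetto::Tracing::Initialize(). Whether an unselected backend actually drops out of the binary depends on how the SDK is built and linked, which is the ~300KB concern raised above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical build-time switches, e.g. generated from a build option:
 *   MESA_TRACE_WANT_IN_PROCESS -> "self-server" mode, no traced needed
 *   MESA_TRACE_WANT_SYSTEM     -> connect to the system traced daemon
 * Neither macro is a real Mesa option. */
enum mesa_trace_backend {
   MESA_TRACE_BACKEND_NONE       = 0,
   MESA_TRACE_BACKEND_IN_PROCESS = 1u << 0,
   MESA_TRACE_BACKEND_SYSTEM     = 1u << 1,
};

/* Bitmask of backends compiled in; a C++ shim would map it onto
 * perfetto::TracingInitArgs::backends. */
static unsigned mesa_trace_configured_backends(void)
{
   unsigned backends = MESA_TRACE_BACKEND_NONE;
#ifdef MESA_TRACE_WANT_IN_PROCESS
   backends |= MESA_TRACE_BACKEND_IN_PROCESS;
#endif
#ifdef MESA_TRACE_WANT_SYSTEM
   backends |= MESA_TRACE_BACKEND_SYSTEM;
#endif
   return backends;
}

/* True if any tracing backend was selected at build time. */
static bool mesa_trace_built_in(void)
{
   return mesa_trace_configured_backends() != MESA_TRACE_BACKEND_NONE;
}
```

A distro shipping traced would build with only the system backend; one without it would pick the in-process backend instead.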

>
>>
>>
>> > Runtime overhead when disabled is designed to be optimal with one 
>> > predicted branch, typically a few CPU cycles per event. While enabled, the 
>> > overhead can be around 1 us per event.
>> >
>> > Integration Challenges
>> >
>> > The perfetto SDK is C++ and designed around macros, lambdas, inline 
>> > templates, etc. There are ongoing discussions on providing an official 
>> > perfetto C API, but it is not yet clear when this will land on the 
>> > perfetto roadmap.
>> > The perfetto SDK is an amalgamated .h and .cc that adds up to 100K lines 
>> > of code.
>> > Anything that includes perfetto.h takes a long time to compile.
>> > The current Perfetto SDK design is incompatible with being a shared 
>> > library behind a C API.
>>
>> So, C++ on its own isn't a showstopper, mesa has plenty of C++ code.
>> But maybe we should verify that MSVC is happy with it, otherwise we
>> need to take a bit more care in some parts of the codebase.
>>
>> As far as compile time, I wonder if we can regenerate the .cc/.h with
>> only the gpu trace parts?  But I wouldn't expect the .h to be
>> something widely included.  For example, for gpu timeline traces in
>> freedreno, I'm expecting it to look like a freedreno_perfetto.cc with
>> extern "C" {} around the callbacks that would hook into the
>> u_tracepoint tracepoints.  That one file would pull in the perfetto
>> .h, and we'd just not build that file if perfetto was disabled.
>
>
> That works for GPU, but I'd like to see some slow CPU functions in traces as 
> well to help reason about performance problems. This ends up peppering the 
> trace header in lots of places.

My point was that we could strip out a whole lot of stuff that is
completely unrelated to mesa.. not sure if it is worth bothering with,
I doubt we'd #include perfetto.h very widely
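One way to keep perfetto.h confined to a single C++ file is a common header whose declarations become no-op stubs when tracing is compiled out. This is a minimal sketch under that assumption; `HAVE_PERFETTO`, the header name and the function names are hypothetical, not existing Mesa symbols. It also keeps tracing-system-specific code in one place, as requested earlier in the thread.

```c
/* mesa_trace.h (hypothetical): the only tracing header drivers include. */
#include <stdint.h>

#ifdef HAVE_PERFETTO
/* Defined in a C++ glue file (e.g. a freedreno_perfetto.cc) inside
 * extern "C" { ... }; only that file includes perfetto.h. */
void mesa_trace_stage_begin(uint32_t stage_id, uint64_t gpu_ts);
void mesa_trace_stage_end(uint32_t stage_id, uint64_t gpu_ts);
#define MESA_TRACE_BUILT_IN 1
#else
/* Tracing compiled out: the glue file is not built, and these empty
 * inline stubs let every call site compile away. */
static inline void
mesa_trace_stage_begin(uint32_t stage_id, uint64_t gpu_ts)
{
   (void)stage_id;
   (void)gpu_ts;
}

static inline void
mesa_trace_stage_end(uint32_t stage_id, uint64_t gpu_ts)
{
   (void)stage_id;
   (void)gpu_ts;
}
#define MESA_TRACE_BUILT_IN 0
#endif
```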

>> Overall having to add our own extern C wrappers in some places doesn't
>> seem like the *end* of the world.. a bit annoying, but we might end up
>> doing that regardless if other folks want the ability to hook in
>> something other than perfetto?
>
>
> It's more than extern C wrappers if we want to minimize overhead while
> tracing is enabled at compile time. Have a look at percetto.h/cc.

I'm not sure how many distros are not using LTO these days.. I assume
once you have LTO it doesn't really matter anymore?

>>
>>
>> 
>>
>> > Mesa Integration Alternatives
>>
>> I'm kind of leaning towards the "just slurp in the .cc/.h" approach..
>> that is mostly because I expect to initially just add some basic gpu
>> timeline tracepoints, but over time iterate on adding more.. it would
>> be nice to not have to depend on a newer version of an external
>> library at each step.  That is ofc only my $0.02..
>
>
> It's a small initial setup tax, true, but I still think it depends on what 
> perfetto features we plan to use -- for only a couple files doing GPU tracing 
> I agree percetto is unnecessary, but for CPU tracing it gets more complicated.

Definitely the first thing I plan to use is getting render stages onto
a timeline, so I can better see where the GPU time is going.. second
step is probably adding more 

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Dylan Baker
We're open source, tilting at windmills is what we do :D

On Fri, Feb 12, 2021, at 18:49, Rob Clark wrote:
> A lot of code, which like I said is mostly just generated ser/deser
> and not very interesting.. and 90% of it we won't use (unless mesa
> becomes a wifi driver, and more or less the rest of an OS).  And if
> there is a need to update it, we update it.. it's two files.  But *if*
> there are any API changes, we can deal with that in the same commit.
> I'm not saying it is great.. just that it is least bad.
> 
> I completely understand the argument against vendoring.. and I also
> completely understand the argument against tilting at windmills ;-)
> 
> BR,
> -R
> 
> On Fri, Feb 12, 2021 at 6:15 PM Dylan Baker  wrote:
> >
> > I can't speak for anyone else, but a giant pile of vendored code that 
> > you're expected to not update seems like a really bad idea to me.
> >
> > On Fri, Feb 12, 2021, at 18:09, Rob Clark wrote:
> > > I'm not really sure that is a fair statement.. the work scales
> > > according to the API change (which I'm not really sure if it changes
> > > much other than adding things).. if the API doesn't change, it is not
> > > really any effort to update two files in mesa git.
> > >
> > > As far as bug fixes.. it is a lot of code, but seems like the largest
> > > part of it is just generated protobuf serialization/deserialization
> > > code, rather than anything interesting.
> > >
> > > And again, I'm not a fan of their approach of "just vendor it".. but
> > > it is how perfetto is intended to be used, and in this case it seems
> > > like the best approach, since it is a situation where the protocol is
> > > the point of abi stability.
> > >
> > > BR,
> > > -R
> > >
> > > On Fri, Feb 12, 2021 at 5:51 PM Dylan Baker  wrote:
> > > >
> > > > So, we're vendoring something that we know getting bug fixes for will 
> > > > be an enormous pile of work? That sounds like a really bad idea.
> > > >
> > > > On Fri, Feb 12, 2021, at 17:51, Rob Clark wrote:
> > > > > On Fri, Feb 12, 2021 at 5:35 PM Dylan Baker  
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Feb 12, 2021, at 16:36, Rob Clark wrote:
> > > > > > > On Thu, Feb 11, 2021 at 5:40 PM John Bates  
> > > > > > > wrote:
> > > > > > > >
> > > > > > >
> > > > > > > 
> > > > > > >
> > > > > > > > Runtime Characteristics
> > > > > > > >
> > > > > > > > ~500KB additional binary size. Even with using only the basic 
> > > > > > > > features of perfetto, it will increase the binary size of mesa 
> > > > > > > > by about 500KB.
> > > > > > >
> > > > > > > IMHO, that size is negligible.. looking at freedreno, a mesa build
> > > > > > > *only* enabling freedreno is already ~6MB.. distros typically use
> > > > > > > "megadriver" (ie. all the drivers linked into a single .so with hard
> > > > > > > links for the different  ${driver}_dri.so), which on my fedora laptop
> > > > > > > is ~21M.  Maybe if anything is relevant it is how much of that
> > > > > > > actually gets paged into RAM from disk, but I think 500K isn't a thing
> > > > > > > to worry about too much.
> > > > > > >
> > > > > > > > Background thread. Perfetto uses a background thread for 
> > > > > > > > communication with the system tracing daemon (traced) to 
> > > > > > > > advertise trace data and get notification of trace start/stop.
> > > > > > >
> > > > > > > Mesa already tends to have plenty of threads.. some of that depends on
> > > > > > > the driver, I think currently radeonsi is the threading king, but
> > > > > > > there are several other drivers working on threaded_context and async
> > > > > > > compile thread pool.
> > > > > > >
> > > > > > > It is worth mentioning that, AFAIU, perfetto can operate in
> > > > > > > self-server mode, which seems like it would be useful for distros
> > > > > > > which do not have the system daemon.  I'm not sure if we lose that
> > > > > > > with percetto?
> > > > > > >
> > > > > > > > Runtime overhead when disabled is designed to be optimal with 
> > > > > > > > one predicted branch, typically a few CPU cycles per event. 
> > > > > > > > While enabled, the overhead can be around 1 us per event.
> > > > > > > >
> > > > > > > > Integration Challenges
> > > > > > > >
> > > > > > > > The perfetto SDK is C++ and designed around macros, lambdas, 
> > > > > > > > inline templates, etc. There are ongoing discussions on 
> > > > > > > > providing an official perfetto C API, but it is not yet clear 
> > > > > > > > when this will land on the perfetto roadmap.
> > > > > > > > The perfetto SDK is an amalgamated .h and .cc that adds up to 
> > > > > > > > 100K lines of code.
> > > > > > > > Anything that includes perfetto.h takes a long time to compile.
> > > > > > > > The current Perfetto SDK design is incompatible with being a 
> > > > > > > > shared library behind a C API.
> > > > > > >
> > > > > > > So, C++ on its own isn't a showstopper, mesa has plenty 

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread John Bates
On Fri, Feb 12, 2021 at 4:34 PM Rob Clark  wrote:

> On Thu, Feb 11, 2021 at 5:40 PM John Bates  wrote:
> >
>
> 
>
> > Runtime Characteristics
> >
> > ~500KB additional binary size. Even with using only the basic features
> of perfetto, it will increase the binary size of mesa by about 500KB.
>
> IMHO, that size is negligible.. looking at freedreno, a mesa build
> *only* enabling freedreno is already ~6MB.. distros typically use
> "megadriver" (ie. all the drivers linked into a single .so with hard
> links for the different  ${driver}_dri.so), which on my fedora laptop
> is ~21M.  Maybe if anything is relevant it is how much of that
> actually gets paged into RAM from disk, but I think 500K isn't a thing
> to worry about too much.
>
> > Background thread. Perfetto uses a background thread for communication
> with the system tracing daemon (traced) to advertise trace data and get
> notification of trace start/stop.
>
> Mesa already tends to have plenty of threads.. some of that depends on
> the driver, I think currently radeonsi is the threading king, but
> there are several other drivers working on threaded_context and async
> compile thread pool.
>
> It is worth mentioning that, AFAIU, perfetto can operate in
> self-server mode, which seems like it would be useful for distros
> which do not have the system daemon.  I'm not sure if we lose that
> with percetto?
>

Easy to add, but want to avoid a runtime arg because it would add ~300KB to
binary size. Okay if we have an alternate init function though.


>
> > Runtime overhead when disabled is designed to be optimal with one
> predicted branch, typically a few CPU cycles per event. While enabled, the
> overhead can be around 1 us per event.
> >
> > Integration Challenges
> >
> > The perfetto SDK is C++ and designed around macros, lambdas, inline
> templates, etc. There are ongoing discussions on providing an official
> perfetto C API, but it is not yet clear when this will land on the perfetto
> roadmap.
> > The perfetto SDK is an amalgamated .h and .cc that adds up to 100K lines
> of code.
> > Anything that includes perfetto.h takes a long time to compile.
> > The current Perfetto SDK design is incompatible with being a shared
> library behind a C API.
>
> So, C++ on its own isn't a showstopper, mesa has plenty of C++ code.
> But maybe we should verify that MSVC is happy with it, otherwise we
> need to take a bit more care in some parts of the codebase.
>
> As far as compile time, I wonder if we can regenerate the .cc/.h with
> only the gpu trace parts?  But I wouldn't expect the .h to be
> something widely included.  For example, for gpu timeline traces in
> freedreno, I'm expecting it to look like a freedreno_perfetto.cc with
> extern "C" {} around the callbacks that would hook into the
> u_tracepoint tracepoints.  That one file would pull in the perfetto
> .h, and we'd just not build that file if perfetto was disabled.
>

That works for GPU, but I'd like to see some slow CPU functions in traces
as well to help reason about performance problems. This ends up peppering
the trace header in lots of places.
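To keep that peppering down to one line per slow CPU function, a scoped macro can open a slice at the top of the function and close it on every return path. A minimal sketch, assuming GCC or Clang: it relies on `__attribute__((cleanup))`, which MSVC does not support. The hook and macro names are hypothetical, and the hooks here only track nesting depth instead of emitting events.

```c
/* Hypothetical backend hooks; a real build would forward these to perfetto
 * or another tracer. Here they only track nesting depth. */
static int mesa_trace_depth;
static int mesa_trace_max_depth;

static void
mesa_trace_slice_begin(const char *name)
{
   (void)name;
   if (++mesa_trace_depth > mesa_trace_max_depth)
      mesa_trace_max_depth = mesa_trace_depth;
}

/* Invoked automatically when the scope variable goes out of scope; the
 * cleanup attribute passes a pointer to that variable. */
static void
mesa_trace_slice_end(const char **name)
{
   (void)name;
   mesa_trace_depth--;
}

#define MESA_TRACE_CAT_(a, b) a##b
#define MESA_TRACE_CAT(a, b) MESA_TRACE_CAT_(a, b)

/* Opens a slice now and closes it when the enclosing block exits. */
#define MESA_TRACE_SCOPE(name)                                      \
   const char *MESA_TRACE_CAT(mesa_trace_scope_, __LINE__)          \
      __attribute__((cleanup(mesa_trace_slice_end), unused)) =      \
         (mesa_trace_slice_begin(name), (name))

/* Example instrumented function with a nested slice. */
static void
compile_shader_example(void)
{
   MESA_TRACE_SCOPE("compile_shader");
   {
      MESA_TRACE_SCOPE("optimize");
   }
}
```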

> Overall having to add our own extern C wrappers in some places doesn't
> seem like the *end* of the world.. a bit annoying, but we might end up
> doing that regardless if other folks want the ability to hook in
> something other than perfetto?
>

It's more than extern C wrappers if we want to minimize overhead while
tracing is enabled at compile time. Have a look at percetto.h/cc.
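The "one predicted branch" cost when tracing is compiled in but not active can be illustrated with a per-category flag checked inline, with everything else kept behind an out-of-line call. This is a minimal sketch of the general technique, not percetto's actual code; the flag, macro and function names are hypothetical, and `__builtin_expect` is a GCC/Clang builtin (other compilers fall back to a plain condition).

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__GNUC__) || defined(__clang__)
#define MESA_TRACE_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define MESA_TRACE_UNLIKELY(x) (x)
#endif

/* Hypothetical category flag; the backend's thread would flip it when a
 * tracing session starts or stops. Static storage starts out false. */
static atomic_bool mesa_trace_gpu_enabled;

/* Counts slow-path calls so the behaviour is observable in this sketch;
 * a real backend would serialize and emit the event here. */
static unsigned long mesa_trace_events_emitted;

static void
mesa_trace_emit_slow(const char *name)
{
   (void)name;
   mesa_trace_events_emitted++;
}

static void
mesa_trace_set_enabled(bool on)
{
   atomic_store_explicit(&mesa_trace_gpu_enabled, on, memory_order_relaxed);
}

/* Disabled cost: one relaxed load plus one branch predicted not-taken. */
#define MESA_TRACE_EVENT(name)                                     \
   do {                                                            \
      if (MESA_TRACE_UNLIKELY(atomic_load_explicit(                \
             &mesa_trace_gpu_enabled, memory_order_relaxed)))      \
         mesa_trace_emit_slow(name);                               \
   } while (0)
```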


>
> 
>
> > Mesa Integration Alternatives
>
> I'm kind of leaning towards the "just slurp in the .cc/.h" approach..
> that is mostly because I expect to initially just add some basic gpu
> timeline tracepoints, but over time iterate on adding more.. it would
> be nice to not have to depend on a newer version of an external
> library at each step.  That is ofc only my $0.02..
>

It's a small initial setup tax, true, but I still think it depends on what
perfetto features we plan to use -- for only a couple files doing GPU
tracing I agree percetto is unnecessary, but for CPU tracing it gets more
complicated.


>
> BR,
> -R
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Lionel Landwerlin

On 13/02/2021 03:38, Rob Clark wrote:
> On Fri, Feb 12, 2021 at 5:08 PM Lionel Landwerlin
>  wrote:
>> We're kind of in the same boat for Intel.
>>
>> Access to GPU perf counters is exclusive to a single process if you want
>> to build a timeline of the work (because preemption etc...).
> ugg, does that mean extensions like AMD_performance_monitor don't
> actually work on intel?

It works, but only a single app can use it at a time.

>> The best information we could add from mesa would be a timestamp of when a
>> particular drawcall started.
>> But that's pretty much what timestamp queries are.
>>
>> Were you thinking of particular GPU generated data you don't get from
>> gfx-pps?
> From the looks of it, currently I don't get *any* GPU generated data
> from gfx-pps ;-)

Maybe file a bug? :
https://gitlab.freedesktop.org/Fahien/gfx-pps/-/blob/master/src/gpu/intel/intel_driver.cc

> We can ofc sample counters from a separate process as well... I have a
> curses tool (fdperf) which does this.. but running outside of gpu
> cmdstream plus counters losing context across suspend/resume makes it
> less than perfect.

Our counters are global so to give per application values, we need to
post process a stream of HW counter snapshots.

> And something that works the same way as
> AMD_performance_monitor under the hood gives a more precise look at
> which shaders (for ex) are consuming the most cycles.

In our implementation that precision (in particular when a drawcall
ends) comes at a stalling cost unfortunately.

> For cases where
> we can profile a trace, frameretrace and related tools are pretty
> great.. but it would be nice to have similar visibility for actual
> games (which for me, mostly means android games, since so far no
> aarch64 steam store), but also give game developers good tools (or at
> least the same tools that they get with other closed src drivers on
> android).

Sure, but frame analysis is different than live monitoring of the system.

On Intel's HW you don't get the same level of details in both cases, and
apart from a few timestamps, I think gfx-pps is as good as you're going to get
for live stuff.

-Lionel




BR,
-R


Thanks,

-Lionel


On 13/02/2021 00:12, Alyssa Rosenzweig wrote:

My 2c for Mali/Panfrost --

For us, capturing GPU perf counters is orthogonal to rendering. It's
expected (e.g. with Arm's tools) to do this from a separate process.
Neither Mesa nor the DDK should require custom instrumentation for the
low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
Perfetto as it is. So for us I don't see the value in modifying Mesa for
tracing.

On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:

(responding from correct address this time)

On Fri, Feb 12, 2021 at 12:03 PM Mark Janes  wrote:


I've recently been using GPUVis to look at trace events.  On Intel
platforms, GPUVis incorporates ftrace events from the i915 driver,
performance metrics from igt-gpu-tools, and userspace ftrace markers
that I locally hack up in Mesa.


GPUVis is great. I would love to see that data combined with
userspace events without any need for local hacks. Perfetto provides
on-demand trace events with lower overhead compared to ftrace, so for
example it is acceptable to have production trace instrumentation that can
be captured without dev builds. To do that with ftrace it may require a way
to enable and disable the ftrace file writes to avoid the overhead when
tracing is not in use. This is what Android does with systrace/atrace, for
example, it uses Binder to notify processes about trace sessions. Perfetto
does that in a more portable way.



It is very easy to compile the GPUVis UI.  Userspace instrumentation
requires a single C/C++ header.  You don't have to access an external
web service to analyze trace data (a big no-no for devs working on
preproduction hardware).

Is it possible to build and run the Perfetto UI locally?

Yes, local UI builds are possible.
Also confirmed with the perfetto team that
trace data is not uploaded unless you use the 'share' feature.



Can it display
arbitrary trace events that are written to
/sys/kernel/tracing/trace_marker ?

Yes, I believe it does support that via linux.ftrace data source. We use that for
example to overlay CPU sched data to show what process is on each core
throughout the timeline. There are many ftrace event types in
the perfetto protos.
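For reference, the userspace side of such trace_marker events is just a write to the marker file, one write per event. A minimal sketch, assuming the caller has already opened `/sys/kernel/tracing/trace_marker` for writing (which needs tracefs access); the function name is hypothetical, and the `B|pid|name` text used with it is only the systrace/atrace-style convention chosen as example content.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Formats one marker and writes it with a single write() call to an fd
 * that would normally come from
 * open("/sys/kernel/tracing/trace_marker", O_WRONLY).
 * Returns the write() result, or -1 on a formatting error. */
static ssize_t
trace_marker_printf(int fd, const char *fmt, ...)
{
   char buf[256];
   va_list ap;

   va_start(ap, fmt);
   int n = vsnprintf(buf, sizeof(buf), fmt, ap);
   va_end(ap);

   if (n < 0)
      return -1;
   if ((size_t)n >= sizeof(buf))
      n = (int)sizeof(buf) - 1; /* output was truncated: send what fit */

   return write(fd, buf, (size_t)n);
}
```

For example, `trace_marker_printf(fd, "B|%d|%s", (int)getpid(), "draw")` emits one marker line.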



Can it be extended to show i915 and
i915-perf-recorder events?


It can be extended to consume custom data sources. One way this is done is
via a bridge daemon, such as traced_probes which is responsible for
capturing data from ftrace and /proc during a trace session and sending it
to traced. traced is the main perfetto tracing 

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Dylan Baker
I can't speak for anyone else, but a giant pile of vendored code that you're 
expected to not update seems like a really bad idea to me. 

On Fri, Feb 12, 2021, at 18:09, Rob Clark wrote:
> I'm not really sure that is a fair statement.. the work scales
> according to the API change (which I'm not really sure if it changes
> much other than adding things).. if the API doesn't change, it is not
> really any effort to update two files in mesa git.
> 
> As far as bug fixes.. it is a lot of code, but seems like the largest
> part of it is just generated protobuf serialization/deserialization
> code, rather than anything interesting.
> 
> And again, I'm not a fan of their approach of "just vendor it".. but
> it is how perfetto is intended to be used, and in this case it seems
> like the best approach, since it is a situation where the protocol is
> the point of abi stability.
> 
> BR,
> -R
> 
> On Fri, Feb 12, 2021 at 5:51 PM Dylan Baker  wrote:
> >
> > So, we're vendoring something that we know getting bug fixes for will be an 
> > enormous pile of work? That sounds like a really bad idea.
> >
> > On Fri, Feb 12, 2021, at 17:51, Rob Clark wrote:
> > > On Fri, Feb 12, 2021 at 5:35 PM Dylan Baker  wrote:
> > > >
> > > >
> > > >
> > > > On Fri, Feb 12, 2021, at 16:36, Rob Clark wrote:
> > > > > On Thu, Feb 11, 2021 at 5:40 PM John Bates  
> > > > > wrote:
> > > > > >
> > > > >
> > > > > 
> > > > >
> > > > > > Runtime Characteristics
> > > > > >
> > > > > > ~500KB additional binary size. Even with using only the basic 
> > > > > > features of perfetto, it will increase the binary size of mesa by 
> > > > > > about 500KB.
> > > > >
> > > > > IMHO, that size is negligible.. looking at freedreno, a mesa build
> > > > > *only* enabling freedreno is already ~6MB.. distros typically use
> > > > > "megadriver" (ie. all the drivers linked into a single .so with hard
> > > > > links for the different  ${driver}_dri.so), which on my fedora laptop
> > > > > is ~21M.  Maybe if anything is relevant it is how much of that
> > > > > actually gets paged into RAM from disk, but I think 500K isn't a thing
> > > > > to worry about too much.
> > > > >
> > > > > > Background thread. Perfetto uses a background thread for 
> > > > > > communication with the system tracing daemon (traced) to advertise 
> > > > > > trace data and get notification of trace start/stop.
> > > > >
> > > > > Mesa already tends to have plenty of threads.. some of that depends on
> > > > > the driver, I think currently radeonsi is the threading king, but
> > > > > there are several other drivers working on threaded_context and async
> > > > > compile thread pool.
> > > > >
> > > > > It is worth mentioning that, AFAIU, perfetto can operate in
> > > > > self-server mode, which seems like it would be useful for distros
> > > > > which do not have the system daemon.  I'm not sure if we lose that
> > > > > with percetto?
> > > > >
> > > > > > Runtime overhead when disabled is designed to be optimal with one 
> > > > > > predicted branch, typically a few CPU cycles per event. While 
> > > > > > enabled, the overhead can be around 1 us per event.
> > > > > >
> > > > > > Integration Challenges
> > > > > >
> > > > > > The perfetto SDK is C++ and designed around macros, lambdas, inline 
> > > > > > templates, etc. There are ongoing discussions on providing an 
> > > > > > official perfetto C API, but it is not yet clear when this will 
> > > > > > land on the perfetto roadmap.
> > > > > > The perfetto SDK is an amalgamated .h and .cc that adds up to 100K 
> > > > > > lines of code.
> > > > > > Anything that includes perfetto.h takes a long time to compile.
> > > > > > The current Perfetto SDK design is incompatible with being a shared 
> > > > > > library behind a C API.
> > > > >
> > > > > So, C++ on its own isn't a showstopper, mesa has plenty of C++ code.
> > > > > But maybe we should verify that MSVC is happy with it, otherwise we
> > > > > need to take a bit more care in some parts of the codebase.
> > > > >
> > > > > As far as compile time, I wonder if we can regenerate the .cc/.h with
> > > > > only the gpu trace parts?  But I wouldn't expect the .h to be
> > > > > something widely included.  For example, for gpu timeline traces in
> > > > > freedreno, I'm expecting it to look like a freedreno_perfetto.cc with
> > > > > extern "C" {} around the callbacks that would hook into the
> > > > > u_tracepoint tracepoints.  That one file would pull in the perfetto
> > > > > .h, and we'd just not build that file if perfetto was disabled.
> > > > >
> > > > > Overall having to add our own extern C wrappers in some places doesn't
> > > > > seem like the *end* of the world.. a bit annoying, but we might end up
> > > > > doing that regardless if other folks want the ability to hook in
> > > > > something other than perfetto?
> > > > >
> > > > > 
> > > > >
> > > > > > Mesa Integration Alternatives
> > > > >
> > > > > I'm kind of leaning towards the "just slurp in the .cc/.h" 

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Rob Clark
On Fri, Feb 12, 2021 at 5:56 PM Lionel Landwerlin
 wrote:
>
> On 13/02/2021 03:38, Rob Clark wrote:
> > On Fri, Feb 12, 2021 at 5:08 PM Lionel Landwerlin
> >  wrote:
> >> We're kind of in the same boat for Intel.
> >>
> >> Access to GPU perf counters is exclusive to a single process if you want
> >> to build a timeline of the work (because preemption etc...).
> > ugg, does that mean extensions like AMD_performance_monitor don't
> > actually work on intel?
>
>
> It works, but only a single app can use it at a time.
>

I see.. on the freedreno side we haven't really gone down the
preemption route yet, but we have a way to hook in some save/restore
cmdstream

>
> >
> >> The best information we could add from mesa would be a timestamp of when a
> >> particular drawcall started.
> >> But that's pretty much what timestamp queries are.
> >>
> >> Were you thinking of particular GPU generated data you don't get from
> >> gfx-pps?
> > From the looks of it, currently I don't get *any* GPU generated data
> > from gfx-pps ;-)
>
>
> Maybe file a bug? :
> https://gitlab.freedesktop.org/Fahien/gfx-pps/-/blob/master/src/gpu/intel/intel_driver.cc
>
>
> >
> > We can ofc sample counters from a separate process as well... I have a
> > curses tool (fdperf) which does this.. but running outside of gpu
> > cmdstream plus counters losing context across suspend/resume makes it
> > less than perfect.
>
>
> Our counters are global so to give per application values, we need to
> post process a stream of HW counter snapshots.
>
>
> > And something that works the same way as
> > AMD_performance_monitor under the hood gives a more precise look at
> > which shaders (for ex) are consuming the most cycles.
>
>
> In our implementation that precision (in particular when a drawcall
> ends) comes at a stalling cost unfortunately.

yeah, stalling on our end too for per-draw counter snapshots.. but if
you are looking for which shaders to optimize that doesn't matter
*that* much.. there'll be some overhead, but it's not really going to
change which draws/shaders are expensive.. it just means that you lose out
on pipelining of the state changes
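The per-draw approach described here boils down to differencing a counter snapshotted before and after each draw and summing per shader. A minimal sketch, assuming a hypothetical record layout (shader id plus begin/end cycle snapshots) and a small fixed shader table; it does not reflect any driver's real data structures. Because it ranks totals rather than reporting absolute times, a roughly uniform per-draw stall overhead should not change which shader comes out on top, which is the point being made above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-draw record: a cycle counter snapshotted (with a
 * stall) before and after the draw, tagged with the shader it used. */
struct draw_sample {
   uint32_t shader_id;
   uint64_t cycles_begin;
   uint64_t cycles_end;
};

#define MAX_SHADERS 64

/* Sums (end - begin) per shader into totals[] and returns the id of the
 * shader with the largest total. Ids >= MAX_SHADERS are ignored. */
static uint32_t
most_expensive_shader(const struct draw_sample *s, size_t n,
                      uint64_t totals[MAX_SHADERS])
{
   for (size_t i = 0; i < MAX_SHADERS; i++)
      totals[i] = 0;

   for (size_t i = 0; i < n; i++) {
      if (s[i].shader_id < MAX_SHADERS)
         totals[s[i].shader_id] += s[i].cycles_end - s[i].cycles_begin;
   }

   uint32_t best = 0;
   for (uint32_t id = 1; id < MAX_SHADERS; id++) {
      if (totals[id] > totals[best])
         best = id;
   }
   return best;
}
```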

BR,
-R

>
> > For cases where
> > we can profile a trace, frameretrace and related tools are pretty
> > great.. but it would be nice to have similar visibility for actual
> > games (which for me, mostly means android games, since so far no
> > aarch64 steam store), but also give game developers good tools (or at
> > least the same tools that they get with other closed src drivers on
> > android).
>
>
> Sure, but frame analysis is different than live monitoring of the system.
>
> On Intel's HW you don't get the same level of details in both cases, and
> apart from a few timestamps, I think gfx-pps is as good as you're going to get
> for live stuff.
>
>
> -Lionel
>
>
> >
> > BR,
> > -R
> >
> >> Thanks,
> >>
> >> -Lionel
> >>
> >>
> >> On 13/02/2021 00:12, Alyssa Rosenzweig wrote:
> >>> My 2c for Mali/Panfrost --
> >>>
> >>> For us, capturing GPU perf counters is orthogonal to rendering. It's
> >>> expected (e.g. with Arm's tools) to do this from a separate process.
> >>> Neither Mesa nor the DDK should require custom instrumentation for the
> >>> low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
> >>> Perfetto as it is. So for us I don't see the value in modifying Mesa for
> >>> tracing.
> >>>
> >>> On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:
>  (responding from correct address this time)
> 
>  On Fri, Feb 12, 2021 at 12:03 PM Mark Janes  
>  wrote:
> 
> > I've recently been using GPUVis to look at trace events.  On Intel
> > platforms, GPUVis incorporates ftrace events from the i915 driver,
> > performance metrics from igt-gpu-tools, and userspace ftrace markers
> > that I locally hack up in Mesa.
> >
>  GPUVis is great. I would love to see that data combined with
>  userspace events without any need for local hacks. Perfetto provides
>  on-demand trace events with lower overhead compared to ftrace, so for
>  example it is acceptable to have production trace instrumentation that can
>  be captured without dev builds. To do that with ftrace it may require a way
>  to enable and disable the ftrace file writes to avoid the overhead when
>  tracing is not in use. This is what Android does with systrace/atrace, for
>  example, it uses Binder to notify processes about trace sessions. Perfetto
>  does that in a more portable way.
> 
> 
> > It is very easy to compile the GPUVis UI.  Userspace instrumentation
> > requires a single C/C++ header.  You don't have to access an external
> > web service to analyze trace data (a big no-no for devs working on
> > preproduction hardware).
> >
> > Is it possible to build and run the Perfetto UI locally?
>  Yes, local UI builds are possible
>  

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Rob Clark
On Fri, Feb 12, 2021 at 5:08 PM Lionel Landwerlin
 wrote:
>
> We're kind of in the same boat for Intel.
>
> Access to GPU perf counters is exclusive to a single process if you want
> to build a timeline of the work (because preemption etc...).

ugg, does that mean extensions like AMD_performance_monitor don't
actually work on intel?

> The best information we could add from mesa would be a timestamp of when a
> particular drawcall started.
> But that's pretty much what timestamp queries are.
>
> Were you thinking of particular GPU generated data you don't get from
> gfx-pps?

From the looks of it, currently I don't get *any* GPU generated data
from gfx-pps ;-)
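Getting render stages onto a timeline from timestamps like the ones mentioned above usually means scaling raw GPU ticks into nanoseconds so they can sit next to CPU events. A minimal sketch, assuming timestamps arrive as raw counter ticks at a known frequency; the function names are hypothetical and no particular GPU's timestamp rate is assumed.

```c
#include <stdint.h>

/* Converts a GPU tick delta to nanoseconds. Splitting into whole seconds
 * and a remainder avoids overflowing ticks * 1e9 for long intervals;
 * assumes freq_hz is nonzero and below ~18 GHz so rem * 1e9 fits. */
static uint64_t
gpu_ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
   uint64_t secs = ticks / freq_hz;
   uint64_t rem = ticks % freq_hz;
   return secs * UINT64_C(1000000000) + rem * UINT64_C(1000000000) / freq_hz;
}

/* Duration of one render stage bracketed by two timestamp writes. */
static uint64_t
stage_duration_ns(uint64_t begin_ticks, uint64_t end_ticks, uint64_t freq_hz)
{
   return gpu_ticks_to_ns(end_ticks - begin_ticks, freq_hz);
}
```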

We can ofc sample counters from a separate process as well... I have a
curses tool (fdperf) which does this.. but running outside of gpu
cmdstream plus counters losing context across suspend/resume makes it
less than perfect.  And something that works the same way as
AMD_performance_monitor under the hood gives a more precise look at
which shaders (for ex) are consuming the most cycles.  For cases where
we can profile a trace, frameretrace and related tools are pretty
great.. but it would be nice to have similar visibility for actual
games (which for me, mostly means android games, since so far no
aarch64 steam store), but also give game developers good tools (or at
least the same tools that they get with other closed src drivers on
android).
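Sampling global counters from a separate process, whether with fdperf or by post-processing a stream of HW counter snapshots as Lionel describes for Intel, comes down to attributing the delta between consecutive snapshots to whichever context was active. A minimal sketch, assuming a hypothetical snapshot format that records the active context alongside the counter value; it is not how gfx-pps or i915-perf actually work. A drop in the counter value is treated as a reset, as might happen when counters lose state across suspend/resume.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot: a global counter value plus the context that was
 * active during the interval ending at this snapshot. */
struct counter_snapshot {
   uint32_t ctx_id;
   uint64_t value;
};

#define MAX_CTX 16

/* Attributes each interval between consecutive snapshots to the context
 * recorded in the later snapshot. A value lower than its predecessor is
 * treated as a counter reset, and that interval is skipped rather than
 * producing a huge bogus delta. Ids >= MAX_CTX are ignored. */
static void
attribute_deltas(const struct counter_snapshot *s, size_t n,
                 uint64_t per_ctx[MAX_CTX])
{
   for (size_t i = 0; i < MAX_CTX; i++)
      per_ctx[i] = 0;

   for (size_t i = 1; i < n; i++) {
      if (s[i].value < s[i - 1].value)
         continue; /* counter reset, e.g. lost across suspend/resume */
      if (s[i].ctx_id < MAX_CTX)
         per_ctx[s[i].ctx_id] += s[i].value - s[i - 1].value;
   }
}
```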

BR,
-R

> Thanks,
>
> -Lionel
>
>
> On 13/02/2021 00:12, Alyssa Rosenzweig wrote:
> > My 2c for Mali/Panfrost --
> >
> > For us, capturing GPU perf counters is orthogonal to rendering. It's
> > expected (e.g. with Arm's tools) to do this from a separate process.
> > Neither Mesa nor the DDK should require custom instrumentation for the
> > low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
> > Perfetto as it is. So for us I don't see the value in modifying Mesa for
> > tracing.
> >
> > On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:
> >> (responding from correct address this time)
> >>
> >> On Fri, Feb 12, 2021 at 12:03 PM Mark Janes  wrote:
> >>
> >>> I've recently been using GPUVis to look at trace events.  On Intel
> >>> platforms, GPUVis incorporates ftrace events from the i915 driver,
> >>> performance metrics from igt-gpu-tools, and userspace ftrace markers
> >>> that I locally hack up in Mesa.
> >>>
> >> GPUVis is great. I would love to see that data combined with
> >> userspace events without any need for local hacks. Perfetto provides
> >> on-demand trace events with lower overhead compared to ftrace, so for
> >> example it is acceptable to have production trace instrumentation that can
> >> be captured without dev builds. To do that with ftrace it may require a way
> >> to enable and disable the ftrace file writes to avoid the overhead when
> >> tracing is not in use. This is what Android does with systrace/atrace, for
> >> example, it uses Binder to notify processes about trace sessions. Perfetto
> >> does that in a more portable way.
> >>
> >>
> >>> It is very easy to compile the GPUVis UI.  Userspace instrumentation
> >>> requires a single C/C++ header.  You don't have to access an external
> >>> web service to analyze trace data (a big no-no for devs working on
> >>> preproduction hardware).
> >>>
> >>> Is it possible to build and run the Perfetto UI locally?
> >>
> >> Yes, local UI builds are possible.
> >> Also confirmed with the perfetto team that
> >> trace data is not uploaded unless you use the 'share' feature.
> >>
> >>
> >>> Can it display
> >>> arbitrary trace events that are written to
> >>> /sys/kernel/tracing/trace_marker ?
> >>
> >> Yes, I believe it does support that via linux.ftrace data source. We use that for
> >> example to overlay CPU sched data to show what process is on each core
> >> throughout the timeline. There are many ftrace event types in
> >> the perfetto protos.
> >>
> >>
> >>> Can it be extended to show i915 and
> >>> i915-perf-recorder events?
> >>>
> >> It can be extended to consume custom data sources. One way this is done is
> >> via a bridge daemon, such as traced_probes which is responsible for
> >> capturing data from ftrace and /proc during a trace session and sending it
> >> to traced. traced is the main perfetto tracing daemon that notifies all
> >> trace data sources to start/stop tracing and communicates with user tracing
> >> requests via the 'perfetto' command.
> >>
> >>
> >>
> >>> John Bates  writes:
> >>>
>  I recently opened issue 4262
>   to begin the
>  discussion on integrating perfetto into mesa.
> 
>  *Background*
> 
>  

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Dylan Baker
So, we're vendoring something that we know getting bug fixes for will be an 
enormous pile of work? That sounds like a really bad idea.

On Fri, Feb 12, 2021, at 17:51, Rob Clark wrote:
> On Fri, Feb 12, 2021 at 5:35 PM Dylan Baker  wrote:
> >
> >
> >
> > On Fri, Feb 12, 2021, at 16:36, Rob Clark wrote:
> > > On Thu, Feb 11, 2021 at 5:40 PM John Bates  wrote:
> > > >
> > >
> > > 
> > >
> > > > Runtime Characteristics
> > > >
> > > > ~500KB additional binary size. Even with using only the basic features 
> > > > of perfetto, it will increase the binary size of mesa by about 500KB.
> > >
> > > IMHO, that size is negligible.. looking at freedreno, a mesa build
> > > *only* enabling freedreno is already ~6MB.. distros typically use
> > > "megadriver" (ie. all the drivers linked into a single .so with hard
> > > links for the different  ${driver}_dri.so), which on my fedora laptop
> > > is ~21M.  Maybe if anything is relevant it is how much of that
> > > actually gets paged into RAM from disk, but I think 500K isn't a thing
> > > to worry about too much.
> > >
> > > > Background thread. Perfetto uses a background thread for communication 
> > > > with the system tracing daemon (traced) to advertise trace data and get 
> > > > notification of trace start/stop.
> > >
> > > Mesa already tends to have plenty of threads.. some of that depends on
> > > the driver, I think currently radeonsi is the threading king, but
> > > there are several other drivers working on threaded_context and async
> > > compile thread pool.
> > >
> > > It is worth mentioning that, AFAIU, perfetto can operate in
> > > self-server mode, which seems like it would be useful for distros
> > > which do not have the system daemon.  I'm not sure if we lose that
> > > with percetto?
> > >
> > > > Runtime overhead when disabled is designed to be optimal with one 
> > > > predicted branch, typically a few CPU cycles per event. While enabled, 
> > > > the overhead can be around 1 us per event.
> > > >
> > > > Integration Challenges
> > > >
> > > > The perfetto SDK is C++ and designed around macros, lambdas, inline 
> > > > templates, etc. There are ongoing discussions on providing an official 
> > > > perfetto C API, but it is not yet clear when this will land on the 
> > > > perfetto roadmap.
> > > > The perfetto SDK is an amalgamated .h and .cc that adds up to 100K 
> > > > lines of code.
> > > > Anything that includes perfetto.h takes a long time to compile.
> > > > The current Perfetto SDK design is incompatible with being a shared 
> > > > library behind a C API.
> > >
> > > So, C++ on its own isn't a showstopper, mesa has plenty of C++ code.
> > > But maybe we should verify that MSVC is happy with it, otherwise we
> > > need to take a bit more care in some parts of the codebase.
> > >
> > > As far as compile time, I wonder if we can regenerate the .cc/.h with
> > > only the gpu trace parts?  But I wouldn't expect the .h to be
> > > something widely included.  For example, for gpu timeline traces in
> > > freedreno, I'm expecting it to look like a freedreno_perfetto.cc with
> > > extern "C" {} around the callbacks that would hook into the
> > > u_tracepoint tracepoints.  That one file would pull in the perfetto
> > > .h, and we'd just not build that file if perfetto was disabled.
> > >
> > > Overall having to add our own extern C wrappers in some places doesn't
> > > seem like the *end* of the world.. a bit annoying, but we might end up
> > > doing that regardless if other folks want the ability to hook in
> > > something other than perfetto?
> > >
> > > 
> > >
> > > > Mesa Integration Alternatives
> > >
> > > I'm kind of leaning towards the "just slurp in the .cc/.h" approach..
> > > that is mostly because I expect to initially just add some basic gpu
> > > timeline tracepoints, but over time iterate on adding more.. it would
> > > be nice to not have to depend on a newer version of an external
> > > library at each step.  That is ofc only my $0.02..
> > >
> > > BR,
> > > -R
> > > ___
> > > mesa-dev mailing list
> > > mesa-dev@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > >
> >
> >
> > My experience is that vendoring just ends up being a huge pain for 
> > everyone, especially if the ui code stops working with our forked version, 
> > and we have to rebase all of our changes on upstream.
> >
> > Could we add meson build files and use a wrap for this if the distro 
> > doesn't ship the library? I'd be willing to do/help with an initial port if
> > that's what we wanted to do. But since this is really a dev dependency I don't
> > see why using a wrap would be a big deal.
> 
> I'm not a super huge fan of the perfetto approach of "just import the
> library", but at the end of the day the point of ABI compatibility is
> the protocol, not the API.. so it is actually safer to import the
> library into mesa's git tree.
> 
> BR,
> -R
>

-- 
  Dylan Baker
  

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Rob Clark
I'm not really sure that is a fair statement.. the work scales
according to the API change (which I'm not really sure if it changes
much other than adding things).. if the API doesn't change, it is not
really any effort to update two files in mesa git.

As far as bug fixes.. it is a lot of code, but seems like the largest
part of it is just generated protobuf serialization/deserialization
code, rather than anything interesting.

And again, I'm not a fan of their approach of "just vendor it".. but
it is how perfetto is intended to be used, and in this case it seems
like the best approach, since it is a situation where the protocol is
the point of ABI stability.

BR,
-R

On Fri, Feb 12, 2021 at 5:51 PM Dylan Baker  wrote:
>
> So, we're vendoring something that we know getting bug fixes for will be an 
> enormous pile of work? That sounds like a really bad idea.
>
> On Fri, Feb 12, 2021, at 17:51, Rob Clark wrote:
> > On Fri, Feb 12, 2021 at 5:35 PM Dylan Baker  wrote:
> > >
> > >
> > >
> > > On Fri, Feb 12, 2021, at 16:36, Rob Clark wrote:
> > > > On Thu, Feb 11, 2021 at 5:40 PM John Bates  wrote:
> > > > >
> > > >
> > > > 
> > > >
> > > > > Runtime Characteristics
> > > > >
> > > > > ~500KB additional binary size. Even with using only the basic 
> > > > > features of perfetto, it will increase the binary size of mesa by 
> > > > > about 500KB.
> > > >
> > > > IMHO, that size is negligible.. looking at freedreno, a mesa build
> > > > *only* enabling freedreno is already ~6MB.. distros typically use
> > > > "megadriver" (ie. all the drivers linked into a single .so with hard
> > > > links for the different  ${driver}_dri.so), which on my fedora laptop
> > > > is ~21M.  Maybe if anything is relevant it is how much of that
> > > > actually gets paged into RAM from disk, but I think 500K isn't a thing
> > > > to worry about too much.
> > > >
> > > > > Background thread. Perfetto uses a background thread for 
> > > > > communication with the system tracing daemon (traced) to advertise 
> > > > > trace data and get notification of trace start/stop.
> > > >
> > > > Mesa already tends to have plenty of threads.. some of that depends on
> > > > the driver, I think currently radeonsi is the threading king, but
> > > > there are several other drivers working on threaded_context and async
> > > > compile thread pool.
> > > >
> > > > It is worth mentioning that, AFAIU, perfetto can operate in
> > > > self-server mode, which seems like it would be useful for distros
> > > > which do not have the system daemon.  I'm not sure if we lose that
> > > > with percetto?
> > > >
> > > > > Runtime overhead when disabled is designed to be optimal with one 
> > > > > predicted branch, typically a few CPU cycles per event. While 
> > > > > enabled, the overhead can be around 1 us per event.
> > > > >
> > > > > Integration Challenges
> > > > >
> > > > > The perfetto SDK is C++ and designed around macros, lambdas, inline 
> > > > > templates, etc. There are ongoing discussions on providing an 
> > > > > official perfetto C API, but it is not yet clear when this will land 
> > > > > on the perfetto roadmap.
> > > > > The perfetto SDK is an amalgamated .h and .cc that adds up to 100K 
> > > > > lines of code.
> > > > > Anything that includes perfetto.h takes a long time to compile.
> > > > > The current Perfetto SDK design is incompatible with being a shared 
> > > > > library behind a C API.
> > > >
> > > > So, C++ on its own isn't a showstopper, mesa has plenty of C++ code.
> > > > But maybe we should verify that MSVC is happy with it, otherwise we
> > > > need to take a bit more care in some parts of the codebase.
> > > >
> > > > As far as compile time, I wonder if we can regenerate the .cc/.h with
> > > > only the gpu trace parts?  But I wouldn't expect the .h to be
> > > > something widely included.  For example, for gpu timeline traces in
> > > > freedreno, I'm expecting it to look like a freedreno_perfetto.cc with
> > > > extern "C" {} around the callbacks that would hook into the
> > > > u_tracepoint tracepoints.  That one file would pull in the perfetto
> > > > .h, and we'd just not build that file if perfetto was disabled.
> > > >
> > > > Overall having to add our own extern C wrappers in some places doesn't
> > > > seem like the *end* of the world.. a bit annoying, but we might end up
> > > > doing that regardless if other folks want the ability to hook in
> > > > something other than perfetto?
> > > >
> > > > 
> > > >
> > > > > Mesa Integration Alternatives
> > > >
> > > > I'm kind of leaning towards the "just slurp in the .cc/.h" approach..
> > > > that is mostly because I expect to initially just add some basic gpu
> > > > timeline tracepoints, but over time iterate on adding more.. it would
> > > > be nice to not have to depend on a newer version of an external
> > > > library at each step.  That is ofc only my $0.02..
> > > >
> > > > BR,
> > > > -R

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Rob Clark
A lot of code, which like I said is mostly just generated ser/deser
and not very interesting.. and 90% of it we won't use (unless mesa
becomes a wifi driver, and more or less the rest of an OS).  And if
there is a need to update it, we update it.. it's two files.  But *if*
there are any API changes, we can deal with that in the same commit.
I'm not saying it is great.. just that it is least bad.

I completely understand the argument against vendoring.. and I also
completely understand the argument against tilting at windmills ;-)

BR,
-R

On Fri, Feb 12, 2021 at 6:15 PM Dylan Baker  wrote:
>
> I can't speak for anyone else, but a giant pile of vendored code that you're 
> expected to not update seems like a really bad idea to me.
>
> On Fri, Feb 12, 2021, at 18:09, Rob Clark wrote:
> > I'm not really sure that is a fair statement.. the work scales
> > according to the API change (which I'm not really sure if it changes
> > much other than adding things).. if the API doesn't change, it is not
> > really any effort to update two files in mesa git.
> >
> > As far as bug fixes.. it is a lot of code, but seems like the largest
> > part of it is just generated protobuf serialization/deserialization
> > code, rather than anything interesting.
> >
> > And again, I'm not a fan of their approach of "just vendor it".. but
> > it is how perfetto is intended to be used, and in this case it seems
> > like the best approach, since it is a situation where the protocol is
> > the point of ABI stability.
> >
> > BR,
> > -R
> >
> > On Fri, Feb 12, 2021 at 5:51 PM Dylan Baker  wrote:
> > >
> > > So, we're vendoring something that we know getting bug fixes for will be 
> > > an enormous pile of work? That sounds like a really bad idea.
> > >
> > > On Fri, Feb 12, 2021, at 17:51, Rob Clark wrote:
> > > > On Fri, Feb 12, 2021 at 5:35 PM Dylan Baker  wrote:
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Feb 12, 2021, at 16:36, Rob Clark wrote:
> > > > > > On Thu, Feb 11, 2021 at 5:40 PM John Bates  wrote:
> > > > > > >
> > > > > >
> > > > > > 
> > > > > >
> > > > > > > Runtime Characteristics
> > > > > > >
> > > > > > > ~500KB additional binary size. Even with using only the basic 
> > > > > > > features of perfetto, it will increase the binary size of mesa by 
> > > > > > > about 500KB.
> > > > > >
> > > > > > IMHO, that size is negligible.. looking at freedreno, a mesa build
> > > > > > *only* enabling freedreno is already ~6MB.. distros typically use
> > > > > > "megadriver" (ie. all the drivers linked into a single .so with hard
> > > > > > links for the different  ${driver}_dri.so), which on my fedora laptop
> > > > > > is ~21M.  Maybe if anything is relevant it is how much of that
> > > > > > actually gets paged into RAM from disk, but I think 500K isn't a thing
> > > > > > to worry about too much.
> > > > > >
> > > > > > > Background thread. Perfetto uses a background thread for 
> > > > > > > communication with the system tracing daemon (traced) to 
> > > > > > > advertise trace data and get notification of trace start/stop.
> > > > > >
> > > > > > Mesa already tends to have plenty of threads.. some of that depends on
> > > > > > the driver, I think currently radeonsi is the threading king, but
> > > > > > there are several other drivers working on threaded_context and async
> > > > > > compile thread pool.
> > > > > >
> > > > > > It is worth mentioning that, AFAIU, perfetto can operate in
> > > > > > self-server mode, which seems like it would be useful for distros
> > > > > > which do not have the system daemon.  I'm not sure if we lose that
> > > > > > with percetto?
> > > > > >
> > > > > > > Runtime overhead when disabled is designed to be optimal with one 
> > > > > > > predicted branch, typically a few CPU cycles per event. While 
> > > > > > > enabled, the overhead can be around 1 us per event.
> > > > > > >
> > > > > > > Integration Challenges
> > > > > > >
> > > > > > > The perfetto SDK is C++ and designed around macros, lambdas, 
> > > > > > > inline templates, etc. There are ongoing discussions on providing 
> > > > > > > an official perfetto C API, but it is not yet clear when this 
> > > > > > > will land on the perfetto roadmap.
> > > > > > > The perfetto SDK is an amalgamated .h and .cc that adds up to 
> > > > > > > 100K lines of code.
> > > > > > > Anything that includes perfetto.h takes a long time to compile.
> > > > > > > The current Perfetto SDK design is incompatible with being a 
> > > > > > > shared library behind a C API.
> > > > > >
> > > > > > So, C++ on its own isn't a showstopper, mesa has plenty of C++ code.
> > > > > > But maybe we should verify that MSVC is happy with it, otherwise we
> > > > > > need to take a bit more care in some parts of the codebase.
> > > > > >
> > > > > > As far as compile time, I wonder if we can regenerate the .cc/.h with
> > > > > > only the gpu trace parts?  But I 

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Dylan Baker



On Fri, Feb 12, 2021, at 16:36, Rob Clark wrote:
> On Thu, Feb 11, 2021 at 5:40 PM John Bates  wrote:
> >
> 
> 
> 
> > Runtime Characteristics
> >
> > ~500KB additional binary size. Even with using only the basic features of 
> > perfetto, it will increase the binary size of mesa by about 500KB.
> 
> IMHO, that size is negligible.. looking at freedreno, a mesa build
> *only* enabling freedreno is already ~6MB.. distros typically use
> "megadriver" (ie. all the drivers linked into a single .so with hard
> links for the different  ${driver}_dri.so), which on my fedora laptop
> is ~21M.  Maybe if anything is relevant it is how much of that
> actually gets paged into RAM from disk, but I think 500K isn't a thing
> to worry about too much.
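For scale, using only the figures quoted in this thread (~500KB added, ~21M for the megadriver, ~6MB for a freedreno-only build) and glossing over KB vs. KiB, the increase works out to roughly:

```latex
\frac{0.5\ \text{MB}}{21\ \text{MB}} \approx 2.4\,\%, \qquad
\frac{0.5\ \text{MB}}{6\ \text{MB}} \approx 8.3\,\%
```

These ratios say nothing about how much of the added code is actually paged in at runtime, which is the point raised above.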
> 
> > Background thread. Perfetto uses a background thread for communication with 
> > the system tracing daemon (traced) to advertise trace data and get 
> > notification of trace start/stop.
> 
> Mesa already tends to have plenty of threads.. some of that depends on
> the driver, I think currently radeonsi is the threading king, but
> there are several other drivers working on threaded_context and async
> compile thread pool.
> 
> It is worth mentioning that, AFAIU, perfetto can operate in
> self-server mode, which seems like it would be useful for distros
> which do not have the system daemon.  I'm not sure if we lose that
> with percetto?
> 
> > Runtime overhead when disabled is designed to be optimal with one predicted 
> > branch, typically a few CPU cycles per event. While enabled, the overhead 
> > can be around 1 us per event.
> >
> > Integration Challenges
> >
> > The perfetto SDK is C++ and designed around macros, lambdas, inline 
> > templates, etc. There are ongoing discussions on providing an official 
> > perfetto C API, but it is not yet clear when this will land on the perfetto 
> > roadmap.
> > The perfetto SDK is an amalgamated .h and .cc that adds up to 100K lines of 
> > code.
> > Anything that includes perfetto.h takes a long time to compile.
> > The current Perfetto SDK design is incompatible with being a shared library 
> > behind a C API.
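The "one predicted branch" disabled cost quoted above can be pictured with a minimal sketch. This is not the Perfetto SDK's actual macro expansion: TRACE_EVENT_SKETCH, TRACE_UNLIKELY, g_tracing_enabled and trace_emit_slow are hypothetical names, and a relaxed atomic flag plus a branch hint is just one common way to keep the disabled path cheap. The MSVC fallback is included because compiler portability comes up in the reply below.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Branch hint: GCC/Clang builtin, with a plain fallback for compilers such
// as MSVC that do not provide __builtin_expect.
#if defined(__GNUC__) || defined(__clang__)
#define TRACE_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define TRACE_UNLIKELY(x) (x)
#endif

// Hypothetical global switch, flipped by whichever thread receives the
// trace start/stop notification (for Perfetto, its background thread).
static std::atomic<bool> g_tracing_enabled{false};

// Counts slow-path hits; stands in for actually serializing an event.
static std::atomic<std::uint64_t> g_events_emitted{0};

// Out-of-line slow path, only reached while a session is active.
static void trace_emit_slow(const char *name) {
    assert(name != nullptr);
    g_events_emitted.fetch_add(1, std::memory_order_relaxed);
}

// Disabled cost: one relaxed atomic load and one branch hinted as not taken.
#define TRACE_EVENT_SKETCH(name)                                        \
    do {                                                                \
        if (TRACE_UNLIKELY(                                             \
                g_tracing_enabled.load(std::memory_order_relaxed)))     \
            trace_emit_slow(name);                                      \
    } while (0)
```

Keeping the slow path out of line is what keeps each call site small; only the load and the branch are inlined into the driver code.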
> 
> So, C++ on its own isn't a showstopper, mesa has plenty of C++ code.
> But maybe we should verify that MSVC is happy with it, otherwise we
> need to take a bit more care in some parts of the codebase.
> 
> As far as compile time, I wonder if we can regenerate the .cc/.h with
> only the gpu trace parts?  But I wouldn't expect the .h to be
> something widely included.  For example, for gpu timeline traces in
> freedreno, I'm expecting it to look like a freedreno_perfetto.cc with
> extern "C" {} around the callbacks that would hook into the
> u_tracepoint tracepoints.  That one file would pull in the perfetto
> .h, and we'd just not build that file if perfetto was disabled.
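A rough sketch of the file layout described above. Every name is hypothetical (fd_trace_start_render_pass, fd_trace_end_render_pass and GpuSlice are not existing Mesa or Perfetto symbols), and a std::vector stands in for the Perfetto calls because the SDK is not pulled in here. The point is only the boundary: C callers see plain C-linkage functions, and everything C++/perfetto-specific stays behind them in one .cc file.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// --- what a C-visible header (hypothetical fd_trace.h) would declare ---
#ifdef __cplusplus
extern "C" {
#endif
void fd_trace_start_render_pass(uint32_t submit_id, uint64_t ts_ns);
void fd_trace_end_render_pass(uint32_t submit_id, uint64_t ts_ns);
#ifdef __cplusplus
}
#endif

// --- what the C++-only .cc file would contain ---
// Only this translation unit would include the perfetto header; here a
// plain vector records the slices instead of emitting trace events.
struct GpuSlice {
    uint32_t submit_id;
    uint64_t start_ns;
    uint64_t end_ns;
};

static std::vector<GpuSlice> g_slices;
// Tracks a single open render pass, for simplicity.
static uint32_t g_open_submit_id = 0;
static uint64_t g_open_start_ns = 0;

extern "C" void fd_trace_start_render_pass(uint32_t submit_id, uint64_t ts_ns) {
    g_open_submit_id = submit_id;
    g_open_start_ns = ts_ns;
}

extern "C" void fd_trace_end_render_pass(uint32_t submit_id, uint64_t ts_ns) {
    assert(submit_id == g_open_submit_id && ts_ns >= g_open_start_ns);
    g_slices.push_back({submit_id, g_open_start_ns, ts_ns});
}
```

If that .cc file is simply not built when perfetto is disabled, the C callers would need either #ifdef'd call sites or empty inline stubs, since the symbols would no longer be defined.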
> 
> Overall having to add our own extern C wrappers in some places doesn't
> seem like the *end* of the world.. a bit annoying, but we might end up
> doing that regardless if other folks want the ability to hook in
> something other than perfetto?
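One possible shape for hooking in something other than perfetto, sketched under the assumption that a small table of function pointers is acceptable: driver code only calls trace_begin()/trace_end(), and the tracing system is chosen in exactly one place. trace_backend, trace_set_backend and the log backend are hypothetical names for illustration, not an existing Mesa interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical backend table: the only tracing-system-specific piece.
struct trace_backend {
    void (*begin)(const char *name);
    void (*end)(void);
};

// Default backend: does nothing, so builds without any tracer still link.
static void noop_begin(const char *) {}
static void noop_end(void) {}
static const trace_backend noop_backend = {noop_begin, noop_end};

static const trace_backend *g_backend = &noop_backend;

// Installed once at startup, before other threads call trace_begin();
// nullptr restores the no-op backend.
void trace_set_backend(const trace_backend *b) {
    assert(b == nullptr || (b->begin != nullptr && b->end != nullptr));
    g_backend = b ? b : &noop_backend;
}

void trace_begin(const char *name) { g_backend->begin(name); }
void trace_end(void) { g_backend->end(); }

// A second, trivial backend standing in for "something other than perfetto":
// it just records events in memory.
static std::vector<std::string> g_log;
static void log_begin(const char *name) { g_log.push_back(std::string("B:") + name); }
static void log_end(void) { g_log.push_back("E"); }
static const trace_backend log_backend = {log_begin, log_end};
```

The struct-of-function-pointers shape carries over unchanged to C, which matters since most driver code calling it would be C; a perfetto shim would then be just another trace_backend defined in the one C++ file.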
> 
> 
> 
> > Mesa Integration Alternatives
> 
> I'm kind of leaning towards the "just slurp in the .cc/.h" approach..
> that is mostly because I expect to initially just add some basic gpu
> timeline tracepoints, but over time iterate on adding more.. it would
> be nice to not have to depend on a newer version of an external
> library at each step.  That is ofc only my $0.02..
> 
> BR,
> -R
>


My experience is that vendoring just ends up being a huge pain for everyone, 
especially if the ui code stops working with our forked version, and we have to 
rebase all of our changes on upstream.

Could we add meson build files and use a wrap for this if the distro doesn't 
ship the library? I'd be willing to do/help with an initial port if that's what
we wanted to do. But since this is really a dev dependency I don't see why using a
wrap would be a big deal.
-- 
  Dylan Baker
  dy...@pnwbakers.com


Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Rob Clark
On Fri, Feb 12, 2021 at 5:35 PM Dylan Baker  wrote:
>
>
>
> On Fri, Feb 12, 2021, at 16:36, Rob Clark wrote:
> > On Thu, Feb 11, 2021 at 5:40 PM John Bates  wrote:
> > >
> >
> > 
> >
> > > Runtime Characteristics
> > >
> > > ~500KB additional binary size. Even with using only the basic features of 
> > > perfetto, it will increase the binary size of mesa by about 500KB.
> >
> > IMHO, that size is negligible.. looking at freedreno, a mesa build
> > *only* enabling freedreno is already ~6MB.. distros typically use
> > "megadriver" (ie. all the drivers linked into a single .so with hard
> > links for the different  ${driver}_dri.so), which on my fedora laptop
> > is ~21M.  Maybe if anything is relevant it is how much of that
> > actually gets paged into RAM from disk, but I think 500K isn't a thing
> > to worry about too much.
> >
> > > Background thread. Perfetto uses a background thread for communication 
> > > with the system tracing daemon (traced) to advertise trace data and get 
> > > notification of trace start/stop.
> >
> > Mesa already tends to have plenty of threads.. some of that depends on
> > the driver, I think currently radeonsi is the threading king, but
> > there are several other drivers working on threaded_context and async
> > compile thread pool.
> >
> > It is worth mentioning that, AFAIU, perfetto can operate in
> > self-server mode, which seems like it would be useful for distros
> > which do not have the system daemon.  I'm not sure if we lose that
> > with percetto?
> >
> > > Runtime overhead when disabled is designed to be optimal with one 
> > > predicted branch, typically a few CPU cycles per event. While enabled, 
> > > the overhead can be around 1 us per event.
> > >
> > > Integration Challenges
> > >
> > > The perfetto SDK is C++ and designed around macros, lambdas, inline 
> > > templates, etc. There are ongoing discussions on providing an official 
> > > perfetto C API, but it is not yet clear when this will land on the 
> > > perfetto roadmap.
> > > The perfetto SDK is an amalgamated .h and .cc that adds up to 100K lines 
> > > of code.
> > > Anything that includes perfetto.h takes a long time to compile.
> > > The current Perfetto SDK design is incompatible with being a shared 
> > > library behind a C API.
> >
> > So, C++ on its own isn't a showstopper, mesa has plenty of C++ code.
> > But maybe we should verify that MSVC is happy with it, otherwise we
> > need to take a bit more care in some parts of the codebase.
> >
> > As far as compile time, I wonder if we can regenerate the .cc/.h with
> > only the gpu trace parts?  But I wouldn't expect the .h to be
> > something widely included.  For example, for gpu timeline traces in
> > freedreno, I'm expecting it to look like a freedreno_perfetto.cc with
> > extern "C" {} around the callbacks that would hook into the
> > u_tracepoint tracepoints.  That one file would pull in the perfetto
> > .h, and we'd just not build that file if perfetto was disabled.
> >
> > Overall having to add our own extern C wrappers in some places doesn't
> > seem like the *end* of the world.. a bit annoying, but we might end up
> > doing that regardless if other folks want the ability to hook in
> > something other than perfetto?
> >
> > 
> >
> > > Mesa Integration Alternatives
> >
> > I'm kind of leaning towards the "just slurp in the .cc/.h" approach..
> > that is mostly because I expect to initially just add some basic gpu
> > timeline tracepoints, but over time iterate on adding more.. it would
> > be nice to not have to depend on a newer version of an external
> > library at each step.  That is ofc only my $0.02..
> >
> > BR,
> > -R
> >
>
>
> My experience is that vendoring just ends up being a huge pain for everyone, 
> especially if the ui code stops working with our forked version, and we have 
> to rebase all of our changes on upstream.
>
> Could we add meson build files and use a wrap for this if the distro doesn't 
> ship the library? I'd be willing to do/help with an initial port if that's
> what we wanted to do. But since this is really a dev dependency I don't see why
> using a wrap would be a big deal.

I'm not a super huge fan of the perfetto approach of "just import the
library", but at the end of the day the point of ABI compatibility is
the protocol, not the API.. so it is actually safer to import the
library into mesa's git tree.

BR,
-R