Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-18 Thread Alyssa Rosenzweig
> But on many other embedded OSes  - at least Google ones like  CrOS and
> Android - the security model is way stricter.  We could argue that is
> bad / undesirable / too draconian but that is something that any of us
> has the power to change. At some point each platform decides where it
> wants to be in the spectrum of "easy to hack" and "secure for the
> user". CrOS model is: you can hack as much as you want, but you need
> first to re-flash it in dev-mode.

Off-topic, but speaking as someone who grew up in the libre software
"purist" circles, I'm a big fan of the CrOS model here.
Draconian is when the user _can't_ put it in dev mode. If you can,
there's nothing wrong with sane defaults.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-18 Thread Primiano Tucci



On 18/02/2021 20:26, Tamminen, Eero T wrote:

Hi,

(This is no longer that closely related to Mesa, but maybe it's still of
interest.)

On Thu, 2021-02-18 at 16:40 +0100, Primiano Tucci wrote:


On 18/02/2021 14:35, Tamminen, Eero T wrote:

[...]

It doesn't require executable code to be writable from user-space,
library code can remain read-only because kernel can toggle relevant
page writable for uprobe breakpoint setup and back.


The problem is not who rewrites the .text pages (although, yes, I agree
that the kernel doing this is better than userspace doing it). The
problem is:

1. Losing the ability to verify the integrity of system executables.
You can no longer tell whether some malware/rootkit altered them or
uprobes did. Effectively you lose the ability to verify the full chain
of bootloader -> system image -> file integrity.


Why would you lose it?

Integrity checks will succeed when there are no trace points enabled,
and trace points should be enabled only when you start tracing, so you
know what is causing integrity check failures (especially when they
start passing again once you disable tracepoints)...


If you do this (suppressing integrity checks while tracing), the message
out there becomes: "if you write malware, the very first thing you
should do is enable tracing, so any system integrity check will be
suppressed" :)


Things like uprobes (i.e. anything that can dynamically alter the
execution flow of system processes) are typically available only on
engineering setups, where you have control of the device / kernel /
security settings (Yama, selinux or any other security module), not on
production devices.
I understand that the situation for most (all?) Linux-based distros is 
different as you can just sudo. But on many other embedded OSes  - at 
least Google ones like  CrOS and Android - the security model is way 
stricter.
We could argue that is bad / undesirable / too draconian but that is 
something that any of us has the power to change. At some point each 
platform decides where it wants to be in the spectrum of "easy to hack" 
and "secure for the user". CrOS model is: you can hack as much as you 
want, but you need first to re-flash it in dev-mode.




2. In general, a mechanism that allows dynamic rewriting of code is a
wide attack surface, not welcome on production devices (for the same
very unlikely to fly for non-dev images IMHO. Many system processes
contain too sensitive information like cookie jar, oauth2 tokens etc.


Isn't there any kind of dev-mode which would be required to enable
things that are normally disallowed?


That requires following steps that are non-trivial for non-tech-savvy
users and, more importantly, wiping the device (CrOS calls this
"power-washing") [1].
We can't ask users to reflash their device just to give us a trace when
they are experiencing problems. Many of those problems can't be
reproduced by engineers because they depend on some peculiar state the
user is in. A recent example (not related to Mesa): some users were
experiencing an extremely unresponsive (Chrome) UI. After looking at
traces, engineers figured out that the root cause (and hence the repro)
was: "you need to have a (chrome) tab whose title is long enough to
cause ellipsis and that also has an emoji in the left-most visible part".
The emoji causes invalidation of the cached font measurement (this is
the bug), which causes every UI draw to be awfully slow.
For problems like this (which are very frequent) we really need to ask
users to give us traces. And that really needs to be a one-click thing
for them or they will not be able to help us.


[1] 
https://www.chromium.org/chromium-os/chromiumos-design-docs/developer-mode


(like kernel modifying RO mapped user-space process memory pages)





[...]

Yes, if you need more context, or handle really frequent events, static
breakpoints are a better choice.


In case of more frequent events, on Linux one might consider using some
BPF program to process dynamic tracepoint data so that much smaller
amount needs to be transferred to user-space.  But I'm not sure whether
support for attaching BPF to tracepoints is in upstream Linux kernel
yet.


eBPF, which you can use in recent kernels with tracepoints, solves a
different problem. It solves e.g., (1) dynamic filtering or (2)
computing aggregations from hi-freq events. It doesn't solve problems
like "I want to see all scheduling events and all frame-related
userspace instrumentation points. But given that sched events are so
hi-traffic I want to put them in a separate buffer, so they don't
clobber all the rest". Turning scheduling events into a histogram
(something you can do with eBPF+tracepoints) doesn't really solve cases
where you want to follow the full scheduling block/wake chain while some
userspace event is taking unexpectedly long.
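
The "separate buffer" point can be illustrated with a toy model. This is only a sketch and not how Perfetto implements its buffers: the buffer names, the sizes and the `record` helper are made up for the example. It routes events into fixed-size ring buffers and compares one shared ring against a dedicated ring for sched events.

```python
from collections import deque


def record(events, routes, capacities):
    """Append each (kind, payload) event to the ring buffer its kind is routed to.

    Each buffer is a deque with maxlen, so the oldest entry is dropped
    once the buffer is full (a simple ring buffer).
    """
    buffers = {name: deque(maxlen=size) for name, size in capacities.items()}
    for kind, payload in events:
        buffers[routes[kind]].append((kind, payload))
    return buffers


# One rare userspace event followed by a burst of high-frequency sched events.
events = [("frame", "begin draw")] + [("sched", i) for i in range(1000)]

# Hypothetical setup 1: everything shares one ring of 64 entries.
shared = record(events, {"frame": "main", "sched": "main"}, {"main": 64})

# Hypothetical setup 2: sched events get their own ring.
split = record(events, {"frame": "main", "sched": "sched"},
               {"main": 64, "sched": 64})

frame_kept_shared = any(kind == "frame" for kind, _ in shared["main"])
frame_kept_split = any(kind == "frame" for kind, _ in split["main"])
print(frame_kept_shared, frame_kept_split)
```

With the shared ring the burst of sched events evicts the single frame event; with the split rings the frame event is still there when the trace is read back, and only old sched events are lost.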


You could e.g. filter out all sched events except ones for the process
you're interested in.  That should already provide a huge reduction in
the amount of data, for use-cases where scheduling of the rest of the
processes is of less interest.

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-18 Thread Tamminen, Eero T
Hi,

(This is no longer that closely related to Mesa, but maybe it's still of
interest.)

On Thu, 2021-02-18 at 16:40 +0100, Primiano Tucci wrote:

> On 18/02/2021 14:35, Tamminen, Eero T wrote:
[...]
> > It doesn't require executable code to be writable from user-space,
> > library code can remain read-only because kernel can toggle relevant
> > page writable for uprobe breakpoint setup and back.
> 
> The problem is not who rewrites the .text pages (although, yes, I agree 
> that the kernel doing this is better than userspace doing it). The 
> problem is:
> 
> 1. Losing the ability to verify the integrity of system executables.
> You can no longer tell whether some malware/rootkit altered them or
> uprobes did. Effectively you lose the ability to verify the full chain
> of bootloader -> system image -> file integrity.

Why would you lose it?

Integrity checks will succeed when there are no trace points enabled,
and trace points should be enabled only when you start tracing, so you
know what is causing integrity check failures (especially when they
start passing again once you disable tracepoints)...


> 2. In general, a mechanism that allows dynamic rewriting of code is a 
> wide attack surface, not welcome on production devices (for the same 
> very unlikely to fly for non-dev images IMHO. Many system processes 
> contain too sensitive information like cookie jar, oauth2 tokens etc.

Isn't there any kind of dev-mode which would be required to enable
things that are normally disallowed?

(like kernel modifying RO mapped user-space process memory pages)


> 
[...]
> > Yes, if you need more context, or handle really frequent events,
> > static breakpoints are a better choice.
> >
> >
> > In case of more frequent events, on Linux one might consider using
> > some BPF program to process dynamic tracepoint data so that much
> > smaller amount needs to be transferred to user-space.  But I'm not
> > sure whether support for attaching BPF to tracepoints is in upstream
> > Linux kernel yet.
> 
> eBPF, which you can use in recent kernels with tracepoints, solves a
> different problem. It solves e.g., (1) dynamic filtering or (2)
> computing aggregations from hi-freq events. It doesn't solve problems
> like "I want to see all scheduling events and all frame-related
> userspace instrumentation points. But given that sched events are so
> hi-traffic I want to put them in a separate buffer, so they don't
> clobber all the rest". Turning scheduling events into a histogram
> (something you can do with eBPF+tracepoints) doesn't really solve cases
> where you want to follow the full scheduling block/wake chain while some
> userspace event is taking unexpectedly long.

You could e.g. filter out all sched events except ones for the process
you're interested in.  That should already provide a huge reduction in
the amount of data, for use-cases where scheduling of the rest of the
processes is of less interest.
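
As a rough sketch of that kind of per-process filtering, the snippet below uses ftrace's per-event filter file rather than a BPF program. That is a different mechanism from the one discussed here, picked only because it fits in a few lines. It assumes tracefs is mounted at /sys/kernel/tracing and that the caller is allowed to write there; the helper names are made up for illustration.

```python
from pathlib import Path

# Assumption: tracefs mount point; older setups may use /sys/kernel/debug/tracing.
TRACEFS = Path("/sys/kernel/tracing")


def sched_switch_filter(pid: int) -> str:
    """Build an ftrace filter expression matching switches to or from `pid`."""
    return f"prev_pid == {pid} || next_pid == {pid}"


def apply_sched_switch_filter(pid: int, tracefs: Path = TRACEFS) -> Path:
    """Write the filter into the sched_switch event's filter file (needs root)."""
    filter_file = tracefs / "events" / "sched" / "sched_switch" / "filter"
    filter_file.write_text(sched_switch_filter(pid) + "\n")
    return filter_file


print(sched_switch_filter(1234))
```

Only `sched_switch_filter` runs in the example; `apply_sched_switch_filter(1234)` would need to be called as root on a kernel with the sched events available.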

However, I think high frequency kernel tracing is a different use-case
from user-space tracing, which requires its own tooling [1] (and just a
few user-space trace points to provide context for traced kernel
activity).


- Eero

[1] In a corporate setting I would expect these kinds of latency
investigations to actually be HW assisted, otherwise tracing itself
disturbs the system too much.  Ultimately it could be using instruction
branch tracing to catch *everything*, as both ARM and x86 have HW
support for that.

(Instruction branch tracing doesn't include context, but that can be
injected separately into the data stream.  Because it catches everything,
one can infer some of the context from the trace itself too.  I don't
think there are any good Open Source post-processing / visualization
tools for such data though.)



Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-18 Thread Rob Clark
On Thu, Feb 18, 2021 at 8:00 AM Rob Clark  wrote:
>
> On Thu, Feb 18, 2021 at 5:35 AM Tamminen, Eero T
>  wrote:
> >
> > Hi,
> >
> > On Thu, 2021-02-18 at 12:17 +0100, Primiano Tucci wrote:
> > [discussion about Perfetto itself]
> > ...
> > > eero.t.tamminen@
> > > from in common Linux distro repos.
> > >
> > > That's right. I am aware of the problem. The plan is to address it
> > > with
> > > bit.ly/perfetto-debian as a starter.
> >
> > Glad to hear something is planned for making things easier for distros!
> >
> >
> > > > eero.t.tamminen@
> > > > Just set uprobe for suitable buffer swap function [1], and parse
> > > kernel ftrace events. (paraphrasing for context: "why do we need
> > > instrumentation points? can't we just use uprobes instead?")
> > >
> > > The problem with uprobes is:
> > > 1. It's linux specific. Perhaps not a big problem for Mesa here, but the
> > > reason why we didn't go there with Perfetto, at least until now, is that
> > > we need to support all major OSes (Linux, CrOS, Android, Windows,
> > > macOS).
> >
> > The main point here is that uprobe works already without any
> > modifications to any libraries (I have script that has been used for FPS
> > tracking of daily 3D stack builds for many years).
> >
> > And other OSes already offer similar functionality.  E.g. DTrace should
> > be available both for Mac & Windows.
> >
>
> So we are talking about a couple of different tracing use-cases which
> perfetto provides.. *Maybe* uprobe can work for the instrument-the-code
> use case that you are talking about, just assuming for the sake
> of argument that the security folks buy into it, etc.. I'm not sure if
> there isn't a race condition if the kernel has to temporarily remap
> text pages r/w or other potential issues I've not thought of?
>
> But that is ignoring important things like gpu traces and perf
> counters.  I kind of think it is informative to look at some of the
> related proto definitions, because they give a sense of what
> information the visualization UI tools can make use of, for example:
>
> https://cs.android.com/android/platform/superproject/+/master:external/perfetto/protos/perfetto/trace/gpu/gpu_render_stage_event.proto
>
> For that, we would need to, from a background thread in the driver
> (well aux/util/u_trace), collect up the logged gpu timestamps after the
> fact and fill in the relevant details for the trace event.  We are
> going to need the perfetto SDK anyway (in the short term, until we
> can switch to the C shared lib when it is avail) for that.
>

jfyi, I captured an example perfetto trace from an android phone,
since we don't have all this wired up in mesa.. but it should be
enough to give an idea what is possible with the gpu counters and
render stage traces

https://people.freedesktop.org/~robclark/example.perfetto

It looks like the GPU render stages don't show up in ui.perfetto.dev
(yet?), but you can also open it directly in AGI[1][2] which does also
show the render stages.  The GPU counters show up in both.

[1] https://gpuinspector.dev/
[2] https://github.com/google/agi

BR,
-R


Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-18 Thread Rob Clark
On Thu, Feb 18, 2021 at 7:40 AM Primiano Tucci  wrote:
>
>
>
> On 18/02/2021 14:35, Tamminen, Eero T wrote:
> > Hi,
> >
> > On Thu, 2021-02-18 at 12:17 +0100, Primiano Tucci wrote:
> > [discussion about Perfetto itself]
> > ...
> >> eero.t.tamminen@
> >> from in common Linux distro repos.
> >>



> >
> > At least both Ubuntu and Fedora default kernels have had uprobes built
> > in for *many* years.
> >
> > Up to date Fedora 33 kernel:
> > $ grep UPROBE /boot/config-5.10.15-200.fc33.x86_64
> > CONFIG_ARCH_SUPPORTS_UPROBES=y
> > CONFIG_UPROBES=y
> > CONFIG_UPROBE_EVENTS=y
> >
> > Same on up to date Ubuntu 20.04:
> > $ grep UPROBE /boot/config-5.4.0-65-generic
> > CONFIG_ARCH_SUPPORTS_UPROBES=y
> > CONFIG_UPROBES=y
> > CONFIG_UPROBE_EVENTS=y
> >
>
> Somebody more knowledgeable about CrOS should chime in, but from a
> codesearch, I don't think they are enabled on CrOS:
>
> https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/kernel/v5.4/arch/x86/configs/chromiumos-jail-vm-x86_64_defconfig;l=254?q=CONFIG_UPROBES%20-%22%23if%22%20-%22obj-%22&sq=&ss=chromiumos%2Fchromiumos%2Fcodesearch:src%2Fthird_party%2Fkernel%2Fv5.4%2F

These are *not* enabled in CrOS.. it is possible to build your own
kernel with them enabled (if your device is in dev mode, and rootfs
verification is disabled).. but that completely defeats the purpose of
having something where we can trace production builds and have tools
available for our users

BR,
-R


Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-18 Thread Rob Clark
On Thu, Feb 18, 2021 at 5:35 AM Tamminen, Eero T
 wrote:
>
> Hi,
>
> On Thu, 2021-02-18 at 12:17 +0100, Primiano Tucci wrote:
> [discussion about Perfetto itself]
> ...
> > eero.t.tamminen@
> > from in common Linux distro repos.
> >
> > That's right. I am aware of the problem. The plan is to address it
> > with
> > bit.ly/perfetto-debian as a starter.
>
> Glad to hear something is planned for making things easier for distros!
>
>
> > > eero.t.tamminen@
> > > Just set uprobe for suitable buffer swap function [1], and parse
> > kernel ftrace events. (paraphrasing for context: "why do we need
> > instrumentation points? can't we just use uprobes instead?")
> >
> > The problem with uprobes is:
> > 1. It's linux specific. Perhaps not a big problem for Mesa here, but the
> > reason why we didn't go there with Perfetto, at least until now, is that
> > we need to support all major OSes (Linux, CrOS, Android, Windows,
> > macOS).
>
> The main point here is that uprobe works already without any
> modifications to any libraries (I have script that has been used for FPS
> tracking of daily 3D stack builds for many years).
>
> And other OSes already offer similar functionality.  E.g. DTrace should
> be available both for Mac & Windows.
>

So we are talking about a couple of different tracing use-cases which
perfetto provides.. *Maybe* uprobe can work for the instrument-the-code
use case that you are talking about, just assuming for the sake
of argument that the security folks buy into it, etc.. I'm not sure if
there isn't a race condition if the kernel has to temporarily remap
text pages r/w or other potential issues I've not thought of?

But that is ignoring important things like gpu traces and perf
counters.  I kind of think it is informative to look at some of the
related proto definitions, because they give a sense of what
information the visualization UI tools can make use of, for example:

https://cs.android.com/android/platform/superproject/+/master:external/perfetto/protos/perfetto/trace/gpu/gpu_render_stage_event.proto

For that, we would need to, from a background thread in the driver
(well aux/util/u_trace), collect up the logged gpu timestamps after the
fact and fill in the relevant details for the trace event.  We are
going to need the perfetto SDK anyway (in the short term, until we
can switch to the C shared lib when it is avail) for that.
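
A minimal sketch of that deferred collection step, under loudly stated assumptions: this is neither the u_trace nor the Perfetto SDK API. `PendingStage`, `collect_render_stages`, the stage names and the timestamp list are all hypothetical, and a plain Python list stands in for the buffer the GPU writes timestamps into.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PendingStage:
    """A render stage whose GPU timestamps were requested at submit time."""
    name: str
    begin_slot: int  # index into the timestamp buffer
    end_slot: int


def collect_render_stages(
    pending: List[PendingStage],
    timestamps: List[Optional[int]],
) -> Tuple[List[dict], List[PendingStage]]:
    """Turn finished stages into trace events; keep unfinished ones for later.

    timestamps[i] is None until the GPU has written slot i.
    """
    events, still_pending = [], []
    for stage in pending:
        begin = timestamps[stage.begin_slot]
        end = timestamps[stage.end_slot]
        if begin is None or end is None:
            still_pending.append(stage)  # GPU has not finished this stage yet
            continue
        events.append({"name": stage.name, "start_ns": begin,
                       "duration_ns": end - begin})
    return events, still_pending


# Example: the first stage has finished, the second is still running.
pending = [PendingStage("binning", 0, 1), PendingStage("render", 2, 3)]
timestamps = [1000, 1400, 1500, None]
events, pending = collect_render_stages(pending, timestamps)
print(events)
print([stage.name for stage in pending])
```

In a real driver the background thread would call something like this periodically and hand the finished events to whatever tracing backend is in use.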

BR,
-R


Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-18 Thread Primiano Tucci



On 18/02/2021 14:35, Tamminen, Eero T wrote:

Hi,

On Thu, 2021-02-18 at 12:17 +0100, Primiano Tucci wrote:
[discussion about Perfetto itself]
...

eero.t.tamminen@
from in common Linux distro repos.

That's right. I am aware of the problem. The plan is to address it
with
bit.ly/perfetto-debian as a starter.


Glad to hear something is planned for making things easier for distros!



eero.t.tamminen@
Just set uprobe for suitable buffer swap function [1], and parse

kernel ftrace events. (paraphrasing for context: "why do we need
instrumentation points? can't we just use uprobes instead?")

The problem with uprobes is:
1. It's linux specific. Perhaps not a big problem for Mesa here, but the
reason why we didn't go there with Perfetto, at least until now, is that
we need to support all major OSes (Linux, CrOS, Android, Windows,
macOS).


The main point here is that uprobe works already without any
modifications to any libraries (I have a script that has been used for
FPS tracking of daily 3D stack builds for many years).



Marking begin/end of function with timestamp is the easy part and you
can do it with an arbitrarily large set of tracing/debugging tools. In my
experience things get more interesting when you want to start dumping
the state of entire subsystems in the trace, so you can reason about
them after the fact when looking at the trace timeline.


E.g. instrumentation points like this one, which go beyond the notion of
dynamic breakpoints:


https://chromium.googlesource.com/chromium/src/+/4f331b42066d4e729c28facaa9bd9d4c33c6bbfd/components/viz/common/quads/compositor_render_pass.cc#116

which eventually make it possible to get tracing features like:
https://www.chromium.org/developers/how-tos/trace-event-profiling-tool/using-frameviewer 





And other OSes already offer similar functionality.  E.g. DTrace should
be available both for Mac & Windows.


Btw. If really needed, you could even implement similar functionality in
user space on other OSes. We did something similar at Nokia using
ptrace before uprobes was a thing:
https://github.com/maemo-tools-old/functracer

(While possible, because of the need for some tricky architecture
specific assembly, and certain instructions needing some extra assembly
fixups when replaced by a breakpoint, it's unlikely to be feasible for
general tracing though.)



2. Even on Linux-based systems, it's really hard to have uprobes enabled
in production (I am not sure what is the situation for CrOS).

  In Google,
we care a lot about being able to trace from production devices
without
reflashing them with dev images, because then we can just tell people
that are experiencing problems "can you just open chrome://tracing,
orders of magnitude the actionable feedback we'd be able to get from
users / testers.


I would think having tracepoint code in the libraries themselves and
generally enabled for some specific tracing solution like Perfetto, is
*much* more unlikely.


IMO this is one of the key points that you (Mesa) folks need to discuss
here: whether (i) the trace points are directly tied to Perfetto (or any
other tracing system) API (I buy the skepticism given the current state
of things) or (ii) you have some mesa-specific tracing abstraction layer
and you wire up Perfetto (or whatever else) in some "Mesa tracing
backend impl", so the dependency surface is minimized.
In my experience, (ii) tends to be a bit more appealing but its
feasibility depends on "what" you want to trace, i.e. how much your
instrumentation points look like begin/end markers and counters (easy)
or full object state like the link above, in which case the risk is that
you'll end up writing a lot of boilerplate code and double-copies for
state objects to avoid the direct deps.


Perhaps the best way is to have snippets of code to see what that would
look like.
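
To give a rough feel for option (ii) in its easy form (begin/end markers and counters), here is a language-neutral sketch written in Python for brevity; Mesa itself would do this in C. Every name in it (`NullBackend`, `ListBackend`, `set_backend`, `trace_scope`, `trace_counter`) is hypothetical and none of it is an existing Mesa or Perfetto API.

```python
from contextlib import contextmanager


class NullBackend:
    """Default backend: instrumentation points are present but record nothing."""

    def begin(self, name):
        pass

    def end(self, name):
        pass

    def counter(self, name, value):
        pass


class ListBackend:
    """Toy backend that records events in memory.

    A Perfetto (or any other) backend would forward the same three calls
    to its own tracing library instead.
    """

    def __init__(self):
        self.events = []

    def begin(self, name):
        self.events.append(("B", name))

    def end(self, name):
        self.events.append(("E", name))

    def counter(self, name, value):
        self.events.append(("C", name, value))


# The only place that knows which backend is active.
_state = {"backend": NullBackend()}


def set_backend(backend):
    _state["backend"] = backend


@contextmanager
def trace_scope(name):
    """Emit a begin marker on entry and an end marker on exit."""
    _state["backend"].begin(name)
    try:
        yield
    finally:
        _state["backend"].end(name)


def trace_counter(name, value):
    _state["backend"].counter(name, value)


backend = ListBackend()
set_backend(backend)
with trace_scope("draw_call"):
    trace_counter("vertices", 36)
print(backend.events)
```

The instrumented code only ever calls `trace_scope` and `trace_counter`, so the dependency on a particular tracing system stays inside one backend. The sketch says nothing about the harder case of full object state, where the backend interface would have to carry structured data.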




At least both Ubuntu and Fedora default kernels have had uprobes built
in for *many* years.

Up to date Fedora 33 kernel:
$ grep UPROBE /boot/config-5.10.15-200.fc33.x86_64
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_UPROBES=y
CONFIG_UPROBE_EVENTS=y

Same on up to date Ubuntu 20.04:
$ grep UPROBE /boot/config-5.4.0-65-generic
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_UPROBES=y
CONFIG_UPROBE_EVENTS=y



Somebody more knowledgeable about CrOS should chime in, but from a 
codesearch, I don't think they are enabled on CrOS:


https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/kernel/v5.4/arch/x86/configs/chromiumos-jail-vm-x86_64_defconfig;l=254?q=CONFIG_UPROBES%20-%22%23if%22%20-%22obj-%22&sq=&ss=chromiumos%2Fchromiumos%2Fcodesearch:src%2Fthird_party%2Fkernel%2Fv5.4%2F




The challenge of uprobes is that it relies on dynamic rewriting of
.text pages. Whenever I mention that, a platform security team reacts
like the Frau Blucher horses (https://youtu.be/bps5hJ5DQDw?t=10), with
understandable reasons.


I'm not sure you've given them an accurate picture of it.

It doesn't require executable code to be writable from user-space,
library code can remain read-only because the kernel can toggle the
relevant page writable for uprobe breakpoint setup and back.

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-18 Thread Tamminen, Eero T
Hi,

On Thu, 2021-02-18 at 12:17 +0100, Primiano Tucci wrote:
[discussion about Perfetto itself]
...
> eero.t.tamminen@
> from in common Linux distro repos.
> 
> That's right. I am aware of the problem. The plan is to address it
> with 
> bit.ly/perfetto-debian as a starter.

Glad to hear something is planned for making things easier for distros!


> > eero.t.tamminen@
> > Just set uprobe for suitable buffer swap function [1], and parse
> kernel ftrace events. (paraphrasing for context: "why do we need
> instrumentation points? can't we just use uprobes instead?")
> 
> The problem with uprobes is:
> 1. It's linux specific. Perhaps not a big problem for Mesa here, but the
> reason why we didn't go there with Perfetto, at least until now, is that
> we need to support all major OSes (Linux, CrOS, Android, Windows,
> macOS).

The main point here is that uprobe works already without any
modifications to any libraries (I have a script that has been used for
FPS tracking of daily 3D stack builds for many years).

And other OSes already offer similar functionality.  E.g. DTrace should
be available both for Mac & Windows.


Btw. If really needed, you could even implement similar functionality in
user space on other OSes. We did something similar at Nokia using
ptrace before uprobes was a thing:
https://github.com/maemo-tools-old/functracer

(While possible, because of the need for some tricky architecture
specific assembly, and certain instructions needing some extra assembly
fixups when replaced by a breakpoint, it's unlikely to be feasible for
general tracing though.)


> 2. Even on Linux-based systems, it's really hard to have uprobes enabled
> in production (I am not sure what is the situation for CrOS).
>
>  In Google, 
> we care a lot about being able to trace from production devices
> without 
> reflashing them with dev images, because then we can just tell people 
> that are experiencing problems "can you just open chrome://tracing, 
> orders of magnitude the actionable feedback we'd be able to get from 
> users / testers.

I would think having tracepoint code in the libraries themselves and
generally enabled for some specific tracing solution like Perfetto, is
*much* more unlikely.

At least both Ubuntu and Fedora default kernels have had uprobes built
in for *many* years.

Up to date Fedora 33 kernel:
$ grep UPROBE /boot/config-5.10.15-200.fc33.x86_64 
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_UPROBES=y
CONFIG_UPROBE_EVENTS=y

Same on up to date Ubuntu 20.04:
$ grep UPROBE /boot/config-5.4.0-65-generic 
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_UPROBES=y
CONFIG_UPROBE_EVENTS=y


> The challenge of uprobes is that it relies on dynamic rewriting of
> .text pages. Whenever I mention that, a platform security team reacts
> like the Frau Blucher horses (https://youtu.be/bps5hJ5DQDw?t=10), with
> understandable reasons.

I'm not sure you've given them an accurate picture of it.

It doesn't require executable code to be writable from user-space,
library code can remain read-only because the kernel can toggle the
relevant page writable for uprobe breakpoint setup and back.

# cat /sys/kernel/tracing/uprobe_events 
p:uprobes/glXSwapBuffers /opt/lib/libGL.so.1.2.0:0x0003bab0

# grep -h /opt/lib/libGL.so.1.2.0 /proc/*/maps | sort
7f486ab51000-7f486ab6a000 r--p  08:03 7865435  /opt/lib/libGL.so.1.2.0
7f486ab6a000-7f486abaf000 r-xp 00019000 08:03 7865435  /opt/lib/libGL.so.1.2.0
7f486abaf000-7f486abc6000 r--p 0005e000 08:03 7865435  /opt/lib/libGL.so.1.2.0
7f486abc6000-7f486abc9000 r--p 00074000 08:03 7865435  /opt/lib/libGL.so.1.2.0
7f486abc9000-7f486abca000 rw-p 00077000 08:03 7865435  /opt/lib/libGL.so.1.2.0
7f491438d000-7f49143a6000 r--p  08:03 7865435  /opt/lib/libGL.so.1.2.0
7f49143a6000-7f49143eb000 r-xp 00019000 08:03 7865435  /opt/lib/libGL.so.1.2.0
7f49143eb000-7f4914402000 r--p 0005e000 08:03 7865435  /opt/lib/libGL.so.1.2.0
7f4914402000-7f4914405000 r--p 00074000 08:03 7865435  /opt/lib/libGL.so.1.2.0
7f4914405000-7f4914406000 rw-p 00077000 08:03 7865435  /opt/lib/libGL.so.1.2.0
7f6296d62000-7f6296d7b000 r--p  08:03 7865435  /opt/lib/libGL.so.1.2.0
...
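
For reference, the probe definition and the permission check shown here can be reproduced with a small script. This is only a sketch: the two helper names are made up, the sample maps text is copied from two of the lines in this mail, and nothing is written to tracefs (appending the definition to /sys/kernel/tracing/uprobe_events would need root).

```python
def uprobe_definition(group, event, path, offset):
    """Format one probe line in the syntax accepted by tracefs uprobe_events."""
    return f"p:{group}/{event} {path}:0x{offset:08x}"


def writable_executable_mappings(maps_text, path):
    """Return /proc/<pid>/maps lines for `path` that are both writable and executable."""
    hits = []
    for line in maps_text.splitlines():
        fields = line.split()
        # fields: address range, perms, offset, device, inode, pathname
        if len(fields) >= 6 and fields[5] == path:
            perms = fields[1]
            if "w" in perms and "x" in perms:
                hits.append(line)
    return hits


LIB = "/opt/lib/libGL.so.1.2.0"
definition = uprobe_definition("uprobes", "glXSwapBuffers", LIB, 0x3BAB0)
print(definition)

sample_maps = """\
7f486ab6a000-7f486abaf000 r-xp 00019000 08:03 7865435  /opt/lib/libGL.so.1.2.0
7f486abc9000-7f486abca000 rw-p 00077000 08:03 7865435  /opt/lib/libGL.so.1.2.0
"""
suspicious = writable_executable_mappings(sample_maps, LIB)
print(suspicious)
```

The first print reproduces the uprobe_events line above, and the second prints an empty list because no mapping of the library is writable and executable at the same time, even with the probe set.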


>  > mark.a.janes@
> events.  Why not follow the example of GPUVis, and write generic 
> trace_markers to ftrace?
> 
> In my experience ftrace's trace_marker:
> 1. Works for very simple types of events (e.g.
> begin-slice/end-slice/counters) but doesn't work with richer / structured
> event types like the ones linked above, as that gets into
> stringification format, marshalling costs and interop.
> 2. Writing into the marker has some non-trivial cost (IIRC 1-10 us on
> Android), as it involves a trip into and back from the kernel;
> 3. It leads to one global ring buffer, where fast events push out slower
> ones, which is particularly problemat

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-18 Thread Primiano Tucci



Hey folks,
I'm one of the authors and maintainers of Perfetto, also +skyostil@.
I am really sorry for the giant bulk reply. I'll try to do my best to 
answer the various open questions about Perfetto but I don't know a 
better way than some heavy -ing given I'm joining the party late.


In short:

- Yep, so far the only distribution we have for the SDK is a C++ 
amalgamation. I am aware that it isn't great for Linux OSS projects, it 
was very optimized for Google projects that have the habit of statically 
linking everything.


- There are plans to move beyond that and have a stable C API (docs 
linked below). But that will take us quite some time. We should probably 
figure out some intermediate solution meanwhile.


- I'd be really keen to learn how Mesa is intending to do tracing. That
can greatly influence our upcoming design. Begin/end markers are IMO the
least interesting thing as they tend to work in whatever form and are
easy to abstract. Richer/structured trace points like
https://github.com/google/perfetto/tree/master/protos/perfetto/trace/gpu
(currently used by Android GPU drivers [1]) are more interesting and
where most of the challenges lie.


- Maybe the discussion here needs to be split into: (1) a shorter-term
plan to iterate, figure out what works, what doesn't, see what the end
result looks like; (2) a longer-term plan on what the API surface
should look like.


I don't have strong opinions on how Mesa should proceed here and you
don't need an extra cook in the kitchen. If I really had to express a
handwavy opinion, my best advice would be to start with something you
can iterate on right now, maybe behind some compile-time flag, and come
up with a plan on how to turn it into a production thing later on. We are
interested to hear your feedback and adjust the design of our stable C API.


[1] 
https://android-developers.googleblog.com/2020/09/android-gpu-inspector-open-beta.html?m=1


On the tracing library / C++ vendoring / stable C API:

The way the Perfetto SDK is organized today is mainly influenced-by and 
designed-for the way Google handles its projects, which boils down to: 
(i) statically link everything, to minimize the testing matrix; (ii) 
move fast and refactor all dependencies as needed.
It's all about "who pays the maintenance cost and when?". This tends to 
work well in a large company which: (i) has a giant repo which allows 
~atomic cross-project changes; (ii) has the resources to keep everything 
up to date.
I am perfectly aware this is neither appealing nor ideal for open source
projects and, more generally, for the way libraries in Linux
distributions work. I hear you when you say "vendoring [...] seems a bad
idea". Yes, it implicitly pushes the burden of up-revving onto the
"depender" [that's bad]
We are committed to maintaining ABI stability of our tracing protocol and
socket (see https://perfetto.dev/docs/design-docs/api-and-abi). This is
because Chrome, Android, and now CrOS, and tools like gpuinspector.dev,
which all statically link perfetto in some form, have strongly different
release cycles. [that's good]
We also try to not break the C++ API too much, as robdclark@ found
trying to update through our v3..v12 monthly releases [that's good]. But
that C++ API has too wide a surface and we can't commit to making that a
fully stable API. Nor can we make the current C++ ABI stable across a
.so boundary (the C++ SDK today heavily depends on inlines to allow the
compiler to see through the layers). [that's bad]


For this reason, we recently started making plans to change the way we 
do things to meet the needs of open source projects and not just 
Google-internal ones. [that's good]
Specifically (Note: to open the docs below you need to join 
https://groups.google.com/forum/#!forum/perfetto-dev to inherit the ACLs):


1. https://bit.ly/perfetto-debian has a plan to distribute tracing 
services and SDK as standard pkg-config based packages (at least for 
Debian. We'll rely on volunteers for other distros)


2. https://bit.ly/perfetto-c has a plan + ongoing discussion for having 
a long-term stable C API in a libperfetto.so . The key here for us 
(Perfetto) is identifying a subset of the wider C++ API that fits the 
bill for projects out there and that we are comfortable maintaining 
longer term.


The one thing I also need to be very clear on, though, is that both the
perfetto-debian and perfetto-c discussions are very recent and it will
take a while for us to get there. We can't commit to a specific timeline
right now, but if I had to make an educated estimate I'd say more
towards end-of-2021. [that's bad]
I'd also be more keen to commit once there are concrete use-cases, 
ideally with iterations/feedback from a project like Mesa.


[obligatory reference at this point: https://youtu.be/Krbl911ZPBA?t=22]


> eero.t.tamminen@
> Just set uprobe for suitable buffer swap function [1], and parse 
kernel ftrace events. (paraphrasing for context: "why do we