On 04/28/2017 01:50 PM, Hannes Frederic Sowa wrote:
On 28.04.2017 03:11, Alexei Starovoitov wrote:
[...]
i disagree re: kallsyms. The goal of prog_tag is to let program writers
understand which program is running in a stable way.
But exactly it doesn't let program writers do that, it just confuses them:
---
jit on:
perf record -e bpf_redirect -agR
The unwinder walks the stack, extracts address of upper function and
sends it to user space (perf) or handles it inside the kernel/kallsyms
(ftrace).
User takes tag of bpf program and wants to inspect related maps to the
program. Unfortunately the tag is not unique and thus we need to expand
the tag back to all possible programs with the same tag and expand that
to the union of all possible maps that those programs reference again.
That is what we present to the application developer. I would seriously
be very confused.
If application developer doesn't trust perf and uses instruction pointer
value from the stack directly he can't find out which program there is,
because fdinfo e.g. doesn't show the actual address of where the program
is allocated. I would use /dev/kmem now.
I don't think it would be reasonable to let fdinfo unconditionally
dump the address of the program including unprivileged progs. We
probably could add a run-time check into bpf_prog_show_fdinfo() and
show it dynamically when user has cap_sys_admin.
---
jit off:
perf probe -a '__bpf_prog_run ctx insn'
perf probe -a 'bpf_redirect flags ifindex'
perf record -e bpf_redirect -agR
Situation doesn't change. We do get the insn pointer thus have a unique
id for the program. That's it, no further introspection. I can read
/dev/kmem now.
---
Personally I wouldn't rely on such infrastructure.
My proposal would be to maybe hash a map id into the program, so instead
of replacing the user space file descriptor with zero, take a map id
(like discussed below) or an inode number of the map into the register
and hash with that, so that those program have unique identifiers.
I don't think that proposal would work, f.e. placing dev + inode number
(inode itself wouldn't be sufficient either; map would also have to be
pinned as anonymous inode from fd wouldn't work) or map id into insn
won't give you out of a sudden a unique prog id, since maps can be shared
among multiple progs, but also the same prog can be attached to, say,
multiple attachment points.
Otherwise construct kallsym entries with prog id instead of tag.
I think that the hash should try to reassemble some kind of identity
function and mapping two programs to the same tag, that do something
completely differently is not good (based on we don't include the map).
Also I do think in future the difference between non-jit and jit
operation in regards to tracing should also be lifted. We could add a
manual tracing point into the interpreter for reporting the same event
as if the program was jitted.
Debugging should not be that different based on the sysctl flags.
With regards to tracing it's quite useful to see whether a program was
JITed or not JITed (aka __bpf_prog_run()), so I don't think it makes
sense to e.g. have everything named __bpf_prog_run(), at least the other
way around wouldn't work for interpreter as far as I see.
But lets assume JIT is off for a moment, and you only see __bpf_prog_run().
Then, in the stack trace you'll also see related functions that call this
in the first place, for example, mlx4_en_poll_rx_cq() / mlx4_en_process_rx_cq()
in case of XDP, meaning, you get the call path context as well, for which
you later on (with the proposed infrastructure for getting fds from
attachment points + dumping them) can return the attached prog fd and
with that also dump the code or map data.