On 10/9/15 4:45 AM, Hannes Frederic Sowa wrote:
Afaics this problem hasn't even be solved in
perf so far, tracepoints hit independent of the namespace currently.

yes and that's exactly what we're trying to solve.
The "demux+worker bpf programs" proposal is a work-in-progress solution
to get confidence how to actually separate tracepoint events into
namespaces before adding any new APIs to kernel.

For me namespacing of ebpf code is actually not that important, I would
much rather like to control which namespace is allowed to execute ebpf
in an unpriviledged manner. Like Thomas wrote, a capability was great
for that, but I don't know if any new capabilities will be added.

I think we're mixing too many things here.
First I believe eBPF 'socket filters' do not need any caps.
They're packet read-only and functionally very similar to classic with
a distinction that packet data can be aggregated into maps and programs
can be written in C. So I see no reason to restrict them per user or
per namespace.
Openstack use case is different. There it will be prog_type_sched_cls
that can mangle packets, change skb metadata, etc under TC framework.
These are not suitable for all users and this patch leaves
them root-only. If you're proposing to add CAP_BPF_TC to let containers
use them without being CAP_SYS_ADMIN, then I agree, it is useful, but
needs a lot more safety analysis on tc side.
Similar for prog_type_kprobe: we can add CAP_BPF_KPROBE to let
some trusted applications run unprivileged, but still being able
to do performance monitoring/analytics.
And we would need to carefully think about program restrictions,
since bpf_probe_read and kernel pointer walking is essential part
in tracing.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to