Hello,
The bpf_xdp_link_attach_failed tracepoint (added in commit bf4ea1d0b2cb
"xdp: Add tracepoint for xdp attaching failure") exposes the netlink
extack message produced when attaching an XDP program via BPF_LINK_CREATE
fails. This is useful because, unlike the netlink attach path, the
bpf_link attach path does not return the extack to userspace -- the caller
only gets an errno (e.g. EINVAL/ERANGE).
We would like to use this in Cilium [1][2]: when attaching the XDP
datapath program fails, surface the kernel's reason (e.g. "single-buffer
XDP requires MTU less than ...") in the agent logs instead of an opaque
errno, so operators don't have to inspect dmesg on the host.
The limitation we hit is that the tracepoint only carries the message
string, so a consumer cannot tell which device a failure belongs to.
This matters for two reasons:
1. Correlation: with only the message, a consumer cannot reliably
attribute a failure to a specific attach, particularly if multiple
XDP attaches happen concurrently.
2. Scoping: a consumer watching this tracepoint sees XDP attach
failures system-wide and cannot limit them to the devices it
manages.
At the call site (bpf_xdp_link_attach() in net/core/dev.c) the net_device
is in scope, so exposing it looks straightforward:
TRACE_EVENT(bpf_xdp_link_attach_failed,
TP_PROTO(const char *msg, const struct net_device *dev),
TP_ARGS(msg, dev),
TP_STRUCT__entry(
__string(msg, msg)
__field(int, ifindex)
),
TP_fast_assign(
__assign_str(msg);
__entry->ifindex = dev->ifindex;
),
TP_printk("ifindex=%d errmsg=%s", __entry->ifindex, __get_str(msg))
);
- trace_bpf_xdp_link_attach_failed(extack._msg);
+ trace_bpf_xdp_link_attach_failed(extack._msg, dev);
Before sending a formal patch I'd appreciate guidance on a few points:
- Should the tracepoint take const struct net_device *dev (consistent
with the other tracepoints in this file, and lets TP_printk show the
device), or just the ifindex as an int (simpler for raw_tp BPF
consumers, which otherwise read dev->ifindex via CO-RE)?
- For raw_tp consumers the argument order is effectively ABI: prepending
dev would shift the existing msg argument. I've appended dev above to
keep msg at args[0]. Is preserving the existing argument position the
right call, or is reordering acceptable given how new and rarely
consumed this tracepoint is?
- Is extending the existing tracepoint preferred, or would you rather
keep it as-is and expose the device context some other way?
This would be my first XDP/BPF tracepoint change, so any direction is
welcome. I'm happy to send a proper patch once the shape is agreed.
Regards,
Masashi Honma
[1] https://github.com/cilium/cilium/issues/40777
[2] https://github.com/cilium/cilium/pull/46546