On Fri, Aug 05, 2016 at 12:52:09PM +0200, Peter Zijlstra wrote:
> > > > Currently overflow_handler is set at event alloc time. If we start
> > > > changing it on the fly with atomic xchg(), afaik things shouldn't
> > > > break, since each overflow_handler is run to completion and doesn't
> > > > change global state, right?
>
> Yes, or even a simple WRITE_ONCE() to replace it, as long as we make
> sure to use a READ_ONCE() to load the pointer.
>
> As long as we're sure to limit this poking to a single user its fairly
> simple to get right. The moment there can be concurrency a lot of fail
> can happen.

agreed.
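Something like the below is what I had in mind for the handler swap.
Just a sketch: bpf_overflow_handler() and the orig_overflow_handler
field don't exist yet, and the attach/detach paths assume a single
writer:

	/* hypothetical new handler that runs the bpf prog and can chain
	 * into the saved original handler to keep both outputs
	 */
	static void bpf_overflow_handler(struct perf_event *event,
					 struct perf_sample_data *data,
					 struct pt_regs *regs)
	{
		/* run the bpf prog here, then optionally chain: */
		event->orig_overflow_handler(event, data, regs);
	}

	/* attach (single user, no concurrent writers) */
	event->orig_overflow_handler = event->overflow_handler;
	WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);

	/* detach */
	WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler);

	/* overflow path always loads the pointer with READ_ONCE() */
	READ_ONCE(event->overflow_handler)(event, data, regs);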
> > So instead of normal __perf_event_output() writing into ringbuffer,
> > a bpf prog will be called that can optionally write into different
> > rb via bpf_perf_event_output.
>
> It could even chain and call into the original function once its done
> and have both outputs.

interesting idea. makes sense.

Also thinking about concurrency and the need to remember the original
handler somewhere, would it be a cleaner api to add a bit to
perf_event_attr and use attr.config1 as bpf_fd? Then perf_event_open()
at event creation time will use the bpf prog as the overflow_handler.
That solves the concurrency concerns and the potential semantic issues
we'd have with the ioctl() approach. Like if we perf_event_open() an
event for a task and then bpf attach to it, what are the child tasks
and the corresponding inherited events supposed to do? Inherit the
overflow_handler, right? But then detach of the bpf prog in the parent
is supposed to clear it in the inherited events as well. A bit
complicated. I guess we can define it that way. Just seems easier to do
bpf attach at perf_event_open() time only (a rough userspace sketch of
that is at the bottom of this mail).

> > The question is what to pass into the
> > program to make the most use out of it. 'struct pt_regs' is done deal.
> > but perf_sample_data we cannot pass as-is, since it's kernel internal.
>
> Urgh, does it have to be stable API? Can't we simply rely on the kernel
> headers to provide the right structure definition?

yes we can. The concern is about the assumptions people will make about
perf_sample_data and the speed of access to it.
From the bpf program point of view the pointer to perf_sample_data would
be an opaque unsafe pointer, so any access to its fields would have to be
done via bpf_probe_read(), which has non-trivial overhead.
If we go with the uapi mirror of perf_sample_data approach, it will be
fast, since the mirror is not an actual struct. Like the 'struct __sk_buff'
we have in uapi/linux/bpf.h is a meta structure. It's not allocated
anywhere and no fields are copied. When a bpf program does
'skb->vlan_present' the verifier rewrites it at load time into the
corresponding access to the kernel internal 'struct sk_buff' fields with
bitmask, shifts and such.
For this case we can define something like:

	struct bpf_perf_sample_data {
		__u64 period;
	};

Then the bpf prog will only be able to access that single field, which
the verifier will translate into 'data->period' where data is a
'struct perf_sample_data *'.
Later we can add other fields if necessary. The kernel is free to mess
around with perf_sample_data whichever way it wants without impacting
bpf progs.
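On the bpf side such a prog could then look roughly like below. Only a
sketch: the "perf_event" section name is made up, how pt_regs would be
passed alongside is left open, and only 'period' is assumed to be in
the mirror:

	#include <linux/types.h>

	#define SEC(name) __attribute__((section(name), used))

	/* uapi mirror, not a real kernel struct; only 'period' to start with */
	struct bpf_perf_sample_data {
		__u64 period;
	};

	SEC("perf_event")
	int on_overflow(struct bpf_perf_sample_data *data)
	{
		/* the verifier rewrites this at load time into an access to
		 * the in-kernel 'struct perf_sample_data', same trick as
		 * __sk_buff
		 */
		if (data->period < 10000)
			return 0;

		/* could call bpf_perf_event_output() here to write the
		 * sample into a different rb
		 */
		return 0;
	}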
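And if we go with attach at perf_event_open() time, usage from userspace
could be as simple as the below. Again a sketch: the new attr bit doesn't
exist yet (its name is made up), config1 carrying the prog fd is just the
proposal above:

	#include <linux/perf_event.h>
	#include <sys/syscall.h>
	#include <string.h>
	#include <unistd.h>

	static int open_event_with_bpf(int bpf_prog_fd)
	{
		struct perf_event_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.type = PERF_TYPE_HARDWARE;
		attr.config = PERF_COUNT_HW_CPU_CYCLES;
		attr.sample_period = 100000;
		/* attr.overflow_bpf = 1;  <- hypothetical new attr bit */
		attr.config1 = bpf_prog_fd;	/* prog fd from BPF_PROG_LOAD */

		return syscall(__NR_perf_event_open, &attr, 0 /* this task */,
			       -1 /* any cpu */, -1 /* group_fd */, 0 /* flags */);
	}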