Peter Zijlstra <pet...@infradead.org> writes:

> On Fri, Mar 04, 2016 at 03:42:46PM +0200, Alexander Shishkin wrote:
>> @@ -4649,10 +4679,22 @@ static void perf_mmap_close(struct vm_area_struct *vma)
>>  	 */
>>  	if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
>>  	    atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
>> +		/*
>> +		 * Stop all aux events that are writing to this here buffer,
>> +		 * so that we can free its aux pages and corresponding pmu
>> +		 * data. Note that after rb::aux_mmap_count dropped to zero,
>> +		 * they won't start any more (see perf_aux_output_begin()).
>> +		 */
>> +		perf_pmu_output_stop(event);
>
> So to me it seems like we're interested in rb, we don't particularly
> care about @event in this case.
Yeah, @event is used only for its rb and pmu down this path.

>> +	if (!has_aux(event))
>> +		return;
>> +
>
> 	if (!parent)
> 		parent = event;
>
>> +	if (rcu_dereference(event->rb) == rb)
>
> s/event/parent/
>
>> +		ro->err = __perf_event_stop(event);
>
>> +	else if (parent && rcu_dereference(parent->rb) == rb)
>> +		ro->err = __perf_event_stop(event);
>
> and these can go.. However..

You're right that it's the parent that's got the ->rb, but it may well
be the child that's actually running and writing data there, so we need
to stop any running children as well. I think I actually broke this bit
since the previous version; it needs a comment explaining that.

>> +}
>> +
>> +static int __perf_pmu_output_stop(void *info)
>> +{
>> +	struct perf_event *event = info;
>> +	struct pmu *pmu = event->pmu;
>> +	struct perf_cpu_context *cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
>> +	struct remote_output ro = {
>> +		.rb = event->rb,
>> +	};
>> +
>> +	rcu_read_lock();
>> +	perf_event_aux_ctx(&cpuctx->ctx, __perf_event_output_stop, &ro);
>> +	if (cpuctx->task_ctx)
>> +		perf_event_aux_ctx(cpuctx->task_ctx, __perf_event_output_stop,
>> +				   &ro);
>> +	rcu_read_unlock();
>> +
>> +	return ro.err;
>> +}
>> +
>> +static void perf_pmu_output_stop(struct perf_event *event)
>> +{
>> +	int cpu, err;
>> +
>> +	/* better be thorough */
>> +	get_online_cpus();
>> +restart:
>> +	for_each_online_cpu(cpu) {
>> +		err = cpu_function_call(cpu, __perf_pmu_output_stop, event);
>> +		if (err)
>> +			goto restart;
>> +	}
>> +	put_online_cpus();
>> +}
>
> This seems wildly overkill, could we not iterate rb->event_list like we
> do for the normal buffer?

Actually we can. One problem, though, is that iterating rb::event_list
requires an rcu read section (or the irqsafe rb::event_lock), while we
also need to send IPIs. The normal-buffer case tears down
rb::event_list as it goes, so it can close the rcu read section right
after it fetches one event from the list. In this case, however, we
must keep the list intact.
> Sure, we need to IPI for each event found, but that seems better than
> unconditionally sending IPIs to all CPUs.

Actually, won't it "often" be the case that the number of events is a
multiple of the number of cpus? The usual use case is one event per
task per cpu with inheritance enabled, and in that case we'd zap
multiple events per IPI.

Regards,
--
Alex