Peter Zijlstra <[email protected]> writes:

> Yuck, nasty problem. Also, I think it's broken. By not having
> mmap_mutex around the whole thing, notably rb_free_aux(), you can race
> against mmap().
>
> What seems possible now is that:
>
>       mmap(aux); // rb->aux_mmap_count == 1
>       munmap(aux)
>         atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex); // == 0
>
>         mutex_unlock(&event->mmap_mutex);
>
>                                       mmap(aux)
>                                         if (rb_has_aux())
>                                           atomic_inc(&rb->aux_mmap_count); // == 1
>
>         rb_free_aux(); // oops!!

Wait, this isn't actually a problem: we can hold mmap_mutex over
rb_free_aux(), as the current code already does. My patch got this
wrong, but there's really no reason to drop the mutex before
rb_free_aux().

> So I thought that pulling all the aux bits out from the ring_buffer
> struct, such that we have rb->aux, would solve the issue in that we can
> then fix mmap() to have the same retry loop as for event->rb.
>
> And while that fixes that race (I almost had that patch complete -- I
> might still send it out, just so you can see what it looks like), it
> doesn't solve the complete problem I don't think.

I was toying with that some time ago, but I couldn't really see the
benefits that would justify the hassle.

> Because in that case, you want the event to start again on the new
> buffer, and I think it's possible we end up calling ->start() before
> we've issued the ->stop() and that would be BAD (tm).

So if we just hold the mmap_mutex over rb_free_aux(), this won't
happen, right?

> The only solution I've come up with is:
>
>       struct rb_aux *aux = rb->aux;
>
>       if (aux && vma->vm_pgoff == aux->pgoff) {
>               ctx = perf_event_ctx_lock(event);
>               if (atomic_dec_and_mutex_lock(&aux->mmap_count, &event->mmap_mutex)) {
>                       /* we now hold both ctx::mutex and event::mmap_mutex */
>                       rb->aux = NULL;
>                       ring_buffer_put(rb); /* aux had a reference */
>                       _perf_event_stop(event);

Here we really need to ensure that none of the events on the
rb->event_list is running, not just the parent, and that still presents
complications wrt irqsave rb->event_lock even with your new idea for
perf_event_stop().

How about something like this to stop the writers:

static int __ring_buffer_output_stop(void *info)
{
        struct ring_buffer *rb = info;
        struct perf_event *event;
 
        spin_lock(&rb->event_lock);
        list_for_each_entry_rcu(event, &rb->event_list, rb_entry) {
                if (event->state != PERF_EVENT_STATE_ACTIVE)
                        continue;

                event->pmu->stop(event, PERF_EF_UPDATE);
        }
        spin_unlock(&rb->event_lock);

        return 0;
}

static void perf_event_output_stop(struct perf_event *event)
{
        struct ring_buffer *rb = event->rb;

        lockdep_assert_held(&event->mmap_mutex);

        if (event->cpu == -1) {
                perf_event_stop(event);
                return;
        }

        cpu_function_call(event->cpu, __ring_buffer_output_stop, rb);
}
 
And then in the mmap_close:

        if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
            atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
                perf_event_output_stop(event);

                /* undo the mlock accounting here */

                rb_free_aux(rb);
                mutex_unlock(&event->mmap_mutex);
        }

Regards,
--
Alex