On Thu, 26 Jun 2014 15:22:38 +0200
Petr Mladek <pmla...@suse.cz> wrote:

> The trace/ring_buffer allows to swap the entire ring buffer. Everything has to
> be done lockless. I think that I have found a race when trying to understand
> the code. The problematic situation is the following:
> 
> CPU 1 (write/reserve event)           CPU 2 (swap the cpu buffer)
> -------------------------------------------------------------------------
> ring_buffer_write()
>   if (cpu_buffer->record_disabled)
>   ^^^ test fails and write continues
> 
>                                       ring_buffer_swap_cpu()

This is a per cpu swap, not a full ring buffer swap.

> 
>                                         inc(&cpu_buffer_a->record_disabled);
>                                         inc(&cpu_buffer_b->record_disabled);
> 
>                                         if (cpu_buffer_a->committing)
>                                         ^^^ test fails and swap continues
>                                         if (cpu_buffer_b->committing)
>                                         ^^^ test fails and swap continues

Grumble, this was originally written such that the swap can only be
done on the CPU that it is running on.


> 
>   rb_reserve_next_event()
>     rb_start_commit()
>       local_inc(&cpu_buffer->committing);
>       local_inc(&cpu_buffer->commits);
> 
>     if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
>     ^^^ test fails and reserve_next continues
> 
>                                         buffer_a->buffers[cpu] = cpu_buffer_b;
>                                         buffer_b->buffers[cpu] = cpu_buffer_a;
>                                         cpu_buffer_b->buffer = buffer_a;
>                                         cpu_buffer_a->buffer = buffer_b;
> 
>   Pheeee, reservation continues in the removed buffer.
> 
> This can be solved by a better check in rb_reserve_next_event(). The 
> reservation
> could continue only when "committing" is enabled and there is no swap in
> progress or when any swap was not finished in the meantime.
> 
> Note that the solution is not perfect. It stops writing also in this 
> situation:
> 
> CPU 1 (write/reserve event)           CPU 2 (swap the cpu buffer)
> ----------------------------------------------------------------------------
> ring_buffer_write()
>   if (cpu_buffer->record_disabled)
>   ^^^ test fails and write continues
> 
>                                       ring_buffer_swap_cpu()
> 
>                                         inc(&cpu_buffer_a->record_disabled);
>                                         inc(&cpu_buffer_b->record_disabled);
> 
>   rb_reserve_next_event()
>     rb_start_commit()
>       local_inc(&cpu_buffer->committing);
>       local_inc(&cpu_buffer->commits);
> 
>                                         if (cpu_buffer_a->committing)
>                                         ^^^ test passes and swap is canceled
> 
>     if (cpu_buffer->record_disabled)
>     ^^^ test passes and reserve is canceled
> 
>                                         dec(&cpu_buffer_a->record_disabled);
>                                         dec(&cpu_buffer_b->record_disabled);
> 
>   Pheeee, both actions were canceled. The write/reserve could have continued
>   if the recording was enabled in time.
> 
> Well, it is the price for using lockless algorithms. I think that it happens
> in more situations here, it is not worth more complicated code, and we could
> live with it.
> 
> The patch also adds some missing memory barriers. Note that compiler barriers
> are not enough because the data can be accessed by different CPUs.
> 
> Signed-off-by: Petr Mladek <pmla...@suse.cz>
> ---
>  kernel/trace/ring_buffer.c | 27 +++++++++++++++++++++------
>  1 file changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 7c56c3d06943..3020060ded7e 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -2529,13 +2529,15 @@ rb_reserve_next_event(struct ring_buffer *buffer,
>  
>  #ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
>       /*
> -      * Due to the ability to swap a cpu buffer from a buffer
> -      * it is possible it was swapped before we committed.
> -      * (committing stops a swap). We check for it here and
> -      * if it happened, we have to fail the write.
> +      * The cpu buffer swap could have started before we managed to stop it
> +      * by incrementing the "committing" values. If the swap is in progress,
> +      * we see disabled recording. If the swap has finished, we see the new
> +      * cpu buffer. In both cases, we should go back and stop committing
> +      * to the old buffer. See also ring_buffer_swap_cpu()
>        */
> -     barrier();
> -     if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
> +     smp_mb();
> +     if (unlikely(atomic_read(&cpu_buffer->record_disabled) ||
> +                  ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
>               local_dec(&cpu_buffer->committing);
>               local_dec(&cpu_buffer->commits);
>               return NULL;
> @@ -4334,6 +4336,13 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
>       atomic_inc(&cpu_buffer_a->record_disabled);
>       atomic_inc(&cpu_buffer_b->record_disabled);
>  
> +     /*
> +      * We could not swap if a commit is in progress. Also any commit could
> +      * not start after we have stopped recording. Keep both checks in sync.
> +      * The counter part is in rb_reserve_next_event()
> +      */
> +     smp_mb();

This will kill ring buffer performance. So I will not allow this fix.

What we can do is force ring_buffer_swap_cpu() to only work for the CPU
that it is on. As we have snapshot in per_cpu buffers, to make that
work, we will need to change the per_cpu version of snapshot to do a
smp_call_function_single() to the CPU that it wants to take a snapshot
of, and run the swap there.

To force this, we can remove the cpu parameter from the
ring_buffer_swap_cpu(). By doing this, we may be able to remove some of
the CONFIG_RING_BUFFER_ALLOW_SWAP hacks too!

I'm not going to sacrifice the general performance of the ring buffer
for a feature that is seldom (if ever) used.

-- Steve

> +
>       ret = -EBUSY;
>       if (local_read(&cpu_buffer_a->committing))
>               goto out_dec;
> @@ -4348,6 +4357,12 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
>  
>       ret = 0;
>  
> +     /*
> +      * Mare sure that rb_reserve_next_event() see the right
> +      * buffer before we enable recording.
> +      */
> +     smp_wmb();
> +
>  out_dec:
>       atomic_dec(&cpu_buffer_a->record_disabled);
>       atomic_dec(&cpu_buffer_b->record_disabled);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to