Hi,

> >         do {
> >                 tid = this_cpu_read(s->cpu_slab->tid);
> >                 c = raw_cpu_ptr(s->cpu_slab);
> > -       } while (IS_ENABLED(CONFIG_PREEMPT) && unlikely(tid != c->tid));
> > +       } while (IS_ENABLED(CONFIG_PREEMPT) &&
> > +                unlikely(tid != READ_ONCE(c->tid)));

[...]

> Could you show me generated code again?

The code generated without this patch in !SMP && PREEMPT kernels is:

/* Hoisted load of c->tid */
ffffffc00016d3c4:       f9400404        ldr     x4, [x0,#8]
/* this_cpu_read(s->cpu_slab->tid) -- buggy, see [1] */
ffffffc00016d3c8:       f9400401        ldr     x1, [x0,#8]
ffffffc00016d3cc:       eb04003f        cmp     x1, x4
ffffffc00016d3d0:       54ffffc1        b.ne    ffffffc00016d3c8 <slab_alloc_node.constprop.82+0x30>

The code generated with this patch in !SMP && PREEMPT kernels is:

/* this_cpu_read(s->cpu_slab->tid) -- buggy, see [1] */
ffffffc00016d3c4:       f9400401        ldr     x1, [x0,#8]
/* load of c->tid */
ffffffc00016d3c8:       f9400404        ldr     x4, [x0,#8]
ffffffc00016d3cc:       eb04003f        cmp     x1, x4
ffffffc00016d3d0:       54ffffa1        b.ne    ffffffc00016d3c4 <slab_alloc_node.constprop.82+0x2c>

Note that with the patch the branch results in both loads being
performed again.
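
To make the difference between the two listings concrete, here is a rough
user-space model of the two loop shapes. It is only a sketch: struct
tid_model, load_tid(), spin_hoisted() and spin_reloaded() are hypothetical
names, the "preemption" is simulated by bumping the tid right after the
first load, and max_iters exists only so that the model terminates.

```c
#include <assert.h>

/* Hypothetical model of the per-cpu tid, with a load counter. */
struct tid_model {
	unsigned long tid;
	unsigned long loads;
};

/*
 * Models one "ldr xN, [x0,#8]". Right after the very first load the tid
 * is bumped once, standing in for a preemption between the two loads.
 */
static unsigned long load_tid(struct tid_model *m)
{
	unsigned long v = m->tid;

	if (++m->loads == 1)
		m->tid++;
	return v;
}

/* Shape of the code without the patch: c->tid is loaded once, up front. */
static unsigned int spin_hoisted(struct tid_model *m, unsigned int max_iters)
{
	unsigned long c_tid = load_tid(m);	/* hoisted load of c->tid */
	unsigned long tid;
	unsigned int iters = 0;

	do {
		tid = load_tid(m);		/* this_cpu_read(...) */
		iters++;
	} while (tid != c_tid && iters < max_iters);

	return iters;
}

/* Shape of the code with the patch: both loads are redone on each pass. */
static unsigned int spin_reloaded(struct tid_model *m, unsigned int max_iters)
{
	unsigned long tid, c_tid;
	unsigned int iters = 0;

	do {
		tid = load_tid(m);		/* this_cpu_read(...) */
		c_tid = load_tid(m);		/* READ_ONCE(c->tid) */
		iters++;
	} while (tid != c_tid && iters < max_iters);

	return iters;
}
```

In this model the hoisted shape keeps comparing against the stale value
until it hits the iteration cap, while the reloaded shape settles on its
second pass.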

Given that in !SMP kernels we know that the loads _must_ happen on the
same CPU, I think we could go a bit further with the loop condition:

        while (IS_ENABLED(CONFIG_PREEMPT) &&
               IS_ENABLED(CONFIG_SMP) &&
               unlikely(tid != READ_ONCE(c->tid)));

The barrier afterwards should be sufficient to order the load of the tid
against subsequent accesses to the other cpu_slab fields.
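
As a rough illustration of that loop shape outside the kernel, the sketch
below uses hypothetical stand-ins throughout: SKETCH_PREEMPT and SKETCH_SMP
for the IS_ENABLED() checks, read_once_ulong() for READ_ONCE(), and struct
cpu_slab_sketch for the per-cpu slab structure. The idea is that the retry
only matters when the task could have moved to another CPU between the two
accesses, so when either constant is 0 the compiler can drop the comparison
entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for IS_ENABLED(CONFIG_PREEMPT) and IS_ENABLED(CONFIG_SMP). */
#define SKETCH_PREEMPT 1
#define SKETCH_SMP     1

/* Hypothetical, trimmed-down stand-in for the per-cpu slab structure. */
struct cpu_slab_sketch {
	void *freelist;
	unsigned long tid;
};

/*
 * User-space stand-in for READ_ONCE() on an unsigned long: the volatile
 * access makes the compiler emit a load every time this is evaluated, so
 * the load cannot be hoisted out of a loop.
 */
static inline unsigned long read_once_ulong(const unsigned long *p)
{
	return *(const volatile unsigned long *)p;
}

/* Returns the slab pointer and stores the tid that was sampled alongside it. */
static struct cpu_slab_sketch *snapshot_tid(struct cpu_slab_sketch *slab,
					    unsigned long *tid_out)
{
	struct cpu_slab_sketch *c;
	unsigned long tid;

	do {
		tid = read_once_ulong(&slab->tid);	/* models this_cpu_read(s->cpu_slab->tid) */
		c = slab;				/* models raw_cpu_ptr(s->cpu_slab) */
	} while (SKETCH_PREEMPT && SKETCH_SMP &&
		 tid != read_once_ulong(&c->tid));

	*tid_out = tid;
	return c;
}
```

This leaves out the barrier and everything that follows it in the real
function; it is only meant to show where the extra compile-time check sits.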

> What we need to check is redoing whole things in the loop.
> Previous attached code seems to me that it already did
> refetching c->tid in the loop and this patch looks only handle
> refetching c->tid.

The refetch in the loop is this_cpu_read(s->cpu_slab->tid), not the load
of c->tid (which is hoisted above the loop).

> READ_ONCE(c->tid) will trigger redoing 'tid = this_cpu_read(s->cpu_slab->tid)'?

I was under the impression that this_cpu operations would always result
in an access, much like the *_ONCE accessors, so we should always redo
the access for this_cpu_read(s->cpu_slab->tid). Is that not the case?

Mark.

[1] The arm64 this_cpu_* operations are currently buggy. We generate the
    percpu address into a register, then perform the access with
    separate instructions (and could be preempted between the two).
    Steve Capper is currently fixing this.

    However, the hoisting of the c->tid load could happen regardless,
    whenever raw_cpu_ptr(c) can be evaluated at compile time.