On 07/31/2013 06:21 PM, Linus Torvalds wrote:
> Ummm.. The race is to the testing of the bit, not setting. The
> testing of the bit is not valid before we have set the tlb state,
> AFAIK.
I believe the bit is cleared and set only by the current CPU. Clearing is done from the TLB flush IPI handler, or by directly calling leave_mm from ptep_clear_flush if the flush originated locally. The one exception is clear_tasks_mm_cpumask, which may only be called for an already offlined CPU. Setting, I believe, is only ever done in switch_mm. Interrupts are blocked inside switch_mm, so I think we are safe. Would you like a comment to this effect in the code, or are there other things we need to check first?
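For reference, here is a simplified sketch of the clearing path I am describing, written from memory, so the details may not match arch/x86/mm/tlb.c exactly: the flush IPI handler punts to leave_mm when this CPU is in lazy TLB mode, and leave_mm only ever clears the current CPU's own bit:

	/*
	 * Simplified sketch of the lazy TLB clearing path
	 * (modeled on arch/x86/mm/tlb.c; details approximate).
	 */
	void leave_mm(int cpu)
	{
		struct mm_struct *active_mm =
			this_cpu_read(cpu_tlbstate.active_mm);

		if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
			BUG();
		/* Only the current CPU ever clears its own bit. */
		cpumask_clear_cpu(cpu, mm_cpumask(active_mm));
		load_cr3(swapper_pg_dir);
	}

	static void flush_tlb_func(void *info)
	{
		struct flush_tlb_info *f = info;

		if (f->flush_mm != this_cpu_read(cpu_tlbstate.active_mm))
			return;

		if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
			local_flush_tlb();	/* simplified: the real
						   code flushes only the
						   requested range */
		else
			leave_mm(smp_processor_id());
	}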
> On Jul 31, 2013 3:16 PM, "Rik van Riel" <[email protected]> wrote:
>> On 07/31/2013 06:07 PM, Linus Torvalds wrote:
>>> On Wed, Jul 31, 2013 at 2:43 PM, Rik van Riel <[email protected]> wrote:
>>>> The cause turned out to be unnecessary atomic accesses to the
>>>> mm_cpumask. When in lazy TLB mode, the CPU is only removed from
>>>> the mm_cpumask if there is a TLB flush event.
>>>>
>>>> Most of the time, no such TLB flush happens, and the kernel
>>>> skips the TLB reload. It can also skip the atomic memory
>>>> set & test.
>>>
>>> The patch looks obvious, and I'm not seeing any very clear
>>> reasons for why we would want that test-and-set to be atomic.
>>> That said, I'd like to have some explicit comments about exactly
>>> why it doesn't need the atomicity. Because afaik, there actually
>>> are concurrent readers _and_ writers of that mask, and the
>>> accesses are not locked by anything here.
>>>
>>> I _think_ the reason for this all being safe is simply that the
>>> only real race is "We need to set the bit before we load the
>>> page table, and we're protected against that bit being cleared
>>> because the TLB state is TLBSTATE_OK and thus TLB flushing will
>>> no longer leave that mm".
>>>
>>> But damn, it all looks subtle as hell. That code does:
>>>
>>>         this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
>>>         BUG_ON(this_cpu_read(cpu_tlbstate.active_mm) != next);
>>>
>>>         if (!cpumask_test_and_set_cpu(cpu, mm_cpumask(next))) {
>>>
>>> and I'm wondering if we need a barrier to make sure that that
>>> TLBSTATE_OK write happens *before* we test the cpumask. With
>>> test_and_set() we have the barrier in the test-and-set. But with
>>> just test_bit, I'm not seeing why the compiler couldn't re-order
>>> them. I suspect it won't, but...
>>
>> cpumask_set_cpu expands to set_bit, which is atomic. Do we need
>> any explicit compiler barrier in addition to the atomic operation
>> performed by set_bit?
>>
>> I would be happy to rewrite the comment right above the
>> cpumask_set_cpu call if you want.
>>
>> -- All rights reversed
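To make the barrier question concrete, the non-atomic variant I have in mind looks roughly like this; the barrier() is there purely for illustration, and whether it is actually required is exactly the open question (a sketch only, not the final patch):

	this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
	BUG_ON(this_cpu_read(cpu_tlbstate.active_mm) != next);

	/*
	 * Keep the compiler from moving the tlbstate write below
	 * the (now non-atomic) bit test.  Interrupts are off here,
	 * so the flush IPI cannot run on this CPU in between, and
	 * no CPU-level barrier should be required.
	 */
	barrier();

	if (!cpumask_test_cpu(cpu, mm_cpumask(next)))
		cpumask_set_cpu(cpu, mm_cpumask(next));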
-- All rights reversed

