On 12/17, Andrea Arcangeli wrote: > > On Tue, Dec 17, 2013 at 05:53:35PM +0100, Oleg Nesterov wrote: > > > > I can't stop thinking about another change. What if we simply change > > __split_huge_page_refcount() to also do compound_lock/unlock(page_tail) > > in a main loop? > > > > This way we can greatly simplify get/put_page paths, we can rely on > > compound_lock(sub-page) and avoid get_page_unless_zero(page_head). > > Yes, this will make _split a bit slower, but PG_compound_lock should > > not be contended? And we should change page_tail->flags carefully, but > > this looks simple. > > > > Or this is not possible/desirable? > > That would be 512 nested spinlocks instead of 1,
Yes. But, just in case, we do not need to take them all at once. We only need to take/release the additional lock per page_tail in a loop. > last time I did > something like that in mm_take_all_locks people weren't too pleased as > it started firing lockdeps complains too. bit_spin_lock() doesn't have the lockdep annotations and it would be trivial to add _nested if necessary. > split_huge_page to the contrary could run in a flood if > you're unlucky. split_huge_page is needed not just to handle non-THP > aware paths that mostly disappeared by now, but also when you truncate > a vma so that a THP doesn't fit in it anymore. So it's up to userland > how frequently it needs to run. Yes, I understand. I _think_ this should be fine performance-wise, compared to other things split_huge_page() paths should do these 511 test_and_set_bit + clear_bit should not be noticeable. But of course I can be wrong. > I think it's reasonable to consider it though, but then it's not > guaranteed that a put_page on a THP tail is more frequent than > split_huge_page. Keep in mind we do the get_page_unless_zero to > stabilize the head to take the compound_lock on it, only for the > tails, never for the heads. So this restricts it to _only_ the > put_page following a gup_fast. Only gup_fast can ever take a reference > on the tail pages of a THP. Nothing else can. I see. But my main point was simplification. And I'm afraid this compound_lock() in get/put can race with someone else playing with page->flags "because I own this freshly allocated page". Perhaps. However. I do understand your concerns, lets forget about this for now. --------------------------------------------------------------------------- Please review get_futex_key() changes we discussed before. As for the race with prep_new_page() I'll send another patch, it is completely orthogonal to this series. Testing. I simply have no idea how can I test this series so I didn't even try ;) I'll try to do something tomorrow, but in any case I have to rely on your review. Most of the cleanups in this series are not needed to change get_futex_key(), but imho make sense. And we need more cleanups after this series, I'll do this later if this series makes any sense. Oleg. include/linux/mm.h | 2 + kernel/futex.c | 37 +-------- mm/swap.c | 251 ++++++++++++++++++++++++++-------------------------- 3 files changed, 128 insertions(+), 162 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/