On Wed, Jun 12, 2013 at 10:50 AM, Davidlohr Bueso
<davidlohr.bu...@hp.com> wrote:
>
> * short: is the big winner for this patch, +69% throughput improvement
> with 100-2000 users. This makes a lot of sense since the workload spends
> a ridiculous amount of time trying to acquire the d_lock:
>
>   84.86%  1569902  reaim  [kernel.kallsyms]  [k] _raw_spin_lock
>           |
>           --- _raw_spin_lock
>           |
>           |--49.96%-- dget_parent
>           |          __fsnotify_parent
>           |--49.71%-- dput

Ugh. Do you have any idea what the heck that thing actually does?

Normally, we shouldn't see lots of dget contention, since the dcache
these days does everything but the last path component locklessly.

But there are a few exceptions, like symlinks (which act as the "last
component" in the middle of a path). And obviously, if some crazy
threaded program opens the *same* file concurrently over and over
again, then that "last component" will hammer on the dentry lock of
that particular path. But that "open the same file concurrently" seems
totally unrealistic - although maybe that's what AIM does..

Anybody know the AIM subtests?

Also, we *may* actually be able to optimize this by making
dentry->d_count atomic, which will allow us to often do dget_parent()
and dput() without taking the dcache lock at all. That's what it used
to be, but the RCU patches actually made it be protected by the
d_lock. It made sense at the time, as a step in the sequence, and many
of the dentry d_count accesses are under the lock, but now that the
remaining hot-paths are dget_parent and dput and many of the dentry
d_count increments are gone from the hot-paths, we might want to
re-visit that decision. It could go either way.

Al, comments?

             Linus
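
To make the atomic d_count idea concrete, here is a minimal userspace
sketch using C11 atomics. It is not kernel code: struct obj,
obj_get_unless_zero() and obj_put() are hypothetical names, and the
sketch assumes that whoever drops the last reference does any teardown
under a lock separately. It only shows how the get and put hot paths
could touch the count without taking a spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a dentry; not the kernel's struct dentry. */
struct obj {
    atomic_int count;   /* reference count, updated without a lock */
};

/* Take a reference only if the object is still live (count > 0). */
static bool obj_get_unless_zero(struct obj *o)
{
    int old = atomic_load(&o->count);

    while (old > 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&o->count, &old, old + 1))
            return true;
    }
    return false;       /* count already hit zero; caller must not use o */
}

/* Drop a reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
    int old = atomic_fetch_sub(&o->count, 1);

    assert(old > 0);    /* dropping a reference we never held is a bug */
    return old == 1;    /* assumption: caller then takes a lock to tear down */
}
```

The compare-and-swap loop is what keeps a racing get from resurrecting
an object whose count has already dropped to zero; a plain atomic
increment would not give that guarantee.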