On Wed, Jun 12, 2013 at 01:26:25PM -0700, Linus Torvalds wrote:
> For similar reasons, I think you need to still maintain the d_lock in
> d_prune_aliases etc. That's a slow-path, so the fact that we add an
> atomic sequence there doesn't much matter.
>
> However, one optimization missing from your patch is obvious in the
> profile. "dget_parent()" also needs to be optimized - you still have
> that as 99% of the spin-lock case. I think we could do something like
>
>    rcu_read_lock();
>    parent = ACCESS_ONCE(dentry->d_parent);
>    if (atomic_inc_nonzero(&parent->d_count))
>       return parent;
>    .. get d_lock and do it the slow way ...
>    rcu_read_unlock();
>
> to locklessly get the parent pointer. We know "parent" isn't going
> away (dentries are rcu-free'd and we hold the rcu read lock), and I
> think that we can optimistically take *any* parent dentry that
> happened to be valid at one point. As long as the refcount didn't go
> down to zero. Al?

What will you do with __d_rcu_to_refcount()?  Any such scheme has to
hold d_lock for zero->non-zero d_count changes, or atomic_dec_and_lock
in dput() won't help at all.  As it is, both complete_walk() and
unlazy_walk() are grabbing ->d_lock on the dentry we'd reached, so they
can call that sucker.  And that'll give you ->d_lock contention when a
bunch of threads are hitting the same file; I don't see how atomics
would avoid that one...
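
To make the constraint concrete, here is a rough userspace model of the
two halves being discussed. It is a sketch only, not kernel code:
"struct obj", get_if_nonzero(), get_slow() and put_and_lock() are
hypothetical names, a pthread mutex stands in for ->d_lock, and C11
atomics stand in for the kernel's atomic_t helpers (the kernel helper
the quoted pseudocode refers to is spelled atomic_inc_not_zero()).
RCU is not modelled at all; the object is assumed to stay allocated.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a dentry: just a lock and a refcount. */
struct obj {
    pthread_mutex_t lock;   /* plays the role of ->d_lock */
    atomic_int count;       /* plays the role of ->d_count */
};

/* Lockless fast path: take a reference only if count is already non-zero. */
static bool get_if_nonzero(struct obj *o)
{
    int old = atomic_load(&o->count);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&o->count, &old, old + 1))
            return true;
    }
    return false;           /* count was zero: caller must go the slow way */
}

/* Slow path: a zero -> non-zero transition only happens under the lock. */
static void get_slow(struct obj *o)
{
    pthread_mutex_lock(&o->lock);
    atomic_fetch_add(&o->count, 1);
    pthread_mutex_unlock(&o->lock);
}

/*
 * Drop a reference.  Returns true with the lock held if the count reached
 * zero, which is the contract atomic_dec_and_lock() gives its callers.
 */
static bool put_and_lock(struct obj *o)
{
    int old = atomic_load(&o->count);

    /* Decrement locklessly as long as the result stays above zero. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(&o->count, &old, old - 1))
            return false;
    }

    /* Possible 1 -> 0 transition: perform it under the lock. */
    pthread_mutex_lock(&o->lock);
    if (atomic_fetch_sub(&o->count, 1) == 1)
        return true;        /* caller tears down, then unlocks */
    pthread_mutex_unlock(&o->lock);
    return false;
}
```

In this model the last put holds the lock while the count is zero, and
a zero -> non-zero get has to take the same lock, so the two are
serialized against each other.  Every caller that finds the count at
zero ends up in get_slow(), which is where contention on the lock would
come back when many threads hit the same object.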