On Mon, Jan 26, 2009 at 01:30:09PM +0100, Peter Zijlstra wrote: > I don't think I was suggesting that. All you need is to serialize any > mkdir/creat against the rmdir of the youngest non-default group, and you > can do that by holding su_mutex. > > In rmdir, you already own all the i_mutex instances you need to uncouple > the whole tree, all you need to do is validate that its indeed empty -- > you don't need i_mutex's for that, because you're holding su_mutex, and > any concurrent mkdir/creat will be blocking on that. > > If you find it empty, just mark everybody DEAD, drop su_mutex and > decouple. All concurrent mkdir/creat thingies that were blocking will > now bail because their parent is found DEAD.
Right, that's what I was talking about when I said: > > Now look in detach_groups(). We drop the groups children before > > marking them DEAD. Louis' plan, I think, is to perhaps mark a group > > DEAD, disconnect it from the vfs, and then operate on its children. In > > this fashion, perhaps we can unlock the trailing lock like a normal VFS > > operation. > > This will require some serious auditing, however, because now > > vfs functions can get into the vfs objects behind us. And more vfs > > changes affect us. Whereas the current locking relies on the vfs's > > parent->child lock ordering only, something that isn't likely to be > > changed. That is, the vfs has already walked past this directory, dropped i_mutex, and is in a child default group holding its i_mutex. It wants to mkdir(2) down there. You're saying that, if mkdir(2) holds su_mutex higher up, it can check S_DEAD and compare with us, and that's exactly the scheme I mentioned in the first of the quoted paragraphs (Louis proposed it a while back). Thus, though we don't hold i_mutex on that child, it will eventually either a) have gotten su_mutex first, and will cause rmdir to ENOTEMPTY or b) have gotten su_mutex second, will see S_DEAD, and return -ENOENT. As far as preventing mkdir(2), I don't see why it wouldn't work. The issue is in that second quoted paragraph. We know that the VFS can look up our children if we're not holding i_mutex. In fact, cached_lookup() can find them without i_mutex. Now, we know that mkdir(2) and rmdir(2) will block at su_mutex. But what about all other file operations, both on the child directories *and* the attribute files? For attribute files, we prevent access at creation time with a flag. We can trust the flag because we hold i_mutex. This might hold anyway because we're holding that toplevel i_mutex. At teardown time, though, we know they can't be found because of i_mutex. Now we don't have that protection for processes that are farther down the tree. But the bigger issue is just the plain regular operations on our directories. An example is ->readdir(). We currently lock out ->readdir() during rmdir(2) with our holding of i_mutex. However, if we are not holding i_mutex, vfs_readdir() can call into our ->readdir() right as we're tearing things down. We may not have gotten to S_DEAD yet in the rmdir(2) path, and the two will race. We can't take su_mutex in ->readdir(), because that sort of solution effectively says "we have to take su_mutex for all operations", and we end up serializing all operations on a subsustem. Now, I can think of a way to make ->readdir() safe without su_mutex. But what other operation is next? How do I know I have them all? How do I notice when someone adds a new operation or code path to the generic code that I have to protect against? With i_mutex, I *know* that everyone has agreed that is the gatekeeper. Without it... *That's* the big worry. That's what I'm worried about being sure of. I'd love to hear a solution that we know will work, or at least move towards one. Joel PS: And we haven't even talked about configfs_depend_item() yet. -- Life's Little Instruction Book #252 "Take good care of those you love." Joel Becker Principal Software Developer Oracle E-mail: joel.bec...@oracle.com Phone: (650) 506-8127