NeilBrown <[email protected]> wrote: > From: NeilBrown <[email protected]> > > The primary purpose of this patch is to remove the locking from > ovl_lookup_real_one() as part of centralising all locking of directories > for name operations. > > The locking here isn't needed. By performing consistency tests after > the lookup we can be sure that the result of the lookup was valid at > least for a moment, which is all the original code promised. > > lookup_noperm_unlocked() is used for the lookup and it will take the > lock if needed only where it is needed. > > Also: > - don't take a reference to real->d_parent. The parent is > only use for a pointer comparison, and no reference is needed for > that. > - Several "if" statements have a "goto" followed by "else" - the > else isn't needed: the following statement can directly follow > the "if" as a new statement > - Use a consistent pattern of setting "err" before performing a test > and possibly going to "fail". > - remove the "out" label (now that we don't need to dput(parent) or > unlock) and simply return from fail:.
Hi everyone, Amir asked me to run these through, and this commit was flagged. I recently changed the prompts to include some vfs knowledge: https://github.com/masoncl/review-prompts/blob/main/kernel/subsystem/vfs.md And I also put in some words to make it prefer reporting problems that might not be bugs, but that do violate some rule in the .md files. This might be a false positive from that new shift toward blindly trusting the guides, but I'll let everyone here decide: commit d1bfcef048896cdc9e4c57e3249b6ad344d64c02 Author: NeilBrown <[email protected]> ovl: Simplify ovl_lookup_real_one() This patch removes locking from ovl_lookup_real_one() and reorganizes the control flow. The function now uses lookup_noperm_unlocked() instead of lookup_noperm() and performs consistency validation after the lookup rather than holding a lock during the operation. > diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c > index 83f80fdb15674..b448fc9424b64 100644 > --- a/fs/overlayfs/export.c > +++ b/fs/overlayfs/export.c > @@ -349,69 +349,64 @@ static struct dentry *ovl_dentry_real_at(struct dentry > *dentry, int idx) > return NULL; > } > > -/* > - * Lookup a child overlay dentry to get a connected overlay dentry whose real > - * dentry is @real. If @real is on upper layer, we lookup a child overlay > - * dentry with the same name as the real dentry. Otherwise, we need to > consult > - * index for lookup. > +/** > + * ovl_lookup_real_one - Lookup a child overlay dentry to get an overlay > dentry whose real dentry is given > + * @connected: parent overlay dentry > + * @real: given child real dentry > + * @layer: layer in which @real exists > + * > + * > + * Lookup a child overlay dentry in @connected with the same name as the > @real > + * dentry. Then check that the parent of the result is the real dentry for > + * @connected, and @real is the real dentry for the result. > + * > + * Returns: > + * %-ECHILD if the parent of @real is no longer the real dentry for > @connected. > + * %-ESTALE if @real is no the real dentry of the found dentry. ^^ There's a typo here: "is no the" should be "is not the". > + * Otherwise the found dentry is returned. > */ > static struct dentry *ovl_lookup_real_one(struct dentry *connected, > struct dentry *real, > const struct ovl_layer *layer) > { > - struct inode *dir = d_inode(connected); > - struct dentry *this, *parent = NULL; > + struct dentry *this; > struct name_snapshot name; > int err; > > /* > - * Lookup child overlay dentry by real name. The dir mutex protects us > - * from racing with overlay rename. If the overlay dentry that is above > - * real has already been moved to a parent that is not under the > - * connected overlay dir, we return -ECHILD and restart the lookup of > - * connected real path from the top. > - */ > - inode_lock_nested(dir, I_MUTEX_PARENT); > - err = -ECHILD; > - parent = dget_parent(real); > - if (ovl_dentry_real_at(connected, layer->idx) != parent) > - goto fail; > - > - /* > - * We also need to take a snapshot of real dentry name to protect us > + * We need to take a snapshot of real dentry name to protect us > * from racing with underlying layer rename. In this case, we don't > * care about returning ESTALE, only from dereferencing a free name > * pointer because we hold no lock on the real dentry. > */ > take_dentry_name_snapshot(&name, real); > - /* > - * No idmap handling here: it's an internal lookup. > - */ > - this = lookup_noperm(&name.name, connected); > + this = lookup_noperm_unlocked(&name.name, connected); > release_dentry_name_snapshot(&name); > + > + err = -ECHILD; > + if (ovl_dentry_real_at(connected, layer->idx) != real->d_parent) > + goto fail; > + A subsystem pattern flags this as potentially concerning: the VFS subsystem guide indicates that TOCTOU validation checking d_parent should also check d_unhashed() because a dentry can be unhashed (removed from dcache by concurrent operations) while its d_parent pointer remains unchanged. The old code held inode_lock_nested(dir, I_MUTEX_PARENT) during the entire lookup and validation, which serialized with operations that could unhash the dentry. The new lockless pattern creates a window where: 1. lookup_noperm_unlocked() completes and returns dentry `this` 2. A concurrent unlink/remove operation unhashes `this` 3. The d_parent validation passes (parent pointer unchanged) 4. The function returns an unhashed dentry Other overlayfs code uses the complete pattern. For example, in ovl_lock_rename_workdir() at fs/overlayfs/util.c: if (work->d_parent != workdir || d_unhashed(work)) { ... } Could this validation miss a race where the dentry is unhashed but d_parent still matches? The subsequent checks (!this->d_inode for -ENOENT, wrong real dentry for -ESTALE) might catch some cases, but is that sufficient for all the scenarios where an unhashed dentry could cause problems? > err = PTR_ERR(this); > - if (IS_ERR(this)) { > + if (IS_ERR(this)) > goto fail; > - } else if (!this || !this->d_inode) { > - dput(this); > - err = -ENOENT; > + > + err = -ENOENT; > + if (!this || !this->d_inode) > goto fail; > - } else if (ovl_dentry_real_at(this, layer->idx) != real) { > - dput(this); > - err = -ESTALE; > + > + err = -ESTALE; > + if (ovl_dentry_real_at(this, layer->idx) != real) > goto fail; > - } > > -out: > - dput(parent); > - inode_unlock(dir); > return this; > > fail: > pr_warn_ratelimited("failed to lookup one by real (%pd2, layer=%d, > connected=%pd2, err=%i)\n", > real, layer->idx, connected, err); > - this = ERR_PTR(err); > - goto out; > + if (!IS_ERR(this)) > + dput(this); > + return ERR_PTR(err); > } > > static struct dentry *ovl_lookup_real(struct super_block *sb,
