On Thu, Feb 9, 2017 at 6:53 AM, Al Viro <v...@zeniv.linux.org.uk> wrote: > On Wed, Feb 08, 2017 at 01:48:04PM -0800, Andrew Morton wrote: > >> > This patch detects stale dentry in proc_sys_compare and pretends that >> > it has matching name - revalidation will kill it and lookup restarts. >> > As a result each stale dentry will be seen only once and will not >> > contaminate hash endlessly. >> > >> >> What are "stale" dentries? Unused dentries? If so, why doesn't the >> creation of a new dentry immediately invalidate the old dentry with a >> matching path? What do other filesystems do to prevent this issue? > > The whole point is that it's *NOT* a matching path. Currently ->d_compare() > for /proc/sys tries to make sure that sysctl getting unregistered means > that no extra references will be added to dentries of the stuff we are > trying to kick out. If it's getting unregistered, ->d_compare() won't be > seeing them and from that point on dentry refcount can only go down - > no new lookups will increase it. > > This kludge tries to have _any_ lookup in the same hash chain pick the > first dentry of such stuff, no matter what name/parent it has. Then > it relies upon ->d_revalidate() refusing to accept that sucker, so that > it gets unhashed and we (hopefully) repeat the lookup. > > This is complete garbage. Lookups won't be repeated indefinitely - > if there are several such dentries in the hash chain we search, syscall > will end up failing with ESTALE on thus buggered ->d_compare(), even though > none of those dentries are anywhere near the path we are trying to resolve. > No other filesystem attempts that kind of insanity, and for a good reason. > > The problem it tries to address is that sysctl unregistration doesn't > unhash the now-stale dentries. Before the unregistration we kept them > even with refcount 0, until memory pressure evicts the suckers. After > unregistration we make sure that refcount reaching 0 will cause the > instant eviction. The problem is with the case when they had refcount 0 > to start with. Then the eviction rule does not get triggered - it would have > happened when dropping the last reference, but we don't have any. > > The kludge proposed in that patch is nowhere near being a sane way to deal > with that. Having ->d_compare() notice such dentries and quietly kick > them out would be borderline saner, but > * it's a potentially blocking operation and ->d_compare() is called > in non-blocking contexts, including deep under rcu_read_lock(). > * it's done when walking a hash chain; having that chain modified > by ->d_compare() itself would require some modifications of callers and > those are very hot codepaths. > > I agree that the problem is real, but this is no way to deal with it. > What we want is something along the lines of d_prune_aliases() done for all > inodes corresponding to given sysctl. Done just before erase_header() > in start_unregistering(). That would require maintaining the list of > inodes in question (e.g. anchored in ctl_table_header) and a bit of care > in traversing it (use of igrab(), etc.) > > In the current form - NAK. Sorry.
Ok, Thank you. I've expected that this fix isn't sane, Maybe we could minimize changes for now. For example: keep these stale dentries in memory but silently unhash them in ->d_compare(). Memory processure and reclaimer will kill them later.