On 2/3/26 6:26 PM, Al Viro wrote:
On Tue, Feb 03, 2026 at 09:50:02PM +0000, Al Viro wrote:
On Tue, Feb 03, 2026 at 03:32:04PM -0500, Waiman Long wrote:

That is actually a concern that I have at the back of my mind. I can modify
the patch to cache only the dentry and do get/put the mount every time which
is much cheaper as it is a percpu counter.  In that way, a chdir(2) followed
by a umount(2) shouldn't cause a -EBUSY. Right?
Quite - it will just retain a reference to dentry, with filesystem shutdown
being very unhappy about somebody retaining references to objects on the
filesystem about to be taken out...
Sarcasm aside, I wonder if we could do the following trick:
        * a new primitive for "grab or borrow pwd", similar to what fdget()
does for struct file.  If current->fs is shared, do what we do now and return
true; otherwise just copy the contents of current->fs->pwd return false.
        * paired primitive that would take a boolean + struct path * and
do path_put() if boolean is true.
        * syscalls that might alter ->fs, ->fs->pwd or add extra references to
->fs would start with grabbing an extra ref on entry and drop it in the end;
that would make that primitive safe to use there.
        * audit using that thing and storing the result along with the copy
of pwd; on the way out it would use the "put unless borrowed" primitive.

Might or might not be useful - hard to tell without knowing the job mix of those
audit-afflicted production systems.

I'll try to put something along those lines together...

Interesting. So are you thinking about a reference on the pwd inside the fs_struct? I am thinking about instead of getting a reference to pwd.dentry and pwd.mnt, we can just get a reference to the pwd itself. We have to grab the fs lock in get_fs_pwd() anyway. This spinlock doesn't show up as being contended in the customer bug report as each process can has its own fs_struct instead of many processing sharing the same working directory.

In the case of audit, fs->pwd is copied out at the beginning of the open* system call and then put back at the end of that system call. So it should last a pretty short time, i.e. the reference will be released shortly. However, that will make the set_fs_pwd() call a bit tricky to implement, but I think it is still doable.

Cheers,
Longman


Reply via email to