subject:"Do we really need d_weak

Re: Do we really need d_weak_revalidate???

2017-09-08 Thread David Howells

Ian Kent  wrote:

> So far only David commented about using ENOENT rather than EREMOTE.
> 
> I prefer ENOENT for this case myself and he didn't object when I
> explained why, David, any concerns?

Not really - it just seems EREMOTE is a better fit since there is something
there, we're just not allowed to follow it.  This is different to a dangling
symlink IMO.

David

Re: Do we really need d_weak_revalidate???

2017-09-08 Thread David Howells

Ian Kent  wrote:

> So far only David commented about using ENOENT rather than EREMOTE.
> 
> I prefer ENOENT for this case myself and he didn't object when I
> explained why, David, any concerns?

Not really - it just seems EREMOTE is a better fit since there is something
there, we're just not allowed to follow it.  This is different to a dangling
symlink IMO.

David

Re: Do we really need d_weak_revalidate???

2017-08-24 Thread Ian Kent

On 24/08/17 19:03, Michael Kerrisk (man-pages) wrote:
> Hi Neil,
> 
> On 24 August 2017 at 06:07, NeilBrown  wrote:
>> On Wed, Aug 23 2017, Ian Kent wrote:
>>
>>>
>>> That inconsistency has bothered me for quite a while now.
>>>
>>> It was carried over from the autofs module behavior when automounting
>>> support was added to the VFS. What's worse is it prevents the use of
>>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>>> statx().
>>>
>>> There is some risk in changing that so it does work but it really does
>>> need to work to enable userspace to not trigger an automount by using
>>> this flag.
>>>
>>> So that's (hopefully) going to change soonish, see:
>>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>>
>>> The result should be that stat family calls don't trigger automounts except
>>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>>
>>
>> oooh, yes.  That's much better - thanks.
>>
>> We should make sure that change gets into the man pages...
>>
>> First however, we should probably correct the man page!
>> stat.2 says:
>>
>>
>>   NOTES
>>On Linux, lstat() will generally not trigger  automounter
>>action,  whereas  stat()  will  (but  see  the description of
>>fstatat() AT_NO_AUTOMOUNT fag, above).
>>
>> which is wrong: lstat and stat treat automounts the same.
>> @Michael: do you recall why you inserted that text?  The commit message
>> in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp
>> discussion in NOTES") is not very helpful.
> 
> That commit really was just cosmetic changes. The change that
> introduced the text was 82d2be3d9d66b7, based on a note from Peter
> Anvin:

Indeed, that was correct for autofs v3 but we're at autofs v5 now and
a lot has changed over time (the commit is from 2008).

All I can do is apologize for not also checking the man pages and trying
to keep them up to date.

Let's just work on making them accurate now.

> 
> [[
> > > Additionally, you may want to make a note in the stat/lstat man page 
> tha
> t on
> > > Linux, lstat(2) will generally not trigger automounter action, whereas
> > > stat(2) will.
> >
> > I don't understand this last piece.  Can you say some more.  (I'm not
> > familiar with automounter details.)
> 
> An automounter (either an explicit one, like autofs, or an implicit
> one, such as are used by AFS or NFSv4) is something that triggers
> a mount when something is touched.
> 
> However, it's undesirable to automount, say, everyone's home
> directory just because someone opened up /home in their GUI
> browser or typed "ls -l /home".  The early automounters simply
> didn't list the contents until you accessed it by name;
> this is still the case when you can't enumerate a mapping
> (say, all DNS names under /net).  However, this is extremely
> inconvenient, too.
> 
> The solution we ended up settling on is to create something
> that looks like a directory (i.e. reports S_IFDIR in stat()),
> but behaves somewhat like a symlink.  In particular, when it is
> accessed in a way where a symlink would be dereferenced,
> the automount triggers and the directory is mounted.  However,
> system calls which do *not* cause a symlink to be dereferenced,
> like lstat(), also do not cause the automounter to trigger.
> This means that "ls -l", or a GUI file browser, can see a list
> of directories without causing each one of them to be automounted.
> 
> -hpa
> ]]
> 
> Cheers,
> 
> Michael
> 
>> I propose correcting to
>>
>>   NOTES:
>>   On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set
>>   and will not trigger automounter action for direct automount
>>   points, though they may (prior to 4.14) for indirect automount
>>   points.
>>
>>
>> The more precise details, that automount action for indirect automount
>> points is not triggered when the 'browse' option is used, is probably
>> not necessary.
>>
>> Ian: if you agree with that text, and Michael doesn't provide alternate
>> evidence, I'll submit a formal patch for the man page or should we
>> just wait until the patch actually lands?
>>
>> Thanks,
>> NeilBrown
>>
> 
> 
>

Re: Do we really need d_weak_revalidate???

2017-08-24 Thread Ian Kent

On 24/08/17 19:03, Michael Kerrisk (man-pages) wrote:
> Hi Neil,
> 
> On 24 August 2017 at 06:07, NeilBrown  wrote:
>> On Wed, Aug 23 2017, Ian Kent wrote:
>>
>>>
>>> That inconsistency has bothered me for quite a while now.
>>>
>>> It was carried over from the autofs module behavior when automounting
>>> support was added to the VFS. What's worse is it prevents the use of
>>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>>> statx().
>>>
>>> There is some risk in changing that so it does work but it really does
>>> need to work to enable userspace to not trigger an automount by using
>>> this flag.
>>>
>>> So that's (hopefully) going to change soonish, see:
>>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>>
>>> The result should be that stat family calls don't trigger automounts except
>>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>>
>>
>> oooh, yes.  That's much better - thanks.
>>
>> We should make sure that change gets into the man pages...
>>
>> First however, we should probably correct the man page!
>> stat.2 says:
>>
>>
>>   NOTES
>>On Linux, lstat() will generally not trigger  automounter
>>action,  whereas  stat()  will  (but  see  the description of
>>fstatat() AT_NO_AUTOMOUNT fag, above).
>>
>> which is wrong: lstat and stat treat automounts the same.
>> @Michael: do you recall why you inserted that text?  The commit message
>> in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp
>> discussion in NOTES") is not very helpful.
> 
> That commit really was just cosmetic changes. The change that
> introduced the text was 82d2be3d9d66b7, based on a note from Peter
> Anvin:

Indeed, that was correct for autofs v3 but we're at autofs v5 now and
a lot has changed over time (the commit is from 2008).

All I can do is apologize for not also checking the man pages and trying
to keep them up to date.

Let's just work on making them accurate now.

> 
> [[
> > > Additionally, you may want to make a note in the stat/lstat man page 
> tha
> t on
> > > Linux, lstat(2) will generally not trigger automounter action, whereas
> > > stat(2) will.
> >
> > I don't understand this last piece.  Can you say some more.  (I'm not
> > familiar with automounter details.)
> 
> An automounter (either an explicit one, like autofs, or an implicit
> one, such as are used by AFS or NFSv4) is something that triggers
> a mount when something is touched.
> 
> However, it's undesirable to automount, say, everyone's home
> directory just because someone opened up /home in their GUI
> browser or typed "ls -l /home".  The early automounters simply
> didn't list the contents until you accessed it by name;
> this is still the case when you can't enumerate a mapping
> (say, all DNS names under /net).  However, this is extremely
> inconvenient, too.
> 
> The solution we ended up settling on is to create something
> that looks like a directory (i.e. reports S_IFDIR in stat()),
> but behaves somewhat like a symlink.  In particular, when it is
> accessed in a way where a symlink would be dereferenced,
> the automount triggers and the directory is mounted.  However,
> system calls which do *not* cause a symlink to be dereferenced,
> like lstat(), also do not cause the automounter to trigger.
> This means that "ls -l", or a GUI file browser, can see a list
> of directories without causing each one of them to be automounted.
> 
> -hpa
> ]]
> 
> Cheers,
> 
> Michael
> 
>> I propose correcting to
>>
>>   NOTES:
>>   On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set
>>   and will not trigger automounter action for direct automount
>>   points, though they may (prior to 4.14) for indirect automount
>>   points.
>>
>>
>> The more precise details, that automount action for indirect automount
>> points is not triggered when the 'browse' option is used, is probably
>> not necessary.
>>
>> Ian: if you agree with that text, and Michael doesn't provide alternate
>> evidence, I'll submit a formal patch for the man page or should we
>> just wait until the patch actually lands?
>>
>> Thanks,
>> NeilBrown
>>
> 
> 
>

Re: Do we really need d_weak_revalidate???

2017-08-24 Thread Michael Kerrisk (man-pages)

Hi Neil,

On 24 August 2017 at 06:07, NeilBrown  wrote:
> On Wed, Aug 23 2017, Ian Kent wrote:
>
>>
>> That inconsistency has bothered me for quite a while now.
>>
>> It was carried over from the autofs module behavior when automounting
>> support was added to the VFS. What's worse is it prevents the use of
>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>> statx().
>>
>> There is some risk in changing that so it does work but it really does
>> need to work to enable userspace to not trigger an automount by using
>> this flag.
>>
>> So that's (hopefully) going to change soonish, see:
>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>
>> The result should be that stat family calls don't trigger automounts except
>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>
>
> oooh, yes.  That's much better - thanks.
>
> We should make sure that change gets into the man pages...
>
> First however, we should probably correct the man page!
> stat.2 says:
>
>
>   NOTES
>On Linux, lstat() will generally not trigger  automounter
>action,  whereas  stat()  will  (but  see  the description of
>fstatat() AT_NO_AUTOMOUNT fag, above).
>
> which is wrong: lstat and stat treat automounts the same.
> @Michael: do you recall why you inserted that text?  The commit message
> in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp
> discussion in NOTES") is not very helpful.

That commit really was just cosmetic changes. The change that
introduced the text was 82d2be3d9d66b7, based on a note from Peter
Anvin:

[[
> > Additionally, you may want to make a note in the stat/lstat man page tha
t on
> > Linux, lstat(2) will generally not trigger automounter action, whereas
> > stat(2) will.
>
> I don't understand this last piece.  Can you say some more.  (I'm not
> familiar with automounter details.)

An automounter (either an explicit one, like autofs, or an implicit
one, such as are used by AFS or NFSv4) is something that triggers
a mount when something is touched.

However, it's undesirable to automount, say, everyone's home
directory just because someone opened up /home in their GUI
browser or typed "ls -l /home".  The early automounters simply
didn't list the contents until you accessed it by name;
this is still the case when you can't enumerate a mapping
(say, all DNS names under /net).  However, this is extremely
inconvenient, too.

The solution we ended up settling on is to create something
that looks like a directory (i.e. reports S_IFDIR in stat()),
but behaves somewhat like a symlink.  In particular, when it is
accessed in a way where a symlink would be dereferenced,
the automount triggers and the directory is mounted.  However,
system calls which do *not* cause a symlink to be dereferenced,
like lstat(), also do not cause the automounter to trigger.
This means that "ls -l", or a GUI file browser, can see a list
of directories without causing each one of them to be automounted.

-hpa
]]

Cheers,

Michael

> I propose correcting to
>
>   NOTES:
>   On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set
>   and will not trigger automounter action for direct automount
>   points, though they may (prior to 4.14) for indirect automount
>   points.
>
>
> The more precise details, that automount action for indirect automount
> points is not triggered when the 'browse' option is used, is probably
> not necessary.
>
> Ian: if you agree with that text, and Michael doesn't provide alternate
> evidence, I'll submit a formal patch for the man page or should we
> just wait until the patch actually lands?
>
> Thanks,
> NeilBrown
>

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

Re: Do we really need d_weak_revalidate???

2017-08-24 Thread Michael Kerrisk (man-pages)

Hi Neil,

On 24 August 2017 at 06:07, NeilBrown  wrote:
> On Wed, Aug 23 2017, Ian Kent wrote:
>
>>
>> That inconsistency has bothered me for quite a while now.
>>
>> It was carried over from the autofs module behavior when automounting
>> support was added to the VFS. What's worse is it prevents the use of
>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>> statx().
>>
>> There is some risk in changing that so it does work but it really does
>> need to work to enable userspace to not trigger an automount by using
>> this flag.
>>
>> So that's (hopefully) going to change soonish, see:
>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>
>> The result should be that stat family calls don't trigger automounts except
>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>
>
> oooh, yes.  That's much better - thanks.
>
> We should make sure that change gets into the man pages...
>
> First however, we should probably correct the man page!
> stat.2 says:
>
>
>   NOTES
>On Linux, lstat() will generally not trigger  automounter
>action,  whereas  stat()  will  (but  see  the description of
>fstatat() AT_NO_AUTOMOUNT fag, above).
>
> which is wrong: lstat and stat treat automounts the same.
> @Michael: do you recall why you inserted that text?  The commit message
> in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp
> discussion in NOTES") is not very helpful.

That commit really was just cosmetic changes. The change that
introduced the text was 82d2be3d9d66b7, based on a note from Peter
Anvin:

[[
> > Additionally, you may want to make a note in the stat/lstat man page tha
t on
> > Linux, lstat(2) will generally not trigger automounter action, whereas
> > stat(2) will.
>
> I don't understand this last piece.  Can you say some more.  (I'm not
> familiar with automounter details.)

An automounter (either an explicit one, like autofs, or an implicit
one, such as are used by AFS or NFSv4) is something that triggers
a mount when something is touched.

However, it's undesirable to automount, say, everyone's home
directory just because someone opened up /home in their GUI
browser or typed "ls -l /home".  The early automounters simply
didn't list the contents until you accessed it by name;
this is still the case when you can't enumerate a mapping
(say, all DNS names under /net).  However, this is extremely
inconvenient, too.

The solution we ended up settling on is to create something
that looks like a directory (i.e. reports S_IFDIR in stat()),
but behaves somewhat like a symlink.  In particular, when it is
accessed in a way where a symlink would be dereferenced,
the automount triggers and the directory is mounted.  However,
system calls which do *not* cause a symlink to be dereferenced,
like lstat(), also do not cause the automounter to trigger.
This means that "ls -l", or a GUI file browser, can see a list
of directories without causing each one of them to be automounted.

-hpa
]]

Cheers,

Michael

> I propose correcting to
>
>   NOTES:
>   On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set
>   and will not trigger automounter action for direct automount
>   points, though they may (prior to 4.14) for indirect automount
>   points.
>
>
> The more precise details, that automount action for indirect automount
> points is not triggered when the 'browse' option is used, is probably
> not necessary.
>
> Ian: if you agree with that text, and Michael doesn't provide alternate
> evidence, I'll submit a formal patch for the man page or should we
> just wait until the patch actually lands?
>
> Thanks,
> NeilBrown
>

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

Re: Do we really need d_weak_revalidate???

2017-08-24 Thread NeilBrown

On Mon, Aug 14 2017, NeilBrown wrote:

> On Fri, Aug 11 2017, Trond Myklebust wrote:
>
>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>> and added the new dentry operation to NFS dentries  but not to
>>> NFSv4
>>> dentries.
>>> 
>>> And nobody noticed.
>>> 
>>> Until today.
>>> 
>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>> NFS
>>> filesystem hangs because the network has been deconfigured.  This
>>> makes
>>> perfect sense and I suggested a code change to fix the problem.
>>> However when a colleague was trying to reproduce the problem to
>>> validate
>>> the fix, he couldn't.  Then nor could I.
>>> 
>>> The problem is trivially reproducible with NFSv3, and not at all with
>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>> 
>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>> has been missing for 4.5 years, and the only time anyone noticed was
>>> when the ommission resulted in a better user experience, I do wonder
>>> if
>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>> does
>>> it serve?  I couldn't find one.
>>> 
>>> Thanks,
>>> NeilBrown
>>> 
>>> For reference, see
>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>> d_weak_revalidate dentry op")
>>> 
>>> 
>>> 
>>> To reproduce the problem at home, on a system that uses systemd:
>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>> 3/ loop-mount the filesystem image read-only somewhere
>>> 4/ reboot
>>> 
>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>> timeout.
>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>> shutdown while
>>> remounting the nfs filesystem read-only.
>>> 
>>> If you don't use "noac" it can still hang, but only if something
>>> slows
>>> down the reboot enough that attributes have timed out by the time
>>> that
>>> systemd-shutdown runs.  This happens for our customer.
>>> 
>>> If the loop-mounted filesystem is not read-only, you get other
>>> problems.
>>> 
>>> We really want systemd to figure out that the loop-mount needs to be
>>> unmounted first.  I have ideas concerning that, but it is messy.  But
>>> that isn't the only bug here.
>>
>> The main purpose of d_weak_revalidate() was to catch the issues that
>> arise when someone changes the contents of the current working
>> directory or its parent on the server. Since '.' and '..' are treated
>> specially in the lookup code, they would not be revalidated without
>> special treatment. That leads to issues when looking up files as
>> ./ or ../, since the client won't detect that its
>> dcache is stale until it tries to use the cached dentry+inode.
>
> I don't think that is quite right.
> d_weak_revalidate() is only called from complete_walk() if LOOKUP_JUMPED
> is set.  The happens when the final component of a path:
>  - is a mount point
>  - is ".."
> or if the whole path is "/".  I thought "." was treated specially too,
> but I cannot find that in the code.

Actually, you were very close to the right answer, and I was missing
something important.

The issue (or, at least "an" issue) happens when you open "." or ".." or
a mount point, or a /proc/*/fd/* symlink.

In each case LOOKUP_JUMPED is set.  "." doesn't set it, but it doesn't
clear it either and it is always set at the start of a path lookup.

When you open a file (or directory) on NFS you need to validate the
attributes to ensure close-to-open consistency rules are met.
When you open any path that ends with a LAST_NORM name, d_revalidate will
be passed the LOOKUP_OPEN flag and so nfs_lookup_verify_inode() will
force a revalidate with __nfs_revalidate_inode().

When you open something that ends with LOOKUP_JUMPED, the task of
forcing the revalidate falls to d_weak_revalidate().  Unfortunately it
doesn't actually do that.  With NFSv4, there is no d_weak_revalidate().
With NFSv3 there is - but it doesn't know if LOOKUP_JUMPED is set, and
doesn't force the revalidate.

This means that if you
   echo *
or
   echo ../*

there might be no communication with the server, and you might get stale
data.

These command *do* work as expected only when the directory being listed
is a mountpoint.  This is because nfs_opendir() contains:

if (filp->f_path.dentry == filp->f_path.mnt->mnt_root) {
/* This is a mountpoint, so d_revalidate will never
 * have been called, so we need to refresh the
 * inode (for close-open consistency) ourselves.
 */
__nfs_revalidate_inode(NFS_SERVER(inode), inode);
}

which I put there some years ago, when things worked differently.

There are various ways we

Re: Do we really need d_weak_revalidate???

2017-08-24 Thread NeilBrown

On Mon, Aug 14 2017, NeilBrown wrote:

> On Fri, Aug 11 2017, Trond Myklebust wrote:
>
>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>> and added the new dentry operation to NFS dentries  but not to
>>> NFSv4
>>> dentries.
>>> 
>>> And nobody noticed.
>>> 
>>> Until today.
>>> 
>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>> NFS
>>> filesystem hangs because the network has been deconfigured.  This
>>> makes
>>> perfect sense and I suggested a code change to fix the problem.
>>> However when a colleague was trying to reproduce the problem to
>>> validate
>>> the fix, he couldn't.  Then nor could I.
>>> 
>>> The problem is trivially reproducible with NFSv3, and not at all with
>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>> 
>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>> has been missing for 4.5 years, and the only time anyone noticed was
>>> when the ommission resulted in a better user experience, I do wonder
>>> if
>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>> does
>>> it serve?  I couldn't find one.
>>> 
>>> Thanks,
>>> NeilBrown
>>> 
>>> For reference, see
>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>> d_weak_revalidate dentry op")
>>> 
>>> 
>>> 
>>> To reproduce the problem at home, on a system that uses systemd:
>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>> 3/ loop-mount the filesystem image read-only somewhere
>>> 4/ reboot
>>> 
>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>> timeout.
>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>> shutdown while
>>> remounting the nfs filesystem read-only.
>>> 
>>> If you don't use "noac" it can still hang, but only if something
>>> slows
>>> down the reboot enough that attributes have timed out by the time
>>> that
>>> systemd-shutdown runs.  This happens for our customer.
>>> 
>>> If the loop-mounted filesystem is not read-only, you get other
>>> problems.
>>> 
>>> We really want systemd to figure out that the loop-mount needs to be
>>> unmounted first.  I have ideas concerning that, but it is messy.  But
>>> that isn't the only bug here.
>>
>> The main purpose of d_weak_revalidate() was to catch the issues that
>> arise when someone changes the contents of the current working
>> directory or its parent on the server. Since '.' and '..' are treated
>> specially in the lookup code, they would not be revalidated without
>> special treatment. That leads to issues when looking up files as
>> ./ or ../, since the client won't detect that its
>> dcache is stale until it tries to use the cached dentry+inode.
>
> I don't think that is quite right.
> d_weak_revalidate() is only called from complete_walk() if LOOKUP_JUMPED
> is set.  The happens when the final component of a path:
>  - is a mount point
>  - is ".."
> or if the whole path is "/".  I thought "." was treated specially too,
> but I cannot find that in the code.

Actually, you were very close to the right answer, and I was missing
something important.

The issue (or, at least "an" issue) happens when you open "." or ".." or
a mount point, or a /proc/*/fd/* symlink.

In each case LOOKUP_JUMPED is set.  "." doesn't set it, but it doesn't
clear it either and it is always set at the start of a path lookup.

When you open a file (or directory) on NFS you need to validate the
attributes to ensure close-to-open consistency rules are met.
When you open any path that ends with a LAST_NORM name, d_revalidate will
be passed the LOOKUP_OPEN flag and so nfs_lookup_verify_inode() will
force a revalidate with __nfs_revalidate_inode().

When you open something that ends with LOOKUP_JUMPED, the task of
forcing the revalidate falls to d_weak_revalidate().  Unfortunately it
doesn't actually do that.  With NFSv4, there is no d_weak_revalidate().
With NFSv3 there is - but it doesn't know if LOOKUP_JUMPED is set, and
doesn't force the revalidate.

This means that if you
   echo *
or
   echo ../*

there might be no communication with the server, and you might get stale
data.

These command *do* work as expected only when the directory being listed
is a mountpoint.  This is because nfs_opendir() contains:

if (filp->f_path.dentry == filp->f_path.mnt->mnt_root) {
/* This is a mountpoint, so d_revalidate will never
 * have been called, so we need to refresh the
 * inode (for close-open consistency) ourselves.
 */
__nfs_revalidate_inode(NFS_SERVER(inode), inode);
}

which I put there some years ago, when things worked differently.

There are various ways we

Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent

On 24/08/17 12:07, NeilBrown wrote:
> 
> 
> The more precise details, that automount action for indirect automount
> points is not triggered when the 'browse' option is used, is probably
> not necessary.
> 
> Ian: if you agree with that text, and Michael doesn't provide alternate
> evidence, I'll submit a formal patch for the man page or should we
> just wait until the patch actually lands?

So far only David commented about using ENOENT rather than EREMOTE.

I prefer ENOENT for this case myself and he didn't object when I
explained why, David, any concerns?

Al has been silent so far so either he hasn't seen it or he's ok with
it, Al, any concerns?

And I guess if there are no concerns there's a good chance Andrew is
ok with it for the next merge window, Andrew?

If everyone agrees then we could go ahead immediately so there's a
better chance of getting it into released man pages closer to the
change being merged.

Ian

Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent

On 24/08/17 12:07, NeilBrown wrote:
> 
> 
> The more precise details, that automount action for indirect automount
> points is not triggered when the 'browse' option is used, is probably
> not necessary.
> 
> Ian: if you agree with that text, and Michael doesn't provide alternate
> evidence, I'll submit a formal patch for the man page or should we
> just wait until the patch actually lands?

So far only David commented about using ENOENT rather than EREMOTE.

I prefer ENOENT for this case myself and he didn't object when I
explained why, David, any concerns?

Al has been silent so far so either he hasn't seen it or he's ok with
it, Al, any concerns?

And I guess if there are no concerns there's a good chance Andrew is
ok with it for the next merge window, Andrew?

If everyone agrees then we could go ahead immediately so there's a
better chance of getting it into released man pages closer to the
change being merged.

Ian

Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent

On 24/08/17 12:07, NeilBrown wrote:
> On Wed, Aug 23 2017, Ian Kent wrote:
> 
>>
>> That inconsistency has bothered me for quite a while now.
>>
>> It was carried over from the autofs module behavior when automounting
>> support was added to the VFS. What's worse is it prevents the use of
>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>> statx().
>>
>> There is some risk in changing that so it does work but it really does
>> need to work to enable userspace to not trigger an automount by using
>> this flag.
>>
>> So that's (hopefully) going to change soonish, see:
>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>
>> The result should be that stat family calls don't trigger automounts except
>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>
> 
> oooh, yes.  That's much better - thanks.
> 
> We should make sure that change gets into the man pages...

Yes, I was wondering who to contact for that.

> 
> First however, we should probably correct the man page!
> stat.2 says:
> 
> 
>   NOTES
>On Linux, lstat() will generally not trigger  automounter
>action,  whereas  stat()  will  (but  see  the description of
>fstatat() AT_NO_AUTOMOUNT fag, above).
> 
> which is wrong: lstat and stat treat automounts the same.
> @Michael: do you recall why you inserted that text?  The commit message
> in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp
> discussion in NOTES") is not very helpful.
> 
> I propose correcting to
> 
>   NOTES:
>   On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set
>   and will not trigger automounter action for direct automount
>   points, though they may (prior to 4.14) for indirect automount
>   points.

Shouldn't that be "lstat() and stat() act as though AT_NO_AUTOMOUNT is
set ..."

> 
> 
> The more precise details, that automount action for indirect automount
> points is not triggered when the 'browse' option is used, is probably
> not necessary.
> 
> Ian: if you agree with that text, and Michael doesn't provide alternate
> evidence, I'll submit a formal patch for the man page or should we
> just wait until the patch actually lands?

I thought the fstatat() description needed attention too, doubly so with
the AT_NO_AUTOMOUNT change.

The "The fstatat() system call operates in exactly the same way as stat()"
is wrong in the same way as the stat() description was wrong.

After the change fstatat() will trigger automounts if the AT_NO_AUTOMOUNT
flag is not given which is different from lstat() and stat().

The updated NOTE above probably needs to be referred to in order to clarify
what's meant by "in exactly the same way" since that probably refers to the
information returned rather than whether an automount will be done.

Ian

Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent

On 24/08/17 12:07, NeilBrown wrote:
> On Wed, Aug 23 2017, Ian Kent wrote:
> 
>>
>> That inconsistency has bothered me for quite a while now.
>>
>> It was carried over from the autofs module behavior when automounting
>> support was added to the VFS. What's worse is it prevents the use of
>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>> statx().
>>
>> There is some risk in changing that so it does work but it really does
>> need to work to enable userspace to not trigger an automount by using
>> this flag.
>>
>> So that's (hopefully) going to change soonish, see:
>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>
>> The result should be that stat family calls don't trigger automounts except
>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>
> 
> oooh, yes.  That's much better - thanks.
> 
> We should make sure that change gets into the man pages...

Yes, I was wondering who to contact for that.

> 
> First however, we should probably correct the man page!
> stat.2 says:
> 
> 
>   NOTES
>On Linux, lstat() will generally not trigger  automounter
>action,  whereas  stat()  will  (but  see  the description of
>fstatat() AT_NO_AUTOMOUNT fag, above).
> 
> which is wrong: lstat and stat treat automounts the same.
> @Michael: do you recall why you inserted that text?  The commit message
> in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp
> discussion in NOTES") is not very helpful.
> 
> I propose correcting to
> 
>   NOTES:
>   On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set
>   and will not trigger automounter action for direct automount
>   points, though they may (prior to 4.14) for indirect automount
>   points.

Shouldn't that be "lstat() and stat() act as though AT_NO_AUTOMOUNT is
set ..."

> 
> 
> The more precise details, that automount action for indirect automount
> points is not triggered when the 'browse' option is used, is probably
> not necessary.
> 
> Ian: if you agree with that text, and Michael doesn't provide alternate
> evidence, I'll submit a formal patch for the man page or should we
> just wait until the patch actually lands?

I thought the fstatat() description needed attention too, doubly so with
the AT_NO_AUTOMOUNT change.

The "The fstatat() system call operates in exactly the same way as stat()"
is wrong in the same way as the stat() description was wrong.

After the change fstatat() will trigger automounts if the AT_NO_AUTOMOUNT
flag is not given which is different from lstat() and stat().

The updated NOTE above probably needs to be referred to in order to clarify
what's meant by "in exactly the same way" since that probably refers to the
information returned rather than whether an automount will be done.

Ian

Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent

On 24/08/17 11:21, NeilBrown wrote:
> On Wed, Aug 23 2017, Ian Kent wrote:
> 
>> On 23/08/17 10:32, Ian Kent wrote:
>>> On 23/08/17 09:06, NeilBrown wrote:
 On Mon, Aug 21 2017, Ian Kent wrote:

>>
>> A mount isn't triggered by kern_path(pathname, 0, ).
>> That '0' would need to include one of
>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>
>> to trigger an automount (otherwise you just get -EISDIR).
>
> It's perfectly sensible to think that but there is a case where a
> a mount is triggered when using kern_path().
>
> The EISDIR return occurs for positive dentrys, negative dentrys
> will still trigger an automount (which is autofs specific,
> indirect mount map using nobrowse option, the install default).

 Ok, I understand this better now.  This difference between direct and
 indirect mounts is slightly awkward. It is visible from user-space, but
 not elegant to document.
 When you use O_PATH to open a direct automount that has not already been
 triggered, the open returns the underlying directory (and fstatfs
 confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
 an indirect automount, it *will* trigger the automount when "nobrowse" is
 in effect, but it won't when "browse" is in effect.
>>>
>>> That inconsistency has bothered me for quite a while now.
>>>
>>> It was carried over from the autofs module behavior when automounting
>>> support was added to the VFS. What's worse is it prevents the use of
>>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>>> statx().
>>>
>>> There is some risk in changing that so it does work but it really does
>>> need to work to enable userspace to not trigger an automount by using
>>> this flag.
>>>
>>> So that's (hopefully) going to change soonish, see:
>>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>>
>>> The result should be that stat family calls don't trigger automounts except
>>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>>

 So we cannot just say "O_PATH doesn't trigger automounts", which is
 essentially what I said in

 https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94

 It might be possible to modify automount so that it was more consistent
 - i.e. if the point is triggered by a mkdir has been done, just to the
 mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
 guess that might be racy, and in any case is hard to justify.

 Maybe I should change it to be about "direct automounts", and add a note
 that indirect automounts aren't so predictable.
>>>
>>> Right and the semantics should be much more consistent in the near future.
>>> I hope (and expect) this semantic change won't cause problems.
>>>

 But back to my original issue of wanting to discard
 kern_path_mountpoint, what would you think of the following approach -
 slight revised from before.

 Thanks,
 NeilBrown

 diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
 index beef981aa54f..7663ea82e68d 100644
 --- a/fs/autofs4/autofs_i.h
 +++ b/fs/autofs4/autofs_i.h
 @@ -135,10 +135,13 @@ static inline struct autofs_info 
 *autofs4_dentry_ino(struct dentry *dentry)
  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
   * processes which do manipulations for us in user space sees the raw
   * filesystem without "magic".)
 + * A process performing certain ioctls can get temporary oz status.
   */
 +extern struct task_struct *autofs_tmp_oz;
  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
  {
 -  return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
 +  return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
 +  autofs_tmp_oz == current;
  }
  
  struct inode *autofs4_get_inode(struct super_block *, umode_t);
 diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
 index dd9f1bebb5a3..d76401669a20 100644
 --- a/fs/autofs4/dev-ioctl.c
 +++ b/fs/autofs4/dev-ioctl.c
 @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file 
 *fp,
return 0;
  }
  
 +struct task_struct *autofs_tmp_oz;
 +int kern_path_oz(const char *pathname, int flags, struct path *path)
 +{
 +  static DEFINE_MUTEX(autofs_oz);
 +  int err;
 +
 +  mutex_lock(_oz);
 +  autofs_tmp_oz = current;
 +  err = kern_path(pathname, flags, path);
 +  autofs_tmp_oz = NULL;
 +  mutex_unlock(_oz);
 +  return err;
 +}
 +
>>>
>>> It's simple enough but does look like it will attract criticism as being
>>> a hack!
>>>
>>> The kern_path_locked() function is very similar to what was

Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent

On 24/08/17 11:21, NeilBrown wrote:
> On Wed, Aug 23 2017, Ian Kent wrote:
> 
>> On 23/08/17 10:32, Ian Kent wrote:
>>> On 23/08/17 09:06, NeilBrown wrote:
 On Mon, Aug 21 2017, Ian Kent wrote:

>>
>> A mount isn't triggered by kern_path(pathname, 0, ).
>> That '0' would need to include one of
>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>
>> to trigger an automount (otherwise you just get -EISDIR).
>
> It's perfectly sensible to think that but there is a case where a
> a mount is triggered when using kern_path().
>
> The EISDIR return occurs for positive dentrys, negative dentrys
> will still trigger an automount (which is autofs specific,
> indirect mount map using nobrowse option, the install default).

 Ok, I understand this better now.  This difference between direct and
 indirect mounts is slightly awkward. It is visible from user-space, but
 not elegant to document.
 When you use O_PATH to open a direct automount that has not already been
 triggered, the open returns the underlying directory (and fstatfs
 confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
 an indirect automount, it *will* trigger the automount when "nobrowse" is
 in effect, but it won't when "browse" is in effect.
>>>
>>> That inconsistency has bothered me for quite a while now.
>>>
>>> It was carried over from the autofs module behavior when automounting
>>> support was added to the VFS. What's worse is it prevents the use of
>>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>>> statx().
>>>
>>> There is some risk in changing that so it does work but it really does
>>> need to work to enable userspace to not trigger an automount by using
>>> this flag.
>>>
>>> So that's (hopefully) going to change soonish, see:
>>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>>
>>> The result should be that stat family calls don't trigger automounts except
>>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>>

 So we cannot just say "O_PATH doesn't trigger automounts", which is
 essentially what I said in

 https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94

 It might be possible to modify automount so that it was more consistent
 - i.e. if the point is triggered by a mkdir has been done, just to the
 mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
 guess that might be racy, and in any case is hard to justify.

 Maybe I should change it to be about "direct automounts", and add a note
 that indirect automounts aren't so predictable.
>>>
>>> Right and the semantics should be much more consistent in the near future.
>>> I hope (and expect) this semantic change won't cause problems.
>>>

 But back to my original issue of wanting to discard
 kern_path_mountpoint, what would you think of the following approach -
 slight revised from before.

 Thanks,
 NeilBrown

 diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
 index beef981aa54f..7663ea82e68d 100644
 --- a/fs/autofs4/autofs_i.h
 +++ b/fs/autofs4/autofs_i.h
 @@ -135,10 +135,13 @@ static inline struct autofs_info 
 *autofs4_dentry_ino(struct dentry *dentry)
  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
   * processes which do manipulations for us in user space sees the raw
   * filesystem without "magic".)
 + * A process performing certain ioctls can get temporary oz status.
   */
 +extern struct task_struct *autofs_tmp_oz;
  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
  {
 -  return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
 +  return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
 +  autofs_tmp_oz == current;
  }
  
  struct inode *autofs4_get_inode(struct super_block *, umode_t);
 diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
 index dd9f1bebb5a3..d76401669a20 100644
 --- a/fs/autofs4/dev-ioctl.c
 +++ b/fs/autofs4/dev-ioctl.c
 @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file 
 *fp,
return 0;
  }
  
 +struct task_struct *autofs_tmp_oz;
 +int kern_path_oz(const char *pathname, int flags, struct path *path)
 +{
 +  static DEFINE_MUTEX(autofs_oz);
 +  int err;
 +
 +  mutex_lock(_oz);
 +  autofs_tmp_oz = current;
 +  err = kern_path(pathname, flags, path);
 +  autofs_tmp_oz = NULL;
 +  mutex_unlock(_oz);
 +  return err;
 +}
 +
>>>
>>> It's simple enough but does look like it will attract criticism as being
>>> a hack!
>>>
>>> The kern_path_locked() function is very similar to what was

Re: Do we really need d_weak_revalidate???

2017-08-23 Thread NeilBrown

On Wed, Aug 23 2017, Ian Kent wrote:

>
> That inconsistency has bothered me for quite a while now.
>
> It was carried over from the autofs module behavior when automounting
> support was added to the VFS. What's worse is it prevents the use of
> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
> statx().
>
> There is some risk in changing that so it does work but it really does
> need to work to enable userspace to not trigger an automount by using
> this flag.
>
> So that's (hopefully) going to change soonish, see:
> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>
> The result should be that stat family calls don't trigger automounts except
> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>

oooh, yes.  That's much better - thanks.

We should make sure that change gets into the man pages...

First however, we should probably correct the man page!
stat.2 says:

  NOTES
   On Linux, lstat() will generally not trigger  automounter
   action,  whereas  stat()  will  (but  see  the description of
   fstatat() AT_NO_AUTOMOUNT fag, above).

which is wrong: lstat and stat treat automounts the same.
@Michael: do you recall why you inserted that text?  The commit message
in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp
discussion in NOTES") is not very helpful.

I propose correcting to

  NOTES:
  On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set
  and will not trigger automounter action for direct automount
  points, though they may (prior to 4.14) for indirect automount
  points.

The more precise details, that automount action for indirect automount
points is not triggered when the 'browse' option is used, is probably
not necessary.

Ian: if you agree with that text, and Michael doesn't provide alternate
evidence, I'll submit a formal patch for the man page or should we
just wait until the patch actually lands?

Thanks,
NeilBrown

signature.asc
Description: PGP signature

Re: Do we really need d_weak_revalidate???

2017-08-23 Thread NeilBrown

On Wed, Aug 23 2017, Ian Kent wrote:

>
> That inconsistency has bothered me for quite a while now.
>
> It was carried over from the autofs module behavior when automounting
> support was added to the VFS. What's worse is it prevents the use of
> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
> statx().
>
> There is some risk in changing that so it does work but it really does
> need to work to enable userspace to not trigger an automount by using
> this flag.
>
> So that's (hopefully) going to change soonish, see:
> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>
> The result should be that stat family calls don't trigger automounts except
> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>

oooh, yes.  That's much better - thanks.

We should make sure that change gets into the man pages...

First however, we should probably correct the man page!
stat.2 says:

  NOTES
   On Linux, lstat() will generally not trigger  automounter
   action,  whereas  stat()  will  (but  see  the description of
   fstatat() AT_NO_AUTOMOUNT fag, above).

which is wrong: lstat and stat treat automounts the same.
@Michael: do you recall why you inserted that text?  The commit message
in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp
discussion in NOTES") is not very helpful.

I propose correcting to

  NOTES:
  On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set
  and will not trigger automounter action for direct automount
  points, though they may (prior to 4.14) for indirect automount
  points.

The more precise details, that automount action for indirect automount
points is not triggered when the 'browse' option is used, is probably
not necessary.

Ian: if you agree with that text, and Michael doesn't provide alternate
evidence, I'll submit a formal patch for the man page or should we
just wait until the patch actually lands?

Thanks,
NeilBrown

signature.asc
Description: PGP signature

Re: Do we really need d_weak_revalidate???

2017-08-23 Thread NeilBrown

On Wed, Aug 23 2017, Ian Kent wrote:

> On 23/08/17 10:32, Ian Kent wrote:
>> On 23/08/17 09:06, NeilBrown wrote:
>>> On Mon, Aug 21 2017, Ian Kent wrote:
>>>
>
> A mount isn't triggered by kern_path(pathname, 0, ).
> That '0' would need to include one of
>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>
> to trigger an automount (otherwise you just get -EISDIR).

 It's perfectly sensible to think that but there is a case where a
 a mount is triggered when using kern_path().

 The EISDIR return occurs for positive dentrys, negative dentrys
 will still trigger an automount (which is autofs specific,
 indirect mount map using nobrowse option, the install default).
>>>
>>> Ok, I understand this better now.  This difference between direct and
>>> indirect mounts is slightly awkward. It is visible from user-space, but
>>> not elegant to document.
>>> When you use O_PATH to open a direct automount that has not already been
>>> triggered, the open returns the underlying directory (and fstatfs
>>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>>> in effect, but it won't when "browse" is in effect.
>> 
>> That inconsistency has bothered me for quite a while now.
>> 
>> It was carried over from the autofs module behavior when automounting
>> support was added to the VFS. What's worse is it prevents the use of
>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>> statx().
>> 
>> There is some risk in changing that so it does work but it really does
>> need to work to enable userspace to not trigger an automount by using
>> this flag.
>> 
>> So that's (hopefully) going to change soonish, see:
>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>> 
>> The result should be that stat family calls don't trigger automounts except
>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>> 
>>>
>>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>>> essentially what I said in
>>>
>>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>>
>>> It might be possible to modify automount so that it was more consistent
>>> - i.e. if the point is triggered by a mkdir has been done, just to the
>>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>>> guess that might be racy, and in any case is hard to justify.
>>>
>>> Maybe I should change it to be about "direct automounts", and add a note
>>> that indirect automounts aren't so predictable.
>> 
>> Right and the semantics should be much more consistent in the near future.
>> I hope (and expect) this semantic change won't cause problems.
>> 
>>>
>>> But back to my original issue of wanting to discard
>>> kern_path_mountpoint, what would you think of the following approach -
>>> slight revised from before.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>> index beef981aa54f..7663ea82e68d 100644
>>> --- a/fs/autofs4/autofs_i.h
>>> +++ b/fs/autofs4/autofs_i.h
>>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>>> *autofs4_dentry_ino(struct dentry *dentry)
>>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>>   * processes which do manipulations for us in user space sees the raw
>>>   * filesystem without "magic".)
>>> + * A process performing certain ioctls can get temporary oz status.
>>>   */
>>> +extern struct task_struct *autofs_tmp_oz;
>>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>>  {
>>> -   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
>>> +   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
>>> +   autofs_tmp_oz == current;
>>>  }
>>>  
>>>  struct inode *autofs4_get_inode(struct super_block *, umode_t);
>>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>>> index dd9f1bebb5a3..d76401669a20 100644
>>> --- a/fs/autofs4/dev-ioctl.c
>>> +++ b/fs/autofs4/dev-ioctl.c
>>> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file 
>>> *fp,
>>> return 0;
>>>  }
>>>  
>>> +struct task_struct *autofs_tmp_oz;
>>> +int kern_path_oz(const char *pathname, int flags, struct path *path)
>>> +{
>>> +   static DEFINE_MUTEX(autofs_oz);
>>> +   int err;
>>> +
>>> +   mutex_lock(_oz);
>>> +   autofs_tmp_oz = current;
>>> +   err = kern_path(pathname, flags, path);
>>> +   autofs_tmp_oz = NULL;
>>> +   mutex_unlock(_oz);
>>> +   return err;
>>> +}
>>> +
>> 
>> It's simple enough but does look like it will attract criticism as being
>> a hack!
>> 
>> The kern_path_locked() function is very similar to what was originally
>> done, along with code to look down the mount stack (rather than up the
>> way it does now) to get the mount point. In this case,

Re: Do we really need d_weak_revalidate???

2017-08-23 Thread NeilBrown

On Wed, Aug 23 2017, Ian Kent wrote:

> On 23/08/17 10:32, Ian Kent wrote:
>> On 23/08/17 09:06, NeilBrown wrote:
>>> On Mon, Aug 21 2017, Ian Kent wrote:
>>>
>
> A mount isn't triggered by kern_path(pathname, 0, ).
> That '0' would need to include one of
>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>
> to trigger an automount (otherwise you just get -EISDIR).

 It's perfectly sensible to think that but there is a case where a
 a mount is triggered when using kern_path().

 The EISDIR return occurs for positive dentrys, negative dentrys
 will still trigger an automount (which is autofs specific,
 indirect mount map using nobrowse option, the install default).
>>>
>>> Ok, I understand this better now.  This difference between direct and
>>> indirect mounts is slightly awkward. It is visible from user-space, but
>>> not elegant to document.
>>> When you use O_PATH to open a direct automount that has not already been
>>> triggered, the open returns the underlying directory (and fstatfs
>>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>>> in effect, but it won't when "browse" is in effect.
>> 
>> That inconsistency has bothered me for quite a while now.
>> 
>> It was carried over from the autofs module behavior when automounting
>> support was added to the VFS. What's worse is it prevents the use of
>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>> statx().
>> 
>> There is some risk in changing that so it does work but it really does
>> need to work to enable userspace to not trigger an automount by using
>> this flag.
>> 
>> So that's (hopefully) going to change soonish, see:
>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>> 
>> The result should be that stat family calls don't trigger automounts except
>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>> 
>>>
>>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>>> essentially what I said in
>>>
>>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>>
>>> It might be possible to modify automount so that it was more consistent
>>> - i.e. if the point is triggered by a mkdir has been done, just to the
>>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>>> guess that might be racy, and in any case is hard to justify.
>>>
>>> Maybe I should change it to be about "direct automounts", and add a note
>>> that indirect automounts aren't so predictable.
>> 
>> Right and the semantics should be much more consistent in the near future.
>> I hope (and expect) this semantic change won't cause problems.
>> 
>>>
>>> But back to my original issue of wanting to discard
>>> kern_path_mountpoint, what would you think of the following approach -
>>> slight revised from before.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>> index beef981aa54f..7663ea82e68d 100644
>>> --- a/fs/autofs4/autofs_i.h
>>> +++ b/fs/autofs4/autofs_i.h
>>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>>> *autofs4_dentry_ino(struct dentry *dentry)
>>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>>   * processes which do manipulations for us in user space sees the raw
>>>   * filesystem without "magic".)
>>> + * A process performing certain ioctls can get temporary oz status.
>>>   */
>>> +extern struct task_struct *autofs_tmp_oz;
>>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>>  {
>>> -   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
>>> +   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
>>> +   autofs_tmp_oz == current;
>>>  }
>>>  
>>>  struct inode *autofs4_get_inode(struct super_block *, umode_t);
>>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>>> index dd9f1bebb5a3..d76401669a20 100644
>>> --- a/fs/autofs4/dev-ioctl.c
>>> +++ b/fs/autofs4/dev-ioctl.c
>>> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file 
>>> *fp,
>>> return 0;
>>>  }
>>>  
>>> +struct task_struct *autofs_tmp_oz;
>>> +int kern_path_oz(const char *pathname, int flags, struct path *path)
>>> +{
>>> +   static DEFINE_MUTEX(autofs_oz);
>>> +   int err;
>>> +
>>> +   mutex_lock(_oz);
>>> +   autofs_tmp_oz = current;
>>> +   err = kern_path(pathname, flags, path);
>>> +   autofs_tmp_oz = NULL;
>>> +   mutex_unlock(_oz);
>>> +   return err;
>>> +}
>>> +
>> 
>> It's simple enough but does look like it will attract criticism as being
>> a hack!
>> 
>> The kern_path_locked() function is very similar to what was originally
>> done, along with code to look down the mount stack (rather than up the
>> way it does now) to get the mount point. In this case,

Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent

On 23/08/17 10:54, Ian Kent wrote:
> On 23/08/17 10:40, Ian Kent wrote:
>> On 23/08/17 10:32, Ian Kent wrote:
>>> On 23/08/17 09:06, NeilBrown wrote:
 On Mon, Aug 21 2017, Ian Kent wrote:

>>
>> A mount isn't triggered by kern_path(pathname, 0, ).
>> That '0' would need to include one of
>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>
>> to trigger an automount (otherwise you just get -EISDIR).
>
> It's perfectly sensible to think that but there is a case where a
> a mount is triggered when using kern_path().
>
> The EISDIR return occurs for positive dentrys, negative dentrys
> will still trigger an automount (which is autofs specific,
> indirect mount map using nobrowse option, the install default).

 Ok, I understand this better now.  This difference between direct and
 indirect mounts is slightly awkward. It is visible from user-space, but
 not elegant to document.
 When you use O_PATH to open a direct automount that has not already been
 triggered, the open returns the underlying directory (and fstatfs
 confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
 an indirect automount, it *will* trigger the automount when "nobrowse" is
 in effect, but it won't when "browse" is in effect.
>>>
>>> That inconsistency has bothered me for quite a while now.
>>>
>>> It was carried over from the autofs module behavior when automounting
>>> support was added to the VFS. What's worse is it prevents the use of
>>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>>> statx().
>>>
>>> There is some risk in changing that so it does work but it really does
>>> need to work to enable userspace to not trigger an automount by using
>>> this flag.
>>>
>>> So that's (hopefully) going to change soonish, see:
>>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>>
>>> The result should be that stat family calls don't trigger automounts except
>>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>>

 So we cannot just say "O_PATH doesn't trigger automounts", which is
 essentially what I said in

 https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94

 It might be possible to modify automount so that it was more consistent
 - i.e. if the point is triggered by a mkdir has been done, just to the
 mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
 guess that might be racy, and in any case is hard to justify.

 Maybe I should change it to be about "direct automounts", and add a note
 that indirect automounts aren't so predictable.
>>>
>>> Right and the semantics should be much more consistent in the near future.
>>> I hope (and expect) this semantic change won't cause problems.
>>>

 But back to my original issue of wanting to discard
 kern_path_mountpoint, what would you think of the following approach -
 slight revised from before.

 Thanks,
 NeilBrown

 diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
 index beef981aa54f..7663ea82e68d 100644
 --- a/fs/autofs4/autofs_i.h
 +++ b/fs/autofs4/autofs_i.h
 @@ -135,10 +135,13 @@ static inline struct autofs_info 
 *autofs4_dentry_ino(struct dentry *dentry)
  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
   * processes which do manipulations for us in user space sees the raw
   * filesystem without "magic".)
 + * A process performing certain ioctls can get temporary oz status.
   */
 +extern struct task_struct *autofs_tmp_oz;
  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
  {
 -  return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
 +  return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
 +  autofs_tmp_oz == current;
  }
  
  struct inode *autofs4_get_inode(struct super_block *, umode_t);
 diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
 index dd9f1bebb5a3..d76401669a20 100644
 --- a/fs/autofs4/dev-ioctl.c
 +++ b/fs/autofs4/dev-ioctl.c
 @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file 
 *fp,
return 0;
  }
  
 +struct task_struct *autofs_tmp_oz;
 +int kern_path_oz(const char *pathname, int flags, struct path *path)
 +{
 +  static DEFINE_MUTEX(autofs_oz);
 +  int err;
 +
 +  mutex_lock(_oz);
 +  autofs_tmp_oz = current;
 +  err = kern_path(pathname, flags, path);
 +  autofs_tmp_oz = NULL;
 +  mutex_unlock(_oz);
 +  return err;
 +}
 +
>>>
>>> It's simple enough but does look like it will attract criticism as being
>>> a hack!
>>>
>>> The kern_path_locked() function is very similar to what was originally
>>>

Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent

On 23/08/17 10:54, Ian Kent wrote:
> On 23/08/17 10:40, Ian Kent wrote:
>> On 23/08/17 10:32, Ian Kent wrote:
>>> On 23/08/17 09:06, NeilBrown wrote:
 On Mon, Aug 21 2017, Ian Kent wrote:

>>
>> A mount isn't triggered by kern_path(pathname, 0, ).
>> That '0' would need to include one of
>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>
>> to trigger an automount (otherwise you just get -EISDIR).
>
> It's perfectly sensible to think that but there is a case where a
> a mount is triggered when using kern_path().
>
> The EISDIR return occurs for positive dentrys, negative dentrys
> will still trigger an automount (which is autofs specific,
> indirect mount map using nobrowse option, the install default).

 Ok, I understand this better now.  This difference between direct and
 indirect mounts is slightly awkward. It is visible from user-space, but
 not elegant to document.
 When you use O_PATH to open a direct automount that has not already been
 triggered, the open returns the underlying directory (and fstatfs
 confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
 an indirect automount, it *will* trigger the automount when "nobrowse" is
 in effect, but it won't when "browse" is in effect.
>>>
>>> That inconsistency has bothered me for quite a while now.
>>>
>>> It was carried over from the autofs module behavior when automounting
>>> support was added to the VFS. What's worse is it prevents the use of
>>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>>> statx().
>>>
>>> There is some risk in changing that so it does work but it really does
>>> need to work to enable userspace to not trigger an automount by using
>>> this flag.
>>>
>>> So that's (hopefully) going to change soonish, see:
>>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>>
>>> The result should be that stat family calls don't trigger automounts except
>>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>>

 So we cannot just say "O_PATH doesn't trigger automounts", which is
 essentially what I said in

 https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94

 It might be possible to modify automount so that it was more consistent
 - i.e. if the point is triggered by a mkdir has been done, just to the
 mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
 guess that might be racy, and in any case is hard to justify.

 Maybe I should change it to be about "direct automounts", and add a note
 that indirect automounts aren't so predictable.
>>>
>>> Right and the semantics should be much more consistent in the near future.
>>> I hope (and expect) this semantic change won't cause problems.
>>>

 But back to my original issue of wanting to discard
 kern_path_mountpoint, what would you think of the following approach -
 slight revised from before.

 Thanks,
 NeilBrown

 diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
 index beef981aa54f..7663ea82e68d 100644
 --- a/fs/autofs4/autofs_i.h
 +++ b/fs/autofs4/autofs_i.h
 @@ -135,10 +135,13 @@ static inline struct autofs_info 
 *autofs4_dentry_ino(struct dentry *dentry)
  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
   * processes which do manipulations for us in user space sees the raw
   * filesystem without "magic".)
 + * A process performing certain ioctls can get temporary oz status.
   */
 +extern struct task_struct *autofs_tmp_oz;
  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
  {
 -  return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
 +  return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
 +  autofs_tmp_oz == current;
  }
  
  struct inode *autofs4_get_inode(struct super_block *, umode_t);
 diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
 index dd9f1bebb5a3..d76401669a20 100644
 --- a/fs/autofs4/dev-ioctl.c
 +++ b/fs/autofs4/dev-ioctl.c
 @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file 
 *fp,
return 0;
  }
  
 +struct task_struct *autofs_tmp_oz;
 +int kern_path_oz(const char *pathname, int flags, struct path *path)
 +{
 +  static DEFINE_MUTEX(autofs_oz);
 +  int err;
 +
 +  mutex_lock(_oz);
 +  autofs_tmp_oz = current;
 +  err = kern_path(pathname, flags, path);
 +  autofs_tmp_oz = NULL;
 +  mutex_unlock(_oz);
 +  return err;
 +}
 +
>>>
>>> It's simple enough but does look like it will attract criticism as being
>>> a hack!
>>>
>>> The kern_path_locked() function is very similar to what was originally
>>>

Re: Do we really need d_weak_revalidate???

2017-08-22 Thread Ian Kent

On 23/08/17 10:40, Ian Kent wrote:
> On 23/08/17 10:32, Ian Kent wrote:
>> On 23/08/17 09:06, NeilBrown wrote:
>>> On Mon, Aug 21 2017, Ian Kent wrote:
>>>
>
> A mount isn't triggered by kern_path(pathname, 0, ).
> That '0' would need to include one of
>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>
> to trigger an automount (otherwise you just get -EISDIR).

 It's perfectly sensible to think that but there is a case where a
 a mount is triggered when using kern_path().

 The EISDIR return occurs for positive dentrys, negative dentrys
 will still trigger an automount (which is autofs specific,
 indirect mount map using nobrowse option, the install default).
>>>
>>> Ok, I understand this better now.  This difference between direct and
>>> indirect mounts is slightly awkward. It is visible from user-space, but
>>> not elegant to document.
>>> When you use O_PATH to open a direct automount that has not already been
>>> triggered, the open returns the underlying directory (and fstatfs
>>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>>> in effect, but it won't when "browse" is in effect.
>>
>> That inconsistency has bothered me for quite a while now.
>>
>> It was carried over from the autofs module behavior when automounting
>> support was added to the VFS. What's worse is it prevents the use of
>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>> statx().
>>
>> There is some risk in changing that so it does work but it really does
>> need to work to enable userspace to not trigger an automount by using
>> this flag.
>>
>> So that's (hopefully) going to change soonish, see:
>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>
>> The result should be that stat family calls don't trigger automounts except
>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>
>>>
>>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>>> essentially what I said in
>>>
>>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>>
>>> It might be possible to modify automount so that it was more consistent
>>> - i.e. if the point is triggered by a mkdir has been done, just to the
>>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>>> guess that might be racy, and in any case is hard to justify.
>>>
>>> Maybe I should change it to be about "direct automounts", and add a note
>>> that indirect automounts aren't so predictable.
>>
>> Right and the semantics should be much more consistent in the near future.
>> I hope (and expect) this semantic change won't cause problems.
>>
>>>
>>> But back to my original issue of wanting to discard
>>> kern_path_mountpoint, what would you think of the following approach -
>>> slight revised from before.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>> index beef981aa54f..7663ea82e68d 100644
>>> --- a/fs/autofs4/autofs_i.h
>>> +++ b/fs/autofs4/autofs_i.h
>>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>>> *autofs4_dentry_ino(struct dentry *dentry)
>>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>>   * processes which do manipulations for us in user space sees the raw
>>>   * filesystem without "magic".)
>>> + * A process performing certain ioctls can get temporary oz status.
>>>   */
>>> +extern struct task_struct *autofs_tmp_oz;
>>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>>  {
>>> -   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
>>> +   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
>>> +   autofs_tmp_oz == current;
>>>  }
>>>  
>>>  struct inode *autofs4_get_inode(struct super_block *, umode_t);
>>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>>> index dd9f1bebb5a3..d76401669a20 100644
>>> --- a/fs/autofs4/dev-ioctl.c
>>> +++ b/fs/autofs4/dev-ioctl.c
>>> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file 
>>> *fp,
>>> return 0;
>>>  }
>>>  
>>> +struct task_struct *autofs_tmp_oz;
>>> +int kern_path_oz(const char *pathname, int flags, struct path *path)
>>> +{
>>> +   static DEFINE_MUTEX(autofs_oz);
>>> +   int err;
>>> +
>>> +   mutex_lock(_oz);
>>> +   autofs_tmp_oz = current;
>>> +   err = kern_path(pathname, flags, path);
>>> +   autofs_tmp_oz = NULL;
>>> +   mutex_unlock(_oz);
>>> +   return err;
>>> +}
>>> +
>>
>> It's simple enough but does look like it will attract criticism as being
>> a hack!
>>
>> The kern_path_locked() function is very similar to what was originally
>> done, along with code to look down the mount stack (rather than up the
>> way it does now) to get the mount point. In this case, to be valid

Re: Do we really need d_weak_revalidate???

2017-08-22 Thread Ian Kent

On 23/08/17 10:40, Ian Kent wrote:
> On 23/08/17 10:32, Ian Kent wrote:
>> On 23/08/17 09:06, NeilBrown wrote:
>>> On Mon, Aug 21 2017, Ian Kent wrote:
>>>
>
> A mount isn't triggered by kern_path(pathname, 0, ).
> That '0' would need to include one of
>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>
> to trigger an automount (otherwise you just get -EISDIR).

 It's perfectly sensible to think that but there is a case where a
 a mount is triggered when using kern_path().

 The EISDIR return occurs for positive dentrys, negative dentrys
 will still trigger an automount (which is autofs specific,
 indirect mount map using nobrowse option, the install default).
>>>
>>> Ok, I understand this better now.  This difference between direct and
>>> indirect mounts is slightly awkward. It is visible from user-space, but
>>> not elegant to document.
>>> When you use O_PATH to open a direct automount that has not already been
>>> triggered, the open returns the underlying directory (and fstatfs
>>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>>> in effect, but it won't when "browse" is in effect.
>>
>> That inconsistency has bothered me for quite a while now.
>>
>> It was carried over from the autofs module behavior when automounting
>> support was added to the VFS. What's worse is it prevents the use of
>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>> statx().
>>
>> There is some risk in changing that so it does work but it really does
>> need to work to enable userspace to not trigger an automount by using
>> this flag.
>>
>> So that's (hopefully) going to change soonish, see:
>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>
>> The result should be that stat family calls don't trigger automounts except
>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>
>>>
>>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>>> essentially what I said in
>>>
>>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>>
>>> It might be possible to modify automount so that it was more consistent
>>> - i.e. if the point is triggered by a mkdir has been done, just to the
>>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>>> guess that might be racy, and in any case is hard to justify.
>>>
>>> Maybe I should change it to be about "direct automounts", and add a note
>>> that indirect automounts aren't so predictable.
>>
>> Right and the semantics should be much more consistent in the near future.
>> I hope (and expect) this semantic change won't cause problems.
>>
>>>
>>> But back to my original issue of wanting to discard
>>> kern_path_mountpoint, what would you think of the following approach -
>>> slight revised from before.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>> index beef981aa54f..7663ea82e68d 100644
>>> --- a/fs/autofs4/autofs_i.h
>>> +++ b/fs/autofs4/autofs_i.h
>>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>>> *autofs4_dentry_ino(struct dentry *dentry)
>>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>>   * processes which do manipulations for us in user space sees the raw
>>>   * filesystem without "magic".)
>>> + * A process performing certain ioctls can get temporary oz status.
>>>   */
>>> +extern struct task_struct *autofs_tmp_oz;
>>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>>  {
>>> -   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
>>> +   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
>>> +   autofs_tmp_oz == current;
>>>  }
>>>  
>>>  struct inode *autofs4_get_inode(struct super_block *, umode_t);
>>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>>> index dd9f1bebb5a3..d76401669a20 100644
>>> --- a/fs/autofs4/dev-ioctl.c
>>> +++ b/fs/autofs4/dev-ioctl.c
>>> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file 
>>> *fp,
>>> return 0;
>>>  }
>>>  
>>> +struct task_struct *autofs_tmp_oz;
>>> +int kern_path_oz(const char *pathname, int flags, struct path *path)
>>> +{
>>> +   static DEFINE_MUTEX(autofs_oz);
>>> +   int err;
>>> +
>>> +   mutex_lock(_oz);
>>> +   autofs_tmp_oz = current;
>>> +   err = kern_path(pathname, flags, path);
>>> +   autofs_tmp_oz = NULL;
>>> +   mutex_unlock(_oz);
>>> +   return err;
>>> +}
>>> +
>>
>> It's simple enough but does look like it will attract criticism as being
>> a hack!
>>
>> The kern_path_locked() function is very similar to what was originally
>> done, along with code to look down the mount stack (rather than up the
>> way it does now) to get the mount point. In this case, to be valid

Re: Do we really need d_weak_revalidate???

2017-08-22 Thread Ian Kent

On 23/08/17 10:32, Ian Kent wrote:
> On 23/08/17 09:06, NeilBrown wrote:
>> On Mon, Aug 21 2017, Ian Kent wrote:
>>

 A mount isn't triggered by kern_path(pathname, 0, ).
 That '0' would need to include one of
   LOOKUP_PARENT | LOOKUP_DIRECTORY |
   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT

 to trigger an automount (otherwise you just get -EISDIR).
>>>
>>> It's perfectly sensible to think that but there is a case where a
>>> a mount is triggered when using kern_path().
>>>
>>> The EISDIR return occurs for positive dentrys, negative dentrys
>>> will still trigger an automount (which is autofs specific,
>>> indirect mount map using nobrowse option, the install default).
>>
>> Ok, I understand this better now.  This difference between direct and
>> indirect mounts is slightly awkward. It is visible from user-space, but
>> not elegant to document.
>> When you use O_PATH to open a direct automount that has not already been
>> triggered, the open returns the underlying directory (and fstatfs
>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>> in effect, but it won't when "browse" is in effect.
> 
> That inconsistency has bothered me for quite a while now.
> 
> It was carried over from the autofs module behavior when automounting
> support was added to the VFS. What's worse is it prevents the use of
> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
> statx().
> 
> There is some risk in changing that so it does work but it really does
> need to work to enable userspace to not trigger an automount by using
> this flag.
> 
> So that's (hopefully) going to change soonish, see:
> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
> 
> The result should be that stat family calls don't trigger automounts except
> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
> 
>>
>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>> essentially what I said in
>>
>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>
>> It might be possible to modify automount so that it was more consistent
>> - i.e. if the point is triggered by a mkdir has been done, just to the
>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>> guess that might be racy, and in any case is hard to justify.
>>
>> Maybe I should change it to be about "direct automounts", and add a note
>> that indirect automounts aren't so predictable.
> 
> Right and the semantics should be much more consistent in the near future.
> I hope (and expect) this semantic change won't cause problems.
> 
>>
>> But back to my original issue of wanting to discard
>> kern_path_mountpoint, what would you think of the following approach -
>> slight revised from before.
>>
>> Thanks,
>> NeilBrown
>>
>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>> index beef981aa54f..7663ea82e68d 100644
>> --- a/fs/autofs4/autofs_i.h
>> +++ b/fs/autofs4/autofs_i.h
>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>> *autofs4_dentry_ino(struct dentry *dentry)
>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>   * processes which do manipulations for us in user space sees the raw
>>   * filesystem without "magic".)
>> + * A process performing certain ioctls can get temporary oz status.
>>   */
>> +extern struct task_struct *autofs_tmp_oz;
>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>  {
>> -return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
>> +return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
>> +autofs_tmp_oz == current;
>>  }
>>  
>>  struct inode *autofs4_get_inode(struct super_block *, umode_t);
>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>> index dd9f1bebb5a3..d76401669a20 100644
>> --- a/fs/autofs4/dev-ioctl.c
>> +++ b/fs/autofs4/dev-ioctl.c
>> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp,
>>  return 0;
>>  }
>>  
>> +struct task_struct *autofs_tmp_oz;
>> +int kern_path_oz(const char *pathname, int flags, struct path *path)
>> +{
>> +static DEFINE_MUTEX(autofs_oz);
>> +int err;
>> +
>> +mutex_lock(_oz);
>> +autofs_tmp_oz = current;
>> +err = kern_path(pathname, flags, path);
>> +autofs_tmp_oz = NULL;
>> +mutex_unlock(_oz);
>> +return err;
>> +}
>> +
> 
> It's simple enough but does look like it will attract criticism as being
> a hack!
> 
> The kern_path_locked() function is very similar to what was originally
> done, along with code to look down the mount stack (rather than up the
> way it does now) to get the mount point. In this case, to be valid the
> dentry can't be a symlink so that fits kern_path_locked() too.

Oh wait, that __lookup_hash() tries too hard to resolve the dentry,

Re: Do we really need d_weak_revalidate???

2017-08-22 Thread Ian Kent

On 23/08/17 10:32, Ian Kent wrote:
> On 23/08/17 09:06, NeilBrown wrote:
>> On Mon, Aug 21 2017, Ian Kent wrote:
>>

 A mount isn't triggered by kern_path(pathname, 0, ).
 That '0' would need to include one of
   LOOKUP_PARENT | LOOKUP_DIRECTORY |
   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT

 to trigger an automount (otherwise you just get -EISDIR).
>>>
>>> It's perfectly sensible to think that but there is a case where a
>>> a mount is triggered when using kern_path().
>>>
>>> The EISDIR return occurs for positive dentrys, negative dentrys
>>> will still trigger an automount (which is autofs specific,
>>> indirect mount map using nobrowse option, the install default).
>>
>> Ok, I understand this better now.  This difference between direct and
>> indirect mounts is slightly awkward. It is visible from user-space, but
>> not elegant to document.
>> When you use O_PATH to open a direct automount that has not already been
>> triggered, the open returns the underlying directory (and fstatfs
>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>> in effect, but it won't when "browse" is in effect.
> 
> That inconsistency has bothered me for quite a while now.
> 
> It was carried over from the autofs module behavior when automounting
> support was added to the VFS. What's worse is it prevents the use of
> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
> statx().
> 
> There is some risk in changing that so it does work but it really does
> need to work to enable userspace to not trigger an automount by using
> this flag.
> 
> So that's (hopefully) going to change soonish, see:
> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
> 
> The result should be that stat family calls don't trigger automounts except
> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
> 
>>
>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>> essentially what I said in
>>
>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>
>> It might be possible to modify automount so that it was more consistent
>> - i.e. if the point is triggered by a mkdir has been done, just to the
>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>> guess that might be racy, and in any case is hard to justify.
>>
>> Maybe I should change it to be about "direct automounts", and add a note
>> that indirect automounts aren't so predictable.
> 
> Right and the semantics should be much more consistent in the near future.
> I hope (and expect) this semantic change won't cause problems.
> 
>>
>> But back to my original issue of wanting to discard
>> kern_path_mountpoint, what would you think of the following approach -
>> slight revised from before.
>>
>> Thanks,
>> NeilBrown
>>
>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>> index beef981aa54f..7663ea82e68d 100644
>> --- a/fs/autofs4/autofs_i.h
>> +++ b/fs/autofs4/autofs_i.h
>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>> *autofs4_dentry_ino(struct dentry *dentry)
>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>   * processes which do manipulations for us in user space sees the raw
>>   * filesystem without "magic".)
>> + * A process performing certain ioctls can get temporary oz status.
>>   */
>> +extern struct task_struct *autofs_tmp_oz;
>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>  {
>> -return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
>> +return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
>> +autofs_tmp_oz == current;
>>  }
>>  
>>  struct inode *autofs4_get_inode(struct super_block *, umode_t);
>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>> index dd9f1bebb5a3..d76401669a20 100644
>> --- a/fs/autofs4/dev-ioctl.c
>> +++ b/fs/autofs4/dev-ioctl.c
>> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp,
>>  return 0;
>>  }
>>  
>> +struct task_struct *autofs_tmp_oz;
>> +int kern_path_oz(const char *pathname, int flags, struct path *path)
>> +{
>> +static DEFINE_MUTEX(autofs_oz);
>> +int err;
>> +
>> +mutex_lock(_oz);
>> +autofs_tmp_oz = current;
>> +err = kern_path(pathname, flags, path);
>> +autofs_tmp_oz = NULL;
>> +mutex_unlock(_oz);
>> +return err;
>> +}
>> +
> 
> It's simple enough but does look like it will attract criticism as being
> a hack!
> 
> The kern_path_locked() function is very similar to what was originally
> done, along with code to look down the mount stack (rather than up the
> way it does now) to get the mount point. In this case, to be valid the
> dentry can't be a symlink so that fits kern_path_locked() too.

Oh wait, that __lookup_hash() tries too hard to resolve the dentry,

Re: Do we really need d_weak_revalidate???

2017-08-22 Thread Ian Kent

On 23/08/17 09:06, NeilBrown wrote:
> On Mon, Aug 21 2017, Ian Kent wrote:
> 
>>>
>>> A mount isn't triggered by kern_path(pathname, 0, ).
>>> That '0' would need to include one of
>>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>>
>>> to trigger an automount (otherwise you just get -EISDIR).
>>
>> It's perfectly sensible to think that but there is a case where a
>> a mount is triggered when using kern_path().
>>
>> The EISDIR return occurs for positive dentrys, negative dentrys
>> will still trigger an automount (which is autofs specific,
>> indirect mount map using nobrowse option, the install default).
> 
> Ok, I understand this better now.  This difference between direct and
> indirect mounts is slightly awkward. It is visible from user-space, but
> not elegant to document.
> When you use O_PATH to open a direct automount that has not already been
> triggered, the open returns the underlying directory (and fstatfs
> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
> an indirect automount, it *will* trigger the automount when "nobrowse" is
> in effect, but it won't when "browse" is in effect.

That inconsistency has bothered me for quite a while now.

It was carried over from the autofs module behavior when automounting
support was added to the VFS. What's worse is it prevents the use of
the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
statx().

There is some risk in changing that so it does work but it really does
need to work to enable userspace to not trigger an automount by using
this flag.

So that's (hopefully) going to change soonish, see:
http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch

The result should be that stat family calls don't trigger automounts except
for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.

> 
> So we cannot just say "O_PATH doesn't trigger automounts", which is
> essentially what I said in
> 
> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
> 
> It might be possible to modify automount so that it was more consistent
> - i.e. if the point is triggered by a mkdir has been done, just to the
> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
> guess that might be racy, and in any case is hard to justify.
> 
> Maybe I should change it to be about "direct automounts", and add a note
> that indirect automounts aren't so predictable.

Right and the semantics should be much more consistent in the near future.
I hope (and expect) this semantic change won't cause problems.

> 
> But back to my original issue of wanting to discard
> kern_path_mountpoint, what would you think of the following approach -
> slight revised from before.
> 
> Thanks,
> NeilBrown
> 
> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
> index beef981aa54f..7663ea82e68d 100644
> --- a/fs/autofs4/autofs_i.h
> +++ b/fs/autofs4/autofs_i.h
> @@ -135,10 +135,13 @@ static inline struct autofs_info 
> *autofs4_dentry_ino(struct dentry *dentry)
>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>   * processes which do manipulations for us in user space sees the raw
>   * filesystem without "magic".)
> + * A process performing certain ioctls can get temporary oz status.
>   */
> +extern struct task_struct *autofs_tmp_oz;
>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>  {
> - return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
> + return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
> + autofs_tmp_oz == current;
>  }
>  
>  struct inode *autofs4_get_inode(struct super_block *, umode_t);
> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
> index dd9f1bebb5a3..d76401669a20 100644
> --- a/fs/autofs4/dev-ioctl.c
> +++ b/fs/autofs4/dev-ioctl.c
> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp,
>   return 0;
>  }
>  
> +struct task_struct *autofs_tmp_oz;
> +int kern_path_oz(const char *pathname, int flags, struct path *path)
> +{
> + static DEFINE_MUTEX(autofs_oz);
> + int err;
> +
> + mutex_lock(_oz);
> + autofs_tmp_oz = current;
> + err = kern_path(pathname, flags, path);
> + autofs_tmp_oz = NULL;
> + mutex_unlock(_oz);
> + return err;
> +}
> +

It's simple enough but does look like it will attract criticism as being
a hack!

The kern_path_locked() function is very similar to what was originally
done, along with code to look down the mount stack (rather than up the
way it does now) to get the mount point. In this case, to be valid the
dentry can't be a symlink so that fits kern_path_locked() too.

So maybe it is worth going back to the way it was in the beginning and
be done with it  OTOH Al must have had a reason for changing the
way it was done that I didn't get.

>  /* Find the topmost mount satisfying test() */
>

Re: Do we really need d_weak_revalidate???

2017-08-22 Thread Ian Kent

On 23/08/17 09:06, NeilBrown wrote:
> On Mon, Aug 21 2017, Ian Kent wrote:
> 
>>>
>>> A mount isn't triggered by kern_path(pathname, 0, ).
>>> That '0' would need to include one of
>>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>>
>>> to trigger an automount (otherwise you just get -EISDIR).
>>
>> It's perfectly sensible to think that but there is a case where a
>> a mount is triggered when using kern_path().
>>
>> The EISDIR return occurs for positive dentrys, negative dentrys
>> will still trigger an automount (which is autofs specific,
>> indirect mount map using nobrowse option, the install default).
> 
> Ok, I understand this better now.  This difference between direct and
> indirect mounts is slightly awkward. It is visible from user-space, but
> not elegant to document.
> When you use O_PATH to open a direct automount that has not already been
> triggered, the open returns the underlying directory (and fstatfs
> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
> an indirect automount, it *will* trigger the automount when "nobrowse" is
> in effect, but it won't when "browse" is in effect.

That inconsistency has bothered me for quite a while now.

It was carried over from the autofs module behavior when automounting
support was added to the VFS. What's worse is it prevents the use of
the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
statx().

There is some risk in changing that so it does work but it really does
need to work to enable userspace to not trigger an automount by using
this flag.

So that's (hopefully) going to change soonish, see:
http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch

The result should be that stat family calls don't trigger automounts except
for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.

> 
> So we cannot just say "O_PATH doesn't trigger automounts", which is
> essentially what I said in
> 
> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
> 
> It might be possible to modify automount so that it was more consistent
> - i.e. if the point is triggered by a mkdir has been done, just to the
> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
> guess that might be racy, and in any case is hard to justify.
> 
> Maybe I should change it to be about "direct automounts", and add a note
> that indirect automounts aren't so predictable.

Right and the semantics should be much more consistent in the near future.
I hope (and expect) this semantic change won't cause problems.

> 
> But back to my original issue of wanting to discard
> kern_path_mountpoint, what would you think of the following approach -
> slight revised from before.
> 
> Thanks,
> NeilBrown
> 
> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
> index beef981aa54f..7663ea82e68d 100644
> --- a/fs/autofs4/autofs_i.h
> +++ b/fs/autofs4/autofs_i.h
> @@ -135,10 +135,13 @@ static inline struct autofs_info 
> *autofs4_dentry_ino(struct dentry *dentry)
>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>   * processes which do manipulations for us in user space sees the raw
>   * filesystem without "magic".)
> + * A process performing certain ioctls can get temporary oz status.
>   */
> +extern struct task_struct *autofs_tmp_oz;
>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>  {
> - return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
> + return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
> + autofs_tmp_oz == current;
>  }
>  
>  struct inode *autofs4_get_inode(struct super_block *, umode_t);
> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
> index dd9f1bebb5a3..d76401669a20 100644
> --- a/fs/autofs4/dev-ioctl.c
> +++ b/fs/autofs4/dev-ioctl.c
> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp,
>   return 0;
>  }
>  
> +struct task_struct *autofs_tmp_oz;
> +int kern_path_oz(const char *pathname, int flags, struct path *path)
> +{
> + static DEFINE_MUTEX(autofs_oz);
> + int err;
> +
> + mutex_lock(_oz);
> + autofs_tmp_oz = current;
> + err = kern_path(pathname, flags, path);
> + autofs_tmp_oz = NULL;
> + mutex_unlock(_oz);
> + return err;
> +}
> +

It's simple enough but does look like it will attract criticism as being
a hack!

The kern_path_locked() function is very similar to what was originally
done, along with code to look down the mount stack (rather than up the
way it does now) to get the mount point. In this case, to be valid the
dentry can't be a symlink so that fits kern_path_locked() too.

So maybe it is worth going back to the way it was in the beginning and
be done with it  OTOH Al must have had a reason for changing the
way it was done that I didn't get.

>  /* Find the topmost mount satisfying test() */
>

Re: Do we really need d_weak_revalidate???

2017-08-22 Thread NeilBrown

On Mon, Aug 21 2017, Ian Kent wrote:

>> 
>> A mount isn't triggered by kern_path(pathname, 0, ).
>> That '0' would need to include one of
>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>> 
>> to trigger an automount (otherwise you just get -EISDIR).
>
> It's perfectly sensible to think that but there is a case where a
> a mount is triggered when using kern_path().
>
> The EISDIR return occurs for positive dentrys, negative dentrys
> will still trigger an automount (which is autofs specific,
> indirect mount map using nobrowse option, the install default).

Ok, I understand this better now.  This difference between direct and
indirect mounts is slightly awkward. It is visible from user-space, but
not elegant to document.
When you use O_PATH to open a direct automount that has not already been
triggered, the open returns the underlying directory (and fstatfs
confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
an indirect automount, it *will* trigger the automount when "nobrowse" is
in effect, but it won't when "browse" is in effect.

So we cannot just say "O_PATH doesn't trigger automounts", which is
essentially what I said in

https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94

It might be possible to modify automount so that it was more consistent
- i.e. if the point is triggered by a mkdir has been done, just to the
mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
guess that might be racy, and in any case is hard to justify.

Maybe I should change it to be about "direct automounts", and add a note
that indirect automounts aren't so predictable.

But back to my original issue of wanting to discard
kern_path_mountpoint, what would you think of the following approach -
slight revised from before.

Thanks,
NeilBrown

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index beef981aa54f..7663ea82e68d 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -135,10 +135,13 @@ static inline struct autofs_info 
*autofs4_dentry_ino(struct dentry *dentry)
 /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
  * processes which do manipulations for us in user space sees the raw
  * filesystem without "magic".)
+ * A process performing certain ioctls can get temporary oz status.
  */
+extern struct task_struct *autofs_tmp_oz;
 static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
 {
-   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
+   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
+   autofs_tmp_oz == current;
 }
 
 struct inode *autofs4_get_inode(struct super_block *, umode_t);
diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index dd9f1bebb5a3..d76401669a20 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp,
return 0;
 }
 
+struct task_struct *autofs_tmp_oz;
+int kern_path_oz(const char *pathname, int flags, struct path *path)
+{
+   static DEFINE_MUTEX(autofs_oz);
+   int err;
+
+   mutex_lock(_oz);
+   autofs_tmp_oz = current;
+   err = kern_path(pathname, flags, path);
+   autofs_tmp_oz = NULL;
+   mutex_unlock(_oz);
+   return err;
+}
+
 /* Find the topmost mount satisfying test() */
 static int find_autofs_mount(const char *pathname,
 struct path *res,
@@ -209,7 +223,8 @@ static int find_autofs_mount(const char *pathname,
struct path path;
int err;
 
-   err = kern_path_mountpoint(AT_FDCWD, pathname, , 0);
+   err = kern_path_oz(pathname, 0, );
+
if (err)
return err;
err = -ENOENT;
@@ -552,8 +567,7 @@ static int autofs_dev_ioctl_ismountpoint(struct file *fp,
 
if (!fp || param->ioctlfd == -1) {
if (autofs_type_any(type))
-   err = kern_path_mountpoint(AT_FDCWD,
-  name, , LOOKUP_FOLLOW);
+   err = kern_path_oz(name, LOOKUP_FOLLOW, );
else
err = find_autofs_mount(name, ,
test_by_type, );


signature.asc
Description: PGP signature

Re: Do we really need d_weak_revalidate???

2017-08-22 Thread NeilBrown

On Mon, Aug 21 2017, Ian Kent wrote:

>> 
>> A mount isn't triggered by kern_path(pathname, 0, ).
>> That '0' would need to include one of
>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>> 
>> to trigger an automount (otherwise you just get -EISDIR).
>
> It's perfectly sensible to think that but there is a case where a
> a mount is triggered when using kern_path().
>
> The EISDIR return occurs for positive dentrys, negative dentrys
> will still trigger an automount (which is autofs specific,
> indirect mount map using nobrowse option, the install default).

Ok, I understand this better now.  This difference between direct and
indirect mounts is slightly awkward. It is visible from user-space, but
not elegant to document.
When you use O_PATH to open a direct automount that has not already been
triggered, the open returns the underlying directory (and fstatfs
confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
an indirect automount, it *will* trigger the automount when "nobrowse" is
in effect, but it won't when "browse" is in effect.

So we cannot just say "O_PATH doesn't trigger automounts", which is
essentially what I said in

https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94

It might be possible to modify automount so that it was more consistent
- i.e. if the point is triggered by a mkdir has been done, just to the
mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
guess that might be racy, and in any case is hard to justify.

Maybe I should change it to be about "direct automounts", and add a note
that indirect automounts aren't so predictable.

But back to my original issue of wanting to discard
kern_path_mountpoint, what would you think of the following approach -
slight revised from before.

Thanks,
NeilBrown

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index beef981aa54f..7663ea82e68d 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -135,10 +135,13 @@ static inline struct autofs_info 
*autofs4_dentry_ino(struct dentry *dentry)
 /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
  * processes which do manipulations for us in user space sees the raw
  * filesystem without "magic".)
+ * A process performing certain ioctls can get temporary oz status.
  */
+extern struct task_struct *autofs_tmp_oz;
 static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
 {
-   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
+   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
+   autofs_tmp_oz == current;
 }
 
 struct inode *autofs4_get_inode(struct super_block *, umode_t);
diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index dd9f1bebb5a3..d76401669a20 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp,
return 0;
 }
 
+struct task_struct *autofs_tmp_oz;
+int kern_path_oz(const char *pathname, int flags, struct path *path)
+{
+   static DEFINE_MUTEX(autofs_oz);
+   int err;
+
+   mutex_lock(_oz);
+   autofs_tmp_oz = current;
+   err = kern_path(pathname, flags, path);
+   autofs_tmp_oz = NULL;
+   mutex_unlock(_oz);
+   return err;
+}
+
 /* Find the topmost mount satisfying test() */
 static int find_autofs_mount(const char *pathname,
 struct path *res,
@@ -209,7 +223,8 @@ static int find_autofs_mount(const char *pathname,
struct path path;
int err;
 
-   err = kern_path_mountpoint(AT_FDCWD, pathname, , 0);
+   err = kern_path_oz(pathname, 0, );
+
if (err)
return err;
err = -ENOENT;
@@ -552,8 +567,7 @@ static int autofs_dev_ioctl_ismountpoint(struct file *fp,
 
if (!fp || param->ioctlfd == -1) {
if (autofs_type_any(type))
-   err = kern_path_mountpoint(AT_FDCWD,
-  name, , LOOKUP_FOLLOW);
+   err = kern_path_oz(name, LOOKUP_FOLLOW, );
else
err = find_autofs_mount(name, ,
test_by_type, );


signature.asc
Description: PGP signature

Re: Do we really need d_weak_revalidate???

2017-08-21 Thread NeilBrown

On Mon, Aug 21 2017, Ian Kent wrote:

> On 21/08/17 14:23, NeilBrown wrote:
>> On Fri, Aug 18 2017, Ian Kent wrote:
>> 
>>> On 18/08/17 13:24, NeilBrown wrote:
 On Thu, Aug 17 2017, Ian Kent wrote:

> On 16/08/17 19:34, Jeff Layton wrote:
>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>>
 On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
> On Fri, Aug 11 2017, Jeff Layton wrote:
>
>> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
 Funny story.  4.5 years ago we discarded the FS_REVAL_DOT 
 superblock
 flag and introduced the d_weak_revalidate dentry operation instead.
 We duly removed the flag from NFS superblocks and NFSv4 
 superblocks,
 and added the new dentry operation to NFS dentries  but not to
 NFSv4
 dentries.

 And nobody noticed.

 Until today.

 A customer reports a situation where mount(,MS_REMOUNT,..) on 
 an
 NFS
 filesystem hangs because the network has been deconfigured.  This
 makes
 perfect sense and I suggested a code change to fix the problem.
 However when a colleague was trying to reproduce the problem to
 validate
 the fix, he couldn't.  Then nor could I.

 The problem is trivially reproducible with NFSv3, and not at all 
 with
 NFSv4.  The reason is the missing d_weak_revalidate.

 We could simply add d_weak_revalidate for NFSv4, but given that it
 has been missing for 4.5 years, and the only time anyone noticed 
 was
 when the ommission resulted in a better user experience, I do 
 wonder
 if
 we need to.  Can we just discard d_weak_revalidate?  What purpose
 does
 it serve?  I couldn't find one.

 Thanks,
 NeilBrown

 For reference, see
 Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
 d_weak_revalidate dentry op")

 To reproduce the problem at home, on a system that uses systemd:
 1/ place (or find) a filesystem image in a file on an NFS 
 filesystem.
 2/ mount the nfs filesystem with "noac" - choose v3 or v4
 3/ loop-mount the filesystem image read-only somewhere
 4/ reboot

 If you choose v4, the reboot will succeed, possibly after a 
 90second
 timeout.
 If you choose v3, the reboot will hang indefinitely in systemd-
 shutdown while
 remounting the nfs filesystem read-only.

 If you don't use "noac" it can still hang, but only if something
 slows
 down the reboot enough that attributes have timed out by the time
 that
 systemd-shutdown runs.  This happens for our customer.

 If the loop-mounted filesystem is not read-only, you get other
 problems.

 We really want systemd to figure out that the loop-mount needs to 
 be
 unmounted first.  I have ideas concerning that, but it is messy.  
 But
 that isn't the only bug here.
>>>
>>> The main purpose of d_weak_revalidate() was to catch the issues that
>>> arise when someone changes the contents of the current working
>>> directory or its parent on the server. Since '.' and '..' are 
>>> treated
>>> specially in the lookup code, they would not be revalidated without
>>> special treatment. That leads to issues when looking up files as
>>> ./ or ../, since the client won't detect that 
>>> its
>>> dcache is stale until it tries to use the cached dentry+inode.
>>>
>>> The one thing that has changed since its introduction is, I believe,
>>> the ESTALE handling in the VFS layer. That might fix a lot of the
>>> dcache lookup bugs that were previously handled by 
>>> d_weak_revalidate().
>>> I haven't done an audit to figure out if it actually can handle all 
>>> of
>>> them.
>>>
>>
>> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>>
>> vfs: allow umount to handle mountpoints without revalidating them
>
> You say in the comment for that commit:
>
>  but there
> are cases where we do want to revalidate the root of

Re: Do we really need d_weak_revalidate???

2017-08-21 Thread NeilBrown

On Mon, Aug 21 2017, Ian Kent wrote:

> On 21/08/17 14:23, NeilBrown wrote:
>> On Fri, Aug 18 2017, Ian Kent wrote:
>> 
>>> On 18/08/17 13:24, NeilBrown wrote:
 On Thu, Aug 17 2017, Ian Kent wrote:

> On 16/08/17 19:34, Jeff Layton wrote:
>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>>
 On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
> On Fri, Aug 11 2017, Jeff Layton wrote:
>
>> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
 Funny story.  4.5 years ago we discarded the FS_REVAL_DOT 
 superblock
 flag and introduced the d_weak_revalidate dentry operation instead.
 We duly removed the flag from NFS superblocks and NFSv4 
 superblocks,
 and added the new dentry operation to NFS dentries  but not to
 NFSv4
 dentries.

 And nobody noticed.

 Until today.

 A customer reports a situation where mount(,MS_REMOUNT,..) on 
 an
 NFS
 filesystem hangs because the network has been deconfigured.  This
 makes
 perfect sense and I suggested a code change to fix the problem.
 However when a colleague was trying to reproduce the problem to
 validate
 the fix, he couldn't.  Then nor could I.

 The problem is trivially reproducible with NFSv3, and not at all 
 with
 NFSv4.  The reason is the missing d_weak_revalidate.

 We could simply add d_weak_revalidate for NFSv4, but given that it
 has been missing for 4.5 years, and the only time anyone noticed 
 was
 when the ommission resulted in a better user experience, I do 
 wonder
 if
 we need to.  Can we just discard d_weak_revalidate?  What purpose
 does
 it serve?  I couldn't find one.

 Thanks,
 NeilBrown

 For reference, see
 Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
 d_weak_revalidate dentry op")

 To reproduce the problem at home, on a system that uses systemd:
 1/ place (or find) a filesystem image in a file on an NFS 
 filesystem.
 2/ mount the nfs filesystem with "noac" - choose v3 or v4
 3/ loop-mount the filesystem image read-only somewhere
 4/ reboot

 If you choose v4, the reboot will succeed, possibly after a 
 90second
 timeout.
 If you choose v3, the reboot will hang indefinitely in systemd-
 shutdown while
 remounting the nfs filesystem read-only.

 If you don't use "noac" it can still hang, but only if something
 slows
 down the reboot enough that attributes have timed out by the time
 that
 systemd-shutdown runs.  This happens for our customer.

 If the loop-mounted filesystem is not read-only, you get other
 problems.

 We really want systemd to figure out that the loop-mount needs to 
 be
 unmounted first.  I have ideas concerning that, but it is messy.  
 But
 that isn't the only bug here.
>>>
>>> The main purpose of d_weak_revalidate() was to catch the issues that
>>> arise when someone changes the contents of the current working
>>> directory or its parent on the server. Since '.' and '..' are 
>>> treated
>>> specially in the lookup code, they would not be revalidated without
>>> special treatment. That leads to issues when looking up files as
>>> ./ or ../, since the client won't detect that 
>>> its
>>> dcache is stale until it tries to use the cached dentry+inode.
>>>
>>> The one thing that has changed since its introduction is, I believe,
>>> the ESTALE handling in the VFS layer. That might fix a lot of the
>>> dcache lookup bugs that were previously handled by 
>>> d_weak_revalidate().
>>> I haven't done an audit to figure out if it actually can handle all 
>>> of
>>> them.
>>>
>>
>> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>>
>> vfs: allow umount to handle mountpoints without revalidating them
>
> You say in the comment for that commit:
>
>  but there
> are cases where we do want to revalidate the root of

Re: Do we really need d_weak_revalidate???

2017-08-21 Thread Ian Kent

On 21/08/17 14:23, NeilBrown wrote:
> On Fri, Aug 18 2017, Ian Kent wrote:
> 
>> On 18/08/17 13:24, NeilBrown wrote:
>>> On Thu, Aug 17 2017, Ian Kent wrote:
>>>
 On 16/08/17 19:34, Jeff Layton wrote:
> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>
>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
 On Fri, Aug 11 2017, Jeff Layton wrote:

> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>> and added the new dentry operation to NFS dentries  but not to
>>> NFSv4
>>> dentries.
>>>
>>> And nobody noticed.
>>>
>>> Until today.
>>>
>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>> NFS
>>> filesystem hangs because the network has been deconfigured.  This
>>> makes
>>> perfect sense and I suggested a code change to fix the problem.
>>> However when a colleague was trying to reproduce the problem to
>>> validate
>>> the fix, he couldn't.  Then nor could I.
>>>
>>> The problem is trivially reproducible with NFSv3, and not at all 
>>> with
>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>>
>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>> has been missing for 4.5 years, and the only time anyone noticed was
>>> when the ommission resulted in a better user experience, I do wonder
>>> if
>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>> does
>>> it serve?  I couldn't find one.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>> For reference, see
>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>> d_weak_revalidate dentry op")
>>>
>>>
>>>
>>> To reproduce the problem at home, on a system that uses systemd:
>>> 1/ place (or find) a filesystem image in a file on an NFS 
>>> filesystem.
>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>> 3/ loop-mount the filesystem image read-only somewhere
>>> 4/ reboot
>>>
>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>> timeout.
>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>> shutdown while
>>> remounting the nfs filesystem read-only.
>>>
>>> If you don't use "noac" it can still hang, but only if something
>>> slows
>>> down the reboot enough that attributes have timed out by the time
>>> that
>>> systemd-shutdown runs.  This happens for our customer.
>>>
>>> If the loop-mounted filesystem is not read-only, you get other
>>> problems.
>>>
>>> We really want systemd to figure out that the loop-mount needs to be
>>> unmounted first.  I have ideas concerning that, but it is messy.  
>>> But
>>> that isn't the only bug here.
>>
>> The main purpose of d_weak_revalidate() was to catch the issues that
>> arise when someone changes the contents of the current working
>> directory or its parent on the server. Since '.' and '..' are treated
>> specially in the lookup code, they would not be revalidated without
>> special treatment. That leads to issues when looking up files as
>> ./ or ../, since the client won't detect that its
>> dcache is stale until it tries to use the cached dentry+inode.
>>
>> The one thing that has changed since its introduction is, I believe,
>> the ESTALE handling in the VFS layer. That might fix a lot of the
>> dcache lookup bugs that were previously handled by 
>> d_weak_revalidate().
>> I haven't done an audit to figure out if it actually can handle all 
>> of
>> them.
>>
>
> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>
> vfs: allow umount to handle mountpoints without revalidating them

 You say in the comment for that commit:

  but there
 are cases where we do want to revalidate the root of the fs.

 Do you happen to remember what those cases are?

>>>
>>> Not exactly, but I _think_ I might have been assuming that we needed to
>>> ensure that the inode attrs on the root were up to date after the
>>> pathwalk.
>>>

Re: Do we really need d_weak_revalidate???

2017-08-21 Thread Ian Kent

On 21/08/17 14:23, NeilBrown wrote:
> On Fri, Aug 18 2017, Ian Kent wrote:
> 
>> On 18/08/17 13:24, NeilBrown wrote:
>>> On Thu, Aug 17 2017, Ian Kent wrote:
>>>
 On 16/08/17 19:34, Jeff Layton wrote:
> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>
>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
 On Fri, Aug 11 2017, Jeff Layton wrote:

> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>> and added the new dentry operation to NFS dentries  but not to
>>> NFSv4
>>> dentries.
>>>
>>> And nobody noticed.
>>>
>>> Until today.
>>>
>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>> NFS
>>> filesystem hangs because the network has been deconfigured.  This
>>> makes
>>> perfect sense and I suggested a code change to fix the problem.
>>> However when a colleague was trying to reproduce the problem to
>>> validate
>>> the fix, he couldn't.  Then nor could I.
>>>
>>> The problem is trivially reproducible with NFSv3, and not at all 
>>> with
>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>>
>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>> has been missing for 4.5 years, and the only time anyone noticed was
>>> when the ommission resulted in a better user experience, I do wonder
>>> if
>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>> does
>>> it serve?  I couldn't find one.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>> For reference, see
>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>> d_weak_revalidate dentry op")
>>>
>>>
>>>
>>> To reproduce the problem at home, on a system that uses systemd:
>>> 1/ place (or find) a filesystem image in a file on an NFS 
>>> filesystem.
>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>> 3/ loop-mount the filesystem image read-only somewhere
>>> 4/ reboot
>>>
>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>> timeout.
>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>> shutdown while
>>> remounting the nfs filesystem read-only.
>>>
>>> If you don't use "noac" it can still hang, but only if something
>>> slows
>>> down the reboot enough that attributes have timed out by the time
>>> that
>>> systemd-shutdown runs.  This happens for our customer.
>>>
>>> If the loop-mounted filesystem is not read-only, you get other
>>> problems.
>>>
>>> We really want systemd to figure out that the loop-mount needs to be
>>> unmounted first.  I have ideas concerning that, but it is messy.  
>>> But
>>> that isn't the only bug here.
>>
>> The main purpose of d_weak_revalidate() was to catch the issues that
>> arise when someone changes the contents of the current working
>> directory or its parent on the server. Since '.' and '..' are treated
>> specially in the lookup code, they would not be revalidated without
>> special treatment. That leads to issues when looking up files as
>> ./ or ../, since the client won't detect that its
>> dcache is stale until it tries to use the cached dentry+inode.
>>
>> The one thing that has changed since its introduction is, I believe,
>> the ESTALE handling in the VFS layer. That might fix a lot of the
>> dcache lookup bugs that were previously handled by 
>> d_weak_revalidate().
>> I haven't done an audit to figure out if it actually can handle all 
>> of
>> them.
>>
>
> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>
> vfs: allow umount to handle mountpoints without revalidating them

 You say in the comment for that commit:

  but there
 are cases where we do want to revalidate the root of the fs.

 Do you happen to remember what those cases are?

>>>
>>> Not exactly, but I _think_ I might have been assuming that we needed to
>>> ensure that the inode attrs on the root were up to date after the
>>> pathwalk.
>>>

Re: Do we really need d_weak_revalidate???

2017-08-21 Thread NeilBrown

On Fri, Aug 18 2017, Ian Kent wrote:

> On 18/08/17 13:24, NeilBrown wrote:
>> On Thu, Aug 17 2017, Ian Kent wrote:
>> 
>>> On 16/08/17 19:34, Jeff Layton wrote:
 On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
> On Mon, Aug 14 2017, Jeff Layton wrote:
>
>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>>> On Fri, Aug 11 2017, Jeff Layton wrote:
>>>
 On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>> flag and introduced the d_weak_revalidate dentry operation instead.
>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>> and added the new dentry operation to NFS dentries  but not to
>> NFSv4
>> dentries.
>>
>> And nobody noticed.
>>
>> Until today.
>>
>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>> NFS
>> filesystem hangs because the network has been deconfigured.  This
>> makes
>> perfect sense and I suggested a code change to fix the problem.
>> However when a colleague was trying to reproduce the problem to
>> validate
>> the fix, he couldn't.  Then nor could I.
>>
>> The problem is trivially reproducible with NFSv3, and not at all with
>> NFSv4.  The reason is the missing d_weak_revalidate.
>>
>> We could simply add d_weak_revalidate for NFSv4, but given that it
>> has been missing for 4.5 years, and the only time anyone noticed was
>> when the ommission resulted in a better user experience, I do wonder
>> if
>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>> does
>> it serve?  I couldn't find one.
>>
>> Thanks,
>> NeilBrown
>>
>> For reference, see
>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>> d_weak_revalidate dentry op")
>>
>>
>>
>> To reproduce the problem at home, on a system that uses systemd:
>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>> 3/ loop-mount the filesystem image read-only somewhere
>> 4/ reboot
>>
>> If you choose v4, the reboot will succeed, possibly after a 90second
>> timeout.
>> If you choose v3, the reboot will hang indefinitely in systemd-
>> shutdown while
>> remounting the nfs filesystem read-only.
>>
>> If you don't use "noac" it can still hang, but only if something
>> slows
>> down the reboot enough that attributes have timed out by the time
>> that
>> systemd-shutdown runs.  This happens for our customer.
>>
>> If the loop-mounted filesystem is not read-only, you get other
>> problems.
>>
>> We really want systemd to figure out that the loop-mount needs to be
>> unmounted first.  I have ideas concerning that, but it is messy.  But
>> that isn't the only bug here.
>
> The main purpose of d_weak_revalidate() was to catch the issues that
> arise when someone changes the contents of the current working
> directory or its parent on the server. Since '.' and '..' are treated
> specially in the lookup code, they would not be revalidated without
> special treatment. That leads to issues when looking up files as
> ./ or ../, since the client won't detect that its
> dcache is stale until it tries to use the cached dentry+inode.
>
> The one thing that has changed since its introduction is, I believe,
> the ESTALE handling in the VFS layer. That might fix a lot of the
> dcache lookup bugs that were previously handled by 
> d_weak_revalidate().
> I haven't done an audit to figure out if it actually can handle all of
> them.
>

 It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:

 vfs: allow umount to handle mountpoints without revalidating them
>>>
>>> You say in the comment for that commit:
>>>
>>>  but there
>>> are cases where we do want to revalidate the root of the fs.
>>>
>>> Do you happen to remember what those cases are?
>>>
>>
>> Not exactly, but I _think_ I might have been assuming that we needed to
>> ensure that the inode attrs on the root were up to date after the
>> pathwalk.
>>
>> I think that was probably wrong. d_revalidate is really intended to
>> ensure that the dentry in question still points to the same inode. In
>> the case of the root of the mount

Re: Do we really need d_weak_revalidate???

2017-08-21 Thread NeilBrown

On Fri, Aug 18 2017, Ian Kent wrote:

> On 18/08/17 13:24, NeilBrown wrote:
>> On Thu, Aug 17 2017, Ian Kent wrote:
>> 
>>> On 16/08/17 19:34, Jeff Layton wrote:
 On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
> On Mon, Aug 14 2017, Jeff Layton wrote:
>
>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>>> On Fri, Aug 11 2017, Jeff Layton wrote:
>>>
 On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>> flag and introduced the d_weak_revalidate dentry operation instead.
>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>> and added the new dentry operation to NFS dentries  but not to
>> NFSv4
>> dentries.
>>
>> And nobody noticed.
>>
>> Until today.
>>
>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>> NFS
>> filesystem hangs because the network has been deconfigured.  This
>> makes
>> perfect sense and I suggested a code change to fix the problem.
>> However when a colleague was trying to reproduce the problem to
>> validate
>> the fix, he couldn't.  Then nor could I.
>>
>> The problem is trivially reproducible with NFSv3, and not at all with
>> NFSv4.  The reason is the missing d_weak_revalidate.
>>
>> We could simply add d_weak_revalidate for NFSv4, but given that it
>> has been missing for 4.5 years, and the only time anyone noticed was
>> when the ommission resulted in a better user experience, I do wonder
>> if
>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>> does
>> it serve?  I couldn't find one.
>>
>> Thanks,
>> NeilBrown
>>
>> For reference, see
>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>> d_weak_revalidate dentry op")
>>
>>
>>
>> To reproduce the problem at home, on a system that uses systemd:
>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>> 3/ loop-mount the filesystem image read-only somewhere
>> 4/ reboot
>>
>> If you choose v4, the reboot will succeed, possibly after a 90second
>> timeout.
>> If you choose v3, the reboot will hang indefinitely in systemd-
>> shutdown while
>> remounting the nfs filesystem read-only.
>>
>> If you don't use "noac" it can still hang, but only if something
>> slows
>> down the reboot enough that attributes have timed out by the time
>> that
>> systemd-shutdown runs.  This happens for our customer.
>>
>> If the loop-mounted filesystem is not read-only, you get other
>> problems.
>>
>> We really want systemd to figure out that the loop-mount needs to be
>> unmounted first.  I have ideas concerning that, but it is messy.  But
>> that isn't the only bug here.
>
> The main purpose of d_weak_revalidate() was to catch the issues that
> arise when someone changes the contents of the current working
> directory or its parent on the server. Since '.' and '..' are treated
> specially in the lookup code, they would not be revalidated without
> special treatment. That leads to issues when looking up files as
> ./ or ../, since the client won't detect that its
> dcache is stale until it tries to use the cached dentry+inode.
>
> The one thing that has changed since its introduction is, I believe,
> the ESTALE handling in the VFS layer. That might fix a lot of the
> dcache lookup bugs that were previously handled by 
> d_weak_revalidate().
> I haven't done an audit to figure out if it actually can handle all of
> them.
>

 It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:

 vfs: allow umount to handle mountpoints without revalidating them
>>>
>>> You say in the comment for that commit:
>>>
>>>  but there
>>> are cases where we do want to revalidate the root of the fs.
>>>
>>> Do you happen to remember what those cases are?
>>>
>>
>> Not exactly, but I _think_ I might have been assuming that we needed to
>> ensure that the inode attrs on the root were up to date after the
>> pathwalk.
>>
>> I think that was probably wrong. d_revalidate is really intended to
>> ensure that the dentry in question still points to the same inode. In
>> the case of the root of the mount

Re: Do we really need d_weak_revalidate???

2017-08-18 Thread Ian Kent

On 18/08/17 14:47, Ian Kent wrote:
> On 18/08/17 13:24, NeilBrown wrote:
>> On Thu, Aug 17 2017, Ian Kent wrote:
>>
>>> On 16/08/17 19:34, Jeff Layton wrote:
 On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
> On Mon, Aug 14 2017, Jeff Layton wrote:
>
>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>>> On Fri, Aug 11 2017, Jeff Layton wrote:
>>>
 On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>> flag and introduced the d_weak_revalidate dentry operation instead.
>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>> and added the new dentry operation to NFS dentries  but not to
>> NFSv4
>> dentries.
>>
>> And nobody noticed.
>>
>> Until today.
>>
>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>> NFS
>> filesystem hangs because the network has been deconfigured.  This
>> makes
>> perfect sense and I suggested a code change to fix the problem.
>> However when a colleague was trying to reproduce the problem to
>> validate
>> the fix, he couldn't.  Then nor could I.
>>
>> The problem is trivially reproducible with NFSv3, and not at all with
>> NFSv4.  The reason is the missing d_weak_revalidate.
>>
>> We could simply add d_weak_revalidate for NFSv4, but given that it
>> has been missing for 4.5 years, and the only time anyone noticed was
>> when the ommission resulted in a better user experience, I do wonder
>> if
>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>> does
>> it serve?  I couldn't find one.
>>
>> Thanks,
>> NeilBrown
>>
>> For reference, see
>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>> d_weak_revalidate dentry op")
>>
>>
>>
>> To reproduce the problem at home, on a system that uses systemd:
>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>> 3/ loop-mount the filesystem image read-only somewhere
>> 4/ reboot
>>
>> If you choose v4, the reboot will succeed, possibly after a 90second
>> timeout.
>> If you choose v3, the reboot will hang indefinitely in systemd-
>> shutdown while
>> remounting the nfs filesystem read-only.
>>
>> If you don't use "noac" it can still hang, but only if something
>> slows
>> down the reboot enough that attributes have timed out by the time
>> that
>> systemd-shutdown runs.  This happens for our customer.
>>
>> If the loop-mounted filesystem is not read-only, you get other
>> problems.
>>
>> We really want systemd to figure out that the loop-mount needs to be
>> unmounted first.  I have ideas concerning that, but it is messy.  But
>> that isn't the only bug here.
>
> The main purpose of d_weak_revalidate() was to catch the issues that
> arise when someone changes the contents of the current working
> directory or its parent on the server. Since '.' and '..' are treated
> specially in the lookup code, they would not be revalidated without
> special treatment. That leads to issues when looking up files as
> ./ or ../, since the client won't detect that its
> dcache is stale until it tries to use the cached dentry+inode.
>
> The one thing that has changed since its introduction is, I believe,
> the ESTALE handling in the VFS layer. That might fix a lot of the
> dcache lookup bugs that were previously handled by 
> d_weak_revalidate().
> I haven't done an audit to figure out if it actually can handle all of
> them.
>

 It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:

 vfs: allow umount to handle mountpoints without revalidating them
>>>
>>> You say in the comment for that commit:
>>>
>>>  but there
>>> are cases where we do want to revalidate the root of the fs.
>>>
>>> Do you happen to remember what those cases are?
>>>
>>
>> Not exactly, but I _think_ I might have been assuming that we needed to
>> ensure that the inode attrs on the root were up to date after the
>> pathwalk.
>>
>> I think that was probably wrong. d_revalidate is really intended to
>> ensure that the dentry in question still points to the same inode. In
>> the case of the root of the mount though,

Re: Do we really need d_weak_revalidate???

2017-08-18 Thread Ian Kent

On 18/08/17 14:47, Ian Kent wrote:
> On 18/08/17 13:24, NeilBrown wrote:
>> On Thu, Aug 17 2017, Ian Kent wrote:
>>
>>> On 16/08/17 19:34, Jeff Layton wrote:
 On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
> On Mon, Aug 14 2017, Jeff Layton wrote:
>
>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>>> On Fri, Aug 11 2017, Jeff Layton wrote:
>>>
 On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>> flag and introduced the d_weak_revalidate dentry operation instead.
>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>> and added the new dentry operation to NFS dentries  but not to
>> NFSv4
>> dentries.
>>
>> And nobody noticed.
>>
>> Until today.
>>
>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>> NFS
>> filesystem hangs because the network has been deconfigured.  This
>> makes
>> perfect sense and I suggested a code change to fix the problem.
>> However when a colleague was trying to reproduce the problem to
>> validate
>> the fix, he couldn't.  Then nor could I.
>>
>> The problem is trivially reproducible with NFSv3, and not at all with
>> NFSv4.  The reason is the missing d_weak_revalidate.
>>
>> We could simply add d_weak_revalidate for NFSv4, but given that it
>> has been missing for 4.5 years, and the only time anyone noticed was
>> when the ommission resulted in a better user experience, I do wonder
>> if
>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>> does
>> it serve?  I couldn't find one.
>>
>> Thanks,
>> NeilBrown
>>
>> For reference, see
>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>> d_weak_revalidate dentry op")
>>
>>
>>
>> To reproduce the problem at home, on a system that uses systemd:
>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>> 3/ loop-mount the filesystem image read-only somewhere
>> 4/ reboot
>>
>> If you choose v4, the reboot will succeed, possibly after a 90second
>> timeout.
>> If you choose v3, the reboot will hang indefinitely in systemd-
>> shutdown while
>> remounting the nfs filesystem read-only.
>>
>> If you don't use "noac" it can still hang, but only if something
>> slows
>> down the reboot enough that attributes have timed out by the time
>> that
>> systemd-shutdown runs.  This happens for our customer.
>>
>> If the loop-mounted filesystem is not read-only, you get other
>> problems.
>>
>> We really want systemd to figure out that the loop-mount needs to be
>> unmounted first.  I have ideas concerning that, but it is messy.  But
>> that isn't the only bug here.
>
> The main purpose of d_weak_revalidate() was to catch the issues that
> arise when someone changes the contents of the current working
> directory or its parent on the server. Since '.' and '..' are treated
> specially in the lookup code, they would not be revalidated without
> special treatment. That leads to issues when looking up files as
> ./ or ../, since the client won't detect that its
> dcache is stale until it tries to use the cached dentry+inode.
>
> The one thing that has changed since its introduction is, I believe,
> the ESTALE handling in the VFS layer. That might fix a lot of the
> dcache lookup bugs that were previously handled by 
> d_weak_revalidate().
> I haven't done an audit to figure out if it actually can handle all of
> them.
>

 It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:

 vfs: allow umount to handle mountpoints without revalidating them
>>>
>>> You say in the comment for that commit:
>>>
>>>  but there
>>> are cases where we do want to revalidate the root of the fs.
>>>
>>> Do you happen to remember what those cases are?
>>>
>>
>> Not exactly, but I _think_ I might have been assuming that we needed to
>> ensure that the inode attrs on the root were up to date after the
>> pathwalk.
>>
>> I think that was probably wrong. d_revalidate is really intended to
>> ensure that the dentry in question still points to the same inode. In
>> the case of the root of the mount though,

Re: Do we really need d_weak_revalidate???

2017-08-18 Thread Ian Kent

On 18/08/17 13:24, NeilBrown wrote:
> On Thu, Aug 17 2017, Ian Kent wrote:
> 
>> On 16/08/17 19:34, Jeff Layton wrote:
>>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
 On Mon, Aug 14 2017, Jeff Layton wrote:

> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>> On Fri, Aug 11 2017, Jeff Layton wrote:
>>
>>> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
 On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
> flag and introduced the d_weak_revalidate dentry operation instead.
> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
> and added the new dentry operation to NFS dentries  but not to
> NFSv4
> dentries.
>
> And nobody noticed.
>
> Until today.
>
> A customer reports a situation where mount(,MS_REMOUNT,..) on an
> NFS
> filesystem hangs because the network has been deconfigured.  This
> makes
> perfect sense and I suggested a code change to fix the problem.
> However when a colleague was trying to reproduce the problem to
> validate
> the fix, he couldn't.  Then nor could I.
>
> The problem is trivially reproducible with NFSv3, and not at all with
> NFSv4.  The reason is the missing d_weak_revalidate.
>
> We could simply add d_weak_revalidate for NFSv4, but given that it
> has been missing for 4.5 years, and the only time anyone noticed was
> when the ommission resulted in a better user experience, I do wonder
> if
> we need to.  Can we just discard d_weak_revalidate?  What purpose
> does
> it serve?  I couldn't find one.
>
> Thanks,
> NeilBrown
>
> For reference, see
> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
> d_weak_revalidate dentry op")
>
>
>
> To reproduce the problem at home, on a system that uses systemd:
> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
> 3/ loop-mount the filesystem image read-only somewhere
> 4/ reboot
>
> If you choose v4, the reboot will succeed, possibly after a 90second
> timeout.
> If you choose v3, the reboot will hang indefinitely in systemd-
> shutdown while
> remounting the nfs filesystem read-only.
>
> If you don't use "noac" it can still hang, but only if something
> slows
> down the reboot enough that attributes have timed out by the time
> that
> systemd-shutdown runs.  This happens for our customer.
>
> If the loop-mounted filesystem is not read-only, you get other
> problems.
>
> We really want systemd to figure out that the loop-mount needs to be
> unmounted first.  I have ideas concerning that, but it is messy.  But
> that isn't the only bug here.

 The main purpose of d_weak_revalidate() was to catch the issues that
 arise when someone changes the contents of the current working
 directory or its parent on the server. Since '.' and '..' are treated
 specially in the lookup code, they would not be revalidated without
 special treatment. That leads to issues when looking up files as
 ./ or ../, since the client won't detect that its
 dcache is stale until it tries to use the cached dentry+inode.

 The one thing that has changed since its introduction is, I believe,
 the ESTALE handling in the VFS layer. That might fix a lot of the
 dcache lookup bugs that were previously handled by d_weak_revalidate().
 I haven't done an audit to figure out if it actually can handle all of
 them.

>>>
>>> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>>>
>>> vfs: allow umount to handle mountpoints without revalidating them
>>
>> You say in the comment for that commit:
>>
>>  but there
>> are cases where we do want to revalidate the root of the fs.
>>
>> Do you happen to remember what those cases are?
>>
>
> Not exactly, but I _think_ I might have been assuming that we needed to
> ensure that the inode attrs on the root were up to date after the
> pathwalk.
>
> I think that was probably wrong. d_revalidate is really intended to
> ensure that the dentry in question still points to the same inode. In
> the case of the root of the mount though, we don't really care about the
> dentry on the server at all. We're attaching the root of the mount to an
> inode and don't care of the dentry name

Re: Do we really need d_weak_revalidate???

2017-08-18 Thread Ian Kent

On 18/08/17 13:24, NeilBrown wrote:
> On Thu, Aug 17 2017, Ian Kent wrote:
> 
>> On 16/08/17 19:34, Jeff Layton wrote:
>>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
 On Mon, Aug 14 2017, Jeff Layton wrote:

> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>> On Fri, Aug 11 2017, Jeff Layton wrote:
>>
>>> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
 On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
> flag and introduced the d_weak_revalidate dentry operation instead.
> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
> and added the new dentry operation to NFS dentries  but not to
> NFSv4
> dentries.
>
> And nobody noticed.
>
> Until today.
>
> A customer reports a situation where mount(,MS_REMOUNT,..) on an
> NFS
> filesystem hangs because the network has been deconfigured.  This
> makes
> perfect sense and I suggested a code change to fix the problem.
> However when a colleague was trying to reproduce the problem to
> validate
> the fix, he couldn't.  Then nor could I.
>
> The problem is trivially reproducible with NFSv3, and not at all with
> NFSv4.  The reason is the missing d_weak_revalidate.
>
> We could simply add d_weak_revalidate for NFSv4, but given that it
> has been missing for 4.5 years, and the only time anyone noticed was
> when the ommission resulted in a better user experience, I do wonder
> if
> we need to.  Can we just discard d_weak_revalidate?  What purpose
> does
> it serve?  I couldn't find one.
>
> Thanks,
> NeilBrown
>
> For reference, see
> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
> d_weak_revalidate dentry op")
>
>
>
> To reproduce the problem at home, on a system that uses systemd:
> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
> 3/ loop-mount the filesystem image read-only somewhere
> 4/ reboot
>
> If you choose v4, the reboot will succeed, possibly after a 90second
> timeout.
> If you choose v3, the reboot will hang indefinitely in systemd-
> shutdown while
> remounting the nfs filesystem read-only.
>
> If you don't use "noac" it can still hang, but only if something
> slows
> down the reboot enough that attributes have timed out by the time
> that
> systemd-shutdown runs.  This happens for our customer.
>
> If the loop-mounted filesystem is not read-only, you get other
> problems.
>
> We really want systemd to figure out that the loop-mount needs to be
> unmounted first.  I have ideas concerning that, but it is messy.  But
> that isn't the only bug here.

 The main purpose of d_weak_revalidate() was to catch the issues that
 arise when someone changes the contents of the current working
 directory or its parent on the server. Since '.' and '..' are treated
 specially in the lookup code, they would not be revalidated without
 special treatment. That leads to issues when looking up files as
 ./ or ../, since the client won't detect that its
 dcache is stale until it tries to use the cached dentry+inode.

 The one thing that has changed since its introduction is, I believe,
 the ESTALE handling in the VFS layer. That might fix a lot of the
 dcache lookup bugs that were previously handled by d_weak_revalidate().
 I haven't done an audit to figure out if it actually can handle all of
 them.

>>>
>>> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>>>
>>> vfs: allow umount to handle mountpoints without revalidating them
>>
>> You say in the comment for that commit:
>>
>>  but there
>> are cases where we do want to revalidate the root of the fs.
>>
>> Do you happen to remember what those cases are?
>>
>
> Not exactly, but I _think_ I might have been assuming that we needed to
> ensure that the inode attrs on the root were up to date after the
> pathwalk.
>
> I think that was probably wrong. d_revalidate is really intended to
> ensure that the dentry in question still points to the same inode. In
> the case of the root of the mount though, we don't really care about the
> dentry on the server at all. We're attaching the root of the mount to an
> inode and don't care of the dentry name

Re: Do we really need d_weak_revalidate???

2017-08-17 Thread NeilBrown

On Thu, Aug 17 2017, Ian Kent wrote:

> On 16/08/17 19:34, Jeff Layton wrote:
>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>>
 On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
> On Fri, Aug 11 2017, Jeff Layton wrote:
>
>> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
 Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
 flag and introduced the d_weak_revalidate dentry operation instead.
 We duly removed the flag from NFS superblocks and NFSv4 superblocks,
 and added the new dentry operation to NFS dentries  but not to
 NFSv4
 dentries.

 And nobody noticed.

 Until today.

 A customer reports a situation where mount(,MS_REMOUNT,..) on an
 NFS
 filesystem hangs because the network has been deconfigured.  This
 makes
 perfect sense and I suggested a code change to fix the problem.
 However when a colleague was trying to reproduce the problem to
 validate
 the fix, he couldn't.  Then nor could I.

 The problem is trivially reproducible with NFSv3, and not at all with
 NFSv4.  The reason is the missing d_weak_revalidate.

 We could simply add d_weak_revalidate for NFSv4, but given that it
 has been missing for 4.5 years, and the only time anyone noticed was
 when the ommission resulted in a better user experience, I do wonder
 if
 we need to.  Can we just discard d_weak_revalidate?  What purpose
 does
 it serve?  I couldn't find one.

 Thanks,
 NeilBrown

 For reference, see
 Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
 d_weak_revalidate dentry op")

 To reproduce the problem at home, on a system that uses systemd:
 1/ place (or find) a filesystem image in a file on an NFS filesystem.
 2/ mount the nfs filesystem with "noac" - choose v3 or v4
 3/ loop-mount the filesystem image read-only somewhere
 4/ reboot

 If you choose v4, the reboot will succeed, possibly after a 90second
 timeout.
 If you choose v3, the reboot will hang indefinitely in systemd-
 shutdown while
 remounting the nfs filesystem read-only.

 If you don't use "noac" it can still hang, but only if something
 slows
 down the reboot enough that attributes have timed out by the time
 that
 systemd-shutdown runs.  This happens for our customer.

 If the loop-mounted filesystem is not read-only, you get other
 problems.

 We really want systemd to figure out that the loop-mount needs to be
 unmounted first.  I have ideas concerning that, but it is messy.  But
 that isn't the only bug here.
>>>
>>> The main purpose of d_weak_revalidate() was to catch the issues that
>>> arise when someone changes the contents of the current working
>>> directory or its parent on the server. Since '.' and '..' are treated
>>> specially in the lookup code, they would not be revalidated without
>>> special treatment. That leads to issues when looking up files as
>>> ./ or ../, since the client won't detect that its
>>> dcache is stale until it tries to use the cached dentry+inode.
>>>
>>> The one thing that has changed since its introduction is, I believe,
>>> the ESTALE handling in the VFS layer. That might fix a lot of the
>>> dcache lookup bugs that were previously handled by d_weak_revalidate().
>>> I haven't done an audit to figure out if it actually can handle all of
>>> them.
>>>
>>
>> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>>
>> vfs: allow umount to handle mountpoints without revalidating them
>
> You say in the comment for that commit:
>
>  but there
> are cases where we do want to revalidate the root of the fs.
>
> Do you happen to remember what those cases are?
>

 Not exactly, but I _think_ I might have been assuming that we needed to
 ensure that the inode attrs on the root were up to date after the
 pathwalk.

 I think that was probably wrong. d_revalidate is really intended to
 ensure that the dentry in question still points to the same inode. In
 the case of the root of the mount though, we don't really care about the
 dentry on the server at all. We're attaching the root of the mount to an
 inode and don't care of the dentry name changes. If we do need to ensure
 the inode attrs are updated, we'll just revalidate them at that point.

>>
>> Possibly the fact that

Re: Do we really need d_weak_revalidate???

2017-08-17 Thread NeilBrown

On Thu, Aug 17 2017, Ian Kent wrote:

> On 16/08/17 19:34, Jeff Layton wrote:
>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>>
 On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
> On Fri, Aug 11 2017, Jeff Layton wrote:
>
>> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
 Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
 flag and introduced the d_weak_revalidate dentry operation instead.
 We duly removed the flag from NFS superblocks and NFSv4 superblocks,
 and added the new dentry operation to NFS dentries  but not to
 NFSv4
 dentries.

 And nobody noticed.

 Until today.

 A customer reports a situation where mount(,MS_REMOUNT,..) on an
 NFS
 filesystem hangs because the network has been deconfigured.  This
 makes
 perfect sense and I suggested a code change to fix the problem.
 However when a colleague was trying to reproduce the problem to
 validate
 the fix, he couldn't.  Then nor could I.

 The problem is trivially reproducible with NFSv3, and not at all with
 NFSv4.  The reason is the missing d_weak_revalidate.

 We could simply add d_weak_revalidate for NFSv4, but given that it
 has been missing for 4.5 years, and the only time anyone noticed was
 when the ommission resulted in a better user experience, I do wonder
 if
 we need to.  Can we just discard d_weak_revalidate?  What purpose
 does
 it serve?  I couldn't find one.

 Thanks,
 NeilBrown

 For reference, see
 Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
 d_weak_revalidate dentry op")

 To reproduce the problem at home, on a system that uses systemd:
 1/ place (or find) a filesystem image in a file on an NFS filesystem.
 2/ mount the nfs filesystem with "noac" - choose v3 or v4
 3/ loop-mount the filesystem image read-only somewhere
 4/ reboot

 If you choose v4, the reboot will succeed, possibly after a 90second
 timeout.
 If you choose v3, the reboot will hang indefinitely in systemd-
 shutdown while
 remounting the nfs filesystem read-only.

 If you don't use "noac" it can still hang, but only if something
 slows
 down the reboot enough that attributes have timed out by the time
 that
 systemd-shutdown runs.  This happens for our customer.

 If the loop-mounted filesystem is not read-only, you get other
 problems.

 We really want systemd to figure out that the loop-mount needs to be
 unmounted first.  I have ideas concerning that, but it is messy.  But
 that isn't the only bug here.
>>>
>>> The main purpose of d_weak_revalidate() was to catch the issues that
>>> arise when someone changes the contents of the current working
>>> directory or its parent on the server. Since '.' and '..' are treated
>>> specially in the lookup code, they would not be revalidated without
>>> special treatment. That leads to issues when looking up files as
>>> ./ or ../, since the client won't detect that its
>>> dcache is stale until it tries to use the cached dentry+inode.
>>>
>>> The one thing that has changed since its introduction is, I believe,
>>> the ESTALE handling in the VFS layer. That might fix a lot of the
>>> dcache lookup bugs that were previously handled by d_weak_revalidate().
>>> I haven't done an audit to figure out if it actually can handle all of
>>> them.
>>>
>>
>> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>>
>> vfs: allow umount to handle mountpoints without revalidating them
>
> You say in the comment for that commit:
>
>  but there
> are cases where we do want to revalidate the root of the fs.
>
> Do you happen to remember what those cases are?
>

 Not exactly, but I _think_ I might have been assuming that we needed to
 ensure that the inode attrs on the root were up to date after the
 pathwalk.

 I think that was probably wrong. d_revalidate is really intended to
 ensure that the dentry in question still points to the same inode. In
 the case of the root of the mount though, we don't really care about the
 dentry on the server at all. We're attaching the root of the mount to an
 inode and don't care of the dentry name changes. If we do need to ensure
 the inode attrs are updated, we'll just revalidate them at that point.

>>
>> Possibly the fact that

Re: Do we really need d_weak_revalidate???

2017-08-16 Thread Ian Kent

On 16/08/17 19:34, Jeff Layton wrote:
> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>
>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
 On Fri, Aug 11 2017, Jeff Layton wrote:

> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>> and added the new dentry operation to NFS dentries  but not to
>>> NFSv4
>>> dentries.
>>>
>>> And nobody noticed.
>>>
>>> Until today.
>>>
>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>> NFS
>>> filesystem hangs because the network has been deconfigured.  This
>>> makes
>>> perfect sense and I suggested a code change to fix the problem.
>>> However when a colleague was trying to reproduce the problem to
>>> validate
>>> the fix, he couldn't.  Then nor could I.
>>>
>>> The problem is trivially reproducible with NFSv3, and not at all with
>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>>
>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>> has been missing for 4.5 years, and the only time anyone noticed was
>>> when the ommission resulted in a better user experience, I do wonder
>>> if
>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>> does
>>> it serve?  I couldn't find one.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>> For reference, see
>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>> d_weak_revalidate dentry op")
>>>
>>>
>>>
>>> To reproduce the problem at home, on a system that uses systemd:
>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>> 3/ loop-mount the filesystem image read-only somewhere
>>> 4/ reboot
>>>
>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>> timeout.
>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>> shutdown while
>>> remounting the nfs filesystem read-only.
>>>
>>> If you don't use "noac" it can still hang, but only if something
>>> slows
>>> down the reboot enough that attributes have timed out by the time
>>> that
>>> systemd-shutdown runs.  This happens for our customer.
>>>
>>> If the loop-mounted filesystem is not read-only, you get other
>>> problems.
>>>
>>> We really want systemd to figure out that the loop-mount needs to be
>>> unmounted first.  I have ideas concerning that, but it is messy.  But
>>> that isn't the only bug here.
>>
>> The main purpose of d_weak_revalidate() was to catch the issues that
>> arise when someone changes the contents of the current working
>> directory or its parent on the server. Since '.' and '..' are treated
>> specially in the lookup code, they would not be revalidated without
>> special treatment. That leads to issues when looking up files as
>> ./ or ../, since the client won't detect that its
>> dcache is stale until it tries to use the cached dentry+inode.
>>
>> The one thing that has changed since its introduction is, I believe,
>> the ESTALE handling in the VFS layer. That might fix a lot of the
>> dcache lookup bugs that were previously handled by d_weak_revalidate().
>> I haven't done an audit to figure out if it actually can handle all of
>> them.
>>
>
> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>
> vfs: allow umount to handle mountpoints without revalidating them

 You say in the comment for that commit:

  but there
 are cases where we do want to revalidate the root of the fs.

 Do you happen to remember what those cases are?

>>>
>>> Not exactly, but I _think_ I might have been assuming that we needed to
>>> ensure that the inode attrs on the root were up to date after the
>>> pathwalk.
>>>
>>> I think that was probably wrong. d_revalidate is really intended to
>>> ensure that the dentry in question still points to the same inode. In
>>> the case of the root of the mount though, we don't really care about the
>>> dentry on the server at all. We're attaching the root of the mount to an
>>> inode and don't care of the dentry name changes. If we do need to ensure
>>> the inode attrs are updated, we'll just revalidate them at that point.
>>>
>
> Possibly the fact that we no longer try to revalidate during unmount
> means that this is no longer necessary?
>
> The original patch that added d_weak_revalidate

Re: Do we really need d_weak_revalidate???

2017-08-16 Thread Ian Kent

On 16/08/17 19:34, Jeff Layton wrote:
> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>
>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
 On Fri, Aug 11 2017, Jeff Layton wrote:

> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>> and added the new dentry operation to NFS dentries  but not to
>>> NFSv4
>>> dentries.
>>>
>>> And nobody noticed.
>>>
>>> Until today.
>>>
>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>> NFS
>>> filesystem hangs because the network has been deconfigured.  This
>>> makes
>>> perfect sense and I suggested a code change to fix the problem.
>>> However when a colleague was trying to reproduce the problem to
>>> validate
>>> the fix, he couldn't.  Then nor could I.
>>>
>>> The problem is trivially reproducible with NFSv3, and not at all with
>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>>
>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>> has been missing for 4.5 years, and the only time anyone noticed was
>>> when the ommission resulted in a better user experience, I do wonder
>>> if
>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>> does
>>> it serve?  I couldn't find one.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>> For reference, see
>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>> d_weak_revalidate dentry op")
>>>
>>>
>>>
>>> To reproduce the problem at home, on a system that uses systemd:
>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>> 3/ loop-mount the filesystem image read-only somewhere
>>> 4/ reboot
>>>
>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>> timeout.
>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>> shutdown while
>>> remounting the nfs filesystem read-only.
>>>
>>> If you don't use "noac" it can still hang, but only if something
>>> slows
>>> down the reboot enough that attributes have timed out by the time
>>> that
>>> systemd-shutdown runs.  This happens for our customer.
>>>
>>> If the loop-mounted filesystem is not read-only, you get other
>>> problems.
>>>
>>> We really want systemd to figure out that the loop-mount needs to be
>>> unmounted first.  I have ideas concerning that, but it is messy.  But
>>> that isn't the only bug here.
>>
>> The main purpose of d_weak_revalidate() was to catch the issues that
>> arise when someone changes the contents of the current working
>> directory or its parent on the server. Since '.' and '..' are treated
>> specially in the lookup code, they would not be revalidated without
>> special treatment. That leads to issues when looking up files as
>> ./ or ../, since the client won't detect that its
>> dcache is stale until it tries to use the cached dentry+inode.
>>
>> The one thing that has changed since its introduction is, I believe,
>> the ESTALE handling in the VFS layer. That might fix a lot of the
>> dcache lookup bugs that were previously handled by d_weak_revalidate().
>> I haven't done an audit to figure out if it actually can handle all of
>> them.
>>
>
> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>
> vfs: allow umount to handle mountpoints without revalidating them

 You say in the comment for that commit:

  but there
 are cases where we do want to revalidate the root of the fs.

 Do you happen to remember what those cases are?

>>>
>>> Not exactly, but I _think_ I might have been assuming that we needed to
>>> ensure that the inode attrs on the root were up to date after the
>>> pathwalk.
>>>
>>> I think that was probably wrong. d_revalidate is really intended to
>>> ensure that the dentry in question still points to the same inode. In
>>> the case of the root of the mount though, we don't really care about the
>>> dentry on the server at all. We're attaching the root of the mount to an
>>> inode and don't care of the dentry name changes. If we do need to ensure
>>> the inode attrs are updated, we'll just revalidate them at that point.
>>>
>
> Possibly the fact that we no longer try to revalidate during unmount
> means that this is no longer necessary?
>
> The original patch that added d_weak_revalidate

Re: Do we really need d_weak_revalidate???

2017-08-16 Thread NeilBrown

On Wed, Aug 16 2017, Jeff Layton wrote:

> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>> On Mon, Aug 14 2017, Jeff Layton wrote:
>> 
>> > On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>> > > On Fri, Aug 11 2017, Jeff Layton wrote:
>> > > 
>> > > > On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>> > > > > On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>> > > > > > Funny story.  4.5 years ago we discarded the FS_REVAL_DOT 
>> > > > > > superblock
>> > > > > > flag and introduced the d_weak_revalidate dentry operation instead.
>> > > > > > We duly removed the flag from NFS superblocks and NFSv4 
>> > > > > > superblocks,
>> > > > > > and added the new dentry operation to NFS dentries  but not to
>> > > > > > NFSv4
>> > > > > > dentries.
>> > > > > > 
>> > > > > > And nobody noticed.
>> > > > > > 
>> > > > > > Until today.
>> > > > > > 
>> > > > > > A customer reports a situation where mount(,MS_REMOUNT,..) on 
>> > > > > > an
>> > > > > > NFS
>> > > > > > filesystem hangs because the network has been deconfigured.  This
>> > > > > > makes
>> > > > > > perfect sense and I suggested a code change to fix the problem.
>> > > > > > However when a colleague was trying to reproduce the problem to
>> > > > > > validate
>> > > > > > the fix, he couldn't.  Then nor could I.
>> > > > > > 
>> > > > > > The problem is trivially reproducible with NFSv3, and not at all 
>> > > > > > with
>> > > > > > NFSv4.  The reason is the missing d_weak_revalidate.
>> > > > > > 
>> > > > > > We could simply add d_weak_revalidate for NFSv4, but given that it
>> > > > > > has been missing for 4.5 years, and the only time anyone noticed 
>> > > > > > was
>> > > > > > when the ommission resulted in a better user experience, I do 
>> > > > > > wonder
>> > > > > > if
>> > > > > > we need to.  Can we just discard d_weak_revalidate?  What purpose
>> > > > > > does
>> > > > > > it serve?  I couldn't find one.
>> > > > > > 
>> > > > > > Thanks,
>> > > > > > NeilBrown
>> > > > > > 
>> > > > > > For reference, see
>> > > > > > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>> > > > > > d_weak_revalidate dentry op")
>> > > > > > 
>> > > > > > 
>> > > > > > 
>> > > > > > To reproduce the problem at home, on a system that uses systemd:
>> > > > > > 1/ place (or find) a filesystem image in a file on an NFS 
>> > > > > > filesystem.
>> > > > > > 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>> > > > > > 3/ loop-mount the filesystem image read-only somewhere
>> > > > > > 4/ reboot
>> > > > > > 
>> > > > > > If you choose v4, the reboot will succeed, possibly after a 
>> > > > > > 90second
>> > > > > > timeout.
>> > > > > > If you choose v3, the reboot will hang indefinitely in systemd-
>> > > > > > shutdown while
>> > > > > > remounting the nfs filesystem read-only.
>> > > > > > 
>> > > > > > If you don't use "noac" it can still hang, but only if something
>> > > > > > slows
>> > > > > > down the reboot enough that attributes have timed out by the time
>> > > > > > that
>> > > > > > systemd-shutdown runs.  This happens for our customer.
>> > > > > > 
>> > > > > > If the loop-mounted filesystem is not read-only, you get other
>> > > > > > problems.
>> > > > > > 
>> > > > > > We really want systemd to figure out that the loop-mount needs to 
>> > > > > > be
>> > > > > > unmounted first.  I have ideas concerning that, but it is messy.  
>> > > > > > But
>> > > > > > that isn't the only bug here.
>> > > > > 
>> > > > > The main purpose of d_weak_revalidate() was to catch the issues that
>> > > > > arise when someone changes the contents of the current working
>> > > > > directory or its parent on the server. Since '.' and '..' are treated
>> > > > > specially in the lookup code, they would not be revalidated without
>> > > > > special treatment. That leads to issues when looking up files as
>> > > > > ./ or ../, since the client won't detect that its
>> > > > > dcache is stale until it tries to use the cached dentry+inode.
>> > > > > 
>> > > > > The one thing that has changed since its introduction is, I believe,
>> > > > > the ESTALE handling in the VFS layer. That might fix a lot of the
>> > > > > dcache lookup bugs that were previously handled by 
>> > > > > d_weak_revalidate().
>> > > > > I haven't done an audit to figure out if it actually can handle all 
>> > > > > of
>> > > > > them.
>> > > > > 
>> > > > 
>> > > > It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>> > > > 
>> > > > vfs: allow umount to handle mountpoints without revalidating them
>> > > 
>> > > You say in the comment for that commit:
>> > > 
>> > >  but there
>> > > are cases where we do want to revalidate the root of the fs.
>> > > 
>> > > Do you happen to remember what those cases are?
>> > > 
>> > 
>> > Not exactly, but I _think_ I might have been assuming that we needed to
>> > ensure that the inode attrs on the root were up to date after the
>> > pathwalk.
>> > 
>> > I think that was

Re: Do we really need d_weak_revalidate???

2017-08-16 Thread NeilBrown

On Wed, Aug 16 2017, Jeff Layton wrote:

> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>> On Mon, Aug 14 2017, Jeff Layton wrote:
>> 
>> > On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>> > > On Fri, Aug 11 2017, Jeff Layton wrote:
>> > > 
>> > > > On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>> > > > > On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>> > > > > > Funny story.  4.5 years ago we discarded the FS_REVAL_DOT 
>> > > > > > superblock
>> > > > > > flag and introduced the d_weak_revalidate dentry operation instead.
>> > > > > > We duly removed the flag from NFS superblocks and NFSv4 
>> > > > > > superblocks,
>> > > > > > and added the new dentry operation to NFS dentries  but not to
>> > > > > > NFSv4
>> > > > > > dentries.
>> > > > > > 
>> > > > > > And nobody noticed.
>> > > > > > 
>> > > > > > Until today.
>> > > > > > 
>> > > > > > A customer reports a situation where mount(,MS_REMOUNT,..) on 
>> > > > > > an
>> > > > > > NFS
>> > > > > > filesystem hangs because the network has been deconfigured.  This
>> > > > > > makes
>> > > > > > perfect sense and I suggested a code change to fix the problem.
>> > > > > > However when a colleague was trying to reproduce the problem to
>> > > > > > validate
>> > > > > > the fix, he couldn't.  Then nor could I.
>> > > > > > 
>> > > > > > The problem is trivially reproducible with NFSv3, and not at all 
>> > > > > > with
>> > > > > > NFSv4.  The reason is the missing d_weak_revalidate.
>> > > > > > 
>> > > > > > We could simply add d_weak_revalidate for NFSv4, but given that it
>> > > > > > has been missing for 4.5 years, and the only time anyone noticed 
>> > > > > > was
>> > > > > > when the ommission resulted in a better user experience, I do 
>> > > > > > wonder
>> > > > > > if
>> > > > > > we need to.  Can we just discard d_weak_revalidate?  What purpose
>> > > > > > does
>> > > > > > it serve?  I couldn't find one.
>> > > > > > 
>> > > > > > Thanks,
>> > > > > > NeilBrown
>> > > > > > 
>> > > > > > For reference, see
>> > > > > > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>> > > > > > d_weak_revalidate dentry op")
>> > > > > > 
>> > > > > > 
>> > > > > > 
>> > > > > > To reproduce the problem at home, on a system that uses systemd:
>> > > > > > 1/ place (or find) a filesystem image in a file on an NFS 
>> > > > > > filesystem.
>> > > > > > 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>> > > > > > 3/ loop-mount the filesystem image read-only somewhere
>> > > > > > 4/ reboot
>> > > > > > 
>> > > > > > If you choose v4, the reboot will succeed, possibly after a 
>> > > > > > 90second
>> > > > > > timeout.
>> > > > > > If you choose v3, the reboot will hang indefinitely in systemd-
>> > > > > > shutdown while
>> > > > > > remounting the nfs filesystem read-only.
>> > > > > > 
>> > > > > > If you don't use "noac" it can still hang, but only if something
>> > > > > > slows
>> > > > > > down the reboot enough that attributes have timed out by the time
>> > > > > > that
>> > > > > > systemd-shutdown runs.  This happens for our customer.
>> > > > > > 
>> > > > > > If the loop-mounted filesystem is not read-only, you get other
>> > > > > > problems.
>> > > > > > 
>> > > > > > We really want systemd to figure out that the loop-mount needs to 
>> > > > > > be
>> > > > > > unmounted first.  I have ideas concerning that, but it is messy.  
>> > > > > > But
>> > > > > > that isn't the only bug here.
>> > > > > 
>> > > > > The main purpose of d_weak_revalidate() was to catch the issues that
>> > > > > arise when someone changes the contents of the current working
>> > > > > directory or its parent on the server. Since '.' and '..' are treated
>> > > > > specially in the lookup code, they would not be revalidated without
>> > > > > special treatment. That leads to issues when looking up files as
>> > > > > ./ or ../, since the client won't detect that its
>> > > > > dcache is stale until it tries to use the cached dentry+inode.
>> > > > > 
>> > > > > The one thing that has changed since its introduction is, I believe,
>> > > > > the ESTALE handling in the VFS layer. That might fix a lot of the
>> > > > > dcache lookup bugs that were previously handled by 
>> > > > > d_weak_revalidate().
>> > > > > I haven't done an audit to figure out if it actually can handle all 
>> > > > > of
>> > > > > them.
>> > > > > 
>> > > > 
>> > > > It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>> > > > 
>> > > > vfs: allow umount to handle mountpoints without revalidating them
>> > > 
>> > > You say in the comment for that commit:
>> > > 
>> > >  but there
>> > > are cases where we do want to revalidate the root of the fs.
>> > > 
>> > > Do you happen to remember what those cases are?
>> > > 
>> > 
>> > Not exactly, but I _think_ I might have been assuming that we needed to
>> > ensure that the inode attrs on the root were up to date after the
>> > pathwalk.
>> > 
>> > I think that was

Re: Do we really need d_weak_revalidate???

2017-08-16 Thread Jeff Layton

On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
> On Mon, Aug 14 2017, Jeff Layton wrote:
> 
> > On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
> > > On Fri, Aug 11 2017, Jeff Layton wrote:
> > > 
> > > > On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
> > > > > On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
> > > > > > Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
> > > > > > flag and introduced the d_weak_revalidate dentry operation instead.
> > > > > > We duly removed the flag from NFS superblocks and NFSv4 superblocks,
> > > > > > and added the new dentry operation to NFS dentries  but not to
> > > > > > NFSv4
> > > > > > dentries.
> > > > > > 
> > > > > > And nobody noticed.
> > > > > > 
> > > > > > Until today.
> > > > > > 
> > > > > > A customer reports a situation where mount(,MS_REMOUNT,..) on an
> > > > > > NFS
> > > > > > filesystem hangs because the network has been deconfigured.  This
> > > > > > makes
> > > > > > perfect sense and I suggested a code change to fix the problem.
> > > > > > However when a colleague was trying to reproduce the problem to
> > > > > > validate
> > > > > > the fix, he couldn't.  Then nor could I.
> > > > > > 
> > > > > > The problem is trivially reproducible with NFSv3, and not at all 
> > > > > > with
> > > > > > NFSv4.  The reason is the missing d_weak_revalidate.
> > > > > > 
> > > > > > We could simply add d_weak_revalidate for NFSv4, but given that it
> > > > > > has been missing for 4.5 years, and the only time anyone noticed was
> > > > > > when the ommission resulted in a better user experience, I do wonder
> > > > > > if
> > > > > > we need to.  Can we just discard d_weak_revalidate?  What purpose
> > > > > > does
> > > > > > it serve?  I couldn't find one.
> > > > > > 
> > > > > > Thanks,
> > > > > > NeilBrown
> > > > > > 
> > > > > > For reference, see
> > > > > > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
> > > > > > d_weak_revalidate dentry op")
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > To reproduce the problem at home, on a system that uses systemd:
> > > > > > 1/ place (or find) a filesystem image in a file on an NFS 
> > > > > > filesystem.
> > > > > > 2/ mount the nfs filesystem with "noac" - choose v3 or v4
> > > > > > 3/ loop-mount the filesystem image read-only somewhere
> > > > > > 4/ reboot
> > > > > > 
> > > > > > If you choose v4, the reboot will succeed, possibly after a 90second
> > > > > > timeout.
> > > > > > If you choose v3, the reboot will hang indefinitely in systemd-
> > > > > > shutdown while
> > > > > > remounting the nfs filesystem read-only.
> > > > > > 
> > > > > > If you don't use "noac" it can still hang, but only if something
> > > > > > slows
> > > > > > down the reboot enough that attributes have timed out by the time
> > > > > > that
> > > > > > systemd-shutdown runs.  This happens for our customer.
> > > > > > 
> > > > > > If the loop-mounted filesystem is not read-only, you get other
> > > > > > problems.
> > > > > > 
> > > > > > We really want systemd to figure out that the loop-mount needs to be
> > > > > > unmounted first.  I have ideas concerning that, but it is messy.  
> > > > > > But
> > > > > > that isn't the only bug here.
> > > > > 
> > > > > The main purpose of d_weak_revalidate() was to catch the issues that
> > > > > arise when someone changes the contents of the current working
> > > > > directory or its parent on the server. Since '.' and '..' are treated
> > > > > specially in the lookup code, they would not be revalidated without
> > > > > special treatment. That leads to issues when looking up files as
> > > > > ./ or ../, since the client won't detect that its
> > > > > dcache is stale until it tries to use the cached dentry+inode.
> > > > > 
> > > > > The one thing that has changed since its introduction is, I believe,
> > > > > the ESTALE handling in the VFS layer. That might fix a lot of the
> > > > > dcache lookup bugs that were previously handled by 
> > > > > d_weak_revalidate().
> > > > > I haven't done an audit to figure out if it actually can handle all of
> > > > > them.
> > > > > 
> > > > 
> > > > It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
> > > > 
> > > > vfs: allow umount to handle mountpoints without revalidating them
> > > 
> > > You say in the comment for that commit:
> > > 
> > >  but there
> > > are cases where we do want to revalidate the root of the fs.
> > > 
> > > Do you happen to remember what those cases are?
> > > 
> > 
> > Not exactly, but I _think_ I might have been assuming that we needed to
> > ensure that the inode attrs on the root were up to date after the
> > pathwalk.
> > 
> > I think that was probably wrong. d_revalidate is really intended to
> > ensure that the dentry in question still points to the same inode. In
> > the case of the root of the mount though, we don't really care about the
> > dentry on the server at all. We're attaching the root of

Re: Do we really need d_weak_revalidate???

2017-08-16 Thread Jeff Layton

On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
> On Mon, Aug 14 2017, Jeff Layton wrote:
> 
> > On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
> > > On Fri, Aug 11 2017, Jeff Layton wrote:
> > > 
> > > > On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
> > > > > On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
> > > > > > Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
> > > > > > flag and introduced the d_weak_revalidate dentry operation instead.
> > > > > > We duly removed the flag from NFS superblocks and NFSv4 superblocks,
> > > > > > and added the new dentry operation to NFS dentries  but not to
> > > > > > NFSv4
> > > > > > dentries.
> > > > > > 
> > > > > > And nobody noticed.
> > > > > > 
> > > > > > Until today.
> > > > > > 
> > > > > > A customer reports a situation where mount(,MS_REMOUNT,..) on an
> > > > > > NFS
> > > > > > filesystem hangs because the network has been deconfigured.  This
> > > > > > makes
> > > > > > perfect sense and I suggested a code change to fix the problem.
> > > > > > However when a colleague was trying to reproduce the problem to
> > > > > > validate
> > > > > > the fix, he couldn't.  Then nor could I.
> > > > > > 
> > > > > > The problem is trivially reproducible with NFSv3, and not at all 
> > > > > > with
> > > > > > NFSv4.  The reason is the missing d_weak_revalidate.
> > > > > > 
> > > > > > We could simply add d_weak_revalidate for NFSv4, but given that it
> > > > > > has been missing for 4.5 years, and the only time anyone noticed was
> > > > > > when the ommission resulted in a better user experience, I do wonder
> > > > > > if
> > > > > > we need to.  Can we just discard d_weak_revalidate?  What purpose
> > > > > > does
> > > > > > it serve?  I couldn't find one.
> > > > > > 
> > > > > > Thanks,
> > > > > > NeilBrown
> > > > > > 
> > > > > > For reference, see
> > > > > > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
> > > > > > d_weak_revalidate dentry op")
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > To reproduce the problem at home, on a system that uses systemd:
> > > > > > 1/ place (or find) a filesystem image in a file on an NFS 
> > > > > > filesystem.
> > > > > > 2/ mount the nfs filesystem with "noac" - choose v3 or v4
> > > > > > 3/ loop-mount the filesystem image read-only somewhere
> > > > > > 4/ reboot
> > > > > > 
> > > > > > If you choose v4, the reboot will succeed, possibly after a 90second
> > > > > > timeout.
> > > > > > If you choose v3, the reboot will hang indefinitely in systemd-
> > > > > > shutdown while
> > > > > > remounting the nfs filesystem read-only.
> > > > > > 
> > > > > > If you don't use "noac" it can still hang, but only if something
> > > > > > slows
> > > > > > down the reboot enough that attributes have timed out by the time
> > > > > > that
> > > > > > systemd-shutdown runs.  This happens for our customer.
> > > > > > 
> > > > > > If the loop-mounted filesystem is not read-only, you get other
> > > > > > problems.
> > > > > > 
> > > > > > We really want systemd to figure out that the loop-mount needs to be
> > > > > > unmounted first.  I have ideas concerning that, but it is messy.  
> > > > > > But
> > > > > > that isn't the only bug here.
> > > > > 
> > > > > The main purpose of d_weak_revalidate() was to catch the issues that
> > > > > arise when someone changes the contents of the current working
> > > > > directory or its parent on the server. Since '.' and '..' are treated
> > > > > specially in the lookup code, they would not be revalidated without
> > > > > special treatment. That leads to issues when looking up files as
> > > > > ./ or ../, since the client won't detect that its
> > > > > dcache is stale until it tries to use the cached dentry+inode.
> > > > > 
> > > > > The one thing that has changed since its introduction is, I believe,
> > > > > the ESTALE handling in the VFS layer. That might fix a lot of the
> > > > > dcache lookup bugs that were previously handled by 
> > > > > d_weak_revalidate().
> > > > > I haven't done an audit to figure out if it actually can handle all of
> > > > > them.
> > > > > 
> > > > 
> > > > It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
> > > > 
> > > > vfs: allow umount to handle mountpoints without revalidating them
> > > 
> > > You say in the comment for that commit:
> > > 
> > >  but there
> > > are cases where we do want to revalidate the root of the fs.
> > > 
> > > Do you happen to remember what those cases are?
> > > 
> > 
> > Not exactly, but I _think_ I might have been assuming that we needed to
> > ensure that the inode attrs on the root were up to date after the
> > pathwalk.
> > 
> > I think that was probably wrong. d_revalidate is really intended to
> > ensure that the dentry in question still points to the same inode. In
> > the case of the root of the mount though, we don't really care about the
> > dentry on the server at all. We're attaching the root of

Re: Do we really need d_weak_revalidate???

2017-08-15 Thread NeilBrown

On Mon, Aug 14 2017, Jeff Layton wrote:

> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>> On Fri, Aug 11 2017, Jeff Layton wrote:
>> 
>> > On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>> > > On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>> > > > Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>> > > > flag and introduced the d_weak_revalidate dentry operation instead.
>> > > > We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>> > > > and added the new dentry operation to NFS dentries  but not to
>> > > > NFSv4
>> > > > dentries.
>> > > > 
>> > > > And nobody noticed.
>> > > > 
>> > > > Until today.
>> > > > 
>> > > > A customer reports a situation where mount(,MS_REMOUNT,..) on an
>> > > > NFS
>> > > > filesystem hangs because the network has been deconfigured.  This
>> > > > makes
>> > > > perfect sense and I suggested a code change to fix the problem.
>> > > > However when a colleague was trying to reproduce the problem to
>> > > > validate
>> > > > the fix, he couldn't.  Then nor could I.
>> > > > 
>> > > > The problem is trivially reproducible with NFSv3, and not at all with
>> > > > NFSv4.  The reason is the missing d_weak_revalidate.
>> > > > 
>> > > > We could simply add d_weak_revalidate for NFSv4, but given that it
>> > > > has been missing for 4.5 years, and the only time anyone noticed was
>> > > > when the ommission resulted in a better user experience, I do wonder
>> > > > if
>> > > > we need to.  Can we just discard d_weak_revalidate?  What purpose
>> > > > does
>> > > > it serve?  I couldn't find one.
>> > > > 
>> > > > Thanks,
>> > > > NeilBrown
>> > > > 
>> > > > For reference, see
>> > > > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>> > > > d_weak_revalidate dentry op")
>> > > > 
>> > > > 
>> > > > 
>> > > > To reproduce the problem at home, on a system that uses systemd:
>> > > > 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>> > > > 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>> > > > 3/ loop-mount the filesystem image read-only somewhere
>> > > > 4/ reboot
>> > > > 
>> > > > If you choose v4, the reboot will succeed, possibly after a 90second
>> > > > timeout.
>> > > > If you choose v3, the reboot will hang indefinitely in systemd-
>> > > > shutdown while
>> > > > remounting the nfs filesystem read-only.
>> > > > 
>> > > > If you don't use "noac" it can still hang, but only if something
>> > > > slows
>> > > > down the reboot enough that attributes have timed out by the time
>> > > > that
>> > > > systemd-shutdown runs.  This happens for our customer.
>> > > > 
>> > > > If the loop-mounted filesystem is not read-only, you get other
>> > > > problems.
>> > > > 
>> > > > We really want systemd to figure out that the loop-mount needs to be
>> > > > unmounted first.  I have ideas concerning that, but it is messy.  But
>> > > > that isn't the only bug here.
>> > > 
>> > > The main purpose of d_weak_revalidate() was to catch the issues that
>> > > arise when someone changes the contents of the current working
>> > > directory or its parent on the server. Since '.' and '..' are treated
>> > > specially in the lookup code, they would not be revalidated without
>> > > special treatment. That leads to issues when looking up files as
>> > > ./ or ../, since the client won't detect that its
>> > > dcache is stale until it tries to use the cached dentry+inode.
>> > > 
>> > > The one thing that has changed since its introduction is, I believe,
>> > > the ESTALE handling in the VFS layer. That might fix a lot of the
>> > > dcache lookup bugs that were previously handled by d_weak_revalidate().
>> > > I haven't done an audit to figure out if it actually can handle all of
>> > > them.
>> > > 
>> > 
>> > It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>> > 
>> > vfs: allow umount to handle mountpoints without revalidating them
>> 
>> You say in the comment for that commit:
>> 
>>  but there
>> are cases where we do want to revalidate the root of the fs.
>> 
>> Do you happen to remember what those cases are?
>> 
>
> Not exactly, but I _think_ I might have been assuming that we needed to
> ensure that the inode attrs on the root were up to date after the
> pathwalk.
>
> I think that was probably wrong. d_revalidate is really intended to
> ensure that the dentry in question still points to the same inode. In
> the case of the root of the mount though, we don't really care about the
> dentry on the server at all. We're attaching the root of the mount to an
> inode and don't care of the dentry name changes. If we do need to ensure
> the inode attrs are updated, we'll just revalidate them at that point.
>
>> > 
>> > Possibly the fact that we no longer try to revalidate during unmount
>> > means that this is no longer necessary?
>> > 
>> > The original patch that added d_weak_revalidate had a reproducer in the
>> > patch description. Have you verified

Re: Do we really need d_weak_revalidate???

2017-08-15 Thread NeilBrown

On Mon, Aug 14 2017, Jeff Layton wrote:

> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>> On Fri, Aug 11 2017, Jeff Layton wrote:
>> 
>> > On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>> > > On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>> > > > Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>> > > > flag and introduced the d_weak_revalidate dentry operation instead.
>> > > > We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>> > > > and added the new dentry operation to NFS dentries  but not to
>> > > > NFSv4
>> > > > dentries.
>> > > > 
>> > > > And nobody noticed.
>> > > > 
>> > > > Until today.
>> > > > 
>> > > > A customer reports a situation where mount(,MS_REMOUNT,..) on an
>> > > > NFS
>> > > > filesystem hangs because the network has been deconfigured.  This
>> > > > makes
>> > > > perfect sense and I suggested a code change to fix the problem.
>> > > > However when a colleague was trying to reproduce the problem to
>> > > > validate
>> > > > the fix, he couldn't.  Then nor could I.
>> > > > 
>> > > > The problem is trivially reproducible with NFSv3, and not at all with
>> > > > NFSv4.  The reason is the missing d_weak_revalidate.
>> > > > 
>> > > > We could simply add d_weak_revalidate for NFSv4, but given that it
>> > > > has been missing for 4.5 years, and the only time anyone noticed was
>> > > > when the ommission resulted in a better user experience, I do wonder
>> > > > if
>> > > > we need to.  Can we just discard d_weak_revalidate?  What purpose
>> > > > does
>> > > > it serve?  I couldn't find one.
>> > > > 
>> > > > Thanks,
>> > > > NeilBrown
>> > > > 
>> > > > For reference, see
>> > > > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>> > > > d_weak_revalidate dentry op")
>> > > > 
>> > > > 
>> > > > 
>> > > > To reproduce the problem at home, on a system that uses systemd:
>> > > > 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>> > > > 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>> > > > 3/ loop-mount the filesystem image read-only somewhere
>> > > > 4/ reboot
>> > > > 
>> > > > If you choose v4, the reboot will succeed, possibly after a 90second
>> > > > timeout.
>> > > > If you choose v3, the reboot will hang indefinitely in systemd-
>> > > > shutdown while
>> > > > remounting the nfs filesystem read-only.
>> > > > 
>> > > > If you don't use "noac" it can still hang, but only if something
>> > > > slows
>> > > > down the reboot enough that attributes have timed out by the time
>> > > > that
>> > > > systemd-shutdown runs.  This happens for our customer.
>> > > > 
>> > > > If the loop-mounted filesystem is not read-only, you get other
>> > > > problems.
>> > > > 
>> > > > We really want systemd to figure out that the loop-mount needs to be
>> > > > unmounted first.  I have ideas concerning that, but it is messy.  But
>> > > > that isn't the only bug here.
>> > > 
>> > > The main purpose of d_weak_revalidate() was to catch the issues that
>> > > arise when someone changes the contents of the current working
>> > > directory or its parent on the server. Since '.' and '..' are treated
>> > > specially in the lookup code, they would not be revalidated without
>> > > special treatment. That leads to issues when looking up files as
>> > > ./ or ../, since the client won't detect that its
>> > > dcache is stale until it tries to use the cached dentry+inode.
>> > > 
>> > > The one thing that has changed since its introduction is, I believe,
>> > > the ESTALE handling in the VFS layer. That might fix a lot of the
>> > > dcache lookup bugs that were previously handled by d_weak_revalidate().
>> > > I haven't done an audit to figure out if it actually can handle all of
>> > > them.
>> > > 
>> > 
>> > It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>> > 
>> > vfs: allow umount to handle mountpoints without revalidating them
>> 
>> You say in the comment for that commit:
>> 
>>  but there
>> are cases where we do want to revalidate the root of the fs.
>> 
>> Do you happen to remember what those cases are?
>> 
>
> Not exactly, but I _think_ I might have been assuming that we needed to
> ensure that the inode attrs on the root were up to date after the
> pathwalk.
>
> I think that was probably wrong. d_revalidate is really intended to
> ensure that the dentry in question still points to the same inode. In
> the case of the root of the mount though, we don't really care about the
> dentry on the server at all. We're attaching the root of the mount to an
> inode and don't care of the dentry name changes. If we do need to ensure
> the inode attrs are updated, we'll just revalidate them at that point.
>
>> > 
>> > Possibly the fact that we no longer try to revalidate during unmount
>> > means that this is no longer necessary?
>> > 
>> > The original patch that added d_weak_revalidate had a reproducer in the
>> > patch description. Have you verified

Re: Do we really need d_weak_revalidate???

2017-08-14 Thread Jeff Layton

On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
> On Fri, Aug 11 2017, Jeff Layton wrote:
> 
> > On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
> > > On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
> > > > Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
> > > > flag and introduced the d_weak_revalidate dentry operation instead.
> > > > We duly removed the flag from NFS superblocks and NFSv4 superblocks,
> > > > and added the new dentry operation to NFS dentries  but not to
> > > > NFSv4
> > > > dentries.
> > > > 
> > > > And nobody noticed.
> > > > 
> > > > Until today.
> > > > 
> > > > A customer reports a situation where mount(,MS_REMOUNT,..) on an
> > > > NFS
> > > > filesystem hangs because the network has been deconfigured.  This
> > > > makes
> > > > perfect sense and I suggested a code change to fix the problem.
> > > > However when a colleague was trying to reproduce the problem to
> > > > validate
> > > > the fix, he couldn't.  Then nor could I.
> > > > 
> > > > The problem is trivially reproducible with NFSv3, and not at all with
> > > > NFSv4.  The reason is the missing d_weak_revalidate.
> > > > 
> > > > We could simply add d_weak_revalidate for NFSv4, but given that it
> > > > has been missing for 4.5 years, and the only time anyone noticed was
> > > > when the ommission resulted in a better user experience, I do wonder
> > > > if
> > > > we need to.  Can we just discard d_weak_revalidate?  What purpose
> > > > does
> > > > it serve?  I couldn't find one.
> > > > 
> > > > Thanks,
> > > > NeilBrown
> > > > 
> > > > For reference, see
> > > > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
> > > > d_weak_revalidate dentry op")
> > > > 
> > > > 
> > > > 
> > > > To reproduce the problem at home, on a system that uses systemd:
> > > > 1/ place (or find) a filesystem image in a file on an NFS filesystem.
> > > > 2/ mount the nfs filesystem with "noac" - choose v3 or v4
> > > > 3/ loop-mount the filesystem image read-only somewhere
> > > > 4/ reboot
> > > > 
> > > > If you choose v4, the reboot will succeed, possibly after a 90second
> > > > timeout.
> > > > If you choose v3, the reboot will hang indefinitely in systemd-
> > > > shutdown while
> > > > remounting the nfs filesystem read-only.
> > > > 
> > > > If you don't use "noac" it can still hang, but only if something
> > > > slows
> > > > down the reboot enough that attributes have timed out by the time
> > > > that
> > > > systemd-shutdown runs.  This happens for our customer.
> > > > 
> > > > If the loop-mounted filesystem is not read-only, you get other
> > > > problems.
> > > > 
> > > > We really want systemd to figure out that the loop-mount needs to be
> > > > unmounted first.  I have ideas concerning that, but it is messy.  But
> > > > that isn't the only bug here.
> > > 
> > > The main purpose of d_weak_revalidate() was to catch the issues that
> > > arise when someone changes the contents of the current working
> > > directory or its parent on the server. Since '.' and '..' are treated
> > > specially in the lookup code, they would not be revalidated without
> > > special treatment. That leads to issues when looking up files as
> > > ./ or ../, since the client won't detect that its
> > > dcache is stale until it tries to use the cached dentry+inode.
> > > 
> > > The one thing that has changed since its introduction is, I believe,
> > > the ESTALE handling in the VFS layer. That might fix a lot of the
> > > dcache lookup bugs that were previously handled by d_weak_revalidate().
> > > I haven't done an audit to figure out if it actually can handle all of
> > > them.
> > > 
> > 
> > It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
> > 
> > vfs: allow umount to handle mountpoints without revalidating them
> 
> You say in the comment for that commit:
> 
>  but there
> are cases where we do want to revalidate the root of the fs.
> 
> Do you happen to remember what those cases are?
> 

Not exactly, but I _think_ I might have been assuming that we needed to
ensure that the inode attrs on the root were up to date after the
pathwalk.

I think that was probably wrong. d_revalidate is really intended to
ensure that the dentry in question still points to the same inode. In
the case of the root of the mount though, we don't really care about the
dentry on the server at all. We're attaching the root of the mount to an
inode and don't care of the dentry name changes. If we do need to ensure
the inode attrs are updated, we'll just revalidate them at that point.

> > 
> > Possibly the fact that we no longer try to revalidate during unmount
> > means that this is no longer necessary?
> > 
> > The original patch that added d_weak_revalidate had a reproducer in the
> > patch description. Have you verified that that problem is still not
> > reproducible when you remove d_weak_revalidate?
> 
> I did try the reproducer and it works as expected both with and without
>

Re: Do we really need d_weak_revalidate???

2017-08-14 Thread Jeff Layton

On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
> On Fri, Aug 11 2017, Jeff Layton wrote:
> 
> > On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
> > > On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
> > > > Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
> > > > flag and introduced the d_weak_revalidate dentry operation instead.
> > > > We duly removed the flag from NFS superblocks and NFSv4 superblocks,
> > > > and added the new dentry operation to NFS dentries  but not to
> > > > NFSv4
> > > > dentries.
> > > > 
> > > > And nobody noticed.
> > > > 
> > > > Until today.
> > > > 
> > > > A customer reports a situation where mount(,MS_REMOUNT,..) on an
> > > > NFS
> > > > filesystem hangs because the network has been deconfigured.  This
> > > > makes
> > > > perfect sense and I suggested a code change to fix the problem.
> > > > However when a colleague was trying to reproduce the problem to
> > > > validate
> > > > the fix, he couldn't.  Then nor could I.
> > > > 
> > > > The problem is trivially reproducible with NFSv3, and not at all with
> > > > NFSv4.  The reason is the missing d_weak_revalidate.
> > > > 
> > > > We could simply add d_weak_revalidate for NFSv4, but given that it
> > > > has been missing for 4.5 years, and the only time anyone noticed was
> > > > when the ommission resulted in a better user experience, I do wonder
> > > > if
> > > > we need to.  Can we just discard d_weak_revalidate?  What purpose
> > > > does
> > > > it serve?  I couldn't find one.
> > > > 
> > > > Thanks,
> > > > NeilBrown
> > > > 
> > > > For reference, see
> > > > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
> > > > d_weak_revalidate dentry op")
> > > > 
> > > > 
> > > > 
> > > > To reproduce the problem at home, on a system that uses systemd:
> > > > 1/ place (or find) a filesystem image in a file on an NFS filesystem.
> > > > 2/ mount the nfs filesystem with "noac" - choose v3 or v4
> > > > 3/ loop-mount the filesystem image read-only somewhere
> > > > 4/ reboot
> > > > 
> > > > If you choose v4, the reboot will succeed, possibly after a 90second
> > > > timeout.
> > > > If you choose v3, the reboot will hang indefinitely in systemd-
> > > > shutdown while
> > > > remounting the nfs filesystem read-only.
> > > > 
> > > > If you don't use "noac" it can still hang, but only if something
> > > > slows
> > > > down the reboot enough that attributes have timed out by the time
> > > > that
> > > > systemd-shutdown runs.  This happens for our customer.
> > > > 
> > > > If the loop-mounted filesystem is not read-only, you get other
> > > > problems.
> > > > 
> > > > We really want systemd to figure out that the loop-mount needs to be
> > > > unmounted first.  I have ideas concerning that, but it is messy.  But
> > > > that isn't the only bug here.
> > > 
> > > The main purpose of d_weak_revalidate() was to catch the issues that
> > > arise when someone changes the contents of the current working
> > > directory or its parent on the server. Since '.' and '..' are treated
> > > specially in the lookup code, they would not be revalidated without
> > > special treatment. That leads to issues when looking up files as
> > > ./ or ../, since the client won't detect that its
> > > dcache is stale until it tries to use the cached dentry+inode.
> > > 
> > > The one thing that has changed since its introduction is, I believe,
> > > the ESTALE handling in the VFS layer. That might fix a lot of the
> > > dcache lookup bugs that were previously handled by d_weak_revalidate().
> > > I haven't done an audit to figure out if it actually can handle all of
> > > them.
> > > 
> > 
> > It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
> > 
> > vfs: allow umount to handle mountpoints without revalidating them
> 
> You say in the comment for that commit:
> 
>  but there
> are cases where we do want to revalidate the root of the fs.
> 
> Do you happen to remember what those cases are?
> 

Not exactly, but I _think_ I might have been assuming that we needed to
ensure that the inode attrs on the root were up to date after the
pathwalk.

I think that was probably wrong. d_revalidate is really intended to
ensure that the dentry in question still points to the same inode. In
the case of the root of the mount though, we don't really care about the
dentry on the server at all. We're attaching the root of the mount to an
inode and don't care of the dentry name changes. If we do need to ensure
the inode attrs are updated, we'll just revalidate them at that point.

> > 
> > Possibly the fact that we no longer try to revalidate during unmount
> > means that this is no longer necessary?
> > 
> > The original patch that added d_weak_revalidate had a reproducer in the
> > patch description. Have you verified that that problem is still not
> > reproducible when you remove d_weak_revalidate?
> 
> I did try the reproducer and it works as expected both with and without
>

Re: Do we really need d_weak_revalidate???

2017-08-13 Thread NeilBrown

On Fri, Aug 11 2017, Jeff Layton wrote:

> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>> > Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>> > flag and introduced the d_weak_revalidate dentry operation instead.
>> > We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>> > and added the new dentry operation to NFS dentries  but not to
>> > NFSv4
>> > dentries.
>> > 
>> > And nobody noticed.
>> > 
>> > Until today.
>> > 
>> > A customer reports a situation where mount(,MS_REMOUNT,..) on an
>> > NFS
>> > filesystem hangs because the network has been deconfigured.  This
>> > makes
>> > perfect sense and I suggested a code change to fix the problem.
>> > However when a colleague was trying to reproduce the problem to
>> > validate
>> > the fix, he couldn't.  Then nor could I.
>> > 
>> > The problem is trivially reproducible with NFSv3, and not at all with
>> > NFSv4.  The reason is the missing d_weak_revalidate.
>> > 
>> > We could simply add d_weak_revalidate for NFSv4, but given that it
>> > has been missing for 4.5 years, and the only time anyone noticed was
>> > when the ommission resulted in a better user experience, I do wonder
>> > if
>> > we need to.  Can we just discard d_weak_revalidate?  What purpose
>> > does
>> > it serve?  I couldn't find one.
>> > 
>> > Thanks,
>> > NeilBrown
>> > 
>> > For reference, see
>> > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>> > d_weak_revalidate dentry op")
>> > 
>> > 
>> > 
>> > To reproduce the problem at home, on a system that uses systemd:
>> > 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>> > 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>> > 3/ loop-mount the filesystem image read-only somewhere
>> > 4/ reboot
>> > 
>> > If you choose v4, the reboot will succeed, possibly after a 90second
>> > timeout.
>> > If you choose v3, the reboot will hang indefinitely in systemd-
>> > shutdown while
>> > remounting the nfs filesystem read-only.
>> > 
>> > If you don't use "noac" it can still hang, but only if something
>> > slows
>> > down the reboot enough that attributes have timed out by the time
>> > that
>> > systemd-shutdown runs.  This happens for our customer.
>> > 
>> > If the loop-mounted filesystem is not read-only, you get other
>> > problems.
>> > 
>> > We really want systemd to figure out that the loop-mount needs to be
>> > unmounted first.  I have ideas concerning that, but it is messy.  But
>> > that isn't the only bug here.
>> 
>> The main purpose of d_weak_revalidate() was to catch the issues that
>> arise when someone changes the contents of the current working
>> directory or its parent on the server. Since '.' and '..' are treated
>> specially in the lookup code, they would not be revalidated without
>> special treatment. That leads to issues when looking up files as
>> ./ or ../, since the client won't detect that its
>> dcache is stale until it tries to use the cached dentry+inode.
>> 
>> The one thing that has changed since its introduction is, I believe,
>> the ESTALE handling in the VFS layer. That might fix a lot of the
>> dcache lookup bugs that were previously handled by d_weak_revalidate().
>> I haven't done an audit to figure out if it actually can handle all of
>> them.
>> 
>
> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>
> vfs: allow umount to handle mountpoints without revalidating them

You say in the comment for that commit:

 but there
are cases where we do want to revalidate the root of the fs.

Do you happen to remember what those cases are?

>
> Possibly the fact that we no longer try to revalidate during unmount
> means that this is no longer necessary?
>
> The original patch that added d_weak_revalidate had a reproducer in the
> patch description. Have you verified that that problem is still not
> reproducible when you remove d_weak_revalidate?

I did try the reproducer and it works as expected both with and without
d_weak_revalidate.
On reflection, the problem it displayed was caused by d_revalidate()
being called when the dentry name was irrelevant.  We remove that
(fixing the problem) and introduce d_weak_revalidate because we thought
that minimum functionality was still useful.  I'm currently not
convinced that even that is needed.

If we discarded d_weak_revalidate(), we could get rid of the special
handling of umount

Thanks,
NeilBrown


signature.asc
Description: PGP signature

Re: Do we really need d_weak_revalidate???

2017-08-13 Thread NeilBrown

On Fri, Aug 11 2017, Jeff Layton wrote:

> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>> > Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>> > flag and introduced the d_weak_revalidate dentry operation instead.
>> > We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>> > and added the new dentry operation to NFS dentries  but not to
>> > NFSv4
>> > dentries.
>> > 
>> > And nobody noticed.
>> > 
>> > Until today.
>> > 
>> > A customer reports a situation where mount(,MS_REMOUNT,..) on an
>> > NFS
>> > filesystem hangs because the network has been deconfigured.  This
>> > makes
>> > perfect sense and I suggested a code change to fix the problem.
>> > However when a colleague was trying to reproduce the problem to
>> > validate
>> > the fix, he couldn't.  Then nor could I.
>> > 
>> > The problem is trivially reproducible with NFSv3, and not at all with
>> > NFSv4.  The reason is the missing d_weak_revalidate.
>> > 
>> > We could simply add d_weak_revalidate for NFSv4, but given that it
>> > has been missing for 4.5 years, and the only time anyone noticed was
>> > when the ommission resulted in a better user experience, I do wonder
>> > if
>> > we need to.  Can we just discard d_weak_revalidate?  What purpose
>> > does
>> > it serve?  I couldn't find one.
>> > 
>> > Thanks,
>> > NeilBrown
>> > 
>> > For reference, see
>> > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>> > d_weak_revalidate dentry op")
>> > 
>> > 
>> > 
>> > To reproduce the problem at home, on a system that uses systemd:
>> > 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>> > 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>> > 3/ loop-mount the filesystem image read-only somewhere
>> > 4/ reboot
>> > 
>> > If you choose v4, the reboot will succeed, possibly after a 90second
>> > timeout.
>> > If you choose v3, the reboot will hang indefinitely in systemd-
>> > shutdown while
>> > remounting the nfs filesystem read-only.
>> > 
>> > If you don't use "noac" it can still hang, but only if something
>> > slows
>> > down the reboot enough that attributes have timed out by the time
>> > that
>> > systemd-shutdown runs.  This happens for our customer.
>> > 
>> > If the loop-mounted filesystem is not read-only, you get other
>> > problems.
>> > 
>> > We really want systemd to figure out that the loop-mount needs to be
>> > unmounted first.  I have ideas concerning that, but it is messy.  But
>> > that isn't the only bug here.
>> 
>> The main purpose of d_weak_revalidate() was to catch the issues that
>> arise when someone changes the contents of the current working
>> directory or its parent on the server. Since '.' and '..' are treated
>> specially in the lookup code, they would not be revalidated without
>> special treatment. That leads to issues when looking up files as
>> ./ or ../, since the client won't detect that its
>> dcache is stale until it tries to use the cached dentry+inode.
>> 
>> The one thing that has changed since its introduction is, I believe,
>> the ESTALE handling in the VFS layer. That might fix a lot of the
>> dcache lookup bugs that were previously handled by d_weak_revalidate().
>> I haven't done an audit to figure out if it actually can handle all of
>> them.
>> 
>
> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>
> vfs: allow umount to handle mountpoints without revalidating them

You say in the comment for that commit:

 but there
are cases where we do want to revalidate the root of the fs.

Do you happen to remember what those cases are?

>
> Possibly the fact that we no longer try to revalidate during unmount
> means that this is no longer necessary?
>
> The original patch that added d_weak_revalidate had a reproducer in the
> patch description. Have you verified that that problem is still not
> reproducible when you remove d_weak_revalidate?

I did try the reproducer and it works as expected both with and without
d_weak_revalidate.
On reflection, the problem it displayed was caused by d_revalidate()
being called when the dentry name was irrelevant.  We remove that
(fixing the problem) and introduce d_weak_revalidate because we thought
that minimum functionality was still useful.  I'm currently not
convinced that even that is needed.

If we discarded d_weak_revalidate(), we could get rid of the special
handling of umount

Thanks,
NeilBrown


signature.asc
Description: PGP signature

Re: Do we really need d_weak_revalidate???

2017-08-13 Thread NeilBrown

On Fri, Aug 11 2017, Trond Myklebust wrote:

> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>> flag and introduced the d_weak_revalidate dentry operation instead.
>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>> and added the new dentry operation to NFS dentries  but not to
>> NFSv4
>> dentries.
>> 
>> And nobody noticed.
>> 
>> Until today.
>> 
>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>> NFS
>> filesystem hangs because the network has been deconfigured.  This
>> makes
>> perfect sense and I suggested a code change to fix the problem.
>> However when a colleague was trying to reproduce the problem to
>> validate
>> the fix, he couldn't.  Then nor could I.
>> 
>> The problem is trivially reproducible with NFSv3, and not at all with
>> NFSv4.  The reason is the missing d_weak_revalidate.
>> 
>> We could simply add d_weak_revalidate for NFSv4, but given that it
>> has been missing for 4.5 years, and the only time anyone noticed was
>> when the ommission resulted in a better user experience, I do wonder
>> if
>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>> does
>> it serve?  I couldn't find one.
>> 
>> Thanks,
>> NeilBrown
>> 
>> For reference, see
>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>> d_weak_revalidate dentry op")
>> 
>> 
>> 
>> To reproduce the problem at home, on a system that uses systemd:
>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>> 3/ loop-mount the filesystem image read-only somewhere
>> 4/ reboot
>> 
>> If you choose v4, the reboot will succeed, possibly after a 90second
>> timeout.
>> If you choose v3, the reboot will hang indefinitely in systemd-
>> shutdown while
>> remounting the nfs filesystem read-only.
>> 
>> If you don't use "noac" it can still hang, but only if something
>> slows
>> down the reboot enough that attributes have timed out by the time
>> that
>> systemd-shutdown runs.  This happens for our customer.
>> 
>> If the loop-mounted filesystem is not read-only, you get other
>> problems.
>> 
>> We really want systemd to figure out that the loop-mount needs to be
>> unmounted first.  I have ideas concerning that, but it is messy.  But
>> that isn't the only bug here.
>
> The main purpose of d_weak_revalidate() was to catch the issues that
> arise when someone changes the contents of the current working
> directory or its parent on the server. Since '.' and '..' are treated
> specially in the lookup code, they would not be revalidated without
> special treatment. That leads to issues when looking up files as
> ./ or ../, since the client won't detect that its
> dcache is stale until it tries to use the cached dentry+inode.

I don't think that is quite right.
d_weak_revalidate() is only called from complete_walk() if LOOKUP_JUMPED
is set.  The happens when the final component of a path:
 - is a mount point
 - is ".."
or if the whole path is "/".  I thought "." was treated specially too,
but I cannot find that in the code.

After a path walk completes, the operation that acts on the path will
revalidate the inode one way or another so having an extra early
validation seems hard to justify.

If the inode has been removed, ESTALE is returned.  The slightly earlier
return of ESTALE might change some behavior All I can think of is
that if the directory under a mountpoint gets deleted, the mountpoint is
automatically removed.., but that happens in d_invalidate() which isn't
called when d_weak_revalidate() is called.

>
> The one thing that has changed since its introduction is, I believe,
> the ESTALE handling in the VFS layer. That might fix a lot of the
> dcache lookup bugs that were previously handled by d_weak_revalidate().
> I haven't done an audit to figure out if it actually can handle all of
> them.

I agree that seems like it might be relevant, but I don't see how it
would relate to any of the three cases that d_weak_revalidate affects.
Maybe there is some other change that we don't remember.

Thanks,
NeilBrown


>
> -- 
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.mykleb...@primarydata.com


signature.asc
Description: PGP signature

Re: Do we really need d_weak_revalidate???

2017-08-13 Thread NeilBrown

On Fri, Aug 11 2017, Trond Myklebust wrote:

> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>> flag and introduced the d_weak_revalidate dentry operation instead.
>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>> and added the new dentry operation to NFS dentries  but not to
>> NFSv4
>> dentries.
>> 
>> And nobody noticed.
>> 
>> Until today.
>> 
>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>> NFS
>> filesystem hangs because the network has been deconfigured.  This
>> makes
>> perfect sense and I suggested a code change to fix the problem.
>> However when a colleague was trying to reproduce the problem to
>> validate
>> the fix, he couldn't.  Then nor could I.
>> 
>> The problem is trivially reproducible with NFSv3, and not at all with
>> NFSv4.  The reason is the missing d_weak_revalidate.
>> 
>> We could simply add d_weak_revalidate for NFSv4, but given that it
>> has been missing for 4.5 years, and the only time anyone noticed was
>> when the ommission resulted in a better user experience, I do wonder
>> if
>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>> does
>> it serve?  I couldn't find one.
>> 
>> Thanks,
>> NeilBrown
>> 
>> For reference, see
>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>> d_weak_revalidate dentry op")
>> 
>> 
>> 
>> To reproduce the problem at home, on a system that uses systemd:
>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>> 3/ loop-mount the filesystem image read-only somewhere
>> 4/ reboot
>> 
>> If you choose v4, the reboot will succeed, possibly after a 90second
>> timeout.
>> If you choose v3, the reboot will hang indefinitely in systemd-
>> shutdown while
>> remounting the nfs filesystem read-only.
>> 
>> If you don't use "noac" it can still hang, but only if something
>> slows
>> down the reboot enough that attributes have timed out by the time
>> that
>> systemd-shutdown runs.  This happens for our customer.
>> 
>> If the loop-mounted filesystem is not read-only, you get other
>> problems.
>> 
>> We really want systemd to figure out that the loop-mount needs to be
>> unmounted first.  I have ideas concerning that, but it is messy.  But
>> that isn't the only bug here.
>
> The main purpose of d_weak_revalidate() was to catch the issues that
> arise when someone changes the contents of the current working
> directory or its parent on the server. Since '.' and '..' are treated
> specially in the lookup code, they would not be revalidated without
> special treatment. That leads to issues when looking up files as
> ./ or ../, since the client won't detect that its
> dcache is stale until it tries to use the cached dentry+inode.

I don't think that is quite right.
d_weak_revalidate() is only called from complete_walk() if LOOKUP_JUMPED
is set.  The happens when the final component of a path:
 - is a mount point
 - is ".."
or if the whole path is "/".  I thought "." was treated specially too,
but I cannot find that in the code.

After a path walk completes, the operation that acts on the path will
revalidate the inode one way or another so having an extra early
validation seems hard to justify.

If the inode has been removed, ESTALE is returned.  The slightly earlier
return of ESTALE might change some behavior All I can think of is
that if the directory under a mountpoint gets deleted, the mountpoint is
automatically removed.., but that happens in d_invalidate() which isn't
called when d_weak_revalidate() is called.

>
> The one thing that has changed since its introduction is, I believe,
> the ESTALE handling in the VFS layer. That might fix a lot of the
> dcache lookup bugs that were previously handled by d_weak_revalidate().
> I haven't done an audit to figure out if it actually can handle all of
> them.

I agree that seems like it might be relevant, but I don't see how it
would relate to any of the three cases that d_weak_revalidate affects.
Maybe there is some other change that we don't remember.

Thanks,
NeilBrown


>
> -- 
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.mykleb...@primarydata.com


signature.asc
Description: PGP signature

Re: Do we really need d_weak_revalidate???

2017-08-11 Thread Jeff Layton

On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
> > Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
> > flag and introduced the d_weak_revalidate dentry operation instead.
> > We duly removed the flag from NFS superblocks and NFSv4 superblocks,
> > and added the new dentry operation to NFS dentries  but not to
> > NFSv4
> > dentries.
> > 
> > And nobody noticed.
> > 
> > Until today.
> > 
> > A customer reports a situation where mount(,MS_REMOUNT,..) on an
> > NFS
> > filesystem hangs because the network has been deconfigured.  This
> > makes
> > perfect sense and I suggested a code change to fix the problem.
> > However when a colleague was trying to reproduce the problem to
> > validate
> > the fix, he couldn't.  Then nor could I.
> > 
> > The problem is trivially reproducible with NFSv3, and not at all with
> > NFSv4.  The reason is the missing d_weak_revalidate.
> > 
> > We could simply add d_weak_revalidate for NFSv4, but given that it
> > has been missing for 4.5 years, and the only time anyone noticed was
> > when the ommission resulted in a better user experience, I do wonder
> > if
> > we need to.  Can we just discard d_weak_revalidate?  What purpose
> > does
> > it serve?  I couldn't find one.
> > 
> > Thanks,
> > NeilBrown
> > 
> > For reference, see
> > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
> > d_weak_revalidate dentry op")
> > 
> > 
> > 
> > To reproduce the problem at home, on a system that uses systemd:
> > 1/ place (or find) a filesystem image in a file on an NFS filesystem.
> > 2/ mount the nfs filesystem with "noac" - choose v3 or v4
> > 3/ loop-mount the filesystem image read-only somewhere
> > 4/ reboot
> > 
> > If you choose v4, the reboot will succeed, possibly after a 90second
> > timeout.
> > If you choose v3, the reboot will hang indefinitely in systemd-
> > shutdown while
> > remounting the nfs filesystem read-only.
> > 
> > If you don't use "noac" it can still hang, but only if something
> > slows
> > down the reboot enough that attributes have timed out by the time
> > that
> > systemd-shutdown runs.  This happens for our customer.
> > 
> > If the loop-mounted filesystem is not read-only, you get other
> > problems.
> > 
> > We really want systemd to figure out that the loop-mount needs to be
> > unmounted first.  I have ideas concerning that, but it is messy.  But
> > that isn't the only bug here.
> 
> The main purpose of d_weak_revalidate() was to catch the issues that
> arise when someone changes the contents of the current working
> directory or its parent on the server. Since '.' and '..' are treated
> specially in the lookup code, they would not be revalidated without
> special treatment. That leads to issues when looking up files as
> ./ or ../, since the client won't detect that its
> dcache is stale until it tries to use the cached dentry+inode.
> 
> The one thing that has changed since its introduction is, I believe,
> the ESTALE handling in the VFS layer. That might fix a lot of the
> dcache lookup bugs that were previously handled by d_weak_revalidate().
> I haven't done an audit to figure out if it actually can handle all of
> them.
> 

It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:

vfs: allow umount to handle mountpoints without revalidating them

Possibly the fact that we no longer try to revalidate during unmount
means that this is no longer necessary?

The original patch that added d_weak_revalidate had a reproducer in the
patch description. Have you verified that that problem is still not
reproducible when you remove d_weak_revalidate?
-- 
Jeff Layton

Re: Do we really need d_weak_revalidate???

2017-08-11 Thread Jeff Layton

On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
> > Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
> > flag and introduced the d_weak_revalidate dentry operation instead.
> > We duly removed the flag from NFS superblocks and NFSv4 superblocks,
> > and added the new dentry operation to NFS dentries  but not to
> > NFSv4
> > dentries.
> > 
> > And nobody noticed.
> > 
> > Until today.
> > 
> > A customer reports a situation where mount(,MS_REMOUNT,..) on an
> > NFS
> > filesystem hangs because the network has been deconfigured.  This
> > makes
> > perfect sense and I suggested a code change to fix the problem.
> > However when a colleague was trying to reproduce the problem to
> > validate
> > the fix, he couldn't.  Then nor could I.
> > 
> > The problem is trivially reproducible with NFSv3, and not at all with
> > NFSv4.  The reason is the missing d_weak_revalidate.
> > 
> > We could simply add d_weak_revalidate for NFSv4, but given that it
> > has been missing for 4.5 years, and the only time anyone noticed was
> > when the ommission resulted in a better user experience, I do wonder
> > if
> > we need to.  Can we just discard d_weak_revalidate?  What purpose
> > does
> > it serve?  I couldn't find one.
> > 
> > Thanks,
> > NeilBrown
> > 
> > For reference, see
> > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
> > d_weak_revalidate dentry op")
> > 
> > 
> > 
> > To reproduce the problem at home, on a system that uses systemd:
> > 1/ place (or find) a filesystem image in a file on an NFS filesystem.
> > 2/ mount the nfs filesystem with "noac" - choose v3 or v4
> > 3/ loop-mount the filesystem image read-only somewhere
> > 4/ reboot
> > 
> > If you choose v4, the reboot will succeed, possibly after a 90second
> > timeout.
> > If you choose v3, the reboot will hang indefinitely in systemd-
> > shutdown while
> > remounting the nfs filesystem read-only.
> > 
> > If you don't use "noac" it can still hang, but only if something
> > slows
> > down the reboot enough that attributes have timed out by the time
> > that
> > systemd-shutdown runs.  This happens for our customer.
> > 
> > If the loop-mounted filesystem is not read-only, you get other
> > problems.
> > 
> > We really want systemd to figure out that the loop-mount needs to be
> > unmounted first.  I have ideas concerning that, but it is messy.  But
> > that isn't the only bug here.
> 
> The main purpose of d_weak_revalidate() was to catch the issues that
> arise when someone changes the contents of the current working
> directory or its parent on the server. Since '.' and '..' are treated
> specially in the lookup code, they would not be revalidated without
> special treatment. That leads to issues when looking up files as
> ./ or ../, since the client won't detect that its
> dcache is stale until it tries to use the cached dentry+inode.
> 
> The one thing that has changed since its introduction is, I believe,
> the ESTALE handling in the VFS layer. That might fix a lot of the
> dcache lookup bugs that were previously handled by d_weak_revalidate().
> I haven't done an audit to figure out if it actually can handle all of
> them.
> 

It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:

vfs: allow umount to handle mountpoints without revalidating them

Possibly the fact that we no longer try to revalidate during unmount
means that this is no longer necessary?

The original patch that added d_weak_revalidate had a reproducer in the
patch description. Have you verified that that problem is still not
reproducible when you remove d_weak_revalidate?
-- 
Jeff Layton

Re: Do we really need d_weak_revalidate???

2017-08-10 Thread Trond Myklebust

On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
> flag and introduced the d_weak_revalidate dentry operation instead.
> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
> and added the new dentry operation to NFS dentries  but not to
> NFSv4
> dentries.
> 
> And nobody noticed.
> 
> Until today.
> 
> A customer reports a situation where mount(,MS_REMOUNT,..) on an
> NFS
> filesystem hangs because the network has been deconfigured.  This
> makes
> perfect sense and I suggested a code change to fix the problem.
> However when a colleague was trying to reproduce the problem to
> validate
> the fix, he couldn't.  Then nor could I.
> 
> The problem is trivially reproducible with NFSv3, and not at all with
> NFSv4.  The reason is the missing d_weak_revalidate.
> 
> We could simply add d_weak_revalidate for NFSv4, but given that it
> has been missing for 4.5 years, and the only time anyone noticed was
> when the ommission resulted in a better user experience, I do wonder
> if
> we need to.  Can we just discard d_weak_revalidate?  What purpose
> does
> it serve?  I couldn't find one.
> 
> Thanks,
> NeilBrown
> 
> For reference, see
> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
> d_weak_revalidate dentry op")
> 
> 
> 
> To reproduce the problem at home, on a system that uses systemd:
> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
> 3/ loop-mount the filesystem image read-only somewhere
> 4/ reboot
> 
> If you choose v4, the reboot will succeed, possibly after a 90second
> timeout.
> If you choose v3, the reboot will hang indefinitely in systemd-
> shutdown while
> remounting the nfs filesystem read-only.
> 
> If you don't use "noac" it can still hang, but only if something
> slows
> down the reboot enough that attributes have timed out by the time
> that
> systemd-shutdown runs.  This happens for our customer.
> 
> If the loop-mounted filesystem is not read-only, you get other
> problems.
> 
> We really want systemd to figure out that the loop-mount needs to be
> unmounted first.  I have ideas concerning that, but it is messy.  But
> that isn't the only bug here.

The main purpose of d_weak_revalidate() was to catch the issues that
arise when someone changes the contents of the current working
directory or its parent on the server. Since '.' and '..' are treated
specially in the lookup code, they would not be revalidated without
special treatment. That leads to issues when looking up files as
./ or ../, since the client won't detect that its
dcache is stale until it tries to use the cached dentry+inode.

The one thing that has changed since its introduction is, I believe,
the ESTALE handling in the VFS layer. That might fix a lot of the
dcache lookup bugs that were previously handled by d_weak_revalidate().
I haven't done an audit to figure out if it actually can handle all of
them.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.mykleb...@primarydata.com

Re: Do we really need d_weak_revalidate???

2017-08-10 Thread Trond Myklebust

On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
> flag and introduced the d_weak_revalidate dentry operation instead.
> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
> and added the new dentry operation to NFS dentries  but not to
> NFSv4
> dentries.
> 
> And nobody noticed.
> 
> Until today.
> 
> A customer reports a situation where mount(,MS_REMOUNT,..) on an
> NFS
> filesystem hangs because the network has been deconfigured.  This
> makes
> perfect sense and I suggested a code change to fix the problem.
> However when a colleague was trying to reproduce the problem to
> validate
> the fix, he couldn't.  Then nor could I.
> 
> The problem is trivially reproducible with NFSv3, and not at all with
> NFSv4.  The reason is the missing d_weak_revalidate.
> 
> We could simply add d_weak_revalidate for NFSv4, but given that it
> has been missing for 4.5 years, and the only time anyone noticed was
> when the ommission resulted in a better user experience, I do wonder
> if
> we need to.  Can we just discard d_weak_revalidate?  What purpose
> does
> it serve?  I couldn't find one.
> 
> Thanks,
> NeilBrown
> 
> For reference, see
> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
> d_weak_revalidate dentry op")
> 
> 
> 
> To reproduce the problem at home, on a system that uses systemd:
> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
> 3/ loop-mount the filesystem image read-only somewhere
> 4/ reboot
> 
> If you choose v4, the reboot will succeed, possibly after a 90second
> timeout.
> If you choose v3, the reboot will hang indefinitely in systemd-
> shutdown while
> remounting the nfs filesystem read-only.
> 
> If you don't use "noac" it can still hang, but only if something
> slows
> down the reboot enough that attributes have timed out by the time
> that
> systemd-shutdown runs.  This happens for our customer.
> 
> If the loop-mounted filesystem is not read-only, you get other
> problems.
> 
> We really want systemd to figure out that the loop-mount needs to be
> unmounted first.  I have ideas concerning that, but it is messy.  But
> that isn't the only bug here.

The main purpose of d_weak_revalidate() was to catch the issues that
arise when someone changes the contents of the current working
directory or its parent on the server. Since '.' and '..' are treated
specially in the lookup code, they would not be revalidated without
special treatment. That leads to issues when looking up files as
./ or ../, since the client won't detect that its
dcache is stale until it tries to use the cached dentry+inode.

The one thing that has changed since its introduction is, I believe,
the ESTALE handling in the VFS layer. That might fix a lot of the
dcache lookup bugs that were previously handled by d_weak_revalidate().
I haven't done an audit to figure out if it actually can handle all of
them.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.mykleb...@primarydata.com

Do we really need d_weak_revalidate???

2017-08-10 Thread NeilBrown


Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
flag and introduced the d_weak_revalidate dentry operation instead.
We duly removed the flag from NFS superblocks and NFSv4 superblocks,
and added the new dentry operation to NFS dentries  but not to NFSv4
dentries.

And nobody noticed.

Until today.

A customer reports a situation where mount(,MS_REMOUNT,..) on an NFS
filesystem hangs because the network has been deconfigured.  This makes
perfect sense and I suggested a code change to fix the problem.
However when a colleague was trying to reproduce the problem to validate
the fix, he couldn't.  Then nor could I.

The problem is trivially reproducible with NFSv3, and not at all with
NFSv4.  The reason is the missing d_weak_revalidate.

We could simply add d_weak_revalidate for NFSv4, but given that it
has been missing for 4.5 years, and the only time anyone noticed was
when the ommission resulted in a better user experience, I do wonder if
we need to.  Can we just discard d_weak_revalidate?  What purpose does
it serve?  I couldn't find one.

Thanks,
NeilBrown

For reference, see
Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate 
dentry op")



To reproduce the problem at home, on a system that uses systemd:
1/ place (or find) a filesystem image in a file on an NFS filesystem.
2/ mount the nfs filesystem with "noac" - choose v3 or v4
3/ loop-mount the filesystem image read-only somewhere
4/ reboot

If you choose v4, the reboot will succeed, possibly after a 90second timeout.
If you choose v3, the reboot will hang indefinitely in systemd-shutdown while
remounting the nfs filesystem read-only.

If you don't use "noac" it can still hang, but only if something slows
down the reboot enough that attributes have timed out by the time that
systemd-shutdown runs.  This happens for our customer.

If the loop-mounted filesystem is not read-only, you get other problems.

We really want systemd to figure out that the loop-mount needs to be
unmounted first.  I have ideas concerning that, but it is messy.  But
that isn't the only bug here.


signature.asc
Description: PGP signature

Do we really need d_weak_revalidate???

2017-08-10 Thread NeilBrown


Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
flag and introduced the d_weak_revalidate dentry operation instead.
We duly removed the flag from NFS superblocks and NFSv4 superblocks,
and added the new dentry operation to NFS dentries  but not to NFSv4
dentries.

And nobody noticed.

Until today.

A customer reports a situation where mount(,MS_REMOUNT,..) on an NFS
filesystem hangs because the network has been deconfigured.  This makes
perfect sense and I suggested a code change to fix the problem.
However when a colleague was trying to reproduce the problem to validate
the fix, he couldn't.  Then nor could I.

The problem is trivially reproducible with NFSv3, and not at all with
NFSv4.  The reason is the missing d_weak_revalidate.

We could simply add d_weak_revalidate for NFSv4, but given that it
has been missing for 4.5 years, and the only time anyone noticed was
when the ommission resulted in a better user experience, I do wonder if
we need to.  Can we just discard d_weak_revalidate?  What purpose does
it serve?  I couldn't find one.

Thanks,
NeilBrown

For reference, see
Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate 
dentry op")



To reproduce the problem at home, on a system that uses systemd:
1/ place (or find) a filesystem image in a file on an NFS filesystem.
2/ mount the nfs filesystem with "noac" - choose v3 or v4
3/ loop-mount the filesystem image read-only somewhere
4/ reboot

If you choose v4, the reboot will succeed, possibly after a 90second timeout.
If you choose v3, the reboot will hang indefinitely in systemd-shutdown while
remounting the nfs filesystem read-only.

If you don't use "noac" it can still hang, but only if something slows
down the reboot enough that attributes have timed out by the time that
systemd-shutdown runs.  This happens for our customer.

If the loop-mounted filesystem is not read-only, you get other problems.

We really want systemd to figure out that the loop-mount needs to be
unmounted first.  I have ideas concerning that, but it is messy.  But
that isn't the only bug here.


signature.asc
Description: PGP signature

60 matches

Mail list logo