Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-28 Thread Ian Kent
On 29/11/17 10:13, Mike Marion wrote:
> On Wed, Nov 29, 2017 at 12:17:27PM +1100, NeilBrown wrote:
> 
>> How big do people see /proc/self/mount* getting?  What size reads
>> does 'strace' show the various programs using to read it?
> 
> We already have line counts into 5 figures.  This wasn't an issue until 
> the change of /etc/mtab to a link.  The large count is due to our large
> direct automount maps.
> 

And, admittedly, the testing I was doing was with 15k+ size maps.

Of course it's necessary to have this number of mounts to see serious
problems which is easiest to do with large direct mount maps.

The thing that's different now is that before applications started
using /proc directly for mount table information using mount(2)
instead of mount(8) was enough to prevent the mount entries from
being added to the table seen by applications.

Ian


Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-27 Thread Ian Kent
On 28/11/17 00:01, Mike Marion wrote:
> On Thu, Nov 23, 2017 at 08:36:49AM +0800, Ian Kent wrote:
> 
>> And with the move of userspace to use /proc based mount tables (one
>> example being the symlink of /etc/mtab into /proc) even modest sized
>> direct mount maps will be a problem with every entry getting mounted.
>>
>> Systems will cope with this fine but larger systems not so much.
> 
> Yes.. we've run into some big issues due to the change of /etc/mtab from
> a file to a symlink to /proc/self/mounts.  Most have been worked around
> thus far (mostly due to Suse coming up with patches) but still have a
> few annoying ones.
> 

I think the situation is going to get worse before it gets better.

On recent Fedora and kernel, with a large map and heavy mount activity
I see:

systemd, udisksd, gvfs-udisks2-volume-monitor, gvfsd-trash,
gnome-settings-daemon, packagekitd and gnome-shell

all go crazy consuming large amounts of CPU.

Once the mount activity is completed I see two processes continue to
consume a large amount of CPU. I thought one of those two was systemd
but my notes say they were gvfs-udisks2-volume-monitor and
gnome-settings-daemon.

The symlink change was probably the start, now a number of applications
now got directly to the proc file system for this information.

For large mount tables and many processes accessing the mount table
(probably reading the whole thing, either periodically or on change
notification) the current system does not scale well at all.

Ian


Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-27 Thread Ian Kent
On 28/11/17 00:01, Mike Marion wrote:
> On Thu, Nov 23, 2017 at 08:36:49AM +0800, Ian Kent wrote:
> 
>> And with the move of userspace to use /proc based mount tables (one
>> example being the symlink of /etc/mtab into /proc) even modest sized
>> direct mount maps will be a problem with every entry getting mounted.
>>
>> Systems will cope with this fine but larger systems not so much.
> 
> Yes.. we've run into some big issues due to the change of /etc/mtab from
> a file to a symlink to /proc/self/mounts.  Most have been worked around
> thus far (mostly due to Suse coming up with patches) but still have a
> few annoying ones.
> 

I think the situation is going to get worse before it gets better.

On recent Fedora and kernel, with a large map and heavy mount activity
I see:

systemd, udisksd, gvfs-udisks2-volume-monitor, gvfsd-trash,
gnome-settings-daemon, packagekitd and gnome-shell

all go crazy consuming large amounts of CPU.

Once the mount activity is completed I see two processes continue to
consume a large amount of CPU. I thought one of those two was systemd
but my notes say they were gvfs-udisks2-volume-monitor and
gnome-settings-daemon.

The symlink change was probably the start, now a number of applications
now got directly to the proc file system for this information.

For large mount tables and many processes accessing the mount table
(probably reading the whole thing, either periodically or on change
notification) the current system does not scale well at all.

Ian


[PATCH 2/2] autofs: revert fix AT_NO_AUTOMOUNT not being honored

2017-11-26 Thread Ian Kent
Commit 42f4614821 allowed the fstatat(2) system call to properly honor
the AT_NO_AUTOMOUNT flag but introduced a semantic change.

In order to honor AT_NO_AUTOMOUNT a semantic change was made to the
negative dentry case for stat family system calls in follow_automount().

This changed the unconditional triggering of an automount in this case
to no longer be done and an error returned instead.

This has caused more problems than I expected so reverting the change
is needed.

In a discussion with Neil Brown it was concluded that the automount(8)
daemon can implement this change without kernel modifications. So that
will be done instead and the autofs module documentation updated with
a description of the problem and what needs to be done by module users
for this specific case.

Reverts: 42f4614821

Signed-off-by: Ian Kent <ra...@themaw.net>
Cc: Neil Brown <ne...@suse.com>
Cc: Al Viro <v...@zeniv.linux.org.uk>
Cc: David Howells <dhowe...@redhat.com>
Cc: Colin Walters <walt...@redhat.com>
Cc: Ondrej Holy <oh...@redhat.com>
---
 fs/namei.c |   15 +++
 include/linux/fs.h |3 ++-
 2 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f0c7a7b9b6ca..9cc91fb7f156 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1129,18 +1129,9 @@ static int follow_automount(struct path *path, struct 
nameidata *nd,
 * of the daemon to instantiate them before they can be used.
 */
if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
-  LOOKUP_OPEN | LOOKUP_CREATE |
-  LOOKUP_AUTOMOUNT))) {
-   /* Positive dentry that isn't meant to trigger an
-* automount, EISDIR will allow it to be used,
-* otherwise there's no mount here "now" so return
-* ENOENT.
-*/
-   if (path->dentry->d_inode)
-   return -EISDIR;
-   else
-   return -ENOENT;
-   }
+  LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
+   path->dentry->d_inode)
+   return -EISDIR;
 
if (path->dentry->d_sb->s_user_ns != _user_ns)
return -EACCES;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2995a271ec46..9c4a43e391ac 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3088,7 +3088,8 @@ static inline int vfs_lstat(const char __user *name, 
struct kstat *stat)
 static inline int vfs_fstatat(int dfd, const char __user *filename,
  struct kstat *stat, int flags)
 {
-   return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
+   return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
+stat, STATX_BASIC_STATS);
 }
 static inline int vfs_fstat(int fd, struct kstat *stat)
 {



[PATCH 1/2] autofs: revert take more care to not update last_used on path walk

2017-11-26 Thread Ian Kent
While the patch of commit 092a53452b helped (partially) resolve a
problem where automounts were not expiring due to aggressive accesses
from user space it has a side effect for very large environments.

This change helps with the expire problem by making the expire more
aggressive but, for very large environments, that means more mount
requests from clients. When there are a lot of clients that can mean
fairly significant server load increases.

It turns out I put the last_used in this position to solve this
very problem and failed to update my own thinking of the autofs
expire policy. So the patch being reverted introduces a regression
which should be fixed.

Reverts: 092a53452b

Signed-off-by: Ian Kent <ra...@themaw.net>
Cc: Neil Brown <ne...@suse.com>
Cc: Al Viro <v...@zeniv.linux.org.uk>
---
 fs/autofs4/root.c |   17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index d79ced925861..82e8f6edfb48 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -281,8 +281,8 @@ static int autofs4_mount_wait(const struct path *path, bool 
rcu_walk)
pr_debug("waiting for mount name=%pd\n", path->dentry);
status = autofs4_wait(sbi, path, NFY_MOUNT);
pr_debug("mount wait done status=%d\n", status);
-   ino->last_used = jiffies;
}
+   ino->last_used = jiffies;
return status;
 }
 
@@ -321,21 +321,16 @@ static struct dentry *autofs4_mountpoint_changed(struct 
path *path)
 */
if (autofs_type_indirect(sbi->type) && d_unhashed(dentry)) {
struct dentry *parent = dentry->d_parent;
+   struct autofs_info *ino;
struct dentry *new;
 
new = d_lookup(parent, >d_name);
if (!new)
return NULL;
-   if (new == dentry)
-   dput(new);
-   else {
-   struct autofs_info *ino;
-
-   ino = autofs4_dentry_ino(new);
-   ino->last_used = jiffies;
-   dput(path->dentry);
-   path->dentry = new;
-   }
+   ino = autofs4_dentry_ino(new);
+   ino->last_used = jiffies;
+   dput(path->dentry);
+   path->dentry = new;
}
return path->dentry;
 }



[PATCH 2/2] autofs: revert fix AT_NO_AUTOMOUNT not being honored

2017-11-26 Thread Ian Kent
Commit 42f4614821 allowed the fstatat(2) system call to properly honor
the AT_NO_AUTOMOUNT flag but introduced a semantic change.

In order to honor AT_NO_AUTOMOUNT a semantic change was made to the
negative dentry case for stat family system calls in follow_automount().

This changed the unconditional triggering of an automount in this case
to no longer be done and an error returned instead.

This has caused more problems than I expected so reverting the change
is needed.

In a discussion with Neil Brown it was concluded that the automount(8)
daemon can implement this change without kernel modifications. So that
will be done instead and the autofs module documentation updated with
a description of the problem and what needs to be done by module users
for this specific case.

Reverts: 42f4614821

Signed-off-by: Ian Kent 
Cc: Neil Brown 
Cc: Al Viro 
Cc: David Howells 
Cc: Colin Walters 
Cc: Ondrej Holy 
---
 fs/namei.c |   15 +++
 include/linux/fs.h |3 ++-
 2 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f0c7a7b9b6ca..9cc91fb7f156 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1129,18 +1129,9 @@ static int follow_automount(struct path *path, struct 
nameidata *nd,
 * of the daemon to instantiate them before they can be used.
 */
if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
-  LOOKUP_OPEN | LOOKUP_CREATE |
-  LOOKUP_AUTOMOUNT))) {
-   /* Positive dentry that isn't meant to trigger an
-* automount, EISDIR will allow it to be used,
-* otherwise there's no mount here "now" so return
-* ENOENT.
-*/
-   if (path->dentry->d_inode)
-   return -EISDIR;
-   else
-   return -ENOENT;
-   }
+  LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
+   path->dentry->d_inode)
+   return -EISDIR;
 
if (path->dentry->d_sb->s_user_ns != _user_ns)
return -EACCES;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2995a271ec46..9c4a43e391ac 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3088,7 +3088,8 @@ static inline int vfs_lstat(const char __user *name, 
struct kstat *stat)
 static inline int vfs_fstatat(int dfd, const char __user *filename,
  struct kstat *stat, int flags)
 {
-   return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
+   return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
+stat, STATX_BASIC_STATS);
 }
 static inline int vfs_fstat(int fd, struct kstat *stat)
 {



[PATCH 1/2] autofs: revert take more care to not update last_used on path walk

2017-11-26 Thread Ian Kent
While the patch of commit 092a53452b helped (partially) resolve a
problem where automounts were not expiring due to aggressive accesses
from user space it has a side effect for very large environments.

This change helps with the expire problem by making the expire more
aggressive but, for very large environments, that means more mount
requests from clients. When there are a lot of clients that can mean
fairly significant server load increases.

It turns out I put the last_used in this position to solve this
very problem and failed to update my own thinking of the autofs
expire policy. So the patch being reverted introduces a regression
which should be fixed.

Reverts: 092a53452b

Signed-off-by: Ian Kent 
Cc: Neil Brown 
Cc: Al Viro 
---
 fs/autofs4/root.c |   17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index d79ced925861..82e8f6edfb48 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -281,8 +281,8 @@ static int autofs4_mount_wait(const struct path *path, bool 
rcu_walk)
pr_debug("waiting for mount name=%pd\n", path->dentry);
status = autofs4_wait(sbi, path, NFY_MOUNT);
pr_debug("mount wait done status=%d\n", status);
-   ino->last_used = jiffies;
}
+   ino->last_used = jiffies;
return status;
 }
 
@@ -321,21 +321,16 @@ static struct dentry *autofs4_mountpoint_changed(struct 
path *path)
 */
if (autofs_type_indirect(sbi->type) && d_unhashed(dentry)) {
struct dentry *parent = dentry->d_parent;
+   struct autofs_info *ino;
struct dentry *new;
 
new = d_lookup(parent, >d_name);
if (!new)
return NULL;
-   if (new == dentry)
-   dput(new);
-   else {
-   struct autofs_info *ino;
-
-   ino = autofs4_dentry_ino(new);
-   ino->last_used = jiffies;
-   dput(path->dentry);
-   path->dentry = new;
-   }
+   ino = autofs4_dentry_ino(new);
+   ino->last_used = jiffies;
+   dput(path->dentry);
+   path->dentry = new;
}
return path->dentry;
 }



Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 12:49, NeilBrown wrote:
> On Thu, Nov 23 2017, Ian Kent wrote:
> 
>> On 23/11/17 10:21, NeilBrown wrote:
>>> On Thu, Nov 23 2017, Ian Kent wrote:
>>>
>>>>
>>>> Hey Neil, I'm looking at this again because RH QE have complained about
>>>> a regression test failing with a kernel that has this change.
>>>>
>>>> Maybe I'm just dumb but I though a "find  "
>>>> would, well, just look at the contents below  but an
>>>> strace shows that it reads and calls fstatat() on "every entry in the
>>>> mount table" regardless of the path.
>>>
>>> weird ... I can only get find to look at the mount table if given the
>>> -fstyp option, and even then it doesn't fstatat anything that isn't in
>>> the tree it is searching.
>>
>> It's probably the -xautofs (exclude autofs fs'es) that was used in
>> the test that requires reading the mount table to get info about
>> excluding autofs mounts but the fstatat() on all the entries,
>> regardless of path, that was a surprise to me.
>>
>> find did use AT_SYMLINK_NOFOLLOW which historically behaved like
>> AT_NO_AUTOMOUNT.
>>
>>>
>>>
>>>>
>>>> And with the move of userspace to use /proc based mount tables (one
>>>> example being the symlink of /etc/mtab into /proc) even modest sized
>>>> direct mount maps will be a problem with every entry getting mounted.
>>>
>>> But the patch in question is only about indirect mount maps, isn't it?
>>> How is it relevant to direct mount maps?
>>
>> The change here will cause fstatat() to trigger direct mounts on access
>> if it doesn't use AT_NO_AUTOMOUNT.
> 
> Ahhh... light dawns.
> This is about this bit of the patch:
> 
>  static inline int vfs_fstatat(int dfd, const char __user *filename,
>   struct kstat *stat, int flags)
>  {
> -   return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
> -stat, STATX_BASIC_STATS);
> +   return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
>  }
> 
> I hadn't paid much attention to that.
> 
> I before this patch:
>   stat and lstat act as you would expect AT_NO_AUTOMOUNT to act on
>   direct mount and browseable indirect mount, but not on unbrowseable
>   indirect mounts

Yep, because of the fall thru for negative dentrys at:

if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
path->dentry->d_inode)
return -EISDIR;

which is missing a LOOKUP_FOLLOW check if we wanted to catch 
AT_SYMLINK_NOFOLLOW.

>   fstatat appeared to accept the AT_NO_AUTOMOUNT flag, but actually
>   assumed it was always set, but acted like stat and lstat

Yep, always set AT_NO_AUTOMOUNT so that it behaved the same as 

>   xstatat actually accepted the AT_NO_AUTOMOUNT flag, but it had no
>   effect on unbrowseable indirect mounts.

Yep.

> 
> after the patch, the distinction between direct and indirect was gone,
> and fstatat now handles AT_NO_AUTOMOUNT the same as xstatat.
> So:
>   stat and lstat now don't trigger automounts even on indirect, but
> this is a mixed blessing as they don't even trigger the mkdir

Yep.

>   fstatat without AT_NO_AUTOMOUNT now always triggers an automount
> This is a problematic regression that you have noticed and
> likely needs to be reverted.  Maybe we can assume
> AT_NO_AUTOMOUNT when AT_SYMLINK_NOFOLLOW is set, and require
> people to use xstatat if they need to set the flags separately

Yep.

The introduction of AT_NO_AUTOMOUNT (and the introduction of
follow_managed() and friends) was meant to do away with the
misleading use the AT_SYMLINK_NOFOLLOW (at the time the automount
mechanism abused the ->follow_link() method because it had similar
semantics to symlinks).

To catch the older usage pattern re-adding an AT_SYMLINK_NOFOLLOW
check would be helpful.

The reality is there are still the same old problems of unintended
mounting (mount storms) and AT_NO_AUTOMOUNT not being properly
handled.

Certainly the implementation we have now is much better but these
niggling problems remain and user space steps on them way too often,
and it feels like its much more so lately.  

>   xstatat now correctly honours AT_NO_AUTOMOUNT for indirect mounts
> but is otherwise unchanged.

Yep, assuming we accept the ENOENT return as sensible for AT_NO_AUTOMOUNT
no browse indirect mount case. statx() being a new system call it would
be ideal to get the semantics of this call right now before it becomes well
u

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 12:49, NeilBrown wrote:
> On Thu, Nov 23 2017, Ian Kent wrote:
> 
>> On 23/11/17 10:21, NeilBrown wrote:
>>> On Thu, Nov 23 2017, Ian Kent wrote:
>>>
>>>>
>>>> Hey Neil, I'm looking at this again because RH QE have complained about
>>>> a regression test failing with a kernel that has this change.
>>>>
>>>> Maybe I'm just dumb but I though a "find  "
>>>> would, well, just look at the contents below  but an
>>>> strace shows that it reads and calls fstatat() on "every entry in the
>>>> mount table" regardless of the path.
>>>
>>> weird ... I can only get find to look at the mount table if given the
>>> -fstyp option, and even then it doesn't fstatat anything that isn't in
>>> the tree it is searching.
>>
>> It's probably the -xautofs (exclude autofs fs'es) that was used in
>> the test that requires reading the mount table to get info about
>> excluding autofs mounts but the fstatat() on all the entries,
>> regardless of path, that was a surprise to me.
>>
>> find did use AT_SYMLINK_NOFOLLOW which historically behaved like
>> AT_NO_AUTOMOUNT.
>>
>>>
>>>
>>>>
>>>> And with the move of userspace to use /proc based mount tables (one
>>>> example being the symlink of /etc/mtab into /proc) even modest sized
>>>> direct mount maps will be a problem with every entry getting mounted.
>>>
>>> But the patch in question is only about indirect mount maps, isn't it?
>>> How is it relevant to direct mount maps?
>>
>> The change here will cause fstatat() to trigger direct mounts on access
>> if it doesn't use AT_NO_AUTOMOUNT.
> 
> Ahhh... light dawns.
> This is about this bit of the patch:
> 
>  static inline int vfs_fstatat(int dfd, const char __user *filename,
>   struct kstat *stat, int flags)
>  {
> -   return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
> -stat, STATX_BASIC_STATS);
> +   return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
>  }
> 
> I hadn't paid much attention to that.
> 
> I before this patch:
>   stat and lstat act as you would expect AT_NO_AUTOMOUNT to act on
>   direct mount and browseable indirect mount, but not on unbrowseable
>   indirect mounts

Yep, because of the fall thru for negative dentrys at:

if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
path->dentry->d_inode)
return -EISDIR;

which is missing a LOOKUP_FOLLOW check if we wanted to catch 
AT_SYMLINK_NOFOLLOW.

>   fstatat appeared to accept the AT_NO_AUTOMOUNT flag, but actually
>   assumed it was always set, but acted like stat and lstat

Yep, always set AT_NO_AUTOMOUNT so that it behaved the same as 

>   xstatat actually accepted the AT_NO_AUTOMOUNT flag, but it had no
>   effect on unbrowseable indirect mounts.

Yep.

> 
> after the patch, the distinction between direct and indirect was gone,
> and fstatat now handles AT_NO_AUTOMOUNT the same as xstatat.
> So:
>   stat and lstat now don't trigger automounts even on indirect, but
> this is a mixed blessing as they don't even trigger the mkdir

Yep.

>   fstatat without AT_NO_AUTOMOUNT now always triggers an automount
> This is a problematic regression that you have noticed and
> likely needs to be reverted.  Maybe we can assume
> AT_NO_AUTOMOUNT when AT_SYMLINK_NOFOLLOW is set, and require
> people to use xstatat if they need to set the flags separately

Yep.

The introduction of AT_NO_AUTOMOUNT (and the introduction of
follow_managed() and friends) was meant to do away with the
misleading use the AT_SYMLINK_NOFOLLOW (at the time the automount
mechanism abused the ->follow_link() method because it had similar
semantics to symlinks).

To catch the older usage pattern re-adding an AT_SYMLINK_NOFOLLOW
check would be helpful.

The reality is there are still the same old problems of unintended
mounting (mount storms) and AT_NO_AUTOMOUNT not being properly
handled.

Certainly the implementation we have now is much better but these
niggling problems remain and user space steps on them way too often,
and it feels like its much more so lately.  

>   xstatat now correctly honours AT_NO_AUTOMOUNT for indirect mounts
> but is otherwise unchanged.

Yep, assuming we accept the ENOENT return as sensible for AT_NO_AUTOMOUNT
no browse indirect mount case. statx() being a new system call it would
be ideal to get the semantics of this call right now before it becomes well
u

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 11:04, NeilBrown wrote:
> On Thu, Nov 23 2017, Ian Kent wrote:
> 
>> On 23/11/17 08:47, NeilBrown wrote:
>>> On Wed, Nov 22 2017, Ian Kent wrote:
>>>
>>>> On 21/11/17 09:53, NeilBrown wrote:
>>>>> On Wed, May 10 2017, Ian Kent wrote:
>>>>>
>>>>>> The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
>>>>>> which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
>>>>>> of an automount by the call. But this flag is unconditionally cleared
>>>>>> for all stat family system calls except statx().
>>>>>>
>>>>>> stat family system calls have always triggered mount requests for the
>>>>>> negative dentry case in follow_automount() which is intended but prevents
>>>>>> the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.
>>>>>>
>>>>>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>>>>>> negative dentry case in follow_automount() needs to be changed to
>>>>>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>>>>>> required flags are clear).
>>>>>>
>>>>>> AFAICT this change doesn't have any noticable side effects and may,
>>>>>> in some use cases (although I didn't see it in testing) prevent
>>>>>> unnecessary callbacks to the automount daemon.
>>>>>
>>>>> Actually, this patch does have a noticeable side effect.
>>>>
>>>> That's right.
>>>>
>>>>>
>>>>> Previously, if /home were an indirect mount point so that /home/neilb
>>>>> would mount my home directory, "ls -l /home/neilb" would always work.
>>>>> Now it doesn't.
>>>>
>>>> An this is correct too.
>>>>
>>>> The previous policy was that a ->lookup() would always cause a mount to
>>>> occur. It's that policy that prevents the AT_NO_AUTOMOUNT flag from being
>>>> able to work consistently.
>>>
>>> Just to be clear, the previous policy was:
>>>  kernel - a lookup would cause a message to be sent to the automount daemon
>>>  daemon - the receipt of a message would cause a mount to occur.
>>>
>>> Either part of this policy could be changed to allow AT_NO_AUTOMOUNT to
>>> work reliably.  You chose to change the first.  At the time I thought
>>> this was a good idea.  I no longer think so.
>>>
>>> Specifically, I think the second part of the policy should be revised a 
>>> little.
>>>
>>>>
>>>> If you set the indirect mount "browse" option that will cause the mount
>>>> point directories for each of the map keys to be pre-created. So a
>>>> directory listing will show the mount points and an attempt to access
>>>> anything within the mount point directory will cause it to be mounted.
>>>>
>>>> There is still the problem of not knowing map keys when the wildcard
>>>> map entry is used.
>>>>
>>>> Still, stat family systems calls have always had similar problems that
>>>> prevent consistent behavior.
>>>
>>> Yes, I understand all this.  stat family sys-call have some odd
>>> behaviours at times like "stat(); open(); fstat()" will result in
>>> different sets of status info.  This is known.
>>> The point is that these odd behaviours have changed in a noticeably way
>>> (contrary to the change log comment) and it isn't clear these changes
>>> are good.
>>>
>>>>
>>>>>
>>>>> This happens because "ls" calls 'lstat' on the path and when that fails
>>>>> with "ENOENT", it doesn't bother trying to open.
>>>>
>>>> I know, I believe the ENOENT is appropriate because there is no mount
>>>> and no directory at the time this happens.
>>>
>>> Two distinct statements here.  "no mount" and "no directory".
>>> If AT_NO_AUTOMOUNT is given (or implicit) the "no mount" is significant
>>> and shouldn't be changed.   It isn't clear that "no directory" is
>>> significant.
>>> If you think of the list of names stored in the autofs filesystem as a
>>> cache of recently used names from the map, then the directory *does*
>>> exist (in the map), it is just that the (in-kernel) cac

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 11:04, NeilBrown wrote:
> On Thu, Nov 23 2017, Ian Kent wrote:
> 
>> On 23/11/17 08:47, NeilBrown wrote:
>>> On Wed, Nov 22 2017, Ian Kent wrote:
>>>
>>>> On 21/11/17 09:53, NeilBrown wrote:
>>>>> On Wed, May 10 2017, Ian Kent wrote:
>>>>>
>>>>>> The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
>>>>>> which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
>>>>>> of an automount by the call. But this flag is unconditionally cleared
>>>>>> for all stat family system calls except statx().
>>>>>>
>>>>>> stat family system calls have always triggered mount requests for the
>>>>>> negative dentry case in follow_automount() which is intended but prevents
>>>>>> the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.
>>>>>>
>>>>>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>>>>>> negative dentry case in follow_automount() needs to be changed to
>>>>>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>>>>>> required flags are clear).
>>>>>>
>>>>>> AFAICT this change doesn't have any noticable side effects and may,
>>>>>> in some use cases (although I didn't see it in testing) prevent
>>>>>> unnecessary callbacks to the automount daemon.
>>>>>
>>>>> Actually, this patch does have a noticeable side effect.
>>>>
>>>> That's right.
>>>>
>>>>>
>>>>> Previously, if /home were an indirect mount point so that /home/neilb
>>>>> would mount my home directory, "ls -l /home/neilb" would always work.
>>>>> Now it doesn't.
>>>>
>>>> An this is correct too.
>>>>
>>>> The previous policy was that a ->lookup() would always cause a mount to
>>>> occur. It's that policy that prevents the AT_NO_AUTOMOUNT flag from being
>>>> able to work consistently.
>>>
>>> Just to be clear, the previous policy was:
>>>  kernel - a lookup would cause a message to be sent to the automount daemon
>>>  daemon - the receipt of a message would cause a mount to occur.
>>>
>>> Either part of this policy could be changed to allow AT_NO_AUTOMOUNT to
>>> work reliably.  You chose to change the first.  At the time I thought
>>> this was a good idea.  I no longer think so.
>>>
>>> Specifically, I think the second part of the policy should be revised a 
>>> little.
>>>
>>>>
>>>> If you set the indirect mount "browse" option that will cause the mount
>>>> point directories for each of the map keys to be pre-created. So a
>>>> directory listing will show the mount points and an attempt to access
>>>> anything within the mount point directory will cause it to be mounted.
>>>>
>>>> There is still the problem of not knowing map keys when the wildcard
>>>> map entry is used.
>>>>
>>>> Still, stat family systems calls have always had similar problems that
>>>> prevent consistent behavior.
>>>
>>> Yes, I understand all this.  stat family sys-call have some odd
>>> behaviours at times like "stat(); open(); fstat()" will result in
>>> different sets of status info.  This is known.
>>> The point is that these odd behaviours have changed in a noticeably way
>>> (contrary to the change log comment) and it isn't clear these changes
>>> are good.
>>>
>>>>
>>>>>
>>>>> This happens because "ls" calls 'lstat' on the path and when that fails
>>>>> with "ENOENT", it doesn't bother trying to open.
>>>>
>>>> I know, I believe the ENOENT is appropriate because there is no mount
>>>> and no directory at the time this happens.
>>>
>>> Two distinct statements here.  "no mount" and "no directory".
>>> If AT_NO_AUTOMOUNT is given (or implicit) the "no mount" is significant
>>> and shouldn't be changed.   It isn't clear that "no directory" is
>>> significant.
>>> If you think of the list of names stored in the autofs filesystem as a
>>> cache of recently used names from the map, then the directory *does*
>>> exist (in the map), it is just that the (in-kernel) cac

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 10:46, Ian Kent wrote:
> On 23/11/17 10:21, NeilBrown wrote:
>> On Thu, Nov 23 2017, Ian Kent wrote:
>>
>>>
>>> Hey Neil, I'm looking at this again because RH QE have complained about
>>> a regression test failing with a kernel that has this change.
>>>
>>> Maybe I'm just dumb but I though a "find  "
>>> would, well, just look at the contents below  but an
>>> strace shows that it reads and calls fstatat() on "every entry in the
>>> mount table" regardless of the path.
>>
>> weird ... I can only get find to look at the mount table if given the
>> -fstyp option, and even then it doesn't fstatat anything that isn't in
>> the tree it is searching.
> 
> It's probably the -xautofs (exclude autofs fs'es) that was used in
> the test that requires reading the mount table to get info about
> excluding autofs mounts but the fstatat() on all the entries,
> regardless of path, that was a surprise to me.
> 
> find did use AT_SYMLINK_NOFOLLOW which historically behaved like
> AT_NO_AUTOMOUNT.

Check this simple reproducer out:

[root@intel-mahobay-02 ~]# uname -r
4.14.0-0.rc8.2.el7a.x86_64
[root@intel-mahobay-02 ~]# cat /etc/exports
/exportdir-bz1482065 *(rw,no_root_squash)
[root@intel-mahobay-02 ~]# cat /etc/auto.master
/- /etc/auto.direct
[root@intel-mahobay-02 ~]# cat /etc/auto.direct 
/mnt/autofs/test localhost:/exportdir-bz1482065
[root@intel-mahobay-02 ~]# ls /mnt/autofs/
bz1482065  test
[root@intel-mahobay-02 ~]# ls /exportdir-bz1482065
bz1482065
[root@intel-mahobay-02 ~]# find /mnt/autofs -xautofs  -name '*bz1482065*'
[root@intel-mahobay-02 ~]# find /mnt/autofs -xautofs  -name '*bz1482065*'
/mnt/autofs/bz1482065
/mnt/autofs/test/bz1482065

The first find mounts the direct mount /mnt/autofs/test/bz1482065 which causes 
the
second to list /mnt/autofs/bz1482065 (not sure yet where this comes from, it 
must
be a plain directory within /mnt/autofs) and the now mounted bind mount (ie. the
localhost: host name in the map entry) /mnt/autofs/test/bz1482065.

The strace is:
execve("/usr/bin/find", ["find", "/mnt/autofs", "-xautofs", "-name", 
"*bz1482065*"], [/* 45 vars */]) = 0

snip ...

newfstatat(AT_FDCWD, "/mnt/autofs", {st_mode=S_IFDIR|0755, st_size=35, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
open("/etc/mtab", O_RDONLY|O_CLOEXEC)   = 4
futex(0x7efec9ba55b0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7efeca2ea000
read(4, "sysfs /sys sysfs rw,seclabel,nos"..., 1024) = 1024
read(4, "oup/net_cls,net_prio cgroup rw,n"..., 1024) = 1024
read(4, "ges hugetlbfs rw,seclabel,relati"..., 1024) = 594
read(4, "", 1024)   = 0
close(4)= 0
munmap(0x7efeca2ea000, 4096)= 0
newfstatat(AT_FDCWD, "/sys", {st_mode=S_IFDIR|0555, st_size=0, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/dev", {st_mode=S_IFDIR|0755, st_size=3280, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/kernel/security", {st_mode=S_IFDIR|0755, st_size=0, 
...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/dev/shm", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=40, 
...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/dev/pts", {st_mode=S_IFDIR|0755, st_size=0, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/run", {st_mode=S_IFDIR|0755, st_size=1020, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup", {st_mode=S_IFDIR|0755, st_size=340, 
...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/systemd", {st_mode=S_IFDIR|0555, 
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/pstore", {st_mode=S_IFDIR|0750, st_size=0, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/blkio", {st_mode=S_IFDIR|0555, st_size=0, 
...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/cpu,cpuacct", {st_mode=S_IFDIR|0555, 
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/pids", {st_mode=S_IFDIR|0555, st_size=0, 
...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/net_cls,net_prio", {st_mode=S_IFDIR|0555, 
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/devices", {st_mode=S_IFDIR|0555, 
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/freezer", {st_mode=S_IFDIR|0555, 
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/perf_event"

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 10:46, Ian Kent wrote:
> On 23/11/17 10:21, NeilBrown wrote:
>> On Thu, Nov 23 2017, Ian Kent wrote:
>>
>>>
>>> Hey Neil, I'm looking at this again because RH QE have complained about
>>> a regression test failing with a kernel that has this change.
>>>
>>> Maybe I'm just dumb but I though a "find  "
>>> would, well, just look at the contents below  but an
>>> strace shows that it reads and calls fstatat() on "every entry in the
>>> mount table" regardless of the path.
>>
>> weird ... I can only get find to look at the mount table if given the
>> -fstyp option, and even then it doesn't fstatat anything that isn't in
>> the tree it is searching.
> 
> It's probably the -xautofs (exclude autofs fs'es) that was used in
> the test that requires reading the mount table to get info about
> excluding autofs mounts but the fstatat() on all the entries,
> regardless of path, that was a surprise to me.
> 
> find did use AT_SYMLINK_NOFOLLOW which historically behaved like
> AT_NO_AUTOMOUNT.

Check this simple reproducer out:

[root@intel-mahobay-02 ~]# uname -r
4.14.0-0.rc8.2.el7a.x86_64
[root@intel-mahobay-02 ~]# cat /etc/exports
/exportdir-bz1482065 *(rw,no_root_squash)
[root@intel-mahobay-02 ~]# cat /etc/auto.master
/- /etc/auto.direct
[root@intel-mahobay-02 ~]# cat /etc/auto.direct 
/mnt/autofs/test localhost:/exportdir-bz1482065
[root@intel-mahobay-02 ~]# ls /mnt/autofs/
bz1482065  test
[root@intel-mahobay-02 ~]# ls /exportdir-bz1482065
bz1482065
[root@intel-mahobay-02 ~]# find /mnt/autofs -xautofs  -name '*bz1482065*'
[root@intel-mahobay-02 ~]# find /mnt/autofs -xautofs  -name '*bz1482065*'
/mnt/autofs/bz1482065
/mnt/autofs/test/bz1482065

The first find mounts the direct mount /mnt/autofs/test/bz1482065 which causes 
the
second to list /mnt/autofs/bz1482065 (not sure yet where this comes from, it 
must
be a plain directory within /mnt/autofs) and the now mounted bind mount (ie. the
localhost: host name in the map entry) /mnt/autofs/test/bz1482065.

The strace is:
execve("/usr/bin/find", ["find", "/mnt/autofs", "-xautofs", "-name", 
"*bz1482065*"], [/* 45 vars */]) = 0

snip ...

newfstatat(AT_FDCWD, "/mnt/autofs", {st_mode=S_IFDIR|0755, st_size=35, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
open("/etc/mtab", O_RDONLY|O_CLOEXEC)   = 4
futex(0x7efec9ba55b0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7efeca2ea000
read(4, "sysfs /sys sysfs rw,seclabel,nos"..., 1024) = 1024
read(4, "oup/net_cls,net_prio cgroup rw,n"..., 1024) = 1024
read(4, "ges hugetlbfs rw,seclabel,relati"..., 1024) = 594
read(4, "", 1024)   = 0
close(4)= 0
munmap(0x7efeca2ea000, 4096)= 0
newfstatat(AT_FDCWD, "/sys", {st_mode=S_IFDIR|0555, st_size=0, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/dev", {st_mode=S_IFDIR|0755, st_size=3280, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/kernel/security", {st_mode=S_IFDIR|0755, st_size=0, 
...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/dev/shm", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=40, 
...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/dev/pts", {st_mode=S_IFDIR|0755, st_size=0, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/run", {st_mode=S_IFDIR|0755, st_size=1020, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup", {st_mode=S_IFDIR|0755, st_size=340, 
...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/systemd", {st_mode=S_IFDIR|0555, 
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/pstore", {st_mode=S_IFDIR|0750, st_size=0, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/blkio", {st_mode=S_IFDIR|0555, st_size=0, 
...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/cpu,cpuacct", {st_mode=S_IFDIR|0555, 
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/pids", {st_mode=S_IFDIR|0555, st_size=0, 
...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/net_cls,net_prio", {st_mode=S_IFDIR|0555, 
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/devices", {st_mode=S_IFDIR|0555, 
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/freezer", {st_mode=S_IFDIR|0555, 
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/perf_event"

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 10:21, NeilBrown wrote:
> On Thu, Nov 23 2017, Ian Kent wrote:
> 
>>
>> Hey Neil, I'm looking at this again because RH QE have complained about
>> a regression test failing with a kernel that has this change.
>>
>> Maybe I'm just dumb but I though a "find  "
>> would, well, just look at the contents below  but an
>> strace shows that it reads and calls fstatat() on "every entry in the
>> mount table" regardless of the path.
> 
> weird ... I can only get find to look at the mount table if given the
> -fstyp option, and even then it doesn't fstatat anything that isn't in
> the tree it is searching.

It's probably the -xautofs (exclude autofs fs'es) that was used in
the test that requires reading the mount table to get info about
excluding autofs mounts but the fstatat() on all the entries,
regardless of path, that was a surprise to me.

find did use AT_SYMLINK_NOFOLLOW which historically behaved like
AT_NO_AUTOMOUNT.

> 
> 
>>
>> And with the move of userspace to use /proc based mount tables (one
>> example being the symlink of /etc/mtab into /proc) even modest sized
>> direct mount maps will be a problem with every entry getting mounted.
> 
> But the patch in question is only about indirect mount maps, isn't it?
> How is it relevant to direct mount maps?

The change here will cause fstatat() to trigger direct mounts on access
if it doesn't use AT_NO_AUTOMOUNT.

It's not a problem for browse indirect mounts because they are plain
directories within the autofs mount point not individual autofs mount
triggers.

> 
>>
>> Systems will cope with this fine but larger systems not so much.
>>
>> If find does this then the user space changes needed to accommodate
>> this sort of change are almost certainly far more than I expected.
>>
>> I think this is an example of the larger problem I'm faced with and
>> this change was was meant to be a starting point for resolution.
>>
>> The most obvious symptom of the problem is auto-mounts no longer able
>> to be expired due to being re-mounted immediately after expire. Another
>> symptom is unwanted (by the user) accesses causing unexpected auto-mount
>> attempts.
>>
>> I believe this monitoring of the mount table is what leads to excessive
>> CPU consumption I've seen, usually around six processes, under heavy
>> mount activity. And following this, when the mount table is large and
>> there is "no mount activity" two of the six processes continue to consume
>> excessive CPU, until the mount table shrinks.
>>
>> So now I'm coming around to the idea of reverting this change . and
>> going back to the drawing board.
> 
> I can well imaging that a large mount table could cause problems for
> applications that are written to expect one, and I can imagine that
> autofs could cause extra issues for such a program as it might change
> the mount table more often.  But I haven't yet worked out how this is
> related to the patch in question
> 
> Thanks,
> NeilBrown
> 



Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 10:21, NeilBrown wrote:
> On Thu, Nov 23 2017, Ian Kent wrote:
> 
>>
>> Hey Neil, I'm looking at this again because RH QE have complained about
>> a regression test failing with a kernel that has this change.
>>
>> Maybe I'm just dumb but I though a "find  "
>> would, well, just look at the contents below  but an
>> strace shows that it reads and calls fstatat() on "every entry in the
>> mount table" regardless of the path.
> 
> weird ... I can only get find to look at the mount table if given the
> -fstyp option, and even then it doesn't fstatat anything that isn't in
> the tree it is searching.

It's probably the -xautofs (exclude autofs fs'es) that was used in
the test that requires reading the mount table to get info about
excluding autofs mounts but the fstatat() on all the entries,
regardless of path, that was a surprise to me.

find did use AT_SYMLINK_NOFOLLOW which historically behaved like
AT_NO_AUTOMOUNT.

> 
> 
>>
>> And with the move of userspace to use /proc based mount tables (one
>> example being the symlink of /etc/mtab into /proc) even modest sized
>> direct mount maps will be a problem with every entry getting mounted.
> 
> But the patch in question is only about indirect mount maps, isn't it?
> How is it relevant to direct mount maps?

The change here will cause fstatat() to trigger direct mounts on access
if it doesn't use AT_NO_AUTOMOUNT.

It's not a problem for browse indirect mounts because they are plain
directories within the autofs mount point not individual autofs mount
triggers.

> 
>>
>> Systems will cope with this fine but larger systems not so much.
>>
>> If find does this then the user space changes needed to accommodate
>> this sort of change are almost certainly far more than I expected.
>>
>> I think this is an example of the larger problem I'm faced with and
>> this change was was meant to be a starting point for resolution.
>>
>> The most obvious symptom of the problem is auto-mounts no longer able
>> to be expired due to being re-mounted immediately after expire. Another
>> symptom is unwanted (by the user) accesses causing unexpected auto-mount
>> attempts.
>>
>> I believe this monitoring of the mount table is what leads to excessive
>> CPU consumption I've seen, usually around six processes, under heavy
>> mount activity. And following this, when the mount table is large and
>> there is "no mount activity" two of the six processes continue to consume
>> excessive CPU, until the mount table shrinks.
>>
>> So now I'm coming around to the idea of reverting this change . and
>> going back to the drawing board.
> 
> I can well imaging that a large mount table could cause problems for
> applications that are written to expect one, and I can imagine that
> autofs could cause extra issues for such a program as it might change
> the mount table more often.  But I haven't yet worked out how this is
> related to the patch in question
> 
> Thanks,
> NeilBrown
> 



Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 09:43, Ian Kent wrote:
> On 23/11/17 08:47, NeilBrown wrote:
>> On Wed, Nov 22 2017, Ian Kent wrote:
>>
>>> On 21/11/17 09:53, NeilBrown wrote:
>>>> On Wed, May 10 2017, Ian Kent wrote:
>>>>
>>>>> The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
>>>>> which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
>>>>> of an automount by the call. But this flag is unconditionally cleared
>>>>> for all stat family system calls except statx().
>>>>>
>>>>> stat family system calls have always triggered mount requests for the
>>>>> negative dentry case in follow_automount() which is intended but prevents
>>>>> the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.
>>>>>
>>>>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>>>>> negative dentry case in follow_automount() needs to be changed to
>>>>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>>>>> required flags are clear).
>>>>>
>>>>> AFAICT this change doesn't have any noticable side effects and may,
>>>>> in some use cases (although I didn't see it in testing) prevent
>>>>> unnecessary callbacks to the automount daemon.
>>>>
>>>> Actually, this patch does have a noticeable side effect.
>>>
>>> That's right.
>>>
>>>>
>>>> Previously, if /home were an indirect mount point so that /home/neilb
>>>> would mount my home directory, "ls -l /home/neilb" would always work.
>>>> Now it doesn't.
>>>
>>> An this is correct too.
>>>
>>> The previous policy was that a ->lookup() would always cause a mount to
>>> occur. It's that policy that prevents the AT_NO_AUTOMOUNT flag from being
>>> able to work consistently.
>>
>> Just to be clear, the previous policy was:
>>  kernel - a lookup would cause a message to be sent to the automount daemon
>>  daemon - the receipt of a message would cause a mount to occur.
>>
>> Either part of this policy could be changed to allow AT_NO_AUTOMOUNT to
>> work reliably.  You chose to change the first.  At the time I thought
>> this was a good idea.  I no longer think so.
>>
>> Specifically, I think the second part of the policy should be revised a 
>> little.
>>
>>>
>>> If you set the indirect mount "browse" option that will cause the mount
>>> point directories for each of the map keys to be pre-created. So a
>>> directory listing will show the mount points and an attempt to access
>>> anything within the mount point directory will cause it to be mounted.
>>>
>>> There is still the problem of not knowing map keys when the wildcard
>>> map entry is used.
>>>
>>> Still, stat family systems calls have always had similar problems that
>>> prevent consistent behavior.
>>
>> Yes, I understand all this.  stat family sys-call have some odd
>> behaviours at times like "stat(); open(); fstat()" will result in
>> different sets of status info.  This is known.
>> The point is that these odd behaviours have changed in a noticeably way
>> (contrary to the change log comment) and it isn't clear these changes
>> are good.
>>
>>>
>>>>
>>>> This happens because "ls" calls 'lstat' on the path and when that fails
>>>> with "ENOENT", it doesn't bother trying to open.
>>>
>>> I know, I believe the ENOENT is appropriate because there is no mount
>>> and no directory at the time this happens.
>>
>> Two distinct statements here.  "no mount" and "no directory".
>> If AT_NO_AUTOMOUNT is given (or implicit) the "no mount" is significant
>> and shouldn't be changed.   It isn't clear that "no directory" is
>> significant.
>> If you think of the list of names stored in the autofs filesystem as a
>> cache of recently used names from the map, then the directory *does*
>> exist (in the map), it is just that the (in-kernel) cache hasn't been
>> populated yet.
>> Should "AT_NO_AUTOMOUNT" prevent the cache from being populated?
>>
>>>
>>>>
>>>> An alternate approach to the problem that this patch addresses is to
>>>> *not* change follow_automount() but instead change the automount daemon
>>>> and possibly autofs.
>&

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 09:43, Ian Kent wrote:
> On 23/11/17 08:47, NeilBrown wrote:
>> On Wed, Nov 22 2017, Ian Kent wrote:
>>
>>> On 21/11/17 09:53, NeilBrown wrote:
>>>> On Wed, May 10 2017, Ian Kent wrote:
>>>>
>>>>> The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
>>>>> which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
>>>>> of an automount by the call. But this flag is unconditionally cleared
>>>>> for all stat family system calls except statx().
>>>>>
>>>>> stat family system calls have always triggered mount requests for the
>>>>> negative dentry case in follow_automount() which is intended but prevents
>>>>> the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.
>>>>>
>>>>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>>>>> negative dentry case in follow_automount() needs to be changed to
>>>>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>>>>> required flags are clear).
>>>>>
>>>>> AFAICT this change doesn't have any noticable side effects and may,
>>>>> in some use cases (although I didn't see it in testing) prevent
>>>>> unnecessary callbacks to the automount daemon.
>>>>
>>>> Actually, this patch does have a noticeable side effect.
>>>
>>> That's right.
>>>
>>>>
>>>> Previously, if /home were an indirect mount point so that /home/neilb
>>>> would mount my home directory, "ls -l /home/neilb" would always work.
>>>> Now it doesn't.
>>>
>>> An this is correct too.
>>>
>>> The previous policy was that a ->lookup() would always cause a mount to
>>> occur. It's that policy that prevents the AT_NO_AUTOMOUNT flag from being
>>> able to work consistently.
>>
>> Just to be clear, the previous policy was:
>>  kernel - a lookup would cause a message to be sent to the automount daemon
>>  daemon - the receipt of a message would cause a mount to occur.
>>
>> Either part of this policy could be changed to allow AT_NO_AUTOMOUNT to
>> work reliably.  You chose to change the first.  At the time I thought
>> this was a good idea.  I no longer think so.
>>
>> Specifically, I think the second part of the policy should be revised a 
>> little.
>>
>>>
>>> If you set the indirect mount "browse" option that will cause the mount
>>> point directories for each of the map keys to be pre-created. So a
>>> directory listing will show the mount points and an attempt to access
>>> anything within the mount point directory will cause it to be mounted.
>>>
>>> There is still the problem of not knowing map keys when the wildcard
>>> map entry is used.
>>>
>>> Still, stat family systems calls have always had similar problems that
>>> prevent consistent behavior.
>>
>> Yes, I understand all this.  stat family sys-call have some odd
>> behaviours at times like "stat(); open(); fstat()" will result in
>> different sets of status info.  This is known.
>> The point is that these odd behaviours have changed in a noticeably way
>> (contrary to the change log comment) and it isn't clear these changes
>> are good.
>>
>>>
>>>>
>>>> This happens because "ls" calls 'lstat' on the path and when that fails
>>>> with "ENOENT", it doesn't bother trying to open.
>>>
>>> I know, I believe the ENOENT is appropriate because there is no mount
>>> and no directory at the time this happens.
>>
>> Two distinct statements here.  "no mount" and "no directory".
>> If AT_NO_AUTOMOUNT is given (or implicit) the "no mount" is significant
>> and shouldn't be changed.   It isn't clear that "no directory" is
>> significant.
>> If you think of the list of names stored in the autofs filesystem as a
>> cache of recently used names from the map, then the directory *does*
>> exist (in the map), it is just that the (in-kernel) cache hasn't been
>> populated yet.
>> Should "AT_NO_AUTOMOUNT" prevent the cache from being populated?
>>
>>>
>>>>
>>>> An alternate approach to the problem that this patch addresses is to
>>>> *not* change follow_automount() but instead change the automount daemon
>>>> and possibly autofs.
>&

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 08:47, NeilBrown wrote:
> On Wed, Nov 22 2017, Ian Kent wrote:
> 
>> On 21/11/17 09:53, NeilBrown wrote:
>>> On Wed, May 10 2017, Ian Kent wrote:
>>>
>>>> The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
>>>> which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
>>>> of an automount by the call. But this flag is unconditionally cleared
>>>> for all stat family system calls except statx().
>>>>
>>>> stat family system calls have always triggered mount requests for the
>>>> negative dentry case in follow_automount() which is intended but prevents
>>>> the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.
>>>>
>>>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>>>> negative dentry case in follow_automount() needs to be changed to
>>>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>>>> required flags are clear).
>>>>
>>>> AFAICT this change doesn't have any noticable side effects and may,
>>>> in some use cases (although I didn't see it in testing) prevent
>>>> unnecessary callbacks to the automount daemon.
>>>
>>> Actually, this patch does have a noticeable side effect.
>>
>> That's right.
>>
>>>
>>> Previously, if /home were an indirect mount point so that /home/neilb
>>> would mount my home directory, "ls -l /home/neilb" would always work.
>>> Now it doesn't.
>>
>> An this is correct too.
>>
>> The previous policy was that a ->lookup() would always cause a mount to
>> occur. It's that policy that prevents the AT_NO_AUTOMOUNT flag from being
>> able to work consistently.
> 
> Just to be clear, the previous policy was:
>  kernel - a lookup would cause a message to be sent to the automount daemon
>  daemon - the receipt of a message would cause a mount to occur.
> 
> Either part of this policy could be changed to allow AT_NO_AUTOMOUNT to
> work reliably.  You chose to change the first.  At the time I thought
> this was a good idea.  I no longer think so.
> 
> Specifically, I think the second part of the policy should be revised a 
> little.
> 
>>
>> If you set the indirect mount "browse" option that will cause the mount
>> point directories for each of the map keys to be pre-created. So a
>> directory listing will show the mount points and an attempt to access
>> anything within the mount point directory will cause it to be mounted.
>>
>> There is still the problem of not knowing map keys when the wildcard
>> map entry is used.
>>
>> Still, stat family systems calls have always had similar problems that
>> prevent consistent behavior.
> 
> Yes, I understand all this.  stat family sys-call have some odd
> behaviours at times like "stat(); open(); fstat()" will result in
> different sets of status info.  This is known.
> The point is that these odd behaviours have changed in a noticeably way
> (contrary to the change log comment) and it isn't clear these changes
> are good.
> 
>>
>>>
>>> This happens because "ls" calls 'lstat' on the path and when that fails
>>> with "ENOENT", it doesn't bother trying to open.
>>
>> I know, I believe the ENOENT is appropriate because there is no mount
>> and no directory at the time this happens.
> 
> Two distinct statements here.  "no mount" and "no directory".
> If AT_NO_AUTOMOUNT is given (or implicit) the "no mount" is significant
> and shouldn't be changed.   It isn't clear that "no directory" is
> significant.
> If you think of the list of names stored in the autofs filesystem as a
> cache of recently used names from the map, then the directory *does*
> exist (in the map), it is just that the (in-kernel) cache hasn't been
> populated yet.
> Should "AT_NO_AUTOMOUNT" prevent the cache from being populated?
> 
>>
>>>
>>> An alternate approach to the problem that this patch addresses is to
>>> *not* change follow_automount() but instead change the automount daemon
>>> and possibly autofs.
>>
>> Perhaps, but the daemon probably doesn't have enough information to
>> make decisions about this so there would need to be something coming
>> from the kernel too.
> 
> I don't think so.
> The daemon already has a promise that upcalls for a given name are
> serialized, and it has the ability to test if a name is already in the
> cache.  This is enough.
> 

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 08:47, NeilBrown wrote:
> On Wed, Nov 22 2017, Ian Kent wrote:
> 
>> On 21/11/17 09:53, NeilBrown wrote:
>>> On Wed, May 10 2017, Ian Kent wrote:
>>>
>>>> The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
>>>> which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
>>>> of an automount by the call. But this flag is unconditionally cleared
>>>> for all stat family system calls except statx().
>>>>
>>>> stat family system calls have always triggered mount requests for the
>>>> negative dentry case in follow_automount() which is intended but prevents
>>>> the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.
>>>>
>>>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>>>> negative dentry case in follow_automount() needs to be changed to
>>>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>>>> required flags are clear).
>>>>
>>>> AFAICT this change doesn't have any noticable side effects and may,
>>>> in some use cases (although I didn't see it in testing) prevent
>>>> unnecessary callbacks to the automount daemon.
>>>
>>> Actually, this patch does have a noticeable side effect.
>>
>> That's right.
>>
>>>
>>> Previously, if /home were an indirect mount point so that /home/neilb
>>> would mount my home directory, "ls -l /home/neilb" would always work.
>>> Now it doesn't.
>>
>> An this is correct too.
>>
>> The previous policy was that a ->lookup() would always cause a mount to
>> occur. It's that policy that prevents the AT_NO_AUTOMOUNT flag from being
>> able to work consistently.
> 
> Just to be clear, the previous policy was:
>  kernel - a lookup would cause a message to be sent to the automount daemon
>  daemon - the receipt of a message would cause a mount to occur.
> 
> Either part of this policy could be changed to allow AT_NO_AUTOMOUNT to
> work reliably.  You chose to change the first.  At the time I thought
> this was a good idea.  I no longer think so.
> 
> Specifically, I think the second part of the policy should be revised a 
> little.
> 
>>
>> If you set the indirect mount "browse" option that will cause the mount
>> point directories for each of the map keys to be pre-created. So a
>> directory listing will show the mount points and an attempt to access
>> anything within the mount point directory will cause it to be mounted.
>>
>> There is still the problem of not knowing map keys when the wildcard
>> map entry is used.
>>
>> Still, stat family systems calls have always had similar problems that
>> prevent consistent behavior.
> 
> Yes, I understand all this.  stat family sys-call have some odd
> behaviours at times like "stat(); open(); fstat()" will result in
> different sets of status info.  This is known.
> The point is that these odd behaviours have changed in a noticeably way
> (contrary to the change log comment) and it isn't clear these changes
> are good.
> 
>>
>>>
>>> This happens because "ls" calls 'lstat' on the path and when that fails
>>> with "ENOENT", it doesn't bother trying to open.
>>
>> I know, I believe the ENOENT is appropriate because there is no mount
>> and no directory at the time this happens.
> 
> Two distinct statements here.  "no mount" and "no directory".
> If AT_NO_AUTOMOUNT is given (or implicit) the "no mount" is significant
> and shouldn't be changed.   It isn't clear that "no directory" is
> significant.
> If you think of the list of names stored in the autofs filesystem as a
> cache of recently used names from the map, then the directory *does*
> exist (in the map), it is just that the (in-kernel) cache hasn't been
> populated yet.
> Should "AT_NO_AUTOMOUNT" prevent the cache from being populated?
> 
>>
>>>
>>> An alternate approach to the problem that this patch addresses is to
>>> *not* change follow_automount() but instead change the automount daemon
>>> and possibly autofs.
>>
>> Perhaps, but the daemon probably doesn't have enough information to
>> make decisions about this so there would need to be something coming
>> from the kernel too.
> 
> I don't think so.
> The daemon already has a promise that upcalls for a given name are
> serialized, and it has the ability to test if a name is already in the
> cache.  This is enough.
> 

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 22/11/17 12:28, Ian Kent wrote:
> On 21/11/17 09:53, NeilBrown wrote:
>> On Wed, May 10 2017, Ian Kent wrote:
>>
>>> The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
>>> which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
>>> of an automount by the call. But this flag is unconditionally cleared
>>> for all stat family system calls except statx().
>>>
>>> stat family system calls have always triggered mount requests for the
>>> negative dentry case in follow_automount() which is intended but prevents
>>> the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.
>>>
>>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>>> negative dentry case in follow_automount() needs to be changed to
>>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>>> required flags are clear).
>>>
>>> AFAICT this change doesn't have any noticable side effects and may,
>>> in some use cases (although I didn't see it in testing) prevent
>>> unnecessary callbacks to the automount daemon.
>>
>> Actually, this patch does have a noticeable side effect.
> 
> That's right.
> 
>>
>> Previously, if /home were an indirect mount point so that /home/neilb
>> would mount my home directory, "ls -l /home/neilb" would always work.
>> Now it doesn't.
> 
> An this is correct too.
> 
> The previous policy was that a ->lookup() would always cause a mount to
> occur. It's that policy that prevents the AT_NO_AUTOMOUNT flag from being
> able to work consistently.
> 
> If you set the indirect mount "browse" option that will cause the mount
> point directories for each of the map keys to be pre-created. So a
> directory listing will show the mount points and an attempt to access
> anything within the mount point directory will cause it to be mounted.
> 
> There is still the problem of not knowing map keys when the wildcard
> map entry is used.
> 
> Still, stat family systems calls have always had similar problems that
> prevent consistent behavior.
> 
>>
>> This happens because "ls" calls 'lstat' on the path and when that fails
>> with "ENOENT", it doesn't bother trying to open.
> 
> I know, I believe the ENOENT is appropriate because there is no mount
> and no directory at the time this happens.
> 
>>
>> An alternate approach to the problem that this patch addresses is to
>> *not* change follow_automount() but instead change the automount daemon
>> and possibly autofs.
> 
> Perhaps, but the daemon probably doesn't have enough information to
> make decisions about this so there would need to be something coming
> from the kernel too.
> 
>>
>> When a notification of access for an indirect mount point is received,
>> it would firstly just create the directory - not triggering a mount.
>> If another notification is then received (after the directory has been
>> created), it then triggers the mount.
> 
> Not sure, perhaps.
> 
> But I don't think that will work, I've had many problems with this type
> behavior due to bugs and I think the the same sort of problems would be
> encountered.
> 
> The indirect mount "browse" option which behaves very much like what your
> suggesting is the internal program default, and has been since the initial
> v5 release. Historically it is disabled on install to maintain backward
> compatibility with the original Linux autofs (which is also the reason for
> always triggering an automount on ->lookup()).
> 
> Perhaps now is the time for that to change. 
> 
>>
>> I suspect this might need changes to autofs to avoid races when two
>> threads call lstat() on the same path at the same time.  We would need
>> to ensure that automount didn't see this as two requests though
>> maybe it already has enough locking.
>>
>> Does that seem reasonable?  Should we revert this patch (as a
>> regression) and do something in automount instead?

Hey Neil, I'm looking at this again because RH QE have complained about
a regression test failing with a kernel that has this change.

Maybe I'm just dumb but I though a "find  "
would, well, just look at the contents below  but an
strace shows that it reads and calls fstatat() on "every entry in the
mount table" regardless of the path.

And with the move of userspace to use /proc based mount tables (one
example being the symlink of /etc/mtab into /proc) even modest sized
direct mount maps will be a problem with every entry getting mounted.

Systems will cope with this fine but larger systems 

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 22/11/17 12:28, Ian Kent wrote:
> On 21/11/17 09:53, NeilBrown wrote:
>> On Wed, May 10 2017, Ian Kent wrote:
>>
>>> The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
>>> which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
>>> of an automount by the call. But this flag is unconditionally cleared
>>> for all stat family system calls except statx().
>>>
>>> stat family system calls have always triggered mount requests for the
>>> negative dentry case in follow_automount() which is intended but prevents
>>> the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.
>>>
>>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>>> negative dentry case in follow_automount() needs to be changed to
>>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>>> required flags are clear).
>>>
>>> AFAICT this change doesn't have any noticable side effects and may,
>>> in some use cases (although I didn't see it in testing) prevent
>>> unnecessary callbacks to the automount daemon.
>>
>> Actually, this patch does have a noticeable side effect.
> 
> That's right.
> 
>>
>> Previously, if /home were an indirect mount point so that /home/neilb
>> would mount my home directory, "ls -l /home/neilb" would always work.
>> Now it doesn't.
> 
> An this is correct too.
> 
> The previous policy was that a ->lookup() would always cause a mount to
> occur. It's that policy that prevents the AT_NO_AUTOMOUNT flag from being
> able to work consistently.
> 
> If you set the indirect mount "browse" option that will cause the mount
> point directories for each of the map keys to be pre-created. So a
> directory listing will show the mount points and an attempt to access
> anything within the mount point directory will cause it to be mounted.
> 
> There is still the problem of not knowing map keys when the wildcard
> map entry is used.
> 
> Still, stat family systems calls have always had similar problems that
> prevent consistent behavior.
> 
>>
>> This happens because "ls" calls 'lstat' on the path and when that fails
>> with "ENOENT", it doesn't bother trying to open.
> 
> I know, I believe the ENOENT is appropriate because there is no mount
> and no directory at the time this happens.
> 
>>
>> An alternate approach to the problem that this patch addresses is to
>> *not* change follow_automount() but instead change the automount daemon
>> and possibly autofs.
> 
> Perhaps, but the daemon probably doesn't have enough information to
> make decisions about this so there would need to be something coming
> from the kernel too.
> 
>>
>> When a notification of access for an indirect mount point is received,
>> it would firstly just create the directory - not triggering a mount.
>> If another notification is then received (after the directory has been
>> created), it then triggers the mount.
> 
> Not sure, perhaps.
> 
> But I don't think that will work, I've had many problems with this type
> behavior due to bugs and I think the the same sort of problems would be
> encountered.
> 
> The indirect mount "browse" option which behaves very much like what your
> suggesting is the internal program default, and has been since the initial
> v5 release. Historically it is disabled on install to maintain backward
> compatibility with the original Linux autofs (which is also the reason for
> always triggering an automount on ->lookup()).
> 
> Perhaps now is the time for that to change. 
> 
>>
>> I suspect this might need changes to autofs to avoid races when two
>> threads call lstat() on the same path at the same time.  We would need
>> to ensure that automount didn't see this as two requests though
>> maybe it already has enough locking.
>>
>> Does that seem reasonable?  Should we revert this patch (as a
>> regression) and do something in automount instead?

Hey Neil, I'm looking at this again because RH QE have complained about
a regression test failing with a kernel that has this change.

Maybe I'm just dumb but I though a "find  "
would, well, just look at the contents below  but an
strace shows that it reads and calls fstatat() on "every entry in the
mount table" regardless of the path.

And with the move of userspace to use /proc based mount tables (one
example being the symlink of /etc/mtab into /proc) even modest sized
direct mount maps will be a problem with every entry getting mounted.

Systems will cope with this fine but larger systems 

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-21 Thread Ian Kent
On 21/11/17 09:53, NeilBrown wrote:
> On Wed, May 10 2017, Ian Kent wrote:
> 
>> The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
>> which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
>> of an automount by the call. But this flag is unconditionally cleared
>> for all stat family system calls except statx().
>>
>> stat family system calls have always triggered mount requests for the
>> negative dentry case in follow_automount() which is intended but prevents
>> the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.
>>
>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>> negative dentry case in follow_automount() needs to be changed to
>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>> required flags are clear).
>>
>> AFAICT this change doesn't have any noticable side effects and may,
>> in some use cases (although I didn't see it in testing) prevent
>> unnecessary callbacks to the automount daemon.
> 
> Actually, this patch does have a noticeable side effect.

That's right.

> 
> Previously, if /home were an indirect mount point so that /home/neilb
> would mount my home directory, "ls -l /home/neilb" would always work.
> Now it doesn't.

An this is correct too.

The previous policy was that a ->lookup() would always cause a mount to
occur. It's that policy that prevents the AT_NO_AUTOMOUNT flag from being
able to work consistently.

If you set the indirect mount "browse" option that will cause the mount
point directories for each of the map keys to be pre-created. So a
directory listing will show the mount points and an attempt to access
anything within the mount point directory will cause it to be mounted.

There is still the problem of not knowing map keys when the wildcard
map entry is used.

Still, stat family systems calls have always had similar problems that
prevent consistent behavior.

> 
> This happens because "ls" calls 'lstat' on the path and when that fails
> with "ENOENT", it doesn't bother trying to open.

I know, I believe the ENOENT is appropriate because there is no mount
and no directory at the time this happens.

> 
> An alternate approach to the problem that this patch addresses is to
> *not* change follow_automount() but instead change the automount daemon
> and possibly autofs.

Perhaps, but the daemon probably doesn't have enough information to
make decisions about this so there would need to be something coming
from the kernel too.

> 
> When a notification of access for an indirect mount point is received,
> it would firstly just create the directory - not triggering a mount.
> If another notification is then received (after the directory has been
> created), it then triggers the mount.

Not sure, perhaps.

But I don't think that will work, I've had many problems with this type
behavior due to bugs and I think the the same sort of problems would be
encountered.

The indirect mount "browse" option which behaves very much like what your
suggesting is the internal program default, and has been since the initial
v5 release. Historically it is disabled on install to maintain backward
compatibility with the original Linux autofs (which is also the reason for
always triggering an automount on ->lookup()).

Perhaps now is the time for that to change. 

> 
> I suspect this might need changes to autofs to avoid races when two
> threads call lstat() on the same path at the same time.  We would need
> to ensure that automount didn't see this as two requests though
> maybe it already has enough locking.
> 
> Does that seem reasonable?  Should we revert this patch (as a
> regression) and do something in automount instead?

Can you check out the "browse" option behavior before we talk further
about this please.

The problem in user space now is that there is no way to control
whether an indirect mount that uses the "nobrowse" option is triggered
or not (using fstatat() or statx()) regardless of the flags used and it's
essential the AT_NO_AUTOMOUNT flag work as expected to have user space
take more care in not triggering automounts.

> 
> Thanks,
> NeilBrown
> 
> 
>>
>> It's also possible that a stat family call has been made with a
>> path that is in the process of being mounted by some other process.
>> But stat family calls should return the automount state of the path
>> as it is "now" so it shouldn't wait for mount completion.
>>
>> This is the same semantic as the positive dentry case already
>> handled.
>>
>> Signed-off-by: Ian Kent <ra...@themaw.net>
>> Cc: David Howells <dhowe...@redhat.com>
>> Cc: Colin Walters <walt...

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-21 Thread Ian Kent
On 21/11/17 09:53, NeilBrown wrote:
> On Wed, May 10 2017, Ian Kent wrote:
> 
>> The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
>> which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
>> of an automount by the call. But this flag is unconditionally cleared
>> for all stat family system calls except statx().
>>
>> stat family system calls have always triggered mount requests for the
>> negative dentry case in follow_automount() which is intended but prevents
>> the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.
>>
>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>> negative dentry case in follow_automount() needs to be changed to
>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>> required flags are clear).
>>
>> AFAICT this change doesn't have any noticable side effects and may,
>> in some use cases (although I didn't see it in testing) prevent
>> unnecessary callbacks to the automount daemon.
> 
> Actually, this patch does have a noticeable side effect.

That's right.

> 
> Previously, if /home were an indirect mount point so that /home/neilb
> would mount my home directory, "ls -l /home/neilb" would always work.
> Now it doesn't.

An this is correct too.

The previous policy was that a ->lookup() would always cause a mount to
occur. It's that policy that prevents the AT_NO_AUTOMOUNT flag from being
able to work consistently.

If you set the indirect mount "browse" option that will cause the mount
point directories for each of the map keys to be pre-created. So a
directory listing will show the mount points and an attempt to access
anything within the mount point directory will cause it to be mounted.

There is still the problem of not knowing map keys when the wildcard
map entry is used.

Still, stat family systems calls have always had similar problems that
prevent consistent behavior.

> 
> This happens because "ls" calls 'lstat' on the path and when that fails
> with "ENOENT", it doesn't bother trying to open.

I know, I believe the ENOENT is appropriate because there is no mount
and no directory at the time this happens.

> 
> An alternate approach to the problem that this patch addresses is to
> *not* change follow_automount() but instead change the automount daemon
> and possibly autofs.

Perhaps, but the daemon probably doesn't have enough information to
make decisions about this so there would need to be something coming
from the kernel too.

> 
> When a notification of access for an indirect mount point is received,
> it would firstly just create the directory - not triggering a mount.
> If another notification is then received (after the directory has been
> created), it then triggers the mount.

Not sure, perhaps.

But I don't think that will work, I've had many problems with this type
behavior due to bugs and I think the the same sort of problems would be
encountered.

The indirect mount "browse" option which behaves very much like what your
suggesting is the internal program default, and has been since the initial
v5 release. Historically it is disabled on install to maintain backward
compatibility with the original Linux autofs (which is also the reason for
always triggering an automount on ->lookup()).

Perhaps now is the time for that to change. 

> 
> I suspect this might need changes to autofs to avoid races when two
> threads call lstat() on the same path at the same time.  We would need
> to ensure that automount didn't see this as two requests though
> maybe it already has enough locking.
> 
> Does that seem reasonable?  Should we revert this patch (as a
> regression) and do something in automount instead?

Can you check out the "browse" option behavior before we talk further
about this please.

The problem in user space now is that there is no way to control
whether an indirect mount that uses the "nobrowse" option is triggered
or not (using fstatat() or statx()) regardless of the flags used and it's
essential the AT_NO_AUTOMOUNT flag work as expected to have user space
take more care in not triggering automounts.

> 
> Thanks,
> NeilBrown
> 
> 
>>
>> It's also possible that a stat family call has been made with a
>> path that is in the process of being mounted by some other process.
>> But stat family calls should return the automount state of the path
>> as it is "now" so it shouldn't wait for mount completion.
>>
>> This is the same semantic as the positive dentry case already
>> handled.
>>
>> Signed-off-by: Ian Kent 
>> Cc: David Howells 
>> Cc: Colin Walters 
>> Cc: Ondrej Holy 
>> Cc: sta...

Re: [PATCH] autofs: don't fail mount for transient error

2017-11-03 Thread Ian Kent
On 03/11/17 09:40, NeilBrown wrote:
> 

Hi Neil, and thanks taking the time to post the patch.

> Currently if the autofs kernel module gets an error when
> writing to the pipe which links to the daemon, then it
> marks the whole moutpoint as catatonic, and it will stop working.
> 
> It is possible that the error is transient.  This can happen
> if the daemon is slow and more than 16 requests queue up.
> If a subsequent process tries to queue a request, and is then signalled,
> the write to the pipe will return -ERESTARTSYS and autofs
> will take that as total failure.

Indeed it does.

And given the problems with a half dozen (or so) user space
applications consuming large amounts of CPU under heavy mount
and umount activity this could happen more easily than we
expect.

> 
> So change the code to assess -ERESTARTSYS and -ENOMEM as transient
> failures which only abort the current request, not the whole
> mountpoint.

This looks good to me.

> 
> Signed-off-by: NeilBrown 
> ---
> 
> Do people think this should got to -stable ??
> It isn't a crash or a data corruption, but having autofs mountpoints
> suddenly stop working is rather inconvenient.

Perhaps that's a good idea given the CPU usage problem I refer
to above has been around for a while now.

> 
> Thanks,
> NeilBrown
> 
> 
>  fs/autofs4/waitq.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
> index 4ac49d038bf3..8fc41705c7cd 100644
> --- a/fs/autofs4/waitq.c
> +++ b/fs/autofs4/waitq.c
> @@ -81,7 +81,8 @@ static int autofs4_write(struct autofs_sb_info *sbi,
>   spin_unlock_irqrestore(>sighand->siglock, flags);
>   }
>  
> - return (bytes > 0);
> + /* if 'wr' returned 0 (impossible) we assume -EIO (safe) */
> + return bytes == 0 ? 0 : wr < 0 ? wr : -EIO;
>  }
>  
>  static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
> @@ -95,6 +96,7 @@ static void autofs4_notify_daemon(struct autofs_sb_info 
> *sbi,
>   } pkt;
>   struct file *pipe = NULL;
>   size_t pktsz;
> + int ret;
>  
>   pr_debug("wait id = 0x%08lx, name = %.*s, type=%d\n",
>(unsigned long) wq->wait_queue_token,
> @@ -169,7 +171,18 @@ static void autofs4_notify_daemon(struct autofs_sb_info 
> *sbi,
>   mutex_unlock(>wq_mutex);
>  
>   if (autofs4_write(sbi, pipe, , pktsz))
> + switch (ret = autofs4_write(sbi, pipe, , pktsz)) {
> + case 0:
> + break;
> + case -ENOMEM:
> + case -ERESTARTSYS:
> + /* Just fail this one */
> + autofs4_wait_release(sbi, wq->wait_queue_token, ret);
> + break;
> + default:
>   autofs4_catatonic_mode(sbi);
> + break;
> + }
>   fput(pipe);
>  }
>  
> 



Re: [PATCH] autofs: don't fail mount for transient error

2017-11-03 Thread Ian Kent
On 03/11/17 09:40, NeilBrown wrote:
> 

Hi Neil, and thanks taking the time to post the patch.

> Currently if the autofs kernel module gets an error when
> writing to the pipe which links to the daemon, then it
> marks the whole moutpoint as catatonic, and it will stop working.
> 
> It is possible that the error is transient.  This can happen
> if the daemon is slow and more than 16 requests queue up.
> If a subsequent process tries to queue a request, and is then signalled,
> the write to the pipe will return -ERESTARTSYS and autofs
> will take that as total failure.

Indeed it does.

And given the problems with a half dozen (or so) user space
applications consuming large amounts of CPU under heavy mount
and umount activity this could happen more easily than we
expect.

> 
> So change the code to assess -ERESTARTSYS and -ENOMEM as transient
> failures which only abort the current request, not the whole
> mountpoint.

This looks good to me.

> 
> Signed-off-by: NeilBrown 
> ---
> 
> Do people think this should got to -stable ??
> It isn't a crash or a data corruption, but having autofs mountpoints
> suddenly stop working is rather inconvenient.

Perhaps that's a good idea given the CPU usage problem I refer
to above has been around for a while now.

> 
> Thanks,
> NeilBrown
> 
> 
>  fs/autofs4/waitq.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
> index 4ac49d038bf3..8fc41705c7cd 100644
> --- a/fs/autofs4/waitq.c
> +++ b/fs/autofs4/waitq.c
> @@ -81,7 +81,8 @@ static int autofs4_write(struct autofs_sb_info *sbi,
>   spin_unlock_irqrestore(>sighand->siglock, flags);
>   }
>  
> - return (bytes > 0);
> + /* if 'wr' returned 0 (impossible) we assume -EIO (safe) */
> + return bytes == 0 ? 0 : wr < 0 ? wr : -EIO;
>  }
>  
>  static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
> @@ -95,6 +96,7 @@ static void autofs4_notify_daemon(struct autofs_sb_info 
> *sbi,
>   } pkt;
>   struct file *pipe = NULL;
>   size_t pktsz;
> + int ret;
>  
>   pr_debug("wait id = 0x%08lx, name = %.*s, type=%d\n",
>(unsigned long) wq->wait_queue_token,
> @@ -169,7 +171,18 @@ static void autofs4_notify_daemon(struct autofs_sb_info 
> *sbi,
>   mutex_unlock(>wq_mutex);
>  
>   if (autofs4_write(sbi, pipe, , pktsz))
> + switch (ret = autofs4_write(sbi, pipe, , pktsz)) {
> + case 0:
> + break;
> + case -ENOMEM:
> + case -ERESTARTSYS:
> + /* Just fail this one */
> + autofs4_wait_release(sbi, wq->wait_queue_token, ret);
> + break;
> + default:
>   autofs4_catatonic_mode(sbi);
> + break;
> + }
>   fput(pipe);
>  }
>  
> 



Re: [RFC PATCH 1/2] autofs: set compat flag on sbi when daemon uses 32bit addressation

2017-09-14 Thread Ian Kent
On 14/09/17 19:39, Stanislav Kinsburskiy wrote:
> 
> 
> 14.09.2017 13:29, Ian Kent пишет:
>> On 14/09/17 17:24, Stanislav Kinsburskiy wrote:
>>>
>>>
>>> 14.09.2017 02:38, Ian Kent пишет:
>>>> On 01/09/17 19:21, Stanislav Kinsburskiy wrote:
>>>>> Signed-off-by: Stanislav Kinsburskiy <skinsbur...@virtuozzo.com>
>>>>> ---
>>>>>  fs/autofs4/autofs_i.h  |3 +++
>>>>>  fs/autofs4/dev-ioctl.c |3 +++
>>>>>  fs/autofs4/inode.c |4 +++-
>>>>>  3 files changed, 9 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>>>> index 4737615..3da105f 100644
>>>>> --- a/fs/autofs4/autofs_i.h
>>>>> +++ b/fs/autofs4/autofs_i.h
>>>>> @@ -120,6 +120,9 @@ struct autofs_sb_info {
>>>>>   struct list_head active_list;
>>>>>   struct list_head expiring_list;
>>>>>   struct rcu_head rcu;
>>>>> +#ifdef CONFIG_COMPAT
>>>>> + unsigned is32bit:1;
>>>>> +#endif
>>>>>  };
>>>>>  
>>>>>  static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
>>>>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>>>>> index b7c816f..467d6c4 100644
>>>>> --- a/fs/autofs4/dev-ioctl.c
>>>>> +++ b/fs/autofs4/dev-ioctl.c
>>>>> @@ -397,6 +397,9 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp,
>>>>>   sbi->pipefd = pipefd;
>>>>>   sbi->pipe = pipe;
>>>>>   sbi->catatonic = 0;
>>>>> +#ifdef CONFIG_COMPAT
>>>>> + sbi->is32bit = is_compat_task();
>>>>> +#endif
>>>>>   }
>>>>>  out:
>>>>>   put_pid(new_pid);
>>>>> diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
>>>>> index 09e7d68..21d3c0b 100644
>>>>> --- a/fs/autofs4/inode.c
>>>>> +++ b/fs/autofs4/inode.c
>>>>> @@ -301,7 +301,9 @@ int autofs4_fill_super(struct super_block *s, void 
>>>>> *data, int silent)
>>>>>   } else {
>>>>>   sbi->oz_pgrp = get_task_pid(current, PIDTYPE_PGID);
>>>>>   }
>>>>> -
>>>>> +#ifdef CONFIG_COMPAT
>>>>> + sbi->is32bit = is_compat_task();
>>>>> +#endif
>>>>>   if (autofs_type_trigger(sbi->type))
>>>>>   __managed_dentry_set_managed(root);
>>>>>  
>>>>>
>>>>
>>>> Not sure about this.
>>>>
>>>> Don't you think it would be better to avoid the in code #ifdefs by doing 
>>>> some
>>>> checks and defines in the header file and defining what's need to just use
>>>> is_compat_task().
>>>>
>>>
>>> Yes, might be...
>>>
>>>> Not sure 2 patches are needed for this either ..
>>>>
>>>
>>> Well, I found this issue occasionally.
>>
>> I'm wondering what the symptoms are?
>>
> 
> Size of struct autofs_v5_packet is 300 bytes for x86 and 304 bytes for x86_64.
> Which means, that 32bit task can read more than size of autofs_v5_packet on 
> 64bit kernel.

Are you sure?

Shouldn't that be a short read on the x86 side of a 4 bytes longer
structure on the x86_64 side.

I didn't think you could have a 64 bit client on a 32 bit kernel
so the converse (the read past end of struct) doesn't apply.

Ian


Re: [RFC PATCH 1/2] autofs: set compat flag on sbi when daemon uses 32bit addressation

2017-09-14 Thread Ian Kent
On 14/09/17 19:39, Stanislav Kinsburskiy wrote:
> 
> 
> 14.09.2017 13:29, Ian Kent пишет:
>> On 14/09/17 17:24, Stanislav Kinsburskiy wrote:
>>>
>>>
>>> 14.09.2017 02:38, Ian Kent пишет:
>>>> On 01/09/17 19:21, Stanislav Kinsburskiy wrote:
>>>>> Signed-off-by: Stanislav Kinsburskiy 
>>>>> ---
>>>>>  fs/autofs4/autofs_i.h  |3 +++
>>>>>  fs/autofs4/dev-ioctl.c |3 +++
>>>>>  fs/autofs4/inode.c |4 +++-
>>>>>  3 files changed, 9 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>>>> index 4737615..3da105f 100644
>>>>> --- a/fs/autofs4/autofs_i.h
>>>>> +++ b/fs/autofs4/autofs_i.h
>>>>> @@ -120,6 +120,9 @@ struct autofs_sb_info {
>>>>>   struct list_head active_list;
>>>>>   struct list_head expiring_list;
>>>>>   struct rcu_head rcu;
>>>>> +#ifdef CONFIG_COMPAT
>>>>> + unsigned is32bit:1;
>>>>> +#endif
>>>>>  };
>>>>>  
>>>>>  static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
>>>>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>>>>> index b7c816f..467d6c4 100644
>>>>> --- a/fs/autofs4/dev-ioctl.c
>>>>> +++ b/fs/autofs4/dev-ioctl.c
>>>>> @@ -397,6 +397,9 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp,
>>>>>   sbi->pipefd = pipefd;
>>>>>   sbi->pipe = pipe;
>>>>>   sbi->catatonic = 0;
>>>>> +#ifdef CONFIG_COMPAT
>>>>> + sbi->is32bit = is_compat_task();
>>>>> +#endif
>>>>>   }
>>>>>  out:
>>>>>   put_pid(new_pid);
>>>>> diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
>>>>> index 09e7d68..21d3c0b 100644
>>>>> --- a/fs/autofs4/inode.c
>>>>> +++ b/fs/autofs4/inode.c
>>>>> @@ -301,7 +301,9 @@ int autofs4_fill_super(struct super_block *s, void 
>>>>> *data, int silent)
>>>>>   } else {
>>>>>   sbi->oz_pgrp = get_task_pid(current, PIDTYPE_PGID);
>>>>>   }
>>>>> -
>>>>> +#ifdef CONFIG_COMPAT
>>>>> + sbi->is32bit = is_compat_task();
>>>>> +#endif
>>>>>   if (autofs_type_trigger(sbi->type))
>>>>>   __managed_dentry_set_managed(root);
>>>>>  
>>>>>
>>>>
>>>> Not sure about this.
>>>>
>>>> Don't you think it would be better to avoid the in code #ifdefs by doing 
>>>> some
>>>> checks and defines in the header file and defining what's need to just use
>>>> is_compat_task().
>>>>
>>>
>>> Yes, might be...
>>>
>>>> Not sure 2 patches are needed for this either ..
>>>>
>>>
>>> Well, I found this issue occasionally.
>>
>> I'm wondering what the symptoms are?
>>
> 
> Size of struct autofs_v5_packet is 300 bytes for x86 and 304 bytes for x86_64.
> Which means, that 32bit task can read more than size of autofs_v5_packet on 
> 64bit kernel.

Are you sure?

Shouldn't that be a short read on the x86 side of a 4 bytes longer
structure on the x86_64 side.

I didn't think you could have a 64 bit client on a 32 bit kernel
so the converse (the read past end of struct) doesn't apply.

Ian


Re: [RFC PATCH 1/2] autofs: set compat flag on sbi when daemon uses 32bit addressation

2017-09-14 Thread Ian Kent
On 14/09/17 17:24, Stanislav Kinsburskiy wrote:
> 
> 
> 14.09.2017 02:38, Ian Kent пишет:
>> On 01/09/17 19:21, Stanislav Kinsburskiy wrote:
>>> Signed-off-by: Stanislav Kinsburskiy <skinsbur...@virtuozzo.com>
>>> ---
>>>  fs/autofs4/autofs_i.h  |3 +++
>>>  fs/autofs4/dev-ioctl.c |3 +++
>>>  fs/autofs4/inode.c |4 +++-
>>>  3 files changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>> index 4737615..3da105f 100644
>>> --- a/fs/autofs4/autofs_i.h
>>> +++ b/fs/autofs4/autofs_i.h
>>> @@ -120,6 +120,9 @@ struct autofs_sb_info {
>>> struct list_head active_list;
>>> struct list_head expiring_list;
>>> struct rcu_head rcu;
>>> +#ifdef CONFIG_COMPAT
>>> +   unsigned is32bit:1;
>>> +#endif
>>>  };
>>>  
>>>  static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
>>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>>> index b7c816f..467d6c4 100644
>>> --- a/fs/autofs4/dev-ioctl.c
>>> +++ b/fs/autofs4/dev-ioctl.c
>>> @@ -397,6 +397,9 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp,
>>> sbi->pipefd = pipefd;
>>> sbi->pipe = pipe;
>>> sbi->catatonic = 0;
>>> +#ifdef CONFIG_COMPAT
>>> +   sbi->is32bit = is_compat_task();
>>> +#endif
>>> }
>>>  out:
>>> put_pid(new_pid);
>>> diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
>>> index 09e7d68..21d3c0b 100644
>>> --- a/fs/autofs4/inode.c
>>> +++ b/fs/autofs4/inode.c
>>> @@ -301,7 +301,9 @@ int autofs4_fill_super(struct super_block *s, void 
>>> *data, int silent)
>>> } else {
>>> sbi->oz_pgrp = get_task_pid(current, PIDTYPE_PGID);
>>> }
>>> -
>>> +#ifdef CONFIG_COMPAT
>>> +   sbi->is32bit = is_compat_task();
>>> +#endif
>>> if (autofs_type_trigger(sbi->type))
>>> __managed_dentry_set_managed(root);
>>>  
>>>
>>
>> Not sure about this.
>>
>> Don't you think it would be better to avoid the in code #ifdefs by doing some
>> checks and defines in the header file and defining what's need to just use
>> is_compat_task().
>>
> 
> Yes, might be...
> 
>> Not sure 2 patches are needed for this either ..
>>
> 
> Well, I found this issue occasionally.

I'm wondering what the symptoms are?

> And, frankly speaking, it's not clear to me, whether this issue is important 
> at all, so I wanted to clarify this first.
> Thanks to O_DIRECT, the only way to catch the issue is to try to read more, 
> than expected, in compat task (that's how I found it).

Right, the O_DIRECT patch from Linus was expected to fix the structure
alignment problem. The stuct field offsets are ok aren't they?

> I don't see any other flaw so far. And if so, that, probably, we shouldn't 
> care about the issue at all.
> What do you think?

If we are seeing hangs, incorrect struct fields or similar something
should be done about it but if all is actually working ok then the
O_DIRECT fix is doing it's job and further changes aren't necessary.

Ian


Re: [RFC PATCH 1/2] autofs: set compat flag on sbi when daemon uses 32bit addressation

2017-09-14 Thread Ian Kent
On 14/09/17 17:24, Stanislav Kinsburskiy wrote:
> 
> 
> 14.09.2017 02:38, Ian Kent пишет:
>> On 01/09/17 19:21, Stanislav Kinsburskiy wrote:
>>> Signed-off-by: Stanislav Kinsburskiy 
>>> ---
>>>  fs/autofs4/autofs_i.h  |3 +++
>>>  fs/autofs4/dev-ioctl.c |3 +++
>>>  fs/autofs4/inode.c |4 +++-
>>>  3 files changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>> index 4737615..3da105f 100644
>>> --- a/fs/autofs4/autofs_i.h
>>> +++ b/fs/autofs4/autofs_i.h
>>> @@ -120,6 +120,9 @@ struct autofs_sb_info {
>>> struct list_head active_list;
>>> struct list_head expiring_list;
>>> struct rcu_head rcu;
>>> +#ifdef CONFIG_COMPAT
>>> +   unsigned is32bit:1;
>>> +#endif
>>>  };
>>>  
>>>  static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
>>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>>> index b7c816f..467d6c4 100644
>>> --- a/fs/autofs4/dev-ioctl.c
>>> +++ b/fs/autofs4/dev-ioctl.c
>>> @@ -397,6 +397,9 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp,
>>> sbi->pipefd = pipefd;
>>> sbi->pipe = pipe;
>>> sbi->catatonic = 0;
>>> +#ifdef CONFIG_COMPAT
>>> +   sbi->is32bit = is_compat_task();
>>> +#endif
>>> }
>>>  out:
>>> put_pid(new_pid);
>>> diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
>>> index 09e7d68..21d3c0b 100644
>>> --- a/fs/autofs4/inode.c
>>> +++ b/fs/autofs4/inode.c
>>> @@ -301,7 +301,9 @@ int autofs4_fill_super(struct super_block *s, void 
>>> *data, int silent)
>>> } else {
>>> sbi->oz_pgrp = get_task_pid(current, PIDTYPE_PGID);
>>> }
>>> -
>>> +#ifdef CONFIG_COMPAT
>>> +   sbi->is32bit = is_compat_task();
>>> +#endif
>>> if (autofs_type_trigger(sbi->type))
>>> __managed_dentry_set_managed(root);
>>>  
>>>
>>
>> Not sure about this.
>>
>> Don't you think it would be better to avoid the in code #ifdefs by doing some
>> checks and defines in the header file and defining what's need to just use
>> is_compat_task().
>>
> 
> Yes, might be...
> 
>> Not sure 2 patches are needed for this either ..
>>
> 
> Well, I found this issue occasionally.

I'm wondering what the symptoms are?

> And, frankly speaking, it's not clear to me, whether this issue is important 
> at all, so I wanted to clarify this first.
> Thanks to O_DIRECT, the only way to catch the issue is to try to read more, 
> than expected, in compat task (that's how I found it).

Right, the O_DIRECT patch from Linus was expected to fix the structure
alignment problem. The stuct field offsets are ok aren't they?

> I don't see any other flaw so far. And if so, that, probably, we shouldn't 
> care about the issue at all.
> What do you think?

If we are seeing hangs, incorrect struct fields or similar something
should be done about it but if all is actually working ok then the
O_DIRECT fix is doing it's job and further changes aren't necessary.

Ian


Re: [RFC PATCH 1/2] autofs: set compat flag on sbi when daemon uses 32bit addressation

2017-09-13 Thread Ian Kent
On 01/09/17 19:21, Stanislav Kinsburskiy wrote:
> Signed-off-by: Stanislav Kinsburskiy 
> ---
>  fs/autofs4/autofs_i.h  |3 +++
>  fs/autofs4/dev-ioctl.c |3 +++
>  fs/autofs4/inode.c |4 +++-
>  3 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
> index 4737615..3da105f 100644
> --- a/fs/autofs4/autofs_i.h
> +++ b/fs/autofs4/autofs_i.h
> @@ -120,6 +120,9 @@ struct autofs_sb_info {
>   struct list_head active_list;
>   struct list_head expiring_list;
>   struct rcu_head rcu;
> +#ifdef CONFIG_COMPAT
> + unsigned is32bit:1;
> +#endif
>  };
>  
>  static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
> index b7c816f..467d6c4 100644
> --- a/fs/autofs4/dev-ioctl.c
> +++ b/fs/autofs4/dev-ioctl.c
> @@ -397,6 +397,9 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp,
>   sbi->pipefd = pipefd;
>   sbi->pipe = pipe;
>   sbi->catatonic = 0;
> +#ifdef CONFIG_COMPAT
> + sbi->is32bit = is_compat_task();
> +#endif
>   }
>  out:
>   put_pid(new_pid);
> diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
> index 09e7d68..21d3c0b 100644
> --- a/fs/autofs4/inode.c
> +++ b/fs/autofs4/inode.c
> @@ -301,7 +301,9 @@ int autofs4_fill_super(struct super_block *s, void *data, 
> int silent)
>   } else {
>   sbi->oz_pgrp = get_task_pid(current, PIDTYPE_PGID);
>   }
> -
> +#ifdef CONFIG_COMPAT
> + sbi->is32bit = is_compat_task();
> +#endif
>   if (autofs_type_trigger(sbi->type))
>   __managed_dentry_set_managed(root);
>  
> 

Not sure about this.

Don't you think it would be better to avoid the in code #ifdefs by doing some
checks and defines in the header file and defining what's need to just use
is_compat_task().

Not sure 2 patches are needed for this either ..

Ian


Re: [RFC PATCH 1/2] autofs: set compat flag on sbi when daemon uses 32bit addressation

2017-09-13 Thread Ian Kent
On 01/09/17 19:21, Stanislav Kinsburskiy wrote:
> Signed-off-by: Stanislav Kinsburskiy 
> ---
>  fs/autofs4/autofs_i.h  |3 +++
>  fs/autofs4/dev-ioctl.c |3 +++
>  fs/autofs4/inode.c |4 +++-
>  3 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
> index 4737615..3da105f 100644
> --- a/fs/autofs4/autofs_i.h
> +++ b/fs/autofs4/autofs_i.h
> @@ -120,6 +120,9 @@ struct autofs_sb_info {
>   struct list_head active_list;
>   struct list_head expiring_list;
>   struct rcu_head rcu;
> +#ifdef CONFIG_COMPAT
> + unsigned is32bit:1;
> +#endif
>  };
>  
>  static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
> index b7c816f..467d6c4 100644
> --- a/fs/autofs4/dev-ioctl.c
> +++ b/fs/autofs4/dev-ioctl.c
> @@ -397,6 +397,9 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp,
>   sbi->pipefd = pipefd;
>   sbi->pipe = pipe;
>   sbi->catatonic = 0;
> +#ifdef CONFIG_COMPAT
> + sbi->is32bit = is_compat_task();
> +#endif
>   }
>  out:
>   put_pid(new_pid);
> diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
> index 09e7d68..21d3c0b 100644
> --- a/fs/autofs4/inode.c
> +++ b/fs/autofs4/inode.c
> @@ -301,7 +301,9 @@ int autofs4_fill_super(struct super_block *s, void *data, 
> int silent)
>   } else {
>   sbi->oz_pgrp = get_task_pid(current, PIDTYPE_PGID);
>   }
> -
> +#ifdef CONFIG_COMPAT
> + sbi->is32bit = is_compat_task();
> +#endif
>   if (autofs_type_trigger(sbi->type))
>   __managed_dentry_set_managed(root);
>  
> 

Not sure about this.

Don't you think it would be better to avoid the in code #ifdefs by doing some
checks and defines in the header file and defining what's need to just use
is_compat_task().

Not sure 2 patches are needed for this either ..

Ian


Re: [RFC PATCH 0/2] autofs: fix autofs_v5_packet dlivery in compat mode

2017-09-13 Thread Ian Kent
On 01/09/17 19:21, Stanislav Kinsburskiy wrote:
> The problem is that in compat mode struct autofs_v5_packet has to have 
> different size
> (i.e. 4 bytes less).

I regret (several times over) my original decision to not make v5 packets
packed 

I have to say the description of the problem is not all that good.

> 
> This is RFC because:
> 1) This issue is hidden, because autofs pipe has O_DIRECT and the rest of the
> epacket is truncated when read.
> 2) X86 arch doesn't have is_compat_task() helper
> 3) It's now clear, what to do if "pgrp" option is specified.

I don't understand what the "pgrp" option has to do with this?

> 
> The following series implements...
> 
> ---
> 
> Stanislav Kinsburskiy (2):
>   autofs: set compat flag on sbi when daemon uses 32bit addressation
>   autofs: sent 32-bit sized packet for 32-bit process
> 
> 
>  fs/autofs4/autofs_i.h  |3 +++
>  fs/autofs4/dev-ioctl.c |3 +++
>  fs/autofs4/inode.c |4 +++-
>  fs/autofs4/waitq.c |5 +
>  4 files changed, 14 insertions(+), 1 deletion(-)
> 



Re: [RFC PATCH 0/2] autofs: fix autofs_v5_packet dlivery in compat mode

2017-09-13 Thread Ian Kent
On 01/09/17 19:21, Stanislav Kinsburskiy wrote:
> The problem is that in compat mode struct autofs_v5_packet has to have 
> different size
> (i.e. 4 bytes less).

I regret (several times over) my original decision to not make v5 packets
packed 

I have to say the description of the problem is not all that good.

> 
> This is RFC because:
> 1) This issue is hidden, because autofs pipe has O_DIRECT and the rest of the
> epacket is truncated when read.
> 2) X86 arch doesn't have is_compat_task() helper
> 3) It's now clear, what to do if "pgrp" option is specified.

I don't understand what the "pgrp" option has to do with this?

> 
> The following series implements...
> 
> ---
> 
> Stanislav Kinsburskiy (2):
>   autofs: set compat flag on sbi when daemon uses 32bit addressation
>   autofs: sent 32-bit sized packet for 32-bit process
> 
> 
>  fs/autofs4/autofs_i.h  |3 +++
>  fs/autofs4/dev-ioctl.c |3 +++
>  fs/autofs4/inode.c |4 +++-
>  fs/autofs4/waitq.c |5 +
>  4 files changed, 14 insertions(+), 1 deletion(-)
> 



Re: Do we really need d_weak_revalidate???

2017-08-24 Thread Ian Kent
On 24/08/17 19:03, Michael Kerrisk (man-pages) wrote:
> Hi Neil,
> 
> On 24 August 2017 at 06:07, NeilBrown <ne...@suse.com> wrote:
>> On Wed, Aug 23 2017, Ian Kent wrote:
>>
>>>
>>> That inconsistency has bothered me for quite a while now.
>>>
>>> It was carried over from the autofs module behavior when automounting
>>> support was added to the VFS. What's worse is it prevents the use of
>>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>>> statx().
>>>
>>> There is some risk in changing that so it does work but it really does
>>> need to work to enable userspace to not trigger an automount by using
>>> this flag.
>>>
>>> So that's (hopefully) going to change soonish, see:
>>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>>
>>> The result should be that stat family calls don't trigger automounts except
>>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>>
>>
>> oooh, yes.  That's much better - thanks.
>>
>> We should make sure that change gets into the man pages...
>>
>> First however, we should probably correct the man page!
>> stat.2 says:
>>
>>
>>   NOTES
>>On Linux, lstat() will generally not trigger  automounter
>>action,  whereas  stat()  will  (but  see  the description of
>>fstatat() AT_NO_AUTOMOUNT fag, above).
>>
>> which is wrong: lstat and stat treat automounts the same.
>> @Michael: do you recall why you inserted that text?  The commit message
>> in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp
>> discussion in NOTES") is not very helpful.
> 
> That commit really was just cosmetic changes. The change that
> introduced the text was 82d2be3d9d66b7, based on a note from Peter
> Anvin:

Indeed, that was correct for autofs v3 but we're at autofs v5 now and
a lot has changed over time (the commit is from 2008).

All I can do is apologize for not also checking the man pages and trying
to keep them up to date.

Let's just work on making them accurate now.

> 
> [[
> > > Additionally, you may want to make a note in the stat/lstat man page 
> tha
> t on
> > > Linux, lstat(2) will generally not trigger automounter action, whereas
> > > stat(2) will.
> >
> > I don't understand this last piece.  Can you say some more.  (I'm not
> > familiar with automounter details.)
> 
> An automounter (either an explicit one, like autofs, or an implicit
> one, such as are used by AFS or NFSv4) is something that triggers
> a mount when something is touched.
> 
> However, it's undesirable to automount, say, everyone's home
> directory just because someone opened up /home in their GUI
> browser or typed "ls -l /home".  The early automounters simply
> didn't list the contents until you accessed it by name;
> this is still the case when you can't enumerate a mapping
> (say, all DNS names under /net).  However, this is extremely
> inconvenient, too.
> 
> The solution we ended up settling on is to create something
> that looks like a directory (i.e. reports S_IFDIR in stat()),
> but behaves somewhat like a symlink.  In particular, when it is
> accessed in a way where a symlink would be dereferenced,
> the automount triggers and the directory is mounted.  However,
> system calls which do *not* cause a symlink to be dereferenced,
> like lstat(), also do not cause the automounter to trigger.
> This means that "ls -l", or a GUI file browser, can see a list
> of directories without causing each one of them to be automounted.
> 
> -hpa
> ]]
> 
> Cheers,
> 
> Michael
> 
>> I propose correcting to
>>
>>   NOTES:
>>   On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set
>>   and will not trigger automounter action for direct automount
>>   points, though they may (prior to 4.14) for indirect automount
>>   points.
>>
>>
>> The more precise details, that automount action for indirect automount
>> points is not triggered when the 'browse' option is used, is probably
>> not necessary.
>>
>> Ian: if you agree with that text, and Michael doesn't provide alternate
>> evidence, I'll submit a formal patch for the man page or should we
>> just wait until the patch actually lands?
>>
>> Thanks,
>> NeilBrown
>>
> 
> 
> 



Re: Do we really need d_weak_revalidate???

2017-08-24 Thread Ian Kent
On 24/08/17 19:03, Michael Kerrisk (man-pages) wrote:
> Hi Neil,
> 
> On 24 August 2017 at 06:07, NeilBrown  wrote:
>> On Wed, Aug 23 2017, Ian Kent wrote:
>>
>>>
>>> That inconsistency has bothered me for quite a while now.
>>>
>>> It was carried over from the autofs module behavior when automounting
>>> support was added to the VFS. What's worse is it prevents the use of
>>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>>> statx().
>>>
>>> There is some risk in changing that so it does work but it really does
>>> need to work to enable userspace to not trigger an automount by using
>>> this flag.
>>>
>>> So that's (hopefully) going to change soonish, see:
>>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>>
>>> The result should be that stat family calls don't trigger automounts except
>>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>>
>>
>> oooh, yes.  That's much better - thanks.
>>
>> We should make sure that change gets into the man pages...
>>
>> First however, we should probably correct the man page!
>> stat.2 says:
>>
>>
>>   NOTES
>>On Linux, lstat() will generally not trigger  automounter
>>action,  whereas  stat()  will  (but  see  the description of
>>fstatat() AT_NO_AUTOMOUNT fag, above).
>>
>> which is wrong: lstat and stat treat automounts the same.
>> @Michael: do you recall why you inserted that text?  The commit message
>> in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp
>> discussion in NOTES") is not very helpful.
> 
> That commit really was just cosmetic changes. The change that
> introduced the text was 82d2be3d9d66b7, based on a note from Peter
> Anvin:

Indeed, that was correct for autofs v3 but we're at autofs v5 now and
a lot has changed over time (the commit is from 2008).

All I can do is apologize for not also checking the man pages and trying
to keep them up to date.

Let's just work on making them accurate now.

> 
> [[
> > > Additionally, you may want to make a note in the stat/lstat man page 
> tha
> t on
> > > Linux, lstat(2) will generally not trigger automounter action, whereas
> > > stat(2) will.
> >
> > I don't understand this last piece.  Can you say some more.  (I'm not
> > familiar with automounter details.)
> 
> An automounter (either an explicit one, like autofs, or an implicit
> one, such as are used by AFS or NFSv4) is something that triggers
> a mount when something is touched.
> 
> However, it's undesirable to automount, say, everyone's home
> directory just because someone opened up /home in their GUI
> browser or typed "ls -l /home".  The early automounters simply
> didn't list the contents until you accessed it by name;
> this is still the case when you can't enumerate a mapping
> (say, all DNS names under /net).  However, this is extremely
> inconvenient, too.
> 
> The solution we ended up settling on is to create something
> that looks like a directory (i.e. reports S_IFDIR in stat()),
> but behaves somewhat like a symlink.  In particular, when it is
> accessed in a way where a symlink would be dereferenced,
> the automount triggers and the directory is mounted.  However,
> system calls which do *not* cause a symlink to be dereferenced,
> like lstat(), also do not cause the automounter to trigger.
> This means that "ls -l", or a GUI file browser, can see a list
> of directories without causing each one of them to be automounted.
> 
> -hpa
> ]]
> 
> Cheers,
> 
> Michael
> 
>> I propose correcting to
>>
>>   NOTES:
>>   On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set
>>   and will not trigger automounter action for direct automount
>>   points, though they may (prior to 4.14) for indirect automount
>>   points.
>>
>>
>> The more precise details, that automount action for indirect automount
>> points is not triggered when the 'browse' option is used, is probably
>> not necessary.
>>
>> Ian: if you agree with that text, and Michael doesn't provide alternate
>> evidence, I'll submit a formal patch for the man page or should we
>> just wait until the patch actually lands?
>>
>> Thanks,
>> NeilBrown
>>
> 
> 
> 



Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent
On 24/08/17 12:07, NeilBrown wrote:
> 
> 
> The more precise details, that automount action for indirect automount
> points is not triggered when the 'browse' option is used, is probably
> not necessary.
> 
> Ian: if you agree with that text, and Michael doesn't provide alternate
> evidence, I'll submit a formal patch for the man page or should we
> just wait until the patch actually lands?

So far only David commented about using ENOENT rather than EREMOTE.

I prefer ENOENT for this case myself and he didn't object when I
explained why, David, any concerns?

Al has been silent so far so either he hasn't seen it or he's ok with
it, Al, any concerns?

And I guess if there are no concerns there's a good chance Andrew is
ok with it for the next merge window, Andrew?

If everyone agrees then we could go ahead immediately so there's a
better chance of getting it into released man pages closer to the
change being merged.

Ian


Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent
On 24/08/17 12:07, NeilBrown wrote:
> 
> 
> The more precise details, that automount action for indirect automount
> points is not triggered when the 'browse' option is used, is probably
> not necessary.
> 
> Ian: if you agree with that text, and Michael doesn't provide alternate
> evidence, I'll submit a formal patch for the man page or should we
> just wait until the patch actually lands?

So far only David commented about using ENOENT rather than EREMOTE.

I prefer ENOENT for this case myself and he didn't object when I
explained why, David, any concerns?

Al has been silent so far so either he hasn't seen it or he's ok with
it, Al, any concerns?

And I guess if there are no concerns there's a good chance Andrew is
ok with it for the next merge window, Andrew?

If everyone agrees then we could go ahead immediately so there's a
better chance of getting it into released man pages closer to the
change being merged.

Ian


Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent
On 24/08/17 12:07, NeilBrown wrote:
> On Wed, Aug 23 2017, Ian Kent wrote:
> 
>>
>> That inconsistency has bothered me for quite a while now.
>>
>> It was carried over from the autofs module behavior when automounting
>> support was added to the VFS. What's worse is it prevents the use of
>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>> statx().
>>
>> There is some risk in changing that so it does work but it really does
>> need to work to enable userspace to not trigger an automount by using
>> this flag.
>>
>> So that's (hopefully) going to change soonish, see:
>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>
>> The result should be that stat family calls don't trigger automounts except
>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>
> 
> oooh, yes.  That's much better - thanks.
> 
> We should make sure that change gets into the man pages...

Yes, I was wondering who to contact for that.

> 
> First however, we should probably correct the man page!
> stat.2 says:
> 
> 
>   NOTES
>On Linux, lstat() will generally not trigger  automounter
>action,  whereas  stat()  will  (but  see  the description of
>fstatat() AT_NO_AUTOMOUNT fag, above).
> 
> which is wrong: lstat and stat treat automounts the same.
> @Michael: do you recall why you inserted that text?  The commit message
> in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp
> discussion in NOTES") is not very helpful.
> 
> I propose correcting to
> 
>   NOTES:
>   On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set
>   and will not trigger automounter action for direct automount
>   points, though they may (prior to 4.14) for indirect automount
>   points.

Shouldn't that be "lstat() and stat() act as though AT_NO_AUTOMOUNT is
set ..."

> 
> 
> The more precise details, that automount action for indirect automount
> points is not triggered when the 'browse' option is used, is probably
> not necessary.
> 
> Ian: if you agree with that text, and Michael doesn't provide alternate
> evidence, I'll submit a formal patch for the man page or should we
> just wait until the patch actually lands?

I thought the fstatat() description needed attention too, doubly so with
the AT_NO_AUTOMOUNT change.

The "The fstatat() system call operates in exactly the same way as stat()"
is wrong in the same way as the stat() description was wrong.

After the change fstatat() will trigger automounts if the AT_NO_AUTOMOUNT
flag is not given which is different from lstat() and stat().

The updated NOTE above probably needs to be referred to in order to clarify
what's meant by "in exactly the same way" since that probably refers to the
information returned rather than whether an automount will be done.

Ian


Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent
On 24/08/17 12:07, NeilBrown wrote:
> On Wed, Aug 23 2017, Ian Kent wrote:
> 
>>
>> That inconsistency has bothered me for quite a while now.
>>
>> It was carried over from the autofs module behavior when automounting
>> support was added to the VFS. What's worse is it prevents the use of
>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>> statx().
>>
>> There is some risk in changing that so it does work but it really does
>> need to work to enable userspace to not trigger an automount by using
>> this flag.
>>
>> So that's (hopefully) going to change soonish, see:
>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>
>> The result should be that stat family calls don't trigger automounts except
>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>
> 
> oooh, yes.  That's much better - thanks.
> 
> We should make sure that change gets into the man pages...

Yes, I was wondering who to contact for that.

> 
> First however, we should probably correct the man page!
> stat.2 says:
> 
> 
>   NOTES
>On Linux, lstat() will generally not trigger  automounter
>action,  whereas  stat()  will  (but  see  the description of
>fstatat() AT_NO_AUTOMOUNT fag, above).
> 
> which is wrong: lstat and stat treat automounts the same.
> @Michael: do you recall why you inserted that text?  The commit message
> in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp
> discussion in NOTES") is not very helpful.
> 
> I propose correcting to
> 
>   NOTES:
>   On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set
>   and will not trigger automounter action for direct automount
>   points, though they may (prior to 4.14) for indirect automount
>   points.

Shouldn't that be "lstat() and stat() act as though AT_NO_AUTOMOUNT is
set ..."

> 
> 
> The more precise details, that automount action for indirect automount
> points is not triggered when the 'browse' option is used, is probably
> not necessary.
> 
> Ian: if you agree with that text, and Michael doesn't provide alternate
> evidence, I'll submit a formal patch for the man page or should we
> just wait until the patch actually lands?

I thought the fstatat() description needed attention too, doubly so with
the AT_NO_AUTOMOUNT change.

The "The fstatat() system call operates in exactly the same way as stat()"
is wrong in the same way as the stat() description was wrong.

After the change fstatat() will trigger automounts if the AT_NO_AUTOMOUNT
flag is not given which is different from lstat() and stat().

The updated NOTE above probably needs to be referred to in order to clarify
what's meant by "in exactly the same way" since that probably refers to the
information returned rather than whether an automount will be done.

Ian


Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent
On 24/08/17 11:21, NeilBrown wrote:
> On Wed, Aug 23 2017, Ian Kent wrote:
> 
>> On 23/08/17 10:32, Ian Kent wrote:
>>> On 23/08/17 09:06, NeilBrown wrote:
>>>> On Mon, Aug 21 2017, Ian Kent wrote:
>>>>
>>>>>>
>>>>>> A mount isn't triggered by kern_path(pathname, 0, ).
>>>>>> That '0' would need to include one of
>>>>>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>>>>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>>>>>
>>>>>> to trigger an automount (otherwise you just get -EISDIR).
>>>>>
>>>>> It's perfectly sensible to think that but there is a case where a
>>>>> a mount is triggered when using kern_path().
>>>>>
>>>>> The EISDIR return occurs for positive dentrys, negative dentrys
>>>>> will still trigger an automount (which is autofs specific,
>>>>> indirect mount map using nobrowse option, the install default).
>>>>
>>>> Ok, I understand this better now.  This difference between direct and
>>>> indirect mounts is slightly awkward. It is visible from user-space, but
>>>> not elegant to document.
>>>> When you use O_PATH to open a direct automount that has not already been
>>>> triggered, the open returns the underlying directory (and fstatfs
>>>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>>>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>>>> in effect, but it won't when "browse" is in effect.
>>>
>>> That inconsistency has bothered me for quite a while now.
>>>
>>> It was carried over from the autofs module behavior when automounting
>>> support was added to the VFS. What's worse is it prevents the use of
>>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>>> statx().
>>>
>>> There is some risk in changing that so it does work but it really does
>>> need to work to enable userspace to not trigger an automount by using
>>> this flag.
>>>
>>> So that's (hopefully) going to change soonish, see:
>>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>>
>>> The result should be that stat family calls don't trigger automounts except
>>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>>
>>>>
>>>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>>>> essentially what I said in
>>>>
>>>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>>>
>>>> It might be possible to modify automount so that it was more consistent
>>>> - i.e. if the point is triggered by a mkdir has been done, just to the
>>>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>>>> guess that might be racy, and in any case is hard to justify.
>>>>
>>>> Maybe I should change it to be about "direct automounts", and add a note
>>>> that indirect automounts aren't so predictable.
>>>
>>> Right and the semantics should be much more consistent in the near future.
>>> I hope (and expect) this semantic change won't cause problems.
>>>
>>>>
>>>> But back to my original issue of wanting to discard
>>>> kern_path_mountpoint, what would you think of the following approach -
>>>> slight revised from before.
>>>>
>>>> Thanks,
>>>> NeilBrown
>>>>
>>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>>> index beef981aa54f..7663ea82e68d 100644
>>>> --- a/fs/autofs4/autofs_i.h
>>>> +++ b/fs/autofs4/autofs_i.h
>>>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>>>> *autofs4_dentry_ino(struct dentry *dentry)
>>>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>>>   * processes which do manipulations for us in user space sees the raw
>>>>   * filesystem without "magic".)
>>>> + * A process performing certain ioctls can get temporary oz status.
>>>>   */
>>>> +extern struct task_struct *autofs_tmp_oz;
>>>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>>>  {
>>>> -  return sbi->catatonic || task_pgrp(

Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent
On 24/08/17 11:21, NeilBrown wrote:
> On Wed, Aug 23 2017, Ian Kent wrote:
> 
>> On 23/08/17 10:32, Ian Kent wrote:
>>> On 23/08/17 09:06, NeilBrown wrote:
>>>> On Mon, Aug 21 2017, Ian Kent wrote:
>>>>
>>>>>>
>>>>>> A mount isn't triggered by kern_path(pathname, 0, ).
>>>>>> That '0' would need to include one of
>>>>>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>>>>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>>>>>
>>>>>> to trigger an automount (otherwise you just get -EISDIR).
>>>>>
>>>>> It's perfectly sensible to think that but there is a case where a
>>>>> a mount is triggered when using kern_path().
>>>>>
>>>>> The EISDIR return occurs for positive dentrys, negative dentrys
>>>>> will still trigger an automount (which is autofs specific,
>>>>> indirect mount map using nobrowse option, the install default).
>>>>
>>>> Ok, I understand this better now.  This difference between direct and
>>>> indirect mounts is slightly awkward. It is visible from user-space, but
>>>> not elegant to document.
>>>> When you use O_PATH to open a direct automount that has not already been
>>>> triggered, the open returns the underlying directory (and fstatfs
>>>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>>>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>>>> in effect, but it won't when "browse" is in effect.
>>>
>>> That inconsistency has bothered me for quite a while now.
>>>
>>> It was carried over from the autofs module behavior when automounting
>>> support was added to the VFS. What's worse is it prevents the use of
>>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>>> statx().
>>>
>>> There is some risk in changing that so it does work but it really does
>>> need to work to enable userspace to not trigger an automount by using
>>> this flag.
>>>
>>> So that's (hopefully) going to change soonish, see:
>>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>>
>>> The result should be that stat family calls don't trigger automounts except
>>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>>
>>>>
>>>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>>>> essentially what I said in
>>>>
>>>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>>>
>>>> It might be possible to modify automount so that it was more consistent
>>>> - i.e. if the point is triggered by a mkdir has been done, just to the
>>>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>>>> guess that might be racy, and in any case is hard to justify.
>>>>
>>>> Maybe I should change it to be about "direct automounts", and add a note
>>>> that indirect automounts aren't so predictable.
>>>
>>> Right and the semantics should be much more consistent in the near future.
>>> I hope (and expect) this semantic change won't cause problems.
>>>
>>>>
>>>> But back to my original issue of wanting to discard
>>>> kern_path_mountpoint, what would you think of the following approach -
>>>> slight revised from before.
>>>>
>>>> Thanks,
>>>> NeilBrown
>>>>
>>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>>> index beef981aa54f..7663ea82e68d 100644
>>>> --- a/fs/autofs4/autofs_i.h
>>>> +++ b/fs/autofs4/autofs_i.h
>>>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>>>> *autofs4_dentry_ino(struct dentry *dentry)
>>>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>>>   * processes which do manipulations for us in user space sees the raw
>>>>   * filesystem without "magic".)
>>>> + * A process performing certain ioctls can get temporary oz status.
>>>>   */
>>>> +extern struct task_struct *autofs_tmp_oz;
>>>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>>>  {
>>>> -  return sbi->catatonic || task_pgrp(

Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent
On 23/08/17 10:54, Ian Kent wrote:
> On 23/08/17 10:40, Ian Kent wrote:
>> On 23/08/17 10:32, Ian Kent wrote:
>>> On 23/08/17 09:06, NeilBrown wrote:
>>>> On Mon, Aug 21 2017, Ian Kent wrote:
>>>>
>>>>>>
>>>>>> A mount isn't triggered by kern_path(pathname, 0, ).
>>>>>> That '0' would need to include one of
>>>>>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>>>>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>>>>>
>>>>>> to trigger an automount (otherwise you just get -EISDIR).
>>>>>
>>>>> It's perfectly sensible to think that but there is a case where a
>>>>> a mount is triggered when using kern_path().
>>>>>
>>>>> The EISDIR return occurs for positive dentrys, negative dentrys
>>>>> will still trigger an automount (which is autofs specific,
>>>>> indirect mount map using nobrowse option, the install default).
>>>>
>>>> Ok, I understand this better now.  This difference between direct and
>>>> indirect mounts is slightly awkward. It is visible from user-space, but
>>>> not elegant to document.
>>>> When you use O_PATH to open a direct automount that has not already been
>>>> triggered, the open returns the underlying directory (and fstatfs
>>>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>>>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>>>> in effect, but it won't when "browse" is in effect.
>>>
>>> That inconsistency has bothered me for quite a while now.
>>>
>>> It was carried over from the autofs module behavior when automounting
>>> support was added to the VFS. What's worse is it prevents the use of
>>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>>> statx().
>>>
>>> There is some risk in changing that so it does work but it really does
>>> need to work to enable userspace to not trigger an automount by using
>>> this flag.
>>>
>>> So that's (hopefully) going to change soonish, see:
>>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>>
>>> The result should be that stat family calls don't trigger automounts except
>>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>>
>>>>
>>>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>>>> essentially what I said in
>>>>
>>>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>>>
>>>> It might be possible to modify automount so that it was more consistent
>>>> - i.e. if the point is triggered by a mkdir has been done, just to the
>>>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>>>> guess that might be racy, and in any case is hard to justify.
>>>>
>>>> Maybe I should change it to be about "direct automounts", and add a note
>>>> that indirect automounts aren't so predictable.
>>>
>>> Right and the semantics should be much more consistent in the near future.
>>> I hope (and expect) this semantic change won't cause problems.
>>>
>>>>
>>>> But back to my original issue of wanting to discard
>>>> kern_path_mountpoint, what would you think of the following approach -
>>>> slight revised from before.
>>>>
>>>> Thanks,
>>>> NeilBrown
>>>>
>>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>>> index beef981aa54f..7663ea82e68d 100644
>>>> --- a/fs/autofs4/autofs_i.h
>>>> +++ b/fs/autofs4/autofs_i.h
>>>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>>>> *autofs4_dentry_ino(struct dentry *dentry)
>>>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>>>   * processes which do manipulations for us in user space sees the raw
>>>>   * filesystem without "magic".)
>>>> + * A process performing certain ioctls can get temporary oz status.
>>>>   */
>>>> +extern struct task_struct *autofs_tmp_oz;
>>>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>>>  {
>>>> -  return sbi->catatonic || task_pgrp(current) == sbi-&g

Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent
On 23/08/17 10:54, Ian Kent wrote:
> On 23/08/17 10:40, Ian Kent wrote:
>> On 23/08/17 10:32, Ian Kent wrote:
>>> On 23/08/17 09:06, NeilBrown wrote:
>>>> On Mon, Aug 21 2017, Ian Kent wrote:
>>>>
>>>>>>
>>>>>> A mount isn't triggered by kern_path(pathname, 0, ).
>>>>>> That '0' would need to include one of
>>>>>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>>>>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>>>>>
>>>>>> to trigger an automount (otherwise you just get -EISDIR).
>>>>>
>>>>> It's perfectly sensible to think that but there is a case where a
>>>>> a mount is triggered when using kern_path().
>>>>>
>>>>> The EISDIR return occurs for positive dentrys, negative dentrys
>>>>> will still trigger an automount (which is autofs specific,
>>>>> indirect mount map using nobrowse option, the install default).
>>>>
>>>> Ok, I understand this better now.  This difference between direct and
>>>> indirect mounts is slightly awkward. It is visible from user-space, but
>>>> not elegant to document.
>>>> When you use O_PATH to open a direct automount that has not already been
>>>> triggered, the open returns the underlying directory (and fstatfs
>>>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>>>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>>>> in effect, but it won't when "browse" is in effect.
>>>
>>> That inconsistency has bothered me for quite a while now.
>>>
>>> It was carried over from the autofs module behavior when automounting
>>> support was added to the VFS. What's worse is it prevents the use of
>>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>>> statx().
>>>
>>> There is some risk in changing that so it does work but it really does
>>> need to work to enable userspace to not trigger an automount by using
>>> this flag.
>>>
>>> So that's (hopefully) going to change soonish, see:
>>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>>
>>> The result should be that stat family calls don't trigger automounts except
>>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>>
>>>>
>>>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>>>> essentially what I said in
>>>>
>>>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>>>
>>>> It might be possible to modify automount so that it was more consistent
>>>> - i.e. if the point is triggered by a mkdir has been done, just to the
>>>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>>>> guess that might be racy, and in any case is hard to justify.
>>>>
>>>> Maybe I should change it to be about "direct automounts", and add a note
>>>> that indirect automounts aren't so predictable.
>>>
>>> Right and the semantics should be much more consistent in the near future.
>>> I hope (and expect) this semantic change won't cause problems.
>>>
>>>>
>>>> But back to my original issue of wanting to discard
>>>> kern_path_mountpoint, what would you think of the following approach -
>>>> slight revised from before.
>>>>
>>>> Thanks,
>>>> NeilBrown
>>>>
>>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>>> index beef981aa54f..7663ea82e68d 100644
>>>> --- a/fs/autofs4/autofs_i.h
>>>> +++ b/fs/autofs4/autofs_i.h
>>>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>>>> *autofs4_dentry_ino(struct dentry *dentry)
>>>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>>>   * processes which do manipulations for us in user space sees the raw
>>>>   * filesystem without "magic".)
>>>> + * A process performing certain ioctls can get temporary oz status.
>>>>   */
>>>> +extern struct task_struct *autofs_tmp_oz;
>>>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>>>  {
>>>> -  return sbi->catatonic || task_pgrp(current) == sbi-&g

Re: Do we really need d_weak_revalidate???

2017-08-22 Thread Ian Kent
On 23/08/17 10:40, Ian Kent wrote:
> On 23/08/17 10:32, Ian Kent wrote:
>> On 23/08/17 09:06, NeilBrown wrote:
>>> On Mon, Aug 21 2017, Ian Kent wrote:
>>>
>>>>>
>>>>> A mount isn't triggered by kern_path(pathname, 0, ).
>>>>> That '0' would need to include one of
>>>>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>>>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>>>>
>>>>> to trigger an automount (otherwise you just get -EISDIR).
>>>>
>>>> It's perfectly sensible to think that but there is a case where a
>>>> a mount is triggered when using kern_path().
>>>>
>>>> The EISDIR return occurs for positive dentrys, negative dentrys
>>>> will still trigger an automount (which is autofs specific,
>>>> indirect mount map using nobrowse option, the install default).
>>>
>>> Ok, I understand this better now.  This difference between direct and
>>> indirect mounts is slightly awkward. It is visible from user-space, but
>>> not elegant to document.
>>> When you use O_PATH to open a direct automount that has not already been
>>> triggered, the open returns the underlying directory (and fstatfs
>>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>>> in effect, but it won't when "browse" is in effect.
>>
>> That inconsistency has bothered me for quite a while now.
>>
>> It was carried over from the autofs module behavior when automounting
>> support was added to the VFS. What's worse is it prevents the use of
>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>> statx().
>>
>> There is some risk in changing that so it does work but it really does
>> need to work to enable userspace to not trigger an automount by using
>> this flag.
>>
>> So that's (hopefully) going to change soonish, see:
>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>
>> The result should be that stat family calls don't trigger automounts except
>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>
>>>
>>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>>> essentially what I said in
>>>
>>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>>
>>> It might be possible to modify automount so that it was more consistent
>>> - i.e. if the point is triggered by a mkdir has been done, just to the
>>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>>> guess that might be racy, and in any case is hard to justify.
>>>
>>> Maybe I should change it to be about "direct automounts", and add a note
>>> that indirect automounts aren't so predictable.
>>
>> Right and the semantics should be much more consistent in the near future.
>> I hope (and expect) this semantic change won't cause problems.
>>
>>>
>>> But back to my original issue of wanting to discard
>>> kern_path_mountpoint, what would you think of the following approach -
>>> slight revised from before.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>> index beef981aa54f..7663ea82e68d 100644
>>> --- a/fs/autofs4/autofs_i.h
>>> +++ b/fs/autofs4/autofs_i.h
>>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>>> *autofs4_dentry_ino(struct dentry *dentry)
>>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>>   * processes which do manipulations for us in user space sees the raw
>>>   * filesystem without "magic".)
>>> + * A process performing certain ioctls can get temporary oz status.
>>>   */
>>> +extern struct task_struct *autofs_tmp_oz;
>>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>>  {
>>> -   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
>>> +   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
>>> +   autofs_tmp_oz == current;
>>>  }
>>>  
>>>  struct inode *autofs4_get_inode(struct super_block *, umode_t);
>>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>>> index dd9f1bebb5a3..d76401669a20 100

Re: Do we really need d_weak_revalidate???

2017-08-22 Thread Ian Kent
On 23/08/17 10:40, Ian Kent wrote:
> On 23/08/17 10:32, Ian Kent wrote:
>> On 23/08/17 09:06, NeilBrown wrote:
>>> On Mon, Aug 21 2017, Ian Kent wrote:
>>>
>>>>>
>>>>> A mount isn't triggered by kern_path(pathname, 0, ).
>>>>> That '0' would need to include one of
>>>>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>>>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>>>>
>>>>> to trigger an automount (otherwise you just get -EISDIR).
>>>>
>>>> It's perfectly sensible to think that but there is a case where a
>>>> a mount is triggered when using kern_path().
>>>>
>>>> The EISDIR return occurs for positive dentrys, negative dentrys
>>>> will still trigger an automount (which is autofs specific,
>>>> indirect mount map using nobrowse option, the install default).
>>>
>>> Ok, I understand this better now.  This difference between direct and
>>> indirect mounts is slightly awkward. It is visible from user-space, but
>>> not elegant to document.
>>> When you use O_PATH to open a direct automount that has not already been
>>> triggered, the open returns the underlying directory (and fstatfs
>>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>>> in effect, but it won't when "browse" is in effect.
>>
>> That inconsistency has bothered me for quite a while now.
>>
>> It was carried over from the autofs module behavior when automounting
>> support was added to the VFS. What's worse is it prevents the use of
>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>> statx().
>>
>> There is some risk in changing that so it does work but it really does
>> need to work to enable userspace to not trigger an automount by using
>> this flag.
>>
>> So that's (hopefully) going to change soonish, see:
>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>
>> The result should be that stat family calls don't trigger automounts except
>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>
>>>
>>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>>> essentially what I said in
>>>
>>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>>
>>> It might be possible to modify automount so that it was more consistent
>>> - i.e. if the point is triggered by a mkdir has been done, just to the
>>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>>> guess that might be racy, and in any case is hard to justify.
>>>
>>> Maybe I should change it to be about "direct automounts", and add a note
>>> that indirect automounts aren't so predictable.
>>
>> Right and the semantics should be much more consistent in the near future.
>> I hope (and expect) this semantic change won't cause problems.
>>
>>>
>>> But back to my original issue of wanting to discard
>>> kern_path_mountpoint, what would you think of the following approach -
>>> slight revised from before.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>> index beef981aa54f..7663ea82e68d 100644
>>> --- a/fs/autofs4/autofs_i.h
>>> +++ b/fs/autofs4/autofs_i.h
>>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>>> *autofs4_dentry_ino(struct dentry *dentry)
>>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>>   * processes which do manipulations for us in user space sees the raw
>>>   * filesystem without "magic".)
>>> + * A process performing certain ioctls can get temporary oz status.
>>>   */
>>> +extern struct task_struct *autofs_tmp_oz;
>>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>>  {
>>> -   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
>>> +   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
>>> +   autofs_tmp_oz == current;
>>>  }
>>>  
>>>  struct inode *autofs4_get_inode(struct super_block *, umode_t);
>>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>>> index dd9f1bebb5a3..d76401669a20 100

Re: Do we really need d_weak_revalidate???

2017-08-22 Thread Ian Kent
On 23/08/17 10:32, Ian Kent wrote:
> On 23/08/17 09:06, NeilBrown wrote:
>> On Mon, Aug 21 2017, Ian Kent wrote:
>>
>>>>
>>>> A mount isn't triggered by kern_path(pathname, 0, ).
>>>> That '0' would need to include one of
>>>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>>>
>>>> to trigger an automount (otherwise you just get -EISDIR).
>>>
>>> It's perfectly sensible to think that but there is a case where a
>>> a mount is triggered when using kern_path().
>>>
>>> The EISDIR return occurs for positive dentrys, negative dentrys
>>> will still trigger an automount (which is autofs specific,
>>> indirect mount map using nobrowse option, the install default).
>>
>> Ok, I understand this better now.  This difference between direct and
>> indirect mounts is slightly awkward. It is visible from user-space, but
>> not elegant to document.
>> When you use O_PATH to open a direct automount that has not already been
>> triggered, the open returns the underlying directory (and fstatfs
>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>> in effect, but it won't when "browse" is in effect.
> 
> That inconsistency has bothered me for quite a while now.
> 
> It was carried over from the autofs module behavior when automounting
> support was added to the VFS. What's worse is it prevents the use of
> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
> statx().
> 
> There is some risk in changing that so it does work but it really does
> need to work to enable userspace to not trigger an automount by using
> this flag.
> 
> So that's (hopefully) going to change soonish, see:
> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
> 
> The result should be that stat family calls don't trigger automounts except
> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
> 
>>
>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>> essentially what I said in
>>
>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>
>> It might be possible to modify automount so that it was more consistent
>> - i.e. if the point is triggered by a mkdir has been done, just to the
>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>> guess that might be racy, and in any case is hard to justify.
>>
>> Maybe I should change it to be about "direct automounts", and add a note
>> that indirect automounts aren't so predictable.
> 
> Right and the semantics should be much more consistent in the near future.
> I hope (and expect) this semantic change won't cause problems.
> 
>>
>> But back to my original issue of wanting to discard
>> kern_path_mountpoint, what would you think of the following approach -
>> slight revised from before.
>>
>> Thanks,
>> NeilBrown
>>
>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>> index beef981aa54f..7663ea82e68d 100644
>> --- a/fs/autofs4/autofs_i.h
>> +++ b/fs/autofs4/autofs_i.h
>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>> *autofs4_dentry_ino(struct dentry *dentry)
>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>   * processes which do manipulations for us in user space sees the raw
>>   * filesystem without "magic".)
>> + * A process performing certain ioctls can get temporary oz status.
>>   */
>> +extern struct task_struct *autofs_tmp_oz;
>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>  {
>> -return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
>> +return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
>> +autofs_tmp_oz == current;
>>  }
>>  
>>  struct inode *autofs4_get_inode(struct super_block *, umode_t);
>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>> index dd9f1bebb5a3..d76401669a20 100644
>> --- a/fs/autofs4/dev-ioctl.c
>> +++ b/fs/autofs4/dev-ioctl.c
>> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp,
>>  return 0;
>>  }
>>  
>> +struct task_struct *autofs_tmp_oz;
>> +int kern_path_oz(const char *pathname, int flags, struct path *path)
>> +{
>> +static DEFINE_MUTEX(autofs_o

Re: Do we really need d_weak_revalidate???

2017-08-22 Thread Ian Kent
On 23/08/17 10:32, Ian Kent wrote:
> On 23/08/17 09:06, NeilBrown wrote:
>> On Mon, Aug 21 2017, Ian Kent wrote:
>>
>>>>
>>>> A mount isn't triggered by kern_path(pathname, 0, ).
>>>> That '0' would need to include one of
>>>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>>>
>>>> to trigger an automount (otherwise you just get -EISDIR).
>>>
>>> It's perfectly sensible to think that but there is a case where a
>>> a mount is triggered when using kern_path().
>>>
>>> The EISDIR return occurs for positive dentrys, negative dentrys
>>> will still trigger an automount (which is autofs specific,
>>> indirect mount map using nobrowse option, the install default).
>>
>> Ok, I understand this better now.  This difference between direct and
>> indirect mounts is slightly awkward. It is visible from user-space, but
>> not elegant to document.
>> When you use O_PATH to open a direct automount that has not already been
>> triggered, the open returns the underlying directory (and fstatfs
>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>> in effect, but it won't when "browse" is in effect.
> 
> That inconsistency has bothered me for quite a while now.
> 
> It was carried over from the autofs module behavior when automounting
> support was added to the VFS. What's worse is it prevents the use of
> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
> statx().
> 
> There is some risk in changing that so it does work but it really does
> need to work to enable userspace to not trigger an automount by using
> this flag.
> 
> So that's (hopefully) going to change soonish, see:
> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
> 
> The result should be that stat family calls don't trigger automounts except
> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
> 
>>
>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>> essentially what I said in
>>
>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>
>> It might be possible to modify automount so that it was more consistent
>> - i.e. if the point is triggered by a mkdir has been done, just to the
>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>> guess that might be racy, and in any case is hard to justify.
>>
>> Maybe I should change it to be about "direct automounts", and add a note
>> that indirect automounts aren't so predictable.
> 
> Right and the semantics should be much more consistent in the near future.
> I hope (and expect) this semantic change won't cause problems.
> 
>>
>> But back to my original issue of wanting to discard
>> kern_path_mountpoint, what would you think of the following approach -
>> slight revised from before.
>>
>> Thanks,
>> NeilBrown
>>
>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>> index beef981aa54f..7663ea82e68d 100644
>> --- a/fs/autofs4/autofs_i.h
>> +++ b/fs/autofs4/autofs_i.h
>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>> *autofs4_dentry_ino(struct dentry *dentry)
>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>   * processes which do manipulations for us in user space sees the raw
>>   * filesystem without "magic".)
>> + * A process performing certain ioctls can get temporary oz status.
>>   */
>> +extern struct task_struct *autofs_tmp_oz;
>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>  {
>> -return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
>> +return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
>> +autofs_tmp_oz == current;
>>  }
>>  
>>  struct inode *autofs4_get_inode(struct super_block *, umode_t);
>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>> index dd9f1bebb5a3..d76401669a20 100644
>> --- a/fs/autofs4/dev-ioctl.c
>> +++ b/fs/autofs4/dev-ioctl.c
>> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp,
>>  return 0;
>>  }
>>  
>> +struct task_struct *autofs_tmp_oz;
>> +int kern_path_oz(const char *pathname, int flags, struct path *path)
>> +{
>> +static DEFINE_MUTEX(autofs_o

Re: Do we really need d_weak_revalidate???

2017-08-22 Thread Ian Kent
On 23/08/17 09:06, NeilBrown wrote:
> On Mon, Aug 21 2017, Ian Kent wrote:
> 
>>>
>>> A mount isn't triggered by kern_path(pathname, 0, ).
>>> That '0' would need to include one of
>>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>>
>>> to trigger an automount (otherwise you just get -EISDIR).
>>
>> It's perfectly sensible to think that but there is a case where a
>> a mount is triggered when using kern_path().
>>
>> The EISDIR return occurs for positive dentrys, negative dentrys
>> will still trigger an automount (which is autofs specific,
>> indirect mount map using nobrowse option, the install default).
> 
> Ok, I understand this better now.  This difference between direct and
> indirect mounts is slightly awkward. It is visible from user-space, but
> not elegant to document.
> When you use O_PATH to open a direct automount that has not already been
> triggered, the open returns the underlying directory (and fstatfs
> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
> an indirect automount, it *will* trigger the automount when "nobrowse" is
> in effect, but it won't when "browse" is in effect.

That inconsistency has bothered me for quite a while now.

It was carried over from the autofs module behavior when automounting
support was added to the VFS. What's worse is it prevents the use of
the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
statx().

There is some risk in changing that so it does work but it really does
need to work to enable userspace to not trigger an automount by using
this flag.

So that's (hopefully) going to change soonish, see:
http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch

The result should be that stat family calls don't trigger automounts except
for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.

> 
> So we cannot just say "O_PATH doesn't trigger automounts", which is
> essentially what I said in
> 
> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
> 
> It might be possible to modify automount so that it was more consistent
> - i.e. if the point is triggered by a mkdir has been done, just to the
> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
> guess that might be racy, and in any case is hard to justify.
> 
> Maybe I should change it to be about "direct automounts", and add a note
> that indirect automounts aren't so predictable.

Right and the semantics should be much more consistent in the near future.
I hope (and expect) this semantic change won't cause problems.

> 
> But back to my original issue of wanting to discard
> kern_path_mountpoint, what would you think of the following approach -
> slight revised from before.
> 
> Thanks,
> NeilBrown
> 
> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
> index beef981aa54f..7663ea82e68d 100644
> --- a/fs/autofs4/autofs_i.h
> +++ b/fs/autofs4/autofs_i.h
> @@ -135,10 +135,13 @@ static inline struct autofs_info 
> *autofs4_dentry_ino(struct dentry *dentry)
>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>   * processes which do manipulations for us in user space sees the raw
>   * filesystem without "magic".)
> + * A process performing certain ioctls can get temporary oz status.
>   */
> +extern struct task_struct *autofs_tmp_oz;
>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>  {
> - return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
> + return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
> + autofs_tmp_oz == current;
>  }
>  
>  struct inode *autofs4_get_inode(struct super_block *, umode_t);
> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
> index dd9f1bebb5a3..d76401669a20 100644
> --- a/fs/autofs4/dev-ioctl.c
> +++ b/fs/autofs4/dev-ioctl.c
> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp,
>   return 0;
>  }
>  
> +struct task_struct *autofs_tmp_oz;
> +int kern_path_oz(const char *pathname, int flags, struct path *path)
> +{
> + static DEFINE_MUTEX(autofs_oz);
> + int err;
> +
> + mutex_lock(_oz);
> + autofs_tmp_oz = current;
> + err = kern_path(pathname, flags, path);
> + autofs_tmp_oz = NULL;
> + mutex_unlock(_oz);
> + return err;
> +}
> +

It's simple enough but does look like it will attract criticism as being
a hack!

The kern_path_locked() function is very similar to what was originally
done, along with code to look down th

Re: Do we really need d_weak_revalidate???

2017-08-22 Thread Ian Kent
On 23/08/17 09:06, NeilBrown wrote:
> On Mon, Aug 21 2017, Ian Kent wrote:
> 
>>>
>>> A mount isn't triggered by kern_path(pathname, 0, ).
>>> That '0' would need to include one of
>>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>>
>>> to trigger an automount (otherwise you just get -EISDIR).
>>
>> It's perfectly sensible to think that but there is a case where a
>> a mount is triggered when using kern_path().
>>
>> The EISDIR return occurs for positive dentrys, negative dentrys
>> will still trigger an automount (which is autofs specific,
>> indirect mount map using nobrowse option, the install default).
> 
> Ok, I understand this better now.  This difference between direct and
> indirect mounts is slightly awkward. It is visible from user-space, but
> not elegant to document.
> When you use O_PATH to open a direct automount that has not already been
> triggered, the open returns the underlying directory (and fstatfs
> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
> an indirect automount, it *will* trigger the automount when "nobrowse" is
> in effect, but it won't when "browse" is in effect.

That inconsistency has bothered me for quite a while now.

It was carried over from the autofs module behavior when automounting
support was added to the VFS. What's worse is it prevents the use of
the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
statx().

There is some risk in changing that so it does work but it really does
need to work to enable userspace to not trigger an automount by using
this flag.

So that's (hopefully) going to change soonish, see:
http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch

The result should be that stat family calls don't trigger automounts except
for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.

> 
> So we cannot just say "O_PATH doesn't trigger automounts", which is
> essentially what I said in
> 
> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
> 
> It might be possible to modify automount so that it was more consistent
> - i.e. if the point is triggered by a mkdir has been done, just to the
> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
> guess that might be racy, and in any case is hard to justify.
> 
> Maybe I should change it to be about "direct automounts", and add a note
> that indirect automounts aren't so predictable.

Right and the semantics should be much more consistent in the near future.
I hope (and expect) this semantic change won't cause problems.

> 
> But back to my original issue of wanting to discard
> kern_path_mountpoint, what would you think of the following approach -
> slight revised from before.
> 
> Thanks,
> NeilBrown
> 
> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
> index beef981aa54f..7663ea82e68d 100644
> --- a/fs/autofs4/autofs_i.h
> +++ b/fs/autofs4/autofs_i.h
> @@ -135,10 +135,13 @@ static inline struct autofs_info 
> *autofs4_dentry_ino(struct dentry *dentry)
>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>   * processes which do manipulations for us in user space sees the raw
>   * filesystem without "magic".)
> + * A process performing certain ioctls can get temporary oz status.
>   */
> +extern struct task_struct *autofs_tmp_oz;
>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>  {
> - return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
> + return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
> + autofs_tmp_oz == current;
>  }
>  
>  struct inode *autofs4_get_inode(struct super_block *, umode_t);
> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
> index dd9f1bebb5a3..d76401669a20 100644
> --- a/fs/autofs4/dev-ioctl.c
> +++ b/fs/autofs4/dev-ioctl.c
> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp,
>   return 0;
>  }
>  
> +struct task_struct *autofs_tmp_oz;
> +int kern_path_oz(const char *pathname, int flags, struct path *path)
> +{
> + static DEFINE_MUTEX(autofs_oz);
> + int err;
> +
> + mutex_lock(_oz);
> + autofs_tmp_oz = current;
> + err = kern_path(pathname, flags, path);
> + autofs_tmp_oz = NULL;
> + mutex_unlock(_oz);
> + return err;
> +}
> +

It's simple enough but does look like it will attract criticism as being
a hack!

The kern_path_locked() function is very similar to what was originally
done, along with code to look down th

Re: Do we really need d_weak_revalidate???

2017-08-21 Thread Ian Kent
On 21/08/17 14:23, NeilBrown wrote:
> On Fri, Aug 18 2017, Ian Kent wrote:
> 
>> On 18/08/17 13:24, NeilBrown wrote:
>>> On Thu, Aug 17 2017, Ian Kent wrote:
>>>
>>>> On 16/08/17 19:34, Jeff Layton wrote:
>>>>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>>>>>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>>>>>
>>>>>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>>>>>>>> On Fri, Aug 11 2017, Jeff Layton wrote:
>>>>>>>>
>>>>>>>>> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>>>>>>>>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>>>>>>>>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>>>>>>>>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>>>>>>>>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>>>>>>>>>> and added the new dentry operation to NFS dentries  but not to
>>>>>>>>>>> NFSv4
>>>>>>>>>>> dentries.
>>>>>>>>>>>
>>>>>>>>>>> And nobody noticed.
>>>>>>>>>>>
>>>>>>>>>>> Until today.
>>>>>>>>>>>
>>>>>>>>>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>>>>>>>>>> NFS
>>>>>>>>>>> filesystem hangs because the network has been deconfigured.  This
>>>>>>>>>>> makes
>>>>>>>>>>> perfect sense and I suggested a code change to fix the problem.
>>>>>>>>>>> However when a colleague was trying to reproduce the problem to
>>>>>>>>>>> validate
>>>>>>>>>>> the fix, he couldn't.  Then nor could I.
>>>>>>>>>>>
>>>>>>>>>>> The problem is trivially reproducible with NFSv3, and not at all 
>>>>>>>>>>> with
>>>>>>>>>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>>>>>>>>>>
>>>>>>>>>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>>>>>>>>>> has been missing for 4.5 years, and the only time anyone noticed was
>>>>>>>>>>> when the ommission resulted in a better user experience, I do wonder
>>>>>>>>>>> if
>>>>>>>>>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>>>>>>>>>> does
>>>>>>>>>>> it serve?  I couldn't find one.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> NeilBrown
>>>>>>>>>>>
>>>>>>>>>>> For reference, see
>>>>>>>>>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>>>>>>>>>> d_weak_revalidate dentry op")
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> To reproduce the problem at home, on a system that uses systemd:
>>>>>>>>>>> 1/ place (or find) a filesystem image in a file on an NFS 
>>>>>>>>>>> filesystem.
>>>>>>>>>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>>>>>>>>>> 3/ loop-mount the filesystem image read-only somewhere
>>>>>>>>>>> 4/ reboot
>>>>>>>>>>>
>>>>>>>>>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>>>>>>>>>> timeout.
>>>>>>>>>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>>>>>>>>>> shutdown while
>>>>>>>>>>> remounting the nfs filesystem read-only.
>>>>>>>>>>>
>>>>>>>>>>> If you don't use "noac" it can still 

Re: Do we really need d_weak_revalidate???

2017-08-21 Thread Ian Kent
On 21/08/17 14:23, NeilBrown wrote:
> On Fri, Aug 18 2017, Ian Kent wrote:
> 
>> On 18/08/17 13:24, NeilBrown wrote:
>>> On Thu, Aug 17 2017, Ian Kent wrote:
>>>
>>>> On 16/08/17 19:34, Jeff Layton wrote:
>>>>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>>>>>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>>>>>
>>>>>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>>>>>>>> On Fri, Aug 11 2017, Jeff Layton wrote:
>>>>>>>>
>>>>>>>>> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>>>>>>>>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>>>>>>>>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>>>>>>>>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>>>>>>>>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>>>>>>>>>> and added the new dentry operation to NFS dentries  but not to
>>>>>>>>>>> NFSv4
>>>>>>>>>>> dentries.
>>>>>>>>>>>
>>>>>>>>>>> And nobody noticed.
>>>>>>>>>>>
>>>>>>>>>>> Until today.
>>>>>>>>>>>
>>>>>>>>>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>>>>>>>>>> NFS
>>>>>>>>>>> filesystem hangs because the network has been deconfigured.  This
>>>>>>>>>>> makes
>>>>>>>>>>> perfect sense and I suggested a code change to fix the problem.
>>>>>>>>>>> However when a colleague was trying to reproduce the problem to
>>>>>>>>>>> validate
>>>>>>>>>>> the fix, he couldn't.  Then nor could I.
>>>>>>>>>>>
>>>>>>>>>>> The problem is trivially reproducible with NFSv3, and not at all 
>>>>>>>>>>> with
>>>>>>>>>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>>>>>>>>>>
>>>>>>>>>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>>>>>>>>>> has been missing for 4.5 years, and the only time anyone noticed was
>>>>>>>>>>> when the ommission resulted in a better user experience, I do wonder
>>>>>>>>>>> if
>>>>>>>>>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>>>>>>>>>> does
>>>>>>>>>>> it serve?  I couldn't find one.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> NeilBrown
>>>>>>>>>>>
>>>>>>>>>>> For reference, see
>>>>>>>>>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>>>>>>>>>> d_weak_revalidate dentry op")
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> To reproduce the problem at home, on a system that uses systemd:
>>>>>>>>>>> 1/ place (or find) a filesystem image in a file on an NFS 
>>>>>>>>>>> filesystem.
>>>>>>>>>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>>>>>>>>>> 3/ loop-mount the filesystem image read-only somewhere
>>>>>>>>>>> 4/ reboot
>>>>>>>>>>>
>>>>>>>>>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>>>>>>>>>> timeout.
>>>>>>>>>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>>>>>>>>>> shutdown while
>>>>>>>>>>> remounting the nfs filesystem read-only.
>>>>>>>>>>>
>>>>>>>>>>> If you don't use "noac" it can still 

Re: Do we really need d_weak_revalidate???

2017-08-18 Thread Ian Kent
On 18/08/17 14:47, Ian Kent wrote:
> On 18/08/17 13:24, NeilBrown wrote:
>> On Thu, Aug 17 2017, Ian Kent wrote:
>>
>>> On 16/08/17 19:34, Jeff Layton wrote:
>>>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>>>>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>>>>
>>>>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>>>>>>> On Fri, Aug 11 2017, Jeff Layton wrote:
>>>>>>>
>>>>>>>> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>>>>>>>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>>>>>>>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>>>>>>>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>>>>>>>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>>>>>>>>> and added the new dentry operation to NFS dentries  but not to
>>>>>>>>>> NFSv4
>>>>>>>>>> dentries.
>>>>>>>>>>
>>>>>>>>>> And nobody noticed.
>>>>>>>>>>
>>>>>>>>>> Until today.
>>>>>>>>>>
>>>>>>>>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>>>>>>>>> NFS
>>>>>>>>>> filesystem hangs because the network has been deconfigured.  This
>>>>>>>>>> makes
>>>>>>>>>> perfect sense and I suggested a code change to fix the problem.
>>>>>>>>>> However when a colleague was trying to reproduce the problem to
>>>>>>>>>> validate
>>>>>>>>>> the fix, he couldn't.  Then nor could I.
>>>>>>>>>>
>>>>>>>>>> The problem is trivially reproducible with NFSv3, and not at all with
>>>>>>>>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>>>>>>>>>
>>>>>>>>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>>>>>>>>> has been missing for 4.5 years, and the only time anyone noticed was
>>>>>>>>>> when the ommission resulted in a better user experience, I do wonder
>>>>>>>>>> if
>>>>>>>>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>>>>>>>>> does
>>>>>>>>>> it serve?  I couldn't find one.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> NeilBrown
>>>>>>>>>>
>>>>>>>>>> For reference, see
>>>>>>>>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>>>>>>>>> d_weak_revalidate dentry op")
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> To reproduce the problem at home, on a system that uses systemd:
>>>>>>>>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>>>>>>>>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>>>>>>>>> 3/ loop-mount the filesystem image read-only somewhere
>>>>>>>>>> 4/ reboot
>>>>>>>>>>
>>>>>>>>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>>>>>>>>> timeout.
>>>>>>>>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>>>>>>>>> shutdown while
>>>>>>>>>> remounting the nfs filesystem read-only.
>>>>>>>>>>
>>>>>>>>>> If you don't use "noac" it can still hang, but only if something
>>>>>>>>>> slows
>>>>>>>>>> down the reboot enough that attributes have timed out by the time
>>>>>>>>>> that
>>>>>>>>>> systemd-shutdown runs.  This happens for our customer.
>>>>>>>>>>
>>>>>>>>>> I

Re: Do we really need d_weak_revalidate???

2017-08-18 Thread Ian Kent
On 18/08/17 14:47, Ian Kent wrote:
> On 18/08/17 13:24, NeilBrown wrote:
>> On Thu, Aug 17 2017, Ian Kent wrote:
>>
>>> On 16/08/17 19:34, Jeff Layton wrote:
>>>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>>>>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>>>>
>>>>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>>>>>>> On Fri, Aug 11 2017, Jeff Layton wrote:
>>>>>>>
>>>>>>>> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>>>>>>>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>>>>>>>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>>>>>>>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>>>>>>>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>>>>>>>>> and added the new dentry operation to NFS dentries  but not to
>>>>>>>>>> NFSv4
>>>>>>>>>> dentries.
>>>>>>>>>>
>>>>>>>>>> And nobody noticed.
>>>>>>>>>>
>>>>>>>>>> Until today.
>>>>>>>>>>
>>>>>>>>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>>>>>>>>> NFS
>>>>>>>>>> filesystem hangs because the network has been deconfigured.  This
>>>>>>>>>> makes
>>>>>>>>>> perfect sense and I suggested a code change to fix the problem.
>>>>>>>>>> However when a colleague was trying to reproduce the problem to
>>>>>>>>>> validate
>>>>>>>>>> the fix, he couldn't.  Then nor could I.
>>>>>>>>>>
>>>>>>>>>> The problem is trivially reproducible with NFSv3, and not at all with
>>>>>>>>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>>>>>>>>>
>>>>>>>>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>>>>>>>>> has been missing for 4.5 years, and the only time anyone noticed was
>>>>>>>>>> when the ommission resulted in a better user experience, I do wonder
>>>>>>>>>> if
>>>>>>>>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>>>>>>>>> does
>>>>>>>>>> it serve?  I couldn't find one.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> NeilBrown
>>>>>>>>>>
>>>>>>>>>> For reference, see
>>>>>>>>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>>>>>>>>> d_weak_revalidate dentry op")
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> To reproduce the problem at home, on a system that uses systemd:
>>>>>>>>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>>>>>>>>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>>>>>>>>> 3/ loop-mount the filesystem image read-only somewhere
>>>>>>>>>> 4/ reboot
>>>>>>>>>>
>>>>>>>>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>>>>>>>>> timeout.
>>>>>>>>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>>>>>>>>> shutdown while
>>>>>>>>>> remounting the nfs filesystem read-only.
>>>>>>>>>>
>>>>>>>>>> If you don't use "noac" it can still hang, but only if something
>>>>>>>>>> slows
>>>>>>>>>> down the reboot enough that attributes have timed out by the time
>>>>>>>>>> that
>>>>>>>>>> systemd-shutdown runs.  This happens for our customer.
>>>>>>>>>>
>>>>>>>>>> I

Re: Do we really need d_weak_revalidate???

2017-08-18 Thread Ian Kent
On 18/08/17 13:24, NeilBrown wrote:
> On Thu, Aug 17 2017, Ian Kent wrote:
> 
>> On 16/08/17 19:34, Jeff Layton wrote:
>>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>>>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>>>
>>>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>>>>>> On Fri, Aug 11 2017, Jeff Layton wrote:
>>>>>>
>>>>>>> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>>>>>>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>>>>>>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>>>>>>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>>>>>>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>>>>>>>> and added the new dentry operation to NFS dentries  but not to
>>>>>>>>> NFSv4
>>>>>>>>> dentries.
>>>>>>>>>
>>>>>>>>> And nobody noticed.
>>>>>>>>>
>>>>>>>>> Until today.
>>>>>>>>>
>>>>>>>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>>>>>>>> NFS
>>>>>>>>> filesystem hangs because the network has been deconfigured.  This
>>>>>>>>> makes
>>>>>>>>> perfect sense and I suggested a code change to fix the problem.
>>>>>>>>> However when a colleague was trying to reproduce the problem to
>>>>>>>>> validate
>>>>>>>>> the fix, he couldn't.  Then nor could I.
>>>>>>>>>
>>>>>>>>> The problem is trivially reproducible with NFSv3, and not at all with
>>>>>>>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>>>>>>>>
>>>>>>>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>>>>>>>> has been missing for 4.5 years, and the only time anyone noticed was
>>>>>>>>> when the ommission resulted in a better user experience, I do wonder
>>>>>>>>> if
>>>>>>>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>>>>>>>> does
>>>>>>>>> it serve?  I couldn't find one.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> NeilBrown
>>>>>>>>>
>>>>>>>>> For reference, see
>>>>>>>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>>>>>>>> d_weak_revalidate dentry op")
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> To reproduce the problem at home, on a system that uses systemd:
>>>>>>>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>>>>>>>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>>>>>>>> 3/ loop-mount the filesystem image read-only somewhere
>>>>>>>>> 4/ reboot
>>>>>>>>>
>>>>>>>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>>>>>>>> timeout.
>>>>>>>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>>>>>>>> shutdown while
>>>>>>>>> remounting the nfs filesystem read-only.
>>>>>>>>>
>>>>>>>>> If you don't use "noac" it can still hang, but only if something
>>>>>>>>> slows
>>>>>>>>> down the reboot enough that attributes have timed out by the time
>>>>>>>>> that
>>>>>>>>> systemd-shutdown runs.  This happens for our customer.
>>>>>>>>>
>>>>>>>>> If the loop-mounted filesystem is not read-only, you get other
>>>>>>>>> problems.
>>>>>>>>>
>>>>>>>>> We really want systemd to figure out that the loop-mount needs to be
>>>>>>>>> unmounted first.  I have ideas

Re: Do we really need d_weak_revalidate???

2017-08-18 Thread Ian Kent
On 18/08/17 13:24, NeilBrown wrote:
> On Thu, Aug 17 2017, Ian Kent wrote:
> 
>> On 16/08/17 19:34, Jeff Layton wrote:
>>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>>>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>>>
>>>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>>>>>> On Fri, Aug 11 2017, Jeff Layton wrote:
>>>>>>
>>>>>>> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>>>>>>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>>>>>>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>>>>>>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>>>>>>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>>>>>>>> and added the new dentry operation to NFS dentries  but not to
>>>>>>>>> NFSv4
>>>>>>>>> dentries.
>>>>>>>>>
>>>>>>>>> And nobody noticed.
>>>>>>>>>
>>>>>>>>> Until today.
>>>>>>>>>
>>>>>>>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>>>>>>>> NFS
>>>>>>>>> filesystem hangs because the network has been deconfigured.  This
>>>>>>>>> makes
>>>>>>>>> perfect sense and I suggested a code change to fix the problem.
>>>>>>>>> However when a colleague was trying to reproduce the problem to
>>>>>>>>> validate
>>>>>>>>> the fix, he couldn't.  Then nor could I.
>>>>>>>>>
>>>>>>>>> The problem is trivially reproducible with NFSv3, and not at all with
>>>>>>>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>>>>>>>>
>>>>>>>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>>>>>>>> has been missing for 4.5 years, and the only time anyone noticed was
>>>>>>>>> when the ommission resulted in a better user experience, I do wonder
>>>>>>>>> if
>>>>>>>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>>>>>>>> does
>>>>>>>>> it serve?  I couldn't find one.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> NeilBrown
>>>>>>>>>
>>>>>>>>> For reference, see
>>>>>>>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>>>>>>>> d_weak_revalidate dentry op")
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> To reproduce the problem at home, on a system that uses systemd:
>>>>>>>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>>>>>>>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>>>>>>>> 3/ loop-mount the filesystem image read-only somewhere
>>>>>>>>> 4/ reboot
>>>>>>>>>
>>>>>>>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>>>>>>>> timeout.
>>>>>>>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>>>>>>>> shutdown while
>>>>>>>>> remounting the nfs filesystem read-only.
>>>>>>>>>
>>>>>>>>> If you don't use "noac" it can still hang, but only if something
>>>>>>>>> slows
>>>>>>>>> down the reboot enough that attributes have timed out by the time
>>>>>>>>> that
>>>>>>>>> systemd-shutdown runs.  This happens for our customer.
>>>>>>>>>
>>>>>>>>> If the loop-mounted filesystem is not read-only, you get other
>>>>>>>>> problems.
>>>>>>>>>
>>>>>>>>> We really want systemd to figure out that the loop-mount needs to be
>>>>>>>>> unmounted first.  I have ideas

Re: Do we really need d_weak_revalidate???

2017-08-16 Thread Ian Kent
On 16/08/17 19:34, Jeff Layton wrote:
> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>
>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
 On Fri, Aug 11 2017, Jeff Layton wrote:

> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>> and added the new dentry operation to NFS dentries  but not to
>>> NFSv4
>>> dentries.
>>>
>>> And nobody noticed.
>>>
>>> Until today.
>>>
>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>> NFS
>>> filesystem hangs because the network has been deconfigured.  This
>>> makes
>>> perfect sense and I suggested a code change to fix the problem.
>>> However when a colleague was trying to reproduce the problem to
>>> validate
>>> the fix, he couldn't.  Then nor could I.
>>>
>>> The problem is trivially reproducible with NFSv3, and not at all with
>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>>
>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>> has been missing for 4.5 years, and the only time anyone noticed was
>>> when the ommission resulted in a better user experience, I do wonder
>>> if
>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>> does
>>> it serve?  I couldn't find one.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>> For reference, see
>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>> d_weak_revalidate dentry op")
>>>
>>>
>>>
>>> To reproduce the problem at home, on a system that uses systemd:
>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>> 3/ loop-mount the filesystem image read-only somewhere
>>> 4/ reboot
>>>
>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>> timeout.
>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>> shutdown while
>>> remounting the nfs filesystem read-only.
>>>
>>> If you don't use "noac" it can still hang, but only if something
>>> slows
>>> down the reboot enough that attributes have timed out by the time
>>> that
>>> systemd-shutdown runs.  This happens for our customer.
>>>
>>> If the loop-mounted filesystem is not read-only, you get other
>>> problems.
>>>
>>> We really want systemd to figure out that the loop-mount needs to be
>>> unmounted first.  I have ideas concerning that, but it is messy.  But
>>> that isn't the only bug here.
>>
>> The main purpose of d_weak_revalidate() was to catch the issues that
>> arise when someone changes the contents of the current working
>> directory or its parent on the server. Since '.' and '..' are treated
>> specially in the lookup code, they would not be revalidated without
>> special treatment. That leads to issues when looking up files as
>> ./ or ../, since the client won't detect that its
>> dcache is stale until it tries to use the cached dentry+inode.
>>
>> The one thing that has changed since its introduction is, I believe,
>> the ESTALE handling in the VFS layer. That might fix a lot of the
>> dcache lookup bugs that were previously handled by d_weak_revalidate().
>> I haven't done an audit to figure out if it actually can handle all of
>> them.
>>
>
> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>
> vfs: allow umount to handle mountpoints without revalidating them

 You say in the comment for that commit:

  but there
 are cases where we do want to revalidate the root of the fs.

 Do you happen to remember what those cases are?

>>>
>>> Not exactly, but I _think_ I might have been assuming that we needed to
>>> ensure that the inode attrs on the root were up to date after the
>>> pathwalk.
>>>
>>> I think that was probably wrong. d_revalidate is really intended to
>>> ensure that the dentry in question still points to the same inode. In
>>> the case of the root of the mount though, we don't really care about the
>>> dentry on the server at all. We're attaching the root of the mount to an
>>> inode and don't care of the dentry name changes. If we do need to ensure
>>> the inode attrs are updated, we'll just revalidate them at that point.
>>>
>
> Possibly the fact that we no longer try to revalidate during unmount
> means that this is no longer necessary?
>
> The original patch that added d_weak_revalidate 

Re: Do we really need d_weak_revalidate???

2017-08-16 Thread Ian Kent
On 16/08/17 19:34, Jeff Layton wrote:
> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>
>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
 On Fri, Aug 11 2017, Jeff Layton wrote:

> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>> and added the new dentry operation to NFS dentries  but not to
>>> NFSv4
>>> dentries.
>>>
>>> And nobody noticed.
>>>
>>> Until today.
>>>
>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>> NFS
>>> filesystem hangs because the network has been deconfigured.  This
>>> makes
>>> perfect sense and I suggested a code change to fix the problem.
>>> However when a colleague was trying to reproduce the problem to
>>> validate
>>> the fix, he couldn't.  Then nor could I.
>>>
>>> The problem is trivially reproducible with NFSv3, and not at all with
>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>>
>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>> has been missing for 4.5 years, and the only time anyone noticed was
>>> when the ommission resulted in a better user experience, I do wonder
>>> if
>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>> does
>>> it serve?  I couldn't find one.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>> For reference, see
>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>> d_weak_revalidate dentry op")
>>>
>>>
>>>
>>> To reproduce the problem at home, on a system that uses systemd:
>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>> 3/ loop-mount the filesystem image read-only somewhere
>>> 4/ reboot
>>>
>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>> timeout.
>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>> shutdown while
>>> remounting the nfs filesystem read-only.
>>>
>>> If you don't use "noac" it can still hang, but only if something
>>> slows
>>> down the reboot enough that attributes have timed out by the time
>>> that
>>> systemd-shutdown runs.  This happens for our customer.
>>>
>>> If the loop-mounted filesystem is not read-only, you get other
>>> problems.
>>>
>>> We really want systemd to figure out that the loop-mount needs to be
>>> unmounted first.  I have ideas concerning that, but it is messy.  But
>>> that isn't the only bug here.
>>
>> The main purpose of d_weak_revalidate() was to catch the issues that
>> arise when someone changes the contents of the current working
>> directory or its parent on the server. Since '.' and '..' are treated
>> specially in the lookup code, they would not be revalidated without
>> special treatment. That leads to issues when looking up files as
>> ./ or ../, since the client won't detect that its
>> dcache is stale until it tries to use the cached dentry+inode.
>>
>> The one thing that has changed since its introduction is, I believe,
>> the ESTALE handling in the VFS layer. That might fix a lot of the
>> dcache lookup bugs that were previously handled by d_weak_revalidate().
>> I haven't done an audit to figure out if it actually can handle all of
>> them.
>>
>
> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>
> vfs: allow umount to handle mountpoints without revalidating them

 You say in the comment for that commit:

  but there
 are cases where we do want to revalidate the root of the fs.

 Do you happen to remember what those cases are?

>>>
>>> Not exactly, but I _think_ I might have been assuming that we needed to
>>> ensure that the inode attrs on the root were up to date after the
>>> pathwalk.
>>>
>>> I think that was probably wrong. d_revalidate is really intended to
>>> ensure that the dentry in question still points to the same inode. In
>>> the case of the root of the mount though, we don't really care about the
>>> dentry on the server at all. We're attaching the root of the mount to an
>>> inode and don't care of the dentry name changes. If we do need to ensure
>>> the inode attrs are updated, we'll just revalidate them at that point.
>>>
>
> Possibly the fact that we no longer try to revalidate during unmount
> means that this is no longer necessary?
>
> The original patch that added d_weak_revalidate 

[PATCH 3/5] autofs: use AUTOFS_DEV_IOCTL_SIZE

2017-08-15 Thread Ian Kent
From: Tomohiro Kusumi <tkus...@tuxera.com>

Use a macro which defines misc-dev ioctl parameter size (excluding
a path beyond [0]) since it's been used to initialize and copy
this structure ever since it first appeared in 8d7b48e0 in 2008.

(or simply get rid of this if this is just unnecessary abstraction
when all it needs is sizeof(struct autofs_dev_ioctl))

Edit: imk
That's a good point but I'd prefer to keep the macro define.
End edit: imk

Signed-off-by: Tomohiro Kusumi <tkus...@tuxera.com>
Signed-off-by: Ian Kent <ra...@themaw.net>
---
 fs/autofs4/dev-ioctl.c |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index ea8b3a1cddd2..b8b66d55266d 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -97,13 +97,13 @@ static struct autofs_dev_ioctl *
 {
struct autofs_dev_ioctl tmp, *res;
 
-   if (copy_from_user(, in, sizeof(tmp)))
+   if (copy_from_user(, in, AUTOFS_DEV_IOCTL_SIZE))
return ERR_PTR(-EFAULT);
 
-   if (tmp.size < sizeof(tmp))
+   if (tmp.size < AUTOFS_DEV_IOCTL_SIZE)
return ERR_PTR(-EINVAL);
 
-   if (tmp.size > (PATH_MAX + sizeof(tmp)))
+   if (tmp.size > AUTOFS_DEV_IOCTL_SIZE + PATH_MAX)
return ERR_PTR(-ENAMETOOLONG);
 
res = memdup_user(in, tmp.size);
@@ -133,8 +133,8 @@ static int validate_dev_ioctl(int cmd, struct 
autofs_dev_ioctl *param)
goto out;
}
 
-   if (param->size > sizeof(*param)) {
-   err = invalid_str(param->path, param->size - sizeof(*param));
+   if (param->size > AUTOFS_DEV_IOCTL_SIZE) {
+   err = invalid_str(param->path, param->size - 
AUTOFS_DEV_IOCTL_SIZE);
if (err) {
pr_warn(
  "path string terminator missing for cmd(0x%08x)\n",
@@ -451,7 +451,7 @@ static int autofs_dev_ioctl_requester(struct file *fp,
dev_t devid;
int err = -ENOENT;
 
-   if (param->size <= sizeof(*param)) {
+   if (param->size <= AUTOFS_DEV_IOCTL_SIZE) {
err = -EINVAL;
goto out;
}
@@ -539,7 +539,7 @@ static int autofs_dev_ioctl_ismountpoint(struct file *fp,
unsigned int devid, magic;
int err = -ENOENT;
 
-   if (param->size <= sizeof(*param)) {
+   if (param->size <= AUTOFS_DEV_IOCTL_SIZE) {
err = -EINVAL;
goto out;
}



[PATCH 3/5] autofs: use AUTOFS_DEV_IOCTL_SIZE

2017-08-15 Thread Ian Kent
From: Tomohiro Kusumi 

Use a macro which defines misc-dev ioctl parameter size (excluding
a path beyond [0]) since it's been used to initialize and copy
this structure ever since it first appeared in 8d7b48e0 in 2008.

(or simply get rid of this if this is just unnecessary abstraction
when all it needs is sizeof(struct autofs_dev_ioctl))

Edit: imk
That's a good point but I'd prefer to keep the macro define.
End edit: imk

Signed-off-by: Tomohiro Kusumi 
Signed-off-by: Ian Kent 
---
 fs/autofs4/dev-ioctl.c |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index ea8b3a1cddd2..b8b66d55266d 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -97,13 +97,13 @@ static struct autofs_dev_ioctl *
 {
struct autofs_dev_ioctl tmp, *res;
 
-   if (copy_from_user(, in, sizeof(tmp)))
+   if (copy_from_user(, in, AUTOFS_DEV_IOCTL_SIZE))
return ERR_PTR(-EFAULT);
 
-   if (tmp.size < sizeof(tmp))
+   if (tmp.size < AUTOFS_DEV_IOCTL_SIZE)
return ERR_PTR(-EINVAL);
 
-   if (tmp.size > (PATH_MAX + sizeof(tmp)))
+   if (tmp.size > AUTOFS_DEV_IOCTL_SIZE + PATH_MAX)
return ERR_PTR(-ENAMETOOLONG);
 
res = memdup_user(in, tmp.size);
@@ -133,8 +133,8 @@ static int validate_dev_ioctl(int cmd, struct 
autofs_dev_ioctl *param)
goto out;
}
 
-   if (param->size > sizeof(*param)) {
-   err = invalid_str(param->path, param->size - sizeof(*param));
+   if (param->size > AUTOFS_DEV_IOCTL_SIZE) {
+   err = invalid_str(param->path, param->size - 
AUTOFS_DEV_IOCTL_SIZE);
if (err) {
pr_warn(
  "path string terminator missing for cmd(0x%08x)\n",
@@ -451,7 +451,7 @@ static int autofs_dev_ioctl_requester(struct file *fp,
dev_t devid;
int err = -ENOENT;
 
-   if (param->size <= sizeof(*param)) {
+   if (param->size <= AUTOFS_DEV_IOCTL_SIZE) {
err = -EINVAL;
goto out;
}
@@ -539,7 +539,7 @@ static int autofs_dev_ioctl_ismountpoint(struct file *fp,
unsigned int devid, magic;
int err = -ENOENT;
 
-   if (param->size <= sizeof(*param)) {
+   if (param->size <= AUTOFS_DEV_IOCTL_SIZE) {
err = -EINVAL;
goto out;
}



[PATCH 1/5] autofs: remove unused AUTOFS_IOC_EXPIRE_DIRECT/INDIRECT

2017-08-15 Thread Ian Kent
From: Tomohiro Kusumi <tkus...@tuxera.com>

These are not used by either kernel or userspace, although
AUTOFS_IOC_EXPIRE_DIRECT once seems to have been used by userspace
in around 2006-2008, which was technically just an alias of the
existing ioctl AUTOFS_IOC_EXPIRE_MULTI.

ioctls for autofs are already complicated enough that they could
be removed unless these are staying here to be able to compile
userspace code of certain period of time from a decade ago.

Edit: imk
Yes, this is indeed very old and anything that still uses must
be updated becuase it will be using broken functionality.
End edit: imk

Signed-off-by: Tomohiro Kusumi <tkus...@tuxera.com>
Signed-off-by: Ian Kent <ra...@themaw.net>
---
 include/uapi/linux/auto_fs4.h |2 --
 1 file changed, 2 deletions(-)

diff --git a/include/uapi/linux/auto_fs4.h b/include/uapi/linux/auto_fs4.h
index 7c6da423d54e..9453e9a07c9d 100644
--- a/include/uapi/linux/auto_fs4.h
+++ b/include/uapi/linux/auto_fs4.h
@@ -155,8 +155,6 @@ enum {
 };
 
 #define AUTOFS_IOC_EXPIRE_MULTI_IOW(AUTOFS_IOCTL, 
AUTOFS_IOC_EXPIRE_MULTI_CMD, int)
-#define AUTOFS_IOC_EXPIRE_INDIRECT AUTOFS_IOC_EXPIRE_MULTI
-#define AUTOFS_IOC_EXPIRE_DIRECT   AUTOFS_IOC_EXPIRE_MULTI
 #define AUTOFS_IOC_PROTOSUBVER _IOR(AUTOFS_IOCTL, 
AUTOFS_IOC_PROTOSUBVER_CMD, int)
 #define AUTOFS_IOC_ASKUMOUNT   _IOR(AUTOFS_IOCTL, 
AUTOFS_IOC_ASKUMOUNT_CMD, int)
 



[PATCH 1/5] autofs: remove unused AUTOFS_IOC_EXPIRE_DIRECT/INDIRECT

2017-08-15 Thread Ian Kent
From: Tomohiro Kusumi 

These are not used by either kernel or userspace, although
AUTOFS_IOC_EXPIRE_DIRECT once seems to have been used by userspace
in around 2006-2008, which was technically just an alias of the
existing ioctl AUTOFS_IOC_EXPIRE_MULTI.

ioctls for autofs are already complicated enough that they could
be removed unless these are staying here to be able to compile
userspace code of certain period of time from a decade ago.

Edit: imk
Yes, this is indeed very old and anything that still uses must
be updated becuase it will be using broken functionality.
End edit: imk

Signed-off-by: Tomohiro Kusumi 
Signed-off-by: Ian Kent 
---
 include/uapi/linux/auto_fs4.h |2 --
 1 file changed, 2 deletions(-)

diff --git a/include/uapi/linux/auto_fs4.h b/include/uapi/linux/auto_fs4.h
index 7c6da423d54e..9453e9a07c9d 100644
--- a/include/uapi/linux/auto_fs4.h
+++ b/include/uapi/linux/auto_fs4.h
@@ -155,8 +155,6 @@ enum {
 };
 
 #define AUTOFS_IOC_EXPIRE_MULTI_IOW(AUTOFS_IOCTL, 
AUTOFS_IOC_EXPIRE_MULTI_CMD, int)
-#define AUTOFS_IOC_EXPIRE_INDIRECT AUTOFS_IOC_EXPIRE_MULTI
-#define AUTOFS_IOC_EXPIRE_DIRECT   AUTOFS_IOC_EXPIRE_MULTI
 #define AUTOFS_IOC_PROTOSUBVER _IOR(AUTOFS_IOCTL, 
AUTOFS_IOC_PROTOSUBVER_CMD, int)
 #define AUTOFS_IOC_ASKUMOUNT   _IOR(AUTOFS_IOCTL, 
AUTOFS_IOC_ASKUMOUNT_CMD, int)
 



[PATCH 2/5] autofs: non functional header inclusion cleanup

2017-08-15 Thread Ian Kent
From: Tomohiro Kusumi <tkus...@tuxera.com>

Having header includes before any macro (without any dependency)
simply looks normal. No reason to have these macros in between.

Signed-off-by: Tomohiro Kusumi <tkus...@tuxera.com>
Signed-off-by: Ian Kent <ra...@themaw.net>
---
 fs/autofs4/autofs_i.h |   22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index beef981aa54f..4737615f0eaa 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -11,10 +11,21 @@
 
 #include 
 #include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 /* This is the range of ioctl() numbers we claim as ours */
 #define AUTOFS_IOC_FIRST AUTOFS_IOC_READY
@@ -24,17 +35,6 @@
 #define AUTOFS_DEV_IOCTL_IOC_COUNT \
(AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD - AUTOFS_DEV_IOCTL_VERSION_CMD)
 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
 #ifdef pr_fmt
 #undef pr_fmt
 #endif



[PATCH 5/5] autofs: use unsigned int/long instead of uint/ulong for ioctl args

2017-08-15 Thread Ian Kent
From: Tomohiro Kusumi <tkus...@tuxera.com>

The standard types unsigned int and unsigned long should be used for
.compat_ioctl. autofs is the only fs using uing/ulong for this, and
these are even the only uint/ulong in the entire autofs code.

Drop unneeded long cast in return value of autofs_dev_ioctl_compat().
It's already long.

Signed-off-by: Tomohiro Kusumi <tkus...@tuxera.com>
Signed-off-by: Ian Kent <ra...@themaw.net>
---
 fs/autofs4/dev-ioctl.c |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index a990c9d0f893..b7c816f39404 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -93,7 +93,7 @@ static int check_dev_ioctl_version(int cmd, struct 
autofs_dev_ioctl *param)
  * at the end of the struct.
  */
 static struct autofs_dev_ioctl *
-   copy_dev_ioctl(struct autofs_dev_ioctl __user *in)
+copy_dev_ioctl(struct autofs_dev_ioctl __user *in)
 {
struct autofs_dev_ioctl tmp, *res;
 
@@ -705,7 +705,8 @@ static int _autofs_dev_ioctl(unsigned int command,
return err;
 }
 
-static long autofs_dev_ioctl(struct file *file, uint command, ulong u)
+static long autofs_dev_ioctl(struct file *file, unsigned int command,
+unsigned long u)
 {
int err;
 
@@ -714,9 +715,10 @@ static long autofs_dev_ioctl(struct file *file, uint 
command, ulong u)
 }
 
 #ifdef CONFIG_COMPAT
-static long autofs_dev_ioctl_compat(struct file *file, uint command, ulong u)
+static long autofs_dev_ioctl_compat(struct file *file, unsigned int command,
+   unsigned long u)
 {
-   return (long) autofs_dev_ioctl(file, command, (ulong) compat_ptr(u));
+   return autofs_dev_ioctl(file, command, (unsigned long) compat_ptr(u));
 }
 #else
 #define autofs_dev_ioctl_compat NULL



[PATCH 2/5] autofs: non functional header inclusion cleanup

2017-08-15 Thread Ian Kent
From: Tomohiro Kusumi 

Having header includes before any macro (without any dependency)
simply looks normal. No reason to have these macros in between.

Signed-off-by: Tomohiro Kusumi 
Signed-off-by: Ian Kent 
---
 fs/autofs4/autofs_i.h |   22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index beef981aa54f..4737615f0eaa 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -11,10 +11,21 @@
 
 #include 
 #include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 /* This is the range of ioctl() numbers we claim as ours */
 #define AUTOFS_IOC_FIRST AUTOFS_IOC_READY
@@ -24,17 +35,6 @@
 #define AUTOFS_DEV_IOCTL_IOC_COUNT \
(AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD - AUTOFS_DEV_IOCTL_VERSION_CMD)
 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
 #ifdef pr_fmt
 #undef pr_fmt
 #endif



[PATCH 5/5] autofs: use unsigned int/long instead of uint/ulong for ioctl args

2017-08-15 Thread Ian Kent
From: Tomohiro Kusumi 

The standard types unsigned int and unsigned long should be used for
.compat_ioctl. autofs is the only fs using uing/ulong for this, and
these are even the only uint/ulong in the entire autofs code.

Drop unneeded long cast in return value of autofs_dev_ioctl_compat().
It's already long.

Signed-off-by: Tomohiro Kusumi 
Signed-off-by: Ian Kent 
---
 fs/autofs4/dev-ioctl.c |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index a990c9d0f893..b7c816f39404 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -93,7 +93,7 @@ static int check_dev_ioctl_version(int cmd, struct 
autofs_dev_ioctl *param)
  * at the end of the struct.
  */
 static struct autofs_dev_ioctl *
-   copy_dev_ioctl(struct autofs_dev_ioctl __user *in)
+copy_dev_ioctl(struct autofs_dev_ioctl __user *in)
 {
struct autofs_dev_ioctl tmp, *res;
 
@@ -705,7 +705,8 @@ static int _autofs_dev_ioctl(unsigned int command,
return err;
 }
 
-static long autofs_dev_ioctl(struct file *file, uint command, ulong u)
+static long autofs_dev_ioctl(struct file *file, unsigned int command,
+unsigned long u)
 {
int err;
 
@@ -714,9 +715,10 @@ static long autofs_dev_ioctl(struct file *file, uint 
command, ulong u)
 }
 
 #ifdef CONFIG_COMPAT
-static long autofs_dev_ioctl_compat(struct file *file, uint command, ulong u)
+static long autofs_dev_ioctl_compat(struct file *file, unsigned int command,
+   unsigned long u)
 {
-   return (long) autofs_dev_ioctl(file, command, (ulong) compat_ptr(u));
+   return autofs_dev_ioctl(file, command, (unsigned long) compat_ptr(u));
 }
 #else
 #define autofs_dev_ioctl_compat NULL



[PATCH 4/5] autofs: drop wrong comment

2017-08-15 Thread Ian Kent
From: Tomohiro Kusumi <tkus...@tuxera.com>

This comment was correct when it was added in 8d7b48e0 in 2008,
but not after 4e44b685 in 2009 which introduced find_autofs_mount().

Signed-off-by: Tomohiro Kusumi <tkus...@tuxera.com>
Signed-off-by: Ian Kent <ra...@themaw.net>
---
 fs/autofs4/dev-ioctl.c |5 -
 1 file changed, 5 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index b8b66d55266d..a990c9d0f893 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -258,11 +258,6 @@ static int autofs_dev_ioctl_open_mountpoint(const char 
*name, dev_t devid)
if (err)
goto out;
 
-   /*
-* Find autofs super block that has the device number
-* corresponding to the autofs fs we want to open.
-*/
-
filp = dentry_open(, O_RDONLY, current_cred());
path_put();
if (IS_ERR(filp)) {



[PATCH 4/5] autofs: drop wrong comment

2017-08-15 Thread Ian Kent
From: Tomohiro Kusumi 

This comment was correct when it was added in 8d7b48e0 in 2008,
but not after 4e44b685 in 2009 which introduced find_autofs_mount().

Signed-off-by: Tomohiro Kusumi 
Signed-off-by: Ian Kent 
---
 fs/autofs4/dev-ioctl.c |5 -
 1 file changed, 5 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index b8b66d55266d..a990c9d0f893 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -258,11 +258,6 @@ static int autofs_dev_ioctl_open_mountpoint(const char 
*name, dev_t devid)
if (err)
goto out;
 
-   /*
-* Find autofs super block that has the device number
-* corresponding to the autofs fs we want to open.
-*/
-
filp = dentry_open(, O_RDONLY, current_cred());
path_put();
if (IS_ERR(filp)) {



Re: [PATCH 1/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-08-09 Thread Ian Kent
On 09/08/17 17:51, Ian Kent wrote:
> On 09/08/17 16:39, David Howells wrote:
>> Ian Kent <ra...@themaw.net> wrote:
>>
>>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>>> negative dentry case in follow_automount() needs to be changed to
>>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>>> required flags are clear).
>>
>> Should the be EREMOTE instead of ENOENT?
> 
> I thought about that and ended up thinking ENOENT was more sensible
> but I'll look at it again.

I think EREMOTE and ENOENT both are inaccurate.

There's no way to know if the negative dentry corresponds to a valid map
key, and we've seen increasing lookups from userspace applications for
invalid directories, so I'm not sure.

I went with ENOENT but I guess we could use EREMOTE, what's your thinking
on why EREMOTE might be better than ENOENT?

Ian


Re: [PATCH 1/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-08-09 Thread Ian Kent
On 09/08/17 17:51, Ian Kent wrote:
> On 09/08/17 16:39, David Howells wrote:
>> Ian Kent  wrote:
>>
>>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>>> negative dentry case in follow_automount() needs to be changed to
>>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>>> required flags are clear).
>>
>> Should the be EREMOTE instead of ENOENT?
> 
> I thought about that and ended up thinking ENOENT was more sensible
> but I'll look at it again.

I think EREMOTE and ENOENT both are inaccurate.

There's no way to know if the negative dentry corresponds to a valid map
key, and we've seen increasing lookups from userspace applications for
invalid directories, so I'm not sure.

I went with ENOENT but I guess we could use EREMOTE, what's your thinking
on why EREMOTE might be better than ENOENT?

Ian


Re: [PATCH 1/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-08-09 Thread Ian Kent
On 09/08/17 16:39, David Howells wrote:
> Ian Kent <ra...@themaw.net> wrote:
> 
>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>> negative dentry case in follow_automount() needs to be changed to
>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>> required flags are clear).
> 
> Should the be EREMOTE instead of ENOENT?

I thought about that and ended up thinking ENOENT was more sensible
but I'll look at it again.

Thanks
Ian


Re: [PATCH 1/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-08-09 Thread Ian Kent
On 09/08/17 16:39, David Howells wrote:
> Ian Kent  wrote:
> 
>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>> negative dentry case in follow_automount() needs to be changed to
>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>> required flags are clear).
> 
> Should the be EREMOTE instead of ENOENT?

I thought about that and ended up thinking ENOENT was more sensible
but I'll look at it again.

Thanks
Ian


Re: [PATCH 1/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-08-08 Thread Ian Kent
On 08/08/17 21:11, Colin Walters wrote:
> On Tue, Aug 8, 2017, at 12:26 AM, Ian Kent wrote:
> 
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -3022,8 +3022,7 @@ static inline int vfs_lstat(const char __user *name, 
>> struct kstat *stat)
>>  static inline int vfs_fstatat(int dfd, const char __user *filename,
>>struct kstat *stat, int flags)
>>  {
>> -return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
>> - stat, STATX_BASIC_STATS);
>> +return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
>>  }
>>  static inline int vfs_fstat(int fd, struct kstat *stat)
>>  {
> 
> This is reverting the fstatat() prat of
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=deccf497d804a4c5fca2dbfad2f104675a6f9102
> Which itself seems weird to me - it looks like we were unconditionally
> forcing on AT_NO_AUTOMOUNT regardless of what userspace passed?
> So perhaps a
> Fixes: deccf497d804a4c5fca2dbfad2f104675a6f9102
> is appropriate here?

David posted this at my request.

I asked him to do it because, when I saw this, I thought restoring the semantics
to what they were before they were changed needed to be done as quickly as 
possible.

That was so that I could then work on fixing the AT_NO_AUTOMOUNT not being 
honored
with fstatat(2).

> 
> I understand that for stat()/lstat() we didn't expose the option to userspace,
> so the behavior was...ah, there's this note in man-pages 
> (man-pages-4.09-3.fc26.noarch):
> 
>> On Linux, lstat() will generally not trigger automounter action, whereas 
>> stat() will (but see fstatat(2)).
> 
> I have no idea of the history here, but maybe it makes sense to drop
> the AT_NO_AUTOMOUNT from the vfs_stat() too?
> 

I thought I had talked about the history in the patch description but I guess
it's not clear and isn't detailed enough for people that haven't been close to
the development over time.

Historically stat family calls were not supposed to trigger automounts because
that can easily lead to mount storms that are really bad for large autofs mount
maps. But the mount storm problem was mostly only evident for autofs maps that
used the "browse" option, the non-negative dentry case. The negative dentry
case always triggered an automount regardless of the system call.

Because of the move in user space to mostly always use proc filesystem mount
tables where there can be many more mount entries than were present in the
text based mount tables it's critical to not perform mount callbacks that
aren't absolutely essentially.

At this point that means to me going over the stat(2) system call behavior
and making sure it is only calling back where necessary. That's because that
is where it's expected automounts won't be triggered and so should have least
impact.

So it's the negative dentry handling in follow_automount() you should be
thinking about in terms of impact rather than the actual fstatat(2) change.
If man pages need to change then they need to change.

AFAICT, as I said in the patch description, this should not cause regressions
but I can't be certain. In any case it is in keeping with the historical "stat
family system calls shouldn't trigger automounts" mantra needed from the 
beginning.

Also notice that the negative dentry handling change should only affect autofs
as other kernel uses will be triggering automounts for positive dentrys.

Hopefully this doesn't sound to aggressive, I don't mean it to sound that way.

Ian


Re: [PATCH 1/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-08-08 Thread Ian Kent
On 08/08/17 21:11, Colin Walters wrote:
> On Tue, Aug 8, 2017, at 12:26 AM, Ian Kent wrote:
> 
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -3022,8 +3022,7 @@ static inline int vfs_lstat(const char __user *name, 
>> struct kstat *stat)
>>  static inline int vfs_fstatat(int dfd, const char __user *filename,
>>struct kstat *stat, int flags)
>>  {
>> -return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
>> - stat, STATX_BASIC_STATS);
>> +return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
>>  }
>>  static inline int vfs_fstat(int fd, struct kstat *stat)
>>  {
> 
> This is reverting the fstatat() prat of
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=deccf497d804a4c5fca2dbfad2f104675a6f9102
> Which itself seems weird to me - it looks like we were unconditionally
> forcing on AT_NO_AUTOMOUNT regardless of what userspace passed?
> So perhaps a
> Fixes: deccf497d804a4c5fca2dbfad2f104675a6f9102
> is appropriate here?

David posted this at my request.

I asked him to do it because, when I saw this, I thought restoring the semantics
to what they were before they were changed needed to be done as quickly as 
possible.

That was so that I could then work on fixing the AT_NO_AUTOMOUNT not being 
honored
with fstatat(2).

> 
> I understand that for stat()/lstat() we didn't expose the option to userspace,
> so the behavior was...ah, there's this note in man-pages 
> (man-pages-4.09-3.fc26.noarch):
> 
>> On Linux, lstat() will generally not trigger automounter action, whereas 
>> stat() will (but see fstatat(2)).
> 
> I have no idea of the history here, but maybe it makes sense to drop
> the AT_NO_AUTOMOUNT from the vfs_stat() too?
> 

I thought I had talked about the history in the patch description but I guess
it's not clear and isn't detailed enough for people that haven't been close to
the development over time.

Historically stat family calls were not supposed to trigger automounts because
that can easily lead to mount storms that are really bad for large autofs mount
maps. But the mount storm problem was mostly only evident for autofs maps that
used the "browse" option, the non-negative dentry case. The negative dentry
case always triggered an automount regardless of the system call.

Because of the move in user space to mostly always use proc filesystem mount
tables where there can be many more mount entries than were present in the
text based mount tables it's critical to not perform mount callbacks that
aren't absolutely essentially.

At this point that means to me going over the stat(2) system call behavior
and making sure it is only calling back where necessary. That's because that
is where it's expected automounts won't be triggered and so should have least
impact.

So it's the negative dentry handling in follow_automount() you should be
thinking about in terms of impact rather than the actual fstatat(2) change.
If man pages need to change then they need to change.

AFAICT, as I said in the patch description, this should not cause regressions
but I can't be certain. In any case it is in keeping with the historical "stat
family system calls shouldn't trigger automounts" mantra needed from the 
beginning.

Also notice that the negative dentry handling change should only affect autofs
as other kernel uses will be triggering automounts for positive dentrys.

Hopefully this doesn't sound to aggressive, I don't mean it to sound that way.

Ian


[PATCH 2/3] autofs - make disc device user accessible

2017-08-07 Thread Ian Kent
The autofs miscellanous device ioctls that shouldn't require
CAP_SYS_ADMIN need to be accessible to user space applications in
order to be able to get information about autofs mounts.

The module checks capabilities so the miscelaneous device should
be fine with broad permissions.

Signed-off-by: Ian Kent <ra...@themaw.net>
Cc: Colin Walters <walt...@redhat.com>
Cc: Ondrej Holy <oh...@redhat.com>
---
 fs/autofs4/dev-ioctl.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index dd9f1bebb5a3..218a4ecc75cc 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -733,7 +733,8 @@ static const struct file_operations _dev_ioctl_fops = {
 static struct miscdevice _autofs_dev_ioctl_misc = {
.minor  = AUTOFS_MINOR,
.name   = AUTOFS_DEVICE_NAME,
-   .fops   = &_dev_ioctl_fops
+   .fops   = &_dev_ioctl_fops,
+   .mode   = 0644,
 };
 
 MODULE_ALIAS_MISCDEV(AUTOFS_MINOR);



[PATCH 2/3] autofs - make disc device user accessible

2017-08-07 Thread Ian Kent
The autofs miscellanous device ioctls that shouldn't require
CAP_SYS_ADMIN need to be accessible to user space applications in
order to be able to get information about autofs mounts.

The module checks capabilities so the miscelaneous device should
be fine with broad permissions.

Signed-off-by: Ian Kent 
Cc: Colin Walters 
Cc: Ondrej Holy 
---
 fs/autofs4/dev-ioctl.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index dd9f1bebb5a3..218a4ecc75cc 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -733,7 +733,8 @@ static const struct file_operations _dev_ioctl_fops = {
 static struct miscdevice _autofs_dev_ioctl_misc = {
.minor  = AUTOFS_MINOR,
.name   = AUTOFS_DEVICE_NAME,
-   .fops   = &_dev_ioctl_fops
+   .fops   = &_dev_ioctl_fops,
+   .mode   = 0644,
 };
 
 MODULE_ALIAS_MISCDEV(AUTOFS_MINOR);



[PATCH 1/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-08-07 Thread Ian Kent
The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
of an automount by the call. But this flag is unconditionally cleared
for all stat family system calls except statx().

stat family system calls have always triggered mount requests for the
negative dentry case in follow_automount() which is intended but prevents
the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.

In order to handle the AT_NO_AUTOMOUNT for both system calls the
negative dentry case in follow_automount() needs to be changed to
return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
required flags are clear).

AFAICT this change doesn't have any noticable side effects and may,
in some use cases (although I didn't see it in testing) prevent
unnecessary callbacks to the automount daemon.

It's also possible that a stat family call has been made with a
path that is in the process of being mounted by some other process.
But stat family calls should return the automount state of the path
as it is "now" so it shouldn't wait for mount completion.

This is the same semantic as the positive dentry case already
handled.

Signed-off-by: Ian Kent <ra...@themaw.net>
Cc: David Howells <dhowe...@redhat.com>
Cc: Colin Walters <walt...@redhat.com>
Cc: Ondrej Holy <oh...@redhat.com>
---
 fs/namei.c |   15 ---
 include/linux/fs.h |3 +--
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ddb6a7c2b3d4..1180f9c58093 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1129,9 +1129,18 @@ static int follow_automount(struct path *path, struct 
nameidata *nd,
 * of the daemon to instantiate them before they can be used.
 */
if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
-  LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
-   path->dentry->d_inode)
-   return -EISDIR;
+  LOOKUP_OPEN | LOOKUP_CREATE |
+  LOOKUP_AUTOMOUNT))) {
+   /* Positive dentry that isn't meant to trigger an
+* automount, EISDIR will allow it to be used,
+* otherwise there's no mount here "now" so return
+* ENOENT.
+*/
+   if (path->dentry->d_inode)
+   return -EISDIR;
+   else
+   return -ENOENT;
+   }
 
if (path->dentry->d_sb->s_user_ns != _user_ns)
return -EACCES;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6e1fd5d21248..37c96f52e48e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3022,8 +3022,7 @@ static inline int vfs_lstat(const char __user *name, 
struct kstat *stat)
 static inline int vfs_fstatat(int dfd, const char __user *filename,
  struct kstat *stat, int flags)
 {
-   return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
-stat, STATX_BASIC_STATS);
+   return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
 }
 static inline int vfs_fstat(int fd, struct kstat *stat)
 {



[PATCH 3/3] autofs - make dev ioctl version and ismountpoint user accessible

2017-08-07 Thread Ian Kent
Some of the autofs miscellaneous device ioctls need to be accessable to
user space applications without CAP_SYS_ADMIN to get information about
autofs mounts.

Signed-off-by: Ian Kent <ra...@themaw.net>
Cc: Colin Walters <walt...@redhat.com>
Cc: Ondrej Holy <oh...@redhat.com>
---
 fs/autofs4/dev-ioctl.c  |   12 
 include/uapi/linux/auto_dev-ioctl.h |2 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 218a4ecc75cc..ea8b3a1cddd2 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -628,10 +628,6 @@ static int _autofs_dev_ioctl(unsigned int command,
ioctl_fn fn = NULL;
int err = 0;
 
-   /* only root can play with this */
-   if (!capable(CAP_SYS_ADMIN))
-   return -EPERM;
-
cmd_first = _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST);
cmd = _IOC_NR(command);
 
@@ -640,6 +636,14 @@ static int _autofs_dev_ioctl(unsigned int command,
return -ENOTTY;
}
 
+   /* Only root can use ioctls other than AUTOFS_DEV_IOCTL_VERSION_CMD
+* and AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD
+*/
+   if (cmd != AUTOFS_DEV_IOCTL_VERSION_CMD &&
+   cmd != AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD &&
+   !capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
/* Copy the parameters into kernel space. */
param = copy_dev_ioctl(user);
if (IS_ERR(param))
diff --git a/include/uapi/linux/auto_dev-ioctl.h 
b/include/uapi/linux/auto_dev-ioctl.h
index 744b3d060968..5558db8e6646 100644
--- a/include/uapi/linux/auto_dev-ioctl.h
+++ b/include/uapi/linux/auto_dev-ioctl.h
@@ -16,7 +16,7 @@
 #define AUTOFS_DEVICE_NAME "autofs"
 
 #define AUTOFS_DEV_IOCTL_VERSION_MAJOR 1
-#define AUTOFS_DEV_IOCTL_VERSION_MINOR 0
+#define AUTOFS_DEV_IOCTL_VERSION_MINOR 1
 
 #define AUTOFS_DEV_IOCTL_SIZE  sizeof(struct autofs_dev_ioctl)
 



[PATCH 1/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-08-07 Thread Ian Kent
The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
of an automount by the call. But this flag is unconditionally cleared
for all stat family system calls except statx().

stat family system calls have always triggered mount requests for the
negative dentry case in follow_automount() which is intended but prevents
the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.

In order to handle the AT_NO_AUTOMOUNT for both system calls the
negative dentry case in follow_automount() needs to be changed to
return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
required flags are clear).

AFAICT this change doesn't have any noticable side effects and may,
in some use cases (although I didn't see it in testing) prevent
unnecessary callbacks to the automount daemon.

It's also possible that a stat family call has been made with a
path that is in the process of being mounted by some other process.
But stat family calls should return the automount state of the path
as it is "now" so it shouldn't wait for mount completion.

This is the same semantic as the positive dentry case already
handled.

Signed-off-by: Ian Kent 
Cc: David Howells 
Cc: Colin Walters 
Cc: Ondrej Holy 
---
 fs/namei.c |   15 ---
 include/linux/fs.h |3 +--
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ddb6a7c2b3d4..1180f9c58093 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1129,9 +1129,18 @@ static int follow_automount(struct path *path, struct 
nameidata *nd,
 * of the daemon to instantiate them before they can be used.
 */
if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
-  LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
-   path->dentry->d_inode)
-   return -EISDIR;
+  LOOKUP_OPEN | LOOKUP_CREATE |
+  LOOKUP_AUTOMOUNT))) {
+   /* Positive dentry that isn't meant to trigger an
+* automount, EISDIR will allow it to be used,
+* otherwise there's no mount here "now" so return
+* ENOENT.
+*/
+   if (path->dentry->d_inode)
+   return -EISDIR;
+   else
+   return -ENOENT;
+   }
 
if (path->dentry->d_sb->s_user_ns != _user_ns)
return -EACCES;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6e1fd5d21248..37c96f52e48e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3022,8 +3022,7 @@ static inline int vfs_lstat(const char __user *name, 
struct kstat *stat)
 static inline int vfs_fstatat(int dfd, const char __user *filename,
  struct kstat *stat, int flags)
 {
-   return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
-stat, STATX_BASIC_STATS);
+   return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
 }
 static inline int vfs_fstat(int fd, struct kstat *stat)
 {



[PATCH 3/3] autofs - make dev ioctl version and ismountpoint user accessible

2017-08-07 Thread Ian Kent
Some of the autofs miscellaneous device ioctls need to be accessable to
user space applications without CAP_SYS_ADMIN to get information about
autofs mounts.

Signed-off-by: Ian Kent 
Cc: Colin Walters 
Cc: Ondrej Holy 
---
 fs/autofs4/dev-ioctl.c  |   12 
 include/uapi/linux/auto_dev-ioctl.h |2 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 218a4ecc75cc..ea8b3a1cddd2 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -628,10 +628,6 @@ static int _autofs_dev_ioctl(unsigned int command,
ioctl_fn fn = NULL;
int err = 0;
 
-   /* only root can play with this */
-   if (!capable(CAP_SYS_ADMIN))
-   return -EPERM;
-
cmd_first = _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST);
cmd = _IOC_NR(command);
 
@@ -640,6 +636,14 @@ static int _autofs_dev_ioctl(unsigned int command,
return -ENOTTY;
}
 
+   /* Only root can use ioctls other than AUTOFS_DEV_IOCTL_VERSION_CMD
+* and AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD
+*/
+   if (cmd != AUTOFS_DEV_IOCTL_VERSION_CMD &&
+   cmd != AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD &&
+   !capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
/* Copy the parameters into kernel space. */
param = copy_dev_ioctl(user);
if (IS_ERR(param))
diff --git a/include/uapi/linux/auto_dev-ioctl.h 
b/include/uapi/linux/auto_dev-ioctl.h
index 744b3d060968..5558db8e6646 100644
--- a/include/uapi/linux/auto_dev-ioctl.h
+++ b/include/uapi/linux/auto_dev-ioctl.h
@@ -16,7 +16,7 @@
 #define AUTOFS_DEVICE_NAME "autofs"
 
 #define AUTOFS_DEV_IOCTL_VERSION_MAJOR 1
-#define AUTOFS_DEV_IOCTL_VERSION_MINOR 0
+#define AUTOFS_DEV_IOCTL_VERSION_MINOR 1
 
 #define AUTOFS_DEV_IOCTL_SIZE  sizeof(struct autofs_dev_ioctl)
 



Re: [PATCH] autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL

2017-06-15 Thread Ian Kent
On Fri, 2017-06-16 at 12:13 +1000, NeilBrown wrote:
> On Thu, Jun 15 2017, Andrew Morton wrote:
> 
> > On Wed, 07 Jun 2017 12:08:38 +1000 NeilBrown  wrote:
> > 
> > > 
> > > If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL
> > > ioctl, autofs4_d_automount() will return
> > >    ERR_PTR(status)
> > > with that status to follow_automount(), which will then
> > > dereference an invalid pointer.
> > > 
> > > So treat a positive status the same as zero, and map
> > > to ENOENT.
> > > 
> > > See comment in systemd src/core/automount.c::automount_send_ready().
> > > 
> > > ...
> > > 
> > > --- a/fs/autofs4/dev-ioctl.c
> > > +++ b/fs/autofs4/dev-ioctl.c
> > > @@ -344,7 +344,7 @@ static int autofs_dev_ioctl_fail(struct file *fp,
> > >   int status;
> > >  
> > >   token = (autofs_wqt_t) param->fail.token;
> > > - status = param->fail.status ? param->fail.status : -ENOENT;
> > > + status = param->fail.status < 0 ? param->fail.status : -ENOENT;
> > >   return autofs4_wait_release(sbi, token, status);
> > >  }
> > 
> > Sounds serious.  Was the absence of a cc:stable deliberate?
> 
> You need CAP_SYS_ADMIN to  get the ioctl even looked at.  Doesn't that
> mean the bug can only be triggered by a process that could easily do
> worse?

Think so, yes.

> 
> Or do containers allow admins to give out CAP_SYS_ADMIN to untrusted
> people??  I haven't been keeping up.

Maybe, with docker root can start a container with --privileged to give the
container admin capabilities. It may be the case that capabilities can be used
now I don't know.

> 
> Given how simple the patch is, it probably makes sense to add a
> cc:stable, just in case.

IMHO it needs to be applied to stable as well.

> 
> Thanks,
> NeilBrown


Re: [PATCH] autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL

2017-06-15 Thread Ian Kent
On Fri, 2017-06-16 at 12:13 +1000, NeilBrown wrote:
> On Thu, Jun 15 2017, Andrew Morton wrote:
> 
> > On Wed, 07 Jun 2017 12:08:38 +1000 NeilBrown  wrote:
> > 
> > > 
> > > If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL
> > > ioctl, autofs4_d_automount() will return
> > >    ERR_PTR(status)
> > > with that status to follow_automount(), which will then
> > > dereference an invalid pointer.
> > > 
> > > So treat a positive status the same as zero, and map
> > > to ENOENT.
> > > 
> > > See comment in systemd src/core/automount.c::automount_send_ready().
> > > 
> > > ...
> > > 
> > > --- a/fs/autofs4/dev-ioctl.c
> > > +++ b/fs/autofs4/dev-ioctl.c
> > > @@ -344,7 +344,7 @@ static int autofs_dev_ioctl_fail(struct file *fp,
> > >   int status;
> > >  
> > >   token = (autofs_wqt_t) param->fail.token;
> > > - status = param->fail.status ? param->fail.status : -ENOENT;
> > > + status = param->fail.status < 0 ? param->fail.status : -ENOENT;
> > >   return autofs4_wait_release(sbi, token, status);
> > >  }
> > 
> > Sounds serious.  Was the absence of a cc:stable deliberate?
> 
> You need CAP_SYS_ADMIN to  get the ioctl even looked at.  Doesn't that
> mean the bug can only be triggered by a process that could easily do
> worse?

Think so, yes.

> 
> Or do containers allow admins to give out CAP_SYS_ADMIN to untrusted
> people??  I haven't been keeping up.

Maybe, with docker root can start a container with --privileged to give the
container admin capabilities. It may be the case that capabilities can be used
now I don't know.

> 
> Given how simple the patch is, it probably makes sense to add a
> cc:stable, just in case.

IMHO it needs to be applied to stable as well.

> 
> Thanks,
> NeilBrown


Re: [PATCH RFC] mnt: umount mounts one by one in umount_tree()

2017-06-13 Thread Ian Kent
On Fri, 2017-05-12 at 00:08 -0700, Andrei Vagin wrote:
> With this patch, we don't try to umount all mounts of a tree together.
> Instead of this we umount them one by one. In this case, we see a significant
> improvement in performance for the worsе case.

Indeed, umount has been very slow for a while now.
Even a moderately large number of mounts (~1) become painfully slow.

Re you still perusing this?
Anything I can do to help?

Eric, what are your thoughts on this latest attempt?

> 
> The reason of this optimization is that umount() can hold namespace_sem
> for a long time, this semaphore is global, so it affects all users.
> Recently Eric W. Biederman added a per mount namespace limit on the
> number of mounts. The default number of mounts allowed per mount
> namespace at 100,000. Currently this value is allowed to construct a tree
> which requires hours to be umounted.
> 
> In a worse case the current complexity of umount_tree() is O(n^3).
> * Enumirate all mounts in a target tree (propagate_umount)
> * Enumirate mounts to find where these changes have to
>   be propagated (mark_umount_candidates)
> * Enumirate mounts to find a requered mount by parent and dentry
>   (__lookup_mnt). __lookup_mnt() searches a mount in m_hash, but
>   the number of mounts is much bigger than a size of the hash.
> 
> The worse case is when all mounts from the tree live in the same shared
> group. In this case we have to enumirate all mounts on each step.
> 
> There is CVE-2016-6213 about this case.
> 
> Here are results for the kernel with this patch
> $ for i in `seq 10 15`; do  unshare -m sh ./run.sh $i; done
> umount -l /mnt/1 ->   0:00.00
> umount -l /mnt/1 ->   0:00.01
> umount -l /mnt/1 ->   0:00.01
> umount -l /mnt/1 ->   0:00.03
> umount -l /mnt/1 ->   0:00.07
> umount -l /mnt/1 ->   0:00.14
> 
> Here are results for the kernel without this patch
> $ for i in `seq 10 15`; do  unshare -m sh ./run.sh $i; done
> umount -l /mnt/1 ->   0:00.04
> umount -l /mnt/1 ->   0:00.17
> umount -l /mnt/1 ->   0:00.75
> umount -l /mnt/1 ->   0:05.96
> umount -l /mnt/1 ->   0:34.40
> umount -l /mnt/1 ->   3:46.27
> 
> And here is a test script:
> $ cat run.sh
> set -e -m
> 
> mount -t tmpfs zdtm /mnt
> mkdir -p /mnt/1 /mnt/2
> mount -t tmpfs zdtm /mnt/1
> mount --make-shared /mnt/1
> mkdir /mnt/1/1
> 
> for i in `seq $1`; do
>   ./mount --bind /mnt/1/1 /mnt/1/1
> done
> 
> echo -n "umount -l /mnt/1 -> "
> /usr/bin/time -f '%E' ./umount -l /mnt/1
> 
> And we need these simple mount and umount tools, because the standard
> ones read /proc/self/mountinfo, but this is extremely slow when we have
> thousands of mounts.
> $ cat mount.c
>  #include 
>  #include 
> 
>  int main(int argc, char **argv)
>  {
>   return mount(argv[2], argv[3], NULL, MS_BIND, NULL);
>  }
> 
> $ cat umount.c
>  #include 
> 
>  int main(int argc, char **argv)
>  {
>   return umount2(argv[2], MNT_DETACH);
>  }
> 
> Here is a previous attempt to optimize this code:
> https://lkml.org/lkml/2016/10/10/495
> 
> Signed-off-by: Andrei Vagin 
> ---
>  fs/namespace.c | 81 +++
> ---
>  1 file changed, 43 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 3bf0cd2..4e6f258 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1474,56 +1474,61 @@ static bool disconnect_mount(struct mount *mnt, enum
> umount_tree_flags how)
>   */
>  static void umount_tree(struct mount *mnt, enum umount_tree_flags how)
>  {
> - LIST_HEAD(tmp_list);
>   struct mount *p;
> + int done = 0;
>  
>   if (how & UMOUNT_PROPAGATE)
>   propagate_mount_unlock(mnt);
>  
>   /* Gather the mounts to umount */
> - for (p = mnt; p; p = next_mnt(p, mnt)) {
> + while (!done) {
> + LIST_HEAD(tmp_list);
> +
> + p = mnt;
> + while (!list_empty(>mnt_mounts))
> + p = list_entry(p->mnt_mounts.next, struct mount,
> mnt_child);
> +
>   p->mnt.mnt_flags |= MNT_UMOUNT;
>   list_move(>mnt_list, _list);
> - }
> -
> - /* Hide the mounts from mnt_mounts */
> - list_for_each_entry(p, _list, mnt_list) {
>   list_del_init(>mnt_child);
> - }
>  
> - /* Add propogated mounts to the tmp_list */
> - if (how & UMOUNT_PROPAGATE)
> - propagate_umount(_list);
> -
> - while (!list_empty(_list)) {
> - struct mnt_namespace *ns;
> - bool disconnect;
> - p = list_first_entry(_list, struct mount, mnt_list);
> - list_del_init(>mnt_expire);
> - list_del_init(>mnt_list);
> - ns = p->mnt_ns;
> - if (ns) {
> - ns->mounts--;
> - __touch_mnt_namespace(ns);
> - }
> - p->mnt_ns = NULL;
> - if (how & UMOUNT_SYNC)
> - p->mnt.mnt_flags |= MNT_SYNC_UMOUNT;
> -
> - 

Re: [PATCH RFC] mnt: umount mounts one by one in umount_tree()

2017-06-13 Thread Ian Kent
On Fri, 2017-05-12 at 00:08 -0700, Andrei Vagin wrote:
> With this patch, we don't try to umount all mounts of a tree together.
> Instead of this we umount them one by one. In this case, we see a significant
> improvement in performance for the worsе case.

Indeed, umount has been very slow for a while now.
Even a moderately large number of mounts (~1) become painfully slow.

Re you still perusing this?
Anything I can do to help?

Eric, what are your thoughts on this latest attempt?

> 
> The reason of this optimization is that umount() can hold namespace_sem
> for a long time, this semaphore is global, so it affects all users.
> Recently Eric W. Biederman added a per mount namespace limit on the
> number of mounts. The default number of mounts allowed per mount
> namespace at 100,000. Currently this value is allowed to construct a tree
> which requires hours to be umounted.
> 
> In a worse case the current complexity of umount_tree() is O(n^3).
> * Enumirate all mounts in a target tree (propagate_umount)
> * Enumirate mounts to find where these changes have to
>   be propagated (mark_umount_candidates)
> * Enumirate mounts to find a requered mount by parent and dentry
>   (__lookup_mnt). __lookup_mnt() searches a mount in m_hash, but
>   the number of mounts is much bigger than a size of the hash.
> 
> The worse case is when all mounts from the tree live in the same shared
> group. In this case we have to enumirate all mounts on each step.
> 
> There is CVE-2016-6213 about this case.
> 
> Here are results for the kernel with this patch
> $ for i in `seq 10 15`; do  unshare -m sh ./run.sh $i; done
> umount -l /mnt/1 ->   0:00.00
> umount -l /mnt/1 ->   0:00.01
> umount -l /mnt/1 ->   0:00.01
> umount -l /mnt/1 ->   0:00.03
> umount -l /mnt/1 ->   0:00.07
> umount -l /mnt/1 ->   0:00.14
> 
> Here are results for the kernel without this patch
> $ for i in `seq 10 15`; do  unshare -m sh ./run.sh $i; done
> umount -l /mnt/1 ->   0:00.04
> umount -l /mnt/1 ->   0:00.17
> umount -l /mnt/1 ->   0:00.75
> umount -l /mnt/1 ->   0:05.96
> umount -l /mnt/1 ->   0:34.40
> umount -l /mnt/1 ->   3:46.27
> 
> And here is a test script:
> $ cat run.sh
> set -e -m
> 
> mount -t tmpfs zdtm /mnt
> mkdir -p /mnt/1 /mnt/2
> mount -t tmpfs zdtm /mnt/1
> mount --make-shared /mnt/1
> mkdir /mnt/1/1
> 
> for i in `seq $1`; do
>   ./mount --bind /mnt/1/1 /mnt/1/1
> done
> 
> echo -n "umount -l /mnt/1 -> "
> /usr/bin/time -f '%E' ./umount -l /mnt/1
> 
> And we need these simple mount and umount tools, because the standard
> ones read /proc/self/mountinfo, but this is extremely slow when we have
> thousands of mounts.
> $ cat mount.c
>  #include 
>  #include 
> 
>  int main(int argc, char **argv)
>  {
>   return mount(argv[2], argv[3], NULL, MS_BIND, NULL);
>  }
> 
> $ cat umount.c
>  #include 
> 
>  int main(int argc, char **argv)
>  {
>   return umount2(argv[2], MNT_DETACH);
>  }
> 
> Here is a previous attempt to optimize this code:
> https://lkml.org/lkml/2016/10/10/495
> 
> Signed-off-by: Andrei Vagin 
> ---
>  fs/namespace.c | 81 +++
> ---
>  1 file changed, 43 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 3bf0cd2..4e6f258 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1474,56 +1474,61 @@ static bool disconnect_mount(struct mount *mnt, enum
> umount_tree_flags how)
>   */
>  static void umount_tree(struct mount *mnt, enum umount_tree_flags how)
>  {
> - LIST_HEAD(tmp_list);
>   struct mount *p;
> + int done = 0;
>  
>   if (how & UMOUNT_PROPAGATE)
>   propagate_mount_unlock(mnt);
>  
>   /* Gather the mounts to umount */
> - for (p = mnt; p; p = next_mnt(p, mnt)) {
> + while (!done) {
> + LIST_HEAD(tmp_list);
> +
> + p = mnt;
> + while (!list_empty(>mnt_mounts))
> + p = list_entry(p->mnt_mounts.next, struct mount,
> mnt_child);
> +
>   p->mnt.mnt_flags |= MNT_UMOUNT;
>   list_move(>mnt_list, _list);
> - }
> -
> - /* Hide the mounts from mnt_mounts */
> - list_for_each_entry(p, _list, mnt_list) {
>   list_del_init(>mnt_child);
> - }
>  
> - /* Add propogated mounts to the tmp_list */
> - if (how & UMOUNT_PROPAGATE)
> - propagate_umount(_list);
> -
> - while (!list_empty(_list)) {
> - struct mnt_namespace *ns;
> - bool disconnect;
> - p = list_first_entry(_list, struct mount, mnt_list);
> - list_del_init(>mnt_expire);
> - list_del_init(>mnt_list);
> - ns = p->mnt_ns;
> - if (ns) {
> - ns->mounts--;
> - __touch_mnt_namespace(ns);
> - }
> - p->mnt_ns = NULL;
> - if (how & UMOUNT_SYNC)
> - p->mnt.mnt_flags |= MNT_SYNC_UMOUNT;
> -
> - disconnect = 

Re: [RFC][PATCH 0/9] Make containers kernel objects

2017-05-29 Thread Ian Kent
On Sat, 2017-05-27 at 17:45 +, Trond Myklebust wrote:
> On Mon, 2017-05-22 at 14:04 -0500, Eric W. Biederman wrote:
> > David Howells  writes:
> > 
> > > Here are a set of patches to define a container object for the
> > > kernel and
> > > to provide some methods to create and manipulate them.
> > > 
> > > The reason I think this is necessary is that the kernel has no idea
> > > how to
> > > direct upcalls to what userspace considers to be a container -
> > > current
> > > Linux practice appears to make a "container" just an arbitrarily
> > > chosen
> > > junction of namespaces, control groups and files, which may be
> > > changed
> > > individually within the "container".
> > > 
> > 
> > I think this might possibly be a useful abstraction for solving the
> > keyring upcalls if it was something created implicitly.
> > 
> > fork_into_container for use by keyring upcalls is currently a
> > security
> > vulnerability as it allows escaping all of a containers cgroups.  But
> > you have that on your list of things to fix.  However you don't have
> > seccomp and a few other things.
> > 
> > Before we had kthreadd in the kernel upcalls always had issues
> > because
> > the code to reset all of the userspace bits and make the forked
> > task suitable for running upcalls was always missing some detail.  It
> > is
> > a very bug-prone kind of idiom that you are talking about.  It is
> > doubly
> > bug-prone because the wrongness is visible to userspace and as such
> > might get become a frozen KABI guarantee.
> > 
> > Let me suggest a concrete alternative:
> > 
> > - At the time of mount observer the mounters user namespace.
> > - Find the mounters pid namespace.
> > - If the mounters pid namespace is owned by the mounters user
> > namespace
> >   walk up the pid namespace tree to the first pid namespace owned by
> >   that user namespace.
> > - If the mounters pid namespace is not owned by the mounters user
> >   namespace fail the mount it is going to need to make upcalls as
> >   will not be possible.
> > - Hold a reference to the pid namespace that was found.
> > 
> > Then when an upcall needs to be made fork a child of the init process
> > of the specified pid namespace.  Or fail if the init process of the
> > pid namespace has died.
> > 
> > That should always work and it does not require keeping expensive
> > state
> > where we did not have it previously.  Further because the semantics
> > are
> > fork a child of a particular pid namespace's init as features get
> > added
> > to the kernel this code remains well defined.
> > 
> > For ordinary request-key upcalls we should be able to use the same
> > rules
> > and just not save/restore things in the kernel.
> > 
> > A huge advantage of my alternative (other than not being a bit-rot
> > magnet) is that it should drop into existing container infrastructure
> > without problems.  The rule for container implementors is simple to
> > use
> > security key infrastructure you need to have created a pid namespace
> > in
> > your user namespace.
> > 
> 
> While this may be part of a solution, I don't see how it can deal with
> issues such as the need to set up an RPCSEC_GSS session on behalf of
> the user. The issue there is that while the mount may have been created
> in a parent namespace, the actual call to kinit to set up a kerberos
> context is likely to have been made inside the container. It may even
> have been done using a completely separate net namespace. So in order
> to set up my RPCSEC_GSS session, I may need to do so from inside the
> user container.
> 
> In that kind of environment, might it perhaps make sense to just allow
> an upcall executable running in the root init namespace to tunnel
> through (using setns()) so it can actually execute in the context of
> the child container? That would keep security policy with the init
> namespace, but would also ensure that the container environment rules
> may be applied if and when appropriate.
> 
> In addition to today's upcall mechanism, we would need the ability to
> pass in the nsproxy (and root directory) for the confined process that
> triggered the upcall and/or the namespace for the mountpoint. I'm
> assuming that could be done by passing in a file descriptor to the
> appropriate /proc entries?

Maybe I don't quite understand what your describing 

But the information needed (I think) is readily available at the time of the
upcall from within the container (it's not clear that really is the case based
on a later discussion). Then the process run in the init namespace can pick that
up and call (a slightly modified) setns().

I went down that burrow for quite a while and ended up leaving it at, as
Biederman so elegantly describes as, a bad half-assed implementation.

In fact I stopped perusing that approach for a couple of reasons, not the least
of which was a lack of in-depth knowledge of all the sub-systems involved (and a
complete lack of useful suggestions or help from 

Re: [RFC][PATCH 0/9] Make containers kernel objects

2017-05-29 Thread Ian Kent
On Sat, 2017-05-27 at 17:45 +, Trond Myklebust wrote:
> On Mon, 2017-05-22 at 14:04 -0500, Eric W. Biederman wrote:
> > David Howells  writes:
> > 
> > > Here are a set of patches to define a container object for the
> > > kernel and
> > > to provide some methods to create and manipulate them.
> > > 
> > > The reason I think this is necessary is that the kernel has no idea
> > > how to
> > > direct upcalls to what userspace considers to be a container -
> > > current
> > > Linux practice appears to make a "container" just an arbitrarily
> > > chosen
> > > junction of namespaces, control groups and files, which may be
> > > changed
> > > individually within the "container".
> > > 
> > 
> > I think this might possibly be a useful abstraction for solving the
> > keyring upcalls if it was something created implicitly.
> > 
> > fork_into_container for use by keyring upcalls is currently a
> > security
> > vulnerability as it allows escaping all of a containers cgroups.  But
> > you have that on your list of things to fix.  However you don't have
> > seccomp and a few other things.
> > 
> > Before we had kthreadd in the kernel upcalls always had issues
> > because
> > the code to reset all of the userspace bits and make the forked
> > task suitable for running upcalls was always missing some detail.  It
> > is
> > a very bug-prone kind of idiom that you are talking about.  It is
> > doubly
> > bug-prone because the wrongness is visible to userspace and as such
> > might get become a frozen KABI guarantee.
> > 
> > Let me suggest a concrete alternative:
> > 
> > - At the time of mount observer the mounters user namespace.
> > - Find the mounters pid namespace.
> > - If the mounters pid namespace is owned by the mounters user
> > namespace
> >   walk up the pid namespace tree to the first pid namespace owned by
> >   that user namespace.
> > - If the mounters pid namespace is not owned by the mounters user
> >   namespace fail the mount it is going to need to make upcalls as
> >   will not be possible.
> > - Hold a reference to the pid namespace that was found.
> > 
> > Then when an upcall needs to be made fork a child of the init process
> > of the specified pid namespace.  Or fail if the init process of the
> > pid namespace has died.
> > 
> > That should always work and it does not require keeping expensive
> > state
> > where we did not have it previously.  Further because the semantics
> > are
> > fork a child of a particular pid namespace's init as features get
> > added
> > to the kernel this code remains well defined.
> > 
> > For ordinary request-key upcalls we should be able to use the same
> > rules
> > and just not save/restore things in the kernel.
> > 
> > A huge advantage of my alternative (other than not being a bit-rot
> > magnet) is that it should drop into existing container infrastructure
> > without problems.  The rule for container implementors is simple to
> > use
> > security key infrastructure you need to have created a pid namespace
> > in
> > your user namespace.
> > 
> 
> While this may be part of a solution, I don't see how it can deal with
> issues such as the need to set up an RPCSEC_GSS session on behalf of
> the user. The issue there is that while the mount may have been created
> in a parent namespace, the actual call to kinit to set up a kerberos
> context is likely to have been made inside the container. It may even
> have been done using a completely separate net namespace. So in order
> to set up my RPCSEC_GSS session, I may need to do so from inside the
> user container.
> 
> In that kind of environment, might it perhaps make sense to just allow
> an upcall executable running in the root init namespace to tunnel
> through (using setns()) so it can actually execute in the context of
> the child container? That would keep security policy with the init
> namespace, but would also ensure that the container environment rules
> may be applied if and when appropriate.
> 
> In addition to today's upcall mechanism, we would need the ability to
> pass in the nsproxy (and root directory) for the confined process that
> triggered the upcall and/or the namespace for the mountpoint. I'm
> assuming that could be done by passing in a file descriptor to the
> appropriate /proc entries?

Maybe I don't quite understand what your describing 

But the information needed (I think) is readily available at the time of the
upcall from within the container (it's not clear that really is the case based
on a later discussion). Then the process run in the init namespace can pick that
up and call (a slightly modified) setns().

I went down that burrow for quite a while and ended up leaving it at, as
Biederman so elegantly describes as, a bad half-assed implementation.

In fact I stopped perusing that approach for a couple of reasons, not the least
of which was a lack of in-depth knowledge of all the sub-systems involved (and a
complete lack of useful suggestions or help from others that are 

[ANNOUNCE] autofs 5.1.3 release

2017-05-26 Thread Ian Kent
Hi all,

It's time for a new release, autofs-5.1.3.

Of particular interest is the addition browse support for amd format
maps.

The browsable_dirs configuration option can be set to either "yes" or
"no" and the pseudo option "browsable" can be used as an option of
mount type "auto" mounts.

Note that the browsable_dirs "full" option and the mount type auto
"fullybrowsabe" option, mostly used for map debugging, cannot be
provided by autofs.

Related changes are the removal of the requirement to include entries
for amd format maps in the master map. The optional amd format map
 mount sections may be used without adding these entries, in the same
way these sections are used in the amd configuration.

This prevents the need to include potentially incompatible entries in
shared master maps in multi-vendor environments.

Something to watch out for is that autofs is no longer tolerant of
misconfigured nsswitch "automount:" entries. If nsswitch sources that
aren't configured are present it can lead to an unconditional 10
second delay at startup with some unexpected log errors. autofs is
probably not going to interfere with nsswitch.conf because it is not
the owner of the file so it needs to be set correctly by people using
autofs. This will often mean just removing the inappropriate "nisplus"
source present in the default installed nsswitch.conf. 

There's two added configuration parameters which hopefully will help
with map source availability problems at startup.

The "master_wait" configuration option (default 10 seconds) will retry
reading the master map for the configured time when certain errors are
encountered (the reason misconfigured nsswitch configurations cause a
delay).

And the option "sss_master_map_wait" (default 0, disabled) has been
added specifically for the sss source. This option may be needed
because the error returned when sss is not yet available to answer
queries can't be distinguished from a real "map does not exist" return.
The default of disabled is because the sss source not being ready
problem has already be resolved in recent versions of sss itself.

Additionally there a number of bug fixes and small improvements.

autofs
==

The package can be found at:
https://www.kernel.org/pub/linux/daemons/autofs/v5/

It is autofs-5.1.3.tar.[gz|xz]

No source rpm is there as it can be produced by using:

rpmbuild -ts autofs-5.1.3.tar.gz

and the binary rpm by using:

rpmbuild -tb autofs-5.1.3.tar.gz

See the README.amd-maps file for information about using amd format
maps.

Here are the entries from the CHANGELOG which outline the updates:

24/05/2016 autofs-5.1.3
===
- fix release date in CHANGELOG.
- build: check for clock_gettime in librt.
- fix compiler warning in try_remount().
- drop redundant \n in logerr().
- Fix size arg of fgets(3).
- fix libtirpc detection with -Wl,--as-needed.
- Fix a typo in CREDITS.
- Change .requestor to .requester for consistency.
- fix file map changed check.
- Remove unused local 2KB buffer.
- Fix typos in error messages.
- Fix fgets(3) size argument (another one).
- fix short memory allocation in lookup_amd_instance().
- fix count_mounts() function.
- configure: add cache variable for Linux proc filesystem check.
- Avoid local variable name shadowing another.
- fix typo in MOUNT_FLAG_GHOST comment.
- fix cachefs parse message not being logged.
- fix argc off by one in mount_autofs.c.
- fix _strncmp() usage.
- fix create_client() RPC client handling.
- update and add README for old autofs schema.
- wait for master map available at start.
- add master read wait option.
- work around sss startup delay.
- add sss master map wait config option.
- fix quoted key handling in sanitize_path().
- fix included master map not found return.
- dont exit on master map read fail timeout.
- set sane default master read wait timeout.
- don't return until after master map retry read.
- make lookup_nss_read_master() return nss status.
- check NFS server availability on local mount fallback.
- check NFS server availability on local mount fallback.
- make set_direct_mount_catatonic() more general.
- set autofs mounts catatonic at exit.
- honor last rw in mount options when doing a bind mount.
- fix typos in README.amd-maps.
- add ref counting to struct map_source.
- add support for amd browsable option.
- add function conf_amd_get_map_name().
- add function conf_amd_get_mount_paths().
- include amd mount sections mounts in master mounts list.
- check for conflicting amd section mounts.
- add function conf_get_map_options().
- capture cache option and its settings during parsing.
- handle map_option cache for top level mounts.
- handle amd cache option all in amd type auto mounts.
- fix bogus check in expire_cleanup().
- delay submount exit for amd submounts.
- add the mount requestor's pid to pending_args.
- create thread-local ID for mount attempts.
- log functions to prefix messages with attempt_id if available.
- factor out 

[ANNOUNCE] autofs 5.1.3 release

2017-05-26 Thread Ian Kent
Hi all,

It's time for a new release, autofs-5.1.3.

Of particular interest is the addition browse support for amd format
maps.

The browsable_dirs configuration option can be set to either "yes" or
"no" and the pseudo option "browsable" can be used as an option of
mount type "auto" mounts.

Note that the browsable_dirs "full" option and the mount type auto
"fullybrowsabe" option, mostly used for map debugging, cannot be
provided by autofs.

Related changes are the removal of the requirement to include entries
for amd format maps in the master map. The optional amd format map
 mount sections may be used without adding these entries, in the same
way these sections are used in the amd configuration.

This prevents the need to include potentially incompatible entries in
shared master maps in multi-vendor environments.

Something to watch out for is that autofs is no longer tolerant of
misconfigured nsswitch "automount:" entries. If nsswitch sources that
aren't configured are present it can lead to an unconditional 10
second delay at startup with some unexpected log errors. autofs is
probably not going to interfere with nsswitch.conf because it is not
the owner of the file so it needs to be set correctly by people using
autofs. This will often mean just removing the inappropriate "nisplus"
source present in the default installed nsswitch.conf. 

There's two added configuration parameters which hopefully will help
with map source availability problems at startup.

The "master_wait" configuration option (default 10 seconds) will retry
reading the master map for the configured time when certain errors are
encountered (the reason misconfigured nsswitch configurations cause a
delay).

And the option "sss_master_map_wait" (default 0, disabled) has been
added specifically for the sss source. This option may be needed
because the error returned when sss is not yet available to answer
queries can't be distinguished from a real "map does not exist" return.
The default of disabled is because the sss source not being ready
problem has already be resolved in recent versions of sss itself.

Additionally there a number of bug fixes and small improvements.

autofs
==

The package can be found at:
https://www.kernel.org/pub/linux/daemons/autofs/v5/

It is autofs-5.1.3.tar.[gz|xz]

No source rpm is there as it can be produced by using:

rpmbuild -ts autofs-5.1.3.tar.gz

and the binary rpm by using:

rpmbuild -tb autofs-5.1.3.tar.gz

See the README.amd-maps file for information about using amd format
maps.

Here are the entries from the CHANGELOG which outline the updates:

24/05/2016 autofs-5.1.3
===
- fix release date in CHANGELOG.
- build: check for clock_gettime in librt.
- fix compiler warning in try_remount().
- drop redundant \n in logerr().
- Fix size arg of fgets(3).
- fix libtirpc detection with -Wl,--as-needed.
- Fix a typo in CREDITS.
- Change .requestor to .requester for consistency.
- fix file map changed check.
- Remove unused local 2KB buffer.
- Fix typos in error messages.
- Fix fgets(3) size argument (another one).
- fix short memory allocation in lookup_amd_instance().
- fix count_mounts() function.
- configure: add cache variable for Linux proc filesystem check.
- Avoid local variable name shadowing another.
- fix typo in MOUNT_FLAG_GHOST comment.
- fix cachefs parse message not being logged.
- fix argc off by one in mount_autofs.c.
- fix _strncmp() usage.
- fix create_client() RPC client handling.
- update and add README for old autofs schema.
- wait for master map available at start.
- add master read wait option.
- work around sss startup delay.
- add sss master map wait config option.
- fix quoted key handling in sanitize_path().
- fix included master map not found return.
- dont exit on master map read fail timeout.
- set sane default master read wait timeout.
- don't return until after master map retry read.
- make lookup_nss_read_master() return nss status.
- check NFS server availability on local mount fallback.
- check NFS server availability on local mount fallback.
- make set_direct_mount_catatonic() more general.
- set autofs mounts catatonic at exit.
- honor last rw in mount options when doing a bind mount.
- fix typos in README.amd-maps.
- add ref counting to struct map_source.
- add support for amd browsable option.
- add function conf_amd_get_map_name().
- add function conf_amd_get_mount_paths().
- include amd mount sections mounts in master mounts list.
- check for conflicting amd section mounts.
- add function conf_get_map_options().
- capture cache option and its settings during parsing.
- handle map_option cache for top level mounts.
- handle amd cache option all in amd type auto mounts.
- fix bogus check in expire_cleanup().
- delay submount exit for amd submounts.
- add the mount requestor's pid to pending_args.
- create thread-local ID for mount attempts.
- log functions to prefix messages with attempt_id if available.
- factor out 

Re: [RFC][PATCH 0/9] Make containers kernel objects

2017-05-24 Thread Ian Kent
On Wed, 2017-05-24 at 03:26 -0500, Eric W. Biederman wrote:
> 
> So far no one has even bothered to seriously try the one solution that
> is guaranteed to work because it takes a lot of changes to kernel code.
> I believe the last effort snagged on what a pain it is to refactor the
> user mode helper infrastructure.

Yes, that's mostly true in my case although I wouldn't say I haven't looked at
it seriously but equally I haven't got anything towards it yet either, sorry.

I'm likely going to revisit this based on a couple of approaches.

One is just what you describe and I had already been looking at this some time
ago. It seems to me that adding a work queue type that starts and retains a
process until the work queue is destroyed (similar to the way the work queue sub
system starts a fail over thread for use under resource exhaustion) would be a
sensible way to do it.

This doesn't mean I think it's a good idea for reasons I've outlined in the past
but the approach does warrant the effort to work out if it can be used without
problems.

And there's also the request key infrastructure which, as it is now, gets in the
road of verifying results, *sigh*.

Ian


Re: [RFC][PATCH 0/9] Make containers kernel objects

2017-05-24 Thread Ian Kent
On Wed, 2017-05-24 at 03:26 -0500, Eric W. Biederman wrote:
> 
> So far no one has even bothered to seriously try the one solution that
> is guaranteed to work because it takes a lot of changes to kernel code.
> I believe the last effort snagged on what a pain it is to refactor the
> user mode helper infrastructure.

Yes, that's mostly true in my case although I wouldn't say I haven't looked at
it seriously but equally I haven't got anything towards it yet either, sorry.

I'm likely going to revisit this based on a couple of approaches.

One is just what you describe and I had already been looking at this some time
ago. It seems to me that adding a work queue type that starts and retains a
process until the work queue is destroyed (similar to the way the work queue sub
system starts a fail over thread for use under resource exhaustion) would be a
sensible way to do it.

This doesn't mean I think it's a good idea for reasons I've outlined in the past
but the approach does warrant the effort to work out if it can be used without
problems.

And there's also the request key infrastructure which, as it is now, gets in the
road of verifying results, *sigh*.

Ian


Re: [RFC][PATCH 0/9] Make containers kernel objects

2017-05-23 Thread Ian Kent
On Mon, 2017-05-22 at 12:21 -0700, James Bottomley wrote:
> 
> > > >  (3) nfsdcltrack.  A way for NFSD to access stable storage for 
> > > > tracking of persistent state.  Again, network-namespace 
> > > > dependent, but also perhaps mount-namespace dependent.
> > 
> > Definitely mount-namespace dependent.
> > 
> > > 
> > > So again, given we can set this up to work today, this sounds like 
> > > more a restriction that will bite us than an enhancement that gives 
> > > us extra features.
> > > 
> > 
> > How do you set this up to work today?
> 
> Well, as above, it spawns into the root, you jump it to where it should
> be and re-execute or simply handle in the host. 
> 
> > AFAIK, if you want to run knfsd in a container today, you're out of 
> > luck for any non-trivial configuration.
> 
> Well "running knfsd in a container" is actually different from having a
> containerised nfs export.  My understanding was that thanks to the work
> of Stas Kinsbursky, the latter has mostly worked since the 3.9 kernel
> for v3 and below.  I assume the current issue is that there's a problem
> with v4?

Oh, ok, I thought that, say, a docker (NFS) volumes-from a container to another
container didn't work for any version of NFS.

Certainly didn't work last time I tried, it was a while ago though.

Ian


Re: [RFC][PATCH 0/9] Make containers kernel objects

2017-05-23 Thread Ian Kent
On Mon, 2017-05-22 at 12:21 -0700, James Bottomley wrote:
> 
> > > >  (3) nfsdcltrack.  A way for NFSD to access stable storage for 
> > > > tracking of persistent state.  Again, network-namespace 
> > > > dependent, but also perhaps mount-namespace dependent.
> > 
> > Definitely mount-namespace dependent.
> > 
> > > 
> > > So again, given we can set this up to work today, this sounds like 
> > > more a restriction that will bite us than an enhancement that gives 
> > > us extra features.
> > > 
> > 
> > How do you set this up to work today?
> 
> Well, as above, it spawns into the root, you jump it to where it should
> be and re-execute or simply handle in the host. 
> 
> > AFAIK, if you want to run knfsd in a container today, you're out of 
> > luck for any non-trivial configuration.
> 
> Well "running knfsd in a container" is actually different from having a
> containerised nfs export.  My understanding was that thanks to the work
> of Stas Kinsbursky, the latter has mostly worked since the 3.9 kernel
> for v3 and below.  I assume the current issue is that there's a problem
> with v4?

Oh, ok, I thought that, say, a docker (NFS) volumes-from a container to another
container didn't work for any version of NFS.

Certainly didn't work last time I tried, it was a while ago though.

Ian


Re: [RFC][PATCH 0/9] Make containers kernel objects

2017-05-23 Thread Ian Kent
On Mon, 2017-05-22 at 17:22 +0100, David Howells wrote:
> Here are a set of patches to define a container object for the kernel and
> to provide some methods to create and manipulate them.
> 
> The reason I think this is necessary is that the kernel has no idea how to
> direct upcalls to what userspace considers to be a container - current
> Linux practice appears to make a "container" just an arbitrarily chosen
> junction of namespaces, control groups and files, which may be changed
> individually within the "container".
> 
> The kernel upcall mechanism then needs to decide which set of namespaces,
> etc., it must exec the appropriate upcall program.  Examples of this
> include:
> 
>  (1) The DNS resolver.  The DNS cache in the kernel should probably be
>  per-network namespace, but in userspace the program, its libraries and
>  its config data are associated with a mount tree and a user namespace
>  and it gets run in a particular pid namespace.
> 
>  (2) NFS ID mapper.  The NFS ID mapping cache should also probably be
>  per-network namespace.
> 
>  (3) nfsdcltrack.  A way for NFSD to access stable storage for tracking
>  of persistent state.  Again, network-namespace dependent, but also
>  perhaps mount-namespace dependent.
> 
>  (4) General request-key upcalls.  Not particularly namespace dependent,
>  apart from keyrings being somewhat governed by the user namespace and
>  the upcall being configured by the mount namespace.
> 
> These patches are built on top of the mount context patchset so that
> namespaces can be properly propagated over submounts/automounts.
> 
> These patches implement a container object that holds the following things:
> 
>  (1) Namespaces.
> 
>  (2) A root directory.
> 
>  (3) A set of processes, including a designated 'init' process.
> 
>  (4) The creator's credentials, including ownership.
> 
>  (5) A place to hang security for the container, allowing policies to be
>  set per-container.
> 
> I also want to add:
> 
>  (6) Control groups.
> 
>  (7) A per-container keyring that can be added to from outside of the
>  container, even once the container is live, for the provision of
>  filesystem authentication/encryption keys in advance of the container
>  being started.

It's hard to decide which of these has higher priority, I think both essential
to a container implementation.

Ian


Re: [RFC][PATCH 0/9] Make containers kernel objects

2017-05-23 Thread Ian Kent
On Mon, 2017-05-22 at 17:22 +0100, David Howells wrote:
> Here are a set of patches to define a container object for the kernel and
> to provide some methods to create and manipulate them.
> 
> The reason I think this is necessary is that the kernel has no idea how to
> direct upcalls to what userspace considers to be a container - current
> Linux practice appears to make a "container" just an arbitrarily chosen
> junction of namespaces, control groups and files, which may be changed
> individually within the "container".
> 
> The kernel upcall mechanism then needs to decide which set of namespaces,
> etc., it must exec the appropriate upcall program.  Examples of this
> include:
> 
>  (1) The DNS resolver.  The DNS cache in the kernel should probably be
>  per-network namespace, but in userspace the program, its libraries and
>  its config data are associated with a mount tree and a user namespace
>  and it gets run in a particular pid namespace.
> 
>  (2) NFS ID mapper.  The NFS ID mapping cache should also probably be
>  per-network namespace.
> 
>  (3) nfsdcltrack.  A way for NFSD to access stable storage for tracking
>  of persistent state.  Again, network-namespace dependent, but also
>  perhaps mount-namespace dependent.
> 
>  (4) General request-key upcalls.  Not particularly namespace dependent,
>  apart from keyrings being somewhat governed by the user namespace and
>  the upcall being configured by the mount namespace.
> 
> These patches are built on top of the mount context patchset so that
> namespaces can be properly propagated over submounts/automounts.
> 
> These patches implement a container object that holds the following things:
> 
>  (1) Namespaces.
> 
>  (2) A root directory.
> 
>  (3) A set of processes, including a designated 'init' process.
> 
>  (4) The creator's credentials, including ownership.
> 
>  (5) A place to hang security for the container, allowing policies to be
>  set per-container.
> 
> I also want to add:
> 
>  (6) Control groups.
> 
>  (7) A per-container keyring that can be added to from outside of the
>  container, even once the container is live, for the provision of
>  filesystem authentication/encryption keys in advance of the container
>  being started.

It's hard to decide which of these has higher priority, I think both essential
to a container implementation.

Ian


Re: [RFC][PATCH 0/9] Make containers kernel objects

2017-05-23 Thread Ian Kent
On Mon, 2017-05-22 at 09:53 -0700, James Bottomley wrote:
> [Added missing cc to containers list]
> On Mon, 2017-05-22 at 17:22 +0100, David Howells wrote:
> > Here are a set of patches to define a container object for the kernel 
> > and to provide some methods to create and manipulate them.
> > 
> > The reason I think this is necessary is that the kernel has no idea 
> > how to direct upcalls to what userspace considers to be a container -
> > current Linux practice appears to make a "container" just an 
> > arbitrarily chosen junction of namespaces, control groups and files, 
> > which may be changed individually within the "container".
> 
> This sounds like a step in the wrong direction: the strength of the
> current container interfaces in Linux is that people who set up
> containers don't have to agree what they look like.  So I can set up a
> user namespace without a mount namespace or an architecture emulation
> container with only a mount namespace.
> 
> But ignoring my fun foibles with containers and to give a concrete
> example in terms of a popular orchestration system: in kubernetes,
> where certain namespaces are shared across pods, do you imagine the
> kernel's view of the "container" to be the pod or what kubernetes
> thinks of as the container?  This is important, because half the
> examples you give below are network related and usually pods share a
> network namespace.
> 
> > The kernel upcall mechanism then needs to decide which set of 
> > namespaces, etc., it must exec the appropriate upcall program. 
> >  Examples of this include:
> > 
> >  (1) The DNS resolver.  The DNS cache in the kernel should probably 
> > be per-network namespace, but in userspace the program, its
> > libraries and its config data are associated with a mount tree and a 
> > user namespace and it gets run in a particular pid namespace.
> 
> All persistent (written to fs data) has to be mount ns associated;
> there are no ifs, ands and buts to that.  I agree this implies that if
> you want to run a separate network namespace, you either take DNS from
> the parent (a lot of containers do) or you set up a daemon to run
> within the mount namespace.  I agree the latter is a slightly fiddly
> operation you have to get right, but that's why we have orchestration
> systems.
> 
> What is it we could do with the above that we cannot do today?
> 
> >  (2) NFS ID mapper.  The NFS ID mapping cache should also probably be
> >  per-network namespace.
> 
> I think this is a view but not the only one:  Right at the moment, NFS
> ID mapping is used as the one of the ways we can get the user namespace
> ID mapping writes to file problems fixed ... that makes it a property
> of the mount namespace for a lot of containers.  There are many other
> instances where they do exactly as you say, but what I'm saying is that
> we don't want to lose the flexibility we currently have.
> 
> >  (3) nfsdcltrack.  A way for NFSD to access stable storage for 
> > tracking of persistent state.  Again, network-namespace dependent, 
> > but also perhaps mount-namespace dependent.
> 
> So again, given we can set this up to work today, this sounds like more
> a restriction that will bite us than an enhancement that gives us extra
> features.
> 
> >  (4) General request-key upcalls.  Not particularly namespace 
> > dependent, apart from keyrings being somewhat governed by the user
> > namespace and the upcall being configured by the mount namespace.
> 
> All mount namespaces have an owning user namespace, so the data
> relations are already there in the kernel, is the problem simply
> finding them?
> 
> > These patches are built on top of the mount context patchset so that
> > namespaces can be properly propagated over submounts/automounts.
> 
> I'll stop here ... you get the idea that I think this is imposing a set
> of restrictions that will come back to bite us later.  If this is just
> for the sake of figuring out how to get keyring upcalls to work, then
> I'm sure we can come up with something.

You talk about a number of things I'm simply not aware of so I can't answer your
questions. But your points do sound like issues that need to be covered.

I think you mentioned user space used NFS ID mapper works fine.
I wonder, could you give more detail on that please.

Perhaps nsenter(1) is being used, I tried that as a possible usermode helper
solution and it probably did "work" in the sense of in container execution but
no-one liked it, it seems kernel folk expect to do things, well, in kernel.

Not only that there were other problems, probably request key sub system not
being namespace aware, or id caching within nfs or somewhere else, and there was
a question of not being able to cater for user namespace usage.

Anyway I do have a different view from my own experiences.

First there are a number of subsystems involved in creating a process from
within a container that has the container environment and, AFAICS (from the
usermode helper 

Re: [RFC][PATCH 0/9] Make containers kernel objects

2017-05-23 Thread Ian Kent
On Mon, 2017-05-22 at 09:53 -0700, James Bottomley wrote:
> [Added missing cc to containers list]
> On Mon, 2017-05-22 at 17:22 +0100, David Howells wrote:
> > Here are a set of patches to define a container object for the kernel 
> > and to provide some methods to create and manipulate them.
> > 
> > The reason I think this is necessary is that the kernel has no idea 
> > how to direct upcalls to what userspace considers to be a container -
> > current Linux practice appears to make a "container" just an 
> > arbitrarily chosen junction of namespaces, control groups and files, 
> > which may be changed individually within the "container".
> 
> This sounds like a step in the wrong direction: the strength of the
> current container interfaces in Linux is that people who set up
> containers don't have to agree what they look like.  So I can set up a
> user namespace without a mount namespace or an architecture emulation
> container with only a mount namespace.
> 
> But ignoring my fun foibles with containers and to give a concrete
> example in terms of a popular orchestration system: in kubernetes,
> where certain namespaces are shared across pods, do you imagine the
> kernel's view of the "container" to be the pod or what kubernetes
> thinks of as the container?  This is important, because half the
> examples you give below are network related and usually pods share a
> network namespace.
> 
> > The kernel upcall mechanism then needs to decide which set of 
> > namespaces, etc., it must exec the appropriate upcall program. 
> >  Examples of this include:
> > 
> >  (1) The DNS resolver.  The DNS cache in the kernel should probably 
> > be per-network namespace, but in userspace the program, its
> > libraries and its config data are associated with a mount tree and a 
> > user namespace and it gets run in a particular pid namespace.
> 
> All persistent (written to fs data) has to be mount ns associated;
> there are no ifs, ands and buts to that.  I agree this implies that if
> you want to run a separate network namespace, you either take DNS from
> the parent (a lot of containers do) or you set up a daemon to run
> within the mount namespace.  I agree the latter is a slightly fiddly
> operation you have to get right, but that's why we have orchestration
> systems.
> 
> What is it we could do with the above that we cannot do today?
> 
> >  (2) NFS ID mapper.  The NFS ID mapping cache should also probably be
> >  per-network namespace.
> 
> I think this is a view but not the only one:  Right at the moment, NFS
> ID mapping is used as the one of the ways we can get the user namespace
> ID mapping writes to file problems fixed ... that makes it a property
> of the mount namespace for a lot of containers.  There are many other
> instances where they do exactly as you say, but what I'm saying is that
> we don't want to lose the flexibility we currently have.
> 
> >  (3) nfsdcltrack.  A way for NFSD to access stable storage for 
> > tracking of persistent state.  Again, network-namespace dependent, 
> > but also perhaps mount-namespace dependent.
> 
> So again, given we can set this up to work today, this sounds like more
> a restriction that will bite us than an enhancement that gives us extra
> features.
> 
> >  (4) General request-key upcalls.  Not particularly namespace 
> > dependent, apart from keyrings being somewhat governed by the user
> > namespace and the upcall being configured by the mount namespace.
> 
> All mount namespaces have an owning user namespace, so the data
> relations are already there in the kernel, is the problem simply
> finding them?
> 
> > These patches are built on top of the mount context patchset so that
> > namespaces can be properly propagated over submounts/automounts.
> 
> I'll stop here ... you get the idea that I think this is imposing a set
> of restrictions that will come back to bite us later.  If this is just
> for the sake of figuring out how to get keyring upcalls to work, then
> I'm sure we can come up with something.

You talk about a number of things I'm simply not aware of so I can't answer your
questions. But your points do sound like issues that need to be covered.

I think you mentioned user space used NFS ID mapper works fine.
I wonder, could you give more detail on that please.

Perhaps nsenter(1) is being used, I tried that as a possible usermode helper
solution and it probably did "work" in the sense of in container execution but
no-one liked it, it seems kernel folk expect to do things, well, in kernel.

Not only that there were other problems, probably request key sub system not
being namespace aware, or id caching within nfs or somewhere else, and there was
a question of not being able to cater for user namespace usage.

Anyway I do have a different view from my own experiences.

First there are a number of subsystems involved in creating a process from
within a container that has the container environment and, AFAICS (from the
usermode helper 

[PATCH 2/3] autofs - make dev ioctl version and ismountpoint user accessible

2017-05-09 Thread Ian Kent
Some of the autofs miscellaneous device ioctls need to be accessable to
user space applications without CAP_SYS_ADMIN to get information about
autofs mounts.

Start by making the autofs miscellaneous device ioctl header available
and allow applications to use version and ismountpoint ioctls.

Signed-off-by: Ian Kent <ra...@themaw.net>
Cc: Colin Walters <walt...@redhat.com>
Cc: Ondrej Holy <oh...@redhat.com>
Cc: sta...@vger.kernel.org
---
 fs/autofs4/dev-ioctl.c  |   12 
 include/uapi/linux/Kbuild   |1 +
 include/uapi/linux/auto_dev-ioctl.h |2 +-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 9b58d6e..f8cb3f6 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -628,10 +628,6 @@ static int _autofs_dev_ioctl(unsigned int command,
ioctl_fn fn = NULL;
int err = 0;
 
-   /* only root can play with this */
-   if (!capable(CAP_SYS_ADMIN))
-   return -EPERM;
-
cmd_first = _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST);
cmd = _IOC_NR(command);
 
@@ -640,6 +636,14 @@ static int _autofs_dev_ioctl(unsigned int command,
return -ENOTTY;
}
 
+   /* Only root can use ioctls other than AUTOFS_DEV_IOCTL_VERSION_CMD
+* and AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD
+*/
+   if (cmd != AUTOFS_DEV_IOCTL_VERSION_CMD &&
+   cmd != AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD &&
+   !capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
/* Copy the parameters into kernel space. */
param = copy_dev_ioctl(user);
if (IS_ERR(param))
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 662c592..1f22bbb 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -61,6 +61,7 @@ header-y += atm_zatm.h
 header-y += audit.h
 header-y += auto_fs4.h
 header-y += auto_fs.h
+header-y += auto_dev-ioctl.h
 header-y += auxvec.h
 header-y += ax25.h
 header-y += b1lli.h
diff --git a/include/uapi/linux/auto_dev-ioctl.h 
b/include/uapi/linux/auto_dev-ioctl.h
index 744b3d0..5558db8 100644
--- a/include/uapi/linux/auto_dev-ioctl.h
+++ b/include/uapi/linux/auto_dev-ioctl.h
@@ -16,7 +16,7 @@
 #define AUTOFS_DEVICE_NAME "autofs"
 
 #define AUTOFS_DEV_IOCTL_VERSION_MAJOR 1
-#define AUTOFS_DEV_IOCTL_VERSION_MINOR 0
+#define AUTOFS_DEV_IOCTL_VERSION_MINOR 1
 
 #define AUTOFS_DEV_IOCTL_SIZE  sizeof(struct autofs_dev_ioctl)
 



[PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-05-09 Thread Ian Kent
The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
of an automount by the call. But this flag is unconditionally cleared
for all stat family system calls except statx().

stat family system calls have always triggered mount requests for the
negative dentry case in follow_automount() which is intended but prevents
the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.

In order to handle the AT_NO_AUTOMOUNT for both system calls the
negative dentry case in follow_automount() needs to be changed to
return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
required flags are clear).

AFAICT this change doesn't have any noticable side effects and may,
in some use cases (although I didn't see it in testing) prevent
unnecessary callbacks to the automount daemon.

It's also possible that a stat family call has been made with a
path that is in the process of being mounted by some other process.
But stat family calls should return the automount state of the path
as it is "now" so it shouldn't wait for mount completion.

This is the same semantic as the positive dentry case already
handled.

Signed-off-by: Ian Kent <ra...@themaw.net>
Cc: David Howells <dhowe...@redhat.com>
Cc: Colin Walters <walt...@redhat.com>
Cc: Ondrej Holy <oh...@redhat.com>
Cc: sta...@vger.kernel.org
---
 fs/namei.c |   15 ---
 include/linux/fs.h |3 +--
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 7286f87..cd74838 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1129,9 +1129,18 @@ static int follow_automount(struct path *path, struct 
nameidata *nd,
 * of the daemon to instantiate them before they can be used.
 */
if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
-  LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
-   path->dentry->d_inode)
-   return -EISDIR;
+  LOOKUP_OPEN | LOOKUP_CREATE |
+  LOOKUP_AUTOMOUNT))) {
+   /* Positive dentry that isn't meant to trigger an
+* automount, EISDIR will allow it to be used,
+* otherwise there's no mount here "now" so return
+* ENOENT.
+*/
+   if (path->dentry->d_inode)
+   return -EISDIR;
+   else
+   return -ENOENT;
+   }
 
if (path->dentry->d_sb->s_user_ns != _user_ns)
return -EACCES;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 26488b4..be09684 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2935,8 +2935,7 @@ static inline int vfs_lstat(const char __user *name, 
struct kstat *stat)
 static inline int vfs_fstatat(int dfd, const char __user *filename,
  struct kstat *stat, int flags)
 {
-   return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
-stat, STATX_BASIC_STATS);
+   return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
 }
 static inline int vfs_fstat(int fd, struct kstat *stat)
 {



[PATCH 1/3] autofs - make disc device user accessible

2017-05-09 Thread Ian Kent
The autofs miscellanous device ioctls that shouldn't require
CAP_SYS_ADMIN need to be accessible to user space applications in
order to be able to get information about autofs mounts.

The module checks capabilities so the miscelaneous device should
be fine with broad permissions.

Signed-off-by: Ian Kent <ra...@themaw.net>
Cc: Colin Walters <walt...@redhat.com>
Cc: Ondrej Holy <oh...@redhat.com>
Cc: sta...@vger.kernel.org
---
 fs/autofs4/dev-ioctl.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 734cbf8..9b58d6e 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -733,7 +733,8 @@ static const struct file_operations _dev_ioctl_fops = {
 static struct miscdevice _autofs_dev_ioctl_misc = {
.minor  = AUTOFS_MINOR,
.name   = AUTOFS_DEVICE_NAME,
-   .fops   = &_dev_ioctl_fops
+   .fops   = &_dev_ioctl_fops,
+   .mode   = 0666
 };
 
 MODULE_ALIAS_MISCDEV(AUTOFS_MINOR);



[PATCH 2/3] autofs - make dev ioctl version and ismountpoint user accessible

2017-05-09 Thread Ian Kent
Some of the autofs miscellaneous device ioctls need to be accessable to
user space applications without CAP_SYS_ADMIN to get information about
autofs mounts.

Start by making the autofs miscellaneous device ioctl header available
and allow applications to use version and ismountpoint ioctls.

Signed-off-by: Ian Kent 
Cc: Colin Walters 
Cc: Ondrej Holy 
Cc: sta...@vger.kernel.org
---
 fs/autofs4/dev-ioctl.c  |   12 
 include/uapi/linux/Kbuild   |1 +
 include/uapi/linux/auto_dev-ioctl.h |2 +-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 9b58d6e..f8cb3f6 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -628,10 +628,6 @@ static int _autofs_dev_ioctl(unsigned int command,
ioctl_fn fn = NULL;
int err = 0;
 
-   /* only root can play with this */
-   if (!capable(CAP_SYS_ADMIN))
-   return -EPERM;
-
cmd_first = _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST);
cmd = _IOC_NR(command);
 
@@ -640,6 +636,14 @@ static int _autofs_dev_ioctl(unsigned int command,
return -ENOTTY;
}
 
+   /* Only root can use ioctls other than AUTOFS_DEV_IOCTL_VERSION_CMD
+* and AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD
+*/
+   if (cmd != AUTOFS_DEV_IOCTL_VERSION_CMD &&
+   cmd != AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD &&
+   !capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
/* Copy the parameters into kernel space. */
param = copy_dev_ioctl(user);
if (IS_ERR(param))
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 662c592..1f22bbb 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -61,6 +61,7 @@ header-y += atm_zatm.h
 header-y += audit.h
 header-y += auto_fs4.h
 header-y += auto_fs.h
+header-y += auto_dev-ioctl.h
 header-y += auxvec.h
 header-y += ax25.h
 header-y += b1lli.h
diff --git a/include/uapi/linux/auto_dev-ioctl.h 
b/include/uapi/linux/auto_dev-ioctl.h
index 744b3d0..5558db8 100644
--- a/include/uapi/linux/auto_dev-ioctl.h
+++ b/include/uapi/linux/auto_dev-ioctl.h
@@ -16,7 +16,7 @@
 #define AUTOFS_DEVICE_NAME "autofs"
 
 #define AUTOFS_DEV_IOCTL_VERSION_MAJOR 1
-#define AUTOFS_DEV_IOCTL_VERSION_MINOR 0
+#define AUTOFS_DEV_IOCTL_VERSION_MINOR 1
 
 #define AUTOFS_DEV_IOCTL_SIZE  sizeof(struct autofs_dev_ioctl)
 



[PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-05-09 Thread Ian Kent
The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
of an automount by the call. But this flag is unconditionally cleared
for all stat family system calls except statx().

stat family system calls have always triggered mount requests for the
negative dentry case in follow_automount() which is intended but prevents
the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.

In order to handle the AT_NO_AUTOMOUNT for both system calls the
negative dentry case in follow_automount() needs to be changed to
return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
required flags are clear).

AFAICT this change doesn't have any noticable side effects and may,
in some use cases (although I didn't see it in testing) prevent
unnecessary callbacks to the automount daemon.

It's also possible that a stat family call has been made with a
path that is in the process of being mounted by some other process.
But stat family calls should return the automount state of the path
as it is "now" so it shouldn't wait for mount completion.

This is the same semantic as the positive dentry case already
handled.

Signed-off-by: Ian Kent 
Cc: David Howells 
Cc: Colin Walters 
Cc: Ondrej Holy 
Cc: sta...@vger.kernel.org
---
 fs/namei.c |   15 ---
 include/linux/fs.h |3 +--
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 7286f87..cd74838 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1129,9 +1129,18 @@ static int follow_automount(struct path *path, struct 
nameidata *nd,
 * of the daemon to instantiate them before they can be used.
 */
if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
-  LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
-   path->dentry->d_inode)
-   return -EISDIR;
+  LOOKUP_OPEN | LOOKUP_CREATE |
+  LOOKUP_AUTOMOUNT))) {
+   /* Positive dentry that isn't meant to trigger an
+* automount, EISDIR will allow it to be used,
+* otherwise there's no mount here "now" so return
+* ENOENT.
+*/
+   if (path->dentry->d_inode)
+   return -EISDIR;
+   else
+   return -ENOENT;
+   }
 
if (path->dentry->d_sb->s_user_ns != _user_ns)
return -EACCES;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 26488b4..be09684 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2935,8 +2935,7 @@ static inline int vfs_lstat(const char __user *name, 
struct kstat *stat)
 static inline int vfs_fstatat(int dfd, const char __user *filename,
  struct kstat *stat, int flags)
 {
-   return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
-stat, STATX_BASIC_STATS);
+   return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
 }
 static inline int vfs_fstat(int fd, struct kstat *stat)
 {



<    1   2   3   4   5   6   7   8   9   10   >