Re: [ANNOUNCE] autofs 5.1.2 release

2017-12-19 Thread Ian Kent
On 20/12/17 13:52, Ian Kent wrote:
> On 20/12/17 11:29, NeilBrown wrote:
>>
>> Hi Ian,
>>  I've been looking at:
>>
>>> - add configuration option to use fqdn in mounts.
>>
>> (commit 9aeef772604) because using this new option causes a regression.
>> If you are using the "replicated server" functionality, then
>>   use_hostname_for_mounts = yes
>> completely disables it.
> 
> Yes, that's not quite right.
> 
> It disables the probe and proximity check for each distinct host
> name used.
> 
> Each of the entries in the list of hosts should still be
> attempted and given that NFS ping is also now used in the NFS
> mount module what's lost is the preferred ordering of the hosts
> list.

Mmm  that's also not right.

An NFS ping is only done on failed local bind mount to check
the NFS server is running on the local machine.

So that availability check needs to be done at mount time if
the proximity check is not done 

Ian


Re: [ANNOUNCE] autofs 5.1.2 release

2017-12-19 Thread Ian Kent
On 20/12/17 14:10, Ian Kent wrote:
> On 20/12/17 13:52, Ian Kent wrote:
>> On 20/12/17 11:29, NeilBrown wrote:
>>>
>>> Hi Ian,
>>>  I've been looking at:
>>>
>>>> - add configuration option to use fqdn in mounts.
>>>
>>> (commit 9aeef772604) because using this new option causes a regression.
>>> If you are using the "replicated server" functionality, then
>>>   use_hostname_for_mounts = yes
>>> completely disables it.
>>
>> Yes, that's not quite right.
>>
>> It disables the probe and proximity check for each distinct host
>> name used.
>>
>> Each of the entries in the list of hosts should still be
>> attempted and given that NFS ping is also now used in the NFS
>> mount module what's lost is the preferred ordering of the hosts
>> list.
>>
>>>
>>> This is caused by:
>>>
>>> diff --git a/modules/replicated.c b/modules/replicated.c
>>> index 32860d5fe245..8437f5f3d5b2 100644
>>> --- a/modules/replicated.c
>>> +++ b/modules/replicated.c
>>> @@ -667,6 +667,12 @@ int prune_host_list(unsigned logopt, struct host 
>>> **list,
>>> if (!*list)
>>> return 0;
>>>  
>>> +   /* If we're using the host name then there's no point probing
>>> +* avialability and respose time.
>>> +*/
>>> +   if (defaults_use_hostname_for_mounts())
>>> +   return 1;
>>> +
>>> /* Use closest hosts to choose NFS version */
>>>
>>> My question is: why what this particular change made.
>>
>> It was a while ago but there were complains about using the IP
>> address for mounts. It was requested to provide a way to prevent
>> that and force the use of the host name in mounts.
>>
>>> Why can't prune_host_list() be allowed to do it's thing
>>> when use_hostname_for_mounts is set.
>>
>> We could if each host name resolved to a single IP address.
>>
>> I'd need to check that use_hostname_for_mounts doesn't get
>> in the road but the host struct should have ->rr set to true
>> if it has multiple addresses so changing it to work the way
>> your recommending shouldn't be hard. I think there's a couple
>> of places that would need to be checked.
>>
>> If the host does resolve to multiple addresses the situation
>> is different. There's no way to stop the actual mount from
>> trying an IP address that's not responding and proximity
>> doesn't make sense either again because every time a lookup
>> is done on the host name (eg. at mount time) the next address
>> in its list will be returned which can and usually is different
>> from what would have been checked.
>>
>>> I understand that it would be pointless choosing between
>>> the different interfaces of a multi-homed host, but there is still value
>>> in choosing between multiple distinct hosts.
>>>
>>> What, if anything, might go wrong if I simply reverse this chunk of the
>>> patch?
>>
>> You'll get IP addresses in the logs in certain cases but that
>> should be all.
>>
>> It would probably be better to ensure that the checks are done
>> if the host name resolves to a single IP address.
> 
> I think that should be "if the host names in the list each resolve
> to a single IP address", otherwise the round robin behavior would
> probably still get in the road.

I think maybe this is sufficient 

autofs-5.1.4 - use proximity check if all host names are simple

From: Ian Kent 

Currently if the configuration option use_hostname_for_mounts is
set then the proximity calcualtion is not done for the list of
hosts.

But if each host name in the host list resolves to a single IP
address then performing the proximity check still makes sense.

Signed-off-by: Ian Kent 
---
 modules/replicated.c |   32 ++--
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/modules/replicated.c b/modules/replicated.c
index 3ac4c70f..e5c2276d 100644
--- a/modules/replicated.c
+++ b/modules/replicated.c
@@ -711,6 +711,24 @@ done:
return 0;
 }
 
+static unsigned int is_hosts_list_simple(struct host *list)
+{
+   struct host *this = list;
+   unsigned int ret = 1;
+
+   while (this) {
+   struct host *next = this->next;
+
+   if (this->rr) {
+   ret = 0;
+   break;
+   }
+   this = next;
+   }
+
+   return ret;

Re: [ANNOUNCE] autofs 5.1.2 release

2017-12-19 Thread Ian Kent
On 20/12/17 13:52, Ian Kent wrote:
> On 20/12/17 11:29, NeilBrown wrote:
>>
>> Hi Ian,
>>  I've been looking at:
>>
>>> - add configuration option to use fqdn in mounts.
>>
>> (commit 9aeef772604) because using this new option causes a regression.
>> If you are using the "replicated server" functionality, then
>>   use_hostname_for_mounts = yes
>> completely disables it.
> 
> Yes, that's not quite right.
> 
> It disables the probe and proximity check for each distinct host
> name used.
> 
> Each of the entries in the list of hosts should still be
> attempted and given that NFS ping is also now used in the NFS
> mount module what's lost is the preferred ordering of the hosts
> list.
> 
>>
>> This is caused by:
>>
>> diff --git a/modules/replicated.c b/modules/replicated.c
>> index 32860d5fe245..8437f5f3d5b2 100644
>> --- a/modules/replicated.c
>> +++ b/modules/replicated.c
>> @@ -667,6 +667,12 @@ int prune_host_list(unsigned logopt, struct host **list,
>> if (!*list)
>> return 0;
>>  
>> +   /* If we're using the host name then there's no point probing
>> +* avialability and respose time.
>> +*/
>> +   if (defaults_use_hostname_for_mounts())
>> +   return 1;
>> +
>> /* Use closest hosts to choose NFS version */
>>
>> My question is: why what this particular change made.
> 
> It was a while ago but there were complains about using the IP
> address for mounts. It was requested to provide a way to prevent
> that and force the use of the host name in mounts.
> 
>> Why can't prune_host_list() be allowed to do it's thing
>> when use_hostname_for_mounts is set.
> 
> We could if each host name resolved to a single IP address.
> 
> I'd need to check that use_hostname_for_mounts doesn't get
> in the road but the host struct should have ->rr set to true
> if it has multiple addresses so changing it to work the way
> your recommending shouldn't be hard. I think there's a couple
> of places that would need to be checked.
> 
> If the host does resolve to multiple addresses the situation
> is different. There's no way to stop the actual mount from
> trying an IP address that's not responding and proximity
> doesn't make sense either again because every time a lookup
> is done on the host name (eg. at mount time) the next address
> in its list will be returned which can and usually is different
> from what would have been checked.
> 
>> I understand that it would be pointless choosing between
>> the different interfaces of a multi-homed host, but there is still value
>> in choosing between multiple distinct hosts.
>>
>> What, if anything, might go wrong if I simply reverse this chunk of the
>> patch?
> 
> You'll get IP addresses in the logs in certain cases but that
> should be all.
> 
> It would probably be better to ensure that the checks are done
> if the host name resolves to a single IP address.

I think that should be "if the host names in the list each resolve
to a single IP address", otherwise the round robin behavior would
probably still get in the road.

Ian


Re: [ANNOUNCE] autofs 5.1.2 release

2017-12-19 Thread Ian Kent
On 20/12/17 11:29, NeilBrown wrote:
> 
> Hi Ian,
>  I've been looking at:
> 
>> - add configuration option to use fqdn in mounts.
> 
> (commit 9aeef772604) because using this new option causes a regression.
> If you are using the "replicated server" functionality, then
>   use_hostname_for_mounts = yes
> completely disables it.

Yes, that's not quite right.

It disables the probe and proximity check for each distinct host
name used.

Each of the entries in the list of hosts should still be
attempted and given that NFS ping is also now used in the NFS
mount module what's lost is the preferred ordering of the hosts
list.

> 
> This is caused by:
> 
> diff --git a/modules/replicated.c b/modules/replicated.c
> index 32860d5fe245..8437f5f3d5b2 100644
> --- a/modules/replicated.c
> +++ b/modules/replicated.c
> @@ -667,6 +667,12 @@ int prune_host_list(unsigned logopt, struct host **list,
> if (!*list)
> return 0;
>  
> +   /* If we're using the host name then there's no point probing
> +* avialability and respose time.
> +*/
> +   if (defaults_use_hostname_for_mounts())
> +   return 1;
> +
> /* Use closest hosts to choose NFS version */
> 
> My question is: why what this particular change made.

It was a while ago but there were complains about using the IP
address for mounts. It was requested to provide a way to prevent
that and force the use of the host name in mounts.

> Why can't prune_host_list() be allowed to do it's thing
> when use_hostname_for_mounts is set.

We could if each host name resolved to a single IP address.

I'd need to check that use_hostname_for_mounts doesn't get
in the road but the host struct should have ->rr set to true
if it has multiple addresses so changing it to work the way
your recommending shouldn't be hard. I think there's a couple
of places that would need to be checked.

If the host does resolve to multiple addresses the situation
is different. There's no way to stop the actual mount from
trying an IP address that's not responding and proximity
doesn't make sense either again because every time a lookup
is done on the host name (eg. at mount time) the next address
in its list will be returned which can and usually is different
from what would have been checked.

> I understand that it would be pointless choosing between
> the different interfaces of a multi-homed host, but there is still value
> in choosing between multiple distinct hosts.
> 
> What, if anything, might go wrong if I simply reverse this chunk of the
> patch?

You'll get IP addresses in the logs in certain cases but that
should be all.

It would probably be better to ensure that the checks are done
if the host name resolves to a single IP address.

Ian


Re: [ANNOUNCE] autofs 5.1.4 release

2017-12-19 Thread Ian Kent
On 19/12/17 17:45, Patrick Goetz wrote:
> On 12/19/2017 02:29 AM, Ian Kent wrote:
>> A new amd mount type has been added. The amd mount type "program"
>> has been added which allows a user provided program to be given
>> in a map entry that will be used for performing the mount and
>> optionally another program to be used at umount.
>>
> 
> What's a use case for this?

It was requested and it is one of the standard amd mount types
that hadn't been implemented. If nothing else it makes a slightly
more complete amd map format implementation.

I've seen a case where someone used a script to customize a mount
command to suit different environments, where the mount command
needed to be slightly different for different platforms. Probably
not so useful to autofs but am-utils can run on a number of different
OSes so the amd maps used by autofs (ideally unchanged from other arches)
need to be able to cope with this.

There's also the case where a mount needs to use a custom command to
perform the mount (and possibly the umount at expire).

Long ago this was requested for Sun format maps but I'm still not
in a position to implement that (and it isn't part of the defined
Sun format map syntax either).

Ian


[ANNOUNCE] autofs 5.1.4 release

2017-12-19 Thread Ian Kent
Hi all,

It's finally time for a release, autofs-5.1.4.

Unfortunately there were a couple of annoying regressions in 5.1.3.

I believe they have been resolved in this release.

Also, the startup retry on the master map not available logic has
been improved so that should function properly now and help with
reported problems of startup failures. The delay is configurable
in case the default is not sufficient.

A new amd mount type has been added. The amd mount type "program"
has been added which allows a user provided program to be given
in a map entry that will be used for performing the mount and
optionally another program to be used at umount.

Additionally there are a number of bug fixes and other minor
improvements.

autofs
==

The package can be found at:
https://www.kernel.org/pub/linux/daemons/autofs/v5/

It is autofs-5.1.4.tar.[gz|xz]

No source rpm is there as it can be produced by using:

rpmbuild -ts autofs-5.1.4.tar.gz

and the binary rpm by using:

rpmbuild -tb autofs-5.1.4.tar.gz

See the README.amd-maps file for information about using amd format
maps.

Here are the entries from the CHANGELOG which outline the updates:

19/12/2017 autofs-5.1.4
- fix spec file url.
- fix unset tsd group name handling.
- Add -c option when calling /bin/umount - if supported.
- remove some redundant rpc library code.
- add port parameter to rpc_ping().
- dont probe NFSv2 by default.
- add version parameter to rpc_ping().
- fix typo in autofs config file comments.
- fix typos in autofs man pages.
- use pkg-config to search for libtirpc to fix cross-compilation.
- fix incorrect status return in get_nfs_info().
- fix a couple of compiler warnings.
- set systemd KillMode to process.
- fix mount.nfs blocks on first mount.
- fix some man page problems.
- add some more debug logging to get_nfs_info().
- add some more debug logging to get_supported_ver_and_cost().
- fix ipv6 proto option handling.
- also check flag file exe name.
- fix possible map instance memory leak.
- check map instances for staleness on map update.
- allow dot in OPTIONSTR value lexer pattern.
- fix autofs_use_lofs description.
- fix amd parser error buffer size.
- make spawn_bind_mount() use mount_wait as well.
- document ghost option in auto.master man page.
- only take master map mutex for master map update.
- revert fix argc off by one in mount_autofs.c.
- reset master map list on startup retry.
- fix nisplus lookup init not configured check.
- make open_lookup() error handling more consistent.
- be silent about sss library not found.
- be silent about nis domain not set.
- make map source reference message debug only.
- improve description of mount_nfs_default_protocol.
- the port option should not behave like nobind option.
- handle additional nfs versions in mount_nfs.c.
- fix symlink option passthrough in mount_nfs.c.
- fix ordering of seteuid/setegid in do_spawn().
- update configure to check for pipe2(2).
- fix open calls not using open_() calls.
- move open_() functions to spawn.c.
- serialize calls to open_() functions.
- improve debug logging of lookup key.
- fix typo in amd_parse.c.
- add missing MODPREFIX to logging in amd parser.
- fix symlink false negative in umount_multi().
- remove expand_selectors() on amd parser entry.
- fix amd defaults map entry handling.
- refactor amd_parse.c.
- fix amd parser double quote handling.
- fix expandamdent() quote handling.
- fix possible memory leak during amd parse.
- remove path restriction of amd external mount.
- add function umount_amd_ext_mount().
- add function ext_mount_inuse().
- add function construct_argv().
- add amd mount type program mount support.
- fix memory leak in umount_amd_ext_mount().
- fix strerror_r() parameter declaration in do program_mount().
- fix incorrect check in validate_program_options().
- fix log message in spawn_umount().
- workaround getaddrinfo(3) ai_canonname bug

Ian


Re: [PATCH 0/2] userns: automount cleanups

2017-11-30 Thread Ian Kent
On 30/11/17 13:21, Eric W. Biederman wrote:
> Ian Kent  writes:
> 
>> On 30/11/17 08:01, Eric W. Biederman wrote:
>>>
>>> While reviewing some code I realized that in getting d_automount working
>>> with s_user_ns I had left behind some unnecessary relics of the blind
>>> path I started down.  Here are two patches that remove those relics.
>>>
>>> Unless someone has another preference I will drop them in my userns tree
>>> and merge them that way.
>>
>> I saw the "->s_user_ns != &init_user_ns" and wondered if that would
>> trigger for automount(8) run entirely with a container (eg. docker)?
> 
> autofs still needs FS_USERNS_MOUNT before you can reach that point.  But
> docker does have a mode ?--userns-remap? where it sets up the containers
> mounts that way.
> 
> I think in principle that should work and be safe.  I don't know how
> robust autofs is against malicious users.  Which is the question to ask
> before actually adding FS_USERNS_MOUNT in struct file_system_type.

automount(8) thinks it is running as uid 0 (which it is in the container)
so this would require a bit of thought since automount(8) doesn't take
special precautions.

The restrictions, in the case of running entirely within a container, are
those of the container itself. For example, if there are no granular
capabilities available then the admin needs to run the container as
privileged in order to be able to mount NFS file systems, which is a
common usage case. 

Ian


Re: [PATCH 0/2] userns: automount cleanups

2017-11-29 Thread Ian Kent
On 30/11/17 08:01, Eric W. Biederman wrote:
> 
> While reviewing some code I realized that in getting d_automount working
> with s_user_ns I had left behind some unnecessary relics of the blind
> path I started down.  Here are two patches that remove those relics.
> 
> Unless someone has another preference I will drop them in my userns tree
> and merge them that way.

I saw the "->s_user_ns != &init_user_ns" and wondered if that would
trigger for automount(8) run entirely with a container (eg. docker)?

Anyway, it's gone now, so ACK to these two, thanks Eric.

> 
> Eric W. Biederman (2):
>   userns: Don't fail follow_automount based on s_user_ns
>   autofs4: Modify autofs_wait to use current_uid() and current_gid()
> 
>  fs/autofs4/waitq.c | 4 ++--
>  fs/namei.c | 3 ---
>  2 files changed, 2 insertions(+), 5 deletions(-)
> 
> 



Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-29 Thread Ian Kent
On 29/11/17 15:39, NeilBrown wrote:
> On Wed, Nov 29 2017, Ian Kent wrote:
> 
>> On 29/11/17 11:45, NeilBrown wrote:
>>> On Wed, Nov 29 2017, Ian Kent wrote:
>>>
>>>> Adding Al Viro to the Cc list as I believe Stephen Whitehouse and
>>>> Al have discussed something similar, please feel free to chime in
>>>> with your thoughts Al.
>>>>
>>>> On 29/11/17 09:17, NeilBrown wrote:
>>>>> On Tue, Nov 28 2017, Mike Marion wrote:
>>>>>
>>>>>> On Tue, Nov 28, 2017 at 07:43:05AM +0800, Ian Kent wrote:
>>>>>>
>>>>>>> I think the situation is going to get worse before it gets better.
>>>>>>>
>>>>>>> On recent Fedora and kernel, with a large map and heavy mount activity
>>>>>>> I see:
>>>>>>>
>>>>>>> systemd, udisksd, gvfs-udisks2-volume-monitor, gvfsd-trash,
>>>>>>> gnome-settings-daemon, packagekitd and gnome-shell
>>>>>>>
>>>>>>> all go crazy consuming large amounts of CPU.
>>>>>>
>>>>>> Yep.  I'm not even worried about the CPU usage as much (yet, I'm sure 
>>>>>> it'll be more of a problem as time goes on).  We have pretty huge
>>>>>> direct maps and our initial startup tests on a new host with the link vs
>>>>>> file took >6 hours.  That's not a typo.  We worked with Suse engineering 
>>>>>> to come up with a fix, which should've been pushed here some time ago.
>>>>>>
>>>>>> Then, there's shutdowns (and reboots). They also took a long time (on
>>>>>> the order of 20+min) because it would walk the entire /proc/mounts
>>>>>> "unmounting" things.  Also fixed now.  That one had something to do in
>>>>>> SMP code as if you used a single CPU/core, it didn't take long at all.
>>>>>>
>>>>>> Just got a fix for the suse grub2-mkconfig script to fix their parsing 
>>>>>> looking for the root dev to skip over fstype autofs
>>>>>> (probe_nfsroot_device function).
>>>>>>
>>>>>>> The symlink change was probably the start, now a number of applications
>>>>>>> now got directly to the proc file system for this information.
>>>>>>>
>>>>>>> For large mount tables and many processes accessing the mount table
>>>>>>> (probably reading the whole thing, either periodically or on change
>>>>>>> notification) the current system does not scale well at all.
>>>>>>
>>>>>> We use Clearcase in some instances as well, and that's yet another thing
>>>>>> adding mounts, and its startup is very slow, due to the size of
>>>>>> /proc/mounts.  
>>>>>>
>>>>>> It's definitely something that's more than just autofs and probably
>>>>>> going to get worse, as you say.
>>>>>
>>>>> If we assume that applications are going to want to read
>>>>> /proc/self/mount* a log, we probably need to make it faster.
>>>>> I performed a simple experiment where I mounted 1000 tmpfs filesystems,
>>>>> copied /proc/self/mountinfo to /tmp/mountinfo, then
>>>>> ran 4 for loops in parallel catting one of these files to /dev/null 1000 
>>>>> times.
>>>>> On a single CPU VM:
>>>>>   For /tmp/mountinfo, each group of 1000 cats took about 3 seconds.
>>>>>   For /proc/self/mountinfo, each group of 1000 cats took about 14 seconds.
>>>>> On a 4 CPU VM
>>>>>   /tmp/mountinfo: 1.5secs
>>>>>   /proc/self/mountinfo: 3.5 secs
>>>>>
>>>>> Using "perf record" it appears that most of the cost is repeated calls
>>>>> to prepend_path, with a small contribution from the fact that each read
>>>>> only returns 4K rather than the 128K that cat asks for.
>>>>>
>>>>> If we could hang a cache off struct mnt_namespace and use it instead of
>>>>> iterating the mount table - using rcu and ns->event to ensure currency -
>>>>> we should be able to minimize the cost of this increased use of
>>>>> /proc/self/mount*.
>>>>>
>>>>> I suspect that the best approach would be implement a cache at the
>>>>

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-28 Thread Ian Kent
On 29/11/17 11:45, NeilBrown wrote:
> On Wed, Nov 29 2017, Ian Kent wrote:
> 
>> Adding Al Viro to the Cc list as I believe Stephen Whitehouse and
>> Al have discussed something similar, please feel free to chime in
>> with your thoughts Al.
>>
>> On 29/11/17 09:17, NeilBrown wrote:
>>> On Tue, Nov 28 2017, Mike Marion wrote:
>>>
>>>> On Tue, Nov 28, 2017 at 07:43:05AM +0800, Ian Kent wrote:
>>>>
>>>>> I think the situation is going to get worse before it gets better.
>>>>>
>>>>> On recent Fedora and kernel, with a large map and heavy mount activity
>>>>> I see:
>>>>>
>>>>> systemd, udisksd, gvfs-udisks2-volume-monitor, gvfsd-trash,
>>>>> gnome-settings-daemon, packagekitd and gnome-shell
>>>>>
>>>>> all go crazy consuming large amounts of CPU.
>>>>
>>>> Yep.  I'm not even worried about the CPU usage as much (yet, I'm sure 
>>>> it'll be more of a problem as time goes on).  We have pretty huge
>>>> direct maps and our initial startup tests on a new host with the link vs
>>>> file took >6 hours.  That's not a typo.  We worked with Suse engineering 
>>>> to come up with a fix, which should've been pushed here some time ago.
>>>>
>>>> Then, there's shutdowns (and reboots). They also took a long time (on
>>>> the order of 20+min) because it would walk the entire /proc/mounts
>>>> "unmounting" things.  Also fixed now.  That one had something to do in
>>>> SMP code as if you used a single CPU/core, it didn't take long at all.
>>>>
>>>> Just got a fix for the suse grub2-mkconfig script to fix their parsing 
>>>> looking for the root dev to skip over fstype autofs
>>>> (probe_nfsroot_device function).
>>>>
>>>>> The symlink change was probably the start, now a number of applications
>>>>> now got directly to the proc file system for this information.
>>>>>
>>>>> For large mount tables and many processes accessing the mount table
>>>>> (probably reading the whole thing, either periodically or on change
>>>>> notification) the current system does not scale well at all.
>>>>
>>>> We use Clearcase in some instances as well, and that's yet another thing
>>>> adding mounts, and its startup is very slow, due to the size of
>>>> /proc/mounts.  
>>>>
>>>> It's definitely something that's more than just autofs and probably
>>>> going to get worse, as you say.
>>>
>>> If we assume that applications are going to want to read
>>> /proc/self/mount* a log, we probably need to make it faster.
>>> I performed a simple experiment where I mounted 1000 tmpfs filesystems,
>>> copied /proc/self/mountinfo to /tmp/mountinfo, then
>>> ran 4 for loops in parallel catting one of these files to /dev/null 1000 
>>> times.
>>> On a single CPU VM:
>>>   For /tmp/mountinfo, each group of 1000 cats took about 3 seconds.
>>>   For /proc/self/mountinfo, each group of 1000 cats took about 14 seconds.
>>> On a 4 CPU VM
>>>   /tmp/mountinfo: 1.5secs
>>>   /proc/self/mountinfo: 3.5 secs
>>>
>>> Using "perf record" it appears that most of the cost is repeated calls
>>> to prepend_path, with a small contribution from the fact that each read
>>> only returns 4K rather than the 128K that cat asks for.
>>>
>>> If we could hang a cache off struct mnt_namespace and use it instead of
>>> iterating the mount table - using rcu and ns->event to ensure currency -
>>> we should be able to minimize the cost of this increased use of
>>> /proc/self/mount*.
>>>
>>> I suspect that the best approach would be implement a cache at the
>>> seq_file level.
>>>
>>> One possible problem might be if applications assume that a read will
>>> always return a whole number of lines (it currently does).  To be
>>> sure we remain safe, we would only be able to use the cache for
>>> a read() syscall which reads the whole file.
>>> How big do people see /proc/self/mount* getting?  What size reads
>>> does 'strace' show the various programs using to read it?
>>
>> Buffer size almost always has a significant impact on IO so that's
>> likely a big factor but the other aspect of this is notification
>> of changes.
>>
&g

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-28 Thread Ian Kent
On 29/11/17 10:48, NeilBrown wrote:
> On Wed, Nov 29 2017, Ian Kent wrote:
> 
>> On 29/11/17 10:13, Mike Marion wrote:
>>> On Wed, Nov 29, 2017 at 12:17:27PM +1100, NeilBrown wrote:
>>>
>>>> How big do people see /proc/self/mount* getting?  What size reads
>>>> does 'strace' show the various programs using to read it?
>>>
>>> We already have line counts into 5 figures.  This wasn't an issue until 
>>> the change of /etc/mtab to a link.  The large count is due to our large
>>> direct automount maps.
>>>
> 
> So  90,000 lines with a length of may 120 chars or about 10Meg.
> Presumably these machines would have so many gigabytes of RAM that
> caching a 10M mountinfo file would go unnoticed?
> 
> Reading that in 128K chunks without generating bits on the fly will help
> a lot I suspect.
> 
> We could probably ensure proper alignment by searching backwards for
> '\n' when deciding how much to return for a read.
> 
>>
>> And, admittedly, the testing I was doing was with 15k+ size maps.
>>
>> Of course it's necessary to have this number of mounts to see serious
>> problems which is easiest to do with large direct mount maps.
>>
>> The thing that's different now is that before applications started
>> using /proc directly for mount table information using mount(2)
>> instead of mount(8) was enough to prevent the mount entries from
>> being added to the table seen by applications.
> 
> I wonder who would notice if untriggered direct mounts quietly disappeared 
> from
> /proc/mounts...  I suspect systemd would, but there is a good chance it
> would fail-safe: assume that the mount worked.
> Alternately we could introduce /proc/self/mountinfo2 which doesn't list
> direct automounts and encourage problematic programs to use that where
> available.

Personally I think the proc filesystem tables should be a full representation
of the mounted mounts. That needs to available somewhere.

autofs still uses the proc mounts in one situation (there might be more I've
missed in eliminating this) and other applications probably do too (I have
patches to eliminate this one but they don't fit well with the current
implementation).

One approach I tried was to write a simple file system to mount over
/etc/mtab (et. al.) and filter entries returned based on simple filter
rules.

That file system is not complete by any means and I couldn't get a feel
for how effective it would be due to the direct access to the proc mount
tables done by applications.

OTOH, if applications could be persuaded to use /etc/(mtab|mounts|mountinfo)
and only consult the proc tables when absolutely necessary this pseudo file
system would be an ideal place for an entry cache 

Ian


Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-28 Thread Ian Kent

Adding Al Viro to the Cc list as I believe Stephen Whitehouse and
Al have discussed something similar, please feel free to chime in
with your thoughts Al.

On 29/11/17 09:17, NeilBrown wrote:
> On Tue, Nov 28 2017, Mike Marion wrote:
> 
>> On Tue, Nov 28, 2017 at 07:43:05AM +0800, Ian Kent wrote:
>>
>>> I think the situation is going to get worse before it gets better.
>>>
>>> On recent Fedora and kernel, with a large map and heavy mount activity
>>> I see:
>>>
>>> systemd, udisksd, gvfs-udisks2-volume-monitor, gvfsd-trash,
>>> gnome-settings-daemon, packagekitd and gnome-shell
>>>
>>> all go crazy consuming large amounts of CPU.
>>
>> Yep.  I'm not even worried about the CPU usage as much (yet, I'm sure 
>> it'll be more of a problem as time goes on).  We have pretty huge
>> direct maps and our initial startup tests on a new host with the link vs
>> file took >6 hours.  That's not a typo.  We worked with Suse engineering 
>> to come up with a fix, which should've been pushed here some time ago.
>>
>> Then, there's shutdowns (and reboots). They also took a long time (on
>> the order of 20+min) because it would walk the entire /proc/mounts
>> "unmounting" things.  Also fixed now.  That one had something to do in
>> SMP code as if you used a single CPU/core, it didn't take long at all.
>>
>> Just got a fix for the suse grub2-mkconfig script to fix their parsing 
>> looking for the root dev to skip over fstype autofs
>> (probe_nfsroot_device function).
>>
>>> The symlink change was probably the start, now a number of applications
>>> now got directly to the proc file system for this information.
>>>
>>> For large mount tables and many processes accessing the mount table
>>> (probably reading the whole thing, either periodically or on change
>>> notification) the current system does not scale well at all.
>>
>> We use Clearcase in some instances as well, and that's yet another thing
>> adding mounts, and its startup is very slow, due to the size of
>> /proc/mounts.  
>>
>> It's definitely something that's more than just autofs and probably
>> going to get worse, as you say.
> 
> If we assume that applications are going to want to read
> /proc/self/mount* a log, we probably need to make it faster.
> I performed a simple experiment where I mounted 1000 tmpfs filesystems,
> copied /proc/self/mountinfo to /tmp/mountinfo, then
> ran 4 for loops in parallel catting one of these files to /dev/null 1000 
> times.
> On a single CPU VM:
>   For /tmp/mountinfo, each group of 1000 cats took about 3 seconds.
>   For /proc/self/mountinfo, each group of 1000 cats took about 14 seconds.
> On a 4 CPU VM
>   /tmp/mountinfo: 1.5secs
>   /proc/self/mountinfo: 3.5 secs
> 
> Using "perf record" it appears that most of the cost is repeated calls
> to prepend_path, with a small contribution from the fact that each read
> only returns 4K rather than the 128K that cat asks for.
> 
> If we could hang a cache off struct mnt_namespace and use it instead of
> iterating the mount table - using rcu and ns->event to ensure currency -
> we should be able to minimize the cost of this increased use of
> /proc/self/mount*.
> 
> I suspect that the best approach would be implement a cache at the
> seq_file level.
> 
> One possible problem might be if applications assume that a read will
> always return a whole number of lines (it currently does).  To be
> sure we remain safe, we would only be able to use the cache for
> a read() syscall which reads the whole file.
> How big do people see /proc/self/mount* getting?  What size reads
> does 'strace' show the various programs using to read it?

Buffer size almost always has a significant impact on IO so that's
likely a big factor but the other aspect of this is notification
of changes.

The risk is improving the IO efficiency might just allow a higher
rate of processing of change notifications and similar symptoms
to what we have now.

The suggestion is that a system that allows for incremental (diff
type) update notification is needed to allow mount table propagation
to scale well.

That implies some as yet undefined user <-> kernel communication
protocol.

Ian


Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-28 Thread Ian Kent
On 29/11/17 10:13, Mike Marion wrote:
> On Wed, Nov 29, 2017 at 12:17:27PM +1100, NeilBrown wrote:
> 
>> How big do people see /proc/self/mount* getting?  What size reads
>> does 'strace' show the various programs using to read it?
> 
> We already have line counts into 5 figures.  This wasn't an issue until 
> the change of /etc/mtab to a link.  The large count is due to our large
> direct automount maps.
> 

And, admittedly, the testing I was doing was with 15k+ size maps.

Of course it's necessary to have this number of mounts to see serious
problems which is easiest to do with large direct mount maps.

The thing that's different now is that before applications started
using /proc directly for mount table information using mount(2)
instead of mount(8) was enough to prevent the mount entries from
being added to the table seen by applications.

Ian


Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-27 Thread Ian Kent
On 28/11/17 00:01, Mike Marion wrote:
> On Thu, Nov 23, 2017 at 08:36:49AM +0800, Ian Kent wrote:
> 
>> And with the move of userspace to use /proc based mount tables (one
>> example being the symlink of /etc/mtab into /proc) even modest sized
>> direct mount maps will be a problem with every entry getting mounted.
>>
>> Systems will cope with this fine but larger systems not so much.
> 
> Yes.. we've run into some big issues due to the change of /etc/mtab from
> a file to a symlink to /proc/self/mounts.  Most have been worked around
> thus far (mostly due to Suse coming up with patches) but still have a
> few annoying ones.
> 

I think the situation is going to get worse before it gets better.

On recent Fedora and kernel, with a large map and heavy mount activity
I see:

systemd, udisksd, gvfs-udisks2-volume-monitor, gvfsd-trash,
gnome-settings-daemon, packagekitd and gnome-shell

all go crazy consuming large amounts of CPU.

Once the mount activity is completed I see two processes continue to
consume a large amount of CPU. I thought one of those two was systemd
but my notes say they were gvfs-udisks2-volume-monitor and
gnome-settings-daemon.

The symlink change was probably the start, now a number of applications
now got directly to the proc file system for this information.

For large mount tables and many processes accessing the mount table
(probably reading the whole thing, either periodically or on change
notification) the current system does not scale well at all.

Ian


[PATCH 2/2] autofs: revert fix AT_NO_AUTOMOUNT not being honored

2017-11-26 Thread Ian Kent
Commit 42f4614821 allowed the fstatat(2) system call to properly honor
the AT_NO_AUTOMOUNT flag but introduced a semantic change.

In order to honor AT_NO_AUTOMOUNT a semantic change was made to the
negative dentry case for stat family system calls in follow_automount().

This changed the unconditional triggering of an automount in this case
to no longer be done and an error returned instead.

This has caused more problems than I expected so reverting the change
is needed.

In a discussion with Neil Brown it was concluded that the automount(8)
daemon can implement this change without kernel modifications. So that
will be done instead and the autofs module documentation updated with
a description of the problem and what needs to be done by module users
for this specific case.

Reverts: 42f4614821

Signed-off-by: Ian Kent 
Cc: Neil Brown 
Cc: Al Viro 
Cc: David Howells 
Cc: Colin Walters 
Cc: Ondrej Holy 
---
 fs/namei.c |   15 +++
 include/linux/fs.h |3 ++-
 2 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f0c7a7b9b6ca..9cc91fb7f156 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1129,18 +1129,9 @@ static int follow_automount(struct path *path, struct 
nameidata *nd,
 * of the daemon to instantiate them before they can be used.
 */
if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
-  LOOKUP_OPEN | LOOKUP_CREATE |
-  LOOKUP_AUTOMOUNT))) {
-   /* Positive dentry that isn't meant to trigger an
-* automount, EISDIR will allow it to be used,
-* otherwise there's no mount here "now" so return
-* ENOENT.
-*/
-   if (path->dentry->d_inode)
-   return -EISDIR;
-   else
-   return -ENOENT;
-   }
+  LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
+   path->dentry->d_inode)
+   return -EISDIR;
 
if (path->dentry->d_sb->s_user_ns != &init_user_ns)
return -EACCES;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2995a271ec46..9c4a43e391ac 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3088,7 +3088,8 @@ static inline int vfs_lstat(const char __user *name, 
struct kstat *stat)
 static inline int vfs_fstatat(int dfd, const char __user *filename,
  struct kstat *stat, int flags)
 {
-   return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
+   return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
+stat, STATX_BASIC_STATS);
 }
 static inline int vfs_fstat(int fd, struct kstat *stat)
 {



[PATCH 1/2] autofs: revert take more care to not update last_used on path walk

2017-11-26 Thread Ian Kent
While the patch of commit 092a53452b helped (partially) resolve a
problem where automounts were not expiring due to aggressive accesses
from user space it has a side effect for very large environments.

This change helps with the expire problem by making the expire more
aggressive but, for very large environments, that means more mount
requests from clients. When there are a lot of clients that can mean
fairly significant server load increases.

It turns out I put the last_used in this position to solve this
very problem and failed to update my own thinking of the autofs
expire policy. So the patch being reverted introduces a regression
which should be fixed.

Reverts: 092a53452b

Signed-off-by: Ian Kent 
Cc: Neil Brown 
Cc: Al Viro 
---
 fs/autofs4/root.c |   17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index d79ced925861..82e8f6edfb48 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -281,8 +281,8 @@ static int autofs4_mount_wait(const struct path *path, bool 
rcu_walk)
pr_debug("waiting for mount name=%pd\n", path->dentry);
status = autofs4_wait(sbi, path, NFY_MOUNT);
pr_debug("mount wait done status=%d\n", status);
-   ino->last_used = jiffies;
}
+   ino->last_used = jiffies;
return status;
 }
 
@@ -321,21 +321,16 @@ static struct dentry *autofs4_mountpoint_changed(struct 
path *path)
 */
if (autofs_type_indirect(sbi->type) && d_unhashed(dentry)) {
struct dentry *parent = dentry->d_parent;
+   struct autofs_info *ino;
struct dentry *new;
 
new = d_lookup(parent, &dentry->d_name);
if (!new)
return NULL;
-   if (new == dentry)
-   dput(new);
-   else {
-   struct autofs_info *ino;
-
-   ino = autofs4_dentry_ino(new);
-   ino->last_used = jiffies;
-   dput(path->dentry);
-   path->dentry = new;
-   }
+   ino = autofs4_dentry_ino(new);
+   ino->last_used = jiffies;
+   dput(path->dentry);
+   path->dentry = new;
}
return path->dentry;
 }



Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 12:49, NeilBrown wrote:
> On Thu, Nov 23 2017, Ian Kent wrote:
> 
>> On 23/11/17 10:21, NeilBrown wrote:
>>> On Thu, Nov 23 2017, Ian Kent wrote:
>>>
>>>>
>>>> Hey Neil, I'm looking at this again because RH QE have complained about
>>>> a regression test failing with a kernel that has this change.
>>>>
>>>> Maybe I'm just dumb but I though a "find  "
>>>> would, well, just look at the contents below  but an
>>>> strace shows that it reads and calls fstatat() on "every entry in the
>>>> mount table" regardless of the path.
>>>
>>> weird ... I can only get find to look at the mount table if given the
>>> -fstyp option, and even then it doesn't fstatat anything that isn't in
>>> the tree it is searching.
>>
>> It's probably the -xautofs (exclude autofs fs'es) that was used in
>> the test that requires reading the mount table to get info about
>> excluding autofs mounts but the fstatat() on all the entries,
>> regardless of path, that was a surprise to me.
>>
>> find did use AT_SYMLINK_NOFOLLOW which historically behaved like
>> AT_NO_AUTOMOUNT.
>>
>>>
>>>
>>>>
>>>> And with the move of userspace to use /proc based mount tables (one
>>>> example being the symlink of /etc/mtab into /proc) even modest sized
>>>> direct mount maps will be a problem with every entry getting mounted.
>>>
>>> But the patch in question is only about indirect mount maps, isn't it?
>>> How is it relevant to direct mount maps?
>>
>> The change here will cause fstatat() to trigger direct mounts on access
>> if it doesn't use AT_NO_AUTOMOUNT.
> 
> Ahhh... light dawns.
> This is about this bit of the patch:
> 
>  static inline int vfs_fstatat(int dfd, const char __user *filename,
>   struct kstat *stat, int flags)
>  {
> -   return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
> -stat, STATX_BASIC_STATS);
> +   return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
>  }
> 
> I hadn't paid much attention to that.
> 
> I before this patch:
>   stat and lstat act as you would expect AT_NO_AUTOMOUNT to act on
>   direct mount and browseable indirect mount, but not on unbrowseable
>   indirect mounts

Yep, because of the fall thru for negative dentrys at:

if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
path->dentry->d_inode)
return -EISDIR;

which is missing a LOOKUP_FOLLOW check if we wanted to catch 
AT_SYMLINK_NOFOLLOW.

>   fstatat appeared to accept the AT_NO_AUTOMOUNT flag, but actually
>   assumed it was always set, but acted like stat and lstat

Yep, always set AT_NO_AUTOMOUNT so that it behaved the same as 

>   xstatat actually accepted the AT_NO_AUTOMOUNT flag, but it had no
>   effect on unbrowseable indirect mounts.

Yep.

> 
> after the patch, the distinction between direct and indirect was gone,
> and fstatat now handles AT_NO_AUTOMOUNT the same as xstatat.
> So:
>   stat and lstat now don't trigger automounts even on indirect, but
> this is a mixed blessing as they don't even trigger the mkdir

Yep.

>   fstatat without AT_NO_AUTOMOUNT now always triggers an automount
> This is a problematic regression that you have noticed and
> likely needs to be reverted.  Maybe we can assume
> AT_NO_AUTOMOUNT when AT_SYMLINK_NOFOLLOW is set, and require
> people to use xstatat if they need to set the flags separately

Yep.

The introduction of AT_NO_AUTOMOUNT (and the introduction of
follow_managed() and friends) was meant to do away with the
misleading use the AT_SYMLINK_NOFOLLOW (at the time the automount
mechanism abused the ->follow_link() method because it had similar
semantics to symlinks).

To catch the older usage pattern re-adding an AT_SYMLINK_NOFOLLOW
check would be helpful.

The reality is there are still the same old problems of unintended
mounting (mount storms) and AT_NO_AUTOMOUNT not being properly
handled.

Certainly the implementation we have now is much better but these
niggling problems remain and user space steps on them way too often,
and it feels like its much more so lately.  

>   xstatat now correctly honours AT_NO_AUTOMOUNT for indirect mounts
> but is otherwise unchanged.

Yep, assuming we accept the ENOENT return as sensible for AT_NO_AUTOMOUNT
no browse indirect mount case. statx() being a new system call it would
be ideal to get the se

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 11:04, NeilBrown wrote:
> On Thu, Nov 23 2017, Ian Kent wrote:
> 
>> On 23/11/17 08:47, NeilBrown wrote:
>>> On Wed, Nov 22 2017, Ian Kent wrote:
>>>
>>>> On 21/11/17 09:53, NeilBrown wrote:
>>>>> On Wed, May 10 2017, Ian Kent wrote:
>>>>>
>>>>>> The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
>>>>>> which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
>>>>>> of an automount by the call. But this flag is unconditionally cleared
>>>>>> for all stat family system calls except statx().
>>>>>>
>>>>>> stat family system calls have always triggered mount requests for the
>>>>>> negative dentry case in follow_automount() which is intended but prevents
>>>>>> the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.
>>>>>>
>>>>>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>>>>>> negative dentry case in follow_automount() needs to be changed to
>>>>>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>>>>>> required flags are clear).
>>>>>>
>>>>>> AFAICT this change doesn't have any noticable side effects and may,
>>>>>> in some use cases (although I didn't see it in testing) prevent
>>>>>> unnecessary callbacks to the automount daemon.
>>>>>
>>>>> Actually, this patch does have a noticeable side effect.
>>>>
>>>> That's right.
>>>>
>>>>>
>>>>> Previously, if /home were an indirect mount point so that /home/neilb
>>>>> would mount my home directory, "ls -l /home/neilb" would always work.
>>>>> Now it doesn't.
>>>>
>>>> An this is correct too.
>>>>
>>>> The previous policy was that a ->lookup() would always cause a mount to
>>>> occur. It's that policy that prevents the AT_NO_AUTOMOUNT flag from being
>>>> able to work consistently.
>>>
>>> Just to be clear, the previous policy was:
>>>  kernel - a lookup would cause a message to be sent to the automount daemon
>>>  daemon - the receipt of a message would cause a mount to occur.
>>>
>>> Either part of this policy could be changed to allow AT_NO_AUTOMOUNT to
>>> work reliably.  You chose to change the first.  At the time I thought
>>> this was a good idea.  I no longer think so.
>>>
>>> Specifically, I think the second part of the policy should be revised a 
>>> little.
>>>
>>>>
>>>> If you set the indirect mount "browse" option that will cause the mount
>>>> point directories for each of the map keys to be pre-created. So a
>>>> directory listing will show the mount points and an attempt to access
>>>> anything within the mount point directory will cause it to be mounted.
>>>>
>>>> There is still the problem of not knowing map keys when the wildcard
>>>> map entry is used.
>>>>
>>>> Still, stat family systems calls have always had similar problems that
>>>> prevent consistent behavior.
>>>
>>> Yes, I understand all this.  stat family sys-call have some odd
>>> behaviours at times like "stat(); open(); fstat()" will result in
>>> different sets of status info.  This is known.
>>> The point is that these odd behaviours have changed in a noticeably way
>>> (contrary to the change log comment) and it isn't clear these changes
>>> are good.
>>>
>>>>
>>>>>
>>>>> This happens because "ls" calls 'lstat' on the path and when that fails
>>>>> with "ENOENT", it doesn't bother trying to open.
>>>>
>>>> I know, I believe the ENOENT is appropriate because there is no mount
>>>> and no directory at the time this happens.
>>>
>>> Two distinct statements here.  "no mount" and "no directory".
>>> If AT_NO_AUTOMOUNT is given (or implicit) the "no mount" is significant
>>> and shouldn't be changed.   It isn't clear that "no directory" is
>>> significant.
>>> If you think of the list of names stored in the autofs filesystem as a
>>> cache of recently used names from the map, then the directory *does

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 10:46, Ian Kent wrote:
> On 23/11/17 10:21, NeilBrown wrote:
>> On Thu, Nov 23 2017, Ian Kent wrote:
>>
>>>
>>> Hey Neil, I'm looking at this again because RH QE have complained about
>>> a regression test failing with a kernel that has this change.
>>>
>>> Maybe I'm just dumb but I though a "find  "
>>> would, well, just look at the contents below  but an
>>> strace shows that it reads and calls fstatat() on "every entry in the
>>> mount table" regardless of the path.
>>
>> weird ... I can only get find to look at the mount table if given the
>> -fstyp option, and even then it doesn't fstatat anything that isn't in
>> the tree it is searching.
> 
> It's probably the -xautofs (exclude autofs fs'es) that was used in
> the test that requires reading the mount table to get info about
> excluding autofs mounts but the fstatat() on all the entries,
> regardless of path, that was a surprise to me.
> 
> find did use AT_SYMLINK_NOFOLLOW which historically behaved like
> AT_NO_AUTOMOUNT.

Check this simple reproducer out:

[root@intel-mahobay-02 ~]# uname -r
4.14.0-0.rc8.2.el7a.x86_64
[root@intel-mahobay-02 ~]# cat /etc/exports
/exportdir-bz1482065 *(rw,no_root_squash)
[root@intel-mahobay-02 ~]# cat /etc/auto.master
/- /etc/auto.direct
[root@intel-mahobay-02 ~]# cat /etc/auto.direct 
/mnt/autofs/test localhost:/exportdir-bz1482065
[root@intel-mahobay-02 ~]# ls /mnt/autofs/
bz1482065  test
[root@intel-mahobay-02 ~]# ls /exportdir-bz1482065
bz1482065
[root@intel-mahobay-02 ~]# find /mnt/autofs -xautofs  -name '*bz1482065*'
[root@intel-mahobay-02 ~]# find /mnt/autofs -xautofs  -name '*bz1482065*'
/mnt/autofs/bz1482065
/mnt/autofs/test/bz1482065

The first find mounts the direct mount /mnt/autofs/test/bz1482065 which causes 
the
second to list /mnt/autofs/bz1482065 (not sure yet where this comes from, it 
must
be a plain directory within /mnt/autofs) and the now mounted bind mount (ie. the
localhost: host name in the map entry) /mnt/autofs/test/bz1482065.

The strace is:
execve("/usr/bin/find", ["find", "/mnt/autofs", "-xautofs", "-name", 
"*bz1482065*"], [/* 45 vars */]) = 0

snip ...

newfstatat(AT_FDCWD, "/mnt/autofs", {st_mode=S_IFDIR|0755, st_size=35, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
open("/etc/mtab", O_RDONLY|O_CLOEXEC)   = 4
futex(0x7efec9ba55b0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7efeca2ea000
read(4, "sysfs /sys sysfs rw,seclabel,nos"..., 1024) = 1024
read(4, "oup/net_cls,net_prio cgroup rw,n"..., 1024) = 1024
read(4, "ges hugetlbfs rw,seclabel,relati"..., 1024) = 594
read(4, "", 1024)   = 0
close(4)= 0
munmap(0x7efeca2ea000, 4096)= 0
newfstatat(AT_FDCWD, "/sys", {st_mode=S_IFDIR|0555, st_size=0, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/dev", {st_mode=S_IFDIR|0755, st_size=3280, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/kernel/security", {st_mode=S_IFDIR|0755, st_size=0, 
...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/dev/shm", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=40, 
...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/dev/pts", {st_mode=S_IFDIR|0755, st_size=0, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/run", {st_mode=S_IFDIR|0755, st_size=1020, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup", {st_mode=S_IFDIR|0755, st_size=340, 
...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/systemd", {st_mode=S_IFDIR|0555, 
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/pstore", {st_mode=S_IFDIR|0750, st_size=0, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/blkio", {st_mode=S_IFDIR|0555, st_size=0, 
...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/cpu,cpuacct", {st_mode=S_IFDIR|0555, 
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/pids", {st_mode=S_IFDIR|0555, st_size=0, 
...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/net_cls,net_prio", {st_mode=S_IFDIR|0555, 
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/devices", {st_mode=S_IFDIR|0555, 
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "/sys/fs/cgroup/freezer", {st_mode=S_IFDIR|0555, 
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstat

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 10:21, NeilBrown wrote:
> On Thu, Nov 23 2017, Ian Kent wrote:
> 
>>
>> Hey Neil, I'm looking at this again because RH QE have complained about
>> a regression test failing with a kernel that has this change.
>>
>> Maybe I'm just dumb but I though a "find  "
>> would, well, just look at the contents below  but an
>> strace shows that it reads and calls fstatat() on "every entry in the
>> mount table" regardless of the path.
> 
> weird ... I can only get find to look at the mount table if given the
> -fstyp option, and even then it doesn't fstatat anything that isn't in
> the tree it is searching.

It's probably the -xautofs (exclude autofs fs'es) that was used in
the test that requires reading the mount table to get info about
excluding autofs mounts but the fstatat() on all the entries,
regardless of path, that was a surprise to me.

find did use AT_SYMLINK_NOFOLLOW which historically behaved like
AT_NO_AUTOMOUNT.

> 
> 
>>
>> And with the move of userspace to use /proc based mount tables (one
>> example being the symlink of /etc/mtab into /proc) even modest sized
>> direct mount maps will be a problem with every entry getting mounted.
> 
> But the patch in question is only about indirect mount maps, isn't it?
> How is it relevant to direct mount maps?

The change here will cause fstatat() to trigger direct mounts on access
if it doesn't use AT_NO_AUTOMOUNT.

It's not a problem for browse indirect mounts because they are plain
directories within the autofs mount point not individual autofs mount
triggers.

> 
>>
>> Systems will cope with this fine but larger systems not so much.
>>
>> If find does this then the user space changes needed to accommodate
>> this sort of change are almost certainly far more than I expected.
>>
>> I think this is an example of the larger problem I'm faced with and
>> this change was was meant to be a starting point for resolution.
>>
>> The most obvious symptom of the problem is auto-mounts no longer able
>> to be expired due to being re-mounted immediately after expire. Another
>> symptom is unwanted (by the user) accesses causing unexpected auto-mount
>> attempts.
>>
>> I believe this monitoring of the mount table is what leads to excessive
>> CPU consumption I've seen, usually around six processes, under heavy
>> mount activity. And following this, when the mount table is large and
>> there is "no mount activity" two of the six processes continue to consume
>> excessive CPU, until the mount table shrinks.
>>
>> So now I'm coming around to the idea of reverting this change . and
>> going back to the drawing board.
> 
> I can well imaging that a large mount table could cause problems for
> applications that are written to expect one, and I can imagine that
> autofs could cause extra issues for such a program as it might change
> the mount table more often.  But I haven't yet worked out how this is
> related to the patch in question
> 
> Thanks,
> NeilBrown
> 



Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 09:43, Ian Kent wrote:
> On 23/11/17 08:47, NeilBrown wrote:
>> On Wed, Nov 22 2017, Ian Kent wrote:
>>
>>> On 21/11/17 09:53, NeilBrown wrote:
>>>> On Wed, May 10 2017, Ian Kent wrote:
>>>>
>>>>> The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
>>>>> which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
>>>>> of an automount by the call. But this flag is unconditionally cleared
>>>>> for all stat family system calls except statx().
>>>>>
>>>>> stat family system calls have always triggered mount requests for the
>>>>> negative dentry case in follow_automount() which is intended but prevents
>>>>> the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.
>>>>>
>>>>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>>>>> negative dentry case in follow_automount() needs to be changed to
>>>>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>>>>> required flags are clear).
>>>>>
>>>>> AFAICT this change doesn't have any noticable side effects and may,
>>>>> in some use cases (although I didn't see it in testing) prevent
>>>>> unnecessary callbacks to the automount daemon.
>>>>
>>>> Actually, this patch does have a noticeable side effect.
>>>
>>> That's right.
>>>
>>>>
>>>> Previously, if /home were an indirect mount point so that /home/neilb
>>>> would mount my home directory, "ls -l /home/neilb" would always work.
>>>> Now it doesn't.
>>>
>>> An this is correct too.
>>>
>>> The previous policy was that a ->lookup() would always cause a mount to
>>> occur. It's that policy that prevents the AT_NO_AUTOMOUNT flag from being
>>> able to work consistently.
>>
>> Just to be clear, the previous policy was:
>>  kernel - a lookup would cause a message to be sent to the automount daemon
>>  daemon - the receipt of a message would cause a mount to occur.
>>
>> Either part of this policy could be changed to allow AT_NO_AUTOMOUNT to
>> work reliably.  You chose to change the first.  At the time I thought
>> this was a good idea.  I no longer think so.
>>
>> Specifically, I think the second part of the policy should be revised a 
>> little.
>>
>>>
>>> If you set the indirect mount "browse" option that will cause the mount
>>> point directories for each of the map keys to be pre-created. So a
>>> directory listing will show the mount points and an attempt to access
>>> anything within the mount point directory will cause it to be mounted.
>>>
>>> There is still the problem of not knowing map keys when the wildcard
>>> map entry is used.
>>>
>>> Still, stat family systems calls have always had similar problems that
>>> prevent consistent behavior.
>>
>> Yes, I understand all this.  stat family sys-call have some odd
>> behaviours at times like "stat(); open(); fstat()" will result in
>> different sets of status info.  This is known.
>> The point is that these odd behaviours have changed in a noticeably way
>> (contrary to the change log comment) and it isn't clear these changes
>> are good.
>>
>>>
>>>>
>>>> This happens because "ls" calls 'lstat' on the path and when that fails
>>>> with "ENOENT", it doesn't bother trying to open.
>>>
>>> I know, I believe the ENOENT is appropriate because there is no mount
>>> and no directory at the time this happens.
>>
>> Two distinct statements here.  "no mount" and "no directory".
>> If AT_NO_AUTOMOUNT is given (or implicit) the "no mount" is significant
>> and shouldn't be changed.   It isn't clear that "no directory" is
>> significant.
>> If you think of the list of names stored in the autofs filesystem as a
>> cache of recently used names from the map, then the directory *does*
>> exist (in the map), it is just that the (in-kernel) cache hasn't been
>> populated yet.
>> Should "AT_NO_AUTOMOUNT" prevent the cache from being populated?
>>
>>>
>>>>
>>>> An alternate approach to the problem that this patch addresses is to
>>>> *not* change follow_automount() but instead change the auto

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 23/11/17 08:47, NeilBrown wrote:
> On Wed, Nov 22 2017, Ian Kent wrote:
> 
>> On 21/11/17 09:53, NeilBrown wrote:
>>> On Wed, May 10 2017, Ian Kent wrote:
>>>
>>>> The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
>>>> which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
>>>> of an automount by the call. But this flag is unconditionally cleared
>>>> for all stat family system calls except statx().
>>>>
>>>> stat family system calls have always triggered mount requests for the
>>>> negative dentry case in follow_automount() which is intended but prevents
>>>> the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.
>>>>
>>>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>>>> negative dentry case in follow_automount() needs to be changed to
>>>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>>>> required flags are clear).
>>>>
>>>> AFAICT this change doesn't have any noticable side effects and may,
>>>> in some use cases (although I didn't see it in testing) prevent
>>>> unnecessary callbacks to the automount daemon.
>>>
>>> Actually, this patch does have a noticeable side effect.
>>
>> That's right.
>>
>>>
>>> Previously, if /home were an indirect mount point so that /home/neilb
>>> would mount my home directory, "ls -l /home/neilb" would always work.
>>> Now it doesn't.
>>
>> An this is correct too.
>>
>> The previous policy was that a ->lookup() would always cause a mount to
>> occur. It's that policy that prevents the AT_NO_AUTOMOUNT flag from being
>> able to work consistently.
> 
> Just to be clear, the previous policy was:
>  kernel - a lookup would cause a message to be sent to the automount daemon
>  daemon - the receipt of a message would cause a mount to occur.
> 
> Either part of this policy could be changed to allow AT_NO_AUTOMOUNT to
> work reliably.  You chose to change the first.  At the time I thought
> this was a good idea.  I no longer think so.
> 
> Specifically, I think the second part of the policy should be revised a 
> little.
> 
>>
>> If you set the indirect mount "browse" option that will cause the mount
>> point directories for each of the map keys to be pre-created. So a
>> directory listing will show the mount points and an attempt to access
>> anything within the mount point directory will cause it to be mounted.
>>
>> There is still the problem of not knowing map keys when the wildcard
>> map entry is used.
>>
>> Still, stat family systems calls have always had similar problems that
>> prevent consistent behavior.
> 
> Yes, I understand all this.  stat family sys-call have some odd
> behaviours at times like "stat(); open(); fstat()" will result in
> different sets of status info.  This is known.
> The point is that these odd behaviours have changed in a noticeably way
> (contrary to the change log comment) and it isn't clear these changes
> are good.
> 
>>
>>>
>>> This happens because "ls" calls 'lstat' on the path and when that fails
>>> with "ENOENT", it doesn't bother trying to open.
>>
>> I know, I believe the ENOENT is appropriate because there is no mount
>> and no directory at the time this happens.
> 
> Two distinct statements here.  "no mount" and "no directory".
> If AT_NO_AUTOMOUNT is given (or implicit) the "no mount" is significant
> and shouldn't be changed.   It isn't clear that "no directory" is
> significant.
> If you think of the list of names stored in the autofs filesystem as a
> cache of recently used names from the map, then the directory *does*
> exist (in the map), it is just that the (in-kernel) cache hasn't been
> populated yet.
> Should "AT_NO_AUTOMOUNT" prevent the cache from being populated?
> 
>>
>>>
>>> An alternate approach to the problem that this patch addresses is to
>>> *not* change follow_automount() but instead change the automount daemon
>>> and possibly autofs.
>>
>> Perhaps, but the daemon probably doesn't have enough information to
>> make decisions about this so there would need to be something coming
>> from the kernel too.
> 
> I don't think so.
> The daemon already has a promise that upcalls for a given name are
> serialized, and it has the abilit

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-22 Thread Ian Kent
On 22/11/17 12:28, Ian Kent wrote:
> On 21/11/17 09:53, NeilBrown wrote:
>> On Wed, May 10 2017, Ian Kent wrote:
>>
>>> The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
>>> which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
>>> of an automount by the call. But this flag is unconditionally cleared
>>> for all stat family system calls except statx().
>>>
>>> stat family system calls have always triggered mount requests for the
>>> negative dentry case in follow_automount() which is intended but prevents
>>> the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.
>>>
>>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>>> negative dentry case in follow_automount() needs to be changed to
>>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>>> required flags are clear).
>>>
>>> AFAICT this change doesn't have any noticable side effects and may,
>>> in some use cases (although I didn't see it in testing) prevent
>>> unnecessary callbacks to the automount daemon.
>>
>> Actually, this patch does have a noticeable side effect.
> 
> That's right.
> 
>>
>> Previously, if /home were an indirect mount point so that /home/neilb
>> would mount my home directory, "ls -l /home/neilb" would always work.
>> Now it doesn't.
> 
> An this is correct too.
> 
> The previous policy was that a ->lookup() would always cause a mount to
> occur. It's that policy that prevents the AT_NO_AUTOMOUNT flag from being
> able to work consistently.
> 
> If you set the indirect mount "browse" option that will cause the mount
> point directories for each of the map keys to be pre-created. So a
> directory listing will show the mount points and an attempt to access
> anything within the mount point directory will cause it to be mounted.
> 
> There is still the problem of not knowing map keys when the wildcard
> map entry is used.
> 
> Still, stat family systems calls have always had similar problems that
> prevent consistent behavior.
> 
>>
>> This happens because "ls" calls 'lstat' on the path and when that fails
>> with "ENOENT", it doesn't bother trying to open.
> 
> I know, I believe the ENOENT is appropriate because there is no mount
> and no directory at the time this happens.
> 
>>
>> An alternate approach to the problem that this patch addresses is to
>> *not* change follow_automount() but instead change the automount daemon
>> and possibly autofs.
> 
> Perhaps, but the daemon probably doesn't have enough information to
> make decisions about this so there would need to be something coming
> from the kernel too.
> 
>>
>> When a notification of access for an indirect mount point is received,
>> it would firstly just create the directory - not triggering a mount.
>> If another notification is then received (after the directory has been
>> created), it then triggers the mount.
> 
> Not sure, perhaps.
> 
> But I don't think that will work, I've had many problems with this type
> behavior due to bugs and I think the the same sort of problems would be
> encountered.
> 
> The indirect mount "browse" option which behaves very much like what your
> suggesting is the internal program default, and has been since the initial
> v5 release. Historically it is disabled on install to maintain backward
> compatibility with the original Linux autofs (which is also the reason for
> always triggering an automount on ->lookup()).
> 
> Perhaps now is the time for that to change. 
> 
>>
>> I suspect this might need changes to autofs to avoid races when two
>> threads call lstat() on the same path at the same time.  We would need
>> to ensure that automount didn't see this as two requests though
>> maybe it already has enough locking.
>>
>> Does that seem reasonable?  Should we revert this patch (as a
>> regression) and do something in automount instead?

Hey Neil, I'm looking at this again because RH QE have complained about
a regression test failing with a kernel that has this change.

Maybe I'm just dumb but I though a "find  "
would, well, just look at the contents below  but an
strace shows that it reads and calls fstatat() on "every entry in the
mount table" regardless of the path.

And with the move of userspace to use /proc based mount tables (one
example being the symlink of /etc/mtab into /proc) even modest sized
direct mount maps will be a problem with every entry 

Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-11-21 Thread Ian Kent
On 21/11/17 09:53, NeilBrown wrote:
> On Wed, May 10 2017, Ian Kent wrote:
> 
>> The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
>> which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
>> of an automount by the call. But this flag is unconditionally cleared
>> for all stat family system calls except statx().
>>
>> stat family system calls have always triggered mount requests for the
>> negative dentry case in follow_automount() which is intended but prevents
>> the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.
>>
>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>> negative dentry case in follow_automount() needs to be changed to
>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>> required flags are clear).
>>
>> AFAICT this change doesn't have any noticable side effects and may,
>> in some use cases (although I didn't see it in testing) prevent
>> unnecessary callbacks to the automount daemon.
> 
> Actually, this patch does have a noticeable side effect.

That's right.

> 
> Previously, if /home were an indirect mount point so that /home/neilb
> would mount my home directory, "ls -l /home/neilb" would always work.
> Now it doesn't.

An this is correct too.

The previous policy was that a ->lookup() would always cause a mount to
occur. It's that policy that prevents the AT_NO_AUTOMOUNT flag from being
able to work consistently.

If you set the indirect mount "browse" option that will cause the mount
point directories for each of the map keys to be pre-created. So a
directory listing will show the mount points and an attempt to access
anything within the mount point directory will cause it to be mounted.

There is still the problem of not knowing map keys when the wildcard
map entry is used.

Still, stat family systems calls have always had similar problems that
prevent consistent behavior.

> 
> This happens because "ls" calls 'lstat' on the path and when that fails
> with "ENOENT", it doesn't bother trying to open.

I know, I believe the ENOENT is appropriate because there is no mount
and no directory at the time this happens.

> 
> An alternate approach to the problem that this patch addresses is to
> *not* change follow_automount() but instead change the automount daemon
> and possibly autofs.

Perhaps, but the daemon probably doesn't have enough information to
make decisions about this so there would need to be something coming
from the kernel too.

> 
> When a notification of access for an indirect mount point is received,
> it would firstly just create the directory - not triggering a mount.
> If another notification is then received (after the directory has been
> created), it then triggers the mount.

Not sure, perhaps.

But I don't think that will work, I've had many problems with this type
behavior due to bugs and I think the the same sort of problems would be
encountered.

The indirect mount "browse" option which behaves very much like what your
suggesting is the internal program default, and has been since the initial
v5 release. Historically it is disabled on install to maintain backward
compatibility with the original Linux autofs (which is also the reason for
always triggering an automount on ->lookup()).

Perhaps now is the time for that to change. 

> 
> I suspect this might need changes to autofs to avoid races when two
> threads call lstat() on the same path at the same time.  We would need
> to ensure that automount didn't see this as two requests though
> maybe it already has enough locking.
> 
> Does that seem reasonable?  Should we revert this patch (as a
> regression) and do something in automount instead?

Can you check out the "browse" option behavior before we talk further
about this please.

The problem in user space now is that there is no way to control
whether an indirect mount that uses the "nobrowse" option is triggered
or not (using fstatat() or statx()) regardless of the flags used and it's
essential the AT_NO_AUTOMOUNT flag work as expected to have user space
take more care in not triggering automounts.

> 
> Thanks,
> NeilBrown
> 
> 
>>
>> It's also possible that a stat family call has been made with a
>> path that is in the process of being mounted by some other process.
>> But stat family calls should return the automount state of the path
>> as it is "now" so it shouldn't wait for mount completion.
>>
>> This is the same semantic as the positive dentry case already
>> handled.
>>
>> Signed-off-by: Ian Kent 
>> Cc: David Howells 
>> Cc: Coli

Re: [PATCH] autofs: don't fail mount for transient error

2017-11-03 Thread Ian Kent
On 03/11/17 09:40, NeilBrown wrote:
> 

Hi Neil, and thanks taking the time to post the patch.

> Currently if the autofs kernel module gets an error when
> writing to the pipe which links to the daemon, then it
> marks the whole moutpoint as catatonic, and it will stop working.
> 
> It is possible that the error is transient.  This can happen
> if the daemon is slow and more than 16 requests queue up.
> If a subsequent process tries to queue a request, and is then signalled,
> the write to the pipe will return -ERESTARTSYS and autofs
> will take that as total failure.

Indeed it does.

And given the problems with a half dozen (or so) user space
applications consuming large amounts of CPU under heavy mount
and umount activity this could happen more easily than we
expect.

> 
> So change the code to assess -ERESTARTSYS and -ENOMEM as transient
> failures which only abort the current request, not the whole
> mountpoint.

This looks good to me.

> 
> Signed-off-by: NeilBrown 
> ---
> 
> Do people think this should got to -stable ??
> It isn't a crash or a data corruption, but having autofs mountpoints
> suddenly stop working is rather inconvenient.

Perhaps that's a good idea given the CPU usage problem I refer
to above has been around for a while now.

> 
> Thanks,
> NeilBrown
> 
> 
>  fs/autofs4/waitq.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
> index 4ac49d038bf3..8fc41705c7cd 100644
> --- a/fs/autofs4/waitq.c
> +++ b/fs/autofs4/waitq.c
> @@ -81,7 +81,8 @@ static int autofs4_write(struct autofs_sb_info *sbi,
>   spin_unlock_irqrestore(¤t->sighand->siglock, flags);
>   }
>  
> - return (bytes > 0);
> + /* if 'wr' returned 0 (impossible) we assume -EIO (safe) */
> + return bytes == 0 ? 0 : wr < 0 ? wr : -EIO;
>  }
>  
>  static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
> @@ -95,6 +96,7 @@ static void autofs4_notify_daemon(struct autofs_sb_info 
> *sbi,
>   } pkt;
>   struct file *pipe = NULL;
>   size_t pktsz;
> + int ret;
>  
>   pr_debug("wait id = 0x%08lx, name = %.*s, type=%d\n",
>(unsigned long) wq->wait_queue_token,
> @@ -169,7 +171,18 @@ static void autofs4_notify_daemon(struct autofs_sb_info 
> *sbi,
>   mutex_unlock(&sbi->wq_mutex);
>  
>   if (autofs4_write(sbi, pipe, &pkt, pktsz))
> + switch (ret = autofs4_write(sbi, pipe, &pkt, pktsz)) {
> + case 0:
> + break;
> + case -ENOMEM:
> + case -ERESTARTSYS:
> + /* Just fail this one */
> + autofs4_wait_release(sbi, wq->wait_queue_token, ret);
> + break;
> + default:
>   autofs4_catatonic_mode(sbi);
> + break;
> + }
>   fput(pipe);
>  }
>  
> 



Re: [RFC PATCH 1/2] autofs: set compat flag on sbi when daemon uses 32bit addressation

2017-09-14 Thread Ian Kent
On 14/09/17 19:39, Stanislav Kinsburskiy wrote:
> 
> 
> 14.09.2017 13:29, Ian Kent пишет:
>> On 14/09/17 17:24, Stanislav Kinsburskiy wrote:
>>>
>>>
>>> 14.09.2017 02:38, Ian Kent пишет:
>>>> On 01/09/17 19:21, Stanislav Kinsburskiy wrote:
>>>>> Signed-off-by: Stanislav Kinsburskiy 
>>>>> ---
>>>>>  fs/autofs4/autofs_i.h  |3 +++
>>>>>  fs/autofs4/dev-ioctl.c |3 +++
>>>>>  fs/autofs4/inode.c |4 +++-
>>>>>  3 files changed, 9 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>>>> index 4737615..3da105f 100644
>>>>> --- a/fs/autofs4/autofs_i.h
>>>>> +++ b/fs/autofs4/autofs_i.h
>>>>> @@ -120,6 +120,9 @@ struct autofs_sb_info {
>>>>>   struct list_head active_list;
>>>>>   struct list_head expiring_list;
>>>>>   struct rcu_head rcu;
>>>>> +#ifdef CONFIG_COMPAT
>>>>> + unsigned is32bit:1;
>>>>> +#endif
>>>>>  };
>>>>>  
>>>>>  static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
>>>>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>>>>> index b7c816f..467d6c4 100644
>>>>> --- a/fs/autofs4/dev-ioctl.c
>>>>> +++ b/fs/autofs4/dev-ioctl.c
>>>>> @@ -397,6 +397,9 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp,
>>>>>   sbi->pipefd = pipefd;
>>>>>   sbi->pipe = pipe;
>>>>>   sbi->catatonic = 0;
>>>>> +#ifdef CONFIG_COMPAT
>>>>> + sbi->is32bit = is_compat_task();
>>>>> +#endif
>>>>>   }
>>>>>  out:
>>>>>   put_pid(new_pid);
>>>>> diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
>>>>> index 09e7d68..21d3c0b 100644
>>>>> --- a/fs/autofs4/inode.c
>>>>> +++ b/fs/autofs4/inode.c
>>>>> @@ -301,7 +301,9 @@ int autofs4_fill_super(struct super_block *s, void 
>>>>> *data, int silent)
>>>>>   } else {
>>>>>   sbi->oz_pgrp = get_task_pid(current, PIDTYPE_PGID);
>>>>>   }
>>>>> -
>>>>> +#ifdef CONFIG_COMPAT
>>>>> + sbi->is32bit = is_compat_task();
>>>>> +#endif
>>>>>   if (autofs_type_trigger(sbi->type))
>>>>>   __managed_dentry_set_managed(root);
>>>>>  
>>>>>
>>>>
>>>> Not sure about this.
>>>>
>>>> Don't you think it would be better to avoid the in code #ifdefs by doing 
>>>> some
>>>> checks and defines in the header file and defining what's need to just use
>>>> is_compat_task().
>>>>
>>>
>>> Yes, might be...
>>>
>>>> Not sure 2 patches are needed for this either ..
>>>>
>>>
>>> Well, I found this issue occasionally.
>>
>> I'm wondering what the symptoms are?
>>
> 
> Size of struct autofs_v5_packet is 300 bytes for x86 and 304 bytes for x86_64.
> Which means, that 32bit task can read more than size of autofs_v5_packet on 
> 64bit kernel.

Are you sure?

Shouldn't that be a short read on the x86 side of a 4 bytes longer
structure on the x86_64 side.

I didn't think you could have a 64 bit client on a 32 bit kernel
so the converse (the read past end of struct) doesn't apply.

Ian


Re: [RFC PATCH 1/2] autofs: set compat flag on sbi when daemon uses 32bit addressation

2017-09-14 Thread Ian Kent
On 14/09/17 17:24, Stanislav Kinsburskiy wrote:
> 
> 
> 14.09.2017 02:38, Ian Kent пишет:
>> On 01/09/17 19:21, Stanislav Kinsburskiy wrote:
>>> Signed-off-by: Stanislav Kinsburskiy 
>>> ---
>>>  fs/autofs4/autofs_i.h  |3 +++
>>>  fs/autofs4/dev-ioctl.c |3 +++
>>>  fs/autofs4/inode.c |4 +++-
>>>  3 files changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>> index 4737615..3da105f 100644
>>> --- a/fs/autofs4/autofs_i.h
>>> +++ b/fs/autofs4/autofs_i.h
>>> @@ -120,6 +120,9 @@ struct autofs_sb_info {
>>> struct list_head active_list;
>>> struct list_head expiring_list;
>>> struct rcu_head rcu;
>>> +#ifdef CONFIG_COMPAT
>>> +   unsigned is32bit:1;
>>> +#endif
>>>  };
>>>  
>>>  static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
>>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>>> index b7c816f..467d6c4 100644
>>> --- a/fs/autofs4/dev-ioctl.c
>>> +++ b/fs/autofs4/dev-ioctl.c
>>> @@ -397,6 +397,9 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp,
>>> sbi->pipefd = pipefd;
>>> sbi->pipe = pipe;
>>> sbi->catatonic = 0;
>>> +#ifdef CONFIG_COMPAT
>>> +   sbi->is32bit = is_compat_task();
>>> +#endif
>>> }
>>>  out:
>>> put_pid(new_pid);
>>> diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
>>> index 09e7d68..21d3c0b 100644
>>> --- a/fs/autofs4/inode.c
>>> +++ b/fs/autofs4/inode.c
>>> @@ -301,7 +301,9 @@ int autofs4_fill_super(struct super_block *s, void 
>>> *data, int silent)
>>> } else {
>>> sbi->oz_pgrp = get_task_pid(current, PIDTYPE_PGID);
>>> }
>>> -
>>> +#ifdef CONFIG_COMPAT
>>> +   sbi->is32bit = is_compat_task();
>>> +#endif
>>> if (autofs_type_trigger(sbi->type))
>>> __managed_dentry_set_managed(root);
>>>  
>>>
>>
>> Not sure about this.
>>
>> Don't you think it would be better to avoid the in code #ifdefs by doing some
>> checks and defines in the header file and defining what's need to just use
>> is_compat_task().
>>
> 
> Yes, might be...
> 
>> Not sure 2 patches are needed for this either ..
>>
> 
> Well, I found this issue occasionally.

I'm wondering what the symptoms are?

> And, frankly speaking, it's not clear to me, whether this issue is important 
> at all, so I wanted to clarify this first.
> Thanks to O_DIRECT, the only way to catch the issue is to try to read more, 
> than expected, in compat task (that's how I found it).

Right, the O_DIRECT patch from Linus was expected to fix the structure
alignment problem. The stuct field offsets are ok aren't they?

> I don't see any other flaw so far. And if so, that, probably, we shouldn't 
> care about the issue at all.
> What do you think?

If we are seeing hangs, incorrect struct fields or similar something
should be done about it but if all is actually working ok then the
O_DIRECT fix is doing it's job and further changes aren't necessary.

Ian


Re: [RFC PATCH 1/2] autofs: set compat flag on sbi when daemon uses 32bit addressation

2017-09-13 Thread Ian Kent
On 01/09/17 19:21, Stanislav Kinsburskiy wrote:
> Signed-off-by: Stanislav Kinsburskiy 
> ---
>  fs/autofs4/autofs_i.h  |3 +++
>  fs/autofs4/dev-ioctl.c |3 +++
>  fs/autofs4/inode.c |4 +++-
>  3 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
> index 4737615..3da105f 100644
> --- a/fs/autofs4/autofs_i.h
> +++ b/fs/autofs4/autofs_i.h
> @@ -120,6 +120,9 @@ struct autofs_sb_info {
>   struct list_head active_list;
>   struct list_head expiring_list;
>   struct rcu_head rcu;
> +#ifdef CONFIG_COMPAT
> + unsigned is32bit:1;
> +#endif
>  };
>  
>  static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
> index b7c816f..467d6c4 100644
> --- a/fs/autofs4/dev-ioctl.c
> +++ b/fs/autofs4/dev-ioctl.c
> @@ -397,6 +397,9 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp,
>   sbi->pipefd = pipefd;
>   sbi->pipe = pipe;
>   sbi->catatonic = 0;
> +#ifdef CONFIG_COMPAT
> + sbi->is32bit = is_compat_task();
> +#endif
>   }
>  out:
>   put_pid(new_pid);
> diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
> index 09e7d68..21d3c0b 100644
> --- a/fs/autofs4/inode.c
> +++ b/fs/autofs4/inode.c
> @@ -301,7 +301,9 @@ int autofs4_fill_super(struct super_block *s, void *data, 
> int silent)
>   } else {
>   sbi->oz_pgrp = get_task_pid(current, PIDTYPE_PGID);
>   }
> -
> +#ifdef CONFIG_COMPAT
> + sbi->is32bit = is_compat_task();
> +#endif
>   if (autofs_type_trigger(sbi->type))
>   __managed_dentry_set_managed(root);
>  
> 

Not sure about this.

Don't you think it would be better to avoid the in code #ifdefs by doing some
checks and defines in the header file and defining what's need to just use
is_compat_task().

Not sure 2 patches are needed for this either ..

Ian


Re: [RFC PATCH 0/2] autofs: fix autofs_v5_packet dlivery in compat mode

2017-09-13 Thread Ian Kent
On 01/09/17 19:21, Stanislav Kinsburskiy wrote:
> The problem is that in compat mode struct autofs_v5_packet has to have 
> different size
> (i.e. 4 bytes less).

I regret (several times over) my original decision to not make v5 packets
packed 

I have to say the description of the problem is not all that good.

> 
> This is RFC because:
> 1) This issue is hidden, because autofs pipe has O_DIRECT and the rest of the
> epacket is truncated when read.
> 2) X86 arch doesn't have is_compat_task() helper
> 3) It's now clear, what to do if "pgrp" option is specified.

I don't understand what the "pgrp" option has to do with this?

> 
> The following series implements...
> 
> ---
> 
> Stanislav Kinsburskiy (2):
>   autofs: set compat flag on sbi when daemon uses 32bit addressation
>   autofs: sent 32-bit sized packet for 32-bit process
> 
> 
>  fs/autofs4/autofs_i.h  |3 +++
>  fs/autofs4/dev-ioctl.c |3 +++
>  fs/autofs4/inode.c |4 +++-
>  fs/autofs4/waitq.c |5 +
>  4 files changed, 14 insertions(+), 1 deletion(-)
> 



Re: Do we really need d_weak_revalidate???

2017-08-24 Thread Ian Kent
On 24/08/17 19:03, Michael Kerrisk (man-pages) wrote:
> Hi Neil,
> 
> On 24 August 2017 at 06:07, NeilBrown  wrote:
>> On Wed, Aug 23 2017, Ian Kent wrote:
>>
>>>
>>> That inconsistency has bothered me for quite a while now.
>>>
>>> It was carried over from the autofs module behavior when automounting
>>> support was added to the VFS. What's worse is it prevents the use of
>>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>>> statx().
>>>
>>> There is some risk in changing that so it does work but it really does
>>> need to work to enable userspace to not trigger an automount by using
>>> this flag.
>>>
>>> So that's (hopefully) going to change soonish, see:
>>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>>
>>> The result should be that stat family calls don't trigger automounts except
>>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>>
>>
>> oooh, yes.  That's much better - thanks.
>>
>> We should make sure that change gets into the man pages...
>>
>> First however, we should probably correct the man page!
>> stat.2 says:
>>
>>
>>   NOTES
>>On Linux, lstat() will generally not trigger  automounter
>>action,  whereas  stat()  will  (but  see  the description of
>>fstatat() AT_NO_AUTOMOUNT fag, above).
>>
>> which is wrong: lstat and stat treat automounts the same.
>> @Michael: do you recall why you inserted that text?  The commit message
>> in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp
>> discussion in NOTES") is not very helpful.
> 
> That commit really was just cosmetic changes. The change that
> introduced the text was 82d2be3d9d66b7, based on a note from Peter
> Anvin:

Indeed, that was correct for autofs v3 but we're at autofs v5 now and
a lot has changed over time (the commit is from 2008).

All I can do is apologize for not also checking the man pages and trying
to keep them up to date.

Let's just work on making them accurate now.

> 
> [[
> > > Additionally, you may want to make a note in the stat/lstat man page 
> tha
> t on
> > > Linux, lstat(2) will generally not trigger automounter action, whereas
> > > stat(2) will.
> >
> > I don't understand this last piece.  Can you say some more.  (I'm not
> > familiar with automounter details.)
> 
> An automounter (either an explicit one, like autofs, or an implicit
> one, such as are used by AFS or NFSv4) is something that triggers
> a mount when something is touched.
> 
> However, it's undesirable to automount, say, everyone's home
> directory just because someone opened up /home in their GUI
> browser or typed "ls -l /home".  The early automounters simply
> didn't list the contents until you accessed it by name;
> this is still the case when you can't enumerate a mapping
> (say, all DNS names under /net).  However, this is extremely
> inconvenient, too.
> 
> The solution we ended up settling on is to create something
> that looks like a directory (i.e. reports S_IFDIR in stat()),
> but behaves somewhat like a symlink.  In particular, when it is
> accessed in a way where a symlink would be dereferenced,
> the automount triggers and the directory is mounted.  However,
> system calls which do *not* cause a symlink to be dereferenced,
> like lstat(), also do not cause the automounter to trigger.
> This means that "ls -l", or a GUI file browser, can see a list
> of directories without causing each one of them to be automounted.
> 
> -hpa
> ]]
> 
> Cheers,
> 
> Michael
> 
>> I propose correcting to
>>
>>   NOTES:
>>   On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set
>>   and will not trigger automounter action for direct automount
>>   points, though they may (prior to 4.14) for indirect automount
>>   points.
>>
>>
>> The more precise details, that automount action for indirect automount
>> points is not triggered when the 'browse' option is used, is probably
>> not necessary.
>>
>> Ian: if you agree with that text, and Michael doesn't provide alternate
>> evidence, I'll submit a formal patch for the man page or should we
>> just wait until the patch actually lands?
>>
>> Thanks,
>> NeilBrown
>>
> 
> 
> 



Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent
On 24/08/17 12:07, NeilBrown wrote:
> 
> 
> The more precise details, that automount action for indirect automount
> points is not triggered when the 'browse' option is used, is probably
> not necessary.
> 
> Ian: if you agree with that text, and Michael doesn't provide alternate
> evidence, I'll submit a formal patch for the man page or should we
> just wait until the patch actually lands?

So far only David commented about using ENOENT rather than EREMOTE.

I prefer ENOENT for this case myself and he didn't object when I
explained why, David, any concerns?

Al has been silent so far so either he hasn't seen it or he's ok with
it, Al, any concerns?

And I guess if there are no concerns there's a good chance Andrew is
ok with it for the next merge window, Andrew?

If everyone agrees then we could go ahead immediately so there's a
better chance of getting it into released man pages closer to the
change being merged.

Ian


Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent
On 24/08/17 12:07, NeilBrown wrote:
> On Wed, Aug 23 2017, Ian Kent wrote:
> 
>>
>> That inconsistency has bothered me for quite a while now.
>>
>> It was carried over from the autofs module behavior when automounting
>> support was added to the VFS. What's worse is it prevents the use of
>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>> statx().
>>
>> There is some risk in changing that so it does work but it really does
>> need to work to enable userspace to not trigger an automount by using
>> this flag.
>>
>> So that's (hopefully) going to change soonish, see:
>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>
>> The result should be that stat family calls don't trigger automounts except
>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>
> 
> oooh, yes.  That's much better - thanks.
> 
> We should make sure that change gets into the man pages...

Yes, I was wondering who to contact for that.

> 
> First however, we should probably correct the man page!
> stat.2 says:
> 
> 
>   NOTES
>On Linux, lstat() will generally not trigger  automounter
>action,  whereas  stat()  will  (but  see  the description of
>fstatat() AT_NO_AUTOMOUNT fag, above).
> 
> which is wrong: lstat and stat treat automounts the same.
> @Michael: do you recall why you inserted that text?  The commit message
> in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp
> discussion in NOTES") is not very helpful.
> 
> I propose correcting to
> 
>   NOTES:
>   On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set
>   and will not trigger automounter action for direct automount
>   points, though they may (prior to 4.14) for indirect automount
>   points.

Shouldn't that be "lstat() and stat() act as though AT_NO_AUTOMOUNT is
set ..."

> 
> 
> The more precise details, that automount action for indirect automount
> points is not triggered when the 'browse' option is used, is probably
> not necessary.
> 
> Ian: if you agree with that text, and Michael doesn't provide alternate
> evidence, I'll submit a formal patch for the man page or should we
> just wait until the patch actually lands?

I thought the fstatat() description needed attention too, doubly so with
the AT_NO_AUTOMOUNT change.

The "The fstatat() system call operates in exactly the same way as stat()"
is wrong in the same way as the stat() description was wrong.

After the change fstatat() will trigger automounts if the AT_NO_AUTOMOUNT
flag is not given which is different from lstat() and stat().

The updated NOTE above probably needs to be referred to in order to clarify
what's meant by "in exactly the same way" since that probably refers to the
information returned rather than whether an automount will be done.

Ian


Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent
On 24/08/17 11:21, NeilBrown wrote:
> On Wed, Aug 23 2017, Ian Kent wrote:
> 
>> On 23/08/17 10:32, Ian Kent wrote:
>>> On 23/08/17 09:06, NeilBrown wrote:
>>>> On Mon, Aug 21 2017, Ian Kent wrote:
>>>>
>>>>>>
>>>>>> A mount isn't triggered by kern_path(pathname, 0, &path).
>>>>>> That '0' would need to include one of
>>>>>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>>>>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>>>>>
>>>>>> to trigger an automount (otherwise you just get -EISDIR).
>>>>>
>>>>> It's perfectly sensible to think that but there is a case where a
>>>>> a mount is triggered when using kern_path().
>>>>>
>>>>> The EISDIR return occurs for positive dentrys, negative dentrys
>>>>> will still trigger an automount (which is autofs specific,
>>>>> indirect mount map using nobrowse option, the install default).
>>>>
>>>> Ok, I understand this better now.  This difference between direct and
>>>> indirect mounts is slightly awkward. It is visible from user-space, but
>>>> not elegant to document.
>>>> When you use O_PATH to open a direct automount that has not already been
>>>> triggered, the open returns the underlying directory (and fstatfs
>>>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>>>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>>>> in effect, but it won't when "browse" is in effect.
>>>
>>> That inconsistency has bothered me for quite a while now.
>>>
>>> It was carried over from the autofs module behavior when automounting
>>> support was added to the VFS. What's worse is it prevents the use of
>>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>>> statx().
>>>
>>> There is some risk in changing that so it does work but it really does
>>> need to work to enable userspace to not trigger an automount by using
>>> this flag.
>>>
>>> So that's (hopefully) going to change soonish, see:
>>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>>
>>> The result should be that stat family calls don't trigger automounts except
>>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>>
>>>>
>>>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>>>> essentially what I said in
>>>>
>>>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>>>
>>>> It might be possible to modify automount so that it was more consistent
>>>> - i.e. if the point is triggered by a mkdir has been done, just to the
>>>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>>>> guess that might be racy, and in any case is hard to justify.
>>>>
>>>> Maybe I should change it to be about "direct automounts", and add a note
>>>> that indirect automounts aren't so predictable.
>>>
>>> Right and the semantics should be much more consistent in the near future.
>>> I hope (and expect) this semantic change won't cause problems.
>>>
>>>>
>>>> But back to my original issue of wanting to discard
>>>> kern_path_mountpoint, what would you think of the following approach -
>>>> slight revised from before.
>>>>
>>>> Thanks,
>>>> NeilBrown
>>>>
>>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>>> index beef981aa54f..7663ea82e68d 100644
>>>> --- a/fs/autofs4/autofs_i.h
>>>> +++ b/fs/autofs4/autofs_i.h
>>>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>>>> *autofs4_dentry_ino(struct dentry *dentry)
>>>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>>>   * processes which do manipulations for us in user space sees the raw
>>>>   * filesystem without "magic".)
>>>> + * A process performing certain ioctls can get temporary oz status.
>>>>   */
>>>> +extern struct task_struct *autofs_tmp_oz;
>>>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>>>  {
>&

Re: Do we really need d_weak_revalidate???

2017-08-23 Thread Ian Kent
On 23/08/17 10:54, Ian Kent wrote:
> On 23/08/17 10:40, Ian Kent wrote:
>> On 23/08/17 10:32, Ian Kent wrote:
>>> On 23/08/17 09:06, NeilBrown wrote:
>>>> On Mon, Aug 21 2017, Ian Kent wrote:
>>>>
>>>>>>
>>>>>> A mount isn't triggered by kern_path(pathname, 0, &path).
>>>>>> That '0' would need to include one of
>>>>>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>>>>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>>>>>
>>>>>> to trigger an automount (otherwise you just get -EISDIR).
>>>>>
>>>>> It's perfectly sensible to think that but there is a case where a
>>>>> a mount is triggered when using kern_path().
>>>>>
>>>>> The EISDIR return occurs for positive dentrys, negative dentrys
>>>>> will still trigger an automount (which is autofs specific,
>>>>> indirect mount map using nobrowse option, the install default).
>>>>
>>>> Ok, I understand this better now.  This difference between direct and
>>>> indirect mounts is slightly awkward. It is visible from user-space, but
>>>> not elegant to document.
>>>> When you use O_PATH to open a direct automount that has not already been
>>>> triggered, the open returns the underlying directory (and fstatfs
>>>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>>>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>>>> in effect, but it won't when "browse" is in effect.
>>>
>>> That inconsistency has bothered me for quite a while now.
>>>
>>> It was carried over from the autofs module behavior when automounting
>>> support was added to the VFS. What's worse is it prevents the use of
>>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>>> statx().
>>>
>>> There is some risk in changing that so it does work but it really does
>>> need to work to enable userspace to not trigger an automount by using
>>> this flag.
>>>
>>> So that's (hopefully) going to change soonish, see:
>>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>>
>>> The result should be that stat family calls don't trigger automounts except
>>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>>
>>>>
>>>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>>>> essentially what I said in
>>>>
>>>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>>>
>>>> It might be possible to modify automount so that it was more consistent
>>>> - i.e. if the point is triggered by a mkdir has been done, just to the
>>>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>>>> guess that might be racy, and in any case is hard to justify.
>>>>
>>>> Maybe I should change it to be about "direct automounts", and add a note
>>>> that indirect automounts aren't so predictable.
>>>
>>> Right and the semantics should be much more consistent in the near future.
>>> I hope (and expect) this semantic change won't cause problems.
>>>
>>>>
>>>> But back to my original issue of wanting to discard
>>>> kern_path_mountpoint, what would you think of the following approach -
>>>> slight revised from before.
>>>>
>>>> Thanks,
>>>> NeilBrown
>>>>
>>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>>> index beef981aa54f..7663ea82e68d 100644
>>>> --- a/fs/autofs4/autofs_i.h
>>>> +++ b/fs/autofs4/autofs_i.h
>>>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>>>> *autofs4_dentry_ino(struct dentry *dentry)
>>>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>>>   * processes which do manipulations for us in user space sees the raw
>>>>   * filesystem without "magic".)
>>>> + * A process performing certain ioctls can get temporary oz status.
>>>>   */
>>>> +extern struct task_struct *autofs_tmp_oz;
>>>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>>>  {
>>>>

Re: Do we really need d_weak_revalidate???

2017-08-22 Thread Ian Kent
On 23/08/17 10:40, Ian Kent wrote:
> On 23/08/17 10:32, Ian Kent wrote:
>> On 23/08/17 09:06, NeilBrown wrote:
>>> On Mon, Aug 21 2017, Ian Kent wrote:
>>>
>>>>>
>>>>> A mount isn't triggered by kern_path(pathname, 0, &path).
>>>>> That '0' would need to include one of
>>>>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>>>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>>>>
>>>>> to trigger an automount (otherwise you just get -EISDIR).
>>>>
>>>> It's perfectly sensible to think that but there is a case where a
>>>> a mount is triggered when using kern_path().
>>>>
>>>> The EISDIR return occurs for positive dentrys, negative dentrys
>>>> will still trigger an automount (which is autofs specific,
>>>> indirect mount map using nobrowse option, the install default).
>>>
>>> Ok, I understand this better now.  This difference between direct and
>>> indirect mounts is slightly awkward. It is visible from user-space, but
>>> not elegant to document.
>>> When you use O_PATH to open a direct automount that has not already been
>>> triggered, the open returns the underlying directory (and fstatfs
>>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>>> in effect, but it won't when "browse" is in effect.
>>
>> That inconsistency has bothered me for quite a while now.
>>
>> It was carried over from the autofs module behavior when automounting
>> support was added to the VFS. What's worse is it prevents the use of
>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
>> statx().
>>
>> There is some risk in changing that so it does work but it really does
>> need to work to enable userspace to not trigger an automount by using
>> this flag.
>>
>> So that's (hopefully) going to change soonish, see:
>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
>>
>> The result should be that stat family calls don't trigger automounts except
>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
>>
>>>
>>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>>> essentially what I said in
>>>
>>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>>
>>> It might be possible to modify automount so that it was more consistent
>>> - i.e. if the point is triggered by a mkdir has been done, just to the
>>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>>> guess that might be racy, and in any case is hard to justify.
>>>
>>> Maybe I should change it to be about "direct automounts", and add a note
>>> that indirect automounts aren't so predictable.
>>
>> Right and the semantics should be much more consistent in the near future.
>> I hope (and expect) this semantic change won't cause problems.
>>
>>>
>>> But back to my original issue of wanting to discard
>>> kern_path_mountpoint, what would you think of the following approach -
>>> slight revised from before.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>> index beef981aa54f..7663ea82e68d 100644
>>> --- a/fs/autofs4/autofs_i.h
>>> +++ b/fs/autofs4/autofs_i.h
>>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>>> *autofs4_dentry_ino(struct dentry *dentry)
>>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>>   * processes which do manipulations for us in user space sees the raw
>>>   * filesystem without "magic".)
>>> + * A process performing certain ioctls can get temporary oz status.
>>>   */
>>> +extern struct task_struct *autofs_tmp_oz;
>>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>>  {
>>> -   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
>>> +   return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
>>> +   autofs_tmp_oz == current;
>>>  }
>>>  
>>>  struct inode *autofs4_get_inode(struct super_block *, umode_t);
>>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs

Re: Do we really need d_weak_revalidate???

2017-08-22 Thread Ian Kent
On 23/08/17 10:32, Ian Kent wrote:
> On 23/08/17 09:06, NeilBrown wrote:
>> On Mon, Aug 21 2017, Ian Kent wrote:
>>
>>>>
>>>> A mount isn't triggered by kern_path(pathname, 0, &path).
>>>> That '0' would need to include one of
>>>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>>>
>>>> to trigger an automount (otherwise you just get -EISDIR).
>>>
>>> It's perfectly sensible to think that but there is a case where a
>>> a mount is triggered when using kern_path().
>>>
>>> The EISDIR return occurs for positive dentrys, negative dentrys
>>> will still trigger an automount (which is autofs specific,
>>> indirect mount map using nobrowse option, the install default).
>>
>> Ok, I understand this better now.  This difference between direct and
>> indirect mounts is slightly awkward. It is visible from user-space, but
>> not elegant to document.
>> When you use O_PATH to open a direct automount that has not already been
>> triggered, the open returns the underlying directory (and fstatfs
>> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
>> an indirect automount, it *will* trigger the automount when "nobrowse" is
>> in effect, but it won't when "browse" is in effect.
> 
> That inconsistency has bothered me for quite a while now.
> 
> It was carried over from the autofs module behavior when automounting
> support was added to the VFS. What's worse is it prevents the use of
> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
> statx().
> 
> There is some risk in changing that so it does work but it really does
> need to work to enable userspace to not trigger an automount by using
> this flag.
> 
> So that's (hopefully) going to change soonish, see:
> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch
> 
> The result should be that stat family calls don't trigger automounts except
> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.
> 
>>
>> So we cannot just say "O_PATH doesn't trigger automounts", which is
>> essentially what I said in
>>
>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
>>
>> It might be possible to modify automount so that it was more consistent
>> - i.e. if the point is triggered by a mkdir has been done, just to the
>> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
>> guess that might be racy, and in any case is hard to justify.
>>
>> Maybe I should change it to be about "direct automounts", and add a note
>> that indirect automounts aren't so predictable.
> 
> Right and the semantics should be much more consistent in the near future.
> I hope (and expect) this semantic change won't cause problems.
> 
>>
>> But back to my original issue of wanting to discard
>> kern_path_mountpoint, what would you think of the following approach -
>> slight revised from before.
>>
>> Thanks,
>> NeilBrown
>>
>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>> index beef981aa54f..7663ea82e68d 100644
>> --- a/fs/autofs4/autofs_i.h
>> +++ b/fs/autofs4/autofs_i.h
>> @@ -135,10 +135,13 @@ static inline struct autofs_info 
>> *autofs4_dentry_ino(struct dentry *dentry)
>>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>>   * processes which do manipulations for us in user space sees the raw
>>   * filesystem without "magic".)
>> + * A process performing certain ioctls can get temporary oz status.
>>   */
>> +extern struct task_struct *autofs_tmp_oz;
>>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>>  {
>> -return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
>> +return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
>> +autofs_tmp_oz == current;
>>  }
>>  
>>  struct inode *autofs4_get_inode(struct super_block *, umode_t);
>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>> index dd9f1bebb5a3..d76401669a20 100644
>> --- a/fs/autofs4/dev-ioctl.c
>> +++ b/fs/autofs4/dev-ioctl.c
>> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp,
>>  return 0;
>>  }
>>  
>> +struct task_struct *autofs_tmp_oz;
>> +int kern_path_oz(const char *pathname, int flags, struct pa

Re: Do we really need d_weak_revalidate???

2017-08-22 Thread Ian Kent
On 23/08/17 09:06, NeilBrown wrote:
> On Mon, Aug 21 2017, Ian Kent wrote:
> 
>>>
>>> A mount isn't triggered by kern_path(pathname, 0, &path).
>>> That '0' would need to include one of
>>>   LOOKUP_PARENT | LOOKUP_DIRECTORY |
>>>   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT
>>>
>>> to trigger an automount (otherwise you just get -EISDIR).
>>
>> It's perfectly sensible to think that but there is a case where a
>> a mount is triggered when using kern_path().
>>
>> The EISDIR return occurs for positive dentrys, negative dentrys
>> will still trigger an automount (which is autofs specific,
>> indirect mount map using nobrowse option, the install default).
> 
> Ok, I understand this better now.  This difference between direct and
> indirect mounts is slightly awkward. It is visible from user-space, but
> not elegant to document.
> When you use O_PATH to open a direct automount that has not already been
> triggered, the open returns the underlying directory (and fstatfs
> confirms that it is AUTOFS_SUPER_MAGIC).  When you use O_PATH on
> an indirect automount, it *will* trigger the automount when "nobrowse" is
> in effect, but it won't when "browse" is in effect.

That inconsistency has bothered me for quite a while now.

It was carried over from the autofs module behavior when automounting
support was added to the VFS. What's worse is it prevents the use of
the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with
statx().

There is some risk in changing that so it does work but it really does
need to work to enable userspace to not trigger an automount by using
this flag.

So that's (hopefully) going to change soonish, see:
http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch

The result should be that stat family calls don't trigger automounts except
for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag.

> 
> So we cannot just say "O_PATH doesn't trigger automounts", which is
> essentially what I said in
> 
> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94
> 
> It might be possible to modify automount so that it was more consistent
> - i.e. if the point is triggered by a mkdir has been done, just to the
> mkdir.  If it is triggered after a mkdir has been done, do the mount.  I
> guess that might be racy, and in any case is hard to justify.
> 
> Maybe I should change it to be about "direct automounts", and add a note
> that indirect automounts aren't so predictable.

Right and the semantics should be much more consistent in the near future.
I hope (and expect) this semantic change won't cause problems.

> 
> But back to my original issue of wanting to discard
> kern_path_mountpoint, what would you think of the following approach -
> slight revised from before.
> 
> Thanks,
> NeilBrown
> 
> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
> index beef981aa54f..7663ea82e68d 100644
> --- a/fs/autofs4/autofs_i.h
> +++ b/fs/autofs4/autofs_i.h
> @@ -135,10 +135,13 @@ static inline struct autofs_info 
> *autofs4_dentry_ino(struct dentry *dentry)
>  /* autofs4_oz_mode(): do we see the man behind the curtain?  (The
>   * processes which do manipulations for us in user space sees the raw
>   * filesystem without "magic".)
> + * A process performing certain ioctls can get temporary oz status.
>   */
> +extern struct task_struct *autofs_tmp_oz;
>  static inline int autofs4_oz_mode(struct autofs_sb_info *sbi)
>  {
> - return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp;
> + return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp ||
> + autofs_tmp_oz == current;
>  }
>  
>  struct inode *autofs4_get_inode(struct super_block *, umode_t);
> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
> index dd9f1bebb5a3..d76401669a20 100644
> --- a/fs/autofs4/dev-ioctl.c
> +++ b/fs/autofs4/dev-ioctl.c
> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp,
>   return 0;
>  }
>  
> +struct task_struct *autofs_tmp_oz;
> +int kern_path_oz(const char *pathname, int flags, struct path *path)
> +{
> + static DEFINE_MUTEX(autofs_oz);
> + int err;
> +
> + mutex_lock(&autofs_oz);
> + autofs_tmp_oz = current;
> + err = kern_path(pathname, flags, path);
> + autofs_tmp_oz = NULL;
> + mutex_unlock(&autofs_oz);
> + return err;
> +}
> +

It's simple enough but does look like it will attract criticism as being
a hack!

The kern_path_locked() function

Re: Do we really need d_weak_revalidate???

2017-08-20 Thread Ian Kent
On 21/08/17 14:23, NeilBrown wrote:
> On Fri, Aug 18 2017, Ian Kent wrote:
> 
>> On 18/08/17 13:24, NeilBrown wrote:
>>> On Thu, Aug 17 2017, Ian Kent wrote:
>>>
>>>> On 16/08/17 19:34, Jeff Layton wrote:
>>>>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>>>>>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>>>>>
>>>>>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>>>>>>>> On Fri, Aug 11 2017, Jeff Layton wrote:
>>>>>>>>
>>>>>>>>> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>>>>>>>>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>>>>>>>>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>>>>>>>>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>>>>>>>>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>>>>>>>>>> and added the new dentry operation to NFS dentries  but not to
>>>>>>>>>>> NFSv4
>>>>>>>>>>> dentries.
>>>>>>>>>>>
>>>>>>>>>>> And nobody noticed.
>>>>>>>>>>>
>>>>>>>>>>> Until today.
>>>>>>>>>>>
>>>>>>>>>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>>>>>>>>>> NFS
>>>>>>>>>>> filesystem hangs because the network has been deconfigured.  This
>>>>>>>>>>> makes
>>>>>>>>>>> perfect sense and I suggested a code change to fix the problem.
>>>>>>>>>>> However when a colleague was trying to reproduce the problem to
>>>>>>>>>>> validate
>>>>>>>>>>> the fix, he couldn't.  Then nor could I.
>>>>>>>>>>>
>>>>>>>>>>> The problem is trivially reproducible with NFSv3, and not at all 
>>>>>>>>>>> with
>>>>>>>>>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>>>>>>>>>>
>>>>>>>>>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>>>>>>>>>> has been missing for 4.5 years, and the only time anyone noticed was
>>>>>>>>>>> when the ommission resulted in a better user experience, I do wonder
>>>>>>>>>>> if
>>>>>>>>>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>>>>>>>>>> does
>>>>>>>>>>> it serve?  I couldn't find one.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> NeilBrown
>>>>>>>>>>>
>>>>>>>>>>> For reference, see
>>>>>>>>>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>>>>>>>>>> d_weak_revalidate dentry op")
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> To reproduce the problem at home, on a system that uses systemd:
>>>>>>>>>>> 1/ place (or find) a filesystem image in a file on an NFS 
>>>>>>>>>>> filesystem.
>>>>>>>>>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>>>>>>>>>> 3/ loop-mount the filesystem image read-only somewhere
>>>>>>>>>>> 4/ reboot
>>>>>>>>>>>
>>>>>>>>>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>>>>>>>>>> timeout.
>>>>>>>>>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>>>>>>>>>> shutdown while
>>>>>>>>>>> remounting the nfs filesystem read-only.
>>>>>>>>>>>
>>>>>>>>>>> If you don't use "noac"

Re: Do we really need d_weak_revalidate???

2017-08-17 Thread Ian Kent
On 18/08/17 14:47, Ian Kent wrote:
> On 18/08/17 13:24, NeilBrown wrote:
>> On Thu, Aug 17 2017, Ian Kent wrote:
>>
>>> On 16/08/17 19:34, Jeff Layton wrote:
>>>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>>>>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>>>>
>>>>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>>>>>>> On Fri, Aug 11 2017, Jeff Layton wrote:
>>>>>>>
>>>>>>>> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>>>>>>>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>>>>>>>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>>>>>>>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>>>>>>>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>>>>>>>>> and added the new dentry operation to NFS dentries  but not to
>>>>>>>>>> NFSv4
>>>>>>>>>> dentries.
>>>>>>>>>>
>>>>>>>>>> And nobody noticed.
>>>>>>>>>>
>>>>>>>>>> Until today.
>>>>>>>>>>
>>>>>>>>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>>>>>>>>> NFS
>>>>>>>>>> filesystem hangs because the network has been deconfigured.  This
>>>>>>>>>> makes
>>>>>>>>>> perfect sense and I suggested a code change to fix the problem.
>>>>>>>>>> However when a colleague was trying to reproduce the problem to
>>>>>>>>>> validate
>>>>>>>>>> the fix, he couldn't.  Then nor could I.
>>>>>>>>>>
>>>>>>>>>> The problem is trivially reproducible with NFSv3, and not at all with
>>>>>>>>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>>>>>>>>>
>>>>>>>>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>>>>>>>>> has been missing for 4.5 years, and the only time anyone noticed was
>>>>>>>>>> when the ommission resulted in a better user experience, I do wonder
>>>>>>>>>> if
>>>>>>>>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>>>>>>>>> does
>>>>>>>>>> it serve?  I couldn't find one.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> NeilBrown
>>>>>>>>>>
>>>>>>>>>> For reference, see
>>>>>>>>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>>>>>>>>> d_weak_revalidate dentry op")
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> To reproduce the problem at home, on a system that uses systemd:
>>>>>>>>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>>>>>>>>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>>>>>>>>> 3/ loop-mount the filesystem image read-only somewhere
>>>>>>>>>> 4/ reboot
>>>>>>>>>>
>>>>>>>>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>>>>>>>>> timeout.
>>>>>>>>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>>>>>>>>> shutdown while
>>>>>>>>>> remounting the nfs filesystem read-only.
>>>>>>>>>>
>>>>>>>>>> If you don't use "noac" it can still hang, but only if something
>>>>>>>>>> slows
>>>>>>>>>> down the reboot enough that attributes have timed out by the time
>>>>>>>>>> that
>>>>>>>>>> systemd-shutdown runs.  This happens for our customer.
>>>>>>>>>>
>>>>>>>

Re: Do we really need d_weak_revalidate???

2017-08-17 Thread Ian Kent
On 18/08/17 13:24, NeilBrown wrote:
> On Thu, Aug 17 2017, Ian Kent wrote:
> 
>> On 16/08/17 19:34, Jeff Layton wrote:
>>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>>>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>>>
>>>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>>>>>> On Fri, Aug 11 2017, Jeff Layton wrote:
>>>>>>
>>>>>>> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>>>>>>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>>>>>>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>>>>>>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>>>>>>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>>>>>>>> and added the new dentry operation to NFS dentries  but not to
>>>>>>>>> NFSv4
>>>>>>>>> dentries.
>>>>>>>>>
>>>>>>>>> And nobody noticed.
>>>>>>>>>
>>>>>>>>> Until today.
>>>>>>>>>
>>>>>>>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>>>>>>>> NFS
>>>>>>>>> filesystem hangs because the network has been deconfigured.  This
>>>>>>>>> makes
>>>>>>>>> perfect sense and I suggested a code change to fix the problem.
>>>>>>>>> However when a colleague was trying to reproduce the problem to
>>>>>>>>> validate
>>>>>>>>> the fix, he couldn't.  Then nor could I.
>>>>>>>>>
>>>>>>>>> The problem is trivially reproducible with NFSv3, and not at all with
>>>>>>>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>>>>>>>>
>>>>>>>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>>>>>>>> has been missing for 4.5 years, and the only time anyone noticed was
>>>>>>>>> when the ommission resulted in a better user experience, I do wonder
>>>>>>>>> if
>>>>>>>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>>>>>>>> does
>>>>>>>>> it serve?  I couldn't find one.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> NeilBrown
>>>>>>>>>
>>>>>>>>> For reference, see
>>>>>>>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>>>>>>>> d_weak_revalidate dentry op")
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> To reproduce the problem at home, on a system that uses systemd:
>>>>>>>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>>>>>>>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>>>>>>>> 3/ loop-mount the filesystem image read-only somewhere
>>>>>>>>> 4/ reboot
>>>>>>>>>
>>>>>>>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>>>>>>>> timeout.
>>>>>>>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>>>>>>>> shutdown while
>>>>>>>>> remounting the nfs filesystem read-only.
>>>>>>>>>
>>>>>>>>> If you don't use "noac" it can still hang, but only if something
>>>>>>>>> slows
>>>>>>>>> down the reboot enough that attributes have timed out by the time
>>>>>>>>> that
>>>>>>>>> systemd-shutdown runs.  This happens for our customer.
>>>>>>>>>
>>>>>>>>> If the loop-mounted filesystem is not read-only, you get other
>>>>>>>>> problems.
>>>>>>>>>
>>>>>>>>> We really want systemd to figure out that the loop-mount needs to be
>>>>>>>>> unmounted first

Re: Do we really need d_weak_revalidate???

2017-08-16 Thread Ian Kent
On 16/08/17 19:34, Jeff Layton wrote:
> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>
>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
 On Fri, Aug 11 2017, Jeff Layton wrote:

> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>> Funny story.  4.5 years ago we discarded the FS_REVAL_DOT superblock
>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>> and added the new dentry operation to NFS dentries  but not to
>>> NFSv4
>>> dentries.
>>>
>>> And nobody noticed.
>>>
>>> Until today.
>>>
>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an
>>> NFS
>>> filesystem hangs because the network has been deconfigured.  This
>>> makes
>>> perfect sense and I suggested a code change to fix the problem.
>>> However when a colleague was trying to reproduce the problem to
>>> validate
>>> the fix, he couldn't.  Then nor could I.
>>>
>>> The problem is trivially reproducible with NFSv3, and not at all with
>>> NFSv4.  The reason is the missing d_weak_revalidate.
>>>
>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>> has been missing for 4.5 years, and the only time anyone noticed was
>>> when the ommission resulted in a better user experience, I do wonder
>>> if
>>> we need to.  Can we just discard d_weak_revalidate?  What purpose
>>> does
>>> it serve?  I couldn't find one.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>> For reference, see
>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>> d_weak_revalidate dentry op")
>>>
>>>
>>>
>>> To reproduce the problem at home, on a system that uses systemd:
>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem.
>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>> 3/ loop-mount the filesystem image read-only somewhere
>>> 4/ reboot
>>>
>>> If you choose v4, the reboot will succeed, possibly after a 90second
>>> timeout.
>>> If you choose v3, the reboot will hang indefinitely in systemd-
>>> shutdown while
>>> remounting the nfs filesystem read-only.
>>>
>>> If you don't use "noac" it can still hang, but only if something
>>> slows
>>> down the reboot enough that attributes have timed out by the time
>>> that
>>> systemd-shutdown runs.  This happens for our customer.
>>>
>>> If the loop-mounted filesystem is not read-only, you get other
>>> problems.
>>>
>>> We really want systemd to figure out that the loop-mount needs to be
>>> unmounted first.  I have ideas concerning that, but it is messy.  But
>>> that isn't the only bug here.
>>
>> The main purpose of d_weak_revalidate() was to catch the issues that
>> arise when someone changes the contents of the current working
>> directory or its parent on the server. Since '.' and '..' are treated
>> specially in the lookup code, they would not be revalidated without
>> special treatment. That leads to issues when looking up files as
>> ./ or ../, since the client won't detect that its
>> dcache is stale until it tries to use the cached dentry+inode.
>>
>> The one thing that has changed since its introduction is, I believe,
>> the ESTALE handling in the VFS layer. That might fix a lot of the
>> dcache lookup bugs that were previously handled by d_weak_revalidate().
>> I haven't done an audit to figure out if it actually can handle all of
>> them.
>>
>
> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>
> vfs: allow umount to handle mountpoints without revalidating them

 You say in the comment for that commit:

  but there
 are cases where we do want to revalidate the root of the fs.

 Do you happen to remember what those cases are?

>>>
>>> Not exactly, but I _think_ I might have been assuming that we needed to
>>> ensure that the inode attrs on the root were up to date after the
>>> pathwalk.
>>>
>>> I think that was probably wrong. d_revalidate is really intended to
>>> ensure that the dentry in question still points to the same inode. In
>>> the case of the root of the mount though, we don't really care about the
>>> dentry on the server at all. We're attaching the root of the mount to an
>>> inode and don't care of the dentry name changes. If we do need to ensure
>>> the inode attrs are updated, we'll just revalidate them at that point.
>>>
>
> Possibly the fact that we no longer try to revalidate during unmount
> means that this is no longer necessary?
>
> The original patch that added d_weak_revalidate had

[PATCH 3/5] autofs: use AUTOFS_DEV_IOCTL_SIZE

2017-08-15 Thread Ian Kent
From: Tomohiro Kusumi 

Use a macro which defines misc-dev ioctl parameter size (excluding
a path beyond &path[0]) since it's been used to initialize and copy
this structure ever since it first appeared in 8d7b48e0 in 2008.

(or simply get rid of this if this is just unnecessary abstraction
when all it needs is sizeof(struct autofs_dev_ioctl))

Edit: imk
That's a good point but I'd prefer to keep the macro define.
End edit: imk

Signed-off-by: Tomohiro Kusumi 
Signed-off-by: Ian Kent 
---
 fs/autofs4/dev-ioctl.c |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index ea8b3a1cddd2..b8b66d55266d 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -97,13 +97,13 @@ static struct autofs_dev_ioctl *
 {
struct autofs_dev_ioctl tmp, *res;
 
-   if (copy_from_user(&tmp, in, sizeof(tmp)))
+   if (copy_from_user(&tmp, in, AUTOFS_DEV_IOCTL_SIZE))
return ERR_PTR(-EFAULT);
 
-   if (tmp.size < sizeof(tmp))
+   if (tmp.size < AUTOFS_DEV_IOCTL_SIZE)
return ERR_PTR(-EINVAL);
 
-   if (tmp.size > (PATH_MAX + sizeof(tmp)))
+   if (tmp.size > AUTOFS_DEV_IOCTL_SIZE + PATH_MAX)
return ERR_PTR(-ENAMETOOLONG);
 
res = memdup_user(in, tmp.size);
@@ -133,8 +133,8 @@ static int validate_dev_ioctl(int cmd, struct 
autofs_dev_ioctl *param)
goto out;
}
 
-   if (param->size > sizeof(*param)) {
-   err = invalid_str(param->path, param->size - sizeof(*param));
+   if (param->size > AUTOFS_DEV_IOCTL_SIZE) {
+   err = invalid_str(param->path, param->size - 
AUTOFS_DEV_IOCTL_SIZE);
if (err) {
pr_warn(
  "path string terminator missing for cmd(0x%08x)\n",
@@ -451,7 +451,7 @@ static int autofs_dev_ioctl_requester(struct file *fp,
dev_t devid;
int err = -ENOENT;
 
-   if (param->size <= sizeof(*param)) {
+   if (param->size <= AUTOFS_DEV_IOCTL_SIZE) {
err = -EINVAL;
goto out;
}
@@ -539,7 +539,7 @@ static int autofs_dev_ioctl_ismountpoint(struct file *fp,
unsigned int devid, magic;
int err = -ENOENT;
 
-   if (param->size <= sizeof(*param)) {
+   if (param->size <= AUTOFS_DEV_IOCTL_SIZE) {
err = -EINVAL;
goto out;
}



[PATCH 1/5] autofs: remove unused AUTOFS_IOC_EXPIRE_DIRECT/INDIRECT

2017-08-15 Thread Ian Kent
From: Tomohiro Kusumi 

These are not used by either kernel or userspace, although
AUTOFS_IOC_EXPIRE_DIRECT once seems to have been used by userspace
in around 2006-2008, which was technically just an alias of the
existing ioctl AUTOFS_IOC_EXPIRE_MULTI.

ioctls for autofs are already complicated enough that they could
be removed unless these are staying here to be able to compile
userspace code of certain period of time from a decade ago.

Edit: imk
Yes, this is indeed very old and anything that still uses must
be updated becuase it will be using broken functionality.
End edit: imk

Signed-off-by: Tomohiro Kusumi 
Signed-off-by: Ian Kent 
---
 include/uapi/linux/auto_fs4.h |2 --
 1 file changed, 2 deletions(-)

diff --git a/include/uapi/linux/auto_fs4.h b/include/uapi/linux/auto_fs4.h
index 7c6da423d54e..9453e9a07c9d 100644
--- a/include/uapi/linux/auto_fs4.h
+++ b/include/uapi/linux/auto_fs4.h
@@ -155,8 +155,6 @@ enum {
 };
 
 #define AUTOFS_IOC_EXPIRE_MULTI_IOW(AUTOFS_IOCTL, 
AUTOFS_IOC_EXPIRE_MULTI_CMD, int)
-#define AUTOFS_IOC_EXPIRE_INDIRECT AUTOFS_IOC_EXPIRE_MULTI
-#define AUTOFS_IOC_EXPIRE_DIRECT   AUTOFS_IOC_EXPIRE_MULTI
 #define AUTOFS_IOC_PROTOSUBVER _IOR(AUTOFS_IOCTL, 
AUTOFS_IOC_PROTOSUBVER_CMD, int)
 #define AUTOFS_IOC_ASKUMOUNT   _IOR(AUTOFS_IOCTL, 
AUTOFS_IOC_ASKUMOUNT_CMD, int)
 



[PATCH 2/5] autofs: non functional header inclusion cleanup

2017-08-15 Thread Ian Kent
From: Tomohiro Kusumi 

Having header includes before any macro (without any dependency)
simply looks normal. No reason to have these macros in between.

Signed-off-by: Tomohiro Kusumi 
Signed-off-by: Ian Kent 
---
 fs/autofs4/autofs_i.h |   22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index beef981aa54f..4737615f0eaa 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -11,10 +11,21 @@
 
 #include 
 #include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 /* This is the range of ioctl() numbers we claim as ours */
 #define AUTOFS_IOC_FIRST AUTOFS_IOC_READY
@@ -24,17 +35,6 @@
 #define AUTOFS_DEV_IOCTL_IOC_COUNT \
(AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD - AUTOFS_DEV_IOCTL_VERSION_CMD)
 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
 #ifdef pr_fmt
 #undef pr_fmt
 #endif



[PATCH 5/5] autofs: use unsigned int/long instead of uint/ulong for ioctl args

2017-08-15 Thread Ian Kent
From: Tomohiro Kusumi 

The standard types unsigned int and unsigned long should be used for
.compat_ioctl. autofs is the only fs using uing/ulong for this, and
these are even the only uint/ulong in the entire autofs code.

Drop unneeded long cast in return value of autofs_dev_ioctl_compat().
It's already long.

Signed-off-by: Tomohiro Kusumi 
Signed-off-by: Ian Kent 
---
 fs/autofs4/dev-ioctl.c |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index a990c9d0f893..b7c816f39404 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -93,7 +93,7 @@ static int check_dev_ioctl_version(int cmd, struct 
autofs_dev_ioctl *param)
  * at the end of the struct.
  */
 static struct autofs_dev_ioctl *
-   copy_dev_ioctl(struct autofs_dev_ioctl __user *in)
+copy_dev_ioctl(struct autofs_dev_ioctl __user *in)
 {
struct autofs_dev_ioctl tmp, *res;
 
@@ -705,7 +705,8 @@ static int _autofs_dev_ioctl(unsigned int command,
return err;
 }
 
-static long autofs_dev_ioctl(struct file *file, uint command, ulong u)
+static long autofs_dev_ioctl(struct file *file, unsigned int command,
+unsigned long u)
 {
int err;
 
@@ -714,9 +715,10 @@ static long autofs_dev_ioctl(struct file *file, uint 
command, ulong u)
 }
 
 #ifdef CONFIG_COMPAT
-static long autofs_dev_ioctl_compat(struct file *file, uint command, ulong u)
+static long autofs_dev_ioctl_compat(struct file *file, unsigned int command,
+   unsigned long u)
 {
-   return (long) autofs_dev_ioctl(file, command, (ulong) compat_ptr(u));
+   return autofs_dev_ioctl(file, command, (unsigned long) compat_ptr(u));
 }
 #else
 #define autofs_dev_ioctl_compat NULL



[PATCH 4/5] autofs: drop wrong comment

2017-08-15 Thread Ian Kent
From: Tomohiro Kusumi 

This comment was correct when it was added in 8d7b48e0 in 2008,
but not after 4e44b685 in 2009 which introduced find_autofs_mount().

Signed-off-by: Tomohiro Kusumi 
Signed-off-by: Ian Kent 
---
 fs/autofs4/dev-ioctl.c |5 -
 1 file changed, 5 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index b8b66d55266d..a990c9d0f893 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -258,11 +258,6 @@ static int autofs_dev_ioctl_open_mountpoint(const char 
*name, dev_t devid)
if (err)
goto out;
 
-   /*
-* Find autofs super block that has the device number
-* corresponding to the autofs fs we want to open.
-*/
-
filp = dentry_open(&path, O_RDONLY, current_cred());
path_put(&path);
if (IS_ERR(filp)) {



Re: [PATCH 1/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-08-09 Thread Ian Kent
On 09/08/17 17:51, Ian Kent wrote:
> On 09/08/17 16:39, David Howells wrote:
>> Ian Kent  wrote:
>>
>>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>>> negative dentry case in follow_automount() needs to be changed to
>>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>>> required flags are clear).
>>
>> Should the be EREMOTE instead of ENOENT?
> 
> I thought about that and ended up thinking ENOENT was more sensible
> but I'll look at it again.

I think EREMOTE and ENOENT both are inaccurate.

There's no way to know if the negative dentry corresponds to a valid map
key, and we've seen increasing lookups from userspace applications for
invalid directories, so I'm not sure.

I went with ENOENT but I guess we could use EREMOTE, what's your thinking
on why EREMOTE might be better than ENOENT?

Ian


Re: [PATCH 1/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-08-09 Thread Ian Kent
On 09/08/17 16:39, David Howells wrote:
> Ian Kent  wrote:
> 
>> In order to handle the AT_NO_AUTOMOUNT for both system calls the
>> negative dentry case in follow_automount() needs to be changed to
>> return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
>> required flags are clear).
> 
> Should the be EREMOTE instead of ENOENT?

I thought about that and ended up thinking ENOENT was more sensible
but I'll look at it again.

Thanks
Ian


Re: [PATCH 1/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-08-08 Thread Ian Kent
On 08/08/17 21:11, Colin Walters wrote:
> On Tue, Aug 8, 2017, at 12:26 AM, Ian Kent wrote:
> 
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -3022,8 +3022,7 @@ static inline int vfs_lstat(const char __user *name, 
>> struct kstat *stat)
>>  static inline int vfs_fstatat(int dfd, const char __user *filename,
>>struct kstat *stat, int flags)
>>  {
>> -return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
>> - stat, STATX_BASIC_STATS);
>> +return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
>>  }
>>  static inline int vfs_fstat(int fd, struct kstat *stat)
>>  {
> 
> This is reverting the fstatat() prat of
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=deccf497d804a4c5fca2dbfad2f104675a6f9102
> Which itself seems weird to me - it looks like we were unconditionally
> forcing on AT_NO_AUTOMOUNT regardless of what userspace passed?
> So perhaps a
> Fixes: deccf497d804a4c5fca2dbfad2f104675a6f9102
> is appropriate here?

David posted this at my request.

I asked him to do it because, when I saw this, I thought restoring the semantics
to what they were before they were changed needed to be done as quickly as 
possible.

That was so that I could then work on fixing the AT_NO_AUTOMOUNT not being 
honored
with fstatat(2).

> 
> I understand that for stat()/lstat() we didn't expose the option to userspace,
> so the behavior was...ah, there's this note in man-pages 
> (man-pages-4.09-3.fc26.noarch):
> 
>> On Linux, lstat() will generally not trigger automounter action, whereas 
>> stat() will (but see fstatat(2)).
> 
> I have no idea of the history here, but maybe it makes sense to drop
> the AT_NO_AUTOMOUNT from the vfs_stat() too?
> 

I thought I had talked about the history in the patch description but I guess
it's not clear and isn't detailed enough for people that haven't been close to
the development over time.

Historically stat family calls were not supposed to trigger automounts because
that can easily lead to mount storms that are really bad for large autofs mount
maps. But the mount storm problem was mostly only evident for autofs maps that
used the "browse" option, the non-negative dentry case. The negative dentry
case always triggered an automount regardless of the system call.

Because of the move in user space to mostly always use proc filesystem mount
tables where there can be many more mount entries than were present in the
text based mount tables it's critical to not perform mount callbacks that
aren't absolutely essentially.

At this point that means to me going over the stat(2) system call behavior
and making sure it is only calling back where necessary. That's because that
is where it's expected automounts won't be triggered and so should have least
impact.

So it's the negative dentry handling in follow_automount() you should be
thinking about in terms of impact rather than the actual fstatat(2) change.
If man pages need to change then they need to change.

AFAICT, as I said in the patch description, this should not cause regressions
but I can't be certain. In any case it is in keeping with the historical "stat
family system calls shouldn't trigger automounts" mantra needed from the 
beginning.

Also notice that the negative dentry handling change should only affect autofs
as other kernel uses will be triggering automounts for positive dentrys.

Hopefully this doesn't sound to aggressive, I don't mean it to sound that way.

Ian


[PATCH 2/3] autofs - make disc device user accessible

2017-08-07 Thread Ian Kent
The autofs miscellanous device ioctls that shouldn't require
CAP_SYS_ADMIN need to be accessible to user space applications in
order to be able to get information about autofs mounts.

The module checks capabilities so the miscelaneous device should
be fine with broad permissions.

Signed-off-by: Ian Kent 
Cc: Colin Walters 
Cc: Ondrej Holy 
---
 fs/autofs4/dev-ioctl.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index dd9f1bebb5a3..218a4ecc75cc 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -733,7 +733,8 @@ static const struct file_operations _dev_ioctl_fops = {
 static struct miscdevice _autofs_dev_ioctl_misc = {
.minor  = AUTOFS_MINOR,
.name   = AUTOFS_DEVICE_NAME,
-   .fops   = &_dev_ioctl_fops
+   .fops   = &_dev_ioctl_fops,
+   .mode   = 0644,
 };
 
 MODULE_ALIAS_MISCDEV(AUTOFS_MINOR);



[PATCH 1/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-08-07 Thread Ian Kent
The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
of an automount by the call. But this flag is unconditionally cleared
for all stat family system calls except statx().

stat family system calls have always triggered mount requests for the
negative dentry case in follow_automount() which is intended but prevents
the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.

In order to handle the AT_NO_AUTOMOUNT for both system calls the
negative dentry case in follow_automount() needs to be changed to
return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
required flags are clear).

AFAICT this change doesn't have any noticable side effects and may,
in some use cases (although I didn't see it in testing) prevent
unnecessary callbacks to the automount daemon.

It's also possible that a stat family call has been made with a
path that is in the process of being mounted by some other process.
But stat family calls should return the automount state of the path
as it is "now" so it shouldn't wait for mount completion.

This is the same semantic as the positive dentry case already
handled.

Signed-off-by: Ian Kent 
Cc: David Howells 
Cc: Colin Walters 
Cc: Ondrej Holy 
---
 fs/namei.c |   15 ---
 include/linux/fs.h |3 +--
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ddb6a7c2b3d4..1180f9c58093 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1129,9 +1129,18 @@ static int follow_automount(struct path *path, struct 
nameidata *nd,
 * of the daemon to instantiate them before they can be used.
 */
if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
-  LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
-   path->dentry->d_inode)
-   return -EISDIR;
+  LOOKUP_OPEN | LOOKUP_CREATE |
+  LOOKUP_AUTOMOUNT))) {
+   /* Positive dentry that isn't meant to trigger an
+* automount, EISDIR will allow it to be used,
+* otherwise there's no mount here "now" so return
+* ENOENT.
+*/
+   if (path->dentry->d_inode)
+   return -EISDIR;
+   else
+   return -ENOENT;
+   }
 
if (path->dentry->d_sb->s_user_ns != &init_user_ns)
return -EACCES;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6e1fd5d21248..37c96f52e48e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3022,8 +3022,7 @@ static inline int vfs_lstat(const char __user *name, 
struct kstat *stat)
 static inline int vfs_fstatat(int dfd, const char __user *filename,
  struct kstat *stat, int flags)
 {
-   return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
-stat, STATX_BASIC_STATS);
+   return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
 }
 static inline int vfs_fstat(int fd, struct kstat *stat)
 {



[PATCH 3/3] autofs - make dev ioctl version and ismountpoint user accessible

2017-08-07 Thread Ian Kent
Some of the autofs miscellaneous device ioctls need to be accessable to
user space applications without CAP_SYS_ADMIN to get information about
autofs mounts.

Signed-off-by: Ian Kent 
Cc: Colin Walters 
Cc: Ondrej Holy 
---
 fs/autofs4/dev-ioctl.c  |   12 
 include/uapi/linux/auto_dev-ioctl.h |2 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 218a4ecc75cc..ea8b3a1cddd2 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -628,10 +628,6 @@ static int _autofs_dev_ioctl(unsigned int command,
ioctl_fn fn = NULL;
int err = 0;
 
-   /* only root can play with this */
-   if (!capable(CAP_SYS_ADMIN))
-   return -EPERM;
-
cmd_first = _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST);
cmd = _IOC_NR(command);
 
@@ -640,6 +636,14 @@ static int _autofs_dev_ioctl(unsigned int command,
return -ENOTTY;
}
 
+   /* Only root can use ioctls other than AUTOFS_DEV_IOCTL_VERSION_CMD
+* and AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD
+*/
+   if (cmd != AUTOFS_DEV_IOCTL_VERSION_CMD &&
+   cmd != AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD &&
+   !capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
/* Copy the parameters into kernel space. */
param = copy_dev_ioctl(user);
if (IS_ERR(param))
diff --git a/include/uapi/linux/auto_dev-ioctl.h 
b/include/uapi/linux/auto_dev-ioctl.h
index 744b3d060968..5558db8e6646 100644
--- a/include/uapi/linux/auto_dev-ioctl.h
+++ b/include/uapi/linux/auto_dev-ioctl.h
@@ -16,7 +16,7 @@
 #define AUTOFS_DEVICE_NAME "autofs"
 
 #define AUTOFS_DEV_IOCTL_VERSION_MAJOR 1
-#define AUTOFS_DEV_IOCTL_VERSION_MINOR 0
+#define AUTOFS_DEV_IOCTL_VERSION_MINOR 1
 
 #define AUTOFS_DEV_IOCTL_SIZE  sizeof(struct autofs_dev_ioctl)
 



Re: [PATCH] autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL

2017-06-15 Thread Ian Kent
On Fri, 2017-06-16 at 12:13 +1000, NeilBrown wrote:
> On Thu, Jun 15 2017, Andrew Morton wrote:
> 
> > On Wed, 07 Jun 2017 12:08:38 +1000 NeilBrown  wrote:
> > 
> > > 
> > > If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL
> > > ioctl, autofs4_d_automount() will return
> > >    ERR_PTR(status)
> > > with that status to follow_automount(), which will then
> > > dereference an invalid pointer.
> > > 
> > > So treat a positive status the same as zero, and map
> > > to ENOENT.
> > > 
> > > See comment in systemd src/core/automount.c::automount_send_ready().
> > > 
> > > ...
> > > 
> > > --- a/fs/autofs4/dev-ioctl.c
> > > +++ b/fs/autofs4/dev-ioctl.c
> > > @@ -344,7 +344,7 @@ static int autofs_dev_ioctl_fail(struct file *fp,
> > >   int status;
> > >  
> > >   token = (autofs_wqt_t) param->fail.token;
> > > - status = param->fail.status ? param->fail.status : -ENOENT;
> > > + status = param->fail.status < 0 ? param->fail.status : -ENOENT;
> > >   return autofs4_wait_release(sbi, token, status);
> > >  }
> > 
> > Sounds serious.  Was the absence of a cc:stable deliberate?
> 
> You need CAP_SYS_ADMIN to  get the ioctl even looked at.  Doesn't that
> mean the bug can only be triggered by a process that could easily do
> worse?

Think so, yes.

> 
> Or do containers allow admins to give out CAP_SYS_ADMIN to untrusted
> people??  I haven't been keeping up.

Maybe, with docker root can start a container with --privileged to give the
container admin capabilities. It may be the case that capabilities can be used
now I don't know.

> 
> Given how simple the patch is, it probably makes sense to add a
> cc:stable, just in case.

IMHO it needs to be applied to stable as well.

> 
> Thanks,
> NeilBrown


Re: [PATCH RFC] mnt: umount mounts one by one in umount_tree()

2017-06-13 Thread Ian Kent
On Fri, 2017-05-12 at 00:08 -0700, Andrei Vagin wrote:
> With this patch, we don't try to umount all mounts of a tree together.
> Instead of this we umount them one by one. In this case, we see a significant
> improvement in performance for the worsе case.

Indeed, umount has been very slow for a while now.
Even a moderately large number of mounts (~1) become painfully slow.

Re you still perusing this?
Anything I can do to help?

Eric, what are your thoughts on this latest attempt?

> 
> The reason of this optimization is that umount() can hold namespace_sem
> for a long time, this semaphore is global, so it affects all users.
> Recently Eric W. Biederman added a per mount namespace limit on the
> number of mounts. The default number of mounts allowed per mount
> namespace at 100,000. Currently this value is allowed to construct a tree
> which requires hours to be umounted.
> 
> In a worse case the current complexity of umount_tree() is O(n^3).
> * Enumirate all mounts in a target tree (propagate_umount)
> * Enumirate mounts to find where these changes have to
>   be propagated (mark_umount_candidates)
> * Enumirate mounts to find a requered mount by parent and dentry
>   (__lookup_mnt). __lookup_mnt() searches a mount in m_hash, but
>   the number of mounts is much bigger than a size of the hash.
> 
> The worse case is when all mounts from the tree live in the same shared
> group. In this case we have to enumirate all mounts on each step.
> 
> There is CVE-2016-6213 about this case.
> 
> Here are results for the kernel with this patch
> $ for i in `seq 10 15`; do  unshare -m sh ./run.sh $i; done
> umount -l /mnt/1 ->   0:00.00
> umount -l /mnt/1 ->   0:00.01
> umount -l /mnt/1 ->   0:00.01
> umount -l /mnt/1 ->   0:00.03
> umount -l /mnt/1 ->   0:00.07
> umount -l /mnt/1 ->   0:00.14
> 
> Here are results for the kernel without this patch
> $ for i in `seq 10 15`; do  unshare -m sh ./run.sh $i; done
> umount -l /mnt/1 ->   0:00.04
> umount -l /mnt/1 ->   0:00.17
> umount -l /mnt/1 ->   0:00.75
> umount -l /mnt/1 ->   0:05.96
> umount -l /mnt/1 ->   0:34.40
> umount -l /mnt/1 ->   3:46.27
> 
> And here is a test script:
> $ cat run.sh
> set -e -m
> 
> mount -t tmpfs zdtm /mnt
> mkdir -p /mnt/1 /mnt/2
> mount -t tmpfs zdtm /mnt/1
> mount --make-shared /mnt/1
> mkdir /mnt/1/1
> 
> for i in `seq $1`; do
>   ./mount --bind /mnt/1/1 /mnt/1/1
> done
> 
> echo -n "umount -l /mnt/1 -> "
> /usr/bin/time -f '%E' ./umount -l /mnt/1
> 
> And we need these simple mount and umount tools, because the standard
> ones read /proc/self/mountinfo, but this is extremely slow when we have
> thousands of mounts.
> $ cat mount.c
>  #include 
>  #include 
> 
>  int main(int argc, char **argv)
>  {
>   return mount(argv[2], argv[3], NULL, MS_BIND, NULL);
>  }
> 
> $ cat umount.c
>  #include 
> 
>  int main(int argc, char **argv)
>  {
>   return umount2(argv[2], MNT_DETACH);
>  }
> 
> Here is a previous attempt to optimize this code:
> https://lkml.org/lkml/2016/10/10/495
> 
> Signed-off-by: Andrei Vagin 
> ---
>  fs/namespace.c | 81 +++
> ---
>  1 file changed, 43 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 3bf0cd2..4e6f258 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1474,56 +1474,61 @@ static bool disconnect_mount(struct mount *mnt, enum
> umount_tree_flags how)
>   */
>  static void umount_tree(struct mount *mnt, enum umount_tree_flags how)
>  {
> - LIST_HEAD(tmp_list);
>   struct mount *p;
> + int done = 0;
>  
>   if (how & UMOUNT_PROPAGATE)
>   propagate_mount_unlock(mnt);
>  
>   /* Gather the mounts to umount */
> - for (p = mnt; p; p = next_mnt(p, mnt)) {
> + while (!done) {
> + LIST_HEAD(tmp_list);
> +
> + p = mnt;
> + while (!list_empty(&p->mnt_mounts))
> + p = list_entry(p->mnt_mounts.next, struct mount,
> mnt_child);
> +
>   p->mnt.mnt_flags |= MNT_UMOUNT;
>   list_move(&p->mnt_list, &tmp_list);
> - }
> -
> - /* Hide the mounts from mnt_mounts */
> - list_for_each_entry(p, &tmp_list, mnt_list) {
>   list_del_init(&p->mnt_child);
> - }
>  
> - /* Add propogated mounts to the tmp_list */
> - if (how & UMOUNT_PROPAGATE)
> - propagate_umount(&tmp_list);
> -
> - while (!list_empty(&tmp_list)) {
> - struct mnt_namespace *ns;
> - bool disconnect;
> - p = list_first_entry(&tmp_list, struct mount, mnt_list);
> - list_del_init(&p->mnt_expire);
> - list_del_init(&p->mnt_list);
> - ns = p->mnt_ns;
> - if (ns) {
> - ns->mounts--;
> - __touch_mnt_namespace(ns);
> - }
> - p->mnt_ns = NULL;
> - if (how & UMOUNT_SYNC)
> - p->mnt.mnt_flags |= MNT_SYNC_UMOUNT;
> -
> 

Re: [RFC][PATCH 0/9] Make containers kernel objects

2017-05-29 Thread Ian Kent
On Sat, 2017-05-27 at 17:45 +, Trond Myklebust wrote:
> On Mon, 2017-05-22 at 14:04 -0500, Eric W. Biederman wrote:
> > David Howells  writes:
> > 
> > > Here are a set of patches to define a container object for the
> > > kernel and
> > > to provide some methods to create and manipulate them.
> > > 
> > > The reason I think this is necessary is that the kernel has no idea
> > > how to
> > > direct upcalls to what userspace considers to be a container -
> > > current
> > > Linux practice appears to make a "container" just an arbitrarily
> > > chosen
> > > junction of namespaces, control groups and files, which may be
> > > changed
> > > individually within the "container".
> > > 
> > 
> > I think this might possibly be a useful abstraction for solving the
> > keyring upcalls if it was something created implicitly.
> > 
> > fork_into_container for use by keyring upcalls is currently a
> > security
> > vulnerability as it allows escaping all of a containers cgroups.  But
> > you have that on your list of things to fix.  However you don't have
> > seccomp and a few other things.
> > 
> > Before we had kthreadd in the kernel upcalls always had issues
> > because
> > the code to reset all of the userspace bits and make the forked
> > task suitable for running upcalls was always missing some detail.  It
> > is
> > a very bug-prone kind of idiom that you are talking about.  It is
> > doubly
> > bug-prone because the wrongness is visible to userspace and as such
> > might get become a frozen KABI guarantee.
> > 
> > Let me suggest a concrete alternative:
> > 
> > - At the time of mount observer the mounters user namespace.
> > - Find the mounters pid namespace.
> > - If the mounters pid namespace is owned by the mounters user
> > namespace
> >   walk up the pid namespace tree to the first pid namespace owned by
> >   that user namespace.
> > - If the mounters pid namespace is not owned by the mounters user
> >   namespace fail the mount it is going to need to make upcalls as
> >   will not be possible.
> > - Hold a reference to the pid namespace that was found.
> > 
> > Then when an upcall needs to be made fork a child of the init process
> > of the specified pid namespace.  Or fail if the init process of the
> > pid namespace has died.
> > 
> > That should always work and it does not require keeping expensive
> > state
> > where we did not have it previously.  Further because the semantics
> > are
> > fork a child of a particular pid namespace's init as features get
> > added
> > to the kernel this code remains well defined.
> > 
> > For ordinary request-key upcalls we should be able to use the same
> > rules
> > and just not save/restore things in the kernel.
> > 
> > A huge advantage of my alternative (other than not being a bit-rot
> > magnet) is that it should drop into existing container infrastructure
> > without problems.  The rule for container implementors is simple to
> > use
> > security key infrastructure you need to have created a pid namespace
> > in
> > your user namespace.
> > 
> 
> While this may be part of a solution, I don't see how it can deal with
> issues such as the need to set up an RPCSEC_GSS session on behalf of
> the user. The issue there is that while the mount may have been created
> in a parent namespace, the actual call to kinit to set up a kerberos
> context is likely to have been made inside the container. It may even
> have been done using a completely separate net namespace. So in order
> to set up my RPCSEC_GSS session, I may need to do so from inside the
> user container.
> 
> In that kind of environment, might it perhaps make sense to just allow
> an upcall executable running in the root init namespace to tunnel
> through (using setns()) so it can actually execute in the context of
> the child container? That would keep security policy with the init
> namespace, but would also ensure that the container environment rules
> may be applied if and when appropriate.
> 
> In addition to today's upcall mechanism, we would need the ability to
> pass in the nsproxy (and root directory) for the confined process that
> triggered the upcall and/or the namespace for the mountpoint. I'm
> assuming that could be done by passing in a file descriptor to the
> appropriate /proc entries?

Maybe I don't quite understand what your describing 

But the information needed (I think) is readily available at the time of the
upcall from within the container (it's not clear that really is the case based
on a later discussion). Then the process run in the init namespace can pick that
up and call (a slightly modified) setns().

I went down that burrow for quite a while and ended up leaving it at, as
Biederman so elegantly describes as, a bad half-assed implementation.

In fact I stopped perusing that approach for a couple of reasons, not the least
of which was a lack of in-depth knowledge of all the sub-systems involved (and a
complete lack of useful suggestions or help from others that are famili

[ANNOUNCE] autofs 5.1.3 release

2017-05-26 Thread Ian Kent
Hi all,

It's time for a new release, autofs-5.1.3.

Of particular interest is the addition browse support for amd format
maps.

The browsable_dirs configuration option can be set to either "yes" or
"no" and the pseudo option "browsable" can be used as an option of
mount type "auto" mounts.

Note that the browsable_dirs "full" option and the mount type auto
"fullybrowsabe" option, mostly used for map debugging, cannot be
provided by autofs.

Related changes are the removal of the requirement to include entries
for amd format maps in the master map. The optional amd format map
 mount sections may be used without adding these entries, in the same
way these sections are used in the amd configuration.

This prevents the need to include potentially incompatible entries in
shared master maps in multi-vendor environments.

Something to watch out for is that autofs is no longer tolerant of
misconfigured nsswitch "automount:" entries. If nsswitch sources that
aren't configured are present it can lead to an unconditional 10
second delay at startup with some unexpected log errors. autofs is
probably not going to interfere with nsswitch.conf because it is not
the owner of the file so it needs to be set correctly by people using
autofs. This will often mean just removing the inappropriate "nisplus"
source present in the default installed nsswitch.conf. 

There's two added configuration parameters which hopefully will help
with map source availability problems at startup.

The "master_wait" configuration option (default 10 seconds) will retry
reading the master map for the configured time when certain errors are
encountered (the reason misconfigured nsswitch configurations cause a
delay).

And the option "sss_master_map_wait" (default 0, disabled) has been
added specifically for the sss source. This option may be needed
because the error returned when sss is not yet available to answer
queries can't be distinguished from a real "map does not exist" return.
The default of disabled is because the sss source not being ready
problem has already be resolved in recent versions of sss itself.

Additionally there a number of bug fixes and small improvements.

autofs
==

The package can be found at:
https://www.kernel.org/pub/linux/daemons/autofs/v5/

It is autofs-5.1.3.tar.[gz|xz]

No source rpm is there as it can be produced by using:

rpmbuild -ts autofs-5.1.3.tar.gz

and the binary rpm by using:

rpmbuild -tb autofs-5.1.3.tar.gz

See the README.amd-maps file for information about using amd format
maps.

Here are the entries from the CHANGELOG which outline the updates:

24/05/2016 autofs-5.1.3
===
- fix release date in CHANGELOG.
- build: check for clock_gettime in librt.
- fix compiler warning in try_remount().
- drop redundant \n in logerr().
- Fix size arg of fgets(3).
- fix libtirpc detection with -Wl,--as-needed.
- Fix a typo in CREDITS.
- Change .requestor to .requester for consistency.
- fix file map changed check.
- Remove unused local 2KB buffer.
- Fix typos in error messages.
- Fix fgets(3) size argument (another one).
- fix short memory allocation in lookup_amd_instance().
- fix count_mounts() function.
- configure: add cache variable for Linux proc filesystem check.
- Avoid local variable name shadowing another.
- fix typo in MOUNT_FLAG_GHOST comment.
- fix cachefs parse message not being logged.
- fix argc off by one in mount_autofs.c.
- fix _strncmp() usage.
- fix create_client() RPC client handling.
- update and add README for old autofs schema.
- wait for master map available at start.
- add master read wait option.
- work around sss startup delay.
- add sss master map wait config option.
- fix quoted key handling in sanitize_path().
- fix included master map not found return.
- dont exit on master map read fail timeout.
- set sane default master read wait timeout.
- don't return until after master map retry read.
- make lookup_nss_read_master() return nss status.
- check NFS server availability on local mount fallback.
- check NFS server availability on local mount fallback.
- make set_direct_mount_catatonic() more general.
- set autofs mounts catatonic at exit.
- honor last rw in mount options when doing a bind mount.
- fix typos in README.amd-maps.
- add ref counting to struct map_source.
- add support for amd browsable option.
- add function conf_amd_get_map_name().
- add function conf_amd_get_mount_paths().
- include amd mount sections mounts in master mounts list.
- check for conflicting amd section mounts.
- add function conf_get_map_options().
- capture cache option and its settings during parsing.
- handle map_option cache for top level mounts.
- handle amd cache option all in amd type auto mounts.
- fix bogus check in expire_cleanup().
- delay submount exit for amd submounts.
- add the mount requestor's pid to pending_args.
- create thread-local ID for mount attempts.
- log functions to prefix messages with attempt_id if available.
- factor out set_thread_mount_request_log_id

Re: [RFC][PATCH 0/9] Make containers kernel objects

2017-05-24 Thread Ian Kent
On Wed, 2017-05-24 at 03:26 -0500, Eric W. Biederman wrote:
> 
> So far no one has even bothered to seriously try the one solution that
> is guaranteed to work because it takes a lot of changes to kernel code.
> I believe the last effort snagged on what a pain it is to refactor the
> user mode helper infrastructure.

Yes, that's mostly true in my case although I wouldn't say I haven't looked at
it seriously but equally I haven't got anything towards it yet either, sorry.

I'm likely going to revisit this based on a couple of approaches.

One is just what you describe and I had already been looking at this some time
ago. It seems to me that adding a work queue type that starts and retains a
process until the work queue is destroyed (similar to the way the work queue sub
system starts a fail over thread for use under resource exhaustion) would be a
sensible way to do it.

This doesn't mean I think it's a good idea for reasons I've outlined in the past
but the approach does warrant the effort to work out if it can be used without
problems.

And there's also the request key infrastructure which, as it is now, gets in the
road of verifying results, *sigh*.

Ian


Re: [RFC][PATCH 0/9] Make containers kernel objects

2017-05-23 Thread Ian Kent
On Mon, 2017-05-22 at 12:21 -0700, James Bottomley wrote:
> 
> > > >  (3) nfsdcltrack.  A way for NFSD to access stable storage for 
> > > > tracking of persistent state.  Again, network-namespace 
> > > > dependent, but also perhaps mount-namespace dependent.
> > 
> > Definitely mount-namespace dependent.
> > 
> > > 
> > > So again, given we can set this up to work today, this sounds like 
> > > more a restriction that will bite us than an enhancement that gives 
> > > us extra features.
> > > 
> > 
> > How do you set this up to work today?
> 
> Well, as above, it spawns into the root, you jump it to where it should
> be and re-execute or simply handle in the host. 
> 
> > AFAIK, if you want to run knfsd in a container today, you're out of 
> > luck for any non-trivial configuration.
> 
> Well "running knfsd in a container" is actually different from having a
> containerised nfs export.  My understanding was that thanks to the work
> of Stas Kinsbursky, the latter has mostly worked since the 3.9 kernel
> for v3 and below.  I assume the current issue is that there's a problem
> with v4?

Oh, ok, I thought that, say, a docker (NFS) volumes-from a container to another
container didn't work for any version of NFS.

Certainly didn't work last time I tried, it was a while ago though.

Ian


Re: [RFC][PATCH 0/9] Make containers kernel objects

2017-05-23 Thread Ian Kent
On Mon, 2017-05-22 at 17:22 +0100, David Howells wrote:
> Here are a set of patches to define a container object for the kernel and
> to provide some methods to create and manipulate them.
> 
> The reason I think this is necessary is that the kernel has no idea how to
> direct upcalls to what userspace considers to be a container - current
> Linux practice appears to make a "container" just an arbitrarily chosen
> junction of namespaces, control groups and files, which may be changed
> individually within the "container".
> 
> The kernel upcall mechanism then needs to decide which set of namespaces,
> etc., it must exec the appropriate upcall program.  Examples of this
> include:
> 
>  (1) The DNS resolver.  The DNS cache in the kernel should probably be
>  per-network namespace, but in userspace the program, its libraries and
>  its config data are associated with a mount tree and a user namespace
>  and it gets run in a particular pid namespace.
> 
>  (2) NFS ID mapper.  The NFS ID mapping cache should also probably be
>  per-network namespace.
> 
>  (3) nfsdcltrack.  A way for NFSD to access stable storage for tracking
>  of persistent state.  Again, network-namespace dependent, but also
>  perhaps mount-namespace dependent.
> 
>  (4) General request-key upcalls.  Not particularly namespace dependent,
>  apart from keyrings being somewhat governed by the user namespace and
>  the upcall being configured by the mount namespace.
> 
> These patches are built on top of the mount context patchset so that
> namespaces can be properly propagated over submounts/automounts.
> 
> These patches implement a container object that holds the following things:
> 
>  (1) Namespaces.
> 
>  (2) A root directory.
> 
>  (3) A set of processes, including a designated 'init' process.
> 
>  (4) The creator's credentials, including ownership.
> 
>  (5) A place to hang security for the container, allowing policies to be
>  set per-container.
> 
> I also want to add:
> 
>  (6) Control groups.
> 
>  (7) A per-container keyring that can be added to from outside of the
>  container, even once the container is live, for the provision of
>  filesystem authentication/encryption keys in advance of the container
>  being started.

It's hard to decide which of these has higher priority, I think both essential
to a container implementation.

Ian


Re: [RFC][PATCH 0/9] Make containers kernel objects

2017-05-23 Thread Ian Kent
On Mon, 2017-05-22 at 09:53 -0700, James Bottomley wrote:
> [Added missing cc to containers list]
> On Mon, 2017-05-22 at 17:22 +0100, David Howells wrote:
> > Here are a set of patches to define a container object for the kernel 
> > and to provide some methods to create and manipulate them.
> > 
> > The reason I think this is necessary is that the kernel has no idea 
> > how to direct upcalls to what userspace considers to be a container -
> > current Linux practice appears to make a "container" just an 
> > arbitrarily chosen junction of namespaces, control groups and files, 
> > which may be changed individually within the "container".
> 
> This sounds like a step in the wrong direction: the strength of the
> current container interfaces in Linux is that people who set up
> containers don't have to agree what they look like.  So I can set up a
> user namespace without a mount namespace or an architecture emulation
> container with only a mount namespace.
> 
> But ignoring my fun foibles with containers and to give a concrete
> example in terms of a popular orchestration system: in kubernetes,
> where certain namespaces are shared across pods, do you imagine the
> kernel's view of the "container" to be the pod or what kubernetes
> thinks of as the container?  This is important, because half the
> examples you give below are network related and usually pods share a
> network namespace.
> 
> > The kernel upcall mechanism then needs to decide which set of 
> > namespaces, etc., it must exec the appropriate upcall program. 
> >  Examples of this include:
> > 
> >  (1) The DNS resolver.  The DNS cache in the kernel should probably 
> > be per-network namespace, but in userspace the program, its
> > libraries and its config data are associated with a mount tree and a 
> > user namespace and it gets run in a particular pid namespace.
> 
> All persistent (written to fs data) has to be mount ns associated;
> there are no ifs, ands and buts to that.  I agree this implies that if
> you want to run a separate network namespace, you either take DNS from
> the parent (a lot of containers do) or you set up a daemon to run
> within the mount namespace.  I agree the latter is a slightly fiddly
> operation you have to get right, but that's why we have orchestration
> systems.
> 
> What is it we could do with the above that we cannot do today?
> 
> >  (2) NFS ID mapper.  The NFS ID mapping cache should also probably be
> >  per-network namespace.
> 
> I think this is a view but not the only one:  Right at the moment, NFS
> ID mapping is used as the one of the ways we can get the user namespace
> ID mapping writes to file problems fixed ... that makes it a property
> of the mount namespace for a lot of containers.  There are many other
> instances where they do exactly as you say, but what I'm saying is that
> we don't want to lose the flexibility we currently have.
> 
> >  (3) nfsdcltrack.  A way for NFSD to access stable storage for 
> > tracking of persistent state.  Again, network-namespace dependent, 
> > but also perhaps mount-namespace dependent.
> 
> So again, given we can set this up to work today, this sounds like more
> a restriction that will bite us than an enhancement that gives us extra
> features.
> 
> >  (4) General request-key upcalls.  Not particularly namespace 
> > dependent, apart from keyrings being somewhat governed by the user
> > namespace and the upcall being configured by the mount namespace.
> 
> All mount namespaces have an owning user namespace, so the data
> relations are already there in the kernel, is the problem simply
> finding them?
> 
> > These patches are built on top of the mount context patchset so that
> > namespaces can be properly propagated over submounts/automounts.
> 
> I'll stop here ... you get the idea that I think this is imposing a set
> of restrictions that will come back to bite us later.  If this is just
> for the sake of figuring out how to get keyring upcalls to work, then
> I'm sure we can come up with something.

You talk about a number of things I'm simply not aware of so I can't answer your
questions. But your points do sound like issues that need to be covered.

I think you mentioned user space used NFS ID mapper works fine.
I wonder, could you give more detail on that please.

Perhaps nsenter(1) is being used, I tried that as a possible usermode helper
solution and it probably did "work" in the sense of in container execution but
no-one liked it, it seems kernel folk expect to do things, well, in kernel.

Not only that there were other problems, probably request key sub system not
being namespace aware, or id caching within nfs or somewhere else, and there was
a question of not being able to cater for user namespace usage.

Anyway I do have a different view from my own experiences.

First there are a number of subsystems involved in creating a process from
within a container that has the container environment and, AFAICS (from the
usermode helper experience

[PATCH 2/3] autofs - make dev ioctl version and ismountpoint user accessible

2017-05-09 Thread Ian Kent
Some of the autofs miscellaneous device ioctls need to be accessable to
user space applications without CAP_SYS_ADMIN to get information about
autofs mounts.

Start by making the autofs miscellaneous device ioctl header available
and allow applications to use version and ismountpoint ioctls.

Signed-off-by: Ian Kent 
Cc: Colin Walters 
Cc: Ondrej Holy 
Cc: sta...@vger.kernel.org
---
 fs/autofs4/dev-ioctl.c  |   12 
 include/uapi/linux/Kbuild   |1 +
 include/uapi/linux/auto_dev-ioctl.h |2 +-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 9b58d6e..f8cb3f6 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -628,10 +628,6 @@ static int _autofs_dev_ioctl(unsigned int command,
ioctl_fn fn = NULL;
int err = 0;
 
-   /* only root can play with this */
-   if (!capable(CAP_SYS_ADMIN))
-   return -EPERM;
-
cmd_first = _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST);
cmd = _IOC_NR(command);
 
@@ -640,6 +636,14 @@ static int _autofs_dev_ioctl(unsigned int command,
return -ENOTTY;
}
 
+   /* Only root can use ioctls other than AUTOFS_DEV_IOCTL_VERSION_CMD
+* and AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD
+*/
+   if (cmd != AUTOFS_DEV_IOCTL_VERSION_CMD &&
+   cmd != AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD &&
+   !capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
/* Copy the parameters into kernel space. */
param = copy_dev_ioctl(user);
if (IS_ERR(param))
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 662c592..1f22bbb 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -61,6 +61,7 @@ header-y += atm_zatm.h
 header-y += audit.h
 header-y += auto_fs4.h
 header-y += auto_fs.h
+header-y += auto_dev-ioctl.h
 header-y += auxvec.h
 header-y += ax25.h
 header-y += b1lli.h
diff --git a/include/uapi/linux/auto_dev-ioctl.h 
b/include/uapi/linux/auto_dev-ioctl.h
index 744b3d0..5558db8 100644
--- a/include/uapi/linux/auto_dev-ioctl.h
+++ b/include/uapi/linux/auto_dev-ioctl.h
@@ -16,7 +16,7 @@
 #define AUTOFS_DEVICE_NAME "autofs"
 
 #define AUTOFS_DEV_IOCTL_VERSION_MAJOR 1
-#define AUTOFS_DEV_IOCTL_VERSION_MINOR 0
+#define AUTOFS_DEV_IOCTL_VERSION_MINOR 1
 
 #define AUTOFS_DEV_IOCTL_SIZE  sizeof(struct autofs_dev_ioctl)
 



[PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-05-09 Thread Ian Kent
The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
of an automount by the call. But this flag is unconditionally cleared
for all stat family system calls except statx().

stat family system calls have always triggered mount requests for the
negative dentry case in follow_automount() which is intended but prevents
the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.

In order to handle the AT_NO_AUTOMOUNT for both system calls the
negative dentry case in follow_automount() needs to be changed to
return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
required flags are clear).

AFAICT this change doesn't have any noticable side effects and may,
in some use cases (although I didn't see it in testing) prevent
unnecessary callbacks to the automount daemon.

It's also possible that a stat family call has been made with a
path that is in the process of being mounted by some other process.
But stat family calls should return the automount state of the path
as it is "now" so it shouldn't wait for mount completion.

This is the same semantic as the positive dentry case already
handled.

Signed-off-by: Ian Kent 
Cc: David Howells 
Cc: Colin Walters 
Cc: Ondrej Holy 
Cc: sta...@vger.kernel.org
---
 fs/namei.c |   15 ---
 include/linux/fs.h |3 +--
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 7286f87..cd74838 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1129,9 +1129,18 @@ static int follow_automount(struct path *path, struct 
nameidata *nd,
 * of the daemon to instantiate them before they can be used.
 */
if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
-  LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
-   path->dentry->d_inode)
-   return -EISDIR;
+  LOOKUP_OPEN | LOOKUP_CREATE |
+  LOOKUP_AUTOMOUNT))) {
+   /* Positive dentry that isn't meant to trigger an
+* automount, EISDIR will allow it to be used,
+* otherwise there's no mount here "now" so return
+* ENOENT.
+*/
+   if (path->dentry->d_inode)
+   return -EISDIR;
+   else
+   return -ENOENT;
+   }
 
if (path->dentry->d_sb->s_user_ns != &init_user_ns)
return -EACCES;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 26488b4..be09684 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2935,8 +2935,7 @@ static inline int vfs_lstat(const char __user *name, 
struct kstat *stat)
 static inline int vfs_fstatat(int dfd, const char __user *filename,
  struct kstat *stat, int flags)
 {
-   return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
-stat, STATX_BASIC_STATS);
+   return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
 }
 static inline int vfs_fstat(int fd, struct kstat *stat)
 {



[PATCH 1/3] autofs - make disc device user accessible

2017-05-09 Thread Ian Kent
The autofs miscellanous device ioctls that shouldn't require
CAP_SYS_ADMIN need to be accessible to user space applications in
order to be able to get information about autofs mounts.

The module checks capabilities so the miscelaneous device should
be fine with broad permissions.

Signed-off-by: Ian Kent 
Cc: Colin Walters 
Cc: Ondrej Holy 
Cc: sta...@vger.kernel.org
---
 fs/autofs4/dev-ioctl.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 734cbf8..9b58d6e 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -733,7 +733,8 @@ static const struct file_operations _dev_ioctl_fops = {
 static struct miscdevice _autofs_dev_ioctl_misc = {
.minor  = AUTOFS_MINOR,
.name   = AUTOFS_DEVICE_NAME,
-   .fops   = &_dev_ioctl_fops
+   .fops   = &_dev_ioctl_fops,
+   .mode   = 0666
 };
 
 MODULE_ALIAS_MISCDEV(AUTOFS_MINOR);



Re: [git pull] vfs fixes

2017-04-03 Thread Ian Kent
On Mon, 2017-04-03 at 01:00 -0500, Eric W. Biederman wrote:
> Al Viro  writes:
> 
> > On Sun, Apr 02, 2017 at 05:58:41PM -0700, Linus Torvalds wrote:
> > 
> > > I had to go and double-check that "DCACHE_DIRECTORY_TYPE" is what
> > > d_can_lookup() actually checks, so _that_ part is perhaps a bit
> > > subtle, and might be worth noting in that comment that you edited.
> > > 
> > > So the real "rule" ends up being that we only ever look up things from
> > > dentries of type DCACHE_DIRECTORY_TYPE set, and those had better have
> > > DCACHE_RCUACCESS bit set.
> > > 
> > > And the only reason path_init() only checks it for that case is that
> > > nd->root and nd->pwd both have to be of type d_can_lookup().
> > > 
> > > Do we check that when we set it? I hope/assume we do.
> > 
> > For chdir()/chroot()/pivot_root() it's done by LOOKUP_DIRECTORY in lookup
> > flags; fchdir() is slightly different - there we check S_ISDIR of inode
> > of opened file.  Which is almost the same thing, except for
> > kinda-sorta directories that have no ->lookup() - we have them for
> > NFS referral points.  It should be impossible to end up with
> > one of those opened - not even with O_PATH; follow_managed() will be called
> > and we'll either fail or cross into whatever ends up overmounting them.
> > Still, it might be cleaner to turn that check into
> > d_can_lookup(f.file->f_path.dentry)
> > simply for consistency sake.
> > 
> > The thing I really don't like is mntns_install().  With sufficiently
> > nasty nfsroot setup it might be possible to end up with referral point
> > as one's root/pwd; getting out of such state would be interesting...
> > Smells like that place should be a solitary follow_down(), not a loop
> > of follow_down_one(), but I want Eric's opinion on that one; userns stuff
> > is weird.
> 
> If I read the conversation correctly the concern is that we might
> initialize a pwd or root with something that is almost but not quite a
> directory in mntns_install.
> 
> Refereshing my memory.  d_automount mounts things and is what
> is used for nfs referrals.  d_manage blocks waiting for
> an automounts to complete or expire.  follow_down just calls d_manage,
> follow_manage calls both d_manage and d_automount as appropriate.
> 
> If the concern is nfs referral points calling follow_down is wrong and
> what is wanted is follow_managed.

The case Al was concerned about (sounds like) where the root (or pwd) being
followed is an NFS referral (a similar case could be NFS file system migration
if (when?) being used, and that's probably more likely to be triggered from a
file system root than a referral).

I can't see how that could happen for a referral, but if it did the follow would
need to call d_automount(). It's unlikely ->d_manage() would factor into it but
it is available for use so should be part of it. 

So follow_down() rather than follow_down_one() sounds like the right thing to
do.

> 
> The only thing that follow_down prevents is changing onto directories
> that are only half mounted, and not really directories yet.  Which
> is certainly part of the invarient we want to preserve.
> 
> 
> 
> The intent of the logic in mntns_install is to just pick a reasonable
> looking place somewhere in that mount namespace to use as a root
> directory.  I arbitrarily picked the top of the mount stack on "/".  Which
> is typically used as the root directory.  If people really care where
> their root is they save a directory file descriptor off somewhere and
> call chroot.  So there is a little wiggle room exactly what the code
> does.
> 
> There is a secondary use of mntns_install which is to give you a way to
> access what is under "/" if you are so foolish as to umount "/".  I keep
> thinking setns to your own mount namespace would be a handy way to get
> back to the rootfs and to use it for something during system shutdown.
> I don't know if anyone has actually used setns to your own mount
> namespace for that.
> 
> The worst case I can see from the proposed change is we would
> not be able to umount all of the way down to rootfs.  That
> would be a self inflicted wound so I don't care.
> 
> I can't imagine anyone mounting an automount point deliberately on /
> except as way to confuse the vfs.  Though I can almost imagine getting
> there by accident if an automount expires.
> 
> So yes please let's change the follow_down_one loop to follow_managed
> to preserve the invariant that we always have a directory that
> supports d_can_lookup to pass to set_fs_pwd and set_fs_root.
> 
> Eric
> 
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index 95d71eda8142..05550139a8a6 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -1757,7 +1757,13 @@ static unsigned d_flags_for_inode(struct inode
> > *inode)
> >     return DCACHE_MISS_TYPE;
> >  
> >     if (S_ISDIR(inode->i_mode)) {
> > -   add_flags = DCACHE_DIRECTORY_TYPE;
> > +   /*
> > +    * Any potential starting point of lookup should have

Re: [git pull] vfs fixes

2017-04-03 Thread Ian Kent
On Mon, 2017-04-03 at 01:00 -0500, Eric W. Biederman wrote:
> Al Viro  writes:
> 
> > On Sun, Apr 02, 2017 at 05:58:41PM -0700, Linus Torvalds wrote:
> > 
> > > I had to go and double-check that "DCACHE_DIRECTORY_TYPE" is what
> > > d_can_lookup() actually checks, so _that_ part is perhaps a bit
> > > subtle, and might be worth noting in that comment that you edited.
> > > 
> > > So the real "rule" ends up being that we only ever look up things from
> > > dentries of type DCACHE_DIRECTORY_TYPE set, and those had better have
> > > DCACHE_RCUACCESS bit set.
> > > 
> > > And the only reason path_init() only checks it for that case is that
> > > nd->root and nd->pwd both have to be of type d_can_lookup().
> > > 
> > > Do we check that when we set it? I hope/assume we do.
> > 
> > For chdir()/chroot()/pivot_root() it's done by LOOKUP_DIRECTORY in lookup
> > flags; fchdir() is slightly different - there we check S_ISDIR of inode
> > of opened file.  Which is almost the same thing, except for
> > kinda-sorta directories that have no ->lookup() - we have them for
> > NFS referral points.  It should be impossible to end up with
> > one of those opened - not even with O_PATH; follow_managed() will be called
> > and we'll either fail or cross into whatever ends up overmounting them.
> > Still, it might be cleaner to turn that check into
> > d_can_lookup(f.file->f_path.dentry)
> > simply for consistency sake.
> > 
> > The thing I really don't like is mntns_install().  With sufficiently
> > nasty nfsroot setup it might be possible to end up with referral point
> > as one's root/pwd; getting out of such state would be interesting...
> > Smells like that place should be a solitary follow_down(), not a loop
> > of follow_down_one(), but I want Eric's opinion on that one; userns stuff
> > is weird.
> 
> If I read the conversation correctly the concern is that we might
> initialize a pwd or root with something that is almost but not quite a
> directory in mntns_install.
> 
> Refereshing my memory.  d_automount mounts things and is what
> is used for nfs referrals.  d_manage blocks waiting for
> an automounts to complete or expire.  follow_down just calls d_manage,
> follow_manage calls both d_manage and d_automount as appropriate.

AFAIK d_manage() is only defined by autofs.

It was needed by autofs because the the mount creation and addition is done by
another (user space) thread whereas "normal" file systems like NFS do all the
work in-kernel.

> 
> If the concern is nfs referral points calling follow_down is wrong and
> what is wanted is follow_managed.
> 
> The only thing that follow_down prevents is changing onto directories
> that are only half mounted, and not really directories yet.  Which
> is certainly part of the invarient we want to preserve.
> 
> 
> 
> The intent of the logic in mntns_install is to just pick a reasonable
> looking place somewhere in that mount namespace to use as a root
> directory.  I arbitrarily picked the top of the mount stack on "/".  Which
> is typically used as the root directory.  If people really care where
> their root is they save a directory file descriptor off somewhere and
> call chroot.  So there is a little wiggle room exactly what the code
> does.
> 
> There is a secondary use of mntns_install which is to give you a way to
> access what is under "/" if you are so foolish as to umount "/".  I keep
> thinking setns to your own mount namespace would be a handy way to get
> back to the rootfs and to use it for something during system shutdown.
> I don't know if anyone has actually used setns to your own mount
> namespace for that.
> 
> The worst case I can see from the proposed change is we would
> not be able to umount all of the way down to rootfs.  That
> would be a self inflicted wound so I don't care.
> 
> I can't imagine anyone mounting an automount point deliberately on /
> except as way to confuse the vfs.  Though I can almost imagine getting
> there by accident if an automount expires.
> 
> So yes please let's change the follow_down_one loop to follow_managed
> to preserve the invariant that we always have a directory that
> supports d_can_lookup to pass to set_fs_pwd and set_fs_root.
> 
> Eric
> 
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index 95d71eda8142..05550139a8a6 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -1757,7 +1757,13 @@ static unsigned d_flags_for_inode(struct inode
> > *inode)
> >     return DCACHE_MISS_TYPE;
> >  
> >     if (S_ISDIR(inode->i_mode)) {
> > -   add_flags = DCACHE_DIRECTORY_TYPE;
> > +   /*
> > +    * Any potential starting point of lookup should have
> > +    * DCACHE_RCUACCESS; currently directory dentries
> > +    * come from d_alloc() anyway, but it costs us nothing
> > +    * to enforce it here.
> > +    */
> > +   add_flags = DCACHE_DIRECTORY_TYPE | DCACHE_RCUACCESS;
> >     if (unlikely(!(inode->i_opflags & IOP_LOOKUP))) {
> >     

Re: [PATCH 4/5] fs/autofs4: Fix a dead URL to ftp.kernel.org

2017-03-27 Thread Ian Kent
On Mon, 2017-03-27 at 14:47 +0900, SeongJae Park wrote:
> As ftp.kernel.org is closed [0], this commit fixes a dead URL to
> ftp.kernel.org in fs/autofs4/ to use www.kernel.org instead.
> 
> [0] https://www.kernel.org/shutting-down-ftp-services.html
> 
> Signed-off-by: SeongJae Park 

ACK, and thanks for fixing this.

> ---
>  fs/autofs4/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/autofs4/Kconfig b/fs/autofs4/Kconfig
> index 1204d6384d39..44727bf18297 100644
> --- a/fs/autofs4/Kconfig
> +++ b/fs/autofs4/Kconfig
> @@ -7,7 +7,7 @@ config AUTOFS4_FS
>     automounter (amd), which is a pure user space daemon.
>  
>     To use the automounter you need the user-space tools from
> -   ; you also
> +   ; you also
>     want to answer Y to "NFS file system support", below.
>  
>     To compile this support as a module, choose M here: the module will
> be


[PATCH 4/7] autofs: update ioctl documentation regarding struct autofs_dev_ioctl

2017-01-30 Thread Ian Kent
From: Tomohiro Kusumi 

This is the same as bf72eda5 except that it's a different file.
Sync documentation with changes made by 730c9eec in 2009.

Signed-off-by: Tomohiro Kusumi 
Signed-off-by: Ian Kent 
---
 .../filesystems/autofs4-mount-control.txt  |1 +
 Documentation/filesystems/autofs4.txt  |   32 +---
 2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/Documentation/filesystems/autofs4-mount-control.txt 
b/Documentation/filesystems/autofs4-mount-control.txt
index 50a3e01..e5177cb 100644
--- a/Documentation/filesystems/autofs4-mount-control.txt
+++ b/Documentation/filesystems/autofs4-mount-control.txt
@@ -179,6 +179,7 @@ struct autofs_dev_ioctl {
 * including this struct */
__s32 ioctlfd;  /* automount command fd */
 
+   /* Command parameters */
union {
struct args_protoverprotover;
struct args_protosubver protosubver;
diff --git a/Documentation/filesystems/autofs4.txt 
b/Documentation/filesystems/autofs4.txt
index 7534863..f10dd59 100644
--- a/Documentation/filesystems/autofs4.txt
+++ b/Documentation/filesystems/autofs4.txt
@@ -425,8 +425,20 @@ Each ioctl is passed a pointer to an `autofs_dev_ioctl` 
structure:
  * including this struct */
 __s32 ioctlfd;  /* automount command fd */
 
-__u32 arg1; /* Command parameters */
-__u32 arg2;
+   /* Command parameters */
+   union {
+   struct args_protoverprotover;
+   struct args_protosubver protosubver;
+   struct args_openmount   openmount;
+   struct args_ready   ready;
+   struct args_failfail;
+   struct args_setpipefd   setpipefd;
+   struct args_timeout timeout;
+   struct args_requester   requester;
+   struct args_expire  expire;
+   struct args_askumount   askumount;
+   struct args_ismountpointismountpoint;
+   };
 
 char path[0];
 };
@@ -446,24 +458,22 @@ Commands are:
 set version numbers.
 - **AUTOFS_DEV_IOCTL_OPENMOUNT_CMD**: return an open file descriptor
 on the root of an autofs filesystem.  The filesystem is identified
-by name and device number, which is stored in `arg1`.  Device
-numbers for existing filesystems can be found in
+by name and device number, which is stored in `openmount.devid`.
+Device numbers for existing filesystems can be found in
 `/proc/self/mountinfo`.
 - **AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD**: same as `close(ioctlfd)`.
 - **AUTOFS_DEV_IOCTL_SETPIPEFD_CMD**: if the filesystem is in
 catatonic mode, this can provide the write end of a new pipe
-in `arg1` to re-establish communication with a daemon.  The
-process group of the calling process is used to identify the
+in `setpipefd.pipefd` to re-establish communication with a daemon.
+The process group of the calling process is used to identify the
 daemon.
 - **AUTOFS_DEV_IOCTL_REQUESTER_CMD**: `path` should be a
 name within the filesystem that has been auto-mounted on.
-On successful return, `arg1` and `arg2` will be the UID and GID
-of the process which triggered that mount.
-
+On successful return, `requester.uid` and `requester.gid` will be
+the UID and GID of the process which triggered that mount.
 - **AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD**: Check if path is a
 mountpoint of a particular type - see separate documentation for
 details.
-
 - **AUTOFS_DEV_IOCTL_PROTOVER_CMD**:
 - **AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD**:
 - **AUTOFS_DEV_IOCTL_READY_CMD**:
@@ -473,7 +483,7 @@ Commands are:
 - **AUTOFS_DEV_IOCTL_EXPIRE_CMD**:
 - **AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD**:  These all have the same
 function as the similarly named **AUTOFS_IOC** ioctls, except
-that **FAIL** can be given an explicit error number in `arg1`
+that **FAIL** can be given an explicit error number in `fail.status`
 instead of assuming `ENOENT`, and this **EXPIRE** command
 corresponds to **AUTOFS_IOC_EXPIRE_MULTI**.
 



[PATCH 5/7] autofs: add command enum/macros for root-dir ioctls

2017-01-30 Thread Ian Kent
From: Tomohiro Kusumi 

Sync root-dir ioctl with misc-char-dev ioctl's enum/macro format
since these two types of ioctls aren't completely independent of
each other in terms of command nr. No functional changes.

Signed-off-by: Tomohiro Kusumi 
Signed-off-by: Ian Kent 
---
 include/uapi/linux/auto_dev-ioctl.h |2 --
 include/uapi/linux/auto_fs.h|   25 ++---
 include/uapi/linux/auto_fs4.h   |   16 +++-
 3 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/include/uapi/linux/auto_dev-ioctl.h 
b/include/uapi/linux/auto_dev-ioctl.h
index 388739b..af107aa 100644
--- a/include/uapi/linux/auto_dev-ioctl.h
+++ b/include/uapi/linux/auto_dev-ioctl.h
@@ -156,8 +156,6 @@ enum {
AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD,
 };
 
-#define AUTOFS_IOCTL 0x93
-
 #define AUTOFS_DEV_IOCTL_VERSION \
_IOWR(AUTOFS_IOCTL, \
  AUTOFS_DEV_IOCTL_VERSION_CMD, struct autofs_dev_ioctl)
diff --git a/include/uapi/linux/auto_fs.h b/include/uapi/linux/auto_fs.h
index 1bfc3ed..aa63451 100644
--- a/include/uapi/linux/auto_fs.h
+++ b/include/uapi/linux/auto_fs.h
@@ -61,12 +61,23 @@ struct autofs_packet_expire {
char name[NAME_MAX+1];
 };
 
-#define AUTOFS_IOC_READY  _IO(0x93, 0x60)
-#define AUTOFS_IOC_FAIL   _IO(0x93, 0x61)
-#define AUTOFS_IOC_CATATONIC  _IO(0x93, 0x62)
-#define AUTOFS_IOC_PROTOVER   _IOR(0x93, 0x63, int)
-#define AUTOFS_IOC_SETTIMEOUT32 _IOWR(0x93, 0x64, compat_ulong_t)
-#define AUTOFS_IOC_SETTIMEOUT _IOWR(0x93, 0x64, unsigned long)
-#define AUTOFS_IOC_EXPIRE _IOR(0x93, 0x65, struct autofs_packet_expire)
+#define AUTOFS_IOCTL 0x93
+
+enum {
+   AUTOFS_IOC_READY_CMD = 0x60,
+   AUTOFS_IOC_FAIL_CMD,
+   AUTOFS_IOC_CATATONIC_CMD,
+   AUTOFS_IOC_PROTOVER_CMD,
+   AUTOFS_IOC_SETTIMEOUT_CMD,
+   AUTOFS_IOC_EXPIRE_CMD,
+};
+
+#define AUTOFS_IOC_READY_IO(AUTOFS_IOCTL, AUTOFS_IOC_READY_CMD)
+#define AUTOFS_IOC_FAIL _IO(AUTOFS_IOCTL, AUTOFS_IOC_FAIL_CMD)
+#define AUTOFS_IOC_CATATONIC_IO(AUTOFS_IOCTL, AUTOFS_IOC_CATATONIC_CMD)
+#define AUTOFS_IOC_PROTOVER _IOR(AUTOFS_IOCTL, AUTOFS_IOC_PROTOVER_CMD, 
int)
+#define AUTOFS_IOC_SETTIMEOUT32 _IOWR(AUTOFS_IOCTL, AUTOFS_IOC_SETTIMEOUT_CMD, 
compat_ulong_t)
+#define AUTOFS_IOC_SETTIMEOUT   _IOWR(AUTOFS_IOCTL, AUTOFS_IOC_SETTIMEOUT_CMD, 
unsigned long)
+#define AUTOFS_IOC_EXPIRE   _IOR(AUTOFS_IOCTL, AUTOFS_IOC_EXPIRE_CMD, 
struct autofs_packet_expire)
 
 #endif /* _UAPI_LINUX_AUTO_FS_H */
diff --git a/include/uapi/linux/auto_fs4.h b/include/uapi/linux/auto_fs4.h
index 8f8f1bd..7c6da42 100644
--- a/include/uapi/linux/auto_fs4.h
+++ b/include/uapi/linux/auto_fs4.h
@@ -148,10 +148,16 @@ union autofs_v5_packet_union {
autofs_packet_expire_direct_t expire_direct;
 };
 
-#define AUTOFS_IOC_EXPIRE_MULTI_IOW(0x93, 0x66, int)
-#define AUTOFS_IOC_EXPIRE_INDIRECT AUTOFS_IOC_EXPIRE_MULTI
-#define AUTOFS_IOC_EXPIRE_DIRECT   AUTOFS_IOC_EXPIRE_MULTI
-#define AUTOFS_IOC_PROTOSUBVER _IOR(0x93, 0x67, int)
-#define AUTOFS_IOC_ASKUMOUNT   _IOR(0x93, 0x70, int)
+enum {
+   AUTOFS_IOC_EXPIRE_MULTI_CMD = 0x66, /* AUTOFS_IOC_EXPIRE_CMD + 1 */
+   AUTOFS_IOC_PROTOSUBVER_CMD,
+   AUTOFS_IOC_ASKUMOUNT_CMD = 0x70, /* AUTOFS_DEV_IOCTL_VERSION_CMD - 1 */
+};
+
+#define AUTOFS_IOC_EXPIRE_MULTI_IOW(AUTOFS_IOCTL, 
AUTOFS_IOC_EXPIRE_MULTI_CMD, int)
+#define AUTOFS_IOC_EXPIRE_INDIRECT AUTOFS_IOC_EXPIRE_MULTI
+#define AUTOFS_IOC_EXPIRE_DIRECT   AUTOFS_IOC_EXPIRE_MULTI
+#define AUTOFS_IOC_PROTOSUBVER _IOR(AUTOFS_IOCTL, 
AUTOFS_IOC_PROTOSUBVER_CMD, int)
+#define AUTOFS_IOC_ASKUMOUNT   _IOR(AUTOFS_IOCTL, 
AUTOFS_IOC_ASKUMOUNT_CMD, int)
 
 #endif /* _LINUX_AUTO_FS4_H */



[PATCH 1/7] autofs: remove wrong comment

2017-01-30 Thread Ian Kent
From: Tomohiro Kusumi 

This format seems to have been taken from device mapper header,
but autofs has no such file:function in both kernel and userspace.

Signed-off-by: Tomohiro Kusumi 
Signed-off-by: Ian Kent 
---
 include/uapi/linux/auto_dev-ioctl.h |4 
 1 file changed, 4 deletions(-)

diff --git a/include/uapi/linux/auto_dev-ioctl.h 
b/include/uapi/linux/auto_dev-ioctl.h
index 021ed33..388739b 100644
--- a/include/uapi/linux/auto_dev-ioctl.h
+++ b/include/uapi/linux/auto_dev-ioctl.h
@@ -120,10 +120,6 @@ static inline void init_autofs_dev_ioctl(struct 
autofs_dev_ioctl *in)
in->ioctlfd = -1;
 }
 
-/*
- * If you change this make sure you make the corresponding change
- * to autofs-dev-ioctl.c:lookup_ioctl()
- */
 enum {
/* Get various version info */
AUTOFS_DEV_IOCTL_VERSION_CMD = 0x71,



[PATCH 2/7] autofs: fix typo in Documentation

2017-01-30 Thread Ian Kent
From: Tomohiro Kusumi 

Signed-off-by: Tomohiro Kusumi 
Signed-off-by: Ian Kent 
---
 Documentation/filesystems/autofs4.txt |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/autofs4.txt 
b/Documentation/filesystems/autofs4.txt
index 8fac3fe..a5dc56f 100644
--- a/Documentation/filesystems/autofs4.txt
+++ b/Documentation/filesystems/autofs4.txt
@@ -65,7 +65,7 @@ directory is a mount trap only if the filesystem is mounted 
*direct*
 and the root is empty.
 
 Directories created in the root directory are mount traps only if the
-filesystem is mounted  *indirect* and they are empty.
+filesystem is mounted *indirect* and they are empty.
 
 Directories further down the tree depend on the *maxproto* mount
 option and particularly whether it is less than five or not.
@@ -352,7 +352,7 @@ Communicating with autofs: root directory ioctls
 
 
 The root directory of an autofs filesystem will respond to a number of
-ioctls.   The process issuing the ioctl must have the CAP_SYS_ADMIN
+ioctls.  The process issuing the ioctl must have the CAP_SYS_ADMIN
 capability, or must be the automount daemon.
 
 The available ioctl commands are:
@@ -512,7 +512,7 @@ always be mounted "shared". e.g.
 
 > `mount --make-shared /autofs/mount/point`
 
-The automount daemon is only able to mange a single mount location for
+The automount daemon is only able to manage a single mount location for
 an autofs filesystem and if mounts on that are not 'shared', other
 locations will not behave as expected.  In particular access to those
 other locations will likely result in the `ELOOP` error



[PATCH 3/7] autofs: fix wrong ioctl documentation regarding devid

2017-01-30 Thread Ian Kent
From: Tomohiro Kusumi 

This is the same as d8732841 except that it's a different file.
A caller has no devid input, and devid is obtained via superblock.

Signed-off-by: Tomohiro Kusumi 
Signed-off-by: Ian Kent 
---
 Documentation/filesystems/autofs4.txt |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/autofs4.txt 
b/Documentation/filesystems/autofs4.txt
index a5dc56f..7534863 100644
--- a/Documentation/filesystems/autofs4.txt
+++ b/Documentation/filesystems/autofs4.txt
@@ -457,9 +457,8 @@ Commands are:
 daemon.
 - **AUTOFS_DEV_IOCTL_REQUESTER_CMD**: `path` should be a
 name within the filesystem that has been auto-mounted on.
-arg1 is the dev number of the underlying autofs.  On successful
-return, `arg1` and `arg2` will be the UID and GID of the process
-which triggered that mount.
+On successful return, `arg1` and `arg2` will be the UID and GID
+of the process which triggered that mount.
 
 - **AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD**: Check if path is a
 mountpoint of a particular type - see separate documentation for



[PATCH 7/7] autofs - take more care to not update last_used on path walk

2017-01-30 Thread Ian Kent
GUI environments seem to be becoming more agressive at scanning
filesystems, to the point where autofs cannot expire mounts at
all.

This is one key reason the update of the autofs dentry info
last_used field is done in the expire system when the dentry
is seen to be in use.

But somewhere along the way instances of the update has crept
back into the autofs path walk functions which, with the more
aggressive file access patterns, is preventing expiration.

Changing the update in the path walk functions allows autofs
to at least make progress in spite of frequent immediate
re-mounts from file accesses.

Signed-off-by: Ian Kent 
---
 fs/autofs4/root.c |   17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index a11f731..6ddd4fa 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -281,8 +281,8 @@ static int autofs4_mount_wait(struct dentry *dentry, bool 
rcu_walk)
pr_debug("waiting for mount name=%pd\n", dentry);
status = autofs4_wait(sbi, dentry, NFY_MOUNT);
pr_debug("mount wait done status=%d\n", status);
+   ino->last_used = jiffies;
}
-   ino->last_used = jiffies;
return status;
 }
 
@@ -319,16 +319,21 @@ static struct dentry *autofs4_mountpoint_changed(struct 
path *path)
 */
if (autofs_type_indirect(sbi->type) && d_unhashed(dentry)) {
struct dentry *parent = dentry->d_parent;
-   struct autofs_info *ino;
struct dentry *new;
 
new = d_lookup(parent, &dentry->d_name);
if (!new)
return NULL;
-   ino = autofs4_dentry_ino(new);
-   ino->last_used = jiffies;
-   dput(path->dentry);
-   path->dentry = new;
+   if (new == dentry)
+   dput(new);
+   else {
+   struct autofs_info *ino;
+
+   ino = autofs4_dentry_ino(new);
+   ino->last_used = jiffies;
+   dput(path->dentry);
+   path->dentry = new;
+   }
}
return path->dentry;
 }



[PATCH 6/7] autofs: remove duplicated AUTOFS_DEV_IOCTL_SIZE definition

2017-01-30 Thread Ian Kent
From: Tomohiro Kusumi 

This macro is already defined in uapi header.
Also use this macro where possible.

Signed-off-by: Tomohiro Kusumi 
Signed-off-by: Ian Kent 
---
 fs/autofs4/dev-ioctl.c  |2 --
 include/uapi/linux/auto_dev-ioctl.h |4 ++--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index fc09eb7..35099ac 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -38,8 +38,6 @@
  * which have been left busy at at service shutdown.
  */
 
-#define AUTOFS_DEV_IOCTL_SIZE  sizeof(struct autofs_dev_ioctl)
-
 typedef int (*ioctl_fn)(struct file *, struct autofs_sb_info *,
struct autofs_dev_ioctl *);
 
diff --git a/include/uapi/linux/auto_dev-ioctl.h 
b/include/uapi/linux/auto_dev-ioctl.h
index af107aa..744b3d0 100644
--- a/include/uapi/linux/auto_dev-ioctl.h
+++ b/include/uapi/linux/auto_dev-ioctl.h
@@ -113,10 +113,10 @@ struct autofs_dev_ioctl {
 
 static inline void init_autofs_dev_ioctl(struct autofs_dev_ioctl *in)
 {
-   memset(in, 0, sizeof(struct autofs_dev_ioctl));
+   memset(in, 0, AUTOFS_DEV_IOCTL_SIZE);
in->ver_major = AUTOFS_DEV_IOCTL_VERSION_MAJOR;
in->ver_minor = AUTOFS_DEV_IOCTL_VERSION_MINOR;
-   in->size = sizeof(struct autofs_dev_ioctl);
+   in->size = AUTOFS_DEV_IOCTL_SIZE;
in->ioctlfd = -1;
 }
 



Re: linux-next: manual merge of the akpm tree with the vfs tree

2016-12-12 Thread Ian Kent
On Mon, 2016-12-12 at 16:52 +1100, Stephen Rothwell wrote:
> Hi all,
> 
> Al let me know that he had put a newer version of the autofs patches
> into his vfs tree, so I have dropped the following patches from the akpm
> tree today:
> 
>   vfs: change d_manage() to take a struct path
>   vfs: add path_is_mountpoint() helper
>   vfs: fix boolreturn.cocci warnings
>   vfs: add path_has_submounts()
>   autofs: change autofs4_expire_wait() to take struct path
>   autofs: change autofs4_wait() to take struct path
>   autofs: use path_is_mountpoint() to fix unreliable d_mountpoint() checks
>   autofs: use path_has_submounts() to fix unreliable have_submount() checks
>   vfs: remove unused have_submounts() function
>   vfs: merge path_is_mountpoint() and path_is_mountpoint_rcu()
>   autofs: make struct path const in autofs4_dir_open()
>   autofs: change struct path to const in autofs4_expire_wait() and
> autofs4_wait()
>   vfs: change struct path to const in d_manage()
>   vfs: constify path parameter of path_has_submounts()
>   autofs: don't hold spinlock over direct mount expire
>   vfs: make may_umount_tree() mount propagation aware
>   vfs-make-may_umount_tree-mount-propogation-aware-checkpatch-fixes
> 
> I hope that was the correct ones.

Hi Stephen,

Yes, Al merged a number of the patches, fixed a couple of problems, and rejected
one which I'm still working on.

The list looks right to me.
That's great, thanks,
Ian


Re: [PATCH 1/7] vfs - merge path_is_mountpoint() and path_is_mountpoint_rcu()

2016-12-07 Thread Ian Kent
On Thu, 2016-12-08 at 17:28 +1300, Eric W. Biederman wrote:
> Ian Kent  writes:
> 
> > On Thu, 2016-12-08 at 10:30 +1300, Eric W. Biederman wrote:
> > > Ian Kent  writes:
> > > 
> > > > On Sat, 2016-12-03 at 05:13 +, Al Viro wrote:
> > > > >   FWIW, I've folded that pile into vfs.git#work.autofs.
> > > > > 
> > > > > Problems:
> > > > 
> > > > snip ...
> > > > 
> > > > >   * the last one (propagation-related) is too ugly to live - at
> > > > > the
> > > > > very least, its pieces should live in fs/pnode.c; exposing
> > > > > propagate_next()
> > > > > is simply wrong.  I hadn't picked that one at all, and I would suggest
> > > > > coordinating anything in that area with ebiederman - he has some work
> > > > > around fs/pnode.c and you risk stepping on his toes.
> > > > 
> > > > The earlier patches seem to be ok now so how about we talk a little
> > > > about
> > > > this
> > > > last one.
> > > > 
> > > > Eric, Al mentioned that you are working with fs/pnode.c and recommended
> > > > I
> > > > co-
> > > > ordinate with you.
> > > > 
> > > > So is my working on this this (which is most likely going to live in
> > > > pnode.c
> > > > if
> > > > I can can get something acceptable) going to cause complications for
> > > > you?
> > > > Is what your doing at a point were it would be worth doing as Al
> > > > suggests?
> > > > 
> > > > Anyway, the problem that this patch is supposed to solve is to check if
> > > > any
> > > > of
> > > > the list of mnt_mounts or any of the mounts propagated from each are in
> > > > use.
> > > > 
> > > > One obvious problem with it is the propagation could be very large.
> > > > 
> > > > But now I look at it again there's no reason to have to every tree
> > > > because
> > > > if
> > > > one tree is busy then the the set of trees is busy. But every tree would
> > > > be
> > > > visited if the not busy so it's perhaps still a problem.
> > > > 
> > > > The difficult thing is working out if a tree is busy, more so because
> > > > there
> > > > could be a struct path holding references within any the trees so I
> > > > don't
> > > > know
> > > > of a simpler, more efficient way to check for this.
> > > 
> > > So coordination seems to be in order.  Not so much because of your
> > > current implementation (which won't tell you what you want to know)
> > 
> > Umm ... ok I've missed something then because the patch I posted does appear
> > to
> > get the calculation right. Perhaps I've missed a case where it gets it wrong
> > 
> > 
> > But now I'm looking deeper into it I'm beginning to wonder why it worked for
> > one
> > of the cases. Maybe I need to do some more tests with even more
> > printks.
> 
> So the case that I am concerned about is that if someone happens to do
> mount --bind the mount propagation trees would not just be mirror images
> in single filesystems but have leaves in odd places.
> 
> Consider .../base/one/two (where one and two are automounts) and base
> is the shared mountpoint that is propagated between namespaces.
> 
> If someone does mount --bind .../base/one ../somepath/one in any
> namespace then there will be places where an unmount of "two" will
> propagate to, but that an unmount of "one" won't as their respective
> propagation trees are different.
> 
> Not accounting for different mount points having different propagation
> trees is what I meant when I said your code was wrong.

Yeah, it's all too easy for me to miss important cases like this because such
things are not allowed (or rather not supported, since it can be done) within
automount managed directories. Even without propagation it can be problematic.

But you're right, that's no excuse for a VFS function to not cater for, or at
least handle sensibly, cases like this.

> 
> > > but because an implementation that tells you what you are looking for
> > > has really bad > O(N^2) complexity walking the propagation tree in
> > > the worst case.
> > 
> > Yes it is bad indeed.
> > 
> > > 
> > > To be correct

Re: [PATCH 1/7] vfs - merge path_is_mountpoint() and path_is_mountpoint_rcu()

2016-12-07 Thread Ian Kent
On Thu, 2016-12-08 at 10:30 +1300, Eric W. Biederman wrote:
> Ian Kent  writes:
> 
> > On Sat, 2016-12-03 at 05:13 +, Al Viro wrote:
> > >   FWIW, I've folded that pile into vfs.git#work.autofs.
> > > 
> > > Problems:
> > 
> > snip ...
> > 
> > >   * the last one (propagation-related) is too ugly to live - at the
> > > very least, its pieces should live in fs/pnode.c; exposing
> > > propagate_next()
> > > is simply wrong.  I hadn't picked that one at all, and I would suggest
> > > coordinating anything in that area with ebiederman - he has some work
> > > around fs/pnode.c and you risk stepping on his toes.
> > 
> > The earlier patches seem to be ok now so how about we talk a little about
> > this
> > last one.
> > 
> > Eric, Al mentioned that you are working with fs/pnode.c and recommended I
> > co-
> > ordinate with you.
> > 
> > So is my working on this this (which is most likely going to live in pnode.c
> > if
> > I can can get something acceptable) going to cause complications for you?
> > Is what your doing at a point were it would be worth doing as Al
> > suggests?
> > 
> > Anyway, the problem that this patch is supposed to solve is to check if any
> > of
> > the list of mnt_mounts or any of the mounts propagated from each are in use.
> > 
> > One obvious problem with it is the propagation could be very large.
> > 
> > But now I look at it again there's no reason to have to every tree because
> > if
> > one tree is busy then the the set of trees is busy. But every tree would be
> > visited if the not busy so it's perhaps still a problem.
> > 
> > The difficult thing is working out if a tree is busy, more so because there
> > could be a struct path holding references within any the trees so I don't
> > know
> > of a simpler, more efficient way to check for this.
> 
> So coordination seems to be in order.  Not so much because of your
> current implementation (which won't tell you what you want to know)

Umm ... ok I've missed something then because the patch I posted does appear to
get the calculation right. Perhaps I've missed a case where it gets it wrong


But now I'm looking deeper into it I'm beginning to wonder why it worked for one
of the cases. Maybe I need to do some more tests with even more printks. 

> but because an implementation that tells you what you are looking for
> has really bad > O(N^2) complexity walking the propagation tree in
> the worst case.

Yes it is bad indeed.

> 
> To be correct you code would need to use next_mnt and walk the tree and
> on each mount use propagate_mnt_busy to see if that entry can be
> unmounted.  Possibly with small variations to match your case.  Mounts
> in one part of the tree may propagate to places that mounts in other
> parts of the tree will not.

Right, the point there being use propagate_mnt_busy() for the check.

I tried to use that when I started this but had trouble, I'll have another look
at using it.

I thought the reason I couldn't use propagate_mnt_busy() is because it will see
any mount with submounts as busy and that's not what's need. It's mounts below
each of mounts having submounts that make them busy in my case.

You probably don't want to read about this but, unfortunately, an example is
possibly the best way to describe what I need to achieve.

A simple example for is the autofs direct mount map entry (a so called autofs
multi-mount):

/absolute/path/to/direct/mount-point \
/offset/path/one  server:/export/path/one \
/offset2/path/two server2:/export/path/other

The path /absolute/path/to/direct/mount-point is an autofs direct mount trigger
and is essentially an autofs mount.

When the mount is triggered each of the offsets, /offset/path/one and
/offset2/path/two offset trigger mounts are mounted so they can trigger mounts
independently.

When the direct mount /absolute/path/to/direct/mount-point is expired the two
offsets need to be umounted if they are not busy so
/absolute/path/to/direct/mount-point having submounts doesn't necessarily make
it unable to be umounted.

But this is (supposed) to be for only to one level of mounts, for example if I
had:

/absolute/path/to/direct/mount-point \
/offset/path/one    server:/export/path/one \
/offset2/path/two   server2:/export/path/other \
/offset/path/one/nested server3:/export/path3/nested

then the the offsets below the nesting point /offset/path/one make it busy and
need to be expired first.

So why would I even need to do this?

It's functionality of Sun autofs maps and that's the primary map format autofs
supports

Re: [PATCH 1/7] vfs - merge path_is_mountpoint() and path_is_mountpoint_rcu()

2016-12-06 Thread Ian Kent
On Sat, 2016-12-03 at 05:13 +, Al Viro wrote:
>   FWIW, I've folded that pile into vfs.git#work.autofs.
> 
> Problems:

snip ...

>   * the last one (propagation-related) is too ugly to live - at the
> very least, its pieces should live in fs/pnode.c; exposing propagate_next()
> is simply wrong.  I hadn't picked that one at all, and I would suggest
> coordinating anything in that area with ebiederman - he has some work
> around fs/pnode.c and you risk stepping on his toes.

The earlier patches seem to be ok now so how about we talk a little about this
last one.

Eric, Al mentioned that you are working with fs/pnode.c and recommended I co-
ordinate with you.

So is my working on this this (which is most likely going to live in pnode.c if
I can can get something acceptable) going to cause complications for you?
Is what your doing at a point were it would be worth doing as Al suggests?

Anyway, the problem that this patch is supposed to solve is to check if any of
the list of mnt_mounts or any of the mounts propagated from each are in use.

One obvious problem with it is the propagation could be very large.

But now I look at it again there's no reason to have to every tree because if
one tree is busy then the the set of trees is busy. But every tree would be
visited if the not busy so it's perhaps still a problem.

The difficult thing is working out if a tree is busy, more so because there
could be a struct path holding references within any the trees so I don't know
of a simpler, more efficient way to check for this.

Anyone have any suggestions at all?
Ian


Re: [PATCH 1/7] vfs - merge path_is_mountpoint() and path_is_mountpoint_rcu()

2016-12-04 Thread Ian Kent
On Sun, 2016-12-04 at 10:18 +0800, Ian Kent wrote:
> On Sat, 2016-12-03 at 23:29 +, Al Viro wrote:
> > 
> > On Sat, Dec 03, 2016 at 05:13:22AM +, Al Viro wrote:
> > > 
> > > 
> > >   * path_has_submounts() is broken.  At the very least, it's
> > > AB-BA between mount_lock and rename_lock.  I would suggest trying to
> > > put read_seqlock_excl(&mount_lock) around the call of d_walk() in there,
> > > and using __lookup_mnt() in the callback (without retries on the
> > > mount_lock,
> > > of course - read_seqlock_excl done on the outside is enough).  I'm not
> > > sure
> > > if it won't cause trouble with contention, though; that needs testing.  As
> > > it is, that function is broken in #work.autofs, same as it is in -mm and
> > > -next.
> > Fix for path_has_submounts() (as above) force-pushed.  It does
> > need testing and profiling, obviously.
> I'll run my tests against it and re-run with oprofile if all goes well.
> 
> The submount-test I use should show contention if it's a problem but I'm not
> sure the number of mounts used will be sufficient to show up scale problems.
> 
> Basically each case of the test (there are two) runs for 100 iterations using
> 10
> processes with timing set to promote expire to mount contention, mainly to
> test
> for expire to mount races.
> 
> If I don't see contention I might need to analyse further whether the test has
> adequate coverage.

I have run my usual tests on a build of vfs.get#work.autofs.

Both tests ran without problem multiple times (even though they should be
deterministic experience had shown they sometimes aren't).

I also did a system wide oprofile run of an additional run of the submount-test.

I hadn't noticed that the submount-test runs are shorter that I have come to
expect. This might be due to changes to the behaviour of the expire and mount
timing over time.  But I always check that I'm seeing expire and mount behaviour
that should lead to contention during the test run and that looked as expected.

The run time was about 55 minutes for each of the two cases I test whereas I had
come to expect a run time of around 70 minutes each. It's been quite a while
since I've actually paid attention to this and a lot has changed in the VFS
since.

It might be due to Neil Browns' changes to satisfy path walks in rcu-walk mode
where possible rather than always dropping into ref-walk mode as I didn't
profile those changes when they were implemented because I didn't notice
unexpectedly different run times when testing the changes.

I was going to talk about why the autofs usage of path_has_submounts() should
not be problematic but that would be tedious for everyone, instead I'll just say
that, clearly it is possible to abuse path_has_submounts() by calling it on
mount points that have a large number of directories, but autofs shouldn't do
that and the profile verifies this.

I'm not sure how accurate the profile is (I don't do it very often). There were
about 40 out of 2200 samples missed.

And hopefully the report output is readable enough to be useful after posting


Anyway, a system wide opreport (such as it is) showed this:

CPU: Intel Haswell microarchitecture, speed 3700 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask 
of 0x00 (No unit mask) count 10
samples  %image name   app name symbol name
3897186  17.6585  libc-2.22.so operf
__strcmp_sse2_unaligned
3125714  14.1629  libstdc++.so.6.0.21  operf
std::_Rb_tree_increment(std::_Rb_tree_node_base const*)
1500262   6.7978  libsqlite3.so.0.8.6  evolution
/usr/lib64/libsqlite3.so.0.8.6
8562623.8798  operfoperf
/usr/bin/operf
7496033.3965  libglib-2.0.so.0.4600.2  evolution
/usr/lib64/libglib-2.0.so.0.4600.2
5608122.5411  libdbus-1.so.3.14.8  dbus-daemon  
/usr/lib64/libdbus-1.so.3.14.8
3782271.7138  systemd  systemd  
/usr/lib/systemd/systemd
3011751.3647  libcamel-1.2.so.54.0.0   evolution
/usr/lib64/libcamel-1.2.so.54.0.0
2235451.0129  dbus-daemon  dbus-daemon  
/usr/bin/dbus-daemon
1877830.8509  libc-2.22.so dbus-daemon  
__strcmp_sse2_unaligned
1575640.7139  libc-2.22.so evolution
__strcmp_sse2_unaligned
1572180.7124  libc-2.22.so evolution_int_malloc
1560890.7073  libgobject-2.0.so.0.4600.2 evolution
/usr/lib64/libgobject-2.0.so.0.4600.2
1162770.5269  libc-2.22.so 

Re: [PATCH 1/7] vfs - merge path_is_mountpoint() and path_is_mountpoint_rcu()

2016-12-03 Thread Ian Kent
On Sat, 2016-12-03 at 23:29 +, Al Viro wrote:
> On Sat, Dec 03, 2016 at 05:13:22AM +, Al Viro wrote:
> > 
> > * path_has_submounts() is broken.  At the very least, it's
> > AB-BA between mount_lock and rename_lock.  I would suggest trying to
> > put read_seqlock_excl(&mount_lock) around the call of d_walk() in there,
> > and using __lookup_mnt() in the callback (without retries on the mount_lock,
> > of course - read_seqlock_excl done on the outside is enough).  I'm not sure
> > if it won't cause trouble with contention, though; that needs testing.  As
> > it is, that function is broken in #work.autofs, same as it is in -mm and
> > -next.
>   Fix for path_has_submounts() (as above) force-pushed.  It does
> need testing and profiling, obviously.

I'll run my tests against it and re-run with oprofile if all goes well.

The submount-test I use should show contention if it's a problem but I'm not
sure the number of mounts used will be sufficient to show up scale problems.

Basically each case of the test (there are two) runs for 100 iterations using 10
processes with timing set to promote expire to mount contention, mainly to test
for expire to mount races.

If I don't see contention I might need to analyse further whether the test has
adequate coverage.

Ian


Re: [PATCH 1/7] vfs - merge path_is_mountpoint() and path_is_mountpoint_rcu()

2016-12-03 Thread Ian Kent
On Sat, 2016-12-03 at 05:13 +, Al Viro wrote:
>   FWIW, I've folded that pile into vfs.git#work.autofs.
> 
> Problems:
>   * (fixed) __path_is_mountpoint() should _not_ treat NULL from
> __lookup_mnt() as "nothing's mounted there" until it has checked
> that mount_lock hadn't been touched - mount --move on something unrelated
> can race with lockless hash lookup and lead to false negatives.

Right, looking at what you've done there the mistake is so obvious now!
Thanks for helping with it.

>   * linux/mount.h might be the wrong place for path_is_mountpoint().
> Or it shouldn't be inlined.  I don't like the includes you've added there.
>   * path_has_submounts() is broken.  At the very least, it's
> AB-BA between mount_lock and rename_lock.  I would suggest trying to
> put read_seqlock_excl(&mount_lock) around the call of d_walk() in there,
> and using __lookup_mnt() in the callback (without retries on the mount_lock,
> of course - read_seqlock_excl done on the outside is enough).  I'm not sure
> if it won't cause trouble with contention, though; that needs testing.  As
> it is, that function is broken in #work.autofs, same as it is in -mm and
> -next.

Umm ... that's a much more obvious dumb mistake and what you've done there
didn't occur to me even after you spelled it out, I'll take some time to digest
it. And thanks for that one too.

Ian


Re: [PATCH 1/7] vfs - merge path_is_mountpoint() and path_is_mountpoint_rcu()

2016-11-30 Thread Ian Kent
On Wed, 2016-11-30 at 14:22 -0800, Andrew Morton wrote:
> So a far as I can tell, this patch series is intended to
> address Al's review comments against the
> http://lkml.kernel.org/r/20161011053352.27645.83962.st...@pluto.themaw.net
> series?

That's right and also to fix an additional problem I found in subsequent testing
(the may_umount_tree() changes).

Ian


[PATCH 4/7] vfs - change struct path to const in d_manage()

2016-11-27 Thread Ian Kent
From: Ian Kent 

The ->d_manage() function is meant to be used to check if an
automount is in progress and to block if needed before the
mount is followed.

It shouldn't need to modify the passed struct path.

Make that usage explicit by changing ->d_manage() path parameter
to a const.

Signed-off-by: Ian Kent 
Cc: Al Viro 
Cc: Eric W. Biederman 
Cc: Omar Sandoval 
---
 Documentation/filesystems/Locking |2 +-
 Documentation/filesystems/vfs.txt |2 +-
 fs/autofs4/root.c |4 ++--
 fs/namei.c|2 +-
 include/linux/dcache.h|2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/filesystems/Locking 
b/Documentation/filesystems/Locking
index 08c9e56..ace63cd 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -20,7 +20,7 @@ prototypes:
void (*d_iput)(struct dentry *, struct inode *);
char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen);
struct vfsmount *(*d_automount)(struct path *path);
-   int (*d_manage)(struct path *, bool);
+   int (*d_manage)(const struct path *, bool);
struct dentry *(*d_real)(struct dentry *, const struct inode *,
 unsigned int);
 
diff --git a/Documentation/filesystems/vfs.txt 
b/Documentation/filesystems/vfs.txt
index e54e992..3893f4d 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -948,7 +948,7 @@ struct dentry_operations {
void (*d_iput)(struct dentry *, struct inode *);
char *(*d_dname)(struct dentry *, char *, int);
struct vfsmount *(*d_automount)(struct path *);
-   int (*d_manage)(struct path *, bool);
+   int (*d_manage)(const struct path *, bool);
struct dentry *(*d_real)(struct dentry *, const struct inode *,
 unsigned int);
 };
diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index 859eef9..08718d3 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -32,7 +32,7 @@ static int autofs4_dir_open(struct inode *inode, struct file 
*file);
 static struct dentry *autofs4_lookup(struct inode *,
 struct dentry *, unsigned int);
 static struct vfsmount *autofs4_d_automount(struct path *);
-static int autofs4_d_manage(struct path *, bool);
+static int autofs4_d_manage(const struct path *, bool);
 static void autofs4_dentry_release(struct dentry *);
 
 const struct file_operations autofs4_root_operations = {
@@ -426,7 +426,7 @@ static struct vfsmount *autofs4_d_automount(struct path 
*path)
return NULL;
 }
 
-static int autofs4_d_manage(struct path *path, bool rcu_walk)
+static int autofs4_d_manage(const struct path *path, bool rcu_walk)
 {
struct dentry *dentry = path->dentry;
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
diff --git a/fs/namei.c b/fs/namei.c
index 1669c93d..8e9a358 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1263,7 +1263,7 @@ int follow_down_one(struct path *path)
 }
 EXPORT_SYMBOL(follow_down_one);
 
-static inline int managed_dentry_rcu(struct path *path)
+static inline int managed_dentry_rcu(const struct path *path)
 {
return (path->dentry->d_flags & DCACHE_MANAGE_TRANSIT) ?
path->dentry->d_op->d_manage(path, true) : 0;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 44a9a9b..f835bd4 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -139,7 +139,7 @@ struct dentry_operations {
void (*d_iput)(struct dentry *, struct inode *);
char *(*d_dname)(struct dentry *, char *, int);
struct vfsmount *(*d_automount)(struct path *);
-   int (*d_manage)(struct path *, bool);
+   int (*d_manage)(const struct path *, bool);
struct dentry *(*d_real)(struct dentry *, const struct inode *,
 unsigned int);
 } cacheline_aligned;



[PATCH 3/7] autofs - change struct path to const in autofs4_expire_wait() and autofs4_wait()

2016-11-27 Thread Ian Kent
From: Ian Kent 

The functions autofs4_expire_wait() and autofs4_wait() don't modify
the passed struct path so change it to a const.

Signed-off-by: Ian Kent 
Cc: Al Viro 
Cc: Eric W. Biederman 
Cc: Omar Sandoval 
---
 fs/autofs4/autofs_i.h |5 +++--
 fs/autofs4/expire.c   |4 ++--
 fs/autofs4/root.c |6 +++---
 fs/autofs4/waitq.c|4 ++--
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index 14cef41..c885daa 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -145,7 +145,7 @@ void autofs4_free_ino(struct autofs_info *);
 
 /* Expiration */
 int is_autofs4_dentry(struct dentry *);
-int autofs4_expire_wait(struct path *path, int rcu_walk);
+int autofs4_expire_wait(const struct path *path, int rcu_walk);
 int autofs4_expire_run(struct super_block *, struct vfsmount *,
   struct autofs_sb_info *,
   struct autofs_packet_expire __user *);
@@ -217,7 +217,8 @@ static inline int autofs_prepare_pipe(struct file *pipe)
 
 /* Queue management functions */
 
-int autofs4_wait(struct autofs_sb_info *, struct path *, enum autofs_notify);
+int autofs4_wait(struct autofs_sb_info *,
+const struct path *, enum autofs_notify);
 int autofs4_wait_release(struct autofs_sb_info *, autofs_wqt_t, int);
 void autofs4_catatonic_mode(struct autofs_sb_info *);
 
diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index a37ba40..13178bf 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -495,7 +495,7 @@ struct dentry *autofs4_expire_indirect(struct super_block 
*sb,
return expired;
 }
 
-int autofs4_expire_wait(struct path *path, int rcu_walk)
+int autofs4_expire_wait(const struct path *path, int rcu_walk)
 {
struct dentry *dentry = path->dentry;
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
@@ -593,7 +593,7 @@ int autofs4_do_expire_multi(struct super_block *sb, struct 
vfsmount *mnt,
 
if (dentry) {
struct autofs_info *ino = autofs4_dentry_ino(dentry);
-   struct path path = { .mnt = mnt, .dentry = dentry };
+   const struct path path = { .mnt = mnt, .dentry = dentry };
 
/* This is synchronous because it makes the daemon a
 * little easier
diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index 3e33f00..859eef9 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -272,7 +272,7 @@ static struct dentry *autofs4_lookup_expiring(struct dentry 
*dentry,
return NULL;
 }
 
-static int autofs4_mount_wait(struct path *path, bool rcu_walk)
+static int autofs4_mount_wait(const struct path *path, bool rcu_walk)
 {
struct autofs_sb_info *sbi = autofs4_sbi(path->dentry->d_sb);
struct autofs_info *ino = autofs4_dentry_ino(path->dentry);
@@ -289,7 +289,7 @@ static int autofs4_mount_wait(struct path *path, bool 
rcu_walk)
return status;
 }
 
-static int do_expire_wait(struct path *path, bool rcu_walk)
+static int do_expire_wait(const struct path *path, bool rcu_walk)
 {
struct dentry *dentry = path->dentry;
struct dentry *expiring;
@@ -300,7 +300,7 @@ static int do_expire_wait(struct path *path, bool rcu_walk)
if (!expiring)
return autofs4_expire_wait(path, rcu_walk);
else {
-   struct path this = { .mnt = path->mnt, .dentry = expiring };
+   const struct path this = { .mnt = path->mnt, .dentry = expiring 
};
/*
 * If we are racing with expire the request might not
 * be quite complete, but the directory has been removed
diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
index e7a037f..1278335 100644
--- a/fs/autofs4/waitq.c
+++ b/fs/autofs4/waitq.c
@@ -250,7 +250,7 @@ autofs4_find_wait(struct autofs_sb_info *sbi, const struct 
qstr *qstr)
 static int validate_request(struct autofs_wait_queue **wait,
struct autofs_sb_info *sbi,
const struct qstr *qstr,
-   struct path *path, enum autofs_notify notify)
+   const struct path *path, enum autofs_notify notify)
 {
struct dentry *dentry = path->dentry;
struct autofs_wait_queue *wq;
@@ -349,7 +349,7 @@ static int validate_request(struct autofs_wait_queue **wait,
 }
 
 int autofs4_wait(struct autofs_sb_info *sbi,
-struct path *path, enum autofs_notify notify)
+const struct path *path, enum autofs_notify notify)
 {
struct dentry *dentry = path->dentry;
struct autofs_wait_queue *wq;



[PATCH 6/7] autofs - dont hold spin lock over direct mount expire

2016-11-27 Thread Ian Kent
From: Ian Kent 

Commit 7cbdb4a286 altered the autofs indirect mount expire to
not hold a spin lock during the expire check.

The direct mount expire needs the same treatment because to
make autofs expires namespace aware may_umount_tree() needs to
to use a similar method to may_umount() when checking if a mount
tree is in use.

This means may_umount_tree() will end up taking the namespace_sem
for the check so the autofs direct mount expire won't be allowed
to hold a spin lock over the check.

Signed-off-by: Ian Kent 
Cc: Al Viro 
Cc: Eric W. Biederman 
Cc: Omar Sandoval 
---
 fs/autofs4/expire.c |   17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index 13178bf..57725d4 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -310,26 +310,29 @@ struct dentry *autofs4_expire_direct(struct super_block 
*sb,
now = jiffies;
timeout = sbi->exp_timeout;
 
-   spin_lock(&sbi->fs_lock);
-   ino = autofs4_dentry_ino(root);
-   /* No point expiring a pending mount */
-   if (ino->flags & AUTOFS_INF_PENDING)
-   goto out;
if (!autofs4_direct_busy(mnt, root, timeout, do_now)) {
+   spin_lock(&sbi->fs_lock);
+   ino = autofs4_dentry_ino(root);
+   /* No point expiring a pending mount */
+   if (ino->flags & AUTOFS_INF_PENDING) {
+   spin_unlock(&sbi->fs_lock);
+   goto out;
+   }
ino->flags |= AUTOFS_INF_WANT_EXPIRE;
spin_unlock(&sbi->fs_lock);
synchronize_rcu();
-   spin_lock(&sbi->fs_lock);
if (!autofs4_direct_busy(mnt, root, timeout, do_now)) {
+   spin_lock(&sbi->fs_lock);
ino->flags |= AUTOFS_INF_EXPIRING;
init_completion(&ino->expire_complete);
spin_unlock(&sbi->fs_lock);
return root;
}
+   spin_lock(&sbi->fs_lock);
ino->flags &= ~AUTOFS_INF_WANT_EXPIRE;
+   spin_unlock(&sbi->fs_lock);
}
 out:
-   spin_unlock(&sbi->fs_lock);
dput(root);
 
return NULL;



[PATCH 5/7] vfs - constify path parameter of path_has_submounts()

2016-11-27 Thread Ian Kent
From: Ian Kent 

path_has_submounts() doesn't modify the passed in path parameter,
and shouldn't need to, make that usage explicit.

Signed-off-by: Ian Kent 
Cc: Al Viro 
Cc: Eric W. Biederman 
Cc: Omar Sandoval 
---
 fs/dcache.c|2 +-
 include/linux/dcache.h |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 719d8b4..29bc910 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1298,7 +1298,7 @@ static enum d_walk_ret path_check_mount(void *data, 
struct dentry *dentry)
  * Return true if the parent or its subdirectories contain
  * a mount point in the current namespace.
  */
-int path_has_submounts(struct path *parent)
+int path_has_submounts(const struct path *parent)
 {
struct check_mount data = { .mnt = parent->mnt, .mounted = 0 };
 
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index f835bd4..c965e44 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -254,7 +254,7 @@ extern struct dentry *d_find_alias(struct inode *);
 extern void d_prune_aliases(struct inode *);
 
 /* test whether we have any submounts in a subdir tree */
-extern int path_has_submounts(struct path *);
+extern int path_has_submounts(const struct path *);
 
 /*
  * This adds the entry to the hash queues.



[PATCH 2/7] autofs - make struct path const in autofs4_dir_open()

2016-11-27 Thread Ian Kent
From: Ian Kent 

There's no reason to copy the file->f_path in autofs4_dir_open() and
f_path is not modified so change it to a "const struct path *".

Signed-off-by: Ian Kent 
Cc: Al Viro 
Cc: Eric W. Biederman 
Cc: Omar Sandoval 
---
 fs/autofs4/root.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index dd2ea5d..3e33f00 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -107,14 +107,14 @@ static int autofs4_dir_open(struct inode *inode, struct 
file *file)
 {
struct dentry *dentry = file->f_path.dentry;
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
-   struct path path;
+   const struct path *path;
 
pr_debug("file=%p dentry=%p %pd\n", file, dentry, dentry);
 
if (autofs4_oz_mode(sbi))
goto out;
 
-   path = file->f_path;
+   path = &file->f_path;
 
/*
 * An empty directory in an autofs file system is always a
@@ -126,7 +126,7 @@ static int autofs4_dir_open(struct inode *inode, struct 
file *file)
 * it.
 */
spin_lock(&sbi->lookup_lock);
-   if (!path_is_mountpoint(&path) && simple_empty(dentry)) {
+   if (!path_is_mountpoint(path) && simple_empty(dentry)) {
spin_unlock(&sbi->lookup_lock);
return -ENOENT;
}



[PATCH 7/7] vfs - make may_umount_tree() mount propogation aware

2016-11-27 Thread Ian Kent
From: Ian Kent 

Now that autofs has namespace aware mounted checks the expire needs
changes to make it aware of mount propagation.

When checking for expiration may_umount_tree() checks only if the
given mount is in use. This leads to a callback to the automount
daemon to umount the mount which will fail if any propagated mounts
are in use.

To avoid this unnecessary call back may_umount_tree() needs to check
propagated mount trees also.

Signed-off-by: Ian Kent 
Cc: Al Viro 
Cc: Eric W. Biederman 
Cc: Omar Sandoval 
---
 fs/autofs4/expire.c |4 +--
 fs/namespace.c  |   71 +--
 fs/pnode.c  |3 +-
 fs/pnode.h  |1 +
 4 files changed, 66 insertions(+), 13 deletions(-)

diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index 57725d4..d44ae32 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -52,7 +52,7 @@ static int autofs4_mount_busy(struct vfsmount *mnt, struct 
dentry *dentry)
goto done;
}
 
-   /* Update the expiry counter if fs is busy */
+   /* Update the expiry counter if fs is busy in any namespace */
if (!may_umount_tree(path.mnt)) {
struct autofs_info *ino;
 
@@ -191,7 +191,7 @@ static int autofs4_direct_busy(struct vfsmount *mnt,
 {
pr_debug("top %p %pd\n", top, top);
 
-   /* If it's busy update the expiry counters */
+   /* If it's busy in any namespace update the expiry counters */
if (!may_umount_tree(mnt)) {
struct autofs_info *ino;
 
diff --git a/fs/namespace.c b/fs/namespace.c
index da1cd87..092d75c 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1311,6 +1311,33 @@ const struct seq_operations mounts_op = {
 };
 #endif  /* CONFIG_PROC_FS */
 
+struct mnt_tree_refs {
+   struct mount *root;
+   unsigned int refs;
+   unsigned int min_refs;
+};
+
+static void mnt_get_tree_refs(struct mnt_tree_refs *mtr)
+{
+   struct mount *mnt = mtr->root;
+   struct mount *p;
+
+   /*
+* Each propagated tree contribues 2 * #mounts - 1 to
+* the minimal reference count. But when a mount is
+* umounted and connected the mount doesn't hold a
+* reference to its parent so it contributes a single
+* reference.
+*/
+   for (p = mnt; p; p = next_mnt(p, mnt)) {
+   mtr->refs += mnt_get_count(p);
+   if (p == mnt || p->mnt.mnt_flags & MNT_UMOUNT)
+   mtr->min_refs++;
+   else
+   mtr->min_refs += 2;
+   }
+}
+
 /**
  * may_umount_tree - check if a mount tree is busy
  * @mnt: root of mount tree
@@ -1322,25 +1349,51 @@ const struct seq_operations mounts_op = {
 int may_umount_tree(struct vfsmount *m)
 {
struct mount *mnt = real_mount(m);
-   int actual_refs = 0;
-   int minimum_refs = 0;
-   struct mount *p;
+   struct mount *parent = mnt->mnt_parent;
+   struct mnt_tree_refs mtr;
+   struct mount *p, *child;
+
BUG_ON(!m);
 
-   /* write lock needed for mnt_get_count */
+   down_read(&namespace_sem);
lock_mount_hash();
-   for (p = mnt; p; p = next_mnt(p, mnt)) {
-   actual_refs += mnt_get_count(p);
-   minimum_refs += 2;
+
+   mtr.root = mnt;
+   mtr.refs = 0;
+   mtr.min_refs = 0;
+
+   mnt_get_tree_refs(&mtr);
+   /*
+* Caller holds a mount reference so minimum references
+* to the tree at mnt is one greater than the minumum
+* references.
+*/
+   mtr.min_refs++;
+
+   /* The pnode.c propagation_next() function (as used below)
+* returns each mount propogated from a given mount. Using
+* the parent of mnt and matching the mnt->mnt_mountpoint
+* gets the list of mounts propogated from mnt. To work
+* out if the tree is in use (eg. open file or pwd) the
+* reference counts of each of these mounts needs to be
+* checked as well as mnt itself.
+*/
+   for (p = propagation_next(parent, parent); p;
+   p = propagation_next(p, parent)) {
+   child = __lookup_mnt_last(&p->mnt, mnt->mnt_mountpoint);
+   if (child) {
+   mtr.root = child;
+   mnt_get_tree_refs(&mtr);
+   }
}
unlock_mount_hash();
+   up_read(&namespace_sem);
 
-   if (actual_refs > minimum_refs)
+   if (mtr.refs > mtr.min_refs)
return 0;
 
return 1;
 }
-
 EXPORT_SYMBOL(may_umount_tree);
 
 /**
diff --git a/fs/pnode.c b/fs/pnode.c
index 234a9ac..ae6f95e 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -143,8 +143,7 @@ void change_mnt_propagation(struct mount *mnt, int type)
  * vfsmount found while iterating with propagation_next() is
  * a peer of o

[PATCH 1/7] vfs - merge path_is_mountpoint() and path_is_mountpoint_rcu()

2016-11-27 Thread Ian Kent
From: Ian Kent 

Forgetting that the rcu lock allows nesting I added a superfluous rcu
version of path_is_mountpoint().

Merge it and the rcu version, make the common case (d_mountpoint()
returning true) inline and change the path parameter to a const.

Also move the function definition to include/linux/mount.h as it
seems a more sensible place for it.

Signed-off-by: Ian Kent 
Cc: Al Viro 
Cc: Eric W. Biederman 
Cc: Omar Sandoval 
---
 fs/autofs4/root.c |   11 +++
 fs/namespace.c|   41 ++---
 include/linux/fs.h|2 --
 include/linux/mount.h |   11 +++
 4 files changed, 28 insertions(+), 37 deletions(-)

diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index c4df881..dd2ea5d 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -437,13 +437,8 @@ static int autofs4_d_manage(struct path *path, bool 
rcu_walk)
 
/* The daemon never waits. */
if (autofs4_oz_mode(sbi)) {
-   if (rcu_walk) {
-   if (!path_is_mountpoint_rcu(path))
-   return -EISDIR;
-   } else {
-   if (!path_is_mountpoint(path))
-   return -EISDIR;
-   }
+   if (!path_is_mountpoint(path))
+   return -EISDIR;
return 0;
}
 
@@ -471,7 +466,7 @@ static int autofs4_d_manage(struct path *path, bool 
rcu_walk)
 
if (ino->flags & AUTOFS_INF_WANT_EXPIRE)
return 0;
-   if (path_is_mountpoint_rcu(path))
+   if (path_is_mountpoint(path))
return 0;
inode = d_inode_rcu(dentry);
if (inode && S_ISLNK(inode->i_mode))
diff --git a/fs/namespace.c b/fs/namespace.c
index 79473ee..da1cd87 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1160,12 +1160,23 @@ struct vfsmount *mntget(struct vfsmount *mnt)
 }
 EXPORT_SYMBOL(mntget);
 
-static bool __path_is_mountpoint(struct path *path)
+/* __path_is_mountpoint() - Check if path is a mount in the current
+ *  namespace.
+ *
+ *  d_mountpoint() can only be used reliably to establish if a dentry is
+ *  not mounted in any namespace and that common case is handled inline.
+ *  d_mountpoint() isn't aware of the possibility there may be multiple
+ *  mounts using a given dentry in a different namespace. This function
+ *  checks if the passed in path is a mountpoint rather than the dentry
+ *  alone.
+ */
+bool __path_is_mountpoint(const struct path *path)
 {
struct mount *mount;
struct vfsmount *mnt;
unsigned seq;
 
+   rcu_read_lock();
do {
seq = read_seqbegin(&mount_lock);
mount = __lookup_mnt(path->mnt, path->dentry);
@@ -1173,35 +1184,11 @@ static bool __path_is_mountpoint(struct path *path)
} while (mnt &&
 !(mnt->mnt_flags & MNT_SYNC_UMOUNT) &&
 read_seqretry(&mount_lock, seq));
-
-   return mnt != NULL;
-}
-
-/* Check if path is a mount in current namespace */
-bool path_is_mountpoint(struct path *path)
-{
-   bool res;
-
-   if (!d_mountpoint(path->dentry))
-   return false;
-
-   rcu_read_lock();
-   res = __path_is_mountpoint(path);
rcu_read_unlock();
 
-   return res;
-}
-EXPORT_SYMBOL(path_is_mountpoint);
-
-/* Check if path is a mount in current namespace */
-bool path_is_mountpoint_rcu(struct path *path)
-{
-   if (!d_mountpoint(path->dentry))
-   return false;
-
-   return __path_is_mountpoint(path);
+   return mnt != NULL;
 }
-EXPORT_SYMBOL(path_is_mountpoint_rcu);
+EXPORT_SYMBOL(__path_is_mountpoint);
 
 struct vfsmount *mnt_clone_internal(struct path *path)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 03a5a39..83de8b6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2095,8 +2095,6 @@ extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
 extern bool our_mnt(struct vfsmount *mnt);
-extern bool path_is_mountpoint(struct path *);
-extern bool path_is_mountpoint_rcu(struct path *);
 
 extern int current_umask(void);
 
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 1172cce..42dc62b 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -15,6 +15,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 struct super_block;
 struct vfsmount;
@@ -98,4 +100,13 @@ extern dev_t name_to_dev_t(const char *name);
 
 extern unsigned int sysctl_mount_max;
 
+extern bool __path_is_mountpoint(const struct path *path);
+static inline bool path_is_mountpoint(const struct path *path)
+{
+   if (!d_mountpoint(path->dentry))
+   return 0;
+
+   return __path_is_mountpoint(path);
+}
+
 #endif /* _LINUX_MOUNT_H */



Re: [PATCH 1/8] vfs - change d_manage() to take a struct path

2016-10-31 Thread Ian Kent
On Thu, 2016-10-27 at 14:50 +0800, Ian Kent wrote:
> On Thu, 2016-10-27 at 10:47 +0800, Ian Kent wrote:
> > 
> > On Thu, 2016-10-27 at 03:11 +0100, Al Viro wrote:
> > > 
> > >  
> > > 
> > > How much testing did it get?  I've several test setups involving
> > > autofs, but they are nowhere near exhaustive and I don't have good
> > > enough feel of the codebase to slap together something with decent
> > > coverage...
> > It got my standard testing.
> > 
> > For that I use a modified version of the autofs Connectathon system.
> > 
> > It's more about testing a wide variety of syntax and map setups and so
> > exercises
> > a large number of different types of autofs mounts.
> > 
> > It's meant to check normal operation but not so much stress testing even
> > though
> > it does perform quite a few mounts (around 250-300, not to mention the
> > autofs
> > mounts themselves).
> > 
> > I have another standard test I call the submount-test and it was originally
> > done
> > to stress test the most common problem I see, concurrent expire to mount.
> > 
> > I didn't see any problems I couldn't explain in these but I might need to
> > re-
> > visit the submount-test to see if it is still doing what I want.
> > 
> > OTOH, the pattern of mount and umount I see when the submount-test is run
> > does
> > look like it is doing what I want but it might not be getting all the way to
> > the
> > top of the tree of mounts enough times over the course of the test.
> > 
> > So I'm happy with my testing, just not as happy as I could be.
> Well, almost happy with my testing.
> 
> Naturally I also tested the specific case this series is meant to fix.
> 
> Basically:
> ls /mnt/foo# do the initial automount
> unshare -m sleep 10 &  # hold the automount in a new namespace
> umount /mnt/foo# pretend the mount timed out
> ls /mnt/foo# try to access it again
> ls: cannot open directory '/mnt/foo': Too many levels of symbolic links
> 
> as seen on the autofs mailing list. My specific test was a little different
> but
> verified this was resolved.
> 
> Now that Al seems reasonably OK with the series, with some changes, I'll test
> some other use cases, mainly to verify the expire still functions as required.
> That might need more work.

I have done some further tests, specifically for (what I believe are) the two
most common use cases.

First, using automount(8) entirely within a container, as expected works fine.

But the second case, one where automount(8) is run in the root namespace and has
automount directories bound into a container does have a problem.

The problem is due to may_umount_tree() only considering mounts in the root
namespace and leads to expire attempts on mounts even if they are in use in
another namespace.

It's not a serious problem as the umount attempt fails because the mount is busy
but it would be good to avoid the call back overhead.

Unfortunately it looks like transforming may_umount_tree() to use a similar
check to may_umount() introduces a race (picked up by my submount-test) which
I'm struggling to understand, I'll continue to work on it.

Ian


Re: [PATCH 1/8] vfs - change d_manage() to take a struct path

2016-10-26 Thread Ian Kent
On Thu, 2016-10-27 at 10:47 +0800, Ian Kent wrote:
> On Thu, 2016-10-27 at 03:11 +0100, Al Viro wrote:
> > 
> > 
> > How much testing did it get?  I've several test setups involving
> > autofs, but they are nowhere near exhaustive and I don't have good
> > enough feel of the codebase to slap together something with decent
> > coverage...
> It got my standard testing.
> 
> For that I use a modified version of the autofs Connectathon system.
> 
> It's more about testing a wide variety of syntax and map setups and so
> exercises
> a large number of different types of autofs mounts.
> 
> It's meant to check normal operation but not so much stress testing even
> though
> it does perform quite a few mounts (around 250-300, not to mention the autofs
> mounts themselves).
> 
> I have another standard test I call the submount-test and it was originally
> done
> to stress test the most common problem I see, concurrent expire to mount.
> 
> I didn't see any problems I couldn't explain in these but I might need to re-
> visit the submount-test to see if it is still doing what I want.
> 
> OTOH, the pattern of mount and umount I see when the submount-test is run does
> look like it is doing what I want but it might not be getting all the way to
> the
> top of the tree of mounts enough times over the course of the test.
> 
> So I'm happy with my testing, just not as happy as I could be.

Well, almost happy with my testing.

Naturally I also tested the specific case this series is meant to fix.

Basically:
ls /mnt/foo# do the initial automount
unshare -m sleep 10 &  # hold the automount in a new namespace
umount /mnt/foo# pretend the mount timed out
ls /mnt/foo# try to access it again
ls: cannot open directory '/mnt/foo': Too many levels of symbolic links

as seen on the autofs mailing list. My specific test was a little different but
verified this was resolved.

Now that Al seems reasonably OK with the series, with some changes, I'll test
some other use cases, mainly to verify the expire still functions as required.
That might need more work.

Ian


Re: [PATCH 6/8] autofs - use path_is_mountpoint() to fix unreliable d_mountpoint() checks

2016-10-26 Thread Ian Kent
On Thu, 2016-10-27 at 03:17 +0100, Al Viro wrote:
> On Tue, Oct 11, 2016 at 01:34:18PM +0800, Ian Kent wrote:
> > 
> > +   path = file->f_path;
> > +
> >     /*
> >      * An empty directory in an autofs file system is always a
> >      * mount point. The daemon must have failed to mount this
> > @@ -123,7 +126,7 @@ static int autofs4_dir_open(struct inode *inode, struct
> > file *file)
> >      * it.
> >      */
> >     spin_lock(&sbi->lookup_lock);
> > -   if (!d_mountpoint(dentry) && simple_empty(dentry)) {
> > +   if (!path_is_mountpoint(&path) && simple_empty(dentry)) {
> Why not &file->f_path, provided that you constify that thing properly?

Yep, my bad, as pointed out by Eric.

Patches to fix that and constify a bunch of things will follow.

> 
> > 
> > +   if (rcu_walk) {
> > +   if (!path_is_mountpoint_rcu(path))
> > +   return -EISDIR;
> > +   } else {
> > +   if (!path_is_mountpoint(path))
> > +   return -EISDIR;
> IDGI.  What's the point of _having_ the _rcu() variant, anyway?  Here you
> are probably paying more in terms of i-cache footprint/branch prediction
> than you win on not doing that rcu_read_lock()/rcu_read_unlock()...
> 
> _rcu variants make sense when non-RCU case does something you can't do
> under RCU; here your path_is_mountpoint() is pretty close to being
> rcu_read_lock()+path_is_mountpoint_rcu()+rcu_read_unlock() anyway...

Again, my bad, I'll merge these two and post along with the follow up patches
above.

Thanks Al,
Ian


Re: [PATCH 1/8] vfs - change d_manage() to take a struct path

2016-10-26 Thread Ian Kent
On Thu, 2016-10-27 at 03:11 +0100, Al Viro wrote:
> On Fri, Oct 21, 2016 at 07:39:36AM +0800, Ian Kent wrote:
> 
> > 
> > Maybe Al has been too busy to comment, he has been on the Cc from the start.
> That's... a very mild version of what's been going on.  Let's just say that
> the last few weeks had been really interesting.  Not that the shit has
> settled, but there was some slackening in the shitstorm last few days.
> Unlikely to last, I'm afraid, but...
>  
> > 
> > Hopefully this email will prompt a review, Al?
> Aside of the Eric's note re constifying struct path (strongly seconded),
> I'm not sure if expiration-related side of that is correct.  OTOH,
> since the expiration happens from userland...

Sure, I have a follow up series to do the constifying as recommended by Eric and
now yourself.

> 
> How much testing did it get?  I've several test setups involving
> autofs, but they are nowhere near exhaustive and I don't have good
> enough feel of the codebase to slap together something with decent
> coverage...

It got my standard testing.

For that I use a modified version of the autofs Connectathon system.

It's more about testing a wide variety of syntax and map setups and so exercises
a large number of different types of autofs mounts.

It's meant to check normal operation but not so much stress testing even though
it does perform quite a few mounts (around 250-300, not to mention the autofs
mounts themselves).

I have another standard test I call the submount-test and it was originally done
to stress test the most common problem I see, concurrent expire to mount.

I didn't see any problems I couldn't explain in these but I might need to re-
visit the submount-test to see if it is still doing what I want.

OTOH, the pattern of mount and umount I see when the submount-test is run does
look like it is doing what I want but it might not be getting all the way to the
top of the tree of mounts enough times over the course of the test.

So I'm happy with my testing, just not as happy as I could be.

Ian


Re: [PATCH 1/8] vfs - change d_manage() to take a struct path

2016-10-20 Thread Ian Kent
On Wed, 2016-10-19 at 12:40 -0700, Andrew Morton wrote:
> On Tue, 11 Oct 2016 13:33:52 +0800 Ian Kent  wrote:
> 
> > For the autofs module to be able to reliably check if a dentry is a
> > mountpoint in a multiple namespace environment the ->d_manage() dentry
> > operation will need to take a path argument instead of a dentry.
> 
> This patchset contains lots of ViroStuff.  I'll queue it up for some
> testing and will go into wait-and-see mode.

Thanks Andrew.

Maybe Al has been too busy to comment, he has been on the Cc from the start.

Hopefully this email will prompt a review, Al?

> 
> Some patches had an explicit From: Ian Kent  and some
> did not.  I assumed that this was intended for all patches.  So all
> patches now have different From: and Signed-off-by: email addresses. 
> Unclear if this was your intent ;)

Umm ... sorry about that, I didn't pay enough attention to the From of the
patches when I resurrected them from an earlier attempt at this.

Since the time I did this with an earlier patch series I have elected to not
change my git config and set these in the local tree config so it shouldn't
happen for new work. I'll pay special attention to this if I need to resurrect
any other older patches too.

I know this looks confusing and isn't the best but both email addresses have
been present on my gpg key for a long time so it's verifiable the patches are
from me.

Once again sorry about that.
Ian


Re: [PATCH 1/8] vfs - change d_manage() to take a struct path

2016-10-11 Thread Ian Kent
On Tue, 2016-10-11 at 11:04 -0500, Eric W. Biederman wrote:
> Ian Kent  writes:
> 
> > For the autofs module to be able to reliably check if a dentry is a
> > mountpoint in a multiple namespace environment the ->d_manage() dentry
> > operation will need to take a path argument instead of a dentry.
> 
> Taking a quick look overall I see no issues with this series.  Overall
> it seems straight forward.
> 
> On the nit side I expect saying const struct path * in the functions
> that now take a struct path would be useful.
> 
> I suspect it would also be useful to say
>   const struct path *path;
> path = &file->f_path;
> 
> In the one part of the code where you do that.  Instead of copying the
> path out of the struct file.
> 
> Overall I expect that will keep down bugs at no reduction in usability.
> Just a statement that the struct path won't change when it is passed
> to various functions.

Thanks Eric, that's a good suggestion for a follow up patch, will do.

Ian


[PATCH 5/8] autofs - change autofs4_wait() to take struct path

2016-10-10 Thread Ian Kent
From: Ian Kent 

In order to use the functions path_is_mountpoint() (or its rcu-walk
variant) and path_has_submounts() autofs needs to pass a struct path
in several places.

Now change autofs4_wait() to take a struct path instead of a struct
dentry.

Signed-off-by: Ian Kent 
Cc: Al Viro 
Cc: Eric W. Biederman 
Cc: Omar Sandoval 
---
 fs/autofs4/autofs_i.h |2 +-
 fs/autofs4/expire.c   |5 +++--
 fs/autofs4/root.c |   16 
 fs/autofs4/waitq.c|3 ++-
 4 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index 6d72bba..14cef41 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -217,7 +217,7 @@ static inline int autofs_prepare_pipe(struct file *pipe)
 
 /* Queue management functions */
 
-int autofs4_wait(struct autofs_sb_info *, struct dentry *, enum autofs_notify);
+int autofs4_wait(struct autofs_sb_info *, struct path *, enum autofs_notify);
 int autofs4_wait_release(struct autofs_sb_info *, autofs_wqt_t, int);
 void autofs4_catatonic_mode(struct autofs_sb_info *);
 
diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index 7eac498..a37ba40 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -526,7 +526,7 @@ int autofs4_expire_wait(struct path *path, int rcu_walk)
 
pr_debug("waiting for expire %p name=%pd\n", dentry, dentry);
 
-   status = autofs4_wait(sbi, dentry, NFY_NONE);
+   status = autofs4_wait(sbi, path, NFY_NONE);
wait_for_completion(&ino->expire_complete);
 
pr_debug("expire done status=%d\n", status);
@@ -593,11 +593,12 @@ int autofs4_do_expire_multi(struct super_block *sb, 
struct vfsmount *mnt,
 
if (dentry) {
struct autofs_info *ino = autofs4_dentry_ino(dentry);
+   struct path path = { .mnt = mnt, .dentry = dentry };
 
/* This is synchronous because it makes the daemon a
 * little easier
 */
-   ret = autofs4_wait(sbi, dentry, NFY_EXPIRE);
+   ret = autofs4_wait(sbi, &path, NFY_EXPIRE);
 
spin_lock(&sbi->fs_lock);
/* avoid rapid-fire expire attempts if expiry fails */
diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index 35096e3..d47930ad 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -269,17 +269,17 @@ static struct dentry *autofs4_lookup_expiring(struct 
dentry *dentry,
return NULL;
 }
 
-static int autofs4_mount_wait(struct dentry *dentry, bool rcu_walk)
+static int autofs4_mount_wait(struct path *path, bool rcu_walk)
 {
-   struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
-   struct autofs_info *ino = autofs4_dentry_ino(dentry);
+   struct autofs_sb_info *sbi = autofs4_sbi(path->dentry->d_sb);
+   struct autofs_info *ino = autofs4_dentry_ino(path->dentry);
int status = 0;
 
if (ino->flags & AUTOFS_INF_PENDING) {
if (rcu_walk)
return -ECHILD;
-   pr_debug("waiting for mount name=%pd\n", dentry);
-   status = autofs4_wait(sbi, dentry, NFY_MOUNT);
+   pr_debug("waiting for mount name=%pd\n", path->dentry);
+   status = autofs4_wait(sbi, path, NFY_MOUNT);
pr_debug("mount wait done status=%d\n", status);
}
ino->last_used = jiffies;
@@ -364,7 +364,7 @@ static struct vfsmount *autofs4_d_automount(struct path 
*path)
spin_lock(&sbi->fs_lock);
if (ino->flags & AUTOFS_INF_PENDING) {
spin_unlock(&sbi->fs_lock);
-   status = autofs4_mount_wait(dentry, 0);
+   status = autofs4_mount_wait(path, 0);
if (status)
return ERR_PTR(status);
goto done;
@@ -405,7 +405,7 @@ static struct vfsmount *autofs4_d_automount(struct path 
*path)
}
ino->flags |= AUTOFS_INF_PENDING;
spin_unlock(&sbi->fs_lock);
-   status = autofs4_mount_wait(dentry, 0);
+   status = autofs4_mount_wait(path, 0);
spin_lock(&sbi->fs_lock);
ino->flags &= ~AUTOFS_INF_PENDING;
if (status) {
@@ -447,7 +447,7 @@ static int autofs4_d_manage(struct path *path, bool 
rcu_walk)
 * This dentry may be under construction so wait on mount
 * completion.
 */
-   status = autofs4_mount_wait(dentry, rcu_walk);
+   status = autofs4_mount_wait(path, rcu_walk);
if (status)
return status;
 
diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
index 431fd7e..f757f87 100644
--- a/fs/autofs4/waitq.c
+++ b/fs/autofs4/waitq.c
@@ -345,8 +345,9 @@ static int validate_request(struct autofs_wait_queue **wait,
 }
 
 int autofs4_wait(struct auto

[PATCH 2/8] vfs - add path_is_mountpoint() helper

2016-10-10 Thread Ian Kent
From: Ian Kent 

d_mountpoint() can only be used reliably to establish if a dentry is
not mounted in any namespace. It isn't aware of the possibility there
may be multiple mounts using a given dentry that may be in a different
namespace.

Add helper functions, path_is_mountpoint() and an rcu version , that
checks if a struct path is a mountpoint for this case.

Signed-off-by: Ian Kent 
Cc: Al Viro 
Cc: Eric W. Biederman 
Cc: Omar Sandoval 
---
 fs/namespace.c |   43 +++
 include/linux/fs.h |2 ++
 2 files changed, 45 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index ff1cd14..5ef9618 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1160,6 +1160,49 @@ struct vfsmount *mntget(struct vfsmount *mnt)
 }
 EXPORT_SYMBOL(mntget);
 
+static bool __path_is_mountpoint(struct path *path)
+{
+   struct mount *mount;
+   struct vfsmount *mnt;
+   unsigned seq;
+
+   do {
+   seq = read_seqbegin(&mount_lock);
+   mount = __lookup_mnt(path->mnt, path->dentry);
+   mnt = mount ? &mount->mnt : NULL;
+   } while (mnt &&
+!(mnt->mnt_flags & MNT_SYNC_UMOUNT) &&
+read_seqretry(&mount_lock, seq));
+
+   return mnt != NULL;
+}
+
+/* Check if path is a mount in current namespace */
+bool path_is_mountpoint(struct path *path)
+{
+   bool res;
+
+   if (!d_mountpoint(path->dentry))
+   return 0;
+
+   rcu_read_lock();
+   res = __path_is_mountpoint(path);
+   rcu_read_unlock();
+
+   return res;
+}
+EXPORT_SYMBOL(path_is_mountpoint);
+
+/* Check if path is a mount in current namespace */
+bool path_is_mountpoint_rcu(struct path *path)
+{
+   if (!d_mountpoint(path->dentry))
+   return 0;
+
+   return __path_is_mountpoint(path);
+}
+EXPORT_SYMBOL(path_is_mountpoint_rcu);
+
 struct vfsmount *mnt_clone_internal(struct path *path)
 {
struct mount *p;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 09a8c41..deaf08b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2142,6 +2142,8 @@ extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
 extern bool our_mnt(struct vfsmount *mnt);
+extern bool path_is_mountpoint(struct path *);
+extern bool path_is_mountpoint_rcu(struct path *);
 
 extern int current_umask(void);
 



[PATCH 8/8] vfs - remove unused have_submounts() function

2016-10-10 Thread Ian Kent
Now that path_has_submounts() has been added have_submounts() is no
longer used so remove it.

Signed-off-by: Ian Kent 
Cc: Al Viro 
Cc: Eric W. Biederman 
Cc: Omar Sandoval 
---
 fs/dcache.c|   33 -
 include/linux/dcache.h |1 -
 2 files changed, 34 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 872f04e..719d8b4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1273,39 +1273,6 @@ static void d_walk(struct dentry *parent, void *data,
goto again;
 }
 
-/*
- * Search for at least 1 mount point in the dentry's subdirs.
- * We descend to the next level whenever the d_subdirs
- * list is non-empty and continue searching.
- */
-
-static enum d_walk_ret check_mount(void *data, struct dentry *dentry)
-{
-   int *ret = data;
-   if (d_mountpoint(dentry)) {
-   *ret = 1;
-   return D_WALK_QUIT;
-   }
-   return D_WALK_CONTINUE;
-}
-
-/**
- * have_submounts - check for mounts over a dentry
- * @parent: dentry to check.
- *
- * Return true if the parent or its subdirectories contain
- * a mount point
- */
-int have_submounts(struct dentry *parent)
-{
-   int ret = 0;
-
-   d_walk(parent, &ret, check_mount, NULL);
-
-   return ret;
-}
-EXPORT_SYMBOL(have_submounts);
-
 struct check_mount {
struct vfsmount *mnt;
unsigned int mounted;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index e351646..44a9a9b 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -254,7 +254,6 @@ extern struct dentry *d_find_alias(struct inode *);
 extern void d_prune_aliases(struct inode *);
 
 /* test whether we have any submounts in a subdir tree */
-extern int have_submounts(struct dentry *);
 extern int path_has_submounts(struct path *);
 
 /*



[PATCH 3/8] vfs - add path_has_submounts()

2016-10-10 Thread Ian Kent
From: Ian Kent 

d_mountpoint() can only be used reliably to establish if a dentry is
not mounted in any namespace. It isn't aware of the possibility there
may be multiple mounts using the given dentry, possibly in a different
namespace.

Add function, path_has_submounts(), that checks is a struct path contains
mounts (or is a mountpoint itself) to handle this case.

Signed-off-by: Ian Kent 
Cc: Al Viro 
Cc: Eric W. Biederman 
Cc: Omar Sandoval 
---
 fs/dcache.c|   35 +++
 include/linux/dcache.h |1 +
 2 files changed, 36 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 5c7cc95..872f04e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1306,6 +1306,41 @@ int have_submounts(struct dentry *parent)
 }
 EXPORT_SYMBOL(have_submounts);
 
+struct check_mount {
+   struct vfsmount *mnt;
+   unsigned int mounted;
+};
+
+static enum d_walk_ret path_check_mount(void *data, struct dentry *dentry)
+{
+   struct check_mount *info = data;
+   struct path path = { .mnt = info->mnt, .dentry = dentry };
+
+   if (path_is_mountpoint(&path)) {
+   info->mounted = 1;
+   return D_WALK_QUIT;
+   }
+   return D_WALK_CONTINUE;
+}
+
+/**
+ * path_has_submounts - check for mounts over a dentry in the
+ *  current namespace.
+ * @parent: path to check.
+ *
+ * Return true if the parent or its subdirectories contain
+ * a mount point in the current namespace.
+ */
+int path_has_submounts(struct path *parent)
+{
+   struct check_mount data = { .mnt = parent->mnt, .mounted = 0 };
+
+   d_walk(parent->dentry, &data, path_check_mount, NULL);
+
+   return data.mounted;
+}
+EXPORT_SYMBOL(path_has_submounts);
+
 /*
  * Called by mount code to set a mountpoint and check if the mountpoint is
  * reachable (e.g. NFS can unhash a directory dentry and then the complete
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 798bc04..e351646 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -255,6 +255,7 @@ extern void d_prune_aliases(struct inode *);
 
 /* test whether we have any submounts in a subdir tree */
 extern int have_submounts(struct dentry *);
+extern int path_has_submounts(struct path *);
 
 /*
  * This adds the entry to the hash queues.



[PATCH 4/8] autofs - change autofs4_expire_wait() to take struct path

2016-10-10 Thread Ian Kent
From: Ian Kent 

In order to use the functions path_is_mountpoint() (or it's rcu-walk
variant) and path_has_submounts() autofs needs to pass a struct path
in several places.

Start by changing autofs4_expire_wait() to take a struct path instead
of a struct dentry.

Signed-off-by: Ian Kent 
Cc: Al Viro 
Cc: Eric W. Biederman 
Cc: Omar Sandoval 
---
 fs/autofs4/autofs_i.h  |2 +-
 fs/autofs4/dev-ioctl.c |2 +-
 fs/autofs4/expire.c|3 ++-
 fs/autofs4/root.c  |   12 +++-
 4 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index a1fba42..6d72bba 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -145,7 +145,7 @@ void autofs4_free_ino(struct autofs_info *);
 
 /* Expiration */
 int is_autofs4_dentry(struct dentry *);
-int autofs4_expire_wait(struct dentry *dentry, int rcu_walk);
+int autofs4_expire_wait(struct path *path, int rcu_walk);
 int autofs4_expire_run(struct super_block *, struct vfsmount *,
   struct autofs_sb_info *,
   struct autofs_packet_expire __user *);
diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index fc09eb7..40c69f9 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -468,7 +468,7 @@ static int autofs_dev_ioctl_requester(struct file *fp,
ino = autofs4_dentry_ino(path.dentry);
if (ino) {
err = 0;
-   autofs4_expire_wait(path.dentry, 0);
+   autofs4_expire_wait(&path, 0);
spin_lock(&sbi->fs_lock);
param->requester.uid =
from_kuid_munged(current_user_ns(), ino->uid);
diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index d8e6d42..7eac498 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -495,8 +495,9 @@ struct dentry *autofs4_expire_indirect(struct super_block 
*sb,
return expired;
 }
 
-int autofs4_expire_wait(struct dentry *dentry, int rcu_walk)
+int autofs4_expire_wait(struct path *path, int rcu_walk)
 {
+   struct dentry *dentry = path->dentry;
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
struct autofs_info *ino = autofs4_dentry_ino(dentry);
int status;
diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index 5cbd4e1..35096e3 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -286,22 +286,24 @@ static int autofs4_mount_wait(struct dentry *dentry, bool 
rcu_walk)
return status;
 }
 
-static int do_expire_wait(struct dentry *dentry, bool rcu_walk)
+static int do_expire_wait(struct path *path, bool rcu_walk)
 {
+   struct dentry *dentry = path->dentry;
struct dentry *expiring;
 
expiring = autofs4_lookup_expiring(dentry, rcu_walk);
if (IS_ERR(expiring))
return PTR_ERR(expiring);
if (!expiring)
-   return autofs4_expire_wait(dentry, rcu_walk);
+   return autofs4_expire_wait(path, rcu_walk);
else {
+   struct path this = { .mnt = path->mnt, .dentry = expiring };
/*
 * If we are racing with expire the request might not
 * be quite complete, but the directory has been removed
 * so it must have been successful, just wait for it.
 */
-   autofs4_expire_wait(expiring, 0);
+   autofs4_expire_wait(&this, 0);
autofs4_del_expiring(expiring);
dput(expiring);
}
@@ -354,7 +356,7 @@ static struct vfsmount *autofs4_d_automount(struct path 
*path)
 * and the directory was removed, so just go ahead and try
 * the mount.
 */
-   status = do_expire_wait(dentry, 0);
+   status = do_expire_wait(path, 0);
if (status && status != -EAGAIN)
return NULL;
 
@@ -438,7 +440,7 @@ static int autofs4_d_manage(struct path *path, bool 
rcu_walk)
}
 
/* Wait for pending expires */
-   if (do_expire_wait(dentry, rcu_walk) == -ECHILD)
+   if (do_expire_wait(path, rcu_walk) == -ECHILD)
return -ECHILD;
 
/*



[PATCH 6/8] autofs - use path_is_mountpoint() to fix unreliable d_mountpoint() checks

2016-10-10 Thread Ian Kent
From: Ian Kent 

If an automount mount is clone(2)ed into a file system that is propagation
private, when it later expires in the originating namespace, subsequent
calls to autofs ->d_automount() for that dentry in the original namespace
will return ELOOP until the mount is umounted in the cloned namespace.

Now that a struct path is available where needed use path_is_mountpoint()
instead of d_mountpoint() so we don't get false positives when checking
if a dentry is a mount point in the current namespace.

Signed-off-by: Ian Kent 
Cc: Al Viro 
Cc: Eric W. Biederman 
Cc: Omar Sandoval 
---
 fs/autofs4/root.c |   24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index d47930ad..d7e48fe 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -107,12 +107,15 @@ static int autofs4_dir_open(struct inode *inode, struct 
file *file)
 {
struct dentry *dentry = file->f_path.dentry;
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
+   struct path path;
 
pr_debug("file=%p dentry=%p %pd\n", file, dentry, dentry);
 
if (autofs4_oz_mode(sbi))
goto out;
 
+   path = file->f_path;
+
/*
 * An empty directory in an autofs file system is always a
 * mount point. The daemon must have failed to mount this
@@ -123,7 +126,7 @@ static int autofs4_dir_open(struct inode *inode, struct 
file *file)
 * it.
 */
spin_lock(&sbi->lookup_lock);
-   if (!d_mountpoint(dentry) && simple_empty(dentry)) {
+   if (!path_is_mountpoint(&path) && simple_empty(dentry)) {
spin_unlock(&sbi->lookup_lock);
return -ENOENT;
}
@@ -372,15 +375,15 @@ static struct vfsmount *autofs4_d_automount(struct path 
*path)
 
/*
 * If the dentry is a symlink it's equivalent to a directory
-* having d_mountpoint() true, so there's no need to call back
-* to the daemon.
+* having path_is_mountpoint() true, so there's no need to call
+* back to the daemon.
 */
if (d_really_is_positive(dentry) && d_is_symlink(dentry)) {
spin_unlock(&sbi->fs_lock);
goto done;
}
 
-   if (!d_mountpoint(dentry)) {
+   if (!path_is_mountpoint(path)) {
/*
 * It's possible that user space hasn't removed directories
 * after umounting a rootless multi-mount, although it
@@ -434,8 +437,13 @@ static int autofs4_d_manage(struct path *path, bool 
rcu_walk)
 
/* The daemon never waits. */
if (autofs4_oz_mode(sbi)) {
-   if (!d_mountpoint(dentry))
-   return -EISDIR;
+   if (rcu_walk) {
+   if (!path_is_mountpoint_rcu(path))
+   return -EISDIR;
+   } else {
+   if (!path_is_mountpoint(path))
+   return -EISDIR;
+   }
return 0;
}
 
@@ -463,7 +471,7 @@ static int autofs4_d_manage(struct path *path, bool 
rcu_walk)
 
if (ino->flags & AUTOFS_INF_WANT_EXPIRE)
return 0;
-   if (d_mountpoint(dentry))
+   if (path_is_mountpoint_rcu(path))
return 0;
inode = d_inode_rcu(dentry);
if (inode && S_ISLNK(inode->i_mode))
@@ -490,7 +498,7 @@ static int autofs4_d_manage(struct path *path, bool 
rcu_walk)
 * we can avoid needless calls ->d_automount() and avoid
 * an incorrect ELOOP error return.
 */
-   if ((!d_mountpoint(dentry) && !simple_empty(dentry)) ||
+   if ((!path_is_mountpoint(path) && !simple_empty(dentry)) ||
(d_really_is_positive(dentry) && d_is_symlink(dentry)))
status = -EISDIR;
}



<    1   2   3   4   5   6   7   8   >