Re: autofs multi-map regression

2017-06-16 Thread Dick Streefland
On Friday 2017-06-16 15:57, Eric W. Biederman wrote:
| I don't believe this is a kernel change.
| 
| I dug up an old VM and I was able to reproduce this issue simply
| by installing autofs, and your auto.master and auto.net files.
| 
| # uname -a
| Linux ubuntu-16 4.4.0-24-generic #43-Ubuntu SMP Wed Jun 8 19:27:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
| 
| # ls /net/
| localhost
| # ls /net/localhost/loc
| ls: cannot open directory '/net/localhost/loc': Too many levels of symbolic links
| # ls /loc
| ls: cannot open directory '/loc/': Too many levels of symbolic links
| 
| I suspect the problem is somewhere in your autofs
| configuration.  I don't speak autofs well enough to debug the issue at
| this point.  But I can conclusively say it was not the kernel commit you
| pointed at, as I see the issue you are reporting and I don't have that
| commit in the kernel under test.

I have a second partition mounted on /loc; that is the reason for the
multi-map autofs setup. With a separate mount on /loc, you won't see
the errors with the old kernel.

The fact is that my setup worked for a long time, and that it stopped
working after the backport of commit 1064f874 to the Ubuntu 4.4
kernel.

-- 
Dick


Re: autofs multi-map regression

2017-06-16 Thread Eric W. Biederman
Dick Streefland writes:

> On Friday 2017-06-16 12:03, Eric W. Biederman wrote:
> | Interesting...
> | 
> | Can you test this on a stock 4.11 kernel?
> | 
> | I definitely need a little bit more information to solve this.  That
> | commit did not add any new error conditions so I need to understand
> | what state you are getting yourself into that is affected by this
> | commit.
> | 
> | Is there a chance you can post /proc/self/mountinfo from when this is
> | happening?
>
> I've installed the mainline 4.11 kernel from:
>
>   http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11/
>
> and this kernel works correctly!
>
> So either this issue was fixed in the meantime, or it is something
> specific to the Ubuntu kernel. I guess I should file a bug report
> with Ubuntu then?

Please.

> I've also looked at /proc/self/mountinfo before and directly after the
> mount attempt. Here are the ext4 and autofs entries for the failing 4.4
> kernel:

Thank you.

I am definitely out of my depth on the autofs portion of this.  As
things are working with 4.11 and failing in my test with a much older
4.4 kernel, I will leave this with you and the Ubuntu folks to
sort out.

Good Luck,

Eric


> before:
> 23 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
> 41 19 0:34 / /proc/sys/fs/binfmt_misc rw,relatime shared:24 - autofs systemd-1 rw,fd=34,pgrp=1,timeout=0,minproto=5,maxproto=5,direct
> 46 23 8:4 / /loc rw,nosuid,nodev,noatime shared:30 - ext4 /dev/sda4 rw,block_validity,delalloc,barrier,user_xattr,acl
> 202 23 0:44 / /net rw,relatime shared:160 - autofs /etc/auto.net rw,fd=6,pgrp=1724,timeout=120,minproto=5,maxproto=5,indirect
>

> after:
> 23 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
> 41 19 0:34 / /proc/sys/fs/binfmt_misc rw,relatime shared:24 - autofs systemd-1 rw,fd=34,pgrp=1,timeout=0,minproto=5,maxproto=5,direct
> 46 162 8:4 / /loc rw,nosuid,nodev,noatime shared:30 - ext4 /dev/sda4 rw,block_validity,delalloc,barrier,user_xattr,acl
> 202 23 0:44 / /net rw,relatime shared:160 - autofs /etc/auto.net rw,fd=6,pgrp=1724,timeout=120,minproto=5,maxproto=5,indirect
> 157 202 8:2 / /net/localhost rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
> 161 157 0:47 / /net/localhost/loc rw,relatime shared:119 - autofs /etc/auto.net rw,fd=6,pgrp=1724,timeout=120,minproto=5,maxproto=5,offset
> 162 23 0:47 / /loc rw,relatime shared:119 - autofs /etc/auto.net rw,fd=6,pgrp=1724,timeout=120,minproto=5,maxproto=5,offset
>
> And here is the info for the working mainline 4.11 kernel:
>
> before:
> 23 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
> 74 19 0:36 / /proc/sys/fs/binfmt_misc rw,relatime shared:56 - autofs systemd-1 rw,fd=35,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=12754
> 45 23 8:4 / /loc rw,nosuid,nodev,noatime shared:28 - ext4 /dev/sda4 rw,block_validity,delalloc,barrier,user_xattr,acl
> 208 23 0:46 / /net rw,relatime shared:164 - autofs /etc/auto.net rw,fd=6,pgrp=1545,timeout=120,minproto=5,maxproto=5,indirect,pipe_ino=26555
>
> after:
> 23 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
> 74 19 0:36 / /proc/sys/fs/binfmt_misc rw,relatime shared:56 - autofs systemd-1 rw,fd=35,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=12754
> 45 175 8:4 / /loc rw,nosuid,nodev,noatime shared:28 - ext4 /dev/sda4 rw,block_validity,delalloc,barrier,user_xattr,acl
> 208 23 0:46 / /net rw,relatime shared:164 - autofs /etc/auto.net rw,fd=6,pgrp=1545,timeout=120,minproto=5,maxproto=5,indirect,pipe_ino=26555
> 162 208 8:2 / /net/localhost rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
> 166 162 0:48 / /net/localhost/loc rw,relatime shared:122 - autofs /etc/auto.net rw,fd=6,pgrp=1545,timeout=120,minproto=5,maxproto=5,offset,pipe_ino=26555
> 167 23 0:48 / /loc rw,relatime shared:122 - autofs /etc/auto.net rw,fd=6,pgrp=1545,timeout=120,minproto=5,maxproto=5,offset,pipe_ino=26555
> 174 166 8:4 / /net/localhost/loc rw,nosuid,nodev,noatime shared:28 - ext4 /dev/sda4 rw,block_validity,delalloc,barrier,user_xattr,acl
> 175 167 8:4 / /loc rw,nosuid,nodev,noatime shared:28 - ext4 /dev/sda4 rw,block_validity,delalloc,barrier,user_xattr,acl


Re: autofs multi-map regression

2017-06-16 Thread Eric W. Biederman
Dick Streefland writes:

> After a recent upgrade of an Ubuntu xenial machine, a particular
> autofs multi-map mount setup stopped working. A simplified example is:
>
> ::
> auto.master
> ::
> /net  /etc/auto.net
> ::
> auto.net
> ::
> localhost / :/ /loc :/loc
>
> Accessing /net/localhost/loc should trigger two nested bind mounts on
> /net/localhost and /net/localhost/loc, but with the new kernel, it fails
> with ELOOP:
>
> $ ls /net/localhost/loc
> ls: cannot open directory '/net/localhost/loc': Too many levels of symbolic links
>
> The problem is related to the upgrade of the Ubuntu xenial kernel from
> 4.4.0-38.57 to 4.4.0-78.99. I bisected the regression to commit
> 731ac92843877f3633325203abc942193c1e9001, which is an Ubuntu backport
> of this upstream kernel commit:
>
> commit 1064f874abc0d05eeed8993815f584d847b72486
> Author: Eric W. Biederman 
> Date:   Fri Jan 20 18:28:35 2017 +1300
>
> mnt: Tuck mounts under others instead of creating shadow/side mounts.


I don't believe this is a kernel change.

I dug up an old VM and I was able to reproduce this issue simply
by installing autofs, and your auto.master and auto.net files.

# uname -a
Linux ubuntu-16 4.4.0-24-generic #43-Ubuntu SMP Wed Jun 8 19:27:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

# ls /net/
localhost
# ls /net/localhost/loc
ls: cannot open directory '/net/localhost/loc': Too many levels of symbolic links
# ls /loc
ls: cannot open directory '/loc/': Too many levels of symbolic links

I suspect the problem is somewhere in your autofs
configuration.  I don't speak autofs well enough to debug the issue at
this point.  But I can conclusively say it was not the kernel commit you
pointed at, as I see the issue you are reporting and I don't have that
commit in the kernel under test.

Eric
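
The "Too many levels of symbolic links" text in the transcripts above is glibc's message for errno ELOOP. A minimal Python sketch for checking the affected paths programmatically follows; the helper name `probe` is made up for illustration, and the two paths are simply the ones from this thread.

```python
import errno
import os


def probe(path):
    """Try to list a directory; return 'ok' or the symbolic errno name."""
    try:
        os.listdir(path)
    except OSError as exc:
        # Map the numeric errno (e.g. 40 on Linux) to its name (e.g. 'ELOOP').
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


if __name__ == "__main__":
    # Paths taken from the thread; on other machines they may not exist.
    for path in ("/net/localhost/loc", "/loc"):
        print(path, probe(path))
```

On a machine showing the failure described here, both paths would be expected to report ELOOP rather than ok.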




Re: autofs multi-map regression

2017-06-16 Thread Dick Streefland
On Friday 2017-06-16 12:03, Eric W. Biederman wrote:
| Interesting...
| 
| Can you test this on a stock 4.11 kernel?
| 
| I definitely need a little bit more information to solve this.  That
| commit did not add any new error conditions so I need to understand
| what state you are getting yourself into that is affected by this
| commit.
| 
| Is there a chance you can post /proc/self/mountinfo from when this is
| happening?

I've installed the mainline 4.11 kernel from:

  http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11/

and this kernel works correctly!

So either this issue was fixed in the meantime, or it is something
specific to the Ubuntu kernel. I guess I should file a bug report
with Ubuntu then?

I've also looked at /proc/self/mountinfo before and directly after the
mount attempt. Here are the ext4 and autofs entries for the failing 4.4
kernel:

before:
23 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
41 19 0:34 / /proc/sys/fs/binfmt_misc rw,relatime shared:24 - autofs systemd-1 rw,fd=34,pgrp=1,timeout=0,minproto=5,maxproto=5,direct
46 23 8:4 / /loc rw,nosuid,nodev,noatime shared:30 - ext4 /dev/sda4 rw,block_validity,delalloc,barrier,user_xattr,acl
202 23 0:44 / /net rw,relatime shared:160 - autofs /etc/auto.net rw,fd=6,pgrp=1724,timeout=120,minproto=5,maxproto=5,indirect

after:
23 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
41 19 0:34 / /proc/sys/fs/binfmt_misc rw,relatime shared:24 - autofs systemd-1 rw,fd=34,pgrp=1,timeout=0,minproto=5,maxproto=5,direct
46 162 8:4 / /loc rw,nosuid,nodev,noatime shared:30 - ext4 /dev/sda4 rw,block_validity,delalloc,barrier,user_xattr,acl
202 23 0:44 / /net rw,relatime shared:160 - autofs /etc/auto.net rw,fd=6,pgrp=1724,timeout=120,minproto=5,maxproto=5,indirect
157 202 8:2 / /net/localhost rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
161 157 0:47 / /net/localhost/loc rw,relatime shared:119 - autofs /etc/auto.net rw,fd=6,pgrp=1724,timeout=120,minproto=5,maxproto=5,offset
162 23 0:47 / /loc rw,relatime shared:119 - autofs /etc/auto.net rw,fd=6,pgrp=1724,timeout=120,minproto=5,maxproto=5,offset

And here is the info for the working mainline 4.11 kernel:

before:
23 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
74 19 0:36 / /proc/sys/fs/binfmt_misc rw,relatime shared:56 - autofs systemd-1 rw,fd=35,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=12754
45 23 8:4 / /loc rw,nosuid,nodev,noatime shared:28 - ext4 /dev/sda4 rw,block_validity,delalloc,barrier,user_xattr,acl
208 23 0:46 / /net rw,relatime shared:164 - autofs /etc/auto.net rw,fd=6,pgrp=1545,timeout=120,minproto=5,maxproto=5,indirect,pipe_ino=26555

after:
23 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
74 19 0:36 / /proc/sys/fs/binfmt_misc rw,relatime shared:56 - autofs systemd-1 rw,fd=35,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=12754
45 175 8:4 / /loc rw,nosuid,nodev,noatime shared:28 - ext4 /dev/sda4 rw,block_validity,delalloc,barrier,user_xattr,acl
208 23 0:46 / /net rw,relatime shared:164 - autofs /etc/auto.net rw,fd=6,pgrp=1545,timeout=120,minproto=5,maxproto=5,indirect,pipe_ino=26555
162 208 8:2 / /net/localhost rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
166 162 0:48 / /net/localhost/loc rw,relatime shared:122 - autofs /etc/auto.net rw,fd=6,pgrp=1545,timeout=120,minproto=5,maxproto=5,offset,pipe_ino=26555
167 23 0:48 / /loc rw,relatime shared:122 - autofs /etc/auto.net rw,fd=6,pgrp=1545,timeout=120,minproto=5,maxproto=5,offset,pipe_ino=26555
174 166 8:4 / /net/localhost/loc rw,nosuid,nodev,noatime shared:28 - ext4 /dev/sda4 rw,block_validity,delalloc,barrier,user_xattr,acl
175 167 8:4 / /loc rw,nosuid,nodev,noatime shared:28 - ext4 /dev/sda4 rw,block_validity,delalloc,barrier,user_xattr,acl

-- 
Dick
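
One way to compare the two "after" listings mechanically is to look for autofs offset mounts that have nothing mounted on top of them. What follows is a rough Python sketch, not a diagnosis: the names `parse_mountinfo` and `uncovered_offsets` are made up for illustration, the field layout follows proc(5), and the input is the relevant entries from the listings above with the wrapped lines rejoined.

```python
from collections import defaultdict

# "after" entries from the failing 4.4 kernel (binfmt_misc line omitted).
KERNEL_4_4_AFTER = """\
23 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
46 162 8:4 / /loc rw,nosuid,nodev,noatime shared:30 - ext4 /dev/sda4 rw,block_validity,delalloc,barrier,user_xattr,acl
202 23 0:44 / /net rw,relatime shared:160 - autofs /etc/auto.net rw,fd=6,pgrp=1724,timeout=120,minproto=5,maxproto=5,indirect
157 202 8:2 / /net/localhost rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
161 157 0:47 / /net/localhost/loc rw,relatime shared:119 - autofs /etc/auto.net rw,fd=6,pgrp=1724,timeout=120,minproto=5,maxproto=5,offset
162 23 0:47 / /loc rw,relatime shared:119 - autofs /etc/auto.net rw,fd=6,pgrp=1724,timeout=120,minproto=5,maxproto=5,offset
"""

# "after" entries from the working mainline 4.11 kernel.
KERNEL_4_11_AFTER = """\
23 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
45 175 8:4 / /loc rw,nosuid,nodev,noatime shared:28 - ext4 /dev/sda4 rw,block_validity,delalloc,barrier,user_xattr,acl
208 23 0:46 / /net rw,relatime shared:164 - autofs /etc/auto.net rw,fd=6,pgrp=1545,timeout=120,minproto=5,maxproto=5,indirect,pipe_ino=26555
162 208 8:2 / /net/localhost rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
166 162 0:48 / /net/localhost/loc rw,relatime shared:122 - autofs /etc/auto.net rw,fd=6,pgrp=1545,timeout=120,minproto=5,maxproto=5,offset,pipe_ino=26555
167 23 0:48 / /loc rw,relatime shared:122 - autofs /etc/auto.net rw,fd=6,pgrp=1545,timeout=120,minproto=5,maxproto=5,offset,pipe_ino=26555
174 166 8:4 / /net/localhost/loc rw,nosuid,nodev,noatime shared:28 - ext4 /dev/sda4 rw,block_validity,delalloc,barrier,user_xattr,acl
175 167 8:4 / /loc rw,nosuid,nodev,noatime shared:28 - ext4 /dev/sda4 rw,block_validity,delalloc,barrier,user_xattr,acl
"""


def parse_mountinfo(text):
    """Parse mountinfo lines into dicts with the fields used below."""
    mounts = []
    for line in text.strip().splitlines():
        # A lone "-" separates the per-mount fields from the filesystem fields.
        left, _, right = line.partition(" - ")
        lf, rf = left.split(), right.split()
        mounts.append({
            "id": int(lf[0]),
            "parent": int(lf[1]),
            "mount_point": lf[4],
            "fstype": rf[0],
            "super_opts": rf[2].split(",") if len(rf) > 2 else [],
        })
    return mounts


def uncovered_offsets(mounts):
    """Return mount points of autofs offset mounts with no mount on top."""
    children = defaultdict(list)
    for m in mounts:
        children[m["parent"]].append(m)
    return sorted(
        m["mount_point"]
        for m in mounts
        if m["fstype"] == "autofs"
        and "offset" in m["super_opts"]
        and not any(c["mount_point"] == m["mount_point"] for c in children[m["id"]])
    )


print("4.4: ", uncovered_offsets(parse_mountinfo(KERNEL_4_4_AFTER)))
print("4.11:", uncovered_offsets(parse_mountinfo(KERNEL_4_11_AFTER)))
```

On these listings the sketch reports /net/localhost/loc for the 4.4 kernel and nothing for 4.11, which matches the two extra ext4 entries (174 and 175) that appear only in the 4.11 output.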


Re: autofs multi-map regression

2017-06-16 Thread Eric W. Biederman
Dick Streefland writes:

> After a recent upgrade of an Ubuntu xenial machine, a particular
> autofs multi-map mount setup stopped working. A simplified example is:
>
> ::
> auto.master
> ::
> /net  /etc/auto.net
> ::
> auto.net
> ::
> localhost / :/ /loc :/loc
>
> Accessing /net/localhost/loc should trigger two nested bind mounts on
> /net/localhost and /net/localhost/loc, but with the new kernel, it fails
> with ELOOP:
>
> $ ls /net/localhost/loc
> ls: cannot open directory '/net/localhost/loc': Too many levels of symbolic links
>
> The problem is related to the upgrade of the Ubuntu xenial kernel from
> 4.4.0-38.57 to 4.4.0-78.99. I bisected the regression to commit
> 731ac92843877f3633325203abc942193c1e9001, which is an Ubuntu backport
> of this upstream kernel commit:
>
> commit 1064f874abc0d05eeed8993815f584d847b72486
> Author: Eric W. Biederman 
> Date:   Fri Jan 20 18:28:35 2017 +1300
>
> mnt: Tuck mounts under others instead of creating shadow/side mounts.

Interesting...

Can you test this on a stock 4.11 kernel?

I definitely need a little bit more information to solve this.  That
commit did not add any new error conditions so I need to understand
what state you are getting yourself into that is affected by this
commit.

Is there a chance you can post /proc/self/mountinfo from when this is
happening?

Eric
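
For readers less familiar with the map syntax, the auto.net line quoted above is a multi-mount entry: a key followed by offset/location pairs. The Python sketch below shows how that line would expand into the two nested mount points the report mentions. It is a simplification that assumes no per-mount options and only local (":/path") locations; `expand_multimount` is a hypothetical helper, not autofs code.

```python
import posixpath


def expand_multimount(base, entry):
    """Expand 'key offset location [offset location ...]' into
    (mount point, local source path) pairs under the indirect map base."""
    fields = entry.split()
    key, rest = fields[0], fields[1:]
    if len(rest) % 2 != 0:
        raise ValueError("expected offset/location pairs after the key")
    pairs = []
    for offset, location in zip(rest[0::2], rest[1::2]):
        # Offsets are relative to <base>/<key>; "/" means the key itself.
        target = posixpath.normpath(
            posixpath.join(base, key, offset.lstrip("/")))
        # A leading ":" marks a local path in this simplified reading.
        source = location[1:] if location.startswith(":") else location
        pairs.append((target, source))
    return pairs


for target, source in expand_multimount("/net", "localhost / :/ /loc :/loc"):
    print(f"{source} -> {target}")
```

With the base /net from auto.master, this yields / on /net/localhost and /loc on /net/localhost/loc, the two bind mounts the original report expects.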