Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-12-04 Thread Jan Kara
On Fri 24-11-17 15:03:37, Andreas Dilger wrote:
> On Nov 24, 2017, at 9:51 AM, Andi Kleen  wrote:
> > 
> >> We checked old kernels, and old e2fsprogs, and didn't see any cases
> >> where fast (<= 60 chars) symlinks were created using external blocks.
> >> It seems that _something_ did create them, and it would be good to
> >> figure that out so we can determine if it is a widespread problem
> > 
> > I assume it was the original kernel.
> > 
> >> 
> >> I think e2fsck can fix this quite easily, and there really isn't
> >> an easy way to revert to the old method if the large xattr feature
> >> is enabled.  If you are willing to run a new kernel, you should also
> >> be willing to run a new e2fsck.
> > 
> > It's obviously not enabled on ext3.
> > 
> >> We could probably add a fallback to the old mechanism (and print
> >> a one-time warning to upgrade to a newer e2fsck) if an external fast
> >> symlink is found and the large xattr  feature is not enabled, which
> >> would give more time to fix this (hopefully rare in the wild) case.
> > 
> > If the old kernel created it, then likely all the
> > /lib{,64}/ld-linux.so.2 symlinks have that, which breaks all ELF
> > executables. I suspect in these old file systems it's not particularly rare.
> 
> Sure, but not many people are going to be running a 4.14 kernel with
> a 2007 system.  Could you please run the updated find command to see
> whether this is an isolated case, or if it is a common case:
> 
> find / -type l -size -60c -print0 | xargs -0r ls -dils | awk '$2 != 0 { print }'
> 
> It would also be useful if anyone else reading this that has an old
> system (2005-2011 install date) ran the same to see if any such
> symlinks are found.  To see when the root filesystem was created, run:
> 
> dumpe2fs -h $(df -P / | awk '/dev/ { print $1 }') 2>&1 | grep created

I have one fs image around from:

Filesystem created:   Tue Nov 15 04:43:22 2005

and it indeed does have these problematic symlinks as well:

(none):~# l /usr/share/terminfo/x/xterm-r5
lrwxrwxrwx 1 root root 24 May 19  2006 /usr/share/terminfo/x/xterm-r5 -> /lib/terminfo/x/xterm-r5
(none):~# stat /usr/share/terminfo/x/xterm-r5
  File: `/usr/share/terminfo/x/xterm-r5' -> `/lib/terminfo/x/xterm-r5'
  Size: 24  Blocks: 8  IO Block: 4096   symbolic link
Device: 6200h/25088d  Inode: 98027   Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (0/root)   Gid: (0/root)
Access: 2017-12-04 16:27:29.0 +
Modify: 2006-05-19 21:12:53.0 +
Change: 2006-05-19 21:12:53.0 +

Honza
-- 
Jan Kara 
SUSE Labs, CR
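
The stat output above (Size: 24, Blocks: 8) is the pattern this thread is hunting for: a symlink short enough to fit inside the inode that nevertheless has blocks allocated. A minimal sketch of that per-file check follows, assuming GNU stat and the same under-60-byte cutoff as the find command. classify_symlink is a hypothetical helper name, not part of e2fsprogs. The block count also includes any external xattr block, so a hit is a hint rather than proof.

```shell
#!/bin/sh
# classify_symlink SIZE BLOCKS   (hypothetical helper, for illustration only)
# SIZE is the link target length in bytes, BLOCKS the allocated 512-byte
# blocks, e.g. taken from: stat -c '%s %b' /path/to/link  (GNU stat assumed)
classify_symlink() {
    size=$1
    blocks=$2
    if [ "$size" -ge 60 ]; then
        echo "long"                 # too long to be stored in the inode
    elif [ "$blocks" -ne 0 ]; then
        echo "short-with-blocks"    # the case reported in this thread
    else
        echo "short-inline"         # ordinary fast symlink
    fi
}

# Example invocation (not run here):
#   classify_symlink $(stat -c '%s %b' /usr/share/terminfo/x/xterm-r5)
```

For the xterm-r5 link above, `classify_symlink 24 8` prints short-with-blocks.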


Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-27 Thread Dave Chinner
On Mon, Nov 27, 2017 at 12:11:26PM -0500, Theodore Ts'o wrote:
> On Mon, Nov 27, 2017 at 08:14:27AM +1100, Dave Chinner wrote:
> > Of course. I've done that every time I've come across these sorts of
> > problems.
> 
> The most recent report I was able to find was against 4.7-rc6, in July
> 2016.  Have you been able to reproduce it more recently than that?

I hit it once a couple of months ago, but I was busy with much
higher priority stuff at the time (sorting out a CVE-worthy bug fix)
so it slipped off my radar pretty rapidly after I recovered the test
system and kept doing what I needed to do...

So, yeah, the problems are still there, I just don't run my root
filesystems out of space very often. Like I said - maybe once or
twice a year is the typical frequency this happens.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-27 Thread Theodore Ts'o
On Mon, Nov 27, 2017 at 08:14:27AM +1100, Dave Chinner wrote:
> Of course. I've done that every time I've come across these sorts of
> problems.

The most recent report I was able to find was against 4.7-rc6, in July
2016.  Have you been able to reproduce it more recently than that?

Cheers,

- Ted


Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-26 Thread Dave Chinner
On Sun, Nov 26, 2017 at 10:40:26AM -0500, Theodore Ts'o wrote:
> On Sun, Nov 26, 2017 at 09:32:02AM +1100, Dave Chinner wrote:
> > 
> > They don't have any whacky symlinks around, but the modern ext4 code
> > does try to eat these filesystems every so often. Extended operation
> > at ENOSPC will eventually corrupt the rootfs and crash the kernel,
> > and then I play the "e2fsck doesn't detect corruption, kernel does"
> > game to get them fixed up and working again
> 
> If you have stack dumps or file system images in which e2fsck doesn't
> detect any problems but the kernel does, please do feel free to send
> reports to the ext4 mailing list.

Of course. I've done that every time I've come across these sorts of
problems.

> > I'm running with everything up to date (debian unstable) on these
> > VMs, they are just an old filesystem because some distros have had
> > reliable rolling updates for the entire life of these VMs. :P
> 
> Or if you can make the VM's available and tell me how you are
> using/exercising them, I can try to see if I can repro the problem.

No, I can't make them available. As for how I use them, they are
my test/devel VMs, so they are getting multiple kernels thrown at
them every day, and I'll just kill the VM via the qemu console (they
*never* get shut down cleanly) when I need to install a new kernel.
Often they won't shut down anyway, because I've
oopsed/deadlocked/etc something on a different filesystem...

> I am wondering how you are running into ENOSPC on the root file
> systems; I take it this is much more than running xfstests?

No, it isn't.  Just have a scratch filesystem failure during
xfstests such that mount fails during a "fill to enospc" test and it
will fill the root filesystem rather than the test/scratch device.
Or run a buggy test that dumps everything in $here. Or fill /tmp
without noticing it.  Then let fstests continue to run trying to
write state and logs for the next 500 tests...

> Are you
> running some benchmarks that are logging into the root, and that's
> triggering the ENOSPC condition?

No, I'm not doing anything like that on these machines. It's
straightforward "something filled the root fs unexpectedly" type of
error which I don't notice immediately...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-26 Thread Theodore Ts'o
On Sun, Nov 26, 2017 at 09:32:02AM +1100, Dave Chinner wrote:
> 
> They don't have any whacky symlinks around, but the modern ext4 code
> does try to eat these filesystems every so often. Extended operation
> at ENOSPC will eventually corrupt the rootfs and crash the kernel,
> and then I play the "e2fsck doesn't detect corruption, kernel does"
> game to get them fixed up and working again

If you have stack dumps or file system images in which e2fsck doesn't
detect any problems but the kernel does, please do feel free to send
reports to the ext4 mailing list.

> I'm running with everything up to date (debian unstable) on these
> VMs, they are just an old filesystem because some distros have had
> reliable rolling updates for the entire life of these VMs. :P

Or if you can make the VM's available and tell me how you are
using/exercising them, I can try to see if I can repro the problem.

I am wondering how you are running into ENOSPC on the root file
systems; I take it this is much more than running xfstests?  Are you
running some benchmarks that are logging into the root, and that's
triggering the ENOSPC condition?

Thanks,

- Ted


Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-25 Thread Reindl Harald


On 25.11.2017 at 23:32, Dave Chinner wrote:
> On Fri, Nov 24, 2017 at 03:03:37PM -0700, Andreas Dilger wrote:
> > Any worse an idea than running a new kernel on an old system?
> > Newer e2fsck fixes a lot of bugs that are present in older
> > e2fsck as well...
>
> I'm running with everything up to date (debian unstable) on these
> VMs, they are just an old filesystem because some distros have had
> reliable rolling updates for the entire life of these VMs. :P


but why not update the FS to ext4?

our whole infrastructure was installed with Fedora 9 on ext3 (currently 
running F26, yum/dnf dist-upgrades) but any FS including the rootfs was 
converted to ext4 in 2010


Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-25 Thread Dave Chinner
On Sat, Nov 25, 2017 at 11:45:07PM +0100, Reindl Harald wrote:
> 
> On 25.11.2017 at 23:32, Dave Chinner wrote:
> >On Fri, Nov 24, 2017 at 03:03:37PM -0700, Andreas Dilger wrote:
> >>Any worse an idea than running a new kernel on an old system?
> >>Newer e2fsck fixes a lot of bugs that are present in older
> >>e2fsck as well...
> >
> >I'm running with everything up to date (debian unstable) on these
> >VMs, they are just an old filesystem because some distros have had
> >reliable rolling updates for the entire life of these VMs. :P
> 
> but why not update the FS to ext4?

Unlike ext3, ext4 is not a filesystem that takes kindly to being
abused by an environment that involves machines being crashed,
oopsed and forcibly rebooted without warning tens of times a day.
Every ext4 root filesystem I've tried on these VMs has lasted less
than two weeks before being unrecoverably corrupted and needing to
be rebuilt from scratch.

Last time I tried, a couple of years ago, the ext4 filesystems lasted
less than a day before corrupting themselves in a way that they couldn't
be mounted, yet e2fsck didn't detect anything wrong and so they couldn't
be repaired. ext4 is just not robust enough for my use case.

And, FWIW, I don't use XFS for these root filesystems because the
reason I'm doing this to machines is that I'm trashing throwaway XFS
filesystems with broken XFS code on other devices on the VM...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-25 Thread Dave Chinner
On Fri, Nov 24, 2017 at 03:03:37PM -0700, Andreas Dilger wrote:
> On Nov 24, 2017, at 9:51 AM, Andi Kleen  wrote:
> > 
> >> We checked old kernels, and old e2fsprogs, and didn't see any cases
> >> where fast (<= 60 chars) symlinks were created using external blocks.
> >> It seems that _something_ did create them, and it would be good to
> >> figure that out so we can determine if it is a widespread problem
> > 
> > I assume it was the original kernel.
> > 
> >> 
> >> I think e2fsck can fix this quite easily, and there really isn't
> >> an easy way to revert to the old method if the large xattr feature
> >> is enabled.  If you are willing to run a new kernel, you should also
> >> be willing to run a new e2fsck.
> > 
> > It's obviously not enabled on ext3.
> > 
> >> We could probably add a fallback to the old mechanism (and print
> >> a one-time warning to upgrade to a newer e2fsck) if an external fast
> >> symlink is found and the large xattr  feature is not enabled, which
> >> would give more time to fix this (hopefully rare in the wild) case.
> > 
> > If the old kernel created it, then likely all the
> > /lib{,64}/ld-linux.so.2 symlinks have that, which breaks all ELF
> > executables. I suspect in these old file systems it's not particularly rare.
> 
> Sure, but not many people are going to be running a 4.14 kernel with
> a 2007 system. 

I have multiple test VMs with root ext3 filesystems that date back
that far. It looks like the original install that the root fs image
was derived from dates from around 2006:

$ ls -lt /etc |tail -1
-rw-r--r--  1 root root   9 Aug  8  2006 host.conf
$ ls -lt /usr/bin |tail -2
-rwxr-xr-x 1 root   root 2038 Jun 18  2006 defoma-hints
-rwxr-xr-x 1 root   root 1761 Jun 18  2006 dh_installdefoma
$ uname -a
Linux test4 4.14.0-dgc #211 SMP PREEMPT Thu Nov 23 16:49:31 AEDT 2017 x86_64 GNU/Linux
$

These VMs are in use 24x7, and have been since they were created way
back when. When something in ext3 breaks, I tend to notice it and
report it.

They don't have any whacky symlinks around, but the modern ext4 code
does try to eat these filesystems every so often. Extended operation
at ENOSPC will eventually corrupt the rootfs and crash the kernel,
and then I play the "e2fsck doesn't detect corruption, kernel does"
game to get them fixed up and working again

> > Requiring new e2fsck on old systems is a bad idea.
> 
> Any worse an idea than running a new kernel on an old system?
> Newer e2fsck fixes a lot of bugs that are present in older
> e2fsck as well...

I'm running with everything up to date (debian unstable) on these
VMs, they are just an old filesystem because some distros have had
reliable rolling updates for the entire life of these VMs. :P

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-24 Thread Theodore Ts'o
On Fri, Nov 24, 2017 at 08:51:02AM -0800, Andi Kleen wrote:
> > I think e2fsck can fix this quite easily, and there really isn't
> > an easy way to revert to the old method if the large xattr feature
> > is enabled.  If you are willing to run a new kernel, you should also
> > be willing to run a new e2fsck.
> 
> It's obviously not enabled on ext3.

Yes, I think it's clear we need to enable backwards compatibility
support for ext3 file systems, or even all ext4 file systems that
don't have the large xattr feature.

We could have e2fsck offer to fix it, so long as it is being run
manually (e.g., not in preen mode), since it does have the benefit of
releasing unnecessarily allocated 4k blocks for symlinks which are <
60 bytes.

- Ted
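
The fallback discussed above can be modelled in a few lines. This is a sketch of the decision only, not kernel code. It assumes the newer rule decides purely on target length and that the "old mechanism" is a block-count test, which is what the find command in this thread probes. where_is_target and its arguments are hypothetical names, and an external xattr block is ignored for simplicity.

```shell
#!/bin/sh
# where_is_target LARGE_XATTR SIZE BLOCKS   (hypothetical model)
#   LARGE_XATTR: 1 if the large xattr feature is enabled, 0 otherwise
#   SIZE:        link target length in bytes
#   BLOCKS:      allocated 512-byte blocks (xattr block not accounted for)
# Prints "inode" if the target is read from the inode, "block" otherwise.
where_is_target() {
    large_xattr=$1
    size=$2
    blocks=$3
    if [ "$large_xattr" -eq 1 ]; then
        # Assumed newer rule: decide on length alone.
        if [ "$size" -gt 0 ] && [ "$size" -lt 60 ]; then
            echo "inode"
        else
            echo "block"
        fi
    else
        # Assumed fallback: trust the allocation, as the old test did.
        if [ "$blocks" -eq 0 ]; then
            echo "inode"
        else
            echo "block"
        fi
    fi
}
```

With the feature off, a 24-byte link holding 8 blocks (`where_is_target 0 24 8`) resolves to block. Applying the length-only rule to the same link (`where_is_target 1 24 8`) gives inode, which is the mismatch this thread is about.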


Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-24 Thread Andi Kleen
> Sure, but not many people are going to be running a 4.14 kernel with
> a 2007 system.  

It's not just root, but any disk. People could well have 10 year old
disks.

> Could you please run the updated find command to see
> whether this is an isolated case, or if it is a common case:
> 
> find / -type l -size -60c -print0 | xargs -0r ls -dils | awk '$2 != 0 { print }'

Pretty much all symlinks on / hit it. / has 1278 symlinks total, and 
1218 match the line above.

-Andi
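
For anyone wanting to report the same two numbers, here is a hedged sketch that counts both in one go. It assumes GNU find (for -printf) and adds -xdev, which the command quoted above does not use, to stay on a single filesystem. count_short_symlinks and percent are hypothetical helper names.

```shell
#!/bin/sh
# count_short_symlinks ROOT   (hypothetical helper)
# Prints "MATCHING TOTAL": symlinks under ROOT that are under 60 bytes yet
# have blocks allocated, followed by the number of all symlinks under ROOT.
# Assumes GNU find; -xdev is an added choice to avoid crossing mount points.
count_short_symlinks() {
    root=$1
    # One dot per symlink, so odd file names cannot skew the count.
    total=$(find "$root" -xdev -type l -printf '.' | wc -c)
    # %b is the allocated size in 512-byte blocks; keep non-zero entries.
    matching=$(find "$root" -xdev -type l -size -60c -printf '%b\n' |
        awk '$1 != 0' | wc -l)
    echo "$((matching)) $((total))"
}

# percent PART WHOLE: integer percentage, rounded down
percent() {
    echo $(( $1 * 100 / $2 ))
}
```

With the figures reported above, `percent 1218 1278` prints 95, so roughly 95% of the symlinks on that root filesystem match.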


Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-24 Thread James Bottomley
On Fri, 2017-11-24 at 15:03 -0700, Andreas Dilger wrote:
> On Nov 24, 2017, at 9:51 AM, Andi Kleen  wrote:
> > 
> > > We checked old kernels, and old e2fsprogs, and didn't see any cases
> > > where fast (<= 60 chars) symlinks were created using external blocks.
> > > It seems that _something_ did create them, and it would be good to
> > > figure that out so we can determine if it is a widespread problem
> > 
> > I assume it was the original kernel.
> > 
> > > I think e2fsck can fix this quite easily, and there really isn't
> > > an easy way to revert to the old method if the large xattr feature
> > > is enabled.  If you are willing to run a new kernel, you should also
> > > be willing to run a new e2fsck.
> > 
> > It's obviously not enabled on ext3.
> > 
> > > We could probably add a fallback to the old mechanism (and print
> > > a one-time warning to upgrade to a newer e2fsck) if an external fast
> > > symlink is found and the large xattr feature is not enabled, which
> > > would give more time to fix this (hopefully rare in the wild) case.
> > 
> > If the old kernel created it, then likely all the
> > /lib{,64}/ld-linux.so.2 symlinks have that, which breaks all ELF
> > executables. I suspect in these old file systems it's not particularly rare.
> 
> Sure, but not many people are going to be running a 4.14 kernel with
> a 2007 system. 

I really disagree on this ... most of us who are doing kernel testing
will be running with older systems.  It's true, some of us do install
from scratch and then test, but most of us upgrade (which doesn't
necessarily modify the symlinks).  On your creation test, this is my
cloud system:

bedivere:~# dumpe2fs -h $(df -P / | awk '/dev/ { print $1 }') 2>&1 | grep created
Filesystem created:   Tue Mar 24 20:21:35 2009

Your find command turns up nothing untoward.

My older system is the home entertainment system, but that has an xfs
root dating back to 2005.

I bet I have a laptop even older (currently travelling, so can't
check).

James


Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-24 Thread Andreas Dilger
On Nov 24, 2017, at 9:51 AM, Andi Kleen  wrote:
> 
>> We checked old kernels, and old e2fsprogs, and didn't see any cases
>> where fast (<= 60 chars) symlinks were created using external blocks.
>> It seems that _something_ did create them, and it would be good to
>> figure that out so we can determine if it is a widespread problem
> 
> I assume it was the original kernel.
> 
>> 
>> I think e2fsck can fix this quite easily, and there really isn't
>> an easy way to revert to the old method if the large xattr feature
>> is enabled.  If you are willing to run a new kernel, you should also
>> be willing to run a new e2fsck.
> 
> It's obviously not enabled on ext3.
> 
>> We could probably add a fallback to the old mechanism (and print
>> a one-time warning to upgrade to a newer e2fsck) if an external fast
>> symlink is found and the large xattr  feature is not enabled, which
>> would give more time to fix this (hopefully rare in the wild) case.
> 
> If the old kernel created it, then likely all the
> /lib{,64}/ld-linux.so.2 symlinks have that, which breaks all ELF
> executables. I suspect in these old file systems it's not particularly rare.

Sure, but not many people are going to be running a 4.14 kernel with
a 2007 system.  Could you please run the updated find command to see
whether this is an isolated case, or if it is a common case:

find / -type l -size -60c -print0 | xargs -0r ls -dils | awk '$2 != 0 { print }'

It would also be useful if anyone else reading this that has an old
system (2005-2011 install date) ran the same to see if any such
symlinks are found.  To see when the root filesystem was created, run:

dumpe2fs -h $(df -P / | awk '/dev/ { print $1 }') 2>&1 | grep created

> So I don't think you can just break them all.

Sure.  As previously mentioned, it shouldn't have broken *any* systems
based on our prior investigation, I'm just trying to see how bad the
problem really is.  Like I said, a workaround (without need to patch
the kernel, and that is compatible with old and new kernels) is:

find / -type l -size -60c -print0 | xargs -0r ls -dils | awk '$2 != 0 { print }' |
while read L; do ln -sfv "$(ls -l "$L" | sed -e 's/.*-> //')" "$L"; done

This just recreates any problematic symlinks in place, which should make
it a proper fast symlink.
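
As a hedged sketch of the same idea, the helper below recreates one symlink in place, taking the path directly instead of parsing `ls` output (in the pipeline as posted, `read L` receives the whole listing line, not just the path). The name `recreate_symlink` is hypothetical, and the sketch assumes the running kernel still returns the correct target from `readlink`; targets ending in a newline are not handled.

```shell
#!/bin/sh
# Sketch only; recreate_symlink is a hypothetical helper name.
# Assumes readlink(1) and an ln(1) that supports -n (GNU or BSD).

# recreate_symlink PATH
# Read the current target of PATH and create the link again in place,
# so the filesystem allocates a fresh symlink inode for it.
recreate_symlink() {
    link=$1
    # Refuse anything that is not a symlink.
    [ -L "$link" ] || return 1
    target=$(readlink "$link") || return 1
    # -n: if PATH points at a directory, replace the link itself
    #     instead of creating a new link inside that directory.
    ln -sfn -- "$target" "$link"
}

# Example (path taken from the thread):
#   recreate_symlink /lib/ld-linux.so.2
```

Running it once per reported path keeps file names with spaces intact, which line-based parsing of `ls -dils` output would not.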

> I think it's ok to only handle it when the large xattrs are disabled.
> 
> Requiring new e2fsck on old systems is a bad idea.

Any worse an idea than running a new kernel on an old system?
Newer e2fsck fixes a lot of bugs that are present in older
e2fsck as well...

Cheers, Andreas


Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-24 Thread Andi Kleen
> We checked old kernels, and old e2fsprogs, and didn't see any cases
> where fast (<= 60 chars) symlinks were created using external blocks.
> It seems that _something_ did create them, and it would be good to
> figure that out so we can determine if it is a widespread problem

I assume it was the original kernel. 

> 
> I think e2fsck can fix this quite easily, and there really isn't
> an easy way to revert to the old method if the large xattr feature
> is enabled.  If you are willing to run a new kernel, you should also
> be willing to run a new e2fsck.

It's obviously not enabled on ext3.

> 
> We could probably add a fallback to the old mechanism (and print
> a one-time warning to upgrade to a newer e2fsck) if an external fast
> symlink is found and the large xattr  feature is not enabled, which
> would give more time to fix this (hopefully rare in the wild) case.

If the old kernel created it, then likely all the
/lib{,64}/ld-linux.so.2 symlinks have that, which breaks all ELF 
executables. I suspect in these old file systems it's not particularly rare.

So I don't think you can just break them all.

I think it's ok to only handle it when the large xattrs are disabled.

Requiring new e2fsck on old systems is a bad idea.

-Andi


Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-23 Thread Andreas Dilger
On Nov 23, 2017, at 7:04 PM, Andi Kleen  wrote:
> 
>> As a workaround, you could delete and recreate the symlink with the new
> 
> I reverted the patch for now. Everything seems to work.
> 
>> kernel to create a proper fast symlink.  It would be useful to scan
>> the image to see if there are other similar symlinks present:
>> 
>>    find /myth/tmp -type l -size -60 -ls | awk '$2 != 0 { print }'
> 
> Doesn't find anything. Your recipe must be wrong.

I see that I should have used "-60c" to properly limit the listing to
short symlinks, but this doesn't appear to be the core problem.  It
looks like there is a bug in find (at least version 4.4.2 that I'm
testing with) that it doesn't print the blocks count properly.

According to find(1) the "-ls" argument should list the file the same
as "ls -dils" format (blocks is $2), but as shown below "find -ls"
prints "0" for blocks when it should be "4" (for a long symlink using
"+60c" in my example, I couldn't find any short+external symlinks on a
couple of 8 year old root filesystems):

$ find /etc/alternatives/rmid -type l -size +60c -ls
327877 0 lrwxrwxrwx 1 root root 73 Jan  4  2017 /etc/alternatives/rmid -> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-0.b15.el6_8.x86_64/jre/bin/rmid

$ ls -dils /etc/alternatives/rmid
327877 4 lrwxrwxrwx 1 root root 73 Jan  4  2017 /etc/alternatives/rmid -> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-0.b15.el6_8.x86_64/jre/bin/rmid*


Try the following command instead:

find / -type l -size -60c -print0 | xargs -0r ls -dils | awk '$2 != 0 { print }'
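
A variant that avoids parsing `ls` output is sketched below. It assumes GNU find and GNU coreutils `stat`, and the helper names `filter_allocated` and `scan_short_symlinks` are hypothetical. It reports the allocated block count, the size, and the path of each short symlink that still has blocks allocated.

```shell
#!/bin/sh
# Sketch only: assumes GNU find and GNU coreutils stat.
# filter_allocated and scan_short_symlinks are hypothetical helper names.

# Keep only "BLOCKS SIZE PATH" lines whose block count is non-zero.
filter_allocated() {
    awk '$1 != 0 { print }'
}

# scan_short_symlinks DIR
# Print "BLOCKS SIZE PATH" for every symlink under DIR that is shorter
# than 60 bytes yet has blocks allocated. stat does not follow symlinks
# unless -L is given, so %b and %s describe the link itself.
# -xdev keeps the scan on a single filesystem.
scan_short_symlinks() {
    find "$1" -xdev -type l -size -60c -print0 |
        xargs -0r stat -c '%b %s %n' |
        filter_allocated
}

# Example:
#   scan_short_symlinks /
```

A non-zero count may also come from an external xattr block rather than from the link target, so any hit probably deserves a closer look before it is treated as a short external symlink.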


>> This is probably something that e2fsck should check for and fix.
> 
> Nah the kernel should just support it like it always did.

The reason we changed this code in the first place was because the
old check would repeatedly break when some new reason for storing
blocks on a symlink appeared.  It broke when xattrs were allowed
on symlinks for SELinux.  It broke when bigalloc blocks were added.
It broke when inline_data was added, and it would have broken (and
been really hard to fix efficiently) when large xattrs were added.
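
To make the difference between the two checks concrete, here is a deliberately simplified model in shell. It is not the kernel code: the function names are hypothetical, the blocks-based test is reduced to "block count minus xattr blocks is zero", and the size-based test to "size is non-zero and below 60", with the 60-byte limit taken from this thread and the exact boundary treated as an assumption.

```shell
#!/bin/sh
# Simplified model only, not the kernel implementation.
# All function names here are hypothetical.

# is_fast_by_blocks BLOCKS EA_BLOCKS
# Older style of test: a symlink is "fast" if it owns no data blocks
# once blocks used for extended attributes are subtracted.
is_fast_by_blocks() {
    [ $(( $1 - $2 )) -eq 0 ]
}

# is_fast_by_size SIZE
# Newer style of test: a symlink is "fast" if its target is short
# enough to fit inside the inode (assumed limit: fewer than 60 bytes).
is_fast_by_size() {
    [ "$1" -gt 0 ] && [ "$1" -lt 60 ]
}

# compare_tests SIZE BLOCKS EA_BLOCKS
# Show how each test classifies the same inode.
compare_tests() {
    if is_fast_by_blocks "$2" "$3"; then old=fast; else old=slow; fi
    if is_fast_by_size "$1"; then new=fast; else new=slow; fi
    echo "blocks-based=$old size-based=$new"
}
```

For the inode reported at the start of this thread (Size 11, Blockcount 8, File ACL 0), `compare_tests 11 8 0` shows the two tests disagreeing: the blocks-based test treats the link as slow, while the size-based test treats it as fast.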

We checked old kernels, and old e2fsprogs, and didn't see any cases
where fast (<= 60 chars) symlinks were created using external blocks.
It seems that _something_ did create them, and it would be good to
figure that out so we can determine if it is a widespread problem.

I think e2fsck can fix this quite easily, and there really isn't
an easy way to revert to the old method if the large xattr feature
is enabled.  If you are willing to run a new kernel, you should also
be willing to run a new e2fsck.

We could probably add a fallback to the old mechanism (and print
a one-time warning to upgrade to a newer e2fsck) if an external fast
symlink is found and the large xattr  feature is not enabled, which
would give more time to fix this (hopefully rare in the wild) case.

Cheers, Andreas


Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-23 Thread Andi Kleen
> As a workaround, you could delete and recreate the symlink with the new

I reverted the patch for now. Everything seems to work.

> kernel to create a proper fast symlink.  It would be useful to scan the
> image to see if there are other similar symlinks present:
> 
> find /myth/tmp -type l -size -60 -ls | awk '$2 != 0 { print }'

Doesn't find anything. Your recipe must be wrong.
> 
> This is probably something that e2fsck should check for and fix.

Nah the kernel should just support it like it always did.

-Andi


Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-23 Thread Andreas Dilger
On Nov 23, 2017, at 4:31 PM, Andi Kleen  wrote:
> 
> On Thu, Nov 23, 2017 at 05:23:17PM -0500, Theodore Ts'o wrote:
>> On Thu, Nov 23, 2017 at 12:33:30PM -0800, Andi Kleen wrote:
>>> 
>>> I have an older qemu VM image that I sometimes use for testing. It
>>> stopped booting with 4.13-4.14 because it couldn't run init.
>>> It uses ext3 for the root file system.
>>
>> Hmm, do you know roughly when (what kernel version) this image was
>> created?  We had done quite a lot of research and the belief was
>> kernels never would create a "slow" symlink which was less than 60
>> bytes.
> 
> The date of the inode is from 2007, the original kernel was 2.6.17
> with a 32bit kernel.
> 
>> Or was this image something that was created manually (e.g., using debugfs)?
> 
> No, it was installed.

As a workaround, you could delete and recreate the symlink with the new
kernel to create a proper fast symlink.  It would be useful to scan the
image to see if there are other similar symlinks present:

find /myth/tmp -type l -size -60 -ls | awk '$2 != 0 { print }'

This is probably something that e2fsck should check for and fix.

Cheers, Andreas


Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-23 Thread Andi Kleen
On Thu, Nov 23, 2017 at 05:23:17PM -0500, Theodore Ts'o wrote:
> On Thu, Nov 23, 2017 at 12:33:30PM -0800, Andi Kleen wrote:
> > 
> > I have an older qemu VM image that I sometimes use for testing. It
> > stopped booting with 4.13-4.14 because it couldn't run init.
> > It uses ext3 for the root file system.
>
> Hmm, do you know roughly when (what kernel version) this image was
> created?  We had done quite a lot of research and the belief was
> kernels never would create a "slow" symlink which was less than 60
> bytes.

The date of the inode is from 2007, the original kernel was 2.6.17
with a 32bit kernel.

> Or was this image something that was created manually (e.g., using debugfs)?

No, it was installed.

-Andi


Re: regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-23 Thread Theodore Ts'o
On Thu, Nov 23, 2017 at 12:33:30PM -0800, Andi Kleen wrote:
> 
> I have an older qemu VM image that I sometimes use for testing. It
> stopped booting with 4.13-4.14 because it couldn't run init.
> It uses ext3 for the root file system.

Hmm, do you know roughly when (what kernel version) this image was
created?  We had done quite a lot of research and the belief was
kernels never would create a "slow" symlink which was less than 60
bytes.

Or was this image something that was created manually (e.g., using debugfs)?

  - Ted


regression: 4.13 cannot follow symlinks on some ext3 fs

2017-11-23 Thread Andi Kleen

Hi,

I have an older qemu VM image that I sometimes use for testing. It
stopped booting with 4.13-4.14 because it couldn't run init.  
It uses ext3 for the root file system.

I instrumented the code and found that it failed to follow the 
/lib/ld-linux.so.2 -> ld-2.3.6.so symlink for init's ELF interpreter. 

I bisected it down to

commit 407cd7fb83c0ebabb490190e673d8c71ee7df97e (refs/bisect/bad)
Author: Tahsin Erdogan 
Date:   Tue Jul 4 00:11:21 2017 -0400

ext4: change fast symlink test to not rely on i_blocks

When I revert this commit on 4.14, my VM runs fine again.

Dump of the inode in debugfs: 

debugfs:  Inode: 1767   Type: symlink    Mode:  0777   Flags: 0x0
Generation: 0
User: 0   Group: 0   Size: 11
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x45ad7ba0 -- Wed Jan 17 01:28:00 2007
atime: 0x5a164be5 -- Thu Nov 23 04:17:41 2017
mtime: 0x45ad7ba0 -- Wed Jan 17 01:28:00 2007
BLOCKS:
(0):11006
TOTAL: 1
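
The two fields that matter in this dump are Size (11, well under 60 bytes) and Blockcount (8, so a data block is allocated). A small sketch for pulling them out of saved `debugfs` output follows; the helper name `inode_size_and_blocks` is hypothetical, and the device name in the usage comment is a placeholder.

```shell
#!/bin/sh
# Sketch only; inode_size_and_blocks is a hypothetical helper name.

# Read "debugfs stat" style text on stdin and print "SIZE BLOCKCOUNT".
# Only the first "Size:" field is used, because the later "Fragment:"
# line carries its own, unrelated "Size:" field.
inode_size_and_blocks() {
    awk '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "Size:" && size == "") size = $(i + 1)
                if ($i == "Blockcount:") blocks = $(i + 1)
            }
        }
        END { print size, blocks }
    '
}

# Example (placeholder device; the inode number is the one from the dump):
#   debugfs -R 'stat <1767>' /dev/sdXN | inode_size_and_blocks
```

For the dump above this would print `11 8`, which is the short-but-external combination the rest of the thread goes looking for.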

-Andi


