Bug#808151: systemd: failed to start remount root and kernel file system

2015-12-20 Thread Frank B. Brokken
Dear Michael Biebl, you wrote:
> Here is a more complete log excerpt for v228 (full log attached)
> 
> > Dez 20 01:27:42 debian systemd[1]: -.mount: Changed dead -> mounted
...
> 
> So the best guess is that the timing in v228 changed a little (some code
> paths became faster). This would confirm Frank's findings that enabling
> verbose output (which slows down boot a bit) made it less likely to hit.
> 
> Martin has been working/debugging the tentative stuff in the past, so
> maybe he has some insights here.
> 
> We should probably also involve upstream at some point.

OK, thanks for the help and (for me at least) final conclusion. For me
personally the problem has been solved: for the time being I'm happy with 227,
and I'm sure that the problem will soon be fixed.

Thanks again for helping along!

Cheers,

-- 
Frank B. Brokken
Center for Information Technology, University of Groningen
(+31) 50 363 9281 
Public PGP key: http://pgp.surfnet.nl
Key Fingerprint: DF32 13DE B156 7732 E65E  3B4D 7DB2 A8BE EAE4 D8AA


signature.asc
Description: PGP signature
___
Pkg-systemd-maintainers mailing list
Pkg-systemd-maintainers@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-systemd-maintainers

Bug#808151: systemd: failed to start remount root and kernel file system

2015-12-19 Thread Frank B. Brokken
Dear Michael Biebl, you wrote:

> > This information is available at https://www.icce.rug.nl/systemd in the 
> > files 
> > initramfs.debug and alb.
> 
> Hm, unfortunately the journal dump is incomplete again. I have no idea why

Remarkable. I made it available the way I got it, so that's apparently what
there is.

> > booting procedure. You're sure it can't be some timing problem? 
> 
> Well, what kind of timing problem do you have in mind?

Don't know: I didn't design systemd. But if it's doing things in parallel then
maybe on newer, faster, computers things might have completed, like remounting
/usr rw before it's actually used. A race condition might then explain why the
problem doesn't always show itself, and why chances of failure are reduced
when more time is spent writing debug/verbose messages.

 
> So far, the only thing I can say for sure looking at the initramfs log,
> is that /usr has been mounted successfully in the initramfs.
> 
> "Something" apparently causes /usr to be unmounted later on. Which part
> and why that is, is not clear yet.
> 
> Do you have any (custom) init scripts in /etc/rcS.d/ which fiddle around
> with mount settings, run telinit or stuff like that?

Nope.

> I'm running out of ideas, tbh.

Well, that's already a *lot* more than I could offer myself :-) But
fortunately (for me, but hard to fix, I realize), the problem doesn't emerge
all the time. If rebooting every now and then gets me a running system, then
so be it. The FailureAction=reboot-force entry in systemd-remount-fs.service
already has proven to be my friend :-)


> If you suspect the remount service to be the cause for this, let's
> output the mounts before and after
> For that run
> $ systemctl edit systemd-remount-fs.service

When I issue that command I get the reply

Warning: systemd-remount-fs.service changed on disk. Run 'systemctl
daemon-reload' to reload units.

I guess the warning is obvious as I edited the file 

/lib/systemd/system/local-fs.target.wants/systemd-remount-fs.service

to prevent the reboot action. So I did as advised and reran the command,
but got an empty file in my editor, while the following message was shown
after ending the editor:

Editing "/etc/systemd/system/systemd-remount-fs.service.d/override.conf"
canceled: temporary file is empty.

> Then add
> [Service]
> ExecStartPre=/bin/sh -c 'echo "before rootfs remount"; findmnt'
> ExecStartPost=/bin/sh -c 'echo "after rootfs remount"; findmnt'
> 
> Reboot and attach the journal log again.

Instead of running the systemctl command I edited the file
/lib/systemd/system/local-fs.target.wants/systemd-remount-fs.service and added
the lines you suggested. My next e-mail is about the contents of journal log.

Thereafter I'll try to downgrade to the previous version to see what
happens then.


-- 
Frank B. Brokken
Center for Information Technology, University of Groningen
(+31) 50 363 9281 
Public PGP key: http://pgp.surfnet.nl
Key Fingerprint: DF32 13DE B156 7732 E65E  3B4D 7DB2 A8BE EAE4 D8AA


signature.asc
Description: PGP signature
___
Pkg-systemd-maintainers mailing list
Pkg-systemd-maintainers@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-systemd-maintainers

Bug#808151: systemd: failed to start remount root and kernel file system

2015-12-19 Thread Frank B. Brokken
Hi Michael,

I downgraded to the following versions of the following packages:

libpam-systemd_227-2_i386.deb  libudev1_227-2_i386.deb 
libsystemd-dev_227-2_i386.deb  systemd-sysv_227-2_i386.deb 
libsystemd0_227-2_i386.deb systemd_227-2_i386.deb 
libudev-dev_227-2_i386.deb udev_227-2_i386.deb 

Thereafter I rebooted several times without encountering any problems. Also
with reduced output (grub's option 'quiet') no problems were encountered.

Cheers,

-- 
Frank B. Brokken
Center for Information Technology, University of Groningen
(+31) 50 363 9281 
Public PGP key: http://pgp.surfnet.nl
Key Fingerprint: DF32 13DE B156 7732 E65E  3B4D 7DB2 A8BE EAE4 D8AA


signature.asc
Description: PGP signature
___
Pkg-systemd-maintainers mailing list
Pkg-systemd-maintainers@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-systemd-maintainers

Bug#808151: systemd: failed to start remount root and kernel file system

2015-12-19 Thread Frank B. Brokken
Dear Michael Biebl, you wrote:
> Am 18.12.2015 um 15:59 schrieb Frank B. Brokken:
> > Is there a way to determine that? What I do to upgrade the system is run
> > 'aptitude update' and then 'aptitude upgrade'. Is there a log somewhere that
> > tells me what packages and versions were updated at what moments in time?
> 
> /var/log/dpkg.log is a low-level log.
> 
> and then there is one for aptitude at /var/log/aptitude

Thanks: I made the logs available at https://www.icce.rug.nl/systemd


> ...
> Btw, you mentioned that this happened after an upgrade. Which previous
> version did you run which worked fine? Which other packages were
> upgraded along with it?

Tue, Dec  1 2015 14:07:23 +0100: 
the aptitude log shows an upgrade from systemd 227-2 to 228-2 

dpkg log: 2015-12-01 14:08:42 upgrade systemd:i386 227-2 228-2

dpkg log: 2015-12-03 08:30:01 upgrade systemd:i386 228-2 215-17+deb8u2

Thu, Dec  3 2015 08:31:37 +0100
the aptitude log shows an upgrade from systemd 215-17+deb8u2 -> 228-2

dpkg log: 2015-12-03 08:31:40 upgrade systemd:i386 215-17+deb8u2 228-2

and then, recently, by me trying to downgrade:

dpkg log: 2015-12-17 12:59:12 upgrade systemd:i386 228-2 228-2
dpkg log: 2015-12-18 16:15:37 upgrade systemd:i386 228-2 215-17+deb8u2
dpkg log: 2015-12-18 16:17:11 upgrade systemd:i386 215-17+deb8u2 228-2

Before Dec 1 no updates were recorded for systemd or udev, and until the
upgrades early December everything ran fine.




> If you downgrade systemd/udev, does the problem go away?

As I feared, downgrading is difficult because of the many reverse
dependencies. 

I looked at 

ftp://ftp.de.debian.org/debian/pool/main/s/systemd/

which is the mirror I usually use for earlier .deb files, but the one before
228-2 is 215-17, and 227-2 is only available as source archives and not AFAICS
as .deb packages.

I'll add the debug entry next (cf. your mail from Date: Fri, 18 Dec 2015
03:15:15 +0100) and let you know the results.


-- 
Frank B. Brokken
Center for Information Technology, University of Groningen
(+31) 50 363 9281 
Public PGP key: http://pgp.surfnet.nl
Key Fingerprint: DF32 13DE B156 7732 E65E  3B4D 7DB2 A8BE EAE4 D8AA


signature.asc
Description: PGP signature
___
Pkg-systemd-maintainers mailing list
Pkg-systemd-maintainers@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-systemd-maintainers

Bug#808151: systemd: failed to start remount root and kernel file system

2015-12-18 Thread Frank B. Brokken
Dear Michael Biebl, you wrote:

> Well, /usr is mounted by the initramfs these days. So it should already
> be available when systemd is started. If that fails, this is a bug which
> needs to be addressed by initramfs-tools (or one of the hook scripts).
> It wasn't clear so far that /usr hasn't been mounted at all.
> 
> Is /usr on LVM, RAID, etc?

No, nothing like that. And for what it's worth: the problem only appeared
after I upgraded systemd last week. The laptop has nothing special in its
setup, and has been working perfectly for years, until last week when systemd
was renwed. I think in my bugreport I mentioned the problem that /usr wasn't
mounted.

In your next reply you wrote:

> I'm a bit confused by those logs. They show that sda5 have been mounted.
> 
> Dec 17 15:44:29 localhost.localdomain kernel: EXT4-fs (sda5): mounting
> ext3 file system using the ext4 subsystem
> Dec 17 15:44:29 localhost.localdomain kernel: EXT4-fs (sda5): mounted
> filesystem with ordered data mode. Opts: (null)
> 
> I figure /dev/sda5 is your /usr partition? Just to be sure, please
> attach ls -la /dev/disk/by-uuid/

I seem to remember that message, in particular the Opts: (null) remark, and I
think at that point /usr was mounted by me fron the systemd shell. Also,
couldn't it be that initramfs *did* do the mount, but that remounting it rw,
als reported in the error message is the problem? Also, to me it appears
remarkable that by removing the 'quiet' from the kernel parameters, so that
things go a bit slower because of the extra messages that are displayed the
frequency of failing boot procedures is greatly diminished. I'm considering
trying to add 'verbose' to grub's parameters to see if that produces more
output and maybe further reduces the frequency, but I haven't had the time to
do that yet. Something on the TODO list :-)

Anyway, here's the ls -la output:

total 0
drwxr-xr-x 2 root root 200 Dec 18 13:05 ./
drwxr-xr-x 5 root root 100 Dec 18 13:02 ../
lrwxrwxrwx 1 root root  10 Dec 18 13:02 04b82e8b-f871-4abb-978a-44ae44c5d1f7
-> ../../sda1
lrwxrwxrwx 1 root root  10 Dec 18 13:02 595bcdbf-6436-45a7-99d2-297a3dd85930
-> ../../sda6
lrwxrwxrwx 1 root root  10 Dec 18 13:02 693c71eb-d411-4ee0-a1b3-c577df02e01b
-> ../../sda9
lrwxrwxrwx 1 root root  10 Dec 18 13:02 6bcb2a05-33c9-402b-8093-e6a35ffd7aa1
-> ../../sda8
lrwxrwxrwx 1 root root  11 Dec 18 13:05 82e52787-6072-4af9-a5e6-2d88c365e62b
-> ../../loop0
lrwxrwxrwx 1 root root  10 Dec 18 13:02 c5591eff-0a6c-4310-bb11-7d5535f7da7b
-> ../../sda7
lrwxrwxrwx 1 root root  10 Dec 18 13:05 e289e4ad-be1d-42a8-9b38-f4dad9473520
-> ../../dm-0
lrwxrwxrwx 1 root root  10 Dec 18 13:02 ea8202e7-4564-424c-af70-a6a640fafb65
-> ../../sda5
~

I'll do the 'debug' addition later this weekend, like you requested.

Finally, you asked:

> Do you have any custom udev rules in /etc/udev/rules.d?

I don't think so, looking at the time stamps nothing has been changed there for 
years:

total 10
drwxr-xr-x 2 root root 3072 Dec  6  2014 ./
drwxr-xr-x 4 root root 1024 Dec  3 08:34 ../
-rw-r--r-- 1 root root  115 Dec  6  2014 70-automount.rules
-rw-r--r-- 1 root root 3841 Dec  6  2014 70-persistent-cd.rules
-rw-r--r-- 1 root root  895 Feb 26  2013 70-persistent-net.rules

And I definitely didn't recently change there any files, so again: the problem
appeared out of the blue since last weeks upgrade. 

I hope the above gives you at least some additional info. As I wrote: I'll do
the 'debug' addition tomorrow.

Cheers,

-- 
Frank B. Brokken
Center for Information Technology, University of Groningen
(+31) 50 363 9281 
Public PGP key: http://pgp.surfnet.nl
Key Fingerprint: DF32 13DE B156 7732 E65E  3B4D 7DB2 A8BE EAE4 D8AA

___
Pkg-systemd-maintainers mailing list
Pkg-systemd-maintainers@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-systemd-maintainers


Bug#808151: systemd: failed to start remount root and kernel file system

2015-12-18 Thread Frank B. Brokken
Dear Michael Biebl, you wrote:

> The verbose debug log from the initramfs and systemd can maybe tell us
> more. But looking at https://www.icce.rug.nl/systemd/journalctl, the
> sda5 mount happens at line 773, the first errors start at line 785 and
> the remount is at line 802.
> So it looks like /usr is not mounted at the time
> systemd-remount-fs.service is run.

Right. That's consistent with the impression I got from the error messages.
*Why* that is true, however, eludes me.

> 
> What's also curious is, that the log doesn't seem to be complete.
> Usually systemd's first log line is something like
> 
> > Dez 18 07:03:47 pluto systemd[1]: systemd 228 running in system mode. (+PAM 
> > +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP 
> > +GCRYPT +GNUTLS +ACL +X
> > Dez 18 07:03:47 pluto systemd[1]: Detected architecture x86-64.
> 
> Those early boot messages seem to be missing completely.

Well, I didn't edit anything. The information I generated is passed to you the
way it was made available by the various programs/commands.


> Btw, you mentioned that this happened after an upgrade. Which previous
> version did you run which worked fine? Which other packages were
> upgraded along with it?

Is there a way to determine that? What I do to upgrade the system is run
'aptitude update' and then 'aptitude upgrade'. Is there a log somewhere that
tells me what packages and versions were updated at what moments in time?


> If you downgrade systemd/udev, does the problem go away?

I thought about doing that, but was afraid for an avalanche of forced
downgrades of packages that might now depend on the most recent udev and
systemd versions. But I'll give it a try asap and let you know the results.

-- 
Frank B. Brokken
Center for Information Technology, University of Groningen
(+31) 50 363 9281 
Public PGP key: http://pgp.surfnet.nl
Key Fingerprint: DF32 13DE B156 7732 E65E  3B4D 7DB2 A8BE EAE4 D8AA

___
Pkg-systemd-maintainers mailing list
Pkg-systemd-maintainers@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-systemd-maintainers


Bug#808151: systemd: failed to start remount root and kernel file system

2015-12-17 Thread Frank B. Brokken
Dear Michael Biebl, you wrote:

> What happens afterwards? Are you dropped into the rescue shell?

Afterwards (i.e., after the initial failure message) the system tries to
continue booting, but shows lots of failure messages, eventually grinding to a
halt. No reboot (e.g. ctrl-alt-del) is possible and there's no rescue shell.

> If not, try to enable the debug shell by adding "systemd.debug-shell" to
> the kernel command line. This will give you a root shell on tty9.

Unfortunately, it doesn't, since the system halts (I first removed the
automatic reboot on failure).

However, during the process I observed that setting systemd.debug-shell and
removing the default 'quiet' specification from grub's /etc/default/grub (so,
now it specifies:

GRUB_CMDLINE_LINUX_DEFAULT="systemd.debug-shell" 

greatly reduces the chances of a failing boot. Not completely, but
greatly. Still, when rebooting fails there's just the plain halt, w/o a debug
shell. Since removing the quiet also produces a lot more output on the screen,
might my problem not simply be some timing problem?


-- 
Frank B. Brokken
Center for Information Technology, University of Groningen
(+31) 50 363 9281 
Public PGP key: http://pgp.surfnet.nl
Key Fingerprint: DF32 13DE B156 7732 E65E  3B4D 7DB2 A8BE EAE4 D8AA


signature.asc
Description: PGP signature
___
Pkg-systemd-maintainers mailing list
Pkg-systemd-maintainers@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-systemd-maintainers

Bug#808151: systemd: failed to start remount root and kernel file system

2015-12-17 Thread Frank B. Brokken
Dear Michael Biebl, you wrote:
> Am 17.12.2015 um 13:46 schrieb Frank B. Brokken:

> > halt. No reboot (e.g. ctrl-alt-del) is possible and there's no rescue 
> > shell> 
> What exactly do you mean with halt? The systems completely locks up so
> you can't use the keyboard and switch to tty9?

No, that's not what happens. I mean that doing a reboot using ctrl-alt-del
isn't possible. Switching VTs is no problem, but except for VT1 nothing was
being shown. But maybe I overlooked things when I sent you the previous reply:
see below.

> That would sound like a kernel problem.

> > might my problem not simply be some timing problem?
> 
> Can you make a screenshot or a video from the boot process with "quiet"
> removed from the kernel command line.

I did. Not only that, I also tried to reboot again and this time I was able to
run the commands you asked before from tty9:

systemctl status
systemd-analyze dump
journalctl -alb

This time the debug shell prompt was available at tty9, although booting
failed. And in line with my previous findings, systemd-analyze and journalctl
weren't available, as they live in /usr/bin, and /usr hadn't been mounted. But
after mounting /usr from tty9 and then using the mount command systemd-analyze
and journalctl were available, so I also have the output from those commands
for you. The output, and the mp4 movie I made during the booting process are a
bit too large for the e-mail, but they are available for download/inspection
at https://www.icce.rug.nl/systemd/

Cheers,

-- 
Frank B. Brokken
Center for Information Technology, University of Groningen
(+31) 50 363 9281 
Public PGP key: http://pgp.surfnet.nl
Key Fingerprint: DF32 13DE B156 7732 E65E  3B4D 7DB2 A8BE EAE4 D8AA


signature.asc
Description: PGP signature
___
Pkg-systemd-maintainers mailing list
Pkg-systemd-maintainers@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-systemd-maintainers