On Wed, Dec 16, 2009 at 10:07 PM, Tom Bennet <twben...@gmail.com> wrote:
> I spent the day recovering from a Gentoo upgrade, and thought I'd document
> the experience in case it helps someone else.
>
> I'm running a custom kernel 2.6.25-gentoo-r7 on amd64, though I don't think
> the rarer hardware is relevant.
>
> I tend to put off upgrading my Gentoo box because anytime I do, something
> breaks. I'm afraid I haven't changed my opinion about that. Anyway, I did
> "emerge --update --deep world" and plugged my ears. Some 600-odd packages
> (and a few simpler problems) later, the system seemed to be doing okay. So
> I thought I'd see if it could survive a reboot. No, it couldn't.
>
> On boot it failed checking the root file system and dropped into the repair
> shell. The reason the fsck failed is that the root pseudo device file
> /dev/md0 didn't exist. The root file system was actually fine, though.
> Inside the repair shell, I could see all the files from my root, but there
> wasn't much in /dev. (I have the md stuff compiled into the kernel, and
> don't use an initrd, so it wasn't an initrd problem.)
>
> Short Solution
>
> The problem was with udev, the facility which automatically populates the
> /dev directory. During the upgrade, emerge noted that my kernel version was
> a bit early, but acceptable. What was missing, apparently, was the signalfd
> syscall, which that kernel version either doesn't have or I hadn't
> configured. Apparently, udev has only started using signalfd recently, so
> the solution was to downgrade to an older version of udev (udev-141 to be
> precise).
>
> What I Actually Did To Get There
>
> Of course, I didn't know that at first. Just had a fun unbootable system.
> I might have been able to simply emerge the downgrade from the repair shell
> (the network did come up), but I didn't know to try that yet. I figured I
> wanted to find some way to make the system boot.
> Since the failing file check is done from /etc/init.d/checkroot, I added a
> mknod command to create the device node before trying to run the file
> check. At the start of the start() method:
>
>     if [ ! -e /dev/md0 ] ; then
>         mknod -m 0660 /dev/md0 b 9 0
>     fi
>
> It's a hack, not a solution, but it did make the system boot, to a rather
> crippled state. Since there were a lot of devices missing, a lot of
> services wouldn't start. (If you're using a more boring root partition, it
> might be something like "mknod -m 0660 /dev/sda1 b 8 1")
>
> So I had managed by now to gather that udev wasn't working, but I didn't
> know why. My first thought was to try "/etc/init.d/udev start", to see if
> it would start. But it told me that the script is written for baselayout-2,
> and I shouldn't use it on baselayout-1. Following a bit of googling about
> what the heck a baselayout is, I gathered that I was using baselayout-1, and
> so the service wasn't supposed to be started that way. So it wasn't a bug
> that it wouldn't start that way. Another page suggested trying to run it
> directly, with "/sbin/udevd --daemon", which gave the message "error getting
> signalfd". That told me why it didn't start. This message was also in the
> logs, but for some reason I didn't look there until later.
>
> So back to Google, and I found a message on a Debian board noting that udev
> had started using signalfd recently. This suggested an old version might do
> the trick. I tried one, and it did.
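
For anyone who lands in the same repair shell, the two checks described above could be sketched roughly as below. This is only an illustration, not what Tom ran: the function names has_signalfd and node_cmd are made up for the example, and it assumes the kernel option that gates the syscall is CONFIG_SIGNALFD and that a config file is available somewhere (for instance /usr/src/linux/.config, or /proc/config.gz if the kernel exports it). It only covers the "hadn't configured" case; it cannot tell whether the kernel is simply too old for what udev wants.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Gentoo or udev.

# has_signalfd CONFIG_FILE
# Succeeds if the given kernel config file (plain or gzipped) has
# CONFIG_SIGNALFD=y. Assumption: this option controls the signalfd syscall.
has_signalfd() {
    cfg="$1"
    case "$cfg" in
        *.gz) gzip -dc "$cfg" 2>/dev/null | grep -q '^CONFIG_SIGNALFD=y$' ;;
        *)    grep -q '^CONFIG_SIGNALFD=y$' "$cfg" 2>/dev/null ;;
    esac
}

# node_cmd PATH TYPE MAJOR MINOR
# Prints the mknod command that would create a missing device node, and
# prints nothing if PATH already exists. It only prints; it never runs
# mknod itself, so it is safe to try as a normal user.
node_cmd() {
    if [ ! -e "$1" ]; then
        echo "mknod -m 0660 $1 $2 $3 $4"
    fi
}
```

With those defined, something like "has_signalfd /usr/src/linux/.config || echo 'signalfd not enabled'" and "node_cmd /dev/md0 b 9 0" would report what is missing before touching anything.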

I really only have two things to say after reading this...

First, and this really does overshadow the second in weight, thank you for
the excellently presented writeup of problem *and* solution. Far more often
than should be the case (less so here, but across the net as a whole),
problems are mentioned, solutions are offered, and a good, clear "this
worked" rarely follows.

Secondly... it's been my experience with Gentoo that things break far more
often when I allow longer delays between updates than when I keep up to date
with everything. That has held true for me on both x86 and ~x86 systems (as
has the headache when I've put updates off).

And... to reiterate part of the "first": thank you for the writeup.

-- 
Poison [BLX]
Joshua M. Murphy