On Tue, Dec 16, 2008 at 10:08:59AM -0700, Aaron D. Johnson wrote:
> dann frazier writes:
> > I've got a theory - can you search the /var/log/kern.log* files on
> > this guest for any Oops messages?
>
> No Oopses going back to 3 Dec:
> ajohn...@spielplatz:~$ sudo zgrep -i oops /var/log/kern.log*
> ajohn...@spielplatz:~$ sudo gzip -dc /var/log/kern.log.6.gz | head -n 1
> Dec 3 06:26:12 spielplatz kernel: [15525239.819366] postgres(22142):
> floating-point assist fault at ip 40000000003de402, isr 0000040000000008
> ajohn...@spielplatz:~$
>
> Countless floating-point assist fault messages, though. It seems that
> PostgreSQL needs some help in this department.
>
> > Do you recall experiencing a hang during your kernel upgrade?
>
> I remember a hang on shutdown for some system during the last week,
> but nothing during the kernel package upgrade proper.
>
> > I'm wondering if there was an oops at the time you upgraded your
> > kernel package. Also, can you mount your efi partition and capture
> > the md5sums of the files under /boot/efi/efi/debian?
>
> ajohn...@spielplatz:~$ sudo mount -v -t vfat -o ro /dev/sda1 /mnt
> ajohn...@spielplatz:~$ md5sum /mnt/efi/debian/*
> 9fa2639fa5dca1521df76c7c254f4e04 /mnt/efi/debian/elilo.conf
> 5bec2375858e01c4590976f3fb479a3c /mnt/efi/debian/elilo.efi
> f6d26c846defcbb6a255365b32205e69 /mnt/efi/debian/initrd.img
> f43e07c02fff08489e5d1f60dc0046ae /mnt/efi/debian/initrd.img.old
> 35a0f1cd6e79fc7ffd93ca1dddb5df01 /mnt/efi/debian/readme.txt
> 384b24d661e30ca549569954ab9dc3ae /mnt/efi/debian/vmlinuz
> 67a9622f681abd91cc4710da8894b743 /mnt/efi/debian/vmlinuz.old
> ajohn...@spielplatz:~$
>
> > If my theory is correct, you may be able to get back up and running
> > by booting an older kernel (if you have one), running 'elilo', then
> > booting back into the 2.6.26-11 kernel.
>
> OK, so that worked. What change did re-running elilo make? Based on
> the MD5sums, there are new initrd and vmlinuz files. Seems like
> installing kernel-image-2.6.26-1-mckinley should have done that in its
> postinst script.

Here's what I think happened:
 - Running 2.6.26-8
 - Upgraded to 2.6.26-11
   - unpacked 2.6.26-11
   - generated initramfs
   - called elilo
     - elilo loads modules it needs to mount EFI partition, but the
       modules available are now for 2.6.26-11 and are incompatible
       with 2.6.26-8.
     - system tries to mount efi partition and hangs due to
       incompatible modules
     - kernel/initrd in the efi partition is now out of date with
       respect to the files in /boot
 - system boots 2.6.26-8 again
   - initramfs loads, works fine (still using 2.6.26-8 initramfs)
   - system mounts root
   - system starts loading modules from the root partition (which are
     now 2.6.26-11 modules), and does bad things.

The bug would therefore be that we created a kernel with the same
abiname that was actually incompatible with the modules from an
earlier release.

> What happens to the poor user who doesn't know to re-run elilo? (Not
> that I expect there are too many "poor users" running ia64 systems.)

Unfortunately, I don't know that there's any way to retroactively solve
this problem. The cat is out of the bag, as they say.

It would be a nice safety procedure to make sure the modules we need
are loaded before we unpack the new modules - i.e., in the preinst.
One way to do this would be to call 'elilo' in the preinst. Savvy
users can configure their systems to do this themselves by adding a
preinst hook in /etc/kernel-img.conf.

-- 
dann frazier
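
For anyone who wants to check whether they are in the state described above (kernel/initrd in the EFI partition out of date with respect to the files in /boot), here is a rough shell sketch. The function name efi_copy_is_current is made up for illustration, and the paths in the usage comment are assumptions based on the md5sum listing earlier in this thread; the real source paths depend on your elilo.conf. It does a byte-for-byte comparison with cmp rather than comparing md5sums, which answers the same question.

```shell
#!/bin/sh
# Hypothetical helper (not part of elilo or any Debian package):
# succeed only if the copy in the EFI partition is identical to the
# source file that elilo would have copied from.
#   exit 0 -> files are identical
#   exit 1 -> files differ (the EFI copy is stale)
#   exit 2 -> one of the files is missing
efi_copy_is_current() {
    src=$1   # file on the root filesystem, e.g. the kernel image
    dst=$2   # corresponding file under the mounted EFI partition
    [ -f "$src" ] && [ -f "$dst" ] || return 2
    cmp -s "$src" "$dst"
}

# Example use, assuming the EFI partition is mounted read-only on /mnt
# as in the thread, and that /vmlinuz and /initrd.img are the images
# named in elilo.conf (an assumption; check your own configuration):
#   efi_copy_is_current /vmlinuz    /mnt/efi/debian/vmlinuz    || echo "kernel is stale"
#   efi_copy_is_current /initrd.img /mnt/efi/debian/initrd.img || echo "initrd is stale"
```

If either check reports a stale copy, re-running 'elilo' from a kernel whose modules match (as Aaron did) should bring the EFI partition back in sync.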