Peter Humphrey <[EMAIL PROTECTED]> posted [EMAIL PROTECTED], excerpted below, on Mon, 27 Aug 2007 12:33:04 +0100:
> All of which makes me wonder what the device map is for, seeing as booting succeeds without it.

The device map is for one thing: telling grub what BIOS order your boot devices are in when it can't figure that out directly on its own. Since grub gets that info directly from the BIOS when booting, it ignores the device map at that point. The map is mainly (only?) used when grub does its installs from Linux (or another OS, BSD or whatever). If the device map is there when doing the install from Linux, grub will use it. If not, it probes -- but those probes aren't always accurate.

Ideally, then, what one does is run grub with --device-map initially, to create the map, then reboot and confirm the order from the grub boot shell, using grub's find and cat commands to help figure out which is which if necessary. I don't believe there's a way to write directly to disk from the boot shell. (There's the dump command, but that copies from one file to another, so it wouldn't work for modifying a file on disk. You modify the command line if necessary, boot, /then/ modify the files from the OS.) So if you find it's the wrong order, you either remember or write down the changes, and make them after booting.

Anyway, once you have the device map correct, you should then be able to reinstall grub from Linux without issue (using the --device-map parameter to tell it where the file is if it's not at the default spot), since it will then use the device map to direct where it writes. Of course, if for whatever reason the drive order changes from what's in your device map (the common causes are installing or removing disks, or a kernel config change that makes the order different), everything's screwed up again.
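To make the file format concrete, here is a minimal sketch of a device map and a quick way to check what a given grub drive name points at. The four entries are the order Peter reports for his box further down; the scratch file location and the helper name devmap_lookup are made up for illustration, so adjust everything to your own system.

```shell
# Sketch only: write a sample device map to a scratch file, NOT to /boot/grub.
# Each line pairs a grub (BIOS-order) drive name with a Linux device node.
devmap="${TMPDIR:-/tmp}/device.map.example"
cat > "$devmap" <<'EOF'
(hd0)   /dev/hda
(hd1)   /dev/sda
(hd2)   /dev/sdb
(hd3)   /dev/sdc
EOF

# Hypothetical helper: print the Linux device a grub drive name maps to.
devmap_lookup() {
    awk -v want="($1)" '$1 == want { print $2 }' "$devmap"
}

devmap_lookup hd1

# Once the real file is confirmed, point grub at it (as root), e.g.:
#   grub --device-map=/boot/grub/device.map
# From the grub boot shell, something like "find /boot/grub/stage1"
# (path assumed; use wherever your stage files live) lists which
# (hdX,Y) partitions hold the file, which helps confirm the order.
```

If the order you confirm from the boot shell differs from the file, edit the file after booting, then reinstall.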
/Unlike/ LILO, grub doesn't (normally; obviously /something/ happened recently that was an exception) have to be reinstalled every time you update the kernel. So, at least here, it's as likely as not that the device map will be wrong due to kernel or device changes when I DO decide to update it. I therefore basically ignore the device map, and prefer to do the grub install to the MBR from the grub boot shell, so I don't have to worry about the device map OR grub failing to get the order correct with its dynamic probes.

>> Theory: Kernel 2.6.22 (perhaps because of the options selected in menuconfig), caused a departure from the usual practice of keeping the first 63 sectors after the master boot record clear and starting the partition data table at sector 64.
>> The first part of this area is normally used by stage 1.5.
>
> I don't know about this; surely the Linux kernel team wouldn't arbitrarily redefine the data structures used by all OSes?

Unless... they either did it inadvertently (a bug) or as an unavoidable result of fixing, say, a security bug.

It's also worth noting that I didn't have any (observed) problem with the "vanilla, direct from kernel.org, not Gentoo's" kernel 2.6.22, only with 2.6.23-rc3 (the first rc I tried; I skipped the first two). Here it's reported to be a problem with (Gentoo's) kernel 2.6.22-r5 as well (I didn't pay attention to which specific sources package; gentoo-sources, I guess?). Now, Gentoo adds the upstream 2.6.x.y "y" stable updates as -rX updates of 2.6.x, so 2.6.22-r5 presumably includes at least the first one or two stable updates from upstream.

Thus, there are two... well, I just thought of a third one, so make that three... possibilities here.
Either something came up that was considered important enough to change both the ongoing series (in the form of the 2.6.23-rcs) and stable (in the form of 2.6.22.y, which Gentoo incorporated as 2.6.22-rX), with implications either important enough to be worth the problem or possibly unrealized, OR something in 2.6.22 set the stage for the /next/ update to fail, that next update being (kernel.org's) 2.6.23-rc3 for me, and (Gentoo's) 2.6.22-r5 here.

The third possibility, and we'll have to compare notes here, comes from my own experience: I upgraded to the new (just headed to unstable ~arch) baselayout-2 shortly before the kernel upgrade. Now, I'm NOT thinking baselayout-2 had anything to do with it on its own. However, due to some of the changes in the way it handles the pre-checkfs steps with LVM and RAID (they have their own separate initscripts now, and I screwed up the dependencies a bit), the first shutdown I did didn't shut everything down in the right order, and I ended up with a RAID-6 rebuild to get all drives back online.

Thus, the third possibility is that the problem may have been introduced earlier, perhaps as early as 2.6.21 (both .21 and .22 reworked some md/RAID code), and only the RAID repair triggered it. That's the only thing I've done recently that has changed the filesystem or partition layout. Maybe it rewrote the md/RAID superblock, and that triggered the problem?

So the comparing-notes end of this third possibility would be confirming that others experiencing the issue (a) have md/RAID at a redundant level (RAID-1, 5, 6 or 10; not RAID-0 or linear, and not just plain non-RAID disk devices), and (b) have had some RAID damage recently that triggered a RAID repair, which may possibly have rewritten the RAID superblock. If either (a) or (b) is confirmed NOT to be the case, then this third possibility is shot full of holes.
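For anyone wanting to check condition (a) quickly, here is a rough sketch that lists md arrays at a redundant level. It assumes the kernel exposes the usual /proc/mdstat with lines like "md0 : active raid6 ..."; the function name redundant_md_arrays is made up for illustration, and it says nothing about (b), whether a repair happened recently.

```shell
# Hypothetical helper: list md arrays whose level is RAID-1, 5, 6 or 10.
# Reads /proc/mdstat by default; pass another file to test against a sample.
redundant_md_arrays() {
    mdstat="${1:-/proc/mdstat}"
    [ -r "$mdstat" ] || return 0    # no md support here: nothing to list
    awk '$2 == ":" {
        # Array lines look like: md0 : active raid6 sdd1[3] sdc1[2] ...
        for (i = 3; i <= NF; i++)
            if ($i ~ /^raid(1|5|6|10)$/) print $1, $i
    }' "$mdstat"
}

redundant_md_arrays
```

No output means no redundant md arrays were found, which would rule out the third possibility on that box.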
> What I do know is that, until yesterday, every time I've run grub to install itself on any disk, whether MBR or partition boot record, it's always failed to install stage 1.5. Yesterday, though, it succeeded for the first time when I ran "grub --device-map=/boot/grub/device.map".

Hmm... now you are making me wish I had paid better attention when reinstalling grub here. I know it gave me the (non-fatal) warning on at least one of the four images, but I didn't pay close enough attention to know if it was all of them. It's possible it was just the one I had installed the UBUNTU grub version to, to get back up and running, or that it was the other three only and /not/ that one. I wish I remembered now, but I didn't pay any more attention after I realized it was specifically stated as non-fatal.

>> No partition table. Except you got a message, "(It couldn't find stage 1.5.)", suggesting that the partition data table was protected and stage 1.5 was never really written or was written who knows where.
>
> As this was the first time that stage 1.5 had ever been reported as written successfully, your reasoning is at least plausible.

I think I've always gotten the warning too, which would be further confirmation, but as I said, I wasn't paying enough attention to see whether I got it consistently this last time, so I can't directly map the absence of the warning to the problem this time.

>> Alternatively, an unknown (to me) repair process restored the start of disk partition table from the end of disk partition table and obliterated stage 1.5.
>
> I don't know what could have done that.

That's what got me thinking about the third possibility I mention above...

>> Suggestion:
>> Shrink/move partitions to leave a megabyte or so of unallocated space before the first partition. Then reinstall grub.
>
> I've done that, making the first partition on each of the drives start at cylinder 2 (except /dev/sdc, which I'm leaving severely alone for the moment), and for the moment all seems serene. Thanks for the idea.

I haven't and don't plan to. Everything's working now. I'm leaving well enough alone for the moment.

> One thing that does seem odd is that grub, when run from the Linux command line, takes no notice of the disk presentation order I've declared in the BIOS, even though it says it's "probing devices to guess BIOS drives". Always, hd0 is /dev/hda, hd1 is /dev/sda, hd2 is /dev/sdb and hd3 is /dev/sdc. At least it's logical and consistent.

This isn't entirely unusual. I've had mobos/BIOSs that change the drive order they tell the OS when you change the drive that boots, and mobos/BIOSs that kept it entirely consistent, no matter which one you told to boot. Unlike Linux, which doesn't care about such things, MSWormOS always wants to be on the first disk, so this matters. However, if it refuses to boot the first time, the BIOS errors out but apparently switches things around so it boots the second time. At least, that was my experience here. (The MSWormOS was 98; newer versions may be different.)

Then in one of these threads (I'm forgetting which is which now), someone posted that their BIOS has another entire page, with drive order separate from boot order, so I guess on that one you can specify the two separately, and set it up to behave exactly as you want/need it to.

> Incidentally, I'm attempting to create a grub floppy, and grub has now been stuck in this probing phase for the best part of an hour. The installed /dev/fd0 on this box seems to be faulty, so I've plugged a USB floppy in, which is recognised as /dev/sdd. Grub seems to dislike it, however. The probing has never taken more than about 90s before now, and until I installed the latest two disks in the box I'd have missed it if I'd blinked.
I've no experience with USB floppy drives at all. However, I know that due to the way the floppy ribbon cables were keyed (or rather, not keyed so well), it was often possible to accidentally reverse them. I'd hate to try that with a hard drive, but with a floppy, at least here, all it did was cause the drive not to function properly until the cable was unplugged and reversed again so it was plugged in correctly. Voilà! The thing worked as well as it did before! So if the floppy seems to be faulty, double check and/or try reversing the data cable (be careful, it could damage the mainboard if you get it wrong; it didn't here, so obviously it doesn't all the time).

Also, you may have it plugged into the B drive slot instead of A. So you may have to set the BIOS to the second one or plug the floppy in using the other connector on the cable (if it has two). IIRC, the A floppy will have a portion of the cable twisted (not the connector, a part of the ribbon cable itself). I always thought the twist should be on the B floppy, so if you only had A plugged in, you could use a flat ribbon instead of having that twist in part of it, but hey, they made the A floppy the twisted one, for whatever reason. So if you have a ribbon cable and it's flat between the board and the floppy drive, with no part of it twisted, you have it plugged in as the B floppy, not the A floppy. Make sure the BIOS setting is the same, or simply set both A and B to the appropriate device size and try both fd0 and fd1.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

-- 
[EMAIL PROTECTED] mailing list