Peter Humphrey <[EMAIL PROTECTED]> posted
[EMAIL PROTECTED], excerpted below, on  Mon, 27 Aug
2007 12:33:04 +0100:

> All of which makes me wonder what the device map is for, seeing as
> booting succeeds without it.

The device map is for one thing: telling grub what BIOS order your boot 
devices are in when it can't figure it out directly on its own.  Since it 
gets that info directly from BIOS when booting, it ignores the device map 
at that point.  It's mainly (only?) used doing its installs from Linux 
(or other OS, BSD or whatever).  If the device map is there when doing 
the installs from Linux, it'll use it.  If not, it probes -- but those 
probes aren't always accurate.

Ideally, then, what one does is run grub with --device-map initially, to 
create it, then reboot and confirm the order from the grub boot shell, 
using grub's find and cat commands to help figure out which is which if 
necessary.  I don't believe there's a way to write directly to disk from 
the boot shell.  (There's the dump command, but that copies from one file 
to another, so it wouldn't work for modifying a file on disk.)  So if you 
find the order is wrong, you modify the command line as necessary, boot, 
and /then/ fix the files from the OS, either remembering or writing down 
the changes to make.  Anyway, once you have the device map correct, you 
should then be able to reinstall grub from Linux without issue (using the 
--device-map parameter to tell it where the file is if it's not at the 
default spot), since it will then use the device map to direct where it 
writes.
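
For anyone who hasn't looked inside one, the device map is just a text 
file with one "(grub name) OS device" pair per line.  Here's a minimal 
Python sketch that reads that format, so the mapping can be eyeballed or 
compared against what the grub boot shell shows.  The parse_device_map 
name and the SAMPLE contents are mine, for illustration only (the sample 
uses the order Peter reports below), and treating "#" lines as comments 
is an assumption of the sketch, not something I've checked against grub's 
own parser.

```python
import re

# One entry per line, e.g. "(hd0)<whitespace>/dev/hda".
ENTRY = re.compile(r"^\((fd|hd)(\d+)\)\s+(\S+)$")


def parse_device_map(text):
    """Return {grub_name: os_device}, e.g. {"hd0": "/dev/hda"}."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption of this sketch: blank lines and "#" lines are skipped.
        if not line or line.startswith("#"):
            continue
        m = ENTRY.match(line)
        if m is None:
            raise ValueError(f"unrecognised device.map line: {raw!r}")
        mapping[m.group(1) + m.group(2)] = m.group(3)
    return mapping


# Illustrative contents only, matching the order reported in this thread.
SAMPLE = """\
(hd0)\t/dev/hda
(hd1)\t/dev/sda
(hd2)\t/dev/sdb
(hd3)\t/dev/sdc
"""

if __name__ == "__main__":
    for name, dev in parse_device_map(SAMPLE).items():
        print(name, "->", dev)
```

Feed it the text of /boot/grub/device.map instead of SAMPLE to look at a 
real one.  It only reads; it never writes anything.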

Of course, if for whatever reason the drive order changes from what's in 
your device map (the common causes are installing or removing disks, or a 
kernel config change that makes the order different), everything's 
screwed up again.  /Unlike/ LILO, grub doesn't (normally; obviously 
/something/ happened recently that was an exception) have to be 
reinstalled every time you update the kernel.  So, at least here, it's as 
likely as not that the device map will be wrong due to kernel or device 
changes by the time I DO decide to update grub.  I therefore basically 
ignore the device map, and prefer to do the grub install to the MBR from 
the grub boot shell, so I don't have to worry about the device map OR 
grub failing to get the order correct with its dynamic probes.

>> Theory: Kernel 2.6.22 (perhaps because of the options selected in
>> menuconfig), caused a departure from the usual practice of keeping the
>> first 63 sectors after the master boot record clear and starting the
>> partition data table at sector 64.
>> The first part of this area is normally used by stage 1.5.
> 
> I don't know about this; surely the Linux kernel team wouldn't
> arbitrarily redefine the data structures used by all OSes?

Unless... they either did it inadvertently (bug) or as an unavoidable 
result of fixing say a security bug.
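
That theory is at least easy to check without changing anything.  By the 
usual DOS convention the first partition starts at LBA 63 (the 64th 
sector if you count from one), which leaves sectors 1 through 62 free for 
stage 1.5.  What follows is a rough Python sketch, not a tested tool, 
that reads the four primary entries from a copy of the MBR and reports 
the lowest starting sector.  The function names are hypothetical, it 
assumes 512-byte sectors and a classic MBR (not GPT), and it ignores 
logical partitions inside an extended partition.

```python
import struct
import sys

SECTOR = 512  # bytes per sector (assumed)


def first_partition_start(mbr):
    """Return the lowest starting LBA among the primary entries, or None."""
    if len(mbr) < SECTOR or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    starts = []
    for i in range(4):
        # Four 16-byte entries, starting at byte offset 446.
        entry = mbr[446 + 16 * i:446 + 16 * (i + 1)]
        ptype = entry[4]
        # Little-endian 32-bit start LBA and sector count at offsets 8 and 12.
        start_lba, n_sectors = struct.unpack_from("<II", entry, 8)
        if ptype != 0 and n_sectors > 0:
            starts.append(start_lba)
    return min(starts) if starts else None


def gap_bytes(start_lba):
    """Bytes between the MBR (sector 0) and the first partition."""
    return (start_lba - 1) * SECTOR


if __name__ == "__main__" and len(sys.argv) > 1:
    # Reading a whole-disk device such as /dev/sda normally needs root.
    with open(sys.argv[1], "rb") as disk:
        start = first_partition_start(disk.read(SECTOR))
    if start is None:
        print("no primary partitions found")
    else:
        print(f"first partition starts at LBA {start}; "
              f"{gap_bytes(start)} bytes free after the MBR")
```

Run it with the whole-disk device as the only argument.  It opens the 
device read-only, so it can't make matters worse.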

It's also worth noting that I didn't have any (observed) problem with the 
"vanilla, direct from kernel.org, not Gentoo's" kernel 2.6.22, only with 
2.6.23-rc3 (the first one I tried, I skipped the first two).

Here it's reported to be a problem with (Gentoo's) kernel 2.6.22-r5 (I 
didn't pay attention to which specific one, gentoo-sources I guess?) as 
well.  Now, Gentoo adds the upstream 2.6.x.y "y" stable updates as -rX 
updates of 2.6.x, so 2.6.22-r5 presumably includes at least the first one 
or two stable updates from upstream.

Thus, there are two... well, I just thought of a third one, so make that 
three... possibilities here.  Either something came up that was 
considered important enough to change both ongoing (in the form of the 
2.6.23-rcs) and stable (in the form of 2.6.22.y, which Gentoo 
incorporated as 2.6.22-rX), with implications either important enough to 
be worth the problem or possibly unrealized, OR something in 2.6.22 set 
the stage for the /next/ update to fail, that next update being 
(kernel.org's) 2.6.23-rc3 for me, and (Gentoo's) 2.6.22-r5 here.

The third possibility needs some background, and we'll have to compare 
notes here, since this is only my perspective.  I upgraded to the new 
(just headed to unstable ~arch) baselayout-2 shortly before the kernel 
upgrade.  Now, I'm NOT 
thinking baselayout-2 had anything on its own to do with it.  However, 
due to some of the changes in the way it handles the pre-checkfs steps 
with LVM and RAID (they have their own separate initscripts now, and I 
screwed up the dependencies a bit), the first shutdown I did didn't shut 
everything down in the right order, and I ended up with a RAID-6 rebuild 
to get all drives back online.  Thus, the third possibility is that the 
problem may have been introduced earlier, perhaps as early as 2.6.21 
(both .21 and .22 reworked some md/RAID code), and only the RAID repair 
triggered it.  That's the only thing I've done recently that has changed 
the filesystem or partition layout.  Maybe it rewrote the md/RAID 
superblock, and that triggered the problem?

So the comparing notes end of this third possibility would be confirming 
that others experiencing the issue (a) have md/RAID (RAID-1, 5, 6 or 10, 
not RAID-0 or linear, and not just simple non-RAID hardware disk 
devices), and (b) have had some RAID damage recently that triggered a 
RAID repair, which may have rewritten the RAID superblock.  If either (a) 
or (b) is confirmed NOT to be the case, then this third possibility is 
shot full of holes.

> What I do
> know is that, until yesterday, every time I've run grub to install
> itself on any disk, whether MBR or partition boot record, it's always
> failed to install stage 1.5. Yesterday, though, it succeeded for the
> first time when I ran "grub --device-map=/boot/grub/device.map".

Hmm... now you are making me wish I'd paid better attention when 
reinstalling grub here.  I know it gave me the (non-fatal) warning on at 
least one of the four images, but I didn't pay close enough attention to 
know if it was all of them.  It's possible it was just the one I had 
installed the Ubuntu grub version to, to get back up and running, or that 
it was the other three only and /not/ that one.  I wish I remembered, 
now, but I didn't pay any more attention after I realized it was 
specifically stated as non-fatal.

>> No partition table. Except you got a message, "(It couldn't find stage
>> 1.5.)", suggesting that the partition data table was protected and
>> stage 1.5 was never really written or was written who knows where.
> 
> As this was the first time that stage 1.5 had ever been reported as
> written successfully, your reasoning is at least plausible.

I think I've always gotten the warning too, which would be further 
confirmation, but as I said, I wasn't paying enough attention to see 
whether I did consistently this last time, so can't directly map the 
absence of the warning to the problem this time.

>> Alternatively, an unknown (to me) repair process restored the start of
>> disk partition table from the end of disk partition table and
>> obliterated stage 1.5.
> 
> I don't know what could have done that.

That's what got me thinking about the third possibility I mention above...

>> Suggestion:
>> Shrink/move partitions to leave a megabyte or so of unallocated space
>> before the first partition. Then reinstall grub.
> 
> I've done that, making the first partition on each of the drives start
> at cylinder 2 (except /dev/sdc, which I'm leaving severely alone for the
> moment), and for the moment all seems serene. Thanks for the idea.

I haven't and don't plan to.  Everything's working now.  I'm leaving well 
enough alone for the moment.

> One thing that does seem odd is that grub, when run from the Linux
> command line, takes no notice of the disk presentation order I've
> declared in the BIOS, even though it says it's "probing devices to guess
> BIOS drives". Always, hd0 is /dev/hda, hd1 is /dev/sda, hd2 is /dev/sdb
> and hd3 is /dev/sdc. At least it's logical and consistent.

This isn't entirely unusual.  I've had mobos/BIOSs that change the drive 
order they tell the OS when you change the drive that boots, and 
mobos/BIOSs that kept it entirely consistent, no matter which one you 
told to boot.  Unlike Linux, which doesn't care about such things, 
MSWormOS always wants to be on the first disk, so this matters.  However, 
if MSWormOS refuses to boot the first time, the BIOS errors out but 
apparently switches things around so it boots the second time.  At least, 
that was my experience here.  (The MSWormOS was 98; newer versions may be 
different.)

Then in one of these threads (I'm forgetting which is which now), someone 
posted that their BIOS has another entire page, with drive order separate 
from boot order, so I guess on that one you can specify the two 
separately, and set it up to behave exactly as you want/need it to.

> Incidentally, I'm attempting to create a grub floppy, and grub has now
> been stuck in this probing phase for the best part of an hour. The
> installed /dev/fd0 on this box seems to be faulty, so I've plugged a USB
> floppy in, which is recognised as /dev/sdd. Grub seems to dislike it,
> however. The probing has never taken more than about 90s before now, and
> until I installed the latest two disks in the box I'd have missed it if
> I'd blinked.

I've no experience with USB floppy drives at all.  However, I know that 
due to the way the floppy ribbon cables were keyed (or more, not keyed so 
well), it was often possible to accidentally reverse them.  I'd hate to 
try that with a hard drive, but with a floppy, at least here, all it did 
was cause it not to function properly until the cable was unplugged and 
reversed again so it was plugged in correctly.  Voilà!  The thing worked 
as well as it did before!

So if the floppy seems to be faulty, double check and/or try reversing 
the data cable (be careful, it could damage the mainboard if you get it 
wrong, but didn't here, so obviously doesn't all the time).

Also, you may have it plugged into the B drive slot instead of A.  So you 
may have to set the BIOS to the second one or plug the floppy in using 
the other connector on the cable (if it has two).  IIRC, the A floppy 
will have a twisted portion of the cable (not the connector, a part of 
the ribbon cable itself).  I always thought it should be the B floppy, so 
if you only had A 
plugged in, you could use a flat ribbon instead of having that twist in 
part of it, but hey, they made the A floppy the twisted one, for whatever 
reason.  So if you have a ribbon cable and it's flat between the board 
and the floppy drive, not part of it twisted, you have it plugged in as 
the B floppy, not the A floppy.  Make sure the BIOS setting is the same, 
or simply set both A and B to the appropriate device size and try both 
fd0 and fd1.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
