Found two disks with the same number 0?!?
On my raid1-using system, I get the following error at boot:

error: Found two disks with the number 0?!?

Robert Millan suggested I apply a patch to print out the two disks with this problem; they are (hd1,2) and (hd3,2). If I comment out this check then I can boot normally. Robert thinks GRUB is being too conservative here; perhaps this check should be removed?

Index: disk/raid.c
===
--- disk/raid.c (revision 1691)
+++ disk/raid.c (working copy)
@@ -440,18 +440,6 @@
           return 0;
         }
-
-      if (array->device[sb.this_disk.number] != NULL)
-        {
-          /* We found multiple devices with the same number. Again,
-             this shouldn't happen.*/
-
-          grub_error (GRUB_ERR_BAD_NUMBER,
-                      "Found two disks with the number %d?!?",
-                      sb.this_disk.number);
-
-          return 0;
-        }
     }
 
   /* Add an array to the list if we didn't find any. */

-- 
Sam Morris
http://robots.org.uk/
PGP key id 1024D/5EA01078
3412 EA18 1277 354B 991B C869 B219 7FDB 5EA0 1078

___
Grub-devel mailing list
Grub-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/grub-devel
Re: [PATCH] tell the user why they are in rescue mode
On Fri, 08 Feb 2008 20:15:39 +0100, Robert Millan wrote:
> On Fri, Feb 08, 2008 at 06:44:26PM +, Sam Morris wrote:
>> While that works, the output looks like this:
>>
>> Entering rescue mode...
>> error: out of disk
>> _
>>
>> Which does not present the information in the clearest possible way: it looks like the error happened in the rescue mode code, instead of the normal mode code. Also, the error message could be lost if any of the code between the start of grub_enter_rescue_mode and its call to grub_print_error itself triggers an error.
>>
>> For these reasons I think it's better to call grub_print_error from within grub_load_normal.
>
> Agreed. See attached patch. Does this work as expected?

Yes, it works fine.
Re: [PATCH] tell the user why they are in rescue mode
On Fri, 08 Feb 2008 02:10:29 +0100, Robert Millan wrote:
> Hi Sam,
>
> On Thu, Feb 07, 2008 at 06:34:26PM +, Sam Morris wrote:
>> diff -ru grub2-1.96+20080203+orig/kern/main.c grub2-1.96+20080203+sam/kern/main.c
>> --- grub2-1.96+20080203+orig/kern/main.c 2008-01-05 12:04:35.0 +
>> +++ grub2-1.96+20080203+sam/kern/main.c 2008-02-07 08:41:01.0 +
>> @@ -102,8 +102,13 @@
>>    /* Load the module. */
>>    grub_dl_load ("normal");
>>  
>> -  /* Ignore any error, because we have the rescue mode anyway. */
>> -  grub_errno = GRUB_ERR_NONE;
>> +  if (grub_errno != GRUB_ERR_NONE)
>> +    {
>> +      grub_printf ("Unable to enter 'normal' mode (error %d: %s)\n", grub_errno, grub_errmsg);
>> +
>> +      /* We're about to continue into rescue mode, so clear the error. */
>> +      grub_errno = GRUB_ERR_NONE;
>> +    }
>
> I just checked, and it seems we already have a function for this: grub_print_error(). If you just invoke this function, do you get the desired result?

Yes, it works fine.

> Also, I wonder why the existing grub_print_error() call in the rescue loop doesn't handle this already (in kern/rescue.c). Perhaps all you need to do is remove these two lines?
>
> diff -ur grub2/kern/main.c tmp/kern/main.c
> --- grub2/kern/main.c 2008-01-05 13:04:35.0 +0100
> +++ tmp/kern/main.c 2008-02-08 02:09:03.0 +0100
> @@ -101,9 +101,6 @@
>  {
>    /* Load the module. */
>    grub_dl_load ("normal");
> -
> -  /* Ignore any error, because we have the rescue mode anyway. */
> -  grub_errno = GRUB_ERR_NONE;
>  }
>  
>  /* The main routine. */

While that works, the output looks like this:

Entering rescue mode...
error: out of disk
_

Which does not present the information in the clearest possible way: it looks like the error happened in the rescue mode code, instead of the normal mode code. Also, the error message could be lost if any of the code between the start of grub_enter_rescue_mode and its call to grub_print_error itself triggers an error.

For these reasons I think it's better to call grub_print_error from within grub_load_normal.

BTW, does anyone know why the first thing grub_enter_rescue_mode does is call attempt_normal_mode?
Re: [PATCH] fix for partmap detection on RAID/LVM
, fragment_size = 0, fs_type = 0 '\0', fs_fragments = 0 '\0', fs_cylinders = 0},
    {size = 134656064, offset = 134625056, fragment_size = 3216513864, fs_type = 255 'ÿ', fs_fragments = 225 'á', fs_cylinders = 2053},
    {size = 128, offset = 131, fragment_size = 0, fs_type = 0 '\0', fs_fragments = 0 '\0', fs_cylinders = 0},
    {size = 134625820, offset = 3216514036, fragment_size = 3216513880, fs_type = 158 '\236', fs_fragments = 227 'ã', fs_cylinders = 2053},
    {size = 0, offset = 0, fragment_size = 3216513928, fs_type = 131 '\203', fs_fragments = 220 'Ü', fs_cylinders = 2052},
    {size = 134615777, offset = 134625820, fragment_size = 0, fs_type = 255 'ÿ', fs_fragments = 255 'ÿ', fs_cylinders = 65535}}}
        raw = {name = 0x806b040 "hd3", dev = 0x8063720, total_sectors = 586114704, has_partitions = 128, id = 131, partition = 0x0, read_hook = 0, data = 0x0}
#3  0x0804e1bf in grub_partition_iterate (hook=0xbfb81c9e) at kern/partition.c:126
        ret = <value optimized out>
        partmap = (grub_partition_map_t) 0x8063a1c
        disk = (struct grub_disk *) 0x806b008
#4  0x0804b812 in iterate_disk (disk_name=0xbfb81c42 "hd3") at kern/device.c:101
        dev = (grub_device_t) 0x806b030
        hook = (int (*)(const char *)) 0x80601c0 <grub_raid_scan_device>
#5  0x08049810 in grub_util_biosdisk_iterate (hook=0xbfb81c94) at util/biosdisk.c:131
        i = 131
#6  0x0804b914 in grub_disk_dev_iterate (hook=0xbfb81c94) at kern/disk.c:205
        p = (grub_disk_dev_t) 0x8063720
#7  0x0804b614 in grub_device_iterate (hook=0x80601c0 <grub_raid_scan_device>) at kern/device.c:138
No locals.
#8  0x0805f9e2 in grub_mod_init (mod=0x0) at disk/raid.c:563
No locals.
#9  0x0805fa02 in grub_raid_init () at disk/raid.c:561
No locals.
#10 0x0804927c in main (argc=Cannot access memory at address 0xa9fe4bb5
) at util/grub-probe.c:338
        c = <value optimized out>
        dev_map = 0x0
        path = 0xbfb83bd7 "/boot/grub/"

I'll debug this further later if you don't know why it happened.
Re: [PATCH] fix for partmap detection on RAID/LVM
On Fri, 08 Feb 2008 20:38:56 +0100, Robert Millan wrote:
> On Fri, Feb 08, 2008 at 07:20:31PM +, Sam Morris wrote:
>> On Fri, 2008-02-08 at 15:52 +0100, Robert Millan wrote:
>>> New patch to fix partmap detection in LVM/RAID. Changes in comparison to previous patch:
>>
>> (gdb) run -t partmap /boot/grub/
>> Starting program: /home/sam/grub/grub2/grub-probe -t partmap /boot/grub/
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x0806035a in grub_raid_scan_device (name=0x806b080 "hd3,2") at disk/raid.c:442
>> 442           if (array->device[sb.this_disk.number]->name != 0)
>
> I didn't touch this function. I assume this was introduced with my previous commit that redefined this structure. .name used to be initialized altogether with .disk, so checking for .name initialization amounts to checking for .disk initialization, which is what we still have (but with a different name). So:
>
> diff -x configure -x config.h.in -x CVS -x '*~' -x '*.mk' -urp -N ../grub2/disk/raid.c ./disk/raid.c
> --- ../grub2/disk/raid.c 2008-02-08 13:35:05.0 +0100
> +++ ./disk/raid.c 2008-02-08 20:36:47.0 +0100
> @@ -419,7 +419,7 @@ grub_raid_scan_device (const char *name)
>            return 0;
>          }
>  
> -      if (array->device[sb.this_disk.number]->name != 0)
> +      if (array->device[sb.this_disk.number] != NULL)
>          {
>            /* We found multiple devices with the same number. Again,
>               this shouldn't happen.*/
>
> does this work?

Looks like it!

$ sudo ./grub-probe -t partmap /boot/grub/
pc pc
[PATCH] tell the user why they are in rescue mode
Here's a patch that explains to the user why they are being sent to rescue mode in case normal mode fails. I've also corrected a grammatical error in the 'entering rescue mode' message.

diff -ru grub2-1.96+20080203+orig/kern/main.c grub2-1.96+20080203+sam/kern/main.c
--- grub2-1.96+20080203+orig/kern/main.c 2008-01-05 12:04:35.0 +
+++ grub2-1.96+20080203+sam/kern/main.c 2008-02-07 08:41:01.0 +
@@ -102,8 +102,13 @@
   /* Load the module. */
   grub_dl_load ("normal");
 
-  /* Ignore any error, because we have the rescue mode anyway. */
-  grub_errno = GRUB_ERR_NONE;
+  if (grub_errno != GRUB_ERR_NONE)
+    {
+      grub_printf ("Unable to enter 'normal' mode (error %d: %s)\n", grub_errno, grub_errmsg);
+
+      /* We're about to continue into rescue mode, so clear the error. */
+      grub_errno = GRUB_ERR_NONE;
+    }
 }
 
 /* The main routine. */
diff -ru grub2-1.96+20080203+orig/kern/rescue.c grub2-1.96+20080203+sam/kern/rescue.c
--- grub2-1.96+20080203+orig/kern/rescue.c 2008-01-30 14:42:09.0 +
+++ grub2-1.96+20080203+sam/kern/rescue.c 2008-02-07 08:38:46.0 +
@@ -618,7 +618,7 @@
   /* First of all, attempt to execute the normal mode. */
   attempt_normal_mode ();
 
-  grub_printf ("Entering into rescue mode...\n");
+  grub_printf ("Entering rescue mode...\n");
 
   grub_rescue_register_command ("boot", grub_rescue_cmd_boot,
                                 "boot an operating system");
Re: grub2 and Linux software RAID devices
On Tue, 05 Feb 2008 10:35:19 +0100, Robert Millan wrote:
> On Tue, Feb 05, 2008 at 01:32:46AM +, Sam Morris wrote:
>> Version 1.96+20080203-1 behaves differently; I am simply dropped to the rescue shell, and I can't load any modules (because root is set to (hd1,2) which does not show up when I run 'ls'). I guessed that this is because 'pc' is not in the list of modules included when grub-install creates core.img.
>
> Did you make sure 'pc' is loaded before 'raid' ? This was required, at least for LVM.

Yes, the exact command I ran was: 'grub-mkimage --output=/boot/grub/core.img --prefix=/boot/grub ext2 pc biosdisk raid _chain'.

>> If I add 'pc' to core.img then things are much healthier: I am still dropped into the rescue shell, but I can load the 'normal' module, issue the 'normal' command and then I get the (pretty!) menu and am able to boot normally.
>>
>> I'm still not sure why I'm dropped into the rescue shell in the first place, however. There's a flash of text before the prompt appears, but the screen is cleared much too fast to read any of it.
>
> Please can you take a picture of what you see right after the text is cleared?

Ok, I discovered that I can use the Pause key to read what's on the screen. Unfortunately, it's not very useful... it's just the text printed by grub (legacy) as it loads the core.img file!

Booting 'Chainload into GRUB 2'

root (hd1,1)
 Filesystem type is ext2fs, partition type 0xfd
kernel /boot/grub/core.img
   [Multiboot-kludge, loadaddr=0x10, text-and-data=0x7676, bss=0x0, entry=0x1002903]
savedefault

Then the screen blanks, and the next thing printed is:

Welcome to GRUB!

Entering into rescue mode...
grub rescue>

Whereupon I can enter 'insmod normal' and then 'normal' to get to the regular boot menu.
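
The ordering requirement mentioned above ('pc' loaded before 'raid') can be checked before core.img is built. What follows is only a sketch: module_precedes is a made-up helper, not part of grub-mkimage or any other GRUB tool, and it assumes the module list is passed as separate words in load order.

```shell
#!/bin/sh
# module_precedes FIRST SECOND MODULE...
# Succeeds if FIRST appears in the module list before SECOND.
# (Hypothetical helper for illustration; not part of GRUB.)
module_precedes() {
    mp_first=$1
    mp_second=$2
    shift 2
    for mp_mod in "$@"; do
        if [ "$mp_mod" = "$mp_first" ]; then
            return 0    # FIRST was seen before SECOND
        fi
        if [ "$mp_mod" = "$mp_second" ]; then
            return 1    # SECOND came first
        fi
    done
    return 1            # FIRST is not in the list at all
}

# The module list from the grub-mkimage command quoted above.
if module_precedes pc raid ext2 pc biosdisk raid _chain; then
    echo "pc is loaded before raid"
else
    echo "pc is missing or comes after raid"
fi
```

A wrapper script could run this check and refuse to call grub-mkimage when it fails.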
Re: grub2 and Linux software RAID devices
On Mon, 04 Feb 2008 23:43:38 +0100, Robert Millan wrote:
> On Mon, Feb 04, 2008 at 09:36:45PM +, Sam Morris wrote:
>> Hi there,
>>
>> A while ago, I tried grub2 on my Debian system, which has my root filesystem on a Linux software RAID-1 array. I ran into some problems, and while they were raised here, nothing really came of them. Robert Millan suggested I post my problem again to see if anything can be done to fix it.
>>
>> I think my problems stem from the Promise IDE controller that my second disk is connected to. It does not support 48-bit LBA addressing, and so any attempt to read the end of the disk using BIOS calls will fail. Of course, once an operating system has loaded its own driver for the controller, the disk can be read correctly.
>>
>> Here's what the two disks look like:
>>
>> Model: Maxtor 6L300R0 (ide)
>> Disk /dev/hdb: 300GB
>> Sector size (logical/physical): 512B/512B
>> Partition Table: msdos
>>
>> Number  Start   End    Size   Type     File system  Flags
>>  1      32.3kB  543MB  543MB  primary
>>  2      543MB   300GB  300GB  primary  ext3         raid
>>
>> The first partition is swap, the second is the root filesystem.
>>
>> The first problem is the operation of the grub-probe partition. Debian's post-install script runs the following command to determine which modules to include in the generated core.img file:
>>
>> grub-probe --target=partmap --device-map=/boot/grub/device.map /boot/grub
>>
>> Which fails with the error:
>>
>> grub-probe: error: Cannot detect partition map for md0
>>
>> It appears that grub-probe expects to find a partition table inside the RAID device, when of course, it is really in its containing device, /dev/hdb.
>
> This is a known problem, and I roughly have a solution in mind, but I haven't been able to reproduce it. When I try to install Debian with /boot inside an LVM, the installer hangs. This option doesn't seem to be supported at all.

Ouch... I have never tried /boot on LVM myself (since grub legacy can't handle it). However /boot on RAID works fine, I suggest you try that instead.

> Furthermore, how do you boot that system with GRUB Legacy?

As far as grub1 is concerned, /dev/hdb2 is a normal partition containing an ext3 filesystem.

> If you give me some details on how to reproduce the scheme in which /boot is behind a lvm/raid abstraction, I could try to get this fixed.

It's pretty simple, assuming you are using d-i. When partitioning, configure two disks with identical partition layouts (a single partition on each is sufficient). Then, tell partman that you want to use them as 'physical volumes for RAID'. A new option should appear, 'configure RAID' (or something similar). Here you can create a RAID1 array using both the partitions, which you can use as an ext3 filesystem, mounted at /.

>> Debian's post-install script has actually been written to substitute 'pc gpt' if the partmap probing fails,
>
> Actually, this was reverted a while ago.
>
>> manually. However, I now hit the second problem: the menu that grub presents has no text! It seems to have an entry, however, as there is a highlighted line.
>
> Which version did you try? Is it more recent than 1.96 ? We fixed bugs producing this result recently.

Damn, I installed the version from testing by mistake. I will try again with version 1.96+20080203-1.

>> Jeroen Dekkers previously suggested a patch to suppress the 'out of disk' error, at http://www.mail-archive.com/grub-devel@gnu.org/msg02873.html but no one ever committed it.
>
> Ah, I see. The patch looks correct to me; only the description you gave before isn't.

Hmm, I don't understand the difference between your changelog entry and my description, but ok. :)
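
Since the partition table lives on the containing disk rather than inside md0, a probing tool first has to map the array back to its member devices. On Linux, the md driver lists the members as entries under /sys/block/<array>/slaves. The sketch below assumes sysfs is mounted; md_members and its optional sysfs-root argument are hypothetical names, and this is not a description of how grub-probe itself works.

```shell
#!/bin/sh
# md_members ARRAY [SYSFS_ROOT]
# Prints the member devices of an md array, one per line, by listing
# SYSFS_ROOT/block/ARRAY/slaves (SYSFS_ROOT defaults to /sys).
# Returns non-zero if the array has no slaves directory.
# (Hypothetical helper for illustration; not part of GRUB.)
md_members() {
    mm_dir="${2:-/sys}/block/$1/slaves"
    [ -d "$mm_dir" ] || return 1
    for mm_entry in "$mm_dir"/*; do
        # An empty directory leaves the glob unexpanded; skip that case.
        [ -e "$mm_entry" ] || continue
        basename "$mm_entry"
    done
}
```

For example, `md_members md0` would list the partitions that make up md0; the partition map could then be probed on the disk containing one of those partitions instead of on md0 itself.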
Re: grub2 and Linux software RAID devices
On Mon, 04 Feb 2008 23:43:38 +0100, Robert Millan wrote:
>> Debian's post-install script has actually been written to substitute 'pc gpt' if the partmap probing fails,
>
> Actually, this was reverted a while ago.
>
>> manually. However, I now hit the second problem: the menu that grub presents has no text! It seems to have an entry, however, as there is a highlighted line.
>
> Which version did you try? Is it more recent than 1.96 ? We fixed bugs producing this result recently.

This now works fine (and looks nice and pretty!) with version 1.96+20080203-1. :)

>> The third problem is that if I press enter, or wait for the timeout to finish, the screen blanks, and I get the message Booting '. The system then freezes and I have to use the hardware reset switch to continue (ctrl+alt+del does not work). Pressing 'e', or the up or down keys also freezes the system, without the Booting ' message.
>
> Where does GRUB get its grub.cfg from? What are its contents? (from the POV of GRUB; use 'cat' to determine).

Version 1.96+20080203-1 behaves differently; I am simply dropped to the rescue shell, and I can't load any modules (because root is set to (hd1,2) which does not show up when I run 'ls'). I guessed that this is because 'pc' is not in the list of modules included when grub-install creates core.img.

If I add 'pc' to core.img then things are much healthier: I am still dropped into the rescue shell, but I can load the 'normal' module, issue the 'normal' command and then I get the (pretty!) menu and am able to boot normally.

I'm still not sure why I'm dropped into the rescue shell in the first place, however. There's a flash of text before the prompt appears, but the screen is cleared much too fast to read any of it.

ISTR that when I tried Jeroen Dekkers' patch before, I was no longer booted into the rescue shell; but I can't remember (and I don't understand why, if I was kicked into the rescue shell because grub tried to read past the 'end' of my disk, I can immediately issue the 'normal' command to go to the menu and have everything working).

Anyway, I'll wait to see how things improve after a new grub-pc is uploaded. Thanks for your help and advice so far!
Re: Moving to another SCM
>>> https://savannah.gnu.org/maintenance/Git
>>>
>>> It will depend on when savannah starts offering it when we can switch.
>>
>> Support for git is already up and running: http://git.savannah.gnu.org/gitweb/. I know from the GNU coreutils and gnulib people that they've already been using it successfully for a few months. (``Successfully'' here means that they didn't regret to have switched to git.)
>
> This sounds interesting. Personally I have no objections to a switch. Will the history be preserved, do we still have a commits mailinglist, is git easy to learn and how do other developers (especially Okuji) think about this?

I have been using the following script to maintain a mirror of Grub's CVS repository. The first time it's run it creates a git repository in 'upstream'; subsequent runs will import any new commits since the last run.

#!/bin/sh
set -e
export CVSROOT=:pserver:[EMAIL PROTECTED]:/sources/grub
git-cvsimport -k -v -i -C upstream grub2

The full history, including branches, is imported. The 'upstream' directory ends up being 6.3 MiB, which includes a checkout of the 'origin' branch.
Re: Bug#423022: Bug#422851: grub-probe -t partmap doesn't work with software RAID
On Mon, 2007-05-21 at 13:08 +0200, Jeroen Dekkers wrote:
> At Sat, 19 May 2007 15:13:58 +0100, Sam Morris wrote:
>> In addition, it would be nice if the 'out of disk' error could be deferred until grub actually tries to read a block that is out of range, as grub-legacy does
>
> The problem is that it actually tries to do that, because the RAID superblock is located at the end of the partition.

Oh, good point. My only remaining point of confusion is how I am able to access the (md0) device fine from within grub, since it can't read the superblocks. Does it just assume that any pc partitions of type 0xfd with unreadable superblocks are part of a RAID 1 array, or is it possible that something else is going on?

My array is made up of partitions on two disks; the first is the primary master on the motherboard's ATA controller, and the second is on a Promise PCI card. Now, AFAIK the Promise card cannot do 48-bit LBA addressing without a BIOS flash that I never applied. But is it possible that my motherboard's controller is able to do 48-bit addressing?

If this were the case it would explain how grub is able to access an (md0) device (via the fully-readable (hd0,2) device), and also where the 'out of disk' error comes from (from trying to read the superblock of (hd3,2)).

If this is the case, it would be nice if the raid module would only throw a warning if some of the component devices could not be added to a RAID1 array.
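
The arithmetic behind this explanation can be sketched as follows. Two assumptions are made that the thread does not state outright: that a controller without 48-bit LBA is limited to 28-bit LBA (sectors 0 to 2^28 - 1, roughly 137 GB with 512-byte sectors), and that the array uses version 0.90 md metadata, whose superblock sits in the last 64 KiB-aligned 64 KiB block of the device. The function names and the sector counts are invented for illustration.

```shell
#!/bin/sh
# Assumption: without 48-bit LBA only sectors 0 .. 2^28 - 1 are addressable.
LBA28_SECTORS=268435456

# md090_sb_offset SECTORS
# Offset (in 512-byte sectors) of a version 0.90 md superblock inside a
# device of SECTORS sectors: round down to a multiple of 128 sectors
# (64 KiB), then step back one 128-sector block.
md090_sb_offset() {
    echo $(( ($1 & ~127) - 128 ))
}

# sb_reachable_lba28 PART_START PART_SECTORS
# Prints "yes" if the first sector of the superblock lies below the
# 28-bit LBA limit, otherwise "no".
sb_reachable_lba28() {
    sb_sector=$(( $1 + $(md090_sb_offset "$2") ))
    if [ "$sb_sector" -lt "$LBA28_SECTORS" ]; then
        echo yes
    else
        echo no
    fi
}

# Hypothetical layouts (sector counts invented for illustration).
sb_reachable_lba28 63 1000000          # small partition near the start of the disk
sb_reachable_lba28 1060290 585054414   # partition running to the end of a ~300GB disk
```

With these made-up numbers the first call prints "yes" and the second prints "no": on a disk of that size the superblock of a partition reaching the end of the disk lies well past the 28-bit limit, which would be consistent with the 'out of disk' error.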
Re: fallback for grub-probe -t partmap failures
On Sat, 2007-05-19 at 17:32 +0200, Robert Millan wrote:
> (please keep CC to upstream..)
>
> So, do we put this in upstream grub CVS, or do we hold it off until a better fix is available ?

I think falling back to 'pc raid gpt' is ok. Perhaps if the fallback is used, a warning could be displayed prompting the user to set GRUB_INSTALL_partmap_module in /etc/default/grub (the presence of which prevents grub-probe -t partmap from being run at all).

Of course there is the question of how to handle lvm (assuming it has this same problem). I guess it could be included in the fallback list, but I am unfamiliar with the drawbacks of increasing the size of the generated core.img. I guess that as its size increases, the likelihood that it becomes too fragmented for all its blocks to be listed in the MBR increases?

Maybe instead grub-install could be made more intelligent; if grub-probe -t partmap fails then fall back to 'pc gpt' and also examine the device's sysfs entry; if it's /sys/block/md[0-9]+ then add 'raid', if it's /sys/block/dm-[0-9]+ then add 'lvm'.

As an aside, /sys/block/dm-[0-9] does not automatically mean LVM... it just means that the block device is a device-mapper device, so it could be dm-crypt, or using dmraid... perhaps some other check for LVM should happen before grub-install goes to /sys.

Come to think of it, the initramfs generators, initramfs-tools and yaird, have this same problem. I wonder if their detection code can be re-used...

Anyway, the sysfs path of the block device can be obtained fairly simply once you have the major/minor device numbers with code like this:

find /sys -name dev | while read line; do if test "$(cat "$line")" = '9:0'; then echo "$(dirname "$line")"; break; fi; done

Perhaps libsysfs even has a function to do this. If sysfs is not available then grub-install can directly poke around /dev, /proc/mounts and so on.
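
The fallback proposed above could look roughly like this. It is a sketch of the idea only, not code from grub-install: fallback_partmap_modules is a hypothetical name, the argument is assumed to be the device's directory name under /sys/block, and, as noted in the message, a dm-* name does not prove that the device is LVM.

```shell
#!/bin/sh
# fallback_partmap_modules SYSFS_NAME
# SYSFS_NAME is the device's directory name under /sys/block
# (for example md0, dm-3 or hda). Prints the module list to use when
# 'grub-probe -t partmap' has failed.
# (Hypothetical helper for illustration; not part of grub-install.)
fallback_partmap_modules() {
    fpm_modules="pc gpt"
    # The case patterns are shell globs, so they only approximate the
    # md[0-9]+ and dm-[0-9]+ expressions used in the message.
    case $1 in
        md[0-9]*)  fpm_modules="$fpm_modules raid" ;;
        dm-[0-9]*) fpm_modules="$fpm_modules lvm" ;;   # may be dm-crypt or dmraid instead
    esac
    echo "$fpm_modules"
}

fallback_partmap_modules md0
```

Keeping 'pc' at the front of the list means the partition map module still comes before 'raid' or 'lvm' in the generated list.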