severity 650819 serious
tags 650819 + confirmed patch
retitle 650819 GRUB entries (grub.cfg) sometimes lacking other operating
systems, particularly installing 686 or amd64 images (i386)
reassign 650819 os-prober, grub-common
thanks
I have to confirm this. I was hit by this when installing from the March
22 i386 wheezy netinst on my laptop, a typical Intel Core i3 (x86-64)
laptop with Windows 7. Although d-i detected Windows, after the install
Windows was not listed by GRUB.
I reproduced with a later businesscard, and then with a March 27
"flexible way" USB key with an updated netinst. I reproduced this in
about 10-20 installs before precisely understanding when/why it happened.
Thanks Brian for reporting. All the information you reported was
precious in nailing this one. This is indeed an os-prober bug, or at
least a bug of interaction between os-prober and GRUB.
First of all, debian-installer typically calls os-prober 3 times. The
last time is during finish-install (clock-setup) and although it nicely
fills syslog, it is not relevant at all to this problem. The 2 other
times are indeed from grub-installer.
There are 2 os-prober packages, a deb and a udeb. Typically, both are
installed. The deb may however not be installed, when automatic
installation of recommendations is disabled (os-prober is only installed
because it's recommended by grub-common) or when it is not available
(for example, when installing from a netinst without using a mirror).
Typically, grub-installer calls os-prober twice. The first is used
mainly to verify the list of other operating systems detected, before
asking whether GRUB should be installed. The (possible) second time is
when grub-installer calls update-grub (line 845). update-grub's
30_os-prober hook calls os-prober if it is installed.
There is an important difference between these calls. The first, direct,
call to os-prober happens in d-i's context (it uses os-prober-udeb). The
second one happens in-target (it uses the os-prober deb). This problem
comes from this second time. Starting from version 1.45, os-prober's
50mounted-tests attempts to mount partitions using grub-mount, rather
than using mount, if the former is available:
http://packages.qa.debian.org/o/os-prober/news/20110424T183244Z.html
http://anonscm.debian.org/gitweb/?p=d-i/os-prober.git;a=commit;h=7ed9dec4d2c65056f211324f8e25a4d913b0f2a1
mounted=
if which grub-mount >/dev/null 2>&1 && \
grub-mount "$partition" "$tmpmnt" 2>/dev/null; then
mounted=1
type="$(grub-probe -d "$partition" -t fs)"
[ "$type" ] || type=fuseblk
else
ro_partition "$partition"
for type in $types; do
if mount -o ro -t "$type" "$partition" "$tmpmnt" 2>/dev/null; then
mounted=1
break
fi
done
fi
What happens here is that grub-mount fails, but the if's condition still
evaluates to true because grub-mount's exit status is 0, and the code
above assumes 0 means success. From that point, 50mounted-tests
considers the partition mounted, and subtests quietly fail to find anything.
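The failure mode can be reproduced in isolation. The following is only a
minimal sketch, not os-prober code: fake_grub_mount is a hypothetical
stand-in that prints the fuse error on stderr but exits 0, as grub-mount
appears to do here, and the device and mount point are placeholders.

```shell
#!/bin/sh
# Hypothetical stand-in for grub-mount: reports an error but exits 0.
fake_grub_mount() {
	echo "fuse: device not found, try 'modprobe fuse' first" >&2
	return 0
}

mounted=
if fake_grub_mount /dev/sdb1 /tmp/mnt 2>/dev/null; then
	# Reached even though nothing was mounted, because only the
	# exit status is checked and the error message is discarded.
	mounted=1
fi
echo "mounted=$mounted"
```

The branch is taken purely on the exit status, so the discarded error
message never influences the result.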
This issue does not affect the first call to os-prober (which is outside
the target) because which(1) is not available in the installer, so the
condition is false and the tests fall back to the standard mount, which
works. This bug (using which in os-prober-udeb) was fixed in os-prober
1.51:
http://anonscm.debian.org/gitweb/?p=d-i/os-prober.git;a=commit;h=94048e4ec7a8896fb2c9c917433fa5e3ba71fbbe
However, that commit also introduced a check for grub-probe, which is
not in grub-mount-udeb for now, as indicated in the commit message, so
for now there is no functional difference; the first use of os-prober
will keep falling back to the standard mount.
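For illustration, the availability check can be sketched as follows.
have_cmd is a hypothetical helper name, not something from os-prober; it
relies on type, which the shell itself provides in the shells commonly
used here, so it does not depend on an external which(1) binary.

```shell
#!/bin/sh
# Hypothetical helper: true if the shell can find the named command.
have_cmd() {
	type "$1" >/dev/null 2>&1
}

# Both tools must be present before the grub-mount path is tried;
# otherwise the standard mount path is used.
if have_cmd grub-mount && have_cmd grub-probe; then
	echo "grub-mount path"
else
	echo "standard mount path"
fi
```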
Brian's finding about the subtle "fuse init" line was a hint to the
reason why grub-mount fails. grub-mount needs fuse, and fuse is not in
the installer's 486 Linux. Here is what happens:
# grub-mount /dev/sdb1 /var/lib/os-prober/mount
fuse: device not found, try 'modprobe fuse' first
However, fuse is in stock (non-install) Linux images, so when installing
the 486 image, grub-mount succeeds in loading fuse because it is running
in-target and attempts to load the installed Linux's LKM, rather than
failing to find a fuse LKM for the installer Linux. Of course, the
installed Linux's fuse is compatible with the installer Linux's module
ABI when installing the 486 image, but not when installing the 686
image. This is presumably also true on i386 for any non-486 image, such
as amd64; however, the 686 image is on netinsts and offered as a choice.
It should be noted that at this time, the 486 image is more likely to be
installed on 686 machines due to #655437, but this is merely a blessed
misfortune.
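One way to see whether an install is in the affected configuration is to
check whether the target has a module tree for the running (installer)
kernel. This is only a sketch under assumptions: modules_match is a
hypothetical helper, /target is assumed to be where the installed system
is mounted, and an exact match on the release string is used as a rough
proxy for module compatibility.

```shell
#!/bin/sh
# Hypothetical helper: true if ROOT has a module tree for kernel RELEASE.
# RELEASE defaults to the running kernel's release (uname -r).
modules_match() {
	root=$1
	release=${2:-$(uname -r)}
	[ -d "$root/lib/modules/$release" ]
}

if modules_match /target; then
	echo "target has modules for the running kernel"
else
	echo "no matching module tree in target"
fi
```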
I do not know other architectures, but I imagine that this doesn't
affect amd64, as the only image proposed for installation will be amd64,
which matches the installer. So I imagine this problem is largely
specific to i386.
Back to the problem in 50mounted-tests's use of grub-mount, grub-mount's
exit status is unspecified. However, it's clear that it generally
attempts to return non-0 on error, but it doesn't do that in this case.
I did not debug grub-mount, but this is my understanding of the problem
from a cursory code examination.
grub-mount.c:
static grub_err_t
fuse_init (void)
{
int i;
for (i = 0; i < num_disks; i++)
{
char *argv[2];
char *host_file;
char *loop_name;
loop_name = grub_xasprintf ("loop%d", i);
if (!loop_name)
grub_util_error (grub_errmsg);
host_file = grub_xasprintf ("(host)%s", images[i]);
if (!host_file)
grub_util_error (grub_errmsg);
argv[0] = loop_name;
argv[1] = host_file;
if (execute_command ("loopback", 2, argv))
grub_util_error (_("loopback command fails"));
grub_free (loop_name);
grub_free (host_file);
}
grub_lvm_fini ();
grub_mdraid09_fini ();
grub_mdraid1x_fini ();
grub_raid_fini ();
grub_raid_init ();
grub_mdraid09_init ();
grub_mdraid1x_init ();
grub_lvm_init ();
dev = grub_device_open (0);
if (! dev)
return grub_errno;
I believe grub_device_open() is failing, but the if still returns 0.
disk.c:
grub_disk_t
grub_disk_open (const char *name)
{
const char *p;
grub_disk_t disk;
grub_disk_dev_t dev;
char *raw = (char *) name;
grub_uint64_t current_time;
grub_dprintf ("disk", "Opening `%s'...\n", name);
disk = (grub_disk_t) grub_zalloc (sizeof (*disk));
if (! disk)
return 0;
p = find_part_sep (name);
if (p)
{
grub_size_t len = p - name;
raw = grub_malloc (len + 1);
if (! raw)
goto fail;
grub_memcpy (raw, name, len);
raw[len] = '\0';
disk->name = grub_strdup (raw);
}
else
disk->name = grub_strdup (name);
if (! disk->name)
goto fail;
for (dev = grub_disk_dev_list; dev; dev = dev->next)
{
if ((dev->open) (raw, disk) == GRUB_ERR_NONE)
break;
else if (grub_errno == GRUB_ERR_UNKNOWN_DEVICE)
grub_errno = GRUB_ERR_NONE;
else
goto fail;
}
if (! dev)
{
grub_error (GRUB_ERR_UNKNOWN_DEVICE, "no such disk");
goto fail;
}
Oddly, GRUB_ERR_UNKNOWN_DEVICE seems to be defined as 0.
err.h:
typedef enum
{
GRUB_ERR_NONE = 0,
GRUB_ERR_TEST_FAILURE,
GRUB_ERR_BAD_MODULE,
GRUB_ERR_OUT_OF_MEMORY,
GRUB_ERR_BAD_FILE_TYPE,
GRUB_ERR_FILE_NOT_FOUND,
GRUB_ERR_FILE_READ_ERROR,
GRUB_ERR_BAD_FILENAME,
GRUB_ERR_UNKNOWN_FS,
GRUB_ERR_BAD_FS,
GRUB_ERR_BAD_NUMBER,
GRUB_ERR_OUT_OF_RANGE,
GRUB_ERR_UNKNOWN_DEVICE,
GRUB_ERR_BAD_DEVICE,
GRUB_ERR_READ_ERROR,
GRUB_ERR_WRITE_ERROR,
GRUB_ERR_UNKNOWN_COMMAND,
GRUB_ERR_INVALID_COMMAND,
GRUB_ERR_BAD_ARGUMENT,
GRUB_ERR_BAD_PART_TABLE,
GRUB_ERR_UNKNOWN_OS,
GRUB_ERR_BAD_OS,
GRUB_ERR_NO_KERNEL,
GRUB_ERR_BAD_FONT,
GRUB_ERR_NOT_IMPLEMENTED_YET,
GRUB_ERR_SYMLINK_LOOP,
GRUB_ERR_BAD_COMPRESSED_DATA,
GRUB_ERR_MENU,
GRUB_ERR_TIMEOUT,
GRUB_ERR_IO,
GRUB_ERR_ACCESS_DENIED,
GRUB_ERR_EXTRACTOR,
GRUB_ERR_BUG
}
grub_err_t;
This I don't understand, if it's intentional.
So I see 4 ways to fix/workaround this:
* Add fuse to the installer's Linux image(s), or add a fuse modules udeb
* Always use traditional mount instead of grub-mount
* Make grub-mount return non-0 on failure
* Check grub-mount's output instead of just checking its exit status.
I used the last approach, changing both 50mounted-tests from
if type grub-mount >/dev/null 2>&1 && \
type grub-probe >/dev/null 2>&1 && \
grub-mount "$partition" "$tmpmnt" 2>/dev/null; then
to
if type grub-mount >/dev/null 2>&1 && \
type grub-probe >/dev/null 2>&1 && \
[ -z "$(grub-mount "$partition" "$tmpmnt" 2>&1)" ]; then
I verified that this works around the problem. Note that this assumes that
grub-mount will write to stderr or stdout if and only if it fails.
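Since that assumption may not always hold, a stricter variant could
require both a zero exit status and empty output. A sketch, in which
quiet_success and the three stub commands are hypothetical and not part
of os-prober:

```shell
#!/bin/sh
# Hypothetical helper: run a command and succeed only if it exits 0
# AND prints nothing on stdout or stderr.
quiet_success() {
	output=$("$@" 2>&1) || return 1
	[ -z "$output" ]
}

# Stubs standing in for the possible behaviours of a mount command.
noisy_ok()    { echo "fuse: device not found" >&2; return 0; }
silent_ok()   { return 0; }
silent_fail() { return 1; }

for cmd in noisy_ok silent_ok silent_fail; do
	if quiet_success "$cmd"; then
		echo "$cmd: treated as success"
	else
		echo "$cmd: treated as failure"
	fi
done
```

In 50mounted-tests this would be used as
quiet_success grub-mount "$partition" "$tmpmnt" in place of the bare
grub-mount call.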
umount's failure towards the end of 50mounted-tests ("warning: failed to
umount /var/lib/os-prober/mount") is therefore an indication of the
problem, but not its cause. It would greatly help to avoid problems of
this kind to give the reason for this failure:
REASON=$(umount "$tmpmnt" 2>&1)
if ! [ $? = "0" ]; then
warn "failed to umount $tmpmnt ; $REASON"
fi
It also wouldn't hurt to warn when the partition wasn't mounted. These
changes would give something like this for the general 50mounted-tests:
if [ "$mounted" ]; then
for test in /usr/lib/os-probes/mounted/*; do
debug "running subtest $test"
if [ -f "$test" ] && [ -x "$test" ]; then
if "$test" "$partition" "$tmpmnt" "$type"; then
debug "os found by subtest $test"
if ! umount "$tmpmnt"; then
warn "failed to umount $tmpmnt"
fi
rmdir "$tmpmnt" || true
exit 0
fi
fi
done
REASON=$(umount "$tmpmnt" 2>&1)
if ! [ $? = "0" ]; then
warn "failed to umount $tmpmnt ; $REASON"
fi
else
warn "mounted-tests: $partition not mounted"
fi
To clarify, this will not happen when os-prober is not installed to the
target. In that case, grub-installer hacks a static 30_otheros file by
using the output of the first call to os-prober (from the udeb). This
appears to work fine, so the problem will not be visible in this rare
case, as grub.cfg will end up containing the necessary entries.
Unfortunately, it is currently only the second call that fails (the
first is spared by the which / grub-probe issue(s) explained above),
which causes the installed system to lack the entries even though
grub-installer said they were detected in its prompt, hence the serious
severity.