On 09/15/09 09:08 AM, Brian Ruthven - Sun UK wrote:

I don't believe this behaviour is a bug in itself:

The dual boot scenario where an older version of Solaris
trashes a newer version or vice versa is, IMO, a bug. It
should be possible to boot different versions of Solaris
from different disks without having to reinstall. If it
isn't a bug, it is a highly undesirable feature!

There is a private (i.e. undocumented) flag to bootadm which is called
at reboot time (and init 6, etc...). This flag causes bootadm to update
the archive on all *mounted* boot environments. I believe the normal
case for this is something like mounting up a "broken" BE, fixing it,
then rebooting. If still mounted at reboot time, then the archive will
be checked and resynced if necessary.

You might be able to make a case if the OS versions were the same,
for example, if one were using a putative rescue disk to,
well, rescue an installation. But why do it automatically? Going
through all file systems looking for something that looks like an
rpool and then trashing it doesn't seem like a good idea to me :-).

I too had been surprised to see something like "Creating boot_archive
for /mnt" when rebooting in the past, and I queried it. Above is the
answer I got (paraphrased).

As you see, it is highly reproducible! If the versions aren't too
different, then it is likely harmless. But it is serious, especially
if one version boots from UFS and the other from ZFS. I have not tried
to see if snv122 booted from UFS would trash an snv122 image bootable
from ZFS, but I suspect it would.

I'm not sure what the logic is, but I suspect it starts with the
/etc/mnttab to search for candidate filesystems which might contain
alternate BEs. Thus in your case, I would guess that the x86 image is
rooted at a filesystem mount point? If so, bootadm will try and update
the archive on reboot.
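
To make that guess concrete, here is a minimal Python sketch of what such
a scan could look like. It is not bootadm's actual logic, which is not
shown anywhere in this thread. It assumes the mnttab layout of
tab-separated fields (special, mount point, fstype, options, time) and
assumes that a boot archive, if present, sits at
platform/<arch>/boot_archive under the root of each BE. The function
names are hypothetical.

```python
from pathlib import Path


def parse_mnttab(text):
    """Parse mnttab-style text into a list of dicts.

    Assumes tab-separated fields: special, mount point, fstype, options, time.
    Lines with fewer than three fields are skipped.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        entries.append(
            {"special": fields[0], "mount_point": fields[1], "fstype": fields[2]}
        )
    return entries


def candidate_mounts(entries, fstypes=("ufs", "zfs")):
    """Return mounted filesystems, other than the running root, that could
    hold an alternate BE. The fstype filter is an assumption."""
    return [
        e for e in entries
        if e["fstype"] in fstypes and e["mount_point"] != "/"
    ]


def find_boot_archives(mount_point):
    """List files matching platform/<arch>/boot_archive under a mount point.

    The relative path is an assumed layout, not something stated in the thread.
    """
    return sorted(
        str(p) for p in Path(mount_point).glob("platform/*/boot_archive")
    )
```

Feeding the contents of /etc/mnttab to parse_mnttab() and calling
find_boot_archives() on each candidate mount point before a reboot would
list the archives a scan of this kind could reach.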

If the X86 archive is used as a backup for an actual X86, the SPARC
bootadm should leave it alone. We use these as an easy source to
restore files lost due to fatal checksum errors, lost even on mirrored
drives (the subject of a separate CR).

I believe the underlying premise is that the boot-archive should never
be out of sync with the filesystem it refers to, so reboot and friends

The operative words being "filesystem it refers to". It should be
hands-off for any other file system(s)!

will try to resync this during shutdown, even if it is not the current
BE (after all, it could be the BE you are about to boot, so the archive
better be up to date!).

But this completely ignores the case where the BE you are about to
boot is perfectly OK and just happens not to be the same version
as you are currently running. I daresay that a newer version will
probably trash an older one, but since my older one boots from UFS,
whatever it is that does the trashing doesn't seem to see it. This
is what is so scary. This behavior makes fallback or fallforward
impossible. The workaround seems to be to make sure that all possible
rpools are unmounted before rebooting, but this is prone to error
and IMO is a killer for ZFS adoption. We have resorted to physically
removing the alternate boot disks to avoid this problem.

What I'd suggest is that you add details of the corruption to your bug
(http://defect.opensolaris.org/bz/show_bug.cgi?id=11358). Things like:
What messages you see during shutdown

This is so horribly easy to reproduce I didn't think it necessary. I
have an snv111b rpool in the data pool. When I reboot from snv103, it
updates it at boot. If I reboot from snv122, it updates it at reboot.

What state the boot archive is left in (i.e. is it 0-length, null-filled
file, etc...)
What happens if you take a copy, manually unpack it and mount it? (use
lofiadm + "mount -oro -Fufs" or possibly "-Fhsfs").
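
The first of those checks (0-length versus null-filled) can be done on a
copy of the archive without mounting anything. The following is a
minimal Python sketch under that assumption; the function name and the
four state labels are hypothetical, and the path to the archive copy has
to be supplied by whoever runs it.

```python
import os


def archive_state(path, chunk_size=1 << 20):
    """Classify a file as 'missing', 'empty', 'null-filled', or 'has-data'.

    Reads the file in chunks so that a large archive is not loaded into
    memory all at once.
    """
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Any byte other than NUL means the file holds some data.
            if chunk.count(b"\x00") != len(chunk):
                return "has-data"
    return "null-filled"
```

Only a "has-data" result makes the lofiadm + mount step worth trying;
the other three states already describe the damage.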

This only affects ZFS root pools. However, since it is so reproducible, I
would think that it would be just as easy to reproduce it...

What are the contents of the archive if mounted as above?

Sorry, I have no clue where the archives are. The messages at reboot
simply say that the archives are being updated. They don't say where
or what a boot archive actually is. I could look in the snv111b boot
archive mentioned above if I know where to look...

What's not clear to me is why trashing an x86 boot archive on a SPARC
server is a problem? The SPARC system should never try to boot from it,
so why is this an issue in the first place?

Just used it as a highly visible example. Something is clearly amiss
when the SPARC reboot tries to update an X86 archive, or snv103 trashes
a perfectly good bootable instance of snv122 or vice versa.

Cheers -- Frank



_______________________________________________
indiana-discuss mailing list
http://mail.opensolaris.org/mailman/listinfo/indiana-discuss