George,
George Wilson wrote:
> Jan,
>
> It seems like the problem is not with ZFS but with the device driver.
> If the driver is failing to provide the devid then ZFS is just going
> to be a victim.

I agree that this is what we might be encountering with respect to the
'devid' problem here.

> I would recommend that we change the synopsis to devid_get() fails
> with "Invalid argument" and pass this to the driver folks.

I will let Sanjay comment on this, since he has done some more
investigation recently.

> Do you know if it's always the same driver?

I can only reproduce it on one system, which has a SATA drive connected
to a controller handled by the nv_sata(7D) driver. I think Sanjay also
encountered the problem on a system with a SATA disk.

Thank you,
Jan

>
> Thanks,
> George
>
> jan damborsky wrote:
>> Hi George,
>>
>>
>> George Wilson wrote:
>>> Jan,
>>>
>>> So who is working the UFS issue and how is that being tracked.
>>
>> In general, bugs in the OpenSolaris Caiman installer are tracked in
>> Bugzilla at defect.opensolaris.org. This is preferred over filing
>> bugs in Bugster.
>> Speaking about this particular problem, it is tracked by the
>> following bug:
>>
>> 4675 Fix for bug 30 causes ZFS label to be mangled - ending up in
>> GRUB prompt after installing OpenSolaris
>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4675
>>
>> Sanjay Nadkarni is assigned to this bug (CCing him).
>>
>>> I would recommend that we keep this bug as the UFS/install issue and
>>> create a new bug and send that to me.
>>
>> As pointed out above, Bugzilla is the preferred database for
>> tracking issues in the Caiman installer.
>>
>> Please note that 6769487 was originally filed for tracking the
>> problem where GRUB can't access the ZFS filesystem because 'devid'
>> is not present in the ZFS label.
>>
>> It was later overloaded by the 'UFS' problem.
>>
>>> Can you move the descriptions below from this bug and add them to
>>> the new one?
>>
>> To be honest, since the installer part of the problem related to UFS
>> is tracked by 4675, I don't see why we shouldn't continue to use
>> 6769487 to track the issue this bug was initially filed for, and I
>> think that we might lose some context when ZFS related information
>> is moved from 6769487 to the new bug.
>> That said, if you think it might be helpful, please let me know and
>> I will try to capture all information from 6769487 that I think is
>> relevant to the ZFS part in a new bug.
>>
>>> Also since you can reproduce this can you tell me exactly how or
>>> point me at a system which I can login into to debug?
>>
>> Sure, the machine can be accessed via 'ssh', but since it is not
>> directly accessible from SWAN (it is behind the NAT),
>> I will provide you with instructions on how to access it.
>> Unfortunately it doesn't have console access.
>>
>> Please let me know in which state you would need to have that
>> machine - right after the installation has finished, but before
>> reboot?
>>
>> Unfortunately, following the procedure itself doesn't seem to be
>> sufficient for reproducing the problem :-( I tried exactly the
>> same steps on other bare metal as well as in a virtual environment,
>> but without success.
>>
>>
>>>
>>> I want to make sure we don't lose sight of the UFS issue and this
>>> bug has already gone down to root cause so let's not overload this
>>> bug any further.
>>
>> The UFS part of the problem is being solved right now (please feel
>> free to monitor bug 4675 for progress and add anything you might
>> consider relevant to that issue).
>>
>> Thank you,
>> Jan
>>
>>>
>>> Thanks,
>>> George
>>>
>>> jan damborsky wrote:
>>>> Hi George,
>>>>
>>>> there are at least two parts of this problem:
>>>>
>>>> [1] UFS one
>>>> This is what you are referring to and it is being tracked by
>>>> Bugzilla bug 4675.
>>>> In that case workaround #2 helps to "solve" the problem.
>>>>
>>>> [2] ZFS one
>>>> Please see original description #1.
>>>> I am able to reproduce that at will on a system which didn't
>>>> contain any UFS filesystem, and thus [1] is not applicable here.
>>>> 'zpool import' helps in this case.
>>>>
>>>> Also please see:
>>>> * description #4
>>>> * description #5
>>>> * public comments #8
>>>> * comments #6
>>>>
>>>> People are apparently encountering this problem in
>>>> other configurations (e.g. when using a virgin disk
>>>> or installing on a system containing only Windows).
>>>>
>>>> I am not stating that this is in fact a problem in ZFS, as it might
>>>> be related for example to device driver code, but at this point it
>>>> seems to me that the ZFS team is the most eligible one to move
>>>> things forward, as GRUB can't read menu.lst from the ZFS
>>>> filesystem.
>>>>
>>>> Please let me know if you have any questions or need more
>>>> information.
>>>>
>>>> Thank you,
>>>> Jan
>>>>
>>>>
>>>> George Wilson wrote:
>>>>> Jan,
>>>>>
>>>>> I don't understand how this is a ZFS problem. I thought from the
>>>>> evaluation that the issue is that UFS and ZFS are sharing the same
>>>>> block and this was being caused by the fact that the livecd had
>>>>> mounted a UFS filesystem as part of the installation. Could you
>>>>> clarify?
>>>>>
>>>>> Thanks,
>>>>> George
>>>>>
>>>>> Jan.Damborsky at Sun.COM wrote:
>>>>>> Sun Confidential: Internal only
>>>>>>
>>>>>> *Synopsis*: Ended up in 'grub>' prompt after installation of
>>>>>> OpenSolaris 2008.11 (build 101a)
>>>>>>
>>>>>> CrPrint: http://bt2ws.central.sun.com/CrPrint?id=6769487
>>>>>> Monaco: http://monaco.sfbay.sun.com/detail.jsf?cr=6769487
>>>>>>
>>>>>> Due to a change of Responsible manager requested by
>>>>>> jan.damborsky at sun.com,
>>>>>> david.brittle at sun.com is now the responsible manager for:
>>>>>>
>>>>>> Due to a change requested by jan.damborsky at sun.com,
>>>>>> this CR is being redispatched:
>>>>>>
>>>>>> This is a high priority CR and requires your immediate attention.
>>>>>> Please evaluate it as soon as possible. Thank you.
>>>>>> >>>>>> CR 6769487 changed on Nov 12 2008 by jan.damborsky at sun.com >>>>>> >>>>>> === Field ============ === New Value ============= === Old Value >>>>>> ============= >>>>>> >>>>>> Category kernel >>>>>> opensolaris Comments New >>>>>> Note >>>>>> Comments New Note Old >>>>>> Note Comments New >>>>>> Note Old Note Public >>>>>> Comments New >>>>>> Note Responsible >>>>>> Manager david.brittle at sun.com eric.ray at sun.com >>>>>> Status 1-Dispatched 5-Cause >>>>>> Known SubCategory >>>>>> zfs livecd >>>>>> ====================== =========================== >>>>>> =========================== >>>>>> >>>>>> *Change Request ID*: 6769487 >>>>>> >>>>>> *Synopsis*: Ended up in 'grub>' prompt after installation of >>>>>> OpenSolaris 2008.11 (build 101a) >>>>>> >>>>>> Product: solaris >>>>>> Category: kernel >>>>>> Subcategory: zfs >>>>>> Type: Defect >>>>>> Subtype: Functionality >>>>>> Status: 1-Dispatched >>>>>> Substatus: Priority: 1-Very High >>>>>> Introduced In Release: Introduced In Build: Responsible >>>>>> Manager: david.brittle at sun.com >>>>>> Responsible Engineer: Initial Evaluator: zfs-team at sun.com >>>>>> Keywords: >>>>>> === *Description* >>>>>> ============================================================ >>>>>> When testing installation with recent OpenSolaris builds, we have >>>>>> been encountering that >>>>>> in some cases, people end up in GRUB prompt after the >>>>>> installation - it seems that menu.lst >>>>>> can't be accessed for some reason. For now bunch of Bugzilla bugs >>>>>> seem to be describing >>>>>> the same manifestation of the problem which root cause has not >>>>>> been identified yet: >>>>>> >>>>>> 4051 opensolaris b99b/b100a does not install on 1.5 TB disk or >>>>>> boot fails after install >>>>>> 4591 Install failure on a Sun Fire X4240 with Opensolaris 200811 >>>>>> 4161 no grub in 2008.11 Development Builds (comment #20, comment >>>>>> #31) >>>>>> 4760 Enter grub after installing 2008.11 RC 1 >>>>>> ... 
>>>>>>
>>>>>> I also hit that problem when testing the Automated Installer (it
>>>>>> is a part of the Caiman project and will replace the current
>>>>>> jumpstart install technology). I was able to make GRUB find
>>>>>> 'menu.lst' just by using the 'zpool import' command - please see
>>>>>> below for the detailed procedure.
>>>>>>
>>>>>>
>>>>>> configuration:
>>>>>> --------------
>>>>>> HW: Ultra 20, 1GB RAM, 1 250GB SATA drive
>>>>>> SW: Opensolaris build 100, 64bit mode
>>>>>>
>>>>>> steps used:
>>>>>> -----------
>>>>>> [1] OpenSolaris 100 installed using Automated Installer
>>>>>> - Solaris 2 partition created during installation
>>>>>>
>>>>>> * partition configuration before installation:
>>>>>>
>>>>>> # fdisk -W - c2t0d0p0
>>>>>> ...
>>>>>> * Id  Act Bhead Bsect Bcyl Ehead Esect Ecyl Rsect    Numsect
>>>>>>   192 0   0     1     1    254   63    1023 16065    22491000
>>>>>>
>>>>>> * partition configuration after installation:
>>>>>>
>>>>>> # fdisk -W - c2t0d0p0
>>>>>> ...
>>>>>> * Id  Act Bhead Bsect Bcyl Ehead Esect Ecyl Rsect    Numsect
>>>>>>   192 0   0     1     1    254   63    1023 16065    22491000
>>>>>>   191 128 254   63    1023 254   63    1023 22507065 30000000
>>>>>>
>>>>>> [2] When I rebooted the system after the installation, I ended up
>>>>>> at the GRUB prompt:
>>>>>> grub> root
>>>>>> (hd0,1,a): Filesystem type unknown, partition type 0xbf
>>>>>>
>>>>>> grub> cat /rpool/boot/grub/menu.lst
>>>>>>
>>>>>> Error 17: Cannot mount selected partition
>>>>>>
>>>>>> grub>
>>>>>>
>>>>>> [3] I rebooted into AI and did 'zpool import'
>>>>>> # zdb -l /dev/rdsk/c2t0d0s0 > /tmp/zdb_before_import.txt (attached)
>>>>>> # zpool import -f rpool
>>>>>> # zdb -l /dev/rdsk/c2t0d0s0 > /tmp/zdb_after_import.txt (attached)
>>>>>> # diff /tmp/zdb_before_import.txt /tmp/zdb_after_import.txt
>>>>>> 7c7
>>>>>> < txg=21
>>>>>> ---
>>>>>> > txg=2675
>>>>>> 9c9
>>>>>> < hostid=4741222
>>>>>> ---
>>>>>> > hostid=4247690
>>>>>> 17a18
>>>>>> > devid='id1,sd@f00c778e247ac7bd0000238460000/a'
>>>>>> 31c32
>>>>>> ...
>>>>>> # reboot
>>>>>>
>>>>>> [4] Now GRUB can access menu.lst and Solaris is booted
>>>>>>
>>>>>> hypothesis
>>>>>> ----------
>>>>>> It seems that for some reason, when ZFS pool was created, 'devid'
>>>>>> information was not added to the ZFS label.
>>>>>>
>>>>>> When 'zpool import' was called, 'devid' got populated.
>>>>>>
>>>>>> Looking at the GRUB ZFS plug-in, it seems that 'devid'
>>>>>> (ZPOOL_CONFIG_DEVID attribute) is
>>>>>> required in order to be able to access ZFS filesystem:
>>>>>>
>>>>>> In grub/grub-0.95/stage2/fsys_zfs.c:
>>>>>>
>>>>>> vdev_get_bootpath()
>>>>>> {
>>>>>>     ...
>>>>>>     if (strcmp(type, VDEV_TYPE_DISK) == 0) {
>>>>>>         if (vdev_validate(nv) != 0 ||
>>>>>>             (nvlist_lookup_value(nv, ZPOOL_CONFIG_PHYS_PATH,
>>>>>>             bootpath, DATA_TYPE_STRING, NULL) != 0) ||
>>>>>>             (nvlist_lookup_value(nv, ZPOOL_CONFIG_DEVID,
>>>>>>             devid, DATA_TYPE_STRING, NULL) != 0))
>>>>>>                 return (ERR_NO_BOOTPATH);
>>>>>>     ...
>>>>>> }
>>>>>>
>>>>>> additional observations:
>>>>>> ------------------------
>>>>>> [1] If 'devid' is populated during installation after 'zpool create'
>>>>>> operation, the problem doesn't occur.
>>>>>>
>>>>>> [2] If following described procedure, the problem is reproducible
>>>>>> at will on system where it was initially reproduced (please see
>>>>>> above for the configuration)
>>>>>>
>>>>>> [3] Other people reported this problem also for following
>>>>>> configurations:
>>>>>> * vmware
>>>>>> * Sun Java Workstation W2100z with 2xOpteron2.4G 3G Mem
>>>>>>
>>>>>> [4] When installation into existing Solaris2 partition containing
>>>>>> Solaris instance is done
>>>>>> 'devid' is always populated and the problem doesn't occur (it
>>>>>> doesn't matter if partition
>>>>>> is marked 'active' or not).
>>>>>> >>>>>> *** (#1 of 5): 2008-11-10 10:27:21 GMT+00:00 jan.damborsky at sun.com >>>>>> >>>>>> If the system once be Navada, (101a as mine), install OpenSolaris >>>>>> will hit this issue, while keep the partition but not choose the >>>>>> entire disk (I suspect this caused the issue, perhaps) >>>>>> There's a diagnostic partition on there if Navada installed, and >>>>>> opensolaris 2008.11 simply enter grub> as this CR mentioned. Then >>>>>> I use the entire disk, this time the system boot up okay. >>>>>> But while I re-install it again with a smaller size than the >>>>>> entire disk specified, >>>>>> grub has no problem, but GNOME cannot start (hang there endlessly) >>>>>> >>>>>> *** (#2 of 5): 2008-11-10 10:45:29 GMT+00:00 robin.guo at sun.com >>>>>> >>>>>> The root cause of this problem is the continued existence of UFS >>>>>> filesystems structures on disk, even after the zfs filesystem is >>>>>> created and is live. Because ZFS did not destroy the UFS magic, >>>>>> both GRUB and Solaris think there's a (horribly damaged) UFS >>>>>> filesystem present on that slice (a WARNING is displayed at boot >>>>>> time during OpenSolaris boot informing the user that >>>>>> /mnt/solaris<N> (where <N> is a number) could not be mounted >>>>>> because of filesystem problems -- in reality, that slice is where >>>>>> the zfs root is located. >>>>>> >>>>>> In GRUB, since code that attempts to mount root does so by trying >>>>>> each filesystem module in the order in which they are listed in >>>>>> the fsys_table[] array, and since UFS is listed before ZFS, GRUB >>>>>> thinks that a UFS filesystem exists in the slice actually >>>>>> containing the ZFS root filesystem (and fails trying to mount it, >>>>>> leaving it unable to locate the real root filesystem). A >>>>>> modified version of GRUB that modifies fsys_table by declaring >>>>>> the ZFS operations before the UFS operations confirms this >>>>>> hypothesis. 
>>>>>>
>>>>>> Therefore, a valid workaround destroys the UFS magic, preventing
>>>>>> both GRUB's and Solaris's UFS modules from recognizing the slice
>>>>>> as a UFS filesystem. When GRUB's UFS code fails to find a valid
>>>>>> UFS filesystem, the ZFS module is subsequently tried and is able
>>>>>> to successfully mount the filesystem.
>>>>>>
>>>>>> *** (#3 of 5): 2008-11-11 03:23:04 GMT+00:00 seth.goldberg at sun.com
>>>>>> *** Last Edit: 2008-11-11 03:45:05 GMT+00:00 seth.goldberg at sun.com
>>>>>>
>>>>>> I think there are two separate issues here. The UFS label
>>>>>> appears to be one. The signature for this bug is that at the grub
>>>>>> prompt, typing root generates the UFS filesystem info.
>>>>>> However there is a secondary bug where after installation, one
>>>>>> gets a grub prompt. Typing the root command at the grub prompt
>>>>>> generates "unknown file system". In this case no UFS filesystems
>>>>>> were detected or mounted. The workaround for this has been to
>>>>>> run zpool import. This still needs to be investigated.
>>>>>>
>>>>>> *** (#4 of 5): 2008-11-12 00:04:16 GMT+00:00 sanjay.nadkarni at sun.com
>>>>>>
>>>>>> We were able to recreate the grub failure where typing root at
>>>>>> the prompt returns unknown file system. This was on a Fujitsu
>>>>>> LifeBook S7211. It was installed with Vista. We
>>>>>> then booted OpenSolaris and started the install. At the end of
>>>>>> the installation we noted that the zfs label did not have devid
>>>>>> information.
>>>>>>
>>>>>> We then loaded a simple program that would get the devid
>>>>>> (devid_get). This failed with "Invalid argument". We then
>>>>>> rebooted the liveCD again and reran this program and this time it
>>>>>> printed out the device id. The disk is off a SATA controller.
>>>>>> The driver that attached to this is ahci. The device is:
>>>>>> 82801HBM/HEM.
The disk is Fujitsu MHY2120BH >>>>>> >>>>>> *** (#5 of 5): 2008-11-12 02:43:18 GMT+00:00 sanjay.nadkarni at sun.com >>>>>> >>>>>> >>>>>> === *Public Comments* >>>>>> ======================================================== >>>>>> Following Bugzilla bugs were closed as duplicate of this issue: >>>>>> >>>>>> 4772 Cannot install OpenSolaris 2008.11 on VMware Server 2.0 >>>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4772 >>>>>> >>>>>> 4756 after reboot when finishing the installation, system can not >>>>>> boot >>>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4756 >>>>>> >>>>>> 4749 After installed opensolaris0811RC1 on Dell PowerEdge, can't >>>>>> boot from disk. >>>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4749 >>>>>> >>>>>> *** (#1 of 9): 2008-11-10 17:20:54 GMT+00:00 dave.miner at sun.com >>>>>> *** Last Edit: 2008-11-11 11:45:41 GMT+00:00 jan.damborsky at sun.com >>>>>> >>>>>> zpool import doesn't help for me, nor would I expect it to (it's >>>>>> a mystery >>>>>> why it seems to). Clearing the UFS magic helps. >>>>>> >>>>>> Looking further, I find that the data on disk at 8k seems to still >>>>>> be a UFS superblock, not a zfs vdev_boot_header_t, which doesn't >>>>>> make >>>>>> sense to me; in any ZFS initialization scheme, one would expect >>>>>> all parts >>>>>> of the label to be completely written. >>>>>> >>>>>> The expected vdev_boot_header_t appears at the label copy at >>>>>> 256K+8K, as >>>>>> expected. >>>>>> >>>>>> *** (#2 of 9): 2008-11-11 04:39:09 GMT+00:00 dan.mick at sun.com >>>>>> >>>>>> It appears that ZFS doesn't validate that first 8k (the >>>>>> vdev_boot_header), so >>>>>> that explains why the kernel was happy even with a UFS superblock >>>>>> where the >>>>>> vdev_boot_header was supposed to be. >>>>>> >>>>>> Also, the last few bits of the 8k block in question seem to >>>>>> contain a >>>>>> zio_block_tail_t (i.e. 
a zbt_magic and a zbt_cksum), so it seems >>>>>> this block >>>>>> was written by ZFS sometime in the past. >>>>>> Possible theories: 1) the ZFS initialization somehow skipped >>>>>> this 8k header, >>>>>> or 2) somehow the 8k superblock was rewritten over the block >>>>>> after ZFS initialized it. >>>>>> >>>>>> *** (#3 of 9): 2008-11-11 04:49:57 GMT+00:00 dan.mick at sun.com >>>>>> >>>>>> Another possible theory: could this be the superblock flush from >>>>>> a still-mounted UFS being shut down? >>>>>> >>>>>> (The block was correct until after the OpenSolaris installer said >>>>>> it was done, >>>>>> and waited for me to press a button to reboot. I suspect >>>>>> the original UFS was mounted and not unmounted before the ZFS >>>>>> creation, >>>>>> so they both think they own the device.) >>>>>> >>>>>> Supporting evidence: the "last mounted" path in the superblock is >>>>>> "/mnt/solaris0". >>>>>> >>>>>> I suspect the cause of this bug is a UFS that's mounted and >>>>>> should be >>>>>> unmounted by the installer before ZFS creation. >>>>>> >>>>>> What's the right category/subcategory for Caiman? >>>>>> >>>>>> *** (#4 of 9): 2008-11-11 07:34:58 GMT+00:00 dan.mick at sun.com >>>>>> >>>>>> The live CD has historically automatically mounted up any UFS >>>>>> file systems that it found, going back to Belenix. Interesting >>>>>> that this is just now a problem, but it probably is a result of >>>>>> switching to ZFS for swap, as up until build 96 we always created >>>>>> a swap slice at the start of the disk, which it appears would >>>>>> have masked this problem. >>>>>> >>>>>> *** (#5 of 9): 2008-11-11 15:02:03 GMT+00:00 dave.miner at sun.com >>>>>> >>>>>> Installer takes care of releasing the target device before Target >>>>>> Instantiation >>>>>> phase is launched. 
Among other things, it >>>>>> >>>>>> * releases all swap devices created on target disk >>>>>> * unmounts whatever is mounted on target disk >>>>>> >>>>>> For the latter, /etc/mnttab is read and if there is mounted >>>>>> device which is part of >>>>>> the target disk, installer tries to unmount it. >>>>>> >>>>>> The problem is after fix for Bugzilla bug 30 was integrated, UFS >>>>>> filesystems are >>>>>> mounted with '-o m' option which causes the filesystem being >>>>>> mounted without making >>>>>> entry in /etc/mnttab. Then mountpoints are hidden, installer >>>>>> can't see those and >>>>>> doesn't unmount them. >>>>>> >>>>>> That said, this explains UFS part of the problem (when 'dd' >>>>>> workaround works), >>>>>> but doesn't seems to be related to ZFS part of the issue, when >>>>>> 'zpool import' workaround helped. >>>>>> >>>>>> *** (#6 of 9): 2008-11-11 16:25:09 GMT+00:00 jan.damborsky at sun.com >>>>>> *** Last Edit: 2008-11-11 16:34:29 GMT+00:00 jan.damborsky at sun.com >>>>>> >>>>>> We should probably file leave this bug to resolve zpool create >>>>>> not removing evidence of the >>>>>> previous ufs fs, and file another one to chase down the other >>>>>> issue(s?). >>>>>> >>>>>> Chris, if you run zbd -l on you virgin device, are you missing >>>>>> any zfs properties? The reader >>>>>> in GRUB pretty much gives up if things like the devid aren't set. >>>>>> >>>>>> *** (#7 of 9): 2008-11-11 19:30:56 GMT+00:00 >>>>>> jan.setje-eilers at sun.com >>>>>> >>>>>> Concur that Chris' problem is different; the UFS superblock does >>>>>> not exist in >>>>>> the first 256kb attached to the bug. It appears as though >>>>>> phys_path and devid >>>>>> are present, although it's difficult to be sure. We should >>>>>> probably see if we can >>>>>> send a debug version of Grub to Chris, with installation >>>>>> instructions, to see >>>>>> why it seems unable to find the zfs. 
>>>>>> >>>>>> *** (#8 of 9): 2008-11-11 22:16:50 GMT+00:00 dan.mick at sun.com >>>>>> >>>>>> The root cause of 'UFS part' of this problem is in 'livecd code' >>>>>> and is tracked by >>>>>> following Bugzilla bug: >>>>>> >>>>>> 4675 Fix for bug 30 causes ZFS label to be mangled - ending up in >>>>>> GRUB prompt after installing OpenSolaris >>>>>> >>>>>> Please feel free to use this bug (6769487) for tracking other >>>>>> part(s) of the problem. >>>>>> Resetting category to solaris/kernel/zfs and Status to 'Dispatched'. >>>>>> >>>>>> *** (#9 of 9): 2008-11-12 12:46:21 GMT+00:00 jan.damborsky at sun.com >>>>>> >>>>>> >>>>>> === *Comments* >>>>>> =============================================================== >>>>>> Moved to public comments. >>>>>> >>>>>> *** (#1 of 6): 2008-11-10 17:04:10 GMT+00:00 jan.damborsky at sun.com >>>>>> *** Last Edit: 2008-11-10 17:20:54 GMT+00:00 dave.miner at sun.com >>>>>> >>>>>> Same situation (without zfs) on: >>>>>> White Box based on Intel DG33TL motherboard with ICH9R chipset, >>>>>> 2Gb memory, 3 SATA drives, 1 SATA CD/DVD, Intel graphics. >>>>>> >>>>>> *** (#2 of 6): 2008-11-10 22:52:23 GMT+00:00 pawel.wojcik at sun.com >>>>>> >>>>>> Workaround #1 does not cause the system to boot properly on the >>>>>> system I tried installing (that seems to be consistent with what >>>>>> others are reporting in the opensolaris defect report), but >>>>>> workaround #2 DOES. >>>>>> >>>>>> *** (#3 of 6): 2008-11-11 01:56:43 GMT+00:00 seth.goldberg at sun.com >>>>>> *** Last Edit: 2008-11-11 03:41:48 GMT+00:00 seth.goldberg at sun.com >>>>>> >>>>>> I've reproduced this on a "virgin" disk, see SR record against >>>>>> this bug, (had to purchase a new spindle as previous disk failed >>>>>> and new disk removed supplier packaging was inserted into laptop >>>>>> and then 2008.11 CD booted). 
>>>>>> >>>>>> After a discussion with Dan Mick on email data requested by dan >>>>>> was capture root command from grub prompt: >>>>>> >>>>>> (hd0,0,a): Filesystem type is zfs, partition type 0xbf >>>>>> >>>>>> Also, can you boot from the CD and collect the first 256kb of the >>>>>> disk, with >>>>>> >>>>>> dd if=<your s0 slice here> of=first.256kb bs=256k count=1 >>>>>> >>>>>> This is attached. >>>>>> >>>>>> *** (#4 of 6): 2008-11-11 10:46:29 GMT+00:00 >>>>>> christopher.armes at sun.com >>>>>> >>>>>> Saw this bug on several machines today which I was helping to >>>>>> install. One person did a reinstall and it worked fine the second >>>>>> time as some reported. >>>>>> >>>>>> 2 other machines could use the workaround which Lin Ling pointed >>>>>> us to with this bug. That did save a couple folks from having to >>>>>> reinstall, so was very helpful. Thanks Lin! Of the installs of >>>>>> people that installed to a hard drive (i.e., not within >>>>>> VirtualBox), about 12 systems, we saw this on 3 machines, so >>>>>> about 25% of the systems in this small sampling. >>>>>> >>>>>> *** (#5 of 6): 2008-11-12 09:58:01 GMT+00:00 alan.duboff at sun.com >>>>>> >>>>>> Moved to public comments. >>>>>> >>>>>> *** (#6 of 6): 2008-11-12 12:43:18 GMT+00:00 jan.damborsky at sun.com >>>>>> *** Last Edit: 2008-11-12 12:46:43 GMT+00:00 jan.damborsky at sun.com >>>>>> >>>>>> >>>>>> === *Evaluation* >>>>>> ============================================================= >>>>>> See Description. >>>>>> >>>>>> *** (#1 of 4): 2008-11-11 03:23:04 GMT+00:00 seth.goldberg at sun.com >>>>>> >>>>>> remove mislead evaluation. >>>>>> >>>>>> *** (#2 of 4): 2008-11-11 21:45:12 GMT+00:00 lin.ling at sun.com >>>>>> *** Last Edit: 2008-11-11 23:16:07 GMT+00:00 lin.ling at sun.com >>>>>> >>>>>> What? No, read the public comments. The problem is that the UFS >>>>>> filesystem is still mounted as the installer lays down the ZFS. 
>>>>>> Then, on reboot, the UFS, as
>>>>>> it's syncing, writes its superblock back to the filesystem it
>>>>>> thinks it owns,
>>>>>> over the top of the now-ZFS-owned space.
>>>>>>
>>>>>> The installer must ensure that other filesystems are not mounted
>>>>>> on the slice
>>>>>> where it's creating the ZFS rpool.
>>>>>>
>>>>>> *** (#3 of 4): 2008-11-11 22:11:35 GMT+00:00 dan.mick at sun.com
>>>>>>
>>>>>> You are right. I misunderstood.
>>>>>> George Wilson just corrected me that 'zpool create' indeed clears
>>>>>> the space correctly:
>>>>>>
>>>>>> vdev_label_init() {
>>>>>>     :
>>>>>>     vp = zio_buf_alloc(sizeof (vdev_phys_t));
>>>>>>     bzero(vp, sizeof (vdev_phys_t));
>>>>>>     :
>>>>>>     bzero(vb, sizeof (vdev_boot_header_t));
>>>>>>     :
>>>>>> }
>>>>>>
>>>>>> Thanks for the clarification.
>>>>>>
>>>>>> *** (#4 of 4): 2008-11-11 22:49:04 GMT+00:00 lin.ling at sun.com
>>>>>>
>>>>>>
>>>>>> === *Suggested Fix* ==========================================================
>>>>>>
>>>>>> === *Workaround* =============================================================
>>>>>> [1] Boot LiveCD
>>>>>> $ pfexec su -
>>>>>> # zpool import -f rpool
>>>>>>
>>>>>> *** (#1 of 3): 2008-11-10 10:27:21 GMT+00:00 jan.damborsky at sun.com
>>>>>>
>>>>>> ZERO OUT the leftover UFS magic:
>>>>>>
>>>>>> For GNU dd:
>>>>>> dd if=/dev/zero of=/dev/dsk/<SLICE> bs=1 count=4 seek=9564
>>>>>>
>>>>>> (e.g.:
>>>>>> dd if=/dev/zero of=/dev/dsk/c4t0d0s0 bs=1 count=4 seek=9564
>>>>>> )
>>>>>>
>>>>>> *** (#2 of 3): 2008-11-11 03:36:55 GMT+00:00 seth.goldberg at sun.com
>>>>>>
>>>>>> I did the following in dd to work around the issue:
>>>>>>
>>>>>> root@opensolaris:~# dd if=/dev/zero of=/dev/dsk/c1t0d0s0 bs=1 count=4 seek=9564
>>>>>> 4+0 records in
>>>>>> 4+0 records out
>>>>>> 4 bytes (4 B) copied, 0.0394095 s, 0.1 kB/s
>>>>>> root@opensolaris:~#
>>>>>>
>>>>>> *** (#3 of 3): 2008-11-11 19:07:04 GMT+00:00 mary.ding at sun.com
>>>>>>
>>>>>>
>>>>>> === *Justification*
========================================================== >>>>>> Priority changed from [] to [1-Very High] >>>>>> Installed OpenSolaris 2008.11 doesn't boot >>>>>> jan.damborsky at sun.com 2008-11-10 10:27:21 GMT >>>>>> >>>>>> *** (#1 of 1): 2008-11-10 10:27:21 GMT+00:00 jan.damborsky at sun.com >>>>>> >>>>>> >>>>>> === *Additional Details* >>>>>> ===================================================== >>>>>> Targeted Release: Commit To Fix In Build: >>>>>> Fixed In Build: Integrated In Build: Verified In >>>>>> Build: See Also: 6769534 >>>>>> Duplicate of: Hooks: >>>>>> Hook1: Hook2: Hook3: >>>>>> Hook4: Hook5: Hook6: Interest List: >>>>>> dan.mick at sun.com, dave.miner at sun.com, david.comay at sun.com, >>>>>> frank.batschulat at sun.com, kerberos-iteam at Sun.COM, >>>>>> lin.ling at sun.com, nick.todd at sun.com, peter.dennis at sun.com, >>>>>> plus1tb at sun.com, sdg at sun.com, si-bugs at sun.com, sst-prg at >>>>>> sun.com, >>>>>> tomas.hurka at sun.com >>>>>> Program Management: New Defect >>>>>> Root Cause: Is a Security Vulnerability?: No >>>>>> Fix Affects Documentation: No >>>>>> Fix Affects Localization: No >>>>>> Reported by: >>>>>> === *History* >>>>>> ================================================================ >>>>>> Date Submitted: 2008-11-10 10:27:21 GMT+00:00 >>>>>> Submitted By: jan.damborsky at sun.com >>>>>> >>>>>> Status Changed Date Updated Updated By >>>>>> 3-Accepted 2008-11-10 23:59:05 GMT+00:00 >>>>>> lin.ling at sun.com >>>>>> 5-Cause Known 2008-11-11 03:23:04 GMT+00:00 >>>>>> seth.goldberg at sun.com >>>>>> 1-Dispatched 2008-11-12 12:43:18 GMT+00:00 >>>>>> jan.damborsky at sun.com >>>>>> >>>>>> >>>>>> === *Solution* >>>>>> =============================================================== >>>>>> >>>>>> >>>>>> === *Service Request* >>>>>> ======================================================== >>>>>> ID: 1-493023606 >>>>>> Customer: >>>>>> Account Name: Sun Microsystems >>>>>> Customer Contact: Customer Contact Role: >>>>>> 
D-Development >>>>>> Customer Contact Type: I-Internal (SMI) Customer >>>>>> Impact: Critical >>>>>> Functionality: Primary >>>>>> Severity: 1 >>>>>> Synopsis: Product Name: solaris >>>>>> Product Release: osol_2008.11 >>>>>> Product Build: Operating System: osol_2008.11 >>>>>> Hardware: generic >>>>>> Reference Number: Sun Contact: jan.damborsky at sun.com >>>>>> Status: Open >>>>>> Source: BugTraq2 >>>>>> Reproducible: Submitted By: jan.damborsky at sun.com >>>>>> Submitted Date: 2008-11-10 10:27:21 GMT+00:00 >>>>>> Description: >>>>>> >>>>>> === *Service Request* >>>>>> ======================================================== >>>>>> ID: 1-493053806 >>>>>> Customer: >>>>>> Account Name: SUN MicroSystems >>>>>> Customer Contact: Customer Contact Role: >>>>>> D-Development >>>>>> Customer Contact Type: I-Internal (SMI) Customer >>>>>> Impact: Critical >>>>>> Functionality: Primary >>>>>> Severity: 1 >>>>>> Synopsis: After installing 2008.11RC1b boot from hard >>>>>> disk fails >>>>>> Product Name: solaris >>>>>> Product Release: osol_2008.11 >>>>>> Product Build: Operating System: osol_2008.11 >>>>>> Hardware: x86 >>>>>> Reference Number: Sun Contact: >>>>>> christopher.armes at sun.com >>>>>> Status: Open >>>>>> Source: BugTraq2 >>>>>> Reproducible: Always >>>>>> Submitted By: christopher.armes at sun.com >>>>>> Submitted Date: 2008-11-10 12:54:24 GMT+00:00 >>>>>> Description: Booting from the livecd and then selecting >>>>>> install works fine upon reboot with either cd in and selecting >>>>>> boot from hard disk or without cd allowing grub menu to boot, >>>>>> causes boot to fail drops system to "grub>" prompt >>>>>> >>>>>> >>>>>> === *Service Request* >>>>>> ======================================================== >>>>>> ID: 1-493177108 >>>>>> Customer: >>>>>> Account Name: SUN >>>>>> Customer Contact: Customer Contact Role: >>>>>> D-Development >>>>>> Customer Contact Type: I-Internal (SMI) Customer >>>>>> Impact: Critical >>>>>> Functionality: Primary >>>>>> 
Severity: 1 >>>>>> Synopsis: Product Name: solaris >>>>>> Product Release: osol_2008.11 >>>>>> Product Build: osol_2008.11 >>>>>> Operating System: osol_2008.11 >>>>>> Hardware: amd >>>>>> Reference Number: Sun Contact: >>>>>> garrett.damore at sun.com >>>>>> Status: Source: BugTraq2 >>>>>> Reproducible: Submitted By: garrett.damore at sun.com >>>>>> Submitted Date: 2008-11-10 20:16:41 GMT+00:00 >>>>>> Description: I hit this when updating my Ultra 20 >>>>>> (original model, not M2) from b77ish to OSOL 2008.11rc1b >>>>>> >>>>>> System has 1.5GB ram, SATA hard disk. >>>>>> >>>>>> >>>>>> === *Service Request* >>>>>> ======================================================== >>>>>> ID: 1-493257401 >>>>>> Customer: >>>>>> Account Name: Sun Microsystems, Inc. >>>>>> Customer Contact: Customer Contact Role: >>>>>> D-Development >>>>>> Customer Contact Type: I-Internal (SMI) Customer >>>>>> Impact: Critical >>>>>> Functionality: Primary >>>>>> Severity: 1 >>>>>> Synopsis: Product Name: solaris >>>>>> Product Release: osol_2008.11 >>>>>> Product Build: osol_2008.11 >>>>>> Operating System: osol_2008.11 >>>>>> Hardware: generic_ibm_compatible >>>>>> Reference Number: Sun Contact: dana.myers at sun.com >>>>>> Status: Open >>>>>> Source: BugTraq2 >>>>>> Reproducible: Submitted By: dana.myers at sun.com >>>>>> Submitted Date: 2008-11-10 22:34:45 GMT+00:00 >>>>>> Description: >>>>>> >>>>>> === *Service Request* >>>>>> ======================================================== >>>>>> ID: 1-493265801 >>>>>> Customer: >>>>>> Account Name: Sun Microsystems >>>>>> Customer Contact: pawel.wojcik at sun.com >>>>>> Customer Contact Role: D-Development >>>>>> Customer Contact Type: I-Internal (SMI) Customer >>>>>> Impact: Critical >>>>>> Functionality: Primary >>>>>> Severity: 1 >>>>>> Synopsis: Product Name: solaris >>>>>> Product Release: osol_2008.11 >>>>>> Product Build: osol_2008.11 >>>>>> Operating System: solaris >>>>>> Hardware: intel >>>>>> Reference Number: Sun Contact: 
pawel.wojcik at sun.com >>>>>> Status: Source: BugTraq2 >>>>>> Reproducible: Submitted By: pawel.wojcik at sun.com >>>>>> Submitted Date: 2008-11-10 22:50:53 GMT+00:00 >>>>>> Description: >>>>>> >>>>>> === *Activity* >>>>>> =============================================================== >>>>>> >>>>>> >>>>>> === *Multiple Release (MR) Cluster* - 0 >>>>>> ====================================== >>>>>> >>>>>> >>>>>> === *Escalations* >>>>>> ============================================================ >>>>>> >>>>>> >>>>> >>>> >>> >> >
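
For anyone wanting to check a freshly installed disk for the missing-'devid' condition described in the CR, here is a minimal sketch in Python. It only scans saved 'zdb -l' text (for example /tmp/zdb_before_import.txt from the steps above) for the two label attributes that the quoted vdev_get_bootpath() excerpt looks up. The function name and the key='value' line format it matches are assumptions for illustration; it is not part of any Sun tool, and the phys_path value in the usage note is made up.

```python
import re

# Label attributes that the quoted GRUB excerpt requires before it will
# return a boot path (ZPOOL_CONFIG_PHYS_PATH and ZPOOL_CONFIG_DEVID).
# Assumption: zdb -l prints them as phys_path='...' and devid='...'.
REQUIRED_KEYS = ("phys_path", "devid")


def missing_label_keys(zdb_output):
    """Return the required keys with no non-empty key='value' line in the text.

    Hypothetical helper: it does a plain text scan of saved zdb -l output
    and does not read the on-disk label itself.
    """
    found = set(
        re.findall(r"^\s*(\w+)='[^']+'\s*$", zdb_output, flags=re.MULTILINE)
    )
    return [key for key in REQUIRED_KEYS if key not in found]
```

Called on label text that has phys_path='/pci@0,0/disk@0,0:a' but no devid line, it returns ['devid'], which matches the state seen before 'zpool import' in the procedure above; an empty list means both attributes are present.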
