Mary, thanks for the update!
Jan

mary ding wrote:
> Jan and George:
>
> I had also seen this on ultra24 and ultra40 when we test the 1.5 TB
> Seagate sata disk. So far I had only seen this on system with sata
> drive.
>
>
>
> jan damborsky wrote:
>> George,
>>
>>
>> George Wilson wrote:
>>> Jan,
>>>
>>> It seems like the problem is not with ZFS but with the device
>>> driver. If the driver is failing to provide the devid then ZFS is
>>> just going to be a victim.
>>
>> I agree with you that this is what we might be encountering
>> with respect to 'devid' problem here.
>>
>>
>>> I would recommend that we change the synopsis to devid_get() fails
>>> with "Invalid argument" and pass this to the driver folks.
>>
>> I will let Sanjay comment on this, since he has done
>> some more investigation recently.
>>
>>> Do you know if it's always the same driver?
>>
>> I can only reproduce it on one system - this one has SATA drive
>> connected to the controller handled by nv_sata(7D) driver. I think
>> that Sanjay encountered that problem also on system with SATA disk.
>>
>> Thank you,
>> Jan
>>
>>> Thanks,
>>> George
>>>
>>> jan damborsky wrote:
>>>> Hi George,
>>>>
>>>>
>>>> George Wilson wrote:
>>>>> Jan,
>>>>>
>>>>> So who is working the UFS issue and how is that being tracked.
>>>> In general, bugs in OpenSolaris Caiman installer are tracked in
>>>> Bugzilla at defect.opensolaris.org. This is the preferred over
>>>> filing bugs in Bugster.
>>>> Speaking about this particular problem, it is tracked by following
>>>> bug:
>>>>
>>>> 4675 Fix for bug 30 causes ZFS label to be mangled - ending up in
>>>> GRUB prompt after installing OpenSolaris
>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4675
>>>>
>>>> Sanjay Nadkarni is assigned to this bug (CCing him).
>>>>
>>>>> I would recommend that we keep this bug as the UFS/install issue
>>>>> and create a new bug and send that to me.
>>>> As pointed above, Bugzilla is preferred database to track issues in
>>>> Caiman installer.
>>>> >>>> Please note that 6769487 was originally filed for tracking the >>>> problem when >>>> GRUB can't access ZFS filesystem because 'devid' is not present in >>>> ZFS label. >>>> >>>> It was overloaded later by 'UFS' problem. >>>> >>>>> Can you move the descriptions below from this bug and add them to >>>>> the new one? >>>> To be honest, since installer part of problem related to UFS is >>>> tracked by 4675, >>>> I don't see why we shouldn't continue to use 6769487 to track the >>>> issue this bug >>>> was initially filed for and I think that we might lose some context >>>> when >>>> ZFS related information is moved from 6769487 to the new bug. >>>> That said, if you think it might be helpful, please let me know and >>>> I will try to capture all information from 6769487 I think is >>>> relevant to >>>> the ZFS part in new bug. >>>> >>>>> Also since you can reproduce this can you tell me exactly how or >>>>> point me at a system which I can login into to debug? >>>> Sure, the machine can be accessed via 'ssh', but since it is not >>>> directly accessible from SWAN (it is behind the NAT), >>>> I will provide you with instructions, how to access it. >>>> Unfortunately it doesn't have console access. >>>> >>>> Please let me know, in which state you would need to have that >>>> machine - right after the installation finished, but before reboot ? >>>> >>>> Unfortunately, following the procedure itself doesn't seem to be >>>> sufficient for reproducing the problem :-( I tried exactly the >>>> same steps on other bare metal as well as in virtual environment, >>>> but without success. >>>> >>>> >>>>> I want to make sure we don't lose sight of the UFS issue and this >>>>> bug has already gone down to root cause so let's not overload this >>>>> bug any further. >>>> UFS part of problem is being solved right now (please feel free to >>>> monitor >>>> bug 4675 for progress and add anything you might consider relevant >>>> to that issue). 
>>>> >>>> Thank you, >>>> Jan >>>> >>>>> Thanks, >>>>> George >>>>> >>>>> jan damborsky wrote: >>>>>> Hi George, >>>>>> >>>>>> there are at least two parts of this problem: >>>>>> >>>>>> [1] UFS one >>>>>> This is what you are referring to and it is being tracked by >>>>>> Bugzilla bug 4675. >>>>>> In that case workaround #2 helps to "solve" the problem. >>>>>> >>>>>> [2] ZFS one >>>>>> Please see original description #1. I am able to reproduce that >>>>>> on system >>>>>> at will which didn't contain any UFS filesystem and thus [1] is not >>>>>> applicable here. 'zpool import' helps in this case. >>>>>> >>>>>> Also please see: >>>>>> * description #4 >>>>>> * description #5 >>>>>> * public comments #8 >>>>>> * comments #6 >>>>>> >>>>>> People are apparently encountering this problem in >>>>>> other configurations (e.g. when using virgin disk >>>>>> or installing on system containing only Windows). >>>>>> >>>>>> I am not stating that this is in fact problem in ZFS as it might >>>>>> be related for example to device driver code, but at this point it >>>>>> seems to me that ZFS team is the most eligible one to move >>>>>> things forward, as GRUB can't read menu.lst from ZFS >>>>>> filesystem . >>>>>> >>>>>> Please let me know if you have any questions or need more >>>>>> information. >>>>>> >>>>>> Thank you, >>>>>> Jan >>>>>> >>>>>> >>>>>> George Wilson wrote: >>>>>>> Jan, >>>>>>> >>>>>>> I don't understand how this is a ZFS problem. I thought from the >>>>>>> evaluation that the issue is that UFS and ZFS are sharing the >>>>>>> same block and this was being caused by the fact the the livecd >>>>>>> had mounted a UFS filesystem as part of the installation. Could >>>>>>> you clarify? 
>>>>>>> >>>>>>> Thanks, >>>>>>> George >>>>>>> >>>>>>> Jan.Damborsky at Sun.COM wrote: >>>>>>>> Sun Confidential: Internal only >>>>>>>> >>>>>>>> *Synopsis*: Ended up in 'grub>' prompt after installation of >>>>>>>> OpenSolaris 2008.11 (build 101a) >>>>>>>> >>>>>>>> CrPrint: http://bt2ws.central.sun.com/CrPrint?id=6769487 >>>>>>>> Monaco: http://monaco.sfbay.sun.com/detail.jsf?cr=6769487 >>>>>>>> >>>>>>>> Due to a change of Responsible manager requested by >>>>>>>> jan.damborsky at sun.com, >>>>>>>> david.brittle at sun.com is now the responsible manager for: >>>>>>>> >>>>>>>> Due to a change requested by jan.damborsky at sun.com, >>>>>>>> this CR is being redispatched: >>>>>>>> >>>>>>>> This is a high priority CR and requires your immediate attention. >>>>>>>> Please evaluate it as soon as possible. Thank you. >>>>>>>> >>>>>>>> CR 6769487 changed on Nov 12 2008 by jan.damborsky at sun.com >>>>>>>> >>>>>>>> === Field ============ === New Value ============= === Old >>>>>>>> Value ============= >>>>>>>> >>>>>>>> Category kernel >>>>>>>> opensolaris Comments New >>>>>>>> Note >>>>>>>> Comments New Note Old >>>>>>>> Note Comments New >>>>>>>> Note Old Note Public >>>>>>>> Comments New >>>>>>>> Note Responsible >>>>>>>> Manager david.brittle at sun.com >>>>>>>> eric.ray at sun.com Status >>>>>>>> 1-Dispatched 5-Cause Known >>>>>>>> SubCategory zfs >>>>>>>> livecd ====================== >>>>>>>> =========================== =========================== >>>>>>>> >>>>>>>> *Change Request ID*: 6769487 >>>>>>>> >>>>>>>> *Synopsis*: Ended up in 'grub>' prompt after installation of >>>>>>>> OpenSolaris 2008.11 (build 101a) >>>>>>>> >>>>>>>> Product: solaris >>>>>>>> Category: kernel >>>>>>>> Subcategory: zfs >>>>>>>> Type: Defect >>>>>>>> Subtype: Functionality >>>>>>>> Status: 1-Dispatched >>>>>>>> Substatus: Priority: 1-Very High >>>>>>>> Introduced In Release: Introduced In Build: Responsible >>>>>>>> Manager: david.brittle at sun.com >>>>>>>> Responsible Engineer: Initial 
Evaluator: zfs-team at sun.com >>>>>>>> Keywords: >>>>>>>> === *Description* >>>>>>>> ============================================================ >>>>>>>> When testing installation with recent OpenSolaris builds, we >>>>>>>> have been encountering that >>>>>>>> in some cases, people end up in GRUB prompt after the >>>>>>>> installation - it seems that menu.lst >>>>>>>> can't be accessed for some reason. For now bunch of Bugzilla >>>>>>>> bugs seem to be describing >>>>>>>> the same manifestation of the problem which root cause has not >>>>>>>> been identified yet: >>>>>>>> >>>>>>>> 4051 opensolaris b99b/b100a does not install on 1.5 TB disk or >>>>>>>> boot fails after install >>>>>>>> 4591 Install failure on a Sun Fire X4240 with Opensolaris 200811 >>>>>>>> 4161 no grub in 2008.11 Development Builds (comment #20, >>>>>>>> comment #31) >>>>>>>> 4760 Enter grub after installing 2008.11 RC 1 >>>>>>>> ... >>>>>>>> >>>>>>>> I also hit that problem when testing Automated Installer (it is >>>>>>>> a part of Caiman project >>>>>>>> and will replace current jumpstart install technology), I was >>>>>>>> able to make GRUB find >>>>>>>> 'menu.lst' just by using 'zpool import' command - please see >>>>>>>> below for detailed procedure. 
>>>>>>>>
>>>>>>>>
>>>>>>>> configuration:
>>>>>>>> --------------
>>>>>>>> HW: Ultra 20, 1GB RWM, 1 250GB SATA drive
>>>>>>>> SW: Opensolaris build 100, 64bit mode
>>>>>>>>
>>>>>>>> steps used:
>>>>>>>> -----------
>>>>>>>> [1] OpenSolaris 100 installed using Automated Installer
>>>>>>>>     - Solaris 2 partition created during installation
>>>>>>>>
>>>>>>>>   * partition configuration before installation:
>>>>>>>>
>>>>>>>>   # fdisk -W - c2t0d0p0
>>>>>>>>   ...* Id   Act  Bhead  Bsect  Bcyl  Ehead  Esect  Ecyl  Rsect     Numsect
>>>>>>>>        192  0    0      1      1     254    63     1023  16065     22491000
>>>>>>>>
>>>>>>>>   * partition configuration after installation:
>>>>>>>>
>>>>>>>>   # fdisk -W - c2t0d0p0
>>>>>>>>   ...* Id   Act  Bhead  Bsect  Bcyl  Ehead  Esect  Ecyl  Rsect     Numsect
>>>>>>>>        192  0    0      1      1     254    63     1023  16065     22491000
>>>>>>>>        191  128  254    63     1023  254    63     1023  22507065  30000000
>>>>>>>>
>>>>>>>> [2] When I reboot the system after the installation, I ended up
>>>>>>>>     in GRUB prompt:
>>>>>>>>   grub> root
>>>>>>>>   (hd0,1,a): Filesystem type unknown, partition type 0xbf
>>>>>>>>
>>>>>>>>   grub> cat /rpool/boot/grub/menu.lst
>>>>>>>>
>>>>>>>>   Error 17: Cannot mount selected partition
>>>>>>>>
>>>>>>>>   grub>
>>>>>>>>
>>>>>>>> [3] I rebooted into AI and did 'zpool import'
>>>>>>>>   # zdb -l /dev/rdsk/c2t0d0s0 > /tmp/zdb_before_import.txt (attached)
>>>>>>>>   # zpool import -f rpool
>>>>>>>>   # zdb -l /dev/rdsk/c2t0d0s0 > /tmp/zdb_after_import.txt (attached)
>>>>>>>>   # diff /tmp/zdb_before_import.txt /tmp/zdb_after_import.txt
>>>>>>>>   7c7
>>>>>>>>   <     txg=21
>>>>>>>>   ---
>>>>>>>>   >     txg=2675
>>>>>>>>   9c9
>>>>>>>>   <     hostid=4741222
>>>>>>>>   ---
>>>>>>>>   >     hostid=4247690
>>>>>>>>   17a18
>>>>>>>>   >     devid='id1,sd@f00c778e247ac7bd0000238460000/a'
>>>>>>>>   31c32
>>>>>>>>   ...
>>>>>>>>   # reboot
>>>>>>>>
>>>>>>>> [4] Now GRUB can access menu.lst and Solaris is booted
>>>>>>>>
>>>>>>>> hypothesis
>>>>>>>> ----------
>>>>>>>> It seems that for some reason, when ZFS pool was created,
>>>>>>>> 'devid' information was not added to the ZFS label.
>>>>>>>>
>>>>>>>> When 'zpool import' was called, 'devid' got populated.
>>>>>>>>
>>>>>>>> Looking at the GRUB ZFS plug-in, it seems that 'devid'
>>>>>>>> (ZPOOL_CONFIG_DEVID attribute) is
>>>>>>>> required in order to be able to access ZFS filesystem:
>>>>>>>>
>>>>>>>> In grub/grub-0.95/stage2/fsys_zfs.c:
>>>>>>>>
>>>>>>>> vdev_get_bootpath()
>>>>>>>> {
>>>>>>>> ...
>>>>>>>>     if (strcmp(type, VDEV_TYPE_DISK) == 0) {
>>>>>>>>         if (vdev_validate(nv) != 0 ||
>>>>>>>>             (nvlist_lookup_value(nv, ZPOOL_CONFIG_PHYS_PATH,
>>>>>>>>             bootpath, DATA_TYPE_STRING, NULL) != 0) ||
>>>>>>>>             (nvlist_lookup_value(nv, ZPOOL_CONFIG_DEVID,
>>>>>>>>             devid, DATA_TYPE_STRING, NULL) != 0))
>>>>>>>>                 return (ERR_NO_BOOTPATH);
>>>>>>>> ...
>>>>>>>> }
>>>>>>>>
>>>>>>>> additional observations:
>>>>>>>> ------------------------
>>>>>>>> [1] If 'devid' is populated during installation after 'zpool create'
>>>>>>>>     operation, the problem doesn't occur.
>>>>>>>>
>>>>>>>> [2] If following described procedure, the problem is reproducible
>>>>>>>>     at will on system where it was initially reproduced (please see
>>>>>>>>     above for the configuration)
>>>>>>>>
>>>>>>>> [3] Other people reported this problem also for following
>>>>>>>>     configurations:
>>>>>>>>     * vmware
>>>>>>>>     * Sun Java Workstation W2100z with 2xOpteron2.4G 3G Mem
>>>>>>>>
>>>>>>>> [4] When installation into existing Solaris2 partition
>>>>>>>>     containing Solaris instance is done
>>>>>>>>     'devid' is always populated and the problem doesn't occur (it
>>>>>>>>     doesn't matter if partition
>>>>>>>>     is marked 'active' or not).
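The hypothesis above can be checked mechanically against saved `zdb -l` output. The following Python is only a minimal sketch and is not part of the original report. It assumes the label dump consists of `LABEL n` headings followed by `key=value` lines, which is the shape suggested by the diff in step [3]. The function name `missing_boot_keys` and the sample text are made up for illustration.

```python
import re

# Keys that GRUB's vdev_get_bootpath() looks up, per the excerpt quoted above.
REQUIRED_KEYS = ("phys_path", "devid")

# Hypothetical label dump, invented for this example (not real zdb output).
SAMPLE = """\
LABEL 0
    txg=21
    vdev_tree
        type='disk'
        path='/dev/dsk/c2t0d0s0'
        phys_path='/pci@0,0/example:a'
LABEL 1
    txg=2675
    vdev_tree
        type='disk'
        path='/dev/dsk/c2t0d0s0'
        devid='id1,sd@f00c778e247ac7bd0000238460000/a'
        phys_path='/pci@0,0/example:a'
"""


def missing_boot_keys(zdb_output):
    """Map each 'LABEL n' section to the required keys it does not contain."""
    missing = {}
    current = None
    for line in zdb_output.splitlines():
        header = re.match(r"\s*LABEL\s+(\d+)\s*$", line)
        if header:
            # Start a new label section with every required key outstanding.
            current = int(header.group(1))
            missing[current] = list(REQUIRED_KEYS)
            continue
        entry = re.match(r"\s*(\w+)=", line)
        if current is not None and entry and entry.group(1) in missing[current]:
            missing[current].remove(entry.group(1))
    return missing


if __name__ == "__main__":
    for label, keys in sorted(missing_boot_keys(SAMPLE).items()):
        print(label, keys or "ok")
```

If the hypothesis holds, running this over the attached /tmp/zdb_before_import.txt should report 'devid' as missing, and over /tmp/zdb_after_import.txt it should not.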
>>>>>>>> >>>>>>>> *** (#1 of 5): 2008-11-10 10:27:21 GMT+00:00 jan.damborsky at sun.com >>>>>>>> >>>>>>>> If the system once be Navada, (101a as mine), install >>>>>>>> OpenSolaris will hit this issue, while keep the partition but >>>>>>>> not choose the entire disk (I suspect this caused the issue, >>>>>>>> perhaps) >>>>>>>> There's a diagnostic partition on there if Navada installed, >>>>>>>> and opensolaris 2008.11 simply enter grub> as this CR >>>>>>>> mentioned. Then I use the entire disk, this time the system >>>>>>>> boot up okay. >>>>>>>> But while I re-install it again with a smaller size than the >>>>>>>> entire disk specified, >>>>>>>> grub has no problem, but GNOME cannot start (hang there endlessly) >>>>>>>> >>>>>>>> *** (#2 of 5): 2008-11-10 10:45:29 GMT+00:00 robin.guo at sun.com >>>>>>>> >>>>>>>> The root cause of this problem is the continued existence of >>>>>>>> UFS filesystems structures on disk, even after the zfs >>>>>>>> filesystem is created and is live. Because ZFS did not destroy >>>>>>>> the UFS magic, both GRUB and Solaris think there's a (horribly >>>>>>>> damaged) UFS filesystem present on that slice (a WARNING is >>>>>>>> displayed at boot time during OpenSolaris boot informing the >>>>>>>> user that /mnt/solaris<N> (where <N> is a number) could not be >>>>>>>> mounted because of filesystem problems -- in reality, that >>>>>>>> slice is where the zfs root is located. >>>>>>>> >>>>>>>> In GRUB, since code that attempts to mount root does so by >>>>>>>> trying each filesystem module in the order in which they are >>>>>>>> listed in the fsys_table[] array, and since UFS is listed >>>>>>>> before ZFS, GRUB thinks that a UFS filesystem exists in the >>>>>>>> slice actually containing the ZFS root filesystem (and fails >>>>>>>> trying to mount it, leaving it unable to locate the real root >>>>>>>> filesystem). 
A modified version of GRUB that modifies >>>>>>>> fsys_table by declaring the ZFS operations before the UFS >>>>>>>> operations confirms this hypothesis. >>>>>>>> >>>>>>>> Therefore, a valid workaround destroys the UFS magic, >>>>>>>> preventing both GRUB's and Solaris's UFS modules from >>>>>>>> recognizing the slice as a UFS filesystem. When GRUB's UFS >>>>>>>> code fails to find a valid UFS filesystem, the ZFS module is >>>>>>>> subsequently tried and is able to successfully mount the >>>>>>>> filesystem. >>>>>>>> >>>>>>>> *** (#3 of 5): 2008-11-11 03:23:04 GMT+00:00 seth.goldberg at sun.com >>>>>>>> *** Last Edit: 2008-11-11 03:45:05 GMT+00:00 seth.goldberg at sun.com >>>>>>>> >>>>>>>> I think there are two separate issues here. The UFS label >>>>>>>> appears to be one. The signature for this bug is that at grub >>>>>>>> prompt, typing root - generates the UFS filesystem info. >>>>>>>> However there is a secondary bug where after installation, one >>>>>>>> gets a grub prompt. Typing root command at the grub prompmt >>>>>>>> generates - unknown file system. In this case no UFS >>>>>>>> filesystems were detected or mounted. The workaround for this >>>>>>>> has been to run zpool import. This still needs to be >>>>>>>> investigated. >>>>>>>> >>>>>>>> *** (#4 of 5): 2008-11-12 00:04:16 GMT+00:00 >>>>>>>> sanjay.nadkarni at sun.com >>>>>>>> >>>>>>>> We were able to recreate the grub failure where typing root at >>>>>>>> the prompt returns unknown file system. This was on a Fujistu >>>>>>>> LifeBook S7211. It was installed with installed with Vista. >>>>>>>> We then booted OpenSolaris and started the install. At the end >>>>>>>> of the installation we noted that the zfs label did not have >>>>>>>> devid information. >>>>>>>> >>>>>>>> We then loaded a simple program that would get the devid >>>>>>>> (devid_get). This failed with "Invalid argument". We then >>>>>>>> rebooted the liveCD again and reran this program and this time >>>>>>>> it printed out the device id. 
The disk is off a SATA >>>>>>>> controller. The driver that attached to this is ahci. The >>>>>>>> device is: 82801HBM/HEM. The disk is Fujitsu MHY2120BH >>>>>>>> >>>>>>>> *** (#5 of 5): 2008-11-12 02:43:18 GMT+00:00 >>>>>>>> sanjay.nadkarni at sun.com >>>>>>>> >>>>>>>> >>>>>>>> === *Public Comments* >>>>>>>> ======================================================== >>>>>>>> Following Bugzilla bugs were closed as duplicate of this issue: >>>>>>>> >>>>>>>> 4772 Cannot install OpenSolaris 2008.11 on VMware Server 2.0 >>>>>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4772 >>>>>>>> >>>>>>>> 4756 after reboot when finishing the installation, system can >>>>>>>> not boot >>>>>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4756 >>>>>>>> >>>>>>>> 4749 After installed opensolaris0811RC1 on Dell PowerEdge, >>>>>>>> can't boot from disk. >>>>>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4749 >>>>>>>> >>>>>>>> *** (#1 of 9): 2008-11-10 17:20:54 GMT+00:00 dave.miner at sun.com >>>>>>>> *** Last Edit: 2008-11-11 11:45:41 GMT+00:00 jan.damborsky at sun.com >>>>>>>> >>>>>>>> zpool import doesn't help for me, nor would I expect it to >>>>>>>> (it's a mystery >>>>>>>> why it seems to). Clearing the UFS magic helps. >>>>>>>> >>>>>>>> Looking further, I find that the data on disk at 8k seems to still >>>>>>>> be a UFS superblock, not a zfs vdev_boot_header_t, which >>>>>>>> doesn't make >>>>>>>> sense to me; in any ZFS initialization scheme, one would expect >>>>>>>> all parts >>>>>>>> of the label to be completely written. >>>>>>>> >>>>>>>> The expected vdev_boot_header_t appears at the label copy at >>>>>>>> 256K+8K, as >>>>>>>> expected. >>>>>>>> >>>>>>>> *** (#2 of 9): 2008-11-11 04:39:09 GMT+00:00 dan.mick at sun.com >>>>>>>> >>>>>>>> It appears that ZFS doesn't validate that first 8k (the >>>>>>>> vdev_boot_header), so >>>>>>>> that explains why the kernel was happy even with a UFS >>>>>>>> superblock where the >>>>>>>> vdev_boot_header was supposed to be. 
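The observation that a UFS superblock survives at 8k can be tested without a hex dump. The Python below is a rough sketch under stated assumptions, not something taken from the bug report. It assumes the UFS magic number is 0x011954, stored little-endian on x86, at byte offset 9564 of the slice, which is the offset the 'dd' workaround recorded in this report zeroes out. The names `has_ufs_magic` and `read_slice_head` are hypothetical.

```python
import struct

UFS_MAGIC_OFFSET = 9564   # byte offset targeted by the dd workaround in this report
UFS_MAGIC = 0x011954      # assumed classic UFS magic value; other variants are not checked


def has_ufs_magic(image):
    """Return True if the 4 bytes at offset 9564 hold the assumed UFS magic."""
    field = image[UFS_MAGIC_OFFSET:UFS_MAGIC_OFFSET + 4]
    if len(field) < 4:
        # Image too short to contain the field at all.
        return False
    (value,) = struct.unpack("<I", field)  # little-endian, as on x86
    return value == UFS_MAGIC


def read_slice_head(path, length=256 * 1024):
    """Read the first 256 KB of a slice, e.g. a raw device or a saved dd image."""
    with open(path, "rb") as dev:
        return dev.read(length)
```

A typical use would be `has_ufs_magic(read_slice_head("first.256kb"))` on an image collected with dd, which avoids touching the live device.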
>>>>>>>> >>>>>>>> Also, the last few bits of the 8k block in question seem to >>>>>>>> contain a >>>>>>>> zio_block_tail_t (i.e. a zbt_magic and a zbt_cksum), so it >>>>>>>> seems this block >>>>>>>> was written by ZFS sometime in the past. >>>>>>>> Possible theories: 1) the ZFS initialization somehow skipped >>>>>>>> this 8k header, >>>>>>>> or 2) somehow the 8k superblock was rewritten over the block >>>>>>>> after ZFS initialized it. >>>>>>>> >>>>>>>> *** (#3 of 9): 2008-11-11 04:49:57 GMT+00:00 dan.mick at sun.com >>>>>>>> >>>>>>>> Another possible theory: could this be the superblock flush >>>>>>>> from a still-mounted UFS being shut down? >>>>>>>> >>>>>>>> (The block was correct until after the OpenSolaris installer >>>>>>>> said it was done, >>>>>>>> and waited for me to press a button to reboot. I suspect >>>>>>>> the original UFS was mounted and not unmounted before the ZFS >>>>>>>> creation, >>>>>>>> so they both think they own the device.) >>>>>>>> >>>>>>>> Supporting evidence: the "last mounted" path in the superblock >>>>>>>> is "/mnt/solaris0". >>>>>>>> >>>>>>>> I suspect the cause of this bug is a UFS that's mounted and >>>>>>>> should be >>>>>>>> unmounted by the installer before ZFS creation. >>>>>>>> >>>>>>>> What's the right category/subcategory for Caiman? >>>>>>>> >>>>>>>> *** (#4 of 9): 2008-11-11 07:34:58 GMT+00:00 dan.mick at sun.com >>>>>>>> >>>>>>>> The live CD has historically automatically mounted up any UFS >>>>>>>> file systems that it found, going back to Belenix. Interesting >>>>>>>> that this is just now a problem, but it probably is a result of >>>>>>>> switching to ZFS for swap, as up until build 96 we always >>>>>>>> created a swap slice at the start of the disk, which it appears >>>>>>>> would have masked this problem. 
>>>>>>>> >>>>>>>> *** (#5 of 9): 2008-11-11 15:02:03 GMT+00:00 dave.miner at sun.com >>>>>>>> >>>>>>>> Installer takes care of releasing the target device before >>>>>>>> Target Instantiation >>>>>>>> phase is launched. Among other things, it >>>>>>>> >>>>>>>> * releases all swap devices created on target disk >>>>>>>> * unmounts whatever is mounted on target disk >>>>>>>> >>>>>>>> For the latter, /etc/mnttab is read and if there is mounted >>>>>>>> device which is part of >>>>>>>> the target disk, installer tries to unmount it. >>>>>>>> >>>>>>>> The problem is after fix for Bugzilla bug 30 was integrated, >>>>>>>> UFS filesystems are >>>>>>>> mounted with '-o m' option which causes the filesystem being >>>>>>>> mounted without making >>>>>>>> entry in /etc/mnttab. Then mountpoints are hidden, installer >>>>>>>> can't see those and >>>>>>>> doesn't unmount them. >>>>>>>> >>>>>>>> That said, this explains UFS part of the problem (when 'dd' >>>>>>>> workaround works), >>>>>>>> but doesn't seems to be related to ZFS part of the issue, when >>>>>>>> 'zpool import' workaround helped. >>>>>>>> >>>>>>>> *** (#6 of 9): 2008-11-11 16:25:09 GMT+00:00 jan.damborsky at sun.com >>>>>>>> *** Last Edit: 2008-11-11 16:34:29 GMT+00:00 jan.damborsky at sun.com >>>>>>>> >>>>>>>> We should probably file leave this bug to resolve zpool create >>>>>>>> not removing evidence of the >>>>>>>> previous ufs fs, and file another one to chase down the other >>>>>>>> issue(s?). >>>>>>>> >>>>>>>> Chris, if you run zbd -l on you virgin device, are you missing >>>>>>>> any zfs properties? The reader >>>>>>>> in GRUB pretty much gives up if things like the devid aren't set. >>>>>>>> >>>>>>>> *** (#7 of 9): 2008-11-11 19:30:56 GMT+00:00 >>>>>>>> jan.setje-eilers at sun.com >>>>>>>> >>>>>>>> Concur that Chris' problem is different; the UFS superblock >>>>>>>> does not exist in >>>>>>>> the first 256kb attached to the bug. 
It appears as though >>>>>>>> phys_path and devid >>>>>>>> are present, although it's difficult to be sure. We should >>>>>>>> probably see if we can >>>>>>>> send a debug version of Grub to Chris, with installation >>>>>>>> instructions, to see >>>>>>>> why it seems unable to find the zfs. >>>>>>>> >>>>>>>> *** (#8 of 9): 2008-11-11 22:16:50 GMT+00:00 dan.mick at sun.com >>>>>>>> >>>>>>>> The root cause of 'UFS part' of this problem is in 'livecd >>>>>>>> code' and is tracked by >>>>>>>> following Bugzilla bug: >>>>>>>> >>>>>>>> 4675 Fix for bug 30 causes ZFS label to be mangled - ending up >>>>>>>> in GRUB prompt after installing OpenSolaris >>>>>>>> >>>>>>>> Please feel free to use this bug (6769487) for tracking other >>>>>>>> part(s) of the problem. >>>>>>>> Resetting category to solaris/kernel/zfs and Status to >>>>>>>> 'Dispatched'. >>>>>>>> >>>>>>>> *** (#9 of 9): 2008-11-12 12:46:21 GMT+00:00 jan.damborsky at sun.com >>>>>>>> >>>>>>>> >>>>>>>> === *Comments* >>>>>>>> =============================================================== >>>>>>>> Moved to public comments. >>>>>>>> >>>>>>>> *** (#1 of 6): 2008-11-10 17:04:10 GMT+00:00 jan.damborsky at sun.com >>>>>>>> *** Last Edit: 2008-11-10 17:20:54 GMT+00:00 dave.miner at sun.com >>>>>>>> >>>>>>>> Same situation (without zfs) on: >>>>>>>> White Box based on Intel DG33TL motherboard with ICH9R chipset, >>>>>>>> 2Gb memory, 3 SATA drives, 1 SATA CD/DVD, Intel graphics. >>>>>>>> >>>>>>>> *** (#2 of 6): 2008-11-10 22:52:23 GMT+00:00 pawel.wojcik at sun.com >>>>>>>> >>>>>>>> Workaround #1 does not cause the system to boot properly on the >>>>>>>> system I tried installing (that seems to be consistent with >>>>>>>> what others are reporting in the opensolaris defect report), >>>>>>>> but workaround #2 DOES. 
>>>>>>>> >>>>>>>> *** (#3 of 6): 2008-11-11 01:56:43 GMT+00:00 seth.goldberg at sun.com >>>>>>>> *** Last Edit: 2008-11-11 03:41:48 GMT+00:00 seth.goldberg at sun.com >>>>>>>> >>>>>>>> I've reproduced this on a "virgin" disk, see SR record against >>>>>>>> this bug, (had to purchase a new spindle as previous disk >>>>>>>> failed and new disk removed supplier packaging was inserted >>>>>>>> into laptop and then 2008.11 CD booted). >>>>>>>> >>>>>>>> After a discussion with Dan Mick on email data requested by dan >>>>>>>> was capture root command from grub prompt: >>>>>>>> >>>>>>>> (hd0,0,a): Filesystem type is zfs, partition type 0xbf >>>>>>>> >>>>>>>> Also, can you boot from the CD and collect the first 256kb of >>>>>>>> the disk, with >>>>>>>> >>>>>>>> dd if=<your s0 slice here> of=first.256kb bs=256k count=1 >>>>>>>> >>>>>>>> This is attached. >>>>>>>> >>>>>>>> *** (#4 of 6): 2008-11-11 10:46:29 GMT+00:00 >>>>>>>> christopher.armes at sun.com >>>>>>>> >>>>>>>> Saw this bug on several machines today which I was helping to >>>>>>>> install. One person did a reinstall and it worked fine the >>>>>>>> second time as some reported. >>>>>>>> >>>>>>>> 2 other machines could use the workaround which Lin Ling >>>>>>>> pointed us to with this bug. That did save a couple folks from >>>>>>>> having to reinstall, so was very helpful. Thanks Lin! Of the >>>>>>>> installs of people that installed to a hard drive (i.e., not >>>>>>>> within VirtualBox), about 12 systems, we saw this on 3 >>>>>>>> machines, so about 25% of the systems in this small sampling. >>>>>>>> >>>>>>>> *** (#5 of 6): 2008-11-12 09:58:01 GMT+00:00 alan.duboff at sun.com >>>>>>>> >>>>>>>> Moved to public comments. 
>>>>>>>>
>>>>>>>> *** (#6 of 6): 2008-11-12 12:43:18 GMT+00:00 jan.damborsky at sun.com
>>>>>>>> *** Last Edit: 2008-11-12 12:46:43 GMT+00:00 jan.damborsky at sun.com
>>>>>>>>
>>>>>>>>
>>>>>>>> === *Evaluation* =============================================================
>>>>>>>> See Description.
>>>>>>>>
>>>>>>>> *** (#1 of 4): 2008-11-11 03:23:04 GMT+00:00 seth.goldberg at sun.com
>>>>>>>>
>>>>>>>> remove mislead evaluation.
>>>>>>>>
>>>>>>>> *** (#2 of 4): 2008-11-11 21:45:12 GMT+00:00 lin.ling at sun.com
>>>>>>>> *** Last Edit: 2008-11-11 23:16:07 GMT+00:00 lin.ling at sun.com
>>>>>>>>
>>>>>>>> What? No, read the public comments. The problem is that the
>>>>>>>> UFS filesystem is still mounted as the installer lays down the
>>>>>>>> ZFS. Then, on reboot, the UFS, as it's syncing, writes its
>>>>>>>> superblock back to the filesystem it thinks it owns,
>>>>>>>> over the top of the now-ZFS-owned space.
>>>>>>>>
>>>>>>>> The installer must ensure that other filesystems are not
>>>>>>>> mounted on the slice where it's creating the ZFS rpool.
>>>>>>>>
>>>>>>>> *** (#3 of 4): 2008-11-11 22:11:35 GMT+00:00 dan.mick at sun.com
>>>>>>>>
>>>>>>>> You are right. I misunderstood.
>>>>>>>> George Wilson just corrected me that 'zpool create' indeed
>>>>>>>> clears the space correctly:
>>>>>>>>
>>>>>>>> vdev_label_init() {
>>>>>>>>     :
>>>>>>>>     vp = zio_buf_alloc(sizeof (vdev_phys_t));
>>>>>>>>     bzero(vp, sizeof (vdev_phys_t));
>>>>>>>>     :
>>>>>>>>     bzero(vb, sizeof (vdev_boot_header_t));
>>>>>>>>     :
>>>>>>>> }
>>>>>>>>
>>>>>>>> Thanks for the clarification.
>>>>>>>>
>>>>>>>> *** (#4 of 4): 2008-11-11 22:49:04 GMT+00:00 lin.ling at sun.com
>>>>>>>>
>>>>>>>>
>>>>>>>> === *Suggested Fix* ==========================================================
>>>>>>>>
>>>>>>>> === *Workaround* =============================================================
>>>>>>>> [1] Boot LiveCD
>>>>>>>>     $ pfexec su -
>>>>>>>>     # zpool import -f rpool
>>>>>>>>
>>>>>>>> *** (#1 of 3): 2008-11-10 10:27:21 GMT+00:00 jan.damborsky at sun.com
>>>>>>>>
>>>>>>>> ZERO OUT The leftover UFS magic:
>>>>>>>>
>>>>>>>> For GNU dd:
>>>>>>>>     dd if=/dev/zero of=/dev/dsk/<SLICE> bs=1 count=4 seek=9564
>>>>>>>>
>>>>>>>> (e.g.:
>>>>>>>>     dd if=/dev/zero of=/dev/dsk/c4t0d0s0 bs=1 count=4 seek=9564
>>>>>>>> )
>>>>>>>>
>>>>>>>> *** (#2 of 3): 2008-11-11 03:36:55 GMT+00:00 seth.goldberg at sun.com
>>>>>>>>
>>>>>>>> I did the following in dd to work around the issue:
>>>>>>>>
>>>>>>>> root@opensolaris:~# dd if=/dev/zero of=/dev/dsk/c1t0d0s0 bs=1 count=4 seek=9564
>>>>>>>> 4+0 records in
>>>>>>>> 4+0 records out
>>>>>>>> 4 bytes (4 B) copied, 0.0394095 s, 0.1 kB/s
>>>>>>>> root@opensolaris:~#
>>>>>>>>
>>>>>>>> *** (#3 of 3): 2008-11-11 19:07:04 GMT+00:00 mary.ding at sun.com
>>>>>>>>
>>>>>>>>
>>>>>>>> === *Justification* ==========================================================
>>>>>>>> Priority changed from [] to [1-Very High]
>>>>>>>> Installed OpenSolaris 2008.11 doesn't boot
>>>>>>>> jan.damborsky at sun.com 2008-11-10 10:27:21 GMT
>>>>>>>>
>>>>>>>> *** (#1 of 1): 2008-11-10 10:27:21 GMT+00:00 jan.damborsky at sun.com
>>>>>>>>
>>>>>>>>
>>>>>>>> === *Additional Details* =====================================================
>>>>>>>> Targeted Release:
>>>>>>>> Commit To Fix In Build:
>>>>>>>> Fixed In Build:
>>>>>>>> Integrated In Build:
>>>>>>>> Verified In Build:
>>>>>>>> See Also: 6769534
>>>>>>>> Duplicate of:
>>>>>>>> Hooks:
>>>>>>>> Hook1:
>>>>>>>> Hook2:
>>>>>>>> Hook3:
>>>>>>>> Hook4:
>>>>>>>> Hook5:
>>>>>>>> Hook6:
>>>>>>>> Interest List:
>>>>>>>> dan.mick at sun.com, dave.miner at sun.com,
david.comay at sun.com, >>>>>>>> frank.batschulat at sun.com, kerberos-iteam at Sun.COM, >>>>>>>> lin.ling at sun.com, nick.todd at sun.com, peter.dennis at sun.com, >>>>>>>> plus1tb at sun.com, sdg at sun.com, si-bugs at sun.com, sst-prg at >>>>>>>> sun.com, >>>>>>>> tomas.hurka at sun.com >>>>>>>> Program Management: New Defect >>>>>>>> Root Cause: Is a Security Vulnerability?: No >>>>>>>> Fix Affects Documentation: No >>>>>>>> Fix Affects Localization: No >>>>>>>> Reported by: >>>>>>>> === *History* >>>>>>>> ================================================================ >>>>>>>> Date Submitted: 2008-11-10 10:27:21 GMT+00:00 >>>>>>>> Submitted By: jan.damborsky at sun.com >>>>>>>> >>>>>>>> Status Changed Date Updated Updated By >>>>>>>> 3-Accepted 2008-11-10 23:59:05 GMT+00:00 >>>>>>>> lin.ling at sun.com >>>>>>>> 5-Cause Known 2008-11-11 03:23:04 GMT+00:00 >>>>>>>> seth.goldberg at sun.com >>>>>>>> 1-Dispatched 2008-11-12 12:43:18 GMT+00:00 >>>>>>>> jan.damborsky at sun.com >>>>>>>> >>>>>>>> >>>>>>>> === *Solution* >>>>>>>> =============================================================== >>>>>>>> >>>>>>>> >>>>>>>> === *Service Request* >>>>>>>> ======================================================== >>>>>>>> ID: 1-493023606 >>>>>>>> Customer: >>>>>>>> Account Name: Sun Microsystems >>>>>>>> Customer Contact: Customer Contact Role: >>>>>>>> D-Development >>>>>>>> Customer Contact Type: I-Internal (SMI) Customer >>>>>>>> Impact: Critical >>>>>>>> Functionality: Primary >>>>>>>> Severity: 1 >>>>>>>> Synopsis: Product Name: solaris >>>>>>>> Product Release: osol_2008.11 >>>>>>>> Product Build: Operating System: osol_2008.11 >>>>>>>> Hardware: generic >>>>>>>> Reference Number: Sun Contact: >>>>>>>> jan.damborsky at sun.com >>>>>>>> Status: Open >>>>>>>> Source: BugTraq2 >>>>>>>> Reproducible: Submitted By: jan.damborsky at sun.com >>>>>>>> Submitted Date: 2008-11-10 10:27:21 GMT+00:00 >>>>>>>> Description: >>>>>>>> >>>>>>>> === *Service Request* >>>>>>>> 
======================================================== >>>>>>>> ID: 1-493053806 >>>>>>>> Customer: >>>>>>>> Account Name: SUN MicroSystems >>>>>>>> Customer Contact: Customer Contact Role: >>>>>>>> D-Development >>>>>>>> Customer Contact Type: I-Internal (SMI) Customer >>>>>>>> Impact: Critical >>>>>>>> Functionality: Primary >>>>>>>> Severity: 1 >>>>>>>> Synopsis: After installing 2008.11RC1b boot from hard >>>>>>>> disk fails >>>>>>>> Product Name: solaris >>>>>>>> Product Release: osol_2008.11 >>>>>>>> Product Build: Operating System: osol_2008.11 >>>>>>>> Hardware: x86 >>>>>>>> Reference Number: Sun Contact: >>>>>>>> christopher.armes at sun.com >>>>>>>> Status: Open >>>>>>>> Source: BugTraq2 >>>>>>>> Reproducible: Always >>>>>>>> Submitted By: christopher.armes at sun.com >>>>>>>> Submitted Date: 2008-11-10 12:54:24 GMT+00:00 >>>>>>>> Description: Booting from the livecd and then selecting >>>>>>>> install works fine upon reboot with either cd in and selecting >>>>>>>> boot from hard disk or without cd allowing grub menu to boot, >>>>>>>> causes boot to fail drops system to "grub>" prompt >>>>>>>> >>>>>>>> >>>>>>>> === *Service Request* >>>>>>>> ======================================================== >>>>>>>> ID: 1-493177108 >>>>>>>> Customer: >>>>>>>> Account Name: SUN >>>>>>>> Customer Contact: Customer Contact Role: >>>>>>>> D-Development >>>>>>>> Customer Contact Type: I-Internal (SMI) Customer >>>>>>>> Impact: Critical >>>>>>>> Functionality: Primary >>>>>>>> Severity: 1 >>>>>>>> Synopsis: Product Name: solaris >>>>>>>> Product Release: osol_2008.11 >>>>>>>> Product Build: osol_2008.11 >>>>>>>> Operating System: osol_2008.11 >>>>>>>> Hardware: amd >>>>>>>> Reference Number: Sun Contact: >>>>>>>> garrett.damore at sun.com >>>>>>>> Status: Source: BugTraq2 >>>>>>>> Reproducible: Submitted By: garrett.damore at sun.com >>>>>>>> Submitted Date: 2008-11-10 20:16:41 GMT+00:00 >>>>>>>> Description: I hit this when updating my Ultra 20 >>>>>>>> (original 
model, not M2) from b77ish to OSOL 2008.11rc1b >>>>>>>> >>>>>>>> System has 1.5GB ram, SATA hard disk. >>>>>>>> >>>>>>>> >>>>>>>> === *Service Request* >>>>>>>> ======================================================== >>>>>>>> ID: 1-493257401 >>>>>>>> Customer: >>>>>>>> Account Name: Sun Microsystems, Inc. >>>>>>>> Customer Contact: Customer Contact Role: >>>>>>>> D-Development >>>>>>>> Customer Contact Type: I-Internal (SMI) Customer >>>>>>>> Impact: Critical >>>>>>>> Functionality: Primary >>>>>>>> Severity: 1 >>>>>>>> Synopsis: Product Name: solaris >>>>>>>> Product Release: osol_2008.11 >>>>>>>> Product Build: osol_2008.11 >>>>>>>> Operating System: osol_2008.11 >>>>>>>> Hardware: generic_ibm_compatible >>>>>>>> Reference Number: Sun Contact: dana.myers at sun.com >>>>>>>> Status: Open >>>>>>>> Source: BugTraq2 >>>>>>>> Reproducible: Submitted By: dana.myers at sun.com >>>>>>>> Submitted Date: 2008-11-10 22:34:45 GMT+00:00 >>>>>>>> Description: >>>>>>>> >>>>>>>> === *Service Request* >>>>>>>> ======================================================== >>>>>>>> ID: 1-493265801 >>>>>>>> Customer: >>>>>>>> Account Name: Sun Microsystems >>>>>>>> Customer Contact: pawel.wojcik at sun.com >>>>>>>> Customer Contact Role: D-Development >>>>>>>> Customer Contact Type: I-Internal (SMI) Customer >>>>>>>> Impact: Critical >>>>>>>> Functionality: Primary >>>>>>>> Severity: 1 >>>>>>>> Synopsis: Product Name: solaris >>>>>>>> Product Release: osol_2008.11 >>>>>>>> Product Build: osol_2008.11 >>>>>>>> Operating System: solaris >>>>>>>> Hardware: intel >>>>>>>> Reference Number: Sun Contact: >>>>>>>> pawel.wojcik at sun.com >>>>>>>> Status: Source: BugTraq2 >>>>>>>> Reproducible: Submitted By: pawel.wojcik at sun.com >>>>>>>> Submitted Date: 2008-11-10 22:50:53 GMT+00:00 >>>>>>>> Description: >>>>>>>> >>>>>>>> === *Activity* >>>>>>>> =============================================================== >>>>>>>> >>>>>>>> >>>>>>>> === *Multiple Release (MR) Cluster* - 0 >>>>>>>> 
====================================== >>>>>>>> >>>>>>>> >>>>>>>> === *Escalations* >>>>>>>> ============================================================ >>>>>>>>
