George,

George Wilson wrote:
> Jan,
>
> It seems like the problem is not with ZFS but with the device driver. 
> If the driver is failing to provide the devid then ZFS is just going 
> to be a victim.

I agree that this is likely what we are encountering
with respect to the 'devid' problem here.


> I would recommend that we change the synopsis to devid_get() fails 
> with "Invalid argument" and pass this to the driver folks.

I will let Sanjay comment on this, since he has done
some more investigation recently.

> Do you know if it's always the same driver?

I can only reproduce it on one system, which has a SATA drive
connected to a controller handled by the nv_sata(7D) driver. I think
Sanjay also encountered the problem on a system with a SATA disk.

Thank you,
Jan

>
> Thanks,
> George
>
> jan damborsky wrote:
>> Hi George,
>>
>>
>> George Wilson wrote:
>>> Jan,
>>>
>>> So who is working the UFS issue and how is that being tracked?
>>
>> In general, bugs in the OpenSolaris Caiman installer are tracked in
>> Bugzilla at defect.opensolaris.org. This is preferred over filing
>> bugs in Bugster.
>> This particular problem is tracked by the following bug:
>>
>> 4675 Fix for bug 30 causes ZFS label to be mangled - ending up in 
>> GRUB prompt after installing OpenSolaris
>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4675
>>
>> Sanjay Nadkarni is assigned to this bug (CCing him).
>>
>>> I would recommend that we keep this bug as the UFS/install issue and 
>>> create a new bug and send that to me.
>>
>> As pointed out above, Bugzilla is the preferred database for tracking
>> issues in the Caiman installer.
>>
>> Please note that 6769487 was originally filed to track the problem
>> where GRUB can't access the ZFS filesystem because 'devid' is not
>> present in the ZFS label.
>>
>> It was later overloaded with the 'UFS' problem.
>>
>>> Can you move the descriptions below from this bug and add them to 
>>> the new one?
>>
>> To be honest, since the installer part of the problem related to UFS
>> is tracked by 4675, I don't see why we shouldn't continue to use
>> 6769487 to track the issue this bug was initially filed for, and I
>> think we might lose some context if the ZFS-related information is
>> moved from 6769487 to a new bug.
>> That said, if you think it might be helpful, please let me know and
>> I will try to capture all the information from 6769487 that I think
>> is relevant to the ZFS part in a new bug.
>>
>>> Also, since you can reproduce this, can you tell me exactly how or
>>> point me at a system which I can log into to debug?
>>
>> Sure, the machine can be accessed via 'ssh', but since it is not
>> directly accessible from SWAN (it is behind NAT),
>> I will provide you with instructions on how to access it.
>> Unfortunately, it doesn't have console access.
>>
>> Please let me know in which state you need the
>> machine: right after the installation has finished, but before reboot?
>>
>> Unfortunately, following the procedure itself doesn't seem to be
>> sufficient for reproducing the problem :-( I tried exactly the
>> same steps on other bare metal as well as in a virtual environment,
>> but without success.
>>
>>
>>>
>>> I want to make sure we don't lose sight of the UFS issue and this 
>>> bug has already gone down to root cause so let's not overload this 
>>> bug any further.
>>
>> The UFS part of the problem is being solved right now (please feel
>> free to monitor bug 4675 for progress and add anything you might
>> consider relevant to that issue).
>>
>> Thank you,
>> Jan
>>
>>>
>>> Thanks,
>>> George
>>>
>>> jan damborsky wrote:
>>>> Hi George,
>>>>
>>>> there are at least two parts of this problem:
>>>>
>>>> [1] UFS one
>>>> This is what you are referring to and it is being tracked by 
>>>> Bugzilla bug 4675.
>>>> In that case workaround #2 helps to "solve" the problem.
>>>>
>>>> [2] ZFS one
>>>> Please see original description #1. I am able to reproduce that at
>>>> will on a system which didn't contain any UFS filesystem, and thus
>>>> [1] is not applicable here. 'zpool import' helps in this case.
>>>>
>>>> Also please see:
>>>> * description #4
>>>> * description #5
>>>> * public comments #8
>>>> * comments #6
>>>>
>>>> People are apparently encountering this problem in
>>>> other configurations (e.g. when using a virgin disk
>>>> or installing on a system containing only Windows).
>>>>
>>>> I am not stating that this is in fact a problem in ZFS, as it might
>>>> be related, for example, to device driver code, but at this point it
>>>> seems to me that the ZFS team is best placed to move
>>>> things forward, as GRUB can't read menu.lst from the ZFS
>>>> filesystem.
>>>>
>>>> Please let me know if you have any questions or need more
>>>> information.
>>>>
>>>> Thank you,
>>>> Jan
>>>>
>>>>
>>>> George Wilson wrote:
>>>>> Jan,
>>>>>
>>>>> I don't understand how this is a ZFS problem. I thought from the 
>>>>> evaluation that the issue is that UFS and ZFS are sharing the same 
>>>>> block and this was being caused by the fact that the livecd had 
>>>>> mounted a UFS filesystem as part of the installation. Could you 
>>>>> clarify?
>>>>>
>>>>> Thanks,
>>>>> George
>>>>>
>>>>> Jan.Damborsky at Sun.COM wrote:
>>>>>>                         Sun Confidential: Internal only
>>>>>>
>>>>>> *Synopsis*: Ended up in 'grub>' prompt after installation of 
>>>>>> OpenSolaris 2008.11 (build 101a)
>>>>>>
>>>>>> CrPrint: http://bt2ws.central.sun.com/CrPrint?id=6769487
>>>>>> Monaco: http://monaco.sfbay.sun.com/detail.jsf?cr=6769487
>>>>>>
>>>>>> Due to a change of Responsible manager requested by 
>>>>>> jan.damborsky at sun.com,
>>>>>> david.brittle at sun.com is now the responsible manager for:
>>>>>>
>>>>>> Due to a change requested by jan.damborsky at sun.com,
>>>>>> this CR is being redispatched:
>>>>>>
>>>>>> This is a high priority CR and requires your immediate attention.
>>>>>> Please evaluate it as soon as possible.  Thank you.
>>>>>>
>>>>>> CR 6769487 changed on Nov 12 2008 by jan.damborsky at sun.com
>>>>>>
>>>>>> === Field ============ === New Value ============= === Old Value =============
>>>>>>
>>>>>> Category               kernel                      opensolaris
>>>>>> Comments               New Note
>>>>>> Comments               New Note                    Old Note
>>>>>> Comments               New Note                    Old Note
>>>>>> Public Comments        New Note
>>>>>> Responsible Manager    david.brittle at sun.com    eric.ray at sun.com
>>>>>> Status                 1-Dispatched                5-Cause Known
>>>>>> SubCategory            zfs                         livecd
>>>>>> ====================== =========================== ===========================
>>>>>>
>>>>>>      *Change Request ID*: 6769487
>>>>>>
>>>>>> *Synopsis*: Ended up in 'grub>' prompt after installation of 
>>>>>> OpenSolaris 2008.11 (build 101a)
>>>>>>
>>>>>>   Product: solaris
>>>>>>   Category: kernel
>>>>>>   Subcategory: zfs
>>>>>>   Type: Defect
>>>>>>   Subtype: Functionality
>>>>>>   Status: 1-Dispatched
>>>>>>   Substatus:
>>>>>>   Priority: 1-Very High
>>>>>>   Introduced In Release:
>>>>>>   Introduced In Build:
>>>>>>   Responsible Manager: david.brittle at sun.com
>>>>>>   Responsible Engineer:
>>>>>>   Initial Evaluator: zfs-team at sun.com
>>>>>>   Keywords:
>>>>>> === *Description* 
>>>>>> ============================================================
>>>>>> When testing installation with recent OpenSolaris builds, we have
>>>>>> found that in some cases people end up at the GRUB prompt after
>>>>>> installation; it seems that menu.lst can't be accessed for some
>>>>>> reason. For now, a number of Bugzilla bugs seem to describe the
>>>>>> same manifestation of the problem, whose root cause has not been
>>>>>> identified yet:
>>>>>>
>>>>>> 4051 opensolaris b99b/b100a does not install on 1.5 TB disk or boot fails after install
>>>>>> 4591 Install failure on a Sun Fire X4240 with Opensolaris 200811
>>>>>> 4161 no grub in 2008.11 Development Builds (comment #20, comment #31)
>>>>>> 4760 Enter grub after installing 2008.11 RC 1
>>>>>> ...
>>>>>>
>>>>>> I also hit this problem when testing the Automated Installer (it is
>>>>>> part of the Caiman project and will replace the current jumpstart
>>>>>> install technology). I was able to make GRUB find 'menu.lst' just by
>>>>>> using the 'zpool import' command; please see below for the detailed
>>>>>> procedure.
>>>>>>
>>>>>>
>>>>>> configuration:
>>>>>> --------------
>>>>>> HW: Ultra 20, 1GB RAM, 1 250GB SATA drive
>>>>>> SW: Opensolaris build 100, 64bit mode
>>>>>>
>>>>>> steps used:
>>>>>> -----------
>>>>>> [1] OpenSolaris 100 installed using Automated Installer
>>>>>>    - Solaris 2 partition created during installation
>>>>>>
>>>>>> * partition configuration before installation:
>>>>>>
>>>>>> # fdisk -W - c2t0d0p0
>>>>>> ...
>>>>>> * Id    Act  Bhead  Bsect  Bcyl    Ehead  Esect  Ecyl    Rsect      Numsect
>>>>>>   192   0    0      1      1       254    63     1023    16065      22491000
>>>>>>
>>>>>> * partition configuration after installation:
>>>>>>
>>>>>> # fdisk -W - c2t0d0p0
>>>>>> ...
>>>>>> * Id    Act  Bhead  Bsect  Bcyl    Ehead  Esect  Ecyl    Rsect      Numsect
>>>>>>   192   0    0      1      1       254    63     1023    16065      22491000
>>>>>>   191   128  254    63     1023    254    63     1023    22507065   30000000
>>>>>>
>>>>>> [2] When I rebooted the system after the installation, I ended up
>>>>>> at the GRUB prompt:
>>>>>> grub> root
>>>>>> (hd0,1,a): Filesystem type unknown, partition type 0xbf
>>>>>>
>>>>>> grub> cat /rpool/boot/grub/menu.lst
>>>>>>
>>>>>> Error 17: Cannot mount selected partition
>>>>>>
>>>>>> grub>
>>>>>>
>>>>>> [3] I rebooted into AI and did 'zpool import'
>>>>>> # zdb -l /dev/rdsk/c2t0d0s0 > /tmp/zdb_before_import.txt (attached)
>>>>>> # zpool import -f rpool
>>>>>> # zdb -l /dev/rdsk/c2t0d0s0 > /tmp/zdb_after_import.txt (attached)
>>>>>> # diff /tmp/zdb_before_import.txt /tmp/zdb_after_import.txt
>>>>>> 7c7
>>>>>> <     txg=21
>>>>>> ---
>>>>>> >     txg=2675
>>>>>> 9c9
>>>>>> <     hostid=4741222
>>>>>> ---
>>>>>> >     hostid=4247690
>>>>>> 17a18
>>>>>> >         devid='id1,sd@f00c778e247ac7bd0000238460000/a'
>>>>>> 31c32
>>>>>> ...
>>>>>> # reboot
>>>>>>
>>>>>> [4] Now GRUB can access menu.lst and Solaris is booted
>>>>>>
>>>>>> hypothesis
>>>>>> ----------
>>>>>> It seems that for some reason, when the ZFS pool was created, the
>>>>>> 'devid' information was not added to the ZFS label.
>>>>>>
>>>>>> When 'zpool import' was called, 'devid' got populated.
>>>>>>
>>>>>> Looking at the GRUB ZFS plug-in, it seems that 'devid' (the
>>>>>> ZPOOL_CONFIG_DEVID attribute) is required in order to access the
>>>>>> ZFS filesystem:
>>>>>>
>>>>>> In grub/grub-0.95/stage2/fsys_zfs.c:
>>>>>>
>>>>>> vdev_get_bootpath()
>>>>>> {
>>>>>> ...
>>>>>>    if (strcmp(type, VDEV_TYPE_DISK) == 0) {
>>>>>>        if (vdev_validate(nv) != 0 ||
>>>>>>            (nvlist_lookup_value(nv, ZPOOL_CONFIG_PHYS_PATH,
>>>>>>            bootpath, DATA_TYPE_STRING, NULL) != 0) ||
>>>>>>            (nvlist_lookup_value(nv, ZPOOL_CONFIG_DEVID,
>>>>>>            devid, DATA_TYPE_STRING, NULL) != 0))
>>>>>>            return (ERR_NO_BOOTPATH);
>>>>>> ...
>>>>>> }
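
As a rough illustration of the check quoted above (a Python sketch, not
the GRUB source): a disk vdev only yields a boot path when both
'phys_path' and 'devid' are present in its label entry. The dictionary
layout, the function name get_bootpath and the sample phys_path value are
assumptions made for this example; mirror vdevs and vdev_validate() are
left out.

```python
ERR_NO_BOOTPATH = "ERR_NO_BOOTPATH"  # symbolic stand-in for GRUB's error code


def get_bootpath(vdev):
    """Return (phys_path, devid) for a disk vdev, else ERR_NO_BOOTPATH.

    `vdev` is a plain dict standing in for the label's nvlist (assumption).
    """
    if vdev.get("type") != "disk":
        return ERR_NO_BOOTPATH  # this sketch only models leaf disk vdevs
    phys_path = vdev.get("phys_path")
    devid = vdev.get("devid")
    # Both attributes must be present, mirroring the two lookups above.
    if phys_path is None or devid is None:
        return ERR_NO_BOOTPATH
    return (phys_path, devid)


# Hypothetical label contents before and after 'zpool import'.
label_before = {"type": "disk", "phys_path": "/hypothetical/phys/path:a"}
label_after = dict(label_before, devid="id1,sd@f00c778e247ac7bd0000238460000/a")

if __name__ == "__main__":
    print(get_bootpath(label_before))  # no devid, so no boot path
    print(get_bootpath(label_after))   # devid present, boot path returned
```

With a label that lacks 'devid', the sketch returns the error value, which
would match the 'grub>' prompt seen before 'zpool import' was run.
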
>>>>>>
>>>>>> additional observations:
>>>>>> ------------------------
>>>>>> [1] If 'devid' is populated during installation after 'zpool create'
>>>>>> operation, the problem doesn't occur.
>>>>>>
>>>>>> [2] When following the described procedure, the problem is
>>>>>> reproducible at will on the system where it was initially
>>>>>> reproduced (please see above for the configuration)
>>>>>>
>>>>>> [3] Other people reported this problem also for following 
>>>>>> configurations:
>>>>>> * vmware
>>>>>> * Sun Java Workstation W2100z with 2xOpteron2.4G 3G Mem
>>>>>>
>>>>>> [4] When installing into an existing Solaris2 partition containing a
>>>>>> Solaris instance, 'devid' is always populated and the problem doesn't
>>>>>> occur (it doesn't matter if the partition is marked 'active' or not).
>>>>>>
>>>>>> *** (#1 of 5): 2008-11-10 10:27:21 GMT+00:00 jan.damborsky at sun.com
>>>>>>
>>>>>> If the system previously ran Nevada (101a in my case), installing
>>>>>> OpenSolaris hits this issue when the existing partition is kept
>>>>>> rather than choosing the entire disk (I suspect this is what causes
>>>>>> the issue).
>>>>>> There is a diagnostic partition on the disk when Nevada is installed,
>>>>>> and OpenSolaris 2008.11 simply ends up at grub> as this CR describes.
>>>>>> When I then used the entire disk, the system booted fine.
>>>>>> But when I reinstalled with a size smaller than the entire disk,
>>>>>> GRUB had no problem, but GNOME could not start (it hangs endlessly).
>>>>>>
>>>>>> *** (#2 of 5): 2008-11-10 10:45:29 GMT+00:00 robin.guo at sun.com
>>>>>>
>>>>>> The root cause of this problem is the continued existence of UFS
>>>>>> filesystem structures on disk, even after the zfs filesystem is
>>>>>> created and is live.  Because ZFS did not destroy the UFS magic,
>>>>>> both GRUB and Solaris think there's a (horribly damaged) UFS
>>>>>> filesystem present on that slice.  (A WARNING is displayed at boot
>>>>>> time during OpenSolaris boot informing the user that
>>>>>> /mnt/solaris<N> (where <N> is a number) could not be mounted
>>>>>> because of filesystem problems; in reality, that slice is where
>>>>>> the zfs root is located.)
>>>>>>
>>>>>> In GRUB, since code that attempts to mount root does so by trying 
>>>>>> each filesystem module in the order in which they are listed in 
>>>>>> the fsys_table[] array, and since UFS is listed before ZFS, GRUB 
>>>>>> thinks that a UFS filesystem exists in the slice actually 
>>>>>> containing the ZFS root filesystem (and fails trying to mount it, 
>>>>>> leaving it unable to locate the real root filesystem).  A 
>>>>>> modified version of GRUB that modifies fsys_table by declaring 
>>>>>> the ZFS operations before the UFS operations confirms this 
>>>>>> hypothesis.
>>>>>>
>>>>>> Therefore, a valid workaround destroys the UFS magic, preventing 
>>>>>> both GRUB's and Solaris's UFS modules from recognizing the slice 
>>>>>> as a UFS filesystem.  When GRUB's UFS code fails to find a valid 
>>>>>> UFS filesystem, the ZFS module is subsequently tried and is able 
>>>>>> to successfully mount the filesystem.
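
The probe-order effect described above can be modelled in a few lines.
This is only a Python sketch of the idea, not GRUB's fsys_table code: the
probe functions, the slice_state dictionary and all names below are
hypothetical, and a probe here "succeeds" merely by recognising its
signature on disk.

```python
def ufs_probe(slice_state):
    # Hypothetical UFS probe: succeeds whenever the UFS magic is on disk,
    # even if the rest of the filesystem is long gone.
    return slice_state["ufs_magic"]


def zfs_probe(slice_state):
    # Hypothetical ZFS probe: succeeds when a usable ZFS label is present.
    return slice_state["zfs_label"]


def first_match(fsys_table, slice_state):
    """Return the name of the first filesystem module whose probe succeeds."""
    for name, probe in fsys_table:
        if probe(slice_state):
            return name
    return None


UFS_FIRST = [("ufs", ufs_probe), ("zfs", zfs_probe)]  # order described above
ZFS_FIRST = [("zfs", zfs_probe), ("ufs", ufs_probe)]  # order in the modified GRUB

stale_ufs = {"ufs_magic": True, "zfs_label": True}       # leftover magic on a ZFS slice
magic_cleared = {"ufs_magic": False, "zfs_label": True}  # after clearing the magic

if __name__ == "__main__":
    print(first_match(UFS_FIRST, stale_ufs))      # UFS claims the slice
    print(first_match(ZFS_FIRST, stale_ufs))      # ZFS is tried first
    print(first_match(UFS_FIRST, magic_cleared))  # UFS fails, ZFS is tried next
```

Either reordering the table or clearing the magic lets the ZFS module get
its turn, which is consistent with both observations in the comment above.
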
>>>>>>
>>>>>> *** (#3 of 5): 2008-11-11 03:23:04 GMT+00:00 seth.goldberg at sun.com
>>>>>> *** Last Edit: 2008-11-11 03:45:05 GMT+00:00 seth.goldberg at sun.com
>>>>>>
>>>>>> I think there are two separate issues here.  The UFS label
>>>>>> appears to be one. The signature for this bug is that at the grub
>>>>>> prompt, typing root generates the UFS filesystem info.
>>>>>> However, there is a second bug where, after installation, one
>>>>>> gets a grub prompt, and typing the root command at the grub prompt
>>>>>> generates: unknown file system. In this case no UFS filesystems
>>>>>> were detected or mounted.  The workaround for this has been to
>>>>>> run zpool import.  This still needs to be investigated.
>>>>>>
>>>>>> *** (#4 of 5): 2008-11-12 00:04:16 GMT+00:00 sanjay.nadkarni at sun.com
>>>>>>
>>>>>> We were able to recreate the grub failure where typing root at
>>>>>> the prompt returns unknown file system. This was on a Fujitsu
>>>>>> LifeBook S7211.  It was installed with Vista.  We
>>>>>> then booted OpenSolaris and started the install. At the end of
>>>>>> the installation we noted that the zfs label did not have devid
>>>>>> information.
>>>>>>
>>>>>> We then loaded a simple program that would get the devid 
>>>>>> (devid_get).  This failed with "Invalid argument".  We then 
>>>>>> rebooted the liveCD again and reran this program and this time it 
>>>>>> printed out the device id.  The disk is off a SATA controller.  
>>>>>> The driver that attached to this is ahci.  The device is: 
>>>>>> 82801HBM/HEM. The disk is Fujitsu MHY2120BH
>>>>>>
>>>>>> *** (#5 of 5): 2008-11-12 02:43:18 GMT+00:00 sanjay.nadkarni at sun.com
>>>>>>
>>>>>>
>>>>>> === *Public Comments* 
>>>>>> ========================================================
>>>>>> Following Bugzilla bugs were closed as duplicate of this issue:
>>>>>>
>>>>>> 4772 Cannot install OpenSolaris 2008.11 on VMware Server 2.0
>>>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4772
>>>>>>
>>>>>> 4756 after reboot when finishing the installation, system can not 
>>>>>> boot
>>>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4756
>>>>>>
>>>>>> 4749 After installed opensolaris0811RC1 on Dell PowerEdge, can't 
>>>>>> boot from disk.
>>>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4749
>>>>>>
>>>>>> *** (#1 of 9): 2008-11-10 17:20:54 GMT+00:00 dave.miner at sun.com
>>>>>> *** Last Edit: 2008-11-11 11:45:41 GMT+00:00 jan.damborsky at sun.com
>>>>>>
>>>>>> zpool import doesn't help for me, nor would I expect it to (it's 
>>>>>> a mystery
>>>>>> why it seems to).  Clearing the UFS magic helps.
>>>>>>
>>>>>> Looking further, I find that the data on disk at 8k seems to still
>>>>>> be a UFS superblock, not a zfs vdev_boot_header_t, which doesn't 
>>>>>> make
>>>>>> sense to me; in any ZFS initialization scheme, one would expect 
>>>>>> all parts
>>>>>> of the label to be completely written.
>>>>>>
>>>>>> The vdev_boot_header_t does appear in the label copy at 256K+8K,
>>>>>> as expected.
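
The offsets mentioned above can be put side by side. In this Python
sketch the 256K spacing between label copies and the 8K position of the
boot header inside a label are taken from the comment above; the 8K size
of the boot header region is an assumption (the follow-up comment speaks
of "that first 8k"), and all names are made up for illustration. Only the
two label copies at the front of the device are modelled.

```python
KIB = 1024
LABEL_SIZE = 256 * KIB           # distance between label copies ("256K+8K")
BOOT_HEADER_OFFSET = 8 * KIB     # boot header position inside a label
BOOT_HEADER_SIZE = 8 * KIB       # assumed size of the boot header region
UFS_SUPERBLOCK_OFFSET = 8 * KIB  # where the stale UFS superblock was seen


def boot_header_range(label_index):
    """Byte range [start, end) of the boot header in a front label copy."""
    start = label_index * LABEL_SIZE + BOOT_HEADER_OFFSET
    return (start, start + BOOT_HEADER_SIZE)


def overlaps_boot_header(offset, label_index):
    """True if `offset` falls inside that label copy's boot header."""
    start, end = boot_header_range(label_index)
    return start <= offset < end


if __name__ == "__main__":
    for index in (0, 1):
        print(index, boot_header_range(index),
              overlaps_boot_header(UFS_SUPERBLOCK_OFFSET, index))
```

Under these assumptions a UFS superblock at 8K lands exactly on the boot
header of the first label copy and leaves the second copy untouched,
which fits what was observed on disk.
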
>>>>>>
>>>>>> *** (#2 of 9): 2008-11-11 04:39:09 GMT+00:00 dan.mick at sun.com
>>>>>>
>>>>>> It appears that ZFS doesn't validate that first 8k (the 
>>>>>> vdev_boot_header), so
>>>>>> that explains why the kernel was happy even with a UFS superblock 
>>>>>> where the
>>>>>> vdev_boot_header was supposed to be.
>>>>>>
>>>>>> Also, the last few bits of the 8k block in question seem to 
>>>>>> contain a
>>>>>> zio_block_tail_t (i.e. a zbt_magic and a zbt_cksum), so it seems 
>>>>>> this block
>>>>>> was written by ZFS sometime in the past.
>>>>>> Possible theories:  1) the ZFS initialization somehow skipped 
>>>>>> this 8k header,
>>>>>> or 2) somehow the 8k superblock was rewritten over the block 
>>>>>> after ZFS initialized it.
>>>>>>
>>>>>> *** (#3 of 9): 2008-11-11 04:49:57 GMT+00:00 dan.mick at sun.com
>>>>>>
>>>>>> Another possible theory: could this be the superblock flush from 
>>>>>> a still-mounted UFS being shut down?
>>>>>>
>>>>>> (The block was correct until after the OpenSolaris installer said 
>>>>>> it was done,
>>>>>> and waited for me to press a button to reboot.  I suspect
>>>>>> the original UFS was mounted and not unmounted before the ZFS 
>>>>>> creation,
>>>>>> so they both think they own the device.)
>>>>>>
>>>>>> Supporting evidence: the "last mounted" path in the superblock is 
>>>>>> "/mnt/solaris0".
>>>>>>
>>>>>> I suspect the cause of this bug is a UFS that's mounted and 
>>>>>> should be
>>>>>> unmounted by the installer before ZFS creation.
>>>>>>
>>>>>> What's the right category/subcategory for Caiman?
>>>>>>
>>>>>> *** (#4 of 9): 2008-11-11 07:34:58 GMT+00:00 dan.mick at sun.com
>>>>>>
>>>>>> The live CD has historically automatically mounted up any UFS 
>>>>>> file systems that it found, going back to Belenix.  Interesting 
>>>>>> that this is just now a problem, but it probably is a result of 
>>>>>> switching to ZFS for swap, as up until build 96 we always created 
>>>>>> a swap slice at the start of the disk, which it appears would 
>>>>>> have masked this problem.
>>>>>>
>>>>>> *** (#5 of 9): 2008-11-11 15:02:03 GMT+00:00 dave.miner at sun.com
>>>>>>
>>>>>> The installer takes care of releasing the target device before the
>>>>>> Target Instantiation phase is launched. Among other things, it
>>>>>>
>>>>>> * releases all swap devices created on the target disk
>>>>>> * unmounts whatever is mounted on the target disk
>>>>>>
>>>>>> For the latter, /etc/mnttab is read and, if a mounted device is part
>>>>>> of the target disk, the installer tries to unmount it.
>>>>>>
>>>>>> The problem is that after the fix for Bugzilla bug 30 was integrated,
>>>>>> UFS filesystems are mounted with the '-o m' option, which causes the
>>>>>> filesystem to be mounted without making an entry in /etc/mnttab.
>>>>>> The mountpoints are then hidden; the installer can't see them and
>>>>>> doesn't unmount them.
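
To show why a mount that never reaches /etc/mnttab is invisible to this
kind of cleanup, here is a small Python sketch of an mnttab scan. It is
not the installer's code: the function name, the sample table contents
and the timestamps are hypothetical, and it assumes tab-separated mnttab
lines whose first two fields are the device and the mount point.

```python
import re


def mounts_on_disk(mnttab_text, disk):
    """Return mount points whose device is a slice or partition of `disk`.

    Assumes mnttab lines are tab-separated: special, mount point, fstype, ...
    """
    pattern = re.compile(r"^/dev/dsk/" + re.escape(disk) + r"[sp]\d+$")
    found = []
    for line in mnttab_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and pattern.match(fields[0]):
            found.append(fields[1])
    return found


# Hypothetical mnttab contents. In the second case the UFS mount was made
# with '-o m', so it never appears in the table at all.
VISIBLE = "/dev/dsk/c2t0d0s0\t/mnt/solaris0\tufs\trw\t1226400000\n"
HIDDEN = "swap\t/tmp\ttmpfs\txattr\t1226400000\n"

if __name__ == "__main__":
    print(mounts_on_disk(VISIBLE, "c2t0d0"))  # the mount is found and can be unmounted
    print(mounts_on_disk(HIDDEN, "c2t0d0"))   # nothing to unmount is reported
```

A scan like this has nothing to report in the second case, so the UFS
filesystem stays mounted while the ZFS pool is created on the same slice.
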
>>>>>>
>>>>>> That said, this explains the UFS part of the problem (when the 'dd'
>>>>>> workaround works), but doesn't seem to be related to the ZFS part of
>>>>>> the issue, when the 'zpool import' workaround helped.
>>>>>>
>>>>>> *** (#6 of 9): 2008-11-11 16:25:09 GMT+00:00 jan.damborsky at sun.com
>>>>>> *** Last Edit: 2008-11-11 16:34:29 GMT+00:00 jan.damborsky at sun.com
>>>>>>
>>>>>> We should probably leave this bug to resolve zpool create not
>>>>>> removing evidence of the previous ufs fs, and file another one to
>>>>>> chase down the other issue(s?).
>>>>>>
>>>>>> Chris, if you run zdb -l on your virgin device, are you missing
>>>>>> any zfs properties? The reader in GRUB pretty much gives up if
>>>>>> things like the devid aren't set.
>>>>>>
>>>>>> *** (#7 of 9): 2008-11-11 19:30:56 GMT+00:00 jan.setje-eilers at sun.com
>>>>>>
>>>>>> Concur that Chris' problem is different; the UFS superblock does 
>>>>>> not exist in
>>>>>> the first 256kb attached to the bug.  It appears as though 
>>>>>> phys_path and devid
>>>>>> are present, although it's difficult to be sure.  We should 
>>>>>> probably see if we can
>>>>>> send a debug version of Grub to Chris, with installation 
>>>>>> instructions, to see
>>>>>> why it seems unable to find the zfs.
>>>>>>
>>>>>> *** (#8 of 9): 2008-11-11 22:16:50 GMT+00:00 dan.mick at sun.com
>>>>>>
>>>>>> The root cause of the 'UFS part' of this problem is in the 'livecd
>>>>>> code' and is tracked by the following Bugzilla bug:
>>>>>>
>>>>>> 4675 Fix for bug 30 causes ZFS label to be mangled - ending up in 
>>>>>> GRUB prompt after installing OpenSolaris
>>>>>>
>>>>>> Please feel free to use this bug (6769487) for tracking other 
>>>>>> part(s) of the problem.
>>>>>> Resetting category to solaris/kernel/zfs and Status to 'Dispatched'.
>>>>>>
>>>>>> *** (#9 of 9): 2008-11-12 12:46:21 GMT+00:00 jan.damborsky at sun.com
>>>>>>
>>>>>>
>>>>>> === *Comments* 
>>>>>> ===============================================================
>>>>>> Moved to public comments.
>>>>>>
>>>>>> *** (#1 of 6): 2008-11-10 17:04:10 GMT+00:00 jan.damborsky at sun.com
>>>>>> *** Last Edit: 2008-11-10 17:20:54 GMT+00:00 dave.miner at sun.com
>>>>>>
>>>>>> Same situation (without zfs) on:
>>>>>> White Box based on Intel DG33TL motherboard with ICH9R chipset, 
>>>>>> 2Gb memory, 3 SATA drives, 1 SATA CD/DVD, Intel graphics.
>>>>>>
>>>>>> *** (#2 of 6): 2008-11-10 22:52:23 GMT+00:00 pawel.wojcik at sun.com
>>>>>>
>>>>>> Workaround #1 does not cause the system to boot properly on the 
>>>>>> system I tried installing (that seems to be consistent with what 
>>>>>> others are reporting in the opensolaris defect report), but 
>>>>>> workaround #2 DOES.
>>>>>>
>>>>>> *** (#3 of 6): 2008-11-11 01:56:43 GMT+00:00 seth.goldberg at sun.com
>>>>>> *** Last Edit: 2008-11-11 03:41:48 GMT+00:00 seth.goldberg at sun.com
>>>>>>
>>>>>> I've reproduced this on a "virgin" disk; see the SR record against
>>>>>> this bug. (I had to purchase a new spindle as the previous disk had
>>>>>> failed; the new disk was taken out of the supplier packaging,
>>>>>> inserted into the laptop, and then the 2008.11 CD was booted.)
>>>>>>
>>>>>> After an email discussion with Dan Mick, the data Dan requested was
>>>>>> the output of the root command at the grub prompt:
>>>>>>
>>>>>> (hd0,0,a): Filesystem type is zfs, partition type 0xbf
>>>>>>
>>>>>> Also, can you boot from the CD and collect the first 256kb of the 
>>>>>> disk, with
>>>>>>
>>>>>> dd if=<your s0 slice here> of=first.256kb bs=256k count=1
>>>>>>
>>>>>> This is attached.
>>>>>>
>>>>>> *** (#4 of 6): 2008-11-11 10:46:29 GMT+00:00 christopher.armes at sun.com
>>>>>>
>>>>>> Saw this bug on several machines today which I was helping to 
>>>>>> install. One person did a reinstall and it worked fine the second 
>>>>>> time as some reported.
>>>>>>
>>>>>> 2 other machines could use the workaround which Lin Ling pointed 
>>>>>> us to with this bug. That did save a couple folks from having to 
>>>>>> reinstall, so was very helpful. Thanks Lin! Of the installs of 
>>>>>> people that installed to a hard drive (i.e., not within 
>>>>>> VirtualBox), about 12 systems, we saw this on 3 machines, so 
>>>>>> about 25% of the systems in this small sampling.
>>>>>>
>>>>>> *** (#5 of 6): 2008-11-12 09:58:01 GMT+00:00 alan.duboff at sun.com
>>>>>>
>>>>>> Moved to public comments.
>>>>>>
>>>>>> *** (#6 of 6): 2008-11-12 12:43:18 GMT+00:00 jan.damborsky at sun.com
>>>>>> *** Last Edit: 2008-11-12 12:46:43 GMT+00:00 jan.damborsky at sun.com
>>>>>>
>>>>>>
>>>>>> === *Evaluation* 
>>>>>> =============================================================
>>>>>> See Description.
>>>>>>
>>>>>> *** (#1 of 4): 2008-11-11 03:23:04 GMT+00:00 seth.goldberg at sun.com
>>>>>>
>>>>>> Removed misleading evaluation.
>>>>>>
>>>>>> *** (#2 of 4): 2008-11-11 21:45:12 GMT+00:00 lin.ling at sun.com
>>>>>> *** Last Edit: 2008-11-11 23:16:07 GMT+00:00 lin.ling at sun.com
>>>>>>
>>>>>> What?  No, read the public comments.  The problem is that the UFS 
>>>>>> filesystem is still mounted as the installer lays down the ZFS.  
>>>>>> Then, on reboot, the UFS, as
>>>>>> it's syncing, writes its superblock back to the filesystem it 
>>>>>> thinks it owns,
>>>>>> over the top of the now-ZFS-owned space.
>>>>>>
>>>>>> The installer must ensure that other filesystems are not mounted 
>>>>>> on the slice
>>>>>> where it's creating the ZFS rpool.
>>>>>>
>>>>>> *** (#3 of 4): 2008-11-11 22:11:35 GMT+00:00 dan.mick at sun.com
>>>>>>
>>>>>> You are right. I misunderstood.
>>>>>> George Wilson just corrected me that 'zpool create' indeed clears 
>>>>>> the space correctly:
>>>>>>
>>>>>> vdev_label_init() {
>>>>>>     :
>>>>>>         vp = zio_buf_alloc(sizeof (vdev_phys_t));
>>>>>>         bzero(vp, sizeof (vdev_phys_t));
>>>>>>     :
>>>>>>         bzero(vb, sizeof (vdev_boot_header_t));
>>>>>>     :
>>>>>> }
>>>>>>
>>>>>> Thanks for the clarification.
>>>>>>
>>>>>> *** (#4 of 4): 2008-11-11 22:49:04 GMT+00:00 lin.ling at sun.com
>>>>>>
>>>>>>
>>>>>> === *Suggested Fix* 
>>>>>> ==========================================================
>>>>>>
>>>>>> === *Workaround* 
>>>>>> =============================================================
>>>>>> [1] Boot LiveCD
>>>>>> $ pfexec su -
>>>>>> # zpool import -f rpool
>>>>>>
>>>>>> *** (#1 of 3): 2008-11-10 10:27:21 GMT+00:00 jan.damborsky at sun.com
>>>>>>
>>>>>> ZERO OUT The leftover UFS magic:
>>>>>>
>>>>>> For GNU dd:
>>>>>> dd if=/dev/zero of=/dev/dsk/<SLICE> bs=1 count=4 seek=9564
>>>>>>
>>>>>> (e.g.:
>>>>>> dd if=/dev/zero of=/dev/dsk/c4t0d0s0 bs=1 count=4 seek=9564
>>>>>> )
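
For reference, the seek value in this workaround is consistent with the
classic UFS layout, where the superblock starts at byte 8192 and the
4-byte magic number sits 1372 bytes into it (8192 + 1372 = 9564). Those
two figures and the magic value 0x00011954 are not stated in this bug;
they are assumptions taken from the usual UFS on-disk format, and the
little-endian byte order assumes an x86 system. The Python sketch below
works on an in-memory buffer, not on a real device, and the helper names
are made up.

```python
import struct

UFS_SBOFF = 8192          # assumed superblock offset
UFS_MAGIC_IN_SB = 1372    # assumed offset of the magic inside the superblock
UFS_MAGIC = 0x00011954    # assumed UFS magic value
MAGIC_OFFSET = UFS_SBOFF + UFS_MAGIC_IN_SB  # 9564, the seek= value used with dd


def has_ufs_magic(image):
    """True if the 4 bytes at MAGIC_OFFSET hold the UFS magic."""
    (value,) = struct.unpack_from("<I", image, MAGIC_OFFSET)
    return value == UFS_MAGIC


def clear_ufs_magic(image):
    """Zero the 4 magic bytes in place, like bs=1 count=4 seek=9564."""
    image[MAGIC_OFFSET:MAGIC_OFFSET + 4] = b"\x00" * 4


# Fake "first 16K of a slice" with a stale UFS magic planted in it.
image = bytearray(16 * 1024)
struct.pack_into("<I", image, MAGIC_OFFSET, UFS_MAGIC)

if __name__ == "__main__":
    print(has_ufs_magic(image))
    clear_ufs_magic(image)
    print(has_ufs_magic(image))
```

Only four bytes change, so the rest of the block is left as it was.
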
>>>>>>
>>>>>> *** (#2 of 3): 2008-11-11 03:36:55 GMT+00:00 seth.goldberg at sun.com
>>>>>>
>>>>>> I did the following in dd to work around the issue:
>>>>>>
>>>>>> root@opensolaris:~# dd if=/dev/zero of=/dev/dsk/c1t0d0s0 bs=1 count=4 seek=9564
>>>>>> 4+0 records in
>>>>>> 4+0 records out
>>>>>> 4 bytes (4 B) copied, 0.0394095 s, 0.1 kB/s
>>>>>> root@opensolaris:~#
>>>>>>
>>>>>> *** (#3 of 3): 2008-11-11 19:07:04 GMT+00:00 mary.ding at sun.com
>>>>>>
>>>>>>
>>>>>> === *Justification* 
>>>>>> ==========================================================
>>>>>> Priority changed from [] to [1-Very High]
>>>>>> Installed OpenSolaris 2008.11 doesn't boot
>>>>>> jan.damborsky at sun.com 2008-11-10 10:27:21 GMT
>>>>>>
>>>>>> *** (#1 of 1): 2008-11-10 10:27:21 GMT+00:00 jan.damborsky at sun.com
>>>>>>
>>>>>>
>>>>>> === *Additional Details* 
>>>>>> =====================================================
>>>>>>         Targeted Release:
>>>>>>         Commit To Fix In Build:
>>>>>>         Fixed In Build:
>>>>>>         Integrated In Build:
>>>>>>         Verified In Build:
>>>>>>   See Also: 6769534
>>>>>>   Duplicate of:
>>>>>>   Hooks:
>>>>>>         Hook1:
>>>>>>         Hook2:
>>>>>>         Hook3:
>>>>>>         Hook4:
>>>>>>         Hook5:
>>>>>>         Hook6:
>>>>>>   Interest List: dan.mick at sun.com, dave.miner at sun.com,
>>>>>>     david.comay at sun.com, frank.batschulat at sun.com,
>>>>>>     kerberos-iteam at Sun.COM, lin.ling at sun.com,
>>>>>>     nick.todd at sun.com, peter.dennis at sun.com,
>>>>>>     plus1tb at sun.com, sdg at sun.com, si-bugs at sun.com,
>>>>>>     sst-prg at sun.com, tomas.hurka at sun.com
>>>>>>   Program Management: New Defect
>>>>>>   Root Cause:
>>>>>>   Is a Security Vulnerability?: No
>>>>>>   Fix Affects Documentation: No
>>>>>>   Fix Affects Localization: No
>>>>>>   Reported by:
>>>>>> === *History* 
>>>>>> ================================================================
>>>>>>         Date Submitted: 2008-11-10 10:27:21 GMT+00:00
>>>>>>         Submitted By: jan.damborsky at sun.com
>>>>>>
>>>>>>         Status Changed    Date Updated                  Updated By
>>>>>>         3-Accepted        2008-11-10 23:59:05 GMT+00:00 lin.ling at sun.com
>>>>>>         5-Cause Known     2008-11-11 03:23:04 GMT+00:00 seth.goldberg at sun.com
>>>>>>         1-Dispatched      2008-11-12 12:43:18 GMT+00:00 jan.damborsky at sun.com
>>>>>>
>>>>>>
>>>>>> === *Solution* 
>>>>>> ===============================================================
>>>>>>
>>>>>>
>>>>>> === *Service Request* 
>>>>>> ========================================================
>>>>>>         ID: 1-493023606
>>>>>>         Customer:
>>>>>>         Account Name: Sun Microsystems
>>>>>>         Customer Contact:
>>>>>>         Customer Contact Role: D-Development
>>>>>>         Customer Contact Type: I-Internal (SMI) Customer
>>>>>>         Impact: Critical
>>>>>>         Functionality: Primary
>>>>>>         Severity: 1
>>>>>>         Synopsis:
>>>>>>         Product Name: solaris
>>>>>>         Product Release: osol_2008.11
>>>>>>         Product Build:
>>>>>>         Operating System: osol_2008.11
>>>>>>         Hardware: generic
>>>>>>         Reference Number:
>>>>>>         Sun Contact: jan.damborsky at sun.com
>>>>>>         Status: Open
>>>>>>         Source: BugTraq2
>>>>>>         Reproducible:
>>>>>>         Submitted By: jan.damborsky at sun.com
>>>>>>         Submitted Date: 2008-11-10 10:27:21 GMT+00:00
>>>>>>         Description:
>>>>>>
>>>>>> === *Service Request* 
>>>>>> ========================================================
>>>>>>         ID: 1-493053806
>>>>>>         Customer:
>>>>>>         Account Name: SUN MicroSystems
>>>>>>         Customer Contact:
>>>>>>         Customer Contact Role: D-Development
>>>>>>         Customer Contact Type: I-Internal (SMI) Customer
>>>>>>         Impact: Critical
>>>>>>         Functionality: Primary
>>>>>>         Severity: 1
>>>>>>         Synopsis: After installing 2008.11RC1b boot from hard disk fails
>>>>>>         Product Name: solaris
>>>>>>         Product Release: osol_2008.11
>>>>>>         Product Build:
>>>>>>         Operating System: osol_2008.11
>>>>>>         Hardware: x86
>>>>>>         Reference Number:
>>>>>>         Sun Contact: christopher.armes at sun.com
>>>>>>         Status: Open
>>>>>>         Source: BugTraq2
>>>>>>         Reproducible: Always
>>>>>>         Submitted By: christopher.armes at sun.com
>>>>>>         Submitted Date: 2008-11-10 12:54:24 GMT+00:00
>>>>>>         Description: Booting from the livecd and then selecting
>>>>>> install works fine. Upon reboot, either with the cd in and selecting
>>>>>> boot from hard disk, or without the cd and allowing the grub menu to
>>>>>> boot, the boot fails and drops the system to the "grub>" prompt.
>>>>>>
>>>>>>
>>>>>> === *Service Request* 
>>>>>> ========================================================
>>>>>>         ID: 1-493177108
>>>>>>         Customer:
>>>>>>         Account Name: SUN
>>>>>>         Customer Contact:
>>>>>>         Customer Contact Role: D-Development
>>>>>>         Customer Contact Type: I-Internal (SMI) Customer
>>>>>>         Impact: Critical
>>>>>>         Functionality: Primary
>>>>>>         Severity: 1
>>>>>>         Synopsis:
>>>>>>         Product Name: solaris
>>>>>>         Product Release: osol_2008.11
>>>>>>         Product Build: osol_2008.11
>>>>>>         Operating System: osol_2008.11
>>>>>>         Hardware: amd
>>>>>>         Reference Number:
>>>>>>         Sun Contact: garrett.damore at sun.com
>>>>>>         Status:
>>>>>>         Source: BugTraq2
>>>>>>         Reproducible:
>>>>>>         Submitted By: garrett.damore at sun.com
>>>>>>         Submitted Date: 2008-11-10 20:16:41 GMT+00:00
>>>>>>         Description: I hit this when updating my Ultra 20
>>>>>> (original model, not M2) from b77ish to OSOL 2008.11rc1b
>>>>>>
>>>>>> System has 1.5GB ram, SATA hard disk.
>>>>>>
>>>>>>
>>>>>> === *Service Request* 
>>>>>> ========================================================
>>>>>>         ID: 1-493257401
>>>>>>         Customer:
>>>>>>         Account Name: Sun Microsystems, Inc.
>>>>>>         Customer Contact:
>>>>>>         Customer Contact Role: D-Development
>>>>>>         Customer Contact Type: I-Internal (SMI) Customer
>>>>>>         Impact: Critical
>>>>>>         Functionality: Primary
>>>>>>         Severity: 1
>>>>>>         Synopsis:
>>>>>>         Product Name: solaris
>>>>>>         Product Release: osol_2008.11
>>>>>>         Product Build: osol_2008.11
>>>>>>         Operating System: osol_2008.11
>>>>>>         Hardware: generic_ibm_compatible
>>>>>>         Reference Number:
>>>>>>         Sun Contact: dana.myers at sun.com
>>>>>>         Status: Open
>>>>>>         Source: BugTraq2
>>>>>>         Reproducible:
>>>>>>         Submitted By: dana.myers at sun.com
>>>>>>         Submitted Date: 2008-11-10 22:34:45 GMT+00:00
>>>>>>         Description:
>>>>>>
>>>>>> === *Service Request* 
>>>>>> ========================================================
>>>>>>         ID: 1-493265801
>>>>>>         Customer:
>>>>>>         Account Name: Sun Microsystems
>>>>>>         Customer Contact: pawel.wojcik at sun.com
>>>>>>         Customer Contact Role: D-Development
>>>>>>         Customer Contact Type: I-Internal (SMI) Customer
>>>>>>         Impact: Critical
>>>>>>         Functionality: Primary
>>>>>>         Severity: 1
>>>>>>         Synopsis:
>>>>>>         Product Name: solaris
>>>>>>         Product Release: osol_2008.11
>>>>>>         Product Build: osol_2008.11
>>>>>>         Operating System: solaris
>>>>>>         Hardware: intel
>>>>>>         Reference Number:
>>>>>>         Sun Contact: pawel.wojcik at sun.com
>>>>>>         Status:
>>>>>>         Source: BugTraq2
>>>>>>         Reproducible:
>>>>>>         Submitted By: pawel.wojcik at sun.com
>>>>>>         Submitted Date: 2008-11-10 22:50:53 GMT+00:00
>>>>>>         Description:
>>>>>>
>>>>>> === *Activity* 
>>>>>> ===============================================================
>>>>>>
>>>>>>
>>>>>> === *Multiple Release (MR) Cluster* - 0 
>>>>>> ======================================
>>>>>>
>>>>>>
>>>>>> === *Escalations* 
>>>>>> ============================================================
>>>>>>
>>>>>>   
>>>>>
>>>>
>>>
>>
>

