Mary,

Thanks for the update!

Jan


mary ding wrote:
> Jan and George:
>
> I have also seen this on Ultra 24 and Ultra 40 systems when we tested the
> 1.5 TB Seagate SATA disk. So far I have only seen this on systems with a
> SATA drive.
>
>
>
> jan damborsky wrote:
>> George,
>>
>>
>> George Wilson wrote:
>>> Jan,
>>>
>>> It seems like the problem is not with ZFS but with the device 
>>> driver. If the driver is failing to provide the devid then ZFS is 
>>> just going to be a victim.
>>
>> I agree with you that this is what we might be encountering
>> with respect to 'devid' problem here.
>>
>>
>>> I would recommend that we change the synopsis to devid_get() fails 
>>> with "Invalid argument" and pass this to the driver folks.
>>
>> I will let Sanjay comment on this, since he has done
>> some more investigation recently.
>>
>>> Do you know if it's always the same driver?
>>
>> I can only reproduce it on one system - this one has a SATA drive
>> connected to the controller handled by the nv_sata(7D) driver. I think
>> that Sanjay also encountered the problem on a system with a SATA disk.
>>
>> Thank you,
>> Jan
>>
>>> Thanks,
>>> George
>>>
>>> jan damborsky wrote:
>>>> Hi George,
>>>>
>>>>
>>>> George Wilson wrote:
>>>>> Jan,
>>>>>
>>>>> So who is working the UFS issue and how is that being tracked?
>>>> In general, bugs in the OpenSolaris Caiman installer are tracked in
>>>> Bugzilla at defect.opensolaris.org. This is preferred over filing bugs
>>>> in Bugster. This particular problem is tracked by the following bug:
>>>>
>>>> 4675 Fix for bug 30 causes ZFS label to be mangled - ending up in 
>>>> GRUB prompt after installing OpenSolaris
>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4675
>>>>
>>>> Sanjay Nadkarni is assigned to this bug (CCing him).
>>>>
>>>>> I would recommend that we keep this bug as the UFS/install issue 
>>>>> and create a new bug and send that to me.
>>>> As pointed out above, Bugzilla is the preferred database for tracking
>>>> issues in the Caiman installer.
>>>>
>>>> Please note that 6769487 was originally filed to track the problem
>>>> where GRUB can't access the ZFS filesystem because 'devid' is not
>>>> present in the ZFS label.
>>>>
>>>> It was later overloaded with the 'UFS' problem.
>>>>
>>>>> Can you move the descriptions below from this bug and add them to 
>>>>> the new one?
>>>> To be honest, since the installer part of the problem related to UFS
>>>> is tracked by 4675, I don't see why we shouldn't continue to use
>>>> 6769487 to track the issue this bug was initially filed for, and I
>>>> think that we might lose some context when ZFS related information is
>>>> moved from 6769487 to the new bug.
>>>> That said, if you think it might be helpful, please let me know and
>>>> I will try to capture all information from 6769487 that I think is
>>>> relevant to the ZFS part in a new bug.
>>>>
>>>>> Also since you can reproduce this can you tell me exactly how or 
>>>>> point me at a system which I can login into to debug?
>>>> Sure, the machine can be accessed via 'ssh', but since it is not
>>>> directly accessible from SWAN (it is behind NAT),
>>>> I will provide you with instructions on how to access it.
>>>> Unfortunately it doesn't have console access.
>>>>
>>>> Please let me know in which state you need that machine:
>>>> right after the installation has finished, but before reboot?
>>>>
>>>> Unfortunately, following the procedure itself doesn't seem to be
>>>> sufficient for reproducing the problem :-( I tried exactly the
>>>> same steps on other bare metal as well as in a virtual environment,
>>>> but without success.
>>>>
>>>>
>>>>> I want to make sure we don't lose sight of the UFS issue and this 
>>>>> bug has already gone down to root cause so let's not overload this 
>>>>> bug any further.
>>>> The UFS part of the problem is being solved right now (please feel
>>>> free to monitor bug 4675 for progress and add anything you might
>>>> consider relevant to that issue).
>>>>
>>>> Thank you,
>>>> Jan
>>>>
>>>>> Thanks,
>>>>> George
>>>>>
>>>>> jan damborsky wrote:
>>>>>> Hi George,
>>>>>>
>>>>>> there are at least two parts of this problem:
>>>>>>
>>>>>> [1] UFS one
>>>>>> This is what you are referring to and it is being tracked by 
>>>>>> Bugzilla bug 4675.
>>>>>> In that case workaround #2 helps to "solve" the problem.
>>>>>>
>>>>>> [2] ZFS one
>>>>>> Please see original description #1. I am able to reproduce that at
>>>>>> will on a system which didn't contain any UFS filesystem, and thus
>>>>>> [1] is not applicable here. 'zpool import' helps in this case.
>>>>>>
>>>>>> Also please see:
>>>>>> * description #4
>>>>>> * description #5
>>>>>> * public comments #8
>>>>>> * comments #6
>>>>>>
>>>>>> People are apparently encountering this problem in
>>>>>> other configurations (e.g. when using virgin disk
>>>>>> or installing on system containing only Windows).
>>>>>>
>>>>>> I am not stating that this is in fact a problem in ZFS, as it might
>>>>>> be related, for example, to device driver code, but at this point it
>>>>>> seems to me that the ZFS team is best placed to move
>>>>>> things forward, as GRUB can't read menu.lst from the ZFS
>>>>>> filesystem.
>>>>>>
>>>>>> Please let me know if you have any questions or need more
>>>>>> information.
>>>>>>
>>>>>> Thank you,
>>>>>> Jan
>>>>>>
>>>>>>
>>>>>> George Wilson wrote:
>>>>>>> Jan,
>>>>>>>
>>>>>>> I don't understand how this is a ZFS problem. I thought from the 
>>>>>>> evaluation that the issue is that UFS and ZFS are sharing the 
>>>>>>> same block and this was being caused by the fact that the livecd 
>>>>>>> had mounted a UFS filesystem as part of the installation. Could 
>>>>>>> you clarify?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> George
>>>>>>>
>>>>>>> Jan.Damborsky at Sun.COM wrote:
>>>>>>>>                         Sun Confidential: Internal only
>>>>>>>>
>>>>>>>> *Synopsis*: Ended up in 'grub>' prompt after installation of 
>>>>>>>> OpenSolaris 2008.11 (build 101a)
>>>>>>>>
>>>>>>>> CrPrint: http://bt2ws.central.sun.com/CrPrint?id=6769487
>>>>>>>> Monaco: http://monaco.sfbay.sun.com/detail.jsf?cr=6769487
>>>>>>>>
>>>>>>>> Due to a change of Responsible manager requested by 
>>>>>>>> jan.damborsky at sun.com,
>>>>>>>> david.brittle at sun.com is now the responsible manager for:
>>>>>>>>
>>>>>>>> Due to a change requested by jan.damborsky at sun.com,
>>>>>>>> this CR is being redispatched:
>>>>>>>>
>>>>>>>> This is a high priority CR and requires your immediate attention.
>>>>>>>> Please evaluate it as soon as possible.  Thank you.
>>>>>>>>
>>>>>>>> CR 6769487 changed on Nov 12 2008 by jan.damborsky at sun.com
>>>>>>>>
>>>>>>>> === Field ============ === New Value ============= === Old Value =============
>>>>>>>>
>>>>>>>> Category               kernel                      opensolaris
>>>>>>>> Comments               New Note
>>>>>>>> Comments               New Note                    Old Note
>>>>>>>> Comments               New Note                    Old Note
>>>>>>>> Public Comments        New Note
>>>>>>>> Responsible Manager    david.brittle at sun.com    eric.ray at sun.com
>>>>>>>> Status                 1-Dispatched                5-Cause Known
>>>>>>>> SubCategory            zfs                         livecd
>>>>>>>> ====================== =========================== ===========================
>>>>>>>>
>>>>>>>>      *Change Request ID*: 6769487
>>>>>>>>
>>>>>>>> *Synopsis*: Ended up in 'grub>' prompt after installation of 
>>>>>>>> OpenSolaris 2008.11 (build 101a)
>>>>>>>>
>>>>>>>>   Product: solaris
>>>>>>>>   Category: kernel
>>>>>>>>   Subcategory: zfs
>>>>>>>>   Type: Defect
>>>>>>>>   Subtype: Functionality
>>>>>>>>   Status: 1-Dispatched
>>>>>>>>   Substatus:
>>>>>>>>   Priority: 1-Very High
>>>>>>>>   Introduced In Release:
>>>>>>>>   Introduced In Build:
>>>>>>>>   Responsible Manager: david.brittle at sun.com
>>>>>>>>   Responsible Engineer:
>>>>>>>>   Initial Evaluator: zfs-team at sun.com
>>>>>>>>   Keywords:
>>>>>>>> === *Description* ============================================================
>>>>>>>> When testing installation with recent OpenSolaris builds, we have
>>>>>>>> found that in some cases people end up at the GRUB prompt after the
>>>>>>>> installation; it seems that menu.lst can't be accessed for some
>>>>>>>> reason. For now, a number of Bugzilla bugs seem to describe the same
>>>>>>>> manifestation of the problem, whose root cause has not been
>>>>>>>> identified yet:
>>>>>>>>
>>>>>>>> 4051 opensolaris b99b/b100a does not install on 1.5 TB disk or 
>>>>>>>> boot fails after install
>>>>>>>> 4591 Install failure on a Sun Fire X4240 with Opensolaris 200811
>>>>>>>> 4161 no grub in 2008.11 Development Builds (comment #20, 
>>>>>>>> comment #31)
>>>>>>>> 4760 Enter grub after installing 2008.11 RC 1
>>>>>>>> ...
>>>>>>>>
>>>>>>>> I also hit that problem when testing the Automated Installer (it is
>>>>>>>> part of the Caiman project and will replace the current jumpstart
>>>>>>>> install technology). I was able to make GRUB find 'menu.lst' just by
>>>>>>>> using the 'zpool import' command; please see below for the detailed
>>>>>>>> procedure.
>>>>>>>>
>>>>>>>>
>>>>>>>> configuration:
>>>>>>>> --------------
>>>>>>>> HW: Ultra 20, 1GB RAM, 1 250GB SATA drive
>>>>>>>> SW: Opensolaris build 100, 64bit mode
>>>>>>>>
>>>>>>>> steps used:
>>>>>>>> -----------
>>>>>>>> [1] OpenSolaris 100 installed using Automated Installer
>>>>>>>>    - Solaris 2 partition created during installation
>>>>>>>>
>>>>>>>> * partition configuration before installation:
>>>>>>>>
>>>>>>>> # fdisk -W - c2t0d0p0
>>>>>>>> ...
>>>>>>>> * Id    Act  Bhead  Bsect  Bcyl    Ehead  Esect  Ecyl    Rsect      Numsect
>>>>>>>>   192   0    0      1      1       254    63     1023    16065      22491000
>>>>>>>>
>>>>>>>> * partition configuration after installation:
>>>>>>>>
>>>>>>>> # fdisk -W - c2t0d0p0
>>>>>>>> ...
>>>>>>>> * Id    Act  Bhead  Bsect  Bcyl    Ehead  Esect  Ecyl    Rsect      Numsect
>>>>>>>>   192   0    0      1      1       254    63     1023    16065      22491000
>>>>>>>>   191   128  254    63     1023    254    63     1023    22507065   30000000
>>>>>>>>
>>>>>>>> [2] When I rebooted the system after the installation, I ended up
>>>>>>>> at the GRUB prompt:
>>>>>>>> grub> root
>>>>>>>> (hd0,1,a): Filesystem type unknown, partition type 0xbf
>>>>>>>>
>>>>>>>> grub> cat /rpool/boot/grub/menu.lst
>>>>>>>>
>>>>>>>> Error 17: Cannot mount selected partition
>>>>>>>>
>>>>>>>> grub>
>>>>>>>>
>>>>>>>> [3] I rebooted into AI and did 'zpool import'
>>>>>>>> # zdb -l /dev/rdsk/c2t0d0s0 > /tmp/zdb_before_import.txt (attached)
>>>>>>>> # zpool import -f rpool
>>>>>>>> # zdb -l /dev/rdsk/c2t0d0s0 > /tmp/zdb_after_import.txt (attached)
>>>>>>>> # diff /tmp/zdb_before_import.txt /tmp/zdb_after_import.txt
>>>>>>>> 7c7
>>>>>>>> <     txg=21
>>>>>>>> ---
>>>>>>>> >     txg=2675
>>>>>>>> 9c9
>>>>>>>> <     hostid=4741222
>>>>>>>> ---
>>>>>>>> >     hostid=4247690
>>>>>>>> 17a18
>>>>>>>> >         devid='id1,sd@f00c778e247ac7bd0000238460000/a'
>>>>>>>> 31c32
>>>>>>>> ...
>>>>>>>> # reboot
>>>>>>>>
>>>>>>>> [4] Now GRUB can access menu.lst and Solaris is booted
>>>>>>>>
>>>>>>>> hypothesis
>>>>>>>> ----------
>>>>>>>> It seems that for some reason, when ZFS pool was created, 
>>>>>>>> 'devid' information was not added to the ZFS label.
>>>>>>>>
>>>>>>>> When 'zpool import' was called, 'devid' got populated.
>>>>>>>>
>>>>>>>> Looking at the GRUB ZFS plug-in, it seems that 'devid' 
>>>>>>>> (ZPOOL_CONFIG_DEVID attribute) is
>>>>>>>> required in order to be able to access ZFS filesystem:
>>>>>>>>
>>>>>>>> In grub/grub-0.95/stage2/fsys_zfs.c:
>>>>>>>>
>>>>>>>> vdev_get_bootpath()
>>>>>>>> {
>>>>>>>> ...
>>>>>>>>    if (strcmp(type, VDEV_TYPE_DISK) == 0) {
>>>>>>>>        if (vdev_validate(nv) != 0 ||
>>>>>>>>            (nvlist_lookup_value(nv, ZPOOL_CONFIG_PHYS_PATH,
>>>>>>>>            bootpath, DATA_TYPE_STRING, NULL) != 0) ||
>>>>>>>>            (nvlist_lookup_value(nv, ZPOOL_CONFIG_DEVID,
>>>>>>>>            devid, DATA_TYPE_STRING, NULL) != 0))
>>>>>>>>            return (ERR_NO_BOOTPATH);
>>>>>>>> ...
>>>>>>>> }
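A minimal sketch of the condition quoted above, not the GRUB code itself. It models one vdev entry of the label as a flat list of key/value pairs and treats the disk as bootable only when both `phys_path` and `devid` are present. The `label_pair` struct and both function names are made up for illustration, and the `vdev_validate()` step is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view of one vdev entry in a ZFS label. */
struct label_pair {
	const char *key;
	const char *value;
};

/* Look up a key; returns NULL when the label does not carry it. */
const char *
label_lookup(const struct label_pair *pairs, size_t n, const char *key)
{
	assert(pairs != NULL && key != NULL);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(pairs[i].key, key) == 0)
			return (pairs[i].value);
	}
	return (NULL);
}

/* Mirrors the quoted condition: both phys_path and devid must be present. */
int
label_is_bootable(const struct label_pair *pairs, size_t n)
{
	return (label_lookup(pairs, n, "phys_path") != NULL &&
	    label_lookup(pairs, n, "devid") != NULL);
}
```

Under this model, a label like the one dumped before 'zpool import' (no devid line) is rejected, and the one dumped afterwards is accepted.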
>>>>>>>>
>>>>>>>> additional observations:
>>>>>>>> ------------------------
>>>>>>>> [1] If 'devid' is populated during installation after 'zpool 
>>>>>>>> create'
>>>>>>>> operation, the problem doesn't occur.
>>>>>>>>
>>>>>>>> [2] When following the described procedure, the problem is reproducible
>>>>>>>> at will on the system where it was initially reproduced (please see
>>>>>>>> above for the configuration).
>>>>>>>>
>>>>>>>> [3] Other people reported this problem also for following 
>>>>>>>> configurations:
>>>>>>>> * vmware
>>>>>>>> * Sun Java Workstation W2100z with 2xOpteron2.4G 3G Mem
>>>>>>>>
>>>>>>>> [4] When installing into an existing Solaris2 partition containing a
>>>>>>>> Solaris instance, 'devid' is always populated and the problem doesn't
>>>>>>>> occur (it doesn't matter whether the partition is marked 'active' or
>>>>>>>> not).
>>>>>>>>
>>>>>>>> *** (#1 of 5): 2008-11-10 10:27:21 GMT+00:00 jan.damborsky at sun.com
>>>>>>>>
>>>>>>>> If the system previously had Nevada installed (101a in my case),
>>>>>>>> installing OpenSolaris hits this issue when the existing partition is
>>>>>>>> kept rather than the entire disk being chosen (I suspect this is what
>>>>>>>> causes the issue). There is a diagnostic partition on the disk if
>>>>>>>> Nevada was installed, and OpenSolaris 2008.11 simply enters grub> as
>>>>>>>> this CR mentions. When I then used the entire disk, the system booted
>>>>>>>> up okay. But when I re-installed with a size smaller than the entire
>>>>>>>> disk, GRUB had no problem, but GNOME could not start (it hangs
>>>>>>>> endlessly).
>>>>>>>>
>>>>>>>> *** (#2 of 5): 2008-11-10 10:45:29 GMT+00:00 robin.guo at sun.com
>>>>>>>>
>>>>>>>> The root cause of this problem is the continued existence of UFS
>>>>>>>> filesystem structures on disk, even after the ZFS filesystem is
>>>>>>>> created and is live. Because ZFS did not destroy the UFS magic, both
>>>>>>>> GRUB and Solaris think there's a (horribly damaged) UFS filesystem
>>>>>>>> present on that slice. A WARNING is displayed during OpenSolaris boot
>>>>>>>> informing the user that /mnt/solaris<N> (where <N> is a number) could
>>>>>>>> not be mounted because of filesystem problems; in reality, that slice
>>>>>>>> is where the ZFS root is located.
>>>>>>>>
>>>>>>>> In GRUB, since code that attempts to mount root does so by 
>>>>>>>> trying each filesystem module in the order in which they are 
>>>>>>>> listed in the fsys_table[] array, and since UFS is listed 
>>>>>>>> before ZFS, GRUB thinks that a UFS filesystem exists in the 
>>>>>>>> slice actually containing the ZFS root filesystem (and fails 
>>>>>>>> trying to mount it, leaving it unable to locate the real root 
>>>>>>>> filesystem).  A modified version of GRUB that modifies 
>>>>>>>> fsys_table by declaring the ZFS operations before the UFS 
>>>>>>>> operations confirms this hypothesis.
>>>>>>>>
>>>>>>>> Therefore, a valid workaround destroys the UFS magic, 
>>>>>>>> preventing both GRUB's and Solaris's UFS modules from 
>>>>>>>> recognizing the slice as a UFS filesystem.  When GRUB's UFS 
>>>>>>>> code fails to find a valid UFS filesystem, the ZFS module is 
>>>>>>>> subsequently tried and is able to successfully mount the 
>>>>>>>> filesystem.
>>>>>>>>
>>>>>>>> *** (#3 of 5): 2008-11-11 03:23:04 GMT+00:00 seth.goldberg at sun.com
>>>>>>>> *** Last Edit: 2008-11-11 03:45:05 GMT+00:00 seth.goldberg at sun.com
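To make the ordering argument above concrete, here is a toy sketch of first-match probing. It is not GRUB's code: `fsys_entry`, the two probe functions and the two-byte "slice" are stand-ins invented for illustration (byte 0 set means leftover UFS magic, byte 1 set means a valid ZFS label).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of a filesystem probe table. */
struct fsys_entry {
	const char *name;
	int (*probe)(const unsigned char *slice);  /* nonzero: "this is mine" */
};

/* Toy probes over a two-byte model of the slice. */
int probe_ufs(const unsigned char *slice) { return (slice[0] != 0); }
int probe_zfs(const unsigned char *slice) { return (slice[1] != 0); }

/* Return the index of the first entry that claims the slice, or -1. */
int
first_match(const struct fsys_entry *table, size_t n, const unsigned char *slice)
{
	assert(table != NULL && slice != NULL);
	for (size_t i = 0; i < n; i++) {
		if (table[i].probe(slice))
			return ((int)i);
	}
	return (-1);
}
```

With UFS listed first, a slice carrying both stale UFS magic and a good ZFS label is claimed by UFS. Reordering the table or clearing the stale magic lets the ZFS probe win, which matches the two remedies described in the comment.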
>>>>>>>>
>>>>>>>> I think there are two separate issues here. The UFS label appears to
>>>>>>>> be one. The signature for this bug is that typing root at the grub
>>>>>>>> prompt prints the UFS filesystem info.
>>>>>>>> However, there is a second bug where, after installation, one gets a
>>>>>>>> grub prompt and typing the root command there reports an unknown
>>>>>>>> filesystem. In this case no UFS filesystems were detected or mounted.
>>>>>>>> The workaround for this has been to run zpool import. This still
>>>>>>>> needs to be investigated.
>>>>>>>>
>>>>>>>> *** (#4 of 5): 2008-11-12 00:04:16 GMT+00:00 
>>>>>>>> sanjay.nadkarni at sun.com
>>>>>>>>
>>>>>>>> We were able to recreate the grub failure where typing root at the
>>>>>>>> prompt returns unknown file system. This was on a Fujitsu LifeBook
>>>>>>>> S7211. It was installed with Vista. We then booted OpenSolaris and
>>>>>>>> started the install. At the end of the installation we noted that the
>>>>>>>> ZFS label did not have devid information.
>>>>>>>>
>>>>>>>> We then loaded a simple program that would get the devid 
>>>>>>>> (devid_get).  This failed with "Invalid argument".  We then 
>>>>>>>> rebooted the liveCD again and reran this program and this time 
>>>>>>>> it printed out the device id.  The disk is off a SATA 
>>>>>>>> controller.  The driver that attached to this is ahci.  The 
>>>>>>>> device is: 82801HBM/HEM. The disk is Fujitsu MHY2120BH
>>>>>>>>
>>>>>>>> *** (#5 of 5): 2008-11-12 02:43:18 GMT+00:00 
>>>>>>>> sanjay.nadkarni at sun.com
>>>>>>>>
>>>>>>>>
>>>>>>>> === *Public Comments* ========================================================
>>>>>>>> Following Bugzilla bugs were closed as duplicate of this issue:
>>>>>>>>
>>>>>>>> 4772 Cannot install OpenSolaris 2008.11 on VMware Server 2.0
>>>>>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4772
>>>>>>>>
>>>>>>>> 4756 after reboot when finishing the installation, system can 
>>>>>>>> not boot
>>>>>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4756
>>>>>>>>
>>>>>>>> 4749 After installed opensolaris0811RC1 on Dell PowerEdge, 
>>>>>>>> can't boot from disk.
>>>>>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4749
>>>>>>>>
>>>>>>>> *** (#1 of 9): 2008-11-10 17:20:54 GMT+00:00 dave.miner at sun.com
>>>>>>>> *** Last Edit: 2008-11-11 11:45:41 GMT+00:00 jan.damborsky at sun.com
>>>>>>>>
>>>>>>>> zpool import doesn't help for me, nor would I expect it to 
>>>>>>>> (it's a mystery
>>>>>>>> why it seems to).  Clearing the UFS magic helps.
>>>>>>>>
>>>>>>>> Looking further, I find that the data on disk at 8k seems to still
>>>>>>>> be a UFS superblock, not a zfs vdev_boot_header_t, which 
>>>>>>>> doesn't make
>>>>>>>> sense to me; in any ZFS initialization scheme, one would expect 
>>>>>>>> all parts
>>>>>>>> of the label to be completely written.
>>>>>>>>
>>>>>>>> The expected vdev_boot_header_t appears at the label copy at 
>>>>>>>> 256K+8K, as
>>>>>>>> expected.
>>>>>>>>
>>>>>>>> *** (#2 of 9): 2008-11-11 04:39:09 GMT+00:00 dan.mick at sun.com
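A small worked check of the offsets mentioned in the comment above, as a sketch only. The 256K label size and the "256K+8K" position come from the comment; the 8K pad in front of the boot header and the 8K UFS superblock offset are assumptions on my part, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define KIB              1024u
#define VDEV_LABEL_SIZE  (256u * KIB)  /* one label copy, per the 256K+8K remark */
#define VDEV_PAD_SIZE    (8u * KIB)    /* assumed: blank space before the boot header */
#define UFS_SBOFF        (8u * KIB)    /* assumed: UFS superblock offset in a slice */

/* Offset of the boot header inside front label copy l (0 or 1). */
uint64_t
boot_header_offset(unsigned l)
{
	assert(l < 2);
	return ((uint64_t)l * VDEV_LABEL_SIZE + VDEV_PAD_SIZE);
}

/* Nonzero if a UFS superblock write would land on that boot header. */
int
ufs_sb_hits_boot_header(unsigned l)
{
	return (boot_header_offset(l) == UFS_SBOFF);
}
```

Under these assumptions only the first label copy's boot header sits where a UFS superblock would be written, which would be consistent with the copy at 256K+8K still looking as expected.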
>>>>>>>>
>>>>>>>> It appears that ZFS doesn't validate that first 8k (the 
>>>>>>>> vdev_boot_header), so
>>>>>>>> that explains why the kernel was happy even with a UFS 
>>>>>>>> superblock where the
>>>>>>>> vdev_boot_header was supposed to be.
>>>>>>>>
>>>>>>>> Also, the last few bits of the 8k block in question seem to 
>>>>>>>> contain a
>>>>>>>> zio_block_tail_t (i.e. a zbt_magic and a zbt_cksum), so it 
>>>>>>>> seems this block
>>>>>>>> was written by ZFS sometime in the past.
>>>>>>>> Possible theories:  1) the ZFS initialization somehow skipped 
>>>>>>>> this 8k header,
>>>>>>>> or 2) somehow the 8k superblock was rewritten over the block 
>>>>>>>> after ZFS initialized it.
>>>>>>>>
>>>>>>>> *** (#3 of 9): 2008-11-11 04:49:57 GMT+00:00 dan.mick at sun.com
>>>>>>>>
>>>>>>>> Another possible theory: could this be the superblock flush 
>>>>>>>> from a still-mounted UFS being shut down?
>>>>>>>>
>>>>>>>> (The block was correct until after the OpenSolaris installer 
>>>>>>>> said it was done,
>>>>>>>> and waited for me to press a button to reboot.  I suspect
>>>>>>>> the original UFS was mounted and not unmounted before the ZFS 
>>>>>>>> creation,
>>>>>>>> so they both think they own the device.)
>>>>>>>>
>>>>>>>> Supporting evidence: the "last mounted" path in the superblock 
>>>>>>>> is "/mnt/solaris0".
>>>>>>>>
>>>>>>>> I suspect the cause of this bug is a UFS that's mounted and 
>>>>>>>> should be
>>>>>>>> unmounted by the installer before ZFS creation.
>>>>>>>>
>>>>>>>> What's the right category/subcategory for Caiman?
>>>>>>>>
>>>>>>>> *** (#4 of 9): 2008-11-11 07:34:58 GMT+00:00 dan.mick at sun.com
>>>>>>>>
>>>>>>>> The live CD has historically automatically mounted up any UFS 
>>>>>>>> file systems that it found, going back to Belenix.  Interesting 
>>>>>>>> that this is just now a problem, but it probably is a result of 
>>>>>>>> switching to ZFS for swap, as up until build 96 we always 
>>>>>>>> created a swap slice at the start of the disk, which it appears 
>>>>>>>> would have masked this problem.
>>>>>>>>
>>>>>>>> *** (#5 of 9): 2008-11-11 15:02:03 GMT+00:00 dave.miner at sun.com
>>>>>>>>
>>>>>>>> Installer takes care of releasing the target device before 
>>>>>>>> Target Instantiation
>>>>>>>> phase is launched. Among other things, it
>>>>>>>>
>>>>>>>> * releases all swap devices created on target disk
>>>>>>>> * unmounts whatever is mounted on target disk
>>>>>>>>
>>>>>>>> For the latter, /etc/mnttab is read and if there is mounted 
>>>>>>>> device which is part of
>>>>>>>> the target disk, installer tries to unmount it.
>>>>>>>>
>>>>>>>> The problem is that after the fix for Bugzilla bug 30 was integrated,
>>>>>>>> UFS filesystems are mounted with the '-o m' option, which mounts the
>>>>>>>> filesystem without making an entry in /etc/mnttab. The mountpoints
>>>>>>>> are then hidden, so the installer can't see them and doesn't unmount
>>>>>>>> them.
>>>>>>>>
>>>>>>>> That said, this explains the UFS part of the problem (when the 'dd'
>>>>>>>> workaround works), but doesn't seem to be related to the ZFS part of
>>>>>>>> the issue, when the 'zpool import' workaround helped.
>>>>>>>>
>>>>>>>> *** (#6 of 9): 2008-11-11 16:25:09 GMT+00:00 jan.damborsky at sun.com
>>>>>>>> *** Last Edit: 2008-11-11 16:34:29 GMT+00:00 jan.damborsky at sun.com
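The /etc/mnttab explanation above can be sketched roughly as follows. This is not the installer's code: the function name is hypothetical, the input is assumed to be mnttab-style text with whitespace-separated fields and the special device first, and a mount counts as being "on the target disk" when the device's last path component is the disk name followed by an `s` or `p` suffix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count mnttab-style lines whose first field names a slice or partition
 * of the given disk, e.g. "/dev/dsk/c2t0d0s0" for disk "c2t0d0".
 */
size_t
count_mounts_on_disk(const char *mnttab, const char *disk)
{
	size_t count = 0;
	size_t disk_len;

	assert(mnttab != NULL && disk != NULL);
	disk_len = strlen(disk);

	while (*mnttab != '\0') {
		/* The special device is the first whitespace-delimited field. */
		size_t field_len = strcspn(mnttab, " \t\n");
		const char *base = mnttab;

		/* Find the last path component within that field. */
		for (size_t i = 0; i < field_len; i++) {
			if (mnttab[i] == '/')
				base = mnttab + i + 1;
		}
		size_t base_len = field_len - (size_t)(base - mnttab);

		/* Match "<disk>sN" or "<disk>pN". */
		if (disk_len > 0 && base_len > disk_len &&
		    strncmp(base, disk, disk_len) == 0 &&
		    (base[disk_len] == 's' || base[disk_len] == 'p'))
			count++;

		/* Advance to the start of the next line. */
		mnttab += strcspn(mnttab, "\n");
		if (*mnttab == '\n')
			mnttab++;
	}
	return (count);
}
```

A filesystem mounted without an mnttab entry simply never appears in the text being scanned, so a scan like this reports nothing to unmount even though the slice is still in use.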
>>>>>>>>
>>>>>>>> We should probably leave this bug to resolve zpool create not
>>>>>>>> removing evidence of the previous ufs fs, and file another one to
>>>>>>>> chase down the other issue(s?).
>>>>>>>>
>>>>>>>> Chris, if you run zdb -l on your virgin device, are you missing any
>>>>>>>> zfs properties? The reader in GRUB pretty much gives up if things
>>>>>>>> like the devid aren't set.
>>>>>>>>
>>>>>>>> *** (#7 of 9): 2008-11-11 19:30:56 GMT+00:00 
>>>>>>>> jan.setje-eilers at sun.com
>>>>>>>>
>>>>>>>> Concur that Chris' problem is different; the UFS superblock 
>>>>>>>> does not exist in
>>>>>>>> the first 256kb attached to the bug.  It appears as though 
>>>>>>>> phys_path and devid
>>>>>>>> are present, although it's difficult to be sure.  We should 
>>>>>>>> probably see if we can
>>>>>>>> send a debug version of Grub to Chris, with installation 
>>>>>>>> instructions, to see
>>>>>>>> why it seems unable to find the zfs.
>>>>>>>>
>>>>>>>> *** (#8 of 9): 2008-11-11 22:16:50 GMT+00:00 dan.mick at sun.com
>>>>>>>>
>>>>>>>> The root cause of 'UFS part' of this problem is in 'livecd 
>>>>>>>> code' and is tracked by
>>>>>>>> following Bugzilla bug:
>>>>>>>>
>>>>>>>> 4675 Fix for bug 30 causes ZFS label to be mangled - ending up 
>>>>>>>> in GRUB prompt after installing OpenSolaris
>>>>>>>>
>>>>>>>> Please feel free to use this bug (6769487) for tracking other 
>>>>>>>> part(s) of the problem.
>>>>>>>> Resetting category to solaris/kernel/zfs and Status to 
>>>>>>>> 'Dispatched'.
>>>>>>>>
>>>>>>>> *** (#9 of 9): 2008-11-12 12:46:21 GMT+00:00 jan.damborsky at sun.com
>>>>>>>>
>>>>>>>>
>>>>>>>> === *Comments* ===============================================================
>>>>>>>> Moved to public comments.
>>>>>>>>
>>>>>>>> *** (#1 of 6): 2008-11-10 17:04:10 GMT+00:00 jan.damborsky at sun.com
>>>>>>>> *** Last Edit: 2008-11-10 17:20:54 GMT+00:00 dave.miner at sun.com
>>>>>>>>
>>>>>>>> Same situation (without zfs) on:
>>>>>>>> White Box based on Intel DG33TL motherboard with ICH9R chipset, 
>>>>>>>> 2Gb memory, 3 SATA drives, 1 SATA CD/DVD, Intel graphics.
>>>>>>>>
>>>>>>>> *** (#2 of 6): 2008-11-10 22:52:23 GMT+00:00 pawel.wojcik at sun.com
>>>>>>>>
>>>>>>>> Workaround #1 does not cause the system to boot properly on the 
>>>>>>>> system I tried installing (that seems to be consistent with 
>>>>>>>> what others are reporting in the opensolaris defect report), 
>>>>>>>> but workaround #2 DOES.
>>>>>>>>
>>>>>>>> *** (#3 of 6): 2008-11-11 01:56:43 GMT+00:00 seth.goldberg at sun.com
>>>>>>>> *** Last Edit: 2008-11-11 03:41:48 GMT+00:00 seth.goldberg at sun.com
>>>>>>>>
>>>>>>>> I've reproduced this on a "virgin" disk, see the SR record against
>>>>>>>> this bug (I had to purchase a new spindle as the previous disk failed;
>>>>>>>> the new disk was removed from the supplier packaging, inserted into
>>>>>>>> the laptop, and then the 2008.11 CD was booted).
>>>>>>>>
>>>>>>>> After a discussion with Dan Mick over email, the data Dan requested
>>>>>>>> was the output of the root command at the grub prompt:
>>>>>>>>
>>>>>>>> (hd0,0,a): Filesystem type is zfs, partition type 0xbf
>>>>>>>>
>>>>>>>> Also, can you boot from the CD and collect the first 256kb of 
>>>>>>>> the disk, with
>>>>>>>>
>>>>>>>> dd if=<your s0 slice here> of=first.256kb bs=256k count=1
>>>>>>>>
>>>>>>>> This is attached.
>>>>>>>>
>>>>>>>> *** (#4 of 6): 2008-11-11 10:46:29 GMT+00:00 
>>>>>>>> christopher.armes at sun.com
>>>>>>>>
>>>>>>>> Saw this bug on several machines today which I was helping to 
>>>>>>>> install. One person did a reinstall and it worked fine the 
>>>>>>>> second time as some reported.
>>>>>>>>
>>>>>>>> 2 other machines could use the workaround which Lin Ling 
>>>>>>>> pointed us to with this bug. That did save a couple folks from 
>>>>>>>> having to reinstall, so was very helpful. Thanks Lin! Of the 
>>>>>>>> installs of people that installed to a hard drive (i.e., not 
>>>>>>>> within VirtualBox), about 12 systems, we saw this on 3 
>>>>>>>> machines, so about 25% of the systems in this small sampling.
>>>>>>>>
>>>>>>>> *** (#5 of 6): 2008-11-12 09:58:01 GMT+00:00 alan.duboff at sun.com
>>>>>>>>
>>>>>>>> Moved to public comments.
>>>>>>>>
>>>>>>>> *** (#6 of 6): 2008-11-12 12:43:18 GMT+00:00 jan.damborsky at sun.com
>>>>>>>> *** Last Edit: 2008-11-12 12:46:43 GMT+00:00 jan.damborsky at sun.com
>>>>>>>>
>>>>>>>>
>>>>>>>> === *Evaluation* =============================================================
>>>>>>>> See Description.
>>>>>>>>
>>>>>>>> *** (#1 of 4): 2008-11-11 03:23:04 GMT+00:00 seth.goldberg at sun.com
>>>>>>>>
>>>>>>>> Removed misleading evaluation.
>>>>>>>>
>>>>>>>> *** (#2 of 4): 2008-11-11 21:45:12 GMT+00:00 lin.ling at sun.com
>>>>>>>> *** Last Edit: 2008-11-11 23:16:07 GMT+00:00 lin.ling at sun.com
>>>>>>>>
>>>>>>>> What?  No, read the public comments.  The problem is that the 
>>>>>>>> UFS filesystem is still mounted as the installer lays down the 
>>>>>>>> ZFS.  Then, on reboot, the UFS, as
>>>>>>>> it's syncing, writes its superblock back to the filesystem it 
>>>>>>>> thinks it owns,
>>>>>>>> over the top of the now-ZFS-owned space.
>>>>>>>>
>>>>>>>> The installer must ensure that other filesystems are not 
>>>>>>>> mounted on the slice
>>>>>>>> where it's creating the ZFS rpool.
>>>>>>>>
>>>>>>>> *** (#3 of 4): 2008-11-11 22:11:35 GMT+00:00 dan.mick at sun.com
>>>>>>>>
>>>>>>>> You are right. I misunderstood.
>>>>>>>> George Wilson just corrected me that 'zpool create' indeed 
>>>>>>>> clears the space correctly:
>>>>>>>>
>>>>>>>> vdev_label_init() {
>>>>>>>>     :
>>>>>>>>         vp = zio_buf_alloc(sizeof (vdev_phys_t));
>>>>>>>>         bzero(vp, sizeof (vdev_phys_t));
>>>>>>>>     :
>>>>>>>>         bzero(vb, sizeof (vdev_boot_header_t));
>>>>>>>>     :
>>>>>>>> }
>>>>>>>>
>>>>>>>> Thanks for the clarification.
>>>>>>>>
>>>>>>>> *** (#4 of 4): 2008-11-11 22:49:04 GMT+00:00 lin.ling at sun.com
>>>>>>>>
>>>>>>>>
>>>>>>>> === *Suggested Fix* ==========================================================
>>>>>>>>
>>>>>>>> === *Workaround* =============================================================
>>>>>>>> [1] Boot LiveCD
>>>>>>>> $ pfexec su -
>>>>>>>> # zpool import -f rpool
>>>>>>>>
>>>>>>>> *** (#1 of 3): 2008-11-10 10:27:21 GMT+00:00 jan.damborsky at sun.com
>>>>>>>>
>>>>>>>> Zero out the leftover UFS magic:
>>>>>>>>
>>>>>>>> For GNU dd:
>>>>>>>> dd if=/dev/zero of=/dev/dsk/<SLICE> bs=1 count=4 seek=9564
>>>>>>>>
>>>>>>>> (e.g.:
>>>>>>>> dd if=/dev/zero of=/dev/dsk/c4t0d0s0 bs=1 count=4 seek=9564
>>>>>>>> )
>>>>>>>>
>>>>>>>> *** (#2 of 3): 2008-11-11 03:36:55 GMT+00:00 seth.goldberg at sun.com
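For anyone wondering where seek=9564 comes from, here is a sketch that checks an in-memory copy of the start of a slice for the UFS magic and clears it the way the dd workaround does. The offset split (8192 for the superblock plus 1372 for the magic field inside it) and the magic value 0x00011954 are from my reading of the UFS on-disk layout, so treat them as assumptions; the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants from the UFS on-disk layout. */
#define UFS_SBOFF        8192u        /* superblock starts 8K into the slice */
#define UFS_MAGIC_IN_SB  1372u        /* offset of the magic within the superblock */
#define UFS_MAGIC        0x00011954u

/* Byte offset of the magic within the slice: 8192 + 1372 = 9564. */
#define UFS_MAGIC_OFF    (UFS_SBOFF + UFS_MAGIC_IN_SB)

/* Return 1 if the buffer (first bytes of a slice) still carries the UFS magic. */
int
has_ufs_magic(const uint8_t *buf, size_t len)
{
	uint32_t magic;

	assert(buf != NULL);
	if (len < UFS_MAGIC_OFF + 4)
		return (0);
	/* x86 stores the 32-bit magic little-endian. */
	magic = (uint32_t)buf[UFS_MAGIC_OFF] |
	    ((uint32_t)buf[UFS_MAGIC_OFF + 1] << 8) |
	    ((uint32_t)buf[UFS_MAGIC_OFF + 2] << 16) |
	    ((uint32_t)buf[UFS_MAGIC_OFF + 3] << 24);
	return (magic == UFS_MAGIC);
}

/* Mimic the dd workaround on the in-memory copy: zero the four magic bytes. */
void
clear_ufs_magic(uint8_t *buf, size_t len)
{
	assert(buf != NULL);
	if (len < UFS_MAGIC_OFF + 4)
		return;
	for (size_t i = 0; i < 4; i++)
		buf[UFS_MAGIC_OFF + i] = 0;
}
```

The sketch works on a buffer rather than the device on purpose, so it can be tried against a saved image (such as the first.256kb file mentioned in the comments) without touching the disk.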
>>>>>>>>
>>>>>>>> I used dd as follows to work around the issue:
>>>>>>>>
>>>>>>>> root@opensolaris:~# dd if=/dev/zero of=/dev/dsk/c1t0d0s0 bs=1 count=4 seek=9564
>>>>>>>> 4+0 records in
>>>>>>>> 4+0 records out
>>>>>>>> 4 bytes (4 B) copied, 0.0394095 s, 0.1 kB/s
>>>>>>>> root@opensolaris:~#
>>>>>>>>
>>>>>>>> *** (#3 of 3): 2008-11-11 19:07:04 GMT+00:00 mary.ding at sun.com
>>>>>>>>
>>>>>>>>
>>>>>>>> === *Justification* ==========================================================
>>>>>>>> Priority changed from [] to [1-Very High]
>>>>>>>> Installed OpenSolaris 2008.11 doesn't boot
>>>>>>>> jan.damborsky at sun.com 2008-11-10 10:27:21 GMT
>>>>>>>>
>>>>>>>> *** (#1 of 1): 2008-11-10 10:27:21 GMT+00:00 jan.damborsky at sun.com
>>>>>>>>
>>>>>>>>
>>>>>>>> === *Additional Details* =====================================================
>>>>>>>>         Targeted Release:
>>>>>>>>         Commit To Fix In Build:
>>>>>>>>         Fixed In Build:
>>>>>>>>         Integrated In Build:
>>>>>>>>         Verified In Build:
>>>>>>>>   See Also: 6769534
>>>>>>>>   Duplicate of:
>>>>>>>>   Hooks:
>>>>>>>>         Hook1:
>>>>>>>>         Hook2:
>>>>>>>>         Hook3:
>>>>>>>>         Hook4:
>>>>>>>>         Hook5:
>>>>>>>>         Hook6:
>>>>>>>>   Interest List: dan.mick at sun.com, dave.miner at sun.com,
>>>>>>>>     david.comay at sun.com, frank.batschulat at sun.com,
>>>>>>>>     kerberos-iteam at Sun.COM, lin.ling at sun.com, nick.todd at sun.com,
>>>>>>>>     peter.dennis at sun.com, plus1tb at sun.com, sdg at sun.com,
>>>>>>>>     si-bugs at sun.com, sst-prg at sun.com, tomas.hurka at sun.com
>>>>>>>>   Program Management: New Defect
>>>>>>>>   Root Cause:
>>>>>>>>   Is a Security Vulnerability?: No
>>>>>>>>   Fix Affects Documentation: No
>>>>>>>>   Fix Affects Localization: No
>>>>>>>>   Reported by:
>>>>>>>> === *History* ================================================================
>>>>>>>>         Date Submitted: 2008-11-10 10:27:21 GMT+00:00
>>>>>>>>         Submitted By: jan.damborsky at sun.com
>>>>>>>>
>>>>>>>>         Status Changed    Date Updated                  Updated By
>>>>>>>>         3-Accepted        2008-11-10 23:59:05 GMT+00:00 lin.ling at sun.com
>>>>>>>>         5-Cause Known     2008-11-11 03:23:04 GMT+00:00 seth.goldberg at sun.com
>>>>>>>>         1-Dispatched      2008-11-12 12:43:18 GMT+00:00 jan.damborsky at sun.com
>>>>>>>>
>>>>>>>>
>>>>>>>> === *Solution* ===============================================================
>>>>>>>>
>>>>>>>>
>>>>>>>> === *Service Request* ========================================================
>>>>>>>>         ID: 1-493023606
>>>>>>>>         Customer:
>>>>>>>>         Account Name: Sun Microsystems
>>>>>>>>         Customer Contact:
>>>>>>>>         Customer Contact Role: D-Development
>>>>>>>>         Customer Contact Type: I-Internal (SMI) Customer
>>>>>>>>         Impact: Critical
>>>>>>>>         Functionality: Primary
>>>>>>>>         Severity: 1
>>>>>>>>         Synopsis:
>>>>>>>>         Product Name: solaris
>>>>>>>>         Product Release: osol_2008.11
>>>>>>>>         Product Build:
>>>>>>>>         Operating System: osol_2008.11
>>>>>>>>         Hardware: generic
>>>>>>>>         Reference Number:
>>>>>>>>         Sun Contact: jan.damborsky at sun.com
>>>>>>>>         Status: Open
>>>>>>>>         Source: BugTraq2
>>>>>>>>         Reproducible:
>>>>>>>>         Submitted By: jan.damborsky at sun.com
>>>>>>>>         Submitted Date: 2008-11-10 10:27:21 GMT+00:00
>>>>>>>>         Description:
>>>>>>>>
>>>>>>>> === *Service Request* 
>>>>>>>> ========================================================
>>>>>>>>         ID: 1-493053806
>>>>>>>>         Customer:
>>>>>>>>         Account Name: SUN MicroSystems
>>>>>>>>         Customer Contact:         Customer Contact Role: 
>>>>>>>> D-Development
>>>>>>>>         Customer Contact Type: I-Internal (SMI) Customer
>>>>>>>>         Impact: Critical
>>>>>>>>         Functionality: Primary
>>>>>>>>         Severity: 1
>>>>>>>>         Synopsis: After installing 2008.11RC1b boot from hard 
>>>>>>>> disk fails
>>>>>>>>         Product Name: solaris
>>>>>>>>         Product Release: osol_2008.11
>>>>>>>>         Product Build:         Operating System: osol_2008.11
>>>>>>>>         Hardware: x86
>>>>>>>>         Reference Number:         Sun Contact: 
>>>>>>>> christopher.armes at sun.com
>>>>>>>>         Status: Open
>>>>>>>>         Source: BugTraq2
>>>>>>>>         Reproducible: Always
>>>>>>>>         Submitted By: christopher.armes at sun.com
>>>>>>>>         Submitted Date: 2008-11-10 12:54:24 GMT+00:00
>>>>>>>>         Description: Booting from the livecd and then selecting 
>>>>>>>> install works fine. Upon reboot, either with the cd in and 
>>>>>>>> selecting boot from hard disk, or without the cd and letting the 
>>>>>>>> grub menu boot, the boot fails and drops the system to the 
>>>>>>>> "grub>" prompt.
>>>>>>>>
>>>>>>>>
>>>>>>>> === *Service Request* 
>>>>>>>> ========================================================
>>>>>>>>         ID: 1-493177108
>>>>>>>>         Customer:
>>>>>>>>         Account Name: SUN
>>>>>>>>         Customer Contact:         Customer Contact Role: 
>>>>>>>> D-Development
>>>>>>>>         Customer Contact Type: I-Internal (SMI) Customer
>>>>>>>>         Impact: Critical
>>>>>>>>         Functionality: Primary
>>>>>>>>         Severity: 1
>>>>>>>>         Synopsis:         Product Name: solaris
>>>>>>>>         Product Release: osol_2008.11
>>>>>>>>         Product Build: osol_2008.11
>>>>>>>>         Operating System: osol_2008.11
>>>>>>>>         Hardware: amd
>>>>>>>>         Reference Number:         Sun Contact: 
>>>>>>>> garrett.damore at sun.com
>>>>>>>>         Status:         Source: BugTraq2
>>>>>>>>         Reproducible:         Submitted By: garrett.damore at sun.com
>>>>>>>>         Submitted Date: 2008-11-10 20:16:41 GMT+00:00
>>>>>>>>         Description: I hit this when updating my Ultra 20 
>>>>>>>> (original model, not M2) from b77ish to OSOL 2008.11rc1b
>>>>>>>>
>>>>>>>> System has 1.5GB ram, SATA hard disk.
>>>>>>>>
>>>>>>>>
>>>>>>>> === *Service Request* 
>>>>>>>> ========================================================
>>>>>>>>         ID: 1-493257401
>>>>>>>>         Customer:
>>>>>>>>         Account Name: Sun Microsystems, Inc.
>>>>>>>>         Customer Contact:         Customer Contact Role: 
>>>>>>>> D-Development
>>>>>>>>         Customer Contact Type: I-Internal (SMI) Customer
>>>>>>>>         Impact: Critical
>>>>>>>>         Functionality: Primary
>>>>>>>>         Severity: 1
>>>>>>>>         Synopsis:         Product Name: solaris
>>>>>>>>         Product Release: osol_2008.11
>>>>>>>>         Product Build: osol_2008.11
>>>>>>>>         Operating System: osol_2008.11
>>>>>>>>         Hardware: generic_ibm_compatible
>>>>>>>>         Reference Number:         Sun Contact: dana.myers at sun.com
>>>>>>>>         Status: Open
>>>>>>>>         Source: BugTraq2
>>>>>>>>         Reproducible:         Submitted By: dana.myers at sun.com
>>>>>>>>         Submitted Date: 2008-11-10 22:34:45 GMT+00:00
>>>>>>>>         Description:
>>>>>>>>
>>>>>>>> === *Service Request* 
>>>>>>>> ========================================================
>>>>>>>>         ID: 1-493265801
>>>>>>>>         Customer:
>>>>>>>>         Account Name: Sun Microsystems
>>>>>>>>         Customer Contact: pawel.wojcik at sun.com
>>>>>>>>         Customer Contact Role: D-Development
>>>>>>>>         Customer Contact Type: I-Internal (SMI) Customer
>>>>>>>>         Impact: Critical
>>>>>>>>         Functionality: Primary
>>>>>>>>         Severity: 1
>>>>>>>>         Synopsis:         Product Name: solaris
>>>>>>>>         Product Release: osol_2008.11
>>>>>>>>         Product Build: osol_2008.11
>>>>>>>>         Operating System: solaris
>>>>>>>>         Hardware: intel
>>>>>>>>         Reference Number:         Sun Contact: 
>>>>>>>> pawel.wojcik at sun.com
>>>>>>>>         Status:         Source: BugTraq2
>>>>>>>>         Reproducible:         Submitted By: pawel.wojcik at sun.com
>>>>>>>>         Submitted Date: 2008-11-10 22:50:53 GMT+00:00
>>>>>>>>         Description:
>>>>>>>>
>>>>>>>> === *Activity* 
>>>>>>>> ===============================================================
>>>>>>>>
>>>>>>>>
>>>>>>>> === *Multiple Release (MR) Cluster* - 0 
>>>>>>>> ======================================
>>>>>>>>
>>>>>>>>
>>>>>>>> === *Escalations* 
>>>>>>>> ============================================================
>>>>>>>>
>>>>>>>>   
>>
>> _______________________________________________
>> caiman-discuss mailing list
>> caiman-discuss at opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
>

