Re: [pca] zpool unavailable after Kernel Patch 142909-17
On Thu, 21 Oct 2010, Paul B. Henson wrote:

> On 10/20/2010 7:29 AM, Glen Gunselman wrote:
> Sun's x86 machines are definitely a step down from the capabilities of
> their SPARC systems, even ones years older. If a problem like this
> occurred on one of my SPARC servers, I'd simply break into the PROM and
> force a kernel panic, resulting in a nice juicy crash dump suitable for
> forensic analysis. As far as I can tell, the only way on x86 to make
> this happen is to initially boot the system from the kernel debugger,
> and leave it running under the debugger indefinitely until the problem
> occurs 8-/. I'm also still disgusted about the continued lack of serial
> console logging support on the ILOM :(.

You can force a panic on x86 via an NMI through an IPMI interface if the hardware supports it and you've set the right parameters in /etc/system. Here's an article on how to do it:

http://www.cuddletech.com/blog/pivot/entry.php?id=1044

Regards,

Andy
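The mechanism Andy describes comes down to two pieces: telling the Solaris kernel to panic (and dump) on NMI, and delivering an NMI through the service processor with ipmitool. A minimal sketch of the commands involved, based on my reading of the article above; the service-processor hostname and user are placeholders, and the /etc/system change requires a reboot:

```shell
# On the Solaris 10 x86 host: panic on NMI instead of entering the
# debugger (takes effect after a reboot).
echo 'set pcplusmp:apic_panic_on_nmi = 1' >> /etc/system

# Confirm a dump device is configured so the panic leaves a crash dump.
dumpadm

# Later, when the box hangs: deliver a diagnostic interrupt (NMI)
# via IPMI (hostname and user here are placeholders).
ipmitool -I lanplus -H sp-hostname -U root chassis power diag
```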
Re: [pca] zpool unavailable after Kernel Patch 142909-17
Sounds like the kind of problem that DTrace could help to diagnose (though you would need to have a good idea of where the problem is to start with, so you place your probes in the right area)... Perhaps there might be something in the DTrace Toolkit that could help - see http://hub.opensolaris.org/bin/view/Community+Group+dtrace/dtracetoolkit

DTrace should allow you to examine your system while it's still available (as opposed to forcing the kernel to core dump), though this is obviously not the case if it's hung :)

Just a thought...

-Don

Andy Fiddaman wrote:
> You can force a panic on x86 via an NMI through an IPMI interface if the
> hardware supports it and you've set the right parameters in /etc/system.
> Here's an article on how to do it:
> http://www.cuddletech.com/blog/pivot/entry.php?id=1044
> [...]
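In the same spirit, one illustrative first pass (the choice of probes here is a guess; which ones are actually useful depends on where the hang lives) is to count calls into the zfs kernel module and see where activity concentrates:

```shell
# Count entries into every zfs-module function for 10 seconds; on a
# wedged pool, threads often pile into a small set of functions.
dtrace -n 'fbt:zfs::entry { @calls[probefunc] = count(); } tick-10s { exit(0); }'
```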
Re: [pca] zpool unavailable after Kernel Patch 142909-17
On 10/20/2010 7:29 AM, Glen Gunselman wrote:

> Thanks for the info. I am supporting an X4500 with 1TB disks. It has an
> ongoing problem where it hangs (cannot invoke kernel debugger but no
> kernel dump/panic) and support has suggested I try S10U9 (currently
> running S10U7).

Interesting; I have 5 X4500's of similar configuration. Over the past six months I have had three occurrences of inexplicable hangs. Nothing is logged on either the system or the console. Based on circumstantial evidence I believe access to ZFS is wedging. Initially the ssh port connects but never displays a banner (telnet to port 22, I mean), and if left alone, it eventually refuses connections entirely. Given the complete lack of diagnostic evidence I hadn't bothered opening a support ticket, and was also planning to just try U9 to see if it perhaps goes away, as there are supposedly many ZFS fixes in U9.

Sun's x86 machines are definitely a step down from the capabilities of their SPARC systems, even ones years older. If a problem like this occurred on one of my SPARC servers, I'd simply break into the PROM and force a kernel panic, resulting in a nice juicy crash dump suitable for forensic analysis. As far as I can tell, the only way on x86 to make this happen is to initially boot the system from the kernel debugger, and leave it running under the debugger indefinitely until the problem occurs 8-/. I'm also still disgusted about the continued lack of serial console logging support on the ILOM :(.

I did finally get U9 running on my test X4500 (after battling some scalability issues with Live Upgrade, sordid details available at http://opensolaris.org/jive/thread.jspa?messageID=503177&tstart=0 if anyone's interested), and so far it seems to be working fine. All of the disks were recognized, and there were no problems importing or accessing any of the pools. I don't think this particular bug is a reason to delay an upgrade on this specific hardware...

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
Re: [pca] zpool unavailable after Kernel Patch 142909-17
> Sun's x86 machines are definitely a step down from the capabilities of
> their SPARC systems, even ones years older.

I'm sorry for being OT here, but I agree completely. I'll always spend the extra money for a SPARC server, and at least in every case in the past 15 years, they have never failed me.

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
Re: [pca] zpool unavailable after Kernel Patch 142909-17
Paul B. Henson wrote:

> Sounds like it will either work fine or be broken depending on your
> hardware. We've got X4500's with 1TB disks, has anybody had any problems
> with U9 on that hardware platform?

I have an X4500 with 500GB disks, on which I installed 142910-17 when it came out (Sep 07). On Sep 29 I re-installed it from scratch with U9. All the ZFS pools survived both procedures and I haven't had any problems.

So - no guarantee that it will work for you, but at least proof that it doesn't harm all systems.

Martin.
Re: [pca] zpool unavailable after Kernel Patch 142909-17
Not sure if this is helpful or not, but here is the bug report for OpenSolaris:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6967658

*Synopsis* sd_send_scsi_READ_CAPACITY_16() needs to handle SBC-2 and SBC-3 response formats

On Thu, Oct 14, 2010 at 5:22 AM, Don O'Malley <don.omal...@oracle.com> wrote:

> I'm trying to track down any SunAlerts related to this issue, but can't
> find any yet...
>
> [...]
>
> --
> Don O'Malley
> Manager, Patch System Test
> Revenue Product Engineering | Solaris | Hardware
> East Point Business Park, Dublin 3, Ireland
> Phone: +353 1 8199764
> Team Alias: rpe_patch_system_test...@oracle.com
Re: [pca] zpool unavailable after Kernel Patch 142909-17
Hi Thomas,

I have asked a colleague about this and he believes that you have hit CR 6967658, which is a direct result of installing 142909-17. There is no generic patch delivering a fix yet, but I thought this might help if you are contacting support.

Best,
-Don

Bleek Thomas wrote:
> Hello, just a warning and perhaps a request for some advice. We have a
> Sun StorEdge SE3510 connected to a V240. This RAID is used as a JBOD
> (12 independent disks, 5x2 mirrored, 2 spares) for ZFS and patch
> testing. I have 2 pools, one on this array, another on 2 local SCSI
> disks. After installing all current patches I can't mount the pool on
> the RAID; the local one works.
> [...]
> After backing out the kernel patch 142909-17 the problem has vanished,
> but now I don't know how to proceed :-(
>
> TIA, Thomas

--
Don O'Malley
Manager, Patch System Test
Revenue Product Engineering | Solaris | Hardware
East Point Business Park, Dublin 3, Ireland
Phone: +353 1 8199764
Team Alias: rpe_patch_system_test...@oracle.com
Re: [pca] zpool unavailable after Kernel Patch 142909-17
I did a search on SunSolve on CR 6967658 and came across another bug report (6984043) on Update 9 (which has 142909-17 as its kernel patch). It claims that after upgrading to Update 9, "zpool create" consistently panics the server. Did anyone run into this problem?

Thanks

Ying Xu
y...@littonloan.com
Unix Group
Office: 713-218-4508
BB: 832-671-6633
4828 Loop Central Dr. Houston TX 77081

From: pca-boun...@lists.univie.ac.at [mailto:pca-boun...@lists.univie.ac.at] On Behalf Of Don O'Malley
Sent: Wednesday, October 13, 2010 1:07 PM
To: PCA (Patch Check Advanced) Discussion
Subject: Re: [pca] zpool unavailable after Kernel Patch 142909-17

> Hi Thomas,
>
> I have asked a colleague about this and he believes that you have hit CR
> 6967658, which is a direct result of installing 142909-17. There is no
> generic patch delivering a fix yet, but thought this might help if you
> are contacting support.
>
> Best,
> -Don
>
> [...]
Re: [pca] zpool unavailable after Kernel Patch 142909-17
On Wed, 13 Oct 2010, Xu, Ying (Houston) wrote:

> I did a search on sunsolve on CR 6967658 and came across another bug
> report (6984043) on update 9 (which has 142909-17 as kernel patch). It
> claims that after upgrading to update 9, zpool create consistently
> panics the server. Did anyone run into this problem?

Ouch, we were just about to deploy U9 8-/, wonder if it might be a good idea to wait...

The status of 6967658 is "Fix Delivered"; it seems to have been fixed in build 148 (details now hidden in the secret Oracle repo, I suppose), so presumably they're working on an S10 patch. 6984043 is closed as a duplicate of 6967658.

Sounds like it will either work fine or be broken depending on your hardware. We've got X4500's with 1TB disks, has anybody had any problems with U9 on that hardware platform?

Thanks...

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
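For anyone trying to work out whether a given box is exposed before deciding to wait, the installed patch level can be checked with standard Solaris 10 tools (a sketch; the patch IDs are the ones discussed in this thread, 142909 on SPARC and 142910 on x86):

```shell
# Is the suspect kernel patch installed?
showrev -p | egrep '142909|142910'

# The running kernel's patch level also shows up in the version string.
uname -v
```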
[pca] zpool unavailable after Kernel Patch 142909-17
Hello, just a warning and perhaps a request for some advice.

We have a Sun StorEdge SE3510 connected to a V240. This RAID is used as a JBOD (12 independent disks, 5x2 mirrored, 2 spares) for ZFS and patch testing. I have 2 pools, one on this array, another on 2 local SCSI disks. After installing all current patches I can't mount the pool on the RAID; the local one works. The disks are seen with format; zpool status (actually zpool import) gives:

r...@nftp:/ zpool import
  pool: tank
    id: 10696630212093874974
 state: UNAVAIL
status: The pool is formatted using an older on-disk version.
action: The pool cannot be imported due to damaged devices or data.
config:

        tank         UNAVAIL  insufficient replicas
          mirror-0   UNAVAIL  corrupted data
            c2t40d0  ONLINE
            c2t40d1  ONLINE
          mirror-1   UNAVAIL  corrupted data
            c2t40d3  ONLINE
            c2t40d2  ONLINE
          mirror-2   UNAVAIL  corrupted data
            c2t40d4  ONLINE
            c2t40d5  ONLINE
          mirror-3   UNAVAIL  corrupted data
            c2t40d6  ONLINE
            c2t40d7  ONLINE
          mirror-4   UNAVAIL  corrupted data
            c2t40d8  ONLINE
            c2t40d9  ONLINE
r...@nftp:/

2 things I have noticed:

1. The two spares have vanished (but they are seen with format).
2. The names of the submirrors have changed; before the patch, they were all named simply "mirror".

I have still not done a "zpool upgrade" because I assume that I will not be able to mount the pool on the older, unpatched system. After booting into the old BE (thanks, Live Upgrade), the pool is online again. So I tried to find the guilty patch by backing out patch after patch. After backing out the kernel patch 142909-17 the problem has vanished, but now I don't know how to proceed :-(

Any hints, other than opening a case, which I will do after getting no responses?

TIA, Thomas
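For reference, the backout path Thomas describes (returning to the old BE via Live Upgrade, or removing the patch) boils down to a couple of commands. This is a sketch of the steps, not a tested procedure, and the boot environment name is a placeholder:

```shell
# Option 1: re-activate the pre-patch boot environment and reboot into it.
luactivate s10u7-prepatch   # BE name is hypothetical
init 6

# Option 2: back out the kernel patch from the current boot environment.
patchrm 142909-17
```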
Re: [pca] zpool unavailable after Kernel Patch 142909-17
> After installing all current patches I can't mount the pool on the Raid,
> the local one works. The disks are seen with format, zpool status
> (actually zpool import) gives:
>
> [zpool import output trimmed; the pool and all five mirror vdevs show
> UNAVAIL, with every underlying disk ONLINE]

Fascinating. This looks like an undocumented change in the way zpool status is reported. I see the same thing here:

$ uname -a
SunOS mercury 5.10 Generic_142909-17 sun4u sparc SUNW,Sun-Blade-2500
$
$ zpool status
  pool: mercury_rpool
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool
        can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: resilver completed after 0h0m with 0 errors on Wed Sep 29 17:26:25 2010
config:

        NAME           STATE     READ WRITE CKSUM
        mercury_rpool  ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            c3t0d0s0   ONLINE       0     0     0
            c1t2d0s0   ONLINE       0     0     0  6.74M resilvered

See "mirror-0"? I didn't do that. Too bad the sources for zpool status are not open anymore :-(

Dennis