Re: [pca] zpool unavailable after Kernel Patch 142909-17

2010-10-22 Thread Andy Fiddaman


On Thu, 21 Oct 2010, Paul B. Henson wrote:

; On 10/20/2010 7:29 AM, Glen Gunselman wrote:
; Sun's x86 machines are definitely a step down from the capabilities of their
; SPARC systems, even ones years older. If a problem like this occurred on one
; of my SPARC servers, I'd simply break into the prom and force a kernel panic,
; resulting in a nice juicy crash dump suitable for forensic analysis. As far as
; I can tell, the only way on x86 to make this happen is to initially boot the
; system from the kernel debugger, and leave it running under the debugger
; indefinitely until the problem occurs 8-/. I'm also still disgusted about the
; continued lack of serial console logging support on the ILOM :(.

You can force a panic on x86 via an NMI through an IPMI interface if the
hardware supports it and you've set the right parameters in /etc/system.

Here's an article on how to do it:

http://www.cuddletech.com/blog/pivot/entry.php?id=1044
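
From memory (so treat the exact tunable name as an assumption and check the
article), it's a one-line /etc/system change plus an NMI pulsed from the
service processor:

   # /etc/system: panic (and take a crash dump) instead of ignoring an NMI
   set pcplusmp:apic_panic_on_nmi = 1

   # then, from another host, pulse a diagnostic interrupt (NMI) at the CPUs:
   ipmitool -I lanplus -H <sp-address> -U root chassis power diag

savecore then leaves a dump in /var/crash for mdb to pick apart.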

Regards,

Andy



Re: [pca] zpool unavailable after Kernel Patch 142909-17

2010-10-22 Thread Don O'Malley
Sounds like the kind of problem that DTrace could help to diagnose 
(though you would need a good idea of where the problem is to start 
with, so you can place your probes in the right area)...


Perhaps there might be something in the DTrace Toolkit that could help - see 
http://hub.opensolaris.org/bin/view/Community+Group+dtrace/dtracetoolkit


DTrace should allow you to examine your system while it's still 
available (as opposed to forcing the kernel to core dump), though this 
is obviously not the case if it's hung :)
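
To make that concrete, a trivial sketch only (the interesting probes depend
entirely on where you suspect the hang): count which ZFS kernel functions are
still firing while the box is alive:

   # count entries into every function in the zfs kernel module
   dtrace -n 'fbt:zfs::entry { @[probefunc] = count(); }'

If those counts stop moving while the rest of the system still responds,
that narrows down where things are wedging.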


Just a thought...
-Don

Andy Fiddaman wrote:

On Thu, 21 Oct 2010, Paul B. Henson wrote:

; On 10/20/2010 7:29 AM, Glen Gunselman wrote:
; Sun's x86 machines are definitely a step down from the capabilities of their
; SPARC systems, even ones years older. If a problem like this occurred on one
; of my SPARC servers, I'd simply break into the prom and force a kernel panic,
; resulting in a nice juicy crash dump suitable for forensic analysis. As far as
; I can tell, the only way on x86 to make this happen is to initially boot the
; system from the kernel debugger, and leave it running under the debugger
; indefinitely until the problem occurs 8-/. I'm also still disgusted about the
; continued lack of serial console logging support on the ILOM :(.

You can force a panic on x86 via an NMI through an IPMI interface if the
hardware supports it and you've set the right parameters in /etc/system.

Here's an article on how to do it:

http://www.cuddletech.com/blog/pivot/entry.php?id=1044

Regards,

Andy

  




Re: [pca] zpool unavailable after Kernel Patch 142909-17

2010-10-21 Thread Paul B. Henson

On 10/20/2010 7:29 AM, Glen Gunselman wrote:


Thanks for the info. I am supporting an X4500 with 1TB disks. It has an
ongoing problem where it hangs (I cannot invoke the kernel debugger, and
there is no kernel dump/panic), and support has suggested I try S10U9
(currently running S10U7).


Interesting; I have five X4500s of similar configuration. Over the past 
six months I have had three occurrences of inexplicable hangs. Nothing 
is logged on either the system or the console. Based on circumstantial 
evidence I believe access to ZFS is wedging. Initially the ssh port 
accepts connections but never displays a banner (that is, a telnet to 
port 22 connects but prints nothing), and if left alone, the system 
eventually refuses connections entirely. Given the complete lack of 
diagnostic evidence I hadn't bothered opening a support ticket, and was 
also planning to just try U9 to see if the problem goes away, as there 
are supposedly many ZFS fixes in U9.
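
The check itself is trivial; a healthy box prints the sshd version banner as
soon as the TCP connection completes, something like:

   $ telnet wedged-host 22
   Trying ...
   Connected to wedged-host.
   SSH-2.0-Sun_SSH_1.1

In the wedged state that last line never arrives.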


Sun's x86 machines are definitely a step down from the capabilities of 
their SPARC systems, even ones years older. If a problem like this 
occurred on one of my SPARC servers, I'd simply break into the prom and 
force a kernel panic, resulting in a nice juicy crash dump suitable for 
forensic analysis. As far as I can tell, the only way on x86 to make 
this happen is to initially boot the system from the kernel debugger, 
and leave it running under the debugger indefinitely until the problem 
occurs 8-/. I'm also still disgusted about the continued lack of serial 
console logging support on the ILOM :(.
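
For completeness, and from memory rather than a live test (so treat the
details as an assumption): on S10 x86 that means adding -k to the GRUB kernel
line so kmdb is loaded at boot, and then, when the hang finally happens,
typing F1-A on the console to drop into the debugger and force a dump:

   kernel /platform/i86pc/multiboot -k   # menu.lst entry, kmdb loaded at boot
   ...
   [0]> $<systemdump                     # kmdb macro: panic and take a dump

Workable, but hardly a substitute for a prom break.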


I did finally get U9 running on my test X4500 (after battling some 
scalability issues with Live Upgrade; sordid details available at 
http://opensolaris.org/jive/thread.jspa?messageID=503177&tstart=0 if 
anyone's interested), and so far it seems to be working fine. All of the 
disks were recognized, and there were no problems importing or accessing 
any of the pools. I don't think this particular bug is a reason to delay 
an upgrade on this specific hardware...



--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768



Re: [pca] zpool unavailable after Kernel Patch 142909-17

2010-10-21 Thread Dennis Clarke

 Sun's x86 machines are definitely a step down from the capabilities of
 their SPARC systems, even ones years older.

I'm sorry for being OT here but I agree completely. I'll always spend the
extra money for a SPARC server, and in every case over the past 15 years
they have never failed me.


-- 
Dennis Clarke
dcla...@opensolaris.ca  - Email related to the open source Solaris
dcla...@blastwave.org   - Email related to open source for Solaris





Re: [pca] zpool unavailable after Kernel Patch 142909-17

2010-10-14 Thread Martin Paul

Paul B. Henson wrote:


Sounds like it will either work fine or be broken depending on your
hardware. We've got X4500's with 1TB disks; has anybody had any problems
with U9 on that hardware platform?


I have an X4500 with 500GB disks, on which I installed 142910-17 when it 
came out (Sep 07). On Sep 29 I re-installed it from scratch with U9. All 
the ZFS pools survived both procedures and I haven't had any problems.


So - no guarantee that it will work for you, but at least proof that 
it doesn't harm all systems.


Martin.



Re: [pca] zpool unavailable after Kernel Patch 142909-17

2010-10-14 Thread Jon Price
Not sure if this is helpful or not, but here is the bug report for
OpenSolaris:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6967658

Synopsis: sd_send_scsi_READ_CAPACITY_16() needs to handle SBC-2 and SBC-3
response formats


On Thu, Oct 14, 2010 at 5:22 AM, Don O'Malley don.omal...@oracle.com wrote:

  I'm trying to track down any SunAlerts related to this issue, but can't
 find any yet...


 Martin Paul wrote:

 Paul B. Henson wrote:

 Sounds like it will either work fine or be broken depending on your
 hardware. We've got X4500's with 1TB disks; has anybody had any problems
 with U9 on that hardware platform?


 I have an X4500 with 500GB disks, on which I installed 142910-17 when it
 came out (Sep 07). On Sep 29 I re-installed it from scratch with U9. All the
 ZFS pools survived both procedures and I haven't had any problems.

 So - no guarantee that it will work for you, but at least proof that it
 doesn't harm all systems.

 Martin.


 --
 Don O'Malley
 Manager, Patch System Test
 Revenue Product Engineering | Solaris | Hardware
 East Point Business Park, Dublin 3, Ireland
 Phone: +353 1 8199764
 Team Alias: rpe_patch_system_test...@oracle.com


Re: [pca] zpool unavailable after Kernel Patch 142909-17

2010-10-13 Thread Don O'Malley




Hi Thomas,

I have asked a colleague about this and he believes that you have hit
CR 6967658, which is a direct result of installing 142909-17.

There is no generic patch delivering a fix yet, but I thought this might
help if you are contacting support.

Best,
-Don


Bleek Thomas wrote:

  Hello,

just a warning and perhaps a request for some advice.

We have a Sun StorEdge SE3510 connected to a V240. This RAID is used as a JBOD (12 independent disks, 5x2 mirrored, 2 spares) for ZFS and patch testing.
I have 2 pools, one on this array, another on 2 local SCSI disks.

After installing all current patches I can't mount the pool on the RAID; the local one works. The disks are seen with format; zpool status (actually zpool import) gives:
r...@nftp:/zpool import
  pool: tank
id: 10696630212093874974
 state: UNAVAIL
status: The pool is formatted using an older on-disk version.
action: The pool cannot be imported due to damaged devices or data.
config:

tank UNAVAIL  insufficient replicas
  mirror-0   UNAVAIL  corrupted data
c2t40d0  ONLINE
c2t40d1  ONLINE
  mirror-1   UNAVAIL  corrupted data
c2t40d3  ONLINE
c2t40d2  ONLINE
  mirror-2   UNAVAIL  corrupted data
c2t40d4  ONLINE
c2t40d5  ONLINE
  mirror-3   UNAVAIL  corrupted data
c2t40d6  ONLINE
c2t40d7  ONLINE
  mirror-4   UNAVAIL  corrupted data
c2t40d8  ONLINE
c2t40d9  ONLINE
r...@nftp:/

2 things I have noticed:
1. The two spares have vanished (but they are seen with format)
2. The names of the submirrors have changed; before the patch, they were all named simply "mirror".

I have still not done a "zpool upgrade" because I assume that I will not be able to import the pool on the older, unpatched system.

After booting into the old BE (thanks, Live Upgrade), the pool is online again.

So I tried to find the guilty patch by backing out patches one by one.
After backing out the kernel patch 142909-17 the problem vanished, but now I don't know how to proceed :-(

Any hints other than opening a case, which I will do if I get no responses?

TIA,
Thomas


  


-- 
Don O'Malley
Manager, Patch System Test
Revenue Product Engineering | Solaris | Hardware
East Point Business Park, Dublin 3, Ireland
Phone: +353 1 8199764
Team Alias: rpe_patch_system_test...@oracle.com





Re: [pca] zpool unavailable after Kernel Patch 142909-17

2010-10-13 Thread Xu, Ying (Houston)
I did a search on SunSolve for CR 6967658 and came across another bug
report (6984043) against Update 9 (which has 142909-17 as its kernel
patch). It claims that after upgrading to Update 9, zpool create
consistently panics the server. Did anyone run into this problem?
 
Thanks
 
Ying Xu y...@littonloan.com
Unix Group
Office: 713-218-4508
BB: 832-671-6633
4828 Loop Central Dr. Houston TX 77081
 
 



From: pca-boun...@lists.univie.ac.at
[mailto:pca-boun...@lists.univie.ac.at] On Behalf Of Don O'Malley
Sent: Wednesday, October 13, 2010 1:07 PM
To: PCA (Patch Check Advanced) Discussion
Subject: Re: [pca] zpool unavailable after Kernel Patch 142909-17


Hi Thomas,

I have asked a colleague about this and he believes that you have hit CR
6967658, which is a direct result of installing 142909-17.

There is no generic patch delivering a fix yet, but I thought this might
help if you are contacting support.

Best,
-Don


Bleek Thomas wrote: 

Hello,

just a warning and perhaps a request for some advice.

We have a Sun StorEdge SE3510 connected to a V240. This RAID is
used as a JBOD (12 independent disks, 5x2 mirrored, 2 spares) for ZFS
and patch testing.
I have 2 pools, one on this array, another on 2 local
SCSI disks.

After installing all current patches I can't mount the pool on
the RAID; the local one works. The disks are seen with format; zpool
status (actually zpool import) gives:
r...@nftp:/zpool import
  pool: tank
id: 10696630212093874974
 state: UNAVAIL
status: The pool is formatted using an older on-disk version.
action: The pool cannot be imported due to damaged devices or
data.
config:

tank UNAVAIL  insufficient replicas
  mirror-0   UNAVAIL  corrupted data
c2t40d0  ONLINE
c2t40d1  ONLINE
  mirror-1   UNAVAIL  corrupted data
c2t40d3  ONLINE
c2t40d2  ONLINE
  mirror-2   UNAVAIL  corrupted data
c2t40d4  ONLINE
c2t40d5  ONLINE
  mirror-3   UNAVAIL  corrupted data
c2t40d6  ONLINE
c2t40d7  ONLINE
  mirror-4   UNAVAIL  corrupted data
c2t40d8  ONLINE
c2t40d9  ONLINE
r...@nftp:/

2 things I have noticed:
1. The two spares have vanished (but they are seen with format)
2. The names of the submirrors have changed; before the patch,
they were all named simply "mirror".

I have still not done a "zpool upgrade" because I assume that I
will not be able to import the pool on the older, unpatched system.

After booting into the old BE (thanks, Live Upgrade), the pool
is online again.

So I tried to find the guilty patch by backing out patches one
by one.
After backing out the kernel patch 142909-17 the problem
vanished, but now I don't know how to proceed :-(

Any hints other than opening a case, which I will do if I get
no responses?

TIA,
Thomas


  


-- 
Don O'Malley
Manager, Patch System Test
Revenue Product Engineering | Solaris | Hardware
East Point Business Park, Dublin 3, Ireland
Phone: +353 1 8199764
Team Alias: rpe_patch_system_test...@oracle.com

Re: [pca] zpool unavailable after Kernel Patch 142909-17

2010-10-13 Thread Paul B. Henson
On Wed, 13 Oct 2010, Xu, Ying (Houston) wrote:

 I did a search on SunSolve for CR 6967658 and came across another bug
 report (6984043) against Update 9 (which has 142909-17 as its kernel
 patch). It claims that after upgrading to Update 9, zpool create
 consistently panics the server. Did anyone run into this problem?

Ouch, we were just about to deploy U9 8-/; I wonder if it might be a good
idea to wait... The status of 6967658 is "Fix Delivered", and it seems to
have been fixed in build 148 (details now hidden in the secret Oracle repo,
I suppose), so presumably they're working on an S10 patch. 6984043 is closed
as a dup of 6967658.

Sounds like it will either work fine or be broken depending on your
hardware. We've got X4500's with 1TB disks; has anybody had any problems
with U9 on that hardware platform?

Thanks...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768



[pca] zpool unavailable after Kernel Patch 142909-17

2010-10-12 Thread Bleek Thomas
Hello,

just a warning and perhaps a request for some advice.

We have a Sun StorEdge SE3510 connected to a V240. This RAID is used as a JBOD 
(12 independent disks, 5x2 mirrored, 2 spares) for ZFS and patch testing.
I have 2 pools, one on this array, another on 2 local SCSI disks.

After installing all current patches I can't mount the pool on the RAID; the 
local one works. The disks are seen with format; zpool status (actually zpool 
import) gives:
r...@nftp:/zpool import
  pool: tank
id: 10696630212093874974
 state: UNAVAIL
status: The pool is formatted using an older on-disk version.
action: The pool cannot be imported due to damaged devices or data.
config:

tank UNAVAIL  insufficient replicas
  mirror-0   UNAVAIL  corrupted data
c2t40d0  ONLINE
c2t40d1  ONLINE
  mirror-1   UNAVAIL  corrupted data
c2t40d3  ONLINE
c2t40d2  ONLINE
  mirror-2   UNAVAIL  corrupted data
c2t40d4  ONLINE
c2t40d5  ONLINE
  mirror-3   UNAVAIL  corrupted data
c2t40d6  ONLINE
c2t40d7  ONLINE
  mirror-4   UNAVAIL  corrupted data
c2t40d8  ONLINE
c2t40d9  ONLINE
r...@nftp:/

2 things I have noticed:
1. The two spares have vanished (but they are seen with format)
2. The names of the submirrors have changed; before the patch, they were all 
named simply "mirror".

I have still not done a "zpool upgrade" because I assume that I will not be 
able to import the pool on the older, unpatched system.
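
For reference, you can check the on-disk version without touching anything;
"zpool upgrade" with no arguments merely lists pools that are below the
current version, and "zpool get version" prints the number:

   zpool upgrade            # lists pools still at an older on-disk version
   zpool get version tank   # shows the pool's version number

Nothing changes on disk until you explicitly run "zpool upgrade tank".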

After booting into the old BE (thanks, Live Upgrade), the pool is online again.

So I tried to find the guilty patch by backing out patches one by one.
After backing out the kernel patch 142909-17 the problem vanished, but now I 
don't know how to proceed :-(
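
The backout itself was just the standard patch tools, shown here only for
completeness (each attempt needs a reboot):

   showrev -p | grep 142909   # confirm which revision is installed
   patchrm 142909-17          # back out the kernel patch, then reboot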

Any hints other than opening a case, which I will do if I get no responses?

TIA,
Thomas






Re: [pca] zpool unavailable after Kernel Patch 142909-17

2010-10-12 Thread Dennis Clarke

 Hello,

 just a warning and perhaps a request for some advice.

 We have a Sun StorEdge SE3510 connected to a V240. This RAID is used as a
 JBOD (12 independent disks, 5x2 mirrored, 2 spares) for ZFS and patch
 testing.
 I have 2 pools, one on this array, another on 2 local SCSI disks.

 After installing all current patches I can't mount the pool on the RAID;
 the local one works. The disks are seen with format; zpool status
 (actually zpool import) gives:
 r...@nftp:/zpool import
   pool: tank
 id: 10696630212093874974
  state: UNAVAIL
 status: The pool is formatted using an older on-disk version.
 action: The pool cannot be imported due to damaged devices or data.
 config:

 tank UNAVAIL  insufficient replicas
   mirror-0   UNAVAIL  corrupted data
 c2t40d0  ONLINE
 c2t40d1  ONLINE
   mirror-1   UNAVAIL  corrupted data
 c2t40d3  ONLINE
 c2t40d2  ONLINE
   mirror-2   UNAVAIL  corrupted data
 c2t40d4  ONLINE
 c2t40d5  ONLINE
   mirror-3   UNAVAIL  corrupted data
 c2t40d6  ONLINE
 c2t40d7  ONLINE
   mirror-4   UNAVAIL  corrupted data
 c2t40d8  ONLINE
 c2t40d9  ONLINE
 r...@nftp:/

Fascinating.

This looks like an undocumented change in the way zpool status output is
reported. I see the same thing here:

$ uname -a
SunOS mercury 5.10 Generic_142909-17 sun4u sparc SUNW,Sun-Blade-2500
$
$ zpool status
  pool: mercury_rpool
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: resilver completed after 0h0m with 0 errors on Wed Sep 29 17:26:25
2010
config:

NAME  STATE READ WRITE CKSUM
mercury_rpool  ONLINE   0 0 0
  mirror-0ONLINE   0 0 0
c3t0d0s0  ONLINE   0 0 0
c1t2d0s0  ONLINE   0 0 0  6.74M resilvered


See mirror-0? I didn't do that.

Too bad the sources for zpool status are not open anymore :-(

Dennis