Re: [OmniOS-discuss] weird disk behavior

Richard Elling Wed, 09 Mar 2016 16:13:45 -0800

comment below...


> On Mar 9, 2016, at 11:05 AM, Michael Rasmussen <m...@miras.org> wrote:
> 
> Hi all,
> 
> I suddenly noticed one of the disk bays in my storage server going red
> with this logged in dmesg:
> Mar  9 19:19:47 nas genunix: [ID 107833 kern.warning] WARNING: 
> /pci@0,0/pci1022,1708@3/pci1028,1f0e@0 (mpt0):
> Mar  9 19:19:47 nas     Disconnected command timeout for Target 1
> Mar  9 19:19:51 nas genunix: [ID 365881 kern.info] 
> /pci@0,0/pci1022,1708@3/pci1028,1f0e@0 (mpt0):
> Mar  9 19:19:51 nas     Log info 0x31140000 received for target 1.
> Mar  9 19:19:51 nas     scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
> Mar  9 19:19:51 nas genunix: [ID 365881 kern.info] 
> /pci@0,0/pci1022,1708@3/pci1028,1f0e@0 (mpt0):
> Mar  9 19:19:51 nas     Log info 0x31130000 received for target 1.
> Mar  9 19:19:51 nas     scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
> Mar  9 19:19:51 nas genunix: [ID 365881 kern.info] 
> /pci@0,0/pci1022,1708@3/pci1028,1f0e@0 (mpt0):
> Mar  9 19:19:51 nas     Log info 0x31130000 received for target 1.
> Mar  9 19:19:51 nas     scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
> Mar  9 19:20:21 nas genunix: [ID 107833 kern.warning] WARNING: 
> /pci@0,0/pci1022,1708@3/pci1028,1f0e@0/sd@1,0 (sd3):
> Mar  9 19:20:21 nas     Command failed to complete...Device is gone
> Mar  9 19:20:24 nas genunix: [ID 107833 kern.warning] WARNING: 
> /pci@0,0/pci1022,1708@3/pci1028,1f0e@0/sd@1,0 (sd3):
> Mar  9 19:20:24 nas     Command failed to complete...Device is gone
> Mar  9 19:20:24 nas genunix: [ID 107833 kern.warning] WARNING: 
> /pci@0,0/pci1022,1708@3/pci1028,1f0e@0/sd@1,0 (sd3):
> Mar  9 19:20:24 nas     SYNCHRONIZE CACHE command failed (5)
> Mar  9 19:20:24 nas genunix: [ID 107833 kern.warning] WARNING: 
> /pci@0,0/pci1022,1708@3/pci1028,1f0e@0/sd@1,0 (sd3):
> Mar  9 19:20:24 nas     drive offline
> 
> zpool online and smartctl could not talk to the disk.
> 
> After I pulled the disk and reinserted it, the status showed green and
> both smartctl and zpool online could talk to the disk.
> 
> Resilvering is now taking place.
> 
> Any idea what went wrong, or should I worry that the disk is about
> to fail?

These are symptoms that the drive is not responding and that resets are being
sent to try (often in vain) to bring the disk back online. Since this is mpt,
the HBA is likely 3Gbps, and if the drive is SATA your tears will flow. Now that
the drive is back AND the symptoms cleared after reseating the drive, it is very
likely that the drive is the source of the errors. smartctl might give more info.
IMHO you should plan to replace that drive.

NB, for that SAS fabric generation, it is possible that the problem drive is
not the only drive showing the same errors, but your drive pull test is a
reasonable approach.
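
One way to check whether other drives are showing the same errors is to tally
the mpt and sd records per target. The following is only a rough sketch in
Python, not a tested tool: the function name tally_targets and both regular
expressions are my own assumptions, based solely on the message layout in the
log quoted above. Other driver versions may word these messages differently.

```python
import re
from collections import Counter

# Both patterns are assumptions derived from the quoted log excerpt:
#   "... command timeout for Target 1" / "... received for target 1."
#   ".../sd@1,0 (sd3):"
MPT_TARGET = re.compile(r"for [Tt]arget (\d+)")
SD_UNIT = re.compile(r"/sd@([0-9a-f]+),[0-9a-f]+ \(sd\d+\)")


def tally_targets(log_text):
    """Count mpt and sd records per SCSI target number (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MPT_TARGET.search(line)
        if match:
            counts[int(match.group(1))] += 1
            continue
        match = SD_UNIT.search(line)
        if match:
            # Assumes the sd unit address is hex, in the form sd@<target>,<lun>.
            counts[int(match.group(1), 16)] += 1
    return counts
```

Calling tally_targets() on the contents of the system log and printing the
result shows at a glance whether only target 1 is affected or whether other
targets are logging the same timeouts.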

Do not be surprised if smartctl doesn't correctly identify the issue; SMART
isn't very smart sometimes.

  -- richard


> 
> 
> -- 
> Hilsen/Regards
> Michael Rasmussen
> 
> Get my public GnuPG keys:
> michael <at> rasmussen <dot> cc
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
> mir <at> datanom <dot> net
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
> mir <at> miras <dot> org
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
> --------------------------------------------------------------
> /usr/games/fortune -es says:
> No group of professionals meets except to conspire against the public
> at large. -- Mark Twain
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
