Oops… I should have waited to send that message until after I had rebooted the S11.1 host…

On 25.01.17 at 23:41, Stephan Budach wrote:
Hi Richard,

On 25.01.17 at 20:27, Richard Elling wrote:
Hi Stephan,

On Jan 25, 2017, at 5:54 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Hi guys,

I have been trying to import a zpool based on a 3-way mirror provided by three omniOS boxes via iSCSI. This zpool had been working flawlessly until some random reboot of the S11.1 host; since then, S11.1 has not been able to import it successfully.

This zpool consists of three 108TB LUNs, each based on a raidz2 zvol… yeah, I know, we shouldn't have done that in the first place, but performance was not the primary goal here, as this is a backup/archive pool.
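For reference, each of these LUNs was set up on its omniOS box roughly along the following lines; the pool/zvol names are just placeholders and the LU GUID is elided, so treat this as a sketch rather than the exact commands we used:

# on each omniOS target: a zvol on the raidz2 pool, exported as a COMSTAR LU
zfs create -V 108T tank/vsmPool10-lu
stmfadm create-lu /dev/zvol/rdsk/tank/vsmPool10-lu
stmfadm add-view <GUID printed by create-lu>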

When issuing a zpool import, it says this:

root@solaris11atest2:~# zpool import
  pool: vsmPool10
    id: 12653649504720395171
 state: DEGRADED
status: The pool was last accessed by another system.
action: The pool can be imported despite missing or damaged devices. The
        fault tolerance of the pool may be compromised if imported.
   see: http://support.oracle.com/msg/ZFS-8000-EY
config:

        vsmPool10                                  DEGRADED
          mirror-0                                 DEGRADED
            c0t600144F07A3506580000569398F60001d0  DEGRADED  corrupted data
            c0t600144F07A35066C00005693A0D90001d0  DEGRADED  corrupted data
            c0t600144F07A35001A00005693A2810001d0  DEGRADED  corrupted data

device details:

c0t600144F07A3506580000569398F60001d0 DEGRADED scrub/resilver needed
        status: ZFS detected errors on this device.
                The device is missing some data that is recoverable.

c0t600144F07A35066C00005693A0D90001d0 DEGRADED scrub/resilver needed
        status: ZFS detected errors on this device.
                The device is missing some data that is recoverable.

c0t600144F07A35001A00005693A2810001d0 DEGRADED scrub/resilver needed
        status: ZFS detected errors on this device.
                The device is missing some data that is recoverable.

However, when actually running zpool import -f vsmPool10, the system starts to perform a lot of writes on the LUNs, and iostat reports an alarming increase in h/w errors:

root@solaris11atest2:~# iostat -xeM 5
                            extended device statistics       ---- errors ---
device     r/s    w/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
sd0        0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
sd1        0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
sd2        0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0  71   0  71
sd3        0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
sd4        0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
sd5        0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
                            extended device statistics       ---- errors ---
device     r/s    w/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
sd0       14.2  147.3    0.7    0.4  0.2  0.1    2.0   6   9   0   0   0   0
sd1       14.2    8.4    0.4    0.0  0.0  0.0    0.3   0   0   0   0   0   0
sd2        0.0    4.2    0.0    0.0  0.0  0.0    0.0   0   0   0  92   0  92
sd3      157.3   46.2    2.1    0.2  0.0  0.7    3.7   0  14   0  30   0  30
sd4      123.9   29.4    1.6    0.1  0.0  1.7   10.9   0  36   0  40   0  40
sd5      142.5   43.0    2.0    0.1  0.0  1.9   10.2   0  45   0  88   0  88
                            extended device statistics       ---- errors ---
device     r/s    w/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
sd0        0.0  234.5    0.0    0.6  0.2  0.1    1.4   6  10   0   0   0   0
sd1        0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
sd2        0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0  92   0  92
sd3        3.6   64.0    0.0    0.5  0.0  4.3   63.2   0  63   0 235   0 235
sd4        3.0   67.0    0.0    0.6  0.0  4.2   60.5   0  68   0 298   0 298
sd5        4.2   59.6    0.0    0.4  0.0  5.2   81.0   0  72   0 406   0 406
                            extended device statistics       ---- errors ---
device     r/s    w/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
sd0        0.0  234.8    0.0    0.7  0.4  0.1    2.2  11  10   0   0   0   0
sd1        0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
sd2        0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0  92   0  92
sd3        5.4   54.4    0.0    0.3  0.0  2.9   48.5   0  67   0 384   0 384
sd4        6.0   53.4    0.0    0.3  0.0  4.6   77.7   0  87   0 519   0 519
sd5        6.0   60.8    0.0    0.3  0.0  4.8   72.5   0  87   0 727   0 727

h/w errors are a classification of other errors. The full error list is available from "iostat -E" and will
be important for tracking this down.

A better, more detailed analysis can be gleaned from the "fmdump -e" ereports that should be associated with each h/w error. However, there are dozens of possible causes, so we don't have
enough information here to fully understand what's happening.
 — richard

Well… I can't provide you with the output of fmdump -e right now: I am currently unable to type the '-' on the console due to some fancy keyboard layout issue, and I can't log in via ssh either (authentication succeeds, but I never get to a shell, which may be due to the running zpool import). I can confirm, however, that plain fmdump shows nothing at all. I could just reset the S11.1 host after removing the zpool.cache file, so that the system will not try to import the zpool right away upon restart…
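In essence, that would just be moving the cache file aside (assuming the standard location on S11.1):

# keep the file, but make sure the pool is not auto-imported on the next boot
mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bak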

…plus, after the reboot I might get the chance to set the keyboard layout right, but that's another issue…

After resetting the S11.1 host and getting the keyboard layout right, I ran fmdump -e, and there they are… lots of:

Jan 25 23:25:13.5643 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.8944 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.8945 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.8946 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9274 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9275 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9276 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9277 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9282 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9284 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9285 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9286 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9287 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9288 ereport.fs.zfs.dev.merr.write
Jan 25 23:25:13.9290 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9294 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9301 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9306 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:50:44.7195 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:50:44.7306 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:50:44.7434 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:53:31.4386 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:53:31.4579 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:53:31.4710 ereport.io.scsi.cmd.disk.dev.rqs.derr
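The class names alone don't tell me which LUN each ereport refers to; the verbose output should, since it ought to carry the device path and the SCSI sense data in the payload, e.g.:

# dump the full ereport payloads (device path, sense key, asc/ascq, …)
fmdump -eV | more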


These seem to be media errors and device errors on the zpools/zvols that make up the LUNs for this zpool… I am wondering why this is happening.
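What I can still do is check the backing pools on the three omniOS targets directly, roughly along these lines (the pool name is a placeholder):

# on each omniOS target: the pool hosting the zvol behind the LUN
zpool status -v tank
iostat -En
fmdump -e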

Stephan


