Hi Richard, gotcha… read on, below…
On 26.01.17 at 00:43, Richard Elling wrote:
> On Jan 25, 2017, at 3:01 PM, Stephan Budach <[email protected]> wrote:
>> Ooops… I should have waited with sending that message until after I had rebooted the S11.1 host…
>>
>> On 25.01.17 at 23:41, Stephan Budach wrote:
>>> Hi Richard,
>>>
>>> On 25.01.17 at 20:27, Richard Elling wrote:
>>>> Hi Stephan,
>>>>
>>>> On Jan 25, 2017, at 5:54 AM, Stephan Budach <[email protected]> wrote:
>>>>> Hi guys,
>>>>>
>>>>> I have been trying to import a zpool based on a 3-way mirror provided by three OmniOS boxes via iSCSI. This zpool had been working flawlessly until some random reboot of the S11.1 host. Since then, S11.1 has been trying to import this zpool without success.
>>>>>
>>>>> This zpool consists of three 108TB LUNs, based on raidz2 zvols… yeah, I know, we shouldn't have done that in the first place, but performance was not the primary goal here, as this one is a backup/archive pool.
>>>>>
>>>>> When issuing a zpool import, it says this:
>>>>>
>>>>> root@solaris11atest2:~# zpool import
>>>>>   pool: vsmPool10
>>>>>     id: 12653649504720395171
>>>>>  state: DEGRADED
>>>>> status: The pool was last accessed by another system.
>>>>> action: The pool can be imported despite missing or damaged devices.  The
>>>>>         fault tolerance of the pool may be compromised if imported.
>>>>>    see: http://support.oracle.com/msg/ZFS-8000-EY
>>>>> config:
>>>>>
>>>>>         vsmPool10                                  DEGRADED
>>>>>           mirror-0                                 DEGRADED
>>>>>             c0t600144F07A3506580000569398F60001d0  DEGRADED  corrupted data
>>>>>             c0t600144F07A35066C00005693A0D90001d0  DEGRADED  corrupted data
>>>>>             c0t600144F07A35001A00005693A2810001d0  DEGRADED  corrupted data
>>>>>
>>>>> device details:
>>>>>
>>>>>         c0t600144F07A3506580000569398F60001d0  DEGRADED  scrub/resilver needed
>>>>>         status: ZFS detected errors on this device.
>>>>>                 The device is missing some data that is recoverable.
>>>>>
>>>>>         c0t600144F07A35066C00005693A0D90001d0  DEGRADED  scrub/resilver needed
>>>>>         status: ZFS detected errors on this device.
>>>>>                 The device is missing some data that is recoverable.
>>>>>
>>>>>         c0t600144F07A35001A00005693A2810001d0  DEGRADED  scrub/resilver needed
>>>>>         status: ZFS detected errors on this device.
>>>>>                 The device is missing some data that is recoverable.
>>>>>
>>>>> However, when actually running zpool import -f vsmPool10, the system starts to perform a lot of writes on the LUNs, and iostat reports an alarming increase in h/w errors:
>>>>>
>>>>> root@solaris11atest2:~# iostat -xeM 5
>>>>>                  extended device statistics        ---- errors ---
>>>>> device   r/s    w/s  Mr/s Mw/s wait actv svc_t %w  %b s/w h/w trn tot
>>>>> sd0      0.0    0.0   0.0  0.0  0.0  0.0   0.0  0   0   0   0   0   0
>>>>> sd1      0.0    0.0   0.0  0.0  0.0  0.0   0.0  0   0   0   0   0   0
>>>>> sd2      0.0    0.0   0.0  0.0  0.0  0.0   0.0  0   0   0  71   0  71
>>>>> sd3      0.0    0.0   0.0  0.0  0.0  0.0   0.0  0   0   0   0   0   0
>>>>> sd4      0.0    0.0   0.0  0.0  0.0  0.0   0.0  0   0   0   0   0   0
>>>>> sd5      0.0    0.0   0.0  0.0  0.0  0.0   0.0  0   0   0   0   0   0
>>>>>                  extended device statistics        ---- errors ---
>>>>> device   r/s    w/s  Mr/s Mw/s wait actv svc_t %w  %b s/w h/w trn tot
>>>>> sd0     14.2  147.3   0.7  0.4  0.2  0.1   2.0  6   9   0   0   0   0
>>>>> sd1     14.2    8.4   0.4  0.0  0.0  0.0   0.3  0   0   0   0   0   0
>>>>> sd2      0.0    4.2   0.0  0.0  0.0  0.0   0.0  0   0   0  92   0  92
>>>>> sd3    157.3   46.2   2.1  0.2  0.0  0.7   3.7  0  14   0  30   0  30
>>>>> sd4    123.9   29.4   1.6  0.1  0.0  1.7  10.9  0  36   0  40   0  40
>>>>> sd5    142.5   43.0   2.0  0.1  0.0  1.9  10.2  0  45   0  88   0  88
>>>>>                  extended device statistics        ---- errors ---
>>>>> device   r/s    w/s  Mr/s Mw/s wait actv svc_t %w  %b s/w h/w trn tot
>>>>> sd0      0.0  234.5   0.0  0.6  0.2  0.1   1.4  6  10   0   0   0   0
>>>>> sd1      0.0    0.0   0.0  0.0  0.0  0.0   0.0  0   0   0   0   0   0
>>>>> sd2      0.0    0.0   0.0  0.0  0.0  0.0   0.0  0   0   0  92   0  92
>>>>> sd3      3.6   64.0   0.0  0.5  0.0  4.3  63.2  0  63   0 235   0 235
>>>>> sd4      3.0   67.0   0.0  0.6  0.0  4.2  60.5  0  68   0 298   0 298
>>>>> sd5      4.2   59.6   0.0  0.4  0.0  5.2  81.0  0  72   0 406   0 406
>>>>>                  extended device statistics        ---- errors ---
>>>>> device   r/s    w/s  Mr/s Mw/s wait actv svc_t %w  %b s/w h/w trn tot
>>>>> sd0      0.0  234.8   0.0  0.7  0.4  0.1   2.2 11  10   0   0   0   0
>>>>> sd1      0.0    0.0   0.0  0.0  0.0  0.0   0.0  0   0   0   0   0   0
>>>>> sd2      0.0    0.0   0.0  0.0  0.0  0.0   0.0  0   0   0  92   0  92
>>>>> sd3      5.4   54.4   0.0  0.3  0.0  2.9  48.5  0  67   0 384   0 384
>>>>> sd4      6.0   53.4   0.0  0.3  0.0  4.6  77.7  0  87   0 519   0 519
>>>>> sd5      6.0   60.8   0.0  0.3  0.0  4.8  72.5  0  87   0 727   0 727
>>>>
>>>> h/w errors are a classification of other errors. The full error list is available from "iostat -E" and will be important for tracking this down.
>>>>
>>>> A better, more detailed analysis can be gleaned from the "fmdump -e" ereports that should be associated with each h/w error. However, there are dozens of causes of these, so we don't have enough info here to fully understand it.
>>>>  — richard
>>>
>>> Well… I can't provide you with the output of fmdump -e (since I am currently unable to get the '-' typed into the console, due to some fancy keyboard layout issues, and I am not able to log in via ssh either: I can authenticate, but I don't get to a shell, which may be due to the running zpool import), but I can confirm that fmdump shows nothing at all. I could just reset the S11.1 host after removing the zpool.cache file, such that the system will not try to import the zpool right away upon restart… plus I might get the option to set the keyboard layout right after the reboot, but that's another issue…
>>
>> After resetting the S11.1 host and getting the keyboard layout right, I issued fmdump -e, and there they are… lots of:
>>
>> Jan 25 23:25:13.5643 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.8944 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.8945 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.8946 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9274 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9275 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9276 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9277 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9282 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9284 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9285 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9286 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9287 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9288 ereport.fs.zfs.dev.merr.write
>> Jan 25 23:25:13.9290 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9294 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9301 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9306 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:50:44.7195 ereport.io.scsi.cmd.disk.dev.rqs.derr
>> Jan 25 23:50:44.7306 ereport.io.scsi.cmd.disk.dev.rqs.derr
>> Jan 25 23:50:44.7434 ereport.io.scsi.cmd.disk.dev.rqs.derr
>> Jan 25 23:53:31.4386 ereport.io.scsi.cmd.disk.dev.rqs.derr
>> Jan 25 23:53:31.4579 ereport.io.scsi.cmd.disk.dev.rqs.derr
>> Jan 25 23:53:31.4710 ereport.io.scsi.cmd.disk.dev.rqs.derr
>>
>> These seem to be media errors and disk errors on the zpools/zvols that make up the LUNs for this zpool… I am wondering why this happens.
>
> yes, good question. That we get media errors ("merr") on write is one clue. To find out more, "fmdump -eV" will show in gory detail the exact SCSI asc/ascq codes, LBAs, etc.
>
> ZFS is COW, so if the LUs are backed by ZFS and there isn't enough free space, then this is the sort of error we expect. But there could be other reasons.
>  — richard
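Point taken on "fmdump -eV". For the archive, the follow-up I plan to run on the initiator is something like this (a sketch, untested as typed here; the ereport class is the one from my output above):

# Full ereport detail (SCSI asc/ascq codes, LBAs, etc.), filtered down
# to just the write media errors:
fmdump -eV -c ereport.io.scsi.cmd.disk.dev.rqs.merr.write

# Plus the cumulative per-device error counters Richard mentioned:
iostat -En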
Oh Lord… I really think that this is it… this is what the zpool/zvol looks like on one of the three targets:

root@tr1206900:/root# zpool list
NAME            SIZE  ALLOC   FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
rpool          29,8G  21,5G  8,27G         -   45%  72%  1.00x  ONLINE  -
tr1206900data   109T   106T  3,41T         -   51%  96%  1.00x  ONLINE  -

root@tr1206900:/root# zfs list -r tr1206900data
NAME                      USED  AVAIL  REFER  MOUNTPOINT
tr1206900data            86,6T      0   236K  /tr1206900data
tr1206900data/vsmPool10  86,6T      0  86,6T  -

root@tr1206900:/root# zfs get all tr1206900data/vsmPool10
NAME                     PROPERTY              VALUE                  SOURCE
tr1206900data/vsmPool10  type                  volume                 -
tr1206900data/vsmPool10  creation              Mo. Jan 11 12:57 2016  -
tr1206900data/vsmPool10  used                  86,6T                  -
tr1206900data/vsmPool10  available             0                      -
tr1206900data/vsmPool10  referenced            86,6T                  -
tr1206900data/vsmPool10  compressratio         1.00x                  -
tr1206900data/vsmPool10  reservation           none                   default
tr1206900data/vsmPool10  volsize               109T                   local
tr1206900data/vsmPool10  volblocksize          128K                   -
tr1206900data/vsmPool10  checksum              on                     default
tr1206900data/vsmPool10  compression           off                    default
tr1206900data/vsmPool10  readonly              off                    default
tr1206900data/vsmPool10  copies                1                      default
tr1206900data/vsmPool10  refreservation        none                   default
tr1206900data/vsmPool10  primarycache          all                    local
tr1206900data/vsmPool10  secondarycache        all                    default
tr1206900data/vsmPool10  usedbysnapshots       0                      -
tr1206900data/vsmPool10  usedbydataset         86,6T                  -
tr1206900data/vsmPool10  usedbychildren        0                      -
tr1206900data/vsmPool10  usedbyrefreservation  0                      -
tr1206900data/vsmPool10  logbias               latency                default
tr1206900data/vsmPool10  dedup                 off                    default
tr1206900data/vsmPool10  mlslabel              none                   default
tr1206900data/vsmPool10  sync                  standard               default
tr1206900data/vsmPool10  refcompressratio      1.00x                  -
tr1206900data/vsmPool10  written               86,6T                  -
tr1206900data/vsmPool10  logicalused           86,5T                  -
tr1206900data/vsmPool10  logicalreferenced     86,5T                  -
tr1206900data/vsmPool10  snapshot_limit        none                   default
tr1206900data/vsmPool10  snapshot_count        none                   default
tr1206900data/vsmPool10  redundant_metadata    all                    default

This must be the dumbest failure one can possibly have when setting up a zvol iSCSI target. So someone - no, it wasn't me, actually, but that doesn't do me any good anyway - created a zvol equal in size to the zpool, and now it is as you suspected: the zvol has run out of space.
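Judging by the refreservation of none above, the volume is effectively sparse, so nothing ever set space aside to back its 109T volsize. My guess at how this came about (the create flags are my assumption; the names are from the output above):

# Presumably something like this was done on the ~109T pool; -s creates
# a sparse volume, i.e. no refreservation is set aside for it:
zfs create -s -V 109T tr1206900data/vsmPool10

# With refreservation=none, nothing stops the pool from filling up
# underneath the volume; once it is effectively full, the COW writes
# coming in from the initiator fail and surface as those merr.write
# ereports:
zfs get volsize,refreservation,used,available tr1206900data/vsmPool10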
So, the only chance would be to add additional space to these zpools, such that the zvols can actually occupy the space they claim to have? Should be manageable… I could provide some iSCSI LUNs to the targets themselves and add another vdev, roughly as sketched below. There will be some serious cleanup needed afterwards…
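Something along these lines on each target, I suppose (the device name is just a placeholder for whatever LUN I end up presenting):

# Grow the backing pool by adding another vdev; device name is a placeholder.
# Note: zpool will warn about mismatched replication levels against the
# existing raidz2 vdev, so this needs -f or a matching raidz2 set of devices:
zpool add tr1206900data c0tXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXd0

# FREE should go up accordingly:
zpool list tr1206900data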
What about the "free" 3,41T in the zpool itself? Could that somehow be utilised?
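If I read the numbers right, that FREE value from zpool list is raw space, before raidz2 parity and the pool's internal slop reservation, while zfs list reports what datasets can actually allocate, which is why it already shows 0 AVAIL. The two views side by side:

# Raw view: parity overhead is still included in FREE:
zpool list -o name,size,alloc,free tr1206900data
# Usable view: what datasets can actually allocate:
zfs list -o name,used,avail tr1206900data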
Thanks, Stephan
_______________________________________________
OmniOS-discuss mailing list
[email protected]
http://lists.omniti.com/mailman/listinfo/omnios-discuss
