Hi Stephan,

> On Jan 25, 2017, at 5:54 AM, Stephan Budach <[email protected]> wrote:
>
> Hi guys,
>
> I have been trying to import a zpool based on a 3-way mirror provided by
> three OmniOS boxes via iSCSI. This zpool had been working flawlessly until
> some random reboot of the S11.1 host. Since then, S11.1 has been unable to
> import this zpool.
>
> This zpool consists of three 108 TB LUNs, each backed by a raidz2 zvol…
> yeah, I know, we shouldn't have done that in the first place, but
> performance was not the primary goal here, as this is a backup/archive pool.
>
> When issuing a zpool import, it says this:
>
> root@solaris11atest2:~# zpool import
>   pool: vsmPool10
>     id: 12653649504720395171
>  state: DEGRADED
> status: The pool was last accessed by another system.
> action: The pool can be imported despite missing or damaged devices. The
>         fault tolerance of the pool may be compromised if imported.
>    see: http://support.oracle.com/msg/ZFS-8000-EY
> config:
>
>         vsmPool10                                  DEGRADED
>           mirror-0                                 DEGRADED
>             c0t600144F07A3506580000569398F60001d0  DEGRADED  corrupted data
>             c0t600144F07A35066C00005693A0D90001d0  DEGRADED  corrupted data
>             c0t600144F07A35001A00005693A2810001d0  DEGRADED  corrupted data
>
> device details:
>
>         c0t600144F07A3506580000569398F60001d0  DEGRADED  scrub/resilver needed
>         status: ZFS detected errors on this device.
>                 The device is missing some data that is recoverable.
>
>         c0t600144F07A35066C00005693A0D90001d0  DEGRADED  scrub/resilver needed
>         status: ZFS detected errors on this device.
>                 The device is missing some data that is recoverable.
>
>         c0t600144F07A35001A00005693A2810001d0  DEGRADED  scrub/resilver needed
>         status: ZFS detected errors on this device.
>                 The device is missing some data that is recoverable.
>
> However, when actually running zpool import -f vsmPool10, the system starts
> to perform a lot of writes on the LUNs, and iostat reports an alarming
> increase in h/w errors:
>
> root@solaris11atest2:~# iostat -xeM 5
>                    extended device statistics            ---- errors ---
> device    r/s    w/s  Mr/s  Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
> sd0       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd1       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd2       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0   0  71   0  71
> sd3       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd4       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd5       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0   0   0   0   0
>                    extended device statistics            ---- errors ---
> device    r/s    w/s  Mr/s  Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
> sd0      14.2  147.3   0.7   0.4  0.2  0.1    2.0   6   9   0   0   0   0
> sd1      14.2    8.4   0.4   0.0  0.0  0.0    0.3   0   0   0   0   0   0
> sd2       0.0    4.2   0.0   0.0  0.0  0.0    0.0   0   0   0  92   0  92
> sd3     157.3   46.2   2.1   0.2  0.0  0.7    3.7   0  14   0  30   0  30
> sd4     123.9   29.4   1.6   0.1  0.0  1.7   10.9   0  36   0  40   0  40
> sd5     142.5   43.0   2.0   0.1  0.0  1.9   10.2   0  45   0  88   0  88
>                    extended device statistics            ---- errors ---
> device    r/s    w/s  Mr/s  Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
> sd0       0.0  234.5   0.0   0.6  0.2  0.1    1.4   6  10   0   0   0   0
> sd1       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd2       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0   0  92   0  92
> sd3       3.6   64.0   0.0   0.5  0.0  4.3   63.2   0  63   0 235   0 235
> sd4       3.0   67.0   0.0   0.6  0.0  4.2   60.5   0  68   0 298   0 298
> sd5       4.2   59.6   0.0   0.4  0.0  5.2   81.0   0  72   0 406   0 406
>                    extended device statistics            ---- errors ---
> device    r/s    w/s  Mr/s  Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
> sd0       0.0  234.8   0.0   0.7  0.4  0.1    2.2  11  10   0   0   0   0
> sd1       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd2       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0   0  92   0  92
> sd3       5.4   54.4   0.0   0.3  0.0  2.9   48.5   0  67   0 384   0 384
> sd4       6.0   53.4   0.0   0.3  0.0  4.6   77.7   0  87   0 519   0 519
> sd5       6.0   60.8   0.0   0.3  0.0  4.8   72.5   0  87   0 727   0 727
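As an aside for anyone reading along: the h/w error count in output like the above sits in a fixed column, so it can be isolated with a small awk filter. This is a hedged sketch, not part of the thread; the field positions assume the 14-column `iostat -xeM` layout shown (device name in field 1, h/w errors in field 12), and the sample data is copied from the intervals above.

```shell
# Sketch: list devices with a non-zero h/w error count from iostat -xeM
# style output. Assumes the 14-column layout shown above
# (device=$1, s/w=$11, h/w=$12, trn=$13, tot=$14).
sample='sd2     0.0   4.2  0.0  0.0  0.0  0.0   0.0   0   0   0  92   0  92
sd3   157.3  46.2  2.1  0.2  0.0  0.7   3.7   0  14   0  30   0  30
sd1    14.2   8.4  0.4  0.0  0.0  0.0   0.3   0   0   0   0   0   0'

printf '%s\n' "$sample" |
  awk 'NF == 14 && $12 > 0 { print $1, "h/w errors:", $12 }'
# prints:
#   sd2 h/w errors: 92
#   sd3 h/w errors: 30
```

Header and separator lines are skipped automatically because they do not have exactly 14 fields.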
h/w errors are a classification of other errors. The full error list is
available from "iostat -E" and will be important to tracking this down. A
better, more detailed analysis can be gleaned from the "fmdump -e" ereports
that should be associated with each h/w error. However, there are dozens of
possible causes for these, so we don't have enough info here to fully
understand it.
 -- richard

> I have tried pulling data from the LUNs using dd to /dev/null and I didn't
> get any h/w errors; this only started when trying to actually import the
> zpool. As the h/w errors are constantly rising, I am wondering what could
> cause this and whether anything can be done about it?
>
> Cheers,
> Stephan
> _______________________________________________
> OmniOS-discuss mailing list
> [email protected]
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
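For reference, the dd read check Stephan describes can be scripted roughly as follows. This is only a sketch: `LUN` here is a placeholder backed by a plain file so the snippet is self-contained; on the real host it would be the LUN's raw device node, and a failing sector would surface as a non-zero dd exit status plus an error on stderr.

```shell
# Hedged sketch of the raw read test described above: read the device end
# to end with dd and report whether any read error occurred.
LUN=/tmp/fake_lun
dd if=/dev/zero of="$LUN" bs=1024 count=64 2>/dev/null  # stand-in "LUN" for this sketch

if dd if="$LUN" of=/dev/null bs=1048576 2>/tmp/dd.err; then
    echo "read OK"
else
    echo "read FAILED:"
    cat /tmp/dd.err
fi
```

Note that a clean dd pass only shows the LUNs are readable; it exercises none of the write path, which is consistent with the errors appearing only once the import starts writing.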
