Overnight I was running a zfs send | zfs receive (both within the same
system / zpool). The system ran out of space, a drive went off line,
and the system is degraded.
This is a raidz2 array running on FreeBSD 8.1-STABLE #0: Sat Sep 18
23:43:48 EDT 2010.
The following logs are also available at
http://www.langille.org/tmp/zfs-space.txt <- no line wrapping
This is what was running:
# time zfs send storage/bac...@transfer | mbuffer | zfs receive
storage/compressed/bacula-mbuffer
in @ 0.0 kB/s, out @ 0.0 kB/s, 3670 GB total, buffer 100% fullcannot
receive new filesystem stream: out of space
mbuffer: error: outputThread: error writing to <stdout> at offset
0x395917c4000: Broken pipe
summary: 3670 GByte in 10 h 40 min 97.8 MB/s
mbuffer: warning: error during output to <stdout>: Broken pipe
warning: cannot send 'storage/bac...@transfer': Broken pipe
real 640m48.423s
user 8m52.660s
sys 211m40.862s
Looking in the logs, I see this:
Oct 2 00:50:53 kraken kernel: (ada0:siisch0:0:0:0): lost device
Oct 2 00:50:54 kraken kernel: siisch0: Timeout on slot 30
Oct 2 00:50:54 kraken kernel: siisch0: siis_timeout is 00040000 ss
40000000 rs 40000000 es 00000000 sts 801f0040 serr 00000000
Oct 2 00:50:54 kraken kernel: siisch0: Error while READ LOG EXT
Oct 2 00:50:55 kraken kernel: siisch0: Timeout on slot 30
Oct 2 00:50:55 kraken kernel: siisch0: siis_timeout is 00040000 ss
40000000 rs 40000000 es 00000000 sts 801f0040 serr 00000000
Oct 2 00:50:55 kraken kernel: siisch0: Error while READ LOG EXT
Oct 2 00:50:56 kraken kernel: siisch0: Timeout on slot 30
Oct 2 00:50:56 kraken kernel: siisch0: siis_timeout is 00040000 ss
40000000 rs 40000000 es 00000000 sts 801f0040 serr 00000000
Oct 2 00:50:56 kraken kernel: siisch0: Error while READ LOG EXT
Oct 2 00:50:57 kraken kernel: siisch0: Timeout on slot 30
Oct 2 00:50:57 kraken kernel: siisch0: siis_timeout is 00040000 ss
40000000 rs 40000000 es 00000000 sts 801f0040 serr 00000000
Oct 2 00:50:57 kraken kernel: siisch0: Error while READ LOG EXT
Oct 2 00:50:58 kraken kernel: siisch0: Timeout on slot 30
Oct 2 00:50:58 kraken kernel: siisch0: siis_timeout is 00040000 ss
40000000 rs 40000000 es 00000000 sts 801f0040 serr 00000000
Oct 2 00:50:58 kraken kernel: siisch0: Error while READ LOG EXT
Oct 2 00:50:59 kraken root: ZFS: vdev I/O failure, zpool=storage
path=/dev/gpt/disk06-live offset=270336 size=8192 error=6
Oct 2 00:50:59 kraken kernel: (ada0:siisch0:0:0:0): Synchronize cache
failed
Oct 2 00:50:59 kraken kernel: (ada0:siisch0:0:0:0): removing device entry
Oct 2 00:50:59 kraken root: ZFS: vdev I/O failure, zpool=storage
path=/dev/gpt/disk06-live offset=2000187564032 size=8192 error=6
Oct 2 00:50:59 kraken root: ZFS: vdev I/O failure, zpool=storage
path=/dev/gpt/disk06-live offset=2000187826176 size=8192 error=6
$ zpool status
pool: storage
state: DEGRADED
scrub: scrub in progress for 5h32m, 17.16% done, 26h44m to go
config:
NAME STATE READ WRITE CKSUM
storage DEGRADED 0 0 0
raidz2 DEGRADED 0 0 0
gpt/disk01-live ONLINE 0 0 0
gpt/disk02-live ONLINE 0 0 0
gpt/disk03-live ONLINE 0 0 0
gpt/disk04-live ONLINE 0 0 0
gpt/disk05-live ONLINE 0 0 0
gpt/disk06-live REMOVED 0 0 0
gpt/disk07-live ONLINE 0 0 0
$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
storage 6.97T 1.91T 1.75G /storage
storage/bacula 4.72T 1.91T 4.29T /storage/bacula
storage/compressed 2.25T 1.91T 46.9K /storage/compressed
storage/compressed/bacula 2.25T 1.91T 42.7K /storage/compressed/bacula
storage/pgsql 5.50G 1.91T 5.50G /storage/pgsql
$ sudo camcontrol devlist
Password:
<Hitachi HDS722020ALA330 JKAOA28A> at scbus2 target 0 lun 0 (pass1,ada1)
<Hitachi HDS722020ALA330 JKAOA28A> at scbus3 target 0 lun 0 (pass2,ada2)
<Hitachi HDS722020ALA330 JKAOA28A> at scbus4 target 0 lun 0 (pass3,ada3)
<Hitachi HDS722020ALA330 JKAOA28A> at scbus5 target 0 lun 0 (pass4,ada4)
<Hitachi HDS722020ALA330 JKAOA28A> at scbus6 target 0 lun 0 (pass5,ada5)
<Hitachi HDS722020ALA330 JKAOA28A> at scbus7 target 0 lun 0 (pass6,ada6)
<ST380815AS 4.AAB> at scbus8 target 0 lun 0 (pass7,ada7)
<TSSTcorp CDDVDW SH-S223C SB01> at scbus9 target 0 lun 0 (cd0,pass8)
<WDC WD1600AAJS-75M0A0 02.03E02> at scbus10 target 0 lun 0 (pass9,ada8)
I'm not yet sure if the drive is fully dead or not. This is not a
hot-swap box.
I'm guessing the first step is to get ada0 back online and then in the
zpool. However, I'm reluctant to do a 'camcontrol scan' on this box as
it it froze up the system the last time I tried that:
http://docs.freebsd.org/cgi/mid.cgi?4C78FF01.5020500
Any suggestions for getting the drive back online and the zpool stabilized?
--
Dan Langille - http://langille.org/
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"