On 22/10/2015 12:07, Gabriele Bulfon wrote:
Yes, I understand, but why is the zpool not signaling the problem? I only ran into the problem while doing a cp, and then saw it through iostat.
When does ZFS take account of the problem and put the disk offline or send a notification through fault management?

----------------------------------------------------------------------------------
From: Udo Grabowski [email protected] Gabriele Bulfon
Date: 22 October 2015 11:41:37 CEST
Subject: Re: [discuss] iostat errors on zpool mirror

On 22/10/2015 11:27, Gabriele Bulfon wrote:

Hi,
I have a failing SATA device inside a mirrored zpool. Shouldn't I see failures on the zpool device too? Any big file copy onto the mirror makes the system VERY slow, and I have to kill the cp. How can I dig further into the problem?

    sonicle@xstreamserver:~# iostat -e
              ---- errors ---
    device  s/w h/w trn tot
    sd0       0   5   0   5
    sd1       0   0   0   0
    sd2       0   0   0   0
    sd3       0   0   0   0
    sd4       0   0   0   0
    sd5       0   0   0   0
    sd6       0   0   0   0
    sd7       0   0   0   0
    sd13      0  33   0  33
    nfs1      0   0   0   0

    sonicle@xstreamserver:~# kstat -n sd13,err
    module: sderr                 instance: 13
    name:   sd13,err              class:    device_error
            Hard Errors           33
            Illegal Request       223
            Media Error           24

Not necessary to dig more, that disk is going south, get a new one. If you trust your other disk, try 'zpool offline' on the failing one and draw a backup quickly before replacing. You can 'set sd:sd_io_time = 0x20' in /etc/system (or via mdb for immediate effect) to shorten the outages, but that disk will plague you indefinitely until it finally dies (which can last several weeks, we had this pest a couple of times this year).

--
Dr. Udo Grabowski
Inst. f. Meteorology & Climate Research IMK-ASF-SAT
http://www.imk-asf.kit.edu/english/sat.php
KIT - Karlsruhe Institute of Technology
http://www.kit.edu
Postfach 3640, 76021 Karlsruhe, Germany
T: (+49)721 608-26026 F: -926026
It signals when the disk is faulted, but it checks a disk a couple of times, each time waiting up to the sd timeout, and when the disk comes back, ZFS is happy again. Since it tries to maintain the integrity of the mirror, it will always wait to synchronize the second disk.

Disk failures like these are extremely annoying, since the disk often manages to get around a problem within this timeout. Therefore, lower the sd timeout to get more failures counted by fm, so that the disk is taken out earlier due to excessive errors. We sometimes go down to 5 seconds when we cannot get the disk out by other means and, if nothing helps, we simply pull the disk to force a fault.

Self-healing capabilities often have a price tag, too...

--
Dr. Udo Grabowski
Inst. f. Meteorology & Climate Research IMK-ASF-SAT
http://www.imk-asf.kit.edu/english/sat.php
KIT - Karlsruhe Institute of Technology
http://www.kit.edu
Postfach 3640, 76021 Karlsruhe, Germany
T: (+49)721 608-26026 F: -926026
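
Since ZFS may stay quiet for as long as the disk recovers within the timeout, the per-device hard error counters from `iostat -e` are the earlier warning sign. The following is a minimal Python sketch, not part of the original exchange, that parses that output and lists devices whose hard error count exceeds a limit. The function names, the `hard_error_limit` parameter, and the assumed row layout (a device name followed by four integer columns, as in the output quoted in this thread) are illustrative assumptions.

```python
def parse_iostat_errors(text):
    """Parse `iostat -e` style output into {device: {column: count}}.

    Assumes each data row is a device name followed by exactly four
    integers (s/w, h/w, trn, tot), as in the output quoted in the thread.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the banner and header rows: they are not name + 4 integers.
        if len(fields) != 5 or not all(f.isdigit() for f in fields[1:]):
            continue
        device, sw, hw, trn, tot = fields
        counts[device] = {
            "s/w": int(sw),
            "h/w": int(hw),
            "trn": int(trn),
            "tot": int(tot),
        }
    return counts


def suspect_devices(counts, hard_error_limit=0):
    """Return sorted device names whose hard error count exceeds the limit."""
    return sorted(
        dev for dev, c in counts.items() if c["h/w"] > hard_error_limit
    )


# Sample data copied from the iostat output quoted in the thread.
SAMPLE = """\
          ---- errors ---
device  s/w h/w trn tot
sd0       0   5   0   5
sd1       0   0   0   0
sd2       0   0   0   0
sd3       0   0   0   0
sd4       0   0   0   0
sd5       0   0   0   0
sd6       0   0   0   0
sd7       0   0   0   0
sd13      0  33   0  33
nfs1      0   0   0   0
"""

if __name__ == "__main__":
    for dev in suspect_devices(parse_iostat_errors(SAMPLE)):
        print(dev)
```

On the quoted sample this flags both sd0 and sd13; raising `hard_error_limit` to 10 leaves only sd13. Which limit is sensible is a judgment call, not something the thread specifies.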
