Please note: you will not want to leave the ZIL disabled, as this can lead to data corruption on NFS clients.
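
For reference, a temporary test setup might look like the fragment below. This is only a sketch: it assumes the stock Solaris 10 /etc/system syntax (lines starting with `*` are comments) and that the setting takes effect at the next reboot. The comment text is mine, not from the thread.

```
* TEMPORARY, for testing only: disable the ZFS intent log (ZIL).
* Remove this line and reboot again once the test is finished.
set zfs:zil_disable=1
```

Once the test is done, delete the line (or comment it out with `*`) and reboot so the ZIL is active again.
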
Disabling the ZIL for a test will let you see whether the ZFS write cache, with many small files on a large filesystem, is causing the problems.

> /etc/system
> set zfs:zil_disable=1
>
> This may help
>
> > It seems that the whole community moved from the Solaris forums
> > here, so I'm re-posting my original post.
> >
> > I have Solaris 10U4 running on x86/64 with patch 127729-07
> > applied (nfs large file panic zfs). I have a 1TB
> > ZFS pool with over a million files (mainly small,
> > mails) and folders, shared and accessed over NFSv4.
> > Before applying the mentioned patch the machine was
> > crashing every 2 hours; after applying it, crashes became
> > rare, approx. once per 1-2 days.
> >
> > After a crash the ZFS pool shows a strange error:
> > ----------------
> > # zpool status -xv
> >   pool: box5
> >  state: ONLINE
> > status: One or more devices has experienced an error resulting in data
> >         corruption. Applications may be affected.
> >         Restore the file in question if possible. Otherwise restore the
> >         entire pool from backup.
> >         www.sun.com/msg/ZFS-8000-8A
> >  scrub: none requested
> > config:
> >
> >         NAME        STATE     READ WRITE CKSUM
> >         box5        ONLINE       0     0     0
> >           mirror    ONLINE       0     0     0
> >             c1d0    ONLINE       0     0     0
> >             c2d0    ONLINE       0     0     0
> >           mirror    ONLINE       0     0     0
> >             c2d1    ONLINE       0     0     0
> >             c1d1    ONLINE       0     0     0
> >
> > errors: Permanent errors have been detected in the
> > following files:
> >
> >         box5:<0x0>
> > ---------
> >
> > I removed this permanent error by running a scrub,
> > which identified the checksum error, and then
> > cleaning it.
> >
> > ----------------
> > # zpool status -xv
> >   pool: box5
> >  state: ONLINE
> > status: One or more devices has experienced an error resulting in data
> >         corruption. Applications may be affected.
> >         Restore the file in question if possible. Otherwise restore the
> >         entire pool from backup.
> >         www.sun.com/msg/ZFS-8000-8A
> >  scrub: scrub in progress, 1.20% done, 176h37m to go
> > config:
> >
> >         NAME        STATE     READ WRITE CKSUM
> >         box5        ONLINE       0     0     4
> >           mirror    ONLINE       0     0     2
> >             c1d0    ONLINE       0     0     4
> >             c2d0    ONLINE       0     0     4
> >           mirror    ONLINE       0     0     2
> >             c2d1    ONLINE       0     0     4
> >             c1d1    ONLINE       0     0     4
> >
> > errors: Permanent errors have been detected in the
> > following files:
> >
> >         box5:<0x0>
> > ---------
> >
> > However, the ZFS error (box5:<0x0>) reappeared as soon as
> > the system crashed again.
> >
> > I've attached the stack trace, status and other info for
> > the last 4 crash dumps. I kept the last one for further
> > debugging just in case. Basically the stacks are very
> > similar.
> >
> > The only suspicion I had was my non-ECC RAM modules.
> > But I replaced all modules a couple of days ago and it
> > didn't help.
> >
> > Does anyone have similar symptoms? Any temporary/permanent
> > solutions appreciated.
> >
> > Message was edited by:
> > rstml

This message posted from opensolaris.org
_______________________________________________
opensolaris-discuss mailing list
[email protected]
