On 2013-07-08 22:58, CJ Keist wrote:
> Thank you all for the replies. I tried OmniOS and Oracle Solaris 11.1, but neither was able to import the data pool. So I reinstalled OI 151a7, and after importing the data pool and having it crash, I booted into single-user mode. At this point I was able to start "zpool scrub data", and it looks to be running!! I will wait and see if the scrub can finish and then try to remount everything. See attached pic.
That screenshot is worrying: for such a large pool you have only one device. Is it on hardware RAID, which hides all the disks, and with them any redundancy and repair options, from ZFS? In that case the data error may be anywhere in that RAID's implementation. For example, when you did a forced reboot, some critical data may not have been flushed to the disks at all, or, worse, it was flushed in the wrong order: the uberblock updates landed before the other metadata updates, and the latter never made it.

I think you did import the pool read-write for the scrub, so it is probably too late to try rolling back a few transactions to an older but possibly more consistent state of the pool (or did you already do that when you got it to import?).

If the pool just "gave up" and, after a few panics, began to import at least far enough for the kernel to accept it, it is possible (this is just from my experience, I'm shooting ideas into the sky here) that some deferred operations were recorded on the pool and it finally worked through them. For example, I once had a series of panicky reboots when deleting lots of data on a deduped pool on a machine with little RAM (8 GB): enumerating the DDT consumed a lot more than that, the kernel couldn't swap, and BAM! It took about two weeks of resetting the box every 3-4 hours for it to straighten itself out...

For the developers here to offer more targeted ideas and/or produce a fix, it would certainly help if you could provide a stack trace of the kernel panic, to see where it goes wrong (probably some on-disk data did not match an assertion, such as an unexpected zero or nonzero value). For this you could boot with kmdb loaded (preferably on a serial console, since the traces are quite long and scroll off the 25-line screen), so that when the problem occurs the messages are printed but the machine does not reboot automatically. Actually, with a serial console you might care a bit less about kmdb, if you can copy-paste the trace quickly enough before it is overwritten by the BIOS POST messages.
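A toy Python model may make the write-ordering failure and the "roll back a few transactions" idea above more concrete. It is a sketch under loud assumptions, not the ZFS on-disk format: a dict stands in for the disk, each transaction group (txg) writes new copy-on-write blocks and then publishes an uberblock pointing at the new root, and "rewinding" means picking the newest uberblock whose whole tree actually reached the disk. The names Disk, commit, tree_is_complete and rewind_target are hypothetical. On the real system the corresponding tool is the recovery mode of zpool import (-F, plus -n for a dry run that only reports whether discarding the last few transactions would make the pool importable).

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Toy copy-on-write pool (hypothetical). NOT the real ZFS on-disk layout."""
    blocks: dict = field(default_factory=dict)      # block id -> list of child block ids
    uberblocks: list = field(default_factory=list)  # (txg, root block id), oldest first

    def tree_is_complete(self, root):
        """True if root and every block reachable from it reached the disk."""
        stack = [root]
        while stack:
            bid = stack.pop()
            if bid not in self.blocks:
                return False  # dangling pointer: the "corrupt pool" case
            stack.extend(self.blocks[bid])
        return True

    def rewind_target(self):
        """Newest txg whose whole tree is on disk (the idea behind a rewind import)."""
        for txg, root in reversed(self.uberblocks):
            if self.tree_is_complete(root):
                return txg
        return None


def commit(disk, txg, new_blocks, root, reordered_crash=False):
    """Write one txg; new_blocks are freshly allocated, so nothing is overwritten."""
    if reordered_crash:
        # Assumed misbehaving write cache: the uberblock reached the platters,
        # but the blocks it points to were lost with the cache at power-off.
        disk.uberblocks.append((txg, root))
        return
    # Correct ordering: flush the new blocks first, then publish the uberblock.
    disk.blocks.update(new_blocks)
    disk.uberblocks.append((txg, root))


disk = Disk()
commit(disk, 1, {"r1": ["a"], "a": []}, "r1")
commit(disk, 2, {"r2": ["a", "b"], "b": []}, "r2")  # unchanged block "a" is shared with txg 1
commit(disk, 3, {"r3": ["a", "c"], "c": []}, "r3", reordered_crash=True)

newest_txg, newest_root = disk.uberblocks[-1]
print(newest_txg, disk.tree_is_complete(newest_root))  # 3 False
print(disk.rewind_target())                            # 2
```

The toy never frees anything. Real ZFS reuses space freed by later txgs after a short delay, which is why a rewind can only reach back a few transactions, and why a read-write import (and a scrub running on it) writes new txgs and narrows that window.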
HTH,
//Jim

_______________________________________________
OpenIndiana-discuss mailing list
[email protected]
http://openindiana.org/mailman/listinfo/openindiana-discuss
