2012-04-19 11:23, Joshua M. Clulow wrote: > Assuming you have keys in redmine already (so that I don't have to > re-run the sync process) you can try out this test version: > webrev -t rsync://web...@alpha.sysmgr.org:some_webrev_name -O -U
Heh, it works and is quite simple - must be magic... gotta get this into Wiki when the solution is polished ;) So here goes my first webrev attempt: http://alpha.sysmgr.org/~webrev/jimklimov/issue2024/ I've reported on the issue in detail on the zfs-discuss list this winter, and in illumos bugtracker as issue #2024 ZFS dedup=on should not panic the system when two blocks with same checksum exist in DDT https://www.illumos.org/issues/2024 In short, I had some corruption on a pool with deduped data. DDT entries had a checksum, same as in BP, but the actual data block was corrupted and mismatched the checksum. If the block identical to original (same checksum) is written with dedup=verify, then ZFS detects that the newly written block is different from ondisk data, and makes another DDT entry. If the original block is later rewritten with dedup=on (no verification) ZFS panics the kernel. Stacktrace research led to some routines which can validly return a NULL pointer during DDT lookups, but other code assumes that a pointer is always valid - thus my kernel got NULL pointer dereferences and panicked. This panic also happens soon after pool reimport (after reboot), I guess some processing like deferred-free kicks in. The proposed fix adds some non-NULLness checks to the routine in question and some routines near it in the source code file. I added kernel logging in case the lookup returns NULL; it may be correct to invoke the message only if the kernel is DEBUG... or maybe the users should always know there is something foul with their pools? It is possible that some of these checks may be not needed, or that I missed some locations where similar checks are due. It is also possible that the solution may lead to block leaks. As an interim solution, I prefered to potentially lose a few MBs of disk space rather than not have access to the whole pool. ZFS code gurus are kindly asked to review the solution here: http://alpha.sysmgr.org/~webrev/jimklimov/issue2024/ https://www.illumos.org/issues/2024 Thanks, //Jim Klimov _______________________________________________ oi-dev mailing list oi-dev@openindiana.org http://openindiana.org/mailman/listinfo/oi-dev