Hi all,

Before the week was out, I wanted to provide an update on this issue.

Last weekend, I installed two VMs with CURRENT (20240208-82bebc793658-268105), one on zfs and one on ufs, and built a kernel with this config file:

    include GENERIC
    ident THAYER-FULLDEBUG
    makeoptions DEBUG=-g
    options KASAN
    options DDB
    options INVARIANT_SUPPORT
    options INVARIANTS
    options QUEUE_MACRO_DEBUG_TRASH
    options WITNESS
    options WITNESS_SKIPSPIN
    options KGSSAPI

I'm also setting these in loader.conf:

    debug.witness.watch=1
    debug.witness.kdb=1
    kern.kstack_pages=8

These two VMs have been running non-stop with our hdf5 workload without a panic for 146 hours and 122 hours, respectively. This might be good news, but both uptimes are still well within the range of time-to-panic we've seen in our testing over the past 6 months. Given that all the debug kernel options slow things down significantly, these VMs could just be taking a long while to panic.

I also have another VM with our "standard" 14.0p5 kernel (GENERIC with KGSSAPI enabled) running on ufs to try to rule zfs in or out. This failed this morning, but not with a panic. In this case, nfs stopped responding. This is a failure mode we have seen in our testing, but it is much rarer than a full panic. I intend to continue testing this to try to induce a panic, at which point I think we can rule out zfs as a potential cause.

Just so it's documented: since I started experimenting with kernel debug options last week, I have so far induced panics with the following:

- 13.2p9 kernel on hardware (only WITNESS enabled)
- 14.0p4 kernel on VM (only KASAN enabled)
- 13.2p9 kernel on hardware (all debug options above except KASAN)

My plan right now is to continue running my two test VMs with CURRENT to see if it's just taking a long time to panic. Once I have finished my ufs testing on the third VM, I will build a GENERIC kernel for CURRENT (no debug options, only KGSSAPI) and test against that to see if the debug instrumentation itself is interfering with reproducing this issue.

Please reach out if you have ideas or suggestions.
I'll provide updates here when I have them.

Thanks,
Matt