On Sun, Sep 25, 2016 at 7:22 PM, Jeff Mahoney <je...@suse.com> wrote:
> On 9/25/16 9:55 AM, Rich Freeman wrote:
>> On Fri, Sep 23, 2016 at 12:58 AM, Duncan <1i5t5.dun...@cox.net> wrote:
>>>
>>> Btrfs raid1 you say, and you have existing compressed files it's trying
>>> to read in the backtrace?
>>>
>>> Sounds like the issues I see sometimes and have posted about where after
>>> a crash that resulted in one device of my raid1 pair getting behind the
>>> other, the kernel will crash if it sees too many csum-errors, even tho
>>> it's /supposed/ to check the other copy and read from it if valid (which
>>> it is as a btrfs scrub resolves the issue).
>>>
>>> When booted to rescue/single-user mode, can you run a scrub?
>>
>> After a few reboots trying to capture the initial panic message (even
>> when I set panic_on_oops=1 I was getting multiple ones with only the
>> tainted one staying on screen), the system managed to stay up. I
>> completed a scrub and it found no errors. I also haven't had any
>> issues with it but haven't attempted another reboot. I figured the
>> safest course was to just leave it on for a good week so that whatever
>> was in the log/etc that was giving it trouble works its way out. I'm
>> also doing a balance which may or may not help (and which is useful
>> anyway since I increased the size of the drive I replaced).
>
> If it stays up, can you post the initial Oops then?
>

Unfortunately, it stays up because there is no OOPS. It was crashing
fairly consistently, but for whatever reason it didn't this time.
Since I needed the box working and wasn't having much luck capturing
the OOPS, I just let it run with minimal prodding, and hopefully it is
now in a state where it won't crash.

But if it happens again I'll try to capture the initial OOPS output,
and I'll do a memory test in any case (though I really am not
expecting anything there).

If I were able to get kernel core dumping working on this machine,
would the dump contain information about the initial oops? I forget
whether it contains the full ring buffer, etc. I used to have it
working, but some change in either the kernel or the utils was causing
issues with it. I still boot my kernels with space set aside for the
crash kernel...

--
Rich