On Sun, Sep 25, 2016 at 7:22 PM, Jeff Mahoney <je...@suse.com> wrote:
> On 9/25/16 9:55 AM, Rich Freeman wrote:
>> On Fri, Sep 23, 2016 at 12:58 AM, Duncan <1i5t5.dun...@cox.net> wrote:
>>>
>>> Btrfs raid1 you say, and you have existing compressed files it's trying
>>> to read in the backtrace?
>>>
>>> Sounds like the issues I see sometimes and have posted about where after
>>> a crash that resulted in one device of my raid1 pair getting behind the
>>> other, the kernel will crash if it sees too many csum-errors, even tho
>>> it's /supposed/ to check the other copy and read from it if valid (which
>>> it is as a btrfs scrub resolves the issue).
>>>
>>> When booted to rescue/single-user mode, can you run a scrub?
>>
>> After a few reboots trying to capture the initial panic message (even
>> when I set panic_on_oops=1 I was getting multiple ones with only the
>> tainted one staying on screen), the system managed to stay up.  I
>> completed a scrub and it found no errors.  I also haven't had any
>> issues with it but haven't attempted another reboot.  I figured the
>> safest course was to just leave it on for a good week so that whatever
>> was in the log/etc that was giving it trouble works its way out.  I'm
>> also doing a balance which may or may not help (and which is useful
>> anyway since I increased the size of the drive I replaced).
>
> If it stays up, can you post the initial Oops then?
>

Unfortunately, it stays up because there is no OOPS.  It was crashing
fairly consistently, but for whatever reason it didn't this time.
Since I needed the box working and wasn't having a lot of luck
capturing the OOPS I just let it run with minimal prodding, and
hopefully it is now in a state where it won't crash.

But, if it happens again I'll try to capture an initial OOPS output,
and I'll do a memory test in any case (though I really am not
expecting anything there).

If I were able to get kernel core dumping working on this machine,
would that contain information about the initial oops?  I forget if
the dumps contain the full ring buffer/etc.  I used to have it working but
some change in either the kernel or the utils was causing issues with
it.  I still boot my kernels with space set aside for the crash
kernel...

--
Rich