On Wed, Apr 22, 2015 at 05:47:17PM -0400, Diego Remolina wrote:
> In 2012, I setup a Centos 6.x machine with a btrfs file system on top
> of DRBD, we did some testing prior to going production and it seemed
> fine, and has worked fine for a long time. However, now we are
> encountering problems and was wondering if I could get any help.
> 
> [root@ysmha01 tmp]# btrfs fi show
> Label: none  uuid: 7a38f3ab-f3b0-4b3d-81c0-28b347b26da1
>         Total devices 1 FS bytes used 5.79TB
>         devid    1 size 18.19TB used 8.94TB path /dev/drbd0
> 
> Btrfs Btrfs v0.20-rc1

   This is old, but probably not related to your problem.

> While still running the official Centos
> kernel-2.6.32-504.12.2.el6.x86_64 the machine started crashing with a
> kernel oops. Since that happened, I tried a few different 2.6.32
> kernels with the same result. Yesterday I switched to the elrepo
> kernel-lt 3.10.75-1.el6.elrepo.x86_64 version

   These are very old (3.10) and utterly antique (2.6.32). Even with
backporting of patches, there are almost certainly some serious bugs
in those versions that have since been fixed.

> and was able to get the
> machine up and running and found some error messages which lead me to
> believe things were not too bad after all:

[snip]

> Apr 21 17:54:56 ysmha01 kernel: BTRFS warning (device drbd0): failed
> to load free space cache for block group 7255336419328, rebuild it now
> Apr 21 17:54:56 ysmha01 kernel: BTRFS warning (device drbd0): block
> group 7256410161152 has wrong amount of free space

   That on its own is, as you say, not a major problem. The fact that
it's repeating suggests that there's some other problem in there.
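
   To see how often each block group is affected, something like the
following could tally the warnings. This is only a rough sketch: the
function name is made up for illustration, the pattern covers just the
two message forms quoted above, and it assumes each message sits on
one line in the log rather than wrapped as it is in this mail.

```python
import re
from collections import Counter

# Covers only the two warning forms quoted in this thread.
WARNING = re.compile(
    r"BTRFS warning \(device [^)]+\): "
    r"(?:failed to load free space cache for block group (?P<cache>\d+)"
    r"|block group (?P<space>\d+) has wrong amount of free space)"
)


def count_block_group_warnings(lines):
    """Return a Counter mapping block group number -> warning count."""
    counts = Counter()
    for line in lines:
        match = WARNING.search(line)
        if match:
            group = match.group("cache") or match.group("space")
            counts[int(group)] += 1
    return counts
```

   Feeding it the lines of the system log (on CentOS 6 that is
usually /var/log/messages, but check your syslog configuration) and
looking at counts.most_common() would show whether the same block
groups keep turning up.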

> Since then, the machine was left up and serving samba shares until it
> had another kernel oops this morning.
> 
[snip oops]
> ........snip......

   Someone may recognise that oops as a bug that's since been fixed,
but that's unlikely: with a kernel that old, the information has
probably fallen out of the head of anyone who might have known about
it.

> When the oops happens, then the mount point becomes unusable. What
> would be the best path to recovery from here?
> 
> What other information may I provide?

   I think the best thing for you to do is find a suitable 3.19 or 4.0
kernel and see how that behaves with this filesystem.

   Another thing to do would be to get hold of a recent (3.19) set of
userspace tools, and run btrfs check --readonly on the filesystem
(unmounted), and report back what that says.
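
   As a rough sketch of that second step, assuming the new tools are
installed as "btrfs" on the PATH and the device is still /dev/drbd0 as
in your output above (the helper names here are invented for
illustration):

```python
import subprocess


def is_mounted(device, mounts_text):
    """True if `device` is the source of any entry in /proc/mounts-style text."""
    return any(line.split()[:1] == [device] for line in mounts_text.splitlines())


def check_command(device):
    """Argument list for a read-only check; nothing is written to the device."""
    return ["btrfs", "check", "--readonly", device]


def run_readonly_check(device="/dev/drbd0"):
    """Refuse to run on a mounted device, then capture the check's output."""
    with open("/proc/mounts") as mounts:
        if is_mounted(device, mounts.read()):
            raise RuntimeError(f"{device} is still mounted; unmount it first")
    # Print the tools version so it can be included in the report.
    subprocess.run(["btrfs", "--version"], check=True)
    # Don't raise on a non-zero exit status: the output is what we want
    # to report, whatever the check concludes.
    result = subprocess.run(check_command(device), capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr
```

   Calling run_readonly_check() as root returns the exit status and
the combined output, which is the part worth posting back to the list.
On 18TB it may take a long time to finish.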

   Hugo.

-- 
Hugo Mills             | People are too unreliable to be replaced by
hugo@... carfax.org.uk | machines.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                              Nathan Spring, Star Cops

