On Thu, 14 Feb 2013, Martin Steigerwald wrote:
On Wednesday, 13 February 2013, Fredrik Tolf wrote:
You started the balance after the above btrfs fi show command?

I did.

Then it's obvious to me:

For some reason BTRFS is still trying to write to /dev/sdd, which isn't
there anymore. That perfectly explains those lost page writes for me. If
that is the case, this seems to me like a serious bug in BTRFS.

Now I have simply disconnected the drive entirely, so that I can do exactly what I would have to do if the drive had really failed completely and I had gotten a replacement for it. The system no longer sees either sdd or sdi. However, I'm still getting kernel messages about being unable to write to sdd:

Feb 15 04:37:41 nerv kernel: [252822.640560] lost page write due to I/O error on /dev/sdd1
Feb 15 04:37:41 nerv kernel: [252822.644531] btrfs: bdev /dev/sdd1 errs: wr 362195, rd 26, flush 1, corrupt 0, gen 0

I can't say what conclusions that leads to with regard to your observations.

I'd restart the machine, see that BTRFS is using both devices again and
then try the balance again.

I mentioned it in another mail, but I'd very much prefer not to do that. I'd like to try to solve this the way I normally would when a drive fails.

When I'm running btrfs fi show, this is what I'm getting now:

$ sudo ./btrfs fi show
Label: none  uuid: 40d346bb-2c77-4a78-8803-1e441bf0aff7
        Total devices 2 FS bytes used 2.66TB
        devid    2 size 2.73TB used 2.67TB path /dev/sde1
        *** Some devices missing

So that's what it should look like when a drive fails, right?

At this point, I'm trying to remove the missing device from the filesystem as the Wiki indicates that I should be able to, but alas:

$ sudo ./btrfs device delete missing /mnt
ERROR: error removing the device 'missing' - Invalid argument

The dmesg tells me this:

Feb 15 04:42:22 nerv kernel: [253103.799201] btrfs: unable to go below two devices on raid1

How do I remove the conception of the missing device so that I can replace it? Should I simply add the replacement first, and only after that remove the missing device?

If the latter, how can I "scratch" the previous btrfs metadata from this "replacement" drive so that btrfs doesn't try to automatically reinsert it into the filesystem when it is detected? I assume it won't be enough just to zero the first few sectors of the drive, right?
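For context on that last question: btrfs does not keep its metadata only at the very start of a device. The primary superblock sits at 64 KiB, with mirror copies at 64 MiB and 256 GiB when the device is large enough, so zeroing just the first few 512-byte sectors would leave all of them intact. The shell sketch below uses a hypothetical helper, `btrfs_sb_offsets` (not part of btrfs-progs), to list which of these offsets apply to a device of a given size. The offsets follow the kernel's btrfs_sb_offset() logic as I understand it, and the 3 TB size in the example is an assumption chosen to match the "2.73TB" reported above. It only prints numbers and touches no device.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): list the byte offsets at which btrfs keeps
# superblock copies on a device of a given size.
# Assumed layout: primary copy at 64 KiB; mirror i (i = 1, 2) at
# 16 KiB << (12 * i), i.e. 64 MiB and 256 GiB.
# A copy is only used if it fits entirely before the end of the device.

btrfs_sb_offsets() {
    local size=$1          # device size in bytes
    local sb_size=4096     # size of one superblock copy in bytes
    local i offset
    for i in 0 1 2; do
        if (( i == 0 )); then
            offset=$(( 64 * 1024 ))                   # primary superblock
        else
            offset=$(( (16 * 1024) << (12 * i) ))     # mirror copies
        fi
        # Print the offset only if the whole copy fits on the device.
        if (( offset + sb_size < size )); then
            echo "$offset"
        fi
    done
}

# Hypothetical example: a nominal 3 TB drive (3,000,000,000,000 bytes).
btrfs_sb_offsets 3000000000000
```

For such a drive this prints 65536, 67108864 and 274877906944, so a wipe meant to stop the old filesystem from being recognised has to reach at least the 64 KiB offset, not just the first few sectors.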

Thanks for replying!

--

Fredrik Tolf
