Re: Rebuilding chunk root?

2012-09-24 Thread Sami Haahtinen
On Mon, Sep 24, 2012 at 6:12 PM, David Sterba wrote:
> On Mon, Sep 24, 2012 at 03:02:39PM +0100, Hugo Mills wrote:
> > Out of interest, does mounting with -o recovery help at all? (I'm
> > not expecting it to do much if your chunk tree's gone, but it might do
> > something).
>
> The -o recovery has access to the respective tree roots, but the
> contents may be destroyed already. The chunk tree is not deep, I can see
> height 1 on a 6 disk array (though lightly used, 1 node, 8 leaves) and 3
> disk array (1/7 TB used, 1 node, 29 leaves). So it's quite a small
> amount of data to destroy the chunktree completely, COW will lower the
> chances a bit.

Yeah, the whole tree is gone, I'm pretty sure of it since the first
20-50GB has been wiped from the drive and the mentioned address is in
the beginning of that part. I just wonder if there is any chance of
the older versions of the chunk tree still being somewhere and how to
find them. I doubt it's an easy feat though.
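
To put the quoted figures in perspective, here is a rough sketch of how
little data the chunk tree occupies. The 4 KiB tree block size is my
assumption (it is not stated in the thread for those arrays), and
chunk_tree_bytes is just an illustrative helper name:

```python
# Assumption: 4 KiB tree blocks; the real node size of the quoted arrays is not given.
NODE_SIZE = 4096


def chunk_tree_bytes(nodes: int, leaves: int, node_size: int = NODE_SIZE) -> int:
    """Total bytes occupied by a tree with the given number of blocks."""
    return (nodes + leaves) * node_size


# Node and leaf counts as quoted above for the two example arrays.
for label, nodes, leaves in [("6 disk array", 1, 8), ("3 disk array", 1, 29)]:
    size = chunk_tree_bytes(nodes, leaves)
    print(f"{label}: {size} bytes ({size / 1024:.0f} KiB)")
```

Under that assumption both trees fit in well under a megabyte, so a
wipe of tens of gigabytes at the start of the drive can easily cover
all of it.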

> Rebuilding from scratch does not look simple, the available information
> is stored in BLOCK_GROUP_ITEMs or INODE_ITEMs and covers portions of the
> chunks. Given that the device tree would be probably damaged as well,
> the amount of information to do cross-check is not high. Maybe replaying
> the chunk creation logic can save some guesswork.

Replaying chunk creation logic would not help that much, since the
drive has been resized a few times and had other operations that have
modified the chunk tree as well. The array itself is not that complex
(2 drives), but it's still not as simple as a single drive array.

Regards,
--
Sami Haahtinen
Bad Wolf Oy
+358443302775
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


filesystem in such a state that btrfsck crashes

2012-10-16 Thread Sami Haahtinen
Hi,

A few days ago I started experiencing some major slowdowns in my main
btrfs filesystem and when inspecting the errors I noticed an error
during balance:

btrfs: block rsv returned -28
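
For reference, the -28 is a negated errno value, and on Linux errno 28
is ENOSPC. A minimal sketch for decoding such return values
(describe_kernel_error is only an illustrative helper name, not part of
any btrfs tool):

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Map a negative kernel-style return value to its errno name and message."""
    number = -code
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name}: {os.strerror(number)}"


print(describe_kernel_error(-28))
```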

After a while I started seeing worse and worse problems with the
filesystem. Eventually I was forced to fall back to a LiveCD to attempt
a fix. I downloaded the latest version of btrfs-progs from git and ran
btrfsck against the filesystem, but btrfsck dies with a segfault after
quite some time.

The kernel on which I first saw the problems was 3.6.0, and I was on
3.6.2 during the final few moments. I have backups of the filesystem,
but I'm a bit curious about what is causing this. I've seen mentions of
other segfault-causing problems, so if there is anything I can do to
help debug the cause of the segfault, I'd be happy to help.

Filesystem info:

Label: none  uuid: 6dab592a-72a2-41bd-a773-c16614c56f51
Total devices 2 FS bytes used 843.64GB
devid 2 size 1.80TB used 1.32TB path /dev/sdb2
devid 1 size 1.80TB used 1.32TB path /dev/sda2

Data, RAID1: total=1.25TB, used=809.41GB
System, RAID1: total=32.00MB, used=196.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=74.00GB, used=34.23GB
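
As a quick sanity check of the numbers above, here is a small sketch
that computes how full each allocated chunk type is. It assumes the
units are binary (1TB = 1024GB) and that a bare number without a unit
is in bytes; the regular expression and helper names are mine:

```python
import re

# Assumption: binary units, expressed here as a factor relative to GB.
UNIT_TO_GB = {"KB": 1 / (1024 * 1024), "MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}

FI_DF = """\
Data, RAID1: total=1.25TB, used=809.41GB
System, RAID1: total=32.00MB, used=196.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=74.00GB, used=34.23GB
"""

LINE = re.compile(
    r"^(?P<kind>[^:]+): total=(?P<total>[\d.]+)(?P<tu>[KMGT]B)?"
    r", used=(?P<used>[\d.]+)(?P<uu>[KMGT]B)?$"
)


def to_gb(value: str, unit) -> float:
    """Convert a value with an optional unit suffix to GB."""
    if unit is None:  # assumption: a bare number is a byte count
        return float(value) / 1024**3
    return float(value) * UNIT_TO_GB[unit]


def usage(text: str) -> dict:
    """Return {chunk type: (total_gb, used_gb, percent_used)}."""
    result = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue
        total = to_gb(m["total"], m["tu"])
        used = to_gb(m["used"], m["uu"])
        result[m["kind"]] = (total, used, 100.0 * used / total if total else 0.0)
    return result


for kind, (total, used, pct) in usage(FI_DF).items():
    print(f"{kind}: {used:.2f}GB of {total:.2f}GB ({pct:.1f}%)")
```

By this calculation the data chunks are roughly 63% full and the
metadata chunks roughly 46% full.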


Regards,
--
Sami Haahtinen
Bad Wolf Oy
+358443302775
root@sysresccd /root % btrfsck /dev/sda2
checking extents
ref mismatch on [282295177216 4096] extent item 12, found 8
Backref 282295177216 parent 276656304128 not referenced back 0x12076918
Backref 282295177216 parent 276656295936 not referenced back 0xc36b378
Backref 282295177216 parent 276743098368 not referenced back 0xbf808d8
Incorrect global backref count on 282295177216 found 11 wanted 8
backpointer mismatch on [282295177216 4096]
ref mismatch on [28930420 4096] extent item 2, found 1
Backref 28930420 parent 276453351424 not referenced back 0x82a56ea0
Incorrect global backref count on 28930420 found 2 wanted 1
backpointer mismatch on [28930420 4096]
ref mismatch on [289330163712 4096] extent item 2, found 1
Backref 289330163712 parent 276453351424 not referenced back 0x81b2e2e0
Incorrect global backref count on 289330163712 found 2 wanted 1
backpointer mismatch on [289330163712 4096]
ref mismatch on [289332174848 4096] extent item 2, found 1
Backref 289332174848 parent 276453351424 not referenced back 0x82c3b070
Incorrect global backref count on 289332174848 found 2 wanted 1
backpointer mismatch on [289332174848 4096]
ref mismatch on [289333620736 4096] extent item 2, found 1
Backref 289333620736 parent 276453351424 not referenced back 0x812fe950
Incorrect global backref count on 289333620736 found 2 wanted 1
backpointer mismatch on [289333620736 4096]
ref mismatch on [289334108160 4096] extent item 2, found 1
Backref 289334108160 parent 276656295936 not referenced back 0x85d1dde8
Incorrect global backref count on 289334108160 found 2 wanted 1
backpointer mismatch on [289334108160 4096]
ref mismatch on [289396117504 4096] extent item 5, found 4
Backref 289396117504 parent 276453339136 not referenced back 0xb243ca0
Incorrect global backref count on 289396117504 found 5 wanted 4
backpointer mismatch on [289396117504 4096]
ref mismatch on [289484378112 4096] extent item 2, found 1
Backref 289484378112 parent 276453339136 not referenced back 0x84014f20
Incorrect global backref count on 289484378112 found 2 wanted 1
backpointer mismatch on [289484378112 4096]
ref mismatch on [289488429056 4096] extent item 2, found 1
Backref 289488429056 parent 276453339136 not referenced back 0x82dd51b8
Incorrect global backref count on 289488429056 found 2 wanted 1
backpointer mismatch on [289488429056 4096]
ref mismatch on [289490132992 4096] extent item 2, found 1
Backref 289490132992 parent 276453339136 not referenced back 0x82810310
Incorrect global backref count on 289490132992 found 2 wanted 1
backpointer mismatch on [289490132992 4096]
ref mismatch on [289490591744 4096] extent item 2, found 1
Backref 289490591744 parent 276656304128 not referenced back 0x849310c8
Incorrect global backref count on 289490591744 found 2 wanted 1
backpointer mismatch on [289490591744 4096]
Errors found in extent allocation tree
checking fs roots
root 257 inode 1766316 errors 400
zsh: abort  btrfsck /dev/sda2
root@sysresccd /root %
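
To make the output above easier to digest, here is a rough sketch that
counts the "ref mismatch" entries and groups the unreferenced backrefs
by parent block. The line patterns are inferred only from the output
shown here, not from the btrfsck source, and the sample is a short
excerpt of that output:

```python
import re
from collections import Counter

SAMPLE = """\
ref mismatch on [282295177216 4096] extent item 12, found 8
Backref 282295177216 parent 276656304128 not referenced back 0x12076918
Backref 282295177216 parent 276656295936 not referenced back 0xc36b378
Backref 282295177216 parent 276743098368 not referenced back 0xbf808d8
ref mismatch on [289330163712 4096] extent item 2, found 1
Backref 289330163712 parent 276453351424 not referenced back 0x81b2e2e0
ref mismatch on [289332174848 4096] extent item 2, found 1
Backref 289332174848 parent 276453351424 not referenced back 0x82c3b070
"""

MISMATCH = re.compile(r"^ref mismatch on \[(\d+) (\d+)\]")
BACKREF = re.compile(r"^Backref (\d+) parent (\d+) not referenced back")


def summarize(text: str):
    """Return (number of ref mismatches, Counter of backrefs per parent block)."""
    mismatches = 0
    parents = Counter()
    for line in text.splitlines():
        if MISMATCH.match(line):
            mismatches += 1
            continue
        m = BACKREF.match(line)
        if m:
            parents[int(m.group(2))] += 1
    return mismatches, parents


count, parents = summarize(SAMPLE)
print(f"{count} ref mismatches")
for parent, n in parents.most_common():
    print(f"parent {parent}: {n} unreferenced backrefs")
```

In the full output only a handful of parent blocks show up over and
over, which is easier to see once the lines are grouped like this.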


Re: filesystem in such a state that btrfsck crashes

2012-10-17 Thread Sami Haahtinen
On Tue, Oct 16, 2012 at 11:25 PM, Sami Haahtinen wrote:
> The kernel on which I first saw the problems was 3.6.0, and I was on
> 3.6.2 during the final few moments. I have backups of the filesystem,
> but I'm a bit curious about what is causing this. I've seen mentions of
> other segfault-causing problems, so if there is anything I can do to
> help debug the cause of the segfault, I'd be happy to help.

After a bit of investigating I noticed that I was unable to delete
some of the snapshots, so the problem is most likely related to
snapshots.

root@sysresccd /mnt/windows % btrfs sub del @apt-snap*
Delete subvolume '/mnt/windows/@apt-snapshot-2012-09-25_17:01:15'
ERROR: cannot delete '/mnt/windows/@apt-snapshot-2012-09-25_17:01:15' - Directory not empty
Delete subvolume '/mnt/windows/@apt-snapshot-2012-09-25_17:01:16'
ERROR: cannot delete '/mnt/windows/@apt-snapshot-2012-09-25_17:01:16' - Directory not empty
[...]

The unmount process after the deletion has now taken quite a while
(over an hour) and I'm not really expecting it to complete.

Regards,
-- 
Sami Haahtinen
Bad Wolf Oy
+358443302775