Re: btrfsck: extent-tree.c:2549: btrfs_reserve_extent: Assertion `!(ret)' failed.

2013-02-19 Thread Richard Cooper

On 3 Jan 2013, at 12:26, Richard Cooper wrote:
> Hi All,
> 
> I'm trying to repair a broken fs using btrfsck and am hitting a failed 
> assertion. I'd appreciate any suggestions for what to do next. Is there 
> anything I can do to help fix this bug? Any other information from my FS which 
> would help? If the FS could be salvaged that would be a bonus, but I'm more 
> interested in providing a useful bug report before wiping the disk.

After this error I reformatted and started again. A few days ago I hit exactly 
the same error in btrfsck again. Is it useful for me to report the same errors 
again? Would it make more sense for me to submit them to 
https://bugzilla.kernel.org? Is that tracker appropriate for btrfs-progs 
related bugs?

My versions are:
- OS - CentOS 6.3
- Kernel - 3.7.8-1 from http://elrepo.org/tiki/kernel-ml 
- btrfs-progs - v0.20-rc1-56-g6cd836d. Built from 
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

# ./btrfs-progs/btrfs fi show /dev/md4
Label: none  uuid: fecad98e-63e8-47b7-9bc1-5d9351f15e76
Total devices 1 FS bytes used 702.22GB
devid    1 size 16.36TB used 723.04GB path /dev/md4

Btrfs v0.20-rc1-56-g6cd836d

# ./btrfsck --repair /dev/md4 
enabling repair mode
checking extents
incorrect offsets 2539 133611
bad block 694804008960
bad key ordering 26 27
bad block 710147256320
bad key ordering 29 30
bad block 710793940992
ref mismatch on [506097786880 8192] extent item 1, found 0
btrfsck: extent-tree.c:2549: btrfs_reserve_extent: Assertion `!(ret)' failed.
Aborted
# echo $?
134
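
For reference, an exit status above 128 conventionally means the process was killed by a signal whose number is the status minus 128. Status 134 therefore corresponds to signal 6 (SIGABRT), which matches the "Aborted" line after the failed assertion. A minimal sketch of the decoding follows; `signal_from_status` is a hypothetical helper name, not part of btrfs-progs:

```shell
# Hypothetical helper: map a shell exit status to a signal name, if any.
signal_from_status() {
    rc=$1
    if [ "$rc" -gt 128 ]; then
        # kill -l <number> prints the name of the signal with that number
        kill -l "$((rc - 128))"
    else
        echo "exited normally with status $rc"
    fi
}

signal_from_status 134
```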

If I run btrfsck in non-repair mode I get:

# ./btrfsck /dev/md4 
checking extents
incorrect offsets 2539 133611
bad block 694804008960
bad key ordering 26 27
bad block 710147256320
bad key ordering 29 30
bad block 710793940992
ref mismatch on [506097786880 8192] extent item 1, found 0
Incorrect local backref count on 506097786880 root 257 owner 6201018 offset 0 
found 0 wanted 1 back 0x72966380
backpointer mismatch on [506097786880 8192]
owner ref check failed [506097786880 8192]
ref mismatch on [506097795072 8192] extent item 1, found 0
Incorrect local backref count on 506097795072 root 257 owner 6201019 offset 0 
found 0 wanted 1 back 0x72966460
backpointer mismatch on [506097795072 8192]
owner ref check failed [506097795072 8192]
ref mismatch on [506097803264 8192] extent item 1, found 0
Incorrect local backref count on 506097803264 root 257 owner 6201021 offset 0 
found 0 wanted 1 back 0x72966540
backpointer mismatch on [506097803264 8192]
owner ref check failed [506097803264 8192]
ref mismatch on [506097811456 8192] extent item 1, found 0
Incorrect local backref count on 506097811456 root 257 owner 6201022 offset 0 
found 0 wanted 1 back 0x72966620
backpointer mismatch on [506097811456 8192]
owner ref check failed [506097811456 8192]
ref mismatch on [686802194432 20480] extent item 1, found 0
Incorrect local backref count on 686802194432 root 257 owner 8037960 offset 0 
found 0 wanted 1 back 0x82c1e2a0
backpointer mismatch on [686802194432 20480]
owner ref check failed [686802194432 20480]
ref mismatch on [686802214912 20480] extent item 1, found 0
Incorrect local backref count on 686802214912 root 257 owner 8037961 offset 0 
found 0 wanted 1 back 0x82c1e380
backpointer mismatch on [686802214912 20480]
owner ref check failed [686802214912 20480]
ref mismatch on [686802235392 20480] extent item 1, found 0
Incorrect local backref count on 686802235392 root 257 owner 8037962 offset 0 
found 0 wanted 1 back 0x82c1e460
backpointer mismatch on [686802235392 20480]
owner ref check failed [686802235392 20480]
ref mismatch on [686802255872 20480] extent item 1, found 0
Incorrect local backref count on 686802255872 root 257 owner 8037963 offset 0 
found 0 wanted 1 back 0x82c1e540
backpointer mismatch on [686802255872 20480]
owner ref check failed [686802255872 20480]
ref mismatch on [686802276352 20480] extent item 1, found 0
Incorrect local backref count on 686802276352 root 257 owner 8037964 offset 0 
found 0 wanted 1 back 0x82c1e620
backpointer mismatch on [686802276352 20480]
owner ref check failed [686802276352 20480]
ref mismatch on [686802296832 20480] extent item 1, found 0
Incorrect local backref count on 686802296832 root 257 owner 8037965 offset 0 
found 0 wanted 1 back 0x82c1e700
backpointer mismatch on [686802296832 20480]
owner ref check failed [686802296832 20480]
ref mismatch on [686802317312 20480] extent item 1, found 0
Incorrect local backref count on 686802317312 root 257 owner 8037966 offset 0 
found 0 wanted 1 back 0x82c1e7e0
backpointer mismatch on [686802317312 20480]
owner ref check failed [686802317312 20480]
owner ref check failed [694804008960 4096]
owner ref check failed [710147256320 4096]
owner ref check failed [710793940992 4096]
Errors found in extent allocation tree
checking fs roots
root 257 inode 6292865 errors 400
root 257 inode 6292867 errors 400
found 754001297408 bytes used err is 1
tot
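
The non-repair output above repeats the same few message types many times, so a tally per type can make two runs easier to compare. The sketch below is one way to do that; `summarize_btrfsck` is a hypothetical helper name, and the patterns are taken only from the message prefixes visible in this output, so other btrfsck messages are ignored:

```shell
# Hypothetical helper: read btrfsck output on stdin, print a count per message type.
summarize_btrfsck() {
    awk '
    /^ref mismatch on /                 { n["ref mismatch"]++ }
    /^backpointer mismatch on /         { n["backpointer mismatch"]++ }
    /^owner ref check failed /          { n["owner ref check failed"]++ }
    /^bad block /                       { n["bad block"]++ }
    /^root [0-9]+ inode [0-9]+ errors / { n["inode errors"]++ }
    END {
        # awk does not define the iteration order, so the result is sorted below
        for (k in n) printf "%d %s\n", n[k], k
    }' | LC_ALL=C sort -k2
}

# Usage (assumes the output was saved to a file first):
#   ./btrfsck /dev/md4 > btrfsck.log 2>&1
#   summarize_btrfsck < btrfsck.log
```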

kernel BUG at fs/btrfs/extent-tree.c:5151

2013-01-21 Thread Richard Cooper
Hi All,

I have a machine that doesn't get on with btrfs at all. I've been using it as a 
testbed for several months and it never seems to last more than a couple of 
weeks before FS corruption.

My versions are:
- OS - CentOS 6.3
- Kernel - 3.7.1-2 from http://elrepo.org/tiki/kernel-ml 
- btrfs-progs - v0.20-rc1-37-g91d9eec. Built from 
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

The /var/log/messages log for the latest failure is given below.

Two things of note:
1. The btrfs device is sitting on top of a 16.3 TB, 7-partition software RAID 5 
array. The bug seems to have occurred during an md data-check on that device. By 
default, CentOS 6 runs these checks every week.
2. After I rebooted the server, the first thing I did was try to unmount the 
btrfs device. However, the umount command hangs in an uninterruptible sleep in 
btrfs_error_commit_super.
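
To correlate a failure with an md data-check, the array's current activity can be read from the `sync_action` file in sysfs, which reports `check` while a data-check is running. A minimal sketch, assuming the array is md4 as in the other reports in this thread; `md_check_running` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the given sync_action file reports a data-check.
# $1: path to the file, e.g. /sys/block/md4/md/sync_action
md_check_running() {
    [ "$(cat "$1" 2>/dev/null)" = "check" ]
}

# Usage:
#   if md_check_running /sys/block/md4/md/sync_action; then
#       echo "md data-check in progress"
#   fi
```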

Thoughts?

- Richard

Jan 20 11:30:04 backup3 kernel: leaf 556658688 total ptrs 102 free space 851
Jan 20 11:30:04 backup3 kernel: item 0 key (5846384640 b8 282234880) 
itemoff 3991 itemsize 4
Jan 20 11:30:04 backup3 kernel: shared data backref count 1
Jan 20 11:30:04 backup3 kernel: item 1 key (5846384640 b8 369684480) 
itemoff 3987 itemsize 4
Jan 20 11:30:04 backup3 kernel: shared data backref count 1
[snip a bunch of similar lines]
Jan 20 11:30:04 backup3 kernel: item 32 key (5846384640 b8 
225396207616) itemoff 3863 itemsize 4
Jan 20 11:30:04 backup3 kernel: shared data backref count 1
Jan 20 11:30:04 backup3 kernel: item 33 key (5846429696 a8 65536) 
itemoff 3797 itemsize 66
Jan 20 11:30:04 backup3 kernel: extent refs 35 gen 40 flags 1
Jan 20 11:30:04 backup3 kernel: extent data backref root 257 
objectid 172094 offset 0 count 1
Jan 20 11:30:04 backup3 kernel: shared data backref parent 
73255260160 count 1
Jan 20 11:30:04 backup3 kernel: item 34 key (5846429696 b8 282234880) 
itemoff 3793 itemsize 4
Jan 20 11:30:04 backup3 kernel: shared data backref count 1
[snip a bunch of similar lines]
Jan 20 11:30:04 backup3 kernel: item 67 key (5846495232 a8 77824) 
itemoff 3599 itemsize 66
Jan 20 11:30:04 backup3 kernel: extent refs 35 gen 40 flags 1
Jan 20 11:30:04 backup3 kernel: extent data backref root 257 
objectid 172091 offset 0 count 1
Jan 20 11:30:04 backup3 kernel: shared data backref parent 
73255256064 count 1
Jan 20 11:30:04 backup3 kernel: item 68 key (5846495232 b8 28592) 
itemoff 3595 itemsize 4
Jan 20 11:30:04 backup3 kernel: shared data backref count 1
[snip a bunch of similar lines]
Jan 20 11:30:04 backup3 kernel: item 100 key (5846495232 b8 
225396203520) itemoff 3467 itemsize 4
Jan 20 11:30:04 backup3 kernel: shared data backref count 1
Jan 20 11:30:04 backup3 kernel: item 101 key (5846573056 a8 90112) 
itemoff 3401 itemsize 66
Jan 20 11:30:04 backup3 kernel: extent refs 36 gen 40 flags 1
Jan 20 11:30:04 backup3 kernel: extent data backref root 257 
objectid 172096 offset 0 count 1
Jan 20 11:30:04 backup3 kernel: shared data backref parent 
73255260160 count 1
Jan 20 11:30:04 backup3 kernel: [ cut here ]
Jan 20 11:30:04 backup3 kernel: WARNING: at fs/btrfs/extent-tree.c:5134 
__btrfs_free_extent+0x714/0x860 [btrfs]()
Jan 20 11:30:04 backup3 kernel: Hardware name: MS-7522
Jan 20 11:30:04 backup3 kernel: Modules linked in: btrfs libcrc32c ipv6 
ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack 
iptable_filter ip_tables gpio_ich iTCO_wdt iTCO_vendor_support coretemp 
kvm_intel kvm crc32c_intel microcode pcspkr i2c_i801 lpc_ich r8169 mii sg 
i7core_edac edac_core ext4 mbcache jbd2 raid456 async_raid6_recov async_pq 
raid6_pq async_xor xor async_memcpy async_tx raid1 sd_mod crc_t10dif pata_acpi 
ata_generic pata_jmicron ahci libahci nouveau ttm drm_kms_helper hwmon mxm_wmi 
video wmi dm_mirror dm_region_hash dm_log dm_mod
Jan 20 11:30:04 backup3 kernel: Pid: 25973, comm: btrfs-cleaner Not tainted 
3.7.1-2.el6.elrepo.x86_64 #1
Jan 20 11:30:04 backup3 kernel: Call Trace:
Jan 20 11:30:04 backup3 kernel: [] 
warn_slowpath_common+0x7f/0xc0
Jan 20 11:30:04 backup3 kernel: [] 
warn_slowpath_null+0x1a/0x20
Jan 20 11:30:04 backup3 kernel: [] 
__btrfs_free_extent+0x714/0x860 [btrfs]
Jan 20 11:30:04 backup3 kernel: [] 
run_delayed_data_ref+0x159/0x160 [btrfs]
Jan 20 11:30:04 backup3 kernel: [] 
run_one_delayed_ref+0xba/0xc0 [btrfs]
Jan 20 11:30:04 backup3 kernel: [] 
run_clustered_refs+0x116/0x370 [btrfs]
Jan 20 11:30:04 backup3 kernel: [] 
btrfs_run_delayed_refs+0xd0/0x300 [btrfs]
Jan 20 11:30:04 backup3 kernel: [] 
btrfs_should_end_transaction+0x44/0x90 [btrfs]
Jan 20 11:30:04 backup3 kernel: [] 
btrfs_drop_snapshot+0x3b4/0x610 [btrfs]
Jan 20 11:30:04 backup3 kernel: [] ? __schedule+0x
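
The call chain in a trace like the one above can be pulled out as a plain list of function names, which is easier to quote in a bug report. The sketch below assumes the `symbol+0xOFFSET/0xSIZE` form seen in this log; `trace_symbols` is a hypothetical helper name. Truncated entries such as the final `__schedule+0x` do not match and are skipped, and the function named on the WARNING line is listed again when it reappears in the trace:

```shell
# Hypothetical helper: read a kernel log on stdin, print one function name per
# symbol+0xOFFSET/0xSIZE entry, in order of appearance.
trace_symbols() {
    grep -Eo '[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+' | sed 's/+.*//'
}

# Usage:
#   trace_symbols < /var/log/messages
```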

Re: btrfsck: extent-tree.c:2549: btrfs_reserve_extent: Assertion `!(ret)' failed.

2013-01-09 Thread Richard Cooper

On 3 Jan 2013, at 16:43, Richard Cooper wrote:

> On 3 Jan 2013, at 15:06, Josef Bacik wrote:
> 
>> On Thu, Jan 03, 2013 at 05:26:38AM -0700, Richard Cooper wrote:
>>> Hi All,
>>> 
>>> I'm trying to repair a broken fs using btrfsck and am hitting a failed 
>>> assertion. I'd appreciate any suggestions for what to do next. Is there 
>>> anything I can do to help fix this bug? Any other information from my FS 
>>> which would help? If the FS could be salvaged that would be a bonus, but 
>>> I'm more interested in providing a useful bug report before wiping the disk.
>>> 
>> 
>> Well good news is that it's the allocator failing to find space for a new 
>> block, and the allocator in btrfs-progs is under-tested, so it's likely just 
>> an internal bug and something we can fix.  Can you do btrfs fi show /dev/md4 
>> (not mounted) and post that so we can be sure there's actually enough space?
> 
> # ./btrfs fi show /dev/md4 
> Label: none  uuid: 5be10dea-64c1-474e-b640-987b25af3c27
>   Total devices 1 FS bytes used 606.79GB
>   devid    1 size 16.36TB used 627.04GB path /dev/md4

Is this all the information you need? Is there a bug tracker I should report 
this to, to stop it getting lost in the mailing list archives?

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfsck: extent-tree.c:2549: btrfs_reserve_extent: Assertion `!(ret)' failed.

2013-01-03 Thread Richard Cooper

On 3 Jan 2013, at 15:06, Josef Bacik wrote:

> On Thu, Jan 03, 2013 at 05:26:38AM -0700, Richard Cooper wrote:
>> Hi All,
>> 
>> I'm trying to repair a broken fs using btrfsck and am hitting a failed 
>> assertion. I'd appreciate any suggestions for what to do next. Is there 
>> anything I can do to help fix this bug? Any other information from my FS 
>> which would help? If the FS could be salvaged that would be a bonus, but I'm 
>> more interested in providing a useful bug report before wiping the disk.
>> 
> 
> Well good news is that it's the allocator failing to find space for a new 
> block, and the allocator in btrfs-progs is under-tested, so it's likely just 
> an internal bug and something we can fix.  Can you do btrfs fi show /dev/md4 
> (not mounted) and post that so we can be sure there's actually enough space?

# ./btrfs fi show /dev/md4 
Label: none  uuid: 5be10dea-64c1-474e-b640-987b25af3c27
Total devices 1 FS bytes used 606.79GB
devid    1 size 16.36TB used 627.04GB path /dev/md4
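
The space question comes down to the difference between the device size and the allocated ("used") figure on the devid line. A rough sketch of that arithmetic follows; `unallocated_gb` is a hypothetical helper name, and it assumes the KB/MB/GB/TB suffixes shown here are 1024-based:

```shell
# Hypothetical helper: read `btrfs fi show` output on stdin and print, for each
# devid line, the device path and its unallocated space in GB.
# Assumption: size suffixes are 1024-based (1TB = 1024GB).
unallocated_gb() {
    awk '
    function to_gb(s,   n, u) {
        n = s + 0                      # numeric prefix, e.g. 16.36
        u = s; gsub(/[0-9.]/, "", u)   # unit suffix, e.g. TB
        if (u == "TB") return n * 1024
        if (u == "GB") return n
        if (u == "MB") return n / 1024
        if (u == "KB") return n / (1024 * 1024)
        return -1                      # unknown unit
    }
    /^[ \t]*devid/ {
        for (i = 1; i < NF; i++) {
            if ($i == "size") size = to_gb($(i + 1))
            if ($i == "used") used = to_gb($(i + 1))
            if ($i == "path") path = $(i + 1)
        }
        printf "%s %.2f\n", path, size - used
    }'
}

# Usage:
#   ./btrfs fi show /dev/md4 | unallocated_gb
```

Under that assumption, the figures above give roughly 16125.60 GB unallocated on /dev/md4.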

Thanks for looking at this.




btrfsck: extent-tree.c:2549: btrfs_reserve_extent: Assertion `!(ret)' failed.

2013-01-03 Thread Richard Cooper
Hi All,

I'm trying to repair a broken fs using btrfsck and am hitting a failed 
assertion. I'd appreciate any suggestions for what to do next. Is there 
anything I can do to help fix this bug? Any other information from my FS which 
would help? If the FS could be salvaged that would be a bonus, but I'm more 
interested in providing a useful bug report before wiping the disk.

My versions are:
- OS - CentOS 6.3
- Kernel - 3.7.1-2 from http://elrepo.org/tiki/kernel-ml 
- btrfs-progs - v0.20-rc1-37-g91d9eec. Built today from 
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

The failure is:

# ./btrfsck --repair /dev/md4 
enabling repair mode
checking extents
leaf parent key incorrect 183603200
bad block 183603200
leaf parent key incorrect 183640064
bad block 183640064
warning, start mismatch 152387469312 762175488
block 152387469312 rec extent_item_refs 1, passed 1
warning, start mismatch 449606139904 427217858560
block 427217858560 rec extent_item_refs 1, passed 1
ref mismatch on [32215040 4096] extent item 30, found 31
btrfsck: extent-tree.c:2549: btrfs_reserve_extent: Assertion `!(ret)' failed.
Aborted

If I run btrfsck in non-repair mode I get:

# ./btrfsck /dev/md4 
checking extents
leaf parent key incorrect 183603200
bad block 183603200
leaf parent key incorrect 183640064
bad block 183640064
warning, start mismatch 152387469312 762175488
checksum verify failed on 34390753280 wanted 26D779EB found 40
checksum verify failed on 34390753280 wanted 26D779EB found 40
block 152387469312 rec extent_item_refs 1, passed 1
warning, start mismatch 449606139904 427217858560
block 427217858560 rec extent_item_refs 1, passed 1
ref mismatch on [32215040 4096] extent item 30, found 31
Backref 32215040 parent 427255582720 root 427255582720 not found in extent tree
backpointer mismatch on [32215040 4096]

... [snipped several thousand lines similar to the previous three] ...

ref mismatch on [477808889856 4096] extent item 11, found 12
Backref 477808889856 parent 427202011136 root 427202011136 not found in extent tree
backpointer mismatch on [477808889856 4096]
Errors found in extent allocation tree
checking fs roots
root 256 inode 140337 errors 400
root 256 inode 169441 errors 400
root 256 inode 169442 errors 400
root 256 inode 1843202 errors 400
warning line 1789
found 651533594626 bytes used err is 1
total csum bytes: 624739028
total tree bytes: 11639873536
total fs tree bytes: 10394214400
btree space waste bytes: 2925577458
file data blocks allocated: 741854191616
 referenced 741832200192
Btrfs v0.20-rc1-37-g91d9eec



Re: Kernel panic from "btrfs subvolume delete"

2012-06-29 Thread Richard Cooper

On 29 Jun 2012, at 11:42, Fajar A. Nugraha wrote:
>> What should I do now? Do I need to upgrade to a more recent btrfs?
> 
> Yep
> 
>> If so, how?
> 
> https://blogs.oracle.com/linux/entry/oracle_unbreakable_enterprise_kernel_release
> http://elrepo.org/tiki/kernel-ml

Perfect, thank you! I was looking for a mainline kernel yum repo but my 
google-fu was failing me. That looks like just what I need.

I've installed kernel v3.4.4 from http://elrepo.org/tiki/kernel-ml and that 
seems to have fixed my kernel panic. I'm still using the default CentOS 6 
versions of the btrfs userspace programs (v0.19). Any reason why that might be 
a bad idea?

Thanks again,

- Rich


Kernel panic from "btrfs subvolume delete"

2012-06-29 Thread Richard Cooper
Hi All,

I have two machines where I've been testing various btrfs-based backup 
strategies. They are both CentOS 6 with the standard kernel and btrfs-progs 
RPMs from the CentOS repos.

- kernel-2.6.32-220.17.1.el6.x86_64
- btrfs-progs-0.19-12.el6.x86_64

Both are currently in a state where trying to delete a subvolume results in the 
following kernel panic.

--

[root@backup2 ~]# btrfs subvolume delete /srv/backup_history/2012-06-28-1342
Delete subvolume '/srv/backup_history/2012-06-28-1342'
[root@backup2 ~]# 
Message from syslogd@backup2 at Jun 29 08:53:06 ...
 kernel:[ cut here ]

Message from syslogd@backup2 at Jun 29 08:53:06 ...
 kernel:invalid opcode:  [#1] SMP 

Message from syslogd@backup2 at Jun 29 08:53:06 ...
 kernel:last sysfs file: /sys/devices/virtual/block/md1/md/metadata_version

Message from syslogd@backup2 at Jun 29 08:53:06 ...
 kernel:Stack:

Message from syslogd@backup2 at Jun 29 08:53:06 ...
 kernel:Call Trace:

Message from syslogd@backup2 at Jun 29 08:53:06 ...
 kernel:Code: 89 ef e8 84 f5 fe ff 48 89 df 89 45 d8 e8 99 86 fe ff 8b 45 d8 48 
8b 5d e0 4c 8b 65 e8 4c 8b 6d f0 4c 8b 75 f8 c9 c3 0f 0b eb fe <0f> 0b eb fe 0f 
0b 66 2e 0f 1f 84 00 00 00 00 00 eb f4 66 66 66 

Message from syslogd@backup2 at Jun 29 08:53:06 ...
 kernel:Kernel panic - not syncing: Fatal exception

--

Sometimes the kernel:last sysfs file line says "/sys/kernel/kexec_crash_loaded" 
instead.

My setup is that /srv is a btrfs filesystem sat on /dev/md4, which is a 4-drive 
software RAID5 array. /srv/backups/data is a subvolume containing 65GB worth of 
test data. I've "btrfs subvolume snapshot"ed that data to a few new subvolumes 
under /srv/backup_history/. Now whenever I try to delete any of the snapshots on 
either machine I get a kernel panic.
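
The sequence that leads to the panic can be written out as the two commands involved. The sketch below is a dry run that only prints them, since executing the delete is what triggers the panic on the affected machines; `repro_commands` is a hypothetical helper name, and the paths are the ones mentioned in this message:

```shell
# Dry run: print the commands rather than executing them.
repro_commands() {
    src=$1     # subvolume to snapshot, e.g. /srv/backups/data
    dest=$2    # snapshot path, e.g. /srv/backup_history/2012-06-28-1342
    echo "btrfs subvolume snapshot $src $dest"
    echo "btrfs subvolume delete $dest"
}

repro_commands /srv/backups/data /srv/backup_history/2012-06-28-1342
```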

btrfsck output looks like this:

[root@backup2 ~]# btrfsck /dev/md4
found 72254246912 bytes used err is 0
total csum bytes: 66815432
total tree bytes: 3835244544
total fs tree bytes: 358144
btree space waste bytes: 1187313778
file data blocks allocated: 68419002368
 referenced 68418383872
Btrfs Btrfs v0.19


What should I do now? Do I need to upgrade to a more recent btrfs? If so, how? 
Can I provide any more information to help debug and fix the problem?

Regards,

- Rich
