At 10/31/2016 11:04 PM, Marc MERLIN wrote:
> On Mon, Oct 31, 2016 at 08:44:12AM +0000, Hugo Mills wrote:
>>> Any idea of a special dm setup which could make us fail to read out
>>> some data range?
>> I've seen both btrfs check and btrfs dump-super give wrong answers
>> (particularly, some addresses end up larger than the device, for some
>> reason) when run on a mounted filesystem. Worth ruling that one out.
> I just finished running my scrub overnight, and it failed around 10%:
> [115500.316921] BTRFS error (device dm-0): bad tree block start 8461247125784585065 17619396231168
> [115500.332354] BTRFS error (device dm-0): bad tree block start 8461247125784585065 17619396231168
> [115500.332626] BTRFS: error (device dm-0) in __btrfs_free_extent:6954: errno=-5 IO failure
> [115500.332629] BTRFS info (device dm-0): forced readonly
> [115500.332632] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2960: errno=-5 IO failure
> [115500.436002] btrfs_printk: 550 callbacks suppressed
> [115500.436024] BTRFS warning (device dm-0): Skipping commit of aborted transaction.
> [115500.436029] BTRFS: error (device dm-0) in cleanup_transaction:1854: errno=-5 IO failure
> myth:~# ionice -c 3 nice -10 btrfs scrub start -Bd /mnt/mnt
> (...)
> scrub device /dev/mapper/crypt_bcache0 (id 1) canceled
> scrub started at Sun Oct 30 22:52:59 2016 and was aborted after 09:03:11
> total bytes scrubbed: 1.15TiB with 512 errors
> error details: csum=512
> corrected errors: 0, uncorrectable errors: 512, unverified errors: 0
> Am I correct that if I see "__btrfs_free_extent:6954: errno=-5 IO failure",
> it means that btrfs had physical read errors from the underlying block layer?
I'm not really sure whether these are physical read errors, since we throw
-EIO almost everywhere.
But it's possible that your extent tree got corrupted, so
__btrfs_free_extent() failed to modify the extent tree.
In that case, we do throw -EIO.
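One way to tell the two cases apart: physical read failures from the block
layer normally leave their own kernel messages (from the md/dm/SCSI layers)
alongside the btrfs ones. A minimal sketch of a filter for such signatures;
the function name and the pattern list are my own choices, not an exhaustive
set:

```shell
# Filter kernel log text for common block-layer error signatures.
# If none show up around the btrfs errors, the -EIO most likely came
# from btrfs's own sanity checks (e.g. a bad tree block), not the disk.
find_block_layer_errors() {
    grep -iE 'blk_update_request|i/o error|critical (medium|target) error' || true
}

# Usage: dmesg | find_block_layer_errors
```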
> Do I have some weird mismatch between the size of my md array and the size
> of my filesystem (given that dd apparently thinks parts of it are out of
> bounds)? Yet, the sizes seem to match:
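For what it's worth, the mdadm "Array Size" below is reported in 1 KiB units,
so the raw device size can be compared against the logical address in the
"bad tree block start" error. A quick sketch; note that btrfs logical
addresses live in the filesystem's own virtual address space, so a logical
address above the device size is not by itself proof of a mismatch:

```shell
# mdadm reports Array Size in 1 KiB blocks.
array_bytes=$((15627542528 * 1024))
logical=17619396231168   # logical address from the "bad tree block start" error
echo "device bytes: $array_bytes"
echo "bad logical:  $logical"
if [ "$logical" -gt "$array_bytes" ]; then
    echo "logical address exceeds raw device size (can be normal for btrfs)"
fi
```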
Could you try to locate the range where reads start to fail?
I still think the root problem is that we failed to read the device in user
space.
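As a starting point, here is a minimal sketch of such a user-space read scan,
sampling 1 MiB at every 1 GiB stride; the function name, stride, and sample
size are arbitrary choices, and the stride can be shrunk around a hit to
narrow down the failing range:

```shell
# Sample-read a device front to back, printing the first offset where
# a read fails. Stride is 1 GiB; each probe reads 1 MiB.
scan_for_bad_range() {
    dev=$1
    # blockdev works for block devices; fall back to stat for plain files.
    size=$(blockdev --getsize64 "$dev" 2>/dev/null || stat -c %s "$dev")
    step=$((1024 * 1024 * 1024))
    off=0
    while [ "$off" -lt "$size" ]; do
        if ! dd if="$dev" of=/dev/null bs=1M count=1 \
                skip=$((off / 1048576)) status=none 2>/dev/null; then
            echo "read failed around byte offset $off"
            return 1
        fi
        off=$((off + step))
    done
    echo "all sampled offsets readable"
}

# Usage: scan_for_bad_range /dev/mapper/crypt_bcache0
```

If a failing offset does turn up, btrfs-map-logical may help relate btrfs
logical addresses to positions on the device, though with dm-crypt and bcache
in between the mapping is indirect; I haven't verified that chain here.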
Thanks,
Qu
> myth:~# mdadm --query --detail /dev/md5
> /dev/md5:
> Version : 1.2
> Creation Time : Tue Jan 21 10:35:52 2014
> Raid Level : raid5
> Array Size : 15627542528 (14903.59 GiB 16002.60 GB)
> Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB)
> Raid Devices : 5
> Total Devices : 5
> Persistence : Superblock is persistent
> Intent Bitmap : Internal
> Update Time : Mon Oct 31 07:56:07 2016
> State : clean
> Active Devices : 5
> Working Devices : 5
> Failed Devices : 0
> Spare Devices : 0
> Layout : left-symmetric
> Chunk Size : 512K
> Name : gargamel.svh.merlins.org:5
> UUID : ec672af7:a66d9557:2f00d76c:38c9f705
> Events : 147992
> Number Major Minor RaidDevice State
> 0 8 97 0 active sync /dev/sdg1
> 6 8 113 1 active sync /dev/sdh1
> 2 8 81 2 active sync /dev/sdf1
> 3 8 65 3 active sync /dev/sde1
> 5 8 49 4 active sync /dev/sdd1
> myth:~# btrfs fi df /mnt/mnt
> Data, single: total=13.22TiB, used=13.19TiB
> System, DUP: total=32.00MiB, used=1.42MiB
> Metadata, DUP: total=75.00GiB, used=72.82GiB
> GlobalReserve, single: total=512.00MiB, used=6.73MiB
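As a rough cross-check of those figures against the array size: "single"
chunks consume their size once on the raw device, while "DUP" chunks consume
it twice. A back-of-the-envelope sketch using the rounded numbers printed
above (GlobalReserve is carved out of metadata, so it is not counted
separately):

```shell
# Sum raw-device space allocated by each chunk profile, in GiB.
awk 'BEGIN {
    raw_gib  = 13.22 * 1024      # Data, single: one copy (TiB -> GiB)
    raw_gib += 2 * 75.00         # Metadata, DUP: two copies
    raw_gib += 2 * 32.0 / 1024   # System, DUP: two 32 MiB copies
    printf "allocated raw: %.2f GiB of 14903.59 GiB array\n", raw_gib
}'
```

So the allocation fits comfortably inside the array; nothing in these numbers
suggests a size mismatch by itself.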
> Thanks,
> Marc
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html