Re: Unrecoverable scrub errors

2017-11-19 Thread Nazar Mokrynskyi
Looks like it is not going to resolve nicely.

After removing that problematic snapshot, the filesystem quickly becomes
read-only, like so:

> [23552.839055] BTRFS error (device dm-2): cleaner transaction attach returned 
> -30
> [23577.374390] BTRFS info (device dm-2): use lzo compression
> [23577.374391] BTRFS info (device dm-2): disk space caching is enabled
> [23577.374392] BTRFS info (device dm-2): has skinny extents
> [23577.506214] BTRFS info (device dm-2): bdev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0, 
> flush 0, corrupt 24, gen 0
> [23795.026390] BTRFS error (device dm-2): bad tree block start 0 470069510144
> [23795.148193] BTRFS error (device dm-2): bad tree block start 56 470069542912
> [23795.148424] BTRFS warning (device dm-2): dm-2 checksum verify failed on 
> 470069460992 wanted 54C49539 found FD171FBB level 0
> [23795.148526] BTRFS error (device dm-2): bad tree block start 0 470069493760
> [23795.150461] BTRFS error (device dm-2): bad tree block start 1459617832 
> 470069477376
> [23795.639781] BTRFS error (device dm-2): bad tree block start 0 470069510144
> [23795.655487] BTRFS error (device dm-2): bad tree block start 0 470069510144
> [23795.655496] BTRFS: error (device dm-2) in btrfs_drop_snapshot:9244: 
> errno=-5 IO failure
> [23795.655498] BTRFS info (device dm-2): forced readonly
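As I understand the kernel's tree-block read path, "bad tree block start X Y" means that the block read from disk records logical address X in its own header while address Y was expected. A value of 0 therefore suggests a zeroed or never-written block at that location. Below is a minimal C model of just that comparison. The header struct and function name are hypothetical simplifications; the real btrfs header also carries a checksum, fsid, owner, level and more.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, heavily simplified tree block header. */
struct tb_header {
    uint64_t bytenr;      /* logical address the block claims to live at */
    uint64_t generation;  /* transaction that last wrote the block */
};

/* Returns 0 if the block claims to be where it was read from. Otherwise
 * formats a message shaped like "bad tree block start <found> <expected>"
 * into msg and returns -1. */
int check_tree_block_start(const struct tb_header *hdr, uint64_t expected,
                           char *msg, size_t msglen)
{
    if (hdr->bytenr == expected)
        return 0;
    snprintf(msg, msglen, "bad tree block start %" PRIu64 " %" PRIu64,
             hdr->bytenr, expected);
    return -1;
}
```

A block of zeros read at 470069510144 would produce the first "bad tree block start 0 470069510144" line above.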
Check and repair don't help either:

> nazar-pc@nazar-pc ~> sudo btrfs check -p 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> Checking filesystem on 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> UUID: 82cfcb0f-0b80-4764-bed6-f529f2030ac5
> Extent back ref already exists for 797694840832 parent 330760175616 root 0 
> owner 0 offset 0 num_refs 1
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> Ignoring transid failure
> leaf parent key incorrect 470072098816
> bad block 470072098816
>
> ERROR: errors found in extent allocation tree or chunk allocation
> There is no free space entry for 797694844928-797694808064
> There is no free space entry for 797694844928-797819535360
> cache appears valid but isn't 796745793536
> There is no free space entry for 814739984384-814739988480
> There is no free space entry for 814739984384-814999404544
> cache appears valid but isn't 813925662720
> block group 894456299520 has wrong amount of free space
> failed to load free space cache for block group 894456299520
> block group 922910457856 has wrong amount of free space
> failed to load free space cache for block group 922910457856
>
> ERROR: errors found in free space cache
> found 963515335717 bytes used, error(s) found
> total csum bytes: 921699896
> total tree bytes: 20361920512
> total fs tree bytes: 17621073920
> total extent tree bytes: 1629323264
> btree space waste bytes: 3812167723
> file data blocks allocated: 21167059447808
>  referenced 2283091746816
>
> nazar-pc@nazar-pc ~> sudo btrfs check --repair -p 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> enabling repair mode
> Checking filesystem on 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> UUID: 82cfcb0f-0b80-4764-bed6-f529f2030ac5
> Extent back ref already exists for 797694840832 parent 330760175616 root 0 
> owner 0 offset 0 num_refs 1
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> Ignoring transid failure
> leaf parent key incorrect 470072098816
> bad block 470072098816
>
> ERROR: errors found in extent allocation tree or chunk allocation
> Fixed 0 roots.
> There is no free space entry for 797694844928-797694808064
> There is no free space entry for 797694844928-797819535360
> cache appears valid but isn't 796745793536
> There is no free space entry for 814739984384-814739988480
> There is no free space entry for 814739984384-814999404544
> cache appears valid but isn't 813925662720
> block group 894456299520 has wrong amount of free space
> failed to load free space cache for block group 894456299520
> block group 922910457856 has wrong amount of free space
> failed to load free space cache for block group 922910457856
>
> ERROR: errors found in free space cache
> found 963515335717 bytes used, error(s) found
> total csum bytes: 921699896
> total tree bytes: 20361920512
> total fs tree bytes: 17621073920
> total extent tree bytes: 1629323264
> btree space waste bytes: 3812167723
> file data blocks allocated: 21167059447808
>  referenced 2283091746816
Anything else I can try before starting from scratch?
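For reference, "parent transid verify failed on X wanted A found B" compares two values. The first is the generation (transaction id) that a parent node records for its child pointer. The second is the generation stored in the child block's own header. Here the parent expects transaction 1431, but the block at 470072098816 reports 307965. That typically suggests the block at that address was rewritten by a much later transaction while something still points at the old contents. The C sketch below models only that comparison; the struct and function names are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical simplified child pointer as stored in a parent node. */
struct child_ptr {
    uint64_t bytenr;        /* where the child block should be */
    uint64_t expected_gen;  /* generation recorded when the pointer was written */
};

/* Returns 0 if the child's header generation matches what the parent
 * expects; otherwise writes a message shaped like
 * "parent transid verify failed on <bytenr> wanted <gen> found <gen>"
 * into msg and returns -1. */
int verify_parent_transid(const struct child_ptr *ptr, uint64_t header_gen,
                          char *msg, size_t msglen)
{
    if (header_gen == ptr->expected_gen)
        return 0;
    snprintf(msg, msglen,
             "parent transid verify failed on %" PRIu64 " wanted %" PRIu64
             " found %" PRIu64,
             ptr->bytenr, ptr->expected_gen, header_gen);
    return -1;
}
```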

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc

19.11.17 07:30, Nazar 

uncorrectable errors in Raid 10

2017-11-19 Thread Steffen Sindzinski

Hello,

I ran a scrub on my Btrfs RAID 10 and got 2 uncorrectable errors.
In fact, I cannot access 2 directories, even as root: permission is
denied and all directory attributes in ls -la are .


Before that, I ran this filesystem as RAID 1 with 3 disks without any
problems for more than a year, scrubbing regularly. A month ago I added
a fourth HDD and balanced to RAID 10. I am not sure whether I ran a
scrub afterwards, but I usually do. Now the scrub found these errors.
The SMART status of the HDDs is healthy. The 2 directories have been
read-only for some years and were not even read in the last month.


What can I do now? Should I run btrfs rescue? On which device, given
that it is a RAID 10? My files are probably OK; it is only the
directories I cannot access. How can I recover the files?


Thanks in advance!

Steffen


Here is my data:


Linux bigbox 4.13.0-17-generic #20-Ubuntu SMP Mon Nov 6 10:04:08 UTC 
2017 x86_64 x86_64 x86_64 GNU/Linux


btrfs-progs v4.12

Label: 'Videos'  uuid: 4fafd0d4-7dd9-4dcc-9a33-5f1ad9555358
    Total devices 4 FS bytes used 1.44TiB
    devid    3 size 1.82TiB used 785.56GiB path /dev/sdc2
    devid    4 size 1.82TiB used 785.56GiB path /dev/sde2
    devid    5 size 1.82TiB used 785.56GiB path /dev/sdd2
    devid    6 size 1.36TiB used 785.56GiB path /dev/sdf1

Data, RAID10: total=1.51TiB, used=1.43TiB
System, RAID10: total=128.00MiB, used=192.00KiB
Metadata, RAID10: total=21.00GiB, used=17.76GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


 % sudo btrfs scrub start -Bd /
scrub device /dev/sdc2 (id 3) canceled
    scrub started at Sun Nov 19 08:43:21 2017 and was aborted after 
01:58:27

    total bytes scrubbed: 568.86GiB with 0 errors
scrub device /dev/sde2 (id 4) canceled
    scrub started at Sun Nov 19 08:43:21 2017 and was aborted after 
01:58:28

    total bytes scrubbed: 702.88GiB with 1 errors
    error details: verify=1
    corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
scrub device /dev/sdd2 (id 5) done
    scrub started at Sun Nov 19 08:43:21 2017 and finished after 01:48:00
    total bytes scrubbed: 737.74GiB with 1 errors
    error details: verify=1
    corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
scrub device /dev/sdf1 (id 6) canceled
    scrub started at Sun Nov 19 08:43:21 2017 and was aborted after 
01:58:28

    total bytes scrubbed: 506.02GiB with 0 errors




[    4.985712] BTRFS: device label Videos devid 6 transid 772463 /dev/sdf1
[    4.985882] BTRFS: device label Videos devid 4 transid 772463 /dev/sde2
[    4.986541] BTRFS: device label Videos devid 5 transid 772463 /dev/sdd2
[    4.986713] BTRFS: device label Videos devid 3 transid 772463 /dev/sdc2
[    5.007986] BTRFS info (device sdc2): disk space caching is enabled
[    5.007988] BTRFS info (device sdc2): has skinny extents
[    5.149910] BTRFS info (device sdc2): bdev /dev/sdd2 errs: wr 0, rd 
0, flush 0, corrupt 0, gen 3
[    5.149916] BTRFS info (device sdc2): bdev /dev/sde2 errs: wr 0, rd 
0, flush 0, corrupt 0, gen 3

[   25.482987] BTRFS info (device sdc2): use lzo compression
[   25.482990] BTRFS info (device sdc2): disk space caching is enabled
[57830.611730] BTRFS warning (device sdc2): checksum/header error at 
logical 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf 
(level 0) in tree 318
[57830.611732] BTRFS warning (device sdc2): checksum/header error at 
logical 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf 
(level 0) in tree 318
[57830.611734] BTRFS error (device sdc2): bdev /dev/sdd2 errs: wr 0, rd 
0, flush 0, corrupt 0, gen 4
[57832.688689] BTRFS error (device sdc2): unable to fixup (regular) 
error at logical 17478699876352 on dev /dev/sdd2
[57870.488081] BTRFS warning (device sdc2): checksum/header error at 
logical 17478699876352 on dev /dev/sde2, sector 348423360: metadata leaf 
(level 0) in tree 318
[57870.488083] BTRFS warning (device sdc2): checksum/header error at 
logical 17478699876352 on dev /dev/sde2, sector 348423360: metadata leaf 
(level 0) in tree 318
[57870.488085] BTRFS error (device sdc2): bdev /dev/sde2 errs: wr 0, rd 
0, flush 0, corrupt 0, gen 4
[57870.500114] BTRFS error (device sdc2): unable to fixup (regular) 
error at logical 17478699876352 on dev /dev/sde2
[88979.005712] BTRFS warning (device sdc2): checksum/header error at 
logical 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf 
(level 0) in tree 318
[88979.005718] BTRFS warning (device sdc2): checksum/header error at 
logical 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf 
(level 0) in tree 318
[88979.005720] BTRFS error (device sdc2): bdev /dev/sdd2 errs: wr 0, rd 
0, flush 0, corrupt 0, gen 5
[88979.036670] BTRFS error (device sdc2): unable to fixup (regular) 
error at logical 17478699876352 on dev /dev/sdd2
[89026.609561] BTRFS warning (device sdc2): checksum/header error at 
logical 17478699876352 on dev /dev/sde2, sector 348423360: metadata leaf 
(level 0) in tree 318
[89026.609563] BTRFS warning (device sdc2): 
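The warnings above name one logical address, 17478699876352, on two devices but at different sectors: 338986176 on /dev/sdd2 and 348423360 on /dev/sde2. These are the two RAID 10 copies of the same metadata leaf. Below is a minimal C sketch that pulls those fields out of a log line and converts the sector to a byte offset on the device. It assumes the sector is reported in 512-byte units, and the struct and function names are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fields from a scrub warning such as "checksum/header error at logical
 * 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf ..." */
struct scrub_hit {
    uint64_t logical;  /* btrfs logical address (same for every copy) */
    char dev[64];      /* device holding this copy */
    uint64_t sector;   /* position on that device, assumed 512-byte units */
};

/* Returns 0 on success, -1 if the line does not have the expected shape. */
int parse_scrub_hit(const char *line, struct scrub_hit *out)
{
    const char *p = strstr(line, "error at logical ");
    if (!p)
        return -1;
    if (sscanf(p, "error at logical %" SCNu64 " on dev %63[^,], sector %" SCNu64,
               &out->logical, out->dev, &out->sector) != 3)
        return -1;
    return 0;
}

/* Byte offset of a sector on its device, assuming 512-byte units. */
uint64_t sector_to_byte_offset(uint64_t sector)
{
    return sector * 512;
}
```

On the log above, this gives byte offsets 173560922112 on sdd2 and 178392760320 on sde2 for the same logical block.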


Re: Unrecoverable scrub errors

2017-11-19 Thread Roy Sigurd Karlsbakk
I guess not using RAID-0 would be a good start…

Kind regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
Carve the good in stone, write the bad in snow.

- Original Message -
> From: "Nazar Mokrynskyi" 
> To: "Chris Murphy" 
> Cc: "linux-btrfs" 
> Sent: Sunday, 19 November, 2017 12:17:36
> Subject: Re: Unrecoverable scrub errors


Re: Unrecoverable scrub errors

2017-11-19 Thread Nazar Mokrynskyi
This particular partition was initially created in July 2015. I've 
added/removed drives a few times when migrating from older to newer hardware, 
but never used RAID0 or any other RAID level beyond that.

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc

19.11.17 22:39, Roy Sigurd Karlsbakk wrote:
> I guess not using RAID-0 would be a good start…
>
> Kind regards
>
> roy
> --
> Roy Sigurd Karlsbakk
> (+47) 98013356
> http://blogg.karlsbakk.net/
> GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
> --
> Carve the good in stone, write the bad in snow.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bug? fstrim only trims unallocated space, not unused in bg's

2017-11-19 Thread Chris Murphy
On Sat, Nov 18, 2017 at 11:27 PM, Andrei Borzenkov  wrote:
> 19.11.2017 09:17, Chris Murphy wrote:
>> fstrim should trim free space, but it only trims unallocated. This is
>> with kernel 4.14.0 and the entire 4.13 series. I'm pretty sure it
>> behaved this way with 4.12 also.
>>
>
> Well, I was told it should also trim free space ...
>
> https://www.spinics.net/lists/linux-btrfs/msg61819.html
>

It definitely isn't. If I do a partial balance, then fstrim, I get a
larger trimmed value, corresponding exactly to unallocated space.
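fstrim(8) does its work through the FITRIM ioctl. It passes a struct fstrim_range (start, len, minlen), and the filesystem writes the number of bytes it trimmed back into len. That value is the figure fstrim -v prints; compare the `range->len = trimmed;` line in the diff later in this thread. Below is a minimal C sketch of that call, assuming Linux with <linux/fs.h>. The wrapper name trim_mount is hypothetical, and running it on a real mount point needs root.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FITRIM and struct fstrim_range */

/* Ask the filesystem containing `path` to discard its free space, the way
 * fstrim does. Returns 0 and stores the trimmed byte count in *trimmed on
 * success, or -errno on failure (leaving *trimmed untouched). */
int trim_mount(const char *path, uint64_t minlen, uint64_t *trimmed)
{
    struct fstrim_range range = {
        .start = 0,
        .len = UINT64_MAX,   /* the whole filesystem */
        .minlen = minlen,    /* skip free extents shorter than this */
    };
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    if (ioctl(fd, FITRIM, &range) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }
    close(fd);
    *trimmed = range.len;    /* on return, len holds the bytes trimmed */
    return 0;
}
```

Called on / as root, the value stored in *trimmed should match the "(41841328128 bytes)" figure that fstrim -v reports in this thread.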



-- 
Chris Murphy


Re: uncorrectable errors in Raid 10

2017-11-19 Thread Chris Murphy
On Sun, Nov 19, 2017 at 12:31 PM, Steffen Sindzinski  wrote:

> [57830.611730] BTRFS warning (device sdc2): checksum/header error at logical
> 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf (level 0)
> in tree 318
> [57830.611732] BTRFS warning (device sdc2): checksum/header error at logical
> 17478699876352 on dev /dev/sdd2, sector 338986176: metadata leaf (level 0)
> in tree 318

The same leaf, at the same logical address, is corrupt on two devices.
I'm guessing the checksum was computed incorrectly and written twice,
affecting both copies. I doubt it's a device problem. It might be
useful to search the archives specifically for "checksum/header
error"; it's kind of interesting that Btrfs knows the problem is
specifically there.
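With RAID 10 metadata there are two copies of each block, and a scrub can only rewrite a bad copy from one that still verifies. That is why two bad copies show up as "uncorrectable". This also matches the one uncorrectable error reported on each of sde2 and sdd2 in the scrub output above. Below is a simplified C model of that decision. It is not the kernel's scrub code, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum repair_result { ALL_GOOD, CORRECTED, UNCORRECTABLE };

/* copy_ok[i] says whether mirror i passed its checksum/header checks.
 * If at least one mirror is good, every bad one can be rewritten from it;
 * *rewritten reports how many copies that would be. */
enum repair_result scrub_block(const bool *copy_ok, size_t ncopies,
                               size_t *rewritten)
{
    size_t good = 0;

    *rewritten = 0;
    for (size_t i = 0; i < ncopies; i++)
        if (copy_ok[i])
            good++;

    if (good == ncopies)
        return ALL_GOOD;
    if (good == 0)
        return UNCORRECTABLE;     /* nothing trustworthy to copy from */
    *rewritten = ncopies - good;  /* each bad copy gets a good copy's data */
    return CORRECTED;
}
```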

What do you get for:

btrfs-debug-tree -b 17478699876352 /dev/sdd2

I think the problem is isolated, but you're probably best off
freshening up your backups now while you can. Yes, you can use btrfs
restore to get data off the volume if you don't already have backups.
As for btrfs check --repair, only run it once you have backups and
you're prepared to lose the file system.



-- 
Chris Murphy


Re: bug? fstrim only trims unallocated space, not unused in bg's

2017-11-19 Thread Qu Wenruo


On 2017-11-19 14:17, Chris Murphy wrote:
> fstrim should trim free space, but it only trims unallocated. This is
> with kernel 4.14.0 and the entire 4.13 series. I'm pretty sure it
> behaved this way with 4.12 also.

Tested with 4.14-rc7, can't reproduce it.
--
# btrfs fi us /mnt/btrfs/
Overall:
Device size:   1.00GiB
Device allocated:566.38MiB
Device unallocated:  457.62MiB
Device missing:  0.00B
Used:256.81MiB
Free (estimated):649.62MiB  (min: 420.81MiB)
Data ratio:   1.00
Metadata ratio:   2.00
Global reserve:   16.00MiB  (used: 0.00B)

Data,single: Size:448.00MiB, Used:256.00MiB
   /dev/loop0  448.00MiB

Metadata,DUP: Size:51.19MiB, Used:400.00KiB
   /dev/loop0  102.38MiB

System,DUP: Size:8.00MiB, Used:16.00KiB
   /dev/loop0   16.00MiB

Unallocated:
   /dev/loop0  457.62MiB

# fstrim  /mnt/btrfs -v
/mnt/btrfs: 665.3 MiB (697597952 bytes) trimmed
--


Any special mount options or setup?
(BTW, I also tried space_cache=v2 and default v1, no obvious difference)

Thanks,
Qu

> 
> 
> [root@f27h ~]# fstrim -v /
> /: 39 GiB (41841328128 bytes) trimmed
> [root@f27h ~]# btrfs fi us /
> Overall:
> Device size:  70.00GiB
> Device allocated:  31.03GiB
> Device unallocated:  38.97GiB
> Device missing: 0.00B
> Used:  22.02GiB
> Free (estimated):  47.72GiB      (min: 47.72GiB)
> Data ratio:  1.00
> Metadata ratio:  1.00
> Global reserve:  65.97MiB      (used: 0.00B)
> 
> Data,single: Size:30.00GiB, Used:21.25GiB
>/dev/nvme0n1p8  30.00GiB
> 
> Metadata,single: Size:1.00GiB, Used:791.58MiB
>/dev/nvme0n1p8   1.00GiB
> 
> System,single: Size:32.00MiB, Used:16.00KiB
>/dev/nvme0n1p8  32.00MiB
> 
> Unallocated:
>/dev/nvme0n1p8  38.97GiB
> 
> 





Re: bug? fstrim only trims unallocated space, not unused in bg's

2017-11-19 Thread Chris Murphy
On Sun, Nov 19, 2017 at 7:13 PM, Qu Wenruo  wrote:
>
>
> On 2017-11-19 14:17, Chris Murphy wrote:
>> fstrim should trim free space, but it only trims unallocated. This is
>> with kernel 4.14.0 and the entire 4.13 series. I'm pretty sure it
>> behaved this way with 4.12 also.
>
> Tested with 4.14-rc7, can't reproduce it.

$ sudo btrfs fi us /
Overall:
Device size:  70.00GiB
Device allocated:  31.03GiB
Device unallocated:  38.97GiB
Device missing: 0.00B
Used:  22.12GiB
Free (estimated):  47.62GiB      (min: 47.62GiB)
...snip...

$ sudo fstrim -v /
/: 39 GiB (41841328128 bytes) trimmed

Then I run btrfs-debugfs -b / and find the least-used block group, at 8% usage:

block group offset   174202028032 len 1073741824 used   89206784
chunk_objectid 256 flags 1 usage 0.08

And balance that block group:

$ sudo btrfs balance start -dvrange=174202028032..174202028033 -dlimit=1 /
Done, had to relocate 1 out of 32 chunks

And trim again:

/: 39 GiB (41841328128 bytes) trimmed


> Any special mount options or setup?
> (BTW, I also tried space_cache=v2 and default v1, no obvious difference)


/dev/nvme0n1p8 on / type btrfs
(rw,relatime,seclabel,ssd,space_cache,subvolid=333,subvol=/root27)


Would a strace of fstrim help?


-- 
Chris Murphy


Re: bug? fstrim only trims unallocated space, not unused in bg's

2017-11-19 Thread Chris Murphy
On Sun, Nov 19, 2017 at 7:24 PM, Chris Murphy  wrote:


> $ sudo fstrim -v /
> /: 39 GiB (41841328128 bytes) trimmed

> And trim again:
>
> /: 39 GiB (41841328128 bytes) trimmed

Cute. The balance command claimed it balanced a chunk, but it didn't.
btrfs-debugfs -b says that same 8% chunk is present...

block group offset   175275769856 len 1073741824 used   89206784
chunk_objectid 256 flags 1 usage 0.08
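The usage column is simply used divided by len: 89206784 / 1073741824 is about 0.083, which is printed as 0.08. The offset in this listing (175275769856) is exactly one block-group length (1073741824 bytes) above the 174202028032 reported before the vrange balance. This output alone does not show whether that is the same data at a new address. Below is a small C sketch of both calculations; the function names are hypothetical.

```c
#include <stdint.h>

/* "usage" as printed next to each block group: used bytes / length. */
double bg_usage(uint64_t used, uint64_t len)
{
    return len ? (double)used / (double)len : 0.0;
}

/* Number of block-group lengths separating two offsets, or -1 if the
 * gap is not an exact multiple of len. */
long long offset_gap_in_groups(uint64_t a, uint64_t b, uint64_t len)
{
    uint64_t gap = a > b ? a - b : b - a;

    if (len == 0 || gap % len != 0)
        return -1;
    return (long long)(gap / len);
}
```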

Fine. I'll use -dusage instead.

$ sudo btrfs balance start -dusage=11 /
Done, had to relocate 2 out of 32 chunks
$ sudo fstrim -v /
/: 40 GiB (42915069952 bytes) trimmed
$ sudo btrfs balance start -dusage=21 /
Done, had to relocate 2 out of 31 chunks
$ sudo fstrim -v /
/: 41 GiB (43988811776 bytes) trimmed


OK, so a different bug is that it claims to balance two chunks but
really only balances one. That same 8%-used block group was not
rewritten; it's at the same address, so for whatever reason that tiny
one is pinned.
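The trimmed totals in this sequence rise by exactly 1073741824 bytes (1 GiB) per balance: 41841328128, then 42915069952, then 43988811776. That fits one 1 GiB data chunk per run being returned to unallocated space. The first total (41841328128 bytes, about 38.97 GiB) also matches the 38.97GiB "Device unallocated" figure. The sketch below reproduces that arithmetic. It also models the usage balance filter as selecting block groups whose used bytes are below N percent of their length. That reading of -dusage=N is my assumption, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define GIB (1024ULL * 1024 * 1024)

/* Assumed model of "-dusage=N": a data block group is selected when its
 * used bytes fall below N percent of its length. */
bool usage_filter_selects(uint64_t used, uint64_t len, unsigned pct)
{
    return used < len * pct / 100;
}

/* Bytes to GiB, for comparing fstrim totals with "btrfs fi us" output. */
double to_gib(uint64_t bytes)
{
    return (double)bytes / (double)GIB;
}
```

Under this model, the 8%-used group (89206784 of 1073741824 bytes) would be picked by -dusage=9 and above, including the -dusage=11 run, but not by -dusage=8.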



-- 
Chris Murphy


Re: bug? fstrim only trims unallocated space, not unused in bg's

2017-11-19 Thread Qu Wenruo


On 2017-11-20 10:24, Chris Murphy wrote:
> On Sun, Nov 19, 2017 at 7:13 PM, Qu Wenruo  wrote:
>>
>>
>> On 2017-11-19 14:17, Chris Murphy wrote:
>>> fstrim should trim free space, but it only trims unallocated. This is
>>> with kernel 4.14.0 and the entire 4.13 series. I'm pretty sure it
>>> behaved this way with 4.12 also.
>>
>> Tested with 4.14-rc7, can't reproduce it.
> 
> $ sudo btrfs fi us /
> Overall:
> Device size:  70.00GiB
> Device allocated:  31.03GiB
> Device unallocated:  38.97GiB
> Device missing: 0.00B
> Used:  22.12GiB
> Free (estimated):  47.62GiB      (min: 47.62GiB)
> ...snip...
> 
> $ sudo fstrim -v /
> /: 39 GiB (41841328128 bytes) trimmed
> 
> Then I run btrfs-debugfs -b / and find the least-used block group, at 8% usage:
> 
> block group offset   174202028032 len 1073741824 used   89206784
> chunk_objectid 256 flags 1 usage 0.08
> 
> And balance that block group:
> 
> $ sudo btrfs balance start -dvrange=174202028032..174202028033 -dlimit=1 /
> Done, had to relocate 1 out of 32 chunks
> 
> And trim again:
> 
> /: 39 GiB (41841328128 bytes) trimmed
> 
> 
>> Any special mount options or setup?
>> (BTW, I also tried space_cache=v2 and default v1, no obvious difference)
> 
> 
> /dev/nvme0n1p8 on / type btrfs
> (rw,relatime,seclabel,ssd,space_cache,subvolid=333,subvol=/root27)

Nothing special at all.

Unfortunately, there is no tracepoint inside btrfs_trim_block_group() at all.

But a quick glance shows that the loop iterating over existing block
groups to trim the free space inside them has a return-value overwrite
bug, so only unallocated space gets trimmed.

Would you please try this diff to get the return value?

--
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 309a109069f1..dbec05dc8810 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10983,12 +10983,12 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
ret = cache_block_group(cache, 0);
if (ret) {
btrfs_put_block_group(cache);
-   break;
+   goto out;
}
ret = wait_block_group_cache_done(cache);
if (ret) {
btrfs_put_block_group(cache);
-   break;
+   goto out;
}
}
ret = btrfs_trim_block_group(cache,
@@ -11000,7 +11000,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
trimmed += group_trimmed;
if (ret) {
btrfs_put_block_group(cache);
-   break;
+   goto out;
}
}

@@ -11019,6 +11019,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
}
mutex_unlock(&fs_info->fs_devices->device_list_mutex);

+out:
range->len = trimmed;
return ret;
 }
--
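Below is a stripped-down C model of the control flow Qu describes. It assumes, as the diff context suggests, that btrfs_trim_fs() first loops over block groups and then over devices, with each loop breaking on error. A block-group failure skips the remaining groups, the device loop then overwrites ret, and the caller sees success. The per-step return codes are hypothetical inputs, not kernel calls.

```c
#include <stddef.h>

/* bg_rets[i] / dev_rets[i]: hypothetical results of trimming block group i
 * or device i (0 on success, negative errno on failure). */

/* Before the fix: a block-group error only ends the first loop. */
int trim_fs_buggy(const int *bg_rets, size_t nbg,
                  const int *dev_rets, size_t ndev)
{
    int ret = 0;

    for (size_t i = 0; i < nbg; i++) {
        ret = bg_rets[i];
        if (ret)
            break;          /* remaining block groups are skipped */
    }
    for (size_t i = 0; i < ndev; i++) {
        ret = dev_rets[i];  /* overwrites any block-group error */
        if (ret)
            break;
    }
    return ret;
}

/* With the proposed diff: jump past device trimming and return the error. */
int trim_fs_goto_out(const int *bg_rets, size_t nbg,
                     const int *dev_rets, size_t ndev)
{
    int ret = 0;

    for (size_t i = 0; i < nbg; i++) {
        ret = bg_rets[i];
        if (ret)
            goto out;
    }
    for (size_t i = 0; i < ndev; i++) {
        ret = dev_rets[i];
        if (ret)
            break;
    }
out:
    return ret;
}
```

If the first block group fails with -5 and device trimming succeeds, trim_fs_buggy() returns 0. fstrim would then report success with only unallocated space trimmed.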

Thanks,
Qu

> 
> 
> Would a strace of fstrim help?
> 
> 





[PATCH] btrfs: Enhance btrfs_trim_fs function to handle error better

2017-11-19 Thread Qu Wenruo
Function btrfs_trim_fs() doesn't handle errors consistently: if an
error happens while trimming existing block groups, it skips the
remaining block groups and continues to trim unallocated space on each
device.

The return value then only reflects the final error from device
trimming.

This patch fixes that behavior by:

1) Recording the first error from block group or device trimming
   so the return value reflects any error found while trimming,
   making developers more aware of the problem.

2) Outputting a btrfs warning message for each trimming failure
   Any error while trimming a block group or device triggers a btrfs
   warning in the kernel log.

3) Continuing to trim where possible
   If we fail to trim one block group or device, we still try the
   next block group or device.

This avoids confusion in cases such as failing to trim the first block
group, after which only unallocated space gets trimmed.
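The error policy in points 1) to 3) can be modelled in a few lines of C. This sketch takes hypothetical per-step return codes as inputs and uses fprintf to stand in for btrfs_warn_rl(). It returns the block-group error before the device error. That ordering is an assumption based on the description that the first error is returned and on the patch comment that block groups are trimmed before devices.

```c
#include <stddef.h>
#include <stdio.h>

/* Warn on every failure, keep trimming the remaining block groups and
 * devices, and return the first error seen (0 if none). *attempted counts
 * how many trim steps were tried. */
int trim_fs_first_error(const int *bg_rets, size_t nbg,
                        const int *dev_rets, size_t ndev,
                        size_t *attempted)
{
    int bg_ret = 0;
    int dev_ret = 0;

    *attempted = 0;
    for (size_t i = 0; i < nbg; i++) {
        (*attempted)++;
        if (bg_rets[i]) {
            fprintf(stderr, "warning: failed to trim block group %zu ret %d\n",
                    i, bg_rets[i]);
            if (!bg_ret)
                bg_ret = bg_rets[i];   /* keep only the first error */
        }
    }
    for (size_t i = 0; i < ndev; i++) {
        (*attempted)++;
        if (dev_rets[i]) {
            fprintf(stderr, "warning: failed to trim device %zu ret %d\n",
                    i, dev_rets[i]);
            if (!dev_ret)
                dev_ret = dev_rets[i];
        }
    }
    /* Block groups are trimmed first, so bg_ret is the earliest error. */
    return bg_ret ? bg_ret : dev_ret;
}
```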

Reported-by: Chris Murphy 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/extent-tree.c | 59 --
 1 file changed, 43 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 309a109069f1..46d65ffb3bd1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10948,6 +10948,16 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
return ret;
 }
 
+/*
+ * Trim the whole fs, by:
+ * 1) Trimming free space in each block group
+ * 2) Trimming unallocated space in each device
+ *
+ * Will try to continue trimming even if we failed to trim one block group or
+ * device.
+ * The return value will be the error return value of the first error.
+ * Or 0 if nothing wrong happened.
+ */
 int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
 {
struct btrfs_block_group_cache *cache = NULL;
@@ -10958,6 +10968,8 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
u64 end;
u64 trimmed = 0;
u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
+   int bg_ret = 0;
+   int dev_ret = 0;
int ret = 0;
 
/*
@@ -10968,7 +10980,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
else
cache = btrfs_lookup_block_group(fs_info, range->start);
 
-   while (cache) {
+   for (; cache; cache = next_block_group(fs_info, cache)) {
if (cache->key.objectid >= (range->start + range->len)) {
btrfs_put_block_group(cache);
break;
@@ -10982,29 +10994,36 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
if (!block_group_cache_done(cache)) {
ret = cache_block_group(cache, 0);
if (ret) {
-   btrfs_put_block_group(cache);
-   break;
+   btrfs_warn_rl(fs_info,
+   "failed to cache block group %llu ret %d",
+  cache->key.objectid, ret);
+   if (!bg_ret)
+   bg_ret = ret;
+   continue;
}
ret = wait_block_group_cache_done(cache);
if (ret) {
-   btrfs_put_block_group(cache);
-   break;
+   btrfs_warn_rl(fs_info,
+   "failed to wait cache for block group %llu ret %d",
+  cache->key.objectid, ret);
+   if (!bg_ret)
+   bg_ret = ret;
+   continue;
}
}
-   ret = btrfs_trim_block_group(cache,
-&group_trimmed,
-start,
-end,
-range->minlen);
+   ret = btrfs_trim_block_group(cache, &group_trimmed,
+   start, end, range->minlen);
 
trimmed += group_trimmed;
if (ret) {
-   btrfs_put_block_group(cache);
-   break;
+   btrfs_warn_rl(fs_info,
+   "failed to trim block group %llu ret %d",
+  cache->key.objectid, ret);
+   if (!bg_ret)
+   bg_ret = ret;
+   

Issues while doing btrfs delete missing in raid6

2017-11-19 Thread Jérôme Carretero
Hi,


While doing a test (to evaluate drives), where I'm filling a bunch of
drives in RAID6, one of the disks failed in the process.
(System with v4.14 / ECC).
I remounted the array in degraded, launched a "btrfs delete missing"
as I have no replacement device.

The command (takes ages and) fails with:
 ERROR: error removing device 'missing': Input/output error

and klog says:

 [631517.263313] BTRFS info (device dm-18): relocating block group 1411883335680 flags data|raid6
 [631547.556527] btrfs_print_data_csum_error: 151 callbacks suppressed
 [631547.556530] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559653376 csum 0x2e827bb4 expected csum 0xda9c34d6 mirror 2
 [631547.562727] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559657472 csum 0x6722cd32 expected csum 0x3ca2ce6f mirror 2
 [631547.562730] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559661568 csum 0x90368636 expected csum 0xf55a0410 mirror 2
 [631547.562732] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559665664 csum 0x3e38aeb2 expected csum 0x6c80a970 mirror 2
 [631547.562746] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559669760 csum 0x77d73f2d expected csum 0xe62cfbe8 mirror 2
 [631547.562747] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559673856 csum 0xb03d1632 expected csum 0xe9a3f0e6 mirror 2
 [631547.562756] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559677952 csum 0xeea04377 expected csum 0x8819aaf7 mirror 2
 [631547.562758] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559682048 csum 0xe46ab546 expected csum 0xacc16686 mirror 2
 [631547.562775] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559690240 csum 0x956a74d7 expected csum 0x99e29858 mirror 2
 [631547.562788] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559686144 csum 0xb09a35ae expected csum 0x5f61fa99 mirror 2

Since this is RAID6, I was expecting to be able to recover from a
checksum issue. Also, it's not very practical to bail out on the first
error of this kind during a delete... the offending blocks could be
left as is.

I then tried to work around the issue by removing the offending file
(yes, it's a test, but filling the drives takes a lot of time), but
looking it up with "btrfs inspect-internal inode-resolve 1177" somehow failed with:
 ERROR: ino paths ioctl: No such file or directory


Regards,

-- 
Jérôme
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Issues while doing btrfs delete missing in raid6

2017-11-19 Thread Jérôme Carretero
On Mon, 20 Nov 2017 01:43:44 -0500
Jérôme Carretero  wrote:

> Hi,
> 
> 
> While doing a test (to evaluate drives), where I'm filling a bunch of
> drives in RAID6, one of the disks failed in the process.
> (System with v4.14 / ECC).
> I remounted the array in degraded, launched a "btrfs delete missing"
> as I have no replacement device.
> 
> The command (takes ages and) fails with:
>  ERROR: error removing device 'missing': Input/output error

> Since this is RAID6, I wasn't expecting to not be able to recover
> from a checksum issue, also it's not very practical to bail out on
> the first error of this kind during a delete... the offending blocks
> could be left as is.

While doing a "tar c /mnt/test | pv >/dev/null" I see the csum errors,
but they then get corrected.
I guess I'll try a scrub and see. But there's probably a bug if
delete/replace/balance can't do the same.


Regards,

-- 
Jérôme



WARNING: CPU: 3 PID: 20953 at /usr/src/linux/fs/btrfs/raid56.c:848 __free_raid_bio+0x8e/0xa0

2017-11-19 Thread Jérôme Carretero
Hi,



This was while doing a "userspace scrub" with "tar c":

[633250.707455] btrfs_print_data_csum_error: 14608 callbacks suppressed
[633250.707459] BTRFS warning (device dm-18): csum failed root 5 ino 1376 off 3530293248 csum 0xb8c194fb expected csum 0xb3680c88 mirror 2
[633250.707465] BTRFS warning (device dm-18): csum failed root 5 ino 1376 off 3530293248 csum 0x7f422a5d expected csum 0xb3680c88 mirror 2
[633250.707470] BTRFS warning (device dm-18): csum failed root 5 ino 1376 off 3530293248 csum 0xa5db59eb expected csum 0xb3680c88 mirror 2
[633250.707473] BTRFS warning (device dm-18): csum failed root 5 ino 1376 off 3530293248 csum 0x5d244234 expected csum 0xb3680c88 mirror 2
[633250.707475] BTRFS warning (device dm-18): csum failed root 5 ino 1376 off 3530293248 csum 0x7f422a5d expected csum 0xb3680c88 mirror 2
[633250.707478] BTRFS warning (device dm-18): csum failed root 5 ino 1376 off 3530301440 csum 0xc0a71540 expected csum 0x904f75bc mirror 2
[633250.707480] BTRFS warning (device dm-18): csum failed root 5 ino 1376 off 3530293248 csum 0x7f422a5d expected csum 0xb3680c88 mirror 2
[633250.707483] BTRFS warning (device dm-18): csum failed root 5 ino 1376 off 3530293248 csum 0x7f422a5d expected csum 0xb3680c88 mirror 2
[633250.707484] BTRFS warning (device dm-18): csum failed root 5 ino 1376 off 3530301440 csum 0x0abd2cac expected csum 0x904f75bc mirror 2
[633250.707488] BTRFS warning (device dm-18): csum failed root 5 ino 1376 off 3530301440 csum 0x0d046c34 expected csum 0x904f75bc mirror 2
[633250.888501] BTRFS info (device dm-18): read error corrected: ino 1376 off 230948864 (dev /dev/mapper/I8U2-4 sector 1373688)
[633250.937373] BTRFS info (device dm-18): read error corrected: ino 1376 off 230952960 (dev /dev/mapper/I8U2-4 sector 1373688)
[633250.949808] BTRFS info (device dm-18): read error corrected: ino 1376 off 230957056 (dev /dev/mapper/I8U2-4 sector 1373688)
[633250.961703] BTRFS info (device dm-18): read error corrected: ino 1376 off 230961152 (dev /dev/mapper/I8U2-4 sector 1373688)
[633250.973827] BTRFS info (device dm-18): read error corrected: ino 1376 off 230965248 (dev /dev/mapper/I8U2-4 sector 1373688)
[633250.986271] BTRFS info (device dm-18): read error corrected: ino 1376 off 230969344 (dev /dev/mapper/I8U2-4 sector 1373688)
[633250.998517] BTRFS info (device dm-18): read error corrected: ino 1376 off 230973440 (dev /dev/mapper/I8U2-4 sector 1373688)
[633251.010537] BTRFS info (device dm-18): read error corrected: ino 1376 off 230977536 (dev /dev/mapper/I8U2-4 sector 1373688)
[633251.022767] BTRFS info (device dm-18): read error corrected: ino 1376 off 230981632 (dev /dev/mapper/I8U2-4 sector 1373688)
[633251.034990] BTRFS info (device dm-18): read error corrected: ino 1376 off 230985728 (dev /dev/mapper/I8U2-4 sector 1373688)
[633254.456570] [ cut here ]
[633254.461294] WARNING: CPU: 3 PID: 20953 at /usr/src/linux/fs/btrfs/raid56.c:848 __free_raid_bio+0x8e/0xa0
[633254.470863] Modules linked in: bfq twofish_avx_x86_64 twofish_x86_64_3way xts twofish_x86_64 twofish_common serpent_avx_x86_64 serpent_generic lrw gf128mul ablk_helper algif_skcipher af_alg nfnetlink_queue nfnetlink_log nfnetlink cfg80211 rfkill usbmon fuse usb_storage dm_crypt dm_mod dax coretemp hwmon intel_rapl snd_hda_codec_realtek x86_pkg_temp_thermal snd_hda_codec_generic iTCO_wdt kvm_intel iTCO_vendor_support snd_hda_intel kvm snd_hda_codec irqbypass snd_hwdep aesni_intel snd_hda_core aes_x86_64 snd_pcm xhci_pci snd_timer ehci_pci crypto_simd xhci_hcd cryptd ehci_hcd sdhci_pci glue_helper pcspkr snd usbcore sdhci soundcore lpc_ich mmc_core usb_common mfd_core bnx2 bonding autofs4 [last unloaded: i2c_dev]
[633254.533987] CPU: 3 PID: 20953 Comm: kworker/u16:18 Tainted: GW   4.14.0-Vantage-dirty #14
[633254.543298] Hardware name: LENOVO 056851U/LENOVO, BIOS A0KT56AUS 02/01/2016
[633254.550365] Workqueue: btrfs-endio btrfs_endio_helper
[633254.08] task: 880859523b00 task.stack: c90006164000
[633254.561528] RIP: 0010:__free_raid_bio+0x8e/0xa0
[633254.566143] RSP: 0018:c90006167bc8 EFLAGS: 00010282
[633254.571457] RAX: 88052540d010 RBX: 8801ffd02800 RCX: 0001
[633254.578683] RDX: 88052540d010 RSI: 0246 RDI: 88052540d000
[633254.585912] RBP: 88052540d000 R08:  R09: 0001
[633254.593131] R10: 8805ad3a2e60 R11: 0006 R12: 000a
[633254.600378] R13: 0004 R14: 0001 R15: 880537d1c000
[633254.607604] FS:  () GS:88087fcc() knlGS:
[633254.615804] CS:  0010 DS:  ES:  CR0: 80050033
[633254.621635] CR2: 7f13db554000 CR3: 01e09002 CR4: 000606e0
[633254.628870] Call Trace:
[633254.631421]  rbio_orig_end_io+0x42/0x80
[633254.635352]  __raid56_parity_recover+0x17a/0x1f0
[633254.640078]  raid56_parity_recover+0x193/0x1d0
[633254.644623]  b

Re: WARNING: CPU: 3 PID: 20953 at /usr/src/linux/fs/btrfs/raid56.c:848 __free_raid_bio+0x8e/0xa0

2017-11-19 Thread Jérôme Carretero
On Mon, 20 Nov 2017 02:00:07 -0500
Jérôme Carretero  wrote:

> [ cut here ]

Note that the filesystem now refuses to be unmounted.


Regards,

-- 
Jérôme


Re: Issues while doing btrfs delete missing in raid6

2017-11-19 Thread Qu Wenruo


On 2017-11-20 14:43, Jérôme Carretero wrote:
> Hi,
> 
> 
> While doing a test (to evaluate drives), where I'm filling a bunch of
> drives in RAID6, one of the disks failed in the process.
> (System with v4.14 / ECC).
> I remounted the array in degraded, launched a "btrfs delete missing"
> as I have no replacement device.
> 
> The command (takes ages and) fails with:
>  ERROR: error removing device 'missing': Input/output error
> 
> and klog says:
> 
>  [631517.263313] BTRFS info (device dm-18): relocating block group 1411883335680 flags data|raid6
>  [631547.556527] btrfs_print_data_csum_error: 151 callbacks suppressed
>  [631547.556530] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559653376 csum 0x2e827bb4 expected csum 0xda9c34d6 mirror 2

Root -9 means it's the data reloc tree, so its ino number is not a real
inode number.

To delete the file, you need to convert the offset into a bytenr, then
find the owner.
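That conversion can be sketched as follows. It rests on an assumption worth checking against fs/btrfs/relocation.c for your kernel: while a block group is being relocated, the data reloc inode places each extent at file offset (logical bytenr - block group start). Under that assumption, the block group from the preceding "relocating block group" line is the base. The helper names are hypothetical. `btrfs inspect-internal logical-resolve` is the existing btrfs-progs command for mapping a logical address to file paths.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Assumption: during relocation the data reloc inode (root -9) stores each
 * extent at file offset (logical bytenr - start of the block group being
 * relocated). Hypothetical helper, not a btrfs-progs API.
 */
static uint64_t reloc_off_to_bytenr(uint64_t bg_start, uint64_t reloc_off)
{
	assert(reloc_off <= UINT64_MAX - bg_start);	/* no wrap-around */
	return bg_start + reloc_off;
}

/* Print the btrfs-progs command that maps a logical address to file paths. */
static void print_resolve_cmd(uint64_t bytenr, const char *mnt)
{
	printf("btrfs inspect-internal logical-resolve %" PRIu64 " %s\n",
	       bytenr, mnt);
}
```

For the first error above, 1411883335680 + 3559653376 gives 1415442989056. That value could then be passed to logical-resolve together with the mount point (e.g. /mnt/test).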

>  [631547.562727] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559657472 csum 0x6722cd32 expected csum 0x3ca2ce6f mirror 2
>  [631547.562730] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559661568 csum 0x90368636 expected csum 0xf55a0410 mirror 2
>  [631547.562732] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559665664 csum 0x3e38aeb2 expected csum 0x6c80a970 mirror 2
>  [631547.562746] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559669760 csum 0x77d73f2d expected csum 0xe62cfbe8 mirror 2
>  [631547.562747] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559673856 csum 0xb03d1632 expected csum 0xe9a3f0e6 mirror 2
>  [631547.562756] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559677952 csum 0xeea04377 expected csum 0x8819aaf7 mirror 2
>  [631547.562758] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559682048 csum 0xe46ab546 expected csum 0xacc16686 mirror 2
>  [631547.562775] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559690240 csum 0x956a74d7 expected csum 0x99e29858 mirror 2
>  [631547.562788] BTRFS warning (device dm-18): csum failed root -9 ino 1177 off 3559686144 csum 0xb09a35ae expected csum 0x5f61fa99 mirror 2
> 
> Since this is RAID6, I wasn't expecting to not be able to recover
> from a checksum issue,

Currently btrfs RAID6 can't ensure that recovered data matches its csum.

That is, if there is some other error, like real data corruption on
another disk, RAID6 could in theory still recover from it. In practice,
btrfs may use the corrupted disk for the recovery, resulting in a bad
checksum.
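To make the point concrete, here is a self-contained sketch of csum-verified RAID6 recovery. It rebuilds the missing data block together with each other data block in turn, treating that second block as silently corrupt, and accepts only a rebuild where every block matches its csum. This illustrates the idea only and is not btrfs's raid56.c. The stripe geometry (NDATA, BLK), the function names, and a whole-block CRC32C as the csum are assumptions. The two-erasure formulas follow the standard GF(2^8) RAID-6 construction (polynomial 0x11d, generator 2) that Linux md also uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NDATA 4 /* hypothetical: data blocks per stripe */
#define BLK 16  /* hypothetical: bytes per block, tiny for illustration */

/* Multiply in GF(2^8) with the RAID-6 polynomial x^8+x^4+x^3+x^2+1 (0x11d). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
	uint8_t r = 0;
	for (; b; b >>= 1) {
		if (b & 1)
			r ^= a;
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
	}
	return r;
}

static uint8_t gf_pow(uint8_t a, unsigned n)
{
	uint8_t r = 1;
	while (n--)
		r = gf_mul(r, a);
	return r;
}

/* P = XOR of the data blocks, Q = sum of g^i * D_i with generator g = 2. */
static void gen_pq(uint8_t d[NDATA][BLK], uint8_t *p, uint8_t *q)
{
	memset(p, 0, BLK);
	memset(q, 0, BLK);
	for (unsigned i = 0; i < NDATA; i++)
		for (unsigned b = 0; b < BLK; b++) {
			p[b] ^= d[i][b];
			q[b] ^= gf_mul(gf_pow(2, i), d[i][b]);
		}
}

/* Rebuild data blocks x < y from P and Q (standard two-erasure recovery). */
static void recover2(uint8_t d[NDATA][BLK], const uint8_t *p,
		     const uint8_t *q, unsigned x, unsigned y)
{
	uint8_t pxy[BLK], qxy[BLK];
	assert(x < y && y < NDATA);
	memset(d[x], 0, BLK);
	memset(d[y], 0, BLK);
	gen_pq(d, pxy, qxy);                    /* parity of the survivors */
	uint8_t gyx = gf_pow(2, y - x);
	uint8_t den = gf_pow(gyx ^ 1, 254);     /* a^254 == 1/a in GF(2^8) */
	uint8_t A = gf_mul(gyx, den);
	uint8_t B = gf_mul(gf_pow(gf_pow(2, x), 254), den);
	for (unsigned b = 0; b < BLK; b++) {
		uint8_t s = p[b] ^ pxy[b];      /* Dx + Dy */
		uint8_t t = q[b] ^ qxy[b];      /* g^x*Dx + g^y*Dy */
		d[x][b] = gf_mul(A, s) ^ gf_mul(B, t);
		d[y][b] = s ^ d[x][b];
	}
}

/* Bitwise CRC32C (Castagnoli), the default btrfs data checksum algorithm. */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = ~0u;
	while (len--) {
		crc ^= *buf++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return ~crc;
}

/* Rebuild missing block m, assuming in turn that each other data block k is
 * also corrupt; accept the first {m, k} rebuild where every csum matches. */
static int recover_verified(uint8_t d[NDATA][BLK], const uint8_t *p,
			    const uint8_t *q, const uint32_t csum[NDATA],
			    unsigned m)
{
	uint8_t t[NDATA][BLK];
	for (unsigned k = 0; k < NDATA; k++) {
		unsigned i = 0;
		if (k == m)
			continue;
		memcpy(t, d, sizeof(t));
		recover2(t, p, q, m < k ? m : k, m < k ? k : m);
		while (i < NDATA && crc32c(t[i], BLK) == csum[i])
			i++;
		if (i == NDATA) {               /* every csum verified */
			memcpy(d, t, sizeof(t));
			return 0;
		}
	}
	return -1;
}
```

If nothing besides block m is damaged, the first attempted pair already verifies. A real implementation would run this search per sector and would also need cases for a corrupt P or Q, both of which this sketch leaves out.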

Thanks,
Qu

> also it's not very practical to bail out on the first
> error of this kind during a delete... the offending blocks could be
> left as is.
> 
> I then try to work around the issue by removing the offending file
> (yes it's a test, but filling the drives takes a lot of time),
> finding it with "btrfs inspect-internal inode-resolve 1177", and somehow:
>  ERROR: ino paths ioctl: No such file or directory
> 
> 
> Regards,
> 


