Re: Unrecoverable scrub errors
This particular partition was initially created in July 2015. I've added/removed drives a few times when migrating from older to newer hardware, but never used RAID0 or any other RAID level beyond that.

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc

On 19.11.17 22:39, Roy Sigurd Karlsbakk wrote:
> I guess not using RAID-0 would be a good start…
>
> Vennlig hilsen
>
> roy
> --
> Roy Sigurd Karlsbakk
> (+47) 98013356
> http://blogg.karlsbakk.net/
> GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
> --
> Hið góða skaltu í stein höggva, hið illa í snjó rita.
> (The good you shall carve in stone, the bad write in snow.)

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Unrecoverable scrub errors
I guess not using RAID-0 would be a good start…

Vennlig hilsen

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
Hið góða skaltu í stein höggva, hið illa í snjó rita.

- Original Message -
> From: "Nazar Mokrynskyi" <na...@mokrynskyi.com>
> To: "Chris Murphy" <li...@colorremedies.com>
> Cc: "linux-btrfs" <linux-btrfs@vger.kernel.org>
> Sent: Sunday, 19 November, 2017 12:17:36
> Subject: Re: Unrecoverable scrub errors
> Looks like it is not going to resolve nicely.
> […]
Re: Unrecoverable scrub errors
Looks like it is not going to resolve nicely.

After removing that problematic snapshot, the filesystem quickly becomes readonly like so:

> [23552.839055] BTRFS error (device dm-2): cleaner transaction attach returned -30
> [23577.374390] BTRFS info (device dm-2): use lzo compression
> [23577.374391] BTRFS info (device dm-2): disk space caching is enabled
> [23577.374392] BTRFS info (device dm-2): has skinny extents
> [23577.506214] BTRFS info (device dm-2): bdev /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0, flush 0, corrupt 24, gen 0
> [23795.026390] BTRFS error (device dm-2): bad tree block start 0 470069510144
> [23795.148193] BTRFS error (device dm-2): bad tree block start 56 470069542912
> [23795.148424] BTRFS warning (device dm-2): dm-2 checksum verify failed on 470069460992 wanted 54C49539 found FD171FBB level 0
> [23795.148526] BTRFS error (device dm-2): bad tree block start 0 470069493760
> [23795.150461] BTRFS error (device dm-2): bad tree block start 1459617832 470069477376
> [23795.639781] BTRFS error (device dm-2): bad tree block start 0 470069510144
> [23795.655487] BTRFS error (device dm-2): bad tree block start 0 470069510144
> [23795.655496] BTRFS: error (device dm-2) in btrfs_drop_snapshot:9244: errno=-5 IO failure
> [23795.655498] BTRFS info (device dm-2): forced readonly

Check and repair don't help either:

> nazar-pc@nazar-pc ~> sudo btrfs check -p /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> Checking filesystem on /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> UUID: 82cfcb0f-0b80-4764-bed6-f529f2030ac5
> Extent back ref already exists for 797694840832 parent 330760175616 root 0 owner 0 offset 0 num_refs 1
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> Ignoring transid failure
> leaf parent key incorrect 470072098816
> bad block 470072098816
>
> ERROR: errors found in extent allocation tree or chunk allocation
> There is no free space entry for 797694844928-797694808064
> There is no free space entry for 797694844928-797819535360
> cache appears valid but isn't 796745793536
> There is no free space entry for 814739984384-814739988480
> There is no free space entry for 814739984384-814999404544
> cache appears valid but isn't 813925662720
> block group 894456299520 has wrong amount of free space
> failed to load free space cache for block group 894456299520
> block group 922910457856 has wrong amount of free space
> failed to load free space cache for block group 922910457856
>
> ERROR: errors found in free space cache
> found 963515335717 bytes used, error(s) found
> total csum bytes: 921699896
> total tree bytes: 20361920512
> total fs tree bytes: 17621073920
> total extent tree bytes: 1629323264
> btree space waste bytes: 3812167723
> file data blocks allocated: 21167059447808
> referenced 2283091746816
>
> nazar-pc@nazar-pc ~> sudo btrfs check --repair -p /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> enabling repair mode
> Checking filesystem on /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> UUID: 82cfcb0f-0b80-4764-bed6-f529f2030ac5
> Extent back ref already exists for 797694840832 parent 330760175616 root 0 owner 0 offset 0 num_refs 1
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> Ignoring transid failure
> leaf parent key incorrect 470072098816
> bad block 470072098816
>
> ERROR: errors found in extent allocation tree or chunk allocation
> Fixed 0 roots.
> There is no free space entry for 797694844928-797694808064
> There is no free space entry for 797694844928-797819535360
> cache appears valid but isn't 796745793536
> There is no free space entry for 814739984384-814739988480
> There is no free space entry for 814739984384-814999404544
> cache appears valid but isn't 813925662720
> block group 894456299520 has wrong amount of free space
> failed to load free space cache for block group 894456299520
> block group 922910457856 has wrong amount of free space
> failed to load free space cache for block group 922910457856
>
> ERROR: errors found in free space cache
> found 963515335717 bytes used, error(s) found
> total csum bytes: 921699896
> total tree bytes: 20361920512
> total fs tree bytes: 17621073920
> total extent tree bytes: 1629323264
> btree space waste bytes: 3812167723
> file data blocks allocated: 21167059447808
> referenced 2283091746816

Anything else I can try before starting from scratch?

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc

19.11.17 07:30, Nazar
Re: Unrecoverable scrub errors
On 19.11.17 07:23, Chris Murphy wrote:
> On Sat, Nov 18, 2017 at 10:13 PM, Nazar Mokrynskyi wrote:
>
>> That was eventually useful:
>>
>> * found some familiar file names (mangled eCryptfs file names from times when I used it for home directory) and decided to search for it in old snapshots of home directory (about 1/3 of snapshots on that partition)
>> * file name was present in snapshots back to July of 2015, but during search through snapshot from 2016-10-26_18:47:04 I've got I/O error reported by find command at one directory
>> * tried to open directory in file manager - same error, fails to open
>> * after removing this, let's call it "broken", snapshot started new scrub, hopefully it'll finish fine
>>
>> If it is not actually related to recent memory issues I'd be positively surprised. Not sure what happened towards the end of October 2016 though, especially that backups were on different physical device back then.
> Wrong csum computation during the transfer? Did you use btrfs send/receive?

Yes, I've used send/receive to copy snapshots from primary SSD to backup HDD. Not sure when wrong csum computation happened, since SSD contains only most recent snapshots and only HDD contains older snapshots. Even if the error happened on the SSD, those older snapshots are gone a long time ago and there is no way to check this.

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Re: Unrecoverable scrub errors
On Sat, Nov 18, 2017 at 10:13 PM, Nazar Mokrynskyi wrote:
>
> That was eventually useful:
>
> * found some familiar file names (mangled eCryptfs file names from times when I used it for home directory) and decided to search for it in old snapshots of home directory (about 1/3 of snapshots on that partition)
> * file name was present in snapshots back to July of 2015, but during search through snapshot from 2016-10-26_18:47:04 I've got I/O error reported by find command at one directory
> * tried to open directory in file manager - same error, fails to open
> * after removing this, let's call it "broken", snapshot started new scrub, hopefully it'll finish fine
>
> If it is not actually related to recent memory issues I'd be positively surprised. Not sure what happened towards the end of October 2016 though, especially that backups were on different physical device back then.

Wrong csum computation during the transfer? Did you use btrfs send/receive?

--
Chris Murphy
Re: Unrecoverable scrub errors
On 19.11.17 06:33, Chris Murphy wrote:
> […]
> You can use btrfs-map-logical -l to get a physical address for this leaf, and then plug that into dd
>
> # dd if=/dev/ skip= bs=1 count=16384 2>/dev/null | hexdump -C
>
> Gotcha of course is this is not translated into the more plain language output by btrfs-debug-tree. And you're in the weeds with the on disk format documentation. But maybe you'll see filenames on the right hand side of the hexdump output and maybe that's enough...
> […]

That was eventually useful:

* found some familiar file names (mangled eCryptfs file names from times when I used it for home directory) and decided to search for it in old snapshots of home directory (about 1/3 of snapshots on that partition)
* file name was present in snapshots back to July of 2015, but during search through snapshot from 2016-10-26_18:47:04 I've got I/O error reported by find command at one directory
* tried to open directory in file manager - same error, fails to open
* after removing this, let's call it "broken", snapshot started new scrub, hopefully it'll finish fine

If it is not actually related to recent memory issues I'd be positively surprised. Not sure what happened towards the end of October 2016 though, especially that backups were on different physical device back then.

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Re: Unrecoverable scrub errors
On Sat, Nov 18, 2017 at 8:45 PM, Nazar Mokrynskyi wrote:
> […]
> Here is what I've got:
>
>> nazar-pc@nazar-pc ~> sudo btrfs-debug-tree -b 470069460992 /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
>> btrfs-progs v4.13.3
>> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
>> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
>> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
>> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
>> Csum didn't match
>> ERROR: failed to read 470069460992
> Looks like I indeed need a --force here.

Huh, seems overdue. But what do I know?

You can use btrfs-map-logical -l to get a physical address for this leaf, and then plug that into dd:

# dd if=/dev/ skip= bs=1 count=16384 2>/dev/null | hexdump -C

Gotcha of course is this is not translated into the more plain language output by btrfs-debug-tree. And you're in the weeds with the on disk format documentation. But maybe you'll see filenames on the right hand side of the hexdump output and maybe that's enough... Or maybe it's worth computing a csum on that leaf to check against the csum for that leaf which is found in the first field of the leaf. I'd expect the csum itself is what's wrong, because if you get memory corruption in creating the node, the resulting csum will be *correct* for that malformed node and there'd be no csum error, you'd just see some other crazy faceplant.

Example. I need a metadata leaf, so ask debug tree to show the files tree for an empty subvolume. In your case, you've got a bad leaf address already, so you just plug that into btrfs-map-logical as shown below:

# btrfs-debug-tree -t 340 /dev/nvme0n1p8
btrfs-progs v4.13.3
file tree key (340 ROOT_ITEM 0)
leaf 155375550464 items 3 free space 15942 generation 249992 owner 340
leaf 155375550464 flags 0x1(WRITTEN) backref revision 1
fs uuid 2662057f-e6c7-47fa-8af9-ad933a22f6ec
chunk uuid 1df72dcf-f515-404a-894a-f7345f988793
        item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
                generation 50968 transid 249992 size 0 nbytes 0
                block group 0 mode 40700 links 1 uid 0 gid 0 rdev 0
                sequence 0 flags 0x124(none)
                atime 1510866942.430740536 (2017-11-16 14:15:42)
                ctime 1511053088.58606103 (2017-11-18 17:58:08)
                mtime 1494741970.844618722 (2017-05-14 00:06:10)
                otime 1494741970.844618722 (2017-05-14 00:06:10)
        item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
                index 0 namelen 2 name: ..
        item 2 key (256 XATTR_ITEM 3817753667) itemoff 16017 itemsize 94
                location key (0 UNKNOWN.0 0) type XATTR
                transid 50969 data_len 48 name_len 16
                name: security.selinux
                data system_u:object_r:systemd_machined_var_lib_t:s0
total bytes 75161927680
bytes used 23639638016
uuid 2662057f-e6c7-47fa-8af9-ad933a22f6ec

Get
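[Editor's note: reading the raw 16 KiB dump that dd + hexdump produces is easier once the node header is decoded. A minimal sketch, assuming the struct btrfs_header field layout as I recall it from the btrfs on-disk format documentation (verify the offsets against ctree.h before relying on them); `parse_btrfs_header` is my own helper, not a btrfs-progs tool:]

```python
# Sketch: decode the btrfs_header at the start of a raw metadata node/leaf,
# e.g. 16384 bytes dumped with dd from the physical address that
# btrfs-map-logical reports. Offsets (assumed, from the on-disk format docs):
#   0x00 csum[32], 0x20 fsid[16], 0x30 bytenr (le64), 0x38 flags (le64),
#   0x40 chunk_tree_uuid[16], 0x50 generation (le64), 0x58 owner (le64),
#   0x60 nritems (le32), 0x64 level (u8)
import struct
import uuid

def parse_btrfs_header(block: bytes) -> dict:
    """Decode the header fields of a raw btrfs metadata block."""
    csum = block[0:32]                      # checksum field (only 4 bytes used for crc32c)
    fsid = uuid.UUID(bytes=block[32:48])    # filesystem UUID
    bytenr, flags = struct.unpack_from("<QQ", block, 48)
    chunk_tree_uuid = uuid.UUID(bytes=block[64:80])
    generation, owner = struct.unpack_from("<QQ", block, 80)
    nritems, level = struct.unpack_from("<IB", block, 96)
    return {
        "csum_le32": int.from_bytes(csum[:4], "little"),
        "fsid": str(fsid),
        "bytenr": bytenr,       # should match the logical address you dumped
        "flags": flags,
        "chunk_tree_uuid": str(chunk_tree_uuid),
        "generation": generation,
        "owner": owner,         # tree id, e.g. 985 in the scrub warnings
        "nritems": nritems,
        "level": level,         # 0 for a leaf
    }
```

Comparing `bytenr`, `owner` and `generation` against what scrub and check report would at least show whether the header itself is intact or the whole block is garbage.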
Re: Unrecoverable scrub errors
On 19.11.17 05:19, Chris Murphy wrote:
> On Sat, Nov 18, 2017 at 1:15 AM, Nazar Mokrynskyi wrote:
>> I can assure you that drive (it is HDD) is perfectly functional with 0 SMART errors or warnings and doesn't have any problems. dmesg is clean in that regard too, HDD itself can be excluded from potential causes.
>>
>> There were however some memory-related issues on my machine a few months ago, so there is a chance that data might have been written incorrectly to the drive back then (I didn't run scrub on backup drive for a long time).
>>
>> How can I identify to which files these metadata belong to replace or just remove them (files)?
> You might look through the archives about bad ram and btrfs check --repair and include Hugo Mills in the search, I'm pretty sure there is code in repair that can fix certain kinds of memory induced corruption in metadata. But I have no idea if this is that type or if repair can make things worse in this case. So I'd say you get everything off this file system that you want, and then go ahead and try --repair and see what happens.

In this case I'm not sure if data were written incorrectly or checksum or both. So I'd like to first identify the files affected, check them manually and then decide what to do with it. Especially as there are not many errors yet.

> One alternative is to just leave it alone. If you're not hitting these leaves in day to day operation, they won't hurt anything.

It was working for some time, but I have suspicion that occasionally it causes spikes of disk activity because of these errors (which is why I ran scrub initially).

> Another alternative is to umount, and use btrfs-debug-tree -b on one of the leaf/node addresses and see what you get (probably an error), but it might still also show the node content so we have some idea what's affected by the error. If it flat out refuses to show the node, might be a feature request to get a flag that forces display of the node such as it is...

Here is what I've got:

> nazar-pc@nazar-pc ~> sudo btrfs-debug-tree -b 470069460992 /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> btrfs-progs v4.13.3
> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
> Csum didn't match
> ERROR: failed to read 470069460992

Looks like I indeed need a --force here.
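[Editor's note: on the idea of recomputing the checksum of the dumped leaf to see whether the csum field or the node body was corrupted: btrfs metadata checksums are CRC-32C. The sketch below assumes, from my reading of btrfs-progs (verify before trusting), that the checksum covers everything after the 32-byte csum field and is stored little-endian at the start of that field; `crc32c` and `btrfs_leaf_csum` are my own helpers:]

```python
# Sketch: CRC-32C (Castagnoli), the checksum algorithm btrfs uses for
# metadata blocks. Pure-Python bitwise version: slow but dependency-free.

def crc32c(data: bytes) -> int:
    """CRC-32C: init 0xFFFFFFFF, reflected polynomial 0x82F63B78, final XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def btrfs_leaf_csum(block: bytes) -> bytes:
    """Assumed btrfs convention: checksum bytes 32..end of the node and
    store the result little-endian in the first 4 bytes of the csum field."""
    return crc32c(block[32:]).to_bytes(4, "little")

# Standard check value for CRC-32C:
assert crc32c(b"123456789") == 0xE3069283
```

If the recomputed value matches neither `wanted 54C49539` nor `found FD171FBB`, that would suggest the node body (not just the stored csum) is what got mangled.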
Re: Unrecoverable scrub errors
On Sat, Nov 18, 2017 at 1:15 AM, Nazar Mokrynskyi wrote:
> I can assure you that drive (it is HDD) is perfectly functional with 0 SMART errors or warnings and doesn't have any problems. dmesg is clean in that regard too, HDD itself can be excluded from potential causes.
>
> There were however some memory-related issues on my machine a few months ago, so there is a chance that data might have been written incorrectly to the drive back then (I didn't run scrub on backup drive for a long time).
>
> How can I identify to which files these metadata belong to replace or just remove them (files)?

You might look through the archives about bad ram and btrfs check --repair and include Hugo Mills in the search, I'm pretty sure there is code in repair that can fix certain kinds of memory induced corruption in metadata. But I have no idea if this is that type or if repair can make things worse in this case. So I'd say you get everything off this file system that you want, and then go ahead and try --repair and see what happens.

One alternative is to just leave it alone. If you're not hitting these leaves in day to day operation, they won't hurt anything.

Another alternative is to umount, and use btrfs-debug-tree -b on one of the leaf/node addresses and see what you get (probably an error), but it might still also show the node content so we have some idea what's affected by the error. If it flat out refuses to show the node, might be a feature request to get a flag that forces display of the node such as it is...

--
Chris Murphy
Re: Unrecoverable scrub errors
I can assure you that drive (it is HDD) is perfectly functional with 0 SMART errors or warnings and doesn't have any problems. dmesg is clean in that regard too, HDD itself can be excluded from potential causes.

There were however some memory-related issues on my machine a few months ago, so there is a chance that data might have been written incorrectly to the drive back then (I didn't run scrub on backup drive for a long time).

How can I identify to which files these metadata belong to replace or just remove them (files)?

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc

On 18.11.17 05:33, Adam Borowski wrote:
> On Fri, Nov 17, 2017 at 08:19:11PM -0700, Chris Murphy wrote:
> […]
> Just for the record: had this been a data block (i.e., a non-inline file extent), the dmesg message would include one of the filenames that refer to that extent. To clear the error, you'd need to remove all such files.
> […]
> The original post mentioned SSD (but was unclear if _this_ filesystem is backed by one). If so, DUP is nearly worthless as both copies will be written to physical cells next to each other, no matter what positions the FTL shows them at.
Re: Unrecoverable scrub errors
On Fri, Nov 17, 2017 at 08:19:11PM -0700, Chris Murphy wrote: > On Fri, Nov 17, 2017 at 8:41 AM, Nazar Mokrynskyi> wrote: > > >> [551049.038718] BTRFS warning (device dm-2): checksum error at logical > >> 470069460992 on dev > >> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > >> 942238048: metadata leaf (level 0) in tree 985 > >> [551049.038720] BTRFS warning (device dm-2): checksum error at logical > >> 470069460992 on dev > >> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > >> 942238048: metadata leaf (level 0) in tree 985 > >> [551049.038723] BTRFS error (device dm-2): bdev > >> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd > >> 0, flush 0, corrupt 1, gen 0 > >> [551049.039634] BTRFS warning (device dm-2): checksum error at logical > >> 470069526528 on dev > >> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > >> 942238176: metadata leaf (level 0) in tree 985 > >> [551049.039635] BTRFS warning (device dm-2): checksum error at logical > >> 470069526528 on dev > >> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > >> 942238176: metadata leaf (level 0) in tree 985 > >> [551049.039637] BTRFS error (device dm-2): bdev > >> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd > >> 0, flush 0, corrupt 2, gen 0 > >> [551049.413114] BTRFS error (device dm-2): unable to fixup (regular) error > >> at logical 470069460992 on dev > >> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 > > These are metadata errors. Are there any other storage stack related > errors in the previous 2-5 minutes, such as read errors (UNC) or SATA > link reset messages? > > >Maybe I can find snapshot that contains file with wrong checksum and > > remove corresponding snapshot or something like that? > > It's not a file. It's metadata leaf. 
Just for the record: had this been a data block (i.e., a non-inline file
extent), the dmesg message would include one of the filenames that refer to
that extent. To clear the error, you'd need to remove all such files.

> >> nazar-pc@nazar-pc ~> sudo btrfs filesystem df /media/Backup
> >> Data, single: total=879.01GiB, used=877.24GiB
> >> System, DUP: total=40.00MiB, used=128.00KiB
> >> Metadata, DUP: total=20.50GiB, used=18.96GiB
> >> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> Metadata is DUP, but both copies have corruption. Kinda strange. But I
> don't know how close the DUP copies are to each other, if possibly a
> big enough media defect can explain this.

The original post mentioned SSD (but was unclear if _this_ filesystem is
backed by one). If so, DUP is nearly worthless, as both copies will be
written to physical cells next to each other, no matter what positions the
FTL shows them at.

Meow!
--
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢰⠒⠀⣿⡁ Imagine there are bandits in your house, your kid is bleeding out,
⢿⡄⠘⠷⠚⠋⠀ the house is on fire, and seven big-ass trumpets are playing in the
⠈⠳⣄ sky. Your cat demands food. The priority should be obvious...
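To check whether a given logical address from these warnings maps to any file path (data extents resolve to filenames; metadata leaves don't), the addresses first have to be collected from the log. A minimal sketch, assuming GNU grep/awk/sort and using sample dmesg lines copied from this thread; each resulting address could then be passed to `btrfs inspect-internal logical-resolve <addr> <mountpoint>`:

```shell
# Sketch: collect the unique logical addresses from scrub checksum warnings.
# The sample lines are copied from this thread; a real run would pipe in
# `dmesg` output instead of this variable.
sample='[551049.038718] BTRFS warning (device dm-2): checksum error at logical 470069460992 on dev /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 942238048: metadata leaf (level 0) in tree 985
[551049.039634] BTRFS warning (device dm-2): checksum error at logical 470069526528 on dev /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 942238176: metadata leaf (level 0) in tree 985'

addrs=$(printf '%s\n' "$sample" |
  grep -o 'at logical [0-9]*' |   # isolate the "at logical <N>" fragments
  awk '{print $3}' |              # keep only the number
  sort -un)                       # numeric sort, duplicates removed
printf '%s\n' "$addrs"
# Then, per address (mountpoint /media/Backup taken from this thread):
#   btrfs inspect-internal logical-resolve <addr> /media/Backup
```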
Re: Unrecoverable scrub errors
On Fri, Nov 17, 2017 at 8:41 AM, Nazar Mokrynskyi wrote:

>> [551049.038718] BTRFS warning (device dm-2): checksum error at logical
>> 470069460992 on dev
>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
>> 942238048: metadata leaf (level 0) in tree 985
>> [551049.038720] BTRFS warning (device dm-2): checksum error at logical
>> 470069460992 on dev
>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
>> 942238048: metadata leaf (level 0) in tree 985
>> [551049.038723] BTRFS error (device dm-2): bdev
>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd
>> 0, flush 0, corrupt 1, gen 0
>> [551049.039634] BTRFS warning (device dm-2): checksum error at logical
>> 470069526528 on dev
>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
>> 942238176: metadata leaf (level 0) in tree 985
>> [551049.039635] BTRFS warning (device dm-2): checksum error at logical
>> 470069526528 on dev
>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
>> 942238176: metadata leaf (level 0) in tree 985
>> [551049.039637] BTRFS error (device dm-2): bdev
>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd
>> 0, flush 0, corrupt 2, gen 0
>> [551049.413114] BTRFS error (device dm-2): unable to fixup (regular) error
>> at logical 470069460992 on dev
>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1

These are metadata errors. Are there any other storage stack related
errors in the previous 2-5 minutes, such as read errors (UNC) or SATA
link reset messages?

> Are there any better options before resorting to `btrfsck --repair`?

I wouldn't try it just yet. What do you get for btrfs check without
--repair? This will check the metadata and it should run into the same
problem, but if it craps out, then chances are --repair will too.

> Maybe I can find the snapshot that contains the file with the wrong
> checksum and remove the corresponding snapshot, or something like that?

It's not a file.
It's a metadata leaf.

>> nazar-pc@nazar-pc ~> sudo btrfs filesystem df /media/Backup
>> Data, single: total=879.01GiB, used=877.24GiB
>> System, DUP: total=40.00MiB, used=128.00KiB
>> Metadata, DUP: total=20.50GiB, used=18.96GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B

Metadata is DUP, but both copies have corruption. Kinda strange. But I
don't know how close the DUP copies are to each other, if possibly a
big enough media defect can explain this.

What do you get for smartctl -l scterc /dev/ (whole physical device,
not the dm device)?

In the meantime, take the drive offline (umount it) and run smartctl -t
long; after that finishes, smartctl -x. Attach that as a plain text file;
it should be small enough for the list to handle it, and it avoids
reformatting problems.

--
Chris Murphy
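The drive-health checks suggested above can be sketched as a sequence. This is a non-authoritative outline: /dev/sdX is a placeholder for the whole physical device backing the dm-crypt volume (findable via e.g. `lsblk -s`), and the DRY_RUN guard, which only prints the commands, is an illustrative addition, not part of the original advice:

```shell
# Sketch of the suggested SMART diagnostics. DRY_RUN=1 (the default here)
# only echoes each command; set DRY_RUN=0 to actually run them as root.
DEV=/dev/sdX                 # placeholder: the physical disk, not dm-2
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run smartctl -l scterc "$DEV"   # SCT error recovery control settings
run umount /media/Backup        # take the filesystem offline first
run smartctl -t long "$DEV"     # start an extended self-test
# ...wait for the self-test to finish (its duration is device-dependent)...
run smartctl -x "$DEV"          # full report to attach to the list
```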
Unrecoverable scrub errors
Hi folks,

I'm a long-term btrfs user (permanently for my root partition and other
stuff for ~3 years now, with compression, most of the way with RAID0 on
various SSDs, etc.). In simple words, my setup consists of a root
partition and a backup partition. There are automated snapshots on the
root partition, which are then copied to an online backup partition
(send/receive, handled by "Just backup btrfs") and occasionally to an
offline backup partition (handled by "Btrfs sync subvolumes").

I've recently found that my online backup partition has some
unrecoverable errors, as reported after running scrub:

> scrub status for 82cfcb0f-0b80-4764-bed6-f529f2030ac5
> scrub started at Fri Nov 17 15:05:12 2017 and finished after 02:07:30
> total bytes scrubbed: 915.16GiB with 12 errors
> error details: csum=12
> corrected errors: 0, uncorrectable errors: 12, unverified errors: 0

dmesg (this is all related to the mentioned errors):

> [551049.038718] BTRFS warning (device dm-2): checksum error at logical
> 470069460992 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238048: metadata leaf (level 0) in tree 985
> [551049.038720] BTRFS warning (device dm-2): checksum error at logical
> 470069460992 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238048: metadata leaf (level 0) in tree 985
> [551049.038723] BTRFS error (device dm-2): bdev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0,
> flush 0, corrupt 1, gen 0
> [551049.039634] BTRFS warning (device dm-2): checksum error at logical
> 470069526528 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238176: metadata leaf (level 0) in tree 985
> [551049.039635] BTRFS warning (device dm-2): checksum error at logical
> 470069526528 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238176: metadata leaf (level 0) in tree 985
> [551049.039637] BTRFS error (device dm-2): bdev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0,
> flush 0, corrupt 2, gen 0
> [551049.413114] BTRFS error (device dm-2): unable to fixup (regular) error at
> logical 470069460992 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> [551049.413473] BTRFS warning (device dm-2): checksum error at logical
> 470069477376 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238080: metadata leaf (level 0) in tree 985
> [551049.413473] BTRFS warning (device dm-2): checksum error at logical
> 470069477376 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238080: metadata leaf (level 0) in tree 985
> [551049.413475] BTRFS error (device dm-2): bdev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0,
> flush 0, corrupt 3, gen 0
> [551049.413685] BTRFS error (device dm-2): unable to fixup (regular) error at
> logical 470069477376 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> [551049.413910] BTRFS warning (device dm-2): checksum error at logical
> 470069493760 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238112: metadata leaf (level 0) in tree 985
> [551049.413911] BTRFS warning (device dm-2): checksum error at logical
> 470069493760 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238112: metadata leaf (level 0) in tree 985
> [551049.413912] BTRFS error (device dm-2): bdev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0,
> flush 0, corrupt 4, gen 0
> [551049.414121] BTRFS error (device dm-2): unable to fixup (regular) error at
> logical 470069493760 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> [551049.414354] BTRFS warning (device dm-2): checksum error at logical
> 470069510144 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238144: metadata leaf (level 0) in tree 985
> [551049.414355] BTRFS warning (device dm-2): checksum error at logical
> 470069510144 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238144: metadata leaf (level 0) in tree 985
> [551049.414356] BTRFS error (device dm-2): bdev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0,
> flush 0, corrupt 5, gen 0
> [551049.414567] BTRFS error (device dm-2): unable to fixup (regular) error at
> logical 470069510144 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> [551049.479023] BTRFS error (device dm-2): unable to fixup (regular) error at
> logical 470069526528 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> [551049.479989] BTRFS warning (device dm-2): checksum error at logical
> 470069542912 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238208: metadata leaf (level 0)