Re: Unrecoverable scrub errors
This particular partition was initially created in July 2015. I've added/removed drives a few times when migrating from older to newer hardware, but never used RAID0 or any other RAID level beyond that.

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc

On 19.11.17 22:39, Roy Sigurd Karlsbakk wrote:
> I guess not using RAID-0 would be a good start…
>
> Vennlig hilsen
>
> roy
> --
> Roy Sigurd Karlsbakk
> (+47) 98013356
> http://blogg.karlsbakk.net/
> GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
> --
> Hið góða skaltu í stein höggva, hið illa í snjó rita.
> (The good you shall carve in stone, the bad write in snow.)

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Unrecoverable scrub errors
I guess not using RAID-0 would be a good start…

Vennlig hilsen

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
Hið góða skaltu í stein höggva, hið illa í snjó rita.

- Original Message -
> From: "Nazar Mokrynskyi" <na...@mokrynskyi.com>
> To: "Chris Murphy" <li...@colorremedies.com>
> Cc: "linux-btrfs" <linux-btrfs@vger.kernel.org>
> Sent: Sunday, 19 November, 2017 12:17:36
> Subject: Re: Unrecoverable scrub errors
> Looks like it is not going to resolve nicely.
> […]
Re: Unrecoverable scrub errors
Looks like it is not going to resolve nicely.

After removing that problematic snapshot, the filesystem quickly becomes readonly like so:

> [23552.839055] BTRFS error (device dm-2): cleaner transaction attach returned -30
> [23577.374390] BTRFS info (device dm-2): use lzo compression
> [23577.374391] BTRFS info (device dm-2): disk space caching is enabled
> [23577.374392] BTRFS info (device dm-2): has skinny extents
> [23577.506214] BTRFS info (device dm-2): bdev /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0, flush 0, corrupt 24, gen 0
> [23795.026390] BTRFS error (device dm-2): bad tree block start 0 470069510144
> [23795.148193] BTRFS error (device dm-2): bad tree block start 56 470069542912
> [23795.148424] BTRFS warning (device dm-2): dm-2 checksum verify failed on 470069460992 wanted 54C49539 found FD171FBB level 0
> [23795.148526] BTRFS error (device dm-2): bad tree block start 0 470069493760
> [23795.150461] BTRFS error (device dm-2): bad tree block start 1459617832 470069477376
> [23795.639781] BTRFS error (device dm-2): bad tree block start 0 470069510144
> [23795.655487] BTRFS error (device dm-2): bad tree block start 0 470069510144
> [23795.655496] BTRFS: error (device dm-2) in btrfs_drop_snapshot:9244: errno=-5 IO failure
> [23795.655498] BTRFS info (device dm-2): forced readonly

Check and repair don't help either:

> nazar-pc@nazar-pc ~> sudo btrfs check -p /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> Checking filesystem on /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> UUID: 82cfcb0f-0b80-4764-bed6-f529f2030ac5
> Extent back ref already exists for 797694840832 parent 330760175616 root 0 owner 0 offset 0 num_refs 1
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> Ignoring transid failure
> leaf parent key incorrect 470072098816
> bad block 470072098816
>
> ERROR: errors found in extent allocation tree or chunk allocation
> There is no free space entry for 797694844928-797694808064
> There is no free space entry for 797694844928-797819535360
> cache appears valid but isn't 796745793536
> There is no free space entry for 814739984384-814739988480
> There is no free space entry for 814739984384-814999404544
> cache appears valid but isn't 813925662720
> block group 894456299520 has wrong amount of free space
> failed to load free space cache for block group 894456299520
> block group 922910457856 has wrong amount of free space
> failed to load free space cache for block group 922910457856
>
> ERROR: errors found in free space cache
> found 963515335717 bytes used, error(s) found
> total csum bytes: 921699896
> total tree bytes: 20361920512
> total fs tree bytes: 17621073920
> total extent tree bytes: 1629323264
> btree space waste bytes: 3812167723
> file data blocks allocated: 21167059447808
> referenced 2283091746816
>
> nazar-pc@nazar-pc ~> sudo btrfs check --repair -p /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> enabling repair mode
> Checking filesystem on /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> UUID: 82cfcb0f-0b80-4764-bed6-f529f2030ac5
> Extent back ref already exists for 797694840832 parent 330760175616 root 0 owner 0 offset 0 num_refs 1
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> Ignoring transid failure
> leaf parent key incorrect 470072098816
> bad block 470072098816
>
> ERROR: errors found in extent allocation tree or chunk allocation
> Fixed 0 roots.
> There is no free space entry for 797694844928-797694808064
> There is no free space entry for 797694844928-797819535360
> cache appears valid but isn't 796745793536
> There is no free space entry for 814739984384-814739988480
> There is no free space entry for 814739984384-814999404544
> cache appears valid but isn't 813925662720
> block group 894456299520 has wrong amount of free space
> failed to load free space cache for block group 894456299520
> block group 922910457856 has wrong amount of free space
> failed to load free space cache for block group 922910457856
>
> ERROR: errors found in free space cache
> found 963515335717 bytes used, error(s) found
> total csum bytes: 921699896
> total tree bytes: 20361920512
> total fs tree bytes: 17621073920
> total extent tree bytes: 1629323264
> btree space waste bytes: 3812167723
> file data blocks allocated: 21167059447808
> referenced 2283091746816

Anything else I can try before starting from scratch?

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc

19.11.17 07:30, Nazar
Re: Unrecoverable scrub errors
On 19.11.17 07:23, Chris Murphy wrote:
> On Sat, Nov 18, 2017 at 10:13 PM, Nazar Mokrynskyi wrote:
>
>> That was eventually useful:
>>
>> * found some familiar file names (mangled eCryptfs file names from times when I used it for home directory) and decided to search for it in old snapshots of home directory (about 1/3 of snapshots on that partition)
>> * file name was present in snapshots back to July of 2015, but during search through snapshot from 2016-10-26_18:47:04 I've got I/O error reported by find command at one directory
>> * tried to open directory in file manager - same error, fails to open
>> * after removing this, let's call it "broken", snapshot started new scrub, hopefully it'll finish fine
>>
>> If it is not actually related to recent memory issues I'd be positively surprised. Not sure what happened towards the end of October 2016 though, especially that backups were on different physical device back then.
> Wrong csum computation during the transfer? Did you use btrfs send/receive?

Yes, I've used send/receive to copy snapshots from primary SSD to backup HDD. Not sure when wrong csum computation happened, since SSD contains only most recent snapshots and only HDD contains older snapshots. Even if the error happened on the SSD, those older snapshots are gone a long time ago and there is no way to check this.

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Re: Unrecoverable scrub errors
On Sat, Nov 18, 2017 at 10:13 PM, Nazar Mokrynskyi wrote:
>
> That was eventually useful:
>
> * found some familiar file names (mangled eCryptfs file names from times when I used it for home directory) and decided to search for it in old snapshots of home directory (about 1/3 of snapshots on that partition)
> * file name was present in snapshots back to July of 2015, but during search through snapshot from 2016-10-26_18:47:04 I've got I/O error reported by find command at one directory
> * tried to open directory in file manager - same error, fails to open
> * after removing this, let's call it "broken", snapshot started new scrub, hopefully it'll finish fine
>
> If it is not actually related to recent memory issues I'd be positively surprised. Not sure what happened towards the end of October 2016 though, especially that backups were on different physical device back then.

Wrong csum computation during the transfer? Did you use btrfs send/receive?

--
Chris Murphy
Re: Unrecoverable scrub errors
On 19.11.17 06:33, Chris Murphy wrote:
> […]
> You can use btrfs-map-logical -l to get a physical address for this leaf, and then plug that into dd
>
> # dd if=/dev/ skip= bs=1 count=16384 2>/dev/null | hexdump -C
>
> Gotcha of course is this is not translated into the more plain language output by btrfs-debug-tree. And you're in the weeds with the on disk format documentation. But maybe you'll see filenames on the right hand side of the hexdump output and maybe that's enough...
> […]

That was eventually useful:

* found some familiar file names (mangled eCryptfs file names from times when I used it for home directory) and decided to search for it in old snapshots of home directory (about 1/3 of snapshots on that partition)
* file name was present in snapshots back to July of 2015, but during search through snapshot from 2016-10-26_18:47:04 I've got I/O error reported by find command at one directory
* tried to open directory in file manager - same error, fails to open
* after removing this, let's call it "broken", snapshot started new scrub, hopefully it'll finish fine

If it is not actually related to recent memory issues I'd be positively surprised. Not sure what happened towards the end of October 2016 though, especially that backups were on different physical device back then.

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Re: Unrecoverable scrub errors
On Sat, Nov 18, 2017 at 8:45 PM, Nazar Mokrynskyi wrote:
> […]
> Here is what I've got:
>
>> nazar-pc@nazar-pc ~> sudo btrfs-debug-tree -b 470069460992 /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
>> btrfs-progs v4.13.3
>> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
>> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
>> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
>> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
>> Csum didn't match
>> ERROR: failed to read 470069460992
> Looks like I indeed need a --force here.

Huh, seems overdue. But what do I know?

You can use btrfs-map-logical -l to get a physical address for this leaf, and then plug that into dd:

# dd if=/dev/ skip= bs=1 count=16384 2>/dev/null | hexdump -C

Gotcha of course is this is not translated into the more plain language output by btrfs-debug-tree. And you're in the weeds with the on disk format documentation. But maybe you'll see filenames on the right hand side of the hexdump output and maybe that's enough... Or maybe it's worth computing a csum on that leaf to check against the csum for that leaf which is found in the first field of the leaf. I'd expect the csum itself is what's wrong, because if you get memory corruption in creating the node, the resulting csum will be *correct* for that malformed node and there'd be no csum error, you'd just see some other crazy faceplant.

Example. I need a metadata leaf, so ask debug tree to show the files tree for an empty subvolume. In your case, you've got a bad leaf address already, so you just plug that into btrfs-map-logical as shown below:

# btrfs-debug-tree -t 340 /dev/nvme0n1p8
btrfs-progs v4.13.3
file tree key (340 ROOT_ITEM 0)
leaf 155375550464 items 3 free space 15942 generation 249992 owner 340
leaf 155375550464 flags 0x1(WRITTEN) backref revision 1
fs uuid 2662057f-e6c7-47fa-8af9-ad933a22f6ec
chunk uuid 1df72dcf-f515-404a-894a-f7345f988793
        item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
                generation 50968 transid 249992 size 0 nbytes 0
                block group 0 mode 40700 links 1 uid 0 gid 0 rdev 0
                sequence 0 flags 0x124(none)
                atime 1510866942.430740536 (2017-11-16 14:15:42)
                ctime 1511053088.58606103 (2017-11-18 17:58:08)
                mtime 1494741970.844618722 (2017-05-14 00:06:10)
                otime 1494741970.844618722 (2017-05-14 00:06:10)
        item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
                index 0 namelen 2 name: ..
        item 2 key (256 XATTR_ITEM 3817753667) itemoff 16017 itemsize 94
                location key (0 UNKNOWN.0 0) type XATTR
                transid 50969 data_len 48 name_len 16
                name: security.selinux
                data system_u:object_r:systemd_machined_var_lib_t:s0
total bytes 75161927680
bytes used 23639638016
uuid 2662057f-e6c7-47fa-8af9-ad933a22f6ec

Get
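[Editor's note: reading the raw 16 KiB dump that dd + hexdump produces is easier once the node header is decoded. A minimal sketch, assuming the struct btrfs_header field layout as I recall it from the btrfs on-disk format documentation (verify the offsets against ctree.h before relying on them); `parse_btrfs_header` is my own helper, not a btrfs-progs tool:]

```python
# Sketch: decode the btrfs_header at the start of a raw metadata node/leaf,
# e.g. 16384 bytes dumped with dd from the physical address that
# btrfs-map-logical reports. Offsets (assumed, from the on-disk format docs):
#   0x00 csum[32], 0x20 fsid[16], 0x30 bytenr (le64), 0x38 flags (le64),
#   0x40 chunk_tree_uuid[16], 0x50 generation (le64), 0x58 owner (le64),
#   0x60 nritems (le32), 0x64 level (u8)
import struct
import uuid

def parse_btrfs_header(block: bytes) -> dict:
    """Decode the header fields of a raw btrfs metadata block."""
    csum = block[0:32]                      # checksum field (only 4 bytes used for crc32c)
    fsid = uuid.UUID(bytes=block[32:48])    # filesystem UUID
    bytenr, flags = struct.unpack_from("<QQ", block, 48)
    chunk_tree_uuid = uuid.UUID(bytes=block[64:80])
    generation, owner = struct.unpack_from("<QQ", block, 80)
    nritems, level = struct.unpack_from("<IB", block, 96)
    return {
        "csum_le32": int.from_bytes(csum[:4], "little"),
        "fsid": str(fsid),
        "bytenr": bytenr,       # should match the logical address you dumped
        "flags": flags,
        "chunk_tree_uuid": str(chunk_tree_uuid),
        "generation": generation,
        "owner": owner,         # tree id, e.g. 985 in the scrub warnings
        "nritems": nritems,
        "level": level,         # 0 for a leaf
    }
```

Comparing `bytenr`, `owner` and `generation` against what scrub and check report would at least show whether the header itself is intact or the whole block is garbage.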
Re: Unrecoverable scrub errors
On 19.11.17 05:19, Chris Murphy wrote:
> On Sat, Nov 18, 2017 at 1:15 AM, Nazar Mokrynskyi wrote:
>> I can assure you that drive (it is HDD) is perfectly functional with 0 SMART errors or warnings and doesn't have any problems. dmesg is clean in that regard too, HDD itself can be excluded from potential causes.
>>
>> There were however some memory-related issues on my machine a few months ago, so there is a chance that data might have been written incorrectly to the drive back then (I didn't run scrub on backup drive for a long time).
>>
>> How can I identify to which files these metadata belong to replace or just remove them (files)?
> You might look through the archives about bad ram and btrfs check --repair and include Hugo Mills in the search, I'm pretty sure there is code in repair that can fix certain kinds of memory induced corruption in metadata. But I have no idea if this is that type or if repair can make things worse in this case. So I'd say you get everything off this file system that you want, and then go ahead and try --repair and see what happens.

In this case I'm not sure if data were written incorrectly or checksum or both. So I'd like to first identify the files affected, check them manually and then decide what to do with it. Especially as there are not many errors yet.

> One alternative is to just leave it alone. If you're not hitting these leaves in day to day operation, they won't hurt anything.

It was working for some time, but I have suspicion that occasionally it causes spikes of disk activity because of these errors (which is why I ran scrub initially).

> Another alternative is to umount, and use btrfs-debug-tree -b on one of the leaf/node addresses and see what you get (probably an error), but it might still also show the node content so we have some idea what's affected by the error. If it flat out refuses to show the node, might be a feature request to get a flag that forces display of the node such as it is...

Here is what I've got:

> nazar-pc@nazar-pc ~> sudo btrfs-debug-tree -b 470069460992 /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> btrfs-progs v4.13.3
> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
> Csum didn't match
> ERROR: failed to read 470069460992

Looks like I indeed need a --force here.
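[Editor's note: on the idea of recomputing the checksum of the dumped leaf to see whether the csum field or the node body was corrupted: btrfs metadata checksums are CRC-32C. The sketch below assumes, from my reading of btrfs-progs (verify before trusting), that the checksum covers everything after the 32-byte csum field and is stored little-endian at the start of that field; `crc32c` and `btrfs_leaf_csum` are my own helpers:]

```python
# Sketch: CRC-32C (Castagnoli), the checksum algorithm btrfs uses for
# metadata blocks. Pure-Python bitwise version: slow but dependency-free.

def crc32c(data: bytes) -> int:
    """CRC-32C: init 0xFFFFFFFF, reflected polynomial 0x82F63B78, final XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def btrfs_leaf_csum(block: bytes) -> bytes:
    """Assumed btrfs convention: checksum bytes 32..end of the node and
    store the result little-endian in the first 4 bytes of the csum field."""
    return crc32c(block[32:]).to_bytes(4, "little")

# Standard check value for CRC-32C:
assert crc32c(b"123456789") == 0xE3069283
```

If the recomputed value matches neither `wanted 54C49539` nor `found FD171FBB`, that would suggest the node body (not just the stored csum) is what got mangled.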
Re: Unrecoverable scrub errors
On Sat, Nov 18, 2017 at 1:15 AM, Nazar Mokrynskyi wrote:
> I can assure you that drive (it is HDD) is perfectly functional with 0 SMART errors or warnings and doesn't have any problems. dmesg is clean in that regard too, HDD itself can be excluded from potential causes.
>
> There were however some memory-related issues on my machine a few months ago, so there is a chance that data might have been written incorrectly to the drive back then (I didn't run scrub on backup drive for a long time).
>
> How can I identify to which files these metadata belong to replace or just remove them (files)?

You might look through the archives about bad ram and btrfs check --repair and include Hugo Mills in the search, I'm pretty sure there is code in repair that can fix certain kinds of memory induced corruption in metadata. But I have no idea if this is that type or if repair can make things worse in this case. So I'd say you get everything off this file system that you want, and then go ahead and try --repair and see what happens.

One alternative is to just leave it alone. If you're not hitting these leaves in day to day operation, they won't hurt anything.

Another alternative is to umount, and use btrfs-debug-tree -b on one of the leaf/node addresses and see what you get (probably an error), but it might still also show the node content so we have some idea what's affected by the error. If it flat out refuses to show the node, might be a feature request to get a flag that forces display of the node such as it is...

--
Chris Murphy
Re: Unrecoverable scrub errors
I can assure you that drive (it is HDD) is perfectly functional with 0 SMART errors or warnings and doesn't have any problems. dmesg is clean in that regard too, HDD itself can be excluded from potential causes.

There were however some memory-related issues on my machine a few months ago, so there is a chance that data might have been written incorrectly to the drive back then (I didn't run scrub on backup drive for a long time).

How can I identify to which files these metadata belong to replace or just remove them (files)?

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc

On 18.11.17 05:33, Adam Borowski wrote:
> On Fri, Nov 17, 2017 at 08:19:11PM -0700, Chris Murphy wrote:
> […]
> Just for the record: had this been a data block (i.e., a non-inline file extent), the dmesg message would include one of the filenames that refer to that extent. To clear the error, you'd need to remove all such files.
> […]
> The original post mentioned SSD (but was unclear if _this_ filesystem is backed by one). If so, DUP is nearly worthless as both copies will be written to physical cells next to each other, no matter what positions the FTL shows them at.
Re: Unrecoverable scrub errors
On Fri, Nov 17, 2017 at 08:19:11PM -0700, Chris Murphy wrote: > On Fri, Nov 17, 2017 at 8:41 AM, Nazar Mokrynskyi> wrote: > > >> [551049.038718] BTRFS warning (device dm-2): checksum error at logical > >> 470069460992 on dev > >> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > >> 942238048: metadata leaf (level 0) in tree 985 > >> [551049.038720] BTRFS warning (device dm-2): checksum error at logical > >> 470069460992 on dev > >> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > >> 942238048: metadata leaf (level 0) in tree 985 > >> [551049.038723] BTRFS error (device dm-2): bdev > >> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd > >> 0, flush 0, corrupt 1, gen 0 > >> [551049.039634] BTRFS warning (device dm-2): checksum error at logical > >> 470069526528 on dev > >> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > >> 942238176: metadata leaf (level 0) in tree 985 > >> [551049.039635] BTRFS warning (device dm-2): checksum error at logical > >> 470069526528 on dev > >> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > >> 942238176: metadata leaf (level 0) in tree 985 > >> [551049.039637] BTRFS error (device dm-2): bdev > >> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd > >> 0, flush 0, corrupt 2, gen 0 > >> [551049.413114] BTRFS error (device dm-2): unable to fixup (regular) error > >> at logical 470069460992 on dev > >> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 > > These are metadata errors. Are there any other storage stack related > errors in the previous 2-5 minutes, such as read errors (UNC) or SATA > link reset messages? > > >Maybe I can find snapshot that contains file with wrong checksum and > > remove corresponding snapshot or something like that? > > It's not a file. It's metadata leaf. 
Just for the record: had this been a data block (i.e., a non-inline file
extent), the dmesg message would include one of the filenames that refer to
that extent. To clear the error, you'd need to remove all such files.

> >> nazar-pc@nazar-pc ~> sudo btrfs filesystem df /media/Backup
> >> Data, single: total=879.01GiB, used=877.24GiB
> >> System, DUP: total=40.00MiB, used=128.00KiB
> >> Metadata, DUP: total=20.50GiB, used=18.96GiB
> >> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> Metadata is DUP, but both copies have corruption. Kinda strange. But I
> don't know how close the DUP copies are to each other, if possibly a
> big enough media defect can explain this.

The original post mentioned SSD (but was unclear if _this_ filesystem is
backed by one). If so, DUP is nearly worthless, as both copies will be
written to physical cells next to each other, no matter what positions the
FTL shows them at.

Meow!
--
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢰⠒⠀⣿⡁ Imagine there are bandits in your house, your kid is bleeding out,
⢿⡄⠘⠷⠚⠋⠀ the house is on fire, and seven big-ass trumpets are playing in the
⠈⠳⣄ sky. Your cat demands food. The priority should be obvious...
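To check whether a given logical address from these warnings maps to any file path (data extents resolve to filenames; metadata leaves don't), the addresses first have to be collected from the log. A minimal sketch, assuming GNU grep/awk/sort and using sample dmesg lines copied from this thread; each resulting address could then be passed to `btrfs inspect-internal logical-resolve <addr> <mountpoint>`:

```shell
# Sketch: collect the unique logical addresses from scrub checksum warnings.
# The sample lines are copied from this thread; a real run would pipe in
# `dmesg` output instead of this variable.
sample='[551049.038718] BTRFS warning (device dm-2): checksum error at logical 470069460992 on dev /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 942238048: metadata leaf (level 0) in tree 985
[551049.039634] BTRFS warning (device dm-2): checksum error at logical 470069526528 on dev /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 942238176: metadata leaf (level 0) in tree 985'

addrs=$(printf '%s\n' "$sample" |
  grep -o 'at logical [0-9]*' |   # isolate the "at logical <N>" fragments
  awk '{print $3}' |              # keep only the number
  sort -un)                       # numeric sort, duplicates removed
printf '%s\n' "$addrs"
# Then, per address (mountpoint /media/Backup taken from this thread):
#   btrfs inspect-internal logical-resolve <addr> /media/Backup
```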
Re: Unrecoverable scrub errors
On Fri, Nov 17, 2017 at 8:41 AM, Nazar Mokrynskyi wrote:

>> [551049.038718] BTRFS warning (device dm-2): checksum error at logical
>> 470069460992 on dev
>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
>> 942238048: metadata leaf (level 0) in tree 985
>> [551049.038720] BTRFS warning (device dm-2): checksum error at logical
>> 470069460992 on dev
>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
>> 942238048: metadata leaf (level 0) in tree 985
>> [551049.038723] BTRFS error (device dm-2): bdev
>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd
>> 0, flush 0, corrupt 1, gen 0
>> [551049.039634] BTRFS warning (device dm-2): checksum error at logical
>> 470069526528 on dev
>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
>> 942238176: metadata leaf (level 0) in tree 985
>> [551049.039635] BTRFS warning (device dm-2): checksum error at logical
>> 470069526528 on dev
>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
>> 942238176: metadata leaf (level 0) in tree 985
>> [551049.039637] BTRFS error (device dm-2): bdev
>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd
>> 0, flush 0, corrupt 2, gen 0
>> [551049.413114] BTRFS error (device dm-2): unable to fixup (regular) error
>> at logical 470069460992 on dev
>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1

These are metadata errors. Are there any other storage stack related
errors in the previous 2-5 minutes, such as read errors (UNC) or SATA
link reset messages?

> Are there any better options before resorting to `btrfsck --repair`?

I wouldn't try it just yet. What do you get for btrfs check without
--repair? This will check the metadata and it should run into the same
problem, but if it craps out, then chances are --repair will too.

> Maybe I can find the snapshot that contains the file with the wrong
> checksum and remove the corresponding snapshot, or something like that?

It's not a file.
It's a metadata leaf.

>> nazar-pc@nazar-pc ~> sudo btrfs filesystem df /media/Backup
>> Data, single: total=879.01GiB, used=877.24GiB
>> System, DUP: total=40.00MiB, used=128.00KiB
>> Metadata, DUP: total=20.50GiB, used=18.96GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B

Metadata is DUP, but both copies have corruption. Kinda strange. But I
don't know how close the DUP copies are to each other, if possibly a
big enough media defect can explain this.

What do you get for smartctl -l scterc /dev/ (whole physical device,
not the dm device)?

In the meantime, take the drive offline (umount it) and run smartctl -t
long; after that finishes, smartctl -x. Attach that as a plain text file;
it should be small enough for the list to handle it, and it avoids
reformatting problems.

--
Chris Murphy
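The drive-health checks suggested above can be sketched as a sequence. This is a non-authoritative outline: /dev/sdX is a placeholder for the whole physical device backing the dm-crypt volume (findable via e.g. `lsblk -s`), and the DRY_RUN guard, which only prints the commands, is an illustrative addition, not part of the original advice:

```shell
# Sketch of the suggested SMART diagnostics. DRY_RUN=1 (the default here)
# only echoes each command; set DRY_RUN=0 to actually run them as root.
DEV=/dev/sdX                 # placeholder: the physical disk, not dm-2
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run smartctl -l scterc "$DEV"   # SCT error recovery control settings
run umount /media/Backup        # take the filesystem offline first
run smartctl -t long "$DEV"     # start an extended self-test
# ...wait for the self-test to finish (its duration is device-dependent)...
run smartctl -x "$DEV"          # full report to attach to the list
```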
Unrecoverable scrub errors
Hi folks,

I'm a long-term btrfs user (permanently for my root partition and other
stuff for ~3 years now, with compression, most of the way with RAID0 on
various SSDs, etc.). In simple words, my setup consists of a root
partition and a backup partition. There are automated snapshots on the
root partition, which are then copied to an online backup partition
(send/receive, handled by "Just backup btrfs") and occasionally to an
offline backup partition (handled by "Btrfs sync subvolumes").

I've recently found that my online backup partition has some
unrecoverable errors, as reported after running scrub:

> scrub status for 82cfcb0f-0b80-4764-bed6-f529f2030ac5
> scrub started at Fri Nov 17 15:05:12 2017 and finished after 02:07:30
> total bytes scrubbed: 915.16GiB with 12 errors
> error details: csum=12
> corrected errors: 0, uncorrectable errors: 12, unverified errors: 0

dmesg (this is all related to the mentioned errors):

> [551049.038718] BTRFS warning (device dm-2): checksum error at logical
> 470069460992 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238048: metadata leaf (level 0) in tree 985
> [551049.038720] BTRFS warning (device dm-2): checksum error at logical
> 470069460992 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238048: metadata leaf (level 0) in tree 985
> [551049.038723] BTRFS error (device dm-2): bdev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0,
> flush 0, corrupt 1, gen 0
> [551049.039634] BTRFS warning (device dm-2): checksum error at logical
> 470069526528 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238176: metadata leaf (level 0) in tree 985
> [551049.039635] BTRFS warning (device dm-2): checksum error at logical
> 470069526528 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238176: metadata leaf (level 0) in tree 985
> [551049.039637] BTRFS error (device dm-2): bdev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0,
> flush 0, corrupt 2, gen 0
> [551049.413114] BTRFS error (device dm-2): unable to fixup (regular) error at
> logical 470069460992 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> [551049.413473] BTRFS warning (device dm-2): checksum error at logical
> 470069477376 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238080: metadata leaf (level 0) in tree 985
> [551049.413473] BTRFS warning (device dm-2): checksum error at logical
> 470069477376 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238080: metadata leaf (level 0) in tree 985
> [551049.413475] BTRFS error (device dm-2): bdev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0,
> flush 0, corrupt 3, gen 0
> [551049.413685] BTRFS error (device dm-2): unable to fixup (regular) error at
> logical 470069477376 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> [551049.413910] BTRFS warning (device dm-2): checksum error at logical
> 470069493760 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238112: metadata leaf (level 0) in tree 985
> [551049.413911] BTRFS warning (device dm-2): checksum error at logical
> 470069493760 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238112: metadata leaf (level 0) in tree 985
> [551049.413912] BTRFS error (device dm-2): bdev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0,
> flush 0, corrupt 4, gen 0
> [551049.414121] BTRFS error (device dm-2): unable to fixup (regular) error at
> logical 470069493760 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> [551049.414354] BTRFS warning (device dm-2): checksum error at logical
> 470069510144 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238144: metadata leaf (level 0) in tree 985
> [551049.414355] BTRFS warning (device dm-2): checksum error at logical
> 470069510144 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238144: metadata leaf (level 0) in tree 985
> [551049.414356] BTRFS error (device dm-2): bdev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0,
> flush 0, corrupt 5, gen 0
> [551049.414567] BTRFS error (device dm-2): unable to fixup (regular) error at
> logical 470069510144 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> [551049.479023] BTRFS error (device dm-2): unable to fixup (regular) error at
> logical 470069526528 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> [551049.479989] BTRFS warning (device dm-2): checksum error at logical
> 470069542912 on dev
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector
> 942238208: metadata leaf (level 0)