Re: Bad hard drive - checksum verify failure forces readonly mount
Bug reported:
https://bugzilla.kernel.org/show_bug.cgi?id=121491

Thank you for helping.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bad hard drive - checksum verify failure forces readonly mount
On Mon, Jun 27, 2016 at 12:30 AM, Vasco Almeida wrote:

> File system image available at (choose one link)
> https://mega.nz/#!AkAEgKyB!RUa7G5xHIygWm0ALx5ZxQjjXNdFYa7lDRHJ_sW0bWLs
> https://www.sendspace.com/file/i70cft

> Should I file a bug report with that image dump linked above or
> btrfs-debug-tree output or both?

If it were me, I'd include both. Maybe the image is incomplete, or vice
versa. The debug tree output is also human readable. I'd also put them
up in a cloud location where you can more or less forget about them for
a while; I've had images go 6+ months before a dev looked at them.

> I think I will use the subject of this thread as summary to file the
> bug. Can you think of something more suitable or is that fine?

I would try to summarize with something like:

File system created with btrfs-progs version -, and mostly used with
kernel version -, and inexplicably the file system became unusable at
boot time, always mounting only readonly. Newer kernel versions still
could not mount it, nor was btrfs check using btrfs-progs version - able
to repair it. See thread URL for more details.
btrfs-image URL
btrfs-debug-tree URL

> I think I will reinstall the OS since, even if I manage to recover the
> file system from this issue, that OS will be something I can not trust
> fully.

Yeah, that's pretty much right. There is an rpm command that can check
the signatures of all installed binaries, but I forget what it is
offhand. That'd be an alternative to reinstalling if the init options
were to work.

--
Chris Murphy
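The rpm command alluded to above is probably `rpm -Va` (verify all installed packages); that identification is an assumption, not something stated in the thread. It compares installed files against the package database (size, mode, digest and so on) rather than checking cryptographic signatures of the binaries. A minimal sketch that filters its output down to files whose content digest changed, where `digest_mismatches` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: list files whose content digest no longer matches the RPM database.
# Reads `rpm -Va`-style lines on stdin. In the leading attribute string the
# third character is "5" when the file digest differs from the packaged one.
# Assumption: path names contain no spaces (only the last field is printed).
digest_mismatches() {
    awk 'substr($1, 3, 1) == "5" { print $NF }'
}

# Usage on a real system (as root):
#   rpm -Va | digest_mismatches
```

Changed configuration files (marked with a `c` before the path) will show up too, and those are usually expected; changed binaries under /usr are the ones worth a closer look.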
Re: Bad hard drive - checksum verify failure forces readonly mount
On Sun, 26-06-2016 at 13:54 -0600, Chris Murphy wrote:
> On Sun, Jun 26, 2016 at 7:05 AM, Vasco Almeida wrote:
> > I have tried "btrfs check --repair /device" but that seems do not do
> > any good.
> > http://paste.fedoraproject.org/384960/66945936/
>
> It did fix things, in particular with the snapshot that was having
> problems being dropped. But it's not enough it seems to prevent it
> from going read only.
>
> There's more than one bug here, you might see if the repair was good
> enough that it's possible to use btrfs-image now.

File system image available at (choose one link)
https://mega.nz/#!AkAEgKyB!RUa7G5xHIygWm0ALx5ZxQjjXNdFYa7lDRHJ_sW0bWLs
https://www.sendspace.com/file/i70cft

> If not, use btrfs-debug-tree > file.txt and post that file somewhere.
> This does expose file names. Maybe that'll shed some light on the
> problem. But also worth filing a bug at bugzilla.kernel.org with this
> debug tree referenced (probably too big to attach), maybe a dev will
> be able to look at it and improve things so they don't fail.

Should I file a bug report with that image dump linked above, or the
btrfs-debug-tree output, or both?

I think I will use the subject of this thread as the summary when filing
the bug. Can you think of something more suitable, or is that fine?

> > What else can I do or I must rebuild the file system?
>
> Well, it's a long shot but you could try using --repair
> --init-csum-tree which will create a new csum tree. But that applies
> to data, if the problem with it going read only is due to metadata
> corruption this won't help. And then last you could try
> --init-extent-tree. Thing I can't answer is which order to do it in.
>
> In any case there will be files that you shouldn't trust after csum
> has been recreated, anything corrupt will now have a new csum, so you
> can get silent data corruption. It's better to just blow away this
> file system and make a new one and reinstall the OS. But if you're
> feeling brave, you can try one or both of those additional options
> and see if they can help.

I think I will reinstall the OS since, even if I manage to recover the
file system from this issue, I could not fully trust that OS.
Re: Bad hard drive - checksum verify failure forces readonly mount
On Sun, Jun 26, 2016 at 7:05 AM, Vasco Almeida wrote:
> On Sat, 25-06-2016 at 14:54 -0600, Chris Murphy wrote:
>> On Sat, Jun 25, 2016 at 2:10 PM, Vasco Almeida wrote:
>> > Quoting Chris Murphy:
>> > > 3. btrfs-image so that devs can see what's causing the problem
>> > > that the current code isn't handling well enough.
>> >
>> > btrfs-image does not create dump image:
>> >
>> > # btrfs-image /dev/mapper/vg_pupu-lv_opensuse_root btrfs-lv_opensuse_root.image
>> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
>> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
>> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
>> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
>> > Csum didn't match
>> > Error reading metadata block
>> > Error adding block -5
>> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
>> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
>> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
>> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
>> > Csum didn't match
>> > Error reading metadata block
>> > Error flushing pending -5
>> > create failed (Success)
>> > # echo $?
>> > 1
>>
>> Well it's pretty strange to have DUP metadata and for the checksum
>> verify to fail on both copies. I don't have much optimism that btrfsck
>> repair can fix it either. But still it's worth a shot since there's
>> not much else to go on.
>
> I have tried "btrfs check --repair /device" but that seems do not do
> any good.
> http://paste.fedoraproject.org/384960/66945936/

It did fix things, in particular with the snapshot that was having
problems being dropped. But it seems that's not enough to prevent the
file system from going read-only.

There's more than one bug here. You might see if the repair was good
enough that it's possible to use btrfs-image now. If not, use
btrfs-debug-tree > file.txt and post that file somewhere. This does
expose file names. Maybe that'll shed some light on the problem. It's
also worth filing a bug at bugzilla.kernel.org with this debug tree
referenced (it's probably too big to attach); maybe a dev will be able
to look at it and improve things so they don't fail.

> What else can I do or I must rebuild the file system?

Well, it's a long shot, but you could try using --repair
--init-csum-tree, which will create a new csum tree. But that applies
to data; if the problem with it going read-only is due to metadata
corruption, this won't help. And then last you could try
--init-extent-tree. The thing I can't answer is which order to do it in.

In any case there will be files that you shouldn't trust after the csum
tree has been recreated: anything corrupt will now have a new csum, so
you can get silent data corruption. It's better to just blow away this
file system, make a new one and reinstall the OS. But if you're feeling
brave, you can try one or both of those additional options and see if
they help.

--
Chris Murphy
Re: Bad hard drive - checksum verify failure forces readonly mount
On Sat, 25-06-2016 at 14:54 -0600, Chris Murphy wrote:
> On Sat, Jun 25, 2016 at 2:10 PM, Vasco Almeida wrote:
> > Quoting Chris Murphy:
> > > 3. btrfs-image so that devs can see what's causing the problem
> > > that the current code isn't handling well enough.
> >
> > btrfs-image does not create dump image:
> >
> > # btrfs-image /dev/mapper/vg_pupu-lv_opensuse_root btrfs-lv_opensuse_root.image
> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> > Csum didn't match
> > Error reading metadata block
> > Error adding block -5
> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> > Csum didn't match
> > Error reading metadata block
> > Error flushing pending -5
> > create failed (Success)
> > # echo $?
> > 1
>
> Well it's pretty strange to have DUP metadata and for the checksum
> verify to fail on both copies. I don't have much optimism that btrfsck
> repair can fix it either. But still it's worth a shot since there's
> not much else to go on.

I have tried "btrfs check --repair /device", but that does not seem to
have done any good.
http://paste.fedoraproject.org/384960/66945936/

I then issued "mount /device /mnt" and, like before, it was mounted
readwrite and then forced readonly. I got some kernel oopses and traces.

I noticed that btrfs-balance was using ~100% CPU while the btrfs device
was mounted readonly. I let it run for about 20 minutes. Then I had to
reboot because the system was not responding well: I was unable to open
or close applications, or use the internet. I did SysRq+reisu
(operations were enabled) and then pressed the reset button on the
computer. Unfortunately the dmesg dumps were lost after resetting the
computer.

What else can I do, or must I rebuild the file system?
Re: Bad hard drive - checksum verify failure forces readonly mount
On Sat, Jun 25, 2016 at 2:10 PM, Vasco Almeida wrote:
> Quoting Chris Murphy:
>>
>> I would do a couple things in order:
>> 1. Mount ro and copy off what you want in case the whole thing gets
>> worse and can't ever be mounted again.
>> 2. Mount with only these options: -o skip_balance,subvolid=5,nospace_cache
>
> I have mounted with that options and was readwrite first and then it forces
> readonly. You can see a delay between first BTRFS messages and the "BTRFS
> info: forced readonly" message in dmesg.
>
> /dev/mapper/vg_pupu-lv_opensuse_root on /mnt type btrfs
> (ro,relatime,seclabel,nospace_cache,skip_balance,subvolid=5,subvol=/)
>
>> If it mounts rw, don't do anything with it, just see if it cleans up
>> after itself. It also looks from the previous trace it was trying to
>> remove a snapshot and there are complaints of problems in that
>> snapshot. So hopefully just waiting 5 minutes doing nothing and it'll
>> clean up after itself (you can check with top to see if there are any
>> btrfs related transactions that run including the btrfs-cleaner
>> process) wait until they're done.
>
> I can see that btrfs processes including btrfs-cleaner but they may be not
> doing much since device was forced readonly after mounting it.

My understanding is that read-only here only applies to user space, up
to and including the VFS. The file system itself can still write to the
block device.

> I have umount it normally (umount /mnt) after more than 20 minutes since
> mounting it.
>
>> 3. btrfs-image so that devs can see what's causing the problem that
>> the current code isn't handling well enough.
>
> btrfs-image does not create dump image:
>
> # btrfs-image /dev/mapper/vg_pupu-lv_opensuse_root btrfs-lv_opensuse_root.image
> checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> Csum didn't match
> Error reading metadata block
> Error adding block -5
> checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> Csum didn't match
> Error reading metadata block
> Error flushing pending -5
> create failed (Success)
> # echo $?
> 1

Well, it's pretty strange to have DUP metadata and for the checksum
verify to fail on both copies. I don't have much optimism that btrfsck
repair can fix it either. But it's still worth a shot since there's
not much else to go on.

--
Chris Murphy
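The btrfs-image output above repeats the same failure many times, which makes it hard to see at a glance how many distinct blocks are affected. A minimal sketch that counts failures per block address, assuming lines of the form shown above ("checksum verify failed on <address> ..."); `csum_failures` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: count "checksum verify failed" messages per block address.
# Reads btrfs-progs or kernel log lines on stdin and prints
# "<address> <count>" for each distinct address, sorted numerically.
csum_failures() {
    awk '/checksum verify failed on/ {
             # The address is the word right after "on".
             for (i = 1; i < NF; i++)
                 if ($i == "on") { n[$(i + 1)]++; break }
         }
         END { for (b in n) print b, n[b] }' | sort -n
}

# Usage, for example:
#   btrfs-image /dev/... out.image 2>&1 | csum_failures
```

For the output quoted in this message it would report a single address, 437944320, which fits the observation that one metadata block fails verification on both DUP copies.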
Re: Bad hard drive - checksum verify failure forces readonly mount
Quoting Chris Murphy:

> On Fri, Jun 24, 2016 at 6:06 PM, Vasco Almeida wrote:
>> Quoting Chris Murphy:
>> dmesg http://paste.fedoraproject.org/384352/80842814/
>
> [ 1837.386732] BTRFS info (device dm-9): continuing balance
> [ 1838.006038] BTRFS info (device dm-9): relocating block group 15799943168 flags 34
> [ 1838.684892] BTRFS info (device dm-9): relocating block group 10934550528 flags 36
> [ 1839.301453] ------------[ cut here ]------------
> [ 1839.301495] WARNING: CPU: 3 PID: 76 at fs/btrfs/extent-tree.c:1625 lookup_inline_extent_backref+0x45c/0x5a0 [btrfs]()
>
> followed by
>
> [ 1839.301797] WARNING: CPU: 3 PID: 76 at fs/btrfs/extent-tree.c:2946 btrfs_run_delayed_refs+0x29d/0x2d0 [btrfs]()
> [ 1839.301798] BTRFS: Transaction aborted (error -5)
> [...]
> [ 1839.301972] BTRFS: error (device dm-9) in btrfs_run_delayed_refs:2946: errno=-5 IO failure
> [ 1839.301975] BTRFS info (device dm-9): forced readonly
>
> So it looks like it was resuming a balance automatically, and while
> processing delayed references it's running into something it doesn't
> expect and doesn't have a way to fix, so it goes read only to avoid
> causing more problems.
>
> I would do a couple things in order:
> 1. Mount ro and copy off what you want in case the whole thing gets
> worse and can't ever be mounted again.
> 2. Mount with only these options: -o skip_balance,subvolid=5,nospace_cache

I have mounted with those options; it was readwrite at first and was
then forced readonly. You can see a delay between the first BTRFS
messages and the "BTRFS info: forced readonly" message in dmesg.

/dev/mapper/vg_pupu-lv_opensuse_root on /mnt type btrfs
(ro,relatime,seclabel,nospace_cache,skip_balance,subvolid=5,subvol=/)

> If it mounts rw, don't do anything with it, just see if it cleans up
> after itself. It also looks from the previous trace it was trying to
> remove a snapshot and there are complaints of problems in that
> snapshot. So hopefully just waiting 5 minutes doing nothing and it'll
> clean up after itself (you can check with top to see if there are any
> btrfs related transactions that run including the btrfs-cleaner
> process) wait until they're done.

I can see the btrfs processes, including btrfs-cleaner, but they may
not be doing much since the device was forced readonly after mounting.

> Then umount. If you want you could have two other consoles ready
> first, one for 'journalctl -f' and another for sysrq+t to issue in
> case you get a hang. This doesn't fix anything but it collects more
> information for a bug report for the devs. Once you get it umounted
> normally or by force, the next thing to do is

I unmounted it normally (umount /mnt) more than 20 minutes after
mounting it.

> 3. btrfs-image so that devs can see what's causing the problem that
> the current code isn't handling well enough.

btrfs-image does not create a dump image:

# btrfs-image /dev/mapper/vg_pupu-lv_opensuse_root btrfs-lv_opensuse_root.image
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
Csum didn't match
Error reading metadata block
Error adding block -5
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
Csum didn't match
Error reading metadata block
Error flushing pending -5
create failed (Success)
# echo $?
1

> 4. btrfs check --repair

I have not issued this command yet.

dmesg http://paste.fedoraproject.org/384799/14668851/

Thank you for helping.
Re: Bad hard drive - checksum verify failure forces readonly mount
On Fri, Jun 24, 2016 at 6:06 PM, Vasco Almeida wrote:
> Quoting Chris Murphy:
>> A lot of changes have happened since 4.1.2 I would still use something
>> newer and try to repair it.
>
> By repair do you mean issue "btrfs check --repair /device" ?

Once you have copied off the important stuff, yes. It's less likely to
make things worse now. However, there are some things to do first:

> dmesg http://paste.fedoraproject.org/384352/80842814/

[ 1837.386732] BTRFS info (device dm-9): continuing balance
[ 1838.006038] BTRFS info (device dm-9): relocating block group 15799943168 flags 34
[ 1838.684892] BTRFS info (device dm-9): relocating block group 10934550528 flags 36
[ 1839.301453] ------------[ cut here ]------------
[ 1839.301495] WARNING: CPU: 3 PID: 76 at fs/btrfs/extent-tree.c:1625 lookup_inline_extent_backref+0x45c/0x5a0 [btrfs]()

followed by

[ 1839.301797] WARNING: CPU: 3 PID: 76 at fs/btrfs/extent-tree.c:2946 btrfs_run_delayed_refs+0x29d/0x2d0 [btrfs]()
[ 1839.301798] BTRFS: Transaction aborted (error -5)
[...]
[ 1839.301972] BTRFS: error (device dm-9) in btrfs_run_delayed_refs:2946: errno=-5 IO failure
[ 1839.301975] BTRFS info (device dm-9): forced readonly

So it looks like it was resuming a balance automatically, and while
processing delayed references it ran into something it doesn't expect
and doesn't have a way to fix, so it goes read-only to avoid causing
more problems.

I would do a couple of things, in order:

1. Mount ro and copy off what you want in case the whole thing gets
worse and can't ever be mounted again.

2. Mount with only these options: -o skip_balance,subvolid=5,nospace_cache

If it mounts rw, don't do anything with it, just see if it cleans up
after itself. It also looks from the previous trace like it was trying
to remove a snapshot, and there are complaints of problems in that
snapshot. So hopefully, after just waiting 5 minutes doing nothing,
it'll clean up after itself (you can check with top to see if any btrfs
related processes are running, including btrfs-cleaner); wait until
they're done. Then umount. If you want, you could have two other
consoles ready first, one for 'journalctl -f' and another to issue
sysrq+t in case you get a hang. This doesn't fix anything but it
collects more information for a bug report for the devs. Once you get
it umounted, normally or by force, the next thing to do is

3. btrfs-image so that devs can see what's causing the problem that
the current code isn't handling well enough.

4. btrfs check --repair

Let's see the results of that repair. You can use 'script
btrfsrepair.txt' first and then 'btrfs check --repair' and it will log
everything. After btrfs check completes, use 'exit' to stop script from
recording and you should have a btrfsrepair.txt file you can post
somewhere. When using '>' redirection, not everything gets logged for
some reason, but script will capture everything.

Depending on how the repair goes, there might be a couple more options
left.

--
Chris Murphy
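The four steps above can be sketched as a dry-run script. This is only an outline under stated assumptions: the device path is the one used elsewhere in this thread, `MNT`, `btrfs.image`, the 300-second wait and the `run`/`recovery_steps` helpers are placeholders, and `script -c` is used as a non-interactive variation of the `script` ... `exit` sequence described above. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the ordered steps above. DEV, MNT and the output file names are
# placeholders; with DRY_RUN=1 (the default) each command is only printed.
DEV=${DEV:-/dev/mapper/vg_pupu-lv_opensuse_root}
MNT=${MNT:-/mnt}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recovery_steps() {
    # 1. Mount read-only; copy off anything important before unmounting.
    run mount -o ro "$DEV" "$MNT"
    run umount "$MNT"
    # 2. Mount with only these options and give the file system time to
    #    clean up after itself (watch top and journalctl -f meanwhile).
    run mount -o skip_balance,subvolid=5,nospace_cache "$DEV" "$MNT"
    run sleep 300
    run umount "$MNT"
    # 3. Capture a metadata image for the developers.
    run btrfs-image "$DEV" btrfs.image
    # 4. Repair, recording the whole session with script(1).
    run script -c "btrfs check --repair $DEV" btrfsrepair.txt
}

recovery_steps
```

Keeping the dry run as the default is deliberate: step 4 modifies the file system, so it seems safer to read the printed plan first and only then rerun with DRY_RUN=0.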
Re: Bad hard drive - checksum verify failure forces readonly mount
Quoting Chris Murphy:

> On Fri, Jun 24, 2016 at 9:52 AM, Vasco Almeida wrote:
>>> From the pasted kernel messages:
>>>> Linux version 3.18.34-std473-amd64 (root@rl-sysrcd-p11) (gcc version 4.8.5
>>>> (Gentoo 4.8.5 p1.3, pie-0.6.2) ) #2 SMP Tue May 24 20:34:19 UTC 2016
>>> 3.18.34 is ancient. Find something newer and try to remount normally.
>> Present information concerns openSUSE Leap 42.1 (x86_64) mount of root file
>> system at boot time. That should mount it normally. Hope that fits what you
>> mean.
>
> OK but it's not mounting it normally, it's still being forced readonly
> at btrfs_drop_snapshot and the only thing I'm coming up with search
> wise is that it's related to qgroups. Have you enabled quotas on this
> file system ever?

Unless openSUSE does that by default, I did not enable quotas. It is
not something I am aware of doing.

>> btrfs-progs v4.1.2+20151002
>
> A lot of changes have happened since 4.1.2 I would still use something
> newer and try to repair it.

By repair do you mean issuing "btrfs check --repair /device"?

>> $ /usr/sbin/btrfs fi df /
>> Data, single: total=10.01GiB, used=9.06GiB
>> System, DUP: total=64.00MiB, used=16.00KiB
>> Metadata, DUP: total=1.12GiB, used=596.69MiB
>> GlobalReserve, single: total=208.00MiB, used=0.00B
>>
>> I forgot to mention in last e-mail that I ran Marc MERLIN's scrubbing script
>> [1] after mounting the device with "-o ro,recovery" on System Rescue CD.
>> Even after that device is forced readonly.
>
> OK but System Rescue CD uses an old kernel by btrfs standards, even
> account for all the backports in that particular version:
>
> 4.7.3) 2016-06-04:
> Standard kernels: Long-Term-Supported linux-3.18.34 (rescue32 + rescue64)
>
> So that's why I'm suggesting you use something newer, like 4.5.x, same
> for btrfs-progs. The old versions aren't working. There's no assurance
> it'll work with new versions, but that it doesn't get fixed up with
> old versions means you either try new versions or you rebuild the file
> system. *shrug*

I am using Fedora 24 and have issued
"mount /dev/mapper/vg_pupu-lv_opensuse_root /mnt".
I got some call traces and scary stuff that I did not get before on
other systems. Please check the dmesg output linked below.

Linux catarina 4.5.7-300.fc24.x86_64 #1 SMP Wed Jun 8 18:12:45 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
btrfs-progs v4.5.2

# btrfs fi show
Label: none  uuid: ad167e92-fbb1-4148-b54d-6345b6fb26da
        Total devices 1 FS bytes used 9.63GiB
        devid    1 size 50.00GiB used 12.32GiB path /dev/mapper/vg_pupu-lv_opensuse_root

# btrfs fi df /mnt/
Data, single: total=10.01GiB, used=9.05GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.12GiB, used=597.62MiB
GlobalReserve, single: total=208.00MiB, used=224.00KiB

dmesg http://paste.fedoraproject.org/384352/80842814/
dmesg after umount http://paste.fedoraproject.org/384359/14668108/
diff between two http://paste.fedoraproject.org/384364/11704146/

btrfs check --readonly /dev/mapper/vg_pupu-lv_opensuse_root
http://paste.fedoraproject.org/384361/68112421/

After unmounting and mounting again, the device was mounted readwrite
normally:

/dev/mapper/vg_pupu-lv_opensuse_root on /mnt type btrfs
(rw,relatime,seclabel,space_cache,subvolid=259,subvol=/@/.snapshots/1/snapshot)

But trying to unmount it afterwards makes the umount command hang. The
device no longer shows in the mount output, though. Neither CTRL-C nor
SIGTERM can kill umount.
dmesg http://paste.fedoraproject.org/384371/14668130/

>> I would like to find a solution to be able to mount normally readwrite again
>> and hopefully understand what caused the issue.
>
> My best guess is qgroup related, there were a lot of problems with
> multiple quota implementations and snapshots and openSUSE does take
> many many snapshots. So that could be it. But without a reproducer
> it's hard to say what caused it.

Thank you again for your time and reply.
Re: Bad hard drive - checksum verify failure forces readonly mount
On Fri, Jun 24, 2016 at 9:52 AM, Vasco Almeida wrote:
>> From the pasted kernel messages:
>> > Linux version 3.18.34-std473-amd64 (root@rl-sysrcd-p11) (gcc version 4.8.5
>> > (Gentoo 4.8.5 p1.3, pie-0.6.2) ) #2 SMP Tue May 24 20:34:19 UTC 2016
>> 3.18.34 is ancient. Find something newer and try to remount normally.
> Present information concerns openSUSE Leap 42.1 (x86_64) mount of root file
> system at boot time. That should mount it normally. Hope that fits what you
> mean.

OK, but it's not mounting it normally; it's still being forced readonly
at btrfs_drop_snapshot, and the only thing I'm coming up with
search-wise is that it's related to qgroups. Have you ever enabled
quotas on this file system?

> btrfs-progs v4.1.2+20151002

A lot of changes have happened since 4.1.2. I would still use something
newer and try to repair it.

> $ /usr/sbin/btrfs fi df /
> Data, single: total=10.01GiB, used=9.06GiB
> System, DUP: total=64.00MiB, used=16.00KiB
> Metadata, DUP: total=1.12GiB, used=596.69MiB
> GlobalReserve, single: total=208.00MiB, used=0.00B
>
> I forgot to mention in last e-mail that I ran Marc MERLIN's scrubbing script
> [1] after mounting the device with "-o ro,recovery" on System Rescue CD.
> Even after that device is forced readonly.

OK, but System Rescue CD uses an old kernel by btrfs standards, even
accounting for all the backports in that particular version:

4.7.3) 2016-06-04:
Standard kernels: Long-Term-Supported linux-3.18.34 (rescue32 + rescue64)

So that's why I'm suggesting you use something newer, like 4.5.x, and
the same for btrfs-progs. The old versions aren't working. There's no
assurance it'll work with new versions, but the fact that it doesn't
get fixed up with old versions means you either try new versions or you
rebuild the file system. *shrug*

> I would like to find a solution to be able to mount normally readwrite again
> and hopefully understand what caused the issue.

My best guess is that it's qgroup related. There were a lot of problems
with multiple quota implementations and snapshots, and openSUSE does
take many, many snapshots. So that could be it. But without a
reproducer it's hard to say what caused it.

--
Chris Murphy
Re: Bad hard drive - checksum verify failure forces readonly mount
On Thu, Jun 23, 2016 at 10:56 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Chris Murphy posted on Thu, 23 Jun 2016 18:54:28 -0600 as excerpted:
>
>> From the pasted kernel messages:
>>
>>> Linux version 3.18.34-std473-amd64 (root@rl-sysrcd-p11) (gcc version
>>> 4.8.5 (Gentoo 4.8.5 p1.3, pie-0.6.2) ) #2 SMP Tue May 24 20:34:19 UTC
>>> 2016
>>
>> 3.18.34 is ancient. Find something newer and try to remount normally.
>> And then also with recovery if necessary (don't use ro, see if it'll
>> mount rw and fix itself). And if not, then try btrfs check with a newer
>> version of btrfs-progs, I can't tell from the pasted output what version
>> you're using but since the kernel is so old, decent chance the btrfsck
>> is old also.
>
> ... So I guess that means we're back to supporting only the latest two
> LTS kernel series, those being 4.1 and 4.4 at this time. I had hoped
> that btrfs was stabilizing enough, and 3.18 was trouble-free enough
> btrfs-wise, that we could expand that to three LTS series now, as the
> indications were we might when 4.4 was still new. But it seems that
> while we did support it a bit longer, say 2.5 LTS series, that couldn't
> continue until the /next/ LTS came out.
>
> Oh, well, it /was/ a bit of a stretch...

Yeah, it looks like 3.18.35 even has some backports, and it's not that
old, but I have no idea if the problem in this case is fixed by
something newer. I'd say there's a 50/50 shot at a new kernel doing
better, but btrfs-progs for sure has a better chance, because btrfsck
has had lots of improvements since 3.18.

It's just too easy to dd a Fedora 24 live image, which has kernel 4.5.5
and btrfs-progs 4.5.2, to a USB stick and give it a shot. And if that
doesn't work, then it's btrfs-image time, so hopefully devs can see if
it's possible to improve btrfsck. But at that point it also means
blowing away this fs :-\ but at least it's ro mountable, so anything
important can be copied off normally.

--
Chris Murphy
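Before retrying a mount or repair from a rescue environment, it may help to confirm that the running kernel is recent enough. A minimal sketch, assuming GNU sort with version-sort support (`sort -V`); the 4.4 threshold is only an assumption borrowed from the LTS discussion quoted above, and `version_ge` and `kernel_version` are hypothetical helper names:

```shell
#!/bin/sh
# Sketch: is the running kernel at least a given version?
# Relies on GNU sort -V (version sort). MIN is an assumed threshold.
MIN=${MIN:-4.4}

version_ge() {
    # True when $1 >= $2 in version order.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

kernel_version() {
    # Strip the distribution suffix, e.g. 4.5.7-300.fc24.x86_64 -> 4.5.7
    uname -r | cut -d- -f1
}

if version_ge "$(kernel_version)" "$MIN"; then
    echo "kernel $(kernel_version) is $MIN or newer"
else
    echo "kernel $(kernel_version) is older than $MIN; consider a newer live image"
fi
```

The same comparison could be applied to the version number printed by `btrfs --version`, since the advice here is to update btrfs-progs as well as the kernel.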
Re: Bad hard drive - checksum verify failure forces readonly mount
Chris Murphy posted on Thu, 23 Jun 2016 18:54:28 -0600 as excerpted:

> From the pasted kernel messages:
>
>> Linux version 3.18.34-std473-amd64 (root@rl-sysrcd-p11) (gcc version
>> 4.8.5 (Gentoo 4.8.5 p1.3, pie-0.6.2) ) #2 SMP Tue May 24 20:34:19 UTC
>> 2016
>
> 3.18.34 is ancient. Find something newer and try to remount normally.
> And then also with recovery if necessary (don't use ro, see if it'll
> mount rw and fix itself). And if not, then try btrfs check with a newer
> version of btrfs-progs, I can't tell from the pasted output what version
> you're using but since the kernel is so old, decent chance the btrfsck
> is old also.

... So I guess that means we're back to supporting only the latest two
LTS kernel series, those being 4.1 and 4.4 at this time. I had hoped
that btrfs was stabilizing enough, and 3.18 was trouble-free enough
btrfs-wise, that we could expand that to three LTS series now, as the
indications were we might when 4.4 was still new. But it seems that
while we did support it a bit longer, say 2.5 LTS series, that couldn't
continue until the /next/ LTS came out.

Oh, well, it /was/ a bit of a stretch...

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: Bad hard drive - checksum verify failure forces readonly mount
On Thu, Jun 23, 2016 at 2:30 PM, Vasco Almeida wrote:
> I was running OpenSuse Leap 42.1 with btrfs and
> LVM (Logical Volume Management).
> Last time I've checked smartd log, I noticed there were
> 30 sector pending reallocation and 1 unrecoverable bad
> sector on hard drive.
> I think my hard drive got some sector corrupted and now btrfs fails
> some checksum and forces mount readonly.
> The device is successfully mounted readonly.
>
> OpenSuse dmesg reported:
>
> BTRFS: dm-1 checksum verify failed on 437944320 wanted 39F45669 found
> 8BF8C752 level 0
> (more 2 times)
> BTRFS: error (device dm-1) in btrfs_drop_snapshot:???: error=-5 IO failure
> BTRFS: info (device dm-1): forced readonly
>
> Now I'm on System Rescue CD and that is not reported.
> I've written down those log line on paper, so there may be some typo.
> Seemingly there is no journalctl installed on this system to check
> OpenSuse logs again.
>
> All the following logs are on System Rescue CD.
> mount -o ro,recovery /dev/mapper/vg_pupu-lv_opensuse_root /mnt/opensuse
> https://bpaste.net/show/263e5f7ae9d4
>
> After mounting and umounting several times with and without "-o ro,recovery"
> https://bpaste.net/show/43eb64decb63
>
> btrfs check --readonly /dev/mapper/vg_pupu-lv_opensuse_root
> https://bpaste.net/show/7ecf422c73a2
>
> Would it be appropriate to run any of "btrfs check --repair /device" or
> "btrfs check --init-csum-tree /device" to be able to mount readwrite again?
>
> smartctl --all /dev/disk/by-id/ata-SAMSUNG_HD154UI_S1Y6JDWSC01351
> https://bpaste.net/show/a6c132618974
>
> btrfs check manpage:
> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-check
> btrfsck page: https://btrfs.wiki.kernel.org/index.php/Btrfsck

Normally, if it's just data blocks that are corrupted, it will still
mount rw and just flag the affected file in kernel messages so you can
delete it and replace it. Since that's not happening, it's probably
metadata, but then there should be two copies unless this is on an SSD
or the file system was otherwise created with -m single. If there are
two copies of the metadata and both are wrong, that's unusual.

From the pasted kernel messages:

> Linux version 3.18.34-std473-amd64 (root@rl-sysrcd-p11) (gcc version 4.8.5
> (Gentoo 4.8.5 p1.3, pie-0.6.2) ) #2 SMP Tue May 24 20:34:19 UTC 2016

3.18.34 is ancient. Find something newer and try to remount normally.
And then also with recovery if necessary (don't use ro; see if it'll
mount rw and fix itself). And if not, then try btrfs check with a newer
version of btrfs-progs. I can't tell from the pasted output what
version you're using, but since the kernel is so old, there's a decent
chance the btrfsck is old also.

--
Chris Murphy
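Whether there are two copies of the metadata can be read off the "Metadata" line of `btrfs filesystem df`. A minimal sketch that extracts the profile from that output, assuming the line format shown elsewhere in this thread ("Metadata, DUP: total=..., used=..."); `metadata_profile` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: print the metadata profile (e.g. DUP or single) from
# `btrfs filesystem df` output read on stdin.
metadata_profile() {
    sed -n 's/^Metadata, \([A-Za-z0-9]*\):.*/\1/p'
}

# Usage on a mounted file system:
#   btrfs filesystem df /mnt | metadata_profile
```

A result of DUP means two copies on the same device; single means only one copy, in which case a bad sector under a metadata block leaves nothing to fall back on.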
Bad hard drive - checksum verify failure forces readonly mount
I was running openSUSE Leap 42.1 with btrfs and LVM (Logical Volume
Management).
The last time I checked the smartd log, I noticed there were 30 sectors
pending reallocation and 1 unrecoverable bad sector on the hard drive.
I think some sectors on my hard drive got corrupted, and now btrfs fails
a checksum and forces the mount readonly.
The device is successfully mounted readonly.

openSUSE dmesg reported:

BTRFS: dm-1 checksum verify failed on 437944320 wanted 39F45669 found 8BF8C752 level 0
(2 more times)
BTRFS: error (device dm-1) in btrfs_drop_snapshot:???: error=-5 IO failure
BTRFS: info (device dm-1): forced readonly

Now I'm on System Rescue CD and that is not reported.
I wrote down those log lines on paper, so there may be some typos.
Seemingly there is no journalctl installed on this system to check the
openSUSE logs again.

All the following logs are from System Rescue CD.

mount -o ro,recovery /dev/mapper/vg_pupu-lv_opensuse_root /mnt/opensuse
https://bpaste.net/show/263e5f7ae9d4

After mounting and unmounting several times with and without "-o ro,recovery"
https://bpaste.net/show/43eb64decb63

btrfs check --readonly /dev/mapper/vg_pupu-lv_opensuse_root
https://bpaste.net/show/7ecf422c73a2

Would it be appropriate to run either "btrfs check --repair /device" or
"btrfs check --init-csum-tree /device" to be able to mount readwrite again?

smartctl --all /dev/disk/by-id/ata-SAMSUNG_HD154UI_S1Y6JDWSC01351
https://bpaste.net/show/a6c132618974

btrfs check manpage:
https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-check
btrfsck page: https://btrfs.wiki.kernel.org/index.php/Btrfsck
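The pending and unrecoverable sector counts mentioned above can be pulled out of the SMART attribute table rather than read by eye. A minimal sketch, assuming the usual ten-column `smartctl -A` table in which the attribute name is the second field and the raw value is the tenth; `smart_raw` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: print the raw value of one SMART attribute from `smartctl -A`
# output read on stdin.
# $1: attribute name, e.g. Current_Pending_Sector or Offline_Uncorrectable
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Usage (as root), for example:
#   smartctl -A /dev/disk/by-id/ata-SAMSUNG_HD154UI_S1Y6JDWSC01351 \
#       | smart_raw Current_Pending_Sector
```

A non-zero Current_Pending_Sector value means the drive has sectors it could not read and has not yet remapped, which would be consistent with the hardware explanation suggested in this message.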