Re: Seeking Help on Corruption Issues

2017-10-03 Thread Stephen Nesbitt


On 10/3/2017 2:11 PM, Hugo Mills wrote:

Hi, Stephen,

On Tue, Oct 03, 2017 at 08:52:04PM +, Stephen Nesbitt wrote:

Here it is. There are a couple of out-of-order entries beginning at 117. And
yes I did uncover a bad stick of RAM:

btrfs-progs v4.9.1
leaf 2589782867968 items 134 free space 6753 generation 3351574 owner 2
fs uuid 24b768c3-2141-44bf-ae93-1c3833c8c8e3
chunk uuid 19ce12f0-d271-46b8-a691-e0d26c1790c6

[snip]

item 116 key (1623012749312 EXTENT_ITEM 45056) itemoff 10908 itemsize 53
extent refs 1 gen 3346444 flags DATA
extent data backref root 271 objectid 2478 offset 0 count 1
item 117 key (1621939052544 EXTENT_ITEM 8192) itemoff 10855 itemsize 53
extent refs 1 gen 3346495 flags DATA
extent data backref root 271 objectid 21751764 offset 6733824 count 1
item 118 key (1623012450304 EXTENT_ITEM 8192) itemoff 10802 itemsize 53
extent refs 1 gen 3351513 flags DATA
extent data backref root 271 objectid 5724364 offset 680640512 count 1
item 119 key (1623012802560 EXTENT_ITEM 12288) itemoff 10749 itemsize 53
extent refs 1 gen 3346376 flags DATA
extent data backref root 271 objectid 21751764 offset 6701056 count 1

hex(1623012749312)

'0x179e3193000'

hex(1621939052544)

'0x179a319e000'

hex(1623012450304)

'0x179e314a000'

hex(1623012802560)

'0x179e31a0000'

That's "e" -> "a" in the fourth hex digit, which is a single-bit
flip, and should be fixable by btrfs check (I think). However, even
fixing that, it's not ordered, because 118 is then before 117, which
could be another bitflip ("9" -> "4" in the 7th digit), but two bad
bits that close to each other seems unlikely to me.

Hugo.
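
The bit-flip reasoning above can be checked mechanically. What follows is only a sketch under stated assumptions, not something btrfs-progs does: the helper names single_bit_fixes and out_of_order are made up for illustration, the keys are compared by byte offset alone (all four items are EXTENT_ITEMs, so the key type does not affect the ordering here), and slots 116 and 119 are assumed to be intact.

```python
# Extent-item key offsets copied from slots 116-119 of the leaf dump.
keys = {
    116: 1623012749312,
    117: 1621939052544,
    118: 1623012450304,
    119: 1623012802560,
}

def single_bit_fixes(key, lower, upper, width=64):
    """Hypothetical helper: bit positions whose flip moves `key`
    strictly between `lower` and `upper`."""
    return [bit for bit in range(width)
            if lower < (key ^ (1 << bit)) < upper]

def out_of_order(keys):
    """Hypothetical helper: adjacent slot pairs whose keys are not
    strictly increasing."""
    slots = sorted(keys)
    return [(a, b) for a, b in zip(slots, slots[1:]) if keys[a] >= keys[b]]

# Assumption: slots 116 and 119 are trusted as lower/upper bounds.
bits = single_bit_fixes(keys[117], keys[116], keys[119])
print(bits)                    # [30]
print(out_of_order(keys))      # [(116, 117)]

# Apply the single candidate flip to slot 117 and re-check the order.
repaired = dict(keys)
repaired[117] = keys[117] ^ (1 << 30)
print(hex(repaired[117]))      # 0x179e319e000
print(out_of_order(repaired))  # [(117, 118)]
```

Flipping bit 30 is the "a" -> "e" change in the fourth hex digit. With that flip applied, the only remaining violation is between slots 117 and 118, which matches the observation that 118 still sorts before 117.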


Hope this isn't a duplicate reply - I might have fat-fingered something.

The underlying file is disposable/replaceable. Any way to zero out/zap 
the bad BTRFS entry?


-steve

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Seeking Help on Corruption Issues

2017-10-03 Thread Stephen Nesbitt

All:

I came back to my computer yesterday to find my filesystem in read-only 
mode. Running btrfs scrub start -dB aborts as follows:


btrfs scrub start -dB /mnt
ERROR: scrubbing /mnt failed for device id 4: ret=-1, errno=5 (Input/output error)
ERROR: scrubbing /mnt failed for device id 5: ret=-1, errno=5 (Input/output error)

scrub device /dev/sdb (id 4) canceled
    scrub started at Mon Oct  2 21:51:46 2017 and was aborted after 00:09:02
    total bytes scrubbed: 75.58GiB with 1 errors
    error details: csum=1
    corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
scrub device /dev/sdc (id 5) canceled
    scrub started at Mon Oct  2 21:51:46 2017 and was aborted after 00:11:11
    total bytes scrubbed: 50.75GiB with 0 errors

The resulting dmesg is:
[  699.534066] BTRFS error (device sdc): bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
[  699.703045] BTRFS error (device sdc): unable to fixup (regular) error at logical 1609808347136 on dev /dev/sdb
[  783.306525] BTRFS critical (device sdc): corrupt leaf, bad key order: block=2589782867968, root=1, slot=116
[  789.776132] BTRFS critical (device sdc): corrupt leaf, bad key order: block=2589782867968, root=1, slot=116
[  911.529842] BTRFS critical (device sdc): corrupt leaf, bad key order: block=2589782867968, root=1, slot=116
[  918.365225] BTRFS critical (device sdc): corrupt leaf, bad key order: block=2589782867968, root=1, slot=116


Running btrfs check /dev/sdc results in:
btrfs check /dev/sdc
Checking filesystem on /dev/sdc
UUID: 24b768c3-2141-44bf-ae93-1c3833c8c8e3
checking extents
bad key ordering 116 117
bad block 2589782867968
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache
There is no free space entry for 1623012450304-1623012663296
There is no free space entry for 1623012450304-1623225008128
cache appears valid but isn't 1622151266304
found 288815742976 bytes used err is -22
total csum bytes: 0
total tree bytes: 350781440
total fs tree bytes: 0
total extent tree bytes: 350027776
btree space waste bytes: 115829777
file data blocks allocated: 156499968

uname -a:
Linux sysresccd 4.9.24-std500-amd64 #2 SMP Sat Apr 22 17:14:43 UTC 2017 x86_64 Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz GenuineIntel GNU/Linux


btrfs --version: btrfs-progs v4.9.1

btrfs fi show:
Label: none  uuid: 24b768c3-2141-44bf-ae93-1c3833c8c8e3
    Total devices 2 FS bytes used 475.08GiB
    devid    4 size 931.51GiB used 612.06GiB path /dev/sdb
    devid    5 size 931.51GiB used 613.09GiB path /dev/sdc

btrfs fi df /mnt:
Data, RAID1: total=603.00GiB, used=468.03GiB
System, RAID1: total=64.00MiB, used=112.00KiB
System, single: total=32.00MiB, used=0.00B
Metadata, RAID1: total=9.00GiB, used=7.04GiB
Metadata, single: total=1.00GiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

What is the recommended procedure at this point? Run btrfs check 
--repair? I have backups so losing a file or two isn't critical, but I 
really don't want to go through the effort of a bare metal reinstall.


In the process of researching this I did uncover a bad DIMM. Am I 
correct that the problems I'm seeing are likely linked to the resulting 
memory errors?


Thx in advance,

-steve
