On 25/06/16 00:52, Steven Haigh wrote:
> Ok, so I figured that despite what the BTRFS wiki seems to imply, the
> 'multi parity' support just isn't stable enough to be used. So, I'm
> trying to revert to what I had before.
> 
> My setup consists of:
>       * 2 x 3TB drives +
>       * 3 x 2TB drives.
> 
> I've got (had?) about 4.9TB of data.
> 
> My idea was to convert the existing setup to a 'single' profile using a
> balance, delete the 3 x 2TB drives from the BTRFS filesystem, then create
> a new mdadm-based RAID6 (5 drives degraded to 3), create a new filesystem
> on that, then copy the data across.
> 
> So, great - first the balance:
> $ btrfs balance start -dconvert=single -mconvert=single -f
> (Yes, I know it'll reduce the metadata redundancy.)
> 
> This was promptly followed by a system crash.
> 
> After a reboot, I can no longer mount the filesystem read-write:
> [  134.768908] BTRFS info (device xvdd): disk space caching is enabled
> [  134.769032] BTRFS: has skinny extents
> [  134.769856] BTRFS: failed to read the system array on xvdd
> [  134.776055] BTRFS: open_ctree failed
> [  143.900055] BTRFS info (device xvdd): allowing degraded mounts
> [  143.900152] BTRFS info (device xvdd): not using ssd allocation scheme
> [  143.900243] BTRFS info (device xvdd): disk space caching is enabled
> [  143.900330] BTRFS: has skinny extents
> [  143.901860] BTRFS warning (device xvdd): devid 4 uuid
> 61ccce61-9787-453e-b793-1b86f8015ee1 is missing
> [  146.539467] BTRFS: missing devices(1) exceeds the limit(0), writeable
> mount is not allowed
> [  146.552051] BTRFS: open_ctree failed
> 
> I can mount it read-only, but then I also get crashes when it seems to
> hit a read error:
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064
> csum 3245290974 wanted 982056704 mirror 0
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 390821102 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 550556475 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 1279883714 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 2566472073 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 1876236691 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 3350537857 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 3319706190 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 2377458007 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 2066127208 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 657140479 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 1239359620 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 1598877324 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 1082738394 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 371906697 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 2156787247 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 3777709399 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 180814340 wanted 982056704 mirror 1
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/extent_io.c:2401!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: btrfs x86_pkg_temp_thermal coretemp crct10dif_pclmul
> xor aesni_intel aes_x86_64 lrw gf128mul glue_helper pcspkr raid6_pq
> ablk_helper cryptd nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables
> xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 2610978113 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 59610051 wanted 982056704 mirror 1
> CPU: 1 PID: 1273 Comm: kworker/u4:4 Not tainted 4.4.13-1.el7xen.x86_64 #1
> Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
> task: ffff880079ce12c0 ti: ffff880078788000 task.ti: ffff880078788000
> RIP: e030:[<ffffffffa039e0e0>]  [<ffffffffa039e0e0>]
> btrfs_check_repairable+0x100/0x110 [btrfs]
> RSP: e02b:ffff88007878bcc8  EFLAGS: 00010297
> RAX: 0000000000000001 RBX: ffff880079db2080 RCX: 0000000000000003
> RDX: 0000000000000003 RSI: 000004db13730000 RDI: ffff88007889ef38
> RBP: ffff88007878bce0 R08: 000004db01c00000 R09: 000004dbc1c00000
> R10: ffff88006bb0c1b8 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff88007b213ea8 R14: 0000000000001000 R15: 0000000000000000
> FS:  00007fbf2fdc0880(0000) GS:ffff88007f500000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fbf2d96702b CR3: 000000007969f000 CR4: 0000000000042660
> Stack:
>  ffffea00019db180 0000000000010000 ffff88007b213f30 ffff88007878bd88
>  ffffffffa03a0808 ffff880002d15500 ffff88007878bd18 ffff880079ce12c0
>  ffff88007b213e40 000000000000001f ffff880000000000 ffff88006bb0c048
> Call Trace:
>  [<ffffffffa03a0808>] end_bio_extent_readpage+0x428/0x560 [btrfs]
>  [<ffffffff812f40c0>] bio_endio+0x40/0x60
>  [<ffffffffa0375a6c>] end_workqueue_fn+0x3c/0x40 [btrfs]
>  [<ffffffffa03af3f1>] normal_work_helper+0xc1/0x300 [btrfs]
>  [<ffffffff810a1352>] ? finish_task_switch+0x82/0x280
>  [<ffffffffa03af702>] btrfs_endio_helper+0x12/0x20 [btrfs]
>  [<ffffffff81093844>] process_one_work+0x154/0x400
>  [<ffffffff8109438a>] worker_thread+0x11a/0x460
>  [<ffffffff8165a24f>] ? __schedule+0x2bf/0x880
>  [<ffffffff81094270>] ? rescuer_thread+0x2f0/0x2f0
>  [<ffffffff810993f9>] kthread+0xc9/0xe0
>  [<ffffffff81099330>] ? kthread_park+0x60/0x60
>  [<ffffffff8165e14f>] ret_from_fork+0x3f/0x70
>  [<ffffffff81099330>] ? kthread_park+0x60/0x60
> Code: 00 31 c0 eb d5 8d 48 02 eb d9 31 c0 45 89 e0 48 c7 c6 a0 f8 3f a0
> 48 c7 c7 00 05 41 a0 e8 c9 f2 fa e0 31 c0 e9 70 ff ff ff 0f 0b <0f> 0b
> 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
> RIP  [<ffffffffa039e0e0>] btrfs_check_repairable+0x100/0x110 [btrfs]
>  RSP <ffff88007878bcc8>
> ------------[ cut here ]------------
> <more crashes until the system hangs>
> 
> So, where to from here? Sadly, I feel there is data loss in my future,
> but I'm not sure how to minimise it :\
> 
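
For reference, the capacity arithmetic behind that plan can be sketched
as below. This is only a rough check under stated assumptions: drive
sizes are taken as decimal TB, filesystem and md metadata overhead is
ignored, and because it isn't clear whether "4.9" was read off a tool
reporting TB or TiB, both readings are checked. The function and
variable names are made up for this illustration.

```python
# Rough capacity check for the migration plan (illustrative names only).
TB = 10**12   # decimal terabyte, as drives are sold
TIB = 2**40   # binary tebibyte, as btrfs tools report

def single_capacity(sizes):
    """'single' stores one copy of every chunk, so capacity is the raw sum."""
    return sum(sizes)

def raid6_capacity(members, member_size):
    """RAID6 spends two members' worth of space on parity.

    A degraded array still exposes the full (members - 2) capacity;
    it just has less (or no) redundancy until the missing members return.
    """
    if members < 4:
        raise ValueError("RAID6 needs at least 4 member slots")
    return (members - 2) * member_size

big = [3 * TB, 3 * TB]            # stay in btrfs and hold everything as 'single'
small = [2 * TB, 2 * TB, 2 * TB]  # removed from btrfs, become the first md members

single_bytes = single_capacity(big)
# Assumption: every md member is limited to the smallest drive (2TB), so the
# two 3TB drives would only contribute 2TB each once they are added later.
md_bytes = raid6_capacity(5, min(small))

for label, data_bytes in (("4.9 TB", 4.9 * TB), ("4.9 TiB", 4.9 * TIB)):
    for name, cap in (("btrfs 'single' on 2 x 3TB", single_bytes),
                      ("md RAID6 with 5 x 2TB slots", md_bytes)):
        headroom = (cap - data_bytes) / TIB
        print(f"{label} of data on {name}: {headroom:.2f} TiB headroom")
```

Under those assumptions both stages come out at 6TB raw, so the data
should fit either way, but with the RAID6 running on 3 of 5 members
there is no redundancy left during the copy.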

The more I look at this, the more I'm wondering if this is a total
corruption scenario:

$ btrfs restore -D -l /dev/xvdc
warning, device 4 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=11224137564160
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=59973363410688
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=59973363410688
Couldn't read chunk tree
Could not open root, trying backup super

$ btrfs restore -D -l /dev/xvdd
warning, device 4 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=11224137564160
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 1 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 1 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super

$ btrfs restore -D -l /dev/xvde
warning, device 4 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=11224137564160
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
checksum verify failed on 11224137170944 found C9115A93 wanted 14526E28
checksum verify failed on 11224137170944 found C9115A93 wanted 14526E28
bytenr mismatch, want=11224137170944, have=59973365311232
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
checksum verify failed on 11224137170944 found C9115A93 wanted 14526E28
checksum verify failed on 11224137170944 found C9115A93 wanted 14526E28
bytenr mismatch, want=11224137170944, have=59973365311232
ERROR: cannot read chunk root
Could not open root, trying backup super

$ btrfs restore -D -l /dev/xvdf
warning, device 4 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=11224137564160
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super

$ btrfs restore -D -l /dev/xvdg
warning, device 4 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=11224137564160
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=11224137105408
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=11224137105408
ERROR: cannot read chunk root
Could not open root, trying backup super

If I mount it read-only:
$ mount -o nossd,degraded,ro /dev/xvdc /mnt/fileshare/

$ btrfs device usage /mnt/fileshare/

/dev/xvdc, ID: 1
   Device size:             2.73TiB
   Device slack:              0.00B
   Data,single:             5.00GiB
   Data,RAID6:              1.60TiB
   Data,RAID6:              2.75GiB
   Data,RAID6:              1.00GiB
   Metadata,RAID6:          2.06GiB
   System,RAID6:           32.00MiB
   Unallocated:             1.12TiB

/dev/xvdd, ID: 2
   Device size:             2.73TiB
   Device slack:              0.00B
   Data,single:             1.00GiB
   Data,RAID6:              1.60TiB
   Data,RAID6:              7.07GiB
   Data,RAID6:              1.00GiB
   Metadata,RAID6:          2.06GiB
   System,RAID6:           32.00MiB
   Unallocated:             1.12TiB

/dev/xvde, ID: 3
   Device size:             1.82TiB
   Device slack:              0.00B
   Data,RAID6:              1.60TiB
   Data,RAID6:              7.07GiB
   Metadata,RAID6:          2.06GiB
   System,RAID6:           32.00MiB
   Unallocated:           213.23GiB

/dev/xvdf, ID: 6
   Device size:             1.82TiB
   Device slack:              0.00B
   Data,RAID6:            882.62GiB
   Data,RAID6:              1.00GiB
   Metadata,RAID6:          2.06GiB
   Unallocated:           977.33GiB

/dev/xvdg, ID: 5
   Device size:             1.82TiB
   Device slack:              0.00B
   Data,RAID6:              1.60TiB
   Data,RAID6:              7.07GiB
   Metadata,RAID6:          2.06GiB
   System,RAID6:           32.00MiB
   Unallocated:           213.23GiB

missing, ID: 4
   Device size:               0.00B
   Device slack:           16.00EiB
   Data,RAID6:            758.00GiB
   Data,RAID6:              4.31GiB
   System,RAID6:           32.00MiB
   Unallocated:             1.07TiB
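
To get a feel for how far the conversion got before the crash, the
listing above can be totalled per block-group profile. The following is
a minimal sketch, not a proper parser: the regular expression only
assumes the line layout seen in this output, the sample text is limited
to the two devices that show any Data,single, and the helper name is
invented for illustration. The totals are raw per-device allocation,
not logical data size.

```python
import re
from collections import defaultdict

# Size suffixes as printed by the listing above.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30,
         "TiB": 2**40, "PiB": 2**50, "EiB": 2**60}

# Matches lines such as "   Data,RAID6:              1.60TiB".
ALLOC_LINE = re.compile(
    r"^\s+(Data|Metadata|System),(\w+):\s+([\d.]+)(B|[KMGTPE]iB)\s*$")

SAMPLE = """\
/dev/xvdc, ID: 1
   Device size:             2.73TiB
   Device slack:              0.00B
   Data,single:             5.00GiB
   Data,RAID6:              1.60TiB
   Data,RAID6:              2.75GiB
   Data,RAID6:              1.00GiB
   Metadata,RAID6:          2.06GiB
   System,RAID6:           32.00MiB
   Unallocated:             1.12TiB

/dev/xvdd, ID: 2
   Device size:             2.73TiB
   Device slack:              0.00B
   Data,single:             1.00GiB
   Data,RAID6:              1.60TiB
   Data,RAID6:              7.07GiB
   Data,RAID6:              1.00GiB
   Metadata,RAID6:          2.06GiB
   System,RAID6:           32.00MiB
   Unallocated:             1.12TiB
"""

def allocated_by_profile(text):
    """Sum raw allocated bytes per (type, profile) across all devices."""
    totals = defaultdict(int)
    for line in text.splitlines():
        match = ALLOC_LINE.match(line)
        if match:
            kind, profile, number, unit = match.groups()
            totals[(kind, profile)] += int(round(float(number) * UNITS[unit]))
    return dict(totals)

totals = allocated_by_profile(SAMPLE)
for (kind, profile), size in sorted(totals.items()):
    print(f"{kind},{profile}: {size / 2**30:.2f} GiB")
```

Only xvdc and xvdd show any Data,single at all (5.00GiB and 1.00GiB),
which suggests the balance had barely started when the system went
down; nearly everything is still allocated as RAID6.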

Hoping this isn't a total loss ;)

-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
