Hello,

I have a 4x 2TB HDD RAID5 array and one of the disks started going bad
(according to SMART; btrfs itself saw no read/write errors). After swapping
the failing disk for a new one I ran "btrfs replace", which resulted in a
kernel crash at about 0.5% done.
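For reference, the replace was started roughly like this (devid 4 being the
missing disk, /bstorage the mount point):

btrfs replace start 4 /dev/mapper/bcrypt_sdj1 /bstorage

The crash trace: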


BTRFS info (device dm-10): dev_replace from <missing disk> (devid 4) to /dev/mapper/bcrypt_sdj1 started

WARNING: CPU: 1 PID: 30627 at fs/btrfs/inode.c:9125
btrfs_destroy_inode+0x271/0x290()
Modules linked in: algif_skcipher af_alg evdev xt_tcpudp nf_conntrack_ipv4
nf_defrag_ipv4 xt_conntrack nf_conntrack x86_pkg_temp_thermal kvm_intel kvm
irqbypass ghash_clmulni_intel psmouse iptable_filter ip_tables x_tables fan
thermal battery processor button autofs4
CPU: 1 PID: 30627 Comm: umount Not tainted 4.5.0 #1
Hardware name: System manufacturer System Product Name/P8Z77-V LE PLUS, BIOS
0910 03/18/2014
0000000000000000 ffffffff813971f9 0000000000000000 ffffffff817f2b34
ffffffff8107ab78 ffff8800d55daa00 ffff8800cb990998 ffff880212d5b800
0000000000000000 ffff8801fcc0ff58 ffffffff812dbfc1 ffff8800d55daa00
Call Trace:
[<ffffffff813971f9>] ? dump_stack+0x46/0x5d
[<ffffffff8107ab78>] ? warn_slowpath_common+0x78/0xb0
[<ffffffff812dbfc1>] ? btrfs_destroy_inode+0x271/0x290
[<ffffffff812b69a2>] ? btrfs_put_block_group_cache+0x72/0xa0
[<ffffffff812c71d6>] ? close_ctree+0x146/0x330
[<ffffffff81154d9f>] ? generic_shutdown_super+0x5f/0xe0
[<ffffffff81155029>] ? kill_anon_super+0x9/0x10
[<ffffffff8129c5ed>] ? btrfs_kill_super+0xd/0x90
[<ffffffff8115534f>] ? deactivate_locked_super+0x2f/0x60
[<ffffffff8116f376>] ? cleanup_mnt+0x36/0x80
[<ffffffff81091f3c>] ? task_work_run+0x6c/0x90
[<ffffffff810011aa>] ? exit_to_usermode_loop+0x8a/0x90
[<ffffffff8167bce3>] ? int_ret_from_sys_call+0x25/0x8f
---[ end trace 6a7dec9450d45f9c ]---


The replace continues automatically after a reboot, but it ends up using all
of the memory and crashing the system roughly every 6% of progress (about 8
hours):


BTRFS info (device dm-10): continuing dev_replace from <missing disk> (devid 4) to /dev/mapper/bcrypt_sdj1 @0%
Apr 20 14:03:48 localhost kernel: BTRFS warning (device dm-4): devid 4 uuid e02b8898-c6ce-4c95-956d-24217c470b8a is missing
Apr 20 14:03:52 localhost kernel: BTRFS info (device dm-4): continuing dev_replace from <missing disk> (devid 4) to /dev/mapper/bcrypt_sdj1 @6%
Apr 20 22:38:41 localhost kernel: BTRFS warning (device dm-4): devid 4 uuid e02b8898-c6ce-4c95-956d-24217c470b8a is missing
Apr 20 22:38:46 localhost kernel: BTRFS info (device dm-4): continuing dev_replace from <missing disk> (devid 4) to /dev/mapper/bcrypt_sdj1 @12%
Apr 21 13:14:51 localhost kernel: BTRFS warning (device dm-4): devid 4 uuid e02b8898-c6ce-4c95-956d-24217c470b8a is missing
Apr 21 13:14:55 localhost kernel: BTRFS info (device dm-4): continuing dev_replace from <missing disk> (devid 4) to /dev/mapper/bcrypt_sdj1 @18%
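Between the crashes the progress and error counters can be checked with:

btrfs replace status /bstorage/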


The problem appears to be the "bio-1" slab cache consuming all of the memory:

/proc/meminfo:

MemTotal:        8072852 kB
MemFree:          646108 kB
...
Slab:            6235188 kB
SReclaimable:      49320 kB
SUnreclaim:      6185868 kB

/proc/slabinfo:

# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
bio-1             17588753 17588964    320   12    1 : tunables    0    0    0 : slabdata 1465747 1465747      0
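That is 17588964 bios x 320 bytes, i.e. about 5.2 GiB, which accounts for
nearly all of the ~5.9 GiB of SUnreclaim above. The growth is easy to watch
while the replace runs, e.g.:

grep ^bio-1 /proc/slabinfo
slabtop -o -s c | head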


The replace operation is very slow (with no other load on the system): with
the CFQ scheduler it averages 3x 20 MB/s reads (old disks) and 1.4 MB/s
writes (new disk). With the deadline scheduler performance is better,
averaging 3x 40 MB/s reads and 4 MB/s writes (both schedulers with the
default queue/nr_requests).
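The scheduler was switched at runtime via sysfs; assuming the array sits on
sdg-sdj, something like:

for d in sdg sdh sdi sdj; do echo deadline > /sys/block/$d/queue/scheduler; done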

The write speed seems slow, but I guess that is possible if there are a lot
of random writes. Why is the difference between data read and data written
so large, though? According to iostat, the replace reads 35 times more data
than it writes to the new disk.
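The 35x figure comes from extended iostat statistics collected during the
replace, roughly (again assuming the underlying disks are sdg-sdj):

iostat -mx 60 sdg sdh sdi sdj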


Info:

kernel 4.5 (now 4.5.2, no change)
btrfs-progs 4.5.1
dm-crypt partitions, 4k aligned
mount opts: defaults,noatime,compress=lzo
8GB RAM


btrfs fi usage /bstorage/
WARNING: RAID56 detected, not implemented
WARNING: RAID56 detected, not implemented
WARNING: RAID56 detected, not implemented
Overall:
    Device size:                   9.10TiB
    Device allocated:                0.00B
    Device unallocated:            9.10TiB
    Device missing:                1.82TiB
    Used:                            0.00B
    Free (estimated):                0.00B      (min: 8.00EiB)
    Data ratio:                       0.00
    Metadata ratio:                   0.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID5: Size:1.52TiB, Used:1.46TiB
   /dev/mapper/bcrypt_sdg1       520.00GiB
   /dev/mapper/bcrypt_sdh1       520.00GiB
   /dev/mapper/bcrypt_sdi1       520.00GiB
   missing       520.00GiB

Metadata,RAID5: Size:4.03GiB, Used:1.96GiB
   /dev/mapper/bcrypt_sdg1         1.34GiB
   /dev/mapper/bcrypt_sdh1         1.34GiB
   /dev/mapper/bcrypt_sdi1         1.34GiB
   missing         1.34GiB

System,RAID5: Size:76.00MiB, Used:128.00KiB
   /dev/mapper/bcrypt_sdg1        36.00MiB
   /dev/mapper/bcrypt_sdh1        36.00MiB
   /dev/mapper/bcrypt_sdi1        36.00MiB
   missing         4.00MiB

Unallocated:
   /dev/mapper/bcrypt_sdg1         1.31TiB
   /dev/mapper/bcrypt_sdh1         1.31TiB
   /dev/mapper/bcrypt_sdi1         1.31TiB
   /dev/mapper/bcrypt_sdj1         1.82TiB
   missing         1.31TiB


btrfs fi show /bstorage/
Label: 'btrfs_bstorage'  uuid: 3861e35a-43ef-4293-b2bf-f841c8bcb4e4
        Total devices 5 FS bytes used 1.47TiB
        devid    0 size 1.82TiB used 521.35GiB path /dev/mapper/bcrypt_sdj1
        devid    1 size 1.82TiB used 521.38GiB path /dev/mapper/bcrypt_sdg1
        devid    2 size 1.82TiB used 521.38GiB path /dev/mapper/bcrypt_sdh1
        devid    3 size 1.82TiB used 521.38GiB path /dev/mapper/bcrypt_sdi1
        *** Some devices missing


btrfs device stats /bstorage/
[/dev/mapper/bcrypt_sdj1].write_io_errs   0
[/dev/mapper/bcrypt_sdj1].read_io_errs    0
[/dev/mapper/bcrypt_sdj1].flush_io_errs   0
[/dev/mapper/bcrypt_sdj1].corruption_errs 0
[/dev/mapper/bcrypt_sdj1].generation_errs 0
[/dev/mapper/bcrypt_sdg1].write_io_errs   0
[/dev/mapper/bcrypt_sdg1].read_io_errs    0
[/dev/mapper/bcrypt_sdg1].flush_io_errs   0
[/dev/mapper/bcrypt_sdg1].corruption_errs 0
[/dev/mapper/bcrypt_sdg1].generation_errs 0
[/dev/mapper/bcrypt_sdh1].write_io_errs   0
[/dev/mapper/bcrypt_sdh1].read_io_errs    0
[/dev/mapper/bcrypt_sdh1].flush_io_errs   0
[/dev/mapper/bcrypt_sdh1].corruption_errs 0
[/dev/mapper/bcrypt_sdh1].generation_errs 0
[/dev/mapper/bcrypt_sdi1].write_io_errs   0
[/dev/mapper/bcrypt_sdi1].read_io_errs    0
[/dev/mapper/bcrypt_sdi1].flush_io_errs   0
[/dev/mapper/bcrypt_sdi1].corruption_errs 0
[/dev/mapper/bcrypt_sdi1].generation_errs 0
[(null)].write_io_errs   0
[(null)].read_io_errs    0
[(null)].flush_io_errs   0
[(null)].corruption_errs 0
[(null)].generation_errs 0


btrfs dev usage /bstorage/
/dev/mapper/bcrypt_sdg1, ID: 1
   Device size:             1.82TiB
   Data,RAID5:            520.00GiB
   Metadata,RAID5:          1.34GiB
   System,RAID5:            4.00MiB
   System,RAID5:           32.00MiB
   Unallocated:             1.31TiB

/dev/mapper/bcrypt_sdh1, ID: 2
   Device size:             1.82TiB
   Data,RAID5:            520.00GiB
   Metadata,RAID5:          1.34GiB
   System,RAID5:            4.00MiB
   System,RAID5:           32.00MiB
   Unallocated:             1.31TiB

/dev/mapper/bcrypt_sdi1, ID: 3
   Device size:             1.82TiB
   Data,RAID5:            520.00GiB
   Metadata,RAID5:          1.34GiB
   System,RAID5:            4.00MiB
   System,RAID5:           32.00MiB
   Unallocated:             1.31TiB

/dev/mapper/bcrypt_sdj1, ID: 0
   Device size:             1.82TiB
   Unallocated:             1.82TiB

missing, ID: 4
   Device size:               0.00B
   Data,RAID5:            520.00GiB
   Metadata,RAID5:          1.34GiB
   System,RAID5:            4.00MiB
   Unallocated:             1.31TiB