Yet another system upgraded to 4.19 and showing strange issues.

btrfs-cleaner is showing as ~90-100% busy in iotop:

   TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
1340 be/4 root 0.00 B/s 0.00 B/s 0.00 % 92.88 % [btrfs-cleaner]


Note disk read, disk write are 0.00 B/s.


iostat -mx 1 shows all disks are ~100% busy, yet there are 0 reads and 0 writes to them:

Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.91 0.00 0.00 0.00 90.80 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 100.00 sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 100.00



btrfs-cleaner persists 100% busy after reboots. The system is generally not very responsive.



# echo w > /proc/sysrq-trigger

# dmesg -c
[  931.585611] sysrq: SysRq : Show Blocked State
[  931.585715]   task                        PC stack   pid father
[  931.590168] btrfs-cleaner   D    0  1340      2 0x80000000
[  931.590175] Call Trace:
[  931.590190]  __schedule+0x29e/0x840
[  931.590195]  schedule+0x2c/0x80
[  931.590199]  schedule_timeout+0x258/0x360
[  931.590204]  io_schedule_timeout+0x1e/0x50
[  931.590208]  wait_for_completion_io+0xb7/0x140
[  931.590214]  ? wake_up_q+0x80/0x80
[  931.590219]  submit_bio_wait+0x61/0x90
[  931.590225]  blkdev_issue_discard+0x7a/0xd0
[  931.590266]  btrfs_issue_discard+0x123/0x160 [btrfs]
[  931.590299]  btrfs_discard_extent+0xd8/0x160 [btrfs]
[  931.590335]  btrfs_finish_extent_commit+0xe2/0x240 [btrfs]
[  931.590382]  btrfs_commit_transaction+0x573/0x840 [btrfs]
[  931.590415]  ? btrfs_block_rsv_check+0x25/0x70 [btrfs]
[  931.590456]  __btrfs_end_transaction+0x2be/0x2d0 [btrfs]
[  931.590493]  btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
[  931.590530]  btrfs_drop_snapshot+0x489/0x800 [btrfs]
[  931.590567]  btrfs_clean_one_deleted_snapshot+0xbb/0xf0 [btrfs]
[  931.590607]  cleaner_kthread+0x136/0x160 [btrfs]
[  931.590612]  kthread+0x120/0x140
[  931.590646]  ? btree_submit_bio_start+0x20/0x20 [btrfs]
[  931.590658]  ? kthread_bind+0x40/0x40
[  931.590661]  ret_from_fork+0x22/0x40




After rebooting to 4.19.3, I've started seeing read errors. There were no errors with a previous kernel (4.17.14); disks are healthy according to SMART; no errors reported when we read the whole surface with dd.

# grep -i btrfs.*corrected /var/log/syslog|wc -l
156

Things like:

Nov 22 12:17:43 lxd05 kernel: [ 711.538836] BTRFS info (device sda2): read error corrected: ino 0 off 3739083145216 (dev /dev/sdc2 sector 101088384) Nov 22 12:17:43 lxd05 kernel: [ 711.538905] BTRFS info (device sda2): read error corrected: ino 0 off 3739083149312 (dev /dev/sdc2 sector 101088392) Nov 22 12:17:43 lxd05 kernel: [ 711.538958] BTRFS info (device sda2): read error corrected: ino 0 off 3739083153408 (dev /dev/sdc2 sector 101088400) Nov 22 12:17:43 lxd05 kernel: [ 711.539006] BTRFS info (device sda2): read error corrected: ino 0 off 3739083157504 (dev /dev/sdc2 sector 101088408)


Yet - 0 errors, according to stats, not sure if it's expected or not:

# btrfs device stats /data/lxd
[/dev/sda2].write_io_errs    0
[/dev/sda2].read_io_errs     0
[/dev/sda2].flush_io_errs    0
[/dev/sda2].corruption_errs  0
[/dev/sda2].generation_errs  0
[/dev/sdb2].write_io_errs    0
[/dev/sdb2].read_io_errs     0
[/dev/sdb2].flush_io_errs    0
[/dev/sdb2].corruption_errs  0
[/dev/sdb2].generation_errs  0
[/dev/sdc2].write_io_errs    0
[/dev/sdc2].read_io_errs     0
[/dev/sdc2].flush_io_errs    0
[/dev/sdc2].corruption_errs  0
[/dev/sdc2].generation_errs  0


# btrfs fi usage /data/lxd
Overall:
    Device size:                   5.18TiB
    Device allocated:              2.36TiB
    Device unallocated:            2.83TiB
    Device missing:                  0.00B
    Used:                          1.93TiB
    Free (estimated):              1.62TiB      (min: 1.62TiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID1: Size:1.17TiB, Used:983.87GiB
   /dev/sda2     796.00GiB
   /dev/sdb2     799.00GiB
   /dev/sdc2     797.00GiB

Metadata,RAID1: Size:10.00GiB, Used:4.09GiB
   /dev/sda2       8.00GiB
   /dev/sdb2       5.00GiB
   /dev/sdc2       7.00GiB

System,RAID1: Size:64.00MiB, Used:224.00KiB
   /dev/sda2      32.00MiB
   /dev/sdb2      64.00MiB
   /dev/sdc2      32.00MiB

Unallocated:
   /dev/sda2     964.46GiB
   /dev/sdb2     964.43GiB
   /dev/sdc2     964.46GiB


# btrfs fi df /data/lxd
Data, RAID1: total=1.17TiB, used=983.04GiB
System, RAID1: total=64.00MiB, used=224.00KiB
Metadata, RAID1: total=10.00GiB, used=4.08GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


# btrfs fi show /data/lxd
Label: 'lxd5'  uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
        Total devices 3 FS bytes used 987.12GiB
        devid    1 size 1.73TiB used 804.03GiB path /dev/sda2
        devid    2 size 1.73TiB used 804.06GiB path /dev/sdb2
        devid    3 size 1.73TiB used 804.03GiB path /dev/sdc2


Tomasz Chmielewski
https://lxadm.com

Reply via email to