Hi,

my btrfs-based system (~2.5 TiB stored in the filesystem, replicated onto two disks, running kernel 4.9.6-1-ARCH) locked up after I enabled quotas and had a btrfs-size tool running. Now the question is how to recover from that. Whenever I mount the filesystem, I end up with btrfs-cleaner and a kworker hanging:
> [ 491.154603] INFO: task kworker/u128:3:105 blocked for more than 120 seconds.
> [ 491.188559] Not tainted 4.9.6-1-ARCH #1
> [ 491.209443] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 491.247188] kworker/u128:3 D 0 105 2 0x00000000
> [ 491.247208] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper [btrfs]
> [ 491.247210] ffff880103bc8800 0000000000000000 ffff8801034ba7c0 ffff8801062580c0
> [ 491.247213] ffff880105fe8d40 ffffc90000c63c30 ffffffff81605cdf ffff8801034ba7c0
> [ 491.247215] 0000000000000001 ffff8801062580c0 ffffffff810aa490 ffff8801034ba7c0
> [ 491.247217] Call Trace:
> [ 491.247222] [<ffffffff81605cdf>] ? __schedule+0x22f/0x6e0
> [ 491.247224] [<ffffffff810aa490>] ? wake_up_q+0x80/0x80
> [ 491.247226] [<ffffffff816061cd>] schedule+0x3d/0x90
> [ 491.247237] [<ffffffffa01d248e>] wait_current_trans.isra.8+0xbe/0x110 [btrfs]
> [ 491.247240] [<ffffffff810c4200>] ? wake_atomic_t_function+0x60/0x60
> [ 491.247249] [<ffffffffa01d4d1c>] start_transaction+0x2bc/0x4a0 [btrfs]
> [ 491.247258] [<ffffffffa01d4f18>] btrfs_start_transaction+0x18/0x20 [btrfs]
> [ 491.247267] [<ffffffffa02442ba>] btrfs_qgroup_rescan_worker+0x7a/0x610 [btrfs]
> [ 491.247278] [<ffffffffa0209abd>] btrfs_scrubparity_helper+0x7d/0x350 [btrfs]
> [ 491.247288] [<ffffffffa0209dde>] btrfs_qgroup_rescan_helper+0xe/0x10 [btrfs]
> [ 491.247291] [<ffffffff81098a95>] process_one_work+0x1e5/0x470
> [ 491.247292] [<ffffffff81098d68>] worker_thread+0x48/0x4e0
> [ 491.247294] [<ffffffff81098d20>] ? process_one_work+0x470/0x470
> [ 491.247296] [<ffffffff8109e8f9>] kthread+0xd9/0xf0
> [ 491.247298] [<ffffffff8102d752>] ? __switch_to+0x2d2/0x630
> [ 491.247299] [<ffffffff8109e820>] ? kthread_park+0x60/0x60
> [ 491.247301] [<ffffffff8160a995>] ret_from_fork+0x25/0x30
> [ 491.247306] INFO: task btrfs-cleaner:148 blocked for more than 120 seconds.
> [ 491.280723] Not tainted 4.9.6-1-ARCH #1
> [ 491.302026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 491.340471] btrfs-cleaner D 0 148 2 0x00000000
> [ 491.340475] ffff880103bc8800 0000000000000000 ffff8801032acf80 ffff8801062580c0
> [ 491.340478] ffff8801032a8d40 ffffc90000cc3cf0 ffffffff81605cdf ffff8801032acf80
> [ 491.340480] 0000000000000001 ffff8801062580c0 ffffffff810aa490 ffff8801032acf80
> [ 491.340482] Call Trace:
> [ 491.340487] [<ffffffff81605cdf>] ? __schedule+0x22f/0x6e0
> [ 491.340489] [<ffffffff810aa490>] ? wake_up_q+0x80/0x80
> [ 491.340491] [<ffffffff816061cd>] schedule+0x3d/0x90
> [ 491.340505] [<ffffffffa01d248e>] wait_current_trans.isra.8+0xbe/0x110 [btrfs]
> [ 491.340508] [<ffffffff810c4200>] ? wake_atomic_t_function+0x60/0x60
> [ 491.340517] [<ffffffffa01d4d1c>] start_transaction+0x2bc/0x4a0 [btrfs]
> [ 491.340525] [<ffffffffa01d4f18>] btrfs_start_transaction+0x18/0x20 [btrfs]
> [ 491.340534] [<ffffffffa01bb819>] btrfs_drop_snapshot+0x4e9/0x880 [btrfs]
> [ 491.340542] [<ffffffffa01d3e7b>] btrfs_clean_one_deleted_snapshot+0xbb/0x110 [btrfs]
> [ 491.340552] [<ffffffffa01ca7f1>] cleaner_kthread+0x141/0x1b0 [btrfs]
> [ 491.340560] [<ffffffffa01ca6b0>] ? btrfs_destroy_pinned_extent+0x120/0x120 [btrfs]
> [ 491.340562] [<ffffffff8109e8f9>] kthread+0xd9/0xf0
> [ 491.340564] [<ffffffff8102d752>] ? __switch_to+0x2d2/0x630
> [ 491.340565] [<ffffffff8109e820>] ? kthread_park+0x60/0x60
> [ 491.340566] [<ffffffff8160a995>] ret_from_fork+0x25/0x30

Unfortunately, whenever I try to execute a btrfs command against the mounted filesystem -- e.g. to disable quotas -- the command hangs. To make matters worse, that's in a shell without job control over a serial console.
Relevant output from ps:

> 105 0 0 DW [kworker/u128:3]
> 107 0 0 SW [kworker/u128:5]
> 111 0 0 SW< [bioset]
> 112 0 0 SW< [bioset]
> 113 0 0 SW< [bioset]
> 115 0 0 SW [kworker/1:2]
> 117 0 0 SW< [kworker/0:1H]
> 118 0 0 SW< [kworker/1:1H]
> 122 0 0 SW< [bioset]
> 123 0 6724 S sh -i
> 128 0 0 SW< [btrfs-worker]
> 129 0 0 SW< [kworker/u129:0]
> 130 0 0 SW< [btrfs-worker-hi]
> 131 0 0 SW< [btrfs-delalloc]
> 132 0 0 SW< [btrfs-flush_del]
> 133 0 0 SW< [btrfs-cache]
> 134 0 0 SW< [btrfs-submit]
> 135 0 0 SW< [btrfs-fixup]
> 136 0 0 SW< [btrfs-endio]
> 137 0 0 SW< [btrfs-endio-met]
> 138 0 0 SW< [btrfs-endio-met]
> 139 0 0 SW< [btrfs-endio-rai]
> 140 0 0 SW< [btrfs-endio-rep]
> 141 0 0 SW< [btrfs-rmw]
> 142 0 0 SW< [btrfs-endio-wri]
> 143 0 0 SW< [btrfs-freespace]
> 144 0 0 SW< [btrfs-delayed-m]
> 145 0 0 SW< [btrfs-readahead]
> 146 0 0 SW< [btrfs-qgroup-re]
> 147 0 0 SW< [btrfs-extent-re]
> 148 0 0 DW [btrfs-cleaner]
> 149 0 0 RW [btrfs-transacti]

So there's always a running btrfs-transaction. The kernel messages start off like this:

> [ 3.900674] BTRFS: device fsid e7ef324b-c81e-4ccf-941d-713b807ffab4 devid 1 transid 2030007 /dev/sdb2
> [ 3.942600] BTRFS: device fsid e7ef324b-c81e-4ccf-941d-713b807ffab4 devid 2 transid 2030007 /dev/sda2
> [ 14.569488] BTRFS info (device sda2): disk space caching is enabled
> [ 14.569491] BTRFS info (device sda2): has skinny extents
> [ 14.826782] random: crng init done
> [ 30.738810] BTRFS info (device sda2): checking UUID tree
> [ 62.916772] BTRFS info (device sda2): The free space cache file (880598319104) is invalid. skip it
> [ 62.916772]

The actual disk traffic quiets down after a while, without any further message printed into dmesg -- it'd be useful to know when it's done checking the UUID tree.

Long story short: Is there a way for me to disable quotas again without mounting the filesystem? Or a way to get btrfs to not spawn cleanup tasks before I can disable quotas?
I have many, many qgroups now because of many snapshots created by snapper. Even if I try to touch these, the command hangs.

Kind regards and thanks
Philipp Kern