kernel: 3.18.7 + all patches since 3.19 + the daily Filipe ;)

For the last few days I've been getting an awful lot of stuck tasks
after mundane operations like a simple rsync, a fallocate or just
doing a manual "sync".

The symptom is always 100% CPU use and a hung task (the user-space fallocate
or sync, or the [btrfs-transaction] kthread on the eventual transaction commit).

This happens even without stress (idle single-disk fs/system, no mem pressure)
and very irregularly. Today I got particularly unlucky and could trigger it
repeatedly, simply by doing a bunch of small fallocates on a fresh subvolume:
the first few would work and then - boom.
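
Roughly, the workload amounts to something like the sketch below; the file
count, extent size and the trailing fsync are illustrative, not an exact
record of what I ran:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Sketch of the repro: create fresh files on the new subvolume and
 * preallocate a small range in each.  Counts and sizes are illustrative. */
int main(int argc, char **argv)
{
	const char *dir = argc > 1 ? argv[1] : ".";	/* fresh subvolume */
	char path[4096];

	for (int i = 0; i < 64; i++) {
		snprintf(path, sizeof(path), "%s/falloc-%d", dir, i);
		int fd = open(path, O_CREAT | O_RDWR, 0644);
		if (fd < 0) { perror("open"); return 1; }

		/* small preallocation; the first few succeed, then one hangs */
		if (fallocate(fd, 0, 0, 1 << 20) < 0) {
			perror("fallocate");
			return 1;
		}

		/* fsync afterwards, matching the second trace below */
		fsync(fd);
		close(fd);
	}
	return 0;
}

Run it against a directory on the fresh subvolume, e.g.
gcc -O2 -o falloc-repro falloc-repro.c && ./falloc-repro /mnt/<new-subvol>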

A full collection of several SysRq traces is at:
https://gist.github.com/hhoffstaette/c54ca2813cd47439c4c1

I've inserted spaces between different runs and SysRq segments to make
it a bit easier to read.

The common theme is almost always:

Feb 22 12:44:03 tux kernel: [<ffffffff812baa36>] ? __percpu_counter_add+0x56/0x80
Feb 22 12:44:03 tux kernel: [<ffffffffa07b566c>] ? find_first_extent_bit_state+0x2c/0x80 [btrfs]
Feb 22 12:44:03 tux kernel: [<ffffffff8108bb1b>] ? lock_timer_base.isra.36+0x2b/0x50
Feb 22 12:44:03 tux kernel: [<ffffffff81075023>] ? prepare_to_wait_event+0x83/0x100
Feb 22 12:44:03 tux kernel: [<ffffffffa07980ff>] wait_current_trans.isra.17+0x9f/0x100 [btrfs]
Feb 22 12:44:03 tux kernel: [<ffffffff81075130>] ? __wake_up_sync+0x20/0x20
Feb 22 12:44:03 tux kernel: [<ffffffffa0799ad8>] start_transaction+0x318/0x5a0 [btrfs]
Feb 22 12:44:03 tux kernel: [<ffffffffa0799e17>] btrfs_attach_transaction+0x17/0x20 [btrfs]
Feb 22 12:44:03 tux kernel: [<ffffffffa079486b>] transaction_kthread+0x8b/0x260 [btrfs]
Feb 22 12:44:03 tux kernel: [<ffffffffa07947e0>] ? btrfs_cleanup_transaction+0x520/0x520 [btrfs]
Feb 22 12:44:03 tux kernel: [<ffffffff810685eb>] kthread+0xdb/0x100
Feb 22 12:44:03 tux kernel: [<ffffffff81068510>] ? kthread_create_on_node+0x180/0x180
Feb 22 12:44:03 tux kernel: [<ffffffff8153f1ec>] ret_from_fork+0x7c/0xb0
Feb 22 12:44:03 tux kernel: [<ffffffff81068510>] ? kthread_create_on_node+0x180/0x180

or this:

Feb 22 14:08:45 tux kernel: [<ffffffffa056a809>] btrfs_set_path_blocking+0x49/0x90 [btrfs]
Feb 22 14:08:45 tux kernel: [<ffffffffa056a8a5>] btrfs_clear_path_blocking+0x55/0xe0 [btrfs]
Feb 22 14:08:45 tux kernel: [<ffffffffa056f657>] btrfs_search_slot+0x1f7/0xa60 [btrfs]
Feb 22 14:08:45 tux kernel: [<ffffffffa0585955>] btrfs_update_root+0x55/0x270 [btrfs]
Feb 22 14:08:45 tux kernel: [<ffffffffa060b4fd>] commit_cowonly_roots+0x1e5/0x285 [btrfs]
Feb 22 14:08:45 tux kernel: [<ffffffffa0594135>] btrfs_commit_transaction+0x525/0xbb0 [btrfs]
Feb 22 14:08:45 tux kernel: [<ffffffffa05d671d>] ? btrfs_log_dentry_safe+0x6d/0x80 [btrfs]
Feb 22 14:08:45 tux kernel: [<ffffffffa05a9f5c>] btrfs_sync_file+0x1fc/0x330 [btrfs]
Feb 22 14:08:45 tux kernel: [<ffffffff81191531>] do_fsync+0x51/0x80
Feb 22 14:08:45 tux kernel: [<ffffffff811602e7>] ? SyS_fallocate+0x47/0x80
Feb 22 14:08:45 tux kernel: [<ffffffff811917d0>] SyS_fsync+0x10/0x20

Clearly something is spinning in an endless active loop and not terminating
as it should.

I realize this is vague, but wanted to check:
- is anyone seeing this or something similar recently?
- does anyone have a suspect?

I've already backtracked a bit and can rule out Filipe's recent inode
handling/fsync work. The problem must have snuck in recently (within the
last 2-3 weeks).

Grateful for any suggestions!

-h
