The extent-same ioctl seems to have a locking bug.  My test machines
will run between 0 and 3 days before something gets locked and stays
locked forever.

In the dumps and logs below, 'btrsame' calls the btrfs extent-same ioctl
on its arguments.  As you can see from the trace, it is stuck in this
very ioctl.

'rsync' is famous already and needs no introduction.  It seems
to be stuck just trying to read the file extent-same is reading.

All of these processes are stuck on the same file.  There are lots of
rsyncs because the sender keeps timing out and starting a new rsync,
and the new rsync promptly locks up as soon as it touches the same file
the old ones did.

The rsync processes (and also 'cat' from the command line just now) lock
up when they try to read the same byte range currently being processed by
extent-same.  They are able to read earlier bytes in the file just fine.

Here is /proc/version:

        Linux version 3.18.8-zb64+ (root@sneezy) (gcc version 4.9.1 (Debian 
4.9.1-19) ) #1 SMP PREEMPT Fri Feb 27 01:09:08 EST 2015

Here are the kernel stacks from currently stuck processes:

        USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
        root       583  0.5  0.2 193664 38812 ?        DNs  02:36   2:36 rsync 
--server -vlHogDtprSze.iLsf --timeout=3600 --delete-during --delete-excluded 
--partial --numeric-ids --fuzzy --
        [<ffffffff81181412>] __lock_page_killable+0xa2/0xb0
        [<ffffffff81182f30>] generic_file_read_iter+0x440/0x600
        [<ffffffff811fbbfe>] new_sync_read+0x7e/0xb0
        [<ffffffff811fc394>] vfs_read+0x94/0x180
        [<ffffffff811fcf46>] SyS_read+0x46/0xc0
        [<ffffffff818711ad>] system_call_fastpath+0x16/0x1b
        [<ffffffffffffffff>] 0xffffffffffffffff

        USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
        root      4241  1.3  0.2 148064 36516 ?        DNs  07:11   2:27 rsync 
--server -vlHogDtprSze.iLsf --timeout=3600 --delete-during --delete-excluded 
--partial --numeric-ids --fuzzy --
        [<ffffffff81181412>] __lock_page_killable+0xa2/0xb0
        [<ffffffff81182f30>] generic_file_read_iter+0x440/0x600
        [<ffffffff811fbbfe>] new_sync_read+0x7e/0xb0
        [<ffffffff811fc394>] vfs_read+0x94/0x180
        [<ffffffff811fcf46>] SyS_read+0x46/0xc0
        [<ffffffff818711ad>] system_call_fastpath+0x16/0x1b
        [<ffffffffffffffff>] 0xffffffffffffffff

        USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
        root     10091  0.3  0.2 184208 40416 ?        DNs  Mar02   2:25 rsync 
--server -vlHogDtprSze.iLsf --timeout=3600 --delete-during --delete-excluded 
--partial --numeric-ids --fuzzy --
        [<ffffffff81392d9d>] lock_extent_bits+0x1ad/0x200
        [<ffffffff81392e03>] lock_extent+0x13/0x20
        [<ffffffff81395d3c>] __extent_readpages.constprop.38+0x21c/0x2c0
        [<ffffffff81397109>] extent_readpages+0x179/0x1c0
        [<ffffffff813777bf>] btrfs_readpages+0x1f/0x30
        [<ffffffff8118f606>] __do_page_cache_readahead+0x1b6/0x250
        [<ffffffff8118f77f>] ondemand_readahead+0xdf/0x280
        [<ffffffff8118fa4e>] page_cache_sync_readahead+0x2e/0x50
        [<ffffffff81182fdc>] generic_file_read_iter+0x4ec/0x600
        [<ffffffff811fbbfe>] new_sync_read+0x7e/0xb0
        [<ffffffff811fc394>] vfs_read+0x94/0x180
        [<ffffffff811fcf46>] SyS_read+0x46/0xc0
        [<ffffffff818711ad>] system_call_fastpath+0x16/0x1b
        [<ffffffffffffffff>] 0xffffffffffffffff

        USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
        root     19705  0.8  0.2 192260 34888 ?        DNs  05:11   2:29 rsync 
--server -vlHogDtprSze.iLsf --timeout=3600 --delete-during --delete-excluded 
--partial --numeric-ids --fuzzy --
        [<ffffffff81181412>] __lock_page_killable+0xa2/0xb0
        [<ffffffff81182f30>] generic_file_read_iter+0x440/0x600
        [<ffffffff811fbbfe>] new_sync_read+0x7e/0xb0
        [<ffffffff811fc394>] vfs_read+0x94/0x180
        [<ffffffff811fcf46>] SyS_read+0x46/0xc0
        [<ffffffff818711ad>] system_call_fastpath+0x16/0x1b
        [<ffffffffffffffff>] 0xffffffffffffffff

        USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
        root     21950  0.1  0.0  14800  1452 pts/43   D<+  01:17   0:44 
btrsame [paths redacted]
        [<ffffffff81181362>] __lock_page+0xa2/0xb0
        [<ffffffff81181dc8>] pagecache_get_page+0x168/0x1a0
        [<ffffffff813a805d>] extent_same_get_page+0x2d/0xc0
        [<ffffffff813aef8f>] btrfs_ioctl+0x26bf/0x28c0
        [<ffffffff8120fa10>] do_vfs_ioctl+0x2f0/0x4f0
        [<ffffffff8120fc91>] SyS_ioctl+0x81/0xa0
        [<ffffffff818711ad>] system_call_fastpath+0x16/0x1b
        [<ffffffffffffffff>] 0xffffffffffffffff

        USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
        root     22535 13.4  0.1  75392 16716 ?        DNs  10:02   1:41 rsync 
--server -vlHogDtprSze.iLsf --timeout=3600 --delete-during --delete-excluded 
--partial --numeric-ids --fuzzy --
        [<ffffffffffffffff>] 0xffffffffffffffff

Here are the kernel messages:

        Mar  3 04:29:29 sneezy kernel: [113471.296017] INFO: task rsync:10091 
blocked for more than 1800 seconds.
        Mar  3 04:29:29 sneezy kernel: [113471.296067]       Tainted: G        
W      3.18.8-zb64+ #1
        Mar  3 04:29:29 sneezy kernel: [113471.296109] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
        Mar  3 04:29:29 sneezy kernel: [113471.296184] rsync           D 
0000000000000001     0 10091      1 0x00000004
        Mar  3 04:29:29 sneezy kernel: [113471.296193]  ffff8800088c79a8 
0000000000000086 ffff880069696000 000000000001fc00
        Mar  3 04:29:29 sneezy kernel: [113471.296200]  ffff8800088c7fd8 
000000000001fc00 ffff88040a648000 ffff880069696000
        Mar  3 04:29:29 sneezy kernel: [113471.296207]  ffffffff81391f2c 
ffff880131b509d0 ffff8800088c7908 ffffffff8139152a
        Mar  3 04:29:29 sneezy kernel: [113471.296214] Call Trace:
        Mar  3 04:29:29 sneezy kernel: [113471.296230]  [<ffffffff81391f2c>] ? 
__set_extent_bit+0x44c/0x540
        Mar  3 04:29:29 sneezy kernel: [113471.296238]  [<ffffffff8139152a>] ? 
free_extent_state+0x4a/0xd0
        Mar  3 04:29:29 sneezy kernel: [113471.296245]  [<ffffffff81391f2c>] ? 
__set_extent_bit+0x44c/0x540
        Mar  3 04:29:29 sneezy kernel: [113471.296252]  [<ffffffff81871ea8>] ? 
retint_restore_args+0x13/0x13
        Mar  3 04:29:29 sneezy kernel: [113471.296261]  [<ffffffff8186ca79>] 
schedule+0x29/0x70
        Mar  3 04:29:29 sneezy kernel: [113471.296266]  [<ffffffff81392d9d>] 
lock_extent_bits+0x1ad/0x200
        Mar  3 04:29:29 sneezy kernel: [113471.296274]  [<ffffffff810b78c0>] ? 
__wake_up_sync+0x20/0x20
        Mar  3 04:29:29 sneezy kernel: [113471.296280]  [<ffffffff81392e03>] 
lock_extent+0x13/0x20
        Mar  3 04:29:29 sneezy kernel: [113471.296286]  [<ffffffff81395d3c>] 
__extent_readpages.constprop.38+0x21c/0x2c0
        Mar  3 04:29:29 sneezy kernel: [113471.296293]  [<ffffffff81379130>] ? 
btrfs_direct_IO+0x360/0x360
        Mar  3 04:29:29 sneezy kernel: [113471.296299]  [<ffffffff81397109>] 
extent_readpages+0x179/0x1c0
        Mar  3 04:29:29 sneezy kernel: [113471.296304]  [<ffffffff81379130>] ? 
btrfs_direct_IO+0x360/0x360
        Mar  3 04:29:29 sneezy kernel: [113471.296314]  [<ffffffff813777bf>] 
btrfs_readpages+0x1f/0x30
        Mar  3 04:29:29 sneezy kernel: [113471.296321]  [<ffffffff8118f606>] 
__do_page_cache_readahead+0x1b6/0x250
        Mar  3 04:29:29 sneezy kernel: [113471.296329]  [<ffffffff8118f77f>] 
ondemand_readahead+0xdf/0x280
        Mar  3 04:29:29 sneezy kernel: [113471.296336]  [<ffffffff8118fa4e>] 
page_cache_sync_readahead+0x2e/0x50
        Mar  3 04:29:29 sneezy kernel: [113471.296342]  [<ffffffff81182fdc>] 
generic_file_read_iter+0x4ec/0x600
        Mar  3 04:29:29 sneezy kernel: [113471.296350]  [<ffffffff810136cb>] ? 
__switch_to+0x1fb/0x620
        Mar  3 04:29:29 sneezy kernel: [113471.296358]  [<ffffffff811fbbfe>] 
new_sync_read+0x7e/0xb0
        Mar  3 04:29:29 sneezy kernel: [113471.296367]  [<ffffffff811fc394>] 
vfs_read+0x94/0x180
        Mar  3 04:29:29 sneezy kernel: [113471.296373]  [<ffffffff811fcf46>] 
SyS_read+0x46/0xc0
        Mar  3 04:29:29 sneezy kernel: [113471.296379]  [<ffffffff818711ad>] 
system_call_fastpath+0x16/0x1b
        Mar  3 04:29:29 sneezy kernel: [113471.296386] INFO: task btrsame:21950 
blocked for more than 1800 seconds.
        Mar  3 04:29:29 sneezy kernel: [113471.296428]       Tainted: G        
W      3.18.8-zb64+ #1
        Mar  3 04:29:29 sneezy kernel: [113471.296469] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
        Mar  3 04:29:29 sneezy kernel: [113471.296542] btrsame         D 
0000000000000000     0 21950  30469 0x00000000
        Mar  3 04:29:29 sneezy kernel: [113471.296549]  ffff880003a9bc18 
0000000000000086 ffff880003ff6000 000000000001fc00
        Mar  3 04:29:29 sneezy kernel: [113471.296556]  ffff880003a9bfd8 
000000000001fc00 ffff88005c172000 ffff880003ff6000
        Mar  3 04:29:29 sneezy kernel: [113471.296563]  ffff8801d11de000 
ffff88009cd5d000 ffff880003a9bc68 ffffffff8186c432
        Mar  3 04:29:29 sneezy kernel: [113471.296570] Call Trace:
        Mar  3 04:29:29 sneezy kernel: [113471.296578]  [<ffffffff8186c432>] ? 
__schedule+0x442/0xa60
        Mar  3 04:29:29 sneezy kernel: [113471.296586]  [<ffffffff8186d470>] ? 
bit_wait+0x60/0x60
        Mar  3 04:29:29 sneezy kernel: [113471.296592]  [<ffffffff8101e999>] ? 
read_tsc+0x9/0x10
        Mar  3 04:29:29 sneezy kernel: [113471.296599]  [<ffffffff8186d470>] ? 
bit_wait+0x60/0x60
        Mar  3 04:29:29 sneezy kernel: [113471.296604]  [<ffffffff8186ca79>] 
schedule+0x29/0x70
        Mar  3 04:29:29 sneezy kernel: [113471.296610]  [<ffffffff8186cd98>] 
io_schedule+0x98/0x100
        Mar  3 04:29:29 sneezy kernel: [113471.296616]  [<ffffffff8186d4a4>] 
bit_wait_io+0x34/0x60
        Mar  3 04:29:29 sneezy kernel: [113471.296622]  [<ffffffff8186d32b>] 
__wait_on_bit_lock+0x4b/0xb0
        Mar  3 04:29:29 sneezy kernel: [113471.296628]  [<ffffffff811361a0>] ? 
__delayacct_blkio_end+0x60/0xa0
        Mar  3 04:29:29 sneezy kernel: [113471.296635]  [<ffffffff81181362>] 
__lock_page+0xa2/0xb0
        Mar  3 04:29:29 sneezy kernel: [113471.296642]  [<ffffffff810b7900>] ? 
autoremove_wake_function+0x40/0x40
        Mar  3 04:29:29 sneezy kernel: [113471.296648]  [<ffffffff81181dc8>] 
pagecache_get_page+0x168/0x1a0
        Mar  3 04:29:29 sneezy kernel: [113471.296656]  [<ffffffff813a805d>] 
extent_same_get_page+0x2d/0xc0
        Mar  3 04:29:29 sneezy kernel: [113471.296663]  [<ffffffff813aef8f>] 
btrfs_ioctl+0x26bf/0x28c0
        Mar  3 04:29:29 sneezy kernel: [113471.296669]  [<ffffffff8147321f>] ? 
queue_unplugged+0x3f/0xc0
        Mar  3 04:29:29 sneezy kernel: [113471.296678]  [<ffffffff81476578>] ? 
blk_finish_plug+0x18/0x60
        Mar  3 04:29:29 sneezy kernel: [113471.296684]  [<ffffffff8118f618>] ? 
__do_page_cache_readahead+0x1c8/0x250
        Mar  3 04:29:29 sneezy kernel: [113471.296692]  [<ffffffff8120fa10>] 
do_vfs_ioctl+0x2f0/0x4f0
        Mar  3 04:29:29 sneezy kernel: [113471.296699]  [<ffffffff814b35ba>] ? 
trace_hardirqs_off_thunk+0x3a/0x3c
        Mar  3 04:29:29 sneezy kernel: [113471.296705]  [<ffffffff8120fc91>] 
SyS_ioctl+0x81/0xa0
        Mar  3 04:29:29 sneezy kernel: [113471.296711]  [<ffffffff818711ad>] 
system_call_fastpath+0x16/0x1b

Attachment: signature.asc
Description: Digital signature

Reply via email to