On 20.11.18 г. 21:00 ч., Josef Bacik wrote:
> On Fri, Oct 26, 2018 at 02:41:55PM +0300, Nikolay Borisov wrote:
>> Running btrfs/124 in a loop hung up on me sporadically with the
>> following call trace:
>> btrfs D 0 5760 5324 0x00000000
>> Call Trace:
>> ? __schedule+0x243/0x800
>> schedule+0x33/0x90
>> btrfs_start_ordered_extent+0x10c/0x1b0 [btrfs]
>> ? wait_woken+0xa0/0xa0
>> btrfs_wait_ordered_range+0xbb/0x100 [btrfs]
>> btrfs_relocate_block_group+0x1ff/0x230 [btrfs]
>> btrfs_relocate_chunk+0x49/0x100 [btrfs]
>> btrfs_balance+0xbeb/0x1740 [btrfs]
>> btrfs_ioctl_balance+0x2ee/0x380 [btrfs]
>> btrfs_ioctl+0x1691/0x3110 [btrfs]
>> ? lockdep_hardirqs_on+0xed/0x180
>> ? __handle_mm_fault+0x8e7/0xfb0
>> ? _raw_spin_unlock+0x24/0x30
>> ? __handle_mm_fault+0x8e7/0xfb0
>> ? do_vfs_ioctl+0xa5/0x6e0
>> ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
>> do_vfs_ioctl+0xa5/0x6e0
>> ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
>> ksys_ioctl+0x3a/0x70
>> __x64_sys_ioctl+0x16/0x20
>> do_syscall_64+0x60/0x1b0
>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> Turns out during page writeback it's possible that the code in
>> writepage_delalloc can instantiate a delalloc range which doesn't
>> belong to the page currently being written back. This happens since
>> find_lock_delalloc_range returns up to BTRFS_MAX_EXTENT_SIZE delalloc
>> range when asked and doens't really consider the range of the passed
>> page. When such a foregin range is found the code proceeds to
>> run_delalloc_range and calls the appropriate function to fill the
>> delalloc and create ordered extents. If, however, a failure occurs
>> while this operation is in effect then the clean up code in
>> btrfs_cleanup_ordered_extents will be called. This function has the
>> wrong assumption that caller of run_delalloc_range always properly
>> cleans the first page of the range hence when it calls
>> __endio_write_update_ordered it explicitly ommits the first page of
>> the delalloc range. This is wrong because this function could be
>> cleaning a delalloc range that doesn't belong to the current page. This
>> in turn means that the page cleanup code in __extent_writepage will
>> not really free the initial page for the range, leaving a hanging
>> ordered extent with bytes_left set to 4k. This bug has been present
>> ever since the original introduction of the cleanup code in 524272607e88.
>>
>> Fix this by correctly checking whether the current page belongs to the
>> range being instantiated and if so correctly adjust the range parameters
>> passed for cleaning up. If it doesn't, then just clean the whole OE
>> range directly.
>>
>> Signed-off-by: Nikolay Borisov <nbori...@suse.com>
>> Fixes: 524272607e88 ("btrfs: Handle delalloc error correctly to avoid
>> ordered extent hang")
>
> Can we just remove the endio cleanup in __extent_writepage() and make this do
> the proper cleanup? I'm not sure if that is feasible or not, but seems like
> it
> would make the cleanup stuff less confusing and more straightforward. If not
> you can add
Quickly skimming the code I think the cleanup in __extent_writepage
could be moved into __extent_writepage_io where we have 2 branches that
set PageError. So I guess it could be done, but I will have to
experiment with it.
>
> Reviewed-by: Josef Bacik <jo...@toxicpanda.com>
>
> Thanks,
>
> Josef
>