On 20.11.18 г. 21:00 ч., Josef Bacik wrote:
> On Fri, Oct 26, 2018 at 02:41:55PM +0300, Nikolay Borisov wrote:
>> Running btrfs/124 in a loop hung up on me sporadically with the
>> following call trace:
>>      btrfs           D    0  5760   5324 0x00000000
>>      Call Trace:
>>       ? __schedule+0x243/0x800
>>       schedule+0x33/0x90
>>       btrfs_start_ordered_extent+0x10c/0x1b0 [btrfs]
>>       ? wait_woken+0xa0/0xa0
>>       btrfs_wait_ordered_range+0xbb/0x100 [btrfs]
>>       btrfs_relocate_block_group+0x1ff/0x230 [btrfs]
>>       btrfs_relocate_chunk+0x49/0x100 [btrfs]
>>       btrfs_balance+0xbeb/0x1740 [btrfs]
>>       btrfs_ioctl_balance+0x2ee/0x380 [btrfs]
>>       btrfs_ioctl+0x1691/0x3110 [btrfs]
>>       ? lockdep_hardirqs_on+0xed/0x180
>>       ? __handle_mm_fault+0x8e7/0xfb0
>>       ? _raw_spin_unlock+0x24/0x30
>>       ? __handle_mm_fault+0x8e7/0xfb0
>>       ? do_vfs_ioctl+0xa5/0x6e0
>>       ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
>>       do_vfs_ioctl+0xa5/0x6e0
>>       ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
>>       ksys_ioctl+0x3a/0x70
>>       __x64_sys_ioctl+0x16/0x20
>>       do_syscall_64+0x60/0x1b0
>>       entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> Turns out during page writeback it's possible that the code in
>> writepage_delalloc can instantiate a delalloc range which doesn't
>> belong to the page currently being written back. This happens since
>> find_lock_delalloc_range returns up to BTRFS_MAX_EXTENT_SIZE delalloc
>> range when asked and doens't really consider the range of the passed
>> page. When such a foregin range is found the code proceeds to
>> run_delalloc_range and calls the appropriate function to fill the
>> delalloc and create ordered extents. If, however, a failure occurs
>> while this operation is in effect then the clean up code in
>> btrfs_cleanup_ordered_extents will be called. This function has the
>> wrong assumption that caller of run_delalloc_range always properly
>> cleans the first page of the range hence when it calls
>> __endio_write_update_ordered it explicitly ommits the first page of
>> the delalloc range. This is wrong because this function could be
>> cleaning a delalloc range that doesn't belong to the current page. This
>> in turn means that the page cleanup code in __extent_writepage will
>> not really free the initial page for the range, leaving a hanging
>> ordered extent with bytes_left set to 4k. This bug has been present
>> ever since the original introduction of the cleanup code in 524272607e88.
>>
>> Fix this by correctly checking whether the current page belongs to the
>> range being instantiated and if so correctly adjust the range parameters
>> passed for cleaning up. If it doesn't, then just clean the whole OE
>> range directly.
>>
>> Signed-off-by: Nikolay Borisov <nbori...@suse.com>
>> Fixes: 524272607e88 ("btrfs: Handle delalloc error correctly to avoid 
>> ordered extent hang")
> 
> Can we just remove the endio cleanup in __extent_writepage() and make this do
> the proper cleanup?  I'm not sure if that is feasible or not, but seems like 
> it
> would make the cleanup stuff less confusing and more straightforward.  If not
> you can add

Quickly skimming the code I think the cleanup in __extent_writepage
could be moved into __extent_writepage_io where we have 2 branches that
set PageError. So I guess it could be done, but I will have to
experiment with it.

> 
> Reviewed-by: Josef Bacik <jo...@toxicpanda.com>
> 
> Thanks,
> 
> Josef
> 

Reply via email to