Re: fallocate does not prevent ENOSPC on write

Qu Wenruo Thu, 25 Apr 2019 06:50:44 -0700


On 2019/4/25 下午9:25, Josef Bacik wrote:
[snip]
>>>
>>> What if the commit is reverted, if the problem is otherwise hard to fix?
>>> This seems to break the semantics of fallocate so the performance should
>>> not the main concern here.
>>
> 
> Are we sure the ENOSPC is coming from the data reservation?  That change makes
> us fall back on the old behavior, which means we should still succeed at 
> making
> the data reservation.
> 
> However it fallocate() _does not_ guarantee you won't fail the metadata
> reservation, I suspect that may be what you are running into.


For this script, we only needs 4 file extents at most.
Even the initial 8M metadata should be pretty enough, thus I don't think
it's metadata causing the problem.
---
#!/bin/bash

dev=/dev/test/test
mnt=/mnt/btrfs

mkfs.btrfs -f $dev -b 512M

mount $dev $mnt

fallocate -l 384M $mnt/file1
echo "fallocate success"
sync
dd if=/dev/zero bs=512K  oflag=direct conv=notrunc count=768 of=$mnt/file2

umount $mnt
---

> 
>> My blur memory of the underflow case is something like below: (failed to
>> locate the old thread)
>>
>> - fallocate
>> - pwrite in to the reallocated range
>>   At this timing, we can do nocow, thus no data space is reserved.
>>
>> - Something happened to make that preallocated extent shared, without
>>   writing back dirty pages.
>>   Some possible causes are snapshot and reflink.
>>   However nowadays, snapshots will write all dirty inodes, and reflink
>>   will write the source range to disk.
>>
>>   Maybe it's a small window inside create_snapshot() between
>>   btrfs_start_delalloc_snapshot() and btrfs_commit_transaction() calls?
>>
>> - dirty pages get written back
>>   We created ordered extent, but at this timing, we can't do nocow any
>>   more, we need to fallback to cow.
>>   However at the buffered write timing, we didn't reserved data space.
>>   Now we will underflow data space reservation.
>>
>> However nowadays there are some new mechanism to handle this case more
>> gracefully, like btrfs_root::will_be_snapshotted.
>>
>> I'll double check if reverting that patch in latest kernel still cause
>> problem.
>> But any idea on the possible problem is welcomed.
>>
> 
> Reading the code there's two scenarios that happen.  All of our down stream
> stuff assumes that we've updated ->bytes_may_use for our data write.  So if we
> fail our reservation and do the nocow thing of skipping our reservation we can
> overflow if we
> 
> 1) Need to allocate an extent anyway because of reflink/snapshot.
> btrfs_add_reserved_space() expects that space_info->bytes_may_use has our 
> region
> in it, so in this case it doesn't and we underflow here.  I think you are 
> right
> in that we do all dirty writeback nowadays so this is less of an issue, buuuut
> 
> 2) In run_delalloc_nocow we do EXTENT_CLEAR_DATA_RESV unconditionally if we 
> did
> manage to do a nocow.  If we fell back on the no reserve case then this would
> underflow our ->bytes_may_use counter here.

Right, I missed this case. Thanks for pointing this out.

> 
> Off the top of my head I say we just add our write_bytes to ->bytes_may_use if
> we use the nocow path.  If we're already failing to reserve data space as it 
> is
> then there's no harm in making it appear like we have less space by inflating
> ->bytes_may_use.  This is the straightforward fix for the underflow, and we
> could come up with something more crafty later, like setting the range with
> EXTENT_NO_DATA_RESERVE and doing magic later with ->bytes_may_use.  Thanks,

Sounds pretty valid to me.

Thanks for the idea,
Qu

> 
> Josef
>

signature.asc
Description: OpenPGP digital signature

Re: fallocate does not prevent ENOSPC on write

Reply via email to