Re: fallocate does not prevent ENOSPC on write

Qu Wenruo Tue, 23 Apr 2019 16:49:44 -0700


On 2019/4/23 10:50 PM, Filipe Manana wrote:
> On Tue, Apr 23, 2019 at 1:14 PM Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>
>>
>>
>> On 2019/4/23 7:33 PM, David Sterba wrote:
>>> On Tue, Apr 23, 2019 at 10:16:32AM +0800, Qu Wenruo wrote:
>>>> On 2019/4/23 5:09 AM, Jakob Unterwurzacher wrote:
>>>>> I have a user who is reporting ENOSPC errors when running gocryptfs on
>>>>> top of btrfs (ticket: https://github.com/rfjakob/gocryptfs/issues/395 ).
>>>>>
>>>>> What is interesting is that the error gets thrown at write time. This
>>>>> is not supposed to happen, because gocryptfs does
>>>>>
>>>>>     fallocate(..., FALLOC_FL_KEEP_SIZE, ...)
>>>>>
>>>>> before writing.
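As a reminder of what FALLOC_FL_KEEP_SIZE promises: it allocates blocks for the range without changing the file size reported by stat(). Below is a minimal sketch, assuming Linux and glibc. The helper keep_size_demo() and the temp-file template are hypothetical and not taken from gocryptfs.

```c
#define _GNU_SOURCE                 /* fallocate(), FALLOC_FL_KEEP_SIZE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create an empty temp file from `tmpl`
 * (e.g. "/tmp/ks.XXXXXX"), preallocate `len` bytes at offset 0 with
 * FALLOC_FL_KEEP_SIZE, and record fstat() before and after.
 * Returns 0 on success or -errno (e.g. -EOPNOTSUPP if unsupported). */
int keep_size_demo(char *tmpl, off_t len, struct stat *before, struct stat *after)
{
    assert(len > 0);
    int fd = mkstemp(tmpl);
    if (fd < 0)
        return -errno;
    unlink(tmpl);                   /* storage is freed when fd is closed */

    int ret = 0;
    if (fstat(fd, before) < 0 ||
        fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, len) < 0 ||
        fstat(fd, after) < 0)
        ret = -errno;               /* errno from whichever call failed */
    close(fd);
    return ret;
}
```

On a filesystem that supports fallocate, st_size stays 0 while st_blocks grows; a filesystem without support returns -EOPNOTSUPP.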
>>>>>
>>>>> I wrote a minimal reproducer in C: 
>>>>> https://github.com/rfjakob/fallocate_write
>>>>> This is what it looks like on ext4:
>>>>>
>>>>>     $ ../fallocate_write/fallocate_write
>>>>>     reading from /dev/urandom
>>>>>     writing to ./blob.379Q8P
>>>>>     writing blocks of 132096 bytes each
>>>>>     [...]
>>>>>     fallocate failed: No space left on device
>>>>>
>>>>> On btrfs, it will instead look like this:
>>>>>
>>>>>     [...]
>>>>>     pwrite failed: No space left on device
>>>>>
>>>>> Is this a bug in btrfs' fallocate implementation or am I reading the
>>>>> guarantees that fallocate gives me wrong?
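The pattern under test (reserve a block with fallocate, then write it) can be sketched as follows. This is not the code from the fallocate_write repository. fill_file() and enum fill_result are hypothetical, the payload is zeros rather than /dev/urandom data, and each block is preallocated immediately before it is written. The point is to show how to tell which of the two calls returned ENOSPC.

```c
#define _GNU_SOURCE                 /* fallocate(), FALLOC_FL_KEEP_SIZE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

enum fill_result { FILL_OK, FILL_SETUP_FAILED, FILL_FALLOCATE_FAILED, FILL_PWRITE_FAILED };

/* Hypothetical reproducer loop: write `nblocks` blocks of `bs` bytes to a new
 * temp file created from `tmpl`, preallocating each block first. Reports which
 * call failed, its errno, how many blocks were written, and the final size. */
enum fill_result fill_file(char *tmpl, size_t bs, long nblocks,
                           long *done, int *err, off_t *size)
{
    assert(bs > 0 && nblocks >= 0);
    enum fill_result res = FILL_OK;
    *done = 0;
    *err = 0;
    *size = 0;

    char *buf = calloc(1, bs);      /* zero-filled payload */
    if (buf == NULL) {
        *err = ENOMEM;
        return FILL_SETUP_FAILED;
    }
    int fd = mkstemp(tmpl);
    if (fd < 0) {
        *err = errno;
        free(buf);
        return FILL_SETUP_FAILED;
    }
    unlink(tmpl);                   /* storage is freed when fd is closed */

    for (long i = 0; i < nblocks; i++) {
        off_t off = (off_t)i * (off_t)bs;
        /* Reserve the block without growing st_size... */
        if (fallocate(fd, FALLOC_FL_KEEP_SIZE, off, (off_t)bs) < 0) {
            *err = errno;
            res = FILL_FALLOCATE_FAILED;
            break;
        }
        /* ...then write into the preallocated range. An ENOSPC here is
         * the behaviour reported on btrfs. */
        ssize_t n = pwrite(fd, buf, bs, off);
        if (n < 0) {
            *err = errno;
            res = FILL_PWRITE_FAILED;
            break;
        }
        if ((size_t)n != bs) {      /* simplification: treat a short write as EIO */
            *err = EIO;
            res = FILL_PWRITE_FAILED;
            break;
        }
        (*done)++;
    }

    struct stat st;
    if (fstat(fd, &st) == 0)
        *size = st.st_size;
    close(fd);
    free(buf);
    return res;
}
```

Run with a 132096-byte block size and a block count large enough to fill the filesystem, this would be expected to return FILL_FALLOCATE_FAILED on ext4 and, per the report, FILL_PWRITE_FAILED on the affected btrfs setup.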
>>>>
>>>> Since v4.7, this commit changed how btrfs does the NODATACOW check:
>>>> c6887cd11149 ("Btrfs: don't do nocow check unless we have to").
>>>>
>>>> Before that commit, btrfs always checked whether it needed to reserve
>>>> space for COW; after it, btrfs only checks once we have run out of space.
>>>>
>>>> However, this breaks other nodatacow space checks.
>>>> Due to the commit's age and how deep its changes go, it's pretty hard to fix.
>>>> I have tried several times, but it only caused more problems.
>>>
>>> What if we revert the commit, since the problem is otherwise hard to fix?
>>
>> I tried reverting it, but all the other problems came up.
> 
> I haven't seen an explanation of why that patch causes ENOSPC, or which
> nodatacow space-check problems it causes.
> 
> It seems fine to me. What we currently do is:
> 
> 1) For any buffered write, check if there's enough free data space;
> 2) If not, try to allocate a new data chunk;
> 3) If that fails, check whether the file has the "have prealloc extents"
> flag or the nodatacow flag set;
> 4) If either condition is true, check whether we can write into the
> existing extent: if it's not shared and no checksums exist in its
> range (meaning it's an unwritten/prealloc extent), return success to
> userspace.
> 
> So what's wrong with it? And how does it cause the ENOSPC?
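The four steps listed above can be modelled as a small userspace function for reasoning about their ordering. This is a sketch only: struct fs_model, struct extent_model and reserve_for_buffered_write() are hypothetical names, a single chunk allocation stands in for step 2, and step 4 is read as requiring the extent to be both unshared and free of checksums (an assumption).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified filesystem state (not btrfs structures). */
struct fs_model {
    uint64_t free_data;       /* unreserved bytes in existing data chunks */
    uint64_t unallocated;     /* raw space still available for new chunks */
    uint64_t chunk_size;      /* size of one new data chunk */
};

/* Hypothetical per-write view of the inode and the target extent. */
struct extent_model {
    bool inode_has_prealloc;  /* "have prealloc extents" inode flag */
    bool inode_nodatacow;     /* nodatacow inode flag */
    bool extent_shared;       /* referenced by a snapshot or reflink */
    bool csums_in_range;      /* checksums exist for the target range */
};

/* Returns 0 if the buffered write may proceed, -ENOSPC otherwise. */
int reserve_for_buffered_write(struct fs_model *fs,
                               const struct extent_model *ex, uint64_t len)
{
    assert(len > 0);
    /* 1) Enough free data space: reserve it. */
    if (fs->free_data >= len) {
        fs->free_data -= len;
        return 0;
    }
    /* 2) Otherwise try to allocate one new data chunk. */
    if (fs->unallocated >= fs->chunk_size && fs->chunk_size > 0) {
        fs->unallocated -= fs->chunk_size;
        fs->free_data += fs->chunk_size;
        if (fs->free_data >= len) {
            fs->free_data -= len;
            return 0;
        }
    }
    /* 3) Only now look at the prealloc / nodatacow flags. */
    if (!ex->inode_has_prealloc && !ex->inode_nodatacow)
        return -ENOSPC;
    /* 4) Write in place only if the extent is unshared and has no csums. */
    if (!ex->extent_shared && !ex->csums_in_range)
        return 0;
    return -ENOSPC;
}
```

In this model a shared extent (for example after a snapshot or reflink) falls through to -ENOSPC even when the inode has preallocated extents.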


For example:

We have a 128MiB preallocated file extent, and assume the fs had only
128MiB of free data space, so there is no data space left at all once
the preallocation is done.

Then we do a buffered write into that extent, which simply fails because
a buffered write needs to reserve data space.

The problem is always there for fallocate + pwrite; the only thing that
differs is when the ENOSPC happens.
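Qu's example can be written out as plain space accounting. This toy model assumes a write path that reserves data space without any NOCOW fallback (the case Qu describes); with the fallback from step 4 of Filipe's list, the write would instead be allowed. struct data_space, reserve_data() and prealloc_then_buffered_write() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Hypothetical data-space counter (not a btrfs structure). */
struct data_space {
    uint64_t total;   /* bytes of data space in the fs */
    uint64_t used;    /* bytes allocated or reserved */
};

/* Reserve `len` bytes, or fail with -ENOSPC if they are not available. */
int reserve_data(struct data_space *ds, uint64_t len)
{
    assert(ds->used <= ds->total);
    if (ds->total - ds->used < len)
        return -ENOSPC;
    ds->used += len;
    return 0;
}

/* Preallocate `prealloc` bytes (the fallocate), then reserve `write_len`
 * bytes for a buffered write into that range. Returns the first failure,
 * or 0 if both reservations succeed. */
int prealloc_then_buffered_write(uint64_t free_bytes, uint64_t prealloc,
                                 uint64_t write_len)
{
    struct data_space ds = { .total = free_bytes, .used = 0 };
    int ret = reserve_data(&ds, prealloc);    /* fallocate() */
    if (ret != 0)
        return ret;
    return reserve_data(&ds, write_len);      /* buffered pwrite() */
}
```

With 128 MiB free and a 128 MiB preallocation, the fallocate consumes everything and the 4 KiB write's reservation returns -ENOSPC, so the error moves from fallocate time to pwrite time.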


The fstests case btrfs/153 has been failing for a long time for the same
reason; although that one is triggered through quota, the root cause is
exactly the same.

Thanks,
Qu

> 
> Running the reproducer, at least on a 5.0 kernel, it never fails on a
> pwrite for me, but always on fallocate:
> 
> $ mkfs.btrfs -f -b $((4 * 1024 * 1024 * 1024)) /dev/sdi
> $ mount /dev/sdi /mnt/sdi
> $ cd /mnt/sdi
> $ /path/to/reproducer
> reading from /dev/urandom
> writing to ./blob.IIa6tH
> writing blocks of 132096 bytes each
> total    125 MiB,  65.52 MiB/s
> total    251 MiB,  44.59 MiB/s
> total    377 MiB,  55.23 MiB/s
> total    503 MiB,  66.21 MiB/s
> total    629 MiB,  59.97 MiB/s
> total    755 MiB,   3.70 MiB/s
> total    881 MiB,  50.24 MiB/s
> total   1007 MiB,  64.51 MiB/s
> total   1133 MiB,  50.70 MiB/s
> total   1259 MiB,  49.29 MiB/s
> total   1385 MiB,  47.93 MiB/s
> total   1511 MiB,   4.00 MiB/s
> total   1637 MiB,  49.85 MiB/s
> total   1763 MiB,  48.11 MiB/s
> total   1889 MiB,  66.62 MiB/s
> total   2015 MiB,   5.60 MiB/s
> total   2141 MiB,  19.58 MiB/s
> total   2267 MiB,  64.80 MiB/s
> total   2393 MiB,  13.23 MiB/s
> total   2519 MiB,  14.95 MiB/s
> fallocate failed: No space left on device
> 
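The "total" column above is consistent with one progress line every 1000 blocks of 132096 bytes (129 KiB), with the MiB value truncated: 1000 × 132096 = 132,096,000 bytes ≈ 125.98 MiB, printed as 125, and 20,000 blocks ≈ 2519.5 MiB, printed as 2519. The 1000-block interval is inferred from these numbers, not read from the reproducer's source; total_mib() is a hypothetical helper for checking it.

```c
#include <assert.h>
#include <stdint.h>

/* MiB figure after `nblocks` blocks of `bs` bytes, assuming the reproducer
 * truncates bytes / 2^20 to an integer when printing. */
uint64_t total_mib(uint64_t nblocks, uint64_t bs)
{
    assert(bs > 0);
    return nblocks * bs / (1024 * 1024);
}
```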
> So either that was tested on a rather old kernel or:
> 
> 1) we had snapshotting happening between a fallocate and a pwrite (or
> at the same time as the pwrite)
> 2) before (or during) the pwrite, the unwritten/prealloc extent was
> reflinked (cp --reflink, clone or dedupe ioctls)
> 
> What did I miss here?
> 
> Thanks.
> 
>>
>> E.g. reserved space underflow.
>>
>> I'll find the old thread and try again.
>>
>> Thanks,
>> Qu
>>
>>> This seems to break the semantics of fallocate, so performance should
>>> not be the main concern here.
>>>
>>
> 
>
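The "reserved space underflow" mentioned earlier in the thread refers to releasing more space from a reservation counter than was ever added to it. With an unsigned 64-bit counter, an unchecked subtraction wraps around to a huge value instead of going negative. The toy sketch below shows a guarded release that detects this; struct rsv, rsv_add() and rsv_release() are hypothetical and are not btrfs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reservation counter (not a btrfs structure). */
struct rsv {
    uint64_t reserved;        /* bytes currently reserved */
};

void rsv_add(struct rsv *r, uint64_t len)
{
    assert(r->reserved + len >= r->reserved);   /* no overflow */
    r->reserved += len;
}

/* Release `len` bytes. On underflow (len > reserved) clamp to 0 and return
 * false instead of letting the unsigned counter wrap around. */
bool rsv_release(struct rsv *r, uint64_t len)
{
    if (len > r->reserved) {
        r->reserved = 0;
        return false;
    }
    r->reserved -= len;
    return true;
}
```

A double release, for example, is reported instead of silently leaving the counter near 2^64.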
