On 2019/4/23 7:33 PM, David Sterba wrote:
> On Tue, Apr 23, 2019 at 10:16:32AM +0800, Qu Wenruo wrote:
>> On 2019/4/23 5:09 AM, Jakob Unterwurzacher wrote:
>>> I have a user who is reporting ENOSPC errors when running gocryptfs on
>>> top of btrfs (ticket: https://github.com/rfjakob/gocryptfs/issues/395 ).
>>>
>>> What is interesting is that the error gets thrown at write time. This
>>> is not supposed to happen, because gocryptfs does
>>>
>>>     fallocate(..., FALLOC_FL_KEEP_SIZE, ...)
>>>
>>> before writing.
>>>
>>> I wrote a minimal reproducer in C: 
>>> https://github.com/rfjakob/fallocate_write
>>> This is what it looks like on ext4:
>>>
>>>     $ ../fallocate_write/fallocate_write
>>>     reading from /dev/urandom
>>>     writing to ./blob.379Q8P
>>>     writing blocks of 132096 bytes each
>>>     [...]
>>>     fallocate failed: No space left on device
>>>
>>> On btrfs, it will instead look like this:
>>>
>>>     [...]
>>>     pwrite failed: No space left on device
>>>
>>> Is this a bug in btrfs' fallocate implementation or am I reading the
>>> guarantees that fallocate gives me wrong?
>>
>> Since v4.7, this commit has changed how btrfs does the nodatacow check:
>> c6887cd11149 ("Btrfs: don't do nocow check unless we have to").
>>
>> Before that commit, btrfs always checked whether it needed to reserve
>> space for COW; after that patch, btrfs never checks unless we have no space.
>>
>> However, this screws up other nodatacow space checks.
>> And due to its age and deep changeset, it's pretty hard to fix.
>> I have tried several times, but it only caused more problems.
> 
> What if the commit is reverted, if the problem is otherwise hard to fix?
> This seems to break the semantics of fallocate so the performance should
> not be the main concern here.

My blurry memory of the underflow case is something like the following
(I failed to locate the old thread):

- fallocate
- pwrite into the preallocated range
  At this point, we can do nocow, thus no data space is reserved.

- Something happened to make that preallocated extent shared, without
  writing back dirty pages.
  Some possible causes are snapshot and reflink.
  However nowadays, snapshots will write all dirty inodes, and reflink
  will write the source range to disk.

  Maybe it's a small window inside create_snapshot() between
  btrfs_start_delalloc_snapshot() and btrfs_commit_transaction() calls?

- dirty pages get written back
  We create an ordered extent, but at this point we can't do nocow any
  more; we need to fall back to cow.
  However, at buffered write time, we didn't reserve data space.
  Now we will underflow the data space reservation.
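
To make the accounting in the steps above concrete, here is a toy model in
C. It is only a sketch, not btrfs code: every name in it (toy_space,
toy_extent, toy_run, ...) is made up for illustration, and it assumes the
simplest possible rule, namely that a buffered write reserves data space
only when nocow is not possible, and that writeback consumes the
reservation whenever it has to fall back to cow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names exist in btrfs. */
struct toy_space {
    int64_t data_reserved;  /* bytes reserved for pending data writes */
};

struct toy_extent {
    bool preallocated;      /* created by fallocate */
    bool shared;            /* referenced by a snapshot or reflink */
};

/* nocow is possible only for a preallocated extent that is not shared. */
static bool toy_can_nocow(const struct toy_extent *e)
{
    return e->preallocated && !e->shared;
}

/* Buffered write: reserve data space only if we cannot do nocow.
 * Returns the number of bytes reserved. */
static int64_t toy_buffered_write(struct toy_space *s,
                                  const struct toy_extent *e, int64_t len)
{
    if (toy_can_nocow(e))
        return 0;
    s->data_reserved += len;
    return len;
}

/* Writeback: if nocow is no longer possible we fall back to cow, which
 * consumes len bytes from the reservation counter. */
static void toy_writeback(struct toy_space *s,
                          const struct toy_extent *e, int64_t len)
{
    if (!toy_can_nocow(e))
        s->data_reserved -= len;
}

/* Run the steps in order and return the final reservation counter.
 * A negative result means the reservation underflowed. */
static int64_t toy_run(bool share_before_writeback, int64_t len)
{
    struct toy_space s = { .data_reserved = 0 };
    struct toy_extent e = { .preallocated = true, .shared = false };

    assert(len > 0);
    toy_buffered_write(&s, &e, len);   /* step 2: pwrite, nocow, no reserve */
    if (share_before_writeback)
        e.shared = true;               /* step 3: snapshot or reflink */
    toy_writeback(&s, &e, len);        /* step 4: dirty pages written back */
    return s.data_reserved;
}
```

In this model toy_run(false, len) ends with the counter at 0, while
toy_run(true, len) ends at -len, which is the underflow described above.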

However, nowadays there are some new mechanisms to handle this case more
gracefully, like btrfs_root::will_be_snapshotted.

I'll double check whether reverting that patch in the latest kernel still
causes problems.
But any ideas on the possible problem are welcome.

Thanks,
Qu
