On Tue, Apr 23, 2019 at 1:14 PM Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>
>
>
> On 2019/4/23 7:33 PM, David Sterba wrote:
> > On Tue, Apr 23, 2019 at 10:16:32AM +0800, Qu Wenruo wrote:
> >> On 2019/4/23 5:09 AM, Jakob Unterwurzacher wrote:
> >>> I have a user who is reporting ENOSPC errors when running gocryptfs on
> >>> top of btrfs (ticket: https://github.com/rfjakob/gocryptfs/issues/395 ).
> >>>
> >>> What is interesting is that the error gets thrown at write time. This
> >>> is not supposed to happen, because gocryptfs does
> >>>
> >>>     fallocate(..., FALLOC_FL_KEEP_SIZE, ...)
> >>>
> >>> before writing.
> >>>
> >>> I wrote a minimal reproducer in C: 
> >>> https://github.com/rfjakob/fallocate_write
> >>> This is what it looks like on ext4:
> >>>
> >>>     $ ../fallocate_write/fallocate_write
> >>>     reading from /dev/urandom
> >>>     writing to ./blob.379Q8P
> >>>     writing blocks of 132096 bytes each
> >>>     [...]
> >>>     fallocate failed: No space left on device
> >>>
> >>> On btrfs, it will instead look like this:
> >>>
> >>>     [...]
> >>>     pwrite failed: No space left on device
> >>>
> >>> Is this a bug in btrfs' fallocate implementation or am I reading the
> >>> guarantees that fallocate gives me wrong?
> >>
> >> Since v4.7, this commit changed how btrfs does the nodatacow check:
> >> c6887cd11149 ("Btrfs: don't do nocow check unless we have to").
> >>
> >> Before that commit, btrfs always checked whether it needed to reserve
> >> space for COW; after that patch, btrfs only checks when we have no space.
> >>
> >> However, this breaks other nodatacow space checks.
> >> And due to its age and the depth of the changeset, it's pretty hard to fix.
> >> I have tried several times, but each attempt only caused more problems.
> >
> > What if the commit is reverted, if the problem is otherwise hard to fix?
>
> I tried reverting it, but then other problems came up.

I haven't seen an explanation of why that patch causes ENOSPC, or of
which nodatacow space checks it breaks.

It seems fine to me. What we currently do is:

1) For any buffered write, check if there's enough free data space;
2) If not, try to allocate a new data chunk;
3) If that fails, check if the file has the "have prealloc extents"
flag or the nodatacow flag set;
4) If either of those conditions is true, check if we can write to the
existing extent (it's not shared, or no checksums exist in its range,
meaning it's an unwritten (prealloc) extent) and, if so, return success
to userspace.
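
To make the order of those checks concrete, here is a rough sketch in C.
It is not the kernel code: the struct, its field names and the function
name are all made up for illustration, and each boolean stands in for a
check that is far more involved in btrfs. The "can we write in place"
test of step 4 is collapsed into a single flag on purpose.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified inputs; none of these names exist in btrfs. */
struct write_state {
    bool enough_free_data_space;   /* step 1: data space is available */
    bool can_alloc_data_chunk;     /* step 2: a new data chunk can be allocated */
    bool has_prealloc_flag;        /* step 3: inode has prealloc extents */
    bool has_nodatacow_flag;       /* step 3: inode is nodatacow */
    bool extent_writable_in_place; /* step 4: existing extent can be overwritten */
};

/* Returns 0 if the buffered write may proceed, -ENOSPC otherwise. */
int check_buffered_write(const struct write_state *s)
{
    /* Step 1: enough free data space, reserve it and go on. */
    if (s->enough_free_data_space)
        return 0;

    /* Step 2: otherwise try to get space from a new data chunk. */
    if (s->can_alloc_data_chunk)
        return 0;

    /* Step 3: without either flag there is nothing to fall back to. */
    if (!s->has_prealloc_flag && !s->has_nodatacow_flag)
        return -ENOSPC;

    /* Step 4: only succeed if we can write into the existing extent. */
    return s->extent_writable_in_place ? 0 : -ENOSPC;
}
```

With this model, a write into a prealloc extent on a full filesystem only
fails if the last flag is false, which is the case the two scenarios
further down (snapshot or reflink) would produce.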

So what's wrong with it? And how does it cause the ENOSPC?

When I run the reproducer, at least on a 5.0 kernel, it never fails on a
pwrite for me, but always on fallocate:

$ mkfs.btrfs -f -b $((4 * 1024 * 1024 * 1024)) /dev/sdi
$ mount /dev/sdi /mnt/sdi
$ cd /mnt/sdi
$ /path/to/reproducer
reading from /dev/urandom
writing to ./blob.IIa6tH
writing blocks of 132096 bytes each
total    125 MiB,  65.52 MiB/s
total    251 MiB,  44.59 MiB/s
total    377 MiB,  55.23 MiB/s
total    503 MiB,  66.21 MiB/s
total    629 MiB,  59.97 MiB/s
total    755 MiB,   3.70 MiB/s
total    881 MiB,  50.24 MiB/s
total   1007 MiB,  64.51 MiB/s
total   1133 MiB,  50.70 MiB/s
total   1259 MiB,  49.29 MiB/s
total   1385 MiB,  47.93 MiB/s
total   1511 MiB,   4.00 MiB/s
total   1637 MiB,  49.85 MiB/s
total   1763 MiB,  48.11 MiB/s
total   1889 MiB,  66.62 MiB/s
total   2015 MiB,   5.60 MiB/s
total   2141 MiB,  19.58 MiB/s
total   2267 MiB,  64.80 MiB/s
total   2393 MiB,  13.23 MiB/s
total   2519 MiB,  14.95 MiB/s
fallocate failed: No space left on device

So either that was tested on a rather old kernel or:

1) we had snapshotting happening between a fallocate and a pwrite (or
at the same time as the pwrite)
2) before the pwrite (or during) the unwritten/prealloc extent was
reflinked (cp --reflink, clone or dedupe ioctls)

What did I miss here?

Thanks.

>
> E.g. reserved space underflow.
>
> I'll find the old thread and retry again.
>
> Thanks,
> Qu
>
> > This seems to break the semantics of fallocate, so performance should
> > not be the main concern here.
> >
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
