On Tue, Apr 23, 2019 at 1:14 PM Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>
>
>
> On 2019/4/23 7:33 PM, David Sterba wrote:
> > On Tue, Apr 23, 2019 at 10:16:32AM +0800, Qu Wenruo wrote:
> >> On 2019/4/23 5:09 AM, Jakob Unterwurzacher wrote:
> >>> I have a user who is reporting ENOSPC errors when running gocryptfs on
> >>> top of btrfs (ticket: https://github.com/rfjakob/gocryptfs/issues/395 ).
> >>>
> >>> What is interesting is that the error gets thrown at write time. This
> >>> is not supposed to happen, because gocryptfs does
> >>>
> >>> fallocate(..., FALLOC_FL_KEEP_SIZE, ...)
> >>>
> >>> before writing.
> >>>
> >>> I wrote a minimal reproducer in C:
> >>> https://github.com/rfjakob/fallocate_write
> >>> This is what it looks like on ext4:
> >>>
> >>> $ ../fallocate_write/fallocate_write
> >>> reading from /dev/urandom
> >>> writing to ./blob.379Q8P
> >>> writing blocks of 132096 bytes each
> >>> [...]
> >>> fallocate failed: No space left on device
> >>>
> >>> On btrfs, it will instead look like this:
> >>>
> >>> [...]
> >>> pwrite failed: No space left on device
> >>>
> >>> Is this a bug in btrfs' fallocate implementation or am I reading the
> >>> guarantees that fallocate gives me wrong?
> >>
> >> Since v4.7, this commit changed the how btrfs do NodataCOW check:
> >> c6887cd11149 ("Btrfs: don't do nocow check unless we have to").
> >>
> >> Before that commit, btrfs always check if they need to reserve space for
> >> COW, while after that patch, btrfs never checks unless we have no space.
> >>
> >> However this screws up other nodatacow space check.
> >> And due to its age and deep changeset, it's pretty hard to fix it.
> >> I have tried several times, but it will only cause more problems.
> >
> > What if the commit is reverted, if the problem is otherwise hard to fix?
>
> Tried reverted, but all other problems came up.
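
For reference, the pattern from the report quoted above (fallocate with FALLOC_FL_KEEP_SIZE, then pwrite into the reserved range) could be sketched roughly as follows. This is only an illustration: prealloc_then_write is a made-up helper name, and the code is not taken from gocryptfs or from the fallocate_write reproducer.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and signature are assumptions for this sketch):
 * reserve [off, off + len) without changing the file size, then write the
 * block into that range. Returns 0 on success or a negative errno value.
 */
static int prealloc_then_write(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    /* Reserve the space up front; this is where ENOSPC is expected. */
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, off, (off_t)len) != 0)
        return -errno;

    /* Write the block, retrying on EINTR and on short writes. */
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -errno; /* ENOSPC here is what was reported on btrfs */
        }
        if (n == 0)
            return -EIO; /* no progress; give up instead of looping forever */
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Under the expectation stated in the report, a caller of such a helper would see ENOSPC come back only from the fallocate step, never from the pwrite loop.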

I haven't seen an explanation of why that patch causes ENOSPC, or of
which nodatacow space checks it breaks.

It seems fine to me. What we currently do is:

1) For any buffered write, check if there's enough free data space;
2) If not, try to allocate a new data chunk;
3) If that fails, check if the file has the "have prealloc extents"
   flag or has the nodatacow flag set;
4) If any of those conditions is true, check if we can write to the
   existing extent - if it's not shared or no checksums exist in its
   range, meaning it's an unwritten (prealloc) extent, return success
   to userspace.

So what's wrong with it? And how does it cause the ENOSPC?

With the reproducer, at least on a 5.0 kernel, pwrite never fails for
me; it is always fallocate that fails:

  $ mkfs.btrfs -f -b $((4 * 1024 * 1024 * 1024)) /dev/sdi
  $ mount /dev/sdi /mnt/sdi
  $ cd /mnt/sdi
  $ /path/to/reproducer
  reading from /dev/urandom
  writing to ./blob.IIa6tH
  writing blocks of 132096 bytes each
  total 125 MiB, 65.52 MiB/s
  total 251 MiB, 44.59 MiB/s
  total 377 MiB, 55.23 MiB/s
  total 503 MiB, 66.21 MiB/s
  total 629 MiB, 59.97 MiB/s
  total 755 MiB, 3.70 MiB/s
  total 881 MiB, 50.24 MiB/s
  total 1007 MiB, 64.51 MiB/s
  total 1133 MiB, 50.70 MiB/s
  total 1259 MiB, 49.29 MiB/s
  total 1385 MiB, 47.93 MiB/s
  total 1511 MiB, 4.00 MiB/s
  total 1637 MiB, 49.85 MiB/s
  total 1763 MiB, 48.11 MiB/s
  total 1889 MiB, 66.62 MiB/s
  total 2015 MiB, 5.60 MiB/s
  total 2141 MiB, 19.58 MiB/s
  total 2267 MiB, 64.80 MiB/s
  total 2393 MiB, 13.23 MiB/s
  total 2519 MiB, 14.95 MiB/s
  fallocate failed: No space left on device

So either that was tested on a rather old kernel or:

1) a snapshot was taken between the fallocate and the pwrite (or at
   the same time as the pwrite);
2) before (or during) the pwrite, the unwritten/prealloc extent was
   reflinked (cp --reflink, clone or dedupe ioctls).

What did I miss here?

Thanks.

>
> E.g. reserved space underflow.
>
> I'll find the old thread and retry again.
>
> Thanks,
> Qu
>
> > This seems to break the semantics of fallocate so the performance should
> > not the main concern here.
> >
>

--
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”