On Mon, Jun 09, 2025 at 04:12:00PM -0100, Theodore Ts'o wrote:
> On Thu, May 22, 2025 at 07:48:51PM +0100, Tim Woodall wrote:
> > At least for my test, doing the rm via
> > debugfs -w -R "rm filler" "${DEV}"
> > works without issue (my fix is just a cut and paste of what debugfs does in
> > place of the call to ext2fs_punch) so I don't think the extent tree is
> > corrupted at this point.
>
> The use of ext2fs_punch() was deliberate because for extent-mapped
> files, it's more efficient to iterate by extent instead by block by
> block.
>
> And there was a bug in ext2fs_punch(), for which the fix is attached
> below.
>
> (BTW, Darrick, the other very rare corruption you were seeing with
> fuse2fs and tools like fsx and fsstress I suspect may be fixed by
> commit b91470122325 ("libext2fs: fix a extent tree corruption bug in
> ext2fs_extent_set_bmap()" in my git repo.
<nod> I'll rebase and see what fstests says tonight.
> > > inusefile.patch: Have you observed that umounting a fuse2fs mount
> > > returns immediately but the fuse2fs server itself keeps running (and
> > > flushing dirty data to disk) for several seconds more? That I think is
> > > a bug/antifeature in the kernel and the solution to that is to move to a
> > > fuseblk mount. OTOH if you're actually seeing this with regular files
> > > then yeah, some other kind of locking is needed.
> >
> > Yes, I'm seeing this with regular files being mounted and for test.sh on my
> > system it's around 50s
>
> The fuseblk mount is Linux specific, so I think the inusefile patch is
> something we'll want to pursue for (a) non-Linux fuse2fs users, and
> (b) when you're mounting regular files and not block devices.
Yeah... I asked on fsdevel about FUSE_DESTROY and I get the sense that
Miklos doesn't entirely like the idea of calling FUSE_DESTROY from
unmount for !fuseblk filesystems because that could lead to DOSes:
https://lore.kernel.org/linux-fsdevel/cajfpegvwxql_n0poa95kgpjt5mmxs2xxcojbgwabhfczy8a...@mail.gmail.com/
Another thing that we /could/ do to minimize the time spent in
op_destroy is to implement syncfs in fuse2fs so that the sync_filesystem
call made by the VFS unmount function triggers all the flushing work
that op_destroy currently does; and then op_destroy just calls
ext2fs_close(), which should be fast.
The hard part is that nobody's wired up FUSE_SYNCFS in libfuse3.
Separate topic:
I found a really nasty footg^Wdesign flaw in the upper-level libfuse
interface. libfuse assigns a unique nodeid to every child file of a
directory, then passes this nodeid into the kernel. The nodeid is the
index passed into iget5_locked in the kernel, which means that the
nodeid controls which struct inode the kernel is dealing with.
Unfortunately, the nodeid does not account for hardlinked files. If you
open a hardlinked ext2 file, you get two different struct inodes in the
kernel. So far I think nobody's noticed this in fuse2fs because in its
default configuration fuse2fs sets caching to zero and so the kernel
aggressively calls op_getattr and passes all file IO through to fuse2fs.
This is causing a huge problem for fuse2fs+iomap because iomap assumes
that it can attach an iomapping cache to the struct (fuse_)inode and
that i_size and i_mode are stable. So now there are weird cache
coherency issues across hardlinked files and I've had to disable op_link
to quiet down fstests.
I'm also chasing down a coherency problem between open(..., O_TRUNC) and
pagecache writeback, but that's most likely my own fault since that's
all in my new iomap code. :P
> > (Note that this is inside another fuse mount as the system I'm running the
> > tests on has 1K blocks so I cannot create a 5T sparse file to contain the
> > filesystem)
>
> So a quick free warning --- there are real performance issues relating
> to using a 1k block file system. It's fine for small devices where
> you are trying to squeeze out every bit of storage efficiency, so that
> files that are only say, 128 bytes consume a single 1k block instead
> of a 4k block. But in general, for general purpose file systems,
> there are a lot of downsides with using a 1k block file system, and in
> my opinions the costs far exceed the benefits.
>
> Cheers,
>
> - Ted
>
> commit 34b2a4a1f9794498ca403393003cc5840c240d42
> Author: Theodore Ts'o <[email protected]>
> Date: Mon Jun 9 11:30:48 2025 -0400
>
> libext2fs: fix integer overflow in ext2fs_punch() when releasing more
> than 2**31 blocks
>
> Addresses-Debian-Bug: #1106241
> Signed-off-by: Theodore Ts'o <[email protected]>
>
> diff --git a/lib/ext2fs/punch.c b/lib/ext2fs/punch.c
> index e2543e1e..80c699eb 100644
> --- a/lib/ext2fs/punch.c
> +++ b/lib/ext2fs/punch.c
> @@ -193,10 +193,10 @@ static void dbg_print_extent(char *desc, struct
> ext2fs_extent *extent)
> static errcode_t punch_extent_blocks(ext2_filsys fs, ext2_ino_t ino,
> struct ext2_inode *inode,
> blk64_t lfree_start, blk64_t free_start,
> - __u32 free_count, int *freed)
> + __u32 free_count, blk64_t *freed)
> {
> blk64_t pblk;
> - int freed_now = 0;
> + __u32 freed_now = 0;
> __u32 cluster_freed;
> errcode_t retval = 0;
>
> @@ -271,7 +271,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs,
> ext2_ino_t ino,
> errcode_t retval;
> blk64_t free_start, next, lfree_start;
> __u32 free_count, newlen;
> - int freed = 0;
> + blk64_t freed = 0;
> int op;
>
> retval = ext2fs_extent_open2(fs, ino, inode, &handle);
> @@ -442,7 +442,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs,
> ext2_ino_t ino,
> if (retval)
> goto errout;
> }
> - dbg_printf("Freed %d blocks\n", freed);
> + dbg_printf("Freed %llu blocks\n", freed);
> retval = ext2fs_iblk_sub_blocks(fs, inode, freed);
> errout:
> ext2fs_extent_free(handle);