Bug#1106241: e2fsprogs: fuse2fs - rm fails for very large files

Darrick J. Wong Mon, 09 Jun 2025 11:19:14 -0700

On Mon, Jun 09, 2025 at 04:12:00PM -0100, Theodore Ts'o wrote:
> On Thu, May 22, 2025 at 07:48:51PM +0100, Tim Woodall wrote:
> > At least for my test, doing the rm via
> > debugfs -w -R "rm filler" "${DEV}"
> > works without issue (my fix is just a cut and paste of what debugfs does in
> > place of the call to ext2fs_punch) so I don't think the extent tree is
> > corrupted at this point.
> 
> The use of ext2fs_punch() was deliberate because for extent-mapped
> files, it's more efficient to iterate by extent instead by block by
> block.
> 
> And there was a bug in ext2fs_punch(), for which the fix is attached
> below.
> 
> (BTW, Darrick, the other very rare corruption you were seeing with
> fuse2fs and tools like fsx and fsstress I suspect may be fixed by
> commit b91470122325 ("libext2fs: fix a extent tree corruption bug in
> ext2fs_extent_set_bmap()" in my git repo.


<nod> I'll rebase and see what fstests says tonight.

> > > inusefile.patch: Have you observed that umounting a fuse2fs mount
> > > returns immediately but the fuse2fs server itself keeps running (and
> > > flushing dirty data to disk) for several seconds more?  That I think is
> > > a bug/antifeature in the kernel and the solution to that is to move to a
> > > fuseblk mount.  OTOH if you're actually seeing this with regular files
> > > then yeah, some other kind of locking is needed.
> > 
> > Yes, I'm seeing this with regular files being mounted and for test.sh on my
> > system it's around 50s
> 
> The fuseblk mount is Linux specific, so I think the inusefile patch is
> something we'll want to pursue for (a) non-Linux fuse2fs users, and
> (b) when you're mounting regular files and not block devices.

Yeah... I asked on fsdevel about FUSE_DESTROY and I get the sense that
Miklos doesn't entirely like the idea of calling FUSE_DESTROY from
unmount for !fuseblk filesystems because that could lead to DOSes:

https://lore.kernel.org/linux-fsdevel/cajfpegvwxql_n0poa95kgpjt5mmxs2xxcojbgwabhfczy8a...@mail.gmail.com/

Another thing that we /could/ do to minimize the time spent in
op_destroy is to implement syncfs in fuse2fs so that the sync_filesystem
call made by the VFS unmount function triggers all the flushing work
that op_destroy currently does; and then op_destroy just calls
ext2fs_close(), which should be fast.

The hard part is that nobody's wired up FUSE_SYNCFS in libfuse3.

Separate topic:

I found a really nasty footg^Wdesign flaw in the upper-level libfuse
interface.  libfuse assigns a unique nodeid to every child file of a
directory, then passes this nodeid into the kernel.  The nodeid is the
index passed into iget5_locked in the kernel, which means that the
nodeid controls which struct inode the kernel is dealing with.

Unfortunately, the nodeid does not account for hardlinked files.  If you
open a hardlinked ext2 file, you get two different struct inodes in the
kernel.  So far I think nobody's noticed this in fuse2fs because in its
default configuration fuse2fs sets caching to zero and so the kernel
aggressively calls op_getattr and passes all file IO through to fuse2fs.

This is causing a huge problem for fuse2fs+iomap because iomap assumes
that it can attach an iomapping cache to the struct (fuse_)inode and
that i_size and i_mode are stable.   So now there are weird cache
coherency issues across hardlinked files and I've had to disable op_link
to quiet down fstests.

I'm also chasing down a coherency problem between open(..., O_TRUNC) and
pagecache writeback, but that's most likely my own fault since that's
all in my new iomap code. :P

> > (Note that this is inside another fuse mount as the system I'm running the
> > tests on has 1K blocks so I cannot create a 5T sparse file to contain the
> > filesystem)
> 
> So a quick free warning --- there are real performance issues relating
> to using a 1k block file system.  It's fine for small devices where
> you are trying to squeeze out every bit of storage efficiency, so that
> files that are only say, 128 bytes consume a single 1k block instead
> of a 4k block.  But in general, for general purpose file systems,
> there are a lot of downsides with using a 1k block file system, and in
> my opinions the costs far exceed the benefits.
> 
> Cheers,
> 
>                                               - Ted
> 
> commit 34b2a4a1f9794498ca403393003cc5840c240d42
> Author: Theodore Ts'o <[email protected]>
> Date:   Mon Jun 9 11:30:48 2025 -0400
> 
>     libext2fs: fix integer overflow in ext2fs_punch() when releasing more 
> than 2**31 blocks
>     
>     Addresses-Debian-Bug: #1106241
>     Signed-off-by: Theodore Ts'o <[email protected]>
> 
> diff --git a/lib/ext2fs/punch.c b/lib/ext2fs/punch.c
> index e2543e1e..80c699eb 100644
> --- a/lib/ext2fs/punch.c
> +++ b/lib/ext2fs/punch.c
> @@ -193,10 +193,10 @@ static void dbg_print_extent(char *desc, struct 
> ext2fs_extent *extent)
>  static errcode_t punch_extent_blocks(ext2_filsys fs, ext2_ino_t ino,
>                                    struct ext2_inode *inode,
>                                    blk64_t lfree_start, blk64_t free_start,
> -                                  __u32 free_count, int *freed)
> +                                  __u32 free_count, blk64_t *freed)
>  {
>       blk64_t         pblk;
> -     int             freed_now = 0;
> +     __u32           freed_now = 0;
>       __u32           cluster_freed;
>       errcode_t       retval = 0;
>  
> @@ -271,7 +271,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, 
> ext2_ino_t ino,
>       errcode_t               retval;
>       blk64_t                 free_start, next, lfree_start;
>       __u32                   free_count, newlen;
> -     int                     freed = 0;
> +     blk64_t                 freed = 0;
>       int                     op;
>  
>       retval = ext2fs_extent_open2(fs, ino, inode, &handle);
> @@ -442,7 +442,7 @@ static errcode_t ext2fs_punch_extent(ext2_filsys fs, 
> ext2_ino_t ino,
>               if (retval)
>                       goto errout;
>       }
> -     dbg_printf("Freed %d blocks\n", freed);
> +     dbg_printf("Freed %llu blocks\n", freed);
>       retval = ext2fs_iblk_sub_blocks(fs, inode, freed);
>  errout:
>       ext2fs_extent_free(handle);

Bug#1106241: e2fsprogs: fuse2fs - rm fails for very large files

Reply via email to