On Wed, Sep 17, 2025 at 06:06:55PM -0400, Brian Song wrote:
> 
> 
> On 9/17/25 9:01 AM, Hanna Czenczek wrote:
> > On 15.09.25 07:43, Brian Song wrote:
> > > Hi Hanna,
> > 
> > Hi Brian!
> > 
> > (Thanks for your heads-up!)
> > 
> > > Stefan raised the above issue and proposed a preliminary solution: keep
> > > closing the file descriptors in the delete section, but perform umount
> > > separately, in the shutdown section for FUSE uring and in the delete
> > > section for traditional FUSE. This approach avoids the race condition
> > > on the file descriptor.
> > > 
> > > In the case of FUSE uring, umount must be performed in the shutdown
> > > section. The reason is that the kernel currently lacks an interface to
> > > explicitly cancel submitted SQEs. Performing umount forces the kernel to
> > > flush all pending SQEs and return their CQEs. Without this step, CQEs
> > > may arrive after the export has already been deleted, and invoking the
> > > CQE handler at that point would dereference freed memory and trigger a
> > > segmentation fault.
> > 
> > The commit message says that incrementing the BB reference would be
> > enough to solve the problem (i.e. deleting is delayed until all requests
> > are done).  Why isn’t it?
> 
> Hanna:
> 
> If we place umount in the delete section instead of the shutdown section,
> the kernel FUSE driver keeps waiting for the userspace server to handle
> FUSE requests and therefore never returns CQEs to userspace. As a result,
> the BB reference remains held (it is acquired during registration and
> submission and only released once the CQE returns), which prevents the
> delete operation from ever being invoked, since delete only runs once the
> reference count drops to 0. This is why umount must be placed in the
> shutdown section.
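> 
> To make that dependency explicit, roughly (the two function names below
> are placeholders rather than what the series actually uses, and
> blk_exp_ref()/blk_exp_unref() stand in for whichever reference the patch
> takes):
> 
>     /* submission side: hold a reference for every ring entry in flight */
>     static void fuse_uring_submit_entry(FuseExport *exp /* , ... */)
>     {
>         blk_exp_ref(&exp->common);
>         /* ... register/submit the SQE ... */
>     }
> 
>     /* completion side: only reached once the kernel produces a CQE */
>     static void fuse_uring_cqe_handler(FuseExport *exp /* , ... */)
>     {
>         /* ... handle the completed request ... */
>         blk_exp_unref(&exp->common);   /* last unref lets delete run */
>     }
> 
> Without the umount in shutdown the kernel never completes those entries,
> so the unref in the CQE handler never runs and fuse_export_delete() is
> unreachable.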
> 
> > 
> > > I’m curious about traditional FUSE: is it strictly necessary to perform
> > > umount in the delete section, or could it also be done in shutdown?
> > 
> > Looking into libfuse, fuse_session_unmount() (in fuse_kern_unmount())
> > closes the FUSE FD.  I can imagine that might result in the potential
> > problems Stefan described.
> > 
> > > Additionally, what is the correct ordering between close(fd) and
> > > umount: does one need to precede the other?
> > 
> > fuse_kern_unmount() closes the (queue 0) FD first before actually
> > unmounting, with a comment: “Need to close file descriptor, otherwise
> > synchronous umount would recurse into filesystem, and deadlock.”
> > 
> > Given that, I assume the FDs should all be closed before unmounting.
> > 
> > (Though to be fair, before looking into it now, I don’t think I’ve ever
> > given it much thought…)
> > 
> > Hanna
> > 
> Stefan:
> 
> I roughly went through the umount and close system calls:
> 
> umount:
> fuse_kill_sb_anon -> fuse_sb_destroy -> fuse_abort_conn
> 
> close:
> __fput -> file->f_op->release(inode, file) -> fuse_dev_release ->
> fuse_abort_conn
> (this only runs after all /dev/fuse FDs have been closed).
> 
> And as Hanna mentioned, libfuse points out: “Need to close file descriptor,
> otherwise synchronous umount would recurse into filesystem, and deadlock.”
> 
> So ideally, we should close each queue FD first, then call umount at the end
> — even though calling umount directly also works. The root issue is that the
> kernel doesn't provide an interface to cancel already submitted SQEs.
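> 
> Roughly the ordering I have in mind (the queue/mountpoint fields are as
> in the patch; calling umount2() directly instead of going through
> fuse_session_unmount() is only for illustration):
> 
>     /* close every /dev/fuse FD first ... */
>     for (size_t i = 0; i < exp->num_queues; i++) {
>         if (exp->queues[i].fuse_fd >= 0) {
>             close(exp->queues[i].fuse_fd);
>             exp->queues[i].fuse_fd = -1;
>         }
>     }
> 
>     /* ... then unmount; with a live FD a synchronous umount could
>      * recurse into the filesystem and deadlock, per the libfuse comment */
>     umount2(exp->mountpoint, UMOUNT_NOFOLLOW);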

Hi Bernd,
I wanted to check with you to see if you have thought more about
ASYNC_CANCEL support for FUSE-over-io_uring SQEs?

If you don't have time to implement it, maybe you could share your
thoughts on how one would go about doing this? That would be a nice
starting point if someone else wants to try it out.

Thanks,
Stefan

> 
> You mentioned that in fuse over io_uring mode we perform close in the
> shutdown path, but at that point the server may still be processing
> requests. While handling a request it could still write to that FD
> number, which by then might no longer refer to /dev/fuse if the number
> has been reused. I'm not sure how this can be triggered in practice,
> though, since in fuse uring mode all FUSE requests are submitted and
> completed through io_uring rather than by writing to the FD. Once
> shutdown closes the FD, the kernel may call fuse_abort_conn, which
> terminates all request processing, and there is locking in place to
> protect the termination of requests and the subsequent uring cleanup.
> 
> That’s why I think the best approach for now is:
> 
> in shutdown, handle close and umount for fuse over io_uring;
> 
> in delete, handle close and umount for traditional FUSE.
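> 
> As a rough sketch of that split (is_uring and the two teardown helpers
> are made-up names here; the real code can structure this differently):
> 
>     static void fuse_export_shutdown(BlockExport *blk_exp)
>     {
>         FuseExport *exp = container_of(blk_exp, FuseExport, common);
> 
>         if (exp->is_uring) {
>             /* umount forces the kernel to flush the registered ring
>              * entries and return their CQEs, so the remaining
>              * references can actually be dropped */
>             fuse_export_close_fds(exp);
>             fuse_export_umount(exp);
>         }
>     }
> 
>     static void fuse_export_delete(BlockExport *blk_exp)
>     {
>         FuseExport *exp = container_of(blk_exp, FuseExport, common);
> 
>         if (!exp->is_uring) {
>             /* traditional FUSE: nothing is in flight once the last
>              * reference is gone, so tearing down here is safe */
>             fuse_export_close_fds(exp);
>             fuse_export_umount(exp);
>         }
>     }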
> 
> > > Thanks,
> > > Brian
> > > 
> > > On 9/9/25 3:33 PM, Stefan Hajnoczi wrote:
> > >   > On Fri, Aug 29, 2025 at 10:50:24PM -0400, Brian Song wrote:
> > >   >> @@ -901,24 +941,15 @@ static void fuse_export_shutdown(BlockExport *blk_exp)
> > >   >>            */
> > >   >>           g_hash_table_remove(exports, exp->mountpoint);
> > >   >>       }
> > >   >> -}
> > >   >> -
> > >   >> -static void fuse_export_delete(BlockExport *blk_exp)
> > >   >> -{
> > >   >> -    FuseExport *exp = container_of(blk_exp, FuseExport, common);
> > >   >>
> > >   >> -    for (int i = 0; i < exp->num_queues; i++) {
> > >   >> +    for (size_t i = 0; i < exp->num_queues; i++) {
> > >   >>           FuseQueue *q = &exp->queues[i];
> > >   >>
> > >   >>           /* Queue 0's FD belongs to the FUSE session */
> > >   >>           if (i > 0 && q->fuse_fd >= 0) {
> > >   >>               close(q->fuse_fd);
> > >   >
> > >   > This changes the behavior of the non-io_uring code. Now all fuse fds
> > >   > and fuse_session are closed while requests are potentially still
> > >   > being processed.
> > >   >
> > >   > There is a race condition: if an IOThread is processing a request
> > >   > here then it may invoke a system call on q->fuse_fd just after it has
> > >   > been closed but not set to -1. If another thread has also opened a
> > >   > new file then the fd could be reused, resulting in an accidental
> > >   > write(2) to the new file. I'm not sure whether there is a way to
> > >   > trigger this in practice, but it looks like a problem waiting to
> > >   > happen.
> > >   >
> > >   > Simply setting q->fuse_fd to -1 here doesn't fix the race. It would
> > >   > be necessary to stop processing fuse_fd in the thread before closing
> > >   > it here or to schedule a BH in each thread so that fuse_fd can be
> > >   > closed in the thread that uses the fd.
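> > >   >
> > >   > For example, a rough sketch of the BH approach, assuming each
> > >   > FuseQueue keeps a pointer to the AioContext it runs in (q->ctx and
> > >   > the helper name are made up):
> > >   >
> > >   >     /* runs in the queue's own thread, so nothing in that thread
> > >   >      * can be mid-syscall on fuse_fd when it gets closed */
> > >   >     static void fuse_queue_close_fd_bh(void *opaque)
> > >   >     {
> > >   >         FuseQueue *q = opaque;
> > >   >
> > >   >         if (q->fuse_fd >= 0) {
> > >   >             close(q->fuse_fd);
> > >   >             q->fuse_fd = -1;
> > >   >         }
> > >   >     }
> > >   >
> > >   >     /* from the main loop; queue 0's FD still belongs to the FUSE
> > >   >      * session and is left alone here */
> > >   >     for (size_t i = 1; i < exp->num_queues; i++) {
> > >   >         FuseQueue *q = &exp->queues[i];
> > >   >
> > >   >         aio_bh_schedule_oneshot(q->ctx, fuse_queue_close_fd_bh, q);
> > >   >     }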
> > > 
> > 
> 
