https://bugs.kde.org/show_bug.cgi?id=502968

--- Comment #7 from mcer...@redhat.com ---
Created attachment 182650
  --> https://bugs.kde.org/attachment.cgi?id=182650&action=edit
proposed patch

(In reply to Mark Wielaard from comment #5)
> > From 3b6d22623c3fd9f22b47957b2c3255a4517e3d69 Mon Sep 17 00:00:00 2001
> > From: Martin Cermak <mcer...@redhat.com>
> > Date: Thu, 19 Jun 2025 16:39:24 +0200
> > Subject: [PATCH] Wrap linux specific syscalls 457 (listmount) and 458 
> > (statmount)
> >
> > The listmount syscall returns a list of mount IDs under the req.mnt_id.
> > This is meant to be used in conjuction with statmount(2) in order to
> > provide a way to iterate and discover mounted file systems.
> 
> s/conjuction/conjunction/

fixed

> > The statmount syscall returns information about a mount, storing it in
> > the buffer pointed to by smbuf.  The returned buffer is a struct
> > statmount which is of size bufsize with the fields filled in as
> > described below.
> 
> Described below?

fixed

> > Declare a sys_{lis,sta}tmount wrapper in priv_syswrap-linux.h and hook it
> > for {amd64,arm,arm64,mips64,nanomips,ppc32,ppc64,riscv64,s390x,x86}-linux
> > using LINXY with PRE and POST handler in syswrap-linux.c
> 
> Which is simple because the syscall numbers are shared between arches.
> 
> > Both syscalls need CAP_SYS_ADMIN, to successfully test.
> 
> Aha, I was wondering about that.
> 
> Please also add https://bugs.kde.org/show_bug.cgi?id=502968 to the commit
> message
> 
> > diff --git a/NEWS b/NEWS
> > index 97e4b3b41..837f1e0a6 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -42,6 +42,7 @@ are not entered into bugzilla tend to get forgotten about 
> > or ignored.
> >  504936  Add FreeBSD amd64 sysarch subcommands AMD64_SET_TLSBASE and
> >          AMD64_GET_TLSBASE
> >  505228  Wrap linux specific mseal syscall
> >+502968  Wrap linux specific syscalls 457 (listmount) and 458 (statmount)
> 
> Ack, thanks.
> 
> > diff --git a/coregrind/m_syswrap/priv_syswrap-linux.h 
> > b/coregrind/m_syswrap/priv_syswrap-linux.h
> > index ed8cb4ed5..9e6cb8981 100644
> > --- a/coregrind/m_syswrap/priv_syswrap-linux.h
> > +++ b/coregrind/m_syswrap/priv_syswrap-linux.h
> > @@ -355,6 +355,10 @@ DECL_TEMPLATE(linux, sys_pidfd_getfd);
> >  // Since Linux 6.6
> >  DECL_TEMPLATE(linux, sys_fchmodat2);
> >  
> > +// Since Linux 6.8
> > +DECL_TEMPLATE(linux, sys_listmount);
> > +DECL_TEMPLATE(linux, sys_statmount);
> 
> OK.
>  
> > +   LINXY(__NR_statmount,         sys_statmount),         // 457
> > +   LINXY(__NR_listmount,         sys_listmount),         // 458
> 
> Looks ok for all arches.
>  
> > diff --git a/coregrind/m_syswrap/syswrap-linux.c 
> > b/coregrind/m_syswrap/syswrap-linux.c
> > index 0db871778..18be21346 100644
> > --- a/coregrind/m_syswrap/syswrap-linux.c
> > +++ b/coregrind/m_syswrap/syswrap-linux.c
> > @@ -4305,6 +4305,50 @@ PRE(sys_mseal)
> >         SET_STATUS_Failure(VKI_ENOMEM);
> >  }
> > 
> > +PRE(sys_statmount)
> > +{
> > +   // int syscall(SYS_statmount, struct mnt_id_req *req,
> > +   //             struct statmount *smbuf, size_t bufsize,
> > +   //             unsigned long flags);
> 
> I think we may want a:
> *flags |= SfMayBlock;
> or maybe just FUSE_COMPATIBLE_MAY_BLOCK();
> since it might require switching threads/block when the file system is in
> userspace/this process.

OK, understood why this is needed, added!

> > +   PRINT("sys_statmount ( %#" FMT_REGWORD "x, %#" FMT_REGWORD "x, %lu,  
> > %#" FMT_REGWORD "x)", ARG1, ARG2, ARG3, ARG4);
> > +   PRE_REG_READ4(long, "statmount", struct vki_mnt_id_req *, req, struct 
> > vki_statmount *, smbuf, vki_size_t, bufsize, unsigned long, flags);
> > +   if (ARG1 != 0) {
> > +      PRE_MEM_READ("statmount(req)", ARG1, sizeof(struct vki_mnt_id_req));
> > +   }
> 
> Is req allowed to me NULL? Does statmount function correctly if it is?
> 
> > +   if ((ARG2 != 0) && (ARG3 > 0)) {
> > +      PRE_MEM_WRITE("statmount(smbuf)", ARG2, ARG3);
> > +   }
> 
> Same question for smbuf.
> 
> If not, then producing a warning here is the right thing to do (instead of
> ignoring it).

Understood, added warnings.

> > +}
> > +
> +POST(sys_statmount)
> +{
> +   if ((ARG2 != 0) && (ARG3 > 0)) {
> +      POST_MEM_WRITE(ARG2, ARG3);
> +   }
> +}
> 
> Same question as above. Also I think this should be something like:
> 
> struct vki_statmount *smbuf = (struct vki_statmount *)(Addr)ARG2;
> POST_MEM_WRITE( (Addr)smbuf, smbuf->size );

Understood, fixed.

> Since the kernel only seems to guarantee to fill in smbuf up to that size.
> 
> Sorry, have to return to the rest of the review later. But hopefully this
> already helps.

(In reply to Mark Wielaard from comment #6)
> Review part 2:
> 
> > +PRE(sys_listmount)
> > +{
> > +   // int syscall(SYS_listmount, struct mnt_id_req *req,
> > +   //             uint64_t *mnt_ids, size_t nr_mnt_ids,
> > +   //             unsigned long flags);
> > +   PRINT("sys_listmount ( %#" FMT_REGWORD "x, %#" FMT_REGWORD "x, %lu,  
> > %#" FMT_REGWORD "x)", ARG1, ARG2, ARG3, ARG4);
> > +   PRE_REG_READ4(long, "listmount", struct vki_mnt_id_req *, req, 
> > vki_uint64_t *, mnt_ids, vki_size_t, nr_mnt_ids, unsigned long, flags);
> 
> OK.
> Like with statmount we might want to use *flags |= SfMayBlock;
> or maybe FUSE_COMPATIBLE_MAY_BLOCK();
> 
> > +   if (ARG1 != 0) {
> > +      PRE_MEM_READ("listmount(req)", ARG1, sizeof(struct vki_mnt_id_req));
> > +   }
> > +   if ((ARG2 != 0) && (ARG3 > 0)) {
> > +      PRE_MEM_WRITE("listmount(mnt_ids)", ARG2, ARG3 * 
> > sizeof(vki_uint64_t));
> > +   }
> 
> As with statmount, are ARG1 and ARG2 allowed to be NULL?
> If not then just always do the check.

Understood, fixed.

> > +}
> > +
> > +POST(sys_listmount)
> > +{
> > +   if ((ARG2 != 0) && (ARG3 > 0)) {
> > +      POST_MEM_WRITE(ARG2, ARG3 * sizeof(vki_uint64_t));
> > +   }
> > +}
> 
> Likewise.

Likewise in a sense that NULL value of ARG2 needs to be checked?  Not sure I'm
getting this right.  Isn't this the case that NULL value SEGVs whole thing, and
then POST doesn't apply (because POST is /by default/ only exec'd for
successful syscalls) ?

> And I think it should be POST_MEM_WRITE(ARG2, RES * sizeof(vki_uint64_t));
> ARG3 is the maximum, RES is the actual number of mnt_ids put in.
> 
> > +struct vki_mnt_id_req {
> > +   __vki_u32 size;
> > +   __vki_u32 spare;
> > +   __vki_u64 mnt_id;
> > +   __vki_u64 param;
> > +   __vki_u64 mnt_ns_id;
> > +};
> 
> O. I see what that size field is for. There are multiple versions of
> this struct. The second variant is bigger than the first (has the
> mnt_ns_id field added). See
> https://lore.kernel.org/all/49930bdce29a8367a213eb14c1e68e7e49284f86.
> 1719243756.git.jo...@toxicpanda.com/
> 
> So where we do
> PRE_MEM_READ("statmount(req)", ARG1, sizeof(struct vki_mnt_id_req));
> or
> PRE_MEM_READ("listmount(req)", ARG1, sizeof(struct vki_mnt_id_req));
> 
> We really should check just the struct size from user space:
> 
> struct vki_mnt_id_req *req = (struct vki_mnt_id_req *)(Addr)ARG1;
> PRE_MEM_READ("statmount(req)", ARG1, req->size);

aha, fixed!

> > +struct vki_statmount {
> > +     __vki_u32 size;         /* Total size, including strings */
> > +     __vki_u32 mnt_opts;             /* [str] Mount options of the mount */
> > +     __vki_u64 mask;         /* What results were written */
> > +     __vki_u32 sb_dev_major; /* Device ID */
> > +     __vki_u32 sb_dev_minor;
> > +     __vki_u64 sb_magic;             /* ..._SUPER_MAGIC */
> > +     __vki_u32 sb_flags;             /* 
> > SB_{RDONLY,SYNCHRONOUS,DIRSYNC,LAZYTIME} */
> > +     __vki_u32 fs_type;              /* [str] Filesystem type */
> > +     __vki_u64 mnt_id;               /* Unique ID of mount */
> > +     __vki_u64 mnt_parent_id;        /* Unique ID of parent (for root == 
> > mnt_id) */
> > +     __vki_u32 mnt_id_old;   /* Reused IDs used in proc/.../mountinfo */
> > +     __vki_u32 mnt_parent_id_old;
> > +     __vki_u64 mnt_attr;             /* MOUNT_ATTR_... */
> > +     __vki_u64 mnt_propagation;      /* 
> > MS_{SHARED,SLAVE,PRIVATE,UNBINDABLE} */
> > +     __vki_u64 mnt_peer_group;       /* ID of shared peer group */
> > +     __vki_u64 mnt_master;   /* Mount receives propagation from this ID */
> > +     __vki_u64 propagate_from;       /* Propagation from in current 
> > namespace */
> > +     __vki_u32 mnt_root;             /* [str] Root of mount relative to 
> > root of fs */
> > +     __vki_u32 mnt_point;    /* [str] Mountpoint relative to current root 
> > */
> > +     __vki_u64 mnt_ns_id;    /* ID of the mount namespace */
> > +     __vki_u64 __spare2[49];
> > +     char str[];             /* Variable size part containing strings */
> > +};
> 
> OK.
> 
> > diff --git a/include/vki/vki-scnums-shared-linux.h 
> > b/include/vki/vki-scnums-shared-linux.h
> > index 32ef8ac13..85bf9d171 100644
> > --- a/include/vki/vki-scnums-shared-linux.h
> > +++ b/include/vki/vki-scnums-shared-linux.h
> > @@ -56,6 +56,8 @@
>  
> >  #define __NR_cachestat               451
> >  #define __NR_fchmodat2               452
> > +#define __NR_statmount          457
> > +#define __NR_listmount          458
> >  #define __NR_mseal           462
> 
> Looks good.

Thanks a ton, Mark, for your review.  Please check my updated patch.

-- 
You are receiving this mail because:
You are watching all bug changes.

Reply via email to