On Wed, Mar 04, 2026 at 01:57:31PM -0500, Demi Marie Obenour wrote:
> On 3/4/26 08:03, Christian Brauner wrote:
> > On Wed, Mar 04, 2026 at 01:53:42AM -0500, Demi Marie Obenour wrote:
> >> I noticed potentially missing input sanitization in dma_buf_set_name(),
> >> which is reachable from DMA_BUF_SET_NAME.  This allows inserting a name
> >> containing a newline, which is then used to construct the contents of
> >> /proc/PID/task/TID/fdinfo/FD.  This could confuse userspace programs
> >> that access this data, possibly tricking them into thinking a file
> >> descriptor is of a different type than it actually is.
> >>
> >> Other code might have similar bugs.  For instance, there is code that
> >> uses a sysfs path, a driver name, or a device name from /dev.  It is
> >> possible to sanitize the first, and the second and third should come
> >> from trusted sources within the kernel itself.  The last area where
> >> I found a potential problem is BPF.  I don't know if this can happen.
> >>
> >> I think this should be fixed by either sanitizing data on write
> >> (by limiting the allowed characters in dma_buf_set_name()), on read
> >> (by using one of the formats that escapes special characters), or both.
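The two options above (restricting the allowed characters on write, escaping special characters on read) can be sketched in userspace C. This is a minimal illustration under stated assumptions, not kernel code: name_is_safe() and escape_name() are hypothetical helpers, the "printable ASCII only" policy is an assumption, and the three-digit octal escape (e.g. "\012" for a newline) is only modeled on the style /proc/*/mountinfo uses for spaces in paths.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Write-side check (hypothetical policy): accept only printable ASCII,
 * so a stored name can never contain '\n' or other control bytes. */
static bool name_is_safe(const char *name)
{
	for (const unsigned char *p = (const unsigned char *)name; *p; p++)
		if (!isprint(*p))
			return false;
	return true;
}

/* Read-side escaping (hypothetical helper): copy `name` into `out`,
 * replacing backslash and every non-printable byte with a three-digit
 * octal escape. Returns the length written (excluding the NUL), or -1
 * if `out` is too small. */
static int escape_name(const char *name, char *out, size_t outlen)
{
	size_t n = 0;

	for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
		if (isprint(*p) && *p != '\\') {
			if (n + 1 >= outlen)
				return -1;
			out[n++] = (char)*p;
		} else {
			/* 4 bytes for "\ooo" plus room for the final NUL */
			if (n + 4 >= outlen)
				return -1;
			snprintf(out + n, outlen - n, "\\%03o", (unsigned int)*p);
			n += 4;
		}
	}
	if (n >= outlen)
		return -1;
	out[n] = '\0';
	return (int)n;
}
```

With this escaping, a name such as "a\nb" is emitted as a\012b, so it can no longer start what looks like a new key/value line in fdinfo.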
> >>
> >> Is there a better way to identify that a file descriptor is of
> >> a particular type, such as an eventfd?  fdinfo is subject to
> > 
> > The problem is that most of the anonymous inodes share a single
> > anonymous inode so any uapi that returns information based on inode->i_op
> > is not going to be usable.
> > 
> >> bugs of this type, which might happen again.  readlink() reports
> >> "anon_inode:[eventfd]" and S_IFMT reports a mode of 0, but my
> > 
> > That is definitely uapi by now. We've tried to change S_IFMT and it
> > breaks lsfd and other tools so we can't reasonably change it. In fact,
> > pidfds pretend to be anon_inode even though they're not simply because
> > some tools parse that out.
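Since the readlink() text is effectively uapi, a tool could read it directly. A minimal sketch, assuming /proc is mounted; anon_inode_name() is a hypothetical helper, not an existing library function. It only strips the "anon_inode:" prefix and returns whatever name the subsystem registered (e.g. "[eventfd]"), and, as noted above, pidfds also report themselves as anon_inode, so the result is a hint rather than a strong type guarantee.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read the /proc/self/fd/<fd> link target and, if it starts with
 * "anon_inode:", copy the remainder (e.g. "[eventfd]") into `name`.
 * Returns 0 on success, -1 if the link cannot be read, is not reported
 * as an anonymous inode, or does not fit in `name`. */
static int anon_inode_name(int fd, char *name, size_t namelen)
{
	static const char prefix[] = "anon_inode:";
	const size_t plen = sizeof(prefix) - 1;
	char path[64], target[256];
	ssize_t len;

	snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
	len = readlink(path, target, sizeof(target) - 1);
	if (len < 0)
		return -1;
	target[len] = '\0'; /* readlink() does not NUL-terminate */

	if (strncmp(target, prefix, plen) != 0)
		return -1; /* e.g. "pipe:[...]" or a regular path */

	if (strlen(target + plen) + 1 > namelen)
		return -1;
	strcpy(name, target + plen);
	return 0;
}
```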
> 
> Does Linux guarantee that anything that is not an anonymous inode
> will have (st_mode & S_IFMT) != 0?

Ignoring bugs or disk corruption, anonymous inodes should be the only
inodes with a zero file type. Everything else should have a non-zero
type, and I made the VFS splat in may_open() if that is violated:

          switch (inode->i_mode & S_IFMT) {
          case S_IFLNK:
                  return -ELOOP;
          case S_IFDIR:
                  if (acc_mode & MAY_WRITE)
                          return -EISDIR;
                  if (acc_mode & MAY_EXEC)
                          return -EACCES;
                  break;
          case S_IFBLK:
          case S_IFCHR:
                  if (!may_open_dev(path))
                          return -EACCES;
                  fallthrough;
          case S_IFIFO:
          case S_IFSOCK:
                  if (acc_mode & MAY_EXEC)
                          return -EACCES;
                  flag &= ~O_TRUNC;
                  break;
          case S_IFREG:
                  if ((acc_mode & MAY_EXEC) && path_noexec(path))
                          return -EACCES;
                  break;
          default:
                  VFS_BUG_ON_INODE(!IS_ANON_FILE(inode), inode);
          }

> Maybe it is time for a prctl that disables this legacy behavior?

I switched anonymous inodes internally to S_IFREG a while ago in [1]
and masked the type off for userspace. Even the internal conversion
alone caused regressions in various subsystems, such as io_uring, which
is why we reverted it in [2].

So any next attempt needs to ensure that there are no internal and no
external regressions. And no prctl()s please. It's a strong contender
for Linux' main landfill next to procfs.

Ideally we'd just look at lsfd and lsof and move them away from any type
assertions. I have asked them to do that for pidfds a while ago and they
have merged a patch to that effect.

[1]: cfd86ef7e8e7 ("anon_inode: use a proper mode internally")
[2]: 1e7ab6f67824 ("anon_inode: rework assertions")
