On 2026-04-07, Mateusz Guzik <[email protected]> wrote:
> On Thu, Apr 2, 2026 at 4:52 AM Aleksa Sarai <[email protected]> wrote:
> >
> > On 2026-04-01, Mateusz Guzik <[email protected]> wrote:
> > > Trying to handle this in open() is a no-go. openat2 is rather
> > > problematic.
> >
> > I'm interested in what makes you say that. It would be very nice to be able
> > to do mkdir + RESOLVE_IN_ROOT and get an fd back all in one syscall. :D
> >
>
> Not handling this in either of open or openat2 does not preclude mkdir
> + RESOLVE_IN_ROOT + getting a fd in one go from existing.

Well, that would also require passing RESOLVE_* flags to mkdirat2(2)
which kind of begs the question why not just integrate it into
openat2(2) -- otherwise there will always be more features available to
O_CREAT than mkdirat2(2) which seems unfortunate.

> Creating a directory was always a different syscall than creating a
> file. I don't see any benefit to squeezing it into open. I do see a
> downside because of an extra branchfest to differentiate the cases.

Ah, so it's just an issue of taste, not a technical problem (as the mail
I replied to made it sound)?

> > > The routine would have to start with validating the passed O_ flags, for
> > > now only allowing O_CLOEXEC and EINVAL-ing otherwise.
> >
> > Please do not use O_* flags! O_CLOEXEC takes up 3 flag bits on different
> > architectures which makes adding new flags a nightmare.
> >
>
> With my proposal there are no new flags added so I don't think that's
> relevant.

I'm confused, was "the new routine would have to start with validating
the passed O_ flags" talking about a hypothetical API you oppose? It
read like a suggestion on my first pass-through, hence the reply.

If you're saying that your proposal doesn't add any new O_* (or
MKDIRAT_*) flags that really isn't the issue -- any syscall that takes a
flag argument will grow new flags eventually and using the literal value
of O_CLOEXEC for some other syscall's flags just leads to burning three
flag bits needlessly. This is arguably the most painful thing about
open_tree(2)'s flags -- most other syscalls define their own flag that
is equivalent to O_CLOEXEC but not literally equal to it (this is even
recommended in Documentation/process/adding-syscalls.rst!).

> > I think this should take AT_* flags and (like most newer syscalls)
> > O_CLOEXEC should be automatically set. Userspace can unset it with
> > fcntl(F_SETFD) in the relatively rare case where they don't want
> > O_CLOEXEC. Alternatively, we could just bite the bullet and make
> > AT_NO_CLOEXEC a thing...
> >
>
> I would say that's a pretty weird discrepancy vs what normally happens
> with other syscalls, but perhaps it would be fine.

Quite a few of the newer uAPIs do this -- all of the pidfd APIs do it,
as well as newer ioctls that return fds (like the NS_GET_* ioctls for
nsfs).

Clearing O_CLOEXEC safely is trivial but safely setting it is not really
possible in multi-threaded programs (see "man 2 openat"), so it makes
more sense for newer APIs to just default to O_CLOEXEC and userspace can
unset it (and that is what newer APIs already do).

We should probably update Documentation/process/adding-syscalls.rst to
mention this...

-- 
Aleksa Sarai
https://www.cyphar.com/

