[The last time that was discussed a year ago; rather than posting it as a followup into the old thread, I'm starting a new one. Old thread can be found at https://lore.kernel.org/all/20240922004901.GA3413968@ZenIV/]
There are two filename-related problems in io_uring and its interplay with audit. Filenames are imported when request is submitted and used when it is processed. Unfortunately, with io_uring the latter may very well happen in a different thread. In that case the reference to filename is put into the wrong audit_context - that of submitting thread, not the processing one. Audit logics is called by the latter, and it really wants to be able to find the names in audit_context of current (== processing) thread. Another related problem is the headache with refcounts - normally all references to given struct filename are visible only to one thread (the one that uses that struct filename). io_uring violates that - an extra reference is stashed in audit_context of submitter. It gets dropped when submitter returns to userland, which can happen simultaneously with processing thread deciding to drop the reference it got. We paper over that by making refcount atomic, but that means pointless overhead for everyone. One more headache with refcount comes from retry loops - handling of -ESTALE while resolving a parent is dealt with transparently, but server might get around to telling you that things got stale only when you e.g. ask it to remove a link. In that case we have to repeat the pathwalk in "trust no one" mode and see if it helps. So far, so good, but depending upon the helpers we are using we might end up re-importing the pathname from userland. Had that been merely duplicated work on an already slow path, it wouldn't matter much, but with audit in the mix it becomes seriously confusing. Currently getname() and its ilk try to cope with that (only if audit is enabled) by stashing the userland address in struct filename and checking if an instance for the same userland address has already been made visible to audit, in which case we just grab an extra reference. That's bogus for several reasons. In particular, having the same pointer passed in different pathname arguments of a syscall should not be different from having separate strings with identical context given to it. Compiler might turn the latter into the former, after all - merging identical string literals may happen. As the result, audit might produce significantly different outputs on the same program, depending upon the compiler flags used when building it. This is obviously not a good thing. The fact that this logics is dependent upon CONFIG_AUDIT also doesn't help. The right solution is to have the pathname imported once, before the retry loop; fortunately, most of those loops are already done that way these days - only 9 exceptions in the entire kernel. With those exceptions taken care of, we can get rid of the entire "stash the userland address in struct filename" thing. That helps to solve both io_uring problems - import of pathname in there is already separated from making use of it (the former happens in submitting thread, the latter - in processing one). If we explicitly mark the places where we start making use of those suckers (in io_mkdirat(), etc.), we can have the submitter do "incomplete" imports (just copying the name from userland and stashing the result in opaque object). Then processing thread would explicitly ask to complete the import and use the resulting struct filename *, same as a normal syscall would - all in the thread that does actual work, so that situation looks normal for audit - the damn thing goes into the right audit_context, all uses are thread-synchronous and from the same thread, etc. What's more, refcount can become non-atomic (and grabbed only inside the kernel/auditsc.c, at that). The series below attempts to do that. It does need a serious review, including that from audit and io_uring folks. It lives in git://git.kernel.org:/pub/scm/linux/kernel/git/viro/vfs.git #work.filename-refcnt Individual patches in followups. Al Viro (12): do_faccessat(): import pathname only once do_fchmodat(): import pathname only once do_fchownat(): import pathname only once do_utimes_path(): import pathname only once chdir(2): import pathname only once chroot(2): import pathname only once user_statfs(): import pathname only once do_sys_truncate(): import pathname only once do_readlinkat(): import pathname only once get rid of audit_reusename() allow incomplete imports of filenames struct filename ->refcnt doesn't need to be atomic Mateusz Guzik (1): fs: touch up predicts in putname() fs/namei.c | 68 +++++++++++++++++++++------- fs/open.c | 39 +++++++++------- fs/stat.c | 6 +-- fs/statfs.c | 4 +- fs/utimes.c | 13 +++--- include/linux/audit.h | 11 ----- include/linux/fs.h | 17 ++++--- io_uring/fs.c | 101 ++++++++++++++++++++++-------------------- io_uring/openclose.c | 16 +++---- io_uring/statx.c | 17 +++---- io_uring/xattr.c | 30 +++++-------- kernel/auditsc.c | 23 ++-------- 12 files changed, 178 insertions(+), 167 deletions(-) -- 2.47.3
