+cc Linus

On Sat, Jun 22, 2024 at 06:56:08PM +0800, Xi Ruoyao wrote:
> It's cheap to check if the path is empty in the userspace, but expensive
> to check if a userspace string is empty from the kernel.  So using statx
> and AT_EMPTY_PATH to implement fstat is slower than a "native" fstat
> call.  But for arch/loongarch fstat does not exist so we have to use
> statx, and on all 32-bit architectures we must use statx after 2037.
> And seccomp also cannot audit AT_EMPTY_PATH properly because it cannot
> check if path is empty.
> 
> To resolve these issues, add a relaxed version of AT_EMPTY_PATH: it does
> not check if the path is empty, but just assumes the path is empty and
> then behaves like AT_EMPTY_PATH.
> 
> Link: https://sourceware.org/pipermail/libc-alpha/2023-September/151364.html
> Link: 
> https://lore.kernel.org/loongarch/599df4a3-47a4-49be-9c81-8e21ea1f9...@xen0n.name/

imo the thing to do is to add fstat for your arch and add fstatx for
everyone.

My argument for fstatx specifically is that Rust uses statx instead of
fstat and is growing in popularity.

To sum up the problem: stat and statx met with "" + AT_EMPTY_PATH have
more work to do than fstat and its hypotethical statx counterpart:
- buf alloc/free for the path
- userspace access (very painful on x86_64 + SMAP)
- lockref acquire/release

(and other things concerning lookup itself which I'm going to ignore
here)

Your patch avoids the peek at userspace, but the other overhead remains.
In particular the lockref cycle, apart from adding work single-threaded,
adds avoidable serialization against other threads issuing stat(x) on
the same file. iow your patch still leaves performance on the table and
I don't think it is necessary.

If the flag is the way to go (I don't see why though), I would suggest
something like AT_NO_PATH and requring NULL as the path argument (or
some other predefined "there is nothing here" constant).

I wanted to type up a proposal for fstatx (+ patch) some time ago, but
some refactoring was needed to make it happen and put it on the back
burner.

Perhaps you would be willing to pick it up, assuming the vfs folk are
oke with it.

Regardless of what happens with statx or this patch you should probably
add fstat anyway.

If there are any other perf-sensitive syscalls which don't have their
fd-only variants they should be plugged to, but I can't think of
anything.

> ---
>  fs/namei.c                 | 8 +++++++-
>  fs/stat.c                  | 4 +++-
>  include/linux/namei.h      | 4 ++++
>  include/trace/misc/fs.h    | 1 +
>  include/uapi/linux/fcntl.h | 3 +++
>  5 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 37fb0a8aa09a..0c44a7ea5961 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -147,7 +147,13 @@ getname_flags(const char __user *filename, int flags, 
> int *empty)
>       kname = (char *)result->iname;
>       result->name = kname;
>  
> -     len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX);
> +     if (!(flags & LOOKUP_EMPTY_NOCHECK))
> +             len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX);
> +     else {
> +             len = 0;
> +             kname[0] = '\0';
> +     }
> +
>       if (unlikely(len < 0)) {
>               __putname(result);
>               return ERR_PTR(len);
> diff --git a/fs/stat.c b/fs/stat.c
> index 70bd3e888cfa..53944d3287cd 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -210,6 +210,8 @@ int getname_statx_lookup_flags(int flags)
>               lookup_flags |= LOOKUP_AUTOMOUNT;
>       if (flags & AT_EMPTY_PATH)
>               lookup_flags |= LOOKUP_EMPTY;
> +     if (flags & AT_EMPTY_PATH_NOCHECK)
> +             lookup_flags |= LOOKUP_EMPTY | LOOKUP_EMPTY_NOCHECK;
>  
>       return lookup_flags;
>  }
> @@ -237,7 +239,7 @@ static int vfs_statx(int dfd, struct filename *filename, 
> int flags,
>       int error;
>  
>       if (flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT | AT_EMPTY_PATH |
> -                   AT_STATX_SYNC_TYPE))
> +                   AT_STATX_SYNC_TYPE | AT_EMPTY_PATH_NOCHECK))
>               return -EINVAL;
>  
>  retry:
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 967aa9ea9f96..def6a8a1b531 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -45,9 +45,13 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
>  #define LOOKUP_IN_ROOT               0x100000 /* Treat dirfd as fs root. */
>  #define LOOKUP_CACHED                0x200000 /* Only do cached lookup */
>  #define LOOKUP_LINKAT_EMPTY  0x400000 /* Linkat request with empty path. */
> +
>  /* LOOKUP_* flags which do scope-related checks based on the dirfd. */
>  #define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT)
>  
> +/* If this is set, LOOKUP_EMPTY must be set as well. */
> +#define LOOKUP_EMPTY_NOCHECK 0x800000 /* Consider path empty. */
> +
>  extern int path_pts(struct path *path);
>  
>  extern int user_path_at_empty(int, const char __user *, unsigned, struct 
> path *, int *empty);
> diff --git a/include/trace/misc/fs.h b/include/trace/misc/fs.h
> index 738b97f22f36..24aec7ed6b0b 100644
> --- a/include/trace/misc/fs.h
> +++ b/include/trace/misc/fs.h
> @@ -119,4 +119,5 @@
>               { LOOKUP_NO_XDEV,       "NO_XDEV" }, \
>               { LOOKUP_BENEATH,       "BENEATH" }, \
>               { LOOKUP_IN_ROOT,       "IN_ROOT" }, \
> +             { LOOKUP_EMPTY_NOCHECK, "EMPTY_NOCHECK" }, \
>               { LOOKUP_CACHED,        "CACHED" })
> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> index c0bcc185fa48..aa2f68d80820 100644
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> @@ -113,6 +113,9 @@
>  #define AT_STATX_DONT_SYNC   0x4000  /* - Don't sync attributes with the 
> server */
>  
>  #define AT_RECURSIVE         0x8000  /* Apply to the entire subtree */
> +#define AT_EMPTY_PATH_NOCHECK        0x10000 /* Like AT_EMPTY_PATH, but the 
> path
> +                                           is not checked and it's just
> +                                           assumed to be empty */
>  
>  /* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */
>  #define AT_HANDLE_FID                AT_REMOVEDIR    /* file handle is 
> needed to
> -- 
> 2.45.2
> 

Reply via email to