On Nov 23, 2016, at 1:37 AM, Michael Kerrisk <mtk.manpa...@gmail.com> wrote:
> 
> Hi David,
> 
> On 11/23/2016 01:55 AM, David Howells wrote:
>> Add a system call to make extended file information available, including
>> file creation and some attribute flags where available through the
>> underlying filesystem.
>> 
>> 
>> ========
>> OVERVIEW
>> ========
>> 
>> The idea was initially proposed as a set of xattrs that could be retrieved
>> with getxattr(), but the general preferance proved to be for a new syscall
> 
> s/preferance/preference/
> 
>> with an extended stat structure.
>> 
>> This can feasibly be used to support a number of things, not all of which
>> are added here:
> 
> It would be very useful if this overview distinguishes which of the features
> below are supported in the initial implementation, versus which features
> (e.g., femtosecond timestamps) are simply allowed for in a future
> implementation.
> 
>> (1) Better support for the y2038 problem [Arnd Bergmann].
>> 
>> (2) Creation time: The SMB protocol carries the creation time, which could
>>     be exported by Samba, which will in turn help CIFS make use of
>>     FS-Cache as that can be used for coherency data.
>> 
>>     This is also specified in NFSv4 as a recommended attribute and could
>>     be exported by NFSD [Steve French].
>> 
>> (3) Lightweight stat: Ask for just those details of interest, and allow a
>>     netfs (such as NFS) to approximate anything not of interest, possibly
>>     without going to the server [Trond Myklebust, Ulrich Drepper, Andreas
>>     Dilger].
>> 
>> (4) Heavyweight stat: Force a netfs to go to the server, even if it thinks
>>     its cached attributes are up to date [Trond Myklebust].
>> 
>> (5) Data version number: Could be used by userspace NFS servers [Aneesh
>>     Kumar].
>> 
>>     Can also be used to modify fill_post_wcc() in NFSD which retrieves
>>     i_version directly, but has just called vfs_getattr().  It could get
>>     it from the kstat struct if it used vfs_xgetattr() instead.
>> 
>> (6) BSD stat compatibility: Including more fields from the BSD stat such
>>     as creation time (st_btime) and inode generation number (st_gen)
>>     [Jeremy Allison, Bernd Schubert].
>> 
>> (7) Inode generation number: Useful for FUSE and userspace NFS servers
>>     [Bernd Schubert].  This was asked for but later deemed unnecessary
>>     with the open-by-handle capability available
>> 
>> (8) Extra coherency data may be useful in making backups [Andreas Dilger].
> 
> Can you elaborate on point (8) in this commit message? It's not clear
> to me, at least, what this is about.
>> 
>> (9) Allow the filesystem to indicate what it can/cannot provide: A
>>     filesystem can now say it doesn't support a standard stat feature if
>>     that isn't available, so if, for instance, inode numbers or UIDs don't
>>     exist or are fabricated locally...
>> 
>> (10) Make the fields a consistent size on all arches and make them large.
>> 
>> (11) Store a 16-byte volume ID in the superblock that can be returned in
>>     struct xstat [Steve French].
>> 
>> (12) Include granularity fields in the time data to indicate the
>>     granularity of each of the times (NFSv4 time_delta) [Steve French].
>> 
>> (13) FS_IOC_GETFLAGS value.  These could be translated to BSD's st_flags.
>>     Note that the Linux IOC flags are a mess and filesystems such as Ext4
>>     define flags that aren't in linux/fs.h, so translation in the kernel
>>     may be a necessity (or, possibly, we provide the filesystem type too).
>> 
>> (14) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
>>     Michael Kerrisk].
>> 
>> (15) Spare space, request flags and information flags are provided for
>>     future expansion.
>> 
>> (16) Femtosecond-resolution timestamps [Dave Chinner].
>> 
>> 
>> ===============
>> NEW SYSTEM CALL
>> ===============
>> 
>> The new system call is:
>> 
>>      int ret = statx(int dfd,
>>                      const char *filename,
>>                      unsigned int flags,
> 
> In the 0/4 of this patch series, this argument is called 'atflags'.
> These should be consistent. 'flags' seems correct to me.

Given that there are a number of different flags and masks in use for
this syscall, naming this argument "atflags" makes it clearer what it
is used for.

>>                      unsigned int mask,

Similarly, naming this argument "request_mask" would be clearer, and
would match what is used elsewhere in the code.

That said, I don't care enough about this detail to request a patch refresh,
but it would be useful for the man pages.

Cheers, Andreas

>>                      struct statx *buffer);
>> 
>> The dfd, filename and flags parameters indicate the file to query, in a
>> similar way to fstatat().  There is no equivalent of lstat() as that can be
>> emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags.  There is
>> also no equivalent of fstat() as that can be emulated by passing a NULL
>> filename to statx() with the fd of interest in dfd.
>> 
>> Whether or not statx() synchronises the attributes with the backing store
>> can be controlled (this typically only affects network filesystems) can be
>> set by OR'ing a value into the flags argument:
> 
> s/can be set//
> 
>> 
>> (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this
>>     respect.
>> 
>> (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise
>>     its attributes with the server - which might require data writeback to
>>     occur to get the timestamps correct.
>> 
>> (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a
>>     network filesystem.  The resulting values should be considered
>>     approximate.
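
For illustration only (this is not code from the patch): the three sync
modes above are mutually exclusive, and sys_statx() further down rejects
a flags value in which every bit of AT_STATX_SYNC_TYPE is set.  The
sketch below shows that check in isolation.  The DEMO_* values are
placeholders, chosen on the assumption that the "as stat" mode is zero
and the type mask is the OR of the other two; the real definitions are
in include/uapi/linux/fcntl.h, which is not quoted here.
demo_sync_flags_ok() is a hypothetical helper name.

```c
#include <stdbool.h>

/* Placeholder bit values, assumed for this example only. */
#define DEMO_SYNC_AS_STAT 0x0000U	/* behave as stat() does */
#define DEMO_FORCE_SYNC   0x2000U	/* force a sync with the server */
#define DEMO_DONT_SYNC    0x4000U	/* never sync with the server */
#define DEMO_SYNC_TYPE    (DEMO_FORCE_SYNC | DEMO_DONT_SYNC)

/*
 * Same shape as the check in sys_statx(): a request that sets every
 * bit of the sync-type mask asks for "force" and "don't" at once and
 * is treated as invalid.
 */
static bool demo_sync_flags_ok(unsigned int flags)
{
	return (flags & DEMO_SYNC_TYPE) != DEMO_SYNC_TYPE;
}
```

With these placeholder values, each mode on its own passes the check and
only the combination of "force" and "don't" fails it.
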
>> 
>> mask is a bitmask indicating the fields in struct statx that are of
>> interest to the caller.  The user should set this to STATX_BASIC_STATS to
>> get the basic set returned by stat().  It should be note that asking for
> 
> s/note/noted/
> 
>> more information may entail extra I/O operations.
>> 
>> buffer points to the destination for the data.  This must be 256 bytes in
>> size.
>> 
>> 
>> ======================
>> MAIN ATTRIBUTES RECORD
>> ======================
>> 
>> The following structures are defined in which to return the main attribute
>> set:
>> 
>>      struct statx_timestamp {
>>              __s64   tv_sec;
>>              __s32   tv_nsec;
>>              __s32   __reserved;
>>      };
>> 
>>      struct statx {
>>              __u32   stx_mask;
>>              __u32   stx_blksize;
>>              __u64   stx_attributes;
>>              __u32   stx_nlink;
>>              __u32   stx_uid;
>>              __u32   stx_gid;
>>              __u16   stx_mode;
>>              __u16   __spare0[1];
>>              __u64   stx_ino;
>>              __u64   stx_size;
>>              __u64   stx_blocks;
>>              __u64   __spare1[1];
>>              struct statx_timestamp  stx_atime;
>>              struct statx_timestamp  stx_btime;
>>              struct statx_timestamp  stx_ctime;
>>              struct statx_timestamp  stx_mtime;
>>              __u32   stx_rdev_major;
>>              __u32   stx_rdev_minor;
>>              __u32   stx_dev_major;
>>              __u32   stx_dev_minor;
>>              __u64   __spare2[14];
>>      };
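
As a cross-check of the "must be 256 bytes" statement (a sketch, not
part of the patch): the layout above can be mirrored with fixed-width
userspace types and its size verified at compile time.  The demo_ names
are hypothetical and are used only to avoid clashing with any system
header; the field order and sizes are copied from the structures quoted
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the proposed timestamp: 8 + 4 + 4 = 16 bytes. */
struct demo_statx_timestamp {
	int64_t tv_sec;
	int32_t tv_nsec;
	int32_t reserved;
};

/* Local mirror of the proposed struct statx, field for field. */
struct demo_statx {
	uint32_t stx_mask;
	uint32_t stx_blksize;
	uint64_t stx_attributes;
	uint32_t stx_nlink;
	uint32_t stx_uid;
	uint32_t stx_gid;
	uint16_t stx_mode;
	uint16_t spare0[1];
	uint64_t stx_ino;
	uint64_t stx_size;
	uint64_t stx_blocks;
	uint64_t spare1[1];
	struct demo_statx_timestamp stx_atime;
	struct demo_statx_timestamp stx_btime;
	struct demo_statx_timestamp stx_ctime;
	struct demo_statx_timestamp stx_mtime;
	uint32_t stx_rdev_major;
	uint32_t stx_rdev_minor;
	uint32_t stx_dev_major;
	uint32_t stx_dev_minor;
	uint64_t spare2[14];
};

/* Compile-time checks: no padding is needed and the total is 256. */
static_assert(sizeof(struct demo_statx_timestamp) == 16,
	      "timestamp should be 16 bytes");
static_assert(offsetof(struct demo_statx, stx_atime) == 64,
	      "timestamps should start at offset 64");
static_assert(sizeof(struct demo_statx) == 256,
	      "struct should be exactly 256 bytes");
```

Every 64-bit member falls on an 8-byte offset, so the size should come
out the same on 32-bit and 64-bit ABIs.
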
>> 
>> The defined bits in request_mask and stx_mask are:
>> 
>>      STATX_TYPE              Want/got stx_mode & S_IFMT
>>      STATX_MODE              Want/got stx_mode & ~S_IFMT
>>      STATX_NLINK             Want/got stx_nlink
>>      STATX_UID               Want/got stx_uid
>>      STATX_GID               Want/got stx_gid
>>      STATX_ATIME             Want/got stx_atime{,_ns}
>>      STATX_MTIME             Want/got stx_mtime{,_ns}
>>      STATX_CTIME             Want/got stx_ctime{,_ns}
>>      STATX_INO               Want/got stx_ino
>>      STATX_SIZE              Want/got stx_size
>>      STATX_BLOCKS            Want/got stx_blocks
>>      STATX_BASIC_STATS       [The stuff in the normal stat struct]
>>      STATX_BTIME             Want/got stx_btime{,_ns}
>>      STATX_ALL               [All currently available stuff]
>> 
>> stx_btime is the file creation time, stx_mask is a bitmask indicating the
>> data provided and __spares*[] are where as-yet undefined fields can be
>> placed.
>> 
>> Time fields are structures with separate seconds and nanoseconds fields
>> plus a reserved field in case we want to add even finer resolution.  Note
>> that times will be negative if before 1970; in such a case, the nanosecond
>> fields will also be negative if not zero.
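
To illustrate the sign convention just described (a sketch under my
reading of the text, not code from the patch): if a pre-1970 time is
reported with both fields negative, e.g. -1.5s as tv_sec = -1 and
tv_nsec = -500000000, then a caller that wants the usual timespec form,
with the nanoseconds in [0, 999999999], has to borrow one second.
struct demo_ts and demo_normalize() are hypothetical names.

```c
#include <stdint.h>

/* Seconds and nanoseconds, as in the proposed statx_timestamp. */
struct demo_ts {
	int64_t tv_sec;
	int32_t tv_nsec;
};

/*
 * Convert a timestamp whose tv_nsec may be negative into the form
 * where tv_nsec is in [0, 999999999].  Assumes the input tv_nsec is
 * strictly between -1000000000 and 1000000000.
 */
static struct demo_ts demo_normalize(struct demo_ts in)
{
	struct demo_ts out = in;

	if (out.tv_nsec < 0) {
		out.tv_sec -= 1;		/* borrow one second */
		out.tv_nsec += 1000000000;	/* and add it back as ns */
	}
	return out;
}
```

Under that reading, (-1, -500000000) becomes (-2, 500000000), while
timestamps with a non-negative tv_nsec pass through unchanged.
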
>> 
>> The bits defined in the stx_attributes field convey information about a
>> file, how it is accessed, where it is and what it does.  The following
>> attributes map to FS_*_FL flags and are the same numerical value:
>> 
>>      STATX_ATTR_COMPRESSED           File is compressed by the fs
>>      STATX_ATTR_IMMUTABLE            File is marked immutable
>>      STATX_ATTR_APPEND               File is append-only
>>      STATX_ATTR_NODUMP               File is not to be dumped
>>      STATX_ATTR_ENCRYPTED            File requires key to decrypt in fs
>> 
>> Within the kernel, the supported flags are listed by:
>> 
>>      KSTAT_ATTR_FS_IOC_FLAGS
>> 
>> [Are any other IOC flags of sufficient general interest to be exposed
>> through this interface?]
>> 
>> New flags include:
>> 
>>      STATX_ATTR_AUTOMOUNT            Object is an automount trigger
>> 
>> These are for the use of GUI tools that might want to mark files specially,
>> depending on what they are.
>> 
>> Fields in struct statx come in a number of classes:
>> 
>> (0) stx_dev_*, stx_blksize.
>> 
>>     These are local system information and are always available.
>> 
>> (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino,
>>     stx_size, stx_blocks.
>> 
>>     These will be returned whether the caller asks for them or not.  The
>>     corresponding bits in stx_mask will be set to indicate whether they
>>     actually have valid values.
>> 
>>     If the caller didn't ask for them, then they may be approximated.  For
>>     example, NFS won't waste any time updating them from the server,
>>     unless as a byproduct of updating something requested.
>> 
>>     If the values don't actually exist for the underlying object (such as
>>     UID or GID on a DOS file), then the bit won't be set in the stx_mask,
>>     even if the caller asked for the value.  In such a case, the returned
>>     value will be a fabrication.
>> 
>>     Note that there are instances where the type might not be valid, for
>>     instance Windows reparse points.
>> 
>> (2) stx_rdev_*.
>> 
>>     This will be set only if stx_mode indicates we're looking at a
>>     blockdev or a chardev, otherwise will be 0.
>> 
>> (3) stx_btime.
>> 
>>     Similar to (1), except this will be set to 0 if it doesn't exist.
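
A short sketch of what the classes above mean for a caller (not code
from the patch): a field should only be trusted when its bit is set in
the returned stx_mask, since otherwise the value may be approximate or
fabricated.  The DEMO_* bit values are placeholders, because the real
STATX_* definitions are in include/uapi/linux/stat.h and are not quoted
in this mail; struct demo_result and the helper names are likewise
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values, assumed for this example only. */
#define DEMO_STATX_UID   0x00000008U
#define DEMO_STATX_BTIME 0x00000800U

/* Just the fields this example needs. */
struct demo_result {
	uint32_t stx_mask;	/* which fields the filesystem filled in */
	uint32_t stx_uid;
};

/* True if the filesystem marked the given field(s) as valid. */
static bool demo_have(const struct demo_result *r, uint32_t bits)
{
	return (r->stx_mask & bits) == bits;
}

/* Return the UID if it is valid, or -1 if it was not provided. */
static int64_t demo_uid_or_unknown(const struct demo_result *r)
{
	return demo_have(r, DEMO_STATX_UID) ? (int64_t)r->stx_uid : -1;
}
```

The same test applies to stx_btime: a zero value with the bit clear
means "not available", not "created at the epoch".
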
>> 
>> 
>> =======
>> TESTING
>> =======
>> 
>> The following test program can be used to test the statx system call:
>> 
>>      samples/statx/test-statx.c
>> 
>> Just compile and run, passing it paths to the files you want to examine.
>> The file is built automatically if CONFIG_SAMPLES is enabled.
>> 
>> Here's some example output.  Firstly, an NFS directory that crosses to
>> another FSID.  Note that the AUTOMOUNT attribute is set because transiting
>> this directory will cause d_automount to be invoked by the VFS.
>> 
>>      [root@andromeda tmp]# ./samples/statx/test-statx -A /warthog/data
>>      statx(/warthog/data) = 0
>>      results=17ff
>>        Size: 4096            Blocks: 8          IO Block: 1048576  directory
>>      Device: 00:26           Inode: 1703937     Links: 124
>>      Access: (3777/drwxrwxrwx)  Uid:     0   Gid:  4041
>>      Access: 2016-11-10 15:52:11.219935864+0000
>>      Modify: 2016-11-10 08:07:32.482314928+0000
>>      Change: 2016-11-10 08:07:32.482314928+0000
>>      Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------)
>>      IO-blocksize: blksize=1048576
>> 
>> Secondly, the result of automounting on that directory.
>> 
>>      [root@andromeda tmp]# ./samples/statx/test-statx /warthog/data
>>      statx(/warthog/data) = 0
>>      results=17ff
>>        Size: 4096            Blocks: 8          IO Block: 1048576  directory
>>      Device: 00:27           Inode: 2           Links: 124
>>      Access: (3777/drwxrwxrwx)  Uid:     0   Gid:  4041
>>      Access: 2016-11-10 15:52:11.219935864+0000
>>      Modify: 2016-11-10 08:07:32.482314928+0000
>>      Change: 2016-11-10 08:07:32.482314928+0000
>>      IO-blocksize: blksize=1048576
>> 
>> Signed-off-by: David Howells <dhowe...@redhat.com>
>> ---
>> 
>> arch/x86/entry/syscalls/syscall_32.tbl |    1
>> arch/x86/entry/syscalls/syscall_64.tbl |    1
>> fs/exportfs/expfs.c                    |    4
>> fs/stat.c                              |  297 +++++++++++++++++++++++++++++---
>> include/linux/fs.h                     |    5 -
>> include/linux/stat.h                   |   19 +-
>> include/linux/syscalls.h               |    3
>> include/uapi/linux/fcntl.h             |    5 +
>> include/uapi/linux/stat.h              |  120 +++++++++++++
>> samples/Kconfig                        |    5 +
>> samples/Makefile                       |    3
>> samples/statx/Makefile                 |   10 +
>> samples/statx/test-statx.c             |  248 +++++++++++++++++++++++++++
>> 13 files changed, 681 insertions(+), 40 deletions(-)
>> create mode 100644 samples/statx/Makefile
>> create mode 100644 samples/statx/test-statx.c
>> 
>> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
>> index 2b3618542544..9ba050fe47f3 100644
>> --- a/arch/x86/entry/syscalls/syscall_32.tbl
>> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
>> @@ -389,3 +389,4 @@
>> 380  i386    pkey_mprotect           sys_pkey_mprotect
>> 381  i386    pkey_alloc              sys_pkey_alloc
>> 382  i386    pkey_free               sys_pkey_free
>> +383 i386    statx                   sys_statx
>> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
>> index e93ef0b38db8..5aef183e2f85 100644
>> --- a/arch/x86/entry/syscalls/syscall_64.tbl
>> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
>> @@ -338,6 +338,7 @@
>> 329  common  pkey_mprotect           sys_pkey_mprotect
>> 330  common  pkey_alloc              sys_pkey_alloc
>> 331  common  pkey_free               sys_pkey_free
>> +332 common  statx                   sys_statx
>> 
>> #
>> # x32-specific system call numbers start at 512 to avoid cache impact
>> diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
>> index a4b531be9168..2acc31751248 100644
>> --- a/fs/exportfs/expfs.c
>> +++ b/fs/exportfs/expfs.c
>> @@ -299,7 +299,9 @@ static int get_name(const struct path *path, char *name, struct dentry *child)
>>       * filesystem supports 64-bit inode numbers.  So we need to
>>       * actually call ->getattr, not just read i_ino:
>>       */
>> -    error = vfs_getattr_nosec(&child_path, &stat);
>> +    stat.query_flags = 0;
>> +    stat.request_mask = STATX_BASIC_STATS;
>> +    error = vfs_xgetattr_nosec(&child_path, &stat);
>>      if (error)
>>              return error;
>>      buffer.ino = stat.ino;
>> diff --git a/fs/stat.c b/fs/stat.c
>> index bc045c7994e1..82e656c42157 100644
>> --- a/fs/stat.c
>> +++ b/fs/stat.c
>> @@ -18,6 +18,15 @@
>> #include <asm/uaccess.h>
>> #include <asm/unistd.h>
>> 
>> +/**
>> + * generic_fillattr - Fill in the basic attributes from the inode struct
>> + * @inode: Inode to use as the source
>> + * @stat: Where to fill in the attributes
>> + *
>> + * Fill in the basic attributes in the kstat structure from data that's to be
>> + * found on the VFS inode structure.  This is the default if no getattr inode
>> + * operation is supplied.
>> + */
>> void generic_fillattr(struct inode *inode, struct kstat *stat)
>> {
>>      stat->dev = inode->i_sb->s_dev;
>> @@ -27,87 +36,189 @@ void generic_fillattr(struct inode *inode, struct kstat *stat)
>>      stat->uid = inode->i_uid;
>>      stat->gid = inode->i_gid;
>>      stat->rdev = inode->i_rdev;
>> -    stat->size = i_size_read(inode);
>> -    stat->atime = inode->i_atime;
>>      stat->mtime = inode->i_mtime;
>>      stat->ctime = inode->i_ctime;
>> -    stat->blksize = (1 << inode->i_blkbits);
>> +    stat->size = i_size_read(inode);
>>      stat->blocks = inode->i_blocks;
>> -}
>> +    stat->blksize = 1 << inode->i_blkbits;
>> 
>> +    stat->result_mask |= STATX_BASIC_STATS;
>> +    if (IS_NOATIME(inode))
>> +            stat->result_mask &= ~STATX_ATIME;
>> +    else
>> +            stat->atime = inode->i_atime;
>> +
>> +    if (IS_AUTOMOUNT(inode))
>> +            stat->attributes |= STATX_ATTR_AUTOMOUNT;
>> +}
>> EXPORT_SYMBOL(generic_fillattr);
>> 
>> /**
>> - * vfs_getattr_nosec - getattr without security checks
>> + * vfs_xgetattr_nosec - getattr without security checks
>>  * @path: file to get attributes from
>>  * @stat: structure to return attributes in
>>  *
>>  * Get attributes without calling security_inode_getattr.
>>  *
>> - * Currently the only caller other than vfs_getattr is internal to the
>> - * filehandle lookup code, which uses only the inode number and returns
>> - * no attributes to any user.  Any other code probably wants
>> - * vfs_getattr.
>> + * Currently the only caller other than vfs_xgetattr is internal to the
>> + * filehandle lookup code, which uses only the inode number and returns no
>> + * attributes to any user.  Any other code probably wants vfs_xgetattr.
>> + *
>> + * The caller must set stat->request_mask to indicate what they want and
>> + * stat->query_flags to indicate whether the server should be queried.
>>  */
>> -int vfs_getattr_nosec(struct path *path, struct kstat *stat)
>> +int vfs_xgetattr_nosec(struct path *path, struct kstat *stat)
>> {
>>      struct inode *inode = d_backing_inode(path->dentry);
>> 
>> +    stat->query_flags &= ~KSTAT_QUERY_FLAGS;
>> +
>> +    stat->result_mask = 0;
>> +    stat->attributes = 0;
>>      if (inode->i_op->getattr)
>>              return inode->i_op->getattr(path->mnt, path->dentry, stat);
>> 
>>      generic_fillattr(inode, stat);
>>      return 0;
>> }
>> +EXPORT_SYMBOL(vfs_xgetattr_nosec);
>> 
>> -EXPORT_SYMBOL(vfs_getattr_nosec);
>> -
>> -int vfs_getattr(struct path *path, struct kstat *stat)
>> +/*
>> + * vfs_xgetattr - Get the enhanced basic attributes of a file
>> + * @path: The file of interest
>> + * @stat: Where to return the statistics
>> + *
>> + * Ask the filesystem for a file's attributes.  The caller must have preset
>> + * stat->request_mask and stat->query_flags to indicate what they want.
>> + *
>> + * If the file is remote, the filesystem can be forced to update the attributes
>> + * from the backing store by passing AT_FORCE_ATTR_SYNC in query_flags or can
>> + * suppress the update by passing AT_NO_ATTR_SYNC.
>> + *
>> + * Bits must have been set in stat->request_mask to indicate which attributes
>> + * the caller wants retrieving.  Any such attribute not requested may be
>> + * returned anyway, but the value may be approximate, and, if remote, may not
>> + * have been synchronised with the server.
>> + *
>> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */
>> +int vfs_xgetattr(struct path *path, struct kstat *stat)
>> {
>>      int retval;
>> 
>>      retval = security_inode_getattr(path);
>>      if (retval)
>>              return retval;
>> -    return vfs_getattr_nosec(path, stat);
>> +    return vfs_xgetattr_nosec(path, stat);
>> }
>> +EXPORT_SYMBOL(vfs_xgetattr);
>> 
>> +/**
>> + * vfs_getattr - Get the basic attributes of a file
>> + * @path: The file of interest
>> + * @stat: Where to return the statistics
>> + *
>> + * Ask the filesystem for a file's attributes.  If remote, the filesystem isn't
>> + * forced to update its files from the backing store.  Only the basic set of
>> + * attributes will be retrieved; anyone wanting more must use vfs_xgetattr(),
>> + * as must anyone who wants to force attributes to be sync'd with the server.
>> + *
>> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */
>> +int vfs_getattr(struct path *path, struct kstat *stat)
>> +{
>> +    stat->query_flags = 0;
>> +    stat->request_mask = STATX_BASIC_STATS;
>> +    return vfs_xgetattr(path, stat);
>> +}
>> EXPORT_SYMBOL(vfs_getattr);
>> 
>> -int vfs_fstat(unsigned int fd, struct kstat *stat)
>> +/**
>> + * vfs_fstatx - Get the enhanced basic attributes by file descriptor
>> + * @fd: The file descriptor referring to the file of interest
>> + * @stat: The result structure to fill in.
>> + *
>> + * This function is a wrapper around vfs_xgetattr().  The main difference is
>> + * that it uses a file descriptor to determine the file location.
>> + *
>> + * The caller must have preset stat->query_flags and stat->request_mask as for
>> + * vfs_xgetattr().
>> + *
>> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */
>> +int vfs_fstatx(unsigned int fd, struct kstat *stat)
>> {
>>      struct fd f = fdget_raw(fd);
>>      int error = -EBADF;
>> 
>>      if (f.file) {
>> -            error = vfs_getattr(&f.file->f_path, stat);
>> +            error = vfs_xgetattr(&f.file->f_path, stat);
>>              fdput(f);
>>      }
>>      return error;
>> }
>> +EXPORT_SYMBOL(vfs_fstatx);
>> +
>> +/**
>> + * vfs_fstat - Get basic attributes by file descriptor
>> + * @fd: The file descriptor referring to the file of interest
>> + * @stat: The result structure to fill in.
>> + *
>> + * This function is a wrapper around vfs_getattr().  The main difference is
>> + * that it uses a file descriptor to determine the file location.
>> + *
>> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */
>> +int vfs_fstat(unsigned int fd, struct kstat *stat)
>> +{
>> +    stat->query_flags = 0;
>> +    stat->request_mask = STATX_BASIC_STATS;
>> +    return vfs_fstatx(fd, stat);
>> +}
>> EXPORT_SYMBOL(vfs_fstat);
>> 
>> -int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
>> -            int flag)
>> +/**
>> + * vfs_statx - Get basic and extra attributes by filename
>> + * @dfd: A file descriptor representing the base dir for a relative filename
>> + * @filename: The name of the file of interest
>> + * @flags: Flags to control the query
>> + * @stat: The result structure to fill in.
>> + *
>> + * This function is a wrapper around vfs_xgetattr().  The main difference is
>> + * that it uses a filename and base directory to determine the file location.
>> + * Additionally, the addition of AT_SYMLINK_NOFOLLOW to flags will prevent a
> 
> s/the addition of AT_SYMLINK_NOFOLLOW to/the use of AT_SYMLINK_NOFOLLOW in/
> 
> 
>> + * symlink at the given name from being referenced.
>> + *
>> + * The caller must have preset stat->request_mask as for vfs_xgetattr().  The
>> + * flags are also used to load up stat->query_flags.
>> + *
>> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */
>> +int vfs_statx(int dfd, const char __user *filename, int flags,
>> +          struct kstat *stat)
>> {
>>      struct path path;
>>      int error = -EINVAL;
>> -    unsigned int lookup_flags = 0;
>> +    unsigned int lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
>> 
>> -    if ((flag & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
>> -                  AT_EMPTY_PATH)) != 0)
>> -            goto out;
>> +    if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
>> +                   AT_EMPTY_PATH | KSTAT_QUERY_FLAGS)) != 0)
>> +            return -EINVAL;
>> 
>> -    if (!(flag & AT_SYMLINK_NOFOLLOW))
>> -            lookup_flags |= LOOKUP_FOLLOW;
>> -    if (flag & AT_EMPTY_PATH)
>> +    if (flags & AT_SYMLINK_NOFOLLOW)
>> +            lookup_flags &= ~LOOKUP_FOLLOW;
>> +    if (flags & AT_NO_AUTOMOUNT)
>> +            lookup_flags &= ~LOOKUP_AUTOMOUNT;
>> +    if (flags & AT_EMPTY_PATH)
>>              lookup_flags |= LOOKUP_EMPTY;
>> +    stat->query_flags = flags;
>> +
>> retry:
>>      error = user_path_at(dfd, filename, lookup_flags, &path);
>>      if (error)
>>              goto out;
>> 
>> -    error = vfs_getattr(&path, stat);
>> +    error = vfs_xgetattr(&path, stat);
>>      path_put(&path);
>>      if (retry_estale(error, lookup_flags)) {
>>              lookup_flags |= LOOKUP_REVAL;
>> @@ -116,17 +227,65 @@ int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
>> out:
>>      return error;
>> }
>> +EXPORT_SYMBOL(vfs_statx);
>> +
>> +/**
>> + * vfs_fstatat - Get basic attributes by filename
>> + * @dfd: A file descriptor representing the base dir for a relative filename
>> + * @filename: The name of the file of interest
>> + * @flags: Flags to control the query
>> + * @stat: The result structure to fill in.
>> + *
>> + * This function is a wrapper around vfs_statx().  The difference is that it
>> + * preselects basic stats only.  The flags are used to load up
>> + * stat->query_flags in addition to indicating symlink handling during path
>> + * resolution.
>> + *
>> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */
>> +int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
>> +            int flags)
>> +{
>> +    stat->request_mask = STATX_BASIC_STATS;
>> +    return vfs_statx(dfd, filename, flags, stat);
>> +}
>> EXPORT_SYMBOL(vfs_fstatat);
>> 
>> -int vfs_stat(const char __user *name, struct kstat *stat)
>> +/**
>> + * vfs_stat - Get basic attributes by filename
>> + * @filename: The name of the file of interest
>> + * @stat: The result structure to fill in.
>> + *
>> + * This function is a wrapper around vfs_statx().  The difference is that it
>> + * preselects basic stats only, terminal symlinks are followed regardless and a
> 
> s/terminal symlinks/symlinks in the basename/
> 
>> + * remote filesystem can't be forced to query the server.  If such is desired,
>> + * vfs_statx() should be used instead.
>> + *
>> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */
>> +int vfs_stat(const char __user *filename, struct kstat *stat)
>> {
>> -    return vfs_fstatat(AT_FDCWD, name, stat, 0);
>> +    stat->request_mask = STATX_BASIC_STATS;
>> +    return vfs_statx(AT_FDCWD, filename, 0, stat);
>> }
>> EXPORT_SYMBOL(vfs_stat);
>> 
>> +/**
>> + * vfs_lstat - Get basic attrs by filename, without following terminal symlink
>> + * @filename: The name of the file of interest
>> + * @stat: The result structure to fill in.
>> + *
>> + * This function is a wrapper around vfs_statx().  The difference is that it
>> + * preselects basic stats only, terminal symlinks are note followed regardless
> 
> s/terminal symlinks/symlinks in the basename/
> s/note/not/
> 
> 
>> + * and a remote filesystem can't be forced to query the server.  If such is
>> + * desired, vfs_statx() should be used instead.
>> + *
>> + * 0 will be returned on success, and a -ve error code if unsuccessful.
>> + */
>> int vfs_lstat(const char __user *name, struct kstat *stat)
>> {
>> -    return vfs_fstatat(AT_FDCWD, name, stat, AT_SYMLINK_NOFOLLOW);
>> +    stat->request_mask = STATX_BASIC_STATS;
>> +    return vfs_statx(AT_FDCWD, name, AT_SYMLINK_NOFOLLOW, stat);
>> }
>> EXPORT_SYMBOL(vfs_lstat);
>> 
>> @@ -141,7 +300,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
>> {
>>      static int warncount = 5;
>>      struct __old_kernel_stat tmp;
>> -
>> +
>>      if (warncount > 0) {
>>              warncount--;
>>              printk(KERN_WARNING "VFS: Warning: %s using old stat() call. Recompile your binary.\n",
>> @@ -166,7 +325,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
>> #if BITS_PER_LONG == 32
>>      if (stat->size > MAX_NON_LFS)
>>              return -EOVERFLOW;
>> -#endif
>> +#endif
>>      tmp.st_size = stat->size;
>>      tmp.st_atime = stat->atime.tv_sec;
>>      tmp.st_mtime = stat->mtime.tv_sec;
>> @@ -443,6 +602,82 @@ SYSCALL_DEFINE4(fstatat64, int, dfd, const char __user *, filename,
>> }
>> #endif /* __ARCH_WANT_STAT64 || __ARCH_WANT_COMPAT_STAT64 */
>> 
>> +/*
>> + * Set the statx results.
>> + */
>> +static long statx_set_result(struct kstat *stat, struct statx __user *buffer)
>> +{
>> +    uid_t uid = from_kuid_munged(current_user_ns(), stat->uid);
>> +    gid_t gid = from_kgid_munged(current_user_ns(), stat->gid);
>> +
>> +#define __put_timestamp(kts, uts) (                         \
>> +            __put_user(kts.tv_sec,  uts.tv_sec      ) ||    \
>> +            __put_user(kts.tv_nsec, uts.tv_nsec     ) ||            \
>> +            __put_user(0,           uts.__reserved  ))
>> +
>> +    if (__put_user(stat->result_mask,       &buffer->stx_mask       ) ||
>> +        __put_user(stat->mode,              &buffer->stx_mode       ) ||
>> +        __clear_user(&buffer->__spare0, sizeof(buffer->__spare0))     ||
>> +        __put_user(stat->nlink,             &buffer->stx_nlink      ) ||
>> +        __put_user(uid,                     &buffer->stx_uid        ) ||
>> +        __put_user(gid,                     &buffer->stx_gid        ) ||
>> +        __put_user(stat->attributes,        &buffer->stx_attributes ) ||
>> +        __put_user(stat->blksize,           &buffer->stx_blksize    ) ||
>> +        __put_user(MAJOR(stat->rdev),       &buffer->stx_rdev_major ) ||
>> +        __put_user(MINOR(stat->rdev),       &buffer->stx_rdev_minor ) ||
>> +        __put_user(MAJOR(stat->dev),        &buffer->stx_dev_major  ) ||
>> +        __put_user(MINOR(stat->dev),        &buffer->stx_dev_minor  ) ||
>> +        __put_timestamp(stat->atime,        &buffer->stx_atime      ) ||
>> +        __put_timestamp(stat->btime,        &buffer->stx_btime      ) ||
>> +        __put_timestamp(stat->ctime,        &buffer->stx_ctime      ) ||
>> +        __put_timestamp(stat->mtime,        &buffer->stx_mtime      ) ||
>> +        __put_user(stat->ino,               &buffer->stx_ino        ) ||
>> +        __put_user(stat->size,              &buffer->stx_size       ) ||
>> +        __put_user(stat->blocks,            &buffer->stx_blocks     ) ||
>> +        __clear_user(&buffer->__spare1, sizeof(buffer->__spare1))     ||
>> +        __clear_user(&buffer->__spare2, sizeof(buffer->__spare2)))
>> +            return -EFAULT;
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * sys_statx - System call to get enhanced stats
>> + * @dfd: Base directory to pathwalk from *or* fd to stat.
>> + * @filename: File to stat *or* NULL.
>> + * @flags: AT_* flags to control pathwalk.
>> + * @mask: Parts of statx struct actually required.
>> + * @buffer: Result buffer.
>> + *
>> + * Note that if filename is NULL, then it does the equivalent of fstat() using
>> + * dfd to indicate the file of interest.
>> + */
>> +SYSCALL_DEFINE5(statx,
>> +            int, dfd, const char __user *, filename, unsigned, flags,
>> +            unsigned int, mask,
>> +            struct statx __user *, buffer)
>> +{
>> +    struct kstat stat;
>> +    int error;
>> +
>> +    if ((flags & AT_STATX_SYNC_TYPE) == AT_STATX_SYNC_TYPE)
>> +            return -EINVAL;
>> +    if (!access_ok(VERIFY_WRITE, buffer, sizeof(*buffer)))
>> +            return -EFAULT;
>> +
>> +    memset(&stat, 0, sizeof(stat));
>> +    stat.query_flags = flags;
>> +    stat.request_mask = mask & STATX_ALL;
>> +
>> +    if (filename)
>> +            error = vfs_statx(dfd, filename, flags, &stat);
>> +    else
>> +            error = vfs_fstatx(dfd, &stat);
>> +    if (error)
>> +            return error;
>> +    return statx_set_result(&stat, buffer);
>> +}
>> +
>> /* Caller is here responsible for sufficient locking (ie. inode->i_lock) */
>> void __inode_add_bytes(struct inode *inode, loff_t bytes)
>> {
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 16d2b6e874d6..f153199566b4 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -2916,8 +2916,9 @@ extern const struct inode_operations page_symlink_inode_operations;
>> extern void kfree_link(void *);
>> extern int generic_readlink(struct dentry *, char __user *, int);
>> extern void generic_fillattr(struct inode *, struct kstat *);
>> -int vfs_getattr_nosec(struct path *path, struct kstat *stat);
>> +extern int vfs_xgetattr_nosec(struct path *path, struct kstat *stat);
>> extern int vfs_getattr(struct path *, struct kstat *);
>> +extern int vfs_xgetattr(struct path *, struct kstat *);
>> void __inode_add_bytes(struct inode *inode, loff_t bytes);
>> void inode_add_bytes(struct inode *inode, loff_t bytes);
>> void __inode_sub_bytes(struct inode *inode, loff_t bytes);
>> @@ -2935,6 +2936,8 @@ extern int vfs_lstat(const char __user *, struct kstat *);
>> extern int vfs_fstat(unsigned int, struct kstat *);
>> extern int vfs_fstatat(int , const char __user *, struct kstat *, int);
>> extern const char *vfs_get_link(struct dentry *, struct delayed_call *);
>> +extern int vfs_xstat(int, const char __user *, int, struct kstat *);
>> +extern int vfs_xfstat(unsigned int, struct kstat *);
>> 
>> extern int __generic_block_fiemap(struct inode *inode,
>>                                struct fiemap_extent_info *fieinfo,
>> diff --git a/include/linux/stat.h b/include/linux/stat.h
>> index 075cb0c7eb2a..9b81dfcbb57a 100644
>> --- a/include/linux/stat.h
>> +++ b/include/linux/stat.h
>> @@ -19,19 +19,26 @@
>> #include <linux/uidgid.h>
>> 
>> struct kstat {
>> -    u64             ino;
>> -    dev_t           dev;
>> +    u32             query_flags;    /* Operational flags */
>> +#define KSTAT_QUERY_FLAGS (AT_STATX_SYNC_TYPE)
>> +    u32             request_mask;   /* What fields the user asked for */
>> +    u32             result_mask;    /* What fields the user got */
>>      umode_t         mode;
>>      unsigned int    nlink;
>> +    uint32_t        blksize;        /* Preferred I/O size */
>> +    u64             attributes;
>> +#define KSTAT_ATTR_FS_IOC_FLAGS             0x00000874 /* Attrs corresponding to FS_*_FL flags */
>> +    u64             ino;
>> +    dev_t           dev;
>> +    dev_t           rdev;
>>      kuid_t          uid;
>>      kgid_t          gid;
>> -    dev_t           rdev;
>>      loff_t          size;
>> -    struct timespec  atime;
>> +    struct timespec atime;
>>      struct timespec mtime;
>>      struct timespec ctime;
>> -    unsigned long   blksize;
>> -    unsigned long long      blocks;
>> +    struct timespec btime;                  /* File creation time */
>> +    u64             blocks;
>> };
>> 
>> #endif
>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>> index 91a740f6b884..980c3c9b06f8 100644
>> --- a/include/linux/syscalls.h
>> +++ b/include/linux/syscalls.h
>> @@ -48,6 +48,7 @@ struct stat;
>> struct stat64;
>> struct statfs;
>> struct statfs64;
>> +struct statx;
>> struct __sysctl_args;
>> struct sysinfo;
>> struct timespec;
>> @@ -902,5 +903,7 @@ asmlinkage long sys_pkey_mprotect(unsigned long start, size_t len,
>>                                unsigned long prot, int pkey);
>> asmlinkage long sys_pkey_alloc(unsigned long flags, unsigned long init_val);
>> asmlinkage long sys_pkey_free(int pkey);
>> +asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
>> +                      unsigned mask, struct statx __user *buffer);
>> 
>> #endif
>> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
>> index beed138bd359..813afd6eee71 100644
>> --- a/include/uapi/linux/fcntl.h
>> +++ b/include/uapi/linux/fcntl.h
>> @@ -63,5 +63,10 @@
>> #define AT_NO_AUTOMOUNT              0x800   /* Suppress terminal automount traversal */
>> #define AT_EMPTY_PATH                0x1000  /* Allow empty relative pathname */
>> 
>> +#define AT_STATX_SYNC_TYPE  0x6000  /* Type of synchronisation required from statx() */
>> +#define AT_STATX_SYNC_AS_STAT       0x0000  /* - Do whatever stat() does */
>> +#define AT_STATX_FORCE_SYNC 0x2000  /* - Force the attributes to be sync'd with the server */
>> +#define AT_STATX_DONT_SYNC  0x4000  /* - Don't sync attributes with the server */
>> +
>> 
>> #endif /* _UAPI_LINUX_FCNTL_H */
>> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
>> index 7fec7e36d921..995e82fe019c 100644
>> --- a/include/uapi/linux/stat.h
>> +++ b/include/uapi/linux/stat.h
>> @@ -1,6 +1,7 @@
>> #ifndef _UAPI_LINUX_STAT_H
>> #define _UAPI_LINUX_STAT_H
>> 
>> +#include <linux/types.h>
>> 
>> #if defined(__KERNEL__) || !defined(__GLIBC__) || (__GLIBC__ < 2)
>> 
>> @@ -41,5 +42,124 @@
>> 
>> #endif
>> 
>> +/*
>> + * Timestamp structure for the timestamps in struct statx.
>> + */
>> +struct statx_timestamp {
>> +    __s64   tv_sec;         /* Number of seconds before or after midnight 1st Jan 1970 */
>> +    __s32   tv_nsec;        /* Number of nanoseconds before or after sec (0-999,999,999) */
> 
> Here, add a note in the comment: "Will be a negative value (if nonzero) if 
> tv_sec is negative"
> 
> [...]
> 
> Cheers,
> 
> Michael
> 
> 
> 
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
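
For anyone reading along, the AT_STATX_SYNC_TYPE check at the top of sys_statx() can be sketched in userspace as below. This is only an illustration, not code from the patch: the constants are copied from the quoted fcntl.h hunk, while check_statx_sync_flags() and statx_flags_selfcheck() are hypothetical names introduced here.

```c
#include <assert.h>
#include <errno.h>

/* Values copied from the quoted include/uapi/linux/fcntl.h hunk. */
#define AT_STATX_SYNC_TYPE    0x6000 /* mask covering both sync-mode bits */
#define AT_STATX_SYNC_AS_STAT 0x0000 /* do whatever stat() does */
#define AT_STATX_FORCE_SYNC   0x2000 /* force a sync with the server */
#define AT_STATX_DONT_SYNC    0x4000 /* do not sync with the server */

/*
 * Hypothetical helper (not in the patch): mirrors the first test in
 * sys_statx().  FORCE_SYNC and DONT_SYNC are mutually exclusive, so a
 * flags word with both bits of the mask set is rejected.
 */
int check_statx_sync_flags(unsigned int flags)
{
	if ((flags & AT_STATX_SYNC_TYPE) == AT_STATX_SYNC_TYPE)
		return -EINVAL;
	return 0;
}

/* Hypothetical self-check exercising each sync mode once. */
void statx_flags_selfcheck(void)
{
	assert(check_statx_sync_flags(AT_STATX_SYNC_AS_STAT) == 0);
	assert(check_statx_sync_flags(AT_STATX_FORCE_SYNC) == 0);
	assert(check_statx_sync_flags(AT_STATX_DONT_SYNC) == 0);
	assert(check_statx_sync_flags(AT_STATX_FORCE_SYNC |
				      AT_STATX_DONT_SYNC) == -EINVAL);
}
```

Any single sync mode passes this particular check regardless of the other AT_* bits; only the combination of both sync bits yields -EINVAL.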


Cheers, Andreas