Re: [PATCH v2 1/1] process_madvise.2: Add process_madvise man page

2021-02-01 Thread Suren Baghdasaryan
On Sat, Jan 30, 2021 at 1:34 PM Michael Kerrisk (man-pages)
 wrote:
>
> Hello Suren,
>
> Thank you for the revisions! Just a few more comments: all pretty small
> stuff (many points that I overlooked the first time round), since the
> page already looks pretty good by now.
>
> Again, thanks for the rendered version. As before, I've added my
> comments to the page source.

Hi Michael,
Thanks for reviewing!

>
> On 1/29/21 8:03 AM, Suren Baghdasaryan wrote:
> > Initial version of process_madvise(2) manual page. Initial text was
> > extracted from [1], amended after fix [2] and more details added using
> > man pages of madvise(2) and process_vm_readv(2) as examples. It also
> > includes the changes to required permission proposed in [3].
> >
> > [1] https://lore.kernel.org/patchwork/patch/1297933/
> > [2] https://lkml.org/lkml/2020/12/8/1282
> > [3] https://patchwork.kernel.org/project/selinux/patch/2021070622.2613577-1-sur...@google.com/#23888311
> >
> > Signed-off-by: Suren Baghdasaryan 
> > ---
> > changes in v2:
> > - Changed description of MADV_COLD per Michal Hocko's suggestion
> > - Applied fixes suggested by Michael Kerrisk
> >
> > NAME
> > process_madvise - give advice about use of memory to a process
>
> s/-/\-/

ack


Re: [PATCH v2 1/1] process_madvise.2: Add process_madvise man page

2021-01-30 Thread Michael Kerrisk (man-pages)
Hello Suren,

Thank you for the revisions! Just a few more comments: all pretty small
stuff (many points that I overlooked the first time round), since the
page already looks pretty good by now.

Again, thanks for the rendered version. As before, I've added my
comments to the page source.

On 1/29/21 8:03 AM, Suren Baghdasaryan wrote:
> Initial version of process_madvise(2) manual page. Initial text was
> extracted from [1], amended after fix [2] and more details added using
> man pages of madvise(2) and process_vm_readv(2) as examples. It also
> includes the changes to required permission proposed in [3].
> 
> [1] https://lore.kernel.org/patchwork/patch/1297933/
> [2] https://lkml.org/lkml/2020/12/8/1282
> [3] https://patchwork.kernel.org/project/selinux/patch/2021070622.2613577-1-sur...@google.com/#23888311
> 
> Signed-off-by: Suren Baghdasaryan 
> ---
> changes in v2:
> - Changed description of MADV_COLD per Michal Hocko's suggestion
> - Applied fixes suggested by Michael Kerrisk
> 
> NAME
> process_madvise - give advice about use of memory to a process

s/-/\-/


Re: [PATCH v2 1/1] process_madvise.2: Add process_madvise man page

2021-01-29 Thread Suren Baghdasaryan
On Fri, Jan 29, 2021 at 1:13 AM 'Michal Hocko' via kernel-team
 wrote:
>
> On Thu 28-01-21 23:03:40, Suren Baghdasaryan wrote:
> > Initial version of process_madvise(2) manual page. Initial text was
> > extracted from [1], amended after fix [2] and more details added using
> > man pages of madvise(2) and process_vm_readv(2) as examples. It also
> > includes the changes to required permission proposed in [3].
> >
> > [1] https://lore.kernel.org/patchwork/patch/1297933/
> > [2] https://lkml.org/lkml/2020/12/8/1282
> > [3] https://patchwork.kernel.org/project/selinux/patch/2021070622.2613577-1-sur...@google.com/#23888311
> >
> > Signed-off-by: Suren Baghdasaryan 
>
> Reviewed-by: Michal Hocko 

Thanks!

> Thanks!

[PATCH v2 1/1] process_madvise.2: Add process_madvise man page

2021-01-28 Thread Suren Baghdasaryan
Initial version of process_madvise(2) manual page. Initial text was
extracted from [1], amended after fix [2] and more details added using
man pages of madvise(2) and process_vm_readv(2) as examples. It also
includes the changes to required permission proposed in [3].

[1] https://lore.kernel.org/patchwork/patch/1297933/
[2] https://lkml.org/lkml/2020/12/8/1282
[3] https://patchwork.kernel.org/project/selinux/patch/2021070622.2613577-1-sur...@google.com/#23888311

Signed-off-by: Suren Baghdasaryan 
---
changes in v2:
- Changed description of MADV_COLD per Michal Hocko's suggestion
- Applied fixes suggested by Michael Kerrisk

NAME
process_madvise - give advice about use of memory to a process

SYNOPSIS
#include <sys/uio.h>

ssize_t process_madvise(int pidfd,
   const struct iovec *iovec,
   unsigned long vlen,
   int advice,
   unsigned int flags);
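When this page was written, glibc provided no wrapper for process_madvise(). A caller therefore has to invoke it through syscall(2). The sketch below shows one way to do that. The helper name sys_process_madvise() is hypothetical. The fallback number 440 is an assumption: it matches the unified syscall table that most architectures use. Sufficiently recent headers define SYS_process_madvise, and that definition takes precedence.

```c
#define _GNU_SOURCE
#include <errno.h>        /* callers inspect errno on failure */
#include <sys/syscall.h>  /* SYS_* numbers */
#include <sys/types.h>    /* ssize_t */
#include <sys/uio.h>      /* struct iovec */
#include <unistd.h>       /* syscall() */

/* Assumption: 440 is the number in the unified syscall table used by
   most architectures (alpha differs). Recent headers define
   SYS_process_madvise, which takes precedence over this fallback. */
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif

/* Hypothetical helper: a thin wrapper around the raw system call.
   Returns the number of bytes advised, or -1 with errno set. */
static ssize_t sys_process_madvise(int pidfd, const struct iovec *iovec,
                                   unsigned long vlen, int advice,
                                   unsigned int flags)
{
    return (ssize_t) syscall(SYS_process_madvise, pidfd, iovec, vlen,
                             advice, flags);
}
```

For example, take a pidfd obtained from pidfd_open(2) and a one-element iovec describing a range in the target process. The call sys_process_madvise(pidfd, &iov, 1, MADV_COLD, 0) then requests that the range be deactivated.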

DESCRIPTION
The process_madvise() system call is used to give advice or directions
to the kernel about the address ranges of another process or of the
calling process. It provides the advice for the address ranges described
by iovec and vlen. The goal of such advice is to improve system or
application performance.

The pidfd argument is a PID file descriptor (see pidfd_open(2)) that
specifies the process to which the advice is to be applied.
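The pidfd argument can be obtained with pidfd_open(2). That call likewise had no glibc wrapper at the time. The minimal sketch below goes through syscall(2) again. The name open_pidfd() is hypothetical. The fallback number 434 is an assumption based on the unified syscall table.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_* numbers */
#include <sys/types.h>    /* pid_t */
#include <unistd.h>       /* syscall(), close() */

/* Assumption: 434 is the number in the unified syscall table (alpha
   differs); sufficiently recent headers define SYS_pidfd_open. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/* Hypothetical helper: return a PID file descriptor referring to pid,
   or -1 with errno set. The caller should close(2) it when done. */
static int open_pidfd(pid_t pid)
{
    return (int) syscall(SYS_pidfd_open, pid, 0U);
}
```

ENOSYS from this helper usually indicates a kernel older than Linux 5.3, where pidfd_open(2) first appeared.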

The pointer iovec points to an array of iovec structures, defined in
<sys/uio.h> as:

struct iovec {
    void   *iov_base;    /* Starting address */
    size_t  iov_len;     /* Length of region */
};

Each iovec structure describes an address range in the target process
that begins at the address iov_base and spans iov_len bytes.

The vlen argument specifies the number of elements in the iovec array.
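The addresses in iovec refer to the address space of the process identified by pidfd, not to the caller's address space. The sketch below fills one iovec element for an arbitrary byte range and widens the range to whole pages. It assumes that the kernel expects a page-aligned start address, as madvise(2) does. The helper page_align_range() is hypothetical. An array of such elements, with vlen set to its element count, forms the request.

```c
#include <stdint.h>   /* uintptr_t */
#include <sys/uio.h>  /* struct iovec */
#include <unistd.h>   /* sysconf() */

/* Hypothetical helper: widen [addr, addr + len) to whole pages and store
   it in *iov. Assumes, as with madvise(2), that a page-aligned start is
   expected. addr is an address in the TARGET process. */
static void page_align_range(struct iovec *iov, uintptr_t addr, size_t len)
{
    uintptr_t page = (uintptr_t) sysconf(_SC_PAGESIZE);
    uintptr_t start = addr & ~(page - 1);                   /* round down */
    uintptr_t end = (addr + len + page - 1) & ~(page - 1);  /* round up */

    iov->iov_base = (void *) start;
    iov->iov_len = (size_t) (end - start);
}
```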

The advice argument is one of the values listed below.

  Linux-specific advice values
The following Linux-specific advice values have no counterparts in the
POSIX-specified posix_madvise(3), and may or may not have counterparts
in the madvise(2) interface available on other implementations.

MADV_COLD (since Linux 5.4)
Deactivate a given range of pages, which makes them a more probable
reclaim target should there be memory pressure. This is a nondestructive
operation. The advice might be ignored for some pages in the range when
it is not applicable.

MADV_PAGEOUT (since Linux 5.4)
Reclaim a given range of pages. This is done to free up memory occupied
by these pages. If a page is anonymous, it will be swapped out. If a
page is file-backed and dirty, it will be written back to the backing
storage. The advice might be ignored for some pages in the range when
it is not applicable.
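C library headers that predate Linux 5.4 may not define MADV_COLD and MADV_PAGEOUT. The sketch below supplies fallbacks of 20 and 21, the values in the generic Linux UAPI header <asm-generic/mman-common.h>. Some architectures use their own header, so check yours. The sketch also adds a hypothetical advice_listed() check that accepts only the values this page lists.

```c
#include <stdbool.h>
#include <sys/mman.h>  /* MADV_* constants, if the C library has them */

/* Fallbacks taken from the generic Linux UAPI header; an assumption for
   architectures with their own mman.h. */
#ifndef MADV_COLD
#define MADV_COLD 20
#endif
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif

/* Hypothetical helper: true only for the advice values listed on this
   page; it returns false for anything else. */
static bool advice_listed(int advice)
{
    switch (advice) {
    case MADV_COLD:     /* deactivate; nondestructive */
    case MADV_PAGEOUT:  /* reclaim: swap out or write back */
        return true;
    default:
        return false;
    }
}
```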

The flags argument is reserved for future use; currently, this argument
must be specified as 0.

The value specified in the vlen argument must be less than or equal to
IOV_MAX (defined in <limits.h> or accessible via the call
sysconf(_SC_IOV_MAX)).
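The IOV_MAX limit can be obtained at run time and vlen checked against it, as sketched below. The names max_vlen() and vlen_ok() are hypothetical. Suppose sysconf() reports no limit and IOV_MAX is not visible in the current compilation mode. In that case the sketch treats vlen as unrestricted and leaves the final decision to the kernel.

```c
#include <limits.h>   /* IOV_MAX, when visible */
#include <stdbool.h>
#include <unistd.h>   /* sysconf(), _SC_IOV_MAX */

/* Hypothetical helper: the largest vlen accepted, from
   sysconf(_SC_IOV_MAX), falling back to IOV_MAX when sysconf() fails
   and the constant is visible. Returns -1 if no limit is known. */
static long max_vlen(void)
{
    long n = sysconf(_SC_IOV_MAX);
#ifdef IOV_MAX
    if (n < 0)
        n = IOV_MAX;
#endif
    return n;
}

/* True if vlen is within the limit, or if no limit is known. */
static bool vlen_ok(unsigned long vlen)
{
    long max = max_vlen();
    return max < 0 || vlen <= (unsigned long) max;
}
```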

The vlen and iovec arguments are checked before any advice is applied:
if vlen is too large or iovec is invalid, the call fails immediately.

If one of the elements of iovec points to an invalid memory region in
the target process, the advice may already have been applied to the
preceding elements; no further elements are processed beyond that point.

Permission to provide a hint to another process is governed by a ptrace
access mode PTRACE_MODE_READ_REALCREDS check (see ptrace(2)); in addition,
the caller must have the CAP_SYS_ADMIN capability because of the
performance implications of applying the hint.
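The kernel performs this permission check itself. A caller may still want to diagnose a missing capability before attempting the call. The sketch below assumes that the CapEff: line of /proc/self/status, a hexadecimal mask of the effective capability set, is available. libcap's cap_get_proc(3) is an alternative. CAP_SYS_ADMIN is bit 21. The helper names are hypothetical, and the sketch does not cover the ptrace access check.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical macro: the value of CAP_SYS_ADMIN in <linux/capability.h>. */
#define CAP_SYS_ADMIN_BIT 21

/* Hypothetical helper: true if the hexadecimal capability mask hex has
   bit cap set. strtoull() skips the leading whitespace. */
static bool capmask_has(const char *hex, int cap)
{
    unsigned long long mask = strtoull(hex, NULL, 16);
    return (mask >> cap) & 1ULL;
}

/* Returns 1 if the caller has CAP_SYS_ADMIN in its effective set, 0 if
   not, or -1 if the CapEff: line could not be read. */
static int have_cap_sys_admin(void)
{
    char line[256];
    int result = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "CapEff:", 7) == 0) {
            result = capmask_has(line + 7, CAP_SYS_ADMIN_BIT);
            break;
        }
    }
    fclose(f);
    return result;
}
```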

RETURN VALUE
On success, process_madvise() returns the number of bytes advised. This
return value may be less than the total number of requested bytes if an
error occurred after some iovec elements were already processed. The caller
should check the return value to determine whether the advice was applied
only partially.
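Elements are processed in order, and processing stops at the first invalid element. The returned byte count can therefore be mapped back to the first element that was not fully advised. The sketch below assumes the return value counts only whole elements, as described above. The helper first_unadvised() is hypothetical.

```c
#include <stddef.h>     /* size_t */
#include <sys/types.h>  /* ssize_t */
#include <sys/uio.h>    /* struct iovec */

/* Hypothetical helper: given the iovec array passed to process_madvise()
   and its non-negative return value, return the index of the first
   element that was not fully advised; vlen means every element was. */
static unsigned long first_unadvised(const struct iovec *iov,
                                     unsigned long vlen, ssize_t advised)
{
    size_t done = (size_t) advised;
    unsigned long i = 0;

    while (i < vlen && iov[i].iov_len <= done) {
        done -= iov[i].iov_len;  /* this element was covered entirely */
        i++;
    }
    return i;
}
```

Consider elements of 4096, 8192, and 4096 bytes with a return value of 12288. The helper returns 2, meaning the third element was not advised.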

On error, -1 is returned and errno is set to indicate the error.

ERRORS
EBADF  pidfd is not a valid PID file descriptor.
EFAULT The memory described by iovec is outside the accessible address
       space of the process referred to by pidfd.
EINVAL flags is not 0.
EINVAL The sum of the iov_len values of iovec overflows a ssize_t value.
EINVAL vlen is too large.
ENOMEM Could not allocate memory for internal copies of the iovec
       structures.
EPERM  The caller does not have permission to access the address space of
       the process referred to by pidfd.
ESRCH  The target process does not exist (i.e., it has terminated and been
       waited on).
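A caller can check the three EINVAL cases above before issuing the call, which gives clearer diagnostics. The sketch below uses a hypothetical check_request() function. It takes the limit obtained from sysconf(_SC_IOV_MAX) as max_vlen and does not replace the kernel's own checks.

```c
#define _GNU_SOURCE
#include <errno.h>      /* EINVAL */
#include <limits.h>     /* SSIZE_MAX */
#include <stddef.h>     /* size_t */
#include <sys/types.h>  /* ssize_t */
#include <sys/uio.h>    /* struct iovec */

/* Hypothetical pre-flight check mirroring the EINVAL cases listed above.
   Returns 0 if the request looks acceptable, otherwise EINVAL. A negative
   max_vlen means no limit is known. */
static int check_request(const struct iovec *iov, unsigned long vlen,
                         unsigned int flags, long max_vlen)
{
    size_t total = 0;

    if (flags != 0)
        return EINVAL;                      /* flags is not 0 */
    if (max_vlen >= 0 && vlen > (unsigned long) max_vlen)
        return EINVAL;                      /* vlen is too large */
    for (unsigned long i = 0; i < vlen; i++) {
        if (iov[i].iov_len > (size_t) SSIZE_MAX - total)
            return EINVAL;                  /* sum would overflow ssize_t */
        total += iov[i].iov_len;
    }
    return 0;
}
```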

VERSIONS
This system call first appeared in Linux 5.10. Support for this system
call is optional, depending on the setting of the CONFIG_ADVISE_SYSCALLS
configuration option.
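Because support is optional, a program may want to probe for it at run time. One approach, sketched below under stated assumptions, is to issue a call that cannot succeed (flags is not 0) and check whether it fails with ENOSYS. The name process_madvise_available() is hypothetical, and the fallback number 440 is an assumption. A seccomp filter may also produce ENOSYS, or mask it with another error, so treat the result as a hint rather than a guarantee.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <sys/syscall.h>  /* SYS_* numbers */
#include <unistd.h>       /* syscall() */

#ifndef SYS_process_madvise
#define SYS_process_madvise 440  /* assumption: unified syscall table */
#endif

/* Hypothetical probe: flags != 0 makes the call fail, so only the error
   matters. ENOSYS suggests the kernel lacks the system call (older than
   5.10, or built without CONFIG_ADVISE_SYSCALLS); any other error
   suggests it exists. */
static bool process_madvise_available(void)
{
    errno = 0;
    if (syscall(SYS_process_madvise, -1, (void *) NULL, (unsigned long) 0,
                0, 1U) == -1 && errno == ENOSYS)
        return false;
    return true;
}
```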

SEE ALSO
madvise(2), pidfd_open(2)