Re: [PATCH 1/1] process_madvise.2: Add process_madvise man page

2021-01-28 Thread Suren Baghdasaryan
On Thu, Jan 28, 2021 at 12:31 PM Michael Kerrisk (man-pages)
 wrote:
>
> Hello Suren,
>
> On 1/28/21 7:40 PM, Suren Baghdasaryan wrote:
> > On Thu, Jan 28, 2021 at 4:24 AM Michael Kerrisk (man-pages)
> >  wrote:
> >>
> >> Hello Suren,
> >>
> >> Thank you for writing this page! Some comments below.
> >
> > Thanks for the review!
> > Couple questions below and I'll respin the new version once they are 
> > clarified.
>
> Okay. See below.
>
> >> On Wed, 20 Jan 2021 at 21:36, Suren Baghdasaryan  wrote:
> >>>
>
> [...]
>
> Thanks for all the acks. That let's me know that you saw what I said.
>
> >>> RETURN VALUE
> >>> On success, process_madvise() returns the number of bytes advised. 
> >>> This
> >>> return value may be less than the total number of requested bytes, if 
> >>> an
> >>> error occurred. The caller should check return value to determine 
> >>> whether
> >>> a partial advice occurred.
> >>
> >> So there are three return values possible,
> >
> > Ok, I think I see your point. How about this instead:
>
> Well, I'm glad you saw it, because I forgot to finish it. But yes,
> you understood what I forgot to say.
>
> > RETURN VALUE
> >  On success, process_madvise() returns the number of bytes advised. This
> >  return value may be less than the total number of requested bytes, if 
> > an
> >  error occurred after some iovec elements were already processed. The 
> > caller
> >  should check the return value to determine whether a partial
> > advice occurred.
> >
> > On error, -1 is returned and errno is set appropriately.
>
> We recently standardized some wording here:
> s/appropriately/to indicate the error/.
>
>
> >>> +.PP
> >>> +The pointer
> >>> +.I iovec
> >>> +points to an array of iovec structures, defined in
> >>
> >> "iovec" should be formatted as
> >>
> >> .I iovec
> >
> > I think it is formatted that way above. What am I missing?
>
> But also in "an array of iovec structures"...
>
> > BTW, where should I be using .I vs .IR? I was looking for an answer
> > but could not find it.
>
> .B / .I == bold/italic this line
> .BR / .IR == alternate bold/italic with normal (Roman) font.
>
> So:
> .I iovec
> .I iovec ,   # so that comma is not italic
> .BR process_madvise ()
> etc.
>
> [...]
>
> >>> +.I iovec
> >>> +if one of its elements points to an invalid memory
> >>> +region in the remote process. No further elements will be
> >>> +processed beyond that point.
> >>> +.PP
> >>> +Permission to provide a hint to external process is governed by a
> >>> +ptrace access mode
> >>> +.B PTRACE_MODE_READ_REALCREDS
> >>> +check; see
> >>> +.BR ptrace (2)
> >>> +and
> >>> +.B CAP_SYS_ADMIN
> >>> +capability that caller should have in order to affect performance
> >>> +of an external process.
> >>
> >> The preceding sentence is garbled. Missing words?
> >
> > Maybe I worded it incorrectly. What I need to say here is that the
> > caller should have both PTRACE_MODE_READ_REALCREDS credentials and
> > CAP_SYS_ADMIN capability. The first part I shamelessly copy/pasted
> > from https://man7.org/linux/man-pages/man2/process_vm_readv.2.html and
> > tried adding the second one to it, obviously unsuccessfully. Any
> > advice on how to fix that?
>
> I think you already got pretty close. How about:
>
> [[
> Permission to provide a hint to another process is governed by a
> ptrace access mode
> .B PTRACE_MODE_READ_REALCREDS
> check (see
> BR ptrace (2));
> in addition, the caller must have the
> .B CAP_SYS_ADMIN
> capability.

In V2 I explanded a bit this part to explain why CAP_SYS_ADMIN is
needed. There were questions about that during my patch review which
adds this requirement
(https://lore.kernel.org/patchwork/patch/1363605), so I thought a
short explanation would be useful.

> ]]
>
> [...]
>
> >>> +.TP
> >>> +.B ESRCH
> >>> +No process with ID
> >>> +.I pidfd
> >>> +exists.
> >>
> >> Should this maybe be:
> >> [[
> >> The target process does not exist (i.e., it has terminated and
> >> been waited on).
> >> ]]
> >>
> >> See pidfd_send_signal(2).
> >
> > I "borrowed" mine from
> > https://man7.org/linux/man-pages/man2/process_vm_readv.2.html but
> > either one sounds good to me. Maybe for pidfd_send_signal the wording
> > about termination is more important. Anyway, it's up to you. Just let
> > me know which one to use.
>
> I think the pidfd_send_signal(2) wording fits better.
>
> [...]
>
> Thanks,
>
> Michael
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/


Re: [PATCH 1/1] process_madvise.2: Add process_madvise man page

2021-01-28 Thread Suren Baghdasaryan
On Thu, Jan 28, 2021 at 12:31 PM Michael Kerrisk (man-pages)
 wrote:
>
> Hello Suren,
>
> On 1/28/21 7:40 PM, Suren Baghdasaryan wrote:
> > On Thu, Jan 28, 2021 at 4:24 AM Michael Kerrisk (man-pages)
> >  wrote:
> >>
> >> Hello Suren,
> >>
> >> Thank you for writing this page! Some comments below.
> >
> > Thanks for the review!
> > Couple questions below and I'll respin the new version once they are 
> > clarified.
>
> Okay. See below.
>
> >> On Wed, 20 Jan 2021 at 21:36, Suren Baghdasaryan  wrote:
> >>>
>
> [...]
>
> Thanks for all the acks. That let's me know that you saw what I said.
>
> >>> RETURN VALUE
> >>> On success, process_madvise() returns the number of bytes advised. 
> >>> This
> >>> return value may be less than the total number of requested bytes, if 
> >>> an
> >>> error occurred. The caller should check return value to determine 
> >>> whether
> >>> a partial advice occurred.
> >>
> >> So there are three return values possible,
> >
> > Ok, I think I see your point. How about this instead:
>
> Well, I'm glad you saw it, because I forgot to finish it. But yes,
> you understood what I forgot to say.
>
> > RETURN VALUE
> >  On success, process_madvise() returns the number of bytes advised. This
> >  return value may be less than the total number of requested bytes, if 
> > an
> >  error occurred after some iovec elements were already processed. The 
> > caller
> >  should check the return value to determine whether a partial
> > advice occurred.
> >
> > On error, -1 is returned and errno is set appropriately.
>
> We recently standardized some wording here:
> s/appropriately/to indicate the error/.

ack

>
>
> >>> +.PP
> >>> +The pointer
> >>> +.I iovec
> >>> +points to an array of iovec structures, defined in
> >>
> >> "iovec" should be formatted as
> >>
> >> .I iovec
> >
> > I think it is formatted that way above. What am I missing?
>
> But also in "an array of iovec structures"...

ack

>
> > BTW, where should I be using .I vs .IR? I was looking for an answer
> > but could not find it.
>
> .B / .I == bold/italic this line
> .BR / .IR == alternate bold/italic with normal (Roman) font.
>
> So:
> .I iovec
> .I iovec ,   # so that comma is not italic
> .BR process_madvise ()
> etc.

Aha! Got it now. It's clear after your example. Thanks!

>
> [...]
>
> >>> +.I iovec
> >>> +if one of its elements points to an invalid memory
> >>> +region in the remote process. No further elements will be
> >>> +processed beyond that point.
> >>> +.PP
> >>> +Permission to provide a hint to external process is governed by a
> >>> +ptrace access mode
> >>> +.B PTRACE_MODE_READ_REALCREDS
> >>> +check; see
> >>> +.BR ptrace (2)
> >>> +and
> >>> +.B CAP_SYS_ADMIN
> >>> +capability that caller should have in order to affect performance
> >>> +of an external process.
> >>
> >> The preceding sentence is garbled. Missing words?
> >
> > Maybe I worded it incorrectly. What I need to say here is that the
> > caller should have both PTRACE_MODE_READ_REALCREDS credentials and
> > CAP_SYS_ADMIN capability. The first part I shamelessly copy/pasted
> > from https://man7.org/linux/man-pages/man2/process_vm_readv.2.html and
> > tried adding the second one to it, obviously unsuccessfully. Any
> > advice on how to fix that?
>
> I think you already got pretty close. How about:
>
> [[
> Permission to provide a hint to another process is governed by a
> ptrace access mode
> .B PTRACE_MODE_READ_REALCREDS
> check (see
> BR ptrace (2));
> in addition, the caller must have the
> .B CAP_SYS_ADMIN
> capability.
> ]]

Perfect! I'll use that.

>
> [...]
>
> >>> +.TP
> >>> +.B ESRCH
> >>> +No process with ID
> >>> +.I pidfd
> >>> +exists.
> >>
> >> Should this maybe be:
> >> [[
> >> The target process does not exist (i.e., it has terminated and
> >> been waited on).
> >> ]]
> >>
> >> See pidfd_send_signal(2).
> >
> > I "borrowed" mine from
> > https://man7.org/linux/man-pages/man2/process_vm_readv.2.html but
> > either one sounds good to me. Maybe for pidfd_send_signal the wording
> > about termination is more important. Anyway, it's up to you. Just let
> > me know which one to use.
>
> I think the pidfd_send_signal(2) wording fits better.

ack, will use pidfd_send_signal(2) version.

>
> [...]
>
> Thanks,
>
> Michael

I'll include your and Michal's suggestions and will post the next
version later today or tomorrow morning.
Thanks for the guidance!

>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/


Re: [PATCH 1/1] process_madvise.2: Add process_madvise man page

2021-01-28 Thread Michael Kerrisk (man-pages)
Hello Suren,

On 1/28/21 7:40 PM, Suren Baghdasaryan wrote:
> On Thu, Jan 28, 2021 at 4:24 AM Michael Kerrisk (man-pages)
>  wrote:
>>
>> Hello Suren,
>>
>> Thank you for writing this page! Some comments below.
> 
> Thanks for the review!
> Couple questions below and I'll respin the new version once they are 
> clarified.

Okay. See below.

>> On Wed, 20 Jan 2021 at 21:36, Suren Baghdasaryan  wrote:
>>>

[...]

Thanks for all the acks. That let's me know that you saw what I said.

>>> RETURN VALUE
>>> On success, process_madvise() returns the number of bytes advised. This
>>> return value may be less than the total number of requested bytes, if an
>>> error occurred. The caller should check return value to determine 
>>> whether
>>> a partial advice occurred.
>>
>> So there are three return values possible,
> 
> Ok, I think I see your point. How about this instead:

Well, I'm glad you saw it, because I forgot to finish it. But yes,
you understood what I forgot to say.

> RETURN VALUE
>  On success, process_madvise() returns the number of bytes advised. This
>  return value may be less than the total number of requested bytes, if an
>  error occurred after some iovec elements were already processed. The 
> caller
>  should check the return value to determine whether a partial
> advice occurred.
> 
> On error, -1 is returned and errno is set appropriately.

We recently standardized some wording here:
s/appropriately/to indicate the error/.


>>> +.PP
>>> +The pointer
>>> +.I iovec
>>> +points to an array of iovec structures, defined in
>>
>> "iovec" should be formatted as
>>
>> .I iovec
> 
> I think it is formatted that way above. What am I missing?

But also in "an array of iovec structures"...

> BTW, where should I be using .I vs .IR? I was looking for an answer
> but could not find it.

.B / .I == bold/italic this line
.BR / .IR == alternate bold/italic with normal (Roman) font.

So:
.I iovec
.I iovec ,   # so that comma is not italic
.BR process_madvise ()
etc.

[...]

>>> +.I iovec
>>> +if one of its elements points to an invalid memory
>>> +region in the remote process. No further elements will be
>>> +processed beyond that point.
>>> +.PP
>>> +Permission to provide a hint to external process is governed by a
>>> +ptrace access mode
>>> +.B PTRACE_MODE_READ_REALCREDS
>>> +check; see
>>> +.BR ptrace (2)
>>> +and
>>> +.B CAP_SYS_ADMIN
>>> +capability that caller should have in order to affect performance
>>> +of an external process.
>>
>> The preceding sentence is garbled. Missing words?
> 
> Maybe I worded it incorrectly. What I need to say here is that the
> caller should have both PTRACE_MODE_READ_REALCREDS credentials and
> CAP_SYS_ADMIN capability. The first part I shamelessly copy/pasted
> from https://man7.org/linux/man-pages/man2/process_vm_readv.2.html and
> tried adding the second one to it, obviously unsuccessfully. Any
> advice on how to fix that?

I think you already got pretty close. How about:

[[
Permission to provide a hint to another process is governed by a
ptrace access mode
.B PTRACE_MODE_READ_REALCREDS
check (see
BR ptrace (2));
in addition, the caller must have the
.B CAP_SYS_ADMIN
capability.
]]

[...]

>>> +.TP
>>> +.B ESRCH
>>> +No process with ID
>>> +.I pidfd
>>> +exists.
>>
>> Should this maybe be:
>> [[
>> The target process does not exist (i.e., it has terminated and
>> been waited on).
>> ]]
>>
>> See pidfd_send_signal(2).
> 
> I "borrowed" mine from
> https://man7.org/linux/man-pages/man2/process_vm_readv.2.html but
> either one sounds good to me. Maybe for pidfd_send_signal the wording
> about termination is more important. Anyway, it's up to you. Just let
> me know which one to use.

I think the pidfd_send_signal(2) wording fits better.

[...]

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: [PATCH 1/1] process_madvise.2: Add process_madvise man page

2021-01-28 Thread Michael Kerrisk (man-pages)
Hello Suren,

Thank you for writing this page! Some comments below.

On Wed, 20 Jan 2021 at 21:36, Suren Baghdasaryan  wrote:
>
> Initial version of process_madvise(2) manual page. Initial text was
> extracted from [1], amended after fix [2] and more details added using
> man pages of madvise(2) and process_vm_read(2) as examples. It also
> includes the changes to required permission proposed in [3].
>
> [1] https://lore.kernel.org/patchwork/patch/1297933/
> [2] https://lkml.org/lkml/2020/12/8/1282
> [3] 
> https://patchwork.kernel.org/project/selinux/patch/2021070622.2613577-1-sur...@google.com/#23888311
>
> Signed-off-by: Suren Baghdasaryan 
> Signed-off-by: Minchan Kim 
> ---
>
> Adding the plane text version for ease of review:

Thanks for adding the rendered version. I will make my comments
against the source, below.

> NAME
> process_madvise - give advice about use of memory to a process
>
> SYNOPSIS
> #include 
>
> ssize_t process_madvise(int pidfd,
>const struct iovec *iovec,
>unsigned long vlen,
>int advice,
>unsigned int flags);
>
> DESCRIPTION
> The process_madvise() system call is used to give advice or directions to
> the kernel about the address ranges from external process as well as local
> process. It provides the advice to address ranges of process described by
> iovec and vlen. The goal of such advice is to improve system or 
> application
> performance.
>
> The pidfd selects the process referred to by the PID file descriptor
> specified in pidfd. (see pidofd_open(2) for further information).
>
> The pointer iovec points to an array of iovec structures, defined in
>  as:
>
> struct iovec {
> void  *iov_base;/* Starting address */
> size_t iov_len; /* Number of bytes to transfer */
> };
>
> The iovec describes address ranges beginning at iov_base address and with
> the size of iov_len bytes.
>
> The vlen represents the number of elements in iovec.
>
> The advice can be one of the values listed below.
>
>   Linux-specific advice values
> The following Linux-specific advice values have no counterparts in the
> POSIX-specified posix_madvise(3), and may or may not have counterparts in
> the madvise() interface available on other implementations.
>
> MADV_COLD (since Linux 5.4.1)
> Deactivate a given range of pages by moving them from active to
> inactive LRU list. This is done to accelerate the reclaim of these
> pages. The advice might be ignored for some pages in the range when it
> is not applicable.
> MADV_PAGEOUT (since Linux 5.4.1)
> Reclaim a given range of pages. This is done to free up memory 
> occupied
> by these pages. If a page is anonymous it will be swapped out. If a
> page is file-backed and dirty it will be written back into the backing
> storage. The advice might be ignored for some pages in the range when
> it is not applicable.
>
> The flags argument is reserved for future use; currently, this argument 
> must
> be specified as 0.
>
> The value specified in the vlen argument must be less than or equal to
> IOV_MAX (defined in  or accessible via the call
> sysconf(_SC_IOV_MAX)).
>
> The vlen and iovec arguments are checked before applying any hints. If the
> vlen is too big, or iovec is invalid, an error will be returned
> immediately.
>
> Hint might be applied to a part of iovec if one of its elements points to
> an invalid memory region in the remote process. No further elements will 
> be
> processed beyond that point.
>
> Permission to provide a hint to external process is governed by a ptrace
> access mode PTRACE_MODE_READ_REALCREDS check; see ptrace(2) and
> CAP_SYS_ADMIN capability that caller should have in order to affect
> performance of an external process.
>
> RETURN VALUE
> On success, process_madvise() returns the number of bytes advised. This
> return value may be less than the total number of requested bytes, if an
> error occurred. The caller should check return value to determine whether
> a partial advice occurred.

So there are three return values possible,
> ERRORS
> EFAULT The memory described by iovec is outside the accessible address
>space of the process pid.

s/pid/
of the process referred to by
.IR pidfd .

> EINVAL flags is not 0.
> EINVAL The sum of the iov_len values of iovec overflows a ssize_t value.
> EINVAL vlen is too large.
> ENOMEM Could not allocate memory for internal copies of the iovec
>structures.
> EPERM The caller does not have permission to access the address space of
>   the process pidfd.
> ESRCH No process with ID pidfd exists.
>
> VERSIONS
> Since Linux 5.10, support for this system call is

Re: [PATCH 1/1] process_madvise.2: Add process_madvise man page

2021-01-26 Thread Suren Baghdasaryan
On Mon, Jan 25, 2021 at 5:19 AM 'Michal Hocko' via kernel-team
 wrote:
>
> On Wed 20-01-21 12:23:37, Suren Baghdasaryan wrote:
> [...]
> > MADV_COLD (since Linux 5.4.1)
> > Deactivate a given range of pages by moving them from active to
> > inactive LRU list. This is done to accelerate the reclaim of these
> > pages. The advice might be ignored for some pages in the range when 
> > it
> > is not applicable.
>
> I do not think we want to talk about active/inactive LRU lists here.
> Wouldn't it be sufficient to say
> Deactive a given range of pages which will make them a more probable
> reclaim target should there be a memory pressure. This is a
> non-destructive operation.

That sounds better. Will update in the next version.

>
> Other than that, looks good to me from the content POV.
>
> Thanks!

Thanks for the review Michal!

> --
> Michal Hocko
> SUSE Labs
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to kernel-team+unsubscr...@android.com.
>


Re: [PATCH 1/1] process_madvise.2: Add process_madvise man page

2021-01-25 Thread Michal Hocko
On Wed 20-01-21 12:23:37, Suren Baghdasaryan wrote:
[...]
> MADV_COLD (since Linux 5.4.1)
> Deactivate a given range of pages by moving them from active to
> inactive LRU list. This is done to accelerate the reclaim of these
> pages. The advice might be ignored for some pages in the range when it
> is not applicable.

I do not think we want to talk about active/inactive LRU lists here.
Wouldn't it be sufficient to say
Deactive a given range of pages which will make them a more probable
reclaim target should there be a memory pressure. This is a
non-destructive operation.

Other than that, looks good to me from the content POV.

Thanks!
-- 
Michal Hocko
SUSE Labs