Re: [PATCH 1/1] process_madvise.2: Add process_madvise man page
On Thu, Jan 28, 2021 at 12:31 PM Michael Kerrisk (man-pages) wrote: > > Hello Suren, > > On 1/28/21 7:40 PM, Suren Baghdasaryan wrote: > > On Thu, Jan 28, 2021 at 4:24 AM Michael Kerrisk (man-pages) > > wrote: > >> > >> Hello Suren, > >> > >> Thank you for writing this page! Some comments below. > > > > Thanks for the review! > > Couple questions below and I'll respin the new version once they are > > clarified. > > Okay. See below. > > >> On Wed, 20 Jan 2021 at 21:36, Suren Baghdasaryan wrote: > >>> > > [...] > > Thanks for all the acks. That let's me know that you saw what I said. > > >>> RETURN VALUE > >>> On success, process_madvise() returns the number of bytes advised. > >>> This > >>> return value may be less than the total number of requested bytes, if > >>> an > >>> error occurred. The caller should check return value to determine > >>> whether > >>> a partial advice occurred. > >> > >> So there are three return values possible, > > > > Ok, I think I see your point. How about this instead: > > Well, I'm glad you saw it, because I forgot to finish it. But yes, > you understood what I forgot to say. > > > RETURN VALUE > > On success, process_madvise() returns the number of bytes advised. This > > return value may be less than the total number of requested bytes, if > > an > > error occurred after some iovec elements were already processed. The > > caller > > should check the return value to determine whether a partial > > advice occurred. > > > > On error, -1 is returned and errno is set appropriately. > > We recently standardized some wording here: > s/appropriately/to indicate the error/. > > > >>> +.PP > >>> +The pointer > >>> +.I iovec > >>> +points to an array of iovec structures, defined in > >> > >> "iovec" should be formatted as > >> > >> .I iovec > > > > I think it is formatted that way above. What am I missing? > > But also in "an array of iovec structures"... > > > BTW, where should I be using .I vs .IR? I was looking for an answer > > but could not find it. > > .B / .I == bold/italic this line > .BR / .IR == alternate bold/italic with normal (Roman) font. > > So: > .I iovec > .I iovec , # so that comma is not italic > .BR process_madvise () > etc. > > [...] > > >>> +.I iovec > >>> +if one of its elements points to an invalid memory > >>> +region in the remote process. No further elements will be > >>> +processed beyond that point. > >>> +.PP > >>> +Permission to provide a hint to external process is governed by a > >>> +ptrace access mode > >>> +.B PTRACE_MODE_READ_REALCREDS > >>> +check; see > >>> +.BR ptrace (2) > >>> +and > >>> +.B CAP_SYS_ADMIN > >>> +capability that caller should have in order to affect performance > >>> +of an external process. > >> > >> The preceding sentence is garbled. Missing words? > > > > Maybe I worded it incorrectly. What I need to say here is that the > > caller should have both PTRACE_MODE_READ_REALCREDS credentials and > > CAP_SYS_ADMIN capability. The first part I shamelessly copy/pasted > > from https://man7.org/linux/man-pages/man2/process_vm_readv.2.html and > > tried adding the second one to it, obviously unsuccessfully. Any > > advice on how to fix that? > > I think you already got pretty close. How about: > > [[ > Permission to provide a hint to another process is governed by a > ptrace access mode > .B PTRACE_MODE_READ_REALCREDS > check (see > BR ptrace (2)); > in addition, the caller must have the > .B CAP_SYS_ADMIN > capability. In V2 I explanded a bit this part to explain why CAP_SYS_ADMIN is needed. There were questions about that during my patch review which adds this requirement (https://lore.kernel.org/patchwork/patch/1363605), so I thought a short explanation would be useful. > ]] > > [...] > > >>> +.TP > >>> +.B ESRCH > >>> +No process with ID > >>> +.I pidfd > >>> +exists. > >> > >> Should this maybe be: > >> [[ > >> The target process does not exist (i.e., it has terminated and > >> been waited on). > >> ]] > >> > >> See pidfd_send_signal(2). > > > > I "borrowed" mine from > > https://man7.org/linux/man-pages/man2/process_vm_readv.2.html but > > either one sounds good to me. Maybe for pidfd_send_signal the wording > > about termination is more important. Anyway, it's up to you. Just let > > me know which one to use. > > I think the pidfd_send_signal(2) wording fits better. > > [...] > > Thanks, > > Michael > > -- > Michael Kerrisk > Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ > Linux/UNIX System Programming Training: http://man7.org/training/
Re: [PATCH 1/1] process_madvise.2: Add process_madvise man page
On Thu, Jan 28, 2021 at 12:31 PM Michael Kerrisk (man-pages) wrote: > > Hello Suren, > > On 1/28/21 7:40 PM, Suren Baghdasaryan wrote: > > On Thu, Jan 28, 2021 at 4:24 AM Michael Kerrisk (man-pages) > > wrote: > >> > >> Hello Suren, > >> > >> Thank you for writing this page! Some comments below. > > > > Thanks for the review! > > Couple questions below and I'll respin the new version once they are > > clarified. > > Okay. See below. > > >> On Wed, 20 Jan 2021 at 21:36, Suren Baghdasaryan wrote: > >>> > > [...] > > Thanks for all the acks. That let's me know that you saw what I said. > > >>> RETURN VALUE > >>> On success, process_madvise() returns the number of bytes advised. > >>> This > >>> return value may be less than the total number of requested bytes, if > >>> an > >>> error occurred. The caller should check return value to determine > >>> whether > >>> a partial advice occurred. > >> > >> So there are three return values possible, > > > > Ok, I think I see your point. How about this instead: > > Well, I'm glad you saw it, because I forgot to finish it. But yes, > you understood what I forgot to say. > > > RETURN VALUE > > On success, process_madvise() returns the number of bytes advised. This > > return value may be less than the total number of requested bytes, if > > an > > error occurred after some iovec elements were already processed. The > > caller > > should check the return value to determine whether a partial > > advice occurred. > > > > On error, -1 is returned and errno is set appropriately. > > We recently standardized some wording here: > s/appropriately/to indicate the error/. ack > > > >>> +.PP > >>> +The pointer > >>> +.I iovec > >>> +points to an array of iovec structures, defined in > >> > >> "iovec" should be formatted as > >> > >> .I iovec > > > > I think it is formatted that way above. What am I missing? > > But also in "an array of iovec structures"... ack > > > BTW, where should I be using .I vs .IR? I was looking for an answer > > but could not find it. > > .B / .I == bold/italic this line > .BR / .IR == alternate bold/italic with normal (Roman) font. > > So: > .I iovec > .I iovec , # so that comma is not italic > .BR process_madvise () > etc. Aha! Got it now. It's clear after your example. Thanks! > > [...] > > >>> +.I iovec > >>> +if one of its elements points to an invalid memory > >>> +region in the remote process. No further elements will be > >>> +processed beyond that point. > >>> +.PP > >>> +Permission to provide a hint to external process is governed by a > >>> +ptrace access mode > >>> +.B PTRACE_MODE_READ_REALCREDS > >>> +check; see > >>> +.BR ptrace (2) > >>> +and > >>> +.B CAP_SYS_ADMIN > >>> +capability that caller should have in order to affect performance > >>> +of an external process. > >> > >> The preceding sentence is garbled. Missing words? > > > > Maybe I worded it incorrectly. What I need to say here is that the > > caller should have both PTRACE_MODE_READ_REALCREDS credentials and > > CAP_SYS_ADMIN capability. The first part I shamelessly copy/pasted > > from https://man7.org/linux/man-pages/man2/process_vm_readv.2.html and > > tried adding the second one to it, obviously unsuccessfully. Any > > advice on how to fix that? > > I think you already got pretty close. How about: > > [[ > Permission to provide a hint to another process is governed by a > ptrace access mode > .B PTRACE_MODE_READ_REALCREDS > check (see > BR ptrace (2)); > in addition, the caller must have the > .B CAP_SYS_ADMIN > capability. > ]] Perfect! I'll use that. > > [...] > > >>> +.TP > >>> +.B ESRCH > >>> +No process with ID > >>> +.I pidfd > >>> +exists. > >> > >> Should this maybe be: > >> [[ > >> The target process does not exist (i.e., it has terminated and > >> been waited on). > >> ]] > >> > >> See pidfd_send_signal(2). > > > > I "borrowed" mine from > > https://man7.org/linux/man-pages/man2/process_vm_readv.2.html but > > either one sounds good to me. Maybe for pidfd_send_signal the wording > > about termination is more important. Anyway, it's up to you. Just let > > me know which one to use. > > I think the pidfd_send_signal(2) wording fits better. ack, will use pidfd_send_signal(2) version. > > [...] > > Thanks, > > Michael I'll include your and Michal's suggestions and will post the next version later today or tomorrow morning. Thanks for the guidance! > > -- > Michael Kerrisk > Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ > Linux/UNIX System Programming Training: http://man7.org/training/
Re: [PATCH 1/1] process_madvise.2: Add process_madvise man page
Hello Suren, On 1/28/21 7:40 PM, Suren Baghdasaryan wrote: > On Thu, Jan 28, 2021 at 4:24 AM Michael Kerrisk (man-pages) > wrote: >> >> Hello Suren, >> >> Thank you for writing this page! Some comments below. > > Thanks for the review! > Couple questions below and I'll respin the new version once they are > clarified. Okay. See below. >> On Wed, 20 Jan 2021 at 21:36, Suren Baghdasaryan wrote: >>> [...] Thanks for all the acks. That let's me know that you saw what I said. >>> RETURN VALUE >>> On success, process_madvise() returns the number of bytes advised. This >>> return value may be less than the total number of requested bytes, if an >>> error occurred. The caller should check return value to determine >>> whether >>> a partial advice occurred. >> >> So there are three return values possible, > > Ok, I think I see your point. How about this instead: Well, I'm glad you saw it, because I forgot to finish it. But yes, you understood what I forgot to say. > RETURN VALUE > On success, process_madvise() returns the number of bytes advised. This > return value may be less than the total number of requested bytes, if an > error occurred after some iovec elements were already processed. The > caller > should check the return value to determine whether a partial > advice occurred. > > On error, -1 is returned and errno is set appropriately. We recently standardized some wording here: s/appropriately/to indicate the error/. >>> +.PP >>> +The pointer >>> +.I iovec >>> +points to an array of iovec structures, defined in >> >> "iovec" should be formatted as >> >> .I iovec > > I think it is formatted that way above. What am I missing? But also in "an array of iovec structures"... > BTW, where should I be using .I vs .IR? I was looking for an answer > but could not find it. .B / .I == bold/italic this line .BR / .IR == alternate bold/italic with normal (Roman) font. So: .I iovec .I iovec , # so that comma is not italic .BR process_madvise () etc. [...] >>> +.I iovec >>> +if one of its elements points to an invalid memory >>> +region in the remote process. No further elements will be >>> +processed beyond that point. >>> +.PP >>> +Permission to provide a hint to external process is governed by a >>> +ptrace access mode >>> +.B PTRACE_MODE_READ_REALCREDS >>> +check; see >>> +.BR ptrace (2) >>> +and >>> +.B CAP_SYS_ADMIN >>> +capability that caller should have in order to affect performance >>> +of an external process. >> >> The preceding sentence is garbled. Missing words? > > Maybe I worded it incorrectly. What I need to say here is that the > caller should have both PTRACE_MODE_READ_REALCREDS credentials and > CAP_SYS_ADMIN capability. The first part I shamelessly copy/pasted > from https://man7.org/linux/man-pages/man2/process_vm_readv.2.html and > tried adding the second one to it, obviously unsuccessfully. Any > advice on how to fix that? I think you already got pretty close. How about: [[ Permission to provide a hint to another process is governed by a ptrace access mode .B PTRACE_MODE_READ_REALCREDS check (see BR ptrace (2)); in addition, the caller must have the .B CAP_SYS_ADMIN capability. ]] [...] >>> +.TP >>> +.B ESRCH >>> +No process with ID >>> +.I pidfd >>> +exists. >> >> Should this maybe be: >> [[ >> The target process does not exist (i.e., it has terminated and >> been waited on). >> ]] >> >> See pidfd_send_signal(2). > > I "borrowed" mine from > https://man7.org/linux/man-pages/man2/process_vm_readv.2.html but > either one sounds good to me. Maybe for pidfd_send_signal the wording > about termination is more important. Anyway, it's up to you. Just let > me know which one to use. I think the pidfd_send_signal(2) wording fits better. [...] Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/
Re: [PATCH 1/1] process_madvise.2: Add process_madvise man page
Hello Suren, Thank you for writing this page! Some comments below. On Wed, 20 Jan 2021 at 21:36, Suren Baghdasaryan wrote: > > Initial version of process_madvise(2) manual page. Initial text was > extracted from [1], amended after fix [2] and more details added using > man pages of madvise(2) and process_vm_read(2) as examples. It also > includes the changes to required permission proposed in [3]. > > [1] https://lore.kernel.org/patchwork/patch/1297933/ > [2] https://lkml.org/lkml/2020/12/8/1282 > [3] > https://patchwork.kernel.org/project/selinux/patch/2021070622.2613577-1-sur...@google.com/#23888311 > > Signed-off-by: Suren Baghdasaryan > Signed-off-by: Minchan Kim > --- > > Adding the plane text version for ease of review: Thanks for adding the rendered version. I will make my comments against the source, below. > NAME > process_madvise - give advice about use of memory to a process > > SYNOPSIS > #include > > ssize_t process_madvise(int pidfd, >const struct iovec *iovec, >unsigned long vlen, >int advice, >unsigned int flags); > > DESCRIPTION > The process_madvise() system call is used to give advice or directions to > the kernel about the address ranges from external process as well as local > process. It provides the advice to address ranges of process described by > iovec and vlen. The goal of such advice is to improve system or > application > performance. > > The pidfd selects the process referred to by the PID file descriptor > specified in pidfd. (see pidofd_open(2) for further information). > > The pointer iovec points to an array of iovec structures, defined in > as: > > struct iovec { > void *iov_base;/* Starting address */ > size_t iov_len; /* Number of bytes to transfer */ > }; > > The iovec describes address ranges beginning at iov_base address and with > the size of iov_len bytes. > > The vlen represents the number of elements in iovec. > > The advice can be one of the values listed below. > > Linux-specific advice values > The following Linux-specific advice values have no counterparts in the > POSIX-specified posix_madvise(3), and may or may not have counterparts in > the madvise() interface available on other implementations. > > MADV_COLD (since Linux 5.4.1) > Deactivate a given range of pages by moving them from active to > inactive LRU list. This is done to accelerate the reclaim of these > pages. The advice might be ignored for some pages in the range when it > is not applicable. > MADV_PAGEOUT (since Linux 5.4.1) > Reclaim a given range of pages. This is done to free up memory > occupied > by these pages. If a page is anonymous it will be swapped out. If a > page is file-backed and dirty it will be written back into the backing > storage. The advice might be ignored for some pages in the range when > it is not applicable. > > The flags argument is reserved for future use; currently, this argument > must > be specified as 0. > > The value specified in the vlen argument must be less than or equal to > IOV_MAX (defined in or accessible via the call > sysconf(_SC_IOV_MAX)). > > The vlen and iovec arguments are checked before applying any hints. If the > vlen is too big, or iovec is invalid, an error will be returned > immediately. > > Hint might be applied to a part of iovec if one of its elements points to > an invalid memory region in the remote process. No further elements will > be > processed beyond that point. > > Permission to provide a hint to external process is governed by a ptrace > access mode PTRACE_MODE_READ_REALCREDS check; see ptrace(2) and > CAP_SYS_ADMIN capability that caller should have in order to affect > performance of an external process. > > RETURN VALUE > On success, process_madvise() returns the number of bytes advised. This > return value may be less than the total number of requested bytes, if an > error occurred. The caller should check return value to determine whether > a partial advice occurred. So there are three return values possible, > ERRORS > EFAULT The memory described by iovec is outside the accessible address >space of the process pid. s/pid/ of the process referred to by .IR pidfd . > EINVAL flags is not 0. > EINVAL The sum of the iov_len values of iovec overflows a ssize_t value. > EINVAL vlen is too large. > ENOMEM Could not allocate memory for internal copies of the iovec >structures. > EPERM The caller does not have permission to access the address space of > the process pidfd. > ESRCH No process with ID pidfd exists. > > VERSIONS > Since Linux 5.10, support for this system call is
Re: [PATCH 1/1] process_madvise.2: Add process_madvise man page
On Mon, Jan 25, 2021 at 5:19 AM 'Michal Hocko' via kernel-team wrote: > > On Wed 20-01-21 12:23:37, Suren Baghdasaryan wrote: > [...] > > MADV_COLD (since Linux 5.4.1) > > Deactivate a given range of pages by moving them from active to > > inactive LRU list. This is done to accelerate the reclaim of these > > pages. The advice might be ignored for some pages in the range when > > it > > is not applicable. > > I do not think we want to talk about active/inactive LRU lists here. > Wouldn't it be sufficient to say > Deactive a given range of pages which will make them a more probable > reclaim target should there be a memory pressure. This is a > non-destructive operation. That sounds better. Will update in the next version. > > Other than that, looks good to me from the content POV. > > Thanks! Thanks for the review Michal! > -- > Michal Hocko > SUSE Labs > > -- > To unsubscribe from this group and stop receiving emails from it, send an > email to kernel-team+unsubscr...@android.com. >
Re: [PATCH 1/1] process_madvise.2: Add process_madvise man page
On Wed 20-01-21 12:23:37, Suren Baghdasaryan wrote: [...] > MADV_COLD (since Linux 5.4.1) > Deactivate a given range of pages by moving them from active to > inactive LRU list. This is done to accelerate the reclaim of these > pages. The advice might be ignored for some pages in the range when it > is not applicable. I do not think we want to talk about active/inactive LRU lists here. Wouldn't it be sufficient to say Deactive a given range of pages which will make them a more probable reclaim target should there be a memory pressure. This is a non-destructive operation. Other than that, looks good to me from the content POV. Thanks! -- Michal Hocko SUSE Labs